Administrators often need to search for and list items in their organization that match certain criteria, whether for administrative or auditing purposes. A common example is finding items modified within the last 'n' days, sorted by their popularity or number of views. This notebook works through such a use case: finding the top 100 public ArcGIS Dashboard items sorted by number of views, and writing the results to a CSV file that can be used for reporting or ingested into another system.
The configuration needed for this notebook is in the top few cells. Even if this exact use case is not what you need, you can easily modify the configuration cells and adapt the notebook to suit your reporting needs.
Import arcgis and other libraries
from arcgis.gis import GIS
from datetime import datetime, timedelta, timezone
from dateutil import tz
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
from IPython.display import display
import os
gis = GIS("home")
Set up search parameters
# set up time zone for searching - 'PDT' in this example
la_tz = tz.gettz('America/Los_Angeles')
# set up a time filter - last 20 days in this example
end_time = datetime.now(tz=la_tz)
start_time = end_time - timedelta(days=20)
# sort order
search_sort_order = 'desc'
# search outside org?
search_outside_org = True
# number of items to search for
search_items_max = 100
# search item type
search_item_type = "Dashboard"
# output location
out_folder = '/arcgis/home/dashboard_counts'
ArcGIS stores the created and modified times for items as Unix epoch millisecond timestamps in the UTC time zone. The next cell converts the start and end times to the UTC time zone and then to epoch seconds, multiplying by 1000 to convert seconds to milliseconds.
end_time_epoch = end_time.astimezone(tz.UTC).timestamp()*1000
start_time_epoch = start_time.astimezone(tz.UTC).timestamp()*1000
# print settings
print(f'Time zone used: {end_time.tzname()}')
print(f'start time: {start_time} | as epoch: {start_time_epoch}')
print(f'end time: {end_time} | as epoch: {end_time_epoch}')
Time zone used: PDT
start time: 2020-09-25 22:22:18.008928-07:00 | as epoch: 1601097738008.928
end time: 2020-10-15 22:22:18.008928-07:00 | as epoch: 1602825738008.928
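As a quick sanity check (an extra step, not part of the original notebook), an epoch-millisecond value can be converted back to an aware datetime to confirm the round trip. The helper name below is hypothetical:

```python
from datetime import datetime, timezone

def epoch_ms_to_datetime(epoch_ms):
    """Convert Unix epoch milliseconds (UTC) back to an aware datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)

# Round trip: datetime -> epoch ms -> datetime should preserve the instant
moment = datetime(2020, 10, 15, 22, 22, 18, tzinfo=timezone.utc)
epoch_ms = moment.timestamp() * 1000
assert epoch_ms_to_datetime(epoch_ms) == moment
print(epoch_ms)  # 1602800538000.0
```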
Search for ArcGIS Dashboard items
Next, we will construct a search query using the parameters defined above and query the org. To learn about the different parameters you can query for, see the search reference. You can combine this reference with the properties of Items found here to construct complex queries.
Since our org does not have over 100 Dashboard items, for the purpose of illustration, we search across all of ArcGIS Online.
query_string = f'modified: [{start_time_epoch} TO {end_time_epoch}]'
# search 100 most popular ArcGIS Dashboard items across all of ArcGIS Online
search_result = gis.content.search(query=query_string, item_type=search_item_type,
sort_field='numViews', sort_order=search_sort_order,
max_items=search_items_max, outside_org=search_outside_org)
len(search_result)
100
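The query grammar also lets you combine multiple fields with boolean operators. A minimal sketch of a query string that restricts the same date range to a single owner (the owner name here is a made-up placeholder):

```python
# Epoch millisecond bounds, as computed in the cells above
start_time_epoch = 1601097738008
end_time_epoch = 1602825738008

# 'some_user' is a hypothetical account name, for illustration only
owner = 'some_user'
combined_query = f'owner: {owner} AND modified: [{start_time_epoch} TO {end_time_epoch}]'
print(combined_query)
# owner: some_user AND modified: [1601097738008 TO 1602825738008]
```

A string like this can be passed as the `query` argument to `gis.content.search()` in place of the simpler date-only query.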
Compose a table from search results
Our next step is to compose a Pandas DataFrame object from the search result. For this, we will compose a list of dictionary objects from the search results and choose important item properties such as item ID, title, URL, created time, view counts etc.
%%time
result_list = []
for current_item in search_result:
    result_dict = {}
    result_dict['item_id'] = current_item.id
    result_dict['num_views'] = current_item.numViews
    result_dict['title'] = current_item.title
    # process last modified date
    date_modified = datetime.fromtimestamp(current_item.modified/1000, tz=tz.UTC)
    result_dict['date_modified'] = date_modified
    result_dict['url'] = current_item.homepage
    # append to list
    result_list.append(result_dict)
CPU times: user 437 µs, sys: 118 µs, total: 555 µs Wall time: 559 µs
df = pd.DataFrame(data=result_list)
Print the table's top 5 and bottom 5 rows
df.head() # top 5
| | item_id | num_views | title | date_modified | url |
|---|---|---|---|---|---|
| 0 | bda7594740fd40299423467b48e9ecf6 | 1950923268 | Coronavirus COVID-19 (2019-nCoV) | 2020-10-08 20:56:35+00:00 | https://www.arcgis.com/home/item.html?id=bda75... |
| 1 | 85320e2ea5424dfaaa75ae62e5c06e61 | 961689924 | Dashboard Coronavirus COVID-19 (Mobile) | 2020-10-08 20:56:24+00:00 | https://www.arcgis.com/home/item.html?id=85320... |
| 2 | 32cb2526e3044fe8b5392ac4c9a466bc | 109083112 | Covid19 Lietuva stats LINE portal MOB | 2020-10-09 08:07:48+00:00 | https://www.arcgis.com/home/item.html?id=32cb2... |
| 3 | e19b7292484a4c05bb2a74e1f00421ba | 99927240 | Covid-19 flat curve LT | 2020-10-09 09:55:47+00:00 | https://www.arcgis.com/home/item.html?id=e19b7... |
| 4 | 8d0de33f260d444c852a615dc7837c86 | 84417372 | Florida COVID-19 Confirmed Cases | 2020-10-02 15:55:07+00:00 | https://www.arcgis.com/home/item.html?id=8d0de... |
df.tail() # bottom 5
| | item_id | num_views | title | date_modified | url |
|---|---|---|---|---|---|
| 95 | 31370c72d3844e6b962fcf8490718821 | 1458106 | COVID Zip Code Dashboard | 2020-10-07 16:42:49+00:00 | https://www.arcgis.com/home/item.html?id=31370... |
| 96 | 21bec056a9a6449abcca89a329868fd6 | 1454170 | Douglas County NE COVID-19 Dashboard (retired ... | 2020-09-29 14:19:30+00:00 | https://www.arcgis.com/home/item.html?id=21bec... |
| 97 | ebb119cd215b4c57933b7fbe477e7c30 | 1421105 | Tulsa County Public Health COVID-19 Cases | 2020-10-15 20:49:48+00:00 | https://www.arcgis.com/home/item.html?id=ebb11... |
| 98 | f1d9acad6d0947ecaae1aee987f13339 | 1405731 | SSI COVID - 19 Dashboard Dansk | 2020-10-15 12:20:59+00:00 | https://www.arcgis.com/home/item.html?id=f1d9a... |
| 99 | dcf9e493894e49b1853737f1ca3f6c6f | 1365022 | SSI COVID - 19 Dashboard Dansk - Mobil | 2020-10-15 13:04:53+00:00 | https://www.arcgis.com/home/item.html?id=dcf9e... |
Exploratory analysis on the top 'n' items
Now that we have collected our data, let us explore it. First, we create a histogram of the number of views to look at their distribution.
fig, ax = plt.subplots(figsize=(10,6))
(df['num_views']/1000000).hist(bins=50)
ax.set_title(f'Histogram of view counts of top {search_items_max} ArcGIS {search_item_type} items')
ax.set_xlabel('Number of views in millions');
Most items in the top 100 list have fewer than one million views. We have a few outliers with over a hundred million views, and one item that is nearing two billion views. We can find out what those items are simply by displaying the top few Item objects.
for current_item in search_result[:4]:
display(current_item)
Next, let us visualize the last modified date as a histogram. The date_modified column is read as a DateTime object with minute- and second-level data. We will resample this column and aggregate it on a per-day basis. The cell below uses the Pandas resample() method for this.
df2 = df.resample(rule='1D', on='date_modified') # resample to daily intervals
last_modified_counts = df2['item_id'].count()
# simplify date formatting
last_modified_counts.index = last_modified_counts.index.strftime('%m-%d')
# plot last modified dates as a histogram
fig, ax = plt.subplots(figsize=(15,6))
last_modified_counts.plot(kind='bar', ax=ax)
ax.set(xlabel = 'Dates',
title='Number of items modified in the last 20 days')
plt.xticks(rotation='horizontal');
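To see what the daily resample-and-count step does in isolation, here is a minimal sketch on synthetic timestamps (not the notebook's data):

```python
import pandas as pd

# Three timestamps spread over two distinct days, with a gap day in between
toy = pd.DataFrame({
    'item_id': ['a', 'b', 'c'],
    'date_modified': pd.to_datetime([
        '2020-10-01 08:00', '2020-10-01 17:30', '2020-10-03 09:15'
    ]),
})

# Bucket rows into 1-day bins keyed on the 'date_modified' column,
# then count how many items fall in each bin (empty bins count as 0)
daily = toy.resample(rule='1D', on='date_modified')['item_id'].count()
print(daily)
# 2020-10-01 -> 2, 2020-10-02 -> 0, 2020-10-03 -> 1
```

Note that resample fills in the empty day with a zero count, which is why gap days show up as empty bars in the plot above rather than being skipped.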
Make a word cloud out of the item titles
To make a word cloud, we use a library called wordcloud. As of this notebook, this library is not part of the default set of libraries available in the ArcGIS Notebook environment. However, you can easily install it as shown below:
!pip install wordcloud
Collecting wordcloud
  Downloading wordcloud-1.8.0-cp36-cp36m-manylinux1_x86_64.whl (365 kB)
Requirement already satisfied: matplotlib in /opt/conda/lib/python3.6/site-packages (from wordcloud) (3.1.3)
Requirement already satisfied: numpy>=1.6.1 in /opt/conda/lib/python3.6/site-packages (from wordcloud) (1.18.1)
Requirement already satisfied: pillow in /opt/conda/lib/python3.6/site-packages (from wordcloud) (6.2.2)
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.6/site-packages (from matplotlib->wordcloud) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.6/site-packages (from matplotlib->wordcloud) (1.2.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /opt/conda/lib/python3.6/site-packages (from matplotlib->wordcloud) (2.4.7)
Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/lib/python3.6/site-packages (from matplotlib->wordcloud) (2.8.1)
Requirement already satisfied: six in /opt/conda/lib/python3.6/site-packages (from cycler>=0.10->matplotlib->wordcloud) (1.15.0)
Installing collected packages: wordcloud
Successfully installed wordcloud-1.8.0
Next we collect title strings from all the items and join them into a long paragraph.
%%time
title_series = df['title'].dropna()
title_list = list(title_series)
title_paragraph = '. '.join(title_list)
title_paragraph
CPU times: user 0 ns, sys: 4.08 ms, total: 4.08 ms Wall time: 2.13 ms
'Coronavirus COVID-19 (2019-nCoV). Dashboard Coronavirus COVID-19 (Mobile). Covid19 Lietuva stats LINE portal MOB. Covid-19 flat curve LT. Florida COVID-19 Confirmed Cases. Covid19 Lietuva stats LINE portal. COVID-19 panel. COVID-19 ITALIA - Desktop. RKI COVID-19 Germany Mobil. RKI COVID-19 Germany LK. RKI COVID-19 Germany BL. JHU Centers for Civic Impact US COVID-19 Dashboard by County. COVID-19 ITALIA - Mobile. JHU Centers for Civic Impact US COVID-19 Dashboard by County (Mobile). Synthèse Patients Covid19 en France - LCI Vertical. MD COVID-19 Data Dashboard. MD COVID-19 Data Dashboard (Mobile). COVID-19 In Texas (Dashboard). Cases and Trends. mobile - 都道府県別新型コロナウイルス感染者数マップ Coronavirus COVID-19 Japan Case (2019-nCoV) . 都道府県別新型コロナウイルス感染者数マップ Coronavirus COVID-19 Japan Case (2019-nCoV)【INFRAME】. State of Minnesota Gender Dashboard Percentage MDH. State of Minnesota Age Dashboard Percentage MDH. State of Minnesota Race Dashboard Percentage MDH. State of Minnesota Exposure Dashboard Percentage MDH. Alabama COVID-19 Data and Surveillance Dashboard. Covid-19 Cases by Zip Code - Mobile. Synthèse Patients Covid-19 en France - Desktop. State of Minnesota Ethnicity Dashboard Percentage MDH. Casos Coronavirus en Uruguay (Web). Casos Coronavirus en Uruguay (Movil). COVID-19 Latvijā. FOHM Covid-19 (mobil). Evolução do Covid-19 em Portugal - Mobile. PR COVID-19 Desktop App. FOHM Covid-19 (desktop). התפשטות נגיף הקורונה. COVID-19 Algérie- Evolution de la situation_version mobile. COVID-19 -Algérie- Evolution de la situation. COVID19 PY (mobile). Covid19 Lietuva STATS. Riverside County COVID19 Cases per Census Designated Place (CDP). COVID-19 Surveillance Dashboard (Myanmar) Mobile View. Covid-19 Cases by Zip Code. Manitoba COVID-19 - Daily Statistics Chart. COVID-19 Dashboard - Harris County Public Health and Houston Health Department. התפשטות נגיף הקורונה Mobile Ynet Bright. Covid-19 testai LT. COVID-19 In Texas (Dashboard) Mobile Optimized. COVID-19. PROD_CASES BY COUNTY. 
Suivi du Covid-19 au Sénégal en temps réél. Austin Travis COVID-19 Public Impact Dashboard (Mobile). COVID19 PY. British Columbia COVID-19 Dashboard - Mobile. COVID-19 IN ALABAMA. Covid-19 Tilannekuva RELEASE 2.0 (MOBIILI). Austin Travis County COVID-19 Public Dashboard. Coronavirus (COVID-19) Guatemala. British Columbia COVID-19 Dashboard - Desktop. Novel Coronavirus (2019-nCoV) Surveillance in Myanmar. COVID-19 Canarias. Ghana Health Service COVID-19 Dashboard (Mobile Version). Dashboard InaCOVID-19 (desktop). Coronavirus in Nederland (desktop weergave). Covid-19 Tilannekuva RELEASE 2.0 (DESKTOP). Dashboard InaCOVID-19 (mobile). Louisiana COVID-19 Information - Mobile (relayout). Tirol_MOBIL_Dashboard BEZIRKE. COVID19 Malta Mobile. Koronavirus (COVID-19) - Slovenija. Koronavirus Hrvatska (mobile). COVID-19 Erie County, New York. CoSA/Bexar County COVID-19 Dashboard. State of Oregon Fires and Hotspots Dashboard Mobile. COVID-19 Dashboard - Harris County Public Health and Houston Health Department (mobile version). התפשטות נגיף הקורונה מפה בלבד. COVID-19 PC Surveillance Dashboard. Coronavirus l Dresden. COVID-19 in Texas - Texas Tests and Hospitals (Dashboard). Plaza Publica WEB. COVID-19 Dashboard (Mobile Public). Plaza Publica Movil. Public MOH Dashboard Arabic. Ausbreitung des Coronavirus im Kreis Paderborn. COVID-19 CV PC v6 SDE. Coronafaelle2020_KreisGT. Lancaster County NE COVID-19 Dashboard. COVID-19 Canarias (móvil). COVID-19. DuPage County COVID-19 - Municipal Level (Public Version). FOHM Covid-19 (tablet). COVID-19 mapy light. COVID-19 CV MV v6 SDE. Manitoba COVID-19 Dashboard (URL filter). COVID Zip Code Dashboard. Douglas County NE COVID-19 Dashboard (retired 20200929). Tulsa County Public Health COVID-19 Cases. SSI COVID - 19 Dashboard Dansk. SSI COVID - 19 Dashboard Dansk - Mobil'
from wordcloud import WordCloud
wc = WordCloud(width=1000, height=600, background_color='white')
wc_img = wc.generate_from_text(title_paragraph)
plt.figure(figsize=(20,10))
plt.imshow(wc_img, interpolation="bilinear")
plt.axis('off')
plt.title('What are the top 100 ArcGIS Dashboard items about?');
Not surprisingly, most items are about the novel coronavirus. The word 'Dashboard' also appears quite frequently.
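If you just need the dominant terms without a graphic, a plain frequency count with the Python standard library gives a quick numeric view. A minimal sketch on a made-up sample of titles:

```python
from collections import Counter
import re

# A small made-up sample of dashboard titles, for illustration only
titles = [
    'Coronavirus COVID-19 (2019-nCoV)',
    'MD COVID-19 Data Dashboard',
    'MD COVID-19 Data Dashboard (Mobile)',
]

# Tokenize on word characters and lowercase before counting
words = re.findall(r'\w+', ' '.join(titles).lower())
counts = Counter(words)
print(counts.most_common(3))
```

In the notebook, `df['title'].dropna()` could be fed into the same Counter to cross-check what the word cloud shows visually.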
Write the table to a CSV in your 'files' location
We create the folder defined earlier in the configuration section of this notebook to store a CSV file containing the items table.
# create a folder for these files if it does not exist
if not os.path.exists(out_folder):
os.makedirs(out_folder)
print(f'Created output folder at: {out_folder}')
else:
print(f'Using existing output folder at: {out_folder}')
Using existing output folder at: /arcgis/home/dashboard_counts
# append timestamp to filename to make it unique
output_filename = f"top_dash_items_{start_time.strftime('%m-%d-%y')}_to_{end_time.strftime('%m-%d-%y')}.csv"
# write table to csv
df.to_csv(os.path.join(out_folder, output_filename))
print('Output csv created at : ' + os.path.join(out_folder, output_filename))
Output csv created at : /arcgis/home/dashboard_counts/top_dash_items_09-25-20_to_10-15-20.csv
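As an optional verification step (not part of the original notebook), the CSV can be read back with read_csv to confirm the table round-trips. Sketched here with a temporary directory and a tiny made-up table instead of the notebook's output folder:

```python
import os
import tempfile

import pandas as pd

# A tiny made-up table standing in for the notebook's results DataFrame
df = pd.DataFrame({'item_id': ['abc', 'def'], 'num_views': [10, 20]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'top_dash_items.csv')
    # index=False skips writing the row index as an extra column
    df.to_csv(out_path, index=False)
    round_trip = pd.read_csv(out_path)
    assert round_trip.equals(df)
print('round trip ok')
```

Passing `index=False` is a design choice here: it keeps the CSV clean for ingestion by other systems, at the cost of losing the DataFrame's row index (which in this notebook is just a positional counter anyway).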
Conclusion
This notebook demonstrated how to use the ArcGIS API for Python to construct a search query and search for items in your org (or outside it). It also demonstrated how to work with time zones and datetime objects, and how to explore the metadata of the items collected. The notebook concludes by writing the table to a CSV file on disk. If this kind of workflow needs to be repeated, you can easily do so by scheduling your notebook to run at set intervals.