Find the top 'n' items in your org
Administrators often need to search for and list items in their organization that match certain criteria, whether for administrative or auditing purposes. One example is finding items modified within the last 'n' days and sorting them by popularity or number of views. This notebook works through such a use case: finding the top 100
public ArcGIS Dashboard items sorted by number of views, and writing the results to a CSV file that can be used for reporting or ingested into another system.
The configuration needed for this notebook is in the top few cells. While this exact use case may not match yours, you can easily modify the configuration cells and adapt the notebook to suit your reporting needs.
Import arcgis and other libraries¶
from arcgis.gis import GIS
from datetime import datetime, timedelta, timezone
from dateutil import tz
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
from IPython.display import display
import os
gis = GIS("home")
Set up search parameters¶
# set up time zone for searching - 'PDT' in this example
la_tz = tz.gettz('America/Los_Angeles')
# set up a time filter - last 20 days in this example
end_time = datetime.now(tz=la_tz)
start_time = end_time - timedelta(days=20)
# sort order
search_sort_order = 'desc'
# search outside org?
search_outside_org = True
# number of items to search for
search_items_max = 100
# search item type
search_item_type = "Dashboard"
# output location
out_folder = '/arcgis/home/dashboard_counts'
ArcGIS stores the created and modified times for items as Unix epoch millisecond timestamps in the UTC time zone. The next cell converts the start and end times to UTC and then to epoch values. We multiply by 1000 to convert seconds to milliseconds.
# cast to int since timestamp() returns a float and the search query expects integer milliseconds
end_time_epoch = int(end_time.astimezone(tz.UTC).timestamp()*1000)
start_time_epoch = int(start_time.astimezone(tz.UTC).timestamp()*1000)
# print settings
print(f'Time zone used: {end_time.tzname()}')
print(f'start time: {start_time} | as epoch: {start_time_epoch}')
print(f'end time: {end_time} | as epoch: {end_time_epoch}')
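As a quick sanity check, the conversion can be reversed: dividing the epoch milliseconds by 1000 and passing the result to `datetime.fromtimestamp` with a UTC time zone should reproduce the original instant. A minimal sketch using a fixed example date (not the notebook's `start_time`/`end_time`):

```python
from datetime import datetime, timezone

# pick a known instant in UTC
original = datetime(2023, 5, 1, 12, 30, 0, tzinfo=timezone.utc)

# forward conversion: datetime -> epoch milliseconds, as done above
epoch_ms = original.timestamp() * 1000

# reverse conversion: epoch milliseconds -> datetime in UTC
round_trip = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)

print(epoch_ms)      # 1682944200000.0
assert round_trip == original
```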
Search for ArcGIS Dashboard items¶
Next, we will construct a search query using the parameters defined above and query the org. To learn about the different parameters you can query for, see the search reference. You can combine this reference with the properties of Items found here to construct complex queries.
Since our org does not have over 100 Dashboard items, for the purpose of illustration, we search across all of ArcGIS Online.
query_string = f'modified: [{start_time_epoch} TO {end_time_epoch}]'
# search 100 most popular ArcGIS Dashboard items across all of ArcGIS Online
search_result = gis.content.search(query=query_string, item_type=search_item_type,
sort_field='numViews', sort_order=search_sort_order,
max_items=search_items_max, outside_org=search_outside_org)
len(search_result)
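Query clauses can be combined with boolean operators to narrow a search further, for example restricting by owner in addition to the modified-date range. The sketch below only composes the query string; the owner name and timestamps are hypothetical values for illustration, not taken from the org above:

```python
# hypothetical values for illustration only
owner_name = 'city_of_example'   # assumed username
start_ms, end_ms = 1650000000000, 1651700000000

# combine a date-range clause with an owner clause using AND
query_string = f'modified:[{start_ms} TO {end_ms}] AND owner:{owner_name}'
print(query_string)
# modified:[1650000000000 TO 1651700000000] AND owner:city_of_example
```

A string composed this way can be passed as the `query` argument to `gis.content.search()` just like the simpler date-range query above.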
Compose a table from search results¶
Our next step is to compose a Pandas DataFrame object from the search result. For this, we will compose a list of dictionary objects from the search results and choose important item properties such as item ID, title, URL, created time, view counts etc.
%%time
result_list = []
for current_item in search_result:
result_dict = {}
result_dict['item_id'] = current_item.id
result_dict['num_views'] = current_item.numViews
result_dict['title'] = current_item.title
    # process modified date
date_modified = datetime.fromtimestamp(current_item.modified/1000, tz=tz.UTC)
result_dict['date_modified'] = date_modified
result_dict['url'] = current_item.homepage
# append to list
result_list.append(result_dict)
df = pd.DataFrame(data=result_list)
Print the table's top 5 and bottom 5 rows
df.head() # top 5
df.tail() # bottom 5
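Since the search was sorted by `numViews` in descending order, the `num_views` column should be non-increasing from top to bottom. A minimal stand-alone check on toy values (the real check would run on `df['num_views']`):

```python
# toy view counts standing in for df['num_views'] from the search above
views = [500, 300, 300, 10]

# a 'desc' sort means each value is >= the next one (ties allowed)
is_descending = all(a >= b for a, b in zip(views, views[1:]))
print(is_descending)  # True
```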
Exploratory analysis on the top 'n' items¶
Now that we have collected our data, let us explore it. First, we create a histogram of the number of views to look at the distribution.
fig, ax = plt.subplots(figsize=(10,6))
(df['num_views']/1000000).hist(bins=50)
ax.set_title(f'Histogram of view counts of top {search_items_max} ArcGIS {search_item_type} items')
ax.set_xlabel('Number of views in millions');
Most items in the top 100 list have less than one million views. We have a few outliers that have over a billion and one that is nearing a trillion views. We can find what those items are simply by displaying the top few Item objects.
for current_item in search_result[:4]:
display(current_item)
Next, let us visualize the last modified date as a histogram. The date_modified column is read as a DateTime object with minute- and second-level data. We will resample this column and aggregate on a per-day basis. The cell below uses the Pandas resample() method for this.
df2 = df.resample(rule='1D', on='date_modified') # resample to daily intervals
last_modified_counts = df2['item_id'].count()
# simplify date formatting
last_modified_counts.index = last_modified_counts.index.strftime('%m-%d')
# plot last modified dates as a histogram
fig, ax = plt.subplots(figsize=(15,6))
last_modified_counts.plot(kind='bar', ax=ax)
ax.set(xlabel = 'Dates',
title='Number of items modified in the last 20 days')
plt.xticks(rotation='horizontal');
Make a word cloud out of the item titles¶
To make a word cloud, we use a library called wordcloud. As of this notebook, this library is not part of the default set of libraries available in the ArcGIS Notebook environment. However, you can easily install it as shown below:
!pip install wordcloud
Next we collect title strings from all the items and join them into a long paragraph.
%%time
title_series = df['title'].dropna()
title_list = list(title_series)
title_paragraph = '. '.join(title_list)
title_paragraph
from wordcloud import WordCloud
wc = WordCloud(width=1000, height=600, background_color='white')
wc_img = wc.generate_from_text(title_paragraph)
plt.figure(figsize=(20,10))
plt.imshow(wc_img, interpolation="bilinear")
plt.axis('off')
plt.title('What are the top 100 ArcGIS Dashboard items about?');
Not surprisingly, most items are about the novel coronavirus. The word 'Dashboard' also appears quite frequently.
Write the table to a CSV in your 'files' location¶
We create the folder defined earlier in the configuration section of this notebook to store a CSV file containing the items table.
# create a folder for these files if it does not exist
if not os.path.exists(out_folder):
os.makedirs(out_folder)
print(f'Created output folder at: {out_folder}')
else:
print(f'Using existing output folder at: {out_folder}')
# append timestamp to filename to make it unique
output_filename = f"top_dash_items_{start_time.strftime('%m-%d-%y')}_to_{end_time.strftime('%m-%d-%y')}.csv"
# write table to csv
df.to_csv(os.path.join(out_folder, output_filename))
print('Output csv created at: ' + os.path.join(out_folder, output_filename))
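To verify an export like this, the file can be read back and compared against the rows that were written. The stdlib sketch below round-trips toy rows through an in-memory buffer instead of a real file; the actual notebook would simply call pd.read_csv on the path printed above:

```python
import csv
import io

# toy rows standing in for the items table
rows = [{'item_id': 'abc123', 'num_views': '42'},
        {'item_id': 'def456', 'num_views': '7'}]

# write the rows as CSV into an in-memory buffer
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['item_id', 'num_views'])
writer.writeheader()
writer.writerows(rows)

# read the CSV back and confirm nothing was lost
buf.seek(0)
read_back = list(csv.DictReader(buf))
assert read_back == rows
```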
Conclusion¶
This notebook demonstrated how to use the ArcGIS API for Python to construct a search query and search for items in your org (or outside it). It also demonstrated how to work with time zones and datetime objects, and how to explore the metadata of the items collected. The notebook concludes by writing the table as a CSV on disk. If this kind of workflow needs to be repeated, you can easily do so by scheduling your notebook to run at set intervals.