Find the top 'n' items in your org
Administrators often need to search for and list items in their organization that match certain criteria, whether for administrative or auditing purposes. One example is finding items modified within the last 'n' days and sorting them by popularity or number of views. This notebook works through such a use case: finding the top 100
public ArcGIS Dashboard items sorted by number of views, and writing the results to a CSV file that can be used for reporting or ingested into another system.
The configuration needed for this notebook is in the top few cells. While this exact use case may not match yours, you can easily modify the configuration cells and adapt the notebook to suit your reporting needs.
Import arcgis and other libraries¶
from arcgis.gis import GIS
from datetime import datetime, timedelta, timezone
from dateutil import tz
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
from IPython.display import display
import os
gis = GIS("home")
Set up search parameters¶
# set up time zone for searching - 'PDT' in this example
la_tz = tz.gettz('America/Los_Angeles')
# set up a time filter - last 20 days in this example
end_time = datetime.now(tz=la_tz)
start_time = end_time - timedelta(days=20)
# sort order
search_sort_order = 'desc'
# search outside org?
search_outside_org = True
# number of items to search for
search_items_max = 100
# search item type
search_item_type = "Dashboard"
# output location
out_folder = '/arcgis/home/dashboard_counts'
ArcGIS stores the created and modified times for items as Unix epoch millisecond timestamps in the UTC time zone. The next cell converts the start and end times to UTC and then to epoch values. We multiply by 1000 to convert seconds to milliseconds.
# cast to int since timestamp() returns a float and the search query expects integer milliseconds
end_time_epoch = int(end_time.astimezone(tz.UTC).timestamp()*1000)
start_time_epoch = int(start_time.astimezone(tz.UTC).timestamp()*1000)
# print settings
print(f'Time zone used: {end_time.tzname()}')
print(f'start time: {start_time} | as epoch: {start_time_epoch}')
print(f'end time: {end_time} | as epoch: {end_time_epoch}')
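As a quick sanity check, the conversion can be reversed: dividing the epoch milliseconds by 1000 and passing the result to `datetime.fromtimestamp` with a UTC time zone should reproduce the original instant. A minimal sketch using a fixed example date (not the notebook's `start_time`/`end_time`):

```python
from datetime import datetime, timezone

# pick a known instant in UTC
original = datetime(2023, 5, 1, 12, 30, 0, tzinfo=timezone.utc)

# forward conversion: datetime -> epoch milliseconds, as done above
epoch_ms = original.timestamp() * 1000

# reverse conversion: epoch milliseconds -> datetime in UTC
round_trip = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)

print(epoch_ms)      # 1682944200000.0
assert round_trip == original
```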
Search for ArcGIS Dashboard items¶
Next, we will construct a search query using the parameters defined above and query the org. To learn about the different parameters you can query for, see the search reference. You can combine this reference with the properties of Items found here to construct complex queries.
Since our org does not have over 100 Dashboard items, for the purpose of illustration, we search across all of ArcGIS Online.
query_string = f'modified: [{start_time_epoch} TO {end_time_epoch}]'
# search 100 most popular ArcGIS Dashboard items across all of ArcGIS Online
search_result = gis.content.search(query=query_string, item_type=search_item_type,
sort_field='numViews', sort_order=search_sort_order,
max_items=search_items_max, outside_org=search_outside_org)
len(search_result)
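Query clauses can be combined with boolean operators to narrow a search further, for example restricting by owner in addition to the modified-date range. The sketch below only composes the query string; the owner name and timestamps are hypothetical values for illustration, not taken from the org above:

```python
# hypothetical values for illustration only
owner_name = 'city_of_example'   # assumed username
start_ms, end_ms = 1650000000000, 1651700000000

# combine a date-range clause with an owner clause using AND
query_string = f'modified:[{start_ms} TO {end_ms}] AND owner:{owner_name}'
print(query_string)
# modified:[1650000000000 TO 1651700000000] AND owner:city_of_example
```

A string composed this way can be passed as the `query` argument to `gis.content.search()` just like the simpler date-range query above.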
Compose a table from search results¶
Our next step is to compose a Pandas DataFrame object from the search result. For this, we will compose a list of dictionary objects from the search results and choose important item properties such as item ID, title, URL, created time, view counts etc.
%%time
result_list = []
for current_item in search_result:
result_dict = {}
result_dict['item_id'] = current_item.id
result_dict['num_views'] = current_item.numViews
result_dict['title'] = current_item.title
    # process modified date
date_modified = datetime.fromtimestamp(current_item.modified/1000, tz=tz.UTC)
result_dict['date_modified'] = date_modified
result_dict['url'] = current_item.homepage
# append to list
result_list.append(result_dict)
df = pd.DataFrame(data=result_list)
Print the table's top 5 and bottom 5 rows
df.head() # top 5
df.tail() # bottom 5
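Since the search was sorted by `numViews` in descending order, the `num_views` column should be non-increasing from top to bottom. A minimal stand-alone check on toy values (the real check would run on `df['num_views']`):

```python
# toy view counts standing in for df['num_views'] from the search above
views = [500, 300, 300, 10]

# a 'desc' sort means each value is >= the next one (ties allowed)
is_descending = all(a >= b for a, b in zip(views, views[1:]))
print(is_descending)  # True
```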
Exploratory analysis on the top 'n' items¶
Now that we have collected our data, let us explore it. First, we create a histogram of the number of views to look at the distribution.
fig, ax = plt.subplots(figsize=(10,6))
(df['num_views']/1000000).hist(bins=50)
ax.set_title(f'Histogram of view counts of top {search_items_max} ArcGIS {search_item_type} items')
ax.set_xlabel('Number of views in millions');
Most items in the top 100 list have less than one million views. We have a few outliers that have over a billion and one that is nearing a trillion views. We can find what those items are simply by displaying the top few Item objects.
for current_item in search_result[:4]:
display(current_item)
Next, let us visualize the last modified date as a histogram. The date_modified column is read as a DateTime object with minute- and second-level data. We will resample this column and aggregate on a per-day basis. The cell below uses the Pandas resample() method for this.
df2 = df.resample(rule='1D', on='date_modified') # resample to daily intervals
last_modified_counts = df2['item_id'].count()
# simplify date formatting
last_modified_counts.index = last_modified_counts.index.strftime('%m-%d')
# plot last modified dates as a histogram
fig, ax = plt.subplots(figsize=(15,6))
last_modified_counts.plot(kind='bar', ax=ax)
ax.set(xlabel = 'Dates',
title='Number of items modified in the last 20 days')
plt.xticks(rotation='horizontal');
Make a word cloud out of the item titles¶
To make a word cloud, we use a library called wordcloud. As of this notebook, this library is not part of the default set of libraries available in the ArcGIS Notebook environment. However, you can easily install it as shown below:
!pip install wordcloud
Next we collect title strings from all the items and join them into a long paragraph.
%%time
title_series = df['title'].dropna()
title_list = list(title_series)
title_paragraph = '. '.join(title_list)
title_paragraph
from wordcloud import WordCloud
wc = WordCloud(width=1000, height=600, background_color='white')
wc_img = wc.generate_from_text(title_paragraph)
plt.figure(figsize=(20,10))
plt.imshow(wc_img, interpolation="bilinear")
plt.axis('off')
plt.title('What are the top 100 ArcGIS Dashboard items about?');
Not surprisingly, most items are about the novel coronavirus. The word 'Dashboard' also appears quite frequently.
Write the table to a CSV in your 'files' location¶
We create the folder defined earlier in the configuration section of this notebook to store a CSV file containing the items table.
# create a folder for these files if it does not exist
if not os.path.exists(out_folder):
os.makedirs(out_folder)
print(f'Created output folder at: {out_folder}')
else:
print(f'Using existing output folder at: {out_folder}')
# append timestamp to filename to make it unique
output_filename = f"top_dash_items_{start_time.strftime('%m-%d-%y')}_to_{end_time.strftime('%m-%d-%y')}.csv"
# write table to csv
df.to_csv(os.path.join(out_folder, output_filename))
print('Output csv created at: ' + os.path.join(out_folder, output_filename))
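To verify an export like this, the file can be read back and compared against the rows that were written. The stdlib sketch below round-trips toy rows through an in-memory buffer instead of a real file; the actual notebook would simply call pd.read_csv on the path printed above:

```python
import csv
import io

# toy rows standing in for the items table
rows = [{'item_id': 'abc123', 'num_views': '42'},
        {'item_id': 'def456', 'num_views': '7'}]

# write the rows as CSV into an in-memory buffer
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['item_id', 'num_views'])
writer.writeheader()
writer.writerows(rows)

# read the CSV back and confirm nothing was lost
buf.seek(0)
read_back = list(csv.DictReader(buf))
assert read_back == rows
```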
Conclusion¶
This notebook demonstrated how to use the ArcGIS API for Python to construct a search query and search for items in your org (or outside it). It also demonstrated how to work with time zones and datetime objects, and how to explore the metadata of the items collected. The notebook concludes by writing the table as a CSV on disk. If this kind of workflow needs to be repeated, you can easily do so by scheduling your notebook to run at set intervals.