Inventory Organizational Content
Retrieving, displaying, analyzing, and exporting the content within an organization's portal are important tasks for any administrator. Here we will leverage the ContentManager and UserManager classes of the GIS module, along with some functionality from the pandas library, to accomplish those tasks.
Import Libraries
import pandas as pd
from IPython.display import display
import arcgis
from arcgis.gis import GIS
Connect to ArcGIS Online
profile_name = "my_dev_profile"
gis = GIS(profile=profile_name)
gis.users.me
Querying Content
To search for content within our organization, we can access the ContentManager class via the gis.content property.
Using the advanced_search() method, we can query content belonging to a user by providing the string "owner: <username>". By setting the return_count parameter of advanced_search() to True, we can return a single integer representing the number of items that user owns.
Let's return the number of items that belong to the user currently logged in:
qe = f"owner: {gis.users.me.username}"
my_content_count = gis.content.advanced_search(query=qe,return_count=True)
print(my_content_count, 'items found for current user')
Searching for Content
If we leave the return_count parameter at its default value of False, we receive a response dictionary containing metadata about the query as well as a list of the returned items in the results field.
By setting the max_items parameter, we can limit the number of items returned in the results field.
max_items = 3
user_content = gis.content.advanced_search(query=qe, max_items=max_items)
user_content
Displaying Content
# Display the result items through IPython.display.display()
for item in user_content['results']:
    display(item)
It is also possible to have these items returned as dictionary objects by setting the as_dict parameter to True:
# Return items as dictionaries with as_dict=True
user_content_as_dict = gis.content.advanced_search(
    query=qe, max_items=max_items, as_dict=True)
user_content_as_dict['results']
Sorting Content
The sort_field and sort_order parameters of the advanced_search() method can be used to sort the returned content server side.
Possible values for sort_order are "asc" for ascending order and "desc" for descending order. The default values for the sort_field and sort_order parameters are "title" and "asc", respectively.
In this next example we'll search for the 3 items that the current user modified most recently by setting sort_field="modified" and sort_order="desc":
content_last_modified = gis.content.advanced_search(
    query=qe, max_items=max_items, sort_field="modified", sort_order="desc")
for item in content_last_modified['results']:
    display(item)
Here we return the first 3 items that the user created by setting sort_field="created" and sort_order="asc":
content_first_created = gis.content.advanced_search(
    query=qe, max_items=max_items, sort_field="created", sort_order="asc")
for item in content_first_created['results']:
    display(item)
Querying Organization Content
Searching for Organization Members
We can search for the members of the organization by using the UserManager class within the GIS module. Here we access the UserManager through the gis.users property and use its search() method to return a list of organization members. When no parameters are provided, search() matches every user in the organization, although the number returned is capped by its max_users parameter (100 by default):
# View UserManager object
gis.users
org_users = gis.users.search()
print(f'{len(org_users)} users found')
org_users[:3]
# Display an arbitrary member
org_member = org_users[1]
org_member
Getting Member Content
As above, we can set return_count=True to see how many items this user has:
# See the number of member items
qe = "owner: " + org_member.username
member_content_count = gis.content.advanced_search(
query=qe, max_items=-1, return_count=True)
print(f"Org member has {member_content_count} items")
# Return <max_items> items from member
max_items = 3
member_content = gis.content.advanced_search(query=qe, max_items=max_items)
member_content['results']
Compiling Organization Content
If we return all items for each user in the organization, we can compile those items into a single list representing all of the organization's content.
We can remove the item limit for each query by setting max_items=-1 in the advanced_search() method:
# Return content for each user in the org and compile it into a single list
org_content = []
for user in org_users:
    qe = f"owner: {user.username}"
    user_content = gis.content.advanced_search(query=qe, max_items=-1)['results']
    org_content += user_content
print(f"{len(org_content)} items found in org")
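The compile loop above can be tried without a portal connection. The following is only a sketch: compile_org_content, fake_search, and FAKE_ITEMS are hypothetical names, and fake_search is a stand-in that mimics just the results field of the response dictionary described earlier.

```python
from typing import Callable, Dict, List

def compile_org_content(usernames: List[str],
                        search: Callable[[str], Dict]) -> List[dict]:
    """Run one owner query per username and flatten the results into one list."""
    org_content = []
    for username in usernames:
        response = search(f"owner: {username}")
        org_content.extend(response["results"])
    return org_content

# Hypothetical sample data standing in for a real organization
FAKE_ITEMS = {
    "alice": [{"id": "a1", "owner": "alice"}, {"id": "a2", "owner": "alice"}],
    "bob": [{"id": "b1", "owner": "bob"}],
}

def fake_search(query: str) -> Dict:
    """Offline stand-in for a content search; it only understands 'owner: <name>'."""
    owner = query.split(":", 1)[1].strip()
    return {"results": FAKE_ITEMS.get(owner, [])}

# 'carol' owns nothing, so she contributes an empty list
items = compile_org_content(["alice", "bob", "carol"], fake_search)
print(len(items), "items found")
```

With a live connection, the search argument could be a small wrapper such as lambda q: gis.content.advanced_search(query=q, max_items=-1).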
Analyzing Organization Content with Pandas
Let's put our compiled list into a pandas DataFrame to easily view and filter our data:
# Create DataFrame
content_df = pd.DataFrame(org_content)
content_df.head()
We can use the pandas value_counts() method to see how many times each value occurs in a particular column. Here we return the 10 most frequently occurring item types and the number of items of each:
# use value_counts() to see how many items you have with a particular key:value pair
content_df.type.value_counts().head(10)
Here is another value_counts() example, showing the distribution of access levels across the items in the organization:
content_df.access.value_counts()
Using the value_counts() method in conjunction with the groupby() operation allows for an additional level of analysis. Here we see the breakdown of item types that each user owns:
content_df.groupby('owner').type.value_counts().head(10)
# View the number of items at each access level, per item type
content_df.groupby('type').access.value_counts().head(10)
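A grouped value_counts() result is a Series with a two-level index, which can be hard to scan. As a rough sketch on made-up data (the owners and item types below are hypothetical), unstack() pivots the inner level into columns so each owner gets one row:

```python
import pandas as pd

# Hypothetical sample standing in for the organization's content_df
content_df = pd.DataFrame({
    "owner": ["alice", "alice", "alice", "bob", "bob"],
    "type": ["Web Map", "Web Map", "Feature Service", "Web Map", "Dashboard"],
    "access": ["public", "private", "org", "public", "private"],
})

# Count item types per owner; the result is indexed by (owner, type)
counts = content_df.groupby("owner")["type"].value_counts()

# Pivot the type level into columns, filling missing combinations with 0
table = counts.unstack(fill_value=0)
print(table)
```

The same pattern applies to the type and access grouping: one row per item type, one column per access level.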
Filtering the Dataset
We can choose which columns we'd like to view, and the order we'd like to view them in, by providing the DataFrame with a list of strings matching column names:
view_columns = ['id','title','owner','type','access']
content_df[view_columns].head()
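The item records also carry the created and modified fields used for sorting earlier. The sketch below assumes those columns hold epoch timestamps in milliseconds, and uses made-up values; created_date is a hypothetical column name added for readability.

```python
import pandas as pd

# Hypothetical sample; 'created' values are assumed to be epoch milliseconds
content_df = pd.DataFrame({
    "title": ["Roads", "Parcels"],
    "created": [1609459200000, 1612137600000],
})

# Convert the raw integers into pandas timestamps
content_df["created_date"] = pd.to_datetime(content_df["created"], unit="ms")
print(content_df[["title", "created_date"]])
```

Once converted, the column can be compared against dates directly, for example to build a mask for items created before a given day.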
Creating and applying Boolean masks is a very efficient way to filter the rows of a DataFrame. By using standard comparison operators such as <, >, ==, and != on pandas Series objects (e.g. the columns of our DataFrame), we can create a new Series of True and False values, called a mask. When this mask is applied to the original DataFrame, a new DataFrame is returned containing only the rows where the mask is True.
Let's create a mask to represent all items with public level access:
filter_value = 'public'
filter_column = 'access'
row_filter = content_df[filter_column]==filter_value
row_filter.head()
Applying this mask to our DataFrame, we return all fields for the items which have access=='public':
print(len(content_df[row_filter]), 'objects in filtered DataFrame')
content_df[row_filter].head()
We can apply both the column filter and Boolean mask at the same time to reduce the amount of information displayed:
content_df[row_filter][view_columns].head()
Here is another example, where we create a Boolean mask for all items of type "Web Map":
filter_value = 'Web Map'
filter_column = 'type'
row_filter = content_df[filter_column]==filter_value
content_df[row_filter][view_columns]
Boolean masks can also be combined to represent multiple filters. Here we combine the Web Map and Public masks to return all items in our organization which are public web maps:
# Combining masks
web_map_filter = content_df.type=='Web Map'
public_filter = content_df.access=='public'
combined_mask = web_map_filter & public_filter
content_df[combined_mask][view_columns]
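Masks can be combined with | (or) and negated with ~ (not) in the same way as with &. A minimal sketch on made-up data (the titles and values are hypothetical):

```python
import pandas as pd

# Hypothetical sample standing in for the organization's content_df
content_df = pd.DataFrame({
    "title": ["Roads", "Parcels", "Sales", "Trails"],
    "type": ["Web Map", "Feature Service", "Dashboard", "Web Map"],
    "access": ["public", "public", "private", "org"],
})

is_web_map = content_df["type"] == "Web Map"
is_public = content_df["access"] == "public"

# Items that are web maps OR public
either = content_df[is_web_map | is_public]

# Web maps that are NOT public
non_public_maps = content_df[is_web_map & ~is_public]

# When comparisons are written inline, each one needs its own parentheses,
# because & and | bind more tightly than == and !=
inline = content_df[(content_df["type"] == "Web Map") & (content_df["access"] != "public")]

print(either["title"].tolist())
print(non_public_maps["title"].tolist())
```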
The apply() method can also be used to generate masks that can't be created using the standard comparison operators. As long as the function passed to apply() returns a Boolean, the result can be used as a mask to filter rows. Here we use a lambda function to return all items whose type ends with the word "Service":
# Creating masks with .apply and lambda functions
service_filter = content_df.type.apply(lambda x: x.endswith('Service'))
content_df[service_filter][view_columns]
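For string tests like this one, the pandas .str accessor offers a vectorized alternative to apply(). The sketch below uses a made-up Series and assumes some type values could be missing; in that case the lambda version would raise an AttributeError on the missing value, while na=False simply treats it as a non-match.

```python
import pandas as pd

# Hypothetical type column with one missing value
types = pd.Series(["Feature Service", "Web Map", None, "Map Service"])

# Vectorized suffix test; missing values count as False
service_filter = types.str.endswith("Service", na=False)
print(service_filter.tolist())
```

The resulting mask can be applied to a DataFrame exactly like the one built with apply().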
Accessing Content by ID
Once we've identified an item of interest in our DataFrame, we can retrieve that item by providing its ID to the ContentManager get() method. If we know the index label of the row in the DataFrame (i.e. the leftmost value), then we can access that row's information using the loc indexer. From there we can get the id of the item and provide it to the get() method.
# Return the index label of the last item in the previous output
# The name attribute of a row returned by iloc holds that row's index label
target_index = content_df[service_filter].iloc[-1].name
print("Target index:", target_index)
# Accessing items with content.get()
target_data = content_df.loc[target_index]
print(target_data.id)
target_content = gis.content.get(target_data.id)
target_content
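The difference between position and index label matters once a DataFrame has been filtered, because the filtered rows keep their original labels. A small sketch with made-up item ids (all values below are hypothetical):

```python
import pandas as pd

# Hypothetical sample standing in for the organization's content_df
content_df = pd.DataFrame({
    "id": ["aaa111", "bbb222", "ccc333", "ddd444"],
    "type": ["Web Map", "Feature Service", "Web Map", "Map Service"],
})

services = content_df[content_df["type"].str.endswith("Service")]

# iloc selects by position within the filtered frame
last_row = services.iloc[-1]

# The row's name attribute is its label in the original index
target_index = last_row.name

# loc selects by label, so it works against the unfiltered frame
item_id = content_df.loc[target_index, "id"]
print(target_index, item_id)
```

Here the first filtered row sits at position 0 of services but carries the label 1 from content_df, which is why the label, not the position, is what gets passed to loc.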
For more information on using item ids, see this community post.
Exporting Data
Pandas provides a convenient to_csv() method which can be used to generate both compressed and uncompressed CSV outputs. Simply provide your target path with the appropriate file extension and call the method on the DataFrame object you would like to export; the compression format is inferred from the extension.
# Exporting data to a csv
target_path = "org_content.csv"
content_df.to_csv(target_path)
# Exporting data to gzipped csv file
target_path_gzip = "org_content.csv.gz"
content_df.to_csv(target_path_gzip)
# Exporting data to zipped csv file
target_path_zip = "org_content.csv.zip"
content_df.to_csv(target_path_zip)
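By default to_csv() also writes the DataFrame index as an unnamed first column. As a sketch on made-up data, passing index=False omits it, and reading the file back with read_csv() is a quick way to check the export; the temporary directory is used here only to keep the example self-contained.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample standing in for the organization's content_df
content_df = pd.DataFrame({
    "id": ["aaa111", "bbb222"],
    "title": ["Roads", "Parcels"],
    "access": ["public", "org"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    target_path = os.path.join(tmp_dir, "org_content.csv.gz")
    # index=False leaves the row index out of the file
    content_df.to_csv(target_path, index=False)
    # read_csv infers gzip compression from the .gz extension
    restored = pd.read_csv(target_path)

print(restored)
```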
Pandas also provides additional methods for exporting the data in other file formats (e.g. to_json(), to_pickle(), to_excel()) which behave similarly.