Updating features in a feature layer
As content publishers, you may be required to keep certain web layers up to date. As new data arrives, you may have to append new features, update existing features etc. There are a couple of different options to accomplish this:
- Method 1: editing individual features as updated datasets are available
- Method 2: overwriting feature layers altogether with updated datasets
Depending on the number of features being updated and your workflow requirements, you may adopt either or both kinds of update mechanisms.
In this sample, we explore the first method:
Method 1
- Updating feature layer by editing individual features
For Method 2, refer to the sample titled Overwriting feature layers.
Note: To run this sample, you need the pandas library in your conda environment. If you don't have the library, install it by running the following command from cmd.exe or your shell:
conda install pandas
# Connect to the GIS
from arcgis.gis import GIS
from arcgis import features
import pandas as pd
#Access the portal using "amazing_arcgis_123" as password for the given Username.
gis = GIS("https://pythonapi.playground.esri.com/portal", "arcgis_python")
Updating feature layer by editing individual features
Let us consider a scenario where we need to update a feature layer containing the capital cities of the US. We have 3 csv datasets simulating an update workflow as described below:
- capitals_1.csv -- contains the initial, incomplete dataset
- capitals_2.csv -- contains additional points and updates to existing points, building on top of capitals_1.csv
- capitals_annex.csv -- an alternate table containing additional attribute information
Our goal is to update the feature layer with each of these datasets by performing the necessary edit operations.
Publish the cities feature layer using the initial dataset
# read the initial csv
csv1 = 'data/updating_gis_content/capitals_1.csv'
cities_df_1 = pd.read_csv(csv1)
cities_df_1.head()
# print the number of records in this csv
cities_df_1.shape
As you can see, this dataset only contains 19 rows or 19 capital cities. It is not the complete dataset.
Let's add this csv as a portal item. Adding the item creates a CSV item and uploads the original file to the portal, establishing a link between the item and the original file name. Therefore, we need a unique name for the file to guarantee it does not collide with any file of the same name that may have been uploaded by the same user. We'll use standard library modules to copy the file and give it a new name so we can add it to the portal.
import os
import datetime as dt
import shutil
# assign variables to locations on the file system
cwd = os.path.abspath(os.getcwd())
data_pth = os.path.join(cwd, r'data/updating_gis_content/')
# create a unique timestamp string to append to the file name
now_ts = str(int(dt.datetime.now().timestamp()))
# copy the file, appending the unique string and assign it to a variable
my_csv = shutil.copyfile(os.path.abspath(csv1),
                         os.path.join(data_pth, 'capitals_1_' + now_ts + '.csv'))
my_csv
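If you repeat this upload step often, you can wrap the copy-and-rename logic above in a small reusable function. The following is a sketch using pathlib. The name `timestamped_copy` is hypothetical and is not part of the sample or the arcgis package.

```python
import shutil
import time
from pathlib import Path


def timestamped_copy(src, dest_dir, ts=None):
    """Copy ``src`` into ``dest_dir`` as <stem>_<timestamp><suffix>.

    Hypothetical helper mirroring the shutil.copyfile cell above, so the
    uploaded file name is unlikely to collide with an earlier upload.
    """
    src = Path(src)
    if ts is None:
        ts = int(time.time())  # whole seconds, like now_ts above
    dest = Path(dest_dir) / f"{src.stem}_{ts}{src.suffix}"
    return shutil.copyfile(src, dest)  # copyfile returns the destination path
```

Calling `timestamped_copy(csv1, data_pth, ts=now_ts)` should produce the same file name as the cell above. Passing `now_ts` explicitly keeps the file name and the item title in sync.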
# add the initial csv file and publish that as a web layer
item_prop = {'title':'USA Capitals spreadsheet ' + now_ts}
csv_item = gis.content.add(item_properties=item_prop, data=my_csv)
csv_item
This spreadsheet has coordinates as latitude and longitude columns, which will be used for geometries during publishing.
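Publishing depends on those coordinate columns. Before uploading, it can help to confirm that they exist and hold plausible decimal-degree values. The sketch below uses only the standard library and assumes the column names are exactly `latitude` and `longitude`. The function name `check_coordinates` is hypothetical.

```python
import csv
import io


def check_coordinates(csv_text, lat_field="latitude", lon_field="longitude"):
    """Return line numbers whose coordinates are missing or out of range.

    Line numbers assume one record per line (line 1 is the header).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {lat_field, lon_field} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing coordinate columns: {sorted(missing)}")
    bad_lines = []
    for line_no, row in enumerate(reader, start=2):
        try:
            lat, lon = float(row[lat_field]), float(row[lon_field])
        except (TypeError, ValueError):  # empty or non-numeric value
            bad_lines.append(line_no)
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            bad_lines.append(line_no)
    return bad_lines
```

For example, `check_coordinates(open(csv1).read())` should return an empty list for a clean file.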
# publish the csv item into a feature layer
cities_item = csv_item.publish()
cities_item
# update the item metadata
item_prop = {'title':'USA Capitals'}
cities_item.update(item_properties = item_prop, thumbnail='data/updating_gis_content/capital_cities.png')
cities_item
Apply updates from the second spreadsheet
The next set of updates has arrived and is stored in capitals_2.csv. We are told it contains corrections for the original set in addition to new features. We need to figure out which rows have changed, apply an update operation on those, then apply an add operation to the new rows.
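The core decision is simple. A row whose key already exists in the layer is an update, and any other row is an add. The pandas steps in this sample (an inner join, then `~isin`) implement that decision. Here it is again in plain Python as a minimal sketch. The function name is hypothetical. The sketch assumes that `city_id` values from both sources have the same type: values read with the csv module are strings, while those from the layer are integers, so convert them first.

```python
def split_updates_and_adds(existing_ids, incoming_rows, key="city_id"):
    """Partition incoming rows into (updates, adds) by key membership.

    existing_ids: key values already present in the feature layer.
    incoming_rows: iterable of dicts, e.g. rows parsed from capitals_2.csv.
    """
    existing = set(existing_ids)  # O(1) membership tests
    updates, adds = [], []
    for row in incoming_rows:
        (updates if row[key] in existing else adds).append(row)
    return updates, adds
```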
To start with, let us read the second csv file. Note, in this sample, data is stored in csv. In reality, it could be from your enterprise database or any other data source.
# read the second csv set
csv2 = 'data/updating_gis_content/capitals_2.csv'
cities_df_2 = pd.read_csv(csv2)
cities_df_2.head()
# get the dimensions of this csv
cities_df_2.shape
Identifying existing features that need to be updated
To identify features that need to be updated, let us read the attribute table of the published feature layer and compare it against the second csv. To read the attribute table, we perform a query() on the feature layer, which returns an arcgis.features.FeatureSet object. Refer to the guide pages on accessing features from feature layers to learn more about this.
Note, at this point, we could work with the cities_df_1 dataframe we created from the original csv file. However, in practice you may not always have the original dataset, or your feature layer might have undergone edits after it was published. Hence, we query the feature layer directly.
cities_flayer = cities_item.layers[0]
cities_fset = cities_flayer.query() #querying without any conditions returns all the features
cities_fset.sdf.head()
The city_id column is common to both datasets. Next, let us perform an inner join with the table from the feature layer as left and the updated csv as right. An inner join yields only those rows that are present in both tables. Learn more about inner joins here.
overlap_rows = pd.merge(left=cities_fset.sdf, right=cities_df_2, how='inner',
                        on='city_id')
overlap_rows
Thus, of the 19 features in the original and the 36 features in the second csv, 4 features are common. Inspecting the table, we find that certain columns are updated: for instance, Cheyenne has its coordinates corrected, Oklahoma City has its state abbreviation corrected, and each of the other cities has one of its attribute columns updated.
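Before you choose between updating individual values and overwriting them all, you may want to see (or log) exactly which fields differ for each overlapping feature. The following sketch compares two attribute dictionaries. It assumes the attributes are plain dicts, which is what Feature.attributes holds. The function name is hypothetical.

```python
def changed_fields(old, new, tolerance=1e-9):
    """Return {field: (old_value, new_value)} for fields whose values differ.

    Only fields present in both dicts are compared. Numbers are compared
    with a small tolerance so float round-trips are not reported.
    """
    changes = {}
    for field in old.keys() & new.keys():
        a, b = old[field], new[field]
        both_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in (a, b)
        )
        if both_numeric:
            if abs(a - b) > tolerance:
                changes[field] = (a, b)
        elif a != b:
            changes[field] = (a, b)
    return changes
```

An empty result means a row from the second csv carries no real change and could be skipped.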
We could either update individual attribute values for these 4 features or update all attribute values with the latest csv. Below, we are performing the latter as it is simple and fast.
Perform updates to the existing features
features_for_update = [] #list containing corrected features
all_features = cities_fset.features
# inspect one of the features
all_features[0]
Note that the X and Y geometry values differ from the decimal degree coordinates in the longitude and latitude fields. To perform geometry edits, we need to project the coordinates to match the spatial reference of the feature layer.
# get the spatial reference of the features since we need to update the geometry
cities_fset.spatial_reference
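The sample delegates projection to geometry.project(), which calls the portal's geometry service. The spatial reference printed above may be Web Mercator (wkid 102100, latestWkid 3857). If it is, you can approximate the same transformation locally with the spherical Mercator formulas. This is only an offline cross-check for that one spatial reference, not a replacement for project(). The function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by Web Mercator


def lonlat_to_web_mercator(lon, lat):
    """Project WGS 84 lon/lat in degrees to Web Mercator x/y in meters.

    Valid for WKID 3857 / 102100 only, and for |lat| below about 85.05 degrees.
    """
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return {"x": x, "y": y}
```

Comparing its output with output_geometry[0] in the loop below should show agreement to within a small fraction of a meter, provided the layer really is in Web Mercator.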
Below, we prepare updated geometries and attributes for each of the 4 features we identified above. We use the arcgis.geometry module to project the coordinates from a geographic to a projected coordinate system. The cell below prints the original Feature objects followed by the updated ones. If you look closely, you can find the differences.
from arcgis import geometry  # use the geometry module to project Long,Lat to X and Y
from copy import deepcopy
for city_id in overlap_rows['city_id']:
    # get the feature to be updated
    original_feature = [f for f in all_features if f.attributes['city_id'] == city_id][0]
    feature_to_be_updated = deepcopy(original_feature)
    print(str(original_feature))
    # get the matching row from the csv as a Series (avoids float() on a 1-element Series)
    matching_row = cities_df_2[cities_df_2['city_id'] == city_id].iloc[0]
    # get geometries in the destination coordinate system
    input_geometry = {'y': float(matching_row['latitude']),
                      'x': float(matching_row['longitude'])}
    output_geometry = geometry.project(geometries=[input_geometry],
                                       in_sr=4326,
                                       out_sr=cities_fset.spatial_reference['latestWkid'],
                                       gis=gis)
    # assign the updated values
    feature_to_be_updated.geometry = output_geometry[0]
    feature_to_be_updated.attributes['longitude'] = float(matching_row['longitude'])
    feature_to_be_updated.attributes['city_id'] = int(matching_row['city_id'])
    feature_to_be_updated.attributes['state'] = matching_row['state']
    feature_to_be_updated.attributes['capital'] = matching_row['capital']
    feature_to_be_updated.attributes['latitude'] = float(matching_row['latitude'])
    feature_to_be_updated.attributes['name'] = matching_row['name']
    feature_to_be_updated.attributes['pop2000'] = int(matching_row['pop2000'])
    feature_to_be_updated.attributes['pop2007'] = int(matching_row['pop2007'])
    # add this to the list of features to be updated
    features_for_update.append(feature_to_be_updated)
    print(str(feature_to_be_updated))
    print("========================================================================")
We have constructed a list of features with updated values. We can use this list to perform updates on the feature layer.
features_for_update
To update the feature layer, call the edit_features() method of the FeatureLayer object and pass the list of features to the updates parameter:
cities_flayer.edit_features(updates= features_for_update)
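The call above does not keep its return value. It is worth capturing it, because individual edits can fail even when the request as a whole succeeds. The following sketch assumes the returned dict follows the REST applyEdits response shape, with addResults, updateResults and deleteResults lists whose entries carry objectId and success keys. Check one response in your environment before relying on this. The function name is hypothetical.

```python
def failed_edits(edit_result, kind="updateResults"):
    """Return the objectIds whose edit did not succeed.

    edit_result: dict assumed to look like
    {"addResults": [...], "updateResults": [...], "deleteResults": [...]}.
    """
    return [
        r.get("objectId")
        for r in edit_result.get(kind, [])
        if not r.get("success")
    ]
```

Usage: `result = cities_flayer.edit_features(updates=features_for_update)` followed by `failed_edits(result)`. This should return an empty list when every update was applied.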
We have successfully applied corrections to those features which existed in the feature layer from the initial dataset. Next let us proceed to adding new features present only in the second csv file.
Identifying new features that need to be added
#select those rows in the capitals_2.csv that do not overlap with those in capitals_1.csv
new_rows = cities_df_2[~cities_df_2['city_id'].isin(overlap_rows['city_id'])]
print(new_rows.shape)
new_rows.head()
Thus, of the 36 rows in the second csv, we have determined that the other 32 rows are new and need to be appended as new features.
Adding new features
Next, let us compose another list of Feature objects from the new_rows data frame, similar to earlier.
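The cell below copies an existing feature as a template and overwrites its values. An alternative is to build each new feature directly from its row in Esri JSON form, a dict with geometry and attributes keys. This avoids carrying over fields from the template feature, such as its ObjectID. The sketch below is hypothetical: the field-to-type mapping is an assumption based on the columns in this sample, and the projected x/y would come from geometry.project() as in the loop. Converting the dict with Feature.from_dict() is assumed to work in your arcgis version.

```python
def row_to_feature(row, projected_xy,
                   int_fields=("city_id", "pop2000", "pop2007"),
                   float_fields=("latitude", "longitude")):
    """Build an Esri JSON-style feature dict from one CSV row.

    row: dict of column -> value (strings if read with the csv module).
    projected_xy: dict like {"x": ..., "y": ...} already in the layer's SR.
    """
    attributes = {}
    for field, value in row.items():
        if field in int_fields:
            attributes[field] = int(value)
        elif field in float_fields:
            attributes[field] = float(value)
        else:
            attributes[field] = value  # leave text fields unchanged
    return {"geometry": dict(projected_xy), "attributes": attributes}
```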
features_to_be_added = []
# get a template feature object
template_feature = deepcopy(features_for_update[0])
# loop through each row and add to the list of features to be added
for _, row in new_rows.iterrows():
    new_feature = deepcopy(template_feature)
    # print progress
    print("Creating " + row['name'])
    # get geometries in the destination coordinate system
    input_geometry = {'y': float(row['latitude']),
                      'x': float(row['longitude'])}
    output_geometry = geometry.project(geometries=[input_geometry],
                                       in_sr=4326,
                                       out_sr=cities_fset.spatial_reference['latestWkid'],
                                       gis=gis)
    # assign the new values
    new_feature.geometry = output_geometry[0]
    new_feature.attributes['longitude'] = float(row['longitude'])
    new_feature.attributes['city_id'] = int(row['city_id'])
    new_feature.attributes['state'] = row['state']
    new_feature.attributes['capital'] = row['capital']
    new_feature.attributes['latitude'] = float(row['latitude'])
    new_feature.attributes['name'] = row['name']
    new_feature.attributes['pop2000'] = int(row['pop2000'])
    new_feature.attributes['pop2007'] = int(row['pop2007'])
    # add this to the list of features to be added
    features_to_be_added.append(new_feature)
# take a look at one of the features we created
features_to_be_added[0]
Thus, we have created a list of Feature objects with appropriate attributes and geometries. Next, to add these new features to the feature layer, call the edit_features() method of the FeatureLayer object and pass the list of Feature objects to the adds parameter:
cities_flayer.edit_features(adds = features_to_be_added)
Thus, we have successfully applied edits from the second csv file. Next, let us look at how we can apply edits from the third csv file.
Apply edits from third spreadsheet
The next set of updates has arrived and is stored in capitals_annex.csv. We are told it contains additional columns for each of the features that we want to add to the feature layer.
To start with, let us read the third csv file. Note, in this sample, data is stored in csv. In reality, it could come from your enterprise database or any other data source.
# read the third csv set
csv3 = 'data/updating_gis_content/capitals_annex.csv'
cities_df_3 = pd.read_csv(csv3)
cities_df_3.head()
#find the number of rows in the third csv
cities_df_3.shape
The capitals_annex.csv does not add new features; instead, it adds additional attribute columns to existing features. It has 51 rows, which were found to match the 19 + 32 rows from the first and second csv files. The columns city_id and name are common to all 3 spreadsheets. Next, let us take a look at how we can append this additional attribute information to our feature layer.
Inspecting existing fields of the feature layer
The manager property of the FeatureLayer object exposes a set of methods to read and update the properties and definition of feature layers.
#Get the existing list of fields on the cities feature layer
cities_fields = cities_flayer.manager.properties.fields
# Your feature layer may have multiple fields,
# instead of printing all, let us take a look at one of the fields:
cities_fields[1]
From above, we can see the representation of one of the fields. Let us loop through each of the fields and print the name, alias, type and sqlType properties:
for field in cities_fields:
    print(f"{field.name:13}| {field.alias:13}| {field.type:25}| {field.sqlType}")
Preparing additional columns to add to the feature layer
Now that we have an idea of how the fields are defined, we can go ahead and append new fields to the layer's definition. Once we compose a list of new fields, we can push those changes to the feature layer by calling the add_to_definition() method. Once the feature layer's definition is updated with the new fields, we can loop through each feature and add the appropriate attribute values.
To compose a list of new fields to be added, we start by making a copy of one of the existing fields as a template and then editing it. Conveniently, in this example all new fields except one share the same data type: integer. With your data, this may not be the case. In such instances, you can add each field individually.
# get a template field
template_field = dict(deepcopy(cities_fields[1]))
template_field
Let us use pandas to get the list of fields that are new in the third spreadsheet:
# get the list of new fields to add from the third spreadsheet that are not in spreadsheets 1 and 2
new_field_names = list(cities_df_3.columns.difference(cities_df_1.columns))
new_field_names
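The same column difference can be computed from the header rows alone, without loading either file into pandas. The following is a minimal stdlib sketch. The function name is hypothetical. The result is sorted, which matches the ordering that pandas Index.difference() normally returns.

```python
import csv
import io


def new_columns(base_csv_text, annex_csv_text):
    """Return header names present in the annex CSV but not in the base CSV."""
    base = next(csv.reader(io.StringIO(base_csv_text)))
    annex = next(csv.reader(io.StringIO(annex_csv_text)))
    return sorted(set(annex) - set(base))
```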
Now loop through each new field name and create a field dictionary using the template we created earlier. Except for the field titled class, all other fields are of type integer.
fields_to_be_added = []
for new_field_name in new_field_names:
    current_field = deepcopy(template_field)
    if new_field_name.lower() == 'class':
        current_field['sqlType'] = 'sqlTypeVarchar'
        current_field['type'] = 'esriFieldTypeString'
        current_field['length'] = 8000
    current_field['name'] = new_field_name.lower()
    current_field['alias'] = new_field_name
    fields_to_be_added.append(current_field)
len(fields_to_be_added)
#inspect one of the fields
fields_to_be_added[3]
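The loop above hard-codes class as the only string field. When you don't know the column types in advance, you could infer them from the data. The following sketch handles only the two types used in this sample: integer and string. The sqlType values are assumptions, so copy whatever your own template field shows if it differs. The function name `infer_field` is hypothetical.

```python
def infer_field(name, values, template):
    """Return a field definition for ``name`` inferred from sample values.

    A column whose non-empty values all parse as integers becomes an
    integer field; anything else becomes a string field of length 8000,
    like the class field above. The template dict is not modified.
    """
    field = dict(template)
    non_empty = [v for v in values if v not in ("", None)]

    def is_int(v):
        try:
            int(str(v))  # "3.5" or "abc" raise ValueError
            return True
        except ValueError:
            return False

    if non_empty and all(is_int(v) for v in non_empty):
        field.update(type="esriFieldTypeInteger", sqlType="sqlTypeInteger")
        field.pop("length", None)  # integer fields carry no length
    else:
        field.update(type="esriFieldTypeString", sqlType="sqlTypeVarchar", length=8000)
    field.update(name=name.lower(), alias=name)
    return field
```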
Adding additional fields to the feature layer
The list of new fields we composed can be pushed to the server by calling the add_to_definition() method on the manager property.
cities_flayer.manager.add_to_definition({'fields':fields_to_be_added})
Thus, we have successfully added new fields to our feature layer. Let us verify the new columns show up:
new_cities_fields = cities_flayer.manager.properties.fields
len(new_cities_fields)
for field in new_cities_fields:
    print(f"{field.name:10}| {field.type}")
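Instead of reading the printed list, you can check programmatically that every field you sent is now part of the definition. This sketch assumes each field is a dict-like mapping with a name key. Names are compared case-insensitively as a precaution. The function name is hypothetical.

```python
def missing_fields(expected_names, layer_fields):
    """Return expected field names that the layer definition does not list."""
    present = {f["name"].lower() for f in layer_fields}
    return [n for n in expected_names if n.lower() not in present]
```

Usage: `missing_fields([f['name'] for f in fields_to_be_added], new_cities_fields)` should return an empty list after a successful add_to_definition() call.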
Adding attribute values to the new columns
Next we can loop through each row in the third csv and add the new attribute values for these newly created columns.
# Run a fresh query on the feature layer so it includes the new features from
# csv2 and new columns from csv3
cities_fset2 = cities_flayer.query()
cities_features2 = cities_fset2.features
Loop through each row in the third spreadsheet, find the corresponding feature by matching the city_id value, and apply the attribute values for the new fields.
features_for_update = []
for city_id in cities_df_3['city_id']:
    # get the matching row from the csv as a Series
    matching_row = cities_df_3[cities_df_3['city_id'] == city_id].iloc[0]
    print(str(city_id) + " Adding additional attributes for: " + matching_row['name'])
    # get the feature to be updated
    original_feature = [f for f in cities_features2 if f.attributes['city_id'] == city_id][0]
    feature_to_be_updated = deepcopy(original_feature)
    # assign the updated values
    feature_to_be_updated.attributes['class'] = matching_row['class']
    feature_to_be_updated.attributes['white'] = int(matching_row['white'])
    feature_to_be_updated.attributes['black'] = int(matching_row['black'])
    feature_to_be_updated.attributes['ameri_es'] = int(matching_row['ameri_es'])
    feature_to_be_updated.attributes['asian'] = int(matching_row['asian'])
    feature_to_be_updated.attributes['hawn_pl'] = int(matching_row['hawn_pl'])
    feature_to_be_updated.attributes['hispanic'] = int(matching_row['hispanic'])
    feature_to_be_updated.attributes['males'] = int(matching_row['males'])
    feature_to_be_updated.attributes['females'] = int(matching_row['females'])
    # add this to the list of features to be updated
    features_for_update.append(feature_to_be_updated)
# inspect one of the features
features_for_update[-1]
# apply the edits to the feature layer
cities_flayer.edit_features(updates= features_for_update)
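Both update loops in this sample locate each feature with a list comprehension over all features. That scans the whole list once per csv row and raises an IndexError if a city_id has no match. For larger layers, a common alternative is to build a dictionary keyed by city_id once and report any unmatched keys. The sketch below works on Esri JSON-style dicts rather than Feature objects, so it runs without the arcgis package. The function name and dict shapes are assumptions.

```python
def merge_by_key(features, rows, key="city_id", fields=None):
    """Copy attribute values from rows onto matching features.

    features: list of dicts with an "attributes" dict (Esri JSON shape).
    rows: list of dicts with values already converted to the right types.
    fields: which row fields to copy (default: all of them).
    Returns (updated_features, unmatched_keys); input attributes are not mutated.
    """
    by_key = {f["attributes"][key]: f for f in features}  # one pass over features
    updated, unmatched = [], []
    for row in rows:
        feature = by_key.get(row[key])
        if feature is None:
            unmatched.append(row[key])
            continue
        new_attrs = dict(feature["attributes"])
        for field in fields or row.keys():
            new_attrs[field] = row[field]
        updated.append({**feature, "attributes": new_attrs})
    return updated, unmatched
```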
Verify the changes made so far
Let us run another query on the feature layer and visualize a few rows.
cities_fset3 = cities_flayer.query()
cities_fset3.sdf.head(5)
Conclusion
In this sample, we observed an edit-intensive method of keeping feature layers up to date. We published data from the first spreadsheet as a feature layer. We then updated existing features from the second spreadsheet (using the geometry module to project the coordinates in the process) and added new features. The third spreadsheet presented additional attribute columns, which were added to the feature layer by editing its definition and then updating the features with this additional data.
This method is editing intensive. You may choose it when the number of features to edit is small, or when you need to selectively update certain features as updates come in.
An alternate method is to overwrite the feature layer altogether when you always have current information coming in. This method is explained in the sample Overwriting feature layers.