Introduction to the Spatially Enabled DataFrame

The Spatially Enabled DataFrame (SEDF) creates a simple, intuitive object that can easily manipulate geometric and attribute data.

New at version 1.5, the Spatially Enabled DataFrame is an evolution of the SpatialDataFrame object that you may be familiar with. While the SDF object is still available for use, the team has stopped active development of it and is promoting this new Spatially Enabled DataFrame pattern instead. The SEDF provides better memory management and the ability to handle larger datasets, and it is the pattern that Pandas advocates as the path forward.

The Spatially Enabled DataFrame inserts a custom namespace called spatial into the popular Pandas DataFrame structure to give it spatial abilities. This allows you to use intuitive, idiomatic Pandas operations on both the attribute and spatial columns. The SEDF is therefore built on data structures inherently suited to data analysis, with natural operations for filtering and inspecting subsets of values, which are fundamental to statistical and geographic manipulations.
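To make the idea of "inserting a namespace" concrete, here is a minimal sketch of the Pandas extension mechanism, pd.api.extensions.register_dataframe_accessor, which is the standard way third-party packages attach such namespaces to every DataFrame. The accessor name demo_geo, its geometry_type and full_extent members, and the toy data are hypothetical. They only illustrate the pattern and are not the arcgis implementation of spatial.

```python
import pandas as pd


# Hypothetical namespace name, chosen so it does not collide with arcgis' own "spatial"
@pd.api.extensions.register_dataframe_accessor("demo_geo")
class DemoGeoAccessor:
    """Toy accessor: adds df.demo_geo to every DataFrame that has a SHAPE column."""

    def __init__(self, pandas_obj: pd.DataFrame):
        # Validation runs the first time the namespace is accessed on a DataFrame
        if "SHAPE" not in pandas_obj.columns:
            raise AttributeError("expected a SHAPE column holding geometry dicts")
        self._obj = pandas_obj

    @property
    def geometry_type(self) -> str:
        # Guess the type from the keys of the first Esri-JSON-style geometry dict
        first = self._obj["SHAPE"].iloc[0]
        if "x" in first and "y" in first:
            return "point"
        if "rings" in first:
            return "polygon"
        if "paths" in first:
            return "polyline"
        return "unknown"

    def full_extent(self) -> tuple:
        # (xmin, ymin, xmax, ymax) computed over point geometries only
        xs = [g["x"] for g in self._obj["SHAPE"]]
        ys = [g["y"] for g in self._obj["SHAPE"]]
        return (min(xs), min(ys), max(xs), max(ys))


df = pd.DataFrame({
    "NAME": ["A", "B"],
    "SHAPE": [{"x": 1.0, "y": 2.0}, {"x": 3.0, "y": -1.0}],
})
print(df.demo_geo.geometry_type)  # point
print(df.demo_geo.full_extent())  # (1.0, -1.0, 3.0, 2.0)
```

Note that df is still a plain pandas DataFrame; the extra behavior lives entirely in the namespace, which is why the sdf objects in this guide report their type as DataFrame.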

The SEDF can read from many sources, including shapefiles, Pandas DataFrames, feature classes, GeoJSON, and Feature Layers.

In [1]:
import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor

Accessing GIS data

GIS users need to work with both published layers on remote servers (web layers) and local data, but the ability to manipulate these datasets without permanently copying the data is lacking. The Spatially Enabled DataFrame solves this problem because it is an in-memory object that can read, write and manipulate geospatial data.

The SEDF integrates with Esri's ArcPy site-package as well as the open source pyshp, shapely and fiona packages. This means the ArcGIS API for Python SEDF can use any of these geometry engines, giving you options for working with geospatial data regardless of your platform. The SEDF transforms data into the formats you need so you can use Python functionality to analyze and visualize geographic information.

Data can be read and scripted to automate workflows and just as easily visualized on maps in Jupyter notebooks. The SEDF can export data as feature classes or publish them directly to servers for sharing, according to your needs. Let's explore some of the different options available with the versatile Spatially Enabled DataFrame namespace:

Reading Web Layers

Feature layers hosted on ArcGIS Online or ArcGIS Enterprise can be easily read into a Spatially Enabled DataFrame using the from_layer method. Once you read a layer into an SEDF object, you can create reports, manipulate the data, or convert it to whatever format suits its intended purpose.

Example: Retrieving an ArcGIS Online item and using the layers property to inspect the first 5 records of the layer

In [6]:
from arcgis import GIS
gis = GIS()
item = gis.content.get("85d0ca4ea1ca4b9abf0c51b9bd34de2e")
flayer = item.layers[0]

# create a Spatially Enabled DataFrame object
sdf = pd.DataFrame.spatial.from_layer(flayer)
sdf.head()
Out[6]:
AGE_10_14 AGE_15_19 AGE_20_24 AGE_25_34 AGE_35_44 AGE_45_54 AGE_55_64 AGE_5_9 AGE_65_74 AGE_75_84 ... PLACEFIPS POP2010 POPULATION POP_CLASS RENTER_OCC SHAPE ST STFIPS VACANT WHITE
0 2144 2314 2002 3531 3887 5643 6353 2067 5799 2850 ... 0408220 39540 40346 6 6563 {"x": -12751215.004681978, "y": 4180278.406256... AZ 04 6703 32367
1 876 867 574 1247 1560 2122 2342 733 2157 975 ... 0424895 14364 14847 6 1397 {"x": -12755627.731115643, "y": 4164465.572856... AZ 04 1389 12730
2 1000 1003 833 2311 2063 2374 3631 1068 6165 3776 ... 0425030 26265 26977 6 1963 {"x": -12734674.294574209, "y": 3850472.723091... AZ 04 9636 22995
3 2730 2850 2194 4674 5240 7438 8440 2499 8145 4608 ... 0439370 52527 55041 7 6765 {"x": -12725332.21151233, "y": 4096532.0908223... AZ 04 9159 47335
4 2732 2965 2024 3182 3512 3109 1632 2497 916 467 ... 0463470 25505 29767 6 1681 {"x": -12770984.257542243, "y": 3826624.133935... AZ 04 572 16120

5 rows × 51 columns

When you inspect the type of the object, you get back a standard pandas DataFrame object. However, this object now has an additional SHAPE column that allows you to perform geometric operations. In other words, this DataFrame is now geo-aware.

In [7]:
type(sdf)
Out[7]:
pandas.core.frame.DataFrame
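The type check confirms that the geo-awareness lives in the data, not in a new class. As a rough sketch of what that data looks like, assume features arrive as Esri JSON dictionaries with "attributes" and "geometry" keys; the hypothetical helper features_to_frame below flattens them into an ordinary DataFrame with a SHAPE column. This is not how from_layer works internally, and the city names and rounded coordinates are illustrative.

```python
import pandas as pd


def features_to_frame(features):
    """Hypothetical helper: flatten Esri-JSON-style features into a DataFrame.

    Each feature is {"attributes": {...}, "geometry": {...}}; attributes become
    ordinary columns and the geometry dict is stored as-is in a SHAPE column.
    """
    rows = []
    for feat in features:
        row = dict(feat["attributes"])       # copy so the input is not mutated
        row["SHAPE"] = feat.get("geometry")  # features without geometry get None
        rows.append(row)
    return pd.DataFrame(rows)


# Two point features shaped like the layer above (names and coordinates illustrative)
features = [
    {"attributes": {"NAME": "City A", "POP2010": 39540},
     "geometry": {"x": -12751215.0, "y": 4180278.4}},
    {"attributes": {"NAME": "City B", "POP2010": 14364},
     "geometry": {"x": -12755627.7, "y": 4164465.6}},
]
frame = features_to_frame(features)
print(type(frame))          # <class 'pandas.core.frame.DataFrame'>
print(list(frame.columns))  # ['NAME', 'POP2010', 'SHAPE']
```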

Further, the DataFrame gains a spatial namespace that exposes the geometric operations and properties you can use on the object. The rest of the guides in this section go into detail on how to use this functionality. So, sit tight.

Reading Feature Layer Data

As seen above, the SEDF can consume a Feature Layer served from either ArcGIS Online or ArcGIS Enterprise orgs. Let's take a step-by-step approach to break down the notebook cell above and then extract a subset of records from the feature layer.

Example: Examining Feature Layer content

Use the from_layer method on the SEDF to instantiate a data frame from an item's layer and inspect the first 5 records.

In [9]:
# Retrieve an item from ArcGIS Online from a known ID value
known_item = gis.content.get("85d0ca4ea1ca4b9abf0c51b9bd34de2e")
known_item
Out[9]:
USA Major Cities
This layer presents the locations of cities within the United States with populations of approximately 10,000 or greater, all state capitals, and the national capital.
Feature Layer Collection by esri_dm
Last Modified: December 21, 2017
3 comments, 331,873 views
In [10]:
# Obtain the first feature layer from the item
fl = known_item.layers[0]

# Use the `from_layer` static method in the 'spatial' namespace on the Pandas' DataFrame
sdf = pd.DataFrame.spatial.from_layer(fl)

# Return the first 5 records. 
sdf.head()
Out[10]:
AGE_10_14 AGE_15_19 AGE_20_24 AGE_25_34 AGE_35_44 AGE_45_54 AGE_55_64 AGE_5_9 AGE_65_74 AGE_75_84 ... PLACEFIPS POP2010 POPULATION POP_CLASS RENTER_OCC SHAPE ST STFIPS VACANT WHITE
0 2144 2314 2002 3531 3887 5643 6353 2067 5799 2850 ... 0408220 39540 40346 6 6563 {"x": -12751215.004681978, "y": 4180278.406256... AZ 04 6703 32367
1 876 867 574 1247 1560 2122 2342 733 2157 975 ... 0424895 14364 14847 6 1397 {"x": -12755627.731115643, "y": 4164465.572856... AZ 04 1389 12730
2 1000 1003 833 2311 2063 2374 3631 1068 6165 3776 ... 0425030 26265 26977 6 1963 {"x": -12734674.294574209, "y": 3850472.723091... AZ 04 9636 22995
3 2730 2850 2194 4674 5240 7438 8440 2499 8145 4608 ... 0439370 52527 55041 7 6765 {"x": -12725332.21151233, "y": 4096532.0908223... AZ 04 9159 47335
4 2732 2965 2024 3182 3512 3109 1632 2497 916 467 ... 0463470 25505 29767 6 1681 {"x": -12770984.257542243, "y": 3826624.133935... AZ 04 572 16120

5 rows × 51 columns

NOTE: See Pandas DataFrame head() method documentation for details.

You can also use SQL queries to return a subset of records by leveraging the ArcGIS API for Python's FeatureLayer object itself. When you run query() on a FeatureLayer, you get back a FeatureSet object. Calling the sdf property of the FeatureSet returns a Spatially Enabled DataFrame. We then use the DataFrame's head() method to return the first 5 records and a subset of columns:

Example: Feature Layer Query Results to a Spatially Enabled DataFrame

We'll use the AGE_45_54 column to query the feature layer and return a new DataFrame with a subset of records. We can use the built-in zip() function to print the DataFrame's attribute field names, and then use DataFrame syntax to view specific attribute fields in the output:

In [5]:
# Filter feature layer records with a sql query. 
# See https://developers.arcgis.com/rest/services-reference/query-feature-service-layer-.htm

df = fl.query(where="AGE_45_54 < 1500").sdf
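The where clause is evaluated by the feature service, so only matching features are transferred to your machine. Once data is already local, the same condition can be applied with ordinary Pandas filtering. A sketch on a small hypothetical frame (the names and values are illustrative):

```python
import pandas as pd

# Hypothetical local frame standing in for data that is already in memory
df_local = pd.DataFrame({
    "NAME": ["City A", "City B", "City C"],
    "AGE_45_54": [5643, 1411, 127],
})

# Two client-side equivalents of the server-side where="AGE_45_54 < 1500"
subset_mask = df_local[df_local["AGE_45_54"] < 1500]   # boolean mask
subset_query = df_local.query("AGE_45_54 < 1500")      # expression string

print(subset_mask["NAME"].tolist())  # ['City B', 'City C']
```

For large layers, filtering on the server with query() is usually preferable, since downloading everything and filtering locally moves far more data.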
In [6]:
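# Print the column names four per row. zip() stops at its shortest argument, so when
# the column count is not a multiple of 4, the last few names are omitted.
for a,b,c,d in zip(df.columns[::4], df.columns[1::4],df.columns[2::4], df.columns[3::4]):
    print("{:<30}{:<30}{:<30}{:<}".format(a,b,c,d))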
AGE_10_14                     AGE_15_19                     AGE_20_24                     AGE_25_34
AGE_35_44                     AGE_45_54                     AGE_55_64                     AGE_5_9
AGE_65_74                     AGE_75_84                     AGE_85_UP                     AGE_UNDER5
AMERI_ES                      ASIAN                         AVE_FAM_SZ                    AVE_HH_SZ
BLACK                         CAPITAL                       CLASS                         FAMILIES
FEMALES                       FHH_CHILD                     FID                           HAWN_PI
HISPANIC                      HOUSEHOLDS                    HSEHLD_1_F                    HSEHLD_1_M
HSE_UNITS                     MALES                         MARHH_CHD                     MARHH_NO_C
MED_AGE                       MED_AGE_F                     MED_AGE_M                     MHH_CHILD
MULT_RACE                     NAME                          OBJECTID                      OTHER
OWNER_OCC                     PLACEFIPS                     POP2010                       POPULATION
POP_CLASS                     RENTER_OCC                    SHAPE                         ST
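The listing above stops at ST, although the head() output earlier shows STFIPS, VACANT and WHITE after it: zip() silently drops the leftovers. A small sketch that pads the final row instead, using itertools.zip_longest; format_columns is a hypothetical helper name, and the sample names are the trailing columns from this layer.

```python
from itertools import zip_longest


def format_columns(names, ncols=4, width=30):
    """Return row strings listing names ncols per row, row-major, keeping leftovers."""
    it = iter(names)
    # ncols references to one iterator yield consecutive groups of ncols names;
    # zip_longest pads the final short group with "" instead of discarding it
    rows = zip_longest(*[it] * ncols, fillvalue="")
    return ["".join(f"{name:<{width}}" for name in row).rstrip() for row in rows]


tail_columns = ["POP_CLASS", "RENTER_OCC", "SHAPE", "ST", "STFIPS", "VACANT", "WHITE"]
for line in format_columns(tail_columns):
    print(line)
```

With a DataFrame, pass its columns directly, for example format_columns(df.columns).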
In [7]:
# Return a subset of columns on just the first 5 records
df[['NAME', 'AGE_45_54', 'POP2010']].head()
Out[7]:
NAME AGE_45_54 POP2010
0 Somerton 1411 14287
1 Anderson 1333 9932
2 Camp Pendleton South 127 10616
3 Citrus 1443 10866
4 Commerce 1478 12823

Accessing local GIS data

The SEDF can also access local geospatial data. Depending upon what Python modules you have installed, you'll have access to a wide range of functionality:

Example: Reading a Shapefile

If your Python interpreter does not have access to ArcPy, you must authenticate to ArcGIS Online or ArcGIS Enterprise before using the from_featureclass() method to read a shapefile:

g2 = GIS("https://www.arcgis.com", "username", "password")

In [8]:
g2 = GIS("https://python.playground.esri.com/portal", "arcgis_python", "amazing_arcgis_123")
In [10]:
sdf = pd.DataFrame.spatial.from_featureclass(r"path\to\your\data\census_example\cities.shp")
sdf.tail()
Out[10]:
PLACENS GEOID NAMELSAD CLASSFP FUNCSTAT ALAND AWATER INTPTLAT INTPTLON OBJECTID SHAPE
216 02390518 3481380 Williamstown CDP U1 S 19224931.0 9679.0 +39.6842012 -74.9686746 217 {"rings": [[[-75.01233999999994, 39.6705850000...
217 02390527 3481950 Woodbridge CDP U1 S 10066988.0 73147.0 +40.5550241 -74.2849508 218 {"rings": [[[-74.30846699999995, 40.5411700000...
218 02633183 3483170 Yardville CDP U1 S 10534691.0 93678.0 +40.1866452 -74.6630801 219 {"rings": [[[-74.70470199999994, 40.1848300000...
219 02390550 3483245 Yorketown CDP U1 S 6184354.0 24137.0 +40.3059040 -74.3388992 220 {"rings": [[[-74.35499299999998, 40.3024340000...
220 02584041 3483290 Zarephath CDP U1 S 1044431.0 87101.0 +40.5345727 -74.5721914 221 {"rings": [[[-74.58222099999995, 40.5350360000...
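Windows-style paths like the ones in these examples need care in Python source: in an ordinary string literal, backslash sequences such as \t and \n are escape codes, so the path silently changes. A quick check using a hypothetical path, along with pathlib.PureWindowsPath as a way to build the same path without typing backslashes:

```python
from pathlib import PureWindowsPath

# In an ordinary literal, "\t" and "\n" are escape codes (tab, newline)
plain = "c:\temp\new_data.shp"
# A raw string keeps every backslash as a literal character
raw = r"c:\temp\new_data.shp"

print(len(plain), len(raw))  # 18 20: two escape sequences collapsed in the plain string

# PureWindowsPath joins the parts with backslashes for you
built = PureWindowsPath("c:/", "temp", "new_data.shp")
print(str(built) == raw)  # True
```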

Saving Spatially Enabled DataFrames

The SEDF can export data to various data formats for use in other applications.

Export Options

Export to Feature Class

The SEDF allows for the export of whole datasets or partial datasets.

Example: Export a whole dataset to a shapefile:

In [18]:
sdf.spatial.to_featureclass(location=r"c:\output_examples\census.shp")
Out[18]:
'c:\\output_examples\\census.shp'

Installations of the ArcGIS API for Python on macOS and Linux, and on Windows machines whose Python interpreter does not have access to ArcPy, can only write shapefiles with the to_featureclass method. Writing to file geodatabases requires the ArcPy site-package.
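If a script has to run in both kinds of environments, one option is to check for ArcPy up front and pick the output location accordingly. A minimal sketch: output_location and the output.gdb name are hypothetical, and it assumes that, with ArcPy present, to_featureclass accepts a path to a feature class inside an existing file geodatabase.

```python
import importlib.util
from pathlib import PurePosixPath


def arcpy_available() -> bool:
    # find_spec returns None when the module cannot be imported by this interpreter
    return importlib.util.find_spec("arcpy") is not None


def output_location(directory: str, name: str, gdb_name: str = "output.gdb") -> str:
    """Hypothetical helper: target a file geodatabase when ArcPy is present,
    otherwise fall back to a shapefile, the only format available without ArcPy."""
    base = PurePosixPath(directory)
    if arcpy_available():
        # Assumption: the geodatabase already exists at this location
        return str(base / gdb_name / name)
    return str(base / f"{name}.shp")


print(output_location("/path/to/output", "census"))
```

The result can then be passed as the location argument of to_featureclass.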

Example: Export dataset with a subset of columns and top 5 records to a shapefile:

In [17]:
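# zip() stops at its shortest argument, so the trailing names (here INTPTLON, OBJECTID and SHAPE) are not printed
for a,b,c,d in zip(sdf.columns[::4], sdf.columns[1::4], sdf.columns[2::4], sdf.columns[3::4]):
    print("{:<30}{:<30}{:<30}{:<}".format(a,b,c,d))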
PLACENS                       GEOID                         NAMELSAD                      CLASSFP
FUNCSTAT                      ALAND                         AWATER                        INTPTLAT
In [15]:
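# Choose columns that exist in this sdf (see the tail() output above); selecting a
# name the DataFrame does not have raises a KeyError
columns = ['NAMELSAD', 'GEOID', 'ALAND', 'AWATER', 'SHAPE']
sdf[columns].head().spatial.to_featureclass(location=r"/path/to/your/data/directory/sdf_head_output.shp")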
Out[15]:
'/path/to/your/data/directory/sdf_head_output.shp'
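Selecting a list of columns that does not match the DataFrame raises a KeyError in Pandas, which is easy to hit when reusing a column list from another dataset. A small guard, sketched with a hypothetical select_columns helper that keeps the requested columns that exist, always keeps SHAPE, and reports the missing names. The toy frame reuses two rows from the census shapefile above, with placeholder geometries.

```python
import pandas as pd


def select_columns(frame: pd.DataFrame, wanted, geometry="SHAPE"):
    """Hypothetical helper: keep requested columns that exist, always keep the
    geometry column, and return missing names instead of raising KeyError."""
    missing = [c for c in wanted if c not in frame.columns]
    keep = [c for c in wanted if c in frame.columns]
    if geometry in frame.columns and geometry not in keep:
        keep.append(geometry)
    return frame[keep], missing


# Toy frame with column names from the census shapefile (geometries are placeholders)
cities = pd.DataFrame({
    "NAMELSAD": ["Williamstown CDP", "Woodbridge CDP"],
    "ALAND": [19224931.0, 10066988.0],
    "SHAPE": [{"rings": []}, {"rings": []}],
})
subset, missing = select_columns(cities, ["NAMELSAD", "POP2000", "ALAND"])
print(list(subset.columns), missing)  # ['NAMELSAD', 'ALAND', 'SHAPE'] ['POP2000']
```

On a real SEDF, the returned subset could then be trimmed with head() and exported with spatial.to_featureclass as in the cell above.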
