Part-2 Data IO with SeDF - Accessing Data

Introduction

In part-1 of this guide series, we started with an introduction to the Spatially enabled DataFrame (SeDF), the spatial and geom namespaces, and looked at a quick example of SeDF in action. In this part of the guide series, we will look at how GIS data can be accessed from various data formats using SeDF.

GIS users work with different vector-based spatial data formats, like published layers on remote servers (web layers) and local data. The Spatially enabled DataFrame allows the users to read, write, and manipulate spatial data by bringing the data in-memory.

The SeDF integrates with Esri's ArcPy site-package, as well as the open source pyshp, shapely and fiona packages. This means that the SeDF can use either shapely or arcpy geometry engines to provide you with options for easily working with geospatial data, regardless of your platform. The SeDF transforms the data into the formats you desire, allowing you to use Python functionality to analyze and visualize geographic information.

Data can be read and scripted to automate workflows, and visualized on maps in a Jupyter notebook. Let's explore the options available for accessing GIS data with the versatile Spatially enabled DataFrame.

The data used in this guide is available as an item. We will start by importing some libraries and downloading and extracting the data needed for the analysis in this guide.

In [1]:
# Import Libraries
import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor
from arcgis.gis import GIS
from IPython.display import display
import zipfile
import os
import shutil
In [2]:
# Create a GIS connection
gis = GIS()
agol_gis = GIS("https://www.arcgis.com","arcgis_python","amazing_arcgis_123")
In [3]:
# Get the data item
data_item = gis.content.get('c7140ae3d7ae4fd0817181461019aa75')
data_item
Out[3]:
sedf_guide_data
Data for Spatially enabled DataFrame GuidesShapefile by api_data_owner
Last Modified: November 11, 2021
0 comments, 4 views

The cell below downloads and extracts the data from the data item to your machine.

In [4]:
# Download and extract the data
def unzip_data():
    """
    This function:
    - creates a directory `sedf_data` to download the data from the item
    - downloads the item as `sedf_guide_data.zip` file in the sedf_data directory
    - unzips and extracts the data to the `cities` folder inside `sedf_data`.
    """
    try:
        data_dir = os.path.join(os.getcwd(), 'sedf_data')    # path to downloaded data folder

        # remove the existing data directory, if any, then create a fresh one
        if os.path.isdir(data_dir):
            shutil.rmtree(data_dir)
            print('Removed existing data directory')
        os.makedirs(data_dir)

        data_item.download(data_dir)    # download the data item
        zipped_file_path = os.path.join(data_dir, 'sedf_guide_data.zip')    # path to zipped file inside data folder

        # unzip the data; the context manager closes the archive even if extraction fails
        with zipfile.ZipFile(zipped_file_path, 'r') as zip_ref:
            zip_ref.extractall(data_dir)

        cities_dir = os.path.join(data_dir, 'cities')    # path to new cities directory
        print(f'Dataset unzipped at: {os.path.relpath(cities_dir)}')

    except Exception as e:
        print(f'Error unzipping file: {e}')
        

# Extract data
unzip_data()
Removed existing data directory
Dataset unzipped at: sedf_data\cities
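
The extract step can be tried in isolation without the data item. The sketch below is a minimal illustration using only the standard library; the archive it builds (demo.zip with a single cities/readme.txt member) and the helper name extract_zip are hypothetical stand-ins, not the real guide data or part of the ArcGIS API for Python.

```python
import os
import tempfile
import zipfile


def extract_zip(zip_path, target_dir):
    """Extract zip_path into target_dir and return the archive member names."""
    os.makedirs(target_dir, exist_ok=True)
    # The context manager closes the archive even if extractall() raises
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(target_dir)
        return zip_ref.namelist()


# Build a tiny stand-in archive (hypothetical; not the real guide data)
work_dir = tempfile.mkdtemp()
demo_zip = os.path.join(work_dir, "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as zf:
    zf.writestr("cities/readme.txt", "placeholder")

members = extract_zip(demo_zip, os.path.join(work_dir, "sedf_data"))
extracted_file = os.path.join(work_dir, "sedf_data", "cities", "readme.txt")
print(members, os.path.isfile(extracted_file))
```

Working in a temporary directory keeps the experiment away from any existing sedf_data folder.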

Accessing GIS Data

The Spatially enabled DataFrame reads from many sources, including Feature layers, Feature classes, Shapefiles, Pandas DataFrames and more. Let's dive into the details of accessing GIS data from various sources.

Read in Web Feature Layers

Feature layers hosted on ArcGIS Online or ArcGIS Enterprise can be easily read into a Spatially enabled DataFrame using the from_layer() method.

The example below shows how the get() method can be used to retrieve an ArcGIS Online item and how the layers property of an item can be used to access the data.

In [5]:
# Retrieve an item from ArcGIS Online using Item ID value
gis = GIS()
item = gis.content.get("85d0ca4ea1ca4b9abf0c51b9bd34de2e")
item
Out[5]:
USA Major Cities
This layer presents the locations of cities within the United States with populations of approximately 10,000 or greater, all state capitals, and the national capital.Feature Layer Collection by esri_dm
Last Modified: May 19, 2020
1 comments, 33,841,105 views
In [6]:
# Obtain the first feature layer from the item
flayer = item.layers[0]

# Use the `from_layer` static method in the 'spatial' namespace on the Pandas' DataFrame
sdf = pd.DataFrame.spatial.from_layer(flayer)

# Check shape
sdf.shape
Out[6]:
(3886, 50)
In [7]:
# Check first few records
sdf.head()
Out[7]:
AGE_10_14 AGE_15_19 AGE_20_24 AGE_25_34 AGE_35_44 AGE_45_54 AGE_55_64 AGE_5_9 AGE_65_74 AGE_75_84 ... PLACEFIPS POP2010 POPULATION POP_CLASS RENTER_OCC SHAPE ST STFIPS VACANT WHITE
0 1313 1058 734 2031 1767 1446 1136 1503 665 486 ... 1601990 13816 15181 6 1271 {"x": -12462673.723706165, "y": 5384674.994080... ID 16 271 13002
1 890 817 818 1799 1235 1330 1143 1099 721 579 ... 1607840 11899 11946 6 1441 {"x": -12506251.313993266, "y": 5341537.793529... ID 16 318 9893
2 12750 13959 16966 32135 27048 29595 24177 12933 12176 7087 ... 1608830 205671 225405 8 33359 {"x": -12938676.6836459, "y": 5403597.04949123... ID 16 6996 182991
3 790 768 699 1445 1136 1134 935 959 679 464 ... 1611260 10345 10727 6 1461 {"x": -12667411.402393516, "y": 5241722.820606... ID 16 241 7984
4 3803 3779 3687 7571 5559 4744 3624 4397 2296 1222 ... 1612250 46237 53942 7 5196 {"x": -12989383.674504515, "y": 5413226.487333... ID 16 1428 35856

5 rows × 50 columns

In [8]:
# Check type of sdf
type(sdf)
Out[8]:
pandas.core.frame.DataFrame
In [9]:
# Access spatial namespace
sdf.spatial.geometry_type
Out[9]:
['point']

We can see that the dataset has 3886 records and 50 columns. Inspecting the type of sdf object and accessing the spatial namespace shows us that a Spatially enabled DataFrame has been created from all the data in the layer.

Memory usage and the query() operation

The from_layer() method attempts to read all the data from the layer into memory. This approach works well for small datasets. With large datasets, however, it becomes important to use memory efficiently and query only for what is necessary.

Let's take a look at the memory usage of the existing SeDF using the memory_usage() method from Pandas.

In [10]:
# Check memory usage of current sdf
mem_used = sdf.memory_usage().sum() / (1024**2) #converting to megabytes
print(f'Shape of data: {sdf.shape}')
print(f'Memory used: {round(mem_used, 2)} MB')
Shape of data: (3886, 50)
Memory used: 1.48 MB

We can see that a SeDF created using the from_layer() method reads all the data into memory. The sdf object has 3886 records and 50 columns, and uses 1.48 MB of memory.
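
One caveat when reading such numbers: by default, the pandas memory_usage() method counts only the fixed-size storage of each column, and for object columns that means the pointers rather than the Python objects they refer to. Passing deep=True asks pandas to measure the objects as well. The sketch below illustrates the difference on a small, made-up DataFrame whose SHAPE column holds plain dictionaries; this is an assumption for illustration and is not the geometry type used by a real SeDF.

```python
import pandas as pd

# Toy frame (hypothetical data): one numeric column and one object column of dicts,
# loosely mimicking a column that stores geometry-like objects
df = pd.DataFrame({
    "POP2010": [13816, 11899, 205671],
    "SHAPE": [{"x": -1.0, "y": 2.0}, {"x": -3.0, "y": 4.0}, {"x": -5.0, "y": 6.0}],
})

shallow_bytes = df.memory_usage().sum()         # object column counted as pointers only
deep_bytes = df.memory_usage(deep=True).sum()   # also measures the referenced objects

print(f"shallow: {shallow_bytes} bytes, deep: {deep_bytes} bytes")
```

The shallow figure is still a useful relative measure when comparing two DataFrames with the same columns, which is how it is used in this guide.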

But what if we only needed a small amount of data for our analysis and did not need to bring everything from the layer into the memory? Good question... let's see how we can achieve that.

The query() method is a powerful operation that lets you use SQL-like queries to return only a subset of records. Since the processing is performed on the server, this operation is not restricted by the capacity of your computer.

The method returns a FeatureSet object; however, the return type can be changed to a Spatially enabled DataFrame object by specifying the parameter as_df=True.

Let's subset the data using query(), create a new SeDF, and check the memory usage. We'll use the AGE_45_54 column to query the layer and get a subset of records.

In [11]:
# Filter feature layer records with a query. 
sub_sdf = flayer.query(where="AGE_45_54 < 1500", as_df=True)
sub_sdf.shape
Out[11]:
(316, 50)
In [12]:
# Check memory usage of current sdf
mem_used = sub_sdf.memory_usage().sum() / (1024**2) #converting to megabytes
print(f'Memory used is: {round(mem_used, 2)} MB')
Memory used is: 0.12 MB

Now that we are only querying for records where AGE_45_54 < 1500, the result is a smaller DataFrame with 316 records and 50 columns. Since the filtering is performed on the server side, only a subset of the data is held in memory, reducing usage from 1.48 MB to 0.12 MB.

The query() method allows you to specify a number of optional parameters that may further refine and transform the results. One such key parameter is out_fields. With out_fields, you can subset your data by specifying a list of field names to return.

In [13]:
# Filter feature layer with where and out_fields 
out_fields = ['NAME','ST','POP_CLASS','AGE_45_54']
sub_sdf2 = flayer.query(where="AGE_45_54 < 1500", 
                        out_fields=out_fields,
                        as_df=True)
sub_sdf2.shape
Out[13]:
(316, 6)
In [14]:
# Check head
sub_sdf2.head()
Out[14]:
FID NAME ST POP_CLASS AGE_45_54 SHAPE
0 1 Ammon ID 6 1446 {"x": -12462673.723706165, "y": 5384674.994080...
1 2 Blackfoot ID 6 1330 {"x": -12506251.313993266, "y": 5341537.793529...
2 4 Burley ID 6 1134 {"x": -12667411.402393516, "y": 5241722.820606...
3 6 Chubbuck ID 6 1494 {"x": -12520053.904151963, "y": 5300220.333409...
4 12 Jerome ID 6 1155 {"x": -12747828.64784961, "y": 5269214.8197742...
In [15]:
# Check memory usage of current sdf
mem_used = sub_sdf2.memory_usage().sum() / (1024**2) #converting to megabytes
print(f'Memory used is: {round(mem_used, 2)} MB')
Memory used is: 0.01 MB

Using out_fields, we have further reduced memory usage by subsetting the data and bringing only necessary information into the memory.
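
When where clauses are assembled from variables rather than typed by hand, string values need to be quoted. The sketch below shows one way to build such a clause; sql_literal and build_where are hypothetical helper names introduced here for illustration and are not part of the ArcGIS API for Python. The resulting string could then be passed as the where parameter of query().

```python
def sql_literal(value):
    """Format a Python value as a SQL literal for use in a where clause."""
    if isinstance(value, str):
        # Standard SQL escapes a single quote inside a string by doubling it
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def build_where(conditions):
    """Join (field, operator, value) triples into one clause with AND."""
    return " AND ".join(
        f"{field} {op} {sql_literal(value)}" for field, op, value in conditions
    )


where = build_where([("AGE_45_54", "<", 1500), ("ST", "=", "ID")])
print(where)  # AGE_45_54 < 1500 AND ST = 'ID'
```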

Create SeDF from FeatureSet

As mentioned earlier, the query() method returns a FeatureSet object. The FeatureSet object contains useful information about the data that can be accessed through its various properties.

Let's use the AGE_45_54 column to query the layer, get the result as a FeatureSet, and check some of its properties.

In [16]:
# Filter feature layer to return a feature set. 
fset = flayer.query(where="AGE_45_54 < 1500")
In [17]:
# Check type
type(fset)
Out[17]:
arcgis.features.feature.FeatureSet
In [18]:
# Check length
len(fset.features)
Out[18]:
316
In [19]:
# Check geometry of a feature in the featureset
fset.features[0].geometry
Out[19]:
{'x': -12462673.723706165,
 'y': 5384674.994080178,
 'spatialReference': {'wkid': 102100, 'latestWkid': 3857}}
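
The x and y values above are in metres because the layer uses the Web Mercator spatial reference (wkid 102100, latest wkid 3857). As a rough sanity check, the standard spherical Web Mercator inverse formulas convert them back to longitude and latitude in degrees. The sketch below is for illustration only; web_mercator_to_lonlat is a hypothetical helper, not an ArcGIS API function, and for real work a projection library or the geometry engine should be preferred.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (WGS 84 semi-major axis)


def web_mercator_to_lonlat(x, y):
    """Convert Web Mercator coordinates in metres to (longitude, latitude) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


# Coordinates of the first feature shown above
lon, lat = web_mercator_to_lonlat(-12462673.723706165, 5384674.994080178)
print(round(lon, 2), round(lat, 2))
```

The result falls in eastern Idaho, which agrees with the first record (Ammon, ID) in the queried data.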

The fields property of a FeatureSet returns a list containing information about each column recorded as a dictionary. Let's use the fields property to access information about the first column.

In [20]:
# Check details of a column in the feature set
fset.fields[0]
Out[20]:
{'name': 'FID',
 'type': 'esriFieldTypeOID',
 'alias': 'FID',
 'sqlType': 'sqlTypeInteger',
 'domain': None,
 'defaultValue': None}

Let's get the names of the columns in the data.

In [21]:
# Get column names
f_names = [f['name'] for f in fset.fields]
f_names[:5]
Out[21]:
['FID', 'NAME', 'CLASS', 'ST', 'STFIPS']
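
Because fields is a plain list of dictionaries, ordinary Python comprehensions are enough to reshape it. The sketch below works on a hand-written sample of that structure so it runs without a GIS connection: the first entry mirrors the FID output above, while the second entry's type is an assumption made for illustration.

```python
# Hand-written sample of the structure returned by the `fields` property.
# The FID entry mirrors the output above; the NAME entry's type is assumed.
fields = [
    {"name": "FID", "type": "esriFieldTypeOID", "alias": "FID"},
    {"name": "NAME", "type": "esriFieldTypeString", "alias": "NAME"},
]

# Map each field name to its type for quick lookup
field_types = {f["name"]: f["type"] for f in fields}

# Collect the names of all string fields
string_fields = [name for name, ftype in field_types.items()
                 if ftype == "esriFieldTypeString"]

print(field_types["FID"], string_fields)
```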

Now, let's create a Spatially enabled DataFrame from a FeatureSet using the .sdf property.

In [22]:
# Create SeDF from FeatureSet
fset_df = fset.sdf
fset_df.shape
Out[22]:
(316, 50)
In [23]:
# Check head
fset_df.head(2)
Out[23]:
FID NAME CLASS ST STFIPS PLACEFIPS CAPITAL POP_CLASS POPULATION POP2010 ... MARHH_NO_C MHH_CHILD FHH_CHILD FAMILIES AVE_FAM_SZ HSE_UNITS VACANT OWNER_OCC RENTER_OCC SHAPE
0 1 Ammon city ID 16 1601990 6 15181 13816 ... 1131 106 335 3352 3.61 4747 271 3205 1271 {"x": -12462673.723706165, "y": 5384674.994080...
1 2 Blackfoot city ID 16 1607840 6 11946 11899 ... 1081 174 381 2958 3.31 4547 318 2788 1441 {"x": -12506251.313993266, "y": 5341537.793529...

2 rows × 50 columns

In [24]:
# Check geometry type
fset_df.spatial.geometry_type
Out[24]:
['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created from a FeatureSet.

Create SeDF from FeatureCollection

Tools within the ArcGIS API for Python often return a FeatureCollection object as a result of some analysis. A FeatureCollection is an in-memory collection of Feature objects with rendering information. Similar to feature layers, feature collections can also be used to store features. With a feature collection, a service is not created to serve out the feature data.

Let's create a SeDF from a FeatureCollection. Here, we:

  • Import the Major Ports feature layer.
  • Create 5-mile buffers using the create_buffers() tool, which results in a FeatureCollection.
  • Call the query() method on the FeatureCollection to get a FeatureSet object, then create a SeDF using the .sdf property of that FeatureSet.
In [25]:
# Get the ports item
ports_item = gis.content.get("405963eaea24428c9db236ec289760eb")
ports_item
Out[25]:
Major Ports
This feature layer, utilizing data from the U.S. Department of Transportation, depicts Major Ports in the United States by total tonnage.Feature Layer Collection by Federal_User_Community
Last Modified: October 27, 2021
0 comments, 157,223 views
In [26]:
# Get the ports layer
ports_lyr = ports_item.layers[0]
ports_lyr
Out[26]:
<FeatureLayer url:"https://geo.dot.gov/server/rest/services/NTAD/Ports_Major/MapServer/0">
In [27]:
# Create buffers
from arcgis.features.use_proximity import create_buffers
ports_buffer50 = create_buffers(ports_lyr, distances=[5], units = 'Miles', gis=agol_gis)
In [28]:
# Check type of result from the analysis
type(ports_buffer50)
Out[28]:
arcgis.features.feature.FeatureCollection

The create_buffers() tool resulted in a FeatureCollection.

Now, we will create a SeDF from the FeatureCollection object.

In [29]:
# Create SeDF
sedf_fc = ports_buffer50.query().sdf
sedf_fc.head(2)
Out[29]:
OBJECTID_1 OBJECTID ID PORT PORT_NAME GRAND_TOTA FOREIGN_TO IMPORTS EXPORTS DOMESTIC BUFF_DIST ORIG_FID AnalysisArea SHAPE
0 1 1 124 C4947 Unalaska Island, AK 1652281 1236829 426251 810578 415452 5 1 78.528402 {"rings": [[[-18806114.3995, 7138385.537799999...
1 2 2 85 C4410 Kahului, Maui, HI 3615449 20391 20391 0 3595058 5 2 78.528402 {"rings": [[[-17418472.419, 2388455.4312999994...
In [30]:
# Check geometry type
sedf_fc.spatial.geometry_type
Out[30]:
['polygon']

The spatial namespace shows that a Spatially enabled DataFrame has been created from a FeatureCollection.

Read in local GIS data

Local geospatial data, such as feature classes and shapefiles, can be easily accessed using the Spatially enabled DataFrame. The from_featureclass() method can be used to access local data. Let's look at some examples.

Reading a Shapefile

A locally stored shapefile can be accessed by passing the location of the file in the from_featureclass() method.

Note: In the absence of arcpy, the PyShp package must be present in your current conda environment in order to read shapefiles. To check if PyShp is present, run !conda list pyshp in a cell. To install PyShp, run !conda install pyshp in a cell.
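
Package availability can also be checked from Python itself, without calling conda. The sketch below uses the standard library's importlib to test whether a module can be imported; is_installed is a hypothetical helper name. Note that PyShp is imported under the module name shapefile.

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be imported in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# PyShp is imported as `shapefile`; arcpy and fiona are imported under their own names
for module_name in ("arcpy", "shapefile", "fiona"):
    status = "available" if is_installed(module_name) else "not found"
    print(f"{module_name}: {status}")
```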
In [31]:
# Reading from shape file
shp_df = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.shp")
shp_df.shape
Out[31]:
(3886, 51)
In [32]:
shp_df.spatial.geometry_type
Out[32]:
['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created from the shapefile stored locally.

Shapefile from a URL

The URL of a zipped shapefile can be used to create a SeDF by passing the URL as location to the from_featureclass() method. The image below shows how the operation can be performed.

Note: This operation requires PyShp to be available in the environment.

[Image: example of passing the URL of a zipped shapefile to from_featureclass()]

Reading a Featureclass

A featureclass can be accessed from a File Geodatabase by passing its location in the from_featureclass() method.

Note: In the absence of arcpy, the Fiona package must be present in your current conda environment in order to read a featureclass. To check if Fiona is present, run !conda list fiona in a cell. To install Fiona, run !conda install fiona in a cell.
In [33]:
# Reading from FGDB
fcls_df = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities")
fcls_df.shape
Out[33]:
(3886, 51)
In [34]:
# Check head
fcls_df.head(2)
Out[34]:
OBJECTID age_10_14 age_15_19 age_20_24 age_25_34 age_35_44 age_45_54 age_55_64 age_5_9 age_65_74 ... placefips pop2010 population pop_class renter_occ st stfips vacant white SHAPE
0 1 1313 1058 734 2031 1767 1446 1136 1503 665 ... 1601990 13816 15181 6 1271 ID 16 271 13002 {"x": -12462673.7237, "y": 5384674.994099997, ...
1 2 890 817 818 1799 1235 1330 1143 1099 721 ... 1607840 11899 11946 6 1441 ID 16 318 9893 {"x": -12506251.314, "y": 5341537.793499999, "...

2 rows × 51 columns

In [35]:
# Check geometry type
fcls_df.spatial.geometry_type
Out[35]:
['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created from the featureclass stored locally.

Specify optional parameters

The from_featureclass() method allows users to specify optional parameters when the ArcPy library is available in the current environment. These parameters are:

  • sql_clause: a pair of SQL prefix and postfix clauses, sql_clause=(prefix,postfix), organized in a list or a tuple can be passed to query specific data. The parameter allows only a small set of operations to be performed. Learn more about the allowed operations here.
  • where_clause: where statement to subset the data. Learn more about it here.
  • fields: to subset the data for specific fields.
  • spatial_filter: a geometry object to filter the results.
Note: The operations below can only be performed in an environment that contains arcpy.
Subset data for specific fields
In [36]:
# Subset for fields
fcls_flds = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                   fields=['st','pop_class'])
fcls_flds.shape
Out[36]:
(3886, 3)
In [37]:
# Check head
fcls_flds.head(2)
Out[37]:
st pop_class SHAPE
0 ID 6 {"x": -12462673.7237, "y": 5384674.994099997, ...
1 ID 6 {"x": -12506251.314, "y": 5341537.793499999, "...
Subset using where_clause

Learn more about how to use where_clause here.

In [38]:
# Subset using where_clause
fcls_whr = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                  where_clause="st='ID' and pop_class=6")
fcls_whr.shape
Out[38]:
(15, 51)
In [39]:
# Check head
fcls_whr.head(2)
Out[39]:
OBJECTID age_10_14 age_15_19 age_20_24 age_25_34 age_35_44 age_45_54 age_55_64 age_5_9 age_65_74 ... placefips pop2010 population pop_class renter_occ st stfips vacant white SHAPE
0 1 1313 1058 734 2031 1767 1446 1136 1503 665 ... 1601990 13816 15181 6 1271 ID 16 271 13002 {"x": -12462673.7237, "y": 5384674.994099997, ...
1 2 890 817 818 1799 1235 1330 1143 1099 721 ... 1607840 11899 11946 6 1441 ID 16 318 9893 {"x": -12506251.314, "y": 5341537.793499999, "...

2 rows × 51 columns

Subset using fields and where_clause
In [40]:
# Subset using fields and where_clause
flds_whr = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                  fields=['st','pop_class','age_10_14','age_15_19'],
                                                  where_clause="st='ID' and pop_class=6")
flds_whr.shape
Out[40]:
(15, 5)
In [41]:
# Check head
flds_whr.head(2)
Out[41]:
st pop_class age_10_14 age_15_19 SHAPE
0 ID 6 1313 1058 {"x": -12462673.7237, "y": 5384674.994099997, ...
1 ID 6 890 817 {"x": -12506251.314, "y": 5341537.793499999, "...
Subset using sql_clause

sql_clause can be combined with fields and where_clause to further subset the data. You can learn more about the allowed operations here. Now let's look at some examples.

Prefix sql_clause - DISTINCT operation
In [42]:
# Prefix Sql clause - DISTINCT operation
fcls_sql1 = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                  sql_clause=("DISTINCT pop_class", None))

# Check shape
fcls_sql1.shape
Out[42]:
(3886, 51)
In [43]:
# Check head
fcls_sql1.head(2)
Out[43]:
OBJECTID age_10_14 age_15_19 age_20_24 age_25_34 age_35_44 age_45_54 age_55_64 age_5_9 age_65_74 ... placefips pop2010 population pop_class renter_occ st stfips vacant white SHAPE
0 941 1247 1213 1043 2022 1692 2116 1827 1187 1037 ... 0507330 15620 14771 6 3006 AR 05 1303 6216 {"x": -10006810.091, "y": 4290154.581699997, "...
1 1405 796 748 754 1999 1717 2062 1450 760 851 ... 2466850 12677 13188 6 814 MD 24 281 11613 {"x": -8517714.7855, "y": 4744316.880199999, "...

2 rows × 51 columns

Postfix sql_clause with specific fields

Here, we will subset the data for the state and population class fields and apply a postfix clause.

In [44]:
# Postfix Sql clause with specific fields
fcls_sql2 = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                   fields=['st','pop_class'],
                                                  sql_clause=(None, "ORDER BY st, pop_class"))
# Check shape
fcls_sql2.shape
Out[44]:
(3886, 3)
In [45]:
# Check head
fcls_sql2.head()
Out[45]:
st pop_class SHAPE
0 AK 6 {"x": -16417572.1606, "y": 9562359.403800003, ...
1 AK 6 {"x": -16455422.2224, "y": 9574022.0224, "spat...
2 AK 6 {"x": -16444303.0276, "y": 9568008.9705, "spat...
3 AK 6 {"x": -14962313.3618, "y": 8031014.926600002, ...
4 AK 6 {"x": -16657118.680399999, "y": 8746757.662600...
Prefix and Postfix sql_clause with specific fields and where_clause

Here, we will subset the data using where_clause, keep specific fields, and then apply both a prefix and a postfix clause.

In [48]:
# Prefix and Postfix sql_clause
fcls_sql3_df = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                   fields=['st','name','pop_class','age_10_14'],
                                                   where_clause="st='ID'",
                                                   sql_clause=("DISTINCT pop_class", "ORDER BY name"))

# Check Shape
fcls_sql3_df.shape
Out[48]:
(22, 5)
In [49]:
# Check head
fcls_sql3_df.head()
Out[49]:
st name pop_class age_10_14 SHAPE
0 ID Ammon 6 1313 {"x": -12462673.7237, "y": 5384674.994099997, ...
1 ID Blackfoot 6 890 {"x": -12506251.314, "y": 5341537.793499999, "...
2 ID Boise City 8 12750 {"x": -12938676.683600001, "y": 5403597.049500...
3 ID Burley 6 790 {"x": -12667411.4024, "y": 5241722.820600003, ...
4 ID Caldwell 7 3803 {"x": -12989383.6745, "y": 5413226.487300001, ...
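
To see what a prefix and a postfix clause mean in plain SQL terms, the sketch below runs comparable statements against an in-memory SQLite table. This is a stand-in for intuition only: it uses Python's built-in sqlite3 module, not ArcPy or a File Geodatabase, and the table holds just four rows copied from the output above. How a geodatabase applies these clauses may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, st TEXT, pop_class INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [("Burley", "ID", 6), ("Ammon", "ID", 6),
     ("Boise City", "ID", 8), ("Caldwell", "ID", 7)],
)

# A prefix clause such as DISTINCT sits right after SELECT
distinct_classes = [row[0] for row in conn.execute(
    "SELECT DISTINCT pop_class FROM cities ORDER BY pop_class")]

# A postfix clause such as ORDER BY is appended after the WHERE clause
ordered_names = [row[0] for row in conn.execute(
    "SELECT name FROM cities WHERE st = 'ID' ORDER BY name")]

conn.close()
print(distinct_classes)  # [6, 7, 8]
print(ordered_names)     # ['Ammon', 'Boise City', 'Burley', 'Caldwell']
```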
Using spatial_filter

spatial_filter can be used to query the results by using a spatial relationship with another geometry. The spatial filtering is even more powerful when integrated with Geoenrichment. Let's use this approach to filter our results for the state of Idaho. In this example, we will:

  • use arcgis.geoenrichment.Country to derive the geometries for the state of Idaho.
  • use arcgis.geometry.filters.intersects(geometry, sr=None) to create a geometry filter object that filters results whose geometry intersects with the specified geometry (i.e. filter data points within the boundary of Idaho).
  • pass the geometry filter object to spatial_filter to get desired results.
Note: To perform enrichment operations, GeoEnrichment must be configured in your GIS organization. GeoEnrichment consumes credits, and you can learn more about credit consumption here.
In [51]:
# Basic Imports
from arcgis.geometry import Geometry
from arcgis.geometry.filters import intersects
from arcgis.geoenrichment import Country
In [59]:
# Create country object
usa = Country.get('US', gis=agol_gis)
type(usa)
Out[59]:
arcgis.geoenrichment.enrichment.Country
In [62]:
# Get boundaries for Idaho
named_area_ID = usa.search(query='Idaho', layers=['US.States'])
display(named_area_ID[0])
named_area_ID[0].geometry.as_arcpy
<NamedArea name:"Idaho" area_id="16", level="US.States", country="147">
Out[62]:
In [64]:
# Create spatial reference
sr_id = named_area_ID[0].geometry["spatialReference"]
sr_id
Out[64]:
{'wkid': 4326, 'latestWkid': 4326}
In [66]:
# Construct a geometry filter using the filter geometry
id_state_filter = intersects(named_area_ID[0].geometry, 
                              sr=sr_id)
type(id_state_filter)
Out[66]:
dict
In [71]:
# Pass geometry filter object as a spatial_filter
fcls_spfl_df = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                   fields=['st','name','pop_class','age_10_14'],
                                                  spatial_filter=id_state_filter)
# Check shape
fcls_spfl_df.shape
Out[71]:
(22, 5)
In [73]:
# Check head
fcls_spfl_df.head()
Out[73]:
st name pop_class age_10_14 SHAPE
0 ID Ammon 6 1313 {"x": -12462673.7237, "y": 5384674.994099997, ...
1 ID Blackfoot 6 890 {"x": -12506251.314, "y": 5341537.793499999, "...
2 ID Boise City 8 12750 {"x": -12938676.683600001, "y": 5403597.049500...
3 ID Burley 6 790 {"x": -12667411.4024, "y": 5241722.820600003, ...
4 ID Caldwell 7 3803 {"x": -12989383.6745, "y": 5413226.487300001, ...

The result shows the data points filtered for Idaho as defined by the spatial filter.

You can learn more about applying spatial filters in our Working with geometries guide series.
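
For intuition about what an intersects filter does with point data, the sketch below implements a basic planar point-in-polygon test using ray casting. It is not how ArcPy or the ArcGIS geometry engine evaluates spatial filters, and the rectangle used here is a hypothetical box in longitude/latitude degrees, not Idaho's actual boundary.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Check whether a horizontal ray from (x, y) crosses the edge (x1, y1)-(x2, y2)
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangle in lon/lat degrees (NOT Idaho's real boundary)
box = [(-117.0, 42.0), (-111.0, 42.0), (-111.0, 49.0), (-117.0, 49.0)]

print(point_in_polygon(-116.2, 43.6, box))  # True: inside the rectangle
print(point_in_polygon(-87.8, 42.0, box))   # False: well east of the rectangle
```

Each crossing of the ray with an edge toggles the inside flag, so an odd number of crossings means the point is inside.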

Read in DataFrame with Addresses

A SeDF can be easily created from a DataFrame with address information using the from_df() method. This method geocodes the addresses using the first configured geocoder in your GIS. The locations generated after geocoding are used as the geometry of the SeDF.

You can learn more about geocoding in our Finding Places with geocoding guide series.

Note: The from_df() method performs a batch geocoding operation which consumes credits. If a geocoder is not specified, then the first configured geocoder in your GIS organization will be used. Learn more about credit consumption here. To avoid credit consumption, you may specify your own `geocoder`.

Let's look at an example of using from_df(). We will read addresses into a DataFrame using the pd.read_csv() method. Next, we will create a SeDF by passing the DataFrame and address column as parameters to the from_df() method.

In [48]:
# Read the csv file with address into a DataFrame
orders_df = pd.read_csv("./sedf_data/cities/orders.csv")

# Check head
orders_df.head()
Out[48]:
Address
0 602 Murray Cir, Sausalito, CA 94965
1 340 Stockton St, San Francisco, CA 94108
2 3619 Balboa St, San Francisco, CA 94121
3 1274 El Camino Real, San Bruno, CA 94066
4 625 Monterey Blvd, San Francisco, CA 94127

The DataFrame shows a column with address information.

In [53]:
# Use from_df to create SeDF
orders_sdf = pd.DataFrame.spatial.from_df(df=orders_df, address_column="Address")
orders_sdf.head()
Out[53]:
Address SHAPE
0 602 Murray Cir, Sausalito, CA 94965 {"x": -122.47885242199999, "y": 37.83735920100...
1 340 Stockton St, San Francisco, CA 94108 {"x": -122.44955096499996, "y": 37.73152250200...
2 3619 Balboa St, San Francisco, CA 94121 {"x": -122.49772620499999, "y": 37.77567413500...
3 1274 El Camino Real, San Bruno, CA 94066 {"x": -122.40685153899994, "y": 37.78910429100...
4 625 Monterey Blvd, San Francisco, CA 94127 {"x": -122.42218381299995, "y": 37.63856151200...
In [54]:
# Check geometry type
orders_sdf.spatial.geometry_type
Out[54]:
['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created from a Pandas DataFrame with address information.

Read in DataFrame with Lat/Long Information

As we saw in part-1 of this guide series, a SeDF can be created from any Pandas DataFrame with location information (Latitude and Longitude) using the from_xy() method.

Let's look at an example. We will read the data with latitude and longitude information into a DataFrame using the pd.read_csv() method. Then, we will create a SeDF by passing the DataFrame, latitude, and longitude as parameters to the from_xy() method.

In [55]:
# Read the data
cms_df = pd.read_csv('./sedf_data/cities/sample_cms_data.csv')

# Return the first 5 records
cms_df.head()
Out[55]:
Provider Name Provider City Provider State Residents Total Admissions COVID-19 Residents Total COVID-19 Cases Residents Total COVID-19 Deaths Number of All Beds Total Number of Occupied Beds LONGITUDE LATITUDE
0 GROSSE POINTE MANOR NILES IL 5 56 12 99 61 -87.792973 42.012012
1 MILLER'S MERRY MANOR DUNKIRK IN 0 0 0 46 43 -85.197651 40.392722
2 PARKWAY MANOR MARION IL 0 0 0 131 84 -88.982944 37.750143
3 AVANTARA LONG GROVE LONG GROVE IL 6 141 0 195 131 -87.986442 42.160843
4 HARMONY NURSING & REHAB CENTER CHICAGO IL 19 75 16 180 116 -87.726353 41.975505
In [56]:
# Create a SeDF
cms_sedf = pd.DataFrame.spatial.from_xy(df=cms_df, x_column='LONGITUDE', y_column='LATITUDE', sr=4326)

# Check head
cms_sedf.head()
Out[56]:
Provider Name Provider City Provider State Residents Total Admissions COVID-19 Residents Total COVID-19 Cases Residents Total COVID-19 Deaths Number of All Beds Total Number of Occupied Beds LONGITUDE LATITUDE SHAPE
0 GROSSE POINTE MANOR NILES IL 5 56 12 99 61 -87.792973 42.012012 {"spatialReference": {"wkid": 4326}, "x": -87....
1 MILLER'S MERRY MANOR DUNKIRK IN 0 0 0 46 43 -85.197651 40.392722 {"spatialReference": {"wkid": 4326}, "x": -85....
2 PARKWAY MANOR MARION IL 0 0 0 131 84 -88.982944 37.750143 {"spatialReference": {"wkid": 4326}, "x": -88....
3 AVANTARA LONG GROVE LONG GROVE IL 6 141 0 195 131 -87.986442 42.160843 {"spatialReference": {"wkid": 4326}, "x": -87....
4 HARMONY NURSING & REHAB CENTER CHICAGO IL 19 75 16 180 116 -87.726353 41.975505 {"spatialReference": {"wkid": 4326}, "x": -87....

The SHAPE column shows that a Spatially enabled DataFrame has been created from a Pandas DataFrame with latitude and longitude information.
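
Note the axis order in the call above: longitude is the x value and latitude is the y value. The sketch below builds point dictionaries in the same JSON shape that appears in the SHAPE column, using two coordinate pairs from the sample output. point_from_xy is a hypothetical helper written for illustration; it only mimics the structure and does not reproduce what from_xy() does internally.

```python
import json


def point_from_xy(x, y, wkid=4326):
    """Build a point geometry dictionary in the JSON form seen in the SHAPE column."""
    return {"spatialReference": {"wkid": wkid}, "x": x, "y": y}


# Two (longitude, latitude) pairs taken from the sample output above
rows = [(-87.792973, 42.012012), (-85.197651, 40.392722)]
shapes = [point_from_xy(lon, lat) for lon, lat in rows]

print(json.dumps(shapes[0]))
```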

Read in GeoPandas DataFrame

A SeDF can be easily created from a GeoPandas GeoDataFrame using the from_geodataframe() method. We will first create a GeoDataFrame and then convert it to a SeDF.

Create a GeoDataFrame

Here, we will create a GeoDataFrame from a Pandas DataFrame, cms_df, defined above.

In [57]:
# Import libraries
from geopandas import GeoDataFrame
from shapely.geometry import Point
In [58]:
# Read the data
cms_df = pd.read_csv('./sedf_data/cities/sample_cms_data.csv')

# Create Geopandas DataFrame
gdf = GeoDataFrame(cms_df.drop(['LONGITUDE','LATITUDE'], axis=1), 
                   crs='EPSG:4326',
                   geometry=[Point(xy) for xy in zip(cms_df.LONGITUDE, cms_df.LATITUDE)])
gdf.shape
Out[58]:
(124, 9)
In [59]:
# Check head
gdf.head(2)
Out[59]:
Provider Name Provider City Provider State Residents Total Admissions COVID-19 Residents Total COVID-19 Cases Residents Total COVID-19 Deaths Number of All Beds Total Number of Occupied Beds geometry
0 GROSSE POINTE MANOR NILES IL 5 56 12 99 61 POINT (-87.79297 42.01201)
1 MILLER'S MERRY MANOR DUNKIRK IN 0 0 0 46 43 POINT (-85.19765 40.39272)

A GeoDataFrame has been created with a geometry column that stores the geometry of the dataset.

Create a SeDF from GeoDataFrame

Here, we will create a SeDF from the gdf GeoDataFrame created above using the from_geodataframe() method.

In [60]:
# Create a SeDF
sedf_gpd = pd.DataFrame.spatial.from_geodataframe(gdf)
sedf_gpd.head(2)
Out[60]:
Provider Name Provider City Provider State Residents Total Admissions COVID-19 Residents Total COVID-19 Cases Residents Total COVID-19 Deaths Number of All Beds Total Number of Occupied Beds SHAPE
0 GROSSE POINTE MANOR NILES IL 5 56 12 99 61 {"x": -87.792973, "y": 42.012012, "spatialRefe...
1 MILLER'S MERRY MANOR DUNKIRK IN 0 0 0 46 43 {"x": -85.197651, "y": 40.392722, "spatialRefe...
In [61]:
# Check geometry type
sedf_gpd.spatial.geometry_type
Out[61]:
['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created from a GeoDataFrame.

Read in feather format data

A SeDF can be easily created from data in feather format using the from_feather() method. By default, the method uses SHAPE as the spatial_column that holds the geospatial information, but any other column with spatial information can be specified.

In [62]:
# Check head
cms_sedf.head(2)
Out[62]:
Provider Name Provider City Provider State Residents Total Admissions COVID-19 Residents Total COVID-19 Cases Residents Total COVID-19 Deaths Number of All Beds Total Number of Occupied Beds LONGITUDE LATITUDE SHAPE
0 GROSSE POINTE MANOR NILES IL 5 56 12 99 61 -87.792973 42.012012 {"spatialReference": {"wkid": 4326}, "x": -87....
1 MILLER'S MERRY MANOR DUNKIRK IN 0 0 0 46 43 -85.197651 40.392722 {"spatialReference": {"wkid": 4326}, "x": -85....
In [63]:
# Create SeDF by reading from feather
sedf_fthr = pd.DataFrame.spatial.from_feather('./sedf_data/cities/sample_cms_data.feather')
sedf_fthr.head(2)
Out[63]:
Provider Name Provider City Provider State Residents Total Admissions COVID-19 Residents Total COVID-19 Cases Residents Total COVID-19 Deaths Number of All Beds Total Number of Occupied Beds LONGITUDE LATITUDE SHAPE
0 GROSSE POINTE MANOR NILES IL 5 56 12 99 61 -87.792973 42.012012 {"x": -87.792973, "y": 42.012012, "spatialRefe...
1 MILLER'S MERRY MANOR DUNKIRK IN 0 0 0 46 43 -85.197651 40.392722 {"x": -85.197651, "y": 40.392722, "spatialRefe...
In [64]:
# Check geometry type
sedf_fthr.spatial.geometry_type
Out[64]:
['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created from feather format data.
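When the geometry is stored in a column other than SHAPE, the spatial_column argument described earlier can be passed explicitly. The wrapper below is a minimal sketch: the function name and the "geom" column in the usage comment are hypothetical, and the imports are done inside the function so that defining it does not require arcgis to be installed.

```python
def read_feather_as_sedf(path, spatial_column="SHAPE"):
    """Read a feather file into a SeDF, naming the geometry column explicitly."""
    # Imported lazily; importing arcgis.features registers the `spatial`
    # accessor on pandas DataFrames.
    import pandas as pd
    from arcgis.features import GeoAccessor  # noqa: F401

    return pd.DataFrame.spatial.from_feather(path, spatial_column=spatial_column)

# Hypothetical usage, assuming a file whose geometry column is called "geom":
# sedf = read_feather_as_sedf("./my_data.feather", spatial_column="geom")
```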

Read in Non-spatial Table data

Non-spatial table data can be hosted on ArcGIS Online or ArcGIS Enterprise, or it can be stored locally in a File Geodatabase. A SeDF can be easily created from such non-spatial table data using the following methods:

Using the from_table() method

A DataFrame can be created from local non-spatial data using the from_table() method. The method can read a CSV file (in any environment) or a table stored in a File Geodatabase (with ArcPy only).

Reading a csv file
In [65]:
# Create SeDF
tbl_df = pd.DataFrame.spatial.from_table(filename='./sedf_data/cities/sample_cms_data.csv')
tbl_df.head(2)
Out[65]:
Provider Name Provider City Provider State Residents Total Admissions COVID-19 Residents Total COVID-19 Cases Residents Total COVID-19 Deaths Number of All Beds Total Number of Occupied Beds LONGITUDE LATITUDE
0 GROSSE POINTE MANOR NILES IL 5 56 12 99 61 -87.792973 42.012012
1 MILLER'S MERRY MANOR DUNKIRK IN 0 0 0 46 43 -85.197651 40.392722

A Pandas DataFrame without any spatial information is returned.

Reading table from a File Geodatabase
Note: The operation below can only be performed in an environment that contains arcpy.
In [66]:
# Create SeDF
tbl_df2 = pd.DataFrame.spatial.from_table(filename="./sedf_data/cities/cities.gdb/cities_table_export")
tbl_df2.head(2)
Out[66]:
OBJECTID NAME OTHER OWNER_OCC PLACEFIPS POP2010 POPULATION POP_CLASS RENTER_OCC ST STFIPS VACANT WHITE
0 1 Ammon 307 3205 1601990 13816 15181 6 1271 ID 16 271 13002
1 2 Blackfoot 1077 2788 1607840 11899 11946 6 1441 ID 16 318 9893

A Pandas DataFrame without any spatial information is returned.
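Because reading a File Geodatabase table needs arcpy while reading a CSV does not, a script that runs in more than one environment may want to check for arcpy before choosing a source. The following is a small sketch of that check using only the standard library. The helper names and the example paths are hypothetical.

```python
import importlib.util

def has_arcpy():
    """Return True when the arcpy site-package can be found in this environment."""
    return importlib.util.find_spec("arcpy") is not None

def pick_table_source(csv_path, gdb_table_path):
    """Prefer the geodatabase table when arcpy is available, else fall back to CSV."""
    return gdb_table_path if has_arcpy() else csv_path

# Hypothetical paths; the chosen path would then be passed to from_table(filename=...)
source = pick_table_source("./tables/cities.csv", "./tables/cities.gdb/cities_table")
print(source)
```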

Using the from_layer() method

A SeDF can be created from hosted non-spatial data using the from_layer() method.

In [67]:
# Get table item
tbl_item = agol_gis.content.get("b022d30f881f478f8155153b9205ce12")
tbl_item
Out[67]:
sedf_major_cities_table
Table Layer by api_data_owner
Last Modified: November 11, 2021
0 comments, 4 views
In [68]:
# Get table url
tbl = tbl_item.tables[0]
tbl
Out[68]:
<Table url:"https://services7.arcgis.com/JEwYeAy2cc8qOe3o/arcgis/rest/services/major_cities_table/FeatureServer/0">
In [69]:
# Create a DataFrame from the hosted table
tbl_df3 = pd.DataFrame.spatial.from_layer(tbl)
tbl_df3.head(2)
Out[69]:
CLASS OBJECTID PLACEFIPS POP2010 POPULATION POP_CLASS STFIPS
0 city 1 1601990 13816 15181 6 16
1 city 2 1607840 11899 11946 6 16

A Pandas DataFrame without any spatial information is returned.

Read in data from 'lite and portable' databases

Geospatial data stored in a mobile geodatabase (.geodatabase) or a SQLite Database can be easily accessed using the Spatially enabled DataFrame.

  • A mobile geodatabase (.geodatabase) is a collection of various types of GIS datasets contained in a single file on disk that can store, query, and manage spatial and nonspatial data. Mobile geodatabases are stored in an SQLite database.

  • SQLite is a full-featured relational database that is portable and interoperable, which makes it ubiquitous in mobile app development.

The from_featureclass() method can be used to create a SeDF by reading in data from these databases. Let's look at some examples.

Note: The operations below can only be performed in an environment that contains arcpy.

Read from a mobile geodatabase

In [70]:
# Reading from mobile geodatabase
mobile_gdb_df = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities_mobile.geodatabase/main.cities")
mobile_gdb_df.shape
Out[70]:
(3886, 51)
In [71]:
# Check head
mobile_gdb_df.head(2)
Out[71]:
OBJECTID age_10_14 age_15_19 age_20_24 age_25_34 age_35_44 age_45_54 age_55_64 age_5_9 age_65_74 ... placefips pop2010 population pop_class renter_occ st stfips vacant white SHAPE
0 1 1313 1058 734 2031 1767 1446 1136 1503 665 ... 1601990 13816 15181 6 1271 ID 16 271 13002 {"x": -12462673.7237, "y": 5384674.994099997, ...
1 2 890 817 818 1799 1235 1330 1143 1099 721 ... 1607840 11899 11946 6 1441 ID 16 318 9893 {"x": -12506251.314, "y": 5341537.793499999, "...

2 rows × 51 columns

In [72]:
# Check geometry type
mobile_gdb_df.spatial.geometry_type
Out[72]:
['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created.
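The x and y values in the SHAPE column above are large projected coordinates rather than degrees. The truncated output does not show the spatial reference, so the sketch below rests on an assumption: that the data is in Web Mercator. In practice the spatial reference can be inspected on the frame itself (for example through the spatial namespace) before relying on this. The function name is hypothetical and the formula is the standard inverse spherical Mercator projection.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator

def web_mercator_to_lonlat(x, y):
    """Convert Web Mercator metres to longitude/latitude in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat

# First record from the output above (Ammon, ID), assuming Web Mercator input
lon, lat = web_mercator_to_lonlat(-12462673.7237, 5384674.994099997)
print(round(lon, 3), round(lat, 3))
```

If the result lands near the expected location of the city, that is a quick sanity check that the assumed projection is plausible.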

Read from a SQLite database

In [73]:
# Reading from sqlite database
sqlite_df = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.sqlite/main.cities")
sqlite_df.shape
Out[73]:
(3886, 51)
In [74]:
# Check head
sqlite_df.head(2)
Out[74]:
OBJECTID age_10_14 age_15_19 age_20_24 age_25_34 age_35_44 age_45_54 age_55_64 age_5_9 age_65_74 ... placefips pop2010 population pop_class renter_occ st stfips vacant white SHAPE
0 1 1313 1058 734 2031 1767 1446 1136 1503 665 ... 1601990 13816 15181 6 1271 ID 16 271 13002 {"x": -12462673.7237, "y": 5384674.994099997, ...
1 2 890 817 818 1799 1235 1330 1143 1099 721 ... 1607840 11899 11946 6 1441 ID 16 318 9893 {"x": -12506251.314, "y": 5341537.793499999, "...

2 rows × 51 columns

In [75]:
# Check geometry type
sqlite_df.spatial.geometry_type
Out[75]:
['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created.
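The paths used above end in main.cities. In SQLite, main is the name of the default schema of a database file, so main.cities is the schema-qualified name of the cities table. The sketch below illustrates this with Python's built-in sqlite3 module and a toy in-memory table. It is an illustration only: a real mobile geodatabase or spatial SQLite file also holds system tables and geometry storage that this toy table lacks.

```python
import sqlite3

# In-memory stand-in for a database file; the table contents are illustrative
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cities (objectid INTEGER PRIMARY KEY, name TEXT, pop2010 INTEGER)"
)
conn.execute("INSERT INTO cities (name, pop2010) VALUES ('Ammon', 13816)")

# Each row of PRAGMA database_list is (seq, schema_name, file_path)
schemas = [row[1] for row in conn.execute("PRAGMA database_list")]

# List user tables in the default schema and build schema-qualified names
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM main.sqlite_master WHERE type = 'table'")
]
qualified = [f"{schemas[0]}.{name}" for name in tables]
print(qualified)
conn.close()
```

Listing tables this way can help when working out which qualified name to append to the database path passed to from_featureclass().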

Conclusion

In this guide, we explored how the Spatially enabled DataFrame (SeDF) can be used to read spatial data from various formats. We started by reading data from web feature layers and using the query() operation to optimize performance and results. We then read data from local sources such as a file geodatabase and a shapefile. Next, we explained how a SeDF can be created from data with address or coordinate information, from a GeoPandas GeoDataFrame, or from feather format data. We also discussed reading non-spatial table data, which returns a DataFrame without spatial information. Finally, we showed how a SeDF can be created from lite and portable databases.

In the next part of the guide series, you will learn about exporting data using Spatially enabled DataFrame.

Note: Given the importance and popularity of the Spatially enabled DataFrame, we are revisiting our documentation for this topic. Our goal is to enhance the existing documentation to showcase the capabilities of the Spatially enabled DataFrame in detail, with more examples. Creating quality documentation takes time, but we are committed to providing you with the best experience possible. With that in mind, we will be rolling out the revamped guides on this topic as parts of a guide series (like the Data Engineering or Geometry guide series). This is "part-2" of the guide series for the Spatially enabled DataFrame. You will continue to see the existing documentation as we revamp it to add new parts. Stay tuned for more on this topic.
