Part-2 Data IO with SeDF - Accessing Data

Introduction

In part-1 of this guide series, we started with an introduction to the Spatially enabled DataFrame (SeDF), the spatial and geom namespaces, and looked at a quick example of SeDF in action. In this part of the guide series, we will look at how GIS data can be accessed from various data formats using SeDF.

GIS users work with different vector-based spatial data formats, like published layers on remote servers (web layers) and local data. The Spatially enabled DataFrame allows the users to read, write, and manipulate spatial data by bringing the data in-memory.

The SeDF integrates with Esri's ArcPy site-package, as well as the open source pyshp, shapely and fiona packages. This means that the SeDF can use either shapely or arcpy geometry engines to provide you with options for easily working with geospatial data, regardless of your platform. The SeDF transforms the data into the formats you desire, allowing you to use Python functionality to analyze and visualize geographic information.

Data can be read and scripted to automate workflows and be visualized on maps in Jupyter notebooks. Let's explore the options available for accessing GIS data with the versatile Spatially enabled DataFrame.

The data used in this guide is available as an item. We will start by importing some libraries and downloading and extracting the data needed for the analysis in this guide.

# Import Libraries
import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor
from arcgis.gis import GIS
from IPython.display import display
import zipfile
import os
import shutil

# Create a GIS connection
gis = GIS()
agol_gis = GIS("https://www.arcgis.com", "arcgis_python", "amazing_arcgis_123")

# Get the data item
data_item = gis.content.get('c7140ae3d7ae4fd0817181461019aa75')
data_item

sedf_guide_data
Data for Spatially enabled DataFrame Guides

Shapefile by api_data_owner
Last Modified: November 11, 2021
0 comments, 4 views

The cell below downloads and extracts the data from the data item to your machine.

# Download and extract the data
def unzip_data():
    """
    This function:
    - creates a directory `sedf_data` to download the data from the item
    - downloads the item as `sedf_guide_data.zip` file in the sedf_data directory
    - unzips and extracts the data to './sedf_data/cities'.
    """
    try:

        # path to downloaded data folder
        data_dir = os.path.join(os.getcwd(), 'sedf_data')

        # remove the existing data directory if it exists, then create a fresh one
        if os.path.isdir(data_dir):
            shutil.rmtree(data_dir)
            print('Removed existing data directory')
        os.makedirs(data_dir)

        data_item.download(data_dir)    # download the data item
        # path to zipped file inside data folder
        zipped_file_path = os.path.join(data_dir, 'sedf_guide_data.zip')

        # unzip the data (the context manager closes the archive automatically)
        with zipfile.ZipFile(zipped_file_path, 'r') as zip_ref:
            zip_ref.extractall(data_dir)

        # path to new cities directory
        cities_dir = os.path.join(data_dir, 'cities')
        print(f'Dataset unzipped at: {os.path.relpath(cities_dir)}')

    except Exception as e:
        print(f'Error unzipping file: {e}')


# Extract data
unzip_data()

Removed existing data directory
Dataset unzipped at: sedf_data\cities

Accessing GIS Data

The Spatially enabled DataFrame reads from many sources, including Feature layers, Feature classes, Shapefiles, Pandas DataFrames and more. Let's dive into the details of accessing GIS data from various sources.

Read in Web Feature Layers

Feature layers hosted on ArcGIS Online or ArcGIS Enterprise can be easily read into a Spatially enabled DataFrame using the from_layer() method.

The example below shows how the search() method can be used to find an ArcGIS Online item and how the layers property of the item can be used to access its data.

gis = GIS()
item = gis.content.search(
    "USA Major Cities", item_type="Feature layer", outside_org=True)[0]
item

USA Major Cities
This layer presents the locations of cities within the United States with populations of approximately 10,000 or greater, all state capitals, and the national capital.

Feature Layer Collection by esri_dm
Last Modified: May 19, 2020
1 comments, 33,841,105 views

# Obtain the first feature layer from the item
flayer = item.layers[0]

# Use the `from_layer` static method in the 'spatial' namespace on the Pandas' DataFrame
sdf = pd.DataFrame.spatial.from_layer(flayer)

# Check shape
sdf.shape

(3886, 50)

# Check first few records
sdf.head()

	AGE_10_14	AGE_15_19	AGE_20_24	AGE_25_34	AGE_35_44	AGE_45_54	AGE_55_64	AGE_5_9	AGE_65_74	AGE_75_84	...	PLACEFIPS	POP2010	POPULATION	POP_CLASS	RENTER_OCC	SHAPE	ST	STFIPS	VACANT	WHITE
0	1313	1058	734	2031	1767	1446	1136	1503	665	486	...	1601990	13816	15181	6	1271	{"x": -12462673.723706165, "y": 5384674.994080...	ID	16	271	13002
1	890	817	818	1799	1235	1330	1143	1099	721	579	...	1607840	11899	11946	6	1441	{"x": -12506251.313993266, "y": 5341537.793529...	ID	16	318	9893
2	12750	13959	16966	32135	27048	29595	24177	12933	12176	7087	...	1608830	205671	225405	8	33359	{"x": -12938676.6836459, "y": 5403597.04949123...	ID	16	6996	182991
3	790	768	699	1445	1136	1134	935	959	679	464	...	1611260	10345	10727	6	1461	{"x": -12667411.402393516, "y": 5241722.820606...	ID	16	241	7984
4	3803	3779	3687	7571	5559	4744	3624	4397	2296	1222	...	1612250	46237	53942	7	5196	{"x": -12989383.674504515, "y": 5413226.487333...	ID	16	1428	35856

5 rows × 50 columns

# Check type of sdf
type(sdf)

pandas.core.frame.DataFrame

# Access spatial namespace
sdf.spatial.geometry_type

['point']

We can see that the dataset has 3886 records and 50 columns. Inspecting the type of sdf object and accessing the spatial namespace shows us that a Spatially enabled DataFrame has been created from all the data in the layer.

Memory usage and the `query()` operation

The from_layer() method attempts to read all the data from the layer into memory. This approach works well for small datasets. With large datasets, however, it becomes important to use memory efficiently and query only for what is necessary.

Let's take a look at the memory usage of the existing SeDF using the memory_usage() method from Pandas.

# Check memory usage of current sdf
mem_used = sdf.memory_usage().sum() / (1024**2)  # converting to megabytes
print(f'Shape of data: {sdf.shape}')
print(f'Memory used: {round(mem_used, 2)} MB')

Shape of data: (3886, 50)
Memory used: 1.48 MB

As expected, the SeDF created using the from_layer() method holds all the data from the layer: the sdf object has 3886 records and 50 columns and uses about 1.48 MB of memory.

But what if we only needed a small amount of data for our analysis and did not need to bring everything from the layer into memory? Good question... let's see how we can achieve that.

The query() method is a powerful operation that allows you to use SQL-like queries to return only a subset of records. Since the processing is performed on the server, this operation is not restricted by the capacity of your computer.

The method returns a FeatureSet object; however, the return type can be changed to a Spatially enabled DataFrame object by specifying the parameter as_df=True.

Let's subset the data using query(), create a new SeDF, and check the memory usage. We'll use the AGE_45_54 column to query the layer and get a subset of records.

# Filter feature layer records with a query.
sub_sdf = flayer.query(where="AGE_45_54 < 1500", as_df=True)
sub_sdf.shape

(316, 50)

# Check memory usage of current sdf
mem_used = sub_sdf.memory_usage().sum() / (1024**2)  # converting to megabytes
print(f'Memory used is: {round(mem_used, 2)} MB')

Memory used is: 0.12 MB

Now that we are only querying for records where AGE_45_54 < 1500, the result is a smaller DataFrame with 316 records and 50 columns. Because the filtering is performed on the server side, only the matching records are brought into memory, reducing usage from 1.48 MB to 0.12 MB.

The query() method allows you to specify a number of optional parameters that may further refine and transform the results. One such key parameter is out_fields. With out_fields, you can subset your data by specifying a list of field names to return.

# Filter feature layer with where and out_fields
out_fields = ['NAME', 'ST', 'POP_CLASS', 'AGE_45_54']
sub_sdf2 = flayer.query(where="AGE_45_54 < 1500",
                        out_fields=out_fields,
                        as_df=True)
sub_sdf2.shape

(316, 6)

# Check head
sub_sdf2.head()

	FID	NAME	ST	POP_CLASS	AGE_45_54	SHAPE
0	1	Ammon	ID	6	1446	{"x": -12462673.723706165, "y": 5384674.994080...
1	2	Blackfoot	ID	6	1330	{"x": -12506251.313993266, "y": 5341537.793529...
2	4	Burley	ID	6	1134	{"x": -12667411.402393516, "y": 5241722.820606...
3	6	Chubbuck	ID	6	1494	{"x": -12520053.904151963, "y": 5300220.333409...
4	12	Jerome	ID	6	1155	{"x": -12747828.64784961, "y": 5269214.8197742...

# Check memory usage of current sdf
mem_used = sub_sdf2.memory_usage().sum() / (1024**2)  # converting to megabytes
print(f'Memory used is: {round(mem_used, 2)} MB')

Memory used is: 0.01 MB

Using out_fields, we have further reduced memory usage by bringing only the necessary columns into memory. Note that the result has six columns: the four requested fields plus FID and SHAPE.
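When several conditions and a field list are combined, it can help to assemble the query() arguments in one place. The sketch below is only an illustration: build_query_kwargs is a hypothetical helper, not part of the ArcGIS API for Python. It joins the conditions with AND and returns a dictionary that could be unpacked into flayer.query().

```python
def build_query_kwargs(conditions, out_fields=None, as_df=True):
    """Assemble keyword arguments for a feature layer query (hypothetical helper)."""
    # An empty condition list falls back to "1=1", which matches every record
    where = " AND ".join(f"({c})" for c in conditions) if conditions else "1=1"
    kwargs = {"where": where, "as_df": as_df}
    if out_fields:
        # Copy the list so later changes by the caller do not affect the kwargs
        kwargs["out_fields"] = list(out_fields)
    return kwargs


kwargs = build_query_kwargs(
    conditions=["AGE_45_54 < 1500", "ST = 'ID'"],
    out_fields=["NAME", "ST", "POP_CLASS", "AGE_45_54"],
)
print(kwargs["where"])

# With a feature layer at hand (such as flayer above), this could be used as:
# sub_sdf3 = flayer.query(**kwargs)
```

Keeping the conditions in a list makes it easy to add or drop a filter without editing one long where string by hand.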

Create SeDF from FeatureSet

As mentioned earlier, the query() method returns a FeatureSet object. The FeatureSet object contains useful information about the data that can be accessed through its various properties.

Let's use the AGE_45_54 column to query the layer, get the result as a FeatureSet, and check some of its properties.

# Filter feature layer to return a feature set.
fset = flayer.query(where="AGE_45_54 < 1500")

# Check type
type(fset)

arcgis.features.feature.FeatureSet

# Check length
len(fset.features)

# Check geometry of a feature in the featureset
fset.features[0].geometry

{'x': -12462673.723706165,
 'y': 5384674.994080178,
 'spatialReference': {'wkid': 102100, 'latestWkid': 3857}}
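The coordinates above are in Web Mercator (wkid 102100, latestWkid 3857), so they are expressed in meters rather than degrees. As a rough sanity check, the standard spherical Web Mercator inverse can convert them to longitude and latitude. The function name web_mercator_to_lonlat below is hypothetical, and the snippet is a plain-Python sketch of the formula, not a call into the ArcGIS API.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def web_mercator_to_lonlat(x, y):
    """Convert Web Mercator meters to (longitude, latitude) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


# Geometry of the first feature, copied from the output above
geom = {"x": -12462673.723706165, "y": 5384674.994080178}
lon, lat = web_mercator_to_lonlat(geom["x"], geom["y"])
print(round(lon, 2), round(lat, 2))
```

The result should fall in eastern Idaho, which agrees with the first record (Ammon, ID) returned by the query.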

The fields property of a FeatureSet returns a list containing information about each column recorded as a dictionary. Let's use the fields property to access information about the first column.

# Check details of a column in the feature set
fset.fields[0]

{'name': 'FID',
 'type': 'esriFieldTypeOID',
 'alias': 'FID',
 'sqlType': 'sqlTypeInteger',
 'domain': None,
 'defaultValue': None}

Let's get the names of the columns in the data.

# Get column names
f_names = [f['name'] for f in fset.fields]
f_names[:5]

['FID', 'NAME', 'CLASS', 'ST', 'STFIPS']
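Because fields is a plain list of dictionaries, ordinary Python tools are enough to summarize it. The sketch below uses a small hand-written sample shaped like the output above. Only the FID entry is copied from it; the NAME entry and its type values are assumed for illustration. It builds a lookup from field name to field type.

```python
# Sample shaped like `fset.fields`; the second entry is an assumed example
sample_fields = [
    {"name": "FID", "type": "esriFieldTypeOID", "alias": "FID",
     "sqlType": "sqlTypeInteger", "domain": None, "defaultValue": None},
    {"name": "NAME", "type": "esriFieldTypeString", "alias": "NAME",
     "sqlType": "sqlTypeOther", "domain": None, "defaultValue": None},
]

# Map each field name to its Esri field type
field_types = {f["name"]: f["type"] for f in sample_fields}

# Pick out the object ID field, which identifies each record
oid_fields = [name for name, ftype in field_types.items()
              if ftype == "esriFieldTypeOID"]

print(field_types)
print(oid_fields)
```

The same comprehension could be applied to fset.fields directly in place of sample_fields.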

Now, let's create a Spatially enabled DataFrame from a FeatureSet using the .sdf property.

# Create SeDF from FeatureSet
fset_df = fset.sdf
fset_df.shape

(316, 50)

# Check head
fset_df.head(2)

	FID	NAME	CLASS	ST	STFIPS	PLACEFIPS	CAPITAL	POP_CLASS	POPULATION	POP2010	...	MARHH_NO_C	MHH_CHILD	FHH_CHILD	FAMILIES	AVE_FAM_SZ	HSE_UNITS	VACANT	OWNER_OCC	RENTER_OCC	SHAPE
0	1	Ammon	city	ID	16	1601990		6	15181	13816	...	1131	106	335	3352	3.61	4747	271	3205	1271	{"x": -12462673.723706165, "y": 5384674.994080...
1	2	Blackfoot	city	ID	16	1607840		6	11946	11899	...	1081	174	381	2958	3.31	4547	318	2788	1441	{"x": -12506251.313993266, "y": 5341537.793529...

2 rows × 50 columns

# Check geometry type
fset_df.spatial.geometry_type

['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created from a FeatureSet.

Create SeDF from FeatureCollection

Tools within the ArcGIS API for Python often return a FeatureCollection object as a result of some analysis. A FeatureCollection is an in-memory collection of Feature objects with rendering information. Similar to feature layers, feature collections can also be used to store features. With a feature collection, a service is not created to serve out the feature data.

Let's create a SeDF from a FeatureCollection. Here, we:

Import the Major Ports feature layer.
Create 5-mile buffers using the create_buffers() tool, which results in a FeatureCollection.
Create a SeDF from the buffered FeatureCollection. Using the query() method on a FeatureCollection returns a FeatureSet object, so we will use the .sdf property of the FeatureSet returned from query().

# Get the ports item
ports_item = gis.content.get("405963eaea24428c9db236ec289760eb")
ports_item

Major Ports
This feature layer, utilizing data from the U.S. Department of Transportation, depicts Major Ports in the United States by total tonnage.

Feature Layer Collection by Federal_User_Community
Last Modified: October 27, 2021
0 comments, 157,223 views

# Get the ports layer
ports_lyr = ports_item.layers[0]
ports_lyr

<FeatureLayer url:"https://geo.dot.gov/server/rest/services/NTAD/Ports_Major/MapServer/0">

# Create buffers
from arcgis.features.use_proximity import create_buffers
ports_buffer50 = create_buffers(
    ports_lyr, distances=[5], units='Miles', gis=agol_gis)

# Check type of result from the analysis
type(ports_buffer50)

arcgis.features.feature.FeatureCollection

The create_buffers() tool resulted in a FeatureCollection.

Now, we will create a SeDF from the FeatureCollection object.

# Create SeDF
sedf_fc = ports_buffer50.query().sdf
sedf_fc.head(2)

	OBJECTID_1	OBJECTID	ID	PORT	PORT_NAME	GRAND_TOTA	FOREIGN_TO	IMPORTS	EXPORTS	DOMESTIC	BUFF_DIST	ORIG_FID	AnalysisArea	SHAPE
0	1	1	124	C4947	Unalaska Island, AK	1652281	1236829	426251	810578	415452	5	1	78.528402	{"rings": [[[-18806114.3995, 7138385.537799999...
1	2	2	85	C4410	Kahului, Maui, HI	3615449	20391	20391	0	3595058	5	2	78.528402	{"rings": [[[-17418472.419, 2388455.4312999994...

# Check geometry type
sedf_fc.spatial.geometry_type

['polygon']

The spatial namespace shows that a Spatially enabled DataFrame has been created from a FeatureCollection.

Read in local GIS data

Local geospatial data, such as feature classes and shapefiles, can be easily accessed using the Spatially enabled DataFrame. The from_featureclass() method can be used to access local data. Let's look at some examples.

Reading a Shapefile

A locally stored shapefile can be accessed by passing the location of the file in the from_featureclass() method.

Note: In the absence of arcpy, the PyShp package must be present in your current conda environment in order to read shapefiles.
To check if PyShp is present, you can run the following in a cell: !conda list pyshp
To install PyShp, you can run the following in a cell: !conda install pyshp

# Reading from shape file
shp_df = pd.DataFrame.spatial.from_featureclass(
    location="./sedf_data/cities/cities.shp")
shp_df.shape

(3886, 51)

shp_df.spatial.geometry_type

['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created from the shapefile stored locally.

Shapefile from a URL

The URL of a zipped shapefile can be used to create a SeDF by passing the URL as location in the from_featureclass() method. The image below shows how the operation can be performed.

Note: This operation requires PyShp to be available in the environment.
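In code, the call has the same shape as for a local file. The following is a minimal sketch, assuming an environment with arcgis and PyShp installed. The helper name read_zipped_shapefile and the URL in the commented usage are hypothetical and should be replaced with the address of a real zipped shapefile.

```python
def read_zipped_shapefile(url):
    """Create a SeDF from a zipped shapefile at `url` (requires arcgis and PyShp)."""
    # Imported here so the function can be defined even where arcgis is missing
    import pandas as pd
    from arcgis.features import GeoAccessor, GeoSeriesAccessor  # registers `.spatial`

    return pd.DataFrame.spatial.from_featureclass(location=url)


# Hypothetical URL, shown only to illustrate the call:
# url_df = read_zipped_shapefile("https://example.com/data/cities.zip")
# url_df.spatial.geometry_type
```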

Reading a Featureclass

A featureclass can be accessed from a File Geodatabase by passing its location in the from_featureclass() method.

Note: In the absence of arcpy, the Fiona package must be present in your current conda environment in order to read a featureclass.
To check if Fiona is present, you can run the following in a cell: !conda list fiona
To install Fiona, you can run the following in a cell: !conda install fiona

# Reading from FGDB
fcls_df = pd.DataFrame.spatial.from_featureclass(
    location="./sedf_data/cities/cities.gdb/cities")
fcls_df.shape

(3886, 51)

# Check head
fcls_df.head(2)

	OBJECTID	age_10_14	age_15_19	age_20_24	age_25_34	age_35_44	age_45_54	age_55_64	age_5_9	age_65_74	...	placefips	pop2010	population	pop_class	renter_occ	st	stfips	vacant	white	SHAPE
0	1	1313	1058	734	2031	1767	1446	1136	1503	665	...	1601990	13816	15181	6	1271	ID	16	271	13002	{"x": -12462673.7237, "y": 5384674.994099997, ...
1	2	890	817	818	1799	1235	1330	1143	1099	721	...	1607840	11899	11946	6	1441	ID	16	318	9893	{"x": -12506251.314, "y": 5341537.793499999, "...

2 rows × 51 columns

# Check geometry type
fcls_df.spatial.geometry_type

['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created from the featureclass stored locally.

Specify optional parameters

The from_featureclass() method allows users to specify optional parameters when the ArcPy library is available in the current environment. These parameters are:

sql_clause: a pair of SQL prefix and postfix clauses, sql_clause=(prefix,postfix), organized in a list or a tuple can be passed to query specific data. The parameter allows only a small set of operations to be performed. Learn more about the allowed operations here.
where_clause: where statement to subset the data. Learn more about it here.
fields: to subset the data for specific fields.
spatial_filter: a geometry object to filter the results.

Note: The operations below can only be performed in an environment that contains arcpy.
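To see where each parameter ends up, it can help to picture the SQL statement the parameters correspond to. The sketch below is purely conceptual: compose_select is a hypothetical helper that builds a string for illustration, and it does not reflect how arcpy constructs or runs the query internally. The table name cities is likewise only a placeholder.

```python
def compose_select(table, fields=None, where_clause=None, sql_clause=(None, None)):
    """Build an illustrative SELECT statement from from_featureclass-style arguments."""
    prefix, postfix = sql_clause
    columns = ", ".join(fields) if fields else "*"
    parts = ["SELECT"]
    if prefix:
        parts.append(prefix)       # a prefix such as DISTINCT goes before the columns
    parts.append(columns)
    parts.append(f"FROM {table}")
    if where_clause:
        parts.append(f"WHERE {where_clause}")
    if postfix:
        parts.append(postfix)      # a postfix such as ORDER BY goes at the end
    return " ".join(parts)


statement = compose_select(
    table="cities",
    fields=["st", "pop_class"],
    where_clause="st='ID'",
    sql_clause=("DISTINCT", "ORDER BY pop_class"),
)
print(statement)
```

Read this way, fields selects columns, where_clause selects rows, and the two halves of sql_clause wrap around the statement.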

Subset data for specific fields

# Subset for fields
fcls_flds = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                   fields=['st', 'pop_class'])
fcls_flds.shape

(3886, 3)

# Check head
fcls_flds.head(2)

	st	pop_class	SHAPE
0	ID	6	{"x": -12462673.7237, "y": 5384674.994099997, ...
1	ID	6	{"x": -12506251.314, "y": 5341537.793499999, "...

Subset using `where_clause`

Learn more about how to use where_clause here.

# Subset using where_clause
fcls_whr = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                  where_clause="st='ID' and pop_class=6")
fcls_whr.shape

(15, 51)

# Check head
fcls_whr.head(2)

	OBJECTID	age_10_14	age_15_19	age_20_24	age_25_34	age_35_44	age_45_54	age_55_64	age_5_9	age_65_74	...	placefips	pop2010	population	pop_class	renter_occ	st	stfips	vacant	white	SHAPE
0	1	1313	1058	734	2031	1767	1446	1136	1503	665	...	1601990	13816	15181	6	1271	ID	16	271	13002	{"x": -12462673.7237, "y": 5384674.994099997, ...
1	2	890	817	818	1799	1235	1330	1143	1099	721	...	1607840	11899	11946	6	1441	ID	16	318	9893	{"x": -12506251.314, "y": 5341537.793499999, "...

2 rows × 51 columns

Subset using `fields` and `where_clause`

# Subset using fields and where_clause
flds_whr = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                  fields=[
                                                      'st', 'pop_class', 'age_10_14', 'age_15_19'],
                                                  where_clause="st='ID' and pop_class=6")
flds_whr.shape

(15, 5)

# Check head
flds_whr.head(2)

	st	pop_class	age_10_14	age_15_19	SHAPE
0	ID	6	1313	1058	{"x": -12462673.7237, "y": 5384674.994099997, ...
1	ID	6	890	817	{"x": -12506251.314, "y": 5341537.793499999, "...

Subset using `sql_clause`

sql_clause can be combined with fields and where_clause to further subset the data. You can learn more about the allowed operations here. Now let's look at some examples.

Prefix `sql_clause` - DISTINCT operation

# Prefix Sql clause - DISTINCT operation
fcls_sql1 = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                   sql_clause=("DISTINCT pop_class", None))

# Check shape
fcls_sql1.shape

(3886, 51)

# Check head
fcls_sql1.head(2)

	OBJECTID	age_10_14	age_15_19	age_20_24	age_25_34	age_35_44	age_45_54	age_55_64	age_5_9	age_65_74	...	placefips	pop2010	population	pop_class	renter_occ	st	stfips	vacant	white	SHAPE
0	941	1247	1213	1043	2022	1692	2116	1827	1187	1037	...	0507330	15620	14771	6	3006	AR	05	1303	6216	{"x": -10006810.091, "y": 4290154.581699997, "...
1	1405	796	748	754	1999	1717	2062	1450	760	851	...	2466850	12677	13188	6	814	MD	24	281	11613	{"x": -8517714.7855, "y": 4744316.880199999, "...

2 rows × 51 columns

Postfix `sql_clause` with specific fields

Here, we will subset the data for the state and population class fields and apply a postfix clause.

# Postfix Sql clause with specific fields
fcls_sql2 = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                   fields=['st', 'pop_class'],
                                                   sql_clause=(None, "ORDER BY st, pop_class"))
# Check shape
fcls_sql2.shape

(3886, 3)

# Check head
fcls_sql2.head()

	st	pop_class	SHAPE
0	AK	6	{"x": -16417572.1606, "y": 9562359.403800003, ...
1	AK	6	{"x": -16455422.2224, "y": 9574022.0224, "spat...
2	AK	6	{"x": -16444303.0276, "y": 9568008.9705, "spat...
3	AK	6	{"x": -14962313.3618, "y": 8031014.926600002, ...
4	AK	6	{"x": -16657118.680399999, "y": 8746757.662600...

Prefix and Postfix `sql_clause` with specific fields and `where_clause`

Here, we will subset the data using where_clause, keep specific fields, and then apply both prefix and postfix clauses.

# Prefix and Postfix sql_clause
fcls_sql3_df = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                      fields=[
                                                          'st', 'name', 'pop_class', 'age_10_14'],
                                                      where_clause="st='ID'",
                                                      sql_clause=("DISTINCT pop_class", "ORDER BY name"))

# Check Shape
fcls_sql3_df.shape

(22, 5)

# Check head
fcls_sql3_df.head()

	st	name	pop_class	age_10_14	SHAPE
0	ID	Ammon	6	1313	{"x": -12462673.7237, "y": 5384674.994099997, ...
1	ID	Blackfoot	6	890	{"x": -12506251.314, "y": 5341537.793499999, "...
2	ID	Boise City	8	12750	{"x": -12938676.683600001, "y": 5403597.049500...
3	ID	Burley	6	790	{"x": -12667411.4024, "y": 5241722.820600003, ...
4	ID	Caldwell	7	3803	{"x": -12989383.6745, "y": 5413226.487300001, ...

Using `spatial_filter`

spatial_filter can be used to query the results by using a spatial relationship with another geometry. The spatial filtering is even more powerful when integrated with Geoenrichment. Let's use this approach to filter our results for the state of Idaho. In this example, we will:

use arcgis.geoenrichment.Country to derive the geometries for the state of Idaho.
use arcgis.geometry.filters.intersects(geometry, sr=None) to create a geometry filter object that filters results whose geometry intersects with the specified geometry (i.e. filter data points within the boundary of Idaho).
pass the geometry filter object to spatial_filter to get desired results.

Note: To perform enrichment operations, GeoEnrichment must be configured in your GIS organization. GeoEnrichment consumes credits, and you can learn more about credit consumption here.

# Basic Imports
from arcgis.geometry import Geometry
from arcgis.geometry.filters import intersects
from arcgis.geoenrichment import Country

# Create country object
usa = Country.get('US', gis=agol_gis)
type(usa)

arcgis.geoenrichment.enrichment.Country

# Get boundaries for Idaho
named_area_ID = usa.search(query='Idaho', layers=['US.States'])
display(named_area_ID[0])
named_area_ID[0].geometry.as_arcpy

<NamedArea name:"Idaho" area_id="16", level="US.States", country="147">

# Create spatial reference
sr_id = named_area_ID[0].geometry["spatialReference"]
sr_id

{'wkid': 4326, 'latestWkid': 4326}

# Construct a geometry filter using the filter geometry
id_state_filter = intersects(named_area_ID[0].geometry,
                             sr=sr_id)
type(id_state_filter)

dict

# Pass geometry filter object as a spatial_filter
fcls_spfl_df = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                      fields=[
                                                          'st', 'name', 'pop_class', 'age_10_14'],
                                                      spatial_filter=id_state_filter)
# Check shape
fcls_spfl_df.shape

(22, 5)

# Check head
fcls_spfl_df.head()

	st	name	pop_class	age_10_14	SHAPE
0	ID	Ammon	6	1313	{"x": -12462673.7237, "y": 5384674.994099997, ...
1	ID	Blackfoot	6	890	{"x": -12506251.314, "y": 5341537.793499999, "...
2	ID	Boise City	8	12750	{"x": -12938676.683600001, "y": 5403597.049500...
3	ID	Burley	6	790	{"x": -12667411.4024, "y": 5241722.820600003, ...
4	ID	Caldwell	7	3803	{"x": -12989383.6745, "y": 5413226.487300001, ...

The result shows the data points filtered for Idaho as defined by the spatial filter.

You can learn more about applying spatial filters in our Working with geometries guide series.

Read in DataFrame with Addresses

A SeDF can be easily created from a DataFrame with address information using the from_df() method. This method geocodes the addresses using the first configured geocoder in your GIS. The locations generated after geocoding are used as the geometry of the SeDF.

You can learn more about geocoding in our Finding Places with geocoding guide series.

Note: The from_df() method performs a batch geocoding operation which consumes credits. If a geocoder is not specified, then the first configured geocoder in your GIS organization will be used. Learn more about credit consumption here.

To avoid credit consumption, you may specify your own geocoder.

Let's look at an example of using from_df(). We will read addresses into a DataFrame using the pd.read_csv() method. Next, we will create a SeDF by passing the DataFrame and address column as parameters to the from_df() method.

# Read the csv file with address into a DataFrame
orders_df = pd.read_csv("./sedf_data/cities/orders.csv")

# Check head
orders_df.head()

	Address
0	602 Murray Cir, Sausalito, CA 94965
1	340 Stockton St, San Francisco, CA 94108
2	3619 Balboa St, San Francisco, CA 94121
3	1274 El Camino Real, San Bruno, CA 94066
4	625 Monterey Blvd, San Francisco, CA 94127

The DataFrame shows a column with address information.

# Use from_df to create SeDF
orders_sdf = pd.DataFrame.spatial.from_df(
    df=orders_df, address_column="Address")
orders_sdf.head()

	Address	SHAPE
0	602 Murray Cir, Sausalito, CA 94965	{"x": -122.47885242199999, "y": 37.83735920100...
1	340 Stockton St, San Francisco, CA 94108	{"x": -122.44955096499996, "y": 37.73152250200...
2	3619 Balboa St, San Francisco, CA 94121	{"x": -122.49772620499999, "y": 37.77567413500...
3	1274 El Camino Real, San Bruno, CA 94066	{"x": -122.40685153899994, "y": 37.78910429100...
4	625 Monterey Blvd, San Francisco, CA 94127	{"x": -122.42218381299995, "y": 37.63856151200...

# Check geometry type
orders_sdf.spatial.geometry_type

['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created from a Pandas DataFrame with address information.
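As noted earlier, from_df() uses the first configured geocoder unless one is specified. A minimal sketch of passing a specific geocoder is shown below. The helper name addresses_to_sedf and the geocoder URL in the commented usage are hypothetical, and the sketch assumes that from_df() accepts the geocoder through a geocoder argument, as the note above implies. Check the API reference for your version before relying on it.

```python
def addresses_to_sedf(df, geocoder_url, gis=None, address_column="Address"):
    """Geocode `df[address_column]` with a specific geocoder and return a SeDF."""
    # Imported here so the function can be defined even where arcgis is missing
    import pandas as pd
    from arcgis.features import GeoAccessor, GeoSeriesAccessor  # registers `.spatial`
    from arcgis.geocoding import Geocoder

    # Build a geocoder object from the service URL (assumed to be your own locator)
    geocoder = Geocoder(geocoder_url, gis=gis)
    return pd.DataFrame.spatial.from_df(
        df=df, address_column=address_column, geocoder=geocoder)


# Hypothetical usage with a locator service you host yourself:
# my_sdf = addresses_to_sedf(
#     orders_df, "https://example.com/server/rest/services/MyLocator/GeocodeServer",
#     gis=agol_gis)
```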

Read in DataFrame with Lat/Long Information

As we saw in part-1 of this guide series, a SeDF can be created from any Pandas DataFrame with location information (Latitude and Longitude) using the from_xy() method.

Let's look at an example. We will read the data with latitude and longitude information into a DataFrame using the pd.read_csv() method. Then, we will create a SeDF by passing the DataFrame, latitude, and longitude as parameters to the from_xy() method.

# Read the data
cms_df = pd.read_csv('./sedf_data/cities/sample_cms_data.csv')

# Return the first 5 records
cms_df.head()

	Provider Name	Provider City	Provider State	Residents Total Admissions COVID-19	Residents Total COVID-19 Cases	Residents Total COVID-19 Deaths	Number of All Beds	Total Number of Occupied Beds	LONGITUDE	LATITUDE
0	GROSSE POINTE MANOR	NILES	IL	5	56	12	99	61	-87.792973	42.012012
1	MILLER'S MERRY MANOR	DUNKIRK	IN	0	0	0	46	43	-85.197651	40.392722
2	PARKWAY MANOR	MARION	IL	0	0	0	131	84	-88.982944	37.750143
3	AVANTARA LONG GROVE	LONG GROVE	IL	6	141	0	195	131	-87.986442	42.160843
4	HARMONY NURSING & REHAB CENTER	CHICAGO	IL	19	75	16	180	116	-87.726353	41.975505

# Create a SeDF
cms_sedf = pd.DataFrame.spatial.from_xy(
    df=cms_df, x_column='LONGITUDE', y_column='LATITUDE', sr=4326)

# Check head
cms_sedf.head()

	Provider Name	Provider City	Provider State	Residents Total Admissions COVID-19	Residents Total COVID-19 Cases	Residents Total COVID-19 Deaths	Number of All Beds	Total Number of Occupied Beds	LONGITUDE	LATITUDE	SHAPE
0	GROSSE POINTE MANOR	NILES	IL	5	56	12	99	61	-87.792973	42.012012	{"spatialReference": {"wkid": 4326}, "x": -87....
1	MILLER'S MERRY MANOR	DUNKIRK	IN	0	0	0	46	43	-85.197651	40.392722	{"spatialReference": {"wkid": 4326}, "x": -85....
2	PARKWAY MANOR	MARION	IL	0	0	0	131	84	-88.982944	37.750143	{"spatialReference": {"wkid": 4326}, "x": -88....
3	AVANTARA LONG GROVE	LONG GROVE	IL	6	141	0	195	131	-87.986442	42.160843	{"spatialReference": {"wkid": 4326}, "x": -87....
4	HARMONY NURSING & REHAB CENTER	CHICAGO	IL	19	75	16	180	116	-87.726353	41.975505	{"spatialReference": {"wkid": 4326}, "x": -87....

The SHAPE column shows that a Spatially enabled DataFrame has been created from a Pandas DataFrame with latitude and longitude information.

Read in GeoPandas DataFrame

A SeDF can be easily created from a GeoPandas GeoDataFrame using the from_geodataframe() method. We will:

Import GeoPandas and create a GeoDataFrame.
Create a Spatially enabled DataFrame from a GeoDataFrame.

Create a GeoDataFrame

Here, we will create a GeoDataFrame from a Pandas DataFrame, cms_df, defined above.

# Import libraries
from geopandas import GeoDataFrame
from shapely.geometry import Point

# Read the data
cms_df = pd.read_csv('./sedf_data/cities/sample_cms_data.csv')

# Create a GeoDataFrame (the {'init': ...} CRS syntax is deprecated in recent versions)
gdf = GeoDataFrame(cms_df.drop(['LONGITUDE', 'LATITUDE'], axis=1),
                   crs='EPSG:4326',
                   geometry=[Point(xy) for xy in zip(cms_df.LONGITUDE, cms_df.LATITUDE)])
gdf.shape

(124, 9)

# Check head
gdf.head(2)

	Provider Name	Provider City	Provider State	Residents Total Admissions COVID-19	Residents Total COVID-19 Cases	Residents Total COVID-19 Deaths	Number of All Beds	Total Number of Occupied Beds	geometry
0	GROSSE POINTE MANOR	NILES	IL	5	56	12	99	61	POINT (-87.79297 42.01201)
1	MILLER'S MERRY MANOR	DUNKIRK	IN	0	0	0	46	43	POINT (-85.19765 40.39272)

A GeoDataFrame has been created with a geometry column that stores the geometry of the dataset.

Create a SeDF from GeoDataFrame

Here, we will create a SeDF from the gdf GeoDataFrame created above using the from_geodataframe() method.

# Create a SeDF
sedf_gpd = pd.DataFrame.spatial.from_geodataframe(gdf)
sedf_gpd.head(2)

	Provider Name	Provider City	Provider State	Residents Total Admissions COVID-19	Residents Total COVID-19 Cases	Residents Total COVID-19 Deaths	Number of All Beds	Total Number of Occupied Beds	SHAPE
0	GROSSE POINTE MANOR	NILES	IL	5	56	12	99	61	{"x": -87.792973, "y": 42.012012, "spatialRefe...
1	MILLER'S MERRY MANOR	DUNKIRK	IN	0	0	0	46	43	{"x": -85.197651, "y": 40.392722, "spatialRefe...

# Check geometry type
sedf_gpd.spatial.geometry_type

['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created from a GeoDataFrame.

Read in feather format data

A SeDF can be easily created from data in feather format using the from_feather() method. By default, the method uses SHAPE as the spatial_column that holds the geospatial information, but any other column with spatial information can be specified.

# Check head
cms_sedf.head(2)

	Provider Name	Provider City	Provider State	Residents Total Admissions COVID-19	Residents Total COVID-19 Cases	Residents Total COVID-19 Deaths	Number of All Beds	Total Number of Occupied Beds	LONGITUDE	LATITUDE	SHAPE
0	GROSSE POINTE MANOR	NILES	IL	5	56	12	99	61	-87.792973	42.012012	{"spatialReference": {"wkid": 4326}, "x": -87....
1	MILLER'S MERRY MANOR	DUNKIRK	IN	0	0	0	46	43	-85.197651	40.392722	{"spatialReference": {"wkid": 4326}, "x": -85....

# Create SeDF by reading from feather
sedf_fthr = pd.DataFrame.spatial.from_feather(
    './sedf_data/cities/sample_cms_data.feather')
sedf_fthr.head(2)

	Provider Name	Provider City	Provider State	Residents Total Admissions COVID-19	Residents Total COVID-19 Cases	Residents Total COVID-19 Deaths	Number of All Beds	Total Number of Occupied Beds	LONGITUDE	LATITUDE	SHAPE
0	GROSSE POINTE MANOR	NILES	IL	5	56	12	99	61	-87.792973	42.012012	{"x": -87.792973, "y": 42.012012, "spatialRefe...
1	MILLER'S MERRY MANOR	DUNKIRK	IN	0	0	0	46	43	-85.197651	40.392722	{"x": -85.197651, "y": 40.392722, "spatialRefe...

# Check geometry type
sedf_fthr.spatial.geometry_type

['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created from feather format data.
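The spatial_column parameter mentioned at the start of this section can also be passed explicitly. The sketch below wraps the call in a small helper; the helper name read_feather_as_sedf is hypothetical, and the ArcGIS import happens inside the function so that the definition itself does not require the package to be loaded up front.

```python
import pandas as pd


def read_feather_as_sedf(path, spatial_column="SHAPE"):
    """Read a feather file into a SeDF, naming the geometry column explicitly.

    Hypothetical convenience wrapper around from_feather(); it needs the
    ArcGIS API for Python to be installed when it is called.
    """
    # Importing these accessors registers the .spatial namespace on DataFrames
    from arcgis.features import GeoAccessor, GeoSeriesAccessor  # noqa: F401

    return pd.DataFrame.spatial.from_feather(path, spatial_column=spatial_column)
```

For the file used above, read_feather_as_sedf('./sedf_data/cities/sample_cms_data.feather') should be equivalent to the default call, since the geometry is stored in SHAPE. A file that keeps its geometry under a different column name would need that name passed as spatial_column.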

Read in Non-spatial Table data

Non-spatial table data can be hosted on ArcGIS Online or ArcGIS Enterprise, or it can be stored locally in a File Geodatabase. A SeDF can be easily created from such non-spatial table data using the following methods:

from_table() - for local data
from_layer() - for data hosted on ArcGIS Online or Enterprise

Using the from_table() method

A SeDF can be created from local non-spatial data using the from_table() method. The method can read a csv file (in any environment) or a table stored in a File Geodatabase (with ArcPy only).

Reading a csv file

# Create SeDF
tbl_df = pd.DataFrame.spatial.from_table(
    filename='./sedf_data/cities/sample_cms_data.csv')
tbl_df.head(2)

	Provider Name	Provider City	Provider State	Residents Total Admissions COVID-19	Residents Total COVID-19 Cases	Residents Total COVID-19 Deaths	Number of All Beds	Total Number of Occupied Beds	LONGITUDE	LATITUDE
0	GROSSE POINTE MANOR	NILES	IL	5	56	12	99	61	-87.792973	42.012012
1	MILLER'S MERRY MANOR	DUNKIRK	IN	0	0	0	46	43	-85.197651	40.392722

A Pandas DataFrame without any spatial information is returned.
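A quick way to confirm that such a result carries no geometry is to look for a SHAPE column. The sketch below uses pd.read_csv on two rows copied from the sample output as a stand-in for from_table(), purely so that it runs without the ArcGIS API; the variable names are illustrative.

```python
import io

import pandas as pd

# Two rows copied from the sample output; a stand-in for the CSV file on disk
csv_text = (
    "Provider Name,Provider City,Provider State,LONGITUDE,LATITUDE\n"
    "GROSSE POINTE MANOR,NILES,IL,-87.792973,42.012012\n"
    "MILLER'S MERRY MANOR,DUNKIRK,IN,-85.197651,40.392722\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# No SHAPE column means there is no geometry, only plain numeric coordinates
has_geometry_column = "SHAPE" in df.columns
coordinate_columns = [c for c in ("LONGITUDE", "LATITUDE") if c in df.columns]
print(has_geometry_column, coordinate_columns)
```

The LONGITUDE and LATITUDE values are ordinary floating-point columns here. They can be turned into geometry with the approach shown in the "Read in DataFrame with Lat/Long Information" section.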

Reading table from a File Geodatabase

Note: The operation below can only be performed in an environment that contains arcpy.

# Create SeDF
tbl_df2 = pd.DataFrame.spatial.from_table(
    filename="./sedf_data/cities/cities.gdb/cities_table_export")
tbl_df2.head(2)

	OBJECTID	NAME	OTHER	OWNER_OCC	PLACEFIPS	POP2010	POPULATION	POP_CLASS	RENTER_OCC	ST	STFIPS	VACANT	WHITE
0	1	Ammon	307	3205	1601990	13816	15181	6	1271	ID	16	271	13002
1	2	Blackfoot	1077	2788	1607840	11899	11946	6	1441	ID	16	318	9893

A Pandas DataFrame without any spatial information is returned.

Using the from_layer() method

A SeDF can be created from hosted non-spatial data using the from_layer() method.

tbl_item = agol_gis.content.get("019215fdda4b4b3eb5b4712f3b06f544")
tbl_item

sedf_major_cities_table

Table Layer by api_data_owner
Last Modified: September 30, 2024
0 comments, 3 views

# Get table url
tbl = tbl_item.tables[0]
tbl

<Table url:"https://services7.arcgis.com/JEwYeAy2cc8qOe3o/arcgis/rest/services/sedf_major_cities_table/FeatureServer/0">

# Create a DataFrame from the hosted table
tbl_df2 = pd.DataFrame.spatial.from_layer(tbl)
tbl_df2.head(2)

	OBJECTID	PLACEFIPS	POP2010	POPULATION	POP_CLASS	STFIPS	CLASS	ObjectId2
0	0	1601990	13816	15181	6	16	city	1
1	1	1607840	11899	11946	6	16	city	2

A Pandas DataFrame without any spatial information is returned.

Read in data from 'lite and portable' databases

Geospatial data stored in a mobile geodatabase (.geodatabase) or a SQLite database can be easily accessed using the Spatially enabled DataFrame.

A mobile geodatabase (.geodatabase) is a collection of various types of GIS datasets contained in a single file on disk that can store, query, and manage spatial and nonspatial data. Mobile geodatabases are stored in a SQLite database.
SQLite is a full-featured relational database that is portable and interoperable, which makes it ubiquitous in mobile app development.
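Because both formats are SQLite files, Python's built-in sqlite3 module can be used to see which tables a file contains before deciding what to read. The sketch below is a minimal illustration: the helper name list_tables and the throwaway demo database are assumptions made so the example runs anywhere, and a real mobile geodatabase may list additional system tables alongside the data tables.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def list_tables(db_path):
    """Return the names of the tables stored in a SQLite file (hypothetical helper)."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Build a throwaway database so the sketch does not depend on the guide's data
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.sqlite")
    with closing(sqlite3.connect(demo_path)) as conn:
        conn.execute("CREATE TABLE cities (objectid INTEGER PRIMARY KEY, name TEXT)")
        conn.commit()
    tables = list_tables(demo_path)

print(tables)
```

In SQLite, main is the schema name of the primary database in a connection, which is why a table named cities is addressed as main.cities.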

The from_featureclass() method can be used to create a SeDF by reading in data from these databases. Let's look at some examples.

Note: The operations below can only be performed in an environment that contains arcpy.

Read from a mobile geodatabase

# Reading from mobile geodatabase
mobile_gdb_df = pd.DataFrame.spatial.from_featureclass(
    location="./sedf_data/cities/cities_mobile.geodatabase/main.cities")
mobile_gdb_df.shape

(3886, 51)

# Check head
mobile_gdb_df.head(2)

	OBJECTID	age_10_14	age_15_19	age_20_24	age_25_34	age_35_44	age_45_54	age_55_64	age_5_9	age_65_74	...	placefips	pop2010	population	pop_class	renter_occ	st	stfips	vacant	white	SHAPE
0	1	1313	1058	734	2031	1767	1446	1136	1503	665	...	1601990	13816	15181	6	1271	ID	16	271	13002	{"x": -12462673.7237, "y": 5384674.994099997, ...
1	2	890	817	818	1799	1235	1330	1143	1099	721	...	1607840	11899	11946	6	1441	ID	16	318	9893	{"x": -12506251.314, "y": 5341537.793499999, "...

2 rows × 51 columns

# Check geometry type
mobile_gdb_df.spatial.geometry_type

['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created.

Read from a SQLite database

# Reading from sqlite database
sqlite_df = pd.DataFrame.spatial.from_featureclass(
    location="./sedf_data/cities/cities.sqlite/main.cities")
sqlite_df.shape

(3886, 51)

# Check head
sqlite_df.head(2)

	OBJECTID	age_10_14	age_15_19	age_20_24	age_25_34	age_35_44	age_45_54	age_55_64	age_5_9	age_65_74	...	placefips	pop2010	population	pop_class	renter_occ	st	stfips	vacant	white	SHAPE
0	1	1313	1058	734	2031	1767	1446	1136	1503	665	...	1601990	13816	15181	6	1271	ID	16	271	13002	{"x": -12462673.7237, "y": 5384674.994099997, ...
1	2	890	817	818	1799	1235	1330	1143	1099	721	...	1607840	11899	11946	6	1441	ID	16	318	9893	{"x": -12506251.314, "y": 5341537.793499999, "...

2 rows × 51 columns

# Check geometry type
sqlite_df.spatial.geometry_type

['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created.

Conclusion

In this guide, we explored how the Spatially enabled DataFrame (SeDF) can be used to read spatial data from various formats. We started by reading data from web feature layers and using the query() operation to optimize performance and results. We then read data from local sources such as a file geodatabase and a shapefile. Next, we explained how data with address or coordinate information, data in a GeoPandas GeoDataFrame, and data in feather format can be used to create a SeDF. We also discussed creating a SeDF from non-spatial table data. Finally, we showed how a SeDF can be created using data from lite and portable databases.

In the next part of the guide series, you will learn about exporting data using the Spatially enabled DataFrame.

Note: Given the importance and popularity of Spatially enabled DataFrame, we are revisiting our documentation for this topic. Our goal is to enhance the existing documentation to showcase the various capabilities of Spatially enabled DataFrame in detail with even more examples this time.

Creating quality documentation is time-consuming and exhaustive, but we are committed to providing you with the best experience possible. With that in mind, we will be rolling out the revamped guides on this topic as different parts of a guide series (like the Data Engineering or Geometry guide series). This is "part-2" of the guide series for Spatially Enabled DataFrame. You will continue to see the existing documentation as we revamp it to add new parts. Stay tuned for more on this topic.
