Part-3 Data IO with SeDF - Exporting Data

Introduction

In part-2 of this guide series, we saw how GIS data can be accessed from various data formats using the Spatially enabled DataFrame (SeDF). In this part, we will look at how a SeDF can be used to export data to various spatial and non-spatial formats. We will also explore how local data can be easily overwritten using a SeDF. Let's explore some of the different options available with the versatile Spatially enabled DataFrame.

The data used in this guide is provided as an item. We will start by importing some libraries, then download and extract the data needed for the analysis in this guide.

# Import Libraries
import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor
from arcgis.gis import GIS
from IPython.display import display
import zipfile
import os
import shutil

# Create a GIS connection
gis = GIS()
agol_gis = GIS("https://www.arcgis.com", "arcgis_python", "amazing_arcgis_123")

# Get the data item
data_item = gis.content.get('c7140ae3d7ae4fd0817181461019aa75')
data_item

sedf_guide_data
Data for Spatially enabled DataFrame Guides

Shapefile by api_data_owner
Last Modified: November 11, 2021
0 comments, 3 views

The cell below downloads and extracts the data from the data item to your machine.

# Download and extract the data
def unzip_data():
    """
    This function:
    - creates a directory `sedf_data` to download the data from the item
    - downloads the item as `sedf_guide_data.zip` file in the sedf_data directory
    - unzips and extracts the data to './sedf_data/cities'.
    """
    try:

        # path to downloaded data folder
        data_dir = os.path.join(os.getcwd(), 'sedf_data')

        # remove the existing data directory, if any, then create a fresh one
        if os.path.isdir(data_dir):
            shutil.rmtree(data_dir)
            print('Removed existing data directory')
        os.makedirs(data_dir)

        data_item.download(data_dir)    # download the data item
        # path to zipped file inside data folder
        zipped_file_path = os.path.join(data_dir, 'sedf_guide_data.zip')

        # unzip the data
        with zipfile.ZipFile(zipped_file_path, 'r') as zip_ref:
            zip_ref.extractall(data_dir)

        # path to new cities directory
        cities_dir = os.path.join(data_dir, 'cities')
        print(f'Dataset unzipped at: {os.path.relpath(cities_dir)}')

    except Exception as e:
        print(f'Error unzipping file: {e}')


# Extract data
unzip_data()

Dataset unzipped at: sedf_data\cities
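The extraction step can also be tried on its own, without a GIS connection. The sketch below is a minimal illustration that uses only the standard library: `extract_archive` is a hypothetical helper name, and the small archive built in a temporary folder is a stand-in for the downloaded item, not the actual guide data.

```python
import os
import tempfile
import zipfile


def extract_archive(zip_path, target_dir):
    """Extract zip_path into target_dir and return the archive member names."""
    os.makedirs(target_dir, exist_ok=True)
    # The context manager closes the archive even if extraction fails
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Stand-in for the downloaded item: build a tiny archive in a temporary folder
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_zip = os.path.join(tmp_dir, "demo_data.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("cities/readme.txt", "placeholder content")

    out_dir = os.path.join(tmp_dir, "out")
    extracted = extract_archive(demo_zip, out_dir)
    readme_exists = os.path.isfile(os.path.join(out_dir, "cities", "readme.txt"))
    print(extracted, readme_exists)
```

Opening the archive in a `with` block means it is closed even when `extractall()` raises, which an explicit `close()` call placed after the extraction would not guarantee.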

Create a SeDF

Here, we will create a SeDF and then export the data to various data formats.

gis = GIS()
item = gis.content.search(
    "USA Major Cities", item_type="Feature layer", outside_org=True)[0]
item

USA Major Cities
This layer presents the locations of cities within the United States with populations of approximately 10,000 or greater, all state capitals, and the national capital.

Feature Layer Collection by esri_dm
Last Modified: May 19, 2020
1 comments, 33,763,272 views

# Obtain the first feature layer from the item
flayer = item.layers[0]

# Use the `from_layer` static method in the 'spatial' namespace on the Pandas' DataFrame
sdf = pd.DataFrame.spatial.from_layer(flayer)

# Check shape
sdf.shape

(3886, 50)

# Check first few records
sdf.head()

	AGE_10_14	AGE_15_19	AGE_20_24	AGE_25_34	AGE_35_44	AGE_45_54	AGE_55_64	AGE_5_9	AGE_65_74	AGE_75_84	...	PLACEFIPS	POP2010	POPULATION	POP_CLASS	RENTER_OCC	SHAPE	ST	STFIPS	VACANT	WHITE
0	1313	1058	734	2031	1767	1446	1136	1503	665	486	...	1601990	13816	15181	6	1271	{"x": -12462673.723706163, "y": 5384674.994080...	ID	16	271	13002
1	890	817	818	1799	1235	1330	1143	1099	721	579	...	1607840	11899	11946	6	1441	{"x": -12506251.313993266, "y": 5341537.793529...	ID	16	318	9893
2	12750	13959	16966	32135	27048	29595	24177	12933	12176	7087	...	1608830	205671	225405	8	33359	{"x": -12938676.6836459, "y": 5403597.04949123...	ID	16	6996	182991
3	790	768	699	1445	1136	1134	935	959	679	464	...	1611260	10345	10727	6	1461	{"x": -12667411.402393516, "y": 5241722.820606...	ID	16	241	7984
4	3803	3779	3687	7571	5559	4744	3624	4397	2296	1222	...	1612250	46237	53942	7	5196	{"x": -12989383.674504517, "y": 5413226.487333...	ID	16	1428	35856

5 rows × 50 columns

# Check type of sdf
type(sdf)

pandas.core.frame.DataFrame

# Access spatial namespace
sdf.spatial.geometry_type

['point']

We can see that the dataset has 3886 records and 50 columns. Inspecting the type of sdf object and accessing the spatial namespace shows us that a Spatially enabled DataFrame has been created from all the data in the layer.

Writing GIS Data

The Spatially enabled DataFrame can export data to various data formats for use in other applications. Let's dive into the details of exporting GIS data to various sources.

Publish as a Feature Layer

Data in a Spatially enabled DataFrame can be exported to Feature layers hosted on ArcGIS Online or ArcGIS Enterprise using the to_featurelayer() method.

Let's export the sdf DataFrame, created above, to a feature layer stored in an ArcGIS Online organization.

# Export to feature layer
lyr = sdf.spatial.to_featurelayer('census_cities_export', gis=agol_gis)
lyr

census_cities_export

Feature Layer Collection by arcgis_python
Last Modified: November 12, 2021
0 comments, 0 views

# Check type
type(lyr.layers[0])

arcgis.features.layer.FeatureLayer

The census_cities_export feature layer has been created at the ArcGIS Online connection specified.

Write to JSON based formats

Data in a Spatially enabled DataFrame can be exported to JSON based formats, such as FeatureSet or FeatureCollection, using the to_featureset() and to_feature_collection() methods. Let's take a look.

Write to FeatureSet

The to_featureset() method can be used to export data from a SeDF into a FeatureSet.

# Write to FeatureSet
fset_exp = sdf.spatial.to_featureset()

# Check type
type(fset_exp)

arcgis.features.feature.FeatureSet

A FeatureSet object has been created from the data in the SeDF.
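To give a feel for what "JSON based" means here, the sketch below builds a small dictionary in the general shape of Esri JSON for point features (an `attributes` dictionary plus a `geometry` with `x` and `y` per feature). It is an illustration only, not the library's implementation: `rows_to_esri_json` is a hypothetical helper, the rows are rounded placeholder values, and the `wkid` of 102100 is an assumption based on the Web Mercator style coordinates shown above.

```python
import json


def rows_to_esri_json(rows, x_field, y_field, wkid):
    """Build an Esri JSON style feature set dictionary from point rows."""
    features = []
    for row in rows:
        # Everything except the coordinate fields becomes an attribute
        attributes = {k: v for k, v in row.items() if k not in (x_field, y_field)}
        features.append({
            "attributes": attributes,
            "geometry": {"x": row[x_field], "y": row[y_field]},
        })
    return {
        "geometryType": "esriGeometryPoint",
        "spatialReference": {"wkid": wkid},  # assumed spatial reference
        "features": features,
    }


# Illustrative rows only; the coordinates are rounded placeholder values
rows = [
    {"NAME": "Ammon", "ST": "ID", "x": -12462673.7, "y": 5384675.0},
    {"NAME": "Blackfoot", "ST": "ID", "x": -12506251.3, "y": 5341537.8},
]
feature_set = rows_to_esri_json(rows, "x", "y", wkid=102100)
print(json.dumps(feature_set["features"][0], indent=2))
```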

Write to FeatureCollection

The to_feature_collection() method can be used to export data from a SeDF into a FeatureCollection.

# Write to FeatureCollection
fc_exp = sdf.spatial.to_feature_collection()

# Check type
type(fc_exp)

arcgis.features.feature.FeatureCollection

A FeatureCollection object has been created from the data in the SeDF.

Write to a local file

Data in a Spatially enabled DataFrame can be exported to local spatial file formats, such as Feature classes or shapefiles, and non-spatial formats, such as csv files or tables. Let's take a look.

Write to local databases

The to_featureclass() method can be used to export spatial data from a SeDF into various local databases, such as a File geodatabase, a Mobile geodatabase (.geodatabase), or a SQLite Database.

File Geodatabase

Note: In the absence of arcpy, the Fiona package must be present in your current conda environment to perform this operation.

# Export to a feature class in File Geodatabase
sdf.spatial.to_featureclass(
    location="./sedf_data/cities/cities.gdb/major_cities_export")

'C:\\Users\\mohi9282\\Documents\\sedf_guides\\sedf_data\\cities\\cities.gdb\\major_cities_export'

A Feature Class has been created in a File Geodatabase from the data in the SeDF.
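Since several operations in this guide depend on whether arcpy or Fiona is present, it can help to check the environment before exporting. The following is a minimal sketch using only the standard library; `detect_geometry_backend` is a hypothetical helper name, and finding a package this way does not guarantee that it imports cleanly (for example, arcpy also needs a valid license).

```python
import importlib.util


def detect_geometry_backend():
    """Return 'arcpy', 'fiona', or 'none' depending on what is installed."""
    for name in ("arcpy", "fiona"):
        # find_spec locates a top-level package without importing it
        if importlib.util.find_spec(name) is not None:
            return name
    return "none"


backend = detect_geometry_backend()
print(f"Available backend: {backend}")
```

A result of `'none'` suggests that the local export examples in this section are unlikely to work in the current environment.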

Mobile Geodatabase

Note: This operation can only be performed in an environment that contains arcpy.

# Export to a feature class in Mobile Geodatabase
sdf.spatial.to_featureclass(
    location="./sedf_data/cities/cities_mobile.geodatabase/major_cities_export")

'C:\\Users\\mohi9282\\Documents\\sedf_guides\\sedf_data\\cities\\cities_mobile.geodatabase\\main.major_cities_export'

A Feature Class has been created in a Mobile Geodatabase from the data in the SeDF.

SQLite Database

Note: This operation can only be performed in an environment that contains arcpy.

# Export to a feature class in SQLite Database
sdf.spatial.to_featureclass(
    location="./sedf_data/cities/cities.sqlite/major_cities_export")

'C:\\Users\\mohi9282\\Documents\\sedf_guides\\sedf_data\\cities\\cities.sqlite\\major_cities_export'

A Feature Class has been created in a SQLite Database from the data in the SeDF.

Write to a shapefile

The to_featureclass() method can also be used to export spatial data from a SeDF into a shapefile.

Note: In the absence of arcpy, the Fiona package must be present in your current conda environment to perform this operation.

# Export to a shapefile
sdf.spatial.to_featureclass(
    location="./sedf_data/cities/major_cities_export.shp")

'C:\\Users\\mohi9282\\Documents\\sedf_guides\\sedf_data\\cities\\major_cities_export.shp'

A Shapefile has been created from the data in the SeDF.
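In all of the examples above, the output format follows from the `location` path alone. The sketch below summarizes that pattern with a hypothetical helper, `describe_location`, which maps the container suffixes used in this guide to a readable label. It only inspects the path string and is not how `to_featureclass()` itself decides what to write.

```python
from pathlib import PurePath

# Container suffixes taken from the examples in this guide
WORKSPACE_KINDS = {
    ".gdb": "file geodatabase",
    ".geodatabase": "mobile geodatabase",
    ".sqlite": "SQLite database",
}


def describe_location(location):
    """Return a label for the kind of output a location path points to."""
    # Normalize Windows separators so the same logic works on any platform
    path = PurePath(location.replace("\\", "/"))
    if path.suffix.lower() == ".shp":
        return "shapefile"
    # Look for a database container among the parent folders
    for part in path.parts[:-1]:
        kind = WORKSPACE_KINDS.get(PurePath(part).suffix.lower())
        if kind:
            return kind
    return "unknown"


for loc in (
    "./sedf_data/cities/cities.gdb/major_cities_export",
    "./sedf_data/cities/cities_mobile.geodatabase/major_cities_export",
    "./sedf_data/cities/cities.sqlite/major_cities_export",
    "./sedf_data/cities/major_cities_export.shp",
):
    print(f"{loc} -> {describe_location(loc)}")
```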

Write to Non-spatial formats

The to_table() method can be used to export data from a SeDF into non-spatial formats, such as csv files or tables.

Write to a csv file

# Export to a csv file
sdf.spatial.to_table(location="./sedf_data/cities/cities_table_export.csv")

'./sedf_data/cities/cities_table_export.csv'

A csv file has been created from the data in the SeDF.

Write to a table in a File Geodatabase

Note: The operation below can only be performed in an environment that contains arcpy.

# Export to a table in a File Geodatabase
sdf.spatial.to_table(
    location="./sedf_data/cities/cities.gdb/cities_table_export")

'C:\\Users\\mohi9282\\Documents\\sedf_guides\\sedf_data\\cities\\cities.gdb\\cities_table_export'

A table has been created in a File Geodatabase from the data in the SeDF.

Overwriting GIS Data

The GIS data stored locally can be easily overwritten using the Spatially enabled DataFrame. Let's take a look.

Overwrite a Featureclass

Because overwrite=True is the default in the to_featureclass() method, exporting to an existing location replaces the existing feature class with the data in the SeDF.

The major_cities_export featureclass was created in a section above using sdf. We will overwrite this featureclass with a subset of the data from sdf.

# Subset the data
sub_df = sdf.iloc[:10, -13:].copy()
sub_df.shape

(10, 13)

# Check head
sub_df.head(2)

	NAME	OTHER	OWNER_OCC	PLACEFIPS	POP2010	POPULATION	POP_CLASS	RENTER_OCC	SHAPE	ST	STFIPS	VACANT	WHITE
0	Ammon	307	3205	1601990	13816	15181	6	1271	{"x": -12462673.723706163, "y": 5384674.994080...	ID	16	271	13002
1	Blackfoot	1077	2788	1607840	11899	11946	6	1441	{"x": -12506251.313993266, "y": 5341537.793529...	ID	16	318	9893

Note: In the absence of arcpy, the Fiona package must be present in your current conda environment to perform this operation.

# Export sub_df to the existing major_cities_export featureclass
sub_df.spatial.to_featureclass(
    location="./sedf_data/cities/cities.gdb/major_cities_export", overwrite=True)

'C:\\Users\\mohi9282\\Documents\\sedf_guides\\sedf_data\\cities\\cities.gdb\\major_cities_export'

# Check if the featureclass is updated
fc_new_df = pd.DataFrame.spatial.from_featureclass(
    location="./sedf_data/cities/cities.gdb/major_cities_export")
fc_new_df.shape

(10, 14)

The featureclass has been overwritten with new data.

Overwrite a table

Because overwrite=True is the default in the to_table() method, exporting to an existing location replaces the existing non-spatial table with the data in the SeDF.

The cities_table_export table was created in a section above using sdf. We will overwrite this table with a subset of the data sub_df defined above.

Table in a csv file

# Export sub_df to an existing cities_table_export.csv file
sub_df.spatial.to_table(
    location="./sedf_data/cities/cities_table_export.csv", overwrite=True)

'./sedf_data/cities/cities_table_export.csv'

# Check if the csv file is updated
tbl_new_df = pd.DataFrame.spatial.from_table(
    filename="./sedf_data/cities/cities_table_export.csv")
tbl_new_df.shape

(10, 14)

The csv file has been overwritten with new data.
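Overwriting replaces the previous contents, so you may want to keep a copy of a file before exporting over it. The sketch below shows one possible approach for single files such as a csv, using only the standard library. `backup_if_exists` and the `.bak` suffix are hypothetical choices, and the temporary file stands in for the exported csv.

```python
import os
import shutil
import tempfile


def backup_if_exists(path, suffix=".bak"):
    """Copy path to path + suffix when it exists; return the backup path or None."""
    if not os.path.isfile(path):
        return None
    backup_path = path + suffix
    shutil.copy2(path, backup_path)  # copy2 also preserves file metadata
    return backup_path


with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "cities_table_export.csv")

    # Nothing to back up before the first export
    first_backup = backup_if_exists(csv_path)

    # Stand-in for an earlier export
    with open(csv_path, "w", encoding="utf-8") as f:
        f.write("NAME,ST\nAmmon,ID\n")

    # Back up the existing file before it would be overwritten
    second_backup = backup_if_exists(csv_path)
    with open(second_backup, encoding="utf-8") as f:
        backup_text = f.read()

print(first_backup, os.path.basename(second_backup))
```

This approach does not apply to tables or feature classes inside a geodatabase, which are not stored as single files.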

Table in a File Geodatabase

Note: The operations below can only be performed in an environment that contains arcpy.

# Export sub_df to an existing table in a File Geodatabase
sub_df.spatial.to_table(
    location="./sedf_data/cities/cities.gdb/cities_table_export")

'C:\\Users\\mohi9282\\Documents\\sedf_guides\\sedf_data\\cities\\cities.gdb\\cities_table_export'

# Check if the table file is updated
tbl_new_df2 = pd.DataFrame.spatial.from_table(
    filename="./sedf_data/cities/cities.gdb/cities_table_export")
tbl_new_df2.shape

(10, 13)

The table file has been overwritten with new data.

Memory-based Workspace

Writing geoprocessing outputs to memory is an alternative to writing output to a geodatabase or file-based format. It is often significantly faster than writing to on-disk formats. Data written into memory is temporary and is deleted when the application is closed, so it is an ideal location to write intermediate data.

ArcGIS provides two memory-based workspaces where geoprocessing outputs can be written.

- memory: the newer memory-based workspace, developed for ArcGIS Pro, that supports output feature classes, tables, and raster datasets.
- in_memory: the legacy memory-based workspace, built for ArcMap, that supports output feature classes, tables, and raster datasets.

Let's look at an example of writing to a memory workspace. Here, we will:

- write data from a SeDF to a memory workspace.
- use the data in the memory workspace to generate buffers and export the results to another memory workspace.
- see how results in a memory workspace can be converted to a featureclass.
- delete the datasets from the memory workspace.

Caution:

- Memory-based workspaces do not support geodatabase elements, such as feature datasets, representations, topologies, geometric networks, or network datasets.

- Folders cannot be created in memory-based workspaces.

- Since memory-based workspaces are stored in your system's physical memory, or RAM, your system may run low on memory if you write large datasets into the workspace. This can negatively impact processing performance.

Note: The operations below can only be performed in an environment that contains arcpy.

# Import arcpy
import arcpy

# Check head
sub_df.head(2)

	NAME	OTHER	OWNER_OCC	PLACEFIPS	POP2010	POPULATION	POP_CLASS	RENTER_OCC	SHAPE	ST	STFIPS	VACANT	WHITE
0	Ammon	307	3205	1601990	13816	15181	6	1271	{"x": -12462673.723706163, "y": 5384674.994080...	ID	16	271	13002
1	Blackfoot	1077	2788	1607840	11899	11946	6	1441	{"x": -12506251.313993266, "y": 5341537.793529...	ID	16	318	9893

# Write data from SeDF to a memory workspace.
sub_df.spatial.to_featureclass(r"memory\sub_df")

'memory\\sub_df'

# Use data in memory to generate buffers, exporting output to memory
arcpy.Buffer_analysis(in_features=r"memory\sub_df",
                      out_feature_class=r"memory\subBuffers",
                      buffer_distance_or_field=1)

Output

memory\subBuffers

Messages

Start Time: Friday, November 12, 2021 12:14:14 PM
Succeeded at Friday, November 12, 2021 12:14:15 PM (Elapsed Time: 0.08 seconds)

# Read buffer output into a SeDF
buffered_df = pd.DataFrame.spatial.from_featureclass(r"memory\subBuffers")
buffered_df.shape

(10, 16)

# Check head
buffered_df.head(2)

	OBJECTID	name	other	owner_occ	placefips	pop2010	population	pop_class	renter_occ	st	stfips	vacant	white	BUFF_DIST	ORIG_FID	SHAPE
0	1	Ammon	307	3205	1601990	13816	15181	6	1271	ID	16	271	13002	1.0	1	{"curveRings": [[[-12462673.7237, 5384675.9940...
1	2	Blackfoot	1077	2788	1607840	11899	11946	6	1441	ID	16	318	9893	1.0	2	{"curveRings": [[[-12506251.314, 5341538.79349...

# Convert buffer results to a featureclass
arcpy.Dissolve_management(r"memory\subBuffers",
                          "./sedf_data/cities/cities.gdb/memBuffers2")

Output

.\sedf_data\cities\cities.gdb\memBuffers2

Messages

Start Time: Friday, November 12, 2021 12:19:46 PM
Dissolving...
Succeeded at Friday, November 12, 2021 12:19:46 PM (Elapsed Time: 0.47 seconds)

# Delete the in-memory item
arcpy.Delete_management(r"memory\sub_df")

Output

true

Messages

Start Time: Friday, November 12, 2021 12:20:45 PM
Succeeded at Friday, November 12, 2021 12:20:45 PM (Elapsed Time: 0.00 seconds)

# Delete the in-memory item
arcpy.Delete_management(r"memory\subBuffers")

Output

true

Messages

Start Time: Friday, November 12, 2021 12:20:47 PM
Succeeded at Friday, November 12, 2021 12:20:47 PM (Elapsed Time: 0.00 seconds)
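Because data in a memory workspace occupies RAM until it is deleted, the cleanup step is easy to forget when an intermediate step fails. One way to make it harder to skip is a small context manager, sketched below. `temporary_datasets` is a hypothetical helper, and the delete function is passed in as an argument so that the sketch runs without arcpy; in an arcpy environment you could pass `arcpy.Delete_management` instead.

```python
from contextlib import contextmanager


@contextmanager
def temporary_datasets(delete_func, *dataset_paths):
    """Yield the dataset paths, then delete each one, even if an error occurred."""
    try:
        yield dataset_paths
    finally:
        for path in dataset_paths:
            delete_func(path)


# Stand-in delete function (a list's append) so the sketch runs without arcpy
deleted = []
with temporary_datasets(deleted.append, r"memory\sub_df", r"memory\subBuffers") as paths:
    # Intermediate processing on the in-memory datasets would go here
    print(f"Working with: {paths}")

print(f"Deleted: {deleted}")
```

With this pattern, the deletion calls run when the `with` block exits, whether the processing inside it succeeded or raised an exception.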

Conclusion

In this guide, we explored how the Spatially enabled DataFrame (SeDF) can be used to export spatial data to various formats. We started by exporting the data to web feature layers and to in-memory JSON based formats, such as FeatureSet and FeatureCollection. Next, we explored writing the data to various local data sources, such as a file geodatabase, a mobile geodatabase, a SQLite database, and a shapefile. We also discussed exporting the data to non-spatial formats, such as a csv file or a table. We then saw how data in local formats, such as a feature class or a table in a File Geodatabase, can be overwritten using a SeDF. Towards the end, we discussed how the data from a SeDF can be exported to memory-based workspaces.

In the next part of the guide series, you will learn about the various properties of a SeDF and how they can be used to pre-process a SeDF.

Note: Given the importance and popularity of Spatially enabled DataFrame, we are revisiting our documentation for this topic. Our goal is to enhance the existing documentation to showcase the various capabilities of Spatially enabled DataFrame in detail with even more examples this time.

Creating quality documentation is time-consuming and exhaustive, but we are committed to providing you with the best experience possible. With that in mind, we will be rolling out the revamped guides on this topic as different parts of a guide series (like the Data Engineering or Geometry guide series). This is "part-3" of the guide series for Spatially Enabled DataFrame. You will continue to see the existing documentation as we revamp it to add new parts. Stay tuned for more on this topic.