Part-3 Data IO with SeDF - Exporting Data

Introduction

In part-2 of this guide series, we saw how GIS data can be read from various data formats using the Spatially enabled DataFrame (SeDF). In this part, we will look at how a SeDF can be used to export data to various spatial and non-spatial formats. We will also explore how local data can be overwritten using a SeDF. Let's explore some of the different options available with the versatile Spatially enabled DataFrame.

The data used in this guide is provided as an item. We will start by importing some libraries and then downloading and extracting the data needed for the analysis in this guide.

# Import Libraries
import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor
from arcgis.gis import GIS
from IPython.display import display
import zipfile
import os
import shutil
# Create a GIS connection
gis = GIS()
agol_gis = GIS("https://www.arcgis.com","arcgis_python","amazing_arcgis_123")
# Get the data item
data_item = gis.content.get('c7140ae3d7ae4fd0817181461019aa75')
data_item
sedf_guide_data
Data for Spatially enabled DataFrame Guides
Shapefile by api_data_owner
Last Modified: November 11, 2021
0 comments, 3 views

The cell below downloads and extracts the data from the data item to your machine.

# Download and extract the data
def unzip_data():
    """
    This function:
    - creates a directory `sedf_data` to download the data from the item
    - downloads the item as `sedf_guide_data.zip` file in the sedf_data directory
    - unzips and extracts the data to './sedf_data/cities'.
    """
    try:
        
        data_dir = os.path.join(os.getcwd(), 'sedf_data')    # path to downloaded data folder
        
        # remove the existing data directory, if any, then recreate it empty
        if os.path.isdir(data_dir):
            shutil.rmtree(data_dir)
            print('Removed existing data directory')
        os.makedirs(data_dir)
            
        data_item.download(data_dir)    # download the data item
        zipped_file_path = os.path.join(data_dir, 'sedf_guide_data.zip')    # path to zipped file inside data folder

        # unzip the data; the context manager closes the archive even on error
        with zipfile.ZipFile(zipped_file_path, 'r') as zip_ref:
            zip_ref.extractall(data_dir)
        
        cities_dir = os.path.join(data_dir, 'cities')    # path to new cities directory
        print(f'Dataset unzipped at: {os.path.relpath(cities_dir)}')
        
    except Exception as e:
        print(f'Error unzipping file: {e}')
        

# Extract data
unzip_data()
Dataset unzipped at: sedf_data\cities
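
The extraction step above can also be tried in isolation, without the data item. The sketch below is a minimal, standard-library-only illustration of the same unzip pattern; the helper name `extract_zip` and the demo file names are hypothetical and are not part of the guide's data.

```python
import os
import tempfile
import zipfile


def extract_zip(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the sorted member names."""
    os.makedirs(dest_dir, exist_ok=True)
    # The context manager closes the archive even if extraction fails
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(dest_dir)
        return sorted(zip_ref.namelist())


# Demonstration with a throwaway archive (hypothetical file names)
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("cities/readme.txt", "placeholder")
    members = extract_zip(demo_zip, os.path.join(tmp, "out"))
    extracted_ok = os.path.isfile(os.path.join(tmp, "out", "cities", "readme.txt"))

print(members, extracted_ok)
```

Returning the member names makes it easy to confirm that the expected `cities` folder is actually present in the archive before any later cell relies on it.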

Create a SeDF

Here, we will create a SeDF and then export the data to various data formats.

# Retrieve an item from ArcGIS Online using Item ID value
gis = GIS()
item = gis.content.get("85d0ca4ea1ca4b9abf0c51b9bd34de2e")
item
USA Major Cities
This layer presents the locations of cities within the United States with populations of approximately 10,000 or greater, all state capitals, and the national capital.
Feature Layer Collection by esri_dm
Last Modified: May 19, 2020
1 comments, 33,763,272 views
# Obtain the first feature layer from the item
flayer = item.layers[0]

# Use the `from_layer` static method in the 'spatial' namespace on the Pandas' DataFrame
sdf = pd.DataFrame.spatial.from_layer(flayer)

# Check shape
sdf.shape
(3886, 50)
# Check first few records
sdf.head()
   AGE_10_14  AGE_15_19  AGE_20_24  AGE_25_34  AGE_35_44  AGE_45_54  AGE_55_64  AGE_5_9  AGE_65_74  AGE_75_84  ...  PLACEFIPS  POP2010  POPULATION  POP_CLASS  RENTER_OCC                                              SHAPE  ST  STFIPS  VACANT   WHITE
0       1313       1058        734       2031       1767       1446       1136     1503        665        486  ...    1601990    13816       15181          6        1271  {"x": -12462673.723706163, "y": 5384674.994080...  ID      16     271   13002
1        890        817        818       1799       1235       1330       1143     1099        721        579  ...    1607840    11899       11946          6        1441  {"x": -12506251.313993266, "y": 5341537.793529...  ID      16     318    9893
2      12750      13959      16966      32135      27048      29595      24177    12933      12176       7087  ...    1608830   205671      225405          8       33359  {"x": -12938676.6836459, "y": 5403597.04949123...  ID      16    6996  182991
3        790        768        699       1445       1136       1134        935      959        679        464  ...    1611260    10345       10727          6        1461  {"x": -12667411.402393516, "y": 5241722.820606...  ID      16     241    7984
4       3803       3779       3687       7571       5559       4744       3624     4397       2296       1222  ...    1612250    46237       53942          7        5196  {"x": -12989383.674504517, "y": 5413226.487333...  ID      16    1428   35856

5 rows × 50 columns

# Check type of sdf
type(sdf)
pandas.core.frame.DataFrame
# Access spatial namespace
sdf.spatial.geometry_type
['point']

We can see that the dataset has 3886 records and 50 columns. Inspecting the type of sdf object and accessing the spatial namespace shows us that a Spatially enabled DataFrame has been created from all the data in the layer.

Writing GIS Data

The Spatially enabled DataFrame can export data to various data formats for use in other applications. Let's dive into the details of exporting GIS data to various sources.

Publish as a Feature Layer

Data in a Spatially enabled DataFrame can be exported to Feature layers hosted on ArcGIS Online or ArcGIS Enterprise using the to_featurelayer() method.

Let's export the sdf DataFrame, created above, to a feature layer stored in an ArcGIS Online organization.

# Export to feature layer
lyr = sdf.spatial.to_featurelayer('census_cities_export', gis=agol_gis)
lyr
census_cities_export
Feature Layer Collection by arcgis_python
Last Modified: November 12, 2021
0 comments, 0 views
# Check type
type(lyr.layers[0])
arcgis.features.layer.FeatureLayer

The census_cities_export feature layer has been created at the ArcGIS Online connection specified.

Write to JSON based formats

Data in a Spatially enabled DataFrame can be exported to JSON based formats, such as FeatureSet or FeatureCollection, using the to_featureset() and to_feature_collection() methods. Let's take a look.

Write to FeatureSet

The to_featureset() method can be used to export data from a SeDF into a FeatureSet.

# Write to FeatureSet
fset_exp = sdf.spatial.to_featureset()
# Check type
type(fset_exp)
arcgis.features.feature.FeatureSet

A FeatureSet object has been created from the data in the SeDF.
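
To make "JSON based" concrete, the sketch below builds a small dictionary shaped like an Esri JSON feature set and serializes it with the standard library. It does not call the ArcGIS API for Python: the helper name, the two rows, and the spatial reference are hypothetical placeholders, and a real FeatureSet produced by `to_featureset()` may carry additional keys.

```python
import json


def rows_to_featureset_dict(rows, wkid):
    """Build a point feature-set style dictionary from (name, x, y) rows."""
    return {
        "geometryType": "esriGeometryPoint",
        "spatialReference": {"wkid": wkid},
        "fields": [
            {"name": "NAME", "type": "esriFieldTypeString", "alias": "NAME"}
        ],
        "features": [
            {"attributes": {"NAME": name}, "geometry": {"x": x, "y": y}}
            for name, x, y in rows
        ],
    }


# Hypothetical rows: placeholder longitude/latitude pairs, not values from the dataset
rows = [("City A", -112.0, 43.5), ("City B", -112.3, 43.2)]
fset_dict = rows_to_featureset_dict(rows, wkid=4326)  # assumed spatial reference

# Round-trip through JSON text to show the structure is plain JSON
fset_json = json.dumps(fset_dict)
n_features = len(json.loads(fset_json)["features"])
print(n_features)
```

Each feature pairs an `attributes` dictionary with a `geometry` dictionary, which is the same pairing visible in the SHAPE column of the SeDF shown earlier.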

Write to FeatureCollection

The to_feature_collection() method can be used to export data from a SeDF into a FeatureCollection.

# Write to FeatureCollection
fc_exp = sdf.spatial.to_feature_collection()
# Check type
type(fc_exp)
arcgis.features.feature.FeatureCollection

A FeatureCollection object has been created from the data in the SeDF.

Write to a local file

Data in a Spatially enabled DataFrame can be exported to local spatial file formats, such as Feature classes or shapefiles, and non-spatial formats, such as csv files or tables. Let's take a look.

Write to local databases

The to_featureclass() method can be used to export spatial data from a SeDF into various local databases, such as a File geodatabase, a Mobile geodatabase (.geodatabase), or a SQLite Database.

File Geodatabase
Note: In the absence of arcpy, the Fiona package must be present in your current conda environment to perform this operation.
# Export to a feature class in File Geodatabase
sdf.spatial.to_featureclass(location="./sedf_data/cities/cities.gdb/major_cities_export")
'C:\\Users\\mohi9282\\Documents\\sedf_guides\\sedf_data\\cities\\cities.gdb\\major_cities_export'

A Feature Class has been created in a File Geodatabase from the data in the SeDF.

Mobile Geodatabase
Note: This operation can only be performed in an environment that contains arcpy.
# Export to a feature class in Mobile Geodatabase
sdf.spatial.to_featureclass(location="./sedf_data/cities/cities_mobile.geodatabase/major_cities_export")
'C:\\Users\\mohi9282\\Documents\\sedf_guides\\sedf_data\\cities\\cities_mobile.geodatabase\\main.major_cities_export'

A Feature Class has been created in a Mobile Geodatabase from the data in the SeDF.

SQLite Database
Note: This operation can only be performed in an environment that contains arcpy.
# Export to a feature class in SQLite Database
sdf.spatial.to_featureclass(location="./sedf_data/cities/cities.sqlite/major_cities_export")
'C:\\Users\\mohi9282\\Documents\\sedf_guides\\sedf_data\\cities\\cities.sqlite\\major_cities_export'

A Feature Class has been created in a SQLite Database from the data in the SeDF.
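
The notes in the three sections above tie each database type to the packages that must be available. A minimal sketch of checking this up front, before calling `to_featureclass()`, might look like the following. The helper names and the lookup table are hypothetical; the table only restates the notes in this guide and is not an API of the ArcGIS library.

```python
import importlib.util
import os

# Workspace extension -> (format name, packages of which at least one is needed),
# as stated in the notes of this guide
_FORMATS = {
    ".gdb": ("File Geodatabase", ("arcpy", "fiona")),
    ".geodatabase": ("Mobile Geodatabase", ("arcpy",)),
    ".sqlite": ("SQLite Database", ("arcpy",)),
}


def describe_target(location):
    """Return (format_name, packages) for a feature class path inside a database."""
    workspace = os.path.dirname(os.path.normpath(location))
    ext = os.path.splitext(workspace)[1].lower()
    return _FORMATS.get(ext, ("Unknown", ()))


def can_write(location):
    """True if at least one package able to write this target is importable."""
    _, packages = describe_target(location)
    return any(importlib.util.find_spec(name) is not None for name in packages)


fgdb = describe_target("./sedf_data/cities/cities.gdb/major_cities_export")
mobile = describe_target("./sedf_data/cities/cities_mobile.geodatabase/major_cities_export")
print(fgdb, mobile)
```

Calling `can_write(location)` before an export gives a clear early signal instead of a failure partway through a notebook run.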

Write to a shapefile

The to_featureclass() method can also be used to export spatial data from a SeDF into a shapefile.

Note: In the absence of arcpy, the Fiona package must be present in your current conda environment to perform this operation.
# Export to a shapefile
sdf.spatial.to_featureclass(location="./sedf_data/cities/major_cities_export.shp")
'C:\\Users\\mohi9282\\Documents\\sedf_guides\\sedf_data\\cities\\major_cities_export.shp'

A Shapefile has been created from the data in the SeDF.
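
A shapefile is stored as several files that share a base name, of which `.shp`, `.shx`, and `.dbf` are mandatory. The sketch below shows one way to confirm that an export produced all three; the helper name is hypothetical, and the demo uses empty placeholder files rather than a real export.

```python
import os
import tempfile

REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the mandatory shapefile extensions that are absent on disk."""
    base, _ = os.path.splitext(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS if not os.path.isfile(base + ext)]


# Demonstration with empty placeholder files (not a valid shapefile)
with tempfile.TemporaryDirectory() as tmp:
    for ext in (".shp", ".dbf"):
        open(os.path.join(tmp, "demo" + ext), "w").close()
    missing = missing_shapefile_parts(os.path.join(tmp, "demo.shp"))

print(missing)
```

For the export above, the same check could be run as `missing_shapefile_parts("./sedf_data/cities/major_cities_export.shp")`, where an empty list means all mandatory parts exist.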

Write to Non-spatial formats

The to_table() method can be used to export data from a SeDF into non-spatial formats, such as csv files or tables.

Write to a csv file
# Export to a csv file
sdf.spatial.to_table(location="./sedf_data/cities/cities_table_export.csv")
'./sedf_data/cities/cities_table_export.csv'

A csv file has been created from the data in the SeDF.

Write to a table in a File Geodatabase
Note: The operation below can only be performed in an environment that contains arcpy.
# Export to a table in a File Geodatabase
sdf.spatial.to_table(location="./sedf_data/cities/cities.gdb/cities_table_export")
'C:\\Users\\mohi9282\\Documents\\sedf_guides\\sedf_data\\cities\\cities.gdb\\cities_table_export'

A table has been created in a File Geodatabase from the data in the SeDF.

Overwriting GIS Data

The GIS data stored locally can be easily overwritten using the Spatially enabled DataFrame. Let's take a look.

Overwrite a Featureclass

The default overwrite=True argument in the to_featureclass() method can be used to overwrite an existing feature class from the data in a SeDF.

The major_cities_export featureclass was created in a section above using sdf. We will overwrite this featureclass with a subset of the data from sdf.

# Subset the data
sub_df = sdf.iloc[:10,-13:].copy()
sub_df.shape
(10, 13)
# Check head
sub_df.head(2)
        NAME  OTHER  OWNER_OCC  PLACEFIPS  POP2010  POPULATION  POP_CLASS  RENTER_OCC                                              SHAPE  ST  STFIPS  VACANT  WHITE
0      Ammon    307       3205    1601990    13816       15181          6        1271  {"x": -12462673.723706163, "y": 5384674.994080...  ID      16     271  13002
1  Blackfoot   1077       2788    1607840    11899       11946          6        1441  {"x": -12506251.313993266, "y": 5341537.793529...  ID      16     318   9893
Note: In the absence of arcpy, the Fiona package must be present in your current conda environment to perform this operation.
# Export sub_df to the existing major_cities_export featureclass
sub_df.spatial.to_featureclass(location="./sedf_data/cities/cities.gdb/major_cities_export", overwrite=True)
'C:\\Users\\mohi9282\\Documents\\sedf_guides\\sedf_data\\cities\\cities.gdb\\major_cities_export'
# Check if the featureclass is updated
fc_new_df = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/major_cities_export")
fc_new_df.shape
(10, 14)

The featureclass has been overwritten with new data.

Overwrite a table

The default overwrite=True argument in the to_table() method can be used to overwrite an existing non-spatial table from the data in a SeDF.

The cities_table_export table was created in a section above using sdf. We will overwrite this table with a subset of the data sub_df defined above.

Table in a csv file

# Export sub_df to an existing cities_table_export.csv file
sub_df.spatial.to_table(location="./sedf_data/cities/cities_table_export.csv", overwrite=True)
'./sedf_data/cities/cities_table_export.csv'
# Check if the csv file is updated
tbl_new_df = pd.DataFrame.spatial.from_table(filename="./sedf_data/cities/cities_table_export.csv")
tbl_new_df.shape
(10, 14)

The csv file has been overwritten with new data.
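
The overwrite can also be confirmed without the spatial accessor by counting rows and columns with the standard library. This is a minimal sketch: the helper name `csv_shape` is hypothetical, the demo file is a throwaway, and UTF-8 encoding of the exported file is an assumption.

```python
import csv
import os
import tempfile


def csv_shape(path):
    """Return (data_rows, columns) for a csv file with a header row."""
    # UTF-8 is assumed here; adjust if the export uses another encoding
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        n_rows = sum(1 for _ in reader)
    return n_rows, len(header)


# Demonstration with a throwaway file (hypothetical values)
with tempfile.TemporaryDirectory() as tmp:
    demo_csv = os.path.join(tmp, "demo.csv")
    with open(demo_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["NAME", "ST", "POPULATION"])
        writer.writerows([["City A", "ID", 100], ["City B", "ID", 200]])
    shape = csv_shape(demo_csv)

print(shape)
```

Running `csv_shape("./sedf_data/cities/cities_table_export.csv")` before and after the overwrite would show the row count dropping to that of the subset.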

Table in a File Geodatabase

Note: The operations below can only be performed in an environment that contains arcpy.
# Export sub_df to an existing table in a File Geodatabase
sub_df.spatial.to_table(location="./sedf_data/cities/cities.gdb/cities_table_export")
'C:\\Users\\mohi9282\\Documents\\sedf_guides\\sedf_data\\cities\\cities.gdb\\cities_table_export'
# Check if the table file is updated
tbl_new_df2 = pd.DataFrame.spatial.from_table(filename="./sedf_data/cities/cities.gdb/cities_table_export")
tbl_new_df2.shape
(10, 13)

The table file has been overwritten with new data.

Memory-based Workspace

Writing geoprocessing outputs to memory is an alternative to writing output to a geodatabase or file-based format. It is often significantly faster than writing to on-disk formats. Data written into memory is temporary and is deleted when the application is closed, so it is an ideal location to write intermediate data.

ArcGIS provides two memory-based workspaces where geoprocessing outputs can be written.

  • memory - is a new memory-based workspace developed for ArcGIS Pro that supports output feature classes, tables, and raster datasets.
  • in_memory - is the legacy memory-based workspace built for ArcMap that supports output feature classes, tables, and raster datasets.

Let's look at an example of writing to a memory workspace. Here, we will:

  • write data from SeDF to a memory workspace.
  • use the data in the memory workspace to generate buffers and export the results to another memory workspace.
  • see how results in a memory workspace can be converted to a featureclass.
  • delete memory workspaces.
Caution:
    - Memory-based workspaces do not support geodatabase elements, such as feature datasets, representations, topologies, geometric networks, or network datasets.
    - Folders cannot be created in memory-based workspaces.
    - Since memory-based workspaces are stored in your system's physical memory, or RAM, your system may run low on memory if you write large datasets into the workspace. This can negatively impact processing performance.
Note: The operations below can only be performed in an environment that contains arcpy.
# Import arcpy
import arcpy
# Check head
sub_df.head(2)
        NAME  OTHER  OWNER_OCC  PLACEFIPS  POP2010  POPULATION  POP_CLASS  RENTER_OCC                                              SHAPE  ST  STFIPS  VACANT  WHITE
0      Ammon    307       3205    1601990    13816       15181          6        1271  {"x": -12462673.723706163, "y": 5384674.994080...  ID      16     271  13002
1  Blackfoot   1077       2788    1607840    11899       11946          6        1441  {"x": -12506251.313993266, "y": 5341537.793529...  ID      16     318   9893
# Write data from SeDF to a memory workspace.
sub_df.spatial.to_featureclass(r"memory\sub_df")
'memory\\sub_df'
# Use data in memory to generate buffers, exporting output to memory 
arcpy.Buffer_analysis(in_features=r"memory\sub_df", 
                      out_feature_class=r"memory\subBuffers", 
                      buffer_distance_or_field=1)

Output

memory\subBuffers

Messages

Start Time: Friday, November 12, 2021 12:14:14 PM
Succeeded at Friday, November 12, 2021 12:14:15 PM (Elapsed Time: 0.08 seconds)
# Read buffer output into a SeDF
buffered_df = pd.DataFrame.spatial.from_featureclass(r"memory\subBuffers")
buffered_df.shape
(10, 16)
# Check head
buffered_df.head(2)
   OBJECTID       name  other  owner_occ  placefips  pop2010  population  pop_class  renter_occ  st  stfips  vacant  white  BUFF_DIST  ORIG_FID                                              SHAPE
0         1      Ammon    307       3205    1601990    13816       15181          6        1271  ID      16     271  13002        1.0         1  {"curveRings": [[[-12462673.7237, 5384675.9940...
1         2  Blackfoot   1077       2788    1607840    11899       11946          6        1441  ID      16     318   9893        1.0         2  {"curveRings": [[[-12506251.314, 5341538.79349...
# Convert buffer results to a featureclass
arcpy.Dissolve_management(r"memory\subBuffers", "./sedf_data/cities/cities.gdb/memBuffers2")

Output

.\sedf_data\cities\cities.gdb\memBuffers2

Messages

Start Time: Friday, November 12, 2021 12:19:46 PM
Dissolving...
Succeeded at Friday, November 12, 2021 12:19:46 PM (Elapsed Time: 0.47 seconds)
# Delete the in-memory item
arcpy.Delete_management(r"memory\sub_df")

Output

true

Messages

Start Time: Friday, November 12, 2021 12:20:45 PM
Succeeded at Friday, November 12, 2021 12:20:45 PM (Elapsed Time: 0.00 seconds)
# Delete the in-memory item
arcpy.Delete_management(r"memory\subBuffers")

Output

true

Messages

Start Time: Friday, November 12, 2021 12:20:47 PM
Succeeded at Friday, November 12, 2021 12:20:47 PM (Elapsed Time: 0.00 seconds)
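
Because memory workspaces hold data in RAM until it is deleted, it can help to make the cleanup automatic. The sketch below wraps the delete step in a context manager so that it runs even if an intermediate step raises an error. The helper name `temporary_datasets` is hypothetical, and a plain list stands in for `arcpy.Delete_management` so that the sketch runs without arcpy.

```python
from contextlib import contextmanager


@contextmanager
def temporary_datasets(delete_func, *paths):
    """Yield dataset paths and delete each one on exit, even after an error."""
    try:
        yield paths
    finally:
        for path in paths:
            delete_func(path)


# Stand-in for arcpy.Delete_management so the sketch runs without arcpy
deleted = []
with temporary_datasets(deleted.append, r"memory\sub_df", r"memory\subBuffers") as (src, buf):
    pass  # geoprocessing calls that write to src and buf would go here

print(deleted)
```

In an environment with arcpy, passing `arcpy.Delete_management` as `delete_func` would replace the two separate delete cells above.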

Conclusion

In this guide, we explored how the Spatially enabled DataFrame (SeDF) can be used to export spatial data to various formats. We started by exporting the data to web feature layers and to in-memory JSON based formats, such as FeatureSet and FeatureCollection. Next, we explored writing the data to various local data sources, such as a file geodatabase, a mobile geodatabase, a SQLite database, and a shapefile. We also discussed exporting the data to non-spatial formats, such as a csv file or a table. We then showed how data in local formats, such as a feature class or a table in a File Geodatabase, can be overwritten using a SeDF. Finally, we discussed how data from a SeDF can be exported to memory-based workspaces.

In the next part of the guide series, you will learn about the various properties of a SeDF and how they can be used to pre-process a SeDF.

Note: Given the importance and popularity of Spatially enabled DataFrame, we are revisiting our documentation for this topic. Our goal is to enhance the existing documentation to showcase the various capabilities of Spatially enabled DataFrame in detail with even more examples this time.

Creating quality documentation is time-consuming and exhaustive, but we are committed to providing you with the best experience possible. With that in mind, we will be rolling out the revamped guides on this topic as different parts of a guide series (like the Data Engineering or Geometry guide series). This is "part-3" of the guide series for Spatially Enabled DataFrame. You will continue to see the existing documentation as we revamp it to add new parts. Stay tuned for more on this topic.
