Data sources

GeoAnalytics Engine supports loading data from and saving data to several common spatial data sources, in addition to the data sources supported by Spark. After importing the geoanalytics module, these spatial data sources can be accessed by setting the format when loading or saving a Spark DataFrame.

Most data sources support loading from a single file or from a folder containing multiple files. When loading from a folder, all files within the folder must be the same format and have the same schema. For example, the line below shows a DataFrame, df, being created from multiple shapefiles that are stored in a folder called hurricanes.

df = spark.read.format("shapefile").load("s3://my-data/hurricanes")

The same format() string can also be used to save. For example, the line below shows a DataFrame, df, being saved as a collection of shapefiles stored in HDFS.

df.write.format("shapefile").save("hdfs://nn1home:8020/hurricanes")

Because Spark is a distributed engine, multiple writers are used to save a single DataFrame, which results in a collection of files unless the data is explicitly coalesced to one partition before saving.
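As a minimal sketch of coalescing before saving, the helper below merges a DataFrame into one partition so that a single writer produces the output. The function name save_as_single_partition is hypothetical and not part of GeoAnalytics Engine; it relies only on the standard Spark DataFrame methods coalesce, write, format, and save, and assumes df is a Spark DataFrame.

```python
def save_as_single_partition(df, fmt, path):
    """Save df through a single writer (hypothetical helper).

    df   -- assumed to be a pyspark.sql.DataFrame
    fmt  -- a format string such as "shapefile"
    path -- the output location, for example an HDFS or S3 URI
    """
    # coalesce(1) merges the existing partitions into one without a full
    # shuffle, so only one writer task saves the data.
    single = df.coalesce(1)
    single.write.format(fmt).save(path)
```

For example, save_as_single_partition(df, "shapefile", "hdfs://nn1home:8020/hurricanes") would write the hurricanes data with one writer instead of many. Because all rows then pass through a single task, this is likely to be slow for large DataFrames, so it is probably best reserved for small outputs.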

The table below summarizes the spatial data sources available for loading and saving in ArcGIS GeoAnalytics Engine 1.4.x.

Data source        Format            Load   Save
CSV                csv               Yes    Yes
Feature service    feature-service   Yes    Yes
File geodatabase   filegdb           Yes    No
GeoJSON            geojson           Yes    Yes
GeoParquet         geoparquet        Yes    Yes
JDBC               jdbc              Yes    Yes
ORC                orc               Yes    Yes
Parquet            parquet           Yes    Yes
Esri shapefile     shapefile         Yes    Yes
Vector tiles       vector-tile       No     Yes
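The table can also be expressed as a small lookup to check a format string before calling load or save. This is only an illustrative sketch: SPATIAL_FORMATS and supports are hypothetical names, not part of the geoanalytics module, and the values are transcribed from the 1.4.x table above.

```python
# (can_load, can_save) per format string, transcribed from the table above.
SPATIAL_FORMATS = {
    "csv": (True, True),
    "feature-service": (True, True),
    "filegdb": (True, False),
    "geojson": (True, True),
    "geoparquet": (True, True),
    "jdbc": (True, True),
    "orc": (True, True),
    "parquet": (True, True),
    "shapefile": (True, True),
    "vector-tile": (False, True),
}


def supports(fmt, operation):
    """Return True if the format string supports "load" or "save"."""
    if operation not in ("load", "save"):
        raise ValueError(f"unknown operation: {operation!r}")
    if fmt not in SPATIAL_FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    can_load, can_save = SPATIAL_FORMATS[fmt]
    return can_load if operation == "load" else can_save
```

A check such as supports("filegdb", "save") returns False, which reflects that file geodatabases can be loaded but not saved, while supports("vector-tile", "load") returns False because vector tiles can only be saved.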

What next?

For more examples on how to read and save data formats, see the following tutorials:
