GeoAnalytics Engine supports loading data from and saving data to some common spatial data sources, in addition to the data sources supported by Spark.
After importing the geoanalytics module, these spatial data sources can be accessed by setting the format when loading or saving a Spark DataFrame.
Most data sources support loading from a single file or from a folder containing multiple files.
When loading from a folder, all files within the folder must be the same format and have the same schema.
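The loading pattern described above can be wrapped in a small helper. The sketch below is illustrative only and is not part of the GeoAnalytics Engine API: the name load_spatial is hypothetical, and it assumes that a SparkSession already exists and that the geoanalytics module has been imported so the spatial format strings are registered.

```python
def load_spatial(spark, fmt, path):
    """Load a single file, or a folder of files, into a Spark DataFrame.

    Hypothetical helper. `spark` is an existing SparkSession, `fmt` is a
    format string such as "shapefile", and `path` points to one file or to
    a folder whose files all share the same format and schema.
    """
    # format() selects the data source; load() reads the file or folder.
    return spark.read.format(fmt).load(path)
```

Because the path can be either a file or a folder, the same call covers both cases; only the format string changes between data sources.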
For example, the line below creates a DataFrame, df, from multiple shapefiles that are stored in a folder called hurricanes.
df = spark.read.format("shapefile").load("
The same format() string can also be used to save. For example, the line below saves a DataFrame, df, as a collection of shapefiles stored in HDFS.
df.write.format("shapefile").save("hdfs
Because Spark is a distributed engine, multiple writers are used to save a single DataFrame. This results in a collection of files unless the data is explicitly coalesced to one partition before saving.
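Coalescing before saving can be sketched as follows. This is a minimal example under stated assumptions: save_as_single_partition is a hypothetical helper name, df is an existing Spark DataFrame, and the output path is supplied by the caller. It relies only on the standard Spark DataFrame methods coalesce() and write.

```python
def save_as_single_partition(df, fmt, path):
    """Save `df` through a single writer (hypothetical helper).

    `fmt` is a format string such as "shapefile"; `path` is the output
    location chosen by the caller.
    """
    # coalesce(1) merges all partitions into one, so only one task writes
    # the data instead of one task per partition.
    single = df.coalesce(1)
    single.write.format(fmt).save(path)
```

Coalescing to one partition routes all rows through a single task, so it is usually only practical when the DataFrame is small enough for one executor to handle.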
The table below summarizes the spatial data sources available for loading and saving in ArcGIS GeoAnalytics Engine.
Data source | Format | Load | Save |
---|---|---|---|
CSV | csv | Yes | Yes |
Feature service | feature-service | Yes | Yes |
File geodatabase | filegdb | Yes | No |
GeoJSON | geojson | Yes | Yes |
GeoParquet | geoparquet | Yes | Yes |
JDBC | jdbc | Yes | Yes |
ORC | orc | Yes | Yes |
Parquet | parquet | Yes | Yes |
Esri shapefile | shapefile | Yes | Yes |
Vector tiles | vector-tile | No | Yes |
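The table can also be expressed as a lookup that a script checks before calling load or save. The sketch below is an illustration, not part of GeoAnalytics Engine: SPATIAL_FORMATS and supports are hypothetical names, and the values are transcribed from the table above.

```python
# Load/save support per format string, transcribed from the table above.
SPATIAL_FORMATS = {
    "csv": {"load": True, "save": True},
    "feature-service": {"load": True, "save": True},
    "filegdb": {"load": True, "save": False},
    "geojson": {"load": True, "save": True},
    "geoparquet": {"load": True, "save": True},
    "jdbc": {"load": True, "save": True},
    "orc": {"load": True, "save": True},
    "parquet": {"load": True, "save": True},
    "shapefile": {"load": True, "save": True},
    "vector-tile": {"load": False, "save": True},
}


def supports(fmt, operation):
    """Return True if format string `fmt` supports "load" or "save"."""
    if operation not in ("load", "save"):
        raise ValueError(f"operation must be 'load' or 'save', got {operation!r}")
    try:
        return SPATIAL_FORMATS[fmt][operation]
    except KeyError:
        # Only an unknown format string can reach this branch.
        raise ValueError(f"unknown format string: {fmt!r}") from None
```

For example, supports("filegdb", "save") returns False, because file geodatabases can be loaded but not saved, while supports("vector-tile", "save") returns True.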
What's next?
For more examples on how to read and save data formats, see the following tutorials: