Data sources

GeoAnalytics Engine supports loading data from and saving data to several common spatial data sources, in addition to the data sources supported by Spark. After importing the geoanalytics module, these spatial data sources can be accessed by setting the format when loading or saving a Spark DataFrame.

Most data sources support loading from a single file or from a folder containing multiple files. When loading from a folder, all files within the folder must be the same format and have the same schema. For example, the line below shows a DataFrame, df, being created from multiple shapefiles that are stored in a folder called hurricanes.

df = spark.read.format("shapefile").load("s3://my-data/hurricanes")

The same format() string can also be used to save. For example, the line below shows a DataFrame, df, being saved as a collection of shapefiles stored in HDFS.

df.write.format("shapefile").save("hdfs://nn1home:8020/hurricanes")

Because Spark is a distributed engine, multiple writers are used to save a single DataFrame, which results in a collection of files unless the data is explicitly coalesced to one writer when saving.
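As a minimal sketch of coalescing to one writer, the helper below calls Spark's DataFrame.coalesce(1) before saving, so that a single task writes the output. The function name save_as_single_shapefile is hypothetical and not part of GeoAnalytics Engine. The sketch assumes df is a Spark DataFrame and that the geoanalytics module has already been imported so the "shapefile" format is available.

```python
def save_as_single_shapefile(df, path):
    """Save a Spark DataFrame through one writer (hypothetical helper).

    Assumes `df` is a Spark DataFrame and the "shapefile" format
    has been registered by importing the geoanalytics module.
    """
    # coalesce(1) merges all partitions into one, so only a single
    # writer task produces output instead of one per partition.
    df.coalesce(1).write.format("shapefile").save(path)


# Example call (the path is taken from the text above):
# save_as_single_shapefile(df, "hdfs://nn1home:8020/hurricanes")
```

Routing all rows through one writer removes parallelism during the save, so this is likely best suited to DataFrames small enough for a single task to handle.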

Some data sources do not support both loading and saving. The table below summarizes current support for loading from and saving to spatial data sources in GeoAnalytics Engine.

Data source       Format            Load   Save
Esri shapefile    shapefile         Yes    Yes
Feature service   feature-service   Yes    No
Vector tiles      vector-tile       No     Yes
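One way to guard against using an unsupported operation is to encode the table in a small lookup and check it before calling load or save. The sketch below is illustrative only: the names SPATIAL_FORMAT_SUPPORT and is_supported are hypothetical, and the lookup covers only the rows listed in the table above.

```python
# Support matrix copied from the table above (hypothetical helper, not an Esri API).
SPATIAL_FORMAT_SUPPORT = {
    "shapefile": {"load": True, "save": True},
    "feature-service": {"load": True, "save": False},
    "vector-tile": {"load": False, "save": True},
}


def is_supported(fmt, operation):
    """Return True if `operation` ("load" or "save") is listed for `fmt`."""
    if operation not in ("load", "save"):
        raise ValueError(f"operation must be 'load' or 'save', got {operation!r}")
    if fmt not in SPATIAL_FORMAT_SUPPORT:
        raise ValueError(f"format not in the table above: {fmt!r}")
    return SPATIAL_FORMAT_SUPPORT[fmt][operation]


# Example: check before attempting to write a feature service.
if not is_supported("feature-service", "save"):
    print("feature-service can be loaded but not saved")
```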

What next?

For more examples on how to read and save data formats, see the following tutorials:
