GeoAnalytics Engine supports loading and saving data from some common spatial data sources in addition to the data sources supported by Spark. After importing the geoanalytics module, these spatial data sources can be accessed by setting the format when loading or saving a Spark DataFrame.
Most data sources support loading from a single file or from a folder containing multiple files. When loading from a folder, all files within the folder must be in the same format and have the same schema. For example, the line below shows a DataFrame, df, being created from multiple shapefiles stored in a folder called hurricanes.
import geoanalytics  # registers the spatial data sources, such as "shapefile"
df = spark.read.format("shapefile").load("s3://my-data/hurricanes")
The format() string can also be used when saving. For example, a DataFrame, df, can be saved as a collection of shapefiles stored in HDFS.
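A minimal sketch of that save, assuming `import geoanalytics` has already run so the "shapefile" format is registered. The helper name `save_as_shapefiles` and the HDFS path in the usage comment are hypothetical; only the standard Spark `DataFrameWriter` calls `format()` and `save()` are used.

```python
# Assumes `import geoanalytics` has already been executed in this Spark
# session so that the "shapefile" data source is available.


def save_as_shapefiles(df, path):
    """Write a Spark DataFrame to `path` using the "shapefile" format.

    `save_as_shapefiles` is a hypothetical helper for illustration only.
    Spark treats `path` as an output folder and writes the files under it.
    """
    df.write.format("shapefile").save(path)


# Example usage (hypothetical HDFS location):
# save_as_shapefiles(df, "hdfs:///data/hurricanes_out")
```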
Because Spark is a distributed engine, a single DataFrame is saved by multiple writers (typically one per partition), which results in a collection of files unless the data is explicitly coalesced to a single partition before saving.
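The coalescing step can be sketched as follows, again assuming `import geoanalytics` has already run. `save_as_single_writer` and the path in the usage comment are hypothetical names for illustration; `coalesce(1)` is the standard Spark `DataFrame` method that reduces the data to one partition so that only one task writes the output.

```python
# Assumes `import geoanalytics` has already been executed so the "shapefile"
# data source is registered.


def save_as_single_writer(df, path, fmt="shapefile"):
    """Coalesce `df` to one partition, then save it with a single writer.

    Without coalesce, each partition is written by its own task, producing a
    collection of files. coalesce(1) avoids a full shuffle, but all rows are
    then processed by one task, so it can be slow for large DataFrames.
    `save_as_single_writer` is a hypothetical helper for illustration only.
    """
    df.coalesce(1).write.format(fmt).save(path)


# Example usage (hypothetical HDFS location):
# save_as_single_writer(df, "hdfs:///data/hurricanes_single")
```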
Loading or saving is not supported for some data sources. The table below summarizes current support for loading from and saving to spatial data sources in GeoAnalytics Engine.
For more examples of how to load and save data in these formats, see the following tutorials: