Spark cluster mode allows you to configure Apache Spark on any number of nodes in a cluster of machines that you deploy. Computation is distributed across a cluster by a cluster manager. Standalone is the cluster manager included with Spark and is a simple way to get started. Other supported managers include Apache Mesos, Hadoop YARN, and Kubernetes.
GeoAnalytics Engine can be used with any cluster manager that is configured with a supported version of Spark. For a list of supported versions see Dependencies. There are three requirements common to every manager:
- The GeoAnalytics Engine jar must be configured with your Spark session using Spark runtime environment properties or command-line arguments. If you need to perform a transformation that requires supplementary projection data, the corresponding jars must also be configured with your Spark session.
- The GeoAnalytics Engine Python package must be installed on every node in the cluster, in the same Python environments that are configured with PySpark. The package is provided as both a zip file and a wheel file; the file type you should use depends on how you install the package on your cluster. For more information see Python Package Management.
- Several Spark properties must be set as shown in the table below before a Spark context is created (one way to set them is shown in the sketch after the table). For more information see Spark Configuration.
| Property | Value |
|---|---|
| spark.plugins | com.esri.geoanalytics.Plugin |
| spark.serializer | org.apache.spark.serializer.KryoSerializer |
| spark.kryo.registrator | com.esri.geoanalytics.KryoRegistrator |
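As a minimal sketch of one way to satisfy all three requirements at once, the required properties, the jar, and the Python package can be set on the SparkSession builder before the session is created. The file paths, the app name, and the choice of spark.jars and spark.submit.pyFiles (rather than, say, spark-submit command-line arguments) are illustrative assumptions; substitute the approach that fits your cluster.

```python
from pyspark.sql import SparkSession

# Sketch: set the required properties before any Spark context exists.
# The jar and zip paths below are hypothetical; substitute your own files.
spark = (
    SparkSession.builder
    .appName("geoanalytics-engine")
    .config("spark.plugins", "com.esri.geoanalytics.Plugin")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryo.registrator", "com.esri.geoanalytics.KryoRegistrator")
    # One way to make the GeoAnalytics Engine jar available to the cluster:
    .config("spark.jars", "/opt/geoanalytics/geoanalytics.jar")
    # One way to ship the Python package if it is not installed on each node:
    .config("spark.submit.pyFiles", "/opt/geoanalytics/geoanalytics.zip")
    .getOrCreate()
)
```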
Once the requirements listed above are met, you will be able to import geoanalytics in your PySpark session and authorize the module with a username and password, or a license file. For more information, see Authorization.
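For example, authorizing with a username and password might look like the following. Treat the credentials as placeholders and see the Authorization page for the full set of options, including license files.

```python
import geoanalytics

# Authorize the module with a username and password (placeholder values).
# A license file can be used instead; see the Authorization page.
geoanalytics.auth(username="my_user", password="my_password")
```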
Try out the API by importing the SQL functions as an easy-to-use alias like ST and listing the first 20 functions in a notebook cell:
```python
from geoanalytics.sql import functions as ST
spark.sql("show user functions like 'ST_*'").show()
```
What's next?
You can now use any SQL function, track function, or analysis tool in the geoanalytics module.
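As a quick illustration of calling a SQL function through the ST alias, the sketch below creates point geometries from a small DataFrame of coordinates. The sample data and column names are made up, and the sketch assumes ST.point corresponds to the ST_Point SQL function; check the SQL functions reference for exact names and signatures.

```python
from geoanalytics.sql import functions as ST

# A small DataFrame of longitude/latitude pairs (illustrative data).
df = spark.createDataFrame(
    [(-122.68, 45.52), (-111.93, 33.42)], ["longitude", "latitude"]
)

# Create a point geometry column from the coordinate columns,
# assuming ST.point maps to the ST_Point SQL function.
df.withColumn("geometry", ST.point("longitude", "latitude")).show()
```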
See Data sources and Using DataFrames to learn more about how to access your data from your notebook. Also see Visualize results to get started with viewing your data on a map. For examples of what else is possible with GeoAnalytics Engine, check out the sample notebooks, tutorials, and blog posts.