Spark Cluster Mode

Spark cluster mode allows you to configure Apache Spark on any number of nodes in a cluster of machines that you deploy. Computation is distributed across a cluster by a cluster manager. Standalone is the cluster manager included with Spark and is a simple way to get started. Other supported managers include Apache Mesos, Hadoop YARN, and Kubernetes.

GeoAnalytics Engine can be used with any cluster manager that is configured with a supported version of Spark. For a list of supported versions see Dependencies. There are three requirements common to every manager:

  • The GeoAnalytics Engine jar must be configured with your Spark session using Spark runtime environment properties or command line arguments. If you need to perform a transformation that requires supplementary projection data, the corresponding jars must also be configured with your Spark session.

  • The GeoAnalytics Engine Python package must be installed on every node in the cluster in the same Python environments that are configured with PySpark. The package is provided as both a zip file and a wheel file. The file type you should use depends on how you install the package on your cluster. For more information see Python Package Management.

  • Several Spark properties must be set as shown in the table below before a Spark context is created. For more information see Spark Configuration.

    Property                  Value
    spark.plugins             com.esri.geoanalytics.Plugin
    spark.serializer          org.apache.spark.serializer.KryoSerializer
    spark.kryo.registrator    com.esri.geoanalytics.KryoRegistrator
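
As a rough illustration of the jar and property requirements above, the following sketch shows two ways the values might be supplied: as spark-submit command line arguments, or on a SparkSession builder before the context is created. The helper names (spark_submit_args, create_session), the jar path, the application file name, and the master URL are all hypothetical placeholders, not part of GeoAnalytics Engine; only the three property names and values come from the table. It does not cover installing the Python package on the nodes.

```python
# Property names and values are taken from the table above.
GEOANALYTICS_CONF = {
    "spark.plugins": "com.esri.geoanalytics.Plugin",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "com.esri.geoanalytics.KryoRegistrator",
}


def spark_submit_args(jar_paths, app_path, master):
    """Build a spark-submit argument list (hypothetical helper).

    jar_paths: the GeoAnalytics Engine jar plus any supplementary
    projection data jars; --jars expects a comma-separated list.
    """
    args = ["spark-submit", "--master", master, "--jars", ",".join(jar_paths)]
    for key, value in GEOANALYTICS_CONF.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


def create_session(jar_paths, master):
    """Alternative: set the same values on a SparkSession builder.

    This only takes effect if no Spark context exists yet.
    """
    # Imported here so spark_submit_args can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master(master)
    builder = builder.config("spark.jars", ",".join(jar_paths))
    for key, value in GEOANALYTICS_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    # Placeholder paths and host name; substitute the values for your cluster.
    example = spark_submit_args(
        ["/opt/geoanalytics/geoanalytics.jar"],
        "my_job.py",
        "spark://master-host:7077",
    )
    print(" ".join(example))
```

Keeping the properties in one dictionary means the command line and builder variants cannot drift apart. Which variant fits depends on how your cluster manager launches applications.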

Once the requirements listed above are met, you can import geoanalytics in your PySpark session and authorize the module using a username and password, a token, or an authorization file. For more information, see Licensing and Authorization.

Try out the API by importing the SQL functions as an easy-to-use alias like ST and listing the first 20 functions in a notebook cell:

from geoanalytics.sql import functions as ST
spark.sql("show user functions like 'ST_*'").show()

What's next?

You can now use any SQL function or analysis tool in the geoanalytics module.

See Data sources and Using DataFrames to learn more about how to access your data from your notebook. Also see Visualize results to get started with viewing your data on a map. For examples of what else is possible with GeoAnalytics Engine, check out the sample notebooks, tutorials, and blog posts.
