Install GeoAnalytics Engine on Azure Synapse Analytics

Azure Synapse Analytics is an analytics service that brings together data integration, enterprise data warehousing, and big data analytics. By following the steps below, you can use GeoAnalytics Engine within a PySpark notebook hosted in Azure Synapse Analytics.

The table below summarizes the Azure Synapse runtimes supported by each version of GeoAnalytics Engine.

GeoAnalytics Engine	Azure Synapse
1.0.x-1.2.x	Runtime for Apache Spark 3.1, Runtime for Apache Spark 3.2, Runtime for Apache Spark 3.3
1.3.x	Runtime for Apache Spark 3.2, Runtime for Apache Spark 3.3
1.4.x	Runtime for Apache Spark 3.3, Runtime for Apache Spark 3.4
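To check a pool's runtime against this table from inside a notebook, you can encode the table as a small lookup. The following is a minimal sketch that only transcribes the table above. The names `SUPPORTED_RUNTIMES`, `supported_runtimes`, and `is_supported` are hypothetical helpers. In a running Synapse notebook, the Spark version string would come from `spark.version`.

```python
# Supported Azure Synapse runtimes per GeoAnalytics Engine release line,
# transcribed from the compatibility table above (hypothetical helper data).
SUPPORTED_RUNTIMES = {
    "1.0": ("3.1", "3.2", "3.3"),
    "1.1": ("3.1", "3.2", "3.3"),
    "1.2": ("3.1", "3.2", "3.3"),
    "1.3": ("3.2", "3.3"),
    "1.4": ("3.3", "3.4"),
}


def supported_runtimes(engine_version: str) -> tuple:
    """Return the Spark runtimes supported by an engine version such as '1.4.0'."""
    major_minor = ".".join(engine_version.split(".")[:2])
    if major_minor not in SUPPORTED_RUNTIMES:
        raise ValueError(f"Unknown GeoAnalytics Engine version: {engine_version}")
    return SUPPORTED_RUNTIMES[major_minor]


def is_supported(engine_version: str, spark_version: str) -> bool:
    """Check a Spark version string (for example, spark.version) against the table."""
    # Only the major.minor part of the Spark version identifies the Synapse runtime
    runtime = ".".join(spark_version.split(".")[:2])
    return runtime in supported_runtimes(engine_version)


print(is_supported("1.4.1", "3.4.1"))  # True
print(is_supported("1.3.0", "3.1.2"))  # False
```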

To complete this installation, you will need:

- An active Azure subscription.
- GeoAnalytics Engine install files. If you have a GeoAnalytics Engine subscription with a username and password, you can download the ArcGIS GeoAnalytics Engine distribution here after signing in. If you have a license file, follow the instructions provided with your license file to download the GeoAnalytics Engine distribution.
- A GeoAnalytics Engine subscription or a license file.

Prepare the workspace

1. If you do not have an active Synapse workspace, create one in the Azure portal or with one of the other methods described in the Azure documentation.
2. Launch Azure Synapse Studio from your Azure Synapse Analytics workspace.
3. Install the GeoAnalytics Engine .jar and .whl files as workspace packages. Depending on the analysis you plan to run, optionally upload the following jars as well:
   - esri-projection-geographic, if you need to perform a transformation that requires supplementary projection data.
   - geoanalytics-natives, if you need to use geocoding or network analysis tools. This jar is not supported on Runtime for Apache Spark 3.4.
Note
To use ST_H3Bin or ST_H3Bins, you must also install the H3 jar, following the same steps as described in step 3.
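The rules above for optional jars can be written as a small planning function. This is a sketch only. The function `optional_jars` and its flags are hypothetical. The returned strings are the artifact names used on this page, not exact file names from the distribution. "h3" is a placeholder for the H3 jar.

```python
def optional_jars(needs_projection_data=False, needs_geocoding_or_network=False,
                  needs_h3=False, spark_runtime="3.3"):
    """Return the optional jars to upload alongside the core .jar and .whl files."""
    jars = []
    if needs_projection_data:
        # Supplementary projection data for certain transformations
        jars.append("esri-projection-geographic")
    if needs_geocoding_or_network:
        # geoanalytics-natives backs geocoding and network analysis tools,
        # but it is not supported on Runtime for Apache Spark 3.4
        if spark_runtime == "3.4":
            raise ValueError("geoanalytics-natives is not supported on Spark 3.4")
        jars.append("geoanalytics-natives")
    if needs_h3:
        # ST_H3Bin and ST_H3Bins need the H3 jar ("h3" is a placeholder name)
        jars.append("h3")
    return jars


print(optional_jars(needs_projection_data=True, needs_h3=True))
# ['esri-projection-geographic', 'h3']
```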

Create a Spark pool

1. Within Synapse Studio, create a new Apache Spark configuration by adding the following configuration properties and their values, and give the configuration a descriptive name.

Property	Value
spark.plugins	com.esri.geoanalytics.Plugin
spark.serializer	org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator	com.esri.geoanalytics.KryoRegistrator

Once complete, click Create.
2. Within Synapse Studio, select New Apache Spark pool.
3. Under the Basics tab, configure the pool resources to meet your requirements.
4. Open the Additional settings tab. Update the Automatic pausing settings or leave the defaults.
5. Select a supported Apache Spark version.
6. Select the Apache Spark configuration you created in step 1.
7. For Allow session level packages, select "Enabled".
8. Open the Tags tab and add any relevant tags (optional).
9. Click Review + create, and then click Create to create the Spark pool.
10. Wait until you receive a notification that the Spark pool has finished provisioning, then navigate to the Packages page for your pool. Under Packages, click Select from workspace packages and add the geoanalytics.jar and geoanalytics.whl packages, along with any optional jars you uploaded.
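After a notebook session is running on this pool, you can confirm that the configuration from step 1 was applied. The sketch below compares a dictionary of Spark properties with the three required values. `REQUIRED_CONF` and `missing_geoanalytics_conf` are hypothetical helpers. In a notebook, you could build the dictionary with PySpark's `dict(spark.sparkContext.getConf().getAll())`.

```python
# Spark properties required by GeoAnalytics Engine, copied from the configuration table above
REQUIRED_CONF = {
    "spark.plugins": "com.esri.geoanalytics.Plugin",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "com.esri.geoanalytics.KryoRegistrator",
}


def missing_geoanalytics_conf(conf: dict) -> dict:
    """Return the required properties that are absent from conf or set to another value."""
    problems = {}
    for key, expected in REQUIRED_CONF.items():
        actual = conf.get(key, "")
        # spark.plugins and spark.kryo.registrator accept comma-separated class lists,
        # so look for the expected class among the listed values
        values = [value.strip() for value in actual.split(",")]
        if expected not in values:
            problems[key] = expected
    return problems


# In a Synapse notebook, the session's properties can be collected with:
#   conf = dict(spark.sparkContext.getConf().getAll())
sample_conf = {
    "spark.plugins": "com.esri.geoanalytics.Plugin",
    "spark.serializer": "org.apache.spark.serializer.JavaSerializer",
}
# Reports spark.serializer (wrong value) and spark.kryo.registrator (missing)
print(missing_geoanalytics_conf(sample_conf))
```

An empty dictionary means all three properties are set as the table requires.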

(Optional) Check Spark pool status and view logs

Click the Spark pool name to open it in the Azure portal and check its deployment status. If the Spark pool was created successfully, the Azure portal page shows "Your deployment is complete." Click the down arrow next to Deployment details; the status should show "OK". If deployment failed, the Spark pool Overview shows "Deployment failed. Click here for more details." Click this message to view the error; the status may show "BadRequest".
To monitor Spark applications and check logs, click Monitor in your workspace and select Apache Spark applications. Click an application name to view the logs generated by that application, which can help with troubleshooting.

Authorize GeoAnalytics Engine

1. Create a new notebook or open an existing one. Choose “PySpark (Python)” as the primary language.
2. In the notebook, in the Attach to menu, choose the Spark pool that you created earlier.
3. Select Run on a cell. If no Spark session is active, Synapse starts a new one to run the cell; creating a new session initially takes about two minutes.
4. Import the geoanalytics library and authorize it using your username and password or a license file. See Authorization for more information. For example:
```
import geoanalytics

# Authorize GeoAnalytics Engine with your subscription username and password
geoanalytics.auth(username="User1", password="p@ssw0rd")
```
5. Try out the API by importing the SQL functions under an easy-to-use alias such as ST, and list the first 20 functions in a notebook cell:
```
from geoanalytics.sql import functions as ST

# List the registered ST_ SQL functions; show() prints the first 20 rows by default
spark.sql("show user functions like 'ST_*'").show()
```

What’s next?

You can now use any SQL function, track function, or analysis tool in the geoanalytics module.

See Data sources and Using DataFrames to learn more about how to access your data from your notebook. Also see Visualize results to get started with viewing your data on a map. For examples of what else is possible with GeoAnalytics Engine, check out the sample notebooks, tutorials, and blog posts.