Install GeoAnalytics Engine on Azure Databricks
The Azure Databricks Lakehouse Platform provides a unified set of tools for building, deploying, sharing, and maintaining enterprise-grade data solutions at scale. Azure Databricks integrates with cloud storage and security in your cloud account, and manages and deploys cloud infrastructure on your behalf. By following the steps outlined below, you can use GeoAnalytics Engine within a PySpark notebook hosted in Azure Databricks.
To complete this install you will need:
- An active Azure subscription
- GeoAnalytics Engine install files. If you have a GeoAnalytics Engine subscription with a username and password, you can download the ArcGIS GeoAnalytics Engine distribution here after signing in. If you have a license file, follow the instructions provided with your license file to download the GeoAnalytics Engine distribution.
- A GeoAnalytics Engine subscription, or a license file.
Prepare the workspace
- If you do not have an active Azure Databricks workspace, create one using the Azure Portal or with another method listed in Azure documentation.
- Launch the Azure Databricks workspace from the Azure Portal.
- Find the jar file downloaded previously and upload it to DBFS. Note that the DBFS browser is disabled by default. Copy or make note of the jar path. Use the File API format, for example /dbfs/FileStore/jars/geoanalytics_2_12_1_0_0.jar. Optionally, also upload the esri-projection-data jars to DBFS and take note of their paths, for example /dbfs/FileStore/jars/esri_projection_data1.jar and /dbfs/FileStore/jars/esri_projection_data2.jar.
- Open the Admin Console and navigate to the Global Init Scripts tab.
- Add a new script and copy and paste the init script shown below these steps into the Script field. Replace JAR_PATH with the File API path noted in step 3. If you need to perform a transformation that requires supplementary projection data, also add the optional lines that copy the projection data jars, replacing PROJECTION_DATA_JAR1_PATH and PROJECTION_DATA_JAR2_PATH with the corresponding File API paths noted in step 3.
- Set the Run After option to "Run First" and click the Enabled toggle to enable the script. Click Add to save the script.
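A minimal sketch of such an init script is shown below, assuming it only needs to copy the jar(s) into /databricks/jars, the directory Databricks loads cluster jars from; verify it against the script provided with your GeoAnalytics Engine install files:

```bash
#!/bin/bash
# Copy the GeoAnalytics Engine jar onto the cluster classpath.
# Replace JAR_PATH with the File API path noted in step 3,
# e.g. /dbfs/FileStore/jars/geoanalytics_2_12_1_0_0.jar
cp JAR_PATH /databricks/jars/

# Optional: only needed for transformations that require supplementary
# projection data. Replace the placeholders with the File API paths of the
# esri-projection-data jars noted in step 3.
cp PROJECTION_DATA_JAR1_PATH /databricks/jars/
cp PROJECTION_DATA_JAR2_PATH /databricks/jars/
```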
Create a cluster
Click the create button to open the Create Cluster page. Choose a name for your cluster.
Choose to deploy either a Multi node or Single node cluster and select a Policy and an Access mode.
Choose a supported Databricks Runtime Version. See Databricks runtime releases for details on runtime components.
Choose your preferred Worker Type and Driver Type options.
For the other parameters, use the default or change them to your preference.
Under Advanced Options find Spark Config and paste in the configuration below.
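A typical configuration registers the GeoAnalytics Spark plugin and the Kryo serializer; treat the class names in this sketch as assumptions and verify them against the configuration provided with your GeoAnalytics Engine install files:

```
spark.plugins com.esri.geoanalytics.Plugin
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator com.esri.geoanalytics.KryoRegistrator
```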
Select Create Cluster.
Install the wheel (.whl) file downloaded previously. Installing the file will make it available to import as a Python library in a notebook. You can choose to either install the library for every cluster in your workspace or only on the cluster you are creating now. Install any other Python libraries you will need at this time.
(Optional) Check cluster status and view logs
To make sure your cluster has been successfully created, look under the Event Log of the created cluster and check for an event type of RUNNING. The event message typically indicates that the cluster is running.
If cluster creation failed, you will find an event type of TERMINATING under the Event Log. The message of the TERMINATING event should give you more context on the failure. For example, if you see Reason: Global init script failure in the message, you should check the global init script logs. If the failure reason isn't clear from the Event Log, check the Driver Logs, which provide more information in standard output, standard error, and Log4j logs to help with debugging.
Authorize GeoAnalytics Engine
- Create a new notebook or open an existing one. Choose "Python" as the Default Language and select the cluster created previously for Cluster.
Import the geoanalytics library and authorize it using your username and password or another supported authorization method. See Authorization for more information. For example:
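A minimal sketch, assuming the geoanalytics.auth() entry point described in the Authorization documentation (the credentials shown are placeholders; substitute your own or use another supported method):

```python
# Import the geoanalytics library and authorize it with a username and password.
import geoanalytics
geoanalytics.auth(username="user1", password="p@ssw0rd")
```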
Try out the API by importing the SQL functions as an easy-to-use alias like ST and listing the first 20 functions in a notebook cell:
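For example (the module path geoanalytics.sql is an assumption to check against the API reference; show() prints the first 20 rows by default):

```python
# Import the GeoAnalytics Engine SQL functions under the alias ST.
from geoanalytics.sql import functions as ST

# List the registered ST_* functions; show() displays the first 20 rows by default.
spark.sql("SHOW USER FUNCTIONS LIKE 'ST_*'").show()
```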
What’s next?
You can now use any SQL function or analysis tool in the geoanalytics module.
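For instance, a first cell might turn coordinate columns into point geometries; the ST.point signature used here (x, y, and a spatial reference ID) is an assumption to verify against the API reference:

```python
# Build a small DataFrame of longitude/latitude pairs and convert them into
# point geometries in WGS84 (spatial reference 4326).
df = spark.createDataFrame([(1, -118.40, 33.93), (2, -73.78, 40.64)],
                           ["id", "longitude", "latitude"])
df.select("id", ST.point("longitude", "latitude", 4326).alias("geometry")).show()
```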
See Data sources and Using DataFrames to learn more about how to access your data from your notebook. Also see Visualize results to get started with viewing your data on a map. For examples of what else is possible with GeoAnalytics Engine, check out the sample notebooks, tutorials, and blog posts.