Install GeoAnalytics Engine on Azure Databricks

Azure Databricks is a data analytics platform optimized for the Microsoft Azure cloud services platform. Follow the steps below to use GeoAnalytics Engine in a PySpark notebook hosted in Azure Databricks.

The table below summarizes the Azure Databricks runtimes supported by each version of GeoAnalytics Engine.

GeoAnalytics Engine    Azure Databricks runtime
1.0.x                  7.3-10.5

To complete this install you will need:

  • An active Azure subscription
  • GeoAnalytics Engine install files
  • A GeoAnalytics Engine subscription or a license file

Prepare the workspace

  1. If you do not have an active Azure Databricks workspace, create one using the Azure Portal or with another method listed in Azure documentation.
  2. Launch the Azure Databricks workspace from the Azure Portal.
  3. Find the jar file downloaded previously and upload it to DBFS (note that the DBFS file browser is disabled by default). Copy or make a note of the jar path in File API format, for example /dbfs/FileStore/jars/geoanalytics_2.12-1.0.0.jar. Optionally, also upload the esri-projection-data jars to DBFS and note their paths, for example /dbfs/FileStore/jars/esri-projection-data1.jar and /dbfs/FileStore/jars/esri-projection-data2.jar. You can confirm the uploaded paths with the optional sketch after this list.
  4. Open the Admin Console and navigate to the Global Init Scripts tab.
  5. Add a new script and copy and paste the text below into the Script field. Replace JAR_PATH with the File API path noted in step 3.
    #!/bin/bash
    cp JAR_PATH /databricks/jars/
    
    If you need to perform a transformation that requires supplementary projection data, add the lines below to the script and replace PROJECTION_DATA_JAR1_PATH and PROJECTION_DATA_JAR2_PATH with the corresponding File API paths noted in step 3.
    cp PROJECTION_DATA_JAR1_PATH /databricks/jars/
    cp PROJECTION_DATA_JAR2_PATH /databricks/jars/
    
  6. Set the Run After option to "Run First" and click the Enabled toggle to enable the script. Click Add to save the script.
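
If you want to confirm the File API paths referenced in step 3 before enabling the init script, you can list the upload directory from a notebook attached to any running cluster. The sketch below is optional and assumes the jars were uploaded under /FileStore/jars/ (adjust the directory if you used a different location); dbutils is available by default in Databricks notebooks.

    # Optional check: list the jars uploaded to DBFS and print their File API paths.
    # dbutils is predefined in Databricks notebooks.
    for f in dbutils.fs.ls("dbfs:/FileStore/jars/"):
        print(f.path.replace("dbfs:/", "/dbfs/"))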

Create a cluster

  1. Click the Create button to open the Create Cluster page. Choose a name for your cluster.

  2. Choose "Standard" as the Cluster Mode.

  3. Choose a supported Databricks Runtime Version. See Databricks runtime releases for details on runtime components.

  4. Use the default Autopilot Options or change them to your preference.

  5. Choose your preferred Worker Type and Driver Type options.

  6. Under Advanced Options find Spark Config and paste in the configuration below.

    spark.plugins com.esri.geoanalytics.Plugin
    spark.serializer org.apache.spark.serializer.KryoSerializer
    spark.kryo.registrator com.esri.geoanalytics.KryoRegistrator
    
  7. Select Create Cluster.

  8. Install the wheel (.whl) file downloaded previously. Installing the file makes it available to import as a Python library in a notebook. You can install the library for every cluster in your workspace or only on the cluster you are creating now. Install any other Python libraries you will need at this time. For an alternative notebook-scoped install of the wheel, see the sketch below.
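
As an alternative to installing the wheel through the cluster Libraries UI, you can install it for a single notebook session with %pip once the cluster is running. This is a sketch only; replace WHEEL_PATH with the File API path of the uploaded wheel. %pip installs are notebook-scoped, so the Libraries UI remains the way to make the package available to every notebook on the cluster.

    %pip install WHEEL_PATH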

Authorize GeoAnalytics Engine

  1. Create a new notebook or open an existing one. Choose "Python" as the Default Language and, for Cluster, select the cluster created previously.
  2. Import the geoanalytics library and authorize it using your username and password or another supported authorization method. See Licensing and Authorization for more information. For example:

    import geoanalytics
    geoanalytics.auth(username="User1", password="p@ssw0rd")
  3. Try out the API by importing the SQL functions as an easy-to-use alias like ST and listing the first 20 functions in a notebook cell:

    from geoanalytics.sql import functions as ST
    spark.sql("show user functions like 'ST_*'").show()
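
If authorization succeeds, the SQL functions registered by the plugin are callable right away. As a quick smoke test, the snippet below (a sketch, using illustrative coordinates) builds a single point geometry with ST_Point, one of the functions listed by the previous cell:

    # Smoke test: create a point geometry with a registered GeoAnalytics SQL function.
    spark.sql("SELECT ST_Point(-122.68, 45.52) AS geometry").show(truncate=False)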

What’s next?

You can now use any SQL function or analysis tool in the geoanalytics module.
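
For example, the SQL functions are also exposed through the ST alias imported earlier, so they can be applied directly to DataFrame columns. The sketch below assumes the Python helpers mirror their SQL counterparts (for example, ST.point for ST_Point) and uses a small in-memory DataFrame with illustrative values:

    # Illustrative sketch: build point geometries from x/y columns of an in-memory DataFrame.
    df = spark.createDataFrame([(-122.68, 45.52), (-111.93, 33.42)], ["x", "y"])
    df.withColumn("geometry", ST.point("x", "y")).show(truncate=False)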

See Data sources and Using DataFrames to learn more about how to access your data from your notebook. Also see Visualize results to get started with viewing your data on a map. For examples of what else is possible with GeoAnalytics Engine, check out the sample notebooks, tutorials, and blog posts.
