Install GeoAnalytics Engine on Google Cloud Dataproc
Google Cloud Dataproc is a fully managed and highly scalable service for running Apache Spark and other open-source tools and frameworks. Using the steps outlined below, you can run GeoAnalytics Engine within a PySpark notebook hosted in Google Cloud Dataproc. The table below lists the Dataproc image versions supported by each GeoAnalytics Engine release.
| GeoAnalytics Engine | Google Dataproc image |
|---|---|
| 1.0.x | 2.0-debian10, 2.0-ubuntu18, 2.0-rocky8 |
To complete this install you will need:
- An active subscription to Google Cloud Platform
- GeoAnalytics Engine install files
- A GeoAnalytics Engine subscription or a license file
Prepare the workspace
1. Log in to the Google Cloud Console.
2. Select an existing project or set up a new one.
3. Create a Google Cloud Storage bucket in the same region where you plan to deploy your cluster.
4. Upload the geoanalytics.jar file and the geoanalytics.whl file to your bucket.
5. Copy and paste the text below into a text editor and save it as a .sh script. Replace `<bucket-name>`, `<jar-file-name>`, and `<wheel-file-name>` with the bucket name, the jar file name, and the wheel file name from step 4. Save the script and upload it to your bucket.
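A minimal sketch of such an initialization action is shown below. It copies the jar onto each node's Spark classpath and installs the Python wheel with pip; the destination path /usr/lib/spark/jars and the availability of gsutil and pip are assumptions based on a default Dataproc 2.0 image, so adjust as needed for your environment.

```bash
#!/bin/bash
# Sketch of a Dataproc initialization action for GeoAnalytics Engine.
# This runs on every node when the cluster is created.
set -euxo pipefail

BUCKET="<bucket-name>"
JAR="<jar-file-name>"
WHEEL="<wheel-file-name>"

# Place the GeoAnalytics Engine jar on the Spark classpath (assumed location).
gsutil cp "gs://${BUCKET}/${JAR}" /usr/lib/spark/jars/

# Install the Python wheel so the geoanalytics module can be imported
# from PySpark notebooks running on this node.
gsutil cp "gs://${BUCKET}/${WHEEL}" /tmp/
pip install "/tmp/${WHEEL}"
```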
Create a cluster
1. Navigate to Google Dataproc and open the Dataproc Create a cluster page.
2. Choose a Name, Location, Cluster Type, and Autoscaling policy that meet your requirements.
3. Under Versioning, choose a supported Google Dataproc image.
4. Under Component Gateway, select "Enable component gateway".
5. Select any Optional components you may require. You must select at least one of "Jupyter Notebook" or "Zeppelin Notebook".
6. Click Configure nodes and update the Master node and Worker node configurations to meet your requirements. Change other settings as needed.
7. Click Customize cluster and adjust the settings for Network configuration, Internal IP only, Dataproc Metastore, and Labels, or keep the defaults.
8. Add the three properties in the table below to Cluster properties:
   | Prefix | Key | Value |
   |---|---|---|
   | spark | spark.plugins | com.esri.geoanalytics.Plugin |
   | spark | spark.serializer | org.apache.spark.serializer.KryoSerializer |
   | spark | spark.kryo.registrator | com.esri.geoanalytics.KryoRegistrator |
9. Under Initialization actions, browse to the .sh script you uploaded previously and select it as the Executable file.
10. Adjust any remaining settings in Customize cluster as needed or keep the defaults.
11. Adjust any settings within the Manage security page or keep the defaults.
12. Click Create to create the cluster.
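If you prefer to script cluster creation, the same configuration can be sketched with the gcloud CLI. The cluster name, region, and initialization script name below are placeholders, and you may need additional flags for your node and network settings.

```bash
# Sketch: create a Dataproc cluster with the GeoAnalytics Engine Spark
# properties, the component gateway, Jupyter, and the initialization action.
# The cluster name, region, and script name are placeholders.
gcloud dataproc clusters create geoanalytics-cluster \
  --region=us-central1 \
  --image-version=2.0-debian10 \
  --optional-components=JUPYTER \
  --enable-component-gateway \
  --initialization-actions="gs://<bucket-name>/geoanalytics-init.sh" \
  --properties='spark:spark.plugins=com.esri.geoanalytics.Plugin,spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.kryo.registrator=com.esri.geoanalytics.KryoRegistrator'
```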
Authorize GeoAnalytics Engine
1. Find the Web Interfaces page on the cluster you created previously. Open a Jupyter, JupyterLab, or Zeppelin notebook by clicking on the corresponding Component gateway link.
2. Import the geoanalytics library and authorize it using your username and password or another supported authorization method. See Licensing and Authorization for more information. For example:
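The snippet below is a minimal sketch assuming username-and-password authorization; the credentials are placeholders, and a license file or other supported method can be used instead.

```python
import geoanalytics

# Authorize GeoAnalytics Engine for this Spark session.
# Replace the placeholder credentials with your own.
geoanalytics.auth(username="your_username", password="your_password")
```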
3. Try out the API by importing the SQL functions as an easy-to-use alias like `ST` and listing the first 20 functions in a notebook cell:
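A sketch of that cell is shown below; it assumes the functions module is importable as geoanalytics.sql.functions, and show() prints the first 20 rows by default.

```python
from geoanalytics.sql import functions as ST

# List the ST_* SQL functions registered with this Spark session.
# show() displays the first 20 rows by default.
spark.sql("SHOW USER FUNCTIONS LIKE 'ST_*'").show()
```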
What’s next?
You can now use any SQL function or analysis tool in the geoanalytics module.
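As a quick smoke test, the sketch below calls a single SQL function from a notebook cell; ST_Point is assumed to be among the registered functions, so check the list from the previous step if the name or signature differs in your version.

```python
# Quick smoke test: create a point geometry with a GeoAnalytics SQL function.
# ST_Point(x, y) is an assumed signature; verify it against the function
# list printed earlier.
spark.sql("SELECT ST_Point(-117.0, 34.0) AS geometry").show(truncate=False)
```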
See Data sources and Using DataFrames to learn more about how to access your data from your notebook. Also see Visualize results to get started with viewing your data on a map. For examples of what else is possible with GeoAnalytics Engine, check out the sample notebooks, tutorials, and blog posts.