Frequently asked questions

Getting Started

How do I get started with GeoAnalytics for Microsoft Fabric?

See Get started for a quick tutorial that demonstrates some of the basic capabilities of ArcGIS GeoAnalytics for Microsoft Fabric.

Why am I getting ModuleNotFoundError: No module named 'geoanalytics_fabric'?

To import the GeoAnalytics for Microsoft Fabric library in a Python notebook, use the import statement:

import geoanalytics_fabric

If you receive an error upon importing the library, here are some potential solutions:

  • GeoAnalytics for Microsoft Fabric is only enabled in the Microsoft Fabric 1.3 runtime. To import and use the library, change your workspace or environment settings to use the 1.3 runtime.
  • The Public Preview for ArcGIS GeoAnalytics for Microsoft Fabric is being rolled out to the Microsoft Fabric production environments in phases. If you are using the 1.3 runtime and still see this error, the library may not have been enabled in your region yet.
Why am I getting Py4JJavaError: An error occurred while calling z:com.esri.geoanalytics.fabric.PluginLoader.hotload. : com.esri.geoanalytics.internal.AuthError: GeoAnalytics has not been enabled for this workspace?

This error indicates that the library has not been enabled for your workspace. A tenant administrator must enable the library by going to Settings > Admin Portal > Tenant settings. When disabled, the GeoAnalytics for Microsoft Fabric library is not available in Spark notebooks or Spark job definitions.

How do I use GeoAnalytics for Microsoft Fabric documentation?

Documentation is divided into two main components:

  • API Reference—A concise reference manual containing details about all functions, classes, return types, and arguments in GeoAnalytics for Microsoft Fabric.
  • Guide—Descriptions, code samples, and usage notes for each function and tool, as well as core concepts, frequently asked questions, and tutorials.
What are some helpful resources for learning about Spark?

The Spark SQL programming guide provides a high-level overview of Spark DataFrames and Spark SQL functions and includes extensive examples in Scala, Java, and Python. See the Machine Learning Library (MLlib) guide to learn more about Spark’s capabilities in the areas of classification, regression, clustering, and more.

To learn more about PySpark (the Python API for Spark) specifically, see the PySpark Quickstart and API reference. Spark also comes with a collection of PySpark examples that you can use to become more familiar with the API.

What are some helpful resources for learning more about spatial analysis and GIS?

See Esri’s guide called What is GIS? to find more information and resources. The ArcGIS Book is a great free resource for learning about all things GIS, especially the basics of spatial analysis.

Data sources

What data sources or formats are supported by GeoAnalytics for Microsoft Fabric?

All functions and tools in GeoAnalytics for Microsoft Fabric operate on Spark DataFrames or DataFrame columns, so the API supports any data source or format that can be loaded into a DataFrame. Spark includes built-in support for reading Parquet, ORC, JSON, CSV, text, binary, and Avro files, as well as Hive tables and JDBC connections to other databases. GeoAnalytics for Microsoft Fabric adds native support for reading file geodatabases; reading and writing shapefiles, GeoJSON, GeoParquet, and feature services; and writing vector tiles. See Data sources for a summary of the spatial data sources and sinks supported by GeoAnalytics for Microsoft Fabric.
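For example, you might read a shapefile or a feature service layer directly into a DataFrame. This is a minimal sketch: the format names ("shapefile" and "feature-service"), the file path, and the URL are assumptions patterned after the readers documented for ArcGIS GeoAnalytics Engine.

# Read a shapefile into a Spark DataFrame (format name assumed)
df_shp = spark.read.format("shapefile").load("Files/data/parcels")

# Read a layer from a hosted feature service (placeholder URL)
df_fs = spark.read.format("feature-service") \
    .load("https://services.arcgis.com/example/arcgis/rest/services/Layer/FeatureServer/0")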

Does GeoAnalytics for Microsoft Fabric work with imagery or raster data?

No, GeoAnalytics for Microsoft Fabric functions and tools operate on vector geometry data only. This includes points, lines, polygons, multipoints, and generic vector geometries.

Working with geometry and time in DataFrames

How do I create a DataFrame?

The most common way to create a DataFrame is by loading data from a supported data source with spark.read.load(). For example:

df = spark.read.load("examples/src/main/resources/users.parquet")

You can also create a DataFrame from a list of values or a Pandas DataFrame using createDataFrame(). See Using DataFrames for more information.
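For instance, a DataFrame can be built from an in-memory list of tuples or from an existing Pandas DataFrame. The sample data below is hypothetical:

# Create a DataFrame from a list of tuples with named columns
data = [("Redlands", -117.19, 34.06), ("Portland", -122.68, 45.52)]
df = spark.createDataFrame(data, ["city", "longitude", "latitude"])

# Create a DataFrame from a Pandas DataFrame
import pandas as pd
pdf = pd.DataFrame({"city": ["Redlands"], "longitude": [-117.19], "latitude": [34.06]})
df_from_pandas = spark.createDataFrame(pdf)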

What are the differences between a PySpark DataFrame and a Pandas DataFrame?

PySpark DataFrames and Pandas DataFrames offer similar ways of representing columnar data in Python, but only PySpark DataFrames can be used with GeoAnalytics for Microsoft Fabric.

PySpark DataFrames (often referred to as DataFrames or Spark DataFrames in this documentation) are distributed across a Spark cluster and any operations on them are executed in parallel on all nodes of the cluster. Pandas DataFrames are stored in memory on a single node and operations on them are executed on a single thread. This means that the performance of Pandas DataFrames cannot be scaled out to handle larger datasets and is limited by the memory available on a single machine.

Other differences include that PySpark DataFrames are immutable while Pandas DataFrames are mutable. Also, PySpark uses lazy execution, which means that tasks are not executed until specific actions are taken. In contrast, Pandas uses eager execution which means that tasks are executed as soon as they are called.

How do I convert between a Pandas DataFrame and a PySpark DataFrame?

Several options are available. Koalas is a pandas API for Apache Spark that provides a scalable way to convert between PySpark DataFrames and a pandas-like DataFrame. You must first convert any geometry column into a string or binary column before converting to a Koalas DataFrame.

GeoAnalytics for Microsoft Fabric also includes a to_pandas_sdf() function which converts a PySpark DataFrame to a spatially-enabled DataFrame supported by the ArcGIS API for Python. This option will preserve any geometry columns in your PySpark DataFrame but cannot be distributed across a Spark cluster and thus is not as scalable as using Koalas.
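As a sketch, the two paths might look like the following. The import path for to_pandas_sdf and the geoanalytics_fabric.sql.functions module are assumptions here, patterned after ArcGIS GeoAnalytics Engine:

# Assumed import path: convert a PySpark DataFrame to a spatially enabled
# Pandas DataFrame, preserving geometry columns
from geoanalytics_fabric import to_pandas_sdf
sdf = to_pandas_sdf(df)

# Alternatively, encode the geometry as WKT text first, then convert to plain Pandas
from geoanalytics_fabric.sql import functions as ST
pdf = df.withColumn("wkt", ST.ST_AsText("geometry")).drop("geometry").toPandas()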

How do I check and/or set the spatial reference of a geometry?

You can check the spatial reference of any geometry column using get_spatial_reference. If you know the spatial reference of the geometry data, you can set it using ST_SRID or ST_SRText.

To learn more about spatial references and how to set them see Coordinate systems and transformations.
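A minimal sketch, assuming the st DataFrame accessor and the geoanalytics_fabric.sql.functions module follow the same pattern as ArcGIS GeoAnalytics Engine:

from geoanalytics_fabric.sql import functions as ST

# Check the spatial reference of the DataFrame's geometry column
df.st.get_spatial_reference()

# Label the column with WGS84 (SRID 4326) if you know that's what the data uses
df = df.withColumn("geometry", ST.ST_SRID("geometry", 4326))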

What is the difference between ST_SRID and ST_Transform?

ST_SRID gets or sets the spatial reference ID of a geometry column but does not change any of the data in the column. ST_Transform transforms the geometry data within a column from an existing spatial reference to a new spatial reference and also sets the result column’s spatial reference ID.

To learn more about spatial references and how to transform between them see Coordinate systems and transformations.
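In practice the difference might look like this (a sketch, assuming the geoanalytics_fabric.sql.functions module):

from geoanalytics_fabric.sql import functions as ST

# ST_SRID only labels the column as WGS84 (4326); coordinate values are unchanged
df_labeled = df.withColumn("geometry", ST.ST_SRID("geometry", 4326))

# ST_Transform reprojects the coordinates to Web Mercator (3857)
# and sets the result column's SRID accordingly
df_mercator = df_labeled.withColumn("geometry", ST.ST_Transform("geometry", 3857))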

Why are the values in my geometry column null?

This usually happens when the wrong function was used to create the geometry column, or when the input data is in an invalid or unsupported format. Double-check that you are using the SQL function corresponding to the geometry type of your input data. If you are unsure of the geometry type, use one of the generic geometry import functions instead, as shown in the sketch below.

Also verify that you're using the SQL function corresponding to the format of your geometry data (EsriJSON, GeoJSON, WKT, WKB, or Shape), and that the representation is valid.
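For example, a generic importer such as ST_GeomFromText parses WKT of any geometry type. The module path and the optional SRID argument shown here are assumptions patterned after ArcGIS GeoAnalytics Engine:

from geoanalytics_fabric.sql import functions as ST

# Parse WKT of any geometry type; a point-specific importer would instead
# return null for rows containing lines or polygons
df = df.withColumn("geometry", ST.ST_GeomFromText("wkt_column", 4326))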

How do I create a time column?

GeoAnalytics for Microsoft Fabric uses the TimestampType included with PySpark to represent instants in time. Use the to_timestamp() function to create a timestamp column from a numeric or string column using Spark’s datetime patterns for formatting and parsing.

Intervals in time are represented by two timestamp columns containing the start and end instants of each interval.

If you have more than one timestamp column, use the st.set_time_fields() function to specify the time columns.

To check that your time column is set correctly, use the st.get_time_fields() function.
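Putting those steps together, a sketch (the st accessor methods are assumed to match the ArcGIS GeoAnalytics Engine API):

from pyspark.sql import functions as F

# Parse a string column into a TimestampType column using a datetime pattern
df = df.withColumn("obs_time", F.to_timestamp("time_str", "yyyy-MM-dd HH:mm:ss"))

# With multiple timestamp columns, declare an interval from start and end columns
df = df.st.set_time_fields(["start_time", "end_time"])

# Verify which time fields are currently set
df.st.get_time_fields()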

How do I specify which geometry columns or time columns to use in a tool?

If there is only one geometry column in a DataFrame it will be used automatically. If there are multiple geometry columns in a DataFrame, you must call st.set_geometry_field() on the DataFrame to specify the primary geometry column.

Similarly, if there is one timestamp column in a DataFrame it will be used automatically as instant time when time is required by a tool. If there are multiple timestamp columns or you want to represent intervals of time you must call st.set_time_fields().
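For example (a sketch, assuming the st accessor mirrors ArcGIS GeoAnalytics Engine):

# Declare which geometry column tools should use when several are present
# ("pickup_point" is a hypothetical column name)
df = df.st.set_geometry_field("pickup_point")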

Running tools and functions

How do I check the progress of a function or tool after calling it?

Check out the Microsoft Fabric documentation on how to monitor Spark jobs within a notebook and on Apache Spark application detail monitoring.

Why does nothing happen when I call a function?

PySpark uses lazy evaluation which means that functions are not executed until certain actions are called. In other words, calling a SQL function will not run that function on your Spark cluster until you call an action on the function return value. Examples of actions include write.save(), plot(), count(), collect(), and take().
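For example, building a query does nothing on the cluster until an action runs. The ST_Buffer call below assumes the geoanalytics_fabric.sql.functions module:

from geoanalytics_fabric.sql import functions as ST

# This only builds a query plan; no work happens on the cluster yet
buffered = df.withColumn("buffer", ST.ST_Buffer("geometry", 100.0))

# count() is an action, so the buffers are actually computed here
buffered.count()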

What does it mean when a function fails with py4j.Py4JException: Method {} does not exist?

This exception is raised when the arguments to a SQL function are not all of the same documented type or when there are unexpected arguments. For SQL functions that accept x, y, z, and m values in particular, all coordinates must be of the same valid type or the exception above will be thrown. For example, ST_Point(4.0, 2.0) is valid because the x and y coordinates are both floats, but ST_Point(4, 2.0) is not because one coordinate is an integer and the other is a float.

Check that the types of your function arguments match the expected types documented in the API reference.
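To illustrate with the ST_Point example above (module path assumed):

from geoanalytics_fabric.sql import functions as ST

ST.ST_Point(4.0, 2.0)  # valid: x and y are both floats
ST.ST_Point(4, 2.0)    # raises Py4JException: one integer and one float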

Plotting

Why should I transform my geometries to the same spatial reference for plotting?

When the geometries in two or more DataFrames are in different spatial references, they won't plot in the expected locations relative to each other. Transforming one to the spatial reference of the other ensures that they use the same coordinate system and units and thus plot together as expected. To learn more see Coordinate systems and transformations.
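As a sketch, assuming two DataFrames and that st.plot accepts a matplotlib axes via ax, as in ArcGIS GeoAnalytics Engine:

from geoanalytics_fabric.sql import functions as ST

# Reproject the second DataFrame into the first one's spatial reference
# (assumed here to be 4326) so the layers line up
df2_aligned = df2.withColumn("geometry", ST.ST_Transform("geometry", 4326))

# Plot both layers on the same axes
ax = df1.st.plot(edgecolor="blue", facecolor="none")
df2_aligned.st.plot(ax=ax, color="red")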

Why are basemaps not rendering using st.plot?

ArcGIS GeoAnalytics for Microsoft Fabric includes capabilities for plotting results within a notebook using the st.plot functionality. If a basemap does not render underneath the data from your DataFrame in your notebook, you need to configure the protobuf jar.

To configure the protobuf jar, copy and paste this code snippet into the first cell in your notebook. Note that this may increase the environment startup time to several minutes.

%%configure -f
{
    "conf":
    {
        "spark.driver.extraJavaOptions" : "--add-opens java.base/jdk.internal.loader=ALL-UNNAMED",
        "spark.jars.packages": "com.google.protobuf:protobuf-java:3.25.5"
    }
}

Scala API

Can I use packages under com.esri other than geoanalytics?

Undocumented functionality within the com.esri namespace is for Esri internal use only and does not adhere to our version policy.

Are GeoAnalytics tools available in Scala?

During the Public Preview of ArcGIS GeoAnalytics for Microsoft Fabric, there is no Scala API provided for the GeoAnalytics tools; Scala support is only provided for the GeoAnalytics functions. Scala support will be expanded across the library in subsequent releases.
