Calculate field

Calculate Field calculates field values on a new or existing field. The output will always be a new DataFrame.

Usage notes

  • Calculate Field runs on any DataFrame. A geometry column can be used in calculations but is not required.

  • Calculate Field will always create a new DataFrame. It will not modify the input. You can only calculate values for a single field at a time.

  • You can calculate values for an existing field or for a new field by creating a unique field name.

  • Expressions are created using ArcGIS Arcade. See Arcade expressions for more information.

  • Arcade expressions can be track aware. Track-aware expressions require that the input DataFrame has a single timestamp (instant) and that a track field is specified using setTrackFields(). To learn more about building track-aware expressions, see Track aware examples.

  • Tracks are represented by the unique combination of one or more track fields. For example, if the flightID and Destination fields are used as track identifiers, the records ID007, Solden and ID007, Tokyo would be in different tracks since they have different values for the Destination field.

  • Calling setTimeBoundarySplit() segments tracks at a defined interval. For example, if you use a value of 1 day for time_boundary_split and a value of 9:00 a.m. on January 1, 1990 for the time_boundary_reference parameter, each track is split at 9:00 a.m. every day and each segment is analyzed separately. Splitting reduces computing time because it creates smaller tracks for analysis. If splitting by a recurring time boundary makes sense for your analysis, it is recommended for big data processing.
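
The two ideas above, track identity from a combination of fields and splitting at a recurring time boundary, can be illustrated with a small plain-Python sketch. It does not call the tool: track_key and segment_index are hypothetical helpers, and the way a record that falls exactly on a boundary is assigned is an assumption, not documented behavior of setTimeBoundarySplit().

```python
from collections import defaultdict
from datetime import datetime, timedelta


def track_key(record, track_fields):
    """Return the tuple of field values that identifies the record's track."""
    return tuple(record[field] for field in track_fields)


def segment_index(timestamp, reference, interval):
    """Return the number of whole intervals between the reference and the timestamp.

    Floor division means timestamps before the reference get negative indexes.
    """
    return (timestamp - reference) // interval


# Hypothetical records using the field names from the usage notes
records = [
    {"flightID": "ID007", "Destination": "Solden", "time": datetime(1990, 1, 1, 8, 30)},
    {"flightID": "ID007", "Destination": "Solden", "time": datetime(1990, 1, 1, 9, 30)},
    {"flightID": "ID007", "Destination": "Solden", "time": datetime(1990, 1, 2, 10, 0)},
    {"flightID": "ID007", "Destination": "Tokyo", "time": datetime(1990, 1, 1, 9, 30)},
]

reference = datetime(1990, 1, 1, 9, 0)  # 9:00 a.m. on January 1, 1990
interval = timedelta(days=1)            # split every 1 day

# Group observations by (track, segment): each group is analyzed on its own
segments = defaultdict(list)
for rec in records:
    key = track_key(rec, ["flightID", "Destination"])
    seg = segment_index(rec["time"], reference, interval)
    segments[(key, seg)].append(rec["time"])

for (key, seg), times in sorted(segments.items()):
    print(key, seg, len(times))
```

In this sketch the Solden and Tokyo records land in different tracks even though they share a flightID, and the three Solden observations fall into three separate daily segments.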

Limitations

  • Only one field can be modified at a time.

  • Calculate Field will always produce a new DataFrame, and will not edit your input DataFrame.

Results

The result is the input DataFrame with a new column appended or an existing column replaced.

Performance notes

Improve the performance of Calculate Field by doing one or more of the following:

  • Only analyze the records in your area of interest. You can pick the records of interest by using one of the following SQL functions:

    • ST_Intersection: Clip to an area of interest represented by a polygon. This will modify your input records.
    • ST_BboxIntersects: Select records that intersect an envelope.
    • ST_EnvIntersects: Select records having an envelope that intersects the envelope of another geometry.
    • ST_Intersects: Select records that intersect another dataset or area of interest represented by a polygon.
  • If you are using tracks, split your tracks by using setTimeBoundarySplit().
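
To show why an envelope-based filter is a cheap way to narrow the input, here is a conceptual plain-Python sketch of an envelope intersection test. It is not how the ST functions are implemented and it does not call them; envelope and envelopes_intersect are hypothetical helpers, the coordinates are made up, and treating envelopes that only touch as intersecting is an assumption.

```python
def envelope(points):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def envelopes_intersect(a, b):
    """Return True when two (xmin, ymin, xmax, ymax) envelopes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical area of interest as (xmin, ymin, xmax, ymax)
area_of_interest = (-100.0, 10.0, -60.0, 40.0)

# Hypothetical records, each a short list of (x, y) vertices
tracks = {
    "A": [(-80.0, 25.0), (-75.0, 30.0)],    # fully inside the area
    "B": [(-20.0, 15.0), (-10.0, 20.0)],    # far to the east
    "C": [(-110.0, 35.0), (-95.0, 45.0)],   # overlaps the north-west corner
}

# Keep only the records whose envelope intersects the area of interest
selected = [
    name for name, pts in tracks.items()
    if envelopes_intersect(envelope(pts), area_of_interest)
]
print(selected)
```

The test needs only four comparisons per record, so records outside the area of interest can be discarded before any expression is evaluated on them.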

Similar capabilities

The following tools perform similar capabilities:

You can also perform field calculations directly on your DataFrame with Spark SQL.

Syntax

For more details, go to the GeoAnalytics for Microsoft Fabric API reference for calculate field.

Setter | Description | Required
run(dataframe) | Runs the Calculate Field tool using the provided DataFrame. | Yes
setExpression(expression) | Sets an Arcade expression used to calculate the new field values. | Yes
setField(field_name, field_type) | Sets the name and type of the new field. | Yes
setTimeBoundarySplit(time_boundary_split, time_boundary_split_unit, time_boundary_reference=None) | Sets boundaries to limit calculations to defined spans of time. | No
setTrackFields(*track_fields) | Sets one or more fields used to identify distinct tracks. | No

Examples

Run Calculate Field

Python
# Imports
from geoanalytics_fabric.tools import CalculateField
from geoanalytics_fabric.sql import functions as ST
from pyspark.sql import functions as F

# Path to the Atlantic hurricanes data
data_path = r"https://sampleserver6.arcgisonline.com/arcgis/rest/services/Hurricanes/MapServer/0"

# Create an Atlantic hurricanes DataFrame
df = spark.read.format("feature-service").load(data_path) \
                    .st.set_time_fields("Date_Time")

# Use Calculate Field to create a new column containing the average windspeed in
# miles per hour based on the previous, current, and next observation in the track.
result = CalculateField() \
            .setField(field_name="avg_windspeed", field_type="DOUBLE") \
            .setExpression(expression="Mean($track.field['WINDSPEED'].window(-1,2))") \
            .setTrackFields("EVENTID") \
            .run(dataframe=df)

# View the 5 rows with the highest average windspeed for the Alberto track
result.filter(result["EVENTID"] == "Alberto") \
      .select("EVENTID", F.date_format("Date_Time", "yyyy-MM-dd").alias("Date_Time"), "WINDSPEED", "avg_windspeed") \
      .sort("avg_windspeed", ascending=False).show(5)
Result
+-------+----------+---------+------------------+
|EVENTID| Date_Time|WINDSPEED|     avg_windspeed|
+-------+----------+---------+------------------+
|Alberto|2000-08-12|    110.0|108.33333333333333|
|Alberto|2000-08-12|    110.0|106.66666666666667|
|Alberto|2000-08-13|    105.0|103.33333333333333|
|Alberto|2000-08-12|    100.0|             100.0|
|Alberto|2000-08-13|     95.0|              95.0|
+-------+----------+---------+------------------+
only showing top 5 rows
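
To make the expression Mean($track.field['WINDSPEED'].window(-1,2)) easier to follow, the sketch below reproduces the same idea in plain Python. It assumes the window's end index is exclusive, as the comment in the example implies (previous, current, and next observation), that observations are already ordered by time within one track, and that the window is simply clipped at the ends of the track. The edge handling in particular is an assumption, and windowed_mean is a hypothetical helper, not part of the tool.

```python
from statistics import mean


def windowed_mean(values, start=-1, end=2):
    """Return, for each position i, the mean of values[i + start : i + end].

    The window is clipped to the bounds of the list, so the first and last
    observations average over fewer values.
    """
    result = []
    for i in range(len(values)):
        lo = max(0, i + start)
        hi = min(len(values), i + end)
        result.append(mean(values[lo:hi]))
    return result


# Hypothetical windspeeds for one track, ordered by time
windspeeds = [90.0, 100.0, 110.0, 105.0]
averages = windowed_mean(windspeeds)
print(averages)
```

With these made-up values, the third observation averages 100.0, 110.0, and 105.0, which is the previous, current, and next observation in the track.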

Plot results

Python
# Get the world continents data for plotting
continents_path = "https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest" \
    "/services/World_Continents/FeatureServer/0"
continents_subset_df = spark.read.format("feature-service").load(continents_path) \
                                .where("""CONTINENT = 'North America' or
                                          CONTINENT = 'South America' or
                                          CONTINENT = 'Europe' or
                                          CONTINENT = 'Africa'""")

# Plot the Atlantic hurricanes data with the continents subset data
continents_subset_plot = continents_subset_df.st.plot(facecolor="none",
                                                      edgecolors="lightblue",
                                                      figsize=(14,8))
result_plot = result.st.plot(cmap_values="avg_windspeed",
                             cmap="YlOrRd",
                             legend=True,
                             ax=continents_subset_plot,
                             basemap="light")
result_plot.set_title("Atlantic hurricanes average windspeed (miles per hour)")
result_plot.set_xlabel("X (Meters)")
result_plot.set_xlim(right=10000000)
result_plot.set_ylabel("Y (Meters)");
Plotting example for a Calculate Field result. Average windspeed is shown.

Version table

Release | Notes
1.0.0-beta | Python tool introduced
