Calculate Field calculates values for a new or existing field. The output is always a new DataFrame.
Calculate Field runs on any DataFrame. A geometry column can be used in calculations but is not required.
Calculate Field always creates a new DataFrame; it does not modify the input. You can calculate values for only one field at a time, either by updating an existing field or by creating a new field with a unique field name.
Expressions are created using ArcGIS Arcade. See Arcade expressions for more information.
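For example, a field calculation with a simple Arcade expression might look like the following sketch. It assumes a DataFrame `df` with a numeric WINDSPEED column in miles per hour, like the hurricanes data used later on this page; the output field name is illustrative.

```python
# A minimal sketch: calculating a new field with a simple Arcade expression.
# Assumes `df` has a numeric WINDSPEED column in miles per hour; the
# windspeed_kmh field name is illustrative.
from geoanalytics.tools import CalculateField

result = CalculateField() \
    .setField(field_name="windspeed_kmh", field_type="DOUBLE") \
    .setExpression(expression="$feature['WINDSPEED'] * 1.60934") \
    .run(dataframe=df)
```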
Arcade expressions can be track aware. Track-aware expressions require that the input DataFrame has a single timestamp (instant) and that one or more track fields are specified using setTrackFields. To learn more about building track-aware expressions, see Track-aware examples.
Tracks are represented by the unique combination of one or more track fields. For example, if the Destination field is used as a track identifier, records with a Destination value of Tokyo would be in a different track than records with other Destination values.
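To see how records would group into tracks before running a track-aware calculation, you can inspect the distinct values of your track fields with plain PySpark. The DataFrame and field name below are illustrative assumptions.

```python
# A minimal sketch: each distinct value of the track field (for example, a
# Destination of Tokyo) defines one track. Counting records per value shows
# how large each track would be. `df` and Destination are assumptions.
df.groupBy("Destination").count().show()
```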
A time boundary segments tracks at a defined interval. For example, if you use a time boundary of 1 day with a reference time of 9:00 a.m. on January 1, 1990, each track will be truncated at 9:00 a.m. every day and analyzed within that segment. This split reduces computing time, as it creates smaller tracks for analysis. If splitting by a recurring time interval boundary makes sense for your analysis, it is recommended for big data processing.
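A hedged sketch of splitting tracks at a 1-day boundary anchored at 9:00 a.m. on January 1, 1990 follows. The time boundary setter name and its parameter names are assumptions for illustration; confirm the exact signature in the Calculate Field API reference.

```python
# A hedged sketch: setTimeBoundary and its parameter names are assumptions;
# check the API reference for the exact signature.
from datetime import datetime
from geoanalytics.tools import CalculateField

result = CalculateField() \
    .setField(field_name="avg_windspeed", field_type="DOUBLE") \
    .setExpression(expression="Mean($track.field['WINDSPEED'].window(-1,2))") \
    .setTrackFields("EVENTID") \
    .setTimeBoundary(interval="1 day", reference=datetime(1990, 1, 1, 9, 0)) \
    .run(dataframe=df)
```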
The result is the input DataFrame with a new column appended or an existing column replaced.
Improve the performance of Calculate Field by doing one or more of the following:
- Only analyze the records in your area of interest by filtering the input DataFrame before running the tool (see the sketch after this list).
- If you are using tracks, split them at a recurring interval with a time boundary.
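For instance, the filtering approach could look like this sketch. The attribute filter is illustrative; a spatial predicate from the geoanalytics.sql functions could be used the same way to select an area of interest.

```python
# A minimal sketch: run Calculate Field on a filtered subset of the input.
# The WINDSPEED threshold is illustrative.
from geoanalytics.tools import CalculateField

df_subset = df.where(df["WINDSPEED"] >= 64)  # keep hurricane-force records only
result = CalculateField() \
    .setField(field_name="avg_windspeed", field_type="DOUBLE") \
    .setExpression(expression="Mean($track.field['WINDSPEED'].window(-1,2))") \
    .setTrackFields("EVENTID") \
    .run(dataframe=df_subset)
```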
Other tools provide similar capabilities. You can also perform field calculations directly on your DataFrame with Spark SQL, as in the sketch below.
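As a minimal sketch, here is the same kind of calculation done directly with Spark SQL functions, assuming a numeric WINDSPEED column:

```python
# A minimal sketch: adding a calculated column with Spark SQL functions
# instead of the Calculate Field tool. Assumes `df` has a WINDSPEED column.
from pyspark.sql import functions as F

df_kmh = df.withColumn("windspeed_kmh", F.col("WINDSPEED") * 1.60934)
```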
For more details, see the GeoAnalytics Engine API reference for Calculate Field.
|Setter|Description|Required|
|------|-----------|--------|
|run|Runs the Calculate Field tool using the provided DataFrame.|Yes|
|setExpression|Sets an Arcade expression used to calculate the new field values.|Yes|
|setField|Sets the name and type of the new field.|Yes|
|setTimeBoundary|Sets boundaries to limit calculations to defined spans of time.|No|
|setTrackFields|Sets one or more fields used to identify distinct tracks.|No|
```python
# Log in
import geoanalytics
geoanalytics.auth(username="myusername", password="mypassword")

# Imports
from geoanalytics.tools import CalculateField
from geoanalytics.sql import functions as ST
from pyspark.sql import functions as F

# Path to the Atlantic hurricanes data
data_path = r"https://sampleserver6.arcgisonline.com/arcgis/rest/services/Hurricanes/MapServer/0"

# Create an Atlantic hurricanes DataFrame
df = spark.read.format("feature-service").load(data_path) \
    .st.set_time_fields("Date_Time")

# Use Calculate Field to create a new column containing the average windspeed in
# miles per hour based on the previous, current, and next observation in the track.
result = CalculateField() \
    .setField(field_name="avg_windspeed", field_type="DOUBLE") \
    .setExpression(expression="Mean($track.field['WINDSPEED'].window(-1,2))") \
    .setTrackFields("EVENTID") \
    .run(dataframe=df)

# View the first 5 rows of the result DataFrame
result.filter(result["EVENTID"] == "Alberto") \
    .select("EVENTID",
            F.date_format("Date_Time", "yyyy-MM-dd").alias("Date_Time"),
            "WINDSPEED", "avg_windspeed") \
    .sort("WINDSPEED", ascending=False).show(5)
```
```
+-------+----------+---------+------------------+
|EVENTID| Date_Time|WINDSPEED|     avg_windspeed|
+-------+----------+---------+------------------+
|Alberto|2000-08-12|    110.0|106.66666666666667|
|Alberto|2000-08-12|    110.0|108.33333333333333|
|Alberto|2000-08-13|    105.0|103.33333333333333|
|Alberto|2000-08-12|    100.0|             100.0|
|Alberto|2000-08-13|     95.0|              95.0|
+-------+----------+---------+------------------+
only showing top 5 rows
```
```python
# Get the world continents data for plotting
continents_path = "https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest" \
                  "/services/World_Continents/FeatureServer/0"
continents_subset_df = spark.read.format("feature-service").load(continents_path) \
    .where("""CONTINENT = 'North America' or CONTINENT = 'South America'
              or CONTINENT = 'Europe' or CONTINENT = 'Africa'""")

# Plot the Atlantic hurricanes data with the continents subset data
continents_subset_plot = continents_subset_df.st.plot(facecolor="none",
                                                      edgecolors="lightblue",
                                                      figsize=(14, 8))
result_plot = result.st.plot(cmap_values="avg_windspeed",
                             cmap="YlOrRd",
                             legend=True,
                             ax=continents_subset_plot,
                             basemap="light")
result_plot.set_title("Atlantic hurricanes average windspeed (miles per hour)")
result_plot.set_xlabel("X (Meters)")
result_plot.set_xlim(right=10000000)
result_plot.set_ylabel("Y (Meters)");
```