Creating hurricane tracks using GeoAnalytics

The sample code below uses big data analytics (GeoAnalytics) to reconstruct hurricane tracks using data registered on a big data file share in the GIS.

Note: This functionality is currently available only with ArcGIS Enterprise 10.5 and later; it is not yet available with ArcGIS Online.

Reconstruct tracks

Reconstruct tracks is a type of data aggregation tool available in the arcgis.geoanalytics module. This tool works with a layer of point features or polygon features that are time enabled. It first determines which points belong to a track using an identification number or identification string. Using the time at each location, the tracks are ordered sequentially and transformed into a line representing the path of movement.
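The core idea behind the tool can be illustrated with a minimal pure-Python sketch (the points, identifiers, and timestamps below are made up for illustration): group points by their track identifier, order each group by time, and emit the ordered vertices as a polyline.

```python
from collections import defaultdict

# Hypothetical sample points: (track_id, timestamp, x, y)
points = [
    ("1932001", "1932-01-01 06:00", 58.40, -18.50),
    ("1932001", "1932-01-01 00:00", 58.75, -18.08),
    ("1932002", "1932-01-02 00:00", 57.35, -19.76),
    ("1932001", "1932-01-01 12:00", 58.07, -18.90),
]

# Step 1: group points into tracks by their identifier
tracks = defaultdict(list)
for track_id, ts, x, y in points:
    tracks[track_id].append((ts, x, y))

# Step 2: order each track by time and keep only the coordinates,
# yielding one polyline (an ordered list of vertices) per track
polylines = {
    tid: [(x, y) for ts, x, y in sorted(pts)]
    for tid, pts in tracks.items()
}
```

The GeoAnalytics tool performs the same grouping and ordering, but distributed across the server, and additionally computes geodesic line geometry and summary statistics per track.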

Data used

This sample uses hurricane data collected from 1848 through 2010, totaling over 177,000 points stored as a shapefile. The National Hurricane Center provides similar datasets that can be used for exploratory purposes.

To illustrate the nature of the data, a subset was published as a feature layer so we can visualize it:

In [1]:
from arcgis.gis import GIS

# Create an anonymous connection to ArcGIS Online
arcgis_online = GIS()
hurricane_pts = arcgis_online.content.search("Hurricane_tracks_points AND owner:atma.mani", "Feature Layer")[0]
hurricane_pts
Out[1]:
Hurricane_tracks_points
Years 1932 - 1942Feature Layer Collection by atma.mani
Last Modified: September 15, 2016
0 comments, 780 views
In [ ]:
subset_map = arcgis_online.map("USA")
subset_map

In [3]:
subset_map.add_layer(hurricane_pts)

Inspect the data attributes

Let us query the first layer in hurricane_pts and view its attribute table as a pandas DataFrame.

In [4]:
hurricane_pts.layers[0].query().df.head()
Out[4]:
ATC_eye ATC_grade ATC_poci ATC_pres ATC_rmw ATC_roci ATC_w34_r1 ATC_w34_r2 ATC_w34_r3 ATC_w34_r4 ... day hour min_ month wmo_pres wmo_pres__ wmo_wind wmo_wind__ year SHAPE
0 -999 -999. -999 -999 -999 -999 -999 -999 -999 -999 ... 1 0 0 1 -999 -999.0 -999 -999.0 1932 {'x': 58.749999999600334, 'y': -18.07999992365...
1 -999 -999. -999 -999 -999 -999 -999 -999 -999 -999 ... 1 6 0 1 0 -100.0 0 -100.0 1932 {'x': 58.40000152555001, 'y': -18.499999999800...
2 -999 -999. -999 -999 -999 -999 -999 -999 -999 -999 ... 1 12 0 1 -999 -999.0 -999 -999.0 1932 {'x': 58.06999969483013, 'y': -18.899999618587...
3 -999 -999. -999 -999 -999 -999 -999 -999 -999 -999 ... 1 18 0 1 -999 -999.0 -999 -999.0 1932 {'x': 57.729999542445, 'y': -19.3099994656028,...
4 -999 -999. -999 -999 -999 -999 -999 -999 -999 -999 ... 2 0 0 1 -999 -999.0 -999 -999.0 1932 {'x': 57.3499984744501, 'y': -19.7600002291272...

5 rows × 148 columns

Create a data store

GeoAnalytics server processes big data through big data file share items published on the portal. In our case, the source hurricane data is stored as a shapefile. We created the big data file share by registering the folder containing the shapefile as a data store of type bigDataFileShare.

Let us connect to an ArcGIS Enterprise and use the Datastore Manager to search for big data file shares:

In [5]:
gis = GIS("https://pythonapi.playground.esri.com/portal", "arcgis_python", "amazing_arcgis_123")

Get the geoanalytics datastore manager and search it for the registered datasets:

In [6]:
# Query the data stores available
import arcgis
datastores = arcgis.geoanalytics.get_datastores()
bigdata_fileshares = datastores.search()
bigdata_fileshares
Out[6]:
[<Datastore title:"/bigDataFileShares/NYC_taxi_data15" type:"bigDataFileShare">,
 <Datastore title:"/bigDataFileShares/all_hurricanes" type:"bigDataFileShare">,
 <Datastore title:"/bigDataFileShares/hurricanes_1848_1900" type:"bigDataFileShare">]

Using the Datastore Manager.search() method with no parameters returns all the datastores registered with the GeoAnalytics Server. We see the all_hurricanes big data file share in the list of datastores, so let's retrieve the datastore:

In [15]:
data_item = bigdata_fileshares[1]

If there is no big data file share for hurricane track data, we can use the Datastore Manager add_bigdata() method to register a shared folder accessible by the GeoAnalytics Server (see Making your data accessible as a big data file share):

In [8]:
data_item = datastores.add_bigdata("hurricanes_1848_1900", 
                                   r"\\path_to_hurricane_data")
Created Big Data file share for hurricanes_1848_1900

Once a big data file share is registered, the GeoAnalytics server samples all the datasets within the share to discern the schema of the data, including information about the geometry in a dataset. If the dataset is time-enabled, as is required to use some GeoAnalytics Tools, the manifest reports the necessary metadata about how time information is stored as well.

This process can take a few minutes depending on the size of your data. Once processed, querying the manifest property returns the schema as a dictionary. We can use the datasets key to retrieve information about the dataset. As you can see below, the schema is similar to the subset we observed earlier in this sample:

In [9]:
data_item.manifest['datasets'][0]
Out[9]:
{'format': {'extension': 'shp', 'type': 'shapefile'},
 'geometry': {'geometryType': 'esriGeometryPoint',
  'spatialReference': {'wkid': 4326}},
 'name': 'hurricanes',
 'schema': {'fields': [{'name': 'serial_num', 'type': 'esriFieldTypeString'},
   {'name': 'season', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'num', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'basin', 'type': 'esriFieldTypeString'},
   {'name': 'sub_basin', 'type': 'esriFieldTypeString'},
   {'name': 'name', 'type': 'esriFieldTypeString'},
   {'name': 'iso_time', 'type': 'esriFieldTypeString'},
   {'name': 'nature', 'type': 'esriFieldTypeString'},
   {'name': 'latitude', 'type': 'esriFieldTypeDouble'},
   {'name': 'longitude', 'type': 'esriFieldTypeDouble'},
   {'name': 'wind_wmo_', 'type': 'esriFieldTypeDouble'},
   {'name': 'pres_wmo_', 'type': 'esriFieldTypeBigInteger'},
   {'name': 'center', 'type': 'esriFieldTypeString'},
   {'name': 'wind_wmo1', 'type': 'esriFieldTypeDouble'},
   {'name': 'pres_wmo1', 'type': 'esriFieldTypeDouble'},
   {'name': 'track_type', 'type': 'esriFieldTypeString'},
   {'name': 'size', 'type': 'esriFieldTypeString'},
   {'name': 'Wind', 'type': 'esriFieldTypeBigInteger'}]},
 'time': {'fields': [{'formats': ['yyyy-MM-dd HH:mm:ss', 'MM/dd/yyyy HH:mm'],
    'name': 'iso_time'}],
  'timeReference': {'timeZone': 'UTC'},
  'timeType': 'instant'}}
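The time formats in the manifest use the Java-style date pattern syntax employed by GeoAnalytics. For readers who want to parse iso_time values locally, a rough translation into Python's strptime patterns might look like this (the mapping below is an assumption for illustration, not part of the API):

```python
from datetime import datetime

# Java SimpleDateFormat patterns from the manifest, mapped to the
# equivalent Python strptime directives (illustrative assumption)
java_to_py = {
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",
    "MM/dd/yyyy HH:mm": "%m/%d/%Y %H:%M",
}

# Parse a sample timestamp in the first declared format
ts = datetime.strptime("1932-01-01 06:00:00", java_to_py["yyyy-MM-dd HH:mm:ss"])
```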

Perform data aggregation using the reconstruct tracks tool

When you add a big data file share, a corresponding item gets created in your GIS. You can search for it like a regular item and query its layers.

In [10]:
search_result = gis.content.search("", item_type = "big data file share")
search_result
Out[10]:
[<Item title:"bigDataFileShares_all_hurricanes" type:Big Data File Share owner:api_data_owner>,
 <Item title:"bigDataFileShares_hurricanes_1848_1900" type:Big Data File Share owner:arcgis_python>,
 <Item title:"bigDataFileShares_NYC_taxi_data15" type:Big Data File Share owner:api_data_owner>]
In [11]:
data_item = search_result[0]
data_item
Out[11]:
bigDataFileShares_all_hurricanes
Big Data File Share by api_data_owner
Last Modified: May 01, 2018
0 comments, 0 views
In [12]:
years_all = data_item.layers[0]
years_all
Out[12]:
<Layer url:"https://pythonapi.playground.esri.com/ga/rest/services/DataStoreCatalogs/bigDataFileShares_all_hurricanes/BigDataCatalogServer/hurricanes">

Reconstruct tracks tool

The arcgis.geoanalytics.summarize_data module contains the reconstruct_tracks() tool. We will use this tool to aggregate the numerous points into line segments showing the tracks followed by the various hurricanes occurring between 1848 and 2010. The tool creates a feature layer item as output, which we'll visualize on a map and query to see the results.

In [13]:
from arcgis.geoanalytics.summarize_data import reconstruct_tracks

We can use the arcgis.env module to modify environment settings that geoprocessing and geoanalytics tools use during execution. Set verbose to True to return detailed messaging when running tools.

In [14]:
arcgis.env.verbose = True

Run the tool

In [15]:
agg_result = reconstruct_tracks(years_all, 
                                track_fields='Serial_Num',
                                method='GEODESIC')
Submitted.
Executing...
Executing (ReconstructTracks): ReconstructTracks "Feature Set" Serial_Num Geodesic # # # # # # "{"serviceProperties": {"name": "Reconstructed_Tracks_I49893", "serviceUrl": "https://pythonapi.playground.esri.com/server/rest/services/Hosted/Reconstructed_Tracks_I49893/FeatureServer"}, "itemProperties": {"itemId": "dca4efc3f7da4dab966bdd9d34437b04"}}" #
Start Time: Mon May 07 18:16:31 2018
Using URL based GPRecordSet param: https://pythonapi.playground.esri.com/ga/rest/services/DataStoreCatalogs/bigDataFileShares_all_hurricanes/BigDataCatalogServer/hurricanes
{"messageCode":"BD_101028","message":"Starting new distributed job with 8 tasks.","params":{"totalTasks":"8"}}
{"messageCode":"BD_101029","message":"0/8 distributed tasks completed.","params":{"completedTasks":"0","totalTasks":"8"}}
{"messageCode":"BD_101029","message":"5/8 distributed tasks completed.","params":{"completedTasks":"5","totalTasks":"8"}}
{"messageCode":"BD_101029","message":"7/8 distributed tasks completed.","params":{"completedTasks":"7","totalTasks":"8"}}
{"messageCode":"BD_101029","message":"8/8 distributed tasks completed.","params":{"completedTasks":"8","totalTasks":"8"}}
{"messageCode":"BD_101081","message":"Finished writing results:"}
{"messageCode":"BD_101082","message":"* Count of features = 6647","params":{"resultCount":"6647"}}
{"messageCode":"BD_101083","message":"* Spatial extent = {\"xmin\":-180,\"ymin\":-68.5,\"xmax\":180,\"ymax\":70.7}","params":{"extent":"{\"xmin\":-180,\"ymin\":-68.5,\"xmax\":180,\"ymax\":70.7}"}}
{"messageCode":"BD_101084","message":"* Temporal extent = Interval(MutableInstant(1848-01-11 06:00:00.000),MutableInstant(2010-12-22 18:00:00.000))","params":{"extent":"Interval(MutableInstant(1848-01-11 06:00:00.000),MutableInstant(2010-12-22 18:00:00.000))"}}
{"messageCode":"BD_0","message":"Feature service layer created: https://pythonapi.playground.esri.com/server/rest/services/Hosted/Reconstructed_Tracks_I49893/FeatureServer/0","params":{"serviceUrl":"https://pythonapi.playground.esri.com/server/rest/services/Hosted/Reconstructed_Tracks_I49893/FeatureServer/0"}}

Inspect the results

Let us create a map and load the processed result, which is a feature layer:

In [16]:
processed_map = gis.map("USA")
processed_map

In [17]:
processed_map.add_layer(agg_result)

Thus we transformed more than 175,000 points into more than 6,600 tracks that represent the paths taken by individual hurricanes over a span of more than 150 years.

Our input data and the map widget are time-enabled, so we can filter these tracks using the set_time_extent method on the map widget, which accepts start_time and end_time parameters, to inspect the results for a specific decade, 1860 - 1870:

In [18]:
processed_map.set_time_extent('1860', '1870')

What can GeoAnalytics do for you?

With this sample we have just scratched the surface of what big data analysis can do for you. ArcGIS Enterprise 10.5 packs a powerful set of tools that let you derive huge value from your data by asking the right questions. For instance, the weather dataset we examined could answer important questions such as:

  • Did the number of hurricanes per season increase over the years?
  • Which hurricanes travelled the longest distance?
  • Which hurricanes had the longest duration? Is there a trend?
  • How are wind speed and distance travelled correlated?
  • How many times in the past century did a hurricane occur within 50 miles of my assets?
  • My industry depends on tourism, which is heavily impacted by the vagaries of weather. From historical weather data, can I correlate my profits with major weather events? How well is my business insulated from freak weather events?
  • Do we see any shifts in major hurricane events over the years? Is there any shift in when the hurricane season starts?
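As a taste of the first question, once the track attributes are pulled into a pandas DataFrame (for example via a layer query), you could count distinct hurricanes per season. The tiny table below is fabricated for illustration, reusing the serial_num and season field names seen in the big data file share schema:

```python
import pandas as pd

# Hypothetical miniature of the hurricane point table; each hurricane
# contributes many point rows sharing one serial_num
df = pd.DataFrame({
    "serial_num": ["A1", "A1", "A2", "B1", "B1", "B2", "B3"],
    "season":     [1900, 1900, 1900, 1901, 1901, 1901, 1901],
})

# Count distinct hurricanes per season to look for a trend
per_season = df.groupby("season")["serial_num"].nunique()
```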

The ArcGIS API for Python gives you a gateway to easily access the big data tools from your ArcGIS Enterprise. By combining it with other powerful libraries from the pandas and scipy stack and the rich visualization capabilities of the Jupyter notebook, you can extract a huge amount of value from your data, big or small.

