Geocoding

Geocoding is the process of assigning a location, usually in the form of coordinate values, to an address by comparing the descriptive location elements in the address to those present in the reference material. Addresses come in many forms, ranging from the common address format of house number followed by the street name and succeeding information to other location descriptions such as postal zone or census tract. An address includes any type of information that distinguishes a place.

You can perform geocoding operations in GeoAnalytics Engine using the Geocode and Reverse Geocode tools, which are described below.

Geocoding with GeoAnalytics Engine requires the geoanalytics-natives jar. See the Install and set up documentation for more information on how to install this jar in your environment.

Locators

Geocoding requires a locator. A locator contains a snapshot of the reference data that is used in geocoding operations. The results of geocoding can be used to perform spatial and tabular analysis to help you make specific decisions based on your needs.

Locators in GeoAnalytics Engine

A locator is required to run the Geocode and Reverse Geocode tools. You can load a locator in the Geocode or Reverse Geocode tools with either of the following options:

  • Specify the file path of the locator (.loc) file in .setLocator(). The accompanying .loz file must be in the same directory as the .loc file.
  • Specify the file path of a mobile map package (.mmpk) in .setLocator(). The locator data stored in the mobile map package is automatically detected and used in geocoding operations.

The locator must be locally accessible to all nodes in your Spark cluster.
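
The file requirements above can be checked before a job is submitted. The following is a minimal sketch, not part of GeoAnalytics Engine: check_locator_files is a hypothetical helper, and it assumes the .loz file has the same base name as the .loc file, which the text above does not state explicitly. It only inspects the file system of the machine it runs on, so it would have to run on every node to confirm that the locator is accessible across the cluster.

```python
from pathlib import Path


def check_locator_files(locator_path):
    """Return a list of problems found with a locator path (empty if none).

    Hypothetical helper, not part of the GeoAnalytics Engine API.
    """
    path = Path(locator_path)
    suffix = path.suffix.lower()
    problems = []

    # Only .loc files and mobile map packages (.mmpk) are described as valid inputs.
    if suffix not in (".loc", ".mmpk"):
        problems.append(f"unsupported locator type: {path.suffix or '(none)'}")
        return problems

    if not path.is_file():
        problems.append(f"file not found: {path}")

    if suffix == ".loc":
        # Assumption: the companion file is <name>.loz next to <name>.loc.
        companion = path.with_suffix(".loz")
        if not companion.is_file():
            problems.append(f"missing companion file: {companion}")

    return problems


# The result depends on the local file system, so no output is shown here.
print(check_locator_files("/data/example.loc"))
```

An empty list means the path passed these basic checks; it does not guarantee that the locator itself is valid.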

For more information, see Locator and network dataset setup.

You can build your own locator if you want to collect and manage your own assets and address data. This is a flexible way to set up geocoding so that it aligns with your search patterns and needs. You can build a locator using an application such as ArcGIS Pro; to learn more, see Introduction to custom locators. You can also license commercially available locator files, such as Esri’s ArcGIS StreetMap Premium.

Describe locators in GeoAnalytics Engine

You need to review the properties of a locator before you perform geocoding operations with it. Some properties are essential for the analysis, such as the supported country codes and the user-defined output fields. This information helps you decide how to configure the parameters of the Geocode and Reverse Geocode tools and better understand the results.

You can use the describe_locator(path) utility function to view all properties stored in the locator.

The function returns a dictionary with the following properties of your locator:

  • allAvailableOutputFields—All output fields available in this locator that can be returned in the Geocode or Reverse Geocode tool output.

  • countryCodes—A list of three-character country codes associated with countries that the locator supports.

  • defaultOutputFields—The output fields in this locator that are returned in the Geocode or Reverse Geocode tool output by default.

  • inputFields—A list of input fields that are used in address matching.

  • maxNumberOfCandidates—Maximum number of candidates that can be returned in the Geocode tool output.

  • minCandidateScore—Minimum score to be considered as a match candidate in the Geocode tool output.

  • spatialReference—The well-known ID (WKID) of the spatial reference the locator data is stored in.

  • supportedRoles—A list of address types supported by this locator, for example, PointAddress, StreetAddress, and POI.

  • supportsAddresses—A Boolean value indicating if the locator supports geocoding of addresses.

  • supportsIntersections—A Boolean value indicating if the locator supports geocoding of intersections.

  • supportsPOI—A Boolean value indicating if the locator supports geocoding of points of interest (POI).

  • userDefinedOutputFields—A list of user-defined output fields available to be returned in the Geocode or Reverse Geocode tool outputs.

  • version—The version of the locator.

The following code sample shows an example of describing a locator:

Python
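import json

import geoanalytics

# Read the locator's properties into a dictionary
locator_info = geoanalytics.util.describe_locator(r"/data/example.loc")

# Pretty-print the properties as JSON, sorted by key
print(json.dumps(locator_info, sort_keys=True, indent=4))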
Result
{
    "allAvailableOutputFields": [
        "Status",
        "Score",
        "Match_addr",
        "LongLabel",
        "ShortLabel",
        "Addr_type",
        "Type",
        "PlaceName",
        "Place_addr",
        "Phone",
        "URL",
        "Rank",
        "AddBldg",
        "AddNum",
        "AddNumFrom",
        "AddNumTo",
        "AddRange",
        "Side",
        "StPreDir",
        "StPreType",
        "StName",
        "StType",
        "StDir",
        "StPreDir1",
        "StPreType1",
        "StName1",
        "StType1",
        "StDir1",
        "StPreDir2",
        "StPreType2",
        "StName2",
        "StType2",
        "StDir2",
        "BldgType",
        "BldgName",
        "LevelType",
        "LevelName",
        "UnitType",
        "UnitName",
        "SubAddr",
        "StAddr",
        "Address",
        "Block",
        "Sector",
        "Nbrhd",
        "Neighborhood",
        "District",
        "City",
        "MetroArea",
        "Subregion",
        "Region",
        "RegionAbbr",
        "Territory",
        "Zone",
        "Postal",
        "PostalExt",
        "Country",
        "CntryName",
        "CountryCode",
        "LangCode",
        "Distance",
        "X",
        "Y",
        "DisplayX",
        "DisplayY",
        "Xmin",
        "Xmax",
        "Ymin",
        "Ymax",
        "ExInfo",
        "Provider"
    ],
    "countryCodes": [
        "PRI",
        "SPM",
        "MNP",
        "GUM",
        "ASM",
        "CAN",
        "VIR",
        "UMI",
        "MEX",
        "USA"
    ],
    "defaultOutputFields": [
        "Status",
        "Score",
        "Match_addr",
        "Addr_type"
    ],
    "inputFields": [
        "Address",
        "Address2",
        "Address3",
        "Neighborhood",
        "City",
        "Subregion",
        "Region",
        "Postal",
        "PostalExt",
        "CountryCode"
    ],
    "maxNumberOfCandidates": 50,
    "minCandidateScore": 70,
    "path": "/data/example.loc",
    "spatialReference": {
        "wkid": 4326
    },
    "supportedRoles": [
        "Subaddress",
        "PointAddress",
        "StreetAddress",
        "DistanceMarker",
        "StreetName",
        "StreetInt",
        "Postal",
        "Locality",
        "POI"
    ],
    "supportsAddresses": true,
    "supportsIntersections": true,
    "supportsPOI": true,
    "userDefinedOutputFields": [],
    "version": "903619 [GDM2023Q1SMP, 2023-04-09_04-42-50] (10)"
}

For more details, go to the GeoAnalytics Engine API reference for describe locator.
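
The dictionary returned by describe_locator can also be used to check planned tool settings before a job runs. The sketch below is illustrative only: validate_request is a hypothetical helper, not part of the GeoAnalytics Engine API, and sample_info is a trimmed, hand-written copy of the result shown above rather than output from a real call.

```python
def validate_request(locator_info, country_code=None, out_fields=()):
    """Compare planned geocoding settings with a describe_locator-style dictionary.

    Hypothetical helper; returns a list of human-readable problems.
    """
    problems = []

    # The locator lists the three-character country codes it supports.
    if country_code is not None and country_code not in locator_info["countryCodes"]:
        problems.append(f"country code not supported: {country_code}")

    # Requested output fields must be offered by the locator.
    available = set(locator_info["allAvailableOutputFields"])
    available.update(locator_info["userDefinedOutputFields"])
    for field in out_fields:
        if field not in available:
            problems.append(f"output field not available: {field}")

    return problems


# Trimmed sample based on the result above (not a real describe_locator call).
sample_info = {
    "allAvailableOutputFields": ["Status", "Score", "Match_addr", "Addr_type", "X", "Y"],
    "countryCodes": ["CAN", "MEX", "USA"],
    "userDefinedOutputFields": [],
}

print(validate_request(sample_info, country_code="FRA", out_fields=["Score", "Elevation"]))
# → ['country code not supported: FRA', 'output field not available: Elevation']
```

In practice, the dictionary returned by describe_locator would be passed in place of sample_info.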

How Geocode works

Geocoding is the process of converting addresses into geographic coordinates that you can use for further analysis. The Geocode tool in GeoAnalytics Engine geocodes address strings in a Spark DataFrame. You can create this DataFrame from a table, stored in any data format that Spark supports, that contains one or more fields specifying the addresses you want to geocode. The tool also requires a locator. It matches each input address against the locator and writes the result for each input record to an output DataFrame that contains both the result columns and the input columns. The geocoding process is distributed across the nodes in your Spark cluster, so it generally runs faster when more cores are available.

Use cases

The following examples describe how the Geocode tool can be used for various goals:

  • As a retail lead, you can geocode consumer addresses in transaction data to identify trends and propose new store locations.
  • As an insurance analyst, you can detect fraudulent insurance claims by looking at the spatial relationship of geocoded property addresses and disaster footprints.
  • As a property specialist, you can geocode addresses of unclaimed or abandoned houses.

For more information about how to use Geocode, see the tool documentation.

How Reverse Geocode works

Reverse geocoding is the process of converting a location as described by geographic coordinates to a readable address. The Reverse Geocode tool in GeoAnalytics Engine creates addresses from point geometries in a Spark DataFrame and returns them as string values.

The reverse geocoding process is distributed across the nodes in your Spark cluster, so it generally runs faster when more cores are available.

Use cases

  • In the delivery tracking field, truck coordinates can be translated into addresses and matched with an expected delivery address to confirm arrival.
  • The coordinates of city buses, trains, or other public transportation vehicles can be turned into addresses in order to communicate a vehicle's location quickly to riders, employees, and emergency services.
  • In online retail, reverse geocoding the coordinates of a customer's location can allow you to automatically display the local currency and make location-based recommendations.

For more information about how to use Reverse Geocode, see the tool documentation.
