Reverse Geocode

Reverse Geocode creates addresses from point geometries and returns them as string values. This process requires a Spark DataFrame containing the points that you want to reverse geocode and a locator. The tool matches the points against reference data in a locator and returns the addresses of the points as strings along with other output columns.

Reverse Geocode workflow

Usage notes

  • The input DataFrame must contain a point geometry column for Reverse Geocode to run.

  • If the spatial reference of the input DataFrame differs from that of the locator, the input will be transformed to match the locator.

  • The fields from the input DataFrame will always be included in the output.

    The result fields in the output DataFrame are determined by the predefined_set parameter in the setOutFields() setter.

    • Minimal: Returns the Match_addr and Addr_type fields. This is the default option.
    • MinimalAndUserFields: Returns the fields defined in Minimal and the custom output fields available in the locator. User-defined fields can be configured when creating a locator in ArcGIS Pro. For more information about locators, read the geocoding core concept.
    • All: Returns all available output fields, including any custom output fields defined in your locator.
  • When an input DataFrame contains a field that has the same name as one of the output fields in the reverse geocoded result, the output field will be automatically renamed with a suffix of "1". For example, if a field named Address already exists in the input DataFrame, the result DataFrame will have a field named Address from the input, and Address1 representing the output field.

  • The output DataFrame will contain the same number of records as the input DataFrame. Unmatched records are indicated by a null value for Match_addr.

  • If there are no records in the locator that can be associated with the input geometry, a match address will not be returned. The following are common causes for unmatched records:

    • The geometry contains null coordinates.
    • The coordinates are invalid or cannot be transformed to the locator's spatial reference.
    • The locator does not contain reference addresses near the geometry.
    • An address type was specified for which there are no good matches within a reasonable distance.
  • You can use setLanguageCode() to set the language in which reverse geocoded addresses will be returned. When a given language code is not available in the locator, the tool will return results in the default language of the locator. The code should follow the ISO 639-3 standard.
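
The renaming rule for colliding field names described in the usage notes above can be sketched in plain Python. This is only an illustration of the documented behavior, not the tool's implementation: the function name predict_output_columns is hypothetical, the column order (input columns first, then result fields) is an assumption, and the sketch does not cover cases the notes leave unspecified, such as differences in letter case or an input that already contains a column ending in "1".

```python
def predict_output_columns(input_columns, result_fields):
    """Return the column names expected in the output DataFrame.

    Hypothetical helper: it mirrors the documented rule that a result
    field colliding with an input column is renamed with a "1" suffix.
    """
    taken = set(input_columns)
    output = list(input_columns)  # input fields are always kept
    for field in result_fields:
        # A colliding result field gets a "1" suffix; others keep their name.
        output.append(field + "1" if field in taken else field)
    return output


columns = predict_output_columns(
    ["shape", "Address"],
    ["Match_addr", "Addr_type", "Address"],
)
print(columns)
# ['shape', 'Address', 'Match_addr', 'Addr_type', 'Address1']
```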

Limitations

Geocoding with GeoAnalytics Engine requires a locator file. Using a locator service, such as the ArcGIS World Geocoding Service, is not supported.

Results

The result of Reverse Geocode is a copy of the input DataFrame with new fields added according to the setOutFields() setter. The value of the predefined_set parameter in setOutFields() determines which fields are returned. There are three options:

  • Minimal: Match_addr and Addr_type are returned. This is the default option.
  • MinimalAndUserFields: Match_addr, Addr_type, and any custom output fields available in the locator are returned.
  • All: All fields are returned, including any custom fields defined in your locator.

The fields are detailed in the table below.

Field | Description
Loc_name | The name of the locator used to return a match result. This field is available only if the locator used for matching the table is a composite locator.
Match_addr | The address where the matched location actually resides based on the information of the matched candidate.
LongLabel | A longer version of Match_addr containing more administrative information.
ShortLabel | A shortened version of Match_addr.
Addr_type | The geocoded address type, which indicates the level at which the address matched. Supported match levels vary between countries. The table at the bottom of this section describes some possible values.
Type | The feature type for results returned by a search. The Type field only includes a value for candidates with an address type of POI or Locality. For example, the feature type of Starbucks might be Coffee Shop.
PlaceName | The formal name of a geocode match candidate (e.g., Paris or Starbucks).
Place_addr | The full street address of a place, including street, city, and region (e.g., 275 Columbus Ave., New York, New York).
Phone | The primary phone number of a place.
URL | The URL of the primary website for a place.
Rank | A number that indicates the importance of a result relative to other results with the same name. Smaller numbers represent higher-ranked features. Rank values are based on population or feature type. For example, there are cities in France and Texas named Paris. Paris, France, has a greater population than Paris, Texas, so it will have a higher rank.
AddBldg | The name of a building (e.g., Empire State Building).
AddNum | The alphanumeric value that represents the portion of an address typically known as a house number or building number. This value is returned for PointAddress and StreetAddress matches only.
AddNumFrom | A value representing the beginning number of a street address range. It is relative to the direction of feature digitization and is not always the smallest number in the range. This value is provided for StreetAddress match results.
AddNumTo | A value representing the ending number of a street address range. It is relative to the direction of feature digitization and is not always the largest number in the range. This value is provided for StreetAddress match results.
AddRange | The full address number range for the street segment that an address lies on, in the format AddNumFrom-AddNumTo. For example, the AddRange value for the street address 123 Main St. may be 101-199.
Side | The side of the street where an address resides relative to the direction of feature digitization. This value is not relative to the direction of travel along the street. L indicates that an address is matched to the left side, while R means the address is matched to the right side of the street. No value indicates that the address is not matched or the locator could not determine the side of the street.
StPreDir | An address element defining the direction of a street, which occurs before the primary street name (e.g., North in North Main Street).
StPreType | An address element defining the leading type of a street (e.g., Avenida in Avenida Central or Rue in Rue Lapin).
StName | An address element defining the primary name of a street (e.g., Main in North Main Street).
StType | An address element defining the trailing type of a street (e.g., Street in Main Street).
StDir | An address element defining the direction of a street, which occurs after the primary street name (e.g., North in Main Street North).
StPreDir1 | An address element defining the leading direction of the first street in an intersection.
StPreType1 | An address element defining the leading type of the first street in an intersection.
StName1 | An address element defining the primary name of the first street in an intersection.
StDir1 | An address element defining the trailing direction of the first street in an intersection.
StPreDir2 | An address element defining the leading direction of the second street in an intersection.
StPreType2 | An address element defining the leading type of the second street in an intersection.
StName2 | An address element defining the primary name of the second street in an intersection.
StDir2 | An address element defining the trailing direction of the second street in an intersection.
BldgName | The name or number of a building subunit (e.g., A in Building A).
BldgType | The classification of a building subunit. Examples include building, hangar, and tower.
LevelType | The classification of a floor subunit. Examples include floor, level, and department.
LevelName | The name or number of a floor subunit (e.g., 3 in Level 3).
UnitType | The classification of a unit subunit. Examples include unit, apartment, and suite.
UnitName | The name or number of a unit subunit (e.g., 2B in Apartment 2B).
SubAddr | The full subunit value for a candidate with an address type of Subaddress.
StAddr | The street address of a place without a zone, such as city or state (e.g., 275 Columbus Ave).
Address | The full address of a place (e.g., 2000 MCMILLAN AVE, COMPTON, CA 90220).
Block | The name of the block-level administrative division for a candidate. A block is the smallest administrative area for a country. It can be described as a subdivision of a sector or neighborhood, or a named city block. It is not commonly used.
Sector | The name of the sector-level administrative division for a candidate. A sector is a subdivision of a neighborhood or district, or a collection of blocks. It is not commonly used.
Nbrhd | The name of the neighborhood-level administrative division for a candidate. A neighborhood is a subsection of a city or district. For example, Little Italy is the name of a neighborhood in the city of San Diego, California.
Neighborhood | The name of the neighborhood-level administrative division for a candidate. It is an alias for the field Nbrhd.
District | The name of the district-level administrative division for a candidate, such as a subdivision of a city. For example, Wilhelmsburg is a district in the city of Hamburg in Germany.
City | The name of the city-level administrative division for a candidate. City is a subdivision of a subregion or region. For example, Atlanta is a city within Fulton County in the state of Georgia.
MetroArea | The name of the metropolitan area-level administrative division for a candidate. This is usually an urban area consisting of a large city and the smaller cities surrounding it. This can potentially intersect multiple subregions or regions. An example is the Kolkata Metropolitan Area in India.
Subregion | The name of the subregion-level administrative division for a candidate. Subregion is a subdivision of a region. For example, San Diego County is a subregion of the state of California.
Region | The name of the region-level administrative division for a candidate. This can be a subdivision of a country or territory. It is typically the largest administrative area for a country (such as state or province) if the Territory administrative division is not used.
RegionAbbr | The abbreviated region name. For example, the abbreviated name for California is CA.
Territory | The name of the territory-level administrative division for a candidate. This is a subdivision of a country and is not commonly used. An example is the Sudeste macroregion of Brazil, which encompasses the states of Espírito Santo, Minas Gerais, Rio de Janeiro, and São Paulo.
Postal | An alphanumeric address element defining the primary postal code (e.g., V7M 2B4 or 92374).
PostalExt | An alphanumeric address element defining the postal code extension (e.g., 8110 in 92373-8110).
Country | A three-character code for a country that follows the ISO 3166-1 alpha-3 standard.
CntryName | The full country name for an address candidate. The name may be in the same language as the input address, or in the language specified by the langCode parameter. If the full country name is not available in the specified language, the primary language of the country is used (e.g., 日本 for Japan).
LangCode | A three-character language code representing the language of the address. The code should follow the ISO 639-3 standard.
X | The primary x-coordinate of the matched address in the spatial reference of the locator.
Y | The primary y-coordinate of the matched address in the spatial reference of the locator.
DisplayX | The display x-coordinate of an address returned in the spatial reference of the locator.
DisplayY | The display y-coordinate of an address returned in the spatial reference of the locator.
Xmin | The minimum x-coordinate of a geocode result.
Xmax | The maximum x-coordinate of a geocode result.
Ymin | The minimum y-coordinate of a geocode result.
Ymax | The maximum y-coordinate of a geocode result.
ExInfo | A collection of strings from the input that could not be matched to any part of an address and were used to score or penalize the result.

The table below outlines the possible values for Addr_type:

Value | Description
Subaddress | A street address based on points that represent house and building subaddress locations. Typically, this is the most spatially accurate match level. The subaddress elements of unit type and unit identifier help to distinguish one subaddress within or between structures from another when several occur within the same location. Reference data contains address points or polygons with associated house numbers, street names, and subaddress elements, along with administrative divisions and optional postal code. An example is 3836 Emerald Ave., Suite C, La Verne, CA 91750.
PointAddress | A street address based on points that represent house and building locations. Reference data contains address points with associated house numbers and street names, along with administrative divisions and optional postal code. The X and Y and geometry output values for a PointAddress match represent the street entry location for the address; this is the location used for routing operations. The DisplayX and DisplayY values represent the rooftop or actual location of the address. An example is 380 New York St., Redlands, CA 92373.
Parcel | A plot of land that is considered real property and may include one or more homes or other structures. A parcel typically has an address and parcel identification number assigned to it, such as 17 011100120063.
StreetAddress | A street address that differs from PointAddress because the house number is interpolated from a range of numbers. Reference data contains street centerlines with house number ranges, along with administrative divisions and optional postal code information. An example is 647 Haight St., San Francisco, CA 94117.
StreetInt | A street address consisting of a street intersection along with city and optional state and postal code information. An example is Redlands Blvd. & New York St., Redlands, CA 92373.
StreetAddressExt | An estimated street address match that is returned when parameter matchOutOfRange=true and the input house number exceeds the house number range for the matched street segment.
POI | Points of interest. Reference data consists of administrative division, place-names, businesses, landmarks, and geographic features. An example is Starbucks.
DistanceMarker | A street address that represents the linear distance along a street, typically in kilometers or miles, from a designated origin location. An example is Carr 682 KM 4, Barceloneta, 00617.
StreetMidBlock | The estimated midpoint of a range of house numbers along a street segment that correspond to a city block. An example is 100 Block of Grant Ave, Millville, New Jersey. The location returned for a StreetMidBlock match is more precise than that of a StreetName match, but less precise than a StreetAddress match. This is currently only functional for the United States.
StreetName | Similar to a street address but without the house number. Reference data contains street centerlines with associated street names (no numbered address ranges), along with administrative divisions and optional postal code. An example is W Olive Ave., Redlands, CA 92373.
PostalExt | A postal code with an additional extension (e.g., 90210-3841). Reference data is postal code points with extensions.
Postal | Postal code (e.g., 90210). Reference data is postal code points.
PostalLoc | A combination of postal code and city name. Reference data is typically a union of postal boundaries and administrative (locality) boundaries. An example is 7132 Frauenkirchen.
Locality | A place-name representing a populated place. The Type output field provides more detailed information about the type of populated place. Possible Type values for Locality matches include Block, Sector, Neighborhood, District, City, MetroArea, County, State or Province, Territory, Country, and Zone.
Feature | A geocoding result returned by a locator created with the Create Feature Locator tool in ArcGIS Pro.
LatLong | An x,y coordinate pair. The LatLong address type is returned when an x,y coordinate pair such as 117.155579,32.703761 is the input.
XY-XY | A match based on the assumption that the first coordinate of the input is longitude and the second is latitude.
YX-YX | A match based on the assumption that the first coordinate of the input is latitude and the second is longitude.
MGRS | A Military Grid Reference System (MGRS) location, such as 46VFM5319397841.
USNG | A United States National Grid (USNG) location, such as 15TXN29753883.
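
Because unmatched records keep a null Match_addr and matched records carry an Addr_type, a result DataFrame can be inspected with ordinary Spark DataFrame calls. The following is a minimal sketch under that assumption: the helper names split_by_match and count_by_addr_type are hypothetical, and result stands for the DataFrame returned by run().

```python
def split_by_match(result_df):
    """Split a Reverse Geocode result into matched and unmatched rows.

    Hypothetical helper: relies only on the documented rule that
    unmatched records have a null Match_addr.
    """
    matched = result_df.where("Match_addr IS NOT NULL")
    unmatched = result_df.where("Match_addr IS NULL")
    return matched, unmatched


def count_by_addr_type(result_df):
    """Count result rows per Addr_type value (hypothetical helper)."""
    return result_df.groupBy("Addr_type").count()


# Usage, assuming `result` is the DataFrame returned by ReverseGeocode().run(df):
#   matched, unmatched = split_by_match(result)
#   count_by_addr_type(matched).show()
#   print(unmatched.count())
```

Both helpers return lazy DataFrames, so nothing is computed until an action such as show() or count() is called.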

How Reverse Geocode works

See the geocoding core concept topic for more information on the geocoding process.

Performance notes

To improve performance, limit the number of output fields returned in the tool output. For example, returning only the Minimal set of output fields should take less time to complete than returning All output fields.

Syntax

For more details, go to the GeoAnalytics Engine API reference for reverse geocode.

Setter | Description | Required
run(dataframe) | Runs the Reverse Geocode tool using the provided DataFrame. | Yes
setLocator(path) | Sets the address locator that will be used to reverse geocode the geometries. | Yes
setFeatureTypes(*feature_types) | Specifies the possible match types that will be returned. A single value or multiple values can be specified. Available values are Subaddress, PointAddress, StreetAddress, DistanceMarker, StreetName, StreetInt, Postal, Locality, and POI. | No
setLanguageCode(language_code) | Sets the language in which reverse geocoded addresses are returned. | No
setOutFields(predefined_set) | Sets the fields that will be included in the output DataFrame. The predefined_set parameter accepts three options: 'Minimal' (default), 'MinimalAndUserFields', and 'All'. | No
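
The setters in the table can be combined into a small builder that checks option values before a Spark job starts. This is a sketch, not part of the GeoAnalytics Engine API: check_options and build_reverse_geocoder are hypothetical names, and the strict, case-sensitive check against the values listed in the table is an assumption (the tool itself may be more lenient about casing).

```python
# Values taken from the setter table above.
OUT_FIELD_SETS = ("Minimal", "MinimalAndUserFields", "All")
FEATURE_TYPES = (
    "Subaddress", "PointAddress", "StreetAddress", "DistanceMarker",
    "StreetName", "StreetInt", "Postal", "Locality", "POI",
)


def check_options(out_fields="Minimal", feature_types=()):
    """Validate option values against the documented lists (assumed strict)."""
    if out_fields not in OUT_FIELD_SETS:
        raise ValueError(f"Unknown predefined_set: {out_fields!r}")
    unknown = [t for t in feature_types if t not in FEATURE_TYPES]
    if unknown:
        raise ValueError(f"Unknown feature types: {unknown}")
    return out_fields, tuple(feature_types)


def build_reverse_geocoder(locator_path, out_fields="Minimal", feature_types=()):
    """Return a configured ReverseGeocode tool; call .run(df) on the result."""
    out_fields, feature_types = check_options(out_fields, feature_types)
    # Imported lazily so the validation above works without GeoAnalytics Engine.
    from geoanalytics.tools import ReverseGeocode

    tool = ReverseGeocode().setLocator(locator_path).setOutFields(out_fields)
    if feature_types:
        # setFeatureTypes takes the types as separate positional arguments.
        tool = tool.setFeatureTypes(*feature_types)
    return tool


# Usage, assuming an authorized session and a DataFrame `df` with a point column:
#   tool = build_reverse_geocoder("/data/NA_locator.loc",
#                                 feature_types=["PointAddress", "StreetAddress"])
#   result = tool.run(df)
```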

Examples

Run Reverse Geocode

Python
# Log in
import geoanalytics
geoanalytics.auth(username="myusername", password="mypassword")

# Imports
from geoanalytics.tools import ReverseGeocode
from geoanalytics.sql import functions as ST

# URL to the public schools data
data_url = (
    "https://services1.arcgis.com/Ua5sjt3LWTPigjyD/arcgis/rest/services/"
    "Public_School_Location_201819/FeatureServer/0"
)

# Create a DataFrame of California public schools.
# An existing SparkSession named `spark` is assumed.
# Filter on STATE before select("shape") drops that column.
df = (
    spark.read.format("feature-service").load(data_url)
    .where("STATE = 'CA'")
    .withColumn("shape", ST.transform("shape", 4326))
    .select("shape")
)

# Path to the locator
# This needs to be accessible to the machine that is running the Reverse Geocode tool.
# If running on a cluster, it needs to be accessible to all nodes in the cluster.
north_america_locator = "/data/NA_locator.loc"

# Use Reverse Geocode to convert the coordinates into addresses
result = (
    ReverseGeocode()
    .setLocator(north_america_locator)
    .setOutFields("Minimal")
    .setLanguageCode("ENG")
    .setFeatureTypes("POI")
    .run(df)
)

# Show the first 5 results
result.show(5)
Result
+--------------------+--------------------+---------+
|               shape|          Match_addr|Addr_type|
+--------------------+--------------------+---------+
|{"x":-118.2159902...| Vasquez High School|      POI|
|{"x":-118.1856342...|Meadowlark Elemen...|      POI|
|{"x":-118.1951402...|  High Desert School|      POI|
|{"x":-121.9655031...|California School...|      POI|
|{"x":-121.9633661...|California School...|      POI|
+--------------------+--------------------+---------+
only showing top 5 rows

Version table

Release | Notes
1.3.0 | Python tool introduced
1.5.0 | Added support for loading the locator using SparkContext.addFile.
