Batch Geocoding¶
The batch_geocode() function in the arcgis.geocoding module geocodes an entire list of addresses. Geocoding many addresses at once is also known as bulk geocoding.
This method can be used to find the following types of locations:
- Street addresses:
- 27488 Stanford Ave, Bowden, North Dakota
- 380 New York St, Redlands, CA 92373
- Administrative place names, such as city, county, state, province, or country names:
- Seattle, Washington
- State of Mahārāshtra
- Liechtenstein
- Postal codes:
- 92591
- TW9 1DN
Note: Points of interest (POI) can only be batch geocoded by using the category parameter to specify the place types to geocode.
The addresses in your table can be stored in a single field or in multiple fields — one for each address component. Batch geocoding performance is better when the address parts are stored in separate fields.
In this guide, we will observe:
- Maximum addresses
- Batch geocode access
- batch_geocode() function signature and parameters
- Examples
- Category filtering
Maximum addresses¶
There is a limit to the maximum number of addresses that can be geocoded in a single batch request with the geocoder. The MaxBatchSize property defines this limit. For instance, if MaxBatchSize=2000, and 3000 addresses are sent as input, only the first 2000 will be geocoded. The SuggestedBatchSize property is also useful as it specifies the optimal number of addresses to include in a single batch request.
Both of these properties can be determined by querying the geocoder:
from arcgis.gis import GIS
from arcgis.geocoding import get_geocoders, batch_geocode
gis = GIS("http://www.arcgis.com", "username", "password")
# use the first of GIS's configured geocoders
geocoder = get_geocoders(gis)[0]
print("MaxBatchSize : " + str(geocoder.properties.locatorProperties.MaxBatchSize))
print("SuggestedBatchSize : " + str(geocoder.properties.locatorProperties.SuggestedBatchSize))
The client application must account for the limit by dividing the input address table into lists of MaxBatchSize or fewer addresses and sending each list to the service as a separate request. Note that the gis.content.import_data() and item.publish() methods take care of this for you.
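As a minimal sketch of the splitting described above, the hypothetical helper chunk_addresses() below divides an address list into batches no larger than a given size. The helper name and the hard-coded batch size are assumptions for illustration; in practice the size would be read from the geocoder's MaxBatchSize or SuggestedBatchSize property.

```python
def chunk_addresses(addresses, batch_size):
    """Yield successive lists of at most batch_size addresses."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(addresses), batch_size):
        yield addresses[start:start + batch_size]


# Illustrative input only; batch_size would normally come from the geocoder.
sample = ["addr %d" % i for i in range(5)]
batches = list(chunk_addresses(sample, batch_size=2))
print([len(batch) for batch in batches])  # → [2, 2, 1]
```

Each batch could then be passed to batch_geocode() in turn and the returned lists concatenated.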
For batch geocode operations, the geocoder returns a response when each address in the input recordset has been geocoded. If an unhandled error such as a timeout occurs during the process, the geocoder will not return the results for that call, even if most of the addresses in the input have already been geocoded. For this reason, the client application should implement logic to detect and handle such errors.
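One possible way to handle such errors is to send each batch separately, retry a failed batch a few times, and keep track of batches that never succeed. The sketch below is an assumption about how a client might do this, not part of the arcgis API: geocode_in_batches() is a hypothetical helper, and geocode_fn stands for any callable that accepts a list of addresses and returns a list of results (batch_geocode would be one candidate). A stand-in function simulates a timeout so the example runs without a GIS connection.

```python
import time


def geocode_in_batches(batches, geocode_fn, retries=2, wait_seconds=0.0):
    """Geocode each batch separately, retrying batches that raise an error.

    Returns a tuple (results, failed_batches).
    """
    results, failed = [], []
    for batch in batches:
        for attempt in range(retries + 1):
            try:
                results.extend(geocode_fn(batch))
                break
            except Exception:  # broad on purpose: the error type depends on the failure
                if attempt == retries:
                    failed.append(batch)  # keep the batch so it can be inspected or resubmitted
                else:
                    time.sleep(wait_seconds)
    return results, failed


# Stand-in for a real geocoder call: fails on the first call, then succeeds.
calls = {"count": 0}


def flaky_geocode(batch):
    calls["count"] += 1
    if calls["count"] == 1:
        raise TimeoutError("simulated timeout")
    return [{"address": address} for address in batch]


results, failed = geocode_in_batches([["a", "b"], ["c"]], flaky_geocode)
print(len(results), failed)  # → 3 []
```

Because a failed call returns nothing for its whole batch, retrying at the batch level keeps the results from other batches intact.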
Batch geocode access¶
batch_geocode() function signature and parameters¶
The batch_geocode() function supports searching for lists of places and addresses. Each address in the list can be specified as a single line of text (single field format), or in multi-field format with the address components separated into multiple parameters.
The code snippet below displays the signature and parameters of the batch_geocode() function, along with a brief description:
help(batch_geocode)
addresses parameter¶
A list of addresses to be geocoded.
- For passing in the location name as a single line of text (single field batch geocoding), use a string.
- For passing in the location name as separate address components (multifield batch geocoding), use a dictionary keyed by the address fields described in the Geocoder documentation.
The Geocoder provides localized versions of the input field names in all locales supported by it. See the topic Localized input field names in the Geocoder documentation for more information.
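When the addresses live in a table with one column per component, the rows have to be renamed to the geocoder's input field names first. The sketch below shows one way to do that. The helper rows_to_multifield(), the FIELD_MAP mapping, and the source column names (street, city, state, zip) are all hypothetical; the target field names Address, City, Region, and Postal are the ones used in the multi-field example in this guide.

```python
import csv
import io

# Hypothetical mapping: table column name -> geocoder input field name.
FIELD_MAP = {
    "street": "Address",
    "city": "City",
    "state": "Region",
    "zip": "Postal",
}


def rows_to_multifield(rows, field_map):
    """Rename each row's columns to geocoder field names, skipping blank values."""
    addresses = []
    for row in rows:
        address = {}
        for column, field in field_map.items():
            value = (row.get(column) or "").strip()
            if value:
                address[field] = value
        addresses.append(address)
    return addresses


# In-memory stand-in for a CSV file on disk.
table = io.StringIO(
    "street,city,state,zip\n"
    "380 New York St.,Redlands,CA,92373\n"
    "1 World Way,Los Angeles,CA,90045\n"
)
addresses = rows_to_multifield(csv.DictReader(table), FIELD_MAP)
print(addresses[0])
```

The resulting list of dictionaries has the same shape as the addresses argument used for multi-field batch geocoding.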
Example: batch geocode using single line addresses¶
addresses = ["380 New York St, Redlands, CA",
"1 World Way, Los Angeles, CA",
"1200 Getty Center Drive, Los Angeles, CA",
"5905 Wilshire Boulevard, Los Angeles, CA",
"100 Universal City Plaza, Universal City, CA 91608",
"4800 Oak Grove Dr, Pasadena, CA 91109"]
results = batch_geocode(addresses)
map = gis.map("Los Angeles", 9)
map
for address in results:
    map.draw(address['location'])
Each match has keys for score, location, attributes and address properties:
results[0].keys()
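The score key can be used to separate confident matches from ones that deserve a second look. The following is only a sketch: split_by_score() is a hypothetical helper, the threshold of 90 is an arbitrary assumption, and the sample matches are made up to mimic the keys listed above rather than taken from a real geocoder response.

```python
def split_by_score(results, min_score=90):
    """Return (accepted, needs_review) based on each match's 'score' key."""
    accepted = [r for r in results if r["score"] >= min_score]
    needs_review = [r for r in results if r["score"] < min_score]
    return accepted, needs_review


# Made-up matches that only imitate the shape of a batch geocoding result.
sample_results = [
    {"score": 100, "address": "Example address A",
     "location": {"x": 0.0, "y": 0.0}, "attributes": {}},
    {"score": 72.5, "address": "Example address B",
     "location": {"x": 1.0, "y": 1.0}, "attributes": {}},
]

accepted, needs_review = split_by_score(sample_results)
print(len(accepted), len(needs_review))  # → 1 1
```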
category parameter¶
A place or address type which can be used to filter batch geocoding results. The parameter supports input of single category values or multiple comma-separated values. See the help topic Category filtering for complete details about the category parameter.
Example of category filtering with a single category:
category="Address"
Example of category filtering with multiple categories:
category="Address,Postal"
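If the categories are held in a Python list, they need to be joined into the comma-separated form shown above before being passed as the category parameter. A minimal sketch, where build_category() is a hypothetical helper and not part of the arcgis API:

```python
def build_category(categories):
    """Join category names into a single comma-separated string."""
    cleaned = [c.strip() for c in categories if c and c.strip()]
    return ",".join(cleaned)


print(build_category(["Address", "Postal"]))  # → Address,Postal
```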
source_country parameter¶
A value representing the country. When a value is passed for this parameter, all of the addresses in the input table are sent to the specified country locator to be geocoded. For example, if source_country="USA" is passed in a batch_geocode() call, it is assumed that all of the addresses are in the United States, and so all of the addresses are sent to the USA country locator. Using this parameter can increase batch geocoding performance when all addresses are within a single country.
Acceptable values include the full country name, the ISO 3166-1 two-letter (alpha-2) country code, or the ISO 3166-1 three-letter (alpha-3) country code.
A list of supported countries and codes is available here.
Example:
source_country="USA"
out_sr parameter¶
The spatial reference of the x/y coordinates returned by the geocode method. This is useful for applications using a map with a spatial reference different than that of the geocoder.
The spatial reference can be specified as either a well-known ID (WKID) or as a JSON spatial reference object. If out_sr is not specified, the spatial reference of the output locations is the same as that of the geocoder. The World Geocoding Service spatial reference is WGS84 (WKID = 4326).
For a list of valid WKID values, see Projected Coordinate Systems and Geographic Coordinate Systems.
Example:
out_sr=102100 (102100 is the WKID for the Web Mercator projection)
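The two accepted forms, a bare WKID and a JSON spatial reference object, can be illustrated with a small normalizing helper. This is a sketch only: as_spatial_reference() is a hypothetical function, not part of the arcgis API, and it assumes the JSON form is a dictionary with a "wkid" key.

```python
def as_spatial_reference(out_sr):
    """Return a spatial reference dict from either a WKID integer or a dict."""
    if isinstance(out_sr, int) and not isinstance(out_sr, bool):
        return {"wkid": out_sr}
    if isinstance(out_sr, dict) and "wkid" in out_sr:
        return dict(out_sr)  # copy so the caller's dict is not modified
    raise TypeError("out_sr must be a WKID integer or a dict with a 'wkid' key")


print(as_spatial_reference(102100))            # → {'wkid': 102100}
print(as_spatial_reference({"wkid": 4326}))    # → {'wkid': 4326}
```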
Batch geocoding output fields¶
When you geocode a list of addresses, the output fields are returned as part of the attributes in the response. See the example JSON response below which shows all of the output fields that are returned for each record from a batch geocode process. The output fields are described here.
Batch geocoding examples¶
The earlier example showed how to call batch_geocode() with single line addresses. The following example illustrates how to call batch_geocode() with a list of multi-field addresses.
Example: Batch geocode using multiple field addresses¶
addresses= [{
"Address": "380 New York St.",
"City": "Redlands",
"Region": "CA",
"Postal": "92373"
},{
"Address": "1 World Way",
"City": "Los Angeles",
"Region": "CA",
"Postal": "90045"
}]
results = batch_geocode(addresses)
map = gis.map("Los Angeles", 9)
map
for address in results:
    map.draw(address['location'])
Category filtering¶
The batch_geocode() method supports batch geocode filtering by category values, which represent address and place types. By including the category parameter in a batch_geocode() call you can avoid false positive matches to unexpected place and address types due to ambiguous input.
For example, a user has a table of three-letter airport codes that they want to geocode. There may be city or business names that are the same as an airport code, causing false positive matches to other places. However, the user can ensure that only airport matches are returned by specifying category="airport" in the request.
Example: Batch geocode airport codes with category¶
airports = batch_geocode(["LAX", "SFO", "ONT", "FAT", "LGB"], category="airport")
map = gis.map("CA", 6)
map
for airport in airports:
    popup = {
        "title" : airport['attributes']['PlaceName'],
        "content" : airport['address']
    }
    map.draw(airport['location'], popup)
You can also use category filtering to avoid "low resolution" fallback matches. By default, if the World Geocoding Service cannot find a match for an input address, it automatically searches for a lower match level, such as a street name, city, or postal code. For batch geocoding, a user may prefer that no match is returned in such cases so that they are not charged for the geocode. If a user passes category="Point Address,Street Address" in a batch_geocode() call, no fallback will occur if address matches cannot be found; the user will only be charged for the actual address matches.
Example: Batch geocode with fallback allowed (no category)¶
In the example below, the second address is not matched to a point address, but is matched to the city instead, due to fallback:
results = batch_geocode(["380 New York St Redlands CA 92373",
"27488 Stanford Dr Escondido CA"])
for result in results:
    print("Score " + str(result['score']) + " : " + result['address'])
Example: Batch geocode with no fallback allowed (category="Point Address")¶
In the example below, no point address match is found for the second address. Because the category has been set to Point Address, there is no low resolution fallback, and no match is returned for that address:
results = batch_geocode(["380 New York St Redlands CA 92373",
"27488 Stanford Dr Escondido CA"],
category="Point Address")
for result in results:
    print("Score " + str(result['score']) + " : " + result['address'])
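When fallback is allowed, it can help to flag which matches resolved to something coarser than an address. The sketch below assumes that each match's attributes include an Addr_type field describing the match level, with values such as PointAddress, StreetAddress, or Locality; treat those names as assumptions to check against the output fields documentation. find_fallbacks() is a hypothetical helper, and the sample matches are made up.

```python
def find_fallbacks(results, wanted=("PointAddress", "StreetAddress")):
    """Return the matches whose assumed Addr_type is not one of the wanted levels."""
    fallbacks = []
    for result in results:
        match_level = result.get("attributes", {}).get("Addr_type", "")
        if match_level not in wanted:
            fallbacks.append(result)
    return fallbacks


# Made-up matches that only imitate the shape of a batch geocoding result.
sample_results = [
    {"score": 100, "address": "Example street address",
     "attributes": {"Addr_type": "PointAddress"}},
    {"score": 98, "address": "Example city",
     "attributes": {"Addr_type": "Locality"}},
]

for match in find_fallbacks(sample_results):
    print("Fallback match: " + match["address"])  # → Fallback match: Example city
```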