Part 4 - Batch Geocoding
Introduction¶
The batch_geocode() function in the arcgis.geocoding module geocodes an entire list of addresses in a single call. Geocoding many addresses at once is also known as bulk geocoding. You can use this function to find the following types of locations:
- Street addresses (e.g. "27488 Stanford Ave, Bowden, North Dakota" or "380 New York St, Redlands, CA 92373")
- Administrative place names, such as city, county, state, province, or country names (e.g. "Seattle, Washington", "State of Mahārāshtra", or "Liechtenstein")
- Postal codes (e.g. "92591" or "TW9 1DN")
Batch sizes (max and suggested batch sizes)¶
There is a limit to the maximum number of addresses that can be geocoded in a single batch request with the geocoder. The MaxBatchSize property defines this limit. For instance, if MaxBatchSize=2000 and 3000 addresses are sent as input, only the first 2000 will be geocoded. The SuggestedBatchSize property is also useful, as it specifies the optimal number of addresses to include in a single batch request.
Both of these properties can be determined by querying the geocoder:
from arcgis.gis import GIS
from arcgis.geocoding import get_geocoders, batch_geocode
gis = GIS(profile="your_enterprise_profile")
# use the first of GIS's configured geocoders
geocoder = get_geocoders(gis)[0]
print("For current geocoder:")
print(" - MaxBatchSize: " + str(geocoder.properties.locatorProperties.MaxBatchSize))
print(" - SuggestedBatchSize: " + str(geocoder.properties.locatorProperties.SuggestedBatchSize))
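Because addresses beyond MaxBatchSize are not geocoded in a single request, one option is to split a long list into smaller chunks before sending it. The following is a minimal sketch in plain Python. The helper name chunk_addresses and the batch size of 3 are illustrative assumptions and are not part of the arcgis API.

```python
from typing import Iterator, List, Sequence


def chunk_addresses(addresses: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `addresses`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(addresses), batch_size):
        yield list(addresses[start:start + batch_size])


# Made-up sample input. In practice, batch_size would be read from
# geocoder.properties.locatorProperties.SuggestedBatchSize.
sample = ["addr 1", "addr 2", "addr 3", "addr 4", "addr 5", "addr 6", "addr 7"]
batches = list(chunk_addresses(sample, batch_size=3))
print([len(batch) for batch in batches])  # [3, 3, 1]
```

Each chunk could then be passed to batch_geocode() in turn and the returned lists concatenated.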
Batch geocode single line addresses, multi-line addresses¶
The batch_geocode() function supports searching for lists of places and addresses. Each address in the list can be specified as a single line of text (single-field format), or in multi-field format with the address components separated into multiple parameters.
The code snippet below displays the signature and parameters of the batch_geocode() function, along with a brief description:
help(batch_geocode)
The addresses parameter is a list of addresses to be geocoded. Each entry can be either:
- a single line of text (single-field batch geocoding): use a string.
- multiple fields (multi-field batch geocoding): use a dictionary of the address fields described in Part 3.
The geocoder provides localized versions of the input field names in all locales it supports.
Single Line Addresses¶
addresses = ["380 New York St, Redlands, CA",
"1 World Way, Los Angeles, CA",
"1200 Getty Center Drive, Los Angeles, CA",
"5905 Wilshire Boulevard, Los Angeles, CA",
"100 Universal City Plaza, Universal City, CA 91608",
"4800 Oak Grove Dr, Pasadena, CA 91109"]
results = batch_geocode(addresses)
map0 = gis.map("Los Angeles", 9)
map0
for address in results:
    map0.draw(address['location'])
    print(address['score'])
Each match has keys for score, location, attributes and address:
results[0].keys()
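To inspect many matches at once, the result dictionaries can be flattened into simple rows. The sketch below assumes each match is a dictionary with the address, score, and location keys listed above, and that location holds x and y entries. The helper name summarize_matches and the sample values are made up for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[str, float, Optional[float], Optional[float]]


def summarize_matches(results: List[Dict[str, Any]]) -> List[Row]:
    """Return one (address, score, x, y) tuple per result dictionary."""
    rows = []
    for match in results:
        # A result without a location yields None for both coordinates.
        location = match.get("location") or {}
        rows.append((
            match.get("address", ""),
            match.get("score", 0),
            location.get("x"),
            location.get("y"),
        ))
    return rows


# Made-up values shaped like batch_geocode() output (assumption).
sample_results = [
    {"address": "Sample St 1", "score": 100, "location": {"x": 1.0, "y": 2.0}, "attributes": {}},
    {"address": "", "score": 0},
]
for row in summarize_matches(sample_results):
    print(row)
```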
Multi-line Addresses¶
The earlier example showed how to call batch_geocode() with single line addresses. The following example illustrates how to call batch_geocode() with a list of multi-field addresses.
addresses= [{
"Address": "380 New York St.",
"City": "Redlands",
"Region": "CA",
"Postal": "92373"
},{
"Address": "1 World Way",
"City": "Los Angeles",
"Region": "CA",
"Postal": "90045"
}]
results = batch_geocode(addresses)
map1 = gis.map("Los Angeles", 9)
map1
for address in results:
    map1.draw(address['location'])
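Multi-field address dictionaries are often built from tabular data rather than typed by hand. The following sketch converts CSV text into the multi-field format used above. The input column names (street, city, state, zip), the FIELD_MAP mapping, and the helper name rows_to_multifield are hypothetical; only the target field names Address, City, Region, and Postal come from the example above.

```python
import csv
import io
from typing import Dict, List

# Hypothetical mapping from input column names to geocoder field names.
FIELD_MAP = {"street": "Address", "city": "City", "state": "Region", "zip": "Postal"}


def rows_to_multifield(csv_text: str, field_map: Dict[str, str] = FIELD_MAP) -> List[Dict[str, str]]:
    """Convert CSV text into a list of multi-field address dictionaries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {target: row[source].strip() for source, target in field_map.items()}
        for row in reader
    ]


sample_csv = (
    "street,city,state,zip\n"
    "380 New York St.,Redlands,CA,92373\n"
    "1 World Way,Los Angeles,CA,90045\n"
)
multifield_addresses = rows_to_multifield(sample_csv)
print(multifield_addresses[0])
```

The resulting list has the same shape as the addresses list passed to batch_geocode() in the example above.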
Get geocoded results as a FeatureSet object¶
When as_featureset is set to True, the geocoded results are returned as a FeatureSet object, which is more convenient to plot on a map or display as a DataFrame than results returned as dict objects.
results_fset = batch_geocode(addresses,
as_featureset=True)
results_fset
map1b = gis.map("Los Angeles", 9)
map1b
map1b.draw(results_fset)
results_fset.sdf
Batch geocoding using geocode_from_items()¶
The geocode_from_items() function geocodes a table or file of addresses and returns the geocoded results. It supports CSV, XLS, or table input. The task geocodes the entire file regardless of size. We can first take a look at its signature with help():
from arcgis.geocoding import geocode_from_items
help(geocode_from_items)
The geocode_from_items() function is popular because it allows the user to input a web item (e.g. a CSV file that has been uploaded to your organization beforehand) and generate a resulting web item (in this case, we have specified the output_type as Feature Layer). Let's look at an example below:
csv_item = gis.content.search("addresses_file", item_type="CSV")[0]
csv_item
from arcgis.geocoding import geocode_from_items
fl_item = geocode_from_items(csv_item, output_type='Feature Layer',
geocode_parameters={"field_info": ['Addresses', 'TEXT', 255],
"column_names": ["Addresses"],
"field_mapping": ['Addresses', 'Address']
},
output_name="address_file_matching",
gis=gis)
Example of geocoding POIs (category param)¶
category parameter¶
The category parameter is a place or address type that can be used to filter batch geocoding results. The parameter accepts a single category value or multiple comma-separated values.
Single category filtering example:
category="Address"
Multiple category filtering example:
category="Address,Postal"
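When the categories of interest are held in a Python list, they need to be joined into the comma-separated string format shown above. A minimal sketch follows. The helper name build_category is an illustrative assumption and is not part of the arcgis API.

```python
def build_category(*categories: str) -> str:
    """Join category names into one comma-separated string, skipping blanks."""
    cleaned = [name.strip() for name in categories if name and name.strip()]
    return ",".join(cleaned)


print(build_category("Address", "Postal"))  # Address,Postal
```

The returned string could then be passed as the category argument of batch_geocode().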
We will now explore some examples that take advantage of the category parameter, in the following order:
- airports using their codes
- a list of city names
- restaurants of a few different sub-categories (Peruvian, Japanese, Korean, French..)
Example: Finding airports using their codes¶
airports = batch_geocode(["LAX", "SFO", "ONT", "FAT", "LGB"], category="airport")
map2 = gis.map("CA", 6)
map2
for airport in airports:
    popup = {
        "title" : airport['attributes']['PlaceName'],
        "content" : airport['address']
    }
    map2.draw(airport['location'], popup)
Examples of source_country and lang_code¶
source_country parameter¶
The source_country parameter is a value representing the country. When a value is passed for this parameter, all of the addresses in the input table are sent to the specified country locator to be geocoded. For example, if source_country="USA" is passed in a batch_geocode() call, it is assumed that all of the addresses are in the United States, and so all of the addresses are sent to the USA country locator. Using this parameter can increase batch geocoding performance when all addresses are within a single country.
Acceptable values include the full country name, the ISO 3166-1 two-character country code, or the ISO 3166-1 three-character country code.
A list of supported countries and codes is available here.
Example:
source_country="USA"
lang_code parameter¶
The lang_code parameter is optional. When specified, it sets the language in which geocode results are returned. See the table of supported countries for valid language code values in each country.
Example: Finding Indian Cities and Returning Results in Hindi¶
india_cities = batch_geocode(["Mumbai", "New Delhi", "Kolkata"],
category="city",
source_country="IND",
lang_code="HI")
for city in india_cities:
    print(city['address'])
india_map = gis.map("India")
india_map
for city in india_cities:
    india_map.draw(city['location'])
Getting results in desired coordinate system¶
out_sr parameter¶
This parameter is the spatial reference of the x/y coordinates returned by the geocode method. It is useful for applications using a map with a spatial reference different from that of the geocoder.
The spatial reference can be specified as either a well-known ID (WKID) or as a JSON spatial reference object. If out_sr is not specified, the spatial reference of the output locations is the same as that of the geocoder. The World Geocoding Service spatial reference is WGS84 (WKID = 4326).
For a list of valid WKID values, see Projected Coordinate Systems and Geographic Coordinate Systems.
Example (102100 is the WKID for the Web Mercator projection):
out_sr=102100
airports[0]['address'], airports[0]['location']
For instance, the default output spatial reference is WGS84 (WKID 4326), as shown in the previous cell. If we specify out_sr as 102100, the x/y coordinates returned by batch_geocode() are in Web Mercator, as displayed below:
airports_2 = batch_geocode(["LAX", "SFO", "ONT", "FAT", "LGB"],
category="airport",
out_sr=102100)
airports_2[0]['address'], airports_2[0]['location']
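To sanity-check coordinates returned in Web Mercator, the standard spherical Mercator formulas can be applied to a WGS84 longitude/latitude pair by hand. This is a sketch of the textbook conversion, not a call into the arcgis API. The function name and the sample point are illustrative assumptions.

```python
import math
from typing import Tuple

# WGS84 semi-major axis in metres, used as the sphere radius by Web Mercator.
EARTH_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon: float, lat: float) -> Tuple[float, float]:
    """Convert WGS84 degrees (WKID 4326) to Web Mercator metres (WKID 102100)."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# Arbitrary illustrative point in southern California (assumption).
x, y = wgs84_to_web_mercator(-118.0, 34.0)
print(round(x, 1), round(y, 1))
```

Comparing such values against the location returned with out_sr=102100 is a quick way to confirm that the expected projection was applied.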
How to handle errors and exceptions¶
There are times when geocode(), batch_geocode(), or geocode_from_items() raises an error or exception. In the next example, we will look at how to handle these cases. The example attempts to run geocode_from_items() when the connected GIS does not support batch geocoding.
First, we can inspect the geocoding service at the URL shown below and confirm that there is no BatchGeocode capability in its tasks property.
from arcgis.geoprocessing._tool import Toolbox
tbx = Toolbox(url=gis.properties.helperServices.geocode[0]["url"], gis=gis)
tbx.properties.tasks
If we force the use of geocode_from_items() with this geocoding service URL, the function raises an AttributeError.
try:
    fl_item = geocode_from_items(csv_item, output_type='Feature Layer',
                                 geocode_service_url=gis.properties.helperServices.geocode[0]["url"],
                                 geocode_parameters={"field_info": ['Addresses', 'TEXT', 255],
                                                     "column_names": ["Addresses"],
                                                     "field_mapping": ['Addresses', 'Address']
                                                     },
                                 output_name="address_file_matching",
                                 gis=gis)
    display(fl_item)
except AttributeError as e:
    print(e)
The solution is to change the active_gis and the geocode_service_url to ones that enable BatchGeocode.
p_gis = GIS("<an alternative enterprise>", "user name", "password", verify_cert=False)
p_gis.properties.helperServices.geocode[0]["url"]
try:
    fl_item = geocode_from_items(csv_item, output_type='Feature Layer',
                                 geocode_service_url=p_gis.properties.helperServices.geocode[0]["url"],
                                 geocode_parameters={"field_info": ['Addresses', 'TEXT', 255],
                                                     "column_names": ["Addresses"],
                                                     "field_mapping": ['Addresses', 'Address']
                                                     },
                                 output_name="address_file_matching",
                                 gis=p_gis)
    display(type(fl_item))
except AttributeError as e:
    print(e)
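The try/except pattern above can be generalized so that several candidate services are tried in order until one succeeds. The sketch below is plain Python with stub functions standing in for the real calls. The helper name first_successful and both stubs are hypothetical. In practice, each callable would wrap a geocode_from_items() call against a different GIS.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def first_successful(attempts: Iterable[Callable[[], T]]) -> Optional[T]:
    """Run each callable in order and return the first result that does not raise AttributeError."""
    for attempt in attempts:
        try:
            return attempt()
        except AttributeError as exc:
            # Report the failure and move on to the next candidate.
            print(f"Attempt failed: {exc}")
    return None


# Stubs standing in for real geocoding calls (assumptions for illustration).
def unsupported_service() -> str:
    raise AttributeError("service has no BatchGeocode task")


def supported_service() -> str:
    return "geocoded item"


print(first_successful([unsupported_service, supported_service]))
```

Only AttributeError is caught here, matching the example above. Other exception types still propagate to the caller.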
Avoiding fallbacks¶
You can also use category filtering to avoid "low resolution" fallback matches. By default, if the World Geocoding Service cannot find a match for an input address, it will automatically search for a lower match level, such as a street name, city, or postal code. For batch geocoding, a user may prefer that no match is returned in these cases so that they are not charged for the geocode. If a user passes category="Point Address,Street Address" in a batch_geocode() call, no fallback will occur if address matches cannot be found; the user will only be charged for the actual address matches.
Example: Batch geocode with fallback allowed (no category)¶
In the example below, the second address is not matched to a point address, but is matched to the city instead, due to fallback.
results = batch_geocode(["380 New York St Redlands CA 92373",
"? Stanford Dr Escondido CA"])
for result in results:
    print("Score " + str(result['score']) + " : " + result['address'])
Example: Batch geocode with no fallback allowed (category="Street Address")¶
In the example below, no street address match is found for the second address, and there is no low resolution fallback because the category has been set to Street Address. As a result, no match is returned for the second address:
results = batch_geocode([ "380 New York St Redlands CA 92373",
"? Stanford Dr Escondido CA"],
category="Street Address")
for result in results:
    print("Score " + str(result['score']) + " : " + result['address'])
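When fallbacks are disabled, it is useful to separate the addresses that matched from those that did not. The sketch below assumes that an unmatched result carries a score of 0. That assumption, the helper name split_matches, and the sample values are illustrative and should be checked against the actual output.

```python
from typing import Any, Dict, List, Tuple

Result = Dict[str, Any]


def split_matches(results: List[Result], min_score: float = 0) -> Tuple[List[Result], List[Result]]:
    """Partition results into (matched, unmatched) using a score threshold."""
    matched = [r for r in results if r.get("score", 0) > min_score]
    unmatched = [r for r in results if r.get("score", 0) <= min_score]
    return matched, unmatched


# Made-up values shaped like batch_geocode() output (assumption).
sample_results = [
    {"score": 100, "address": "380 New York St, Redlands, California, 92373"},
    {"score": 0, "address": ""},
]
matched, unmatched = split_matches(sample_results)
print(len(matched), "matched,", len(unmatched), "unmatched")
```

Raising min_score gives a stricter filter, for example to review low-confidence matches by hand.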
Conclusions¶
In Part 4, we explored the use of the batch_geocode() function and how its advanced parameters can help fine-tune and filter the geocoded results.