Part 4 - Batch Geocoding

Introduction

The batch_geocode() function in the arcgis.geocoding module geocodes an entire list of addresses. Geocoding many addresses at once is also known as bulk geocoding. You can use this function to find the following types of locations:

  • Street addresses (e.g. 27488 Stanford Ave, Bowden, North Dakota, or 380 New York St, Redlands, CA 92373)
  • Administrative place names, such as city, county, state, province, or country names (e.g. Seattle, Washington, State of Mahārāshtra, or Liechtenstein)
  • Postal codes (e.g. 92591 or TW9 1DN)

Batch sizes (max and suggested batch sizes)

There is a limit to the maximum number of addresses that can be geocoded in a single batch request with the geocoder. The MaxBatchSize property defines this limit. For instance, if MaxBatchSize=2000, and 3000 addresses are sent as input, only the first 2000 will be geocoded. The SuggestedBatchSize property is also useful as it specifies the optimal number of addresses to include in a single batch request.

Both of these properties can be determined by querying the geocoder:

from arcgis.gis import GIS
from arcgis.geocoding import get_geocoders, batch_geocode
gis = GIS(profile="your_enterprise_profile")
# use the first of GIS's configured geocoders
geocoder = get_geocoders(gis)[0]
print("For current geocoder:")
print(" - MaxBatchSize: " + str(geocoder.properties.locatorProperties.MaxBatchSize))
print(" - SuggestedBatchSize: " + str(geocoder.properties.locatorProperties.SuggestedBatchSize))
For current geocoder:
 - MaxBatchSize: 1000
 - SuggestedBatchSize: 150
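Because a single request is capped at MaxBatchSize, a longer address list has to be split before it is sent. The sketch below shows one way to do that in plain Python. Note that chunk_addresses is a hypothetical helper (it is not part of the arcgis package), the input is a list of dummy strings, and the batch size of 150 is simply the SuggestedBatchSize printed above for this particular geocoder.

```python
from typing import Iterator, List, Sequence


def chunk_addresses(addresses: Sequence, batch_size: int) -> Iterator[List]:
    """Yield consecutive slices of `addresses`, each at most `batch_size` long.

    Hypothetical helper for illustration; it is not part of the arcgis package.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(addresses), batch_size):
        yield list(addresses[start:start + batch_size])


# Placeholder input: 400 dummy strings standing in for real addresses.
sample_addresses = [f"address {i}" for i in range(400)]

# 150 is the SuggestedBatchSize reported by the geocoder queried above.
batches = list(chunk_addresses(sample_addresses, 150))
batch_lengths = [len(batch) for batch in batches]
print(batch_lengths)
```

Each batch could then be passed to batch_geocode() in turn and the returned matches collected into one list.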

Batch geocode single line addresses, multi-line addresses

The batch_geocode() function supports searching for lists of places and addresses. Each address in the list can be specified as a single line of text (single field format), or in multi-field format with the address components separated into multiple parameters.

The code snippet below displays the signature of the batch_geocode() function and its parameters, along with a brief description of each:

help(batch_geocode)
Help on function batch_geocode in module arcgis.geocoding._functions:

batch_geocode(addresses, source_country=None, category=None, out_sr=None, geocoder=None, as_featureset=False, match_out_of_range=True, location_type='street', search_extent=None, lang_code='EN', preferred_label_values=None)
    The batch_geocode() function geocodes an entire list of addresses.
    Geocoding many addresses at once is also known as bulk geocoding.
    
    =========================     ================================================================
    **Argument**                  **Description**
    -------------------------     ----------------------------------------------------------------
    addresses                     required list of strings or dictionaries.
                                  A list of addresses to be geocoded.
                                  For passing in the location name as a single line of text -
                                  single field batch geocoding - use a string.
                                  For passing in the location name as multiple lines of text
                                  multifield batch geocoding - use the address fields described
                                  in the Geocoder documentation.
                                  The maximum number of addresses that can be geocoded in a
                                  single request is limited to the SuggestedBatchSize property of
                                  the locator.
                                  Syntax:
                                  addresses = ["380 New York St, Redlands, CA",
                                    "1 World Way, Los Angeles, CA",
                                    "1200 Getty Center Drive, Los Angeles, CA",
                                    "5905 Wilshire Boulevard, Los Angeles, CA",
                                    "100 Universal City Plaza, Universal City, CA 91608",
                                    "4800 Oak Grove Dr, Pasadena, CA 91109"]
    
                                  OR
    
                                  addresses= [{
                                       "Address": "380 New York St.",
                                       "City": "Redlands",
                                       "Region": "CA",
                                       "Postal": "92373"
                                   },{
                                       "Address": "1 World Way",
                                       "City": "Los Angeles",
                                       "Region": "CA",
                                       "Postal": "90045"
                                   }]
    -------------------------     ----------------------------------------------------------------
    source_country                optional string, The source_country parameter is
                                  only supported by geocoders published using StreetMap
                                  Premium locators.
                                  Added at 10.3 and only supported by geocoders published
                                  with ArcGIS 10.3 for Server and later versions.
    -------------------------     ----------------------------------------------------------------
    category                      The category parameter is only supported by geocode
                                  services published using StreetMap Premium locators.
    -------------------------     ----------------------------------------------------------------
    out_sr                        optional dictionary, The spatial reference of the
                                  x/y coordinates returned by a geocode request. This
                                  is useful for applications using a map with a spatial
                                  reference different than that of the geocode service.
    -------------------------     ----------------------------------------------------------------
    as_featureset                 optional boolean, if True, the result set is
                                  returned as a FeatureSet object, else it is a
                                  dictionary.
    -------------------------     ----------------------------------------------------------------
    geocoder                      Optional, the geocoder to be used. If not specified,
                                  the active GIS's first geocoder is used.
    -------------------------     ----------------------------------------------------------------
    match_out_of_range            Optional, A Boolean which specifies if StreetAddress matches should
                                  be returned even when the input house number is outside of the house
                                  number range defined for the input street.
    -------------------------     ----------------------------------------------------------------
    location_type                 Optional, Specifies if the output geometry of PointAddress matches
                                  should be the rooftop point or street entrance location. Valid values
                                  are rooftop and street.
    -------------------------     ----------------------------------------------------------------
    search_extent                 Optional, a set of bounding box coordinates that limit the search
                                  area to a specific region. The input can either be a comma-separated
                                  list of coordinates defining the bounding box or a JSON envelope
                                  object.
    -------------------------     ----------------------------------------------------------------
    lang_code                     Optional, sets the language in which geocode results are returned.
                                  See the table of supported countries for valid language code values
                                  in each country.
    -------------------------     ----------------------------------------------------------------
    preferred_label_values        Optional, allows simple configuration of output fields returned
                                  in a response from the World Geocoding Service by specifying which
                                  address component values should be included in output fields. Supports
                                  a single value or a comma-delimited collection of values as input.
                                  e.g. ='matchedCity,primaryStreet'
    =========================     ================================================================
    
    :returns:
       dictionary or FeatureSet

The addresses parameter is a list of addresses to be geocoded. Each entry can take one of two forms:

  • a single line of text (single field batch geocoding): use a string.
  • multiple fields (multifield batch geocoding): use a dictionary keyed by the address fields described in Part 3.

The Geocoder provides localized versions of the input field names in all locales supported by it.

Single Line Addresses

addresses = ["380 New York St, Redlands, CA", 
             "1 World Way, Los Angeles, CA",
             "1200 Getty Center Drive, Los Angeles, CA", 
             "5905 Wilshire Boulevard, Los Angeles, CA",
             "100 Universal City Plaza, Universal City, CA 91608",
             "4800 Oak Grove Dr, Pasadena, CA 91109"]
results = batch_geocode(addresses)
map0 = gis.map("Los Angeles", 9)
map0
for address in results:
    map0.draw(address['location'])
    print(address['score'])
100
100
100
100
100
98.18

Each match has keys for score, location, attributes and address:

results[0].keys()
dict_keys(['address', 'location', 'score', 'attributes'])
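Since every match is a plain dictionary, the list can be reshaped with ordinary Python. The sketch below flattens matches into (address, score, x, y) rows. Here to_rows is a hypothetical helper, and the sample match is hand-built for illustration: its address and coordinates are the LAX values printed later in this guide, while the score and the empty attributes are placeholders, not live output.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, float, float, float]


def to_rows(matches: List[Dict]) -> List[Row]:
    """Flatten match dictionaries into (address, score, x, y) tuples.

    Hypothetical helper; assumes each match has the 'address', 'score'
    and 'location' keys shown above, with 'x' and 'y' inside 'location'.
    """
    rows = []
    for match in matches:
        location = match["location"]
        rows.append((match["address"], match["score"], location["x"], location["y"]))
    return rows


# Hand-built sample, NOT live output: the address and coordinates are the
# LAX values printed later in this guide; score and attributes are placeholders.
sample_matches = [
    {
        "address": "LAX",
        "location": {"x": -118.40896999999995, "y": 33.94251000000003},
        "score": 100,
        "attributes": {},
    }
]

rows = to_rows(sample_matches)
print(rows)
```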

Multi-line Addresses

The earlier example showed how to call batch_geocode() with single line addresses. The following example illustrates how to call batch_geocode() with a list of multi-field addresses.

addresses= [{
                "Address": "380 New York St.",
                "City": "Redlands",
                "Region": "CA",
                "Postal": "92373"
            },{
                "Address": "1 World Way",
                "City": "Los Angeles",
                "Region": "CA",
                "Postal": "90045"
            }]
results = batch_geocode(addresses)
map1 = gis.map("Los Angeles", 9)
map1
for address in results:
    map1.draw(address['location'])
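Multi-field input often starts life as a table. As a sketch, the snippet below uses the standard csv module to turn delimited text into the list of dictionaries expected by the addresses parameter. The helper name rows_to_addresses and the inline CSV text are assumptions made for illustration; the column names are the same Address, City, Region, and Postal fields used above.

```python
import csv
import io
from typing import Dict, List

# Field names used in the multi-field example above.
ADDRESS_FIELDS = ("Address", "City", "Region", "Postal")


def rows_to_addresses(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text with a header row into multi-field address dictionaries.

    Hypothetical helper: keeps only the columns listed in ADDRESS_FIELDS and
    strips surrounding whitespace from each value.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    addresses = []
    for row in reader:
        addresses.append(
            {field: (row.get(field) or "").strip() for field in ADDRESS_FIELDS}
        )
    return addresses


# Inline sample holding the same two addresses as the example above.
csv_text = """Address,City,Region,Postal
380 New York St.,Redlands,CA,92373
1 World Way,Los Angeles,CA,90045
"""

multi_field_addresses = rows_to_addresses(csv_text)
print(multi_field_addresses)
```

The resulting list has the same shape as the hand-written addresses list above, so it could be passed to batch_geocode() in the same way.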

Get geocoded results as a FeatureSet object

When as_featureset is set to True, the geocoded results are returned as a FeatureSet object instead of a list of dictionaries. A FeatureSet is more convenient to plot on a map and to display as a DataFrame.

results_fset = batch_geocode(addresses,
                             as_featureset=True)
results_fset
<FeatureSet> 2 features
map1b = gis.map("Los Angeles", 9)
map1b
map1b.draw(results_fset)
results_fset.sdf
   ResultID  Loc_name  Status  Score  Match_addr
0  0         World     M       100    380 New York St, Redlands, California, 92373
1  1         World     M       100    1 World Way, Los Angeles, California, 90045

   LongLabel                                  ShortLabel       Addr_type      Type  PlaceName  ...
0  380 New York St, Redlands, CA, 92373, USA  380 New York St  PointAddress                    ...
1  1 World Way, Los Angeles, CA, 90045, USA   1 World Way      StreetAddress                   ...

   Y          DisplayX     DisplayY   Xmin         Xmax         Ymin       Ymax       ExInfo  OBJECTID
0  34.057237  -117.194872  34.057237  -117.195872  -117.193872  34.056237  34.058237          1
1  33.944329  -118.398468  33.944329  -118.399468  -118.397468  33.943329  33.945329          2

   SHAPE
0  {"type": "Point", "coordinates": [-117.1956825...
1  {"type": "Point", "coordinates": [-118.3984681...

2 rows × 60 columns

Batch geocoding using geocode_from_items()

The geocode_from_items() function geocodes a table or file of addresses and returns the geocoded results. It supports CSV, XLS, or table input, and it geocodes the entire file regardless of size. We can first take a look at its signature with help():

from arcgis.geocoding import geocode_from_items

help(geocode_from_items)
Help on function geocode_from_items in module arcgis.geocoding._functions:

geocode_from_items(input_data: 'Union[Item, str, FeatureLayer]', output_type: 'str' = 'Feature Layer', geocode_service_url: 'Optional[Union[str, Geocoder]]' = None, geocode_parameters: 'Optional[dict[str, Any]]' = None, country: 'Optional[str]' = None, output_fields: 'Optional[str]' = None, header_rows_to_skip: 'int' = 1, output_name: 'Optional[str]' = None, category: 'Optional[str]' = None, context: 'Optional[dict[str, Any]]' = None, gis: 'Optional[GIS]' = None)
    The ``geocode_from_items`` method creates :class:`~arcgis.geocoding.Geocoder` objects from an
    :class:`~arcgis.gis.Item` or ``Layer`` objects.
    
    .. note::
        ``geocode_from_items`` geocodes the entire file regardless of size.
    
    =====================     ================================================================
    **Parameter**              **Description**
    ---------------------     ----------------------------------------------------------------
    input_data                required Item, string, Layer. Data to geocode.
    ---------------------     ----------------------------------------------------------------
    output_type               optional string.  Export item types.  Allowed values are "CSV",
                              "XLS", or "FeatureLayer".
    
                              .. note::
                                The default for ``output_type`` is "FeatureLayer".
    ---------------------     ----------------------------------------------------------------
    geocode_service_url       optional string of Geocoder. Optional
                              :class:`~arcgis.geocoding.Geocoder` to use to
                              spatially enable the dataset.
    ---------------------     ----------------------------------------------------------------
    geocode_parameters        optional dictionary.  This includes parameters that help parse
                              the input data, as well the field lengths and a field mapping.
                              This value is the output from the ``analyze_geocode_input``
                              available on your server designated to geocode. It is important
                              to inspect the field mapping closely and adjust them accordingly
                              before submitting your job, otherwise your geocoding results may
                              not be accurate. It is recommended to use the output from
                              ``analyze_geocode_input`` and modify the field mapping instead of
                              constructing this dictionary by hand.
    
                              **Values**
    
                              ``field_info`` - A list of triples with the field names of your input
                              data, the field type (usually TEXT), and the allowed length
                              (usually 255).
    
                              Example: [['ObjectID', 'TEXT', 255], ['Address', 'TEXT', 255],
                                       ['Region', 'TEXT', 255], ['Postal', 'TEXT', 255]]
    
                              ``header_row_exists`` - Enter true or false.
    
                              ``column_names`` - Submit the column names of your data if your data
                              does not have a header row.
    
                              ``field_mapping`` - Field mapping between each input field and
                              candidate fields on the geocoding service.
                              Example: [['ObjectID', 'OBJECTID'], ['Address', 'Address'],
                                          ['Region', 'Region'], ['Postal', 'Postal']]
    ---------------------     ----------------------------------------------------------------
    country                   optional string.  If all your data is in one country, this helps
                              improve performance for locators that accept that variable.
    ---------------------     ----------------------------------------------------------------
    output_fields             optional string. Enter the output fields from the geocoding
                              service that you want returned in the results, separated by
                              commas. To output all available outputFields, leave this
                              parameter blank.
    
                              Example: score,match_addr,x,y
    ---------------------     ----------------------------------------------------------------
    header_rows_to_skip       optional integer. Describes on which row your data begins in
                              your file or table. The default is 1 (since the first row
                              contains the headers). The default is 1.
    ---------------------     ----------------------------------------------------------------
    output_name               optional string, The task will create a feature service of the
                              results. You define the name of the service.
    ---------------------     ----------------------------------------------------------------
    category                  optional string. Enter a category for more precise geocoding
                              results, if applicable. Some geocoding services do not support
                              category, and the available options depend on your geocode service.
    ---------------------     ----------------------------------------------------------------
    context                   optional dictionary. Context contains additional settings that
                              affect task execution. Batch Geocode has the following two
                              settings:
    
                              1. Extent (extent) - A bounding box that defines the analysis
                                 area. Only those points in inputLayer that intersect the
                                 bounding box are analyzed.
                              2. Output Spatial Reference (outSR) - The output features are
                                 projected into the output spatial reference.
    
                              Syntax:
                              {
                              "extent" : {extent}
                              "outSR" : {spatial reference}
                              }
    ---------------------     ----------------------------------------------------------------
    gis                       optional ``GIS``, the :class:`~arcgis.gis.GIS` on which this
                              tool runs.
    
                              .. note::
                                If not specified, the active ``GIS`` is used.
    =====================     ================================================================
    
    .. code-block:: python
    
        # Usage Example
        >>> fl_item = geocode_from_items(csv_item, output_type='Feature Layer',
                             geocode_parameters={"field_info": ['Addresses', 'TEXT', 255],
                                                 "column_names": ["Addresses"],
                                                 "field_mapping": ['Addresses', 'Address']
                                                 },
                             output_name="address_file_matching",
                             gis=gis)
        >>> type(fl_item)
        <:class:`~arcgis.gis.Item`>
    
    :return:
        A :class:`~arcgis.gis.Item` object.

The geocode_from_items() function is convenient because it takes a web item as input (e.g. a CSV file that has already been uploaded to your organization) and generates a new web item as output (in this case, the output_type is set to Feature Layer). Let's look at an example below:

csv_item = gis.content.search("addresses_file", item_type="CSV")[0]
csv_item
addresses_file
CSV by api_data_owner
Last Modified: June 24, 2021
0 comments, 49 views
from arcgis.geocoding import analyze_geocode_input
my_geocode_parameters = analyze_geocode_input(input_table_or_item=csv_item,
                                              input_file_parameters= {"fileType":"csv",
                                                                      "headerRowExists":"true",
                                                                      "columnDelimiter":"","textQualifier":""})
my_geocode_parameters
{'header_row_exists': True,
 'column_delimiter': '',
 'text_qualifier': '',
 'field_info': '[["Addresses", "TEXT", 255]]',
 'field_mapping': '[["Addresses", "SingleLine"]]',
 'column_names': '',
 'file_type': 'csv',
 'singleline_field': 'SingleLine'}
from arcgis.geocoding import geocode_from_items
fl_item = geocode_from_items(csv_item, output_type='Feature Layer',
                             geocode_parameters=my_geocode_parameters,
                             output_name="address_file_matching",
                             gis=gis)
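In the analyze_geocode_input() output above, field_info and field_mapping come back as JSON-encoded strings, so inspecting or adjusting the mapping means decoding them first. The following is a minimal sketch using only the standard json module. Here remap_field is a hypothetical helper, and the dictionary is a copy of two values printed above, not a live response. Whether a given service accepts a particular candidate field name (such as "Address") is an assumption to check against your own locator.

```python
import json
from typing import Dict


def remap_field(params: Dict, input_field: str, candidate_field: str) -> Dict:
    """Return a copy of `params` with one input field mapped to a new candidate.

    Hypothetical helper; assumes 'field_mapping' is a JSON-encoded list of
    [input_field, candidate_field] pairs, as in the printed output above.
    """
    mapping = json.loads(params["field_mapping"])
    updated = [
        [src, candidate_field if src == input_field else dst]
        for src, dst in mapping
    ]
    new_params = dict(params)
    new_params["field_mapping"] = json.dumps(updated)
    return new_params


# Subset of the analyze_geocode_input() output printed above.
printed_params = {
    "field_info": '[["Addresses", "TEXT", 255]]',
    "field_mapping": '[["Addresses", "SingleLine"]]',
}

adjusted = remap_field(printed_params, "Addresses", "Address")
print(json.loads(adjusted["field_mapping"]))
```

The helper returns a new dictionary and leaves the original untouched, which makes it easy to compare the suggested mapping with the adjusted one before submitting a job.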

Example of geocoding POIs (category param)

category parameter

The category parameter is a place or address type which can be used to filter batch geocoding results. The parameter supports input of single category values or multiple comma-separated values.

Single category filtering example:

category="Address"

Multiple category filtering example:

category="Address,Postal"
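When the categories are held in a Python list, the comma-separated form can be produced with str.join. A small sketch follows; build_category is a hypothetical helper, and the category names are the ones used in the two examples above.

```python
from typing import Iterable


def build_category(categories: Iterable[str]) -> str:
    """Join category names into the comma-separated form shown above.

    Hypothetical helper: trims whitespace and skips empty or repeated names.
    """
    seen = []
    for name in categories:
        cleaned = name.strip()
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    return ",".join(seen)


category = build_category(["Address", "Postal"])
print(category)
```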

We will now explore some examples that take advantage of the category parameter, in the following order:

  • airports using their codes
  • a list of city names
  • restaurants of a few different sub-categories (Peruvian, Japanese, Korean, French..)

Example: Finding airports using their codes

airports = batch_geocode(["LAX", "SFO", "ONT", "FAT", "LGB"], category="airport")
map2 = gis.map("CA", 6)
map2
for airport in airports:
    popup = { 
    "title" : airport['attributes']['PlaceName'], 
    "content" : airport['address']
    }
    map2.draw(airport['location'], popup)

Examples of source_country and lang_code

source_country parameter

The source_country parameter is a value representing the country. When a value is passed for this parameter, all of the addresses in the input table are sent to the specified country locator to be geocoded. For example, if source_country="USA" is passed in a batch_geocode() call, it is assumed that all of the addresses are in the United States, and so all of the addresses are sent to the USA country locator. Using this parameter can increase batch geocoding performance when all addresses are within a single country.

Acceptable values include the full country name, the ISO 3166-1 two-letter country code, or the ISO 3166-1 three-letter country code.

A list of supported countries and codes is available in the geocoding service's coverage documentation.

Example:

source_country="USA"

lang_code parameter

The optional lang_code parameter sets the language in which geocode results are returned. See the table of supported countries for valid language code values in each country.

Example: Finding Indian Cities and Return Results in Hindi

india_cities = batch_geocode(["Mumbai", "New Delhi", "Kolkata"], 
                             category="city", 
                             source_country="IND",
                             lang_code="HI")
for city in india_cities:
    print(city['address'])
बृहन मुंबई, बॉम्बे, महाराष्ट्र
नई दिल्ली, दिल्ली
कलकत्ता, पश्चिम बंगाल
india_map = gis.map("India")
india_map
for city in india_cities:
    india_map.draw(city['location'])

Getting results in desired coordinate system

out_sr parameter

This parameter is the spatial reference of the x/y coordinates returned by the geocode method. It is useful for applications using a map with a spatial reference different than that of the geocoder.

The spatial reference can be specified as either a well-known ID (WKID) or as a JSON spatial reference object. If out_sr is not specified, the spatial reference of the output locations is the same as that of the geocoder. The World Geocoding Service spatial reference is WGS84 (WKID = 4326).

For a list of valid WKID values, see Projected Coordinate Systems and Geographic Coordinate Systems.

Example (102100 is the WKID for the Web Mercator projection):

out_sr=102100
airports[0]['address'], airports[0]['location']
('LAX',
 {'x': -118.40896999999995,
  'y': 33.94251000000003,
  'type': 'point',
  'spatialReference': {'wkid': 4326}})

By default, the output spatial reference is WGS84 (WKID 4326), as shown in the previous cell. If we set out_sr to 102100, the x/y coordinates returned by batch_geocode() are in Web Mercator instead, as displayed below:

airports_2 = batch_geocode(["LAX", "SFO", "ONT", "FAT", "LGB"], 
                           category="airport",
                           out_sr=102100)
airports_2[0]['address'], airports_2[0]['location']
('LAX', {'x': -13181226.2458, 'y': 4021085.1335000023})
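The relationship between the two coordinate pairs above can be checked by hand. The sketch below applies the standard spherical Web Mercator forward formulas, x = R * lon and y = R * ln(tan(pi/4 + lat/2)) with angles in radians and R = 6378137 m, to the WGS84 location printed earlier for LAX. It is an independent check and makes no claim about how the geocoding service itself performs the projection; wgs84_to_web_mercator is a hypothetical helper.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used as the sphere radius


def wgs84_to_web_mercator(lon_deg: float, lat_deg: float) -> tuple:
    """Project longitude/latitude in degrees to Web Mercator metres.

    Hypothetical helper implementing the spherical forward formulas.
    """
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# WGS84 location printed above for LAX (wkid 4326).
x, y = wgs84_to_web_mercator(-118.40896999999995, 33.94251000000003)
print(round(x, 1), round(y, 1))
```

The computed pair agrees with the out_sr=102100 output above to within a metre.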

How to handle the response of error/exception

There are times when geocode(), batch_geocode(), or geocode_from_items() raises an error or exception. The next example shows how to handle such a case by calling geocode_from_items() against a GIS whose geocoding service does not support batch geocoding.

First, we can use the URL shown below to confirm that there is no BatchGeocode capability in the tasks property.

from arcgis.geoprocessing._tool import Toolbox
tbx = Toolbox(url=gis.properties.helperServices.geocode[0]["url"], gis=gis)
tbx.properties.tasks
['AggregatePoints',
 'CalculateDensity',
 'ChooseBestFacilities',
 'ConnectOriginsToDestinations',
 'CreateBuffers',
 'CreateDriveTimeAreas',
 'CreateRouteLayers',
 'CreateViewshed',
 'CreateWatersheds',
 'DeriveNewLocations',
 'DiagnoseLAALService',
 'DissolveBoundaries',
 'EnrichLayer',
 'ExtractData',
 'FieldCalculator',
 'FindCentroids',
 'FindExistingLocations',
 'FindHotSpots',
 'FindNearest',
 'FindOutliers',
 'FindPointClusters',
 'FindSimilarLocations',
 'GenerateTessellations',
 'InterpolatePoints',
 'JoinFeatures',
 'MergeLayers',
 'OverlayLayers',
 'PlanRoutes',
 'SummarizeCenterAndDispersion',
 'SummarizeNearby',
 'SummarizeWithin',
 'TraceDownstream']

If we force the use of geocode_from_items() with this geocoding service URL, the function will raise an AttributeError.

try:
    fl_item = geocode_from_items(csv_item, output_type='Feature Layer',
                                 geocode_service_url=gis.properties.helperServices.geocode[0]["url"],
                                 geocode_parameters={"field_info": ['Addresses', 'TEXT', 255],
                                                     "column_names": ["Addresses"],
                                                     "field_mapping": ['Addresses', 'Address']
                                                     },
                                 output_name="address_file_matching",
                                 gis=gis)
    display(fl_item)
except AttributeError as e:
    print(e)
'Toolbox' object has no attribute 'batch_geocode'

The solution is to switch the active GIS and the geocode_service_url to ones that enable BatchGeocode.

p_gis = GIS("<an alternative enterprise>", "user name", "password", verify_cert=False)
p_gis.properties.helperServices.geocode[0]["url"]
'https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer'
try:
    fl_item = geocode_from_items(csv_item, output_type='Feature Layer',
                                 geocode_service_url=p_gis.properties.helperServices.geocode[0]["url"],
                                 geocode_parameters={"field_info": ['Addresses', 'TEXT', 255],
                                                     "column_names": ["Addresses"],
                                                     "field_mapping": ['Addresses', 'Address']
                                                     },
                                 output_name="address_file_matching",
                                 gis=p_gis)
    display(type(fl_item))
except AttributeError as e:
    print(e)
Feature Layer Collection
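The try/except pattern above can be generalized to several candidate connections. The following is a sketch only: run_with_fallback and the two stand-in tasks are hypothetical, and the stand-ins merely imitate the AttributeError shown above instead of calling a real service. In practice each callable would wrap a geocode_from_items() call bound to a different GIS.

```python
from typing import Callable, List, Sequence, Tuple


def run_with_fallback(tasks: Sequence[Tuple[str, Callable[[], object]]]):
    """Run each (label, task) pair in order and return the first success.

    Hypothetical helper: an AttributeError is treated as "this service lacks
    the capability" and the next task is tried. Returns (label, result,
    errors), where errors lists the messages from the failed attempts.
    """
    errors: List[str] = []
    for label, task in tasks:
        try:
            return label, task(), errors
        except AttributeError as exc:
            errors.append(f"{label}: {exc}")
    raise RuntimeError("no candidate supports the task: " + "; ".join(errors))


def unsupported_service():
    # Stand-in that imitates the error message printed above.
    raise AttributeError("'Toolbox' object has no attribute 'batch_geocode'")


def supported_service():
    # Stand-in for a call that succeeds; the return value is a placeholder.
    return "geocoded item placeholder"


label, result, errors = run_with_fallback(
    [("gis", unsupported_service), ("p_gis", supported_service)]
)
print(label, result, errors)
```

Only AttributeError is caught here, mirroring the example above; other exceptions propagate so that unrelated failures are not silently skipped.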

Avoiding fallbacks

You can also use category filtering to avoid "low resolution" fallback matches. By default, if the World Geocoding Service cannot find a match for an input address, it will automatically search for a lower match level, such as a street name, city, or postal code. For batch geocoding, a user may prefer that no match is returned in these cases so that they are not charged for the geocode. If a user passes category="Point Address,Street Address" in a batch_geocode() call, no fallback will occur if address matches cannot be found; the user will only be charged for the actual address matches.

Example: Batch geocode with fallback allowed (no category)

In the example below, the second address is not matched to a point address. Because of fallback, it is matched to a lower-resolution location instead.

results = batch_geocode(["380 New York St Redlands CA 92373",
                         "? Stanford Dr Escondido CA"])
for result in results:
    print("Score " + str(result['score']) + " : " + result['address'])
Score 100 : 380 New York St, Redlands, California, 92373
Score 86.23 : Escondido Mall, Stanford, California, 94305

Example: Batch geocode with no fallback allowed (category="Street Address")

In the example below, no street address match is found for the second address, and because the category has been set to Street Address there is no low-resolution fallback. As a result, no match is returned for the second address:

results = batch_geocode([ "380 New York St Redlands CA 92373",
                          "? Stanford Dr Escondido CA"],
                          category="Street Address")
for result in results:
    print("Score " + str(result['score']) + " : " + result['address'])
Score 100 : 380 New York St, Redlands, California, 92373
Score 0 : 
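With fallback disabled, unmatched inputs come back with a score of 0 and an empty address, as in the output above. A short sketch for separating the two groups follows; split_matches is a hypothetical helper, and the sample list reproduces only the score and address values printed above (the other keys are omitted).

```python
from typing import Dict, List, Tuple


def split_matches(results: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split results into (matched, unmatched) using score > 0 as the test.

    Hypothetical helper; assumes each result has a numeric 'score' key.
    """
    matched = [r for r in results if r["score"] > 0]
    unmatched = [r for r in results if r["score"] <= 0]
    return matched, unmatched


# Score and address values copied from the printed output above.
sample_results = [
    {"score": 100, "address": "380 New York St, Redlands, California, 92373"},
    {"score": 0, "address": ""},
]

matched, unmatched = split_matches(sample_results)
print(len(matched), len(unmatched))
```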

Conclusions

In Part 4, we explored the batch_geocode() function and how its advanced parameters can help fine-tune and filter the geocoded results.
