
Bulk geocoding

Bulk geocoding is provided by the geocode_addresses() function in arcgisgeocode. Rather than geocoding a single address and returning multiple match candidates, geocode_addresses() takes many addresses, geocodes them all at once, and returns a single location per address.

Bulk geocoding may incur a cost. See more about geocoding pricing.

In this example, you will geocode restaurant addresses in Boston, MA collected by the Boston Area Research Initiative (BARI). The data is originally from their data portal.

Step 1. Authenticate

To use the bulk geocoding capabilities of the ArcGIS World Geocoder, you must first authenticate using arcgisutils. This example uses user-based authentication via auth_user(). You may choose a different authentication function if it suits your setup better.

library(arcgisutils)
library(arcgisgeocode)

set_arc_token(auth_user())
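
Before requesting a token, it can help to confirm that credentials are available. The sketch below assumes that auth_user() reads its defaults from the ARCGIS_USER and ARCGIS_PASSWORD environment variables. Verify this against the arcgisutils documentation for your installed version. The helper has_arcgis_credentials() is hypothetical and not part of either package.

```r
# Hypothetical helper (not part of arcgisutils): TRUE when both
# environment variables assumed to be used by auth_user() are non-empty.
has_arcgis_credentials <- function() {
  vars <- Sys.getenv(c("ARCGIS_USER", "ARCGIS_PASSWORD"))
  all(nzchar(vars))
}

if (has_arcgis_credentials()) {
  message("Credentials found; auth_user() should be able to request a token.")
} else {
  message("Set ARCGIS_USER and ARCGIS_PASSWORD (for example in .Renviron) first.")
}
```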

Step 2. Prepare the data

As with find_address_candidates(), the geocoding results include an ID that can be used to join them back onto the original dataset. First, read in the dataset with readr::read_csv(), then create a unique identifier with dplyr::mutate() and dplyr::row_number().

# Boston Yelp addresses
# Source: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DMWCBT
fp <- "https://analysis-1.maps.arcgis.com/sharing/rest/content/items/0423768816b343b69d9a425b82351912/data"

library(dplyr)
restaurants <- readr::read_csv(fp) |>
  mutate(id = row_number())

restaurants
#> # A tibble: 2,664 × 28
#>    restaurant_name        restaurant_ID
#>    <chr>                          <dbl>
#>  1 100% Delicias                      2
#>  2 100% Delicias Express              3
#>  3 107                                4
#>  4 140 Supper Club                    6
#>  5 163 Vietnamese Sandwi…             7
#>  6 180 Cafe                           8
#>  7 180 Restaurant and Lo…             9
#>  8 224 Boston Street Res…            11
#>  9 24 Hour Pizza Delivery            12
#> 10 2Twenty2                          13
#> # ℹ 2,654 more rows
#> # ℹ 26 more variables:
#> #   restaurant_address <chr>,
#> #   restaurant_tag <chr>,
#> #   rating <dbl>, price <chr>,
#> #   review_number <dbl>,
#> #   unique_reviewer <dbl>, …

Step 3. Geocode addresses

The restaurant addresses are contained in the restaurant_address column. Pass this column into the single_line argument of geocode_addresses() and store the results in geocoded.

geocoded <- geocode_addresses(
  single_line = restaurants[["restaurant_address"]], 
  .progress = FALSE
)

# preview the first 10 columns; the sf geometry column is retained
# automatically, so glimpse() reports 11 columns
glimpse(geocoded[, 1:10])
#> Rows: 2,664
#> Columns: 11
#> $ result_id   <int> 1, 3, 4, 5, 2, 6,…
#> $ loc_name    <chr> "World", "World",…
#> $ status      <chr> "M", "M", "M", "M…
#> $ score       <dbl> 100.00, 100.00, 1…
#> $ match_addr  <chr> "635 Hyde Park Av…
#> $ long_label  <chr> "635 Hyde Park Av…
#> $ short_label <chr> "635 Hyde Park Av…
#> $ addr_type   <chr> "PointAddress", "…
#> $ type_field  <chr> NA, NA, NA, NA, N…
#> $ place_name  <chr> NA, NA, NA, NA, N…
#> $ geometry    <POINT [°]> POINT (-71.…
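
The status and score columns give a quick way to check match quality before using the results. The sketch below runs on a small made-up data frame rather than real geocoder output. It assumes the usual ArcGIS status codes ("M" for matched, "T" for tied, "U" for unmatched), and the score threshold of 90 is an arbitrary value chosen for illustration.

```r
# Toy stand-in for geocode_addresses() output; all values are made up
geocoded_toy <- data.frame(
  result_id = c(1L, 3L, 2L, 4L),
  status    = c("M", "M", "T", "U"),
  score     = c(100, 97.5, 88.2, 0)
)

# Count results by match status
table(geocoded_toy$status)

# Flag rows for manual review: anything that is not a clean match,
# or whose score falls below an arbitrary threshold
min_score <- 90
needs_review <- subset(geocoded_toy, status != "M" | score < min_score)
needs_review$result_id
```

With real results, the flagged result_id values point back to the input rows whose addresses may need cleaning before they are geocoded again.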

Alternatively, you can call geocode_addresses() inside dplyr::reframe() to geocode these addresses within a dplyr pipeline.
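
A minimal sketch of the reframe() pattern follows. To avoid a billable request, it uses fake_geocode(), a hypothetical stand-in that returns one row per input address. Replacing it with geocode_addresses() assumes the real output unpacks into columns in the same way, which is worth confirming on a small sample first.

```r
library(dplyr)

# Hypothetical stand-in for geocode_addresses(): one row per input address
fake_geocode <- function(single_line) {
  data.frame(
    result_id  = seq_along(single_line),
    match_addr = toupper(single_line)
  )
}

addresses <- tibble(
  id = 1:3,
  restaurant_address = c("635 Hyde Park Ave", "1 Main St", "2 Elm St")
)

# reframe() unpacks the data frame returned by the function into columns
res <- addresses |>
  reframe(fake_geocode(restaurant_address))

res
```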

Step 4. Join the results

In the previous step, you geocoded the addresses and got back a data frame containing the location information. In most cases you will want these locations joined back onto the original dataset. Do this with dplyr::left_join(), matching the id column you created to the result_id column in the geocoding results.

joined_addresses <- left_join(
  restaurants,
  geocoded,
  by = c("id" = "result_id")
)

dplyr::glimpse(joined_addresses)
#> Rows: 2,664
#> Columns: 90
#> $ restaurant_name         <chr> "100%…
#> $ restaurant_ID           <dbl> 2, 3,…
#> $ restaurant_address      <chr> "635 …
#> $ restaurant_tag          <chr> "Lati…
#> $ rating                  <dbl> 2.0, …
#> $ price                   <chr> "$$",…
#> $ review_number           <dbl> 37, 2…
#> $ unique_reviewer         <dbl> 34, 2…
#> $ reviews_Jan_19          <dbl> 0, 1,…
#> $ reviews_Feb_19          <dbl> 1, 2,…
#> $ reviews_Mar_19          <dbl> 1, 3,…
#> $ reviews_Apr_19          <dbl> 0, 3,…
#> $ reviews_May_19          <dbl> 2, 1,…
#> $ reviews_Jun_19          <dbl> 0, 0,…
#> $ reviews_Jul_19          <dbl> 0, 1,…
#> $ reviews_Aug_19          <dbl> 0, 7,…
#> $ reviews_Jan_20          <dbl> 0, 0,…
#> $ reviews_Feb_20          <dbl> 0, 1,…
#> $ reviews_Mar_20          <dbl> 0, 0,…
#> $ reviews_Apr_20          <dbl> 1, 0,…
#> $ reviews_May_20          <dbl> 1, 0,…
#> $ reviews_Jun_20          <dbl> 0, 0,…
#> $ reviews_Jul_20          <dbl> 0, 0,…
#> $ reviews_Aug_20          <dbl> 0, 0,…
#> $ restaurant_neighborhood <chr> "Rosl…
#> $ GIS_ID                  <dbl> 18067…
#> $ CT_ID_10                <dbl> 25025…
#> $ id                      <int> 1, 2,…
#> $ loc_name                <chr> "Worl…
#> $ status                  <chr> "M", …
#> $ score                   <dbl> 100.0…
#> $ match_addr              <chr> "635 …
#> $ long_label              <chr> "635 …
#> $ short_label             <chr> "635 …
#> $ addr_type               <chr> "Poin…
#> $ type_field              <chr> NA, N…
#> $ place_name              <chr> NA, N…
#> $ place_addr              <chr> "635 …
#> $ phone                   <chr> NA, N…
#> $ url                     <chr> NA, N…
#> $ rank                    <dbl> 20, 2…
#> $ add_bldg                <chr> NA, N…
#> $ add_num                 <chr> "635"…
#> $ add_num_from            <chr> NA, N…
#> $ add_num_to              <chr> NA, N…
#> $ add_range               <chr> NA, N…
#> $ side                    <chr> NA, N…
#> $ st_pre_dir              <chr> NA, N…
#> $ st_pre_type             <chr> NA, N…
#> $ st_name                 <chr> "Hyde…
#> $ st_type                 <chr> "Ave"…
#> $ st_dir                  <chr> NA, N…
#> $ bldg_type               <chr> NA, N…
#> $ bldg_name               <chr> NA, N…
#> $ level_type              <chr> NA, N…
#> $ level_name              <chr> NA, N…
#> $ unit_type               <chr> NA, N…
#> $ unit_name               <chr> NA, N…
#> $ sub_addr                <chr> NA, N…
#> $ st_addr                 <chr> "635 …
#> $ block                   <chr> NA, N…
#> $ sector                  <chr> NA, N…
#> $ nbrhd                   <chr> NA, N…
#> $ district                <chr> NA, N…
#> $ city                    <chr> "Rosl…
#> $ metro_area              <chr> NA, N…
#> $ subregion               <chr> "Suff…
#> $ region                  <chr> "Mass…
#> $ region_abbr             <chr> "MA",…
#> $ territory               <chr> NA, N…
#> $ zone                    <chr> NA, N…
#> $ postal                  <chr> "0213…
#> $ postal_ext              <chr> "4723…
#> $ country                 <chr> "USA"…
#> $ cntry_name              <chr> "Unit…
#> $ lang_code               <chr> "ENG"…
#> $ distance                <dbl> 0, 0,…
#> $ x                       <dbl> -71.1…
#> $ y                       <dbl> 42.27…
#> $ display_x               <dbl> -71.1…
#> $ display_y               <dbl> 42.27…
#> $ xmin                    <dbl> -71.1…
#> $ xmax                    <dbl> -71.1…
#> $ ymin                    <dbl> 42.27…
#> $ ymax                    <dbl> 42.27…
#> $ ex_info                 <chr> NA, N…
#> $ bldg_comp               <chr> NA, N…
#> $ struc_type              <chr> "Comm…
#> $ struc_det               <chr> NA, "…
#> $ geometry                <POINT [°]> …
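
Because result_id values are not guaranteed to come back in input order (the preview in Step 3 shows 1, 3, 4, 5, 2, ...), it is worth checking that the join paired each row with its own result. The sketch below uses made-up toy tables, not the restaurant data, to show the join realigning out-of-order results.

```r
library(dplyr)

# Toy inputs; names and values are made up for illustration
restaurants_toy <- tibble(
  id = 1:4,
  restaurant_address = c("A St", "B St", "C St", "D St")
)

# Results arrive out of order, keyed by result_id
geocoded_toy <- tibble(
  result_id  = c(1L, 3L, 4L, 2L),
  match_addr = c("A ST", "C ST", "D ST", "B ST")
)

joined_toy <- left_join(
  restaurants_toy,
  geocoded_toy,
  by = c("id" = "result_id")
)

# The row count is unchanged and each address sits next to its own match
nrow(joined_toy) == nrow(restaurants_toy)
joined_toy$match_addr
```

Joining on the identifier rather than binding columns by position is what keeps the rows aligned here.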
