Bulk geocoding capabilities are provided via the geocode_addresses() function in arcgisgeocode. Rather than geocoding a single address and returning match candidates, bulk geocoding takes many addresses and geocodes them all at once, returning a single location per address.
Bulk geocoding can incur a cost. See more about geocoding pricing.
In this example, you will geocode restaurant addresses in Boston, MA collected by the Boston Area Research Initiative (BARI). The data is originally from their data portal.
Step 1. Authenticate
To use the bulk geocoding capabilities of the ArcGIS World Geocoder, you must first authenticate using arcgisutils. This example uses user-based authentication via auth_user(). You may choose a different authentication function if it works better for you.
library(arcgisutils)
library(arcgisgeocode)

set_arc_token(auth_user())
Step 2. Prepare the data
As with find_address_candidates(), the geocoding results return an ID that can be used to join back onto the original dataset. First, you will read in the dataset from a filepath using readr::read_csv() and then create a unique identifier with dplyr::mutate() and dplyr::row_number().
# Boston Yelp addresses
# Source: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DMWCBT
fp <- "https://analysis-1.maps.arcgis.com/sharing/rest/content/items/0423768816b343b69d9a425b82351912/data"

library(dplyr)

restaurants <- readr::read_csv(fp) |>
  mutate(id = row_number())

restaurants
#> # A tibble: 2,664 × 28
#>    restaurant_name        restaurant_ID
#>    <chr>                          <dbl>
#>  1 100% Delicias                      2
#>  2 100% Delicias Express              3
#>  3 107                                4
#>  4 140 Supper Club                    6
#>  5 163 Vietnamese Sandwi…             7
#>  6 180 Cafe                           8
#>  7 180 Restaurant and Lo…             9
#>  8 224 Boston Street Res…            11
#>  9 24 Hour Pizza Delivery            12
#> 10 2Twenty2                          13
#> # ℹ 2,654 more rows
#> # ℹ 26 more variables:
#> #   restaurant_address <chr>,
#> #   restaurant_tag <chr>,
#> #   rating <dbl>, price <chr>,
#> #   review_number <dbl>,
#> #   unique_reviewer <dbl>, …
Step 3. Geocode addresses
The restaurant addresses are contained in the restaurant_address column. Pass this column into the single_line argument of geocode_addresses() and store the results in geocoded.
geocoded <- geocode_addresses(
  single_line = restaurants[["restaurant_address"]],
  .progress = FALSE
)

# preview the first 10 columns
# (the sf geometry column is sticky, so it is shown as an 11th column)
glimpse(geocoded[, 1:10])
#> Rows: 2,664
#> Columns: 11
#> $ result_id   <int> 1, 3, 4, 5, 2, 6,…
#> $ loc_name    <chr> "World", "World",…
#> $ status      <chr> "M", "M", "M", "M…
#> $ score       <dbl> 100.00, 100.00, 1…
#> $ match_addr  <chr> "635 Hyde Park Av…
#> $ long_label  <chr> "635 Hyde Park Av…
#> $ short_label <chr> "635 Hyde Park Av…
#> $ addr_type   <chr> "PointAddress", "…
#> $ type_field  <chr> NA, NA, NA, NA, N…
#> $ place_name  <chr> NA, NA, NA, NA, N…
#> $ geometry    <POINT [°]> POINT (-71.…
You can use dplyr::reframe() to geocode these addresses in a dplyr-friendly way.
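The reframe() pattern can be sketched without making any billable requests. The example below is only an illustration: fake_geocode() and the toy data frame are hypothetical stand-ins, not part of arcgisgeocode, and the helper simply returns one row per input address the way a bulk geocoder would. It assumes dplyr 1.1.0 or later, where reframe() was introduced.

```r
library(dplyr)

# Hypothetical stand-in for geocode_addresses(): takes a character vector
# and returns one row per address. It makes no network requests.
fake_geocode <- function(single_line) {
  tibble(
    result_id = seq_along(single_line),
    match_addr = toupper(single_line)
  )
}

# Toy input, not the BARI dataset
toy <- tibble(
  address = c("635 Hyde Park Ave", "1 Main St", "22 Elm St")
)

# reframe() unpacks the unnamed data frame returned by the helper
# into the columns of the result, one row per input address
toy_geocoded <- toy |>
  reframe(fake_geocode(address))

toy_geocoded
```

With a token set, the analogous call would presumably be restaurants |> reframe(geocode_addresses(single_line = restaurant_address)). That form is an assumption here and has not been run against the service. Note that reframe() keeps only the columns returned by the expression, so the join in the next step is still needed to recover the original columns.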
Step 4. Join the results
In the previous step you geocoded the addresses and returned a data frame containing the location information. In most cases it is helpful to have the locations joined onto the original dataset. You can do this by using dplyr::left_join() and joining on the id column you created and the result_id column from the geocoding results.
joined_addresses <- left_join(
  restaurants,
  geocoded,
  by = c("id" = "result_id")
)

dplyr::glimpse(joined_addresses)
#> Rows: 2,664
#> Columns: 90
#> $ restaurant_name         <chr> "100%…
#> $ restaurant_ID           <dbl> 2, 3,…
#> $ restaurant_address      <chr> "635 …
#> $ restaurant_tag          <chr> "Lati…
#> $ rating                  <dbl> 2.0, …
#> $ price                   <chr> "$$",…
#> $ review_number           <dbl> 37, 2…
#> $ unique_reviewer         <dbl> 34, 2…
#> $ reviews_Jan_19          <dbl> 0, 1,…
#> $ reviews_Feb_19          <dbl> 1, 2,…
#> $ reviews_Mar_19          <dbl> 1, 3,…
#> $ reviews_Apr_19          <dbl> 0, 3,…
#> $ reviews_May_19          <dbl> 2, 1,…
#> $ reviews_Jun_19          <dbl> 0, 0,…
#> $ reviews_Jul_19          <dbl> 0, 1,…
#> $ reviews_Aug_19          <dbl> 0, 7,…
#> $ reviews_Jan_20          <dbl> 0, 0,…
#> $ reviews_Feb_20          <dbl> 0, 1,…
#> $ reviews_Mar_20          <dbl> 0, 0,…
#> $ reviews_Apr_20          <dbl> 1, 0,…
#> $ reviews_May_20          <dbl> 1, 0,…
#> $ reviews_Jun_20          <dbl> 0, 0,…
#> $ reviews_Jul_20          <dbl> 0, 0,…
#> $ reviews_Aug_20          <dbl> 0, 0,…
#> $ restaurant_neighborhood <chr> "Rosl…
#> $ GIS_ID                  <dbl> 18067…
#> $ CT_ID_10                <dbl> 25025…
#> $ id                      <int> 1, 2,…
#> $ loc_name                <chr> "Worl…
#> $ status                  <chr> "M", …
#> $ score                   <dbl> 100.0…
#> $ match_addr              <chr> "635 …
#> $ long_label              <chr> "635 …
#> $ short_label             <chr> "635 …
#> $ addr_type               <chr> "Poin…
#> $ type_field              <chr> NA, N…
#> $ place_name              <chr> NA, N…
#> $ place_addr              <chr> "635 …
#> $ phone                   <chr> NA, N…
#> $ url                     <chr> NA, N…
#> $ rank                    <dbl> 20, 2…
#> $ add_bldg                <chr> NA, N…
#> $ add_num                 <chr> "635"…
#> $ add_num_from            <chr> NA, N…
#> $ add_num_to              <chr> NA, N…
#> $ add_range               <chr> NA, N…
#> $ side                    <chr> NA, N…
#> $ st_pre_dir              <chr> NA, N…
#> $ st_pre_type             <chr> NA, N…
#> $ st_name                 <chr> "Hyde…
#> $ st_type                 <chr> "Ave"…
#> $ st_dir                  <chr> NA, N…
#> $ bldg_type               <chr> NA, N…
#> $ bldg_name               <chr> NA, N…
#> $ level_type              <chr> NA, N…
#> $ level_name              <chr> NA, N…
#> $ unit_type               <chr> NA, N…
#> $ unit_name               <chr> NA, N…
#> $ sub_addr                <chr> NA, N…
#> $ st_addr                 <chr> "635 …
#> $ block                   <chr> NA, N…
#> $ sector                  <chr> NA, N…
#> $ nbrhd                   <chr> NA, N…
#> $ district                <chr> NA, N…
#> $ city                    <chr> "Rosl…
#> $ metro_area              <chr> NA, N…
#> $ subregion               <chr> "Suff…
#> $ region                  <chr> "Mass…
#> $ region_abbr             <chr> "MA",…
#> $ territory               <chr> NA, N…
#> $ zone                    <chr> NA, N…
#> $ postal                  <chr> "0213…
#> $ postal_ext              <chr> "4723…
#> $ country                 <chr> "USA"…
#> $ cntry_name              <chr> "Unit…
#> $ lang_code               <chr> "ENG"…
#> $ distance                <dbl> 0, 0,…
#> $ x                       <dbl> -71.1…
#> $ y                       <dbl> 42.27…
#> $ display_x               <dbl> -71.1…
#> $ display_y               <dbl> 42.27…
#> $ xmin                    <dbl> -71.1…
#> $ xmax                    <dbl> -71.1…
#> $ ymin                    <dbl> 42.27…
#> $ ymax                    <dbl> 42.27…
#> $ ex_info                 <chr> NA, N…
#> $ bldg_comp               <chr> NA, N…
#> $ struc_type              <chr> "Comm…
#> $ struc_det               <chr> NA, "…
#> $ geometry                <POINT [°]> …
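The Step 3 preview shows result_id values arriving out of input order (1, 3, 4, 5, 2, 6, …), which is why the results are joined on the identifier rather than bound by row position. The sketch below illustrates this on toy data; toy_restaurants and toy_geocoded are hypothetical stand-ins for the real tables, and the scores are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the `restaurants` table
toy_restaurants <- tibble(
  restaurant_name = c("A", "B", "C"),
  id = 1:3
)

# Hypothetical stand-in for `geocoded`, deliberately out of input order
toy_geocoded <- tibble(
  result_id = c(1L, 3L, 2L),
  score = c(100, 98.5, 100)
)

# Binding by position would pair restaurant "B" with the result for "C";
# joining on the identifier keeps each row with its own result
toy_joined <- left_join(
  toy_restaurants,
  toy_geocoded,
  by = c("id" = "result_id")
)

toy_joined
```

The joined table keeps the row order of the left-hand table, and the result_id column is absorbed into id, which matches the real output above where id appears but result_id does not.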