Forward geocoding is the process of taking an address or place information and identifying its location on the globe.
To geocode addresses, the arcgisgeocode package provides the function find_address_candidates(). This function geocodes a single address at a time and returns up to 50 address candidates (ranked by a score).
There are two ways in which you can provide address information:
- Provide the entire address as a string via the single_line argument
- Provide parts of the address using the arguments address, city, region, postal, etc.
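The two input styles can be compared side by side. This is a minimal sketch that assumes access to the ArcGIS World Geocoder; the way the address is split into components here is our own choice for illustration.

```r
library(arcgisgeocode)

# Style 1: the whole address as one string via single_line
by_string <- find_address_candidates(
  single_line = "380 New York Street Redlands, California, 92373, USA",
  max_locations = 1L
)

# Style 2: the same address split into components
# (this particular split is an assumption made for illustration)
by_parts <- find_address_candidates(
  address = "380 New York Street",
  city = "Redlands",
  region = "California",
  postal = "92373",
  max_locations = 1L
)
```

Both calls request at most one candidate, so each result should contain a single row that you can compare.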
Single line address geocoding
It can be tough to parse addresses into their components. Using the single_line argument is a very flexible way of geocoding addresses. Doing so utilizes the ArcGIS World Geocoder’s address parsing capabilities.
For example, we can geocode the same location using 3 decreasingly specific addresses.
    library(arcgisgeocode)

    addresses <- c(
      "380 New York Street Redlands, California, 92373, USA",
      "Esri Redlands",
      "ESRI CA"
    )

    locs <- find_address_candidates(
      addresses,
      max_locations = 1L
    )

    locs$geometry
    #> Geometry set for 3 features
    #> Geometry type: POINT
    #> Dimension:     XY
    #> Bounding box:  xmin: -117.1957 ymin: 34.05609 xmax: -117.1948 ymax: 34.05724
    #> Geodetic CRS:  WGS 84
    #> POINT (-117.1948 34.05724)
    #> POINT (-117.1957 34.05609)
    #> POINT (-117.1957 34.05609)
In each case, it finds the correct address!
Geocoding from a dataframe
Most commonly, you will need to geocode addresses from a column in a data.frame. It is important to note that find_address_candidates() does not work well inside a dplyr::mutate() call, because it can return more than one candidate, and therefore more than one row, per input address.
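The row-count problem can be seen without calling the geocoder at all. The sketch below uses fake_geocode(), a hypothetical stand-in that is not part of arcgisgeocode, which returns two rows per input; it only illustrates how dplyr::mutate() and dplyr::reframe() treat results that are longer than the input.

```r
library(dplyr)

# Hypothetical stand-in for a geocoder: returns two candidate rows per input,
# so the result has more rows than the input vector.
fake_geocode <- function(x) {
  data.frame(
    input_id = rep(seq_along(x), each = 2),
    candidate = paste(rep(x, each = 2), c("A", "B"))
  )
}

stores <- tibble(address = c("first st", "second st"))

# mutate() needs one value per input row, so a 4-row result for 2 rows errors
mutate_result <- try(
  mutate(stores, fake_geocode(address)),
  silent = TRUE
)
inherits(mutate_result, "try-error")

# reframe() accepts any number of rows per group
reframe_result <- reframe(stores, fake_geocode(address))
nrow(reframe_result)
```

The mutate() call fails on the size mismatch, while reframe() returns all four candidate rows.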
Let’s read in a CSV of bike stores in Tacoma, WA. To use find_address_candidates() with a data.frame, it is recommended to create a unique identifier from the row positions, so that results can be matched back to the rows they came from.
    library(dplyr)

    fp <- "https://www.arcgis.com/sharing/rest/content/items/9a9b91179ac44db1b689b42017471ae6/data"

    bike_stores <- readr::read_csv(fp) |>
      mutate(id = row_number())

    bike_stores
    #> # A tibble: 10 × 3
    #>    store_name    original_address    id
    #>    <chr>         <chr>            <int>
    #>  1 Cascadia Whe… 3320 N Proctor …     1
    #>  2 Puget Sound … between 3206 N.…     2
    #>  3 Takoma Bike … 3010 6th Ave, T…     3
    #>  4 Trek Bicycle… 3550 Market Pl …     4
    #>  5 Opalescent C… 814 6th Ave, Ta…     5
    #>  6 Sound Bikes   108 W Main, Puy…     6
    #>  7 Trek Bicycle… 3009 McCarver S…     7
    #>  8 Second Cycle  1205 M.L.K. Jr …     8
    #>  9 Penny bike c… 6419 24th St NE…     9
    #> 10 Spider's Bik… 3608 Grandview …    10
To geocode addresses from a data.frame, you can use dplyr::reframe().
    bike_stores |>
      reframe(
        find_address_candidates(original_address)
      )
    #> # A tibble: 15 × 65
    #>    input_id result_id loc_name status
    #>       <int>     <int> <chr>    <chr>
    #>  1        1        NA World    M
    #>  2        2        NA World    M
    #>  3        2        NA World    M
    #>  4        2        NA World    M
    #>  5        2        NA World    M
    #>  6        2        NA World    M
    #>  7        3        NA World    M
    #>  8        4        NA World    M
    #>  9        5        NA World    M
    #> 10        6        NA World    M
    #> 11        7        NA World    M
    #> 12        8        NA World    M
    #> 13        9        NA World    M
    #> 14       10        NA World    M
    #> 15       10        NA World    M
    #> # ℹ 61 more variables: score <dbl>,
    #> #   match_addr <chr>,
    #> #   long_label <chr>,
    #> #   short_label <chr>,
    #> #   addr_type <chr>, type_field <chr>,
    #> #   place_name <chr>,
    #> #   place_addr <chr>, phone <chr>, …
Notice how there are multiple results for some input_id values (2 and 10 here). This is because the max_locations argument was not specified. To ensure only the best match is returned, set max_locations = 1.
    geocoded <- bike_stores |>
      reframe(
        find_address_candidates(original_address, max_locations = 1)
      ) |>
      # reframe drops the sf class, must be added
      sf::st_as_sf()

    geocoded
    #> Simple feature collection with 10 features and 64 fields
    #> Geometry type: POINT
    #> Dimension:     XY
    #> Bounding box:  xmin: -122.5871 ymin: 47.19168 xmax: -122.294 ymax: 47.32302
    #> Geodetic CRS:  WGS 84
    #> # A tibble: 10 × 65
    #>    input_id result_id loc_name status
    #>       <int>     <int> <chr>    <chr>
    #>  1        1        NA World    M
    #>  2        2        NA World    M
    #>  3        3        NA World    M
    #>  4        4        NA World    M
    #>  5        5        NA World    M
    #>  6        6        NA World    M
    #>  7        7        NA World    M
    #>  8        8        NA World    M
    #>  9        9        NA World    M
    #> 10       10        NA World    M
    #> # ℹ 61 more variables: score <dbl>,
    #> #   match_addr <chr>,
    #> #   long_label <chr>,
    #> #   short_label <chr>,
    #> #   addr_type <chr>, type_field <chr>,
    #> #   place_name <chr>,
    #> #   place_addr <chr>, phone <chr>, …
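If several candidates per address have already been retrieved, an alternative is to filter them afterwards. The sketch below uses an invented candidate table, not real geocoder output, and assumes that a higher score means a better match; it keeps the top-scoring row for each input_id with dplyr::slice_max().

```r
library(dplyr)

# Toy candidate table; all values are invented for illustration only
candidates <- tibble(
  input_id = c(1L, 2L, 2L, 2L, 3L),
  score = c(100, 87.5, 99.1, 92, 100),
  match_addr = c("a", "b1", "b2", "b3", "c")
)

# Keep the single highest-scoring candidate per input address
best <- candidates |>
  slice_max(score, n = 1, with_ties = FALSE, by = input_id)

best
```

Setting with_ties = FALSE guarantees exactly one row per input_id even when two candidates share the top score.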
With this result, you can now join the address fields back onto the bike_stores data.frame using a left_join().
    left_join(
      bike_stores,
      geocoded,
      by = c("id" = "input_id")
    ) |>
      # left_join keeps the class of the first table
      # must add sf class back on
      sf::st_as_sf()
    #> Simple feature collection with 10 features and 66 fields
    #> Geometry type: POINT
    #> Dimension:     XY
    #> Bounding box:  xmin: -122.5871 ymin: 47.19168 xmax: -122.294 ymax: 47.32302
    #> Geodetic CRS:  WGS 84
    #> # A tibble: 10 × 67
    #>    store_name    original_address    id
    #>    <chr>         <chr>            <int>
    #>  1 Cascadia Whe… 3320 N Proctor …     1
    #>  2 Puget Sound … between 3206 N.…     2
    #>  3 Takoma Bike … 3010 6th Ave, T…     3
    #>  4 Trek Bicycle… 3550 Market Pl …     4
    #>  5 Opalescent C… 814 6th Ave, Ta…     5
    #>  6 Sound Bikes   108 W Main, Puy…     6
    #>  7 Trek Bicycle… 3009 McCarver S…     7
    #>  8 Second Cycle  1205 M.L.K. Jr …     8
    #>  9 Penny bike c… 6419 24th St NE…     9
    #> 10 Spider's Bik… 3608 Grandview …    10
    #> # ℹ 64 more variables:
    #> #   result_id <int>, loc_name <chr>,
    #> #   status <chr>, score <dbl>,
    #> #   match_addr <chr>,
    #> #   long_label <chr>,
    #> #   short_label <chr>,
    #> #   addr_type <chr>, …
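After a join like this, it can be worth checking whether every input row received a match. The sketch below uses small invented tables rather than the bike store data, and it assumes that an address with no match is simply absent from the geocoding result; whether the service behaves that way for a given input is not confirmed here.

```r
library(dplyr)

stores <- tibble(
  store_name = c("Shop A", "Shop B", "Shop C"),
  id = 1:3
)

# Toy geocoding result in which input 2 has no candidate (invented values)
geocoded_toy <- tibble(
  input_id = c(1L, 3L),
  score = c(100, 98.2)
)

# Same join pattern: row identifier on the left, input_id on the right
joined <- left_join(stores, geocoded_toy, by = c("id" = "input_id"))

# Rows with a missing score did not receive a geocoding match
unmatched <- filter(joined, is.na(score))

unmatched
```

Because left_join() keeps every row of the first table, unmatched stores stay in the result with NA in the geocoding columns instead of disappearing.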