Forward geocoding is the process of taking an address or place information and identifying its location on the globe.
To geocode addresses, the arcgisgeocode package provides the function find_address_candidates(). This function geocodes a single address at a time and returns up to 50 address candidates (ranked by a score).
There are two ways in which you can provide address information:
- Provide the entire address as a string via the single_line argument
- Provide parts of the address using the arguments address, city, region, postal, etc.
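As a minimal sketch of the second option, the call below passes a street address split by hand into components. The argument names are the ones listed above; the values are only illustrative, the call needs network access to the ArcGIS World Geocoder, and no output is shown because it depends on the service.

```r
library(arcgisgeocode)

# Geocode one address from its components instead of a single string.
# Requires network access to the ArcGIS World Geocoder.
locs <- find_address_candidates(
  address = "380 New York Street",
  city = "Redlands",
  region = "California",
  postal = "92373",
  max_locations = 1L
)

# Matched address of the best candidate
locs$match_addr
```

Supplying components can help when the data already stores street, city, and postal code in separate columns, since nothing has to be pasted together first.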
Single line address geocoding
It can be tough to parse addresses into their components. Using the single_line argument is a very flexible way of geocoding addresses, because doing so utilizes the ArcGIS World Geocoder’s address parsing capabilities.
For example, we can geocode the same location using 3 decreasingly specific addresses.
library(arcgisgeocode)

addresses <- c(
  "380 New York Street Redlands, California, 92373, USA",
  "Esri Redlands",
  "ESRI CA"
)

locs <- find_address_candidates(
  addresses,
  max_locations = 1L
)

locs$geometry

#> Geometry set for 3 features
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -117.1948 ymin: 34.05724 xmax: -117.1948 ymax: 34.05724
#> Geodetic CRS: WGS 84
#> POINT (-117.1948 34.05724)
#> POINT (-117.1957 34.05609)
#> POINT (-117.1957 34.05609)
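In each case, it finds the correct location! Note that the two less specific inputs resolve to a point a short distance from the full street address.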
Geocoding from a dataframe
Most commonly, you will need to geocode addresses from a column in a data.frame. It is important to note that the find_address_candidates() function does not work well inside a dplyr::mutate() call, because it can return more than one candidate row for a single input address.
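The row-count problem can be reproduced offline. The sketch below uses fake_candidates(), a made-up stand-in written only for this illustration (it is not part of arcgisgeocode), which returns two rows for one of three inputs. It shows that dplyr::reframe() accepts a result with any number of rows, while dplyr::mutate() raises an error.

```r
library(dplyr)

# Hypothetical stand-in for a geocoder: returns two candidate rows for any
# input containing "between", and one row otherwise.
fake_candidates <- function(x) {
  n <- ifelse(grepl("between", x), 2L, 1L)
  data.frame(
    input_id = rep(seq_along(x), times = n),
    score = unlist(lapply(n, function(k) c(100, 90)[seq_len(k)]))
  )
}

stores <- tibble(
  original_address = c(
    "3010 6th Ave",
    "between N 15th and N Alder",
    "814 6th Ave"
  )
)

# reframe() allows the result to have more rows than the input
expanded <- stores |>
  reframe(fake_candidates(original_address))

nrow(expanded)

# mutate() needs exactly one value per input row, so this call errors
mutate_result <- try(
  stores |> mutate(fake_candidates(original_address)),
  silent = TRUE
)

inherits(mutate_result, "try-error")
```

The input_id column in the stand-in plays the same role as in the real results: it records which input row each candidate belongs to.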
Let’s read in a CSV of bike stores in Tacoma, WA. To use find_address_candidates() with a data.frame, it is recommended to create a unique identifier from the row positions, so that the geocoding results can later be matched to the rows they came from.
library(dplyr)

fp <- "https://www.arcgis.com/sharing/rest/content/items/9a9b91179ac44db1b689b42017471ae6/data"

bike_stores <- readr::read_csv(fp) |>
  mutate(id = row_number())

bike_stores

#> # A tibble: 10 × 3
#>    store_name                           original_address                      id
#>    <chr>                                <chr>                              <int>
#>  1 Cascadia Wheel Co.                   3320 N Proctor St, Tacoma, WA 984…     1
#>  2 Puget Sound Bike and Ski Shop        between 3206 N. 15th and 1414, N …     2
#>  3 Takoma Bike & Ski                    3010 6th Ave, Tacoma, WA 98406         3
#>  4 Trek Bicycle Tacoma University Place 3550 Market Pl W Suite 102, Unive…     4
#>  5 Opalescent Cyclery                   814 6th Ave, Tacoma, WA 98405          5
#>  6 Sound Bikes                          108 W Main, Puyallup, WA 98371         6
#>  7 Trek Bicycle Tacoma North End        3009 McCarver St, Tacoma, WA 98403     7
#>  8 Second Cycle                         1205 M.L.K. Jr Way, Tacoma, WA 98…     8
#>  9 Penny bike co.                       6419 24th St NE, Tacoma, WA 98422      9
#> 10 Spider's Bike, Ski & Tennis Lab      3608 Grandview St, Gig Harbor, WA…    10
To geocode addresses from a data.frame, you can use dplyr::reframe().
bike_stores |>
  reframe(
    find_address_candidates(original_address)
  )

#> # A tibble: 13 × 65
#>    input_id result_id loc_name status score match_addr    long_label short_label
#>       <int>     <int> <chr>    <chr>  <dbl> <chr>         <chr>      <chr>
#>  1        1        NA World    M      100   3320 N Proct… 3320 N Pr… 3320 N Pro…
#>  2        2        NA World    M       97.3 1414 N Alder… 1414 N Al… 1414 N Ald…
#>  3        2        NA World    M       92.2 N 15th St & … N 15th St… N 15th St …
#>  4        2        NA World    M       89.2 S 15th St & … S 15th St… S 15th St …
#>  5        2        NA World    M       87.3 N Alder St, … N Alder S… N Alder St
#>  6        3        NA World    M      100   3010 6th Ave… 3010 6th … 3010 6th A…
#>  7        4        NA World    M      100   3550 Market … 3550 Mark… 3550 Marke…
#>  8        5        NA World    M      100   814 6th Ave,… 814 6th A… 814 6th Ave
#>  9        6        NA World    M      100   108 W Main, … 108 W Mai… 108 W Main
#> 10        7        NA World    M      100   3009 McCarve… 3009 McCa… 3009 McCar…
#> 11        8        NA World    M      100   1205 Martin … 1205 Mart… 1205 Marti…
#> 12        9        NA World    M       97.9 6419 24th St… 6419 24th… 6419 24th …
#> 13       10        NA World    M      100   3608 Grandvi… 3608 Gran… 3608 Grand…
#> # ℹ 57 more variables: addr_type <chr>, type_field <chr>, place_name <chr>,
#> #   place_addr <chr>, phone <chr>, url <chr>, rank <dbl>, add_bldg <chr>,
#> #   add_num <chr>, add_num_from <chr>, add_num_to <chr>, add_range <chr>,
#> #   side <chr>, st_pre_dir <chr>, st_pre_type <chr>, st_name <chr>,
#> #   st_type <chr>, st_dir <chr>, bldg_type <chr>, bldg_name <chr>,
#> #   level_type <chr>, level_name <chr>, unit_type <chr>, unit_name <chr>,
#> #   sub_addr <chr>, st_addr <chr>, block <chr>, sector <chr>, nbrhd <chr>, …
Notice how there are multiple results for input_id 2. This is because the max_locations argument was not specified, so all candidates found for that address are returned. To ensure only the best match is returned, set max_locations = 1.
geocoded <- bike_stores |>
  reframe(
    find_address_candidates(original_address, max_locations = 1)
  ) |>
  # reframe drops the sf class, must be added
  sf::st_as_sf()

geocoded

#> Simple feature collection with 10 features and 64 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -122.5871 ymin: 47.19169 xmax: -122.294 ymax: 47.32302
#> Geodetic CRS: WGS 84
#> # A tibble: 10 × 65
#>    input_id result_id loc_name status score match_addr    long_label short_label
#>       <int>     <int> <chr>    <chr>  <dbl> <chr>         <chr>      <chr>
#>  1        1        NA World    M      100   3320 N Proct… 3320 N Pr… 3320 N Pro…
#>  2        2        NA World    M       97.3 1414 N Alder… 1414 N Al… 1414 N Ald…
#>  3        3        NA World    M      100   3010 6th Ave… 3010 6th … 3010 6th A…
#>  4        4        NA World    M      100   3550 Market … 3550 Mark… 3550 Marke…
#>  5        5        NA World    M      100   814 6th Ave,… 814 6th A… 814 6th Ave
#>  6        6        NA World    M      100   108 W Main, … 108 W Mai… 108 W Main
#>  7        7        NA World    M      100   3009 McCarve… 3009 McCa… 3009 McCar…
#>  8        8        NA World    M      100   1205 Martin … 1205 Mart… 1205 Marti…
#>  9        9        NA World    M       97.9 6419 24th St… 6419 24th… 6419 24th …
#> 10       10        NA World    M      100   3608 Grandvi… 3608 Gran… 3608 Grand…
#> # ℹ 57 more variables: addr_type <chr>, type_field <chr>, place_name <chr>,
#> #   place_addr <chr>, phone <chr>, url <chr>, rank <dbl>, add_bldg <chr>,
#> #   add_num <chr>, add_num_from <chr>, add_num_to <chr>, add_range <chr>,
#> #   side <chr>, st_pre_dir <chr>, st_pre_type <chr>, st_name <chr>,
#> #   st_type <chr>, st_dir <chr>, bldg_type <chr>, bldg_name <chr>,
#> #   level_type <chr>, level_name <chr>, unit_type <chr>, unit_name <chr>,
#> #   sub_addr <chr>, st_addr <chr>, block <chr>, sector <chr>, nbrhd <chr>, …
With this result, you can now join the address fields back onto the bike_stores data.frame using a left_join().
left_join(
  bike_stores,
  geocoded,
  by = c("id" = "input_id")
) |>
  # left_join keeps the class of the first table
  # must add sf class back on
  sf::st_as_sf()

#> Simple feature collection with 10 features and 66 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -122.5871 ymin: 47.19169 xmax: -122.294 ymax: 47.32302
#> Geodetic CRS: WGS 84
#> # A tibble: 10 × 67
#>    store_name  original_address    id result_id loc_name status score match_addr
#>    <chr>       <chr>            <int>     <int> <chr>    <chr>  <dbl> <chr>
#>  1 Cascadia W… 3320 N Proctor …     1        NA World    M      100   3320 N Pr…
#>  2 Puget Soun… between 3206 N.…     2        NA World    M       97.3 1414 N Al…
#>  3 Takoma Bik… 3010 6th Ave, T…     3        NA World    M      100   3010 6th …
#>  4 Trek Bicyc… 3550 Market Pl …     4        NA World    M      100   3550 Mark…
#>  5 Opalescent… 814 6th Ave, Ta…     5        NA World    M      100   814 6th A…
#>  6 Sound Bikes 108 W Main, Puy…     6        NA World    M      100   108 W Mai…
#>  7 Trek Bicyc… 3009 McCarver S…     7        NA World    M      100   3009 McCa…
#>  8 Second Cyc… 1205 M.L.K. Jr …     8        NA World    M      100   1205 Mart…
#>  9 Penny bike… 6419 24th St NE…     9        NA World    M       97.9 6419 24th…
#> 10 Spider's B… 3608 Grandview …    10        NA World    M      100   3608 Gran…
#> # ℹ 59 more variables: long_label <chr>, short_label <chr>, addr_type <chr>,
#> #   type_field <chr>, place_name <chr>, place_addr <chr>, phone <chr>,
#> #   url <chr>, rank <dbl>, add_bldg <chr>, add_num <chr>, add_num_from <chr>,
#> #   add_num_to <chr>, add_range <chr>, side <chr>, st_pre_dir <chr>,
#> #   st_pre_type <chr>, st_name <chr>, st_type <chr>, st_dir <chr>,
#> #   bldg_type <chr>, bldg_name <chr>, level_type <chr>, level_name <chr>,
#> #   unit_type <chr>, unit_name <chr>, sub_addr <chr>, st_addr <chr>, …
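One possible follow-up, sketched here as a suggestion rather than a required step, is to use the joined score column to flag matches that may deserve a manual check. The example runs offline on a toy subset whose ids, names, and scores are copied from the output above. The names stores_toy, geocoded_toy, and needs_review are made up for this sketch, and the cut-off of 98 is an arbitrary assumption.

```r
library(dplyr)

# Toy subset of the tables above (values copied from the printed output)
stores_toy <- tibble(
  id = c(1L, 2L, 9L),
  store_name = c(
    "Cascadia Wheel Co.",
    "Puget Sound Bike and Ski Shop",
    "Penny bike co."
  )
)

geocoded_toy <- tibble(
  input_id = c(1L, 2L, 9L),
  score = c(100, 97.3, 97.9)
)

# Join on the row identifier, then keep matches below the assumed cut-off
needs_review <- stores_toy |>
  left_join(geocoded_toy, by = c("id" = "input_id")) |>
  filter(score < 98)

needs_review$store_name
```

In this toy subset the two stores with scores of 97.3 and 97.9 are kept, which includes the store whose original address was written as a description ("between ...") rather than a street address.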