Forward geocoding | ArcGIS R-Bridge

Forward geocoding is the process of taking an address or place information and identifying its location on the globe.

To geocode addresses, the arcgisgeocode package provides the function find_address_candidates(). It geocodes addresses one at a time and returns up to 50 candidates per address, ranked by score.

There are two ways in which you can provide address information:

Provide the entire address as a string via the single_line argument
Provide parts of the address using arguments such as address, city, region, and postal

Single line address geocoding

It can be tough to parse addresses into their components. The single_line argument is a very flexible way of geocoding addresses, because it relies on the ArcGIS World Geocoder’s address parsing capabilities.

For example, we can geocode the same location using three progressively less specific addresses.

library(arcgisgeocode)

addresses <- c(
  "380 New York Street Redlands, California, 92373, USA",
  "Esri Redlands",
  "ESRI CA"
)

locs <- find_address_candidates(
  addresses,
  max_locations = 1L
)

locs$geometry

#> Geometry set for 3 features 
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -117.1957 ymin: 34.05609 xmax: -117.1948 ymax: 34.05724
#> Geodetic CRS:  WGS 84

#> POINT (-117.1948 34.05724)
#> POINT (-117.1957 34.05609)
#> POINT (-117.1957 34.05609)

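In each case, it finds the correct location! The two less specific queries resolve to the same point, while the full street address resolves to a nearby point.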

Geocoding from a data.frame

Most commonly, you will need to geocode addresses stored in a column of a data.frame. Note that find_address_candidates() does not work well inside a dplyr::mutate() call, because it can return more than one candidate per address, whereas mutate() expects exactly one value per input row.
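A minimal offline sketch of this problem, using a hypothetical stand-in function `fake_geocode()` (not part of arcgisgeocode) that, like find_address_candidates(), can return several rows for a single input. mutate() rejects the result because its row count does not match the input, while reframe() (dplyr 1.1.0 or later) accepts it:

```r
library(dplyr)

# Hypothetical stand-in for a geocoder (NOT part of arcgisgeocode):
# the second input yields three candidate rows, the others one each.
fake_geocode <- function(x) {
  n_candidates <- ifelse(seq_along(x) == 2, 3L, 1L)
  data.frame(
    input_id = rep(seq_along(x), times = n_candidates),
    score = 100 - seq_len(sum(n_candidates))
  )
}

df <- data.frame(addr = c("first", "second", "third"))

# mutate() needs one value per input row (or a single value),
# so a 5-row result for a 3-row input raises an error
mutate_result <- tryCatch(
  mutate(df, fake_geocode(addr)),
  error = function(e) "error"
)
mutate_result
#> [1] "error"

# reframe() accepts any number of output rows
reframed <- reframe(df, fake_geocode(addr))
reframed$input_id
#> [1] 1 2 2 2 3
```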

Let’s read in a CSV of bike stores in Tacoma, WA. When using find_address_candidates() with a data.frame, it is recommended to create a unique identifier from the row positions, so that each result can later be matched back to its input row.

library(dplyr)

fp <- "https://www.arcgis.com/sharing/rest/content/items/9a9b91179ac44db1b689b42017471ae6/data"

bike_stores <- readr::read_csv(fp) |>
  mutate(id = row_number())

bike_stores

#> # A tibble: 10 × 3
#>    store_name    original_address    id
#>    <chr>         <chr>            <int>
#>  1 Cascadia Whe… 3320 N Proctor …     1
#>  2 Puget Sound … between 3206 N.…     2
#>  3 Takoma Bike … 3010 6th Ave, T…     3
#>  4 Trek Bicycle… 3550 Market Pl …     4
#>  5 Opalescent C… 814 6th Ave, Ta…     5
#>  6 Sound Bikes   108 W Main, Puy…     6
#>  7 Trek Bicycle… 3009 McCarver S…     7
#>  8 Second Cycle  1205 M.L.K. Jr …     8
#>  9 Penny bike c… 6419 24th St NE…     9
#> 10 Spider's Bik… 3608 Grandview …    10

To geocode addresses from a data.frame, you can use dplyr::reframe(), which, unlike mutate(), allows each input to produce any number of rows.

bike_stores |>
  reframe(
    find_address_candidates(original_address)
  )

#> # A tibble: 15 × 65
#>    input_id result_id loc_name status
#>       <int>     <int> <chr>    <chr> 
#>  1        1        NA World    M     
#>  2        2        NA World    M     
#>  3        2        NA World    M     
#>  4        2        NA World    M     
#>  5        2        NA World    M     
#>  6        2        NA World    M     
#>  7        3        NA World    M     
#>  8        4        NA World    M     
#>  9        5        NA World    M     
#> 10        6        NA World    M     
#> 11        7        NA World    M     
#> 12        8        NA World    M     
#> 13        9        NA World    M     
#> 14       10        NA World    M     
#> 15       10        NA World    M     
#> # ℹ 61 more variables: score <dbl>,
#> #   match_addr <chr>,
#> #   long_label <chr>,
#> #   short_label <chr>,
#> #   addr_type <chr>, type_field <chr>,
#> #   place_name <chr>,
#> #   place_addr <chr>, phone <chr>, …

Notice that some input_id values (2 and 10) have multiple results. This is because the max_locations argument was not specified. To return only the best match for each address, set max_locations = 1.

geocoded <- bike_stores |>
  reframe(
    find_address_candidates(original_address, max_locations = 1)
  ) |>
  # reframe() drops the sf class, so convert the result back to sf
  sf::st_as_sf()

geocoded

#> Simple feature collection with 10 features and 64 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -122.5871 ymin: 47.19168 xmax: -122.294 ymax: 47.32302
#> Geodetic CRS:  WGS 84
#> # A tibble: 10 × 65
#>    input_id result_id loc_name status
#>       <int>     <int> <chr>    <chr> 
#>  1        1        NA World    M     
#>  2        2        NA World    M     
#>  3        3        NA World    M     
#>  4        4        NA World    M     
#>  5        5        NA World    M     
#>  6        6        NA World    M     
#>  7        7        NA World    M     
#>  8        8        NA World    M     
#>  9        9        NA World    M     
#> 10       10        NA World    M     
#> # ℹ 61 more variables: score <dbl>,
#> #   match_addr <chr>,
#> #   long_label <chr>,
#> #   short_label <chr>,
#> #   addr_type <chr>, type_field <chr>,
#> #   place_name <chr>,
#> #   place_addr <chr>, phone <chr>, …
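If you would rather keep every candidate (the default) and choose the best match yourself, one option is to keep the highest-scoring row per input_id with dplyr's slice_max(). The sketch below runs on a small made-up table shaped like the reframe() output above (input_id, match_addr, score); it is an alternative to max_locations = 1, not something the package does for you:

```r
library(dplyr)

# Hypothetical candidate table shaped like the reframe() output above
# (made-up values): input 2 has three candidates with different scores.
candidates <- tibble(
  input_id   = c(1L, 2L, 2L, 2L, 3L),
  match_addr = c("A St", "B St", "B Ave", "B Rd", "C St"),
  score      = c(100, 88.5, 97.2, 90.1, 100)
)

# Keep the single highest-scoring candidate per input_id;
# with_ties = FALSE guarantees exactly one row per group
best <- candidates |>
  group_by(input_id) |>
  slice_max(score, n = 1, with_ties = FALSE) |>
  ungroup()

best
```

With real results, this would be applied to the reframe() output before converting it with sf::st_as_sf().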

With this result, you can now join the geocoded fields back onto the bike_stores data.frame with left_join(), matching id to input_id.

left_join(
  bike_stores,
  geocoded,
  by = c("id" = "input_id")
) |>
  # left_join() keeps the class of the first table (a tibble),
  # so convert the result back to sf
  sf::st_as_sf()

#> Simple feature collection with 10 features and 66 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -122.5871 ymin: 47.19168 xmax: -122.294 ymax: 47.32302
#> Geodetic CRS:  WGS 84
#> # A tibble: 10 × 67
#>    store_name    original_address    id
#>    <chr>         <chr>            <int>
#>  1 Cascadia Whe… 3320 N Proctor …     1
#>  2 Puget Sound … between 3206 N.…     2
#>  3 Takoma Bike … 3010 6th Ave, T…     3
#>  4 Trek Bicycle… 3550 Market Pl …     4
#>  5 Opalescent C… 814 6th Ave, Ta…     5
#>  6 Sound Bikes   108 W Main, Puy…     6
#>  7 Trek Bicycle… 3009 McCarver S…     7
#>  8 Second Cycle  1205 M.L.K. Jr …     8
#>  9 Penny bike c… 6419 24th St NE…     9
#> 10 Spider's Bik… 3608 Grandview …    10
#> # ℹ 64 more variables:
#> #   result_id <int>, loc_name <chr>,
#> #   status <chr>, score <dbl>,
#> #   match_addr <chr>,
#> #   long_label <chr>,
#> #   short_label <chr>,
#> #   addr_type <chr>, …
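The sketch below illustrates two properties of this join on small made-up tables: `join_by(id == input_id)` (dplyr 1.1.0 or later) is an equivalent way to write `by = c("id" = "input_id")`, and left_join() keeps every store even if it has no geocoded row. The tables are hypothetical, and the missing row for store 3 is an assumption for illustration, not documented behavior of find_address_candidates():

```r
library(dplyr)

# Hypothetical inputs: three stores with a row-position id
stores <- tibble(store_name = c("Store A", "Store B", "Store C"), id = 1:3)

# Hypothetical geocoded output in which input 3 has no row
geocoded_mock <- tibble(input_id = c(1L, 2L), score = c(100, 95))

# join_by(id == input_id) is equivalent to by = c("id" = "input_id");
# left_join() keeps every store and fills unmatched ones with NA
joined <- left_join(stores, geocoded_mock, by = join_by(id == input_id))

joined$score
#> [1] 100  95  NA
```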