Forward geocoding

Forward geocoding is the process of taking an address or place information and identifying its location on the globe.

To geocode addresses, the arcgisgeocode package provides the function find_address_candidates(). This function geocodes a single address at a time and returns up to 50 address candidates (ranked by a score).

There are two ways in which you can provide address information:

  1. Provide the entire address as a string via the single_line argument
  2. Provide parts of the address using the arguments address, city, region, postal, etc.

Single line address geocoding

It can be tough to parse addresses into their components. The single_line argument is a flexible way of geocoding addresses because it relies on the ArcGIS World Geocoder’s address parsing capabilities.

For example, we can geocode the same location using three decreasingly specific addresses.

library(arcgisgeocode)

addresses <- c(
  "380 New York Street Redlands, California, 92373, USA",
  "Esri Redlands",
  "ESRI CA"
)

locs <- find_address_candidates(
  addresses,
  max_locations = 1L
)

locs$geometry
#> Geometry set for 3 features 
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -117.1948 ymin: 34.05726 xmax: -117.1948 ymax: 34.05726
#> Geodetic CRS:  WGS 84

#> POINT (-117.1948 34.05726)
#> POINT (-117.1957 34.05609)
#> POINT (-117.1957 34.05609)

In each case, the geocoder resolves the input to essentially the same location!

Geocoding from a dataframe

Most commonly, you will need to geocode addresses from a column in a data.frame. It is important to note that find_address_candidates() does not work well inside a dplyr::mutate() call, particularly because it can return more than one candidate for a single input address.
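The row-count problem can be sketched without calling the geocoding service. The example below uses a hypothetical stand-in, mock_candidates(), which is not part of arcgisgeocode; it only mimics a function that may return more than one row per input. Under that assumption, dplyr::mutate() fails because the result has more rows than the input, while dplyr::reframe() accepts a result of any length.

```r
library(dplyr)

# Hypothetical stand-in for a geocoder (NOT an arcgisgeocode function):
# returns one row per candidate, and the second input yields two candidates.
mock_candidates <- function(x) {
  n_candidates <- ifelse(seq_along(x) == 2, 2L, 1L)
  data.frame(
    input_id = rep(seq_along(x), times = n_candidates),
    match = rep(toupper(x), times = n_candidates)
  )
}

stores <- tibble(address = c("a st", "b st", "c st"))

# reframe() allows the result to have a different number of rows than the input
reframed <- reframe(stores, mock_candidates(address))

# mutate() requires one result row per input row, so a 4-row result errors
mutate_result <- tryCatch(
  mutate(stores, res = mock_candidates(address)),
  error = function(e) "error"
)
```

Here reframed has four rows for three inputs, which is why an identifier such as input_id is needed to trace each candidate back to its source row.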

Let’s read in a CSV of bike stores in Tacoma, WA. To use find_address_candidates() with a data.frame, it is recommended to create a unique identifier from the row positions.

library(dplyr)

fp <- "https://www.arcgis.com/sharing/rest/content/items/9a9b91179ac44db1b689b42017471ae6/data"

bike_stores <- readr::read_csv(fp) |>
  mutate(id = row_number())

bike_stores
#> # A tibble: 10 × 3
#>    store_name                           original_address                                               id
#>    <chr>                                <chr>                                                       <int>
#>  1 Cascadia Wheel Co.                   3320 N Proctor St, Tacoma, WA 98407                             1
#>  2 Puget Sound Bike and Ski Shop        between 3206 N. 15th and 1414, N Alder St, Tacoma, WA 98406     2
#>  3 Takoma Bike & Ski                    3010 6th Ave, Tacoma, WA 98406                                  3
#>  4 Trek Bicycle Tacoma University Place 3550 Market Pl W Suite 102, University Place, WA 98466          4
#>  5 Opalescent Cyclery                   814 6th Ave, Tacoma, WA 98405                                   5
#>  6 Sound Bikes                          108 W Main, Puyallup, WA 98371                                  6
#>  7 Trek Bicycle Tacoma North End        3009 McCarver St, Tacoma, WA 98403                              7
#>  8 Second Cycle                         1205 M.L.K. Jr Way, Tacoma, WA 98405                            8
#>  9 Penny bike co.                       6419 24th St NE, Tacoma, WA 98422                               9
#> 10 Spider's Bike, Ski & Tennis Lab      3608 Grandview St, Gig Harbor, WA 98335                        10

To geocode addresses from a data.frame, you can use dplyr::reframe().

bike_stores |>
  reframe(
    find_address_candidates(original_address)
  )
#> # A tibble: 13 × 62
#>    input_id result_id loc_name status score match_addr           long_label short_label addr_type type_field place_name
#>       <int>     <int> <chr>    <chr>  <dbl> <chr>                <chr>      <chr>       <chr>     <chr>      <chr>     
#>  1        1        NA World    M      100   3320 N Proctor St, … 3320 N Pr… 3320 N Pro… PointAdd… <NA>       <NA>      
#>  2        2        NA World    M       97.6 N 15th St & N Alder… N 15th St… N 15th St … StreetInt <NA>       <NA>      
#>  3        2        NA World    M       97.3 1414 N Alder St, Ta… 1414 N Al… 1414 N Ald… PointAdd… <NA>       <NA>      
#>  4        2        NA World    M       94.7 S 15th St & S Alder… S 15th St… S 15th St … StreetInt <NA>       <NA>      
#>  5        2        NA World    M       84.4 3206 N 15th St, Tac… 3206 N 15… 3206 N 15t… PointAdd… <NA>       <NA>      
#>  6        3        NA World    M      100   3010 6th Ave, Tacom… 3010 6th … 3010 6th A… PointAdd… <NA>       <NA>      
#>  7        4        NA World    M      100   3550 Market Pl W, S… 3550 Mark… 3550 Marke… Subaddre… <NA>       <NA>      
#>  8        5        NA World    M      100   814 6th Ave, Tacoma… 814 6th A… 814 6th Ave PointAdd… <NA>       <NA>      
#>  9        6        NA World    M      100   108 W Main, Puyallu… 108 W Mai… 108 W Main  PointAdd… <NA>       <NA>      
#> 10        7        NA World    M      100   3009 McCarver St, T… 3009 McCa… 3009 McCar… PointAdd… <NA>       <NA>      
#> 11        8        NA World    M      100   1205 Martin Luther … 1205 Mart… 1205 Marti… PointAdd… <NA>       <NA>      
#> 12        9        NA World    M       97.9 6419 24th St NE, Ta… 6419 24th… 6419 24th … PointAdd… <NA>       <NA>      
#> 13       10        NA World    M      100   3608 Grandview St, … 3608 Gran… 3608 Grand… PointAdd… <NA>       <NA>      
#> # ℹ 51 more variables: place_addr <chr>, phone <chr>, url <chr>, rank <dbl>, add_bldg <chr>, add_num <chr>,
#> #   add_num_from <chr>, add_num_to <chr>, add_range <chr>, side <chr>, st_pre_dir <chr>, st_pre_type <chr>,
#> #   st_name <chr>, st_type <chr>, st_dir <chr>, bldg_type <chr>, bldg_name <chr>, level_type <chr>, level_name <chr>,
#> #   unit_type <chr>, unit_name <chr>, sub_addr <chr>, st_addr <chr>, block <chr>, sector <chr>, nbrhd <chr>,
#> #   district <chr>, city <chr>, metro_area <chr>, subregion <chr>, region <chr>, region_abbr <chr>, territory <chr>,
#> #   zone <chr>, postal <chr>, postal_ext <chr>, country <chr>, cntry_name <chr>, lang_code <chr>, distance <dbl>,
#> #   x <dbl>, y <dbl>, display_x <dbl>, display_y <dbl>, xmin <dbl>, xmax <dbl>, ymin <dbl>, ymax <dbl>, …

Notice that an input can return multiple results: here, input_id 2 has four candidates. This is because the max_locations argument was not specified. To ensure only the best match is returned, set max_locations = 1.

geocoded <- bike_stores |>
  reframe(
    find_address_candidates(original_address, max_locations = 1)
  ) |>
  # reframe() drops the sf class, so it must be added back
  sf::st_as_sf()

geocoded
#> Simple feature collection with 10 features and 61 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -122.5871 ymin: 47.19164 xmax: -122.294 ymax: 47.32301
#> Geodetic CRS:  WGS 84
#> # A tibble: 10 × 62
#>    input_id result_id loc_name status score match_addr           long_label short_label addr_type type_field place_name
#>       <int>     <int> <chr>    <chr>  <dbl> <chr>                <chr>      <chr>       <chr>     <chr>      <chr>     
#>  1        1        NA World    M      100   3320 N Proctor St, … 3320 N Pr… 3320 N Pro… PointAdd… <NA>       <NA>      
#>  2        2        NA World    M       97.6 N 15th St & N Alder… N 15th St… N 15th St … StreetInt <NA>       <NA>      
#>  3        3        NA World    M      100   3010 6th Ave, Tacom… 3010 6th … 3010 6th A… PointAdd… <NA>       <NA>      
#>  4        4        NA World    M      100   3550 Market Pl W, S… 3550 Mark… 3550 Marke… Subaddre… <NA>       <NA>      
#>  5        5        NA World    M      100   814 6th Ave, Tacoma… 814 6th A… 814 6th Ave PointAdd… <NA>       <NA>      
#>  6        6        NA World    M      100   108 W Main, Puyallu… 108 W Mai… 108 W Main  PointAdd… <NA>       <NA>      
#>  7        7        NA World    M      100   3009 McCarver St, T… 3009 McCa… 3009 McCar… PointAdd… <NA>       <NA>      
#>  8        8        NA World    M      100   1205 Martin Luther … 1205 Mart… 1205 Marti… PointAdd… <NA>       <NA>      
#>  9        9        NA World    M       97.9 6419 24th St NE, Ta… 6419 24th… 6419 24th … PointAdd… <NA>       <NA>      
#> 10       10        NA World    M      100   3608 Grandview St, … 3608 Gran… 3608 Grand… PointAdd… <NA>       <NA>      
#> # ℹ 51 more variables: place_addr <chr>, phone <chr>, url <chr>, rank <dbl>, add_bldg <chr>, add_num <chr>,
#> #   add_num_from <chr>, add_num_to <chr>, add_range <chr>, side <chr>, st_pre_dir <chr>, st_pre_type <chr>,
#> #   st_name <chr>, st_type <chr>, st_dir <chr>, bldg_type <chr>, bldg_name <chr>, level_type <chr>, level_name <chr>,
#> #   unit_type <chr>, unit_name <chr>, sub_addr <chr>, st_addr <chr>, block <chr>, sector <chr>, nbrhd <chr>,
#> #   district <chr>, city <chr>, metro_area <chr>, subregion <chr>, region <chr>, region_abbr <chr>, territory <chr>,
#> #   zone <chr>, postal <chr>, postal_ext <chr>, country <chr>, cntry_name <chr>, lang_code <chr>, distance <dbl>,
#> #   x <dbl>, y <dbl>, display_x <dbl>, display_y <dbl>, xmin <dbl>, xmax <dbl>, ymin <dbl>, ymax <dbl>, …
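If candidates were already retrieved without max_locations, a possible alternative is to keep the top-scoring row per input_id afterwards. The sketch below is an assumption-laden illustration rather than the package's own method: the candidates table is a hand-built toy that reuses a few input_id and score values from the output shown earlier, and the filtering is plain dplyr.

```r
library(dplyr)

# Toy candidate table; input_id and score values are copied from the
# earlier output, and all other columns are omitted for brevity.
candidates <- tibble(
  input_id = c(1L, 2L, 2L, 2L, 2L, 3L),
  score = c(100, 97.6, 97.3, 94.7, 84.4, 100)
)

# Keep only the highest-scoring candidate for each input row
best <- candidates |>
  group_by(input_id) |>
  slice_max(score, n = 1, with_ties = FALSE) |>
  ungroup()
```

This should give a similar one-row-per-input table, at the cost of requesting more candidates than needed. The rest of this section continues with the geocoded object created with max_locations = 1.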

With this result, you can now join the address fields back onto the bike_stores data.frame using a left_join().

left_join(
  bike_stores,
  geocoded,
  by = c("id" = "input_id")
) |>
  # left_join keeps the class of the first table
  # must add sf class back on
  sf::st_as_sf()
#> Simple feature collection with 10 features and 63 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -122.5871 ymin: 47.19164 xmax: -122.294 ymax: 47.32301
#> Geodetic CRS:  WGS 84
#> # A tibble: 10 × 64
#>    store_name        original_address    id result_id loc_name status score match_addr long_label short_label addr_type
#>    <chr>             <chr>            <int>     <int> <chr>    <chr>  <dbl> <chr>      <chr>      <chr>       <chr>    
#>  1 Cascadia Wheel C… 3320 N Proctor …     1        NA World    M      100   3320 N Pr… 3320 N Pr… 3320 N Pro… PointAdd…
#>  2 Puget Sound Bike… between 3206 N.…     2        NA World    M       97.6 N 15th St… N 15th St… N 15th St … StreetInt
#>  3 Takoma Bike & Ski 3010 6th Ave, T…     3        NA World    M      100   3010 6th … 3010 6th … 3010 6th A… PointAdd…
#>  4 Trek Bicycle Tac… 3550 Market Pl …     4        NA World    M      100   3550 Mark… 3550 Mark… 3550 Marke… Subaddre…
#>  5 Opalescent Cycle… 814 6th Ave, Ta…     5        NA World    M      100   814 6th A… 814 6th A… 814 6th Ave PointAdd…
#>  6 Sound Bikes       108 W Main, Puy…     6        NA World    M      100   108 W Mai… 108 W Mai… 108 W Main  PointAdd…
#>  7 Trek Bicycle Tac… 3009 McCarver S…     7        NA World    M      100   3009 McCa… 3009 McCa… 3009 McCar… PointAdd…
#>  8 Second Cycle      1205 M.L.K. Jr …     8        NA World    M      100   1205 Mart… 1205 Mart… 1205 Marti… PointAdd…
#>  9 Penny bike co.    6419 24th St NE…     9        NA World    M       97.9 6419 24th… 6419 24th… 6419 24th … PointAdd…
#> 10 Spider's Bike, S… 3608 Grandview …    10        NA World    M      100   3608 Gran… 3608 Gran… 3608 Grand… PointAdd…
#> # ℹ 53 more variables: type_field <chr>, place_name <chr>, place_addr <chr>, phone <chr>, url <chr>, rank <dbl>,
#> #   add_bldg <chr>, add_num <chr>, add_num_from <chr>, add_num_to <chr>, add_range <chr>, side <chr>,
#> #   st_pre_dir <chr>, st_pre_type <chr>, st_name <chr>, st_type <chr>, st_dir <chr>, bldg_type <chr>, bldg_name <chr>,
#> #   level_type <chr>, level_name <chr>, unit_type <chr>, unit_name <chr>, sub_addr <chr>, st_addr <chr>, block <chr>,
#> #   sector <chr>, nbrhd <chr>, district <chr>, city <chr>, metro_area <chr>, subregion <chr>, region <chr>,
#> #   region_abbr <chr>, territory <chr>, zone <chr>, postal <chr>, postal_ext <chr>, country <chr>, cntry_name <chr>,
#> #   lang_code <chr>, distance <dbl>, x <dbl>, y <dbl>, display_x <dbl>, display_y <dbl>, xmin <dbl>, xmax <dbl>, …
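Because left_join() keeps every row of the first table, it can also be used to spot inputs that came back without a candidate. The sketch below assumes that such an input would simply be absent from the geocoded table; that behaviour is an assumption, not something shown above. The stores and results tables are made-up toy stand-ins for bike_stores and geocoded.

```r
library(dplyr)

# Toy stand-ins for bike_stores and geocoded; all values are made up
stores <- tibble(id = 1:3, store_name = c("A", "B", "C"))
results <- tibble(input_id = c(1L, 3L), score = c(100, 97.9))

# Same join pattern as above: id on the left, input_id on the right
joined <- left_join(stores, results, by = c("id" = "input_id"))

# Rows without a geocoding result have NA in the joined columns
unmatched <- filter(joined, is.na(score))
```

In this toy case, unmatched contains only store "B", which could then be reviewed or geocoded again with a cleaned-up address.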
