Exploring OpenStreetMap using Pandas and the Python API

This notebook is based around a simple tool named OSM Runner that queries the OpenStreetMap (OSM) Overpass API and returns a Spatial Data Frame. Using the Python API inside of a Jupyter Notebook, we can develop map-driven tools to explore OSM with the full capabilities of the ArcGIS platform at our disposal. Be sure to update the GIS connection information in the cell below before proceeding.

This Notebook was written for an environment that does not have access to arcpy.

!pip install osm-runner
Collecting osm-runner
  Downloading osm_runner-0.0.3-py3-none-any.whl (3.8 kB)
Requirement already satisfied: arcgis in /opt/conda/lib/python3.6/site-packages (from osm-runner) (1.8.1)
Requirement already satisfied: pandas in /opt/conda/lib/python3.6/site-packages (from osm-runner) (1.0.3)
Requirement already satisfied: requests in /opt/conda/lib/python3.6/site-packages (from osm-runner) (2.23.0)
Requirement already satisfied: python-dateutil>=2.6.1 in /opt/conda/lib/python3.6/site-packages (from pandas->osm-runner) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /opt/conda/lib/python3.6/site-packages (from pandas->osm-runner) (2020.1)
Requirement already satisfied: numpy>=1.13.3 in /opt/conda/lib/python3.6/site-packages (from pandas->osm-runner) (1.18.1)
Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.6/site-packages (from requests->osm-runner) (2.9)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.6/site-packages (from requests->osm-runner) (1.25.8)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.6/site-packages (from requests->osm-runner) (2020.4.5.1)
Requirement already satisfied: chardet<4,>=3.0.2 in /opt/conda/lib/python3.6/site-packages (from requests->osm-runner) (3.0.4)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas->osm-runner) (1.15.0)
Installing collected packages: osm-runner
Successfully installed osm-runner-0.0.3
import time

from osm_runner import Runner  # pip install osm-runner
import pandas as pd

from arcgis.features import FeatureLayer, GeoAccessor, GeoSeriesAccessor
from arcgis.geoenrichment import enrich
from arcgis import dissolve_boundaries
from arcgis.geometry import project
from arcgis.gis import GIS

# Organization Login
gis = GIS('home')

Build Data Frames from Feature Layers & Extract Bounding Box

Let's assume we want to compare recycling amenities in OSM across 2 major cities. The first step will be to turn the boundaries for each city into a Data Frame via the GeoAccessor method from_layer(). Once we have a Data Frame for each city, we will use the Project operation of the Geometry Service in our GIS to get the envelope required to fetch data from Open Street Map.

dc_fl = FeatureLayer('https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Administrative_Other_Boundaries_WebMercator/MapServer/10')
dc_df = GeoAccessor.from_layer(dc_fl)
display(dc_df.head())

dc_extent = dc_df.spatial.full_extent
dc_coords = project([[dc_extent[0], dc_extent[1]], [dc_extent[2], dc_extent[3]]], in_sr=3857, out_sr=4326)
dc_bounds = f"({dc_coords[0]['y']},{dc_coords[0]['x']},{dc_coords[1]['y']},{dc_coords[1]['x']})"

pr_fl = FeatureLayer('https://carto2.apur.org/apur/rest/services/OPENDATA/QUARTIER/MapServer/0')
pr_df = GeoAccessor.from_layer(pr_fl)
display(pr_df.head())

pr_extent = pr_df.spatial.full_extent
pr_coords = project([[pr_extent[0], pr_extent[1]], [pr_extent[2], pr_extent[3]]], in_sr=2154, out_sr=4326)
pr_bounds = f"({pr_coords[0]['y']},{pr_coords[0]['x']},{pr_coords[1]['y']},{pr_coords[1]['x']})"
OBJECTIDCITY_NAMESTATE_CITYCAPITALWEB_URLAREAKMAREAMILESShape_LengthShape_AreaSHAPE
01Washington1150000Yhttp://www.dc.gov177.4768.5267608.2769221.774562e+08{'rings': [[[-8584936.334474642, 4712272.26069...
OBJECTIDN_SQ_QUC_QUC_QUINSEEL_QUC_ARN_SQ_ARSHAPE_LengthSHAPE_AreaSHAPE
01750000046467511202Picpus1275000001218260.6025877.205014e+06{'rings': [[[656779.2524999976, 6859005.5504],...
12750000047477511203Bercy127500000126154.5914151.902932e+06{'rings': [[[655300.0428000018, 6858622.629000...
23750000048487511204Quinze-Vingts127500000124509.2264761.235916e+06{'rings': [[[653996.0221000016, 6860240.466299...
34750000049497511301Salpêtrière137500000134758.7776751.181560e+06{'rings': [[[652751.3332000002, 6859190.603100...
45750000050507511302Gare137500000137069.8189893.044177e+06{'rings': [[[653571.8592000008, 6857669.765999...

Overview of the area in Washington DC to be Collected

dc_map = gis.map('Washington DC')

dc_map.draw(dc_df.iloc[0].SHAPE)
dc_map.draw(dc_df.spatial.bbox)

display(dc_map)

print(f'Searching Area: {round(dc_df.spatial.bbox.area / 1000000)} Square Kilometers')

Overview of the Area in Paris to be Collected

pr_map = gis.map('Paris')

pr_dis = dissolve_boundaries(pr_fl).query().sdf.iloc[0].SHAPE
pr_map.draw(pr_dis)
pr_map.draw(pr_df.spatial.bbox)

display(pr_map)

print(f'Searching Area: {round(pr_df.spatial.bbox.area / 1000000)} Square Kilometers')

Collecting Demographics with Geoenrichment

In addition to the useful GeoAccessor methods and properties we can access via a Data Frame, we may also pass a Data Frame to the enrich() method to learn something useful about the area we are studying. Even before we fetch data from OpenStreetMap, it should be clear from the differences in population size, population density, and the study area being assessed, that any comparison between DC and Paris would not be fair. At the end of the notebook we will set up a solution to help us find a city that might be more comparable to Paris or DC.

try:
    dc_e = enrich(dc_df, gis=gis)
    display(dc_e.head())
    pr_e = enrich(pr_df, gis=gis)
    display(pr_e.head())
    
    print(f'DC Population:    {dc_e.totpop.sum()}')
    print(f'DC Density:       {int(round(dc_e.totpop.sum() / (dc_df.spatial.area / 1000000)))} per Square Kilometer')
    print(f'Paris Population: {pr_e.totpop.sum()}')
    print(f'Paris Density:    {int(round(pr_e.totpop.sum() / (pr_dis.area / 1000000)))} per Square Kilometer')
    
except RuntimeError:
    print('Your GIS Connection Does Not Support Geoenrichment')
IDOBJECTID_0sourceCountryOBJECTIDAREAKMCITY_NAMEShape_AreaAREAMILESShape_LengthSTATE_CITY...aggregationMethodpopulationToPolygonSizeRatingapportionmentConfidenceHasDataTOTPOPTOTHHAVGHHSZTOTMALESTOTFEMALESSHAPE
001US1177.47Washington1.774562e+0868.5267608.2769221150000...BlockApportionment:US.BlockGroups2.1912.57617023503112362.13334037368314{'rings': [[[-77.1197952245159, 38.93435090401...

1 rows × 22 columns

IDOBJECTID_0sourceCountryOBJECTIDC_ARSHAPE_LengthL_QUC_QUINSEEN_SQ_ARSHAPE_Area...aggregationMethodpopulationToPolygonSizeRatingapportionmentConfidenceHasDataTOTPOPTOTHHAVGHHSZTOTMALESTOTFEMALESSHAPE
001FR11218260.602587Picpus75112027500000127.205014e+06...BlockApportionment:FR.IRIS2.0252.731167094344721.953122535869{'rings': [[[2.4112497331466076, 48.8296572091...
112FR2126154.591415Bercy75112037500000121.902932e+06...BlockApportionment:FR.IRIS2.0252.73111358861522.2164277161{'rings': [[[2.391141037849975, 48.82611264487...
223FR3124509.226476Quinze-Vingts75112047500000121.235916e+06...BlockApportionment:FR.IRIS2.0252.731127261147561.851318114079{'rings': [[[2.373204762244875, 48.84057029279...
334FR4134758.777675Salpêtrière75113017500000131.181560e+06...BlockApportionment:FR.IRIS2.0252.73111828697821.8784899797{'rings': [[[2.356363453081923, 48.83103873373...
445FR5137069.818989Gare75113027500000133.044177e+06...BlockApportionment:FR.IRIS2.0252.731173914347342.133493238982{'rings': [[[2.367706331151315, 48.81742119763...

5 rows × 22 columns

DC Population:    702350
DC Density:       2393 per Square Kilometer
Paris Population: 2211963
Paris Density:    20992 per Square Kilometer

Fetch Open Street Map Data within Boundaries as Data Frame.

For our purposes, we will only be looking for recycling amenities. We could have collected all amenities by simply passing the string 'amenity' as the third argument. Or we might have tried to find all of the known surveillance cameras in our extent by passing {'man_made': ['surveillance']}. Please consult the OSM Wiki for more information on what features you can extract. We are adding the following results to the first 2 maps we created above.

runner = Runner()

dc_osm_df = runner.gen_osm_df('point', dc_bounds, {'amenity': ["recycling"]})

dc_osm_df.columns = dc_osm_df.columns.str.replace("recycling:", "rec")
dc_osm_df.SHAPE   = dc_osm_df.geom

dc_osm_df.spatial.plot(map_widget=dc_map, renderer_type='u', col='recycling_type')


pr_osm_df = runner.gen_osm_df('point', pr_bounds, {'amenity': ["recycling"]})

pr_osm_df.columns = pr_osm_df.columns.str.replace("recycling:", "rec")
pr_osm_df.SHAPE   = pr_osm_df.geom

pr_osm_df.spatial.plot(map_widget=pr_map, renderer_type='u', col='recycling_type')

display(dc_osm_df.head(n=1))
display(pr_osm_df.head(n=1))
osm_idgeomamenitysourcerecycling_typeopening_hoursoperatorlocationeledescriptionreccansrecclothesrecglass_bottlesrecpaperwpt_descriptionrecplastic_bottlesfixmerecbookswebsiteSHAPE
02607287308{'x': -77.1188753, 'y': 38.8614258, 'spatialRe...recyclingsite visitNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN{'x': -77.1188753, 'y': 38.8614258, 'spatialRe...
osm_idgeomamenityrecglassrecycling_typesourcelocationrecglass_bottlesmaterialoperator...contact:phonecontact:websiterecbicyclessecond_handservice:bicycle:diyshoprecglass_jarsname:frstart_dateSHAPE
03350036758{'x': 2.2247201, 'y': 48.8158202, 'spatialRefe...recyclingyescontainerGPSO data.gouv.fr 2015-02NaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaN{'x': 2.2247201, 'y': 48.8158202, 'spatialRefe...

1 rows × 82 columns

General Attribute Comparison

We can use basic Data Frame methods to get a general idea of the differences between the recycling features in DC and Paris. While the completeness (more unique sources and operators) of the Paris data may partially be the result of there being many more records, we see a number of non-profit agencies operating these Parisian facilities and government efforts in Paris focused on documenting these features. An interesting question might be whether the large discrepancy (both in raw counts and specificity in the details) is the result of OSM simply being used more in Europe, or the result of a different set of values toward environmental stewardship.

# Values for DC

print(f'Total Records found for DC Data Frame:        {len(dc_osm_df)}')
print(f'Total Attributes Defined in Paris Data Frame: {len(list(dc_osm_df))}')
print('#' * 25)
print(f'Top 5 Operators ({dc_osm_df.operator.nunique()} Unique)')
print('#' * 25)
print(dc_osm_df.operator.value_counts()[:5].to_string())
print('#' * 25)
print(f'Top 5 Sources ({dc_osm_df.source.nunique()} Unique)')
print('#' * 25)
print(dc_osm_df.source.value_counts()[:5].to_string())
Total Records found for DC Data Frame:        69
Total Attributes Defined in Paris Data Frame: 20
#########################
Top 5 Operators (2 Unique)
#########################
Arlington County      1
Better World Books    1
#########################
Top 5 Sources (1 Unique)
#########################
site visit    1
# Values for Paris

print(f'Total Records found for Paris Data Frame:     {len(pr_osm_df)}')
print(f'Total Attributes Defined in Paris Data Frame: {len(list(pr_osm_df))}')
print('#' * 25)
print(f'Top 5 Operators ({pr_osm_df.operator.nunique()} Unique)')
print('#' * 25)
print(pr_osm_df.operator.value_counts()[:5].to_string())
print('#' * 25)
print(f'Top 5 Sources ({pr_osm_df.source.nunique()} Unique)')
print('#' * 25)
print(pr_osm_df.source.value_counts()[:5].to_string())
Total Records found for Paris Data Frame:     1265
Total Attributes Defined in Paris Data Frame: 82
#########################
Top 5 Operators (11 Unique)
#########################
Eco-Emballages        40
Le Relais             27
Ecotextile             4
WWF                    3
Issy en Transition     2
#########################
Top 5 Sources (46 Unique)
#########################
survey                                                                                   254
GPSO data.gouv.fr 2015-02                                                                107
data.issy.com 15/06/2016                                                                  64
GPSO data.gouv.fr 2015-02;survey                                                          54
cadastre-dgi-fr source : Direction Générale des Impôts - Cadastre. Mise à jour : 2011     13

Using the Map to Drive Our Exploration

Perhaps we are interested in finding shops that do not have recycling options nearby. The following 2 cells can be used as a way to explore OSM interactively within the Jupyter Notebook. Locate a place, drag the map around, and then run the last cell to plot a heat map of recycling amenities and all of the shops within the extent of the map. With only a few lines of code, we have the beginning of a site selection tool that also exposes all of the analytical power of Python and the ArcGIS platform.

search_map = gis.map('Berlin', 12)
display(search_map)

Get Data Frame for Map Extent & Plot

extent = search_map.extent
coords = project([[extent['xmin'], extent['ymin']], [extent['xmax'], extent['ymax']]], in_sr=3857, out_sr=4326)
bounds = f"({coords[0]['y']},{coords[0]['x']},{coords[1]['y']},{coords[1]['x']})"

try:
    runner = Runner()
    shop_df = runner.gen_osm_df('point', bounds, {'shop': ['coffee', 'street_vendor', 'convenience']})
    recy_df = runner.gen_osm_df('point', bounds, {'amenity': ['recycling']})

    # Move Geometries to SHAPE Column to Support Plot
    shop_df.SHAPE = shop_df.geom
    recy_df.SHAPE = recy_df.geom

    print(f'OSM Coffee Shops Features Within Current Map View: {len(shop_df)}')
    print(f'OSM Recycling Features Within Current Map View:    {len(recy_df)}')

    recy_df.spatial.plot(map_widget=search_map, renderer_type='h')
    shop_df.spatial.plot(map_widget=search_map)
    
except Exception as e:
    print("We Likely Didn't Find Any Features in this Extent.")
    print(e)
    
except KeyError as e:
    print('Try Moving the Map Around & Running This Cell Again')
    print(e)
OSM Coffee Shops Features Within Current Map View: 636
OSM Recycling Features Within Current Map View:    941

Export the Data to ArcGIS Online or Portal for Further Analysis

Finally, the GeoAccessor gives us a convenient method for pushing our Data Frame into a Hosted Feature Layer within Portal or ArcGIS Online so that we can do further analysis or share the information with other people in our organization. We could have also moved our results into a database with the to_featureclass() method.

recycling_hfl = recy_df.spatial.to_featurelayer(f'OSM_Recycling_{round(time.time())}', gis=gis, tags='OSM')
shops_hfl = shop_df.spatial.to_featurelayer(f'OSM_Shops_{round(time.time())}', gis=gis, tags='OSM')

display(recycling_hfl)
display(shops_hfl)
OSM_Recycling_1574000884
Feature Layer Collection by jscarmazzi_DBSNE
Last Modified: November 17, 2019
0 comments, 0 views
OSM_Shops_1574000907
Feature Layer Collection by jscarmazzi_DBSNE
Last Modified: November 17, 2019
0 comments, 0 views

Your browser is no longer supported. Please upgrade your browser for the best experience. See our browser deprecation post for more details.