Exploring OpenStreetMap using Pandas and the Python API

This notebook is based around a simple tool named OSM Runner that queries the OpenStreetMap (OSM) Overpass API and returns a Spatial Data Frame. Using the Python API inside of a Jupyter Notebook, we can develop map-driven tools to explore OSM with the full capabilities of the ArcGIS platform at our disposal. Be sure to update the GIS connection information in the cell below before proceeding.

This Notebook was written for an environment that does not have access to arcpy.

Input
!pip install osm-runner
Collecting osm-runner
  Downloading osm_runner-0.0.3-py3-none-any.whl (3.8 kB)
Requirement already satisfied: arcgis in /opt/conda/lib/python3.6/site-packages (from osm-runner) (1.8.1)
Requirement already satisfied: pandas in /opt/conda/lib/python3.6/site-packages (from osm-runner) (1.0.3)
Requirement already satisfied: requests in /opt/conda/lib/python3.6/site-packages (from osm-runner) (2.23.0)
Requirement already satisfied: python-dateutil>=2.6.1 in /opt/conda/lib/python3.6/site-packages (from pandas->osm-runner) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /opt/conda/lib/python3.6/site-packages (from pandas->osm-runner) (2020.1)
Requirement already satisfied: numpy>=1.13.3 in /opt/conda/lib/python3.6/site-packages (from pandas->osm-runner) (1.18.1)
Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.6/site-packages (from requests->osm-runner) (2.9)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.6/site-packages (from requests->osm-runner) (1.25.8)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.6/site-packages (from requests->osm-runner) (2020.4.5.1)
Requirement already satisfied: chardet<4,>=3.0.2 in /opt/conda/lib/python3.6/site-packages (from requests->osm-runner) (3.0.4)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas->osm-runner) (1.15.0)
Installing collected packages: osm-runner
Successfully installed osm-runner-0.0.3
Input
import time

from osm_runner import Runner  # pip install osm-runner
import pandas as pd

from arcgis.features import FeatureLayer, GeoAccessor, GeoSeriesAccessor
from arcgis.geoenrichment import enrich
from arcgis import dissolve_boundaries
from arcgis.geometry import project
from arcgis.gis import GIS

# Organization Login
gis = GIS('home')

Build Data Frames from Feature Layers & Extract Bounding Box

Let's assume we want to compare recycling amenities in OSM across 2 major cities. The first step will be to turn the boundaries for each city into a Data Frame via the GeoAccessor method from_layer(). Once we have a Data Frame for each city, we will use the Project operation of the Geometry Service in our GIS to get the envelope required to fetch data from Open Street Map.

Input
dc_fl = FeatureLayer('https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Administrative_Other_Boundaries_WebMercator/MapServer/10')
dc_df = GeoAccessor.from_layer(dc_fl)
display(dc_df.head())

dc_extent = dc_df.spatial.full_extent
dc_coords = project([[dc_extent[0], dc_extent[1]], [dc_extent[2], dc_extent[3]]], in_sr=3857, out_sr=4326)
dc_bounds = f"({dc_coords[0]['y']},{dc_coords[0]['x']},{dc_coords[1]['y']},{dc_coords[1]['x']})"

pr_fl = FeatureLayer('https://carto2.apur.org/apur/rest/services/OPENDATA/QUARTIER/MapServer/0')
pr_df = GeoAccessor.from_layer(pr_fl)
display(pr_df.head())

pr_extent = pr_df.spatial.full_extent
pr_coords = project([[pr_extent[0], pr_extent[1]], [pr_extent[2], pr_extent[3]]], in_sr=2154, out_sr=4326)
pr_bounds = f"({pr_coords[0]['y']},{pr_coords[0]['x']},{pr_coords[1]['y']},{pr_coords[1]['x']})"
OBJECTID CITY_NAME STATE_CITY CAPITAL WEB_URL AREAKM AREAMILES Shape_Length Shape_Area SHAPE
0 1 Washington 1150000 Y http://www.dc.gov 177.47 68.52 67608.276922 1.774562e+08 {'rings': [[[-8584936.334474642, 4712272.26069...
OBJECTID N_SQ_QU C_QU C_QUINSEE L_QU C_AR N_SQ_AR SHAPE_Length SHAPE_Area SHAPE
0 1 750000046 46 7511202 Picpus 12 750000012 18260.602587 7.205014e+06 {'rings': [[[656779.2524999976, 6859005.5504],...
1 2 750000047 47 7511203 Bercy 12 750000012 6154.591415 1.902932e+06 {'rings': [[[655300.0428000018, 6858622.629000...
2 3 750000048 48 7511204 Quinze-Vingts 12 750000012 4509.226476 1.235916e+06 {'rings': [[[653996.0221000016, 6860240.466299...
3 4 750000049 49 7511301 Salpêtrière 13 750000013 4758.777675 1.181560e+06 {'rings': [[[652751.3332000002, 6859190.603100...
4 5 750000050 50 7511302 Gare 13 750000013 7069.818989 3.044177e+06 {'rings': [[[653571.8592000008, 6857669.765999...

Overview of the area in Washington DC to be Collected

Input
dc_map = gis.map('Washington DC')

dc_map.draw(dc_df.iloc[0].SHAPE)
dc_map.draw(dc_df.spatial.bbox)

display(dc_map)

print(f'Searching Area: {round(dc_df.spatial.bbox.area / 1000000)} Square Kilometers')
Output

Overview of the Area in Paris to be Collected

Input
pr_map = gis.map('Paris')

pr_dis = dissolve_boundaries(pr_fl).query().sdf.iloc[0].SHAPE
pr_map.draw(pr_dis)
pr_map.draw(pr_df.spatial.bbox)

display(pr_map)

print(f'Searching Area: {round(pr_df.spatial.bbox.area / 1000000)} Square Kilometers')
Output

Collecting Demographics with Geoenrichment

In addition to the useful GeoAccessor methods and properties we can access via a Data Frame, we may also pass a Data Frame to the enrich() method to learn something useful about the area we are studying. Even before we fetch data from OpenStreetMap, it should be clear from the differences in population size, population density, and the study area being assessed, that any comparison between DC and Paris would not be fair. At the end of the notebook we will set up a solution to help us find a city that might be more comparable to Paris or DC.

Input
try:
    dc_e = enrich(dc_df, gis=gis)
    display(dc_e.head())
    pr_e = enrich(pr_df, gis=gis)
    display(pr_e.head())
    
    print(f'DC Population:    {dc_e.TOTPOP.sum()}')
    print(f'DC Density:       {int(round(dc_e.TOTPOP.sum() / (dc_df.spatial.area / 1000000)))} per Square Kilometer')
    print(f'Paris Population: {pr_e.TOTPOP.sum()}')
    print(f'Paris Density:    {int(round(pr_e.TOTPOP.sum() / (pr_dis.area / 1000000)))} per Square Kilometer')
    
except RuntimeError:
    print('Your GIS Connection Does Not Support Geoenrichment')
ID OBJECTID_0 sourceCountry OBJECTID AREAKM CITY_NAME Shape_Area AREAMILES Shape_Length STATE_CITY ... aggregationMethod populationToPolygonSizeRating apportionmentConfidence HasData TOTPOP TOTHH AVGHHSZ TOTMALES TOTFEMALES SHAPE
0 0 1 US 1 177.47 Washington 1.774562e+08 68.52 67608.276922 1150000 ... BlockApportionment:US.BlockGroups 2.191 2.576 1 702350 311236 2.13 334037 368314 {'rings': [[[-77.1197952245159, 38.93435090401...

1 rows × 22 columns

ID OBJECTID_0 sourceCountry OBJECTID C_AR SHAPE_Length L_QU C_QUINSEE N_SQ_AR SHAPE_Area ... aggregationMethod populationToPolygonSizeRating apportionmentConfidence HasData TOTPOP TOTHH AVGHHSZ TOTMALES TOTFEMALES SHAPE
0 0 1 FR 1 12 18260.602587 Picpus 7511202 750000012 7.205014e+06 ... BlockApportionment:FR.IRIS 2.025 2.731 1 67094 34472 1.95 31225 35869 {'rings': [[[2.4112497331466076, 48.8296572091...
1 1 2 FR 2 12 6154.591415 Bercy 7511203 750000012 1.902932e+06 ... BlockApportionment:FR.IRIS 2.025 2.731 1 13588 6152 2.21 6427 7161 {'rings': [[[2.391141037849975, 48.82611264487...
2 2 3 FR 3 12 4509.226476 Quinze-Vingts 7511204 750000012 1.235916e+06 ... BlockApportionment:FR.IRIS 2.025 2.731 1 27261 14756 1.85 13181 14079 {'rings': [[[2.373204762244875, 48.84057029279...
3 3 4 FR 4 13 4758.777675 Salpêtrière 7511301 750000013 1.181560e+06 ... BlockApportionment:FR.IRIS 2.025 2.731 1 18286 9782 1.87 8489 9797 {'rings': [[[2.356363453081923, 48.83103873373...
4 4 5 FR 5 13 7069.818989 Gare 7511302 750000013 3.044177e+06 ... BlockApportionment:FR.IRIS 2.025 2.731 1 73914 34734 2.13 34932 38982 {'rings': [[[2.367706331151315, 48.81742119763...

5 rows × 22 columns

DC Population:    702350
DC Density:       2393 per Square Kilometer
Paris Population: 2211963
Paris Density:    20992 per Square Kilometer

Fetch Open Street Map Data within Boundaries as Data Frame.

For our purposes, we will only be looking for recycling amenities. We could have collected all amenities by simply passing the string 'amenity' as the third argument. Or we might have tried to find all of the known surveillance cameras in our extent by passing {'man_made': ['surveillance']}. Please consult the OSM Wiki for more information on what features you can extract. We are adding the following results to the first 2 maps we created above.

Input
runner = Runner()

dc_osm_df = runner.gen_osm_df('point', dc_bounds, {'amenity': ["recycling"]})

dc_osm_df.columns = dc_osm_df.columns.str.replace("recycling:", "rec")
dc_osm_df.SHAPE   = dc_osm_df.geom

dc_osm_df.spatial.plot(map_widget=dc_map, renderer_type='u', col='recycling_type')


pr_osm_df = runner.gen_osm_df('point', pr_bounds, {'amenity': ["recycling"]})

pr_osm_df.columns = pr_osm_df.columns.str.replace("recycling:", "rec")
pr_osm_df.SHAPE   = pr_osm_df.geom

pr_osm_df.spatial.plot(map_widget=pr_map, renderer_type='u', col='recycling_type')

display(dc_osm_df.head(n=1))
display(pr_osm_df.head(n=1))
osm_id geom amenity source recycling_type opening_hours operator location ele description reccans recclothes recglass_bottles recpaper wpt_description recplastic_bottles fixme recbooks website SHAPE
0 2607287308 {'x': -77.1188753, 'y': 38.8614258, 'spatialRe... recycling site visit NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN {'x': -77.1188753, 'y': 38.8614258, 'spatialRe...
osm_id geom amenity recglass recycling_type source location recglass_bottles material operator ... contact:phone contact:website recbicycles second_hand service:bicycle:diy shop recglass_jars name:fr start_date SHAPE
0 3350036758 {'x': 2.2247201, 'y': 48.8158202, 'spatialRefe... recycling yes container GPSO data.gouv.fr 2015-02 NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN {'x': 2.2247201, 'y': 48.8158202, 'spatialRefe...

1 rows × 82 columns

General Attribute Comparison

We can use basic Data Frame methods to get a general idea of the differences between the recycling features in DC and Paris. While the completeness (more unique sources and operators) of the Paris data may partially be the result of there being many more records, we see a number of non-profit agencies operating these Parisian facilities and government efforts in Paris focused on documenting these features. An interesting question might be whether the large discrepancy (both in raw counts and specificity in the details) is the result of OSM simply being used more in Europe, or the result of a different set of values toward environmental stewardship.

Input
# Values for DC

print(f'Total Records found for DC Data Frame:        {len(dc_osm_df)}')
print(f'Total Attributes Defined in Paris Data Frame: {len(list(dc_osm_df))}')
print('#' * 25)
print(f'Top 5 Operators ({dc_osm_df.operator.nunique()} Unique)')
print('#' * 25)
print(dc_osm_df.operator.value_counts()[:5].to_string())
print('#' * 25)
print(f'Top 5 Sources ({dc_osm_df.source.nunique()} Unique)')
print('#' * 25)
print(dc_osm_df.source.value_counts()[:5].to_string())
Total Records found for DC Data Frame:        69
Total Attributes Defined in Paris Data Frame: 20
#########################
Top 5 Operators (2 Unique)
#########################
Arlington County      1
Better World Books    1
#########################
Top 5 Sources (1 Unique)
#########################
site visit    1
Input
# Values for Paris

print(f'Total Records found for Paris Data Frame:     {len(pr_osm_df)}')
print(f'Total Attributes Defined in Paris Data Frame: {len(list(pr_osm_df))}')
print('#' * 25)
print(f'Top 5 Operators ({pr_osm_df.operator.nunique()} Unique)')
print('#' * 25)
print(pr_osm_df.operator.value_counts()[:5].to_string())
print('#' * 25)
print(f'Top 5 Sources ({pr_osm_df.source.nunique()} Unique)')
print('#' * 25)
print(pr_osm_df.source.value_counts()[:5].to_string())
Total Records found for Paris Data Frame:     1265
Total Attributes Defined in Paris Data Frame: 82
#########################
Top 5 Operators (11 Unique)
#########################
Eco-Emballages        40
Le Relais             27
Ecotextile             4
WWF                    3
Issy en Transition     2
#########################
Top 5 Sources (46 Unique)
#########################
survey                                                                                   254
GPSO data.gouv.fr 2015-02                                                                107
data.issy.com 15/06/2016                                                                  64
GPSO data.gouv.fr 2015-02;survey                                                          54
cadastre-dgi-fr source : Direction Générale des Impôts - Cadastre. Mise à jour : 2011     13

Using the Map to Drive Our Exploration

Perhaps we are interested in finding shops that do not have recycling options nearby. The following 2 cells can be used as a way to explore OSM interactively within the Jupyter Notebook. Locate a place, drag the map around, and then run the last cell to plot a heat map of recycling amenities and all of the shops within the extent of the map. With only a few lines of code, we have the beginning of a site selection tool that also exposes all of the analytical power of Python and the ArcGIS platform.

Input
search_map = gis.map('Berlin', 12)
display(search_map)
Output

Get Data Frame for Map Extent & Plot

Input
extent = search_map.extent
coords = project([[extent['xmin'], extent['ymin']], [extent['xmax'], extent['ymax']]], in_sr=3857, out_sr=4326)
bounds = f"({coords[0]['y']},{coords[0]['x']},{coords[1]['y']},{coords[1]['x']})"

try:
    runner = Runner()
    shop_df = runner.gen_osm_df('point', bounds, {'shop': ['coffee', 'street_vendor', 'convenience']})
    recy_df = runner.gen_osm_df('point', bounds, {'amenity': ['recycling']})

    # Move Geometries to SHAPE Column to Support Plot
    shop_df.SHAPE = shop_df.geom
    recy_df.SHAPE = recy_df.geom

    print(f'OSM Coffee Shops Features Within Current Map View: {len(shop_df)}')
    print(f'OSM Recycling Features Within Current Map View:    {len(recy_df)}')

    recy_df.spatial.plot(map_widget=search_map, renderer_type='h')
    shop_df.spatial.plot(map_widget=search_map)
    
except Exception as e:
    print("We Likely Didn't Find Any Features in this Extent.")
    print(e)
    
except KeyError as e:
    print('Try Moving the Map Around & Running This Cell Again')
    print(e)
OSM Coffee Shops Features Within Current Map View: 636
OSM Recycling Features Within Current Map View:    941

Export the Data to ArcGIS Online or Portal for Further Analysis

Finally, the GeoAccessor gives us a convenient method for pushing our Data Frame into a Hosted Feature Layer within Portal or ArcGIS Online so that we can do further analysis or share the information with other people in our organization. We could have also moved our results into a database with the to_featureclass() method.

Input
recycling_hfl = recy_df.spatial.to_featurelayer(f'OSM_Recycling_{round(time.time())}', gis=gis, tags='OSM')
shops_hfl = shop_df.spatial.to_featurelayer(f'OSM_Shops_{round(time.time())}', gis=gis, tags='OSM')

display(recycling_hfl)
display(shops_hfl)
OSM_Recycling_1574000884
Feature Layer Collection by jscarmazzi_DBSNE
Last Modified: November 17, 2019
0 comments, 0 views
OSM_Shops_1574000907
Feature Layer Collection by jscarmazzi_DBSNE
Last Modified: November 17, 2019
0 comments, 0 views

Your browser is no longer supported. Please upgrade your browser for the best experience. See our browser deprecation post for more details.