Exploring OpenStreetMap using Pandas and the Python API

This notebook is based around a simple tool named OSM Runner that queries the OpenStreetMap (OSM) Overpass API and returns a Spatial Data Frame. Using the Python API inside of a Jupyter Notebook, we can develop map-driven tools to explore OSM with the full capabilities of the ArcGIS platform at our disposal. Be sure to update the GIS connection information in the cell below before proceeding.

This Notebook was written for an environment that does not have access to arcpy.

!pip install osm-runner

Collecting osm-runner
  Downloading osm_runner-0.0.3-py3-none-any.whl (3.8 kB)
Requirement already satisfied: arcgis in /opt/conda/lib/python3.6/site-packages (from osm-runner) (1.8.1)
Requirement already satisfied: pandas in /opt/conda/lib/python3.6/site-packages (from osm-runner) (1.0.3)
Requirement already satisfied: requests in /opt/conda/lib/python3.6/site-packages (from osm-runner) (2.23.0)
Requirement already satisfied: python-dateutil>=2.6.1 in /opt/conda/lib/python3.6/site-packages (from pandas->osm-runner) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /opt/conda/lib/python3.6/site-packages (from pandas->osm-runner) (2020.1)
Requirement already satisfied: numpy>=1.13.3 in /opt/conda/lib/python3.6/site-packages (from pandas->osm-runner) (1.18.1)
Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.6/site-packages (from requests->osm-runner) (2.9)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.6/site-packages (from requests->osm-runner) (1.25.8)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.6/site-packages (from requests->osm-runner) (2020.4.5.1)
Requirement already satisfied: chardet<4,>=3.0.2 in /opt/conda/lib/python3.6/site-packages (from requests->osm-runner) (3.0.4)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas->osm-runner) (1.15.0)
Installing collected packages: osm-runner
Successfully installed osm-runner-0.0.3

import time

from osm_runner import Runner  # pip install osm-runner
import pandas as pd

from arcgis.features import FeatureLayer, GeoAccessor, GeoSeriesAccessor
from arcgis.geoenrichment import enrich
from arcgis import dissolve_boundaries
from arcgis.geometry import project
from arcgis.gis import GIS

# Organization Login
gis = GIS('home')

Build Data Frames from Feature Layers & Extract Bounding Box

Let's assume we want to compare recycling amenities in OSM across 2 major cities. The first step will be to turn the boundaries for each city into a Data Frame via the GeoAccessor method from_layer(). Once we have a Data Frame for each city, we will use the Project operation of the Geometry Service in our GIS to get the envelope required to fetch data from Open Street Map.

dc_fl = FeatureLayer('https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Administrative_Other_Boundaries_WebMercator/MapServer/10')
dc_df = GeoAccessor.from_layer(dc_fl)
display(dc_df.head())

dc_extent = dc_df.spatial.full_extent
dc_coords = project([[dc_extent[0], dc_extent[1]], [dc_extent[2], dc_extent[3]]], in_sr=3857, out_sr=4326)
dc_bounds = f"({dc_coords[0]['y']},{dc_coords[0]['x']},{dc_coords[1]['y']},{dc_coords[1]['x']})"

pr_fl = FeatureLayer('https://carto2.apur.org/apur/rest/services/OPENDATA/QUARTIER/MapServer/0')
pr_df = GeoAccessor.from_layer(pr_fl)
display(pr_df.head())

pr_extent = pr_df.spatial.full_extent
pr_coords = project([[pr_extent[0], pr_extent[1]], [pr_extent[2], pr_extent[3]]], in_sr=2154, out_sr=4326)
pr_bounds = f"({pr_coords[0]['y']},{pr_coords[0]['x']},{pr_coords[1]['y']},{pr_coords[1]['x']})"

	OBJECTID	CITY_NAME	STATE_CITY	CAPITAL	WEB_URL	AREAKM	AREAMILES	Shape_Length	Shape_Area	SHAPE
0	1	Washington	1150000	Y	http://www.dc.gov	177.47	68.52	67608.276922	1.774562e+08	{'rings': [[[-8584936.334474642, 4712272.26069...

	OBJECTID	N_SQ_QU	C_QU	C_QUINSEE	L_QU	C_AR	N_SQ_AR	SHAPE_Length	SHAPE_Area	SHAPE
0	1	750000046	46	7511202	Picpus	12	750000012	18260.602587	7.205014e+06	{'rings': [[[656779.2524999976, 6859005.5504],...
1	2	750000047	47	7511203	Bercy	12	750000012	6154.591415	1.902932e+06	{'rings': [[[655300.0428000018, 6858622.629000...
2	3	750000048	48	7511204	Quinze-Vingts	12	750000012	4509.226476	1.235916e+06	{'rings': [[[653996.0221000016, 6860240.466299...
3	4	750000049	49	7511301	Salpêtrière	13	750000013	4758.777675	1.181560e+06	{'rings': [[[652751.3332000002, 6859190.603100...
4	5	750000050	50	7511302	Gare	13	750000013	7069.818989	3.044177e+06	{'rings': [[[653571.8592000008, 6857669.765999...

Overview of the area in Washington DC to be Collected

dc_map = gis.map('Washington DC')

dc_map.draw(dc_df.iloc[0].SHAPE)
dc_map.draw(dc_df.spatial.bbox)

display(dc_map)

print(f'Searching Area: {round(dc_df.spatial.bbox.area / 1000000)} Square Kilometers')

Overview of the Area in Paris to be Collected

pr_map = gis.map('Paris')

pr_dis = dissolve_boundaries(pr_fl).query().sdf.iloc[0].SHAPE
pr_map.draw(pr_dis)
pr_map.draw(pr_df.spatial.bbox)

display(pr_map)

print(f'Searching Area: {round(pr_df.spatial.bbox.area / 1000000)} Square Kilometers')

Collecting Demographics with Geoenrichment

In addition to the useful GeoAccessor methods and properties we can access via a Data Frame, we may also pass a Data Frame to the enrich() method to learn something useful about the area we are studying. Even before we fetch data from OpenStreetMap, it should be clear from the differences in population size, population density, and the study area being assessed, that any comparison between DC and Paris would not be fair. At the end of the notebook we will set up a solution to help us find a city that might be more comparable to Paris or DC.

try:
    dc_e = enrich(dc_df, gis=gis)
    display(dc_e.head())
    pr_e = enrich(pr_df, gis=gis)
    display(pr_e.head())
    
    print(f'DC Population:    {dc_e.totpop.sum()}')
    print(f'DC Density:       {int(round(dc_e.totpop.sum() / (dc_df.spatial.area / 1000000)))} per Square Kilometer')
    print(f'Paris Population: {pr_e.totpop.sum()}')
    print(f'Paris Density:    {int(round(pr_e.totpop.sum() / (pr_dis.area / 1000000)))} per Square Kilometer')
    
except RuntimeError:
    print('Your GIS Connection Does Not Support Geoenrichment')

	ID	OBJECTID_0	sourceCountry	OBJECTID	AREAKM	CITY_NAME	Shape_Area	AREAMILES	Shape_Length	STATE_CITY	...	aggregationMethod	populationToPolygonSizeRating	apportionmentConfidence	HasData	TOTPOP	TOTHH	AVGHHSZ	TOTMALES	TOTFEMALES	SHAPE
0	0	1	US	1	177.47	Washington	1.774562e+08	68.52	67608.276922	1150000	...	BlockApportionment:US.BlockGroups	2.191	2.576	1	702350	311236	2.13	334037	368314	{'rings': [[[-77.1197952245159, 38.93435090401...

1 rows × 22 columns

	ID	OBJECTID_0	sourceCountry	OBJECTID	C_AR	SHAPE_Length	L_QU	C_QUINSEE	N_SQ_AR	SHAPE_Area	...	aggregationMethod	populationToPolygonSizeRating	apportionmentConfidence	HasData	TOTPOP	TOTHH	AVGHHSZ	TOTMALES	TOTFEMALES	SHAPE
0	0	1	FR	1	12	18260.602587	Picpus	7511202	750000012	7.205014e+06	...	BlockApportionment:FR.IRIS	2.025	2.731	1	67094	34472	1.95	31225	35869	{'rings': [[[2.4112497331466076, 48.8296572091...
1	1	2	FR	2	12	6154.591415	Bercy	7511203	750000012	1.902932e+06	...	BlockApportionment:FR.IRIS	2.025	2.731	1	13588	6152	2.21	6427	7161	{'rings': [[[2.391141037849975, 48.82611264487...
2	2	3	FR	3	12	4509.226476	Quinze-Vingts	7511204	750000012	1.235916e+06	...	BlockApportionment:FR.IRIS	2.025	2.731	1	27261	14756	1.85	13181	14079	{'rings': [[[2.373204762244875, 48.84057029279...
3	3	4	FR	4	13	4758.777675	Salpêtrière	7511301	750000013	1.181560e+06	...	BlockApportionment:FR.IRIS	2.025	2.731	1	18286	9782	1.87	8489	9797	{'rings': [[[2.356363453081923, 48.83103873373...
4	4	5	FR	5	13	7069.818989	Gare	7511302	750000013	3.044177e+06	...	BlockApportionment:FR.IRIS	2.025	2.731	1	73914	34734	2.13	34932	38982	{'rings': [[[2.367706331151315, 48.81742119763...

5 rows × 22 columns

DC Population:    702350
DC Density:       2393 per Square Kilometer
Paris Population: 2211963
Paris Density:    20992 per Square Kilometer

Fetch Open Street Map Data within Boundaries as Data Frame.

For our purposes, we will only be looking for recycling amenities. We could have collected all amenities by simply passing the string 'amenity' as the third argument. Or we might have tried to find all of the known surveillance cameras in our extent by passing {'man_made': ['surveillance']}. Please consult the OSM Wiki for more information on what features you can extract. We are adding the following results to the first 2 maps we created above.

runner = Runner()

dc_osm_df = runner.gen_osm_df('point', dc_bounds, {'amenity': ["recycling"]})

dc_osm_df.columns = dc_osm_df.columns.str.replace("recycling:", "rec")
dc_osm_df.SHAPE   = dc_osm_df.geom

dc_osm_df.spatial.plot(map_widget=dc_map, renderer_type='u', col='recycling_type')


pr_osm_df = runner.gen_osm_df('point', pr_bounds, {'amenity': ["recycling"]})

pr_osm_df.columns = pr_osm_df.columns.str.replace("recycling:", "rec")
pr_osm_df.SHAPE   = pr_osm_df.geom

pr_osm_df.spatial.plot(map_widget=pr_map, renderer_type='u', col='recycling_type')

display(dc_osm_df.head(n=1))
display(pr_osm_df.head(n=1))

	osm_id	geom	amenity	source	recycling_type	opening_hours	operator	location	ele	description	reccans	recclothes	recglass_bottles	recpaper	wpt_description	recplastic_bottles	fixme	recbooks	website	SHAPE
0	2607287308	{'x': -77.1188753, 'y': 38.8614258, 'spatialRe...	recycling	site visit	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	{'x': -77.1188753, 'y': 38.8614258, 'spatialRe...

	osm_id	geom	amenity	recglass	recycling_type	source	location	recglass_bottles	material	operator	...	contact:phone	contact:website	recbicycles	second_hand	service:bicycle:diy	shop	recglass_jars	name:fr	start_date	SHAPE
0	3350036758	{'x': 2.2247201, 'y': 48.8158202, 'spatialRefe...	recycling	yes	container	GPSO data.gouv.fr 2015-02	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	{'x': 2.2247201, 'y': 48.8158202, 'spatialRefe...

1 rows × 82 columns

General Attribute Comparison

We can use basic Data Frame methods to get a general idea of the differences between the recycling features in DC and Paris. While the completeness (more unique sources and operators) of the Paris data may partially be the result of there being many more records, we see a number of non-profit agencies operating these Parisian facilities and government efforts in Paris focused on documenting these features. An interesting question might be whether the large discrepancy (both in raw counts and specificity in the details) is the result of OSM simply being used more in Europe, or the result of a different set of values toward environmental stewardship.

# Values for DC

print(f'Total Records found for DC Data Frame:        {len(dc_osm_df)}')
print(f'Total Attributes Defined in Paris Data Frame: {len(list(dc_osm_df))}')
print('#' * 25)
print(f'Top 5 Operators ({dc_osm_df.operator.nunique()} Unique)')
print('#' * 25)
print(dc_osm_df.operator.value_counts()[:5].to_string())
print('#' * 25)
print(f'Top 5 Sources ({dc_osm_df.source.nunique()} Unique)')
print('#' * 25)
print(dc_osm_df.source.value_counts()[:5].to_string())

Total Records found for DC Data Frame:        69
Total Attributes Defined in Paris Data Frame: 20
#########################
Top 5 Operators (2 Unique)
#########################
Arlington County      1
Better World Books    1
#########################
Top 5 Sources (1 Unique)
#########################
site visit    1

# Values for Paris

print(f'Total Records found for Paris Data Frame:     {len(pr_osm_df)}')
print(f'Total Attributes Defined in Paris Data Frame: {len(list(pr_osm_df))}')
print('#' * 25)
print(f'Top 5 Operators ({pr_osm_df.operator.nunique()} Unique)')
print('#' * 25)
print(pr_osm_df.operator.value_counts()[:5].to_string())
print('#' * 25)
print(f'Top 5 Sources ({pr_osm_df.source.nunique()} Unique)')
print('#' * 25)
print(pr_osm_df.source.value_counts()[:5].to_string())

Total Records found for Paris Data Frame:     1265
Total Attributes Defined in Paris Data Frame: 82
#########################
Top 5 Operators (11 Unique)
#########################
Eco-Emballages        40
Le Relais             27
Ecotextile             4
WWF                    3
Issy en Transition     2
#########################
Top 5 Sources (46 Unique)
#########################
survey                                                                                   254
GPSO data.gouv.fr 2015-02                                                                107
data.issy.com 15/06/2016                                                                  64
GPSO data.gouv.fr 2015-02;survey                                                          54
cadastre-dgi-fr source : Direction Générale des Impôts - Cadastre. Mise à jour : 2011     13

Using the Map to Drive Our Exploration

Perhaps we are interested in finding shops that do not have recycling options nearby. The following 2 cells can be used as a way to explore OSM interactively within the Jupyter Notebook. Locate a place, drag the map around, and then run the last cell to plot a heat map of recycling amenities and all of the shops within the extent of the map. With only a few lines of code, we have the beginning of a site selection tool that also exposes all of the analytical power of Python and the ArcGIS platform.

search_map = gis.map('Berlin', 12)
display(search_map)

Get Data Frame for Map Extent & Plot

extent = search_map.extent
coords = project([[extent['xmin'], extent['ymin']], [extent['xmax'], extent['ymax']]], in_sr=3857, out_sr=4326)
bounds = f"({coords[0]['y']},{coords[0]['x']},{coords[1]['y']},{coords[1]['x']})"

try:
    runner = Runner()
    shop_df = runner.gen_osm_df('point', bounds, {'shop': ['coffee', 'street_vendor', 'convenience']})
    recy_df = runner.gen_osm_df('point', bounds, {'amenity': ['recycling']})

    # Move Geometries to SHAPE Column to Support Plot
    shop_df.SHAPE = shop_df.geom
    recy_df.SHAPE = recy_df.geom

    print(f'OSM Coffee Shops Features Within Current Map View: {len(shop_df)}')
    print(f'OSM Recycling Features Within Current Map View:    {len(recy_df)}')

    recy_df.spatial.plot(map_widget=search_map, renderer_type='h')
    shop_df.spatial.plot(map_widget=search_map)
    
except Exception as e:
    print("We Likely Didn't Find Any Features in this Extent.")
    print(e)
    
except KeyError as e:
    print('Try Moving the Map Around & Running This Cell Again')
    print(e)

OSM Coffee Shops Features Within Current Map View: 636
OSM Recycling Features Within Current Map View:    941

Export the Data to ArcGIS Online or Portal for Further Analysis

Finally, the GeoAccessor gives us a convenient method for pushing our Data Frame into a Hosted Feature Layer within Portal or ArcGIS Online so that we can do further analysis or share the information with other people in our organization. We could have also moved our results into a database with the to_featureclass() method.

recycling_hfl = recy_df.spatial.to_featurelayer(f'OSM_Recycling_{round(time.time())}', gis=gis, tags='OSM')
shops_hfl = shop_df.spatial.to_featurelayer(f'OSM_Shops_{round(time.time())}', gis=gis, tags='OSM')

display(recycling_hfl)
display(shops_hfl)

OSM_Recycling_1574000884

Feature Layer Collection by jscarmazzi_DBSNE
Last Modified: November 17, 2019
0 comments, 0 views

OSM_Shops_1574000907

Feature Layer Collection by jscarmazzi_DBSNE
Last Modified: November 17, 2019
0 comments, 0 views