Exploring OpenStreetMap using Pandas and the Python API
This notebook is based around a simple tool named OSM Runner that queries the OpenStreetMap (OSM) Overpass API and returns a Spatial Data Frame. Using the Python API inside of a Jupyter Notebook, we can develop map-driven tools to explore OSM with the full capabilities of the ArcGIS platform at our disposal. Be sure to update the GIS connection information in the cell below before proceeding.
This Notebook was written for an environment that does not have access to arcpy.
!pip install osm-runner
import time
from osm_runner import Runner # pip install osm-runner
import pandas as pd
from arcgis.features import FeatureLayer, GeoAccessor, GeoSeriesAccessor
from arcgis.geoenrichment import enrich
from arcgis import dissolve_boundaries
from arcgis.geometry import project
from arcgis.gis import GIS
# Organization Login
gis = GIS('home')
Build Data Frames from Feature Layers & Extract Bounding Box¶
Let's assume we want to compare recycling amenities in OSM across 2 major cities. The first step will be to turn the boundaries for each city into a Data Frame via the GeoAccessor method from_layer(). Once we have a Data Frame for each city, we will use the Project operation of the Geometry Service in our GIS to get the envelope required to fetch data from Open Street Map.
dc_fl = FeatureLayer('https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Administrative_Other_Boundaries_WebMercator/MapServer/10')
dc_df = GeoAccessor.from_layer(dc_fl)
display(dc_df.head())
dc_extent = dc_df.spatial.full_extent
dc_coords = project([[dc_extent[0], dc_extent[1]], [dc_extent[2], dc_extent[3]]], in_sr=3857, out_sr=4326)
dc_bounds = f"({dc_coords[0]['y']},{dc_coords[0]['x']},{dc_coords[1]['y']},{dc_coords[1]['x']})"
pr_fl = FeatureLayer('https://carto2.apur.org/apur/rest/services/OPENDATA/QUARTIER/MapServer/0')
pr_df = GeoAccessor.from_layer(pr_fl)
display(pr_df.head())
pr_extent = pr_df.spatial.full_extent
pr_coords = project([[pr_extent[0], pr_extent[1]], [pr_extent[2], pr_extent[3]]], in_sr=2154, out_sr=4326)
pr_bounds = f"({pr_coords[0]['y']},{pr_coords[0]['x']},{pr_coords[1]['y']},{pr_coords[1]['x']})"
Overview of the area in Washington DC to be Collected¶
dc_map = gis.map('Washington DC')
dc_map.draw(dc_df.iloc[0].SHAPE)
dc_map.draw(dc_df.spatial.bbox)
display(dc_map)
print(f'Searching Area: {round(dc_df.spatial.bbox.area / 1000000)} Square Kilometers')
Overview of the Area in Paris to be Collected¶
pr_map = gis.map('Paris')
pr_dis = dissolve_boundaries(pr_fl).query().sdf.iloc[0].SHAPE
pr_map.draw(pr_dis)
pr_map.draw(pr_df.spatial.bbox)
display(pr_map)
print(f'Searching Area: {round(pr_df.spatial.bbox.area / 1000000)} Square Kilometers')
Collecting Demographics with Geoenrichment¶
In addition to the useful GeoAccessor methods and properties we can access via a Data Frame, we may also pass a Data Frame to the enrich() method to learn something useful about the area we are studying. Even before we fetch data from OpenStreetMap, it should be clear from the differences in population size, population density, and the study area being assessed, that any comparison between DC and Paris would not be fair. At the end of the notebook we will set up a solution to help us find a city that might be more comparable to Paris or DC.
try:
dc_e = enrich(dc_df, gis=gis)
display(dc_e.head())
pr_e = enrich(pr_df, gis=gis)
display(pr_e.head())
print(f'DC Population: {dc_e.TOTPOP.sum()}')
print(f'DC Density: {int(round(dc_e.TOTPOP.sum() / (dc_df.spatial.area / 1000000)))} per Square Kilometer')
print(f'Paris Population: {pr_e.TOTPOP.sum()}')
print(f'Paris Density: {int(round(pr_e.TOTPOP.sum() / (pr_dis.area / 1000000)))} per Square Kilometer')
except RuntimeError:
print('Your GIS Connection Does Not Support Geoenrichment')
Fetch Open Street Map Data within Boundaries as Data Frame.¶
For our purposes, we will only be looking for recycling amenities. We could have collected all amenities by simply passing the string 'amenity' as the third argument. Or we might have tried to find all of the known surveillance cameras in our extent by passing {'man_made': ['surveillance']}. Please consult the OSM Wiki for more information on what features you can extract. We are adding the following results to the first 2 maps we created above.
runner = Runner()
dc_osm_df = runner.gen_osm_df('point', dc_bounds, {'amenity': ["recycling"]})
dc_osm_df.columns = dc_osm_df.columns.str.replace("recycling:", "rec")
dc_osm_df.SHAPE = dc_osm_df.geom
dc_osm_df.spatial.plot(map_widget=dc_map, renderer_type='u', col='recycling_type')
pr_osm_df = runner.gen_osm_df('point', pr_bounds, {'amenity': ["recycling"]})
pr_osm_df.columns = pr_osm_df.columns.str.replace("recycling:", "rec")
pr_osm_df.SHAPE = pr_osm_df.geom
pr_osm_df.spatial.plot(map_widget=pr_map, renderer_type='u', col='recycling_type')
display(dc_osm_df.head(n=1))
display(pr_osm_df.head(n=1))
General Attribute Comparison¶
We can use basic Data Frame methods to get a general idea of the differences between the recycling features in DC and Paris. While the completeness (more unique sources and operators) of the Paris data may partially be the result of there being many more records, we see a number of non-profit agencies operating these Parisian facilities and government efforts in Paris focused on documenting these features. An interesting question might be whether the large discrepancy (both in raw counts and specificity in the details) is the result of OSM simply being used more in Europe, or the result of a different set of values toward environmental stewardship.
# Values for DC
print(f'Total Records found for DC Data Frame: {len(dc_osm_df)}')
print(f'Total Attributes Defined in Paris Data Frame: {len(list(dc_osm_df))}')
print('#' * 25)
print(f'Top 5 Operators ({dc_osm_df.operator.nunique()} Unique)')
print('#' * 25)
print(dc_osm_df.operator.value_counts()[:5].to_string())
print('#' * 25)
print(f'Top 5 Sources ({dc_osm_df.source.nunique()} Unique)')
print('#' * 25)
print(dc_osm_df.source.value_counts()[:5].to_string())
# Values for Paris
print(f'Total Records found for Paris Data Frame: {len(pr_osm_df)}')
print(f'Total Attributes Defined in Paris Data Frame: {len(list(pr_osm_df))}')
print('#' * 25)
print(f'Top 5 Operators ({pr_osm_df.operator.nunique()} Unique)')
print('#' * 25)
print(pr_osm_df.operator.value_counts()[:5].to_string())
print('#' * 25)
print(f'Top 5 Sources ({pr_osm_df.source.nunique()} Unique)')
print('#' * 25)
print(pr_osm_df.source.value_counts()[:5].to_string())
Using the Map to Drive Our Exploration¶
Perhaps we are interested in finding shops that do not have recycling options nearby. The following 2 cells can be used as a way to explore OSM interactively within the Jupyter Notebook. Locate a place, drag the map around, and then run the last cell to plot a heat map of recycling amenities and all of the shops within the extent of the map. With only a few lines of code, we have the beginning of a site selection tool that also exposes all of the analytical power of Python and the ArcGIS platform.
search_map = gis.map('Berlin', 12)
display(search_map)
Get Data Frame for Map Extent & Plot¶
extent = search_map.extent
coords = project([[extent['xmin'], extent['ymin']], [extent['xmax'], extent['ymax']]], in_sr=3857, out_sr=4326)
bounds = f"({coords[0]['y']},{coords[0]['x']},{coords[1]['y']},{coords[1]['x']})"
try:
runner = Runner()
shop_df = runner.gen_osm_df('point', bounds, {'shop': ['coffee', 'street_vendor', 'convenience']})
recy_df = runner.gen_osm_df('point', bounds, {'amenity': ['recycling']})
# Move Geometries to SHAPE Column to Support Plot
shop_df.SHAPE = shop_df.geom
recy_df.SHAPE = recy_df.geom
print(f'OSM Coffee Shops Features Within Current Map View: {len(shop_df)}')
print(f'OSM Recycling Features Within Current Map View: {len(recy_df)}')
recy_df.spatial.plot(map_widget=search_map, renderer_type='h')
shop_df.spatial.plot(map_widget=search_map)
except Exception as e:
print("We Likely Didn't Find Any Features in this Extent.")
print(e)
except KeyError as e:
print('Try Moving the Map Around & Running This Cell Again')
print(e)
Export the Data to ArcGIS Online or Portal for Further Analysis¶
Finally, the GeoAccessor gives us a convenient method for pushing our Data Frame into a Hosted Feature Layer within Portal or ArcGIS Online so that we can do further analysis or share the information with other people in our organization. We could have also moved our results into a database with the to_featureclass() method.
recycling_hfl = recy_df.spatial.to_featurelayer(f'OSM_Recycling_{round(time.time())}', gis=gis, tags='OSM')
shops_hfl = shop_df.spatial.to_featurelayer(f'OSM_Shops_{round(time.time())}', gis=gis, tags='OSM')
display(recycling_hfl)
display(shops_hfl)