This notebook is based around a simple tool named OSM Runner that queries the OpenStreetMap (OSM) Overpass API and returns a Spatial Data Frame. Using the Python API inside of a Jupyter Notebook, we can develop map-driven tools to explore OSM with the full capabilities of the ArcGIS platform at our disposal. Be sure to update the GIS connection information in the cell below before proceeding.
This Notebook was written for an environment that does not have access to arcpy.
!pip install osm-runner
Collecting osm-runner Downloading osm_runner-0.0.3-py3-none-any.whl (3.8 kB) Requirement already satisfied: arcgis in /opt/conda/lib/python3.6/site-packages (from osm-runner) (1.8.1) Requirement already satisfied: pandas in /opt/conda/lib/python3.6/site-packages (from osm-runner) (1.0.3) Requirement already satisfied: requests in /opt/conda/lib/python3.6/site-packages (from osm-runner) (2.23.0) Requirement already satisfied: python-dateutil>=2.6.1 in /opt/conda/lib/python3.6/site-packages (from pandas->osm-runner) (2.8.1) Requirement already satisfied: pytz>=2017.2 in /opt/conda/lib/python3.6/site-packages (from pandas->osm-runner) (2020.1) Requirement already satisfied: numpy>=1.13.3 in /opt/conda/lib/python3.6/site-packages (from pandas->osm-runner) (1.18.1) Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.6/site-packages (from requests->osm-runner) (2.9) Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.6/site-packages (from requests->osm-runner) (1.25.8) Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.6/site-packages (from requests->osm-runner) (2020.4.5.1) Requirement already satisfied: chardet<4,>=3.0.2 in /opt/conda/lib/python3.6/site-packages (from requests->osm-runner) (3.0.4) Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas->osm-runner) (1.15.0) Installing collected packages: osm-runner Successfully installed osm-runner-0.0.3
import time
from osm_runner import Runner # pip install osm-runner
import pandas as pd
from arcgis.features import FeatureLayer, GeoAccessor, GeoSeriesAccessor
from arcgis.geoenrichment import enrich
from arcgis import dissolve_boundaries
from arcgis.geometry import project
from arcgis.gis import GIS
# Organization Login
gis = GIS('home')
Build Data Frames from Feature Layers & Extract Bounding Box
Let's assume we want to compare recycling amenities in OSM across 2 major cities. The first step will be to turn the boundaries for each city into a Data Frame via the GeoAccessor method from_layer(). Once we have a Data Frame for each city, we will use the Project operation of the Geometry Service in our GIS to get the envelope required to fetch data from Open Street Map.
dc_fl = FeatureLayer('https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Administrative_Other_Boundaries_WebMercator/MapServer/10')
dc_df = GeoAccessor.from_layer(dc_fl)
display(dc_df.head())
dc_extent = dc_df.spatial.full_extent
dc_coords = project([[dc_extent[0], dc_extent[1]], [dc_extent[2], dc_extent[3]]], in_sr=3857, out_sr=4326)
dc_bounds = f"({dc_coords[0]['y']},{dc_coords[0]['x']},{dc_coords[1]['y']},{dc_coords[1]['x']})"
pr_fl = FeatureLayer('https://carto2.apur.org/apur/rest/services/OPENDATA/QUARTIER/MapServer/0')
pr_df = GeoAccessor.from_layer(pr_fl)
display(pr_df.head())
pr_extent = pr_df.spatial.full_extent
pr_coords = project([[pr_extent[0], pr_extent[1]], [pr_extent[2], pr_extent[3]]], in_sr=2154, out_sr=4326)
pr_bounds = f"({pr_coords[0]['y']},{pr_coords[0]['x']},{pr_coords[1]['y']},{pr_coords[1]['x']})"
OBJECTID | CITY_NAME | STATE_CITY | CAPITAL | WEB_URL | AREAKM | AREAMILES | Shape_Length | Shape_Area | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Washington | 1150000 | Y | http://www.dc.gov | 177.47 | 68.52 | 67608.276922 | 1.774562e+08 | {'rings': [[[-8584936.334474642, 4712272.26069... |
OBJECTID | N_SQ_QU | C_QU | C_QUINSEE | L_QU | C_AR | N_SQ_AR | SHAPE_Length | SHAPE_Area | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 750000046 | 46 | 7511202 | Picpus | 12 | 750000012 | 18260.602587 | 7.205014e+06 | {'rings': [[[656779.2524999976, 6859005.5504],... |
1 | 2 | 750000047 | 47 | 7511203 | Bercy | 12 | 750000012 | 6154.591415 | 1.902932e+06 | {'rings': [[[655300.0428000018, 6858622.629000... |
2 | 3 | 750000048 | 48 | 7511204 | Quinze-Vingts | 12 | 750000012 | 4509.226476 | 1.235916e+06 | {'rings': [[[653996.0221000016, 6860240.466299... |
3 | 4 | 750000049 | 49 | 7511301 | Salpêtrière | 13 | 750000013 | 4758.777675 | 1.181560e+06 | {'rings': [[[652751.3332000002, 6859190.603100... |
4 | 5 | 750000050 | 50 | 7511302 | Gare | 13 | 750000013 | 7069.818989 | 3.044177e+06 | {'rings': [[[653571.8592000008, 6857669.765999... |
Overview of the area in Washington DC to be Collected
dc_map = gis.map('Washington DC')
dc_map.content.draw(dc_df.iloc[0].SHAPE)
dc_map.content.draw(dc_df.spatial.bbox)
display(dc_map)
print(f'Searching Area: {round(dc_df.spatial.bbox.area / 1000000)} Square Kilometers')
Overview of the Area in Paris to be Collected
pr_map = gis.map('Paris')
pr_dis = dissolve_boundaries(pr_fl).query().sdf.iloc[0].SHAPE
pr_map.content.draw(pr_dis)
pr_map.content.draw(pr_df.spatial.bbox)
display(pr_map)
print(f'Searching Area: {round(pr_df.spatial.bbox.area / 1000000)} Square Kilometers')
Collecting Demographics with Geoenrichment
In addition to the useful GeoAccessor methods and properties we can access via a Data Frame, we may also pass a Data Frame to the enrich() method to learn something useful about the area we are studying. Even before we fetch data from OpenStreetMap, it should be clear from the differences in population size, population density, and the study area being assessed, that any comparison between DC and Paris would not be fair. At the end of the notebook we will set up a solution to help us find a city that might be more comparable to Paris or DC.
try:
dc_e = enrich(dc_df, gis=gis)
display(dc_e.head())
pr_e = enrich(pr_df, gis=gis)
display(pr_e.head())
print(f'DC Population: {dc_e.totpop.sum()}')
print(f'DC Density: {int(round(dc_e.totpop.sum() / (dc_df.spatial.area / 1000000)))} per Square Kilometer')
print(f'Paris Population: {pr_e.totpop.sum()}')
print(f'Paris Density: {int(round(pr_e.totpop.sum() / (pr_dis.area / 1000000)))} per Square Kilometer')
except RuntimeError:
print('Your GIS Connection Does Not Support Geoenrichment')
ID | OBJECTID_0 | sourceCountry | OBJECTID | AREAKM | CITY_NAME | Shape_Area | AREAMILES | Shape_Length | STATE_CITY | ... | aggregationMethod | populationToPolygonSizeRating | apportionmentConfidence | HasData | TOTPOP | TOTHH | AVGHHSZ | TOTMALES | TOTFEMALES | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | US | 1 | 177.47 | Washington | 1.774562e+08 | 68.52 | 67608.276922 | 1150000 | ... | BlockApportionment:US.BlockGroups | 2.191 | 2.576 | 1 | 702350 | 311236 | 2.13 | 334037 | 368314 | {'rings': [[[-77.1197952245159, 38.93435090401... |
1 rows × 22 columns
ID | OBJECTID_0 | sourceCountry | OBJECTID | C_AR | SHAPE_Length | L_QU | C_QUINSEE | N_SQ_AR | SHAPE_Area | ... | aggregationMethod | populationToPolygonSizeRating | apportionmentConfidence | HasData | TOTPOP | TOTHH | AVGHHSZ | TOTMALES | TOTFEMALES | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | FR | 1 | 12 | 18260.602587 | Picpus | 7511202 | 750000012 | 7.205014e+06 | ... | BlockApportionment:FR.IRIS | 2.025 | 2.731 | 1 | 67094 | 34472 | 1.95 | 31225 | 35869 | {'rings': [[[2.4112497331466076, 48.8296572091... |
1 | 1 | 2 | FR | 2 | 12 | 6154.591415 | Bercy | 7511203 | 750000012 | 1.902932e+06 | ... | BlockApportionment:FR.IRIS | 2.025 | 2.731 | 1 | 13588 | 6152 | 2.21 | 6427 | 7161 | {'rings': [[[2.391141037849975, 48.82611264487... |
2 | 2 | 3 | FR | 3 | 12 | 4509.226476 | Quinze-Vingts | 7511204 | 750000012 | 1.235916e+06 | ... | BlockApportionment:FR.IRIS | 2.025 | 2.731 | 1 | 27261 | 14756 | 1.85 | 13181 | 14079 | {'rings': [[[2.373204762244875, 48.84057029279... |
3 | 3 | 4 | FR | 4 | 13 | 4758.777675 | Salpêtrière | 7511301 | 750000013 | 1.181560e+06 | ... | BlockApportionment:FR.IRIS | 2.025 | 2.731 | 1 | 18286 | 9782 | 1.87 | 8489 | 9797 | {'rings': [[[2.356363453081923, 48.83103873373... |
4 | 4 | 5 | FR | 5 | 13 | 7069.818989 | Gare | 7511302 | 750000013 | 3.044177e+06 | ... | BlockApportionment:FR.IRIS | 2.025 | 2.731 | 1 | 73914 | 34734 | 2.13 | 34932 | 38982 | {'rings': [[[2.367706331151315, 48.81742119763... |
5 rows × 22 columns
DC Population: 702350 DC Density: 2393 per Square Kilometer Paris Population: 2211963 Paris Density: 20992 per Square Kilometer
Fetch Open Street Map Data within Boundaries as Data Frame.
For our purposes, we will only be looking for recycling amenities. We could have collected all amenities by simply passing the string 'amenity' as the third argument. Or we might have tried to find all of the known surveillance cameras in our extent by passing {'man_made': ['surveillance']}. Please consult the OSM Wiki for more information on what features you can extract. We are adding the following results to the first 2 maps we created above.
runner = Runner()
dc_osm_df = runner.gen_osm_df('point', dc_bounds, {'amenity': ["recycling"]})
dc_osm_df.columns = dc_osm_df.columns.str.replace("recycling:", "rec")
dc_osm_df.SHAPE = dc_osm_df.geom
dc_osm_df.spatial.plot(map_widget=dc_map)
pr_osm_df = runner.gen_osm_df('point', pr_bounds, {'amenity': ["recycling"]})
pr_osm_df.columns = pr_osm_df.columns.str.replace("recycling:", "rec")
pr_osm_df.SHAPE = pr_osm_df.geom
pr_osm_df.spatial.plot(map_widget=pr_map)
display(dc_osm_df.head(n=1))
display(pr_osm_df.head(n=1))
osm_id | geom | amenity | source | recycling_type | opening_hours | operator | location | ele | description | reccans | recclothes | recglass_bottles | recpaper | wpt_description | recplastic_bottles | fixme | recbooks | website | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2607287308 | {'x': -77.1188753, 'y': 38.8614258, 'spatialRe... | recycling | site visit | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {'x': -77.1188753, 'y': 38.8614258, 'spatialRe... |
osm_id | geom | amenity | recglass | recycling_type | source | location | recglass_bottles | material | operator | ... | contact:phone | contact:website | recbicycles | second_hand | service:bicycle:diy | shop | recglass_jars | name:fr | start_date | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3350036758 | {'x': 2.2247201, 'y': 48.8158202, 'spatialRefe... | recycling | yes | container | GPSO data.gouv.fr 2015-02 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {'x': 2.2247201, 'y': 48.8158202, 'spatialRefe... |
1 rows × 82 columns
rend1 = dc_map.content.renderer(2) # plotted dc_osm_df
rend2 = pr_map.content.renderer(2) # plotted pr_osm_df
rend1.smart_mapping().class_breaks_renderer(break_type="color", field="recycling_type")
rend2.smart_mapping().class_breaks_renderer(break_type="color", field="recycling_type")
General Attribute Comparison
We can use basic Data Frame methods to get a general idea of the differences between the recycling features in DC and Paris. While the completeness (more unique sources and operators) of the Paris data may partially be the result of there being many more records, we see a number of non-profit agencies operating these Parisian facilities and government efforts in Paris focused on documenting these features. An interesting question might be whether the large discrepancy (both in raw counts and specificity in the details) is the result of OSM simply being used more in Europe, or the result of a different set of values toward environmental stewardship.
# Values for DC
print(f'Total Records found for DC Data Frame: {len(dc_osm_df)}')
print(f'Total Attributes Defined in Paris Data Frame: {len(list(dc_osm_df))}')
print('#' * 25)
print(f'Top 5 Operators ({dc_osm_df.operator.nunique()} Unique)')
print('#' * 25)
print(dc_osm_df.operator.value_counts()[:5].to_string())
print('#' * 25)
print(f'Top 5 Sources ({dc_osm_df.source.nunique()} Unique)')
print('#' * 25)
print(dc_osm_df.source.value_counts()[:5].to_string())
Total Records found for DC Data Frame: 69 Total Attributes Defined in Paris Data Frame: 20 ######################### Top 5 Operators (2 Unique) ######################### Arlington County 1 Better World Books 1 ######################### Top 5 Sources (1 Unique) ######################### site visit 1
# Values for Paris
print(f'Total Records found for Paris Data Frame: {len(pr_osm_df)}')
print(f'Total Attributes Defined in Paris Data Frame: {len(list(pr_osm_df))}')
print('#' * 25)
print(f'Top 5 Operators ({pr_osm_df.operator.nunique()} Unique)')
print('#' * 25)
print(pr_osm_df.operator.value_counts()[:5].to_string())
print('#' * 25)
print(f'Top 5 Sources ({pr_osm_df.source.nunique()} Unique)')
print('#' * 25)
print(pr_osm_df.source.value_counts()[:5].to_string())
Total Records found for Paris Data Frame: 1265 Total Attributes Defined in Paris Data Frame: 82 ######################### Top 5 Operators (11 Unique) ######################### Eco-Emballages 40 Le Relais 27 Ecotextile 4 WWF 3 Issy en Transition 2 ######################### Top 5 Sources (46 Unique) ######################### survey 254 GPSO data.gouv.fr 2015-02 107 data.issy.com 15/06/2016 64 GPSO data.gouv.fr 2015-02;survey 54 cadastre-dgi-fr source : Direction Générale des Impôts - Cadastre. Mise à jour : 2011 13
Using the Map to Drive Our Exploration
Perhaps we are interested in finding shops that do not have recycling options nearby. The following 2 cells can be used as a way to explore OSM interactively within the Jupyter Notebook. Locate a place, drag the map around, and then run the last cell to plot a heat map of recycling amenities and all of the shops within the extent of the map. With only a few lines of code, we have the beginning of a site selection tool that also exposes all of the analytical power of Python and the ArcGIS platform.
search_map = gis.map('Berlin')
search_map.zoom = 12
display(search_map)
Get Data Frame for Map Extent & Plot
extent = search_map.extent
coords = project([[extent['xmin'], extent['ymin']], [extent['xmax'], extent['ymax']]], in_sr=3857, out_sr=4326)
bounds = f"({coords[0]['y']},{coords[0]['x']},{coords[1]['y']},{coords[1]['x']})"
try:
runner = Runner()
shop_df = runner.gen_osm_df('point', bounds, {'shop': ['coffee', 'street_vendor', 'convenience']})
recy_df = runner.gen_osm_df('point', bounds, {'amenity': ['recycling']})
# Move Geometries to SHAPE Column to Support Plot
shop_df.SHAPE = shop_df.geom
recy_df.SHAPE = recy_df.geom
print(f'OSM Coffee Shops Features Within Current Map View: {len(shop_df)}')
print(f'OSM Recycling Features Within Current Map View: {len(recy_df)}')
recy_df.spatial.plot(map_widget=search_map)
shop_df.spatial.plot(map_widget=search_map)
except Exception as e:
print("We Likely Didn't Find Any Features in this Extent.")
print(e)
except KeyError as e:
print('Try Moving the Map Around & Running This Cell Again')
print(e)
OSM Coffee Shops Features Within Current Map View: 636 OSM Recycling Features Within Current Map View: 941
recy_rend = search_map.content.renderer(0) # plotted recy_df
recy_rend.smart_mapping().heatmap()
Export the Data to ArcGIS Online or Portal for Further Analysis
Finally, the GeoAccessor gives us a convenient method for pushing our Data Frame into a Hosted Feature Layer within Portal or ArcGIS Online so that we can do further analysis or share the information with other people in our organization. We could have also moved our results into a database with the to_featureclass() method.
recycling_hfl = recy_df.spatial.to_featurelayer(f'OSM_Recycling_{round(time.time())}', gis=gis, tags='OSM')
shops_hfl = shop_df.spatial.to_featurelayer(f'OSM_Shops_{round(time.time())}', gis=gis, tags='OSM')
display(recycling_hfl)
display(shops_hfl)