HTML table to Pandas Data Frame to Portal Item

Read HTML table to Pandas Data Frame

Often we read informative articles that present data in a tabular form. If such data contains location information, it would be much more insightful if presented as a cartographic map. Thus this sample shows how Pandas can be used to extract data from a table within a web page (in this case, a Wikipedia article) and how it can be then brought into the GIS for further analysis and visualization.

Note: to run this sample, you need a few extra libraries in your conda environment. If you don't have the libraries, install them by running the following commands from cmd.exe or your shell

conda install lxml
conda install html5lib
conda install beautifulsoup4
conda install matplotlib```
import pandas as pd

Let us read the Wikipedia article on Estimated number of guns per capita by country as a pandas data frame object

df = pd.read_html("https://en.wikipedia.org/wiki/Number_of_guns_per_capita_by_country")[0]

Let us process the table by dropping some unncessary columns

df.head()
Unnamed: 0Country or subnational areaEstimate of civilian firearms per 100 personsRegion[3]Subregion[3]Population 2017Estimate of firearms in civilian possessionComputation methodRegistered firearmsUnregistered firearmsNotes
0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
11.0United States120.5AmericasNorth America326474000.03933470001.01073743.0392,273,257 Est.[note 2]
22.0Falkland Islands62.1AmericasSouth America3000.020002.01705.0295NaN
33.0Yemen52.8AsiaWestern Asia28120000.0148590002.0NaNNaNNaN
44.0New Caledonia42.5OceaniaMelanesia270000.01150002.055000.060000NaN
df.dropna(thresh=4, inplace=True)
del df['Unnamed: 0']
del df['Notes']
df.head()
Country or subnational areaEstimate of civilian firearms per 100 personsRegion[3]Subregion[3]Population 2017Estimate of firearms in civilian possessionComputation methodRegistered firearmsUnregistered firearms
1United States120.5AmericasNorth America326474000.03933470001.01073743.0392,273,257 Est.
2Falkland Islands62.1AmericasSouth America3000.020002.01705.0295
3Yemen52.8AsiaWestern Asia28120000.0148590002.0NaNNaN
4New Caledonia42.5OceaniaMelanesia270000.01150002.055000.060000
5Serbia39.1EuropeSouthern Europe6946000.027190002.01186086.01532914

If you notice, the Estimate of civilian firearms per 100 persons for United States is not a proper integer. We can correct is as below

df.iloc[0,1] = 120.5
df.dtypes
Country or subnational area                       object
Estimate of civilian firearms per 100 persons    float64
Region[3]                                         object
Subregion[3]                                      object
Population 2017                                  float64
Estimate of firearms in civilian possession       object
Computation method                               float64
Registered firearms                              float64
Unregistered firearms                             object
dtype: object

However, we cannot change every incorrect data entry if the table is large. Further we need the Estimate of civilian firearms per 100 persons column in numeric format. Hence let us convert it and while doing so, convert incorrect values to NaN which stands for Not a Number

converted_column = pd.to_numeric(df["Estimate of civilian firearms per 100 persons"], errors = 'coerce')
df['Estimate of civilian firearms per 100 persons'] = converted_column
df.head()
Country or subnational areaEstimate of civilian firearms per 100 personsRegion[3]Subregion[3]Population 2017Estimate of firearms in civilian possessionComputation methodRegistered firearmsUnregistered firearms
1United States120.5AmericasNorth America326474000.03933470001.01073743.0392,273,257 Est.
2Falkland Islands62.1AmericasSouth America3000.020002.01705.0295
3Yemen52.8AsiaWestern Asia28120000.0148590002.0NaNNaN
4New Caledonia42.5OceaniaMelanesia270000.01150002.055000.060000
5Serbia39.1EuropeSouthern Europe6946000.027190002.01186086.01532914

Plot as a map

Let us connect to our GIS to geocode this data and present it as a map, either by specifying username and password, e.g. in gis = GIS("https://www.arcgis.com", "arcgis_python", "P@ssword123") or via an existing profile:

from arcgis.gis import GIS
import json

gis = GIS("home")

Since the table is now using Country (or dependent territory, subnational area, etc.) column to signify the country code, which represents as Country__or_dependent_territory__subnational_area__etc__ as the real column name of the table, the mapping relationship is stated as below:

fc = gis.content.import_data(df, {"CountryCode":"Country or subnational area"})
map1 = gis.map('UK')
map1

image

Let us us smart mapping to render the points with varying sizes representing the number of firearms per 100 residents

map1.add_layer(fc, {"renderer":"ClassedSizeRenderer",
                    "field_name": "Estimate_of_civilian_firearms_per_100_persons"})

Publish as Portal Item

Let us publish this layer as a feature collection item in our GIS

item_properties = {
    "title": "Worldwide Firearms ownership",
    "tags" : "guns,violence",
    "snippet": " GSR Worldwide firearms ownership",
    "description": "test description",
    "text": json.dumps({"featureCollection": {"layers": [dict(fc.layer)]}}),
    "type": "Feature Collection",
    "typeKeywords": "Data, Feature Collection, Singlelayer",
    "extent" : "-102.5272,-41.7886,172.5967,64.984"
}

item = gis.content.add(item_properties)

Let us search for this item

search_result = gis.content.search("Worldwide Firearms ownership")
search_result[0]
Worldwide Firearms ownership
GSR Worldwide firearms ownershipFeature Collection by arcgis_python
Last Modified: July 18, 2020
0 comments, 13 views

Your browser is no longer supported. Please upgrade your browser for the best experience. See our browser deprecation post for more details.