HTML table to Pandas Data Frame to Portal Item

Read HTML table to Pandas Data Frame

Often we read informative articles that present data in a tabular form. If such data contains location information, it would be much more insightful if presented as a cartographic map. Thus this sample shows how Pandas can be used to extract data from a table within a web page (in this case, a Wikipedia article) and how it can be then brought into the GIS for further analysis and visualization.

Note: to run this sample, you need a few extra libraries in your conda environment. If you don't have the libraries, install them by running the following commands from cmd.exe or your shell

conda install lxml
conda install html5lib
conda install beautifulsoup4
conda install matplotlib```
import pandas as pd

Let us read the Wikipedia article on Estimated number of guns per capita by country as a pandas data frame object

df = pd.read_html("https://en.wikipedia.org/wiki/Number_of_guns_per_capita_by_country")[0]

Let us process the table by dropping some unncessary columns

df.head()
Unnamed: 0Country (or dependent territory, subnational area, etc.)Estimate of civilian firearms per 100 personsRegionSubregionPopulation 2017Estimate of firearms in civilian possessionComputation methodRegistered firearmsUnregistered firearmsNotes
0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
11.0United States120.5AmericasNorth America326474000.03933470001.01073743.0392273257.0[note 2]
22.0Template:Country data Islas Malvinas (Argentina)62.1AmericasSouth America3000.020002.01705.0295.0NaN
33.0Yemen52.8AsiaWestern Asia28120000.0148590002.0NaNNaNNaN
44.0New Caledonia42.5OceaniaMelanesia270000.01150002.055000.060000.0NaN
df.dropna(thresh=4, inplace=True)
del df['Unnamed: 0']
del df['Notes']
df.head()
Country (or dependent territory, subnational area, etc.)Estimate of civilian firearms per 100 personsRegionSubregionPopulation 2017Estimate of firearms in civilian possessionComputation methodRegistered firearmsUnregistered firearms
1United States120.5AmericasNorth America326474000.03933470001.01073743.0392273257.0
2Template:Country data Islas Malvinas (Argentina)62.1AmericasSouth America3000.020002.01705.0295.0
3Yemen52.8AsiaWestern Asia28120000.0148590002.0NaNNaN
4New Caledonia42.5OceaniaMelanesia270000.01150002.055000.060000.0
5Montenegro39.1EuropeSouthern Europe626000.02450003.0103536.0141464.0

If you notice, the Estimate of civilian firearms per 100 persons for United States is not a proper integer. We can correct is as below

df.iloc[0,1] = 120.5
df.dtypes
0
Country (or dependent territory, subnational area, etc.)    object
Estimate of civilian firearms per 100 persons               object
Region                                                      object
Subregion                                                   object
Population 2017                                             object
Estimate of firearms in civilian possession                 object
Computation method                                          object
Registered firearms                                         object
Unregistered firearms                                       object
Notes                                                       object
dtype: object

However, we cannot change every incorrect data entry if the table is large. Further we need the Estimate of civilian firearms per 100 persons column in numeric format. Hence let us convert it and while doing so, convert incorrect values to NaN which stands for Not a Number

converted_column = pd.to_numeric(df["Estimate of civilian firearms per 100 persons"], errors = 'coerce')
df['Estimate of civilian firearms per 100 persons'] = converted_column
df.head()
Country (or dependent territory, subnational area, etc.)Estimate of civilian firearms per 100 personsRegionSubregionPopulation 2017Estimate of firearms in civilian possessionComputation methodRegistered firearmsUnregistered firearms
1United States120.5AmericasNorth America326474000.03933470001.01073743.0392273257.0
2Template:Country data Islas Malvinas (Argentina)62.1AmericasSouth America3000.020002.01705.0295.0
3Yemen52.8AsiaWestern Asia28120000.0148590002.0NaNNaN
4New Caledonia42.5OceaniaMelanesia270000.01150002.055000.060000.0
5Montenegro39.1EuropeSouthern Europe626000.02450003.0103536.0141464.0

Plot as a map

Let us connect to our GIS to geocode this data and present it as a map, either by specifying username and password, e.g. in gis = GIS("https://www.arcgis.com", "arcgis_python", "P@ssword123") or via an existing profile:

from arcgis.gis import GIS
import json

gis = GIS('home')

Since the table is now using Country (or dependent territory, subnational area, etc.) column to signify the country code, which represents as Country__or_dependent_territory__subnational_area__etc__ as the real column name of the table, the mapping relationship is stated as below:

fc = gis.content.import_data(df, {"CountryCode":"Country__or_dependent_territory__subnational_area__etc__"})
map1 = gis.map('UK')
map1

Let us us smart mapping to render the points with varying sizes representing the number of firearms per 100 residents

map1.add_layer(fc, {"renderer":"ClassedSizeRenderer",
                    "field_name": "Estimate_of_civilian_firearms_per_100_persons"})

Publish as Portal Item

Let us publish this layer as a feature collection item in our GIS

item_properties = {
    "title": "Worldwide Firearms ownership",
    "tags" : "guns,violence",
    "snippet": " GSR Worldwide firearms ownership",
    "description": "test description",
    "text": json.dumps({"featureCollection": {"layers": [dict(fc.layer)]}}),
    "type": "Feature Collection",
    "typeKeywords": "Data, Feature Collection, Singlelayer",
    "extent" : "-102.5272,-41.7886,172.5967,64.984"
}

item = gis.content.add(item_properties)

Let us search for this item

search_result = gis.content.search("Worldwide Firearms ownership")
search_result[0]
Worldwide Firearms ownership
GSR Worldwide firearms ownershipFeature Collection by arcgis_python
Last Modified: February 19, 2019
0 comments, 0 views

Your browser is no longer supported. Please upgrade your browser for the best experience. See our browser deprecation post for more details.