HTML table to Pandas Data Frame to Portal Item

Read HTML table to Pandas Data Frame

Often we read informative articles that present data in a tabular form. If such data contains location information, it would be much more insightful if presented as a cartographic map. Thus this sample shows how Pandas can be used to extract data from a table within a web page (in this case, a Wikipedia article) and how it can be then brought into the GIS for further analysis and visualization.

Note: to run this sample, you need a few extra libraries in your conda environment. If you don't have the libraries, install them by running the following commands from cmd.exe or your shell

conda install lxml
conda install html5lib
conda install beautifulsoup4
conda install matplotlib```

import pandas as pd

Let us read the Wikipedia article on Estimated number of guns per capita by country as a pandas data frame object

df = pd.read_html("https://en.wikipedia.org/wiki/Number_of_guns_per_capita_by_country")[0]

Let us process the table by dropping some unncessary columns

df.head()

	Unnamed: 0	Country or subnational area	Estimate of civilian firearms per 100 persons	Region[3]	Subregion[3]	Population 2017	Estimate of firearms in civilian possession	Computation method	Registered firearms	Unregistered firearms	Notes
0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	1.0	United States	120.5	Americas	North America	326474000.0	393347000	1.0	1073743.0	392,273,257 Est.	[note 2]
2	2.0	Falkland Islands	62.1	Americas	South America	3000.0	2000	2.0	1705.0	295	NaN
3	3.0	Yemen	52.8	Asia	Western Asia	28120000.0	14859000	2.0	NaN	NaN	NaN
4	4.0	New Caledonia	42.5	Oceania	Melanesia	270000.0	115000	2.0	55000.0	60000	NaN

df.dropna(thresh=4, inplace=True)

del df['Unnamed: 0']
del df['Notes']

df.head()

	Country or subnational area	Estimate of civilian firearms per 100 persons	Region[3]	Subregion[3]	Population 2017	Estimate of firearms in civilian possession	Computation method	Registered firearms	Unregistered firearms
1	United States	120.5	Americas	North America	326474000.0	393347000	1.0	1073743.0	392,273,257 Est.
2	Falkland Islands	62.1	Americas	South America	3000.0	2000	2.0	1705.0	295
3	Yemen	52.8	Asia	Western Asia	28120000.0	14859000	2.0	NaN	NaN
4	New Caledonia	42.5	Oceania	Melanesia	270000.0	115000	2.0	55000.0	60000
5	Serbia	39.1	Europe	Southern Europe	6946000.0	2719000	2.0	1186086.0	1532914

If you notice, the Estimate of civilian firearms per 100 persons for United States is not a proper integer. We can correct is as below

df.iloc[0,1] = 120.5

df.dtypes

Country or subnational area                       object
Estimate of civilian firearms per 100 persons    float64
Region[3]                                         object
Subregion[3]                                      object
Population 2017                                  float64
Estimate of firearms in civilian possession       object
Computation method                               float64
Registered firearms                              float64
Unregistered firearms                             object
dtype: object

However, we cannot change every incorrect data entry if the table is large. Further we need the Estimate of civilian firearms per 100 persons column in numeric format. Hence let us convert it and while doing so, convert incorrect values to NaN which stands for Not a Number

converted_column = pd.to_numeric(df["Estimate of civilian firearms per 100 persons"], errors = 'coerce')
df['Estimate of civilian firearms per 100 persons'] = converted_column
df.head()

	Country or subnational area	Estimate of civilian firearms per 100 persons	Region[3]	Subregion[3]	Population 2017	Estimate of firearms in civilian possession	Computation method	Registered firearms	Unregistered firearms
1	United States	120.5	Americas	North America	326474000.0	393347000	1.0	1073743.0	392,273,257 Est.
2	Falkland Islands	62.1	Americas	South America	3000.0	2000	2.0	1705.0	295
3	Yemen	52.8	Asia	Western Asia	28120000.0	14859000	2.0	NaN	NaN
4	New Caledonia	42.5	Oceania	Melanesia	270000.0	115000	2.0	55000.0	60000
5	Serbia	39.1	Europe	Southern Europe	6946000.0	2719000	2.0	1186086.0	1532914

Plot as a map

Let us connect to our GIS to geocode this data and present it as a map, either by specifying username and password, e.g. in gis = GIS("https://www.arcgis.com", "arcgis_python", "P@ssword123") or via an existing profile:

from arcgis.gis import GIS
import json

gis = GIS("home")

Since the table is now using Country (or dependent territory, subnational area, etc.) column to signify the country code, which represents as Country__or_dependent_territory__subnational_area__etc__ as the real column name of the table, the mapping relationship is stated as below:

fc = gis.content.import_data(df, {"CountryCode":"Country or subnational area"})

map1 = gis.map('UK')

map1

Let us us smart mapping to render the points with varying sizes representing the number of firearms per 100 residents

map1.add_layer(fc, {"renderer":"ClassedSizeRenderer",
                    "field_name": "Estimate_of_civilian_firearms_per_100_persons"})

Publish as Portal Item

Let us publish this layer as a feature collection item in our GIS

item_properties = {
    "title": "Worldwide Firearms ownership",
    "tags" : "guns,violence",
    "snippet": " GSR Worldwide firearms ownership",
    "description": "test description",
    "text": json.dumps({"featureCollection": {"layers": [dict(fc.layer)]}}),
    "type": "Feature Collection",
    "typeKeywords": "Data, Feature Collection, Singlelayer",
    "extent" : "-102.5272,-41.7886,172.5967,64.984"
}

item = gis.content.add(item_properties)

Let us search for this item

search_result = gis.content.search("Worldwide Firearms ownership")
search_result[0]

Worldwide Firearms ownership
GSR Worldwide firearms ownership

Feature Collection by arcgis_python
Last Modified: July 18, 2020
0 comments, 13 views