HTML table to Pandas Data Frame to Portal Item

Often we read informative articles that present data in a tabular form. If such data contained location information, it would be much more insightful if presented as a cartographic map. Thus this sample shows how Pandas can be used to extract data from a table within a web page (in this case, a Wikipedia article) and how it can be then brought into the GIS for further analysis and visualization.

Note: to run this sample, you need a few extra libraries in your conda environment. If you don't have the libraries, install them by running the following commands from cmd.exe or your shell

conda install lxml
conda install html5lib
conda install beautifulsoup4
conda install matplotlib
In [1]:
import pandas as pd

Let us read the Wikipedia article on Estimated number of guns per capita by country as a pandas data frame object

In [2]:
df = pd.read_html("https://en.wikipedia.org/wiki/Number_of_guns_per_capita_by_country")[0]

Let us process the table by dropping some unncessary columns

In [3]:
df.columns = df.iloc[0]
df = df.reindex(df.index.drop(0))
In [4]:
df.head()
Out[4]:
Rank Country Guns per 100 Residents Notes
1 1 United States 112.6[6] According to the Congressional Research Servic...
2 2 Serbia 75.6 NaN
3 3 Yemen 54.8 NaN
4 4 Switzerland 45.7 Estimates range widely, between roughly 25% an...
5 5 Cyprus 36.4[9] NaN

If you notice, the Guns per 100 Residents for United States is not a proper integer. We can correct is as below

In [5]:
df.iloc[0,2] = 112.6
In [6]:
df.dtypes
Out[6]:
0
Rank                      object
Country                   object
Guns per 100 Residents    object
Notes                     object
dtype: object

However, we cannot change every incorrect data entry if the table is large. Further we need the Guns per 100 Residents column in numeric format. Hence let us convert it and while doing so, convert incorrect values to NaN which stands for Not a Number

In [7]:
converted_column = pd.to_numeric(df["Guns per 100 Residents"], errors = 'coerce')
df['Guns per 100 Residents'] = converted_column
df.head()
Out[7]:
Rank Country Guns per 100 Residents Notes
1 1 United States 112.6 According to the Congressional Research Servic...
2 2 Serbia 75.6 NaN
3 3 Yemen 54.8 NaN
4 4 Switzerland 45.7 Estimates range widely, between roughly 25% an...
5 5 Cyprus NaN NaN

Plot as a map

Let us connect to our GIS to geocode this data and present it as a map

In [8]:
from arcgis.gis import GIS
import json

gis = GIS("https://www.arcgis.com", "arcgis_python", "P@ssword123")
In [9]:
fc = gis.content.import_data(df, {"CountryCode":"Country"})
In [10]:
map1 = gis.map('UK')
In [11]:
map1

map1

Let us us smart mapping to render the points with varying sizes representing the number of guns per 100 residents

In [12]:
map1.add_layer(fc, {"renderer":"ClassedSizeRenderer",
               "field_name": "Guns_per_100_Residents"})

Let us publish this layer as a feature collection item in our GIS

In [13]:
item_properties = {
    "title": "Worldwide gun ownership",
    "tags" : "guns,violence",
    "snippet": " GSR Worldwide gun ownership",
    "description": "test description",
    "text": json.dumps({"featureCollection": {"layers": [dict(fc.layer)]}}),
    "type": "Feature Collection",
    "typeKeywords": "Data, Feature Collection, Singlelayer",
    "extent" : "-102.5272,-41.7886,172.5967,64.984"
}

item = gis.content.add(item_properties)

Let us search for this item

In [14]:
search_result = gis.content.search("Worldwide gun ownership")
search_result[0]
Out[14]:
Worldwide gun ownership
GSR Worldwide gun ownershipFeature Collection by demo_deldev
Last Modified: June 05, 2016
0 comments, 19 views

Feedback on this topic?