Mapping the 2019 Novel Coronavirus Pandemic¶
According to the WHO, the 2019 novel coronavirus (the cause of COVID-19) was identified as the cause of an outbreak of respiratory illness, and was unknown before the outbreak began in Wuhan, China, in December 2019 [1]. Early on, the disease spread from animals to people, and then from person to person. Infections with COVID-19 were reported in a growing number of international locations, including the United States. The United States reported its first confirmed instance of person-to-person spread of this virus on January 30, 2020 [2].
This notebook shows how to use the ArcGIS API for Python to monitor the spread of COVID-19 as it became a pandemic.
1. Import Data¶
Esri provides an open-to-public and free-to-share feature layer that contains the most up-to-date COVID-19 cases covering China, the United States, Canada, and Australia (at province/state level), and the rest of the world (at country level, represented by either the country centroids or their capitals). Data sources are WHO, US CDC, China NHC, ECDC, and DXY. The China data is updated automatically at least once per hour, and non-China data is updated manually. The data source repository that this layer references is created and maintained by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, and can be viewed here. In this notebook, we will use the feature layer supported by the Esri Living Atlas team and JHU Data Services, and offer a different perspective on the global maps of COVID-19 through the ArcGIS API for Python.
NOTE: "Since COVID-19 is continuously evolving, the sample reflects data as of July 4th, 2020 (or when the notebook is first published). Running this notebook at a later date might reflect a different result, but the overall steps should hold good."
DISCLAIMER: "This notebook is for the purpose of illustrating an analytical process using Jupyter Notebooks and ArcGIS API for Python, and should not be used as medical or epidemiological advice."
Necessary Imports¶
from io import BytesIO
import requests
import pandas as pd
from arcgis.features import FeatureLayer
from arcgis.gis import GIS
from arcgis.mapping import WebMap
"""
# If you are using the ArcGIS API for Python version 1.8.0 or above,
# make sure that the pandas version is >= 1.0.
# If not, run `pip install --upgrade 'pandas>=1'` to upgrade
# (the quotes keep the shell from treating `>` as a redirect).
"""
pd.__version__
gis = GIS('home')
item = gis.content.search("Coronavirus_2019_nCoV_Cases owner:CSSE_GISandData", outside_org=True)[0]
item
Through the API Explorer provided along with the dashboard product, we can easily fetch the source URL of the Feature Service containing daily updated COVID-19 statistics, which can then be used to create a FeatureLayer object suitable for querying and visualizing.
src_url = "https://services1.arcgis.com/0MSEUqKaxRlEPj5g/arcgis/rest/services/Coronavirus_2019_nCoV_Cases/FeatureServer/1"
fl = FeatureLayer(url=src_url)
df_global = fl.query(where="1=1",
                     return_geometry=True,
                     as_df=True)
As stated in the dashboard, the source data can be grouped into:
- A. Countries or regions for which data are collected at province/state level, e.g. China, the United States, Canada, Australia;
- B. Countries or regions in the rest of the world, for which data are collected at country level and whose location is represented by either the country centroid or the capital;
- C. Cruise ships with confirmed COVID-19 cases.
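The grouping above can be sketched with pandas on a small made-up table. Only the `Province_State` and `Country_Region` column names come from the layer; the rows, the counts, the `CRUISE_SHIPS` set, and the `assign_group` helper are illustrative assumptions.

```python
import pandas as pd

# Made-up rows that mimic the layer's schema; the counts are not real statistics.
toy = pd.DataFrame({
    "Province_State": ["Hubei", "New York", None, None, "Diamond Princess"],
    "Country_Region": ["China", "US", "Thailand", "MS Zaandam", "Canada"],
    "Confirmed": [10, 20, 3, 1, 0],
})

# Hypothetical set of ship names, taken from the Group C query in this notebook.
CRUISE_SHIPS = {"Diamond Princess", "Grand Princess", "MS Zaandam"}

def assign_group(row):
    """Return 'A', 'B' or 'C' for one row of the toy table."""
    if row["Province_State"] in CRUISE_SHIPS or row["Country_Region"] in CRUISE_SHIPS:
        return "C"  # cruise ship, whichever column carries its name
    if pd.notnull(row["Province_State"]):
        return "A"  # reported at province/state level
    return "B"      # reported at country level only

toy["Group"] = toy.apply(assign_group, axis=1)
print(toy[["Province_State", "Country_Region", "Group"]])
```

The cruise-ship check runs first because a ship can appear in either column, as the Group C query suggests.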
Group A¶
Let us first take a look at how many countries are in Group A, that is, countries whose Country_Region and Province_State are both not null or NaN.
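# Select the numeric columns before summing so that non-numeric columns (e.g. SHAPE) are not aggregated
df_global[~pd.isnull(df_global['Province_State'])].groupby('Country_Region')[['Confirmed', 'Recovered', 'Deaths']].sum()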
Each country/region in Group A has more than one feature, as the query() results below show.
fset_usa = fl.query(where="Country_Region='US'")
fset_usa
fset_china = fl.query(where="Country_Region='China'")
fset_china
fl.query(where="Country_Region='Denmark'")
Group C¶
Group C contains cruise ships across the globe with reported cases:
df_cruise_ships = fl.query(where="Province_State='Diamond Princess' or \
Province_State='Grand Princess' or \
Country_Region='MS Zaandam' or \
Country_Region='Diamond Princess'",
as_df=True)
df_cruise_ships[["Province_State", "Country_Region", "Last_Update", "Confirmed", "Recovered", "Deaths"]]
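The long `where` string above can also be assembled programmatically. The sketch below is one possible way, using only the standard library; `build_where` is a hypothetical helper, not part of the ArcGIS API for Python.

```python
def build_where(conditions):
    """Join (field, value) pairs into an OR-ed SQL where clause."""
    parts = []
    for field, value in conditions:
        # Double any single quote so the value stays a valid SQL string literal.
        escaped = value.replace("'", "''")
        parts.append(f"{field}='{escaped}'")
    return " OR ".join(parts)

# The same four conditions used in the Group C query.
ship_conditions = [
    ("Province_State", "Diamond Princess"),
    ("Province_State", "Grand Princess"),
    ("Country_Region", "MS Zaandam"),
    ("Country_Region", "Diamond Princess"),
]
where_clause = build_where(ship_conditions)
print(where_clause)
```

The resulting string could then be passed along as `fl.query(where=where_clause, as_df=True)`.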
Group B¶
In df_global, other than the 22 countries (Australia, Canada, China, etc.) in Group A and the cruise ships in Group C, all other countries/regions fall into Group B, e.g. Thailand. The major difference between Group A and Group B is that the latter contains one and only one feature per country.
fl.query(where="Country_Region='Thailand'")
Query the reference feature layers¶
Because the geo-information provided by the dashboard contains only the coordinates of the centroid of each country/region, the feature layer can only be rendered as points on the map. If this is what you want, you can skip the rest of section 1 and jump straight to section 2.
On the other hand, if you want to visualize the confirmed/death/recovered cases per country/region as polygons, in other words as a choropleth map, please read on:
First, we need to access the feature service that contains geometry/shape info for all provinces in Mainland China, and merge with the COVID-19 DataFrame.
Access the reference feature layer of China¶
provinces_item = gis.content.get("0f57da7f853c4a1aa5b2e048ff8655d2")
provinces_item
provinces_flayer = provinces_item.layers[0]
provinces_df = provinces_flayer.query(as_df=True)
provinces_df.columns
tmp = provinces_df.sort_values('NAME', ascending=True)
provinces_df = tmp.drop_duplicates(subset='NAME', keep='last')
provinces_df.shape
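The sort-then-deduplicate step above can be tried on a tiny made-up table. The names and `AREA` values are illustrative only, and `kind="stable"` is an addition in this sketch (the cell above uses the default sort) so that the order of rows sharing a `NAME` is deterministic.

```python
import pandas as pd

# Made-up province table with one duplicated name; AREA values are illustrative.
toy_provinces = pd.DataFrame({
    "NAME": ["Hubei", "Anhui", "Hubei"],
    "AREA": [1.0, 2.0, 3.0],
})

# A stable sort keeps the original relative order of rows that share a NAME.
tmp = toy_provinces.sort_values("NAME", ascending=True, kind="stable")

# keep='last' retains the final row of each group of duplicated names.
deduped = tmp.drop_duplicates(subset="NAME", keep="last")
print(deduped)
```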
DataFrame Merging for China Dataset¶
The subsets of the DataFrame created in the previous section now need to be merged with feature services that carry geographic information (e.g. geometries, shapes, or longitude/latitude) in order to provide the locations and geometries required for mapping. First, let's acquire the geometries from feature services on the Living Atlas or in an ArcGIS Online organization; these supply the geographic information needed for overlap_rows_china.
df_china = fset_china.sdf[['Province_State', 'Confirmed', 'Recovered', 'Deaths']]
df_china = df_china.assign(NAME = df_china["Province_State"])
df_china.head()
Because province names are written differently in the two data sources (compare df_china['Province_State'] with provinces_df['NAME']), the replace_value_in_column function is declared below to edit the records and make the names consistent.
def replace_value_in_column(data_frame, l_value, r_value, column_name='NAME'):
    # Overwrite every cell in column_name that equals l_value with r_value (in place)
    data_frame.loc[data_frame[column_name] == l_value, column_name] = r_value
replace_value_in_column(df_china, 'Guangxi', 'Guangxi Zhuang Autonomous Region')
replace_value_in_column(df_china, 'Inner Mongolia', 'Inner Mongolia Autonomous Region')
replace_value_in_column(df_china, 'Ningxia', 'Ningxia Hui Autonomous Region')
replace_value_in_column(df_china, 'Tibet', 'Tibet Autonomous Region')
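One way to find out which names need such a replacement is to compare the two name columns as sets. The sketch below uses made-up stand-ins for `df_china['NAME']` and `provinces_df['NAME']`; the variable names are assumptions for illustration.

```python
import pandas as pd

# Made-up stand-ins for df_china['NAME'] and provinces_df['NAME'].
stats_names = pd.Series(["Hubei", "Guangxi", "Tibet"])
reference_names = pd.Series([
    "Hubei",
    "Guangxi Zhuang Autonomous Region",
    "Tibet Autonomous Region",
])

# Names in the statistics table with no exact match in the reference table.
unmatched = sorted(set(stats_names) - set(reference_names))
print(unmatched)
```

An empty result would suggest that every name in the statistics table has an exact counterpart in the reference table.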
Now that the two DataFrame objects share a NAME column with consistent values, we can use the Pandas merge function as the entry point for standard in-memory database join operations (similar to joins in relational databases such as SQL). Its syntax is shown here:
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
         left_index=False, right_index=False, sort=False)
Note that the how argument specifies how to merge, that is, which keys are included in the resulting table. If a key combination does not appear in either the left or the right table, the corresponding values in the joined table will be NA. The table below summarizes the how options and their SQL equivalents:
Merge Method | SQL Equivalent | Description |
---|---|---|
left | LEFT OUTER JOIN | Use keys from left object |
right | RIGHT OUTER JOIN | Use keys from right object |
outer | FULL OUTER JOIN | Use union of keys |
inner | INNER JOIN | Use intersection of keys |
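The four options in the table can be compared on two tiny made-up frames. The keys and values below are illustrative assumptions; only `pd.merge` and its `how` and `on` parameters come from the text above.

```python
import pandas as pd

# Two made-up tables that share keys "B" and "C" only.
left = pd.DataFrame({"NAME": ["A", "B", "C"], "AREA": [1, 2, 3]})
right = pd.DataFrame({"NAME": ["B", "C", "D"], "Confirmed": [20, 30, 40]})

row_counts = {}
for how in ["left", "right", "outer", "inner"]:
    merged = pd.merge(left, right, how=how, on="NAME")
    row_counts[how] = len(merged)
    # Unmatched cells show up as NaN in the columns of the other table.
    print(how, merged["NAME"].tolist())

print(row_counts)
```

With these inputs the inner join keeps only the two shared keys, while the outer join keeps all four distinct keys.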
In this case, we will call merge() with how='inner' to perform an inner join of the two DataFrame objects on the key column "NAME", keeping only the intersection of keys.
cols_2 = ['NAME', 'AREA', 'TOTPOP_CY','SHAPE']
overlap_rows_china = pd.merge(left = provinces_df[cols_2], right = df_china,
how='inner', on = 'NAME')
overlap_rows_china.head()
cols_2 = ['NAME', 'AREA', 'TOTPOP_CY','SHAPE','Shape__Area', 'Shape__Length']
overlap_rows_china = pd.merge(left = provinces_df[cols_2], right = df_china, how='inner',
on = 'NAME')
overlap_rows_china.head()
As shown in overlap_rows_china, each province/state in China is now merged while retaining the SHAPE column, and is ready to be rendered as polygons.
Access the reference feature layer of the United States¶
Next, we need to access the feature service that contains geometry/shape info for all states in the U.S., and merge it with the DataFrame of COVID-19 statistics.
us_states_item = gis.content.get('99fd67933e754a1181cc755146be21ca')
us_states_item
us_states_flayer = us_states_item.layers[0]
us_states_df = us_states_flayer.query(as_df=True)
us_states_df.columns
DataFrame Merging for U.S. Dataset¶
df_usa = fset_usa.sdf[['Province_State', 'Confirmed', 'Recovered', 'Deaths']]
df_usa = df_usa.assign(STATE_NAME = df_usa["Province_State"])
df_usa.head()
cols_4 = ['STATE_NAME','SHAPE']
overlap_rows_usa = pd.merge(left = us_states_df[cols_4], right = df_usa,
how='inner', on = 'STATE_NAME')
overlap_rows_usa.head()
Access the reference feature layer of world countries¶
countries_item = gis.content.get('2b93b06dc0dc4e809d3c8db5cb96ba69')
countries_item
countries_flayer = countries_item.layers[0]
countries_df = countries_flayer.query(as_df=True)
countries_df.columns
DataFrame Merging for global Dataset¶
The Country_Region column of df_global and the COUNTRY column of countries_df do not always use the same English name for a country (for example, 'US' versus 'United States'), so merge() cannot tell that the two rows refer to the same country. We therefore run the following cell to make country names consistent between the two DataFrames to be merged.
df_global.loc[df_global['Country_Region']=='US', 'Country_Region'] = 'United States'
df_global.loc[df_global['Country_Region']=='Korea, South', 'Country_Region'] = 'South Korea'
df_global.loc[df_global['Country_Region']=='Korea, North', 'Country_Region'] = 'North Korea'
df_global.loc[df_global['Country_Region']=='Russia', 'Country_Region'] = 'Russian Federation'
df_global.loc[df_global['Country_Region']=='Czechia', 'Country_Region'] = 'Czech Republic'
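The same renaming can be expressed as a single dictionary lookup with `Series.replace`. This is a sketch of an alternative, not the notebook's own method; `NAME_FIXES` and `toy_global` are hypothetical names, and the toy frame stands in for `df_global`.

```python
import pandas as pd

# Mapping copied from the five assignments above.
NAME_FIXES = {
    "US": "United States",
    "Korea, South": "South Korea",
    "Korea, North": "North Korea",
    "Russia": "Russian Federation",
    "Czechia": "Czech Republic",
}

# Made-up frame standing in for df_global.
toy_global = pd.DataFrame({"Country_Region": ["US", "Thailand", "Czechia"]})

# Values found in the mapping are replaced; all other values are left untouched.
toy_global["Country_Region"] = toy_global["Country_Region"].replace(NAME_FIXES)
print(toy_global["Country_Region"].tolist())
```

Keeping the mapping in one dictionary makes it easier to extend when further mismatches turn up.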
List the top 10 countries with largest numbers¶
With df_global ready, we can now sort countries or regions by their numbers of confirmed/recovered/death cases using groupby() and sort_values().
# sorted by # of confirmed cases
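# Select the numeric columns before summing so that non-numeric columns (e.g. SHAPE) are not aggregated
df_global_sum = df_global.groupby('Country_Region')[['Confirmed', 'Recovered', 'Deaths']].sum()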
df_global_sum_c = df_global_sum.sort_values(by = ['Confirmed'], ascending = False)
df_global_sum_c.head(10)
# sorted by death tolls
df_global_sum_d = df_global_sum.sort_values(by = ['Deaths'], ascending = False)
df_global_sum_d.head(10)
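The aggregate-then-sort pattern used in these cells can be checked on a small made-up table. The country codes and counts below are illustrative, not real statistics.

```python
import pandas as pd

# Made-up rows: country "X" is split into two provinces, "Y" and "Z" are not.
toy = pd.DataFrame({
    "Country_Region": ["X", "X", "Y", "Z"],
    "Province_State": ["x1", "x2", None, None],
    "Confirmed": [5, 7, 20, 1],
    "Deaths": [1, 0, 2, 0],
})

# Sum the numeric columns per country, then rank by confirmed cases.
totals = toy.groupby("Country_Region")[["Confirmed", "Deaths"]].sum()
top_confirmed = totals.sort_values(by="Confirmed", ascending=False).head(2)
print(top_confirmed)
```

Province-level rows are folded into one row per country before ranking, which is why "X" is ranked by the sum of its two provinces.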
Joining the COVID-19 stats and world countries DataFrames¶
world_merged1 = pd.merge(df_global_sum_c, countries_df[['COUNTRY', 'SHAPE']],
left_index=True, right_on='COUNTRY',
how="left")
world_merged1[['COUNTRY', 'Confirmed','Deaths', 'Recovered']].head(10)
Each country/region in world_merged1 is now merged with the SHAPE column and is ready to be rendered as polygons.
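Because the join uses how="left", a country whose name has no counterpart in the reference table keeps its statistics but gets an empty SHAPE. The sketch below shows one way to list such rows on made-up data. Unlike the cell above, it resets the index first so that the country name stays in its own column; the string placeholders in `SHAPE` stand in for real geometries.

```python
import pandas as pd

# Made-up per-country totals, indexed by country name like df_global_sum_c.
stats = pd.DataFrame(
    {"Confirmed": [20, 12]},
    index=pd.Index(["Y", "X"], name="Country_Region"),
)
# Made-up reference table; strings stand in for polygon geometries.
shapes = pd.DataFrame({"COUNTRY": ["X", "Z"], "SHAPE": ["polygon-X", "polygon-Z"]})

merged = pd.merge(stats.reset_index(), shapes,
                  left_on="Country_Region", right_on="COUNTRY", how="left")

# Rows with no matching geometry have NaN in SHAPE.
missing = merged.loc[merged["SHAPE"].isna(), "Country_Region"].tolist()
print(missing)
```

Any name listed in `missing` is a candidate for the renaming step shown earlier.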
2. Map the COVID-19 cases in China¶
Next, let us visualize the following scenarios for China:
- Confirmed cases rendered as points and as polygons
- Death cases rendered as points and as polygons
- Recovered cases rendered as points and as polygons
Map the confirmed COVID-19 cases in China¶
We can either call the add_layer() function to add the specified layer or item (i.e. the FeatureLayer object created as fl) to the map widget and set the visualization options to use ClassedSizeRenderer, or plot the derived SeDF on the map view directly, specifying how to render the spatial data with symbols and a color palette. (Here, the SeDF is derived from the merged DataFrame, which now contains a SHAPE column that can be plotted in the map widget as polygons.)
Display confirmed cases in China as points¶
map1 = gis.map('China', zoomlevel=4)
map1
map1.add_layer(fl, { "type": "FeatureLayer",
"renderer":"ClassedSizeRenderer",
"field_name":"Confirmed"})
map1.zoom = 3
map1.legend = True
Display confirmed cases in China as polygons¶
map1b = gis.map('China')
map1b
map1b.clear_graphics()
overlap_rows_china.spatial.plot( kind='map', map_widget=map1b,
renderer_type='c', # for class breaks renderer
method='esriClassifyNaturalBreaks', # classification algorithm
class_count=4, # choose the number of classes
col='Confirmed', # numeric column to classify
cmap='inferno', # color map to pick colors from for each class
alpha=0.7 # specify opacity
)
map1b.zoom = 4
map1b.legend=True
The map view above (map1b) displays the number of confirmed cases per province in Mainland China. Orange polygons mark provinces whose confirmed case counts fall in the range [45423, 68134], and black polygons mark those in the range [1, 22712].
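To make the idea of class breaks concrete, the sketch below assigns a value to a class given a list of upper bounds. The first and last bounds come from the legend ranges quoted above; the middle bound and the `class_index` helper are assumptions for illustration, and this is not how the renderer itself is implemented.

```python
from bisect import bisect_left

# Upper bounds of three classes. The first and last come from the legend
# ranges quoted above; the middle bound is an assumption for illustration.
upper_bounds = [22712, 45422, 68134]

def class_index(value, bounds=upper_bounds):
    """Return the 0-based index of the first class whose upper bound is >= value."""
    return bisect_left(bounds, value)

for sample in (1, 22712, 22713, 68134):
    print(sample, "-> class", class_index(sample))
```

A value above the last bound would return an index equal to `len(upper_bounds)`, so callers may want to treat that case as out of range.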
Also, we can save the MapView object into a Web Map item for the purpose of future references and modifications.
map1b.save({'title':'Confirmed COVID-19 Cases in China',
'snippet':'Map created using Python API showing confirmed COVID-19 cases in China',
'tags':['automation', 'COVID19', 'world health', 'python']})
For example, we can browse the web map in the browser, change its symbology to different color maps in the configuration pane, then visualize it again with different looks here.
map1b_item = gis.content.search('Confirmed COVID-19 Cases in China')[0]
WebMap(map1b_item)
Display death cases in China as points¶
map2 = gis.map('China', zoomlevel=4)
map2
map2.add_layer(fl, { "type": "FeatureLayer",
"renderer":"ClassedSizeRenderer",
"field_name":"Deaths"})
map2.legend = True
Display death cases in China as polygons¶
map2b = gis.map('China')
map2b
map2b.clear_graphics()
overlap_rows_china.spatial.plot( kind='map', map_widget=map2b,
renderer_type='c', # for class breaks renderer
method='esriClassifyNaturalBreaks', # classification algorithm
class_count=4, # choose the number of classes
col='Deaths', # numeric column to classify
cmap='inferno', # color map to pick colors from for each class
alpha=0.7 # specify opacity
)
map2b.zoom = 4
map2b.legend = True
Using the same approach, we can map the number of deaths per province in Mainland China. With the legend displayed, map2b shows orange polygons for provinces whose death counts fall in the range [3008, 4512], and black polygons for those in the range [0, 1504].
Similarly, we can create an additional deliverable, the Web Map item saved to the active GIS, and then browse the web map in the browser, change its symbology to a different color map in the configuration pane, and/or visualize it again here with a different look.
map2b_item = map2b.save({'title':'COVID-19 Death Cases in China',
'snippet':'Map created using Python API showing COVID-19 death cases in China',
'tags':['automation', 'COVID19', 'world health', 'python']})
WebMap(map2b_item)
Display the recovered cases in China as points¶
map3 = gis.map('China', zoomlevel=4)
map3
map3.add_layer(fl, { "type": "FeatureLayer",
"renderer":"ClassedSizeRenderer",
"field_name":"Recovered"})
map3.legend = True
Display the recovered cases in China as polygons¶
map3b = gis.map('China')
map3b