Model Explainability for ML Models

Introduction

Machine learning is growing at a fast pace. Researchers keep introducing new models and architectures that push the predictive power of these models to new heights every day. Amidst this enthusiasm about improving model performance, however, there is a growing hesitancy among users. This hesitancy stems from the "black box" nature of many of these models: their lack of transparency is a real concern, and users are often willing to sacrifice some model performance in favour of a model that is more "explainable". Tree-based models, by contrast, have always been relatively explainable due to their inherent structure.

To support this growing need for explainability, arcgis.learn has added an explainability feature to all of its models that work with tabular data. This includes MLModel and FullyConnectedNetwork. arcgis.learn is now integrated with the model explainability that SHAP offers.

What is SHAP?

SHAP (SHapley Additive exPlanations) is a game-theoretic approach to explaining the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. More details about SHAP and its implementation can be found at https://github.com/slundberg/shap
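To make this concrete, here is a minimal sketch of using SHAP on its own, outside of arcgis.learn. The toy dataset and RandomForestRegressor below are illustrative assumptions, not part of this guide's workflow:

import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# toy regression problem (illustrative only)
X_toy, y_toy = load_diabetes(return_X_y=True, as_frame=True)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_toy, y_toy)

# TreeExplainer computes Shapley values efficiently for tree ensembles
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X_toy.iloc[:100])

# one row per observation, one column per feature; each value is the
# feature's additive contribution to that observation's prediction
print(shap_values.shape)

Each prediction decomposes as the expected model output plus the sum of its per-feature Shapley values, which is what makes the explanations additive.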

Prepare Data

In this guide, we will use a pretrained model to predict energy generation for solar photovoltaic power plants using weather variables. We will then see how model explainability provides explanations for individual predictions. Finally, we will also look at global model interpretability. For more details on the use case and on how to train such a model, please refer to https://developers.arcgis.com/python/sample-notebooks/solar-energy-prediction-using-weather-variables/

import shap
import arcgis
import pandas as pd
from arcgis.gis import GIS
from arcgis.learn import FullyConnectedNetwork, MLModel, prepare_tabulardata
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# connect to the active GIS and fetch the solar-plant feature layer
gis = GIS('home')
calgary_no_southland_solar = gis.content.search('calgary_no_southland_solar owner:api_data_owner',
                                                'feature layer')[0]
calgary_no_southland_solar_layer = calgary_no_southland_solar.layers[0]

# visualize the solar sites on a map of Calgary
m1 = gis.map('calgary', zoomlevel=10)
m1.add_layer(calgary_no_southland_solar_layer)
m1

[Figure: solar installation sites mapped over Calgary (Solar.PNG)]

# pull the layer's attributes into a spatially enabled dataframe
calgary_no_southland_solar_layer_sdf = calgary_no_southland_solar_layer.query().sdf
calgary_no_southland_solar_layer_sdf = calgary_no_southland_solar_layer_sdf[['FID', 'date', 'ID', 'solar_plan', 'altitude_m',
                                                                             'latitude', 'longitude', 'wind_speed', 'dayl__s_',
                                                                             'prcp__mm_d', 'srad__W_m_', 'swe__kg_m_', 'tmax__deg',
                                                                             'tmin__deg', 'vp__Pa_', 'kWh_filled', 'capacity_f',
                                                                             'SHAPE']]

# explanatory (weather) variables, with a robust scaler as preprocessor
X = ['altitude_m', 'wind_speed', 'dayl__s_', 'prcp__mm_d', 'srad__W_m_', 'swe__kg_m_', 'tmax__deg', 'tmin__deg', 'vp__Pa_']
preprocessors = [('altitude_m', 'wind_speed', 'dayl__s_', 'prcp__mm_d', 'srad__W_m_', 'swe__kg_m_', 'tmax__deg',
                  'tmin__deg', 'vp__Pa_', RobustScaler())]

# prepare the tabular data object with capacity_f as the dependent variable
data = prepare_tabulardata(calgary_no_southland_solar_layer,
                           'capacity_f',
                           explanatory_variables=X,
                           preprocessors=preprocessors)
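Before loading the model, it can help to preview a few rows of the prepared data. The show_batch call below follows the usual arcgis.learn tabular workflow and is included here only as a quick sanity check:

# preview a random sample of the prepared tabular data
data.show_batch()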

Load a Trained ML Model

Here we load a trained model that predicts the efficiency of the solar panels based on attributes like day length, weather, and temperature.

# load the trained model from its Esri model definition (.emd) file
model = MLModel.from_model('C:/Users/Karthik/Desktop/Base/MLModel/ML_Model_RF1/ML_Model_RF1.emd', data)

# recover the training and validation splits from the data object
valid = data._dataframe.iloc[data._validation_indexes, :]
train = data._dataframe.iloc[data._training_indexes, :]

Local Prediction Interpretation

Now we make a prediction on a sample row. The predict method in arcgis.learn for MLModel and FullyConnectedNetwork has been modified to accept two new parameters, explain and explain_index:

  • The parameter explain is a boolean; setting it to True gives the user an explanation of the prediction.

  • The parameter explain_index is an int giving the index of the dataframe row for which the explanation should be generated. An explanation can only be generated for one row/index/sample at a time.

# predict for all rows, and explain the prediction for the row at index 0
out = model.predict(model._data._dataframe[X], prediction_type='dataframe', explain=True, explain_index=0)
out.head(10)
Using 500 background data samples could cause slower run times. Consider using shap.sample(data, K) or shap.kmeans(data, K) to summarize the background as K samples.
[Figure: SHAP force plot explaining the prediction for row 0]
   altitude_m  wind_speed      dayl__s_  prcp__mm_d  srad__W_m_  swe__kg_m_  tmax__deg  tmin__deg  vp__Pa_  prediction_results
0        1095    7.204670  27648.000000           1  108.800003          12      -10.5      -21.0      120           -0.012350
1        1095    3.385235  27648.000000           1  115.199997          12      -18.0      -29.5       40           -0.023367
2        1095    5.076316  27648.000000           0  118.400002          12      -20.0      -32.0       40           -0.033506
3        1095    5.617623  27648.000000           0   96.000000          12      -18.0      -26.5       80           -0.032795
4        1095    2.561512  27648.000000           0  118.400002          12      -17.0      -28.5       40           -0.019064
5        1095    3.362019  27993.599609           2   67.199997          16      -21.5      -27.0       80           -0.043717
6        1095    4.025039  27993.599609           5   48.000000          20      -25.5      -30.5       40           -0.060310
7        1095    4.275431  27993.599609           0  112.000000          20      -24.5      -34.5       40           -0.047935
8        1095    8.036874  27993.599609           0  150.399994           0       -5.0      -30.5       40           -0.000990
9        1095    9.595293  27993.599609           0  112.000000           0        0.0      -11.0      280            0.015096

Running predict with explain set to True generates the plot shown above, along with predictions for all the samples passed into the predict function. Note that the prediction explanation is generated only for the first observation (row 0), since 0 was passed as the value of explain_index.

The plot shows that the mean prediction over the entire dataset (denoted as the base value in the plot) is 0.12, while the prediction for this specific observation is -0.01. The plot also shows that the feature srad__W_m_ had the greatest influence on the model's prediction for this record, represented by the length of the srad__W_m_ block. Features in blue push the prediction lower (to the left of the base value), while features in red (in this case, none) push it higher.
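For readers who want to reproduce a comparable force plot directly with the shap library, a minimal sketch follows. The standalone RandomForestRegressor here is an assumption made for illustration; inside arcgis.learn the fitted estimator and background data are wired up for you by predict:

import shap
from sklearn.ensemble import RandomForestRegressor

# stand-in model fitted on the training split (illustrative assumption)
rf = RandomForestRegressor(random_state=0).fit(train[X], train['capacity_f'])

# summarize the background data so KernelExplainer stays fast
background = shap.kmeans(train[X], 50)
explainer = shap.KernelExplainer(rf.predict, background)

# Shapley values for the single row we want to explain
row = valid[X].iloc[[0]]
shap_values = explainer.shap_values(row)

# base value plus per-feature contributions, rendered as a force plot
shap.force_plot(explainer.expected_value, shap_values[0], row, matplotlib=True)

Summarizing the background data with shap.kmeans is the same remedy the SHAP warning above suggests for the slow run times caused by large background samples.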

For more details, refer to https://github.com/slundberg/shap

Global Interpretation

We can now also visualize the global interpretation (similar to feature importance in sklearn). Unlike sklearn's built-in feature importance, this plot can also be generated for non-tree sklearn models. The plot below tells the user the impact that each feature in the dataset has on the model as a whole. We can infer from it that the feature srad__W_m_ has the highest impact on the model's decision making, while altitude_m has the least.

# plot SHAP-based global feature importance for the loaded model
model.feature_importances_
[Figure: SHAP summary plot showing global feature importance]
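The same global view can be reproduced directly with shap, reusing the explainer from the earlier sketch. summary_plot with plot_type='bar' ranks features by their mean absolute Shapley value across the sampled rows:

# subsample rows to keep KernelExplainer's runtime reasonable
sample = shap.sample(valid[X], 100)
shap_values = explainer.shap_values(sample)

# bar chart of mean |SHAP value| per feature: a global importance ranking
shap.summary_plot(shap_values, sample, plot_type='bar')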
