
Mitigating salary bias due to gender using AutoML fairness


Introduction

Bias is prevalent in most datasets; it is often introduced during data collection, among other causes. While preprocessing typically addresses problems such as missing data, corrupted records, outliers and feature engineering, bias in datasets is frequently overlooked. Consequently, models trained on biased data can produce biased predictions. To address this, we present a step-by-step workflow for detecting and mitigating gender bias, using salary prediction as a case study. Mitigating bias is a complex process, and we leverage the capabilities of AutoML both to reduce bias and to identify the best-performing model that meets a fairness threshold.

Necessary Imports

%matplotlib inline
import matplotlib.pyplot as plt

import pandas as pd

import arcgis
from arcgis.gis import GIS
from arcgis.learn import prepare_tabulardata, AutoML

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

Connecting to ArcGIS

gis = GIS("home")

Accessing the dataset

The dataset comprises demographic and employment information for a diverse group of individuals in the United States, featuring variables such as age, education level, occupation, marital status, salary, and more. Our goal is to train a model that predicts whether an individual's salary is above or below 50k.

data_table = gis.content.get("9f56292f1bec417da75d577bbd131889")
data_table
salary
CSV by api_data_owner
Last Modified: July 17, 2024
# Download the CSV and save it to a local folder
data_path = data_table.get_data()
adult_income = pd.read_csv(data_path).drop(["Unnamed: 0"], axis=1)
adult_income.head()
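| | Age | Workclass | Fnlwgt | Education | Education-num | Marital-status | Occupation | Relationship | Race | Gender | Capital-gain | Capital-loss | Hours-per-week | Native-country | Salary | annual_salary_$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K | 64375 |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K | 19304 |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K | 55493 |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K | 78591 |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K | 55388 |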

The dataset consists of 32,561 records, with 21,790 males and 10,771 females. The age range is from 18 to 59 years old. The majority of the individuals are from the United States (93%), with a few from Puerto Rico, Jamaica, and Cuba. The most common education level is HS-grad (34%), followed by Some-college (20%), and Bachelors (15%). The majority of the individuals are married (63%), with a significant number being divorced (15%) or never-married (12%).

A basic analysis of salary distributions by gender reveals a gender imbalance, with 30.57% of males earning more than 50K compared to only 10.95% of females. This disparity suggests potential bias or disparities in salary distribution based on gender. Further analysis and fairness mitigation strategies will be necessary to address and understand the underlying causes of this imbalance.
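The per-gender percentages above can be reproduced with a row-normalized crosstab of Gender against Salary. The sketch below is a minimal illustration with pandas; it assumes the labels carry the leading whitespace visible in the training logs further down (for example ' >50K'), so it strips them first. The helper name `high_income_share_by_gender` is hypothetical.

```python
import pandas as pd


def high_income_share_by_gender(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of each gender whose Salary label is '>50K'."""
    # Labels in this dataset appear to carry leading spaces (e.g. ' >50K'), so strip them.
    gender = df["Gender"].str.strip()
    salary = df["Salary"].str.strip()
    # normalize="index" makes each gender's row sum to 1 across the salary classes.
    shares = pd.crosstab(gender, salary, normalize="index")
    return (shares[">50K"] * 100).round(2)


# Usage on the notebook's DataFrame:
# high_income_share_by_gender(adult_income)
```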

adult_income.columns
Index(['Age', 'Workclass', 'Fnlwgt', 'Education', 'Education-num',
       'Marital-status', 'Occupation', 'Relationship', 'Race', 'Gender',
       'Capital-gain', 'Capital-loss', 'Hours-per-week', 'Native-country',
       'Salary', 'annual_salary_$'],
      dtype='object')

Data processing begins by splitting the dataset into a training dataset (80%) and a testing dataset (20%):

test_size = 0.20
train, test = train_test_split(adult_income, test_size = test_size, random_state=32, shuffle=True)

Model Building using AutoML

First, we will train a baseline model using AutoML and compute its fairness score, which serves as the reference point for evaluation. This is a classification model trained on the relevant demographic explanatory features in the dataset to predict the salary class of each individual. Here, Education-num, Capital-gain, Capital-loss and Hours-per-week are treated as continuous variables and the rest as categorical. The target variable Salary has two classes, which makes it suitable for the current AutoML implementation of fairness mitigation, since it can handle only binary classification.

Data Preparation

The data is prepared with the prepare_tabulardata method from the arcgis.learn module in the ArcGIS API for Python. This function takes a non-spatial dataframe, a feature layer or a spatial dataframe containing the dataset as input, and returns a TabularDataObject that can be fed into the model. Here we use a non-spatial dataframe.

The primary input parameters required for the tool are:

input_features: non-spatial dataframe containing the primary dataset
variable_predict: field name `Salary`, the y-variable to be predicted from the input dataframe
explanatory_variables: the selected list of explanatory variables. Categorical variables are passed as a (name, True) tuple; continuous variables are passed by name alone.
explanatory_variables = [
    ('Age', True), ('Workclass', True), ('Education', True), 'Education-num',
    ('Marital-status', True), ('Occupation', True), ('Relationship', True),
    ('Race', True), ('Gender', True), 'Capital-gain', 'Capital-loss',
    'Hours-per-week', ('Native-country', True)
]
data = prepare_tabulardata(train, 'Salary', explanatory_variables=explanatory_variables)
Dataframe is not spatial, Rasters and distance layers will not work
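Rather than typing each tuple by hand, the same list can be generated from a set of categorical column names. This is a small plain-Python sketch; `categorical` and `feature_order` are illustrative names, and the `(name, True)` convention is the one used in the list above. The column order follows adult_income.columns, leaving out the target and the fields the list above does not use (Fnlwgt and annual_salary_$).

```python
# Columns treated as categorical (passed to prepare_tabulardata as (name, True)).
categorical = {
    "Age", "Workclass", "Education", "Marital-status", "Occupation",
    "Relationship", "Race", "Gender", "Native-country",
}
# Feature columns in dataset order, excluding Salary, Fnlwgt and annual_salary_$.
feature_order = [
    "Age", "Workclass", "Education", "Education-num", "Marital-status",
    "Occupation", "Relationship", "Race", "Gender", "Capital-gain",
    "Capital-loss", "Hours-per-week", "Native-country",
]
# Categorical columns become (name, True) tuples; continuous ones stay plain strings.
explanatory_variables = [
    (name, True) if name in categorical else name for name in feature_order
]
```

Keeping the categorical set separate makes it easy to flip a column (for example, treating Age as continuous) without editing the list by hand.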
data.show_batch() 
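| | Age | Capital-gain | Capital-loss | Education | Education-num | Gender | Hours-per-week | Marital-status | Native-country | Occupation | Race | Relationship | Salary | Workclass |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1174 | 47 | 0 | 0 | HS-grad | 9 | Male | 40 | Married-civ-spouse | United-States | Other-service | White | Husband | <=50K | Private |
| 5093 | 34 | 0 | 0 | HS-grad | 9 | Male | 40 | Married-civ-spouse | United-States | Machine-op-inspct | White | Husband | <=50K | Private |
| 11204 | 46 | 0 | 1977 | Masters | 14 | Male | 40 | Married-civ-spouse | United-States | Tech-support | White | Husband | >50K | Private |
| 12586 | 30 | 0 | 0 | Some-college | 10 | Female | 40 | Divorced | United-States | Adm-clerical | White | Not-in-family | <=50K | Local-gov |
| 29133 | 30 | 0 | 0 | 11th | 7 | Male | 40 | Married-spouse-absent | Mexico | Handlers-cleaners | Amer-Indian-Eskimo | Not-in-family | <=50K | Private |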

Model initialization

Here we initialize the AutoML model by passing the prepared tabular data from above. The mode of the model can also be set to Basic, Intermediate or Advanced; the default is Basic.

automl_classifier_plain = AutoML(data=data)

Model training

Finally, the model is ready for training. To train it, we call the fit() method. Depending on the mode, AutoML trains and evaluates a series of candidate models and selects the best one. Training time depends on the chosen mode, with Basic being the fastest and Advanced the most time-consuming.

AutoML uses a set of algorithms well suited to tabular data as candidates, such as Decision Tree, Random Trees, Extra Trees, LightGBM and XGBoost, and also tries ensembling them to find the best model.

automl_classifier_plain.fit()
Neural Network algorithm was disabled because it doesn't support n_jobs parameter.
Linear algorithm was disabled.
AutoML directory: C:\Users\sup10432\AppData\Local\Temp\scratch\tmpfvj5jmu4
The task is binary_classification with evaluation metric logloss
AutoML will use algorithms: ['Decision Tree', 'Random Trees', 'Extra Trees', 'LightGBM', 'Xgboost']
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble']
* Step simple_algorithms will try to check up to 1 model
DecisionTreeAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
1_DecisionTree logloss 0.361375 trained in 6.29 seconds
* Step default_algorithms will try to check up to 4 models
LightgbmAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
Exception while producing SHAP explanations. pandas dtypes must be int, float or bool.
Fields with bad pandas dtypes: Workclass: object, Education: object, Marital-status: object, Occupation: object, Relationship: object, Race: object, Gender: object, Native-country: object
Continuing ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
2_Default_LightGBM logloss 0.27688 trained in 5.5 seconds
There was an error during 3_Default_Xgboost training.
Please check C:\Users\sup10432\AppData\Local\Temp\scratch\tmpfvj5jmu4\errors.md for details.
RandomForestAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
4_Default_RandomTrees logloss 0.338299 trained in 8.8 seconds
ExtraTreesAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
5_Default_ExtraTrees logloss 0.368012 trained in 8.38 seconds
* Step ensemble will try to check up to 1 model
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
Ensemble logloss 0.27688 trained in 3.35 seconds
AutoML fit time: 39.66 seconds
AutoML best model: 2_Default_LightGBM
All the evaluated models are saved in the path  C:\Users\sup10432\AppData\Local\Temp\scratch\tmpfvj5jmu4
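The leaderboard ranks candidates by log loss (the log reports "evaluation metric logloss"), where lower is better. To illustrate what that number measures, here is a minimal binary log-loss computation in plain Python. The explicit mapping of string labels to 0/1 is an assumption made to sidestep the pos_label ambiguity the warnings mention; it is not taken from the AutoML internals.

```python
import math


def binary_logloss(labels, probs, pos_label=">50K", eps=1e-15):
    """Mean negative log-likelihood of the predicted positive-class probabilities."""
    total = 0.0
    for label, p in zip(labels, probs):
        # Map string labels (which carry leading spaces here) to 0/1 explicitly.
        y = 1 if label.strip() == pos_label else 0
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


# A confident correct prediction costs little; an uninformative 0.5 costs log(2).
print(binary_logloss([" >50K", " <=50K"], [0.8, 0.1]))
print(binary_logloss([" >50K"], [0.5]))
```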

Once trained, the model score is checked to understand the performance of the trained model.

automl_classifier_plain.score()
elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
0.7439539347408829

Additional insights into model performance can be observed from the model report, which includes the AutoML leaderboard, performance metrics for each algorithm attempted, a boxplot depicting model performance, and Spearman correlation analysis.
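The Spearman correlation panel compares how similarly the candidate models rank the same examples. Assuming it is computed on the models' predicted probabilities (an assumption; the report does not state this), Spearman correlation is simply the Pearson correlation of ranks. The sketch below uses pandas and NumPy on hypothetical prediction columns.

```python
import numpy as np
import pandas as pd


def spearman_matrix(preds: pd.DataFrame) -> pd.DataFrame:
    """Spearman correlation between columns: Pearson correlation of average ranks."""
    ranks = preds.rank(method="average")  # tied values share their mean rank
    corr = np.corrcoef(ranks.to_numpy(), rowvar=False)  # columns are variables
    return pd.DataFrame(corr, index=preds.columns, columns=preds.columns)


# Hypothetical positive-class probabilities from three models on the same four rows.
preds = pd.DataFrame({
    "LightGBM": [0.9, 0.2, 0.6, 0.1],
    "RandomTrees": [0.8, 0.3, 0.7, 0.2],
    "ExtraTrees": [0.1, 0.9, 0.4, 0.8],
})
print(spearman_matrix(preds).round(2))
```

Highly correlated models add little diversity to an ensemble, which is one reason this panel is useful when reading the leaderboard.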

automl_classifier_plain.report()
In case the report html is not rendered appropriately in the notebook, the same can be found in the path C:\Users\sup10432\AppData\Local\Temp\scratch\tmpfvj5jmu4\README.html

AutoML Leaderboard

| Best model | name | model_type | metric_type | metric_value | train_time |
|---|---|---|---|---|---|
| | 1_DecisionTree | Decision Tree | logloss | 0.361375 | 7.05 |
| the best | 2_Default_LightGBM | LightGBM | logloss | 0.27688 | 6.4 |
| | 4_Default_RandomTrees | Random Trees | logloss | 0.338299 | 9.54 |
| | 5_Default_ExtraTrees | Extra Trees | logloss | 0.368012 | 9.12 |
| | Ensemble | Ensemble | logloss | 0.27688 | 3.35 |

AutoML Performance

AutoML Performance Boxplot

Spearman Correlation of Models

Check fairness of unmitigated model for gender

Before proceeding, we need to verify if the baseline model exhibits bias and determine if mitigation is necessary. Initially, the fairness score of the baseline AutoML model is assessed to identify any gender-related bias, its type, and magnitude.

%matplotlib inline
fairness_df = automl_classifier_plain.fairness_score(sensitive_feature='Gender', visualize=True)
[Figures: fairness plots for the Gender feature (a single-panel chart and a four-panel chart)]

The output above shows four metrics that measure fairness for classification problems: equalized odds difference (EOD), demographic parity difference (DPD), equalized odds ratio (EOR) and demographic parity ratio (DPR). We discuss the interpretation of these metrics below. To learn more about the metrics, see how fairness works.
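For reference, all four metrics can be expressed in terms of per-group selection rate, true positive rate (TPR) and false positive rate (FPR). The sketch below follows the Fairlearn-style definitions: a difference is the largest group value minus the smallest, a ratio is the smallest divided by the largest, and the equalized odds versions take the worse of the TPR and FPR comparisons. That AutoML computes them exactly this way is an assumption, although these definitions are consistent with the values reported below. Labels and predictions are assumed to be encoded as 0/1 already.

```python
import numpy as np


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, TPR and FPR for binary 0/1 labels."""
    y_true, y_pred, groups = (np.asarray(a) for a in (y_true, y_pred, groups))
    rates = {}
    for g in np.unique(groups):
        mask = groups == g
        t, p = y_true[mask], y_pred[mask]
        # Assumes every group contains both positive and negative true labels.
        rates[g] = {
            "selection": p.mean(),    # share predicted positive
            "tpr": p[t == 1].mean(),  # true positive rate = 1 - false negative rate
            "fpr": p[t == 0].mean(),  # false positive rate
        }
    return rates


def fairness_metrics(y_true, y_pred, groups):
    """Difference = max - min across groups; ratio = min / max across groups."""
    rates = group_rates(y_true, y_pred, groups)

    def values(key):
        return [r[key] for r in rates.values()]

    def diff(key):
        return max(values(key)) - min(values(key))

    def ratio(key):
        return min(values(key)) / max(values(key))

    return {
        "demographic_parity_difference": diff("selection"),
        "demographic_parity_ratio": ratio("selection"),
        # Equalized odds takes the worse of the TPR and FPR comparisons.
        "equalized_odds_difference": max(diff("tpr"), diff("fpr")),
        "equalized_odds_ratio": min(ratio("tpr"), ratio("fpr")),
    }


# Usage sketch (hypothetical encoding):
# y = (test["Salary"].str.strip() == ">50K").astype(int)
```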

fairness_df[1]
{'equalized_odds_difference': (0.16,
  'The value of equalized_odds_difference is 0.16 which is less than minimum threshold 0.25. The ideal value of this metric is 0. Fairness for this metric is between 0 and 0.25.'),
 'demographic_parity_difference': (0.2,
  'The value of demographic_parity_difference is 0.2 which is less than minimum threshold 0.25. The ideal value of this metric is 0. Fairness for this metric is between 0 and 0.25.'),
 'equalized_odds_ratio': (0.18,
  'The value of equalized_odds_ratio is 0.18 which is less than minimum threshold 0.8. The ideal value of this metric is 1. Fairness for this metric is between 0.8 and 1.'),
 'demographic_parity_ratio': (0.29,
  'The value of demographic_parity_ratio is 0.29 which is less than minimum threshold 0.8. The ideal value of this metric is 1. Fairness for this metric is between 0.8 and 1.')}

The fairness score reveals that the predictions are biased: the equalized odds ratio (0.18) and the demographic parity ratio (0.29) are both below the minimum threshold of 0.8, against an ideal value of 1.

fairness_df[0]
| Gender | accuracy | false positive rate | false negative rate | selection rate | count |
|---|---|---|---|---|---|
| Female | 0.933798 | 0.013245 | 0.443396 | 0.080139 | 0.330518 |
| Male | 0.858372 | 0.074387 | 0.283422 | 0.280963 | 0.669482 |

Analyse model fairness

In the fairness report above, the Equalized Odds Ratio (EOR) and Demographic Parity Ratio (DPR) are the two critical metrics that reveal significant unfairness in the prediction outcomes between different genders. These metrics should be the primary focus for mitigation efforts. Strategies such as algorithmic adjustments, feature selection, or targeted interventions may be needed to address the observed biases and improve fairness in salary predictions.

Choosing a Metric

If the primary concern is to ensure fairness in both false positives and false negatives, then Equalized Odds Ratio (EOR) would be the preferred metric for bias mitigation. Addressing disparities in both types of errors can lead to a more balanced and equitable outcome.

However, if the focus is solely on ensuring an equal distribution of positive outcomes between genders, then Demographic Parity Ratio (DPR) might be sufficient for mitigation efforts.

In the context of this example:

Equalized Odds Ratio (EOR):

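EOR focuses on ensuring fairness in both false positives and false negatives between males and females. It is the smaller of two ratios, each computed as the lower group value divided by the higher one: one for true positive rates and one for false positive rates. Here the true positive rates (1 minus the false negative rate) are about 0.56 for females and 0.72 for males, a ratio of about 0.78, while the false positive rate for females (0.013) is only about 18% of that for males (0.074). EOR (0.18) is therefore driven by the disparity in false positive rates. Mitigating bias using EOR means adjusting the model to achieve more balanced error rates across genders, thereby reducing disparities in both types of prediction errors (false positives and false negatives).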

Demographic Parity Ratio (DPR):

DPR primarily aims to ensure an equal distribution of positive outcomes (e.g. salary above 50K) between genders, regardless of prediction errors. In this example, DPR (0.29) indicates that the selection rate for females (the share predicted to earn above 50K, 0.080) is about 29% of that for males (0.281). Mitigating bias using DPR involves adjusting the model to achieve parity in positive outcome rates across genders, without necessarily addressing disparities in prediction errors.
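Plugging the per-group rates from fairness_df[0] into the standard definitions (difference = larger group value minus smaller, ratio = smaller divided by larger, with equalized odds taking the worse of the TPR and FPR comparisons) reproduces the four reported values. It also confirms that EOR is set by the false positive rates rather than the true positive rates. This is plain-Python arithmetic on the values copied from the table, with TPR taken as 1 minus the false negative rate.

```python
# Per-group rates copied from fairness_df[0] (baseline model).
female = {"fpr": 0.013245, "fnr": 0.443396, "selection": 0.080139}
male = {"fpr": 0.074387, "fnr": 0.283422, "selection": 0.280963}

tpr_f, tpr_m = 1 - female["fnr"], 1 - male["fnr"]  # about 0.5566 and 0.7166

dpd = male["selection"] - female["selection"]  # demographic parity difference
dpr = female["selection"] / male["selection"]  # demographic parity ratio
eod = max(abs(tpr_m - tpr_f), abs(male["fpr"] - female["fpr"]))
eor = min(tpr_f / tpr_m, female["fpr"] / male["fpr"])  # the FPR ratio is the smaller one

print(round(dpd, 2), round(dpr, 2), round(eod, 2), round(eor, 2))  # 0.2 0.29 0.16 0.18
```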

Following this diagnosis, we now attempt to mitigate the gender bias measured by the demographic parity ratio. First, we initialize the AutoML model with the fairness metric to use for bias mitigation.

Mitigation using demographic parity ratio

The first step in mitigation is to identify the sensitive feature in the data that introduces the bias and to specify a fairness metric appropriate to the task (classification or regression). To do this, we initialize the model with Gender as the sensitive variable and DPR as the metric. DPR defines the fairness metric that is optimized to achieve demographic parity in positive outcomes (salary above 50K) between gender groups. Other parameters that can be used are fairness_threshold and underprivileged_groups, but their default values are used here. Refer to the earlier link for more details.

automl_mitigation_dpr_obj = AutoML(data, sensitive_variables=['Gender'], fairness_metric='demographic_parity_ratio')

After creating the AutoML object from the data returned by prepare_tabulardata and the fairness parameters above, we train the model by calling the fit method as shown below. After training, all of the models and their variants are saved in a new folder.

automl_mitigation_dpr_obj.fit()
Neural Network algorithm was disabled because it doesn't support n_jobs parameter.
Linear algorithm was disabled.
AutoML directory: C:\Users\sup10432\AppData\Local\Temp\scratch\tmp__c4b209
The task is binary_classification with evaluation metric logloss
AutoML will use algorithms: ['Decision Tree', 'Random Trees', 'Extra Trees', 'LightGBM', 'Xgboost']
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'unfairness_mitigation', 'ensemble']
* Step simple_algorithms will try to check up to 1 model
DecisionTreeAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
1_DecisionTree logloss 0.361375 trained in 6.95 seconds
* Step default_algorithms will try to check up to 4 models
LightgbmAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
Exception while producing SHAP explanations. pandas dtypes must be int, float or bool.
Fields with bad pandas dtypes: Workclass: object, Education: object, Marital-status: object, Occupation: object, Relationship: object, Race: object, Gender: object, Native-country: object
Continuing ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
2_Default_LightGBM logloss 0.27688 trained in 5.77 seconds
There was an error during 3_Default_Xgboost training.
Please check C:\Users\sup10432\AppData\Local\Temp\scratch\tmp__c4b209\errors.md for details.
RandomForestAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
4_Default_RandomTrees logloss 0.338299 trained in 9.4 seconds
ExtraTreesAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
5_Default_ExtraTrees logloss 0.368012 trained in 8.57 seconds
* Step unfairness_mitigation will try to check up to 4 models
RandomForestAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
4_Default_RandomTrees_SampleWeigthing logloss 0.35729 trained in 9.24 seconds
LightgbmAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
Exception while producing SHAP explanations. pandas dtypes must be int, float or bool.
Fields with bad pandas dtypes: Workclass: object, Education: object, Marital-status: object, Occupation: object, Relationship: object, Race: object, Gender: object, Native-country: object
Continuing ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
2_Default_LightGBM_SampleWeigthing logloss 0.285305 trained in 5.4 seconds
ExtraTreesAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
5_Default_ExtraTrees_SampleWeigthing logloss 0.384304 trained in 8.8 seconds
DecisionTreeAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
1_DecisionTree_SampleWeigthing logloss 0.423913 trained in 6.93 seconds
* Step unfairness_mitigation_update_1 will try to check up to 4 models
ExtraTreesAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
5_Default_ExtraTrees_SampleWeigthing_Update_1 logloss 0.412036 trained in 14.03 seconds
LightgbmAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
Exception while producing SHAP explanations. pandas dtypes must be int, float or bool.
Fields with bad pandas dtypes: Workclass: object, Education: object, Marital-status: object, Occupation: object, Relationship: object, Race: object, Gender: object, Native-country: object
Continuing ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
2_Default_LightGBM_SampleWeigthing_Update_1 logloss 0.295114 trained in 5.58 seconds
DecisionTreeAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
1_DecisionTree_SampleWeigthing_Update_1 logloss 0.462531 trained in 6.81 seconds
RandomForestAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
4_Default_RandomTrees_SampleWeigthing_Update_1 logloss 0.377543 trained in 12.94 seconds
* Step unfairness_mitigation_update_2 will try to check up to 2 models
RandomForestAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
4_Default_RandomTrees_SampleWeigthing_Update_2 logloss 0.404245 trained in 9.3 seconds
LightgbmAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
Exception while producing SHAP explanations. pandas dtypes must be int, float or bool.
Fields with bad pandas dtypes: Workclass: object, Education: object, Marital-status: object, Occupation: object, Relationship: object, Race: object, Gender: object, Native-country: object
Continuing ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
2_Default_LightGBM_SampleWeigthing_Update_2 logloss 0.307829 trained in 5.91 seconds
* Step ensemble will try to check up to 1 model
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
Ensemble logloss 0.307829 trained in 3.64 seconds
AutoML fit time: 141.1 seconds
AutoML best model: 2_Default_LightGBM_SampleWeigthing_Update_2
All the evaluated models are saved in the path  C:\Users\sup10432\AppData\Local\Temp\scratch\tmp__c4b209

Once training finishes, the mitigation can be verified by reviewing the model report and examining the demographic parity ratio of the best trained model. Internally, AutoML uses an approach called reweighing for bias mitigation. Reweighing is a preprocessing method that adjusts the weights of the examples in each (group, label) combination to promote fairness before classification.
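The classic reweighing scheme of Kamiran and Calders assigns each (group, label) cell the weight P(group) × P(label) / P(group, label), so that group membership and label are statistically independent in the weighted data. The pandas sketch below implements that formula. Whether AutoML uses exactly this formula, and how its _Update_1 and _Update_2 rounds adjust the weights further, is not stated here, so treat it as an illustration of the idea rather than the library's code.

```python
import pandas as pd


def reweighing_weights(df: pd.DataFrame, group_col: str, label_col: str) -> pd.Series:
    """Per-row weights w(g, y) = P(g) * P(y) / P(g, y)."""
    n = len(df)
    p_group = df[group_col].value_counts() / n
    p_label = df[label_col].value_counts() / n
    p_joint = df.groupby([group_col, label_col]).size() / n
    # Probability expected if group and label were independent.
    expected = df[group_col].map(p_group) * df[label_col].map(p_label)
    # Observed joint probability, looked up for each row's (group, label) pair.
    keys = pd.MultiIndex.from_arrays([df[group_col], df[label_col]])
    observed = pd.Series(p_joint.reindex(keys).to_numpy(), index=df.index)
    return expected / observed


# Toy data: males are over-represented among positive labels.
toy = pd.DataFrame({"Gender": list("MMMMFFFF"), "Salary": [1, 1, 1, 0, 1, 0, 0, 0]})
print(reweighing_weights(toy, "Gender", "Salary").round(3).tolist())
```

Under-represented cells (here, high-earning females and low-earning males) receive weights above 1, and the resulting weights can be passed as sample weights to any estimator that accepts them, which matches the "_SampleWeigthing" suffix in the model names above.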

automl_mitigation_dpr_obj.report()
In case the report html is not rendered appropriately in the notebook, the same can be found in the path C:\Users\sup10432\AppData\Local\Temp\scratch\tmp__c4b209\README.html

AutoML Leaderboard

| Best model | name | model_type | metric_type | metric_value | train_time | fairness_metric | fairness_Gender | is_fair |
|---|---|---|---|---|---|---|---|---|
| | 1_DecisionTree | Decision Tree | logloss | 0.361375 | 7.75 | demographic_parity_ratio | 0.1344 | False |
| | 2_Default_LightGBM | LightGBM | logloss | 0.27688 | 6.52 | demographic_parity_ratio | 0.3252 | False |
| | 4_Default_RandomTrees | Random Trees | logloss | 0.338299 | 10.23 | demographic_parity_ratio | 0.332 | False |
| | 5_Default_ExtraTrees | Extra Trees | logloss | 0.368012 | 9.44 | demographic_parity_ratio | 0.2844 | False |
| | 4_Default_RandomTrees_SampleWeigthing | Random Trees | logloss | 0.35729 | 10.06 | demographic_parity_ratio | 0.3612 | False |
| | 2_Default_LightGBM_SampleWeigthing | LightGBM | logloss | 0.285305 | 6.22 | demographic_parity_ratio | 0.5264 | False |
| | 5_Default_ExtraTrees_SampleWeigthing | Extra Trees | logloss | 0.384304 | 9.67 | demographic_parity_ratio | 0.7682 | False |
| | 1_DecisionTree_SampleWeigthing | Decision Tree | logloss | 0.423913 | 7.76 | demographic_parity_ratio | 0.4991 | False |
| | 5_Default_ExtraTrees_SampleWeigthing_Update_1 | Extra Trees | logloss | 0.412036 | 14.89 | demographic_parity_ratio | 0.9246 | True |
| | 2_Default_LightGBM_SampleWeigthing_Update_1 | LightGBM | logloss | 0.295114 | 6.28 | demographic_parity_ratio | 0.6955 | False |
| | 1_DecisionTree_SampleWeigthing_Update_1 | Decision Tree | logloss | 0.462531 | 7.61 | demographic_parity_ratio | 0.4962 | False |
| | 4_Default_RandomTrees_SampleWeigthing_Update_1 | Random Trees | logloss | 0.377543 | 13.83 | demographic_parity_ratio | 0.7167 | False |
| | 4_Default_RandomTrees_SampleWeigthing_Update_2 | Random Trees | logloss | 0.404245 | 10.21 | demographic_parity_ratio | 0.8917 | True |
| the best | 2_Default_LightGBM_SampleWeigthing_Update_2 | LightGBM | logloss | 0.307829 | 6.62 | demographic_parity_ratio | 0.8406 | True |
| | Ensemble | Ensemble | logloss | 0.307829 | 3.64 | demographic_parity_ratio | 0.8406 | True |

AutoML Performance

AutoML Performance Boxplot

Performance vs fairness_Gender

Spearman Correlation of Models

The model report shows that 2_Default_LightGBM_SampleWeigthing_Update_2 is the best trained model, with a demographic_parity_ratio of 0.84, up from 0.29 for the baseline model and above the minimum threshold of 0.80. This suggests that the bias measured by this metric has been successfully mitigated. The model score, checked below, is also identical to the baseline score (0.744).
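The choice of best model in the leaderboard is consistent with a simple rule: among the models whose fairness value meets the 0.8 threshold, pick the one with the lowest log loss. The pandas sketch below applies that rule to a few rows copied from the leaderboard above; the rule is inferred from the table, not documented AutoML behaviour.

```python
import pandas as pd

# A subset of rows copied from the mitigated leaderboard above.
leaderboard = pd.DataFrame({
    "name": [
        "2_Default_LightGBM",
        "5_Default_ExtraTrees_SampleWeigthing_Update_1",
        "4_Default_RandomTrees_SampleWeigthing_Update_2",
        "2_Default_LightGBM_SampleWeigthing_Update_2",
    ],
    "logloss": [0.27688, 0.412036, 0.404245, 0.307829],
    "fairness_Gender": [0.3252, 0.9246, 0.8917, 0.8406],
})

threshold = 0.8  # minimum demographic parity ratio reported by fairness_score
fair = leaderboard[leaderboard["fairness_Gender"] >= threshold]
best = fair.loc[fair["logloss"].idxmin(), "name"]
print(best)  # 2_Default_LightGBM_SampleWeigthing_Update_2
```

The unconstrained winner, 2_Default_LightGBM, has a lower log loss (0.27688) but fails the threshold (0.3252), so the difference of about 0.031 in log loss is the price paid for fairness here.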

DPR mitigation Analysis

Model Performance Metrics Before and After Mitigation for female:

| | Accuracy (Female) | False Positive Rate (Female) | False Negative Rate (Female) | Selection Rate (Female) | Count (Female) |
|---|---|---|---|---|---|
| Before Mitigation | 0.933798 | 0.013245 | 0.443396 | 0.080139 | 0.330518 |
| After Mitigation | 0.8949 | 0.0979 | 0.1645 | 0.1782 | 0.2121 |

Model Performance Metrics Before and After Mitigation for male:

| | Accuracy (Male) | False Positive Rate (Male) | False Negative Rate (Male) | Selection Rate (Male) | Count (Male) |
|---|---|---|---|---|---|
| Before Mitigation | 0.858372 | 0.074387 | 0.283422 | 0.280963 | 0.669482 |
| After Mitigation | 0.8394 | 0.0473 | 0.4162 | 0.212 | 0.4391 |

Selection Rate:

Selection Rate can be defined as the proportion of samples from a specific sensitive group that were selected or predicted as positive by the model. For example, for the male group, a selection rate value of 0.2809 indicates that approximately 28.09 percent of male samples were predicted as positive outcomes by the model.

Before mitigation, the selection rate for females (0.0801) was significantly lower than for males (0.2809).

After mitigation, the selection rates have become more balanced, with males at 0.2120 and females at 0.1782. This indicates an improvement in demographic parity, ensuring more equitable selection between genders.

False Negative Rate:

Before mitigation, females had a much higher rate of being incorrectly classified as earning less than 50K (false negatives), at 0.4434, compared to 0.2834 for males.

After mitigation, the rate of females being incorrectly classified as earning less than 50k (false negatives) significantly decreased to 0.1645, indicating an improvement in correctly identifying females earning above 50k.

However, the rate of males being incorrectly classified as earning less than 50k (false negatives) increased from 0.2834 to 0.4162. This indicates that while the mitigation process improved the false negative rate for females, it had an adverse effect on the false negative rate for males.
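The trade-off described above can be quantified directly from the before and after tables: the selection-rate ratio moves toward 1, while the false-negative-rate gap between genders flips sign. This is plain-Python arithmetic on the values copied from those tables.

```python
# (selection rate, false negative rate) per gender, copied from the tables above.
before = {"Female": (0.080139, 0.443396), "Male": (0.280963, 0.283422)}
after = {"Female": (0.1782, 0.1645), "Male": (0.2120, 0.4162)}

summary = {}
for stage, rates in (("before", before), ("after", after)):
    (sel_f, fnr_f), (sel_m, fnr_m) = rates["Female"], rates["Male"]
    dpr = sel_f / sel_m        # demographic parity ratio
    fnr_gap = fnr_f - fnr_m    # positive: females are missed more often than males
    summary[stage] = (round(dpr, 2), round(fnr_gap, 2))
    print(stage, *summary[stage])
# before 0.29 0.16
# after 0.84 -0.25
```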

Overall Accuracy:

Overall accuracy, obtained by weighting each gender's accuracy by its count share, decreased from about 0.883 before mitigation (males: 0.858372, females: 0.933798) to about 0.8575 after mitigation. This is a modest drop of about 2.6 percentage points, which indicates that overall predictive performance was largely maintained.
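The overall figures are count-weighted averages of the per-gender accuracies. The check below assumes that the count column is each group's share of the evaluated rows; because the post-mitigation shares sum to 0.6512 rather than 1, the sketch divides by their total.

```python
def weighted_accuracy(groups):
    """Average of per-group accuracies weighted by each group's count share."""
    total = sum(count for _, count in groups.values())
    return sum(acc * count for acc, count in groups.values()) / total


# (accuracy, count) per gender, copied from the tables above.
before = {"Female": (0.933798, 0.330518), "Male": (0.858372, 0.669482)}
after = {"Female": (0.8949, 0.2121), "Male": (0.8394, 0.4391)}

print(round(weighted_accuracy(before), 4))  # 0.8833
print(round(weighted_accuracy(after), 4))   # 0.8575
```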

The mitigation strategy improved demographic parity by balancing the selection rates between males and females, but this came at the cost of increasing the false negative rate for males. This trade-off suggests that while aiming for fairness in selection rates, other metrics such as the false negative rate can be adversely affected.

The more balanced selection rates show progress towards demographic parity and a fairer selection process between genders. However, the increase in the false negative rate for males is a concern, as more males are incorrectly classified as negative cases after mitigation. Balancing fairness against performance metrics such as the false negative rate is crucial, and further adjustments or different mitigation techniques may be necessary to achieve a more equitable outcome without compromising accuracy.

automl_mitigation_dpr_obj.score()
elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
0.7439539347408829

Finally, the mitigated model is used to make predictions on unseen test data:

result_df = automl_mitigation_dpr_obj.predict(test, prediction_type="dataframe")
result_df.head(5)
(index) | Age | Workclass | Fnlwgt | Education | Education-num | Marital-status | Occupation | Relationship | Race | Gender | Capital-gain | Capital-loss | Hours-per-week | Native-country | Salary | annual_salary_$ | prediction_results | prediction_confidence
24507 | 57 | Private | 89182 | HS-grad | 9 | Widowed | Adm-clerical | Not-in-family | White | Female | 0 | 0 | 40 | United-States | <=50K | 90732 | <=50K | 0.834847
28351 | 33 | Private | 159548 | Some-college | 10 | Divorced | Adm-clerical | Unmarried | Black | Female | 0 | 0 | 38 | United-States | <=50K | 87710 | <=50K | 0.974263
717 | 19 | State-gov | 378418 | HS-grad | 9 | Never-married | Tech-support | Own-child | White | Female | 0 | 0 | 40 | United-States | <=50K | 64787 | <=50K | 0.998881
19417 | 44 | Private | 151985 | Masters | 14 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 24 | United-States | >50K | 83582 | >50K | 0.939446
16746 | 23 | Private | 406641 | Some-college | 10 | Never-married | Handlers-cleaners | Other-relative | White | Female | 0 | 0 | 18 | United-States | <=50K | 86347 | <=50K | 0.998467

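In the predicted dataframe, the prediction_results column contains the model's predictions. To validate them, they are compared with the actual values in the Salary column. The resulting accuracy of about 0.863 shows that predictive performance is largely preserved after mitigation. These predictions are less biased than those of the unmitigated model, although, as the analysis above shows, some disparities remain, notably the higher false negative rate for males.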

accuracy = accuracy_score(result_df["Salary"], result_df['prediction_results'])
print(accuracy)
0.8628896054045755
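A single overall accuracy can hide differences between groups. Since result_df keeps the Gender column, accuracy can also be broken down by gender. The sketch below uses a small hypothetical stand-in for result_df containing only the columns it needs; with the real dataframe the same two lines apply unchanged.

```python
import pandas as pd
from sklearn.metrics import accuracy_score

# Hypothetical stand-in for result_df (only the columns used here).
result_df = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male", "Male"],
    "Salary": ["<=50K", ">50K", ">50K", "<=50K", ">50K"],
    "prediction_results": ["<=50K", ">50K", "<=50K", "<=50K", ">50K"],
})

overall = accuracy_score(result_df["Salary"], result_df["prediction_results"])

# Per-group accuracy: share of correct predictions within each gender.
correct = (result_df["Salary"].str.strip()
           == result_df["prediction_results"].str.strip())
per_group_accuracy = correct.groupby(result_df["Gender"]).mean()
print(overall)
print(per_group_accuracy)
```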

Mitigation using Equalized Odds Ratio

To address some of the shortcomings of Demographic Parity Ratio (DPR), let's mitigate the model using Equalized Odds Ratio (EOR). EOR aims to balance fairness and performance metrics by considering both false positive and false negative outcomes.

The aim of the Equalized Odds fairness metric is to ensure that a machine learning model performs equally well across different demographic groups. Demographic parity only requires the model's predictions to be independent of membership in the female or male sensitive group. Equalized Odds instead requires the predictions to be independent of group membership conditional on the true outcome, which means that both the true positive rate and the false positive rate must be equal across groups. This distinction matters because a model can achieve demographic parity, with similar selection rates for each group, while still producing far more false positive predictions for one group than for another. Equalized Odds addresses this concern by requiring parity in both false positive and true positive rates across all groups. Unlike demographic parity, Equalized Odds does not introduce the selection issue discussed earlier. In the present scenario, where the objective is to predict salary without gender bias, it is important that the model performs equally well in predicting the appropriate salary class for both groups.
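One common way to turn this criterion into a ratio (for example, Fairlearn's equalized_odds_ratio with its default between-groups method) is to compute the ratio of the smallest to the largest group rate separately for the true positive rate and the false positive rate, and then take the smaller of the two. The sketch below implements that definition on hypothetical 0/1 toy labels; that AutoML computes its equalized_odds_ratio in exactly this way is an assumption.

```python
import numpy as np


def rate_ratio(rates):
    """Smallest group rate divided by the largest (1.0 if all rates are zero)."""
    largest = max(rates)
    return min(rates) / largest if largest > 0 else 1.0


def equalized_odds_ratio(y_true, y_pred, groups):
    """min(TPR ratio, FPR ratio) across sensitive groups."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    groups = np.asarray(groups)
    tprs, fprs = [], []
    for g in np.unique(groups):
        in_group = groups == g
        tprs.append(y_pred[in_group & y_true].mean())   # P(pred=1 | y=1, group)
        fprs.append(y_pred[in_group & ~y_true].mean())  # P(pred=1 | y=0, group)
    return min(rate_ratio(tprs), rate_ratio(fprs))


# Toy data (1 = ">50K"): Female TPR 0.75, FPR 0.25; Male TPR 0.5, FPR 0.5.
y_true = [1, 1, 1, 1, 0, 0, 0, 0] * 2
y_pred = [1, 1, 1, 0, 1, 0, 0, 0] + [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["Female"] * 8 + ["Male"] * 8
eor = equalized_odds_ratio(y_true, y_pred, groups)
print(eor)  # 0.5, limited by the FPR ratio 0.25 / 0.5
```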

automl_mitigation_eqr_obj = AutoML(data, sensitive_variables=['Gender'], fairness_metric='equalized_odds_ratio')
automl_mitigation_eqr_obj.fit()
Neural Network algorithm was disabled because it doesn't support n_jobs parameter.
Linear algorithm was disabled.
AutoML directory: C:\Users\sup10432\AppData\Local\Temp\scratch\tmppbdan7vj
The task is binary_classification with evaluation metric logloss
AutoML will use algorithms: ['Decision Tree', 'Random Trees', 'Extra Trees', 'LightGBM', 'Xgboost']
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'unfairness_mitigation', 'ensemble']
* Step simple_algorithms will try to check up to 1 model
DecisionTreeAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
1_DecisionTree logloss 0.361375 trained in 9.32 seconds
* Step default_algorithms will try to check up to 4 models
LightgbmAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
Exception while producing SHAP explanations. pandas dtypes must be int, float or bool.
Fields with bad pandas dtypes: Workclass: object, Education: object, Marital-status: object, Occupation: object, Relationship: object, Race: object, Gender: object, Native-country: object
Continuing ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
2_Default_LightGBM logloss 0.27688 trained in 6.03 seconds
There was an error during 3_Default_Xgboost training.
Please check C:\Users\sup10432\AppData\Local\Temp\scratch\tmppbdan7vj\errors.md for details.
RandomForestAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
4_Default_RandomTrees logloss 0.338299 trained in 10.02 seconds
ExtraTreesAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
5_Default_ExtraTrees logloss 0.368012 trained in 9.05 seconds
* Step unfairness_mitigation will try to check up to 4 models
LightgbmAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
Exception while producing SHAP explanations. pandas dtypes must be int, float or bool.
Fields with bad pandas dtypes: Workclass: object, Education: object, Marital-status: object, Occupation: object, Relationship: object, Race: object, Gender: object, Native-country: object
Continuing ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
2_Default_LightGBM_SampleWeigthing logloss 0.285305 trained in 5.86 seconds
RandomForestAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
4_Default_RandomTrees_SampleWeigthing logloss 0.35729 trained in 9.89 seconds
ExtraTreesAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
5_Default_ExtraTrees_SampleWeigthing logloss 0.384304 trained in 9.6 seconds
DecisionTreeAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
1_DecisionTree_SampleWeigthing logloss 0.423913 trained in 7.02 seconds
* Step unfairness_mitigation_update_1 will try to check up to 4 models
LightgbmAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
Exception while producing SHAP explanations. pandas dtypes must be int, float or bool.
Fields with bad pandas dtypes: Workclass: object, Education: object, Marital-status: object, Occupation: object, Relationship: object, Race: object, Gender: object, Native-country: object
Continuing ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
2_Default_LightGBM_SampleWeigthing_Update_1 logloss 0.295114 trained in 6.54 seconds
ExtraTreesAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
5_Default_ExtraTrees_SampleWeigthing_Update_1 logloss 0.412036 trained in 14.37 seconds
DecisionTreeAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
1_DecisionTree_SampleWeigthing_Update_1 logloss 0.462531 trained in 6.86 seconds
RandomForestAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
4_Default_RandomTrees_SampleWeigthing_Update_1 logloss 0.377543 trained in 13.15 seconds
* Step unfairness_mitigation_update_2 will try to check up to 1 model
RandomForestAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
4_Default_RandomTrees_SampleWeigthing_Update_2 logloss 0.404245 trained in 9.84 seconds
* Step ensemble will try to check up to 1 model
Ensemble not trained. Can't contruct fair ensemble.
AutoML fit time: 137.57 seconds
AutoML best model: 2_Default_LightGBM_SampleWeigthing
AutoML can't construct model that meets your fairness criteria.
What you can do?
1. Please include more samples that are not biased.
2. Please examine the most unfairly treated samples.
3. Please change fairness threshold.
All the evaluated models are saved in the path  C:\Users\sup10432\AppData\Local\Temp\scratch\tmppbdan7vj

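Once training completes, the effect of the mitigation can be checked by reviewing the model report and examining the Equalized Odds Ratio of the best model. Note that the training log above reports that AutoML could not construct a model meeting the fairness criteria, so this run did not fully mitigate the bias.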

automl_mitigation_eqr_obj.report()
In case the report html is not rendered appropriately in the notebook, the same can be found in the path C:\Users\sup10432\AppData\Local\Temp\scratch\tmppbdan7vj\README.html

AutoML Leaderboard

name | model_type | metric_type | metric_value | train_time | fairness_metric | fairness_Gender | is_fair
1_DecisionTree | Decision Tree | logloss | 0.361375 | 10.13 | equalized_odds_ratio | 0.0153 | False
2_Default_LightGBM | LightGBM | logloss | 0.27688 | 6.85 | equalized_odds_ratio | 0.2679 | False
4_Default_RandomTrees | Random Trees | logloss | 0.338299 | 10.94 | equalized_odds_ratio | 0.2314 | False
5_Default_ExtraTrees | Extra Trees | logloss | 0.368012 | 9.97 | equalized_odds_ratio | 0.1706 | False
2_Default_LightGBM_SampleWeigthing (best model) | LightGBM | logloss | 0.285305 | 6.56 | equalized_odds_ratio | 0.7195 | False
4_Default_RandomTrees_SampleWeigthing | Random Trees | logloss | 0.35729 | 10.76 | equalized_odds_ratio | 0.3123 | False
5_Default_ExtraTrees_SampleWeigthing | Extra Trees | logloss | 0.384304 | 10.51 | equalized_odds_ratio | 0.6825 | False
1_DecisionTree_SampleWeigthing | Decision Tree | logloss | 0.423913 | 7.89 | equalized_odds_ratio | 0.6193 | False
2_Default_LightGBM_SampleWeigthing_Update_1 | LightGBM | logloss | 0.295114 | 7.33 | equalized_odds_ratio | 0.6645 | False
5_Default_ExtraTrees_SampleWeigthing_Update_1 | Extra Trees | logloss | 0.412036 | 15.37 | equalized_odds_ratio | 0.5769 | False
1_DecisionTree_SampleWeigthing_Update_1 | Decision Tree | logloss | 0.462531 | 7.68 | equalized_odds_ratio | 0.0249 | False
4_Default_RandomTrees_SampleWeigthing_Update_1 | Random Trees | logloss | 0.377543 | 14.03 | equalized_odds_ratio | 0.6816 | False
4_Default_RandomTrees_SampleWeigthing_Update_2 | Random Trees | logloss | 0.404245 | 10.65 | equalized_odds_ratio | 0.463 | False

AutoML Performance

AutoML Performance Boxplot

Performance vs fairness_Gender

Spearman Correlation of Models

The model report shows that 2_Default_LightGBM_SampleWeigthing is the best model. Its EOR improved substantially, from about 0.18 to 0.7195, but this still falls short of the threshold of 0.8, so AutoML was not able to construct a model it considers fair. Because 0.7195 is reasonably close to 0.8, the fairness_threshold parameter can be used to lower the EOR threshold to 0.70 so that this model is formally considered fair.
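The effect of the threshold can be illustrated on a few rows of the leaderboard above: a model counts as fair when its EOR meets the threshold, and the best fair model is the one with the lowest logloss among those. This is a sketch of that selection logic only; the helper best_fair_model is hypothetical, and the use of a "greater than or equal" comparison inside AutoML is an assumption.

```python
import pandas as pd

# A subset of the leaderboard above (logloss and equalized_odds_ratio per model).
leaderboard = pd.DataFrame({
    "name": ["2_Default_LightGBM",
             "2_Default_LightGBM_SampleWeigthing",
             "5_Default_ExtraTrees_SampleWeigthing",
             "4_Default_RandomTrees_SampleWeigthing_Update_1"],
    "logloss": [0.27688, 0.285305, 0.384304, 0.377543],
    "eor": [0.2679, 0.7195, 0.6825, 0.6816],
})


def best_fair_model(board, threshold):
    """Hypothetical helper: lowest-logloss model whose EOR meets the threshold."""
    fair = board[board["eor"] >= threshold]
    if fair.empty:
        return None
    return fair.loc[fair["logloss"].idxmin(), "name"]


print(best_fair_model(leaderboard, 0.8))   # None
print(best_fair_model(leaderboard, 0.70))  # 2_Default_LightGBM_SampleWeigthing
```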

Reducing the threshold for a successful mitigation

Since AutoML could not find a model that satisfies an EOR threshold of 0.8, we can formalize the marked improvement in EOR (from about 0.18 to 0.72) by lowering the threshold to 0.70 through the fairness_threshold parameter and retraining the model.

automl_mitigation_eqr_obj = AutoML(data, sensitive_variables=['Gender'], fairness_metric='equalized_odds_ratio', fairness_threshold=0.70)
automl_mitigation_eqr_obj.fit()
Neural Network algorithm was disabled because it doesn't support n_jobs parameter.
Linear algorithm was disabled.
AutoML directory: C:\Users\sup10432\AppData\Local\Temp\scratch\tmpejag8d56
The task is binary_classification with evaluation metric logloss
AutoML will use algorithms: ['Decision Tree', 'Random Trees', 'Extra Trees', 'LightGBM', 'Xgboost']
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'unfairness_mitigation', 'ensemble']
* Step simple_algorithms will try to check up to 1 model
DecisionTreeAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
1_DecisionTree logloss 0.361375 trained in 7.8 seconds
* Step default_algorithms will try to check up to 4 models
LightgbmAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
Exception while producing SHAP explanations. pandas dtypes must be int, float or bool.
Fields with bad pandas dtypes: Workclass: object, Education: object, Marital-status: object, Occupation: object, Relationship: object, Race: object, Gender: object, Native-country: object
Continuing ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
2_Default_LightGBM logloss 0.27688 trained in 6.33 seconds
There was an error during 3_Default_Xgboost training.
Please check C:\Users\sup10432\AppData\Local\Temp\scratch\tmpejag8d56\errors.md for details.
RandomForestAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
4_Default_RandomTrees logloss 0.338299 trained in 10.26 seconds
ExtraTreesAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
5_Default_ExtraTrees logloss 0.368012 trained in 9.09 seconds
* Step unfairness_mitigation will try to check up to 4 models
LightgbmAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
Exception while producing SHAP explanations. pandas dtypes must be int, float or bool.
Fields with bad pandas dtypes: Workclass: object, Education: object, Marital-status: object, Occupation: object, Relationship: object, Race: object, Gender: object, Native-country: object
Continuing ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
2_Default_LightGBM_SampleWeigthing logloss 0.285305 trained in 5.99 seconds
RandomForestAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
4_Default_RandomTrees_SampleWeigthing logloss 0.35729 trained in 9.24 seconds
ExtraTreesAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
5_Default_ExtraTrees_SampleWeigthing logloss 0.384304 trained in 9.19 seconds
DecisionTreeAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
1_DecisionTree_SampleWeigthing logloss 0.423913 trained in 7.03 seconds
* Step unfairness_mitigation_update_1 will try to check up to 3 models
ExtraTreesAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
5_Default_ExtraTrees_SampleWeigthing_Update_1 logloss 0.412036 trained in 13.99 seconds
DecisionTreeAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
1_DecisionTree_SampleWeigthing_Update_1 logloss 0.462531 trained in 6.84 seconds
RandomForestAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
4_Default_RandomTrees_SampleWeigthing_Update_1 logloss 0.377543 trained in 13.35 seconds
* Step unfairness_mitigation_update_2 will try to check up to 1 model
RandomForestAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Problem during computing permutation importance. Skipping ...
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
4_Default_RandomTrees_SampleWeigthing_Update_2 logloss 0.404245 trained in 9.61 seconds
* Step ensemble will try to check up to 1 model
y_true takes value in {' <=50K', ' >50K'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.
Ensemble logloss 0.285305 trained in 3.09 seconds
AutoML fit time: 131.81 seconds
AutoML best model: 2_Default_LightGBM_SampleWeigthing
All the evaluated models are saved in the path  C:\Users\sup10432\AppData\Local\Temp\scratch\tmpejag8d56
automl_mitigation_eqr_obj.report()
In case the report html is not rendered appropriately in the notebook, the same can be found in the path C:\Users\sup10432\AppData\Local\Temp\scratch\tmpejag8d56\README.html

AutoML Leaderboard

name | model_type | metric_type | metric_value | train_time | fairness_metric | fairness_Gender | is_fair
1_DecisionTree | Decision Tree | logloss | 0.361375 | 8.58 | equalized_odds_ratio | 0.0153 | False
2_Default_LightGBM | LightGBM | logloss | 0.27688 | 7.21 | equalized_odds_ratio | 0.2679 | False
4_Default_RandomTrees | Random Trees | logloss | 0.338299 | 11.17 | equalized_odds_ratio | 0.2314 | False
5_Default_ExtraTrees | Extra Trees | logloss | 0.368012 | 9.95 | equalized_odds_ratio | 0.1706 | False
2_Default_LightGBM_SampleWeigthing (best model) | LightGBM | logloss | 0.285305 | 6.76 | equalized_odds_ratio | 0.7195 | True
4_Default_RandomTrees_SampleWeigthing | Random Trees | logloss | 0.35729 | 10.11 | equalized_odds_ratio | 0.3123 | False
5_Default_ExtraTrees_SampleWeigthing | Extra Trees | logloss | 0.384304 | 10.02 | equalized_odds_ratio | 0.6825 | False
1_DecisionTree_SampleWeigthing | Decision Tree | logloss | 0.423913 | 7.95 | equalized_odds_ratio | 0.6193 | False
5_Default_ExtraTrees_SampleWeigthing_Update_1 | Extra Trees | logloss | 0.412036 | 14.95 | equalized_odds_ratio | 0.5769 | False
1_DecisionTree_SampleWeigthing_Update_1 | Decision Tree | logloss | 0.462531 | 7.73 | equalized_odds_ratio | 0.0249 | False
4_Default_RandomTrees_SampleWeigthing_Update_1 | Random Trees | logloss | 0.377543 | 14.25 | equalized_odds_ratio | 0.6816 | False
4_Default_RandomTrees_SampleWeigthing_Update_2 | Random Trees | logloss | 0.404245 | 10.42 | equalized_odds_ratio | 0.463 | False
Ensemble | Ensemble | logloss | 0.285305 | 3.09 | equalized_odds_ratio | 0.7195 | True

AutoML Performance

AutoML Performance Boxplot

Performance vs fairness_Gender

Spearman Correlation of Models

EOR mitigation Analysis

Model Performance Metrics Before and After Equalized Odds Ratio Mitigation for female:

Stage | Accuracy | False Positive Rate (FPR) | False Negative Rate (FNR) | Selection Rate
Before Mitigation | 0.9337 | 0.0132 | 0.4433 | 0.0801
After Mitigation | 0.9312 | 0.0513 | 0.2121 | 0.1315

Model Performance Metrics Before and After Equalized Odds Ratio Mitigation for male:

Stage | Accuracy | False Positive Rate (FPR) | False Negative Rate (FNR) | Selection Rate
Before Mitigation | 0.8584 | 0.0744 | 0.2834 | 0.2810
After Mitigation | 0.8440 | 0.0713 | 0.3472 | 0.2498

The model report now shows that the best model is fair. However, the comparison tables above show that the mitigation strategy has led to mixed results:

Improvements: The female FNR dropped from 0.4433 to 0.2121, reducing bias against females by lowering the rate of false negatives. The female selection rate rose from 0.0801 to 0.1315, giving females a fairer representation among positive predictions. The male FPR decreased slightly, from 0.0744 to 0.0713.

Drawbacks: While the mitigation balanced certain metrics across genders, it also introduced new disparities: the female FPR increased from 0.0132 to 0.0513, and the male FNR increased from 0.2834 to 0.3472. Further fine-tuning of the mitigation technique, including adding more data, might be necessary to achieve a more balanced and fair outcome across all metrics.
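The EOR values in the leaderboard can be cross-checked against these tables, assuming EOR is the smaller of the TPR ratio and the FPR ratio between the two groups (with TPR = 1 - FNR). The sketch below plugs in the rounded rates shown above; eor_from_rates is a hypothetical helper.

```python
# Rates from the EOR mitigation tables, as (female, male); TPR = 1 - FNR.
tables = {
    "before": {"fpr": (0.0132, 0.0744), "fnr": (0.4433, 0.2834)},
    "after": {"fpr": (0.0513, 0.0713), "fnr": (0.2121, 0.3472)},
}


def eor_from_rates(fpr, fnr):
    """Hypothetical helper: min of the TPR ratio and FPR ratio across groups."""
    tpr = [1 - x for x in fnr]
    fpr_ratio = min(fpr) / max(fpr)
    tpr_ratio = min(tpr) / max(tpr)
    return min(fpr_ratio, tpr_ratio)


for stage, r in tables.items():
    print(stage, round(eor_from_rates(r["fpr"], r["fnr"]), 4))
# before 0.1774
# after 0.7195
```

The after value matches the 0.7195 reported for 2_Default_LightGBM_SampleWeigthing, and in both cases the FPR ratio is the binding term (the TPR ratio after mitigation is about 0.83), which suggests that narrowing the remaining false positive gap is what would push EOR towards 0.8.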

Conclusion

In this study, we explored the application of fairness metrics in machine learning, particularly focusing on the limitations and benefits of Demographic Parity Ratio (DPR) and Equalized Odds Ratio (EOR) for fairness assessment.

First, we performed an initial fairness assessment of the salary prediction model using the demographic dataset and a vanilla AutoML workflow. The initial model showed discrepancies in fairness metrics, particularly higher false positive rates for certain groups, as revealed by the Demographic Parity Ratio (DPR) and the Equalized Odds Ratio (EOR).

Subsequently, fairness mitigation was performed first with DPR and then with EOR. While DPR addressed some aspects of fairness, it fell short in balancing false positive and false negative rates across groups, leading to suboptimal fairness outcomes. Mitigating with the Equalized Odds Ratio metric then provided a more comprehensive fairness assessment by targeting equal false positive and true positive rates across all groups, thereby addressing the limitations observed with DPR.

Finally, lowering the fairness threshold allowed AutoML to construct a fair model, which in turn enabled it to build an ensemble. When AutoML cannot construct a fair model, no ensemble is created, as seen in the first EOR run.

Although some bias may still be present in the model, the mitigation workflow reduced it significantly. Continuous evaluation and refinement of the fairness workflow will therefore be crucial for achieving more equitable machine learning models and unbiased decision-making.

Data resources

Dataset | Citation | Link
Census Income dataset | Extraction was done by Barry Becker from the 1994 Census database | https://archive.ics.uci.edu/dataset/20/census+income
                                                  ------End-----
