Making AI discrimination-free using open-source bias mitigation strategies

Written by
Florian van der Steen & Juno Prent

Artificial Intelligence (AI) has seen immense improvements over the past ten years. Computers are now able to predict the music you will like, drive cars and help doctors diagnose cancer. AI is applied in various areas, and decisions that are made algorithmically can have serious consequences. In tasks like predicting whether someone should get a loan, a job or even a shortened prison sentence, the impact of a decision is huge. In these areas, algorithmic bias has a social impact for which the deployers bear responsibility. Handling this responsibility well is incentivized by law and called for by researchers and media alike. All of these voices plead for algorithmic fairness, the notion that AI models should adhere to some standard to ensure they do not discriminate based on certain features. With the previous Dutch government falling over a childcare benefits scandal, partially caused by a discriminating algorithm, awareness of algorithmic bias seems more important than ever.

Human Resources and Algorithmic Fairness

An area where algorithmic bias can be particularly problematic is Human Resources (HR). HR data is often very rich and is likely to contain various attributes that are legally protected, such as ethnicity, age and gender. These are called protected attributes. Furthermore, a diverse workforce is not only legally incentivized, it also increases revenue. With 40% of international companies utilising AI to make decisions on (future) employees, it is important to keep these algorithmic decision makers in check. Because simply omitting protected features from the input of a model does not negate bias (other features can act as proxies for them) and only decreases performance, there is an ongoing quest for the best methods to mitigate bias. Fairness and performance of a model are often a trade-off, which further complicates the search for a solution.

This blog post assesses some of the available methods that enhance fairness in AI models and compares them on three data sets in the domain of HR. It ends with a set of recommendations to guide the reader in choosing the best method for their application.

Fairness Metrics

Algorithmic fairness is expressed as a mathematical standard. There are various fairness definitions, but a particularly interesting one is Equalized Odds. It entails that the rates of correct and incorrect classifications for the favorable class (the true positive rate and the false positive rate) should be equal across groups. In this blog post we make use of the Intersectional Equalized Odds Ratio (IEOR), which is a definition based on subgroups. If groups are defined based on one protected attribute, e.g. black people, then subgroups are defined based on more than one, e.g. black females. IEOR is the minimum Equalized Odds value over all the subgroups divided by the maximum Equalized Odds value over all the subgroups. The closer the value is to 1, the less disparity there is among subgroups.
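To make the min/max ratio concrete, here is a minimal sketch in plain Python. It is not the implementation used in our experiment. It assumes that the "Equalized Odds value" of a subgroup consists of its true positive rate and its false positive rate, that each rate gets its own min/max ratio, and that the smaller of the two ratios is reported. The function name intersectional_eo_ratio and the toy data are hypothetical.

```python
from collections import defaultdict


def rate(numerator, denominator):
    """Return numerator / denominator, or None when the rate is undefined."""
    return numerator / denominator if denominator else None


def intersectional_eo_ratio(y_true, y_pred, subgroups):
    """Smallest min/max ratio of per-subgroup TPR and FPR (1.0 = no disparity).

    Assumption: a subgroup's 'Equalized Odds value' is its true positive
    rate and its false positive rate, each compared across subgroups.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, subgroups):
        # "t"/"f": prediction correct or not; "p"/"n": predicted class.
        key = ("t" if truth == pred else "f") + ("p" if pred == 1 else "n")
        counts[group][key] += 1

    ratios = []
    # TPR = tp / (tp + fn), FPR = fp / (fp + tn)
    for hit, miss in (("tp", "fn"), ("fp", "tn")):
        rates = [rate(c[hit], c[hit] + c[miss]) for c in counts.values()]
        rates = [r for r in rates if r is not None]
        if rates:
            highest = max(rates)
            # If every subgroup has rate 0, there is no disparity.
            ratios.append(min(rates) / highest if highest > 0 else 1.0)
    return min(ratios) if ratios else 1.0


# Toy data: subgroups combine two protected attributes (gender, age band).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 1, 0]
subgroups = [("F", "young")] * 4 + [("M", "old")] * 4
print(intersectional_eo_ratio(y_true, y_pred, subgroups))  # 0.5
```

In the toy data both subgroups have a false positive rate of 0.5, but their true positive rates are 1.0 and 0.5, so the reported ratio is 0.5.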

Fairness Frameworks

In an attempt to both quantify and mitigate algorithmic bias, many different fairness frameworks have been developed in recent times. Well-known examples of these are AI Fairness 360, Fairlearn and Scikit-fairness. These frameworks contain several pre-, in- and/or post-processing mitigation strategies to enhance fairness, along with metrics with which fairness can be measured. These different types of methods will be further explained later on.

Bias in All Flavours

Although a bias can certainly be mitigated, explaining its origins is a different task altogether. Bias can have many different causes, with up to 23 different types being recognized. Although our experiment did not treat specific types of bias differently, it is nonetheless interesting to look at what can lead to biased results. For instance, the way a data set is created can be biased, such as when it overrepresents one group compared to another, or when its creator is biased towards a certain group and thus gives members of that group higher scores than they should receive. Bias can also arise from the way an originally unbiased data set is used, for instance when a classifier has been pre-trained to prefer one group over another or when the privileged group is intentionally oversampled.

Different Approaches, Similar Goals

Before we dig into the pros and cons of each method, some background information is necessary. Bias mitigation strategies can be categorised into three types: pre-processing, in-processing and post-processing techniques. Pre-processing methods manipulate the data to eliminate bias before a machine learning (ML) model can learn that bias from the data. Pre-processing techniques often alter the distribution of the protected attributes or perform transformations on them. In-processing bias mitigation strategies manipulate the model to mitigate bias that appears during the training process. In-processing techniques often have fairness as a second optimization objective, next to the objective of making the most accurate predictions. Post-processing methods target bias present in the output: they change the predictions of a model to improve fairness directly. In the upcoming sections, benefits and limitations of each type will be discussed.


Pre-processing

➕ Advantages

An advantage of using pre-processing mitigation strategies is that, after they have altered the given data set, the parameters of the original machine learning model can be optimized once more on this new version of the data set. In general, this leads to consistently high performance scores and thus only a minor loss in performance compared to a scenario in which no mitigation strategy was applied. Additionally, it can be an advantage to remove bias as early as possible within a larger process. Pre-processing strategies do exactly that, as they strive to remove bias before a model makes any predictions.

➖ Disadvantages

A large potential disadvantage of pre-processing mitigation strategies is that many of them (e.g. AIF360’s disparate impact remover or Scikit-fairness’ information filter) are invasive, which means they alter the actual data within a given data set. This might, depending on the use case, be undesired or even prohibited by law, leaving only a small selection of actually usable pre-processing mitigation strategies (such as AIF360’s reweighing).

Another disadvantage of many pre-processing mitigation strategies is their inability to take any non-binary protected features as input. Being forced to change a column containing people’s ages into a column which only contains whether someone’s age is above or below average may lead to significant losses of information.
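A small sketch of the information loss described above, using only the Python standard library. The helper name binarize_by_mean and the ages are made up for illustration.

```python
from statistics import mean


def binarize_by_mean(values):
    """Replace each value by whether it lies above the mean of all values."""
    threshold = mean(values)
    return [value > threshold for value in values]


ages = [22, 34, 35, 61]  # hypothetical employees, mean age 38
print(binarize_by_mean(ages))  # [False, False, False, True]
```

After binarization, the 22-year-old and the 35-year-old are indistinguishable, even though a model or a fairness metric might need to treat them differently.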

A third potential disadvantage is that some pre-processing mitigation strategies are unable to deal with multiple protected features. Although some of them are able to do so indirectly (i.e. by applying the strategy to each protected feature individually), others (such as the aforementioned reweighing by AIF360) are truly best applied to just one protected feature. This does not translate well to HR data sets, which almost always contain at least the gender and age of each person (if not also marital status, ethnicity and more).


In-processing

➕ Advantages

An advantage of in-processing methods is that they provide flexibility in the fairness metric that they improve. Some in-processing methods even give mathematical guarantees of adhering to a particular fairness notion. Pre- and post-processing methods typically have neither this flexibility nor this mathematical certainty. In-processing methods are the most recent development in fairness research and are therefore held to a high standard, as their performance is compared with that of all earlier methods.

➖ Disadvantages

A limitation of in-processing methods is that they optimize fairness for a particular fairness constraint, and don’t improve fairness in a general sense. Models become ‘artificially’ fair and this can result in classifiers making randomized decisions or labeling every instance as the majority class.

In-processors give mathematical guarantees but these do not hold when multiple protected features are taken into account, due to the added mathematical complexity. It is therefore recommended to use them with binary sensitive features. A third disadvantage is that in-processing methods often are computationally expensive and applying these methods to large data sets is quite an investment timewise.

A final limitation of in-processing methods is that they require a certain model interface, in the form of returning predictions and probabilities. When used with models incapable of returning probabilities, some in-processing methods fail to work. Models returning predictions in the wrong format require extra steps which can be a nuisance.


Post-processing

➕ Advantages

Post-processing methods are the most direct form of increasing fairness, as they alter the outcomes of a model, making a direct impact. This is great because it acts as if someone were to look at your model and tell you: “This is discrimination!”, and then changed the output, directly lessening discrimination. Post-processing methods are very pleasant to work with as they are model agnostic, uninfluenced by the inner workings of the ML model. Another advantage of post-processing methods is that most are capable of handling multiple protected features and can lessen discrimination for multiple groups at the same time. Therefore they can also be used with categorical features such as age groupings.

➖ Disadvantages

Post-processing methods share a problem with in-processing methods: they are tailored to a specific fairness metric. Post-processing methods are often implemented such that the post-processor has a particular fairness constraint in mind when transforming the output. This gives rise to problems where the classifier overfits and assigns every instance to the same class.

A problem that is inherent to post-processing methods is that they ‘throw away’ all the effort put into generating the output. Using post-processors may therefore be more effective when the post-processor gives recommendations to change an outcome, instead of them directly changing outcomes without collaboration.

Overview of Chosen Methods

Short explanations of each method are provided here to give a better understanding of how each method works. Firstly, we'll take a look at the pre-processing methods. Reweighing is a method that calculates new weights for the training instances, which causes the unprivileged groups to be more important for the machine learning model. This method only works for one protected feature, so there are two variants of this method: one for gender and one for age. The correlation remover is fairly straightforward, as it removes the correlation between the protected features and the rest of the features. The information filter technique aims to do so as well, but by applying the Gram-Schmidt process to the features. The disparate impact remover moves the values of all the unprotected features closer to each other such that there is less disparity between groups, while learning fair representations tries to hide information regarding the protected features by turning the achievement of fairness into an optimization problem.
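To illustrate the idea behind reweighing, the sketch below computes one weight per training instance for a single protected feature. It is not AIF360's code. It assumes the common expected-over-observed scheme, in which the weight of an instance with group g and label y is P(g) · P(y) / P(g, y). The function name reweighing_weights and the toy data are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each instance by expected / observed frequency of (group, label).

    Combinations that occur less often than they would if group and label
    were independent receive a weight above 1, the others a weight below 1.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Toy data: women rarely have the favorable label 1, men often do.
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
labels = [1, 0, 0, 0, 1, 1, 1, 0]
weights = reweighing_weights(groups, labels)
print([round(w, 2) for w in weights])
# [2.0, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 2.0]
```

The weights can then be passed as sample weights to a classifier that supports them. Note that the data itself is left untouched, which is why this strategy is not invasive.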

Then we arrive at the in-processing methods, which are tasked with finding a balance between performance and fairness. Threshold optimization finds an optimal cutoff for the probability value at which a prediction should be classed as positive, while exponentiated gradient reductions relax the fairness constraints and randomize some of the output to ensure fair predictions. As for the post-processing mitigation strategies, reject option classification looks at classifications close to the decision boundary and favorably changes those for the unprivileged group, while Platt scaling is a method that looks at the data and the predicted probabilities and changes the predictions to be more in tune with the actual probabilities given the data.
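Reject option classification can be sketched in a few lines. The version below is a simplified illustration, not the implementation of any framework. It assumes a fixed decision threshold and a fixed margin around it, and it follows the usual formulation in which uncertain cases receive the favorable label for the unprivileged group and the unfavorable label for the privileged group. The function name and the parameters threshold and margin are hypothetical.

```python
def reject_option_classify(probabilities, is_unprivileged,
                           threshold=0.5, margin=0.1):
    """Relabel predictions that lie close to the decision boundary.

    probabilities: predicted probability of the favorable class (label 1).
    is_unprivileged: True for members of the unprivileged group.
    """
    decisions = []
    for probability, unprivileged in zip(probabilities, is_unprivileged):
        if abs(probability - threshold) <= margin:
            # Uncertain region: decide in favor of the unprivileged group.
            decisions.append(1 if unprivileged else 0)
        else:
            # Confident region: keep the ordinary thresholded decision.
            decisions.append(1 if probability > threshold else 0)
    return decisions


probabilities = [0.45, 0.55, 0.9, 0.2]
is_unprivileged = [True, False, False, True]
print(reject_option_classify(probabilities, is_unprivileged))  # [1, 0, 1, 0]
```

Only the first two predictions fall inside the margin and are relabeled. The confident predictions (0.9 and 0.2) are left as they are, regardless of group.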

The properties of the analyzed mitigation strategies are listed in Table 1.

Experimental Setup

The results are obtained by training a random forest classifier on three data sets (IBM, Babushkin and FedScope) from the domain of HR. Within those data sets, the only chosen protected features are gender and age. The task is to predict whether an employee has resigned, which may be voluntary or forced by the employer. The random forest is then evaluated on its ML performance and fairness metrics, and this acts as a baseline. The chosen fairness metric is IEOR. After applying each bias mitigation strategy to the data, the model or the results, the newly produced outcomes are evaluated. The results of this experiment are listed in Tables 2, 3 and 4. The scores are relative to the baseline.



Comparing the results of the experiments, the three types of techniques score very differently. The most consistent of the three is pre-processing, which proves not to be prone to overfitting and as such generally manages to improve fairness a bit while giving up little performance. In-processing is more prone to overfitting, and using it may thus lead to large drops in performance. Post-processing often manages to increase the performance scores of the machine learning model, but this frequently comes with a loss of fairness.

Of the tested pre-processing methods, reweighing and the disparate impact remover were the best at dealing with the fairness-performance trade-off, as the slight reductions in performance scores were paired with significant increases in fairness. Among the in-processing methods, this was the case for the gradient reductions algorithm. The two tested post-processing mitigation strategies proved so different in their functioning that no clear winner can be picked among them.


The current situation regarding fairness in AI calls for effective measures against discrimination. In this blog post we have shown some of these measures and have compared them qualitatively and quantitatively. The qualitative assessment showed that post-processing methods are most apt for real-life scenarios, as they can handle multiple non-binary protected features and are model-agnostic. Quantitatively, the pre-processing methods are very consistent, giving up barely any performance while enhancing fairness. We therefore recommend using pre-processing techniques when only one protected feature is important or when multiple binary protected features are important. Out of the analyzed pre-processing mitigation strategies, the disparate impact remover and reweighing returned the best results concerning performance and fairness. When multiple non-binary protected features are of importance, we recommend using post-processing techniques.
