AI Fairness and Regulation
Fairness is a critical consideration when building machine learning models that affect individuals' lives or society as a whole.
Regulators are paying close attention to undesired and potentially harmful or discriminatory outcomes stemming from the use of AI technology. The upcoming EU AI Act will explicitly include fairness as one of the requirements for so-called "High-Risk AI Systems," which will encompass applications in the fields of credit scoring, critical infrastructure, and HR, to name a few.
Understanding the implications of different fairness definitions is important, so in this article, we explore the critical issues in AI fairness and illustrate the key concepts through a specific use case example in consumer lending.
Having the fairness metrics discussed below available at the same time in the Modulos platform enables users to better monitor and manage fairness-related decisions and achieve more sustainable societal outcomes.
Consequences and Drawbacks of Different Approaches to AI Fairness
By choosing a specific fairness definition, you are effectively setting the standards for what you believe a "fair outcome" in decision-making should look like. As such, it is important to fully understand the advantages and drawbacks of each of the fairness criteria and what they imply.
Ultimately, the choice of fairness metric will be guided by the specific nature of the problem at hand, potential regulatory requirements, and the set of values that your organization embeds in its decision-making, mindful of the values of the society in which it operates.
Use Case Example: Qualifying for a Personal Loan
To understand the different definitions of fairness that can be applied to a machine learning model and data, let's examine a concrete use case and explore the different outcomes when using various fairness criteria. The model we use for our analysis aims to assess the eligibility of a financial institution client (or prospect) for a personal loan, a decision that could have a significant impact on the life of the applicant.
We want to use past applicants' historical information to classify each individual in the dataset as a "good credit" if the loan was repaid in full or a "bad credit" if the individual defaulted after disbursement. We will use this information as the basis to build a credit model and make future eligibility decisions.
Let us assume that we build our credit model based only on the individual's income level: All individuals above a given threshold level are granted the loan (positive outcome), whereas those below are denied the loan (negative outcome).
We can also distinguish two groups of individuals (Group 1 and Group 2) in our database, separating them according to a specific personal attribute (e.g., gender). Let's say that one group (say, Group 2) generally has a lower income distribution. By changing income thresholds, we will illustrate different loan acceptance scenarios that conform to one of the fairness definitions.
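To make the setup tangible, here is a minimal sketch of such a scenario in Python. Everything in it is hypothetical: the group sizes, the income distributions, the repayment labels, and the threshold value are chosen only to mirror the description above, not taken from any real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical applicant pool: Group 2 has a lower income distribution on average.
group = rng.integers(1, 3, size=n)                     # group label: 1 or 2
income = np.where(group == 1,
                  rng.normal(60_000, 15_000, size=n),
                  rng.normal(45_000, 15_000, size=n))

# Illustrative historical outcome: True if the past loan was repaid ("good credit").
repaid = rng.random(n) < np.clip(income / 100_000, 0.1, 0.9)

# Simple credit model: grant the loan to everyone above a single income threshold.
threshold = 50_000
granted = income >= threshold
```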
Fairness Building Blocks and Definitions
To get a clear understanding of the nuances across fairness criteria, let us first clarify the concepts of True Positive, True Negative, False Positive, and False Negative, and the ratios used in our example.
True Positive (TP)
An individual who, based on the model classification, will be granted the loan and who, based on historical data, had repaid a past loan.
True Negative (TN)
An individual who, based on the model classification, will be denied the loan and who, based on historical data, had defaulted.
False Positive (FP)
An individual who, based on the model classification, will be granted the loan, but based on historical data, had defaulted.
False Negative (FN)
An individual who, based on the model classification, will be denied the loan, but based on historical data, had repaid a past loan.
Positive Rate (PR)
The probability of a positive outcome within the overall population. In our example, this is the probability of being granted a loan.
True Positive Rate (TPR)
The probability that an actual "good credit" will be granted a loan based on the model classification.
Precision (Pr)
The probability of a classifier correctly identifying the positive outcome. In our example, this is the probability that an individual who is granted a loan is an actual "good credit."
False Positive Rate (FPR)
The probability that an actual "bad credit" will be granted a loan based on the model classification.
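To make these quantities concrete, the following helper computes them from the decision and outcome vectors of the toy sketch above. The function name and the dictionary keys are our own shorthand, not a standard API.

```python
def rates(granted, repaid):
    """Confusion-matrix counts and the rates defined above for one set of decisions."""
    tp = np.sum(granted & repaid)      # granted and historically repaid
    tn = np.sum(~granted & ~repaid)    # denied and historically defaulted
    fp = np.sum(granted & ~repaid)     # granted but historically defaulted
    fn = np.sum(~granted & repaid)     # denied but historically repaid
    return {
        "PR":  (tp + fp) / (tp + tn + fp + fn),   # Positive Rate (acceptance rate)
        "TPR": tp / (tp + fn),                    # True Positive Rate
        "Pr":  tp / (tp + fp),                    # Precision
        "FPR": fp / (fp + tn),                    # False Positive Rate
    }

# Per-group metrics for the toy data sketched earlier.
m1 = rates(granted[group == 1], repaid[group == 1])
m2 = rates(granted[group == 2], repaid[group == 2])
```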
Statistical (Demographic) Parity
Using the statistical parity fairness criterion means aiming for the same acceptance rate with respect to the positive outcome across the two groups (Group 1 and Group 2). In our example, this means that the same proportion of individuals in each group is granted the loan regardless of creditworthiness: the Positive Rate is equal for the two groups. In the chart below, the criterion is satisfied for the two groups, which display an equal acceptance rate.
For real-world decisions, we would aim to satisfy this criterion when we want to ensure an equal positive outcome for a minority group (e.g., ensure equal acceptance rate for granting loans to men and women). However, by requiring equal acceptance rates, we run the risk of reinforcing historical biases if we do not create the right conditions for the minority group to thrive. In other words, if we want to promote equal access for women to credit, we need to also create labor conditions that encourage women to participate in the job market to ensure that they have the means to remain solvent.
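Continuing the toy sketch, checking statistical parity amounts to comparing the per-group Positive Rates; the tolerance below is an arbitrary illustrative choice.

```python
# Statistical (demographic) parity: acceptance rates should match across the two groups.
parity_gap = abs(m1["PR"] - m2["PR"])
satisfies_statistical_parity = parity_gap < 0.01   # illustrative tolerance
```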
Disparate Impact
Another similar metric we can compute, disparate impact, represents the ratio of acceptance rates between the two groups. Unlike demographic parity, which is calculated as the difference in acceptance rates between the two groups, disparate impact is calculated as the Positive Rate for the minority group divided by the Positive Rate for the majority group, where the minority group is the one in a potentially disadvantaged position.
In our example, which is one where demographic parity is satisfied, this ratio is, as expected, 100%.
The concept of disparate impact has been given great attention in the United States, where it has been incorporated into labor law. This doctrine prohibits apparently neutral practices in the fields of employment and housing when they adversely affect one group of people with a protected characteristic more than another.
For example, if the acceptance rate for Group 1 was 50% and for Group 2 it was 20%, that would have led to a disparate impact measure of 40% (20%/50%). US labor law considers the disparate impact measure to be acceptable only when it is at least 80%.
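In the same toy sketch, and assuming Group 2 is the potentially disadvantaged minority group, the disparate impact ratio and the 80% rule of thumb can be written as:

```python
# Disparate impact: acceptance rate of the minority group over that of the majority group.
disparate_impact = m2["PR"] / m1["PR"]
passes_80_percent_threshold = disparate_impact >= 0.80
```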
Equal Opportunity
The equal opportunity criterion requires the model to correctly identify the "good credits" at the same rate across the two groups: the probability of granting the loan to those who, based on historical data, are "good credits" must be the same regardless of the group. In other words, the True Positive Rate needs to be the same across the two groups.
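Reusing the per-group metrics from the sketch above, equal opportunity compares only the True Positive Rates:

```python
# Equal opportunity: True Positive Rates should match across the two groups.
tpr_gap = abs(m1["TPR"] - m2["TPR"])
satisfies_equal_opportunity = tpr_gap < 0.01       # illustrative tolerance
```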
We can see that this measure completely discards potential misclassifications of people who are likely to default but are granted the loan according to the classifier. No matter how many false positives we would have in one of the groups, the equal opportunity metric would be unaffected.
As such, in real-world decision-making, this criterion could be considered when we aim to correctly identify as many positive instances as we can, and there is no critical harm in misclassifying a negative instance as a positive one. This would be true if we were trying, for example, to identify fraud attempts or COVID-19 cases: There is relatively little harm in misjudging a non-fraudulent transaction or a negative COVID case, since we simply go through further due diligence or notify the counterparties to a payment transaction or the patients. On the contrary: Maximizing our chances to correctly identify fraud or spot individuals infected with COVID means that we can avoid economic and reputational losses in the first case and protect vulnerable groups from health consequences in the second.
In our example, however, what is at stake is the possibility of lending money to bad credit risks, thus putting the financial institution at risk of non-negligible losses and damaging the credit score of the group that has a high proportion of false positives. Accordingly, this is probably not the best metric for this use case.
Predictive Parity
The predictive parity criterion requires the model to deliver an equal proportion of "good credits" across those who are granted the loan within the two groups. In other words, the Precision needs to be the same for both groups. With this measure, it does not matter what happens below the income threshold (the dotted line in the chart).
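In code, again continuing the toy sketch, predictive parity compares the per-group Precision:

```python
# Predictive parity: Precision (share of "good credits" among those granted a loan) should match.
precision_gap = abs(m1["Pr"] - m2["Pr"])
satisfies_predictive_parity = precision_gap < 0.01  # illustrative tolerance
```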
Despite being potentially a "play it safe" type of criterion, which can help to minimize losses due to default in our example, it also imposes an opportunity cost. Let us assume that in the group with the lower income distribution, individuals are generally savvier with their money and would have repaid the loan. There is thus the risk of excluding them from a potentially profitable credit exposure for the institution, reducing loans that would likely be repaid and lowering profits.
However, when the costs of granting loans to bad payers significantly exceed the profits that can be earned from additional loans, this could be an appropriate criterion.
Equalized Odds
This criterion requires the decision model across the two groups to respect these two conditions at the same time:
- Identifying "good credits" (individuals who, based on historical data, would not default) and granting them a loan in the same proportion across the groups (the same condition as equal opportunity)
- Misjudging "bad credits" by granting them a loan at an equal rate across the two groups
In other words, both the True Positive Rate and the False Positive Rate need to be the same in the two groups.
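In the toy sketch, equalized odds therefore combines the two checks:

```python
# Equalized odds: both True Positive Rates and False Positive Rates should match across groups.
tpr_match = abs(m1["TPR"] - m2["TPR"]) < 0.01      # illustrative tolerance
fpr_match = abs(m1["FPR"] - m2["FPR"]) < 0.01
satisfies_equalized_odds = tpr_match and fpr_match
```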
This criterion rules out the possibility of disparate treatment for one specific group, which could be treated more leniently if the first condition is considered in isolation (indeed, that's one of the risks of the equal opportunity criterion). However, just as in the case of predictive parity, this can come at the expense of discarding profitable lending opportunities.
Conclusion
The use of different fairness criteria requires a thorough analysis of the specific use case, considering all the constraints that may apply from economic, ethical, and legal points of view. Different criteria lead to completely different acceptance levels across different groups.
Also, the definitions that we have explored in this article all point in different directions as far as group versus individual fairness is concerned. With demographic parity, we are closer to a group-level concept of fairness, whereas a criterion such as equalized odds requires similar treatment of individuals with similar attributes.
Regardless of the criterion that we select to assess fairness, it is important to be aware of the differences and nuances of the various fairness definitions. By making all the metrics accessible in a single platform, and by allowing the user to monitor and compare them, Modulos helps companies achieve better decisions and enables more sustainable societal outcomes.