Post-training Data and Model Bias Metrics
Amazon SageMaker Clarify provides thirteen post-training data and model bias metrics to help quantify various conceptions of fairness. These concepts cannot all be satisfied simultaneously, and the choice of metric depends on the specifics of the potential bias being analyzed. Most of these metrics combine numbers taken from the binary classification confusion matrices computed for the different demographic groups. Because fairness and bias can be defined by a wide range of metrics, human judgment is required to understand and choose the metrics relevant to the individual use case, and customers should consult with appropriate stakeholders to determine the appropriate measure of fairness for their application.
We use the following notation to discuss the bias metrics. The conceptual model described here is for binary classification, where events are labeled as having only two possible outcomes in their sample space, referred to as positive (with value 1) and negative (with value 0). This framework is usually extensible to multicategory classification in a straightforward way, or to cases involving continuous-valued outcomes when needed. In the binary classification case, positive and negative labels are assigned to outcomes recorded in a raw dataset for a favored facet a and a disfavored facet d. These labels y are referred to as observed labels to distinguish them from the predicted labels y' that are assigned by a machine learning model during the training or inference stages of the ML lifecycle. These labels are used to define probability distributions Pa(y) and Pd(y) for their respective facet outcomes.
- labels:
  - y represents the n observed labels for event outcomes in a training dataset.
  - y' represents the predicted labels for the n observed labels in the dataset by a trained model.
- outcomes:
  - A positive outcome (with value 1) for a sample, such as an application acceptance.
    - n(1) is the number of observed labels for positive outcomes (acceptances).
    - n'(1) is the number of predicted labels for positive outcomes (acceptances).
  - A negative outcome (with value 0) for a sample, such as an application rejection.
    - n(0) is the number of observed labels for negative outcomes (rejections).
    - n'(0) is the number of predicted labels for negative outcomes (rejections).
- facet values:
  - facet a – The feature value that defines a demographic that bias favors.
    - na is the number of observed labels for the favored facet value: na = na(1) + na(0), the sum of the positive and negative observed labels for facet value a.
    - n'a is the number of predicted labels for the favored facet value: n'a = n'a(1) + n'a(0), the sum of the positive and negative predicted labels for facet value a. Note that n'a = na.
  - facet d – The feature value that defines a demographic that bias disfavors.
    - nd is the number of observed labels for the disfavored facet value: nd = nd(1) + nd(0), the sum of the positive and negative observed labels for facet value d.
    - n'd is the number of predicted labels for the disfavored facet value: n'd = n'd(1) + n'd(0), the sum of the positive and negative predicted labels for facet value d. Note that n'd = nd.
- probability distributions for the outcomes of the labeled facet data:
  - Pa(y) is the probability distribution of the observed labels for facet a. For binary labeled data, this distribution is given by the ratio of the number of samples in facet a labeled with positive outcomes to the total number, Pa(y1) = na(1)/na, and the ratio of the number of samples labeled with negative outcomes to the total number, Pa(y0) = na(0)/na.
  - Pd(y) is the probability distribution of the observed labels for facet d. For binary labeled data, this distribution is given by the ratio of the number of samples in facet d labeled with positive outcomes to the total number, Pd(y1) = nd(1)/nd, and the ratio of the number of samples labeled with negative outcomes to the total number, Pd(y0) = nd(0)/nd.
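The per-facet counts and probability distributions defined above can be computed directly from a labeled dataset. The following Python sketch is illustrative only; the column names `facet` and `label` and the facet values `"a"` and `"d"` are assumptions for this example, not part of the SageMaker Clarify API.

```python
import pandas as pd

# Illustrative dataset: 'facet' holds the demographic feature value and
# 'label' holds the observed binary outcome y (1 = acceptance, 0 = rejection).
# Column names and facet values "a"/"d" are assumed for this example.
df = pd.DataFrame({
    "facet": ["a", "a", "a", "d", "d", "d", "d"],
    "label": [1, 1, 0, 1, 0, 0, 0],
})

def label_distribution(data, facet_value):
    """Return (n, P(y1), P(y0)) for the observed labels of one facet."""
    labels = data.loc[data["facet"] == facet_value, "label"]
    n = len(labels)                  # na or nd
    n_pos = (labels == 1).sum()      # na(1) or nd(1)
    n_neg = (labels == 0).sum()      # na(0) or nd(0)
    return n, n_pos / n, n_neg / n   # n, P(y1) = n(1)/n, P(y0) = n(0)/n

na, Pa_y1, Pa_y0 = label_distribution(df, "a")
nd, Pd_y1, Pd_y0 = label_distribution(df, "d")
print(f"Pa(y1) = {Pa_y1:.2f}, Pd(y1) = {Pd_y1:.2f}")
```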
The following table contains a cheat sheet for quick guidance and links to the post-training bias metrics.
Post-training bias metrics
Post-training bias metric | Description | Example question | Interpreting metric values |
---|---|---|---|
Difference in Positive Proportions in Predicted Labels (DPPL) | Measures the difference in the proportion of positive predictions between the favored facet a and the disfavored facet d. | Has there been an imbalance across demographic groups in the predicted positive outcomes that might indicate bias? | Range for normalized binary and multicategory facet labels: [-1, +1]. Range for continuous labels: (-∞, +∞). Values near zero indicate a similar proportion of positive predictions across facets. |
Disparate Impact (DI) | Measures the ratio of proportions of the predicted labels for the favored facet a and the disfavored facet d. | Has there been an imbalance across demographic groups in the predicted positive outcomes that might indicate bias? | Range for normalized binary, multicategory facet, and continuous labels: [0, ∞). A value of 1 indicates parity between the facets. |
Conditional Demographic Disparity in Predicted Labels (CDDPL) | Measures the disparity of predicted labels between the facets as a whole, but also by subgroups. | Do some demographic groups have a larger proportion of rejections for loan application outcomes than their proportion of acceptances? | Range for binary, multicategory, and continuous outcomes: [-1, +1]. |
Counterfactual Fliptest (FT) | Examines each member of facet d and assesses whether similar members of facet a have different model predictions. | Is one group of a specific age demographic matched closely on all features with a different age group, yet paid more on average? | Range for binary and multicategory facet labels: [-1, +1]. |
Accuracy Difference (AD) | Measures the difference between the prediction accuracy for the favored and disfavored facets. | Does the model predict labels as accurately for applications across all demographic groups? | Range for binary and multicategory facet labels: [-1, +1]. |
Recall Difference (RD) | Compares the recall of the model for the favored and disfavored facets. | Is there an age-based bias in lending due to the model having higher recall for one age group as compared to another? | Range for binary and multicategory classification: [-1, +1]. |
Difference in Conditional Acceptance (DCAcc) | Compares the observed labels to the labels predicted by a model and assesses whether this is the same across facets for predicted positive outcomes (acceptances). | When comparing one age group to another, are loans accepted more frequently, or less often, than predicted (based on qualifications)? | Range for binary, multicategory facet, and continuous labels: (-∞, +∞). |
Difference in Acceptance Rates (DAR) | Measures the difference in the ratios of the observed positive outcomes (TP) to the predicted positives (TP + FP) between the favored and disfavored facets. | Does the model have equal precision when predicting loan acceptances for qualified applicants across all age groups? | Range for binary, multicategory facet, and continuous labels: [-1, +1]. |
Specificity difference (SD) | Compares the specificity of the model between the favored and disfavored facets. | Is there an age-based bias in lending because the model has higher specificity for one age group as compared to another? | Range for binary and multicategory classification: [-1, +1]. |
Difference in Conditional Rejection (DCR) | Compares the observed labels to the labels predicted by a model and assesses whether this is the same across facets for negative outcomes (rejections). | Are there more or fewer rejections for loan applications than predicted for one age group as compared to another, based on qualifications? | Range for binary, multicategory facet, and continuous labels: (-∞, +∞). |
Difference in Rejection Rates (DRR) | Measures the difference in the ratios of the observed negative outcomes (TN) to the predicted negatives (TN + FN) between the disfavored and favored facets. | Does the model have equal precision when predicting loan rejections for unqualified applicants across all age groups? | Range for binary, multicategory facet, and continuous labels: [-1, +1]. |
Treatment Equality (TE) | Measures the difference in the ratio of false positives to false negatives between the favored and disfavored facets. | In loan applications, is the relative ratio of false positives to false negatives the same across all age demographics? | Range for binary and multicategory facet labels: (-∞, +∞). |
Generalized entropy (GE) | Measures the inequality in benefits b assigned to each input by the model predictions. | Of two candidate models for loan application classification, does one lead to a more uneven distribution of desired outcomes than the other? | Range for binary and multicategory labels: (0, 0.5). GE is undefined when the model predicts only false negatives. |
For additional information about post-training bias metrics, see A Family of Fairness Measures for Machine Learning in Finance.
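To make the cheat sheet concrete, the following sketch computes a few of the metrics above (DPPL, DI, and Recall Difference) from per-facet confusion matrix counts. It is a minimal illustration of how the confusion matrix numbers for each facet enter the metric definitions, not the SageMaker Clarify implementation; the toy labels and the direction of the DI ratio shown here (disfavored facet over favored facet) are assumptions for this example.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels where 1 is the positive outcome."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, fp, tn, fn

# Assumed toy observed labels and model predictions for facets a and d.
ya_true, ya_pred = [1, 1, 0, 1, 0], [1, 1, 0, 1, 1]
yd_true, yd_pred = [1, 0, 0, 1, 0], [0, 0, 0, 1, 0]

tp_a, fp_a, tn_a, fn_a = confusion_counts(ya_true, ya_pred)
tp_d, fp_d, tn_d, fn_d = confusion_counts(yd_true, yd_pred)

# Proportions of predicted positive outcomes: q'a = n'a(1)/na, q'd = n'd(1)/nd.
qa = (tp_a + fp_a) / len(ya_true)
qd = (tp_d + fp_d) / len(yd_true)

dppl = qa - qd  # Difference in Positive Proportions in Predicted Labels
di = qd / qa    # Disparate Impact (ratio of the two proportions; direction assumed here)
rd = tp_a / (tp_a + fn_a) - tp_d / (tp_d + fn_d)  # Recall Difference

print(f"DPPL = {dppl:.2f}, DI = {di:.2f}, RD = {rd:.2f}")
```

In practice these quantities are produced by a SageMaker Clarify processing job; the sketch is only meant to show how the per-facet counts from the notation section map onto the metric formulas.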
Topics
- Difference in Positive Proportions in Predicted Labels (DPPL)
- Disparate Impact (DI)
- Difference in Conditional Acceptance (DCAcc)
- Difference in Conditional Rejection (DCR)
- Specificity difference (SD)
- Recall Difference (RD)
- Difference in Acceptance Rates (DAR)
- Difference in Rejection Rates (DRR)
- Accuracy Difference (AD)
- Treatment Equality (TE)
- Conditional Demographic Disparity in Predicted Labels (CDDPL)
- Counterfactual Fliptest (FT)
- Generalized entropy (GE)