Lp-norm (LP)
The Lp-norm (LP) measures the p-norm distance between the facet distributions of the observed labels in a training dataset. This metric is non-negative and so cannot detect reverse bias.
The formula for the Lp-norm is as follows:
L_p(P_a, P_d) = ( ∑_y |P_a(y) - P_d(y)|^p )^(1/p)
Where the p-norm distance between n-dimensional points x and y is defined as follows:
L_p(x, y) = ( |x_1 - y_1|^p + |x_2 - y_2|^p + … + |x_n - y_n|^p )^(1/p)
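To make the general formula concrete, the following Python sketch computes the Lp-norm between the observed label distributions of two facets. The `lp_norm` function and its arguments are illustrative names, not part of any library API; it assumes the distributions P_a and P_d are estimated as normalized label proportions.

```python
from collections import Counter


def lp_norm(labels_a, labels_d, p=2):
    """p-norm distance between the label distributions of two facets.

    P_a and P_d are estimated as normalized label proportions;
    Counter returns 0 for categories a facet never sees.
    """
    counts_a = Counter(labels_a)
    counts_d = Counter(labels_d)
    categories = set(counts_a) | set(counts_d)
    return sum(
        abs(counts_a[y] / len(labels_a) - counts_d[y] / len(labels_d)) ** p
        for y in categories
    ) ** (1 / p)
```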
The 2-norm is the Euclidean norm. Assume you have an outcome distribution with three categories, for example, y_i ∈ {y_0, y_1, y_2} = {accepted, waitlisted, rejected}, in a college admissions multicategory scenario. You take the sum of the squares of the differences between the outcome counts for facets a and d. The resulting Euclidean distance is calculated as follows (a worked numeric sketch follows the definitions below):
L_2(P_a, P_d) = [ (n_a(0) - n_d(0))^2 + (n_a(1) - n_d(1))^2 + (n_a(2) - n_d(2))^2 ]^(1/2)
Where:
- n_a(i) is the number of ith-category outcomes in facet a: for example, n_a(0) is the number of facet a acceptances.
- n_d(i) is the number of ith-category outcomes in facet d: for example, n_d(2) is the number of facet d rejections.
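Here is a worked sketch of the three-category calculation above; the admission counts are hypothetical, chosen only for illustration. Note that the count-based formula can produce values of any magnitude, while normalizing the counts to proportions keeps the result in the [0, √2) range quoted below.

```python
import math

# Hypothetical outcome counts for the categories
# {accepted, waitlisted, rejected}; the numbers are made up.
n_a = [60, 25, 15]  # facet a counts: n_a(0), n_a(1), n_a(2)
n_d = [40, 30, 30]  # facet d counts: n_d(0), n_d(1), n_d(2)

# Count-based Euclidean distance, as in the formula above.
l2_counts = math.sqrt(sum((a - d) ** 2 for a, d in zip(n_a, n_d)))
print(l2_counts)  # sqrt(20**2 + 5**2 + 15**2) = sqrt(650) ≈ 25.50

# Normalizing counts to proportions bounds the result by sqrt(2).
p_a = [a / sum(n_a) for a in n_a]
p_d = [d / sum(n_d) for d in n_d]
l2_props = math.sqrt(sum((a - d) ** 2 for a, d in zip(p_a, p_d)))
print(l2_props)  # sqrt(0.2**2 + 0.05**2 + 0.15**2) ≈ 0.25
```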
The range of LP values for binary, multicategory, and continuous outcomes is [0, √2), where:
- Values near zero mean the labels are similarly distributed.
- Positive values mean the label distributions diverge; the larger the value, the greater the divergence.
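Using the `lp_norm` sketch from earlier, identical label distributions give zero, while a skewed facet gives a clearly positive value; the labels below are hypothetical.

```python
# Two facets with identical label proportions, and one skewed facet.
same = ["accepted"] * 50 + ["rejected"] * 50
skewed = ["accepted"] * 80 + ["rejected"] * 20

print(lp_norm(same, same))    # 0.0 -- labels similarly distributed
print(lp_norm(same, skewed))  # ≈ 0.42 -- distributions diverge
```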