
A differentiable and uncertainty-aware mutual information regularizer for bias mitigation

Incremona, Alessandro; Pozzi, Andrea; Tessera, Daniele
2026-01-01

Abstract

Ensuring algorithmic fairness is a central challenge in high-stakes machine learning applications, where biased predictions can have harmful societal consequences. Discrimination may persist even when sensitive features are excluded from model inputs, due to statistical dependencies between protected and ostensibly non-sensitive attributes that allow bias to be indirectly learned. Mutual information is a principled measure of such dependence, yet its non-differentiability under hard-threshold classification rules precludes direct use in gradient-based training. This paper proposes a fairness-aware learning strategy that embeds a differentiable surrogate of mutual information directly into the objective of nonlinear classifiers. The surrogate replaces hard label assignment with a Bernoulli relaxation of predicted probabilities, stabilizes empirical distributions via kernel smoothing, and accounts for epistemic uncertainty through Monte Carlo dropout, yielding an uncertainty-aware regularizer compatible with stochastic gradient optimization. Experiments on synthetic and real-world datasets show that the method achieves favorable fairness–accuracy trade-offs, with substantial bias reduction and minimal impact on predictive performance. Additional evaluations confirm robustness under counterfactual perturbations and reveal sublinear runtime growth with dataset size, contrasting favorably with the superlinear scaling of a genetic algorithm baseline. Overall, the framework offers a general, theoretically grounded, and practically viable approach to fairness-aware learning in complex, high-dimensional settings, supporting responsible algorithmic decision-making alongside expressive modeling.
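
To make the idea in the abstract concrete, the following is a minimal illustrative sketch (not the authors' code) of a differentiable mutual-information penalty between a binary sensitive attribute and soft, Bernoulli-relaxed predictions, averaged over Monte Carlo dropout passes. All names (MLP, mi_penalty, fair_loss, lam, n_mc) are hypothetical, and a simple epsilon clamp stands in for the kernel smoothing described in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    """Small classifier with dropout so MC-dropout passes are stochastic."""
    def __init__(self, d_in, d_hidden=64, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(d_hidden, 1),
        )
    def forward(self, x):
        return torch.sigmoid(self.net(x)).squeeze(-1)  # P(Y_hat = 1 | x)

def mi_penalty(p_hat, s, eps=1e-6):
    """Differentiable surrogate of I(Y_hat; S) for binary Y_hat and binary S.
    Hard label assignment is replaced by the predicted Bernoulli probabilities,
    so the estimate stays differentiable in the model parameters."""
    s = s.float()
    p_s1 = s.mean()                              # P(S = 1)
    p_y1 = p_hat.mean()                          # P(Y_hat = 1), soft relaxation
    p_y1_s1 = (p_hat * s).mean()                 # P(Y_hat = 1, S = 1)
    p_y1_s0 = (p_hat * (1 - s)).mean()           # P(Y_hat = 1, S = 0)
    joint = torch.stack([
        p_y1_s1, p_y1_s0,                        # cells with Y_hat = 1
        p_s1 - p_y1_s1, (1 - p_s1) - p_y1_s0,    # cells with Y_hat = 0
    ]).clamp_min(eps)                            # eps in place of kernel smoothing
    marg_y = torch.stack([p_y1, p_y1, 1 - p_y1, 1 - p_y1]).clamp_min(eps)
    marg_s = torch.stack([p_s1, 1 - p_s1, p_s1, 1 - p_s1]).clamp_min(eps)
    # I(Y_hat; S) = sum_{y,s} p(y,s) * log( p(y,s) / (p(y) p(s)) )
    return (joint * (joint.log() - marg_y.log() - marg_s.log())).sum()

def fair_loss(model, x, y, s, lam=1.0, n_mc=5):
    """Cross-entropy plus the MI regularizer, averaged over n_mc MC-dropout passes."""
    model.train()                                # keep dropout active for MC sampling
    penalty = 0.0
    for _ in range(n_mc):
        penalty = penalty + mi_penalty(model(x), s)
    bce = F.binary_cross_entropy(model(x), y.float())
    return bce + lam * penalty / n_mc

Because every quantity above is a smooth function of the predicted probabilities, the combined objective can be minimized with ordinary stochastic gradient descent; the weight lam trades accuracy against statistical dependence on the sensitive attribute.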

Use this identifier to cite or link to this document: https://hdl.handle.net/11571/1540015