Multi-objective approaches towards trustworthy machine learning
Abstract
As artificial intelligence (AI) systems increasingly impact society, it is important to design and maintain models that are responsible and trustworthy. Models should not discriminate against individuals or groups of individuals (fairness), their decisions should be explainable, and they should be robust to adversarial attacks. Moreover, trained models should be updated dynamically when the data change over time, and methods that explain model decisions need to operate efficiently and in real time.
In this thesis, we address these challenges by developing frameworks that account for more than one characteristic of responsible AI. First, we evaluate existing black-box models using CERTIFAI: Counterfactual Explanations for Robustness, Transparency, Interpretability, and Fairness of AI models. CERTIFAI uses a custom genetic algorithm to produce counterfactual explanations, which are generated points that lie close to the input point but are assigned a different class by the model. These points can then be used to provide explanations, measure feature importance, evaluate fairness through a newly introduced notion called burden, and measure robustness to adversarial attacks.
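The counterfactual search and the burden measure can be illustrated with a small sketch. This is not the CERTIFAI implementation: it assumes a black-box `predict` function that maps a batch of points to class labels, uses Euclidean distance, and relies on hypothetical helpers (`genetic_counterfactual`, `burden`) with illustrative choices of population size, mutation scale, and selection rule. Burden is simplified here to the mean counterfactual distance over the members of a group who received the unfavorable prediction.

```python
import numpy as np

def genetic_counterfactual(predict, x, n_pop=200, n_gen=50, sigma=0.5, seed=0):
    """Hypothetical GA: find a point close to x (Euclidean) that `predict`
    assigns a different class than x. Returns None if none is found."""
    rng = np.random.default_rng(seed)
    original = predict(x[None, :])[0]
    scale = 3 * sigma
    pop = x + rng.normal(scale=scale, size=(n_pop, x.size))
    best = None
    for _ in range(n_gen):
        # Only candidates that flip the predicted class are valid.
        valid = pop[predict(pop) != original]
        if len(valid) == 0:
            # No counterfactual found yet: widen the search radius and resample.
            scale *= 2
            pop = x + rng.normal(scale=scale, size=(n_pop, x.size))
            continue
        dist = np.linalg.norm(valid - x, axis=1)
        order = np.argsort(dist)
        if best is None or dist[order[0]] < np.linalg.norm(best - x):
            best = valid[order[0]].copy()
        # Selection: the closest ~20% of valid candidates become parents.
        parents = valid[order[: max(2, len(valid) // 5)]]
        # Crossover: each child takes every feature from one of two random parents.
        i = rng.integers(len(parents), size=n_pop)
        j = rng.integers(len(parents), size=n_pop)
        mask = rng.random((n_pop, x.size)) < 0.5
        children = np.where(mask, parents[i], parents[j])
        # Mutation: small Gaussian noise keeps exploring near the parents.
        pop = children + rng.normal(scale=0.2 * sigma, size=children.shape)
        pop[0] = best  # elitism: never lose the best counterfactual so far
    return best

def burden(predict, X, groups, group, favorable=1, **kwargs):
    """Simplified burden: mean counterfactual distance for members of `group`
    whose prediction is not the favorable class."""
    members = X[(groups == group) & (predict(X) != favorable)]
    dists = [np.linalg.norm(genetic_counterfactual(predict, x, **kwargs) - x)
             for x in members]
    return float(np.mean(dists))
```

Under this reading, a larger burden for one group means that its unfavorably classified members must change their features more, on average, to obtain the favorable outcome.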
We then introduce FASTER-CE, a novel set of algorithms for generating fast, sparse, and robust counterfactual explanations. The backbone of the proposed method is an autoencoder trained on the original dataset. Random samples from the latent space of the trained autoencoder are used to train linear models for each feature in the dataset and for the black-box model's predictions. Using these trained linear models together with additional user-defined constraints, we can cheaply compute the direction in which to search for counterfactual explanations and generate multiple counterfactual explanations that are sparse, realistic, and robust to input manipulations. We show that FASTER-CE is much faster than other state-of-the-art counterfactual explanation methods at generating multiple explanations with several desirable, and often conflicting, properties. Additionally, we explore the trade-offs among sparsity, proximity, validity, speed of generation, and robustness of the explanations.
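A minimal sketch of the latent-space step, under stated assumptions: `decode` stands in for the trained autoencoder's decoder (a fixed linear map in the toy usage), the linear models are ordinary least-squares fits, and the user constraint is taken to be a set of features to hold fixed, enforced by projecting the search direction onto the null space of those features' linear models. The helper names `fit_linear`, `latent_direction`, and `first_counterfactual` are hypothetical, and FASTER-CE's actual direction computation may differ.

```python
import numpy as np

def fit_linear(Z, y):
    """Ordinary least-squares fit y ~ Z @ w + b; returns (w, b)."""
    A = np.hstack([Z, np.ones((len(Z), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

def latent_direction(Z, X, scores, frozen=()):
    """Unit latent direction that increases the (linearized) black-box score
    while leaving the (linearized) features listed in `frozen` unchanged."""
    d, _ = fit_linear(Z, scores)
    if len(frozen) > 0:
        # Rows of W: latent-space weights of each frozen feature's linear model.
        W = np.stack([fit_linear(Z, X[:, j])[0] for j in frozen])
        # Remove the part of d that lies in the row space of W.
        d = d - np.linalg.pinv(W) @ (W @ d)
    return d / np.linalg.norm(d)

def first_counterfactual(z0, direction, decode, predict, max_step=5.0, n_steps=50):
    """Walk from latent code z0 along `direction`; return the first decoded
    point whose predicted class differs from that of z0, or None."""
    original = predict(decode(z0[None, :]))[0]
    for t in np.linspace(max_step / n_steps, max_step, n_steps):
        x = decode((z0 + t * direction)[None, :])
        if predict(x)[0] != original:
            return x[0]
    return None

# Toy usage: a fixed linear "decoder" producing features (z0, z1, z0 + z1)
# and a black-box classifier that thresholds the third feature.
rng = np.random.default_rng(0)
M = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
decode = lambda Z: Z @ M
predict = lambda X: (X[:, 2] > 1).astype(int)
Z = rng.uniform(-2, 2, size=(500, 2))            # random latent samples
X = decode(Z)
scores = 1 / (1 + np.exp(-(X[:, 2] - 1)))       # assumed class-1 probability
d = latent_direction(Z, X, scores, frozen=[0])  # keep feature 0 fixed
cf = first_counterfactual(np.zeros(2), d, decode, predict)
```

Because every step only evaluates a few linear models and one decoder call, generating several counterfactuals (for example, one per constraint set) stays cheap compared with running a full optimization per explanation.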
Next, we look into training fairer models. We develop FaiDA (fair data augmentation), a pre-processing bias mitigation technique based on data augmentation that also lends itself to bias disambiguation. We show theoretically that two different notions of fairness, statistical parity difference (independence) and average odds difference (separation), always change in the same direction under such an augmentation. We also show that the objective of the proposed fairness-aware augmentation approach is submodular, which enables an efficient greedy algorithm.
To make models both fair and robust, we introduce FaiR-N (Fair and Robust Neural Networks), an in-processing bias mitigation technique that trains models with regularizers to improve burden-based fairness and adversarial robustness. We show that models can be trained with these considerations without significantly compromising accuracy and that improving burden-based fairness also improves other fairness measures, and we discuss the trade-offs between fairness and adversarial robustness.
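To make the idea of a regularized objective concrete, here is a sketch for a linear logistic model, where the distance of a point to the decision boundary has the closed form |w·x + b| / ‖w‖. The specific penalties are simplifying assumptions, not FaiR-N's published loss: the fairness term is the absolute gap between the two groups' burdens (mean boundary distance of their negatively classified members), and the robustness term rewards a larger mean margin on correctly classified points. `lam_fair` and `lam_rob` are hypothetical weights, and the sketch assumes both groups contain at least one negatively classified point. For a neural network, the distance would have to be approximated, for example from the logit and its input gradient.

```python
import numpy as np

def boundary_distance(w, b, X):
    """Exact distance of each row of X to the hyperplane w.x + b = 0."""
    return np.abs(X @ w + b) / np.linalg.norm(w)

def regularized_loss(w, b, X, y, group, lam_fair=1.0, lam_rob=0.1):
    """Cross-entropy + lam_fair * |burden gap| - lam_rob * mean correct margin."""
    logits = X @ w + b
    p = 1 / (1 + np.exp(-logits))
    eps = 1e-12
    bce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    dist = boundary_distance(w, b, X)
    neg = logits < 0  # points receiving the unfavorable prediction
    # Burden of each group (0 and 1): mean distance of its negative members.
    burden = [dist[neg & (group == g)].mean() for g in (0, 1)]
    fair_pen = abs(burden[0] - burden[1])
    # Robustness: larger margins on correctly classified points lower the loss.
    correct = (logits >= 0) == (y == 1)
    rob_pen = -dist[correct].mean()
    return bce + lam_fair * fair_pen + lam_rob * rob_pen
```

Minimizing this objective with any gradient-based optimizer trades accuracy against equal burdens and larger margins, and the two weights control that trade-off.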
We then focus on training models that are fairer and can also account for drift, where the drift may affect accuracy, fairness, or both. We propose FEAMOE, a mixture-of-experts framework aimed at learning fairer, more interpretable models that can also rapidly adjust to drifts in both the accuracy and the fairness of a classifier. We illustrate our framework for three popular fairness measures and demonstrate how drift can be handled with respect to these fairness constraints, while also providing fast explanations. When applied to a mixture of linear experts, our framework performs comparably to neural networks in terms of accuracy while producing fairer and more interpretable models that are dynamically updated to account for drift.
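Why a mixture of linear experts lends itself to fast explanations can be seen in a short sketch. Assuming a softmax gate and linear experts (the class name `LinearMixtureOfExperts` and its parameters are hypothetical), the output is f(x) = Σₖ gₖ(x)(wₖ·x + bₖ); holding the gate values at x fixed, f(x) splits exactly into per-feature contributions plus an intercept. This is one simple attribution scheme, not necessarily the one FEAMOE uses, and the fairness constraints and drift handling are not shown.

```python
import numpy as np

class LinearMixtureOfExperts:
    """f(x) = sum_k g_k(x) * (W[k] @ x + b[k]) with a softmax gate over experts."""

    def __init__(self, W, b, V, c):
        self.W, self.b = W, b  # expert weights (K x d) and biases (K,)
        self.V, self.c = V, c  # gate weights (K x d) and biases (K,)

    def gate(self, x):
        s = self.V @ x + self.c
        e = np.exp(s - s.max())  # numerically stable softmax
        return e / e.sum()

    def predict(self, x):
        """Mixed logit for a single input x (apply a sigmoid for a probability)."""
        return float(self.gate(x) @ (self.W @ x + self.b))

    def explain(self, x):
        """Per-feature contributions with the gate held fixed at x; they sum,
        together with the intercept, exactly to predict(x)."""
        g = self.gate(x)
        contributions = (g @ self.W) * x
        intercept = float(g @ self.b)
        return contributions, intercept

# Toy usage: two experts, each looking at one feature, with a uniform gate.
moe = LinearMixtureOfExperts(W=np.array([[1.0, 0.0], [0.0, 1.0]]), b=np.zeros(2),
                             V=np.zeros((2, 2)), c=np.zeros(2))
x = np.array([2.0, 4.0])
contrib, intercept = moe.explain(x)
```

Computing the explanation costs one gate evaluation and a weighted sum of expert coefficients, which is why it can run in real time.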