Wrapper boxes for increasing model interpretability via example-based explanations

dc.contributor.advisor: Lease, Matthew A.
dc.contributor.committeeMember: Li, Jessy
dc.creator: Su, Yiheng
dc.date.accessioned: 2023-08-14T20:22:09Z
dc.date.available: 2023-08-14T20:22:09Z
dc.date.created: 2023-05
dc.date.issued: 2023-04-21
dc.date.submitted: May 2023
dc.date.updated: 2023-08-14T20:22:10Z
dc.description.abstract: We propose wrapper boxes to provide interpretability in deep learning that is model-, training-, and dataset-agnostic. The prediction model is trained as usual on some dataset(s), typically optimizing some predetermined loss function. At inference time, the prediction model is augmented by a simpler model that makes predictions from the representations learned by the former. Hence, any black box model, such as a deep neural network, can be made more interpretable by "wrapping" it with a white box auxiliary that is explainable by design. We demonstrate the effectiveness of wrapper box approaches across two datasets and three large pre-trained language models, showing that performance is not noticeably different from that of the original model across various configurations, even for simple augmentations like k-nearest neighbors, support vector machines, decision trees, and k-means. In particular, we present quantitative evidence that representations retrieved from the penultimate layer alone are sufficient for white boxes to achieve performance that is not noticeably different from the original model's. Finally, we illustrate the additive explainability of white box augmentations by showcasing intuitive and faithful example-based explanations. We hypothesize that any minor degradation in predictive performance is justified by enhanced interpretability for human users, enabling the combined human-AI partnership to be more performant than is possible with a black box model alone.
dc.description.department: Computer Science
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2152/121132
dc.identifier.uri: http://dx.doi.org/10.26153/tsw/47962
dc.language.iso: en
dc.subject: Model interpretability
dc.subject: Model explainability
dc.title: Wrapper boxes for increasing model interpretability via example-based explanations
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Computer Sciences
thesis.degree.discipline: Computer Science
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.level: Masters
thesis.degree.name: Master of Science in Computer Sciences
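
The wrapper box idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `knn_wrapper_predict`, the toy two-dimensional vectors standing in for penultimate-layer representations, and the example texts and labels are all hypothetical. The sketch shows a k-nearest-neighbors white box that predicts from precomputed representations and returns the training examples behind its prediction, which serve as an example-based explanation.

```python
import math
from collections import Counter


def knn_wrapper_predict(query, train_reprs, train_labels, k=3):
    """Predict a label for `query` and return the indices of the
    training examples (nearest neighbors) that support the prediction."""
    # Rank training examples by Euclidean distance in representation space.
    ranked = sorted(
        range(len(train_reprs)),
        key=lambda i: math.dist(query, train_reprs[i]),
    )
    neighbors = ranked[:k]
    # Majority vote over the labels of the k nearest training examples.
    votes = Counter(train_labels[i] for i in neighbors)
    label = votes.most_common(1)[0][0]
    return label, neighbors


# Hypothetical stand-ins for representations that a trained black box
# model would produce at its penultimate layer.
train_reprs = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
               (1.0, 0.9), (0.9, 1.1), (1.1, 1.0)]
train_labels = ["neg", "neg", "neg", "pos", "pos", "pos"]
train_texts = ["dull plot", "weak acting", "too long",
               "great cast", "loved it", "a delight"]

label, neighbors = knn_wrapper_predict((1.0, 0.95), train_reprs, train_labels)

# The explanation is the set of training examples the prediction rests on.
print("prediction:", label)
for i in neighbors:
    print("  because of:", repr(train_texts[i]), "->", train_labels[i])
```

In practice the representations would come from the trained prediction model rather than being written by hand, and the white box could equally be a decision tree or another simple model named in the abstract.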

Access full-text files

Original bundle

Name: SU-THESIS-2023.pdf
Size: 1.57 MB
Format: Adobe Portable Document Format

License bundle

Name: PROQUEST_LICENSE.txt
Size: 4.45 KB
Format: Plain Text
Name: LICENSE.txt
Size: 1.84 KB
Format: Plain Text