Robust Transformer Neural Network for Computer Vision Applications

dc.creator: Bin Karim, Fazlur Rahman
dc.creator: Dera, Dimah
dc.date.accessioned: 2023-10-27T16:26:49Z
dc.date.available: 2023-10-27T16:26:49Z
dc.date.issued: 2023-09-28
dc.description.abstract: The remarkable success of the Transformer model in Natural Language Processing (NLP) is increasingly capturing the attention of vision researchers. The Vision Transformer (ViT) converts image information into meaningful representations and uses a self-attention mechanism to effectively model long-range dependencies. Moreover, the parallelism of ViT offers better scalability and model generalization than Recurrent Neural Networks (RNNs). However, developing robust ViT models is critical for high-risk vision applications such as self-driving cars. Deterministic ViT models are susceptible to noise and adversarial attacks and cannot report a level of confidence in their output predictions. Quantifying the confidence (or uncertainty) of a decision is highly important in such real-world applications. In this work, we introduce a probabilistic framework for ViT that quantifies the uncertainty in the model's decisions. We approximate the posterior distribution of the network parameters using variational inference, and we apply a first-order Taylor approximation when propagating through non-linear layers. The developed framework propagates the mean and covariance of the posterior distribution through the layers of the probabilistic ViT model and quantifies uncertainty at the output predictions. The quantified uncertainty can serve as a warning signal to real-world applications under noisy conditions. Experimental results from extensive simulations on benchmark datasets (e.g., MNIST and Fashion-MNIST) for image classification show that 1) the proposed probabilistic ViT achieves higher accuracy under noise or adversarial attacks than the deterministic ViT, and 2) self-evaluation through uncertainty becomes notably pronounced as noise levels escalate. Simulations were conducted on a Lonestar6 supercomputer node at the Texas Advanced Computing Center (TACC). With the help of this vital resource, we completed all the experiments within a reasonable period.
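The abstract describes propagating the mean and covariance of the posterior through non-linear layers via a first-order Taylor approximation. The following is a minimal illustrative sketch (not the authors' code), assuming a diagonal covariance over activations and a ReLU non-linearity: the output mean is approximated by the function at the input mean, and the output variance is scaled by the squared derivative at the mean.

```python
import numpy as np

def relu_moments(mu, var):
    """First-order Taylor moment propagation through ReLU.

    mu, var: per-activation mean and variance of the input distribution
    (diagonal-covariance assumption). Returns the approximate mean and
    variance of ReLU(x): f(mu) and f'(mu)**2 * var.
    """
    grad = (mu > 0).astype(mu.dtype)   # derivative of ReLU evaluated at the mean
    mu_out = np.maximum(mu, 0.0)       # f(mu)
    var_out = grad ** 2 * var          # f'(mu)^2 * var
    return mu_out, var_out

# Example: a negative-mean activation is zeroed out, and its variance
# is suppressed because the local derivative there is zero.
mu = np.array([-1.0, 0.5, 2.0])
var = np.array([0.1, 0.2, 0.3])
m, v = relu_moments(mu, var)
# m -> [0.0, 0.5, 2.0];  v -> [0.0, 0.2, 0.3]
```

Linear layers admit exact moment propagation (an affine map of a Gaussian stays Gaussian), so an approximation of this kind is only needed at the non-linearities.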
dc.description.department: Texas Advanced Computing Center (TACC)
dc.identifier.uri: https://hdl.handle.net/2152/122165
dc.identifier.uri: http://dx.doi.org/10.26153/tsw/48969
dc.relation.ispartof: TACCSTER 2023
dc.rights.restriction: Open
dc.subject: high performance computing
dc.title: Robust Transformer Neural Network for Computer Vision Applications
dc.type: Poster

Access full-text files

Original bundle
Name: BinKarim_poster.pdf
Size: 524.26 KB
Format: Adobe Portable Document Format
Description: BinKarim poster

License bundle
Name: license.txt
Size: 1.64 KB
Description: Item-specific license agreed upon to submission