Vision Transformer-assisted analysis of neural image compression and generation
dc.contributor.advisor | Bovik, Alan C. (Alan Conrad), 1958- | |
dc.contributor.advisor | Ward, Rachel, 1983- | |
dc.creator | Minchev, Kliment | |
dc.creator.orcid | 0000-0002-4775-5625 | |
dc.date.accessioned | 2022-11-21T19:20:10Z | |
dc.date.available | 2022-11-21T19:20:10Z | |
dc.date.created | 2022-05 | |
dc.date.issued | 2022-06-30 | |
dc.date.submitted | May 2022 | |
dc.date.updated | 2022-11-21T19:20:11Z | |
dc.description.abstract | This work investigates a novel application of a Vision Transformer (ViT) as a reference quality-assessment metric for images reconstructed after neural image compression. The Vision Transformer adapts the Transformer attention mechanism, originally developed for language models, to image recognition. Because the ViT architecture outputs a classification probability distribution over a set of training labels, it is a suitable candidate for quantitatively assessing generated image quality in terms of object-level deviations from the original pre-compression image. This metric is referred to as the ViT-Score. The approach complements comparative measures based on per-pixel discrepancies (Mean Squared Error, MSE) or structural comparison (Structural Similarity Index, SSIM). The study proposes an original end-to-end deep learning framework for neural image compression, latent vector representation, reconstruction, and image quality analysis using state-of-the-art model architectures; compression and reconstruction are performed with a Generative Adversarial Network (GAN). Results from this work demonstrate that the ViT-Score is capable of assessing the quality of a neurally compressed image. Moreover, the methodology provides valuable insight when measuring GAN output quality and can be used alongside other perceived-quality metrics such as SSIM or the Fréchet Inception Distance (FID). | |
dc.description.department | Computational Science, Engineering, and Mathematics | |
dc.format.mimetype | application/pdf | |
dc.identifier.uri | https://hdl.handle.net/2152/116761 | |
dc.identifier.uri | http://dx.doi.org/10.26153/tsw/43656 | |
dc.language.iso | en | |
dc.subject | Transformers | |
dc.subject | Deep learning | |
dc.subject | Image compression | |
dc.subject | Image processing | |
dc.subject | Computer vision | |
dc.subject | Machine learning | |
dc.subject | Data science | |
dc.title | Vision Transformer-assisted analysis of neural image compression and generation | |
dc.type | Thesis | |
dc.type.material | text | |
thesis.degree.department | Computational Science, Engineering, and Mathematics | |
thesis.degree.discipline | Computational Science, Engineering, and Mathematics | |
thesis.degree.grantor | The University of Texas at Austin | |
thesis.degree.level | Masters | |
thesis.degree.name | Master of Science in Computational Science, Engineering, and Mathematics |