Vision Transformer-assisted analysis of neural image compression and generation

dc.contributor.advisorBovik, Alan C. (Alan Conrad), 1958-
dc.contributor.advisorWard, Rachel, 1983-
dc.creatorMinchev, Kliment
dc.creator.orcid0000-0002-4775-5625
dc.date.accessioned2022-11-21T19:20:10Z
dc.date.available2022-11-21T19:20:10Z
dc.date.created2022-05
dc.date.issued2022-06-30
dc.date.submittedMay 2022
dc.date.updated2022-11-21T19:20:11Z
dc.description.abstractThis work investigates a novel application of a Vision Transformer (ViT) as a reference quality-assessment metric for images reconstructed after neural image compression. The Vision Transformer adapts the Transformer attention mechanism, originally developed for language models, to image recognition. Because the ViT architecture outputs a classification probability distribution over a set of training labels, it is a suitable basis for a new quantitative measure of generated image quality based on object-level deviations from the original, pre-compression image; this metric is referred to as the ViT-Score. The approach complements comparative measures based on per-pixel discrepancies (Mean Squared Error, MSE) or structural comparison (Structural Similarity Index, SSIM). The study proposes an original end-to-end deep learning framework for neural image compression, latent vector representation, reconstruction, and image quality analysis using state-of-the-art model architectures; compression and reconstruction are performed with a Generative Adversarial Network (GAN). Results from this work demonstrate that the ViT-Score is capable of assessing the quality of a neurally compressed image. Moreover, the methodology provides valuable insight when measuring GAN output quality and can be used alongside other perceptual quality metrics such as SSIM or the Fréchet Inception Distance (FID).
dc.description.departmentComputational Science, Engineering, and Mathematics
dc.format.mimetypeapplication/pdf
dc.identifier.urihttps://hdl.handle.net/2152/116761
dc.identifier.urihttp://dx.doi.org/10.26153/tsw/43656
dc.language.isoen
dc.subjectTransformers
dc.subjectDeep learning
dc.subjectImage compression
dc.subjectImage processing
dc.subjectComputer vision
dc.subjectMachine learning
dc.subjectData science
dc.titleVision Transformer-assisted analysis of neural image compression and generation
dc.typeThesis
dc.type.materialtext
thesis.degree.departmentComputational Science, Engineering, and Mathematics
thesis.degree.disciplineComputational Science, Engineering, and Mathematics
thesis.degree.grantorThe University of Texas at Austin
thesis.degree.levelMasters
thesis.degree.nameMaster of Science in Computational Science, Engineering, and Mathematics
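The abstract describes the ViT-Score as comparing the ViT class-probability distributions of the original and reconstructed images, but the record does not give the exact formula. The sketch below is a minimal illustration of that idea under one plausible choice: scoring similarity as one minus the Jensen-Shannon divergence between the two distributions. The function names `vit_score` and `softmax` and the divergence choice are hypothetical, not the author's definition; a real pipeline would obtain the logits from a pretrained ViT rather than toy arrays.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over class logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def vit_score(p_orig, p_recon, eps=1e-12):
    """Hypothetical ViT-Score sketch: similarity between the class
    probability distributions a ViT assigns to the original and the
    reconstructed image, measured as 1 minus the Jensen-Shannon
    divergence (log base 2, so the score lies in [0, 1]); a score of
    1.0 means identical object-level predictions."""
    p = np.asarray(p_orig, dtype=float)
    q = np.asarray(p_recon, dtype=float)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2((a + eps) / (b + eps)))
    js = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return 1.0 - js

# Toy example with 3 class logits (a real ViT outputs ~1000 classes).
p = softmax(np.array([4.0, 1.0, 0.5]))   # "original" image predictions
q = softmax(np.array([3.8, 1.1, 0.6]))   # "reconstructed" predictions
print(round(vit_score(p, q), 4))
```

A mild reconstruction (similar logits) scores near 1.0, while a reconstruction the ViT classifies differently scores much lower, which is the object-level sensitivity the abstract contrasts with per-pixel metrics such as MSE.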

Access full-text files

Original bundle
Name: MINCHEV-THESIS-2022.pdf
Size: 4.66 MB
Format: Adobe Portable Document Format
