Perceptual quality prediction of social pictures, social videos, and telepresence videos

dc.contributor.advisor: Bovik, Alan C. (Alan Conrad), 1958-
dc.contributor.committeeMember: Ghadiyaram, Deepti
dc.contributor.committeeMember: De Veciana, Gustavo
dc.contributor.committeeMember: Wang, Atlas
dc.contributor.committeeMember: Geisler, Wilson S.
dc.creator: Ying, Zhenqiang
dc.creator.orcid: 0000-0001-9730-5262
dc.date.accessioned: 2022-08-17T00:46:05Z
dc.date.available: 2022-08-17T00:46:05Z
dc.date.created: 2022-05
dc.date.issued: 2022-07-01
dc.date.submitted: May 2022
dc.date.updated: 2022-08-17T00:46:06Z
dc.description.abstract: The unprecedented growth of online social-media venues and rapid advances in camera and mobile-device technology have led to the creation and consumption of a limitless supply of images and videos. Given the tremendous prevalence of Internet images and videos, monitoring their perceptual quality is a high-stakes problem. This dissertation focuses on perceptual quality prediction for social pictures, social videos, and telepresence videos, both by constructing datasets of images/videos with perceptual quality labels and by designing algorithms that accurately predict perceptual quality. While considerable effort has been devoted to predicting the perceptual quality of synthetically distorted images/videos, real-world images/videos contain complex, composite mixtures of multiple distortions that are non-uniformly distributed across space and time. The primary goal of my research is to design automatic image/video quality predictors that can effectively handle these widely diverse authentic distortions.

To develop effective quality predictors, we trained deep neural networks on large-scale databases of authentically distorted images/videos. To improve quality prediction by exploiting the non-uniformity of distortions, we collected quality labels both for whole images/videos and for patches/clips cropped from them.

For social pictures, we built the LIVE-FB Large-Scale Social Picture Quality Database, containing about 40K real-world distorted pictures and 120K patches, on which we collected about 4M human judgments of picture quality. Using these picture and patch quality labels, we built deep region-based models that learn to produce state-of-the-art global picture quality predictions as well as useful local picture quality maps. Our innovations include picture quality prediction architectures that produce global-to-local inferences as well as local-to-global inferences (via feedback); an illustrative sketch of this region-based idea appears after the metadata listing below.

For social videos, we built the Large-Scale Social Video Quality Database, containing 39K real-world distorted videos and 117K space-time localized video patches, along with 5.5M human perceptual quality annotations. Using this database, we created two unique blind video quality assessment (VQA) models: (a) a local-to-global region-based blind VQA architecture (called PVQ) that learns to predict global video quality and achieves state-of-the-art performance on three video quality datasets, and (b) a first-of-a-kind space-time video quality mapping engine (called PVQ Mapper).

For telepresence videos, we mitigated the dearth of subjectively labeled telepresence data by collecting 2K telepresence videos from different countries, on which we crowdsourced 80K subjective quality labels. Using this new resource, we created a first-of-a-kind online video quality prediction framework for live streaming, built on a multi-modal learning framework with separate pathways that compute visual and audio quality predictions (also sketched below). Our all-in-one model provides accurate quality predictions at the patch, frame, clip, and audiovisual levels, and achieves state-of-the-art performance on both existing quality databases and our new database at considerably lower computational expense, making it an attractive solution for mobile and embedded systems.
dc.description.department: Electrical and Computer Engineering
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2152/115241
dc.identifier.uri: http://dx.doi.org/10.26153/tsw/42142
dc.language.iso: en
dc.subject: Image quality assessment
dc.subject: Video quality assessment
dc.subject: Blind quality assessment
dc.subject: Perceptual quality
dc.subject: User-generated content
dc.title: Perceptual quality prediction of social pictures, social videos, and telepresence videos
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Electrical and Computer Engineering
thesis.degree.discipline: Electrical and Computer Engineering
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
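
To make the abstract's region-based idea concrete: the models predict a global quality score and a local quality map from a shared backbone, so patch-level labels can supervise the map while picture-level labels supervise the pooled score. The sketch below is a minimal, hypothetical PyTorch rendering of the local-to-global direction; the class name RegionQualityNet, the layer sizes, and the mean-pooling fusion are illustrative assumptions, not the dissertation's actual architecture.

```python
# Hypothetical sketch of a region-based quality predictor in the spirit of
# the models described in the abstract; NOT the dissertation's code.
import torch
import torch.nn as nn

class RegionQualityNet(nn.Module):
    """Predicts a global quality score plus a local quality map.

    Local-to-global: per-region scores are pooled into one global score,
    so both patch labels (map) and picture labels (score) can supervise it.
    """
    def __init__(self):
        super().__init__()
        # Small convolutional backbone standing in for a deep feature extractor.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # 1x1 conv regresses one quality value per spatial region.
        self.local_head = nn.Conv2d(64, 1, 1)

    def forward(self, x):
        feats = self.backbone(x)                    # (B, 64, H/4, W/4)
        local_map = self.local_head(feats)          # (B, 1, H/4, W/4) quality map
        global_score = local_map.mean(dim=(2, 3))   # pool local scores to global
        return global_score.squeeze(1), local_map

# Usage: one forward pass on a dummy picture batch.
net = RegionQualityNet()
score, qmap = net(torch.randn(2, 3, 224, 224))
print(score.shape, qmap.shape)  # torch.Size([2]) torch.Size([2, 1, 56, 56])
```

The abstract also describes the reverse, global-to-local inference via feedback; the mean pool above keeps the sketch to one direction for brevity.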
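
Similarly, the telepresence framework's multi-modal design can be pictured as two separate pathways whose embeddings are fused late into a single audiovisual score. This hypothetical sketch assumes precomputed visual and audio features of sizes 512 and 128; the name AudioVisualQualityNet, the pathway widths, and the late-fusion linear layer are illustrative, not the published model.

```python
# Hypothetical two-pathway audiovisual quality sketch; module names and
# feature sizes are illustrative assumptions, not the dissertation's model.
import torch
import torch.nn as nn

class AudioVisualQualityNet(nn.Module):
    """Separate visual and audio pathways fused into one quality score."""
    def __init__(self, vis_dim=512, aud_dim=128):
        super().__init__()
        self.visual_path = nn.Sequential(nn.Linear(vis_dim, 64), nn.ReLU())
        self.audio_path = nn.Sequential(nn.Linear(aud_dim, 64), nn.ReLU())
        # Late fusion: concatenate pathway embeddings, regress one scalar.
        self.fusion = nn.Linear(128, 1)

    def forward(self, vis_feats, aud_feats):
        v = self.visual_path(vis_feats)   # per-clip visual embedding
        a = self.audio_path(aud_feats)    # per-clip audio embedding
        return self.fusion(torch.cat([v, a], dim=-1)).squeeze(-1)

# Usage: score a batch of four clips from dummy features.
net = AudioVisualQualityNet()
score = net(torch.randn(4, 512), torch.randn(4, 128))
print(score.shape)  # torch.Size([4])
```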

Access full-text files

Original bundle

Name: YING-DISSERTATION-2022.pdf
Size: 11.48 MB
Format: Adobe Portable Document Format

License bundle

Name: PROQUEST_LICENSE.txt
Size: 4.45 KB
Format: Plain Text

Name: LICENSE.txt
Size: 1.84 KB
Format: Plain Text