Perceptual quality prediction of social pictures, social videos, and telepresence videos

dc.contributor.advisor: Bovik, Alan C. (Alan Conrad), 1958-
dc.contributor.committeeMember: Ghadiyaram, Deepti
dc.contributor.committeeMember: De Veciana, Gustavo
dc.contributor.committeeMember: Wang, Atlas
dc.contributor.committeeMember: Geisler, Wilson S.
dc.creator: Ying, Zhenqiang
dc.creator.orcid: 0000-0001-9730-5262
dc.date.accessioned: 2022-08-17T00:46:05Z
dc.date.available: 2022-08-17T00:46:05Z
dc.date.created: 2022-05
dc.date.issued: 2022-07-01
dc.date.submitted: May 2022
dc.date.updated: 2022-08-17T00:46:06Z
dc.description.abstract: The unprecedented growth of online social-media venues and rapid advances in camera and mobile-device technology have led to the creation and consumption of a limitless supply of images and videos. Given the tremendous prevalence of Internet images and videos, monitoring their perceptual quality is a high-stakes problem. This dissertation focuses on perceptual quality prediction for social pictures, social videos, and telepresence videos, both by constructing datasets of images/videos with perceptual quality labels and by designing algorithms that accurately predict perceptual quality. While considerable effort has been devoted to predicting the perceptual quality of synthetically distorted images/videos, real-world images/videos contain complex, composite mixtures of multiple distortions that are non-uniformly distributed across space and time. The primary goal of my research is to design automatic image/video quality predictors that can effectively handle these widely diverse authentic distortions.

To develop effective quality predictors, we trained deep neural networks on large-scale databases of authentically distorted images/videos. To improve quality prediction by exploiting the non-uniformity of distortions, we collected quality labels both for whole images/videos and for patches/clips cropped from them.

For social pictures, we built the LIVE-FB Large-Scale Social Picture Quality Database, containing about 40K real-world distorted pictures and 120K patches, on which we collected about 4M human judgments of picture quality. Using these picture and patch quality labels, we built deep region-based models that learn to produce state-of-the-art global picture quality predictions as well as useful local picture quality maps. Our innovations include picture quality prediction architectures that produce global-to-local inferences as well as local-to-global inferences (via feedback); an illustrative sketch of this region-based idea appears after the metadata listing below.

For social videos, we built the Large-Scale Social Video Quality Database, containing 39K real-world distorted videos and 117K space-time localized video patches, along with 5.5M human perceptual quality annotations. Using this database, we created two unique blind video quality assessment (VQA) models: (a) a local-to-global region-based blind VQA architecture (called PVQ) that learns to predict global video quality and achieves state-of-the-art performance on three video quality datasets, and (b) a first-of-a-kind space-time video quality mapping engine (called PVQ Mapper).

For telepresence videos, we mitigated the dearth of subjectively labeled telepresence data by collecting 2K telepresence videos from different countries, on which we crowdsourced 80K subjective quality labels. Using this new resource, we created a first-of-a-kind online video quality prediction framework for live streaming, built on a multi-modal learning framework with separate pathways that compute visual and audio quality predictions (also sketched below). Our all-in-one model provides accurate quality predictions at the patch, frame, clip, and audiovisual levels, and achieves state-of-the-art performance on both existing quality databases and our new database at considerably lower computational expense, making it an attractive solution for mobile and embedded systems.
dc.description.department: Electrical and Computer Engineering
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2152/115241
dc.identifier.uri: http://dx.doi.org/10.26153/tsw/42142
dc.language.iso: en
dc.subject: Image quality assessment
dc.subject: Video quality assessment
dc.subject: Blind quality assessment
dc.subject: Perceptual quality
dc.subject: User-generated content
dc.title: Perceptual quality prediction of social pictures, social videos, and telepresence videos
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Electrical and Computer Engineering
thesis.degree.discipline: Electrical and Computer Engineering
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
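
To make the abstract's region-based idea concrete: the models predict a global quality score and a local quality map from a shared backbone, so patch-level labels can supervise the map while picture-level labels supervise the pooled score. The sketch below is a minimal, hypothetical PyTorch rendering of the local-to-global direction; the class name RegionQualityNet, the layer sizes, and the mean-pooling fusion are illustrative assumptions, not the dissertation's actual architecture.

```python
# Hypothetical sketch of a region-based quality predictor in the spirit of
# the models described in the abstract; NOT the dissertation's code.
import torch
import torch.nn as nn

class RegionQualityNet(nn.Module):
    """Predicts a global quality score plus a local quality map.

    Local-to-global: per-region scores are pooled into one global score,
    so both patch labels (map) and picture labels (score) can supervise it.
    """
    def __init__(self):
        super().__init__()
        # Small convolutional backbone standing in for a deep feature extractor.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # 1x1 conv regresses one quality value per spatial region.
        self.local_head = nn.Conv2d(64, 1, 1)

    def forward(self, x):
        feats = self.backbone(x)                    # (B, 64, H/4, W/4)
        local_map = self.local_head(feats)          # (B, 1, H/4, W/4) quality map
        global_score = local_map.mean(dim=(2, 3))   # pool local scores to global
        return global_score.squeeze(1), local_map

# Usage: one forward pass on a dummy picture batch.
net = RegionQualityNet()
score, qmap = net(torch.randn(2, 3, 224, 224))
print(score.shape, qmap.shape)  # torch.Size([2]) torch.Size([2, 1, 56, 56])
```

The abstract also describes the reverse, global-to-local inference via feedback; the mean pool above keeps the sketch to one direction for brevity.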
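
Similarly, the telepresence framework's multi-modal design can be pictured as two separate pathways whose embeddings are fused late into a single audiovisual score. This hypothetical sketch assumes precomputed visual and audio features of sizes 512 and 128; the name AudioVisualQualityNet, the pathway widths, and the late-fusion linear layer are illustrative, not the published model.

```python
# Hypothetical two-pathway audiovisual quality sketch; module names and
# feature sizes are illustrative assumptions, not the dissertation's model.
import torch
import torch.nn as nn

class AudioVisualQualityNet(nn.Module):
    """Separate visual and audio pathways fused into one quality score."""
    def __init__(self, vis_dim=512, aud_dim=128):
        super().__init__()
        self.visual_path = nn.Sequential(nn.Linear(vis_dim, 64), nn.ReLU())
        self.audio_path = nn.Sequential(nn.Linear(aud_dim, 64), nn.ReLU())
        # Late fusion: concatenate pathway embeddings, regress one scalar.
        self.fusion = nn.Linear(128, 1)

    def forward(self, vis_feats, aud_feats):
        v = self.visual_path(vis_feats)   # per-clip visual embedding
        a = self.audio_path(aud_feats)    # per-clip audio embedding
        return self.fusion(torch.cat([v, a], dim=-1)).squeeze(-1)

# Usage: score a batch of four clips from dummy features.
net = AudioVisualQualityNet()
score = net(torch.randn(4, 512), torch.randn(4, 128))
print(score.shape)  # torch.Size([4])
```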

Access full-text files

Original bundle

Name: YING-DISSERTATION-2022.pdf
Size: 11.48 MB
Format: Adobe Portable Document Format

License bundle

Name: PROQUEST_LICENSE.txt
Size: 4.45 KB
Format: Plain Text

Name: LICENSE.txt
Size: 1.84 KB
Format: Plain Text