Quality prediction and visual enhancement of user-generated content




Tu, Zhengzhong




With the rapid development of streaming media technologies, coupled with the explosion of user-generated content (UGC) captured and streamed over social media platforms such as YouTube and Facebook, videos now play a central role in the daily lives of billions of people. The popularity of UGC video has created a pressing need to understand and analyze these billions of shared videos in order to optimize pipelines for efficient UGC video storage, processing, and streaming. UGC videos, typically created by amateur videographers, often suffer from unsatisfactory perceptual quality, arising at any stage of the video acquisition process. Predicting UGC video quality is therefore considerably more challenging than assessing the quality of the synthetically distorted videos found in traditional video quality databases. In this dissertation, we comprehensively investigate quality prediction and enhancement problems for UGC pictures and videos. We first study a particular artifact, the "banding artifact," a common video compression impairment. We begin by analyzing the perceptual and encoding aspects of color bands, then build a new distortion-specific no-reference quality metric dedicated to banding visibility. We further formulate banding removal as a visual enhancement problem and solve it with a content-adaptive smoothing filter followed by dithered quantization, applied as a post-processing module. We then extend this debanding filter by learning a cascaded artifact removal network that jointly removes banding and blocking artifacts, yielding greater visual enhancement. Beyond any single artifact, UGC distortions are diverse, complicated, and commingled, so no single quality factor suffices to predict overall quality; blindly predicting the perceptual quality of UGC videos is thus very challenging.
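The "smoothing plus dithered quantization" recipe for debanding can be illustrated with a minimal NumPy sketch. The function name, parameters (window size, gradient threshold, quantization levels), and the simple box filter below are illustrative assumptions, not the dissertation's actual debanding algorithm:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def deband(channel, smooth_size=9, grad_thresh=2.0, levels=256):
    """Toy debanding sketch for a single 8-bit image channel.

    NOTE: smooth_size, grad_thresh, and levels are hypothetical
    illustrative parameters, not those of the proposed filter.
    """
    x = channel.astype(np.float64)
    # Content-adaptive smoothing: only low-gradient (flat) pixels are
    # replaced by a local average, so true edges are left untouched.
    gy, gx = np.gradient(x)
    flat = np.hypot(gx, gy) < grad_thresh
    smoothed = uniform_filter(x, size=smooth_size)
    x = np.where(flat, smoothed, x)
    # Dithered quantization: add uniform noise of +/- half a step
    # before rounding, which breaks up contour-like banding.
    step = 256.0 / levels
    rng = np.random.default_rng(0)
    dither = rng.uniform(-0.5, 0.5, x.shape) * step
    out = np.round((x + dither) / step) * step
    return np.clip(out, 0, 255).astype(np.uint8)
```

The key design point is that smoothing alone merely shifts the band boundaries, while dithering alone leaves visible noise; combining an edge-preserving smoother with dithered re-quantization removes the staircase contours without blurring real structure.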
We first conduct a benchmark study of leading no-reference video quality metrics on recent large-scale UGC video databases, then apply feature selection to a curated set of effective spatial and temporal features from popular VQA models to build a new, compact video quality model, which we dub VIDEVAL. Complementing this compact model, we also propose an efficiency-oriented fast model for practical deployment, called RAPIQUE, which combines efficient natural scene statistics features with pre-trained deep learning features and employs an aggressive spatial and temporal sampling scheme to boost efficiency. In building these models, we also explore the temporal statistics of natural videos, which helps push forward the performance of VQA models on motion-intensive videos with large camera motion. Next, we study the visual restoration and enhancement of pictures degraded by distortions commonly found in UGC videos, including noise, blur, and low light. Building on recent progress in Transformer and multi-layer perceptron (MLP) models, we propose an efficient MLP-based vision backbone, which we dub MAXIM, that effectively restores degraded images. The core component of MAXIM is the multi-axis gated MLP block, which achieves both local and global spatial interactions in linear complexity. We further extend this idea to high-level vision tasks such as image recognition with another vision backbone, called MaxViT. Extensive numerical and visual experiments show that this multi-axis approach provides a strong vision component for both low-level and high-level vision tasks. Finally, we conclude the thesis with remarks on current challenges and future directions in UGC video quality prediction and enhancement.
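The "multi-axis" notion of mixing along two complementary spatial axes can be sketched as two tensor partitions. The function names and shapes below are simplified illustrations of the general idea, not the actual MAXIM or MaxViT implementation: within-group mixing over a block partition yields local interaction, while the same mixing over a dilated grid partition yields global interaction, and because every group has a fixed size the total cost stays linear in the number of pixels.

```python
import numpy as np

def block_partition(x, b):
    """Split an (H, W, C) map into non-overlapping b x b blocks,
    returning (H//b * W//b, b*b, C). Mixing tokens inside each
    group couples only *nearby* pixels (local interaction)."""
    H, W, C = x.shape
    x = x.reshape(H // b, b, W // b, b, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, b * b, C)

def grid_partition(x, g):
    """Split an (H, W, C) map along a dilated g x g grid,
    returning (H//g * W//g, g*g, C). Each group now samples pixels
    spread across the whole image (global interaction)."""
    H, W, C = x.shape
    x = x.reshape(g, H // g, g, W // g, C)
    return x.transpose(1, 3, 0, 2, 4).reshape(-1, g * g, C)
```

With fixed block/grid size b, each group holds b*b tokens and there are HW/b*b groups, so full pairwise mixing inside groups costs O(HW * b*b) rather than the O((HW)^2) of dense global attention.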

