Browsing by Subject "Natural scene statistics"
Now showing 1 - 10 of 10
Item Blind image and video quality assessment using natural scene and motion models (2013-05) Saad, Michele Antoine; Bovik, Alan C. (Alan Conrad), 1958-

We tackle the problems of no-reference/blind image and video quality evaluation. Our approach models the statistical characteristics of natural images and videos, and uses deviations from those natural statistics as indicators of perceived quality. We propose a probabilistic model of natural scenes and a probabilistic model of natural videos to drive our image and video quality assessment (I/VQA) algorithms, respectively. The VQA problem differs considerably from the IQA problem, since it imposes additional challenges; namely, those arising from the temporal dimension of video, which plays an important role in influencing human perception of quality. We compare our IQA approach to state-of-the-art blind, reduced-reference, and full-reference methods, and show that it is top performing. We compare our VQA approach to state-of-the-art reduced- and full-reference methods (no blind VQA methods that perform reliably well exist), and show that our algorithm performs as well as the top-performing full- and reduced-reference algorithms in predicting human judgments of quality.

Item A closed-form correlation model of oriented bandpass natural images beyond adjacent responses (2015-05) Sinno, Zeina; Bovik, Alan C. (Alan Conrad), 1958-; Ghosh, Joydeep

Building natural scene statistical models is crucial for a large set of applications, from the design of faithful image and video quality metrics to image enhancement techniques. Most predominant statistical models of natural images characterize univariate distributions of divisively normalized bandpass responses, or wavelet-like decompositions of them.
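The divisively normalized bandpass responses that these univariate models characterize are typically mean-subtracted, contrast-normalized (MSCN) coefficients. A minimal sketch of the normalization step (the 7×7 box window and the stabilizing constant c are illustrative choices here; published models typically use a Gaussian weighting window):

```python
import numpy as np

def box_filter(x, k=7):
    """Local mean over a k-by-k window (edge-padded)."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / (k * k)

def mscn(image, k=7, c=1.0):
    """Divisive normalization: subtract the local mean and divide by the
    local standard deviation. On natural images the resulting coefficients
    are heavy-tailed around zero and approximately decorrelated."""
    image = np.asarray(image, dtype=np.float64)
    mu = box_filter(image, k)
    var = box_filter(image ** 2, k) - mu ** 2
    sigma = np.sqrt(np.clip(var, 0.0, None))
    return (image - mu) / (sigma + c)

rng = np.random.default_rng(0)
coeffs = mscn(rng.uniform(0, 255, size=(64, 64)))
```

Deviations of the empirical distribution of such coefficients from the distribution measured on pristine natural images are what NSS-based quality models exploit.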
Previous models of these bandpass natural responses offer optimized solutions to numerous problems in image processing; however, they have not focused on finding a closed-form quantitative model capturing the bivariate natural statistics. Toward filling this gap, Su et al. recently modeled spatially horizontally neighboring bandpass image responses at multiple scales; however, that work did not cover distant bandpass image responses at various spatial orientations. This work builds on Su et al.'s model and extends the closed-form correlation model to cover distant bandpass image responses, up to a distance of ten pixels, at multiple spatial orientations, encompassing all the discrete spatial angles for those distances at multiple scales.

Item Measuring and predicting detection performance on security images as a function of image quality (2021-04-20) Gupta, Praful; Bovik, Alan C. (Alan Conrad), 1958-; Glover, Jack L.; Ghosh, Joydeep; Vikalo, Haris; Geisler, Wilson S.

Developing algorithms that predict how image quality affects task performance is a topic of great interest for many applications, including security and medical imaging. While task performance prediction as a function of image quality has been studied by the medical imaging community, little work in this direction has been reported in the security imaging field. In this work, we take steps towards broadly solving this problem. In the first part of the dissertation, we analyze the effects of X-ray image quality degradations on the detection performance of trained bomb technicians. For this, we conducted a visual task performance study on a database of distorted security X-ray images whose quality was degraded using realistic models of noise. Using this new NIST-LIVE X-Ray IED Image Quality Database, we created a set of objective human task prediction algorithms using models of natural X-ray image statistics.
Our basic tools are traditional Image Quality Indicators (IQIs) and perceptually relevant Natural Scene Statistics (NSS) based image models that have been extensively used in visible light (VL) image quality prediction algorithms. We show that these measures quantify the perceptual severity of degradations, and can predict the visual task performance of experts trained to detect and identify components of improvised explosive devices (IEDs) in images. Combining these efficient and perceptually relevant models with standardized IQIs yields even better task performance prediction. This is important, since simple, non-perceptual IQIs currently define IEEE/ANSI standards for numerical measurement of portable X-ray imager quality, and can potentially be augmented with NSS quality models to further improve their predictive capabilities.

In the second part of the dissertation, we analyze the effects of distortions in millimeter-wave (MMW) images on the detection performance of automatic target recognition (ATR) systems. Deep-learning based recognition systems have become ubiquitous owing to their remarkable performance on many machine vision tasks. We study the susceptibility of such systems to degradations of input image quality. These ATR systems are responsible for detecting concealed contraband in images of passengers screened at airports. The performance of an ATR system is demonstrated to be worse on test images that differ significantly in quality from the training set. We use the NSS of MMW images to design a distortion-agnostic, no-reference image quality model, referred to as MMW-NIQE, which correlates well with the detection performance of the studied ATR systems and outperforms other image quality models on this task. Using this information, we demonstrate the efficacy of MMW-NIQE as a real-time front-end input image validation tool for ATR systems.
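The abstract does not give MMW-NIQE's internals, but NIQE-family models generally score quality as a distance between a multivariate Gaussian fit of a test image's NSS features and one fit to a corpus of pristine images. A hedged sketch of that scoring step (feature extraction itself is omitted; the function name and stand-in features are illustrative, not from the dissertation):

```python
import numpy as np

def niqe_style_score(test_features, mu_pristine, cov_pristine):
    """Mahalanobis-like distance between the Gaussian fit of the test
    image's per-patch NSS feature vectors and the pristine model.
    Larger distance => more 'unnatural' => lower predicted quality."""
    mu_t = test_features.mean(axis=0)
    cov_t = np.cov(test_features, rowvar=False)
    diff = mu_pristine - mu_t
    pooled = (cov_pristine + cov_t) / 2.0
    return float(np.sqrt(diff @ np.linalg.pinv(pooled) @ diff))

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 5))                # stand-in per-patch features
mu_p, cov_p = feats.mean(axis=0), np.cov(feats, rowvar=False)
score_clean = niqe_style_score(feats, mu_p, cov_p)        # 0.0: same statistics
score_shift = niqe_style_score(feats + 3.0, mu_p, cov_p)  # statistics deviate
```

Because no reference image is needed, such a score can act as the kind of real-time front-end validation gate described above.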
The ability to validate input images based on MMW-NIQE quality scores helps guide ATR systems to achieve predictable and robust performance. We also show that training on a set of mixed-quality images can yield better model generalization than simply fine-tuning on degraded images. We propose a quality-aware ATR system that extracts quality measurements from the test image and, in real time, tunes the algorithm to perform optimally. The proposed model achieves this by taking a quality-weighted ensemble of multiple model predictions, each trained on a separate image quality cluster, thereby improving the resilience of the overall system to diverse types of image degradation.

Item Natural scene statistics based blind image quality assessment in spatial domain (2011-05) Mittal, Anish; Bovik, Alan C. (Alan Conrad), 1958-; Cormack, Lawrence K.

We propose a natural scene statistics based quality assessment model, the Referenceless Image Spatial QUality Evaluator (RISQUE), which extracts marginal statistics of local normalized luminance signals and measures the 'un-naturalness' of the distorted image based on their measured deviations. We also model the distribution of pairwise products of adjacent normalized luminance signals, which provides orientation distortion information. Although multi-scale, the model is defined in the spatial domain, avoiding costly frequency or wavelet transforms. The framework is simple, fast, based on human perception, and shown to perform statistically better than other proposed no-reference algorithms and the full-reference structural similarity index (SSIM).

Item Pattern detection in natural images (2016-12) Sebastian, Stephen P.; Geisler, Wilson S.; Bovik, Alan; Hayhoe, Mary; Cormack, Lawrence K.; Seideman, Eyal

A fundamental visual task is to detect target objects within a background scene.
Using relatively simple stimuli, vision science has identified several major factors that affect detection thresholds: the luminance of the background, the contrast of the background, the spatial similarity of the background to the target, and uncertainty due to random variations in the properties of the background and in the amplitude of the target. Here I use a new experimental approach, together with a theoretical analysis based on signal detection theory, to discover how these factors affect detection in natural scenes. First, I sorted a large collection of natural image backgrounds into multidimensional bins, where each bin corresponds to a narrow range of luminance, contrast, and similarity. Detection thresholds were measured by randomly sampling a natural image background from a bin on each trial. In low-uncertainty conditions, both the bin and the amplitude of the target were blocked; in high-uncertainty conditions, the bin and amplitude varied randomly on each trial. I found that thresholds increased approximately linearly along all three dimensions and that detection accuracy was unaffected by bin and amplitude uncertainty. The entire set of results was predicted from first principles by a normalized matched template detector, where the dynamic normalizing factor follows directly from the statistical properties of the natural backgrounds. This model assumed that the properties of the background underneath the target were constant across the image, but in natural images this is often not the case. Therefore, in a separate experiment, I measured detection thresholds on backgrounds where the contrast was modulated underneath the target. I found that varying the contrast underneath the target signal had a substantial effect on detectability, and that the pattern of results was predicted by an ideal observer that weighted its response based on an estimate of the local contrast under the target.
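A toy illustration of the normalized-template idea: the template response is divided by a normalizing factor estimated from the background. This is a sketch, not the dissertation's detector; here the normalizer uses only local RMS contrast, whereas the actual model's normalizer is derived from background luminance, contrast, and similarity statistics:

```python
import numpy as np

def normalized_template_response(patch, template, eps=1e-6):
    """Matched-template response divided by a contrast-driven
    normalizing factor; 'target present' would be decided by
    comparing the response to a criterion."""
    patch = patch - patch.mean()      # discount local luminance
    contrast = patch.std() + eps      # local RMS contrast
    return float((patch * template).sum() / contrast)

rng = np.random.default_rng(0)
template = rng.normal(size=(16, 16))      # stand-in target pattern
background = rng.normal(size=(16, 16))    # stand-in natural background
r_absent = normalized_template_response(background, template)
r_present = normalized_template_response(background + 2.0 * template, template)
# r_present exceeds r_absent because of the added target energy
```

Normalizing by background statistics is what makes thresholds rise roughly linearly with background contrast and similarity, as the experiments above report.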
This suggests that the human visual system is able to use the varying properties of the background under the target in a near-optimal way. Taken together, the results provide a new explanation for some classic laws of psychophysics and their underlying neural mechanisms.

Item Perceptual monocular depth estimation (2020-03-24) Pan, Janice S.; Bovik, Alan C. (Alan Conrad), 1958-; Ghosh, Joydeep; Vikalo, Haris; Huang, Qixing; Mueller, Martin

Monocular depth estimation (MDE), the task of using a single image to predict scene depths, has gained considerable interest, in large part owing to the popularity of applying deep learning methods to “computer vision problems”. Monocular cues provide sufficient data for humans to instantaneously extract an understanding of scene geometries and relative depths, which is evidence of both the processing power of the human visual system and the predictive power of monocular data. However, developing computational models to predict depth from monocular images remains challenging. Hand-designed MDE features do not perform particularly well, and even current “deep” models are still evolving. Here we propose a novel approach that uses perceptually relevant natural scene statistics (NSS) features to predict depths from monocular images in a simple, scale-agnostic way that is competitive with state-of-the-art systems. While the statistics of natural photographic images have been successfully used in a variety of image and video processing, analysis, and quality assessment tasks, they have never before been applied in a predictive end-to-end deep-learning model for monocular depth. Here we accomplish this by developing a new closed-form bivariate model of image luminances and using features extracted from this model and from other NSS models to drive a novel deep learning framework for predicting depth from a single image.
We then extend our perceptually-based MDE model to fisheye images, which suffer from severe spatial distortions, and we show that our method, which uses monocular cues, performs comparably to our best fisheye stereo matching approach. Fisheye cameras have become increasingly popular in automotive applications because they provide a wider (approximately 180 degree) field of view (FoV), thereby giving drivers and driver assistance systems more visibility with minimal hardware. We explore fisheye stereo as it pertains specifically to the problem of automotive surround-view (SV), a system comprising four fisheye cameras positioned on the front, right, rear, and left sides of a vehicle. The SV system perspectively transforms the images captured by these four cameras and stitches them together into a bird's-eye-view representation of the scene, centered on the ego vehicle, to display to the driver. With the camera axes oriented orthogonally away from each other and each camera capturing approximately 180 degrees laterally, the FoVs of adjacent cameras overlap. It is within these regions that we have stereo vision and can thus triangulate depths with an appropriate correspondence matching method. Each stereo system within the SV configuration has a wide baseline and two orthogonally-divergent camera axes, both of which make traditional methods for estimating stereo correspondences perform poorly. Our stereo pipeline, which relies on a neural network trained to predict stereo correspondences, performs well even when the stereo system has limited FoV overlap and two dissimilar views. Our monocular approach, however, can be applied to entire fisheye images and does not rely on the underlying geometry of the stereo configuration. We compare these two depth-prediction methods in both performance and application.
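For an ideal rectified stereo pair, the triangulation mentioned above reduces to the classical pinhole relation depth = focal length × baseline / disparity. A minimal sketch for intuition only: the surround-view rigs described here, with wide baselines and orthogonally-divergent fisheye axes, require a more general projection model.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Rectified-stereo triangulation: depth is inversely
    proportional to disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# e.g. a 10-pixel disparity with a 1000-pixel focal length and a
# 0.5 m baseline corresponds to a depth of 50 m
depth = depth_from_disparity(10.0, 1000.0, 0.5)
```

The inverse relation also explains why wide-baseline correspondence errors matter most for distant objects, where disparities are small.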
To explore stereo correspondence matching using fisheye images and MDE in non-fisheye images, we also generated a large-scale photorealistic synthetic database containing co-registered RGB images and depth maps using a simulated SV camera configuration. The database was first captured using fisheye cameras with known intrinsic parameters, and the fisheye distortions were then removed to create the non-fisheye portion of the database. We detail the process of creating the synthetic-but-realistic city scene in which we captured the images and depth maps, along with the methodology for generating such a large, varied, and generalizable dataset.

Item Perceptual quality assessment of user-generated-content images and videos (2022-07-01) Yu, Xiangxu; Bovik, Alan C. (Alan Conrad), 1958-; Ghosh, Joydeep; Vikalo, Haris; Geisler, Wilson S.; Lee, J. C.; Adsumilli, Balu

Because of the increasing ease of image and video capture, many millions of consumers create and upload large volumes of User-Generated-Content (UGC) images and videos to social and streaming media sites over the Internet. UGC images and videos are commonly captured by naive users with limited skills and imperfect technique, and tend to be afflicted by mixtures of highly diverse in-capture distortions. They are then often uploaded for sharing onto cloud servers, where they are further compressed for storage and transmission. My Ph.D. research first tackles the highly practical problem of predicting the quality of compressed images and videos with only (possibly severely) distorted UGC references. To address this problem, we develop a novel two-step image quality prediction concept called 2stepQA, and a novel Video Quality Assessment (VQA) framework called 1stepVQA.
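The two-step concept can be caricatured as fusing two measurements: a no-reference score of the (possibly already distorted) UGC reference, and a full-reference score of the compressed version against that reference. The multiplicative fusion below is only an illustrative sketch, not necessarily the fusion 2stepQA actually uses; both inputs are assumed normalized to [0, 1], higher meaning better:

```python
def two_step_quality(nr_reference_score, fr_fidelity_score):
    """Fuse reference quality (no-reference) with compression fidelity
    (full-reference): a perfect encode of a poor reference, or a poor
    encode of a pristine reference, both yield a low overall score."""
    for s in (nr_reference_score, fr_fidelity_score):
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    return nr_reference_score * fr_fidelity_score

q_good_ref = two_step_quality(0.9, 0.8)   # high-quality reference
q_poor_ref = two_step_quality(0.4, 0.8)   # same encode, degraded reference
```

The point of the two-step framing is that a full-reference metric alone overrates encodes of already-distorted references, since it measures only fidelity to the reference, not the reference's own quality.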
We construct a new, first-of-its-kind image quality database specialized for the design and testing of two-step IQA models, and a new dedicated video database, created by applying a realistic VMAF-guided perceptual rate-distortion optimization (RDO) criterion to produce realistically compressed versions of UGC source videos, which typically have pre-existing distortions. Furthermore, we also study automatic quality prediction for a particular UGC category: UGC gaming videos. To do this, we create a novel UGC gaming video resource, the LIVE-YouTube Gaming video quality (LIVE-YT-Gaming) database, comprising 600 real UGC gaming videos. We create a new VQA model specifically designed to succeed on UGC gaming videos, called the Gaming Video Quality Predictor (GAME-VQP). GAME-VQP successfully captures the unique statistical characteristics of gaming videos by drawing on features designed under modified natural scene statistics models, combined with gaming-specific features learned by a Convolutional Neural Network. We study the performance of 2stepQA, 1stepVQA, and GAME-VQP on the three new image and video databases, respectively, and find that they all outperform other mainstream models.

Item Scene statistics in 3D natural environments (2010-08) Liu, Yang, 1976-; Bovik, Alan C. (Alan Conrad), 1958-; Cormack, Lawrence K.; Geisler, Wilson S.; Vishwanath, Sriram; Ghosh, Joydeep

In this dissertation, we conducted a stereoscopic eye tracking experiment using naturalistic stereo images. We analyzed low-level 2D and 3D scene features at binocular fixations and at randomly selected locations. The results reveal that humans tend to fixate on regions with higher luminance variation but lower disparity variation. Because luminance and depth changes often co-occur in natural environments, the dichotomy between luminance features and disparity features inspired us to study the accurate statistics of 2D and 3D scene properties.
Using a range map database, we studied the distribution of disparity in natural scenes. The natural disparity distribution has a high peak at zero and heavy tails, similar to a Laplace distribution. The relevance of the natural disparity distribution to other studies in neurobiology and visual psychophysics is discussed in detail. We also studied luminance, range, and disparity statistics in natural scenes using a co-registered luminance-range database. The distributions of bandpass 2D and 3D scene features can be well modeled by generalized Gaussian models. There are positive correlations between bandpass luminance and depth, which can be captured by varying the shape parameters of the generalized Gaussian probability density functions. In another study, on suprathreshold luminance and depth discontinuities, we show that a significant luminance edge is much more likely to be observed at a significant depth edge than on a homogeneous depth surface. Likewise, a significant depth edge occurs at a significant luminance edge with greater probability than in homogeneous luminance regions. Again, the dependency between luminance and depth discontinuities can be modeled successfully by generalized Gaussians. We applied our statistical models of 3D natural scenes to stereo correspondence. A Bayesian framework is proposed that incorporates the bandpass disparity prior, and the luminance-disparity dependency in the likelihood function. We compared our algorithm with a classical simulated annealing method based on heuristically defined energy functions. The computed disparity maps show great improvements, both perceptually and objectively.

Item Statistical and perceptual properties of images and videos with applications (2019-06-18) Sinno, Zeina; Bovik, Alan C. (Alan Conrad), 1958-; Ghosh, Joydeep; Caramanis, Constantine; Valvano, Jonathan; Geisler, Wilson S.

The visual brain is optimally designed to process images from the natural environment that we perceive.
Describing the natural environment statistically helps in understanding how the brain encodes those images efficiently. The Natural Scene Statistics (NSS) of the luminance component of images form the basis of several univariate statistical models. Such models have been fundamental building blocks of multiple visual applications, ranging from the design of faithful image and video quality models to the development of perceptually optimized image enhancement techniques. Toward advancing this area, I studied the bivariate statistical properties of images and developed a first-of-its-kind closed-form model that describes the correlation of spatially separated bandpass image samples. I found that the model was useful in tackling different problems, such as blindly assessing the quality of images and assessing the 3D visual discomfort of stereo images. Given the success of NSS in tackling image processing problems, I decided to use them as a tool to tackle the blind video quality assessment (VQA) problem. First, I constructed a video quality database, the LIVE Video Quality Challenge Database (LIVE-VQC). This database is the largest across several key dimensions: number of unique contents, distortions, devices, resolutions, and videographers. To collect the subjective scores, I constructed a new framework on Amazon Mechanical Turk, and a massive number of subjects from across the globe participated in my study. Those efforts resulted in a VQA database that serves as a strong benchmark for real-world videos. Next, I studied the spatio-temporal statistics of a wide variety of natural videos and created a space-time completely blind VQA model that deploys a directional temporal NSS model to predict quality. My newly created model outperforms all previous completely blind VQA models on LIVE-VQC.

Item Utilizing natural scene statistics and blind image quality analysis of infrared imagery (2013-08) Kaser, Jennifer Yvonne; Bovik, Alan C. (Alan Conrad), 1958-

With the increasing number and affordability of image capture devices, there is an increasing demand to objectively analyze and compare the quality of images. Image quality can also be used as an indicator of whether a source image is of high enough quality to perform analysis on. When applied to real-world scenarios, use of a blind algorithm is essential, since a flawless reference image is typically unavailable. Recent research has shown promising results in no-reference image quality assessment utilizing natural scene statistics in the visible light image space. Research has also shown that, although the statistical profiles vary slightly, there are statistical regularities in IR images as well, which would indicate that natural scene statistical models may be applicable. In this project, I will analyze BRISQUE quality features of IR images and determine whether the algorithm can successfully be applied to IR images. Additionally, to validate the usefulness of these techniques, the BRISQUE quality features are analyzed using a detection algorithm to determine whether they can be used to predict conditions that may cause missed detections.
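BRISQUE's features are, in essence, generalized Gaussian distribution (GGD) fits to normalized luminance coefficients and to products of neighboring coefficients; the question the project asks is whether those fits remain informative for IR imagery. A sketch of the standard moment-matching shape estimate (the grid-search bounds and step are illustrative choices):

```python
import math
import random

def ggd_shape(samples):
    """Estimate the GGD shape parameter by matching the sample ratio
    r = E[|x|]^2 / E[x^2] against the GGD ratio function
    rho(g) = Gamma(2/g)^2 / (Gamma(1/g) * Gamma(3/g)).
    Shape near 2 indicates Gaussian-like data; near 1, Laplacian-like."""
    n = len(samples)
    mean_abs = sum(abs(x) for x in samples) / n
    mean_sq = sum(x * x for x in samples) / n
    r = mean_abs ** 2 / mean_sq
    best_g, best_err = 0.2, float("inf")
    for i in range(9800):                      # grid search over g in [0.2, 10]
        g = 0.2 + i * 0.001
        rho = math.gamma(2 / g) ** 2 / (math.gamma(1 / g) * math.gamma(3 / g))
        if abs(rho - r) < best_err:
            best_g, best_err = g, abs(rho - r)
    return best_g

random.seed(0)
gaussian_shape = ggd_shape([random.gauss(0.0, 1.0) for _ in range(5000)])
# for Gaussian samples the fitted shape is close to 2
```

When distortions alter an image, the fitted shape (and scale) parameters drift away from their natural-image range; deviations of this kind are what BRISQUE-style features summarize.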