Browsing by Subject "Sampling"
Now showing 1 - 4 of 4
Item
Evaluating the uncertainty of a UN indicator for sustainable development goal 6.1 : case study on water access in Morocco (2017-12)
Oueidat, Haytham; Eaton, David J.; McKinney, Daene

This report examines the reliability of an indicator used by the UN to monitor the progress of member nations on Sustainable Development Goal 6.1, which concerns improved universal water access. The report examines the international and national framework for development indicators through a case study on Morocco. The study discusses limitations posed by household surveys and by post-census investigations of water service providers. It proposes a methodology to harmonize the conflicting indicator definitions that are leading to contested statistics. The new methodology combines user-based stratified sampling techniques and provider-based definitions of the population's strata with satellite image processing to select households. Two experiments demonstrate improvements in the indicator's precision and accuracy. The first experiment simulates the uncertainty of water access estimates resulting from a typical household survey. The second experiment presents the reasoning behind, and implementation of, the new sampling methodology, showing a clear reduction of bias and variance in water access estimates.

Item
Models of streaming computation (2021-08-13)
Kallaugher, John M.; Price, Eric, Ph. D.; Aaronson, Scott; Zuckerman, David; Kapralov, Michael

Streaming algorithms, which process very large datasets received one update at a time, are a key tool in large-scale computing. This thesis studies the theory of streaming algorithms, and in particular the implications of how we model them. In constructing such a model, two questions arise:
• What kinds of stream must the algorithm handle?
• What is the algorithm allowed to do?
We describe new streaming algorithms and prove new lower bounds in various streaming models, illustrating how the answers to these questions change which kinds of streaming problem are tractable. These include:
• New algorithms for counting triangles in the insertion-only and linear sketching models, along with tight (up to log factors) lower bounds.
• A quantum streaming algorithm for counting triangles, giving the first quantum advantage for a natural one-pass streaming problem.
• A complete characterization of the linear sketching complexity of subgraph counting problems in constant-degree graphs.
• An exponential separation between linear sketching and turnstile streaming under natural restrictions on the stream, complementing a prior series of work [Gan08, LNW14, AHLW16] that establishes equivalences between these models in the absence of such restrictions.
• A new model of random-order streaming in which some correlations are allowed between edge arrival times, along with tight (up to polylog factors) upper and lower bounds for the problem of finding components in this model.
A common theme in many of these results is the connection between sampling algorithms and lower bounds derived from the techniques of Boolean Fourier analysis. We give new methods for extending these techniques to settings such as linear sketching and random-order streaming.
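As a point of reference for the sampling theme in this abstract, the following is a minimal illustrative sketch, not the algorithms developed in the thesis, of the classical edge-sampling idea for one-pass triangle counting on an insertion-only stream. The function name and the fixed sampling probability `p` are assumptions made for the example.

```python
import random
from collections import defaultdict

def estimate_triangles(edge_stream, p=0.5, seed=0):
    """One-pass triangle estimate on an insertion-only edge stream.

    Each arriving edge is kept in the sample independently with probability p.
    When edge (u, v) arrives, every vertex w with both (u, w) and (v, w)
    already sampled closes a triangle. Each triangle is counted at most once,
    when its last edge arrives, with probability p**2, so dividing the count
    by p**2 gives an unbiased estimate of the true number of triangles.
    """
    rng = random.Random(seed)
    sampled = defaultdict(set)   # vertex -> sampled neighbours seen so far
    closed = 0
    for u, v in edge_stream:
        closed += len(sampled[u] & sampled[v])
        if rng.random() < p:
            sampled[u].add(v)
            sampled[v].add(u)
    return closed / (p * p)

# Toy example: a 4-clique contains 4 triangles; in expectation the estimate is 4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(estimate_triangles(edges, p=0.9))
```

The space used is governed by the sample, not the full graph, which is the basic trade-off such streaming estimators exploit; the thesis's results concern far sharper algorithms and matching lower bounds in several models.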
Item
Probabilistic bicriteria models : sampling methodologies and solution strategies (2010-08)
Rengarajan, Tara; Morton, David P.; Hasenbein, John J.; Kutanoglu, Erhan; Muthuraman, Kumar; Popova, Elmira

Many complex systems involve simultaneous optimization of two or more criteria, with uncertainty of system parameters being a key driver in decision making. In this thesis, we consider probabilistic bicriteria models in which we seek to operate a system reliably while keeping operating costs low. High reliability translates into low risk of uncertain events that can adversely impact the system. In bicriteria decision making, a good solution must, at the very least, have the property that the two criteria cannot both be improved relative to it. Identifying a broad spectrum of such solutions can be highly involved, with no analytical or robust numerical techniques readily available, particularly when the system involves nontrivial stochastics. This thesis is a step toward addressing this issue. We show how to construct, using Monte Carlo sampling, approximate solutions that are sufficiently close to optimal, easily calculable, and subject to a low margin of error. Our approximations can be used in bicriteria decision making across several domains that involve significant risk, such as finance, logistics, and revenue management.

As a first approach, we place a premium on a low risk threshold and examine the effects of a sampling technique that guarantees a prespecified upper bound on risk. Our model incorporates a novel construct in the form of an uncertain disrupting event whose time and magnitude of occurrence are both random. We show that stratifying the sample observations in an optimal way can yield savings of a high order. We also demonstrate the existence of generalized stratification techniques that enjoy this property and that can be used without full distributional knowledge of the parameters governing the time of disruption. Our work thus provides a computationally tractable approach for solving a wide range of bicriteria models via sampling, with a probabilistic guarantee on risk. Improved proximity to the efficient frontier is illustrated in the context of a perishable inventory problem.

In contrast to this approach, we next solve a bicriteria facility sizing model in which risk is the probability that the system fails to jointly satisfy a vector-valued random demand. Here, instead of seeking a probabilistic guarantee on risk, we seek to approximate well the efficient frontier for a range of risk levels of interest. Replacing the risk measure with an empirical measure induced by a random sample, we solve a family of parametric chance-constrained and cost-constrained models. These two sampling-based approximations differ substantially in terms of what is known regarding their asymptotic behavior, their computational tractability, and even their feasibility relative to the underlying "true" family of models. We establish, however, that in the bicriteria setting we have the freedom to employ either the chance-constrained or the cost-constrained family of models, improving our ability to characterize the quality of the efficient frontiers arising from these sampling-based approximations and to solve the approximating models themselves. Our computational results reinforce the need for such flexibility and help us understand the behavior of confidence bounds for the efficient frontier.
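To make the sample-based approximation idea concrete, here is a minimal sketch, not drawn from the thesis, of how an empirical measure induced by a random demand sample can stand in for the true risk in a crude chance-constrained sweep over candidate capacities. The facility layout, candidate grid, cost function, and demand distribution are all invented for illustration.

```python
import numpy as np

def empirical_risk(capacity, demand_samples):
    """Fraction of sampled demand vectors the capacity vector fails to cover
    jointly, i.e. the sample-based estimate of the risk measure."""
    shortfall = (demand_samples > capacity).any(axis=1)
    return shortfall.mean()

def sample_based_frontier(candidates, costs, demand_samples, risk_levels):
    """For each risk level alpha, return the cheapest candidate capacity whose
    empirical risk does not exceed alpha (a toy chance-constrained sweep)."""
    frontier = {}
    risks = np.array([empirical_risk(c, demand_samples) for c in candidates])
    for alpha in risk_levels:
        feasible = np.where(risks <= alpha)[0]
        if feasible.size:
            best = feasible[np.argmin(costs[feasible])]
            frontier[alpha] = (candidates[best], costs[best], risks[best])
    return frontier

# Toy illustration with made-up data: two facilities, normally distributed demand.
rng = np.random.default_rng(0)
demand = rng.normal(loc=[10.0, 5.0], scale=[2.0, 1.0], size=(1000, 2))
candidates = np.array([[c1, c2] for c1 in np.linspace(8, 18, 21)
                                for c2 in np.linspace(4, 10, 13)])
costs = candidates.sum(axis=1)   # assume cost proportional to total capacity
print(sample_based_frontier(candidates, costs, demand, risk_levels=[0.01, 0.05, 0.10]))
```

The thesis studies the much harder questions this toy sweep ignores: how well such sampled frontiers track the true one, and when the chance-constrained versus cost-constrained formulation is the better object to solve.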
As a final step, we further study the efficient frontier in the cost-versus-risk tradeoff for the facility sizing model in the special case in which the (cumulative) distribution function of the underlying demand vector is concave in a region defined by a highly reliable system. In this case, the "true" efficient frontier is convex. We show that the convex hull of the efficient frontier of a sampling-based approximation (i) can be computed in strongly polynomial time by relying on a reformulation as a max-flow problem via the well-studied selection problem, and (ii) converges uniformly to the true efficient frontier when the latter is convex. We conclude with numerical studies that demonstrate these properties.

Item
Rapid Test to Establish Grading of Unbound Aggregate Products: Evaluation of Potential Aggregate Grading Technologies (2000-02)
Rauch, Alan F.; Haas, Carl T. (Carl Thomas); Kim, Hyoungkwan; Browne, Craig

A research study is underway to develop automated methods for rapidly grading aggregates on the production line in a typical aggregate separation or mixing facility. This interim report documents preliminary work on the project and presents:
(1) A discussion of typical plant layouts, to identify potential sampling locations. The best sampling locations appear to be just after final screening, where sorted material is sent to stockpiles, and just before mixing the final product, where aggregates are fed from either stockpiles or charge bins.
(2) A thorough examination of six potential technologies that could be used to rapidly determine particle size. After a critical review and a formal decision analysis, both digital image analysis and laser profiling appear to be equally promising and worthy of additional study.
(3) A discussion of our current thinking on how to configure scanning equipment of this kind. By considering innovative methods for presenting aggregate particles to the scanning sensor, the opportunity exists to advance this technology.
(4) An outline of project work planned for the immediate future. Continuing our study of both digital image analysis and laser profiling, we plan to:
1. Conduct a limited, independent evaluation of three commercial particle-sizing machines, which all use digital image analysis.
2. Perform preliminary tests using a laser profiler.
3. Construct a laboratory test bed that can be used to test and evaluate various scanning sensors.
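For context on what an automated grading system ultimately reports, the following is a minimal sketch, not taken from the report, of turning a list of measured particle sizes (such as a digital image analysis or laser profiling sensor might produce) into a percent-passing gradation against a set of sieve openings. The sieve sizes and sample data are assumptions for the example, and a production system would weight by particle mass or volume rather than by count.

```python
def percent_passing(particle_sizes_mm, sieve_openings_mm):
    """Compute a simple gradation curve: for each sieve opening, the
    percentage of particles (by count) small enough to pass through it."""
    n = len(particle_sizes_mm)
    curve = {}
    for opening in sorted(sieve_openings_mm, reverse=True):
        passing = sum(1 for d in particle_sizes_mm if d < opening)
        curve[opening] = 100.0 * passing / n
    return curve

# Toy example: hypothetical particle sizes (mm) from an imaging sensor,
# evaluated against a few common sieve openings.
sizes = [22.0, 18.5, 12.0, 9.8, 7.5, 4.2, 3.3, 1.8, 0.9, 0.4]
print(percent_passing(sizes, sieve_openings_mm=[25.0, 19.0, 12.5, 9.5, 4.75, 2.36, 0.6]))
```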