Browsing by Subject "Statistical analysis"
Now showing 1 - 6 of 6
Item: Genomics analysis on the responses of E. coli cells to varying environmental conditions (2016-05)
Yan, Xiwei; Wilke, C. (Claus); Lin, Lizhen
The natural living environments of E. coli cells are diverse, ranging from mammalian gastrointestinal tracts to soil. Each environment may require distinct metabolic pathways and transporter systems, and long-term evolution has established elaborate regulatory systems that allow E. coli cells to adapt quickly to changing conditions. Sensing external stresses and then adopting a different phenotype enables the cells to take advantage of any available nutrients and to defend against hostile environments. Many regulatory mechanisms have been identified through genetic, biochemical, and molecular biology methods, and our study aims to build a systematic view of the whole-genome response to four different environmental conditions. We used statistical tests, including Pearson's and Spearman's correlation tests with multiple testing adjustments, to identify feature genes that are significantly induced or repressed across treatment levels. The feature genes identified were partially supported by the previous literature, and some novel genes not reported in earlier studies may point to a research blind spot. Additionally, we compared the correlation tests with machine learning algorithms and discussed the advantages and drawbacks of each method.

Item: Large-scale statistical analysis of NLDAS variables and hydrologic web applications (2016-05)
Espinoza Dávalos, Gonzalo Enrique; Maidment, David R.; McKinney, Daene C.; Passalacqua, Paola; Hodges, Ben R.; Yang, Zong-Liang
The Land Data Assimilation System (LDAS) is a model developed by the National Aeronautics and Space Administration (NASA) to quantify the heat and water fluxes between the atmosphere and the land-surface hydrology. LDAS has two forms: National (NLDAS) and Global (GLDAS). The NLDAS grid is 1/8°, with hourly and monthly estimates since 1979.
The LDAS model output provides a comprehensive time-space dataset. A statistical analysis is necessary to obtain descriptive information and to understand the seasonal patterns, spatial distribution, and frequency distribution of the model output. Current conditions can be compared with those in the past by using statistical distributions for each variable, unique to each time interval and spatial grid point. The objectives of this dissertation are to: (1) perform a statistical analysis of the time series of NLDAS variables and model their spatial-temporal probability distributions; (2) improve data exposure through the comparison of current values with the past using web applications; and (3) evaluate the framework for access to NLDAS data. The methodology consists of: (1) estimating the NLDAS cumulative distribution functions (CDFs) on daily and monthly time steps and developing probability models for five variables: precipitation, runoff, soil moisture, evapotranspiration, and temperature; (2) creating dynamic websites that display the maps, time series, and latest values in the NLDAS model and their relation to the historic distributions; and (3) implementing time-indexed and space-indexed data access procedures. The methodology is implemented using the latest technologies in high-performance computing (HPC), cloud storage and deployment, and Geographic Information Systems (GIS), which allow this analysis to be performed on a large dataset (NLDAS) at a national scale, using the United States as a case study. A statistical analysis of the NLDAS model output and the comparison of current values with the historic distribution provide thorough insight into the ranges, extremes, and seasonal variation of the hydrologic variables.
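The core comparison this abstract describes, locating a current value within the historic distribution for one grid cell and time interval, can be sketched with an empirical CDF. The function name, variable, and sample values below are illustrative assumptions, not NLDAS data or the author's implementation.

```python
import numpy as np

def empirical_percentile(historic, current):
    """Percentile rank of a current observation within a historic sample,
    i.e. the empirical CDF evaluated at the current value, times 100."""
    historic = np.asarray(historic, dtype=float)
    return 100.0 * np.mean(historic <= current)

# Illustrative: 30 years of monthly soil-moisture values for one grid cell
rng = np.random.default_rng(0)
historic = rng.normal(loc=250.0, scale=40.0, size=30)

p = empirical_percentile(historic, current=310.0)
print(f"Current value sits at the {p:.0f}th percentile of the historic record")
```

In the dissertation's framework a distribution like this would be fitted per variable, per grid point, and per time interval, so that any incoming NLDAS value can be placed in its historic context.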
The exposure of large scientific datasets such as NLDAS through the use of standards and web applications can enhance their use in the hydrologic sciences and engineering.

Item: Machine learning and statistical analysis in material property prediction (2018-12)
You, Zhuoya; Li, Wei (University of Texas at Austin)
With the development of algorithms, models, and data-driven efforts in other areas, machine learning is beginning to make an impact in materials science and engineering. In this work, we review the basic steps of using machine learning in materials science. We also develop several machine learning methods to predict two physically distinct properties of transparent conductors: formation enthalpy, an indication of stability, and bandgap energy, an indication of optical transparency. These include regression-based models, such as the ordinary least squares (OLS) regression, stepwise selection, Ridge, and Lasso models, and tree-based models, such as the random forest model and the gradient boosted model (GBM). We discuss the advantages and potential problems of each model and provide suggestions for possible applications.

Item: Rapid Test to Establish Grading of Unbound Aggregate Products: An Evaluation of Automated Devices to Replace and Augment Manual Sieve Analyses in Determining Aggregate Gradation (2002-02)
Rauch, Alan F.; Haas, Carl T. (Carl Thomas); Browne, Craig; Kim, Hyoungkwan
Several automated devices are commercially available for measuring the gradation of unbound stone aggregates. These computerized machines, which provide a rapid alternative to manual sieving, capture and process two-dimensional digital images of aggregate particles to determine grain size distribution. Five of these automated gradation devices were evaluated for accuracy and performance. Fifteen aggregate test samples, with different size, shape, and mineral characteristics, were used in these tests.
To quantify how well the machine results compare with data from standard sieve analyses, the CANWE statistic was developed and used. While the machine data did not match the sieve data exactly, the evaluated devices were found to provide good measures of particle gradation for most samples. These tests also indicate that some machines give more repeatable results in multiple tests of a given material, while others yield better results when testing different materials. The methodology used in this study is suitable for objectively evaluating the accuracy of other rapid gradation machines for various applications. ICAR Project 503 was undertaken to study rapid, automated methods of determining the grain size distribution of unbound aggregate products. Two technologies were studied in detail: digital image analysis and laser profiling. This report summarizes the evaluation of digital imaging devices, while the second part of the final project report describes the development of a laser scanning device for grading aggregates.

Item: Simulation and optimization techniques applied in semiconductor assembly and test operations (2016-05)
Jia, Shihui; Bard, Jonathan F.; Morrice, Douglas J.; Hasenbein, John; Khajavirad, Aida; Gao, Zhufeng
The importance of back-end operations in semiconductor manufacturing has been growing steadily in the face of higher customer expectations and stronger competition in the industry. In order to achieve low cycle times, high throughput, and high utilization while improving due-date performance, more effective tools are needed to support machine setup and lot dispatching decisions. In previous work, the problem of maximizing the weighted throughput of lots undergoing assembly and test (AT), while ensuring that critical lots are given priority, was investigated, and a greedy randomized adaptive search procedure (GRASP) was developed to find solutions.
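A greedy randomized adaptive search procedure of the kind referenced above can be sketched generically. This is a textbook-style GRASP construction step (restricted candidate list plus randomized greedy choice), not the dissertation's implementation; the function, parameter names, and scoring are illustrative assumptions, and a full GRASP would follow each construction with a local-search phase.

```python
import random

def grasp_setup(candidates, score, iterations=50, alpha=0.3, seed=0):
    """Pick a machine setup by repeated greedy-randomized construction.

    candidates: list of possible setups; score: maps a setup to its
    objective value (higher is better); alpha in [0, 1] controls how
    greedy (0) or random (1) the restricted candidate list (RCL) is.
    """
    rng = random.Random(seed)
    best, best_val = None, float("-inf")
    for _ in range(iterations):
        scored = sorted(candidates, key=score, reverse=True)
        hi, lo = score(scored[0]), score(scored[-1])
        # RCL: candidates whose score is within alpha of the best
        threshold = hi - alpha * (hi - lo)
        rcl = [c for c in scored if score(c) >= threshold]
        choice = rng.choice(rcl)  # randomized greedy choice
        # (A full GRASP would apply local search to `choice` here.)
        if score(choice) > best_val:
            best, best_val = choice, score(choice)
    return best, best_val
```

With alpha near 0 the procedure behaves like the purely greedy Rule_Greedy described later in this abstract; larger alpha values let it explore more of the feasible region, which is the motivation the authors give for GRASP_asap.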
Optimization techniques have long been used for scheduling manufacturing operations on a daily basis. Solutions provide a prescription for machine setups and job processing over a finite planning horizon. In contrast, simulation provides more detail, but in a descriptive sense: it tells you how the system will evolve in real time for a given demand, a given set of resources, and rules for using them. A simulation model can also easily accommodate changeovers, initial setups, and multi-pass requirements. The first part of the research shows how the results of an optimization model can be integrated with the decisions made within a simulation model. The problem addressed is defined in terms of four hierarchical objectives: minimize the weighted sum of key device shortages, maximize weighted throughput, minimize the number of machines used, and minimize makespan for a given set of lots in queue and a set of resources that includes machines and tooling. The facility can be viewed as a reentrant flow shop. The basic simulation was written in AutoSched AP (ASAP) and then enhanced with the customization features available in the software. Several new dispatch rules were developed. Rule_First_setup initializes the simulation with the setups obtained with GRASP. Rule_All_setups enables a machine to select the setup provided by the optimization solution whenever a decision is about to be made on which setup to choose subsequent to the initial setup. Rule_Hotlot was also proposed to prioritize the processing of the hot lots that contain key devices. The objective of the second part of the research is to design and implement heuristics within the simulation model to schedule back-end operations in a semiconductor AT facility. Rule_Setupnum lets the machines determine which key device to process according to a machine setup frequency table constructed from the GRASP solution.
GRASP_asap embeds the more robust selection features of GRASP in the ASAP model through customization. This allows ASAP to explore a larger portion of the feasible region at each decision point by randomizing machine setups using adaptive probability distributions that are a function of solution quality. Rule_Greedy, a simplification of GRASP_asap, always picks the setup for a particular machine that gives the greatest marginal improvement in the objective function among all candidates. The purpose of the third part of the research is to statistically validate the relative effectiveness of our top six dispatch rules by comparing their performance on 30 real and randomly generated data sets. Using both GRASP and our ASAP discrete-event simulation model, we have (1) identified the general order of dispatch rule performance, (2) investigated the impact of having setups installed on machines at time zero on rule performance, (3) determined the conditions under which restricting the maximum number of changeovers affects rule performance, and (4) studied the factors that might simultaneously affect rule performance with the help of a common random numbers experimental design. In the analysis, the first two objectives, weighted key device shortages and weighted throughput, are used to measure outcomes.

Item: Statistical analysis of historic hydrocarbon production data from Gulf of Mexico oil and gas fields and application to dynamic capacity assessment in CO2 storage (International Journal of Greenhouse Gas Control, 2019-01-01)
Goudarzi, Ali; Meckel, Timothy A.; Hosseini, Seyyed A.; Trevino, Ramon H.