# Browsing by Subject "Machine learning"

Now showing 1 - 20 of 284


## A CFD-informed model for subchannel resolution crud prediction
(2019-02-14) Gurecky, William Ladd; Haas, Derek Anderson; Slattery, Stuart; Clarno, Kevin; Leibowicz, Benjamin; Landsberger, Sheldon

A physics-directed, statistically based surrogate model of the small-scale flow features that impact Chalk River unidentified deposit (crud) growth is presented in this work. The objective of the surrogate is to provide additional details of the rod surface temperature, heat flux, and near-wall turbulent kinetic energy fields which cannot be explicitly captured by a subchannel code. Operating as a mapping from high-fidelity computational fluid dynamics (CFD) data to the low-fidelity subchannel grid (hi2lo), the model provides CFD-informed boundary conditions to the crud model executed on the subchannel pin surface mesh. The surface temperature, heat flux, and turbulent kinetic energy, henceforth referred to as the fields of interest (FOI), govern the growth rate of crud on the surface of the rod and the precipitation of boron in the porous crud layer. The model therefore predicts the behavior of the FOIs as a function of position in the core and local thermal-hydraulic (TH) conditions. The subchannel code produces an estimate for all crud-relevant TH quantities at a coarse spatial resolution everywhere in the core and executes substantially faster than CFD. In the hi2lo approach, the solution provided by the subchannel code is augmented by a predicted stochastic component of the FOI informed by CFD results, providing a more detailed description of the target FOIs than subchannel analysis can provide alone. To this end, a novel method based on the marriage of copula and gradient boosting techniques is proposed.
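The copula-plus-marginals factorization in Sklar's theorem can be made concrete in a few lines. The following is an illustrative toy, not the hi2lo model itself: the marginal distributions, correlation, and critical temperature are invented stand-ins, and a single Gaussian copula stands in for whatever copula family the surrogate actually fits.

```python
import numpy as np
from scipy import stats

# Sklar's theorem: a joint CDF factors as H(x, y) = C(F(x), G(y)), i.e. a
# copula C capturing dependence plus the marginal CDFs F and G.
# Hypothetical marginals for two fields of interest (FOI), e.g. rod surface
# temperature and turbulent kinetic energy (values are illustrative only).
temp_marginal = stats.norm(loc=600.0, scale=5.0)
tke_marginal = stats.gamma(a=2.0, scale=0.01)

def sample_gaussian_copula(rho, n, rng):
    """Draw n dependent (temperature, TKE) samples via a Gaussian copula."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    u = stats.norm.cdf(z)              # uniforms carrying the copula dependence
    temp = temp_marginal.ppf(u[:, 0])  # push through inverse marginal CDFs
    tke = tke_marginal.ppf(u[:, 1])
    return temp, tke

rng = np.random.default_rng(0)
temp, tke = sample_gaussian_copula(rho=0.7, n=10_000, rng=rng)

# The statistical quantity the abstract describes: the fraction of rod
# surface area exceeding a critical temperature (threshold is hypothetical).
frac_hot = np.mean(temp > 605.0)
```

Because the copula and the marginals are fit independently, the dependence structure and the per-field distributions can each be predicted by separate regression models, which is the separation the abstract exploits.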
This methodology forgoes a spatial interpolation procedure in favor of a statistically driven approach, which predicts the fractional area of a rod's surface in excess of some critical temperature but not precisely where such maxima occur on the rod surface. The resultant model retains the ability to account for the presence of hot and cold spots on the rod surface induced by turbulent flow downstream of spacer grids when producing crud estimates. Sklar's theorem is leveraged to decompose multivariate probability densities of the FOI into independent copula and marginal models. The free parameters within the copula model are predicted using a combination of supervised regression and classification machine learning techniques, with training data sets supplied by a suite of precomputed CFD results spanning a typical pressurized water reactor TH envelope. Results show that, compared to the subchannel standalone case, the hi2lo method more accurately preserves the influence of spacer grids on the crud growth rate; more precisely, the hi2lo method recovers key statistical properties of the FOI which impact crud growth. Compared to gold-standard high-fidelity CFD/crud coupled results in a single-assembly test case, the hi2lo model produced a relative total crud mass difference of -8.9%, versus a relative crud mass difference of 192.1% for standalone subchannel analysis.

## A computational fluid dynamics and machine learning study on web flutter during an oven drying process in roll-to-roll manufacturing
(2020-05-08) Ahmed, Muhammad Bilal; Li, Wei; Suryanarayanan, Saikishan

Fluttering of a web is a major problem during the oven drying process of roll-to-roll manufacturing. In this study, two-dimensional (2D) computational fluid dynamics (CFD) models were developed to understand the flutter phenomenon. The CFD results revealed meandering of air jets as a source of flutter through air-web interactions.
The root-mean-squared pressure (P_RMS) and the mean wall shear stress (τ_mean) were identified as reasonable measures of the cause of web flutter and of web drying efficiency, respectively. Machine learning models were then trained using the results of CFD simulations. It was shown that the machine learning models captured the underlying physics of the CFD simulations and were able to make accurate predictions. Using the machine learning models, optimization of parameters was performed in which several key design and process parameters of the oven were adjusted to reduce web flutter while keeping the rate of drying unchanged. Optimization produced promising results showing that about a 30% reduction in P_RMS, and hence in web flutter, could be achieved. The results of the optimization were confirmed to be accurate by performing further CFD simulations.

## A hybrid reduced approach to handle missing values in type 2 diabetes prediction
(2016-05-06) You, Xinqi; Saar-Tsechansky, Maytal; Gawande, Kishore

Diabetes is gaining attention among medical institutions and health care organizations as its prevalence increases around the world. In the United States, 29.1 million people, or 9.3% of the U.S. population, are diagnosed with diabetes. About 86 million people are categorized as pre-diabetic, and 15-30% of them will develop diabetes within 5 years. To tackle this challenge, the National Diabetes Prevention Program (DPP) was introduced in 2002; it reduces the risk of diabetes by 58% through a lifestyle change program. In order to help select a better group of pre-diabetic patients for intervention and maximize the cost-effectiveness of the program, we propose a Hybrid Reduced approach to handling missing values when predicting type 2 diabetes. This approach deals with four challenges in electronic medical records: missing values, missingness not at random, class imbalance, and prediction at a longer window (2-year). We select three ensemble predictive models, AdaBoost.M1, Gradient Boosting, and Extremely Randomized Trees, and apply this approach across 7 years to assess its robustness. The Hybrid Reduced approach includes two sub-approaches: Hybrid Reduced Organic and Hybrid Reduced Imputed. Throughout the experiments, Hybrid Reduced Imputed is the best performer and achieves a 5-7% improvement in precision. By simply using this approach, we could save $278 million in healthcare costs and improve people's health conditions.

## A machine learning optical system to ensure that human assembly technicians use the specified bolt tightening sequence in assembly line manufacturing
(2020-05-14) Soni, Varun; Wilson, Preston S.

In a large number of applications, the mechanical fasteners that are used to assemble the parts of a system must be tightened in a specific sequence to achieve the desired distribution of the load across the population of bolts. Failure to follow the sequence results in an undesired load distribution; this phenomenon is known as bolt crosstalk. Assembly personnel often fail to follow this sequence for a variety of reasons, resulting in over- or under-torqueing of bolts in the final assembly, which can lead to undesired system performance. There is currently no system or device that can ensure that a human operator follows a specified bolt tightening sequence while using a hand-held tool and thereby avoid bolt crosstalk. In this research, a system that constrains the operator to follow the specified tightening sequence was developed and tested. It utilizes a small tool-mounted camera to generate images of the bolt pattern and the relative location of the tool, and a machine learning algorithm to alert the operator if the tool is being brought to the wrong position. The developed software can detect all the bolt positions accurately by using a unique feature associated with each of them. The average probability of detecting a position under different lighting conditions is more than 85%.

## A modular attention hypothesis for modeling visuomotor behaviors
(2021-07-24) Zhang, Ruohan; Ballard, Dana H.; Hayhoe, Mary; Stone, Peter; Huth, Alexander; Dayan, Peter

In this dissertation, we explore the hypothesis that complex intelligent behaviors, in vivo, can be decomposed into modules which are organized in hierarchies and executed in parallel. This organization is similar to a multiprocessing architecture in silico. Biological attention can be viewed as a "process manager" that manages information processing and multiple computations. In this work, we seek to understand and model this modular attention mechanism for humans in a range of behavioral settings. We approach modular attention at the three levels of David Marr's paradigm: the computational theory level, the representation and algorithm level, and the hardware implementation level. At the computational theory level, we propose that simple visuomotor behaviors can be broken down into modules that require attention for their execution. At the representation and algorithm level, we model human eye movements and actions in a variety of visuomotor tasks. We collect and publish a large-scale, high-quality dataset of the eye movements and actions of humans playing Atari video games. We study the active vision problem by jointly modeling human eye movements and actions, and compare how humans and artificial learning agents play these video games differently. We then propose a modular reinforcement learning model of human subjects' navigation behaviors in a virtual-reality environment with multiple goals. We further develop a modular inverse reinforcement learning algorithm to efficiently estimate the subjective reward and discount factors associated with each behavioral goal.
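Modular reinforcement learning of the kind described above is often summarized as "one value function per goal, act on the combination." The sketch below is a generic illustration under that assumption; the module names, reward scales, and discount factors are hypothetical and not taken from the dissertation.

```python
import numpy as np

# Toy modular RL: each behavioral goal owns an independent Q-table with its
# own reward signal and discount factor; the agent acts greedily on the
# summed per-module action values.
n_states, n_actions = 4, 3
rng = np.random.default_rng(1)

modules = [
    {"discount": 0.9, "q": rng.normal(size=(n_states, n_actions))},  # e.g. "reach goal"
    {"discount": 0.5, "q": rng.normal(size=(n_states, n_actions))},  # e.g. "avoid obstacle"
]

def select_action(state):
    """Greedy action over the sum of per-module action values."""
    combined = sum(m["q"][state] for m in modules)
    return int(np.argmax(combined))

def q_update(module, s, a, r, s_next, lr=0.1):
    """Per-module Q-learning step; r is that module's own reward."""
    target = r + module["discount"] * module["q"][s_next].max()
    module["q"][s, a] += lr * (target - module["q"][s, a])

a = select_action(0)
```

The inverse problem the abstract mentions runs this the other way: given observed state-action trajectories, estimate each module's reward scale and discount so that the combined greedy policy reproduces the behavior.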
At the implementation level, we propose a theoretical neuronal communication model named gamma spike multiplexing that allows the cortex to perform multiple computations simultaneously without crosstalk. The model explains how the modular attention hypothesis might be implemented in the biological brain. The end goals of this work are to (1) build models to explain and predict observed human visuomotor behaviors and attention, and (2) use these biologically inspired models to develop algorithms for better artificial learning systems.

## A multi-scale framework for graph based machine learning problems
(2017-05) Shin, Donghyuk; Dhillon, Inderjit S.; Whinston, Andrew B.; Qiu, Lili; Chakrabarti, Deepayan

Graph data have become essential for representing and modeling relationships between entities and complex network structures in various domains such as social networks and recommender systems. As a main contributor to the recent Big Data trend, the massive scale of graphs in modern machine learning problems easily overwhelms existing methods, so sophisticated scalable algorithms are needed for real-world applications. In this thesis, we develop a novel multi-scale framework based on the divide-and-conquer principle as an effective and scalable approach for machine learning tasks involving large sparse graphs. We first demonstrate how our multi-scale framework can be applied to the problem of computing the spectral decomposition of massive graphs, which is one of the most fundamental low-rank matrix approximations used in numerous machine learning tasks. While popular solvers suffer from slow convergence, especially when the desired rank is large, our method exploits the clustering structure of the graph and achieves superior performance compared to existing algorithms in terms of both accuracy and scalability.
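The divide-and-conquer idea behind cluster-aware spectral decomposition can be illustrated on a toy planted-cluster graph: eigendecompose each cluster's diagonal block independently, then combine the block eigenvectors with a Rayleigh-Ritz step. This is a minimal sketch of the general principle, not the thesis algorithm; the random graph model and cluster sizes are invented.

```python
import numpy as np

rng = np.random.default_rng(2)

def block_adjacency(sizes, p_in=0.9, p_out=0.02):
    """Random symmetric adjacency with planted clusters of the given sizes."""
    n = sum(sizes)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    prob = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    a = (rng.random((n, n)) < prob).astype(float)
    a = np.triu(a, 1)
    return a + a.T

sizes = [30, 30, 40]
A = block_adjacency(sizes)

# "Divide": top eigenvector of each cluster's diagonal block.
basis, start = [], 0
for sz in sizes:
    block = A[start:start + sz, start:start + sz]
    _, v = np.linalg.eigh(block)
    vec = np.zeros(A.shape[0])
    vec[start:start + sz] = v[:, -1]
    basis.append(vec)
    start += sz

# "Conquer": Rayleigh-Ritz -- project A onto the block basis and solve the
# small eigenproblem to approximate the leading eigenvalues of A.
Q, _ = np.linalg.qr(np.column_stack(basis))
w_small, _ = np.linalg.eigh(Q.T @ A @ Q)
w_full = np.linalg.eigvalsh(A)
rel_err = abs(w_small[-1] - w_full[-1]) / abs(w_full[-1])
```

Because each block eigenproblem is small and independent, the "divide" step parallelizes trivially, which is the source of the scalability the abstract claims; the quality of the combined approximation depends on how weak the cross-cluster coupling is.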
While the main goal of the divide-and-conquer approach is to efficiently compute solutions to the original problem, the proposed multi-scale framework admits a further attractive but less obvious feature from which machine learning problems can benefit. In particular, we treat the partial solutions of the subproblems computed in the process as localized models of the entire problem. By doing so, we can combine models at multiple scales, from local to global, and generate a holistic view of the underlying problem that achieves better performance than a single global view. We adopt this multi-scale view for the problems of link prediction in social networks and collaborative filtering in recommender systems with additional side information, obtaining a model that can make accurate and robust predictions in a scalable manner.

## Accelerating graph computation with system optimizations and algorithmic design
(2021-08-06) Hoang, Loc Dac; Pingali, Keshav; Huang, Qixing; Rossbach, Christopher; Wu, Bo

Most data in today's world can be represented in graph form, and these graphs can then be used as input to graph applications to derive useful information, such as shortest paths in a road network, similarity between drugs in a drug-protein network, persons of interest in a social network, or recommended products for customers in a customer purchase history graph. Graphs are growing larger as time passes, so there is an ever-growing need for efficient graph applications. Developers typically have two methods for accelerating the runtime of a graph application: (1) they optimize the systems on which the graph application is run, and/or (2) they optimize the algorithm itself and gain speedup via algorithmic novelties. In this dissertation, I propose work that accelerates graph applications from both of these perspectives. Broadly speaking, the work presented in this dissertation is split into systems work and algorithmic work.

On the systems end, I present the CuSP system and the DeepGalois system:

1. CuSP, or Customizable Streaming Partitioner, is a fast and general distributed graph partitioner that generates partitions for distributed graph analytics systems to use. CuSP addresses the problem of existing slow partitioners that can only handle a few built-in policies by providing users with a general interface for specifying streaming partitioning policies, which CuSP then uses to efficiently partition graphs. Our evaluation of the system shows that it can partition up to 22× faster than the state-of-the-art offline graph partitioner XtraPulp while producing partitions that allow graph applications to run 2× faster on average than XtraPulp's partitions. CuSP can be extended to allow users to express partitioning policies specific to their algorithms, as we show in a case study with distributed triangle counting.
2. DeepGalois is a distributed graph neural network system built on the observation that graph neural network computation can be expressed as a graph problem, which allows it to be implemented in graph analytics systems. DeepGalois is built using existing distributed graph analytics systems: CuSP and the Gluon communication substrate are used to partition GNN graphs and efficiently communicate partial aggregations and gradients. It also supports sampling and minibatching of the graph. Experimental results on up to 128 CPU machines demonstrate that DeepGalois scales and that DeepGalois's epoch time for its best host configurations on the evaluated graphs is on average 2× faster than DistDGL's epoch time for its best host configurations.

From an algorithmic perspective, I present a novel round-efficient distributed betweenness centrality algorithm and a novel formulation of the graph transformer network as a graph algorithm that allows for more efficient computation.

1. Min Rounds Betweenness Centrality (MRBC) is a provably round-efficient BC algorithm that uses a novel message update rule so that a vertex only sends out an update once it knows that the data it is going to send is finalized. We prove the correctness of this rule and establish bounds on the maximum number of rounds the algorithm takes. We implement MRBC in the D-Galois distributed graph analytics system, where we further reduce communication overhead by relaxing proxy update frequency based on the message send rule. Evaluation shows that, compared to a classic Brandes BC algorithm implementation, it reduces the number of rounds by 14× and communication time by 2.8× on average.
2. Graph Transformer Networks (GTNs) are a variant of graph convolutional networks that learn and use important typed paths, called metapaths, in a heterogeneous graph in order to improve task accuracy. The original formulation of the problem uses a series of dense matrix multiplies that are space inefficient, and the matrix formulation makes it difficult to use fine-grained graph techniques like sampling. We formulate the GTN problem as a graph problem that is more space efficient because it does not need dense matrices. In addition, because it is formulated as a graph algorithm, we can apply metapath sampling on top of it to significantly decrease the computational load. Evaluation shows that the sampling-based graph formulation of the GTN can be up to 155× faster than the original formulation without any compromise in task accuracy.

## Accelerating inverse solutions with machine learning and randomization
(2023-04-18) Wittmer, Jonathan; Bui-Thanh, Tan; Tsai, Yen-Hsi (Richard); Dawson, Clinton; Ghattas, Omar; Sundar, Hari

Inverse problems form a field of applied mathematics, with wide application in both the scientific community and industry, in which the objective is to estimate some parameter of interest (PoI) from observations. These two quantities are related by a mapping known as the parameter-to-observable (PtO) map, which may be nonlinear. While the forward problem may be well-posed, the inverse problem is often ill-posed, making parameter estimation difficult. Ill-posedness in the Hadamard sense means that at least one of the following is true: 1) the solution does not exist, 2) the solution is not unique, or 3) the solution does not depend continuously on the data. In cases of interest where the PtO map is an observational operator acting on the solution to a system of PDEs discretized on a domain, the ill-posedness can be severe due to the compact nature of the PtO map. To address the ill-posedness, practitioners often write the solution of the inverse problem as the solution of a regularized least-squares optimization problem, where the regularization is constructed to combat the ill-posedness, yielding a unique solution that depends continuously on the data. There are many classical regularization methods, including Tikhonov regularization, total variation (TV) regularization, and nonconvex strategies such as using an ℓ_p norm with 0 < p < 1. In addition to estimating the PoI itself, it is also of interest to estimate its uncertainty. To do this, a Bayesian approach is typically employed, in which the solution to the inverse problem is a posterior probability density rather than a deterministic quantity. By Bayes' theorem, the posterior is proportional to the product of the likelihood and the prior probability density.
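In the linear case, the regularized least-squares formulation mentioned above has a compact closed form, which a few lines make concrete. This is a generic textbook sketch, not code from this work: the matrix A is a random stand-in for a discretized PtO map, and the noise level and regularization weight are invented.

```python
import numpy as np

# Tikhonov-regularized least squares:
#   min_x ||A x - b||^2 + lam * ||x||^2
# has the closed form x = (A^T A + lam I)^{-1} A^T b.  In the Bayesian
# reading, the same formula with noise/prior covariance weightings gives
# the MAP point.
rng = np.random.default_rng(3)

n_obs, n_param = 50, 20
A = rng.normal(size=(n_obs, n_param))      # stand-in for a discretized PtO map
x_true = rng.normal(size=n_param)
b = A @ x_true + 0.01 * rng.normal(size=n_obs)

def tikhonov_solve(A, b, lam):
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

x_hat = tikhonov_solve(A, b, lam=1e-3)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```

The regularization weight lam trades data fit against solution size; in the Bayesian view it encodes the ratio of noise to prior variance, which is why, as the abstract notes, MAP estimation reduces to a deterministic regularized solve.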
In the case of Gaussian observational noise and a Gaussian prior, finding the maximum a posteriori (MAP) point is equivalent to solving the regularized least-squares optimization problem in weighted norms, where the likelihood yields the data misfit term weighted by the inverse of the noise covariance matrix and the prior yields the regularization term weighted by the inverse of the prior covariance. That is, computing the MAP estimate of the PoI in the Bayesian framework requires solving a deterministic inverse problem, so the apparatus for solving Bayesian inverse problems builds on the algorithms and tools used for solving deterministic inverse problems. This understanding is what enables us to gain insight into the inverse solutions produced by various methods and to develop new techniques that begin with deterministic inverse problems but can then be analyzed from a statistical perspective and used to quantify uncertainty. Since the likelihood depends on the PtO map, significant emphasis has been placed in past decades on developing robust and scalable computational models, along with excellent problem-specific priors. On the other hand, there has been a recent trend to abandon models and embrace the era of big data. We aim to show that neither approach alone is best, and that the surplus of data can be used in concert with classical models both to improve the quality of inverse solutions and to accelerate the solution process using modern machine learning techniques. In this research, we use global full waveform seismic inversion and Poisson's equation as the prototypical inverse problems. Sparsely located seismogram observations are used to reconstruct the acoustic wave speed in the seismic inverse problem. This inverse problem is constrained by a three-dimensional acoustic wave equation, a system of time-dependent PDEs discretized over the entire globe. Full waveform inversion is an important problem in science and industry, with applications to reservoir characterization and various biomedical imaging problems. We use the adjoint method as the starting point from which we develop several new inversion methods. Seismic inversion is a good prototypical problem because it is a nonlinear inverse problem with high computational cost for which scalable libraries exist, enabling us to study the effectiveness of our methods on practical large-scale inverse problems. Sparsely sampled temperature observations are used to reconstruct the underlying heat conductivity in the Poisson problem on a two-dimensional mesh. Poisson's equation is an excellent test problem because of the severe ill-posedness of inverting for the conductivity. We propose four new methods for solving PDE-constrained inverse problems:

1. The data-informed active subspace (DIAS) regularization approach was developed as an alternative to Tikhonov regularization in which regularization is applied only in directions where the data are least informative.
2. The UQ-VAE framework was developed as a hybrid data/model-driven machine learning approach for rapid MAP estimation and uncertainty quantification.
3. An autoencoder-based compression strategy was developed to address the high cost of solving large-scale time-dependent inverse problems by eliminating the need for checkpointing.
4. By combining the DIAS approach and autoencoder compression, we aim to provide a comprehensive method for computing a data-informed inverse solution while mitigating the additional computational cost, enabling the DIAS method to scale to large problems.

Additionally, we develop a unifying theory for the convergence of randomized methods for solving inverse problems and show their effectiveness on the Poisson inverse problem.
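One simple member of the broad family of randomized methods alluded to above replaces the data-misfit term with a Gaussian sketch of it: with a short, wide random matrix S scaled so that E[SᵀS] = I, the sketched objective is an unbiased estimate of the original. The code below illustrates that idea on a linear Tikhonov problem; it is an assumption-laden toy, not the specific randomized methods analyzed in the thesis.

```python
import numpy as np

# Randomized sketch of a Tikhonov problem: replace ||A x - b||^2 with
# ||S (A x - b)||^2, where S is n_sketch x n_obs with i.i.d. N(0, 1/n_sketch)
# entries, so E[S^T S] = I and the sketched misfit is unbiased.
rng = np.random.default_rng(4)

n_obs, n_param, n_sketch = 200, 20, 60
A = rng.normal(size=(n_obs, n_param))
x_true = rng.normal(size=n_param)
b = A @ x_true + 0.01 * rng.normal(size=n_obs)
lam = 1e-3

def solve(A, b, lam):
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

x_full = solve(A, b, lam)                       # full-data solution

S = rng.normal(size=(n_sketch, n_obs)) / np.sqrt(n_sketch)
x_rand = solve(S @ A, S @ b, lam)               # sketched solution

rel_diff = np.linalg.norm(x_rand - x_full) / np.linalg.norm(x_full)
```

The payoff is that the sketched normal equations are built from an n_sketch × n_param matrix instead of n_obs × n_param, which is the kind of cost reduction that motivates randomization for large-scale inverse problems.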
Contributions to CSEM areas of interest:

Area A, Applied mathematics: The data-informed (DI) framework was rigorously derived from the truncated singular value decomposition. Its deterministic properties were analyzed from a spectral perspective, and we show that the DIAS prior is a valid regularization strategy. Additionally, we analyze the DIAS prior from a statistical perspective and show that, for linear inverse problems with Gaussian noise and prior covariances, the posterior covariance of the DIAS solution is bounded below by the Tikhonov posterior covariance, and that Tikhonov regularization results in over-confident uncertainty estimates. The UQ-VAE framework was rigorously derived by minimizing the Jensen-Shannon divergence (JSD) between the true posterior and the model posterior, parameterized by the weights of a deep neural network. We derive an explicit finite-sample-size loss function when the likelihood, prior, and posterior are all assumed to be Gaussian. We prove asymptotic convergence and derive a non-asymptotic error bound for a broad family of randomized solutions to linear and nonlinear inverse problems. From this family of randomized methods, we show the equivalence of several existing methods and derive new ones.

Area B, Numerical analysis and scientific computation: While the DIAS prior has firm mathematical foundations, computing the DIAS regularization cost and the corresponding gradient term is expensive, both computationally and in terms of storage. We therefore develop and investigate an approximate form of the DIAS prior that allows the action of the inverse of the DIAS prior covariance matrix to be applied approximately to a vector in a scalable fashion. We also derive and implement a form of the DIAS prior that involves low-rank projections and requires substantially less storage than the naive implementation would suggest. This approximate algorithm with low-rank projection is implemented on a large-scale seismic inverse problem solved on at least 64 nodes of the Frontera supercomputer, demonstrating that DIAS regularization is viable even on large problems. Our non-asymptotic error analysis for randomized inverse problems employs techniques from numerical analysis to show that the error of the solution to the perturbed (randomized) optimization problem is bounded with high probability. We explore the convergence of various randomized methods numerically to validate the theoretical convergence properties and to provide practical insight into the numerical performance of each method on a variety of problems. An autoencoder-based compression strategy for time-dependent inverse problems was developed as a scalable substitute for checkpointing. We study two different compression schemes: spatial compression and temporal compression. Each method is implemented and scaled on the Frontera supercomputer. Since the goal of this work is to reduce the wasted computational effort of checkpointing, we require that any proposed approach be faster than the original checkpointing implementation. This requires special care in scalable implementation, since the underlying PDE solver (and thus restoration from checkpoints) is highly tuned and fast. We develop a novel sparse-dense autoencoder architecture to minimize the FLOPs required to perform compression and decompression, showing that excellent compression results can be obtained with high levels of sparsity in the autoencoder architecture. Lastly, we present a data generation, normalization, and training scheme, showing that even the "offline" cost of training is small relative to the cost of solving the inverse problem. This work was scaled up to 256 nodes of Frontera.

Area C, Mathematical modeling and applications: We apply our proposed methods to two model applications which are well-suited to exploring each method's relative advantages and disadvantages. First, we consider a 2D Poisson equation with sparse measurements. While applicable in a wide variety of fields, we consider Poisson's equation in the context of steady-state heat conduction. Though the forward problem is linear, the inverse problem of inferring the underlying conductivity is nonlinear. The natural ill-posedness of this problem makes it an excellent test problem for new regularization methods and machine learning. Observing the temperature at only a select few locations makes the inverse problem even more ill-posed and of practical interest for testing the capabilities of inverse solvers. We also consider a large-scale seismic inverse problem, full waveform inversion (FWI). Seismic waves can be modeled as acoustic waves propagating through the earth. The inverse problem we consider is to invert for the underlying acoustic wave speed given sparse measurements of the velocity field. We use this application to exhibit the scalability of our proposed methods on large-scale problems. Additionally, the time dependence of FWI allows us to develop new methods for accelerating the solution of large-scale inverse problems.

## Accelerating the biotechnology revolution with machine learning-guided protein engineering
(2023-05) Diaz, Daniel Jesus; Ellington, Andrew D.; Anslyn, Eric V.; Henkelman, Graeme; Wilke, Claus; Klivans, Adam; Marcotte, Edward

An extremely important task in biotechnology is the ability to engineer proteins by introducing mutations into their sequences, which ultimately alters their folded structure and function. In nature, this process occurs via random mutation and selection, also known as evolution.
Protein engineers have learned to limit the randomness and "direct" evolution, but this process is still too laborious and bottlenecks the application of biotechnology across all sectors of society. Machine learning (ML)-guided protein engineering has the potential to revolutionize the development of protein-based biotechnology, and enabling this future is the underlying theme of this thesis. To make meaningful advancements and enable ML-guided protein engineering, both computational advancement and experimental validation are required. This dissertation presents studies that explore the application of ML frameworks to protein data and experimentally validate structure-based ML frameworks. The first computational study examines the mutational landscape of proteins through the lens of 3D convolutional neural networks (3DCNNs) and evolution. The second study explores how to leverage recent advancements in protein large language models (pLLMs) for supervised learning on protein stability. In this study, a supervised dataset that uses organism growth temperatures as a coarse-grained label is curated, and several machine learning techniques invented by the natural language and computer vision communities are applied to fine-tune the pLLM ESM-1b to predict changes in thermal stability. On the experimental side, three studies on ML-guided protein engineering are presented. First, we used MutCompute, a 3D convolutional neural network (3DCNN), to identify stabilizing mutations in several PET hydrolase scaffolds and demonstrate that the ML-engineered variants provide an avenue for the bioremediation of PET. Next, we demonstrate the utility of ML-guided protein engineering for the development of pandemic-response biotechnology by stabilizing Bst DNA polymerase to enable low-resource COVID-19 diagnostics. The third study is the capstone of this thesis. Here, a structure-based residual neural network (MutComputeX) is trained to generalize to protein-ligand interactions, an ML pipeline for the computational generation of protein-ligand complexes is developed, and the two are combined to guide the active-site engineering of norbelladine 4-O-methyltransferase, a key enzyme for the biomanufacturing of the FDA-approved drug galantamine. This is the first demonstration of ML-guided active-site engineering from a computationally generated protein-ligand-cofactor ternary structure. Overall, these computational advancements and empirical validations of ML-guided protein engineering demonstrate that the future of industrial chemistry is a biological one.

## Acoustic source localization and characterization in rivetted metallic panels: a data-driven approach
(2020-05-18) Schneider, Melanie Brianna; Salamone, Salvatore; Haberman, Michael R.

This thesis presents a data-driven approach to localizing and characterizing acoustic emission (AE) sources in rivetted metallic panels, which are ubiquitous in the aerospace and naval industries. The geometric features characteristic of these structures create an environment of complex wave propagation marked by multiple reflections, modes, and dispersive behavior. As such, the patterns of wave propagation are not easily defined or derived. This presents challenges to traditional and currently used methods of source localization and characterization. Two methods are evaluated in this thesis. The first, delta-t mapping, uses the difference in time of arrival between multiple sensors to localize AE events. The second is a deep learning-based framework that leverages the information available within the reflections and multimodal dispersive behavior to localize and characterize AE sources with fewer sensors.
This framework uses stacked autoencoders trained on the continuous wavelet transform (CWT) to produce an input pattern that contains both time- and frequency-dependent information from the waveforms. In addition, this thesis explores the use of a data-driven framework for the identification of structural components in metallic panels by analyzing patterns of AE waveform characteristics. To validate the proposed framework, Hsu-Nielsen sources were generated on a section of a Boeing 777 fuselage panel. The results of this study show: (1) the deep learning framework can accurately localize and characterize AE sources in metallic panels instrumented with a single sensor; (2) the patterns revealed in the characteristics of the AE waveforms can be used to identify structural components in complex metallic panels.Show more Item Active visual category learning(2011-05) Vijayanarasimhan, Sudheendra; Grauman, Kristen Lorraine, 1979-; Dhillon, Inderjit S.; Aggarwal, J K.; Mooney, Raymond J.; Torralba, AntonioShow more Visual recognition research develops algorithms and representations to autonomously recognize visual entities such as objects, actions, and attributes. The traditional protocol involves manually collecting training image examples, annotating them in specific ways, and then learning models to explain the annotated examples. However, this is a rather limited way to transfer human knowledge to visual recognition systems, particularly considering the immense number of visual concepts that are to be learned. I propose new forms of active learning that facilitate large-scale transfer of human knowledge to visual recognition systems in a cost-effective way. The approach is cost-effective in the sense that the division of labor between the machine learner and the human annotators respects any cues regarding which annotations would be easy (or hard) for either party to provide.
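The cost-sensitive division of labor just described can be caricatured as picking the annotation with the best informativeness-per-cost ratio. A hedged sketch (the candidates, probabilities, and costs below are hypothetical; the thesis's actual selection criteria are richer):

```python
import math

# Hypothetical candidate annotations: (id, predicted label probability,
# predicted annotation cost in seconds)
candidates = [
    ("img1-label", 0.52, 10.0),          # very uncertain, cheap to obtain
    ("img2-box", 0.95, 30.0),            # confident, more expensive
    ("img3-segmentation", 0.60, 120.0),  # uncertain but costly
]

def entropy(p):
    # Binary entropy: expected informativeness of resolving this label
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def utility(cand):
    # Cost-sensitive criterion: informativeness per unit annotation cost
    _, p, cost = cand
    return entropy(p) / cost

best = max(candidates, key=utility)
print(best[0])  # → "img1-label": uncertain and cheap beats informative but costly
```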
The approach is large-scale in that it can deal with a large number of annotation types, multiple human annotators, and huge pools of unlabeled data. In particular, I consider three important aspects of the problem: (1) cost-sensitive multi-level active learning, where the expected informativeness of any candidate image annotation is weighed against the predicted cost of obtaining it in order to choose the best annotation at every iteration; (2) budgeted batch active learning, a novel active learning setting that perfectly suits automatic learning from crowd-sourcing services, where there are multiple annotators and each annotation task may vary in difficulty; and (3) sub-linear time active learning, where one needs to retrieve those points that are most informative to a classifier in time that is sub-linear in the number of unlabeled examples, i.e., without having to exhaustively scan the entire collection. Using the proposed solutions for each aspect, I then demonstrate a complete end-to-end active learning system for scalable, autonomous, online learning of object detectors. The approach provides state-of-the-art recognition and detection results while using minimal total manual effort. Overall, my work enables recognition systems that continuously improve their knowledge of the world by learning to ask the right questions of human supervisors.Show more Item Ad-hoc teamwork with behavior-switching agents(2019-05) Ravula, Manish Chandra Reddy; Stone, Peter, 1971-Show more As autonomous AI agents proliferate in the real world, they will increasingly need to cooperate with each other to achieve complex goals without always being able to coordinate in advance. This kind of cooperation, in which agents have to learn to cooperate on the fly, is called ad hoc teamwork. Many previous works investigating this setting assumed that teammates behave according to one of many predefined types that is fixed throughout the task.
This assumption of stationarity in behaviors is a strong one that cannot be guaranteed in many real-world settings. In this work, we relax this assumption and investigate settings in which teammates can change their types during the course of the task. This adds complexity to the planning problem, as an agent now needs to recognize that a change has occurred, in addition to figuring out the new type of the teammate it is interacting with. In this paper, we present a novel Convolutional-Neural-Network-based Change Point Detection (CPD) algorithm for ad hoc teamwork. When evaluating our algorithm on the modified predator-prey domain, we find that it outperforms existing Bayesian CPD algorithms.Show more Item Adapting algorithms : how computational processes move between cultures(2017-05) Carter, Daniel Wayne; Feinberg, Melanie, 1970-; Clement, Tanya Elizabeth; Spinuzzi, Clay; Doty, Philip; Acker, AmeliaShow more Computational processes such as machine learning algorithms are useful in a variety of domains. While largely developed by computer scientists, they are applied in contexts from sports and marketing to law enforcement, as well as in a variety of academic areas such as the natural sciences, social sciences, and humanities. The people working in these contexts often have different goals and values than do the computer scientists who develop computational processes—in the terms used here, they are of different cultures. As computational processes continue to spread to new contexts and the promotional rhetoric around big data and analytics encourages their use, it’s important to consider how they can support these diverse goals and values. To what extent does the movement of technical objects also entail the imposition of foreign goals and values? And how do people adapt technical objects in order to align them with their personal needs?
In this dissertation, I highlight the diversity of cultures in which computational processes are used by focusing on the work of humanities scholars. As I show, these scholars use the same processes as do computer scientists, but they use them in novel ways. Drawing on interviews, observations and collaborative work with humanities scholars, I develop the concept of fitting practices as a way to connect the properties of computational processes with the goals and values of workers who use them. As I illustrate, prior work in Science and Technology Studies that addresses the adoption of new computational processes has primarily focused on the relationships between processes and other objects (such as software tools, companies and governments). While this work leads to analyses of how knowledge production and decision-making are constrained in specific historical periods, it does less to illuminate how computational processes might be used in new ways. I contribute to this literature by suggesting a more generative and future-oriented perspective on the use of computational processes that might also be applied to the use of technical objects, more broadly.Show more Item Adaptive trading agent strategies using market experience(2011-05) Pardoe, David Merrill; Stone, Peter, 1971-; Miikkulainen, Risto; Mooney, Raymond; Saar-Tsechansky, Maytal; Wellman, MichaelShow more Along with the growth of electronic commerce has come an interest in developing autonomous trading agents. Often, such agents must interact directly with other market participants, and so the behavior of these participants must be taken into account when designing agent strategies. One common approach is to build a model of the market, but this approach requires the use of historical market data, which may not always be available. This dissertation addresses such a case: that of an agent entering a new market in which it has no previous experience. 
While the agent could adapt by learning about the behavior of other market participants, it would need to do so in an online fashion. The agent would not necessarily have to learn from scratch, however. If the agent had previous experience in similar markets, it could use this experience to tailor its learning approach to its particular situation. This dissertation explores methods that a trading agent could use to take advantage of previous market experience when adapting to a new market. Two distinct learning settings are considered. In the first, an agent acting as an auctioneer must adapt the parameters of an auction mechanism in response to bidder behavior, and a reinforcement learning approach is used. The second setting concerns agents that must adapt to the behavior of competitors in two scenarios from the Trading Agent Competition: supply chain management and ad auctions. Here, the agents use supervised learning to model the market. In both settings, methods of adaptation can be divided into four general categories: i) identifying the most similar previously encountered market, ii) learning from the current market only, iii) learning from the current market but using previous experience to tune the learning algorithm, and iv) learning from both the current and previous markets. The first contribution of this dissertation is the introduction and experimental validation of a number of novel algorithms for market adaptation fitting these categories. The second contribution is an exploration of the degree to which the quantity and nature of market experience impact the relative performance of methods from these categories.Show more Item Adaptive traffic signal control using deep reinforcement learning for network traffic incidents(2023-06-15) Li, Tianxin, M.S. 
in Engineering; Machemehl, Randy B.; Claudel, Christian; Zhang, Ming; Boyles, StephenShow more Traffic signal control is an essential aspect of urban mobility that significantly impacts the efficiency and safety of transportation networks. Traditional traffic signal control systems rely on fixed-time or actuated signal timings, which may not adapt to dynamic traffic demands and congestion patterns. Therefore, researchers and practitioners have increasingly turned to reinforcement learning (RL) techniques as a promising approach to improve the performance of traffic signal control. This dissertation investigates the application of RL algorithms to traffic signal control, aiming to optimize traffic flow and reduce congestion. The study develops a simulation model of a signalized intersection and trains RL agents to learn how to adjust signal timings based on real-time traffic conditions. The RL agents are designed to learn from experience and adapt to changing traffic patterns, thereby improving the efficiency of traffic flow, even for scenarios in which traffic incidents occur in the network. In this dissertation, the potential benefits of using RL algorithms to optimize traffic signal control in scenarios with and without traffic incidents were explored. To achieve this, an incident generation module was developed using the open-source traffic signal performance simulation framework that relies on the SUMO software. This module includes emergency response vehicles to mimic the realistic impact of traffic incidents and generates incidents randomly in the network. By exposing the RL agent to this environment, it can learn from the experience and optimize traffic signal control to reduce system delay. The study began with a single-intersection scenario, where a deep Q-network (DQN) was used as the RL traffic signal controller.
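A DQN controller of the kind described relies on three ingredients: a Q-function, an experience replay buffer, and a periodically synchronized target network. A hedged sketch with a linear Q-function standing in for the network and synthetic transitions standing in for the SUMO simulator (everything here is illustrative, not the dissertation's implementation):

```python
import random
from collections import deque

import numpy as np

STATE_DIM, N_ACTIONS = 4, 2
q_weights = np.zeros((N_ACTIONS, STATE_DIM))   # online Q "network" (linear here)
target_weights = q_weights.copy()              # target network copy
replay = deque(maxlen=1000)                    # experience replay buffer
gamma, lr = 0.95, 0.01

def act(state, eps=0.1):
    # Epsilon-greedy action selection from the online Q-function
    if random.random() < eps:
        return random.randrange(N_ACTIONS)
    return int(np.argmax(q_weights @ state))

def train_step(batch_size=32):
    # Sample a minibatch from replay and take an SGD step on the TD error,
    # bootstrapping from the (frozen) target network
    if len(replay) < batch_size:
        return
    for s, a, r, s2 in random.sample(list(replay), batch_size):
        target = r + gamma * np.max(target_weights @ s2)
        td_error = target - q_weights[a] @ s
        q_weights[a] += lr * td_error * s

# Fill the buffer with synthetic transitions (a real agent would get these
# from the traffic simulator) and run a few updates
rng = np.random.default_rng(0)
random.seed(0)
for _ in range(200):
    s, s2 = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
    replay.append((s, act(s), float(rng.normal()), s2))
for step in range(100):
    train_step()
    if step % 20 == 0:
        target_weights = q_weights.copy()   # periodic hard target-network sync
```

The replay buffer breaks the correlation between consecutive transitions, and the frozen target network stabilizes the bootstrapped regression targets, which is exactly why the abstract cites both as remedies for DQN's limitations.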
To improve the training process and model performance, experience replay and a target network were implemented to address the limitations of DQN. Hyperparameter tuning was conducted to find the best parameter combination for the training process, and the results showed that DQN outperformed other controllers in terms of the system-wise and intersection-wise queue distribution and vehicle delay. The study was then extended to a small corridor with two intersections and a grid network (2×2 intersections), and the incident generation module was used to expose the RL agent to different traffic scenarios. Again, hyperparameter tuning was conducted, and the DQN model outperformed other controllers in terms of reducing congestion and improving system performance. The robustness of the DQN performance was also tested with different demands, and the microsimulation results showed that the DQN performance was consistent. Overall, this study highlights the potential of RL algorithms to optimize traffic signal control in scenarios with and without traffic incidents. The incident generation module developed in this study provides a realistic environment for the RL agent to learn and adapt, leading to improved system performance and reduced congestion. In addition, hyperparameter tuning is essential to lay a solid foundation for the RL training process.Show more Item Advanced methods for subsurface velocity estimation : trans-dimensional inversion and machine learning(2019-12) Biswas, Reetam; Sen, Mrinal K.; Arnulf, Adrien F.; Spikes, Kyle T.; Grand, Stephen P.; Bennett, NicholasShow more Inversion is a widely adopted tool to estimate the subsurface elastic properties of the Earth from seismic data. However, it faces several obstacles due to a lack of adequate data coverage and various assumptions made in forward modeling and inversion algorithms, often resulting in sub-optimal results. One such assumption is the choice of parameterization of the model.
In general, it is assumed to be known a priori and kept fixed. This can lead to either over- or under-parameterization, causing the model to overfit or underfit the data. In the first part of my thesis, I address the problem of model parameterization. Along with searching for models that fit the data, I also solve for the optimum number of model parameters required as dictated by the data. In a deterministic approach, I use Basis Pursuit Inversion (BPI), which imposes sparsity in the model parameterization by adding a regularization term of the L₁ norm of the model vector. The weight of the regularization term plays a dominant role, and I propose an approach for automatic calculation of this weighting factor. Alternatively, I also develop a stochastic method, using a Bayesian framework to solve my inverse problem in which the number of model parameters is treated as unknown. Unlike BPI, this method also provides us with estimates of uncertainty. Here, I make use of the Reversible Jump Markov Chain Monte Carlo (RJMCMC) framework, which allows changing the number of model parameters. However, conventional RJMCMC is generally very slow, as it attempts to sample a variable-dimensional model space. To address this, I propose a new method called Reversible Jump Hamiltonian Monte Carlo (RJHMC), which improves efficiency by combining RJMCMC with gradient-based Hamiltonian Monte Carlo (HMC). The gradient-based steps ensure quick convergence by allowing the sampling to take large steps guided by the gradient instead of completely random steps. I represent my model space using a layer-based earth model for the 1D problem and using an adaptive ensemble of nuclei along with a Voronoi partition for the 2D problem. Subsequently, I use the method to solve the deconvolution problem in 1D, and tomography and Full Waveform Inversion (FWI) problems in a 2D setting. It also provides estimates of the elastic parameters and the marginal distribution of the number of model parameters.
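Trans-dimensional sampling of the kind RJMCMC performs can be illustrated on a toy problem in which the number of polynomial coefficients is itself unknown. A heavily hedged sketch: birth proposals are drawn from the prior so that prior and proposal cancel and the acceptance ratio collapses to a likelihood ratio (a standard simplification; general RJMCMC needs the full proposal and Jacobian terms, and the dissertation's RJHMC additionally uses gradient-guided HMC steps):

```python
import math
import random

random.seed(0)

# Synthetic data from a quadratic; the trans-dimensional unknown is how many
# polynomial coefficients the data actually support.
xs = [i / 20.0 for i in range(41)]
ys = [1.0 + 2.0 * x - 1.5 * x * x + random.gauss(0, 0.05) for x in xs]

def log_like(coeffs):
    # Gaussian log-likelihood (up to a constant) with noise sigma = 0.05
    s = 0.0
    for x, y in zip(xs, ys):
        pred = sum(c * x ** p for p, c in enumerate(coeffs))
        s += -0.5 * ((y - pred) / 0.05) ** 2
    return s

# Birth/death moves change the model dimension; perturb moves explore
# within the current dimension.
coeffs, k_trace = [0.0], []
for _ in range(5000):
    move = random.choice(["perturb", "birth", "death"])
    prop = list(coeffs)
    if move == "perturb":
        i = random.randrange(len(prop))
        prop[i] += random.gauss(0, 0.1)
    elif move == "birth" and len(prop) < 6:
        prop.append(random.gauss(0, 2.0))   # newborn coefficient from its prior
    elif move == "death" and len(prop) > 1:
        prop.pop()
    if math.log(random.random() + 1e-300) < log_like(prop) - log_like(coeffs):
        coeffs = prop
    k_trace.append(len(coeffs))

# Posterior mode of the sampled dimension (3 coefficients = a quadratic)
tail = k_trace[2000:]
print(max(set(tail), key=tail.count))
```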
I use the 1D RJHMC to estimate density, along with P- and S-wave velocities, from a pre-stack angle gather. The region contains paleo-residual gas (PRG), which shows the same signature as that of normal gas saturation and can be better differentiated using density. Additionally, I applied trans-dimensional tomography to invert for the P-wave velocity structure at Axial Seamount, one of the most volcanically active regions in the northeastern Pacific. In addition to BPI and RJHMC, I develop workflows that take advantage of hybrid schemes and Machine Learning (ML) algorithms. Solving an elastic FWI problem can be challenging, as it is very computationally expensive in comparison to the more commonly used acoustic formulation. I propose a hybrid scheme, where the initial P-wave velocity result from an acoustic FWI can be used to perform a less expensive pre-stack Amplitude vs. Angle (AVA) inversion. This provides us with all three elastic parameters: P-wave velocity, S-wave velocity, and density. Several inverse problems can be mapped into a neural network architecture, which can be solved using currently developed deep learning algorithms. The last part of my dissertation describes two machine learning (ML) algorithms that I have developed for seismic inversion. I use a Convolutional Neural Network (CNN) to perform seismic inversion in which, instead of the traditional approach of training the network on input-output pairs, I use the physics of forward wave propagation to guide the training. This circumvents the need to provide label data during training and makes the method unsupervised. In addition, I propose to use a Recurrent Neural Network (RNN) to estimate normal-moveout (NMO) velocities, a basic step in seismic processing. Generally, the NMO velocity is hand-picked and requires a lot of human intervention and computation time.
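The NMO velocity picking that the RNN automates rests on the hyperbolic moveout relation t(x) = sqrt(t0² + x²/v²). A minimal brute-force version of the pick on a synthetic single-reflector gather (all values below are illustrative):

```python
import numpy as np

# Synthetic CMP gather geometry: one reflector, assumed true NMO velocity
offsets = np.linspace(0.0, 2000.0, 21)            # m
t0, v_true = 1.0, 2000.0                          # s, m/s
t_obs = np.sqrt(t0**2 + (offsets / v_true) ** 2)  # hyperbolic moveout times

# Brute-force velocity scan: pick the trial velocity that best flattens
# the event (a caricature of semblance-based velocity analysis)
trial_vs = np.arange(1500.0, 2501.0, 25.0)
misfit = [np.sum((np.sqrt(t0**2 + (offsets / v) ** 2) - t_obs) ** 2)
          for v in trial_vs]
v_picked = trial_vs[int(np.argmin(misfit))]
print(v_picked)
```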
Using this workflow with only 10% of the data used for training, the network estimates NMO velocities almost instantly for the rest of the dataset.Show more Item Advanced pattern recognition techniques for wave-based structural health monitoring of metallic panels(2018-08-24) Ebrahimkhanlou, Arvin; Salamone, Salvatore; Haberman, Michael; Kallivokas, Loukas; Engelhardt, Michael; Hrynyk, TrevorShow more Increasing loads on aging and deteriorating aerospace and naval structures, such as airplanes and marine vessels, their usage beyond the designed life, and the desire to reduce the downtime associated with their regular maintenance operations have all motivated research on structural health monitoring (SHM) methods. Among SHM methods, those based on guided ultrasonic waves, which are excited and received by low-profile piezoelectric transducers, are among the most promising candidates for detecting, localizing, and characterizing structural defects. Despite the significant development of these SHM systems, very few, if any, have been implemented in real structures. One major reason for this limited implementation is the difficulty of processing and interpreting the reverberation patterns of guided ultrasonic waves. Such reverberations are due to multiple reflections of the waves from structural and geometrical features, such as boundaries, stiffeners, and fasteners. Therefore, the primary goal of this research is to overcome this challenge by advancing pattern recognition techniques and analyzing the patterns of edge-reflected guided-ultrasonic reverberations in thin metallic panels. The objective is to leverage such patterns to improve the accuracy of current damage localization algorithms and reduce the number of required sensors. Specifically, two damage localization modes are considered: active ultrasonic imaging and passive acoustic emission. However, this dissertation gives more weight to the latter.
For both active and passive modes, an analytical model is developed to simulate the patterns of edge-reflected, guided-ultrasonic reverberations. For the passive mode, a probabilistic framework is also developed to quantify the systematic uncertainties associated with this reflection-based localization approach. In addition, deep-learning-based, data-driven approaches are used to extend the application of the passive mode to metallic panels with rivet-connected stiffeners and to allow characterizing the defects. For validation, experiments are conducted on rectangular aluminum panels with square-cut edges. The results show the effectiveness of the developed pattern recognition approaches in detecting, localizing, and characterizing structural defects, such as simulated fatigue cracks, with significantly fewer sensors. The knowledge gained in this investigation contributes to the condition awareness of metallic panels.Show more Item Advancements to the non-destructive evaluation of strategic components(2021-12-02) Thompson, Cole Joseph; Charlton, William S.; Clarno, Kevin; Landsberger, Sheldon; Trahan, Alexis; O'Neil, BrianShow more This dissertation studies the nexus of nuclear engineering, machine learning, and computer vision. This is realized in two ways. The first is through the design of a neutron multiplicity counter and the application of shallow machine learning methods to the collected multiplicity counter data to estimate different parameters with high accuracy. As an extension of this work, other shallow machine learning methods are used to improve the estimation of item leakage multiplication, yielding doubles rate estimates approximately three times better than those from traditional methods. The second way is through the application of deep learning models, in the form of convolutional neural nets and transformers, to the pixel-wise segmentation of welding defects from radiographic images of welds.
The results from this application show that a novel transformer network proposed in this work surpasses the performance of other models by at least one percent when compared using a standard candle. Altogether, this work represents a contribution towards leveraging the vast computing and data capabilities of machine learning within nuclear engineering.Show more Item Advances and application of positive matrix factorization for source attribution of air pollution in megacities(2021-02-22) Bhandari, Sahil; Hildebrandt Ruiz, Lea; Apte, Joshua S.; Sharma, Mukul M; Allen, David TShow more Air pollution is considered the greatest current environmental health threat to humanity, with an estimated mortality burden of 7 million per year. More than half the world’s population is exposed to increasing air pollution. Reduction of air pollution is essential to global health and can be expected to generate long-term societal benefits. Receptor models are efficient mathematical tools for the identification of sources of air pollution. A popular receptor modeling technique is Positive Matrix Factorization (PMF). However, PMF is limited by the assumption of constant source profiles throughout the modeling period—while the contribution of each source is modeled to change over time, its profile (e.g., mass spectrum, when PMF is applied to mass spectrometer data) stays constant. PMF is frequently applied to data on air pollution from fine particulate matter (PM), particularly in megacities. Megacities are centers of economic activity, harbor very large populations, and have high PM levels, especially in the developing world, posing acute challenges to public health. One such city is Delhi, India. Delhi is the second most populated city in the world and routinely experiences some of the highest particulate matter concentrations of any megacity on the planet. However, the current understanding of the sources and dynamics of PM pollution in Delhi is limited.
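The bilinear factorization at the heart of PMF writes the data matrix as nonnegative source contributions times nonnegative source profiles, X ≈ GF. A hedged sketch using plain multiplicative-update NMF on synthetic data (PMF proper additionally weights residuals by per-measurement uncertainties, which this omits):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data: 3 nonnegative source profiles mixed into 100 time samples
G_true = rng.uniform(size=(100, 3))   # contributions (time x factors)
F_true = rng.uniform(size=(3, 20))    # profiles (factors x species / m/z)
X = G_true @ F_true

# Unweighted multiplicative-update NMF; PMF replaces this objective with an
# uncertainty-weighted least squares fit
k = 3
G = rng.uniform(0.1, 1.0, size=(100, k))
F = rng.uniform(0.1, 1.0, size=(k, 20))
for _ in range(1000):
    F *= (G.T @ X) / (G.T @ G @ F + 1e-12)   # update profiles
    G *= (X @ F.T) / (G @ F @ F.T + 1e-12)   # update contributions

rel_err = np.linalg.norm(X - G @ F) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

The multiplicative updates keep G and F nonnegative by construction, which is the "positive" constraint that makes the recovered profiles interpretable as source mass spectra.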
Measurements at the Delhi Aerosol Supersite (DAS) provide long-term chemical characterization of ambient submicron aerosol in Delhi, with near-continuous online measurements of aerosol composition. In this dissertation, I apply PMF to data collected in the DAS study to characterize sources and atmospheric dynamics of submicron aerosols in Delhi. In study 1 (chapter 2), I report on source apportionment based on unsupervised (unconstrained) positive matrix factorization (PMF), conducted on 15 months of highly time-resolved speciated submicron non-refractory PM₁ (NR-PM₁) data between January 2017 and March 2018. This dataset was collected in the DAS study. I report on seasonal variability across the four seasons of 2017 and interannual variability using data from the two winters and springs of 2017 and 2018. I also show that a modified tracer-based organic component analysis provides an opportunity for a real-time source apportionment approach for organics in Delhi. Phase equilibrium modeling of aerosols using the extended aerosol inorganics model (E-AIM) predicts equilibrium gas-phase concentrations and allows evaluation of the importance of the ventilation coefficient (VC) and temperature in controlling primary and secondary organic aerosol. I also find that primary aerosol dominates severe air pollution episodes, and secondary aerosol dominates seasonal averages. An edited version of this chapter has been published in Atmospheric Chemistry and Physics. In study 2 (chapter 3), we develop the approach of conducting supervised (constrained) PMF on long-term datasets separated into 4-hour periods with limited variability in emissions and meteorology, and statistically demonstrate its viability. I apply this time-of-day PMF approach to two seasons of highly time-resolved NR-PM₁ organics.
This approach improves upon the seasonal source apportionment previously employed in Delhi by capturing the diurnal variability in source mass spectral profiles while retaining low computational intensity. Use of the EPA PMF tool allows the application of constraints and quantifies random errors and rotational ambiguity in PMF solutions. Results in this study demonstrate that the time-of-day PMF approach gives a greater number of more appropriate PMF factors compared to the traditional seasonal PMF approach. The time-of-day PMF approach also fits the data better, improving fits at specific time points and at key m/z values. Portions of this chapter will be submitted to Atmospheric Measurement Techniques. Previous receptor modeling studies have identified vehicular emissions and fossil fuel combustion as prevalent factors contributing to fine PM pollution in Delhi. However, cooking and biomass burning have not been consistently identified in ambient studies. Bottom-up (source-oriented) studies have recognized the high exposure to residential energy emissions from cooking and heating and the associated biomass burning emissions. In study 3 (chapter 4), I address these limitations of receptor modeling studies by applying PMF to two seasons of highly time-resolved NR-PM₁ organics. I utilize the time-of-day PMF approach (chapter 3) to separate primary organics into component primary factors. Hydrocarbon-like organic aerosol (HOA), the surrogate for fuel combustion and traffic primary organic aerosol, occurs in every season and shows strong diurnal patterns. Biomass burning organic aerosol (BBOA) separates only in winter and exhibits time series peaks associated with space heating and solid-fuel combustion. Cooking organic aerosol (COA) separates only in the monsoon and exhibits stable diurnal patterns, suggesting the presence of cooking sources throughout the day.
Equilibrium modeling of organic aerosols using volatility basis sets (VBS) suggests that differences in the ventilation coefficient and temperature can explain the differences in factor separation between winter and monsoon. Overall, I show that traffic on the one hand, and cooking and biomass burning on the other, contribute almost equally to the primary organic aerosol burden in Delhi, in broad agreement with several bottom-up studies. Portions of this chapter will be submitted to Atmospheric Chemistry and Physics.Show more Item Advances in statistical script learning(2017-08-08) Pichotta, Karl; Mooney, Raymond J. (Raymond Joseph); Chambers, Nathanael; Erk, Katrin; Stone, PeterShow more When humans encode information into natural language, they do so with the clear assumption that the reader will be able to seamlessly make inferences based on world knowledge. For example, given the sentence “Mrs. Dalloway said she would buy the flowers herself,” one can make a number of probable inferences based on event co-occurrences: she bought flowers, she went to a store, she took the flowers home, and so on. Observing this, it is clear that many different useful natural language end-tasks could benefit from models of events as they typically co-occur (so-called script models). Robust question-answering systems must be able to infer highly probable implicit events from what is explicitly stated in a text, as must robust information-extraction systems that map from unstructured text to formal assertions about the relations expressed in the text. Coreference resolution, semantic role labeling, and even syntactic parsing systems could, in principle, benefit from event co-occurrence models. To this end, we present a number of contributions related to statistical event co-occurrence models. First, we investigate a method of incorporating multiple entities into events in a count-based co-occurrence model.
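A count-based event co-occurrence model of the kind just mentioned can be sketched as bigram counts over (verb, argument) events. A minimal illustration on toy, hypothetical sequences (real script models operate over parsed corpora and richer event representations):

```python
from collections import Counter, defaultdict

# Toy (verb, argument) event sequences: hypothetical stand-ins for events
# extracted from parsed documents.
docs = [
    [("enter", "store"), ("buy", "flowers"), ("leave", "store"), ("go", "home")],
    [("enter", "store"), ("buy", "flowers"), ("go", "home")],
    [("enter", "store"), ("buy", "bread"), ("leave", "store")],
]

# Count-based co-occurrence model: estimate P(next event | current event)
# from adjacent-pair counts.
bigrams = defaultdict(Counter)
for seq in docs:
    for a, b in zip(seq, seq[1:]):
        bigrams[a][b] += 1

def predict_next(event):
    # Most probable following event under the count model
    return bigrams[event].most_common(1)[0][0]

print(predict_next(("enter", "store")))  # → ("buy", "flowers")
```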
We find that modeling multiple entities interacting across events allows for improved empirical performance on the task of modeling sequences of events in documents. Second, we give a method of applying Recurrent Neural Network sequence models to the task of predicting held-out predicate-argument structures from documents. This model allows us to easily incorporate entity noun information and can allow for more complex, higher-arity events than a count-based co-occurrence model. We find the neural model improves performance considerably over the count-based co-occurrence model. Third, we investigate the performance of a sequence-to-sequence encoder-decoder neural model on the task of predicting held-out predicate-argument events from text. This model does not explicitly model any external syntactic information and does not require a parser. We find the text-level model to be competitive in predictive performance with an event-level model directly mediated by an external syntactic analysis. Finally, motivated by this result, we investigate incorporating features derived from these models into a baseline noun coreference resolution system. We find that, while our additional features do not appreciably improve top-level performance, we can nonetheless provide empirical improvement on a number of restricted classes of difficult coreference decisions.Show more