Browsing by Subject "Neural networks"
Now showing 1 - 20 of 39
Item: A modular attention hypothesis for modeling visuomotor behaviors (2021-07-24)
Zhang, Ruohan; Ballard, Dana H. (Dana Harry), 1946-; Hayhoe, Mary; Stone, Peter; Huth, Alexander; Dayan, Peter
In this dissertation, we explore the hypothesis that complex intelligent behaviors, in vivo, can be decomposed into modules, which are organized in hierarchies and executed in parallel. This organization is similar to a multiprocessing architecture in silico. Biological attention can be viewed as a "process manager" that manages information processing and multiple computations. In this work, we seek to understand and model this modular attention mechanism for humans in a range of behavioral settings. We explain this approach to understanding modular attention at three levels based on David Marr's paradigm: the computational theory level, the representation and algorithm level, and the hardware implementation level. At the computational theory level, we propose that simple visuomotor behaviors can be broken down into modules that require attention for their execution. At the representation and algorithm level, we model human eye movements and actions in a variety of visuomotor tasks. We collect and publish a large-scale, high-quality dataset of eye movements and actions of humans playing Atari video games. We study the active vision problem by jointly modeling human eye movements and actions, and compare how humans and artificial learning agents play these video games differently. We then propose a modular reinforcement learning model of human subjects' navigation behaviors in a virtual-reality environment with multiple goals. We further develop a modular inverse reinforcement learning algorithm to efficiently estimate the subjective reward and discount factors associated with each behavioral goal. At the implementation level, we propose a theoretical neuronal communication model named gamma spike multiplexing that allows the cortex to perform multiple computations simultaneously without crosstalk. The model explains how the modular attention hypothesis might be implemented in the biological brain. The end goals of this work are to (1) build models to explain and predict observed human visuomotor behaviors and attention, and (2) use these biologically inspired models to develop algorithms for better artificial learning systems.

Item: Adapting to unseen driving conditions using context-aware neural networks
Abdulquddos, Suhaib; Miikkulainen, Risto; Tutum, Cem C.
One of the primary obstacles to successful deployment of autonomous agents in real-world tasks such as driving is their poor ability to adapt to unseen conditions. Whereas a human might deduce the best course of action when confronted with an unfamiliar set of conditions based on past experiences, artificial agents have difficulty performing in conditions significantly different from those in which they were trained. This thesis explores an approach in which a context module is added to a neural network to overcome the challenge of adapting to unseen conditions during evaluation. The approach is tested in the CARLA simulator, wherein the torque and steering curves of a vehicle are modified during training and evaluation. Furthermore, the agent is trained only on a track with a relatively large radius of curvature but is evaluated on a track with much sharper turns, so it must learn to adapt its speed and steering during evaluation. Three different neural network architectures are used in these experiments and their performances compared: Context+Skill, Context-only, and Skill-only. When both the performance and the safety of the agent's behavior are considered, the Context+Skill network consistently outperforms both the Skill-only and the Context-only architectures. The results presented in this thesis indicate that the context-aware approach is a promising step toward solving the generalization problem in the autonomous-vehicle domain. Furthermore, this research presents a framework for comparing the generalization capabilities of various network architectures and approaches. It is posited that the Context+Skill network has the potential to advance machine learning with regard to generalization in domains beyond autonomous driving: any domain where awareness of changing environment parameters can have a positive impact on performance.
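A minimal sketch of the Context+Skill idea described above: a context module summarizes recent observation-action history into a latent vector that is fed, alongside the current observation, into the skill (policy) network. The dimensions, the concatenation scheme, the plain MLPs, and all function names here are illustrative assumptions, not the thesis's actual architecture or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    # random-initialized weights; training (e.g. by RL or neuroevolution) is omitted
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    # tanh hidden layers, linear output
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.tanh(x)
    return x

obs_dim, hist_dim, ctx_dim, act_dim = 8, 24, 4, 2
context_net = mlp([hist_dim, 32, ctx_dim])          # recent obs/action history -> latent context
skill_net = mlp([obs_dim + ctx_dim, 32, act_dim])   # (observation, context) -> action

history = rng.standard_normal(hist_dim)   # stand-in for stacked recent observations and actions
obs = rng.standard_normal(obs_dim)
c = forward(context_net, history)         # latent estimate of the current dynamics
action = forward(skill_net, np.concatenate([obs, c]))
print(action)
```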
Item: Analyzing and improving MAESTRO's analytical modeling (2021-05-09)
Kumari, Aparna; Lin, Yun Calvin
Deep learning accelerators efficiently execute deep learning applications through customization. However, designing specialized hardware takes considerable time and engineering effort. Design space exploration (DSE) tools automate the design of specialized accelerators by automatically evaluating designs in this vast design space. A core component of DSE tools is an analytical model, which allows the DSE tool to filter out invalid or sub-optimal candidates at a coarse granularity before resorting to synthesis, which is more accurate but time-consuming. The MAESTRO analytical model has been used in existing DSE tools because it strikes a good balance between detail and speed. In this thesis, we improve the MAESTRO analytical model by identifying and fixing three limitations: (1) we consider buffer sizes in the memory energy and area model, (2) we add support for differentiating between a unified buffer and a partitioned buffer, and (3) we add support for exploring bit precision. Next, we detail future directions for improving the MAESTRO analytical model. First, we perform a component-wise area and power analysis of a commercial accelerator, the Nvidia Deep Learning Accelerator (NVDLA) [4], to gain insights about sub-components not modeled by the analytical model. Then, to understand the impact of compute organization, we perform a component-wise area analysis of two compute organizations executing matrix-vector multiplication.

Item: Analyzing voltage sag direction using protective relays and deep-learning methods (2023-04-21)
Patha, Lekhaj; Santoso, Surya
As electricity demand continues to grow, power systems are becoming more complex and interconnected, making the need for reliable protection systems more important than ever. Protection systems are designed to detect and isolate faults and other abnormal conditions, preventing them from cascading through the power grid and causing widespread outages. The primary challenge in protection is to detect the fault, its type, and its location. Traditional protection devices such as circuit breakers, fuses, and relays work together to ensure that the power system remains stable and reliable under various conditions, including system failures, and traditional relays effectively detect, locate, and isolate faults. Smart intelligent relays (SIRs) are designed to perform a broader range of functions, such as fault location and power quality monitoring. Machine learning techniques are increasingly applied in power system protection to improve the accuracy and speed of fault detection and classification. ML algorithms can analyze real-time data from sensors and other devices to detect and classify faults, including those that may be too small or subtle for traditional protection systems to detect. This thesis studies methods of identifying the direction of voltage sags in distribution circuits. Voltage sags arise from short-circuit faults involving single, double, and three phase-to-ground conditions. The direction of a fault is defined by the direction of power flow before the event: a fault is classified as downstream of a monitoring location if power flows toward the fault location before the event, and as upstream if power flows away from the fault location before the event. The terms upstream and downstream are relative to the monitor location; a downstream fault for one monitor can be an upstream fault for another. This thesis studies applications of protective relaying and deep-learning techniques for identifying the direction of voltage sags, using real-time voltage and current waveforms to estimate whether a fault is upstream or downstream of the monitored location(s). The fault data were generated using a time-domain power system modeling tool with variable fault impedances and multiple fault locations. Relay-based approaches were studied, and a deep-learning technique was developed with the generated data. The relay-based techniques identified the fault direction in all cases irrespective of fault location and duration. ML algorithms can help analyze large amounts of data and detect patterns that may be difficult or impossible for traditional protection systems to identify.
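To make the direction convention above concrete, here is a toy sketch of a relay-style test: the sign of the average active power at the monitor over the last pre-fault cycle labels the fault downstream (power flowing toward the fault) or upstream. The sampling rate, window, sign convention, and function name are illustrative assumptions, not the thesis's actual relay logic.

```python
import numpy as np

def prefault_power_sign(v, i, fs=15360.0, f0=60.0):
    """Classify fault direction from at least one pre-fault cycle of waveforms.

    v, i : instantaneous voltage and current samples ending just before the sag
    Returns 'downstream' if mean active power flows in the monitor's
    reference direction, else 'upstream'.
    """
    n = int(fs / f0)                 # samples per fundamental cycle
    p = np.mean(v[-n:] * i[-n:])     # average active power over the last pre-fault cycle
    return "downstream" if p > 0 else "upstream"

# toy check: current in phase with voltage -> power flows toward the fault
t = np.arange(0, 1 / 60, 1 / 15360)
v = np.sin(2 * np.pi * 60 * t)
print(prefault_power_sign(v, 0.5 * v))   # downstream
print(prefault_power_sign(v, -0.5 * v))  # upstream
```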
Item: Compiler and runtime systems for homomorphic encryption and graph processing on distributed and heterogeneous architectures (2020-05)
Dathathri, Roshan; Pingali, Keshav; Musuvathi, Madanlal; Ramachandran, Vijaya; Rossbach, Christopher; Snir, Marc
Distributed and heterogeneous architectures are tedious to program because devices such as CPUs, GPUs, and FPGAs provide different programming abstractions and may have disjoint memories, even if they are on the same machine. In this thesis, I present compiler and runtime systems that make it easier to develop efficient programs for privacy-preserving computation and graph processing applications on such architectures. Fully Homomorphic Encryption (FHE) refers to a set of encryption schemes that allow computations on encrypted data without requiring a secret key. Recent cryptographic advances have pushed FHE into the realm of practical applications. However, programming these applications remains a huge challenge, as it requires cryptographic domain expertise to ensure correctness, security, and performance. This thesis introduces a domain-specific compiler for fully-homomorphic deep neural network (DNN) inferencing as well as a general-purpose language and compiler for fully-homomorphic computation:
1. I present CHET, a domain-specific optimizing compiler designed to make the task of programming DNN inference applications using FHE easier. CHET automates many laborious and error-prone programming tasks, including encryption parameter selection to guarantee security and accuracy of the computation, determining efficient data layouts, and performing scheme-specific optimizations. Our evaluation of CHET on a collection of popular DNNs shows that CHET-generated programs outperform expert-tuned ones by an order of magnitude.
2. I present a new FHE language called Encrypted Vector Arithmetic (EVA), which includes an optimizing compiler that generates correct and secure FHE programs while hiding all the complexities of the target FHE scheme. Bolstered by our optimizing compiler, programmers can develop efficient general-purpose FHE applications directly in EVA. EVA is also designed to work as an intermediate representation that can be a target for compiling higher-level domain-specific languages. To demonstrate this, we have re-targeted CHET onto EVA. Due to the novel optimizations in EVA, its programs are on average ~5.3x faster than those generated by the unmodified version of CHET.
These languages and compilers enable a wider adoption of FHE. Applications in several areas like machine learning, bioinformatics, and security need to process and analyze very large graphs, and distributed clusters are essential for processing such graphs in reasonable time. I present a novel approach to building distributed graph analytics systems that exploits heterogeneity in processor types, partitioning policies, and programming models. The key to this approach is Gluon, a domain-specific communication-optimizing substrate. Programmers write applications in a shared-memory programming system of their choice and interface these applications with Gluon using a lightweight API. Gluon enables these programs to run on heterogeneous clusters in the bulk-synchronous parallel (BSP) model and optimizes communication in a novel way by exploiting structural and temporal invariants of graph partitioning policies. We also extend Gluon to support lock-free, non-blocking, bulk-asynchronous execution by introducing the bulk-asynchronous parallel (BASP) model. Our experiments were done on CPU clusters with up to 256 multi-core, multi-socket hosts and on multi-GPU clusters with up to 64 GPUs. The communication optimizations in Gluon improve end-to-end application execution time by ~2.6x on average. Gluon's BASP-style execution is on average ~1.5x faster than its BSP-style execution for graph applications on real-world large-diameter graphs at scale. The D-Galois and D-IrGL systems built using Gluon scale well and are faster than Gemini, the state-of-the-art distributed CPU-only graph analytics system, by factors of ~3.9x and ~4.9x on average using distributed CPUs and distributed GPUs, respectively. The Gluon-based D-IrGL system for distributed GPUs is also on average ~12x faster than Lux, the only other distributed GPU-only graph analytics system. The Gluon-based D-IrGL system was one of the first distributed GPU graph analytics systems and is the only asynchronous one.
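As a rough illustration of the BSP execution model that Gluon targets (not Gluon's API, which the abstract does not detail), here is a toy superstep loop for min-label propagation on a partitioned graph: independent local compute phases alternate with one bulk exchange across partition-boundary edges. The shared `labels` array is a simplification standing in for the per-host copies and reductions a real distributed system would use; the graph and names are fabricated for illustration.

```python
import numpy as np

# min-label propagation (connected components) on a graph split across two "hosts";
# each superstep: local compute per partition, then one bulk boundary exchange (BSP)
edges = {0: [(0, 1), (1, 2)], 1: [(3, 4)]}   # edges local to each partition
cross = [(2, 3)]                              # edges crossing the partition boundary
labels = np.arange(5, dtype=float)            # initial label = node id

for superstep in range(10):
    changed = False
    for part, es in edges.items():            # compute phase (independent per host)
        for u, v in es:
            m = min(labels[u], labels[v])
            changed |= (labels[u] != m) or (labels[v] != m)
            labels[u] = labels[v] = m
    for u, v in cross:                        # communication phase (bulk exchange)
        m = min(labels[u], labels[v])
        changed |= (labels[u] != m) or (labels[v] != m)
        labels[u] = labels[v] = m
    if not changed:                           # quiescence, as in BSP graph analytics
        break

print(labels)  # all nodes converge to label 0.0
```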
Item: Compressed sensing recovery with unlearned neural networks (2019-07-01)
Van Veen, David Michael; Dimakis, Alexandros G.; Vishwanath, Sriram
This report investigates methods for solving the problem of compressed sensing, in which the goal is to recover a signal from noisy, linear measurements. Compressed sensing techniques enable signal recovery with far fewer measurements than required by traditional methods such as Nyquist sampling. Signal recovery is an important problem in application domains such as consumer electronics, medical imaging, and many others. While classical methods for compressed sensing recovery are well established, recent developments in machine learning have created wide opportunities for improvement. In this report I first discuss pre-existing approaches, both classical and modern. I then present my own contribution to this field: a recovery method using untrained machine learning models. This approach has several advantages that enable its use in complex domains such as medical imaging.
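A minimal sketch of recovery with an untrained network, assuming a deep-image-prior-style formulation: the weights of a randomly initialized generator are fit to the measurements of a single signal, with no training data. The two-layer MLP, latent size, optimizer settings, and test signal are placeholders, not the report's actual generator architecture.

```python
import torch

torch.manual_seed(0)
n, m, k = 256, 64, 32                        # signal dim, measurements, latent dim
x_true = torch.zeros(n); x_true[::16] = 1.0  # a simple structured test signal
A = torch.randn(m, n) / m**0.5               # random Gaussian measurement matrix
y = A @ x_true + 0.01 * torch.randn(m)       # noisy linear measurements y = A x + noise

# untrained generator: its weights are optimized to fit the measurements directly
net = torch.nn.Sequential(
    torch.nn.Linear(k, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, n),
)
z = torch.randn(k)                           # fixed random latent input
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    loss = ((A @ net(z) - y) ** 2).sum()     # fit measurements; the net acts as a prior
    loss.backward()
    opt.step()

x_hat = net(z).detach()
print(f"relative error: {(x_hat - x_true).norm() / x_true.norm():.3f}")
```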
Item: A computational model of language pathology in schizophrenia (2010-12)
Grasemann, Hans Ulrich; Miikkulainen, Risto; Hoffman, Ralph E.; Mooney, Raymond J.; Love, Bradley C.; Ballard, Dana H.; Kuipers, Benjamin J.
No current laboratory test can reliably identify patients with schizophrenia. Instead, key symptoms are observed via language, including derailment, where patients cannot follow a coherent storyline, and delusions, where false beliefs are repeated as fact. The brain processes underlying these and other symptoms remain unclear, and characterizing them would greatly enhance our understanding of schizophrenia. In this situation, computational models can be valuable tools for formulating testable hypotheses and complementing clinical research. This dissertation aims to capture the link between biology and schizophrenic symptoms using DISCERN, a connectionist model of human story processing. Competing illness mechanisms proposed to underlie schizophrenia are simulated in DISCERN and evaluated at the level of narrative language, the same level used to diagnose patients. The result is the first simulation of a speaker with schizophrenia. Of all the illness models, hyperlearning, a model of overly intense memory consolidation, produced the best fit to patient data, as well as compelling models of delusions and derailments. If validated experimentally, the hyperlearning hypothesis could advance the current understanding of schizophrenia and provide a platform for simulating the effects of future treatments.

Item: Data reduction methods for human decision making and learning (2019-07-16)
Mourad, Sara J.; Tewfik, Ahmed; Vikalo, Haris; Dimakis, Georgios-Alexandros; Ghosh, Joydeep; Krähenbühl, Philipp
The rapidly increasing size of data is becoming a major challenge for both humans and machines to process. While more data means more information, less uncertainty, and consequently better performance, more data also means more processing time and more storage. This motivated my thesis, which centers on finding ways to cut the size of the data shown to humans, as well as the data fed into machine learning algorithms, without compromising performance. First, in the context of human decision making, we aim to reduce and reorder the data shown to a human subject so as to enhance their decision performance. We propose a statistical model for human decision making that incorporates cognitive biases. We then propose an algorithm that constructs, in polynomial time, an ordered subset of the data such that human performance approximately matches the optimal performance. Second, we propose an algorithm for selecting a subset of the training data on which to train an SVM. The algorithm optimizes a submodular set function that represents the diversity and relevance of the considered subset, while providing performance guarantees. We then propose an algorithm for selecting a weighted subset of the training data on which to train the SVM. The weighted subset construction is based on computing the maximal independent set of the graph induced by the approximate nearest-neighbor properties of the dataset. Third, we propose two algorithms for online selective training of neural networks. The first method picks batches that maximize the reduction in the entropy of the estimator. The second method constructs batches such that all included datapoints have predicted probabilities under some threshold. Our approaches keep the epoch-based framework of training neural networks and make decisions based on up-to-date values.
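To illustrate the submodular subset-selection step, here is a sketch of greedy maximization of a facility-location objective, a standard monotone submodular choice for "diverse and relevant" subsets; the thesis's actual objective and guarantees may differ, and the RBF similarity and function name here are assumptions.

```python
import numpy as np

def greedy_subset(X, budget, gamma=1.0):
    """Greedily maximize the facility-location objective
        F(S) = sum_i max_{j in S} sim(i, j),
    which is monotone submodular, so greedy selection enjoys the
    classic (1 - 1/e) approximation guarantee."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    sim = np.exp(-gamma * d2)              # RBF similarity, nonnegative
    coverage = np.zeros(len(X))            # current max similarity to the chosen set
    chosen = []
    for _ in range(budget):
        # marginal gain of adding each candidate j, computed in one vectorized pass
        gains = np.maximum(sim, coverage[:, None]).sum(axis=0) - coverage.sum()
        j = int(np.argmax(gains))
        chosen.append(j)
        coverage = np.maximum(coverage, sim[:, j])
    return chosen

X = np.random.default_rng(0).standard_normal((200, 5))
print(greedy_subset(X, 10))   # indices of a diverse, representative training subset
```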
Item: Discovering multi-purpose modules through deep multitask learning (2019-02-14)
Meyerson, Elliot Keeler; Miikkulainen, Risto; Grauman, Kristen; Durrett, Greg; Nitschke, Geoff
Machine learning scientists aim to discover techniques that can be applied across diverse sets of problems. Such techniques need to exploit regularities that are shared across tasks. This raises the question: what shared regularity is not yet being exploited? Complex tasks may share structure that is difficult for humans to discover. The goal of deep multitask learning is to discover and exploit this structure automatically by training a joint model across tasks. To this end, this dissertation introduces a deep multitask learning framework for collecting generic functional modules that are used in different ways to solve different problems. Within this framework, a progression of systems is developed based on assembling shared modules into task models and leveraging the complementary advantages of gradient descent and evolutionary optimization. In experiments, these systems confirm that modular sharing improves performance across a range of application areas, including general video game playing, computer vision, natural language processing, and genomics, yielding state-of-the-art results in several cases. The conclusion is that multi-purpose modules discovered by deep multitask learning can exceed those developed by humans in performance and generality.

Item: Efficient and dimension independent methods for neural network surrogate construction and training (2020-08-12)
O'Leary-Roseberry, Thomas Finnian; Ghattas, Omar N.; Heimbach, Patrick; Biros, George; Oden, J. Tinsley; Willcox, Karen
In this dissertation I investigate how to efficiently construct neural network surrogates for parametric maps defined by PDEs, and how to use second-order information to improve solutions of the related neural network training problem. Many-query problems arising in scientific applications (such as optimization, uncertainty quantification, and inference problems) require evaluating an input-output mapping parametrized by a high-dimensional nonlinear PDE model. The cost of these evaluations makes direct solution using the model prohibitive, and efficient, accurate surrogates are the key to solving these problems in practice. In this work I investigate neural network surrogates that use model information to detect informed subspaces of the input and output in which the parametric map can be represented efficiently. These compact representations require relatively little data to train and outperform conventional data-driven approaches, which require large training data sets. Once a neural network is designed, training is a major issue. One seeks weights that generalize to data not seen during training. In this work I investigate how second-order information can be efficiently exploited to design optimizers with fast convergence and good generalization properties. These optimizers are shown to outperform conventional methods in numerical experiments.
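A sketch of the informed-subspace idea under an active-subspace-style assumption: dominant input directions are detected from gradient samples of the map, and a cheap surrogate is fit on the reduced coordinates. The analytic toy map, the cubic polynomial features (standing in for a small neural network), and the sample counts are illustrative choices, not the dissertation's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, N = 50, 2, 200                       # input dim, subspace rank, training samples

# stand-in for an expensive parametric map and its gradient (an analytic toy here)
W = rng.standard_normal((d, r))
f = lambda x: np.sin(x @ W).sum(axis=-1)   # depends on inputs only through an r-dim subspace
grad_f = lambda x: np.cos(x @ W) @ W.T

# 1) detect the informed input subspace from gradient samples
X = rng.standard_normal((N, d))
G = grad_f(X)                              # (N, d) gradient samples
U, s, _ = np.linalg.svd(G.T @ G / N)
V = U[:, :r]                               # dominant directions the output depends on

# 2) fit a cheap surrogate on the reduced coordinates (polynomial features here,
#    standing in for the small neural network a real surrogate would use)
Z = X @ V
Phi = np.hstack([np.ones((N, 1)), Z, Z**2, Z**3])
coef, *_ = np.linalg.lstsq(Phi, f(X), rcond=None)

Xt = rng.standard_normal((100, d))
Zt = Xt @ V
pred = np.hstack([np.ones((100, 1)), Zt, Zt**2, Zt**3]) @ coef
print("test RMSE:", np.sqrt(np.mean((pred - f(Xt)) ** 2)))
```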
Item: Estimation of in-situ fluid properties from the combined interpretation of nuclear, dielectric, optical, and magnetic resonance measurements (2018-12)
Lee, Hyungjoo; Torres-Verdín, Carlos; Daigle, Hugh; Heidari, Zoya; Okuno, Ryosuke; Raizen, Mark
During the last few decades, the quantification of hydrocarbon pore volume from borehole measurements has been widely studied for reservoir description. Relatively less effort has been devoted to estimating in-situ fluid properties because (1) acquiring fluid samples is expensive, (2) reservoir fluids are a complex mixture of various miscible and non-miscible phases, and (3) fluid properties depend on environmental factors such as temperature and pressure. This dissertation investigates the properties of fluid mixtures based on various manifestations of their electromagnetic properties from the MHz to the THz frequency range. A variety of fluids, including water, alcohols, alkanes, aromatics, cyclics, ethers, and their mixtures, are analyzed with both laboratory experiments and numerical simulations. A new method is introduced to quantify in-situ hydrocarbon properties from borehole nuclear measurements. The inversion-based estimation method allows depth-continuous assessment of compositional gradients at in-situ conditions and provides thermodynamically consistent interpretations of reservoir fluids that depend greatly on phase behavior. Applications of this interpretation method to measurements acquired in two field examples, including one in a gas-oil transition zone, yielded reliable and verifiable hydrocarbon compositions. Dielectric properties of polar liquid mixtures were analyzed in the frequency range from 20 MHz to 20 GHz at ambient conditions. The Havriliak-Negami (HN) model was adapted for the estimation of dielectric permittivity and relaxation time. These experimental dielectric properties were compared to Molecular Dynamics (MD) simulations. Additionally, thermodynamic properties, including excess enthalpy, density, number of hydrogen bonds, and effective self-diffusion coefficient, were computed to cross-validate the experimental results. Properties predicted from MD simulations are in excellent agreement with experimental measurements. The three most common optical spectroscopy techniques, i.e., Near Infrared (NIR), Infrared, and Raman, were applied for the estimation of compositions and physical properties of liquid mixtures. Several analytical techniques, including Principal Component Analysis (PCA), Radial Basis Functions (RBF), Partial Least-Squares Regression (PLSR), and Artificial Neural Networks (ANN), were separately implemented for each spectrum to build correlations between spectral data and properties of liquid mixtures. Results show that the proposed methods yield prediction errors 1.5% to 22.2% smaller than those obtained with standard multivariate methods. Furthermore, the errors can be decreased by combining NIR, Infrared, and Raman spectroscopy measurements. Lastly, the ¹H NMR longitudinal relaxation properties of various liquid mixtures were examined with the objective of detecting individual components. Relaxation times and diffusion coefficients obtained via MD simulations for these mixtures are in agreement with experimental data. The ¹H-¹H dipole-dipole relaxations for fluid mixtures were also decomposed into relaxations emanating from intramolecular and intermolecular interactions. Quantifying the intermolecular interactions between like and unlike molecules reveals how much each component contributes to the total NMR longitudinal relaxation of the mixture, as well as the level of interaction between different fluids. Both the experimental and the numerical simulation results documented in this dissertation indicate that, for reliable and accurate in-situ fluid identification, it is important to select measurement techniques that capture the physical property of interest and maximize the physical contrast between components.
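To illustrate the spectra-to-property correlation step, here is a toy PCA-plus-regression pipeline on synthetic two-component spectra, standing in for the PCA/PLSR/ANN models named above; the spectral shapes, component count, and noise level are fabricated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_channels = 120, 400

# synthetic "spectra": mixtures of two broad peaks whose weights encode composition
grid = np.linspace(0, 1, n_channels)
peak = lambda c, w: np.exp(-((grid - c) / w) ** 2)
frac = rng.uniform(0, 1, n_samples)          # property to predict (e.g. a mole fraction)
spectra = (np.outer(frac, peak(0.3, 0.05))
           + np.outer(1 - frac, peak(0.7, 0.08))
           + 0.01 * rng.standard_normal((n_samples, n_channels)))

# PCA compression followed by linear regression on the component scores
mu = spectra.mean(axis=0)
U, s, Vt = np.linalg.svd(spectra - mu, full_matrices=False)
scores = (spectra - mu) @ Vt[:5].T           # keep 5 principal components
Phi = np.hstack([np.ones((n_samples, 1)), scores])
coef, *_ = np.linalg.lstsq(Phi, frac, rcond=None)
pred = Phi @ coef
print("training RMSE:", np.sqrt(np.mean((pred - frac) ** 2)))
```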
Item: Evolutionary bilevel optimization for complex control problems and blackbox function optimization (2015-05)
Liang, Jason Zhi; Miikkulainen, Risto; Stone, Peter
Most optimization algorithms must undergo time-consuming parameter tuning in order to solve complex, real-world control tasks. Parameter tuning is inherently a bilevel optimization problem: the lower-level objective function is the performance of the control parameters discovered by an optimization algorithm, and the upper-level objective function is the performance of the algorithm given its parameterization. In the first part of this thesis, a new bilevel optimization method called MetaEvolutionary Algorithm (MEA) is developed to discover optimal parameters for neuroevolution to solve control problems. In two challenging benchmarks, double pole balancing and helicopter hovering, MEA discovers parameters that result in better performance than hand tuning and other automatic methods. In the second part, MEA tunes an adaptive genetic algorithm (AGA) that uses the state of the population every generation to adjust parameters on the fly. Promising experimental results are shown for standard blackbox benchmark functions. Thus, bilevel optimization in general and MEA in particular are promising approaches for solving difficult optimization tasks.

Item: Evolving scout agents for military simulations (2015-05)
Boyles, Brian David; Miikkulainen, Risto; Ballard, Dana
Simulations play an increasingly significant role in training and preparing the military, particularly in environments with constrained budgets. Unfortunately, in most cases a small number of people must control a large number of simulated vehicles and soldiers. This often leads to micromanagement of computer-controlled forces in order to get them to exhibit the human-like characteristics of an enemy force. This thesis uses NeuroEvolution of Augmenting Topologies (NEAT) to train neural networks to perform the role of scouts, which analyze the terrain and decide where to place themselves to best observe the enemy forces. The main attribute that the scout agents consider is a vapor flow rate from the enemy starting location to their intended objective, which, according to previous studies, indicates likely chokepoints along the enemy route. The thesis experiments with different configurations of sensors and fitness functions in order to maximize how much of the enemy team is spotted over the course of the scenario. The results show that these agents perform better than randomly placed scouts, and in many situations better than scouts deployed using heuristics, although not consistently so. Evolutionary optimization of scout agents using vapor flow is thus a promising approach for developing autonomous scout agents in military simulations.
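A toy sketch of the kind of fitness function such evolved scouts might be scored with: the fraction of an enemy route observed by at least one scout. Terrain, the vapor-flow input, and timing are omitted; the grid, view radius, route, and names are fabricated for illustration, not the thesis's actual scenario.

```python
import numpy as np

rng = np.random.default_rng(2)
enemy_path = [(r, 2 + (r % 3)) for r in range(20)]     # toy enemy route on a 20x20 grid

def fitness(scout_positions, view_radius=3.0):
    """Fraction of the enemy route observed by at least one scout
    (a stand-in for scoring candidate NEAT networks by enemy spotted)."""
    spotted = 0
    for cell in enemy_path:
        d = np.hypot(*(np.array(scout_positions) - cell).T)
        spotted += bool((d <= view_radius).any())
    return spotted / len(enemy_path)

random_scouts = rng.integers(0, 20, size=(3, 2))
print("random placement fitness:  ", fitness(random_scouts))
print("on-route placement fitness:", fitness([(3, 3), (10, 3), (17, 3)]))
```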
Item: Exploring distributional semantics in lexical representations and narrative modeling (2020-05-13)
Wang, Su, 1985-; Erk, Katrin; Durrett, Greg; Li, Junyi; Wechsler, Stephen
We are interested in the computational modeling of lexico-conceptual and narrative knowledge (e.g. how to represent the meaning of cat to reflect facts such as: it is similar to a dog, and it is typically larger than a mouse; how to characterize a story, and how to identify different narratives on the same topic). On the lexico-conceptual front, we learn lexical representations with strong interpretability and integrate commonsense knowledge into lexical representations. For narrative modeling, we study how to identify, extract, and generate narratives/stories acceptable to human intuition. As a methodological framework we apply the methods of Distributional Semantics (DS), "a subfield of Natural Language Processing that learns meaning from word usages" (Herbelot, 2019), where semantic representations (on any level, such as words, phrases, sentences, etc.) are learned at scale from data through machine learning models (Erk and Padó, 2008; Baroni and Lenci, 2010; Mikolov et al., 2013; Pennington et al., 2014). To infuse interpretability and commonsense into semantic representations (specifically lexical and event representations), which are typically lacking in previous work (Doran et al., 2017; Gusmao et al., 2018; Carvalho et al., 2019), we complement data-driven scalability with a minimal amount of human knowledge annotation on a selected set of tasks, and we have obtained empirical evidence in support of our techniques. For narrative modeling, we draw insights from the rich body of work on scripts and narratives, from Schank and Abelson (1977) and Mooney and DeJong (1985) to Chambers and Jurafsky (2008, 2009), and propose distributional models for narrative identification, extraction, and generation that produce state-of-the-art performance. Symbolic approaches to lexical semantics (Wierzbicka, 1996; Goddard and Wierzbicka, 2002) and narrative modeling (Schank and Abelson, 1977; Mooney and DeJong, 1985) have been fruitful in theoretical studies. For example, in theoretical linguistics, Wierzbicka defined a small set of lexical semantic primitives from which complex meaning can be built compositionally; in Artificial Intelligence, Schank and Abelson formulated primitive acts which are conceptualized into semantic episodes (i.e. scripts) understandable by humans. Our focus, however, is primarily on computational approaches that need wide lexical coverage, for which DS provides a better toolkit, especially in practical applications. In this thesis, we build on "vanilla" DS techniques (Landauer and Dumais, 1997; Mikolov et al., 2013) to address the issues listed above. Specifically, we present empirical evidence that:
- On the building-block level, within the framework of DS, it is possible to learn highly interpretable lexical and event representations at scale and to introduce human commonsense knowledge at low cost.
- On the narrative level, well-designed DS modeling offers a balance of precision and scalability, yielding empirically stronger solutions to complex narrative modeling questions (e.g. narrative identification, extraction, and generation).
Further, through case studies on lexical and narrative modeling, we showcase the viability of integrating DS with traditional methods so as to retain the strengths of both approaches. Concretely, the contributions of this thesis are summarized as follows:
- Evidence from analyzing and modeling a small set of common concepts indicating that interpretable representations can be learned for lexical concepts with minimal human annotation, realizing one/few-shot learning.
- Commonsense integration in lexical semantics: with carefully designed crowdsourcing combined with distributional methods, it is possible to substantially improve inference related to physical knowledge of the world.
- Neural distributional methods perform strongly in complex narrative modeling tasks, where we demonstrate that the following techniques are particularly useful: 1) iterative algorithms inspired by human intuition; 2) integration of graphical and distributional modeling; 3) pre-trained large-scale language models.
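A minimal sketch of the "vanilla" distributional-semantics starting point referenced above: count-based word vectors with PPMI weighting and cosine similarity. The tiny corpus and window size are fabricated; the thesis builds on far larger corpora and neural embeddings.

```python
import numpy as np

corpus = ("the cat chased the mouse . the dog chased the cat . "
          "the mouse ate cheese . the dog ate a bone .").split()

# count word-context co-occurrences within a +/-2 token window
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
C = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for j in range(max(0, i - 2), min(len(corpus), i + 3)):
        if j != i:
            C[idx[w], idx[corpus[j]]] += 1

# positive pointwise mutual information (PPMI) re-weighting
total = C.sum()
pw = C.sum(axis=1, keepdims=True) / total
pc = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    ppmi = np.maximum(np.log((C / total) / (pw * pc)), 0)
ppmi[~np.isfinite(ppmi)] = 0.0

def cos(a, b):
    u, v = ppmi[idx[a]], ppmi[idx[b]]
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

print("cat ~ dog:   ", round(cos("cat", "dog"), 3))    # shared contexts -> higher similarity
print("cat ~ cheese:", round(cos("cat", "cheese"), 3))
```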
Item: Gas-hydrates saturation estimation in Krishna-Godavari basin, India (2013-05)
Das, Kumar Sundaram; Sen, Mrinal K.; Tatham, Robert; Spikes, Kyle
Gas hydrates are an unconventional energy resource that may become an important source of energy for India in the future. They occur offshore along the continental margin and are currently in exploratory and evaluation stages, so their quantification is an important task. The goal of this thesis is to demonstrate a new technique for the estimation of gas-hydrates volumes. The region of study is the Krishna-Godavari basin, located in the eastern offshore areas of India. The presence of gas hydrates has been proven by drilling into marine sediments as part of the Indian National Gas Hydrates Program. Borehole subsurface and surface seismic data were collected during this expedition. I use a 2D seismic reflection line and borehole log data for my study. The method I use for estimating gas-hydrates saturation combines inversion of seismic reflection data with the development of seismic attributes. My approach can be broadly described by the following steps:
1. Process the seismic data to remove noise. Use stacked and migrated data along with well logs to perform poststack seismic inversion, obtaining impedance information in volumetric portions of the subsurface.
2. Use NMO-corrected CDP gather records of the seismic reflection data along with subsurface well logs to perform prestack seismic inversion, obtaining impedance volumes.
3. Compare the results from steps 1 and 2 and use the better results to perform multi-attribute analysis with a neural network method, predicting resistivity and porosity logs at the well location. Use the transform equations obtained at the well location to predict these logs throughout the seismic section in the desired zone of interest.
4. Use an anisotropic equivalent of Archie's law, which relates resistivity and porosity to saturation, to predict saturation throughout the seismic reflection section.
The majority of previous work in the region is limited to gas-hydrates quantification at the well location only. By using neural networks for multi-attribute analysis, I demonstrate a statistical method for predicting log properties away from the well location. My results suggest gas-hydrates saturation in the range of 50-80% in the zone of interest. The estimated saturation of gas hydrates matches closely with the saturation estimates obtained from cores recovered during coring of the boreholes. Hence this approach provides a reliable means of quantifying gas hydrates by making the best possible use of seismic and well-log data. The unique combination of impedance-derived attributes and neural networks captures the nonlinear behavior in the predictive transform relationships. The use of an anisotropic formulation of Archie's law to estimate saturation also produces accurate results, consistent with the observed gas-hydrates saturation.
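For step 4 above, a worked sketch of the isotropic form of Archie's law (the thesis uses an anisotropic equivalent); the constants a, m, n and the input values are illustrative placeholders, not calibrated values from the basin.

```python
def hydrate_saturation(rt, phi, rw, a=1.0, m=2.0, n=2.0):
    """Archie's law: Sw = ((a * Rw) / (phi**m * Rt))**(1/n);
    gas-hydrate saturation is then taken as Sgh = 1 - Sw.

    rt  : formation resistivity from the predicted resistivity log (ohm-m)
    phi : porosity (fraction) from the predicted porosity log
    rw  : brine resistivity (ohm-m)
    a, m, n : Archie constants (placeholder values; the thesis uses an
              anisotropic formulation with calibrated parameters)
    """
    sw = ((a * rw) / (phi ** m * rt)) ** (1.0 / n)   # water saturation
    return 1.0 - min(sw, 1.0)

print(hydrate_saturation(rt=30.0, phi=0.45, rw=0.35))  # ~0.76, i.e. within the 50-80% range
```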
Item: General-purpose optimization through information maximization (2012-05)
Lockett, Alan Justin; Miikkulainen, Risto; Ghosh, Joydeep; Mooney, Raymond; Ravikumar, Pradeep; Zitkovic, Gordan
The primary goal of artificial intelligence research is to develop a machine capable of learning to solve disparate real-world tasks autonomously, without relying on specialized problem-specific inputs. This dissertation suggests that such machines are realistic: if No Free Lunch theorems were to apply to all real-world problems, then the world would be utterly unpredictable. In response, the dissertation proposes the information-maximization principle, which claims that the optimal optimization methods make the best use of the information available to them. This principle results in a new algorithm, evolutionary annealing, which is shown to perform well especially on challenging problems with irregular structure.

Item: Hardware implementation of inference in deep neural networks (2022-05)
Houck, Kimble Derek; Miikkulainen, Risto; Taillefumier, Thibaud; Fussell, Don; Ballard, Dana
Deep learning neural network algorithms, including convolutional and recurrent networks, have risen to popularity in recent years. Along with this popularity has come a wide range of implementations that optimize the performance of these algorithms on existing hardware, including GPU architectures and modern x86 CPU SIMD capabilities. Likewise, effort has been put into developing hardware specifically for running these algorithms, focusing either on specific algorithms or on a range of building-block operations common to many deep learning variations. While some of these architectures have large power requirements and are generally designed to run in a datacenter environment, hardware architectures designed to run most deep learning workloads well while being small, low-cost, and/or low-power are also important for applications where these factors are limiting. In this work I describe the implementation of both convolutional and recurrent network layer types on one such novel hardware architecture. This novel ultra-wide SIMD architecture is built around a ring of simple data-movement and register units that feed simple arithmetic units, attached accumulator registers, and post-processing units. Unlike many other architecture designs, however, this class of hardware designs possesses few methods for efficiently rearranging data over even moderate distances in memory, relying instead on shifting data between adjacent or nearby data units in the ring. Thus, neural network implementations that take the geometry of the inputs into account as much as possible are needed. I present one such implementation, M³inM²V, and show that it allows such simple hardware architectures to be used efficiently for neural network inference, analyzing its performance both on the described novel architecture and on the very different AVX-512 SIMD architecture. Furthermore, I show the applicability of recurrent network architectures to a novel domain: decoding information encoded in the electrical spiking activity observed from ensembles of neurons. By comparing the ability of a classifier to infer different pieces of information from the data, and by comparing classifiers trained using different methods of transforming the observed activity into feature vectors, inferences can be made about what information is encoded in the neural signal, and how. By showing that deep learning classifiers can perform useful classification on such a dataset, possibly with less parameter tuning than other classifiers, I show that such tools can contribute to increasing scientific understanding of the brain. Moreover, for future neural-signal decoding applications such as the control of prosthetic devices, the ability to run the decoding algorithms on relatively low-power hardware would be highly advantageous.
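A toy illustration of the shift-and-accumulate style of computation such a ring architecture favors: a 1-D circular convolution expressed purely as nearest-neighbor shifts plus lockstep multiply-accumulates. This is a conceptual sketch with invented names, not M³inM²V itself, whose mapping of real convolutional and recurrent layers is more involved.

```python
import numpy as np

def ring_conv1d(x, w):
    """1-D circular convolution expressed only as nearest-neighbor shifts plus
    multiply-accumulates, mimicking a ring of SIMD lanes that can pass data to
    adjacent units but cannot gather from arbitrary memory addresses."""
    acc = np.zeros_like(x)
    shifted = x.copy()
    for k, wk in enumerate(w):
        if k > 0:
            shifted = np.roll(shifted, 1)   # one hop around the ring per filter tap
        acc += wk * shifted                 # every lane multiply-accumulates in lockstep
    return acc

x = np.arange(8, dtype=float)
w = np.array([0.25, 0.5, 0.25])
print(ring_conv1d(x, w))                    # circular convolution of x with w
```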
Item: hIPPYLearn : an inexact Newton-CG method for training neural networks with analysis of the Hessian (2017-05)
Gao, Ge, 1993-; Ghattas, Omar N.; Dawson, Clint
Neural networks, as part of deep learning, have become extremely popular due to their ability to extract information from data and to generalize it to new unseen inputs. Neural networks have contributed to progress in many classic problems. For example, in natural language processing, the use of neural networks significantly improved the accuracy of parsing natural language sentences [11]. However, training complicated neural networks is expensive and time-consuming. In this paper, we introduce more efficient methods to train neural networks using a Newton-type optimization algorithm. Specifically, we use TensorFlow, the machine learning package developed by Google [2], to define the structure of the neural network and the loss function that we want to optimize. TensorFlow's automatic differentiation capabilities allow us to efficiently compute the gradient and Hessian of the loss function, which are needed by the scalable numerical optimization algorithm implemented in hIPPYlib [12]. Numerical examples demonstrate the better performance of the Newton method compared to the Steepest Descent method, both in terms of number of iterations and computational time. Another important contribution of this work is the study of the spectral properties of the Hessian of the loss function. The distribution of the eigenvalues of the Hessian, in fact, provides extremely valuable information about which directions in parameter space are well informed by the data.
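A minimal numpy sketch of the inexact Newton-CG idea named in the title: the Newton system is solved only approximately by conjugate gradients using Hessian-vector products, with a forcing tolerance tied to the gradient norm. The toy objective and the Eisenstat-Walker-style tolerance are illustrative assumptions; hIPPYLearn obtains the gradient and Hessian actions from TensorFlow's automatic differentiation and the solvers in hIPPYlib.

```python
import numpy as np

# toy quadratic-plus-quartic loss with ill-conditioned curvature
A = np.diag(np.linspace(1.0, 50.0, 20))
grad = lambda x: A @ x + 4 * x**3
hess_vec = lambda x, v: A @ v + 12 * x**2 * v    # Hessian-vector product only, no full Hessian

def cg(hvp, b, tol, iters=50):
    """Solve H p = b approximately using only Hessian-vector products."""
    p, r = np.zeros_like(b), b.copy()
    d = r.copy()
    for _ in range(iters):
        Hd = hvp(d)
        alpha = (r @ r) / (d @ Hd)
        p, r_new = p + alpha * d, r - alpha * Hd
        if np.linalg.norm(r_new) < tol:          # inexact solve: stop early
            break
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    return p

x = np.ones(20)
for it in range(10):
    g = grad(x)
    tol = min(0.5, np.sqrt(np.linalg.norm(g))) * np.linalg.norm(g)  # forcing term
    x = x - cg(lambda v: hess_vec(x, v), g, tol)                    # Newton step
    print(it, np.linalg.norm(grad(x)))
```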
Item: Improving accuracy and efficiency of seismic data analysis using deep learning (2022-05-02)
Kaur, Harpreet, Ph.D.; Fomel, Sergey B.; Sen, Mrinal K.; Spikes, Kyle T.; Abma, Raymond; Biros, George
The ultimate goal of seismic data analysis is to retrieve high-resolution information about subsurface structures. It comprises steps such as data processing, model building, wave propagation, and imaging. Increasing the resolution and fidelity of the different seismic data analysis tasks eventually leads to an improved understanding of fine-scale structural features. Conventional implementations of these techniques are computationally intensive and expensive, especially with large data sets. Recent advances in neural networks have made it possible to produce reasonable results for computationally intensive and time-consuming problems. Deep neural networks are capable of extracting complex nonlinear relationships among variables and have shown efficacy compared to conventional statistical methods in different areas. A major bottleneck in seismic data analysis is the tradeoff between resolution and efficiency. I address some of these challenges by implementing neural network based frameworks. First, I implement a neural network based workflow for stable and efficient wave extrapolation. Conventionally, wave extrapolation is implemented by finite differences (FD), which have a low computational cost but may suffer from dispersion artifacts and instabilities at larger time steps. On the other hand, recursive integral time extrapolation (RITE) methods, especially low-rank extrapolation, which are mixed-domain space-wavenumber operators, are designed to make time extrapolation stable and dispersion-free in heterogeneous media for large time steps, even beyond the Nyquist limit. They have high spectral accuracy but are expensive compared to finite-difference extrapolation. The proposed framework overcomes the numerical dispersion of finite-difference wave extrapolation for larger time steps and provides stable and efficient wave extrapolation results equivalent to low-rank wave extrapolation at a significantly reduced cost. Second, I address the wave-mode separation and wave-vector decomposition problem, separating a full elastic wavefield into wavefields corresponding to the respective wave modes. Conventionally, wave-mode separation in heterogeneous anisotropic media is done by solving the Christoffel equation in all phase directions for a given set of stiffness-tensor coefficients at each spatial location of the medium, which is computationally expensive. I circumvent the need to solve the Christoffel equation at each spatial location by implementing a deep neural network based framework. The proposed approach decouples the elastic waves with high accuracy and efficiency, demonstrated using models of increasing complexity. Third, I propose a hyper-parameter optimization (HPO) workflow for a deep learning framework that simulates boundary conditions for acoustic and elastic wave propagation. The conventional low-order implementation of absorbing boundary conditions (ABCs) and perfectly matched layers (PMLs) is challenging for strongly anisotropic media. In the tilted transverse isotropic (TTI) case, instabilities may appear in layers with PMLs owing to exponentially growing modes, which eventually degrade the reverse time migration output. The proposed approach is stable and simulates the effect of higher-order absorbing boundary conditions in strongly anisotropic media, especially TTI media, and thus has great potential for application in reverse time migration. Fourth, I implement a coherent-noise attenuation framework, focused on ground-roll noise, using deep learning. Accounting for the non-stationary properties of seismic data and the associated ground-roll noise, I create training labels using the local time-frequency transform (LTF) and regularized non-stationary regression (RNR). The proposed approach automates the ground-roll attenuation process without requiring any manual parameter picking for each shot gather beyond the training data. Lastly, I address the limitations of iterative methods, as conventionally implemented, for true-amplitude imaging. I implement a workflow that corrects migration amplitudes by estimating the inverse Hessian operator weights with a neural network based framework.
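To make the wave-extrapolation tradeoff concrete, here is a sketch of the exact constant-velocity two-step extrapolation in the wavenumber domain, the kind of mixed-domain operator that low-rank (and, here, learned) extrapolators approximate when velocity varies in space. The grid sizes, velocity, and Gaussian source are arbitrary illustration values; note that the time step exceeds the usual FD stability limit yet the scheme stays stable and dispersion-free.

```python
import numpy as np

# exact constant-velocity wave extrapolation in the wavenumber domain:
#   u(t+dt) = 2 * cos(v * |k| * dt) * u(t) - u(t-dt)
nx, dx, dt, v = 512, 10.0, 0.01, 2000.0      # v*dt/dx = 2, beyond the FD CFL limit of 1
k = 2 * np.pi * np.fft.fftfreq(nx, d=dx)
phase = 2 * np.cos(v * np.abs(k) * dt)

x = np.arange(nx) * dx
u_prev = np.exp(-((x - 2560.0) / 80.0) ** 2)  # Gaussian source pulse
u_curr = u_prev.copy()

for step in range(200):                       # extrapolate to t = 2 s
    u_next = np.real(np.fft.ifft(phase * np.fft.fft(u_curr))) - u_prev
    u_prev, u_curr = u_curr, u_next

# two counter-propagating pulses, near x ~ 1440 m and x ~ 3680 m (periodic domain)
print("peak positions:", x[np.argsort(u_curr)[-2:]])
```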
To incorporate non-stationarity into the framework, I condition the input migrated image with conditioners such as the velocity model and source illumination. To correct for remnant artifacts in the deep neural network (DNN) output, I perform iterative least-squares migration using the network output as an initial model. The network output is close to the true model, and therefore a true-amplitude image with improved resolution is obtained with fewer iterations. The proposed method is robust in areas with poor illumination and can easily be generalized to more complex cases such as viscoacoustic, elastic, and others. The proposed frameworks are numerically stable with high accuracy and efficiency and are therefore desirable for different seismic data analysis tasks. I use synthetic and field data examples of varying complexity in both 2D and 3D to test the practical application and accuracy of the proposed approaches.

Item: Improving deep learning through loss-function evolution (2020-12-10)
Gonzalez, Santiago; Miikkulainen, Risto; Banzhaf, Wolfgang; Durrett, Greg; Huang, Qixing
As the complexity of neural network models has grown, it has become increasingly important to optimize their design automatically through metalearning. Methods for discovering hyperparameters, topologies, and learning rate schedules have led to significant increases in performance. This dissertation tackles a new type of metalearning: loss-function optimization. Loss functions define a model's core training objective and thus present a clear opportunity. Two techniques, GLO and TaylorGLO, were developed to tackle this metalearning problem using genetic programming and evolutionary strategies. Experiments show that neural networks trained with metalearned loss functions are more accurate, have higher data utilization, train faster, and are more robust against adversarial attacks. A theoretical framework was developed to analyze how and why different loss functions bias training toward different regions of the parameter space. Using this framework, their performance gains are found to result from a regularizing effect that is tailored to each domain. Overall, this dissertation demonstrates that new, metalearned loss functions can result in better-trained models, and it provides the next stepping stone toward fully automated machine learning.
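A toy sketch of the loss-function-evolution loop described above: an outer (1+4) evolutionary search over the parameters of a loss family, scored by training a tiny logistic model with each candidate loss and measuring held-out accuracy. The three-parameter loss family, numeric gradients, and search settings are stand-ins; GLO actually evolves loss structure with genetic programming, and TaylorGLO optimizes Taylor-parameterized losses with evolutionary strategies.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 2)); Xb = np.hstack([X, np.ones((200, 1))])
y = (X[:, 0] + X[:, 1] > 0).astype(float)
Xv = rng.standard_normal((200, 2)); Xvb = np.hstack([Xv, np.ones((200, 1))])
yv = (Xv[:, 0] + Xv[:, 1] > 0).astype(float)

sigmoid = lambda z: 1 / (1 + np.exp(-z))

def loss_fn(theta, p, t):
    # parameterized loss family; theta = (1, 1, 0) recovers plain cross-entropy
    eps = 1e-9
    return -np.mean(theta[0] * t * np.log(p + eps)
                    + theta[1] * (1 - t) * np.log(1 - p + eps)
                    + theta[2] * p * (1 - p))

def train_and_score(theta, steps=150, lr=0.5, h=1e-5):
    w = np.zeros(3)
    for _ in range(steps):                       # inner loop: train with the candidate loss
        g = np.zeros(3)
        for j in range(3):                       # tiny model, so numeric gradients suffice
            e = np.zeros(3); e[j] = h
            g[j] = (loss_fn(theta, sigmoid(Xb @ (w + e)), y)
                    - loss_fn(theta, sigmoid(Xb @ (w - e)), y)) / (2 * h)
        w -= lr * g
    return np.mean((sigmoid(Xvb @ w) > 0.5) == yv)   # outer fitness: held-out accuracy

theta = np.array([1.0, 1.0, 0.0])                # start from cross-entropy
best = train_and_score(theta)
for gen in range(10):                            # outer loop: (1+4) evolutionary search
    for cand in theta + 0.3 * rng.standard_normal((4, 3)):
        acc = train_and_score(cand)
        if acc > best:
            theta, best = cand, acc
print("evolved loss params:", np.round(theta, 2), "val acc:", best)
```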