# Browsing by Subject "Linear regression"

Now showing 1 - 8 of 8


## Accounting for multiple membership data in adolescent social networks: an analysis of simulated data (2016-05)
Peek, Jaclyn Kara; Beretvas, Susan Natasha; Powers, Daniel A.

Multilevel modeling allows for the modeling of nested structures, such as students nested within middle schools and middle schools nested within high schools; such hierarchies are common in social science research. Pure hierarchies may exist, where one variable is completely nested within another. Multiple membership (MM) structures occur when some lower-level units are members of more than one higher-level clustering unit (e.g., a student attends more than one high school). An extension of the conventional multilevel model, the multiple membership random effects model (MMREM), can be used to handle MM data. We compare a random effects model with and without multiple membership effects to demonstrate the possible benefit of accounting for the MM structure. We replicate an existing study on student academic outcomes (Tranmer et al., 2013), which assumes a multiple membership data structure, and add a comparison to a non-MM (i.e., single membership) model in order to assess the improvement in model fit. The original study investigated the effect of school, area, and social network membership in friendship dyads and triads on academic achievement in adolescents, with age, gender, and ethnicity as covariates. Our models retain the MM structure found in the original social network data. The original data are confidential and unavailable for use; therefore, a major component of this report is the simulation of this dataset in R. Results indicate that accounting for multiple membership does not necessarily lead to better goodness of fit as measured by the deviance information criterion (DIC): accounting for the MM data structure initially produced a worse-fitting model, while artificially inflating the fixed and random effects that generated the simulated academic performance outcome led to the opposite effect.
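The multiple-membership structure described above can be illustrated with a small sketch. All weights, effect sizes, and dimensions below are invented for the example:

```python
import numpy as np

# Hypothetical example: 4 students, 3 schools. In a multiple-membership
# (MM) structure a student may belong to more than one school, so a row
# of the membership weight matrix W can have several nonzero entries;
# each row sums to 1.
W = np.array([
    [1.0, 0.0, 0.0],   # student attends school 0 only
    [0.5, 0.5, 0.0],   # student splits time between schools 0 and 1
    [0.0, 1.0, 0.0],
    [0.0, 0.3, 0.7],
])

u = np.array([0.8, -0.2, 0.5])   # school-level random effects (simulated)
X = np.ones((4, 1))              # intercept-only fixed-effects design
beta = np.array([50.0])          # fixed intercept

# MMREM linear predictor: fixed part plus the weighted sum of the random
# effects of *all* schools a student belongs to.
eta = X @ beta + W @ u

# A single-membership model instead assigns each student to one school
# (here, the max-weight one), discarding the MM information.
primary = W.argmax(axis=1)
eta_sm = X @ beta + u[primary]
print(eta)     # [50.8  50.3  49.8  50.29]
print(eta_sm)  # [50.8  50.8  49.8  50.5 ]
```

Comparing `eta` and `eta_sm` shows how collapsing to single membership distorts the predicted outcomes for students who belong to more than one cluster.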
We conclude that the scale of the random effects is important in determining the DIC measure of fit, and we propose a full simulation study to test our original hypothesis more conclusively.

## Automatic regularization technique for the estimation of neural receptive fields (2010-05)
Park, Mijung; Vikalo, Haris; Pillow, Jonathan W.

A fundamental question about the visual system in neuroscience is how visual stimuli are functionally related to neural responses. This relationship is often explained by the notion of receptive fields: an approximately linear or quasi-linear filter that encodes high-dimensional visual stimuli into neural spikes. Traditional methods for estimating the filter do not efficiently exploit prior information about the structure of neural receptive fields. Here, we propose several approaches to designing the prior distribution over the filter, based on the neurophysiological fact that receptive fields tend to be localized both in space-time and in the spatio-temporal frequency domain. To automatically regularize the estimation of neural receptive fields, we use the evidence optimization technique: a maximum a posteriori (MAP) estimate under a prior distribution whose parameters are set by maximizing the marginal likelihood. Simulation results show that the proposed methods can estimate the receptive field using datasets that are tens to hundreds of times smaller than those required by traditional methods.

## Computational modeling of tumor cell growth as a function of nutrient dynamics guided by time-resolved microscopy (2021-12-03)
Yang, Jianchen (Ph.D. in biomedical engineering); Yankeelov, Thomas E.; Brock, Amy; Dunn, Andrew K.; Virostko, Jack

The varying and extreme nutrient conditions found in the tumor microenvironment force a reprogramming of metabolism in tumor cells. This metabolic reprogramming has been identified as a hallmark of cancer.
This dissertation focuses on the development and validation of an experimental-mathematical approach that predicts how the dynamics of glucose and lactate influence tumor metabolism and development. First, we developed a baseline model that predicts tumor cell growth as a function of glucose availability. We employed time-resolved microscopy to track the temporal change in the number of live and dead tumor cells under different initial conditions and seeding densities. A family of mathematical models describing overall tumor cell growth in response to the initial glucose level and confluence was constructed. The most parsimonious model, selected from the family using the Akaike Information Criterion, was calibrated and validated in two breast cancer cell lines (BT-474 and MDA-MB-231) and demonstrated accuracy in predicting tumor growth. Second, we developed noninvasive imaging of nutrient dynamics via stable transfection of two FRET reporters, one assaying glucose concentration and one assaying lactate concentration, into the MDA-MB-231 breast cancer cell line. The FRET ratio from both reporters was found to increase with increasing concentration of the corresponding ligand and to decrease over time for high initial concentrations of the ligand. Significant differences in the FRET ratio, corresponding to metabolic inhibition, were found when cells were treated with glucose/lactate transporter inhibitors. The FRET reporters enabled us to track intracellular glucose and lactate dynamics, providing insight into tumor metabolism and response to therapy over time. Finally, we compared mechanism-based and machine learning models for predicting tumor cell growth when an inhibitor of glucose uptake was introduced as a potential treatment. We extended the baseline model to account for glucose uptake inhibition, considering both the actual glucose level in the system and the glucose level accessible to tumor cells.
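A glucose-limited growth law of the general kind described above can be sketched as follows. This is a generic illustration, not the dissertation's calibrated model; every parameter value is invented:

```python
# Generic illustration: tumor cell number n grows at a rate that depends
# on the remaining glucose g, which is consumed as cells proliferate.
def simulate(n0=1e4, g0=4.5, r=0.03, k=1e6, c=1e-7, dt=1.0, steps=200):
    """Forward-Euler simulation of glucose-limited logistic growth.
    r: growth rate per hour, k: carrying capacity, c: glucose consumed
    per new cell; all values are invented for the sketch."""
    n, g = n0, g0
    for _ in range(steps):
        # Logistic growth scaled by a Monod-type glucose-availability term
        growth = r * n * (1 - n / k) * (g / (g + 1.0))
        n += dt * growth
        g = max(g - dt * c * growth, 0.0)  # glucose cannot go negative
    return n, g

n_final, g_final = simulate()
print(n_final, g_final)
```

Fitting a small family of such models to live/dead cell counts and scoring each fit with an information criterion is the general pattern of model selection the abstract describes.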
The random forest model provided the best prediction, while the mechanism-based model showed comparable predictive capability.

## Development of linear capacitance-resistance models for characterizing waterflooded reservoirs (2011-12)
Kim, Jong Suk; Edgar, Thomas F.; Lake, Larry W.

The capacitance-resistance model (CRM) has been continuously improved and tested on both synthetic and real fields. For a large waterflood, with hundreds of injectors and producers present in a reservoir, tens of thousands of model parameters (gains, time constants, and productivity indices) must be determined to completely define the CRM. In this case, obtaining a unique solution when history-matching large reservoirs by nonlinear regression is difficult; moreover, the approach is likely to produce parameters that are statistically insignificant. The nonlinear nature of the CRM also makes it difficult to quantify the uncertainty in the model parameters. This work develops the analytical solutions of two linear reservoir models: the linearly transformed CRM whose control volume is the drainage volume around each producer (ltCRMP), and the integrated capacitance-resistance model (ICRM). Both models are derived from the governing differential equation of the producer-based representation of the CRM (CRMP), which represents an in-situ material balance over the effective pore volume of a producer. The proposed methods use constrained linear multivariate regression (LMR) to provide information about preferential permeability trends and fractures in a reservoir. The two models' capabilities are validated with simulated data in several synthetic case studies. The ltCRMP and ICRM have the following advantages over the nonlinear waterflood model (CRMP): (1) convex objective functions, (2) no need for an iterative solver when constraints are ignored, and (3) faster computation in optimization.
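The central point, that a linear-in-parameters model turns history matching into a convex least-squares problem with a unique closed-form solution, can be sketched with illustrative data. The matrix below is random and stands in for the actual ltCRMP/ICRM model matrices, which are built from injection rates and time terms:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear-in-parameters model y = A @ theta + noise.
n_data, n_params = 50, 4            # more data points than unknowns
A = rng.normal(size=(n_data, n_params))
theta_true = np.array([0.4, 0.3, 0.2, 0.1])
y = A @ theta_true + 0.01 * rng.normal(size=n_data)

# Unconstrained linear least squares: a convex problem, so no iterative
# solver is needed -- the normal equations give the unique minimizer.
theta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(theta_hat)   # close to theta_true
```

Because the objective is convex, the fitted parameters are reproducible regardless of the starting point, unlike the nonlinear CRMP history match.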
In both methods, a unique solution can always be obtained, regardless of the number of parameters, as long as the number of data points is greater than the number of unknowns. Methods for establishing confidence limits on CRMP gains and ICRM parameters are also demonstrated. This research further presents a method that uses the ICRM to estimate the gains between newly introduced injectors and existing producers in a homogeneous reservoir without additional simulations or regression on newly simulated data. This procedure can guide geoscientists in deciding where to drill new injectors to increase future oil recovery, and it provides rapid answers without running a reservoir simulation for each scenario.

## Essays on data-driven optimization (2019-06-20)
Zhao, Long, Ph.D.; Muthuraman, Kumar; Chakrabarti, Deepayan; Tompaidis, Efstathios; Caramanis, Constantine

The estimation of a data matrix contains two parts: the well estimated and the poorly estimated. The latter is usually thrown away because its estimates are unreliable. As argued in this work, discarding it is a mistake, because the poorly estimated part is orthogonal to the well estimated part. I show how to use this orthogonality information via robust optimization and provide applications in portfolio optimization, least-squares regression, and dimension reduction. Across a large number of experiments, utilizing the orthogonality information consistently improves performance.

## Linear estimation for data with error ellipses (2012-05)
Amen, Sally Kathleen; Powers, Daniel A.; Robinson, Edward L.

When scientists collect data to be analyzed, regardless of what quantities are being measured, there are inevitably errors in the measurements.
In cases where two independent variables are measured with errors, many existing techniques can produce an estimated least-squares linear fit to the data that takes into account the size of the errors in both variables. Yet some experiments yield data that not only contain errors in both variables but also a non-zero covariance between the errors. In such situations, the measurements carry error ellipses whose tilts are specified by the covariance terms. Following an approach suggested by Dr. Edward Robinson, Professor of Astronomy at The University of Texas at Austin, this report describes a methodology that estimates linear regression parameters, together with an estimated covariance matrix, for a dataset with tilted error ellipses. An appendix contains the R code for a program that produces these estimates according to the methodology. The report also describes the results of running the program on a dataset of measurements of the surface brightness and Sérsic index of galaxies in the Virgo cluster.

## On structured and distributed learning (2017-12)
Tandon, Rashish; Dimakis, Alexandros G.; Ravikumar, Pradeep; Price, Eric; Klivans, Adam

With the growth in the size and complexity of data, methods exploiting low-dimensional structure, as well as distributed methods, have been playing an ever more important role in machine learning. These approaches offer a natural way to alleviate the computational burden, albeit typically at a statistical trade-off. In this thesis, we show that careful use of the structure of a problem, or of the bottlenecks of a distributed system, can also provide a statistical advantage. We do this through the following three problems:

1. Learning graphical models with a few hubs. Graphical models are a popular tool for representing multivariate distributions. The task of learning a graphical model entails estimating the graph of conditional dependencies between variables.
Existing approaches to learning graphical models require a number of samples polynomial in the maximum degree of the true graph, which can be large even if there are only a few high-degree nodes. In this part of the thesis, we propose an estimator that detects and then ignores high-degree nodes. We show that such an estimator has a lower sample complexity for learning the overall graph when the true graph has a few high-degree nodes or "hubs", e.g., scale-free graphs.

2. Kernel ridge regression via partitioning. Kernel methods find wide and varied applicability in machine learning. However, solving the kernel ridge regression (KRR) optimization requires computation that is cubic in the number of samples. In this work, we consider a divide-and-conquer approach to the KRR problem. The division step splits the samples based on a partitioning of the input space, and the conquering step simply uses the local KRR estimate in each partition. We show that this can not only lower the computational cost of solving the KRR problem but also improve accuracy over both a single KRR estimate and estimates based on random data partitioning.

3. Stragglers in distributed synchronous gradient descent. Synchronous methods in machine learning have many desirable properties, but they are only as fast as the slowest machine in a distributed system. The straggler (slow machine) problem is a critical bottleneck for such methods. In this part of our work, we propose a novel framework based on coding theory for mitigating stragglers in distributed synchronous gradient descent (and its variants). Our approach views stragglers as errors/erasures.
By carefully replicating data blocks and coding across gradients, we show how to provide tolerance to failures and stragglers without incurring any communication overhead.

## Power-aware processor system design (2020-05)
Kalyanam, Vijay Kiran; Abraham, Jacob A.; Orshansky, Michael; Pan, David; Touba, Nur; Tupuri, Raghuram

With everyday advances in technology and low-cost economics, processor systems are moving toward split-grid, shared power delivery networks (PDNs) while providing increased functionality and higher performance, resulting in increased power consumption. "Split grid" refers to dividing the power grid resources among various homogeneous and heterogeneous functional modules and processors; when the PDN is common across multiple processors and function blocks, it is called a shared PDN. To keep power under control on a split-grid shared PDN, the processor system must operate while various hardware modules interact with each other and the supply voltage (V_DD) and clock frequency (F_CLK) are scaled. Software- or hardware-assisted power collapse and low-power retention modes can be engaged automatically. The processor system should also operate at maximum performance under power constraints while consuming the full thermal design power (TDP). It should violate neither board and card current limits nor the power management integrated circuit (PMIC) limits or its slew-rate requirements for current draw on the shared PDN, and it is expected to operate within thermal limits below a given operating temperature. The processor system is also required to detect and mitigate current violations within microseconds and temperature violations within milliseconds. Finally, the processor system is expected to be robust and to tolerate voltage droops, a requirement made more important by its being on a shared PDN.
Because the PDN is shared, the voltage droop mitigation scheme must be fast and must suppress V_DD droop propagation at the source while introducing only negligible performance penalties. Without a V_DD droop solution in place, the entire V_DD of the shared PDN is forced to a higher voltage, increasing overall system power; this can reduce the days of use (DoU) of battery-operated systems and affect the reliability and cooling of wired systems. A multi-threaded processor system is expected to monitor current, power, and voltage violations and to react quickly without affecting the performance of its hardware threads, maintaining quality of service (QoS). Early high-level power estimates are a necessity for projecting how much power a future processor system will consume. These power projections are used to plan software use cases and to reassign the power domains of processors and function blocks belonging to the shared PDN. They also help in redesigning boards and power cards, re-implementing the PDN, changing the PMIC, and planning additional mitigation schemes for power, current, voltage, and temperature violations when existing solutions are insufficient. The split-grid shared PDN implemented in a system-on-chip (SoC) is driven by low-cost electronics and requires multiple voltage rails for better energy efficiency. To support this, voltage levels and power states must be incorporated into the processor's behavioral register transfer level (RTL) model, and low-power verification is a must in a split-grid PDN. To facilitate this, the RTL is annotated with voltage supplies and isolation circuits that engage and protect during power collapse scenarios across the various voltage domains. The power-aware RTL design is verified, and low-power circuit and RTL bugs are identified and corrected prior to tape-out.
The mandatory features to limit current, power, voltage, and temperature in these high-performance, power-hungry processor systems create a need for high-level power projections that account for the various split-grid PDNs supplying V_DD to the processor, the interface bus, various function blocks, and co-processors. To address this, a power prediction solution is presented that has an average power error of 8% and tracks instantaneous power with reasonable accuracy for unseen software application traces. Computing power with the generated prediction model is 100,000X faster and uses 100X less memory than a commercial electronic design automation (EDA) RTL power tool. The solution is also used to build a digital power meter (DPM) in hardware for real-time power estimates while the processor is operational. These high-level power estimates project the potential peak currents in these processor systems, which motivated new tests, validated on silicon, that functionally stress the split-grid shared PDN under extreme voltage droop and sustained high-current scenarios. For this purpose, functional test sequences are created for high-power and voltage stress testing of multi-threaded processors. The PDN is a complex system and needs different functional test sequences that generate various kinds of high- and low-power instruction packets to stress it. These voltage droop stress tests affect V_MIN margins in various voltage and frequency modes of operation in a commercial multi-threaded processor, and the results underscore the need for voltage mitigation solutions. A processor system operating on a split-grid shared PDN can have its V_MIN increased by voltage stress tests or by a power-virus software application.
The shared PDN imposes a requirement to mitigate voltage noise at the source and to avoid any increase of the shared PDN V_DD. This necessitates a proactive system that can mitigate voltage droop before it occurs while lowering the processor's minimum operating voltage (V_MIN) to help reduce system power. To mitigate voltage droops, a proactive clock gating system (PCGS) is implemented with a voltage clock gate (VCG) circuit that uses a digital power meter (DPM) and a model of the PDN to predict voltage droop before it occurs. Silicon results show that PCGS achieves 10% higher clock frequency (F_CLK) and 5% lower supply voltage (V_DD) in a 7nm processor. Questions arise about the effectiveness of PCGS relative to a reactive voltage droop mitigation scheme in the context of a shared PDN, motivating an analysis of PCGS and a comparison against a reactive scheme. This work shows the importance of voltage droop mitigation reaction time for a split-grid shared PDN and highlights the ability of PCGS to provide a better V_MIN for the entire split-grid shared PDN. Silicon results from power stress tests show that a high-power processor system can exceed board or power supply card current capacity and violate thermal limits. This requires a limiting system that can adapt processor performance. The limiting system is expected to meet a stringent system latency of 1 µs for sustained peak-current violations, to react on the order of milliseconds for thermal mitigation, and to maintain the desired quality of service (QoS) of the multi-threaded processor. This leads to the implementation of a current- and temperature-limiting response circuit in a 7nm commercial processor.
The randomized pulse modulation (RPM) circuit adapts processor performance and reduces current violations in the system within 1 µs while maintaining thread fairness, with a 0.4% performance resolution across a wide range of operation from 100% down to 0.4%. Hard requirements from SoC software and hardware require the processor systems sharing the split-grid PDN to stay within the TDP and power budgets. The power consumed by new threads (processors) with added functionality can be much higher than that of previous-generation processors. The threads (processors) operate cohesively in a multi-threaded processor system, and although there is a large difference in the magnitude of power profiles across threads (processors), the overall performance of the multi-threaded processor is not expected to be compromised. This creates a need for a power-limiting system that specifically slows down the high-power threads (processors) to meet power budgets without affecting the performance of low-power threads. For this reason, a thread-specific multi-thread power limiting (MTPL) mechanism is designed that monitors processor power consumption using a per-thread DPM (PTDPM). Implemented in a 7nm commercial processor, silicon results demonstrate that the thread-specific MTPL does not affect the performance of low-power threads during power limiting until the current (power) is limited to very low values. For high-power threads, and during more aggressive current (power) limiting, the thread-specific MTPL shows performance similar to a conventional global limiting mechanism. Thus, the thread-specific MTPL enables the multi-threaded processor system to operate at higher overall performance than a conventional global mechanism across most of the power budget range.
For the same power budget, processor performance can be up to 25% higher using the thread-specific MTPL than using a global power limiting scheme. In summary, this dissertation presents design-for-power concepts for a processor system on a split-grid shared PDN through solutions that address the challenges of high-power processors and help alleviate potential problems. These solutions range from embedding power intent in the RTL, to incorporating voltage droop prediction through power usage estimation, to maintaining quality of service within a stringent system latency, to slowing down specific high-power threads of a multi-threaded processor. All of these methods can work together to incorporate power awareness into processor systems, making processors energy efficient and able to operate reliably within the TDP.
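The thread-specific limiting idea can be illustrated with a toy sketch. All thread names, power numbers, and the throttling rule below are invented; the actual MTPL operates in hardware on per-thread DPM readings:

```python
# Toy illustration: under a power budget, throttle only the threads whose
# measured power is high, rather than scaling every thread globally.
def thread_specific_limit(thread_power, budget):
    """Return per-thread duty cycles in [0, 1] that meet the budget by
    throttling the highest-power threads first (values are hypothetical)."""
    duty = {t: 1.0 for t in thread_power}
    total = sum(thread_power.values())
    # Visit threads from highest to lowest power draw.
    for t in sorted(thread_power, key=thread_power.get, reverse=True):
        if total <= budget:
            break
        excess = total - budget
        cut = min(excess, thread_power[t] * 0.9)  # keep >= 10% duty per thread
        duty[t] = (thread_power[t] - cut) / thread_power[t]
        total -= cut
    return duty

power = {"t0": 5.0, "t1": 1.0, "t2": 0.8}  # watts, per-thread readings
print(thread_specific_limit(power, budget=4.0))
# Only the high-power thread t0 is throttled; t1 and t2 keep full duty.
```

A global scheme would instead scale all three threads by the same factor (4.0/6.8 here), needlessly slowing the low-power threads, which is the behavior the thread-specific MTPL avoids.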