Bayesian methods for complex data structures, with applications to precision medicine in women’s healthcare




Starling, Jennifer Elizabeth




This thesis explores novel Bayesian nonparametric regression techniques for data with complex structures, developed in response to challenges in women's health and obstetrics. Nearly all pregnancy-related research shares a key statistical issue: most outcomes vary smoothly with gestational age. Models that reflect this smoothness aid interpretability by aligning modeling choices with clinical knowledge. From a statistical perspective, smoothing can reduce variance without inflating bias. Existing models tend either to smooth over all covariates or to require parametric forms and interactions to be specified in advance, based on a priori knowledge of maternal and fetal covariates. The current literature does not provide an especially nuanced characterization of these functional forms.

Chapter 1 frames these issues in the context of current statistical modeling practices in women's health and obstetrics. Chapter 2 introduces a model for estimating patient-specific stillbirth risk over the course of gestation, with the aim of helping obstetricians prevent fetal mortality. In this chapter, we introduce BART with Targeted Smoothing (tsBART), a nonparametric regression model that extends the Bayesian Additive Regression Trees (BART) prior to induce smoothness over a single target covariate t. TsBART does so by parameterizing each tree's terminal nodes with smooth functions of t, rather than with independent scalars. Both BART and tsBART capture complex nonlinear relationships and interactions among the predictors, but tsBART guarantees that the response surface is smooth in the target covariate. This improves interpretability and helps regularize the estimate. After introducing and benchmarking the tsBART model, we apply it to pregnancy outcomes data from the National Center for Health Statistics. Our aim is to provide patient-specific estimates of stillbirth risk across gestational age (t), based on maternal and fetal risk factors (x). The results of our analysis show the clear superiority of the tsBART model for quantifying stillbirth risk, thereby providing patients and doctors with better information for managing the risk of fetal mortality.
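To make the idea of "terminal nodes parameterized with smooth functions of t" concrete, the sketch below builds a toy sum-of-trees ensemble in which each leaf stores an entire curve over a grid of t values instead of a single number. The abstract does not state which prior tsBART places on these leaf functions. Purely for illustration, this sketch assumes a zero-mean Gaussian-process prior with a squared-exponential kernel. The names `SmoothLeafStump` and `ensemble_curve`, the depth-1 trees, the length scale, and the gestational-week grid are all hypothetical. Real BART-style models grow deeper trees and update them by MCMC, which this sketch does not attempt.

```python
import numpy as np

def sq_exp_cov(t, length_scale, amplitude):
    """Squared-exponential covariance over a grid of target-covariate values t."""
    diff = t[:, None] - t[None, :]
    return amplitude**2 * np.exp(-0.5 * (diff / length_scale) ** 2)

def draw_smooth_leaf(t_grid, rng, length_scale=4.0, amplitude=0.3):
    """One leaf 'parameter' is a whole curve mu(t), drawn from a zero-mean GP (assumed prior)."""
    cov = sq_exp_cov(t_grid, length_scale, amplitude)
    # Small diagonal jitter keeps the Cholesky factorisation numerically stable.
    chol = np.linalg.cholesky(cov + 1e-6 * np.eye(len(t_grid)))
    return chol @ rng.standard_normal(len(t_grid))

class SmoothLeafStump:
    """Hypothetical depth-1 tree: splits on covariate x[var] only; t never enters the split."""

    def __init__(self, var, cut, t_grid, rng):
        self.var, self.cut = var, cut
        self.left = draw_smooth_leaf(t_grid, rng)
        self.right = draw_smooth_leaf(t_grid, rng)

    def curve(self, x):
        # Route x to a leaf and return that leaf's entire curve over t_grid.
        return self.left if x[self.var] <= self.cut else self.right

rng = np.random.default_rng(0)
t_grid = np.arange(24.0, 43.0)  # illustrative grid: gestational weeks 24..42
forest = [SmoothLeafStump(var=j % 2, cut=0.5, t_grid=t_grid, rng=rng) for j in range(50)]

def ensemble_curve(x):
    """Sum of smooth leaf curves across trees: smooth in t, piecewise in x."""
    return np.sum([tree.curve(x) for tree in forest], axis=0)

risk_curve = ensemble_curve(np.array([0.3, 0.8]))  # one patient's curve over t_grid
```

A sum of smooth curves is itself smooth, so smoothness in t holds for every patient. Patients who fall in different leaves can still receive curves of different shapes, which is how interactions between x and t are retained.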

Chapter 3 extends these ideas to the causal inference setting to analyze a new clinical protocol for early medical abortion. We introduce Targeted Smooth Bayesian Causal Forests (tsBCF), a nonparametric Bayesian approach for estimating heterogeneous treatment effects that vary smoothly over a single covariate in observational data. The tsBCF method also induces smoothness by parameterizing terminal tree nodes with smooth functions. It also allows the treatment effects to be regularized separately from the prognostic effects of control covariates. Smoothing parameters for prognostic and treatment effects can be chosen to reflect prior knowledge or tuned in a data-dependent way. Our aim is to assess the relative effectiveness of simultaneous versus interval administration of mifepristone and misoprostol over the first nine weeks of gestation. The model reflects our expectation that the relative effectiveness varies smoothly over gestation, but not necessarily over other covariates. We demonstrate the performance of the tsBCF method through benchmarking experiments.
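The separation between prognostic and treatment effects can be illustrated with the additive decomposition used by Bayesian causal forests, E[y | x, t, z] = mu(x, t) + tau(x, t) z. Here mu is the prognostic function, tau is the treatment effect, and z is the binary treatment indicator. This is a minimal generative sketch, not the tsBCF fitting procedure. It assumes a continuous outcome and Gaussian-process curves over t, with a longer length scale and smaller amplitude for tau than for mu to encode "smoother, smaller treatment effects". It also assigns treatment at random purely for brevity, unlike the observational setting the chapter addresses. The settings `MU_SETTINGS` and `TAU_SETTINGS`, the single binary covariate g, and the week grid are hypothetical.

```python
import numpy as np

def sq_exp_cov(t, length_scale, amplitude):
    diff = t[:, None] - t[None, :]
    return amplitude**2 * np.exp(-0.5 * (diff / length_scale) ** 2)

def smooth_curve(t_grid, length_scale, amplitude, rng):
    """Draw one smooth curve over t_grid from a zero-mean GP (illustrative prior)."""
    cov = sq_exp_cov(t_grid, length_scale, amplitude) + 1e-6 * np.eye(len(t_grid))
    return np.linalg.cholesky(cov) @ rng.standard_normal(len(t_grid))

rng = np.random.default_rng(1)
t_grid = np.arange(1.0, 10.0)  # illustrative: gestational weeks 1..9

# Separate (hypothetical) smoothness settings: tau is smoother and smaller than mu.
MU_SETTINGS = dict(length_scale=2.0, amplitude=1.0)
TAU_SETTINGS = dict(length_scale=6.0, amplitude=0.3)

# One prognostic curve mu_g(t) and one treatment-effect curve tau_g(t) per level g.
mu = {g: smooth_curve(t_grid, rng=rng, **MU_SETTINGS) for g in (0, 1)}
tau = {g: smooth_curve(t_grid, rng=rng, **TAU_SETTINGS) for g in (0, 1)}

def mean_outcome(g, t_index, z):
    """E[y | x, t, z] = mu(x, t) + tau(x, t) * z under the additive decomposition."""
    return mu[g][t_index] + tau[g][t_index] * z

def simulate(n, rng, noise_sd=0.5):
    """Draw n synthetic records; treatment z is randomized here only for simplicity."""
    g = rng.integers(0, 2, size=n)
    t_idx = rng.integers(0, len(t_grid), size=n)
    z = rng.integers(0, 2, size=n)
    y = np.array([mean_outcome(gi, ti, zi) for gi, ti, zi in zip(g, t_idx, z)])
    return g, t_idx, z, y + rng.normal(0.0, noise_sd, size=n)
```

Because tau has its own prior settings, the model can shrink the treatment-effect surface toward simple, slowly varying shapes without forcing the same restriction on the prognostic surface.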

In Chapter 4, we aim to characterize the relationship between birth weight and maternal pre-eclampsia across gestation at a large maternity hospital in urban Uganda. Key scientific questions we investigate include: 1) how pre-eclampsia compares to other maternal-fetal covariates as a predictor of birth weight; and 2) whether the impact of pre-eclampsia on birth weight varies across gestation. We propose a nonparametric regression model called Projective Smooth BART (psBART), which addresses several key statistical challenges. First, our model correctly encodes the prior medical knowledge that birth weight should vary smoothly and monotonically with gestational age. Second, it avoids assumptions about functional forms and about how birth weight varies with other covariates. Finally, psBART accounts for the fact that a high proportion (83%) of birth weights in our dataset are rounded to the nearest 100 grams. Such extreme data coarsening is rare in maternity hospitals in high-resource settings but common in datasets collected in low- and middle-income countries (LMICs). It introduces a substantial extra layer of uncertainty into the problem and is a major reason why we adopt a Bayesian approach. The results of our analysis show that pre-eclampsia is a dominant predictor of birth weight in this urban Ugandan setting and is therefore an important risk factor for perinatal mortality.
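A common way to account for rounding is to treat each rounded birth weight as interval-censored. If the record reads r grams, the true weight lies somewhere in [r - 50, r + 50), and that observation contributes a probability rather than a density to the likelihood. The abstract does not state psBART's exact likelihood, so this is only a sketch under stated assumptions. It assumes a normal latent weight with model mean and standard deviation. It also uses a simplifying rule that any value that is an exact multiple of 100 g was rounded, which would misclassify a genuinely exact 3000 g measurement. The function `birthweight_loglik` and this rounding rule are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_logpdf(y, mean, sd):
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))

def birthweight_loglik(recorded, mean, sd, unit=100.0):
    """Log-likelihood of one recorded birth weight (grams) given model mean and sd.

    Assumption (illustrative): values that are exact multiples of `unit` are treated
    as rounded, so the true weight lies in [recorded - unit/2, recorded + unit/2).
    """
    if recorded % unit == 0:
        half = unit / 2.0
        prob = normal_cdf((recorded + half - mean) / sd) - normal_cdf((recorded - half - mean) / sd)
        return math.log(max(prob, 1e-300))  # guard against underflow far in the tails
    return normal_logpdf(recorded, mean, sd)

# Usage: a rounded record contributes an interval probability, an unrounded one a density.
ll_rounded = birthweight_loglik(3000.0, mean=3000.0, sd=400.0)
ll_exact = birthweight_loglik(3042.0, mean=3000.0, sd=400.0)
```

In an MCMC fit, the same coarsening can equivalently be handled by imputing each rounded observation's latent true weight from a normal distribution truncated to its rounding interval at every iteration. The interval probability above is what that data-augmentation step integrates out.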

Chapter 5 summarizes our contributions and describes directions for future research.


