Causal inference for investigating Parkinson’s disease pathogenesis





Zhai, Jingpeng




Randomized controlled trials have long been regarded as the standard method for establishing causal relationships. However, where such trials are impractical, observational studies that exploit natural, random variation can be combined with causal inference methods to reason about causality. Causal inference methods require expert domain knowledge to be expressed in the form of a causal model. But what happens when there is little to no prior knowledge? In a dataset with a plethora of variables, how should one identify and isolate potential treatment and outcome variables? For example, Parkinson’s disease (PD) is a disorder with diverse manifestations and multiple proposed molecular pathways, but no established etiology. Given a dataset of PD patients and healthy controls, with clinical data spanning multiple levels of biology, how does one approach causal graph construction? In this paper, we devise a scheme that uses gradient-boosted tree ensemble algorithms to systematically identify important features for use in causal graph construction, and we attempt to establish causal relationships between them based on biological hierarchy. Lastly, we find that one genotype feature of α-synuclein has a significant causal effect on PD diagnosis.


LCSH Subject Headings