Clinically interpretable models for healthcare data
The increasing availability of electronic health records (EHRs) has spurred the adoption of data-driven approaches to provide additional insights for diagnoses, prognoses, and cost-effective patient treatment and management. The records are composed of a diverse array of data that includes both structured information (e.g., diagnoses, medications, and lab results) and unstructured clinical narrative notes (e.g., physicians' observations and progress notes). Thus, EHRs are a rich source of patient information. However, several formidable challenges have so far limited the utility of EHRs for clinical research. These include data quality; high-dimensional, heterogeneous information from various sources; privacy; and interoperability across institutions. Further hampering the acceptance of data-driven models is the lack of interpretability of their results. Physicians are accustomed to reasoning based on concise clinical concepts (or phenotypes) rather than directly on high-dimensional EHR data. Unfortunately, these records do not readily map to simple phenotypes, let alone more sophisticated and multifaceted ones. This dissertation investigates the development of clinically interpretable models for EHR data using dimensionality reduction techniques. We posit that clinical concepts are representations in lower-dimensional latent spaces. Yet standard dimensionality reduction techniques alone are insufficient to derive concise and relevant medical concepts from EHR data. We explore two approaches: (1) state space models to dynamically track a patient's cardiac arrest risk, and (2) non-negative matrix and tensor factorization models to generate concise and clinically relevant phenotypes. Our approaches yield clinically interpretable models with minimal human intervention and provide a powerful, data-driven framework for transforming high-dimensional EHR data into medical concepts.
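To illustrate the idea behind the second approach, the following is a minimal sketch of plain non-negative matrix factorization. It is not the dissertation's actual model. It assumes a tiny, made-up patient-by-feature count matrix (the feature names, values, and the rank of 2 are all hypothetical) and uses the standard Lee-Seung multiplicative updates for the squared-error objective. Each row of `H` can be read as a candidate phenotype (a non-negative weighting over features), and each row of `W` as a patient's membership in those phenotypes.

```python
import random

# Hypothetical feature names and counts, invented purely for illustration.
FEATURES = ["diabetes_dx", "metformin", "hba1c_high",
            "hypertension_dx", "lisinopril", "bp_high"]
X = [
    [3, 2, 4, 0, 0, 0],
    [2, 3, 3, 0, 0, 0],
    [0, 0, 0, 4, 3, 2],
    [0, 0, 0, 2, 2, 3],
    [2, 2, 3, 3, 2, 2],
    [1, 1, 2, 0, 0, 0],
]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(X, rank, iters=500, seed=0, eps=1e-9):
    """Factor X ~ W H with W, H >= 0 via multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Strictly positive random initialization keeps the updates well defined.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), elementwise
        Wt = transpose(W)
        num = matmul(Wt, X)
        den = matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num = matmul(X, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H

def frobenius_error(X, W, H):
    """Frobenius norm of the reconstruction residual X - W H."""
    R = matmul(W, H)
    return sum((x - r) ** 2
               for x_row, r_row in zip(X, R)
               for x, r in zip(x_row, r_row)) ** 0.5

W, H = nmf(X, rank=2)

# List the three highest-weighted features of each candidate phenotype.
for k, weights in enumerate(H):
    ranked = sorted(zip(weights, FEATURES), reverse=True)
    print(f"phenotype {k}:", [name for _, name in ranked[:3]])
print("reconstruction error:", round(frobenius_error(X, W, H), 3))
```

The non-negativity constraint is what makes the factors readable as additive groupings of features. An unconstrained reduction such as PCA can mix positive and negative loadings, which are harder to interpret as clinical concepts.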