A hybrid reduced approach to handle missing values in type 2 diabetes prediction

dc.contributor.advisor: Saar-Tsechansky, Maytal
dc.contributor.committeeMember: Gawande, Kishore
dc.creator: You, Xinqi
dc.date.accessioned: 2019-08-21T22:52:38Z
dc.date.available: 2019-08-21T22:52:38Z
dc.date.created: 2016-05
dc.date.issued: 2016-05-06
dc.date.submitted: May 2016
dc.date.updated: 2019-08-21T22:52:38Z
dc.description.abstract: Diabetes is gaining attention among medical institutions and health care organizations as its prevalence increases around the world. In the United States, 29.1 million people, or 9.3% of the U.S. population, are diagnosed with diabetes. About 86 million people are categorized as prediabetic, and 15-30% of them will develop diabetes within 5 years. To tackle this challenge, the National Diabetes Prevention Program (DPP) was introduced in 2002; it reduces the risk of diabetes by 58% through a lifestyle change program. To help select a better group of prediabetic individuals for intervention and maximize the cost-effectiveness of the program, we propose a Hybrid Reduced approach to handle missing values when predicting type 2 diabetes. This approach deals with 4 challenges in electronic medical records: missing values, data missing not at random, class imbalance, and prediction over a longer (2-year) window. We select three ensemble predictive models (AdaBoost.M1, Gradient Boosting, and Extremely Randomized Trees) and apply this approach across 7 years to assess its robustness. The Hybrid Reduced approach includes two sub-approaches: Hybrid Reduced Organic and Hybrid Reduced Imputed. Throughout the experiments, Hybrid Reduced Imputed is the best performer and achieves a 5-7% improvement in precision. By simply using this approach, we could save $278 million in healthcare costs and improve people's health.
dc.description.department: Statistics
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2152/75640
dc.identifier.uri: http://dx.doi.org/10.26153/tsw/2744
dc.language.iso: en
dc.subject: Statistics
dc.subject: Missing value
dc.subject: Data mining
dc.subject: Machine learning
dc.subject: Predictive modeling
dc.subject: Statistical modeling
dc.title: A hybrid reduced approach to handle missing values in type 2 diabetes prediction
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Statistics
thesis.degree.discipline: Statistics
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.level: Masters
thesis.degree.name: Master of Science in Statistics

Access full-text files

Original bundle

Name: YOU-MASTERSREPORT-2016.pdf
Size: 741.36 KB
Format: Adobe Portable Document Format
