KKBox subscription prediction : an application of machine learning methods

dc.contributor.advisor Zhou, Mingyuan (Assistant professor)
dc.creator Zheng, Hanyue
dc.date.accessioned 2018-08-13T20:40:24Z
dc.date.available 2018-08-13T20:40:24Z
dc.date.created 2018-05
dc.date.issued 2018-05
dc.date.submitted May 2018
dc.date.updated 2018-08-13T20:40:24Z
dc.description.abstract This report uses datasets from a Kaggle competition that aims to develop machine learning models to predict whether users of a music app called KKBox will renew their membership after it expires. Four machine learning classification models were built: logistic regression, random forest, Naïve Bayes, and gradient boosting. Exploratory data analysis was performed to understand the data distribution and the relationships between features. For models that cannot handle missing data and multicollinearity, data imputation and principal component analysis were performed. The results show that the variable importance derived from the models differs considerably, which suggests caution when selecting models. The random forest model achieved the highest AUC (0.9727), followed by XGBoost (AUC = 0.0921), logistic regression (AUC = 0.8500), and Naïve Bayes (AUC = 0.7962). However, it is unrealistic to judge model performance without considering the real business case. The results of this report are intended as guidance for further business decision making.
dc.description.department Statistics
dc.format.mimetype application/pdf
dc.identifier doi:10.15781/T2VD6PP7V
dc.identifier.uri http://hdl.handle.net/2152/67638
dc.language.iso en
dc.subject Machine learning
dc.subject Classification
dc.subject Machine learning models
dc.subject Machine learning model development
dc.subject Machine learning classification models
dc.subject Machine learning model performance
dc.title KKBox subscription prediction : an application of machine learning methods
dc.type Thesis
dc.type.material text
thesis.degree.department Statistics
thesis.degree.discipline Statistics
thesis.degree.grantor The University of Texas at Austin
thesis.degree.level Masters
thesis.degree.name Master of Science in Statistics

Access full-text files

Original bundle

Name:
ZHENG-MASTERSREPORT-2018.pdf
Size:
1.71 MB
Format:
Adobe Portable Document Format
