KKBox subscription prediction : an application of machine learning methods
This report uses datasets from a Kaggle competition whose goal is to develop machine learning models that predict whether users of the music app KKBox will renew their membership after it expires. Four machine learning classification models were built: logistic regression, random forest, Naïve Bayes, and gradient boosting. Exploratory data analysis was performed to understand the data distributions and the relationships between features. For models that cannot handle missing data or multicollinearity, data imputation and principal component analysis were performed. The results show that the variable importances derived from the models differ considerably, which suggests that models should be selected with caution. The random forest model achieved the highest AUC (0.9727), followed by XGBoost (AUC = 0.0921), logistic regression (AUC = 0.8500), and Naïve Bayes (AUC = 0.7962). However, it is unrealistic to judge model performance without considering the real business case. The results of this report serve as guidance for further business decision making.
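The workflow summarized above (imputation and principal component analysis for models that need them, then comparison by AUC) can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data from `make_classification` in place of the KKBox data; the function name `compare_auc`, the median imputation strategy, the 95% variance threshold, and all hyperparameters are illustrative assumptions, not the report's actual settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_auc(seed=0):
    """Fit two classifiers on synthetic data and return their test AUCs."""
    # Synthetic stand-in for the churn data (assumption): redundant
    # features introduce multicollinearity on purpose.
    X, y = make_classification(
        n_samples=2000, n_features=20, n_informative=6,
        n_redundant=8, random_state=seed,
    )
    # Knock out about 5% of the entries to simulate missing values.
    rng = np.random.default_rng(seed)
    X[rng.random(X.shape) < 0.05] = np.nan

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed,
    )

    models = {
        # Logistic regression: impute, scale, then decorrelate with PCA,
        # keeping enough components to explain 95% of the variance.
        "logistic_regression": make_pipeline(
            SimpleImputer(strategy="median"),
            StandardScaler(),
            PCA(n_components=0.95),
            LogisticRegression(max_iter=1000),
        ),
        # Random forest: imputation only; PCA is skipped here.
        "random_forest": make_pipeline(
            SimpleImputer(strategy="median"),
            RandomForestClassifier(n_estimators=200, random_state=seed),
        ),
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # AUC is computed from the predicted probability of the positive class.
        scores[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return scores


if __name__ == "__main__":
    for name, auc in compare_auc().items():
        print(f"{name}: AUC = {auc:.4f}")
```

Wrapping preprocessing and the classifier in one pipeline means the imputer and PCA are fitted on the training split only, which avoids leaking test information into the AUC estimate.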