Machine learning algorithms in political research
In recent years, political science has witnessed an explosion of data, and political scientists have begun turning to machine learning methods to obtain reliable and scalable measurements from these large datasets. Building on the emerging literature on machine learning in political science, I offer four major lessons for students and scholars who wish to make the most of these methods: the advantage of treating machine learning as a process, the value of combining text as data with standard data practices, the strength of pooling supervised and unsupervised learning, and the importance of understanding a model’s strengths and limits. In two empirical chapters, I trace the machine learning process through two case studies, reporting actual outcomes for two datasets widely used in the discipline. The first centers on a model for identifying agency creation in historical data on congressional hearings. The second tackles a multiclass classification problem: predicting which of 20 major policy topics (and over 220 minor topics) a congressional bill addresses. I conclude with a look at the future of machine learning in the discipline as we shift from a first wave of literature that introduced machine learning to a second wave that applies it in actual research on political data and confronts the challenges these data present.