Catweetegories : machine learning to organize your Twitter stream
MetadataShow full item record
We want to create a web service that will help users better organize the flood of tweets they receive every day by using machine learning. This was done by experimenting with ways to manually classify training sets of tweets such as using Amazon’s Mechanical Turk and crawling the Internet for large quantities of tweets. Once we acquired good training data, we began building a classifier. We tried NLTK and Stanford NLP as libraries for creating a classifier, and we ultimately created a classifier that is 87.5% accurate. We then built a web service to expose this classifier and to allow any user on the Internet to organize their tweets. We built our web service by using many open source tools, and we discuss how we integrated these tools to create a production quality web service. We run our web service in the Amazon cloud, and we review the costs associated with running in Amazon. Finally we review the lessons we learned and share our thoughts on further work we would like to do in the future.