Browsing by Subject "Text classification"
Now showing 1 - 3 of 3
Item: Cost-effective learning for classifying human values (iSchools, 2020-03-23)
Ishita, Emi; Fukuda, Satoshi; Oga, Toru; Tomiura, Yoichi; Oard, Douglas W.; Fleischmann, Kenneth R.
Prior work has found that classifier accuracy can be improved early in the annotation process by having each annotator label different documents, but that later in the process it becomes better to rely on a more expensive multiple-annotation process in which annotators subsequently meet to adjudicate their differences. This paper reports on a study with a large number of classification tasks, finding that the relative advantage of adjudicated annotations varies not just with the quantity of training data, but also with annotator agreement, class imbalance, and perceived task difficulty.

Item: The effect of oversampling and undersampling on classifying imbalanced text datasets (2004-08-16)
Liu, Alexander Yun-chung; Ghosh, Joydeep
Many machine learning classification algorithms assume that the target classes share similar prior probabilities and misclassification costs. However, this is often not the case in the real world. The problem of classification when one class has a much lower prior probability in the training set is called the imbalanced dataset problem. One popular approach to solving the imbalanced dataset problem is to resample the training set; however, few past studies have considered resampling algorithms on datasets with high dimensionality. This thesis examines the imbalanced dataset problem in the realm of text classification, where the data are both sparse and high-dimensional. We first describe the resampling techniques used in this thesis, including several new techniques we introduce. After resampling, we classify the data using multinomial naïve Bayes, k-nearest neighbors, and SVMs. Finally, we compare the results of our experiments and find that, while the best resampling technique is often dataset dependent, certain resampling techniques tend to perform consistently well when coupled with certain classifiers.

Item: Using sentence-level classification to predict sentiment at the document-level (2012-05)
Hutton, Amanda Rachel; Ravikumar, Pradeep; Liu, Alexander
This report explores various aspects of sentiment mining. The two research goals were: (1) to determine useful methods for increasing recall of negative sentences, and (2) to determine the best method for applying sentence-level classification at the document level. The methods in this report were applied to the Movie Reviews corpus at both the document and sentence level. The basic approach was to first identify polar and neutral sentences within the text and then classify the polar sentences as either positive or negative. A Maximum Entropy classifier served as the baseline system, on which further methods were applied. Part-of-speech tagging was evaluated to determine whether its inclusion increased recall of negative sentences, and it was also used to aid in handling negations at the sentence level. Smoothing was investigated, and various metrics describing the sentiment composition of a document were explored to address goal (2). Negative recall was shown to increase with adjustment of the classification threshold, and also through the methods used to address goal (2). Overall, classifying at the sentence level using bigrams and a cutoff value of one resulted in the highest evaluation scores.
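The staged approach described in the last item (classify individual sentences, then aggregate to a document-level label) can be sketched roughly as follows. This is a minimal illustration, not the report's actual pipeline: it assumes scikit-learn's logistic regression as the maximum-entropy classifier, toy training sentences in place of the Movie Reviews corpus, and a hypothetical `classify_document` helper with a negative-sentence cutoff; those names and values are illustrative assumptions.

```python
# Minimal sketch (not the report's pipeline): sentence-level polarity
# classification with a max-ent style model (logistic regression) over
# unigram+bigram features, aggregated to a document label by counting
# negative sentences against a cutoff. Data and cutoff are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy sentence-level training data (stand-in for a corpus such as Movie Reviews).
train_sentences = [
    "the acting was wonderful and moving",
    "a brilliant, tightly written script",
    "the plot was dull and predictable",
    "not worth the price of admission",
]
train_labels = ["pos", "pos", "neg", "neg"]

# Bag-of-words features including bigrams, feeding the sentence classifier.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_sentences)
sentence_clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)

def classify_document(sentences, neg_cutoff=1):
    """Label a document 'neg' if at least `neg_cutoff` of its sentences are
    classified negative, otherwise 'pos'. The cutoff is a tunable assumption."""
    preds = sentence_clf.predict(vectorizer.transform(sentences))
    neg_count = sum(1 for p in preds if p == "neg")
    return "neg" if neg_count >= neg_cutoff else "pos"

doc = ["the script was brilliant", "but the ending felt dull and predictable"]
print(classify_document(doc))  # with the default cutoff of 1, one negative sentence suffices
```

Adjusting `neg_cutoff` plays the same role as the classification-threshold adjustment the report uses to trade precision for negative recall.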