TexasScholarWorks
    • Login
    • Submit
    View Item 
    •   Repository Home
    • UT Electronic Theses and Dissertations
    • UT Electronic Theses and Dissertations
    • View Item
    • Repository Home
    • UT Electronic Theses and Dissertations
    • UT Electronic Theses and Dissertations
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    The effect of oversampling and undersampling on classifying imbalanced text datasets

    View/Open
    Access restricted to UT Austin EID holders (3.384Mb)
    Date
    2004-08-16
    Author
    Liu, Alexander Yun-chung
    Share
     Facebook
     Twitter
     LinkedIn
    Metadata
    Show full item record
    Abstract
    Many machine learning classification algorithms assume that the target classes share similar prior probabilities and misclassification costs. However, this is often not the case in the real world. The problem of classification when one class has a much lower prior probability in the training set is called the imbalanced dataset problem. One popular approach to solving the imbalanced dataset problem is to resample the training set. However, few studies in the past have considered resampling algorithms on data sets with high dimensionality. In this thesis, we examine the imbalanced dataset problem in the realm of text classification. Text has the added problems of both sparsity and high dimensionality. We first describe the resampling techniques we use in this thesis, including several resampling techniques we are introducing. After resampling, we classify the data using multinomial naïve Bayes, k nearest neighbor, and SVMs. Finally, we compare the results of our experiments and find that, while the best resampling technique to use is often dataset dependent, certain resampling techniques tend to perform consistently when coupled with certain classifiers
    Department
    Electrical and Computer Engineering
    Subject
    Imbalanced dataset problem
    Text classification
    Resampling techniques
    URI
    https://hdl.handle.net/2152/85336
    http://dx.doi.org/10.26153/tsw/12300
    Collections
    • UT Electronic Theses and Dissertations

    University of Texas at Austin Libraries
    • facebook
    • twitter
    • instagram
    • youtube
    • CONTACT US
    • MAPS & DIRECTIONS
    • JOB OPPORTUNITIES
    • UT Austin Home
    • Emergency Information
    • Site Policies
    • Web Accessibility Policy
    • Web Privacy Policy
    • Adobe Reader
    Subscribe to our NewsletterGive to the Libraries

    © The University of Texas at Austin

     

     

    Browse

    Entire RepositoryCommunities & CollectionsDate IssuedAuthorsTitlesSubjectsDepartmentsThis CollectionDate IssuedAuthorsTitlesSubjectsDepartments

    My Account

    Login

    Statistics

    View Usage Statistics

    Information

    About Contact Policies Getting Started Glossary Help FAQs

    University of Texas at Austin Libraries
    • facebook
    • twitter
    • instagram
    • youtube
    • CONTACT US
    • MAPS & DIRECTIONS
    • JOB OPPORTUNITIES
    • UT Austin Home
    • Emergency Information
    • Site Policies
    • Web Accessibility Policy
    • Web Privacy Policy
    • Adobe Reader
    Subscribe to our NewsletterGive to the Libraries

    © The University of Texas at Austin