TexasScholarWorks

    Efficient deep learning for sequence data

    View/Open
    ZHANG-DISSERTATION-2020.pdf (6.039 MB)
    Date
    2020-05
    Author
    Zhang, Jiong, Ph. D.
    ORCID: 0000-0003-3192-3281
    Abstract
    Deep learning has achieved great success in many sequence learning tasks such as machine translation, speech recognition, and time series prediction. Powerful deep sequence learning models, including recurrent neural networks (RNNs) and Transformers, have tremendous expressive power to fit very complex functions. However, they sometimes cannot be applied to real-world scenarios due to a lack of efficiency. On one hand, deep learning models usually have millions of parameters and require computationally intensive algorithms to train. This leads to tediously long training processes, even with the most powerful hardware. On the other hand, capturing long-term dependencies within a sequence remains a contemporary challenge for most deep architectures. To overcome these challenges, we develop a series of methods to improve the efficiency of these deep learning architectures. In particular, we make the following contributions: (1) We propose methods to solve the vanishing and exploding gradient issues that arise in RNNs. These methods enable capturing dependencies over longer ranges by exploiting the orthogonality of Householder matrices or the expressive power of the Fourier basis; (2) We develop a GPU-efficient training algorithm to improve the hardware efficiency of the proposed recurrent architectures with advanced linear algebra tools. The GPU-efficient algorithm achieves training speed similar to that of vanilla RNNs while allowing explicit management of recurrent memories; (3) To solve the scalability issue of the self-attentional Transformer models, we design a dynamic training scheme called AutoAssist and an advanced Transformer model with memory summarization (Transformer-FS). We show that the proposed AutoAssist pipeline can save up to 40% of SGD updates and that Transformer-FS can capture long-term dependencies with fewer additional memory cells.
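    The abstract's first contribution relies on the orthogonality of Householder matrices to address vanishing and exploding gradients in RNNs. As a minimal illustration (not code from the dissertation, and all names here are purely illustrative), the NumPy sketch below builds a recurrent weight matrix from a product of Householder reflections and checks that it is orthogonal, and therefore norm-preserving, which is the property that keeps backpropagated gradients from shrinking or blowing up over long ranges.

        # Minimal sketch: a Householder reflection H = I - 2 v v^T / ||v||^2 is orthogonal,
        # so a recurrent transition built from a product of such reflections preserves the
        # hidden-state norm, the mechanism the abstract's first contribution exploits.
        import numpy as np

        def householder(v):
            """Return the Householder reflection I - 2 v v^T / ||v||^2 (an orthogonal matrix)."""
            v = v / np.linalg.norm(v)
            return np.eye(len(v)) - 2.0 * np.outer(v, v)

        rng = np.random.default_rng(0)
        d = 8  # hidden-state dimension, chosen only for illustration

        # Recurrent weight matrix as a product of a few Householder reflections.
        W = np.linalg.multi_dot([householder(rng.standard_normal(d)) for _ in range(4)])

        # Orthogonality check: W^T W = I, hence ||W h|| = ||h|| for any hidden state h.
        print(np.allclose(W.T @ W, np.eye(d)))                               # True
        h = rng.standard_normal(d)
        print(np.isclose(np.linalg.norm(W @ h), np.linalg.norm(h)))          # True

    Because products of orthogonal matrices remain orthogonal, stacking more reflections adds expressive power without sacrificing the norm-preserving property.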
    Department
    Computational Science, Engineering, and Mathematics
    Subject
    Machine learning
    Deep neural networks
    URI
    https://hdl.handle.net/2152/83141
    http://dx.doi.org/10.26153/tsw/10140
    Collections
    • UT Electronic Theses and Dissertations
