• Login
    • Submit
    View Item 
    •   Repository Home
    • UT Electronic Theses and Dissertations
    • UT Electronic Theses and Dissertations
    • View Item
    • Repository Home
    • UT Electronic Theses and Dissertations
    • UT Electronic Theses and Dissertations
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    CUDIA : a probabilistic cross-level imputation framework using individual auxiliary information

    Icon
    View/Open
    PARK-THESIS.pdf (2.094Mb)
    Date
    2011-12
    Author
    Park, Yubin
    Share
     Facebook
     Twitter
     LinkedIn
    Metadata
    Show full item record
    Abstract
    In healthcare-related studies, individual patient or hospital data are not often publicly available due to privacy restrictions, legal issues or reporting norms. However, such measures may be provided at a higher or more aggregated level, such as state-level, county-level summaries or averages over health zones such as Hospital Referral Regions (HRR) or Hospital Service Areas (HSA). Such levels constitute partitions over the underlying individual level data, which may not match the groupings that would have been obtained if one clustered the data based on individual-level attributes. Moreover, treating aggregated values as representatives for the individuals can result in the ecological fallacy. How can one run data mining procedures on such data where different variables are available at different levels of aggregation or granularity? In this thesis, we seek a better utilization of variably aggregated datasets, which are possibly assembled from different sources. We propose a novel "cross-level" imputation technique that models the generative process of such datasets using a Bayesian directed graphical model. The imputation is based on the underlying data distribution and is shown to be unbiased. This imputation can be further utilized in a subsequent predictive modeling, yielding improved accuracies. The experimental results using a simulated dataset and the Behavioral Risk Factor Surveillance System (BRFSS) dataset are provided to illustrate the generality and capabilities of the proposed framework.
    Department
    Electrical and Computer Engineering
    Description
    text
    Subject
    Clustering
    Privacy preserving data mining
    BRFSS
    URI
    http://hdl.handle.net/2152/ETD-UT-2011-12-4746
    Collections
    • UT Electronic Theses and Dissertations
    University of Texas at Austin Libraries
    • facebook
    • twitter
    • instagram
    • youtube
    • CONTACT US
    • MAPS & DIRECTIONS
    • JOB OPPORTUNITIES
    • UT Austin Home
    • Emergency Information
    • Site Policies
    • Web Accessibility Policy
    • Web Privacy Policy
    • Adobe Reader
    Subscribe to our NewsletterGive to the Libraries

    © The University of Texas at Austin

    Browse

    Entire RepositoryCommunities & CollectionsDate IssuedAuthorsTitlesSubjectsDepartmentThis CollectionDate IssuedAuthorsTitlesSubjectsDepartment

    My Account

    Login

    Information

    AboutContactPoliciesGetting StartedGlossaryHelpFAQs

    Statistics

    View Usage Statistics
    University of Texas at Austin Libraries
    • facebook
    • twitter
    • instagram
    • youtube
    • CONTACT US
    • MAPS & DIRECTIONS
    • JOB OPPORTUNITIES
    • UT Austin Home
    • Emergency Information
    • Site Policies
    • Web Accessibility Policy
    • Web Privacy Policy
    • Adobe Reader
    Subscribe to our NewsletterGive to the Libraries

    © The University of Texas at Austin