
    Populating a Linked Data Entity Name System

    KEJRIWAL-DISSERTATION-2016.pdf (4.413Mb)
    Date
    2016-05
    Author
    Kejriwal, Mayank
    0000-0001-5988-8305
    Abstract
    Resource Description Framework (RDF) is a graph-based data model used to publish data as a Web of Linked Data. RDF is an emergent foundation for large-scale data integration, the problem of providing a unified view over multiple data sources. An Entity Name System (ENS) is a thesaurus for entities, and is a crucial component in a data integration architecture. Populating a Linked Data ENS is equivalent to solving an Artificial Intelligence problem called instance matching, which concerns identifying pairs of entities referring to the same underlying entity. This dissertation presents an instance matcher with four properties, namely automation, heterogeneity, scalability and domain independence. Automation is addressed by employing inexpensive but well-performing heuristics to automatically generate a training set, which is employed by other machine learning algorithms in the pipeline. Data-driven alignment algorithms are adapted to deal with structural heterogeneity in RDF graphs. Domain independence is established by actively avoiding prior assumptions about input domains, and through evaluations on ten RDF test cases. The full system is scaled by implementing it on cloud infrastructure using MapReduce algorithms.
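The automated training set generation mentioned in the abstract can be illustrated with a minimal sketch. This is not the dissertation's actual method: the token-based blocking step, the Jaccard similarity heuristic, the thresholds, and all function and variable names below are assumptions chosen only to show the general idea of labeling likely matches and likely non-matches without human input. Property names are ignored when tokenizing, a crude stand-in for handling structural heterogeneity.

```python
from collections import defaultdict
from itertools import combinations


def tokens(entity):
    """Lower-cased word tokens drawn from all property values of an entity."""
    return {tok for value in entity.values() for tok in str(value).lower().split()}


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_pairs(entities):
    """Token blocking: pair only entities that share at least one token."""
    blocks = defaultdict(set)
    for eid, entity in entities.items():
        for tok in tokens(entity):
            blocks[tok].add(eid)
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def heuristic_training_set(entities, pos_threshold=0.6, neg_threshold=0.3):
    """Label high-similarity pairs as matches and low-similarity pairs as
    non-matches; pairs in between are left unlabeled. Thresholds are
    hypothetical values picked for this illustration."""
    positives, negatives = [], []
    for a, b in sorted(candidate_pairs(entities)):
        score = jaccard(tokens(entities[a]), tokens(entities[b]))
        if score >= pos_threshold:
            positives.append((a, b))
        elif score <= neg_threshold:
            negatives.append((a, b))
    return positives, negatives


# Hypothetical entities; the two sources use different property names.
example_entities = {
    "a:1": {"name": "Austin Texas", "type": "city"},
    "b:7": {"label": "Austin Texas city"},
    "b:9": {"label": "Boston city"},
}

positives, negatives = heuristic_training_set(example_entities)
print(positives)
print(negatives)
```

In a pipeline of the kind the abstract describes, pairs labeled this way would serve as training data for downstream machine learning components, so the heuristic only needs to be cheap and reasonably precise rather than complete.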
    Department
    Computer Sciences
    Subject
    Resource Description Framework
    Linked Data
    Semantic Web
    Instance matching
    Entity resolution
    Training set generation
    Blocking
    Property alignment
    Domain-independence
    Heterogeneity
    MapReduce
    Scalability
    URI
    http://hdl.handle.net/2152/39566
    Collections
    • UT Electronic Theses and Dissertations
    University of Texas at Austin Libraries

    © The University of Texas at Austin
