dc.contributor.advisor | Ghosh, Joydeep | en
dc.creator | Zhong, Shi | en
dc.date.accessioned | 2008-08-28T21:46:41Z | en
dc.date.available | 2008-08-28T21:46:41Z | en
dc.date.issued | 2003 | en
dc.identifier | b57481076 | en
dc.identifier.uri | http://hdl.handle.net/2152/1096 | en
dc.description | text | en
dc.description.abstract | In many emerging data mining applications, one needs to cluster complex data such as very high-dimensional sparse text documents and continuous or discrete time sequences. Probabilistic model-based clustering techniques have shown promising results in many such applications. For real-valued low-dimensional vector data, Gaussian models have been frequently used. For very high-dimensional vector and non-vector data, model-based clustering is a natural choice when it is difficult to extract good features or identify an appropriate measure of similarity between pairs of data objects. This dissertation presents a unified framework for model-based clustering based on a bipartite graph view of data and models. The framework includes an information-theoretic analysis of model-based partitional clustering from a deterministic annealing point of view and a view of model-based hierarchical clustering that leads to several useful extensions. The framework is used to develop two new variations of model-based clustering: a balanced model-based partitional clustering algorithm that produces clusters of comparable sizes and a hybrid model-based clustering approach that combines the advantages of partitional and hierarchical model-based algorithms. I apply the framework and new clustering algorithms to cluster several distinct types of complex data, ranging from arbitrary-shaped 2-D synthetic data to high-dimensional documents, EEG time series, and gene expression time sequences. The empirical results demonstrate the usefulness of the scalable, balanced model-based clustering algorithms, as well as the benefits of the hybrid model-based clustering approach. They also showcase the generality of the proposed clustering framework. | en
dc.format.medium | electronic | en
dc.language.iso | eng | en
dc.rights | Copyright is held by the author. Presentation of this material on the Libraries' web site by University Libraries, The University of Texas at Austin was made possible under a limited license grant from the author who has retained all copyrights in the works. | en
dc.subject.lcsh | Cluster analysis--Computer programs | en
dc.title | Probabilistic model-based clustering of complex data | en
dc.description.department | Electrical and Computer Engineering | en
dc.identifier.oclc | 57205074 | en
dc.identifier.proqst | 3116470 | en
dc.type.genre | Thesis | en
thesis.degree.department | Electrical and Computer Engineering | en
thesis.degree.discipline | Electrical and Computer Engineering | en
thesis.degree.grantor | The University of Texas at Austin | en
thesis.degree.level | Doctoral | en
thesis.degree.name | Doctor of Philosophy | en

