Scalable smoothing algorithms for massive graph-structured data

dc.contributor.advisor: Scott, James (Statistician)
dc.contributor.committeeMember: Carvalho, Carlos
dc.contributor.committeeMember: Stone, Peter
dc.contributor.committeeMember: Ghosh, Joydeep
dc.creator: Tansey, Wesley Scott
dc.creator.orcid: 0000-0002-5294-4228
dc.date.accessioned: 2017-09-28T16:06:16Z
dc.date.available: 2017-09-28T16:06:16Z
dc.date.created: 2017-08
dc.date.issued: 2017-08
dc.date.submitted: August 2017
dc.date.updated: 2017-09-28T16:06:16Z
dc.description.abstract: Probabilistically modeling noisy data is a crucial step in virtually all scientific experiments and engineering pipelines. Recent years have seen the rise of several high-throughput techniques in science and a proliferation of cheap sensors in engineering. These dual phenomena have resulted in the generation of massive datasets, each often containing rich, problem-dependent structural dependencies within and between their many observations. Classical "scalable" modeling procedures for common tasks such as hypothesis testing and conditional density estimation make the simplifying assumption that the data contains little or no underlying dependency structure. More sophisticated techniques to correct for latent correlations in the data have historically dealt only with small datasets where computational complexity was not a consideration. This creates a clear need for scalable, dependency-aware methods in many areas of computational statistics. To this end, we develop novel graph-based smoothing algorithms that form the foundations of three new methodologies for large-scale structured statistical inference: False Discovery Rate Smoothing (FDRS), Spatial Density Smoothing (SDS), and Smoothed Dyadic Partitioning (SDP). FDRS improves the power of classical multiple hypothesis testing in the scenario where a dependency graph can be defined over each test site. SDS provides a more sample-efficient marginal density estimator when a dependency graph is defined over multiple distributions such as when observing samples arranged on a spatial grid. Finally, when the dependence is between a set of possible outcome values in a discrete conditional probability distribution, SDP leverages the structure of the space to improve the accuracy of the predictions.
We demonstrate the utility of our new procedures via a series of benchmarks and three real-world case studies: fMRI analysis with FDRS, detecting radiological anomalies with SDS, and generative modeling of image data with SDP. All code for FDR smoothing, spatial density smoothing, and smoothed dyadic partitioning is publicly available.
dc.description.department: Computer Science
dc.format.mimetype: application/pdf
dc.identifier: doi:10.15781/T25H7C91H
dc.identifier.uri: http://hdl.handle.net/2152/61823
dc.language.iso: en
dc.subject: Smoothing
dc.subject: Algorithms
dc.subject: False discovery rate
dc.subject: Spatial smoothing
dc.subject: Total variation
dc.subject: Trend filtering
dc.title: Scalable smoothing algorithms for massive graph-structured data
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Computer Sciences
thesis.degree.discipline: Computer Science
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Access full-text files

Original bundle

Name: TANSEY-DISSERTATION-2017.pdf
Size: 9.99 MB
Format: Adobe Portable Document Format
