Privacy-preserving computation for data mining

dc.contributor.advisorShmatikov, Vitalyen
dc.creatorBrickell, Justin Leeen
dc.date.accessioned2010-06-01T14:13:05Zen
dc.date.available2010-06-01T14:13:05Zen
dc.date.issued2009-05en
dc.descriptiontexten
dc.description.abstractAs data mining matures as a field and develops more powerful algorithms for discovering and exploiting patterns in data, the amount of data about individuals that is collected and stored continues to rapidly increase. This increase in data heightens concerns that data mining violates individual privacy. The goal of data mining is to derive aggregate conclusions, which should not reveal sensitive information. However, the data-mining algorithms run on databases containing information about individuals which may be sensitive. The goal of privacy-preserving data mining is to provide high-quality aggregate conclusions while protecting the privacy of the constituent individuals. The field of "privacy-preserving data mining" encompasses a wide variety of different techniques and approaches, and considers many different threat and trust models. Some techniques use perturbation, where noise is added (either directly to the database that is the input to the algorithm or to the output of queries) to obscure values of sensitive attributes; some use generalization, where identifying attributes are given less specific values; and some use cryp- tography, where joint computations between multiple parties are performed on encrypted data to hide inputs. Because these approaches are applied to different scenarios with different threat models, their overall e ectiveness and privacy properties are incomparable. In this thesis I take a pragmatic approach to privacy-preserving data mining and attempt to determine which techniques are suitable to real-world problems that a data miner might wish to solve, such as evaluating and learning decision-tree classifiers. I show that popular techniques for sanitizing databases prior to publication either fail to provide any meaningful privacy guarantees, or else degrade the data to the point of having only negligible data-mining utility. Cryptographic techniques for secure multi-party computation are a natural alternative to sanitized data publication, and guarantee the privacy of inputs by performing computations on encrypted data. Because of its heavy reliance on public-key cryptography, it is conventionally thought to be too slow to apply to real-world problems. I show that tailor-made protocols for specific data-mining problems can be made fast enough to run on real-world problems, and I strengthen this claim with empirical runtime analysis using prototype implementations. I also expand the use of secure computation beyond its traditional scope of applying a known algorithm to private inputs by showing how it can be used to e ciently apply a private algorithm, chosen from a specific class of algorithms, to a private input.en
dc.description.departmentComputer Science
dc.format.mediumelectronicen
dc.identifier.urihttp://hdl.handle.net/2152/7538en
dc.language.isoengen
dc.rightsCopyright is held by the author. Presentation of this material on the Libraries' web site by University Libraries, The University of Texas at Austin was made possible under a limited license grant from the author who has retained all copyrights in the works.en
dc.subjectData miningen
dc.subjectPrivacy preservationen
dc.subjectSanitized dataen
dc.subjectCryptographic techniquesen
dc.titlePrivacy-preserving computation for data miningen
thesis.degree.departmentComputer Sciencesen
thesis.degree.disciplineComputer Sciencesen
thesis.degree.grantorThe University of Texas at Austinen
thesis.degree.levelDoctoralen
thesis.degree.nameDoctor of Philosophyen

Access full-text files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
brickellj52547.pdf
Size:
1013.88 KB
Format:
Adobe Portable Document Format

License bundle

Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
1.75 KB
Format:
Item-specific license agreed upon to submission
Description: