Algorithms for biomarker identification utilizing MALDI TOF mass spectrometry

Shin, Hyunjin

Currently, the most effective way to reduce cancer mortality is to detect and treat the disease in its earliest stages. Technological advances in genomics and proteomics have opened a new realm of methods for early detection that show potential to overcome the drawbacks of current strategies. In particular, pattern analysis of mass spectra of blood samples has attracted attention as an approach to identifying potential biomarkers for early detection of cancer.
Mass spectrometry provides rapid and precise measurements of the sizes and relative abundances of the proteins present in a complex biological or chemical mixture. However, this high-throughput nature has also created a need for efficient and effective bioinformatics tools that can extract biologically meaningful information. Two issues are key to accurate biomarker identification and have drawn considerable research interest: preprocessing of raw mass spectra, and extraction and selection of features from the preprocessed spectra.
To improve biomarker identification using mass spectrometry, I have postulated a noise model for MALDI TOF mass spectrometry from the perspective of stochastic signal processing and have attempted to measure the spectral characteristics of the components of that model. Noise in mass spectrometry can interfere with identification of the biochemical substances in a sample. I assumed that the noise in MALDI TOF mass spectrometry is composed of three components: noise from instrumentation, noise from random ion motions, and chemical noise. In this dissertation, I have separated and analyzed noise from instrumentation using parametric power spectral density estimation, and chemical noise using wavelet-based analysis.
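For readers unfamiliar with parametric power spectral density estimation, the following is a minimal illustrative sketch, not the dissertation's implementation. It assumes an autoregressive (AR) model fitted with the Yule-Walker equations, which is one common parametric estimator; the abstract does not say which parametric model was used. The function name `yule_walker_psd` and its parameters are hypothetical.

```python
import numpy as np

def yule_walker_psd(x, order, n_freqs=256):
    """Estimate a PSD by fitting an AR(order) model via Yule-Walker.

    Assumption: an AR model is used purely for illustration; the source
    does not specify the parametric model.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased autocovariance estimates r[0], ..., r[order].
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    # Solve the Toeplitz system R a = r[1:] for the AR coefficients a.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    # Variance of the driving white noise.
    sigma2 = r[0] - np.dot(a, r[1:])
    # PSD(f) = sigma2 / |1 - sum_k a_k exp(-2*pi*i*f*k)|^2, f in cycles/sample.
    freqs = np.linspace(0.0, 0.5, n_freqs)
    k = np.arange(1, order + 1)
    denom = np.abs(1.0 - np.exp(-2j * np.pi * np.outer(freqs, k)) @ a) ** 2
    return freqs, sigma2 / denom
```

In practice such an estimator might be applied to a signal-free region of a spectrum (for example, a blank acquisition) so that the fitted model reflects noise alone; the choice of `order` is a modeling decision that the sketch leaves open.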
In addition to these noise analysis studies, I have also designed an algorithm that selects independent and discriminant features from mass spectra of complex protein samples by reducing redundant and irrelevant information.
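The idea of removing irrelevant and redundant features can be sketched as follows. This is a generic greedy filter and not the algorithm developed in the dissertation: it ranks features by a two-class Welch t-statistic (relevance) and skips any feature that is highly correlated with one already chosen (redundancy). The function name `select_features`, the threshold `max_corr`, and the limit `top_k` are hypothetical.

```python
import numpy as np

def select_features(X, y, max_corr=0.9, top_k=10):
    """Greedy relevance/redundancy filter (illustrative only).

    X: (n_samples, n_features) peak intensities; y: labels in {0, 1}.
    Assumes no feature column is constant.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    a, b = X[y == 0], X[y == 1]
    # Relevance: absolute Welch t-statistic per feature.
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)
    # Redundancy: absolute Pearson correlation between features.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = []
    for j in np.argsort(-t):  # most discriminative first
        if all(corr[j, s] < max_corr for s in selected):
            selected.append(int(j))
        if len(selected) == top_k:
            break
    return selected
```

Calling `select_features(X, y, max_corr=0.9, top_k=5)` returns the column indices of up to five features, none of which correlates with another selected feature at or above the chosen threshold.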