Structural Variant Detection Tools Struggle with Whole Exome Sequencing (WES) Data

Pugalenthi, Lokesh
Nanduri, Rahul
Hong, Raymond
Arasappan, Dhivya
Prasad, Rohit
Kowalski-Muegge, Jeanne

Whole exome sequencing (WES) is a targeted sequencing technique that sequences only the protein-coding regions of the genome. Because WES is significantly cheaper than whole genome sequencing (WGS) while still providing meaningful information, it has become a respected tool for identifying the small genetic variants underlying diseases. It is also used, though less commonly, to identify large-scale structural variants (SVs), which, because of their size and complexity, are more difficult to detect using short-read sequencing data. SVs are genome alterations spanning fifty or more base pairs and have been linked to the onset or progression of certain diseases, such as Multiple Myeloma (MM). Multiple bioinformatics tools are available for identifying structural variants from genomic data; however, it is important to benchmark their accuracy and efficiency, particularly when dealing with exome data. Using exome sequencing data from 71 Multiple Myeloma cell lines, we benchmarked six established SV identification tools by comparing their results to each cell line's known SVs. We used the Texas Advanced Computing Center (TACC) to run our workflows on these samples in parallel. Comparing the SVs detected by each tool to the SVs expected in these cell lines brought to light the challenges of detecting SVs using short-read WES data. When known SVs were matched at the chromosomal level, only two of the six tools had a recall greater than 25%. At the coordinate level, no tool had a recall greater than 20%. These tools have been used in published studies to identify SVs from WES data; their poor recall in these MM cell lines may indicate the need for WES-specific SV detection tools in the future.
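
The two recall figures above depend on what counts as a match between a known SV and a called SV. The abstract does not state the matching criteria, so the following is only a minimal sketch under stated assumptions: each SV is reduced to a pair of breakpoints, a chromosomal-level match requires only that the same pair of chromosomes is involved, and a coordinate-level match additionally requires both breakpoints to fall within a tolerance window. The `SV` class, the function names, the 1,000 bp default window, and the toy coordinates are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """A structural variant reduced to its two breakpoints (hypothetical model)."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int


def chrom_match(known: SV, called: SV) -> bool:
    # Chromosomal level: the same chromosomes are involved, in either order.
    return {known.chrom_a, known.chrom_b} == {called.chrom_a, called.chrom_b}


def coord_match(known: SV, called: SV, window: int = 1000) -> bool:
    # Coordinate level: both breakpoints agree within `window` bp.
    # The window size is an assumption, not a value from the study.
    direct = (
        known.chrom_a == called.chrom_a
        and known.chrom_b == called.chrom_b
        and abs(known.pos_a - called.pos_a) <= window
        and abs(known.pos_b - called.pos_b) <= window
    )
    # A caller may report the two breakpoints in the opposite order.
    swapped = (
        known.chrom_a == called.chrom_b
        and known.chrom_b == called.chrom_a
        and abs(known.pos_a - called.pos_b) <= window
        and abs(known.pos_b - called.pos_a) <= window
    )
    return direct or swapped


def recall(known_svs, called_svs, match) -> float:
    # Recall = known SVs recovered by at least one call / all known SVs.
    if not known_svs:
        return 0.0
    hits = sum(1 for k in known_svs if any(match(k, c) for c in called_svs))
    return hits / len(known_svs)


# Toy data with made-up coordinates, for illustration only.
known = [
    SV("chr4", 1_900_000, "chr14", 105_600_000),
    SV("chr11", 69_600_000, "chr14", 105_700_000),
]
called = [
    SV("chr14", 105_600_400, "chr4", 1_900_300),   # close to the first known SV
    SV("chr11", 10_000_000, "chr14", 50_000_000),  # right chromosomes, wrong place
]

chrom_recall = recall(known, called, chrom_match)
coord_recall = recall(known, called, coord_match)
```

In this toy case both known SVs are recovered at the chromosomal level, but only one survives the stricter coordinate check, which illustrates why coordinate-level recall can never exceed chromosomal-level recall under these definitions.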



