Evaluation of bioinformatics tools that decipher mutational signatures from cancer genomes, and tools that infer phylogenetic trees of tumors using single-cell sequencing data
Cancer is the second leading cause of death in the world. From 1990 to 2016, the death rate from neoplasms increased continuously, from 108 to 121 per 100,000. According to a report from the International Agency for Research on Cancer, there were 14.1 million new cancer cases, 8.2 million cancer deaths, and 32.6 million people living with cancer (within 5 years of diagnosis) worldwide in 2012. Between 2005 and 2015, the number of incident cancer cases worldwide increased by 33%, from 13.1 million to 17.5 million. Meanwhile, global cancer deaths increased by 17%, from 7.5 million in 2005 to 8.8 million in 2015. The global cancer incidence burden is projected to grow to 19.3 million new cases in 2025 and 24 million in 2035, and the number of cancer deaths is predicted to increase to 11.4 million in 2025 and 14.6 million in 2035.
Human cancers are characterized by somatic mutations, most of which are single base substitutions, that is, the replacement of one DNA nucleotide by another. Other types of somatic mutation, such as indels, rearrangements, and copy number variations, have not been studied as extensively. The somatic mutations in a cancer genome are the aggregate outcome of the activity of mutational processes, and each mutational process leaves a characteristic mutational signature in the genome. A cancer genome is therefore the consequence of exposure to mutational processes of varying strengths. Deciphering the signatures of these processes can hence improve our understanding of the mechanisms underlying cancer development.
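The model implicit in this paragraph is that the mutation counts of a set of genomes, arranged by substitution channel, form a non-negative mixture of signatures: V ≈ W H, where each column of W is a signature and each column of H holds one genome's exposures. Non-negative matrix factorization (NMF) is the technique underlying the WTSI framework. The sketch below is a minimal, generic NMF using Lee-Seung multiplicative updates on a made-up catalog over the six substitution classes; it is not code from WTSI or any other tool, the helper names are hypothetical, and real analyses typically use 96 channels (six substitution classes times 16 flanking-base contexts).

```python
import random

def matmul(A, B):
    """Plain-list matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def nmf(V, k, iters=2000, seed=0, eps=1e-12):
    """Approximate V (channels x genomes) by W (channels x k) times H (k x genomes)
    using Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    # Rescale so each signature (column of W) sums to 1; move the scale into H.
    for j in range(k):
        s = sum(W[i][j] for i in range(m))
        for i in range(m):
            W[i][j] /= s
        H[j] = [h * s for h in H[j]]
    return W, H

# Made-up catalog over the classes C>A, C>G, C>T, T>A, T>C, T>G.
P_true = [[0.05, 0.30], [0.05, 0.10], [0.70, 0.10],
          [0.05, 0.10], [0.10, 0.10], [0.05, 0.30]]   # 6 channels x 2 signatures
E_true = [[100, 20, 60, 0], [10, 80, 40, 50]]         # 2 signatures x 4 genomes
V = matmul(P_true, E_true)                             # 6 channels x 4 genomes
W, H = nmf(V, k=2)
print(frobenius_error(V, W, H))
```

NMF solutions depend on the random starting point, so the sketch fixes a seed; the number of signatures k has to be chosen separately, for example by comparing fits across several values.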
Cancers are clonal in origin: neoplastic progression is a sequential selection of mutant subpopulations derived from a common progenitor. Clonal evolution takes place within tissue ecosystems, which provide both the venue for fitness selection and its determinants. Because somatic mutations accumulate over time, they record a tumor's evolutionary history and make that history amenable to study. Single-cell sequencing is increasingly used in cancer phylogenetics because it makes it possible to study tumor evolution at unprecedented resolution.
To facilitate the decipherment of mutational signatures from cancer genomes and the reconstruction of phylogenetic trees from single-cell sequencing data, dedicated bioinformatics tools have been developed, published, and used for data analysis in other researchers' studies. However, their reliability has not been evaluated by independent research groups. Because correct biological inference depends on reliable analytical tools, I designed this study to evaluate them.
Seven tools were found to be available for deciphering mutational signatures from cancer genomes: the Wellcome Trust Sanger Institute framework (WTSI), EMu, SomaticSignatures, pmsignature, deconstructSigs, MutSpect, and signeR. Two of them, WTSI and EMu, were evaluated. Their accuracy and robustness were assessed using simulated data. Experimental melanoma and skin cutaneous melanoma data were downloaded from the International Cancer Genome Consortium data portal, and bootstrapped data sets derived from these data were used to evaluate both the consistency of each tool and the agreement between the two tools. The findings show that both tools are robust to noise in the data, but their accuracy, consistency, and agreement are low.
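The resampling and comparison steps are not specified in detail in this summary, so the following is only a sketch under stated assumptions. It assumes each genome's mutations are bootstrapped by drawing them with replacement (a multinomial draw that preserves the genome's mutation total), and that signatures from two runs or two tools are compared by cosine similarity, a common choice for this purpose. The function names and the catalog and signature values are hypothetical.

```python
import math
import random
from collections import Counter

def bootstrap_catalog(V, rng):
    """Resample every genome (column of the integer catalog V) by drawing its
    mutations with replacement, i.e. a multinomial draw that keeps the total."""
    m, n = len(V), len(V[0])
    out = [[0] * n for _ in range(m)]
    for j in range(n):
        counts = [V[i][j] for i in range(m)]
        total = sum(counts)
        if total == 0:
            continue  # random.choices rejects all-zero weights
        draws = Counter(rng.choices(range(m), weights=counts, k=total))
        for i in range(m):
            out[i][j] = draws[i]
    return out

def cosine(u, v):
    """Cosine similarity between two signatures over the same channels."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def best_match_similarity(sigs_a, sigs_b):
    """For each signature in sigs_a, its highest cosine similarity to any signature in sigs_b."""
    return [max(cosine(a, b) for b in sigs_b) for a in sigs_a]

# Hypothetical integer catalog: 6 substitution classes x 3 genomes.
V = [[12, 3, 0], [5, 8, 2], [40, 6, 9], [2, 7, 1], [9, 4, 3], [1, 10, 0]]
rng = random.Random(1)
replicates = [bootstrap_catalog(V, rng) for _ in range(100)]

# Each replicate would be passed to a tool; here two hypothetical outputs are compared.
run1 = [[0.05, 0.05, 0.70, 0.05, 0.10, 0.05], [0.30, 0.10, 0.10, 0.10, 0.10, 0.30]]
run2 = [[0.28, 0.12, 0.10, 0.10, 0.10, 0.30], [0.06, 0.04, 0.72, 0.05, 0.08, 0.05]]
print(best_match_similarity(run1, run2))
```

Matching each signature to its most similar counterpart sidesteps the fact that tools may report signatures in an arbitrary order; the same comparison can serve for consistency (replicate against replicate, same tool) and for agreement (tool against tool).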
Three tools, OncoNEM, SCITE, and SiFit, were found to be available for inferring tumor evolutionary trees from single-cell sequencing data. Their accuracy was evaluated on simulated data generated using branching process theory, and they were also compared with a random solution and with the neighbor joining algorithm. The results show that all three tools were less accurate than the neighbor joining method.
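Neighbor joining serves here as the baseline the three tools were measured against. It builds an unrooted tree from pairwise distances by repeatedly joining the pair (i, j) that minimizes Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of i's distances to all other current nodes, and replacing that pair with a new internal node. The sketch below implements the textbook algorithm on a small five-taxon additive distance matrix rather than on single-cell data. How cell genotypes were converted into distances is not stated in this summary; Hamming distance between binary mutation profiles would be one plausible choice (an assumption). Internal node names such as `node1` are hypothetical.

```python
from itertools import combinations

def neighbor_joining(labels, dist):
    """Textbook neighbor joining.
    labels: taxon (e.g. cell) names; dist: {(i, j): distance} for every unordered pair.
    Returns unrooted-tree edges as (node, child, branch_length) tuples."""
    d = {frozenset(pair): value for pair, value in dist.items()}
    nodes = list(labels)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r[i]: total distance from i to all other current nodes
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r[i] - r[j]
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))  # branch from i to the new node
        counter += 1
        u = f"node{counter}"  # hypothetical internal-node name
        edges += [(u, i, li), (u, j, dij - li)]
        # Distance from the new node to every remaining node
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (d[frozenset((i, k))] + d[frozenset((j, k))] - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges

# Five-taxon additive example: some tree reproduces these distances exactly.
D = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
     ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
     ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
edges = neighbor_joining(list("abcde"), D)
for edge in edges:
    print(edge)
```

Because these example distances are additive, every leaf-to-leaf path length in the returned tree equals the corresponding input distance; distances estimated from noisy single-cell genotypes are generally not additive, so the recovered tree is then only an approximation.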
This thesis demonstrates the importance of independent evaluation of bioinformatics tools. As my study shows, the tools that decipher mutational signatures from cancer genomes and the tools that infer tumor phylogenetic trees from single-cell sequencing data did not perform as well as reported in their original publications.