Cancer Exome Data Analysis Options (1/6/2015)



Tue, Jan 6, 2015 at 8:59 AM

Customer: I am currently managing a project where we have exome data we would like analyzed. The data are from mouse exomes and were acquired using Agilent All Exon capture and Illumina 100-base paired-end sequencing to 100x coverage.

The data is from a unique mouse model of cancer we have developed, and we wish to discern the nature of the mutations that allow normal cells to turn into cancer cells. We have exome data on a number of non-cancerous and cancerous mouse cell lines for this analysis.

We would like SNV, Indel, and copy number analysis completed on the exome data.

Tue, Jan 6, 2015 at 10:04 AM

AccuraScience LB: Calling copy number variations (CNVs) from exome data can be challenging because current capture techniques (including the Agilent technique you used) do not produce evenly distributed reads across targets, although some recently developed methods try to address this difficulty.
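To illustrate why uneven capture matters and how it is commonly handled, here is a minimal Python sketch. It assumes per-target read counts are already available for a tumor sample and a matched non-cancerous sample; the counts, the pseudocount, and the function name are hypothetical, and this is not the method of any specific CNV tool. Comparing each target with the same target in the reference sample cancels much of the target-specific capture bias, and centering on the median removes the overall difference in sequencing depth.

```python
import math
from statistics import median

def log2_ratios(tumor_counts, normal_counts, pseudocount=0.5):
    """Median-centered log2(tumor/normal) read-count ratio per capture target.

    The pseudocount (an arbitrary choice here) avoids division by zero
    for targets with no reads.
    """
    raw = [
        math.log2((t + pseudocount) / (n + pseudocount))
        for t, n in zip(tumor_counts, normal_counts)
    ]
    center = median(raw)  # absorbs the global depth difference between samples
    return [r - center for r in raw]

# Hypothetical counts: capture efficiency varies a lot between targets,
# the tumor library is about twice as deep, and the last target is amplified.
normal = [100, 400, 50, 200, 250]
tumor = [200, 800, 100, 400, 1000]

ratios = log2_ratios(tumor, normal)
for index, ratio in enumerate(ratios, start=1):
    print(f"target {index}: log2 ratio = {ratio:+.2f}")
```

In this toy data the first four targets come out near zero despite very different raw depths, while the last target stands out with a log2 ratio near +1, which would be consistent with a copy number gain.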

An important question is whether we expect the samples to show a high level of heterogeneity, as is typical of cancer samples. A typical cancer sample (e.g., from autopsy) is a mixture of normal and cancer cells, and the cancer cells can have complicated subclonal structures of somatic mutations. Variant calling for such samples is different from (and less developed than) variant calling for a "diploid" sample, where allele frequencies are expected to cluster at 0%, 50% and 100%. GATK and VarScan are often used to call SNPs (and indels) in diploid samples. Some newer (and less tested) tools such as SomaticSniper, Strelka and Bassovac could be tried if many somatic mutations with varying allele frequencies are expected.
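The difference between the two situations can be sketched in a few lines of Python. The example below computes the variant allele fraction (VAF) from reference and alternate read counts and checks whether it lies near one of the values expected under a diploid model. The read counts, the tolerance of 0.1, and the function names are hypothetical choices for illustration only; real callers use statistical models rather than a fixed cutoff.

```python
def variant_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads at a site that support the alternate allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth

def fits_diploid_model(vaf, tolerance=0.1):
    """True if the VAF is close to 0, 0.5 or 1 (tolerance is an arbitrary assumption)."""
    return any(abs(vaf - expected) <= tolerance for expected in (0.0, 0.5, 1.0))

# Hypothetical (ref_reads, alt_reads) pairs at roughly 100x coverage
sites = {
    "site_a": (52, 48),   # looks heterozygous
    "site_b": (0, 100),   # looks homozygous for the variant
    "site_c": (85, 15),   # fits neither: possibly subclonal or diluted by normal cells
}

for name, (ref_reads, alt_reads) in sites.items():
    vaf = variant_allele_fraction(ref_reads, alt_reads)
    label = "diploid-like" if fits_diploid_model(vaf) else "off-model"
    print(f"{name}: VAF = {vaf:.2f} ({label})")
```

Sites like the third one are the reason tools designed for somatic mutations at varying allele frequencies may be needed.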

Besides GATK and VarScan, commonly used indel calling methods include Pindel and Dindel. Indel calling is more challenging than SNP calling, and the results are more error-prone.

CNV and structural variation calling is even more challenging than indel calling. Even for whole-genome sequencing data, the results of two CNV (or structural variation) calling methods often show only ~20% agreement. Exome data present even greater challenges, as discussed earlier. Commonly used CNV and structural variant calling methods include BreakDancer, VariationHunter and GASV.
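One way to quantify agreement between two callers is sketched below in Python. It assumes each call is a (chromosome, start, end) tuple in half-open coordinates and counts a call as matched when it has at least 50% reciprocal overlap with a call from the other tool. The 50% threshold, the function names, and the example calls are hypothetical, and other definitions of agreement are equally reasonable.

```python
def overlaps_reciprocally(a, b, min_fraction=0.5):
    """True if the overlap covers at least min_fraction of both intervals."""
    if a[0] != b[0]:
        return False  # different chromosomes
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return (overlap >= min_fraction * (a[2] - a[1])
            and overlap >= min_fraction * (b[2] - b[1]))

def concordance(calls_a, calls_b, min_fraction=0.5):
    """Fraction of calls in calls_a matched by at least one call in calls_b."""
    if not calls_a:
        return 0.0
    matched = sum(
        1 for a in calls_a
        if any(overlaps_reciprocally(a, b, min_fraction) for b in calls_b)
    )
    return matched / len(calls_a)

# Hypothetical call sets from two different tools
caller_x = [("chr1", 1000, 5000), ("chr2", 200, 900), ("chr3", 100, 400),
            ("chr4", 10, 60), ("chr5", 0, 1000)]
caller_y = [("chr1", 1500, 5200), ("chr2", 850, 2000), ("chr6", 0, 100)]

print(f"x confirmed by y: {concordance(caller_x, caller_y):.0%}")
print(f"y confirmed by x: {concordance(caller_y, caller_x):.0%}")
```

Note that the measure is not symmetric: the fraction of one tool's calls confirmed by the other depends on which call set is used as the denominator.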

If you suspect that gene fusion events contribute to tumorigenesis, a few methods have been developed specifically to capture these events. Once again, these methods may work less effectively for exome data than for whole-genome sequencing data.

Besides making a catalog of "recurrent" mutations in each of the two groups (tumorigenic and non-tumorigenic), we could also perform a pathway analysis to identify biological pathways significantly enriched among the genes mutated in each group. If some well-known "cancer pathways" (e.g., DNA repair, cell cycle or apoptosis) appear at the top of the list of enriched pathways, this might provide useful mechanistic insights to help you decide on the next steps of the research.
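A common way to test a pathway for enrichment is a one-sided hypergeometric test, sketched below in Python using only the standard library. It asks how likely it is to see at least the observed number of mutated genes in a pathway if the mutated genes were drawn at random from the background. All the gene counts in the example are hypothetical, and the function name is illustrative rather than taken from any pathway analysis package.

```python
from math import comb

def enrichment_p_value(n_background, n_pathway, n_mutated, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap).

    n_background: genes that could have been called mutated
    n_pathway:    background genes annotated to the pathway
    n_mutated:    genes mutated in the group being tested
    n_overlap:    mutated genes that belong to the pathway
    """
    max_overlap = min(n_pathway, n_mutated)
    total = comb(n_background, n_mutated)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_mutated - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical numbers: 20,000 background genes, a 150-gene pathway,
# 200 mutated genes in the tumorigenic group, 8 of them in the pathway.
p_value = enrichment_p_value(20000, 150, 200, 8)
print(f"enrichment p-value: {p_value:.2e}")
```

Because many pathways would be tested at once, the resulting p-values would need a multiple-testing correction before the pathways are ranked.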


Note: LB stands for Lead Bioinformatician. An AccuraScience LB is a senior bioinformatics expert and leader of an AccuraScience data analysis team.

Disclaimer: This text was selected and edited based on genuine communications that took place between a customer and the AccuraScience data analysis team at the specified dates and times. The editing was done to protect the customer’s privacy and for brevity. The edited text may or may not have been reviewed and approved by the customer. AccuraScience is solely responsible for the accuracy of the information reflected in this text.