Genome Assembly with Samples of High Heterozygosity Level (12/12/2014)


12/12/2014

The customer would like to assemble a genome, but it is technically challenging to extract enough sample from a single individual, so a pooled sample has to be used. The species is also known to have a high level of diversity across individuals.

Fri, 12/12/2014 at 9:33 AM

AccuraScience LB: Many groups have run into difficulties assembling pooled samples or samples with a high level of heterozygosity. There are a few potential solutions:

(1) Cortex: http://cortexassembler.sourceforge.net/. It includes a tool called cortex_con for "consensus genome assembly" with pooled samples. Cortex's reference is this Nature Genetics paper: http://www.nature.com/ng/journal/v44/n2/full/ng.1028.html. Unfortunately cortex_con is still in beta testing.

(2) The Broad Institute's older tool ARACHNE may be worth trying; its documentation includes some notes on assembling polymorphic genomes: http://www.broadinstitute.org/crd/wiki/index.php/Polymorphism.

(3) BGI's new SOAPdenovo (SOAPdenovo2) takes high heterozygosity into account: http://www.gigasciencejournal.com/content/1/1/18.

(4) Hapsembler is "a haplotype-specific genome assembly toolkit that is designed for genomes that are rich in SNPs and other types of polymorphism": http://compbio.cs.toronto.edu/hapsembler/.

(5) Platanus is a tool for "Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads". The reference is this Genome Research paper: http://genome.cshlp.org/content/early/2014/04/22/gr.170720.113.

(6) ABySS has been used in a similar situation: https://pag.confex.com/pag/xxiii/webprogram/Paper15404.html.

Not all of these will work, so this work will be exploratory in nature and will involve trial-and-error testing. In the end, I think it is likely that we can get two or more of these tools to work, so that a comparison can be made.
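As a rough illustration of how such a comparison could start, the sketch below computes basic contiguity statistics (contig count, total assembled length, longest contig, N50) from the contigs each assembler produces. It is only a minimal sketch: the function names are hypothetical, it assumes each assembler's output has been exported as a plain FASTA file, and these metrics alone do not establish which assembly is most correct.

```python
from typing import Dict, Iterable, List


def read_contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence found in FASTA-formatted lines."""
    lengths: List[int] = []
    current = 0
    in_record = False
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if in_record:
                lengths.append(current)
            current = 0
            in_record = True
        elif line and in_record:
            current += len(line)
    if in_record:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest length L such that contigs of length >= L cover at least half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs at all


def summarize(lengths: List[int]) -> Dict[str, int]:
    """Collect the statistics used to compare one assembly against another."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Hypothetical toy input standing in for one assembler's contig file.
    toy_fasta = [">contig1", "ACGTAC", "GT", ">contig2", "ACG"]
    print(summarize(read_contig_lengths(toy_fasta)))
```

With a pooled, highly heterozygous sample, a total assembled length well above the expected genome size can be a hint that divergent haplotypes were assembled as separate contigs rather than merged, so total length is worth reading alongside N50.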


Note: LB stands for Lead Bioinformatician. An AccuraScience LB is a senior bioinformatics expert and leader of an AccuraScience data analysis team.

Disclaimer: This text was selected and edited based on genuine communications that took place between a customer and AccuraScience data analysis team at specified dates and times. The editing was made to protect the customer’s privacy and for brevity. The edited text may or may not have been reviewed and approved by the customer. AccuraScience is solely responsible for the accuracy of the information reflected in this text.