Customer asks about our experience with PacBio data and their assemblers.
Mon, Apr 18, 2016 at 3:02 PM
AccuraScience LB: We have some experience with PacBio, though we work on Illumina data most often. Current assemblers for PacBio data include Falcon, MHAP and CA. Assembling PacBio data is not technically challenging, but it can be very demanding on computational resources if the genome you work on is large.
Mon, Apr 18, 2016 at 3:11 PM
Customer: We have SMRT data for a lower eukaryotic species, ~20 isolates in total. A reference genome is available in NCBI. However, we want to compare the synteny of our isolates to the “classical” sequenced and assembled yeast genomes in GenBank. These ~20 isolates are similar: one of them is the ancestor, and the others are derivatives of it.
Tue, Apr 19, 2016 at 4:11 PM
AccuraScience LB: We should first assemble the ancestor genome and then compare it with the "wild type" genome in NCBI using a tool such as SyMAP. After the ancestor genome is assembled, the derivative isolates could be analyzed using mapping-based methods, with the genome of the ancestor isolate used as the reference. Depending on how large the differences between the derivatives and the ancestor are expected to be, we will decide whether it is necessary to call structural variants in addition to SNV/indel calling.
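Note: As a rough illustration of the plan outlined in the reply above, the steps could be laid out as a simple per-sample list. This is only a sketch under stated assumptions: the names `Step` and `plan_analysis`, the isolate labels, and the `call_structural_variants` flag are hypothetical and were not part of the original exchange, and no assembler, aligner, or variant caller is actually run.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    sample: str     # isolate the step applies to
    action: str     # what is done to that isolate
    reference: str  # genome the step is run against ("none" for de novo work)


def plan_analysis(ancestor: str,
                  derivatives: List[str],
                  public_reference: str,
                  call_structural_variants: bool = False) -> List[Step]:
    """Lay out the analysis order: assemble the ancestor, compare it with the
    public reference, then analyze each derivative against the ancestor."""
    steps = [
        Step(ancestor, "de novo assembly", "none"),
        Step(ancestor, "synteny comparison", public_reference),
    ]
    for isolate in derivatives:
        # Derivatives are mapped to the assembled ancestor, not assembled de novo.
        steps.append(Step(isolate, "read mapping", ancestor))
        steps.append(Step(isolate, "SNV/indel calling", ancestor))
        # Hypothetical switch: enabled only if large differences are expected.
        if call_structural_variants:
            steps.append(Step(isolate, "structural variant calling", ancestor))
    return steps


if __name__ == "__main__":
    # Placeholder isolate names, for illustration only.
    for step in plan_analysis("ancestor", ["derivative_01", "derivative_02"],
                              "NCBI reference"):
        print(f"{step.sample}: {step.action} (reference: {step.reference})")
```

The point of the layout is that only the ancestor needs a de novo assembly; every derivative reuses that assembly as its mapping reference, which keeps the computational cost for ~20 similar isolates much lower than assembling each one separately.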
Note: LB stands for Lead Bioinformatician. An AccuraScience LB is a senior bioinformatics expert and leader of an AccuraScience data analysis team.
Disclaimer: This text was selected and edited based on genuine communications that took place between a customer and the AccuraScience data analysis team at the specified dates and times. The editing was done to protect the customer’s privacy and for brevity. The edited text may or may not have been reviewed and approved by the customer. AccuraScience is solely responsible for the accuracy of the information reflected in this text.