Coding and non-coding RNA Sequencing Data Analysis (6/30/2015)


Tue, Jun 30, 2015 at 10:00 AM

Customer: I have two different kinds of mouse RNA libraries (mRNA and small RNA), prepared using Illumina kits and sequenced in two different lanes of a HiSeq 2000. Each library consists of 12 samples representing six developmental time points (two biological replicates per time point). In total, I have approximately 394 million and 239 million raw reads (FASTQ) for the mRNA and small RNA libraries, respectively.

I am interested in exploring the small RNA species (snoRNA, snRNA, lincRNA, piRNA, etc.) expressed in these samples.

Tue, Jun 30, 2015 at 10:49 AM

AccuraScience LB: Illumina's small RNA kit is primarily designed to detect and measure miRNA/siRNA-type small RNAs. I am not sure about piRNAs, and I am fairly sure it would not work well for snoRNAs or lincRNAs. Did you plan to have us analyze data from the mRNA library for these longer RNAs? If so, one important question is whether you used a poly-A enrichment (oligo-dT) protocol or an rRNA depletion protocol when generating the mRNA library: if poly-A enrichment was used, many of the RNA species you are interested in (for example, snoRNAs and snRNAs, which generally lack poly-A tails) may not have been captured.
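The insert-size point can be illustrated with a minimal sketch that trims the 3' adapter from each small RNA read and bins the remaining insert by length. The adapter sequence is assumed to be the TruSeq Small RNA 3' adapter (verify it against your kit's documentation), the length windows are approximate, illustrative values rather than definitive class boundaries, and `trim_adapter`, `size_class` and `size_profile` are hypothetical helpers. In practice a dedicated trimmer such as cutadapt, which tolerates sequencing errors, would replace the exact matching used here.

```python
from collections import Counter

# Assumption: TruSeq Small RNA 3' adapter; check against your kit's documentation.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def trim_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 6) -> str:
    """Return the insert: everything before the adapter (exact matching only).

    First looks for the full adapter anywhere in the read; otherwise looks for
    an adapter prefix of at least `min_overlap` bases running off the 3' end.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Partial adapter at the end of the read, longest possible overlap first.
    for k in range(min(len(adapter), len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter found: the insert is at least as long as the read.
    return read


def size_class(length: int) -> str:
    """Hypothetical, approximate length windows for illustration only."""
    if 18 <= length <= 25:
        return "miRNA/siRNA-sized (18-25 nt)"
    if 26 <= length <= 32:
        return "piRNA-sized (26-32 nt)"
    return "other"


def size_profile(reads):
    """Count trimmed inserts per length class."""
    return Counter(size_class(len(trim_adapter(r))) for r in reads)
```

Longer species such as snoRNAs or lincRNAs would only show up as fragments or untrimmed reads in the "other" bin, which is one reason a size-selected small RNA library is a poor fit for them.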

Assuming the RNA species of interest have indeed been captured in either the mRNA or the small RNA library, my understanding is that the analysis you want includes (1) processing of the sequencing data, including quality control, mapping of reads to the mouse reference genome, and quantification of expression levels for the various RNA species; (2) informatics work to annotate the individual RNAs using various non-coding RNA databases; and (3) differential expression analysis, producing lists of RNAs that are differentially expressed across the six developmental time points. Could you confirm whether this understanding is correct?
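For task (3), the experimental design (six time points with two biological replicates each, 12 samples per library) can be sketched in plain Python. This is a simplified stand-in, not the method that would actually be used: it normalizes counts to log2 counts per million (CPM) and scores each RNA with a one-way ANOVA F statistic across time points, whereas a real analysis would normally use count-based negative binomial models such as DESeq2 or edgeR. The time-point labels `T1`..`T6`, the sample names, and the helper names are hypothetical.

```python
import math
from statistics import mean

# Hypothetical labels: the six developmental time points, two replicates each.
TIME_POINTS = [f"T{i}" for i in range(1, 7)]
REPLICATES = [1, 2]
SAMPLES = [f"{t}_rep{r}" for t in TIME_POINTS for r in REPLICATES]  # 12 samples


def log_cpm(counts, prior=1.0):
    """counts[rna][sample] -> log2(CPM + prior), using per-sample library sizes."""
    lib_size = {s: sum(counts[g][s] for g in counts) for s in SAMPLES}
    return {
        g: {s: math.log2(counts[g][s] / lib_size[s] * 1e6 + prior) for s in SAMPLES}
        for g in counts
    }


def anova_f(values):
    """One-way ANOVA F statistic of one RNA's log-CPM values across time points."""
    groups = [[values[f"{t}_rep{r}"] for r in REPLICATES] for t in TIME_POINTS]
    grand = mean(v for grp in groups for v in grp)
    ss_between = sum(len(grp) * (mean(grp) - grand) ** 2 for grp in groups)
    ss_within = sum((v - mean(grp)) ** 2 for grp in groups for v in grp)
    df_between = len(groups) - 1                            # 6 - 1 = 5
    df_within = sum(len(grp) for grp in groups) - len(groups)  # 12 - 6 = 6
    if ss_within == 0:
        # Replicates agree perfectly: any between-time-point change is "infinite".
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)
```

RNAs could then be ranked with `sorted(lc, key=lambda g: anova_f(lc[g]), reverse=True)` where `lc = log_cpm(counts)`. With only six within-group degrees of freedom, any such test has limited power, which is part of why shrinkage-based tools are preferred for two-replicate designs.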

Regarding task (2): we are familiar with long ncRNA resources, and it is important to note that many long ncRNAs have not been functionally characterized; their functions are largely predicted. We are less familiar with snRNAs, snoRNAs and piRNAs. I expect Rfam to cover most of what we would need to annotate those RNA species, but this may turn out to be wrong, in which case additional work would be needed to pull information from other RNA databases to annotate them adequately.
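One simple starting point for task (2) is to attach a biotype to each quantified RNA and tally the expressed RNAs per class. The sketch below assumes an Ensembl-style GTF in which `gene` records carry a `gene_biotype` attribute; it uses that genome annotation rather than Rfam, and Rfam-based annotation (for example, scanning sequences against Rfam covariance models with Infernal) is not shown. The helper names and the `min_count=10` expression threshold are hypothetical.

```python
import re
from collections import Counter

# Matches GTF attribute pairs such as: gene_biotype "snoRNA"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gene_biotypes(gtf_lines):
    """Map gene_id -> gene_biotype using the 'gene' records of a GTF file."""
    biotypes = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue  # keep only well-formed gene-level records
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs:
            biotypes[attrs["gene_id"]] = attrs.get("gene_biotype", "unknown")
    return biotypes


def tally_expressed_by_biotype(expression, biotypes, min_count=10):
    """Count RNAs per biotype whose read count reaches a (hypothetical) threshold."""
    return Counter(
        biotypes.get(gene_id, "unannotated")
        for gene_id, count in expression.items()
        if count >= min_count
    )
```

RNAs that fall into the "unannotated" or generic classes are the ones for which Rfam or other ncRNA databases would need to be consulted.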


Note: LB stands for Lead Bioinformatician. An AccuraScience LB is a senior bioinformatics expert and leader of an AccuraScience data analysis team.

Disclaimer: This text was selected and edited based on genuine communications that took place between a customer and the AccuraScience data analysis team at the specified dates and times. The editing was made to protect the customer's privacy and for brevity. The edited text may or may not have been reviewed and approved by the customer. AccuraScience is solely responsible for the accuracy of the information reflected in this text.