Analysis of Narrow- and Broad-peak ChIP-seq Data (4/17/2015)


Fri, Apr 10, 2015 at 2:21 AM

Customer asks about ChIP-seq data analysis.

Fri, Apr 10, 2015 at 8:19 AM

AccuraScience LB: Assuming this is narrow-peak ChIP-seq data (i.e., for a transcription factor), the "routine" analysis procedure includes quality control of the sequencing data, mapping of all reads to the reference genome, peak calling, and identification of sequence motifs significantly enriched among the called peaks.
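As a rough illustration of the four steps listed above, the sketch below assembles one command line per step without running anything. The tool choices (FastQC, Bowtie 2, MACS2, HOMER) are assumptions made for illustration and are not named in the correspondence; the file names, the `hs` genome size and the `hg19` genome label are likewise hypothetical. Bowtie 2 does not read color-space data, so a color-space-aware aligner would have to be substituted for SOLiD reads.

```python
import shlex
from typing import List, Tuple


def routine_narrow_peak_steps(
    chip_fastq: str,
    input_fastq: str,
    index: str,
    genome: str,
    name: str,
    outdir: str,
) -> List[Tuple[str, List[str]]]:
    """Return (step, argv) pairs for a narrow-peak workflow; nothing is executed."""
    chip_sam = f"{outdir}/{name}.chip.sam"
    input_sam = f"{outdir}/{name}.input.sam"
    return [
        # 1. Sequencing data quality control (assumed tool: FastQC).
        ("quality control", ["fastqc", chip_fastq, input_fastq, "-o", outdir]),
        # 2. Mapping of reads to the reference genome (assumed tool: Bowtie 2).
        ("map ChIP reads",
         ["bowtie2", "-x", index, "-U", chip_fastq, "-S", chip_sam]),
        ("map control reads",
         ["bowtie2", "-x", index, "-U", input_fastq, "-S", input_sam]),
        # 3. Peak calling against the input DNA (or IgG) control
        #    (assumed tool: MACS2; "-g hs" assumes a human sample).
        ("peak calling",
         ["macs2", "callpeak", "-t", chip_sam, "-c", input_sam,
          "-f", "SAM", "-g", "hs", "-n", name, "--outdir", outdir]),
        # 4. Motifs enriched among the called peaks (assumed tool: HOMER).
        ("motif enrichment",
         ["findMotifsGenome.pl", f"{outdir}/{name}_summits.bed",
          genome, f"{outdir}/motifs", "-size", "200"]),
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    steps = routine_narrow_peak_steps(
        "chip.fastq", "input.fastq", "hg19_index", "hg19", "sample1", "results"
    )
    for step, argv in steps:
        print(f"# {step}")
        print(shlex.join(argv))
```

Returning the commands as data rather than running them keeps the sketch inspectable; each list could be passed to `subprocess.run` once the tools and paths are confirmed.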

Thu, Apr 16, 2015 at 3:33 PM

Customer forwards a few papers, which suggest that the protein of interest is not a transcription factor, but a well-known structural protein suspected to possess a novel transcription coregulator function, modulating transcription of a subset of cell cycle-related genes. Customer also forwards reports of the initial processing of the ChIP-seq data performed by the sequencing core.

Fri, Apr 17, 2015 at 4:56 PM

AccuraScience LB: The result summary files suggest that SOLiD sequencing was used. Could you confirm this?

ChIP-seq experiments are of two general types: (a) those for transcription factors, which produce narrow peaks, and (b) those for histone modifications, which produce broad peaks. The analysis procedures for the two types differ. Those for Type (a) involve enriched motif analysis and take advantage of input DNA (or IgG) control data. Those for Type (b), however, do not involve enriched motif analysis. Importantly, although 10 million mapped reads per sample are considered adequate for Type (a) ChIP-seq, 20 million mapped reads per sample are recommended (by the ENCODE project) for Type (b). It is uncertain whether this ChIP-seq experiment is more like Type (a) or Type (b), though my guess is that Type (b) is more likely. If so, the read numbers in the dataset (<20 million per sample) could be problematic. We could try to analyze the data regardless of the ENCODE recommendation, but it should be noted that this could produce less-than-optimal results.
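The read-depth concern can be made concrete with a small check. This is a minimal sketch, assuming thresholds of 10 million mapped reads for narrow-peak (Type a) data and 20 million for broad-peak (Type b) data; the function name, sample names and read counts are hypothetical and do not come from the customer's dataset.

```python
from typing import Dict

# Assumed minimum mapped reads per sample for each peak type.
MIN_MAPPED_READS = {
    "narrow": 10_000_000,  # Type (a): transcription factors
    "broad": 20_000_000,   # Type (b): histone modifications
}


def depth_report(mapped_reads: Dict[str, int], peak_type: str) -> Dict[str, dict]:
    """Flag each sample as adequate or not for the assumed peak type."""
    if peak_type not in MIN_MAPPED_READS:
        raise ValueError(f"peak_type must be one of {sorted(MIN_MAPPED_READS)}")
    required = MIN_MAPPED_READS[peak_type]
    return {
        sample: {
            "mapped": count,
            "required": required,
            "adequate": count >= required,
            "shortfall": max(0, required - count),
        }
        for sample, count in mapped_reads.items()
    }


if __name__ == "__main__":
    # Hypothetical mapped-read counts, for illustration only.
    counts = {"chip_rep1": 14_200_000, "chip_rep2": 21_500_000}
    for peak_type in ("narrow", "broad"):
        print(peak_type, depth_report(counts, peak_type))
```

Running the same counts under both peak types shows why the classification matters: a sample with fewer than 20 million mapped reads can pass as Type (a) data yet fall short as Type (b) data.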

Note: LB stands for Lead Bioinformatician. An AccuraScience LB is a senior bioinformatics expert and leader of an AccuraScience data analysis team.

Disclaimer: This text was selected and edited based on genuine communications that took place between a customer and the AccuraScience data analysis team at the specified dates and times. The editing was done to protect the customer’s privacy and for brevity. The edited text may or may not have been reviewed and approved by the customer. AccuraScience is solely responsible for the accuracy of the information reflected in this text.