BeadChip Normalization Problem (4/9/2015)


Tue, Apr 7, 2015 at 6:52 PM

Wed, Feb 4, 2015 at 11:13 AM

Customer: We have recently run an Illumina BeadChip array (HumanHT-12 v4) to identify differentially expressed genes, group/sub-group clustering, and annotated pathways that may be altered in our two cohorts. We started analyzing the raw data in GenomeStudio and quickly ran into problems with normalization. We have 2 BeadChips, i.e. a total of 24 samples. After applying GenomeStudio's normalization methods (quantile, average) and correcting for background and probe detection p-values, the samples still cluster according to their respective BeadChips. I have an idea of what needs to be done, but I do not have expertise in this work, and my staff would need to be trained in RStudio and Bioconductor. I am hoping to have this work completed in the next 1-1.5 months.

Wed, Apr 8, 2015 at 8:48 AM

AccuraScience LB: Quantile normalization is among the best normalization methods for canceling batch effects, so I am curious what you think might be the reason it has not worked. Did you also try the rank invariant option (also provided in GenomeStudio)? In any case, the short answer to your question is yes, we can help in this situation. If there are specific things you would like to try, we would carry out the R programming to implement them. Alternatively, we could examine the data ourselves, try to find out what the problems are, and then propose a remedy and carry it out.
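For readers unfamiliar with the method discussed here, the following is a minimal sketch of what quantile normalization does. It is written in plain Python purely for illustration (the thread itself anticipates R), it is not GenomeStudio's implementation, and the function name and toy intensities are hypothetical. It illustrates one possible reason the method may leave chip-level clustering intact: every sample is forced onto the same intensity distribution, but the rank order of probes within each sample is unchanged, so a probe that ranks differently on one chip still differs after normalization.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length intensity lists, one list per sample.

    Ties are broken by position rather than averaged, to keep the sketch short.
    """
    n_probes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean intensity across samples at each rank.
    rank_means = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_probes)
    ]
    normalized = []
    for s in samples:
        # order[rank] = index of the probe holding that rank in this sample.
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy data: 4 probes, one sample from each chip.
chip_a_sample = [2.0, 4.0, 6.0, 8.0]
chip_b_sample = [3.0, 9.0, 5.0, 7.0]  # second probe reads high on this chip

norm_a, norm_b = quantile_normalize([chip_a_sample, chip_b_sample])
print(sorted(norm_a) == sorted(norm_b))  # both samples now share one distribution
print(norm_a[1], norm_b[1])              # the second probe still differs between chips
```

In this toy case the two normalized samples contain exactly the same set of values, yet the second probe remains far apart between them, which is the kind of probe-specific difference a distribution-level normalization would not be expected to remove.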

Wed, Apr 8, 2015 at 7:05 PM

Customer: I don't really have an answer as to what may be causing or contributing to the samples clustering by BeadChip after normalization. I normalized the data again using the rank invariant method and got the same result: other than one outlier sample, all samples cluster according to their chips. I assume this would be a highly unlikely outcome if the clustering were truly biological. With my limited experience with BeadChip arrays, it looks like there is a serious batch effect, and I don't know why the normalization methods aren't working, given that the experiment is only two BeadChips processed at the same facility; your guess is better than mine. It might be a GenomeStudio glitch... Even if my two groups didn't cluster together biologically, I would expect control and experimental samples to cluster randomly across chips, not just within each chip.
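The intuition that chip-wise clustering is "highly unlikely" by chance can be put into rough numbers. The sketch below is an illustration under stated assumptions, not part of the original exchange: it assumes 12 samples per chip (24 in total, as described above), a clustering that splits the samples into two groups of 12, and chip membership that is unrelated to the clustering. The function name is hypothetical.

```python
from math import comb


def prob_cluster_matches_chip(n_per_chip=12, min_same=12):
    """Chance that a random cluster of n_per_chip samples (drawn from two chips
    of n_per_chip samples each) contains at least min_same samples from the
    same chip. Assumes min_same is more than half of n_per_chip."""
    total = comb(2 * n_per_chip, n_per_chip)
    # Hypergeometric count of clusters with k samples from one given chip.
    hits = sum(
        comb(n_per_chip, k) * comb(n_per_chip, n_per_chip - k)
        for k in range(min_same, n_per_chip + 1)
    )
    # Either chip could be the one that dominates the cluster.
    return 2 * hits / total


perfect = prob_cluster_matches_chip(min_same=12)   # clusters equal the chips
near = prob_cluster_matches_chip(min_same=11)      # at most one swap per cluster
print(f"perfect split by chip: {perfect:.2e}")
print(f"11 or more of 12 from one chip: {near:.2e}")
```

Under these assumptions a perfect split by chip has a probability of roughly 7e-7, and a split with at least 11 of 12 samples from one chip roughly 1e-4, which is consistent with treating the observed pattern as a batch effect rather than coincidence.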

Thu, Apr 9, 2015 at 4:36 PM

AccuraScience LB: Off the top of my head, there are a few things we might start with: examining the signal distribution of each sample, evaluating whether there are unexpected abnormalities in the rank order of expression levels, looking at the expression levels of housekeeping genes, and comparing your data with public array data from similar samples. Once the process starts, what we observe will point to the next direction to take - this is typical troubleshooting work.

One point I have to make: we would set out to identify the source of the batch effects and aim to eliminate (or reduce) them so that the array dataset can be properly analyzed, and we will do everything in our power to accomplish these goals, but there is no guarantee that our effort will lead to a positive outcome. It is possible that our troubleshooting will lead to the conclusion that something is fundamentally wrong with the data that cannot be corrected "digitally" on our end. In that case, after spending the funds to have us do the troubleshooting, you would still have to redo the experiments. I hope you will take this into consideration when deciding whether to have us do this, and how far you would like us to go.
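The first check mentioned in the reply above, examining the signal distribution of each sample, might be sketched as follows. This is only an illustration under assumptions: the sample names, chip labels, and intensities are hypothetical, the code is standard-library Python rather than the R/Bioconductor workflow the thread anticipates, and real data would come from the exported probe-level table.

```python
from statistics import median, quantiles


def summarize_samples(intensities, chip_of):
    """Return per-sample quartiles, tagged with the chip each sample ran on.

    intensities: dict mapping sample name -> list of probe intensities
    chip_of:     dict mapping sample name -> chip label
    """
    rows = []
    for sample, values in intensities.items():
        q1, _, q3 = quantiles(values, n=4)
        rows.append({
            "sample": sample,
            "chip": chip_of[sample],
            "q1": q1,
            "median": median(values),
            "q3": q3,
        })
    return rows


def median_by_chip(rows):
    """Median of the per-sample medians, grouped by chip."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["chip"], []).append(row["median"])
    return {chip: median(meds) for chip, meds in grouped.items()}


# Hypothetical toy data: two samples per chip, five probes each.
intensities = {
    "A1": [1, 2, 3, 4, 5],
    "A2": [2, 3, 4, 5, 6],
    "B1": [11, 12, 13, 14, 15],
    "B2": [12, 13, 14, 15, 16],
}
chip_of = {"A1": "chipA", "A2": "chipA", "B1": "chipB", "B2": "chipB"}

rows = summarize_samples(intensities, chip_of)
chip_medians = median_by_chip(rows)
print(chip_medians)
```

A large gap between the chip-level medians in the raw data, or quartiles that line up after normalization while samples still separate by chip, would each point the troubleshooting in a different direction.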


Note: LB stands for Lead Bioinformatician. An AccuraScience LB is a senior bioinformatics expert and leader of an AccuraScience data analysis team.

Disclaimer: This text was selected and edited based on genuine communications that took place between a customer and the AccuraScience data analysis team at the specified dates and times. The editing was done to protect the customer’s privacy and for brevity. The edited text may or may not have been reviewed and approved by the customer. AccuraScience is solely responsible for the accuracy of the information reflected in this text.