
Correction of batch effects in single cell RNA sequencing data using ComBat-Seq

Single-cell RNA sequencing yields expression profiles for individual cells, offering unprecedented insight into their behavior. Insight gained from exploring individual cells has implications for both cancer and developmental biology. Much of the power of this approach derives from the sheer amount and granularity of the data that can be collected; however, with this power comes the deleterious introduction of batch effects. Samples sequenced on different days or by different technicians can show variance that is attributable not to biological condition but only to the batch in which they were sequenced. These batch effects can alter the perceived relationship between the main effect and the outcome of interest; for instance, the main effect of cancer status may be hidden by the unwanted and unmodeled variance. Two known methods for correcting batch effects in bulk RNA sequencing data are ComBat-Seq and Surrogate Variable Analysis (SVA). In this work, we demonstrate that when cell type is known, including that covariate in the ComBat-Seq model results in an appropriate correction of the batch effect. We also demonstrate that when cell type is not known, SVA can be used to infer cell-type information from the latent structure of the count matrix, with some loss of accuracy compared to correction with the known cell type. This inferred cell-type information can be used in place of the actual cell-type covariate to correct single-cell RNA sequencing data with ComBat-Seq; inclusion of surrogate variables improves the accuracy of the correction in certain scenarios. Additionally, when cell type is not known and cell-type proportions are balanced between batches, we demonstrate that ComBat-Seq can be used naive to cell-type information. The efficacy of this procedure is demonstrated with two simulated datasets and a dataset containing Jurkat and 293T cells.
These results are then compared to Harmony, a recently reported batch correction algorithm. The procedure reported here has benefits over Harmony in certain situations, such as when a counts matrix is needed for further analysis or when substantial intra-cell-type variability is expected across batches.

Identifier: oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/42098
Date: 20 February 2021
Creators: Dullea, Jonathan Tyler
Contributors: Johnson, W. Evan; Campbell, Joshua D.
Source Sets: Boston University
Language: en_US
Detected Language: English
Type: Thesis/Dissertation
