Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.

Abstract

Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
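To make the central idea concrete, the following is a minimal sketch of how mutual nearest neighbor pairs between two batches might be detected. It is an illustration under stated assumptions, not the authors' implementation: the function name `find_mnn_pairs`, the parameter `k`, the brute-force distance computation, and the use of plain Euclidean distance on toy two-dimensional data are all choices made here for illustration. A cell in one batch and a cell in the other form an MNN pair only if each is among the other's k nearest neighbors across batches, so cells from a population that is absent from the other batch tend not to be paired.

```python
import numpy as np


def find_mnn_pairs(batch_a, batch_b, k=3):
    """Return (i, j) pairs where cell i of batch_a and cell j of batch_b
    are each among the other's k nearest neighbours across batches.

    Hypothetical helper for illustration; uses brute-force squared
    Euclidean distances, which is only practical for small inputs.
    """
    # Pairwise squared distances, shape (n_a, n_b).
    diff = batch_a[:, None, :] - batch_b[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)

    k_ab = min(k, batch_b.shape[0])
    k_ba = min(k, batch_a.shape[0])

    # For each cell in A, the indices of its nearest cells in B.
    nn_a_to_b = np.argsort(dist, axis=1)[:, :k_ab]
    # For each cell in B, the indices of its nearest cells in A.
    nn_b_to_a = np.argsort(dist.T, axis=1)[:, :k_ba]
    b_sets = [set(row.tolist()) for row in nn_b_to_a]

    # Keep only the pairs that are nearest neighbours in both directions.
    return [
        (i, int(j))
        for i in range(batch_a.shape[0])
        for j in nn_a_to_b[i]
        if i in b_sets[int(j)]
    ]


# Toy data (assumed): batch A holds two populations, batch B holds only
# the first one, shifted by +1 along the first axis as a mock batch effect.
batch_a = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
batch_b = np.array([[1.0, 0.0], [1.0, 1.0]])

pairs = find_mnn_pairs(batch_a, batch_b, k=1)
print(pairs)  # [(0, 0), (1, 1)]
```

In this toy example, the two cells of the population found only in batch A (rows 2 and 3) have nearest neighbors in batch B, but the relationship is not mutual, so they yield no pairs. This mirrors the requirement stated in the abstract that only a subset of the population needs to be shared between batches.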

Keywords

Gene Expression

MeSH terms

Algorithms

Cluster Analysis

Data Analysis

High-Throughput Nucleotide Sequencing

Sequence Analysis, RNA

Single-Cell Analysis

Authors

Haghverdi, Laleh

Lun, Aaron T L

Morgan, Michael D

Marioni, John C
