PMID- 34048541
OWN - NLM
STAT- In-Process
VI - 37
IP - 22
TI - scDetect: a rank-based ensemble learning algorithm for cell type identification of single-cell RNA sequencing in cancer.
PG - 4115-4122
CI - © The Author(s) 2021. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
LA - eng
PT - Journal Article
PL - England
TA - Bioinformatics
JT - Bioinformatics (Oxford, England)
JID - 9808944
IS - 1367-4811 (Electronic)
LID - 10.1093/bioinformatics/btab410 [doi]
FAU - Shen, Yifei
AU - Shen Y
AUID- ORCID: 0000-0003-2720-724X
AD - China Department of Laboratory Medicine, First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou 310003, China.
AD - China Key Laboratory of Clinical In Vitro Diagnostic Techniques of Zhejiang Province, Hangzhou 310003, China.
AD - China Institute of Laboratory Medicine, Zhejiang University, Hangzhou 310003, China.
FAU - Chu, Qinjie
AU - Chu Q
AD - China Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China.
FAU - Timko, Michael P
AU - Timko MP
AD - USA Departments of Biology and Public Health Sciences, University of Virginia, Charlottesville, VA 22903, USA.
FAU - Fan, Longjiang
AU - Fan L
AD - China Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China.
AD - China Department of Medical Oncology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310003, China.
IS - 1367-4803 (Linking)
SB - IM
LR - 20230412
DP - 2021 Nov 18
AB - MOTIVATION: Single-cell RNA sequencing (scRNA-seq) has enabled the characterization of different cell types in many tissues and tumor samples. Cell type identification is essential for single-cell RNA profiling, currently transforming the life sciences.
Often, this is achieved by searching for combinations of genes that have previously been implicated as being cell-type specific, an approach that is not quantitative and does not explicitly take advantage of other scRNA-seq studies. Batch effects and differences between data platforms greatly decrease predictive performance in inter-laboratory and cross-data-type validation. RESULTS: Here, we present a new ensemble learning method named 'scDetect' that combines gene expression rank-based analysis with a majority-vote ensemble machine-learning probability-based prediction method, capable of highly accurate classification of cells based on scRNA-seq data from different sequencing platforms. Because tumors are heterogeneous, we have also incorporated cell copy number variation consensus clustering and an epithelial score into the classification to accurately predict tumor cells in single-cell RNA-seq data. We applied scDetect to scRNA-seq data from pancreatic tissue, mononuclear cells and tumor biopsy cells and show that scDetect classified individual cells with high accuracy, outperforming other publicly available tools. AVAILABILITY AND IMPLEMENTATION: scDetect is open-source software. Source code and test data are freely available from GitHub (https://github.com/IVDgenomicslab/scDetect/) and Zenodo (https://zenodo.org/record/4764132#.YKCOlrH5AYN). Examples and a tutorial are available at https://ivdgenomicslab.github.io/scDetect-Introduction/. scDetect will also be available from Bioconductor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.