PMID- 32657394
OWN - NLM
STAT- MEDLINE
VI  - 36
IP  - Suppl_1
TI  - A Bayesian framework for inter-cellular information sharing improves dscRNA-seq quantification.
PG  - i292-i299
CI  - © The Author(s) 2020. Published by Oxford University Press.
LA  - eng
PT  - Journal Article
PT  - Research Support, N.I.H., Extramural
PT  - Research Support, U.S. Gov't, Non-P.H.S.
PL  - England
TA  - Bioinformatics
JT  - Bioinformatics (Oxford, England)
JID - 9808944
IS  - 1367-4811 (Electronic)
LID - 10.1093/bioinformatics/btaa450 [doi]
FAU - Srivastava, Avi
AU  - Srivastava A
AD  - Department of Computer Science, Stony Brook University, Stony Brook 11794, NY, USA.
FAU - Malik, Laraib
AU  - Malik L
AD  - Department of Computer Science, Stony Brook University, Stony Brook 11794, NY, USA.
FAU - Sarkar, Hirak
AU  - Sarkar H
AD  - Computer Science Department, University of Maryland, College Park 20742, MD, USA.
FAU - Patro, Rob
AU  - Patro R
AD  - Computer Science Department, University of Maryland, College Park 20742, MD, USA.
IS  - 1367-4803 (Linking)
SB  - IM
MH  - Algorithms
MH  - Bayes Theorem
MH  - Gene Expression Profiling
MH  - *Information Dissemination
MH  - RNA-Seq
MH  - Sequence Analysis, RNA
MH  - *Software
PMC - PMC7355277
DCOM- 20210308
LR  - 20210308
DP  - 20200701
AB  - MOTIVATION: Droplet-based single-cell RNA-seq (dscRNA-seq) data are being generated at an unprecedented pace, and the accurate estimation of gene-level abundances for each cell is a crucial first step in most dscRNA-seq analyses. When pre-processing the raw dscRNA-seq data to generate a count matrix, care must be taken to account for the potentially large number of multi-mapping locations per read. The sparsity of dscRNA-seq data, and the strong 3' sampling bias, makes it difficult to disambiguate cases where there is no uniquely mapping read to any of the candidate target genes.
RESULTS: We introduce a Bayesian framework for information sharing across cells within a sample, or across multiple modalities of data using the same sample, to improve gene quantification estimates for dscRNA-seq data. We use an anchor-based approach to connect cells with similar gene-expression patterns, and learn informative, empirical priors which we provide to alevin's gene multi-mapping resolution algorithm. This improves the quantification estimates for genes with no uniquely mapping reads (i.e. when there is no unique intra-cellular information). We show our new model improves the per cell gene-level estimates and provides a principled framework for information sharing across multiple modalities. We test our method on a combination of simulated and real datasets under various setups. AVAILABILITY AND IMPLEMENTATION: The information sharing model is included in alevin and is implemented in C++14. It is available as open-source software, under GPL v3, at https://github.com/COMBINE-lab/salmon as of version 1.1.0.
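The abstract describes feeding informative, empirical priors into alevin's gene multi-mapping resolution step so that genes with no uniquely mapping reads can still be told apart. As a rough illustration only, the Python sketch below shows how per-gene prior pseudo-counts can break the tie for UMIs shared by genes that have no unique evidence in a cell. It assumes an EM-style resolution over equivalence classes (sets of genes a group of UMIs is compatible with), with the prior added as pseudo-counts in the M-step, which corresponds to MAP estimation under a Dirichlet prior. The function name `map_em`, the data layout, and the pseudo-count values are hypothetical; this is not alevin's C++ implementation or the paper's exact model, and the anchor-based step that would learn the priors from similar cells is not shown.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: (genes the UMIs are compatible with, number of UMIs).
EqClass = Tuple[Sequence[str], float]


def map_em(
    eq_classes: List[EqClass],
    prior: Dict[str, float],
    max_iter: int = 1000,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """Resolve multi-mapping UMIs for one cell using prior pseudo-counts.

    Illustrative sketch only (not alevin's algorithm). `prior` maps genes to
    non-negative pseudo-counts, e.g. assumed to come from similar cells.
    Returns the expected number of observed UMIs assigned to each gene.
    """
    genes = sorted({g for members, _ in eq_classes for g in members} | set(prior))
    alpha = {g: 1.0 for g in genes}  # uniform initialisation

    def e_step(weights: Dict[str, float]) -> Dict[str, float]:
        # Split each class's UMI count in proportion to the current weights.
        expected = {g: 0.0 for g in genes}
        for members, count in eq_classes:
            denom = sum(weights[g] for g in members)
            if denom == 0.0:
                continue  # no member has any weight; leave these UMIs unassigned
            for g in members:
                expected[g] += count * weights[g] / denom
        return expected

    for _ in range(max_iter):
        expected = e_step(alpha)
        # M-step: expected data counts plus prior pseudo-counts.
        new_alpha = {g: expected[g] + prior.get(g, 0.0) for g in genes}
        delta = max((abs(new_alpha[g] - alpha[g]) for g in genes), default=0.0)
        alpha = new_alpha
        if delta < tol:
            break

    # Report only the data-derived counts, without the pseudo-counts.
    return e_step(alpha)


if __name__ == "__main__":
    # Ten UMIs shared by two genes, neither of which has a unique read.
    classes = [(("GeneA", "GeneB"), 10.0)]
    print(map_em(classes, prior={}))  # {'GeneA': 5.0, 'GeneB': 5.0}
    print(map_em(classes, prior={"GeneA": 3.0, "GeneB": 1.0}))  # approximately 7.5 and 2.5
```

With no prior, the EM has no intra-cellular signal and the shared UMIs stay split evenly; with the hypothetical 3:1 pseudo-counts, the split follows the prior ratio. This mirrors, in a simplified form, the abstract's point that information borrowed from other cells matters most when there is no uniquely mapping read.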