ImPLoc: a multi-instance deep learning model for the prediction of protein subcellular localization based on immunohistochemistry images.

PMID: 31804670

Cited by: 26

Abstract

The tissue atlas of the Human Protein Atlas (HPA) houses immunohistochemistry (IHC) images that visualize protein distribution from the tissue level down to the cell level, providing an important resource for studying the human spatial proteome. In particular, the protein subcellular localization patterns revealed by these images help in understanding protein functions, and analysis of differential localization between normal and cancer tissues can lead to new cancer biomarkers. However, computational tools for processing the images in this database remain highly underdeveloped. Recognizing localization patterns is hampered by variation in image quality and by the difficulty of detecting microscopic targets. We propose ImPLoc, a deep multi-instance multi-label model that predicts subcellular locations from IHC images. The model uses a deep convolutional neural network (CNN)-based feature extractor to represent image features and a multi-head self-attention encoder to aggregate multiple feature vectors for subsequent prediction. We construct a benchmark dataset from HPA comprising 1186 proteins, 7855 images and 6 subcellular locations. Experimental results show that ImPLoc significantly improves prediction accuracy compared with current computational methods. We further apply ImPLoc to a test set of 889 proteins with images from both normal and cancer tissues and identify 8 differentially localized proteins at a significance level of 0.05.

The source code is available at https://github.com/yl2019lw/ImPloc. Supplementary data are available at Bioinformatics online.
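The abstract describes a multi-instance, multi-label design. Each protein is a "bag" of image feature vectors from a CNN, a multi-head self-attention encoder combines them, and the model can output several locations at once. The Python sketch below illustrates that idea under explicit assumptions. It is not the authors' implementation, which lives in the linked repository.

- The CNN is omitted, and features are given directly as vectors.
- All weights are random placeholders rather than trained parameters.
- Mean pooling after the encoder, a single linear output layer with independent sigmoids, and a 0.5 decision threshold are our own choices.
- The function names are hypothetical.
- Layer normalization, residual connections and feed-forward sublayers of a full Transformer encoder are left out.

```python
import math
import random

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix; both are lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights: small random Gaussian values."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]

def multi_head_self_attention(x, heads, rng):
    """Scaled dot-product self-attention over the instances (rows) of x, split into heads."""
    n, d = len(x), len(x[0])
    assert d % heads == 0, "feature size must be divisible by the number of heads"
    dh = d // heads
    concat = [[0.0] * d for _ in range(n)]
    for h in range(heads):
        wq, wk, wv = (rand_matrix(d, dh, rng) for _ in range(3))
        q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
        for i in range(n):
            # Each instance (image) attends to every instance in the bag, including itself.
            scores = [sum(q[i][c] * k[j][c] for c in range(dh)) / math.sqrt(dh) for j in range(n)]
            weights = softmax(scores)
            for c in range(dh):
                concat[i][h * dh + c] = sum(weights[j] * v[j][c] for j in range(n))
    wo = rand_matrix(d, d, rng)  # output projection that mixes the concatenated heads
    return matmul(concat, wo)

def predict_locations(bag, num_labels=6, heads=2, threshold=0.5, seed=0):
    """Aggregate a bag of per-image feature vectors into multi-label location scores."""
    rng = random.Random(seed)
    d = len(bag[0])
    encoded = multi_head_self_attention(bag, heads, rng)
    # Mean pooling makes the bag representation independent of image order.
    pooled = [sum(row[c] for row in encoded) / len(encoded) for c in range(d)]
    w_out = rand_matrix(d, num_labels, rng)
    logits = [sum(pooled[c] * w_out[c][j] for c in range(d)) for j in range(num_labels)]
    # Independent sigmoids (not a softmax): a protein may reside in several locations.
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    labels = [j for j, p in enumerate(probs) if p >= threshold]
    return probs, labels

# Usage: one protein represented by 3 images, each already reduced to an 8-dim feature vector.
data_rng = random.Random(42)
bag = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(3)]
probs, labels = predict_locations(bag)
print([round(p, 3) for p in probs], labels)
```

Self-attention is permutation-equivariant and mean pooling is permutation-invariant. Shuffling a protein's images therefore leaves the predicted scores unchanged, which suits a bag of images that has no natural order. The per-label sigmoid reflects the multi-label setting, in which one protein can be assigned to more than one of the 6 locations.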

MeSH terms

Deep Learning

Humans

Immunohistochemistry

Neural Networks, Computer

Protein Transport

Proteome

Authors

Long, Wei

Yang, Yang

Shen, Hong-Bin

Recommended literature

1. Learning complex subcellular distribution patterns of proteins via analysis of immunohistochemistry images.

2. MIC_Locator: a novel image-based protein subcellular location multi-label prediction model based on multi-scale monogenic signal representation and intensity encoding strategy.

3. ColocML: machine learning quantifies co-localization between mass spectrometry images.

4. AnnoFly: annotating Drosophila embryonic images based on an attention-enhanced RNN model.

5. Rotation equivariant and invariant neural networks for microscopy image analysis.