iProStruct2D: Identifying protein structural classes by deep learning via 2D representations

[abstract]

In this paper, we address the problem of protein classification starting from multi-view 2D snapshots of proteins. Using JMol, a well-known protein visualization software, a set of multi-view 2D representations comprising 13 different types of protein visualization is rendered. The 13 visualization types emphasize specific properties of protein structure (e.g. a backbone visualization that displays the backbone structure of the protein as a trace of the Cα atoms), while different points of view in 3D space are used to capture the protein shape. Given this set of 2D snapshots for each protein, deep learning is used to perform protein classification directly from the 2D images. Each type of representation is used to train a separate Convolutional Neural Network (CNN), and the fusion of these CNNs is shown to exploit the diversity of the different representation types to improve classification performance. The multi-view projections, obtained by uniformly rotating the protein structure around its central X, Y, and Z viewing axes, serve as a form of data augmentation during both the training and testing phases. The resulting approach, named iProStruct2D, differs from most existing methods in the literature, which are based on protein alignment or on measuring the distance between 3D representations of proteins. Experimental evaluation of the proposed approach on two datasets demonstrates the strength of iProStruct2D with respect to other state-of-the-art approaches. The MATLAB code used in this paper is available at https://github.com/LorisNanni.
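As an illustration of the two ideas above (uniform rotation about the X, Y, and Z axes, and fusion of one CNN per representation type), the following minimal Python sketch may help. It is not the released MATLAB code. The abstract does not state the fusion rule or the number of views, so the averaging of softmax scores over views and over representation types (a sum rule), the function names, the representation name "cartoon", and the example scores are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple


def rotation_angles(n_views: int) -> List[float]:
    """Uniformly spaced rotation angles (degrees) over one full turn."""
    return [i * 360.0 / n_views for i in range(n_views)]


def view_grid(n_views: int) -> List[Tuple[str, float]]:
    """(axis, angle) pairs for rotations about the X, Y, and Z axes."""
    return [(axis, angle) for axis in ("x", "y", "z")
            for angle in rotation_angles(n_views)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over raw class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(scores_by_type: Dict[str, List[List[float]]]) -> List[float]:
    """Assumed sum-rule fusion.

    scores_by_type[rep_type][view] holds the raw class scores produced by
    the CNN trained on that representation type for one rendered view.
    Scores are averaged over views, then over representation types.
    """
    fused: List[float] = []
    for views in scores_by_type.values():
        probs = [softmax(v) for v in views]
        n_classes = len(probs[0])
        per_type = [sum(p[c] for p in probs) / len(probs)
                    for c in range(n_classes)]
        fused = per_type if not fused else [f + p for f, p in zip(fused, per_type)]
    return [f / len(scores_by_type) for f in fused]


def predict(scores_by_type: Dict[str, List[List[float]]]) -> int:
    """Index of the class with the highest fused score."""
    fused = fuse(scores_by_type)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores: two representation types, two views, two classes.
example_scores = {
    "backbone": [[2.0, 0.0], [1.0, 0.0]],
    "cartoon": [[0.0, 0.5], [3.0, 0.0]],
}
predicted_class = predict(example_scores)
```

In this sketch the test-time views play the same role as the training-time views: each rendered snapshot contributes one score vector, so a single misleading viewpoint is outweighed by the others.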

Keywords: Protein classification; protein visualization; deep learning; convolutional neural networks.

[full paper]