Set of approaches based on 3D structure and Position Specific Scoring Matrix for predicting DNA-binding proteins

[abstract]

Motivation: Because DNA-binding proteins (DNA-BPs) play a vital role in all aspects of genetic activity, the development of reliable and efficient systems for automatic DNA-BP classification is becoming a crucial proteomic technology. Key to this technology is the discovery of powerful protein representations and feature extraction methods. The goal of this paper is to develop experimentally a system for automatic DNA-BP classification by comparing and combining different descriptors taken from different types of protein representations.

Results: The descriptors we evaluate include those starting from the Position Specific Scoring Matrix (PSSM) of proteins, those derived from the Amino-Acid Sequence (AAS), various matrix representations of proteins, and features taken from the 3-dimensional tertiary structure of proteins. We also introduce some new variants of protein descriptors. Each descriptor is used to train a separate support vector machine (SVM), and the results are combined by the sum rule. Our final system obtains state-of-the-art results on three benchmark DNA-BP datasets.
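
The sum-rule combination mentioned above can be illustrated with a minimal sketch. It assumes each per-descriptor SVM has already produced one signed decision score per protein (positive meaning DNA-binding); the function name `sum_rule`, the score values, and the zero threshold are hypothetical choices for illustration, not taken from the released MATLAB code. When scores from different classifiers are on different scales, they would typically be normalized before summing; that step is omitted here.

```python
import numpy as np

def sum_rule(score_lists):
    """Fuse classifier outputs by summing their scores sample by sample.

    score_lists: one sequence of decision scores per classifier,
    all of the same length (one score per protein).
    """
    scores = np.asarray(score_lists, dtype=float)  # shape: (n_classifiers, n_samples)
    return scores.sum(axis=0)                      # shape: (n_samples,)

# Hypothetical decision scores for four proteins from three SVMs,
# each trained on a different descriptor (illustrative values only).
pssm_scores = [1.2, -0.7, 0.3, -1.5]
aas_scores = [0.4, -0.2, -0.1, -0.9]
structure_scores = [2.0, -1.0, 0.5, -2.5]

fused = sum_rule([pssm_scores, aas_scores, structure_scores])

# Assumed decision threshold: a positive fused score means "DNA-binding".
predicted = fused > 0
print(fused, predicted)
```

In this toy example the third protein is scored negatively by the AAS-based classifier but is still labeled DNA-binding after fusion, because the other two classifiers outweigh it.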

Supplementary information: The MATLAB code for replicating the experiments presented in this paper is available at https://github.com/LorisNanni.

Keywords: Protein representations, PSSM, AAS, matrix representations

[full paper]