Protein Secondary Structure Prediction Using Support Vector Machines and a New Feature Representation

Full text for this resource is not available from the Research Repository.

Gubbi, J, Lai, Daniel ORCID: 0000-0003-3459-7709, Palaniswami, M and Parker, M (2006) Protein Secondary Structure Prediction Using Support Vector Machines and a New Feature Representation. International Journal of Computational Intelligence and Applications, 6 (4). pp. 551-567. ISSN 1469-0268

Abstract

Knowledge of the secondary structure and solvent accessibility of a protein plays a vital role in the prediction of its fold, and eventually its tertiary structure. This paper addresses the challenging problem of predicting protein secondary structure from sequence alone. Support vector machines (SVMs) are employed for classification, and the SVM outputs are converted to posterior probabilities for multi-class classification. The effect of using Chou–Fasman parameters and physico-chemical parameters along with evolutionary information in the form of a position-specific scoring matrix (PSSM) is analyzed. The proposed methods are tested on the RS126 and CB513 datasets. A new dataset (PSS504) is curated using a recent release of CATH. On the CB513 dataset, a sevenfold cross-validation accuracy of 77.9% was obtained using the proposed encoding method. A new method of calculating the reliability index, based on the number of votes and the SVM decision value, is also proposed. A blind test on the EVA dataset gives an average Q3 accuracy of 74.5% and ranks among the top five protein structure prediction methods. Supplementary material, including datasets, is available at http://www.ee.unimelb.edu.au/ISSNIP/bioinf/.
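
To make two terms in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. Q3 accuracy is the standard per-residue measure: the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The sliding-window encoding illustrates one common way per-residue profile rows such as PSSM rows are turned into a fixed-length classifier input. The function names, the window half-width, the zero padding at sequence ends, and the toy values are all assumptions made for illustration; the paper's actual encoding is not described on this page.

```python
from typing import List, Sequence


def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label (H, E, C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have equal length")
    if not observed:
        raise ValueError("label strings must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def window_features(
    profile: Sequence[Sequence[float]], center: int, half_width: int
) -> List[float]:
    """Concatenate profile rows in a window around `center`.

    Positions that fall outside the sequence are zero-padded
    (an assumption; other padding schemes are possible).
    """
    n_cols = len(profile[0])
    features: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(profile):
            features.extend(profile[pos])
        else:
            features.extend([0.0] * n_cols)
    return features


# Toy example: 8 residues, one mismatch at index 5.
score = q3_accuracy("HHHEECCC", "HHHEEECC")

# Toy 3-residue profile with 2 columns per residue (a real PSSM has 20).
toy_profile = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
first_residue_input = window_features(toy_profile, center=0, half_width=1)
```

In a pipeline of the kind the abstract outlines, each residue's window vector would be passed to the classifiers, and the predicted label string would then be scored against the observed one with the Q3 measure.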

Item type: Article
URI: https://vuir.vu.edu.au/id/eprint/2984
DOI: 10.1142/S1469026806002076
Official URL: https://www.worldscientific.com/doi/abs/10.1142/S1...
Subjects: Historical > FOR Classification > 0903 Biomedical Engineering; Historical > Faculty/School/Research Centre/Department > School of Sport and Exercise Science
Keywords: ResPubID19009; protein secondary structure prediction; support vector machines; position specific scoring matrix (PSSM); Chou–Fasman parameters; Kyte–Doolittle hydrophobicity; Grantham polarity; reliability index; novel encoding scheme