Introduction & Further Information

RNA (ribonucleic acid) is a functionally important ligand with diverse roles in the cell, including translation, transcription and catalysis. Identifying the RNA-binding residues in protein sequences known to bind RNA contributes to the understanding of many biological systems. Computational function annotation methods are essential for initial screening before experimental protocols such as mutagenesis.

PiRaNhA, a server for predicting the RNA-binding residues in a protein sequence, has recently been developed in a collaboration between the Bioinformatics group, University of Sussex, UK and the Laboratory of Protein Informatics, Institute for Protein Research, Osaka University, Japan.

Support Vector Machine (SVM) models were trained on a dataset of 81 non-redundant RNA-binding proteins from the PDB to distinguish between RNA-binding and non-RNA-binding residues in protein sequences. The models use a combination of features calculated from overlapping subsequences of various lengths: position-specific scoring matrices, residue interface propensity, predicted residue accessibility and residue hydrophobicity. RNA-binding residues are defined as those involved in hydrogen bonds or van der Waals interactions with the RNA molecule.

PiRaNhA achieves a Matthews Correlation Coefficient (MCC) of 0.50 and an area under the ROC curve (AUC) of 0.86 in a 5-fold cross-validation. When tested on a further 42 non-redundant RNA-binding proteins, all of which are non-homologous to the training set, PiRaNhA achieves an MCC of 0.41. Testing on previously unseen sequences gives a realistic estimate of how well the model will perform when used to make predictions for a novel protein. In both the cross-validation and the independent test, PiRaNhA outperforms all previously available machine learning-based methods.
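For readers unfamiliar with the Matthews Correlation Coefficient quoted above, the following is a minimal sketch of how it can be computed from per-residue predictions. It is an illustration only, not PiRaNhA's own code: the function name and the toy labels are hypothetical, with 1 standing for an RNA-binding residue and 0 for a non-binding residue.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Return the MCC for two equal-length lists of binary labels.

    1 = RNA-binding residue, 0 = non-binding residue (illustrative convention).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # By convention, MCC is taken as 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example (made-up labels, not PiRaNhA data).
observed = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 1]
mcc = matthews_corrcoef(observed, predicted)
print(f"MCC = {mcc:.2f}")
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction correct). Because it uses all four cells of the confusion matrix, it remains informative when binding residues are far outnumbered by non-binding ones, as is typical in protein sequences.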
PiRaNhA is an accurate web-based prediction server that enables users to upload sequences and obtain predictions of potential RNA-binding residues, both on screen and as downloadable files. If you use the server, please cite:
Spriggs, R.V., Murakami, Y., Nakamura, H. & Jones, S. (2009).
Last Update 16/12/2009