Training Artificial Neural Networks To Identify Phage Structural Proteins
Michael Arnoult§, Victor Seguritan#, Peter Salamon¥, and Anca Segall§.
Department of Computational Science#, Department of Biology§, and Department of Mathematics and Statistics¥, San Diego State University
Bacteriophages are the single most abundant biological entity on earth and influence every environment in which bacteria exist. Bacteriophages, their physical components, and their functions are therefore of great interest to the scientific community. No current algorithm reliably analyzes phage structural protein sequences and predicts their function. This research aims to develop a new tool that classifies phage structural proteins using Artificial Neural Networks, a computational method of analysis inspired by biological neurons. Features of phage protein sequences with known classifications are used to train the neural networks. The networks then predict whether an unknown protein sequence has a specified function. Analysis of these predictions will allow biologists to decide, with some measure of accuracy, which proteins are the most appropriate candidates for their research needs.
Training and testing neural networks for this bioinformatics purpose is a multistep process. Known phage major capsid proteins and tail proteins were collected, and sequences with inappropriate descriptions were removed. The percent composition of the amino acids, along with four other mathematical representations of positive and negative sequence examples, was used to train the neural networks. In test cases, the neural networks classify phage major capsid sequences and non-major-capsid sequences with an average accuracy of 90%. Several neural networks will be trained using multiple combinations of these mathematical representations and tested on phage and non-phage genomes. Comparing the results will provide insight into the differences between phage structural proteins from a variety of molecular and organismal perspectives. The validity of at least some of the predictions will be determined experimentally and by ClustalW analysis.
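The percent amino acid composition mentioned above can be computed as a fixed-length, 20-element feature vector, one entry per standard amino acid. Classification accuracy is simply the fraction of correct predictions. The sketch below makes several assumptions not stated in the abstract. It counts only the 20 standard one-letter codes and silently ignores others such as `X`, `B`, `Z`, or `*`. It also orders the features alphabetically by one-letter code. The helper names are hypothetical, and the four other representations the authors used are not reproduced here.

```python
from collections import Counter

# The 20 standard amino acids, alphabetical by one-letter code (assumed ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def percent_composition(sequence: str) -> list[float]:
    """Percentage of each standard amino acid in the sequence, in AMINO_ACIDS order.

    Characters that are not standard one-letter codes are skipped (assumption).
    """
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]

def accuracy(predicted: list[int], actual: list[int]) -> float:
    """Fraction of predictions that match the known labels."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

if __name__ == "__main__":
    vec = percent_composition("AAGG")
    print(vec[AMINO_ACIDS.index("A")], vec[AMINO_ACIDS.index("G")])  # 50.0 50.0
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

Percentages are often rescaled to fractions (dividing by 100) before being fed to a network, so that every input lies in a similar range.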