The Guide to Phage Genomics:
Post-Assembly Analyses

OVERVIEW

ORF finding
BLASTing the ORFs
Identifying ORFs by synteny

tRNA

Repeats

Codon Bias
Dinucleotide analysis
Comparative genomics analyses
Placement on the Phage Proteomic Tree

Rob Edwards' Miscellaneous Scripts



ORF Finding

We recommend the following ORF finders, depending on the amount and type of ORF finding you will be doing.

A. ORF Finder (NCBI) is a graphical, web-based tool for finding ORFs in shorter DNA sequences. It is available at http://www.ncbi.nlm.nih.gov/gorf/gorf.html

B. Glimmer is an ORF finder designed for finding genes in prokaryotic DNA. It is provided by TIGR at http://www.tigr.org/software/glimmer/. If Glimmer works for your genome, you can also use a freely available Perl program that takes Glimmer's predictions as input and searches them with BLAST and FASTA against any locally installed protein database. Papers describing Glimmer 1.0 and 2.0 are available at the Glimmer home page:
(1) Salzberg S., Delcher A., Kasif S., and White O. 1998. Microbial gene identification using interpolated Markov models. Nucleic Acids Research 26: 544-548.
(2) Delcher A.L., Harmon D., Kasif S., White O., and Salzberg S.L. 1999. Improved microbial gene identification with GLIMMER. Nucleic Acids Research 27: 4636-4641.

C. The GeneMark suite of ORF finders is provided by Mark Borodovsky's Bioinformatics Group at the Georgia Institute of Technology and can be downloaded at http://opal.biology.gatech.edu/GeneMark/. Of particular interest for phage genomes is GeneMarkS, which is designed for finding genes in prokaryotic DNA, with a specific focus on identifying gene starts and on detecting and modeling functional sites in upstream sequences (e.g., ribosomal binding sites). Two GeneMark ORF finders for prokaryotic viruses are available at http://opal.biology.gatech.edu/GeneMark/virus.html:
(1) "GeneMarkS" is designed for prokaryotic viruses with larger genomes (>100 kb).
(2) "Heuristic Approach" is designed to handle smaller phage genomes (<100 kb).

GeneMarkS is described in: Besemer J., Lomsadze A. and Borodovsky M. 2001. GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Research 29: 2607-2618. The PDF is available from the online version of Nucleic Acids Research.
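
To make concrete what an ORF finder does at its very simplest, here is a minimal Python sketch of a six-frame scan with a crude check for a ribosome binding site (RBS) upstream of each start. It is not how ORF Finder, Glimmer, or GeneMarkS work internally (GeneMarkS, for example, learns its models by self-training). The assumptions are ours: ATG is the only start codon, TAA/TAG/TGA are the stops, `min_codons` counts the stop codon, and the RBS score is the best ungapped match to "AGGAGG" within 20 nt upstream. The function names are hypothetical.

```python
# Illustrative six-frame ORF scan with a crude RBS check (not a published tool).
# Assumptions: ATG-only starts; TAA/TAG/TGA stops; min_codons includes the stop;
# RBS = best ungapped match to "AGGAGG" within `window` nt upstream of the ATG.

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def best_rbs_match(strand_seq: str, start: int, motif: str = "AGGAGG", window: int = 20):
    """Return (matching_positions, spacer) for the best match to `motif` upstream of `start`.

    `strand_seq` must be the strand the gene is on. `spacer` is the number of bases
    between the end of the motif and the start codon, or None if nothing matches.
    """
    upstream = strand_seq[max(0, start - window):start].upper()
    best = (0, None)
    for i in range(len(upstream) - len(motif) + 1):
        score = sum(a == b for a, b in zip(upstream[i:i + len(motif)], motif))
        if score > best[0]:  # ties keep the most upstream match
            best = (score, len(upstream) - i - len(motif))
    return best


def find_orfs(seq: str, min_codons: int = 100) -> list:
    """Return (strand, start, end, rbs_score) tuples.

    start/end are 0-based, half-open coordinates on the forward strand. Each ORF runs
    from the first ATG after an in-frame stop to the next stop, so only the longest ORF
    per stop is kept; ORFs that run off the end of the sequence are ignored.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # index of the first ATG since the last in-frame stop
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i
                elif codon in STOPS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        rbs_score = best_rbs_match(s, start)[0]
                        if strand == "+":
                            orfs.append((strand, start, end, rbs_score))
                        else:
                            # Map reverse-strand positions back onto the forward strand.
                            orfs.append((strand, n - end, n - start, rbs_score))
                    start = None
    return sorted(orfs, key=lambda orf: orf[1])
```

For example, `find_orfs(genome, min_codons=100)` lists candidate ORFs of at least 100 codons; a real gene finder would also weigh coding statistics, alternative start codons, and overlapping genes.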




BLASTing the ORFs

You can either BLAST your sequences individually through the NCBI web server (http://www.ncbi.nlm.nih.gov/BLAST/) or use one of two high-throughput BLAST options that can be downloaded from the NCBI FTP site:
(1) BLASTCL3, a network client for BLASTing large FASTA files containing many ORFs against the NCBI databases (ftp://ftp.ncbi.nih.gov/blast/blastcl3/)
(2) Stand-alone BLAST binaries, available for many platforms, for running BLAST locally (ftp://ftp.ncbi.nih.gov/blast/executables/)
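
Before BLASTing many ORFs at once, the predicted genes usually have to be written out as a multi-sequence FASTA file. The Python sketch below does this under stated assumptions: one prediction per line as `<id> <start> <end>` (extra columns ignored), 1-based inclusive coordinates, and start > end for reverse-strand genes, which is the convention in Glimmer's coordinate output (check the format of your version). Translation uses the standard genetic code, with GTG and TTG starts read as Met. The function names are hypothetical.

```python
# Hypothetical helper: translate predicted genes into protein FASTA for BLAST/FASTA.
# Assumed input: "<id> <start> <end> ..." per line, 1-based inclusive coordinates,
# start > end meaning the gene is on the reverse strand.

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
START_CODONS = {"ATG", "GTG", "TTG"}  # common bacterial initiators, read as Met
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(cds: str) -> str:
    """Translate a coding sequence; unknown codons become X, trailing stops are dropped."""
    cds = cds.upper()
    protein = "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))
    if cds[:3] in START_CODONS:
        protein = "M" + protein[1:]
    return protein.rstrip("*")


def predictions_to_fasta(genome: str, prediction_lines) -> str:
    """Build protein FASTA text from gene coordinate lines."""
    records = []
    for line in prediction_lines:
        fields = line.split()
        if len(fields) < 3 or line.startswith(">"):
            continue  # blank or header line
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            continue  # not a coordinate line
        if start <= end:
            cds = genome[start - 1:end]
        else:
            cds = reverse_complement(genome[end - 1:start])
        records.append(f">{fields[0]}\n{translate(cds)}")
    return "".join(record + "\n" for record in records)
```

The resulting text can be saved to a file and used as the query for a protein search (BLASTP) with either of the options above.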

NOTE: Iterative PSI-BLAST searches have been suggested as a way to increase sensitivity when looking for phage structural genes (Morgan G.J., Hatfull G.F., Casjens S., and Hendrix R.W. 2002. Bacteriophage Mu genome sequence: analysis and comparison with Mu-like prophages in Haemophilus, Neisseria and Deinococcus. J Mol Biol 317: 337-359).
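
Whichever BLAST option is used, results for many ORFs are easier to summarize from tabular output than from the default pairwise report. The sketch below keeps the best-scoring hit per ORF. It assumes the 12-column tabular format (legacy `blastall -m 8`, BLAST+ `-outfmt 6`): query, subject, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score. The e-value cutoff of 1e-5 is an arbitrary illustrative default, and the function names are hypothetical.

```python
# Sketch: best BLAST hit per query from 12-column tabular output.
# Assumed columns: qseqid sseqid pident length mismatch gapopen
#                  qstart qend sstart send evalue bitscore


def parse_evalue(token: str) -> float:
    """Parse an e-value, guarding against values printed without a leading digit."""
    token = token.strip()
    return float("1" + token) if token.startswith("e") else float(token)


def best_hits(lines, max_evalue: float = 1e-5) -> dict:
    """Return {query: (subject, evalue, bitscore)}, keeping the highest bit score per query."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comment lines (commented tabular variants)
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = parse_evalue(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

ORFs that end up absent from the returned dictionary have no hit below the cutoff; these are the candidates for more sensitive searches such as PSI-BLAST or for annotation by synteny.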




Identifying ORFs by Synteny

Because phage genes diverge so extensively in sequence, some ORFs can only be identified on the basis of synteny (conserved gene order). An ORF can be identified by synteny when it lies between readily identifiable ORFs within a well-defined gene cassette. If gene order within the cassette (e.g., the T4 gp18-23 structural gene cassette; Hambly et al. 2001) is conserved, one may be able to annotate an ORF whose BLAST hit is only marginal (a relatively high e-value) with greater confidence. (Where was this first described and demonstrated? Hendrix? Brussow? REF ...)
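
The synteny reasoning above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the reference cassette order is known, each query ORF is labeled with the reference gene its BLAST hit points to (or `None` when there is no convincing hit), and a label is proposed only when an unlabeled ORF sits between two ORFs that flank exactly one gene in the reference, in either orientation. The function name is hypothetical, and the gene labels in the usage note are placeholders.

```python
# Toy synteny check: propose a label for an unannotated ORF whose two neighbours
# match reference genes separated by exactly one gene in the conserved cassette.


def synteny_candidates(reference: list, query: list) -> dict:
    """reference: gene names in cassette order.
    query: (orf_id, matched_reference_gene_or_None) tuples in genome order.
    Returns {orf_id: proposed_reference_gene}."""
    position = {gene: i for i, gene in enumerate(reference)}
    proposals = {}
    for k in range(1, len(query) - 1):
        orf_id, hit = query[k]
        if hit is not None:
            continue  # already annotated by similarity
        left, right = query[k - 1][1], query[k + 1][1]
        if left in position and right in position:
            i, j = position[left], position[right]
            if abs(i - j) == 2:  # one gene between them, forward or reverse order
                proposals[orf_id] = reference[(i + j) // 2]
    return proposals
```

For example, with a reference order of `["gp18", "gp19", "gp20"]`, an unannotated ORF lying between hits to gp18 and gp20 would be proposed as gp19. A proposal like this is a hypothesis to check against the weak BLAST hit, not an annotation by itself.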




tRNA

Many programs are available for scanning phage genomes for tRNA genes. We recommend tRNAscan-SE (http://www.genetics.wustl.edu/eddy/tRNAscan-SE/). Detailed information about the program is available at http://www.genetics.wustl.edu/eddy/software/#trnascan.
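
tRNAscan-SE can report its hits as a whitespace-separated table. The sketch below parses such a table, assuming the classic column order (sequence name, tRNA number, begin, end, tRNA type, anticodon, intron begin, intron end, score); check it against the header of your version's output before relying on it. It also converts each anticodon into the codon it pairs with (its reverse complement, ignoring wobble). The function name is hypothetical.

```python
# Sketch: parse tabular tRNAscan-SE output.
# Assumed columns: name, tRNA #, begin, end, type, anticodon, intron begin,
# intron end, score (later columns, if any, are ignored).

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_trnascan(lines) -> list:
    """Return one dict per predicted tRNA; header and separator lines are skipped."""
    trnas = []
    for line in lines:
        fields = line.split()
        if len(fields) < 9 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue  # header, separator, or blank line
        anticodon = fields[5].upper()
        trnas.append({
            "sequence": fields[0],
            "begin": int(fields[2]),  # begin > end indicates the reverse strand
            "end": int(fields[3]),
            "type": fields[4],
            "anticodon": anticodon,
            "codon": reverse_complement(anticodon),
            "score": float(fields[8]),
        })
    return trnas
```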




Repeats

Identifying terminal repeats in phage genomes can be useful for determining the ends of the genomes.

You can try Rob Edwards' script for finding repeats at http://salmonella.utmem.edu/cgi-bin/repeatfinder.cgi (source code at http://salmonella.utmem.edu/cgi-bin/cgi.cgi?submit=retrieve&script=10). Alternatively, "repeat-finder" is a program available from TIGR (http://www.tigr.org/software/) for identifying all repeats in very large sequences; it is described in Genome Biology 2:0027.1-0027.11 (2001).
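
As a rough first check, the brute-force Python sketch below tests whether the two ends of an assembled sequence share an exact direct or inverted repeat, and indexes every k-mer to list exact repeats anywhere in the sequence. It finds exact matches only, so it complements rather than replaces the tools above; the function names and the `min_len` and `k` defaults are assumptions for illustration.

```python
# Brute-force exact repeat checks (illustrative sketch; exact matches only).
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_repeat_length(seq: str, min_len: int = 10, inverted: bool = False) -> int:
    """Length of the longest exact repeat shared by the two ends of `seq`, or 0.

    Direct: seq[:k] == seq[-k:]. Inverted: seq[:k] == reverse_complement(seq[-k:]).
    Only non-overlapping repeats (k <= len(seq) // 2) are tried, longest first.
    """
    seq = seq.upper()
    for k in range(len(seq) // 2, max(min_len, 1) - 1, -1):
        right = seq[-k:]
        if inverted:
            right = reverse_complement(right)
        if seq[:k] == right:
            return k
    return 0


def repeated_kmers(seq: str, k: int = 20) -> dict:
    """Map each k-mer that occurs more than once (same strand) to its 0-based positions."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}
```

Runs of overlapping repeated k-mers at consecutive positions mark one longer repeat, so in practice adjacent hits would be merged before reporting.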




Codon Bias

Paragraph here and link to tools to examine this ...
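
One common way to summarize codon bias is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, i.e. RSCU = observed / (amino acid total / number of synonymous codons). The Python sketch below computes codon counts and RSCU from a list of coding sequences (for example, ORFs called with one of the ORF finders listed earlier) using the standard genetic code. It is an illustration under those assumptions, not a recommendation of a particular tool; the function names are hypothetical.

```python
# Sketch: codon counts and relative synonymous codon usage (RSCU), standard code.
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_counts(cds_list) -> Counter:
    """Count in-frame codons across coding sequences; codons containing N etc. are skipped."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    return counts


def rscu(counts: Counter) -> dict:
    """RSCU per codon; amino acids (and stops, '*') with zero total are left out."""
    synonyms = {}
    for codon, aa in CODON_TABLE.items():
        synonyms.setdefault(aa, []).append(codon)
    values = {}
    for aa, codons in synonyms.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # RSCU is undefined for absent amino acids
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values
```

An RSCU of 1.0 means a codon is used exactly as often as expected under uniform synonymous usage; values above or below 1.0 indicate preference or avoidance.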




Dinucleotide analysis

Paragraph here and link to tools to examine this ...
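
Dinucleotide composition is often summarized with Karlin's relative abundance, rho*(XY) = f*(XY) / (f*(X) f*(Y)), where the frequencies f* are computed over the sequence and its reverse complement; values well above or below 1 indicate over- or under-represented dinucleotides. The Python sketch below implements that formula under two assumptions of ours: non-ACGT characters are skipped, and the two strands are counted separately so that no dinucleotide spans the junction between them. The function names are hypothetical.

```python
# Sketch: symmetrized dinucleotide relative abundance, rho*(XY) = f*(XY) / (f*(X) f*(Y)).
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def _counts(s: str):
    """Mono- and dinucleotide counts for one strand, ignoring non-ACGT characters."""
    mono = Counter(c for c in s if c in "ACGT")
    di = Counter(
        s[i:i + 2] for i in range(len(s) - 1) if s[i] in "ACGT" and s[i + 1] in "ACGT"
    )
    return mono, di


def relative_abundance(seq: str) -> dict:
    """Return rho*(XY) for all 16 dinucleotides whose component bases occur."""
    seq = seq.upper()
    mono_f, di_f = _counts(seq)
    mono_r, di_r = _counts(reverse_complement(seq))
    mono, di = mono_f + mono_r, di_f + di_r  # pool both strands
    n_mono, n_di = sum(mono.values()), sum(di.values())
    rho = {}
    for x, y in product("ACGT", repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        if fx == 0 or fy == 0:
            continue
        rho[x + y] = (di[x + y] / n_di) / (fx * fy)
    return rho
```

Because both strands are pooled, complementary dinucleotides such as AC and GT always receive the same value.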




Comparative genomics analyses

For visual comparison of two or more genomes, you can use the ACT DNA Sequence Comparison Viewer (for more information and a free download, go to http://www.sanger.ac.uk/Software/ACT/). Other tools?
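
ACT displays comparisons that are computed beforehand (for example, with BLASTN). As a crude, self-contained stand-in for seeing where two genomes share sequence, the Python sketch below lists exact shared k-mers on both strands; the resulting (position in A, position in B) pairs can be plotted as a dot plot. The function name and the k = 12 default are assumptions, and this is not how ACT's comparison files are generated.

```python
# Sketch: exact shared k-mers between two genomes, on both strands (dot-plot input).
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def shared_kmers(a: str, b: str, k: int = 12) -> list:
    """Return (pos_in_a, pos_in_b, strand) for every exact k-mer match.

    strand is '+' for a direct match and '-' when the k-mer of b matches a on the
    opposite strand; palindromic k-mers are reported on both strands.
    """
    a, b = a.upper(), b.upper()
    index = defaultdict(list)  # k-mer -> start positions in a
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    hits = []
    for j in range(len(b) - k + 1):
        kmer = b[j:j + k]
        for i in index.get(kmer, ()):
            hits.append((i, j, "+"))
        for i in index.get(reverse_complement(kmer), ()):
            hits.append((i, j, "-"))
    return hits
```

Diagonal runs of '+' hits indicate collinear shared regions; anti-diagonal runs of '-' hits indicate inversions.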




Placement on the Phage Proteomic Tree

The taxonomy of phage is a controversial subject. Official phage taxonomy is based on physical characteristics of the free phage particle. Unfortunately, particle morphology does not necessarily reflect genetic relationships: P22 and lambda, for example, have different tail types and are placed in different families, yet their genomes are closely related. To avoid this problem, we compared the genomes of 105 phage and proposed a new taxonomy system called the Phage Proteomic Tree. More information on the Phage Proteomic Tree can be found in Rohwer & Edwards (2002) and at the Phage Arboretum web page (http://salmonella.utmem.edu/phage/tree/). Contact Rob Edwards (redwards@utmem.edu) about having your phage genome placed on the Phage Proteomic Tree.

Rohwer F. and Edwards R.A. 2002. The Phage Proteomic Tree: a genome-based taxonomy for phage. J Bacteriol 184(16): 4529-4535.
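
The Phage Proteomic Tree itself is built from the protein-level comparisons described in Rohwer & Edwards (2002). Purely to illustrate the general idea of genome-based clustering, the Python sketch below uses a different and much simpler technique: a gene-content distance (1 minus the number of shared protein families divided by the size of the smaller set) followed by UPGMA clustering. The protein-family sets are hypothetical inputs (for example, from clustering all-against-all BLASTP hits), and the function names are ours.

```python
# Illustrative only, NOT the Phage Proteomic Tree method:
# gene-content distance between phages, then UPGMA clustering into a Newick string.
from itertools import combinations


def gene_content_distance(a: set, b: set) -> float:
    """1 - shared families / size of the smaller set (both sets must be non-empty)."""
    return 1.0 - len(a & b) / min(len(a), len(b))


def upgma(families: dict) -> str:
    """Cluster {phage_name: set_of_family_ids} by UPGMA; returns Newick without branch lengths."""
    size = {name: 1 for name in families}  # cluster label -> number of phages in it
    dist = {}  # (label, label) sorted tuple -> distance
    for x, y in combinations(sorted(families), 2):
        dist[(x, y)] = gene_content_distance(families[x], families[y])
    while len(size) > 1:
        x, y = min(dist, key=dist.get)  # closest pair of clusters
        merged = f"({x},{y})"
        for z in list(size):
            if z in (x, y):
                continue
            dxz = dist.pop(tuple(sorted((x, z))))
            dyz = dist.pop(tuple(sorted((y, z))))
            # UPGMA: size-weighted average of the distances to the two merged clusters.
            dist[tuple(sorted((merged, z)))] = (size[x] * dxz + size[y] * dyz) / (size[x] + size[y])
        del dist[(x, y)]
        size[merged] = size.pop(x) + size.pop(y)
    return next(iter(size)) + ";"
```

UPGMA assumes a roughly constant rate of change across lineages, which mosaic phage genomes often violate; the sketch only shows the mechanics of turning whole-genome comparisons into a tree.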




Rob Edwards' Miscellaneous Scripts

http://salmonella.utmem.edu/cgi-bin/cgi.cgi


Page created: February 20, 2003
Last modified: February 25, 2003
For questions or comments, e-mail Forest Rohwer forest@sunstroke.sdsu.edu or Matt Sullivan mbsulli@mit.edu