The Guide to Phage Genomics: Assembly

OVERVIEW

Software for sequence evaluation and assembly
Raw sequence data to a first assembly
Closing gaps between your assembled contigs
Finding the ends of your genome



SOFTWARE FOR SEQUENCE EVALUATION AND ASSEMBLY

There are two software options for evaluating the quality of sequence data and assembling the cleaned sequence data into completed genomes:

(1) UNIX-based: Phred/Phrap/Consed
Phred, Phrap, and Consed are available free of charge for academic use through the University of Washington (UW) Genome Center. The three programs work in tandem: Phred is the base-caller that reads the raw sequence data, Phrap is the assembler, and Consed and Autofinish are, respectively, the UNIX-based graphical editor and the automated finishing program for Phrap sequence assemblies.
More information can be found at http://www.phrap.org/.

(2) PC- or Mac-based: Sequencher
Sequencher is a commercial package sold by Gene Codes. It can be used for vector and transposon screening, ORF analysis, protein translation, restriction mapping, heterozygote identification, and more. It is user-friendly for those who are not familiar with UNIX operating systems, but it often takes longer to process assemblies.
More information can be found at http://www.genecodes.com/
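Phred, the base-caller in option (1), assigns every base call a quality value on a logarithmic scale, Q = -10 log10(P), where P is the estimated probability that the call is wrong. Q20 therefore corresponds to roughly a 1-in-100 chance of error and Q30 to roughly 1-in-1,000. The short Python sketch below is not part of the Phred/Phrap/Consed distribution; it simply converts between the two scales and sums per-base error probabilities into an expected number of errors per read. The function names are hypothetical.

```python
import math
from typing import Sequence

# Illustrative helpers; the names are hypothetical, not part of Phred itself.


def phred_to_error_prob(q: float) -> float:
    """Estimated probability that a base call with Phred quality q is wrong."""
    # Phred scale: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10).
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality value corresponding to an error probability p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def expected_errors(quals: Sequence[int]) -> float:
    """Expected number of miscalled bases in a read: the sum of per-base error probabilities."""
    return sum(phred_to_error_prob(q) for q in quals)


if __name__ == "__main__":
    # Hypothetical 800-base read: 700 calls at Q30 followed by a 100-base Q10 tail.
    read_quals = [30] * 700 + [10] * 100
    print(f"expected errors: {expected_errors(read_quals):.1f}")
```

In this made-up read, the 700 Q30 bases contribute about 0.7 expected errors, while the 100-base Q10 tail contributes about 10. This is why trimming the low-quality ends matters so much before assembly.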




RAW SEQUENCE DATA TO A FIRST ASSEMBLY

Trim the ends and vector sequence off your clone sequences
Carefully examine your sequences for repeats or low peaks at the 3' end that may confound assembly

Assemble the sequenced clones into contigs

The assembly process can be incredibly frustrating and should be done iteratively. Starting from your 800+ bp sequence reads, first trim the ends to remove the sequence surrounding ambiguous base calls, where the quality of the sequencing reactions is often low. There are many possible stringencies for this step, and you will have to decide how stringent to be. Then trim the sequence associated with the cloning-vector linker sites. It is best to trim one, then the other, iteratively moving toward cleaner sequence. Next, examine the chromatograms of the remaining sequence and make sure that the peak heights do not fall off at the 3' end. If they do, select a cut-off for how much of this sequence to include so that it does not confound your assembly.
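The iterative trimming described above can be sketched in Python. This sketch is an illustration under stated assumptions, not what Phred, Phrap, or Sequencher actually do:

- Quality trimming uses a modified-Mott-style maximum-scoring-segment rule. Each base scores `cutoff` minus its Phred error probability, and the best-scoring contiguous stretch is kept.
- Vector trimming is an exact search for a hypothetical linker string near the 5' end.
- The 3' cut-off is placed where a sliding-window mean of chromatogram peak heights first drops below a chosen fraction of the read's median.

All function names and default values (`cutoff=0.05`, a 100-base vector search window, `window=20`, `fraction=0.3`) are placeholders that you would tune for your own data.

```python
import statistics
from typing import Sequence, Tuple

# All names and default parameters here are illustrative assumptions.


def quality_trim(quals: Sequence[int], cutoff: float = 0.05) -> Tuple[int, int]:
    """(start, end) of the best-scoring stretch of a read, end exclusive.

    Modified-Mott-style scoring: each base scores cutoff - P_error, and the
    maximum-sum contiguous segment is found with Kadane's algorithm.
    Returns (0, 0) if no base beats the cutoff.
    """
    best_sum, best = 0.0, (0, 0)
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        score = cutoff - 10 ** (-q / 10)
        if run_sum <= 0:
            run_sum, run_start = score, i  # start a new segment at this base
        else:
            run_sum += score  # extend the current segment
        if run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i + 1)
    return best


def vector_trim(seq: str, linker: str, search_window: int = 100) -> int:
    """Number of 5' bases to remove: through the last exact copy of the
    (hypothetical) linker within the first search_window bases, or 0."""
    hit = seq.rfind(linker, 0, search_window)
    return hit + len(linker) if hit != -1 else 0


def peak_falloff_cutoff(peaks: Sequence[float], window: int = 20, fraction: float = 0.3) -> int:
    """Index where the sliding-window mean peak height first drops below
    fraction * median peak height; len(peaks) if it never does."""
    if len(peaks) < window:
        return len(peaks)
    threshold = fraction * statistics.median(peaks)
    for i in range(len(peaks) - window + 1):
        if sum(peaks[i:i + window]) / window < threshold:
            return i
    return len(peaks)


def clean_read(seq: str, quals: Sequence[int], peaks: Sequence[float], linker: str) -> str:
    """Alternate quality and vector trimming until the read stops shrinking,
    then cap the 3' end where chromatogram peak heights fall off."""
    lo, hi = 0, len(seq)  # current kept window, in raw read coordinates
    while True:
        width = hi - lo
        start, end = quality_trim(quals[lo:hi])
        lo, hi = lo + start, lo + end
        lo += vector_trim(seq[lo:hi], linker)
        if hi - lo == width:  # neither step removed anything: converged
            break
    hi = min(hi, lo + peak_falloff_cutoff(peaks[lo:hi]))
    return seq[lo:hi]


if __name__ == "__main__":
    # Hypothetical read: 5 poor bases, a linker, a 40-base insert, 10 poor bases.
    linker = "GGATCC"
    seq = "NNNNN" + linker + "ACGT" * 10 + "T" * 10
    quals = [5] * 5 + [30] * 46 + [5] * 10
    peaks = [100.0] * len(seq)
    print(clean_read(seq, quals, peaks, linker))  # the 40-base insert
```

On the first pass, quality trimming removes the poor bases at both ends, exposing the linker at the 5' end, which vector trimming then strips. The second pass changes nothing, so the loop stops. Real vector screening usually tolerates mismatches rather than requiring an exact linker match.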




CLOSING GAPS BETWEEN YOUR ASSEMBLED CONTIGS

Design non-degenerate primers to the ends of contigs for "multiplex" PCR to close gaps between contigs
Clone and sequence multiplex PCR amplicons
Re-assemble until the genome is "complete"

If you had enough DNA to size your genome, you will know how large the assembled genome should be. You can then tell whether you were lucky enough to assemble the sequence fragments into a single complete contig. More often than not, though, you will still have multiple contigs and presumed gaps in your assembly. These gaps arise from incomplete coverage in the clone library or from sequencing too few clones to cover the genome completely. If you must close gaps, the most efficient method is to design non-degenerate PCR primers to the ends of all contigs and perform "multiplex" PCR with the pooled primers. This approach is described in Tettelin et al. (1999) and was used by Rohwer et al. (2000) to complete the Roseophage SIO1 genome. The PCR products are cloned, sequenced, and added to the assembly process described above to try to close the gaps between contigs. As gaps are closed, the corresponding primer pairs are removed from the primer mix and the multiplex PCR is re-run, iteratively closing the remaining gaps.
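The bookkeeping behind the multiplex approach can be sketched in Python. The sketch takes an outward-facing primer from near each end of every contig, pools the primers, and drops a pair once its amplicon has closed a gap. This is a simplified illustration, not the primer-design procedure of Tettelin et al. (1999):

- The primer length, the 50-base inset from each contig end (to avoid the lower-quality sequence usually found there), and all function names are hypothetical.
- The Wallace rule (2 °C per A/T plus 4 °C per G/C) is only a rough melting-temperature estimate for short oligos.

A real design would also check for primer dimers, hairpins, and matches elsewhere in the genome.

```python
from math import comb
from typing import Dict

# Hypothetical helpers for illustration; defaults are placeholders.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Rough Tm (deg C) for a short oligo: 2 per A/T plus 4 per G/C (Wallace rule)."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def end_primers(name: str, contig: str, length: int = 20, inset: int = 50) -> Dict[str, str]:
    """Two outward-facing primers for one contig.

    <name>_L is the reverse complement of sequence near the contig's start, so it
    extends leftward off the contig; <name>_R is forward-strand sequence near the
    end, so it extends rightward into the neighbouring gap.
    """
    n = len(contig)
    return {
        f"{name}_L": revcomp(contig[inset:inset + length]),
        f"{name}_R": contig[n - inset - length:n - inset],
    }


def build_pool(contigs: Dict[str, str]) -> Dict[str, str]:
    """Pool the end primers of every contig for one multiplex reaction."""
    pool: Dict[str, str] = {}
    for name, seq in contigs.items():
        pool.update(end_primers(name, seq))
    return pool


def remove_closed_gap(pool: Dict[str, str], primer_a: str, primer_b: str) -> Dict[str, str]:
    """Drop the pair whose amplicon closed a gap before re-running the multiplex PCR."""
    return {k: v for k, v in pool.items() if k not in (primer_a, primer_b)}


if __name__ == "__main__":
    # Three hypothetical 300-bp contigs; real contigs would come from the assembler.
    contigs = {
        "c1": "ACGGTCATGCAT" * 25,
        "c2": "TTGACCGTAGCA" * 25,
        "c3": "GCATTACGGATC" * 25,
    }
    pool = build_pool(contigs)
    # Contig order is unknown, so any two primers might flank a gap.
    print(len(pool), "primers,", comb(len(pool), 2), "possible pairings")
    for name, primer in pool.items():
        print(name, primer, wallace_tm(primer))
    pool = remove_closed_gap(pool, "c1_R", "c2_L")  # suppose this gap was closed
    print(sorted(pool))
```

With three contigs the pool holds six primers, or 15 possible pairings. A single pooled reaction can, in principle, test all of them at once. After each round, the primers flanking closed gaps are removed, and the smaller pool is used for the next multiplex PCR.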

Rohwer et al. 2000. The complete genomic sequence of the marine phage Roseophage SIO1 shares homology with nonmarine phages. Limnology and Oceanography. 45: 408-418.

Tettelin et al. 1999. Optimized multiplex PCR: Efficiently closing a whole-genome shotgun sequencing project. Genomics. 62: 500-507.




FINDING THE ENDS OF YOUR GENOME

The final stage of completing a phage genome is determining its ends. Many phage genomes circularize or form linear tandem copies (concatemers) as part of their packaging process, and as a result they have repeated sequences of ~__ bp at the ends of their genomes. If you have captured these repeats, you likely have your ends. If not, you may need to obtain the ends by sequencing directly from your phage DNA, using outward-facing PCR primers designed to the ends of your single contig.
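One quick way to check whether your single contig already includes the repeated ends described above is to look for a direct repeat shared by its start and its end. The Python sketch below uses the prefix (failure) function from the Knuth-Morris-Pratt string-matching algorithm. This function finds the longest prefix of the contig that is also a suffix in time linear in the contig length. The sketch assumes exact matching. Because read quality is often poorest at contig ends, real data may need an alignment that tolerates mismatches. The 20-base minimum and the function names are hypothetical placeholders. A repeat found this way suggests, but does not prove, that the ends have been reached.

```python
def longest_border(s: str) -> int:
    """Length of the longest proper prefix of s that is also a suffix of s.

    This is the last value of the Knuth-Morris-Pratt prefix function, O(len(s)).
    """
    if not s:
        return 0
    prefix = [0] * len(s)
    k = 0  # length of the prefix currently matched against the text ending at i
    for i in range(1, len(s)):
        while k > 0 and s[i] != s[k]:
            k = prefix[k - 1]  # fall back to the next shorter border
        if s[i] == s[k]:
            k += 1
        prefix[i] = k
    return prefix[-1]


def terminal_repeat_length(contig: str, min_len: int = 20) -> int:
    """Length of an exact direct repeat at both ends of the contig, or 0 if
    the longest one is shorter than min_len (a hypothetical threshold)."""
    border = longest_border(contig.upper())
    return border if border >= min_len else 0


if __name__ == "__main__":
    # Hypothetical contig with a 23-base direct repeat at both ends.
    rep = "GATTACAGATTACAGATTACAGG"
    contig = rep + "C" * 50 + rep
    print(terminal_repeat_length(contig))  # 23
```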



Page created: February 20, 2003
Last modified: February 25, 2003
For questions or comments, e-mail Forest Rohwer forest@sunstroke.sdsu.edu or Matt Sullivan mbsulli@mit.edu