Two options for evaluating the quality of sequence data and assembling the clean sequence data into completed genomes include the following:
Trim ends and vector sequence off your clone sequences
The assembly process can be incredibly frustrating and should be done iteratively. Starting from your 800+ bp sequence reads, first trim the ends to remove the sequence surrounding ambiguous base calls, where the quality of the sequencing reaction is often low (trimming can be done at many stringencies, and you will have to decide how stringent to be). Then trim the sequence that comes from the cloning vector linker sites. It is best to trim one, then the other, iterating towards cleaner sequence. Next, examine the chromatograms of the remaining sequence and check that the peak heights do not fall off at the 3' end. If they do, select a cut-off for how much of this sequence to include so that it does not confound your assembly.
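The iterative trimming described above can be sketched in Python. This is only an illustration under stated assumptions, not the method of any particular assembly package: the quality scores are assumed to be Phred-style integers, one per base; the `min_q` and `window` defaults are arbitrary stringency choices; the linker sequences are hypothetical placeholders for your own vector; and vector matching is exact, whereas real trimming tools tolerate mismatches.

```python
def trim_low_quality_ends(seq, quals, min_q=20, window=10):
    """Trim each end back to the first window in which every base has quality >= min_q."""
    n = len(seq)
    if n != len(quals):
        raise ValueError("sequence and quality list must be the same length")
    if n < window:
        return "", []

    def window_ok(i):
        return min(quals[i:i + window]) >= min_q

    # First acceptable window scanning from the 5' end.
    start = next((i for i in range(n - window + 1) if window_ok(i)), None)
    if start is None:
        return "", []  # no window passes: discard the read
    # First acceptable window scanning from the 3' end.
    end = next(i + window for i in range(n - window, -1, -1) if window_ok(i))
    return seq[start:end], quals[start:end]


def trim_vector(seq, quals, left_linker, right_linker):
    """Drop everything up to and including left_linker, and from right_linker onwards."""
    start, end = 0, len(seq)
    i = seq.find(left_linker)
    if i != -1:
        start = i + len(left_linker)
    j = seq.rfind(right_linker, start)
    if j != -1:
        end = j
    return seq[start:end], quals[start:end]


def clean_read(seq, quals, left_linker, right_linker, min_q=20, window=10):
    """Alternate quality and vector trimming until the read stops shrinking."""
    while True:
        before = len(seq)
        seq, quals = trim_low_quality_ends(seq, quals, min_q, window)
        seq, quals = trim_vector(seq, quals, left_linker, right_linker)
        if len(seq) == before:
            return seq, quals


# Toy read: low-quality ends, hypothetical linkers GAATTC / GGATCC around the insert.
read = "NNNN" + "GAATTC" + "ACGTACGTACGT" + "GGATCC" + "NNNN"
scores = [5] * 4 + [40] * 24 + [5] * 4
insert, insert_scores = clean_read(read, scores, "GAATTC", "GGATCC", min_q=20, window=4)
```

Raising `min_q` or `window` makes the trimming more stringent. The 3' peak-height check still has to be done by eye on the chromatograms; it is not covered by this sketch.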
Design primers to the ends of contigs for "multiplex" PCR to close gaps between contigs
If you had enough DNA to size your genome, you will know how large the assembled genome should be, and so can tell whether you have assembled all of the sequence fragments if you were lucky enough to end up with a single contig. More often than not, though, you will still have multiple contigs and presumed gaps in your assembly, due either to incomplete coverage in the clone library or to not sequencing enough clones to cover the genome completely. If you must close gaps, the most efficient method is to design non-degenerate PCR primers to the ends of all contigs and perform "multiplex" PCR with the pooled primers, as described in Tettelin et al. (1999) and used to complete the Roseophage SIO1 genome by Rohwer et al. (2000). The PCR products are cloned, sequenced, and added to the assembly process described above to try to close gaps between contigs. As gaps are closed, the corresponding primer pairs are removed from the primer mix and the multiplex PCR is re-run, iteratively closing the remaining gaps.
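The bookkeeping for the pooled primers can be sketched as follows. This is a minimal illustration, not the procedure from Tettelin et al. (1999): the contig names, the `_L`/`_R` primer naming scheme, and the genome and contig sizes are all hypothetical, and the size estimate ignores any overlap between contigs.

```python
def initial_primer_pool(contig_names):
    """One outward-facing primer per contig end, named '<contig>_L' and '<contig>_R'."""
    return {f"{name}_{side}" for name in contig_names for side in ("L", "R")}


def close_gap(pool, primer_a, primer_b):
    """Return a new pool without the two primers that flanked a gap that is now closed."""
    missing = {primer_a, primer_b} - pool
    if missing:
        raise KeyError(f"primers not in pool: {sorted(missing)}")
    return pool - {primer_a, primer_b}


def estimated_missing_bp(expected_genome_size, contig_lengths):
    """Rough estimate of sequence still missing, given a measured genome size."""
    return max(0, expected_genome_size - sum(contig_lengths))


# Three hypothetical contigs give six primers in the first multiplex reaction.
pool = initial_primer_pool(["contig1", "contig2", "contig3"])

# Suppose a sequenced PCR product joins the right end of contig1 to the left end
# of contig2: drop that primer pair before re-running the multiplex PCR.
pool = close_gap(pool, "contig1_R", "contig2_L")

# Hypothetical sizes: a 40,000 bp genome and contigs of 15, 12 and 9 kb.
gap_bp = estimated_missing_bp(40000, [15000, 12000, 9000])
```

Each round shrinks the pool by two primers per closed gap, so the reaction becomes simpler as the assembly approaches a single contig.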
Rohwer et al. 2000. The complete genomic sequence of the marine phage Roseophage SIO1 shares homology with nonmarine phages. Limnology and Oceanography. 45: 408-418.
Tettelin et al. 1999. Optimized multiplex PCR: Efficiently closing a whole-genome shotgun sequencing project. Genomics. 62: 500-507.
The final stage of completing phage genomes is to determine the ends of the genome. Most phage genomes are required to circularize or form linear tandem copies as part of their packaging process (??? is this true -- why, biologically, do they have repeats???), so they have repeated sequences of ~__ bp at the ends of their genomes. If you have captured these repeats, then you likely have your ends. If not, you may need to obtain the ends by direct sequencing of your phage DNA, using outward-facing PCR primers designed to the ends of your single contig.
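One quick way to check whether a single contig has captured a repeat at both ends is to look for the longest stretch at the start of the contig that also appears at its very end. The sketch below assumes a direct repeat and exact matching, so a single sequencing error inside the repeat would hide it; the `min_len` default is an arbitrary choice, since the expected repeat length is not given here.

```python
def terminal_direct_repeat(contig, min_len=10):
    """Return the longest prefix of contig that is also its suffix, or '' if none.

    Only non-overlapping repeats are considered (at most half the contig length),
    and only repeats of at least min_len bases are reported.
    """
    shortest = max(min_len, 1)
    for k in range(len(contig) // 2, shortest - 1, -1):
        if contig[:k] == contig[-k:]:
            return contig[:k]
    return ""


# Toy contig with a hypothetical 8 bp repeat at both ends.
toy_contig = "ACGTTGCA" + "GGGCCCTTTAAAGGG" + "ACGTTGCA"
found = terminal_direct_repeat(toy_contig, min_len=4)
```

An empty result does not prove the ends are missing; it only means no exact terminal repeat of the requested length was found.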
Created: February 20, 2003
Last modified: February 25, 2003
For questions or comments, e-mail Forest Rohwer firstname.lastname@example.org or Matt Sullivan email@example.com