Shotgun Sequencing of Uncultured Marine Viral Communities
Step 1 - Isolation of Marine Viral Community DNA
Two hundred liters of water from Scripps Pier were passed through a 0.16 µm tangential flow filter (TFF; Centramate by Pall Filtron; Figure 1; Step 1). Approximately 90% of the viral particles and most of the water passed through the filter and were collected in a separate tank (i.e., the filtrate). This filtering step removed eukaryotic cells and most prokaryotes.
The viruses in the filtrate were then concentrated using a 100 kDa TFF filter. In this step the viruses were retained while the water was pushed out through the filter (Figure 1; Step 2). Recovery of viral particles during this step was essentially 100%.
After TFF, the viral concentrate was loaded onto a cesium chloride (CsCl) step gradient and ultracentrifuged, and the 1.35-1.5 g/mL fraction was collected (Figure 1; Step 3). As determined by pulsed-field gel electrophoresis (PFGE; see Figure 4 in Steward et al., 2000), this fraction contains most of the marine viral DNA. The CsCl step removes dissolved DNA and any contaminating microbial cells that passed through the initial 0.16 µm TFF. The viruses in the CsCl fraction were then lysed by formamide extraction, and the DNA was recovered by isopropanol precipitation and CTAB extraction (Sambrook et al., 1989).
To determine whether most of the DNA present after the CsCl step was viral in origin, viral particles were counted by epifluorescence microscopy. The amount of DNA these particles should contain (assuming 5.5 x 10^-17 g DNA per virus; Steward et al., 2000) was then calculated and compared with the amount of DNA actually obtained after formamide extraction and isopropanol precipitation. In 5 test samples prepared using this protocol, >98% of the DNA appeared to come from the viral particles.
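The purity check described above is simple arithmetic: multiply the epifluorescence particle count by the assumed DNA content per particle, then divide by the DNA actually recovered. The Python sketch below shows that calculation; the function name and the particle count and DNA mass in the example are hypothetical illustrations, not values from this study.

```python
# Assumed DNA content per viral particle, as used in the text (Steward et al., 2000).
DNA_PER_VIRUS_G = 5.5e-17


def viral_fraction_of_dna(viral_particles: float, measured_dna_g: float) -> float:
    """Fraction of the recovered DNA that the counted viral particles can account for.

    expected DNA = particle count x DNA per particle
    fraction     = expected DNA / measured DNA
    """
    if viral_particles < 0 or measured_dna_g <= 0:
        raise ValueError("particle count must be >= 0 and measured DNA must be > 0")
    expected_dna_g = viral_particles * DNA_PER_VIRUS_G
    return expected_dna_g / measured_dna_g


# Hypothetical example values (not data from this study).
particles = 1.8e10   # total particles counted by epifluorescence microscopy
measured = 1.0e-6    # grams of DNA recovered after extraction and precipitation
fraction = viral_fraction_of_dna(particles, measured)
print(f"{fraction:.1%} of the recovered DNA is accounted for by viral particles")
```

With these made-up inputs the counted particles account for about 99% of the recovered DNA, which would pass the >98% criterion used above.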
Figure 1. Protocol for concentrating and purifying marine viruses. The micrographs were taken by filtering the various samples onto 0.02 µm Anodiscs and staining with SYBR Gold (an adaptation of the protocol of Noble and Fuhrman, 1998).
Step 2 - Construction of Linker Amplified Shotgun Library (LASL)
There are a number of significant problems associated with making shotgun libraries from environmental viral communities. Initially, we attempted to make cosmid libraries from viral communities, without any success. In retrospect, we now know that large pieces of viral DNA, in particular phage DNA, are deadly to host cells: if the genes are left intact, products such as holins and lysozymes are expressed. Therefore, the DNA needs to be sheared into fragments small enough (~2 kb) to disrupt most coding regions. The second problem with phage DNA is that it is often modified in ways that make it unclonable. Finally, gathering enough viral DNA from environmental samples is a major problem. There are approximately 5.5 x 10^-17 g of DNA per viral particle, so 1 liter of seawater (~10^10 viral particles) contains only about 5.5 x 10^-7 g (0.55 µg) of viral DNA.
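The per-liter estimate above can be scaled to the 200 L sample processed in Step 1 with a short Python sketch. It assumes the ~10^10 particles per liter and 5.5 x 10^-17 g of DNA per particle given in the text and applies the ~90% particle recovery of the first TFF step (and ~100% for the second); the helper name is hypothetical, and real yields will be lower because of losses during the CsCl and extraction steps.

```python
# Values taken from the text; both are approximate.
DNA_PER_VIRUS_G = 5.5e-17   # grams of DNA per viral particle
VIRUSES_PER_LITER = 1e10    # viral particles per liter of seawater


def viral_dna_g(volume_l: float, recovery: float = 1.0) -> float:
    """Estimated grams of viral DNA in volume_l liters of seawater.

    recovery is the fraction of particles retained by the processing steps
    (1.0 = no losses).
    """
    return volume_l * VIRUSES_PER_LITER * DNA_PER_VIRUS_G * recovery


per_liter_g = viral_dna_g(1.0)
# Step 1: 200 L processed, ~90% of particles pass the 0.16 um TFF,
# ~100% retained by the 100 kDa TFF, so overall recovery is ~0.9.
sample_g = viral_dna_g(200.0, recovery=0.9 * 1.0)

print(f"1 L of seawater: {per_liter_g * 1e6:.2f} ug viral DNA")
print(f"200 L sample:    {sample_g * 1e6:.0f} ug viral DNA (upper bound)")
```

Under these assumptions, 1 L holds about 0.55 µg and the whole 200 L sample at most about 99 µg of viral DNA, which illustrates why the amount of starting material is a limiting factor.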
We have circumvented these limitations by 1) fragmenting the viral community DNA and 2) amplifying the fragments by thermocycling. Fragmentation cuts the "death genes" into pieces that are not expressed, while in vitro amplification converts modified DNA into unmodified DNA and increases the amount of target DNA available for cloning. This method of library construction is an improvement on our original protocol for making libraries from phage DNA (Rohwer et al., 2001) and allows us to make high-coverage libraries of over a million clones from 0.05-1 µg of input DNA.
Making a LASL
1. Approximately 1 µg of marine viral community DNA was subjected to hydroshearing.
David Mead of Lucigen made the SIO51 library that is analyzed in the current manuscript. He can be contacted directly if specific details about the protocol are desired.
Step 3 - Validation of LASLs
The LASLs were extensively evaluated using methods that we previously published for a different shotgun library construction technique, the Random Amplified Shotgun Library (RASL). More detailed descriptions of these evaluation criteria can be found in Rohwer et al. (2001). Note that some of the information presented here has already been published in that paper.
Error Rates Associated with LASLs
A LASL was made from E. coli phage λ DNA, and 100 clones were sequenced. The first 650 bp of each fragment were checked against GenBank to determine the number of errors; 93 of the 100 sequences had 650 usable bp and were used in this analysis. Only 332 ambiguities were found in the 60,450 bp of sequence analyzed (93 x 650 bp), an error rate of 0.55%. Most of these errors were probably due to the fact that these were single-pass sequences. In addition, none of the 100 phage λ sequences contained chimeric fragments.
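The error-rate arithmetic above can be checked with a few lines of Python. The function name is hypothetical; the constants are the read counts and ambiguity total reported in the error-rate test.

```python
def error_rate(ambiguities: int, reads_used: int, bp_per_read: int) -> float:
    """Per-base error rate for single-pass reads trimmed to a fixed length."""
    bases_checked = reads_used * bp_per_read
    return ambiguities / bases_checked


# Values from the error-rate test described above.
READS_USED = 93      # of the 100 sequenced clones, 93 had 650 usable bp
BP_PER_READ = 650
AMBIGUITIES = 332

rate = error_rate(AMBIGUITIES, READS_USED, BP_PER_READ)
print(f"{READS_USED * BP_PER_READ} bp checked, error rate {rate:.2%}")
```

This prints `60450 bp checked, error rate 0.55%`, matching the figures in the text.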
Cloning of Unclonable DNA
We previously found that Vibrio parahaemolyticus φ16 (Kellogg et al., 1995) was unclonable using standard approaches (Rohwer et al., 2001). In particular, DNase I digestion of φ16 DNA, followed by blunt-ending and cloning, yielded very few clones. Other fragmentation methods were also tried, including nebulization and Sau3AI partial digests. We concluded that something about the DNA made it unclonable (e.g., modified nucleotides that killed the host E. coli). φ16 DNA was therefore cloned using the LASL method. Approximately 5.6 x 10^6 clones were obtained from 1 µg of input DNA. As shown below, these clones were evenly distributed throughout the φ16 genome. This result shows that the LASL protocol can be used to transform unclonable DNA into clonable DNA.
Randomness of LASLs
Four hundred thirty-one of the φ16 LASL clones were sequenced. The stacking for this library was calculated using Equation 1.
Equation 1. The average number of clones contributing to any base within the consensus sequence of a contig (the stacking) can be calculated using the following formula (Rohwer et al., 2001):

$$\mathrm{Stacking}(C) = \frac{\sum_{i=1}^{n} C_i}{|C|}$$

where C_i is the length (in base pairs) of the ith fragment of contig C, n is the number of fragments in contig C, and |C| is the length of contig C.
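Equation 1 amounts to the summed length of all fragments assembled into a contig (the sum of the C_i) divided by the contig length |C|. A minimal Python sketch, assuming every fragment lies entirely within the contig; the function name and the five-fragment example contig are hypothetical.

```python
from typing import Sequence


def stacking(fragment_lengths: Sequence[int], contig_length: int) -> float:
    """Equation 1: average number of clones covering each base of a contig.

    Sum of the lengths of all fragments assembled into the contig divided by
    the contig length. Assumes every fragment lies entirely within the contig.
    """
    if contig_length <= 0:
        raise ValueError("contig length must be positive")
    return sum(fragment_lengths) / contig_length


# Hypothetical contig: five fragments assembled into a 2,000 bp contig.
example = stacking([600, 650, 700, 500, 550], 2000)
print(example)
```

Here 3,000 bp of fragments over a 2,000 bp contig gives a stacking of 1.5, i.e., each base is covered by 1.5 clones on average.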
The LASL stacking was compared against three other methods of constructing shotgun libraries. The first was a DNase I library constructed with Roseobacter SIO67 phage φSIO1 DNA (Rohwer et al., 2000). The φSIO1 DNase I shotgun library represents a random library, in which the stacking does not increase as the contig length increases. In contrast, a φ16 library constructed by partially digesting the DNA with Sau3AI and cloning into pCR-Zero (Invitrogen; San Diego, CA) was extremely biased. The non-random distribution of this library was caused by the fact that most of the φ16 genome is not clonable without first being transformed in vitro. The third comparison library was made from φ16 DNA using the Random Amplified Shotgun Library (RASL) method (Rohwer et al., 2001), which has previously been shown to produce random fragments. As shown in Figure 2, the LASL was as random as the φSIO1 DNase I library and the φ16 RASL.
Figure 2. Coverage of the genome using the LASL protocol. Stacking characteristics of the V. parahaemolyticus φ16 LASL vs. the Sau3AI φ16 library vs. the φSIO1 DNase I library vs. the φ16 RASL. Twenty contigs from each library were analyzed using Equation 1.
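One way to put a number on the comparison in Figure 2 is the least-squares slope of stacking against contig length. Assuming, as the text implies, that a random library (such as the DNase I library) shows no increase in stacking with contig length while a biased library does, the slope should be near zero for the former and clearly positive for the latter. This Python sketch is an illustration only; the function name and both sets of (contig length, stacking) pairs are hypothetical, not data from Figure 2.

```python
from typing import Sequence


def least_squares_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical (contig length in bp, stacking) pairs, not data from Figure 2.
libraries = {
    "random-like": [(1500, 1.8), (3000, 1.9), (4500, 1.7), (6000, 1.8), (7500, 1.9)],
    "biased-like": [(1500, 1.5), (3000, 3.0), (4500, 4.6), (6000, 6.1), (7500, 7.4)],
}

slopes_per_kb = {}
for name, contigs in libraries.items():
    lengths = [length for length, _ in contigs]
    stacks = [stack for _, stack in contigs]
    # Convert the per-bp slope to change in stacking per kb of contig length.
    slopes_per_kb[name] = least_squares_slope(lengths, stacks) * 1000
    print(f"{name}: stacking changes by {slopes_per_kb[name]:.2f} per kb")
```

For these inputs the random-like set changes by about 0.01 per kb and the biased-like set by about 0.99 per kb.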
A final piece of evidence that the LASL method produces essentially random libraries comes from analyses of the contigs identified in the uncultured marine viral library SIO51 (the one analyzed in the current manuscript). If the LASL method were biased, the overlap sizes in the contigs should fall into distinct groups. To illustrate this concept, consider a single genome. When the genome is cut into fragments at preferred sites (e.g., restriction enzyme sites) and the resulting fragments are reassembled, fragments bounded by the same sites will stack directly on top of each other, so the overlaps cluster at a few discrete sizes. In contrast, if the fragmentation method is essentially random, the size of the overlap region will range from the minimum size required to define a contig up to the total length of the fragment, and the overlap sizes will be evenly distributed along a line. Figure 3 shows that the overlap sizes in contigs from the SIO51 LASL are evenly distributed (R^2 of the regression line = 0.9299). This result, in conjunction with the stacking analysis, strongly suggests that the LASL technique produces shotgun libraries that are essentially random.
Figure 3. Sequences from the SIO51 LASL were trimmed to remove any contaminating vector sequence or any sequence with >1 N per 50 bp. These sequences were then aligned using Sequencher (Gene Codes) to find contigs. The overlap region in the contigs was then recorded in an Excel spreadsheet, graphed, and the regression line was determined.
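The regression behind Figure 3 can be approximated as follows, assuming (this is not stated in the text) that the sorted overlap sizes are plotted against their rank. Evenly spread overlaps then fall on a near-straight line (R^2 close to 1), while overlaps piled up at a few sizes form a step pattern that a line fits less well. The function name and both example data sets are hypothetical, not the SIO51 data.

```python
from typing import Sequence


def r_squared_vs_rank(overlaps: Sequence[float]) -> float:
    """R^2 of a straight-line fit of sorted overlap sizes against their rank (1..n).

    For simple linear regression, R^2 = Sxy^2 / (Sxx * Syy).
    """
    y = sorted(overlaps)
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 overlaps")
    x = list(range(1, n + 1))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if syy == 0:
        raise ValueError("all overlaps are identical; R^2 is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)


# Hypothetical overlap sizes in bp (not the SIO51 data).
even = list(range(50, 650, 50))      # 12 overlaps spread evenly from 50 to 600 bp
clustered = [50] * 6 + [600] * 6     # 12 overlaps piled up at two sizes

r2_even = r_squared_vs_rank(even)
r2_clustered = r_squared_vs_rank(clustered)
print(f"evenly spread: R^2 = {r2_even:.3f}")
print(f"clustered:     R^2 = {r2_clustered:.3f}")
```

For these inputs the evenly spaced overlaps give R^2 = 1.000 and the two-cluster set gives about 0.755.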
The SP (SIO51 library) sequences in FASTA format.
An SP (SIO51 library) Excel Sheet with the Significant TBLASTX results.
The MB (MB61 library) sequences in FASTA format.
An MB (MB61 library) Excel Sheet with the Significant TBLASTX results.
Distribution of Fragment Size.
Kellogg, C. A., J. B. Rose, S. C. Jiang, J. M. Turmond and J. H. Paul (1995). Genetic diversity of related vibriophages isolated from marine environments around Florida and Hawaii, USA. Marine Ecology Progress Series 120: 89-98
Noble, R. T. and J. A. Fuhrman (1998). Use of SYBR Green I for rapid epifluorescence counts of marine viruses and bacteria. Aquatic Microbial Ecology 14(2): 113-118
Rohwer, F., A. Segall, G. Steward, V. Seguritan, M. Breitbart, F. Wolven and F. Azam (2000). The complete genomic sequence of the marine phage Roseophage SIO1 shares homology with non-marine phages. Limnology and Oceanography 45(2): 408-418
Rohwer, F., V. Seguritan, D. H. Choi, A. M. Segall and F. Azam (2001). Production of shotgun libraries using random amplification. BioTechniques 31(1): 108-118
Sambrook, J., E. F. Fritsch and T. Maniatis (1989). Molecular cloning: a laboratory manual, Cold Spring Harbor Laboratory Press
Steward, G. F. and F. Azam (1999). Analysis of marine viral assemblages. Microbial Biosystems: New Frontiers Proceedings of the 8th International Symposium on Microbial Ecology, Halifax, Canada
Steward, G. F., J. L. Montiel and F. Azam (2000). Genome size distributions indicate variability and similarities among marine viral assemblages from diverse environments. Limnology and Oceanography 45(8): 1697-1706