CSUPERB Microchemical Core Facility Homepage
          MicroChemical Core Facility

Interpreting Chromatograms
Our lab routinely produces high quality data with read lengths to 700 bases; however, this depends on many factors, including how clean the template DNA is, or the annealing efficiency of the primer. To prepare DNA which is clean enough for automated sequencing, see Template Preparation. Other problems in the sequencing reaction can affect success as well. Some examples of these, with their solutions, follow.

A good sequence

Low Signal-to-Noise Ratio

Salt Contamination

Alcohol Contamination

GC Rich or Palindromic Regions

Double Priming

Miscalls due to weak peaks after stronger peaks

Heterozygote Double Peaks

Poly A region

Other observations

  1. First an example of a good sequence

    Fig. 1 shows an excellent sequence chromatogram of our standard reaction with clean, distinct peaks and very low to no background noise. The sequence was completely accurate to 586 bases with 3 miscalls to 700 bases.

    Figure 1.

    You can expect fully accurate, reliable sequence to be found from 30 to 500-600 bases from the priming site, with 98.5% accuracy extending to 650-700 bases in some reactions.
    Fig. 2 shows the general appearance of the chromatogram peaks around 650 bases, which are much broader and less defined than around the 400 base region.


    Figure 2.


  2. Low Signal-to-Noise Ratio

    Fig. 3 shows a chromatogram with noisy signal peaks; however, the gel image had a completely blank lane. The chromatogram is actually depicting background signal.
    Figure 3.

    Fig. 4 shows a chromatogram with actual sample signal that is very low. The total amount of signal for this sample is about 10% of the amount of signal obtained from the standard reaction run on the same gel. As a result, the background noise level is comparable to the signal level, and can introduce false sequence peaks and deletions.
    The bottom line is that the sequence is not reliable. Low signals are most commonly caused by "dirty DNA", containing small molecule or RNA contaminants.

    Solution: Prepare your DNA template again, taking care not to introduce small molecule contaminants such as salts, EtOH and EDTA (See DNA Template Preparation).

    Figure 4.

  3. Salt Contamination

    Fig. 5 shows a chromatogram with 75 mM NaCl added to our standard template reaction. The sequence starts off nicely, but the signal begins to decrease around 300 bases and gradually descends to background level by the time the first N is called at position 434. A comparison of this sequence with the pGEM3Zf+ sequence on file at NCBI shows the first miscall at base 434, with only 3 more miscalls to 500, 9 miscalls to 550, and then a drastic deterioration, with 44 miscalls to 600. Salt contamination alone is not a big problem, but in combination with other trace contaminants it can erode accuracy and shorten read lengths.

    Solution: Wash final DNA pellet with 70% isopropanol (30% water) and dry in a spin vac before resuspending in pure autoclaved water.
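
    The miscall counts quoted above (first miscall at base 434, then counts to 500, 550 and 600) come from comparing the called sequence with a known reference. A minimal Python sketch of that bookkeeping follows. It assumes the two sequences are already aligned position by position, with no insertions or deletions, and the function names are illustrative rather than part of any facility software.

```python
def first_miscall(called, reference):
    """Return the 1-based position of the first mismatch, or None if there is none."""
    pairs = zip(called.upper(), reference.upper())
    for pos, (c, r) in enumerate(pairs, start=1):
        if c != r:
            return pos
    return None


def cumulative_miscalls(called, reference, checkpoints):
    """Count mismatches from base 1 up to each checkpoint (inclusive).

    Assumption: the strings are pre-aligned, so position i in one
    corresponds to position i in the other. An 'N' counts as a miscall.
    """
    counts = {}
    for limit in checkpoints:
        end = min(limit, len(called), len(reference))
        counts[limit] = sum(
            1
            for c, r in zip(called[:end].upper(), reference[:end].upper())
            if c != r
        )
    return counts


def accuracy_percent(miscalls, bases):
    """Percentage of bases called correctly."""
    return 100.0 * (bases - miscalls) / bases
```

    As a worked figure, 3 miscalls in 700 bases (the good sequence in Fig. 1) gives accuracy_percent(3, 700), or roughly 99.6%.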

    Figure 5A.
    Figure 5B.
    Figure 5C.
    Figure 5D.

  4. Alcohol Contamination

    Fig. 6 shows a chromatogram with 1% EtOH added to our standard template reaction. The peaks are sharp and distinct to about 270 bases, but gradually drop in size to background level by the time the first N is called at position 419. A comparison of this sequence with the pGEM3Zf+ sequence on file at NCBI shows the first miscall at 359 bases, with 34 miscalls to 500 and 97 miscalls to 600. In the chromatogram, you can see that the sequence rapidly deteriorates after 400 bases, with erratic peaks. In combination with other contaminants, residual ethanol can contribute to poor sequence data.

    Solution: Make sure all ethanol is evaporated off of the DNA pellet after precipitation; dry in a spin vac, if possible, before resuspending in pure autoclaved water.

    Figure 6A.
    Figure 6B.
    Figure 6C.
    Figure 6D.

  5. GC Rich or Palindromic Regions

    Regions with a GC content greater than 62% are difficult to sequence under our standard reaction conditions. This is thought to be due to the stronger bonding in GC base pairs, which need a higher temperature to denature. Good denaturation is necessary to allow efficient annealing of the primer and subsequent extension. Fig. 7 shows an example of an abrupt signal drop-off at a GC rich region, after a good run of sequence.

    A similar result is sometimes seen if you are attempting to sequence through a region that has long palindromic sequences that form secondary structure even during the denaturation cycle. These hairpin structures form physical barriers and the DNA polymerase has difficulty reading through these regions.

    Solution: When time permits, the facility will attempt to use an alternate PCR cycle with higher than standard temperatures, or add 5% DMSO or glycerol to the reaction. If this does not resolve the problem, then cutting the insert in the problem regions, subcloning, and then resequencing may be necessary.
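
    If the expected sequence of your insert is known, both problems described above can be screened for before submitting a sample. The Python sketch below flags windows whose GC content exceeds the 62% figure and looks for short inverted repeats that could form a hairpin stem. The 50-base window, the 8-base stem, the loop lengths and the function names are all assumptions chosen for illustration, and only perfect stems are detected.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of bases that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(seq, window=50, threshold=0.62):
    """Return (start, gc_fraction) for every window above the threshold.

    Starts are 0-based. The window size is a hypothetical choice.
    """
    hits = []
    last_start = max(len(seq) - window, 0)
    for start in range(last_start + 1):
        frac = gc_fraction(seq[start:start + window])
        if frac > threshold:
            hits.append((start, frac))
    return hits


def find_hairpin_stems(seq, stem=8, min_loop=3, max_loop=20):
    """Return (left_start, right_start) for perfect inverted repeats.

    A hit means seq[left:left+stem] is followed, after a loop of
    min_loop..max_loop bases, by its exact reverse complement.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left_rc = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == left_rc:
                hits.append((i, j))
    return hits
```

    A region that shows up in either list is a candidate for the higher-temperature cycle or additive described in the solution above. It is not a prediction that the read will fail.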


    Figure 7.

  6. Double Priming

    Fig. 8 shows a sequence with clean, distinct peaks, but the software frequently called N's in several positions. The appearance is that of two separate sequences overlapping: multiple peaks occupy the same position, so a clean sequence cannot be determined. This is due to the presence of two priming sites yielding DNA products from two different sequences in the same sample, a common occurrence in cloning.

    Solution: In lieu of recloning your insert into a new vector, you may be able to sequence the same sample using a different primer. Commercial plasmids with multiple cloning sites usually contain alternative common priming sites (e.g. T7, SP6, etc.). If not, you may have to resort to designing a specific primer for your template.
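
    Before choosing an alternative primer, it can help to check how many times a candidate primer matches the expected template. The Python sketch below counts exact matches on both strands; more than one hit suggests a risk of double priming. It is only a rough screen, since it assumes the template sequence is known and ignores partial matches, which can also prime. The function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_all(haystack, needle):
    """0-based start positions of every (possibly overlapping) match."""
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + 1)
    return positions


def primer_sites(template, primer):
    """Return a list of (strand, position) exact-match priming sites.

    '+' : the primer sequence occurs in the template as written.
    '-' : the primer's reverse complement occurs in the template,
          i.e. the primer matches the opposite strand.
    """
    template = template.upper()
    primer = primer.upper()
    sites = [("+", pos) for pos in find_all(template, primer)]
    rc = reverse_complement(primer)
    if rc != primer:  # avoid double-counting a palindromic primer
        sites += [("-", pos) for pos in find_all(template, rc)]
    return sites
```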


    Figure 8.

  7. Miscalls due to weak peaks after stronger peaks

    These are usually manually edited by our lab before delivery. You can identify these as lower case letters in the sequence, which are also underlined. However, there are places where the weak peak effect may be occurring, yet not be obvious enough for us to call. Weak peaks result from suppression of signal following a strong signal, occurring most commonly for G's after A's, and often for G's after C's, as seen in Fig. 9. We provide computer files of the chromatograms as well as the straight sequence, which can be viewed and edited with Editview software. This is free software available from ABI Perkin-Elmer. To download a copy for either PC or Macintosh, see the Links & Software page on our homepage.
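
    Because the suppression described above depends on sequence context (a G directly after an A or C), positions worth a second look in the chromatogram can be listed from the base calls alone. The following is a small Python sketch with an illustrative function name. It only flags candidate positions and does not inspect peak heights.

```python
import re


def suppressed_g_candidates(seq):
    """Return 0-based positions of every G that directly follows an A or C."""
    return [m.start() for m in re.finditer(r"(?<=[AC])G", seq.upper())]
```

    For the sequence TAGCGGTG this returns positions 2 and 4: the G after the A and the G after the C. The G's that follow a G or a T are not flagged.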

    Figure 9: The G's directly following A or C peaks (circled) are suppressed in size compared to other G's in the sequence.

  8. Heterozygote Double Peaks

    If you are sequencing DNA directly from a diploid organism (e.g. PCR products from chromosomal DNA), you may see double peaks at one nucleotide, flanked by clean single-peak sequence, as shown below in Fig. 10. This can indicate that the two alleles of the PCR-amplified gene are different (the organism is heterozygous): one base is present on one allele, while the second is present on the other allele.
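
    A double peak of this kind is conventionally recorded with an IUPAC ambiguity code (for example R for A/G). The Python sketch below assumes you have the four channel peak heights at one position, which the delivered text sequence does not contain. It uses a hypothetical cutoff of 50% for the ratio of the secondary peak to the primary peak. Neither the function nor the threshold reflects the facility's base-calling software.

```python
# Standard IUPAC codes for the six two-base combinations.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def call_position(heights, min_secondary_ratio=0.5):
    """Call one position from a dict of peak heights keyed by 'A', 'C', 'G', 'T'.

    Returns a single base, an IUPAC two-base code when the second-highest
    peak is at least min_secondary_ratio of the highest, or 'N' when
    there is no signal. The ratio is an illustrative assumption.
    """
    ranked = sorted(heights, key=heights.get, reverse=True)
    primary, secondary = ranked[0], ranked[1]
    if heights[primary] <= 0:
        return "N"
    if heights[secondary] / heights[primary] >= min_secondary_ratio:
        return IUPAC_TWO_BASE[frozenset((primary, secondary))]
    return primary
```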


    Figure 10.

  9. Poly A Region

    Fig. 11A shows drop-off in sequence due to a poly A region. Sequence ahead of this region is clean and accurate; however, sequence following the region has degraded, as shown in Fig. 11B. This is caused by the polymerase slipping as it extends the poly A chain, essentially causing a frame shift that can produce inconsistent poly A lengths and subsequent chain terminations, i.e. different fluorescent terminations in fragments of the same length, which degrades sequence accuracy.

    Solution: To read past the poly A region, use a poly T primer with a degenerate base in the 3' position.
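
    The following Python sketch shows one way to locate poly A runs in a called sequence and to write out the individual primers behind a poly T primer with a degenerate 3' base. It assumes the degenerate base is the IUPAC code V (A, C or G, i.e. anything but T), a common choice for anchored oligo-dT primers that this page does not specify. The run length of 10 and the 18 T's are likewise placeholders.

```python
import re

# Assumption: the degenerate 3' base is IUPAC 'V' (A, C or G).
ANCHOR_BASES = "ACG"


def find_poly_a_runs(seq, min_len=10):
    """Return (start, length) for each run of at least min_len A's (0-based start)."""
    pattern = re.compile("A{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq.upper())]


def anchored_poly_t_primers(t_count=18):
    """Expand a poly T primer with a degenerate 3' base into its explicit sequences."""
    return ["T" * t_count + base for base in ANCHOR_BASES]
```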

    Figure 11A.
    Figure 11B.

  10. Other Observations

    In other experiments in our lab, we found that trace amounts of other potential contaminants, namely 0.2% phenol-chloroform and trace amounts of silica beads, did not affect sequence data. In investigating the possible effect of TE contamination, we found that 5 mM Tris did not affect the reaction, but as little as 2 mM EDTA completely suppressed the signal, resulting in a blank lane.

    DMSO is a common additive to sequencing reactions, which can sometimes resolve problems thought to be due to secondary structure. The addition of 0.5% DMSO reduces signal by half, and 1% DMSO reduces it by half again. We have observed this effect repeatedly in our lab, so we routinely increase template and primer to maximize signal whenever adding DMSO.
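
    The two observations above (half the signal at 0.5% DMSO, a quarter at 1%) are consistent with the signal halving for every 0.5% of DMSO added. The Python sketch below encodes that reading. It is an interpolation from only two reported points, not a measured curve, so the hypothetical function refuses concentrations outside the 0 to 1% range.

```python
def relative_signal_with_dmso(dmso_percent):
    """Estimated signal relative to a reaction without DMSO.

    Assumption: signal halves per 0.5% DMSO, fitted to the two reported
    points (0.5% -> 1/2, 1% -> 1/4). Not valid outside 0-1%.
    """
    if not 0.0 <= dmso_percent <= 1.0:
        raise ValueError("only the 0-1% DMSO range is covered by the observations")
    return 0.5 ** (dmso_percent / 0.5)
```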


Last Updated: 5/2006