Rhone-Alpes Bioinformatics Center
A Simple Phylogenetic Tree Construction pt. 2
Example 1 : Identifying an unknown bacterial sequence
Phylogenetic techniques are used in many fields of science, such as medicine, archaeology, criminology and microbiology. When presented with an unknown sequence, building a phylogeny is a standard way to identify it by placing it among known sequences.
In this example we are given an unknown proteobacterial sequence from an environmental sample. By building a phylogenetic tree with other proteobacterial sequences, we will try to identify the class of the unknown sequence (alpha, beta, delta, epsilon or gamma).
Understand the method used in identifying an unknown sequence.
Understand the limitations of this method.
Get to grips with the software used (CLUSTALw, SeaView, Phylo_win and Njplot).
- NB : This is the second of two tutorials. Some notions used here were explained in the first one, so please go through it first if you have not already.
STEP 1 :
- Download the FASTA sequences of the unknown proteobacterium (SampleProteobacteria) and the reference sequences here :
- FASTA Sequences
STEP 2 :
- Paste the sequences one after the other into the CLUSTALw input box, each in FASTA format with a suitable name. The default options should not need to be changed, since we are dealing with short sequences that do not appear to be very divergent.
- Once the results have been calculated, save them by right-clicking on the link to the alignment file and selecting “Save target as”. Save the file as Proteobacteria.aln.
- CLUSTALw options do not usually need to be modified. If the alignment is clearly wrong with the default options, chances are that modified options will not fix it either. For more details, refer to the online help file.
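The FASTA input used in this step can also be prepared with a short script. The following is only a minimal sketch in Python: the function names `parse_fasta` and `to_fasta` and the toy sequences are made up for illustration and are not part of the tutorial's data or of CLUSTALw.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record; store the previous one first.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping lines at `width`."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Hypothetical toy sequences, NOT the tutorial's real data.
unknown = ">SampleProteobacteria\nACGTACGT\nACGT\n"
reference = ">Reference_1\nACGTTCGTACGA\n"

# Put the unknown sequence and the references one after the other.
combined = parse_fasta(unknown) + parse_fasta(reference)
print(to_fasta(combined))
```

The printed text can then be pasted into the input box as a single block, with one named record per sequence.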
STEP 3 :
- We will now check that the alignment is suitable for building a tree. If it is not, we will edit it manually.
- Open the alignment file in SeaView (File -> Open Clustal).
- Examine the alignment and look for partial sequences which may need to be deleted.
- We see here that D_PartialSequence is much too short to be reliable in the construction of a tree. To remedy this, select the sequence and delete it by clicking on EDIT -> Delete Sequence(s). Partial sequences do not usually cause problems in alignments, but when building the tree, the large number of missing residues introduces a large error into the corrected distances. So, to be on the safe side, it is removed.
- We also remove the first residues, because machine errors during DNA sequencing make the first and last residues of a read slightly unreliable. To do that, enable PROPS -> Allow Seq. Edition, select all the sequences by double-clicking on the name list and set the pointer on the 35th residue. Press delete on your keyboard until the sequences are evened out. When you have finished deleting, disable PROPS -> Allow Seq. Edition.
- After an alignment has been edited in SeaView, gap-only columns may appear. They tend to cause errors in the tree construction and should be removed by selecting all the sequences and clicking EDIT -> Del. Gap-only sites.
- Now save the file in MASE format and call it Proteobacteria.mase.
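The two checks made in this step (spotting partial sequences and removing gap-only sites) can be sketched in Python as follows. This is only an illustration of the idea, not what SeaView does internally: the function names, the 50% length threshold and the toy alignment are all assumptions.

```python
def ungapped_length(seq, gap="-"):
    """Number of residues in an aligned sequence, ignoring gap characters."""
    return len(seq) - seq.count(gap)


def flag_partial(alignment, min_fraction=0.5):
    """Names of sequences shorter than min_fraction of the longest sequence.

    The 0.5 threshold is an arbitrary assumption for this sketch.
    """
    lengths = {name: ungapped_length(seq) for name, seq in alignment.items()}
    longest = max(lengths.values())
    return [name for name, n in lengths.items() if n < min_fraction * longest]


def drop_gap_only_columns(alignment, gap="-"):
    """Remove every column that contains only gap characters."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if any(c != gap for c in col)]
    return {name: "".join(col[i] for col in kept)
            for i, name in enumerate(names)}


# Hypothetical toy alignment, NOT the tutorial's real data.
alignment = {
    "A_Reference": "ACG-TACGTAC",
    "B_Reference": "ACG-TTCGTAC",
    "D_PartialSequence": "---G-AC----",
}

partial = flag_partial(alignment)
# Deleting the partial sequence leaves a gap-only column behind,
# which is then removed.
without_partial = {n: s for n, s in alignment.items() if n not in partial}
cleaned = drop_gap_only_columns(without_partial)
print(partial)
print(cleaned)
```

In the toy alignment, the fourth column is kept only because of the partial sequence, so it becomes a gap-only site once that sequence is deleted.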
STEP 4 :
- Construct a phylogenetic tree using Phylo_win. Use the Neighbor-Joining method with 1500 bootstrap replicates. Select the Kimura distance correction method and launch the construction of the tree. The Kimura distance correction method is widely used for DNA sequences, although Ka and Ks may be more accurate when using protein-coding sequences.
- Root the tree with the outgroup sequence and tidy up the tree using swap nodes. Save the tree by selecting Output -> Tree File. This is the format required by NjPlot, because it can store all the information about the tree (branch lengths, bootstrap values, topology, etc.). The Postscript and list outputs do not store this information and are not accepted by NjPlot.
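To give an idea of what a distance correction does, here is a minimal Python sketch of the Kimura two-parameter distance, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of sites showing a transition and a transversion. It assumes that the Kimura option refers to this two-parameter model; the function name and the toy sequences are hypothetical, and this is not Phylo_win's own code.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (same length)")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log / math.sqrt raise ValueError when the sequences are too
    # divergent for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical toy sequences: one transition in ten sites.
d = kimura_2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
print(d)
```

The corrected distance is slightly larger than the observed proportion of differences (0.1 here), because the correction accounts for substitutions that may have occurred more than once at the same site.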
The tree seems to show that our environmental sample sequence is in fact a betaproteobacterium, because it clusters with all the other betaproteobacteria with 100% bootstrap support. Further identification of the environmental sequence would require more sampling within the betaproteobacteria.
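The bootstrap support quoted above comes from resampling the alignment columns with replacement and rebuilding the tree each time. The Python sketch below shows only the resampling part. The `clade_found` callback is a hypothetical stand-in for "build a tree and check that the grouping is present"; here it is replaced by a much cruder test (the sample has fewer mismatches with one reference than with another), and the toy alignment is made up.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample the alignment columns with replacement (one replicate)."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(alignment[name][c] for c in cols) for name in names}


def bootstrap_support(alignment, clade_found, replicates=1500, seed=0):
    """Percentage of replicates in which clade_found(replicate) is true."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(replicates)
               if clade_found(bootstrap_replicate(alignment, rng)))
    return 100.0 * hits / replicates


def mismatches(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def sample_nearest_to_beta(aln):
    # Crude stand-in for a real tree-building step (assumption of this sketch).
    return (mismatches(aln["Sample"], aln["Beta_ref"])
            < mismatches(aln["Sample"], aln["Gamma_ref"]))


# Hypothetical toy alignment, NOT the tutorial's real data.
toy = {
    "Sample":    "ACGTACGTACGT",
    "Beta_ref":  "ACGTACGTACGA",
    "Gamma_ref": "TCGAACGGTCGA",
}

support = bootstrap_support(toy, sample_nearest_to_beta, replicates=1500)
print(support)
```

A value close to 100 means that the grouping is recovered in almost every resampled alignment, which is how the 100% figure on the tree should be read.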
STEP 5 :
- Open the previously saved tree file in NjPlot.
- You are now able to swap nodes, root the tree, select the desired paper type and many other options before printing your tree.