Phylogenetics
A Simple Phylogenetic Tree Construction pt. 1
Example 1 : Ornithorhynchus anatinus
Problem :
According to taxonomy, the duck-billed platypus (Ornithorhynchus anatinus) is a mammal, but several traits could lead us to believe that this classification is wrong : it is amphibious, oviparous (egg-laying), duck-billed, etc. We will try to verify this classification using molecular data.
Objectives :
Understand the method behind constructing a phylogenetic tree from the search for sequences to the analysis of the tree.
Get to grips with various bioinformatics software (BLAST, CLUSTALw, SeaView and Phylo_win).
Understand the FASTA format.
Understand the limitations of these methods.
- NB : for more specific information on how to use the programs, please refer to the individual tutorials.
STEP 1 :
- On the search page, insert Ornithorhynchus anatinus as a Species or Taxon and Cytochrome B as a keyword. Verify settings are set on Search for sequences and Protein databank. Use the Uniprot/SwissProt database.

- Select the Cytochrome B sequence and press Retrieve. You will obtain the sequence in FASTA [1] format.


STEP 2 :
- Open BLASTp from the PBIL website by selecting BLAST - protein + nucleic, followed by BLAST vs. Protein DB.

- We chose this version of BLASTp over FASTA and the other PBIL BLASTp because of the large number of search options it offers.
- Enter the sequence in FASTA format [1], and then set the advanced options to these settings in order to obtain a large number of varied results.

- Now, we will use the filter to select sequences from other taxa, which will be compared with the Ornithorhynchus anatinus sequence. The purpose of this is to obtain sequences for various tetrapods and an outgroup [2] sequence which will allow us to root the tree. By picking only a few sequences from each of these groups, we obtain a batch of sequences that is varied and large enough to construct a tree without over-crowding it.

Taxon IS aves
- Select 3 sequences and copy their FASTA sequences after the Ornithorhynchus anatinus sequence in your text file. Do not hesitate to rename the sequences : if they are listed by their reference number in the phylogenetic tree, they will not be easily identified.
- Suggested sequences :
Red bird of paradise (CYB_PARRB) Paradisaea rubra
Jungle crow (Q85UF8_CORMC) Corvus macrorhynchos
Gray Jay (P92708_PERCN) Perisoreus canadensis
- Then press back on your browser to return to the original search results. Repeat this for each of the following filters, selecting a few sequences each time.
Taxon IS amphibia
- Suggested sequences :
Salamander (Q644G6_9SALA) Gyrinophilus porphyriticus
Axolotl (Q70ED6_AMBME) Ambystoma mexicanum
Portugal painted Frog (Q5MQM1_9ANUR) Discoglossus galganoi
Taxon IS eutheria
- Suggested sequences :
Northern Flying Squirrel (Q34661_GLASA) Glaucomys sabrinus
Red Deer (Q94QC4_CEREL) Cervus elaphus
Tasmanian Devil (CYB_SARHA) Sarcophilus harrisii
Red kangaroo (CYB_MACRU) Macropus rufus
Blue Whale (CYB_BALMU) Balaenoptera musculus
- NB : the Tasmanian devil and the red kangaroo are marsupials (Metatheria), not eutherians ; they are listed here alongside the placental mammals.
Taxon IS monotremata
- Suggested sequences :
Long-Beaked Echidna (Q5ZN98_ZAGBR) Zaglossus bruijni
Taxon IS squamata
- Suggested sequences :
Common Iguana (CYB_IGUIG) Iguana iguana
Iberian Wall Lizard (Q85L30_PODHI) Podarcis hispanica
Taxon IS NOT tetrapoda
- Suggested sequences :
Goldfish (CYB_CARAU) Carassius auratus
- Entering IS NOT tetrapoda into the filter allows us to obtain our outgroup [2].
- After a certain amount of time, BLAST may discard the results. If that happens, simply restart the search.
If you wish to skip this part and just download the sequences, click here.

- FASTA Sequences
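The renaming advised in STEP 2 can also be done with a short script rather than by hand. The following is only a minimal sketch in Python: the function name `rename_fasta`, the mapping `names` and the header layout (identifiers separated by spaces or "|") are assumptions made for illustration, and the residues in the example are placeholders, not real cytochrome b data.

```python
def rename_fasta(text, names):
    """Replace FASTA headers containing a known ID with a short, readable label.

    `names` maps a database identifier (e.g. "CYB_CARAU") to the label
    that should appear in the tree (e.g. "Goldfish").
    """
    out = []
    for line in text.splitlines():
        if line.startswith(">"):
            # Assumption: the identifier is one of the tokens of the header,
            # separated from the rest by spaces or by "|".
            tokens = line[1:].replace("|", " ").split()
            label = next((names[t] for t in tokens if t in names), None)
            if label is not None:
                line = ">" + label
        out.append(line)
    return "\n".join(out) + "\n"


# Placeholder records: the residues below are NOT real sequences.
names = {"CYB_CARAU": "Goldfish", "CYB_IGUIG": "Iguana"}
example = ">CYB_CARAU Cytochrome b\nMAS\n>CYB_IGUIG\nMTI\n"
print(rename_fasta(example, names))
```

Headers whose identifier is not in the mapping are left unchanged, so the script can be run again as more sequences are added.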
STEP 3 :
- Insert the sequences, one after the other and adequately named, in FASTA format [1] into the CLUSTALw input box. The options should not need to be modified, since we are dealing with short sequences which do not appear to be too divergent.
- Once the results have been calculated, save them by right-clicking on the link to the alignment file and selecting “Save target as”. Save the file as Ornithorhynchus.aln.
- CLUSTALw options do not usually need to be modified. If the alignment is very poor with the default options, chances are that modified options will not fix it either. For more details, do not hesitate to refer to the online help file.
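To inspect the saved alignment outside of a graphical tool, the .aln file can be read with a few lines of Python. This is a minimal sketch that assumes the usual Clustal layout (a header line starting with "CLUSTAL", blocks of "name segment" lines, and consensus lines that begin with whitespace) ; the function name `read_clustal` and the sample residues are placeholders for illustration only.

```python
def read_clustal(text):
    """Return {sequence name: aligned sequence} from Clustal-formatted text."""
    seqs = {}
    for line in text.splitlines():
        # Skip blank lines, the "CLUSTAL ..." header and consensus lines,
        # which start with whitespace because they carry no name.
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        name, segment = parts[0], parts[1]
        # A sequence is split over several blocks: append each segment.
        seqs[name] = seqs.get(name, "") + segment
    return seqs


# Placeholder alignment: the residues are NOT real cytochrome b data.
sample = (
    "CLUSTAL W (1.83) multiple sequence alignment\n"
    "\n"
    "Platypus    MTN-RK\n"
    "Goldfish    MASLRK\n"
    "            *   **\n"
    "\n"
    "Platypus    THPL\n"
    "Goldfish    SHPL\n"
    "             ***\n"
)
alignment = read_clustal(sample)
print(alignment)
```

In a real session the text would come from the saved file, for example `open("Ornithorhynchus.aln").read()`.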
STEP 4 :
- Sometimes, if the automatic alignment has not been done properly, it is necessary to edit it manually. For this, we will use SeaView, available for download on the PBIL website : download seaview.
- Open the alignment file in SeaView (File -> Open Clustal)

- If the sequences seem well aligned, no further editing is needed. If not, refer to the second part of the tutorial, which explains how to manually correct minor errors in an alignment.
- Once you are finished, save the alignment in the MASE file format by doing File -> Save as... and typing in Ornithorhynchus.mase. The MASE format is recommended because it is more robust and can store site and species selections as well as comments.
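Judging whether an alignment "seems well aligned" is done by eye in SeaView, but a rough automatic check can point to the regions worth looking at. The sketch below is an assumption-laden illustration, not part of SeaView : the function name `gap_heavy_columns` and the 50% threshold are arbitrary choices, and the sequences are placeholders.

```python
def gap_heavy_columns(seqs, threshold=0.5):
    """Return the 0-based indices of columns whose gap fraction exceeds `threshold`."""
    rows = list(seqs.values())
    lengths = {len(r) for r in rows}
    if len(lengths) != 1:
        # In a valid alignment every row has the same number of columns.
        raise ValueError("aligned sequences must all have the same length")
    flagged = []
    for i in range(lengths.pop()):
        gaps = sum(1 for r in rows if r[i] == "-")
        if gaps / len(rows) > threshold:
            flagged.append(i)
    return flagged


# Placeholder alignment, not real data.
toy_alignment = {"A": "MT-K", "B": "M--K", "C": "MS-K"}
print(gap_heavy_columns(toy_alignment))
```

Columns flagged this way are only candidates for manual inspection ; a gap-rich column is not necessarily an alignment error.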
STEP 5 :
- Now comes the more technical part : constructing the phylogenetic tree. To do this, obtain, install and open Phylo_win.

- Neighbor-Joining and Maximum Parsimony are two of the major methods of tree construction. Often, both methods return the same results ; Neighbor-Joining is much quicker and shows distances. On the other hand, Maximum Parsimony may be the most accurate method when dealing with low divergence levels.
- Neighbor-Joining requires computation of distances between sequence pairs, and Phylo_win offers various kinds of distances through its DISTANCE menu. A first possibility is the Observed Divergence method, although it will not be used here ; Obs. Div. is the quickest and simplest distance method but also the most inaccurate. It calculates the percentage of different residues between two sequences and uses that value to build the tree. Another distance between protein sequences is given by the Poisson correction, which assumes that all amino acid replacements occur with equal probability.
- Bootstrapping consists of building pseudo-replicate alignments by randomly drawing sites (columns) of the alignment with replacement, rebuilding the tree from each replicate, and counting how often each grouping of the original tree reappears. This provides an evaluation of the robustness of each node. Between 500 and 1000 replicates are usually enough to obtain a statistically meaningful result. A bootstrap value over 70% may be considered reliable ; above 90% is good.
- To set up the options for the Neighbor-Joining method, select the Poisson correction method from the DISTANCE menu. If we wish to construct the tree through Maximum Parsimony, this is not required.
- Verify all sequences have been selected by clicking Select All on the Species and Site selection boxes.
- Once the tree has been created, it will need cleaning up. First, the tree needs to be rooted [2]. That is the purpose of the goldfish sequence : it is far enough from all the other sequences to be used as a root [2]. To do so, select new outgroup, then select the Carassius auratus branch. Once that is done, select swap nodes to rearrange the tree so that it looks clearer, and display the bootstraps.
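The two protein distances mentioned in this step can be written out in a few lines. With p the observed proportion of differing residues, the Poisson-corrected distance is d = -ln(1 - p). The sketch below is an illustration under assumptions : the function names are hypothetical, columns containing a gap in either sequence are simply ignored (Phylo_win may treat gaps differently), and the two sequences are placeholders.

```python
import math


def observed_divergence(a, b):
    """Proportion of differing residues between two aligned sequences."""
    # Assumption: columns with a gap in either sequence are ignored.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    diffs = sum(1 for x, y in pairs if x != y)
    return diffs / len(pairs)


def poisson_distance(a, b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = observed_divergence(a, b)
    if p >= 1.0:
        raise ValueError("distance undefined when every residue differs")
    return -math.log(1.0 - p)


# Placeholder sequences differing at 2 of 10 aligned positions.
seq_a = "MTNLRKTHPL"
seq_b = "MTNIRKSHPL"
print(observed_divergence(seq_a, seq_b))  # 0.2
print(poisson_distance(seq_a, seq_b))     # about 0.223
```

The corrected distance is always at least as large as the observed divergence, because it accounts for several replacements having occurred at the same site.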

- For information on exporting and printing the tree, please refer to the second tutorial.
- You should obtain trees similar to these two. A bootstrap ≥ 90% indicates a branch with strong support from the data. Hence these trees clearly show that the Mammalia taxon is monophyletic : Ornithorhynchus anatinus and the other mammals share a common ancestor that no other taxon in the tree shares. Ornithorhynchus anatinus is therefore most probably a mammal. In contrast, the grouping of Monotremata with placental mammals, to the exclusion of marsupials, is supported only by a non-significant bootstrap value (50% with parsimony, 58% with NJ), so this grouping is not supported by the data.
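The bootstrap percentages discussed above come from resampling the columns of the alignment. The following sketch shows only that resampling step and the counting of support ; it is not Phylo_win's implementation. The names `bootstrap_replicate`, `support` and `grouping_holds` are hypothetical, the sequences are placeholders, and the "A is closer to B than to C" test is a toy stand-in for rebuilding a full tree on each replicate.

```python
import random


def bootstrap_replicate(seqs, rng):
    """Draw alignment columns with replacement to build one pseudo-replicate."""
    names = list(seqs)
    length = len(seqs[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(seqs[n][c] for c in cols) for n in names}


def support(seqs, grouping_holds, replicates=500, seed=0):
    """Percentage of replicates in which `grouping_holds(replicate)` is true."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(replicates)
        if grouping_holds(bootstrap_replicate(seqs, rng))
    )
    return 100.0 * hits / replicates


def mismatches(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Placeholder alignment, not real data.
toy = {"A": "MTNLRK", "B": "MTNLRR", "C": "MSNIKK"}


def closer(rep):
    # Toy criterion standing in for "A and B group together in the tree".
    return mismatches(rep["A"], rep["B"]) < mismatches(rep["A"], rep["C"])


print(support(toy, closer))
```

In a real analysis `grouping_holds` would rebuild the tree from each replicate (by Neighbor-Joining or Maximum Parsimony) and check whether the clade of interest is present.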
[1] FASTA format :
The FASTA format consists of a) a header line starting with “>” followed by the sequence name, which should preferably be kept under 10 characters because some programs will truncate it, and b) the sequence itself on the following lines. Multiple sequences may be listed one after the other in the same file.
NB : Protein and DNA cannot both be present in the same file, only one or the other.
For example (a DNA record followed by a protein record, shown together only to illustrate the format) :
>A17677
GTCGACACGCCTTCTGCACGGGAAGTCCTTCTGCGGCCATCGTTGCTATGGCCGCTTACT
GCCTTCTAGTCCGTGCGGCTCTCGCAACAGCTCACGGGACCTTTTTGAGGATCGCCACTT
CAGGTCTTCAACTCGCGGATGCCCTCATTGGCAACGTTTGCGCCCTGCCTTGGGGCGGCC
GGCAGCCACCAAGTCGAGCACTTTGCGGCGGAACTACTCGGGGTAACACTTCGGCACGGA
>U00096
MRNPTLLQCFHWYYPEGGKLWPELAERADGFNDIGINMVWLPPAYKGASGGYSVGYDSYD
LFDLGEFDQKGSIPTKYGDKAQLLAAIDALKRNDIAVLLDVVVNHKMGADEKEAIRVQRV
NADDRTQIDEEIIECEGWTRYTFPARAGQYSQFIWDFKCFSGIDHIENPDEDGIFKIVND
YTGEGWNDQVDDELGNFDYLMGENIDFRNHAVTEEIKYWARWVMEQTQCDGFRLDAVKHI
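The structure described above (a “>” header line followed by sequence lines) is simple enough to read with a few lines of Python. This is a minimal sketch ; the function name `parse_fasta` is hypothetical, and the sample text reuses only the first residues of the two records shown above, mixed in one string purely to mirror that example.

```python
def parse_fasta(text):
    """Return {name: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The name is the first word after ">".
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = ""
        elif name is not None:
            # Sequence lines are concatenated until the next header.
            records[name] += line
    return records


sample_fasta = ">A17677\nGTCGACACGC\nCTTCTGCACG\n>U00096\nMRNPTLLQCF\n"
records = parse_fasta(sample_fasta)
print(records)
```

Sequence lines that appear before any header are ignored rather than guessed at.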
[2] Outgroup :
An outgroup is a sequence or group of sequences selected to root the tree. The outgroup must have branched off before the other analysed sequences diverged from their common ancestor. This ensures that the root of the tree is on the branch between the outgroup and the other sequences.
Here, the goldfish is not a tetrapod, which implies that its lineage branched off at a very early date, before all the other sequences diverged from one another.
An outgroup should nevertheless remain genetically close to the analysed sequences for best results.