Rhone-Alpes Bioinformatics Center
- Two versions of HOVERGEN are available: the clean version and the standard version. Both are available in the database section of the PBIL website. The clean version is reached by clicking on the link in the standard HOVERGEN page.
- HOVERGEN is a database of homologous vertebrate protein and nucleotide sequences. It makes it easy to select similar gene sequences from a wide range of vertebrates, which makes it particularly useful for comparative genomics, phylogeny and molecular evolution studies. HOVERGEN Clean contains only complete sequences that are attached to a family. Its library is therefore smaller, but more reliable.
- HOVERGEN also offers pre-calculated sequence alignments and phylogenetic trees for most genes (as long as there are not too many sequences). In effect, it does the work of locating a sequence, BLASTing it to retrieve homologous sequences, aligning them with CLUSTALw or a similar program and constructing a phylogenetic tree, but without the complications, time and effort.
- HOVERGEN trees are computed beforehand and stored. To obtain a tree for a limited number of sequences, or to carry out a more advanced analysis, you can download the CLUSTALw alignment, edit it in SeaView and rebuild the tree.
- For example, use the Protein keyword search in HOVERGEN Clean and search for the family “*insulin*”. The * is a wildcard: HOVERGEN will match not only “insulin” but also names such as “INSULIN/IGF/RELAXIN” or “Insulinoma-glucagonoma”.
- Select the INSULIN/IGF/RELAXIN FAMILY entry by clicking on its identifier.
- You are now on the gene family page.
- Click Tree to see the gene family tree, or click Alignment to see the CLUSTALw alignment. If desired, you may also select a limited number of species from the list below (Ctrl + Click or Shift + Click to select several) and retrieve the alignment of their sequences by pressing Submit.
- In the window that opens, you can save the alignment in .aln format or save the tree to your hard drive.
- Retrieve the complete alignment by selecting Alignment, then save the file to your hard drive as Insulin.aln.
- Open SeaView and load the file. Examine the alignment, correct any errors you see and delete the sequences you do not wish to analyse. Save the file in .aln format if you modified anything, or in .mase format if you left it as it was.
- If the alignment was modified, run the new .aln file through CLUSTALw and examine it in SeaView again. Repeat as necessary.
- Once you are satisfied, open the .mase file in Phylo_win and construct your tree.
For more detailed help on rebuilding the tree, refer to the articles A Simple Phylogenetic Tree Construction pt. 1 and A Simple Phylogenetic Tree Construction pt. 2.
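The step of deleting unwanted sequences from the downloaded alignment can also be scripted. The following is a minimal sketch in plain Python, not part of HOVERGEN or SeaView: it reads Clustal-format text, keeps a chosen subset of sequences and drops the columns that are left with gaps only. The function names and the toy alignment (invented residues, illustrative sequence names) are assumptions made for this example. As with manual editing, the result should still be realigned with CLUSTALw before a tree is built.

```python
def parse_clustal(text):
    """Parse Clustal-format text into a {name: aligned_sequence} dict."""
    seqs = {}
    for line in text.splitlines():
        # Skip the header, blank lines and conservation lines
        # (conservation lines start with whitespace).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        parts = line.split()
        name, chunk = parts[0], parts[1]
        seqs[name] = seqs.get(name, "") + chunk
    return seqs


def keep_sequences(seqs, wanted):
    """Keep only the wanted sequences and remove gap-only columns."""
    kept = {name: seq for name, seq in seqs.items() if name in wanted}
    if not kept:
        return kept
    length = len(next(iter(kept.values())))
    # A column is kept if at least one remaining sequence has a residue there.
    columns = [i for i in range(length)
               if any(seq[i] != "-" for seq in kept.values())]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in kept.items()}


def format_clustal(seqs, width=60):
    """Write sequences back out as Clustal-style interleaved blocks."""
    out = ["CLUSTAL W multiple sequence alignment", ""]
    pad = max(len(name) for name in seqs) + 4
    length = len(next(iter(seqs.values())))
    for start in range(0, length, width):
        for name, seq in seqs.items():
            out.append(name.ljust(pad) + seq[start:start + width])
        out.append("")
    return "\n".join(out)


# Toy alignment: invented residues, used only to exercise the functions.
TOY_ALN = """CLUSTAL W (1.83) multiple sequence alignment

INS_HUMAN       MALW--MRLL
INS_MOUSE       MALL--VHFL
IGF1_HUMAN      MGKISSLPTQ
                *.

INS_HUMAN       PLLA
INS_MOUSE       PLLA
IGF1_HUMAN      -LFK
"""

subset = keep_sequences(parse_clustal(TOY_ALN), {"INS_HUMAN", "INS_MOUSE"})
print(format_clustal(subset))
```

Removing IGF1_HUMAN leaves two columns that contain only gaps, so they are dropped from the two remaining sequences. To work on the real file, the contents of Insulin.aln would be read and passed to parse_clustal in place of TOY_ALN.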
Example 1: ADT1_BOVIN
We have an ATP carrier protein extracted from a bovine heart.
Is it possible to trace its evolution, and is it present in other vertebrates?
How do we retrace the evolution of the gene?
The goal is to understand how HOVERGEN can help explain the evolution of a protein family.
- Go to HOVERGEN Clean and enter the accession number (AC) P02722 in the protein sequence search engine.
- This should return the ADT1_BOVIN sequence, that of an ATP carrier protein extracted from a bovine heart.
- Select it and display the phylogenetic tree.
- By examining this tree, we can reconstruct part of the gene's evolutionary history. We can already point out the duplications which may have happened during the evolution of this gene. To obtain these results, we used the software FamFetch, which runs through the tree attempting to identify the duplications. If more certain results are desired, it is advisable to analyze the tree by hand.
- The white squares on the following tree show duplications:
- Duplication leads to two paralogous sequences: a gene is copied within the same species. Since two copies exist, the selective pressure is quite low and the second copy is able to mutate much more (for example, ADT1_HUMAN and ADT2_HUMAN, in red).
- Speciation, on the other hand, leads to orthologous sequences. It occurs when the common ancestor evolves into two species, each one with a copy of the original gene. Speciation events are much more common than duplications in this tree.
- Hence we can retrace the evolutionary history of the ADT1_BOVIN sequence.
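The distinction between duplication and speciation nodes can be illustrated with a small sketch. The code below uses a common "species overlap" heuristic: an internal node is labelled a duplication when the same species appears on both sides of it, and a speciation otherwise. This is an assumption made for illustration and is not necessarily how FamFetch identifies duplications. The tree topology is a toy example, not the HOVERGEN tree, and the species is assumed to be the part of the sequence name after the underscore.

```python
def species_of(leaf_name):
    """Return the species code of a leaf, e.g. 'BOVIN' for 'ADT1_BOVIN'."""
    return leaf_name.rsplit("_", 1)[-1]


def classify_nodes(tree):
    """Label every internal node of a binary tree.

    A tree is either a leaf name (str) or a pair (left, right) of subtrees.
    Returns (leaf_names, events), where events lists one
    (kind, sorted_leaf_names) tuple per internal node, children first.
    """
    if isinstance(tree, str):
        return [tree], []
    left, right = tree
    left_leaves, left_events = classify_nodes(left)
    right_leaves, right_events = classify_nodes(right)
    left_species = {species_of(name) for name in left_leaves}
    right_species = {species_of(name) for name in right_leaves}
    # Shared species on both sides means the gene was already present
    # in two copies before those species diverged.
    kind = "duplication" if left_species & right_species else "speciation"
    leaves = left_leaves + right_leaves
    events = left_events + right_events + [(kind, sorted(leaves))]
    return leaves, events


# Toy topology (hypothetical): two paralogs, each present in two species.
toy_tree = (("ADT1_HUMAN", "ADT1_BOVIN"), ("ADT2_HUMAN", "ADT2_BOVIN"))

_, events = classify_nodes(toy_tree)
for kind, leaves in events:
    print(kind, leaves)
```

In this toy tree the root is labelled a duplication, because HUMAN and BOVIN occur on both sides of it, while the two inner nodes are speciations that separate the human and bovine copies of each paralog. On real trees the heuristic can be misled by gene losses or poorly supported branches, which is one reason to check the tree by hand.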