HoSeqI is a web service that automatically assigns sequences to homologous gene families from a set of databases. Once the gene family most similar to the query sequence has been identified, the sequence is added to the full alignment of that family and the phylogenetic tree of the family is rebuilt. The phylogenetic position of the query sequence within its gene family can thus be easily identified (author: AM Arigon).

HoSeqI (Homologous Sequence Identification) is a software environment for the automatic identification of homologous sequences and their classification into our comprehensive sequence family databases, HOVERGEN and HOGENOM.

The HoSeqI environment integrates several programs for similarity search, multiple alignment and phylogenetic tree building, as well as specific tools we developed. This environment can be accessed through a web service implemented in HTML-PHP. It is divided into three parts.

First, the identification algorithm uses BLASTP to compare the query protein or nucleotide sequence with the protein family database chosen by the user. The BLASTP results are then analyzed to identify candidate families for the submitted sequence.
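The text does not spell out how the BLASTP results are analyzed, so the following is only a minimal sketch of one plausible approach: keep the best-scoring hit per family from BLAST tabular output (the standard 12-column `-outfmt 6` layout) and rank families by bit score. The function name `rank_families`, the `seq_to_family` mapping, the family identifiers and the E-value cut-off are all assumptions made for illustration, not part of HoSeqI.

```python
def rank_families(blast_tabular, seq_to_family, max_evalue=1e-5):
    """Rank candidate families from BLAST tabular (-outfmt 6) text.

    seq_to_family is a hypothetical mapping from subject sequence ID
    to family ID; max_evalue is an assumed cut-off, not HoSeqI's.
    Returns a list of (family, (evalue, bitscore, subject_id)),
    sorted by decreasing bit score.
    """
    best = {}  # family -> (evalue, bitscore, subject_id) of its best hit
    for line in blast_tabular.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]        # column 2: subject sequence ID
        evalue = float(fields[10])    # column 11: E-value
        bitscore = float(fields[11])  # column 12: bit score
        family = seq_to_family.get(subject_id)
        if family is None or evalue > max_evalue:
            continue  # unknown subject or hit too weak
        current = best.get(family)
        if current is None or bitscore > current[1]:
            best[family] = (evalue, bitscore, subject_id)
    return sorted(best.items(), key=lambda item: -item[1][1])


if __name__ == "__main__":
    hits = (
        "q1\tS1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
        "q1\tS2\t50.0\t100\t50\t0\t1\t100\t1\t100\t1e-10\t80\n"
    )
    mapping = {"S1": "FAM_A", "S2": "FAM_B"}  # hypothetical family IDs
    for family, (evalue, bitscore, subject) in rank_families(hits, mapping):
        print(family, subject, evalue, bitscore)
```

Keeping one best hit per family is a deliberate simplification; a real analysis could also weigh alignment coverage or the number of hits per family.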

Figure: Alignment viewer

Second, it is necessary either to align a large number of sequences or to add a sequence to an existing alignment. A set of multiple alignment programs (CLUSTAL W, MULTALIN, MENTALIGN, MUSCLE and MABIOS) is offered to the user.

Figure: Phylogenetic tree viewer

Lastly, a phylogenetic tree containing the query sequence and the sequences from the family is built. The user can choose among the following tree-building programs: QuickTree, FastME, BIONJ and PhyML. For each program, the user can enable the bootstrap option. The tree is then automatically rooted at its mid-point.
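Mid-point rooting places the root halfway along the longest leaf-to-leaf path in the tree. The sketch below illustrates the idea on a tree stored as a plain adjacency dictionary; it is not HoSeqI's implementation, and the function names, the data layout and the example tree are assumptions. It also assumes non-negative branch lengths, which distance-based methods do not always guarantee.

```python
def farthest(tree, start):
    """Return (node, distance, parent_map) for the node farthest from start."""
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor, length in tree[node].items():
            if neighbor not in dist:
                dist[neighbor] = dist[node] + length
                parent[neighbor] = node
                stack.append(neighbor)
    far = max(dist, key=dist.get)
    return far, dist[far], parent


def midpoint(tree):
    """Locate the mid-point of the longest path in an unrooted tree.

    tree maps each node to {neighbor: branch_length}.
    Returns (u, v, offset): the root lies on edge u-v, offset away from u.
    """
    any_node = next(iter(tree))
    end_a, _, _ = farthest(tree, any_node)          # one end of the longest path
    end_b, longest, parent = farthest(tree, end_a)  # the other end
    half = longest / 2.0
    travelled = 0.0
    node = end_b
    # Walk from end_b back toward end_a until half the path is covered.
    while parent[node] is not None:
        step = tree[node][parent[node]]
        if travelled + step >= half:
            return node, parent[node], half - travelled
        travelled += step
        node = parent[node]
    raise ValueError("tree has no edges")


if __name__ == "__main__":
    # Hypothetical four-leaf tree with internal nodes X and Y.
    example = {
        "A": {"X": 1.0}, "B": {"X": 3.0},
        "C": {"Y": 1.0}, "D": {"Y": 4.0},
        "X": {"A": 1.0, "B": 3.0, "Y": 2.0},
        "Y": {"C": 1.0, "D": 4.0, "X": 2.0},
    }
    print(midpoint(example))
```

In the example, the longest path runs from B to D, and the returned edge and offset place the root at equal distance from both leaves.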

For all programs used in HoSeqI (BLASTP, multiple alignment programs and phylogenetic tree building programs), the interface allows the user to choose non-default parameter values. All results are presented as web pages and can be downloaded.

HoSeqI automates the identification process on large family databases and contributes to the study of the evolutionary background of new sequences. It provides a user-friendly interface that lets a user easily identify a query sequence and visualize the resulting alignment and tree. The user can thus locate the sequence in the tree of its gene family and study the evolution of this new sequence. Computation times range between 30 s (for 143 sequences in the associated family) and 2 min 30 s (for 1132 sequences in the associated family).

Arigon AM, Perriere G, Gouy M. HoSeqI: automated homologous sequence identification in gene family databases. Bioinformatics. 2006 Jul 15;22(14):1786-7.