Select one of the methods for searching the Gclust database by pressing
the respective button, and then enter a keyword or select organisms.
There are two different strategies for finding clusters of interest:
Normal Search and BLAST Search.
Use one or more descriptive words that best describe
the target protein, such as 'DNA-binding protein' or 'photosystem I'.
Many clusters will be displayed in the Search Results.
If you already know the sequence ID (in the Gclust database),
you may use the 'Sequence ID' button and input the ID.
This may be useful after you have already used the Gclust database. In some eukaryotic genomes, several different proteins are predicted for a single gene. In that case, the sequence ID is a composite of the gene ID plus the GI number, such as ATH_AT1G01610_15223437. If you know only the gene ID, you may use an asterisk (*) as a wildcard to search for the protein, such as ATH_AT1G01610*. Currently, the genome name (ATH in this case) is obligatory. You may find it in the organism list.
Likewise, if you already know the number of the cluster of interest,
you may directly select that cluster using the 'Cluster No.' button.
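The composite sequence ID and the asterisk search described above can be illustrated with a small Python sketch. It assumes that the ID is laid out as genome name, gene ID, and GI number separated by underscores, and that the asterisk behaves like a shell-style wildcard. Both are assumptions based on the single example above, not a description of how Gclust itself handles IDs. The function names and the second ID in the list are hypothetical.

```python
from fnmatch import fnmatchcase


def split_sequence_id(seq_id):
    """Split an ID assumed to look like GENOME_GENEID_GI into its parts."""
    genome, rest = seq_id.split("_", 1)   # text before the first underscore
    gene_id, gi = rest.rsplit("_", 1)     # text after the last underscore
    return genome, gene_id, gi


def match_ids(pattern, seq_ids):
    """Return the IDs matching a pattern in which '*' stands for any text."""
    return [s for s in seq_ids if fnmatchcase(s, pattern)]


# The first ID is the example from the text; the second is made up.
known_ids = ["ATH_AT1G01610_15223437", "ATH_AT1G01620_00000001"]

print(split_sequence_id(known_ids[0]))
print(match_ids("ATH_AT1G01610*", known_ids))
```

Note that the pattern must start with the genome name, as in the real search form.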
Normal Search Menu
A more convenient method of finding clusters, however, is to use the BLAST search.
If you have a protein sequence (blastp) or a DNA sequence (blastx),
you may simply use the 'BLAST' menu to perform a BLAST search.
Then, a list of homologs will be displayed as on other web sites.
However, the BLAST search is done against the database for the selected dataset.
In the Results window, select one of the top-ranking sequences,
and the cluster containing that sequence will be displayed.
This method is useful if you have a sequence from an organism
that is not included in the dataset.
Note, however, that the search is done on the selected dataset.
If a good homolog is not found, try using another dataset from the beginning.
BLAST Search Menu
To perform phylogenetic profiling, you need to select organisms or
organism groups. The Search Menu provides two methods.
The 'Organism Group' selection is used to select organism groups as shown.
The members of each group are listed in the 'Organisms' list.
There are three types of selections: 'yes', 'no', and 'any'.
The 'yes' button is used to select the organism group,
while the 'no' button is used to intentionally exclude the organism group.
The 'any' button is used to indicate that the organism group is irrelevant
to the selection. Here, the threshold of selection is 50%.
For example, if you select 'yes' for Bact in the CZ20x0 dataset,
this indicates that you want to select clusters that are conserved in more than
half of the organisms in that group (only two species in this case).
This selection is done at the species level, not at the protein level.
Therefore, even when a single organism is selected,
it may contribute multiple sequences.
In this way, you may set your choice by selecting a label for each organism group.
Be sure to press the 'Organism Group' button.
Finally, press the 'submit' button.
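The 'yes', 'no', and 'any' selection with its 50% threshold can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used by Gclust: a cluster is represented as the set of organisms it contains, 'yes' is taken to mean "present in more than half of the group", and 'no' is taken to mean "not present in more than half of the group". The organism names, cluster numbers, and function names are hypothetical.

```python
def group_conserved(cluster_organisms, members, threshold=0.5):
    """True if the cluster occurs in more than `threshold` of the group's organisms."""
    present = sum(1 for org in members if org in cluster_organisms)
    return present > threshold * len(members)


def matches_profile(cluster_organisms, groups, selection):
    """Check one cluster against a yes/no/any choice for each organism group."""
    for group, choice in selection.items():
        if choice == "any":
            continue  # this group does not affect the selection
        conserved = group_conserved(cluster_organisms, groups[group])
        if choice == "yes" and not conserved:
            return False
        if choice == "no" and conserved:  # assumed meaning of 'no'
            return False
    return True


# Hypothetical groups and clusters (counting is per organism, not per protein).
groups = {"Bact": ["bact_1", "bact_2"], "Cyano": ["cyano_1", "cyano_2", "cyano_3"]}
clusters = {
    101: {"bact_1", "bact_2", "cyano_1"},
    102: {"bact_1", "cyano_1", "cyano_2"},
}

selection = {"Bact": "yes", "Cyano": "any"}
hits = [no for no, orgs in clusters.items() if matches_profile(orgs, groups, selection)]
print(hits)
```

With only two organisms in the group, "more than half" means that both must be present, so the second cluster is not selected.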
Another method of selection is to set the selection for individual organisms.
First, push the 'Organisms' button, and then
select the small tab located under the button.
This tab is used to specify the meaning of an 'unselected' organism.
An unselected organism is treated as excluded if you choose 'no', or as irrelevant if you choose 'any'.
Then, you may select any organisms listed in the panel.
When your selection is complete, press the 'submit' button.
Using the 'no' tab is usually discouraged because homologs sometimes occur in unexpected organisms.
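The difference between the 'no' and 'any' settings for unselected organisms can be made concrete with a short Python sketch. It assumes that every selected organism must be present in the cluster, and that with 'no' every unselected organism must be absent. This is an interpretation of the description above, not the actual Gclust implementation, and all names are hypothetical.

```python
def matches_organisms(cluster_organisms, selected, all_organisms, unselected="any"):
    """Check a cluster against a selection of individual organisms."""
    cluster = set(cluster_organisms)
    if not set(selected) <= cluster:
        return False  # a selected organism is missing from the cluster
    if unselected == "no":
        others = set(all_organisms) - set(selected)
        return not (others & cluster)  # unselected organisms must be absent
    return True  # 'any': unselected organisms are ignored


all_organisms = ["org_a", "org_b", "org_c"]
selected = ["org_a", "org_b"]
# A cluster with a homolog in an organism that was not selected.
cluster = {"org_a", "org_b", "org_c"}

print(matches_organisms(cluster, selected, all_organisms, unselected="any"))
print(matches_organisms(cluster, selected, all_organisms, unselected="no"))
```

The cluster passes with 'any' but is rejected with 'no', which shows how a single unexpected homolog can hide a cluster of interest.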
Phylogenetic Profiling Menu
Many different clusters are listed. The 'Annotations' column
displays the annotation given to one of the sequences in the cluster
(usually the first item in the cluster. NB: The first item may not always be
the representative of the cluster).
Long annotations given in the original database are truncated for display,
but you should still be able to identify the proteins included in the cluster.
Use 'Cluster Number' to display the whole cluster,
namely, names of sequences included in that cluster.
If you click on the 'Sequence ID',
the sequence of the selected item will be displayed,
but this may not be very useful.
In the Cluster Display screen, a similarity matrix is shown,
along with the sequence length in amino acids (aa) and the annotation in the original database.
If you click on the 'Sequence ID', the sequence of the selected item will be displayed.
If the cluster is large (containing more than 30 sequences), the similarity matrix is not displayed.
Only a basic description of each sequence is shown.
Below the Cluster Display matrix, there are two or three buttons, depending on the cluster.
The 'Related Sequences' button is used to display related sequences
that are not included in the present cluster.
These sequences are shown as a cluster number with the number of sequences within the cluster (in parentheses).
There is no direct link, but you can use the cluster number to display the related cluster
in a different search window.
The 'Related Sequences' button is only shown if such sequences are present.
The 'ClustalW' button is used to invoke Clustal W for preparing an alignment of the cluster.
This is only possible if the computational load is not very high (a small number of short sequences).
If the invocation of Clustal W fails, try the 'Get All Sequences' button
to download the sequences and make an alignment yourself on your computer or with another web service.
In some cases, the cluster consists of different subclusters, as you can see from the matrix.
In the matrix, '1' indicates 'similar', and '0' indicates 'not similar at the threshold given on the matrix'.
Note that the cluster is built by repeated clustering,
so some members might be somewhat different from other members.
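One simple way to look for subclusters in such a 0/1 matrix is to treat it as a graph and collect the groups of sequences that are linked by '1' entries. The Python sketch below does this with a plain depth-first search. It is only an illustration for reading the matrix, not the clustering procedure used by Gclust, and the example matrix is made up.

```python
def subclusters(matrix):
    """Group row indices that are connected through '1' entries in the matrix."""
    n = len(matrix)
    seen = set()
    groups = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        group = []
        while stack:
            i = stack.pop()
            group.append(i)
            for j in range(n):
                if matrix[i][j] == 1 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups


# Hypothetical matrix for four sequences: 0 and 1 are similar, 2 and 3 are similar.
matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

print(subclusters(matrix))
```

If the whole matrix comes back as a single group even though many entries are '0', the members are joined only through chains of intermediate sequences.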
Copyright © 2006 Sato Lab. All Rights Reserved.