CyMoBase | FAQ

General

Purpose of the database

CyMoBase is a database-driven web application designed to assist researchers who investigate the structure, function and phylogeny of cytoskeletal and motor proteins in their daily work. CyMoBase contains only manually annotated sequences. Although we are aware that some exons/introns may not have been predicted correctly, we are confident that the quality of our protein data far exceeds that of any other database of motor and cytoskeletal proteins.

Usage

The web interface has been designed using "Web 2.0" techniques. The site makes extensive use of Ajax (Asynchronous JavaScript and XML) to present a feature-rich interface while minimizing the amount of transferred data. To use the database, you therefore have to enable JavaScript and session cookies.

Limitations

At the moment, CyMoBase contains only protein sequences. These are linked to GenBank identifiers so that the genes can be reconstructed. CyMoBase also does not yet contain any information about alternative splice forms. Both kinds of information, the genomic DNA for every protein and the alternative splice forms, will be included in future updates of CyMoBase.

How long has the database been available?

The first version of the database went online in May 2006 with release v0.5.0 of the myosins and v0.1.0 of the kinesins and the calcium-binding proteins. For recent updates, have a look at the Data / Proteins pages, which contain detailed version histories of all projects.

How do I cite CyMoBase?

If you use data or tools from CyMoBase in your research, please cite:

F. Odronitz & M. Kollmar (2006)

Pfarao: A web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase).

BMC Genomics 7, 300.

Thank you.

Who assembled / moderates / pays for the database?

Look at the Team and Funding pages for more information.

Browse DB

Why are nucleotide and protein IDs missing for some sequences?

We decided to support only IDs from GenBank. Many sequencing projects have not yet been submitted to GenBank, but as soon as the data are available from GenBank we will add the corresponding IDs. Providing IDs from all the different sequencing centers would confuse users, and since the data are sometimes available only via FTP, we don't expect such IDs to be useful. Experienced users who really want to look at the primary data may use the provided links to the corresponding sequencing project pages.
In general, protein IDs are given only if the corresponding sequences are supported by full-length cDNA clones. Tracking the IDs of all the wrongly predicted genes would not be useful.

Why aren't all publications related to a certain protein cited?

Because this is a sequence database, only the first publication mentioning at least a sequence fragment of a certain protein is cited. If several publications appeared independently at about the same time, all of them are cited. Most links between sequences and publications were derived from the GenBank entries. Because these entries are often not updated when a publication appears years after the sequence was submitted, we are sure that we have missed many publications. If you find that a publication is missing, please do not hesitate to contact us so that we can add the corresponding information.

I have found a sequence in GenBank that is different from the one in CyMoBase. Why is there no comment?

We have decided to comment only on experimentally derived data, for example when a gene was isolated in 1991 and its sequence differs from the one derived from genome sequencing. However, most sequences in GenBank are now derived from automatic annotations. These contain many mistakes, for several reasons. Because they are updated from time to time (the 4th version of the Drosophila melanogaster annotation is available now), it is not possible to provide up-to-date comments on all these sequences. Except for alternative splice forms, which have not yet been included in CyMoBase, you can be sure that the CyMoBase data are correct.

What does a "fragment" or a "partial" sequence mean?

We introduced these descriptors to tell users whether a sequence is available in full length or not. They are also very important for phylogenetic analyses: fragments cannot be used (two fragments might not overlap at all), whereas partial sequences generally do not change the tree. Partial sequences lack only a small part of the sequence (up to 10%), whereas fragments consist of only a small part of the presumed full-length sequence.
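The 10% rule for partial sequences can be illustrated with a small sketch. The helper name `classify_completeness` is hypothetical, and the expected full-length size is an assumed input (for example, estimated from close homologues). The FAQ gives no exact cutoff for fragments, so this sketch simply assumes that anything missing more than 10% counts as a fragment.

```python
def classify_completeness(observed_len: int, expected_len: int) -> str:
    """Label a sequence as 'full-length', 'partial' or 'fragment'.

    Hypothetical helper: expected_len is an assumed estimate of the
    full-length protein size, not a value stored in CyMoBase.
    """
    if expected_len <= 0:
        raise ValueError("expected_len must be positive")
    # Fraction of the presumed full-length sequence that is missing.
    missing = max(expected_len - observed_len, 0) / expected_len
    if missing == 0:
        return "full-length"
    if missing <= 0.10:  # missing up to 10 % -> partial
        return "partial"
    return "fragment"    # assumption: anything shorter is a fragment


print(classify_completeness(2000, 2000))  # full-length
print(classify_completeness(1850, 2000))  # partial (7.5 % missing)
print(classify_completeness(600, 2000))   # fragment (70 % missing)
```

In practice the boundary between a large partial and a small fragment is a curator's judgement; the single threshold here only makes the "up to 10%" rule concrete.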

How do you judge whether a gene is a pseudogene?

That is a very difficult decision. Because experimental data are missing for most of the sequences, a stop codon in a sequence might simply result from a sequencing error. We only propose that a gene might be a pseudogene if its sequence contains several stop codons, or a particular stop codon that is conserved in several species.
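The criterion above can be sketched as a check on aligned translations. This is a hypothetical illustration, not the CyMoBase pipeline: it assumes an alignment given as a dict of equal-length strings with `*` for an in-frame stop and `-` for a gap, and it assumes concrete thresholds ("several" = at least 2 internal stops, or the same stop column in at least 2 other species).

```python
def internal_stops(aligned: str) -> set[int]:
    """Alignment columns holding an in-frame stop ('*') before the last residue."""
    residues = [i for i, c in enumerate(aligned) if c != "-"]
    if not residues:
        return set()
    last = residues[-1]  # a stop here is the normal terminal stop
    return {i for i in residues if aligned[i] == "*" and i != last}


def is_pseudogene_candidate(query: str, alignment: dict[str, str],
                            min_stops: int = 2, min_species: int = 2) -> bool:
    """Flag `query` if it has several internal stops, or one internal stop
    shared by at least `min_species` other species (assumed thresholds)."""
    stops = internal_stops(alignment[query])
    if len(stops) >= min_stops:
        return True
    others = [internal_stops(seq) for name, seq in alignment.items()
              if name != query]
    for col in stops:
        if sum(1 for s in others if col in s) >= min_species:
            return True
    return False


aln = {"q": "MKQ*LVR*", "a": "MKQ*LVR*", "b": "MKE*LVK*"}
print(is_pseudogene_candidate("q", aln))  # True: stop at column 3 is conserved
```

A single unshared stop is deliberately not flagged, reflecting the point that it may just be a sequencing error.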

How do you distinguish between N-terminal, middle, or C-terminal motor domains?

The motor domain in myosins and kinesins may be located at the N-terminus, in the center, or at the C-terminus of the protein. The distinction is completely artificial and based only on the lengths of the N- and C-terminal extensions of the motor domain. If both the N- and the C-terminal extensions are longer than 100 residues, we call the motor domain middle. There may be some exceptions that have slightly longer extensions but have been named N-terminal or C-terminal based on their close homology to other "true" N-terminal or C-terminal motors. There are also many motor proteins that have "extensions" of fewer than 50 residues on both sides and are therefore not labeled.
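The length rule can be written down as a small function. The function name and its 1-based, inclusive motor-domain coordinates are hypothetical. The FAQ states the "middle" and "unlabeled" cases explicitly; for the remaining cases this sketch assumes that the side with the shorter extension determines the label. The homology-based exceptions are not modelled.

```python
from typing import Optional


def motor_position(protein_len: int, motor_start: int,
                   motor_end: int) -> Optional[str]:
    """Return 'N-terminal', 'middle', 'C-terminal', or None (unlabeled).

    motor_start/motor_end are assumed 1-based, inclusive residue positions.
    """
    if not 1 <= motor_start <= motor_end <= protein_len:
        raise ValueError("invalid motor domain coordinates")
    n_ext = motor_start - 1            # residues before the motor domain
    c_ext = protein_len - motor_end    # residues after the motor domain
    if n_ext > 100 and c_ext > 100:
        return "middle"
    if n_ext < 50 and c_ext < 50:
        return None                    # short "extensions" on both sides
    # Assumption: the motor sits at the end with the shorter extension.
    return "N-terminal" if n_ext <= c_ext else "C-terminal"


print(motor_position(2000, 10, 780))   # N-terminal
print(motor_position(1500, 300, 1000)) # middle
print(motor_position(1000, 600, 980))  # C-terminal
print(motor_position(800, 20, 770))    # None
```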