*****
From: David Bird
Hi Paul,
I wouldn't get too hung up over "jurisdiction". The reality is that the C.
elegans community is much larger and makes much greater use of gene
nomenclature, so in practice it drives usage. There is
an established nomenclature (the paper that Don Riddle and I had in J.
Nematol in 1994) which attempts to codify nomenclature for non-C.
elegans species, and that has worked reasonably well, with a number of
influential labs using the general guidelines for parasitic species.
SON also has an ad hoc committee on nomenclature (of which I am
chair), but that committee has been fairly quiet (although I get an
occasional e-mail for advice on nomenclature; maybe 5/year). I agree
that the "parallel" systems in elegans and briggsae are nuts (did this
come from Dave Baillie's lab, perchance?), and need to be redressed.
In general, the SON nomenclature copies the C. elegans rules as much
as possible, with the major difference in how alleles are handled,
because we don't have a reference strain for each species (i.e., an N2
equivalent). We assign gene names based on phenotype as much as
possible (e.g., sec for secreted; col for collagen, etc.). Because there
is no easy way to assign orthology between different nematode species
(obviously C. elegans and C. briggsae will be the closest
exception, and others will join as they are done), we don't worry about
that. So col-1 from a particular parasite has no relationship to col-1
from another (other than they both should be collagen genes). On the
other hand, ama-1 from another nematode species almost certainly is
THE orthologue of ama-1 in C. elegans. I stress that one can't be too
worried about orthology (and I actually don't understand your
philosophical point "There can multiple orthologs for a gene"; that is
not possible).
As you also propose, we use a species identifier. This has proven to
be our most difficult and contentious point. Our 2-letter code only
allows for 676 species, and obviously there are millions. But it is
worse than that. There are more than 60 species of Meloidogyne, and we
can only accommodate 26 (and egos have been bruised over M. arenaria
and M. artiellia).
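
To make the arithmetic concrete, here is a minimal Python sketch. It assumes the code is one capital letter followed by one lowercase letter (as in Ce or Ma) and that the first letter is the genus initial; naive_code is a hypothetical helper for illustration, not part of the SON guidelines.

```python
from itertools import product
from string import ascii_lowercase, ascii_uppercase

# Every code of the form "Xy": one capital letter, then one lowercase letter.
codes = [a + b for a, b in product(ascii_uppercase, ascii_lowercase)]
print(len(codes))  # 26 * 26 = 676

# Assumption: the first letter is the genus initial, so a single genus
# such as Meloidogyne has only the 26 codes starting with "M".
meloidogyne_codes = [c for c in codes if c.startswith("M")]
print(len(meloidogyne_codes))  # 26


def naive_code(binomial):
    """Hypothetical rule: genus initial plus species initial."""
    genus, species = binomial.split()
    return genus[0].upper() + species[0].lower()


# Both species compete for the same code under this rule.
print(naive_code("Meloidogyne arenaria"))   # Ma
print(naive_code("Meloidogyne artiellia"))  # Ma
```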
[As an aside, my prediction is that there will be up to 10 new
nematode genomes in progress this time next year, including a number
of parasites].
Specific comments
The proposal:
1. Orthologs will be given
the same name but with a species prefix. For example,
cb-tra-1 is the C. briggsae ortholog
of C. elegans tra-1. In some cases, there will be
paralogs and some confusion; we expect this to be minor compared to
the convenience of having orthologs having the same names. For
paralogs, a "dot and number" can be appended to distinguish paralogs,
e.g., hsp-16.1, hsp-16.2.
Concur.
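
For anyone wanting to handle such names programmatically, point 1 might be sketched as below. The pattern is an assumption drawn only from the examples given (a lowercase two-letter species prefix, a three- or four-letter gene class, a number, and an optional ".n" paralog suffix); GeneName and parse_gene_name are illustrative names, not an existing WormBase API.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: [species-]class-number[.paralog], e.g. cb-tra-1, hsp-16.2
GENE_NAME = re.compile(
    r"(?:(?P<species>[a-z]{2})-)?"   # optional species prefix, e.g. "cb"
    r"(?P<gene_class>[a-z]{3,4})-"   # gene class, e.g. "tra"
    r"(?P<number>\d+)"               # gene number within the class
    r"(?:\.(?P<paralog>\d+))?"       # optional paralog suffix, e.g. ".2"
)


class GeneName(NamedTuple):
    species: Optional[str]
    gene_class: str
    number: int
    paralog: Optional[int]


def parse_gene_name(name):
    """Split a gene name into its parts; raise ValueError if it does not fit."""
    m = GENE_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"not a recognised gene name: {name!r}")
    paralog = m.group("paralog")
    return GeneName(
        species=m.group("species"),
        gene_class=m.group("gene_class"),
        number=int(m.group("number")),
        paralog=int(paralog) if paralog is not None else None,
    )


print(parse_gene_name("cb-tra-1"))
print(parse_gene_name("hsp-16.2"))
```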
2. When a gene is identified
in another species that belongs to a gene class with a clear
equivalent in C. elegans, it should be given the same
gene class name, but with a unique symbol as a postfix to the number.
The symbol will include one or more letters followed by a number. For
example, C. briggsae genes could be
dpy-cb1 OR dpy-B1 OR
dpy-CAENORHABDITISBRIGGSAE000000001 etc. The organism's community
should decide on the exact implementation; this choice will be tracked
by the CGC or WormBase. A species prefix could be added but will be
redundant, e.g., cb-dpy-cb1 OR
cb-dpy-B1. OR ce-dpy-1.
Agree that each community should decide on precise details. Either of
the elegans/briggsae options seems OK. The species identifiers need to
be set in stone somehow within and across communities. This is a major
issue. At some level (certainly in WormBase), the full species
binomial needs to be linked to each gene. I would support having
WormBase as the central repository for the binomials and the
accepted 2-letter abbreviations. Some abbreviations should be forever
assigned to some species, and that list could reasonably be those
species with full genomes done, in progress, or likely to be done soon
(C. elegans, C. briggsae, Brugia malayi, Meloidogyne hapla,
Heterodera glycines, etc.). Thus, Ce would never be used for any
other species. But Ma might be. The downside is that it would not be
immediately apparent what species Ma really refers to (it may be
Meloidogyne arenaria, but it might be something else). As long as one
can click on it and see the full binomial, that won't matter. And
the individual researcher working on Meloidogyne arenaria will know
what Ma means in their own context.
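
A rough sketch of how such a central list could behave, in Python. Everything here is an assumption for illustration: only Ce and Ma appear above, so Cb, Bm, Mh and Hg are guessed codes, and SpeciesRegistry is a hypothetical class rather than anything WormBase provides.

```python
# Codes reserved forever for genome species (codes other than "Ce" are guesses).
RESERVED = {
    "Ce": "Caenorhabditis elegans",
    "Cb": "Caenorhabditis briggsae",
    "Bm": "Brugia malayi",
    "Mh": "Meloidogyne hapla",
    "Hg": "Heterodera glycines",
}


class SpeciesRegistry:
    """Maps 2-letter codes to full binomials."""

    def __init__(self, reserved):
        self._reserved = dict(reserved)
        self._local = {}  # (community, code) -> binomial

    def assign_local(self, community, code, binomial):
        """Let one community use a non-reserved code for its own species."""
        if code in self._reserved:
            raise ValueError(f"{code} is reserved for {self._reserved[code]}")
        self._local[(community, code)] = binomial

    def resolve(self, code, community=None):
        """Return the full binomial; reserved codes ignore the community."""
        if code in self._reserved:
            return self._reserved[code]
        return self._local[(community, code)]  # KeyError if never assigned


registry = SpeciesRegistry(RESERVED)
registry.assign_local("root-knot lab", "Ma", "Meloidogyne arenaria")
print(registry.resolve("Ce"))                    # Caenorhabditis elegans
print(registry.resolve("Ma", "root-knot lab"))   # Meloidogyne arenaria
```

The point of the split is that a reserved code always resolves the same way, while a code like Ma only has meaning once the community using it is known.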
3. Gene classes with no
equivalent in C. elegans or other species will be
given unique three-letter-number names.
Concur
4. For alleles, strains,
polymorphisms, rearrangements, transgenes, and other variants, unique
numbers (unique across all species) will be assigned by the relevant
laboratory using the standard C. elegans
nomenclature. In all cases, a species prefix can be used, but is
redundant. For example, "syIs802" is an integrated
transgene in C. briggsae from the Sternberg
laboratory; it could be referred to as
cb-syIs802. syIs802 will never be used for
something else, especially a C. elegans transgene.
Concur. The SON system follows this idea (but in parallel, which is
probably not a good idea). Will this require that non-C. elegans labs
be registered in the C. elegans system? I think this should be
encouraged.
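
To illustrate why the species prefix is redundant in point 4, here is a small Python sketch. It assumes the usual C. elegans pattern of a short lowercase laboratory prefix, an optional Is (integrated) or Ex (extrachromosomal) marker, and a number; other variant types are ignored, and parse_variant and canonical are hypothetical helper names.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: [species-]lab[Is|Ex]number, e.g. syIs802 or cb-syIs802
VARIANT = re.compile(
    r"(?:(?P<species>[a-z]{2})-)?"  # optional, redundant species prefix
    r"(?P<lab>[a-z]{1,3})"          # laboratory prefix, e.g. "sy"
    r"(?P<kind>Is|Ex)?"             # transgene marker, absent for alleles
    r"(?P<number>\d+)"              # number assigned by the laboratory
)


class Variant(NamedTuple):
    species: Optional[str]
    lab: str
    kind: Optional[str]
    number: int


def parse_variant(name):
    """Split a variant name into parts; raise ValueError if it does not fit."""
    m = VARIANT.fullmatch(name)
    if m is None:
        raise ValueError(f"not a recognised variant name: {name!r}")
    return Variant(m.group("species"), m.group("lab"),
                   m.group("kind"), int(m.group("number")))


def canonical(name):
    """Drop the species prefix; the rest is unique across all species."""
    v = parse_variant(name)
    return f"{v.lab}{v.kind or ''}{v.number}"


print(parse_variant("cb-syIs802"))
print(canonical("cb-syIs802") == canonical("syIs802"))  # True
```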
Existing gene classes used in
other species could be retained (but retired) since there are not too
many of them (e.g., cby, mip).
I concur.
Responsibility for the
numbering of a gene class will reside with the assigning laboratory,
unless transferred by them to WormBase and the
Caenorhabditis Genetics Center. (As in the present practice,
in some cases, if desirable, a small block of numbers can be assigned
to another laboratory.)
Concur. Labs generating large numbers of genes (mainly through genome
projects) should establish special relationships with WormBase.
****