Subject: Re: nomenclature
From: Jonathan Hodgkin
Date: Wed, 24 Jul 2002 09:40:01 +0100 (BST)
To: Paul Sternberg
CC: jonathan.hodgkin@bioch.ox.ac.uk, Marie-Anne Felix, Bhagwati Gupta

A few comments:

1.  In cases of definite molecular orthology or paralogy, I think that
each C. briggsae gene should always get the corresponding C. elegans name
as its main name, even if the gene was initially identified as
cby-x or mip-y.

2.  This principle has already run into a small difficulty in the case of
Cb-nhr genes for which there is no obvious Ce-nhr homolog.  In this case,
I suggested (to Ann Sluder) that they be given new nhr-xxx numbers, even
though there is no corresponding Ce-nhr-xxx.  I don't think this creates
a problem.
Similarly, the set of mir-xx genes in C. elegans is very discontinuous
in numbering, because there appear to be a substantial number of
micro-RNA genes that are present in other organisms but apparently
absent from the C. elegans genome.
So there is no problem in having C. elegans-type names with no
corresponding C. elegans gene.

3.  I have never liked the mip-, ung- and so on gene names, although
they may be a convenient temporary expedient.
Recently Eric Haag queried the very case you use as an example -- what
to call new tra mutations isolated in C. briggsae.  He suggested calling
them tra-a, tra-b and so on, until it is clear whether or not they
correspond to Ce-tra- genes.  If one of them does, then obviously it would
become Cb-tra-2, for example.  If it doesn't, I don't think it would be
problematic to call it Cb-tra-4.
(I have added Eric Haag to this correspondence).

4.  This obviously corresponds to the suggestion of using blocks of unused
numbers for non-Ce genes, where appropriate.  I think this is the best
solution.  The tra[Cb]-1 type proposal, while workable in principle,
seems to me awkward and prone to misuse and misunderstanding.

5.  The objections raised to the proposal of number block assignment seem
a bit theoretical:
a. 	It seems unlikely that a number catastrophe would ensue,
because the scale of work is not expanding that fast.
b.	It would be easy to keep track of what blocks belonged where, because
that would be the responsibility of the original Ce- assigning lab, for
each gene class.
c.	Much of the time, it should become rapidly known what the
molecular identity of a non-Ce- gene is, so only a temporary name like
tra-a or tra(allele) would be needed in the interim, before cloning.
Pre-assigning huge number blocks would be undesirable and should be discouraged.

6.  I share Takao's dislike of 3-digit gene numbers, and will try to
prevent 4-digit numbers (which is a possibility for the Ce-let- class).
So one might make an exception in the case of the largest classes,
let, unc and lin, IF large numbers of genes need naming, and give those
new 3-letter names for each species where they are needed.

7.  I think there is still time to proceed with caution, to see how the
non-Ce field evolves, before establishing firm recommendations.
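The procedure in points 3 to 5 (temporary letter names, then either the orthologous Ce number or an unused number in the same class) can be sketched as a small per-class registry. This is a minimal sketch under stated assumptions: the class `GeneClass`, its methods, and the rule of simply taking the lowest unused number are hypothetical illustrations, not an agreed CGC procedure; the registry is assumed to be kept by the lab that assigned the original Ce- names for that class.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GeneClass:
    """Hypothetical record for one gene class (e.g. "tra")."""
    name: str
    ce_numbers: set[int]                                     # numbers used by C. elegans genes
    other_numbers: set[int] = field(default_factory=set)     # numbers given to non-Ce genes
    synonyms: dict[str, str] = field(default_factory=dict)   # temporary name -> main name

    def next_unused(self) -> int:
        """Lowest number not yet used by any species (assumed allocation rule)."""
        used = self.ce_numbers | self.other_numbers
        n = 1
        while n in used:
            n += 1
        return n

    def resolve(self, species: str, temp_name: str, ortholog: int | None = None) -> str:
        """Give a temporary name (e.g. "tra-a") its main name once it is cloned.

        If the gene is orthologous to an already-numbered gene, it takes that
        number; otherwise it takes the next unused number in the class.
        The temporary name is kept as a synonym.
        """
        if ortholog is not None:
            if ortholog not in self.ce_numbers | self.other_numbers:
                raise ValueError(f"no {self.name}-{ortholog} to be orthologous to")
            number = ortholog
        else:
            number = self.next_unused()
            self.other_numbers.add(number)
        main = f"{species}-{self.name}-{number}"
        self.synonyms[temp_name] = main
        return main


# Example from point 3: C. elegans has tra-1, tra-2 and tra-3.
tra = GeneClass("tra", ce_numbers={1, 2, 3})
print(tra.resolve("Cb", "tra-a", ortholog=2))  # Cb-tra-2
print(tra.resolve("Cb", "tra-b"))              # Cb-tra-4
```

Because non-orthologous numbers are drawn from one pool per class, a later non-orthologous gene in a third species would receive 5 rather than reuse 4, which is one way to keep numbers from colliding across species.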

Regards

Jonathan

----------------------------------------------------
Jonathan Hodgkin
CGC Genetic Map and Nomenclature Curator
(Caenorhabditis Genetics Center Subcontract)
Genetics Unit, Department of Biochemistry
University of Oxford
South Parks Road, Oxford OX1 3QU, UK
Tel (+44) 1865 275317
Fax (+44) 1865 275318
Email jah@bioch.ox.ac.uk
Nomenclature Guidelines:
http://biosci.umn.edu/CGC/Nomenclature/nomenguid.htm



On Mon, 22 Jul 2002, Paul Sternberg wrote:

We have started to clone some C. briggsae mutations for which there are
clear C. elegans orthologs, and have thought about names. What do you think
of the following general nomenclature for other nematodes?  Thanks.
Paul

DRAFT 1: Possible revision of gene names for non-C. elegans nematode
species.

Under the current nomenclature system for non-C. elegans nematode
species, each gene class in a given species has a unique three-letter
gene class name that does not overlap with those of other species; e.g.
the C. briggsae equivalent of dpy is cby, and the equivalent of unc is mip.
However, with over 1100 gene classes in C. elegans and an increasing
number of species under study, this will soon become unmanageable. We
therefore propose an alternative nomenclature system, one which will
allow genes with similar mutant phenotypes in other species to keep the
same gene class name.

When a gene is identified in another species that belongs to a gene
class with a clear equivalent in C. elegans, it should be given the same
gene class name, but with a species identifier in square brackets as a
postfix. For example, a new Tra mutation in C. briggsae could be named
"tra[Cb]-1", and a new Unc mutation in P. redivivus could be named
"unc[Pr]-1".

Gene classes with no equivalent in C. elegans or other species will be
given unique identifiers. The postfix indicates the equivalent gene
class but does not indicate an orthologous relationship. In other words,
C. briggsae genes tra[Cb]-1, tra[Cb]-2 and tra[Cb]-3 do not necessarily
correspond to C. elegans tra-1, tra-2 and tra-3.

The preexisting system of indicating orthology by a prefix (e.g.
Cb-tra-1 for the C. briggsae ortholog of C. elegans tra-1) will be
retained and used in parallel. If tra[Cb]-1 is discovered to be the
C. briggsae ortholog of C. elegans tra-2, it would be renamed Cb-tra-2
and tra[Cb]-1 would be treated as a synonym.

Although this system would lead to similar names with different meanings
(e.g. tra[Cb]-1, Cb-tra-1, Ce-tra[Cb]-1 etc.), it should be easy to
follow if one keeps in mind that the prefix indicates orthology at the
gene level and the postfix indicates the equivalent gene class.
Preexisting species-specific gene names like cby could be renamed
(dpy[Cb]) or retained.
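The rule "prefix means orthology, postfix means gene class" is mechanical enough to parse. The sketch below assumes a grammar the draft does not spell out: species codes are one capital plus one lowercase letter (Ce, Cb, Pr), class names are three or four lowercase letters, and numbers are decimal. `NAME_RE`, `GeneName`, `parse` and `describe` are hypothetical names used only for illustration; temporary names such as tra-a are deliberately not matched.

```python
import re
from typing import NamedTuple, Optional

# Assumed grammar: [Xx-] class [ [Xx] ] -number
#   prefix  "Cb-"  : orthologous to the named gene in another species
#   postfix "[Cb]" : member of the equivalent gene class in that species only
NAME_RE = re.compile(
    r"^(?:(?P<prefix>[A-Z][a-z])-)?"
    r"(?P<gene_class>[a-z]{3,4})"
    r"(?:\[(?P<postfix>[A-Z][a-z])\])?"
    r"-(?P<number>\d+)$"
)


class GeneName(NamedTuple):
    prefix: Optional[str]    # species of the ortholog, e.g. "Cb"
    gene_class: str          # e.g. "tra"
    postfix: Optional[str]   # species using the class, e.g. "Cb"
    number: int


def parse(name: str) -> Optional[GeneName]:
    """Split a name into its parts, or return None if it does not fit the grammar."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    return GeneName(m["prefix"], m["gene_class"], m["postfix"], int(m["number"]))


def describe(name: str) -> str:
    """One-line reading of a name under the prefix/postfix convention."""
    g = parse(name)
    if g is None:
        return f"{name}: not in the expected format"
    base = f"{g.gene_class}-{g.number}"
    if g.postfix is not None:
        base = f"{g.gene_class}[{g.postfix}]-{g.number}"
    if g.prefix is not None:
        return f"{name}: {g.prefix} ortholog of {base}"
    if g.postfix is not None:
        return f"{name}: {g.gene_class} class gene in {g.postfix}, orthology not implied"
    return f"{name}: gene {base}"


for n in ("tra[Cb]-1", "Cb-tra-1", "Ce-tra[Cb]-1"):
    print(describe(n))
```

The three names in the loop are the draft's own examples of "similar names with different meanings"; the parser separates them only because the prefix and postfix occupy different positions.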

Discussion
Bhagwati suggests that gene numbers should start from 101 or some other
high number to avoid confusion (e.g. between Cb-lin-11 and lin[Cb]-11),
or even higher for unc and let.

Takao prefers low numbers for several reasons.
1) High numbers would be redundant with the postfix.
2) Three digit gene names like let-617 are hard to remember.
3) We can start tra[Cb] with 101, but if we start other species with 101
also, there would be confusion among non-C. elegans species. If we
assign different number blocks to different species, numbers would get
very large very soon. Also, this would require someone to keep track of
which numbers correspond to which species.
4) If we assign a unique number block for C. briggsae, the tendency
would be to abbreviate the postfix, which could cause problems later,
especially if other species have the same number block. Having
overlapping numbers would enforce the use of postfix.
5) Gene names would get very long with the postfix and three-digit gene
numbers.
Takao is most concerned with the redundancy and awkwardness of
three-digit gene names.

26 squared is 676 and 26 cubed is 17,576; neither is sufficient.
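One reading of this arithmetic (an assumption; the note does not say what the names must cover) is that two- or three-letter class names cannot give every species its own set. The sketch below checks that reading using the draft's figure of about 1100 C. elegans gene classes and assumes, hypothetically, that each species eventually needs a comparable number of unique class names; `species_capacity` is an illustrative helper, and it ignores the further loss from unpronounceable letter combinations.

```python
def species_capacity(name_length: int, classes_per_species: int = 1100,
                     alphabet: int = 26) -> int:
    """Number of species that could each receive `classes_per_species`
    unique class names of exactly `name_length` letters."""
    return alphabet ** name_length // classes_per_species


for n in (2, 3):
    print(f"{n} letters: {26 ** n:,} names, room for {species_capacity(n)} species")
# 2 letters: 676 names, room for 0 species
# 3 letters: 17,576 names, room for 15 species
```

Under these assumptions the three-letter pool would be exhausted by about 15 species, C. elegans included.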