Subject: Fwd: nomenclature-non elegans
From: Paul Sternberg
Date: Sat, 18 Oct 2003 08:47:55 -0700
To: Bhagwati Gupta, tinoue@caltech.edu



Begin forwarded message:

From: Jonathan Hodgkin <jah@bioch.ox.ac.uk>
Date: Sat Oct 18, 2003  07:41:55 AM US/Pacific
To: Paul Sternberg <pws@caltech.edu>
Cc: Jim McCarter <mccarter@genetics.wustl.edu>, riddled@missouri.edu, vra@dartmouth.edu, Iva Greenwald <greenwald@cancercenter.columbia.edu>
Subject: Re: nomenclature-non elegans


Dear Paul et al

My views are added as annotations below -- also responding to some of
the comments you've already received.  It looks like there is reasonable
convergence.  But, you are never going to please everybody.

Best wishes

Jonathan


----------------------------------------------------
Jonathan Hodgkin
CGC Genetic Map and Nomenclature Curator
(Caenorhabditis Genetics Center Subcontract)
Genetics Unit, Department of Biochemistry
University of Oxford
South Parks Road, Oxford OX1 3QU, UK
Tel (+44) 1865 275317
Fax (+44) 1865 275318
Email jah@bioch.ox.ac.uk
Nomenclature Guidelines:
http://biosci.umn.edu/CGC/Nomenclature/nomenguid.htm




*************************************************************
The proposal:

1.  Orthologs will be given the same name but with a species prefix.
For example, cb-tra-1 is the C. briggsae ortholog of C. elegans tra-1.
In some cases, there will be paralogs and some confusion; we expect
this to be minor compared to the convenience of having orthologs
having the same names.  For paralogs, a “dot and number” can be
appended to distinguish paralogs, e.g., hsp-16.1, hsp-16.2.

-- The optional species prefix currently recommended has 2 letters,
genus capitalized (Ce-tra-1).  As others have noted, cross-phylum
naming now tends to use 3 letters, so I think it has become time to
adopt 3 letters rather than 2 letters, but still keeping the genus
capitalized (Ce-tra-1 --> Cel-tra-1).  It is not that much more
cumbersome.  Using 3 letters will also partly solve
the numerous-nematode-species problem.  If the universe goes on
forever, it may become necessary to start using 4 and 5 letter
prefixes, to cope with all ?9 million nematode species, but we can leave
that problem for our grandchildren to solve.  But some authority will have
to assign the 3 letter organism codes, at least for nematodes.
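
A minimal sketch of how the 3-letter prefixed form with an optional
paralog suffix could be parsed.  The regular expression, the function
names, and the rule of deriving a code from the genus initial plus two
species letters are assumptions for illustration only; as noted above,
the actual codes would have to be assigned by some authority.

```python
import re

# Assumed pattern: 3-letter species code (capitalized genus initial),
# 3-letter gene class, gene number, optional ".N" paralog suffix.
GENE_NAME = re.compile(
    r"^(?P<species>[A-Z][a-z]{2})-"
    r"(?P<gene_class>[a-z]{3})-"
    r"(?P<number>\d+)"
    r"(?:\.(?P<paralog>\d+))?$"
)


def species_prefix(genus: str, species: str) -> str:
    """Illustrative derivation only; real codes would be assigned centrally."""
    return genus[0].upper() + species[:2].lower()


def parse_gene_name(name: str) -> dict:
    """Split a prefixed gene name into its parts, or raise ValueError."""
    match = GENE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a prefixed gene name: {name!r}")
    return match.groupdict()


print(species_prefix("Caenorhabditis", "briggsae"))
print(parse_gene_name("Cel-sir-2.1"))
```

Under these assumptions the older 2-letter form (Ce-tra-1) is rejected,
which is one way a curation script could flag names that still need
converting.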

-- The paralog naming system has been in place for some time in C. elegans,
and is working well for small families (like, one yeast SIR2 corresponds to
four worm paralogs sir-2.1, sir-2.2, sir-2.3, sir-2.4).  Some problems do
arise with larger gene families, such as nhr- genes, which Ann Sluder has
been annotating in the C. briggsae genome, in consultation with me.  Four
situations are possible:
a.  Where there is clear 1:1 orthology, the Cbr gene certainly gets
the same nhr- number as the Cel gene.
b.  Where one Cel gene (nhr-x) corresponds to several Cbr genes
(probable paralogs), then the Cbr genes get paralog numbers nhr-x.1,
nhr-x.2, nhr-x.3.
c.  Things get a bit more complicated if one Cbr gene is about equally
similar to several Cel genes.  One can give the single Cbr-nhr- the
same number as the lowest-numbered close relative in Cel, and add a
note about the probable paralogy, but sometimes it may be safer to treat
this the same as case d:
d.  Cbr genes or families of genes with uncertain relationship to Cel
genes.  These should be given new numbers, preferably starting a new
block at, say, nhr-200, so as not to overlap with the Cel-nhr- genes.
Since Ann is in charge of the nhr- gene class naming, it's her call
as to the nhr- numbering in Cbr, and by extension in other nematodes.
   Also, as more genomic information is acquired for additional
species, orthologies and paralogies may become clearer, in which case
some revision of numbering may become appropriate.  Such revision
would not be catastrophic.
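
The four cases above can be sketched as a small naming routine.  This is
only an illustration under stated assumptions: the input format (pairs
of Cel nhr- numbers and hypothetical Cbr gene identifiers), the function
name, and the choice to send many-to-many groups to case d are mine, and
the nhr-200 starting block is just the "say, nhr-200" suggested above.

```python
def assign_cbr_nhr_names(groups, new_block_start=200):
    """Name Cbr nhr- genes from orthology groups.

    groups: list of (cel_numbers, cbr_ids) pairs, where cel_numbers are
    the related Cel nhr- numbers and cbr_ids are hypothetical Cbr gene
    identifiers.  Returns (names, notes), both keyed by Cbr identifier.
    """
    names, notes = {}, {}
    next_new = new_block_start
    for cel_numbers, cbr_ids in groups:
        if len(cel_numbers) == 1 and len(cbr_ids) == 1:
            # Case a: clear 1:1 orthology, same number as the Cel gene.
            names[cbr_ids[0]] = f"Cbr-nhr-{cel_numbers[0]}"
        elif len(cel_numbers) == 1:
            # Case b: one Cel gene, several Cbr genes, numbered as paralogs.
            for i, gene_id in enumerate(cbr_ids, start=1):
                names[gene_id] = f"Cbr-nhr-{cel_numbers[0]}.{i}"
        elif len(cel_numbers) > 1 and len(cbr_ids) == 1:
            # Case c: use the lowest-numbered Cel relative and add a note.
            names[cbr_ids[0]] = f"Cbr-nhr-{min(cel_numbers)}"
            relatives = ", ".join(f"nhr-{n}" for n in sorted(cel_numbers))
            notes[cbr_ids[0]] = f"probable paralogy with Cel {relatives}"
        else:
            # Case d (and, by assumption, many-to-many): new block of numbers.
            for gene_id in cbr_ids:
                names[gene_id] = f"Cbr-nhr-{next_new}"
                next_new += 1
    return names, notes


example = [([1], ["gA"]), ([5], ["gB", "gC"]), ([7, 3], ["gD"]), ([], ["gE", "gF"])]
names, notes = assign_cbr_nhr_names(example)
print(names)
print(notes)
```

Because later genome data may clarify orthologies, the output of such a
routine would be a starting point for the gene class curator, not a
final assignment.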


2. When a gene is identified in another species that belongs to a gene
class with a clear equivalent in C. elegans, it should be given the
same gene class name, but with a unique symbol as a postfix to the
number.  The symbol will include one or more letters followed by a
number. For example, C. briggsae genes could be dpy-cb1 OR dpy-B1 OR
dpy-CAENORHABDITISBRIGGSAE000000001 etc.  The organism’s community
should decide on the exact implementation; this choice will be tracked
by the CGC or WormBase.  A species prefix could be added but will be
redundant, e.g., cb-dpy-cb1 OR cb-dpy-B1.  OR ce-dpy-1.

C. briggsae gene tra-cb1 does not necessarily correspond to C. elegans
tra-1, tra-2 and tra-3.

2a.  We propose that dpy-cb1 be the form for C. briggsae since lower
case is easier to type.  (Paul Sternberg, Bhagwati Gupta have so far
expressed this preference.)

-- Although 3 letters with genus capitalized is what I recommend for
the organism identifier, I think a 2 letter code should be ample for
phenotype-based naming, and agree that lower case is preferable.
[Don't you mean prefix, not postfix?].  It seems unlikely that the
number of large scale forward genetic projects on different nematode
species will ever exceed 676.
So, Cbr-dpy-cb1 or dpy-cb1 for short, is fine.
But as you say, the exact implementation could be left up to the
organism community, as long as they checked for uniqueness with a
central authority.
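
For reference, 676 is the number of distinct two-letter lowercase codes
(26 x 26).  The sketch below enumerates them and parses the
phenotype-based form with an optional species prefix.  The pattern and
function name are assumptions for illustration, not an agreed format.

```python
import re
from itertools import product
from string import ascii_lowercase

# Every possible two-letter lowercase code: 26 * 26 of them.
TWO_LETTER_CODES = ["".join(pair) for pair in product(ascii_lowercase, repeat=2)]

# Assumed pattern: optional 3-letter species prefix, 3-letter gene class,
# then a two-letter lowercase code fused with a number (e.g. dpy-cb1).
PHENOTYPE_NAME = re.compile(
    r"^(?:(?P<species>[A-Z][a-z]{2})-)?"
    r"(?P<gene_class>[a-z]{3})-"
    r"(?P<code>[a-z]{2})(?P<number>\d+)$"
)


def parse_phenotype_name(name: str) -> dict:
    """Split a phenotype-based gene name into its parts, or raise ValueError."""
    match = PHENOTYPE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a phenotype-based gene name: {name!r}")
    return match.groupdict()


print(len(TWO_LETTER_CODES))
print(parse_phenotype_name("Cbr-dpy-cb1"))
print(parse_phenotype_name("dpy-cb1"))
```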

-- The question about renaming needs to be addressed here.  If a
mutationally defined non-Cel gene subsequently gets cloned and turns
out to be an ortholog of a named Cel gene, then I think it most
certainly should be renamed to indicate the orthology.
So, in Scott Baird's example, if
'Cre-ung-1' = Cre-unc-cr1 does turn out to be the ortholog of
Cel-unc-4, then it should definitely be renamed Cre-unc-4.  Some
amount of renaming has to be accepted, and indeed has always gone on
in Cel.  As usual, the agreement of all parties concerned should be
sought.
In other cases, if (say) Cre-unc-cr90 should be cloned and prove
to have no previously defined equivalent in Cel, then it might be
reasonable to rename it Cre-unc-201, as in the nhr- example above.
Obviously, each case becomes much easier to sort out once the
molecular identity is known.

3.  Gene classes with no equivalent in C. elegans or other species
will be given unique three-letter-number names.
-- Fine.  These names will still need to be approved by the Cel community
(which, as noted, is likely to remain much larger than the total
non-Cel nematode genetics community).

4.  For alleles, strains, polymorphisms, rearrangements, transgenes,
and other variants, unique numbers (unique across all species) will be
assigned by the relevant laboratory using the standard C. elegans
nomenclature.  In all cases, a species prefix can be used, but is
redundant.  For example, “syIs802” is an integrated transgene in
C. briggsae from the Sternberg laboratory; it could be referred to as
cb-syIs802.  syIs802 will never be used for something else, especially
a C. elegans transgene.

Existing gene classes used in other species could be retained since
there are not too many of them (e.g., ped), but should be retired if
possible.
-- Fine

4a.  We propose that the C. briggsae classes be retired
-- Fine


Responsibility for the numbering of a gene class will reside with the
assigning laboratory, unless transferred by them to WormBase and the
Caenorhabditis Genetics Center.  (As in the present practice, in some
cases, if desirable, a small block of numbers can be assigned to
another laboratory.)

Notes.

Bird and Riddle (1994 J. Nematol.) proposed nomenclature for parasitic
nematodes.  They suggested following the C. elegans guidelines but
with designations in parallel.  It would be desirable for one source
(CGC/WormBase) to enforce uniqueness of lab and allele designations.
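
As a rough sketch of what central enforcement of uniqueness might look
like, the registry below refuses to register a designation twice,
regardless of species.  The class and method names are hypothetical and
do not describe any existing CGC or WormBase system.

```python
class DesignationRegistry:
    """Toy registry: one designation, one owner, across all species."""

    def __init__(self):
        # designation -> (species, laboratory); keys are case-sensitive,
        # since names such as syIs802 rely on mixed case.
        self._owners = {}

    def register(self, designation: str, species: str, laboratory: str) -> None:
        """Record a new designation, or raise ValueError if it is taken."""
        if designation in self._owners:
            taken_by = self._owners[designation]
            raise ValueError(f"{designation} already assigned to {taken_by}")
        self._owners[designation] = (species, laboratory)

    def lookup(self, designation: str):
        """Return (species, laboratory) or None if unregistered."""
        return self._owners.get(designation)


registry = DesignationRegistry()
registry.register("syIs802", "C. briggsae", "Sternberg")
print(registry.lookup("syIs802"))
```

Keying on the bare designation, without the species prefix, is what
makes the prefix redundant: a second attempt to register syIs802 for a
C. elegans transgene fails.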


Philosophy and constraints:

a. From an informatician’s perspective, each genetic entity should
have a unique name, and there should be an authority to maintain
uniqueness.

b. From a researcher’s perspective, the names should be easy-to-use
and intuitive, and not generate confusing nicknames (think about what
you would write on the side of your Petri plate). Sub-communities
(e.g., those working on Pristionchus or C. briggsae) would tend to
drop lengthy identifiers.

c. If possible, the names should not stifle creativity.

d. From a classical geneticist’s point of view, there should be names
that can be used for decades before the molecular identity of a locus
is known.

e. From a molecular geneticist’s point of view, orthology should be
obvious from the name.  There can be multiple homologs for a gene, and
orthology might not be clear, especially if full genome sequence is
not available.

f. However, the name should not confuse relationships among genes.

g. Other species names should not crowd out those in C. elegans.

Uniqueness (a) is the overriding concern.  Ease of use is the second
priority.  Depending on the researcher, (b,d) or maximizing (e) and
minimizing (f) is more important.

N. B.   There are millions of nematode species.

*****************************************************************



On Thu, 16 Oct 2003, Paul Sternberg wrote:

Dear friends,
I think we have reasonable convergence on a nomenclature plan.  (The
immediate motivation is to get briggsae genetic information into
WormBase; there are now many mutations and loci).   (Some of you have
already signed on in Spring-Summer 2002 when this first came up).

One briggsae-specific question is what we want to use for species
identifiers (dpy-cb1 rather than dpy-b1, etc.)

Comments?

After dealing with your comments, this will get posted on WormBase,
Society of Nematology, etc.
Thanks
Paul

Revised genetic nomenclature for non-C. elegans nematode species
DRAFT 2003-10-16

Under the current nomenclature system for non-C. elegans nematode
species, each gene class in a given species has a unique three-letter
gene class name that does not overlap with other species; e.g., the C.
briggsae equivalent of dpy is cby, and the equivalent of unc is mip.
However, with over 1100 gene classes in C. elegans and an increasing
number of species under study, this will soon become unmanageable.  We
therefore propose an alternative nomenclature system that will allow
genes with similar mutant phenotypes in other species to keep the same
gene class name, but with a species-specific identifier.

The proposal:
1.  Orthologs will be given the same name but with a species prefix.
For example, cb-tra-1 is the C. briggsae ortholog of C. elegans tra-1.
In some cases, there will be paralogs and some confusion; we expect
this to be minor compared to the convenience of having orthologs having
the same names.  For paralogs, a “dot and number” can be appended to
distinguish paralogs,  e.g., hsp-16.1, hsp-16.2.

2. When a gene is identified in another species that belongs to a gene
class with a clear equivalent in C. elegans, it should be given the
same gene class name, but with a unique symbol as a postfix to the
number.  The symbol will include one or more letters followed by a
number. For example, C. briggsae genes could be dpy-cb1 OR dpy-B1 OR
dpy-CAENORHABDITISBRIGGSAE000000001 etc.  The organism’s community
should decide on the exact implementation; this choice will be tracked
by the CGC or WormBase.  A species prefix could be added but will be
redundant, e.g., cb-dpy-cb1 OR cb-dpy-B1.  OR ce-dpy-1.

C. briggsae gene tra-cb1 does not necessarily correspond to C. elegans
tra-1, tra-2 and tra-3.

2a.  We propose that dpy-cb1 be the form for C. briggsae since lower
case is easier to type.  (Paul Sternberg, Bhagwati Gupta have so far
expressed this preference.)

3.  Gene classes with no equivalent in C. elegans or other species will
be given unique three-letter-number names.

4.  For alleles, strains, polymorphisms, rearrangements, transgenes,
and other variants, unique numbers (unique across all species) will be
assigned by the relevant laboratory using the standard C. elegans
nomenclature.  In all cases, a species prefix can be used, but is
redundant.  For example, “syIs802” is an integrated transgene in C.
briggsae from the Sternberg laboratory; it could be referred to as
cb-syIs802.   syIs802 will never be used for something else, especially
a C. elegans transgene.

Existing gene classes used in other species could be retained since
there are not too many of them (e.g., ped), but should be retired if
possible.

4a.  We propose that the C. briggsae classes be retired

Responsibility for the numbering of a gene class will reside with the
assigning laboratory, unless transferred by them to WormBase and the
Caenorhabditis Genetics Center.  (As in the present practice, in some
cases, if desirable, a small block of numbers can be assigned to
another laboratory.)

Notes.

Bird and Riddle (1994 J. Nematol.) proposed nomenclature for parasitic
nematodes.  They suggested following the C. elegans guidelines but with
designations in parallel.  It would be desirable for one source
(CGC/WormBase) to enforce uniqueness of lab and allele designations.


Philosophy and constraints:
a. From an informatician’s perspective, each genetic entity should have
a unique name, and there should be an authority to maintain uniqueness.
b. From a researcher’s perspective, the names should be easy-to-use and
intuitive, and not generate confusing nicknames (think about what you
would write on the side of your Petri plate). Sub-communities (e.g.,
those working on Pristionchus or C. briggsae) would tend to drop
lengthy identifiers.
c. If possible, the names should not stifle creativity.
d. From a classical geneticist’s point of view, there should be names
that can be used for decades before the molecular identity of a locus
is known.
e. From a molecular geneticist’s point of view, orthology should be
obvious from the name.  There can be multiple homologs for a gene, and
orthology might not be clear, especially if full genome sequence is not
available.
f. However, the name should not confuse relationships among genes.
g. Other species names should not crowd out those in C. elegans.

Uniqueness (a) is the overriding concern.  Ease of use is the second
priority.  Depending on the researcher, (b,d)  or maximizing (e) and
minimizing (f) is more important.

N. B.   There are millions of nematode species.