P450s in the Human Genome Project data as of Sept. 26, 2000
David Nelson
A new search for human P450s has been conducted in the HTGS and NR sections of
GenBank. The results are more complete than the May 8, 2000 search. At that time,
several known human P450s were not found in the genome project data. Today, only
CYP11B2 is still missing. All other known human P450s have at least part of their
sequence represented. Two new P450s have been found. These are CYP26C1 and CYP2W1.
Other sequences that were previously incomplete have now been completed. For example, a
possible N-terminal has been found for the 4F12 sequence. This gene is in a cluster of six
4F sequences on accession AC068845.3 on chromosome 19 (4F3, 4F8, 4F10P, 4F12,
4F22, 4F23P). Two of the genes do not have contiguous N-terminal sequences and there
is one N-terminal not associated with a gene. This N-terminal is on the same contig as one
of the genes without an N-terminal, but it is pointing away from the C-terminal of this
gene, so it cannot belong to that sequence. The only gene left is 4F12, so I am guessing
that this is the N-terminal of 4F12.
The sequences have been identified to family and subfamily and named when possible.
Many pseudogenes have been found and not all of these are named. Each sequence has been
BLAST searched with the closest match available to identify the location of all exons. These
locations are listed to the nearest thousand base pairs, as in 5,9,43,51,80,86,91,93,94k.
Exons for the gene can be found at these sites in the sequence. They are frequently on
multiple contigs, and the numbers are listed in the order they occur in the DNA sequence. This
order does not reflect exon order in the gene.
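As a minimal sketch of how the location shorthand can be read, the Python below expands a string such as 5,9,43,51,80,86,91,93,94k into approximate base-pair positions. The function name parse_kb_positions is hypothetical and is not part of the tables. The results are only as precise as the tables themselves, that is, to the nearest thousand base pairs.

```python
def parse_kb_positions(text):
    """Expand a shorthand such as '5,9,43,51,80,86,91,93,94k' into
    approximate base-pair positions (each value is in thousands)."""
    cleaned = text.strip().rstrip(".")
    if not cleaned.lower().endswith("k"):
        raise ValueError("expected a trailing 'k' marking kilobase units")
    # Drop the trailing 'k' and split the comma-separated values.
    values = cleaned[:-1].split(",")
    return [int(value) * 1000 for value in values]


positions = parse_kb_positions("5,9,43,51,80,86,91,93,94k")
print(positions)
```

The positions come back in DNA order, as in the tables, so they still need to be matched to exons by alignment before any exon order can be assigned.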
THE POSITION INFORMATION IS VERSION SPECIFIC. Accession numbers have a
version number at the end as in AC000016.1. The version numbers change periodically.
You need to pay attention to the version number. It is possible to step back from the most
recent version number to any previous version by following links in that sequence entry.
In a short while many of the version numbers in these tables will be out of date and you
will have to do this to get the sequence I refer to in the tables.
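The version check described above can be sketched in a few lines of Python. This is an illustration only: the function names split_accession and versions_to_step_back are hypothetical, and the identifier AC068845.5 in the example is an invented later version used to show the arithmetic, not a real entry I have checked.

```python
def split_accession(versioned_id):
    """Split 'AC000016.1' into ('AC000016', 1)."""
    accession, dot, version = versioned_id.strip().rpartition(".")
    if not dot or not accession or not version.isdigit():
        raise ValueError("expected ACCESSION.VERSION, got %r" % versioned_id)
    return accession, int(version)


def versions_to_step_back(table_id, current_id):
    """Return how many versions separate the current entry from the
    version cited in the tables (0 means the positions still apply)."""
    table_acc, table_ver = split_accession(table_id)
    current_acc, current_ver = split_accession(current_id)
    if table_acc != current_acc:
        raise ValueError("identifiers refer to different accessions")
    if current_ver < table_ver:
        raise ValueError("current version is older than the table version")
    return current_ver - table_ver


print(split_accession("AC000016.1"))
# 'AC068845.5' is a made-up later version, used only for illustration.
print(versions_to_step_back("AC068845.3", "AC068845.5"))
```

A result greater than zero means the position information in the tables should be read against the older version of the entry, not the current one.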
The data have been prepared in three different PDF files. These files are sorted by
CYP name, accession number, or chromosome.
Each file is useful for different purposes. To see if a new accession number is present, look
at the accession number list. To see what genes are on a given chromosome, look at the
chromosome file. To see what sequences are available for a particular P450, see the name
file.
The HTGS section of GenBank was searched with 18 different sequences, one from each
mammalian family and two from the CYP2 family. The last 13 searches uncovered only 6
new accession numbers, so nearly all real hits to P450s were found in the first two or three
searches with 2U1, 2C8 and 4F11. The NR section of GenBank was searched with fewer
P450s since these searches return much larger data sets that are harder to sort through.
There may still be some accession numbers in NR that were missed.
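The diminishing return from successive searches can be tallied by counting, for each query, only the accession numbers not already seen in an earlier search. The Python below is a rough sketch of that bookkeeping. The function name new_hits_per_search is hypothetical, and the accession labels (ACC_A and so on) are placeholders, not real search results.

```python
def new_hits_per_search(search_results):
    """For each (query, hits) pair, in order, count the accessions
    that did not appear in any earlier search."""
    seen = set()
    summary = []
    for query, hits in search_results:
        fresh = set(hits) - seen
        summary.append((query, len(fresh)))
        seen |= fresh
    return summary


# Placeholder accession labels, not real GenBank entries.
example = [
    ("2U1", ["ACC_A", "ACC_B", "ACC_C"]),
    ("2C8", ["ACC_B", "ACC_C", "ACC_D"]),
    ("4F11", ["ACC_A", "ACC_D", "ACC_E"]),
]
print(new_hits_per_search(example))
```

When the count of new accessions stays near zero over many queries, as it did for the last 13 HTGS searches, further queries are unlikely to add much.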
Many of the human P450 genes are still incomplete, and most are not assembled into a
single contig. It will be some time before the complete gene structure is realized for these
P450s. Even now, the 26C1 and 27C1 sequences are incomplete and the missing parts
cannot be found in any other section of the database.