Where did all the P450s go?
David Nelson May 2, 2000
Comparison between genomes suggests that the urbilaterian ancestor for all protostomes
and deuterostomes probably had between 15,000 and 20,000 genes. This is based on the
current count in C. elegans of about 19,000 (this may be revised down to about
16,000); Drosophila has about 13,600 and Ciona intestinalis (a tunicate and
invertebrate chordate) is predicted to have about 16,000 (PNAS 14, 4437-40, 1998).
The number in vertebrates is closer to four times this number, around 80,000. The
larger number in vertebrates is assumed to be due to two whole genome duplications
early in vertebrate evolutionary history. The ray-finned fishes seem to have had a
third genome duplication.
Since C. elegans has 80 P450 genes and Drosophila has 90 P450 genes, one might ask why
humans have only about 60 P450s. If there were two genome duplications, wouldn't it
be reasonable to expect that the number of P450s would be about four times higher in
vertebrates than in flies and worms? This would be the case if the number of P450s
was already large before the genome duplications took place. Since the gene numbers
are of similar magnitude, we must assume that the number of P450s present in the
lineage before genome duplication was small. Furthermore, there should be evidence of
the duplication events in the present-day set of vertebrate P450 genes. Fish might
have a higher number of P450s, if they did not lose a large percentage of them after
their third genome duplication.
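
The arithmetic behind this question can be written out as a minimal sketch. It assumes, purely for illustration, that every gene copy is retained after each whole genome duplication and that the pre-duplication lineage carried as many P450s as a present-day worm or fly. The function and variable names are hypothetical; the counts are the approximate figures quoted above.

```python
# P450 gene counts quoted in the text (approximate figures).
OBSERVED_P450S = {"C. elegans": 80, "Drosophila": 90, "human": 60}


def expected_after_duplications(starting_genes, rounds=2):
    """Gene count if every copy were retained after each whole genome duplication."""
    return starting_genes * 2 ** rounds


# Naive vertebrate expectation if the pre-duplication lineage had carried
# as many P450s as a present-day worm or fly (an assumption for illustration).
naive_expectation = {
    species: expected_after_duplications(count)
    for species, count in OBSERVED_P450S.items()
    if species != "human"
}

for species, expected in naive_expectation.items():
    ratio = expected / OBSERVED_P450S["human"]
    print(f"{species}-sized starting set: {expected} P450s expected, "
          f"{ratio:.1f} times the roughly 60 observed in humans")
```

The gap between these naive expectations and the observed human count is what motivates the assumption of a small pre-duplication P450 set. The sketch ignores gene loss and lineage-specific expansions, both of which certainly occurred.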
We can see that half of the fly P450s are from the 4 and 6 families. Also half of the
C. elegans P450s are from the 2 family ancestor. The 6 family in insects is related
to the 3 and 5 families in vertebrates and the 13 and 25 families in the worm.
Therefore, we can infer that the 2, 3 and 4 clans were in the common ancestor. The
same must be true for the CYP51 gene. We can say with confidence that the
urbilaterian ancestor had at least 4 P450s. There is a mitochondrial cluster of P450s
that exists in vertebrates, C. elegans (CYP44) and the fly (Cyp12). This pushes the
minimum count to 5 P450s in the common ancestor.
If we assume two genome duplications in the tetrapod vertebrate lineage, we should
have quadrupled this number to about 20 P450s. If the duplications were early, we
might expect to find some deep divisions in these 5 clans. Are there four
subdivisions in the 51, 2, 3, 4 and mitochondrial clans that might be relics of the
two genome duplications?
The 3 family is related to the 5 family. The 3 and 5 families are probably different
branches of an ancient duplication. However, there do not exist any subfamilies in
these two families, so the evidence for two genome duplications is lacking. If one 3
clan gene were lost after the first duplication, then the 3 and 5 families might be
due to the second duplication.
The mito clan has three families and six subfamilies in vertebrates: 11A, 11B, 24, 27A,
27B, 27C. These could be the product of two rounds of genome duplication. This
prediction would be further supported if another mitochondrial family were found in
The 2 clan is a prime example in support of two genome duplications of a precursor
P450. The 2 clan includes CYP1, CYP2, CYP17 and CYP21 in vertebrates. These are
split just the way one would expect for a double genome duplication with CYP1 and CYP2
clustering together and CYP17 and CYP21 clustering together. The large expansion in
the CYP2 family and the lack of an expansion in the CYP1, 17 and 21 families suggest
that the CYP2 expansion occurred after the second genome duplication.
The 4 family has 5 or 6 subfamilies in vertebrates: 4A, 4B, 4F, 4AH and 4X (and
possibly 4Z). These might be due to genome duplications, but the divergence of the
genes has been less than in the other cases, such that all progeny are still
considered to be in the same family. The CYP46 gene also clusters with the 4 family,
so the 4-46 divergence may have been from the first duplication, with 46 not
duplicated in the second round.
The 51 family at first seems to lack three progeny families that might be derived from
it. However, there are several vertebrate families that lack a home in this scheme.
These include CYP7, 8, 19, 26, and 39. CYP7, 8 and 39 are clearly related; CYP51 and
7 have been linked by Dr. Gotoh; and 51 and 26 cluster in some of my trees. It is
possible that the CYP51 gene was duplicated first to give rise to the 26 family.
This then branched to 26A and 26B. In the second round the 7, 8, 39 lineage could
have been created from the CYP51 duplicating again.
In this scheme all vertebrate P450 families can be accounted for except CYP19, which is
an orphan. In the biochemical pathway of estrogen biosynthesis, CYP19 acts after
CYP17 and so its function must have evolved after CYP17. We mentioned above that
CYP17 may have been created in the second genome duplication. The lack of CYP19
related genes hints that CYP19 did not exist in the common ancestor, since it would
have spawned some descendant families. Two possible locations for the origin of CYP19
would be the postulated CYP3 branch that has no obvious descendants, or the
duplication of CYP46, which also has no related descendants.
The scheme presented here argues that there were 5 P450 clans present in the common
ancestor of all Bilateria. These were the CYP2, 3, 4, 51 and mitochondrial clans.
The 17 families of P450s in vertebrates arose from these five by two successive genome
duplications. The evidence may support an argument for only 5 P450 genes, one in each
of these five clans. More genes present at this point should have resulted in more
families in present-day vertebrates.
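
The family count in this scheme can be checked with a short tally. This is only a sketch of the bookkeeping: the clan assignments are the tentative ones proposed in the paragraphs above (the placement of CYP26, 7, 8 and 39 in the 51 clan in particular is a possibility, not an established result), and the variable names are hypothetical.

```python
# Vertebrate P450 families grouped by clan, as tentatively assigned in the text.
CLAN_FAMILIES = {
    "2": ["CYP1", "CYP2", "CYP17", "CYP21"],
    "3": ["CYP3", "CYP5"],
    "4": ["CYP4", "CYP46"],
    "mitochondrial": ["CYP11", "CYP24", "CYP27"],
    "51": ["CYP51", "CYP26", "CYP7", "CYP8", "CYP39"],  # tentative grouping
}

# CYP19 has no assigned clan in this scheme.
ORPHAN_FAMILIES = ["CYP19"]

families_per_clan = {clan: len(families) for clan, families in CLAN_FAMILIES.items()}
total_families = sum(families_per_clan.values()) + len(ORPHAN_FAMILIES)

# Guard against listing the same family under two clans.
all_families = [f for families in CLAN_FAMILIES.values() for f in families]
all_families += ORPHAN_FAMILIES
assert len(set(all_families)) == len(all_families)

for clan, count in families_per_clan.items():
    print(f"clan {clan}: {count} families")
print(f"orphans: {len(ORPHAN_FAMILIES)}")
print(f"total vertebrate families: {total_families}")
```

With the groupings listed here the total comes to 17, matching the number of vertebrate families cited in the text.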
A detailed analysis of the intron-exon structures of these genes may provide
additional evidence for the proposed scheme.