Where did all the P450s go?

David Nelson, May 2, 2000

Comparison between genomes suggests that the urbilaterian ancestor of all protostomes and deuterostomes probably had between 15,000 and 20,000 genes. This is based on the current count in C. elegans of about 19,000 (this may be revised down to about 16,000); Drosophila has about 13,600, and Ciona intestinalis (a tunicate and invertebrate chordate) is predicted to have about 16,000 (PNAS 14, 4437-40 1998). The number in vertebrates is closer to four times this number, around 80,000. The larger number in vertebrates is assumed to be due to two whole genome duplications early in vertebrate evolutionary history. The ray-finned fishes seem to have had a third genome duplication.

Since C. elegans has 80 P450 genes and Drosophila has 90 P450 genes, one might ask why humans have only about 60 P450s. If there were two genome duplications, wouldn't it be reasonable to expect the number of P450s to be about four times higher in vertebrates than in flies and worms? This would be the case if the number of P450s was already large before the genome duplications took place. Since the P450 gene numbers are of similar magnitude, we must assume that the number of P450s present in the lineage before genome duplication was small. Furthermore, there should be evidence of the duplication events in the present-day set of vertebrate P450 genes. Fish might have a higher number of P450s, if they did not lose a large percentage of them after their third genome duplication.

We can see that half of the fly P450s are from the 4 and 6 families. Also, half of the C. elegans P450s are from the 2 family ancestor. The 6 family in insects is related to the 3 and 5 families in vertebrates and the 13 and 25 families in the worm. Therefore, we can infer that the 2, 3 and 4 clans were in the common ancestor. The same must be true for the CYP51 gene. We can say with confidence that the urbilaterian ancestor had at least 4 P450s.
There is a mitochondrial cluster of P450s that exists in vertebrates, C. elegans (CYP44) and the fly (Cyp12). This pushes the minimum count to 5 P450s in the common ancestor. If we assume two genome duplications in the tetrapod vertebrate lineage, this number should have quadrupled to about 20 P450s. If the duplications were early, we might expect to find some deep divisions in these 5 clans. Are there four subdivisions in the 51, 2, 3, 4 and mitochondrial clans that might be relics of the genome duplications?

The 3 family is related to the 5 family. The 3 and 5 families are probably different branches of an ancient duplication. However, there are no subfamily divisions within these two families, so evidence for two genome duplications is lacking here. If one 3 clan gene were lost after the first duplication, then the 3 and 5 families might be due to the second duplication.

The mito clan has three families and six subfamilies in vertebrates: 11A, 11B, 24, 27A, 27B, 27C. These could be the product of two rounds of genome duplication. This prediction would be further supported if another mitochondrial family were found in vertebrates.

The 2 clan is a prime example in support of two genome duplications of a precursor P450. The 2 clan includes CYP1, CYP2, CYP17 and CYP21 in vertebrates. These are split just the way one would expect for a double genome duplication, with CYP1 and CYP2 clustering together and CYP17 and CYP21 clustering together. The large expansion in the CYP2 family and the lack of an expansion in the CYP1, 17 and 21 families suggest that the CYP2 expansion occurred after the second genome duplication.

The 4 family has 5 or 6 subfamilies in vertebrates: 4A, 4B, 4F, 4AH and 4X (and possibly 4Z). These might be due to genome duplications, but the divergence of the genes has been less than in the other cases, such that all progeny are still considered to be in the same family.
The CYP46 gene also clusters with the 4 family, so the 4-46 divergence may have come from the first duplication, with 46 not duplicated in the second round.

The 51 family at first seems to lack three progeny families that might be derived from it. However, there are several vertebrate families that lack a home in this scheme. These include CYP7, 8, 19, 26, and 39. CYP7, 8 and 39 are clearly related; CYP51 and 7 have been linked by Dr. Gotoh; and 51 and 26 cluster in some of my trees. It is possible that the CYP51 gene was duplicated first to give rise to the 26 family. This then branched to 26A and 26B. In the second round, the 7, 8, 39 lineage could have been created by CYP51 duplicating again.

In this scheme all vertebrate P450 families can be accounted for except CYP19, which is an orphan. In the biochemical pathway of estrogen biosynthesis, CYP19 acts after CYP17, and so its function must have evolved after CYP17. We mentioned above that CYP17 may have been created in the second genome duplication. The lack of CYP19-related genes hints that CYP19 did not exist in the common ancestor, since otherwise it would have spawned some descendant families. Two possible locations for the origin of CYP19 would be the postulated CYP3 branch that has no obvious descendants, or the duplication of CYP46, which also has no related descendants.

The scheme presented here argues that there were 5 P450 clans present in the common ancestor of all Bilateria. These were the CYP2, 3, 4, 51 and mitochondrial clans. The 17 families of P450s in vertebrates arose from these five by two successive genome duplications. The evidence may support an argument for only 5 P450 genes, one in each of these five clans. More genes present at this point should have resulted in more families in present-day vertebrates. A detailed analysis of the intron-exon structures of these genes may provide additional evidence for the proposed scheme.
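
The counting argument can be checked with a short tally. The Python sketch below is only an illustration: the names (ANCESTRAL_CLANS, ORPHANS, expected_after_duplications, family_tally) are hypothetical, and the clan assignments simply transcribe the scheme proposed in the text, with CYP19 kept apart as the orphan and the 3 and 5 families placed in the 3 clan.

```python
# Clan-to-family assignments transcribed from the scheme in the text.
# These groupings are the essay's proposal, not an established classification.
ANCESTRAL_CLANS = {
    "2": ["CYP1", "CYP2", "CYP17", "CYP21"],
    "3": ["CYP3", "CYP5"],
    "4": ["CYP4", "CYP46"],
    "51": ["CYP51", "CYP26", "CYP7", "CYP8", "CYP39"],
    "mito": ["CYP11", "CYP24", "CYP27"],
}
ORPHANS = ["CYP19"]      # no home clan in the proposed scheme
GENOME_DUPLICATIONS = 2  # assumed number of whole genome duplications


def expected_after_duplications(ancestral_genes, rounds):
    """Gene count if every gene survives every round of duplication."""
    return ancestral_genes * 2 ** rounds


def family_tally(clans, orphans):
    """Number of vertebrate families assigned to each clan, plus orphans."""
    counts = {clan: len(families) for clan, families in clans.items()}
    counts["orphan"] = len(orphans)
    return counts


if __name__ == "__main__":
    ceiling = expected_after_duplications(len(ANCESTRAL_CLANS), GENOME_DUPLICATIONS)
    tally = family_tally(ANCESTRAL_CLANS, ORPHANS)
    print("Expected P450s with no gene loss:", ceiling)
    print("Families per clan:", tally)
    print("Total vertebrate families:", sum(tally.values()))
```

With one ancestral gene per clan, the no-loss ceiling is 5 x 4 = 20, and the tally reproduces the 17 vertebrate families counted in the text. The 51 clan lists five families only because CYP7, 8 and 39 are counted separately here; the text treats them as a single lineage.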