Single family tree of 107 P450 sequences

David Nelson

Sept. 11, 1998

An alignment of 107 P450 sequences was assembled by merging existing alignments, 
removing multiple members of a single family, and adding several new sequences.  
This alignment was processed by the Phylip programs Protdist, Neighbor (using the 
UPGMA option) and Drawgram.  The plotfile was imported into Illustrator 6 for labeling 
and it was saved as a PDF document for posting to the web.  This figure shows several 
higher order groupings of P450s called clans.  The 3 clan has been expanded beyond the 
conservative group of 45, 6A1, 30, 3A1, 5A1 and 9A1 to include sequences down to the 
CYP711A1 from Arabidopsis.  I did this because it seems that this Arabidopsis sequence is 
probably a 3 family descendant, the only one known so far from a plant.  However, a 
second tree with only 3 and 4 clan sequences included did not keep the CYP711A1 and 
CYP28 sequences in the 3 clan.  Therefore, it may not be justified to extend the 3 clan 
beyond the 13 and 25 families.  The 4 clan is more conservative and only contains 
sequences that cluster inside the 4A11 sequence.  The cluster just above this could possibly 
be included in the 4 clan.  These are interesting sequences because they contain four 
bacterial eukaryote-like P450s.  CYP505 is P450foxy from Shoun, and it is probably a 
bacterial sequence adopted by Fusarium oxysporum. Fusarium has done this before with 
the CYP55 sequence that is a CYP105 relative.  In the more limited tree mentioned above, 
the 97 and 52 families fell outside the 3 and 4 clans. The C. elegans clan has over half 
(46/80) of all C. elegans P450s in it.  It is probably derived from a 2 clan ancestor, but I 
have given it its own name because it is so large and distinctive as a cluster.  Plant group A 
sequences were noted by Durst and Nelson in Drug Metabolism and Drug Interactions 12, 
189-206 (1995), and this group has continued to grow and now contains 22 plant P450 
families.  The ancestor of this group has been repeatedly recruited for new roles in plant 
biochemistry, just as the C. elegans cluster has arisen in the nematode and the 4 family has 
been amplified in insects and probably in arthropods generally.  There are two additional 
plant clusters, the CYP85 clan and the 86 clan, that both contain strictly plant sequences.  Fungi 
also have some distinct clusters, labeled as Fungal A, B and C here. One other interesting 
group is the mitochondrial clan.  Many of the sequences in this cluster have experimental 
proof for their localization to mitochondria.  The CYP44 and the CYP10 sequences do not 
have this kind of evidence, but the inference is that they too will be found in mitochondria.  
There are several loner sequences that do not form clans.  These include CYP19 and some 
fungal and plant sequences, as well as a Mycobacterium tuberculosis gene, CYP132.  It is 
not known why some sequences are subject to amplification and diversification and others 
are not.  Some small clusters may be advanced to clan status later after input from P450 
researchers indicates that this should happen. One such cluster is the CYP120, CYP26 and 
CYP51 group.  
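
For readers unfamiliar with the clustering step, the UPGMA method selected in Neighbor can be 
sketched in a few lines of Python.  This is only an illustration of the general technique, not 
Phylip's code.  The function name upgma, the labels A-D and the distances are all made up for 
the example; they are not values from the 107-sequence alignment.

```python
from itertools import combinations


def upgma(names, dist):
    """Toy UPGMA clustering (hypothetical helper, not part of Phylip).

    names: list of sequence labels
    dist:  dict mapping frozenset({a, b}) -> pairwise distance
    Returns (tree, merges): a nested-tuple tree and a list of
    (cluster_a, cluster_b, height) in the order the joins were made.
    """
    clusters = {name: (name, 1) for name in names}  # label -> (subtree, size)
    d = dict(dist)                                  # working copy of distances
    merges = []
    while len(clusters) > 1:
        # Join the two clusters that are currently closest.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        (tree_a, n_a), (tree_b, n_b) = clusters[a], clusters[b]
        # The new node sits at half the distance between the two clusters.
        merges.append((a, b, d[frozenset((a, b))] / 2))
        new = f"({a},{b})"
        # Distance to every other cluster is the size-weighted average.
        for c in clusters:
            if c not in (a, b):
                d[frozenset((new, c))] = (
                    n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
                ) / (n_a + n_b)
        del clusters[a], clusters[b]
        clusters[new] = ((tree_a, tree_b), n_a + n_b)
    (tree, _size), = clusters.values()
    return tree, merges


# Made-up distances for four placeholder sequences (illustration only).
toy_names = ["A", "B", "C", "D"]
toy_dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}

toy_tree, toy_merges = upgma(toy_names, toy_dist)
for left, right, height in toy_merges:
    print(f"join {left} + {right} at height {height}")
```

In this toy case A and B join first, C joins them next, and D joins last.  Because UPGMA 
assumes a constant rate of change on all branches, the tree it produces is rooted and all tips 
line up at the same level, which is the kind of tree Drawgram is meant to draw.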

The tree does not include CYP74, which always falls at the bottom of such trees because of the 
poor sequence conservation in the I-helix region and elsewhere.  

This tree should provide a framework for more detailed trees covering specific regions of 
the tree, such as the C. elegans clan, or the 4 clan.

Your feedback on this would be appreciated.