La Grande Motte talk Sept. 14, 2001





La Grande Motte     Sept 11-15 2001      D. Nelson         last modified Sept. 10, 2001

P450s of the lower eukaryote world

The tree of eukaryotic life has a complex structure. [slide 1] Based on ultrastructural 
identities, there are about 60 different eukaryotic lineages.  This tree shows schematically 
the relationships between 13 major groups of eukaryotes, but the branch lengths are not 
accurate.  Though animals and green plants are highly visible because of their size, they 
represent only two of these branches.  To appreciate the broader diversity in eukaryotic life 
it is necessary to consider the other less familiar branches.  Most of these are occupied by 
microscopic single cells, though there are some larger members in these groups.  I will 
show two examples from Rhodophyta and the Stramenopiles.  This is the red alga 
Gelidium [slide 2].  Molecular biology is indebted to this seaweed since it is a major 
source of agar and agarose. Here we see giant kelp that grows up to 10 meters long.  Kelp 
and the famous Sargasso weed are found in the Stramenopiles [slide 3].  Fungi can be quite 
large also.  The current record holder is a 3.4 square mile, or 900 hectare, Armillaria ostoyae, 
or honey mushroom, found in the Malheur National Forest in eastern Oregon.  That single 
mycelial clone is about 2 miles across and is estimated to be 2400 years old. 

The genome projects have now advanced to the point of completing a number of eukaryotic 
genomes and many others are far along toward completion.  The 12 eukaryotes colored red 
on this slide [slide 4] have finished or nearly finished genome sequences.  Those in blue 
have genome projects underway.  Much of this sequence data is not in GenBank.  However, it 
is possible to use the genome project web sites to BLAST search these genomes to see what is 
there.  I have done that for a variety of lower eukaryotes, and I continue to do it since the 
data is coming in so fast that I cannot quite keep up.

Starting at the top, Neurospora crassa has been sequenced.  The sequence coverage of the 
genome is greater than 10-fold.  BLAST searches have identified 38 P450s in 36 families, and 
only 4 of these are incomplete, missing some small portion from the N-terminus or the 
middle of the protein sequence.  All 38 P450s have been named.  Only 7 families have 
related sequences in other species, mostly from Fusarium.  Fusarium is being sequenced 
also, but I have not analyzed that genome yet, so there may be additional matches between 
Fusarium and Neurospora.  Neurospora does have a CYP505 which is related to the 
P450foxy from Fusarium.  This P450 is fused with NADPH cytochrome P450 reductase.

The other big surprise genome of 2001 was the white rot fungus Phanerochaete 
chrysosporium. [slide 5]  This genome is being sequenced at the Dept. of Energy Joint 
Genome Institute in Walnut Creek, California, as part of a microbial genome initiative.  The 
fungus breaks down lignin in wood, which is brown, leaving behind cellulose, which is white, 
thus the name white rot fungus.  The fungus can grow at the high temperatures found in 
wood chip piles, making it a potential industrial agent for bleaching paper pulp instead of 
the polluting acid or base chemistries that are currently used.  The fungus secretes many 
oxidative enzymes and may be useful in bioremediation of toxic waste sites.  The concept of 
fungi as environmental cleanup agents is shown here. [slide 6]  This slide shows a 
mushroom taking a bite out of a chlorinated and hydroxylated ring compound.  The picture 
was taken from Tom Volk's Fungus of the Month web site, which has detailed write-ups 
on dozens of fungi along with some exceptional photos.  I began searching the white rot 
genome expecting it would have a few P450s, maybe 10-15, but I was very surprised by the 
large number of hits.  After doing multiple searches with a variety of P450s and assembly 
of overlapping fragments, I was left with 167 contigs.  So far I have assembled 103 genes 
with all intron-exon boundaries identified and I have 64 more to do.  96 of the 103 
sequences are full length P450 genes.  I expect that when all assemblies are done, white rot 
fungus will have between 130 and 150 P450s.  BLAST searches of all 38 Neurospora 
sequences against the white rot sequences identified only CYP51 and CYP61.  None of the 
remaining white rot genes matched the other 34 P450 families in Neurospora.

The white rot genes have many exons with short introns separating them.  There are also 
some unexpected features.  This slide [slide 7] shows the structure of one P450.  This 
gene has 12 exons.  Please notice that the red ones are very short.  What is the evidence that 
these are real?  First, the end of exon 6 is phase 1, while the beginning of exon 8 is phase 0.  
You cannot join these two exons together without an intervening exon with phase 1 and 
phase 0 ends.  Exons 8 and 10 have a similar problem.  Second, there are 28 P450 genes in 
white rot that have this same exon structure.  Third, the AGSDT sequence is the highly 
conserved part of the I-helix oxygen binding pocket.  Without the short exon, this five amino 
acid motif is clearly missing from sequence alignments with other P450s, so it must be encoded 
somewhere in the gene.  The sequence of the short exon 9 is likewise missing from alignments, 
right after the EXXR motif.
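
The phase argument can be written as a small check.  This is only a sketch in Python, not the 
procedure used for the assemblies.  The Exon type, the function names and the exon lengths 
are hypothetical, chosen so that the phases match the ones named for exons 6 and 8.  It 
assumes the usual convention that a phase is the number of bases of a split codon carried 
over from the previous exon.

```python
from typing import List, NamedTuple, Tuple


class Exon(NamedTuple):
    name: str
    start_phase: int  # bases of a split codon already supplied upstream (0, 1 or 2)
    length: int       # exon length in nucleotides


def end_phase(exon: Exon) -> int:
    """Bases of an unfinished codon left over at the 3' end of the exon."""
    return (exon.start_phase + exon.length) % 3


def phase_breaks(exons: List[Exon]) -> List[Tuple[str, str]]:
    """Return adjacent exon pairs whose phases do not match."""
    breaks = []
    for upstream, downstream in zip(exons, exons[1:]):
        if end_phase(upstream) != downstream.start_phase:
            breaks.append((upstream.name, downstream.name))
    return breaks


# Hypothetical lengths, picked only to reproduce the phases in the talk.
exon6 = Exon("exon 6", start_phase=0, length=100)  # ends in phase 1
short7 = Exon("exon 7", start_phase=1, length=14)  # phase 1 in, phase 0 out
exon8 = Exon("exon 8", start_phase=0, length=90)   # begins in phase 0

print(phase_breaks([exon6, exon8]))          # short exon left out
print(phase_breaks([exon6, short7, exon8]))  # short exon included
```

With the short exon left out, the check reports a break between exon 6 and exon 8.  With it 
included, the reading frame is continuous and no break is reported.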

There are two other 5 amino acid exons at different locations in some of the white rot P450 
genes, but these are not the shortest exons. [slide 8]  Here is a gene with a three amino acid 
exon (actually 8 nucleotides long).  Again, the phases are incompatible from exon 9 to 10, 
requiring an intermediate exon.  The sequence is in the same region as before, at the 
AGHETT conserved site in the I-helix, and there are six P450s with this same intron-exon 
structure.  As further evidence I offer this sequence [slide 9] from an adjacent gene where 
exon 9 is not split and the GHE sequence is on the end of exon 9.  These short exons make 
the gene assembly process difficult.  

Let's go back to our eukaryotic tree now and look at some of the other branches. [slide 10] 
The seven red Xs indicate branches with detected P450s.  Below the fungi and animal clade 
is Dictyostelium discoideum, the cellular slime mold.  There are many P450s in 
Dictyostelium and we will come back to them later, but first let's look at some of the rarer 
P450s.  At the bottom of the tree are presumed ancient eukaryotes that have no 
mitochondria and are anaerobic.  The Giardia genome has been sequenced to 4X coverage 
(about 92% done) at Woods Hole and there is no hint of a P450, as one might expect for an 
anaerobe.  The next branch up that shows a P450 is the Euglenozoa.  This branch includes 
the disease-causing protozoans Trypanosoma and Leishmania. [slide 11]  This is 
Trypanosoma brucei, the cause of African sleeping sickness.  Both Trypanosoma and 
Leishmania contain a CYP51, but no other P450s have been detected in them yet.  
Trypanosoma has 121,000 reads in GenBank, so it has been pretty well sampled.  
Leishmania has only about 20,000 reads but a partial CYP51 is still found. [slide 12] 
Because CYP51 is present in Euglenozoa and four other branches, I have placed CYP51 in 
the common ancestor of all eukaryotes except the anaerobic Archaezoa.  It probably was 
there too and has been lost, but we just can't tell.  The Heterolobosea and Glaucophyta have 
not been sequenced to any depth yet, so we don't know anything about their P450s.  
Rhodophyta has at least one P450 in Porphyra.  This P450 is not in any named family.  
Green plants of course have hundreds of P450s but that is not our topic.  

The Alveolates and the Stramenopiles are two main branches in the eukaryotic crown group.  
Among the Alveolates, the malaria parasite genome has been sequenced now and the results are 
BLAST searchable at PlasmoDB. [slide 13]  This is Plasmodium falciparum in a blood smear.  
The PlasmoDB database is a site where you must register and agree to some usage 
restrictions before seeing the data.  I registered and searched the genome with numerous 
P450s, including CYP51.  The genome is essentially complete, so any P450s should have 
been detected.  There were none.  Plasmodium is a parasite, so it may have no need of 
P450s since all the host's sterols and other metabolites are available free.  Still, it was a bit of 
a shock that there were no P450s found.  

Adjacent to the apicomplexans on the eukaryote tree are the Ciliophora. [slide 14] 
Tetrahymena (645 seqs) and Paramecium (3341 seqs) are free-living ciliates, so they do not 
have the option of dumping P450 genes whose products a host would supply.  Their genome 
projects are just beginning, however, so at the moment they have no detected P450s either. [slide 15]

Stramenopiles, the sister group to the alveolates, have two P450s so far.  One is from a 
diatom: it is CYP97E1, a member of the plant CYP97 family.  Because of the clear 
relationship I have placed CYP97 in the common ancestor to plants and Stramenopiles.  
This is complicated by the fact that diatoms are known to have an endosymbiotic origin of 
their plastid from the red algae, so the CYP97E1 could have been transferred along with the 
red algal nuclear genome when the plastid was acquired.  

You will notice the name Phytophthora here.  That is Phytophthora infestans, the Irish potato 
famine blight. [slide 16]  Two stages of the life cycle are shown here.  An EST survey 
project has been done in Phytophthora generating about 4100 ESTs.  One P450 fragment is 
present in the collection. [slide 17] [link to Phytophthora ramorum page] 
[link to Phytophthora sojae page]. The sequence is from the heme binding region to the 
end of the protein.  It most resembles an unnamed chickpea sequence fragment that is most 
like CYP704A2, a member of the CYP86 clan in plants.  CYP86A1 and CYP94A2, other 
members of this clan, are fatty acid hydroxylases.  This might be an interesting one to clone 
for those in the audience who are working on plant fatty acid hydroxylases.  Note that 
excluding green plants, only five P450s have been found in branches from the 
Stramenopiles to the end of the tree.  Hopefully, either this will change in the future or it will 
be confirmed that there are few P450s in these lineages.

The last branch on our tree [slide 18] that we need to talk about is the Amoebozoa which 
includes Dictyostelium. [slide 19]  Dictyostelium now has at least 45 P450s, not 13 as 
shown here.  The slime molds are interesting developmental models since they go through 
aggregation induced by cAMP.  They differentiate and form a stalk and a fruiting body.  It 
is tempting to believe that some of the 45 P450s might be involved in this process.  Is there 
any evidence?  The answer is yes.  William Loomis's lab has been isolating random 
insertional mutants in Dictyostelium and this data is posted on their web site at UCSD.  
One of the mutants detected was in the NADPH cytochrome P450 reductase gene.  This 
gene is required for most of the electron flow to the P450s, so when it is knocked out the 
P450s are shut down.  This mutant has an interesting phenotype.  It forms yellow mounds 
[slide 20].  This is a picture of the RedA mutant posted on the Loomis lab web site.  The 
fruiting body is normally yellow, so what has happened here is the failure to form the stalk.  

It is known that a compound called DIF-1 [slide 21] induces the differentiation of prestalk 
cells to form the stalk.  There are two types of prestalk cells and DIF-1 induces one of the 
two types.  Robert Kay of the MRC in Cambridge has proposed that the biosynthesis of DIF-1 
uses a polyketide synthase to make and aromatize the ring with the oxygens in 
place.  Then a chloroperoxidase adds the chlorines and a methyltransferase adds the methyl 
group.  No P450s have been proposed to be involved.  The evidence for a polyketide 
synthase is based on cerulenin as an inhibitor of one of the reactions, but no step by step 
intermediates are known.  I suspect there is room in this pathway for a P450 to carry out 
one or more of the steps.  This would explain the RedA mutant phenotype.  

P450s are often downstream of polyketide synthase genes in bacteria where they act on the 
product of the polyketide synthase.  A search for the text phrase polyketide synthase found 
71 hits in a Dictyostelium sequence database; these were probably multiple hits to the same 
set of genes.  Comparing the 71 contig numbers against the P450 contig numbers did not 
turn up any case where a P450 was on the same contig as a polyketide synthase hit.  
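
The contig comparison amounts to a set intersection.  Here is a minimal sketch in Python, 
assuming the contig identifiers from the two searches have been saved as plain lists.  The 
function name and the identifiers below are invented placeholders, not real Dictyostelium 
contig numbers.

```python
from typing import Iterable, List


def shared_contigs(pks_contigs: Iterable[str], p450_contigs: Iterable[str]) -> List[str]:
    """Return contig identifiers that appear in both hit lists, sorted."""
    pks = {c.strip() for c in pks_contigs}    # sets collapse repeated hits
    p450 = {c.strip() for c in p450_contigs}
    return sorted(pks & p450)


# Placeholder identifiers for illustration only.
pks_hits = ["contig_0012", "contig_0012", "contig_0347", "contig_0901"]
p450_hits = ["contig_0100", "contig_0555"]

overlap = shared_contigs(pks_hits, p450_hits)
print(overlap if overlap else "no shared contigs")
```

Using sets means that repeated hits to the same contig are counted once, which matters when 
many text hits point to the same set of genes.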

Even if no P450 is required for the synthesis of DIF-1, DIF-1 must also be broken down.  
There is evidence for possibly 12 intermediates in the breakdown of DIF-1.  
What P450s might be involved?  The Loomis Lab has also been doing microarray work on 
Dictyostelium and the data is posted to their web site.  I searched their site for P450s that 
were spotted on the microarray.  There were 5 P450s and the reductase. [slide 22]  Here 
are data averaged over 5 experiments measuring ratios of spot intensity of prespore to 
prestalk mRNA levels.  Spots 313 and 360 were from the same gene and act as duplicates.  
Three of the genes show significant changes.  CYP516B1 has a two-fold increase and 
CYP517A1 has a two-fold decrease, while CYP515B1 has a four-fold decrease.  
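
For anyone who wants to read such ratios off a table, here is a small Python sketch that 
turns a prespore to prestalk ratio into a fold change and a log2 value.  It assumes that an 
"increase" means a ratio above 1, which is my reading and is not stated on the slide.  The 
function names are hypothetical, and the example ratios are simply the round numbers implied 
by the fold changes quoted above, not the measured spot intensities.

```python
import math
from statistics import geometric_mean
from typing import Sequence


def describe_ratio(ratio: float) -> str:
    """Describe a prespore/prestalk ratio as a fold change with its log2 value."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    log2_ratio = math.log2(ratio)
    if log2_ratio == 0:
        return "no change (log2 = +0.00)"
    fold = ratio if ratio > 1 else 1 / ratio
    direction = "increase" if log2_ratio > 0 else "decrease"
    return f"{fold:g}-fold {direction} (log2 = {log2_ratio:+.2f})"


def combine_replicates(ratios: Sequence[float]) -> float:
    """Combine replicate ratios with a geometric mean, which treats
    a two-fold increase and a two-fold decrease symmetrically."""
    return geometric_mean(ratios)


# Round ratios implied by the quoted fold changes (assumed direction).
for gene, ratio in [("CYP516B1", 2.0), ("CYP517A1", 0.5), ("CYP515B1", 0.25)]:
    print(gene, describe_ratio(ratio))
```

On a log2 scale a two-fold increase and a two-fold decrease sit at +1 and -1, so duplicate 
spots such as 313 and 360 can be compared or combined without the decrease being compressed 
between 0 and 1.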

They also measured spot intensity over 22 hours at two hour intervals [slide 23] as the 
cells were going through a cycle of aggregation and stalk formation.  The mound forms at 
about 10-12 hours with the slug phase next.  Here are the data for two P450s from the 
experiment.  These two P450s show a 7-fold change in mRNA levels over the course of the 
experiment, probably as the transition between mound and slug is occurring.  These might 
be candidates for biosynthesis or breakdown of DIF-1.

I will close with a picture of Joseph Hooker [slide 24].  The botanists in the audience may 
know he was the director of Kew Botanical Gardens in the mid-1800s, as was his father 
before him.  Joseph Hooker was a botanical explorer and collector of plants.  Botanical 
exploration was a passionate affair at that time, as whole unexplored continents lay waiting 
for the first collector to enter.  Today we are explorers and collectors of genes, and the 
genetic continents that are our hunting grounds are just opening up.