La Grande Motte talk Sept. 14, 2001

La Grande Motte Sept 11-15 2001 D. Nelson last modified Sept. 10, 2001

P450s of the lower eukaryote world

The tree of eukaryotic life has a complex structure. Based on ultrastructural
identities, there are about 60 different eukaryotic lineages. This tree shows schematically
the relationships between 13 major groups of eukaryotes, but the branch lengths are not
accurate. Though animals and green plants are highly visible because of their size, they
represent only two of these branches. To appreciate the broader diversity in eukaryotic life
it is necessary to consider the other less familiar branches. Most of these are occupied by
microscopic single cells, though there are some larger members in these groups. I will
show two examples from Rhodophyta and the Stramenopiles. This is the red alga
Gelidium. Molecular biology is indebted to this seaweed since it is a major
source of agar and agarose. Here we see giant kelp that grows up to 10 meters long. Kelp
and the famous Sargasso weed are found in the Stramenopiles. Fungi can be quite
large also. The current record holder is a 3.4 square mile (900 hectare) Armillaria ostoyae, or
honey mushroom, found in the Malheur National Forest in eastern Oregon. That single
mycelial clone is about 2 miles across and is estimated to be 2400 years old.

The genome projects have now advanced to the point of completing a number of eukaryotic
genomes and many others are far along toward completion. The 12 eukaryotes colored red
on this slide have finished or nearly finished genome sequences. Those in blue
have genome projects underway. Much of this sequence data is not in GenBank; however, it
is possible to use the genome project web sites to BLAST search these genomes to see what is
there. I have done that for a variety of lower eukaryotes and I continue to do it, since the
data is coming in so fast that I cannot quite keep up.

Starting at the top, Neurospora crassa has been sequenced. The sequence coverage of the
genome is greater than 10 fold. Blast searches have identified 38 P450s in 36 families, and
only 4 of these are incomplete, missing some small portion from the N-terminus or the
middle of the protein sequence. All 38 P450s have been named. Only 7 families have
related sequences in other species, mostly from Fusarium. Fusarium is being sequenced
also, but I have not analyzed that genome yet, so there may be additional matches between
Fusarium and Neurospora. Neurospora does have a CYP505 which is related to the
P450foxy from Fusarium. This P450 is fused with NADPH cytochrome P450 reductase.
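
As a rough guide to what greater than 10 fold coverage implies for a search like this, here
is a minimal Python sketch. It assumes the idealized Lander-Waterman (Poisson) model of
random shotgun reads, which is not stated in the talk; real projects fall short of it because
of cloning bias and repeats. The function name is illustrative.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read.

    Assumption: reads land uniformly at random (Poisson model), so the
    chance that a given base is missed entirely is exp(-coverage).
    """
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-coverage)


# Compare a few sequencing depths under the idealized model.
for fold in (1, 5, 10):
    fraction = expected_fraction_covered(fold)
    print(f"{fold:>2}x coverage: {fraction:.5f} of bases expected to be sampled")
```

Under this model, 10 fold coverage leaves fewer than 1 base in 10,000 unsampled, so a BLAST
search at that depth is unlikely to miss an entire gene.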

The other big surprise genome of 2001 was the white rot fungus Phanerochaete
chrysosporium. This genome is being sequenced at the Dept. of Energy Joint
Genome Institute in Walnut Creek, California, as part of a microbial genome initiative. The
fungus breaks down lignin in wood, which is brown, leaving behind cellulose, which is white,
thus the name white rot fungus. The fungus can grow at the high temperatures found in
wood chip piles, making it a potential industrial agent for bleaching paper pulp instead of
the polluting acid or base chemistries that are currently used. The fungus secretes many
oxidative enzymes and may be useful in bioremediation of toxic waste sites. The concept of
fungi as environmental cleanup agents is shown here. This slide shows a
mushroom taking a bite out of a chlorinated and hydroxylated ring compound. The picture
was taken from Tom Volk’s Fungus of the Month web site, which has detailed write-ups
on dozens of fungi along with some exceptional photos. I began searching the white rot
genome expecting it would have a few P450s, maybe 10-15, but I was very surprised by the
large number of hits. After doing multiple searches with a variety of P450s and assembling
overlapping fragments, I was left with 167 contigs. So far I have assembled 103 genes
with all intron-exon boundaries identified and I have 64 more to do. Of the 103
sequences, 96 are full length P450 genes. I expect when all assemblies are done that white rot
fungus will have between 130 and 150 P450s. BLAST searches of all 38 Neurospora
sequences against the white rot sequences identified only CYP51 and CYP61. None of the
remaining white rot genes matched the other 34 P450 families in Neurospora.

The white rot genes have many exons with short introns separating them. There are also
some unexpected features. This slide shows the structure of one P450. This
gene has 12 exons. Please notice the red ones are very short. What is the evidence that
these are real? First, the end of exon 6 is phase 1, while the beginning of exon 8 is phase 0.
You cannot join these two exons together without an intervening exon with phase 1 and
phase 0 ends. Exons 8 and 10 have a similar problem. Second, there are 28 P450 genes in
white rot that have this same exon structure. Third, the AGSDT sequence is the highly
conserved part of the I helix oxygen binding pocket. Without the short exon, this five amino
acid motif is clearly missing from sequence alignments with other P450s, so it must be
encoded somewhere in the gene. The sequence of short exon 9 is likewise missing from
alignments right after the EXXR motif.
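
The phase argument above can be checked mechanically. The following Python sketch is only an
illustration, not the procedure used for the assemblies. It assumes the usual convention that
an exon's end phase is the number of nucleotides of a split codon it carries, and that two
exons can be joined directly only when the end phase of the first equals the start phase of
the second. The function names and the 20 nucleotide search limit are illustrative.

```python
def end_phase(start_phase: int, length_nt: int) -> int:
    """Phase at the 3' end of an exon, given its start phase and length.

    Assumed convention: phase is the number of nucleotides (0, 1 or 2) of an
    incomplete codon already read when the exon boundary is reached.
    """
    return (start_phase + length_nt) % 3


def can_join(upstream_end_phase: int, downstream_start_phase: int) -> bool:
    """Two exons splice together in frame only if the phases agree."""
    return upstream_end_phase == downstream_start_phase


def bridging_lengths(upstream_end_phase: int, downstream_start_phase: int,
                     max_nt: int = 20) -> list[int]:
    """Lengths (nt) of an intervening exon that would restore the frame."""
    return [n for n in range(1, max_nt + 1)
            if end_phase(upstream_end_phase, n) == downstream_start_phase]


# Phases quoted in the talk: exon 6 ends in phase 1, exon 8 begins in phase 0.
exon6_end, exon8_start = 1, 0
print(can_join(exon6_end, exon8_start))          # False: a bridging exon is needed
print(bridging_lengths(exon6_end, exon8_start))  # [2, 5, 8, 11, 14, 17, 20]
```

Any exon that bridges these two phases must have a length that leaves a remainder of 2 when
divided by 3, which narrows the search for a very short exon inside the intron.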

There are two other 5 amino acid exons at different locations in some of the white rot P450
genes, but these are not the shortest exons. Here is a gene with a three amino acid
exon (actually 8 nucleotides long). Again, the phases are incompatible from exon 9 to 10,
requiring an intermediate exon. The sequence is in the same region as before, at the
AGHETT conserved site in the I-helix, and there are six P450s with this same intron-exon
structure. As further evidence I offer this sequence from an adjacent gene where
exon 9 is not split and the GHE sequence is on the end of exon 9. These short exons make
the gene assembly process difficult.

Let’s go back to our eukaryotic tree now and look at some of the other branches.
The seven red Xs indicate branches with detected P450s. Below the fungi and animal clade
is Dictyostelium discoideum, the cellular slime mold. There are many P450s in
Dictyostelium and we will come back to them later, but first let’s look at some of the rarer
P450s. At the bottom of the tree are presumed ancient eukaryotes that have no
mitochondria and are anaerobic. The Giardia genome has been sequenced to 4X coverage
(about 92% done) at Woods Hole and there is no hint of a P450, as one might expect for an
anaerobe. The next branch up that shows a P450 is the Euglenozoa. This branch includes
the disease-causing protozoans Trypanosoma and Leishmania. This is
Trypanosoma brucei, the cause of African sleeping sickness. Both Trypanosoma and
Leishmania contain a CYP51, but no other P450s have been detected in them yet.
Trypanosoma has 121,000 reads in GenBank, so it has been pretty well sampled.
Leishmania has only about 20,000 reads but a partial CYP51 is still found.
Because CYP51 is present in Euglenozoa and four other branches, I have placed CYP51 in
the common ancestor of all eukaryotes except the anaerobic Archaezoa. It probably was
there too and has been lost, but we just can’t tell. The Heterolobosea and Glaucophyta have
not been sequenced to any depth yet, so we don’t know anything about their P450s.
Rhodophyta has at least one P450 in Porphyra. This P450 is not in any named family.
Green plants of course have hundreds of P450s but that is not our topic.

The Alveolates and the Stramenopiles are two main branches in the eukaryotic crown group.
Among the Alveolates, the malaria parasite genome has now been sequenced and the results are
BLAST searchable at PlasmoDB. This is Plasmodium falciparum in a blood smear.
The PlasmoDB database is a site where you must register and agree to some usage
restrictions before seeing the data. I registered and blasted the genome with numerous
P450s, including CYP51. The genome is essentially complete, so any P450s should have
been detected. There were none. Plasmodium is a parasite, so it may have no need of
P450s since all the host’s sterols and other metabolites are available free. Still, it was a bit of
a shock that there were no P450s found.

Adjacent to the apicomplexans on the eukaryote tree are the Ciliophora.
Tetrahymena (645 seqs) and Paramecium (3341 seqs) are free-living ciliates, so they would
not have the option of dumping P450 genes and relying on a host for the products. Their genome
projects are just beginning, however, so at the moment they have no detected P450s either.

Stramenopiles, the sister group to the alveolates, have two P450s so far. One is from a
diatom and it is CYP97E1, a member of the plant CYP97 family. Because of the clear
relationship I have placed CYP97 in the common ancestor of plants and Stramenopiles.
This is complicated by the fact that diatoms are known to have acquired their plastid by
endosymbiosis of a red alga, so CYP97E1 could have been transferred along with the
red algal nuclear genome when the plastid was acquired.

You will notice the name Phytophthora here. That is Phytophthora infestans, the Irish potato
famine blight. Two stages of the life cycle are shown here. An EST survey
project has been done in Phytophthora generating about 4100 ESTs. One P450 fragment is
present in the collection. The sequence is from the heme binding region to the
end of the protein. It most resembles an unnamed chickpea sequence fragment that is most
like CYP704A2, a member of the CYP86 clan in plants. CYP86A1 and CYP94A2, other
members of this clan, are fatty acid hydroxylases. This might be an interesting one to clone
for those in the audience who are working on plant fatty acid hydroxylases. Note that
excluding green plants, only five P450s have been found in branches from the
Stramenopiles to the end of the tree. Hopefully, this will change in the future or it will be
confirmed that there are few P450s in these lineages.

The last branch on our tree that we need to talk about is the Amoebozoa which
includes Dictyostelium. Dictyostelium now has at least 45 P450s, not 13 as
shown here. The slime molds are interesting developmental models since they go through
aggregation induced by cAMP. They differentiate and form a stalk and a fruiting body. It
is tempting to believe that some of the 45 P450s might be involved in this process. Is there
any evidence? The answer is yes. William Loomis’s lab has been isolating random
insertional mutants in Dictyostelium and this data is posted on their web site at UCSD.
One of the mutants detected was in the NADPH cytochrome P450 reductase gene. This
gene is required for most of the electron flow to the P450s, so when it is knocked out the
P450s are shut down. This mutant has an interesting phenotype. It forms yellow mounds.
This is a picture of the RedA mutant posted on the Loomis lab web site. The
fruiting body is normally yellow, so what has happened here is the failure to form the stalk.

It is known that a compound called DIF-1 induces the differentiation of prestalk
cells to form the stalk. There are two types of prestalk cells and DIF-1 induces one of the
two types. The biosynthesis of DIF-1 is proposed by Robert Kay of the MRC in
Cambridge to use a polyketide synthase to make and aromatize the ring with the oxygens in
place. Then a chloroperoxidase adds the chlorines and a methyltransferase adds the methyl
group. No P450s have been proposed to be involved. The evidence for a polyketide
synthase is based on cerulenin as an inhibitor of one of the reactions, but no step by step
intermediates are known. I suspect there is room in this pathway for a P450 to carry out
one or more of the steps. This would explain the RedA mutant phenotype.

P450s are often downstream of polyketide synthase genes in bacteria where they act on the
product of the polyketide synthase. A search for the text phrase "polyketide synthase" found
71 hits in a Dictyostelium sequence database; these were probably multiple hits to the same
set of genes. Searches of the 71 contig numbers against the P450 contig numbers did not
turn up any matches where a P450 might have been on the same contig.
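
The contig comparison described above amounts to a set intersection. Here is a minimal Python
sketch of that idea, not the actual search that was run. The contig identifiers are made up
for illustration and do not correspond to real database entries, and the function name is
hypothetical.

```python
def shared_contigs(pks_contigs, p450_contigs):
    """Return contig IDs that appear in both hit lists, sorted.

    Duplicate IDs (several hits to the same contig) are collapsed first.
    """
    return sorted(set(pks_contigs) & set(p450_contigs))


# Hypothetical contig IDs for illustration only; not real database entries.
pks_hits = ["contig_0012", "contig_0345", "contig_0345", "contig_0789"]
p450_hits = ["contig_0101", "contig_0456"]

overlap = shared_contigs(pks_hits, p450_hits)
print(overlap if overlap else "no contig carries both a PKS hit and a P450")
```

A test like this can only find genes that were assembled onto the same contig. Neighboring
genes that fall on different contigs in an unfinished assembly would be missed.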

Even if a P450 is not required for the synthesis of DIF-1, DIF-1 must also be broken down.
There is evidence for as many as 12 intermediates in the breakdown of DIF-1.
What P450s might be involved? The Loomis Lab has also been doing microarray work on
Dictyostelium and the data is posted to their web site. I searched their site for P450s that
were spotted on the microarray. There were 5 P450s and the reductase. Here
are data from an average of 5 experiments measuring ratios of spot intensity of prespore to
prestalk mRNA levels. Spots 313 and 360 were from the same gene and act as duplicates.
Three of the genes show significant changes. CYP516B1 has a two fold increase and
CYP517A1 has a two fold decrease, while CYP515B1 has a four fold decrease.
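
Fold changes like these are often easier to compare on a log2 scale, where increases and
decreases of the same size are symmetric around zero. The Python sketch below converts the
three fold changes quoted above. It assumes that a two fold increase means a prespore to
prestalk ratio of 2 and a two fold decrease means a ratio of 1/2; the direction is an
assumption, and the function name is illustrative.

```python
import math


def log2_ratio(ratio: float) -> float:
    """Convert an intensity ratio to log2 scale (+1 = doubling, -1 = halving)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return math.log2(ratio)


# Ratios implied by the fold changes quoted in the talk (direction assumed
# to be prespore / prestalk).
fold_changes = {
    "CYP516B1": 2.0,      # two fold increase
    "CYP517A1": 1 / 2.0,  # two fold decrease
    "CYP515B1": 1 / 4.0,  # four fold decrease
}

for gene, ratio in fold_changes.items():
    print(f"{gene}: ratio {ratio:.2f}, log2 {log2_ratio(ratio):+.1f}")
```

On this scale the two fold increase and the two fold decrease sit at +1 and -1, and the four
fold decrease at -2.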

They also measured spot intensity over 22 hours at two hour intervals as the
cells were going through a cycle of aggregation and stalk formation. The mound forms at
about 10-12 hours with the slug phase next. Here are the data for two P450s from the
experiment. These two P450s show a 7 fold change in mRNA levels over the course of the
experiment, probably as the transition between mound and slug is occurring. These might
be candidates for biosynthesis or breakdown of DIF-1.

I will close with a picture of Joseph Hooker. The botanists in the audience may
know he was the director of Kew Botanical Gardens in the mid 1800s as was his father
before him. Joseph Hooker was a botanical explorer and collector of plants. Botanical
exploration was a passionate affair at that time, as whole unexplored continents lay waiting
for the first collector to enter. Today we are explorers and collectors of genes, and the
genetic continents that are our hunting grounds are just opening up.