What's New?

What's New July 18, 2013

The nomenclature pages are updated continuously and new genome pages are added to the main table periodically. New genomes include Jatropha curcas (a biofuel crop), Prunus persica (peach), Ricinus communis (castor bean) and Aquilegia coerulea (columbine, a basal angiosperm plant in Ranunculales). More than 20,000 P450 sequences have been named.

What's New December 13, 2011

The nomenclature pages have been updated. Many genomes that were named but not yet posted on these pages have been added (more than 1000 new names). New genomes include Ixodes scapularis (deer tick), Grosmannia clavigera (fungus), Postia placenta (fungus), Metarhizium anisopliae var. acridum and var. anisopliae (fungi), tomato and potato. The Daphnia (water flea) sequences have been posted on a Daphnia page.

What's New November 30, 2011

The Monarch butterfly P450s are posted.

What's New November 30, 2011

The Two-spotted spider mite P450s are posted.

What's New November 30, 2011

The whitefly Trialeurodes vaporariorum P450s transcriptome is posted.

What's New July 22, 2011

The human P450 master table has had all links checked and they all work and link to the correct data. New links have been added to make the table more complete.

What's New July 22, 2011

The public set of named bacterial P450s has been updated. See 1105 public bacterial P450 sequences.

What's New July 19, 2011

I am going through the tables on the main page and checking for dead links. These are being corrected. The next task will be to fix all links in the human, mouse and rat master tables.

What's New July 19, 2011

The potato P450s are posted under the potato page. The potato genome paper appeared in Nature July 14, 2011

What's New May 11, 2011

The Selaginella P450s are posted under the Selaginella page. Selaginella is a spike moss in the lycophytes. It is an early vascular plant. The Selaginella genome paper appeared online (before print) on May 5, 2011 in Science Express, lead author Jody Banks.

What's New May 10, 2011

Four new genomes are posted. Three ant genomes, from the seed-harvester ant (Pogonomyrmex barbatus), the fire ant (Solenopsis invicta) and the Argentine ant (Linepithema humile), are posted in the animal section under the honeybee and jewel wasp row. The fourth genome is Postia placenta, a Basidiomycete fungus. This is on the fungal genomes page.

What's New Nov. 23, 2009

A new set of 2780 fungal P450s has been added to the P450 blast server. They are accessible in the pull-down menu as all fungal P450s. I also added more bacterial P450 sequences, so there are now 1015 named bacterial P450s. The older databases will be updated to correct changed names and to include more accurate sequence assemblies. The files are also being reformatted into true FASTA format.
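As an illustration of what "true FASTA format" means here, the following minimal Python sketch writes one record as a `>` header line followed by the sequence wrapped at a fixed width. The function name `to_fasta`, the 60-character line width and the record label `CYP20_human_exon1` are assumptions made for this example; this is not the script used to build the server files.

```python
def to_fasta(name, sequence, width=60):
    """Return one FASTA record: a '>' header line, then wrapped sequence lines."""
    # Strip spaces and line breaks that annotated sequence files often carry.
    cleaned = "".join(sequence.split())
    # Cut the cleaned sequence into fixed-width lines.
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Example: the first exon of human CYP20 as listed on this page
# (the record label is a hypothetical name chosen for the example).
record = to_fasta("CYP20_human_exon1", "MLDFAIFAVTFLLALVGAVLYLYP")
print(record, end="")
```

Annotation such as intron phases or coordinates would have to be removed before calling a function like this, since FASTA sequence lines hold residues only.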

What's New Aug. 24, 2009

The 2009 P450 stats have been updated and now include 11,292 P450s. 19 Excel spreadsheets on animals, plants, fungi, protists and bacteria are posted along with a summary of the data in pdf format. The P450s are sorted by CYP name or by species name. Census information is given that lists family and subfamily data.

What's New Aug. 17, 2009

My P450 publications page is now up to date. There are links to many full papers and links to abstracts for the newest ones. These include four 2009 papers.

What's New Aug. 14, 2009

The P450 talks and lectures have been brought up to date. There are 15 PowerPoint presentations with accompanying text.

What's New Aug. 12, 2009

The fungal P450s have their own page linked from the main table. This now includes 55 fungal genomes.

What's New Feb. 27, 2008

Check out the P450 stats page. It is full of new data and files.

What's New July 27, 2005

The 8th International Symposium on Cytochrome P450 Biodiversity and Biotechnology will be held in Swansea, Wales, UK, July 23-27, 2006.

A Link to the website

What's New Jan. 26, 2003

The Mouse Master Table has been posted with over 500 links to mouse P450 data.

What's New Sept. 25, 2002

458 rice P450s have all been named. See the rice page for details.

What's New Sept. 12, 2002

I have not written a What's New section in a long time. Lots of things are new. The rice genome P450s have been identified from the two strains: indica (Chinese project) and japonica (public international project). There are 952 sequences identified between these two strains and 485 different genes. I am in the process of naming these now. A color-coded file (green = japonica, 389 seqs; blue = indica, 563 seqs) is available on the rice section of the web page. In this file, full length sequences begin with a red accession number.

What's New Jan. 2, 2002

The Fugu sequences are now named and a tree and alignment are on the server. To see the FASTA format of the sequence contigs sorted by family, go to FASTA Sequence List.

To see the alignment of human and Fugu sequences, go to Human Fugu alignment.

47 Fugu genes have been aligned to 60 human sequences and 8 other fish sequences and used to make a tree. In this tree human branches are red, Fugu branches are blue and other fish are gray.

What's New Dec. 28, 2001

I have upgraded my P450 server to a G4 Mac so you may notice some improvement in speed.

What's New Dec. 28, 2001

I have finished searching the Fugu genome for P450s. These are annotated and will soon be named. There are 36 full length sequences and 11 more are almost full length, for 47. There are six more that run off the ends of scaffolds and are probably real P450 genes. There are 26 other partials. See pufferfishes on the main page and the fugu.FASTA file. Alignment and tree with human sequences is underway.

What's New Dec. 3, 2001

The server was down from noon on the 27th of Nov. to this morning Dec 3, almost 6 days. This is the longest it has been down since Feb. 1995 when I started it. It was unfortunate that I left town on the 28th without knowing it was down and I did not get back until today. It is up and running again. The newest area is the Fugu P450 section found under pufferfish on the main table. I have been assembling the Fugu P450s from the new genome assembly released Oct. 26. 13 of these sequences are full length sequences. 22 more are nearly full length, but need more effort to assemble them completely. 65 more are partials.

What's New Nov. 2, 2001

The rice P450 set is getting more complete. Additional P450s have been found on newly deposited genomic clones and some fragments have been joined to make longer P450 contigs. There are now 290 contigs including 166 full length sequences plus 46 pseudogene pieces. There are 78 partial sequences that may be parts of complete genes. I have been blasting each of these against the four main sections of Genbank (restricted to rice) to try to extend these sequences. I have 33 of these fragments to go before declaring the Genbank data thoroughly mined for P450s. At that point I will make a summary for legal purposes and send it to the company that I have signed an access agreement with. Once they have a copy of the public P450 data, I will begin to search their private data to complete the set. I want to make it clear what I have found without any help from this private data. Once that data has been thoroughly searched, a publication on the results will follow.

What's New Oct. 12, 2001

I have searched the public data at Genbank for rice P450s in the nr, htgs, gss and est sections. All P450 fragments have been identified and Blast searches have been done of all fragments against each other to identify overlapping fragments and assemble them into contigs. Currently there are 296 contigs. These are composed of 156 full length sequences and 45 pseudogene partial sequences that cannot be made any more complete. There are 95 more partial sequences that will need further work to complete. The 296 sequences have been made into a blast searchable file and they are now searchable on our blast server. The file of annotated sequences is available, as well as the FASTA version without all the annotation (see the rice section of the homepage).
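The contig-building step described above (finding fragments that overlap and joining them) can be sketched in Python. This is a toy illustration that assumes exact suffix-to-prefix matches between two protein fragments; the actual work relied on Blast searches and manual inspection, which tolerate mismatches. The names `overlap_length` and `merge_pair` and the `min_overlap` threshold are hypothetical, and the two example fragments are simply cut from the CYP4A11 N-terminus shown elsewhere on this page.

```python
def overlap_length(left, right, min_overlap=10):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no exact overlap of at least `min_overlap` residues exists.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if size > 0 and left[-size:] == right[:size]:
            return size
    return 0


def merge_pair(left, right, min_overlap=10):
    """Join two fragments into one contig if they overlap, else return None."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep all of `left`, then append the part of `right` past the overlap.
    return left + right[size:]


# Two illustrative fragments that share the 10 residues SGILQAASLL.
fragment_a = "MSVSVLSPSRLLGDVSGILQAASLL"
fragment_b = "SGILQAASLLILLLLLIKAVQ"
contig = merge_pair(fragment_a, fragment_b)
print(contig)
```

A real assembly would also need to handle sequencing errors, frameshifts and introns, which is why each joined contig was checked by hand.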

What's New Oct. 12, 2001

16 Lolium rigidum plant P450s have been posted to the home page. Go to Lolium on the main table.

What's New Sept. 20, 2001

Chlamydomonas P450s from the EST project have been added to the species table. There are very few known so far, but they include CYP51 and two CYP97s. Rice has been added to our blast server (219 sequences, 47,693 letters). These include some new full length P450s found in Genbank. There are still many short pieces that need to be linked to full length genes. The blast server now has 1161 sequences from 12 species. See below for a link to the server. The "all the sequences" option does not include the rice sequences yet.

What's New August 17, 2001

A BLAST server has been set up on a RedHat Linux machine to allow blast searching of selected P450 sets. The first set available is all the Arabidopsis P450s: 273 named genes and 16 more unnamed fragments that have not been designated because I do not know if they are really different from the named genes. We plan to add more P450 sets (human, mouse, Neurospora) in the next few days and weeks. Only BLASTP and BLASTX work now since this is a protein database. The server address is http:/132.192.64.52/p450.html GO THERE NOW

What's New August 8, 2001

A search of the Neurospora crassa genome has identified 38 P450s. Most are now full length. All Neurospora P450s have been named, most from 527A1 to 553A1, but there are some that were named already or fall in existing families: CYP51, 53A4, 54, 55A6, 61A5, 65B1, 65C1, 68D1, 505A2, 507A1. See the lower eukaryote part of the homepage.

What's New August 3, 2001

The fungus Phanerochaete chrysosporium has been sequenced. The P450s from this Basidiomycete have been searched and the preliminary results are on the homepage (see the main table). The big surprise from this work is the large number of P450s found. At least 122 signature sequence regions are present and 167 P450 fragments are found. These numbers may increase. Scaffold 205 has 12 different exon 1 sequences and probably 11 intact P450 genes. CYP51, CYP61 and CYP63A1, CYP63A2 and CYP63A3 are present. I have assembled 103 gene models with intron-exon boundaries defined and more are in progress. This fungus breaks down lignin.

What's New May 29, 2001

I have added Xenopus P450s to the animal part of the homepage table. This is now pretty complete for current data. I will be adding bovine and pufferfish sections based on ESTs and GSS (genome survey sequences).

What's New May 23, 2001

CYP26C1 is complete. CYP27C1 is complete. There is a new CYP4A22 sequence that is 95% identical to 4A11. Three genomic sequences agree, but all ESTs are like the 4A11 sequence and not like this new genomic sequence. Therefore, there seem to be two different genes, 4A11 and the new closely related 4A22 sequence. For the human CYP26C1 gene, there are no matching mouse or human ESTs in Genbank, even though there are 3.5 million human ESTs and 2 million mouse ESTs. Either this is a rare transcript, or it has a long 3 prime untranslated region that makes the ESTs fall in the non-coding part of the gene. The 2888 bp downstream of the stop codon do not match any human ESTs. There is a bovine EST that covers the first 133 amino acids, and it is 95% identical, suggesting the protein is highly conserved in mammals that diverged 80-100 million years ago. This may be a developmental gene that is expressed only for a brief time or in a limited tissue distribution.
CYP26C1    human 
           GenEMBL AL358613.11 May 2, 2001
           522 amino acids, 6 exons, (0) = phase 0 intron
           52% to 26B1 human, also 15 amino acid insertion in exon 5 vs. 26B1
MFPWGLSCLSVLGAAGTALLCAGLLLSLAQHLWTLRWMLSRDRASTLPLPKGSMGWPFFGETLHWLVQ (0)
GSRFHSSRRERYGTVFKTHLLGRPVIRVSGAENVRTILLGEHRLVRSQWPQSAHILLGSHTLLGAVGEPHRRRRK (0)
VLARVFSRAALERYVPRLQGALRHEVRSWCAAGGPVSVYDASKALTFRMAARILLGLRL
DEAQCATLARTFEQLVENLFSLPLDVPFSGLRK (0)
GIRARDQLHRHLEGAISEKLHEDKAAEPGDALDLIIHSARELGHEPSMQELK (0)
ESAVELLFAAFFTTASASTSLVLLLLQHPAAIAKIREELVAQGLGRACGCAPGAAGGSEGPPPD
CGCEPDLSLAALGRLRYVDCVVKEVLRLLPPVSGGYRTALRTFELD (0)
GYQIPKGWSVMYSIRDTHETAAVYRSPPEGFDPERFGAAREDSRGASSRLHYIPFGGGARSCLG
QELAQAVLQLLAVELVRTARWELATPAFPAMQTVPIVHPVDGLRLFFHPLTPSVAGNGLCL*

CYP27C1 AC027142 43% identical to 27A1 assembled gene
intron starting with QIH ending in VDT is from Celera's public data
CRA_Gene|hCG42613 /len=10487.  This Celera sequence is still missing the C-terminal.
Probable last exon is now found in AC027142.  AG intron boundary is in the same
location as CYP26B1.  Stop codon is one codon away from 26B1's stop codon.
Length is preserved from cys to intron. (n) = intron phase, 9 exons

  1  85452 MQTSAMALLARILRAGLRPAPERGGLLGGGAPRRPQPAGARLPAGARAEDKGAGRPGSPPG 85634 61
 62  85635 GGRAEGPRSLAAMPGPRTLANLAEFFCRDGFSRIHEIQ (0) 85748 99
100  39574 QKHTREYGKIFKSHFGPQFVVSIADRDMVAQVLRAEGAAPQRANMESWREYRDLRGRATGLISA (2) 39371 163
164  43984 EGEQWLKMRSVLRQRILKPKDVAIYSGEVNQVIADLIKRIYLLRSQAEDGETVTNVNDLFFKYSME (1) 43787 229 
230  41743 GVATILYESRLGCLENSIPQLTVEYIEALELMFSMFKTSMYAGAIPRWLRPFIPKPWREFC 41564 290
291  41563 RSWDGLFKFS 41534 300 (1)
301        QIHVDNKLRDIQYQMDRGRRVSGGLLTYLFLSQALTLQEIYANVTEMLLAGVDT (0) 354 (Celera sequence)
355 110201 TSFTLSWTVYLLARHPEVQQTVYREIVKNLGERHVPTAADVPKVPLVRALLKETLR (2) 110034 410
411 108566 LFPVLPGNGRVTQEDLVIGGYLIPKG (0) 108489 436
437 108006 TQLALCHYATSYQDENFPRAKEFRPERWLRKGDLDRVDNFGSIPFGHGVRSCIGRRIAELEIHLVVIQ (0) 107794 504
505 102503 LLQHFEIKTSSQTNAVHAKTHGLLTPGGPIHVRFVNRK* 102619 542

new CYP4A22 sequence
>new 4A11 like sequence AL390073.5 95% identical to 4A11 see alignment below
MSVSVLSPSRRLGGVSGILQVTSLLILLLLLIKAAQLYLHRQWLLKALQQFPCPPSHWLFGHIQE
FQHDQELQRIQERVKTFPSACPYWIWGGKVRVQLYDPDYMKVILGRS
DPKSHGSYKFLAPRI
GYGLLLLNGQTWFQHRRMLTPAFHNDILKPYVGLMADSVRVML
DKWEELLGQDSPLEVFQHVSLMTLDTIMKSAFSHQGSIQVDR
NSQSYIQAISDLNSLVFCCMRNAFHENDTIYSLTSAGRWTHRACQLAHQHT
DQVIQLRKAQLQKEGELEKIKRKRHLDFLDILLLAK
MENGSILSDKDLRAEVDTFMFEGHDTTASGISWILYALATHPKHQERCREEIHGLLGDGASITW
NHLDQMPYTTMCIKEALRLYPPVPGIGRELSTPVTFPDGRSLPKG
IMVLLSIYGLHHNPKVWPNLE
VFDPSRFAPGSAQHSHAFLPFSGGSR
NCIGKQFAMNQLKVARALTLLRFELLPDPTRIPIPMARLVLKSKNGIHLRLRRLPNPCEDKDQL*

>CYP4A11 NM_000778 12 exons (n) = phase of introns
MSVSVLSPSRLLGDVSGILQAASLLILLLLLIKAVQLYLHRQWLLKALQQFPCPPSHWLFGHIQE(0)
LQQDQELQRIQKWVETFPSACPHWLWGGKVRVQLYDPDYMKVILGRS (1)
DPKSHGSYRFLAPWI (1)
GYGLLLLNGQTWFQHRRMLTPAFHYDILKPYVGLMADSVRVML (0)
DKWEELLGQDSPLEVFQHVSLMTLDTIMKCAFSHQGSIQVDR (2)
NSQSYIQAISDLNNLVFSRVRNAFHQNDTIYSLTSAGRWTHRACQLAHQHT (1)
DQVIQLRKAQLQKEGELEKIKRKRHLDFLDILLLAK (0)
MENGSILSDKDLRAEVDTFMFEGHDTTASGISWILYALATHPKHQERCREEIHSLLGDGASITW (2)
NHLDQMPYTTMCIKEALRLYPPVPGIGRELSTPVTFPDGRSLPKG (1)
IMVLLSIYGLHHNPKVWPNPEV (0)
FDPSRFAPGSAQHSHAFLPFSGGSR (2)
NCIGKQFAMNELKVATALTLLRFELLPDPTRIPIPIARLVLKSKNGIHLRLRRLPNPCEDKDQL*


CYP4A22 new seq (top) vs CYP4A11 NM_000778 (bottom) 12 exons
       Length = 520

 Score = 2607 (917.7 bits), Expect = 1.1e-276, P = 1.1e-276
 Identities = 494/520 (95%), Positives = 504/520 (96%)

Query:     1 MSVSVLSPSRRLGGVSGILQVTSLLILLLLLIKAAQLYLHRQWLLKALQQFPCPPSHWLF 60
             MSVSVLSPSR LG VSGILQ  SLLILLLLLIKA QLYLHRQWLLKALQQFPCPPSHWLF
Sbjct:     1 MSVSVLSPSRLLGDVSGILQAASLLILLLLLIKAVQLYLHRQWLLKALQQFPCPPSHWLF 60

Query:    61 GHIQEFQHDQELQRIQERVKTFPSACPYWIWGGKVRVQLYDPDYMKVILGRSDPKSHGSY 120
             GHIQE Q DQELQRIQ+ V+TFPSACP+W+WGGKVRVQLYDPDYMKVILGRSDPKSHGSY
Sbjct:    61 GHIQELQQDQELQRIQKWVETFPSACPHWLWGGKVRVQLYDPDYMKVILGRSDPKSHGSY 120

Query:   121 KFLAPRIGYGLLLLNGQTWFQHRRMLTPAFHNDILKPYVGLMADSVRVMLDKWEELLGQD 180
             +FLAP IGYGLLLLNGQTWFQHRRMLTPAFH DILKPYVGLMADSVRVMLDKWEELLGQD
Sbjct:   121 RFLAPWIGYGLLLLNGQTWFQHRRMLTPAFHYDILKPYVGLMADSVRVMLDKWEELLGQD 180

Query:   181 SPLEVFQHVSLMTLDTIMKSAFSHQGSIQVDRNSQSYIQAISDLNSLVFCCMRNAFHEND 240
             SPLEVFQHVSLMTLDTIMK AFSHQGSIQVDRNSQSYIQAISDLN+LVF  +RNAFH+ND
Sbjct:   181 SPLEVFQHVSLMTLDTIMKCAFSHQGSIQVDRNSQSYIQAISDLNNLVFSRVRNAFHQND 240

Query:   241 TIYSLTSAGRWTHRACQLAHQHTDQVIQLRKAQLQKEGELEKIKRKRHLDFLDILLLAKM 300
             TIYSLTSAGRWTHRACQLAHQHTDQVIQLRKAQLQKEGELEKIKRKRHLDFLDILLLAKM
Sbjct:   241 TIYSLTSAGRWTHRACQLAHQHTDQVIQLRKAQLQKEGELEKIKRKRHLDFLDILLLAKM 300

Query:   301 ENGSILSDKDLRAEVDTFMFEGHDTTASGISWILYALATHPKHQERCREEIHGLLGDGAS 360
             ENGSILSDKDLRAEVDTFMFEGHDTTASGISWILYALATHPKHQERCREEIH LLGDGAS
Sbjct:   301 ENGSILSDKDLRAEVDTFMFEGHDTTASGISWILYALATHPKHQERCREEIHSLLGDGAS 360

Query:   361 ITWNHLDQMPYTTMCIKEALRLYPPVPGIGRELSTPVTFPDGRSLPKGIMVLLSIYGLHH 420
             ITWNHLDQMPYTTMCIKEALRLYPPVPGIGRELSTPVTFPDGRSLPKGIMVLLSIYGLHH
Sbjct:   361 ITWNHLDQMPYTTMCIKEALRLYPPVPGIGRELSTPVTFPDGRSLPKGIMVLLSIYGLHH 420

Query:   421 NPKVWPNLEVFDPSRFAPGSAQHSHAFLPFSGGSRNCIGKQFAMNQLKVARALTLLRFEL 480
             NPKVWPN EVFDPSRFAPGSAQHSHAFLPFSGGSRNCIGKQFAMN+LKVA ALTLLRFEL
Sbjct:   421 NPKVWPNPEVFDPSRFAPGSAQHSHAFLPFSGGSRNCIGKQFAMNELKVATALTLLRFEL 480

Query:   481 LPDPTRIPIPMARLVLKSKNGIHLRLRRLPNPCEDKDQL* 520
             LPDPTRIPIP+ARLVLKSKNGIHLRLRRLPNPCEDKDQL*
Sbjct:   481 LPDPTRIPIPIARLVLKSKNGIHLRLRRLPNPCEDKDQL* 520
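The "Identities" figure in the BLAST report above is a count of aligned positions where both sequences carry the same residue. A minimal Python sketch of that count, applied to the first 60 aligned residues copied from the alignment, is given here. The function name `count_identities` is an assumption for this example, and the sketch only handles ungapped, equal-length segments such as this one.

```python
def count_identities(query, subject):
    """Count positions where two aligned, equal-length sequences match."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for q, s in zip(query, subject) if q == s)


# First 60 aligned residues of CYP4A22 (query) and CYP4A11 (subject).
query = "MSVSVLSPSRRLGGVSGILQVTSLLILLLLLIKAAQLYLHRQWLLKALQQFPCPPSHWLF"
sbjct = "MSVSVLSPSRLLGDVSGILQAASLLILLLLLIKAVQLYLHRQWLLKALQQFPCPPSHWLF"

identities = count_identities(query, sbjct)
percent = 100.0 * identities / len(query)
print(f"{identities}/{len(query)} identical ({percent:.1f}%)")

# Whole-alignment figure reported by BLAST: 494 identities over 520 positions.
print(f"{100.0 * 494 / 520:.1f}% over the full alignment")
```

Most of the differences between the two proteins fall in the N-terminal region, so this first segment scores lower than the 95% reported for the full length.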


What's New April 19, 2001

A complete set of public mouse P450 protein sequences is posted on the homepage under the table and other info. It is also reachable under the mouse button. 75 mouse P450s are given. 5 confidential sequences are named but not given. 6 human sequences with no mouse orthologs are also given in anticipation that they will be found. There are at least 80 mouse P450s and possibly 86.

What's New April 5, 2001

42 new P450s from Dictyostelium discoideum have been named. The sequences and accession numbers or other identifiers from the Dicty Blast server or the Kazusa cDNA project are given in the lower eukaryotes section of the web page. 705 sequence fragments have been identified and sorted into 55 contigs. There are 26 full length sequences. There are at least 43 P450 genes in Dicty. The names are in families CYP51, CYP508, CYP513-525. An alignment of 54 contigs is also presented. There are 13 unnamed fragments and 7 possible pseudogenes.

What's New March 20, 2001

The Dictyostelium sequences have been revisited and many new accessions have been found. Jinchuan Xing, a graduate student on a rotation in the lab, has searched all 64 previous contigs against the databases and assembled as many of these genes as possible from the new hits. There are now 17 complete or nearly complete sequences and the number of contigs has dropped from 64 to 56 due to joining of previous contigs. There are at least 41 P450s in Dictyostelium based on unique C-terminal sequences (see lower eukaryotes on the homepage, then click on 57 D. discoideum contigs). The alignment of C-terminals has also been revised.

What's New March 9, 2001

A new human P450 has been found, and it has orthologs in mouse, bovine and fish. It may be vertebrate specific since it is not found in fly or worm. The new sequences are all being named CYP20 (a long-reserved and unused animal P450 name).
Human CYP20 (phase of introns shown)
AC011737.8|AC011737 Homo sapiens chromosome 2 clone RP11-33N4, WORKING
MLDFAIFAVTFLLALVGAVLYLYP (0)
ASRQAAGIPGITPTEEK (2)
DGNLPDIVNSGSLHEFLVNLHERYGPVVSFWFGRRLVVSLGTVDVLKQHINPNKTS (1)
DPFETMLKSLLRYQSGGGSVSENHMRKKLYENGVTDSLKSNFALLLK (0)
LSEELLDKWLSYPETQHVPLSQHMLGFAMKSVTQMVMGSTFEDDQEVIRFQKNHGT (0)
VWSEIGKGFLDGSLDKNMTRKKQYED (1)
ALMQLESVLRNIIKERKGRNFSQHIFIDSLVQGNLNDQQ (0)
ILEDSMIFSLASCIITAK (1)
LCTWAICFLTTSEEVQKKLYEEINQVFGNGPVTPEKIEQLR (2)
YCQHVLCETVRTAKLTPVSAQLQDIEGKIDRFIIPRE (0)
TLVLYALGVVLQDPNTWPSPHK (2) genomic seq stops here the rest is cDNA
FDPDRFDDELVMKTFSSLGFSGTQECPELR (2) intron site based on fish genomic DNA
FAYMVTTVLLSVLVKRLHLLSVEGQVIETKYELVTSSREEAWITVSKRY

AK020848 Mus musculus Cyp20 adult retina cDNA plus ESTs for C-term
MLDFAIFAVTFLLALVGAVLYLYPASRQASGIPGLTPTEEKDGN
LPDIVNSGSLHEFLVNLHERYGPVVSFWFGRRLVVSLGTTDVLKQHFNPNKTSDPFET
MLKSLLGYQSGGGSAGEDHVRRKLYGDAVTASLHSNFPLLLQLSEELLDKWLSYPETQ
HIPLSQHMLGFALKFVTRMVLGSTFEDEQEVIRFQKIHG 
TVWSEIGKGFLDGSLDKNTTRKKQYQEALMQLESTLKKIIKERKGGNFRQHT 
FIDSLTQGKLNEQQILEDCVVFSLASCIITAR 
LCTWTIHFLTTTGEVQKK
LCKEIDQVLGEGPITSEKIEQLSYCQQVLFETVRTAKLTPVSARLQDIEGKVGPFVIPKE 360
TLVLYALGVVLQDPSTWPLPHRFDPDRFADEPVMKVFSSLGFSGTWECPELXFAYMVTAV 540
LVSVLLEKLRLLAVDRQVVEMKYELVTSAREEAWITVSKRH*

Bovine CYP20 

MLDFAIFAVTFLLALVGAVLYLYPASRQAAGIPGITPTEEKDGNLPDIV
NSGSLHEFLVNLHERYGPVVSFWFGRRLVVSLGTVDVLKQHINPNKTLDPFETMLKSLLR
YQSDSGNVSENHMRKKLYENGVTNCLRINFALLIKLSEELLDKWLSYPESQHVPLCQHML
GFAMKSVTQMVMGSTFEDEQEVIRFQKNHGTVWSEIGKGFLDGSLDKSTTRKKQYEDALM
QLESILKKIKERKGRNFSQHIFIDSLVQGNLNDQQILEDTMIFSLAS
CMITAKLCTWAVCFLTTYEEIQKKLYEEIDQVLGKGPITSEKIEELRYCRQVLCETVRTA
KLTPVSARLQDIEGKIDKFIIPRETLVLYALGVVLQEXGTWSSPYKFDPERFDDESVMKT
FSLLGFSGTRECPELRFAYMVTAVLLSVLLRRLHLLSVE
GQVIETKYELVTSSKEEAWITVSKRY 498

Fish homologs CYP20

From Oryzias latipes (fish)
MLDFAIFAVTFVVILVGAVLYLYPSSRRASGVPGLFPTDEKDGNLQDIVDRGSLHEFLV 
GLHEQFGPVASFWFGRQPVVSLGSVDPLRQHINPNHTTDSFETMLKSLLGYQAGAGGGAN 
ESVMRKKLYESAINNALKNSFPAVLKVAEELVDKWSSVPEDQHIPLCAHLLGLALKTV 

Human to fill gap in fish based sequence
TQMVMGSTFEDDQEVIRFQKNHGT (phase 0)
VWSEIGKGFLDGSLDKNMTRKKQYED (phase 1)

227 this part from Takifugu rubripes pufferfish
ALSEMESTLLSVVKERKSQRNKSVFVDSLIQSTLTERQ 265
IMEDCMVFMLAGCAITAN 283 (1)

Tetraodon nigroviridis freshwater pufferfish 
284
VCIWALHFLSSSEDVQDRLHQELEEVLGSGPVSLEKIPQL
RYCQQVLNETVRTAKLTPVAAGLQEVEGKVDQHLIPKE
TLVIYALGVILQDSHTWDAPCR
FHPDRFEEESVRKSFRLLGFSGSQTCPELR
VAYTVATVLLSAVVRQLRLHRLEDTLVEVRSELVSTPREETWITFSRRN

What's New Feb. 7, 2001

The assembled human genome is due for publication on Feb. 12 in Science (online access is free for the whole genome issue), with hard copy publication on Friday, Feb. 16. I will be searching for the missing regions of 27C1 and 26C1 when this comes out. Other notable findings are a complete diatom P450 (sequence still confidential) that is 52% identical to 97B3. This is a case where sequence identity across deep phylogenetic space is observed. Family 97 may be like CYP51 in this regard. Other recent evidence suggests that plants may be closest relatives of heterokonts (including diatoms) and alveolates. Some P450 sequence fragments are starting to become available from these species.

What's New Dec. 20, 2000

The Arabidopsis genome has seven unsequenced clones and one 5kb gap not including the centromere gaps. There may be a P450 on one of these clones, since CYP84A4 has not been found yet. There are also three GSS sequences that do not match known P450s. To see the details click here

What's New Dec. 14, 2000

The Genome of Arabidopsis is complete and described in Nature and Plant Physiology today. Over the course of the day I will be searching for the missing P450 genes that are currently represented as fragments and I will be incorporating them into the website. This depends on the availability of the sequence for searching. Sometimes Genbank lags behind the big announcements in the press. (Note: I was not able to find the whole 84A4, so I guess the whole genome is not available for searching yet)

What's New Dec. 11, 2000

I have begun a new long-term project to detail the molecular events that have shaped the evolution of eukaryotic life. I have a great interest in this area and I would like to start this as an open-ended and ever-growing review/text on the specific innovations and dichotomies that make up eukaryotic life. The announcement of the project is here. This is linked to an Introduction and a brief article on key Synapomorphies in eukaryotic evolution.

What's New Nov. 14, 2000

The number of mapped clones in Arabidopsis that are not in Genbank has dropped to 31. A new Arabidopsis P450, CYP86B2, has been found on Chr V. Science has delayed its annual Genome issue for several weeks. The latest previous Genome issue was Oct. 25. What are they waiting for? Is it Arabidopsis? Is it Schizosaccharomyces pombe?

What's New Sept. 25, 2000

A new search for human P450s in the human genome project data has been done. The files listing human P450s by accession number, chromosome or name will be updated. A new human P450 was found and named CYP2W1. This sequence may have a frameshift at the C-terminal, since another frame has an exact match to the 2D6 C-terminal 6 amino acids at the right location for the end of the gene. The probable N-terminal was found for 4F12, though that is not certain yet. A file will be posted detailing this gene. Look for the new files on Sept. 26.

What's New Sept. 19, 2000

This year a large number of Arabidopsis ESTs have been deposited from Japan. On April 20, 2000, 3366 ESTs were deposited and on June 27, 2000, 49848 more ESTs were deposited, for a total of 53233. These EST accession numbers start with AV. A search of these ESTs for P450s identified 305 as P450s (0.6%). The P450s were found by searching with 9 different P450 sequences from Arabidopsis and CYP92A2 from tobacco. Each sequence from Arabidopsis was from a different P450 clan to maximize coverage. There are probably a few more ESTs that are family specific that were not picked up in this search. For more details see the Arabidopsis page and go to the P450s sorted by family link.

What's New Sept. 11, 2000

All 90 Drosophila sequences on my website have been blast searched vs the sequences on Rene Feyereisen's web site and all the differences between our translations have been identified. All these differences have been checked against the Genbank entries for the Drosophila genome. All differences have been resolved and corrections made to my site. This affected the translations of 14 P450 sequences. (4p2, 4p3, 4ac2, 9f2, 9f3, 12a4, 28d2, 49a1, 307a1, 313a1, 313a2, 313a5, 313b1, 318a1). These should now be more accurate than they were. Rene will be making revisions to his site, with the goal of having 100% agreement between our two sets of sequences. Michael Ashburner at Flybase and Eleanor Whitfield at Swiss-prot have also been notified of these changes. Effort will be made to get the Celera and Genbank annotations to match these translations.

What's New Sept. 5, 2000

Rice P450s have been thoroughly updated. 358 accession numbers and 213 contigs including 40 complete rice P450 sequences are now available. Go to the rice button on the home page. Read the rice P450 report.

What's New Aug. 23, 2000

All 269 genes from Arabidopsis have been listed by chromosome in the order that the clones appear on the chromosomes. This is a linear map without the actual spacing that would be included in a real map. See Rene's site for the drawn maps, though that site does have some errors and many new genes are missing.
Go to the chromosome list
The alphanumeric list of Arabidopsis accession numbers and the alphanumeric list of clone names have also been updated.

What's New Aug. 15, 2000

The Arabidopsis sequencing totals web site showed a dramatic jump yesterday, from 92.9% complete to 103.7% complete. The total number of bases done is now 134,819 kb, which is a little larger than the predicted size of 130 Mb. I think this means that the Arabidopsis genome is done or almost done. When I searched Entrez for new Arabidopsis entries in the last few days, I found 186 genomic clones deposited on Aug. 10. Also on that date 10,136 ESTs from developing seeds were deposited (Accession numbers BE520155 - BE530290). Blast searches on the 15th identified 55 as P450 ESTs. See the P450 sequences file at the bottom for a list by P450 sequence name or by alphanumeric order of the accession numbers. These P450s are expressed in developing seeds.
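The 103.7% figure follows directly from the two numbers quoted above. A short Python check, with variable names chosen only for this example, is:

```python
sequenced_kb = 134_819   # bases reported as done, in kb
predicted_kb = 130_000   # predicted genome size of 130 Mb, expressed in kb

percent_complete = 100.0 * sequenced_kb / predicted_kb
print(f"{percent_complete:.1f}% complete")
```

A value above 100% simply means the sequenced total has passed the earlier size estimate, not that anything was sequenced twice by necessity.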

The Rice-Research.org administrator responded that placement of rice P450s on the Cytochrome P450 Homepage constituted an open public forum and therefore would meet the criteria set for presentation of data. I will send the necessary forms through the University system for access to the rice genome.

What's New Aug. 11, 2000

Monsanto/Pharmacia have created a website, Rice-Research.org, that will allow researchers access to the rice genome. A six-page legal agreement has to be signed by a researcher and their organization before a password is issued. I have asked the administrator of this organization if I could name and post the rice P450 sequences on this web site as I have done for Arabidopsis. I am waiting for a reply. If the answer is yes, then I will sign the agreement and begin naming rice P450s. Otherwise, we will all have to wait for the public project to finish each clone.

What's New Aug. 4, 2000

I have posted a sequence alignment of all 57 human P450 proteins. This includes 53 probable functional genes and 4 pseudogenes (2G1P, 2G2P, 2T2P and 2T3P). This alignment is assembled from previous alignments I had, with the addition of new sequences. The alignment could be improved, so watch the version date. I am preparing a second copy with all the intron locations mapped to help understand the relationships between the families. I already found that CYP4V2 has no relationship to other 4 family members and should probably be treated as a separate family. Go to alignment.

What's New July 31, 2000

The Pseudomonas aeruginosa genome is complete but not published. The sequence is available on a server and it contains three P450 genes. The sequences of these have been posted in the file of bacterial P450 sequences as CYP107S1, CYP168A1 and CYP169A1.

What's New July 27, 2000

48 new bacterial P450s have been named in an attempt to catch up on the long neglected bacterial section of the nomenclature. I have posted a FASTA file of the public bacterial sequences under the bacteria button on the homepage. This is not all the bacterial P450s but it is a good start.

What's New July 25, 2000

I have posted the talk I gave at Stresa, Italy, at the MDO2000 meeting on July 10. This talk includes the slides that I used, and a few comments on things that have changed since July 10. GO TO TALK

What's New May 19, 2000

14 complete or almost complete Dictyostelium P450s have been assembled from the genomic and EST sequence data at Jena, Baylor College of Medicine, Genbank and the Japanese Project. Many of the dicty P450s have a short N-terminal exon that is still unidentified. There are now 439 sequence fragments that have been assembled into 64 contigs. In addition to the 14 full sequences, there are 27 partial sequences that have a C-terminal part. There are 23 more contigs that do not include a C-terminal. My expectation is that most of these 23 will join with an identified C-terminal contig so the total will be near 41 P450s in Dictyostelium. An alphanumeric list of the 439 fragments is posted as well as the 64 contigs (protein sequence) with their identifiers. See the lower eukaryotes button on the home page.

What's New May 19, 2000

Dictyostelium discoideum, the cellular slime mold, has at least 41 P450s and maybe more. See the Lower eukaryotes button. Look at the alignment of 43 C-terminals (8 and 8b may be the same seq and 25 and 29 may be the same).

What's New May 11, 2000

The current count of human P450s stands at 53 genes plus one candidate 2A gene that is partial. There are 22 pseudogenes. I have dropped some of the 2D pseudogenes from this count since there appears to be only 1 2D6 gene and 2 pseudogenes. The others are various alleles of these three formed by deletion and rearrangements. The 2C9, 3A4, 11B1, 11B2 and 26A1 genes are not present in the genomic sequence from the human genome project. The estimates of completion are now around 85%, so this leads to a back of the envelope calculation of 54 genes - 5 not in genomic DNA = 49 genes/0.85 = 58 genes predicted. There might be four more unknown P450 genes left to find in the last 15% of the human genome. I would be surprised if the final count exceeded 60 functional P450 genes.
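The back of the envelope calculation above can be written out step by step. The short Python sketch below uses only the numbers given in the paragraph; the variable names are chosen for this example.

```python
known_genes = 54             # 53 genes plus one partial 2A candidate
missing_from_genomic = 5     # 2C9, 3A4, 11B1, 11B2 and 26A1
fraction_complete = 0.85     # estimated completion of the genome sequence

# Genes actually seen in the genomic sequence so far.
found_in_genomic = known_genes - missing_from_genomic

# Scale up to a finished genome, assuming genes are found at a steady rate.
predicted_total = found_in_genomic / fraction_complete

# Genes still expected beyond the ones already known.
remaining = round(predicted_total) - known_genes

print(found_in_genomic, round(predicted_total), remaining)
```

The estimate assumes P450 genes are spread evenly between the finished and unfinished parts of the genome, which is only roughly true given how many P450s sit in clusters.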

The human chromosome 21 sequence was announced this week. I examined it for P450 genes and found that the authors have annotated one pseudogene of the 4F family on chromosome 21. It has been named CYP4F28P and I have posted it to my human P450 sequence file. It is 81% identical to CYP4F25P and 80% identical to CYP4F26P. The 4F family seems to generate a lot of pseudogenes, more so than other families.

I am trying to keep up with the new human sequences, but revisions of old sequences are rapidly replacing the ones I have already posted on my PDF files of human P450 genes. This means the nucleotide numbering of the exon locations will be changing.

The sequence previously named CYP4AH1 has been renamed to CYP4V2 since it matches to a rainbow trout P450 fragment sent to me long ago and named CYP4V1. The two are 69% identical.

Be aware that there is one P450 in the database that is labeled incorrectly as human (AC021892). This is a rice CYP75 sequence from rice chromosome 10. It was accidentally labeled as being from human chromosome 10 in the definition line. The authors are trying to get this corrected since it affects many entries.

What's New May 3, 2000

A new human P450 has been found on AC010383.2 that looks like a fungal P450 from Agaricus bisporus (mushroom). I wonder if it might be a fungal contamination of the library? I have pieced part of it together, but it is still not complete. The two short introns are not like a human gene. Contig ends 2436 gene on minus strand 2352 VKYIPSWFPGAGFKRIAAKWKTDVNKMFDVPYAKFKDSM (gap) EKQRAAHEALDCVLERKRLPGVEDRDALPHITALAYEVLRYVS (intron) VSLLAIPHRTTADSYYKGYYIPAGSTIFPNSW (intron) AILHDEALYPEPHLFKPERFLDEDGSLHAHARYPIEA FGYGRRICPGRHFAHDALWLAIAHILAVFKIERALDEDGNESRGILRVASSMVL 1228 contig ends 1227

What's New May 2, 2000

A consideration of the effect of genome duplications on P450 evolution in vertebrates is given in "Where did all the P450s go?"

What's New April 25, 2000

Searches for P450s in the human genomic DNA sequences being deposited in Genbank have identified several new human P450s. These include the five shown below and others. One is 43% identical to CYP27A1 and has been named CYP27C1. See the human section of the homepage for detailed PDF tables sorted by chromosome, CYP name or accession number.

>AC027142 CYP27C1 43% identical to 27A1 partially assembled gene

1 85452 MQTSAMALLARILRAGLRPAPERGGLLGGGAPRRPQPAGARLPAGARAEDKGAGRPGSPPG 48 85635 GGRAEGPRSLAAMPGPRTLANLAEFFCRDGFSRIHE 85742 83 84 39568 LQQKHTREYGKIFKSHFGPQFVVSIADRDMVAQVLRAEGAAPQRANMESWREYRDLRGRATGLISA 39371 149 150 43984 EGEQWLKMRSVLRQRILKPKDVAIYSGEVNQVIADLIKRIYLLRSQAEDGETVTNVNDLFFKYSME 43787 215 (GGT)intron G amino acid at boundary (AGGA) other end of intron 216 41743 GVATILYESRLGCLENSIPQLTVEYIEALELMFSMFKTSMYAGAIPRWLRPFIPKPWREFC 41564 41563 RSWDGLFKFSKRRIE 41519 287 gap of 51 amino acids 340 110201 TSFTLSWTVYLLARHPEVQQTVYREIVKNLGERHVPTAADVPKVPLVRALLKETLR 110034 395 intron LFPVLPGNGRVTQEDLVIGGYLIPKG intron 418 108006 TQLALCHYATSYQDENFPRAKEFRPERWLRKGDLDRVDNFGSIPFGHGVRSCIGRRIAELEIHLVVIQV 107791 493 missing about 30 aa at end

>AC012525 Homo sapiens chromosome 4. There is a mouse ortholog for this seq. Low 40% range with other mammalian 4 family members new subfamily of CYP4

223491 MAGLWLGLVWQKLLLWGAASAVSLAGASLVLSLLQRVASYARKWQQMRPIPTVARAYPLVGHALLMKPDGR 223279 220816 EFFQQIIEYTEEYRHMPLLKLWVGPVPMVALYNAENVEG 220700 219309 ILTSSKQIDKSSMYKFLEPWLGLGLLT 219232 218377 STGNKWRSRRKMLTPTFHFTILEDFLDIMNEQANILVKKLEKHINQEAFNCFFYITLCALDIIC 218186 217783 ETAMGKNIGAQSNDDSEYVRAVYR 217712 216357 MSEMIFRRIKMPWLWLDLWYLMFKEGWEHKKSLQILHTFTNSV 216229 214155 IAERANEMNANEDCRGDGRGSAPSKNKRRAFLDLLLSVTDDEGNRLSHEDIREEVDTFMFE 213973 210091 GHDTTAAAINWSLYLLGSNPEVQKKVDHELDDV 209993 206422 KSDRPATVEDLKKLRYLECVIKETLRLFPSVPLFARSVSED 206248 YFLTAGYRVLKGTEAVIIPYALHRDPRYFPNPEEFQPERFFPENAQG 206069 206068 RHPYAYVPFSAGPRNCIG 206015 204818 QKFAVMEEKTILSCILRHFWIESNQKREELGLEGQLILRPSNGIWIKLKRRNADER* 204648

>AC025090 CYP2U1 AC000016 has C-term 41% to 2N1 new CYP2 subfamily intron joints not yet defined

MSSPGPSQPPAEDPPWPARLLRAPLGLLRLDPSGGALLLCGLVALLGWSWLRRRRARGI 77036 PPGPTPWPLVGNFGHVLLPPFLRRRSWLSSRTRAAGIDPSVIGPQVLLAHLARVYGSI 76863 76862 FSFFIGHYLVVVLSDFHSVREALVQQAEVFSDRPRVPLISIVT 76734 105008 GPVWRQQRKFSHSTLRHFGLGKLSLEPKIIEEFKYVKAEMQKHGEDPFCPF 105160 105161 SIISNAVSNIICSLCFGQRFDYTNSEFKKMLGFMSRGLEICLNSQVLLVNICPWLYYLPF 105340 105341 GPFKELRQIEKDITSFLKKIIKDHQESLDRENPQDFIDMYLLHMEEERKNNSNSSFDEE 105517 105518 YLFYIIGDLFIAGTDTTTNSLLWCLLYMSLNPDVQ 105622 107396 KVHEEIERVIGANRAPSLTDKAQMPYTEATIMEVQRLTVVVPLAIPHMTSENT 107554 109370 LQGYTIPKGTLILPNLWSVHRDPAIWEKPEDFYPNRFLDDQGQLIKKETFIPFGIG 109540 KRVCMGEQLAKMELFLMFVSLMQSFAFALPEDSKKPLLTGRFGLTLAPHPFNITISRR

>CYP4F22 AC011492 assembled gene 13 exons 114537-140651 66% to 4F3, 65% to 4F11, 63% to 4F2, 59% to 4F8, 64% to 4F12, 57% to AC011537 exact intron boundaries need checking no ESTs MLPITDRLLHLLGLEKTAFRIYAVSTLLLFLLFFLFRLLLRFLRLCRSFYITCRRLRCFPQPPRRNWLLGHLGMVS PNEAGLQDEKKVLDNMHHVLLVWMGPVLPLLVLVHPDYIKPLLGAS AAIAPKDDLFYGFLKPWLG DGLLLSKGDKWSRHRRLLTPAFHFDILKPYMKIFNQSADIMH AKWRHLAEGSAVSLDMFEHISLMTLDSLQKCVFSYNSNCQE KMSDYISAIIELSALSVRRQYRLHHYLDFIYYRSADGRRFRQACDMVHHFTTEVIQERRR ALRQQGAEAWLKAKQGKTLDFIDVLLLAR DEDGKELSDEDIRAEADTFMFEG HDTTSSGISWMLFNLAKYPEYQEKCREEIQEVMKGRELEELEW DDLTQLPFTTMCIKESLRQYPPVTLVSRQCTEDIKLPDGRIIPK GIICLVSIYGTHHNPTVWPDSK VYNPYRFDPDNPQQRSPLAYVPFSAGPR NCIGQSFAMAELRVVVALTLLRFRLSVDRTRKVRPELILRTENGLWLKVEPLPPRA*

>CYP4F23P AC011492 assembled gene 76% to 4F3, 76% to 4F8, 76% to 4F11, 73% to 4F2, 75% to 4F12, 77% to 4F11, 60% to other 4F on this accession no ESTs MSLLSLSWLGLGPVAASPWLLLLLVGASWLLARVLAWTYAFYDNCHRLQCFQQPPKRNCF*GHLSLVS GNEEDMRLMEDLGHYFRDVQLWWLGSFYPVLHLVHPTFTAPVLQAS AAVALKDMSFYGFLKPWLG DGLLISAGDKWRWHRHLLTPAFHFKILKPYVKIFNESTNIMH AKWQRLALEGSVRLEMFEHISLMTLDSLQKCIFSFDSNCQE KPSEYIDAILELSALSLKRHQHIFLLTDFLYFLTPNGRRFCRACDIVHNFTDAVIQERRR TLTSQGVDDFLQAKAKSKTLDFIDVLLLAK DENGKKLSDENIRAEADTFMSG GHDTTASGLSWVLYNLARYPEYQEHCRQEVQELLKNGDPKEIEW DDLAQLPFLTMCLKESLRLHSPVSRIHRCCPQDGVLPDGRVIPK GNTCTISIFGIHHNPSVWPDPEV YDPFRFDPENLQKTSPLAFIPFSAVPR NCIGQTFAMAEMKVVLALTLLRFRVLPDHAEPRRKLELIVRAEDGLWLRVEPLSADLQ*

What's New April 11, 2000

The Plant P450 Database has exceeded 500 sequence entries. The database has been updated to include the latest information on which sequences are still confidential and which have become public. Many new sequences have been added since the last revision in August 99. Please note that slightly more than half of these sequences are from Arabidopsis. A large collection of isoflavone synthases was deposited in Feb. 2000. These can be found in the 93C subfamily. For a collection of 28 93C accession numbers and their sequences click here

According to the latest Nature of April 6, Monsanto has sequenced all 12 chromosomes of rice to around 5X coverage and will release it to the International Rice Genome Sequence Project (IRGSP) no strings attached. The IRGSP will combine this new data with their own and release it to public databases. The exact timetable for this release is not given. For a news item from the IRGSP click here. The gist of this article is that transfer of the data from Monsanto will be in May and June and this will accelerate completion of the genome, but it still might take 3 years. The raw Monsanto data will not go right into public databases. It seems that only finished segments of the genome will be released. What would Craig Venter do with a 5X coverage genome? See below.

On April 7, before a congressional hearing, J. Craig Venter said Celera was through sequencing human DNA and would switch to mouse. He said they would begin assembling the human genome from the sequence data that they had accumulated so far and the finished sequence would be ready in 3-6 weeks. Francis Collins disputed this claim, saying it was not possible. Celera's stock fell 20% in one day after Collins' remarks.

What's New March 17, 2000

The accession numbers AC008537 and AC008962 from human contain a large cluster of P450 genes from Chromosome 19. This cluster has been described in Hoffman et al. J. Molec. Evol. 41, 894-900 1996. However, the sequence of the cluster is now coming into the databases on these two accession numbers. These are incomplete sequences at the present time, with 16 unordered fragments for AC008537 and 5 unordered fragments for AC008962. The genes included here are 2A6, 2A7, 2A13, 2F1, 2G1 and a probable pseudogene of 2G. There are also some additional unnamed 2A, 2B and 2F sequences as well as a new 2T2P sequence. I will be trying to put these together and post them over the next few weeks.

What's New Jan. 27, 2000

All Drosophila melanogaster P450 protein sequences have been posted as a FASTA file: All Drosophila melanogaster P450s. These sequences are all in Genbank. There are no confidential sequences left. There are 86 P450 genes and 4 pseudogenes. CYP51 is absent. Since CYP51 is also absent in C. elegans, this important eukaryotic sterol biosynthetic gene may have been lost in the common ancestor of flies and nematode worms.

What's New Jan. 19, 2000

I have posted a 4 family tree with 89 sequences, including the new Drosophila sequences in the 4 family and in the 18 clan.

New 4 Family tree

A second tree covering the remaining sequences including the 6, 9, 12 and 28 families is also here

New 6 and 9 Family tree

What's New Dec. 29, 1999

Two new sequence alignments of the I-helix to the end half of the proteins are posted. The first alignment covers 73 sequences (mainly insect sequences) in a 4 family alignment. The second alignment has 79 sequences (mainly 6, 9, 12, 28 families) with many new Drosophila sequences included in both alignments. See second alignment. These will form the basis for naming the new Drosophila P450s, which will be done by Jan. 15, 2000. Once the trees for these alignments have been debugged and polished, they will be posted with the new nomenclature for the Drosophila sequences. The trees will contain about 31-32 confidential sequences, but these will not be in the alignments.

What's New Dec. 27, 1999

The Arizona P450 site is now linked. Please go visit this excellent site.

The P450 Site at the University of Arizona

What's New Nov. 24, 1999

There are now 69 N-terminal sequences in an alignment under the Drosophila button on the main page. That means there are at least that many P450s in Drosophila and probably a few more will be found, since C. elegans had 80 P450s. The FASTA file of P450 sequences has all the fragments I have so far assembled and 50 full length sequences are available there. This file is a working scratch pad, so bear with me until I get them all in final order. There are a couple of sequences that are so different from existing P450s that I have not been able to assemble the middle section. Even though it is present in the sequence data, I cannot identify the exons.

What's New Nov. 22, 1999

Celera Genomics continues to deposit new Drosophila sequences into Genbank. On Nov. 16, 1612 fragments were deposited. I am trying to catch up to these. See the N-terminal page for an alignment of 65 Drosophila N-terminal sequences. The Drosophila list is also being updated. See the accession numbers beginning with AC0#####* followed by an asterisk. These are newly added since the July 14 revision of this page.

What's New Nov. 9, 1999

Celera Genomics deposited 551 Drosophila sequences into Genbank on Nov. 3, 1999. These represent about 10 million bases of sequence. The remaining Drosophila genome sequences from Celera should be in Genbank by the end of the year. This will allow identification of all the P450s in a macroscopic animal.

What's New Oct. 19, 1999

The Dictyostelium discoideum genome project has been making progress. In Nature 401 from 30 Sept 99 page 440 the current status is given as "Over two-fold coverage of the 34 Mb genome is now available and there is already information on at least 90% of the genes." I have searched the Jena web Blast server for new sequence extensions of the 18 different P450 genes I had found earlier. Some of these are now complete sequences based on ESTs and shotgun sequences. A complete CYP51 is given. There is a complete CYP508A1 and a partial CYP508B1. At least two other sequences are complete or nearly so. I have also translated all the new P450 hits found on the Jena server and sorted them into contigs. There are now 67 different P450 contigs in Dictyostelium, though there will be fewer genes than this because some are non-overlapping N- and C-terminals of the same gene. Even though slime molds are multicellular only part of the time, they seem to have many P450s, possibly more than 30. See the Lower Eukaryote option on the homepage table.

 

What's New Oct. 13, 1999

A new human CYP family CYP39A is now represented by a complete sequence. This was assembled over several days by using every trick in the book to find all the pieces. I suspect this will be an important P450, with an ancient history, probably acting on sterols. The gene is also found as ESTs in mouse and chicken. See CYP39A1 in this file.

 

What's New Sept. 23, 1999

A new human P450 has been found named CYP26B1 that is 44% identical to 26A1 from mouse and humans. The sequence was found in genomic DNA AC007002. The reference is in press.

Nelson, D.R. (1999) A second CYP26 P450 in humans and zebrafish: CYP26B1. Archives of Biochemistry and Biophysics, in press.

 

What's New Sept. 22, 1999

The server has been upgraded from an ancient Quadra 650 circa 1994 to a PowerMac 
7100.  This is not new, but it is a step up from the old Quadra.  You should 
notice some improvements in speed.  If you have any trouble accessing any part 
of this site let me know so I can check it.
 

What's New Sept. 16, 1999

A new tree of the CYP81 family is posted under the trees button. This does not contain all CYP81 sequences, but it has all the subfamilies represented.

What's New Sept. 14, 1999

Soybean has a large number (86) of P450-containing ESTs. I have collected these and incorporated them in a list with 22 full-length soybean P450s. This is a new format that uses the Entrez Nucleotide query output for each entry. These have been modified to include my own annotation and translations to make them more useful. The sequences have been sorted by family. This format provides easy access to the nucleotide and protein sequences (if available) from the Genbank records by a single mouse click. Let me know how you like it; it was a lot of work to make. See the soybean button under plants.

What's New Sept. 13, 1999

7938 new Arabidopsis ESTs were deposited in Genbank on Sept. 8. I have looked at them by Blast searches over the weekend and I found 57 that contain P450 sequences. 47 of these are close or exact matches to known P450s from Arabidopsis (96% or better identity). Nine ESTs are new sequences that range from 60% to 93% identical to known P450s. One is a poor sequence that is probably a 93D1 EST. See New Arabidopsis ESTs.

What's New Sept. 9, 1999

On Sept. 8, 1999 Genome Systems, Inc., a wholly owned subsidiary of Incyte Pharmaceuticals, Inc., deposited 7938 Arabidopsis mRNA sequences into Genbank. These ESTs run from AI992384-AI999813 and from AW004082-AW004589. This is about 4 - 4.5Mb of sequence. Currently, these cannot be searched for P450 genes by BLAST. It may take a day to update the searchable database so these sequences are there. I suspect there will be some new P450s in this data set.

What's New Sept. 2, 1999

The Cytochrome P450 homepage has a new look. A table has replaced the 15 gif images that used to serve as buttons. This should shorten loading time and improve the feel of the page. It will also allow addition of new buttons for different species as time goes on. (There are many soybean ESTs, and I have not worked on zebrafish yet.)

What's New Sept. 1, 1999

Tomato Bursts on the P450 Scene

A tomato EST project has generated more than 26,000 ESTs, with 8616 deposited in March, 11789 in June, and 5394 in July 1999. These have now been searched for P450-containing sequences. There are 235 P450-coding tomato ESTs. These sort into 58 contigs that include 5 complete genes (CYP51, 73A24, 76A6, 88B1 and 707A4) not known previously from tomato. For more info see the Tomato P450 page.

What's New Aug. 26, 1999

A new 68 sequence tree with the CYP4 family is given. This has 63 4 Family sequences, and 5 additional sequences. The 48 family and the CYP4P subfamily are at a boundary between the insect cluster and other animal CYP4 members. On earlier versions of this tree they fell outside the 4 cluster. Here they are just inside.
A 4 family tree with emphasis on insects August 26, 1999 [PDF]

What's New Aug. 25, 1999

39 P450-containing ESTs from Zea mays have been translated and sorted by family. See Zea mays ESTs with P450s.
 

What's New Aug. 23, 1999

A note on Plant P450 evolution: new ESTs from pine are clarifying how old some plant P450 families are.

What's New Aug. 20, 1999

The diversity of plant P450s has been displayed in a new table that lists plant higher level taxonomic groups and shows which groups have known P450 sequences. The more than 400 plant P450s belong to 65 species. These include 3 conifers and 62 angiosperms. Ten species of monocots have known P450 sequences. Seven of these are crop plants among the grasses (Poaceae). The majority of species are among the eudicots, with 22 in the eurosids (11 in Fabales) and 24 in the asterids (10 in Solanales). P450s are known from only 19 of the 60 higher order groups listed.
Go to the table.

What's New Aug. 13, 1999

The plant P450 database has been updated (see the Databases button on the home page). Plants now have over 400 known P450s. 212 are from Arabidopsis. The Deep Green meeting in St. Louis described in today's Science has found that plants should be split into three kingdoms, not just one. Green, brown and red plants split at about the same time from a common ancestor. Each of these divisions should be accorded kingdom status. This is different from the usual view, and it will be interesting to see how this sorts out. I will be interested to know which P450s are specific to each clade, and which predate the split.

What's New Aug. 10, 1999

A press release of July 28 from Celera stated that one million sequences (500 million bp of sequence) have been completed from the Drosophila genome. Below is a quote from the press release.

"Celera expects to complete the random sequencing phase of
Drosophila in early September when it will begin sequencing the
human genome. This will entail completing another 2 million
sequences-or about 1 billion letters of genetic code. Working with
the Berkeley Drosophila Genome Project (BDGP), Celera will then
fill gaps and resolve ambiguities in the sequence to produce finished
sequence. Celera will begin making sequence data available to the
public in October 1999, and anticipates release of the completed
sequence by the end of the year and publication in collaboration with
the BDGP in early 2000."

What's New July 28, 1999

Celera Genomics will finish the Drosophila genome in August or September according to an NPR interview with Craig Venter that aired on July 27. It was not mentioned when the sequence data will be posted to Genbank.

The rice P450 page has been updated. There are some new ESTs and some contigs have been joined.

What's New July 16, 1999

A tree with 30 plant P450s covering the CYP93, 705, 706 and 712 families has been posted under the trees button on the home page. The CYP93D1 and CYP712A2 sequences had their names reversed in the bibliographic pages and FASTA files. This has been corrected.

What's New July 14, 1999


The only possible mitochondrial P450 from C. elegans has been reexamined to better identify the intron-exon boundaries based on similarity to CYP12A sequences of insects. One previously missed exon is added based on an EST sequence and another exon is extended to fill a critical gap. Only about 40% of the sequence is covered by ESTs, so five of the intron boundaries are theoretical.

CELZK177  cosmid ZK177 U21321 CYP44 Probable mitochondrial P450 489 aa
C70591 is an EST from the middle region that adds 49 amino acids not previously identified as coding region in ZK177. Another exon extended 13 amino acids helps fill in the missing I-helix region with DGLSTT matching with AGXDTT of the I-helix.
* indicates predicted intron locations ** indicates verified introns based on ESTs.

MRRSIRNLAENVEKCPYSPTSSPNTPPRTFSEIPGPREIPVIGNIGYFKYAVKS*
DAKTIENYNQHLEEMYKKYGKIVKENLGFGRKYVVHIFDP*ADVQTVLAADGKTPFIVPL
QETTQKYREMKGMNPGLGNL*NGPEWYRLRSSVQHAMMRPQSVQT*YLPFSQIVSN
DLVCHVADQQKRFGLVDMQKVAGRWSLESAGQILFEKSLGSLGNRSEWADGLIEL
NKKIFQLSAK**MRLGLPIFRLFSTPSWRKMVDLEDQFYSEVDRLMDDALDKLKVNDSDS**
KDMRFASYLINRKELNRRDVKVILLSMFSDGLST*TAPMLIYNLYNLATHPEALKEIQKE
IKEDPASSKLTFLRACIKETFRMFPIGTEVSRVTQKNLILSGYEVPAGTAVDINTNVL
MR**HEVLFSDSPREFKPQRWLEKSKEVHPFAYLPFGFGPRMCAGRRFAEQDLLTSL
AKLCGNYDIRHRGDPITQIYETLLLPRGDCTFEFKKL

What's New June 23, 1999

Rice P450s have been updated by exhaustively searching the Genbank entries for new accession numbers. The sequences have been translated and added to the FASTA P450 rice list. See the rice button on the homepage for more info. There are 192 accession numbers for rice P450s.

What's New June 16, 1999

Two new trees have been made with CYP4 sequences as the main content. These trees illustrate the difficulty of maintaining a sensible nomenclature when families get very large. There are some inconsistencies. The 4D subfamily has been split over time into two different clusters of sequences. Cyp4p should probably be in a separate family. The 4F subfamily has also been split. I have started using double letters as in Cyp4aa1 for new subfamilies because the single letters have all been used. Please go to the Drosophila section to see these new trees.

What's New June 14, 1999

A Tree of 56 Insect P450s
The tree with 56 insect P450s includes many new Drosophila sequences. Some are not yet named. This tree is based on an alignment that covers the I-helix to the ends of the sequences, since many are missing the N-terminal. The 4 family sequences are not included here. There are too many to fit, so they will be treated in a separate tree.

What's New June 11, 1999

The Drosophila P450s have been found in Genbank by systematic BLAST searches of the nr, month, others ESTs, gss and htgs sections, using different P450 family representatives. The first search with Cyp4d2 yielded 101 new ESTs, 6 new sequences from month, one from htgs and none from gss or nr. The second search with Cyp6d2 only found 17 new ESTs and one sequence from month. The third search hit only 5 new ESTs and one sequence from nr. At this point the search was halted, since the returns were not worth the effort of scanning the output for new sequences. Some of the new sequences are very different from other P450s (AC005130) and cannot be easily assembled into a complete sequence by comparison with known P450s. I have identified exon-containing ORFs from this gene, but I cannot detect the exon boundaries. If you are brave, have a try at it. The new sequences (almost 300 total in the original FASTA file) have been compared with each other by repetitive Do-It-Yourself WU-BLASTs and condensed onto 98 contigs. Ten of these are from other Drosophila species, 88 are from D. melanogaster. Based on the 80 P450 genes of C. elegans, these 88 genes and gene fragments may represent nearly all the P450s from Drosophila, though some are probably N- and C-terminals of the same gene and the number of contigs will drop as the genome is completed.

What's New June 7, 1999

The Drosophila P450 FASTA sequence file has been updated by including many new sequences starting with Genbank numbers AI (ESTs) and AL (genome survey sequences). Work continues to sort these and assign them to known sequences or related families.

What's New June 4, 1999

I have done a preliminary search of the new Drosophila sequences and have translated many of them in the file Drosophila P450s. I am also finding some older P450s that have come out since I last worked on the Drosophila sequences.

What's New June 2, 1999

On May 28, 1999, 28,049 Drosophila genome survey sequences were deposited from Genoscope in France. These are BAC end sequences. The percent of the Drosophila genome sequenced as reported at the MOT tables jumped from 15% to 24%. I have not had a chance to search these for P450 hits, but there should be a number of new P450s in this large sequence collection of 9% of the Drosophila genome.

What's New May 31, 1999

NCBI has started a new service called Locus Link. This is an attempt to collect information about a family of proteins/genes on a single page with links to the relevant databases like OMIM, Genbank, Unigene etc. There is a cytochrome P450 page. Some of the information comes from the Cytochrome P450 homepage, which is listed as a collaborator. The list of collaborators is short, with only seven names, including several major databases. The cytochrome P450 homepage is honored to be among these sources.

Locus Link P450 page at NCBI

What's New May 25, 1999

Having trouble with those pesky human P450 polymorphisms? Try The Official Human P450 Allele Nomenclature Site (in Sweden).

What's New May 20, 1999

A possible chloroplast P450 has been reported from almond. This sequence is probably in the CYP71D subfamily. Therefore, the whole 71D subfamily might be localized in the chloroplast. Chloroplast P450

What's New April 5, 1999

A large cluster of Arabidopsis P450s has been found on AB024038.  This sequence 
contains 14 different P450s and only two were known previously.  See CYP86C2 and 
CYP71B16 to CYP71B26.  CYP71B3 and CYP71B4 were already known.

What's New Mar. 25, 1999

A discussion of early eukaryotic P450 evolution is available looking at P450s in fungi and the slime mold Dictyostelium. Read Me

What's New Mar. 2, 1999

The CYP72 family has been sequenced in Arabidopsis (accession number AB023038) and there are 8 genes and one pseudogene present. This is similar in size to the Arabidopsis Z97338 gene cluster of CYP702 and 705 genes that has 8 genes and two pseudogenes. The spacing between the genes is less than 1000bp except for one gap of about 6000bp. See the nomenclature files CYP72A7 to CYP72A15 for nucleotide positions. The sequences have been placed in the FASTA Arabidopsis file.

What's New Feb. 25, 1999

An estimate has been made for the number of P450 genes in Arabidopsis (372). See P450s sorted by family.

What's New Feb. 23, 1999

Nine new Arabidopsis P450s have been named 71A16, 79F1, 79F2, 705A5 (now 
complete), 705A10P, 705A11P, 705A12, 707A3 and 708A2.

What's New Feb. 17, 1999

Rat UNIGENE entries have now been included as a separate file to complement the mouse and human files.

What's New Feb. 12, 1999

Lists have been compiled of all the human and mouse P450s in the UNIGENE database. These do not include all pseudogenes, but most of the normal genes are present. The human UNIGENE is much more complete than mouse, with only 3 of 48 human genes not having a UNIGENE entry. Click on the mouse and human buttons to go to these files. In the files, the UNIGENE entries are hyperlinked to take you there immediately.

What's New Feb. 4, 1999

I have decided to make a data giveaway on human P450s lying undiscovered in the EST database. I did a search for these in 1995 and have been sitting on them for several years. About half are now cloned, but half are not. So here they are! New human P450s. Enjoy.

What's New Feb. 1, 1999

I have begun a housekeeping reorganization of the P450 homepage. It has grown to the point of becoming cluttered and hard to use. To make access to desired sections easier, I have added buttons to link to different subsections of the site. The new homepage also has a name http://drnelson.utmem.edu/CytochromeP450.html that is easier for search engines to locate. It may be a few days before all the subsections are properly configured.

What's New Jan. 21, 1999

All rice P450 fragments in the EST and GSS databases have been found, translated and sorted according to their best matches to other P450s. There are 134 of these fragments. Rice Cytochrome P450s

What's New Jan. 8, 1999

Plant P450s have been updated and the Plant Cytochrome P450 Database has been completed. All the files have been made so the hyperlinks work. Public and Confidential sequence lists are available for each subfamily. The total count for plant P450s is now 289. This does not count all the ESTs and fragments produced from the Genome Survey Sequences.

Lower eukaryote P450s have been updated and the Lower Eukaryote Cytochrome P450 Database has been completed. All the files have been made so the hyperlinks work. Public and Confidential sequence lists are available for each subfamily. The total count for lower eukaryote P450s is now 58. This does not count all the ESTs and fragments from genome projects on Dictyostelium discoideum (24) or Trypanosoma cruzi (2).

What's New April 21, 1998

I have just finished an analysis of Arabidopsis P450s and estimate there are a minimum of 137 P450 genes in Arabidopsis. All ESTs are translated and genes assembled from genomic sequence. See Cytochrome P450 Arabidopsis Links.

What's New Mar 13, 1998

An alignment of 60 of the 80 C. elegans P450s is posted, along with 45 other sequences from many other families. The 20 C. elegans sequences that are not included in the alignment are either pseudogenes (CYP13A9P, CYP25A6P, CYP33C10P, CYP33E3P, CYP35D2P), or they are very similar to sequences in the alignment, such as the CYP13A2-CYP13A8, CYP13A10, CYP14A3-CYP14A5, CYP25A3-CYP25A5 sequences. CYP33C2 is the only sequence still missing the C-terminal at the end of a contig, so there might be more CYP33C genes in the missing region.

What's New Mar 12, 1998

The C. elegans genome is now nearing completion. I have just done a comprehensive search of C. elegans for P450s and named all the new sequences. There are 80 genes for P450s in C. elegans. Some of them are in large clusters with 10-13 genes, some of which are probably in operons. A small number (6-7) are pseudogenes. Since the whole genome of C. elegans is predicted to have about 16,000 genes, these P450s represent an unusually large subset comprising about 0.5% of the total. Some P450 clusters are in clusters of olfactory receptor related genes. Perhaps the P450s are in some way acting in concert with the olfactory receptors. Here is an excerpt from the bibliographic page with these sequences.

There are 80 C. elegans P450s listed here, surpassing the known mouse and human complements. The C. elegans genome is now officially 77% complete. However, the amount of sequence in the Blast searchable database at Washington Univ. is 117Mb, more than the 100Mb size of the genome. Therefore, we can guess that this set includes all the P450 genes in C. elegans, but the distribution is not even. Most P450 genes (43 genes) are on chromosome V. For additional info on C. elegans P450s see this list. To see the actual sequences go to the C. elegans sequence file. So far we are missing CYP11A and CYP11B, CYP17, CYP19, CYP21, CYP24 and CYP27A and CYP27B. Does C. elegans make steroids? The present evidence would suggest not. Does C. elegans have mitochondrial P450s? There is one probable mitochondrial P450 in C. elegans on cosmid ZK177 named CYP44. This sequence is incomplete, missing part of the I-helix and some sequence upstream of that. It probably cannot code for a functional gene.

What's New Dec 19, 1997

I have searched for new cytochrome P450s in Arabidopsis and find 37 full length P450s that are new or that complete fragments from ESTs. There are also 38 new ESTs for Arabidopsis P450s. New Arabidopsis sequences

There is a new file with the protein sequences of public Arabidopsis P450s

FASTA Arabidopsis sequences

What's New Oct. 21, 1997

The crystal structure of CYP55A1 has appeared in Nature Structural Biology volume 4, number 10, October 1997 pp. 827-832. This enzyme is a nitric oxide reductase. The source is fungal (Fusarium oxysporum), though the protein is soluble and falls in phylogenetic trees in the 105 family. This probably represents a lateral gene transfer from a bacterial species like Streptomyces.

What's New Oct. 2, 1997

On naming P450s, problems and approaches

The four tables of the P450 database (plants, animals, bacteria and lower
eukaryotes) are under construction.  It may take a while to get the data entered
that will make these tables useful.  If you have additional suggestions, let me
know.

What's New July 1, 1997

The Arabidopsis data listed for Arabidopsis families and Arabidopsis ESTs has
been updated.

What's New June 25, 1997

I have revised the genes per species data listed here.

What's New May 22, 1997

Francis Durst has asked me to post an announcement and web link for the IVth
International Symposium on Cytochrome P450 Biodiversity and Biotechnology, to be
held in Strasbourg, France, July 12-17, 1998.  I wouldn't want to miss this one.

Jump to Strasbourg

What's New May 16, 1997

The new C. elegans sequences have been assigned names, with 17 members in the
CYP33 family and 9 members in the CYP35 family.

What's New April 9, 1997

I received permission to put the C. elegans translations on my webpage.  To see
the sequences, presented in the same order as the list below, click here.

What's New Mar. 25, 1997

*** Note: The CYP1B1 gene has been linked to primary congenital glaucoma ***
See the April 1997 issue of Human Molecular Genetics.

What's New Feb. 20, 1997

After much effort, I have pieced together 71 cytochrome P450 protein sequences
starting from C. elegans genomic sequences.  Some have been in the GenBank
database for some time.  Many others are unfinished cosmid sequences in the ftp
repositories at the Sanger Center and the Wash U. Genome Sequencing Center.
These latter sequences were downloaded and translated in three frames to
identify where the beginning and end of each sequence lie.  The region was fed
through the Baylor College of Medicine GeneFinder website set for nematode
sequences.  If this successfully assembled the gene, I stopped there.  If not,
then I did it the hard way by inspection of the three translated frames to find
the most probable protein sequence.  These sequences may not have the correct
intron-exon structures, but they should be pretty close.  I should mention that
the GeneFinder program was not very successful in constructing P450 genes.  I
had to do most of them by myself.  To see a list click here.  I will try to post
the actual sequences soon if I get permission from the sequencers.
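
As a rough illustration of the three-frame translation step described above,
here is a minimal Python sketch.  It is not the software or procedure actually
used for these sequences.  It assumes the standard genetic code, looks only at
the forward strand, and makes no attempt to find introns.  The function names
and the toy DNA string are hypothetical.

```python
# Illustrative sketch only: forward-strand, three-frame translation.
# Function names and the toy sequence below are hypothetical.

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_frame(dna, frame):
    """Translate dna from offset frame (0, 1 or 2); unknown codons become X."""
    dna = dna.upper()
    residues = []
    for start in range(frame, len(dna) - 2, 3):
        residues.append(CODON_TABLE.get(dna[start:start + 3], "X"))
    return "".join(residues)


def three_frame_translation(dna):
    """Return the translations of the three forward reading frames."""
    return [translate_frame(dna, frame) for frame in range(3)]


if __name__ == "__main__":
    toy_dna = "ATGTTTGGGTAA"  # made-up sequence, not a real P450
    for frame, protein in enumerate(three_frame_translation(toy_dna), start=1):
        print(f"frame {frame}: {protein}")
```

In a real cosmid the coding sequence is split by introns, so the protein shifts
from one frame to another along the sequence.  Reading the three translations
side by side is what makes it possible to follow it by inspection.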

THIS REPRESENTS THE LARGEST COLLECTION OF P450S FROM A SINGLE ORGANISM.  The
genome of C. elegans is about 67% complete, so there might be as many as 100
P450 genes in C. elegans, though there are a lot on chromosome V for some
reason.  This may skew the results, so there might be far fewer than 100.  The
odd distribution is noticeable on chromosome I also.  There is only one P450 on
chromosome I so far.
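
The estimate of roughly 100 genes follows from a simple proportion, sketched
below in Python.  The sketch assumes that P450 genes are spread evenly over the
sequenced and unsequenced parts of the genome, which, as noted above, is
probably not true given the excess on chromosome V.  The function name is
hypothetical.

```python
# Illustrative sketch only: naive extrapolation of a gene count
# from a partially sequenced genome, assuming an even distribution.

def extrapolate_gene_count(genes_found, fraction_sequenced):
    """Scale the observed gene count up to the whole genome."""
    if not 0 < fraction_sequenced <= 1:
        raise ValueError("fraction_sequenced must be in the range (0, 1]")
    return genes_found / fraction_sequenced


if __name__ == "__main__":
    # 71 P450s found with about 67% of the genome sequenced.
    estimate = extrapolate_gene_count(71, 0.67)
    print(f"naive estimate: about {estimate:.0f} P450 genes")
```

If the sequenced portion is enriched for P450 genes, this number is an
overestimate, which is why the true total could be well below 100.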

What's New July 24, 1996

I have started assembling a Rogue's Gallery (and email directory) of P450
researchers.  As of August 28, 1996 there are 231 entries (and 47 pictures).

If you dare to be seen in such a collection, send me your picture and any
comment you might want to appear with it.  You can compress a .gif file with
BinHex and email it to me at:

dnelson@utmem1.utmem.edu

or you can send a photo in the mail to:

Dept. of Biochemistry
University of Tennessee
858 Madison Ave.
Memphis, TN 38163

To go to the Rogue's Gallery click here.

I would advise you to turn off the autoload images option on your browser,
otherwise all the images will be downloaded automatically.

My research interests

Biosketch

For Mitochondrial carrier info click here

Cytochrome P450 alignments

What's New January 28, 1997

In looking for new bacterial P450s I found 19 P450s that are new to the
alignment.  Mycobacterium tuberculosis has at least eleven P450 genes, a
bacterial record.  One shows some similarity to CYP51 (34% identity) and one
must wonder if it has a similar function.
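
For readers unfamiliar with how a figure like 34% identity is obtained, here is
a minimal Python sketch.  It assumes the two sequences have already been
aligned (gaps written as "-") and counts identical residues over the columns
where neither sequence has a gap.  Other conventions for the denominator exist,
and this is not necessarily the one used for the number quoted above.  The
function name and the toy sequences are hypothetical.

```python
# Illustrative sketch only: percent identity between two aligned sequences.
# Columns containing a gap in either sequence are skipped.

def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return percent identical residues over ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Made-up aligned fragments, not real P450 sequences.
    print(percent_identity("ACDE-G", "ACDFHG"))
```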

What's New July 10, 1996

I have just analyzed 163 Arabidopsis expressed sequence tags found in the
ESTdb.  These have all been translated and aligned with the other plant P450s in
a new plant-only alignment of 125 public sequences.  39 confidential sequences
have been removed from a comprehensive 165 plant sequence alignment.  Some
interesting features are noted.  A CYP72 sequence has been assembled from
Arabidopsis ESTs.  Two EST fragments are CYP51 sequences for lanosterol
14-alpha-demethylase; one is from Zea mays.