Drosophila P450 Links

What's New Jan. 27, 2000

All Drosophila melanogaster P450 protein sequences have been posted as a FASTA file (All Drosophila melanogaster P450s). These sequences are all in GenBank; no confidential sequences remain. There are 86 P450 genes and 4 pseudogenes. CYP51 is absent. Since CYP51 is also absent in C. elegans, this important eukaryotic sterol biosynthetic gene may have been lost in the common ancestor of flies and nematode worms.

What's New Jan. 19, 2000

I have posted a 4 family tree with 89 sequences, including the new Drosophila sequences in the 4 family and in the 18 clan (New 4 Family tree). A second tree covering the remaining sequences, including the 6, 9, 12 and 28 families, is also posted (New 6 and 9 Family tree).

What's New Jan. 2, 2000

Two new sequence alignments covering the half of each protein from the I-helix to the end are posted. The first alignment covers 73 sequences (mainly insect sequences) in a 4 family alignment. The second alignment has 80 sequences (mainly from the 6, 9, 12 and 28 families). Many new Drosophila sequences are included in both alignments. See second alignment. These alignments will form the basis for naming the new Drosophila P450s, which will be done by Jan. 15, 2000. Once the trees for these alignments have been debugged and polished, they will be posted with the new nomenclature for the Drosophila sequences. The trees will contain about 31-32 confidential sequences, but these will not be in the alignments.

Nov. 29, 1999

We are now in the final stages of completion of the fly genome. In the next month, nearly all P450s in Drosophila should be identified and posted to this site. Currently 80 N-terminal sequences have been identified, of which 57 are from complete sequences. The remaining 23 partial sequences should be filled in soon as the Drosophila data is deposited from Celera. I will hold off naming them until the sequence data is complete. For earlier updates see the What's New section of the main page.

A press release of July 28 from Celera stated that one million sequences (500 million bp of sequence) have been completed from the Drosophila genome. Below is a quote from the press release.

"Celera expects to complete the random sequencing phase of Drosophila in early September when it will begin sequencing the human genome. This will entail completing another 2 million sequences-or about 1 billion letters of genetic code. Working with the Berkeley Drosophila Genome Project (BDGP), Celera will then fill gaps and resolve ambiguities in the sequence to produce finished sequence. Celera will begin making sequence data available to the public in October 1999, and anticipates release of the completed sequence by the end of the year and publication in collaboration with the BDGP in early 2000."
Note July 9: The Rubin sequencing effort continues to deposit more sequence, with over 1700 in June and at least 26 so far in July. These will be searched for new P450s.

Note June 16: The 4 family has been partially displayed in a tree including subfamilies 4A-4P, with many mammal sequences included. The 47 family is probably misnamed. Also, 4E4 should be a separate subfamily. Another tree has been prepared that reduces the number of very similar sequences in order to include more CYP4 subfamilies, up to Cyp4aa1 (formerly Cyp47). See the tree above with 61 CYP4 sequences.

Note: June 14, 1999 The tree with 56 insect P450s includes many new Drosophila sequences. Some are not yet named. This tree is based on an alignment that covers the I-helix to the ends of the sequences, since many are missing the N-terminal. The 4 family sequences are not included here. There are too many to fit; they will be treated in a separate tree.

Note: June 11, 1999 The Drosophila P450s have been found in GenBank by systematic BLAST searches of the nr, month, other ESTs, gss and htgs sections, using different P450 family representatives. The first search with Cyp4d2 yielded 101 new ESTs, 6 new sequences from month, one from htgs and none from gss or nr. The second search with Cyp6d2 found only 17 new ESTs and one sequence from month. The third search hit only 5 new ESTs and one sequence from nr. At this point the search was halted, since the returns were not worth the effort of scanning the output for new sequences. Some of the new sequences are very different from other P450s (AC005130) and cannot be easily assembled into a complete sequence by comparison with known P450s. I have identified exon-containing ORFs from this gene, but I cannot detect the exon boundaries. If you are brave, have a try at it. The new sequences (almost 300 total in the original FASTA file) have been compared with each other by repetitive Do-It-Yourself WU-BLASTs and condensed onto 98 contigs.
Ten of these are from other Drosophila species (4d10, 4e5, 6a9, 9b3, 9f1, 13b1, 28a1, 28a2, 28a3, 28a4); 88 are from D. melanogaster. Based on the 80 P450 genes of C. elegans, these 88 genes and gene fragments may represent nearly all the P450s from Drosophila, though some are probably N- and C-terminals of the same gene, and the number of contigs will drop as the genome is completed.

Note: On May 28, 1999, 28,049 Drosophila genome survey sequences were deposited from Genoscope in France. These are BAC end sequences. The percent of the Drosophila genome sequenced, as reported at the MOT tables, jumped from 15% to 24%. I have not had a chance to search these for P450 hits, but there should be a number of new P450s in this large sequence collection, which covers 9% of the Drosophila genome. A preliminary BLAST search with 6a2 as the query found 37 bona fide P450 hits in the genome survey sequences in the month section of GenBank. These probably represent 25 different genes. There are probably more than this, but I will have to search with other families like 9 and 28 to find them. These sequences have now been translated and added to the FASTA file above.

June 10, 1999. More extensive searches of the nr, est, htgs, gss, and month sections of GenBank have identified 235 ESTs, 44 genome survey sequences, 30 AC00XXXX genomic P1 clones and 41 other sequences, for a total of 350 accession numbers for Drosophila P450s. These have all been translated and are being assembled into contigs. (See the FASTA file.)

Sequence                                              Score  E value  Closest match or comment
|AC007549 Drosophila melanogaster chromosome 2 clo...  1012  0.0      Cyp6a2
emb|AL054861.1|CNS00A30 Drosophila melanogaster         182  3e-77    cyp6a9
emb|AL053264.1|CNS0098O Drosophila melanogaster         272  4e-73    cyp6a9
emb|AL072094.1|CNS00GEP Drosophila melanogaster         178  1e-72    cyp6a9
emb|AL055555.1|CNS004XH Drosophila melanogaster         171  5e-50    cyp6a9
emb|AL070586.1|CNS00DGA Drosophila melanogaster         108  1e-38    cyp6a9
emb|AL054261.1|CNS004MS Drosophila melanogaster         105  1e-22    cyp6a9
emb|AL069964.1|CNS00DFU Drosophila                      136  6e-32    72% identical to 6a9
emb|AL054065.1|CNS004PR Drosophila melanogaster         222  2e-62    cyp6a8
emb|AL063862.1|CNS00350 Drosophila melanogaster         123  5e-28    cyp9c1
emb|AL076220.1|CNS00JFP Drosophila melanogaster          77  5e-23    cyp9c1
gb|AC007581.2|AC007581 Drosophila melanogaster           89  7e-18    cyp9c1
gb|AC007291.10|AC007291 Drosophila melanogaster          57  9e-14    cyp4e3
emb|AL076873.1|CNS00JXU Drosophila                      191  1e-59    exact match with AA951440
emb|AL076863.1|CNS00JXK Drosophila                      173  2e-79    exact match with AA951440
emb|AL052842.1|CNS000F5 Drosophila                      215  8e-56    exact match with AA699131
emb|AL074108.1|CNS00HVU Drosophila                      171  7e-48    exact match with AA699131
emb|AL078165.1|CNS00KMI Drosophila                      196  4e-50    exact match with Dm3472
emb|AL069773.1|CNS00ERU Drosophila                       87  4e-17    exact match to Dm3472
emb|AL065891.1|CNS006T4 Drosophila                      126  5e-29    exact match with AA141600
emb|AL058810.1|CNS0017H Drosophila                       68  3e-11    exact match with Dm0590
emb|AL055637.1|CNS00ALR Drosophila                       62  1e-09    exact match with AL058497
emb|AL058497.1|CNS00BYD Drosophila                       40  0.006    exact match to AL055637
emb|AL070449.1|CNS00FAM                                  72  1e-12    exact match with composite sequence CK01076
emb|AL059533.1|CNS005I8                                  62  2e-09    exact match with composite sequence CK01076
emb|AL061295.1|CNS001S5 Drosophila                       59  1e-08    exact match to L46858
emb|AL061650.1|CNS00613 Drosophila                       74  3e-13    60% identical to L46858
emb|AL065705.1|CNS006L5 Drosophila                       58  2e-08    exact match to AA698945
emb|AL059237.1|CNS00CG4                                  58  2e-08    exact match to AC005811, AL062712, AL068269, AL075733
emb|AL062712.1|CNS002HH                                  54  3e-07    exact match to AC005811, AL059237, AL068269, AL075733
emb|AL068269.1|CNS00LIR                                  70  3e-18    exact match to AC005811, AL062712, AL059237, AL075733
emb|AL075733.1|CNS00J4Z                                  51  2e-06    exact match to AC005811, AL062712, AL068269, AL059237
emb|AL054245.1|CNS009UB Drosophila                       46  2e-12    exact match to AL062352
emb|AL062352.1|CNS002D3 Drosophila                       89  4e-29    exact match to AL054245
emb|AL057969.1|CNS00BXP Drosophila                       61  3e-09    exact match to AL067059
emb|AL067059.1|CNS007EC Drosophila                       56  9e-08    exact match to AL057969
emb|AL057750.1|CNS00162                                 136  5e-58    65% identical to AL062684
emb|AL062684.1|CNS002GP                                 155  7e-38    65% identical to AL057750
gb|AC007356.6|AC007356 Drosophila                        71  2e-12    probable mitochondrial clan sequence
gb|AC005472.9|AC005472                                  114  2e-25    66% identical to AA567377
gb|AC007571.2|AC007571 Drosophila                        82  1e-15    probable new family
emb|AL072844.1|CNS00H2C Drosophila                       80  5e-15    42% identical to 6a5
emb|AL070820.1|CNS00FNQ Drosophila                       40  0.006    40% identical to CYP28A1
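
Each hit in the list above has the same shape: a sequence identifier, an optional description, a bit score, an E value, and a trailing comment naming the closest known P450. For anyone who wants to sort or filter these hits, the following is a minimal Python sketch. It assumes each hit sits on its own line, and the names Hit and parse_hit are hypothetical, introduced only for this illustration; it is not the procedure used to build the list.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """One hit line: identifier, description, bit score, E value, comment."""
    seq_id: str
    description: str
    score: int
    evalue: float
    note: str


def parse_hit(line: str) -> Optional[Hit]:
    """Split one hit line into fields.

    Assumption: the score is the first whitespace-separated token made only
    of digits that is immediately followed by a token parseable as a float
    (the E value). Everything between the identifier and the score is the
    description; everything after the E value is the comment.
    """
    tokens = line.split()
    for i in range(1, len(tokens) - 1):
        if not tokens[i].isdigit():
            continue
        try:
            evalue = float(tokens[i + 1])
        except ValueError:
            # e.g. the "2" in "chromosome 2 clo..." is not a score
            continue
        return Hit(
            seq_id=tokens[0],
            description=" ".join(tokens[1:i]),
            score=int(tokens[i]),
            evalue=evalue,
            note=" ".join(tokens[i + 2:]),
        )
    return None  # no score / E value pair found on this line


if __name__ == "__main__":
    # Three lines copied from the hit list above.
    lines = [
        "emb|AL054861.1|CNS00A30 Drosophila melanogaster 182 3e-77 cyp6a9",
        "emb|AL070449.1|CNS00FAM 72 1e-12 exact match with composite sequence CK01076",
        "emb|AL070820.1|CNS00FNQ Drosophila 40 0.006 40% identical to CYP28A1",
    ]
    hits = [h for h in map(parse_hit, lines) if h is not None]
    # Smallest E value first, i.e. most significant hit first.
    for h in sorted(hits, key=lambda h: h.evalue):
        print(h.seq_id, h.score, h.evalue, h.note)
```

The token-based scan is used instead of fixed column positions because the description field varies in length and is sometimes missing altogether.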