26 Incomplete Drosophila melanogaster P450 fragments Jan. 2, 2000

10 N-terminal sequences from incomplete P450s

AC010578 21500-20681 AA696094, AA803578, AA202305, AI546241
45% identical to 6A1 54% to AC009844 comp(64943-65869) 6A20 61% to 6A20 (small region)
This might be the N-terminal of 6a19 
MAILLGLVVGVLTLVAWWVLQNYTYWKRRGIPHDPPNIPLGNTGELWRTMP
LAGILKRTYLKFRKQTDGPFAGFYLYAMKYIVITDVDFVKTVLIRDFDKF
HDRGVYHNEKDDPLTNNLATIEGQKWKNLRQKLTHTFTSAKMKSMFSTVL
NVGDEMIRVVDEKISSSSQTLEVTDIVSRFTSDVIGICAFGLKCNSLRDP
KAEFVQMGYSALRERRHGWLVDLLIFGMPKLAGELGFQFLLPSVQKFYMK
IVQDTIDYRMKRKVTRNDFMDTLIDMKQQYDKGDKENGLAFNEVAAQAFV
FFLAG 20681

AI063421 GH03219 AI532649 SD04231, 63% identical to AA696094 see above
probably a 6a seq
LIVLLIGVITFVAWYVHQHFNYWKRRGIPHDEPKIPYGNTSELMKTVHFA
DIFKRTYNKLRNKTDGPFVGFYMYFKRMVVVTDIDFAKTVLIREFDKFHD
RGVFHNERDDPLSANLVNIDGQKWKTLRQKLTPTFTSGKMKTMFPTILTV
GDELIRVFGETASADSDSMEITNVVARFTADVIGSCAFGLDCHSLSDPKAKF

AA951440 LD31895, AA816508 LD01943, AA201305 LD04267 AC010578
46% to 6a16 50% to 6a9 probably a 6a seq
MLDVVALLLIALAVGFWFVRTRYSYWTRRGIGSEPARFPVGNMEGFRKNKHFI
DIVTPIYEKFKGNGAPFAGFFMMLRPVVLVTDLELAKQILIQDFANFEDR
GMYHNERDDPLTGHLFRIDGPKWRPLRQKMSPTFTSAKMKYMFPTVCEVG
EELTQVCGELADNAMCGILEIGDLMARYTSDVIGRCAFGVECNGLRNPEA
EFAIMGRRAFSERRHCKLVDGFIESFPEVARFLRMRQIHQDITDFYVGIV
RETVKQREEQGIVRSDFMNLLIEMKQRGELTIEEMAAQAFIFFAAGFDTS
ASTLGFALYELAKQPD

AL069964 Drosophila 72% identical to 6A9 This will be a 6A sequence
MSVGTVLLTALLALVGYLLMKWRSTMRHWQDLGIPCEEPHILMG
SMKGVRTARSFNEIWTSYYNKFRGSGPFAGFYWFRRPAVFVLEK
SLGKQILIKEFNKFTDRGFFHNPEDDPLSGQLFLLDGQKWRTMR
NSTSSTFTSGKMKY

AC009844 44542-44881 88% identical to 6A17 this will be a 6a seq.
MLLLALIVVILSLLVFAARRRHGYWQRRGIPHDVPHPIYGNIKDWPKKRHIAMIFRDYYFK
YKRSVYPFAGFYFFFTRSAVITDLELVKRVLIKDFNHFENRGIFYNEIDDPLS

AC009844 65863-64467 45% to 6a2 same as AL097801, AL057750
Some frameshifts after WKRRGI and missing C-terminal tentative name = Cyp6a20
MAVMIVLLIGVITFLAWYVHQHFNYWKRRGI
FPR*APKLPTVIPAYLMKTRPFCGYFSRDPPTKLRTKPAGPFVGFLYVFQEDW*L*PNIDSAKPE
LIREFDKFPVGGVFHNERED
PLSATLVNIDGQKWKPLRQKLTPTFTSGKMKTMFPTILTVGDELIRVFGETASADSDSME
ITNVVARFTADVIGSCAFGLDCHSLSDPKAKFVQMGTTAITERRHGKSMDLLLFGAPELA
AKLRMKATVQEVEDFYMNIIRDTVDYRVKNNVKRHDFVDMLIEMKLKFDNGDKENGLTFN
EIAAQAFIFFLAGFETSSTTMGFALYELACHQDIQDKLRTEINTVLKQHNGKLDYDSMRE
MTYLEKVITETMRKRPVVGHLIRVATQHYQHTNPKYNIEKGTGVIVPTLAIHHDPEFYPE
PEKFIPERFDEDQVQQRRACTFLPFGDGPRNCIGLRFGRMQVIVGMALLIHNFKFEITPP
NFFVKYR

AL078186 most like Drosophila AC014810 Cyp307A1 51% probable 307A2
41% to Pleuronectes platessa CYP1B 43% to M13454 Chicken PB inducible gene CYP2H
MLTSVFYVLFAIAITIILISYVFLLLKCKQKAFVVIGLLYQEKKY
QCFDQAPGPHPWPIIGNINLLGRFQYNPFYGFGTLTKKYGDIYSLSLGHT
RCIVVNNVDLIKEVLNKNGKYFGGRPDFFRYHKLFGGDRNNCKFIXXLRF

Query: 1     MLTSVFYVLFAIAITIILISYVFLLLKCKQKAFVVI----GLLYQEKKYQCFDQAPGPHP 56
             ML ++ Y + AI ++++  SY+ ++   K++    +            YQ + QAPGP P
Sbjct: 54284 MLAALIYTILAILLSVLATSYICIIYGVKRRVLQPVKTKNSTEINHNAYQKYTQAPGPRP 54105

Query: 57    WPIIGNINLLGRFQYNPFYGFGTLTKKYGDIYSLSLGHTRCIVVNNVDLIKEVLNKNGKY 116
             WPIIGN++LL R++ +PF GF  L ++YGDIYSL+ GHTRC+VVNN++LI+EVLN+NGK 
Sbjct: 54104 WPIIGNLHLLDRYRDSPFAGFTALAQQYGDIYSLTFGHTRCLVVNNLELIREVLNQNGKV 53925

Query: 117   FGGRPDFFRYHKLFGGDRNN 136
               GRPDF RYHKLFGG+R+N
Sbjct: 53924 MSGRPDFIRYHKLFGGERSN 53865

three frames 
FVVIGLLYQEKKYQCFDQAPGPHPWPIIGNINLLGRFQYNPFYGFGTLTKKYGDIYSLSLGHT
RCIVVNNVDLIIEVINKNGKYFGGRQDFFDIQALWP*SRQLLVYYIY*D
FIYFFGKSISCIRQKSTIHL

LW*SDYYTKKKNINVLIRPRGHTLGQ*SEI*IYLDDFSITPFTVLAH*PKNMATFIPYLLGTL
AA*W*ITLI***KSLTKMENTLADVRIFSISKLFGRDRDNC*FIIYTKI
LFISLVRAYLALDKKARYI

CGNRTIIPRKKISMF*SGPGATPLANNRKYKFTWTISV*PLLRFWHIDQKIWRHLFPISWAHS
LHSGK*R*FNNRSH*QKWKILWRTSGFFRYPSSLAVIETTVSLLYILRF
YLFLW*EHILH*TKKHDTF

AL062684 Cyp6a19 might be the C-terminal end of AC010578 21500-20681   
LAGxxxxSTTMGFTLYELACNQDVQDKLRAEISVLERYNGKLEYDSMQDL
FYMEKVINESLRKHPVVAHLARIATKPYQHSNPKYFIEAGTGVLVSTLGIHHDPEFYPEP
EKFIPERFDEEQVKKRPTCAFLPFGAGPRNCIGLRFGRMQVIIGLALL
IHNFRFELHPKTPVPMKYTINNLLLGSEGGIHLNITKVVRD

AC018294 Drosophila melanogaster, *** SEQUENCING IN PROGRESS ***, in ordered
This sequence is complete but I cannot unambiguously identify the N-terminal
exon
36-529  44% to 12A2 probable mito seq.
1755 MYHTLMLNIGFICRMSTQK* 1814  CANNOT IDENTIFY THE N-TERMINAL
2205 ATAVNLEEAKPYADIPGPSKLQLIRAFLPGGK
2369 GLYKNLPVHEMFLDMNRQYGSIFRMPSVAGTDLVLTMNPQDYEVIFRNEGQYPYRRSFE 2545
2546 VMDYFKRVHRREVFDGYDGLTSGNGPAWGKMRTAVNPILLQPRN 2725
2726 AKLYMTNLVQVSDEFLER* 2776
2838 IRIIRDPVTQEMPDDFAVDIRHLVIESICSVALNTHLGLLGEQRNNKDIQKLVLALQDVV 3017
3018 ELGFQLDIMPAFWKYLPMPNFKKLMRSLDTITDFCYFHIGNALKRIEEDAKAGTLNEIGL 3197
3198 ETSLLEKLARFDRQTAVIIAMDLLFAGADPVSLTLGGILFS 3377
3378 LSKSPDKQARLLEEIRGILPNKDSSLTIENMRNLPYLRACIKEGIRMYPIGPGTLRRMPH 3557
3558 DVVLSGYRVVAGTDVGIAANYQMANMEQFVPKVREFIPERWLRDESNSHLVGE 3716
3717 TATPFMYLPFGFGPRSCAGKRIVDMMLEIAISRLVRNFKIGFDYPIENAFKAQFFVQPNI 3896
3897 PFKFKFIERNE* 3932

AL063519 T7 end of BAC BACR07K20 STS G01307 is exact match
AC006496 AC012699
MAVILLLALALVLGCYCALHRHKLADIYLRPLLKNTLLEDFYHAELIQPEAPKRRRRGI
WDIPGPKRIPFLGTKWIFLLFFRRYKMTKLHE
YGDIVLEVMPSNVPIVHLYNRDDLEKVLKYPSKYPFRPPTEIIVMYRQSRPDR
YASVGIVNEQGPMWQRLRSSLTSSITSPRVLQNFLPALNAVCDDFIELLRARRDPDTLV
VPNFEELANLMGLEAVCTLMLGRRMGFLAIDTKQPQKISQLAAAVKQLFISQRDSYYGLG
LWKYFPTKTYRDFARAEDLIYE
Orf with I helix
SQSSVISEIIDHELEELKKSAACEDDEAAGLRSIFLNI
LELKDLDIRDKKSAIIDFIAAGIET
Orf with EXXR motif
LANTLLFVLSSVTGDPGAMPRILSEFCEYRDTNILQDALTNA
TYTKACIQESYRLRPTAFCLARILEEDMELSGYSLNAG
Orf with PERW and heme motifs
TVVLCQNMIACHKDSNFQGAKQFTPERWIDPATENFTVNVDNASIVV
PFGVGRRSCPGKRFVEMEVVLLLAK
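
The ORF labels above (EXXR, PERW, heme) can be checked mechanically. The following is a minimal Python sketch, not the procedure used to compile these notes. The regular expressions are assumptions: E..R for EXXR, the literal PERW, and F..G.R.C.G as an approximation of the heme-binding signature. The dictionary and function names are hypothetical.

```python
import re

# Assumed approximations of P450 motifs; "." stands for any residue.
MOTIFS = {
    "EXXR": r"E..R",
    "PERW": r"PERW",
    "heme": r"F..G.R.C.G",
}

def find_motifs(protein):
    """Return the matched substrings for each motif pattern."""
    return {name: [m.group() for m in re.finditer(pattern, protein)]
            for name, pattern in MOTIFS.items()}

# ORFs copied from the AL063519 entry above.
orf_exxr = ("LANTLLFVLSSVTGDPGAMPRILSEFCEYRDTNILQDALTNA"
            "TYTKACIQESYRLRPTAFCLARILEEDMELSGYSLNAG")
orf_heme = ("TVVLCQNMIACHKDSNFQGAKQFTPERWIDPATENFTVNVDNASIVV"
            "PFGVGRRSCPGKRFVEMEVVLLLAK")

exxr_hits = find_motifs(orf_exxr)
heme_hits = find_motifs(orf_heme)
print(exxr_hits["EXXR"])                    # ['ESYR']
print(heme_hits["PERW"], heme_hits["heme"])  # ['PERW'] ['FGVGRRSCPG']
```

A short pattern such as E..R matches by chance fairly often, so a hit only supports an assignment when it falls at a plausible distance from the other motifs.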

*VCGNTFPPKRTGTLPAPRT*SMSRYTEGWAPISIPI*LKSLSIQCDLRDHRS*AGGTQKVGCLRG*RGCWITKYLSE
YS
GAQGSGYQGQKVSDHRLYCRWHRNGGCLVVFQMEFMLNNF*FLVSQHFVVCTEFCYWRSRCYATNPK*ILRVSGHEYP
AG
CTNECHIHKGLYTGVLQTEAHSLLPGQNPGGGHGALGLLA*CRGKLKLSFRSLSLLFPSFWNLLFYILFLEINISFSY
Q*
FVSDL*TVVLCQNMIACHKDSNFQGAKQFTPERWIDPATENFTVNVDNASIVVPFGVGRRSCPGKRFVEMEVVLLLAK
VS
IIHKPQGSYF*DH*LKQPNLSLSDGPSL*CELCEATGNGVRVPAGTQNSTQSKTQRSGFLIMGALEN*SEKQCLPSGV
LY
QIQLK*LRTEPLSKINVKYA*VSTHICIHIYVYPLSLMSQHQQPSPVKQSKKS*TN*NKLTNLC*LDVK*IRMYIQTS
CF
*IYF*S*QITSSNCF*EFCKRKDCV*VLFFTILT

LGLWKYFPTKTYRDFARAEDLIYE*VHRRLGSHFYSHLT*IALNPV*SPRSSIMSWRNSKSRLPARMTRLLDYEVSF*
IF
WSSRIWISGTKSQRS*TLLPLA*KRWVFGSISNGIHVKQFLIFS*PTLCCLY*VLLLEIPVLCHES*VNSASIGTRIS
CR
MH*RMPHTQRPVYRSPTD*GPQPFAWPESWRRTWSSRATRLMQG*VKA*FSITLTFISKLLEFTILYFIFGNKY*FLL
SI
ICFRSLDCGAMSEYDSLPQG*QLPRGQAVYPRALD*SCHGEFHGERG*CQYCGALRRGSKIVSRKAFCGNGGGAAAS*
GE
YNP*APGIVFLGSLTKTTKFIFIRWS*PLM*AL*SHWKRSSSSCWHPKLHSV*DSAIGFSDNGCFRKLIRKTVPTLRR
FV
SNTTKIAKN*TIK*N*CKICLGFYTYLYTYICISFIPDEPASAA*SCEAIKKELNQLK*INKFMLTGCKVD*NVYTNV
MF
CCNYFLNLFLVVANN*QQLFLRVL*TERLCLSFVFHDFN

SRSVEILSHQNVQGLCPRRGLDL*VGTQKVGLPFLFPSNLNRSQSSVISEIIDHELEELKKSAACEDDEAAGLRSIFL
NI
LELKDLDIRDKKSAIIDFIAAGIETVGVW*YFKWNSC*TIFNF*LANTLLFVLSSVTGDPGAMPRILSEFCEYRDTNI
LQ
DALTNATYTKACIQESYRLRPTAFCLARILEEDMELSGYSLNAGVS*SLVFDHSHFYFQAFGIYYFIFYFWK*ILVSP
IN
NLFPIFRLWCYVRI**PATRIATSKGPSSLPQSVGLILPRRISR*TWIMPVLWCPSAWVEDRVQESVLWKWRWCCC*L
R*
V*SISPRDRISRITD*NNQIYLYQMVLAFDVSFVKPLETEFEFLLAPKTPLSLRLSDRVF**WVL*KTDPKNSAYPQA
FC
IKYN*NS*ELNH*VKLM*NMLRFLHISVYIYMYILYP**ASISSLVL*SNQKRAKPIKIN*QIYADWM*SRLECIYKR
HV
LL*LLLKFIFSRSK*LAAIVFESSVNGKTVFKFCFSRF*Q

AC008307 AC015141 AC007725 AC007648 AC012699 chromosome 3 clone BACR03D22 (D709)
Orf with N-terminal
MTEKRERPGPLRWLRHLLDQLLVRILSLSLFRSRCDPPPLQRFPATELPPAVAAKYVPIPRVKG
LPVVGTLVDLIAAGGATQ*

36181 caaatacgtg cccattccaa gggtgaaggg acttccggta gttgggacac ttgtggatct
36241 tatagccgcc ggcggagcca cgcagtaaat ccgtaaagtt taacctcgaa aagtcggaga

The intron boundary could be either at the V in GTLVDL or at the Q* CAGTAA
We assume the motif LPVVGTL will not be broken by an intron.
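
The donor-site reasoning above can be reproduced with a short script. This is a minimal Python sketch under stated assumptions, not the method used to produce these notes. It translates the genomic segment quoted above (from 36181) in the frame of the N-terminal ORF and lists every GT dinucleotide with its codon phase. The function names and the frame offset of 1 are assumptions made for illustration.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate complete codons only; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def gt_sites(dna, frame_offset):
    """Return (index, phase) per GT; phase 0 means the G is a first codon base."""
    dna = dna.lower()
    return [(i, (i - frame_offset) % 3)
            for i in range(len(dna) - 1) if dna[i:i + 2] == "gt"]

# Bases 36181-36270 as quoted above, spaces removed.
segment = ("caaatacgtgcccattccaagggtgaagggacttccggtagttgggacac"
           "ttgtggatcttatagccgccggcggagccacgcagtaaat")

protein = translate(segment[1:])  # assumed frame offset of 1
sites = gt_sites(segment, 1)
print(protein)  # KYVPIPRVKGLPVVGTLVDLIAAGGATQ*
print(sites)
```

In this output the GT at index 52 (phase 0) is the GTG codon for the V in GTLVDL, and the GT at index 84 (phase 2) bridges the CAG codon for Q and the TAA stop.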
Orf with QYG motif
FNPYILYLFSLHKYIDARHKQYGPIFRERLGGTQDAVFVSSANLMRGVFQHEGQYPQHPLPDAWTLYNQQHACQRGLF
FM*
918 tag tttaacccgt atatccttta tctgtttagc cttcataagt
961 acatcgatgc gaggcacaag cagtatggtc ccattttccg ggagcgattg ggcggtaccc

The distance between PVVGTL motif and QYG should be near 17-24 amino acids
There are two possible ways to get this value using the two GTs in the
N-terminal exon, either
MTEKRERPGPLRWLRHLLDQLLVRILSLSLFRSRCDPPPLQRFPATELPPAVAAKYVPIPRVKG
LPVVGTLFNPYILYLFSLHKYIDARHKQYGPIFRERLGGTQDAVFVSSANLMRGVFQHEGQYPQ
HPLPDAWTLYNQQHACQRGLFFM

Or
MTEKRERPGPLRWLRHLLDQLLVRILSLSLFRSRCDPPPLQRFPATELPPAVAAKYVPIPRVKG
LPVVGTLVDLIAAGGATHLHKYIDARHKQYGPIFRERLGGTQDAVFVSSANLMRGVFQHEG
QYPQHPLPDAWTLYNQQHACQRGLFFM
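
The spacing test applied to the two candidate joins can be written down directly. A minimal sketch in Python follows. The helper name `motif_gap` is hypothetical, and only the relevant portion of each candidate sequence is used.

```python
def motif_gap(protein, first="PVVGTL", second="QYG"):
    """Count residues between the end of `first` and the start of `second`."""
    start = protein.index(first) + len(first)
    return protein.index(second, start) - start

# Portions of the two candidate joins listed above.
candidate_1 = "KYVPIPRVKGLPVVGTLFNPYILYLFSLHKYIDARHKQYGPIFRERLGG"
candidate_2 = "KYVPIPRVKGLPVVGTLVDLIAAGGATHLHKYIDARHKQYGPIFRERLGG"

gap_1 = motif_gap(candidate_1)
gap_2 = motif_gap(candidate_2)
print(gap_1, gap_2)  # 20 21
```

Both gaps fall inside the expected 17-24 residue window, so spacing alone does not choose between the two GT sites.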

cagtatggtc ccattttccg ggagcgattg ggcggtaccc
1021 aggatgcagt gttcgtatcg tccgcaaatc tcatgcgcgg agtcttccag cacgagggtc
1081 agtatccgca gcatccgctg ccggatgcct ggacgctgta taaccagcaa catgcc

Query: 64  GLPVVGTLVDLIAAGGATHLHKYIDARHKQYGPIFRERLGGTQDAVFVSSANLMRGVFQHE
           GLPV+G L+ +      +  H       KQYG +F  RL G+Q  V +S   ++R  F+ E
CYP18  485 GLPVIGYLLFM-----GSEKHTRFMELAKQYGSLFSTRL-GSQLTVVMSDYKMIRECFRRE

A possible site of the intron start follows FQHE at the G= GGT
A 2nd possible site is the GT that bridges the codons for QY in the HEGQYP
CAGTAT

Orf with C-helix
YIFAERIHSLTRISFICLAFISTSMRGTSSMVPFSGSDWAVPRMQCSYRPQISCAESSSTRVSIRSIRCRMPGRCITS
NM
PANGDCSSCKWSGPNESVELQMWSRRKRSLVSSSMGIGRAPSGCTTDAYLIDCCSTEI*

tgcgcgg agtcttccag cacgagggtc
1081 agtatccgca gcatccgctg ccggatgcct ggacgctgta taaccagcaa catgcctgcc
1141 aacggggact gttcttcatg taagtggagt ggaccaaatg agtccgttga attgcaaatg

a possible AG for intron end is at the S in WSGPN
The joined P450 seq would look like
MTEKRERPGPLRWLRHLLDQLLVRILSLSLFRSRCDPPPLQRFPATELPPAVAAKYVPIPRVKG
LPVVGTLVDLIAAGGAT*HLHKYIDARHKQYGPIFRERLGGTQDAVFVSSANLMRGVFQHEG*
HGPNESVELQMWSRRKRSLVSSSMGIGRAPSGCTTDAYLIDCCSTEI*

In CYP18 there are 54 amino acids from the QYG to the WXXXR motif in the
C-helix
Orf with I-helix
SPYKMISSQLYPLVVLCCIMFGTSVLTCPKIQSSLDYFTQIVHKVFEHSSRLMTFPPRLAQILRLPIWRDFEANVDEV
LR
EGAAIIDHCIRVQEDQRRPHDEALYHRLQAADVPGDMIKRIFVDLVIAAGDT*
Orf with K-helix
TAFSSQWALFALSKEPRLQQRLAKERATNDSRLMHGLIKESLRLYPVAPFIGRYLPQDAQLGGHFIEKD*
Orf with heme
TMVLLSLNTAGRDPSHFEQPERVLPERWCIGETEQVHKSHGSLPFAIGQRSCIGRRVALK 44986
QLHSLLGRCAAQFEMSCLNEMPVDSVLRMVTVPDRTLRLALRPRTE* 44845

A rough estimate of the sequence that will need correction and verification
from mRNA. This seq is a little short and may be missing an exon between the
C and I helices. Intron-exon boundaries are approximate.
MTEKRERPGPLRWLRHLLDQLLVRILSLSLFRSRCDPPPLQRFPATELPPAVAAKYVPIPRVKG
LPVVGTLVDLIAAGGAT*HLHKYIDARHKQYGPIFRERLGGTQDAVFVSSANLMRGVFQHEG*
HGPNESVELQMWSRRKRSLVSSSMGIGRAPSGCTTDAYLIDCCSTEI*
SPYKMISSQLYPLVVLCCIMFGTSVLTCPKIQSSLDYFTQIVHKVFEHSSRLMTFPPRLAQILRLPIWRDFEANVDEV
LR
EGAAIIDHCIRVQEDQRRPHDEALYHRLQAADVPGDMIKRIFVDLVIAAGDT*
TAFSSQWALFALSKEPRLQQRLAKERATNDSRLMHGLIKESLRLYPVAPFIGRYLPQDAQLGGHFIEKD*
TMVLLSLNTAGRDPSHFEQPERVLPERWCIGETEQVHKSHGSLPFAIGQRSCIGRRVALK
QLHSLLGRCAAQFEMSCLNEMPVDSVLRMVTVPDRTLRLALRPRTE*

Three frames
CPSQVADTMRSAPIRSDRRFGRRLSMLDGLKGPLRPAGRVSLPLAWPFVGTRFPERFTAESRCRWLPPAATVTYTDDR
EE
GAAGPAALAETPARPAPGANP*PEPLPFALRSAAFAAFSRNGTTACRRRQIRAHSKGEGTSGSWDTCGSYSRRRSHAV
NP
*SLTSKSRRESNRVENA*YILVSENSIKLAYAGLVQC*TIQP*LATF*KK*LSIGFSFKKVLNPTKLPKSAKRNRVFY
I*
NAS*DTKLY*NLTNRNI*IEISKLQRRKIC*NIQGRN*KTNHNLFDLREFNYNTYVLIYFRRKDP*FNPYILYLFSLH
KY
IDARHKQYGPIFRERLGGTQDAVFVSSANLMRGVFQHEGQYPQHPLPDAWTLYNQQHACQRGLFFM*VEWTK*VR*IA
NV
EPTKTISCLFFDGNREGAEWLHNRRILNRLLLNGNLNWMDVHIESCTRRMVDQWKRRTAEAAAIPLAESGEIRSYELP
LL
EQQLYRWSIEGTRRHALRADPHTK*FHPNSIHL*FCAASCLAPACSPAPRSSPRWTTSRRLCTRCLSIARD**HSRLA
WP
RFCACPSGGISRPMWMRCCVRELP*SITASECRRTKGDRTMRRFTIASRRRMCQAI*SSGYL*TWSLQQVTR*AIQLN
CK
AHY*L*LHLI*TAFSSQWALFALSKEPRLQQRLAKERATNDSRLMHGLIKESLRLYPVAPFIGRYLPQDAQLGGHFIE
KD
VSERNFFECLIAHVCLFSDHGAALLVHGRSRSITL*AAG

LPIAGSRYDAIRSDPIGSAIWSPALDARWVKRPAPAGGSSQSAPGLALCRDAFSRAFHSREQVQVVTPRCHRDVHR*P
RR
GSGRARCAG*DTCSTSSWCESLA*ASSVRAAIRRLCSVFPQRNYRLPSPPNTCPFQG*RDFR*LGHLWIL*PPAEPRS
KS
VKFNLEKSERKQSC*KCIIYSCVRKLYKIGLCRTSSMLNNSTLIGNLLKKVIINRFFF*ESA*PY*ITKKCKT**SFL
YL
KCFMRY*IILKSHKSKYLNRNIKAAKKKNMLKYTRTKLKN*SQSLRLKRVQLQYICINIFSPKGSIV*PVYPLSV*PS
*V
HRCEAQAVWSHFPGAIGRYPGCSVRIVRKSHARSLPARGSVSAASAAGCLDAV*PATCLPTGTVLHVSGVDQMSPLNC
KC
GADENDLLSLLRWE*GGRRVAAQPTHT*STAAQRKFELDGRAY*ELYQTNGGSVEKTHCGGGGDSASGEW*NTKLRTA
PV
GTTALPLVHRRYKTSCIKG*SPYKMISSQLYPLVVLCCIMFGTSVLTCPKIQSSLDYFTQIVHKVFEHSSRLMTFPPR
LA
QILRLPIWRDFEANVDEVLREGAAIIDHCIRVQEDQRRPHDEALYHRLQAADVPGDMIKRIFVDLVIAAGDTVSNPIE
LQ
SPLLIVVTPHLDRIQQSVGFVCPFKGAEAPATTGQGASYQ*FSPDARPDQGVPASVPRSSLHWPISAAGRATWRSLYR
KG
CE*KKFL*VPHRSCLPFFRPWCCSPCTRQVAIHHTLSSR

PAHRR*PIRCDPLRSDRIGDLVAGSRCSMG*KARSGRRVESVCPWPGPLSGRVFQSVSQPRAGAGGYPPLPP*RTPMT
EK
RERPGPLRWLRHLLDQLLVRILSLSLFRSRCDPPPLQRFPATELPPAVAAKYVPIPRVKGLPVVGTLVDLIAAGGATQ
*I
RKV*PRKVGEKAIVLKMHNIFLCPKTL*NWLMPD*FNVKQFNLNWQPFEKSDYQ*VFLLRKCLTLLNYQKVQNVIEFF
IS
KMLHEILNYIKISQIEIFKSKYQSCKEEKYVEIYKDETKKLITISST*ESSTTIHMY*YIFAERIHSLTRISFICLAF
IS
TSMRGTSSMVPFSGSDWAVPRMQCSYRPQISCAESSSTRVSIRSIRCRMPGRCITSNMPANGDCSSCKWSGPNESVEL
QM
WSRRKRSLVSSSMGIGRAPSGCTTDAYLIDCCSTEI*IGWTCILRAVPDEWWISGKDALRRRRRFR*RRVVKYEATNC
PC
WNNSSTVGP*KVQDVMH*GLIPIQNDFIPTLSTCSSVLHHVWHQRAHLPQDPVLAGLLHADCAQGV*A*LATDDIPAS
LG
PDFAPAHLAGFRGQCG*GAA*GSCHNRSLHQSAGGPKETAR*GALPSPPGGGCARRYDQADICRLGHCSR*HGEQSN*
IA
KPITNCSYTSFRPHSAVSGLCLPFQRSRGSSNDWPRSELPMILA*CTA*SRSPCVCTP*LPSLADICRRTRNLAVTLS
KR
M*VKEISLSASSLMFAFFQTMVLLSLYTAGRDPSHFEQPE

3 Sequence fragments that are probably the same as known sequences

AI402187 GH09810 94% identical to Cyp4d2 7 diffs probably is 4d2
PPVPMIGRWFAEDVEKRGKHIPAGTNLTMGIFVLPRDPEYFESPDEFRPE
RFDADVPQIHPYAYIPFSAGPRNCIGQKFAMLEMKSTVSKLLRQFELLPL
APEPRQLMNIVLRSANGVHLGLKPRA

AC008003 Drosophila melanogaster chromosome 2 clone BACR48D02 (D851)
matches AC008324 gene 2 after the C-helix exon only one diff with AA567719
EST AI517752 covers the intron joint and agrees with this genomic sequence
The AC008324 seq may be an alternate splice or not assembled correctly.
QKWHTRRKTLTPAFHFNILQSFLSIFK
EESKKFIKILDKNVGFELELNQIIPQFTLNNIC
ETALGVKLDDMSEGNEYRKAIHDFEIVFNQRMCNPLMFFNWYFFLFGDYKKYSRILRTIH
GFSSGIIQRKRQQFKQKQLGQVDEFGKKQRYAMLDTLLAAEAEGKIDHQGICDEVNTFMF
GGYDTTSTSLI

AC010578 Drosophila melanogaster chromosome 2 clone BACR03K23 (D1086) RPCI-98
7 diffs with 6a17 in first 29 aa then only one diff after that probably 6a17 seq.
EFLAQAIIFLGAGFETSSTTMGFGIYELGRNQDVQDKLREEIGNVFGKHNK
EFTYEGIKEMKYLEQVVMETLRKYPVLAHLTRMTDTDFSPEDPKYFIAKGT
IVVIPALGIHYDPDIYPEPEIFKPERFTDEEIAARPSCTWLPFGEGPRNCI
GLRFGMMQTCVGLAYLIRGYKFSVSPETQIPMKIVVKNILISAENGIHLKV
EKLAK*

12 different Internal fragments not connected to an N-terminal sequence in the
N-terminal alignment figure

Cyp9f2 AA735946 78% IDENTICAL TO Cyp9f1 This appears to be
derived from a full length version of the pseudogene sequence
DRENTFYQMGKKLTTFTFLQNMKFILLFALKSLN
KILKVEIFDRKSTQYFVRLVLDAMKYRQEHNIVRPDMINMLMEARGIIQT
EKTKASAVREWSDRSIVAQCFAFFFAGFETSAVLMCFTAHELMENQDVQQ
RLYEEVQQVDQDLEGKELTYEAIMGMKYLDQVVSEVLRKWPPAIAFDREC
NKDITFDVDGQKVEVKKGDVIWLPTCGFHRDPK

9f pseudogene AI113499 AL105104 AC017240 AC007594 comp(31455-32006)

NSFKDRENTFYQMGKKLTTFTFLQNMKFILLFALKSLNK
ILKVELFDRKSTQYFVRLVLDAMKYRQEHNIVRPDMINMLMEARGIIQTEKTKASAVRE
WSDRDIVAQCFVFFFAGFETSAVLMCFTAHELMENQDVQQRLYEEVQQVDQDLEGKELTY
EAIMGMKYLDQVVSEVLRKWPAAIAFDRECNK
EAKAVIYYLLKDYRFAPAKKSCIPLELISSGFQLSPKGGFWIKLVQR

AL062684 = AC009844 71301-70762
LAGXXXSTTMGFTLYELACNQDVQDKLRAEIXSVLERYNGKLEYDSMQDLFYME
KVINESLRKHPVVAHLARIATKPYQHSNPKYFIEAGTGVLVSTLGIHHDPEFY
PEPEKFIPERFDEEQVKKRPTCAFLPFGAGPRNCIGLRFGRMQVIIGLALLIHNFRF
ELHPKTPVPMKYTINNLLLGSEGGIHLNITKVVRD*

L46858, AL061295 (BACR004K24) 52% identical to cyp6a5
EFTYDSMQELRYMELVIAETLRKYPILPQLTRISRHLYAAKGDRHFYIEP
GQMLLIPVYGIHHDPALYPEPHKFIPERFLADQLAQRPTAAWLPFGDGPR
NCIGMRFGKMQTTIGLVSLLRNFHFSVCPRTDPKIEFLKSNILLCPAHGI
YLKVQQLSQMSS*

AL061650 Drosophila melanogaster genome survey sequence TET3 end of
TLRGXPLLPRLTRFSGLLYAARGVRLFXFGPGLLLLXPVYGIXXVPALXPXPHRFI
PERX 539
LAGRLAPRPAAAWLPFGVGPRXCVGMGFGRVPAAVGLVGLLRIFRFGVCPRPGP
GVAFLR 359
SPFLLCPAXGFCLGVPRL 305

AC009844 28950-28363 and 42575-43341 same sequence
VLEIVDLVARYTPDVIGNCAFGLNCNSLQNPNAEFVTIGKRAIIERRYGGLLDFLIFGFP 28771
KLSRRLRLKLNVQDVEDFYTSIVRNTIDYRLRTNEKRHDFMDSLIEMYEKEQAGNT 28603
EDGLSFNEILAQAFIFFVAGFETSSTTMGFALYELALDQDIQKHNNEF 28423
TYEGIKEMKYLEQVVMETLRKYPVLAHLTRMTQTDFSPEDPKYFIPKGTTGVIPALGIHYDPEIYPEP
GEVKPERLTDEAIAARPSCTWL

AC009844 35993-35161
IERFFMRIVRETVAFREQNNIRRNDFMDQLIDLKNKPLMVSQSGESVNLTIEEIAAQAF 35817
VFFAAGFETSSTTMGFALYELAQNQDIEKCNGELNYESMKDLVYLDQ 35637
ETLRLYTVLPVLNRECLEDYEVPGHPKYVIKKGMPVLIPCGAMHRD 35440
EKLYANPNTFNPDNFSPERVKERDSVEWLPFGDGPRNCIGMRFGQMQARIGLALLIKDFK 35260
FSVCEKTTIPMTYNKEMFLIASNSGIYLKAERV 35161

AC009844 82008-82526
KGLYCNQKSDPLSGDLYALRGESWKEMRQKLDPSLEGDRMSLLYDCLYEEAEQLLLTVNS 82187
TLMSQPHSTVHIQKIMRRYVLSSLAKCVFGLNAEQRKTYPLEDFEQMTELALNSHKHGYL 82367
MNLMMIRVPNFCRMLRMRRTPKQAEEYFIKLLTSIVEQRETSGKPQKDYLQLL 82526

AC009844 694-846
WFGFGVGARSCIGIQFAQLQLRLALALLLSEYEFSLNTRKPLINLEDGIAL

AL067521 similar to CYP18 I-helix to K-helix region
LLWXSVFHCXIRGCCARVXXALARVVGRLRLPXFXXXXXLPLPASPFLASXRRSXXVPLAPTPSPXR 515

AC012164 293-399 27006-27380 same gene as next fragment
RGLGKEELAGHATTLLLEGYETSAMLLAFALYELALNEDAQRHAGNLI 27185
DPGALGELRYSEAALLEALRLHPAMQALQKRCTKTFTLPDQKSGASSELKVHLGTELVLP 27365
VHAIH 27380
DSALYPAPNQFRPERF 27492

TSGCLRDLRSGCTLPGDSHEGGS*TKSLAGMARSALSAKRLEFAGDHVTVAHASIG*THWPPVGNVELANV*PLIHYP
LI
QIRTPAPAALVQGIGGSPKWWRQPLAMAGGKQKRLGKGGASWTCHHSPTGGLRNLGDATRLRPLRIGPQ*GCTATVTH
RI
G*SGPAPRWQSD*SRGSG*TSL*RSCALGGTASSSGHAGSAEALHQDIHPS*SEIRSEQRT*GAFGHRAGVAGSCHSF
VS
DTSVRLQIVYELQSFSISEIPHCIRHPTSFVQNAL*IIHQ

YKWLPPRSSVWMHIAWGFT*GWLMNQVAGWNGSLRSFSQAFGVCWRPCHCCTRLDWVDSLATGW*CGTRQCVTFDPLS
TY
TDTYPCPCSTGSGNWWKPEVVATTSCNGWRKAKEAWERRS*LDMPPLSYWRATKPRRCYSPSPSTNWPSMRMHSDGYT
SN
WMKWPSATLAI*LIQGLWVNFAIAKLRSWRHCVFIRPCRLCRSAAPRHSPFLIRNQERAANLRCIWAPSWCCRFMPFI
RE
*HFSETSDCLRTTVFFDFRDSALYPAPNQFRPERFVNHPPM

LQVVASAIFGLDAHCLGIHMRVAHEPSRWLEWLAPLFQPSVWSLLETMSLLHTPRLGRLIGHRLVMWNSPMCNL*STI
HL
YRYVPLPLQHWFRELVEARSGGDNLLQWLAESKRGLGKEELAGHATTLLLEGYETSAMLLAFALYELALNEDAQRRLH
IE
LDEVAQRHAGNLIDPGALGELRYSEAALLEALRLHPAMQALQKRCTKTFTLPDQKSGASSELKVHLGTELVLPVHAIH
S*
VTLQ*DFRLSTNYSLFRFQRFRIVSGTQPVSSRTLCKSSTN

AC012164 477-537 109248-109066 not on AC015216 or AC012373
FLFFAPLKAWFGIGPPEGLPLARFKGKVGAPLPPGLVKVCPKGPPLGAPSPHSPPMGKVGA 109066

Exon with C-helix
SCTTAPSGAWRISTAGSLVNRSMCGVGVRVMPLSPAMSTSVPEGSSKWKRLRMRPVKSVSIMRAISSPGHIR

Three frames
FLFFAPLKAWFGIGPPEGLPLARFKGKVGAPLPPGLVKVCPKGPPLGAPSPHSPPMGKVGAQLCGPGALPSPVFFLPQ
WL
KAKKKKNI*KHKRKSNGGNGRPFPFWPPIASYRFSDKLVVIRPWEEHIRIPARAGLDKEQSSRKL*PTHRGLQPILFF
HP
QEAVFN*L*LNPS*YALVRFDYTANVNVFIVIKLYYYYR*C*DTAMSTASRPIKFS*LTFLFIRSISASLMSKQQTTK
QT
VV*SYKHTSK*ANDSKIKKNLWKKP*K*TGLYLFHNVF*YSSQIVIFAFEILLSQTTHY*NHIKLSRNSYSHLTFFLF
IQ
PF*NLILTFGTIIKLYI*NVF*LFIKYANGNAFKK*FAH*KKRIK*TTIH*KV*CNCKGYKKKSDTIRCTQDAQPNPA
PQ
LRRVHGASPPPAAW*IAACAEWGSG*CRSHPPCPPQCPRAVPSGSA*GCDP*RA*VSCEPSLRRDTFDSRTRMG*TVA
AP
GRLHRRSRSAPDGTPPVPGRPLGPCG*PTPSKRSSNHP**NNRPRRSSLCIRAECPVGQRCVCASSRKCKPLGRPAAP
VP
PKWETDPAAGGAHAGAAGSVVRGPSRAGTSAGWPARCPPRLQRDRPADGAAGRRSRAANPCLERLGSAPGCTPARTCP
PH
RSPRHPRLPHSAPGLDPLPLTA*AAPSLAAFAAASAPGAP*TPSASPSAAGHRPPQCGRRWRASWPCPPAG

FSVFCPPQGMVWHWPSGGVTPGPFQRESGGPPPPGVG*SLPQRPPPWGPFPPFPPHGQSWGPIMWAWGPPKPGFFSSP
MV
KSQKKKKYIKTQEKIEWW*WEAISILASHRELSI*R*VSSYPPVGGAHPDTG*SRAGQGAEQPEAVTNAQRTATNSIL
SP
PRSCVQLIVIKS*LIRPS*VRLYS*CQCFYSY*VILLLPLMLRYRYVYSQQAD*V*LANVLVYP*Y*C*LNVQTTNNK
TN
CGLVLQAHIKIGKRFENKEKFVEKTLKMNWAIFIS*CFLI*LANCYFCL*NFTKPNHSLLKSYKTF*KFLLSFDIFLI
HS
TILKSYTNIRHYN*TLHIECILIIY*ICKWKCLQKVICPLKKTH*IDHHSLKSIM*L*GIQKKIGYNKMHSGCTT*SC
TT
APSGAWRISTAGSLVNRSMCGVGVRVMPLSPAMSTSVPEGSSKWKRLRMRPVKSVSIMRAISSPGHIR*PDENGMNCG
GA
W*APSAVKKRSGRNSSGSGKATGSMWIAHSEQTIIEPPLIK*SPTTKFSLHPCGMPSGTTLRMRIVSEMQALR*ASGS
SS
SKVGDGPSGRRSSCRRRWQRCSRAK*SRNQRRVASEVSTPAPKRSASRWRSWSSQ*SCESVSRTARKCSRMHSGSYVS
SS
SLSSSSSSASLCSWLGSASFDCLSCSFSSCLCSRFCSWSSLNTFCFSFSRWPQASTMRS*MACVLALPSSRK

FFCFLPPSRHGLALALRRGYPWPVSKGKWGPPSPRGWLKFAPKAPPLGPLPPIPPPWAKLGPNYVGLGPSQARFFFFP
NG
*KPKKKKIYKNTRENRMVVMGGHFHSGLPSRAIDLAIS**LSARGRSTSGYRLEPGWTRSRAAGSCDQRTEDCNQFYS
FT
PKKLCSTNCN*ILVNTP*LGSIIQLMSMFL*LLSYIITTANAKIPLCLQPAGRLSLVS*RSCLSVVLVLA*CPNNKQQ
NK
LWFSLTSTHQNRQTIRK*RKICGKNLENELGYIYFIMFFNIARKLLFLPLKFY*AKPLTTKII*NFLEILTLI*HFSY
SF
NHFKILY*HSAL*LNFTYRMYFNYLLNMQMEMPSKSDLPIKKNALNRPPFTKKYNVIVRDTKKNRIQ*DALRMHNLIL
HH
SSVGCMAHLHRRQLGESQHVRSGGQGDAALTRHVHLSARGQFQVEAPEDATRKEREYHASHLFAGTHSIAGREWDELW
RR
LVGSIGGQEALRTELLRFREGHWVHVDSPLRANDHRTTLDKIIAHDEVLFASVRNAQWDNAAYAHRLGNASP*VGQRL
QF
LQSGRRTQRQEELMQAPLAALFAGQVEQEPAQGGQRGVHPGSKEIGQQMAQLVVAVELRIRVSNG*EVLQDALRLVRV
LL
IALLVILVCLTLLLAWIRFL*LPELLLL*LPLQPLLLLELLEHLLLLLQPLATGLHNAVVDGVRLGLALQQE

Same gene as AC015216 Drosophila melanogaster, *** SEQUENCING IN PROGRESS ***, 
Note there seem to be multiple genes on AC015216
 Score =  207 bits (521), Expect(2) = 9e-55
 Identities = 110/113 (97%), Positives = 110/113 (97%), Gaps = 12/113 (10%)
 Frame = +3

Query: 1     RGLGKEELAGHATTLLLEGYETSAMLLAFALYELALNEDAQR------------HAGNLI 48
             RGLGKEELAGHATTLLLEGYETSAMLLAFALYELALNEDAQR            HAGNLI
Sbjct: 73548 RGLGKEELAGHATTLLLEGYETSAMLLAFALYELALNEDAQRRLHIELDEVAQRHAGNLI 73727

Query: 49    DPGALGELRYSEAALLEALRLHPAMQALQKRCTKTFTLPDQKSGASSELKVHLGTELVLP 108
             DP ALGELRYSEAALLEALRLHPAMQALQKRCTKTFTLPDQKSGASSELKVHLGT LVLP
Sbjct: 73728 DPVALGELRYSEAALLEALRLHPAMQALQKRCTKTFTLPDQKSGASSELKVHLGTVLVLP 73907

Query: 109   VHAIH 113
             V AIH
Sbjct: 73908 VQAIH 73922

73861 cgagcagcga acttaaggtg catttgggca ccgtgctggt gttgccggtt caggccattc
    73921 atttgtgagt gacacttcag tgagacttca gattgtctac gaactacagt ctttttcgat
    73981 ttcagagatc ccgcattgta tccggcaccc aaccagttcg tccagaacgc tttctaaatc
    74041 agccaccaat gggctgtcgg tttctgggct tcggagctgg accacgaatg tgtccgggaa

AC015216 Assembled sequence 72624-74273 also = GSS AL098201
MLPLVLFILLAATLLFWKWQGNHWRRLGLEAPFGWPLVGNMLDFALGRRSYGEIYQEIYT*
RNPGLKYVGFYRLFNEPAILVRDQELLRQILVGRNFADCADNAVYVDHQRDVLASHNPFIANGDRWRVLRADLVP
LFTPSRVRQTLPHVARACQLLRDQVPLGRFEAKDLATRYTLQVVASAIFGLDAHCLGIHMRVAHEPSRWLEWLAPLFQ
PS
VWSLLETMSLLHTPRLGRLIGHR*YVPLPLQHWFRELVEARSGGDNLLQWLAESKRGLGKEELAGHATTLLLEGYETS
A
MLLAFALYELALNEDAQRRLHIELDEVAQRHAGNLI 
DPVALGELRYSEAALLEALRLHPAMQALQKRCTKTFTLPDQKSGASSELKVHLGTVLVLPVQAIH
LDPALYPAPNQFXPERFLNQPPMGCRFLGFGAGPRMCPGMRLGLLQTKAALTTLLQDHCVQLADEDQCRVEVSPLTFL
TASRNGIWLSFKRRTRRY*

PMLAVSNLA*LKHTRQLIDIQQLRG**GAIMESRNVVHSQLL*DIQLNWECKLSGIVPFKLFRCSSIMLPLVLFILLA
AT
LLFWKWQGNHWRRLGLEAPFGWPLVGNMLDFALGRRSYGEIYQEIYT*DTLILRVLQGHVQ*TFLDSRNPGLKYVGFY
RL
FNEPAILVRDQELLRQILVGRNFADCADNAVYVDHQRDVLASHNPFIANGDRWRVLRADLVPLFTPSRVRQTLPHVAR
AC
QLLRDQVPLGRFEAKDLATRYTLQVVASAIFGLDAHCLGIHMRVAHEPSRWLEWLAPLFQPSVWSLLETMSLLHTPRL
GR
LIGHRLVMWNSPMCNL*STIHLYRYVPLPLQHWFRELVEARSGGDNLLQWLAESKRGLGKEELAGHATTLLLEGYETS
A

ONLY GT NEAR THE PLVGNM IS AT IYT IN THE T* JOINT ACGTAA LEAVING AC
72661 cgctcctgtt ttggaaatgg cagggcaatc actggcgtcg cctgggactg gaggcacctt
72721 ttggctggcc attggttggg aatatgttgg actttgccct gggccgccgc tcatatggag
72781 agatttacca ggaaatctat acgtaagata cactaatctt aagagtcctt caaggccacg
ONLY AG NEAR THIS SITE IS AGT IN S CODON LEAVING A T BASE BEFORE RNPGL
72841 tgcaatgaac ttttcttgac agtcggaatc cgggcctgaa atatgtgggt ttctatcgcc
72901 tgtttaacga acccgccatt ctggtgcgtg accaggagtt gctgcgccag atcctggtgg
REGION NEAR STIH ONLY AG IS IN THE R CODON AGA AT RYVP LEAVING AN A BASE BEFORE
YVP
THIS MEANS THE GT MUST BE AT THE JOINT OF TWO CODONS
73381 gactcattgg ccaccggttg gtaatgtgga actcgccaat gtgtaacctt tgatccacta
73441 tccacttata cagatacgta cccctgcccc tgcagcactg gttcagggaa ttggtggaag
END REGION OF C HELIX EXON ONLY FIVE GT STRADDLE THE CODON BOUNDARIES
73201 gttatacgct acaagtggtt gcctccgcga tcttcggtct ggatgcacat tgcctgggga
73261 ttcacatgag ggtggctcat gaaccaagtc gctggctgga atggctcgct ccgctctttc
73321 agccaagcgt ttggagtttg ctggagacca tgtcactgtt gcacacgcct cgattgggta
73381 gactcattgg ccaccggttg gtaatgtgga actcgccaat gtgtaacctt tgatccacta
AIHL* has a GT at the L* overlap TTGT leaving TT after the AIH
73861 cgagcagcga acttaaggtg catttgggca ccgtgctggt gttgccggtt caggccattc
73921 atttgtgagt gacacttcag tgagacttca gattgtctac gaactacagt ctttttcgat
73981 ttcagagatc ccgcattgta tccggcaccc aaccagttcg tccagaacgc tttctaaatc
74041 agccaccaat gggctgtcgg tttctgggct tcggagctgg accacgaatg tgtccgggaa
The R in FRDP = AGA so a boundary between AIH and RDP would be AIHLDP


MOST PROBABLE EXON BOUNDARY IS EIYTRNPGL
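
The EIYT/RNPGL join can be checked by splicing the quoted genomic sequence and translating across the junction. The following is a minimal Python sketch under stated assumptions, not the original analysis. It uses the GT after the IYT codons as donor and the last AG before the RNPGL codons as acceptor. The variable and function names and the frame offset of 2 are assumptions made for illustration.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate complete codons only; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

genomic = (
    "agatttaccaggaaatctatacgtaagatacactaatcttaagagtccttcaaggccacg"  # 72781-72840
    "tgcaatgaacttttcttgacagtcggaatccgggcctgaaatatgtgggtttctatcgcc"  # 72841-72900
)

# Donor: first GT at or after the codons for IYT (atc tat ac|gt).
donor = genomic.find("gt", genomic.index("atctatac"))
# Acceptor: last AG upstream of the codons for RNPGL (ag|t cgg aat ...).
rnpgl_start = genomic.index("cggaatccgggcctg")
acceptor_end = genomic.rfind("ag", 0, rnpgl_start) + 2

spliced = genomic[:donor] + genomic[acceptor_end:]
protein = translate(spliced[2:])  # assumed frame offset of 2
print(protein)  # IYQEIYTRNPGLKYVGFYR
```

The exon ends with AC and the next exon begins with T, so the junction codon is ACT (T), which gives EIYT followed directly by RNPGL as stated above.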
PNASCLESCVIETYPSVD*YTAITWLMRRNHGITQRGSLSAALRYSTQLGMQVEWDCAFQAFQMLIDNAATGAIYPIG
RH
APVLEMAGQSLASPGTGGTFWLAIGWEYVGLCPGPPLIWRDLPGNLYVRYTNLKSPSRPRAMNFS*QSESGPEICGFL
SP
V*RTRHSGA*PGVAAPDPGGP*LCRLCGQRRLCGPPARCPG*PQSLHRQRRPLASLAGRPGAALHTQPSASDLAACGQ
SM
SVAPGSGAPWTLRGQGLGHALYATSGCLRDLRSGCTLPGDSHEGGS*TKSLAGMARSALSAKRLEFAGDHVTVAHASI
G*
THWPPVGNVELANV*PLIHYPLIQIRTPAPAALVQGIGGSPKWWRQPLAMAGGKQKRLGKGGASWTCHHSPTGGLRNL
G

EST AI062339 GH01510.5prime GH Drosophila melanogaster head pOT2 Drosophila
           melanogaster cDNA clone GH01510 5prime
           Length = 545
           
 Score = 25.1 bits (53), Expect = 9.7
 Identities = 10/17 (58%), Positives = 11/17 (63%)
 Frame = -1

Query: 15  KRLEFAGDHVTVAHASI 31
           K LEFAG HV V H  +
Sbjct: 320 KGLEFAGQHVAVLHGGL 270
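
The identity figure in this hit can be recomputed from the aligned strings. This is a minimal sketch in Python with a hypothetical function name. It assumes an ungapped alignment of equal length and uses integer division, which is an assumption chosen to reproduce the 58% shown above.

```python
def identity(query, subject):
    """Return (matches, alignment length) for two aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have equal length")
    matches = sum(q == s and q != "-" for q, s in zip(query, subject))
    return matches, len(query)

matches, length = identity("KRLEFAGDHVTVAHASI", "KGLEFAGQHVAVLHGGL")
percent = 100 * matches // length
print(matches, length, percent)  # 10 17 58
```

With only 17 aligned residues and an Expect of 9.7, this match is too weak to support an assignment on its own.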

AQC*LSRILRD*NIPVS*LIYSNYVANEAQSWNHATWFTLSCFEIFNSTGNAS*VGLCLSSFSDAHR*CCHWCYLSYW
PP
RSCFGNGRAITGVAWDWRHLLAGHWLGICWTLPWAAAHMERFTRKSIRKIH*S*ESFKATCNELFLTVGIRA*NMWVS
IA
CLTNPPFWCVTRSCCARSWWAVTLPTVRTTPFMWTTSAMSWLATIPSSPTETAGESCGPTWCRSSHPAECVRPCRMWP
EH
VSCSGIRCPLDASRPRTWPRVIRYKWLPPRSSVWMHIAWGFT*GWLMNQVAGWNGSLRSFSQAFGVCWRPCHCCTRLD
WV
DSLATGW*CGTRQCVTFDPLSTYTDTYPCPCSTGSGNWWKPEVVATTSCNGWRKAKEAWERRS*LDMPPLSYWRATKP
RR

Query: 114   DSALYPAPNQFRPERF 129
             D ALYPAPNQF    F
Sbjct: 73987 DPALYPAPNQFVQNAF 74034

C-terminal region three frames
GWPLVGNMLDFALGRRSYGEIYQEIYT*DTLILRVLQGHVQ*TFLDSRNPGLKYVGFYRLFNEPAILVRDQELLRQIL
VGRNFADCADNAVYVDHQRDVLASHNPFIANGDRWRVLRADLVPLFTPSRVRQTLPHVARACQLLRDQVPLGRFEAKD
LATRYTLQVVASAIFGLDAHCLGIHMRVAHEPSRWLEWLAPLFQPSVWSLLETMSLLHTPRLGRLIGHRLVMWNSPMC
NL*STIHLYRYVPLPLQHWFRELVEARSGGDNLLQWLAESKRGLGKEELAGHATTLLLEGYETSAMLLAFALYELALN
EDAQRRLHIELDEVAQRHAGNLIDPVALGELRYSEAALLEALRLHPAMQALQKRCTKTFTLPDQKSGASSELKVHLGT
VLVLPVQAIHL*VTLQ*DFRLSTNYSLFRFQRSRIVSGTQPVRPERFLNQPPMGCRFLGFGAGPRMCPGMRLGLLQTK
AALTTLLQDHCVQLADEDQCRVEVSPLTFLTASRNGIWLSFKRRTRRY**CNEGIISIYHKICATL*MLHG

WLAIGWEYVGLCPGPPLIWRDLPGNLYVRYTNLKSPSRPRAMNFS*QSESGPEICGFLSPV*RTRHSGA*PGVAAPDP
GGP*LCRLCGQRRLCGPPARCPG*PQSLHRQRRPLASLAGRPGAALHTQPSASDLAACGQSMSVAPGSGAPWTLRGQG
LGHALYATSGCLRDLRSGCTLPGDSHEGGS*TKSLAGMARSALSAKRLEFAGDHVTVAHASIG*THWPPVGNVELANV
*PLIHYPLIQIRTPAPAALVQGIGGSPKWWRQPLAMAGGKQKRLGKGGASWTCHHSPTGGLRNLGDATRLRPLRIGPQ
*GCTATVTHRIG*SGPAPRWQSD*SSGSG*TSL*RSCALGGTASSSGHAGSAEALHQDIHPS*SEIRSEQRT*GAFGH
RAGVAGSGHSFVSDTSVRLQIVYELQSFSISEIPHCIRHPTSSSRTLSKSATNGLSVSGLRSWTTNVSGNATWPAPDE
GCTDHTTAGPLRPAGG*GSVQGGGVSAHLPHRQQEWHLAEFQEKDT*ILMMQ*RDYIHISQNLCYIVDVTW


LAGHWLGICWTLPWAAAHMERFTRKSIRKIH*S*ESFKATCNELFLTVGIRA*NMWVSIACLTNPPFWCVTRSCCARS
WWAVTLPTVRTTPFMWTTSAMSWLATIPSSPTETAGESCGPTWCRSSHPAECVRPCRMWPEHVSCSGIRCPLDASRPR
TWPRVIRYKWLPPRSSVWMHIAWGFT*GWLMNQVAGWNGSLRSFSQAFGVCWRPCHCCTRLDWVDSLATGW*CGTRQC
VTFDPLSTYTDTYPCPCSTGSGNWWKPEVVATTSCNGWRKAKEAWERRS*LDMPPLSYWRATKPRRCYSPSPSTNWPS
MRMHSDGYTSNWMKWPSATLAI*LIQWLWVNFAIAKLRSWRHCVFIRPCRLCRSAAPRHSPFLIRNQERAANLRCIWA
PCWCCRFRPFICE*HFSETSDCLRTTVFFDFRDPALYPAPNQFVQNAF*ISHQWAVGFWASELDHECVRECDLACSRR
RLH*PHYCRTTASSWRMRISAGWRCLRSPSSPPAGMAFG*VSREGHVDINDAMKGLYPYITKFVLHCRCYME

2nd gene on AC015216 Drosophila melanogaster, in ordered fragments
48688-49053, 49155-49301, 49723-50877
MSADIVDIGHTGWMPSVQSLSILLVPGALVLVILYLCERQCNDLMGAP
PPGPWGLPFLGYLPFLDARAPHKSLQKLAKRYGGIFELKMGRVPTVVLSDAALVRDFFRR 49011
DVMTGRAPLYLTH
GIICAQEDIWRHARRETIDWLKALGMTRRPGELRARLERRIARGVDECV 49301
VNPLPALHHSLGNIINDLVFGITYKRDDPDWLYLQRLQEEGVKLIGVSGVVNFLPWLRHL 49902
PANVRNIRFLLEGKAKTHAIYDRIVEACGQRLKEKQKVFKELQEQKRLQRQLEKEQLRQS 50082
KEADPSQEQSEADEDDEESDEEDTYEPECILEHFLAVRDTDSQLYCDDQLRHLLAD 50250
LFGAGVDTSLATLRWFLLYLAREQRCQRRLHELLLPLGPSPTLEELEPLAYLRACIS 50421
ETMRIRSVVPLGIPHGCKENFVVGDYFIKGGSMIVCSEWAIHMDPVAFPEPEEFRPERFL 50601
TADGAYQAPPQFIPFSSGYRMCPGEEMARMILTLFTGRILRRFHLELPSGTEVDMAGES 50778
GITLTPTPHMLRFTKLPAVEMRHAPDGAVVQD 50877
The above sequence is also intact on AC012373 154223-156412
Also on AC012164 partial sequence on two different fragments
Also on AC012376 partial on two fragments

Fragments for another gene found on AC015216 by blast with the above seq.

cyp18 is on AC015216 from comp(51875-54944)
Query: 49    PPGPWGLPFLGYLPFLDARAPHKSLQKLAKRYGGIFELKMGRVPTVVLSDAALVRDFFRR 108
             PPGPWGLP +GYL F+ +   H    +LAK+YG +F  ++G   TVV+SD  ++R+ FRR
Sbjct: 54785 PPGPWGLPVIGYLLFMGSEK-HTRFMELAKQYGSLFSTRLGSQLTVVMSDYKMIRECFRR 54609

Query: 109   DVMTGR 114
             +  TGR
Sbjct: 54608 EEFTGR 54591

Query: 171   VNPLPALHHSLGNIINDLVFGITYKRDDPDWLYLQRLQEEGVKLIGVSGVVNFLPWLRHL 230
             V+  P +  ++ N+I  L+    +  DDP +     L EEG++L G    V+++P ++  
Sbjct: 53463 VDMSPVISVAVSNVICSLMMSTRFSIDDPKFRRFNFLIEEGMRLFGEIHTVDYIPTMQCF 53284

Query: 231   PANVRNIRFLLEGKAKTHAIYDRIVE 256
             P+       + + +A+    Y  +++
Sbjct: 53283 PSISTAKNKIAQNRAEMQRFYQDVID 53206

Query: 337   DDQLRHLLADLFGAGVDTSLATLRWFLLYLAREQRCQRRLHELL---LPLGPSPTLEELE 393
             ++QL  ++ DLF AG++T   TL W  +++ R  +  RR+ + L   +     PT+E+L+
Sbjct: 52982 EEQLVQVIIDLFSAGMETIKTTLLWINVFMLRNPKEMRRVQDELDQVVGRHRLPTIEDLQ 52803

Query: 394   PLAYLRACISETMRIRSVVPLGIPH 418
              L    + I E+MR  S+VPL   H
Sbjct: 52802 YLPITESTILESMRRSSIVPLATTH 52728

Query: 383   LGPSPTLEELEPLA-------YLRACISETMRIRSVVPLGIPHGCKENFVVGDYFIKGGS 435
             +GP+P      P+        + R+C+S  +R R V   G             Y I  GS
Sbjct: 52381 VGPAPAAR**SPMHRSQTNHNFRRSCLSPLVRCRDVELNG-------------YTIPAGS 52241

Query: 436   MIVCSEWAIHMDPVAFPEPEEFRPERFLTADGAYQAPPQFIPFSSGYRMCPGEEMARMIL 495
              ++    ++HMDP  + +PEEFRP RF+  +G  + P  FIPF  G RMC G+ +ARM L
Sbjct: 52240 HVIPLINSVHMDPNLWEKPEEFRPSRFIDTEGKVRKPEYFIPFGVGRRMCLGDVLARMEL 52061

Query: 496   TLFTGRILRRFHLELPSGTEV-DMAGESGITLTP 528
              LF    +  F + LP G  +  + G  G T+TP
Sbjct: 52060 FLFFASFMHCFDIALPEGQPLPSLKGNVGATITP 51959