Revised Oct. 18, 2007

D. Nelson

 

Some sequences that previously had no name have now been assigned names.  Some sequences have been revised; others are known but remain confidential.

 

Revised Feb. 9, 2007

D. Nelson

 

Some sequences are given placeholder names like CYP6AEaa until official

names can be assigned.  There are 51 complete sequences; 9 more are nearly

complete, missing only N- or C-terminals or a small internal piece.

This gives 60 genes with strong, intact or nearly intact assemblies.

There are another 12 in closely related gene subfamilies that have all or nearly

all exons accounted for, but the exons cannot be joined because the small

contigs do not overlap.  This brings the total to at least 72 genes.  There are 7 more partials

that are less than half a gene at the present time.  If these can be completed, that

would make 79 P450s in silkworm.  This is comparable to Drosophila.
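
The running totals in the paragraph above can be checked with a short tally. This is only a sketch: the counts are the ones stated in the text, while the dictionary keys are labels made up here for illustration and are not official categories.

```python
# Counts as stated in the Feb. 9, 2007 note. The key names are
# illustrative labels only (an assumption of this sketch).
counts = {
    "complete": 51,
    "nearly_complete": 9,
    "exons_found_but_unjoined": 12,
    "partial_under_half_gene": 7,
}

# Strong assemblies: complete plus nearly complete.
strong = counts["complete"] + counts["nearly_complete"]

# Adding the genes whose exons are found but cannot yet be joined.
at_least = strong + counts["exons_found_but_unjoined"]

# Upper figure if the short partials can all be completed.
possible_total = at_least + counts["partial_under_half_gene"]

print(strong, at_least, possible_total)
```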

 

Parts of some genes were assembled from confidential sequences.  The Beijing Genomics Institute has now set up a database with a BLAST server to make this data public, so I have included it here.

 

SilkDB: a knowledgebase for silkworm biology and genomics

Nucleic Acids Research 33, 399-402 (2005)

 

http://silkworm.genomics.org.cn/

 

July 20, 2004

 

Note: trees have been built and naming is being done, but some sequences are still

uncertain.  The old CYP301B sequence has had its C-terminal exon changed based

on an Apis sequence and is now CYP49A1; the old CYP49A1 is now CYP49A2.

 

CYP4G subfamily sequences. 

 

>CYP4G22 old CYP4Gxx 64% to 4G19

AV404689 Bombyx mori prothoracic gland EST spans two contigs

AV405174 Bombyx mori prothoracic gland EST

AV404871 Bombyx mori prothoracic gland EST

BP183989 P5PG Bombyx mori cDNA clone

BP182770 cDNA clone NRPG1026

BP182055 NRPG Bombyx mori cDNA clone

BP183011 cDNA clone NRPG1327

BP183771 P5PG Bombyx mori cDNA clone

AU004478 EST

BP183321 NRPG Bombyx mori cDNA clone

BAAB01118673.1 BAAB01085960.1 84% to 4G20

MSYTNAENVVPTSTFSAINLFYVLLVPAVILWYAYWRMSRRRLYELADKLNGPPGLPLLGNALEFVGGSA ()

DIFRNIVQKSADYDHESVVKIWIGPRLLVFLYDPRDVEVILSSHVYIDKAEEYRFFKPWLGNGLLIST (1?)

GQKWRSHRKLIAPTFHLNVLKSFIDLFNANSRAVVDKLKKEASNFDCHDYMSECTVEILL (1)

 234 ETAMGVSKSTQDQSGFEYAMAVMKMCDILHLRHTKIWLRPDLLFKFTDYAKNQTKLLDIIHGLTKK 431 (0)

 988 VIKRKKEEFASGKKPSNLNETATTSEPSTGKLTSVEGLSFGQSSGLKDDLDVDDDVGQK 1164

1165 KRLAFLDLLLESSQSGVAISDEEIKEQVDTIMFE 1266 ()

1340 GHDTTAAGSSFFLSMMGIHQDIQDKVIEELDQIFGDSDRPVTFQDTLEMKYLERCLMET 1516

1517 LRLYPPVPIIARQVNQEITL 1576 ()

1680 SNGKKIPAGTTLVIATYKLHRRPDVYPNPNKFDPDNFLPERSANRHYYAFVPFSAGPRSCV 1862 (1)

1960 GRKYAMLKLKVILSTILRNFRVISDLKESDFKLQADIILKRAEGFQVRLQPRKRMAKA* 2136
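
Many sequence lines on this page carry contig coordinates before and after the peptide and an intron phase marker in parentheses at the end. The sketch below shows one possible way to strip those annotations and join the exon peptides. The line shape it assumes is inferred from the entries here and is not a formal specification; `parse_line` and `assemble` are hypothetical helper names, and lines with other trailing notes (for example "= BAAB01031491.1" or "(frameshift)") are simply skipped.

```python
import re

# Assumed line shape, inferred from the entries on this page (not a formal spec):
#   [(phase)] [start] PEPTIDE [end] [(phase)]
# A phase is 0, 1 or 2, optionally followed by "?", or left empty as "()".
LINE_RE = re.compile(
    r"^\s*(?:\((?P<lead>[012]?\??)\)\s*)?"   # optional leading phase marker
    r"(?:(?P<start>\d+)\s*)?"                # optional start coordinate
    r"(?P<pep>[A-Za-z*]+)"                   # peptide; "*" marks a stop codon
    r"(?:\s+(?P<end>\d+))?"                  # optional end coordinate
    r"(?:\s*\((?P<phase>[012]?\??)\))?\s*$"  # optional trailing phase marker
)


def parse_line(line):
    """Return the parts of one annotated sequence line, or None if it does not fit."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "start": int(m["start"]) if m["start"] else None,
        "end": int(m["end"]) if m["end"] else None,
        "peptide": m["pep"],
        "phase": m["phase"],  # None when no trailing marker is present
    }


def assemble(lines):
    """Concatenate the peptides of all lines that fit the assumed shape."""
    parsed = (parse_line(line) for line in lines)
    return "".join(p["peptide"] for p in parsed if p is not None)


# The last three lines of the CYP4G22 entry, copied from above.
example = [
    "1517 LRLYPPVPIIARQVNQEITL 1576 ()",
    "1680 SNGKKIPAGTTLVIATYKLHRRPDVYPNPNKFDPDNFLPERSANRHYYAFVPFSAGPRSCV 1862 (1)",
    "1960 GRKYAMLKLKVILSTILRNFRVISDLKESDFKLQADIILKRAEGFQVRLQPRKRMAKA* 2136",
]

print(assemble(example))
```

Header lines beginning with ">" and accession lines do not match the pattern, so they are ignored rather than merged into the peptide.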

 

>CYP4G23 old CYP4Gyy 58% to 4G19 59% to CYP4Gxx

CK508129   rswdd0_001928.y1 swd Bombyx mori cDNA

CK505752 rswcc0_009305.y1 swc Bombyx mori cDNA

AV401408 BP122820 BP123076 BP183420 CK509955 CK505752 CK505553

BAAB01050777.1 BAAB01096227.1 BAAB01156775.1 BAAB01003438.1

BAAB01073763.1 BAAB01110168.1 BAAB01050777.1

MTSLVDETEGYHVNSRVIFYPLLGLTTAIWILYRWQQNSHMHKLAELLPGPASIPIFGNALTLMRKNPHE ()

LVNLALGYAQTFGNVIRVWLGSKLIVFLVDADDIEIILNSHVHIDKATEYRFFKPWLGEGLLISS (1?)

GPKWRSHRKMIAPTFHINILKSFVGIFNQNSNNVVEKLKSEVGKTFDVHDYMSGTTVDILL ()

ETAMGISRKTQDESGFDYAMAVMK ()

MCDIIHQRHYKFWMRSEIVFKLTSFFKQQTKLLGIIHGLTNK ()

VIKNKKETYLENKAKGIIPPTLEEFTHHSGEILANNAKTLSDTVFKGYRDDLDFNDENDV ()

GEKKRLAFWDLMIESSQNGTNKISDHEIKEEVDTIMFE ()

GHDTTAAGSSFVLCLLGIHQDVQARVYDELYQIFGDSDRPATFADTLEMKYL

ERVILESLRLYPPVPVIARKLNRDVTI ()

STKNYVIPAGTTVVIGTFMLHRQPKYYKDPEVFNPDNFLPENTQNRHYYSYIPFSAGPRSCV (1)

GRKYALLKLKILLSTILRNFRTISEIPEKEFKLQGDIILKRAEGFQMKVEPRKRVPTNVAR*

 

>CYP4G24 old CYP4Gzz 71% TO CYP4Gyy

FIRST TWO EXONS ARE JOINED WITHOUT OVERLAPPING EVIDENCE

BUT THEY ARE THE ONLY OTHER 4G N-TERM SEQUENCES WITHOUT A PARTNER

BAAB01149226.1 BAAB01135847.1 BAAB01073803.1 BAAB01199362.1

MSSLLNYNFEFDVPPIFHTLLLAIMIMWM

LHRWQQSSRLFRLGNKLPGPMALPLVGNSLLILGKKAEGW = BAAB01031491.1

VKYALDYSEKYG

TVVRAWAGPKLVVFLTDANDVEVILNSQIHIDKSPEYRFFKPWLGEGLLISSG = BAAB01016193.1

XKWRSHRKMIAPTFHINILKSFMGVFNENSKSVVKKLRSEVGKTFDVHDYMSCVTVDILL

ETAMGITKTTQDAASFDYAMAVMK

MCNIIHQRHYKVWLHFDAIFKLTSLFKKQRELLKTIHGLTNK (0)

VIKKKKFMYLQNKEKGIIPPTIEELTKIKDTNDSIMEDSAKTLSDTVFKGYRDDLDFNDEQDV ()

GEKKRLAFLDLMIESAQNHTCNISDHEIKEEVDTIMFE ()

GHDTTAAGSSFVLCLLGIHQEIQSKV

YDELFEIFGDSDRLVTFADTLQMKYLERVILESLRLYPPVPAIARKLTRDVQI ()

VTNNYIIPAGSTVVIGTFKIHRDPKYHKNPNVFNPDNFLPENTQNRHYYSYIPFSAGPRSCV (1)

GRKYALLKLKVLLSTILRNYKTTSEISEDQFVLQADIILKRYDGFKIRIEPRNKNHSNTV*

 

CYP4L subfamily sequence

 

>CYP4L6  67% to 4L4

CK535040 EST 5aa diffs BP182824 EST

BAAB01121615.1 BAAB01097697.1 BAAB01093085.1 BAAB01107041.1

MYLILLICLLVLVLTVSWWSMLNRICKSNVPGPFPLPIIGNAHQFVVRST (1)

EFLGLLKSFTDKYGDVFRVHFFSYPYVLISHPKYAE (0)

 932 ALVSSADLITKGRSYSFLKAWLGEGLLTAS 1018 (1)

1506 GPRWRLHRKFLTPAFHFNILQNFLPVFCKKSEILRDKIRRLADGQPIDLFPITALAALDNVAESIM 1688 (1)

GVSVNAQQNSESEYVRAIEX ()

LSQITTLRMQIPLLGEDFIFNLTSYKKKQNIALEVVHGQTKKVIEARRCELEKNNKTNISGTNE ()

IGIKNKHAFLDLLLLAEIDGKL ()

12  MDEQSVREEVDTFMLEGHDTTTSGI 86

85  LVYTLFGLSKHPDIQEKWYEEQLTIFGEEMDRTPAY ()

NELAQMKVLEWVIKESLRMYPSVP 264

265 LIERWITKDAE

VGGLKLSKGTSVVLNIFQMHRNPEVFEKPLEFIPERFDSLEHKNPFSWLAFSAGPRNCI ()

GQKFAMMEMKVTLSTLVRNFKLVPVDIEPILCADLILRSQNGVKVGFLPRTQSNSKT

 

CYP4M subfamily sequences

 

>CYP4M5 EST BP125511

BAAB01099467.1 BAAB01097680.1 BAAB01039373.1

MFVYLIFIASFFLLIHLAFNY

NSKAVMMNKVPGPKLSFILGNAPEIMMLSSVELMKLARKFASRWDGIYRIWAFPLSIINIYNPDDVEVIVSTTKHN

EKSSVYKFLKPWLGDGLLISK

GEKWQQRRKILTPAFHFNILRQFSVIIEENSQRLVESLEKCIGKPIDIVPVVSEYTLNSIC

ETSMGTQLSDKTEDAWKAYKDAIYELGPYFFQRFTRVYLYFDIIFYLTSLWRKMKKPLKSLHG FTSTVIKERKIYVEQNGVKF

GEDVNDDDLYIYKKRRKTAMLDLLIAAQKDGEIDDHGIQEEVDTFMFE

EHDTTASGLTFCFMLLANHRAVQDKIVEEINYIMGDSTRRANLEDLSKMKYLECCIKESLRLYPPVHFISRNLNEPV

VLSNYEIPAGSFCHIHIFDLHRRADIYEDPLVYDPDRFSQENSKGRHPYAYIPFSAGPRNCI

GQKFAMIEMKSAVAEVLRKYELVPVTRPSEIELIADIILRNSGPVEITFNKRTK

 

>CYP4M9 BAAB01103156.1 BAAB01142776.1 BAAB01178932.1 57% to 4M5

BY921981.1 EST N-term

MWMYFILILLLLLTIHFLLNYNYRARLLRRIPGPRGYFIVGNALDVILSPAELFASTRK

NAAQWPNLNRFWSFGIGALNIYGPDEIEAIISSTQHITKSPVYNFLSDWLRDGLLLST

GTKWQKRRKILTPAFHFNILKQFCVILEENSQRFTENLKDTEGKSINVVPAISEYTLHSII ()

675 ETAMRTQLGSETSEAGRSYKNAICELGNQFVHRLARLPLHNNFIYNLYTLGKQNKH

LNIVHSFTKKVIKDRRQYIRGNGGNNFDDEKDTQADEHSIYFNKKKTAMLDLLLKAERDGL

IDEIGVQEEIDTFMFE ()

1150 GHDTTATGLTYCIMLIANHKSIQVGALRKYID 1245 ()

295 GESTRAADIEDLSKMRYLERCIKESLRLYPPVPSMGRILSEEI 423 ()

1467 VLDGYTVPAGTYCHIQIFDLHRREDLFKDPLVFDPDRFLPHNTEGRHPYAYIPFSAGPRNCI 1282 ()

GQKFAILEMKSLLSAVLRRYNLYPITKPEDLKFVLDLVLRTTEPVHVRFVKRNKV*

 

CYP4S subfamily sequences

 

>CYP4S5 66% to CYP4S4 from the moth Mamestra brassicae

BAAB01177041.1 BAAB01181662.1 BAAB01048279.1 BAAB01025317.1

AADK01013707.1

MIWAVVLLIIFIFLIWKALEDEENPLDSLPGPERKPIIGATMRFVNLNT ()

3009 GEMFIKLREYHAMYGTRYVVKIFKRRILHLSNERDVE (0)

 365 VVLSHSKNIKKSKPYTFLSPWLGSDLLLST 454

GFKWHSRRKILTPTFHFNILKSFLEIIKDKSCDLVKRLEEYRGEEVDLMPVISDFTLFTIC

ETAMGTQLDSDKSAETSEYKMAILQIGSLLLDRLTKVWLHNDFIFRQFTVGRKFQKCLKQVHSFAHN

VIVERKRQRASGRDPTVVAEDVFG

RKKRLAMLDLLLEAEEKNEIDFEGIMDEVNTFMFE (1?)

GHDTTAVALTFSLMLVAEDDQVQ (0?)

DRIYKELQGIFGDSDRRPTISDVAEMKYLEAVVKETLRLYPSVPFIAREITEDFML ()

DDLKIKKGSEVAVHIYDLHRRKELFSDPEKFLPDRFLNGELKHPYSFVPFSAGPRNCI (1)

GQRFATLEMKCVLSEICRSFRLEPRTKGWRPTLVAEMLLRPNEPIHVKFIKRKQS*

 

>CYP4S6 94% to CYP4S5

BAAB01121123.1 BAAB01054179.1 BAAB01073078.1 BAAB01008096.1 BAAB01081659.1

AADK01014846.1

 164 MIWTVVVLIIFIFLIWKALEDEENPLDSLPGPERKPIIGAALRFVNLNT 18 ()

     REMFIKLREYHAMYGTRFVVKIFKRRVLHLSNEKDAE (0)

     VVLSHSKNIKKSKGYTFLSPWLGSGLLLST (1)

1053 GFKWHSRRKILTPTFHFNILKSFLEIIKDKSCDLVKRLEEYRGEEVDLMPVISDFTLYTIC 1235

1966 ETAMGTQLDSDKSAETKEYKMAILQIASLLLNRLTKVWLHNDLIFDQLTVGRKFQK 2133

2134 CLKQVHSFAHNVIVERKRQRASGRDYSVVAEDVFGRKKRLAMLDLLLEAEEKNEIDFEGI 2313

2314 MDEVNTFMFE 2343 ()

2862 GHDTTAVALTFSLMLVAEDDQVQ 2930 ()

3158 DRIYKELQGIFGDSDRRPTISDVAEMKYLEAVVKETLRLYPSVPFIAREITEDFML 3325 ()

 578 DDLNIKKGSEVVVHIYDLHRRKELFADPEKFQPDRFLNGELKHPYSFVPFSAGPRNCI 751 ()

     GQRFAMLEMKCVLSEICRSFRLEPRTKGWRPTLVAEMLLRPNEPIHVKFIKRKQS*

 

CYP4AU sequences

 

>CYP4AU2 BAAB01192732.1 CYP325 like N-term

AADK01032226.1

1927 MIILLLLFFVASLYWLYWTNKSKRMNKMTASLPVPPTLPILGNATLFIG

 

>CYP4AU2 35% to 4C3 aa 112-276 37% to 4G19 BAAB01022101.1 exons 2,3,4

AADK01025934.1

(1) AIGDPEDAQVVLENCLDKDVVYRFLRPWLGHGLFVAP (1)

VCLWKSHRKVLLPVFHNKVVEQYLQMISVQADILVERLNEKANKGEFDVLKYITACTLDIVF (1)

ETAMGERMDVQRSPDTPYLRARHTVMTILNMRFFKVWLQPDCIFNLTSYSKQQKDNIDLTHKFTDE (0)

 

>CYP4AU2 62% to CYP4AU1 I-helix to end BAAB01055413.1

AADK01029434.1

Note N-terminal is known but confidential (10/18/2007)

Only missing 23-24 aa in the middle between the two segments

(1) GKVRPVLDMLFGREIEFTDEQLREHIDSITIAGNDTTALVMAYTLVRLGIHQNVQEKVYLE ()

QRTIFGDSKRGADKVDVAQMQYLERVLKESMRLYTVVPIIARNVHKDTYL ()

PRCGVTLPAGIGAVVGPFAIHRSKSVWGPDADEFDPDRFLPERSLNRHPAAFLPFSHGSRNCI ()

GRNFGMLIMKSIVSTISRSYRIEADELGPLKIEMLLFPIRGHQIRISRRETLA*

 

>CYP4AU-like BAAB01077742.1 Length = 3282 68% to BAAB01022101.1 exon 2

AADK01002646.1

69% to CYP4AU2

2223 AICDTEHAQIILDCLNKDMVYRFLRLWLEHGLFVAS 2330

 

CYP4AX sequences 50% identical to CYP4S4

 

>CYP4AX1 CYP4qq 50% to CYP4S4 53% to CYP4Sxx lower case is from CYP4rr

BAAB01092965.1 BAAB01135526.1 BAAB01183645.1, AADK01005955.1

AADK01030377.1 (frameshift after GQKF in last exon)

MWFSLVVVAVIYALWKLFYKEDDPIDSLPGPTKLPIIGNMLDMFNMTP ()

GEKFKYERQLSKTYKQRYMQKIFYRRIVYVHHPDDVE ()

VVLSHSKNITKNVNYDFLKPWLGTGLLLST ()

GSKWFKRRKILTSAFHFDILKDFASLFEEKSRRLVDQLRANNGEPISLLPVMSNYTLFTLC (1)

ETALGTKLDTDRSVAAAEYKDAISKTAQISIYRLPRIWLYIDAIFNRTSAGREFAKNVDIIHSFADN

VIVQRKEQRLNSLDKGLVERDEFNRKKRTALLDLLLEAEAKREIDLEGIREEVNTFMFA ()

GHDTTGTALTFSLMLLSDHEEAQ ()

ERILEEYNEVMRGKETPTLSEFAEMKYLDAVIKETLRLYPNPHRVGRVLTEDITL (1)

GGVPIKAGTEIGVQIIDLHHREDFFPEPEKFRPERFLRGEIQHPYSFVPFSAGPRNCL ()

GQKF

AMLEIKSVLTHICNNFKLVPMKRNWRVETVSDIVLKPAEPIYIKFVPR*

 

>CYP4AX2 CYP4rr 94% identical to CYP4qq lower case is from CYP4qq

BAAB01152933.1 BAAB01194854.1 BAAB01058563.1 BAAB01102208.1 BAAB01008375.1

MWFSLLVVAVIYALWKLFYKEDDPIDSLPGPAKLPIIGNMLDMFNMTP ()

GEKFKYERQLSKTYKQRYMQKVFYRRIVYLHHPDDVE (0)

VVLSHSKNITKNVIYDFLKPWLGTGLLLST ()

GSKWYKRRKILTSAFHFDILKDFASLFEERSRRLVDQLRANNGEAISILPVMNNFTLLTIC (1)

ETALGTKLDTDRSVNTAAYKDAISKIGQICIYRLSRIWLYIDAIFNRTSAGREFAKNVDIIHSFADN

VIVQRKEQRLNSLDKGLVERDEFNRKKRTVLLDLLLEAEAKREIDLEGIREEVNTFMFA ()

ghdttgtaltfslmllsdheeaq ()

ERILEEYNEVMRGKETPTLSEFAEMKYLDAVIKETLRLYPNPHRIGRVLTEDITL (1?)

GGVPMRAGTEVCVLTIDLHYREDFFPEPEKFRPERFLRGEIQHPYSFVPFSAGPRNCL (1)

GQKFAMLEIKSVLTHICNNFKLVPMKRNWRVETVSDIVLKPAEPIYIKFVPR*

 

&&&&&&&&&&&&&&&&&

 

>AM106362 Lutzomyia longipalpis Jacobina = best EST to these genes

MWGKFFFICDGLFWAVFFSPPLRGGLGSFFRGGGRKFPGPFYGS

PSLGRLPAQMGFSQKGFLGLPKVILSKITQSNENFGWAEAVC ()

AYFPPEECQYVFNANECLSRDDIYDYIKPFTGDGLVTLP ()

AETWKDHRKFLNPCFNLKILQSYMPIFNTEVKTLIGRLGQRIGKGSFDMYDYMDACALDVVC ()

QTTLGTQMNIQKNENMDYLDAANSLLATMTTRIFNPLYHSDFIFNLSKWAKMEQKNSDITFGFVDNILQR

KKAAYKKFQPSDEQNNLDEGTSFKSPQLFIDQLLKLSMEGKYFTDTDVKNEANTIVA

 

 

CYP6B sequence

 

>CYP6B29 51% TO 6B1 CK494244 EST 57% to 6B27 aa 144-500

BAAB01105883.1 BAAB01064450.1

2461 MAIIYILSASVVLPLLLYLYFTRHFNYWKKRNVPGPKPVPLFGNLMELALRKKNIGIVFKELYENFPNEK 2670

2671 VVGIYRMTTPCLLIRDLDVIKNIMIKDFDVFVDRGVELS*SGLGANLFHADGDTWRV 2841

2842 LRNRFTPLFTSGKLKN

MLHLMIERANKYIEHVEMLCDHQPEQDIHTLVQKYTMATIAACAFGLDIDTTDPNKDQLK

TLEEIDRLSLTANFAFELDMMYPGVLKKLNSTLFPGFVSRFFKDVVKTIIEQRNGKPTDR

NDFMDLILALRQLGDIQATKRNSEDKEYSIELTDELIEAQAFVFYIAGYETSATTMTFML

YQLALNPDIQDKVIAEIDQGLKESKGEVTYEMLQNLTYFEKAFNETLRMYSIVEPLQRNA

KIDYKIPDTDIVIEKGTTVLFSPLGIHHDEKYYPNPSKFDPERFSPANISARHPCAHIPF

GTGPRNCI ()

661 GMRFAKIQSRVCMVKMFSKFRFELAKNTPRNLDIDPTRLLLGPKGGIPLKIVRR*  825

 

CYP6AB subfamily sequences

 

>CYP6AB4 64% TO 6AB1 OVER WHOLE SEQ BAAB01081811.1 BAAB01162324.1

BAAB01031011.1, AADK01009517.1

MLTAAIFVIIVALVYLYSTRTFRYWEKRGIKHDKPVPFFGTDSEGYLLRKSMTQTAVDAY

LKYPNEKVIGFFRSSRPELIIRDPDIIKRILTTDFAYFYPRGLNPHKKVIEPLMRNLFFA

DGDLWRLLRQRMTPAFTSGKLRAMFPLVVERAERLQSRTLEIASQPLDARELMARYTTDF

IGACGFGLDADSLNDEDSAFRKLGAAIFNITVQQAIVAALKEIFPGIFKHFKYSSKYETD

FMSLVSSILKQRNYKPSGRNDFIDLLLECRMKGEIVVESIEKMKPDGTSEVVRMELTEQL

LAAQVFIFFAAGFETSSSATSFTLHQLAFHPEIQEKVQKELDQVLAKYNNKLCYDAIKEM

RYLESAFKEAMRMFPSLGFLIRECARQYTFPELNLTIDEGVGIIIPLQALHNDPEYFDSP

NEFRPERFMPSEYNHNKTKFVYLPFGDGPRGCI ()

GARLGLMQSLAGLAAVLSKFTVKPAPSTKRHPVVEPKSSVVQSIKDGLPLLFIERTKS*

 

>CYP6AB5 BAAB01021567.1 BAAB01206787.1 58% to 6AB3 72% to 6AB2

56% TO  CYP6ABxx

MISSING C-TERM BEYOND A REPEAT SEQ.

possible C-term = BAAB01196051.1 66% to 6AB2

Bmb030331 from Li Bin

MFFYLLIVILITLYYYGVR

TFDYWKKKGVNHDPPLPFFGNNLRQFMQKASMAMMATETYKKYPEEKVVGFYRGTSPELV

VRDPELIKRILVTDFSSFYARGFNPHKKVIEPLLKNLFFADGDLWRLIRQRFTPAFSTAK

LKAMFHIITERAEKLQMIAENEAYENFCDVRELMARYTTDFIGACGFGLNIDSLSDENSQ

FRKLGKRIFKRDLSDAVRAALKLMFPELCKHLTFLTPELEKSMTYLVQNVIREKNYKPSG

RNDFIDLMLELKQKGKLLGESIEAKNANGTPKQVELEFDDLLMTAQVFVFFGAGFETSST

ASSYTLHQLAFNPECQEKTQKEIDEVLSKHNNKITYDAIKEMTYLEMAFNEAMRLYPSVG

YLVRMCTVPEYTFPEINLTINEDVKLMIPIQAIQKDEKYFKDPERFHPERFSSGAKANLK

PYTFLPFGEGPRACV ()

923 GERLGQMLSMAGLVAVLQKYTVEPVEISLRDPIPDPTTTVSEGFVHGLPLKLRRRERRI* 1102

 

>CYP6AB8 96% identical to CYP6AB4 20 aa diffs

whole seq known but confidential

 

CYP6AE sequences. 

 

>CYP6AE2    Bombyx mori (silkworm)

            old CYP6AEcc BAAB01091139.1 contig444108,

            frameshift at 1529 whole gene in one contig

            No accession number

            Junwen Ai

            submitted to nomenclature committee Jan. 31, 2007

            nearly identical to sequence CYP6AEcc on Bombyx page except

            one region from amino acids 106-256 does not match.

            The EST BY914225 agrees with the Ai sequence so the old

            CYP6AEcc sequence appears to be a hybrid assembled from two genes.

 663 MSVSALVFAAFVLLVTYIYYWSTRKFDYWKRKNVPYAKPVPFFGNYMRYITLQSFLGDVM 842

 843 QKLCQQFPDRPYFGSFYGTEPALVLQNPEIIKQVFTKDFYYFNSRENRDYNHKEVFTQNL 1022

1023 FFANGDRWKVIRQNLTPLFSSSKMKNMFYLVEKCNHSLEDMLDKETKDLQSIEIRSAMIR 1202

1203 YTLDSICSSAFGIETNTLSEGAENSPFPSMGSTIFSSSITRGLKLIGRSMWPGIFYKLGL 1382

1383 RCFPTEIDDFFERLLTEVFENRGYKPTNRNDFVDLILSLKQNDYLTGDG 1529

1529 LVPKNVDAKKVTVKVDDALLIAQCVVFFGAGFETSATTLSAALYELAKNPEAQRRAQEEV 1708

1709 DELLLKHNNKLNYDCLAELPYLEACMNEAMRLYPVLPNITREAVTDYTFPDGLRIDKGMR 1888

1889 VHVPVYAIHRNPDNFPDPEEFRPERHLGDAKNDIKQFTYFPFGEGPRICI 2038 (1)

2917 GMRFGKMQTIAGMVTCLKKYNFELADGMSKTVPFRSTTVLTQPSTGLFLKATPRDGWKQRIFAR* 3111

 

>CK526561   a related EST rswfa0_003045.y1 swf Bombyx mori cDNA.

2aa diffs with BAAB01091139.1, 6aa diffs with CYP6AEbb

    GFETSASTLSAALYELAKNPEAQRRAQEEVDELLLK

414 HDNKLNYDCLAELPYLEACMNEAMRLYPVLPNITREAVTDYTFPDGLRIDKGMRVHVPVY 235

234 AIHRNPDNFPDPEEFRPERHLGDAKNDIKQFTYFPFGEGPRICIG

 

>CYP6AE3P (old CYP6AEbb) BAAB01174895.1 95% to BAAB01091139.1 54% to 6AE1

there are probably two very similar genes

BAAB01172535.1 fills in missing sequence for this contig

Bmb026776 from Li Bin

MSVSALVFSAFVLLVTYIYYWSTRKFDYWKRKNVPYAKPVPFFGNYMRYITLQSFLGDVM

QKLCQQFPDRPYFGSFYGTEPALVLQNPEIIKQVFTKDFYYFNSRENRDYNHKEVFTQNL

FFANGDRWKVIRQNLTPLFSSSKMKNMFYLVEKCNHSLEDMLDKETKDLQSIEIRSAMIR

34  YTLDSICSSAFGIESNTLREGGENSPFANMGSIVFSSSITRGLKWISRSMWPGIFYKLGL 213

214 QCFPAEIDGFFER 252

LLTEVFENRGYKPTNRNDFVDLILSLKQNDYLTGDGLVPKNVDAKKVTVKVDDALLIAQC

VVFFGAGFETSATTLSAALYELAKNPEAQRRAQEEVDELLLKHNNKLNYDCLAELPYLEA

CMNEAMRLYPXXXXXTRETVADYTFPDGLRIDKGMRVHVPVYAIHRNPDNFPDPEEFRPE

RHLGDAKNDIKQFTFFPFGEGPRICI (1)

GMRFGKMQTIAGLITCLKKFNFELADGMPRTLAFRSTTLLTQPSTGLFLKATPRDGWEQRIFAR*

 

>CYP6AE4 (old CYP6AEaa) BAAB01211364.1 84% to CYP6AEff 53% to 6AE1

missing C-term

MLFLTLLFILSVCVYFIY (frameshift)

     YRVCNRRFDYWRKKNVSFVPPVPILGNYSGYILLKESISKVVHNLCKLFP 2567

2568 NDPYIGAFFGTEPTLIVKDPEFIKLVLTKDFYHFNGREGSKYTHNEVVTQNIFFTYGDRW 2747

2748 KVIRQNLTPLFSSLKMRNMFHIIEKCSGIFENLLDEESLAPEVEMKSLMSRFTMDCIGGC 2927

2928 AFGVDTKAMQEPKDNIFTTMGYLFFESTTYRGIKNVLRAIWPGIFYGLGLKVFPTDLNEF 3107

3108 FSKLLVGIFEARDYKPSSRNDFVDLLLNLKKNRHIVGDRLQKTTTGDEGADSKFELEVDD 3287

3288 GLLVGQCLAFFSAGFETTSTISNFTLYELAKNPDVQKRAQKEVDEYIKKHNNKLDYDCVK 3467

3468 ELPFVEACIDEALRLYPVLGVLTREVMEQYTFPTGLTLDKGDRVHIPVYHLQRDPEYFPE 3647

3648 PELFKPERFYGEEKKNIRPFTYLPFGDGPRICI 3746 (1)

 

>CYP6AE5 (old CYP6AEff) BAAB01096775.1 91% TO CYP6AEee

BAAB01211363.1 51% to CYP6AE1 Depressaria pastinacella 86% TO CYP6AEdd

AADK01005027.1

MILTIIFILSLCVYILYRISTRKFDYWQKKNVNYVQPTPFLGNYSGYILLKENLL

DVVYNLSKLFPNDPYVGAFFGTKPTLIVKDPEFIKLVLTKDFFYFTGKECFEYTHKEVIT

QGIIFTYGDRWKVIRQNVTPLFSSSKMRSMFRIIEHCSGVFENLLD EESLAPEVEMKSLM

SRFTMDCIGGCVFGVDINAMQEPKDNIFTTMGCLFLETTTSRGIKNVVKAIWPEIFYGLG

FKVFPTDIHKFFSKLLVRIFEARDYKPSERTDFVDLLLNLKKNRHIVGDRLQKIKTGDEG

ADSKFELEVDDGLLVGQCLAFFSTGFETSSTISNFTLYELAKNPDVQKRAQKEVDEYIKK

HNNKLDYDCVKELPFVEACIDEALRLYPLFGVISRQTGERYTFPTGLTLDKGDRVHIPVY

HLQRDPEYFPEPELFKPERFYGEEKKNIRPFTYLPFGDGPRICI (1)

11263  GMRFAKMQILAGLVTILKKYTVQLADGMPETIDIEPKAIVTQPAISLRLKFVPRNDLQKRIFA* 11069

 

>CYP6AE6P (old CYP6AEee) BAAB01091974.1 93% to CYP6AEff

BAAB01178163.1 = C-term exon 50% TO BAAB01091139.1

BAAB01149335.1 = N-TERMINAL (MOST PROBABLE SEQUENCE TO COMPLETE GENE)

1055 MILTIIFILSLCVYILYRISTRKFDYWQKKNVSYVEPAPFLGNYSGYILLKENLLDVVHN 1234

1235 LSKLFPNDPYVGAFFGTKPTLIVKDPEFIKLVLIKDFYYFHGREGSKYTHNEVITQGIFF 1414

1415 TYGDRWKVIRQNLTPLFSSSKIRNMFRTIEKCSGVFENLLD 1537

     EESLAPEVEMRSVMSRFT

MDCIGGCVFGVDINAMQEPKDNIFTTMGCLFLETTTSRGIKNVVKAIWPEIFYGLGFKVF

PTDIHKFFSKLLVRIFEARDYKPSERTDFVDLLLNLKKNRHIVGDRLQKIKTGDEGADSK

LELEVNDDLLVAQCVSFFIAGFETSSNSLTFTLYELAKNPDVQKRAQKEVDEYIKKHNNK

LDYDCVKELPFVEACIDEALRLYPLFGVISR (frameshift)

DKRERYTFPTGLTLDKGDRVHIPVYHLHHDPEYFPEPELFN

PERFYGEEKKNIRPFTYLPFGAGPRVCI

GERFAKMQMLAGLVPILKRYTVRLAEGMPETINFEPKAIASQPNIGVRLNLLPRNN

 

>CYP6AE7 (old CYP6AEdd) 53% to 6AE1 BAAB01210600.1 BAAB01149335.1  87% to CYP6AEaa

Bmb021626 from Li Bin

MILTIIFILSLCVYILYRISTRKFDYWQKKNVSYVEPAPFLGNYSGYILLKENLLDVVHN

LSKLFPNDPYVGAFFGTKPTLIVKDPEFIKLVLIKDFYYFHGREGSKYTHNEVITQGIFF

TYGDRWKVIRQNLTPLFSSSKIRNMFRTIEKCSGVFENLLDEESLAPEVEMRSVMSRFTM

DCIGGCAFGVDTNAMQEPKDNIFKTMGYLFFESTTHRGIKNVFKAIWPEIFYGLGFKVFP

TDLNEFFSKLLVGIFEARDYKPSSQNVFINLLLNLKKNRHIVGDRLLKIKTGNVRAESKI

KLEVDDELLVSQCVAFFIAGFETSSTISSFTLYELAKNPDVQKRAQKEVDEYIKKHNNKL

DYDCVKELPFVEACIDEALRLYPVLGVLTREVMEQYTFPTGLTLDKGDRVHIPVYHLQRD

PEYFPEPELFKPERFYGEEKRNIRPFTYLPFGAGPRTCI (1)

GQRFAKMQMLAGLVTILKRYTVRLAEGMPETINFEQRAIVTQPNIGIRLNLLPRNN*

 

>CYP6AE8 (old CYP6AEgg) 53% to 6AE1 BAAB01205437.1 BAAB01169820.1 (52% to BAAB01211364.1)

BAAB01196919.1 contig74655, 50% to 6AE1 might be the last exon

     MFLLINICVILFVIYYLVTKKYSYWRNRNVSHEKPVLLLGNYGDLILQKKNFGEMAQAIC

1747 QKFPGEPVVGAFFGTEPVLIPQDPEVIKTILTKDFYYFNGREISEHVHKELLSYNLFA 1574

1573 TYGDEWKILRQNLTPIFSTAKLKSMFTLIEKCSKSFQNLLEDETKISKELEVRTLMQRFT 1394

1393 IECIGSCIFGVDTDTLGNDKMNPFKAAGSQLSDFSRLVFVKGIVRAIWPTLFYALGFKTF 1214

1213 TTELDIFKKLVNAVFAQRKHKPTTRNDFVDLILTWKNNNTITGDSIGSFKNSDKTKFSI 1037

1036 DVNDDLLLAQCLVFFAAGFETSAMTSSYTLHELAKNQRALKKACDEVDAYLLRHGNKV 863

 862 NYDCVTELPYLEACIEETLRLYPVLGIITREVMEDYVLLDKIHLKKGDRIHVPVFHLHH 686

 685 NPEHFPNPEEYRPERFYGEEKRKVKPYTYLPFGEGPRICI 566 ()

2262 GMRFAKMQSIAGLITILKKFRLELPEGAPTKIEFKPEAFVTTPKDLIKIKFLEREGWQQRVF 2077

 

>CYP6AE9 (old CYP6AEhh) BAAB01207183.1 BAAB01033068.1 50% TO CYP6AEdd

Bmb007891 from Li Bin

530 MYLLTFLLHFILFLVLVVYYITKRNHDYWKNKKVPFEKPYPILGNYGDYILLKIF 366

365 FGNVTKRLCEKFPESPYVGTFYGTEPALVVQDPDLIRLILVKDFFYFHGREVCKYVDREI 186

185 TTQSIFFTYGDDWRVLRHSLTPLFTTSKMKN

MFHLIKNCCRVFENVLEEKATVKSFEINSIIKYFIMDCVGACLFGVEINAMEKHSQNPIV

KISKEIFPKSTTRGIINVLRGIWPSLFYALRLKLFPEVITMFFTDLLECAFKDRQRNLSQ

RQDFIDLFMKLRQNKYLVGDSIPDIKNGRVQKVNLEVTDELLIAQCVSFFGAAFETSSTT

LSLTFYELAKNPKYQKKAIEEVDDYFAKHNNEIEFDIVSDTPYLNACIDETLRLYPSLAN

LTREVMEDYTFPTGLKVEKGTRIHIPVYHLQRNPKYFPEPNKFDPRRFLPEAKQTIYPFT

YMPFGEGHRICIAMRFAKMQMLAGFATLLKKYEVAVDDTTPQELTIDPRIIVTTPIENIQ

LKLIARQHAL*

 

FOUR NEARLY IDENTICAL C-TERMINAL EXONS FOR CYP6AE SEQUENCES.

>BAAB01132789.1   Bombyx mori DNA, contig535134, whole genome shotgun sequence

          Length = 2056

352 GMRFAKMQILAGLVTILKKYTVQLADGMPETIDIEPKAIVTQPAISLRLKFVPRND 519

 

>BAAB01134380.1   Bombyx mori DNA, contig538543, whole genome shotgun sequence

          Length = 2850

218 GMRFAKMQILAGLVTILKKYTVQLADGMPETIDIEPKAIVTQPAISMRLKFVPRND 385

 

>BAAB01199026.1   Bombyx mori DNA, contig760785, whole genome shotgun sequence

          Length = 626

460 GMRFAKMQILAGLVTILKKYTVQLADGMPETIDIEPKAIVTQPAISLRLKFVPRN 624

 

>BAAB01136322.1   Bombyx mori DNA, contig542966, whole genome shotgun sequence

          Length = 1880

1736 GMRFAKMQILAGLVTILKKYTVQLADGMPETIDIEPKAIVTQPAISLR 1879
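
The four exon entries above can be compared directly. The following sketch checks that each coordinate span covers three bases per residue and counts residue differences against the first entry over the shared leading region. The peptides and coordinates are copied from the entries above; the function names are hypothetical helpers introduced for this illustration.

```python
# (start, end, peptide) for the four C-terminal exon entries listed above.
exons = {
    "BAAB01132789.1": (352, 519,
        "GMRFAKMQILAGLVTILKKYTVQLADGMPETIDIEPKAIVTQPAISLRLKFVPRND"),
    "BAAB01134380.1": (218, 385,
        "GMRFAKMQILAGLVTILKKYTVQLADGMPETIDIEPKAIVTQPAISMRLKFVPRND"),
    "BAAB01199026.1": (460, 624,
        "GMRFAKMQILAGLVTILKKYTVQLADGMPETIDIEPKAIVTQPAISLRLKFVPRN"),
    "BAAB01136322.1": (1736, 1879,
        "GMRFAKMQILAGLVTILKKYTVQLADGMPETIDIEPKAIVTQPAISLR"),
}


def codon_span_matches(start, end, peptide):
    """True if the coordinate span covers exactly three bases per residue.

    abs() lets the same check work for entries written on the reverse strand,
    where the start coordinate is larger than the end coordinate.
    """
    return abs(end - start) + 1 == 3 * len(peptide)


def mismatches(a, b):
    """Count differing residues over the shared leading region of a and b."""
    return sum(x != y for x, y in zip(a, b))


reference = exons["BAAB01132789.1"][2]
for name, (start, end, pep) in exons.items():
    print(name, codon_span_matches(start, end, pep), mismatches(reference, pep))
```

Because the shorter entries are truncated at the end of their contigs, the comparison is limited to the residues both sequences share.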

 

Other CYP6 family like sequences

 

>CYP6AN2 65% to 6AN1

join with BAAB01065359.1  C-term exon 67% to 6AN1

Bmb032387 from Li Bin 361-441 65% to 6AN1

AADK01063079.1, AADK01036854.1

Whole sequence known but confidential

KEIQCEIDEVLSRHDNKLCYDAILEMPLLT

MAFKEALRMFPSLGNLHRVCTRSYTIPELGITIDPGVRIIIPAQAIQNDAKYFDDPSEFR

PKRFAKDSEIKKFSFLPFGAGPRNC

205 GARLGEMQSLAGLAAILHKFSVEPAPSTVRKLRVKHTQNVVQGVEGGLPLLIKERK* 375

 

>CYP6AU1 new subfamily 40% to 6B1 43% to 6B27 BAAB01129923.1

AADK01004964.1

MLTAYLLYVFASIVVLIYFYLNNKYKYWKNKNIAGPEPEFLFGNLKESFHRRKHIATVFK

EIYDQYPDEKVVGMFRMTSPTLIVRDLDIVKQIMIKDYAKFNERGIKFSKEGLGDNLFHA

DADAWKAVRSHLTPMFSSGKLKNMVRVLSQTGDRFVDHVTNEVRLRPEQELVSLFLKYNI

ATIMACSFSMETDINDMQVYEMLNVAVFENSYISELDMVFPGILRKTNRSIFGPTVNNFC

YEIVEFIKNERNGKPANRNDMMDMLLGVNCDKLKMKNGSEDFVEITEHVMAGQVFVFFAAG

YVNNTITLTFSLYHLAKDQSIQ

ERLMREIETVLENHNNVLTLEAINEMSYLEMIYLETLRMHPTTNTLQRSALEDYTIP

GTDIEIEKGTLVLIPPLAFHHDEKIYPEPEKFDPERFSVENHKSRHACAFLSFGIGPRTCIG

 

>CYP6AV1 new subfamily 44% to 6AB1 BAAB01020563.1 BAAB01140967.1

Bmb035757 from Li Bin

MLLLIIVIIILYLYTTRNHSYWAKRGVKHERPIPFFGNHLPNILLQRDNTDLGLQMYNAY

ENEKIVGYYIGNIPQLIIRDPEIIKHMLNIDFNSFSERGLNLNPKKESLLNNLFFVSG

DNWRRMRNIFSGAFTSVKLKAVFPLIVKCTDKLVKKTTMRHVSDEIKTYELMTGYTMEFIAS

CGFGVDVDTISESNMHFVDVARSFFDKTSLQTFMINLDEICSILKIIPGTFEVNSEFNSF

ILNITKKVLAERNFKPSERNDFVDHLLQMVYWNGNEKLTTEADLESDSWFVAAQVFLFLV

AGFETSAVATSVVLHQLAYNQNLQDEVRKEIDSALKKYDNKLCYEAVCEMSLLDMTLKES

LRIQPPAGFIRRRCIKEYKIPGTNIAIDPGVKILIPIKALNHDPKYFDCPSEFRPQRFSP

EAEQNIPKFVYMPFGSGPRTCP ()

GARLASLQSMAGLASLLHHFVVEPTPNTSRHYKIKRSAFLVQIIEGGIPLYFKPRTNK* 3673

 

>CYP6AW1 new subfamily 39% to CYP6P5 aa 125-502 BAAB01195862.1

BAAB01153386.1 CK496233 EST

AADK01000856.1

MWLSVLLVVILLFLILLLTVFHYTTKGRKYWLLRNVPYREPCPLFGNFGATFTMRRSYTK

MLQFFYDNYYNEKYVGIFQARRPTLMLIDLELVKNVLSKEFPNFSDRISVTTDTQREPL

LRNLANMSGAEWKAMRQIVTPTFSSAKMKAMFPLIAECAQTLKNSLLKESLAEVNVPDFMTRFTTD

VIGSCAFGVDPGSLKDPESPFLKMSQKMFKIDRSTVLKRYCRTFFPKLFKFLNLRTYSKD

VETFFTTIIKKVLDERRATGVQRHDFLQLMLNVQKTETSFVMTDTLIISNSFIFMLAGLE

SSSTTLSFCLYEWAKDKHIQDDLRTEMVDCLERYGGINYEAVCSMRWVN

QAVLETLRLHPPTPLTTRLGTSACTLNGTDLSVRVRDPVLIPLYCIQRDAQHFPNPDKFNLERFKETNPPG

FLAFGEGPRSCPGARFAQLTVAAALAALLSSFEIEPCSMTTSTIVYDPRSVMLKNKGGIW

LKFVPL

 

>CYP354A1  Bombyx mori (silkworm)

           BAAB01200346.1 BAAB01118814.1

           BAAB01157630.1 BAAB01008306.1

           Bmb029934, Bmb010351 from Li Bin

           AV399740.1 EST N-term 203 amino acids

           BAAB01211873.1 exons 1,2

           AADK01025884.1 exons 3,4,5,6

           BAAB01157630.1 exon 7

           AADK01034838.1 exons 7,8

           AADK01018486.1 exons 8,9

           This sequence assembled from genomic DNA and ESTs

           A tree of 155 CYP6 and CYP9 sequences places it outside CYP6 and CYP9, but

           it is a CYP3 clan member.

           The stop codon in exon 4 seems to be a sequencing error

           when compared to the CYP354A2 cDNA sequence.

MNFSPGTVQILQFIQNDWKL

605 ILILTLLIFIYYYYTNTFDYFEKRGVPFKKPIIFLGNLGPRLKAVKSFHQYQLDIYQYFKGHPYG (1) 799

1277 GTFDGRRPVLHILDPELIKAIMIRDFDHFTDRNTLNSMEPRYLSRSLLNLK (0) 1429

4244 GLEWKGVRSTLTPAFSSSRLKNMIPLIQQCSKQMVEFLKKF (1) 4027

2588 DGKEIEMKQTMGHFTLEVIGACAFGIKCDALSNENSRFVK (0)

1717  VAEKFDYMPKYKRVILLMLLVFMPKMIR*LRLSFLNIEYTG  1598

ELVRMLQAAKAERRSSESK (2)

KGDFLQILIDFAAKETAQNDTAKREI (1)

LLDDDTIDAQSLLFLIAGY

ETSSTLLSFAIHVLATKPDLQETLRAHVQEMTKGKEMSYELLAQMDYLEAFLQ (1)

ETLRIYPPVARVDRICTKPYIIPGTTVHVGV

GDAVAIPVYGLHMDEDIYPEAREFKPERFMDDQKKDRPSHLYLPFGAGPRNCI (1)

GLRFAMISAKIAMVALMKNFKFSVCSKTMDPIDFDKRAVLLKSAKGLWVRIELIDLS*

 

>CYP354A2 AB265182.1 Antheraea yamamai

FTLEVIGACAFGIKCDALSHENAYFYKVAENFDYMPKIKRVLIF

LCMIFMPKLLTYLNVSFLHLKSTDELVRMLQAAKAERKRLNSRENDFLQILIDFAKKE

YTELENTNTAKYLDDDTIDAQCLLFLIAGYETSSTLLSFAIHELAINTQLQSKLRAHI

KEVTDGKEISYELLSELTYLDGFLLGDAVAVPVYGIHMDPKYYPEPHELRPERFMHSE

KKERPSHLFLAFGSGPRSCIGSRFAMISAKTAMMSLMKNYKFSTCSQTTYPIEFDKRS

VLLKSETGLWVRFEPL

 

CYP9A subfamily sequences. 

 

>CYP9A19 old CYP9Aqq 66% to 9A13 BAAB01209004.1 BAAB01182505.1

BAAB01064142.1 BAAB01170681.1 BAAB01057984.1 BAAB01037494.1

Bmb022413 and Bmb022414 From Li Bin

MIIIIWTLVIGLALLLYLKQTYSYFSKHEIKSVTPLPILGNMGKIVFKFNHLVDDISQLYNNFPEER ()

FVGRYEFVNPVIYIRDIEIVKRITIKDFEHFLDHRTIVNEDSDPMFGRNLFSLK ()

GQEWKDMRSTLSPAFTSSKMKLMMPLIVEVGEQMINAIKENIKNS ()

GVGYVDIDTKDLTTRYANDVIASCAFGLKVDSLTEENNRFYAMGKAATNFSFKQILMLLGFISFPKLMK ()

MTKFRLFSEETSGFFKELIMGTMKDREMRKIIRPDMIHLLMEAKK ()

GKLVHDNKSSKDTDAGFATVEESAVGKKQIDR ()

VWTDDDIIAQAVLFFIAGFETVSSAMTFLLHELALNPEVQEKLVEEIKENKERNNGKFDYNSIQNMAYLDMVVS ()

ELLRLWPPAVSMDRICVQDYNLGKPNDKAKRDFI (0)

LRKGTGVAIPVWAFHRNPEFFPDPQKFDPERFSEENKHNIKPFTYLPFGVGPRNCI (1)

GSRFALCEVKVMAYQLLQHMEISPCEKTCIPSKLSKEIFNLRLEGGHWVRLKIRD*

 

>CYP9A20 old CYP9Arr 65% to 9A13

BAAB01013356.1 BAAB01000958.1 BAAB01096964.1

BAAB01123933.1 BAAB01125489.1 BAAB01165057.1 CK511220 EST

may be a hybrid sequence 68% to 9A13  

joined Bmb025541, Bmb025542, Bmb025543

MILLIWAVVLIAAFVLFYKQAYSLFSKHGVKGFTPLPFFGNMGRIVIKMDHFSDHIQSLYDSFPEER ()

FVGRYEFLNPMVIIRDIELLKKITVKDFEHFLDHRTIINKDTDPFFGRSLFFLR ()

DQDWKDMRSTLSPAFTSSKMKLMMPFIVEVGEQMNKALKQRIQEAG ()

VGYVDIDSKDLTTRYANDVIASCAFGLKVDSITEENNQFYAMGKAASTFNFRQLLIFFGLASVPKLV ()

KILRITLFQKEIKTFFRELILGTMKNREAQNIIRPDMIHLLMEAKK ()

1129 GKLRHDEKSTKDSDAGFATVEESSVGKKDINR 1224 ()

1878 VWTDDDLVAQAVLFFVAGFETVSSAMTFL 1964

     LHELALNPEVQEKLVEEIRENEKNNNGKFDYNSIQNMVYLDMVV

SEVLRLWPPVIALDRMCVKDYNLGKPNDKSKEDFI ()

IRKDVAVGIPVWGLHRDPEFFPNPLKFDPERFSEENKHNIKPFSYMPFGLGPRNCI ()

GSRFALCEVKVMTYQLLQHMEISPCEKTCIPSKLSKETFNLRLEGGHWIRLKIRN

 

>CYP9A21 old CYP9Ass BP115106 EST BAAB01063049.1 BAAB01018306.1

BAAB01114672.1 BAAB01001005.1 BAAB01079446.1

joined without overlapping fragment, but = complementary frags

could be a hybrid of two genes 66% to 9A13

MIIIIWTLAIGLAFLLYLKQIYCYFSKHEIKSITPLPILGNMGKIVFKINHFVDDISQLYNKFPEER ()

FVGRYEFVNPVIYIRDIEIVKRITIKDFEHFLDHRTIVNEETDPIFGRNLFSLK ()

GQEWKDMRSTLSPAFTSSKMKLMMPLIVEVGEQMINALKKNIKNSG ()

VGYVDIDTKDLTTRYANDVIASCAFGLKVDSLTEENNQFYAMGKAASNFSFKQILLLLGFISFPKMMK ()

MTKFTLFSEETSGFFKELIMGTMKDREMRKIIRPDMIHLLMEAKK ()

GKLVHDDKSSKDTDAGFATVEESAVGKKQIDR ()

VWTDDDIIAQAVLFFVAGFETVSSAMTFLLHELALNPEVQDKLVEEIKENKERNNGKFDYNSIQNMVYLDMVVS ()

EXXRLWPPGVSLDRLCVQDYYLGNPNEMAIRDFI ()

LRKGTGVAIPVWAFHRNPEFFPDPLKFDPERFSEENKHNIKPFAYLPFGVGPRNCI ()

GSRFALCEVKVMAYQLLQHMEISPCEKTCIPSKLSKETFNLRLEGGHWVRLKIRD

 

>CYP9A22 old CYP9Att 75% to 9A14 aa 421-529 BAAB01150296.1 AU002409 BAAB01097088.1

BAAB01211661.1 72% to 9A12 cannot walk upstream any further

join with N-term from BAAB01034175.1 (might be a hybrid seq)

MITLIWLGVLLVTLTLHLRKVYSRFKDYGVNHFTPIPVLGNAGPITVRLRHVAEDFDMVYKAFPEDR (2?)

FTGRFDLLRPTVIIKDLDLIKQITIKDFEHFLDHRALVDDTADPFFGRNLFSLR ()

GQEWKDMRSTLSPAFTSSKMRGMVPFMVEVNNQMIDMIKKKIVANA ()

GYLDCEGKDLTTRYANDVIASCAFGVKVDSHTNEENQFYLMGRDMADFGFRKIMVFLGYSSFPKLMK ()

KFNAKLLSDETGHFFTDLVLRTMEDREVKEIVRPDMIHLLMEAKQ ()

GKLSYDEKSTKEADTGFATVEESDVGKKTINRI

WSNTDLIAQATLFFVAGFETISSAMSFALHELALNPEIQDRLVQEIKENYAKTGGKFDFN

CIQDLTYMDMFVSEVLRLWTPVVGMDRLCVQDYNLGRANKNATKDFI ()

LRKGEGLSIPTWSIHHNPEYYPEPYKFDPERFSEENKRNIKPFTYLPFGTGPRNCI ()

GSRFALCEVKVMLYQLLQQIEVLPSDKTKVRAKLAKDTFNVKIEGGHWIRLKLRD

 

CYP9G subfamily sequences

 

>CYP9G1 intact sequence submitted by Manabu Kamimura 3/22/99

BAAB01081406.1 BAAB01087326.1 BAAB01125517.1

first exon (69 amino acids) known but confidential

Bmb008984 from Li Bin

RYVGFTDGVCPGIFIRDPEIIKHVTVKEFDHFVNNKDLSPEGEESILKNSLIMLK (1)

DEKWRKMRAALSPAFTASKMREMVPLITEISHNIVEYLK ()

EHLTEDIDLDDLMSRYSNDVIASAAFGLQINSLKERDNIFFKAGKDVFNFSLFQSIRMIFSDHFPSLSK ()

KLGFTVIPKSTSEFFRTLIASTVDYRIKNKVERRDMIQLLMQLST (1)

EWTETELAAQLFVFFVAGFETTGNTLINCIHELALNPHIQDILYEELKAFKETKGNLVYENIGELKYLDCVLN ()

ETMRKWSAAIFVDRICTKPYVLPPPREGGKPCQ ()

LKPGEVIYNAVNSIHMDPKYYEEPEKFIPERFLDENKHKIKPFTFMPFGVGPRYCI ()

GSRFALMEMKILLFRLMLNFKVLKCAKTLDPIKMSPVGFNMNIWGGSWVKFQARKA*

 

>CYP9G3 59% to 9G1 C-term corrected by EST CK508550 BAAB01034600.1

N-term corrected by EST BP125176 AU000984 CK510649 (4 aa diffs) BP127468

BAAB01164136.1 BAAB01189844.1 BAAB01100167.1

GTLK exon is extra compared to CYP9G1, but it is seen in ESTs

The length of this sequence in other insect species ESTs is highly variable.

MLVEVIVFLITTLVAYYLYVYKKIHYFYDARGVKYQPGIPVLGNILKSSLGTGHFWEDIDKIYKAFPGER ()

YIGYIEGTTPILMIKDPEIIKNITVRDFDHFVNHKEFFPVEIDALFGGSLFMMK ()

DDKWRDMRTTLSPAFTGSKMRLMLPFMIDISKNIVEYLK ()

GHQLEDVDVDDLMRRYTNDVIASAGFGLQVNSLVDKDNEFYECGQAMFSTSWPQRFKMILAAQFPTLAK ()

KIGIKVFPQKVTRFFREIVTSTMDYRLKNNVERPDMIQLLMDAYK ()

GTLKNESNESDEKNVGFAMTEEMLKPKGNVR ()

KWTQDELTAQVFIFFLAGFESSANGLTLCIHELALNPEAQEKLYEAIIKFKEEKGPLTYDNIGELKYLDCVLN ()

ETSRKWSAAIIVDRVCSKPYELPPPREGGKPYK ()

LQPGDIVYNSVNSIQMDPEHHPDPEKFDPDRFLDENKHKIKPFTFLPFGAGPRNCI ()

GSRFALLELKVLIYYIVLNFKIIKTEKTLSPIKLQPGEFNIKVWGGTWTQFEARE*

 

>CYP9G3-de9b10b BAAB01034600.1 pseudogene detritus exons downstream of CYP9Gxx

2492 DFLFNVANSIHIDPIYHTEPDTFNPARFLVENEYQMREFTFMPFSIGPRNFF

4065 GKGFALL*IKLILYYLVLNFKVMKFMKTQNPIKLIQHEFNLEV 4193

 

Other CYP9/CYP6 like sequences

 

>CYP9AJ1 N-term 34% to 9A20 N-term, 76% to EB742536.1 Antheraea mylitta

AADK01023492.1, AADK01016060.1, BAAB01077546.1, BAAB01122451.1

Whole seq is known but confidential

3622 MLWYVVLIVTCLFYYSSRRLRYFSSRGIATLPTTPFFGNLTAVTFGRENFVEAIAKGYDAFKD 3458 ()

1564 RYFGLYQYLVPTLVPRDPELIRQILVRDFSAFADRGVHIDEECDPLFGRNLIMLR 1400

4355 GSKWRSMRVALSPAFSGARCRNMAPLMVESAKSVTNHLQKRIIDEKVIDINV (0) 4200

3464 ITMSYVNDVIASCAFGFAVDSLKEPDNCIYKLGQKAIIQDTTQVMKFFGYENIKTVMK

3290 VNPRLEYILGILKTQKKASKXXXXXXXXXXXXXX

     RPDFIQILVDAMQGMLIYE (2) 2694

 

>CYP9AJ1 BAAB01063629.1   CYP9A like I-helix region

(1) 617 FTDDDLVAQAVLFYVAGYDTTANLINYFLYEMAVNPHVQEKLNQEIAEMSDEGDVYEAIQRLKYLEMCVC 408 (1)

 

>CYP9AJ1 C-term BAAB01173477.1 BAAB01146424.1 BAAB01001094.1

41% TO 6G2 374-514 45% to 9E2

Bmb023483 from Li Bin

AADK01003334.1

MSPQPNNLSPLDILINVNKIYKTKYIKYIKLVEPHSVNE

EVLRLWPLVGSADRRSVIPYDFGPTYPDSKHSLI

APTGIHIWIPIYSIHRDEQYWPDPNTCIPERFSPENKGSIVPYTYLPFGTGPRHCI

GSRFAILAAKVFLVKFLKTYRTKTHREKTKLSPRAFILRPRDGFELTVERRIEND

 

>CYP9AJ2 Antheraea mylitta (wild silkmoth) from fat body

         EB742536.1

         76% to AADK01023492 seq above

RYFGLYQYLVPTLVPRDPELIRQIMVRDFHAFADRGVHISADCDPLFGRNLIMLT

GSNWRSMRVSLSPAFSGARCRCMAPLMADTARAVAQHLKHHITHEQLIDINT

ITMAYVNDVIASCAFGFAVNSLEDLQ

NGIFRLGQRAIVQDTT

 

CYP12- or CYP333-like mitochondrial sequences.  CYP333 seems to be the lepidopteran

(moths and butterflies) equivalent of CYP12 in flies and mosquitoes (Diptera).

These sequences will probably be in CYP333.

 

There are at least four different C-terminals here and some

N-terminal fragments, but it is not clear which N-terminals belong to

 which C-terminals.

 

>CYP333B1 39% to 12B2 mito BAAB01158176.1 BAAB01035519.1

BAAB01083643.1 BAAB01120305.1 39% to 49A1 39% to 12F2

47% to CYP333A1 75% to combined (BAAB01158865.1 + BAAB01136746)

N-terminal is upstream of a repeat seq.  I cannot identify it.

Bmb006234 from Li Bin

AADK01000081.1, AADK01010165.1, EST BJ985225.1

MIIALHYSKLRPVLNFSCLQQCVR ()

TVTVSAATEKLQQTELKSFREIPGPSSLPIMGPFLHFMP

GGSLHNINSTELTHKLYDIYGPIVRIDSMFSKDAIVLLYDAESAGI

ILRNENNMPIRISFKSLSYYRQKYKKSENDRTDRPTGLVSD ()

HGELWKSFRSAVNPVLLQPKTIRLYSSALEEVATDMVERL ()

RSLRDENNRIRGQFDQEMNLWSLESIGVVALGNRLNCFDSNLQDDSPVKRLIECVHQ

MFVLSNELDLKPSIG (1?)

QLNYTKNIFKTRLIYFSLTKYFIKKALDDIKMNKSKSDDEKPVLEKLLDINEEYAYIMASDMLVAGVDT (0)

TSNTMSATLYLMAINQDKQQKLREEVMSKNGKRSYLRACIKEAMRILPVV

SGNMRRTTKEYNILGYHIPEN ()

VDIAFAHQHLSMMEKYYPRPTEFIPERWLTNKSDPLYYGNAHPFANSPFGFGVRSCI ()

GRRIAELEVETFLSKIVENFQVEWSGSSPRVEQTSINYFKGPFNFIFKDL*

 

&&&&&&&&&&&&&&&&&&

 

>CYP333B2

BAAB01158865.1 AADK01000081.1 BAAB01179470 BAAB01136746.1 BAAB01210389

whole seq 61% to 333B1

MIGLYILFITFFFLFQNLIYCQ (1)

ATVSTSENVDVTNLKPFHEIPGPSSLPLIGPLLHFIPG (1)

GSLYTPDTKDFSAKLFKLYGPIVKLDPLFARNTLLMVYDPESAAN (0)

VLRSENRIPYRGGFNSLAYYRKHIKKHENNHKKLTGLITE ()

GEEWWDLRSTVNPVLLQPKTIKLYSAAIDEVAQEMMNR (2?)

MHRKLDENDRLQAKFDDEMNLWALESIGVVAFGIRLNCFDPNLAENSPEKKLIECVHQI

1459 FNLSNQLDFQPSLWHIFSTPTYKKAMKMFQLQEE 1358 ()

 545 LSKYFINKAMRNINKNENKPDEQKGVLEKLLDINEEYAYIMATDMLVAGVDT 390 (0)

 176 VANSIAATLYLFAKNPEKQEKLREEVMSKESKRYLKACIKETMRMMPVVSGNLRRTT

4119 KEYNILGYHVPKG 4081 (0)

3451 IDVAFAHQDLSSMEEHYPRPTEFIPERWLADKNDPLYYGKAHPFVMAPFGFGVRSCI 3281 ()

2571 GRRIAELEIETFLTKILENFRVEWYGPPPKIIQTSINYFTGPFNFVFNDIKKK* 2410

 

&&&&&&&&&&&&&&&&&&&

 

>CK494024 EST seq for N-term KYG motif CK493643 CK540020

BAAB01054952.1 BAAB01013929.1 BAAB01206514.1 28% to 49A1

MKMAKSTVVIRQSL (2)

LQRQCRRHIAGSSSSRTSTSPQRRNAAATASATATCLKPFNEIPGPMALPMLRHSAHILPKI (1?)

GNFHHTVGLDVLENLRKKYGDLVRLSKATRTRPVLYVFHPEMMRE (0?)

VYES

 

A middle region fragment

 

>BAAB01019704 Length = 2835 CYP12 like WALES motif (at ETAM site)

42% to CYP49A2

1227 RLEELRCKGNVLNEELETEIYRWALETVGMMLFGIRLGCLDARYVTQSMHTF 1072

 

&&&&&&&&&&&&&&&&&&&&&&

 

>CYP333A2 BAAB01120524.1 Length = 1355 38% to CYP49A1 C-helix region

 126 VYRAEEANPLRPGFQVLDYYRTQLRKSRYGGLHGLINA 239 ()

1099 QGPEWREFRTKVNPALLLPKLVKLYAPGIDEIAQDFVQRYLFSFVIKYYNCVCNFLCSSIIPLR 1290

 

>CYP333A2 mitochondrial P450 CYP333 family I-helix aa 294-337

BAAB01104455.1 61% to CYP333A1

3  DNFELELTKFSLEATALVALGSRLGCLKDSLDSDHPARRLMKSTRDIFELTYKLEIRPSP 182

183 WRYIATPAYKMVIEAYDTQWE 245 ()

VSKYYFRISMMYINHARKKLEQRGYDIPEEEKSVLEKLIAIDEKVAVMMASEMLLAGIDT 753

 

I strongly suspect these two fragments are part of the same gene, with

one exon missing between them, from the AGIDT motif to the PKG motif.

 

>CYP333A2 BAAB01160330.1  Length = 986 also ovS022C05f from KAIKOBLAST

aa 262-365 75% to CYP333A1 BY922830.1

482 IDVIAPNEYLSRSEKFYPQPEEFIPERWLVEKSDPLYYGNAHPLVTLPFGFGVRSCIGRR 661

662 IAELEIELLIKRLIEEFKVTWNGPPIKIVNKLTNTFVKPYNFTFTSVK* 808
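
A minimal sketch for sanity-checking lines of this kind. It assumes the flanking numbers are nucleotide coordinates on the contig, so the span should equal three times the number of residues, with a trailing * counted as one stop codon. A start larger than the end is read as the (-) strand. The function name span_matches is hypothetical, not part of any existing tool.

```python
import re

# Expected layout (an assumption): <start> <peptide, optional trailing *> <end>
LINE = re.compile(r"^\s*(\d+)\s+([A-Z]+\*?)\s+(\d+)")

def span_matches(line):
    """Return True if the coordinate span equals 3 nt per residue (stop codon included)."""
    m = LINE.match(line)
    if m is None:
        raise ValueError("expected: <start> <peptide> <end>")
    start, peptide, end = int(m.group(1)), m.group(2), int(m.group(3))
    # abs() lets the same check cover (+) and (-) strand lines
    nt_span = abs(end - start) + 1
    return nt_span == 3 * len(peptide)

lines = [
    "482 IDVIAPNEYLSRSEKFYPQPEEFIPERWLVEKSDPLYYGNAHPLVTLPFGFGVRSCIGRR 661",
    "662 IAELEIELLIKRLIEEFKVTWNGPPIKIVNKLTNTFVKPYNFTFTSVK* 808",
    "1227 RLEELRCKGNVLNEELETEIYRWALETVGMMLFGIRLGCLDARYVTQSMHTF 1072",
]
for line in lines:
    print(span_matches(line), line.split()[0])
```

A line that fails this check is not necessarily wrong; it may carry a frameshift, a gap, or a typed coordinate that needs a second look.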

 

&&&&&&&&&&&&&&&&&&&

 

A related Ips pini sequence (NOT BOMBYX)

 

>CB407630 JH III-treated male I. pini midguts Ips pini cDNA 40% to 12F4

CYP12 like sequence from Ips pini (North American pine engraver) (N-term)

MLCSKSFSPLIVLRNNSTISAAGVNSFKKAEVVDNAKPEGWDQAKPFDQIPGIKP

LPLIGNNFRFLPGGEFHKVQGLDITRRLQQKYGRITTLSGLFGVMPIVHLFDPNDFEHVL

RNEGPWPIRKNVECVTYYRQRVRPEIFKGVDGVALTQGEEWLRERSVVNKILMQPRTIEM

YVGSMNEVANDLLELMKHFAKKDPNSEMPDNFQNELYR

 

>CB408781 JH III-treated male I. pini midguts Ips pini cDNA Length = 464

WALES region to I-helix.  These two Ips pini cDNAs probably are from the same

gene.  They overlap by 2 aa and are the right size to cover a CYP12F sequence exactly in this region.

 

2 YRWTLESVGVVAYNRRIGCLDINMHKDSEGSRFISAVQEFFDLMYALEYRPSMWRIYSTKKW 187

188 KRFVELMDFITEINQRYINECLATIDPNSDIPDHERSALERLFKVDQQIAVVMASDMLVAGVDT 379

 

Joined sequences 37% to 12F1, 36% to CYP333B2

MLCSKSFSPLIVLRNNSTISAAGVNSFKKAEVVDNAKPEGWDQAKPFDQIPGIKP

LPLIGNNFRFLPGGEFHKVQGLDITRRLQQKYGRITTLSGLFGVMPIVHLFDPNDFEHVL

RNEGPWPIRKNVECVTYYRQRVRPEIFKGVDGVALTQGEEWLRERSVVNKILMQPRTIEM

YVGSMNEVANDLLELMKHFAKKDPNSEMPDNFQNEL

2   YRWTLESVGVVAYNRRIGCLDINMHKDSEGSRFISAVQEFFDLMYALEYRPSMWRIYSTKKW 187

188 KRFVELMDFITEINQRYINECLATIDPNSDIPDHERSALERLFKVDQQIAVVMASDMLVAGVDT 379
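
The join above can be sketched as a suffix/prefix merge. This is only an illustration under the assumption that the overlap is exact at the amino acid level; join_by_overlap is a hypothetical helper. A 2 aa overlap is weak evidence by itself, which is why the note above also leans on the fragments being the right size for a CYP12F sequence.

```python
def join_by_overlap(left, right, min_overlap=2):
    """Append right to left, collapsing the longest exact suffix/prefix overlap.

    Returns (joined_sequence, overlap_length).
    """
    max_len = min(len(left), len(right))
    # Try the longest candidate overlap first, then shrink
    for k in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:], k
    raise ValueError("no overlap of the required length")

# End of the CB407630 translation and start of the CB408781 translation
n_term_end = "YVGSMNEVANDLLELMKHFAKKDPNSEMPDNFQNELYR"
wales_start = "YRWTLESVGVVAYNRRIGCLDINMHKDSEGSRFISAVQEFFDLMYALEYRPSMWRIYSTKKW"

joined, overlap = join_by_overlap(n_term_end, wales_start)
print(overlap)
print(joined)
```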

 

CYP15 like sequence

 

>CYP15C1 40% to CYP15 made from non-overlapping pieces, may be hybrid

note: CYP15A1, B1 and C1 are all probably orthologs.  Each species

seems to only have a single CYP15.  (not found in Drosophila)

BAAB01071346.1 BAAB01157546.1 BAAB01036905.1 BAAB01022841.1

BAAB01157214.1 BAAB01068330.1, AADK01010670.1, AADK01007973.1

     MLALIALCFILFFYIISRRHRGLCYPP (1)

2393 GPTPLPVVGNLISVLWESRKFKCHHLIWQSWSQKYGNLLGLRLGSINVVVVTGIELIKEV 2214

2213 SNREVFEGRPDGFFYTMRSFGKKL 2142 (1)

GLVFSDGPTWHRTRRFVLKYLKNFGYNSRFMNVYIGDECEALVQLRLADAGEPILVNQMFHITIVNILWRLVAGKR ()

YDLEDQRLKELCSLVMRLFKLVDMSGGFLNFLPFLRHFVPRLIGFTELQEIHNALHQYLR ()

YQEIIKEHQENLQLGAPKDVIDAFLIDMLESQDDKR ()

ATLDDLQVVCLDLLEAGMETVTNTAVFMLLHVVRNEDVQRKLHQEIDDIIGRDRNPLLDDRIR ()

MVYTEAVILETLRISTVASMGIPHMALNDAKLGNYIIPK ()

one exon not identified

326 GKRRCIGEGLARSELFMFLTHILQKFHLRIPKNEPLPSTEPIDGLSLSAKQFRIIFEPRKTFKSI* 523

 

CYP18A1 sequence and a related sequence

 

>CYP18A1 60% TO CYP18A1 ORTHOLOG AU005208 EST BAAB01190855.1 BAAB01187772.1

Bmb024372 from Li Bin

MITMLTNSKILWALWQVMNYCVSRTSVMLIIVTCTALLLTQFLKLVRDIRKLPP

GPWGPPVVGYLPFLGVRHKTFLQLARNYGALFSARLGNQLTIVMSDYKIIREAFRREEFTGRPSTPLMHTLDGL ()

GIINSEGRLWKNQRRFLHEKLREFGMTYMGNGKKLMEDRIQ ()

NEIHELIVSLHRAQGAPIDVNPLLALCVSNVICGITMSVRFSNGDVRFERLNHLIEEGMRLFGEVHYGEYIPLYN ()

YLPGKALAQEKVAKNRDEMFAFYQTLIDEHRETLDINNARDLIDVYLIEIEKAKSEGRAGELFEGRDH ()

ELQLKQILGDLFSAGMETIKSSLLWMIVFMLRNPDVKRRVQEELDAVI

GRERLPSIDDISSLPYTETTILETLRLSSIVPLATTHSPTR ()

DVQINGYKIPAGSQVIPLINCVHMDPNLWDEPNKFNPSRFIDATGKIRRPEYFMPFGVGRRMCLG

DVLARKEMFMFFSCMMHQFDLEMAEGDALPSLEGIVGATIAPKAFRVKFLARSPVPLVPTTLSADSSHLRHVGSH

 

CYP18B1 sequence

 

>39% to CYP18A1 BAAB01081335.1 BAAB01007952.1

BAAB01142048.1 BAAB01138082.1 BAAB01045663.1

note: this sequence is made from several non-overlapping contigs.

MLIYTRIISTLGMWTKLDTYIELYNGYMQDILDRKHRTMDLLFLVLLGLLFLFLKRTICYYMYLPP (1)

GPWGVPFLGYLPFMKSSPHAMYSRMADKYGEISSIKLGNHLVVCLNSPKLVKELFS

RSDSIARPRTPLNEIMEGR (1?)

GIVLSEGILWQKQRQFLHEKFRALGVKVWPKQRFEKFII (0)

1739 MEIEEFITDLIKLNGAPVDPTLLLGRHVHNIICQLMMSFRFEEDDQEFGIFNEKISRGMKL 1557

1556 YGSIHASEYVRHYL 1515 (0)

KLPGKKTILDEMKRSLADISEFHANKIRERIDYRATHPYDEPADLLDYYLDNIEARKSRKRFPDIFPGVDP (1)

EKQVVQVMNDLFSAGMETSRTTLSWTLLMMIHEPDVAAKVRAQLTETVHPGELVTLDHRPELPY

LEAVLFETLRCVSLVPLGTTHVNTT ()

SEWKVDKYVIPKGAHIIPLIGKMNNDPKVYPEPDKFKPERFLRDGQFHIPDSYMAFGVGQ

RLCLGIQLARMQLFLFFANIMNRFEFSLPEGAEMPPLEGFMAATHTPLPYSLCFHKIDN*

 

CYP49A like sequences

 

>CYP49A1 44% to 301A1, BAAB01133495.1 BAAB01150577.1 BAAB01091102.1 BAAB01198981.1

BAAB01133297.1 BAAB01093561.1 possible N-term

note: C-term exon on this gene has been changed 7/20/04 because it

matches better to the CYP49A1 Apis mellifera sequence.

Bmb020819 from Li Bin

MIMINLLGAKNSMLSTCPVHIQRARSTHVVDATAFEVSSPVKPWEDVPGPKPLPLLGNTWRFTPYI (1?)

GSYSVEHIDRICVSLRAKYGKCVKMAGLLGRPDMLFVFDANEVERVFRGEDAAPHR ()

PSMPSLNYYKHTLRKDFFSAEENCAGV ()

HGDSWSAFRTKVSRVALSAGAAAQYTAPVAEVADCFVER ()

IRKIRDENMETPEDFLNEIHKWSLE ()

SLGLIALDTRLGCFEACEGSESQRLIDAV

KTFFLCVGELELRAPCGGSTPPPCSDERRCFRHYPQV ()

SVTLRHVDKALEEIKLNGSSKSLLQDLVTAAGARVAAVAALDMFLVGIDT ()

TSTAVASILYQLSSRPHVQEKIYEEVTKALQGRPMSPGDLNQMPYLKATVKEVLR ()

MYPVVIGNGRQLTKDTVICGYNIPKG ()

TQVIFQHYVMGNSEEYFKDASQFRPERWLKRTAQRHHAFASLPFGYGKRMCLGRRFAELEIHTVICK (0)

>BAAB01052936.1 possible C-term for this gene

LLQKYKLEYHYGDLEPTRSFIARPKRALKLRFIDRI*

 

>CYP49A2 BAAB01014940.1 BAAB01083735.1 51% to CYP49A1 Drosoph.

BAAB01056249.1, AADK01003288.1 first exon is a guess

MLSGACTAHYTKHIRCVIIVMKYQTTVAAALESFRNF (1)

ARCYTVMPGPRPLPILGNSWRFALGW

KPWRTKRLDLTLWCLRSLAGAGGAAKVAKLFGHPDLVFPFCAEETARIYRREDAMPHR

AAAPCLKHYKQELRKDFFGDEPGLIGI ()

HGEPWSRFRSKVSKALIAPEAARAAVPELDYVANDFVIR ()

LEHLLDLNRELPKDFLTELYKWALE ()

SVGAWALGTRLGCLNDTKTDAKEMIK

CIHGFFHSVPELELSAPLWRIYSTPAYKTYIEALDSFRL ()

LCLKRLTDKGVCAQVAKNCGQKVATILALDLMLVGVDTTAA ()

AAASSLYLLANVPRAQRALQKELDTNLPKNRILNDKDLDKLPYLKACIKEALR (2?)

MKPVILGNGRCIQSDTTISGYKVPKG (0?)

THIVFPHYVLSNEERYFPSPHEYVPERWLRENDIAG (gc boundary?)

VCRKQKEIGIHPFASLPFGFGRRMCVGKRFAEVELQLLLAR (0?)

IFQKYNVLWRHPELTYSVTPTYIPNESLKFTLNKRNE*

 

CYP301A1 sequence

 

>CYP301A1 65% to Drosophila 301A1, 64% to Apis 301A1

BAAB01030444.1 BP117781 BAAB01134599.1 BAAB01155686.1 BAAB01102862.1

BAAB01093671.1 BAAB01132355.1 BAAB01006099.1

MGRALRSFAAYARPIQLQSTRNSSSCPFSKRQRSQIAPTAELNEEIFA

NARPYSEVPGPRPIPILGNTWRMVPVIG

QFDISEFAKVTKQFLDTYGRIVRLGGLIGRPDLLFVYDADEIER

MYRREGPTPFRPAMPCLVKYKSEVRKDFFGELPGVVGV ()

HGDQWRRFRSKVQRPILQPQTVKKYVAPIELVTEDFIKYMVDARDENGDLPHEFDNDIHRWSLEC ()

    IGRVALDVRLGCLSPQLNSNSEPQRIIDAAKFALRNVAVLELKA 306

307 PYWRYIPTPLWSK YVNNMNFFVE ()

368 ICSRYINEALERLKTKKVTSENDLSLLERVLRSEGDPKIATIMALDLILVGIDT 529 ()

    ISMAVCSILYQAATRLEQQDKMAEEIRRVLPDPSKPLSYSDLDKLHYTKAFVREVFR ()

MYSTVIGNGRTLQDDDVICGYHIPKG ()

QVVFPTIVTGNMEQFVSDPLEFKPERWLEGGGKLHPFASLPYGFGARICLGRRFADLEIQVLLAK (0)

LLSRYRLEYHHEPLDYAVTFMYAPDGPLRLRMIER*

 

CYP302A1 like sequence

 

>CYP302A1 46% to 302A1 Anoph., BAAB01119552.1 BAAB01040509.1 BAAB01022270.1 CK534186

BAAB01132138.1    missing N-term or N-term not identified

AADK01003274.1

     MFVRLTVKNNIPYRARKCVYRRASENFVGSEHASKVNEQGDNLM

8340 NFEDIPGPRSYPIIGTLHKYLPLI (1) 8411

GDYDAEALDKNAILNWRRYGSLVREKPIVNLVHVYDPDDIEAVFRQDHRYPARRSHTAMNYYRTNKPNVYNTGGL ()

LRSNGPDWWRLRSIFQKNFTSPQSVKTHVSDTDNIAKEFVEWIKRDKVSSKNDFLTFLNRLNLE ()

IIGVVAFNERFNSFALSEQDPESRSSKTIAAAFGSNSGVMKLDKGFLWKMFSTPLYKKLVNSQIYLEK ()

ISTDILIRKINLFESDDSKNDKSLLKTFLQQPQLDHKDIMGMMVDILMAAIDT (0)

463 TAYTTSFVLYHIARNKRCQDEMFEELHTLLPKKDDEITADVLSKASYVRSSIKE 536

SLRLNPVSIGIGRWLQKDIVLKGYSIPKG (0?)

VIVTQNMTSSRLPQFIRDPLTFKPERWMRGSPQYETIHPFLSLPFGHGPRSCIARRLAEQNICIILMR ()

LIREFEIQWAGEELGVKTLLINKPNKPVSLNFIPRSS

 

CYP303A1 like sequence

 

>CYP303A1 45% to CYP303A1 BAAB01045446.1 BAAB01180906.1

BP126135 BAAB01031702.1

Bmb002871 Bmb015732 from Li Bin

MWLAALAVLLAVCLYLFLDTLKPRKFPPGPKWTPILGCAKEVYKLREK ()

TGYLYKAVRELSLTYCKETPVLGLRIGKDRIVMVNSLEANKEMLFNEDIDGRPKGIFYQTRTWGERRGVLLT

DGELWKEQRRFLIKHLKEFGFGRSGMGETAKLEAEHIVIDVMHMIGDRGSAVIQMHN

FFYVYILNTLWTMMAGNRYNPSDPQMKILQ

SMLFDLFAAVDMVGTAFSHFPILSIVAPTLSGYR

NFIKTHKRIWKFLREELARHKDSFQPDKEDKDFMDVYIRALREHGEVNTYSEGQ

LVAMCMDMFMAGTETTSKSMSFCFSYLVREQEVQRKAQEEIDRVVGKDRVPSVNDRPN (2)

MPYNEAIVHECVRHFMGRTFGVPHRALRDTTLAGCHIPE (0)

DTMVVSNYTNILLDENYFPEPYSFKPERFLVNGRVSLPDHYFPFGLAKHRCMGDALAKCNIF

VFTTTMLQRFSLVPVPGEGLPSLDHMDGATPSAAPFKALVIPRI*

 

CYP305A sequence

 

There is no 305A sequence.

 

CYP305B1 sequence

 

>Bombyx mori CYP305B1 mRNA AB044900 BAAB01125622.1 Bmb009083 Bmb026475

BY931349.1 EST., AADK01021449.1, BAAB01102353.1, BAAB01149133.1

BY922832.1, BP117067 ESTs, BAAB01028853.1, BAAB01141906.1, AADK01007172.1

MLPVLVCIIVIVLICCNVIRSVIKPEKFPPGPIWYPFFGSSSIV

QQMTSKHGSQWKALLELSKQWSTQVLGLKLGRELVVVVYGEKNVRQVFSESEFDGRPN

SFFYKLRCLGKRLGVTFVDGPLWREHRQFTVKHLKNVGFGKTSMELEIQNELKLLREY

INDNKHKPIKVDSMFSSAVMNVLWKYVAGERIREDKLERLLELFYLRSKAFTLTGGLL

SQIPWCRFIIPGLSGYKLIVDLNQQISEIIEEAIKKHLNKEVQQNDFIYSFLDEMNEE

NKASFTYDQLKTVCLDLIIAGSQTTGNAVKFALLSVLRNKNIQEKIFNEIENTIGDSM

PCWADSSKLVYTSAFLLEVMRIHTIAPLAGPRRVLQDTVIDGYVIPKETTVLISLADI

HLDPNLWPDPHEIKPERFIDEKGLSKSNEHIYPFGSGRRRCPGDSLARSFVFIIFVGI

LQKYRIDCVNGVLPSNEADIGLLAAPKPFVANFVSRE

 

related to BX561870.1 Glossina morsitans (50%)

GTRKGITGVDGPLWYEHRHFSMKQLRNVGFGRTPMEKHIERETDDLLAYIEGLNGLPVCP

SSFLAHVVINVLWTMVANKHFAYEDKRLEKLLNLLHRRSQAFDMSGGLLSQY

PWLRFIAPKKTGYNIICQLNNELHEFFMETIEKTQAHTDARKC

 

CYP306A1 sequence

 

>CYP306A1 mRNA for cytochrome P450 monooxygenase,

            complete cds. BAAB01049676.1 BAAB01150476.1

BAAB01016115.1 BAAB01136488.1 BAAB01075467.1

ACCESSION   AB162964

Niwa,R., Matsuda,T., Yoshiyama,T., Namiki,T., Mita,K., Fujimoto,Y.

            and Kataoka,H.

  TITLE     CYP306A1, a cytochrome P450 enzyme, is essential for ecdysteroid

            biosynthesis in the prothoracic glands of Bombyx and Drosophila

  JOURNAL   J. Biol. Chem. (2004) In press

41% to Drosophila m. 306A1 AV404609 Bombyx mori prothoracic gland cDNA

MDLYFIWLVTFVAGFWIFKKIKEWQNLPPGPWGLPIVGYLPFID

RYHPHITLTNLSKTYGAIYGLKMGSIYAVVLSDHKLVGDTFSKDSFSGRAPLYLTHGL

MNGNGIICAEGGLWRDQRKLITSWLKSFGMSKHSVSREKLEKRIASGVYEILENIEKT

SDAALDLPHMLTNSLGNVVNEIIFGFKFPPEDKTWQWFRQIQEEGCHEMGVAGVVNFL

PFIRHVSPSTRKTIEVLVRGQAQTHTLYASMIDRRRKMLGLEKPKGAEYAPHENLLKL

YPNGHIKCIKYSKVSPNTEHFFDPNTLIPTEGDCILDNFLLEQKKRFESGDPTALYMR

DEQLHFLLADMFGAGLDTTSVTLAWFLLYMALFPEEQEEIRKEILSVYPYDDDVDSSR

LPLLMAAICETQRIRSIVPVGIPHGCIEDAYLGNYRIPKNAMVIPLQWAIHMDPNVWE

EPEKFKPRRFLAQDGSLLKPQEFIPFQTGKRMCPGDELSRMLSCGLVSRLFRKQRIRL

ASKIPTAEEMRGTVGVTLAPPPVKYYCEPI

 

CYP307A1 sequence

 

>CYP307A1 51% to 307A1 probable ortholog BAAB01102325.1 BAAB01020228.1

Bmb008079 from Li Bin

MSSLIIVLFVFALAVYKLLRRKTVRWVKTNKYGGVETAILRTAPGPVCWPIIGSLHLLGG

HESPFQAFTELSKKYGDIFSVKLGSADCVVVNNLSLIREVLNQNGNVVAGRPDFLRFHKL

FAGDRNN ()

SLALCDWSNLQLRRRNLARRHCGPKQHTDSHARIGTVGTFESVELIQTLKGLT

SRSDASIDLKPILMKSAMNMFSNYMCSVRFDDEDLEFQKIVDHFDEIFWEINQGYAVDFL

PWLAPFYKKHMEKLSNWSQDIRSFILSRIVEQREINLDTEAPEKDFLDGLLRVLHEDPTM

DRNTIIFMLEDFLGGHSSVGNLVMLCLTAVARDPEVGRKIRQEIDAVTRGKRPVGLTDRS

HLPYTEATILECLRYASSPIVPHVATENANISGYGIEKGTVVFINNYVLNNSEQYWSEPE

KFDPSRFLEKTRVRTRRNSQCDSGLESDSERAPVGKPDVEREMLSVKKNIPHFIPFSIGK

RTCIGQTMVTSMSFTMFANIMQSFEVGVENINDLKQKPACVALPKNTYKMHLIPRK

 

CYP314A1 sequence

 

>hypothetical CYP314A1 ortholog from many non-overlapping fragments

49% to 314A1 BP179750 BP179443 BAAB01069738.1 BAAB01063661.1  

BAAB01099804.1 BAAB01099804.1 BAAB01122030.1 BAAB01134118.1

MQSTSPPLLDWSCVPTLVLAVIAVVVAVTALLTRTSDAKHSCR

LPGPQPLPFLGTRWLFWSRYKMNKLHEAY ()

ADMFKRYGPVFMETTPGGVAVVSIAERTALEAVLRSPAKKPYRPPTEIVQMYRRSRPDRYASTGLVNE

1435 QGEKWYHLRRNLTTDLTSPHTMQNFLPQLNTISDDFLELLNTSRQSDGTVYAFEQLT 1265

1264 NRMGLE

     AVCGLMLGSRLGFLERWMSGRAMALAAAVKNHFRAQRDSYYGAPLWK 783

 784 FAPTALYKTFVKSEETIHA 840

2337 VPDRIVSELMEEAKSKTTGMAQDEAIQEIFLKILENPALDMRDKKAAIIDFITAGIET 2510

 87  LANSLVFLLYLLSGRPDWQRKINSELPPYAMLCSEDLAGAPSVRAAINEAFR 242

 243 LLPTAPFLARLLDSPMTIGGHKIPPGTFVLAHTAAACRREENFWRAEEYLPERWIK 410

 411 VQEPHAYSLVAPFGRGRRMCPGKRFVELELHLLLAK (0)

     IMQKWRVEFDGELDIQFDFLLSAKSPVTLRLVEW*

 

CYP315A1 sequence

 

>CYP315A1 mRNA for homolog of shadow, complete cds.

ACCESSION   AB167737 BAAB01119643.1

Niwa,R., Matsuda,T., Yoshiyama,T., Namiki,T., Mita,K., Fujimoto,Y.

            and Kataoka,H.

  TITLE     CYP306A1, a cytochrome P450 enzyme, is essential for ecdysteroid

            biosynthesis in the prothoracic glands of Bombyx and Drosophila

  JOURNAL   J. Biol. Chem. (2004) In press

 

MHRFPSMSSIRSAVRSRNSNRCSMSTKPHKSLRTIDEMPHKKSL

PIIGTKFDLFSAGGGKNLHKYIDMRHKQLGPIFYERLTGKTKLVFISDPTHMKSLFLN

LEGKYPAHILPEPWVLYEKLYGSKRGLFFMDGEDWLINRRIMNKHLLREDSDVWLRAP

IRTAVFHFICNWKLRAQSGNFSPNLESEFYRFSTDVILAVLQGNSALLKPTPEYEMLL

LLFSEAVKKIFSTTTKLYALPVEFCQRWNLKVWRNFKQSVDDSISIAQKIVYEMLHTK

DAGDGLVKRLKDENMSDELITRIVADFVIAAGDTTAYTSLWILFLLSNNTEILTEMND

NDQYVKNVVKEAMRLYPVAPFLTRILPKQCVLGPYLLEEGTPVIASIYTSGRDEQNFS

KADQFLPYRWDRNDQRKKDLVNHVPSATLPFAFGARSCIGKKMAMLQMTELISQIVKN

FDLKSMNNSDVDAVTSQVLVPNKDIKVLILPRSISK

 

CYP324A sequence

 

>CYP324A1 45% to 324A1 Trichoplusia ni BAAB01098782.1 BAAB01016062.1

BAAB01088903.1 BAAB01068119.1 BAAB01136435.1

Bmb031911, Bmb031912 from Li Bin

MLFIIIFIFILLLLTWWLIRWQQVKSYWAARNVPHEPPHPVLGSLTFLQKENP ()

SIWLIKLYKKFPFPYIGIWLFWKPALIINSPELARQILTKDADTFRNRFSNAGKSDPVGALNLFMIN ()

DPVWSSVRRRLTPVFTKLKLQALYPILIRKSNDLKKRIKEDTEKNIKINLR ()

SLFVDYSTDILGEAAFGVSSNSITTGESAMREVTKDFMKFDWLRGLQWSCIFFFPELADFFR (2)

CKLFPKESLEILRKIYRTMVAERSKSQSISGKSKDLLDALMAMKIEAAAENE ()

VYNEDLLFAQATLFVQAGFETTSSAITFAIYELAYNPEIQ ()

ERLYREIVEAKQKMEGNELDGVVLSNLQYLNCVIN ()

ETLRKYPSLGWLDRVSSQSYKVDDTLTVPAGTAVY

VNVAGIQSDPQLFPKPEEFIPERFNTDNNNIKPFTFIPFGEGPRQCI ()

GIRFGYQAIQFGLSAIILNFKLRPIEGSPLPNNCHIESKGFVYTADHPLHIQFVPRN*

 

CYP341 sequences, weakly similar to CYP325

 

These sequences are closely related and are weakly similar to CYP325

There appear to be six genes.

5 exon 1 seqs., 6 exon 2 seqs.,

6 exon 3 seqs., 6 exon 4 sequences, 7 exon 5 seqs., 6 exon 6 seqs.,

5 exon 7 seqs., 6 exon 8 seqs., 6 exon 9 seqs.

 

>CYP341A1 9 exons 39% to 4C3 BAAB01068196.1 BAAB01181661.1 BAAB01053162.1

BAAB01098630.1 BAAB01068157.1 BAAB01166031.1 BAAB01162916.1

Bmb018467 from Li Bin, AADK01007855.1

MIVILLIVLVLGWFSVFRYRRRNMYKLAAAIPDVDKHIPLLGIAHKFTGNTE ()

VLSNPVDSEVVLKTCLEKDDLHRFIRAIIGYGGIFAP ()

VSIWRRRRKIMVPAFSPRIVQSFVGIISEQSEKLASNLGKRVGKGMFSSWPFLSAYTLDSVC ()

ETALGVKINAQGDKDSSFLKSMNRILNIVCMRIFHLWLQPVWLFKLFPVFNEHQSCIEMLHDFVDK (0)

VIQNKREEIKRENNSKTEVDYEY (1)

NLGSYKCKTFLDLLITLSGAEKGYTNIELREEVLTLTVAGTDTSAVAIGFTLELLSKYPEIQEKVYKE ()

LREVFCDSERPLIKEDLEKMKYLERVVKESLRLFPPVPFIIRKVLEDMTL (1)

PSGSVLPAGSGIVVSIWGIHRDPKYWGPEAEHFDPDRFLPERFNVEHPCCYLPFSSGPRNCL (1)

1097 GYQYAMISIKTSLSAILRRYKVVGEPEKGPVPRIRVKLDIMMKSVEGCQVALERRPTK* 921

 

>CYP341A2 Papilio xuthus

MFLCLLCLSVVLGMVLFKLKRRRLYRLASKIPGSDDELPFIGLA

HKFTGTTEDILNSLQKYSYEAMKNNGILRGWLGHILYFIVVDPVDVEVILKTSLEKDD

LHRFIRNVIGNGLIFAPVSIWRRRRKITVPAFSPKIVDTFMEVFAEQSEKLVSVLAAC

AGNGYIAMEPYLCRYTLDSVCETTMGITTNAQNNPNAPYLKALKNILNLVCERIFHLW

LQPDWLYKFFSQSKSHQKYTKEMQGFVDEVIQNKRREIKKEKDLKSEVDRNFGLSNYK

TQSFLDLLIEFSGGENGYTDLELREEILTLTIAGTDTTGISIGYTLKLLAMYPKVQDK

LYQELLDVFGTSDRRIVKEDLSKLKYLERIVKESLRLYPPGPFIIRKVLEDISLPSGR

VFPAGSGAAVSIWGLHRDPKYWGPDAEVFDPDRFLPERFNLKHACSYIPFSSGPRNCI

GYQYALMSMKTVLSAIVRRYKIMGEESGPVPHIKSKIDIMMKAVDDYKICLEKRFK

 

>CYP341A3 AADK01027984.1 96% to 341A1 exons 1,2

MIVILLIVLVLGWFSVFRYRRRNMYELAAAIPDVDEHIPLLGIAHKFTGNTE  2243

4155 PRDLELVLKTCLEK 4196

 

>CYP341A3 AADK01029147.1, BAAB01096639 exons 4,5

1917 ETAMGVKVNAQGDKDSNFLISMNRILNIVCMRIFHLWLQPAWLFKLFPVFHEHQKCKKLLHDFVDE  1720

1030 AIQKKREEIGTENNSAIEVDYKY  (1) 962

 

>CYP341A3 BAAB01031047.1 43% to 4G15 342-572 45% to 4AU1, exons 6-9

AADK01017121.1

4094 LGSYKSKTFLDLLITLSGAEKGYTNIELREEVLTLTVAGTDTSAVAIGFTLELLSKYPEIQEKVYKE 3894

3561 LCEVFGDSERPLVKEDLEKMKYLERVVKESLRLFPPAPFIIRKVLEDITL 3412 (1)

2636 PSGVMVSIWGIHRDPKYWGPEAEHFDPDRFLPERFNVEHPCCYMPFSSGPRNCV 2451

2226 GYQYAMISIKTSLSAILRRYKVVGEPEKGPVPQIRVKLDIMMKSVNGCQVALEKRPTK* 2050

 

>CYP341A4 35% to 4G1 BAAB01061620.1 (C-term), BAAB01046550.1 BAAB01151967 BAAB01098630, BAAB01016786.1 exon 3

BAAB01023249.1 BAAB01137180.1

BAAB01039204.1 BAAB01138999 42% TO 4G15 341-572 45% to 4AU1

AADK01003016.1 whole gene plus adjacent gene

88% to 341A1

 8810 MIVILLIVLVLGWFSVFRYRRRNMYKLAAAIPDVDERIPLLGIAHKFTGNTE 8965 (1)

11577 ALLNPVDMEVVLKTCLEKDDLHRFMRVVIGYGGIFAP 11687  (1)

14661 VSIWRRRRKIMVPAFSPKIVQSFVGIISEQSEKLASNLGKCVGTGKFSSWPFLNAYTLDSIC 14846 (1)

15391 ETAMGVKVNAQGDNDSIFLKSLNRMLNIVCMRIFHLWLQPTWLFKLFPVFHEHQKGKKLLYDFVDE 15588(0)

16204 AIQKKREEIRTENNSGTKVDYKY (1)

17580 DLGSYKSKTFLDLLITLSGAEKGYTNIELREEVLTLTVAGTDTSAVTIGFTLELLSKYPEIQEKVYKE (2)

18316 LCAVFGDSERPLVKEDLEKMKYLERVVKESLRLFPPVPFIIRKVLEDITL (1)

19211 PSGNILPAGSGVVVSIWGIHRDPKYWGPEAEYFDPDRFLPERFNVEHPCCYMPFSSGPRNCL (1)

19622 GYQYAMISIKTSLSAILRRYKVVGEPEKGPVPQIRVKLDIMMKSVNGCQVALERRPTK* 19798
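
A rough sketch of how exon lines laid out like these could be reduced to a plain protein sequence. It assumes each line holds one exon, that the longest run of capital letters is the peptide, and that a trailing (0), (1) or (2) is an intron phase. That reading of the parentheses is my assumption here, and parse_exon_line and join_exons are hypothetical names.

```python
import re

def parse_exon_line(line):
    """Return (peptide, phase) for one annotated exon line; phase is None if absent."""
    # The longest run of capitals (with optional stop '*') is taken as the peptide
    runs = re.findall(r"[A-Z]+\*?", line)
    peptide = max(runs, key=len) if runs else ""
    # Phase marker such as (1) or (0?); empty () means not determined
    m = re.search(r"\((\d)\??\)", line)
    phase = int(m.group(1)) if m else None
    return peptide, phase

def join_exons(lines):
    """Concatenate the peptides of all non-header, non-blank lines."""
    return "".join(parse_exon_line(l)[0]
                   for l in lines if l.strip() and not l.startswith(">"))

exons = [
    " 8810 MIVILLIVLVLGWFSVFRYRRRNMYKLAAAIPDVDERIPLLGIAHKFTGNTE 8965 (1)",
    "15391 ETAMGVKVNAQGDNDSIFLKSLNRMLNIVCMRIFHLWLQPTWLFKLFPVFHEHQKGKKLLYDFVDE 15588(0)",
    "19622 GYQYAMISIKTSLSAILRRYKVVGEPEKGPVPQIRVKLDIMMKSVNGCQVALERRPTK* 19798",
]
print([parse_exon_line(e)[1] for e in exons])
print(join_exons(exons))
```

Only three of the nine exons are used here to keep the example short. Lines that carry extra labels in capitals after the sequence would need a stricter pattern.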

 

>CYP341A5 BAAB01076430.1 Length = 917 exon 4, AADK01008138.1 only one exon in 13kb

150 ETAMGVNINAQAKADSEFLKSLNRLLNVICERIFHLWLHPDWLFKQLPVYDEHQKCIKVLHEFIDQ 347 (0)

 

>CYP341A5 AADK01003016.1

exon 5,6,7,8,9 BAAB01039204.1 BAAB01036273.1

BAAB01189227.1 BAAB01150672.1 BAAB01106539

note: mariner transposase is upstream

Bmb022600 from Li Bin

 570 VIQNKREEFQIEKISKIEVDTQY 638

1347 DLGLYKRKTFLDLLITFSGDEKGYTNVELREEMLTLTVAGTDTSAVAIGFTLELLAKYPKIQDKVYQE 1550

2502 LYEIFDGSERALVKEDLEKMEYLDRVVKESLRLFPPVPFIIRKVLEDTRL 2651

3624 PSGNVLPAGSGIMVSIWGIHRDPKYWGPEAEHFDPDRFLPERFNLEHPCCYMPFSSGPRNCL 3809

4037 GYQYAMISIKTSLSAILRRYKVVGEPEKGPVPQIRVKLDIMMKSVNGCQVALERRPTK* 4213

 

>CYP341A6 AADK01011365.1, 95% to BAAB01046229.1, same as BAAB01090513

same as BAAB01110300.1, Bmb029009 from Li Bin

AADK01018246.1 exons 2-7

AADK01015317.1 same as AADK01032246 at aa level 97% at nucl. level

     MIALLLILVGLGWVLVFRYRRRNMYKLAAAIPTPDETNLLVGVAHKMMGNTE

2420 VSSNPFDLEVILKTCLEKDDSHRFFRPGIGNGGIFAP 2530

3077 VSIWRRRRKIMVPAFSPKIVHSFVGIISEQSEKLVSSLSKCVGKGMFSSWPFLSAYTLDSVC 3262

3934 ETAMGVNINAQAKAESELLKSMNRFLNVICERMFHVWLHPDWLFKQFPVFNEHQKCIKVWHEFIDQ 4131

6718 VIQNKREEFQMEKISKDLEVDTHY 6789

7230 DLGSHKSKTFLDLLITFSGDEKGYTNVELREEMLTLTVAGTDTSAVGIGFTLELLAKYPK IQE 7418

8335 LYGIFDGSERALVKEDLEKMKYLERVVKESLRLFPPVPFIIRKVLEDTRL 8484

 

>CYP341A6 BAAB01059505.1 exons 8,9

BAAB01018260.1 exact overlap 15 aa

44% to CYP325F 395-482

2796 SGNVLPAGSGIFVSIWGIHRDPKYWGPEAEHFDPDRFLPERFNVEHPCCYMPFSSGPRNCL 2978

1565 GYQYAMISMKTSLSAILRRYKVVGEPEKGPVPRIRVKLDIMMKSVDGCQVALERRPTKLC* 1383

 

>CYP341A6? 3 aa diffs

AADK01021477.1 = BAAB01167086.1 exons 8,9 only 2 exons in 6kb

4834  ASGNVLPAGSGIVVSIWGIHRDPKYWGPEAEHFDPDRFLPERFNVEHPCCYMPFSSGPRNCL  (1) 4649

4404  GYQYAMISMKASLSAILRRYKVVGEPEKGPVPRIRVKVDIMMKSVDGCQVALERRPTKL *  4225

 

>CYP341A7 AADK01032246.1 = BAAB01046229.1

exons 1-6, BAAB01077822.1

Note exons 1,5,6 same as AADK01011365, this seq might be a hybrid

If 5 and 6 are removed this could join with AADK01003016.1

2328 MIALLLILVGLGWVLVFRYRRRNMYKLAAAIPTPDETNLLVGVAHKMMGNTE 2483

1432 VSSNPFDLEVILKTCLEKDDLHRFFRPAIGYGGIFAP  1542

1804 VSIWRRRRKIMVPAFSPKIVHSFVGIISEQSEKLVSSLSKCVGKGMFGSWPFLSAYTLDSVC 1989

2671 ETAMGVNINAQAKADSEFLKSLNRLLNVICERIFHVWLHPDWLFKQFPVFNEHQKCIRVLHEFIDQ (0) 2868

     VIQNKREEFQMEKISKDLEVDTHY (1)

     DLGSHKSKTFLDLLITFSGDEKGYTNVELREEMLTLTVAGTDTSAVGIGFTLELLAKYPKIQENIFQE

 

&&&&&&&&&&&&&&&&&&&&&&

 

>CYP341B1 AADK01018717.1 exons 1,2, 43% to AADK01032246

BAAB01172821.1, BAAB01064393.1

4511 MLVQLILCIFVALWLLSQRYKKKEMMKVWEQLKNDYTALPLIGHAYMFFGSQE 4353

 

2836 VISEPVMAEYVLKTCLEKDDILKCSRFLVGNGSVFAP  2726

 

>CYP341B1 BAAB01092271.1 Length = 1739 25% to 4S4 ETAM exon 4

GC boundary

708 ETTLGVKVNAQGNSEQPFLRAFEIICRLDSSRFCQPWLHNDTVYKMMPQYQQHKDSKDFLCNFIDQ 905 (0)

 

These two sequences probably join with one more exon between them.

 

>CYP341B1 44% TO 4M8 BAAB01210990.1 47% to 4H23

41% to 4AA1 anoph

DAHRNGLKSFLELLIESSGGNKGYTDLELQEETLVLVLAGTDTSAVGVAFTSVMLSRHQDVQEKVYEE ()

LKEVFGDSDRPIVADDLPKLKYLEAVIKETMRLYPPVPLIVR

KVDKDVTLPTGLTLVKNCGIVINIWAVHRNPLYWGDDADIFRPERFIDTPIKHPAAFMAFSHGPRACI

GYQYATMSMKTATANLLRHFRLRPAEPTDPTYKHEKNKPLRVKFDVMMKDMDNFTVQLEPRYK*

 

>CYP341B1 joined sequences BAAB01092271.1 BAAB01210990.1

Bmb003145 from Li Bin

AADK01012599.1, BAAB01092272

2366 (1) VSIWRPRRKILAPTFSPKNLTHFVDIFSKQSSYMVKYLGKAAKTGNFSIWKYINTYSMDSIC 2184

708 ETTLGVKVNAQGNSEQPFLRAFEIICRLDSSRFCQPWLHNDTVYKMMPQYQQHKDSKDFLCNFIDQ 905 (0)

VIKSKRNSLEEQKDSTEADQ(1)

NAHRNGLKSFLELLIESSGGNKGYTDLELQEETLVLVLAGTDTSAVGVAFTSVMLSRHQDVQEKVYEE ()

LKEVFGDSDRPIVADDLPKLKYLEAVIKETMRLYPPVPLIVR

KVDKDVTLPTGLTLVKNCGIVINIWAVHRNPLYWGDDADIFRPERFIDTPIKHPAAFMAFSHGPRACI

GYQYATMSMKTATANLLRHFRLRPAEPTDPTYKHEKNKPLRVKFDVMMKDMDNFTVQLEPRYK*

 

&&&&&&&&&&&&&&&&&&&&&&

 

>CYP341C1 BAAB01149680.1 36% TO 4H18 AA 111-182 C-HELIX exon 3

594 VNIWHRRRKLLNPHFGTKNQNNFMETFIKQSAVLVNNLRKEADNGTFSVWDYLTAYTLDSVC 409

 

>CYP341C1 44% to 4d 299-345 I-helix 77% to 4H8 61% to gene 1 above

BAAB01135732.1, AADK01005588.1

EATLGVQMNSQAHSNLEFLRSFDVCSSLGAARICQPWLHSDIIYHRLQRYKTYQKNTEYVLDFVKQVSIF tgt (1)

592 PTSDNGMKTVLELLILNKTFNDVELQEEAFVMIIAGTDTSAIGISFTLLMLARYPDVQEKVYQE 783

IQELFGDSERPPEVEDIHRLLYLDAVIKEAMRLYPPAPVIIRKVERETKL (1) cgt

 

&&&&&&&&&&&&&&&&&&&&&&

 

>55% to CYP341B1 BAAB01045792.1

45% to 325J 66% to BAAB01210990.1, AADK01018755.1

(1) PSGLILPRDIGILIPIWSVNRNPKYWGDDADVFRPERFLDGTKKHPTAFMTFSQGPRACL

GFKFAMNSAKAALASILRHYRIKPPSELTSMSPGQYPPIRVKFALMTRDVDNFRIQLESRS*

 

>seq not assigned AADK01004743.1 one aa diff to BAAB01133870 exon 5, only one exon in 17kb

10917 AIQNKRQSIKSTNNSYYRVFNLY 10849

 

>CYP341-un1 pseudogene fragment

42% TO 313A4 aa 398-442 perf to heme BAAB01044466.1, AADK01012171.1

1181 VPRGTVCAVSAMVMGRARRLWGPDAAEYRPERWLAPPHAQPAAFLAFSYGRRACI 1017 ()

 

CYP332A sequence

 

>64% to CYP332A1, 37% to 324A1, 32% to 6G and 6d

CK502566 Bombyx mori cDNA corrected C-term

CK518690 Bombyx mori cDNA

CK526599 Bombyx mori cDNA

CK522682 Bombyx mori cDNA

BAAB01119140.1 BAAB01018054.1 BAAB01100160.1 BAAB01098663.1

BAAB01157576.1

MDVSGPLQLFLIFIIFCLTA

IYLLFNRNYQYWEKRGVPYEKPFFLFGSLSFILRKSFWDYFYELSKRHTGDYVGIFLGFK

PTLMVQTPEIARRILVKDNAHFNNRYCYSSYGVDPLGSLNLFTVK ()

NPKWSNIRHELSPMFTSLRLKTICELMNVNAKELVLKIQRDYIDNNEDVNLK ()

ELFSMYTSDTVGYTVFGLRVSALNDPSSPLWFITNHMVKWDFWRGFEFTAIFFVPALARFLR ()

LKFFSQPATEYIMRLFRTVVDERKKTNQNTDKDLVNHLLKLKENLKLGADI ()

KLADEIMMAQAAVFILGSIETSSSTLSYCLHELAYHPEEQQ ()

KLFEEVDDAIKETGKEILDYENLQELKYLSACILETLRKYPPVPHIDRVC

NKTYKLNDELTIEEDIPVFVNVLAIHRNEKYYPEPDQWRPERMIGVTDNDNLQYTFLPFGDGPRFCI ()

GKRYGLLQMRAAIAQMIHKYKFEAAEPHSTPSDPYSVILSPKSGGRIKFVPR*

 

CYP337 new family sequences

 

>CYP337A1 60% to 337A2 29% TO 6A21 30% to 6P3 36% to CYP321A1

BAAB01181176.1 BAAB01007828.1 BAAB01049389.1

Bmb030264 from Li Bin

MLLLLLISFTLLLIFFWKQNNYWKSRNVKQVTGTLFKFTFGSRSLPEYYKEIYDKHNESQ

IGIYLGRRPAIILKDLRDIQAVLAGDFQSFHSRGIILSEKETLADSILFIDDLPRWKILR

QKLSPAFSSLRLKTMFEGIERSARDFVEFIENSGNDQDLEEMPFNAIYKYTTGSIGAAVF

GVDVDQNTLDTPALNITRKALEPTLKSIMTFFLAGTFPRLIKWLDMMNFDNYETSFIDAV

KKVLENRRAGEKQYDFIDVCLELQNHEVLRDLVTGYEIVPTDELLAAQAFFFFVAGADTT

ANVMHFTLLELSSNPSVLKKLHAEIDEVFQDGKKTLNFEDMDKFKYLEMVVNETMRKYPP

IGLLQRICTKETNLPSNNLRISKDTIAVVPVLALHRDERFYPKSDAFDPQRFAPENFNEI

NKFSFLPFGEGNRVCI ()

GAKFARLQLRAGLAWLLRKYTLVPQDYKPVKFERSPFAVRDTKAKYRLINRTN

 

>CYP337A2 BAAB01100580.1 BAAB01137171.1 new family 60% to 337A1

31% TO CYP6as 35% to CYP6AB1

BP123442 EST 37% to CYP321A1

MLPVLVIVSLVAILILYWQSSNYWKKRNVREVNGTVLKFTFGNCSLPEYYKQIYDKYNEN

QIGFHLGASPALVLRDLQDVQAVLASNFQSFYRRGFAVNDADVLGGNMLFLDDLPRWKIL

RQKLSPAFSSLRLKAMYEAIEKTARDFTDYIATDERAIKEPFDAVLKYTTASIGVAVFGL

DEKGESLIDLPLSHVAGNALKPSLKANIVFFIGSTFPRLFKWLNMTFFGEYEDVFIGAVK

KVLKNRRKLDKRLDIVDVYLDMQSSGRLRDVVTGFEMEPTDEVIAAQSFFFYVAGADTTA

NAIHFILLELSANPHVLNKLHAEIDTVLPKGTEVLTFEDIDRLTYMDMVISEAMRKYPPI

GFIQRLCTKDAILPSNNLRISKDTVTVVPILAIHRDERFYPNPDVFDPERFTPEKIKERN

KFSYLAFGEGNRICIGARFSRLQVKACITWLLRKYTLKPQEYKPERFERSAFSLRDTKSK

YEFIRRTN

 

CYP338 new family sequence

 

>CYP338A1 new family LIKE CYP6

BAAB01074380.1 (one exon covers first 408 aa)

C-term lies beyond a repeat sequence.  No ESTs cover it

AADK01008949.1

MFFESFLLNLSVLIVLIVALVFDYVTKFFSYWYVRHVPYKTPIPFFGSDYHRVLGLTNST

DEVVKLYNEHPGDKFVGRLKNRIPDLIVKDPDAIKRMLSTDFAYFHSRGLGLDKSRDVCI

RNNLFYADAEKWTLLRQGLEAVLNGLCREFDIHACLSEANGDTNVQQLLSVVLDSVFDNL

LLGDEGTSIKELRTTLQKRSMIVKLKSYLKNIFPSIYVTFGLSTLPNNVLKNTRKYMESS

KLQRLIDDSGYMHQVSLKDKHVYAFENEFASSTLAL FVTEGYIPCLYTLTALFYELAVNP

QIQEKARNSIEKDKGVNYLDAIIKETMRLHPSHSIISRQCVKMYQYPDSNLTIDRNVTIN

VPVEAIHKDKEHYENPEVFNPERFFDDHGPTKHSYSYLPFGAGPRKCI

 

CYP339 new family sequence in the mito clan

 

>CYP339A1 new family in mito clan,

CK534983 CK537464 CK535691  BAAB01021114.1 31% to 301A1 anoph. BAAB01141557

BAAB01170021.1 BAAB01154229

note: C-terminal exon was changed 7/20/04.  This is still a guess.

(2 agt)NGEEWSRQRSIIYTPLHNAVTYHIQGIDDICEYFSQKIYNMRNHQDEIPKDLYKDLHKWAFDCL (1)

PGLQESNSYKDLHKWALDCL

GLILFSKKFSMLDTDLVYSQCDMSWMYHSLEKATEAVIKCESGLQWWKILSTPAWYSWVKYCDSWD ()

SLIGKYVLEAEQAISYKAKEIEEQY

PNSNIWINARLLGQEKMNPEDIATVIMDMWLMGVNT

ITSSTSFLLYYLAKYQKAQKILYKEIQENFPEQKIMDLTKIREQTPYLQACIKETLR ()

LTPPIPVLTRILPKNITLDKYN ()

IPRGTLIIMSTQDASLKESNYDDANTFCPERWLKSDSNEYHLFASIPFGYGARKC

LGQNIAETMMSLLTVK (0?)

>BAAB01062474.1 Length = 869 C-term exon for CYP49/301 like gene

217 MIQAYKIEYRREPLEYHIHPMYTPNGPIRLRLVER 321

 

CYP340 new family only 30-31% identical to CYP4s or CYP325s

 

There are at least 8 genes in the following set.  Most of the individual exons are not

on overlapping contigs, so they cannot be joined into a single complete gene.

 

5 exon 1 sequences, 5 exon 2 sequences, 7 exon 3 sequences, 8 exon 4 sequences

8 exon 5 sequences, 9 exon 6 sequences, 13 exon 7 sequences

8 exon 8 sequences, 9 exon 9 sequences, 5 exon 10 sequences
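
The per-exon tallies for CYP340 above and for CYP341 earlier can be summarized with a few lines of code. This is only a sketch: the counts are copied from the text, and the summary statistics are my choice, not a statement of how the gene estimates were made. On these numbers the median of the per-exon counts happens to coincide with the stated estimates (six for CYP341, at least 8 for CYP340), while the maximum is higher, which may reflect extra copies of single exons.

```python
from statistics import median

# Number of distinct sequences found for each exon, in exon order (from the text)
exon_counts = {
    "CYP341": [5, 6, 6, 6, 7, 6, 5, 6, 6],
    "CYP340": [5, 5, 7, 8, 8, 9, 13, 8, 9, 5],
}

def summarize(counts):
    """Return min, median and max of the per-exon sequence counts."""
    return {"min": min(counts), "median": median(counts), "max": max(counts)}

for family, counts in exon_counts.items():
    print(family, summarize(counts))
```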

 

This first gene is complete

 

>CYP340A1 old CYP340Aaa CYPnew1 new family BAAB01050176.1

BAAB01056719.1 revised to include exon 2 BAAB01160534.1 (exons 3,4,5)

BAAB01199634.1 exon 3 BAAB01056719.1 exon 3 BAAB01196580.1 exon 6

BAAB01158445.1 exon 6

BAAB01006148.1 exon 7 with one stop codon

BAAB01212205.1 exon 8 BAAB01008628.1 exon 8 BAAB01212925.1 exon 9

BAAB01103225.1 exon 10 BAAB01050816.1 exon 9 1 aa diff

BAAB01009268.1 exon 10 1 aa diff and a frameshift

BP124291

MFIPIIVVVVCVLLLFYSIAERHSNVPLCDNYLPVIGHTHMIIGG

SGLLQTVKYACEETNKKGGVAILKLGLSNYY

VITDPEDNLTVANGTLQKHFVYQFASNWLGDGLITSS

GETWKRHRKLLNPAFSQQMLNIYTVVFNRKSRNLISAIEIQMKSGPVLIDTVFREMALNTLL

STAFGIEEEDSDFNKKYIHAVDVILALLTRRFQNPLLHYPFFYKLSALKKKEEEVIETILTASKK

IIKNKREALNKERSNENGYAT

ERKFKSMLELLLKDSDGDALTDEEIRDEVDTLILAGSDTSSQLTLVVVMVLGSYPEIQDKVYQE

VASVCGVSDTDVEKHQHPRLVYTEAVLKETLRLYPTVAVVLRKPENEIKL

KNYTIPANSNCVLGIYGLNRHPVWGPDAHTFRPERWLEPGGVPGNPNAFAGFSVGKRNCI

GKTYALISTKIILAHLVRRYKVTADISKIEFKMDVIMTPSDNCYVDFELRK

 

>CYP340A1-de9b BAAB01213048.1 Length = 2727 new exon 9 94% to CYPnew1bb

750 NYTIPSDSNCVLGIYGLNRHPVWGPDAHTFRPERWLELGGVPDDPNAFAGFSVGKRNCI 926

 

>AADK01017983.1 nearly identical to BAAB01213048, one frameshift and 1 aa diff

2 aa diffs to CYP340A1-de9b

231  NYTIPSDSNCVLGIYGLNRHPVWGPDAHTFRPERWLELGGVPDDPNAFAGFT  386

388  VGKRHCI (1)

 

>CYP340A2 AADK01003526.1 identical to BAAB01106541.1  (2 genes)

nearly identical to BAAB01003905.1 (3 diffs in N-term)

identical to CYPnew1dd

78% to 340Aaa, 92% to 340Acc, 66% to 340Abb

complete

old CYPnew1dd 91% to CYPnew1cc

BAAB01128878.1 exon 7

BAAB01171305.1 Length = 1430

BAAB01040719. Length = 645 exon 3

BAAB01163949.1 Length = 1132 exons 3,4,5

BAAB01116297 BAAB01160956

64% to 340A3, 78% to 340A1

1679 MFILIVLVVVCVVLLFYSIKKSNSNVPLCDNYLPVIGHTHMLIGGGK  (1) 1545

 490 KLLRTVKYACEEANKKGGVVILQLGLENYY  (1) 401

VITDPEDNLTIANGALQKHSFYQFASNWLGDGLVTSA

GETWKRHRKLLNPAFSQQMLNIYTVVFNQESKNLISAIEIQMKSGPVLITAALKEMALKTLL

STAFGIEVEECDFNQKYMHAVDEVMAVLTRRIQNILLHNSFVYKLSALKKKEDELIETITTMSNK

VINSKRRALKNKQESHENGCAS

120 EKKTQSMSDLLLKGLDDEAFTDKEIRNEVDTLIFTGSDTSSQIMTVVVMLLGSYPEIQDKVYQE 311

     IKSVCGDSDADVDKLQHPR 8576

8575 LVYTEAVLKETLRLYPITPVVLRKTENEIKL 8483

7374 SYTIPANSNCMLGIYGLNRHPVWGPDAHTFRPERWLELGGVPDDPNAFAGFSVGKRNCI  7198

6278 GKTYALISMKTTLVHLLRRYKVTADISKIEFKLDALMVPSDNCYAKFESRK*  6123

 

>CYP340A3 old CYP340Abb CK503564 CK511251 CK504721 BAAB01089735.1

new family CYP4 like

BAAB01006192.1 BAAB01077706.1 BAAB01092224.1 67% to CYPnew1

AADK01020942.1

complete

3080  MLIPVFLIILCVILYYFYWRDKSINNVPLCDKYLPIIGHTHLFIGNTK (1)  3223

3906  ILRTVKSICEDTNDKGGVTVARLALQDYY  3992

VITDPVDNLKIANGTFLKHFAYRFTSHWLGDGLITSS (1)

GETWKRHRKLLNPAFNQQILNSFIGVFNDESRKLVSEIGNEMAKGPVEVTTPFRQMAFRLLF ()

LTAFGIPVEDSDFNQKYIHSVDKLLSMLIYRFQNVLLHNSFIYKISGLKKKEEQMVETVHAMSNM (0)

IIKRKREASKNKSPTDEHCYDT (1)

TAHRYKSILDLLLKGLDGDALTDKEIRDEVDTIIVAGYDTSSWVLTLVMMALGSYPEIQNKVYQE ()

VSSMFGDSEADVDKSHYPGLVYVEAVLKETLRLYPIVPIALRQTESDIELK ()

NYTIPADSNCVLGIYGLNRHPVWGPDAHTFRPERWLEPGGVPDDPNAFAGFSVGKRTCI (1?)

GKVYALMSMKTTLVHLIRRYKVTADISKVEFKMDVLMTPVNNCNVKFELRK*

 

(-) strand second gene on AADK01020942.1, 2 aa diffs to CYP340A2

1237 MFILIVLVVVCVVLLFYSIKKSNSSVPLCDNYLPVIGHTHMLIGDGK (1) 1106

 

>CYP340A4 old CYP340Acc CYPnew1cc BAAB01206095.1 BAAB01104811.1 BAAB01080812.1 BAAB01021854.1

91% to CYP340A2

AADK01031494.1 identical to exon 9 adds exon 10

VITDPEDNLTVANGELQKHYFYQFTSNWVGDGLVTSA (1)

GETWKRHRKLLNPAFSQQMLNIYTAVFNRESRNLISAIEIQMKSGPVLITAAFKEMALKTLL (1)

STAFGIEEEECDFNQKYMHAVDELMAVLTRRLQNILLHNSFIYKLSALKKKEDELIETIMAMSNK (0?)

AINSKRRALKNKQESHENGCAS (1)

EKKTQSLSDLLLKGLDDEAFTDKEIRNEVDTFIAAGSHTSSQLMTVVVMVLGSYPEIQDKVYQE ()

IKSVCGDSDADVDKLQHPRLVYTEAVLKETLRLYPIAPVVLRKTENEIKLK

NYTIPANSNCLLGIYGLNRHPVWGPDAHTFRPERWLEPGGVPDDPNAFAGFSVGKRNCI

2581  GKTYALISMKTTLVHLVRRYKVTADISKIEFKMDVIMVPSDNCYVKFESRK*  2426

 

&&&&&&&&&&&&&&&&&&&

 

>CYP340A5P AADK01040486.1 5 aa diffs to CYP340A1

more seq is known, but confidential

1341  VASVCGVSDDDVEKHHHPRLVYTEAVLKETLRLYPTIPLVLRKPENEIKL  1490

 

&&&&&&&&&&&&&&&&&&&

 

>BAAB01190249.1 exon 6 85% to CYP340A1

144 IIKNKRESLNKERSNENSYTT 206

 

>BAAB01092011.1 Length = 2603

3 aa diffs with CYP340A1

1939 MFIPIILVVVCVLLLFYSIAERHSNVPLCNNYLPVIGHTHLIIGG 2073
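
A small sketch of how an "aa diffs" note like the one above can be reproduced for two exon translations of equal length. It assumes an ungapped, position-by-position comparison, which is simpler than the BLAST alignments behind most of the percentages in this file. aa_diffs is a hypothetical helper.

```python
def aa_diffs(seq_a, seq_b):
    """Return (mismatch_count, percent_identity) for two equal-length peptides."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length for an ungapped comparison")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    identity = 100.0 * (len(seq_a) - mismatches) / len(seq_a)
    return mismatches, identity

# Exon 1 of CYP340A1 and the exon 1 found on BAAB01092011.1
cyp340a1_exon1 = "MFIPIIVVVVCVLLLFYSIAERHSNVPLCDNYLPVIGHTHMIIGG"
baab01092011 = "MFIPIILVVVCVLLLFYSIAERHSNVPLCNNYLPVIGHTHLIIGG"

diffs, pct = aa_diffs(cyp340a1_exon1, baab01092011)
print(diffs, round(pct, 1))
```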

 

>BAAB01106113.1 Length = 2553

90% to BAAB01003905.1, 1 aa diff to CYP340A2

2098 MFILIVLVVVCVVLLFYSIKKSNSSVPLCDNYLPVIGHTHMLIG 2229

 

CYP340A3 exon 1; the two genes are facing away from each other

60% to BAAB01003905.1

255 MLIPVFLIILCVILYYFYWRDKSINNVPLCDKYLPIIGHTHLFIG 121

 

&&&&&&&&&&&&&&&&&&&&&&&&

 

>BAAB01199634.1 Length = 696 exon 2, 2 aa diffs to CYP340A1 same?

25 SGLLQTVKYACEESNKKGGVAILRLGLSNYY 117

 

&&&&&&&&&&&&&&&&&&&&&&&&

 

>BAAB01023413.1 Length = 2453 exon 2, 66% to BAAB01003905.1

this seq not assigned

1235 SELLRSVKYICGETNKKGGVARCKLGLDYY 1146

 

&&&&&&&&&&&&&&&&&

 

>BAAB01038139.1 Length = 841 exon 4 new 2 aa diffs to CYP340A1

118 GETWKRHRKLLNPAFSQQMLNIYTAVFNRKSRNLISAIEIEMKSGPVLIDTVFREMA 288

 

&&&&&&&&&&&&&&&&&&&&

 

>CYP340A6 BAAB01162196.1  Length = 1626 exons 3,4 1 aa diff to Bmb020206

Bmb020206 36% to 313B1 80-182 BAAB01162196.1 exons 3,4

1 aa diff to BAAB01162196.1

all on AADK01002164.1 single gene

Bmb020204 33% to 312A1 205-384 90% to Bmb035473 BAAB01073561.1

BAAB01209311 4 aa diffs to Bmb037924

95% to CYPnew1dd

1401 VITDPEDNLTVANGALQKHFFYQFASNWLGDGLVTSS 1291 (1)  1 aa diff to 340A6

5391 VITDPEDNLTVANGALHKHFFYQFASNWLGDGLVTSS 5281 CYP340A6

1016 GETWKRHRKLLNPAFSQQMLNIYTVVFNRESRNLISAIEIQMKSGPVLISAAFKEMALKALL 831 CYP340A6

5006 GETWKRHRKLLNPAFSQQMLNIYTVVFNRESRNLISAIEIQMKSGPVLISAAFKEMALKALL 4821 CYP340A6

     ATAFGIEVEECDFNQKYMHAVDEVMAVLTRRFQNILLHNSFVYKLSALKKKEDELIETITTMSNK ()CYP340A6

2997 ATAFGIEVEECDFNQKYMHAVDEVMAVLTRRFQNILLHNSFVYKLSALKKKEDELIETITTMSNK 2803 340A6

1436 VINSKRRALKNKQESHENGCAS 1371 CYP340A6

     VINSKRRALKNKQESHENGCAS ()CYP340A6

      EKKTQSMSDLLLKGLDDEAFTDKEIRNEVDTLIFTGYDTSSQIMTVVVMLLGSYREIQDKVYQE ()

2 aa diffs to 340A6

514  TEKKTQSMSDLLLKGLDDEAFTDKEIRNEVDTLIFTGYDTSSQIMTVVVMLLGSYREIQDKVYQE 320

2 aa diffs to 340A6

     IKSVCGDSDADVDKLQHPRLVYTEAVLKETLRLYPITP CYP340A6

276  IKSVCGDSDADVDKLQHPRLVYTEAVLKETLRLYPITPVVLRKTENEI 103 CYP340A6
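
The "N aa diff(s)" notes in this entry can be checked by comparing two equal-length peptide fragments position by position. The sketch below is only an illustration, not the method used to prepare these notes; the function name aa_diffs is hypothetical, and it assumes the fragments are already aligned with no gaps. It compares the exon 3 fragment at 1401-1291 with the CYP340A6 copy at 5391-5281 above.

```python
def aa_diffs(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, residue in a, residue in b) for each mismatch.

    Assumes the two peptides are pre-aligned and gap-free (hypothetical helper).
    """
    if len(a) != len(b):
        raise ValueError("peptides must be the same length (no gap handling here)")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


# Fragments copied from the two exon 3 lines of this entry.
frag_1401 = "VITDPEDNLTVANGALQKHFFYQFASNWLGDGLVTSS"
cyp340a6 = "VITDPEDNLTVANGALHKHFFYQFASNWLGDGLVTSS"

print(aa_diffs(frag_1401, cyp340a6))
```

For these two fragments the comparison reports a single Q/H mismatch at position 17, in agreement with the "1 aa diff to 340A6" note.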

 

&&&&&&&&&&&&&&&&&&&&

 

>CYP340B1 C-helix 45% to CYPnew2aa

BAAB01186081.1

VPLWKQQRKALNPAFKQQILNNFMDIFNNQGRRLIMQLAAHGPGSFDHHHPILINNLESSL

32% to 4C3 aa 307-370 49% to CYPnew2aa I-helix exon BAAB01091571.1

SEETISLLDFILDQDKSKQFLTDEEIREQIDIFLIASFDTSATALIYLLTVVGSYPEVQQKIYDE

 

>CYP340B1 AADK01006724.1  larger seq identical to BAAB01024995.1  exon 9 (-) strand

13326 NVGKLKETNIFMFRILAEVGDNKDVTKDVLPKLVYLEAVIKETLRMYSVVPAIARKTDIDLKLSK (2?) 13132

11570 KNYTIPKGSSIGIMLGTLHQHPQWGPDAHQFRPERWFTENLNVFAPFSMGKRNCI  (1) 11409

10588 GKVYAMMSMKVLLVHVFRHYKVTGDISNTEHRLGVLLKPATGHHIKLDKRNKT* 10427

 

&&&&&&&&&&&&&&&&&&&&&&&&&&&

 

>CYP340C1 BAAB01092938.1  exon 5 Length = 2807 a related ETAM exon

1613 ETALGIKMEDHSAVNQQYEQALHDIFAVLTERFQKFWLHYDFVYNRTKLKEREDQIIKVL 1792

1793 HNMSNTV 1813

 

>CYP340C1 BAAB01203952.1 Length = 3702 new exon 8 52% to CK533198

2542 VRRVLGDAERDVTKEDYLRLEYLEAVLKESMRMYPVAPVIARYSDAEVKL 2691

 

>CYP340C1 AADK01006724.1  second exon 9 on same contig (+) strand identical to exon 10 BAAB01125095.1

identical to exon 8 BAAB01203952.1

3243 (0)  CGVSCPHRVRRVLGDAERDVTKEDYLRLEYLEAVLKESMRMYPVAPVIARYSDAEVKL(1?) 3392

4640 KNYTAPAGSGFILLLWGVHQHRIWGADADQFRPERWLEAATLPDPSFFAGFSTGRRSCI  (1) 4816

7545 GKVYAMMSMKTTLSFLLRRYRVSSDVTDLEFKLEAILRPHRGHYIAIERRSKDDK* 7712
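
The numbers flanking each peptide line appear to be nucleotide coordinates on the contig: they descend for the exons marked (-) strand and ascend for those marked (+) strand. As a rough consistency check, the nucleotide span should equal three times the number of residues. The sketch below assumes that convention and counts a trailing * (stop) as one codon; the function name exon_span is hypothetical, and lines where the coordinates cover only part of the printed peptide will fail the check.

```python
def exon_span(start: int, peptide: str, end: int) -> tuple[str, int, bool]:
    """Return (strand, nucleotide span, span == 3 * number of codons).

    Assumes start/end are inclusive nucleotide coordinates and that
    descending coordinates indicate the minus strand.
    """
    strand = "+" if end >= start else "-"
    span = abs(end - start) + 1
    return strand, span, span == 3 * len(peptide)


# CYP340B1 exon on AADK01006724.1, marked (-) strand above.
print(exon_span(
    13326,
    "NVGKLKETNIFMFRILAEVGDNKDVTKDVLPKLVYLEAVIKETLRMYSVVPAIARKTDIDLKLSK",
    13132,
))

# CYP340C1 exon 8 on BAAB01203952.1, ascending coordinates.
print(exon_span(
    2542,
    "VRRVLGDAERDVTKEDYLRLEYLEAVLKESMRMYPVAPVIARYSDAEVKL",
    2691,
))
```

A failed check does not by itself mean an error in the annotation; it may only indicate a partial codon at an exon boundary or coordinates taken from a shorter alignment.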

 

&&&&&&&&&&&&&&&&&&&&&&&&&&&

 

>CYP340D1 CYPnew2aa 46% to CYPnew1bb BAAB01026721.1 (1 aa diff)

BAAB01035425.1 Length = 1039 BAAB01188536 BAAB01172813

Identical to AADK01004833.1 (adds exon 1)

Bmb018392 from Li Bin

9506 MIILTVSLVIIIAVLGSWMLFYRHYKDSPPFHKGLLPIIGHSYLFMGDTT (1) 9360

SIWRNLKNLAQDCTDKGGVLQIIMGFKRHY

IITDPKDALTVANACLQKHFVYSFGSRWLGNGLITGS (1)

GEIWKRHR KLLSPSNSPQILNTYLGIFNENSRQLVTDLAPMVGEGLKDLSFHIRKMAMNTIF ()

RTAFGVYTNEDENFTKAYMSAVDEILTIITKRFTRVWLQIDFIYNLTSIKKREDELIRIVNEMSNK ()

IIARKRSELASGVAAANET

DNKYQGLLDLMLKLAKEDALTDQEIREEVDTAIMAGFDTSSWILVYVLVKLGSFPEVQDRAYNE ()

2658 IMDVFGDSDRDLEKEDLSKLVYTEAVLKETMRLYTMGPVSLRHIEEDVKL 2807

 

>CYP340D1 AADK01018708.1 same as BAAB01096550.1  runs off the end

      (1) kNFTLKAGTDCSISLFGINRHTVWGEDVNEF

6993  KPERWLDPQKIPDNAFAAFSIGKRSCT  (1)

6346  GRAYALTLMKINLAHLLRKYKISGDVSKLDFKFNFIIKPISGSDIGLEYRK* 6191

 

&&&&&&&&&&&&&&&&&&&&&&&&&&&

 

>CYP340E1 AADK01004833 second gene upstream of new2aa = CK533198

BAAB01132907.1

CK533198   rswgb0_001141.y1 BAAB01126021.1 BAAB01203780.1

BAAB01034164.1 exon 10 overlaps EST seq., 45% to CYPnew1bb

17347 VITHPETAVTVADSTTSKHYIYSFFRRWLREGLLTSS   (1)17237

16897 GGVWKRHRKLVSPSFNLRVLYSFMGVFNSGSKKLINHFRDEMNKGPLNVLPLIKVVSLESICR 16709

15510 ISENTFGIEESSDLEFKKKFMIAVDKMLFIVIERIRKIWLHNDFLYNYCTSLKKEEDNVV 15331

15330 KVLTDMTNK 15304

14587 VLHKKQALLKMKSPTEQPEQSKKS 14516

  160 EKKFKALLDLLLDLSAEDSLTGPEILEELNTAVFGGYDTTSCTLTYVLMNLGTYQDVQQKMYEE

  683 IKNVFGDSERDVESDDLSKLVYTEAVIKESLRLYPIAPIIFRELDQDVKINNYTLK 519

  520 AGRGCAINVYGINRHAMWGDDADKFRPERRLEPDGVPSVY*RVATFGVGKRACI

12885 GRTYAMMSMKTTLVHILRQFRIAADITKIEFKIEIILVPQTPCYLTLERRS* 12730

 

>CYP340E1 BAAB01132907.1 Length = 1852 exons 3,4 new 51% to CYPnew1bb

563 VITHPETAVTVADSTTSKHYIYSFFRRWLREGLLTSS 673 (1)

1013 GGVWKRHRKLVSPSFNLRVLYSFMGVFNSGSKKLINHFRDEMNKGPLNV 1159

 

>CYP340E1 CK533198   rswgb0_001141.y1 BAAB01126021.1 BAAB01203780.1

BAAB01034164.1 exon 10 overlaps EST seq., 45% to CYPnew1bb

    ENTFGIEESSDLEFKKKFMIAVDKMLFIVIERIRKIWLHNDFLYNYCTSLKKEEDNVVKVLTDMTNK

    VLHKKQALLKMKSPTEQPEQSKKS

160 EKKFKALLDLLLDLSAEDSLTGPEILEELNTAVFGGYDTTSCTLTYVLMNLGTYQDVQQKMYEE

683 IKNVFGDSERDVESDDLSKLVYTEAVIKESLRLYPIAPIIFRELDQDVKINNYTLK 519

520 AGRGCAINVYGINRHAMWGDDADKFRPERRLEPDGVPSVY*RVATFGVGKRACI

1006 GRTYAMMSMKTTLVHILRQFRIAADITKIEFKIEIILVPQTPCYLTLERRS* 1161

 

&&&&&&&&&&&&&&&&

 

>CYP340F1 BAAB01090784.1 Length = 3359 new exons 7,8,9,10 45% to CYPnew1bb

2886 SLIDLLLNLSGDQVFSQEDIREEIDTVIVAAFDTTSWSLTYALLVLGSMPEIQEKVXXX 2719

2611 IEEVLGPTDRNLDKDDLQKLVYTEAMIKEVLRLYNVLPAVLRDITEKVQL 2462

1515 NYTMYPGDQCMILINNLNRHKAWGLDVEQFRPERWLGEAELPEHHSAYFATFGIGRRFCI 1336

     GRIFAMYSMKTILVHILRRYSVKSDLAKLRLTSDYVTKPVSGHFITITRRINAVN*

 

&&&&&&&&&&&&&&&&

 

>CYP340-un1 pseudogene AADK01025706.1 = like CYP4S3 Bmb021357

I-helix region 277-344 BAAB01059276.1 44% to 4L4  CYP4C like

Missing C-term and internal frag, possible pseudogene

TWRNNRRLLNQAFKHTILDGFVDVFNNQARSLVDELAAEADLEKIYILNKLSLHSLRLVF

TILGVSDQEVTPEKFNSYLETNNILHNIFTKRFQKFWLHSDIVFNRTKLKIEQDYAVAVLHG

ELISNCGSQPSTYRPILLSSVMVIYFNSAFIVAGEQ

KPFIDLLIEIAEEKGLSDMEVIHELATFIAAGHDTVPYTLLYTLMCVGSHPPVQQRIYEE

AADK01040671.1

LQQVLGSDDVTKQNLSSLVYLEATIKETMRLYPIAPVVSRVTDCDVKLSKY

missing about 14 aa at PKG motif

LHHHPVWGSDVEHFKPERWLDPTTLPENAFAGFSTGKRNCI

GKTFAMMSMKTTLAHLLRQTR

 

C-term Perf to heme region 49% to 4S3 396-458

 

end of this seq is 76% identical to BAAB01034164.1 above

 

&&&&&&&&&&&&&&&&&&&&&&

 

>AADK01029578.1 = BAAB01012981.1 Length = 2183 partial exon 7, 77% to CYP340A2

412 MLGLLLKGLDDDALTDREIRNEVDTLITAGSDTSS 308

 

exon 8 sequences

 

>AADK01065909.1 = BAAB01083548.1 Length = 2923 new exon 8 1 aa diff to BAAB01023742.1, 79% to CYP340A4

286 SVANVDKHQHPRLVYREAVVKETLRLYPIGPILLRKAENEIKL 158

 

>AADK01030164.1 = BAAB01023742.1 Length = 888 new exon 8 1 aa diff to BAAB01083548.1, 79% to CYP340A4

500 SVANVDKHQHPRLVYREAVVKETLRLYPIGPILLRKPENEIKL 628

 

>BAAB01117890.1 Length = 1882 new exon 8 1 aa diff to BAAB01083548.1

76% to CYP340A4

1814 SVANVDKHQHPRLVYREAVVNETLRLYPIGPILLRKAENEIKL 1686

 

>BAAB01108434.1 Length = 2407 frameshift, 4 aa diffs to CYP340A4 exon 8

CK536690.1 EST also has frameshift AADK01041079.1

1156 IKSVCGDSDADVDKLQHPRLVYTEAVVKET 1245 LRLYQIYSVVLRKTENEIKL 1303

 

exon 9 sequences

 

More new families

 

>CYP365A1 N-term 34% to 9A1 BAAB01029726.1 BAAB01072071.1

AADK01000274.1

Whole sequence known but confidential

THQFYQRT

LVIRDPELIKRVCVNDFQHFVDRGFFFNKEVDPLAGSVLFLRGNEWKKLRAKISPIFS ()

PNKLRGMFPLIDNTADEFVIRLRKRIKNKNKDIENVDSNSNIED

AYALVDSEELVGGYTADAIVPCAFGLKSLVMDKPDDPFAVALQAFYEKSLYNVFEK ()

TMRQFWPAFVLFFRMRIIPKKTHDFFHNVVTSVLRARESGKF

 

>CYP365A1 C-term 42% to 6D4 aa 308-458 BAAB01113890.1 BAAB01206962.1

Bmb040074 From Li Bin

AADK01026272.1

Whole sequence known but confidential

DIVISANAFIIFLGGFETTSSTLAFLFLELAADQRVQEKMRQEIREVLSRNDGKMTFEILQELIYMEMVIQ ()

ETLRLYPPFPSIQRMCTKDYLIPGTEKVVEKGTIVLFPTLGIQRDNQ

YFERADDFYPERWSGGFVPPPGVYMPFGDGPRYCI

 

>Similar to CYP6CJ1 Louse P450 EST 40%

CB887321   Lice_03_N06_T3 Body louse cDNA library Pediculus humanus cDNA.

          Length = 704

 

MDYYYVTLLIIIISALTWFIYALANRNCNY

WKNRNVPYVSPVPFFGNLKSLLFLEKNVGEFLGDLYNSEKLPDHGFYGIYLTNNDPA

LIIRDPELIRNICIKDFQHFSNRNATSDEKNDPLGYNNLFTLTGSKWKFIRTKLTPTFTSGKM

KQMFTLVKETSDELIRYLNENSKEKNYTIMETKRF

 

>and Pea aphid P450 EST 40%

CF587676   USDA-FP_120600-031 Acyrthosiphon pisum, Pea Aphid Acyrthosiphon

           pisum cDNA clone WHAP-003_H03 5'.

MLIFANFWIDFIILITVLFSIIYYYCTSTFNVWKK

LNVPYIRPIPLFGNYLRVALGIENPMETYRKIYCELAGFKYGGMFQMRTPY

LMIRDPEIINNILIKDFSYFTDRGIYVDFKTEPLSEVLFLMNNPRWKKFRSKLSPAFSSGKLKQMFNQI

EKCGHDMINNIFAELKKNPHDIDMRDVVAKYSMDVIGSCAFGLTLM

 

>CYP366A1 an--0429 new N-term http://www.ab.a.u-tokyo.ac.jp/silkbase/

BAAB01010488.1 BAAB01018170.1 BAAB01015588.1 BAAB01015589.1

AADK01020747.1, BY914303.1 EST

           28% to 4G15

    MLIMSEVLWCLLMLCACVWWWCSAPRRTRKLLAALPSFPQLPLLGNIHQIPRNSI

    NLFQFLEKIAVTCDTTEMPFVFWLGPRPIL

    FISDPEDVKGVNNAFIEKPHYYSFAKVWLGNGLVVAP

    PEIWKNSIKKLSGTFTPSIVEGYQEVFAGRAADLV 545

546 QRLKARTTEEPFDIMHDLAYTTLEAICQTAFG 641

 

>CYP366A1 35% to 4C37 BAAB01132146.1 BAAB01000816.1 ETAM exon

Bmb038385 from Li Bin

AADK01020849.1

best matches to CYP4C middle region

ETAFGFPKISESIVTKEYYDAFHRCLELLIRRGLNPLLHLDFMYRLTPASKELQKCVGILHNVSNT ()

VITKLIQERECAKQDNRNEI ()

BAAB01011842.1

VPNRRRFKSFLDLMLEMQVSTPELSVEQIRSEVDTVLFGGHETVATTLFYALLMIGRDKNLQDK

LYNEVRDVVGDGGRPVTGADLPRLRYCEATVLETLRLFPPFPAVLRMADKDLQLSSGI

 

>CYP367A1 BAAB01019896.1 BAAB01107379.1

Bmb008200 from Li Bin

AADK01004845.1 BAAB01014226 4 fam like 27% to 4C15

     MLWPVIYLLALVFAYWPYWRWKNRRLLRLSASMPGPRALPIIGNGLLIVVNTGGKFNKTI (0) 4269

5077 LFRTYGDYCKIWLGPELNICVKNPDDIR (0)

     LLLTSNKVNQKGPAYEIMKAAIGPGILTG (1)

6651 GPTWRNHRKIVTPSYNKRAVKLYSAVFNREAEVLANLLLKKQSGVTFNVYYDVVEITTQCVC (1) 6836

8492 QTLIGLSKDDSRNVDGMSDLILETQN (2) 8569

9222 LYELLFTKMTVWWLQIPFVYWITGRKATENAYVKKIDRLTSDFLKKRRTALKGGNVDEESMGIVDRYI

ASGELTEQEIKWETMTLFTTSQEASAKITSAVLLFLAHLPDWQEKVYKEIVEVIGSGRND

VTAEHLKHLHYLDMVYQEALRYLSIAALIQRTVEEEITINNGKFTLPIGT

TLVIPIHDLHRDPRYWDEPLKVKPERFLPENVKKRSPNVFIPFSLGAMDCLGRVYAEPLIKTLVVWAVRE

VQLEAEGCVEDLKLHVAISVKFANGYNLKVKPRI*

 

>CYP367B1 BAAB01152562 N-term 4 fam like

127 aa more seq known but confidential

MISYMILIVIVFALMWSGWKQKNKKFMEMANQFPGPQALPFIGNALRFMCEPE