Aedes aegypti cytochrome P450s

 

Oct. 18, 2005; under revision April 21, 2006 (in progress)

Revision continues May 19, 2006 to June 25, 2006

Compiled by David Nelson and David Drane

 

The completed and named sequences are here:

(/AedesFasta.June25.htm)

This file is more archival and contains detailed working information.

For the final sequences, please see the FASTA file above.

 

Useful links for analysis

http://www.ncbi.nlm.nih.gov/Traces/trace.cgi  Trace Archive at NCBI

http://trace.ensembl.org/perl/traceview Trace files at Ensembl

http://132.192.64.52/blast/P450.html P450 Blast server

http://www.proweb.org/proweb/Tools/WU-blast.html Do-it-yourself WU Blast

http://www.bioinformatics.vg/bioinformatics_tools/JVT.shtml DNA translator

http://ncbi.nlm.nih.gov/BLAST/Blast.cgi?CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&ALIGNMENTS=50&ALIGNMENT_VIEW=Pairwise&CLIENT=web&DATABASE=nr&DESCRIPTIONS=100&ENTREZ_QUERY=%28none%29&EXPECT=10&FILTER=L&FORMAT_OBJECT=Alignment&FORMAT_TYPE=HTML&GENETIC_CODE=0&NCBI_GI=on&PAGE=Translations&PROGRAM=tblastn&SERVICE=plain&SET_DEFAULTS.x=23&SET_DEFAULTS.y=10&SHOW_OVERVIEW=on&UNGAPPED_ALIGNMENT=no&END_OF_HTTPGET=Yes&SHOW_LINKOUT=yes&GET_SEQUENCE=yes   NCBI TBLASTN search

http://www.ncbi.nlm.nih.gov/BLAST/tracemb.shtml NCBI megablast

http://www.tigr.org/tigr-scripts/tgi/T_index.cgi?species=a_aegypti TIGR Aedes gene index page

 

There are 206 Aedes sequences here, including 142 complete sequences.

Numbers in parentheses are intron phases. Names have not been assigned to most genes.
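A minimal sketch of how an intron phase follows from the coding sequence: phase 0 means the intron falls between two codons, phase 1 means it falls after the first base of a codon, and phase 2 means it falls after the second base. It assumes exon lengths are counted in coding nucleotides from the first base of the start codon. The helper names and the exon lengths are hypothetical, not taken from any entry in this file.

```python
def intron_phase(coding_nt_before_intron: int) -> int:
    """Phase (0, 1 or 2) of an intron that follows this many coding nucleotides."""
    if coding_nt_before_intron < 0:
        raise ValueError("length must be non-negative")
    # 0 = between codons, 1 = after codon base 1, 2 = after codon base 2
    return coding_nt_before_intron % 3


def intron_phases(exon_cds_lengths: list[int]) -> list[int]:
    """Phases of the introns separating consecutive exons (hypothetical helper)."""
    phases = []
    total = 0
    for length in exon_cds_lengths[:-1]:  # no intron after the last exon
        total += length
        phases.append(intron_phase(total))
    return phases


# Hypothetical three-exon gene: 91 + 242 + 300 coding nucleotides
print(intron_phases([91, 242, 300]))  # [1, 0]
```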

 

Sequences were collected and assembled by David Drane and David Nelson from July to September 2005. Of the 15 million trace file sequences, 3.5 million were downloaded from NCBI and placed on a stand-alone BLAST server on a Mac G4 for TBLASTN searches at an expect value of 10. The WGS section of GenBank was searched, and 220 AAGE01XXXXXX accession numbers are given at the end of this file. The TIGR Gene Index was searched for the text “P450”. The EST section of GenBank was searched, and discontiguous megablast was used to extend sequences by chromosome walking.
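Chromosome walking here was done with discontiguous megablast against trace reads. The toy sketch below only illustrates the idea of repeatedly extending a contig with the read that overlaps its end, using exact suffix/prefix overlaps on one strand. `walk`, `best_extension` and `min_overlap` are hypothetical; real trace reads would also need reverse complements, mismatch tolerance and quality trimming, which this sketch does not attempt.

```python
def best_extension(contig: str, reads: list[str], min_overlap: int):
    """Pick the read that adds the most new bases past the contig end.

    A read qualifies if its prefix exactly matches the contig suffix over at
    least `min_overlap` bases; the longest such overlap per read is used.
    Returns (read, overlap_length, bases_gained) or None.
    """
    best = None
    for read in reads:
        longest = min(len(contig), len(read) - 1)  # leave >= 1 new base
        for overlap in range(longest, min_overlap - 1, -1):
            if contig.endswith(read[:overlap]):
                gained = len(read) - overlap
                if best is None or gained > best[2]:
                    best = (read, overlap, gained)
                break  # longest overlap for this read found
    return best


def walk(seed: str, reads: list[str], min_overlap: int = 20, max_steps: int = 50) -> str:
    """Greedily extend `seed` to the right, one read per step."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    contig = seed
    used = set()
    for _ in range(max_steps):
        unused = [r for r in reads if r not in used]
        hit = best_extension(contig, unused, min_overlap)
        if hit is None:
            break  # no read overlaps the current end: the walk stops
        read, overlap, _ = hit
        contig += read[overlap:]
        used.add(read)
    return contig


print(walk("ACGTACGTAA", ["GTAAGGCCTT", "CCTTAAGGTC"], min_overlap=4))
```

With the made-up reads above, each step adds six new bases, giving ACGTACGTAAGGCCTTAAGGTC.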

Most sequences should be represented here now, but not all are assembled. 

The Aedes mosquito seems to have more P450s than the Anopheles mosquito. 

 

This file is in progress. The CYP4 and CYP325 families are not yet fully assembled because some of these sequences contain large introns.

The sequences are presented in clan groups: the CYP2, CYP3, CYP4 and mitochondrial clans. Note: Aedes has a CYP18 that was not found in Anopheles.

CYP329 of Anopheles now looks like a pseudogene of a CYP9 sequence. It is short in the heme signature, and it has a P (proline) at the critical T (threonine) in the I-helix oxygen-binding pocket. It is the only Anopheles sequence in the CYP3 clan that does not fall inside the CYP6 or CYP9 families.
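The two features used to call CYP329 a pseudogene can be checked with simple patterns. This sketch assumes the commonly cited consensus motifs, (A/G)Gx(E/D)T(T/S) for the I-helix and FxxGxRxCxG for the heme-binding signature. Real P450s vary, so a miss is a prompt to look, not proof. The test peptides are fragments of the CYP15B1 assembly in this file, plus a hypothetical T-to-P variant.

```python
import re

# I-helix oxygen-binding motif (A/G)Gx(E/D)T(T/S); group 1 captures the residue
# at the conserved ("critical") Thr position so substitutions are reported.
I_HELIX = re.compile(r"[AG]G.[ED](.)[TS]")

# Heme-binding signature FxxGxRxCxG (the Cys is the heme ligand).
HEME_SIGNATURE = re.compile(r"F..G.R.C.G")


def critical_thr_residues(protein: str) -> list[str]:
    """Residue found at the conserved I-helix Thr position in each motif match."""
    return [m.group(1) for m in I_HELIX.finditer(protein)]


def has_heme_signature(protein: str) -> bool:
    """True if the FxxGxRxCxG heme-binding consensus is present."""
    return HEME_SIGNATURE.search(protein) is not None


i_helix_region = "FQAGSETTSNTLGYGIAHMLHH"  # from the CYP15B1 assembly
mutant_region = "FQAGSEPTSNTLGYGIAHMLHH"   # hypothetical T -> P variant
heme_region = "FVPFGSGKRRCLGESLA"          # CYP15B1, exons joined at (1)
print(critical_thr_residues(i_helix_region))  # ['T']
print(critical_thr_residues(mutant_region))   # ['P']
print(has_heme_signature(heme_region))        # True
print(has_heme_signature(heme_region[:10]))   # False (truncated signature)
```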

There are 11 complete sequences in the CYP2 clan (CYP15, 18, 303, 304, 305, 306:phm, 307).

Phantom (phm) is one of the Halloween genes.

There are 76 complete sequences in the CYP3 clan (CYP6, CYP9).

There are 34 complete sequences in the CYP4 clan (CYP4, CYP325).

There are 9 complete sequences in the mitochondrial clan (CYP12, 49, 301, 302:dib, 314:shd, 315:sad). These include three of the Halloween genes: disembodied (dib), shade (shd) and shadow (sad).

There are 21 pseudogenes so far.

There are 15 partial sequences (not including the pseudogenes).

 

CYP2/CYP18 clan sequences

 

>514720743 753475610 750240311 possible CYP15 N-term = DR747015.1 EST

MWQNLVVLIIFVILFCLRDMRKPGYFPP (1)

 

>CYP15B like 585964866 641740723 584363040

 78 GPNWFPLIGSGFEVFRLVKHFKFYHLMWAELMRRYGPIVGLRLGRDRVVIVSGLDA 257

258 IREVYSKDQFDGRPDGFFFRIRSFDKRLGVVFTDGAHWDIQRRFSVRTLKALGMGRTGMV 437

438 NSLEREAEEMIHHLRKLSRTQKVISMHNAFDVSVLNSIWTLIAGKR (2) 575

    FDLDDKKLEWIMETIHKSFRVIDMSGGVLNQFPPIRYVLPDKSGFAPLLNLLSPLWTFLQ 816

 

>CYP15B like seq DR746695.1 adult female corpora allata cDNA

813859354 749484786 522065275 514869301 520643713

GTIKSIRSKLDQPDNPDCFIASYLRELNIAERHSSFTNEQLLCLCLDL

FQAGSETTSNTLGYGIAHMLHHPEIVQKIHNELDSVIGRYRLPLLADRPYLPYTEAVLCE

IQRISNIAPLAIAHRTVAPVQLGTYVIPKNTITLISLYSLHMDKAYWGDPEVFRPERFLN

ETGDKLIAHEYFVPFGS (1)

GKRRCLGESLAKSSLFLFFTAFMHAFLVEPAEPGKLPELDGIDGITLSPCPYFVQLKERLI*

 

>possible complete CYP15B1 assembled from parts 52% to 15B1 from Anopheles

AAGE01116789 AAGE01129498

Used trace archive seqs to verify seq at PLLNLLRPLWTFLQ

This region is not accurate in AAGE02003241.1

638470554 

823375362 

593712263 

586030336 

641740723 

569671400 

used AAGE02003241.1 for the C-term seq changes

MWQNLVVLIIFVILFCLRDMRKPGYFPP (1)

GPNWFPLIGSGFEVFRLVKHFKFYHLMWAELMRRYGPIVGLRLGRDRVVIVSGLDA

IREVYSKDQFDGRPDGFFFRIRSFDKRLGVVFTDGAHWDIQRRFSVRTLKALGMGRTGMV

NSLEREAEEMIHHLRKLSRTQKVISMHNAFDVSVLNSIWTLIAGKR (2)

FDLDDKKLEWIMETIHKSFRVIDMSGGVLNQFPPIRYVLPDKSGFAPLLNLLRPLWTFLQ (0)

GTIKSIRSKLDQPDNPDCFIASYLRELNIAERHSSFTNEQLLCLCLDL

FQAGSETTSNTLGYGIAHMLHHPEIVQKIHNELDSVIGRYRLPLLADRPYLPYTEAVLCE

IQRISNIAPLAIAHRTVAPVQLGTYVIPKNTITLISLYSLHMDKAYWGDPEVFRPERFLN

ETGDKLVAHEYFVPFGS (1)

GKRRCLGESLAKSSLFLFFTAFMHAFLVEPAEPGKLPELDGIDGITLSPCPYYVQLKERLI*
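The note above about verifying PLLNLLRPLWTFLQ against trace archive reads (the first fragment reads PLLNLLSPLWTFLQ) can be pictured as a per-column majority vote over reads aligned to the same region. This is an assumed illustration, not the procedure actually used here; `majority_consensus` is hypothetical, and the three input reads are made up to show the vote.

```python
from collections import Counter


def majority_consensus(aligned_reads: list[str]) -> str:
    """Per-column majority vote over equal-length aligned reads ('-' = gap)."""
    if not aligned_reads:
        raise ValueError("need at least one read")
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        counts = Counter(ch for ch in column if ch != "-")
        # Ties go to the residue seen first (most_common keeps insertion order).
        consensus.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(consensus)


reads = ["PLLNLLRPLWTFLQ", "PLLNLLRPLWTFLQ", "PLLNLLSPLWTFLQ"]  # hypothetical
print(majority_consensus(reads))  # PLLNLLRPLWTFLQ
```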

 

>567527404  46% to CYP15B1 may be a CYP15 pseudogene

XXXXXXXXXXXXXXXXLVRRFRFYHHTCAAFMCLYRPIVDLRMGRDRVVIMTGLDP

I*KVYSKDEKENRPVGFFFRIRSFDKRLAVVFTDGAHWDIQRRFSVRTLKALGMGRTGLV

SSLEREAEEMIHHLRKLSRTQKVISRNNAFDVSVLNSIWTLIAER

 

>CYP18A1 AAGE01025833 AAGE01338874.1 AAGE01065191.1 529463664 572557122

66% to 18A1 (note: CYP18 not seen in Anopheles) complete

revised at cyan aa based on AAGE02007615.1

     MFLDTYLLGVVRQEFFDASKARST

2678 LLVFCCTLSCVVFLQWLFRLVCQIKKLPPGPWGVPIFGYLTFIGHEKHTQYMKLARKYG 2502

2501 SLFSAKLGAQLTVVISDYKIIREAFKTEDFTGRPHSPLLKTLGGF (1?)

     GIINSEGQLWKDQRRFLH 719

 718 EKLRHFGMTVLGNKKHLMESRIM (0)

 534 TEVAELLASLNEVGSQSTDLSKYLSVSVSNVICNIIMSVRFSLEDPKFKRFNWLIEEGMRLF 359

 358 GEIHTIDYIPQIQYLPGNINAKNKIAKNRQEMFDFYREVIDEHKRSFNAENIRDIVDAYL 179

 178 DEIQKAQAEGRDQELFDGKDH 116

     EIQMMQVIADLFSAGMETIKTTLLWLNVFMLRHPDAMKRVQDELDQVVGRNRLPKIEDVP 406

     YLPITETTILEVMRISSIVPLATTHSPKS (2) 319

2116 DVVINGYTIPAGSYVVPLINSVHMDPTLWDKPEEFNPSRFLDAEGKVHKPDFFIPFGVGR  1937

1936 RRCLGDVLARMELFLFFASIMHTFTIELPEDEPMPSLKGIIGVTISPQAFRVKLIPRPLN  1757

1756 ADLDRLRNVGSC*  1718

 

>AAGE01098313 (upper seq) CYP18 like fragment probable pseudogene

Query:  1703 ASLNEVGSRSS 1671

             ASLNEVGS+S+

Sbjct:   210 ASLNEVGSQST 220

 

Query:   313 VGSQSTELSKYLSVLVSNVICNIIMSVRFSLEDPKF--------------GGMHTIDYIP 176

             VGSQST+LSKYLSV VSNVICNIIMSVRFSLEDPKF              G +HTIDYIP

Sbjct:   215 VGSQSTDLSKYLSVSVSNVICNIIMSVRFSLEDPKFKRFNWLIEEGMRLFGEIHTIDYIP 274

 

Query:   175 QIQYLPGNV-------KNRQEMFDIYREVINEHKRSFNAENIRDIV 59

             QIQYLPGN+       KNRQEMFD YREVI+EHKRSFNAENIRDIV

Sbjct:   275 QIQYLPGNINAKNKIAKNRQEMFDFYREVIDEHKRSFNAENIRDIV 320

 

Query:    56 AYLDEILKAQAE 21

             AYLDEI KAQAE

Sbjct:   322 AYLDEIQKAQAE 333

 

>AAGE01227048 (upper seq) CYP18 like fragment probable pseudogene

Query:   643 ASLNEVGSQPIDLNKYLSVSVSNVICNIIMSVRFSLEDPKFA-------------G-LHT 780

             ASLNEVGSQ  DL+KYLSVSVSNVICNIIMSVRFSLEDPKF              G +HT

Sbjct:   210 ASLNEVGSQSTDLSKYLSVSVSNVICNIIMSVRFSLEDPKFKRFNWLIEEGMRLFGEIHT 269

 

Query:   763 FAGLHTIDYIPQIQYLPGNV-------KNRQEMFDFYREMIDEHKQSFNAENIRDIV 912

             F  +HTIDYIPQIQYLPGN+       KNRQEMFDFYRE+IDEHK+SFNAENIRDIV

Sbjct:   264 FGEIHTIDYIPQIQYLPGNINAKNKIAKNRQEMFDFYREVIDEHKRSFNAENIRDIV 320

 

Query:   915 AYLDEILKAQAEDRDQELFEGKDHEI 992

             AYLDEI KAQAE RDQELF+GKDH++

Sbjct:   322 AYLDEIQKAQAEGRDQELFDGKDHDV 347
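The percentages quoted throughout (e.g. "66% to 18A1") are identities over an alignment. A minimal sketch of that arithmetic, assuming the BLAST convention of counting identical positions over the full alignment length with gap columns included; `percent_identity` is a hypothetical helper, and the rows are copied from the AAGE01098313 alignment above.

```python
def percent_identity(query_row: str, sbjct_row: str) -> float:
    """Identical, non-gap columns divided by alignment length (gaps included)."""
    if len(query_row) != len(sbjct_row):
        raise ValueError("alignment rows must have equal length")
    if not query_row:
        raise ValueError("empty alignment")
    identical = sum(
        1 for q, s in zip(query_row, sbjct_row) if q == s and q != "-"
    )
    return 100.0 * identical / len(query_row)


# Rows from the AAGE01098313 alignment above
print(round(percent_identity("ASLNEVGSRSS", "ASLNEVGSQST"), 1))  # 81.8
q = "QIQYLPGNV-------KNRQEMFDIYREVINEHKRSFNAENIRDIV"
s = "QIQYLPGNINAKNKIAKNRQEMFDFYREVIDEHKRSFNAENIRDIV"
print(round(percent_identity(q, s), 1))  # 78.3 (36 of 46 columns)
```

Counting gap columns in the denominator lowers the figure for gapped hits; other tools may divide by the shorter sequence instead, so percentages from different sources are not always comparable.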

 

>CYP303A1 AAGE01109944 641807020 834983680 618119317

834966118 826136105 587934965

72% to 303A1 complete

MYWYYLACFIVVFIIFLYLDCIKPANFPPGPKWYPIIGSAIEIARARQKTGMLCKAIKLIASKYDHKGVIGF

KVGKDKTVMAISGDSLREMMSNEDLDGRPTGIFYETRTWGLRRGVLLTDEEFWQEQRRFI

VRHLKEFGFARKGMAEIIGNEAEYVKNDFHALVKAGNGKALVQMQSAFSVYILNTLWLMM

AGIRYTRENKDLKYLQSLLHELFANIDMMGALFSHFPFIRFFAPRLSGYKQFVEIHNLMH

KFIGAEVENHKKSFNDTDEPRDLMDVYLKILQSNRDIPESFSQEQLLAVCLDMFIAGSET

TTKTLGFAFLHLVRQRETQLKVQKELDEVVGRNRLPTLEDRVN (2)

LPYCEAVVLEALRMFMANTFGIPHRALRDTKLCGYDIPK (0)

DTMLVGMFRGMMLNDWESPTSFKPERFLKGGKIVIPPNFHPFGVGRHRCMGEMMGKAN 110

LFLFITTLFQSFDFLVPEGYPIPSDEPIDGATPSVRQYTALIVPR*

 

>581536484 803281860 586608108 826028980

574131458 595148561 754352758 590136340 519840563 753671460

67% to 512982119 above 58% to 304B anoph

Tried walking the chromosome down to exon 2, so some of the numbers above are in the intron.

MLLNPSAILWTVAAGLLIYRCFRFMFDRPPNFPSGPPRFPLLGSYLVLLMVNYRHLHR

AAVRLGQFYRTKILGIYLGDFPSIVVNDLAIAKEVLARSEFDGRSDLFLARMRERNFQRR

GIFFTDGPHWKEQRRFVLRHLRDYGFGRRFDELEAETRSELMTLLDVLRYGPKFEHERLF

AKDGCVKCPDAFYGLLGNVYFQVICGERFQRKDMAQLYE (1)

 

>223483644 519671636 528946489 494183870

a second 304B like C-term sequence 57% to 304B

TGRHAINFQQKGDDYGTILSYLPWLKDYFPEATNYRILREVNNRMNDLIEAMVQKY

LASYDENHMRCFLDRYIYEMKQSKPLEGDAFTFQ (1)

YDQLVMILWDMLLPTLSGSAIQLSMLLERLLLNPRVATKVQQELDGVVGHGRLPTLDDRV 889

NLPYAEATLLEALRIDTLVPSGISHVALEDTKLCGYDIPKGCFVMLSLDVINNQREFWGDP

ENFRPERFLDESGKLSLKKDISVPFGGGKRLCVGETFSRNTLFLMFTALMQNFNIKPRPG

DPLPDLGQRITGVVTSMEPFWLRFEAR*

 

>CYP304B2xx Possible full length gene joining the 512982119 and 223483644 fragments complete

Note: this is a hybrid of two different genes; see the corrected sequence below.

AAGE01051934

MFVTPTIFLWLVTIGLIAYRCHRFLFDRPKNFPDGPPKFPLLGGYAVMLLINFYHLHR

AANKLCEYYRTKILGIYLGNFPTVIVNDFATVKEVLNRVEFDGRPDLFIARMREKNFLLR

GIFFTQGPDWKEQRRFILRYLRDYGFGRRFDELEAETNAEILTLVEMLRYGPRHEHETEF

MTKDGCAMVPNVFFACFANAFLYVLTGERINRDEAGALFE

TGRHAINFQQKGDDYGTILSYLPWLKDYFPEATNYRILREVNNRMNDLIEAMVQKY

LASYDENHMRCFLDRYIYEMKQSKPLEGDAFTFQ (1)

YDQLVMILWDMLLPTLSGSAIQLSMLLERLLLNPRVATKVQQELDGVVGHGRLPTLDDRV 889

NLPYAEATLLEALRIDTLVPSGISHVALEDTKLCGYDIPKGCFVMLSLDVINNQREFWGDP

ENFRPERFLDESGKLSLKKDISVPFGGGKRLCVGETFSRNTLFLMFTALMQNFNIKPRPG

DPLPDLGQRITGVVTSMEPFWLRFEAR*

 

>CYP304B3yy/xx top part = my old Byy, bottom = my old Bxx + 1 aa diff

DW987682.1 EST supports this assembly, so my earlier xx and yy assemblies are hybrids

AAGE02028825.1 revised seq on 4/20/06

46553 MLLNPSAILWTVAAGLLIYRCFRFMFDRPPNFPSGPPRFPLLGSYLVLLMVNYRHLHRAAVRLGQFYRTKILGIYLGDFP

SIVVNDLAIAKEVLARSEFDGRSDLFLARMRERNFQRRGIFFTDGPHWKEQRRFVLRHLRDYGFGRRFDELEAETRSELM

TLLDVLRYGPKFEHERLFAKDGCVKCPDAFYGLLGNVYFQVICGERFQRKDMAQLYE (2) 47203

61403 TGRHAINFQQKGDDYGTILSYLP

WLKDYFPEATNYRILREVNNRLNDLIEAMVQKYLASYDENHMRCFLDRYIYEMKQSKPLEGDAFTFQ (1) 61672

61872 YDQLVMILWDMLL

PTLSGSAIQLSMLLERLLLNPRVATKVQQELDGVVGHGRLPTLDDRVNLPYAEATLREALRIDTLVPSGISHVALEDTKL

CGYDIPKGCFVMLSLDVINNQREFWGDPENFRPERFLDESGKLSLKKDISVPFGGGKRLCVGETFSRNTLFLMFTALMQN

FNIKPRPGDPLPDLGQRITGVVTSMEPFWLRFEAR* 62501

 

 

>512982119 637789748 834948129 570603901 750442192 570627554 568540398

743856885 581525309 637183809 812171267 586112683 570800380

579961153 793213948 581533371 587665129 570695804 574007683

60% to 304B1 of Anopheles. The numbers above include a long chromosome

walk of about 5-7 kb, about 500 bp per step. No C-term was found.

N-term exon is 55% to 304B anoph. and 48% to 304C anoph.

MFVTPTIFLWLVTIGLIAYRCHRFLFDRPKNFPDGPPKFPLLGGYAVMLLINFYHLHR

AANKLCEYYRTKILGIYLGNFPTVIVNDFATVKEVLNRVEFDGRPDLFIARMREKNFLLR

GIFFTQGPDWKEQRRFILRYLRDYGFGRRFDELEAETNAEILTLVEMLRYGPRHEHETEF

MTKDGCAMVPNVFFACFANAFLYVLTGERINRDEAGALFE

 

>CYP304B 494544931 512720460 41% to 476322188 72% to 304B1

827562306 594336057 512633341

(2) TGKYAMMFQRTGDDYGTIYSLLPWMRHLFPNRTRYRTIREGSLGVNRFIESII

QKRLETHEEGHVRCFLDLYFTEMKKTVPRTEDNRFTFQ (1)

HDQLVLGIVDFFFPAISGATTQ

IALLLERLLWHPEVVQKMQAEIDDVVGHGRLPTLDDRINLPYTEATLREAMRI

DTLVPSGVAHMAMKDTTLRGYDIPKDTILVLGLDSIHMQKDIWGDPERFRPERFLNYRGE

LSLSKDVSVPFGAGKRLCAGETFARNTMFLIVSALVQNFNIRQRLGDKLPDMGKRSTGII

ISPADYWVKFEPR*

 

>CYP304Byy AAGE01029809 Possible full length gene joining

the 581536484 and 494544931 fragments complete

Note: this is a hybrid of two different genes; see the corrected sequence below.

MLLNPSAILWTVAAGLLIYRCFRFMFDRPPNFPSGPPRFPLLGSYLVLLMVNYRHLHR

AAVRLGQFYRTKILGIYLGDFPSIVVNDLAIAKEVLARSEFDGRSDLFLARMRERNFQRR

GIFFTDGPHWKEQRRFVLRHLRDYGFGRRFDELEAETRSELMTLLDVLRYGPKFEHERLF

AKDGCVKCPDAFYGLLGNVYFQVICGERFQRKDMAQLYE (1)

(2) TGKYAMMFQRTGDDYGTIYSLLPWMRHLFPNRTRYRTIREGSLGVNRFIESII

QKRLETHEEGHVRCFLDLYFTEMKKTVPRTEDNRFTFQ (1)

HDQLVLGIVDFFFPAISGATTQ

IALLLERLLWHPEVVQKMQAEIDDVVGHGRLPTLDDRINLPYTEATLREAMRI

DTLVPSGVAHMAMKDTTLRGYDIPKDTILVLGLDSIHMQKDIWGDPERFRPERFLNYRGE

LSLSKDVSVPFGAGKRLCAGETFARNTMFLIVSALVQNFNIRQRLGDKLPDMGKRSTGII

ISPADYWVKFEPR*

 

>CYP304Bxx/yy top part = my old Bxx, bottom = my old Byy

AAGE02028825.1 revised accurate seq 4/20/06

22307 MFVTPTIFLWLVTIGLIAYRCHRFLFDRPKNFPDGPPKFPLLGGYAVMLLINFYHLHRAANKLCEYYRTKILGIYLGNFP

TVIVNDFATVKEVLNRVEFDGRPDLFIARMREKNFLLRGIFFTQGPDWKEQRRFILRYLRDYGFGRRFDELEAETNAEIL

TLVEMLRYGPRHEHETEFMTKDGCAMVPNVFFACFANAFLYVLTGERINRDEAGALFE (2) 22960

35137 TGKYAMMFQRTGDDYGTIYSLL

PWMRHLFPNRTRYRTIREGSLGVNRFIESIIQKRLETHEEGHVRCFLDLYFTEMKKTVPRTEDNRFTFQH 35412

35469 DQLVLGIVDF

FFPAISGATTQIALLLERLLWHPEVVQKMQAEIDDVVGHGRLPTLDDRINLPYTEATLREAMRIDTLVPSGVAHMAMKDT

TLRGYDIPKDTILVLGLDSIHMQKDIWGDPERFRPERFLNYRGELSLSKDVSVPFGAGKRLCAGETFARNTMFLIVSALV

QNFNIRQRLGDKLPDMGKRSTGIIISPADYWVKFEPR* 36092

 

>CYP304C1 AAGE01104491 512990636 572473586 613989430 64% to CYP304C1

749978894 754492027 584954719 complete

MVLISELIIAALLGLLIYRFYRYLFERPSENFPPGPPRL

PLLGGYPFMLALNYKHLHKAAARLSQLYKSKLIGLYLGPLPAVIVNDYDTVKEVLTRPEF

DGRPDLFMARLRDQHFQRR (1)

GIFFTDSESWREQRRFFLRTLHHFGFGRRSPEAEADIQAGLEDVISLLRDGPKYEHEKAL

VDSAGFALCPTVFFAVFSNVLLRMIVGVRLAREDQAVMFE

VGKNAIAFHRNGDDYGMLLSYIPWIRHLFPKTTKYDLLRKVNQQANAVILSLAQKCES

SYDENDIRCLVDAYIQEMRATGSKGESTGKDEFGFQ (1)

YDQLVIGAADFLVPPFSAIPAKICLILERLIQYPEVQTKMYRELNEVVGLNRLPTLDDRA

DLPYCDAVIREGLRIDALVPSGIPHMAVTDTQLNGYQIPKGTVIVNSLEFIHHQPEIFRD

PDSFMPERFLTPDGKLALDQDKTLPFGAGKRVCGGEQFARNALFLGVTSLVQNFTFQ

LPAGRACPDLDGRITGVIQTTPDFRLKFVSRR*

 

>CYP305A6 AAGE01041187 494160882  476322188 754462117 mate pair = 754369970

which is an exact match to part of AAGE01202372

65% to 305A2 825745101 613940462

AAGE01202372.1 N-term exon for CYP305A complete

1435  MITLVLSSVVIVSFIFWLWQDLQRPPNFPP (1) 1346

GPKWLPFFGNTLLIRNLARISGGQHLAFEALSKQYKSPVIGLKLGREHVVVALQ

YPAVHEALTKEAFDGRPDNFFIRLRTMGTR (2)

LGITFTDGPFWTEHNSFVVR

HLRQAGYGRQPMQLQIQNELNELIGIIRDLDSEPVWPGSILPTSVINVLWTFTTGSRIPR

DDQRLTRLLKLLQDRSKAFDMSGG

ILSQLPWLRHIAPEWTGYNLINRFNQEIHEFFKATIEKHHQDYTEEKCSDDLIYAFIK

EMKERKDDPCSTFTDVQLSMIILDIFIAGSQTTSTTIDIALMILAMNTEIQRKIYAEIDD

NFHPDEIPDQNCRTNLQYTEAFLLEVMRLYQIAPIGGPRRALSDCTLGGYRIPRNTTILM

GLHTVQMDPDHWGDPENFRPERFIGPDGKIINTERLIPFGLGRRRCLGDSLARSCMFTFL

VGILQKFSLRLPDSLEGPSLKLTPGITLSPKPYKVVFEPRLK*

 

AAGE02003241.1

24317  MITLVLSSVVIVSFIFWLWQDLQRPPNFPP (1)  24228

13700  GPKWLPFFGNTLLIRNLARISGGQHLAFEALSKQYKSPVIGLKLGREHVVVALQYPA  13530

13529  VHEALTKEAFDGRPDNFFIRLRTMGTR (2) 13449

13391  LGITFTDGPFWTEH  13350

13349  NSFVVRHLRQAGYGRQPMQLQIQNELNELIGIIRDLDSEPVWPGSILPTSVINVLWTFTT  13170

13169  GSRIPRDDQRLTRLLKLLQDRSKAFDMSGGILSQLPWLRHIAPEWTGYNLINRFNQEIHE  12990

12989  FFKATIEKHHQDYTEEKCSDDLIYAFIKEMKERKDDPCSTFTDVQLSMIILDIFIAGSQT  12810

12809  TSTTIDIALMILAMNTEIQRKIYAEIDDNFHPDEIPDQNCRTNLQYTEAFLLEVMRLYQI  12630

12629  APIGGPRRALSDCTLGGYRIPRNTTILMGLHTVQMDPDHWGDPENFRPERFIGPDGKIIN  12450

12449  TERLIPFGLGRRRCLGDSLARSCMFTFLVGILQKFSLRLPDSLEGPSLKLTPGITLSPKP  12270

12269  YKVVFEPRLK*  12237
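In the coordinate-annotated lines, a start larger than the end means the exon is read from the minus strand, and each span should hold three nucleotides per printed residue. A small consistency check, assuming coordinates are inclusive and cover exactly the printed codons (a stop codon counts as the residue '*'); `span_info` and `span_matches` are hypothetical. The examples come from the AAGE02003241.1 block above.

```python
def span_info(start: int, end: int) -> tuple[int, str]:
    """Length in nucleotides and strand for inclusive contig coordinates."""
    length = abs(end - start) + 1
    strand = "+" if end >= start else "-"
    return length, strand


def span_matches(start: int, end: int, peptide: str) -> bool:
    """True if the span holds exactly three nucleotides per printed residue."""
    length, _ = span_info(start, end)
    return length == 3 * len(peptide)


print(span_info(24317, 24228))  # (90, '-')
print(span_matches(24317, 24228, "MITLVLSSVVIVSFIFWLWQDLQRPPNFPP"))  # True
print(span_matches(12269, 12237, "YKVVFEPRLK*"))  # True
```

A mismatch would point to a typo in the coordinates or sequence, or to a split codon at an exon boundary being counted on one side.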

 

>73% to CYP305 above 519967093 521924636 570423900 pseudogene of AAGE01051792

contains a deletion and a stop codon

FLPGPQWLPFIGNTPFVRKLARASGGQHLAFEALSKQYNSPVIGLKLGREYVVVALQYPAVREVHSKEEFDGRPDNF

LLKMRLERFVISRLGVTCTDGPFWAEHRNFVVRHLRQAGYGRQ

GIIRDMDGEPVWPGSILPTSVINVLWTFTTGSRIPRDDQRLARLLKLLQDRSKAFDMS

GGVLSQLPWLRHIAPEWTGYNLLKRFNQELHEFFMIIVERHHQEYHEEKCSDDLIYA

FIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQTTSITIDLAFMMLTMHTDIQRDTCRN

R*DLHHDEMPSKRSYSLPYTE

 

AAGE02003240.1 this matches 305A5

Sbjct  53574  FLPGPQWLPFIGNTPFVRKLARASGGQHLAFEALSKQYNSPVIGLKLGREYVVVALQYPA  53395

 

Query  61     VREVHSKEEFDGRPDNF-----------------LLKMRLERFVISRLGVTCTDGPFWAE  103

              VREVHSKEEFDGRPDNF                 LLKMRLERFVISRLGVTCTDGPFWAE

Sbjct  53394  VREVHSKEEFDGRPDNFFLRLRTMGTR*DFKL*CLLKMRLERFVISRLGVTCTDGPFWAE  53215

 

Query  104    HRNFVVRHLRQAGYGRQ--------------GIIRDMDGEPVWPGSILPTSVINVLWTFT  149

              HRNFVVRHLRQAGYGRQ              GIIRDMDGEPVWPGSILPTSVINVLWTFT

Sbjct  53214  HRNFVVRHLRQAGYGRQPMQLQIQNELNELIGIIRDMDGEPVWPGSILPTSVINVLWTFT  53035

 

Query  150    TGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLPWLRHIAPEWTGYNLLKRFNQELH  209

              TGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLPWLRHIAPEWTGYNLLKRFNQELH

Sbjct  53034  TGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLPWLRHIAPEWTGYNLLKRFNQELH  52855

 

Query  210    EFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQ  269

              EFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQ

Sbjct  52854  EFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQ  52675

 

Query  270    TTSITIDLAFMMLTMHTDIQRDT-CRNRXDLHHDEMPSKRS-YSLPYTE  316

              TTSITIDLAFMMLTMHTDIQ+        +LH DEMP +    SLPYTE

Sbjct  52674  TTSITIDLAFMMLTMHTDIQKKIHAEIDENLHQDEMPQQNDRTSLPYTE  52528
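Pseudogene calls here rest partly on in-frame stop codons. A crude flag, assuming translated sequences use '*' for stops, is to list every '*' that is not the final residue. As the AAGE02003240.1 alignment shows, a genomic hit that runs into an intron (FFLRLRTMGTR*DFKL*) also produces stops, so flagged positions still need manual review. `internal_stops` is hypothetical.

```python
def internal_stops(protein: str) -> list[int]:
    """1-based positions of stop codons ('*') other than a terminal stop."""
    body = protein[:-1] if protein.endswith("*") else protein
    return [i + 1 for i, residue in enumerate(body) if residue == "*"]


print(internal_stops("HTDIQRDTCRNR*DLHHDEMPSKRSYSLPYTE"))  # [13]
print(internal_stops("YKVVFEPRLK*"))  # []
```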

 

>CYP305A5 AAGE01051792 70% to CYP305A2 but no stop codon N-term exon is one of two choices.

82% to other CYP305 Aedes seq

520611721 836008963 529076567 570690021

AAGE01309663.1 CYP305A N-term exon matches by default, since the other CYP305 already has an exon 1 sequence; complete

     MIVLVLTSVLIIAFSYWLLQELRRPPNYPP (1)

     GPQWLPFIGNTPFVRKLARASGGQHLAFEALSKQYNSPVIGLKLGREYVVVALQYPA 696

 697 VREVHSKEEFDGRPDNFFLRLRTMGTR (2?) 777

 838 LGVTCTDGPFWAEHRNFVVRHLRQAGYGRQPMQLQIQNELNELIGIIRDM 987

 988 DGEPVWPGSILPTSVINVLWTFTTGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLP 1167

1168 WLRHIAPEWTGYNLLKRFNQELHEFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDD 1347

1348 PSSTFTDLQLTMIILDIFIAGSQTTSITIDLAFMMLTMHTDIQKKIHAEIDENLHQDEMP 1527

1528 QQNDRTSLPYTEAFLLEVQRFFHIVPVSGPRRALSDCTLGGYQIPKNTTILMGLRTVHMD 1707

1708 PEHWGDPECFRPERFLSPDGKIITTERLIPFGLGRRRCLGESLARACMFTFLVGILQKFS 1887

1888 LRQPANCSEKPSPKLLPGITLSPKPYKVIFEPR* 1986

 

>CYP306A1 570772008 512981304 597667916 641824294 753304856 593374976 574131373

587966306 514783872 514783871 618134500 835036042 803206894 578828539

AAGE01228356 AAGE01635404 AAGE01635520 complete

MYLILGIVLILTYVLWTLLDRRGKPPGPFGLPILGYLPFIDSIKPYETLTNLAKRYG

PVYSLRMGQVDAVVLTAPDLIRDTLKREETTGRAPLFITHGIMGGH (1)

GIICAEGNLWRDQRRLSTEWLRKMGMTKFGPTRATLEARILIGVNELLE (0)

DLRRESEKVFAFDPAPLLHHILGNLMNDIVFGLQYERDDATWRYLQHLQEEGVKHIGVSMAVNFLPFLR (2)

HLPSSKRIIEFLLNGKAKTHKIYDSIIEKQRSRMEGGGSEVSDP

GRHDDCILSNFLQETRRRETGARPELAFCSDVQLRHLLADLFGAGVDTTFTTLRWLILFL

ALNKDAQERLRQEMASQLRGEPCLNDVDSLPYLKACVAEAQRLRTVVPLGIPHGAVS (0)

EITIAGYKVSKNTMIIPLLWSVHMDPSLWPNPDRFDPDRFLDESGQYSAPAHFMPFQT

GKRMCLGDELARMILLLYTGRLFWHFELDVFNGEGLDLTGVCGITLTPPPFEIIFKERV*

 

>CYP307A1 571521703 817504746 824335840 591439033 834970143

TC53059 TC28026 TC50479 78% to CYP307A1 complete

813467047 (exon 1) found by searching with the DNA seq above, 67% to anoph 307A1

246 MAYTLILVALMSLLSVVCYLKVLYEWHRKVRVQTVKSSRYAKKLQKLEESQPQEVEEAP 422

423 VEFPQAPGPYPWPVLGSAAIIGQYPAPFMGFSALAKKYGDVYSIRIGQGQCLVVSSLELI 602

603 REVLNQNGRYFGGRPDFLRYHQLFGGDRNN (1)

SLALCDWSSLQQKRRNLARKHCSPSDASSYYQKMSDVGV

AEMHYFMDQLTDVVTPGQDFKVKPLIMQACANMFSKYMCSVRFEYDDAGFQKMVHSFDEI

FYEINQGYAVDFMPWLAPFYFRHMSKLSSWSNYIRGFILERIVNEREQNLGEDEPERDFT

DALLKSLREDPSVSRDTIMYMLEDFIGGHSAIGNLVMLALGYVAKNPEIGARIQQEIDHV

TDKGLRNVTLYDTESMPYTVATIFEVLRYSSSPIVPHVATENTCIG

GYGVQTGTVVFINNYDLNTSEKYWDHPERFDPSR (2?)

SNESQKQILRVKKNIPHFLPFSIGKRTCIGQNLVRGFSFIMLANILQKYDVHT

NDPAQIKMKPACVAVPPDTYPLAFTQRSQ*

 

>CYP307B1 AAGE01081732 476411966 68% to 307B1 519649910 578920479 complete

revised according to AAGE02011086.1 and AAGE02028078.1 4/20/06

1027 MEKFTIFLFSSNTIYLLVACFLVTLIMLLLEVRQKISVKSDLVKLVKSFLFGQWLSVFTQNNKNRNL 848

847  NDTEVKVLRRAPGPKSYPIIGNLKDLDGYEVPYQAFSVLAKKYGPVVNLKLGVVDAVVIN 668

667  GIEHIKEVLINKAQYFDSRPNFRRYQLLFSGNKEN 533 (1)

     SLAFCDWSEVQKARRDMLVPHTFPRNFSGRFNELNGVINDEIRLVIGESNVNRVIEIK

14  PIIMNICANVFSQYFASHRFELEDPKFQKLVKNFDQIFYEVNQGYAADFLPFLLPLHHR 193

194 NLKRMDQLAEEIREIMLETIINDRYDNWVEGNTENDYVDSLINHVKSKIGPDMEWETALF 373

374 ALEDIIGGHSAVANFLVKTFGYIIQHPEVQQNIQSEVDRVLETEGKHTVDLSDRNHMPYT 553

554 EAVIMEALRLIASPIVPHVANQDSQIG 637 (1?)

685 GYDVPKDTLIFLNNYDLSMSENLWENPNDFVPERFLQNGRLVKPDFFIPFGAGRRS 864

865 CMGYKMTQLISFSIIANLLRSYTITPLSGHSYFVPVGSLAMPEKSYEFQINLRH* 1029

 

CYP3 clan CYP6 related sequences

Note: CYP6 and CYP9 sequences (in Anopheles) have only one intron and will be the easiest to assemble. CYP6AG, 6AH and 6AJ are exceptions.

 

CYP3 clan sequences

CYP6 related, 14 complete, 20 partials

 

>AAGE01198540 494152727 63% to CYP6Z2 67% to AY433537 519918984 574095157 569650597 complete

MFIYTFALFWLALVLVLRYIYSYWDRNGLASIKPQIPYGNLKSVAQK

TQSFGVATCELYWKSQERLAGIYLFFRPAVLIRDAHLAQRIMTTDFSYFHDRGVYCNEEI

DPFSANLFAL

PGKRWRNLRHRFTPLFTSGQLRCMMPTILDVGHKLQKFLEPAAERQEVVDIREIVSRGVL

ELIASLFFGFEADCINDPDDAFSKTLREFQLGGFMNNFRTACTFVCPELLQVTRISSLSP

QMIKFATDVVTKQIEHREKNNVSRKDFIQLLIDLRREEANNNEVALSFEQCAANVFLFYV

AGSDTSTSAITFTLHELTQNPEVMDKLQSEIDEMLVQTNGELTYTAIKELPYLDLCVKET

LRKYPGLAILNRKCTKSYAVPESSVVIQEGTQIMIPLLAYGMDEKYFPEPERYYPERFNKQSKNYDEKA

YYPFGEGPRNCI (1)

AYRMGVMVSKIGLILLLSKFKFEATQGPKIVFSAATVPLVPKGGIPVKISNR*

 

>AAGE01065173 78% to AAGE01198540

N-term is on AAGE02015843.1 (revised 4/20/06)

46903 MFIYTFALFWLAVAFAIRYIYSYWDRNGLPSIKPH

3333 IPYGNLKAVANRTESFGVATCDLYWKSKDRLVGIYLFFRPAVLIRDAHLAQQIMTTDFSH 3154

3153 FHDRGVFCNEEVDPFSANLFALAGKRWRNLRNKFTPLFTAGQLRCMMPIILSVGHKLQNV 2974

2973 LEPAAKKQEVLEIRELVSRCVLDIIASVFFGFEANCINDPNDAFIQNLRELQYDGFFNNL 2794

2793 RAAASFICPELLKLTRISSLSPEMIRFVTDIVTKQIEHREKNKVTRKDFIQLLIDLRRED 2614

2613 TNNNEAALGFEECAANVFLFYVAGSDTSTSAVAFTLHELTQNAETMGKLQTEIDEMLVKT 2434

2433 SGELTYDGIKEMSYLDLCVKETLRKYPGLAILNRECTKSYAVPNSDILLKKGTQVVIPLL 2254

2253 AYGMDEKYFPEPDRYLPERFDKSTKNYDEKAFYPFGEGPRNCI (1) 2116

2065 AFRMGVMVSKICLVLLLSRFNFEATRGPKIDFTPSTVALLPKGGIPVKISIR* 1907

 

>AAGE01047841 AY433537 62% to 6Z2 569650597 622013821 579345058 complete

MLFIYSVALLCIAVTLALKYVYSYWDRHGLPSVKPHIPFGNLKTVVKKTESFGIAIN

QLYWQTKGQLAGIYLFFRPAILVRD

AHLAQQIMTTDFNHFHDRGIYCNEEGDPFSANLFALPGKRWRNLRNKLTPLFTGGQLRGM

MPTILEVGEKLQKHLEPVAERQEVVEIRDIVSRFVLEIIATVFFGFEANCIEDRDDSFSK

VLREAQGERLSAVLRAAAMFVCPGLLRYTGISSLEPQVIAFVSEIVTKQIEHREKNSVTR

KDFIQQLIEIRRGSGENQVPAMSIEQCAANVFLFYAAGSETSTGTIAFSMHELSHHADVM

KKLQDEIDDALAKSNGAITYESVMQMQYLDLCVKETLRKYPGLPFLNRECTMDYKVPDSD

LVIRKGTQLVLPIYGFSMDEQYFPEPECYIPERFEEASKNYDEKAYYPFG

DGPRNCI (1)

AYRMGVLITKIGLILLLSKFTFEATQGPKMMFSSASVPLLPKDGISLKISN

RKR*

 

>AAGE01005406 80% to AY433537 62% to 6Z2 complete

possible pseudogene with a frameshift at AVGDKLX (X = ct),

confirmed in four trace archive sequences:

520668645, 757097876, 589569591, 811977620

5398 MLFVYTLTILSIAITLVLKFVYSYWDRYGVQNIKPHIPFGNLKTVVKKTESFGVAINQLY 5219

5218 WQTKGQLVGIYLFFRPAILIRDAHLAQQIMTTDFNHFHDRGVYCNEEGDPFSASLFSLPG 5039

5038 KRWRNLRNKLTPLFTGGQLRGMMPTILAVGDKLX 4940

4937 KHLEPVAENREPIEIRDIVSRFVLEIIATVFFGFEANCIKDRNDAFCRVLREAQRESMYT 4758

4757 NFRAAAVFVCPGLLKYTGISSLEPEVKEFVSGIVTEQIEHREKNGATRKDFIQQLIELRR 4578

4577 EDSQNQNVRMSIEQCAANVFLFYIAGSETSTGTITFTMHELSQHPEVMKKLQAEIDDTLA 4398

4397 KSNGEITYENVNQIQYLDLCVKETLRKYPGLPILNRECTSDYKVPDLDLVIRKGTQVVIP 4218

4217 LYGISMDEQYFPEPECYKPERFDGASKNYDEKAYYPFGEGPRNCI (1) 4083

4017 AFRMGVLVSKIGLVLLSSKFNFKPTQGPKIVFSPAAVPLVPKGGISLMISRRDK 3856

     VADLYMGLHISVVLKVVCS*
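The frameshift note for AAGE01005406 (X = ct) presumably marks a codon position where only two bases are present. A minimal translation sketch with the standard genetic code (NCBI table 1), assuming X is emitted for any incomplete or ambiguous codon; the DNA strings are hypothetical. The `frame` argument shifts the reading frame by one or two bases, which is what a 1- or 2-nt indel does to everything downstream.

```python
BASES = "TCAG"
# Standard code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG (NCBI table 1)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate from offset `frame`; incomplete or ambiguous codons become X."""
    dna = dna.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(dna[pos:pos + 3], "X")
        for pos in range(frame, len(dna), 3)
    )


print(translate("ATGGCTTAA"))   # MA*
print(translate("ATGGCTCT"))    # MAX (trailing CT is an incomplete codon)
print(translate("AATGGCT", 1))  # MA
```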

 

>AAGE01054542 476413066 56% to 20199522 76% to 6Y1 579367130

614744104 834925676 complete

MWLVYLVWLVAAVLLAVYLWIKKRFNFWKDRGVEYIEPEFPFGNFKTLGKVEHIAPITQR

HYDYFKQKGVPYGGVFMLTSPLLYILDTKLIKTLLVKDFNHFPNRGVYFNEKDDPLSAHMFAI

EGNKWKTLRNKLSPTFTSGRIKMTFPLVVGVCQQFCDHLGEVVQQSNEVEMHDLLSRYTI

DVIGTCAFGIDCNSFREPDNEFRKYGKIAFDKLPHSPLVVYLMKAFRSYANAFGMKQLHE

DVSSFFSKVVKDTIEYRESNNVVRNDFMDLLLKLKNTGRLEESGEEIGKISFEEIAAQAF

IFFTAGYDTSSTAMTYTLYELALNQKAQEKARKCVLDIFAANNGTLTYESVGNMGYLDQC

IN (1)

936 ETLRKHPPVAILERNADRDYKLPDSDIVIKKGRKIMIPTFAMHHDAEHFPDPE 760

759 RYDPDRFSPEQVACRDPYCYLPFGEGPRICIGMRFGTIQARVGLASLLKRFRFRVCDKTQ 580

579 IPVRYSKTNFILGPANGVWLRVEKL* 505

 

>AAGE01206812 586027460 593564617 637757183 494307621 complete 38% to 6M1

TC54189 TC23406 TC42024 38% CYP6P3 TC54190 TC23407 TC42025 TC574 TC6535

83% to TC63333 94% to TC54191 581543219

MFLVILLITLSLYLYQKWIYTYWKRRGVPQLNPSFPFGDVADTFKQRKSYANRLAELHHQ

SASDSHRFVGIYTLFQPILLVTDVELVRRMLTVDFEHFTDRGAHVNEKRDPLSGHLFSLAG

AKWRRMRLKLTPAFTTAKLKAMFPTMMACGRTLSAVIDDHVGRALAIRDLMTRFTMDVIASVGFGLE

CNSMRNPDELFRQMGGRFFSKSWKTSVRMLLAFVAPKVNRYLQVK

LNDDDVEEYMLNLVR

DTIAKREGGGEVRKDFIQLL ()

VQLRNQVEVKDGGSWEMNKVDQNKTLTVEEMAAQSFVFLN

AGYETTSSTVTFCLFELCRNKDLIRKVQEEIDRVMDGGREISYEALAEMTYLESCIDETL

RKYPISPVLFRVCTKPYKIPETDVVIEKDTLVQISLVGLQRDTRYYEDPVKFDPDRYGER

KSETMPHYSFGDGPRVCI (1)

GLRMGKVMAKMALVELLFRYDFELESPAADSGEIELDPSLLMLQAKHDVKLIPRFRAK*

 

>617983543 some differences with TC54191 TC42026 44% to CYP6Z3

94% to 586027460

584131270 520119914 760257438 832454533 625082069

complete 39% to 6N1 anopheles complete

MLLPILLVVLVVYLFQKWTYSHWKRRGVPQLNPAFP

FGNVADTFKQRTSYSNRLAELHHQAVRDGHRFVGIYTLL

QPILLVTDVELVKRMLTVDFEHFVDRGAHVNEKRDPLSGHLFSLTGAKWRRMRLKLTPAF

TTAKLKAMFPTMMACGRTLSAVIDDHVGRALAIRDLMTRFTMDVI

ASVGFGLECNSMRNPDELFRQMGGRFFSKSWKTSVRMLLAFVAPKVNRYLQVKLND

DDVEEYMLNLVRDTIAKREGGGEVRKDFIQLLVQLRNQVEVKDGGSWEMNKVDQNKTLTV

EEMAAQSFVFLNAGYETTSSTVTFCLFELCRNKDLIGKVQEEIDRVMDGGREISYEALAE

MTYLESCIDETLRKYPISPVLFRVCTKPYKIPETDVVIEKDTLVQISLVGLQRDTRYYED

PMKFDPDRYGERKSETMPHYSFGDGPRVCI

GLRMGKVMAKMALVELLSRYDFELESPAADSGEIELDPSLLMLQAKHDVKLIPRLRTK*

 

>NABNU08TR  NABNU08 32% to CYP6Z4 59% to 586027460

(no genomic match) looks like a pseudogene

best genomic match was to 617983543 at 77%

I do not think the TIGR database actually has an Aedes sequence that is missing from

the 15 million Aedes trace files, so this may be a contaminant from another

species.

 

KTLTPFERAAQSSGSQKAFYETTSATGDGSRIERSRNKDLI

GKVQEEIDRVMDGGKGIS*

EALAETTYPESCTEETLRKHPSPPDQDRGGTKPNKTPETDDASEKDTPVQTPPGGTQRDK

REKEDPEKHEPERYGERKPETTPHHSRGDGPRDSTGHRKGKATAKKALAEQPTRNDYEQE

PPAADTGENEQEPSQPTPQAKHEVKQKPRQRAK

 

>AAGE01003592 512632636 TC63333 TC10785 TC15419 TC26904 TC37692 TC4101

41% to CYP6P4 83% to 586027460 836033925 753054225 574000494

(n-term looks identical to 586027460) complete

revised 4/21/06 used AAGE02004393.1, AAGE02030939.1

MFLVILLITLSLYLYQKWIYTYWKRRGVPQLNP

SFPFGDVADTFKQRKSYANRLAELHHQSASDSHRFVG

IYTLFQPILL

VTDVELVRRMLTVDFEHFTDRGAHVNEKRDPLSGHLFSLAGAKWRWMLQKLAPAFTSAKV

KSMFPTMMTCGRTLSAVVGDHLGRALPIRALMTRFTMDVIASVGFGLDCN

SMRNPDEPFHKMGSKFFSKSWKTSVRMLLAFVAPKVNRFLQL

KLNDDDVEEYMLNLVRDTIAKREHGGEVRNDFIQLLVQLRNQVEVEDGGSWEINKVEPNK

ALTVQEIAAQSFVFLNAGYETTSSTITFCLFELCRNRDLLGKLQEEIDEVVDGGREASYE

AITEMTYLEACVEETLRKYPISPVLFRVCTKPYRIPDTDFVIEKGTLVQISLVGLNRDPR

YYEAPLKFDPDRYGERKAETMVHYSFGDGPRGCIGLRMGKVMVKMALVELLSNYDFEMES

PTGENELDPSLLMLQPKHDVILIPKFM*

 

>CYP6AG3 AF288534 AAGE01003202 TC54102 TC12857 TC2905 TC29599 TC46955

TC9197 62% to CYP6AG2 48% to 6AG1 complete

Revised 4/21/06 used AAGE02011378.1 211773-218215 (+) strand

211773 MWWTVVGVLGGILSAIYLFLSWNFNCWKKDGIKGPKPRLLFGNL

       PNVLTQKKHIFYEYEKIYN (2) 211961

216796 DFKTEPVVGYFSVRTPQLMIREPELIKEVLIKGFRYFSA

       NEFSDVVDEKSDPLFARNPFSLSGEKWKTRRGEITPAFTNNR (0)

       IKALSTLMDEVCDRMTDHVKKQKESALETKE (0) 217188

217247 LMSKYTTDVVSNCVFAIDAQSFSKDKPEIREMGRRIMDFNFAA

       QIILMVTTFLPSVKKFYKFTFVPREVEQFFIRIMKDAIRHRKENNIVRNDYLDHLLSL

       QEKKQISEIDMAGHGVSFFADGFETSSLVMTYCLFDLASHPEIQTRLREEIRNVQATK

       GGINYDNIGEMTYLDQVLNETLRIHPIIPVLAKRCTESTVLVGPKDQKIPVSAGTTVV

       IPYFVQLDSQYYQEPNKYNPERFSPENGGTKPYRERGVYFPFGEGPRMCLGMRFAIAQ

       VKRGIIEIIDKFEISVNSKTQVPLKYEPKMFMLYPVGGIWLNYKPIK* 218215

 

>CYP6AG4 possible assembled whole sequence 93% to 6AG3

NABUJ77TF = 6AG4  NABUJ77 = 6AG3 57% to CYP6AG2 90% to 6AG3

AY431873 96% to 6AG3 only 2 aa diffs to new 1.231_5

60% to 6AG2 only 45% to 6AG1 complete

DR747526.1 EST 95% to 6AG3 98% (only 3 aa diffs) to AY431873

This sequence was not found in the WGS section and may be a hybrid.

Replace with

AAGE02011379

23932 MWWTVVGVLGGILSAIYLFLSWNFDCWKKDGIKGPKPRLLFGNLPNVLKQKKHIFYEYEKIYN (2) 24120

30535 DFKTEPVVGYFSVRTPQLMIREPELIKEVLIKGFRYFSANEFSDAVDEKSDPLFARN 30705

30706 PFSLSGEKWKTRRGEITPAFTNNR (0)

      IKALSTLMDEVCDRMTDHVKKQKEPAVDTKE (0) 30927

30987 LMSKYTTDVVSNCVFAIDAQSFSKDKPEIREMGRRIMDFNFRAQIILMITTFLPSVKKF 31163

31164 YKFTFLPREVEQFFIRIMKDAIRHRKENNIVRNDYLDHLLSLQEKKQISEIDMAGHGVSF 31343

31344 FADGFETSSTVMTNCLFDLASHPEIQTRLREEIRNVQATKGGINYDNIGEMTYLDQVLNE 31523

31524 TLRIHPIIPVLRKRCTESTVLVGPKDQKIPVSAGTTVVIPYFVQLDSQYYQEPNKYNPER 31703

31704 FSPENGGTKPYRERGVYFPFGEGPRMCLGMRFAIAQVKRGIIEIIDKFEISVNSKTQVPL 31883

31884 KYEPKMFMLYPVGGIWLDYKSIK* 31955

 

>AAGE01024260 51% CYP6AG1 complete

Replace with AAGE02035807.1 (1 aa difference from the earlier version)

3271  MLVTVGLLLTAFAALYLYLTWHFDYWRKRNVPGPEPLPLVGNFPAFFRRNRPVMEEKYQIYK (2)  3456

3518  DYCSKYNFVGIFTNRSPQIFITSPALARDILVKYFKNFHDNEIGLITNKELDPLF  3682

3683  GRNPFVLNGAAWKAKRAEITPAFTASR (0)  3763

3825  IKALYVSVENVCAQMTKYVKEHCESPIEMKELGDKFTTDVVSSCIFGADAQSFIHQDAE  4001

4002  IRDMGSKLMDSSLSFALKMAVMTVLPSVAKIANMSLVSKPREKFFIKLMAEAIRHREESS  4181

4182  EKYLDFLDYLSMLKKEKNITELDMAAHGVTFFLDGNETSSATLSLNLYELAKQPEIQKRL  4361

4362  REELMNATNDDGTISYETLSELPFLEQVFSEGLRLWPPVTFMSKVCTDPIELDLTSTRKV  4541

4542  PIERGTCAIISNWSLHRDPNFYEDPLKFNPDRFAPEKGGIFPYKEKGCYMPFGDGPRQCL  4721

4722  GMRFGRMQVKRGIYEVIRNFEISVASRTSDPLKIVSSPAISLGLSGIWLSFKPIRS* 4892

 

>AAGE01002325 58% to 6AG1 complete

5532 MFLTITLIVTAVAAIYLYLTWNYNYWKKLNVPGPSPLPGLGSFPSFITQRRPVADEM 5362

5361 DEIYR (2)

5286 EYKPKYNFVGVFSNRSPRIMITSSELAKDILSKNFKNFHDNEFGEMTNKEIDPLF 5107

5106 GRNPFMLTGDEWKAKRAEITPAFTTSR 5026 (0)

4960 MKALFPLVEDVCSRMTKYVTQNRGSVLDSKELSAKFTTDVVSSCIFACDAQSFTSGKPEI 4781

4780 REQGRKLMEQSFSSFLILLFIINFPTLAKIFKIGLVPKSLEKFFTDLMKEAISHRDASGT 4601

4600 NRVDYLDYLISLRNKKEISELDMAAHGVTFFIDGFETSSVAISFMLYEIAKNPEVQKRLR 4421

4420 KELQKVTTDQGTVSYDSLLELSYLDQVVNESLRLWPPAAFISKKCTEPMDLPLTANQNVT 4241

4240 IGKEICAIINIWSLHRDPEYYDDPLTFNPDRFSPETGGTAPYREKGCFIPFGDGPRQCLG 4061

4060 MRFARMQVKRCLYELVSNFKITVNEKTKQPMKLDPKQFLTMPLGGIWLDFEPISK* 3893

 

>AAGE01005157 51% to AAGE01002325 48% to 6AG1 complete

3390 MWLIVISILVTIVSLVYHYLTWNFNYWKYRGVPGPLPKPFLGTFSSTFTQKEHPIEENNRIYR 3202

3149 LFREYRKDVPFIGGFSFRSPQLFALSPTLVKDILVKYHKHFRANEVGGTFDSKADPLLAR 2970

2969 NPFFLDGEEWRSKRAQITPAFTNSR (0) 2895

2815 LKALLPIMDNICNNMVSYIDRHIPNGPIESKELSAKYTTDVVSSCIFGAEGGSLTS 2648

2647 DRSEIREMGNALFQQTFMFIVLAVISSIAPILKRFVKLSLIPKSIENYFVGLMTEAVRK 2471

2470 RKASGTKQVDYLDHLINLQEQKEISILDMAAHGVTFFIDGFETTSEVLGFSLLELSIDKE 2291

2290 IQNRLRQEIHSAEDGQLTFETIMELPYLDQIVN (1) 2192

     ETLRKWPPAYALSKRCTEEITFRLKDNHEVLIEKGITAILPIWAIHLDK 1990

1989 EFYPDPNRFNPDRFSEEDGGHSVRYYQEKGVFLPFGDGPRACIGRRIGLLQVKRALVEIV 1810

1809 KNYDFTVNSKTVLPIKIDPKNIAVTPLGGIWIDYRKL* 1696

 

>AAGE01024111 86% to AAGE01002325 complete

3449 MFITITLIVSAVTAIYLYLTWHFNYWKKLNVPGPSPLPGLGNFPSFITQKRPVAEEMDEIYR 3264 (2)

3191 EYKPKYNFAGVFSNRSPRIMITSAELAKDILVKNFKNFHDNEFGELTNKEIDPLL 3027

3026 GRNPFLLDGSEWKAKRAEVTPAFTTSR (0) 2949

2884 MKALFPLVEDVCSRMTKYLIKNRGSVIDAKELSAKFTTDVVSSCIFACDAQSFTSEKPEI 2705

2704 REQGRKLIEQTFSSFMLLLFIVNFPTLAKIFHVGFIPKSMEKFFTNLMKDAVRYRDASET 2525

2524 NRADYLDYLITLKKKKELSELDMAAHGVTFFIDGFETSSVAISFMLYEIAKNPTVQKRLR 2345

2344 QELKKVTTDNGTVSYDSLLELSYLDQVVNESLRLWPPAAFMSKKCTEPMELPLTANRSVT 2165

2164 IGKEVCAIINIWSLHRDPEYFDDPLTFNPDRFSPETGGTSPYREKGCFVPFGEGPRQCLG 1985

1984 MRFARMQVKRCLYEAVTNFAITVNPKTMEPMRLDPKQVLTMPLGGIWLNFEPISK* 1817

 

>CYP6AL1v2 AY771597 AAGE01003622 complete 40% to 6N1, 98% to 6AL1

476388850 494541913

TC67380 TC35464 TC46198 98% to AY771597, 2aa diffs to 6AL1

MLFLAFAIFVLFAIIQIVYHFRYWMRRGVPQLRPSFPFGDFGEF

FRQKHGIPMTYANIYARTRHLPYVGIYLSMRPVLFVNDPQMVKDILSRDFEHFHDRGL

HVNEETDPLSGNLFSLGGVKWKNMRAKLTPTFTSGCLKGMLAILIDKATVLQKQFAKE

IATHNTIEVKDLFARYTTDVIASVAYGIDNDSINNDHDLFRQMGIKVFQQDFKTSLRL

ALTFFIPKIKALLGFSLVAKDVEDFMINLVSKTIEHRERNGIQRKDMMQLMLQLRNSG

SVSINDQQWNLDSSATVKNLTINQVAAQVFVFFVAGYETSSTLMSFCVWELARNPEIQ

VKVHQEIDSVLSNYGGALTYEALADMEYLECCMEETLRKHPPVSFLNRECTKTYRIPE

TDVIIDKGTAVVVSLLGMHRDPQHFTQPTEFKPERFSSDEQSNESNKAYFPFGGGPRL

CIGMRLGMLQAKVALVTLLAKFEFSLGKEHVKDMELPLKANTLLLVPQDGIQLVVKKR

 

>AAGE01116725 827542817 63% to 6AL1v2 complete

    MLLAFLALSAPLVTVLIWLQFRYWTRCGVPQLDPSFPFGNFSEFFCQKNGIPS

    TYANLYHRTKHLPFVGIYLSLRPALLINDPELVKNILTRDFEHFHDRGIHVDEETDPMSG

    HLFALGGVKWKNLRAKLTPTFSSGSLKEMFPLLVEKATVLQKRFLKEIATSEVVEVKELAACY

  1 TSDVIASVAYGIDMDSINNRDDLFRRMGEKVLAHDLITSLRLALAFWFPKLKVMLGSKSI 180

181 APVIQEFMTELVRKTIEHREKEGVHRKDMMQLLLQLRNGVSLKRNGVQWTEDSAPKNAIK 360

361 SLSIDEVTAQVMVFFVAGYETSSSTVSFCLFELARHQDIQAKVHQEIDTVLAEHEGNLTY 540

541 ASLASMKYLEQCLEETVRKYPPVAILNRECTKTYRIPETDVIVEKGTPIVVPLMGMHRDP 720

721 QYFPQPNDFQPDRFEGGAQSKAYFGFGAGPRLCIGMRLGILQSKVAVVTLLRKFKF 888

889 SLANPEDQHTELRMKPRSFILTTEGGIQLVVQQRHVCET* 1008

 

>CYP6AL2 AAGE01012031 494577102 622074403 641800960 755156422 632907779 580010239

42% to 6Z2 complete 51% to 6AL1

MTLLSIGVALLCVAA

FAFLNYVFGYWKRRGIRQLTPHFPFGNFTDLFFGKASFPKVCENLYERSKQWRLLGGY

VLLRPILLVNDPQLAKDIMVKDFQHFHDRGPHVDEENDPLSGHLFSLAGEKWKHLRAKLT

PTFTSGRLKGMFQTLVDTGEVLQEYIQKYAEGEDVVEIREILARYNTDNIASVAFGIKID

SINNPNEPFRHIGRK (0)

VFEPNFRNNMRGLITFMVPKLNKYLKIKSVDDDVEKFILKVVQETLEYREKNGIVRRDMM

QLLLQLRNTGTVSVDERWDVETSDKFKKLTLKEVAAQAHVFFLAGFETSSTTMSFCLYEL

AKHPEIQRRVQAEIDSVTALHDGKLTYDSINDMRYLECCIDETLRKYPPVPVLNRECTQD

YKVPGMDFTIEKGTAIVLQIAGMQHDPQYYPDPMQFKPERFQDPEVKSKPYAPFGDGPRV

CIGMRMGKIQTKVGLCLLLSKFDFELFGHDEPELVMDPNNFVLTPVDGINLKVSCRE*

 

>AAGE01225620 AAGE01073711 84% to 6AL2 588918478 complete

818 MTPLSIGVALLCVAAFAFLNYVFSYWNRRGVQQLTPYFPFGNFSDLFLGKASFPRVCETL 639

638 YERTKKWRLLGVYILLRPVLLVNDPQLAKDIMVKDFQHFHDRGTHVDEENDPLSGHLFSL 459

458 AGEKWKHLRAKLTPTFTSGRLKGMFQTLVDTGEVLQDYIHTCAKNEEVVEIREILARYNT 279

278 DNIASVAFGIKIDSINNPNEPFRQIGRK (0) 192

120 FFESNFRNNMRLMITFMVPKLNKYFKIKSVDAEVEQFILGMAKETLEYREKNGVVRKDMM

QLLIQLRNTGTVSVDERWDVETSTNSKKLTIGEVAAQAHVFFLAGFETSSSTMSFCLYEL

AKNPEVQRKVQSEIDSVTALHDGKLTYDSINEMRYLECCIDETLRKYPPVPVLNRECTKD

YKVPDSDITIEKGTAVILQISAMHHDPQYYPDPLRFVPERFLDPDMKGKPYAPFGDGPRI

CIGLRMGKIQTKVGLCLLLSKFNFELYGHKESELVMSPNNFLNTPVNGINLKVSCRE*

 

>AAGE01005840 44% to 6AL2 810047144 637194488 complete

MLLCLLILGSIATFYLFLHHHYSYWKRRGISQLKPSFAFGDFGPVIRGRANFVHHLQGIYERTK

RDYSLLGLYVLFRPALLVNDFVVARDILSRDFQHFGDRGIYVDEKRDPFSGHLFALDGER

WRHVRHKVAPAFTPLKLKDVFQTQLIGGVVLQDHLKHFAESGQSVDVADLFLRYSVDMIA

SVAFGVEIDSVNCPEEQFYRVAHSSVESNVKNLLRWTGGFLIPKVLKYTGTR

LVDQHVQDFFMHVVQQTVEYREKTGFTRRDVLQSLLKIMNAESQNVSI 185

186 DFTITDLTVTAFTFLLAGMETSSSTATFCLYEIVNNQEIQRRLQKEIDESLQEHDGL 356

357 ITYDSVVAMKYLDHCVNEAMRKFPALAYLHRICTEDYLVPSTRTIIKKGTLVLIPIYALQ 536

537 RDQEFFPHPDLFLPDRFNDPEAIRQAPFFPFGEGPRSCIGQRMGKMNVKIALVHLLSRYN 716

717 FTLANPVDQGREAPIDPLHFTISPQGSFNMNVTHRKCSSPSSHSKSLTNHSVSLAH*

 

>AAGE01173027 TC56435 TC16115 TC27418 TC40206 TC8341 56% to CYP6N1v2

494318851, join with 223483845 complete

replace with AAGE02015839.1

Note: ESTs DV300013.1 and DV262803.1 have CVG, not CIG, in the heme region

33781  MIALLLIGAVTLVFLFVKQRFNYWKVRGVPYVRPTFPLGNLWGIGTKKHLSEGLEDLYVQ  33602

33601  LKGKAQLGGIYFFINPVVLVTDLDLIKTILIKDFNFFHDRSIYYNEKDDPLTAHLFTMEG  33422

33421  IKWKNMRVKLTPTFTSGKMKLMFPIVRDCANELEKCISKEIVDGKEIEVKDILARYTTDV  33242

33241  IGNCAFGLECNSLHNPNAEFREMGRKVFQLQGLGFLKLLLTQQFSTLSRALGATVLQPDV  33062

33061  AKFFLKTVSDNVDYREKNKIERNDFIDLMIKLKNGQTLEHDKSDQRVEKLSIEQVAAQSF  32882

32881  VFFFAGFETSSTLMSFCLYELAQNQDLQDKARKDILDTLNKHGSLSYEAVHEMKYLENCVS (1)  32699

       ETLRKHPPASNIFRTATQDYTVPGTSLTIEKGTSVMIP  32522

32521  TLAIHRDPEYYPDPMKFDPDRFTADQVAARHPFAFLPFGEGPRVCIGMRFGLMQARVGLA  32342

32341  TLLKNFRFTVGERLETPAQLDPSSAILLIKGGLWLKVDKI*  32219
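The CVG/CIG note above concerns the cysteine loop that binds the heme. As a minimal sketch, assuming the simplified consensus F-x-x-G-x-R-x-C-x-G (a simplification for illustration, not the full published P450 signature), the residues around the cysteine can be pulled out with a regular expression. The function name `heme_cxg` is hypothetical.

```python
import re

# Simplified heme-loop consensus F-x-x-G-x-R-x-C-x-G (an assumption for
# illustration; real P450s tolerate more variation than this pattern).
HEME_LOOP = re.compile(r"F..G.R.(C.G)")

def heme_cxg(protein):
    """Return the C-x-G triplet of the first heme-loop match, or None."""
    match = HEME_LOOP.search(protein.replace(" ", ""))
    return match.group(1) if match else None

# One line of the AAGE01173027 listing above.
line = "TLAIHRDPEYYPDPMKFDPDRFTADQVAARHPFAFLPFGEGPRVCIGMRFGLMQARVGLA"
print(heme_cxg(line))  # CIG
```

Running the same call on an EST translation carrying the variant would return "CVG" instead, which is the difference recorded in the note.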

 

>AY433475 AAGE01031181 complete 52% to 6M4 519940462 2246449 DR747763.1

821767964 627434636 90% to 6M5

MEPITIILVTILVLLLTYGFHLIRRQLRFFXDHNVPH

IAGNFVLIDKTQHPANHFLRWYKQSKGQYPLTGVFMFIKPIAIPLDLELIKRILVKDFQY

FQNRGMYYNERDDPLSAHLFSLEGAKWRSLRAKISPTFTSGKMKMMYPTMMAAGKQFSEH

LEEKMSEENELEMRDLLARFTTDMIGTCAFGIECNSMKEPNSKFREMGRKHFES

PRSGLKDLLKITAPGLAR

FFGVTEILPDVAEFFMDVVKSTVEYRMKNNVRRNDFMDLLIAMLDDETEGSESLTISEIA

AQAYVFFIAGFETSSTTMTWALHELSRNPEIQEEGRKCVQEVLEKYNGVMSYEAIMEMTY

IDYIIN (1)

ETLRLYPPVPLHFRVVTKDYPVPGTDTVLPAGTFTMIPVYAIHHDEDIF

PEPEKFDPTRFTPEEVSKRHAYAWTPFGEGPRICIGLRFGMMQARIGLALLLNNLRFSPG

PKSCTKMEFQPENLILTPKQGLWLKVEKV*

 

>AAGE01032555 AAGE01493222 (4 aa diffs) 476379758 88% to AY433475

570738080 578795972 58% to 6M2 complete

MEVITITLLTILVLLIAYASHLLRRQIRFFKDRNVPHIPASF

ELLDKTIHPAKHFLRWYKQFKGQYPLTGVIMFIKPIAIPLDLDLIKRILVKDFQYFQNRG

IYYNERDDPLSAHLFSLEGAKWRSLRAKISPTFTSGKMKMMYPTMVAAGKQF 558

557 SEYLEEKVEDGNELEMRDLLARFTTDMIGTCAFGIECNSMKEPNSKFREMGRKHFEA 387

386 PRNALKDAFKMTAPGLARFLRVTEILPDVSEFFMDVVKSTVEYRMKNNVRRNDFMDLLI 210

209 AMLDDKTEGSESLTINEIAAQAYVFFIAGFETSSTTMTWALHELSRNPDI

QEEGRKCVQEVLEKYNGVMSYEAIMEMTYIDQIIN (1)

ETLRLYPPVPMHFRVVSKDY

HVPETDTILPAGTFTMIPVYAIHHDEDIFPEPEKFDPTRFTPEEVNKRHAFAWTPFGEGP

RVCIGLRFGMMQARIGLALMLKNLRFSPGPKTCTEMEFQPQNFILSPKEGLWLNVEKI*

 

>CYP6M5 AAGE01133741 494330821 73% to AY433475 578801721  826022155 639077358 760273799 581855956 complete

MEVITITLLTILILLLIYVLHLLRRQIHFFKDRNVPYKPASFERLDKTIHPAMHFLRWYKQFKG

QYPLSGVFMFIKPIVIPLDLELIKRILVKDFQYFQNRGIYYNERDDPLS

AHLFSLEGAKWRNLRAKISPTFTSGKMKMMYPTMVAAGKQFSEYLEEKVGDGNELEMRD

LLARFTTDMIGTCAFGIECNSMKEPNSKFREMGRKHFEAPRNVLKDAFKMTAPGLAR

FLRVTEILPDVSEFFMDVVKSTVEYRMKNNVRRN

DFMDLLIAMLDDKTEGSESLTISEIAAQAYVFFIAGFETSSTTMTWALHELSRNPDIQ

EEGRKCVQEVLEKYNGVMSYEAIMEMTYIDQIIN

ETLRLYPPVPMHFRVVSKDYHVPETDTILPAGTFTMI

PVYAIHHDEDIFPEPEKFDPTRFTPEEVNKRHAFAWTPFGEGPRVCIGLRFGMMQARIGL

ALMLKNLRFSPGPKTCTEMEFQPQNFILSPKEGLWLNVEKI*

 

>CYP6M6 AAGE01004894 complete 476324109 637742538 512549238 568770347

5 aa diffs to 6M6 476418676 494093520 66% to AY433475

MDVFLLIAAFVLLVAYGLHLLRKQVNFWADRNVPHNPVNFRQTVDQTVHMARRFQGYYHQFK

GQYPFAGMYLFTKPVALAIDLELLKCIFVKDFQYFHDRGTYYNEKDDPLSAHLFNLEGN

KWRNLRSKISPTFTSGKMKMMYPTMIAAGKQFSEYMDEKVGVEQELELKDLLARFTTDVI

GMCAFGIECNSMKDPNAEFREKGRMHFETPRNRKKDMMCSIAPKLARMMGLKQIIPDLSD

FFLGVVRETIDYRVKNGVRRNDFMDLLIGMLTGENVELGP

LTFNEVAAQAFVFFVAGFETSSTTMTWALYELSVNQDIQEKGRKCVRDVLEKYNGEL

SYETIMEMSYIDHILH (1)

ETLRKYPPVPVHFRIVTKDYKVPNTETVLPAGTSVMIPVYAVHHDPEIFPDPK RFDPDRFTTEEINKRHPYAWTPFGEGPRICIGMRFGMMQARIGLALLLNNFRFSSG

KKSTVPLDFTAKSFILSPDEGLWLKVEKL*

 

>AAGE01105997 63% to 6M6 822931992 complete

     MMEPLDVAITVVMVALAVYMYLDKKHSYWADRKVPFVKPKFFYGNAKEISQTMQ

     VGQVFQQFYHELKGRSPFGGIYMFTAPVAVVTDLELLKCIFVK

   4 DFQYFHDRGTFYSEKGDPLSAHMFNLEGNKWKMLRNKLSPTFTSGKMKMMFPTIVAAGK 180

 181 QFHDFMDEKVKQESEFELKDLLARFTTDVIGMCAFGIECNSIKDPDAQFRVMGRKLFTTG 360

 361 RSKPKSFLMNTMPKVAKLLRLRIFPADVSDFFMKVVRETIDYRMANNVHRNDFMDLLIQM 540

 541 RNPDENKSSEGLLSFNEIAAQAFVFYLAGFETSSTLLTWTLYELAVNQDIQEKGRQHVKE 720

 721 VLKKHDGEMTYESITSMKYLDQILN 795 (1)

 860 EALRKYPPVPVHFRETSKDYTVPDSNIVIEGGTRLFVPVYAIHHDPEIFPNPEQFNPDRF 1039

1040 TPEEEQKRHPYAWTPFGEGPRICIGLRFGMMQARIGLAYLLNSFKFSIGEKCKVPLEFDV 1219

1220 KSFILAPKGGLWLKVEKI* 1276

 

>476419050 52% to CYP6N3 637792107 793200622 complete

AAGE02015839.1 4/21/06

102508  MWIFLLLSIAVLLILQVRRKYSYWKRHGVPFIQPRFPFGSITPVGDRVHSSQLMARFYNQ  102687

102688  LKGTYPFAGMYFFTNPVVLALDLDFIKNVLVRDFQYFHDRGLYHNEKDDPLTCHLFNIEG  102867

102868  TKWTNLRRKLLPTFSSGKMKMMCPTILAIADRFRTAIENSISDQNEIEMRDFLARFTTDV  103047

103048  IGTCAFGIDCNSLENPDAEFLKMGNKIFEVPTSRIIAYFFVSTFQELSKKLHIKAVPEDV  103227

103228  SRFFYKVVRETMAYRQSSGVQRNDFMNLLMQLKEKGELEGSDEKLGTLTLDEVVAQAYVF  103407

103408  FLGGYETSSTNMCFCLYELALNGEIQEKARECVQKAVAKHGGLNYEALMDMPYLEQCIY (1)  103584

        EALRKYPPIANLFRSVTQDYNVPNSNVMLPKGMNVWIPIYAIH  103767

103768  HDPEFFPEPELFDPERFTQEECEKRKPFTYMPFGEGPRTCIATRFGMMETKTGLATLLMN  103947

103948  FKFTKSARLEVPPKFSTKHVMLTPVGGLWVKVEKIEQ*  104061

 

>AAGE01021887 66% to AAGE01005098 complete

replace with AAGE02015839.1 4/21/06

112574  MWLNLLVMVFALSVILVRRRYSYWKRIGVPFIQPRFPLGSIGSIGTRIHSSQLLAQFYQQ  112753

112754  LKGSHPFAGIFYFLQPVALALDLEFVKNVMVRDFQYFHDRGLYYNEKDDPLSSHLFNIEG  112933

112934  TKWTTLRRKLVPTFSSGKLKMMCPTVVSVADRFKMCIEKSIAKEEAIEMRELLARFTTDV  113113

113114  IGSCAFGIECNSLENPDDKFRKMGEKVFDVSPFAILAFFFLSTFKDLARKCRISITDSEV  113293

113294  AAFFSTIVQKTITYREKNNVQRNDFMNLLMQMMKKNKEDESEENSVTLTLDEVVAQSYVF  113473

113474  FLGGFETSRTTMSYCLYELSLNQEVQNRARKCIQSAVAKHDGLNYEALMDMPYLEQCIN (1) 113650

113709  ESLRKYPPISNALRSTTKDYAVPGTEVILKKGTDVIVPIYAIHHDPEYYPDPELFDPD  113882

113883  RFSADQCAKRKPFTFMPFGEGPRMCVASRFGMMETKIGLAAMLMSFRFSKCEKSIVPLKI  114062

114063  SPNHLMLTPAGGLWLKVEQLESDETEMGFSKLISDERVNRLGYSM*  114200

 

>AAGE01004071 494535013 complete 55% to 476419050 56% to 6N2

TC62330 TC28072 TC40767 58% to CYP6N2

MWNLSTSSNPHTIVPAHPKEILLLATSSVPFIPARFPVGSFDGVGVRNHPSQLLAKFYR

QMKGLHPFVGVYYFLQPVVVVLDLDFAKTILIRDFQYFHDRGLYYNEKDDPISGNLLHLE

GSRWTNQRKKLIPTFSSGKLRMMCPTILKVADNLKVSFERYVAERDEIEIKDILARFSTD

VIASCAFGLDCSSLLEADDEFRRMGTKVFDISGWKLLKLFFVFAFGNVARRCHMKLIDED

ISQFFFKVVRETIDFRKKNHVHRKDFLNLLIQLKDNG

ELEGSNEKLGTLTLNEVVAHSFVFFLGGFETASTTMSYCL

YELSLNEEVQERARQCVKAAIHKYGDLNYDDLLDMPYLEQCINETLRKYPPSTIYRIVTQ

NYHVPDSSIVFPKGMSVMIPVYAIHHDPEFWPSPELYDPDRFAPEECVSRNPLTFIPFGE

GPRMCVAARLGVLQTKIGLATLLMNFRFSRCKNSTEPLQYSPKHFILTPVGGLKMRVEKI

Q*

 

>AAGE01078584 494579395 48%  to 581602077 46% to 6N2 586209056 568771938

574225739 pseudogene (sequence does not continue)

1188 MLLYLLLTVVTLAYLWIGRRYSYWKQRSVPYVEPRFPFGNLQGLNKRHFGLLAQDVYSKL 1367

1368 KGSGSKFGGMFFFVNPVAVILDLDFAKDVFVKDFQYFHDRGVYSNEKVDPITSHLVAMEG 1547

1548 IKWKNLRAKLTPTFTSGKMKMMFPTITAVADEFRKCMVNEVDKGGEIEMKEFLARFTT

DVIGSCAFGLECNSLADPEAEFRKMGKKALTMSPMGFLR 525

524 RILSVTFRDLAKFLGVRISDPDVATFFMNVVRSTIEYRERNKVQRNDFMDLLIKLKNVEP 345

344 IDENTNQLGPLTFNEIVAQAFVFFLAGFETSSTT 243

MCFCLYELAKNQELQDKARRNIDEVLAKYGTMTYEAVH

EMRYMENCIN(1)

ESLRKYPPLPNILRNVNKPY

 

>AAGE01020246 79% to 494579395 58% to 6N1 complete

4381 MLLFLLLSVVTAAYLWVIWRYSYWKRRSVPYVEPSFPFGNLQGLNKRHFGLLTQDVYSK 4557

4558 LKGTGCKFGGMFFFVNPMVVILNLDFAKDVFVKDFQYFHDRGEYSNEKADPIMAHLVTME 4737

4738 GTKWKNLRTKLTPVFTSGKMKMMFPI

4816 ITAVAEEFRKCMAKEADKGEDIEMKELLARFTTDVIGNCAFGLECNSLMDPEAEFRKMGR 4995

4996 KAMAMSSADFLRRKLCNSFRGLAKLLGVRLSDPDVSDFFMNAVRSTIEYRERNKVQRNDL 5175

5176 MDLLIKLKNAELIDEKSDRLGPLTFNEIAAQAFVFFLAGFESSSTAMSFCLYELAKNQEL 5355

5356 QDKARRNINEVLVKHGTLTYEALYEMTYIENCIN 5457

5518 ESLRKYPPVTNIVRNVSKPYRVPGMNVTLEEDCRVLLPVYAIHHDPSLYPNPDQFDPERF 5697

5698 NPENSAARHPMAFVPFGEGPRICIGLRFGSMQARIGLTYLLKNFRFTLSEKMHDPLKMMS 5877

5878 NTIILASEGGLWMRIEKL* 5934

 

>AAGE01192518 80% to AAGE01020246 637742736 (the 758 bp upstream contain no more P450 seq)

This break is not near the usual CIN/ESLR intron boundary. This is a probable

pseudogene fragment.

1554 YQVPGMNVTLEKGCRVLLPVYAIHQDPKLYPNPEQYDPDRFNPENSAARHSMAFVPFGEG 1375

1374 PRFCIGQRFGMMQARIGLTYLLKNFRFTLSEKTPSPLKILANSTVLASEGGLWLKLEKL* 1195
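The pseudogene call above rests on where the fragment starts. In many entries of this file the exon that follows the "(1)" intron begins ESLRK, ETLRK, ETMRK or EALRK. A minimal sketch of that check, assuming the pattern E-[S/T/A]-[L/M]-R inferred from these listings (not a published rule); `starts_at_usual_boundary` is a hypothetical name.

```python
import re

# Exon start seen after the "(1)" intron in many entries here
# (ESLRK..., ETLRK..., ETMRK..., EALRK...). Inferred from this file only.
EXON_START = re.compile(r"E[STA][LM]R")

def starts_at_usual_boundary(fragment):
    """True if a translated fragment begins with the ESLR-like exon start."""
    return EXON_START.match(fragment.strip()) is not None

print(starts_at_usual_boundary("ESLRKYPPVTNIVRNVSKPY"))   # True (AAGE01020246)
print(starts_at_usual_boundary("YQVPGMNVTLEKGCRVLLPVY"))  # False (AAGE01192518)
```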

 

>AAGE01052546 494130149 64% to 76419050 615888679

12  ADRGLYYNEKDDPISCHLFNIEGSYWTNLRKKLSPIFSSGKLKLMCPMVITIAERFQKCL 191

192 SKSITQNQQEAEMKEWLNRFTIDVIGTCAFGIECNSLTNPEEKFRKMGVKMFHVANSR 365

366 IIKFFFISLFKNLAKKVHIKSVPEDVSEFFFKVIRKTIAFREMNHVLRNDFINLSMQLMA 545

546 DGKLEGSDEDVGKITLNEVVAQSFVFFLAGYETSSTVMMFCLYELSLQEDIQRRARENV 722

723 ITAVSRHGGLNYDALMDMGYLDQCVN

    ETMRKYPPAGNLGR 899

900 CVTKDYNIPNTNITLRK

 

EGPRNCIAARSGMLMAK 1136

 

AAGE02018066.1 Length=351875 use this seq

319978  MWIYLLIGIITSLVLFVRRKYLFWERQGVPFIKPKFPFGNLLVNGKRVHTSQLTTYYYNA  319799

319798  LKGKKHPIGGVFFFTTPFAVVLDRELMRNVLIQDFQHFHDRGLYYNEKDDPISCHLFNIE  319619

319618  GSYWTNLRKKLSPIFSSGKLKLMCPMVITIAERFQKCLSKSITQNQQEAEMKEWLNRFTI  319439

319438  DVIGTCAFGIECNSLTNPEEKFRKMGVKMFHVANSRIIKFFFISLFKNLAKKVHIKSVPE  319259

319258  DVSEFFFKVIRKTIAFREMNHVLRNDFINLSMQLMADGKLEGSDEDVGKITLNEVVAQSF  319079

319078  VFFLAGYETSSTVMMFCLYELSLQEDIQRRARENVITAVSRHGGLNYDALMDMGYLDQCV  318899

318898  N (1)

        ETMRKYPPAGNLGRCVTKDYNIPNTNITLRKGLNVVIPVH  318719

318718  GIHHDAEYYPDPERFDPERFSAEESTKRLPFTFMPFGEGPRNCIAARFGMLMAKVGVASM  318539

318538  LMRFQFSKCSKTAVPLVISPKHASMSPEGGMWLKVKEIK*  318419

 

>AAGE01005098 6263502207 69% to 6N2

TC55162 TC31642 TC43307 584989096 579853579 complete

    MWIYLLIAAITLSVLLVR

232 RKYSYWKRHGVPYIKPTFPFGNIRPAGNRVHSSQLMTRYYNELKGKHQFGGIFFFTNPV 408

409 ALALDLEFIKDVLVRDFQYFHDRGMYYNERDDPISGHLFNIEGTQWTNLRKKLLPTFSSG 588

589 KLKMMSPTIISVAERFQECLEKCITVDTEIEMKDLLARFTTDVIGTCAFGIDCNSLN 759

760 DPEVEFRKMGNKMFELP

TGRILKFFFISTFKNLARKARLKSVPEDVSEFFFRVVRETIDYREKSHIQRNDFMNLLMQLREKGALE

GSDEKVGTLSMNEVVAQAFVFFLGGFETSSTTMSYCLYELSLHEDIQERARECVQSAIAK

HGGFNYDAVMDMNYLELCIN (1)

ESLRKYPPGAN

LVRCATKDYQVRNSSVVFKKGMSVMVPIYAIHHDAEYYPDPERYDPERFGVEELAKRPPF

TFMPFGEGPRICIAARFGMMESKIGLAALLMNFKFSKCSKSIVPLVISNKHVVLTPAGGL

WLKVEKLEQ*

 

>AAGE01273771 520199522  728739223 86% to CYP6N4v1 578623429 complete

MLIYLTVLALTLAVLWIRKRYSYWMDRGILYVEPS

FPAGNLRGMGRKEHLSSQMQRCYKELKGKGPVGGMFFFINPVALAMDLDLIKSVLVKDFQ

YFHDRSVYYNEKDDPLSAHLFTMEGAKWKNLRAKLTPTFTSGKMKMMYPTIIGVADEFQK

LMKSEVSS

NAEIEMKEILARFTTDVIGTCAFGLECNSLHDPDAKFRAMGRKIFSFANGRF

LKAVIAQQFRSLARSLHIALVDKEVSDFFLGAVRDTIKYREENKIERNDFMSLLMKLKDD

GNTGNTETLTVEEIAAQAFVFFLAGFETSSTAMSYCLYELAQNSDLQNKARKSVMDSIKK

HGSLTYEAMQDMQYIDQCIN (1)

ESLRKYPPASTLTRSVSKDYKLPNSNVVLQQGSTLIVPVYA

LHHDAEYYPDPEKYNPDRFTPEEVAKRNPYCFLPFGEGPRICIGMRFGMMQARVGLAYLL

RDFSFTLSSKTPVPLKISPRSPVLTSEGGLWLKVQKL*

 

>AAGE01504815 581602077 91% to CYP6N3v2 753204063 815151384 632872571 complete 579754790 TC62375 TC16009 TC23365 TC47533 TC56452 TC50555 6N6v1 6T1.4

MWIYLTVLALTLAVLWVRKRYAYWKERGIPYVEPSFPAGNIR

GMGRKEHFSTQMQRCYKELKGKGPVGGVFFFINPVPLALDLDFIKTVLVKDFQYFHDRSI

YYNEKDDPLSAHLVALEGAKWKNLRTKLTPTFTSGKMKTMFPTIIGVADEFQKMMKNEVV

GNTEIEMKDILARFTTDVIGTCAFGIECNSLQDPNAQ

FRRMGRKIFSVAK

GRLLKLITAQQFRSLARMLGITLIDK 

DVSDFFIGAVRDTIKYREENKIERNDFMSLLMKLKNDESSQDTNSGDVE

TLTVEQIAAQAFVFFLAGFETSSTAMSNSLYELAQNSDLQNKARKSVMDAIKKYGSLTYE

AMQDMQYIDQCIN (1)

ESLRKYPPASNLTRTVSTDYKLPDSNVVLQQGSTLIVPVYALHHDA

EYYPDPEKYDPDRFTPEEVAKRNPYCFLPFGEGPRNCIGMRFGMLQARVGLAYLLRDFSF

TLSNKTPVPLKISPHSPILTSEGGLWLNVRKL*

 

>AAGE01026936 476398858 46% to 476419050 48% to 6N1 584294086 819690004

complete

MSALLIILALTPLFLFIIYVK

672 QKYAYWARRNVPFLKPHFPYGNFEALDRKSIADVAREAYEEMKNRGPFYGAYFFLQPL 499

498 ITITDPDLIKMVLIKDFNTFPDRGLYFNERDDPLSAHMFAIEGNKWRSLRQRLSPTFTSG 319

318 KMKMMFPTLAAVGDQFSAFLDEEIGSGKVVEVKDFMAKFTTDIIGSCAFGIECNSFK 148

147 DPHGRFRQFGKMVFETPVHGSLVRFALKSFPEISRRLRIK 28

ALHEEASKFFYGVVEDTVKYREKNGVERKDFLSLLIDMKKDGVDFT

MDEIAANSFIFFGAGFETSSSNQTFCLYELARNPECQDKARQSVLDALRNHGGMTYDAAC

DMQYLDQCIN (1)

ETLRLYPSVPVLERRAFQDYKIPGHDVVIPKGMKINIPAYAI

QRDERFYPDPDVFNPDRFHQKEVAKRHICTFIPFGEGPRICIGLRFGMMQSRVGLATILS

KFRISICSETANPLEYSSKTSVLIPKEGLWLRVDPL*

 

>AAGE01569058 TC56593 TC20604 TC28568 TC51124

56%  to CYP6AA1 834896125 complete

MGLYNTVLYLVLPIVWLLYTYFRRKYSYWADRNVPQVPGSL

PLGSFNGMGTKYHFVDVLKRVYDTYHKTHKAIGMYLSVKPILFVSDLDLIKKILVKDFNS

FRDRGMYYNEKDDPLSAHLFSIEGERWRFLRNKLSPTFTSGKIKYMYLTICE

IGEEFLACFDKYLDRKEAVDIKPLAQRFTSDVISSVAFGLKTNALKNEGSELLNKGDSVF

KPGRWETIRIFALLSYRDLAKKLGLRQFPRDVTDYFMDIIRGTVDHREKTNVMRQDFLQL

LLKLKNKGTIEDHEEESKEKITLNELAAQAFLFFFAGFETTSTTVSFALFELANNAEVQEKTR

QEVQRVLAKHGGHLTYDAIKDMTYLEQVVNETLRKHPPVGNLIRLANDPYRIDSLGTDIE

RDTMIMIPVHAIHNDPDIYPDPERFDPYRFTPEAINARHSHCFIPFGDGPRNCIGMRFAL

VEVKFGIAQLLTRLRFTVNEKTQFPVRYDPKSQFAEVKGGIWLNVERI*

 

>223495136 AABUM55TV.gz 59% to TC56593 40% to 6AA2 pseudogene?

Sequence has no exact match; 78% to 615844728 (TC56593, see above)

KTLLNHPPFFNLILLLNYPYLIHSL*TFF

487 QQNTIIMIPFHTIHNYPNIYPYP*RFYPNQFTP*SINSHHSHTFIPF*YPPLNCISIPFS 308

307 LLHLNFVIAHLLTKLRFTSNHKTHFKNRYDPKSRGAVVEGGIWLKVE 167

 

>AAGE01185776 223460790 57% to TC56593 53% to 6AA2 78% to 223468847

636165685 580089056 637123288 complete

MAFLFTTLCLLLPLLGLLYYYVRRKFAYWADRGVPYVPGSLPMGSFNDMGSTKHIVELLDAIYKQYRNTHK

AVGMFLSINPILLAVDLELVKQILVKDFNSFHDRGMYFNERDDPLASHMFSVEGERWRFL

RNKLSPTFSSGKIKYMFLTVREIGLEFLASFEPFMERKEPVEIGIQAQKFTCDVIGSCAF

GLSCNALKDESTELLDIADRVFNPKPLEMMYMLLLICFRKWAVKLRLKQTPADIERFFV

NMVRKTVEHREKNNITRPDFLQLLMQLKNKGTLEESEEDSKETISMNDVIAQA 682

681 FLFFFGGFETSSKALSFALFELALNPELQEKARDEVLRTLDKHDGLLTYEALKDMTYVEQIVH (1)

    ESLRKYAPIGNVIRKANEPYQIHSPDIILEKGTMVM 325

324 IPVHSIHHDPEIYPDPSRFDPDRFTPEAISARHSHSFLPFGDGPRNCIGMRFALLEVKFG 145

144 IAQLLSRLRFTVNEKTQLPLRYDPKANVASALGGLWLDVERI*

 

AAGE02023125.1 Length=57153 use this seq; probable allele of the sequence above

97% identical (12 diffs)

15290  MVFLFTTLCLLLPHLGLLYYYVRRKFAYWADRGVPYVPGSLPMGSFNGMGSTKHFVELLD  15111

15110  PVYKQYRNTHKAVGMFLSINPVLLAVDPDLVKQILVKDFNSFHDRGMYFNERDDPLASHM  14931

14930  FSVEGERWRFLRNKLSPTFSSGKIKYMFLTVREIGLEFLASFEPFMERKEPVEIGIQAQK  14751

14750  FTCDVIGSCAFGLSCNALKDESTELLDIADRVFNPKPLEMMYMLLLICFRKWAVKLRLKQ  14571

14570  TPADIERFFVNMVRKTVEHREKNNISRPDFLQLLMQLKNKGTLEESKEDSKETISMNDVI  14391

14390  AQAFLFFFGGFETSSKALSFALFELALNPELQEKARDEVLRTLDKHDGLLTYEALKDMTY  14211

14210  VEQIVH (1)  14193

14133  ESLRKYAPIGNVIRKANEPYQIHSPDIILEKGTMVMIPVHSIHHDPEIYPDPSR  13972

13971  FDPDRFTPEAISARHSHSFLPFGDGPRNCIGMRFALLEVKFGIAQLLSRLRFTVNEKTQL  13792

13791  PLRYDPKTNVASALGGLWLDVERI*  13717
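A diff count like "12 diffs" between two alleles of the same length comes from a position-by-position comparison. A minimal sketch, assuming ungapped sequences of equal length (no indels); the function names are hypothetical. Applied to the first 60 residues of the AAGE01185776 and AAGE02023125.1 listings above, it finds four substitutions.

```python
def substitutions(seq_a, seq_b):
    """List (1-based position, residue_a, residue_b) for every mismatch
    between two ungapped, equal-length protein sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (no indels)")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

def percent_identity(seq_a, seq_b):
    """Identical positions as a percentage of the shared length."""
    return 100.0 * (len(seq_a) - len(substitutions(seq_a, seq_b))) / len(seq_a)

# First 60 residues of AAGE01185776 and of AAGE02023125.1, copied from above.
allele_1 = "MAFLFTTLCLLLPLLGLLYYYVRRKFAYWADRGVPYVPGSLPMGSFNDMGSTKHIVELLD"
allele_2 = "MVFLFTTLCLLLPHLGLLYYYVRRKFAYWADRGVPYVPGSLPMGSFNGMGSTKHFVELLD"
print(substitutions(allele_1, allele_2))
# [(2, 'A', 'V'), (14, 'L', 'H'), (48, 'D', 'G'), (55, 'I', 'F')]
print(round(percent_identity(allele_1, allele_2), 1))  # 93.3
```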

 

>223468847 73% to 223460790 I-helix and end 62% to 223460790 pseudogene

AFLFFFVGFSTSFTPFSFSLFEFALFPQLRGVARDRFLRTLDDHDVLFTFAALIDLTYVALIFH (?)

DSLP*FAPFRYVFREAYVPFQFHSPDFFLG*S

490 TIVMIPFHSFLHDPEFFPDPSRFVPDRFSPEAISALHSHSFLPFGDGPRNCFGMRFALLE 311

310 VKFGIAPFLPPFPFSFPPPPPLPLRFVPKANVASTLPGLCFPVDLI 173

 

>AAGE01408667 TC66947 TC28477 TC49577 55% to CYP6P4 TC67159

TC33540 TC39062 593244050 519827477 50% to CYP6P4 complete

Revised May 15, 2006; trace files support the cyan seqs

MAILELYLAIGVTLVLATAGCVFLFLDKKRSFWKDRNFPCTGRAKMIYGDYKNMNQT

EHMQYINQRIYNEFKARKLPIGGTVLFLVPSTVVVDPDLIKAMLVKDFNFFHDRGVYNNP

EVDPLTGHLFSLEGQAWRQLRAKLSPTFTSGKMKMMFSTIL

SVADDLKEFLLEKTESGPTELEMKNVLAGFTTDV

IGSCAFGIECNSLRATHCRFREVSRKIFEQSVGQMLWMIVLMLFK

GVATKLKLKATPAEVENFFTNMVQETIDHRERNNVQRSDFMNILIQMKNSTNLEEKLTLN

EITAQSFIFFVAGFETSSTTMVNCLFELAMNPDIQEKLRAEIFKVC

GEGDLTYESVSSVEYLNMVIDETLRKHPVVDSLLRTSTQPYNIPNTDLKIPKGTF

VFIPVHALHHDPEYYPDPDRFDPERFNAENRASRHPFVYLPFGEGPRNCIGMRFGLMQTR

VGLITVLRNFRVRPSSNTPERLVVNPKSGIPAPLGGIPLLIERI*

 

 

>AY432230 AAGE01011017 52% to 6P4 583641208 589587999 CR937398.1

CR937850 CR937397.1 CR937849.1 complete

MDPVTVILTIFVGLTGLVYFFLRREQQKWPRLGVPFAKNPH

LLFGNVRGIFQKEHSCEILQRLYWEFKGRGLKLGGIMNFFQPAVLVIDPEISKSILVKDFNKFHDRGIFVDPAGDPLSANLFSLEGAQWKAMRTKMSPTFTSGKMKYMFESVLNVAERLKDYLAENCLKEDIELKNILQRFTMDVIGNVAFGV

ECNSIKNPSSEFRLMGLKANRFDGVRFLKFFIGGAYKNFAKKIKLKVVEDDVHKFFMSLV

HSTVHYREGNNVKRNDFLNLLMEIKNKGKFSDEPNSGGEGITMNEIAAQCFIFFTAGFET

SSTTINFCLYELANNPDIQDRLRNEIEDVVAKDGGELKYDTLLGMNYLDRVVS (1)

ETLRKYSAVDNLFRISNSPYTPDGCNFTIPAGTLFQIPIHSMHHDPEYFPDPGRFDPDRF

LPEVAKSRHPYCYLPFGEGPRVCIGVRFGLMQTKIGLVTLLRDFRFGPRSETPDRLQFEA

KTFVLTPQTGIYLKIEPIGI*

 

>AAGE01083421 476356097 39% to CYP6AD1 587854394 570810577 complete

MISGTVCILLVLANVAFLVLFVR

GVLQSRQVYWVRRRIPFVAWPHLLFGNVRRLWRHEHSSTIGQRLYRDLKARRLAAGGFNL

LVSPSILVADPDLAEEVLVGNVRRFPDRGL HVDAEVDPLSETLFALRGNRWKDKRNR 

LAPVFSEETLKPV FRMVASFADELRKEISINLDRRLQDVQEWVSRYVTQVMGKSVFGM 202

203 RCRMMQDPNTDFRRYGRISTELSWLLLLKNWIGVTMPWMARKVGLRITDATVEKFYVDLC 382

383 RSNVLVRESYKVKENDILQLFMRLREARQLTME 481

482 ELTTACYSFVKHGMEPCTSVMTFCLYELAKNLSIQKRLRDEISHNLEDTDGQ 637

638 LTYDVIMSMNYLDQVVN (1) 688

745 ETMRKYPPVDFIYRRSSQSRDNIPQGTLFVIPVYAFHHDPDHFPAPENF 897

898 DPERFTAKQARTRHPYCYLRFGAGPRECLGAR

    FGLLVVKAGLVTLLRRFRFAMPEELVHEKLQFKPNASVLSPVEGSVRLRVETI*

 

These trace files match 100%  MISGTVCILLVLA

gnl|ti|591523435 

gnl|ti|587360321 

gnl|ti|585826792 

gnl|ti|578932662 

gnl|ti|570810577 

gnl|ti|576970754 

 

These trace files match 100%  HVDAEVDPLSETLFALRGNRWKDKRNRLAPVFSEETLKPV

gnl|ti|639160181 

gnl|ti|591799078 

gnl|ti|591523435 

gnl|ti|576970754 

gnl|ti|570810577 

 

These trace files match 100%

RLAPVFSEETLKPVFRMVASFADELRKEISINLDRRLQDVQEWVSRYVTQVM

GKSVFGMRCRMMQDPNTDFRRYGRISTELSWLLLLKNWIGVTMPWMARKVGLRITD

gnl|ti|639160181 

gnl|ti|591523435 

gnl|ti|578800786 

gnl|ti|570810577 

gnl|ti|476356097 

 

These trace files match LAKNLSIQKRLRDEISHNLEDTDGQLTYDVIMSMNYLDQVVN  100%

gnl|ti|793209208 

gnl|ti|637742971 

gnl|ti|587849418 

gnl|ti|578800786 

gnl|ti|476356097 

gnl|ti|567212773 

 

These trace files match FHHDPDHFPAPENFDPERFTAKQARTRHPYCYLRFGAGPRECLGAR  100%

gnl|ti|476356097 

gnl|ti|793209208 

gnl|ti|614704229 

gnl|ti|637742971 

gnl|ti|588905694 

gnl|ti|743515336 

gnl|ti|587854394 

gnl|ti|587849418 

 

These trace files match CLGARFGLLVVKAGLVTLLRRFRFAMPEELVHEKLQFKPNASVLSPVEGSVRLRVETI  100%

gnl|ti|614704229 

gnl|ti|637742971 

gnl|ti|588905694 

gnl|ti|745128895 

gnl|ti|743515336 

gnl|ti|587854394 
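The lists above record which trace reads match each peptide segment of AAGE01083421 at 100%. Marking the residues of a model that are covered by at least one exactly matching peptide shows where trace support is missing. A minimal sketch, assuming exact substring matches only (no mismatches); `trace_coverage` is a hypothetical name. Note that the JP variant start MISGTVCIVLVLA does not match this model at all.

```python
def trace_coverage(protein, peptides):
    """Mark residues of `protein` covered by at least one peptide that
    occurs in it exactly. Returns (covered_count, mask) where the mask
    uses '#' for supported and '.' for unsupported residues."""
    protein = protein.replace(" ", "")
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    mask = "".join("#" if c else "." for c in covered)
    return sum(covered), mask

# N-terminal segment of AAGE01083421 and the two trace-supported starts.
print(trace_coverage("MISGTVCILLVLANVAFLVLFVR", ["MISGTVCILLVLA", "MISGTVCIVLVLA"]))
# (13, '#############..........')
```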

 

>1.1327_5 AAGE01083421 JP’s version 17 diffs to AAGE01083421, new gene

MISGTVCIVLVLANVAFLVLFVRGVLQSRQVYWVRRRIPFVAWPHLLFGNVRRLWRHEHSSTIGQRLYRDLKARRLAAGGFNLLVSPSILVADPDLAEEVLVGNVRRFPDRGLHVDAEVDPLSETLFALSGNSWQDKRNQLTPVFSEETLKPVFRMIASFADELRKEISKNLDRRLQDVQEWVSRYVTQVMGKSVFGMRCRMMQDPNTDFRRYGRISTELSWLLLLKNWIGVTMPWIARKVGLRITDATVEKFYVDLCRSNVLVRESYKVKENDILQLFMRLREARQLTMEELTTACYSFVKHGMEPCTSVMTFCLYELAKNVSIQKRLRDEISHYLEDTDGQLTYDVIMSMNYLDQVVNETMRKYPPVDFIYRRSSQSRDNIPQGTLFVIPVYAFHHDPDHFPAPEKFDPERFTAKQARTRHPYCYLPFGAGPRECLGARFGLLVVKAGLVTLLRRFRFAMPEELVHEKLQFKPNASVLSPVEGSVRLRVEAI.

 

These trace files match 100%  MISGTVCIVLVLA

gnl|ti|822917015 

gnl|ti|749404661 

gnl|ti|630748299 

gnl|ti|520549356 

 

These trace files match 100%  HVDAEVDPLSETLFALSGNSWQDKRNQLTPVFSEETLKPV

gnl|ti|749404661 

gnl|ti|630748299 

gnl|ti|591518206  this seq also matches JP’s version

QLTPVFSEETLKPVFRMIASFADELRKEISKNLDRRLQDVQEWVSRYVTQ

VMGKSVFGMRCRMMQDPNTDFRRYGRISTELSWLLLLKNWIGVTMPWIARKVGLRITDAT

VEKFYVDLCRSNVLVRESYKVKENDILQLFMRLREARQLTMEELTTACYSFVKHGMEPCT

SVMTFCLYELAKNVSIQKRLRDEISHYLEDTDGQLTYDVIMSMNYLDQ

 

These trace files match FHHDPDHFPAPEKFDPERFTAKQARTRHPYCYLPFGAGPRECLGAR  100%

Two different sequences exist

gnl|ti|808282492 

gnl|ti|593108115 

 

These trace files match CLGARFGLLVVKAGLVTLLRRFRFAMPEELVHEKLQFKPNASVLSPVEGSVRLRVEAI  100%

gnl|ti|808282492 

gnl|ti|593642133 

gnl|ti|593108115 

 

>CYP6Pnew 574331490 AAGE01337778.1 571589719 600552785 complete

66% to 6P4

Note: TC66947 begins 357 bp downstream of this seq, in the same orientation

this seq matches AAGE02030882.1

MLPFLLAVVALLLTAAGLYIRSRHRFWSDRGIPCAPNPEFLFGHVRGQVTNKHAAY 353

VNRELYQQFKARGEGFGGYSFFAVPAVIIVDPELVKTILVRDFAVFHDRGIYNNPKDDPL 173

SGQLFLLEGLQWKILRQMLTPTFTSGRMKAMFGTIMDV

AEEFRQFLVDSRERESVIEMKEVLASFTTDVIGTCAFGIECNTLKNPDSDFLKYGKKVF

EQRMSTLFKFIFASLFK

DLARKLGVKITDAGVEKFFLGLVRETVEFREKNNVMRNDFMNLLLQLKNKGRLVDQLDE

ADEVAARGLTMEELAAQCFVFFIAGYETSSTTMNFCLYELAKNPDIQEKLREDIEEAVAS

NGGRVTYDLVMGLRYLDNVVN (1)

ETLRKYPPIESLNRVPTSDYTVPGTKHVLPKQTMITIPIYALHHDPDFYLDPDNFDP

DRFLPEAAQARHPYAFIPFGEGPRNCIGMRFGLMQTKIGLITLL

RNFRFSPSAKTPDKIAFDVKSFVLSPDGGNYLRYDKI*

 

>AAGE01395583 96% to 6Pnew 592239414 N-term part C-term part 569878875

803228481 is a second C-term part; neither can be extended the 8 aa needed to reach the normal exon boundary

This looks like a pseudogene

    MLPLLLAVVTILLTAAALYIRSRHRFWSDRGIPCAPNPEFLFGHVRGQVTNKHAAYVNRE

    LYQQFKARGEGFGGYSFFAVPAVIIVDPELVK

  8 TILVRDFAVFHDRGIYNNPKDDPLSGQLFLLEGLQWKILRQMLTPTFTSGRMKAMFGTI 184

185 MDVAEEFRQFLVDSRKRESVIEMKEILASFTTDVIGTCAFGIECNTLKNPDSDFLKYGKK 364

365 VFEQRVSTLMKFIFASLFKDLARKLRIKITDAGVEKFFLGLVRETVEFREKNNVLRNDFM 544

545 NLLLQLKNKGRLVDQLDEADEVAARGLTMEELAAQCFVFFIAGYETSSTTMNFCLYELAK 724

725 NPDIQEKLREDIEEAVASNSGRVTYDLVMGL 817

 

This is the same as

AAGE02023125.1: 100% match to 1.702_1, with an error at the intron boundary (missing small exon).

This could be a pseudogene, since the gene structure does not match related genes.

The end of exon 1 could be broken off by an insertion, blocking expression.

If it is expressed, it is missing 2 conserved amino acids (VM).

48326  MLPLLLAVVTILLTAAALYIRSRHRFWSDRGIPCAPNPEFLFGHVRGQVTNKHAAYVNRE  48147

48146  LYQQFKARGEGFGGYSFFAVPAVIIVDPELVKTILVRDFAVFHDRGIYNNPKDDPLSGQL  47967

47966  FLLEGLQWKILRQMLTPTFTSGRMKAMFGTIMDVAEEFRQFLVDSRKRESVIEMKEILAS  47787

47786  FTTDVIGTCAFGIECNTLKNPDSDFLKYGKKVFEQRVSTLMKFIFASLFKDLARKLRIKI  47607

47606  TDAGVEKFFLGLVRETVEFREKNNVLRNDFMNLLLQLKNKGRLVDQLDEADEVAARGLTM  47427

47426  EELAAQCFVFFIAGYETSSTTMNFCLYELAKNPDIQEKLREDIEEAVASNSGRVTYDL  (0) 47253

44758  GLRYLDNVVN (1) 44729

44670  ETLRKYPPIESLNRVPTSDYTVPGTKHVLPKQTMITIPIYALHHDPDFYLDPDNFD  44503

44502  PDRFLPEAAQARHPYAFIPFGEGPRNCIGMRFGLMHTKIGLITLLRNFRFSPSPKTPDKI  44323

44322  AFDVKSFVLSPDGGNYLRYDKI*  44254
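The gene-structure argument above can be checked from the printed coordinates. Descending numbers mark a minus-strand hit, and each line appears to span whole codons only (48326 to 48147 is 180 nt for a 60-residue line). A minimal sketch, assuming that reading; `exon_layout` is a hypothetical helper, and the gap it reports only approximates intron length because it may include split-codon bases.

```python
def exon_layout(hits):
    """hits: (start, end) coordinate pairs in gene order, as printed in this
    file. start > end is read as a minus-strand hit. Returns one tuple per
    hit: (strand, whole_codons, nt_gap_to_next_hit or None)."""
    layout = []
    for i, (start, end) in enumerate(hits):
        strand = "-" if start > end else "+"
        codons = (abs(start - end) + 1) // 3
        gap = None
        if i + 1 < len(hits):
            # Nucleotides between this hit's end and the next hit's start.
            gap = abs(end - hits[i + 1][0]) - 1
        layout.append((strand, codons, gap))
    return layout

# The jump at 47253 -> 44758, the small "GLRYLDNVVN" exon, and the next hit.
print(exon_layout([(47426, 47253), (44758, 44729), (44670, 44503)]))
# [('-', 58, 2494), ('-', 10, 58), ('-', 56, None)]
```

A gap of 0 means two printed lines are contiguous parts of the same exon.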

 

>AAGE01028822 752849490 633799995 600024515 576876673 complete 65% to 6S1

MILILLLLAATALFFRWINAYRARYQFWKEHNVPHLEPRFPVGNAGDILKSTIHFAH

IMDNLYRELKHFGDYAGIYFFRDPVLVVLSPEFAKTVLVK 658

DFNYFLDRGVYSNEKDDPLSANLFFMEGHRWRKLRAKLTPTFTTGKLKAMFHTILAVGEQ 478

FDRYLQDYTKQKDEVEVKDLLARFTTDIIGSCAFGIDCNSLENPESKFRQMGKRMINFPK 298

LKALKIFFAMMYRKQARWLRIRFNDEDVSDFFFAVVRDTIRYREENNFERKDFMQLLIE- 121

LKNKGYMEDDGEYVEEL

QGGRLEKLTFEEIAAQAFVFFFAGFETSATTMTFALHLLASNQEVQDRGRKCVYEVLERH 444

DGKLSYEALMEMTYIDCIIQ (1)

ETLRIYPPVATIHRITTKPYKLPNGSVLPEGVGVVIPNLAFQRDPEFFPEPMQFRPE 157

RFFEDEKDKRHNFCHLPFGEGPRICIGMRFGLLQTRMGIAMLLKNYRFRLCPKSVFP

LKTDPINLIYGPAGDVWLGIEKIQ*

 

>AAGE01126587 45% to 6AJ1 missing part of exon 4 632860227 759657436 578977303

Walked by megablast to AAGE01030106.1, but this still does not reach the N-term.

Best match to the N-term in WGS = AAGE01034156.1. Since there is only one 6AJ,

this must be the correct N-terminus.

Note: after the sequence PERDDQLQSLLKTK there is a gap compared to Anopheles 6AJ1.

The seq KSSTY that comes right after this point in 6AJ1 appears in the

opposite-strand translation at the same spot, so there might be an inversion

here, and this might be a pseudogene. 13 aa at the end of exon 4 are missing.

Five trace file seqs have a 100% match in this region, so the seq is correct.

Alternatively, the seq may be shorter, like 6AH1 or 6AG; a phase 1 boundary is

possible.

1061 MILTTVFLIGLLYKNPVAFFLVIVAGLLVRELIKYHFRHWERCNVPGPKPSLIFGNIASN  882

881  IFLRQHFAEMIDGWYN (2)  834

763  KFPNAPFIGFYKIFKPSVMIRDPEMIKNVLVRDQACFSANDFAFDEKLDPLLAHNPFMVSG  587

586  ERWKKSRQLLTPIFTGSKMKQLFPIMDEISSQFVDFVGRQCGREVEAKS (0)  440

     ISAAYTTQNVAGCAFSLDADCFNNPNSEWRVMGKKIFQPTLLAGIKFMLMLFVPSVTWFIPVP (2)

2075 FLPKEVDRWMRKLVSTLLQERKNKQPERDDQLQSLLKTKS (1) 1956

1485 AELTEEQIAGHSLAFFSEGFETSSTTMGFAILH (0) 1381

1327 LAENPDVQEKLFQEIQNTLGKNDIPLTFDLVQKIEYLDWVLQESLRITPPA 1175

1174 AGLQKLCTQNYCLKYKVDGKEVGTWIMPGTTVLIPIVAVHM

 990 DPKYYPEPEKFRPERFSPEEKAKRTDPVYYPFGEGPRMCLGMRFAQIQIKMALLKLVQQF 811

 810 RVRTSPNYKPWQYNRNTFLTEAKDGLQVVFERRS* 706
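The inversion argument above (KSSTY found in the opposite-strand translation) depends on translating all six reading frames. A minimal pure-Python sketch using the standard genetic code (NCBI table 1); it is an illustration, not the translator the authors used, and the helper names are hypothetical.

```python
# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(dna):
    return dna.upper().translate(COMPLEMENT)[::-1]

def translate(dna, frame=0):
    """Translate from offset `frame` (0, 1 or 2); unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[p:p + 3], "X") for p in range(frame, len(dna) - 2, 3)
    )

def six_frames(dna):
    """Translations keyed '+1'..'+3' (given strand) and '-1'..'-3' (opposite)."""
    rc = reverse_complement(dna)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(dna, f)
        frames[f"-{f + 1}"] = translate(rc, f)
    return frames

print(six_frames("ATGGCC"))
# {'+1': 'MA', '-1': 'GH', '+2': 'W', '-2': 'A', '+3': 'G', '-3': 'P'}
```

Searching each frame string for a peptide such as KSSTY then shows which strand and frame it sits in.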

 

>AAGE01171970 69% to 6AK1

822917241 578615327 575500950 complete

MPLVAWLAAIALICYLAWKRNNFWNRHGVPYVLEIPAVGNFSSVALQMHSMFD

YVARIYDHVRTRDADFFGINIFFRKALVIRNPDMVKKMLVADSRYFINRQMC

TDREGDHFGYYNLMMIKEPLWKDLRGYLSPSVTSSRLRRMFSLIDE (0) 526

IGNNMLAHLDGVQQKPTK

LRETEFKELCARFTTDVIASTFFGIQANCLSDEESEFRYYGR 264

KIFEYGPKRALNMAAFFFMPELVPYLGFKLFPRDTERFLKTIIEQEIARRETSGENRGDF 84

IDSMIALKNNEATIGVEEKI (2)

HLKGDILVAQAATFYMASFETTSSVLSFTLYELTKN (0)

PEIQQRLREEIHNCIKKYGRDLSYECLVNEMPYLGMVISEAARLYPVLPFIERQCSLPAGA

TGYKLDPFHNFVVPNKMPVLVPIYAIHRDPK (0)

YFPDPLRFDPDRFSKDNADNIVPCSYMPFGVGPRTCLGSHFGTLQVKVAITRLLSKYRI

LRSESSPETLTYRKNAFTLHSNEGLYADLELDELC*

 

>AAGE01277917 65% to 6AH1

519656511 569678490  521887623 578889466 749413119 578997302 complete

MLELYIALAVLAVCLYFKWSCSYWRRVGNVDGPQPLPIFGNGLEQITGAKHFGEIFEEVYR (2?)

TYPTAAWVGIYELFNKPAIVVRDLELVKE 878

ILVGSFQHFNRNSFEVDETIDPLVAINPFTQSGDLWKERRSQVVPVFSQTKIRSCFPIIK 698

NVADNFLEYVTKTRKTSPDFEAKD 623

ICARFTIDSVASCAFGIDAESFTNPNSEFRRVGFELFNPSSIMATVRSLLALFAPKLASLLRIP  (2?)

FVPPYVDRWFRKLVNEVIRQRKEGEVKRQDLFQAMYDT 628

LTQQGTVDVKNDEIVGHSVTFLTEGFETSSTLMCYFLYELASNQHIQDRVLNEIDCVLKE 448

YDGKLTDEAVNKLTYMERAMYETLRMHSPVFTLTKVCTKEYELPPQYTDDVGKRITMKPG 268

MSAIIPVHAIHLDPEIYPDPCRFEPDRFLDENRKGRHRYAFLGFGEGPRICLGMKFGLSQ 88

SKIGIATLLSKYRVVGSDKQELPLEISRKSFLLASKNGIWVKFVERG*

 

 

CYP3 clan

CYP9J-related sequences: 17 complete genes, two full-length pseudogenes

and two partial pseudogenes

 

>CYP9J1 TC67648 TC11677 TC2154 TC45358 AY064092 AF390099 complete 50% to 9J4 96% to 9J2 

MVEVNIFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLL

GIFDMVVLKRVELVFGSKLLYNSYPDAKIIGYYELTKPTYMVRDPEMIKKIAIKDFDS

FTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRHMAELVVKCAT

SMTDFFHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENESYKKGNES

QKIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKKLILDNMDQRNKQGIVRND

LVNMLMETKNGALKYEEQDTQVPEGFATVEESHVGKSTHSRIWTDNELISQCFFFFFA

AFDNVSSILTFLSYELTVNQDIQRRLYEEIAVTESTLNGQPITYEALQKMAYLDMVVS

ETLRKYPTATLTDRYVNKDYVFDDEEGLRFVIEKGKTIWIPMLALHHDPKYFPEPERF

DPERFSEDNRSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQASEKTQ

IPLRLSKSAFTMQAENGVWLELKARPKA

 

96% to CYP9J1, 9J2, 9J15 complete

nearly identical to 9J2 (9 aa diffs)

This is a near-exact match (1 aa diff) to CYP9Jae9 below, without any frameshifts.

This is not a pseudogene.

MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLFGIFDMVVLKRVELVFGSKLLYNSYPDAK (2)

IIGYYELTKPTYMVRDPEMIKKIAIKDFDSFTDRT

PVFGDAVPADSLFFNSLF

SLRGQKWRDMRSTLSPAFTGSRMRHMAELVAKCATSMTDFIHSEAKAGRRLEFNMKDTFS

RFVCDAIASVAFGIEVDSF

RDPENEFYKKGNESQKIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKK

LILDNMDQRKKQGIVR

NDLVNMLMETKKGALKYEEPDMQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFA

AFDNVSSILAFLSYELTVNQDIQRRLYEEIAVTESTLNGQPITYEALQKMAYLDMVVSEA

LRKYPTATLTDRYANKDYVFDDEEGLRFVIEKGKTIWIPMLALHHDPKYFPEPERFDPER

FSEDNRSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQASEK

TQIPLRLSKSAFTMQAENGV

WLELKARPKA

 

>CYP9Jae9 494315799 588882678 579665582 579009008

          Length = 536

 

 Score = 1060 bits (2742), Expect = 0.0

 Identities = 535/557 (96%), Positives = 535/557 (96%)

 Frame = +1

 

Query: 247  MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLLGIFDMVVLKRVELVFG 426

            MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFL GIFDMVVLKRVELVFG

Sbjct: 1    MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLFGIFDMVVLKRVELVFG 60

 

Query: 427  SKLLYNSYPDAK*VCSTNDCRAYYDDEPSFSTRIIGYYELTKPTYMVRDPEMIKKIAIKD 606

            SKLLYNSYPDAK                     IIGYYELTKPTYMVRDPEMIKKIAIKD

Sbjct: 61   SKLLYNSYPDAK---------------------IIGYYELTKPTYMVRDPEMIKKIAIKD 99

 

Query: 607  FDSFTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRHMAELVAKCA 786

            FDSFTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRHMAELVAKCA

Sbjct: 100  FDSFTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRHMAELVAKCA 159

 

Query: 787  TSMTDFIHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENEFYKKGNESQ 966

            TSMTDFIHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENEFYKKGNESQ

Sbjct: 160  TSMTDFIHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENEFYKKGNESQ 219

 

Query: 967  KIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKKLILDNMDQRKKQGIVRNDLVN 1146

            KIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKKLILDNMDQRKKQGIVRNDLVN

Sbjct: 220  KIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKKLILDNMDQRKKQGIVRNDLVN 279

 

Query: 1147 MLMETKKGALKYEEPDMQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFAAFDNV 1326

            MLMETKKGALKYEEPDMQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFAAFDNV

Sbjct: 280  MLMETKKGALKYEEPDMQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFAAFDNV 339

 

Query: 1327 SSILAFLSYELTVNQDIQRRLYEEIAVTESTLNGQPITYEALQKMAYLDMVVSEALRKYP 1506

            SSILAFLSYELTVNQDIQRRLYEEIAVTESTLNGQPITYEALQKMAYLDMVVSEALRKYP

Sbjct: 340  SSILAFLSYELTVNQDIQRRLYEEIAVTESTLNGQPITYEALQKMAYLDMVVSEALRKYP 399

 

Query: 1507 TATLTDRYANKDYVFDDEEGLRFVIEKGKTIWIPMLALHHDPKYFPEPERFDPERFSEDN 1686

            TATLTDRYANKDYVFDDEEGLRFVIEKGKTIWIPMLALHHDPKYFPEPERFDPERFSEDN

Sbjct: 400  TATLTDRYANKDYVFDDEEGLRFVIEKGKTIWIPMLALHHDPKYFPEPERFDPERFSEDN 459

 

Query: 1687 RSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQASEKTQIPLRLSKSAFT 1866

            RSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQASEKTQIPLRLSKSAFT

Sbjct: 460  RSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQASEKTQIPLRLSKSAFT 519

 

Query: 1867 MQAENGVWLELKARPKA 1917

            MQAENGVWLELKARPKA

Sbjct: 520  MQAENGVWLELKARPKA 536
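The "Identities = 535/557" line counts gap columns in the denominator: 557 columns are the 536 subject residues plus the 21 gap columns opposite the query stretch *VCSTNDCRAYYDDEPSFSTR, and the one remaining mismatch is L/F at subject position 44. Those 21 codons appear to be the phase-2 intron shown in the AAGE02011007.1 listing below (177605 to 177541 leaves 63 nt). A minimal sketch of the same ratio; `alignment_identity` is a hypothetical name and the excerpt is truncated for illustration.

```python
def alignment_identity(query, subject):
    """Count identical residue columns in a pairwise alignment and return
    (identities, alignment_length). Gap columns ('-') stay in the
    denominator, consistent with the 535/557 figure above."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have equal length")
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identities, len(query)

# Truncated excerpt of the second block above: 12 identities over 16 columns.
print(alignment_identity("SKLLYNSYPDAK*VCS", "SKLLYNSYPDAK----"))  # (12, 16)

# The reported totals: 535 identities over 557 columns.
print(f"{100 * 535 / 557:.0f}%")  # 96%
```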

 

AAGE02011007.1 1 diff to CYP9Jae9 (fourth P450 on this contig)

177820  MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLLGIFDMVVLKRVELVFG  177641

177640  SKLLYNSYPDAK (2) 177605

177541  IIGYYELTKPTYMVRDPEMIKKIAIKD  177461

177460  FDSFTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRHMAELVA  177290

177289  KCATSMTDFIHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENEFYKKGN  177110

177109  ESQKIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKKLILDNMDQRKKQGIVRND  176930

176929  LVNMLMETKKGALKYEEPDMQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFAA  176753

176752  FDNVSSILAFLSYELTVNQDIQRRLYEEIAVTESTLNGQPITYEALQKMAYLDMVVSEAL  176573

176572  RKYPTATLTDRYANKDYVFDDEEGLRFVIEKGKTIWIPMLALHHDPKYFPEPERFDPERF  176393

176392  SEDNRSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQASEKTQIPLRLSK  176213

176212  SAFTMQAENGVWLELKAR  176159

 

>CYP9J2 TC64859 TC11586 TC5014 TC50571 AF329892 complete

50% to 9J4

96% to 9J1, 98% to AY064093

MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLL

GIFDMVVLKRVELVFGSKLLYNSYPDAKIIGYYELTKPTYMVRDPEMIKKIAIKDFDS

FTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRYMAELVVKCAT

SMTDFIHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENEFYKKGNET

QKIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKKLISDNMDQRKKQGIVRND

LVNMLMETKKGALKYEEPDLQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFA

AFDNVSSILAFLSYELTVNQDIQRRLYEEIAATESTLNGQPITYEALQKMAYLDMVVS

EALRKYPTATLTDRYANKDYVFDDEEGLRFVIEKGKTIWISMLALHHDPKYFPEPERF

DPERFSEDNRSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQPSEKTQ

IPLRLSKSAFTMQAENGVWLELKARPKA

 

>CYP9J15 AY064093 complete 98% to 9J2 50% to 9J4

MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPILCIKPTFLL

GIFDMVVLKRVELVFGSKLLYNSYPDAKIIGYYELTKPTYMVRDPEMIKKIAIKDFDS

FTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRHVAELVAKCAT

SMTDFFHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENEFYKKGNET

QKVHTFKSLTTFVTLRFVPFLQKVFNFDIVDANVAGYFKKLILDNMDQRKKQGIVRND

LVNMLMETKKGALKYEEPDLQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFA

AFDNVSSILAFLSYELTVNQDIQRRLYEEIAATESTLNGQPITYEALQKMAYLDMVVS

EALRKYPTATLTDRYANKDYVFDDEEGLRFVIEKGKTIWISMLALHHDPKYFPEPERF

DPERFSEDNRSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQPSEKTQ

IPLRLSKSAFTMQAENGVWLELKARPKA

 

$$$$$$$

 

>AAGE01172381 AAGE01048909 TC52199 TC17152 TC22373 TC36113 72% to TC60679

575142964 494348902 trace archive seqs

join with TC60874 TC20300 TC25446 TC48456

complete, 57% to CYP9J2, 57% to 9J1 56% to 9J15

MLEVNLFAAIAVGALILAVYHHISKRYQYFLSKPVPCMKPTFLVGNSGPMLTRKKDIA

SHIRTMYDTYPDAKMIGFYDLTKPVYLVRDPEVVKTMTVKDFEHFTDHTPTMTGTGEE

VSEKSLFGNSLFALRGQKWRDMRSTLSPAFTGSKMRHMFELVVECGQSMAEFLLSEAK

AGKRLEFEMKDIFTRFGNDVIATVAFGIKVDSMRDRENEFYMKGKQLLNFQRFTLMIK

FLLMRAMPALAEKLGADFVDAEAGKYFTGVIMENMKQRKAHGIVRNDMIHMLMEVRKG

ALKHEKGEQETKDAGFATVEESQVGKTTHSRIWKDNELVAQCFIFFLAGFDTLSTGLT

FLTYELALNPEIQQRLYEEVMETESNLDGKPLTYE

VLQQMKYMDMVISESLRKWPPGIVADRYCTKEYQFKDGP

GSFLIEKGTSLWIPTIAIHNDPRYYPNPDKFDPERFSDENKSKINPAAYIPFGVGPRNCI

GSRLALMEMKSVVYYLLREFSFEPTEKTQIPLKLTMSGFTLQGEKGVWLEFKPRSI*

 

AAGE02011007.1 Length=228841; use this seq. Part of a large gene cluster

with 6 genes and one pseudogene on this contig (second P450 on contig).

Others are at 143937 (+) = CYP9Jae7, 164192 (-) = AAGE01021462, 177820 (-) = 9J2, 196721 (+) = 9J9v1 (6 diffs), 218249 (+) = 9J10

220106 (+) = pseudogene, N-term exon 1 only (new)

153975 MLEVNLFAAIAVGALILAVYHHISKRYQYFLSKPVPCMKPTFLVGNS

       GPMLTRKKDIASHIRTMYDTYPDAK (2) 154190

154250 MIGFYDLTKPVYLVRDPEVVKTMTVKDFEHFTDHTPTMTGTGEEVSEKSLF

GNSLFALRGQKWRDMRSTLSPAFTGSKMRHMFELVVECGQSMAEFLISEAKAGKRLEF

EMKDTFTRFGNDVIATVAFGIKVDSMRDRENEFYMKGKQLLNFQRFTLMIKFLLMRAM

PALAEKLGADFVDAEAGKYFTGVIMENMKQRKAHGIVRNDMIHMLMEVRKGALKHEKG

EQETKDAGFATVEESQVGKTTHSRIWKDNELVAQCFIFFLAGFDTLSTGLTFLTYELA

LNPEIQQRLYEEVMETESNLDGKPLTYEVLQQMKYMDMVISESLRKWPPGIVADRYCT

KDYQFKDGPGSFLIEKGTSLWIPTIAIHNDPRYYPNPDKFDPERFSDENKSKINPAAY

IPFGVGPRNCIGSRLALMEMKSVVYYLLREFSFEPTEKTQIPLKLTMSGFTLQGEKGV

WLEFKPRSI* 155653
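When a contig translation is split over several lines and exons, the peptide text can be pulled out and joined. The whole protein can then be compared with the assembled entry. Below is a minimal sketch. It assumes that only sequence lines are passed in, and that any token made solely of capital letters and * is peptide text, so coordinates and phase marks such as (2) are skipped. assemble_protein is a hypothetical helper. Joining the first two lines above reproduces the start of the AAGE01172381 entry.

```python
import re

# A token counts as peptide text if it consists only of capital letters
# and '*' (stop). Coordinates ("153975") and phase marks ("(2)") are skipped.
PEPTIDE_TOKEN = re.compile(r"[A-Z*]+")


def assemble_protein(lines: list[str]) -> str:
    """Concatenate the peptide fragments from coordinate-annotated lines."""
    parts = []
    for line in lines:
        for token in line.split():
            if PEPTIDE_TOKEN.fullmatch(token):
                parts.append(token)
    return "".join(parts)


if __name__ == "__main__":
    contig_lines = [
        "153975 MLEVNLFAAIAVGALILAVYHHISKRYQYFLSKPVPCMKPTFLVGNS",
        "       GPMLTRKKDIASHIRTMYDTYPDAK (2) 154190",
    ]
    print(assemble_protein(contig_lines))
```

Header lines should not be passed in, because an all-capitals word in a header would be mistaken for peptide text.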

 

>AAGE01021462 82% to AAGE01172381 821673483 574004088 637743693 complete

AAGE01075209 only 4 aa diffs to 574004088

578 MEVDLFAAIAVGALILAVYHHLLKRYQYFLTKPVSCVKPSFPMGSSGVMLTRKRDIFSHI 399

398 QMMYNTYPDAK (2) 366

304 IMGFYDFTKPVYMIRDPEVIKRITVKDFDHFIDHTPSMTGQGEEPGENSLLGNTLFALR 128

127 GQKWRDMRSTMSPAFTGSKMRHMFELVAESGQSTAKFLLAEA 2

    KARKRLEFEMKDTFTRFGNDVIATVAFGIKVDSMRDRDNEFYMKGKQLLNFQTFL

LKIKFIMMRAMPTLA

EKLGVDLLDAEAVKYFKGMILENMKQRKAHGIIRNDMIHMLMEVRKGALKHEKDEQDTKD

AGFATVEESQVGKTTHSRIWKDNELVAQCFIFFVAGFDTVSTGLTFLAYELALNPEIQQR

LYEEIIETETTLEGKSLTYEVLQKMKYLDMVVSEGLRKWPAGI

LGDRYCTKDYQYKDAAGSFVIEKGTSLWIPTIAIHNDPQYYPNPEKFDPERFSDENKSKI

NPFAYMPFGVGPRNCIGSRLALMEMKLIMYYLLREFSFEPTEKTQIPLKLVMSGFALQGE

KDVWLEFKPRAL*

 

AAGE02011007.1 = old AAGE01021462 (third P450 on this contig)

164192  MEVDLFAAIAVGALILAVYHHLLKRYQYFLTKPVSCVKPSFPMGSSGVMLTRKRDIFSHI  164013

164012  QMMYNTYPDAK (2) 163980

163918  IMGFYDFTKPVYMIRDPEVIKRITVKDFDHFIDHTPSMTGQGEEPGENSLLGNT  163757

163756  LFALRGQKWRDMRSTMSPAFTGSKMRHMFELVAESGQSTAKFLLAEAKARKRLEFEMKDT  163577

163576  FTRFGNDVIATVAFGIKVDSMRDRDNEFYMKGKQLLNFQTFLLKIKFIMMRAMPTLAEKL  163397

163396  GVDLLDAEAVKYFKGMILENMKQRKAHGIIRNDMIHMLMEVRKGALKHEKDEQDTKDAGF  163217

163216  ATVEESQVGKTTHSRIWKDNELVAQCFIFFVAGFDTVSTGLTFLAYELALNPEIQQRLYE  163037

163036  EIIETETTLEGKSLTYEVLQKMKYLDMVVSEGLRKWPAGILGDRYCTKDYQYKDAAGSFV  162857

162856  IEKGTSLWIPTIAIHNDPQYYPNPEKFDPERFSDENKSKINPFAYMPFGVGPRNCIGSRL  162677

162676  ALMEMKLIMYYLLREFSFEPTEKTQIPLKLVMSGFALQGEKDVWLEFKPRAL*  162518

 

>AAGE01005986 476365476 825769964 631521475 578081381 476406576 591384997

51% to TC52199 59% to 9J3 complete

MDFTFWGVMAAAAIGIGLIYRYMTRNYFYFADKPIPFLEPVFAIGNLGPLLMKKRDIFEHFRWLYNRFPNDK (2)

IFGMFSMSDPVFMIRDPAMLKRIAVKDFDHFADHSGLGGDTELDNPHMLVLNTLVALRG

NKWRDMRATLSPAFTGSKMRQMFALIAECGQRMVEFYKGAEEGSRIEVEAKEMFSRFTND

VIATTAFGIEVDSFRQPENEIFSLGKAVMQPSGLLNTLKGIGYVLFPKLMVKMNVDFLSK

KDDQFFRGTIQETMRIRQEKSIFRPDMIELLIQAKKGNLKHSADKQSEVEAFSAAEESQVGRRSHDRTWTDD

ELIAQALIFFSAGFETVSTTLSFVAYELARNDDVQSRLYEEILETNRSLDGKILSYEALQ

AMPYMDMVVSETMRLWPIGTIVDRLCVKDYVYDDGQGCRFTIEKGRSVMGSVIGMHHDPK

YYPQPEKFDPERFSAENRRNINPDTYLPFGIGPRNCI (1)

GSRFALMEMKAVVYYLLLNLSFDVTEKTQIPLKMQKSPSRFVSEKGIWIALKPRVTVV*

 

AAGE02011007.1 1 diff to AAGE01005986 (first P450 on this contig)

143916  MDFTFWGVMAAAAIGIGLIYRYMTRNYFYFADKPIPFLEPVFAIGNLGPLLMKKRDIFEHFRWLY  144110

144111  NRFPNDK (2) 144131

144498  IFGMFSMSDPVFMIRDPAMLKRIAVKDFDHFADHSGLGGDTELDNPHMLVLNTLVALR  144671

144672  GNKWRDMRATLSPAFTGSKMRQMFALIAECGQRMVEFYKGAEEGSRIEVEAKEMFSRFT  144848

144849  NDVIATTAFGIEVDSFRQPENEIFSLGKAVMQPSGLLNTLKGIGYVLFPKLMVKMNVDFL  145028

145029  SKKDDQFFRGTIQETMRIRQEKSIFRPDMIELLIQAKKGNLKHSADKQSEVEAFSAAEE  145205

145206  SQVGRRSHDRTWTDDELIAQALIFFSAGFETVSTTLSFVAYELARNDDVQSRLYEEILET  145385

145386  NRSLDGKILSYEALQAMPYMDMVVSETMRLWPIGTIVDRLCVKDYVYDDGQGCRFTIEKG  145565

145566  RSVMGSVIGMHHDPKYYPQPEKFDPERFSAENRRNINPDTYLPFGIGPRNCI (1)  145721

145788  GSRFALMEMKAVVYYLLLNFSFDVTEKTQIPLKMQKSPSRFVSEKGIWIALKPRVTVV*  145964

 

>476384387 587659684 832450214 519879482 82% to AY433038 54% to 9J5 complete

Note: this seq is upstream of 9J6 on AAGE01000868 (4723-3066), on the opposite strand

MEINLELWIAVISIGILLYKWITRNNDYFHEKPIPSMAVKPFFGGIAPLVFKSFSMNGFISHIYQKYPNVK (2)

VFGFFDALTPIFVVRDPELIKKITVKDFDHFIDHLPMFGNSENDNPYSIFGKTLFAL

TGKKWRQMRATLSPAFTGSKMRKMFELVIECSDSVAQFYKTQSNETHEVELTDLLTRF

GFDVIASCAFGIRMDSLRDRDNDFYNNGIKMRRFQRLSVAIRFVMFKFCPTLMGKLGIDV

IDRDQVRYFSALIKDAVKQ

RQTKDIIRHDMIQLLIQARKGTLKHQEEKEVEEGFATVKESSIGKTNVTFNMTDNEMI

AQAFVFFLAGFETVSTALTFLIHDLVMNKDVQHRLYEEVASTHEYLQGKHLNYDTLQKMK

YMDMVVSESMRMRPAGPFMDRVCIHDYDLDDGQGLKFTIDKGTAVWIPVQGIHMDPKYYP

NPERFDPERFNDENKAAINPMTYLPFGIGPRNCIGS

RFALMEIKAIVYYLLLHFSFEANRKTQIPLKLRKGFTVVAAEGEVWIDLKAR*

 

>CYP9J6 AY433038 AAGE01000868 (10298-11954) 520111339 528815988

616367213 644315757 56% to AY431970 56% to 9J5 complete

MEVNLGAVIVILSTVILIYKWITRNNDYFHEKPIPSMAVKPLFGSTGPLILKQFSLHGFINHIYQKYPNAK (2)

VLGIFDALTPIFVVRDPELIKKIAVKDFDHFIDHRPMFGNSENDN

PYSIFGKTLFALEGQKWRDMRATLSPAFTGSKMRKMFELVIECSDSVAQYYVKQSKKVVE

VELTDMCTRFGSDVIATCAFGIKMDSLRERDNEFYDNGKKMMRFERLSVALRMFAFKFFP

TLMGQMGIDIIDREQAKYFSALIMDAVRQRQTKGITRPDMIQLLIQARKGTLKHQEEKEV

EEGFATVKESSIGKTNVSFNMTDNEMIAQAFVFFLAGFETVSTTLTFLIYDLVV

NKDVQQRLYEEIVATNDSLQG

KLLNYDTLQKMKYLDMVLSESMRIRPAAATLDRLCVRDYEVDDGQGLKFTINKGTAVWIP

TQGIHMDPMYYPNPERFDPERFNDENKATIDPMTYLPFGVGPRNCIGSRFALMEIKAIVY

YLLLHFSFEANRKTPIPLELRKGFTIVAAEGEVWIDMKAR*

 

>AAGE01063458 263503628 49% to TC52199 48% to CYP9L3 476324061

DR747831 520184164 820336301 223394438 51% to CYP9J1 476322739 complete

MEVNLLLLLIIVGILGVIYRQVKKHYDYFHDKPIPSMATVPLLGSTGPLMTKRCTFNDFIQTIYYKYPSAKV

FGLFDMTTKMFVLRDPEVIKKITVKDFEYFVDRRPLFGANKEDDGNENIL

FNKTLVGMVDQRWRDMRAILSPAFTGSKMRAMFELIEQYCTQMVPILKEQSAESGYVDYE

MKDFFSRVANDIIATCAFGLQVESLKSRDNEFYTMGKQMMNFNRFIVLLRVMGLRFFPSL

MIKMGVDIVDREQNQYFSKIIKEAVRARETHGIVRPDMIHLLMQARKGTLKHQQETTEST

AGFATVEESDVGKSVVSKTMSEPEFIAQCLIFFLAGFDTVSTGMLFMAYELDLNPNIQQK

LYEEIAQTNKELGGKPATYDTLQKMKYMDMVVS

ESLRMWPVAAFDRKCGRDYVLDDGAGLKFTIDAGTCIWVPVYGIHRDPKYYPNPDKFEPE

RFSDENRGKIDMTMYMPFGMGPRNCIGSRFALMEIKAIMYALLLNFSIERNEKTQVPLKL

VKGFVGLQVENGLHLRFKKRK*

 

>CYP9J9v1 AAGE01125862 AAGE01449675 TC60679 TC19466 TC24864 TC43056

59% to CYP9J1 AY431945 DR747470 91% to TC60950,

90% to TC60951 588932055 complete

MVEVDLYVAVAIGAIILLLYHYGSKKYEYFLTKPIPALKPTFLLGNT

GAMMFRRRDVSAHVKLLYNSLEGYK (2)

VAGFYDLMKPIYM

LRDPEVIKQIAVKDFDYFMDHTPTMTNNRADDEVGGDSLFGNSLFALRGQK

WRDMRATLSPAFTGSKMRHMFELVADCAKSMAEFFKSEAAAGKKLEYEMKDTFSRFGNDV

IATVAFGIKVDSLRDRDNEFYMKGKNMLNFQSVSVMFKFLLLRAFPKLSQKIGVDFVDST

LTEYFKGMIVDNMKQRDAHNIFRNDMIQMLMEVRKGSLKHQKDEKETKDAGFATVEESNV

GKSTINRVWTENELIAQCFLFFLAGFDTVSTCMTFLTYELMLNPDIQQRLFDEVMETEES

LNGKPLTYEVLQRMEYMDMVVSEALRKWPPAVVSDRFCVKNYMYDDGKGTRFPIEKGQTM

WIPTIAIHSDPRYYENPEKFDPERFNEENRSKIDTGAYLPFGVGPRNCIGSRLALMEVKV

IIYNLLKEFSLEASEKTQIPLKMAKNFFALQAENGVWLELKPRKH*

 

AAGE02011007.1 6 diffs to 9J9v1 (fifth P450 on this contig)

196721  MVEVDLYVAVAIGAIILLLYHYGSKKYEYFLTKPIPALKPTFLLGNTGAMMFRRRDVSAH  196900

196901  VKLLYNSFEGYK (2) 196936

196999  VAGFYDLMKPIYMLRDPEVIKQIAVKDFDYFMDHTPTMTNSKADDEVGGDSLFGNSLFA  197175

197176  LRGQKWRDMRATLSPAFTGSKMRHMFELVADCAKSMAEFFKSEAAAGKKLEYEMKDTFSR  197355

197356  FGNDVIATVAFGIKVDSLRDRDNEFYMKGKNMLNFQSVSVLFKFLLLRAFPKLSQKIGVD  197535

197536  FVDSNLTEYFKGMIVDNMKQRDAHNIFRNDMIQMLMEVRKGSLKHQKDEKETKDAGFATV  197715

197716  EESNVGKSTINRVWTENELIAQCFLFFLAGFDTVSTCMTFLTYELMLNPDIQQRLFDEVM  197895

197896  ETEESLNGKPLTYEVLQRMEYMDMVVSEALRKWPPAVVSDRFCVKNYMYDDGKGTRFPIE  198075

198076  KGQTMWIPTIAIHSDPRYYENPEKFDPERFNEENRSKIDTGAYLPFGVGPRNCIGSRLAL  198255

198256  MEVKVIIYNLLKEFSLEASEKTQIPLKIAKNFFALQAENGVWLELKPR  198399
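Notes such as "6 diffs to 9J9v1" count residue mismatches between two versions of the same gene. Where the two versions line up without gaps, the count and a simple percent identity can be reproduced position by position. Below is a minimal sketch that assumes ungapped, equal-length inputs. residue_diffs and percent_identity are hypothetical helpers. The percentages elsewhere in this file come from BLAST alignments, which also handle insertions and deletions, so this ratio does not reproduce them. The example uses the end of the first exon as it appears in the 9J9v1 entry and in this contig translation.

```python
def residue_diffs(a: str, b: str) -> list[tuple[int, str, str]]:
    """List (1-based position, residue in a, residue in b) where a and b differ.

    Assumes an ungapped comparison of two equal-length sequences. It does not
    model insertions or deletions the way a BLAST alignment does.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def percent_identity(a: str, b: str) -> float:
    """Identical positions as a percentage of the shared length."""
    same = len(a) - len(residue_diffs(a, b))
    return 100.0 * same / len(a)


if __name__ == "__main__":
    entry = "GAMMFRRRDVSAHVKLLYNSLEGYK"   # 9J9v1 entry, end of exon 1
    contig = "GAMMFRRRDVSAHVKLLYNSFEGYK"  # AAGE02011007.1 translation
    print(residue_diffs(entry, contig), percent_identity(entry, contig))
```

In this fragment, residue 21 is L in the 9J9v1 entry and F on the contig. That leaves 24 of 25 positions identical (96%).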

 

>AAGE01065801 AY431970 60% to 9J5 494336054 641818007 578928176 complete

AAGE02029679.1 use this seq change D to E

MEVELLHVGVLVAIVAFLYRWITRNNDYFHDKPIPSMAVTPFLGASGPLLLRKVTFNDFVQSIYNKYPGVK (2)

VFGMFETITPFFVIRDPELIKQIGIKDFDHFVDHRPTFGLDDETAEHPKALFRKTLFSM

TGQRWKEMRATLSPAFTGSKMRQMFSLMSECCDEMMKHYLDKAKGSG

RVEVEMKDLLSRISINVIASCAFGIKVDCFKEQEHEFLYHGRKMMGFGRPIVIARMLAMR

VFPKFAAKFGIDLLDREQANYFTHVFQETIRARESHGYIRHDMIDLLLQARKGTLKYQEE

KDDQEGFATVQESDVGKADVSKSMTEAEMIAQCLIFFLGGFDTVSTCAMFTAYELVRNPE

VQHKLYEEIKQTEKELEGKPLSYDALQKMKYMDMVVSETLRMWPLAPATDRLCTQDYTID

DGQGVRFTIDKGTCVWFPAAGLHHDPQYFPNPEKFDPERFNDENKR

NINLGAYLPFGIGPRNCIGSRFALMEVKAVMYFILLKFSFVRGAKTQIPMQLRKGFTNLG

PENGMHVELKLR*

 

>AAGE01006393 81% to AAGE01065801 83% to AY431970 complete

AAGE01400897 84% to AAGE01065801 834914646 N-term 578891344 C-term

749 MEVNLVYLAVVLAVIAYLYRWITRNNDYFHDKPIPSMAVRPFLGASGSLVLRKVSFPDFI 570

569 QTIYNKFPGVK (2) 537

474 VFGMFETITPFFVIRDPELIKQIAIKDFDHFVDHRPTFGLFDEESAEH 331

330 PNALFRKTLFSMTGQRWKEMRATLSPAFTGSKMRLMFSLMGECFDGMIDHYVKKAKTSGR 151

150 VEVEVKDMMSRVSINVIASCAFGIKVDCFKDQDHEFL 181

 182 RHGKKMMDFARPIVIARMMAMRVFPKLSSRFGIDLLDPEQARYFTQVFQETIKARESHGT 361

 362 VRNDMIDLLLQARKGTLKFQEEKNDQEGFATVQESDMGKVEVMKHITESEMIAQCLVFFL 541

 542 GGFDTVSTCAMFMAYELVRSPEVQQKLYEEVLETSKELAGKPLSYDALQKMKYMDMVVSE 721

 722 TLRIWPLAPATDRLCTKDYTIDDGQGLKFTIDKGTCVWFPAAGLHHDPQYFPNPERFDPE 901

 902 RFNDENKRNINLGAYLPFGIGPRNCIGSRFALMEVKAVMYYTLLKFTIVRSAKTQIPMQL 1081

1082 RKGFTNLGPEKGMHVELKLR*

 

AAGE02029680.1 same as above; use this seq

86528  MEVNLVYLAVVLAVIAYLYRWITRNNDYFHDKPIPSMAVRPFLGASGSLVLRKVSFPDFI  86707

86708  QTIYNKFPGVK (2) 86740

86803  VFGMFETITPFFVIRDPELIKQIAIKDFDHFVDHRPTFGLFDEESAEH  86946

86947  PNALFRKTLFSMTGQRWKEMRATLSPAFTGSKMRLMFSLMGECFDGMIDHYVKKAKTSGR  87126

87127  VEVEVKDMMSRVSINVIASCAFGIKVDCFKDQDHEFLRHGKKMMDFARPIVIARMMAMRV  87306

87307  FPKLSSRFGIDLLDPEQARYFTHVFQETIKARESHGTVRNDMIDLLLQARKGTLKFQEEK  87486

87487  NDQEGFATVQESDMGKVEVTKQITESEMIAQCLVFFLGGFDTVSTCAMFMAYELVRNPEV  87666

87667  QQKLYEEVLETSKELAGKPLSYDALQKMKYMDMVVSETLRIWPLAPATDRLCTKDYTIDD  87846

87847  GQGLKFTIDKGTCVWFPAAGLHHNPQYFPNPERFDPERFNDENKRNINLGAYLPFGIGPR  88026

88027  NCIGSRFALMEVKAVMYYTLLKFTIVRSAKTQIPMQLRKGFTNLGPEKGMHVELKLR  88197

 

>AAGE01179692 AAGE01102574 AAGE01259804 (3 aa diffs)

476398393 616358813 575550118 584317339

67% to TC60679 in 9J fam

53% to 9J4 63% to 9J9 complete

AAGE01266366 parts of two genes

MAAAVLVAVLLFCRYVAKKYQYFLTK

PVPCVKPTFLLGSSGPTIFRKVDVATHFKKIYDVFPQAP

VIGFYDFTTPMYLLRDPEMIKKVSIKDFDYFTDHVPMMPTDAEKEHNPDTLFGNT

LLSLRGQKWRDMRSTLSPAFTGSKMRHMFELVAECGRSLVEHFKAEAAAGRTMEHEMKET

FSKVGSDLIATLAFGIKVDSLREPENVFYANGKKMLNLKSLATFVKFLLITFVPRLMRWLKVDVLNGQSAAY

FKRIILDNMEQREAHKILRNDMIQILMEVRKGTLQHQKEEKDTKDAGFATVEESQVGKSS

HSRVWTENELVAQCLLFFLAGLDTISTCMT

FLTYELTVDPDIQQRLYEEITETYKSLNGKPLSYDVLQRMQYMDMIVSETLRKWPPGVIS

NRYCNKNYLYDDGRGTQFVIEKGQVILIPSYCIQRDPRYFPDPDRFDPERFNEANRAQIN

TSAYIPFGVGPRNCIGSRLALMEVKCMVYYLLKDFELIATGKTQIPERIARDSFGLHPEK

GVWIEFKPRSSQDS*

 

>AAGE01198792 (parts of two genes, 1-804 is N-term of AAGE01339434,

1707-973 is C-term of another gene) 95% to 476398393 

574155449 638535809 complete

MLKVDLFMAAALLAAVLLFCRYVAKKYQYFLTKPVPCVKPTFLLGSSGPTIFRKVDV

ATHFKKIYDVFPQAP (2)

VIGFYDFTTPMYLLRDPEMIKKVSTKDFDYFTDHVPMMPTDAEKEHGPETLFGNTLL

SLRGQKWRDMRSTLSPAFTESKMRHMFELVAECGRSLVEHF

QTEAAAGRTMVHEMKETFSKVGSDLIATLAFGIKVDSLREPENVFYANGKKMLNLKSLAT

FVKFLLIMFVPRLMRWLKVDVLNGQSAAYFKRMILDNMEQREAHKILRNDMIQILMEVRK

GTLQHQKEE

1707 KDTKDAGFATVEESQVGKSSHSRVWTETELVGQCLLFFLAGLDTISTCMTFLTYELTVDP 1528

1527 DIQQRLYEEITETYKSLNGKPLSYDVLQRMQYMDMIVSETLRKWPPGVISNRYCNKNYLY 1348

1347 DDGRGTQFVIEKGQVILIPSYCIQRDPRYFPDPDRFDPERFNEANRAQINTSAYIPFDVG 1168

1167 PRNCIGSRLALMEVKCMVYYLLKDFELIATGKTQIPERIARDSFGLHPEKGVWIEFKPRNSPDF* 973

 

>AAGE01102574 possible pseudogene fragment upstream of AAGE01179692

AAGE02011008.1 pseudogene piece

1906  T*LII*KGHTAFISTYRSHHDP*YYENPEQFDPEWFNEAYRAHISTNIYIPFEFRPRNCI  2085

2086  GSRLALIEVKRM  2121

2121  VYQHLKDFETVTTEPIPVRIARDPLALHPEKGVRAELK  2234

 

AAGE02011008.1  use this seq (first P450 on contig)

Note: 6 P450 genes in this contig, at 2k, 4k = AAGE01339434, 6k = 9J8,

16k = AAGE01007189, 19k = CYP9LaeP 494089659, 31k = 9Jnew; all (+)

2420  MAAAVLVAVLLFCRYVAKKYQYFLTKPVPCVKPTFLLGSSGPTIFRKVDVATHFKKIYDV  2599

2600  FPQAP (2) 2614

2669  VIGFYDFTTPMYLLRDPEMIKKVSIKDFDYFTDHVPM  2779

2780  MPTDAEKEHNPDTLFGNTLLSLRGQKWRDMRSTLSPAFTGSKMRHMFELVAECGRSLVEH  2959

2960  FKAEAAAGRTMEHEMKETFSKVGSDLIATLAFGIKVDSLREPENVFYANGKKMLNLKSLA  3139

3140  TFVKFLLITFVPRLMRWLKVDVLNGQSAAYFKRIILDNMEQREAHKILRNDMIQILMEVR  3319

3320  KGTLQHQKEEKDTKDAGFATVEESQVGKSSHSRVWTENELVAQCLLFFLAGLDTISTCMT  3499

3500  FLTYELTVDPDIQQRLYEEITETDKSLNGKPLSYDVLQRMQYMDMIVSETLRKWPPGVIS  3679

3680  NRYCNKNYLYDDGRGTQFVIEKGQVILIPSYCIQRDPRYFPDPDRFDPERFNEANRAQIN  3859

3860  TCAYIPFGVGPRNCIGSRLALMEVKCMVYYLLKDFELIATEKTQIPVSIARDSFGLHPEK  4039

4040  GVWIEFKPRSSQDS* 4084

 

>AAGE01339434 AAGE01406122 775439256 579949790 521969711 complete 61% to 9J10

AAGE01266366 parts of two genes

AAGE01198792 (parts of two genes, 1-804 is N-term of this gene,

1707-979 is C-term of another gene 95% to 476398393) 

NABOD09TR  NABOD09 TC54929 TC20193 TC31509 TC43101 TC5368 TC8501

MEVDLLSAFAVGCIVILIYHYASQKYLYFLTKPIPSLKPTFLVGNIGDIIFRTKDALTHINELYYAFPESK (2?)

VVGFYELTKPVFMLRDPEVIKQITVKDFDHFMDRS

LPSANDRADTDQPVEGLFANSLVAFQGQKWKDMRSTLSPAFTGSKIRHMFDLVADCSRSM

VEHFRSEANAGRRLECELKDVFSRFCNDVIATVAFGIRVDSVRDPETEFYVKGKQLLDFQ

SPKIILKFLLFQTVPWLMRKLKVDFADADLADYFKGIIQ

DNMKQREVHGIVRNDMVQMLMEVRKGTLKHISDDRESKDSGFASVEESHFGKSTHSRAWT

DNELISQCFVFFIAGLDTVSSCLTFLTYELTLNPDIQKRLYEEVMDTERLLSEKPLSYEA

LQSMKYLDMVVSETLRKWPPTIDSDRYSTRDYLLDDGAGLKVPIEKGRSIYIPIVAIQND

PKYFPDPDRFDPERFSDENRSKIVPGTFIPFGAGPRNCIGSRLALMEVKVAVY

YLLREFSLERTERTDDPIRLTKKAIDLRTENGAWVELKPRKI*

 

AAGE02011008.1  use this seq 6 diffs to AAGE01339434 (second P450 on contig)

4256  MEVDLLSAFAVGCIVILIYHYVSQKYLYFLAKPIPSLKPTFLVGNIGDIIFRTKDALTHINELYYAF  4456

4457  PESK (2) 4468

4537  VVGFYELTKPVFMLRDPEVIKQITVKDFDHFMDRSLPSANDRADTDQPVEGLFANSLVAF  4716

4717  QGQKWKDMRSTLSPAFTGSKIRHMFDLVADCSRSMVEHFRSEANAGRRLECELKDVFSRF  4896

4897  CNDVIATVAFGIRVDSVRDPETEFYVKGKQLLDFQSPKIILKFLLFQTVPWLMRKLKVDF  5076

5077  ADADLADYFKGIIQDNMKQREVHGIVRNDMVQMLMEVRKGTLKHIGDDRESKDSGFASVE  5256

5257  ESHFGKSTHSRAWTDNELISQCFVFFIAGLDTVSSCLTFLTYELTLNPDIQKRLYEEVMD  5436

5437  TERLLSEKPLSYEALQSMKYLDMVVSETLRKWPPTIDTDRYSTRDYLLDDGAGLKVPIEK  5616

5617  GRSIYIPIVAIQNDPKYYPDPDRFDPERFSDENRSKIVPGTFIPFGAGPRNCIGSRLALM  5796

5797  EVKVAVYYLLREFSLERTERTDVPIRLTKKAIDLRTENGAWVELKPRKI*  5946

 

>CYP9J8 AAGE01187448 AAGE01142069 AAGE01118978

476375412 832374064 810104215 758886185 262894467 complete

AAGE01439874 N-term of 9J8 and C-term of AAGE01339434

9J8 is 609 bp downstream of AAGE01339434

MLDPFLLAAFAAVIFLVYHLLNRKYQFFAERGIPYVKPTLLLGNGASVLLKKEDLLQNIQRTYDTFPNAK (2)

IMGIFDFVKPIMMIRDPDAIKQIGVKDFDHFVDHTPLFTPADCEDVGTNSLFGNSLFA

LRGQKWRDMRATLSPAFTGSKMRHMFELVLDC

ARSTAEYFREEAKSGRTTEYEMKNVFSRFSTDVIGSVAFGIKVDSLREQDNDFFVKGKAM

LNFQNLKSLLKVIMLRSAPGLMNRLNVDITSPQMNAYFKDMIMDNMKQREINGIVRNDMI

NILMQVQKGALLHQKDEQDTKDAGFATVEESSVGKALHNRVWSENELVAQCFLFFLAGFD

TVSTCLTFVSYELLANPDVQQKLFEEIMAVEASLDGKPLSYEVLQKMQYLDQIISETLRL

WPPAPFVDRYCVKDYLFDDGQGTRVPIEKGQIVWFPITALHHDAKYFPEPNRFDPER

FSEQNRPKINPGAYLPFGVGPRNCIGSRFALMEVKAIVYHLVKNFTLERSGKSRVPLKLE

KSYIAMIVEGGMWLEFRPRA*

 

AAGE02011008.1 use this seq 4 diffs to CYP9J8 AAGE01187448 (third P450 on contig)

6554  MLDPFLLAAFAAVIFLVYHLLNRKYQFFAERGIPYVKPTLLLGNGASVLLKKEDLLQNIQRTYET  6748

6749  FPNAK (2) 6763

6825  IMGIFDFVKPIMMIRDPDAIKQIGVKDFDHFVDHTPLFTPADCEDVGTNSLFGNSLFAL  7001

7002  RGQKWRDMRATLSPAFTGSKMRHMFELVLDCARSTAEYFREEAKSGRTTEYEMKNVFSRF  7181

7182  STDVIGSVAFGIKVDSLREQDNDFFVKGKAMLNFQNLKSLIKVIMLRSAPGLMNRLNVDI  7361

7362  TSPQMNAYFKDMIMDNMKQREINGIVRNDMINILMQVQKGALLHQKDEQDTKDAGFATVE  7541

7542  ESSVGKALHNRVWSENELVAQCFLFFLAGFDTVSTCLTFVSYELLANPDVQQKLFEEIMA  7721

7722  VEASLDGKLLSYEVLQKMQYLDQIISETLRLWPPAPFVDRYCVKDYLFDDGQGTRIPIEK  7901

7902  GQIVWFPITALHHDAKYFPEPNRFDPERFSEQNRPKINPGAYLPFGVGPRNCIGSRFALM  8081

8082  EVKAIVYHLVKNFTLERSGKSRVPLKLEKSYIAMIVEGGMWLEFRPRA*  8228

 

>AAGE01007189 AAGE01035444 65% to 9J8 complete

3059 MFFALAIFAGLVLFCLYNVQQKYKYFESRGIPYVKPSFLLGNSAPLIFKKKDMLRHIQDLY 2877

2876 HTHPEAK (2) 2856

     IMGLFDFTAPVWMVRDPEAIKQLAVKDFDHFS 2698

2697 DHTPIYTGGDVEDMGTDSLFGNSLLLLRGQKWRDMRATLSPAFTGSRMRLMFELVSECAQ 2518

2517 SMVDYFREEATAGKRLEYEMKDVFSRFSNDVIASVAFGIKVDSLREPDNEFFINGKELMN 2338

2337 FRNMKTVAKVLLMRMFPRLMIKLKADISSAEMNAYFRGMITDNMKQRQAHGIVRNDMINI 2158

2157 LMQVRQGALKNQKEDQETSNAGFAVVEESTTIGQPRDRVWSDNELAAQCFLFFIAGSET 1981

1980 VSTYLTFLAYELLINPEVQEKLFREIAEVERSLAGKPIGYDQLQAMKYMDMVVSENLRLW 1801

1800 PPAPFADRYCSKNYRYDDGQGTRVTIEKGQIVWFPTTALQHDPEYFPDPYRFDPERFSDQ 1621

1620 NRSKIKTGTYLPFGIGPRACIGSRLALLEVKVVAYHLVKHFKLVRSERSKVPLKLKSKMI 1441

1440 GMEVDGGVWLELEPRERS* 1384

 

AAGE02029680.1 Length=88206. Note: 9 P450s on this contig; use this seq

13173  MFFALAIFAGLVLFCLYNVQQKYKYFESRGIPYVKPSFLLGNSAPLIFKKKDMLRHIQDL  12994

12993  YHTHPEAK (2) 12970

12907  IMGLFDFTAPVWMVRDPEAIKQLAVKDFDHFSDHTPIYTGG  12785

12784  DVEDMGTDSLFGNSLLLLRGQKWRDMRATLSPAFTGSRMRLMFELVSECAQSMVDYFREE  12605

12604  ATAGKRTEYEMKDVFSRFSNDVIASVAFGIKVDSLREPDNEFFTNGKELMNFRNMKTVAK  12425

12424  VLLMRMFPRLMIKLKADISSAEMNAYFRGMITDNMKQRQAHGIVRNDMINILMQVRQGAL  12245

12244  KNQKEDQETTNTGFAVVEESTTIGQPRDRVWSDNELAAQCFLFFIAGSETVSTCLTFLAY  12065

12064  ELLINPEVQEKLFREIAEVERSLAGKPIGYDQLQAMKYMDMVVSENLRLWPPAPFADRYC  11885

11884  SKNYRYDDGQGTRATIEKGQIVWFPTTALQHDPEYFPDPYRFDPERFSDQNRSKIKTGTY  11705

11704  LPFGIGPRACIGSRLALLEVKVVAYHLVKHFKLVRSERSKVPLKLKSKMIGMEVDGGVWL  11525

11524  ELEPRERS*  11498

 

AAGE02011008.1 use this seq 1 diff to AAGE01007189 (fourth P450 on contig)

16655  MFFALAIFAGLVLFCLYNVQQKYKYFESRGIPYVKPSFLLGNGAPLIFKKKDMLRHIQDLYHT  16843

16844  HPEAK (2) 16858

16921  IMGLFDFTAPVWMVRDPEAIKQLAVKDFDHFS  17016

17017  DHTPIYTGGDVEDMGTDSLFGNSLLLLRGQKWRDMRATLSPAFTGSRMRLMFELVSECAQ  17196

17197  SMVDYFREEATAGKRLEYEMKDVFSRFSNDVIASVAFGIKVDSLREPDNEFFINGKELMN  17376

17377  FRNMKTVAKVLLMRMFPRLMIKLKADISSAEMNAYFRGMITDNMKQRQAHGIVRNDMINI  17556

17557  LMQVRQGALKNQKEDQETSNAGFAVVEESTTIGQPRDRVWSDNELAAQCFLFFIAGSETV  17736

17737  STYLTFLAYELLINPEVQEKLFREIAEVERSLAGKPIGYDQLQAMKYMDMVVSENLRLWP  17916

17917  PAPFADRYCSKNYRYDDGQGTRVTIEKGQIVWFPTTALQHDPEYFPDPYRFDPERFSDQN  18096

18097  RSKIKTGTYLPFGIGPRACIGSRLALLEVKVVAYHLVKHFKLVRSERSKVPLKLKSKMIG  18276

18277  MEVDGGVWLELEPRERS* 18330

 

>AAGE01004684 AAGE01021948 494089659 55% to CYP9L2 223518047 590305650 827533211

569625505 575383376 520166303 520595733 637720165 65% to 263503628 complete

Possible pseudogene: this seq has a stop codon seen in 9 trace files

MDINSYYLLVTIVLLILIVLYRRVSKHYGYFSDKPIPSLTPIPLFGNMFPLFMKKYTFPEFIQMIYNRFPDAK (2?)

LGMFDMSTRFVVLRDPELIKKVLVKDFEFFIDRRSLFGDSASESDSILITKTLL

LLTGQKWRDMRATLSPAFTGSKMRAMFELIVTYSDRMVGILKDQAGPVGYVDYE

MKECCSRIASDIIATCAYGLEVESLANRENDFYTMGKKMMNFGKTSFFVRLLLYSVFPKLMSKLQVDLFDGEQTRYFTEIIKDTVKARD*HGIVRPDMIHLLMQARKGVLKHHRETAEASAGFATVEESEVGKTAIGKTMT

DSEFVAQCLIFFIAGFEAISSQMSFMCYELATNPDIQQKLYEEIKETNKLLKGKPLTYDTL

QQMKYMDMVTSEALRMWSGPATDRKCVRDYVLDDGAGWKFPIEAGTCVMI 721

720 PSYAIHRDPKYYPNPDRFDPERFSEERRADINMTMYLPFGAGPRNCIGSRFALMEMKAIV 541

540 YGLLLNFSIERNEKTQVPLRLNKGFAPLAGEKGMHLRLKVRG* 418
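The stop codon mentioned in the note above shows up as an internal * in the translation (KARD*HGIV). Scanning for stop symbols other than the final one is a quick way to flag such sequences. Whether a hit reflects a real pseudogene or a sequencing error still has to be judged from the traces, as done here with 9 trace files. Below is a minimal sketch. internal_stops is a hypothetical helper.

```python
def internal_stops(protein: str) -> list[int]:
    """Return 1-based positions of '*' stop symbols other than a single terminal stop.

    A trailing '*' marks the normal end of the protein and is ignored. Any other
    '*' is an in-frame stop, as expected in a pseudogene or from a sequencing error.
    """
    body = protein[:-1] if protein.endswith("*") else protein
    return [i + 1 for i, aa in enumerate(body) if aa == "*"]


if __name__ == "__main__":
    block = "MMNFGKTSFFVRLLLYSVFPKLMSKLQVDLFDGEQTRYFTEIIKDTVKARD*HGIVRPDM"
    print(internal_stops(block))
```

The check works on any stretch of translated text, so a single contig line can be scanned without first assembling the whole protein.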

 

AAGE02011008.1 Length=35048; use this seq. Both of ours were wrong

19802  MDINSYYLLVTIVLLILIVLYRRVSKHYGYFSDKPIPSLTPIPLFGNMFPLFMKKYTFPE  19981

19982  FIQMIYNRFPDAK (2)

       ALGMFDMSTRFVVLRDPELIKKVLVKD  20161

20162  FEFFIDRRSLFGDSASESDSILITKTLLLLTGQKWRDMRATLSPAFTGSKMRAMFELIVT  20341

20342  YSDRMVGILKDQAGPVGYVDYEMKECCSRIASDIIATCAYGLEVESLANRENDFYTMGKK  20521

20522  MMNFGKTSFFVRLLLYSVFPKLMSKLQVDLFDGEQTRYFTEIIKDTVKARD*HGIVRPDM  20701

20702  IHLLMQARKGVLKHHQETAEASAGFATVEESEVGKTAIGKTMTDSEFVAQCLIFFIAGFE  20881

20882  AISSQMSFMCYELATNPDIQQKLYEEIKETNKLLKGKPLTYDTLQQMKYMDMVTSEALRM  21061

21062  WSGPATDRKCVRDYVLDDGAGWKFPIEAGTCVMIPSYAIHRDPKYYPNPDRFDPERFSEE  21241

21242  RRADINMTMYLPFGAGPRNCIGSRFALMEMKAIVYGLLLNFSIERNEKTQVPLRLNKGFA  21421

21422  PLAGEKGMHLRLKVRG*  21472

 

AAGE02011008.1 use this seq, 1 diff to AAGE01004684 (fifth P450 on contig)

Pseudogene: this seq has a stop codon seen in 9 trace files

19826  MDINSYYLLVTIVLLILIVLYRRVSKHYGYFSDKPIPSLTPIPLFGNMFPLFMKKYTFPEFIQMIYNR 20005

20006  FPDAK (2) 20020

20081  ALGMFDMSTRFVVLRDPELIKKVLVKDFEFFIDRR  20185

20186  SLFGDSASESDSILITKTLLLLTGQKWRDMRATLSPAFTGSKMRAMFELIVTYSDRMV  20359

20360  GILKDQAGPVGYVDYEMKECCSRIASDIIATCAYGLEVESLANRENDFYTMGKKMMNFGK  20539

20540  TSFFVRLLLYSVFPKLMSKLQVDLFDGEQTRYFTEIIKDTVKARD*HGIVRPDMIHLLMQ  20719

20720  ARKGVLKHHQETAEASAGFATVEESEVGKTAIGKTMTDSEFVAQCLIFFIAGFEAISSQ  20896

20897  MSFMCYELATNPDIQQKLYEEIKETNKLLKGKPLTYDTLQQMKYMDMVTSEALRMWSGP  21073

21074  ATDRKCVRDYVLDDGAGWKFPIEAGTCVMIPSYAIHRDPKYYPNPDRFDPERFSEERRAD  21253

21254  INMTMYLPFGAGPRNCIGSRFALMEMKAIVYGLLLNFSIERNEKTQVPLRLNKGFAPLAG  21433

21434  EKGMHLRLKVRG* 21472

 

>CYP9Jnew AAGE01553900 AAGE01064689 59% to 9J5 644306108 757010867 616348285 complete AAGE02011008.1 (sixth P450 on contig)

30792  MEVNVLYLLIVVAVLAVIYRRITRFYEYFHDKPIPSMAAGPPFGSAGPLY

       RKKYSFNDFIKMTYDKFPGAK (2) 31004

31067  VFGLCDMTTKLFVIRDPELIKKVTVKDFDYFVNRRATFGESIDDHDEMLFAKSLLALN  31240

31241  DQKWRDMRATLSPAFTGSKMRAMFELIEGYSARMVEILKEQSQAAGYVDYEMKDCFTRVA  31420

31421  NDIIATCAFGLQVESLKNRENEFYVMGKNMLNFNRVSIMFRIFGFNLFPGLMAKLGVDLI  31600

31601  DAEFGQYFSKIIKDAVHTRETRGIVRPDMIHLLMQAKKGALKSQYETTDANTGFATVEE  31777

31778  SEVGRSSIAKAITESEMIAQCFVFFLAGFDSVSSEMVFMAYELALNPDVQQRLYEEIVET  31957

31958  DKQLGGKPPTYDTLQKMQYMDMVVSESLRMWPAGAFDRKCDRDYVLDDGAGLKFTIDRG  32134

32135  AYVWIPVHGIHRDPKYYPDPDKFDPERFSESNRDNIDMTMYMPFGAGPRNCIGSRFALME  32314

32315  IKAIMYALLLQFRIERNEKTSVPLKLVKGFAGLNGEGGVHLRLTLRQ* 32458

 

>AAGE01123974 494309314 592077078 733946792 579386359 62% to CYP9J5

Pseudogene: N-term missing, deletion in second exon

TPFLGASGPLMLRKVTFIEFIQSIYNKYPGVK (2?)

VFGMFDTITPFFEIRD

[DELETION]

KFGIDLLDREQADYFTHVFQETIRTRESHGIIRHDMIDLLLQARKGTLKYHEE

KDDQEGFATVQESEVGKVEMSKSMTEAEMIAQCLIFFLGGFDTVSSCIMFTAYELVRN

PEVQQKLCEEIVQTDKELGGKPLSYDALQKMKYMDMVVSESLRIWPLAPATDRLCTKDYI

VDDGQGLKFTIDKGTCVWFPAAGLHHDPQYFPNPERFDPERFNDENKRNINLGTYLPFGI

GPRNCIGSRFALMEVKAVMYYILLKFTIARSAKTQIPVQLRKGFTNVGPDHGMHMELKLR

*

 

>494345849 G719P81FE4.T0 pseudogene 92% to 494309314

DLLVQTIHGYV*SIIRRKMLRKGFRHCQES*

LGKMEMSKSMTEAEMIAQYLIFFLDGFATVSSCIMFTAYEVVRNPEVQRKLCEEIVQTDK

ELGGKPLSYEALQKMKYMDMVVSESLRIWPLAPATDRLCTKDYIVDDGQGLKFTIDKGTC

VWFPAAGLHHDPQYFPNPEQ

FDPERFNDENK

XXNSGLGTYLPFGIDPK

 

>AAGE01253357 476363988 755013039 587425733 632907226 55% to CYP9J5 complete

MEVNLLLLATVITVFVYLYRLITKNNDYFHDKPIPSLKARPLLGSTGPLLLKQVTFADFVSYVYNKFPGVK (2?)

VLGMFDTLTPFFVIRDPELIKQIAVKDFDHFMDHRPFFGESAESEEHPYALFKRVI

FALNGQRWRNMRATLSPAFTGRKMRLMFTLMVDCSERMLKHYESLMSSTGRMEVEIKDML

SRYGINVIASCAFGIDVDCFKDVDHEFMYHGRRMLQMGNPVVIAKMLFMRMFPNLAKKSG

MDVIPREQAVYFTKLIKETIRTRESQGIVRNDMIDLLLEARKGTLKYEEEREEVQEGFAT

VQESDVGKAQVTKAISEIDMIAQCLIFFIAGFESVSTTSMFMIYE

LILNPEIQQKLYEEVEQTYKQLGDKLLTYDALQSMKYMDMVVSETMRKWPLSPIGDRICV

RDYTLDDGQGLRFTIDKGTCVWFPIHGLHHDPQYYPNPDRFDPERFNDQNKGNIKMGTYL

PFGIGPRNCIGSRFALMELKAVMYHMLRKFSFHRSTNTRIPLKLRKGMNNVGTDEGMHVERIRRL*

 

>AAGE01015732 91% to 476363988 55% to 9J5 complete

2265 MEVNLLLLATVLTVFVYLYRLITKNNDYFHDKPIPSLKARPLLGSTGPLLLKQVTFSDFV 2444

2445 AYVYNKFPGVK (2) 2477

2538 VLGMFDTLTPFFVIRDPELIKQIAVKDFDHFMDHRPFFGESVESEEHPYALFKRVIFAL 2714

2715 NGQQWRNMRATLSPAFTGRKMRLMFTLMVDCSERMLKHYESLMSSTGQMEVEIKDMLSRY 2894

2895 GINVIASCAFGIDVDCFKDVDHEFMYHGTRMLQMGNPLVIAKMLFTRMFPKLANNWGMDV 3074

3075 IPREQAVYFSKLIKETIRTRESQGIVRNDMIDLLLEARKGKLKYEEEREEEQEGFATVQE 3254

3255 SDVGKAQVTKAISEVDMIAQCLIFFIAGFESVSANTMFMIYELILNPDIQQKLYEEVEQT 3434

3435 YKELGDKRLTYDALQSMKYMDMVISETLRKWPLTPVGDRMCVKDYVLDDGQGLRFTIDKG 3614

3615 TCVWFPIHGLHHDPQYYPNPDRFDPERFKDQNKGHIKMGTYIPFGIGPRNCIGSRFALME 3794

3795 MKALMYHMLRRFSFHRTANTQIPPKFRKGMNNFGTEQGLHVELRLRGQ* 3941

 

>AAGE01001411.1 3000-5000 region, 10 kb upstream of 9J7, complete

MDLDWTQLLAIVAIVVIIYRWLTGNHDYFHHKPIPSMTVRPIMGSTGPLLLKQCTFPEFIQSSYKKFAGAR (2)

VFGLFDTNIPMYVICDPDLIKRIAVADFDHFMDHRPIFGASNSDHPNLLFEKT

LFALTGQKWKNMRSTLSPAFTGSKMRQMFKFVVDCSESMVRFYQSEPRGTSHEMKDVFSR

FANDVIATCAFGIEVDSLRKRNNEFYVHGSKMLRLTRLSVVARLLGYRFAPTLMGKLGLD

INDQEQNQYFSSLVKETVKIRDVQGIFRPDMVHLLMEAKKGTLHHQEEIEHNKGFATVEE

SAMTKMRSMNSMTEVELIAQCLMFFLAGFDTVSTCLTFTAYELALNPTIQDKLYEEIKRI

HEAMSGKSLDYETLQKMSYMDMVISEVLRKWPAIAALDRLCVQDYEMDVGNGLKFTIDRG

SGIWIPIHAMHHDPKYYPDPERFLPERFSDENKASINMGAYLPFGIGPRNCIGSRFALME

VKAIVYHMLLRFSFERTAKTQVPVEIVKGFAPLKPKDGVFLEFRPRDAV*

 

>CYP9J7 262902386 621799144 520524964 618123933 complete

AAGE01001411.1 13000-15000 region

MDTIFVLALVGLLLLILLVLLYRFLSRKNDYFLNKPIPSLPGPLLLGGTSPLMLFRVSFTDYVKTVYDSFPDAK (2?)

VCGVMNTVIPLYIVRDPELIKKIAIKDFDHFADHRPVFGSDHGDHPNLIACKA

LFVLTGPKWKTMRATLSPAFTGAKMKFMFELIVECSEALVDYYRDQGAKEWDM

RTLFARFSNDVIATCAFGIKVNSSSDRDNEFYRRGKEMMVFTNFKTQLKIAGYLFTPWLM

NWFGIDLIKQEHSDYFAGLIRDTVRTREANGIIRPDMVHLLMQSRKGILKNQQ

EDDPEQEVSETTRSLPGPTMTESEMIGQCLFFFLAGFDTVSTALTFLAYELA

LNPDVQEKLSAEIAETHQSLNKRSIT

YEALHSMKYLDMVISESLRKWPSAPAVDRLCVQDYTLDDGQGLQFRMEKGIGIWIPIYGI

HRDPKYYPEPDKFDPERFSDQRKGDIQPGTYLPFGIGPRSCIGMRFALMELKCIVYYLLL

NFRLEKTERTEVPPVLEKGYVTLSAANGVWLKMVPK*

 

AAGE02029679.1 use this seq

77856  MDTIFVLALVGLLLLILLVLLYRFLSRKNDYFLNKPIPSLPGPLLLGGTSPLMLFRVSFT  77677

77676  DYVKTVYDSFPDAK (2) 77635

77577  VCGVMNTVIPLYIVRDPELIKKIAIKD  77497

77496  FDHFADHRPVFGSDHGDHPNLIACKALFVLTGPKWKTMRATLSPAFTGAKMKFMFELIVE  77317

77316  CSEALVDYYRDQGAKEWDMKDLFARFSNDVIATCAFGIKVNSSSDRDNEFYRRGKEMMVF  77137

77136  TNFKTQLKIAGYLFTPWLMNWFGIDLIKQEHSDYFAGLIRDTVRTREANGIIRPDMVHLL  76957

76956  MQSRKGILKNQQEDDPEQEVSETTRSLPGPTMTESEMIGQCLFFFLAGFDTVSTALTFLA  76777

76776  YELALNPDVQEKLSAEIAETHQSLNKRSITYEALYSMKYLDMVISESLRKWPSAPAVDRL  76597

76596  CVQDYTLDDGQGLQFRMEKGIGIWIPIYGIHRDPKYYPEPDKFDPERFSEQRKGDIQPGT  76417

76416  YLPFGIGPRSCIGMRFALMELKCIVYYLLLNFRLEKTERTEVPPVLEKGYVTLSAANGVW  76237

76236  LKMVPK* 76216

 

>AAGE01024220 39% to 9J8 40% to CYP329A1 Anopheles 826166409

Note: has a P at the I-helix T position, like CYP329

The N-term exon has a deletion and a stop codon

This may be the CYP329A1 pseudogene equivalent in Aedes

Change N-term to eliminate the stop codon

METEDLYWFSFILVTIVGFFTFKLMTKNR

ILSGQRSSLREAAFYLRKSG*GNK

4729 IFGFYNYLSPVYYIRDPELIRKLWINEF

 

4455  METEDLYWFSFILVTIVGFFTFKLMTKNRHFFRVRGVPFEKPHFIYGNLGEVTSGKLSSLELIASF  4652

4653  YQKFENER (2) 4676

IFGFYNYLSPVYYIRDPELIRKLWINEFNSFANHAYFLDESKDPI

LGNQLHLLKNEKWRQMRHTLTPVLSGQSVSSMSSLIRTNSLDLV

5853 DHLKASVDSELEFKGIFLKYVFNVIANCAFGLELNTFKDESDKFCTYGTALVYGNNPVQT 5674

5673 LKTMMFYLFPKMTTQMKVRLMEDEHAAYFTNLIGSTISEREKKNVNRADVIQMLHQANKG 5494

5493 ELKAEGQDDEVLQMKDFSKCKWNQEELIAQCIAFFGSGFEPLVNLLSFA 5347

5346 AYELAANPDIQQKLLSELEGSLRDDPVVSDTVDKLSYLNMVISETLRKWPASPSLDR 5176

5175 ECSKDYLLDDGGCRVQFRKGDTLWVSIWALHRDERNFPDPERFDPERFSEKNKASITPG 4999

4998 TYMPYGVGPRNCI (1)

4902 GTRLASLVAKITLVDLVRNFKLELGSRMVQPLRLSKTSYSMEPEGGFWLKMTPR* 4738

 

>CYP9J10 complete AAGE01039952 AAGE01005096 494160094 72% to TC52199

754334205 821634863 803206067 637185165 TC60951 TC32891 TC38518

57% to CYP9J1, 90% to TC60679 98% to TC60950 (56% to Anopheles 9J4)

CYP9J10 TC60950 TC38519 60% to CYP9J4 98% to TC60951 (3 aa diffs)

98% to 9J10v2 (2 aa diffs) 98% to 9J10v1 (3 aa diffs)

MVEVDLYVALAVGAIVVLLYHYAAKKYEYFLTKPIPALKPTMLFGNTGPMMFRQRDVSSHLKMLYNTYEGSK (2)

MIGFYDLMKPIYMLRDPEVIKQIAVKDFDYFMDHTPTMTNSNPEDEVGGDSLFGNSLFA

LRGQKWRDMRATLSPAFTGSKMRHMFELVADCAKSMAEFFKAEAAAGKTLEYEMKD

TFSRFGNDVIATVAFGIKVDSLRDRDNEFYLKGKAMLNFQSLSVLLKVLFLRAFPKLS

HKLGLDFVDSTLTEYFKQMIVDNMKQRAAHGIMRNDMIQMLMEVRKGSLRHQK

DEKETKDAGFATVEESNVGKSNINRVWTENELISQCFLFFVAGFDTVSTCMTFLTYELMLNQNIQQ

RLYDEVMETEKSLNGKPLTYEVLQKMEYMDMVVSEALRKWPPAVISDRFCVKNYMYDDGQ

GTRFLVEKGQTMWIPTIAIHSDPKYYENPEKFDPERFNEENRSKIDTGAYLPFGVGPRNC

IGSRLALMEVKVIIYNLLKDFSLESSEKTQIPLKMSKNFFVLQAENGVWLELKPRKR*

 

AAGE02011007.1 5 diffs to 9J10 (sixth P450 on this contig)

218249  MVEVDLYVALAVGAIVVLLYHYAAKKYEYFLTKPIPALKPTMLFGNTGPMMFRQRDVSSH  218428

218429  LKMLYNTYEGSK (2) 218464

218529  MIGFYDLMKPIYMLRDPEVIKQIAVKDFDYFMDHTPTMTNSNPEDEVGGDSLFGNSLFA  218705

218706  LRGQKWRDMRATLSPAFTGSKMRHMFELVADCAKSMAEFFKAEAAAGKTLEYEMKDTFSR  218885

218886  FGNDVIATVAFGIKVDSLRDRDNEFYLKGKAMLNFQSLSVLVKVLFLRAFPKLSQKLGLD  219065

219066  FVDSTLTEYFKQMIVDNMKQRDAHGIMRNDMIQMLMEVRKGSLRHQKDEKETKDAGFATV  219245

219246  EESNVGKSNINRVWTENELISQCFLFFVAGFDTVSTCMTFLTYELMLNQNIQQRLYDEVL  219425

219426  ETEKSLNGKPLTYEVLQKMEYMDMVVSEALRKWPPAVISDRFCVKNYMYDDGQGTRFLVE  219605

219606  KGQTMWIPTIAIHSDPKYYENPEKFDPERFNEENRSKIDTGAYLPFGVGPRNCIGSRLAL  219785

219786  MEVKVIIYNLLKDFSLVSSEKTQIPLKMSKNFFVLQAENGVWLELKPRKR* 219938

 

>AAGE02011007.1 pseudogene exon 1 new seq last P450 on this contig

168 bp downstream of 9J10, 51% to CYP9Jae4

220106  MLQVHMFLAATVVLLL*YSYSITT*RNIYEYSLSKPISCAKPTFLVGRNWSTSLCKADMT  220285

220286  LHFKKICVFFPDA  220324

 

>AAGE01088707 494098990 826090661 819721560 57% to 9J5 complete

MEVNLFYFGVLVAILGTLYYLLTKKHGHFLDKPIPSMAAKPILGSVSDLMLQRVPFSTFIQTLYDKYRGVK (2)

VFGLFDMMTPTYVIRDPELIKQVAVKDFDHFADHVQVFGNSSYDHPNLLTGKTLFSLTG

LRWKTMRATLSPAFTGSKMRYMFELIVECTERAVRYYEKNALKSGPKVYEMKDVFSRFAN

DVIATCAFGLQIESSRDRDNEFFVNGSKMLDFSRPSVMLRIMGHQLVPWLMAFFGWDVID

EQQNTYFKTLILDAIREREHRGIVRPDMINLLIHAKKGTLKHQQENEHVPEGFATVQESE

VGTSSVTTVMTDVEMVAQCLIFFLAGFDTVSTSLLYASYELAINPEVQQKLYDEIQNTRT

ALNGKPLTYDAMQKMKYMDMVMSEVLRMWPPAPSTDRLCTKNYVMDEGNGVKYTIEKGTS

VWFPIHALHHDPNYYPQPEKFDPERFSDERKGSINAGAYLPFGIGPRNCIGSRFALAEVK

TILYYMLGSFSFERCSKTEVPPVLAKGFDVIPANGMHIEFKPRPKK*

 

AAGE02029679.1 2 aa diffs to AAGE01088707 use this seq

32148  MEVNLFYFGVLVAILGTLYYLLTKKHGHFLDKPIPSMAAKPILGSVSDLMLQRVPFSTFI  32327

32328  QTLYDKYRGVK (2) 32349

32424  VFGLFDMMTPTYVIRDPELIKQVAVKDF  32507

32508  DHFADHVQVFGNSSYDHPNLLTGKTLFSLTGLRWKTMRATLSPAFTGSKMRYMFELIVEC  32687

32688  TERAVRYYEKNALKSGPKVYEMKDVFSRFANDVIATCAFGLQIESSRDRDNEFFVNGSKM  32867

32868  LDFSRPSVMLRIMGHQLVPWLMAFFGWDVIDEQQNTYFKTLILDAIREREHRGIVRPDMI  33047

33048  NLLIQAKKGTLKHQQENEQVPEGFATVQESEVGTSSVTTVMTDVEMVAQCLIFFLAGFDT  33227

33228  VSTSLLYASYELAINPEVQQKLYDEIQNTRTALNGKPLTYDAMQKMKYMDMVMSEVLRMW  33407

33408  PPAPSTDRLCTKNYVMDEGNGVKYTIEKGTSVWFPIHALHHDPNYYPQPEKFDPERFSDE  33587

33588  RKGSINAGAYLPFGIGPRNCIGSRFALAEVKTILYYMLGSFSFERCSKTEVPPVLAKGFD  33767

33768  VIPANGMHIEFKPRPKK* 33821

 

>AAGE01194580 86% to 494098990 832396347 complete

AAGE01341824 89% to 494098990

 543 MEVNLFYFGAIVAIFGALYYLLTKKHGYFHDKPIPAMGAKPILGSIGDLMLQRVPFNTFL 722

 723 QAAYDKYSGVK (2) 755

 816 VFGMFDLMTPTYVIRDPELIKQVGVKDFDHFVDHEQVFGNSSYDHPNLLTGKTLFSLTG 995

 996 SRWKTMRATLSPAFTGSKMRYMFELIVECIERAVKYYEEETKKKGAQVYEMKDVFSRFAN 1175

1176 DVIATCAFGLQVESSRDRDNEFFVNGSKMVDFGKPSFILRLMGHQLVPWLMAFFGWDVID 1355

1356 GQQNTYFKRLIMDAIKEREHRGIVRPDMINLLIQAKKGTLKHQQENEQVPEGFATVQESE 1535

1536 VGKSTATTMMTDVEMVAQCLIFFLAGFDTVSTSLLYTSYELAVNPEVQKKLYDEIQNTRT 1715

1716 ALGGK 1730

     PLTYDAVQKMK

     YMDMVISEVLRKWPPIASTD 879

 878 RVCTKNYVMDEGNGIKYTIEKGAALWFPTYALHHDPKYYPQPEKFDPERFSDERKGSINT 699

 698 GAYLPFGIGPRNCIGSRFALAEVKTILYYMLGSFSFERCSKTEVPPVMPKGFDVIPVNGM 519

 518 HIEFKPRPKG* 486

 

>AAGE01341824 89% to 494098990

1238 KGTLKHQQENEQVPEGFATVQESEVGKSTATTMMTDVEMVAQCLIFFLAGFDTVSTSLLY 1059

1058 TSYELAVNPEVQKKLYDEIQNTRTALGGKPLTYDAVQKMKYMDMVISEVLRKWPPIASTD 879

 878 RVCTKNYVMDEGNGIKYTIEKGAALWFPTYALHHDPKYYPQPEKFDPERFSDERKGSINT 699

 698 GAYLPFGIGPRNCIGSRFALAEVKTILYYMLGSFSFERCSKTEVPPVMPKGFDVIPVNGM 519

 518 HIEFKPRPKG* 486

 

 

CYP3 clan

Four CYP9M-related sequences and one new CYP9 subfamily, plus one pseudogene

 

>AAGE01023613 494247077  812172036 586045833 57% to 9M1 complete

MVLLDLLVVLIPIVSYLLYRWAVATYDFFEKRKIPYVKPYPFVGGLWPVFSGKLHPTDAAVLGYNLFPEN

RFSGFFAFRRPGYLIHDPALAKQIMIKDFDHFTDHMNTISVDVDPIFGRALFFMDGQRWR

HGRSGLSPAFTGSKMRNMFTLLSKYVEGAMQRLAQDAGQGKMELEIRDLFQK (2?)

LGNDIITSISFGVEIDSVHNPNNEFFKRGKQLAATGGFQGLKFFFSLVVPDSVFKLFGI

RFLPKEAADFYVDVVSKTIKHREEYKIVRPDFIHLFVQARKNEL

KEETADDELKSAGFTTVEEHIEASTENSQYTDLDITAVAASFFFGGIETTTTMLCFALYE

LAGNKEVQQKLQAEIDSVRKELGGGSLTYEVLQKMKYLDMVVTETLRRWPPLGITNRVCV

KPYTFEDHEGTKVTIEKGQLIQIPVQSFHRDPSFFPDPYRFDPERFSEENKHKINQDAFL

PFGSGPRNCIGSRLALMQAKCLLYYLFSAFSLEYSDKMDVPIKLNKMSLTYTAKNGFWFN

LLPKKVAV*

 

>AAGE01008959 54% TO 9M1 74% TO 494247077 complete

6014 MGVLEWLAVFVPIVTYLLYRWSTATYDYFREKKIPFVKPYPLFGSLWPIFSGKLHPVDAT 6193

6194 ILGYDMFPGRRFSGFFTFRTPSYLVHDPALAKQVLIKDFDHFTDHTSTILPDVDPVLGRN 6373

6374 LFFMDGQRWRHGRSGLSPAFTGSKMRNMFVLLSNYVDGAMKRLAQDAGPGKMELELRDLFQK (2) 6559

6593 LGNDIITSISFGVDIDSIHNPNNKFYKRGQKVTATGGIQGFKVFLTTVIPGSVFKF 6784

6785 FGVKVLPKEAADFYVDVISKTVKQREEYKIVRPDFIHLFMQARKNELKEDKADEELKDAG 6964

6965 YSTVEEHLQSTTKNNQYTDLDIAAVAVSFFFG 7060

7060 GIETTSSVLSFVLYELCLNPAIQHKLQEEIDTVRAQLEGNPLSYEVLQKMKYMDMVVS 7233

7234 ETLRRWAPLGIVSRKCVKPYTFEDHDGTKVTVEKGHIIQIPLQSFHRDPNFFPDPYRFDP 7413

7414 ERFSDENKHKIKQDTYIPFGSGPRNCIGSRLALMQTKCVLFYLFANFSVEFSEKMDVPIK 7593

7594 LNKMALSYTAQNGFWFHFAARDVKT* 7671

 

>AAGE01012700 71% to 494247077 55% to 9M1 complete

2863 MFESLALIVPVAAFLVYLWSIATYNYFKKRKIPFVKPYPLIGGLWPALTGKVLPLEAATL 3042

3043 GYDMFPKHRFSGYFMFRNPEYLIHDPALAKQVMIKDFDHFTDHTSVFPVEVDPIIGRSLF 3222

3223 FMDGQRWRNGRSGMSPAFTGSKMRNMFTLLSKYADSAMQRLVEDAGKNKLELEIRDLFQK (2) 3402

3464 LGNDIITSISFGLDIDSVHNPDSEMFKKGKQLAGTTGFQGFKFFLSMALPSSIYKLFGI 3640

3641 RLITKDVADFYLDIVTNTIKYREENNIVRPDFIHLFVQARKNELKEDKTDETLDSAGFTT 3820

3821 VQEHIKSSSENSKYSDFGITAVVASFFFGSIETTSTVLCFAMYELAANPEIQQKLQDEIE 4000

4001 LVKDQLNDSPLTYEVLQKMKYLDMVVSETLRRWPPLGTTNRVCVKPYTLEDYDGTQVTIE 4180

4181 KGQAVQIPIISYHHDPNYFPDPYRFDPERFSDESRDKINQDAFLPFGSGPRNCIGSRLAL 4360

4361 MQVKSLLFYLLTCFSVEFSEKMDVPIKLKKMSMTYTAQSGFWFNLVPKSVEV* 4519

 

>494569869 25% to 9M1 N-term 39% to 494247077 pseudogene of 494247077

LDLLDLVGVLIPIGSYPLDLWVLTSYYSFEKIEIPYVKPYPLVG

ELRPEFTNVLLPSYDTGIGHLLFPETDLP*

FFWIHQIACLSPYSPQAILINMIGTYLFSDFCGI*

SADVDPDFGRALFLTDGLKTRPGRSGINVIVWAYNMNMLAVCYMPLFHGSYQKQAEDARQ

CNMSIDYCDVFHL ()

RGSDVIHYNIRLDVHIDCVHVPFYDYYHKRASG*RLTGVIFWDLEF

 

>AAGE01026951 AAGE01099852 476324290 55% to 9M1

579013166 826063288 832528009 494192568 637071386

613990760 complete

METLVWIALVLLIIIFLIYRWSIACYDYFEKRNILY

VKPYPFFGGLWPVFCRKLHPTDATIMGYNMFPERRCSGLFTF

RNPAYDIHDPTLAKQIMVKDFDHFTDHMNTISADVDPILGRALFFMGGSRWRHGRAGLSP

AFTGSKMRNMFVLLSKHVDEAMRRLVEDAGEGALEVEIRELFQK (2)

LGNDITTSISFGVEVDSVHNPGNTFLEM

GKLLIATSAFQGFKYLLSLVVPESVFKFFGVRFFPKEAADFYLDIVTETISHREKNKIVR

PDFIQLFVQARKNELKKDNTDDNFK

SAGFTTVDEYIESSTENGQYTDLDIAAVALSFFFGGIETTTTAICFAVYEIVLNATIKEK

LQTEIDSVKEGLEGRPLSYEILQQMKYLDMVVSEALRRWPPVGVTNRACVKTYAFEENDG

TTVTIEEGQVVHIPVQSFHRDSNYFPDPLRFDPERFSDENKHVINQDAFLPFGSGPRNCV

GSRLALMQAKCILYYFFCTLFDGLFQQNGPTDQTQDYVSLLRSAEWFVVSFDAEHGKVVKYKK*

 

>AAGE01015749 494133555 519945594  763120971 810094850 53% to 9M1 complete

MILLLLVVAVGYLIYRWSVATFDYFEKLNVPFLKPYPFFGALWPSLKGE

KSPTDATAEGYRLFPGNRFSGFFSFREPGYLIHDPELIKQIAIRDFDHFTDHANNVPLEV

DPFLGRGLFFTGGQRWKHGRTALSPAFTGSKMRNMFQLLSSYTDGAMKRLVKDAAGGKLEREMKDLFQR (2?)

LGNDVMTSISLGFDTDSVHDPDNEFFQYGKRLSRTSGLQG

LRFFVLTLLPENILKVIGIRIIPSDIANFYNEVV

IKVIKERLEKNIVRPDFIHLMLQARKNELKADKTDEFLNDAGFSTVKEHLQSSAKNQ

IEWSDYDIAATSASFFFGGIESTTTLVCFALYEIALNHDVQQKLRAEVDATKLSLGDAKL

TYESMQQMKYMDMVITETLRKWPPFGVTNRRCTKAYSLENANGTKVTVHKGQVIFIPIYE

IQRDAQYYPNPERFDPERFSDQNRGNLNQDTYLPFGIGPRNCIGSRLTLMQAKCYLFYML

TCFEIQLSTKTDVPMQLDARSSALNAKNGLKMQLIPRGV*

 

>AAGE01236202 AAGE01528761 AAGE01574909 575351627 754305099 587660657 263512612

581727980 743856203 625109625 223413916

773058412 (exon 1) mate pair = 775439855 (exon 2)

832539269 578892595 complete 51% to 9M1

MMELLLLGAAALTAVCYLLYRWSTSTFGYFEKRSVPFGKPYPLLGALWPYLKGEKSPVDALCEGYRHFP

GCTYSGVFLFRSPCYLIHDPELIKKIAVRDFDHFADHANNVSLEVDPFMGRVLFFANGQR

WKQGRTALSPAFTGSKMRNMFGLVSEYTNGAVQRLVEDAEASGGKMERELNDLFKR (2?)

LGHDAITSISLGTDIDSIREPENEFFAHGKELAKTTGLQGFRFFIMSLLPEKILRLSR

MRIVPEHLANFYHGVVSKVIKHRLDNGIVRPDFIHLLLQARRNELKTDKTDE

KFNDAGFATVQEHLQAPTKNPIEWTDYDIAATVATFFFGGAESTTALLCFTIYELALN

PHVQQKLLAEIDSVQKTVGTEKLTYESMQQMKYLDM

VISETLRKWPPFGVTNRRCTKPYQIQDVDGHSVTIEKGQVVFLPIQHIHRDPHFFPNPMR

FDPERFATENRDQLNQDAYLPFGAGPRNCIGSRLSLMQTKCFLYYLLSTFEVQLSNRTEV

PIEIDLKATGLNSKNGFWFHLIQRVK*

 

>AAGE01014192 813491936 639416242 762398872 complete

TC52960 TC20003 TC26029 TC39058 TC4763 TC7436

40% to CYP9J4 40% to 9A4 (new subfamily in CYP9)

AAGE02005788.1 4968-6548 no introns

ESTs DV359961, DV294300, DV294302, DV359959

MEAFLLISALVGALILLYRYATAFANYFNQRGIKYRKPTFLLGNLGPILFQRTT

PVANLTDLYREFAGEKVYGFYEFRRPTIILRDLQLIKRVF

VKDFNHFTNHTAPVDEHMDSILGNGLISLEGQKWRDMRAMLSPMFTGNKIRHMVPLVGKC

AEDLCRFVERETDEVEWDVRELLAKCLVEVIGSCAFGIEVDSFNDPDNEFDRVAKYLMNQ

SDVRKVARFLLIMVFPKMCKQFGMELFDDKYKRLFRRLVSETMLKRESDGVSRPDLIQLLMLARRG

KLEADKDVEGESFAAANDYLETGTDDVKRSWSDDELTAQAVIFFAAGFDTTSTLLSFTLM

ELAIHPEIQDRLFEEIKSVQRSDSVISYEQIQSLEYLDAVISESLRKWPPLTATDRKCTK

DYLMVDPEDGSPMFSIEEGYSVWVPIYCFHHDPKYFPNPEKFDPDRFNRVNRHQLNPAAY

MPFGVGPRNCIGSRFALMSAKMILLRLLRSFRVEVCPKTDTTLQLSKTKMNMTLEKGHWV

YLKRRS*

 

CYP4 clan sequences

 

>AY433052 AAGE01072700.1 AY431937 88% to 4G16 complete

AAGE01141041.1 AAGE01223479.1 AAGE01094290.1

1517  MSATVAPADPVMANANIASPMNVFYFLLAPALLLWFIYWRISRQHMLKLAEKIPGPPGLP  1338

1337  LLGNALELIGTSH (1?)

1379  SVFRNVIEKGKDFNQVIKIWIGPKLIVFLVDPRDVELLLSSHVYIDKSPEYRFFKPWLGNGLLIST  1179

323  GHKWRQHRKLIAPTFHLNVLKSFIDLFNENSRLVVEKMHKEAGKTFDCHDYMSECTVEILL (1)

1240 ETAMGVSKKTQDQSGFDYAMAVMKMCDILHLRHRKMWLYPDLFFNMSQYAKRQVKLLDTI 1061

1060 HSLTRKVIRNKKAAFATGTRGSLATTSIKTAEFEKPKSNINTNSVEGLSFGQSANLK 890

 889 DDLDVDENDVGEKKRLAFLDLLLESAENGALISDEEIKNQVDTIMFEGHDTTAAGSSFFL 710

 709 SMMGIHQHIQDKVIQELDDIFGDSDRPATFQDTLEMKYLERCLMETLRMYPPVPIIARS

LKQDLKLASSDLVVPSGATIVVATYKLHRLETIYPNPNVFDPDNFLPERQANRHYYAFVP

FSAGPRSCV 320 (1)

 255 GRKYAMLKLKVILSTILRNFRVISDLKEEDFKLQADIILKREEGFQIRLEPRQRKPKAAKA*

 

>AAGE01114834 52% to AY433052 same as TC67187 76% to 4G17 N-term probable ortholog complete 80% to 4G17 full length

AAGE01340100.1 same as AAGE01229939.1 85% to 4G17 C-term, probable ortholog

633767131 823361413 823353110

MVIFMTLVLVASALFHFWMISRRYVQLGNKIPGPRAYP

FIGNANMLLGMNHNEIMERAMQLSYIYGSVARGWLGYHLVVFLTEPADIEIILNSYVHLT

KSSEYRFFKPWLGDGLLISSGEKWRSHRKLIAPAFHMNVLKTFVDVFNDNSLAVVERMRK

EVGKEFDVHDYMSEVTVDILLETAMGSQRTSESKEGFDYAMAVMK (2)

MCDILHSRQLKFHLRMDSVFNFTKIKQEQERLLGIIHGLTRKVVKQKKELFE

KNFADGKLPSPSLSEIIAKEESESKESLPV

ISQGSLLRDDLDFNDENDIGEKRRLAFLDLMIETAKSGADLTDEEIKEEVDTIMFEGHDT

TAAGSSFVLCLLGIHQDVQDRVYKEIYQIFGNSKRKATFNDTLEMKYLERVIFETLRMYP

PVPVIARKVTQDVRLASHDYVVPAGTTVVIGTYKVHRRADIYPNPDVFNPDNFLPERTQN

RHYYSYIPFSAGPRSCV (1)

GRKYAMLKLKVLLSTILRNYRVVSNLKESDFKLQGDIILKRTDGFRIQLEPRV*

 

>CYP4C38? exon 1 AAGE01133681.1 587572087 complete

These two pieces are probably from one gene, since no other closely related sequences were found. 66% to 4C36

784  MSELTTFIYGILVFLIFAPFLQWWVKRARLVQIIDKIPGPKAYPFIGTTYTFFGKKHY (1)  611

 

>CYP4C38 N-term AAGE01207392.1 AAGE01470307.1 71% to 4C27

824335234 761357490 744250376 592527729

570727647 754993699 585845687 593920597 613947338

594452687 575404595 749489367 579218945 825227784

AAGE01009885.1 TC66432 Length = 995 71% to 4C27 Anopheles

AAGE01207392.1 matches the N-term part of TC66432

parts are on AAGE02022591.1, AAGE02022592.1, AAGE02022593.1

use these seqs

456 ELFYIIDERTRRYPDIHRIWTGMRPEIRISKPEYVETIIGASKHMEKSHGYDFLFDWLGEGLLTSK  259

302 GERWFQHRKLITPTFHFNILDGFCDVFAEQGAVLAERLEPFANTGKPVDVFPFITKAALDIIC (1) 490

694 ETAMGVKVNAQTGGENNYVNAIYR (2)  762

822 MSEIFVDRSIKPWLHPEFIFKRTEYGRQHKKALDIVHGYTKK (0) 947

    VIRDRKEALQVKENSTGAGDTGEDLYFGTKKRLAFL 227

228 DLLLEGNAKHKQLTDDDVREEVDTFMFE (0)

    GHDTTTAGMSWALFLLGLHPDWQDRVHQEIDS 407

408 IFAGSDRPATMKDLGEMKLLERCLKETLRLYPSVSFFGRKLSEDVTLGQYHIPAGTLMGI 587

588 HAYHVHRDER (2?)

    FYPDPEKFDPDRFLPENTEHRHPFAYIPFSAGPRNCIGQKFAILEEKS 761

762 IVSSVLRKFRVRSANTRDEQKICQELITRPNEGIRLYLEKRQ*

 

>Exon 1 of 4C25 ortholog AAGE01102043.1    80% to 4C25 complete

Exon 2 of 4C25 ortholog AAGE01326257.1

AAGE01078331.1 83% to 4C25

515 MIEATVKSSFVLSKVAKMLSYFSPITIILATMIAGAIYVYNKRRARLVKLIEKIPGPASMPLIGNSLHINVDHD 294 (1?)

EIFNRIISIRKLYGRQQGFSRAWNGPIPYVMISKASAVE  (0) 935

PILGSPRHIEKSHDYEFLKPWLGTGLLTSQGKKWHPRRKILT  1504

1505  PAFHFKILDDFVDIFQEQSAVLVQRLQRELGNEEGFNCFPYVTLCALDIVC (1)  1657

1714  ETAMGRLIHAQKNSDSDYVKAVYQ (2)

      IGSIVQNRQQKIW  1887

1888  LQPDFIFKRTEDYRNHQRCLSILHEFSNRVIRERKEEIRKQKQSNNNTINGNANNAVEAN  2067

2068  ILDGNNNAEEFGRKKRLAFLDLLIEASQDGTVLSNEDIREEVDTFMFEGHDTTSAAISWI  2247

2248  LLLLGAEPAIQDRIVEEIDHIMGGDRDRFPTMKELNDMKYLECCIKEGLRLYPSVPLIAR  2427

2428  KLVEDVQIEDYTIPAGTTAMIVVYQLHRDPAVFPNPDKFNPDNFLPENCRGRHPYAYIPF  2607

2608  SAGPRNCIGQKFAVLEEKSVISAVLRKYRIEAVDRRENLTLLGELILRPKDGLRIKISRR  2787

2788  E*  2793
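The `(0)`, `(1)` and `(2)` marks record intron phase. Under the usual convention, phase 0 means the intron falls between codons, and phases 1 and 2 mean it interrupts a codon after its first or second base. The phase therefore equals the number of coding nucleotides upstream of the intron, modulo 3. Here is a small sketch of that arithmetic. The exon lengths are made up and are not taken from this entry.

```python
from itertools import accumulate


def intron_phases(coding_exon_lengths: list[int]) -> list[int]:
    """Phase of each intron, given the coding-nucleotide length of each exon in order.

    Phase 0: intron between codons; 1: after the first base of a codon;
    2: after the second base. Computed as (coding length upstream) mod 3.
    """
    # Running totals over all exons except the last, which has no intron after it.
    return [total % 3 for total in accumulate(coding_exon_lengths[:-1])]


print(intron_phases([150, 100, 212, 90]))  # hypothetical exon lengths
```

Note that the coordinate pairs in this file seem to span only complete codons, so exon lengths read directly off them would usually need the split-codon bases added back before applying this.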

 

>AAGE01094388.1 exon 2 4C-like possible pseudogene fragment, cannot extend

2051  FNRIECIKRLYTYQSGGYMRTWNG  1980

 

 

>AAGE01029369.1 72% to 4C26 62% to 4C25  complete

did blast with exon 1 of 4C26 to find best match

did blast with last 500bp of AAGE01029369 to find trace seq on (+)

759912013

mate pair = 759644271 will be downstream of AAGE01029369 and should match next

contig, possibly with N-term exon.  Contig match = AAGE01001656.1 (-)

over 15kb, but no P450 seq.  Might be a short contig between these

AAGE01030574.1 exon 2 4C like

586027613 matches first 500 bp on (-)

mate pair = 586024059  matches AAGE01059591.1 (+) no P450 seq

repeat with first 500bp of AAGE01059591

600013440 matches on (-) mate pair = 600014884

this matches two contigs AAGE01312028.1 (-)

and AAGE01029369.1 (+)  this one has a P450 seq

that is 4C like and complements this exon 2 seq.

Join them

Now use last 500bp of AAGE01030574 to find a trace file that matches on (+)

636183786 mate pair = 637148886 matches AAGE01001355.1

this is the same contig found in a search above going downstream from the N-term of a 4C like P450.  The intron must be more than 17kb

join exon 1 seq

AAGE01098344.1 best hit to 4C26 N-term

searched by megablast to get 585803103(+), mate pair = 585951518

searched WGS with this to find adjacent contig downstream

this = AAGE01001355.1 16kb, no P450 seq

148  MNHNIAAKIASLFSVLSPITTVILVVMVCAIITYKKKRARLVHHINRIPGPFMLPIIGNGLHVTLGCKD  354

4051  EFLDRVISAQKMYGRRIGMSRAWNGPIPYVMISKASAVE  3935

3210  PILSNPKLVEKSVDYDFMKPWLGNGLLTSRASVWHPRRKTLTPAFHFKILSEFVNIFHK   3034

3033  QALVMNEKLAEQLDNTAGFDIVPFTTLCALDIFC (1?)

2868  ETAMGCPVNAQKNSDSEYVRAHK (2?)

      IGKIIRNRLQKVWLRPD  2686

2685  FIFKHTEDYRKHQECLQVLHNFSDRVVQERKTEIVAKRCQAEDLIDLNNNKVADETISCC  2506

2505  SKKQLEFLDLLIEGSLDGNGLTDLDVREEVDTFVIGGHDTTAAAMAWILLLLGSDQKIQD  2326

2325  RVIDEIDGIMNGDRDRRPTMQELNDMKYLECCIKEGLRLYPSIPLIARRLTEDVQVDDY   2149

2148  IIPSGTTTLIVVYQLHRDPSVFPNPDKYNPDNFLPENCSGRHPYAYIPFSAGPRNCIGQK  1969

      FAILEEKMVLSTVLRKFRIEAVERREDVKLLGDLVLRPRDGLKIRVSRRL* 1816
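The notes for AAGE01029369 describe a walk through the assembly:

1. Take the end of a contig.
2. Find a trace that matches that end.
3. Take that trace's mate pair.
4. Search WGS for the contig the mate lands on.

The sketch below replays the AAGE01030574 part of that walk as a breadth-first search over small in-memory tables. The tables are filled with the IDs given in the notes. They stand in for the megablast and WGS searches, and nothing here queries NCBI. The function and table names are hypothetical.

```python
from collections import deque

# Hypothetical stand-ins for search results, using IDs from the notes above.
end_hits = {"AAGE01030574": ["586027613"],      # trace matching the contig end
            "AAGE01059591": ["600013440"]}
mates = {"586027613": "586024059",              # trace -> its mate pair
         "600013440": "600014884"}
trace_hits = {"586024059": ["AAGE01059591"],    # mate trace -> contigs it matches
              "600014884": ["AAGE01312028", "AAGE01029369"]}


def walk(start, target, end_hits, mates, trace_hits):
    """Breadth-first walk from contig to contig through trace mate pairs.

    Returns a list of (contig, trace, mate, next_contig) steps ending at
    target, or None if target cannot be reached from start.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        contig, path = queue.popleft()
        if contig == target:
            return path
        for trace in end_hits.get(contig, []):
            mate = mates.get(trace)
            for nxt in trace_hits.get(mate, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [(contig, trace, mate, nxt)]))
    return None


for step in walk("AAGE01030574", "AAGE01029369", end_hits, mates, trace_hits):
    print(step)
```

Breadth-first order returns the shortest chain of mate-pair links. A real walk also has to deal with repeat regions, which the notes for AAGE01055570 mention running into.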

 

1.295_1 AAGE01029369 Hils Version  15 diffs to blast file

AAGE02013631.1  exon 1, AAGE02013630.1 exon 1 exact duplicate,

AAGE02013629.1 exon 2, AAGE02013628.1  exons 3-5 use this seq

MNHNIAAKIASLFSVLSPITTVILVVMVCAIITYKKKRARLVHHINRIPGPFMLPIIGNGLHVTLGCKDEFLDRVISAQK

MYGRRIGMSRAWNGPIPYVMISKASAVEPILSNPKLVEKSVDYDFMKPWLGNGLLTSRASVWHPRRKTLTPAFHFKILSE

FVNIFHKQALVMNEKLAEQLDNTAGFDIVPFTTLCALDIFCETAMGCPVNAQRNSDSEYVRAHKLIGKIIRNRLQKVWLR

PDFIFKHTEDYRKHHECLQVLHSFSDRVVQERKAEIVAKRRQAEDLIDLNNNNESEELTSCCRKKQLAFLDLLIEGSLDG

NGLTDLDVREEVDTFVIGGHDTTAAAMAWILLLLGSDQKIQDRVIDEIDGIMNGDRDRKPTMQELNDMKYLECCIKEGLR

LYPSIPLIARRLTEDVQVDDYIIPSGTTTLIVVYQLHRDPSVFPNPDKYNPDNFLPENCSGRHPYAYIPFSAGPRNCIGQ

KFAILEEKMVLSTVLRKFRIEAVERREDVKLLGDLVLRPRDGLKIRVSRRL.
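The Hils version is noted as having 15 differences from the BLAST-assembled sequence. The sketch below lists such differences between two sequences that are already aligned to the same length. It uses a stretch where the two versions above differ by one residue: `ETAMGCPVNAQKNSDSEYVRAHK` versus `ETAMGCPVNAQRNSDSEYVRAHK`. The percent figure divides by aligned length, which is only one convention. It is therefore not expected to reproduce the BLAST-style percentages quoted elsewhere in this file.

```python
def differences(a: str, b: str) -> list[tuple[int, str, str]]:
    """(1-based position, residue in a, residue in b) wherever two aligned
    sequences differ. Gap characters are compared like any other residue."""
    if len(a) != len(b):
        raise ValueError("sequences must already be aligned to equal length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def percent_identity(a: str, b: str) -> float:
    """Identical positions divided by aligned length, times 100."""
    return 100.0 * (len(a) - len(differences(a, b))) / len(a)


blast_version = "ETAMGCPVNAQKNSDSEYVRAHK"     # from the BLAST-assembled entry
genomic_version = "ETAMGCPVNAQRNSDSEYVRAHK"   # from the Hils version
print(differences(blast_version, genomic_version))  # [(12, 'K', 'R')]
print(round(percent_identity(blast_version, genomic_version), 1))  # 95.7
```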

 

>476414268 92% to Aedes albopictus AY971511 complete

760814858 568935720 581452704 754413849 580048410

walked upstream to 531423840

walked to 529070673

walked to 824339230 mate pair = 823396717 matches C-term

AAGE01143020.1 63% to 4C28 AAGE01462557.1 62% to 4C37

AAGE01324666.1 exon 2 4C like same as 476414268

supercontig 1.295 frame = -

177175 MLKEPLLLVITIASQLLHAVKEFPLPATVLLGVVIVVYLFAHADRDQLKSLLRINGAKDG 176996

176995 SKKSVKFYLNQLPGPQCIPLLGNSLMMATDRE (1) 176900

DMFNRLTTARKLYGRKQGICRIWNGRTPYVLISKAEPVERILSSSVNIEKGRDYGFLRPW 546

LGNGLLTCPGSRWYKRRKALNPTFNYKMLSDFLEVFNRQAQTMVRLMEKELNRENGFN

CTRYATLCSLDILCETAMGYPIQAQEQFGSDYVKAHEE (2)

IGRIMLERLQKIWLHPDFIYKRTNFYKRQSECLKILHGFSENVIKQRRLQRDASLANKHDEDPSI

EIGRKRQLAFLDLLLEATQDGQPLSDRDIRDEVDTFILGGHDTTATAIGWLLYL

LGTDPQVQDRVFEEIDSIMGQDRDRPPTMIELNEMKYLECCIKEALRLFPSIPLIARKLT

ESVNVGDYTIPAGTNAVIVVYQLHRDTQIFPNPDKFNPDRFLPENSQGRHQY

AYIPFSAGPRNCIGQKFGLLEEKAVAVAVLRKYRITSLDRREDLTLYGELVLKSKNGL

RISISQRQ*

 

AAGE02013627.1 exon 1, AAGE02013626.1 exons 2-3 use this seq (3 diffs)

12959  MLKEPLLLVITIASQLLHAVKEFPLPATVLLGVVIVVYLFAHADRDQLKSLLRINGAKDG

       SKKSVKFYLNQLPGPQCIPLLGNSLMMATDRE (1) 12684

13019  DMFNRLTTARKLYGRKQGICRIWNGRTPYVLISKAEPVERILSSSVNIEKGRDYGFLRPW  12840

12839  LGNGLLTCPGSRWYKRRKALNPTFNYKMLSDFLEVFNRQAQTMVRLMEKELNRENGFNCT  12660

12659  PYATLCSLDILCETAMGYPIQAQEQFGSDYVKAHEE (2) 12552

12391  IGRIMLERLQKIWLHPDFIYKRTNFYKRQSECLKILHGFSENVIKQRRLQRDASLAN  12221

12220  KHDEDPSIEIGRKRQLAFLDLLLEATQDGQPLSDRDIRDEVDTFILGGHDTTATAIGWLL  12041

12040  YLLGTDLQVQDRVFEEIDSIMGQDRDRPPTMIELNEMKYLECCIKEALRLFPSIPLIARK  11861

11860  LTESVNVGDYTIPAGTNAVIVVYQLHRDTQVFPNPDKFNPDRFLPENSQGRHQYAYIPFS  11681

11680  AGPRNCIGQKFGLLEEKAVAVAVLRKYRITSLDRREDLTLYGELVLKSKNGLRISISQRQ*  11498

 

>AAGE01044016.1 AAGE01004063 47% to 4AR1, 614744667 579602080 complete

probably same gene as 4T1.6 (3 diffs), 4I1.3 (2 diffs)

MIAIIAFTAIFVLFVYVWQWRRRLSRPFRTVPGPPGLPLIGNCHQFIGKSSTNIFHMLI

ELERLYGSVFKVDVATGIWLFYMSPGDIERIMTGP

EFNCKSDDYDMLLEWLGTGLLISNGNKWFTHRKALTPAFHFKILDNFVQVFDEKSTILAR

KFLSYSGKVVGIFPLVKLCTLDVIVETAMGTESNAQTEESGYTMAVEDISEIVFWRMFNN

VYNTEFMFKLSNKYGTYKKCLETIREFTLSIIEKRRSTLNVFDKNGGTSEVCNDSTGLKK

KMALLDILLQTEIDGRPLTNEEVREEVDTFMFA (0)

GHDTTASAITFLLYAMAKYPDVQQKVYEEAVSVLGDSIDTPITL

SALNDLKYLDLVIKESLRMFPPVPYISRSTIK

EVELSGCTIPTGTNITVGIFNMHHNPKYFPDPEEFIPERFEVERGVEKQHPYAYVPFSAG

GRNCIGQKFAQYEIKSTISKVIRLCRIELI

RPNYEPPLKAEMILKPQDEMPLRFFPR*

 

>AAGE01046474.1 possible 4K2 like N-term joins with AAGE01021812

2979 MLIVVLLPLVITLCLVFAFVHRKLLQFPNLAGPPEWPIAGSATEIVNLSSI (1) 2827

2664 EIFKLLRRYAQQYGTAYKLSFWYQYTLVFAKPDIAE (0) 2557

supercontig 1.283 1377209 EIFKLLRRYAQQYGTAYKLSFWYQYTLVFAKPDIAE 1377316

 

AAGE01021812 52% to AAGE01044016 54% to CYP4K2

supercontig 1.283 1387420 KILNTQSYASKSEDYDKVAEWIGYGL 1387497

     KILNTQSYASKSEDYDKVAEWIGYGL 4389

4388 LISKGEKWFKRRKVLTPGFHFKILESFVRVFNEKSDVLCRKLASYGGSEVDVFPTLKLYT 4209

4208 LDVLCETALGYSCNAQTEDSFYPAAVEELMSILYWRFFNLFASVDTLFRFTKQYRRFHKL 4029

4028 IGDTREFTLKIIEEKRKLLNELHDEGAVNEEDDEGKKKMALLDLLLRATVDGKPLSD 3858

3857 DDIREEVDTFTFA 3819 (0)

3755 GHDTTASALTFLLFNIAKYSDVQQKLFEEISSVVGSTSELSLH (2) 3645

3583 TLNDLRYLDLVIKESLRLYPSVPMIARIATENTKLDDMPIPKCTCVSVDIFQMH 3404

3403 RDPDRFEDPESFIPERFDAIRDGGKHNAFTYIPFSAGNRNCI 3269 (1)

3219 GQKFAQYELKIAVVKLIQTFRLELPSPDIEPILKAEIVLKPAEKLPIRFITRTTK* 3049

 

AAGE02013268.1 use this seq

239125  MLIVVLLPLVITLCLVFAFVHRKLLQFPNLAGPPEWPIAGSATEIVNLSSI (1)  239274

239440  EIFKLLRRYAQQYGTAYKLSFWYQYTLVFAKPDIAE (0) 239547

249651  KILNTQSYASKSEDYDKVAEWIGYGLLISKGEKWFKRRKVLTPGFHFKILESFVRVFNE  249827

249828  KSDVLCRKLASYGGSEVDVFPTLKLYTLDVLCETALGYSCNAQTEDSFYPAAVEELMSIL 250007

250008  YWRFFNLFASVDTLFRFTKQYRRFHKLIGDTREFTLKIIEEKRKLLNELHDEGAVNEEDD  250187

250188  EGKKKMALLDLLLRATVDGKPLSDDDIREEVDTFTFA (0)

250362  GHDTTASALTFLLFNIAKYSDVQQKLFEEISSVVGSTSELSLH (2) 250490

250552  TLNDLRYLDLVIKESLRLYPSVPMIARIATENTKLDDMPIPKCTCVSVDIF  250704

250705  QMHRDPDRFEDPESFIPERFDAIRDGGKHNAFTYIPFSAGNRNCI (1) 250839

250901  GQKFAQYELKIAVVKLIQTFRLELPSPDIEPILKAEIVLKPAEKLPIRFITRTTK* 251068

 

>CYP4D23 AAGE01000026.1 476394815 TC65595 TC24018 TC42055 74% to 4D22

only 51% to 4D17, probable ortholog of 4D22 complete

4T2.8 (v1 2 diffs), 4T1.3 (v2 1 diff), 4T1.1 (v3 1 diff)

AAGE01263405.1 probable exon 1 of 4D22 ortholog

566  MSILDWILVITGAVLAINYLLVRRNLKYQSQWPGPAAVPLIGCYYLYFNKKPE (0)

5888 DVMDFIFTLSRKYGTMFRVWVGTRLALFCTNTPDTETVLSSQKLIRKSELYKFLVPWLGN 6067

6068 GLLLSTDQKWFNKRKIITPAFHFKILEQFIEVFDRQSGILVQKLKPEASGKLVNVYPYVT 6247

6248 LCALDVIC 6271 (1)

6334 ETAMGTPINAQTDVDSKYVRAVTELSYLLTTRFVKVWQRSDFLFNLSPDRKRQDKV 6501

6502 IKVLHDFTTNIIQKRRKELMDHGDSGISGDDSIGSKKKMAFLDVLLQASVDGKPL 6666

6667 TDKEIQEEVDTFMFEGHDTTTIAIAFTLLLLARHPEVQEKVYKEVTEIIGTDLSIPATYR 6846

6847 NLQDMKYLEMVIKESLRLYPPVPIIGRKFTEKTTIGGNVIPEDSNFNLGIIVMHRDPKLF 7026

7027 DDPEKFDPERFSPERTMEQSSPYAYIPFSAGPRNCI (1)

7202 GQKFAMLELKSTLSKVIRNYRLTEAGPEPQLIIQLTLKPKDGLKIAFVPRA* 7357

 

>CYP4D24 AAGE01006231.1 4T2.6 (100%) complete 494125342 62% to 4D16

AAGE01082298 bridged by 825775921 to AAGE01006231.1

1254 MLILLASVVVLSIGLAVYFYQQFANRLHYAAKIGGPKGYPLLGNSIQYGTKSPVEFLQE 1078

1077 VQKTNEQCGKFYRLWIGPDLIFPITDAKL 991 (0)

917 AILSSQKLLDKSVQYDFIRPWLGNGLLTSTGRKWHSRRKIITPTFHFKILEQFVEIFD 744

743 QQSNIFVGQLKSKAQSGEDFDVFPVVTLCALDVIC 639

4385 ESAMGTKVNAQLNSDSKYVRAVKD (2) 4456

MATVAMARSFKAFARFNFTFYFTPYRRMQDKALKVLHDYTDSVIRSRRLEL

AKGAFTKSDENENDVGIRKKVAFLDMLLQATVDGRPLDDLEVREEV

4812  DTFMFEGHDTTTSAISFLIGILAKHPDVQQKVYDEVRNVIGDDLNVSVTLSMLNQLNYLD  4991

LVIKETLRLYPSVPIYGRMLLENQEI (1)

5134  NGTVFPAGSNLAIFPYFMGRDPEYFENPLEFRPERFAVETSAEKANPYRYVPFSAGPRNCIGQKFA  5331

VAEIKSLISKLVRHYEVLPPKQPNSERMIAELVLRPEGGVPVRIRSRVR*

 

>AAGE01055570 54% to 4D16 TC58022 AAGE01032454.1 512569922 complete

mate pair = 514720868 = C-term

walked upstream to 822913819 mate pair = 822913819 = C-term

walked up to 803280909 mate pair 808283299 matches mid region

walked to 519825648 mate pair 520507511 matches C-term

walked to 826165713 and 825253376 mate pair = 825244224 matches exons 2,3

walked to 528823040 and 572484877 mate pair = 572478448 matches exons 3,4

walked to 585907890 walked to 812022667 and 749632380 mate pair = 749635932

matches exon 2, walked to 578582171 (possible repeat region)

walked to 580094767 mate pair = 585907890 above so got past the repeat

also found 759050174 mate pair = 759046608.  This mate pair has an N-term seq

which is almost identical to AAGE01124480

another hit that matches 759046608 exactly = 824317331 so the seq is confirmed

     MWFLLSLVAAACLAWAIYRKFARTLEISGQHTGPPALPILGNGLWFLNKQPD (1)

     EFLPIIQRLTDEYGDVFRFWQGPEFTLYVGRPSMIE (0)

     TLLTDKNLTDKS 392

 393 GEYGYLSNWLGDGLLLSKRNKWHARRKAITPAFHFKILEQFVDVFDRNASELVDVLGKHA 572

 573 DSGEVFDIFPHVLLYALDVIC (1) 635

 698 ESAMGTSVNALRNADSEYVRAVKEAANVSIKRMFDFIRRTPLFYLTPSYQQLR-KSLK 868

 869 VLHGYTDNVITSRRKQLSNSSNKNHKDSDDFGFRRKEAFLDMLLKTNINGKPLTDLEI 1042

1043 REEVDTFMFEGHDTTTSAVVFTLLNLAKHPAIQQKVYDEIESVIGNDLQKPIELSDLHDL 1222

1223 SYLEMVIKETLRLYPSVPLIGRRCVEETTIEGKTIPAGANIIVGVFFMGRDPNYFEKPLD 1402

1403 FIPERFSGEKSVEKFNPYKYIPFSAGPRNCI 1501

     GQKFALNEMKSVISKLLRHYEFILPAGSPAEPLLASELILKPHHGVPLQIRRRGH*

 

>516274867 broken CYP4D exon 1 probable pseudogene

AGQSALPILGNVLRFLNYLPD (1)

AGCTGGCCAATCGGCCTTACCAATCCTGGGGAATGTACTGAGGTTTCTCAACTACTTGCCCGATGGT
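The nucleotide line appears to encode the protein line above it in the second forward frame, starting at the second base. The sketch below translates all three forward frames with the standard genetic code. In frame 2, the trailing `G` comes from the final `GGT`. Given the phase-1 mark, those three bases presumably hold the first base of the split codon followed by the intron's GT.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(dna: str, offset: int) -> str:
    """Translate complete codons starting at offset (0, 1 or 2); unknown codons -> 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


EXON = "AGCTGGCCAATCGGCCTTACCAATCCTGGGGAATGTACTGAGGTTTCTCAACTACTTGCCCGATGGT"
for offset in range(3):
    print(f"frame {offset + 1}:", translate(EXON, offset))
# frame 2 is AGQSALPILGNVLRFLNYLPDG
```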

 

>AAGE01019344.1 AGE01032320 54% to 4D17 complete

AAGE01178606.1 exon 2 of 4D like seq, 45% to 4D17 but only 35% to 4D15

TC64783 Length = 903 83% to AAGE01055570

793219534 630759272 note that in 630759272 exons 2 and 3 are on the minus strand and exon 4 is on the plus strand, but the order is correct on 587120742

520001141 512663558,

AAGE01124480.1 516274910 = exon 1 most like 4D17

744614807 568757642 576385708 are exact matches, so this seq is really different from the seq of AAGE01055570

this is probably the N-term exon of seq AAGE01019344

904  MWIYLSLLTVGFVAVVIYRKFARTLEVAQQYAGPPALPILGNGLWFLNKQPD (1)  1059

1674 EFLPIIHKLTSTYGDVVRFWQGPQFTLYVGNPSMIE (0)  1781

19   ILTNKHLTDKSGEYDYLSNWLGDGLLLSKRHKWHARRKAITPAFHFKILEQFVDVFDRNAAELV 198

199  DVLEKHADDGKTFDMFPYVLLYALDVIC (1)

332  ESAMGTSVNALRNADSEYVRAVKEAAHVSIKRMFDIIRRTSLFYLTPSYQKLRKALK 511

512  VLHGYTDNVIVSRRNQLMSKTDSGGVSDEFGAKKKDAFLDMLLRTSINGKPLTNLEIRE 688

689  EVDTFMFEGHDTTTSAVVFTLFNLAKHPEIQQKVYDEIVSVIGKDPKEKIELSHLHDLSY 868

869  TEMAIKETLRLFPSVPLIGRRCVEEITIEGKT

     IPAGANIIVGIYFMGRDPKYFENPSHFIPERFEGEFSVEKFNPYKYIPFSAGPRNCI (1)

     GQKFALNEMKSVISKLLRHYEFILPPDSVEEPPLASELILKPHRGVPLQIRHRALN*
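The notes for this entry say that trace 630759272 carries exons 2 and 3 on the minus strand and exon 4 on the plus strand. To read such pieces in one orientation, the minus-strand parts must be reverse-complemented. Below is a minimal sketch. It handles only A, C, G, T and N; other IUPAC ambiguity codes are assumed absent.

```python
# Complement table for A/C/G/T/N in both cases; other IUPAC codes are not handled.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna: str) -> str:
    """Return the same region as read 5' to 3' on the opposite strand."""
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGGCT"))  # AGCCAT
```

Applying the function twice returns the original sequence. This gives a quick sanity check before translating the reoriented pieces.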

 

>AY431801 64% to 4D24 AAGE01115931.1 AAGE01014858.1 AAGE01023514.1 complete

AY433130 change 1 aa use AAGE02013268.1

MFLLVTVFFAVVSLAVFVYQKFANQLYYGAKIGGPKCYPLVGNAFRFINKSPP ()

DFFLTIERTVREAGKCFRLWLGPELLIIVTDAKVAE ()

GVLSSPKFIEKSGEYNFIRPWLGDGLLTSSYRKWHSHRKIIT

PTFHFKILEQFVEIFDSQSNILIDKLTPFMESGETFDVFPLVTLCALDVIC (1)

ESAMGTKVNAQIHSDSEYVQAVKE 2240 (2)

2179 ITTIIHIRTYDVLARYDFLFNLSSYRKRQDKVLEVLHGYTNSVIRSRRRELSDAKEANPD 2000

1999 NNATSELGIRRKVAFLDMLLQATVDGRPLTDVEIREEVDTFMFEGHDTTTSAISFLLYRL 1820

1819 AKHPEVQHKVYDEIKAVIGEGMTGPVTLSMLNELHYLELVIKETLRLYPSVPFYGRKVLENSEI (1) 1628

1567 EGTTFPAGSNLILMPMFMGRDPEYFDDPLEFRPERFEKEISAEKVNPYRYIPFSAGPRNCI 1385

1384 GQKFAMAELKSVASKVLRHFEVLPPEGGQEESFIGEMILRPTYGVLLRLKKRQ* 1229

 

AAGE02013268.1

212620  MFLLVTVFFAVVSLAVFVYQKFANQLYYGAKIGGPKCYPLVGNAFRFINKSPP () 212462

202379  DFFLTIERTVREAGKCFRLWLGPELLIIVTDAKVAE (0) 202272

189962  GVLSSPKFIEKSGEYNFIRPWLGDGLLTSSYRKWHSHRKIITPTFHFKILEQFVEIFDSQ  189783

189782  SNILIDKLTPFMESGETFDVFPLVTLCALDVIC (1) 189684

189644  ESAMGTKVNAQIHSDSEYVQAVKE (2) 189553

189492  ITTIIHIRTYDVLARYDFLFNLSSYRKRQDKVLEVLHGYTNSVIRSRRRELSDAKEANPDNNA  189304

189303  TSELGIRRKVAFLDMLLQATVDGRPLTDVEIREEVDTFMFEGHDTTTSAISFLLYRLAKH  189124

189123  PEVQHKVYDEIKAVIGEGMTGPVTLSMLNELHYLELVIKETLRLYPSVPFYGRKVLENSEI (1) 188941

188880  EGTTFPAGSNLILMPMFMGRDPEYFDDPLEFRPERFEKE  188764

188763  ISAEKVNPYRYIPFSAGPRNCIGQKFAMAELKSVASKVLRHFEVLPPEGGQEESFIGEMI  188584

188583  LRPTYGVLLRLKKRQ*  188536

 

>CYP4H28 4T2.2 (2 diffs) complete

AAGE01082714.1 AAGE01027375.1 AY432644 55% to 4H18

MLAILVSLATVAFLWLVYQRRMARAAKIAAYFPHPKPVLPLLGNSLMFANKDAPAIFHTVLDLHKQCG  1218

QNLVTYGLFGDVQLHISSPKAIERVLLSKVTKKNYIYEYLEPWLGTGLLLSFGEKWFQRR  1038

KIITPTFHFKILEQFLEVFNAETDRLVTKIEQHVGGEEFDMYQYITLHALDSIC  876

2915 ETSMGVSINALDNPDNAYVHAIKDFGSIVIQRTFSALRSFPLLYFLHPFYWRQQKLIK 3094

3095 TMHNFTNSVIKAKRQALEEKRHTEGETKEHNEDDGIYGKKRMSFLDLLLNESSMSD 3262

3263 ADIREEVDTFMFEGHDTTTSGIYFSLMALAMHPDIQERLYGEIRQVLETEEERHAPLTNATLQQMKY  3463

3464  LDMVIKEVLRVYPSVPIIGRELLEDVEI (1) 3547

3604  NGCQVPRGTAMVVIIHNVHRNAEVFPDPERFDPERFSDESGGKRGPYDYIPFSVGARNCI  3783

GQKYALLEMKVTLVKLLLAYRFIPGKSTDSIRIQGDLVLRPFGNMALRIESR*

 

>4H29 4I1.8  512549996 AAGE01010708.1 784728638 complete

MVPLLMLISLLASALIWVLSALVKNLLVYRELQRKLPNFVSTPTVLLLGNTHLFKKDPTPPG

IFATFNQFHRTYGNDLIVQGLLNRPALQITSAPVVEQVLQARTIKKSIIYEFMRPWLNEG

LITSLGKKWAQRRKIITPAFHFKILEEFLAIFNERTEVFVDKIKDQVGKGDFNIYEHVTL

CTLDIISESAMGVKLNAQDDPNSSYVQAVKE (2)

MSEIIFQRLFGLLRMHKFFFQMSEAAQRQRAALKVLHK

FTDSVIFQRKDQLDDEQARQESKQKLEETDIYGKRKMTLLELLLNVSVEGHHLSNS

DIREEVDTFMFEGHDTTTSCISFSAYHIARHPEVQQKLYDEMVQVIGKDFKNAELSYSTL

QELKYLEMTIKEVLRIHPSVPIIGRKTTGDMRIDGETVPAGVDIAVLIYAMHNNP

EVFPEPEKFDPERFNEENSAKRHPYSYIPFSAGPRNCIGQKFALLEIKVTLVKLLGHYRL

LPCEPENEVKVKSDITLRPVNGTFVKIVPR

 

AAGE02013311.1 use this seq (3 aa diffs)

43167  MVPLLMLISLLASALIWVLSALVKNLLVYRELQRKLPNFVSTPTVLLLGNTHLFKKDPTP  42988

42987  PGIFATFNQFHRTYGNDLIVQGLLNRPALQITSAPVVEQVLQARTIKKSIIYEFMRPWLN  42808

42807  EGLITSLGKKWAQRRKIITPAFHFKILEEFLAIFNERTEVFVDKIKDQVGKGDFNIYEHV  42628

42627  TLCTLDIISESAMGVKLNAQDDPNSSYVQAVKE (2) 42529

37902  MSEIIFQRLFGLLRMHKFFFQMSEAAQRQRAALKVLHKFTDSVIFQRKDQLDDEQARQES  37723

37722  KQKLEETDIYGKRKMTLLELLLNVSVEGHHLSNSDIREEVDTFMFAGHDTTTSCISFSAY  37543

37542  HIARHPEVQQKLYDEMVQVIGKDFKNAELSYSTLQELKYLEMTIKEVLRIHPSVPIIGRK  37363

37362  TTGDMRIDGETVPAGVDIAVLIYAMHNNPEVFPEPEKFDPERFNEENSAKRHPYSYIPFS  37183

37182  AGPRNCVGQKYALLEIKVTLVKLLGHYRLLPCEPENEVKVKSDITLRPVNGTFVKIVPR  37006

 

>476148479 476152924 832469399 620727729 529569782 68% to 4H18 complete

AAGE01076911.1

MLLILTLIFATVGYALFNYHRQRQKLLNIRSHFDGPDSHYLWGTFPMFIGKTIP (1)

DIWDIITDLHKKHGEDIAIIAAFN

ELVMDLSSSKNVEKVLLAKSIKKSFAYDFLEPWLGTGLLISTGEKWFQRRKIITPTFHFS

MLEGFLEVFNKEANILVSKLKAKAGKDEFDIYDYVTLYALDSIC

ETSMGVQINAQDDPNNEYAVAVKQMSTFILRRVFSILRTFPSLFFLYPFAKEQKKVILKLH

NFTNSVIDARRAMLEKEKSNKNVTFDLQEENMY

TKRKMTFLDLLLNVTVNGKPLSREDIREEVDTFMFEGHDTTTSGISFTLWHLAKYQDV

QQKLFEEIDRVLGKDKVNAELTNLQIQELDYLDMVVKESLRLIPPVPIIGRTLVEDMEM (1)

NGVTIPAGTQISIKIYNIHRNPKIWEKSDEFIPERFSKTNESKRGPYDFIPFSAGSRNCIGQ

RYAMMELKVTIIKLIASFKVLPGDSMDKLRFKTDLVIRPDNGIPIKLVERI*

 

>AAGE01049176.1 67% to 4H14 complete

MLFLAIVVGALLYLVVNFYVTRKPLERMAVHFSGPKPHYLLGNVLEFLNKDLP (1)

GIFETMVGFHRKYGQDILTWNVLNLNMISVTSAENVEKVLMAKQTKKSFLYSFVEPWLGQGLL

ISSGEKWFQRRKIITPTFHFKILEQFVTVFNKETDTMVENLKKHVDGGEFDIYDYVTLMALDSIC

ETSMGTCVNAQKNPTNRYVQNVKRMSVLVLLRTISVLAGSPLLYDILHPHAWEQRKIIKQ

LHEFTISVIESRRRQLEADKLEQVDFDMNEESLYSKRKMTFLDLLLNVTVEGKPLTNADI

REEVDTFMFE (0)

GHDTTTSGISFAIYQLALNPQIQDKLYDEIVSILGKNSSNVELTF

QTLQDFRYLESVIKESMRLFPPVPFIGRTSVEDMEM (1)

NGTTVKAGQEFLVAIYVIHRNPKVYPDPERFDPERFSDTAESKRGPYDYIPFSAGSRNCI

GQRYAMLEMKVTLIKLLMNYKILPGESMGKVRVKSDLVLRPDRGIPVKLVARS*

 

>494155296  56% to AY205085 66% to 4H14 793189512 AAGE01213118.1

AAGE01473588.1 AAGE01538714.1 531423523 512616786 570666861 571502407

supercontig 1.85 Frame = - complete

2620039 MITLVLVAGVVLYFLRSFLQKRNKLLKIANHFGGPKPLPVIGNLLEFNTDIP (1) 2619884

2602651 GIVHLNHTYGPNLFVWGFLNENVLFLGDTKLVEKVLLAKQTQKSLLYSYLTCWLRTGLLLA 2602469

SGEKWFQRRKIITPTFHFKVLEQFVTVFNREAQTMVDVMRKHVGGKEFDVYSYVTLMALDSVC

ETSMGTSVNAQKDPDNRYVRNVKR (2?)

MSVLFLLRVIHPLATHPELYSLIHPNAYEQRKIVRELHEFT

DNVIATRRKQLKSDQMVDINRNVEDRYSKQKMTFLDLLLNVNIDGKPLTDLD

IREEVDTFMFE (0)

GHDTTTSGISFTIYQLALNPHVQDKIYEEIVAILGKNHKTVELTYQSLQEFKYLEMAIK

EGLRLFPSVPFIGRNLVEDLEF (1)

DDITLPAGQDILIPIYMIHRNPEIYPDPERYDPERFSDGTESKRGPYDYIPF

SAGTRNCIGQRFAMLEMKAALIKLIGNYRILPGESLKKLRIMTDLVVRPEKGVPIRLEERV*

   

AAGE02005220.1 Length=120659 USE THIS SEQ

89011  MITLVLVAGVVLYFLRSFLQKRNKLLKIANHFGGPKPLPVIGNLLEFNTDIP (1)  88856

71635  GIFEKIVHLNHTYGPNLFVWGFLNENVLFLGDTKLVEKVLLAKQTQKSLLYSYLTCWLRTGLLLAS  71438

71437  GEKWFQRRKIITPTFHFKVLEQFVTVFNREAQTMVDVMRKHVGGKEFDVYSYVTLMALDSVC (1) 71252

71190  ETSMGTSVNAQKDPDNRYVRNVKR (2) 71122

71063  MSVLFLLRVIHPLATHPELYSLIHPNAYEQRKIVRELHEFTDNVIATRRKQLKSGQM  70893

70892  LDINRNVEDRYSKQKMTFLDLLLNVNIDGKPLTDLDIREEVDTFMFE  (0) 70752

70216  GHDTTTSGISFTIYQLALNPHVQDKIYEEIVAILGKNHKTVELTYQSLQEFKYLEMAI  70043

70042  KEGLRLFPSVPFIGRNLVEDLEF (1)

69911  DDITLPAGQDILIPIYMIHRNPEIYPDPERYDPERFSDGTESKRGPYDYIPFSAGTRNCI  69732

69731  GQRFAMLEMKAALIKLIGNYRILPGESLKKLRIMTDLVVRPEKGVPIRLEERV*  69570

 

>TC65985 TC16577 TC24796 TC37697 57% to CYP4H14

AAGE01321728.1 AY205085 AAGE01106416.1 complete

MFNFAVFLVILVVGLARFCINRSKLQQLAKHFPGPKPALLVGNLLQFPADIGGIFRRMVYY

HEKFGPDIVTWGIGNTLKFNVSSTRNVEKVLMAKTVQKSLSYSFIEPWLGKGLLTSTGRK

WFQRRKIITPTFHFTILEGFAEVFNRNADTLIDKLKVHEGGSEFDVYRYVSLYALDSICE

TAMGVQVHAQDDPENQYVRDVNRLSELFLLRIFSFLGMFPTLYWYLHPNAWEQRKLIRTL

HQFTDNVIWKRREQLMNGPRNDEMDNTTLSKKKQTFLDLLLCMSVESQPLSNEDIREEVD

TFMFGGHDTTSSAISFTIMQLALHQDIQDKLYAEIVSILKGQNLKTTHLTFNNIQDFKYL

DLIVKESLRLLPPISYVGRKLTEDTELNGATIPAGQDIFIPIYMVHRNPKIYPDPERFI

PERFAENAENLRGPYDYIPFSIGSRNCIGQKYGMMQLKMTVVRLIANFRVLPSEATASVK

LRTDLVLRPEYGIPIKIEARN*

 

AAGE02013268.1 use this seq (3 aa diffs) no introns

13367  MFNFAVFLVILVVGLARFCINRSKLQQLAKHFPGPKPALLVGNLLQFPADIGGIFRRMVY  13188

13187  YHEKFGPDIVTWGIGNTLKFNVSSTRNVEKVLMAKTVQKSLSYSFIEPWLGKGLLTSTGR  13008

13007  KWFQRRKIITPTFHFTILEGFAEVFNRNADTLIDKLKVHEGGSEFDVYRYVSLYALDSIC  12828

12827  ETAMGVQVHAQDDPENQYVRDVNRLSELFLLRIFSFLGMFPTLYWYLHPNAWEQRKLIRT  12648

12647  LHQFTDNVIWKRREQLMNGPRNDEMDNTTSSKKKQTFLDLLLCMSVEGQSLSNEDIREEV  12468

12467  DTFMFGGHDTTSSAISFTIMQLALHQDIQDKLYAEIVSILKGQNLKTTHLTFNNIQDFKY  12288

12287  LDLIVKESLRLLPPISYVGRKLTEDTELNGATIPAGQDIFIPIYMVHRNPKIYPDPERFI  12108

12107  PERFAENAENLRGPYDYIPFSIGSRNCIGQKYGMMQLKMTVVRLIANFRVLPSEATASVK  11928

11927  LRTDLVLRPEYGIPIKIEARN*  11862

 

>AY431450 65% to 4J10 AAGE01108571 514842991 complete

continues on AAGE01227281.1 AAGE01378346.1

MFSSVLSLVIITLIVLLAVYEWYLRQRDGYRAALQYPGGPML

PVLGNILEVLIKDTVQTFNYARSNALKYGRSYRQWIFG

NVILNVIRIREAEPILSSTKHTRKSILYRFLEPLMGDGLLCSKGSKWQARRKILTPAFHF

SILNDFLQVFQEEAEKLVGLLDSCADAEEEVVLQSIVTRFTLNTIC (1)

ETAMGVKLDTFIGADKYRSQVYDVGERIVHRTMTPWLYDDGVYNLFGYQKPLEDA

IEPIHDFTRSIIRQKREQLKQDSTMHIVDSDGI (2)

YGSKQRYAMLNTLLMAEENDAIDEEGIREEVDTFMFEGHDTTAAGLIFSILLLATEQEAQ

QRVYDELLKARSTKSESEAFTIADYNNLKYLDRFVKEALRLYPPVSFISRNLSGPLEV

DSTTFPHGTIAHIHIYDLHRDPEQFPDPERFDPDRFLPEVAA

KRNPYAYVPFSAGPRNCIGQKYALLEMKTVLCALLINYRILPVTTRQEVIFIADLVLRAK

TPIKVQFAKRKANATRS*

 

>AAGE02025842.1 first P450 on contig (of two)

6 aa diffs to AY431450 all in short interval

trace files 811916166, 586617316, 582273387 match this seq

580134265, 753220309 match AY431450

There may be two sequences.

Searched with the first 211 nucleotides to see whether there were two alternate matches affecting synonymous codons. All but one trace file matched this genomic seq.
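The check described above asks whether the alternate trace matches differ only at synonymous codons. It can be sketched as a codon-by-codon comparison of two in-frame nucleotide sequences of equal length. The example reads are invented for illustration and are not the trace files listed above.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def classify_differences(ref: str, alt: str, offset: int = 0):
    """Compare two equal-length, in-frame sequences codon by codon.

    Returns (codon number, ref codon, alt codon, kind) for each differing codon,
    where kind is 'synonymous', 'nonsynonymous', or 'unknown' (e.g. codons with N).
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to equal length")
    results = []
    for n, i in enumerate(range(offset, len(ref) - 2, 3), start=1):
        r, a = ref[i:i + 3].upper(), alt[i:i + 3].upper()
        if r == a:
            continue
        if r not in CODON_TABLE or a not in CODON_TABLE:
            kind = "unknown"
        elif CODON_TABLE[r] == CODON_TABLE[a]:
            kind = "synonymous"
        else:
            kind = "nonsynonymous"
        results.append((n, r, a, kind))
    return results


ref = "ATGGCTAAACTG"   # hypothetical read: M A K L
alt = "ATGGCCAGACTG"   # hypothetical read: M A R L
print(classify_differences(ref, alt))
```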

180450  MFSSVLSLVIITLIVLLAVYEWYLRQRDGYRAALQYPGGPMLPVLGNILEVLIKDTVQTFNYARSN 180647

180648  ALKYGRSYRQWIFGNVILNVIRIREAEPILSSTKHTRKSILYRFLEPLMGDGLLCSKGS  180824

180825  KWQARRKILTPAFHFSILNDFLQVFQEEAEKLVGLLDSCADAEEEVVLQSIVTRFTLNTIC (1) 181007

181067  ETAMGVKLDTFIGADKYRSQVYDVGERIVHRTMTPWLYDDGVYNLFGYQKPL  181222

181223  EDAIEPIHDFTRSIIRQKREELKQDSTMHIEDSGDI (2) 181330

181387  YESKQRYAMLNTLLMAEENDVIDEEGIREEVDTFMFEGHDTTAAGLIFSILLLATEQEAQQRV  181575

181576  YDELLKARSTKSESEAFTIADYNNLKYLDRFVKEALRLYPPVSFISRNLSGPLEV (1)  181740

181798  DSTTFPHGTIAHIHIYDLHRDPEQFPDPERFDPDRFLPEVAAKRN  181932

181933  PYAYVPFSAGPRNCIGQKYALLEMKTVLCALLINYRILPVTTRQEVIFIADLVLRAKTPI  182112

182113  KVQFAKRKANATRS*  182157

 

>AAGE01397643.1 84% TO 223407477 569795084

250bp downstream of AAGE01227281.1

join with AAGE01226366.1

MDFLMDWWFAVLIIVIVLLAWDAIDKSGRPYRAMNKFPGPRVFPLIGTLSEILFKDQGK

TFQLAREWPKRYGGSYRFWVNSTLYVLNVVRVREAEPILSSTKNIDKSRFYKFLHPFLG

LGLLNSTGPKWMHRRRILTPSFHFNILNGFHRTFVEECDQLLATIDEHVDKGVSTAL

 

>AAGE01226366.1 95% to AAGE01331087.1 10 aa diffs (allele?)

      YLNP

1440  KKRYAMLDSLLVAEQKQLIDEAGIREEVDTFAFEGHDTTAAALVFIFFTLAHESAVQDRI  1261

1260  YSEIRQVYNGKPQSDRVFTPQDYSEMKFLDRALKECLRLWPPVAFISRNISEDIVLEDGA  1081

1080  VIPAGCVANIHIFDLHRDPEQYPDPDRFDADRFLPEEVDRRNPYAYVPFSAGPRNCIGQK  901

900   YAMMELKVVIVNALLKFRVLPVTKLEDINFVADLVLRSTNPIEVRFERR*  754

 

AAGE02025842.1 ESTs DW194177.1 EB096538.1 use this seq

Second P450 on contig (of two)

182386  MDFLMDWWFAVLIIVIVLLAWDAIDKSGRPYRAMNKFPGPRVFPLIGTLSEILFKDQ (1) () 182556

182616  AKTFQLAREWPKRYGGSYRFWVNSTLYVLNVVRVREAEPILSSTKNIDKSRFYKFLHPFLG  182798

182799  LGLLNSTGPKWMHRRRILTPSFHFNILNGFHRTFVEECDQLLATIDEHVDKGVSTALQPV  182978

182979  MSKFTLNTIC (1)

        ETSMGVKLSTVSGADVYRTKLYEIGEALVHRLMRPWLLNDFLC

        RLTGYKAAFDKLLLPVHSFTTGIINKKREQFQASSEPLVELTEENI (2)

        YLNPKKRYAMLDSLLVAEQKQLIDEAGIREEVDTFAFEG  183506

183507  HDTTAAALVFIFFTLAHESAVQDRIYSEIRQVYNGKPQSDRVFTPQDYSEMKFLDRALKE  183686

183687  CLRLWPPVAFISRNISEDIVLEDGAVIPAGCVANIHIFDLHRDPEQYPDPDRFDADRFLP  183866

183867  EEVDRRNPYAYVPFSAGPRNCIGQKYAMMELKVVIVNALLKFRVLPVTKLEDINFVADLV  184046

184047  LRSTNPIEVRFERR  184088

 

>AAGE01331087.1 61% to 4J5 575366287 574128015

No ESTs for this seq. No exact match in WGS; 95% to AAGE02025842.1 second gene

      YLNP

1211  KKRYAMLDSLLVAEQKQLIDEAGIREEVDTFAFEGHDTTAAALVFIFFTLAHEPAVQDRI  1032

1031  YSEILQVYNGKPQSERAFTPQDYAEMKFLDRALKECLRLWPPVAFISRNISEDIVLDDGT  852

851   LIPAGCVANIHIFDLHRDPEQYPEPDRFDADRFLPEEVDRRNPYAYVPFSAGPRNCIGQK  672

671   YAMMELKVVVVNALLKFRVLPVTKLEDINFVADLVLRSTNPIEVRFERR*  525

 

>AAGE01288441.1 97% to AAGE01216085.1 10 aa diffs

note on finding the C-term.  This seq is 74% identical to 4J5

at the C-term.  This 4J5 seq continues as

GQKYALLEVKTAVAYLVLRYRILPATKREEIRFIADLVLRSATPLKVRFERRQNA*

which is 59% to AAGE01331087, so this is a good model

The N-term part of this seq has only one seq in the trace files, so it may be a poor version of the AAGE01216085.1 seq.

1368  IVIRGSFVINAIRARETEALLSSTKLIDKSILYTFLYPFMGKGLLTSTGPKWFHRRKILTAAFHFNI  1189

1188  LPKFLVTFQEECDKLLRKLDADVKADNTTTLQSVAARFTLNTIC  1057

997   ETAMGVKLDSMSMADEYRAKIQEVIKLLLLRVMNPWLVEEFPYRLLGFRRRLMKVL  830

829   KPIHAFTRSIIKQRRDLFHANVKNVDDFSEENIYVNTNQRYALLDTLLASEAKNQIDEEG  650

649   IREEVDTFMFEGHDTTASAFTFIFLVIANHQEAQRQLVEEIEAMIAGRIKPTEPLSMHDY  470

469   SELKFMDRVIKECLRLYPPVPFISRAILEDALLGDRFIPKDSMANLHIFDLHRDPDQFPD  290

289   PERFDPDRFLPANVEKRNPYAYVPFSAGPRNCI  191

 

>AAGE01216085.1 61% to 4J9 578920794 complete

TC57837 Length = 832 100% to AAGE01216085 extends the end

519943525 826152951 513457906 new C-term for 4J seq

attempted walking to join with N-term part. Ran into a gap

613942247 589588262

     MDWLTIVLLLILALLALYEVHLRLLLSNRAAKQFPGPRR

4    LPVLGNALALLFNDQVSTFKLPRRWAQRYKESYRLVIRGGFVINAIRARETEALLSSTKL  183

184  IDKSILYTFLYPFMGKGLLTSTGPKWFHRRKILTAAFHFNILPKFLVTFQEECDKLLRKL  363

364  DADVKAGNTTTLQSVAARFTLNTIC  438 (1)

498  ETAMGVKLDSMSMADEYRAKIQEVIKLLLLRVMNPWLVEEFPYRLLGFRRRLMKVL  665

666  KPIHAFTRSIIKQRRDLFHANVKNVDDFSEENIYVNTNQRYALLDTLLASEAKNQIDEEG  845

846  IREEVDTFMFEGHDTTASAFTFIFLVIANHQEAQRQLVEEIETMIAGRSNPTEPLSMHDY  1025

1026 GELKFMDRVIKECLRLYPPVPFISRAVLEDAQLGDRFIPKDSMANVHIFDLHRDPEQFPD  1205

1206 PERFDPDRFLPENVEKRNPYAYVPFSAGPRNCI  1304

QRFAMLELKAILTAVLREFRVLPVTKREDVVFVADMVLRSRDPIVVKFERR* 677

 

>223407477 AABIG09TP.gz 223407646 AABIH08TP.gz 65% to 4J9

AAGE01099570.1 574077942

MDFLTNWWFGALVIVTVLLVRDAIDKSGRIYRAINKFAGPPCLPLIGTLCEILFMNQGK (0)

TYQWARKWPKRYGGSYRFWFSSTLYVLNVVRVREAEHILSSTRNI

DKSRFYKFLHPFLGLGLLNSNGPKWMHRRRILTPSFHFNILNGFHHTFVEECDQLLATID

EHVDKGVPTALQPVMSKFTLNTIC

 

Correct seq 88% to AAGE02025842.1 matches Nelson 223407477 on top

AAGE02025843.1

6807 MDFLTNWWFGALVIVTVLLVRDAIDKSGRIYRAINKFAGPPCLPLIGTLCEILFMNQ (1) 6637

6575 ATTYQWARKWPKRYGGSYRFWFSSTLYVLNVVRVREAEHILSSTRNIDKSRFYKFLHPFLGLGLL

     NSNGSKWMHRRRILTPSFHFNILNGFHHTFVEECDQLLATIDEHVDKGVPTALQPVMSKFTLNTIC (1) 6183

6127 ETSMGVKLSTVSGADVYRTKLYEIGEVLVHRLMRPWLLNDFLCRLTGYKAA

     FDKLLLPVHSFTTGIINMKRKQFQESLEPSVELTEENI (2) 5861

5801 YLNPKKRYAMLDSLLLAEQKQLIDEAGIREEVDTFAFEGHDTTAAALVFIFFTLAREPAVQDRI

     YREILQVYSNKPQSSRAFTPQDYSEMKFLDRALKECLRLWPPVTFISRSISEDIILDDGS

     LIPAGCVANIHIMDMHHDPEQFPDPERFDADRFLPEQVDRRNPYAYVPFSAGPRNCIGQK

     YAMMELKVVVVNALLKFRVLPVTKLEDINFVADLVLRSTNPIEVRFERR* 5100

 

>AAGE02030510.1 Length=13206 98% to AAGE02025843.1 6807-5100. 9 aa diffs

new seq

11199  MDFLTNWWFGALVIVIVLLVRDAIDKSGRIYRAINKFAGPPCLPLIGTLCEILFMNQ (1) 11029

10967  ATTYQWARKWPKRYGGSYRFWFSSTLYVLNVVRVREAEPILSSTRNIDKSRFYKFLH  10797

10796  PFLGLGLLNSNGPKWMHRRRILTPSFHFNILNGFHHTFVEECDQLLATIDEHVDKGVPTA  10617

10616  LQPVMSKFTLNTIC (1) 10575

10519  ETSMGVKLSTVSGADVYRTKLYEIGEVLVHRLMRPWLLNDFLCRLTGYKAAFDKLLLP  10346

10345  VHSFTTGIINMKRKQFQESLEPSVELTEENI (2) 10253

10193  YLNPKKRYAMLDSLLLAEQKQLIDEAGIREEVDTFAFEGHDTTAAALVFIFFTLAREPAV  10014

10013  QDRIYSEILQVYSNKLQSALAFTPQDYSEMKFLDRALKECLRLWPPVTFISRSISEDIIL  9834

9833   DDGSLIPAGCVANIHIMDLHHDPEQFPDPERFDADRFLPEQVDRRNPYAYVPFSAGPRNC  9654

9653   IGQKYAMMELKVVVVNALLKFKVLPVTKLEDINFVADLVLRSTNPIEVRFERR* 9492

 

These two seqs are very close, but the region QDRIYSEILQVYSNKLQSALAFTPQDY has 4 aa diffs. Trace files support both sequences:

588906795, 590281011 match this seq.

832533501, 832391117, 589181510, 579871626, 592076987  match the other seq AAGE02025843.1
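To reproduce the 4 aa count above, here is a minimal Python sketch that compares the two stretches position by position. It assumes the stretches are already aligned with no gaps, which holds here because both are 27 residues long. The AAGE02025843.1 stretch comes from its listing above, with QDRI at the end of one line joined to YREILQVYSNKPQSSRAFTPQDY on the next. The function names are illustrative only.

```python
def count_diffs(a, b):
    """Number of positions where two equal-length, ungapped peptides differ."""
    if len(a) != len(b):
        raise ValueError("peptides must have the same length (ungapped alignment)")
    return sum(1 for x, y in zip(a, b) if x != y)


def percent_identity(a, b):
    """Identical positions as a percentage of the aligned length."""
    return 100.0 * (len(a) - count_diffs(a, b)) / len(a)


# Region as quoted for AAGE02030510.1
seq_30510 = "QDRIYSEILQVYSNKLQSALAFTPQDY"
# Same region in AAGE02025843.1 (QDRI + YREILQVYSNKPQSSRAFTPQDY)
seq_25843 = "QDRIYREILQVYSNKPQSSRAFTPQDY"

print(count_diffs(seq_30510, seq_25843))                 # 4
print(round(percent_identity(seq_30510, seq_25843), 1))  # 85.2
```

This position-by-position comparison is only meaningful for short, ungapped stretches like this one. Whole-gene percentages elsewhere in this file come from BLAST alignments.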

 

>AAGE02030510.1 pseudogene like AY431450 new seq

12342  WRSWRKILTPASHFSIFSEFL*LLQKEVDKLVRLLE

       NGIDKYQWQV*DLNGKIGHSMMTPWFL  12007

       YDDGAYNLFGYQKSLEDAIEPIHDFTKN

11898  DEETIQEEVDNLMFEGYDTTAEGLIFSILLLATEQEAQQRV*NELLEDLS  11749

       TNLESESFTVASYKNFNY

 

>AAGE01099570.1 4J like pseudogene

396 RIDHSIMTQLLYDDGVYNLFKYRKSL*DAIEPIHDFIRSIFLQNCVQLNQDSMMYSEEVK 575

576 QTFASTV*VKLSRISRYGLKPTYIMMNILLTAEK 677

676 NDGTAEETILEEVDNTMFEGCDTTAAGLIFSILLLATEQEPQQRV*DKL*EDCSSKS 847

848 ESETFTWMSYNNLKYRFLK 904

GPEYALLEMITIICILLISYRAIXXXXIFIADQILQTKPTAKVDYARRKANAMRN*

 

>AAGE01003123.1 C-term of 4J like seq 90% to AAGE01331087, pseudogene

PRGCVANIHIMDMHHDPEQFPDPDRFNADRFLPEEVERRNPYAYVPFSAGPRNCI

2873  GQKYAMMEL*VVVVNALLKFRVLPVTKLKDINFVADLVLRSTNPIEVRFERR  2718

 

>476375054 Pseudogene 61% to 4J5, 4 aa diffs to AAGE01003123, stop codon in same place

666 PRGCVANIHIMDMHHDPEQFPDPDRFNADRFLPEEVERRNPYAYVPFSAGTRICIGQKYA 845

846 MMEL*VVVVNALLKFGILPVTK 911
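The "stop codon in same place" check can be sketched the same way. The sketch finds the internal stops (any * that is not a final terminator) in each fragment, then counts mismatches over the overlap. It assumes both fragments start at the same residue (both begin PRGCVAN...), so no alignment step is needed. The helper name is hypothetical.

```python
def internal_stops(protein):
    """0-based positions of '*' other than a single terminating stop at the end."""
    body = protein[:-1] if protein.endswith("*") else protein
    return [i for i, aa in enumerate(body) if aa == "*"]


# AAGE01003123 C-term (its two listing lines joined)
seq_3123 = ("PRGCVANIHIMDMHHDPEQFPDPDRFNADRFLPEEVERRNPYAYVPFSAGPRNCI"
            "GQKYAMMEL*VVVVNALLKFRVLPVTKLKDINFVADLVLRSTNPIEVRFERR")
# Trace 476375054 (its two listing lines joined); this read is shorter
seq_trace = ("PRGCVANIHIMDMHHDPEQFPDPDRFNADRFLPEEVERRNPYAYVPFSAGTRICIGQKYA"
             "MMEL*VVVVNALLKFGILPVTK")

print(internal_stops(seq_3123), internal_stops(seq_trace))  # [64] [64]

# Mismatches over the overlapping stretch only
overlap = min(len(seq_3123), len(seq_trace))
diffs = [i for i in range(overlap) if seq_3123[i] != seq_trace[i]]
print(len(diffs))  # 4
```

A premature stop at the same position in two independent sequences suggests a shared defect rather than a single sequencing error.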

 

>AAGE01584611.1 89% to AAGE01005255.1 matches 574201551, 520163843

Note: the mate pair of 520163843 is 520524408, which has a C-term like AAGE01005255 but not identical. These two genes are linked: AAGE01584611 is upstream of 520524408 on the same strand.

12   AELYKSNIREVGKIIQQRIMNPLLFEDWIYKITGYQAEFDKILSPIHSFTNNIIRQRRET  191

192  FHATMRNVDSPSEENTYTNIKQRYAMLDSLLLAEAKHQIDAEGIREEVDTFTFEGHDT  365

366  IGSAFVFTFLLIAHDQLVQQSLYEEIQRMFNLQPIPTLQNYNDLKYMDRVIKESLRI  536

537  YPPVPFISRLITEDVQYDGKLVPRGTLMNVGIYDLHRDPEQFPDPLRFDPDRFLPEQVQR  716

717  RSPYAYIPFSAGPRNCI  767

 

>520524408 593182570 813103660 574115512 593092990 825253407

579726574 exact match to AAGE01484914.1

96% to AAGE01005255, but not the same gene

This seq is identical to TC57838

TC57838 Length = 974; 9 aa diffs to AAGE01005255; complete

593092990 and 593182570, 825253407 579726574

Another set of WGS seqs are an exact match to AAGE01005255

So there are two very similar gene sequences that are 95% identical.

    MYVFTTVAGLLVFIFILYKIYLRSLPSYRAAKYFPGYPVYPIVQNLFT

    ALFKSQTGAFQQARQWARIFNNRTYRVLIQGVLYVQIIHHKDVEMLLSSSRLITKSPLYK 779

    LIVPFIGNGLLNSTGEKWHQRRKILTPTYHFNILQGFLQIFHEECRKLVNQLDKDAAQGI 599

    TTTLQPLSTQVTLNTIC (1)

    ETAMRLKLDTSETAEVYKSNIREVGKVIQQRIMNPLLFEDWIYKITGYQ

  1 AKFDKILRPIHAFTNSIIRQRRETFHETMKNVDSPSEENIYTNIKQRYAMLDSLLLAEA 177

178 KQQIDGEGIREEVDTFTFEGHDTTGSAFVFTFLLIAHEQLVQQRLFEEIERMFNLQPNP 354

355 TQQDYNDLKYMDRVIKESLRIYPPVPFISRLITEDVQYDGKLVPRGTIMNIEIYDLHRDP 534

535 EQFPDPERFDPDRFLPEEVQRRSPYAYVPFSAGPRNCI

GQRFAMLELKAILIGVLREFRVLPVTKREDVVFVGDMVLRSRDPIVVKFERR* 807

 

AAGE02035951.1 is missing the last exon; use this seq. 3 aa diffs to 520524408

2639  MYVFTTVAGLLVFIFILYKIYLRSLPSYRAAKYFPGYPVYPIVQNLFTALFKSQTGAFQQ  2460

2459  ARQWARIFNNRTYRLLIQGVLYVQIIHHKDVEMLLSSSRLITKSPLYKLIVPFIGNGLLN  2280

2279  STGEKWHQRRKILTPTFHFNILQGFLQIFHEECRKLVNQLDKDAAQGITTTLQPLSTQVT  2100

2099  LNTIC (1) 2085

2026  ETAMGLKLDTSETAEVYKSNIREVGKVIQQRIMNPLLFEDWIYKITGYQAKFD  1868

1867  KILRPIHAFTNSIIRQRRETFHETMKNVDSPSEENIYTNIKQRYAMLDSLLLAEAKQQID  1688

1687  GEGIREEVDTFTFEGHDTTGSAFVFTFLLIAHEQLVQQRLFEEIERMFNLQPNPTQQDYN  1508

1507  DLKYMDRVIKESLRIYPPVPFISRLITEDVQYDGKLVPRGTIMNIEIYDLHRDPEQFPDP  1328

1327  ERFDPDRFLPEEVQRRSPYAYVPFSAGPRNCI (1) 1232

      GQRFAMLELKAILIGVLREFRVLPVTKREDVVFVGDMVLRSRDPIVVKFERR.

 

>TC57838 TC48249 matches 593092990 and 593182570, 825253407 579726574

 cyan = corrections

GCAAAATTCGACAAGATTCTTCGTCCCATTCATGCATTCACCAACAGCATCATCCGACAGCGAAGGGAAACATTTCATGA

AACTATGAAAAACGTGGACTCCCCATCGGAGGAGAACATATACACCAACATAAAGCAGCGCTACGCCATGCTGGATAGTC

TTCTGCTGGCGGAAGCCAAACAGCAAATTGACGGCGAAGGGATCCGCGAGGAGGTTGACACGTTTACCTTTGAAGGCCAC

GATACAACTGGCAGTGCCTTCGTGTTCACCTTTCTGTTGATTGCTCACGAGCAACTCGTTCAGCAGCGTCTGTTCGAAGA

GATTGAACGCATGTTCAACCTCCAACCCAATCCAACCCAACAGGACTACAATGACTTGAAGTACATGGATCGGGTGATCA

AGGAATCGCTTCGAATCTATCCGCCGGTGCCATTCATCTCCCGATTGATTACCGAGGATGTACAATACGATGGGAAGTTG

GTACCGAGGGGTACCATCATGAACATCGAAATCTACGATTTGCACCGAGATCCGGAGCAGTTTCCCGATCCGGAACGATT

CGATCCGGATCGGTTTCTGCCGGAGGAGGTCCAGCGGAGGAGTCCGTACGCTTATGTTCCGTTCAGTGCTGGACCGAGGA

ATTGCATTGGTCAACGGTTCGCCATGCTGGAGCTGAAGGCCATCCTCATCGGGGTGCTCCGCGAGTTCCGAGTCCTTCCC

GTTACCAAGCGGGAGGATGTGGTTTTCGTTGGGGACATGGTCCTCCGCTCGAGAGACCCAATCGTGGTCAAATTCGAACG

ACGTTAAGCTTTTTCTTGCTTTTTATAGCGACCCGTTGACCCAGTGAATTCAAGGATTTTTCAGTTTTTTACGGACAAAG

AGCGCATCCTGAACTGCTACTAGGCTACCCAAAAGTCATATTCTTAAATTGTTAATCCTAACATACTGTGTGAATAAATG

TTTTTTTATCGATT

 

>AAGE01005255.1 55% to 4J9 TC57836 complete

5509 MYVFTTVAGLLVFIFILYEIYLRSLPSYRAAKYFPGYPVYPIVQNLFTALFKSQTGSFQQ  5330

5329 ARQWARIFNHRTYRLLIQGVLFVQIIHHKDVEMLLSSSRLITKSPLYKLIVPFIGKGLLN  5150

5149 STGEKWHQRRKILTPTFHFNILQGFLQIFHEECRKLVYQLDKDAAQGITTTLQPLSTQVTLNTIC  4955 (1)

4896 ETAMGLKLDTSETAEVYKSNIREVGKVIQQRIMNPLLFEDWIYKITGYQAKFDK 4735

4734 ILRPIHAFINSIIRQRRETFHETMKNVDTPSEENIYTNIKQRYAMLDSLLLAESKQQID 4558

4557 AEGIREEVDTFTFEGHDTTGSAFVFTFLLIAHEQLVQQRLFEEIERMFNLQPNPAL 4390

4389 QDYNDLKYMDRVIKESLRIYPPVPFISRLITEDVQYDGKFVPRGTIMNVEIYDLHRDPEQ 4210

4209 FPDPERFDPDRFLPEDVQRRSPYAYVPFSAGPRNCI 4096

     GQRFAMLELKAILTAVLREFRVLPVTKREDVVFVADMVLRSRDPIVVKFERR* 783

 

>AAGE01001298.1 AAGE01138953 gene a C-term complete

821735340 matches on the (-)

Therefore, the mate pair 821748090 is upstream by an unknown amount.

blast WGS with the mate pair to reach a contig upstream.

it matches AAGE01001298 from 587 to 1544, not useful

use the first part of AAGE01001298 to find a trace file that matches on the (-)

get the mate pair (upstream) and blast WGS. 529508154 matches on (-)

The mate pair is 529508153. This matches AAGE01043608.1 on the (-) from 1130-67; therefore AAGE01043608.1 points away from AAGE01001298.1. The N-term seq on AAGE01043608.1 seems to be the upstream part of the gene whose C-term seq is on AAGE01001298.1.

did a mate pair search with first 3000bp of AAGE01001298.1, no success
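The mate-pair reasoning in these walking notes can be written as coordinate arithmetic. A (-) match points to a mate upstream, and a (+) match points to a mate downstream. The sketch below assumes inward-facing clone-end reads and an approximate insert size, such as the ~4000 bp noted in other walks in this file. The function names and example coordinates are hypothetical and are not taken from a real trace.

```python
def insert_span(strand, five_prime_pos, insert_size):
    """(start, end) of the clone insert in 1-based contig coordinates.

    strand: '+' or '-', the strand the read matches on this contig
    five_prime_pos: contig coordinate of the read's 5' end
    insert_size: approximate clone insert length in bp
    """
    if strand == "-":
        # A (-) read points toward lower coordinates, so the insert and
        # the mate at its far end lie upstream of the read.
        return five_prime_pos - insert_size + 1, five_prime_pos
    if strand == "+":
        # A (+) read points toward higher coordinates: the mate lies downstream.
        return five_prime_pos, five_prime_pos + insert_size - 1
    raise ValueError("strand must be '+' or '-'")


def where_is_mate(strand, five_prime_pos, insert_size, contig_length):
    """Whether the mate should land on this contig or off one of its ends,
    and roughly how many bp past that end (0 if on the contig)."""
    start, end = insert_span(strand, five_prime_pos, insert_size)
    if start < 1:
        return "upstream", 1 - start
    if end > contig_length:
        return "downstream", end - contig_length
    return "on contig", 0


# Hypothetical read: matches on (-) with its 5' end at position 500 of a
# 3000 bp contig, ~4000 bp insert.
print(where_is_mate("-", 500, 4000, 3000))  # ('upstream', 3500)
```

Because insert sizes are only approximate, the distance is a rough guide to how far off the contig end the next contig should start. It is not an exact gap size.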

AAGE01043608 46% to 325C1 N-term

4061 MITVLLFLVLFVVLIVVKYVKYERSFSFAKNIPSVEPAYPIVGNALQFVGKNGEELFKKFADM 3873

3872 LNHPAKLFQIRMGVLRLFCTNDPDVAQKILTQCLEKPFLYDFFKLDYGLFSAH (1) 3708

3643 YDIWKNQRKSLNPTFNQKILNGFLPIFDQCAQNLVRRLQSCTDGDSVKITDCHLRCTLEML 3467

3466 CRTTFGVDINNNPNAFKLTALINE (2)

      IIQEVINRRNKEAPTLDNSDPECDGYRKPQIFIEQLLNQQENNNFTEIEIIHNVYTMIVA (0) 3244

3245  GSDTTGNQLGYISLMLAFFPELQEKVFREVMEVFPGEIEFTVDNLRQLEYTEMFI  3409

3410  KECLRLLPIGPHVMRFTTADTELEGVSIPKGNILAVSIFNMHRRKDIWGPNADQFDPENF  3589

3590  SAERSKGRHPFAYVPFSGGNRNCI (1)

      GSRYAMYSMKIVLVHLLRHFKIHTRRRFEDIRFEFEALLKMSIEPEVSLEKRVPVTIKRSN*  3877

 

AAGE02000570.1 EST DV312052.1; F = Y in 3 ESTs; use this seq

DV352514.1, DV365167.1 42% to 325C3 Anoph.

40534  MITVLLFLVLFVVFIVVKYVKYERSFSFAKNIPSVEPAYPIVGNALQFVGKNGEELFKKF  40355

40354  ADMLNHPAKLFQMRMGVLRLFCTNDPDVAQKILTQCLEKPFLYDFFKLDYGLFSAH  (1) 40187

40119  YIWKNQRKSLNPTFNQKILNGFLPIFDQCAQNLVKRLQSCTDGDSVKITDCHLRCTLEML  39940

39939  CRTTFGVDINNNPNAFKLTALINE  (2) 39868

39801  IFHLASKRILRVHLYGEKIYRLTSDYRKDVKLRKEAYYYADK  (0) 39676

27232  IIQEVINRRNKEAPTLDNSDPECDGYRKPQIFIEQLLNQQENNNFTEIEIIHNVYTMIVA  (0) 27053

26986  GSDTTGNQLGYISLMLAFFPELQEKVYREVMEVFPG  26879

26878  EIEFTVDNLRQLEYTEMFIKECLRLLPIGPHVMRFTTADTELEGVSIPKGNILAVSIFNM  26699

26698  HRRKDIWGPNADQFDPENFSAERSKGRHPFAYVPFSGGNRNCI (1) 26570

26509  GSRYAMYSMKIVLVHLLRHFKIHTRRRFEDIRFEFEALLKMSIEPEVSLEKRVPVTIKRSN*  26324

 

AAGE02000570.1 = AAGE01056055.1 N-term 61% to 325E1 57% to AAGE01193335.1

Length=117881

 

69592  MFGLTFALIVVYLLALYVYAKIKYRFANKIPSIEPMVPFFGNGLEFAQKNCYKIFVNLKR  69771

69772  IFENNKHHRLFKLCFGPIVVLCPTHPDLIQKVMTDTGSMEKPYVYEFLRVDLGLLSAK  69945

 

note next P450 is at 86kb but there is no P450 seq between 69945 and 86000

This is a pseudogene fragment

 

>AAGE01001298.1 gene b N-term this seq joins with AAGE01024167a complete

AAGE01024167a parts of 2 genes 3-365 = C-term, 2021-2299 = N-term

743263008, 634987900 631579630 591886482

591886482 (+) = mate pair of 591882912 (-), which matches the lower part of AAGE01024167 on the (-) strand, so the mate pair will be on the plus strand upstream of part b; therefore it must belong to gene a

528593770 (+) = mate pair of 521934839(-)

528593770 matches AAGE01001298.1 gene b exon 3

14162 MVLLLISFLIVLTLLKLVHRNNHRFAKDLPSVEPCYPLLGNALMFVGKSPEQKFENLARG  14338

14339 FLQNDRLFKLWFGPKLTLGTSHPELVQKIVNHPDCIERPLFFYKQLRMTQGLLVAR (1) 14512

14569 YGLWKQQRKALNSTFNLKILHSFIPIFEECSRKLVNRLQNHVGCSKPINLAQFVSQCTLEMV  14754

14755 CGTTLGMEHLQQESGSRFLHHIERVMDIMGERILSIPMQITALYFFTPMFWQEMHSLKMNRQYAAE (0)

      IIDEGRRKMKANEQSNTIDEDQDGYHKPQIFLDQILSANRAGKPFDDEEIQHNVRTMIAA (0) 15200

      GNDTSALAISHCCLWLAMYPEIQERVYCEIKEHFPYPDSEITPEGLKNLIYTEMCIKETLRLTGPAPNIARE

      TLADVELDGLIVPKGTTIILSLYALHRRQDVWGPQADRFDPDNFDEDKCRTRPAGVFIPFSTGPRDCI (1)

      GRYAMISMKIMIMYILRNFKLITQLKPEQLRYKFGPTLKLACDHMIQLEKRVD* 371

 

AAGE02000570.1 second of five P450 N-terms on this contig = AAGE01024167a

16107  MVLLLISFLIVLTLLKLVHRNNHRFAKDLPSVEPCYPLLGNALMFVGKSPEQKFENLARGFLQN 15916

15915  DRLFKLWFGPKLTLGTSHPELVQKIVNHPDCIERPLFFYKQLRMTQGLLVAR (1) 15760

15697  YGLWKQQRKALNSTFNLKILHSFIPIFEECSRKLVNRLQNHVGCSKPINLAQFVSQCTLEMVC  15509

15508  GTTLGMEHLQQESGSRFLHHIERVMDIMGERILSIPMQITALYFFTPMFWQEMHSLK  15338

15251  EGRRKMKANEQSNTIDEDQDGYHKPQIFLDQILSANRAGKPFDDEEIQHNVRTMIAA (0) 15081

8219   GNDTSALAISHCCLWLAMYPEIQERVYCEIKEHFPYPDSEITPEGLKNLIYTEMCIKETL  8040

8039   RLTGPAPNIARETLADVELDGLIVPKGTTIILSLYALHRRQDVWGPQADRFDPDNFDEDK  7860

7859   CRTRPAGVFIPFSTGPRDCI (1)

       GRYAMISMKIMIMYILRNFKLITQLKPEQLRYKFGPTLKLACDHMIQLEKRVD*  7581

 

>AAGE01024167b parts of 2 genes 3-365 = C-term,

1937-2299(+) = N-term complete

use bottom to find (+) strand matches

this region seems to be in a repeat but

611427889(+) is 100% match mate = 611438962

no good match in WGS

move back upstream on AAGE01024167 to avoid the repeat

575338571(+) is outside the repeat, mate = 575344991 insert = 4000bp

matches AAGE01121136.1 (+) 2309bp

use the top of AAGE01121136 to find (-) strand matches

743252107(-) mate = 743258565 insert size = 4000bp

matches AAGE01224146.1 (+) 1587 bp

this seq has a P450 on it (join)

AAGE01224146 42% to 325C3 TC64516 no ESTs at TIGR for upstream part

578075956, tried mate pair search for this seq

68% to AAGE01075759

MDLFLLLLTGPLAIVFLIFLYVRVL

QYINRFANSVPFGGMSRYPLFINDWKLLRASPVQKFEILAETFAQHDRLFRVWFGPRMAF

ATCHPDVIQAILTHPECVDKPFFYRFARLDHGLLVGR (1)

1587 GHLWRRQRKQLNPTFNLRILTSFLPIFEKCCQQMVNCLEPFANGDRIDILQHTTRCTLNMIL 1408

1407 QTSLDTDSLSNEESASLVKHIKR (2) 1342

1282 FFFISTNRVLNLHHYWEPVYRLTKNFAMESESYGVILGATRK (0) 1196

 721 ILNIKKNEMKDKPLNENDLEYKKPRIYMDQLLKLSDTMSDKEIMHNVCTMIAA (0)

 548 GNDTSGQLMAYACLLLGMYPHIQEKVYSEIIELIPLTRKESISVEQLKTLTYTEMFM 378

 377 FECLRLCPIAPNIARLNMTPIELEGITIPAGHIFFISFYSLHRRKDIWGPDAEQFDPERF 198

 197 SPERSVGRHLYAFLPFSGGSRNCIGWRYAMMSMKLMLVYLLREYRF 66

  58 RTDLKLSDLKFKFDMMLVLVFEHWVKIEKRRYNC*

 

AAGE02000570.1  first of five P450 N-terms on this contig

Same as AAGE01024167b

6012  MDLFLLLTGPLAIVFLIFLYVRVLQYINRFANSVPFGGMSRYPLFINDWKLLRASPVQKFEILAE  5818

5817  TFAQHDRLFRVWFGPRMAFATCHPDVIQAILTHPECVDKPFFYRFARLDHGLLVGR  (1) 5650

continues on AAGE02000569

68485  HLWRRQRKQLNPTFNLRILTSFLPIFEKCCQQMVNCLEPFANGDRIDILQHTTRCTLNMI  68306

68305  LQTSLDTDSLSNEESASLVKHIKR  (2) 68234

68180  RFFFISTNRVLNLHHYWEPVYRLTKNFAMESESYGVILGATRK (0)  68052

67661  ILNIKKNEMKDKPLNENDLEYKKPRIYMDQLLKLSDTMSDKEIMHNVCTMIAA  (0) 67503

67443  GNDTSGQLMAYACLLLGMYPHIQEKVYSEIIELIPLTRKESISVEQLKTLTYTEMFMFEC  67264

67263  LRLCPIAPNIARLNMTPIELEGITIPAGHIFFISFYSLHRRKDIWGPDAEQFDPERFSPE  67084

67083  RSVGRHLYAFLPFSGGSRNCIGWRYAMMSMKLMLVYLLREYRFRTDLKLSDLKFKFDMML  66904

66903  VLVFEHWVKIEKRRYNC * 66850

 

>AAGE01004336.1 N-term 36% to 325E1; tried mate pair search with first 1000bp of AAGE01004336, no success

625112877 (+), 586057308

636086813 matches the first 500bp on the (-)

The mate pair (upstream) = 634991647. Search WGS with the mate pair seq to identify the upstream contig = AAGE01044598.1 (+) 2141-3111

this seq does not have any P450 fragments on it, keep going.

Use the first 500bp of AAGE01044598.1 to repeat procedure

595142314 matches on the (-) mate pair = 594349756

this matches AAGE01298676.1 on the (+) from 439-1278

These pieces complete a P450

AAGE01147701 AAGE01298676 494130880 72% to 494257581 38% to 325C3 56% to AAGE01041126

4639 MLLVLLAVVGVLLLFQYIKLLLSNGYADKIPPLQPVYPLVGHIPLFLGKNTHQAFDVVVKL  4457

4456 LGSVERMGKLMLGPKPLITISHPDLMQQVLTRNDLYDKPFLYEFLRLGNGLITER (1) 4292

1303 SGERWLQTRKLLGPTFNTSMLTSFLSTMDARTMKMVSKLQSLADGHSEIDIYPFLLTCTLE 1124

1123 IAISTTMGRMDDEMPGQQDYIRNLEMLVKNAIGTRIVNINLWLFYQFSKAYEVEERARKICYDFTNK (0) 865

 812 IIEQRRLELHSLPKDAAVDDEYIKKRMNTLDQILTAQKSDGTTFSNTDLIYQLFTIISA (0) 633

 568 DTSALTVSYTCLYLAMNPHIQDTVKSEMDQVFYSPDVEINLDTLKQMEYTEMAIKEALRI 389

 388 CPTAPFAARQTSSEILLDGITVPKGEIIFIDLYNLHRHKEFWGPDPDRYDPERFRPEAVQ 209

 208 QRHPFAFLPFSGGSRNCIGHRYAMNAMKIMLLRLLQNFEVRTNLKQEDFKFRFEITAKLE 29

  28 GPHSVWLVKRNKGL*

 

 

>494576331 90% to 494087031 43% to CYP325C1 not in WGS section except N-term

744442433 821640843 (N-term part may not belong to this gene

939  LQKVFIDQLLDESALGRSFSDTEIVQNVYTMLAA 1040 821640843

        847 QLLDESALGRSFSNTEIVQNVYTMLAA 767) 494576331

end of 6331 has no matches in trace archive or WGS

whole 821640843 has no matches in trace archive or WGS

cannot extend seq upstream.

note 21-120 of 494576331 matches 100-1 (-) of AAGE01456445.1 100%

AAGE01456445 1-48 = extreme C-term of this P450

Use top of AAGE01456445 to find (-) matches

575128039(-) mate = 574349186

matches AAGE01279448.1 (+) 1397bp upstream of P450 seq below

613966178(-) mate = no mate

use top of AAGE01279448 for (-) strand matches

827530978(-) mate = 826071276 matches a repeat

757071970(-) mate = 757078385 matches a repeat

used the bottom of AAGE01279448 for (-) strand matches

no (-) strand matches

tried walking upstream of AAGE01279448

803206452(+) goes 690bp upstream, use to find next contig

AAGE01098162.1 (+) 2623bp is the best match, but only 92%

There may not be a WGS match

Continue walk from first 600 bp of 803206452

521895948(-)  extends seq 588-1173

invert and blast against WGS for next contig

matches AAGE01511887.1(+) 945 bp and AAGE01256996.1 (+) 1466bp

note these were also found in the 494087031 region

must be an error in one of these gene assemblies

the match seen in AAGE01456445 1-48 to the C-term is identical to 494576331

and not to the 494087031 gene.

Use bottom of 6331 for (+) strand matches, try to link to 9448

Use bottom of 6445 for (+) strand matches, try to link to 9934

578396833 (+) mate = 578471877

755814757(+) mate = 755821174

821647414(-) mate = 821640843 with N-term part of 6331 P450

this seq extends 6445 seq.

759288116 extends 821647414 (+) farther into the gap mate = 759284526

AAGE01642682 continues from 8116, but the overlap is only a 95% match, so maybe not

568531526 extends AAGE01642682(+) mate = 568527984

757700898(-) overlaps 759288116 mate = 757698399 (no WGS match)

823382968 (-) matches 757698399

(1) ASLWRSQRKALYATFSPSILKNFIPTFETFSKQLVDKLHQYEGSTIDILPITSACTLRMI

GRSTMGVDENDEMEIAKFVSNMDK (2)

ITEVVSNRFLSVHLHSEQIYRMTSSYEREIQYRQECSNYTMK (0?)

ILKERKRKVHMELARLQKLARLQKVFIDQLLDESALGRSFSDTEIVQNVYTMLAA (0)

VLKERKRKVYMELPG

GSETTARSISYACLLLAIYPDVQEKVHAEIMSLFQDDIHPLTTATLAVLTYMEAFLK

ECHRLYPVGPYIARESTDSIELDGTSFPKGSVFVFNFFTLHRRTAFWGTDSEHFN

PERFLNEQHGEHHAFGYLPFSGGQRNCI (1)

228 GQRYAMMSLKVMLIYLIRNFRIETHLRHEDLRFSFGMMLELSSECLVRFNKR* 70

 

>494087031 86% to 494576331 46% to CYP325K1

AAGE01227180.1 same as AAGE01139230.1 (overlap)

Use the last 500bp of AAGE01227180 to find a trace seq on (+)

Wrong way so AAGE01256996 may be downstream of 7031

And upstream of 6331

519661758 matches (+) mate pair = 519808856

this matches AAGE01256996.1 only 1466bp and no p450

574210685 matches (+) mate pair = 574203231

this matches AAGE01139934.1 (-) no P450

591889338 matches (+) mate pair = 591895771

matches AAGE01139934.1 (-) again only 2117 bp

use last 500 bp of AAGE01139934 to repeat

822879728 (+)  mate pair = 822884411

matches AAGE01511887.1 (+)  945bp and AAGE01256996.1 (+)

note AAGE01256996.1 must be downstream of AAGE01139934

760895106(+) mate pair = 760897640

best match = AAGE01465489.1 1006bp seems to be in a repeat

try beginning of AAGE01256996; look for a (-) strand match

to go farther downstream with the mate pair

823320338 (-) mate pair = 823337193

matches AAGE01007603.1 (+) 10kb no P450, looks like a repeat

 

May be a problem with this assembly, so used 812167174 (a 494087031-specific sequence) to find (-) strand matches with upstream mate pairs

586126037(-) mate = 586046641

matches AAGE01139934.1(-) 2117bp differs from AAGE01279448.1

633003388(-) mate = 632990527

matches AAGE01139934

use bottom of AAGE01139934 for (+) strand matches

574203231 (-) 100% mate = 574210685 error used – match here

matches AAGE01227180.1 (+) 1575bp

also matches AAGE01139230.1 (-) 2124 bp (more upstream)

still in coding region

584214911 (+) 1 nuc diff mate = 584218501 same match as above

should use 825270301(+) mate = 825997756

matches AAGE01511887.1 (+) 945bp

(-) match to AAGE01511887 = 588751488 mate = 588755052

about 4000bp insert size

matches AAGE01217117.1 (-) 1618bp

803206452(+) extends 825997756 about 450bp

The mate pair of 803206452 = 808267411

This seq matches AAGE01139934

803206452 matches AAGE01279448.1 on (+)

574349186 extends AAGE01279448 on (+) about 200bp

mate = 575128039 matches AAGE01456445(-) 100%

822879728(+) extends AAGE01139934 on (+) 750bp

835013678(+) extends 822879728 about 150bp

569815318(+) may extend 835013678 about 800bp

use top of AAGE01227180 to find (-) strand matches

571497371(-) mate = 571493799

matches AAGE01248465.1(+) 1495bp

812167174(-) mate = 813478515

matches AAGE01115835.1 (-) 2373bp

760773555(-) mate = 760780002

matches AAGE01115835 and AAGE01248465

625133249(-) extends AAGE01115835 about 500bp upstream toward AAGE01227180; took this seq, inverted it, and blasted WGS: it matched AAGE01139230

620662629(-) extends AAGE01139934 upstream 400bp

592204606 goes upstream 180bp

529229530 may continue 592204606 and AAGE01139934 upstream

still no match in WGS

819718554 extends farther in same direction

matches AAGE01227180.1

1393 (1) ASLWRSQRKALNATFSPSILKNFIPTFETFSKQLVDKLHQYEGSTIDILPIT

SACTLRMIGRTTMGIDENDEMEIAKFVSNMDK (2?)

ITEVVSNRFLSVHLHSEQIYRMTQLYERETQYRKECSNYTMK (0)

VLKERKRKVYMELPGLQKVFIDQLLDECALGRSFSDTEIMQNVYTMLAA (0)

GSETTARSISYACLLLAIYPDIQEKVYAEIMSLLSDDIHPLTTATLAELTYMEAFLKECH

RLYPVAPYIARESTESIELDGVCFPKGSVFIFNFFALHRSSAVWGIDSEQFNPERFL

NEQNGEHHAFGYLPFSGGQRNCI ()

GQRYAMMSLKVMLIYLIRYFRMETHLRQEDLRFSFGMMLELSTECLVRFNKR*

 

AAGE02000572.1 exon 1; use this seq

AAGE02000573.1 exons 2 to end

 8761  MLITEFLILFLIVLVFIKLIKPLIQFRSIPYVRPWYPLVGNVFLFLGKTGEQLFDQMNCM  8940

 8941  FAQHDRLFLLWFGIRPVVGVSHPELIRKVLTSRACLEKPFFYRFSRIDQGLWAAK (1) 9105

10805  SLWRSQRKALNATFSPSILKNFIPTFETFSKQLVDKLHQYEGSTIDILPITSA  10963

10964  CTLRMIGRTTMGIDENDEMEIAKFVSNMDK (2) 11053

11106  RITEVVSNRFLSVHLHSEQIYRMTQLYERETQYRKECSNYTMK (0) 11234

11297  VLKERKRKVYMELPGLQKVFIDQLLDECALGRSFSDTEIMQNVYTMLAA (0) 11443

11505  GSETTARSISYACLLLAIYPDIQEKVYAEIMSLLSDDIHPLTTAT  11639

11640  LAELTYMEAFLKECHRLYPVAPYIARESTESIELDGVCFPKGSVFIFNFFALHRSSAVWG  11819

11820  IDSEQFNPERFLNEQNGEHHAFGYLPFSGGQRNCI (1) 11924

11989  GQRYAMMSLKVMLIYLIRYFRMETHLRQEDLRFSFGMMLELSTECLVRFNKR*  12147

 

494087031 gene C-term matches 100% to 571497371 607-713

754300186 169-275, 812167174 945-1000 (partial 1-56)

ACTTCAGAATGGAAACGCACTTACGCCAAGAAGATTTACGATTTAGTTTCGGAATGA

TGCTGGAACTTTCAACGGAATGTTTGGTTCGATTTAACAAACGATGA

FRMETHLRQEDLRFSFGMMLELSTECLVRFNKR*

 

The other seq is 613966178(-) 848-745, 494576331 (-) 173-70

744442433 605-708, 575128039 (-) 757-674 with a few errors

ACTTCAGAATAGAAACGCACCTACGCCACGAAGATTTACGATTCAGTTTCGGGATGA

TGCTGGAACTTTCATCGGAATGTTTAGTTCGATTTAACAAACGATGA

Tgctggaactttcatcggaatgtttagttcgatttaacaaacgatga AAGE01456445

FRIETHLRHEDLRFSFGMMLELSSECLVRFNKR*

AAGE01456445 is identical to the 494576331 seq SS not ST
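Translating the two C-term DNA stretches above confirms the peptides and locates the differences. The sketch below uses the standard genetic code. The reading frame starts at the third base (offset 2); this is an assumption, chosen because it reproduces the peptides quoted in the notes.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, ...)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate complete codons starting at 0-based offset `frame`.
    Codons with non-ACGT characters become 'X'; a trailing partial codon is dropped."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


# 494087031 gene C-term (two DNA lines joined)
c_7031 = ("ACTTCAGAATGGAAACGCACTTACGCCAAGAAGATTTACGATTTAGTTTCGGAATGA"
          "TGCTGGAACTTTCAACGGAATGTTTGGTTCGATTTAACAAACGATGA")
# The other gene (494576331 and its traces), two DNA lines joined
c_6331 = ("ACTTCAGAATAGAAACGCACCTACGCCACGAAGATTTACGATTCAGTTTCGGGATGA"
          "TGCTGGAACTTTCATCGGAATGTTTAGTTCGATTTAACAAACGATGA")

p_7031 = translate(c_7031, frame=2)
p_6331 = translate(c_6331, frame=2)
print(p_7031)  # FRMETHLRQEDLRFSFGMMLELSTECLVRFNKR*
print(p_6331)  # FRIETHLRHEDLRFSFGMMLELSSECLVRFNKR*
print([(i, a, b) for i, (a, b) in enumerate(zip(p_7031, p_6331)) if a != b])
# [(2, 'M', 'I'), (8, 'Q', 'H'), (23, 'T', 'S')]
```

The third difference (T vs S at position 23) is the ST/SS distinction used above to assign AAGE01456445 to the 494576331 gene.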

 

Note 494579893 is the mate pair of 494576331

744448874 is mate pair of 744442433

574349186 is mate pair of 575128039

 

>AAGE01205264.1 N-term 41% to 325C2 63% to AAGE01102953

use top of this seq to find (-) matches

575508824(-) mate = 575515286 matches a repeat region

618132741(-) mate = 620849900

matches AAGE01592485.1 (+) 822bp 100% no P450

also matches AAGE01067688.1 (+) 3264bp 1 nuc diff, no P450

Note: this seq connects via numerous steps to AAGE01001451

Join complete

579979942(-) mate = 579983471

matches AAGE01293853.1(+) 1356 bp no P450

595039631(-) mate = 595043179 matches AAGE01293853

636182639(+) goes upstream 283bp

521922341(+) extends back about 530bp

matches AAGE01570704.1 (+) 859 bp

also matches AAGE01423387.1 (-) 1072bp

AAGE01001451 47% to AAGE01025218 41% to 325C3

used 14000-14600 region to find (+) strand matches

578828140(+) mate = 578830669

matches AAGE01294062.1 (-) 1356bp

also matches AAGE01359563.1 (-) 1199bp

581388714(+) mate = 581381243 matches AAGE01294062

use the end of AAGE01294062 to find (+) strand matches

580182961(+) mate = 580167672

matches AAGE01104198.1 (+) 2529bp

591439015(+) mate = 589183870

matches AAGE01139429.1 (+) 2122bp

and AAGE01295550

use the top of AAGE01295550 to find (-) matches

813088108(-) mate = 813097585 matches AAGE01104198(-)

630167999(-) mate = 630791391 matches AAGE01084800(+)

use the top of AAGE01084800 to find (-) matches

750327728(-) mate = 750321309 matches AAGE01067688(-)

note: two intron boundaries revised May 17, 2006, now corrected

     MLLAVTVIVGLITIWLLLSQ

453  RRRYRFADSLPQLKPWFPVVGNGALMFGKSDVDRFDVLVKIFRDYDRMVRVWAGPKMLLF  274

273  TSHPDLVQQLLTSPACLEKPFLYSFAGFEQGLFTSK (1) 166

8177 YKLWRSMRKRLNSSFNLRILHGFIPVFVQCARKMVEDLNENPDGTVVSMHKFTSVCTLEMA 7998

7997 CGTTLGSDITRREGKEEFVHGLDI (2)

7858 AFGEAARRMVSVHLYPNIVYHLTKYHRELVQARGVVCDFFSR (0) 7745

7636 LVTERRNTMSLNCNKKTNEEELDFDRKPKILIDQLLSVNRDGKSFSDTEIEDNIYAVITG (0)

7423 ANDTSGLLIAHACLFLCFYKDIEEKLFTEIMEFMPNEEFEINPESLKQLSYL 7253

7267 EKFLKECLRHCPVAPNISRENMSEIEIDGMKVPPGNIFIMNFYALHRRKDIWGPDADKFD 7088

7087 PEQFSEERSRNRHPFAYLPFSGGNRICI (1) 7004

6945 GWRYAMFSMKVMLIYLIRNFQFETEIRPEQVRYRHDLTMKLPFEHMIKVTRRKLE 6781

     GSTVMSDILKHPELVPKEGRE*

 

AAGE02009115.1

Query  1      MLLAVTVIVGLITIWLLLSQRRRYRFADSLPQLKPWFPVVGNGALMFGKSDVDRFDVLVK  60

              MLLAVTVIVGLITIWLLLSQRRRYRFADSLPQLKPWFPVVGNGALMFGKSDVDRFDVLVK

Sbjct  11166  MLLAVTVIVGLITIWLLLSQRRRYRFADSLPQLKPWFPVVGNGALMFGKSDVDRFDVLVK  11345

 

Query  61     IFRDYDRMVRVWAGPKMLLFTSHPDLVQQLLTSPACLEKPFLYSFAGFEQGLFTSK  116

              IFRDYDRMVRVWAGPKMLLFTSHPDLVQQLLTSPACLEKPFLYSFAGFEQGLFTSK

Sbjct  11346  IFRDYDRMVRVWAGPKMLLFTSHPDLVQQLLTSPACLEKPFLYSFAGFEQGLFTSK  11513

 

Query  118    KLWRSMRKRLNSSFNLRILHGFIPVFVQCARKMVEDLNENPDGTVVSMHKFTSVCTLEMA  177

              KLWRSMRKRLNSSFNLRILHGFIPVFVQCARKMVEDLNENPDGTVVSMHKFTSVCTLEMA

Sbjct  39540  KLWRSMRKRLNSSFNLRILHGFIPVFVQCARKMVEDLNENPDGTVVSMHKFTSVCTLEMA  39719

 

Query  178    CGTTLGSDITRREGKEEFVHGLDIAFGEAARRMVSV  213

              CGTTLGSDITRREGKEEFVHGLD++    A R VS+

Sbjct  39720  CGTTLGSDITRREGKEEFVHGLDMS---VAVRFVSL  39818

 

Query  202    AFGEAARRMVSVHLYPNIVYHLTKYHRELVQARGVVCDFFSR------------------  243

              AFGEAARRMVSVHLYPNIVYHLTKYHRELVQARGVVCDFFSR                 

Sbjct  39856  AFGEAARRMVSVHLYPNIVYHLTKYHRELVQARGVVCDFFSRVSLFPPFFGVIMSHIIDC  40035

 

Query  244    ------LVTERRNTMSLNCNKKTNEEELDFDRKPKILIDQLLSVNRDGKSFSDTEIEDNI  297

                    LVTERRNTMSLNCNKKTNEEELDFDRKPKILIDQLLSVNRDGKSFSDTEIEDNI

Sbjct  40036  CMFYFQLVTERRNTMSLNCNKKTNEEELDFDRKPKILIDQLLSVNRDGKSFSDTEIEDNI  40215

 

Query  298    YAVITG--------------------ANDTSGLLIAHACLFLCFYKDIEEKLFTEIMEFM  337

              YAVITG                    ANDTSGLLIAHACLFLCFYKDIEEKLFTEIMEFM

Sbjct  40216  YAVITGVGSEGFEELRL*YFI*LSFKANDTSGLLIAHACLFLCFYKDIEEKLFTEIMEFM  40395

 

Query  338    PNEEFEINPESLKQLSYLEKFLKECLRHCPVAPNISRENMSEIEIDGMKVPPGNIFIMNF  397

              PNEEFEINPESLKQLSYLEKFLKECLRHCPVAPNISRENMSEIEIDGMKVPPGNIFIMNF

Sbjct  40396  PNEEFEINPESLKQLSYLEKFLKECLRHCPVAPNISRENMSEIEIDGMKVPPGNIFIMNF  40575

 

Query  398    YALHRRKDIWGPDADKFDPEQFSEERSRNRHPFAYLPFSGGNRICIG  444

              YALHRRKDIWGPDADKFDPEQFSEERSRNRHPFAYLPFSGGNRICIG

Sbjct  40576  YALHRRKDIWGPDADKFDPEQFSEERSRNRHPFAYLPFSGGNRICIG  40716

 

Query  443    IGWRYAMFSMKVMLIYLIRNFQFETEIRPEQVRYRHDLTMKLPFEHMIKVTRRKLEGSTV  502

              +GWRYAMFSMKVMLIYLIRNFQFETEIRPEQVRYRHDLTMKLPFEHMIKVTRRKLEGSTV

Sbjct  40769  VGWRYAMFSMKVMLIYLIRNFQFETEIRPEQVRYRHDLTMKLPFEHMIKVTRRKLEGSTV  40948

 

Query  503    MSDILKHPELVPKEGRE  519

              MSDILKHPELVPKEGRE

Sbjct  40949  MSDILKHPELVPKEGRE  40999
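In these TBLASTN alignments the Sbjct numbers are nucleotide positions on the contig. Each aligned residue, including a translated stop (*), covers three bases, and each gap (-) covers none. Plus-strand listings count up; minus-strand listings, such as the AAGE01005255.1 exons, count down. The small sketch below reproduces the Sbjct end coordinates above. The helper name is hypothetical.

```python
def sbjct_end(start, aligned_sbjct, strand="+"):
    """Last contig base covered by an aligned TBLASTN subject segment.

    Residues and translated stops ('*') each occupy one codon (3 bases);
    gap characters ('-') occupy none.
    """
    codons = sum(1 for ch in aligned_sbjct if ch != "-")
    span = 3 * codons - 1
    return start + span if strand == "+" else start - span


# Gapped segment from the alignment above (3 gap characters)
print(sbjct_end(39720, "CGTTLGSDITRREGKEEFVHGLDMS---VAVRFVSL"))  # 39818
# Segment containing two translated stops
print(sbjct_end(40216, "YAVITGVGSEGFEELRL*YFI*LSFKANDTSGLLIAHACLFLCFYKDIEEKLFTEIMEFM"))  # 40395
# Minus-strand listing (first exon line of AAGE01005255.1)
print(sbjct_end(5509, "MYVFTTVAGLLVFIFILYEIYLRSLPSYRAAKYFPGYPVYPIVQNLFTALFKSQTGSFQQ", "-"))  # 5330
```

This arithmetic can also locate exon ends and intron boundaries between consecutive Sbjct blocks, for example between 11513 and 39540 above.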

 

>AAGE01005370.1 N-term 67% to AAGE01071933 complete

AAGE01041126 C-term 39% to 325C1 78% to AAGE01070673

used bottom for (+) strand matches

753010618 (+) mate = 753014216

matches AAGE01028971.1 (-) 5369 bp no P450

832392017(+) mate = 827574490 matches AAGE01028971

633009586(+) mate = 632994720 matches AAGE01005370 about 95%

632994720 has no exact matches in the trace archive, so it may be a poor-quality seq; its best matches all hit AAGE01005370. This is evidence linking these N- and C-terminal pieces, but it is not strong: there seem to be several repeats in the intervening seq that make this region very hard to span.

Join with AAGE01005370

4098 MFVFLLGVVCALLVFQYLKLLRAGSYATKIETLRPVYPVLGHIPLFGGKNSHESFCTLRRLLNS  4289

4290 VDRIGKFVVGPKPFIVIQHPELMQQVLCCNELYDKPFLYDFFRLGNGILTER (1) 4445

3009 SGERWLQARKLVNPAFNTRMLTAFLPIMDSEAKSLCDKLEPLADGNTEIDIF 2857

2856 SHLSSCTLSTTFGTTMGQNAKEIPEQHDYIRNVEI (2)

     FLKAVG 2677

2676 ERLVNVYYFIEPIYKLSKAYKIHDEARRICNEFTHRIVSKRRFEIQSLGEDFQQKDEYI 2500

2499 KQHLNALDQIITMKRPDGTGFSDLEVNEHLYTLIGA (0) 2392

2330 GTDTSALNVAYTCLYLAMYPEVQEKVLTEINQVFYSPEVEVNIENLKQLEYTEMVIK 2160

2159 EILRLFPAVPLGARQTMSAIELDGIRIPKDQIIIFSMFTLHRRKDIWGPDPEQFDPERFR 1980

1979 PEAIEARHPFAYLPFSGGLRNCIGHRYAMNVMRIILLRIMQKFEIQTNMKPTDLKLKFEV 1800

1799 TLKLDGPHRVWLVRRNK* 1746

 

>AAGE01071933.1 N-term complete

636174215 (-) mate = 637138541

matches AAGE01109197.1 (+) 2460 bp

use first 500bp of AAGE01109197 to find (-) match

572574319 (-) mate = 572566832

matches AAGE01603317.1 (+) 803bp no P450

821756106 (-) mate = 822887151

matches AAGE01368176.1 (+) 1180bp no P450

AAGE01070673 76% to AAGE01041126 825771011

used the end of AAGE01070673 to find a (+) strand match

used mate pair to jump to an upstream contig

822916395 (+) mate = 821723190, matches AAGE01370794.1 (+) 1174bp no P450

581417565 (+) mate = 581414014, matches AAGE01622151.1 (+) 764bp no P450

use all of AAGE01622151.1 to find (-) matches

520523028(-) mate = 520159312

matches AAGE01071933.1 (-) 3153bp = N-term of P450 join

494257581 44% to TC59131 48% to 325K

same seq as AAGE01070673

AAGE01004336.1 very end (11kb region) matches 494257581

There is a full P450 in the 1-4000 region of this contig. These genes are linked. This seq is just beyond the end of AAGE01004336 on the (-).

2959  MFLILLGIIGALLAWQYLKLLLAGSYATKIESFRPSYPVLGHLTLFWGKNSCEAFSSATR  2780

2779  LFATVDRLGKVMLGPKPLIVVHHPEVMQQVLSRHDLYDKPFFYDFLRLGSGLITER (1) 2612

1060 SGERWLQARKLLNPTFNTRMLTGFLPIMDSEARRLSNGLEPLADGKTEIDIFKYISSCTLS 878

 877 MVFSTTMGQNGKEIPGQQDYVRSLED (2) 803

 740 LMNAVGERIMNVNHFIGPIYRFSKAYRVHQKASEVCNGFTHRIVQKRRFEIQKLGENFHQK 561

 560 DEYIKQSLNALDQIITIKKQDGSGFSDTEVNEHLYSLIGA (0) 441

 375 ANDTSALTAAYTCLYLAMYPEIQNKVVNEMNQVFYSPEVEVNLETLKQLEYTEMVIKEILRL 196

 195 FPAVPLGARQTANEIVLDGIRIPKDQIIVYSLYTLHRRKDIWGPDPDQFDPERFLSEAIQ 16

  15 ARHPFAYLPFSGGLRNCIGHRYAMNSMRIMLLRILQKFEIRTNMKPMELKLKFEITLKLD

     GPHRVWLVKRNK*

 

>AAGE01120855.1 N-term 36% to 325C2, 52% to AAGE01039338 complete

520695497 (+) mate = 521887392

matches AAGE01164267.1 (-) 1919 bp no P450

595144689 (+) mate = 594350190

matches AAGE01088447.1 (-) 2793 bp no P450

592694963 (+) mate = 592508489

matches AAGE01426313.1 (+) 1067 bp no P450

494193586 52% to 494159924 50% to 325C, 52% to AAGE01039338

AAGE01018213.1(+) AAGE01548387.1

Use top of AAGE01018213 for (-) strand matches

586006569(-) mate = 589171602

matches AAGE01164267.1 (+) 1919bp

This contig was also found in the AAGE01120855 search; join sequences

seq revised May 17, 2006

1243 MWFFTVALVSICVLMVIRWLQKRRRDFAKHVPWVRPYLPVLGNGLLFIGKDDVQRFWNMQKMFDRKE  1443

1444 NLFRFYLGPNTVFGTNDPGTAQQILTDPNCMDKPYVYDYFLADCGVFAAK (1) 1593

2350 TSVWKSQRKALNPTSNVRVLQGYIPTFCRINSAMIKRLENVPAGKTINFMDYASRLAVELVCA

TTLGFDINQFDDPDGFAHNMER (2)

VFYVASRRMLNVHLQLDTVYRWTKDYREERALREKMESYAMK (0)

IYESAERRFSSPPEDDEDQEQEKSRILVHQLFVNKHRKFAKM

EILHNIYTIIAAGTDTTANAVSYTCLQLAMHPEQQERLYNEINDIFPNSEPIITLEALKC

LPYLDMVLKEALRLYPAAWIVMRENTDDVIIDGLRIPKGNKFAVNIYSMQRRVDVWG

PDANLFNPERFGAERSATRHRYAFLPFSGGRRDCL ()

GARYAMISMKIMMVHLVKHFRFTTTMREEDINFRFDALLRIIGGHQLQIEKR*

 

AAGE02009773.1 Length=286003

 

Query  1       MWFFTVALVSICVLMVIRWLQKRRRDFAKHVPWVRPYLPVLGNGLLFIGKDDVQRFWNMQ  60

               MWFFTVALVSICVLMVIRWLQKRRRDFAKHVPWVRPYLPVLGNGLLFIGKDDVQRFWNMQ

Sbjct  213771  MWFFTVALVSICVLMVIRWLQKRRRDFAKHVPWVRPYLPVLGNGLLFIGKDDVQRFWNMQ  213950

 

Query  61      KMFDRKENLFRFYLGPNTVFGTNDPGTAQQILTDPNCMDKPYVYDYFLADCGVFAAKSMF  120

               KMFDRKENLFRFYLGPNTVFGTNDPGTAQQILTDPNCMDKPYVYDYFLADCGVFAAKSMF

Sbjct  213951  KMFDRKENLFRFYLGPNTVFGTNDPGTAQQILTDPNCMDKPYVYDYFLADCGVFAAKSMF  214130

Phase 0 boundary at KSMF/ASV is not possible; phase 1 at FAAK/TSV is possible

ttcgccgcgaaaagtatgttt gttctgtgaa

F  A  A  K  S  M  F

atta tagcaagtgt

I  I  A  S  V
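One way to read the phase argument above is through the canonical GT...AG rule: the intron should begin with GT on the donor side and end with AG on the acceptor side. The sketch below tests both candidate junctions against the two DNA snippets quoted in the note, with spaces removed. The cut positions (13/7 for phase 1, 21/6 for phase 0) are inferred from the peptides in the note, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate complete codons from the first base; a trailing partial codon is dropped."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def try_junction(donor_region, intron_start, acceptor_region, exon2_start):
    """Check GT...AG for one candidate intron and return (ok, spliced DNA).

    donor_region: end of exon 1 followed by the start of the intron
    intron_start: 0-based index in donor_region where the intron begins
    acceptor_region: end of the intron followed by the start of exon 2
    exon2_start: 0-based index in acceptor_region where exon 2 begins
    """
    gt_ok = donor_region[intron_start:intron_start + 2].lower() == "gt"
    ag_ok = acceptor_region[exon2_start - 2:exon2_start].lower() == "ag"
    spliced = donor_region[:intron_start] + acceptor_region[exon2_start:]
    return gt_ok and ag_ok, spliced


donor = "ttcgccgcgaaaagtatgtttgttctgtgaa"  # from the note, space removed
acceptor = "attatagcaagtgt"                 # from the note, space removed

for label, cut_d, cut_a in [("phase 1, FAAK/TSV", 13, 7), ("phase 0, KSMF/ASV", 21, 6)]:
    ok, dna = try_junction(donor, cut_d, acceptor, cut_a)
    print(label, ok, translate(dna))
# phase 1, FAAK/TSV True FAAKTS
# phase 0, KSMF/ASV False FAAKSMFAS
```

Both candidates have a GT on the donor side. The phase 0 cut fails because the intron would end in "ta" rather than "ag". For phase 1, the trailing "gt" is the start of the V codon, whose third base is not in the quoted snippet.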

Query  119     MFASVWKSQRKALNPTSNVRVLQGYIPTFCRINSAMIKRLENVPAGKTINFMDYASRLAV  178

               + ASVWKSQRKALNPTSNVRVLQGYIPTFCRINSAMIKRLENVPAGKTINFMDYASRLAV

Sbjct  222887  IIASVWKSQRKALNPTSNVRVLQGYIPTFCRINSAMIKRLENVPAGKTINFMDYASRLAV  223066

 

Query  179     ELVCATTLGFDINQFDDPDGFAHNME-------------------------RVFYVASRR  213

               ELVCATTLGFDINQFDDPDGFAHNME                         RVFYVASRR

Sbjct  223067  ELVCATTLGFDINQFDDPDGFAHNMER*FMGHTN*IQCCMNWTLIVYEILFRVFYVASRR  223246

 

aacatggaacggtaatttat

N  M  E  R  * must use phase 2 GT

attttgttcagagttttctatgtagcttcc

I  L  F  R  V  F  Y

My seq has an extra R (remove it)
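The reason the phase 2 GT is required becomes clear when the same nucleotides are translated with and without the intron. Reading straight through hits the TAA after NMER. Splicing after two bases of the codon (CG|A) rebuilds the R and continues into VFYVAS, so the junction R is encoded by a split codon and should appear only once. This is a minimal sketch, assuming the standard genetic code and 0-based donor and acceptor indices read off the fragments above. The helper names are hypothetical.

```python
# Standard genetic code; codons indexed in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate the complete codons of dna, reading from position 0."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[n:n + 3]] for n in range(0, len(dna) - 2, 3))


def splice(upstream, donor, downstream, acceptor):
    """Join upstream[:donor] to downstream[acceptor:], requiring GT...AG."""
    if upstream[donor:donor + 2].upper() != "GT":
        raise ValueError(f"no GT donor at {donor}")
    if downstream[acceptor - 2:acceptor].upper() != "AG":
        raise ValueError(f"no AG acceptor before {acceptor}")
    return upstream[:donor] + downstream[acceptor:]


UP = "aacatggaacggtaatttat"
DOWN = "attttgttcagagttttctatgtagcttcc"

print(translate(UP))                # NMER*F (reading through hits TAA)
spliced = splice(UP, 11, DOWN, 11)  # donor after "...gaacg", acceptor after "...ttcag"
print(11 % 3, translate(spliced))   # 2 NMERVFYVAS
```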

Query  214     MLNVHLQLDTVYRWTKDYREERALREKMESYAMKI  248

               MLNVHLQLDTVYRWTKDYREERALREKMESYAMK+

Sbjct  223247  MLNVHLQLDTVYRWTKDYREERALREKMESYAMKV  223351

 

Phase 0 possible as above

There is no GT after YREE, so my seq is not possible; use this seq

223261 tctgcagttg gatacagttt atcgatggac taaggattac cgcgaggaac gagcccttcg

223321 cgagaagatg gagagttatg caatgaaggt agtgaatggt ataatgcagg gctggttgca

 

Query  246     MKIYESAERRFSSPPEDDEDQEQEKSRILVHQLFVNKHRKFAKMEILHNIYTIIAAGTDT  305

               ++IYESAERRFSSPPEDDEDQEQEKSRILVHQLFVNKHRKFAKMEILHNIYTIIAAGTDT

Sbjct  224038  IQIYESAERRFSSPPEDDEDQEQEKSRILVHQLFVNKHRKFAKMEILHNIYTIIAAGTDT  224217

 

Query  306     TANAVSYTCLQLAMHPEQQERLYNEINDIFPNSEPIITLEALKCLPYLDMVLKEALRLYP  365

               TANAVSYTCLQLAMHPEQQERLYNEINDIFPNSEPIITLEALKCLPYLDMVLKEALRLYP

Sbjct  224218  TANAVSYTCLQLAMHPEQQERLYNEINDIFPNSEPIITLEALKCLPYLDMVLKEALRLYP  224397

 

Query  366     AAWIVMRENTDDVIIDGLRIPKGNKFAVNIYSMQRRVDVWGPDANLFNPERFGAERSATR  425

               AAWIVMRENTDDVIIDGLRIPKGNKFAVNIYSMQRRVDVWGPDANLFNPERFGAERSATR

Sbjct  224398  AAWIVMRENTDDVIIDGLRIPKGNKFAVNIYSMQRRVDVWGPDANLFNPERFGAERSATR  224577

 

Query  426     HRYAFLPFSGGRRDCLG  442

               HRYAFLPFSGGRRDCLG

Sbjct  224578  HRYAFLPFSGGRRDCLG  224628

 

Query  442     GARYAMISMKIMMVHLVKHFRFTTTMREEDINFRFDALLRIIGGHQLQIEKR  493

               GARYAMISMKIMMVHLVKHFRFTTTMREEDINFRFDALLRIIGGHQLQIEKR

Sbjct  224714  GARYAMISMKIMMVHLVKHFRFTTTMREEDINFRFDALLRIIGGHQLQIEKR  224869

 

>AAGE01124926.1 N-term 41% to 325E1 complete

found mate pair 739505219 of 739501659 that matches the end of

AAGE01124926.  This seq matches = AAGE01015918

AAGE01015918 67% to AAGE01147701 joined by mate pair

Also on AAGE02018310.1

     MAVILAFLLFTTFFFMLVLIRRYVIDNL

405  YALKIPTVSPVQPVIGHAGIFMRKNTHQMFWLFVKCYQEVDRLAKLRFGPIPVLLVNHPE  584

585  LIQQLMIRPELYDKPFFYEYMGLGKGLITEQ (1) 677

SEIWRRSRKLLNPAFSTRILNEFVPIMDSRARKMVKSLAVLADDKTEFDILPITAQCTLEMVF STTMGCKMEERPGEREYVRCLEGLMTCIGERILNIDRYLGPVYRFTKAYQVDKVCRDTCNGFTEK (0) 1147

IIQERKREMANINNNIIDERMSAAEIDDGRVKSMNFLDQILTIQRPDGTNFVDDEVSDHLYSIVGA (0) 1399

1287 GNETSTLTISYTCLFLAMDHQIQAKVCSEIKQVFPSHHTEVTPEALKQLIYTEMALKETL 1108

1107 RLCPAVPFAARSNVKPIELDGIHIPQRQIFCFNFFALHRRKDFWGDEPEQFDPNRFSPEN 928

 927 SRNRHPYAYLPFSGGFRNCIGGRYAINSLKIMLLRILQNFHMDTSLKREDMRFRWAITMK 748

 747 LVGPHAVRLTKRDL* 703

 

>AAGE01025218.1 44% to 325C3 500-2700 region complete

MLLIPVLLLIAIFSATWVLIILYIKRNRAFARSLTLHPPKVYFLGMDLTMAVEDEVQRFESVWRMF

LSHDRMFKHLLGPIMGIGISHPDLMHKVLSHPDCLEKPFFYNFVQLEHGIFSAE (1)

YKLWKGQRKALNPTFNMKILNSFISIFEDCSSRMVADLFKCANGETVDMFQFTSKCTLEMV

CATTLGSNVLEREGSDEFLRNMEG (2)

LFELVGKRMLSVELFLDSIYRLTSYYRKEMKIRKKIEEFSGN (0)

IIREKRREHMFCLNQQHLHNASTPKEDEDDIRKPQIFIDQLLSLSNSSRPFTDEEILHNVLTIMIA (0)

GNDTSGLGVAHACLFLAIYPNIQQKVYDEVMKHFPPDGPNDRISLDADFLRQLEYTEMFL 

KEVLRHCPVAPTVARQNLKELELDGVRIPAGNTLSFSFFALHRRKDIWGPDAEKFDPENF 

APERCEKRHPYAFMPFSSGSRNCIGGRYAMISMKVMIVYIVRNFSLKTNLRHSHLRYKFG 

MTLKLPFAHAIQVYKRNIEQ*

 

>AAGE01239763 95% to AAGE01025218 825989618 757493333 813564487 complete

     MLLIPVLLLIAIFSATWVLIILYIKRNRAFARSLTLHPPKVYFLGMDLTMAVEDEVQRFD

     SVWRMFLSHDRMFKHLLGPIMGIGISHPDLMHKVLSHPDCLEKPFFYNFVQLEHGIFSAE (1)

     YKLWKGQRKALNPTFNMKILNS

1525 FIPIFEDCSSRMVADLFKWANGDTVDMFHFTSKCTLEMVCATTLGSNVLEREGSDEFLRNMEG 1334

1281 LFELVGKRMLSVELFLDSIYRLTSYHRKERKIRKKINEFSGN (0)

 584 IIKEKRREHMSCLNQQHLHDESTLTKDEDDIRKPQIFIDQLLSLSNSSRPFTDEEILHNVLTIMIA (0) 387

 319 GNDTSGLGVAHACLFLAIYPDIQQKVYDEVMKHFPPDGPNDRISLDADSLRQLEYTEMFL 140

 139 KEVLRHCPVAPTVARQNLKELELDGVRIPAGNTLSFSFFALHRRKD 2

     IWGQDAEKFDPENFAPERCERRHPYAFMPFSNGSRNCIGGRYAMISMKVMIVYIVRN

     FSLKTNLRHSDLKYKFGMTLKLPFAHAIQVYKRNVE*

 

>AAGE01027431.1 AAGE01206292 (5 aa diffs)  45% to 325J1 complete

476419948 46% to 494292861 (probable hybrid gene seq)

3565  MITQIVLVSFIISLMYWWRNRLIHQQLGCLPGPFSLPVLGSSYIFLGKSYSEILDAFHR  3741

3742  ISSTYGRNGSPVRFFLGSKPYIIINHPDHAQTILNSTGCLDKPWIYQYTPLEGIFS  3909

3910  LPTQKWRIHRKAIQPSFNWSILKSFLPIFKSKADLLIHKLKQRTSSHEPFDIYGFVAACT  4089

4090  LDMVY (1)  4104

4175  ATTLGIEMNIQQQSSCEYLEILEELFELVTNRVTNVLLHYDWIYRWTSYYRRECKARKI  4351

4352  FQSPAQQVLRQKPILVSNATESDDLTQPQIFIDQLYRIAVKDPHFTKETIEKELNT  4519

4520  MIFGGNETTAVTMSNALLLIAMHPDVQLKLLEEFNVVFEGNLENMTVANLQRLVYM  4687

4688  EAVLKEVMRLWPITTILGRTTSTEVRLDEFLIPAEVNLVIDVYSIHRNARYWGADANRFV  4867

4868  PERFFDREQYPYAFLGFSAGPRNCIGTRYAWLSMKVMLTAILYNFELRTPLRMEDIR  5038

5039  LKVAMTLKVENKHMITLSDRRK*  5098

 

AAGE02021468.1 13 diffs all in first 167 aa

Length=30723

7153  MITQIVLICFIISLTYWWHNRLIHQQLGCLPGPFSLPLLGSSYIFLGKSYSEILDAFHRI  6974

6973  SSTYGRNGNPVRFFLGTKPFIIINHPDHAQTILNSTSCLDKPWIYQYTPLEGIFSLPTQK  6794

6793  WRIHRKAIQPSFNWSILKSFLPIFKSKVDLLIRKLKQRTSGHEPFDVYGFVAACTLDMVY  (1) 6614

6543  ATTLGIEMNIQQQSSCEYLEILEELFELVTNRVTNVLLHYDWIYRWTSYYRRECKARKIF  6364

6363  QSPAQQVLRQKPILVSNATESDDLTQPQIFIDQLYRIAVKDPHFTKETIEKELNTMIFGG  6184

6183  NETTAVTMSNALLLIAMHPDVQLKLLEEFNVVFEGNLENMTVANLQRLVYMEAVLKEVMR  6004

6003  LWPITTILGRTTSTEVRLDEFLIPAEVNLVIDVYSIHRNARYWGADANRFVPERFFDREQ  5824

5823  YPYAFLGFSAGPRNCIGTRYAWLSMKVMLTAILYNFELRTPLRMEDIRLKVAMTLKVENK  5644

5643  HMITLSDRRK*  5611
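Difference counts such as "13 diffs all in first 167 aa" come from a position-by-position comparison. This comparison is only valid when the two proteins have no insertions or deletions relative to each other in the compared region. This is a minimal sketch, applied to the first exon 1 line of AAGE01027431 (3565-3741) and the matching 59 residues of AAGE02021468.1 (7153-6974). The helper names are illustrative only.

```python
def differences(seq_a, seq_b):
    """Return 1-based positions where two gap-free, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (no indels)")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1) if a != b]


def percent_identity(seq_a, seq_b):
    """Percent identical positions over the full (equal) length."""
    return 100.0 * (len(seq_a) - len(differences(seq_a, seq_b))) / len(seq_a)


A = "MITQIVLVSFIISLMYWWRNRLIHQQLGCLPGPFSLPVLGSSYIFLGKSYSEILDAFHR"        # AAGE01027431
B = "MITQIVLICFIISLTYWWHNRLIHQQLGCLPGPFSLPLLGSSYIFLGKSYSEILDAFHRI"[:59]  # AAGE02021468.1
print(differences(A, B))                 # [8, 9, 15, 19, 38]
print(round(percent_identity(A, B), 1))  # 91.5
```

For the trace-file variant with the 2 aa insertion, the sequences would have to be aligned first; a plain zip would shift every downstream position.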

 

note: 578965186 is a 100% match to AAGE01027431 above in the first exon

There must be two sequences

Trace file 581849161 also matches this region 100% for 189 bp, including the 4 amino acid differences

512981174 also matches including 6 differences from YIIINH

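>gnl|ti|578965186 name:1095030068504 mate:578923620 green = exon 1 (color highlighting not preserved; exon 1 begins at the ATG after the first space and ends with the phase 1 G after the second space)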

CAGGCCGGAATCCCCAAACAGTGATGCAACAGTATTTAATGAACTAAGTTCCATTATTATCATACCTACC

TCCCAACATCATCGCGTGATTCTGGCGTACCTAACTGCTACTGAGTAGGTAGCAGAGAGTAATGTTGTTT

GAGTACTGTAAGACTTGACAAAAGATACTAGATGCATACGCCGTGCATATCAACATGGTAACCCATTGTT

ATTCATGTATACGAATGACTTTCAAATTTTGGAAGTATAAATTCCCATTGAAAGCACATGAAGGAATGAA

TAAAACAGTGTTTCTAAGCTTTGTTTATACCAGTCAGAGATTTATAAC ATGATCACTCAAATCGTTCTTG

TCAGTTTCATCATTTCTCTTATGTACTGGTGGCGCAACCGCCTAATTCACCAACAATTGGGATGCCTTCC

AGGTCCGTTCAGCTTGCCGGTACTAGGATCAAGCTACATATTTCTTGGCAAATCTTATTCCGAAATATTG

GACGCTTTTCATCGGATTTCCTCTACCTATGGACGCAACGGAAGCCCTGTACGATTCTTCCTCGGCTCGA

AGCCTTACATAATCATAAATCACCCCGATCATGCTCAAACGATTCTAAATTCTACGGGCTGCCTGGACAA

ACCATGGATCTATCAGTACACGCCACTTGAGGGAATATTTTCACTACCAACACAGAAATGGCGAATCCAC

AGGAAAGCAATCCAACCAAGCTTCAATTGGTCTATTTTGAAAAGCTTCCTACCAATTTTTAAAAGTAAAG

CAGATTTACTGATACATAAACTAAAGCAACGAACTTCCAGTCACGAACCCTTCGATATCTATGGATTCGT

AGCAGCTTGCACATTGGATATGGTGTAT GGTAAGTTAAACATAATCTGTATGATGCGTTCATATTGATCT

CTAAAACAACTTCATCTTCCATTTGAAAGCTACCACCCTCGGCATAGAAATGAACATCCAGCAACAATCA

TCTTGTGAGTATCTGGAGATACTCGAAGAATTATTCGAATTGGTGACAAATCGTGTAACGAACGTTCTGC

TCCACTACGATTGGATCTATCGA
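The claim that trace 578965186 is a 100% match in the first exon can be checked by translating the trace from the ATG after the first space. This is a minimal sketch, assuming the standard genetic code. The 177 nucleotides below are copied from the trace lines above and should give the first 59 residues of AAGE01027431 (3565-3741).

```python
# Standard genetic code; codons indexed in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate the complete codons of dna, reading from position 0."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[n:n + 3]] for n in range(0, len(dna) - 2, 3))


# Exon 1 from trace 578965186, starting at the ATG after the first space.
EXON1 = (
    "ATGATCACTCAAATCGTTCTTG"
    "TCAGTTTCATCATTTCTCTTATGTACTGGTGGCGCAACCGCCTAATTCACCAACAATTGGGATGCCTTCC"
    "AGGTCCGTTCAGCTTGCCGGTACTAGGATCAAGCTACATATTTCTTGGCAAATCTTATTCCGAAATATTG"
    "GACGCTTTTCATCGG"
)
EXPECTED = "MITQIVLVSFIISLMYWWRNRLIHQQLGCLPGPFSLPVLGSSYIFLGKSYSEILDAFHR"

protein = translate(EXON1)
print(len(EXON1), protein == EXPECTED)  # 177 True
```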

 

The second exon in two of these trace files has a 2 aa insertion (NQ after ILV) and two other differences, shown below

This seq continues in 520523626

ATTLGIEMNIQQQSSCEYLEILEELFELVTNRVTNVLLHYDWIYRWTSYYRRECKARKIF

QSPAQQVLRQKPILVNQSNATESDDLTQPQIFIDQLYRIAVKDPHFTKETIEKELNTMIFGG

NETTAVTMSNALLLIAMHPDVQLKLLEEFNAVFEGNLENMTVENLQRLVYMEAVLKEVMR

LWPITTILGRTTSTEVRLDEFLIPA 

 

520523626

Query: 29  LGNMTVENLQRLVYMEAVLKEVMRLWPITTILGRTTSTEVRLDEFLIPAGVNLVIDVYSI 208

           L NMTV NLQRLVYMEAVLKEVMRLWPITTILGRTTSTEVRLDEFLIPA VNLVIDVYSI

Sbjct: 337 LENMTVANLQRLVYMEAVLKEVMRLWPITTILGRTTSTEVRLDEFLIPAEVNLVIDVYSI 396

 

Query: 209 HRNARYWGADANRFVPERFFDREQYPYAFLGFSAGPRNCIGTRYAWLSMKVMLTAILYNY 388

           HRNARYWGADANRFVPERFFDREQYPYAFLGFSAGPRNCIGTRYAWLSMKVMLTAILYN+

Sbjct: 397 HRNARYWGADANRFVPERFFDREQYPYAFLGFSAGPRNCIGTRYAWLSMKVMLTAILYNF 456

 

Query: 389 ELRTPLRMQDIRLKVAMTLKVENKHIITLSDRRK 490

           ELRTPLRM+DIRLKVAMTLKVENKH+ITLSDRRK

Sbjct: 457 ELRTPLRMEDIRLKVAMTLKVENKHMITLSDRRK 490

 

A complete second sequence is shown here

This appears to be a real gene 95% identical to AAGE02021468.1

Constructed from trace files 578965186, 581849161, 512981174, 520523626

MITQIVLVSFIISLMYWWRNRLIHQQLGCLPGPFSLPVLGSSYIFLGKSYSEILDAFHR

ISSTYGRNGSPVRFFLGSKPYIIINHPDHAQTILNSTGCLDKPWIYQYTPLEGIFS

LPTQKWRIHRKAIQPSFNWSILKSFLPIFKSKADLLIHKLKQRTSSHEPFDIYGFVAACT

LDMVY (1)

ATTLGIEMNIQQQSSCEYLEILEELFELVTNRVTNVLLHYDWIYRWTSYYRRECKARKIF

QSPAQQVLRQKPILVNQSNATESDDLTQPQIFIDQLYRIAVKDPHFTKETIEKELNTMIFGG

NETTAVTMSNALLLIAMHPDVQLKLLEEFNAVFEGNLENMTVENLQRLVYMEAVLKEVMR

LWPITTILGRTTSTEVRLDEFLIPAGVNLVIDVYSI

HRNARYWGADANRFVPERFFDREQYPYAFLGFSAGPRNCIGTRYAWLSMKVMLTAILYNY

ELRTPLRMQDIRLKVAMTLKVENKHIITLSDRRK

 

>AAGE01054827 AAGE01084906 AAGE01489548 41% to AAGE01025218 complete

44% to AAGE01064173

used first 500bp of AAGE01054827 for (-) trace files

644304896 (-) mate = 639417477 no exact match

579477042 (-) mate = 579483479

matches AAGE01212501.1 (-) 1639 bp no P450

>AAGE01097831.1 exon from phase 2 to ILQ boundary 54% to AAGE01224146

may join with AAGE01054827

searched trace files with first 250bp to correct I-helix region and extend

813896966 joins with AAGE01075759

AAGE01075759 67% to AAGE01224146 813896966, 68% to AAGE01224146

used (+) matches to continue with mate pairs for upstream seq

593176095(+) mate = 593182547 matches a repeat

758931963 (+) mate = 758929444 matches a repeat

used AAGE01075759 2101-2640 to avoid the repeat; looked for (+) matches

575421893(+) 100% match, but still in repeat, mate = 575475472

matches AAGE01212501.1 (+) 1639 bp no P450

note same contig found in search from AAGE01054827

join
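The join above follows a general pattern. When a read hits one contig and its mate hits another, the two contigs lie within one library insert of each other, so chromosome walking becomes a path search over contigs linked by mate pairs. This is a minimal sketch using breadth-first search. The data structure and function names are hypothetical. The two links are the ones recorded in this entry: 579477042/579483479 and 575421893/575475472, both of which land in AAGE01212501.

```python
from collections import deque


def build_graph(mate_links):
    """mate_links: (contig_of_read, contig_of_mate, read_id, mate_id) tuples.

    Each link means the two contigs lie within one library insert of each other.
    """
    graph = {}
    for contig_a, contig_b, read_id, mate_id in mate_links:
        graph.setdefault(contig_a, []).append((contig_b, read_id, mate_id))
        graph.setdefault(contig_b, []).append((contig_a, mate_id, read_id))
    return graph


def shortest_walk(graph, start, goal):
    """Breadth-first search: fewest mate-pair hops from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour, _, _ in graph.get(path[-1], []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


LINKS = [
    ("AAGE01054827", "AAGE01212501", "579477042", "579483479"),
    ("AAGE01075759", "AAGE01212501", "575421893", "575475472"),
]
print(shortest_walk(build_graph(LINKS), "AAGE01054827", "AAGE01075759"))
# ['AAGE01054827', 'AAGE01212501', 'AAGE01075759']
```

Breadth-first search returns the chain with the fewest hops, which also limits the accumulated uncertainty from insert sizes at each hop.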

3548 MLIPLLTLFVCLLILVVVIINFVNTVQAKYGFAKNLPTVTRADESIVKLLWRLVRASDVDKFNRVVEA 3345

3344 FSLPYRLWKVWLGPVVCLGVCHPDLVQIVLTHPDCLEKPFIYRFIRINRGLLVAE (1) 3180

3098 AELWKRHRKVLNSAFNLRILHGFIPIFEKCCSRMVSDLKQMKDGETFDVMRFTARC 2931

2930 TLEMVFETSLGTGCLPPSESDCLIRHIKR (2) 2844

1623 FFNIASSRFLNVHLHYEPIFRLTQRSKKETESLHYCNTVVKQ (0) 1498

1424 LLTEKRQNVTKTPETPDDEFGKPRIFVDQLLRLGNSFSETDIIHNVFTMIIA (0) 1269

     GNDTSGQLMAYACLFLGMYPHIQEKVHAEIVEIIPRHHNEPISPEKLKSLAYTEM

3057 FLNECLRLCPTAPNIVRQNMAPITLDETRIPAGNLLAISLFAYHRRKDIWGPDADEFDPD 2878

2877 RFSAERSAGRHPFAFLPFSGGSRNCIGWRYAMISMKLMLVYLLREFKIKTDIRHQDMAFR 2698

2697 FNAALVLAGKH* 2677

 

>AAGE01132222 40% to AAGE01050332 N-term 42% to AAGE01064173 complete

AAGE01009694 494159924 51% to TC59131 47% to 325K1, 52% to AAGE01239763

use top 500bp to find trace file seqs with (-) strand matches

578469145 (-) mate = 578753270

matches AAGE01043606.1 (+) 4116bp no P450

use top 500bp to find trace file seqs with (-) strand matches

587753164 (-) mate = 587749602

matches AAGE01438960.1 (-) 1045bp no P450

636100588(-) mate = 636100414

matches AAGE01132222.1 (-) 2190 bp join with P450 N-term

1153 MLLVLAIVLVLLLILLVDSTLKHHVGKFARSLESVSPNYPLLGSATVFLGHSEERRFENFM 971

 970 NMLRQVDRIGKGWLGPQLMFYVAHPELVQKVLTDPNCSEKPFFYEFSRLTHGLFSAK (1) 800

     YSIWKPNRKALNPTMNVKMLNSFVPIFERFSRSMVEKLKCYPEGTPVDI 599

 598 LDFTTECALEMVCGTTLGTALKKGSGKRKFLESMQT (2) 494

 432 FISRVATRTLSVSLYNESFYRLTRAYNEEENARKYCLDFAKC (0) 307

IIEERQQVLVTEPQSKNSDENEDDDGYQIRRPKLFIDQVLSGNNTEADISTQNLSEQILTIMAA (0)

GYDTSANMVAHSCLFLAIFPELQEKVAQEIQTVLPDCEQELTAETLKDLPYLDKFFKECL

RLTPVGSTIARVNMTDIELDGCRIPKGNIFIFNFYVLHRRKDIWGPYAEQFDPENFS

PERSKGRHPFAFLPFSGGSRNCIGARYAMISNKIMIIHLVRNFRMSTKIRF

EDLKYRINVTLNLAFKHLITLEARR*

 

>AAGE01077592.1 N-term 37% to 325H1 same as AAGE01398514.1

use bottom 1000bp to search for (+) strand matches for mate pairs

529517080 (+) mate = 529517079

matches AAGE01008879.1 (-) 9326 bp no P450

use bottom 1000bp of AAGE01008879 to continue

look for (+) strand matches

578668939 (+) mate = 578777890

matches AAGE01190920.1 (-) 1749bp no P450

819718395 (+) mate = 820309477

matches AAGE01181611.1 (+) 1804 bp no P450

also matches AAGE01190920.1 (-)

620666104 (+) mate = 618185026

matches AAGE01179350.1 (+) 1818bp

use bottom of AAGE01190920 look for (+) strand

521973383 (+) mate = 520697038 matches a repeat

755995765 (+) mate = 755992211

matches AAGE01400949.1 (-) 1113 bp

757659867(+) mate = 757662389

matches AAGE01400949.1 (-)

use top of AAGE01179350 to find (-) strand matches

568764550 (-) mate = 578941007

matches AAGE01023525.1 (-) 6010bp no P450

use bottom of AAGE01023525 to find (+) strand matches

579862900(+) mate = 579996877 no good match only 95%

630748606(+) mate = 627442145

matches AAGE01032128.1(+) 5053bp

also matches AAGE01213139.1 (+) 1636bp

use top of AAGE01213139 to find (-) matches

757669846(-) mate = 754351538

matches AAGE01083633 (-) 2889 bp no P450

579851495(-) no mate found

1230 MLQLVLVFVLFTGFTYYLAFRRSRKRLYELAATFPAPFDLP

     LIGSTYIGIGLNSKTIIEYLLKFLHNLPSPFRAWMGPFLGIIFDKPQHLAVILNSQHCVQ

     KSVFQKFFRFDKGLINSDRNIWRPQRKQLAAPFSYQVVANFAPSFNEYAEEQLKYLDRFV

     GAEAFDMLPKLSFYVLSSTLANLFKVQLHSHDYDFMEKFVKNSEQ (2) 1847

1910 MWINIFRRVYKPWLISEFIYRLTPAYKMELQQVGKLRALSEE (0) 2035

 

AAGE02000578.1 complete

Length=111930

trace files that match this seq 835981972, 585800596, 578620585 (last exon)

Note 20kb intron

42754  MLQLVLVFVLFTGFTYYLAFRRSRKRLYELAATFPAPFDLPLIGSTYIGIGLNSKTIIEY  42575

42574  LLKFLHNLPSPFRAWMGPFLGIIFDKPQHLAVILNSQHCVQKSVFQKFFRFDKGLINSDR  42395

42394  NIWRPQRKQLAAPFSYQVVANFAPSFNEYAEEQLKYLDRFVGAEAFDMLPKLSFYVLSST  42215

42214  LANLFKVQLHSHDYDFMEKFVKNSEQ (2) 42137

42074  MWINIFRRVYKPWLISEFIYRLTPAYKMELQQVGKLRALSEE (0) 41949

21976  IVEARKVLQQKSHPGDDHNASSEV 21905

21904  LIERLERLTYQTGEMTNEEMMDNIDTFLFAAVDTTTSTMASTLLMMAIHPEVQER  21740

21739  VYQEVSQVVPNDYIAIEDLPNLVYLERVMKETMRLIPIAGMLNRVCEKELQVGEWTIPVG  21560

21559  ATIGIPVLKVHRDRAIWGERSDEFDPDNFLPEKVAQRHPYAYIPFSAGIRNCVGMRYANV  21380

21379  SMKVLLAKLVKRFRFKTDLRMKDLKFEAAFLMMLANKHMMRIEKR*  21242
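The "20kb intron" note follows from the coordinates. On the (-) strand, the gap between the end of one aligned segment and the start of the next is the difference of the two coordinates minus one. This is a minimal sketch. The segment boundaries are the last and first complete codons listed above, so when a boundary splits a codon (phase 1 or 2), the true intron is a few bases shorter than the reported gap.

```python
def coding_gaps(segments):
    """segments: (first, last) coordinates of each aligned coding segment,
    in gene order, on either strand. Returns the number of bases strictly
    between consecutive segments."""
    return [abs(next_first - prev_last) - 1
            for (_, prev_last), (next_first, _) in zip(segments, segments[1:])]


# AAGE02000578.1, (-) strand, coordinates from the lines above.
EXONS = [(42754, 42137), (42074, 41949), (21976, 21242)]
print(coding_gaps(EXONS))  # [62, 19972]
```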

 

Note: trace 618127454 matches AAGE01531287 100%, including the five differences above. The nucleotide seq is 96% identical to AAGE02000578 (last exon), differing at 21 positions.

476383722, 575812863 also match this seq.

There are two seqs (possible alleles?)

I could not find differences in the first two exons

 

>AAGE01056055.1 N-term 61% to 325E1 57% to AAGE01193335.1

use first 500 bp to find a (-) strand match in megablast, get the mate pair

to move toward exon 2

578797200 (-) mate = 578664560

matches AAGE01174552.1 (+) 1848bp

575416824 (-) mate = 575383266

matches AAGE01180945.1 (+) 1808bp no P450

575353233 (-) mate = 575356809

matches AAGE01180945.1 (+)

use first 500bp of AAGE01180945.1 to find (-) strand match

581543226 (-) mate = 581549670

matches AAGE01277447.1 (+) 1402bp no P450

586630848 (-) mate = 586627267 matches AAGE01277447.1 (+)

use first 500 bp of AAGE01277447 to find (-) match

648073848 (-) mate = 648073771

matches AAGE01397305.1 (+) 1120bp

584129384 (-) mate = 584132944

matches AAGE01193335.1 (-) 1736bp (another P450 N-term)

also matches AAGE01089335.1 (+) 2777bp

end of this P450 may be in one of the three gaps between contigs

check gap between AAGE01277447 and AAGE01180945

759108191(+) matches top of AAGE01180945 and goes upstream 422bp

this region is in a repeat

754353423(-) goes about 675bp upstream

matches AAGE01128818.1 (-) 2226bp, no P450, but

this same contig was found in AAGE01531287 search

also mate pair of 754353423 = 754349249 matches AAGE01193335(-)

and AAGE01089335.1(+)

631542973 matches the end of AAGE01277447 and goes downstream about 580bp

This region matches (no good WGS match)

join AAGE01531287 and AAGE01056055

AAGE01531287.1 46% to 325F2  476383722 38% to TC65595 43% to AAGE01347884

AAGE01381118.1 3 aa diffs 618127454 578620585

Next contig = AAGE01128818.1 no P450

Use the top of AAGE01128818 to find (-) strand trace file matches to continue.

No (-) strand matches found.  Used 835983442 to move upstream about 500 bp, used this seq to look for next upstream contig

This was in a repeat region

576268671 goes upstream 576bp; 580164240 extends about 500bp more

521859673 goes 682 bp more

took most upstream seq and blasted WGS for next contig

this matches AAGE01128818(-)

use the last 500bp to search for (+) matches

(used the wrong end before)

586093264 (+) mate = 586089708 contains a 23bp repeat AATTTTCCTGGAATTTTGAACAG

matches AAGE01029644.1 (+) 5300bp no P450

take the end and look for (+) matches

587583422(+) mate = 587579857

matches AAGE01527387.1 (+) 924bp no P450

586096527(+) mate = 586100102

matches AAGE01523698.1 (-) 929bp

also AAGE01079226.1 (-) 2981bp no P450

Note this gene is still missing WXXXR to ETAM, ETAM to phase 2, and phase 2 to ILQ

These pieces may be between AAGE01531287 and AAGE01128818

Or AAGE01180945 and AAGE01056055

Use bottom of AAGE01531287 to extend into gap

Matches 586206960 goes downstream about 500bp matches AAGE01128818

Missing exons not here so must be between AAGE01180945 and AAGE01056055

Used bottom of AAGE01180945 to extend into gap

588808258 (+)100% match extends about 400bp

matches AAGE01464688.1(-) 1007bp no P450

used top of AAGE01464688 to go up, 749551959(-) this

matches AAGE01174552.1 (+), seen above, no P450; look

between AAGE01174552 end and AAGE01056055 beginning

759111770(-) extends AAGE01174552 end 419bp

matches AAGE01174704 96% not best match no P450

824260043(-) extends AAGE01056055 upstream 650 bp no good WGS match

these missing exons may be in a seq. gap

2665  MFGLTFALIVVYLLALYVYAKIKYRFANKIPSIEPMVPFFGNGLEFAQKNCYKIFVNLKRIF  2480

2479  ENNKHHRLFKLCFGPIVVLCPTHPDLIQKVMTDTGSMEKPYVYEFLRVDLGLLSAK  (1)tgt  2312

missing WXXXR to ILQ

 (0) IVEARKVLQQKSHPGDDHNASSEDI

918  IERLERLTYQTGEMTNEEMMDNIDTFLFAAVDTTTSTMASTLLMMAIHPEVQERVYQEVSQ  734

733  VVPNDYIAIEDLPNLVYLERVIKETMRLIPIAVMLNRVCEKELQVGEWTIPVGATIG  563

562  IPVLKVHRDRAVWGERSDEFDPDNFLPEKVAQRHPYAYIPFSAGIRNCVGMRYANVSMKV  383

382  LLAKLVKRFRFKTDLKMKDLKFEAAFLMMLANKHMMRIERR*
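Much of the walking in this entry alternates between (+) and (-) strand matches. A probe hits the (-) strand of a contig when its reverse complement occurs in the contig as written. This is a minimal sketch of that check, using the 23 bp repeat noted in trace 586093264. The contig in the usage lines is made up for illustration, and the function names are hypothetical.

```python
# Complement table for unambiguous bases; any other character is left unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_on_both_strands(contig, motif):
    """Return (strand, 0-based start in contig) for every exact hit of motif.

    A '-' hit means the reverse complement of motif occurs in contig.
    Palindromic motifs are reported on both strands.
    """
    hits = []
    for strand, probe in (("+", motif), ("-", reverse_complement(motif))):
        start = contig.find(probe)
        while start != -1:
            hits.append((strand, start))
            start = contig.find(probe, start + 1)
    return hits


REPEAT = "AATTTTCCTGGAATTTTGAACAG"  # 23 bp repeat from trace 586093264
print(reverse_complement(REPEAT))  # CTGTTCAAAATTCCAGGAAAATT

# Hypothetical contig: the repeat on (+), two spacer bases, then on (-).
demo_contig = "GG" + REPEAT + "TT" + reverse_complement(REPEAT)
print(find_on_both_strands(demo_contig, REPEAT))  # [('+', 2), ('-', 27)]
```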

 

>AAGE01102953.1 N-term 41% to 325C2 63% to AAGE01205264

use the top to look for (-) strand matches

578991895(-) mate = 578989377 4000bp away

matches AAGE01203025.1 (+) 1685 bp

744451246 (-) mate = 744448722 4000bp away matches AAGE01203025

579985225(-) mate = 579831226 matches AAGE01203025

use the top of AAGE01203025 to find (-) strand matches

521843675(-) mate = 521843676 8500bp away

matches AAGE01134554.1(-) 2167 bp no P450

755155471(-) mate = 755151937 matches AAGE01134554

519673863 (-) mate = 519832722 no ideal matches

use bottom of AAGE01134554 to find (+) strand matches

759361419(+) mate = 759364946 4000bp away

matches AAGE01333805.1 (-) 1256bp

574325046(+) mate = 574328615 matches AAGE01333805

574309681(+) mate = 574306139 matches AAGE01333805

use bottom of AAGE01333805 to find (+) strand matches

574303286(+) mate = 574310750 4000bp away

matches AAGE01574296.1 (-) 853bp

818802134(+) mate = 818798999 3500bp away

matches AAGE01606128.1 (-) 798bp

521922340(+) mate = 521922341 8500bp away

matches AAGE01570704.1 (+) 859bp

also matches AAGE01423387.1 (-) 1072bp

also matches AAGE01205264.1 (+) 1674bp this seq has a P450

The P450 is only another N-term and not a complementary seq.

The end of AAGE01102953 must lie between these two N-terms

Go back from AAGE01205264 to try to find it

Use end of AAGE01205264 to find (+) matches

644326835(+) 100%, but in a repeat region mate = 644291825

matches AAGE01173044.1 (+) 1858 bp

use bottom of AAGE01173044 to find (+) matches

520174168 (+) mate = 520529036

matches AAGE01161151.1 (-) 1942bp no P450

759234929(+) mate = 756228559 in a repeat

note: 574149612 is the mate pair of 574146059,

a 98% match to AAGE01102953 (902-1440bp, (+) strand).

574149612 blast matches AAGE01026029(-)

this means AAGE01026029 is upstream of AAGE01102953

in the same orientation.  These mate pairs are 40,000bp apart

Since the C-term is in front of the N-term they must be from two

different genes.

1062  MLLVLSVFVVILCCVLFVSHRRKYKFADAVPSLQPVYPLLGNADIMWKSDTERFETIV  889

888   KIFSEHDRMVKVWAGPQMLLFTCHPDLVQQILSSSDCLEKPFLYSFAGFERGLFTSK (1)tgt  718
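The two-genes argument above rests on a simple ordering rule. Within one gene, the N-terminal coding piece must come before the C-terminal piece in the direction of transcription. This is a minimal sketch of that rule. The scaffold coordinates in the usage lines are hypothetical and chosen only to mirror the roughly 40,000 bp mate-pair separation noted above.

```python
def can_be_same_gene(nterm_pos, cterm_pos, strand):
    """Return True if an N-terminal piece at nterm_pos could precede a
    C-terminal piece at cterm_pos within one gene on the given strand.

    Positions are coordinates on a common (+) oriented scaffold.
    """
    if strand == "+":
        return nterm_pos < cterm_pos
    if strand == "-":
        return nterm_pos > cterm_pos
    raise ValueError("strand must be '+' or '-'")


# Hypothetical layout: C-term piece at 0, N-term piece 40,000 bp downstream,
# both pieces in the same (+) orientation.
print(can_be_same_gene(nterm_pos=40_000, cterm_pos=0, strand="+"))  # False
print(can_be_same_gene(nterm_pos=0, cterm_pos=40_000, strand="+"))  # True
```

A False result means either the pieces belong to different genes or the contig placement is wrong.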

 

>AAGE01011475.1 N-term 39% to 325C2 57% TO AAGE01205264

576763861 (-) mate = 576734824

matches AAGE01470952.1 (+) 998 bp

AAGE01470952 matches traces 584345722(+)

576734824(+) 759713549(-) 513462158(-) 644338159(-)

832391075 extends 576734824 toward AAGE01011475

588911372 extends 832391075 toward AAGE01011475

588911372 matches AAGE01277343.1(+) 1403bp

819687897 extends AAGE01277343 by 600bp

753930249 extends 819687897 by 300bp

this matches AAGE01478066.1 (+)

569795945 adds 200bp

576763861 also matches AAGE01416311.1 (-) 1084bp

use the top of AAGE01470952 to find (-) matches

513462158(-) mate = 513455708

matches AAGE01108027.1 (+) 2475bp

644338159 (-) mate = 644341403

matches AAGE01000025.1 (-) 31853 bp in repeat

also matches AAGE01020018.1 (-) 6517bp in repeat

use top of AAGE01108027 to find (-) strand matches

821712794(-) mate = 822910015

matches AAGE01412652.1 (-) 1091bp

575363756(-) mate = 575417301

matches AAGE01201801.1(-) 1691bp

use bottom of AAGE01201801 to find (+) matches

587969133(+) mate = 587971667

matches AAGE01314237.1(+) 1304bp

also matches AAGE01199108.1 (-) 1705bp

755591529(+) mate = 755588994

matches AAGE01314237

use top of AAGE01314237 to find (-) strand matches

520544941(-) mate = 519998176

matches AAGE01380710.1 (-) 1154bp

also matches AAGE01070573.1(-) 3188bp

776272502(-) mate = 760988636 (no good match), 776276064 matches AAGE01070573

use the bottom of AAGE01070573 to find (+) matches

581448944(+) mate = 581451482

matches AAGE01012162.1 (-) 8204bp no P450

use the bottom of AAGE01012162 to find (+) matches

815215884(+) mate = 815228284

matches AAGE01018936.1 (-) 6693bp no P450

630764106(+) mate = 627377245

matches AAGE01018936

use the bottom of AAGE01018936 to find (+) matches

827562568(+) mate = 826189694

matches AAGE01344368.1 (-) 1232 bp

 

note try extending from exon 1

638540618 adds 292bp downstream

638532364 continues to extend from 435-977

579935629 extends about 300 bp more; this then matches

AAGE01020538.1. Look for (-) matches to top of seq

528804330(-) extends 400bp mate = 521977783

matches AAGE01107049.1 (-) 2488bp

579020337(-) mate = 579026800

825733263 extends 528804330 seq about 400bp

mate = 825726173 matches AAGE01281689.1 (+) 1390bp

also matches AAGE01448499.1(-) 1030bp

try to extend AAGE01107049

581836134 (+) extends mate = 581839689 matches AAGE01164529

625165915(+) extends mate = 625084666 matches AAGE01164529

587867770(+) extends mate = 587871313

matches AAGE01164529.1(+) 1917bp

AAGE01026029 TC59131 TC50633 41% to CYP325C2 476413823, 64% to AAGE01111617

used bottom of AAGE01026029 to find (+) strand trace files for mate pairs

819714450 (+) mate = 819714180

matches AAGE01378352.1 (+)  1158bp no P450

571195913 (+) mate = 571382711 40,000bp away

matches AAGE01610084.1 (+) 790bp no P450

757966935(+) mate = 757964404

matches AAGE01076861.1 (-) 3034bp

use bottom of AAGE01076861 to find (+) strand trace files

600027741 (+) mate = 600565908

matches AAGE01218034.1 (-) 1614 bp no P450

use bottom of AAGE01218034 to find (+) strand trace files

522072791(+) mate = 528934371 matches AAGE01011475(-) (N-term P450)

Join

 

2039  MLFLILTVTVSILGAILYWNFRARYRFSDKWPTLKPVYPILGNGPVVMGKNEVDRFEIIR  1860

1859  DVCYSAERILKIWAGPKLLLLTSHPDLIQQILTSPVCLEKPYLYHFAGFEEGLFTAK (1)tgt 1689

missing exon 2

VWKPARKRLNPAFNLRIIHGFVPIMARCAQKMAARLNKYPDGATVDIIKYTNMCTLEMICGTTMGSDVLNRDGKEEFKRGLD

5684 gAFNGAAWRMMNVHLYPDIIYKMTRYHKELTEARKIVCDFFTK (0)  5559

4999 KQILQQKRLNNDEKNNNDEEENEADNHKPKILVDLLLSNSSDGKPFTESQITDNVYAVITG (0)

AVDTTALITAHACLFMSFYPEIQERVFAEINQYFPVGSDDQEVTHEQFR

QLTYTEMFLNEVQRHWTPVPLIARENMAEFEIDGVKVPPGHVFGLSLHALHMRKDVWGPD

ADRFDPENFSEERAKNRHPFAFLPFSGGTRICL (1)

GWRYASFSMKAVMVHLVKNFKFSSKIKPEDIRFKHDLTMKLPFEHLVQITKRNPVAN*

 

AAGE02009114.1 EST DV278487.1 use this seq

Length=83514

 

17904  MLFLILTVTVSILGAILYWNFRARYRFSDKWPTLKPVYPILGNGPVVMGKNEVDRFEIIR  18083

18084  DVCYSAERILKIWAGPKLLLLTSHPDLIQQILTSPVCLEKPYLYHFAGFEEGLFTAK  (1) 18254

36838  YHVWKPARKRLNPAFNLRIIHGFVPIMARCAQKMAARLNKYPDGATVDIIKYTNMCTLEMI  37020

37021  CGTTMGSDVLNRDGKEEFKRGLDG (2)

       AFNGAAWRMMNVHLYP  37200

37201  DIIYKMTRYHKELTEARKIVCDFFTK (0) 37278

37823  LIVEKKQILQQKRLNNDEKNNNDEEENEADNHKPKILVDLLLSNSSDGKPFTESQITDNV  38002

38003  YAVITG (0) 38020

38170  AVDTTALITAHACLFMSFYPEIQERVFAEINQYFPVGSDDQEVTHEQFRQLTYTEMFLNE  38349

38350  VQRHWTPVPLIARENMAEFEIDGVKVPPGHVFGLSLHALHMRKDVWGPDADRFDPENFSE  38529

38530  ERAKNRHPFAFLPFSGGTRICL (1)

       GWRYASFSMKAVMVHLVK  38709

38710  NFKFSSKIKPEDIRFKHDLTMKLPFEHLVQITKRNPVAN * 38829

 

>AAGE01046733 N-term 41% to 325C1 50% to AAGE01064173, 53% to AAGE01102953

used last 500bp to search for trace files (+) strand to get mate pairs

575129702(+) mate = 575133272 about 4000bp away

matches AAGE01078543.1 (+) 2996bp no P450

832524160 (+) mate = 832511205

matches AAGE01295016.1 (+) 1353bp no P450

AAGE01111617 44% to 325C2 62% to AAGE01026029

use bottom 500bp to find (+) strand matches

637074140(+) mate = 636170633 about 3500bp away

matches AAGE01198799.1 (-) 1707 bp no P450

587396559 (+) mate = 589186032 matches AAGE01198799.1

use the bottom 500 bp of AAGE01198799 to find (+) strand matches

813882368(+) mate = 813145290, 813876578 about 3500bp away

matches AAGE01078543.1 (-); the search down from AAGE01046733 also

found this contig, so these two sequences should be joined; complete

2272 MYLLVFFGTIATLGTLWYWTYRWQFKFADKWPSVRPRYAIVGNALIMLWKNDVQRFQEIKRV 2457

2458 FSECDRILTAWAGPKMFLITSHPDIVHQILSSPDCLERPFLYRFAGFTQGIFTAK (1) 2622

2696 LPVWKDNRKRLNSTFNQRIVHGFVPYFVKCCEKMTKSLLECADGETVNIQKYTAVCALEMA 2872

2873 AGTTLGGDVLQQGDGKEEFKRGLDL (2)

3013 AFNGASRRMVTVPFYSDLIYQMTHHYKELMEGRRIICDFFTK (0) 3138

2151 LLIERKKFLLDHSKNTDVDTEEEYNKPKILVDQLLGVSHDGRQFNDIQIR

     DNVYAVITGATDTTSLATAHACLFLSFY 1972

1971 PDIQERLHAELAEVFPGNIADYTPENIKKLTYLDMFINEVQRHCTVVPYVARENTAEI 1798

1797 EIDGVKVPPGNIFIMSLYAMHKRPDIWGPDAEKFDPENFTEERIKDRHPAAFLPYSAGSKNCL 1582

1550 GWRYAIFGMKLIMIHLVRNFHFSSKIKHEDMQFRHDLTLKLPFQHLVQLKKRNPGKILTMVE*

 

AAGE02009113.1 Length=35743

33020  MYLLVFFGTIATLGTLWYWTYRWQFKFADKWPSVRPRYAIVGNALIMLWKNDVQRFQEIK  33199

33200  RVFSECDRILTAWAGPKMFLITSHPDIVHQILSSPDCLERPFLYRFAGFTQGIFTAK  (1) 33370

33414  LPVWKDNRKRLNSTFNQRIVHGFVPYFVKCCEKMTKSLLECADGETVNIQKYTAVC  33605

33606  ALEMAAGTTLGGDVLQQGDGKEEFKRGLDL  (2) 33695

33761  AFNGASRRMVTVPFYSDLIYQMTHHYKELMEGRRIICDFFTK (0)  33886

Continues on AAGE02009114.1 Length=83514

8595  LLIERKKFLLDHSKNTDVDTEEEYNKPKILVDQLLGVSHDGRQFNDIQIRDNVYAVITGA  8774

8775  TDTTSLATAHACLFLSFYPDIQERLHAELAEVFPGNIADYTPENIKKLTYLDMFINEVQR  8954

8955  HCTVVPYVARENTAEIEIDGVKVPPGNIFIMSLYAMHKRPDIWGPDAEKFDPENFTEERI  9134

9135  KDRHPAAFLPYSAGSKNCL (1)  9191

      GWRYAIFGMKLIM  9288

9289  IHLVRNFHFSSKIKHEDMQFRHDLTLKLPFQHLVQLKKRNPGKILTMVE  9435

 

 

>AAGE01025964.1 Length=5712 looks like pseudogene version of exon 3 from

AAGE01046733 gene with two frameshifts 44%

AFIG

ASSRS

4798  VSFPFFDDHFHQQHRHLKRLLVHRRALFDCTLL (0)  4700

GCATTCATCGGC GCATCGTCACGTTCT GTGTCTTTTCCATTCTTCGACGATCACTTTCATCAGCAGCATCGTCATCTGAAACGTC TTCTTGTTCACCGGCGTGCCCTCTTTGATTGCACTTTGCTCGT

 

>AAGE01050332 N-term 43% to 325C3 68% to AAGE01064173 complete

probably joins AAGE01062475 since both parts are very similar to AAGE01064173

These seqs seem to be about 25kb apart

used first 2500bp to do a mate pair search

823383326(-) mate = 822882346

matches AAGE01146758.1 (+) 2057bp

519958430(-) mate = 520159488

matches AAGE01492644.1 (-) 970 bp

also matches AAGE01007236.1 (-) 10050bp no P450 in a repeat

[some (+) to 7236 = 588906424, 584180985, 578472808

570791474 (near end) mate = 570802630 8500bp away

528805831 mate = 520588493 8500bp away

520588493 matches AAGE01435940.1(+) also found

in search from C-term direction starting with AAGE01062475

JOIN

644343133(-) mate = 644350619

matches AAGE01147442.1 (-) 2051 bp no P450

520003334(-) mate = 520199507

matches AAGE01147442

639101672(-) mate = 637789479

matches AAGE01146758

[matches to AAGE01146758 = 569633492(+)

520559431(-) 568528128(-) 579710222(+)

753071933(-) mate = 753068371 matches AAGE01146758

572559845 (-) mate = 572556276 matches AAGE01147442

581849846(-) mate = 581712988 matches AAGE01146758

use top of AAGE01146758 to find (-) matches (in a repeat)

use bottom of AAGE01147442 to find (+) matches

633007683(+) mate = 633009196

matches AAGE01007236.1 (-) 10050bp no P450 in a repeat

579756463(+) mate = 579750017

matches AAGE01492644.1 (-) 970bp no P450

520514512(+) mate = 520119306

matches AAGE01293135.1 (-) 1358bp

818781498(+) mate = 813898164

matches AAGE01545749.1(-) 897bp

757662582(+) mate = 757660034

note did a mate pair search of whole sequence

570569222 is a 96% (-) match with mate pair 570572811

this mate pair matches AAGE01224146

note: the mate pairs are 40,000bp apart

570572946  is a 96% (-) match with mate pair 570596480

this mate pair matches AAGE01224146

note: the mate pairs are 40,000bp apart

AAGE01121756.1(-)  overlaps end of AAGE01147442 by 400bp

2515 MCLYLLVSTFVLAFVWICESLRRKNAFAKNLPMAKPIKSFLGVDYSIMDMSDEERFEVMNDCF 2327

2326 ARFDRLFVFYTGPLLVLAVSHPDLVQKLLSHPDCLEKPYFYDFVKFEQGIFSAK (1) 2165

1629 YKLWKGQRKALNPTFNLRILHSFFPIFDECSKKLVQELKKLPKGETVNLFRYTSHCALEMV 1450

1449 CGTTLGSDVLEREGKDEFLCALEE (2) 1381

1323 IFGLVSRRMLSVHLYSDLIYMMTPAYWKEQFARNKLRSFAMR (0) 1198

 

AAGE02000569.1 use this seq

Length=71780

 

33986  MCLYLLVSTFVLAFVWICESLRRKNAFAKNLPMAKPIKSFLGVDYSIMDMSDEERFEVMN  34165

34166  DCFARFDRLFVFYTGPLLVLAVSHPDLVQKLLSHPDCLEKPYFYDFVKFEQGIFSAK  34336

34872  KLWKGQRKALNPTFNLRILHSFFPIFDECSKKLVQELKKLPKGETVNLFRYTSHCALEMV  35051

35052  CGTTLGSDVLEREGKDEFLCALEE (2)

       IFGLVSRRMLSVHLYSDL  35231

35232  IYMMTPAYWKEQFARNKLRSFAMK (0) 35303

49327  ILQEKKETARDTKTNGTDLDSEPETEEFKKPQIFIDQLLSISDISRSFTDEEILCNVLVI  49506

49507  MIA (0) 49515

49575  GNDTSGLAVAYGCLFLAMFPQIQERVYAEIMEHFPSDEMEITADSLRLLEYTERFLKETL  49754

49755  RHCPVAANIARENMKDIELDGVMIPAGTKFTVSFWALHRRADMWGPEVHSFDPDHFLPER  49934

49935  CRDRNPNAYMPFSTGARNCI (1) 49994

50199  GRYAMLSTKVMLIHILKNFKITTKLRFEDMRYKFGMTLKMSTDHLVQLERRF*  50357

 

 

 

>AAGE01062475 3413bp 494267002 53% to 494159924 53% to 325C 67% to AAGE01196445

use top 500bp of AAGE01062475 to find (-) matches and get the mate pairs

570785063(-) mate = 569751422

matches AAGE01407923.1(+) 1100 bp

529462795(-) mate = 529069593

use the top of AAGE01407923 to find (-) matches

575474529 (-) mate = 574253700,575420959

matches AAGE01433232.1 (-) 1055bp no P450

also matches AAGE01445502.1 (+) 1035bp no P450

use the end of AAGE01433232 to find (+) matches

578624953(+) mate= 578709633

matches AAGE01073717.1 (+) 3109bp no P450

use top of AAGE01073717 to find (-) matches

832544134 (-) mate = 832533932

matches AAGE01531222.1 (-) 918bp

823397032 (-) mate = 824324730

matches AAGE01466172.1 (-) 1005bp

use bottom of AAGE01466172 to find (+) strand matches (in repeat)

use the end of AAGE01433232 to find (+) matches again

578657876(+) mate = 578708713

matches AAGE01111705.1 (+) 2426bp no P450

use the bottom of AAGE01111705 for (+) strand matches

578708713(+) mate = 578657876

matches AAGE01238879.1 (+) 1529bp no P450

622012470 (+) mate = 620865717

matches AAGE01358323.1 (-) 1201bp no P450

use the bottom of AAGE01358323 for (+) strand matches

runs into a repeat

used first 500 bp of AAGE01062475 to find upstream seq

754331375 (-) goes upstream about 500bp, used this to find contig

no good match in WGS

mate = 754341923 matches AAGE01358323.1 (+) 1201 bp

use top of AAGE01358323 for (+) matches (bottom had repeat matches)

still in repeat

This older strategy lacked information on insert sizes and went much too far upstream.

Used more (-) strand matches and mate pairs to go upstream only 4000-8000bp
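The lesson recorded here is that walking without regard to insert size overshoots. This can be made explicit by discarding mate pairs whose library insert exceeds a chosen limit before following them. This is a minimal sketch. The 8,500 bp limit is an assumption chosen to keep the 3,500, 4,000 and 8,500 bp inserts listed in this entry. The tuples are (read, mate, insert) taken from the notes in this entry, and the function name is hypothetical.

```python
def plausible_links(mate_pairs, max_insert=8500):
    """Keep (read, mate, insert_bp) tuples whose insert is at most max_insert."""
    return [pair for pair in mate_pairs if pair[2] <= max_insert]


MATE_PAIRS = [
    ("587973945", "587971423", 4000),
    ("574232774", "574236320", 4000),
    ("529565709", "529515150", 8500),
    ("820321218", "821692110", 3500),
    ("826039570", "826154743", 3500),
    ("754331417", "754330948", 9500),
    ("520562795", "520665364", 8500),
    ("832502264", "832503627", 3500),
]

kept = plausible_links(MATE_PAIRS)
print(len(kept), "of", len(MATE_PAIRS), "pairs kept")  # 7 of 8 pairs kept
```

Lowering max_insert to 4000 restricts the walk to the short-insert pairs, which step upstream in smaller, better-constrained increments.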

AAGE01269113.1(-) matches 587973945 mate of 587971423

That matches near the end of AAGE01062475, insert = 4000bp no P450

574232774(-) also matches end of AAGE01062475 mate = 574236320

insert 4000bp this also matches AAGE01269113

529565709 matches end of AAGE01062475, mate = 529515150

insert = 8500bp. 529515150 matches AAGE01426036.1(+) 1067bp

used 821692110 to extend 754331375 toward AAGE01269113

this seq overlapped AAGE01269113

used 1200-1700bp of AAGE01062475 to locate (-) mate pairs between

AAGE01269113 and AAGE01426036

820321218(-) had mate pair 821692110 3500bp upstream

matches AAGE01269113

826039570(-) has mate pair = 826154743 3500bp upstream

matches AAGE01269113

tried end of AAGE01269113 to extend upstream toward AAGE01426036

587871645(-) extends into gap 370bp

this seq matches AAGE01638768.1(-) 695bp

which points toward AAGE01426036

use all of AAGE01638768 to find (+) matches

578717187(+) mate = 580122653 3500bp apart

579657839(+) extends about 400bp mate = 579661406 4000bp insert

579661406 matches AAGE01435940.1(+) 1050bp

also matches AAGE01318937.1(+) 1292bp

also matches AAGE01407923.1

580122653 extends AAGE01426036 upstream about 500bp

578477469 extends AAGE01426036 downstream 465bp

AAGE01463180.1(-) extends 578477469 upstream toward 579657839

Now looking upstream of AAGE01407923

754331417 is a mate pair of 754330948 (-) 9500 bp insert

520562795 is a mate pair of 520665364(-) 8500 bp insert

matches AAGE01000444.1 (-) 20031bp no P450

832502264 is a mate pair of 832503627(-) 3500 bp insert

matches AAGE01441151.1(-) 1041bp

1266 ILQEKKETARDTKTNGTDLDSEPETEEFKKPQIFIDQLLSISDISRSFTDEEILCNVLVIMIA (0)

GNDTSGLAVAYGCLFLAMFPQIQERVYAEIMEHFPSDGMEITADSLRLLEYTERFLKE

TLRHCPVAANIARENMKDIELDGVMIPAGTKFTVSFWALHRRADMWGPEVHSFDP

DHFLPERCRDRNPNAYMPFSTGARNCI (1)

GGRYAMLSTKVMLIHILKNFKITTKLRFEDMRYKFGMTLKMSTDHLVQLERRF* 2296

 

>AAGE01064173 68% to AAGE01050332 N-term complete

used 1-1960 in a mate pair search no P450 found so used

(-) mate pairs for WGS searches

618075209(-) mate = 614737464

matches AAGE01295005.1 (-) 1353bp no P450

this contig also found in search for AAGE01196445

in the opposite orientation, as expected from a search from the opposite direction

Join

AAGE01196445 67% to AAGE01062475 832452467 825244079 (C-term)

used first 500bp to find trace file seqs on (-) strand to get mate pairs upstream

579928179 (-) mate = 579928178

matches AAGE01295005.1 (+) 1353bp no P450

755009935 (+) moves upstream about 400bp

used this to look for next contig = AAGE01295005.1

used first 300bp of this seq to find trace files (-) strand

827560763 (-) mate = 826196777 in a repeat

matches AAGE01327171.1 (+) 1272bp no P450

revisited this seq; used all of AAGE01295005 to find more (-) mate pairs

810114220 (-) in a TGGAAGAC repeat mate = 810112302

matches AAGE01538741.1 (+) 908bp no P450

621935132 (-) in a TGGAAGAC repeat mate = 621783691

matches AAGE01090366.1 (+) 2758bp no P450

use top of AAGE01090366 to find (-) matches

834948650(-) mate = 835990450 matches repeat AAGE01090366

749639099(-) mate = 749632674 no good match

811933234(-) mate = 810057071 in a repeat

593164368(-) mate = 593170808 in a repeat

used middle of AAGE01090366 to find (-) strand matches

591745757(-) mate = 591749328 in a repeat

576250249(-) mate = 576246693 in a repeat

use top of AAGE01090366 to walk upstream

572277727 moves upstream 477 bp

used this to find the immediate upstream contig

AAGE01469359.1 (=) 1001bp used top to walk upstream

615889895(-) goes about 400 bp upstream (end of seq)

592717583 (-) further extends this seq now in a repeat

579616263 (+) goes back 448bp further

579057965 (+) goes another 450bp

569646653 (-) adds about 650 bp

520152779 (+) moves back about 200bp

cannot get past the repeat
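These walks repeatedly take the top or bottom few hundred bp of a contig as a query and then look for trace reads on a stated strand. A minimal sketch of preparing such a query follows, assuming a contig held as a plain DNA string. The function names are hypothetical. The sketch only builds the query sequence and does not reproduce the BLAST or megablast search itself.

```python
# Complement table for DNA; N stays N. Case is preserved.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def end_query(contig: str, end: str = "top", length: int = 500, strand: str = "+") -> str:
    """Take the first ('top') or last ('bottom') `length` bp of a contig,
    returned as-is for '+' or reverse-complemented for '-'."""
    if end not in ("top", "bottom"):
        raise ValueError("end must be 'top' or 'bottom'")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    piece = contig[:length] if end == "top" else contig[-length:]
    return piece if strand == "+" else revcomp(piece)


print(end_query("AAAACCCCGGTT", end="top", length=6, strand="-"))  # GGTTTT
```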

1960 MFLFLLLVCAILAFFAIRDHLRKTQAFAENLPIVVPEKSFLGINYDLLGLNDEERFELVN 1781

1780 RIFLQQDRLFRMSIGPMLILGVSHPDLVQKLLSHPDCLEKPFFYDFVKYDQGIFSAK (1) 1610

1541 FKLWKSQRKALNPTFNLRILHSFVPIFEKCSKKLVSELEKCKDGDTVNMFKYTSRCALEMV 1362

1361 CGTTLGSDVLQRDGKEVFLTSLEE (2) 1290

1226 LFLLVSRRMLSMHLYSDLIYMMTPHYWKELIHRKRCKAFTKK (0) 1101

1337 ILQEKKEARRYGATPESTPDSDPEADDFKKPQIFIDQLLSTSESSRPFTDEEIFHNVFVIMVA (0) 1525

1587 GNDTSGLATAYACLFLGMYPHIQEKVYAEVMEHFPNEDVEMTGDSLKQLEYTEMFLKEVL

     RHCPVAANIARQCIKDIEIDGTRVPAGNLFIFTFWAMHRRKDIWGPDADKFD

     PDNFLPERVQARNPNAYMPFSSGSRNCI (1)

     GGRYAMISIKVMLVYLLRRFKLHTNLKHEDLRYKFGITLRLSTSHMVQLERRKC*

 

>AAGE01039338 43% to 325C2 N-term complete

AAGE01059014.1

TC54812

558  MLFALLATGAILSLAVIWSIVQKNRFAQNVPTIEPWYPVVGNGLLFFGKNDIKKFHKLRKAF 743

744  DRKDALFRLYLGPRMLLCTSDPTVAQAIMTDANCMEKPYVYKFFNLNEGVFAAKSEYF (1) 920

973  THIWKGQRKALNPAFNSKILESFVSIFCEVSKTMIQRLDTVGKGDTINIMEHASRCTLEMV 1149

1150 CSTTLGFNVDIFDQIDDMGHKIEQ (2)

FFYIAARRILKFYLHVDSIYRWTKDYKDEQTLRENLDVYGMQVTITQLIETKSKS

RKPQIFVNQIFTNTIRKFERQEIIDNIITIVGAGTDTSATAIAFTFLQLAMYQEHQQK

VYEEIVKVFPESEPHITTEALKKLQYTKMVLNECLRLYPVAPILLRENTADITLCGGV

RVPKGNILTIDVYNIHRRKDVWGPDADEFIPERFSPERSAGRHPFAFLTFSGGSRNCI

GSRYAMISMKIMMVYLLKNFRFKTKIREEDIRYKFDALLRIEGGHLVQIEKRA*

 

AAGE02009773.1 Length=286003 FAAK boundary based on a similar EST EB099011.1

Revise KSKS boundary based on exact EST DW992488.1 use this seq

200611  MLFALLATGAILSLAVIWSIVQKNRFAQNVPTIEPWYPVVGNGLLFFGKNDIKKFHKLRK  200432

200431  AFDRKDALFRLYLGPRMLLCTSDPTVAQAIMTDANCMEKPYVYKFFNLNEGVFAAK (1)  200264

200202  THIWKGQRKALNPAFNSKILESFVSIFCEVSKTMIQRLDTVGKGD  200068

200067  TINIMEHASRCTLEMVCSTTLGFNVDIFDQIDEMGHKIEQ  (2)199948

184059  FFYIAARRILKFYLHVDSIYRWTKDYKDEQTLRENLDVYGMQ  (0)183934

183878  IYDDANNRFSK

183845  GSMNDETDDDKTEGFRKPQIFVNQIFTNTIRKFERQEIIDNIITIVGAGTDTSATAIAFT  183666

183665  FLQLAMYQEHQQKVYEEIVKVFPESEPHITTEALKKLQYTKMVLNECLRLYPVAPILLRE  183486

183485  NTADITLCGGVRVPKGNILTIDVYNIHRRKDVWGPDADEFIPERFSPERSAGRHPFAFLT  183306

183305  FSGGSRNCI (1)

183216  GSRYAMISMKIMMVYLLKNFRFKTKIREEDIRYKFDALLRIEGGHLVQIEKRA*  183055
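The number pairs flanking each line appear to be inclusive nucleotide positions on the WGS contig. They run descending when the gene lies on the minus strand. If so, each range should span exactly three bases per residue, with the stop `*` counted as a codon. For example, 200611 to 200432 is 180 bp for 60 residues, and 183216 to 183055 is 162 bp for 53 residues plus the stop. A minimal consistency check follows, under that assumption. The function name is hypothetical.

```python
def span_matches(start: int, end: int, peptide: str) -> bool:
    """True if the inclusive range start..end encodes exactly len(peptide) codons.

    Works for either strand (abs() handles descending coordinates); '*' counts
    as a codon. Spaces in the peptide string are ignored.
    """
    residues = peptide.replace(" ", "")
    return abs(start - end) + 1 == 3 * len(residues)


print(span_matches(200611, 200432, "MLFALLATGAILSLAVIWSIVQKNRFAQNVPTIEPWYPVVGNGLLFFGKNDIKKFHKLRK"))  # True
```

A False result is a quick flag that a line was mis-copied or that a split codon at an intron boundary was counted differently.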

 

>AAGE01030046.1  43% to 325F2 complete

4988  MWWPLLLAYLLAAVVLIYSYVQWTRRKMYAMLASMSSPKTLPVIGHAYKFFNVSP (1) 4836

3880  EALAKTIRYFRRFPSPVCLHMGPLPHVVVFDPESIQVVLNSQHCLQKPHQYSFLWIPRTLMCAP (1) 3689

3623  VHMWKTQRKAFNPAFGPAILGSFVPVFNEKCAILMEILEQHVGKPQRDFTRDILKCTLDQIY (1)  3426

2611  VTAFECEFNMQLSPDGDKSMDLFESYVGFVTKRFFSVWKYPDFIYRWTKAYRKQMLCCTTY  2432

2431  FNEVLEKIVRHVDIDRRVNDLCGEQSVEKYHNFVYCLSKYLQAQGPIPREDILAHFGF  2258

2257  MIFAGNETTAKTINAVLLMLAMHPEIQERCFLEVAAVCPIENQYISAEDVSNLTYLEMVC  2078

2077  KETMRLFPVAAMLARVATSDVKLN (1) 2006

1943  DRQTIPANTRIIIGTYQIHRDPKIWGPNSNRFDPDHFLPDNVAKRHPYSYIPFSGGPRNC  1764

1763  LGPRFAWLSMKTIIAFILRQYRLNTSLKFDQLKVAYGVLLTIADGCPMTIEKR*  1605

 

>AAGE01516637 AAGE01063015 67% to AAGE01030046, 818793338 820291982 complete

AAGE01269282 55% to 325F2 74% to AAGE01030046, probable end of AAGE01516637

Used bottom of AAGE01269282 to find (+) strand matches

632852730 (+) mate = 630141349 matches AAGE01063015.1

     MWWPLVLLYFTLGVILVYSYVQWTRRKMYAMLATMSNPKSLPVIGHAHKFFNVTL (1)

     ADTLQYFGSFPSPVCIHMGPLPHVAIFDPESVQVV

     LNSPDCLQKMHQYSFFWVPRTLLCAP (1) 41

104  VHMWKGQRKALNPAFSSAIIGKIVPVFNKKCEKLMHILDQYVGKQQKDCTVDILKCTLDQIY (1) 289

348  ETSFECEFNMQLSPDGEKTLDIFENFMKLASERICTIWKYPDLIYQWTKAYKKQLSCIDTY 530

531  YDTILEKVARHIDIDKRINEVNEVDNSTKKHNFIHCLSKYLRSQGPIPREDVLAHFCLIV 710

711  FAGNESTAKTVSTAMLMLAMHPEIQERCYQEINTVCPGENQYISAGDAANLTYLEMVIKE 890

891  TLRLLPVVPVLGRTATSDVKLN (1)

1136 DRHTIPANTGIVIGTFQIHRDPKIWGPNAERFDPDNFLPENVAKRHPYSFIPFSAGPRNC 957

956  LGVRYAWYSMKILLAYIVRQYRLSTTLKLDQVKVAYGVLLALKDGYPVSLEKRK* 792

 

AAGE02023793.1 Length=162318 use this seq

151537  MWWPLVLLYFTLGVILVYSYVQWTRRKMYAMLATMSNPKSLPVIGHAHKFFNVTL (1)  151373

        ADTLQYFGSFPSPVCIHMGPLPHVAIFDPESVQVV  151178

151177  LNSPDCLQKMHQYSFFWVPRTLLCAP (1)

151037  VHMWKGQRKALNPAFSSAIIGKIVPVFNKKCEKLMHILDQYVGKQQKDCTVDILKCTLDQIY (1) 150852

150793  ETSFECEFNMQLSPDGEKTLDIFENFMKLASERICTIWKYPDLIYQWTKAYK  150638

150637  KQLSCIDTYYDTILEKVARHIDIDKRINEVNEADNSTKKHNFIHCLSKYLRSQGPIPRED  150458

150457  VLAHFCLIVFAGNESTAKTVSTAMLMLAMHPEIQERCYQEINTVCPGENQYISAGDAANL  150278

150277  TYLEMVIKETLRLLPVVPVLGRTATSDVKLN (1)  150185

142052  DRHTIPANTGIVIGTFQIHRDPKIWGPNAERFDPDNFLPEN

        VAKRHPYSFIPFSAGPRNCLGVRYAWYSMKIL  141834

141833  LAYIVRQYRLSTTLKLDQVKVAYGVLLALKDGYPVSLEKRK*  141708

 

>AAGE01193335.1 AAGE01182945.1 N-term  65% to 325E1

probable N-term exon of AAGE01245552

813462501 581487574 588882777 586123749 752854417

637742616 matches on (+) mate pair = 638510990

this matches a repeat

813462501 (+) mate = 812771386

matches AAGE01220928.1 (-) no P450

760274997 (+) mate = 761367180

matches AAGE01191443.1 (+) 1746bp no P450

use first 500 bp of AAGE01191443 to find a (-) hit in megablast

use the mate pair to keep moving toward exon 2

569712697 (-) mate = 569677085

this matches AAGE01120174.1 (-) 2321 bp no P450

use the end of AAGE01120174 to find a (+) match to continue

636174306 (+) mate = 637134652

this matches AAGE01597709.1 (+) 813bp no P450

812154070 (+) mate = 813112864, 813497943

matches AAGE01353363.1 (+) 1212bp no P450

568529646 (+) mate = 568526082

matches a repeat

1033  MALLLITLALLGGIWLALYIYCRIRFGFARNIPEVRPLKFFFGNGLDFAQKNSYEIFVSINRV  1221

1222  FRENKRIFKISFGPIKVVCPTHPDLIQKVLCQSASMDKPYVYDFTRMGSGLLTAP (1)tgt  1383

 

>AAGE01245552.1 TC58401 new 53% to AAGE01045021 65% to 325E

531429449

Walked to 574319237

Walked to 744143898

Walked to 637124801 and 821753900 mate pair = 821670798 matches C-term

Walked to 836005807

Walked to 825766500

Walked to 755025265 mate pair = 755028846 matches DTWK and on

Walked to 745181680

Walked to 827540437 mate pair = 825766500 above

Walked to 529568074 mate pair = 531429449 matches DTWK

Walked to 757636880 mate pair 757640457 is upstream, may jump over the repeat

Walked to 529232002 runs into a large repeat region

Decided to continue the walk upstream of 757640457

Walked to 583632437 ran into a gap upstream

Tried going downstream of 757640457 toward repeat region

Walked to 519868279

158 YDTWKVHRKLLNPTFNTRILNSFIPIFNDCADKMIE

    SIHEHAAPGKVLNILEFTSPCTLAMICRTSLGGKVLEREGTQKFVEGLEI (2)

    ILSNVGLRMFNANLHPDIVYRFTRFYRREMESRKFCYAFTDK (0)

    IILEKQQELAEAKAKG

 89 KQDTNNNASDSGNDNFEEEEDDLLSYKKPQIFIDQLLTIPLPDGKPFSHKEISDYIYTMI 268

269 AAGNETSATQAAHTLMYLAMHPEVQEKAVKEIKELLPTPESKITSEVMKNMVYMERIIKE 448

449 SQRLAPVAAVYGRKTIADLQLDQFTIPKGNIFILNIFALHRRKEYWGEDAELFNPDRFFP 628

629 ENSKNRHPFAYLPFSGGNRGCI (1)

    GNRYAMMSMKTIVSAILRNFKISTDLEYEKIEFKFKVSMHLSGPHRTFVEPRNLYG*

 

AAGE02000570.1 fifth of five P450s on this contig = AAGE01193335.1 use this seq

86011  MALLLITLALLGGIWLALYIYCRIRFGFARNIPEVRPLKFFFGNGLDFAQKNSYEIFVSINRVFR  86205

86206  ENKRIFKISFGPIKVVCPTHPDLIQKVLCQSASMDKPYVYDFTRMGSGLLTAP (1)  86364

continues on AAGE02000571.1

6804  YDTWKVHRKLLNPTFNTRILNSFIPIFNDCADKMIESIHEHAAPGKVLNILEFTSPCTLAM  6986

6987  ICRTSLGGKVLEREGTQKFVEGLEI (2)

      ILSNVGLRMFNANL  7166

7167  HPDIVYRFTRFYRREMESRKFCYAFTDK (0)

      IILEKQQELAVAKAK  7346

7347  GKQDTNNNASDSGNDNFEEEEDDLLSYKKPQIFIDQLLTIPLPDGKPFSHKEISDHIYTM  7526

7527  IAAGNETSATQAAHTLMYLAMHPEVQEKAVKEIKELLPTPESKITSEVMKNMVYMERIIK  7706

7707  ESQRLAPVAAVYGRKTIADLQLDQFTIPKGNIFILNIFALHRRKEYWGEDAELFNPDRFL  7886

7887  PENSKNRHPFAYLPFSGGNRGCI (1)

      GNRYAMMSMKTIVSAIL  8066

8067  RNFKISTDLEYEKIEFKFKVSMHLSGPHRTFVEPRNLYG*  8186

 

>AAGE01001430 44% to 325F2 263506881 complete

 1270 MWNTVGVFVVFVTIFSLAYYRWRRRKVIAMLAKMDGPRSLPLVGHTHMLHLFG (1) 1410

 8777 KEIFNTFLQYGDRYTSPIAVEMGPMVYIFVYTPEQLQVVLNSPHCLEKPLQ 8956

 8957 YSFFQVSRGIFSAP (1) 8998

 9580 VDLWKILRKLITPSFGPGLLSSFVPIFNEKSSVMVEQMAKNVGKPQRDYYSEIVLCFMDTI 9756

 9828 TAFGVDCDLQRSPAGAEYVETQEKYIDIVTERYLKPWQYLNFIYRFTNAYQIFKKRHGKF 10007

10008 LALLTQATRINEVEDMLSKNSISKDYQDKDVGAKKIPIFVEKLLDEIQKSGHIKRED 10178

10179 IDDHIVTMCFAGNDTTATTMSNILLMLAMHPDIQERVYQEIIAACPDRNQQVSIEDAGKL 10358

10359 TYTEMVCKETMRHFSIAPVIGRTATQDVKLN (1) 10466

14112 DDITIPANSTLICCFYKLHMDPKNWGPDVKNFNPDNFLPDLVAKRHPYSFLPFSGGPRNC 14291

14292 LGVRYAWLSMKIMLVHILRRYRLRTTLTMDTITVKFNSFMKIEDGCPITVEER* 14453

 

AAGE02023796.1 Length=287591 use this revised seq

23727  MWNTVGVFVVFVTIFSLAYYRWRRRKVIAMLAKMDGPRSLPLVGHTHMLHLFGS (0)  23566

16194  EIFNTFLQYGDRYTSPIAVEMGPMVYIFVYTPEQLQVVLNSPHCLEKP  16047

16046  LQYSFFQVSRGIFSAP (1) 16002

       VDLWKILR  15400

15399  KLITPSFGPGLLSSFVPIFNEKSSVMVEQMAKNVGKPQRDYYSEIVLCFMDTIC (1)  15235

15172  NTAFGVDCDLQRSPAGAEYVETQEKYIDIVTERYLKPWQYLNFIYRFTNAYQIFKKRHGKF  14990

14989  LALLTQATRINEVEDMLSKNSISKDYQDKDVGAKKIPIFVEKLLDEIQKSGHIKREDIDD  14810

14809  HIVTMCFAGNDTTATTMSNILLMLAMHPDIQERVYQEIIAACPDRNQQVSIEDAGKLTYT  14630

14629  EMVCKETMRHFSIAPVIGRTATQDVKLN  (1) 14546

10885  DDITIPANSTLICCFYKLHMDPKNWGPDVKNFNPDNFLPDLVAKRHPYSFLPFSGGPRNC  10706

10705  LGVRYAWLSMKIMLVHILRRYRLRTTLTMDTITVKFNSFMKIEDGCPITVEER*  10544
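The revised record changes the first exon from "...MLHLFG (1)" to "...MLHLFGS (0)". The new range 23727 to 23566 spans 162 bp, or 54 codons, which is consistent with a phase 0 intron. Intron phase is the number of coding bases upstream of the intron, mod 3. A minimal sketch follows, assuming the full coding length of each exon is known. For phase 1 or 2 introns, that length includes the one or two split-codon bases, which the coordinate ranges in this file do not appear to include. The function name is hypothetical.

```python
from itertools import accumulate
from typing import List


def intron_phases(exon_cds_lengths: List[int]) -> List[int]:
    """Phase (0, 1 or 2) of each intron, given the coding length in bp of every
    exon in transcript order. There is one intron fewer than exons."""
    upstream = accumulate(exon_cds_lengths[:-1])  # CDS bases before each intron
    return [total % 3 for total in upstream]


print(intron_phases([162, 141]))  # [0]
```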

 

>AAGE01347884 494537782 61% to 325F2 like 325H1 613935093 complete

AAGE01293591 61% to AAGE01303772 overlaps with 613935093

50% to AAGE01001430

used first part of AAGE01069052 to find trace files on the (-) strand

749335708 was an imperfect match, but its mate pair

749342170 matches AAGE01347884,

585893297 (-) mate = 585809237

matches AAGE01311378.1 (-) 1311bp no P450

591378293 (-) mate = no mate

803283930 (-) mate = 793222515

matches AAGE01125396.1 (+) 2262bp no P450

595130292 (-) mate = 594343693

matches AAGE01311378.1 (-)

also matches AAGE01184369.1 (+) 1787bp no P450

594343603 (-) mate = 595131586

matches AAGE01184369.1 AAGE01311378.1

AAGE01069052.1 is the best match to exon 1 of AAGE01001430, but it cannot be linked to the rest of this gene yet.

914  MWSVVIGYSLSVLVFLAVYYRWSRRKTNAALANMNGPPKYPLIGHLYLLKYTSQ (1)  753

225  EKIFETFVELGSTYSSPMGIELGPITLVVVYQPEHLQAVLSSPHCISRPFWYDFFRVSRGIFSSP (1) 419

485  AHIWRGQRKVLNHSFGPGILNSFVSIFNEKSEILTKLMTSHVGRGERDFGHEIARAALDTIY (1) 673

726  STAFGLNFGMQEAPEGSKYLEAQEEFIGLVLKRIFSVINYSERIYRLTKDYKREQELLSYAR 911

912  TLTNRIMQARNAEQILSGAIGLPSVTTENTDGKKPQIFLDKLFELAVENKQQLSKEDIP 1088

1089 EHLDTIIFAGNDTTATTMSNLLLMLAMHPDVQERVYQEVMEACP 1220

     DLEQPVSMEDTAKLTYTEMVCKETMRLFPVGPLIGRIAEVDIKIS (1)

67   DEHVIPAGSEVGCGIYMVHRDRKIWGPRAEEFNPDHFLPENISKIHPYAYLPFSGGIR 240

241  NCIGVRYAWISMKIMIVHILRRYRLKTSLTMDKITLQYCILLKIGNGCRISLEERNI* 414

 

AAGE02023793.1 Length=162318 use this seq

79977  MWSVVIGYSLSVLVFLAVYYRWSRRKTNAALANMNGPPKYPLIGHLYLLKYTSQ  (1) 80138

88311  EKIFETFVELGSTYSSPMGIELGPITLVVVYQPEHLQAVLSSPHCISRPFWYDFFR  88478

88479  VSRGIFSSP (1) 88505

88574  AHIWRGQRKVLNHSFGPGILNCFVSIFNEKS  88666

88667  EILTKLMTSHVGRGERDFGHEIARAALDTIY (1)

88812  STAFGLNFGMQEAPEGSKYLEAQEEFIGLVLKRIFSVINYSERIYRLTKDYKREQELL  88985

88986  SYARTLTNRIMQARNAEQILSGAIGLPSVTTENTDGKKPQIFLDKLFELAVENKQQLSKE  89165

89166  DIPEHLDTIIFAGNDTTATTMSNLLLMLAMHPDVQERVYQEVMEACPDLEQPVSMEDTAK  89345

89346  LTYTEMVCKETMRLFPVGPLIGRIAEVDIKIS  (1) 89441

89510  DEHVIPAGSEVGCGIYMVHRDRKIWGPRAEEFNPDHFLPENISKIHPYAYLPFSGGIR  89677

89678  NCIGVRYAWISMKIMIVHILRRYRLKTSLTMDKITLQYCILLKIGNGCRISLEERNI*  89851

 

>AAGE01011003 832457338 this seq overlaps 639393192  complete

591547547 mate pair = 591382861 matches AAGE01153865

51% to 325K1 N-term; the next best match is 325E1 at 43%

AAGE01153865 494304406 639393192 58% to 476430416 52% to 325K1

223521505 76% to 476430416 60% to 325K

7774 MLAFVSLVLLIVAILMVHWWHAKVDFARYLPRAQPHYPVIGNLQIALPFGKSAEELLGL 7950

7951 LHSYFRQHDRMFAIHIGPKVAIGLSHPELVQQVLNHPYCQEKSNVYELLRLPNGLLSSK  (1) 8127

8189 YKVWKLHRKTLNSTFNLRILNSYLPIFNDSTRKLIQLLDQYASTGKTFNILAPLTHCTLGM 8371

8372 VCETSFGKKVLEREGKEQFFDDLEV (2) 8446

     LLTSLGKRVVNVLLHSEIIYRLTPMYRDETKSRPVCRQFTDKVIEEKRIEIESSFAGDSRL

     ITTEGTLSHDEGEAYKRPQIFIDQLLKMPLMTKSAYNFTDLEISDQVFTMIIA (0)

668  GNETSATQMAHTCLLLAMNPAIQQKAYQEVQQFIETENSYIDADILRKLVYIEAVLKESM 847

848  RLLPVGSLISRKNLQDIVLDGHIIPKNTPLLMKPYSLHRRPDIWGSDAEQFVPERFLGED 1027

1028 SKRRHPYAFIPFSGGPRGCIGLRYAMMTLKIMLALILKNFEISTQLKYRDLRIHYQLSLN 1207

1208 LAGPHAVSLERRR* 1249

 

AAGE02009247.1 Length=85986 use this seq

note trace file 832457338 supports the other seq (allele?)

14328  MLAFVSLVLLIVAILMVHWWHAKVDFARYLPRAQPHYPVIGNLQIALPFGKSAEELLGLL  14507

14508  HSYFRQHDRMFAIHIGPKVAIGLSHPELVQQVLNHPYCQEKSNVYELLRLPNGLLSSK (1)  14681

14743  YKVWKLHRKTLNSTFNLRILNSFLPIFNDSTRKLIQLLDQYASTGKTFNILAPLTHCTLGMV  14928

14929  CETSFGKKVLEREGKEQFFDDLEV  (2) 15000

15057  LLTSLGKRVVNVLLHSEIIYRLTPMYRDETKSRAVCRQFTDKVIEEKRIEI  15209

15210  ESSFAGDSSLITTEGTLSDDEGEAYKRPQIFIDQLLKMPLMTKSAYNFTDLEISDQVFTM  15389

15390  IIA (0) 15398

16042  GNETSATQMAHTCLLLAMNPAIQQKAYQEVQQFIETENSYIDADILRKLVYIEAVLKESM  16221

16222  RLLPVGSLISRKNLQDIVLDGHIIPKNTPLLMKPYSLHRRPDIWGSDAEQFVPERFLGED  16401

16402  SKRRHPYAFIPFSGGPRGCIGLRYAMMTLKIMLALILKNFEISTQLKYRDLRIHYQLSLN  16581

16582  LAGPHAVSLERRR* 16623

 

>AAGE01421194.1 N-term 756213770 51% to AAGE01011003 complete

AAGE01614450.1 AAGE01034413.1 (at least 5000bp to exon 3)

End of exon 2 is at the beginning of AAGE01034413 (+)

Probably joins with AAGE01045021 since both are highly similar to AAGE01011003

Tried mate pair search with end and middle of AAGE01034413.1, no success

825266760 walked down to 578617426 used end to search again

looked for (+) match to end of AAGE01034413 to find mate pair in

next contig 592715947(+) mate = 592712372

matches AAGE01302867.1 (+) 1333bp

matches AAGE01045021.1 (+) 100%

AAGE01045021 61% to 325K 476430416 58% to 494304406 66% to AAGE01011003

256  MIALLWVVLSITIAVLVQRQWQKKVKFAGAIPRAKPYYPVVGNLPLALGKTSDELFSSLY  77

76   DCFRQHDRLFTLQFSTIVAVCLSHPELIQRVLNHPDCQEKPDVYKVVRLPKGLLAAR (1)

     YNTWKVHRKILNSTFNSKILQSFLTIFNNSSRRLIERLDHHADRGKSCNILEYISEC

     TLEMICRTSLGGRALERDGRQDFIENLEV (2)

2275 ALTALGKRILSFPLHNDFIYQFTTLYRDEMKAISTCHQFTDK (0) 2147

2054 IIAEKHVEFKSLFDEREKSKERETLQENDDESESYKRPQIFIDQLMKMPLMMKDAY 1887

1886 SFSDQDISDHVYTMIVA (0)

     GNETSATQLAHTCLLLAMNP 1707

1706 EVQEKAYQEVKEVVVSSDVFIDMDTLKQLVYVEAVLKEAMRLIPVAPLIARENIRDIELD 1527

1526 GHLIPKGTVLLMNMYALHRRNDVWGSDFERFYPERFLGETAKRRHPYAHLPFSGGPRGCI 1347

1346 GYRYAMMSLKILLALILKNFELSTDLKYNDIKYHYQISLNLAIPHAVSLKRRL* 1185

 

AAGE02009247.1 Length=85986, use my seq, same as below (second gene on contig)

37082  MIALLWVVLSITIAVLVQRQWQKKVKFAGAIPRAKPYYPVVGNLPLALGKTSDELFSSLY  36903

36902  DCFRQHDRLFTLQFSTIVAVCLSHPELIQRVLNHPDCQEKPDVYKVVRLPKGLLAAR  (1) 36732

36675  YNTWKVHRKILNSTFNSKILQSFLTIFNNSSRRLIERLDHHADRGKSCNILEYISECTLEMIC  36487

36486  RTSLGGRALERDGRQDFIENLEV (2)36418

26045  ALTALGKRILSFPLHNDFIYQFTTLYRDEMKAISTCHQFTDK  (0) 25920

25824  IIAEKHVEFKSLFDEREKSKERETLQENDDESESYKRPQIFIDQLMKMPLMMKDA  25660

25659  YSFSDQDISDHVYTMIVA (0)

       GNETSATQLAHTCLLLAMN  25480

25479  PEVQEKAYQEVKEVVVSSDVFIDMDTLKQLVYVEAVLKEAMRLIPVAPLIARENIRDIEL  25300

25299  DGHLIPKGTVLLMNMYALHRRNDVWGSDFERFYPERFLGETAKRRHPYAHLPFSGGPRGC  25120

25119  IGYRYAMMSLKILLALILKNFELSTDLKYNDIKYHYQISLNLAIPHAVSLKRRL*  24955

 

>AAGE01082531.1 complete

494292861 494251906 61% to 325G1

AAGE01491115.1 67% to 325G1

396  MFQSVLVLVLFPIVVYLILRWKHRRFYRISSELPGPVNYPLIGCGHLFIGKSNEEQF  566

567  AILNDITKTYPSPCRAWLGPKLFVFIDNPEDMQVILNSPNCLEKADLYRFFRCEKGLF  740

741  SSPASIWKVHRKLLSPCFSPAILASFVSIFNVKSEILVQRLEKNLGQGAFNLFGDISRC  917

918  TLDMIC  935

     ATTLGTNMDLQSNEGTEFIKSIED (2)

     ACELINCRLYKFWLHPEWIYQRTKYYKEEK

1228 YCYEKAYEMSRKILKMKQEARSKSRNTLNNNDNILSKSPQIYIDQILRLAEETDVFDNQAIK  1404

1405 DELDTIIVGGNETSALTLSHVMLMLAIHQDIQQKVYNEIVNVIGSCDPSIPVHNDQLSKL  1584

1585 IYTEMVMKETMRLFPVGPVVARTCTSPTRI (1)  1674

1736 SKTTIPAGTNIVLGVYNVHRNPKHWGPDVDRFDPEHFFPERVAERHPYSFLPFSGGPRNCIG  1919

1920 YKYGLMSMKIMLCHLLRAYRFRSPLKMDQLQLKMSITLKIANRHMVTVERRNG*

 

AAGE02023796.1 Length=287591 use this seq with correct intron boundaries

46056  MFQSVLVLVLFPIVVYLILRWKHRRFYRISSELPGPVNYPLIGCGHLFIGKSNEEQFAIL  46235

46236  NDITKTYPSPCRAWLGPKLFVFIDNPEDMQVILNSPNCLEKADLYRFFRCEKGLFSSPAS  46415

46416  IWKVHRKLLSPCFSPAILASFVSIFNVKSEILVQRLEKNLGQGAFNLFGDISRCTLDMIC (1)  46595

46657  ATTLGTNMDLQSNEGTEFIKSIED (2) 46728

46788  ACELINCRLYKFWLHPEWIYQRTKYYKEEKYCYEKAYEMSRKILKMKQEARSKSRN  46955

46956  TLNNNDNILSKSPQIYIDQILRLAEETDVFDNQAIKDELDTIIVGGNETSALTLSHVMLM  47135

47136  LAIHQDIQQKVYNEIVNVIGSCDPSIPVHNDQLSKLIYTEMVMKETMRLFPVGPVVARTC  47315

47316  TSPTRI (1) 47333

47393  SKTTIPAGTNIVLGVYNVHRNPKHWGPDVDRFDPEHFFPERVAERHPYSFLPFSGG  47560

47561  PRNCIGYKYGLMSMKIMLCHLLRAYRFRSPLKMDQLQLKMSITLKIANRHMVTVERRNG*  47740

 

>AAGE01073923.1 51% to 325G1 complete 70% to AAGE01082531.1

      MVLSVIVLLALP

1018  IIAYLVLRWKQRRFYQISAELPGPVSYPLIGSAHLFIGKTNEELFAILNGIVKTYSSPCR  1197

1198  GWLGPKLFVFIDNPEDIQVILNSPNCLEKAEIYRFIRSLNGLFTSPVSIWKVHRKLLSPC  1377

1378  FSPVVLSSFISKFNSKSATLVQHVGKNIGRAEYDSYGDISRCTLDMIC (1)  1518

1578  ATFLGTDMNLQSNEGTEFIKNVED (2)

      GCELINYRLHRFWL  1757

1758  HPEWIYRLTKYYRTERKCFENVFNMLNKIWKKRQMVLSESKTASLYESMSTKKPLIFIDR  1937

1938  IQRLAEETQVFDEIDIRDELSTIIVAGNETSALSLSNTILMLAIHQDIQEEVYNEIVNV  2114

2115  LESGDPSVPVNNEHLSKLCYTEMVIKETMRLFPVGPMLGRKCTAPTRI (1)  2258

2323  SKSTIPEGTNIILGVNNVHRNPAYWGPDANRFDPNHFLPDRIAERHPYAFLPFSGGPRNCIGYK  2514

2515  YALMSMKIILCYLLRAYRFRSPLKLDQLQLTMSLTLKIANRNVMTVERRDN*  2670

 

>AAGE01008633.1 325-like N-term 38% to AAGE01132222 complete

AAGE01074064.1 42% to 325C3 40% to AAGE01041126

on (-) strand so use bottom of AAGE01074064 to find (+) matches

813467254(+) mate = 811970297

matches AAGE01540337.1 (+) 905bp no P450

744058557(+) mate = 744054987

matches AAGE01196471.1 (+) 1719bp

625161011(+) mate = 625110150

matches AAGE01196471

use top of AAGE01196471 to look for (-) strand matches

813876897(-) mate = 813126749

matches AAGE01008633.1 (-) 9433bp = N-term of P450 join

     MEITQVLSWIVVGLLLITYIAHKWKYRNLRMIPGTNPDYPVLGNIPLFLKGGNAYMKVFDK  1356

1355 EHRMSKIWLGPVPLINVQHPELVQKVLNECIDKPFAYDFMELGQGLVSER (1) 1206

1144 YGQRWREHRKTLSPLFNTKILHSFIPIFERATSDVMKRLEVVCDGRDFDLLEYTSSCSAKMVHG 953

952  TMVDTLSVSEEIIHSLITNLDI (2) 890

ILDAVGKRILNGVYALKTLYKMSSVYRDEWRSRKICYETVNDVRTTRHSMANCSDLE

ADLKNPSKSKAFLERLLTIQHKGRSFTDDEIINHAYTMLVA (0)

GYETTALQLTNVCMMLAMHPDIQERVASEIKTIFPSLDMEILPEALKDLPYLDM

TINETMRLYPVVPLIARQSNSSLELDKVNIPTGTNFIVHIGALHRRKDVWGEEVLDFNPD

NFLPEKVERRHPYAFIPLSAGPRVCI

GNRYAMLSLKVFLIRLLQRYRLSTKLTRKDLKFKFQVTLKLKIPYTVQMERRN*

 

AAGE02006231.1 Length=220788 use this seq

94074  MEITQVLSWIVVGLLLITYIAHKWKYRNLRMIPGTNPDYPVLGNIPLFLKGGNAYMKVFD  93895

93894  KEHRMSKIWLGPVPLINVQHPELVQKVLNECIDKPFAYDFMELGQGLVSER  (1) 93742

93680  YGQRWREHRKTLSPLFNTKILHSFIPIFERATSDVMKRLEVVCDGRDFDLLEYTSSCSAKM  93498

93497  VHGTMVDTLSVSEEIIHSLITNLDI (2) 93423

86605  ILDAVGKRILNGVYALKTLYKMSSVYRDEWRSRKICYETVNDVRTTRHSMANC (1) 86447

86352  TEADLKNPSKSKAFLERLLTIQHKGRSFTDDEIINHAYTMLVA  (0) 86224

86154  GYETTALQLTNVCMMLAMHPDIQERVASEIKTIFPSLDMEILPEALK  86014

86013  DLPYLDMTINETMRLYPVVPLIARQSNSSLELDKVNIPTGTNFIVHIGALHRRKDVWGEE  85834

85833  VLDFNPDNFLPEKVERRHPYAFIPLSAGPRVCIGNRYAMLSLKVFLIRLLQRYRLSTKLT  85654

85653  RKDLKFKFQVTLKLKIPYTVQMERRNLYGEDSTVNCTK  85540

 

>AAGE01303772 AAGE01114730 476159771 58% to 494251906 60% to 325F2

57% to AAGE01020741

used AAGE01152759 to search for trace files that were about 75% identical

found 595132659 (67-255) 71% to AAGE01152759 might complete this seq

67 MWWWDTLLTYVCASVIMAVFYFRWTRRKMNAKLATMSGPRRLPLLGHAHKFYRATP (1) 234

RKIANTLKYFGSFPSPVCIYMGPLPHVAIFDPEQLQVILNSQNCLDKSIQYSFLRVSRTMISAP (1)

THLWKNQRKALNPSFAPAILNNFVPIFNEKCAILTGLLGKYVGQPERNYTRDLCKFTLDQIY (1)

ATALGCDFDMQRSPDGERSLDLIESYIKVMVARIFTVWKYPEFIYRMTSGYKREQEILKTY

HETIISKLLRAIDFEEKLKLSDENNNEAMNEDTGSKKPNIFIDRLLKLMRDGDEIAKEDI

FQQIDMILFAGNDTTAKTTSFILLMLAMHPEVQERCYQEIMAVCPGENQIVTAEDAAELIYLEMA

CKETMRLFPVGSVLARVTTADIKLN (1)

DEHTIPADSTIIMGIYQIHRDPKIWGPKADEFDPNNFLPERAEKRHPYSFLPFSGGPRNC

VGMRYAWLSLKVLVVHMLRKYRLSTSLTMDQIRIKYGIILNIANGCLLTLEKR*

 

AAGE02021468.1 Length=30723 use this seq

21687  MWWYSWLTYLCASVCMVMNFLQWSRRKTNAKFAKMSGPRRLPLIGHAHKFFRATP (1)  21523

16266  GKIANTLKYFGSFPSPVCIYMGPLPHVAIFDPEQLQVILNSQNCLDKSIQYSFLRVSRT  16084

16083  MISAP (1)

15999  THLWKNQRKALNPSFAPAILNNFVPIFNEKCA  15904

15903  ILTGLLGKYVGQPERNYTRDLCKFTLDQIY  (1) 15814

15755  ATALGCDFDMQRSPDGERSLDLIESYIKVMVARIFTVWKYPEFIYRMTSGYKREQEILKT  15576

15575  YHETIISKLLRAIDFEEKLKLSDENNNEAMNEDTGSKKPNIFIDRLLKLMRDGDEIAKED  15396

15395  IFQQIDMILFAGNDTTAKTTSFILLMLAMHPEVQERCYQEIMAVCPGENQIVTAEDAAEL  15216

15215  IYLEMACKETMRLFPVGSVLARVTTADIKLN (1) 15126

15067  DEHTIPADSTIIMGIYQIHRDPKIWGPKADEFDPNNFLPERAEKRHPY  14924

14923  SFLPFSGGPRNCVGMRYAWLSLKVLVVHMLRKYRLSTSLTMDQIRIKYGIILNIANGCLL  14744

14743  TLEKR* 14726

 

AAGE02023793.1 Length=162318 4 P450s on this contig at 15k, 41k, 79k, 151k

15109  MWWWDTLLTYVCASVIMAVFYFRWTRRKMNAKLATMSGPRRLPLLGHAHKFYRATP  14942

This is an N-term only; it was incorrectly used above as part of AAGE01303772

 

>AAGE01152759 75% to AAGE01303772 overlaps AAGE01124741 in exon 2 complete

AAGE01124741.1 AAGE01593486 AAGE01487633.1

43% to 325F1 56% to AAGE01020741

61% to AAGE01303772

844  MLCWVTMLTYICASVILAVFCLRSTRRKMDCKLASMSGPRRLPLLGHAQNFYKSTP (1) 665

     EAIAGTLKYFSKFPSPVCIHMGPLPHVAIFDPEQLQVVLHSQNCLDKSVQYSFLRVSETLISAP (1) 332

 23  GHLWKGQRKALNPSFGPAILTTFAEIFNNKCAILTKRLEEYAGKPERNFYRDISKCTLDQIY (1)

208  ATAFGCDFNMQTSLDGERSLDLQEAYMKVMANRFFSVWKYPEFIYRWTAGYKKELELRRIY  387

388  HETITCKLVQQVSVEEKLHTKEDIDFKTEETGKRIPENFIECLVKYLRAEGETSK  552

553  DAVYPHIDMTVFAGNDTSAKTICSILLMLAMHPEVQERCYQELMEVCPEKDQHISYKDAA  732

733  NLTYLEMVCKETMRLLPAVPFMARITSGDIVLN (1)

     DQHTIPAN  912

913  CTIIMGIFQIHRDPRIWGPNADNFDPDNFLPDNVAKRHPYSYIPFSAGPRNCIGTRYAYL  1092

1093 SSKIMVGSILRKYRLKTSLTMDKLRISCGLLLHISNGCQMAIEHR*  1230

 

AAGE02021469.1 Length=21097 use this seq

11337  MLCWVTMLTYICASVILAVFCLRSTRRKMDCKLASMSGPRRLPLLGHAQNFYKSTP (1) 11170

11016  KAIAGTLKYFSKF  10978

10977  PSPVCIHMGPLPHVAIFDPEQLQVVLHSQNCLDKSVQYSFLRVSETLISAP (1)  10825

10767  THLWKGQRKALNPSFGPAILTTFAEIFNNKCAILTKRLEEYAGKPERNFY  10618

10617  RDISKCTLDQIY (1) 10582

10112  ATAFGCDFNMQTSLDGERSLDLQEAYMKVMANRFFSVWKYPEFIYRWTAGYKKELELRRI  9933

9932   YHETITCKLVQQVSVEEKLHTKEDIDFKTEETGKRIPENFIECLVKYLRAEGETSKDAVY  9753

9752   PHIDMTVFAGNDTSAKTICSILLMLAMHPEVQERCYQELMEVCPEKDQHISYKDAANLTY  9573

9572   LEMVCKETMRLLPAVPFMARITSGDIVLN (1)

9428   DQHTIPANCTII  9393

9392   MGIFQIHRDPRIWGPNADNFDPDNFLPDNVAKRHPYSYIPFSAGPRNCIGTRYAYLSSKI  9213

9212   MVGSILRKYRLKTSLTMDKLRISCGLLLHISNGCQMAIEHR*  9087

 

>AAGE01004435.1 might match with AAGE01020741 complete

try searching with first 1000bp of AAGE01004435 for (-) strand matches

600018362 (-) mate = 600015159

matches AAGE01442713 (-) seen below in repeat region when searching

for extension to AAGE01020741.  This is evidence that these two seqs

are from the same gene, but not strong evidence, since the match

is in a repeat region.

591948070(-) mate = 591747525

matches AAGE01406894.1 (+) 1102 bp

also matches AAGE01215974.1 (-) 1623 bp

Provisionally join AAGE01004435 and AAGE01020741

AAGE01020741.1 43% to 325F2 missing exon 1 57% to AAGE01303772

used top 1000bp of AAGE01020741 to look for (-) trace file seqs

to get mate pairs upstream

825198586 (-) mate = no mate

579619346(-) mate = 579123534

matches AAGE01028893.1 (+) 5378bp no P450

592231766(-) mate = 592238195

matches AAGE01028893.1 (+)

repeated the search with the top 1000bp of AAGE01028893

this jumps upstream more than 5000bp and continues the search

579492092(-) mate = 579485641

matches AAGE01267250.1 (+) 1433bp no P450

578717559 (-) mate = 580123560

matches AAGE01442713.1 (+) 1039bp no P450 repeat region

520662500(-) mate = 528593949

matches AAGE01299598.1(-) 1341bp no P450 repeat region

761358767(-) mate = 760269002

matches AAGE01284342.1(-) 1383bp no P450

also matches AAGE01097354.1(-) 2636bp no P450

also matches AAGE01125726.1 (+) 2259bp no P450

3883 MWWYLLVPYICAGVIIVVSYVHWTRRKMYKTLATMSCPKTLPLIGHAHKFFNATA (1) 3719

 876 ESIADGLKFFAEFPSPVVIHLGPSPQVAIFDPEQARIVLNSQNCLDKAFFYSFLRVPGTLISSP (1)

1133 GPLWRSQRKALNSSLGPAILGSFIPIFNNKSAILVDLLEKYAGEPERDFSVDIAKCFLDQIY (1)

     ETAFGCNFSMQTSPEGDKTVDMMGDYMHIV  1477

1478 SKRFFTIWQYPEVLYRMTDAYKTEQALLKAHHEITENIVRQVKFHEQINMTDEKFAALDG  1657

1658 SSKTHNFIECLVKYMRTSVHTSQADIFSHIDMTLFAGNDTTAKSLSYVLLLMAMHPEVQE  1837

1838 RCYQEVMEVCPGEERFISAEDTANLTYLDMVCKEGMRLFPVVPIMARVTNNDVKLDGKC (1) 2017

     EHHTIPANCNIILGVYQMHRDPNIWGPNADQFNPDNFLPEN  2197

2198 AAKRHPYAYLPFSAGPRNCMGLRYARIAMKVTAAHILKKYRLRTSLTLEELRVSYGVMLN  2377

2378 IANGVLMSLEKR*  2416

 

Mitochondrial clan sequences (note CYP49 not found yet)

 

>CYP301A1 AAGE01019797 476402901 520186080 585804951 600015456 520527252

574225018 636187512 528806294 88% to CYP301A1

note: Anopheles 301A1 has 4 extra aa after VGIDT (probably an error at an intron boundary) complete

MMDVDSVAVNTRLHPAPAHIQQQAPPTIKLENARPYSEVPGPKPLPILGNTWR (2)

VFPIIGQYKISDVAMISFLLHEHFGRIVRLGGLIGRPDLLFVYDADEIEK

VYRNEGPTPFRPSMPSLVKYKSELRKDFFGDLPGVVGV (2)

HGEPWREFRSRVQKPVLQLSTVRRYVTPLEQVTEEFIDRCNQMLDHNKELPDDFDNEIHKWSLECIG

LVALDTRLGCLEPNLTSDSEPQQIINAAKYALRNVATLELKAPYWRYFPTPLWTKYVNNMDYFVK (2)

VCMKYIKSATKRMNLGEGRALDGEPSLLERVIKSQKDERLAVVMALDLILVGIDTISMAVCSILYQ

LATRPAEQQKVYEELKRIMPDPKTPLTYQLLDQAHYLKAFIKEVLRVYSTVIGNGRTLQEDTVICGYRIPKG (0)

VQCVFPNLVTGTMEEYVTDAKSFKPERWLKPSQGGTGDNLHPFSTLPYGYGARMCLGRRFA

DLEMQILLAKLLRSYKLEFHHKPLKYVVTFMYAPEGALKFKMTPRE*

 

>CYP49A1 AAGE01096311 yellow = potential intron, does not match with anopheles

90% to anopheles CYP49A1 complete

578793495 587596076 637745631 815234501 621923398 600556551 574277444

MVGQRIANSLLKRSAFSSAINVLPNEAAATIDPPPLADYRGHVKPYSDVPGPKELPLIGNSWRFAPLI (1)

GQYRIEDLDKVMCDLHRHFGKIAKVGGLIGHPDLLFVFDADLIRDTFK

KEEVQPHRPAMPSLRNYKSKLRKDFFGDNMGLIGV (2?)

HGEKWDAFRAQVQQVMLQPTTAKKYIAPLDEISTDFMAR (2)

IHEMRDENNELPGDFLHELYKWALE (1)

SIGRVALDTRLGCVSKTGNEESKRIIDSINTFFWTVAEVELRMPIWRIYKTSAYKKYLGALDTFRE (2)

LCMRHINIAMEKMNNSSEVKSEEHISIVERILQKTNNPKIAAVLALDLIMVGVDT (0)

TSVAATSTIYQLSQNPDKQEILFNEIKRSLPTPDTKFTISMLETMPYLRACIKETLR (2)

MYPVVIGNGRSLQSDAVIGGYHIPKG (0)

THVIFPHLVVSNLEEYFPEPDRFLPERWLKRGELKEHAGCPHAGQKIHPYVSLPFGYGRR

TCIGRRFAECELQILLSK (0)

LFRRYHVEYNYEKLTYKVNPTYIPDKPLKFKLTERTD*

 

>CYP302A1v2 AY947550 missing last exon

MYLSLVKSKIKSATLMTSRSCATVVLENVKPYNQIPGPRGPFGL

GNLYQYIPGIGKYSFDALHESGQDKYEKYGPIVRETMVPGQDIVWLYDPNDIAAVLND

KTPGIYPSRRSHTALAKYRRDRPNVYRTAGLLATNGIEWWKIRSELQKGLSSPQSVRN

FLPLTDKVTREFVASMNSTENDCVKDFMPAISSLNLELICLMAFDVRLDSFSDEQMKP

NSLSSRLMESAEVTNQSILPTDQGFQLWKFFETPAYRKLRKAQEFMEKTAVELVSQKL

LYFDEDQQKLASGRHRSRSLLEEYLRNPNLELHDIIGMAADLLLAGVHTSSYTTAFAL

YHLYLNPDAQDKLYQEACRILPDPWECQIEAAALNSEASYCRAVLKESLRLNPISIGV

GRILNKDATLGGYHVPKGTVVVTQNLVSCRQERYFKNPTKFIPERWMRETKEDVNPYL

VLPFGHGMRSCIARRMAEQNILVLLLRLI

 

AAGE02035843.1  Length=4920 only one diff at intron boundary, use my boundary

7aa diffs to AY947550, 6 aa diffs to AY947549

3078  MYLSLVKSKIKSATLMTSRSCATVVLE (2)  2998

2930  NVKPYNQIPGPRGPFGFGNLYQYIPGI (1)  2850

2787  GKYSFDALHESGQDKYEKYGPIVRETMVPGQDIVWLYDPNDIAAVLNDKTPGIYPSRRSH  2608

2607  TALAKYRRDRPNVYRTAGLLAT (2) 2542

2479  NGIEWWKIRSELQKGLSSPQSVRNFLPLTDKVTREFVASMNSTENDCVPDFMPAISRLNLE (1) 2397

2238  LICLMAFDVRLDSFSDEQMKPNSLSSRLMESAEVTNQSILPTDQGFQLWKYF  2083

2082  ETPAYRKLRKAQEFMEKTAVELVSQKLLYFDEDQQKLATGRHRSRSLLEEYLRNPNLELH  1903

1902  DIIGMAADLLLAGVHTSSYTTAFALYHLCLNPDAQDKLYQEACRILPDPWECQIEAAALN  (1) 1723

1662  SEASYCRAVLKESLRLNPISIGVGRILNKDATLGGYHVPK  1543

1542  GTVVVTQNLVSCRQERYFKNPTKFIPERWMRETKEDVNPYLVLPFGHGMRSCIARRMAEQ  1363

1362  NMLVLLLR (0)

1276  LIRSYEIDWKGKVPMNIETKLINQPDQPIKIAFRSRKS*  1160

 

>CYP302A1v1 AY947549 AAGE01063658 complete 76% to 302A1 Anopheles

8 aa diffs to AY947550, an allele?

MYLSPVKSKIKSATLMTSRSCATVVLENVKPYNQIPGPRGPFGL

GNLYQYIPGIGKYSFDALHESGQDKYEKYGPIVRETMVPGQDIVWLYDPNDIAAVLND

KTPGIYPSRRSHTALAKYRRDRPNVYRTAGLLATNGIEWWKIRSELQKGLSSPQSVRN

FLPLTDKVTREFVASMNSTEHDCVPDFMPAISRLNLELICVMAFDVRLDSFSDEQMKP

NSLSSRLMESAEVTNQSILPTDQGFQLWKYFETPAYRKLRKAQEFMEKTAVELVSQKL

LYFDEDQQKLASGRHRSRSLLEEYLRNPNLELHDIIGMAADLLLAGVHTSSYTTAFAL

YHLCLNPDAQDKLYQEACRILPDPWECQIEAAALNSEASYCRAVLKESLRLNPISIGV

GRILNKDATLGGYHVPKGTVVVTQNLVSCRQERYFKNPTKFIPERWMRETKEDVNPYL

VLPFGHGMRSCIARRMAEQNMLVLLLRLIRSYEIDWKGKVPMNIETKLINQPDQPIKI

AFRSRKS

 

AAGE02020818.1 Length=128320 3 aa diffs to AY947549, use this seq

11 aa diffs to AY947550, 6 aa diffs to AAGE02035843 (allele?)

48630  MYLSPVISKIKSATLMTSRSCATVVLE (2)

       NVKPYNQIPG  48451

48450  PRGPFGLGNLYQYIPGI (1) 48400

48337  GKYSFDALHESGQDKYEKYGPIVRETMVPGQDIVWLYDPNDIAAVLNDKTPGIYPSRRSH  48158

48157  TALAKYRRDRPNVYRTAGLLAT (2)

       NGIEWWKIRSELQKGL  47978

47977  SSPQSVRNFLPLTDKVTREFVASMNSTEHDCVPDFMPAISRLNLE (1)

47784  LICLMAFDVRLDSFSDEQMKPNSLSSRLMESAEVTNQSILPTDQGFQLWKYFETPAYRKL  47605

47604  RKAQEFMEKTAVELVSQKLLYFDEDQQKLASGRHRSRSLLEEYLRNPNLELHDIIGMAAD  47425

47424  LLLAGVHTSSYTTAFALYHLCLNPDAQDKLYQEACRILPDPWECQIEAAALN (1)  47269

       SEASYCRAVLKESLRLNPISIGVGRILNKDATLGGYHVPKGTVVVTQN  47065

47064  LVSCRQERYFKNPTKFIPERWMRGTKEDVNPYLVLPFGHGMRSCIARRMAEQNMLVLLLR  (0) 46885

       LIRSYEIDWKGKVPMNIETKLINQPDQPIKIAFRSRKS*
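The allele notes for CYP302A1 ("8 aa diffs to AY947550", "3 aa diffs to AY947549") count residue mismatches between closely related sequences. A minimal sketch of such a count follows. It assumes the two sequences are already aligned to the same length with no indels. It is not necessarily how the counts above were obtained, and the function name is hypothetical.

```python
from typing import List, Tuple


def aa_differences(a: str, b: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, residue in a, residue in b) for every mismatch
    between two pre-aligned, equal-length protein sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# First 12 residues of CYP302A1v2 (AY947550) vs CYP302A1v1 (AY947549) as listed above.
print(aa_differences("MYLSLVKSKIKS", "MYLSPVKSKIKS"))  # [(5, 'L', 'P')]
```

The length of the returned list is the difference count, and the positions make it easy to check whether a difference sits at an intron boundary.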

 

>CYP314A1 AY947552 complete 63% to 314A1 Anopheles

MSVTIILFYVFITAFMLLSYNPKPRKILESIKSFLLHLVHLSKN

VVSTTIHVPPDAMQMNQAEPVYTVWDIPGPKRLPLVGTRWMYYVGRYKLNKMQDAFVD

LHKRYGNIVLEFDHVPIVNLFDRVDMEKVLKYPSKYPYRPPTEIVEYYRRSRPDRFAS

TGIVNTQGEQWHELRVKLTSGIMSRKLLQAFIPTLNEIADEFVTLIRQKRDSNDCVKD

FQDIANTVGLEIICCFVLGRRMGYMSGDKQKNEKFVKLAEAVKSTFMYISQSYYGVKL

WKYLPTKLYRDYVRCEEIIYDTIAEIVNEALAEEQEKCAADDMRGIFLNILQSEGLDK

KDKIAGIIDLIHAAIETFSNTLSFLLNNMTSHPERQARIASEFTSDTITNNDLVNAAF

TRACIKESYRISPTTPCLARILEEDFDLSGYQLKAGTVVLCHTRVACQNEQNFQQANT

FLPERWLEQVDENQNVYKLDEPGSSLVLPFGTGRRMCPGNKIIEIELTLIMAKIFQQF

KVEYHSQLDTQFQFLLAPGTPIEIIFRDRD

 

>CYP315A1 AY947551 494561798 494528462 complete 48% to 315A1 Anopheles

MKLMSSVASGKARPFQDLPGPRRFPFLGTINDIIHLGNPKTLHL

TISKHHIKYGPLFKIQIGNVNAVFIKDPDMMRSVFAYEGKFPKHPLPEAWTYFNEKRK

CKRGLFFMDDEEWLHYRKLLNQPLLRNTSWMIGPIKRVSDNTIKSLPHNAKHSDCKEK

RFELHNVESVLYKWSIEVLLSVMLGSSYNEINAIKLNELVEQFSRTVYQIFMYSSKLM

AVPPEIADRLQLDAWKQFERIVPESLAIANKIIDISIDDIERGDGLLSKLEDCIPSRD

SIKRIFSDFIIAAGDTTAFATLWCLYLLAKNQAVQTMVRDETKHDFLESPLIRATVKE

SLRLFPIAPFIGRFLATDAIIGDYCIPKNTLALLSLYSAGRDEVNFYLPNEFLPQRWL

RRDDKNQSIIPFNANASLPFAIGSRSCIGRRVALIQMQYLLSKILNEYRLTVLNDEEV

DAELKLVTVPNKKVKLAFHKLQ

 

AAGE02021700.1 Length=168697 use this seq

141373  MKLMSSVASGKARPFQDLPGPRRFPFLGTINDIIHLGNPKT  (2) 141495

142158  LHLTISKHHIKYGPLFKIQIGNVNAVFIKDPDMMRSVFAYEGKFPKHPLPEAWTYFNEKR  142337

142338  KCKRGLFFM (2) 142364

142420  DDEEWLHYRKLLNQPLLRNTSWMIGPIKRVSDNTIKSLPHNAKHSDCKEKRFELHNV  142590

142591  ESVLYKWSIE (1)

147854  VLLSVMLGSSYNEINAIKLNELVEQFSRTVYQIFMYSSKLMAVPPEIADRLQLDAWKQ  148027

148028  FERIVPESLAI (1)

        ANKIIDISIDDIERGDGLLSKLEDCIPSR  148207

148208  DSIKRIFSDFIIAAGDT

148323  TAFATLWCLYLLAKNQAVQTMVRDETKHDFLESPLIRATVKESLRLFPIAPFIGRFLATD  148502

148503  AIIGDYCIAKN (0)

        TLALLSLYSAGRDEVNFYLPNEFLPQRWLR  148682

148683  RDDKNQSIIPFNANASLPFAIGSRSCIGRKVALIQMQYLLSK  (0) 148808

167332  ILNEYRLTVLNDEEVDAELKLVTVPNKKVKLAFHKLQ  167442
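This sketch assumes that the numbers flanking each line are the inclusive first and last nucleotide positions of that stretch of protein. On the minus strand, start is greater than end, as in the CYP302A1 listing. Under that assumption, each line should cover exactly three nucleotides per residue, with a terminal '*' counted as the stop codon. The hypothetical helper below applies that check. Lines broken at intron phases, or lines with abbreviated end coordinates, will not fit this simple rule. In the usage example, the last line of the 494087031 entry further down (4195322 to 4195364) fails the check, so one of its coordinates may be worth a second look.

```python
def span_matches_residues(start: int, end: int, protein: str) -> bool:
    """True if an inclusive nucleotide span holds exactly one codon per residue.

    Works on either strand (start > end on the minus strand); a trailing '*'
    counts as a residue because the stop codon occupies three nucleotides.
    """
    length = abs(end - start) + 1
    return length % 3 == 0 and length // 3 == len(protein)


examples = [
    ("CYP315A1, AAGE02021700.1", 142158, 142337,
     "LHLTISKHHIKYGPLFKIQIGNVNAVFIKDPDMMRSVFAYEGKFPKHPLPEAWTYFNEKR"),
    ("CYP302A1, AAGE02020818.1 (minus strand)", 48337, 48158,
     "GKYSFDALHESGQDKYEKYGPIVRETMVPGQDIVWLYDPNDIAAVLNDKTPGIYPSRRSH"),
    ("AAGE01025218, last line with stop", 3959247, 3959366,
     "RNFSLKTNLRHSHLRYKFGMTLKLPFAHAIQVYKRNIEQ*"),
    ("494087031, last line", 4195322, 4195364,
     "GQRYAMMSLKVMLIYLIRYFRMETHLRQEDLRFSFGMMLELSTECLVRFNKR*"),
]
for label, start, end, protein in examples:
    # prints True, True, True, False
    print(f"{label}: {span_matches_residues(start, end, protein)}")
```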

 

>CYP12F8 ae11 AAGE01032419 AAGE01086349 TC54878 TC56637 56% to CYP12F2

757957630 570857480 636185159 complete

578659905  634992373  826015391  815163912  749851347

MAGNRTVRLGCNLGQRMGLATRVTHHETEWDKALPYSKIPAPSVFKMLKNFGP (1)

GRQYNAGLPEVYRFFRDSYGDLVRMPGLFGKRDMLLSFHPDDYETLFRNEGQWPLRR

GLDTFGYYRMHVRPDVFKGKGGLVAE ()

GENWQKFRSTVNPVMLQPKTVKLYVNKLDKVALQLMGL ()

MLNMRDSKNELPANFKQWINRWALESMGVLALDTRFGLLDSKQSVEAQIIVTNLQEFFELT

YQLDVLPSIWRYYKTASFKRLITVLDRITE (2)

IVKSKIEEAAARLESNPSAPSETQSVLEKLLKVDRDIAFIMACDMLMAGVDT (0)

TGAGVTGILYCLATNPDKQAKLREEIRTILPNKDSALTPENMHNLPYLRACVKEC

IRLCPPVSANVRATGKDLVLRGYQVPKGTDVAMSSMILQNDERFMTRAKEFIPERWLKLD

DYPSVQDAHPFLILPFGFGVRTCIGRRLAMLEMEILTARITRLFEYRWNYGELKIRGNLV

NMPINELKFQMKEVED*

 

AAGE02011045.1 Length=46523 first exon; use this seq

AAGE02011046.1 Length=152209 exons 2-6

25298 MAGNRTVRLGCNLGQRMGLATRVTHHETEWDKALPYSKIPAPSVFKMLKNFGPG (1) 25459

2812  GRQYNAGLPEVYRFFRDNYGDLVRMPGLFGKRDMLLSFHPDDYETLFRNEGQWPLRRGLD  2991

2992  TFGYYRMHVRPDVFKGKGGLVAD  (2) 3060

3126  QGENWQKFRSTVNPVMLQPKTVKLYVNKLDKVALQLMGL (2)

      MI  3305

3306  NMRDSKNELPANFKQWINRWALESMGVLALDTRFGLLDSKQSVEAQIIVTNLQEFFELTY  3485

3486  QLDVLPSIWRYYKTASFKRLITVLDRITE (2) 3572

3632  IVKSKIEEAAARLESNPSAPSETQSVLEKLLKVDRDIAFIMACDMLMAGVDT  3787

7202  TGAGVTGILYCLATNPDKQAKLREEIRTILPNKDSALTPENMHNLPYLRACVKECIRLCP  7381

7382  PVSANVRATGKDLVLRGYQVPKG  (0) 7450

7519  TDVAMSSMILQNDERFMTRAKEFIPERWLKLDDYPSVQDAHPFLILPFGFGVRTCIGRRL  7698

7699  AMLEMEILTARITRLFEYRWNYGELKIRGNLVNMPINELKFQMTEVED*  7845

 

>TC62801 TC30081 TC42104 62% to CYP12F2 only 3 aa diffs to TC62802

633797604

LTKELFELVYQLDIRPSIWMYYKTSKYHRLMKVFDELTAIAMAKVDEAVVRLEKNPTTNTDAQSVLEK

LLKIDRDVAIIMSFDMVLAGIDT (0)

TTSAIIGILYCLANHPEKQAKLREELRGILPTKNSSLTPDNMLNL

PYLRACIKEGMRLFPPIGGNIRAAGKDIVLQGYRIPKG

TDVAMGSMVAQQSDRFVPRAKE

FLPERWLKTKEPGCPHAKDAHPFVYLPFGNGPRTCVGRRLAMLEMEILVAK

 

>This seq has good matches to 633797604 745118857 578662923 (2 nuc diffs up to PKG)

No exact matches to the part of this seq from position 547 onward. I am pretty sure that TC62801 is the same gene as TC62802.

ACTGGACATACGACCATCCATTTGGATGTACTACAAAACTTCGAAATACCATCGCTTCATGAAAGTTTTCGATGAACTGA

CTTCCATTGCCATGGCAAAAGTAGACGAAGCCGTCGTAAGGTTGGAAAAGAATCCTACTACAAACACCGATGCCCAAAGC

GTGTTGGAAAAATTGCTCAAAATTGATCGAGATGTAGCAATTATCATGTCCTTCGATATGGTTCTTGCTGGAATTGATAC

GGTGGGCAATAATTACCTAAAATATGTTCAATCGACACTACAAAGTCGCAATTACAGACAACTTCAGCCATCATTGGGAT

TCTATACTGCTTAGCTAATCATCCTGAGAAGCAGGCGAAACTGCGTGAAGAACTGAGAGGTATTCTGCCAACGAAGAATT

CCTCATTGACTCCAGACAACATGTTAAATCTGCCATATCTGCGTGCCTGCATCAAAGAAGGAATGCGATTGTTTCCACCG

ATCGGAGGGAACATCCGAGCTGCTGGAAAGGATATCGTACTACAGGGCTACCGAATTCCTAAAGGA

ACGGACGTAGCCAT

GGGTTCCATGGTTGCTCAGCAGAGTGATCGGTTCGTACCCCGGGCAAAAGAGTTTCTTCCGGAACGCTGGTTGAAGACTA

AGGAACCCGGATGTCCGCACGCCAAAGATGCCCATCCTTTCGTCTATCTACCGTTTGGAAATGGGCCTCGAACCTGCGTA

GGTCGAAGATTGGCCATGCTGGAGATGGAAATCCTGGTGGCGAAATCACTCGGCTATTCGAGTATCGTTGGAATTACGGG

GACTTGAATATCCAGACAACTCTGGTGAATACTCCGGTGAATGATTTGAAGTTCCAGATGGTGGAGGTTGATAGTTAGTT

GGGGCTGCTAAGGATGTCTATAATTCTCCATACTGATGTACTGGAATGAATCATAGGGTGCAGATTGCTCTATTGAATAA

TATTTTTTCAACAATAAATTTGTTCATGTGTTTACCTAAAACC
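The links at the top of this file point to an online DNA translator. As a minimal offline stand-in, the Python sketch below translates a nucleotide sequence in any of six reading frames using the standard genetic code. The frame labels are an assumed convention: +1 means offset 0 on the given strand, and -1 means offset 0 on the reverse complement. The function names are hypothetical. Translating the start of the sequence above at offset 1 reproduces the LDIRPSIWMYY stretch of TC62801.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str, offset: int = 0) -> str:
    """Translate from offset (0, 1 or 2); codons with N or other symbols become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(offset, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict[str, str]:
    """Translations in frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    rc = reverse_complement(dna)
    frames = {f"+{k + 1}": translate(dna, k) for k in range(3)}
    frames.update({f"-{k + 1}": translate(rc, k) for k in range(3)})
    return frames


print(translate("ACTGGACATACGACCATCCATTTGGATGTACTAC", offset=1))  # LDIRPSIWMYY
```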

 

>CYP12F5 AAGE01014703 TC62802 TC30082 TC42105 59% to CYP12F2 576558364 519971711

633083599 AAGE01014703.1 genomic clone in the WGS section of GenBank

Note: the two seqs are 93% identical at the nucleotide level

complete

MQLKSVSRVFTFNSNCFNRTFSIVPTRNYGVQTAPASEGSDPEWDNAKPYESIPRMTLMESVKNFFPG (1)

GRYHNVSITEMHRLFQEDYGDLIRFPGILGRKDTVMTYRPDDFEKLFRTEGTWPNRRGLD

TFVHYRKNVRPDVFKGVGGLVTE (2)

QGESWQKFRTIVNPVLLQPKTVRLYVDKLDEVTREFMNI (2)

MLKIRDNKNEMPADFNQWLNRWALEVTGVISVDSRLGVLDAEESEEAKRIVK (0)

LTKELFELVYQLDIRPSIWMYYKTSKYHRLMKVFDELTAIAMAKVDEAVVRLEKNPTTNTDAQSVLEK

LLKIDRDVAIIMSFDMVLAGIDTTTSAIIGILYCLANHPEKQAKLREELRGILPTKNSSL

TPDNMLNLPYLRACIKEGMRLFPPIGGNIRAAGKDIVLQGYRIPKGTDVAMGSMVAQQSD

RFVPRAKEFLPERWLKTKEPGCPHAKDAHPFVYLPFGNGPRTCVGRRLAMMEMEILIARI

TRLFEYRWNYGDLNIQTTLVNTPVNDLKFQMMEVDS*

 

>CYP12F7 AAGE01039158.1 CYP12F N-term exon complete

end of this contig matches 579038185 803270410 580023179 832546439

821664975 walked from here to try to find seq DR747927

633081412

Did a megablast of the first 2 kb of AAGE01039158 against the WGS section, then searched the mate pairs for a match. 519876045 had a mate pair (519934913) that matched DR747927. Therefore this first exon belongs to DR747927 and completes that gene.
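The mate-pair walk described above amounts to a join. Take the reads that hit the end of the contig, look up each read's mate, and keep the pairs whose mate matches the EST. The Python sketch below assumes the mate-pair relationships are already available as a plain dictionary. That dictionary is a hypothetical in-memory stand-in for the Trace Archive mate information. Only the IDs 519876045 and 519934913 come from the note; the others are placeholders.

```python
def linking_pairs(contig_end_reads, mate_of, est_reads):
    """Sorted (read, mate) pairs where the read hits the contig end and its
    mate hits the EST.

    contig_end_reads: read IDs found by megablast of the contig end
    mate_of:          read ID -> read ID of its mate pair (hypothetical table)
    est_reads:        read IDs that match the EST
    """
    return sorted(
        (read, mate_of[read])
        for read in contig_end_reads
        if mate_of.get(read) in est_reads
    )


# 519876045 / 519934913 are from the note above; 100 and 200 are made-up placeholders
contig_end_reads = {519876045, 100}
mate_of = {519876045: 519934913, 100: 200}
est_reads = {519934913}
print(linking_pairs(contig_end_reads, mate_of, est_reads))  # [(519876045, 519934913)]
```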

DR747927 18 July 2005 EST AAGE01381702 AAGE01102867

63% to 12F2 223403769

592051743 627484635 569653177 578234080 528591681

MLRKAVKVNNSVVVSSVRFRSTQAQAAVANGAAAHEARDSEWDNALPFSKIPGPNVFQMLKSFAP (1)

GRYHNANLPTMHRLFREDYGELVRMPGLFGRRDVLLSFR

PDDYETLFRNEGQWPIRRGIDTFAYYRQKVRPDVFKGLGGLVTE (2)

QGENWQTFRTAVNPVMLQPKTVKLYVDKLDAVAQEFMKM (2)

MVRIRDDKNELPGDFSQWLNRWALETMGVLALDTRLGVLDESESEEAKSIVDNIRQFFELTYQ

LDVLPSVWKYYKTPTFHKLMSVLDVLTS (2)

IVMAKVDDAVIRLEKNPSAPSDTQSVLEKLLKVDRHVAIVMAFDMLLAGVDT (0)

TSSGTTGILYLLATNPDKQAKLREELRTVLPKKDSPLTAENMRNLPYLRACIE

EGLRLCPPTAGNVRAAGKDLVLQGYQIPKG ()

TDVAMASMILNQEETYVKRAKEFLPERWLKEDGYPNAKDAHPFLYLPFGFGARTCIGRRLAMLE

MEMIVSRITRQFDYRWNYGELEVRASLINVPINELRFQMTELDD*

 

>CYP12F6 new AAGE01025257.1 AAGE01059738 AAGE01062828 762706782 633806669

520523625 826091106 832456020 824229657 749956998 complete

MQSKGRSVVLKATLDHYSGRYVTIDRCYGIQVAPSSEGRDPEWDNAKPYEQIPRMTFFQALKNFVPE (1)

GRYHNVSATEMHRLFQQDFGDLVRFPGILGRKDTVMTFQPDDFEKVFRTEG

PWPNRRGLASFVHYRKEVRPEVFKGLGGLVSEQGENWQKFRSIVNPVLLQPKTVRSYVGK

LDEISREFMNM

MLKIRDDKNELPADFSQWLIRWSLESTGVLSVDSRLGVLDEQESDKARQI (0)

LTKELFELVYQLDILPSIWVYYKTPKYHRLMKVFDELTSIAMAKVDEAVLRLEKNPSTTSDA

QSVLEKLLKIDRNVAIVMSFDMILAGVDT (0)

TTSAIIGILYHLARHPEKQAKLREELLTIMPKKDTSLTPDNMQKLPYLRAFIKEGIRLFP

PIVGNLRAAGKDIVLQGYRIPKG (0)

TDIGMGSMVAQQSDRFVPRAKEFLPERWLKTKEPGCPHAKDAHPFVYLPFGNGPRTCVRR

RLAMLEMEILIARITRLFEYRWNNGDLKIQTTLVNTPVNDLKFQMVEVDD*

 

AAGE02003131.1 Length=176362 use this seq

26817  MQSKGRSVVLKATLDHYSGRYVTIDRCYGIQVAPSSEGRDPEWDNAKPYEQIPRMTFFQA  26996

26997  LKNFVPE (1) 27017

27079  GRYHNVSATEMHRLFQQDFGDLVRFPGILGRKDTVMTFQPDDF  27207

27208  EKVFRTEGPWPNRRGLASFVHYRKEVRPEVFKGLGGLVSEQGENWQKFRSIVNPVLLQPK  27387

27388  TVRSYVGKLDEISREFMNI  (2) 27444

27509  MLKIRDDKNELPADFSQWLIRWSLESTGVLSVDSRLGVLDEQESDKARQIV (0)  27658

27726  LTKELFELVYQLDILPSIWVYYKTPKYHRLMKVFDELTSIAMAKVDEAVLRLEKNPSTTS  27905

27906  DAQSVLEKLLKIDRNVAIVMSFDMILAGVDT  (0) 27998

28055  TTSAIIGILYHLARHPEKQAKLREELLTIMPKKDTSLTPDNMQKLPYLRAFIKEGIRLFP  28234

28235  PIVGNLRAAGKDIVLQGYRIPKG (0) 28302

28369  TDIGMGSMVAQQSDRFVPRAKEFLPERWLKTKEPGCPHAKDAHPFVYLPFGNGPRTCVRR  28548

28549  RLAMLEMEILIARITRLFEYRWNNGDLKIQTTLVNTPVNDLKFQMVEVDD*  28701

 

220 Aedes accession numbers from the NCBI WGS section, in order, each with its BLAST E-value (incomp = incomplete)

AAGE01000026.1  1e-50

AAGE01000868.1  9e-26 2 genes

AAGE01001298.1  9e-39 incomp 2 genes

AAGE01001411.1  9e-31

AAGE01001430.1  1e-45

AAGE01001451.1  3e-28 incomp

AAGE01002325.1  1e-20

AAGE01003123.1  6e-23 pseudogene

AAGE01003202.1  1e-19

AAGE01003592.1  5e-54

AAGE01003622.1  2e-24

AAGE01004063.1  1e-36

AAGE01004071.1  2e-15

AAGE01004684.1  3e-27

AAGE01004894.1  6e-19

AAGE01005096.1  6e-24

AAGE01005098.1  1e-69

AAGE01005157.1  2e-35

AAGE01005255.1  5e-32 incomp

AAGE01005406.1  3e-18

AAGE01005840.1  1e-32 incomp

AAGE01005986.1  2e-23

AAGE01006231.1  1e-26

AAGE01006393.1  3e-37 incomp

AAGE01007189.1  5e-21

AAGE01008959.1  2e-18

AAGE01009694.1  4e-41 incomp

AAGE01009885.1  1e-32 incomp

AAGE01010708.1  2e-43

AAGE01011003.1  5e-23 incomp

AAGE01011017.1  3e-17

AAGE01012031.1  3e-20

AAGE01012700.1  3e-18

AAGE01014192.1  3e-28

AAGE01014703.1  2e-59

AAGE01015732.1  5e-29

AAGE01015749.1  4e-68

AAGE01015918.1  8e-37 incomp

AAGE01018213.1  5e-38 incomp

AAGE01019797.1  2e-49

AAGE01020246.1  4e-19

AAGE01020741.1  1e-87 incomp

AAGE01021462.1  3e-68 incomp

AAGE01021812.1  9e-36 incomp

AAGE01021887.1  9e-62

AAGE01021948.1  3e-27

AAGE01023514.1  5e-29

AAGE01023613.1  4e-23

AAGE01024111.1  1e-17

AAGE01024167.1  4e-27 incomp

AAGE01024220.1  3e-62 incomp

AAGE01024260.1  6e-22

AAGE01025218.1  4e-37

AAGE01025257.1  3e-61

AAGE01025833.1  2e-09

AAGE01026029.1  1e-29 incomp

AAGE01026936.1  3e-44

AAGE01026951.1  2e-64

AAGE01027375.1  1e-34

AAGE01027431.1  4e-71

AAGE01028822.1  6e-46

AAGE01029369.1  1e-55 incomp

AAGE01029809.1  4e-22

AAGE01030046.1  4e-68

AAGE01031181.1  1e-19

AAGE01032419.1  1e-99

AAGE01032555.1  9e-17

AAGE01035444.1  4e-23

AAGE01039338.1  2e-24 incomp

AAGE01039952.1  6e-24

AAGE01041126.1  3e-45 incomp

AAGE01041187.1  2e-18

AAGE01043608.1  2e-29 incomp

AAGE01044016.1  2e-24

AAGE01045021.1  9e-44 incomp

AAGE01046733.1  7e-30 incomp

AAGE01047465.1  5e-23

AAGE01047841.1  6e-19

AAGE01048909.1  4e-35

AAGE01049176.1  6e-39

AAGE01050332.1  4e-33 incomp

AAGE01051792.1  4e-44 incomp

AAGE01051934.1  2e-19

AAGE01052546.1  2e-19 pseudogene

AAGE01054542.1  1e-24

AAGE01054827.1  4e-33 incomp

AAGE01055570.1  4e-41 incomp

AAGE01059014.1  3e-42 incomp

AAGE01059738.1  3e-28

AAGE01062475.1  7e-28 incomp

AAGE01062828.1  9e-26

AAGE01063015.1  2e-28 incomp

AAGE01063458.1  1e-26

AAGE01063658.1  2e-30

AAGE01064173.1  8e-57 incomp

AAGE01064689.1  6e-71

AAGE01065173.1  2e-50 incomp

AAGE01065191.1  7e-20

AAGE01065801.1  3e-24

AAGE01070673.1  3e-30 incomp

AAGE01072700.1  7e-36

AAGE01073711.1  1e-21 incomp

AAGE01073923.1  3e-101

AAGE01074064.1  1e-44 incomp

AAGE01075209.1  1e-64 incomp

AAGE01075759.1  8e-27 incomp

AAGE01076911.1  4e-41

AAGE01078331.1  1e-59

AAGE01078584.1  1e-39 pseudogene

AAGE01081732.1  2e-08

AAGE01082298.1  6e-28

AAGE01082531.1  1e-59

AAGE01082714.1  2e-27

AAGE01083421.1  5e-46

AAGE01084906.1  2e-31 incomp

AAGE01086349.1  2e-132

AAGE01088707.1  2e-73

AAGE01096311.1  3e-11

AAGE01098313.1  2e-15 pseudogene

AAGE01099852.1  7e-23

AAGE01102043.1  2e-33

AAGE01102574.1  1e-30 2 genes

AAGE01102867.1  2e-77 incomp

AAGE01104491.1  2e-10

AAGE01105997.1  1e-15 incomp

AAGE01106416.1  7e-34

AAGE01108571.1  4e-23

AAGE01109944.1  2e-15

AAGE01111617.1  3e-31 incomp

AAGE01114730.1  2e-40 incomp

AAGE01114834.1  1e-34 incomp

AAGE01116725.1  4e-26 incomp

AAGE01116789.1  9e-25

AAGE01118978.1  5e-33

AAGE01123974.1  4e-18 pseudogene

AAGE01124741.1  2e-72 incomp

AAGE01124926.1  2e-26 incomp

AAGE01125862.1  8e-24

AAGE01126587.1  2e-24 incomp

AAGE01129498.1  4e-09

AAGE01132222.1  5e-35 incomp

AAGE01133741.1  1e-25

AAGE01138953.1  6e-40 incomp

AAGE01139230.1  2e-33 incomp

AAGE01142069.1  6e-24

AAGE01143020.1  4e-50 incomp

AAGE01147701.1  5e-37 incomp

AAGE01152759.1  1e-46 incomp

AAGE01153865.1  3e-41 incomp

AAGE01171970.1  1e-23

AAGE01172381.1  2e-25

AAGE01173027.1  1e-52

AAGE01179692.1  3e-21

AAGE01185776.1  6e-20

AAGE01187448.1  3e-27

AAGE01192518.1  3e-07 incomp

AAGE01194580.1  3e-90

AAGE01196445.1  2e-20 incomp

AAGE01198540.1  4e-21

AAGE01198792.1  1e-20 2 genes

AAGE01206292.1  3e-33

AAGE01206812.1  1e-25

AAGE01213118.1  6e-22 incomp

AAGE01216085.1  3e-33 incomp

AAGE01224146.1  6e-35 incomp

AAGE01225620.1  1e-26 incomp

AAGE01226366.1  2e-37 incomp

AAGE01227048.1  6e-21 pseudogene

AAGE01227180.1  6e-43 incomp

AAGE01227281.1  5e-35

AAGE01228356.1  8e-35

AAGE01234290.1  4e-41

AAGE01236202.1  2e-21

AAGE01239763.1  1e-78 incomp

AAGE01245552.1  2e-54 incomp

AAGE01253357.1  2e-117

AAGE01259804.1  5e-29

AAGE01266366.1  1e-20

AAGE01269158.1  1e-21 incomp

AAGE01269282.1  6e-37 incomp

AAGE01273771.1  3e-23

AAGE01277917.1  1e-21

AAGE01288441.1  2e-32 incomp

AAGE01293591.1  4e-39 incomp

AAGE01298676.1  2e-36 incomp

AAGE01303772.1  4e-41 incomp

AAGE01321728.1  3e-21

AAGE01331087.1  4e-39 incomp

AAGE01337778.1  1e-18

AAGE01338874.1  1e-61

AAGE01339434.1  2e-23

AAGE01340100.1  2e-36 incomp

AAGE01341824.1  2e-22 incomp

AAGE01347884.1  2e-41 incomp

AAGE01378346.1  8e-35

AAGE01381118.1  7e-55 incomp

AAGE01381702.1  1e-48 incomp

AAGE01395583.1  3e-38 incomp

AAGE01397643.1  3e-24 incomp

AAGE01400897.1  9e-23 incomp

AAGE01406122.1  5e-22

AAGE01408667.1  6e-30

AAGE01439874.1  5e-30

AAGE01449675.1  2e-59

AAGE01462557.1  3e-46 incomp

AAGE01484914.1  1e-24 incomp

AAGE01489548.1  8e-32 incomp

AAGE01491115.1  4e-62 incomp

AAGE01493222.1  3e-36

AAGE01504815.1  3e-21

AAGE01516637.1  2e-33 incomp

AAGE01528761.1  2e-25

AAGE01531287.1  4e-55 incomp

AAGE01553900.1  2e-22

AAGE01569058.1  2e-25

AAGE01574909.1  4e-39

AAGE01584611.1  8e-32 incomp

AAGE01593486.1  3e-18 incomp

AAGE01635404.1  2e-40

AAGE01635520.1  7e-89
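The list above can be tallied mechanically. The Python sketch below assumes one entry per line, in the form: accession, E-value, optional note ("incomp", "pseudogene", "2 genes"). The 1e-30 cutoff is an arbitrary example, not a threshold used in this file.

```python
import re
from collections import Counter

# accession, E-value, then an optional free-text note
ENTRY = re.compile(r"^(AAGE\d+\.\d+)\s+(\S+)\s*(.*)$")
FLAGS = ("incomp", "pseudogene", "2 genes")


def parse_wgs_list(text: str) -> list[tuple[str, float, str]]:
    """Parse 'AAGE01000026.1  1e-50 [note]' lines; other lines are ignored."""
    records = []
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            accession, evalue, note = m.groups()
            records.append((accession, float(evalue), note.strip()))
    return records


def summarize(records, max_evalue: float = 1e-30):
    """Count flagged entries and list accessions at or below max_evalue."""
    flags = Counter(flag for _, _, note in records for flag in FLAGS if flag in note)
    strong = [acc for acc, evalue, _ in records if evalue <= max_evalue]
    return flags, strong


sample = """
AAGE01000026.1  1e-50
AAGE01001298.1  9e-39 incomp 2 genes
AAGE01003123.1  6e-23 pseudogene
AAGE01025833.1  2e-09
"""
flags, strong = summarize(parse_wgs_list(sample))
print(dict(flags))  # {'incomp': 1, '2 genes': 1, 'pseudogene': 1}
print(strong)       # ['AAGE01000026.1', 'AAGE01001298.1']
```

Pasting the full list into `sample` should give 220 records.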

 

Blast of Broad Institute new assembly of Oct 6 2005-09-22

Query= 6331

        (52 letters)

Database: /seq/annotation/blast_databases/insect/aedes_aegypti/aedes_aegypti_1.fasta

          4758 sequences; 1,383,971,543 total letters

Searching..........done

                                                                  Score     E

Sequences producing significant alignments:                        (bits)  Value

Aedes aegypti supercontig 1.6                                         106  9e-23

Aedes aegypti supercontig 1.174                                        54  7e-07

Aedes aegypti supercontig 1.738                                        50  1e-05

Aedes aegypti supercontig 1.610                                        45  3e-04

Aedes aegypti supercontig 1.3561                                       43  0.002

Aedes aegypti supercontig 1.3299                                       42  0.002

Aedes aegypti supercontig 1.187                                        42  0.003

Aedes aegypti supercontig 1.106                                        42  0.004

Aedes aegypti supercontig 1.170                                        41  0.005

Aedes aegypti supercontig 1.283                                        40  0.011

Aedes aegypti supercontig 1.467                                        40  0.014

Aedes aegypti supercontig 1.457                                        37  0.092

Aedes aegypti supercontig 1.105                                        36  0.20

Aedes aegypti supercontig 1.197                                        32  3.9

Aedes aegypti supercontig 1.1912                                       32  3.9

Aedes aegypti supercontig 1.778                                        30  8.6
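The one-line hit summary of a BLAST report like the one above can be parsed into (supercontig, bit score, E-value) records and then filtered. The Python sketch below assumes the column layout shown here. The 1e-3 cutoff is only an example. Legacy BLAST output may abbreviate very small E-values with no leading digit (e.g. "e-100"), so the parser supplies the missing 1.

```python
import re

# "<description>   <bit score>  <E-value>" as in the one-line hit summary
HIT = re.compile(r"^(?P<name>.+?)\s+(?P<bits>\d+)\s+(?P<evalue>\S+)$")


def parse_evalue(text: str) -> float:
    """Handle abbreviated E-values such as 'e-100' as well as '9e-23' or '0.002'."""
    return float("1" + text) if text.startswith("e") else float(text)


def parse_hit_summary(report: str) -> list[tuple[str, int, float]]:
    hits = []
    for line in report.splitlines():
        line = line.strip()
        if not line.startswith("Aedes aegypti supercontig"):
            continue  # skip headers, blank lines and '>' alignment headers
        m = HIT.match(line)
        if m:
            hits.append((m["name"], int(m["bits"]), parse_evalue(m["evalue"])))
    return hits


summary = """
Aedes aegypti supercontig 1.6          106  9e-23
Aedes aegypti supercontig 1.174         54  7e-07
Aedes aegypti supercontig 1.738         50  1e-05
Aedes aegypti supercontig 1.610         45  3e-04
Aedes aegypti supercontig 1.3561        43  0.002
Aedes aegypti supercontig 1.197         32  3.9
"""
hits = parse_hit_summary(summary)
significant = [name for name, bits, evalue in hits if evalue < 1e-3]
print(significant)  # supercontigs 1.6, 1.174, 1.738 and 1.610
```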

>Aedes aegypti supercontig 1.6

         Length = 5075626

 

>AAGE01025218 Frame = +

3957227 MLLIPVLLLIAIFSATWVLIILYIKRNRAFARSLTLHPPKVYFLGMDLTMAVEDEVQRFE 3957406

3957407 SVWRMFLSHDRMFKHLLGPIMGIGISHPDLMHKVLSHPDCLEKPFFYNFVQLEHGIFSAE (1) 3957586

3957651 YLWKGQRKALNPTFNMKILNSFISIFEDCSSRMVADLFKCANGETVDMFQFTSKCTLEMV 3957830

3957831 CATTLGSNVLEREGSDEFLRNMEG (2) 3957902

3957958 LFELVGKRMLSVELFLDSIYRLTSYYRKEMKIRKKIEEFSGN (0) 3958083

3958499 IIREKRREHMFCLNQQHLHNASTPKEDEDDIRKPQIFIDQLLSLSNSSRPFTDEEILHNVLTIMIA (0) 3958696

3958761 QGNDTSGLGVAHACLFLAIYPNIQQKVYDEVMKHFPPDGPND 1886

3958887 RISLDADFLRQLEYTEMFLKEVLRHCPVAPTVARQNLKELELDGVRIPAGNTLSFSFFAL 3959066

3959067 HRRKDIWGPDAEKFDPENFAPERCEKRHPYAFMPFSSGSRNCIGGRYAMISMKVMIVYIV 3959246

3959247 RNFSLKTNLRHSHLRYKFGMTLKLPFAHAIQVYKRNIEQ* 3959366

 

>AAGE01064173 Frame = +

3965717 MFLFLLLVCAILAFFAIRDHLRKTQAFAENLPIVVPEKSFLGINYDLLGLNDEERFELVN 3965896

3965897 RIFLQQDRLFRMSIGPMLILGVSHPDLVQKLLSHPDCLEKPFFYDFVKYDQGIFSAK (1) 3966067

3966133 FKLWKSQRKALNPTFNLRILHSFVPIFEKCSKKLVSELEKCKDGDTVNMFKYTSRCALEMV 3966315

3966316 CGTTLGSDVLQRDGKEVFLTSLEE (2) 3966387

3966451 LFLLVSRRMLSMHLYSDLIYMMTPHYWKELIHRKRCKAFTKK (0) 3966576

3974507 ILQEKKEARRYGATPESTPDSDPEADDFKKPQIFIDQLLSTSESSRPFTDEEIFHNVFVIMVA (0) 3974695

3974757 GNDTSGLATAYACLFLGMYPHIQEKVYAEVMEHFPNEDVEMTGDSLKQLEYTEMFLKEVL 3974936

3974937 RHCPVAANIARQCIKDIEIDGTRVPAGNLFIFTFWAMHRRKDIWGPDADKFDPDNFLPER 3975116

3975117 VQARNPNAYMPFSSGSRNCI (1) 3975176

3975245 GGRYAMISIKVMLVYLLRRFKLHTNLKHEDLRYKFGITLRLSTSHMVQLERRKC* 3975409

 

>AAGE01050332 + AAGE01245552 Frame = +

3997728 MCLYLLVSTFVLAFVWICESLRRKNAFAKNLPMAKPIKSFLGVDYSIMDMSDEERFEVMN 3997907

3997908 DCFARFDRLFVFYTGPLLVLAVSHPDLVQKLLSHPDCLEKPYFYDFVKFEQGIFSAK (1) 3998078

3998611 YKLWKGQRKALNPTFNLRILHSFFPIFDECSKKLVQELKKLPKGETVNLFRYTSHCALEMV 3998793

3998794 CGTTLGSDVLEREGKDEFLCALEE (2) 3998865

3998920 IFGLVSRRMLSVHLYSDLIYMMTPAYWKEQFARNKLRSFAMK (0) 3999045

4013069 ILQEKKETARDTKTNGTDLDSEPETEEFKKPQIFIDQLLSISDISRSFTDEEILCNVLVIMIA (0) 4013257

4013317 GNDTSGLAVAYGCLFLAMFPQIQERVYAEIMEHFPSDEMEITADSLRLLEYTERFLKETL 4013496

4013497 RHCPVAANIARENMKDIELDGVMIPAGTKFTVSFWALHRRADMWGPEVHSFDPDHFLPER 4013676

4013677 CRDRNPNAYMPFSTGARNCI (1) 4013736

4013935 IGGRYAMLSTKVMLIHILKNFKITTKLRFEDMRYKFGMTLKMSTDHLVQLERRF* 4014099

 

>AAGE01024167b + AAGE01224146 Frame = -

4041637 MDLFLLLLTGPLAIVFLIFLYVRVLQYINRFANSVPFGGMSRYPLFINDWKLLRASPVQK 4041458

4041457 FEILAETFAQHDRLFRVWFGPRMAFATCHPDVIQAILTHPECVDKPFFYRFARLDHGLLVGR (1) 4041272

4032230 GHLWRRQRKQLNPTFNLRILTSFLPIFEKCCQQMVNCLEPFANGDRIDILQHTTRCTLNMI 4032048

4032047 LQTSLDTDSLSNEESASLVKHIKR (2) 4031976

4031919 FFFISTNRVLNLHHYWEPVYRLTKNFAMESESYGVILGATRK (0) 4031794

4031403 ILNIKKNEMKDKPLNENDLEYKKPRIYMDQLLKLSDTMSDKEIMHNVCTMIAA 4031245

4031185 GNDTSGQLMAYACLLLGMYPHIQEKVYSEIIELIPLTRKESISVEQLKTLTYTEMFMFEC 4031006

4031005 LRLCPIAPNIARLNMTPIELEGITIPAGHIFFISFYSLHRRKDIWGPDAEQFDPERFSPE 4030826

4030825 RSVGRHLYAFLPFSGGSRNCIGWRYAMMSMKLMLVYLLREYRFRTDLKLSDLKFKFDMML 4030646

4030645 VLVFEHWVKIEKRRYNC* 4030592

 

>AAGE01001298.1 gene b Frame = -1

4051729 MVLLLISFLIVLTLLKLVHRNNHRFAKDLPSVEPCYPLLGNALMFVGKSPEQKFENLARG 4051550

4051549 FLQNDRLFKLWFGPKLTLGTSHPELVQKIVNHPDCIERPLFFYKQLRMTQGLLVAR (1) 4051382

4051319 YGLWKQQRKALNSTFNLKILHSFIPIFEECSRKLVNRLQNHVGCSKPINLAQFVSQCTLEM 4051137

4051136 VCGTTLGMEHLQQESGSRFLHHIERVMDIMGERILSIPMQITALYFFTPMFWQEMHSLKM 4050957

4050956 NRQYAAE (0) 4050936

4050882 IIDEGRRKMKANEQSNTIDEDQDGYHKPQIFLDQILSANRAGKPFDDEEIQHNVRTMIAA (0) 4050703

4043841 GNDTSALAISHCCLWLAMYPEIQERVYCEIKEHFPYPDSEITPEGLKNLIYTEMCIKETL 4043662

4043661 RLTGPAPNIARETLADVELDGLIVPKGTTIILSLYALHRRQDVWGPQADRFDPDNFDEDK 4043482

4043481 CRTRPAGVFIPFSTGPRDCI (1) 4043422

4043364 GRYAMISMKIMIMYILRNFKLITQLKPEQLRYKFGPTLKLACDHMIQLEKRVD* 4043203

 

>AAGE01001298.1 gene a + AAGE01043608 Frame = -

4076156 MITVLLFLVLFVVFIVVKYVKYERSFSFAKNIPSVEPAYPIVGNALQFVGKNGEELFKKFA 4075974

4075973 DMLNHPAKLFQMRMGVLRLFCTNDPDVAQKILTQCLEKPFLYDFFKLDYGLFSAH (1) 4075809

4075744 YDIWKNQRKSLNPTFNQKILNGFLPIFDQCAQNLVKRLQSCTDGDSVKITDCHLRCTLEML 4075562

4075561 CRTTFGVDINNNPNAFKLTALINE (2) 4075490

4062854 IIQEVINRRNKEAPTLDNSDPECDGYRKPQIFIEQLLNQQENNNFTEIEIIHNVYTMIVA (0) 4062675

4062608 GSDTTGNQLGYISLMLAFFPELQEKVFREVMEVFPGEIEFTVDNLRQLEYTEMFIKECLR 4062429

4062428 LLPIGPHVMRFTTADTELEGVSIPKGNILAVSIFNMHRRKDIWGPNADQFDPENFSAERS 4062249

4062248 KGRHPFAYVPFSGGNRNCI (1) 4062192

4062131 GSRYAMYSMKIVLVHLLRHFKIHTRRRFEDIRFEFEALLKMSIEPEVSLEKRVPVTIKRSN* 4061946

 

>AAGE01056055 Frame = + just exon 1, not a complete seq; 57% to AAGE01193335

AAGE01193335 lies 16 kb downstream in the same orientation, so this might be an alternative exon 1 for AAGE01193335.

4105214 MFGLTFALIVVYLLALYVYAKIKYRFANKIPSIEPMVPFFGNGLEFAQKNCYKIFVNLKRIF 4105399

4105400 ENNKHHRLFKLCFGPIVVLCPTHPDLIQKVMTDTGSMEKPYVYEFLRVDLGLLSAK (1) tgt 4105567

 

no P450 in interval between 4105567 and 4121642

 

>AAGE01193335 + Frame = +

4121633 MALLLITLALLGGIWLALYIYCRIRFGFARNIPEVRPLKFFFGNGLDFAQKNSYEIFVSINRV 4121821

4121822 FRENKRIFKISFGPIKVVCPTHPDLIQKVLCQSASMDKPYVYDFTRMGSGLLTAP (1) 4121986

4163966 YDTWKVHRKLLNPTFNTRILNSFIPIFNDCADKMIESIHEHAAPGKVLNILEFTSPCTLAM 4164148

4164149 ICRTSLGGKVLEREGTQKFVEGLEI (2) 4164223

4164287 ILSNVGLRMFNANLHPDIVYRFTRFYRREMESRKFCYAFTDK (0) 4164412

4164464 IILEKQQELAVAKAKGKQDTNNNASDSGNDNFEEEEDDLLSYKKPQIFIDQLLTIPLPD 4164640

4164641 GKPFSHKEISDHIYTMIAAGNETSATQAAHTLMYLAMHPEVQEKAVKEIKELLPTPESKI 4164820

4164821 TSEVMKNMVYMERIIKESQRLAPVAAVYGRKTIADLQLDQFTIPKGNIFILNIFALHRRK 4165000

4165001 EYWGEDAELFNPDRFLPENSKNRHPFAYLPFSGGNRGCI (1) 4165117

4165175 LGNRYAMMSMKTIVSAILRNFKISTDLEYEKIEFKFKVSMHLSGPHRTFVEPRNLYG* 4165348
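The "16kb downstream" note for AAGE01056055 follows from the coordinates. The candidate exon 1 ends at 4105567 and AAGE01193335 starts at 4121633. The small Python sketch below assumes both features are on the plus strand with inclusive coordinates. The same arithmetic gives the "22.4kb away" figure quoted for AAGE01102953 further down. The helper name is hypothetical.

```python
def intervening_bases(upstream_end: int, downstream_start: int) -> int:
    """Bases strictly between two plus-strand features (upstream one ends first)."""
    if downstream_start <= upstream_end:
        raise ValueError("features overlap or are out of order")
    return downstream_start - upstream_end - 1


for label, end, start in [
    ("AAGE01056055 exon 1 -> AAGE01193335", 4105567, 4121633),
    ("AAGE01102953 exon 1 -> AAGE01205264", 632836, 655243),
]:
    gap = intervening_bases(end, start)
    print(f"{label}: {gap} bp ({gap / 1000:.1f} kb)")
# AAGE01056055 exon 1 -> AAGE01193335: 16065 bp (16.1 kb)
# AAGE01102953 exon 1 -> AAGE01205264: 22406 bp (22.4 kb)
```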

 

>494087031 Frame = + with new N-term

N-term matches 836008679 836009201 812170512 591454297 625040575

4179656 MLITEFLILFLIVLVFIKLIKPLIQFRSIPYVRPWYPLVGNVFLFLGKTGEQLFDQMNCMF 4179838

4179839 AQHDRLFLLWFGIRPVVGVSHPELIRKVLTSRACLEKPFFYRFSRIDQGLWAAK (1) 4180000

4194135 TSLWRSQRKALNATFSPSILKNFIPTFETFSKQLVDKLHQYEGSTIDILPITSACTLRMI 4194314

4194315 GRTTMGIDENDEMEIAKFVSNMDK (2) 4194386

4194442 ITEVVSNRFLSVHLHSEQIYRMTQLYERETQYRKECSNYTMK (0) 4194567

4194630 VLKERKRKVYMELPGLQKVFIDQLLDECALGRSFSDTEIMQNVYTMLAA (0) 4194776

4194838 GSETTARSISYACLLLAIYPDIQEKVYAEIMSLLSDDIHPLTTATLAELTYMEAFLKE 4195011

4195012 CHRLYPVAPYIARESTESIELDGVCFPKGSVFIFNFFALHRSSAVWGIDSEQFNPERFLN 4195191

4195192 EQNGEHHAFGYLPFSGGQRNCI (1) 4195257

4195322 GQRYAMMSLKVMLIYLIRYFRMETHLRQEDLRFSFGMMLELSTECLVRFNKR* 4195364

 

>AAGE01077592 Frame = -

Note: the bottom of this seq is very similar (6 aa diffs) to AAGE01056055.

AAGE01056055 is assembled wrong; an end for AAGE01056055 does not exist.

4563278 MLQLVLVFVLFTGFTYYLAFRRSRKRLYELAATFPAPFDLPLIGSTYIGIGLNSKTIIEYLL 4563093

4563092 KFLHNLPSPFRAWMGPFLGIIFDKPQHLAVILNSQHCVQKSVFQKFFRFDKGLINS 4562925

4562924 DRNIWRPQRKQLAAPFSYQVVANFAPSFNEYAEEQLKYLDRFVGAEAFDMLP 4562769

4562768 KLSFYVLSSTLANLFKVQLHSHDYDFMEKFVKNSEQ (2) 4562661

4562598 MWINIFRRVYKPWLISEFIYRLTPAYKMELQQVGKLRALSEE (0) 4562473

4542500 IVEARKVLQQKSHPGDDHNASSEVLIERLERLTYQTGEMTNEEMMDNIDTFLFAAVDTTT 4542321

4542320 STMASTLLMMAIHPEVQERVYQEVSQVVPNDYIAIEDLPNLVYLERVMKETMRLIPIAGM 4542141

4542140 LNRVCEKELQVGEWTIPVGATIGIPVLKVHRDRAIWGERSDEFDPDNFLPEKVAQRHPYA 4541961

4541960 YIPFSAGIRNCVGMRYANVSMKVLLAKLVKRFRFKTDLRMKDLKFEAAFLMMLANKHMMR 4541781

4541780 IEKR* 4541766

 

>Aedes aegypti supercontig 1.170

         Length = 2080708

>AAGE01046733 Frame = +

557640 MYLLVFFGTIATLGTLWYWTYRWQFKFADKWPSVRPRYAIVGNALIMLWKNDVQRFQEIKRV 557825

557826 FSECDRILTAWAGPKMFLITSHPDIVHQILSSPDCLERPFLYRFAGFTQGIFTAK (1) 557990

558058 LPVWKDNRKRLNSTFNQRIVHGFVPYFVKCCEKMTKSLLECADGETVNIQKYTAVC 225

558226 ALEMAAGTTLGGDVLQQGDGKEEFKRGLDL (2) 558315

558381 AFNGASRRMVTVPFYSDLIYQMTHHYKELMEGRRIICDFFTK (0) 558506

569058 LLIERKKFLLDHSKNTDVDTEEEYNKPKILVDQLLGVSHDGRQFNDIQIRDNVYAVITGA 9237

569238 TDTTSLATAHACLFLSFYPDIQERLHAELAEVFPGNIADYTPENIKKLTYLDMFINEVQR 9417

569418 HCTVVPYVARENTAEIEIDGVKVPPGNIFIMSLYAMHKRPDIWGPDAEKFDPENFTEERI 9597

569598 KDRHPAAFLPYSAGSKNCL (1) 569654

569713 GWRYAIFGMKLIMIHLVRNFHFSSKIKHEDMQFRHDLTLKLPFQHLVQLKKRNPGKILTMVE* 569901

 

>AAGE01011475 Frame = +

578367 MLFLILTVTVSILGAILYWNFRARYRFSDKWPTLKPVYPILGNGPVVMGKNEVDRFEIIRD 578549

578550 VCYSAERILKIWAGPKLLLLTSHPDLIQQILTSPVCLEKPYLYHFAGFEEGLFTAK (1) 578717

597301 YHVWKPARKRLNPAFNLRIIHGFVPIMARCAQKMAARLNKYPDGATVDIIKYTNMCTLEMIC 597486

597487 GTTMGSDVLNRDGKEEFKRGLDG (2) 597555

597616 AFNGAAWRMMNVHLYPDIIYKMTRYHKELTEARKIVCDFFTK (0) 597741

597301 KQILQQKRLNNDEKNNNDEEENEADNHKPKILVDLLLSNSSDGKPFTESQITDNVYAVITG (0) 597483

597633 AVDTTALITAHACLFMSFYPEIQERVFAEINQYFPVGSDDQEVTHEQFR 597779

597780 QLTYTEMFLNEVQRHWTPVPLIARENMAEFEIDGVKVPPGHVFGLSLHALHMRKDVWGPD 597959

597960 ADRFDPENFSEERAKNRHPFAFLPFSGGTRICL (1) 598058

598119 GWRYASFSMKAVMVHLVKNFKFSSKIKPEDIRFKHDLTMKLPFEHLVQITKRNPVAN* 598292

 

>AAGE01102953 Frame = +2 possible alternative exon 1 for AAGE01205264 22.4kb away

632492 MLLVLSVFVVILCCVLFVSHRRKYKFADAVPSLQPVYPLLGNADIMWKSDTERFETIVKI 632671

632672 FSEHDRMVKVWAGPQMLLFTCHPDLVQQILSSSDCLEKPFLYSFAGFERGLFTSK (1) 632836

 

No P450 seq in this interval

 

>AAGE01205264 Frame = +1

655243 MLLAVTVIVGLITIWLLLSQRRRYRFADSLPQLKPWFPVVGNGALMFGKSDVDRFDVLVK 655422

655423 IFRDYDRMVRVWAGPKMLLFTSHPDLVQQLLTSPACLEKPFLYSFAGFEQGLFTSK (1) 655590

683614 YKLWRSMRKRLNSSFNLRILHGFIPVFVQCARKMVEDLNENPDGTVVSMHKFTSVCTLEMA 683796

683797 CGTTLGSDITRREGKEEFVHGLDI (2) 683868

683933 AFGEAARRMVSVHLYPNIVYHLTKYHRELVQARGVVCDFFSR 684058

684143 RRNTMSLNCNKKTNEEELDFDRKPKILIDQLLSVNRDGKSFSDTEIEDNIYAVITG (0) 684316

684359 LSFKANDTSGLLIAHACLFLCFYKDIEEKLFTEIMEFMPNEEFEIN 684496

684497 PESLKQLSYLEKFLKECLRHCPVAPNISRENMSEIEIDGMKVPPGNIFIMNFYALHRRKD 684676

684677 IWGPDADKFDPEQFSEERSRNRHPFAYLPFSGGNRICI (1) 684790

684849 GWRYAMFSMKVMLIYLIRNFQFETEIRPEQVRYRHDLTMKLPFEHMIKVTRRKLEGSTV 685025

685026 MSDILKHPELVPKEGRE* 685079

 

>Aedes aegypti supercontig 1.174

         Length = 1986401

 

> AAGE01132222 Frame = -3

Query: 1      MLLVLSVFVVILCCVLFVSHRRKY--KFADAVPSLQPVYPLLGNADI-MWKSDTERFETI 57

              MLLVL++ +V+L  +L  S  + +  KFA ++ S+ P YPLLG+A + +  S+  RFE

Sbjct: 137412 MLLVLAIVLVLLLILLVDSTLKHHVGKFARSLESVSPNYPLLGSATVFLGHSEERRFENF 137233

 

Query: 58     VKIFSEHDRMVKVWAGPQMLLFTCHPDLVQQILSSSDCLEKPFLYSFAGFERGLFTSK 115

              + +  + DR+ K W GPQ++ +  HP+LVQ++L+  +C EKPF Y F+    GLF++K

Sbjct: 137232 MNMLRQVDRIGKGWLGPQLMFYVAHPELVQKVLTDPNCSEKPFFYEFSRLTHGLFSAK 137059
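In a pairwise alignment, the midline marks each identity with the residue letter. The Python sketch below counts identities and gap columns for two gapped rows, in the style of BLAST's "Identities = n/length" figure. It assumes the denominator is the full alignment length, including gap columns. The function name is hypothetical. Applied to the first block of the AAGE01132222 alignment above, it gives 24 identities over 60 columns. This matches the 24 letters in that block's midline.

```python
def alignment_stats(query: str, sbjct: str) -> tuple[int, int, int]:
    """(identities, gap columns, alignment length) for two gapped rows ('-' = gap)."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    identities = sum(q == s and q != "-" for q, s in zip(query, sbjct))
    gaps = sum(q == "-" or s == "-" for q, s in zip(query, sbjct))
    return identities, gaps, len(query)


query = "MLLVLSVFVVILCCVLFVSHRRKY--KFADAVPSLQPVYPLLGNADI-MWKSDTERFETI"
sbjct = "MLLVLAIVLVLLLILLVDSTLKHHVGKFARSLESVSPNYPLLGSATVFLGHSEERRFENF"
ident, gaps, length = alignment_stats(query, sbjct)
print(f"Identities = {ident}/{length} ({100 * ident / length:.0f}%), Gaps = {gaps}/{length}")
# Identities = 24/60 (40%), Gaps = 3/60
```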

 

> AAGE01421194 Frame = -3

Query: 1      MLLVLSVFVVILCCVLFVSH-RRKYKFADAVPSLQPVYPLLGNADI-MWKSDTERFETIV 58

              M+ +L V + I   VL     ++K KFA A+P  +P YP++GN  + + K+  E F ++

Sbjct: 858945 MIALLWVVLSITIAVLVQRQWQKKVKFAGAIPRAKPYYPVVGNLPLALGKTSDELFSSLY 858766

 

Query: 59     KIFSEHDRMVKVWAGPQMLLFTCHPDLVQQILSSSDCLEKPFLYSFAGFERGLFTSK 115

               F +HDR+  +     + +   HP+L+Q++L+  DC EKP +Y      +GL  ++

Sbjct: 858765 DCFRQHDRLFTLQFSTIVAVCLSHPELIQRVLNHPDCQEKPDVYKVVRLPKGLLAAR 858595

 

> AAGE01011003 Frame = +1

Query: 1      MLLVLSVFVVILCCVLFVSHRRKYKFADAVPSLQPVYPLLGNADI---MWKSDTERFETI 57

              ML  +S+ ++I+  ++      K  FA  +P  QP YP++GN  I     KS  E    +

Sbjct: 836191 MLAFVSLVLLIVAILMVHWWHAKVDFARYLPRAQPHYPVIGNLQIALPFGKSAEELLGLL 836370

 

Query: 58     VKIFSEHDRMVKVWAGPQMLLFTCHPDLVQQILSSSDCLEKPFLYSFAGFERGLFTSK 115

                F +HDRM  +  GP++ +   HP+LVQQ+L+   C EK  +Y       GL +SK

Sbjct: 836371 HSYFRQHDRMFAIHIGPKVAIGLSHPELVQQVLNHPYCQEKSNVYELLRLPNGLLSSK 836544

 

>Aedes aegypti supercontig 1.174

         Length = 1986401

 Frame = -3

Query: 2      SLWRSQRKALNATFSPSILKNFIPTFETFSKQLVDKLHQY-EGSTIDILPIT 52

              S+W+  RKALN T +  +L +F+P FE FS+ +V+KL  Y EG+ +DIL  T

Sbjct: 137001 SIWKPNRKALNPTMNVKMLNSFVPIFERFSRSMVEKLKCYPEGTPVDILDFT 136846

 

 Frame = +2

Query: 3      LWRSQRKALNATFSPSILKNFIPTFETFSKQLVDKLHQY--EGSTIDIL-PIT 52

              +W+  RK LN+TF+  IL +F+P F   +++L+  L QY   G T +IL P+T

Sbjct: 836612 VWKLHRKTLNSTFNLRILNSFLPIFNDSTRKLIQLLDQYASTGKTFNILAPLT 836770

 

>Aedes aegypti supercontig 1.738

         Length = 536643

 

 Frame = +1

Query: 3     LWRSQRKALNATFSPSILKNFIPTFETFSKQLVDKLHQYEG 43

             LWRSQRKALN++  P+IL +FIP F   S  LVD L +Y G

Sbjct: 57850 LWRSQRKALNSSLGPAILGSFIPIFNNKSAILVDLLEKYAG 57972

 

 Frame = +3

Query: 1      ASLWRSQRKALNATFSPSILKNFIPTFETFSKQLVDKLHQY-EGSTIDILPIT 52

              A LW+  RK LN+ F+  IL  FIP FE    ++V  L Q  +G T D++  T

Sbjct: 121224 AELWKRHRKVLNSAFNLRILHGFIPIFEKCCSRMVSDLKQMKDGETFDVMRFT 121382

 

 Frame = -2

Query: 3      LWRSQRKALNATFSPSILKNFIPTFETFSKQLVDKLHQYEGS-----TIDILPIT 52

              +W+ QRKALN  FS +I+   +P F    ++L+  L QY G      T+DIL  T

Sbjct: 151031 MWKGQRKALNPAFSSAIIGKIVPVFNKKCEKLMHILDQYVGKQQKDCTVDILKCT 150867

 

 Frame = +2

Query: 3      LWRSQRKALNATFSPSILKNFIPTFETFSKQLVDKLHQYEGS-----TIDILPIT 52

              +W++QRKA N  F P+IL +F+P F      L++ L Q+ G      T DIL  T

Sbjct: 191981 MWKTQRKAFNPAFGPAILGSFVPVFNEKCAILMEILEQHVGKPQRDFTRDILKCT 192145

 

 Frame = +3

Query: 1      ASLWRSQRKALNATFSPSILKNFIPTFETFSKQLVDKLHQYEG 43

              AS+W+  RK L+  FSP+IL +F+  F   S+ LV +L +  G

Sbjct: 233268 ASIWKVHRKLLSPCFSPAILASFVSIFNVKSEILVQRLEKNLG 233396

 

 Frame = +2

Query: 1     ASLWRSQRKALNATFSPSILKNFIPTFETFSKQLVDKLHQYEG 43

             A +WR QRK LN +F P IL  F+  F   S+ L   +  + G

Sbjct: 88574 AHIWRGQRKVLNHSFGPGILNCFVSIFNEKSEILTKLMTSHVG 88702

 

 Frame = -1