Modified May 25, 2005
D. Nelson
68 sequences, 15 complete sequences

CYP2 Clan sequences (15 sequences, 1 complete)

>complete combined seq CN566859 CN566581 CYP2 clan member [gene 2]
MFLEVIGAVFIPPLIWTIWVYIKHLIDCLHYPRGPIPLPFIGNGYLIRKAEPYKEL
VNLGKIYGDVFSFSVGSVRYVIVNSLEGIQEVLVKKGWQFAGRPKGP ()
SWDRSIHGLIQRDPSKKFKILRKLATSSLKIFADGLAGMESKAIEESFQLNKKLLETNGKPFSMQEIT (1)
TLCVLNIICSILFNHRYKEDDLEFQDIIKYSNICFKERGVNNYIISIPWLRY
FPSASSRNLDEMIKIRDPLL
KKKVQEHKRSYDEYNLRDLTDALIKASNSETGQDPDEKVTDDNIVFILN
NFILAGSETSSNTILWFIVYILHWPEYQDKLYDEILKVTSGSRYPCLKDRPSLHLMQAAI
YETLRLSSVAPFGLHHKAMEKSSICGKSI
PKGALIITNLWSIHHDESYWKNAMSFYPERWLENSGEFNSKLGNAYLPFSSGPRSCIGETL
AKTELFIFISRLINDFRFVKPILEELPRLEGSFGITCTPYDFKVEIVPRSKNLLV*

>1097329039870
MLFKVIGTILIPPLIWVVWIYIKHLVDCLSYPQGPFPLPFIGNAHLIRNRESYKVF
SEFQKIYGSVFGFSIGSTRYVVVNNLEGVQEVLIKKGSQFAGRPRRA (1)

>1096703827379
MFPEIVGAIMLPPLIWAAWIYIKHLVDCLVYPRGPFPLPFVGNAYLFSKGKPYKEF
VKLGKTYGDVFGFSIGSIRYVVVNSLEGIKKXXXXXXXXXXXXXXXX

>1095899160393 frameshifted
MFFEVIRAFFTPPLVWIIMVYIKNLIDYLYYPREPIPLPFIGNGDLIRKAEPFKEL
VNLEKKYGDVFSFRIGLVRFVVVSSLEVILEILVKKGWQANGRPKAP (1)

>gnl|ti|646968536 1095898162561 83% to 1095897329284 37% to 2X2 N-term
MILKVIGSIFFPPLIWFVYSYIKHLIECLYYPKGPVPLPFIGNTNLLRKKETCKEF
VNLGKIYGDIFGFSIGSIRYVIVNNLEGIHEVLIKKGSQFSGRPRII (1)

>gnl|ti|646849327 1095897329284 40% to 2X2 N-term
MLLQITCGFLFPPLIWIVWTYIKHLYDCLSYPQGPIPLPFIGNAHLLRKGEPYKEL
VNLGKIYGDVFGFSIGSIRYVVVNNLEGIKEVLIKKGSQFAGRPRLKFTI (1)
(1) ALSRGMNGLIMSDPSPHFRILRKLASSSLKIYAEGLDGMEKKAINEYSYLHKKLSTMNGKAVSLKRMI (1)

>1097265020030 new N-term, weak, with frameshifts; may be poor seq
XXXXXTCGCTYPQKIWNVLWTDIKHLSDSESYPQGPISLPIXXXAHIERKGETYREI
DRLR*IYGDDIGMCIGTLRYVDVNNLEGIRDVLIYTGTQFL

>1096124019772 related exon 2, 5 aa diffs to 1097331043073
(1) ALTRAMNGLIISDPSPHFKILRKLASSSLKLYAEGLDGMEKKAINEYSYLHKKLSTMNGKAVSLKRMI (1)

>1097206642797 related exon 2, 61% to 1095897329284
(1) DWSRTMNSLINNDLNATFKVLRKITSSSLKIYAEGLVGMEKRAIEEYTHLNKKLLSLKGQAVSIKNMI (1)

>gnl|ti|647066038 1095898227332  34% to 17A1 35% to 2U1 fugu 33% to 2U1 human
(1) AFNRNTNSLINSDPGPRFKILRKLASSSLKIYAEGLLGMERIAISEYCELSKKLQSIKEKPVSVHKIM (1)
(0) QSTLNIICTILFNHRYEDDNQEFQNIIKYSSLIVQTFNETSYVSS
IPLLRYFPTATSRNIFEIIRLRDPILKRKLQEHRKSYDKNNLRDITDALIKVSLDSEMGE
ELTEKITDDNIEFLLNDFMIAGSETSSSTILWFIVYMLHWPEYQNKLYDEITKVASDNRY
VSLKDRPMLHLMQAAIHETLRLSSVVPLGLVHKAMENSSICGKFVPKGALILTNLWSMHH
DESYWKNAMSFYPERWLEKSGEFNYKLGYAYLPFSNGPRSCLGETLAKTELFVFITRLLK
DYRFEMPTGKELPCLDGRSGITSPPNDFEVVIIPRN*

>1096110062131  related exon 2 73% to 1095897329284
(1) AWSRALNGLVACDPGPRFKVLRKLASSSLKIYAEGLDGMEKKAADEYSHLNKKLQTMNGKPVSLQNMI (1)
(1) ELGTLNIICTILFNHRYEEDDKEFQDIIKYSNLTVKIFGGTSILSSIPWLRFLPSASSRSIYE
IVRIRDPLLKKKLQEHKSSFDENNLRDVTDVLIKVSLGSDIAKGSEEKITDENIEFLLND
FIIAGSETSSSTILWFIVYLLHWPEYQDKLYNEIIKVTSGKRYPCLNDRP ?
LHLTQATIHETLRLSSVGPLAIVHKAMENSSICGKPVPKGAFILTNLWSTH
HDESYWKNPMCFYPERWLEKSGEFNSKLGYAFLPFSGGPRSCLGEALARTELFVFFSRLV
TDYRFEKPNGEELPRLNGRFGLTCSPFDFKSVVVPRC*

>1097206059080 5 aa diffs to 1095898809307 might be the same gene
(1) GPCKPSHIICTILFNHRYDENDQEFQDIIKYSNLSVRASSATSLISSIPWLRFFPSTASR
NIYEIIRLRDPILKRKLQEHRSSYDENNLRDVTDSLIKVSLDSALENNSHEKITDDNIEF
LLNDFIIAGSETSSNTVLWFIVYMLHWPEYQDKLYNEILKITSGNRYPCLSDRPMLHLMQ
AAIHETLRLSSVAPLGVGHKAMESSSICGKPVPKGAFILTNLWSIHHDETHWNNAMSFYP
ERWLEKSGEFNLKLGEAYLPFSSGPRSCLGETLAKIELFVFISRLVKDYRFEKPTEEDLP
NLKGESGITRTPSEFKVMAIPRN*

>gnl|ti|649393684 1095898809307 45% to 17A1 C-term. No exact matches
         VYLKLGEAYLPFSSGPRSCLGEALAKIELFIFISRLVKDYRFEKPTEEELP
NLKGESGITRIPSEFKVMTIPRN*

>gnl|ti|648017453 1095896110991     52   1e-05 35% to 17A1 fugu 34% to 2U1 fugu
(1) ELTTLNIICTILFNQRYEQDDDEFQNIIKYSNLSFKAFSASNLLSSIPWLRYFPTTASKYIQ 707
706 EIERLRDPILKRKLQEHRKSYDENNLRDITDALIKASIHLNAEKDSLIKVTDDNIQFILN 527
526 DLILAGSETSSSTITWFIVYMLHYPEYQDKIFNEVIKVTSGNRYPCLNDRPLLHLLQATI 347
346 HETLRLSSVAPLGLRHKAMENSTICDKPVLKGTLIITNLWSIHHDERYWKNPMSFYPERW 167
166 LNETGEFDYKLGNAYIPFSGGPRACLGETLAKTELFVIISRLVTDFYFEKSVEEDLPRLDSF 374
375 PGVTRSPYDFKVVVVSRS*

>gnl|ti|654998190 1095901734433  33% to CYP21 33% to CYP17 33% to 2U1
870 XXXXXXXXXXXXXXXXXXXXXXXFQDIIKTHNETXXXXXXX
837 SYISSIPWLRYFPTATSRNMIILNKNKYNIFEIIRLRDPILIRKLQEHKRTYD*SNLGDV 658
657 TDALIKISLESVTETDSHEKITDDNTEFL
LNDFMIAGSETSSNMILWFIVYILHRPEYQDKLYDEIFKVFSG
IGYPSLNDRPRFHLIQAIIHETLRLLSVAPLGLCHKALENGSICGKFVPKG
LLILTNLWSIHHDERYWKNAINFFPERWLDNSGNFNYNLGYAYLPFS 147
146 GGPRSCLGETLAKTELFVFISRLVKDYRFEKPNGKDSLDGRSGVTCLPYEFEIVMIPRS*



More CYP2 Clan sequences (5 sequences)

>whole gene 1095899272864 1096526199166 35% to 17A1 in Xenopus and chicken
MWYEIICGLIISILLYIIGSYLMHLLECRKYPLGPFPIPIFGNLHLLGTEPHKIL
AAYSKKYGAVFSISLGLQRIVIISDITTTREALVQKASIFAGRPKSYLIQLISSGYKGIAFMDY
GSFWKVLRKVSHSSLKIYGEGHERFEKILTKESEELHKRLLKKSNNSVELKSEF (1)
GAAIINVICFIVFGERYQYSDSEFKEVLTTINDIVDGLSNTTAVGFLPWLRFLPFSPIKKLSIS
LSKYVRFLNDKLKKHKETFDEKKIRDFTDSIINFSNNEAVKQKFKNVDEHLEPVIGDLFI
TGSETTLTSLLWLILYMMHYPKYQQEIFKEITTVIGEDRYPCLNDRDSLHLVKAALKECL
RLSSIVPLGLPHKTTKETVLMGHSIPGNATVMINHWQIHNDTNYWENPNEFNPYRWIGKD
KKFDPSKATSFLPFSAGTRVCLGKTVAENELFFFFSRLIRDFNFECIPGCPPPSLIGKCN
ITHAPKQFCAYLTPRINNLM*

>1097509039345 92% identical to 1096064108200 probably joins with 1095898835518
MFLEVAFGVVTPLFLYVIATYLDHLFKCRFYPPGPFPLPIIGNLHLIGKKPHEKF
VEYSKKYGEVFSLSFGMHRVVIVSGKDSIREVLVQKSNIFAGRPKNYI
ANIVSRGYKNIGYGDIGPKWKILRKIAHSSLKNYGESTAHLETLVVRESEELHKNLYKKSNRSTKLEHKF (1)
gnl|ti|649400787 1095898835518 93% identical to 1096064108200, 39% to 17A1 fugu 
(1) GVAVLNVICSIVFGKRYEYENCEFKEILTYMNYVFTGVAGTNAISFIPWLRFLPLDGLR
KLKKGLSIRDPVLRKQLLYHRETYNESNLRDYTDYVIQFSRDEAILKKFGEQLTDDY
LELLLNDIFIAGTETALTTLLWSIIYLIHWPKFQDKIYNEIVSAIGKNRYPSMKDRNMLP
LVNAALSETLRLSSVTPLGVPHKAMEDTTLLNDLKIPKGTTILTNLWQLHHNKNCWENPH
EFNPYRWFTNDQTLDSIKSMNFLPFSAGTRVCLGKGIAEVELFLFYSRLVRDFKFEVKP
GDSLPSLYGNCGLL*

>1096064108200 93% to 1095898835518 
1097206498632 walked up to 1096081234652 found mate pair 1096071090512
already known N-term seq matches 1095897342515 100%
1095897342515 38% to 17A1 fugu whole seq.
MFLEIAFGVTAPLLLYVIATYLDHLFKCRFYPPGPFPLPIIGNLHLIGKKPHEKF
VEYSKKYGEVFSLSFGMHRVVIVSGKDSIREVLVQKSNIFAGRPKNYIANIVSRGYKNIGYGDIGPKWKIL
RKIAHSSLKNYGESTKHLETLVVKESEELHKRLFKNCNRSTELEDEF (1)
(1) GVAVLNVICFIVFAKRYENKDSEFKKILMYMNYVFSGVASTNFASFIPWLRFFPLDGLR
KLKKGLSIRDPVLRKQLLYHRETYNESNLRDYTDYVIQFSRDEAILKKFGEQLTDDYLEL
LLNDIFIAGTETALTTLLWSIIYLIHWPKFQDEIYNEIVSTIGKDRYPSMKDRNMLPLVN
AALSETLRLSSVTPLGVPHKAMEDTTLLNDLKIPKGTTILTNLWQLHHNENCWENPHEFNPYRWF
TNDQALDSIKSMNFLPFSAGTRVCLGKGIAEVELFLFYSRLVRDFKFEVKPGDSLPSLDG
NYGITLTPRIFTTFVVARNDSLVAQNHSL*

>gnl|ti|647182814 1095899213949 73% to 1095899272864
(1) GVAVLNVICFIVFGERYQYSDPAFIEILTTINNIVSGLSNTTAVDFLPGLRYLQFSEIK 256
257 KLKSSLVIYFRLLNDQLKKHKKTFDENNIRDFTDSIIKFSKDETMENKFEEELTDEHLEH 436
437 VIGDMFIAGSETTLTSLLWLIIYMIHYPKYQEEIFEEITRVIGENRYPQLSDRDSLHLVK 616
617 ASIKECLRLSSIIPLGVPHKTMSDTTLIGYNIPKNTTVIINHWQIHNDTNHWKNPNEFNP 796
797 HRWIDDDSKFDATRATSYLPFSAGTRVCLGKTVAETELFFFFTRLIRDFKFE
GVPGCPLPSLIGKCSITLAPEEFNVHVTPRINSLMFSKNVLPE*

>gnl|ti|647193621 1095899233960 49% to 1096064108200
ace_3154.y 
(1) VTGVMNVLCGIVFGTQYEENDKELEKVISFKQLILDGVADTFAISFLPWLRFFPSNGLKKVRK
GVLIRDKLLRFQLKKHRETYNPVQIRDYTDYVLKYSKEFETSRNIDEQLSEDNMEMM
LQDIFISGSETTISTLLWFAVYLVNWPKYQDDIYDETIKIVGNDRYPSLSDRPKLHLFES
AMKETLRLSSVIPLGLPHRSLEETSIKKFKIPKNTNVMINLWQLHHDSKSWSDPHTFNPY
RWLNDKNIFDKSKNPNYLPFSTGLRACLGYHTTESIIFLFFTRLIRDFNLCLKPGASTP
SLNGVLRVTLTPDTSYIILKPRSNNLISQKIEA*

More CYP2 Clan sequences (2 sequences)

>1095898098005    35% to 17A1 34% to 2P4
MLVFQQLIFAVLVPAFLYFVFSYLQHLWICSKYPKGLFPLPLLGNIH
QLGKNSSQTFSSLTKIYGDIFSVSIGTQRLVILNSMESIHEALLTKGSTFGGRPTEF
TSNVFTKGYKNLSHTDYGPNLKALRKVIHLSVQKYAGGLTRQEQMITFERDELCKKLFNTEKEIALRCEI (1)
 (1) DFCTVNVMSGYLFNERFLNQNSEFKDVVKSIQLLLDNSGITDKTTFIHWLRYLPLREWN
793 EIKQARLVLNPWVEKKVEDHWRKYNENEIINVTDSMIQHFLTKYDGLDTDFAKKYITLL 617
616 LIELLVAGTETTAITICWMVLYLIHNPEYQEEIYKEITLNIGCRYPTLAEKNLFPLL 446
445 QAFIQETLRITSVVPLNLAHKALKDTSICGKIIPKDAIVITNLWNLHHDNRYFKNPNEFD 266
265 PKRWINENGLFDSISQKYFKPFSAGARVCLGETLAKNQLFLIISGLIMNFIFTSAPGKD 89
88  LPSLEGQFGITFRPNSFKVL*

>1095901303788    39% to CYP21 39% to 2R1 40% to 2P4
MFLFVVFEVVFGLIIPVLLYVI
VVYIYHIWECQRYPPGPFPLPVIGNYNLLANDPVKALCDLEIIYGDVFSLSLGTVR
VVVVSSHESIYDVLVGDGSNFSGRPREYSSLLFTGGFENLSHMDNNPLTKKIRKVFYSKL
KTNGSILAHNENIVKHESELLHQRLLQNEGSVTNLRYEI (1)
DLCIVNSICSIIFGNRLSDTCEVHEILKATRLLLKNLSNIEIMHYLPWM
RFFLLKKQNEISESRNICKFWIQTQLHKRKKSLKNENISDILLNLWDQQKQENP
NEEQYRMILVELVMAGSETTAATITWLIFYLLHWPHYQSILYKEIKNVCGDQYPTFNDIKSMPIMQATILE
TLRLSSVVPLSLSHKAVNNAKINKFTIPKDTIIITNLWGVHHNEKYWEKPFEFNPMRWLD
KNGELSTAKRLGYFPFSAGPRGCIGESFARMQMFIICSRLIKDFSFELPQSGETPKLDGD
IGITLTPLPYNAVAKQRT*

CYP4 Clan (40 sequences, 9 complete)

>gnl|ti|655005893 1095958068757  43% to 4V5 complete gene no introns
MVSVFYILFSGLVFYVVSKILWKLWRNSYGLSSIVTPPNVPFFGTSLYLHSDA
RKFFFQLYDYTRRYGDVFCIWLGPKPVICSSSVKFSEAVLSSQKVITKGFSYDFLHDWLK
TGLLTSTGSKWKTRRRLLTPSFHFSILNNFIKIFEEQASILVDKLAVAADNKEVVDVQVP
IGLATLDIICETSMGVKVNAQSHPDSEYVKAITVLNEEIQMRQKFPWLWFDAIYKLL 568
567 PCGKRFYKALDVAHKLSFDVINERMQMKIQESYCETASDEKKFFLDLLLDIYRKGKI 397
396 DTEGIQEEVDTFMFEGHDTTSAALGWTLWLLGKNPDVQKKLHKEIDEIELNGGSLYDKVR 217
216 QSKYLEIILKESLRMHPPVPMYGRTVEEDMTIDGQFVPKGAQIVLLVLILHSNPDYWEN 40
39  PNDFIPERFEADSYEKRNPYSYVPFSAGPRNCIGQKFAMIEEK
ILLYSIMKNFHLKSMQNENEVFGTLDIIHKSINGINIKFTRR*

>1096064105622 90% to 1095958068757 varies at N-term 
1096071088011 joins CV564924.1 EST 
MIYASYLVLVGLFVFFVSKILWKLWKSSYGLETIATPPNIPVFGTSLYLHSDARKFFFQL
SEFTKKYGTVFCIWLGPKPMIISSSVKFSEAVLSSQ
KVITKGFSYDFLHDWLKTGLLTSTGSKWKTRRRLLTPSFHFSILNNFIKIFEEQASILV
DKLAVAADNKEVVDVQVPIGLATLDIICETSMGVKVNAQSHPDSEYVKAITVLNEEFVMR
IKYPWLWFDVIYKLLPCGKRFYKALDVAHKLSFDVINERMQMKIRESYCETASDEKKFFL
DLLLDIYQKGEIDTEGIQEEVDTFMFEGHDTTSAALGWTLWLLGKNPDVQRKLHKEIDEI
ELNGGSLYDKVRQSKYLENILKESLRMHPPVPMYGRTVEEDMTIDNQFIPKGAQIILLVL
MLHSNPEYWENPNDFMPDRFEADSYEKRNPYSYVPFSAGPRNCIGQKFAMIEEKILLYSIM
KNFHLKSMQDENEVFGTVDVIHKSINGINIMFTRR

>1097329374310 very similar to 1095899295538 seq
1096608398403
MFIAYSLLVVVSLYFVIKLFWK
FWIYSYGLSTVPTPPTIPFFGNSLQLESDSVKFNKQLCEWSKIYGNVFCVWVGLR
PTIFSSSVNFSEAILSSQEVLKKASIYEFLHDWLKTGLLTSTGNK
WKLRRRLLTPSFHFSILNNFLKIFEEQGACLVDKLRTYAKSGENFDIQVPIGLATLDIICETSMGVKVNAQSHP
DSAYVKAINILSEEIPRRFKYPWLWPDIIYKHLACGKRYYKALDVAHKLSLDVINERIETLFQNE
NNVTTNKNKEVSSEKKKFFLDLLLDIHKKGEIDTEGIQEEVDTFMFEGHDTTSSALSWIL
WLLGRYPQVQQKLHSEIDEVELTGGSLYEKVRNFKYLENIIKESLRIHPPVPLIGRHIEK
DMVIDGQFIPKKSEIGVLVMMMHSSPEYWKDPYDFIPERFEQEDFVKRNPYIYIPFSAGPRNCIG
QKFAMIEEKMLLYSIMKNFYVQSMQNENEILPSLDLIRKSVNGIILKLTER*

>1095964281471 1097672357643 1097675038710 1096526281478 1097675573844
MFSNIKMIYTLCIIICGFYFLIKILWMCWKYSYGLTSIATPPNTPFLGTSFYFLSDS
RKSYFQLCNYTKQFGNVFCIWLGPKPMIVSSSVKFLKAVLSSEKITTKGFSYDWIHDWLK
TGLLTSSGPKWKARRKLL
TSSFHFSVFNRLKIIIEEQACILVDKISFAADNKKVVDVQTLIGLATLDVICETIMGVKINAQ 780
SYPDSEYVKAISVLHKEIVNRMKFPWLWFDVIYKLLPCGKRFYKALDVAHKFTFDIINKR 600
MEISVNESYIDTPLEEKSYFLDLLLNIHKKKEIDMEGIQEEVDTF
IFAGHDTISVALSWTLWLLGKYSEIQRKLHKSIDEIELNGGSLFEKVRNFKYLENII
KESMRIHPPVPMYGRTVEENMTIDGQFVPKGAQIILLVLMLHSDPNIWENPKEFIPERFE
TDDWKIKNSYSYLPFSAGSRNCLGQKFAMIEAKMLLYSIM
KKFSLKSMQDENEVYGTVDILHKSINGINILFTRR*

>1095898788708 
MFLTFMFLFLIYFLIKVFWKLWIYSYGLSTVSTPPTLPLFGNCLQIKSDPVKASKQL
FEWSRVYGKVFCVWVGIRPTIFSSSVNFSEAILSSQKIIQKGFVYNFLHEWLKTGLLTST
GNKWKLRLRLLTPSFHFSILNNFLKIFEEQGNCLIDKFRVLAQNGKYFDIQVPIGLATLD
IICETSMGVKINAQYQPDSEYVTAINILSEEIVRRFKYPWLWPNIFYKHFSCGKRYFKAL
DIAHKLSLNVIHERIQTSLQNESENVLINKLDNKSVLNNEEELGVRKKRFFLDLLLDMHK
KGEIDVDGIQEEVDTFMFEGHDTTSS
AMCWTLWLLGRYPQIQQKLHAEVDEVELTSGSLYEKVRNFKYLE
NVLKESLRLHPPVPLISRYIEEDMMIDGQFIPKKSEIAILVMMIHLNPEYWKDPHSFIPE
RFDQDDFVKRNPYTYIPFSAGPRNCIGQKFAMIEEKMLLYNIMKHFYVESMQNENEILRT
QDLISKSANGIMMKFYER*

>gnl|ti|648470985 1095898761545 N-term
MFLVYSLLVVIFSYFLIKISWKLWIYSYGLSTVPTPPTIPFFGNCLQLESDSVKFNKQI
REWSKIYGNVFCVWIGLTPMIYSSSVNFSEAILSSQKVLKKASVYEFLYEWLQTGLLTSTGNK
WKLRRRLLTPSFHFSILNNFLKIFEEQGACLVDKLRIYAKSGGNFDIQVPIGLATLDIICETSM
GVKVNAQSHPDSEYAKAIGI
LSEEIPKRIKYPWLWPDIIYKHLACGKRYYKALDVAHKLSLDVIKERVKTLIQNKSEVTS
NKNKKESGSEKKKFFLDLLLDMHKKGEIDTEGIQEEVDTFMFEGHDSTSSALSWMLWLLGRYPQVQQKLHSEIDEVE
LTGGSLYEKVRNFKYLENVVKESMRIHPPVPLIGRHIEEDMVIDGQFVPKSSEIVLLVMM
MQSSPEYWKDPYDFIPERFEQEDFVKRNPYIYIPFSAGPRNCIGQKFAMIEEKMLLYIIM
KNFYVQSIQNENEILLALNIIHKSSNGIIMKFTER*

>1096083942127 1097329109827 clearly best match to 4V sequences
MAFILLIFFLLLITLFLIWIYWVRSYNLNFVPSPLRFPLFGCALFLKSESH
ELFKQVRWFFSEFGSAFCLWIGPKPVLMTGNIDHIQTVLKSQKIITKSSSYTFLNE
WLGTGLLTSTGAKWKSRRKVLTKAFHFSIINSYVDSFYQNSVSLSNHLENHSGVPINIQA
LMSLFTLDIICETAMGFKLNSMKNLNCDYVNAVEEVKILLIERQKSPWLWNKFVYKLFSS
GKKFYTQLQVLKSFTKKIVNKRIKNYSLSSNGCKSFLDLLIDAYNQGKIDLEGIYEEVDT
FMFAGHDTTAAALSYIFLMLGTHPKVQKKLHEEIDTNVNINSYENLSEKIRKMEYLDCVI
KESLRLHPPVSVFGRILEDDTIFSNHLVGKGADIVLCPETLHTDPLYWENHRSFIPERFS
NVEFAFCQPYLYIPFSAGPRNCIGQKFALMEIKIAIFVVMSKFIVTAVEQCLSPM
ATFIQRYENGVLMLFEDEKRFLYML*

>EST DN812371.1 joins with CN627429 CN775805 and 1095901729505 1097325001902
DN812964.1 DN810769.1 DN816152.1 
MYTIGIAVLIFLCFSLFFANILKRFYHPLRKLPSPKENFFTAHYGYFNGYDQINAVINFGKQFKERGLYTLDTLN ()
GFRFVNLLMPEFIKTVFSDGNSFQRSTATKVIFPLVGNGIFVSNYEDHHWQRKVLNEAFTLQQLKNYFPAFTVHIDLLMK ()
LWSYSCDKDNGTNIIVLDDLSNLSFDIIGDVGFGYQFNTITSHSGNEFTKALQSYCQL
RFQLNAVHKALLAFFPFLMHLSFMYGKRKRAEQVICNTLNM ()
LINKRKKEIEDGIAADQKDFLTVVLKDQQKEGSKMTNDLIKDNLMTLLIAGHETTSTAMQWCLYMLGT (0)
NLGVQNKLREEIKKNVFDIKSVSYEEVLSIKYLDCVVKETLRMHTPVAFIGRINKNQTKFGDFDV
PAGSFLRIPIDSAHMNESVYHDPYSFRPERFLT ()
GEIPPLSFLTFGQGIYNCIGKNFALLEIKTFLVKALLQFEFSVDLKHLNYKKLISITNKTVEPLWIRVKPI*

>1097309000937 exon 1
MFLVCLALIVLFIGLFLLCYLLKRTFHPLRLLPSPKEQLITGHNRYFHGRDHTSTYLSFNEKFKEEGLCTLDTLY (1)

>1096091465110 88% to 1097331817678 
1096625274441 1096123742264
MFLICLALLILSIGLFFLRYLLKRIFHPLQLLPSPKEQLITGHISHFQGRDHSNTFLGFNEKFKEEGLCTLDTLY

>1097331817678  1096526275245  1096124165677 1096110023112 1096761988512
1096701884902 1096625274440 1096602178993 exon 1
MFVICLALITLFIGLFFLRCLLKRIFHPLRLLPSPKEHLITGHISHFQGRDHSNTFLSFNEKFKEEGLCTLDTLY (1)

>1097675832709 new exon 1 with one possible frameshift or there is another exon
MCMVYIAVLILLCLIVFFAN
VLKRFYHPLRNFPSPQENLITGHYSYFYRYDHVKTLLNFGKQFEKNGLYTLDTLN (1)

>1097675877620 new exon 1 with one possible frameshift or there is another exon
MYMICIAAIVILCFL
VLAVMLKRFYYPLCMLPSPKENLFTAHYRYFYGHDHINAFLNFQNQFKDYGLYTLDLLLG (1)

>1095901177607 new exon 1 only 5 aa diffs from 1097675877620
MYMICIAAIVILCFL
VLAVMLKRFYYPLCMLPSPKENLFTAHYRYIYGHDHINAFLYFQNQFKEYGLYTLDIFL

>Combined N and C-terms CN567799 CN567598 tag12b09.x1 [gene 4] N-term 
XXXXXXXXXXXRVRFLLRYLLKRIFHPLRFLPSPKEQLITGHINHFQGRDHSSTYLSFNEKFKEESLCTLDTLH
VPRFVYLIAPEFIKKIFADGKLFQRSKSIRTLAPLIGNSMVGSNYEHHHWQRKLFNG
AFTSQQLKNYFPAFLKHTNLLMK
(0)LWSYTCDKESGTNLTVLDDLSNLSFDIVGDVGFGYHFNTITSHSGNEVTKAFQKY
CQLRHSLHPFYKALFAYFPFLMRLSFMFGKHKKAEQVISYTXXX (0)
49 aa gap
683 FFIAGYETISTTLTLCLYMLAI
(0) NLEVQEKLREEIQKNKLDVNNISFEEVTSLKYLDCVVK 504
503 ETLRLHGLAPVLGRETINAIKFGEYEIPANTVLQTHVSNLHMNETIYRDPHSFKPERFMT (1)324
323 GEIPASFYLPFGHGVYNCIGKNFALLEIKTFLVKALLQFKFSIDPMHINYTK 168
167 IIWLTMRTVEPLLIRVKPIAE* 102

>DN813094.1 ACAC-aab89g09.g1 Hydra EST UCI 7..= 1097675463974
1097675463974 mate pair to exon 2 1097675525814 missing exon 1
DN813094.1 DN603400.1 DN137655.1 CN774194.1
MFLICLALLILSIGLFFLRYLLKRIFHPLQLLPSPKEQLITGHISHFQGRDHSNTFLGFN 303
EKFKEEGLCTLDTL (1)
(1) VPRYVYLIAPEFIKKIFADGKLFQRATSLKVLAPIIGNSMLTSNYEDHHWQRKLFNGAFT 565
SQQLKNYFPAFLTHTDFLMK (0)
1095898198167 
LWSYTSDKESGTNLTVLDDLSNLSFDIIGDVGFGYQFNTINSHSGNEFT
SAFRYLTELQHNASVFSKVLISCFPFLAQFLLLFGKRRKLIQVVHKTLNK (0)
(0) LIEKRKKEIDDGISTEEKDIITIVLKDQQQESSKLTNDLIRDNLLLFLIAGHETTSTAMTWCLYMLGT (0)
(0) NLEVQEKLREEIQKNILDKKNITFEEILSLKYLDCVVKETLRLHGPAPILGRRNINATKF 957
GEYEVPANTVLRTHVSSLHMNETIYPDPHSFKPERFMT (1)
GEIPATFYLTFGHGIYNCIGKNFALLEIKTFLVKALLQFEFSVDPEHISYKKFIWLTTITAEPLSIRVKPIAD*

>1095898814465 new exon 2 mate pair to 1095899110069
(1) VPRYVYLIAPEFIKKIFADGKLFQRTTSIRIMAPSIGNSMLSSNYEDHHWQRKLFNGAFT 471
SQQLKNYFPSFLTHTNLLMK (0)
1097567103129 1097675494277 
(0) IWSYTCDKESGTNLTVLDDLSNLSFDIIGDVGFGYQFNTITSHSRNEFTSAIRYLAEIQL 657
NASVFLKVLISYFPFLIQLLVMFGKRRKFIQIVRKTLNK (0)

>CX833403.1 ACAC-aaa40d06.g1 Hydra EST UCI 7..
91% to 1095899139433 exon 2
IIFTVLFW*RTFHPLQLLPSPKEQLITGHNMYFHGRDHTSTYLSFNKQFKK*GLCTQHTLX
VPRYVYLIAPQFITKIFAYGKLFQRPTTLKILAPLIGNSMLGSNYKDHHWQKKLFNGAFT 431
SQQLKNYFPAFLKHTT*LMKHWSYTCDKESGTNLTVLDDLSNLSFNIVGDVGFLGFGYQFTQ
ITSHASNEYTS

>1095899139433 new exon 2 
VPRYVYLIAPEFIKKIFADGKLFQRPTTLKILAPLIGNSMLGSNYEDHHWQRKLFNGAFT 549
SQQLKNYFPAFLKHTNLLMK

>new exon 2 1095899339221 1097206043402 1097672369437 possible frameshift/insertion
(1) GFKFIYLLMPEYIKTMVSNGKVFQKSTAMKVIFPLVGNGMLVSNYEHHHWQRKLFNEAFS
AQQLKKYFPAFKEHT DLLIK (0)

>1095964240637  new exon 2 
GFRFVDLLLPEFIKTIFSDGKVFHRSNVLKVLFPLVGNGMIVSNYEDHHWQRKVLNEAFT 854
SQQLKNYFPAFTLHTDLLMK (0)

>1097509072583 new exon 3 boundary wrong (missing one nuc?)
(0) LWSYTCDKESGTNLTVLDDLSNLSFDIVGDVGFGYQFNTITSHSSNEFTSAVRNLTKMQI 694
NASVFSKVLITCFPFLVKFLLLFGKRRNLIQIVYKTLNK (2) 

>1096081231152 new exon 3 
(0) IWSYTCDKENGTKIIVLDDLSNLSLDIIGDVGYGYQFNTLTSHSGNEFTKAFQSYCQLQY 135
NIKPIYKALSAFFPFLMGLSIMFGKRKKTEEILRNNLNM (0)

>BP514308 N-term 25% to 46a [gene 9] missing exons 4-6
BP514307.1 BP505238.1 CO509836.1 ace_5451.y
MYSIYIAIIIVPLVFFVAVFFKRFYHQFRLLPSPKESLITCHYSYFDVHDHVNTLLNFGKEFKDYGLYTINTLI (1)
GPRQVHLLLPHFIKTVIADGKFFQRSPVFKAVFPLVGNSMIVSNYEDHHWQRKLFNQAFT
SQQLKRYFLAFTLHTDLLMK (0)
LWSCTCDKENGTNLNVWSDLSNLSFDIIGDVGFGYQFNTITSHSGNAFTKALRSY
INLRFNSSVVHNVLIAYFPFLMRFLSKFGNLNKAEQVIYNTLNM (0)
1095958061820 1096064134288 mate pair = 1096041094868 exon 3
(0) LIDKRKKEIENGLVKEEKDFLSIVLKDQQQEKSKLTNDLIRDNLMTLLIAGHETTSTAMLWCLYTLGT (0)

>gnl|ti|648047811 1095899057643 I-helix 4 aa diffs to 1095898198167 exon 4
(0) LIEKRKKEIDDGISTKEKDIITIVLKDQQQESSKLTNDLIRDNLLLFLIAGHKTTSTTMTWCLYILGT (0)

>gnl|ti|654999901 1095901768752 87% to 1095901729505 exon 4 linked to 1095901795880
(0) LINKRKKEIEDGIETGEKDFLTIVLKDQQKEGSKMTNDLIRNNLVTLLIAGHETTSVAMQWCLYILGI (0)
(0) NSDVQNKLREDIKKNVFDIKSITCEEVLSIKYLDCVVKEVLRLHPPVSFIGRINTR
QTNFGEYNVPAGSYLRVPINSAHMNESVYPDPYSFKPERFLT (1)

>1097206350025 1097675534489 1096602217388 1097331459342 all with frameshift (pseudogene)
(0) LIDKRKKEIEDGIATDEKDLLTIALKDQQKENSKMTX
    NLIRDNLMTFLIAAHETTSTGMQWCLYMLGT (0)

>1096526374787 no 100% matches to this seq, best match is 1095901795880
 (1)XXXXXXXXXXEKVINIKYLDCVVKEVLRLHPPVLFIGRINTRQTNLGKYIETAGSNQR
VPINNAHMNESVYPDPYSFMPKRLLT (1)

>1096526100337 74% to CN776982
1096526100337  1097206278072  1096123494736  
(0) NLDVQNKLREEIKKNVFDIKSILREEVLSIKYLDCVVKETLRMHPPASFISRKNKTETKL 308
GDYDIPAGTFLRISINNVHMNESVYPDPYLFKPERFMT (1)
1095898850029 same as 1096520314506 = mate pair match to 1096526207508
DEIPPSSFLSFGQGIYNCIGKNFALLEIKTFLVKALLHFEVSVDPSHVNYTKQILLTLNTVEPIWIRVKSIEE*

>1096071008743 84% to CN567799
(0) NLEVQDKLREEILKNILDVNNISFEEVMSLKYLDCVVKETLRLHGPAPILRRRTMNAIKFGEYEVPANTVLQ
THISSLHMNETIYADPHLFKPERFMT (1)

>gnl|ti|648014530 1095896049543 1096082202706 
(0) NIEVQEKLREDIQKNILDVNNISFEEVMSLKYLDCVVKETLRLHGPAPLLGRRTISATKF
GEYEVPANTILRTHVSSIHMNETIYPDPHSFKPERFMT (1)

>1097263640455 new exon 5
(0) NLDVQEKLREGIKKNVSDIKNISYEEVLSNKYLDCVVKEALRIHPPRS

>CX832158.1 ACAC-aaa70d12.g1 = 1097696222067 1097678083218
DHLMTLSLARHETTSTAMQW*LYMLATNLG
VQNKLREEIKKNVFDIXSVSYEEVLSIKYLACVXXXXXX
MHTPAALIGRINKNQTKFGDFDVPAGSFLGILIDSAHTNESVILDPYSFRPERLWT
GEIQPYSYLTFGQGIFNCIGKNFALLEIKTFLVKALLQFEFSVDLEHMNYIKKIFISTKT 60
VEPLWIRVKPI*


>1097331770349 new exon 6 with stop
DEIPSSSYLTFGYGIYNCIGKNFALLEIKTFLIKAL*QFEFLVDPEQLSYKKQISIST 330
KTAEPLWIRVKSI*

>1097329444796 new exon 6 
GEIPASFYLPFGHGVYNCIGKNFALLEIKTFLVKALLQFEFSVDPKNINYTKVIWLTTRT 44
VEPLLIRVKPLQPV

>1097325113147 1097672294531 
GEIPATFYLPFGHGVYNCIGKNFALLEIKTFLVKALLQFEFSVDPKHANYTKVIWLTAKT 290
TEPLSIRVKPIVD*

>1097206250175
GEVPPFSFLTFGRSNYNCIGKNFVLLDIKAFLVKALLQFKFSVDP 360
MHLNYKKPISITNKAVDPLWIRVKTI*

>1096123749751 boundary is not right
GETPASLYLPFGHGVYKVIGKNFSLLEIKTLSVKALLQLEKVVDPKNINYSKVIWLTSRT 211
VEPLFIRVKLIVD*

>DN245670.1 ACAE-aaa28d17.b1 Hydra EST UCI 5... poor C-term seq
HMNSIKKXXXXXXXXVEPLYIRSKPN*

CYP20 Clan (2 sequences)

>gnl|ti|655009968 1095963046224   KYG region 46% to CYP20 35% to 27B1
(2) LGNLGSLTFDGGIHKFLVENHKRLGPMFSFYWGKELAVSLACPILFKE (0)

>1097263613070 mate pair to 1097206643989 I-helix/J helix boundary?
BP508840 BP508840 Best match in Fugu, human and Ciona is CYP20
taz04211.y1
(1) XLTWLVYFLCKHPEVESKVYNEIKEFTEKDLDMELLTKFS (2)
YTKQVIDEVMRIAVLAPYAARYSDYDYDIIVDGHLIPKK (0)
TPIILALGTVFQDETIFPEPDRFDPDRFSDKQIEERSALAFQPFGF
AGKRKCPGYRLAYAETLTYTFYIIKNFHISLFDKQSVKMHYGFVTKPSEEIWIKVLRRKNI*

>CYP20 amphioxus 39% to CYP20 Danio
    MLDYAIFAITFVVFLIATVLYLYP (0)
(0) GANKITTIPGLEPSDPK (2)
(2) DGNLGDVGRAGSLHEFLLKLHTEYGDIASFWWGQQLVVSLGAPELWKQH ERIFDRP (1)
(1) ALLFKGFEPLIGAKSIQYANSVDGRTRRKLYDPSYGHNAMKHYYSIFQE (0)
(0) LGQEMAKKWESMKGDQHIPLHAHIIALAMKAITRSSFGDAFKDEKECVQFGRNYDI (0)
(0) CWNDMEERIKGSHPTEGSPREKKFKE (1)
(1) ALGKLHATIARVAKYRRENPSPPQEQLFIDVLIEGNLPEEQ (0)
(0) VLCDAMTFTVGGIHTSGN (1)
(1) LLTWALYYIATHEEVEEKLHQELSDVLGKKGEVTPDNISQLV (2)
(2) YLRQVLDESLRCAVIAPWGARYMDLDAEVGGHIVPAK (0)
(0) QTPVIHAFGVVLQDERIWPEPNK (2)
    FDPDRFDAENSKGRHKLAFQPFGFAGGRKCP (1)
(1) GYRFTYTWTSVFLSILCRQFKLHLVDGQVVKPCHGLVTRPVDEIWITVTKRD*