Chlamydomonas reinhardtii cytochrome P450s
D. Nelson, Sept. 2, 2004
Under revision May 11, 2006

39 named genes, 2 named pseudogenes, + one bacterial contaminant
families = 51, 55, 97, 710, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 767, 768, 769, 770, 771
(5 old families, 16 new families)

51 is in the 51 clan (sterol 14 alpha demethylase)
55 is of fungal origin (nitrite/nitrate reductase, soluble enzyme)
710 is in the 61 clan (C-22 sterol desaturase in fungi [CYP61] and plants [CYP710])
737, 738, 739, 740 are in the CYP85 clan
97 is in the CYP97 clan (carotenoid hydroxylases of epsilon and beta rings)
743, 744 are in the CYP711 clan (CYP711A1 produces a carotenoid hormone in Arabidopsis)
745 may be a new plant clan, CYP97A like
CYP747 is hard to place. 38% to CYP97A6 in C-term half
741 and 742 sometimes cluster with 97 but not always.
741, 742, 748, 767, 768, 769 cluster together and have best hits to CYP4 clan members
746 may be of bacterial origin; best hit is to CYP252A1 Streptomyces peucetius, top 26 hits all bacterial
CYP746 and CYP770 may be the Chlamydomonas precursors of the CYP72 clan
There is a CYP746 in moss

Chlamydomonas P450 tree: /chlamy.tree.pdf
A link to the 2003 Chlamydomonas P450 page: /chlamy.FASTA.html

P450s sorted by gene model number using the JGI annotation
* indicates more than one gene model for a single gene.
C_60077  CYP742A1
C_130004  CYP739A1
C_130006  CYP739A2
C_130009  CYP739A4
C_130009  CYP739A5
C_130012  CYP739A6
C_130125  CYP739A3
C_140094  CYP-un1Chlre pseudogene 1, family not identified, half of gene
C_180013  CYP743A1
CYP744A4 between C_239009 and C_239004 not annotated
C_250032  CYP746A1, 39% to Streptomyces peucetius CYP252A1
C_310063  CYP97A6
C_340039  unnamed C-term P450 fragment PKG to heme
C_410095  CYP97B6
C_420091  CYP743A2
C_470024  CYP737A1
C_570052  CYP738A1
C_680007  CYP51G1
C_900050  CYP747A1, 41% to CYP743B2 C-term
*C_940015  CYP744A1
C_940016  CYP744A1, N-term = C_940015
C_940017  CYP744A2
C_940020  CYP744B1
C_940044  CYP744A3
C_980035  CYP743B3
C_980053  CYP741A1
*C_980058  CYP741A1 N-term
C_1040015  CYP97A5
C_1080041  CYP740A1
C_1130014  CYP743C1
C_1340038  CYP97C3 70% to 97C2
C_1370013  CYP744C1
C_1530020  unnamed C-term P450 fragment PKG to end
C_1540014  CYP710B1
C_1730009  CYP744A5P pseudogene 81% to 744A3
C_1820019  CYP748A1 about 40% to C-term half of 741A1
C_1860018  CYP745A1
C_2580005  CYP55B1, 43% to CYP55A6
C_4150003  unnamed CYP97 like C-term P450 fragment
*C_4260002  CYP97A5
*C_5270001  CYP739A6
C_7970001  unnamed C-term P450 fragment
C_8600001  CYP743B2 falls in a seq gap of scaffold 98
C_8600002  CYP743B3 same as C_980035
*C_8650001  CYP744B1
*C_9610001  CYP743C1
C_10690001  unnamed C-term P450 fragment
*C_22500001  CYP739A5
*C_28140001  CYP746A1 = C_250032 C-helix exon duplication
C_32340001  CYP743B1 falls in a seq gap of scaffold 98

P450s sorted by CYP name (version 2 assembly)
CYP51G1  C_680007
CYP55B1  C_2580005 43% to CYP55A6
CYP97A5  *C_4260002
CYP97A5  C_1040015
CYP97A6  C_310063
CYP97B6  C_410095
CYP97C3  C_1340038 70% to 97C2
CYP710B1  C_1540014
CYP737A1  C_470024
CYP738A1  C_570052
CYP739A1  C_130004
CYP739A2  C_130006
CYP739A3  C_130125
CYP739A4  C_130009
CYP739A5  *C_22500001
CYP739A5  C_130009
CYP739A6  *C_5270001
CYP739A6  C_130012
CYP740A1  C_1080041
CYP741A1  *C_980058 N-term
CYP741A1  C_980053
CYP742A1  C_60077
CYP743A1  C_180013
CYP743A2  C_420091
CYP743B1  C_32340001
CYP743B2  C_8600001
CYP743B3  C_8600002 same as C_980035
CYP743C1  *C_9610001
CYP743C1  C_1130014
CYP744A1  *C_940015
CYP744A1  C_940016 N-term = C_940015
CYP744A2  C_940017
CYP744A3  C_940044
CYP744A4  between C_239009 and C_239004 not annotated
CYP744A5P  C_1730009 pseudogene 81% to 744A3
CYP744B1  *C_8650001
CYP744B1  C_940020
CYP744C1  C_1370013
CYP745A1  C_1860018
CYP746A1  *C_28140001 = C_250032 C-helix exon duplication
CYP746A1  C_250032, 39% to Streptomyces peucetius CYP252A1
CYP747A1  C_900050 41% to CYP743B2 C-term
CYP748A1  C_1820019 about 40% to C-term half of 741A1
C_140094  CYP-un1Chlre pseudogene 1, family not identified, half of gene
C_340039  unnamed C-term P450 fragment PKG to heme
C_1530020  unnamed C-term P450 fragment PKG to end
C_4150003  unnamed CYP97 like C-term P450 fragment
C_7970001  unnamed C-term P450 fragment
C_10690001  unnamed C-term P450 fragment

P450s sorted by CYP name (version 3 assembly)
CYP51G1  scaffold_7:2481399-2484780  Protein ID: 126254
CYP55B1  scaffold_52:370660-375180  Protein ID: 121742
CYP97A5  scaffold_55:373287-377786  Protein ID: 39257
CYP97A6  scaffold_42:732596-737181  Protein ID: 121076
CYP97B6  scaffold_1:2256360-2261776  Protein ID: 116601
CYP97C3  scaffold_64:422589-430105  Protein ID: 122396
CYP710B1  scaffold_66:390953-394690  Protein ID: 132687
CYP737A1  scaffold_41:635800-640648  Protein ID: 151890
CYP738A1  scaffold_6:2860971-2864314  Protein ID: 167934
CYP739A1  scaffold_8:1064933-1068008  Protein ID: 140983
CYP739A2  scaffold_8:1078648-1085528  Protein ID: 140985
CYP739A3  scaffold_8:1105803-1109510  Protein ID: 140993
CYP739A4a  scaffold_8:1131245-1134169  Protein ID: 165902
CYP739A4b  scaffold_8:1135368-1135969  Protein ID: 165903
CYP739A5a  scaffold_8:1125087-1127174  Protein ID: 165900
CYP739A5b  scaffold_8:1128094-1130653  Protein ID: 186291
CYP739A6  scaffold_8:1145820-1150791  Protein ID: 186292
CYP740A1  scaffold_68:172336-177730  Protein ID: 153850
CYP741A1a  scaffold_71:380138-383878  Protein ID: 179637
CYP741A1b  scaffold_846:3828-5043  Protein ID: 181363
CYP742A1  scaffold_37:480604-486602  Protein ID: 151489
CYP743A1  scaffold_1:5611907-5617553  Protein ID: 116541
CYP743A2a  scaffold_16:609616-615492  Protein ID: 189550
CYP743A2b  scaffold_16:609616-615492  Protein ID: 116043
CYP743B1  scaffold_71:125260-130065  Protein ID: 122749
CYP743B2  scaffold_71:130374-138996  Partial seq not annotated
CYP743B3  scaffold_71:139305-143478  Protein ID: 122730
CYP743C1  scaffold_17:1489349-1496178  Protein ID: 147793
CYP744A1a  scaffold_23:958703-961028  Protein ID: 148983
CYP744A1b  scaffold_23:962118-963228+  Protein ID: 118452
CYP744A2  scaffold_23:969108-971162  Protein ID: 118526
CYP744A3  scaffold_23:976166-982342  Protein ID: 118465
CYP744A4a  scaffold_23:1143890-1147747  Protein ID: 95157
CYP744A4b  scaffold_23:1141463-1143101  Protein ID: 103666
CYP744A5P  scaffold_21:6347-7649  Protein ID: 148389
CYP744B1  scaffold_23:1014183-1020804  Protein ID: 118428
CYP744C1  scaffold_39:932071-938361  Protein ID: 177201
CYP745A1  scaffold_74:79791-84023  Protein ID: 154128
CYP746A1  scaffold_1:3570907-3575049  Protein ID: 116510
CYP747A1  scaffold_96:178714-184286  Protein ID: 108849
CYP748A1  scaffold_9:2353835-2358515  Protein ID: 114278
CYP767A1  scaffold_9:1625885-1634209  Protein ID: 169101
CYP768A1a  scaffold_23:1470852-1473965  Protein ID: 149040
CYP768A1b  scaffold_23:1476142-1477663  Protein ID: 149041
C_140094  scaffold_48:305112-303028  Partial seq not annotated
C_4150003  scaffold_21:297178-306479  Protein ID: 191092
C_7970001  scaffold_15:453166-458216  Protein ID: 170931
C_10690001  scaffold_24:545063-551204  Protein ID: 173996
Bacterial  scaffold_661:7589-8149  Protein ID: 109783

P450s sorted by scaffold location (version 3 assembly)
CYP97B6  scaffold_1:2256360-2261776  Protein ID: 116601
CYP746A1  scaffold_1:3570907-3575049  Protein ID: 116510
CYP743A1  scaffold_1:5611907-5617553  Protein ID: 116541
CYP738A1  scaffold_6:2860971-2864314  Protein ID: 167934
CYP51G1  scaffold_7:2481399-2484780  Protein ID: 126254
CYP739A1  scaffold_8:1064933-1068008  Protein ID: 140983
CYP739A2  scaffold_8:1078648-1085528  Protein ID: 140985
CYP739A3  scaffold_8:1105803-1109510  Protein ID: 140993
CYP739A5a  scaffold_8:1125087-1127174  Protein ID: 165900
CYP739A5b  scaffold_8:1128094-1130653  Protein ID: 186291
CYP739A4a  scaffold_8:1131245-1134169  Protein ID: 165902
CYP739A4b  scaffold_8:1135368-1135969  Protein ID: 165903
CYP739A6  scaffold_8:1145820-1150791  Protein ID: 186292
CYP767A1  scaffold_9:1625885-1634209  Protein ID: 169101
CYP748A1  scaffold_9:2353835-2358515  Protein ID: 114278
C_7970001  scaffold_15:453166-458216  Protein ID: 170931
CYP743A2a  scaffold_16:609616-615492  Protein ID: 189550
CYP743A2b  scaffold_16:609616-615492  Protein ID: 116043
CYP743C1  scaffold_17:1489349-1496178  Protein ID: 147793
CYP744A5P  scaffold_21:6347-7649  Protein ID: 148389
C_4150003  scaffold_21:297178-306479  Protein ID: 191092
CYP744A1a  scaffold_23:958703-961028  Protein ID: 148983
CYP744A1b  scaffold_23:962118-963228+  Protein ID: 118452
CYP744A2  scaffold_23:969108-971162  Protein ID: 118526
CYP744A3  scaffold_23:976166-982342  Protein ID: 118465
CYP744B1  scaffold_23:1014183-1020804  Protein ID: 118428
CYP744A4a  scaffold_23:1143890-1147747  Protein ID: 95157
CYP744A4b  scaffold_23:1141463-1143101  Protein ID: 103666
CYP768A1a  scaffold_23:1470852-1473965  Protein ID: 149040
CYP768A1b  scaffold_23:1476142-1477663  Protein ID: 149041
C_10690001  scaffold_24:545063-551204  Protein ID: 173996
CYP742A1  scaffold_37:480604-486602  Protein ID: 151489
CYP744C1  scaffold_39:932071-938361  Protein ID: 177201
CYP737A1  scaffold_41:635800-640648  Protein ID: 151890
CYP97A6  scaffold_42:732596-737181  Protein ID: 121076
C_140094  scaffold_48:305112-303028  Partial seq not annotated
CYP55B1  scaffold_52:370660-375180  Protein ID: 121742
CYP97A5  scaffold_55:373287-377786  Protein ID: 39257
CYP97C3  scaffold_64:422589-430105  Protein ID: 122396
CYP710B1  scaffold_66:390953-394690  Protein ID: 132687
CYP740A1  scaffold_68:172336-177730  Protein ID: 153850
CYP743B1  scaffold_71:125260-130065  Protein ID: 122749
CYP743B2
scaffold_71:130374-138996 Partial seq not annotated CYP743B3 scaffold_71:139305-143478 Protein ID: 122730 CYP741A1a scaffold_71:380138-383878 Protein ID: 179637 CYP741A1b scaffold_846:3828-5043 Protein ID: 181363 CYP745A1 scaffold_74:79791-84023 Protein ID: 154128 CYP747A1 scaffold_96:178714-184286 Protein ID: 108849 Bacterial scaffold_661:7589-8149 Protein ID: 109783 P450 sequences Note: the P450 sequences have many apparent insertions of poly Ala, poly Gly, poly S and mixtures of these. These are found in some ESTs so they are real. It is not clear why these sequences are inserted or what they do to the structure of these P450s. >CYP51G1 C_680007 10 EXONS 56% TO 51G1 Arab EST SUPPORT BI717817 BU649818 BI726293 BM001590 AV642299 60124 MDLPPELAVLADKVLSLSPVVLVALGSAVLILALAVGRVLFNLLPSKRPPVWEGLPFIGGLLKFTG 59927 59843 GPWKLLENGYAKFGECFTVPVAHRRVTFLIGPEVSPHFFKAGDDEMSQSE 59694 59394 VYDFNIPTFGRGVVFDVEQKVRTEQFRMFTEALTKNRLKSYVPHFNKEAE 59245 59108 EYFAKWGETGVVDFKDEFSKLITLTAARTLL 59016 58765 GREVREQLFDEVADLLHGLDEGMVPLSVFFPYAPIPVHFKRDR (2) 58637 58412 CRKDLAAIFAKIIRARRESGRREEDVLQQFIDAR 58311 58119 YQNVNGGRALTEEEITGLLIAVLFAGQHTSSITTSWTGIFMAANK 57985 57667 EHYNKAAEEQQDIIRKFGNELSFETLSEMEVLHRNITEALRMHPPLLLVMRYAKKPFSVTTSTGKSYVIPK 57455 57191 GDVVAASPNFSHMLPQCFNNPKAYDPDRFAPPREEQNKPYAFIGFGAGRHACIGQNFAYLQ (0) 57009 56877 IKSIWSVLLRNFEFELLDPVPEADYESMVIGPKPCRVRYTRRKL* 56743 newest data: version 3 checked April 24, 2006 Name: estExt_gwp_1H.C_70049 Protein ID: 126254 Location: Chlre3/scaffold_7:2481399-2484780 100% match 2481399 MDLPPELAVLADKVLSLSPVVLVALGSAVLILALAVGRVLFNLLPSKRPPVWEGLPFIGGLLKFTG (0) 2481596 2481680 GPWKLLENGYAKFGECFTVPVAHRRVTFLIGPEVSPHFFKAGDDEMSQSE (0) 2481829 2482129 VYDFNIPTFGRGVVFDVEQKVRTEQFRMFTEALTKNRLKSYVPHFNKEAE (0) 2482278 2482415 EYFAKWGETGVVDFKDEFSKLITLTAARTLL (1) 2482507 2482758 GREVREQLFDEVADLLHGLDEGMVPLSVFFPYAPIPVHFKRDR (2) 2482886 2483111 CRKDLAAIFAKIIRARRESGRREEDVLQQFIDAR (2) 2483212 2483404 YQNVNGGRALTEEEITGLLIAVLFAGQHTSSITTSWTGIFMAANK (0) 2483538 2483856 
EHYNKAAEEQQDIIRKFGNELSFETLSEMEVLHRNITEALRMHPPLLLVMRYAKKPFSVTTSTGKSYVIPK (0) 2484068 2484332 GDVVAASPNFSHMLPQCFNNPKAYDPDRFAPPREEQNKPYAFIGFGAGRHACIGQNFAYLQ (0) 2484514 2484646 IKSIWSVLLRNFEFELLDPVPEADYESMVIGPKPCRVRYTRRKL* 2484780 >CYP55B1 C_2580005 (possible CYP55 fungal origin), 42% to 105T1 MAPQHD (1) 47793 FPFSRPKGVEPPAEYKELRSKCPVAPGRLFDGSKIWLISRHKELKEVLQDGRFSK 47629 (0) 47243 VRTLPGFPELSPGGKAAAQSGNAATFVDMDPPEHTKYRY 47127 (0) missing about 20aa here ? seq gap AKADKLVDAMIARGGPLDLNEAFSMPLPFR 46168 (0) (same intron loc. as 55A6) 45913 VIYDFIGIPEADFAYLSANVAVRSSGSSNAKDAAAAADDLVKYMDNL 45773 (0) 45601 VAEKERNPTGKDLISELVTKQ 45539 (0) 45264 LRPGHMTREQLVQTAFLMLVAGNATVATQINLGVISLLQHPDQ 45136 (0) 44693 LAAMKADPARLVPAATEEICRFHTGSSYALRRLAVADVQVDGQ 44565 (0) 44256 LVKKGEGIIALNQSANRDESVFPDPDRFDIHRQSNPQQ 44143 (0) 43755 VGFGYGTHVCVAEWLARAEIQVAIGTLFRRLPNLRLAVPESQIQYSDPARDVGLAALPVTW* 43573 newest data: version 3 checked April 24, 2006 Name: e_gwW.52.47.1 Protein ID: 121742 Location: Chlre3/scaffold_52:370660-375180 Note gene model is too long at SMPLPFRVGGW, shorten by 4 amino acids First exon is still my best guess, not in gene model e_gwW.52.47.1 51% to CYP55A5v1 Aspergillus oryzae 48% to CYP55A3 Cylindrocarpon tonkinense 42% to 105T1 Burkholderia fungorum (bacteria) 370660 MAPQH (1) 370674 370738 DFPFSRPKGVEPPAEYKELRSKCPVAPGRLFDGSKIWLISRHKELKEVLQDGRFSK (0) 370905 371291 VRTLPGFPELSPGGKAAAQSGNAATFVDMDPPEHTKYR (2) 371404 371628 GMVWPYLTPEAVEQLRPSIQ (0) 371677 372474 AKADKLVDAMIARGGPLDLNEAFSMPLPFR (0) 372563 372818 VIYDFIGIPEADFAYLSANVAVRSSGSSNAKDAAAAADDLVKYMDNL (0) 372958 373130 VAEKERNPTGKDLISELVTKQ (0) 373192 373467 LRPGHMTREQLVQTAFLMLVAGNATVATQINLGVISLLQHPDQ (0) 373595 374038 LAAMKADPARLVPAATEEICRFHTGSSYALRRLAVADVQVDGQ (0) 374166 374475 LVKKGEGIIALNQSANRDESVFPDPDRFDIHRQSNPQQ (0) 374588 374995 VGFGYGTHVCVAEWLARAEIQVAIGTLFRRLPNLRLAVPESQIQYSDPARDVGLAALPVTW* 375180 >CYP97A5 15 EXONS 60% TO 97A3 FIRST EXON PREDICTED BY GENSCAN C_4260002 C_1040015 no mRNA or homology evidence for exon 
1 note: CYP97A6 has homology to exon 2, but no upstream match for 5000bp EST support = cyan BM003139 BI725954 BE441929 BI719213 CF555158 Gray resembles a cycad EST 13351 MPPDVSGNMLSFSTSISGCRF (1) 373428 GRSAARFLADLGRQWRAEASKRMPE (0) 373502 12913 ARGDIREIVGQPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQILLTNADKYSKGLLSEILDFVMGT 12698 (0) 12532 GLIPADGEIWKARRRAVVPALHRK 12461 12332 YVMSMVDMFGDCAAHGASATLDKYAASG 12249 11994 TSLDMENFFSRLGLDIIGKAVFNYDFDSLAHDDPVIQ 11884 11707 AVYTLLREAEHRSTAPIAYWNIPGIQFV 11624 11493 VPRQKRCQEALVLVNECLDGLIDKCKKLV 11407 11269 EEEDAVFGEEFLSERDPSILHFLLASGDEISSKQ (0) 11168 11003 LRDDLMTMLIAGHETTAAVLTWTLYLLSQHPEAAAAIRKE (0) 10884 10681 VDELLGDRKPGVEDLRALK (0) 10625 10448 MTTRVINEAMRLYPQPPVLIRRALQ 10374 10118 DDHFDQFTVPAGSDLFISVWNLHRSPKLWDEPDKFKPER 10002 9580 FGPLDSPIPNEVTENFAYLPFGGGRRKCIGDQ 9485 9358 FALFEAVVALAMLMRRYEFNLDESKGTVGMTT 9263 9124 GATIHTTNGLNMFVRRRDPLTVPPTSSSVAETVSTGYAFACG PAVMPVASAEVVAAPATAAGGGCPFHTAAGAAVPAATMSLRPTGPPSA* 8852 newest data: version 3 checked April 28, 2006 Name: gwH.55.10.1 Protein ID: 39257 Location: Chlre3/scaffold_55:373287-377786 This model differs from seq below at ends 100% match from ARGDIRE to DPLTVP EST support = cyan BM003139 BI725954 BE441929 BI719213 CF555158 Gray resembles a cycad EST scaffold_55 16 exons 373287 MPPDVSGNMLSFSTSISGCRF (1) 373349 373428 GRSAARFLADLGRQWRAEASKRMPE (0) 373502 373725 ARGDIREIVGQPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQIL 373874 373875 LTNADKYSKGLLSEILDFVMGT (0) 373940 374106 GLIPADGEIWKARRRAVVPALHRK (2) 374177 374306 YVMSMVDMFGDCAAHGASATLDKYAAS (1) 374386 374641 GTSLDMENFFSRLGLDIIGKAVFNYDFDSLAHDDPVIQ (0) 374754 374931 AVYTLLREAEHRSTAPIAYWNIPGIQFV (0) 375014 375145 VPRQKRCQEALVLVNECLDGLIDKCKKL (0) 375228 375366 VEEEDAVFGEEFLSERDPSILHFLLASGDEISSKQ (0) 375470 375635 LRDDLMTMLIAGHETTAAVLTWTLYLLSQHPEAAAAIRKE (0) 375754 375957 VDELLGDRKPGVEDLRALK (0) 376013 376190 MTTRVINEAMRLYPQPPVLIRRALQ (0) 376264 376520 DDHFDQFTVPAGSDLFISVWNLHRSPKLWDEPDKFKPER (2) 376636 377058 FGPLDSPIPNEVTENFAYLPFGGGRRKCIGDQ 
(0) 377153
377280 FALFEAVVALAMLMRRYEFNLDESKGTVGMTT (1) 377375
377514 GATIHTTNGLNMFVRRRDPLTVPPTSSSVAETVSTGYAFACGPAVMPVAS 377663
377664 AEVVAAPATAAGGGCPFHTAAGAAVPAATMSLRPTGPPSA* 377786

>CB092428.1 hf05f08.g1 Cycad Leaf Library (NYBG) Cycas rumphii cDNA clone hf05f08, mRNA sequence. Length=609
This seq supports the second and third exons above.
Query 40 GRSAARFLADLGRQWRAEASKRMPEVRLELRPCDGGGRASCPVLGKSTYTARGDIREIVG 99
GR+ A+ +A ++WRA + +MPE ARG++R + G
Sbjct 383 GRALAKSIAVAEQKWRAHNASKMPE-------------------------ARGNVRAVAG 487
Query 100 QPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQIL 139
QP FVPLY LFL YG +FRL+FGPKSFVI+SDPA AK IL
Sbjct 488 QPFFVPLYNLFLTYGGVFRLTFGPKSFVIVSDPAIAKHIL 607
VVQCAGQAGIRPGFEARAIAWPRCVFVSAKTRGFRLNKRVSNDFLGRQLTIKSFSNRQRG
GKIRAATVSSLNEGGGGNEPAVERVERLTEEDRAELSVRIAAGEFTAEPVTLNLLKIRLF
LIKFGAP GRALAKSIAVAEQKWRAHNASKMPEARGNVRAVAGQPFFVPLYNLFLTYGGVF
RLTFGPKSFVIVSDPAIAKHIL

volvox matches
>ABSY36486.y1 CHROMAT_FILE: ABSY36486.y1 PHD_FILE: [top] ABSY36486.y1.phd.1 CHEM: term DYE: ET TIME: Fri Sep 5
Query: 22 GRSAARFLADLGRQWRAEASKRMPE 46
GR ARFLADLGR+WR+EA+KRMPE
Sbjct: 240 GRPVARFLADLGRRWRSEAAKRMPE 314
>ABSY25604.b1 CHROMAT_FILE: ABSY25604.b1 PHD_FILE: [top] ABSY25604.b1.phd.1 CHEM: term DYE: big TIME: Tue Sep 16 11:06:39 2003 Length = 1069
Query: 46 EARGDIREIVGQPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQILLTNADKYSK 105
+ARGDIREIVGQPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQILLTNADKYSK
Sbjct: 340 QARGDIREIVGQPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQILLTNADKYSK 519
Query: 106 GLLSEILDFVMGT 118
GLLSEILDFVMGT
Sbjct: 520 GLLSEILDFVMGT 558

>CYP97A6 C_310063 missing exon 1
(0) VRVPLNNVGKVPIFQLLYELYSS (2)
(2) HGGVFRMRLGPKSFLVLSDPGAVRQVLVGAVDKYS (2)
9247 KGILAEILEFVMGN (0) 9306
seq gap missing 2 exons
9705 XSVDMESFFSRLSLDIIGKSVFDYDFDSLRHDDPVIQ 9812
10081 AVYSVLRESTVRSTAPFP 10128 (1)
10371 YWKLPGISLLVPRLRESDAALAIVNDTLDRLIARCKSM 10487 (0)
LEAEGSIPMPASPSSPSSSTATSSSAPSSPSAPLEESSA
10853 PTVLHFLLGSGEALNSRQLRDDLMTLLIAGHETTAAV 10963
11275 LTWALHLLVAHPEVMKRVRDE 11277
11605
VDWVLGDRLPGSDDLPLLRYTTRVVNEALRLYPQPPVLIRRAMQ 11736 11956 DDVLPGGHVVAAGTDLFISVWNLHHSPQLWERPEAFDPDR 12075 12251 FGPLDSPPPTEFSTDFRFLPFGGGRRKCVGDMFAIAECVVALAVVLRRYDFAPDTSFGPVGFKS 12442 12584 GATINTSNGLHMLISRRDLT 12643 12644 GVPPPAPRAPAAAAGAAAGSCPHAAAAAATAAAAAAVGCPHAAAAATSGAPAGVTP 12811 newest data: version 3 checked April 27, 2006 Name:e_gwW.42.59.1 Protein ID:121076 Location:Chlre3/scaffold_42:732596-737181 100% to e_gwW.42.59.1 from VRVPL to MLISRR scaffold_42 cannot identify exon 1 732596 VRVPLNNVGKVPIFQLLYELYSS (2) 732667 733002 HGGVFRMRLGPKSFLVLSDPGAVRQVLVGAVDKYS (2) 733106 733345 KGILAEILEFVMGN (0) 733386 733631 GLLAADGEHWIARRRVVAPALQRK (2) 733702 733949 FVSSQVALFGAATAHGLPQLEAAAAAAAAAAGDSRGGGA 734065 734066 ASVDMESFFSRLSLDIIGKSVFDYDFDSLRHDDPVIQ (0) 734176 734445 AVYSVLRESTVRSTAPFP (1) 734498 734738 YWKLPGISLLVPRLRESDAALAIVNDTLDRLIARCKSMVGRCCGGGGGGGGG (0) 734893 SSAPTVLHFLLGSGEALNSRQLRDDLMTLLIAGHETTAA (0) 735324 735636 ALTWALHLLVAHPEVMKRVRDE (0) 735701 735969 VDWVLGDRLPGSDDLPLLRYTTRVVNEALRLYPQPPVLIRRAMQ (0) 736100 736320 DDVLPGGHVVAAGTDLFISVWNLHHSPQLWERPEAFDPDR (2) 736439 736615 FGPLDSPPPTEFSTDFRFLPFGGGRRKCVGDMFAIAECVVALA VVLRRYDFAPDTSFGPVGFKS (1) 736806 736948 GATINTSNGLHMLISRRDLTGGVPPPAPRAPAAAAGAAAGSCPHAAAAAATAAAAAA VGCPHAAAAATSGAPAGVTPQ* 737181 54% to DY932408.1 plains sunflower Helianthus petiolaris MAASLTTLQFPSPYLNTPTTKFKLKSPSTSFPKSYGVSRSCGIKCSYSNGRKPD SGEEKSGKKVEMTPEEKRRAELSARIASGAFTVEQPSLGSLLVSGLAKLGVPSNILEPVS NLINSGGNYPKIPEAKGAISAIRSEAFFIP LYELFLTYGGIFRLTFGPKSFLIVSDPNIA KHILKDNAKAYSKGILAEILEFVMGTGLIPADGEVWRVRRRVIVPALHLKYVAAMIGLFG EATDRLCKKLDDAAYNGEDVEMESLFSRLTLDIIGKSVFNYDFDSLD >CYP97B6 on top of gene model C_410095 but annotation is in the wrong frame strongly suspect ARGN... is N-term part of CYP97B6, but no proof compare to 97A5 exon 2. 
ESTs BI996334.1 AV390436.1 ALIAHKTLLQLY ARGNIREIVGQTATVPLNKLFLVYVQIFRVSFRPRASGSSLSPHDAKEILRTNADKYSMGLLTKILDLVMST 64% identical to 97A5 exon 2 but not in ver 3 of genome on the Bac ends from ver 2 PTQ4692.y1 CHROMAT_FILE: PTQ4692.y1 PHD_FILE: PTQ4692.y1.phd.1 This is probably a real exon 2 of a CYP97 like seq HE 479653 DMESEFLSLGLDIIGLGVFNFDFGSINSESPVIK 479552 479264 AVYGVLKEAEHRSTFYLPYWNLPLADVLVPRQAKFR 479154 ADLKVINECLDNLIKQARDTRVAEDAEALQNRDYSKVSDPSLLRFLVD MRGEEPTNKQLRDDLMTMLIGGHETTAAV (44 aa sequence gap up to EXXR) CLGESLRMY 477871 PQPPILIRRALAEDTLPAGLRGDPAGYPIGKGADLFISVWNLHR 477740 477549 SPYLWKDPDTFRPERFFEPN 477484 SNPDFGGKWAGYRPDAVTGGAALY PNEVASDFAFIPFGGGARKCVGDQFAMFEATVAAAMLLRRFTFRLAVPAEKV (1?) 476620 GMATGATIHTANGLSMRVTRRTP 476552 SGGSGSGAPGAAAKVPATV* >PTQ4692.y1 CHROMAT_FILE: PTQ4692.y1 PHD_FILE: PTQ4692.y1.phd.1 CHEM: unknown DYE: unknown TIME: Thu Jan 10 11:26:57 2002 TEMPLATE: PTQ4692 DIRECTION: rev = trace file 334400148 no other trace files match, may have errors TCTATGTGACCTATACAAACTCTCGCTTGGCGAGACCTGGAGGATCACTC CAGTCTGGCGAATTCGCGGACTCGGGCTCGAAAAAGAGATTGGACTCGAT CCCTGTCGCCAAGTGCTGAGGAAGGATCCGCTTGTTGGCGATGCAAATTG CAAAAACGGAATTCAGGAAGCGGAGCGCACGCACTAGATGCCTCCACATG ACACCGGTAATATGATGACCATTTCAACATAGCATATCACGATGCCGATA TGGGTGCTGTGCATGACCGACCTTTGGACCAGGGGGTGCCCCATCGTCCA CGCCCAACTGCCTGCTTGGCTCTGACACAGGACGGTCTGCAGCTCGCTTC CTGGTGGACTTGGGCCGCCACCGGCGTGCCAAGGCCACAAAGCGCATGCC TGACGTGAGGTTATAGCTGCGGACCTGCTGACGGCGGTGGGCAGATGCAG CTGCCCGGTAACTGGGCAAATCCACGTATACTGCATGGTGTGCAATGCAT GGGGCGTCAGTATACTTGTAAAGGGTGTACTCTCACCTATCAGTGGGCTC ATATGACCGGGGCCTGCGACTCCGTCCTGAAATCGACAAAAAGCTAGCGC CCTTGATTGCCCACAAAACTCTCTTGCAATTGTACGCACGCGGCAACATA CGGGAGATTGTGGGCCAAACAGCGACTGTGCCGCTGAACAAACTGTTCCT GGTGTACGTGCAGATCTTCCGGGTGTCTTTCCGGCCCAGAGCTTCTGGAT CATCTCTGAGCCCGCATGATGCGAAGGAGATCCTGCGCACGAACGCTGAC AAGTACAGCATGGGGCTGCTCACGAAGATCCTGGATCTCGTGATGAGCAC GCACGGTGCGCGTTGC newest data: version 3 checked April 27, 2006 Name:e_gwW.1.53.1 Protein ID:116601 
Location:Chlre3/scaffold_1:2256360-2261776 Green supported by identical ESTs Gray supported by related ESTs, but not identical Two small gaps and the N-term are missing Note: yellow region is out of order, but supported by an EST This seq agrees with model at FIDS to PAFH, GSAVV to AKFR, LEDL to VTRR Seq gap here AV390436.1 BI996334.1 2261776 FIDSGGVYKLVFGPKAFIVVSDPVVVRHILK (0) 2261684 2261461 ENAFNYDKGVLAEILEPIMGKGLIPADLETWKVRRRAVVPAFHK (2) 2261331 lyleamvkvfsdcsekmilkseklireketssgedtiel Arabidopsis 2259284 (0) GSAVVDMESEFLSLGLDIIGLGVFNFDFGSINSESPVIK (0) 2259168 2258877 AVYGVLKEAEHRSTFYLPYWNLPLADVLVPRQAKFR (2) 2258770 ADLKVINECLDNLIKQARDTRVAEDAEAL 2263466 QNRDYSKVSDPSLLRFLVDMRGEEPTNKQLRDDLMTMLIAGHETTAAV 2263323 LTWAMFCLVQ (0) ntdklvkaqaeidtildqrkp Ginkgo (1) SLEDLKAMPYLRA 2257791 CLGESLRMYPQPPILIRRALAEDTLPAGLRGDPAGYPIGKGADLFISVWNLHR (2) 2257633 2257436 SPYLWKDPDTFRPERFFEPNSNPDFGGKWA (1) 2257347 2256904 GYRPDAVTGGAALYPNEVASDFAFIPFGGGARKCVGDQFAMFEATVAAA 2256758 2256757 MLLRRFTFRLAVPAEK (0) 2256710 2256491 VGMATGATIHTANGLSMRVTRRTPSGGSGSGAPGAAAKVPATV* 2256360 >CYP97C3 C_1340038 RUNS OFF END 70% to 97C1 44288 VPLGQDVMISVYNIHHSPAVWDDPE (0) 44214 43839 AFIPERFGPLDGPVPNEQNTDFR 43777 (2) 43352 YIPFSGGPRKCVGDQFALMEAVVALTVLLRQYDFQMVPNQQ (0) 43227 42864 IGMTTGATIHTTNGLYMYVKER 42799 GAAASGSSGVAGGKQLAAA* Name:e_gwW.64.11.1 Protein ID:122396 Location:Chlre3/scaffold_64:422589-430105 e_gwW.64.11.1 has an internal seq between EEMRAA and VPVGQD that is not right This seq agrees with model from DIKE to EEMRAA, VPVG to YMYV >e_gwW.64.11.1 [Chlre3:122396] green parts look right compared to Arab. The first exon shown matches a volvox seq. The true N-term is not identified. 
assembled pieces 422589 GKNIDSKGAGTSFTSPGWLTQLNMLWGGKSVS (0) 422684 (0) NVPVANAQPA 423126DIKELLGGALFKALYKWMQESGPIYLLPTGPVSSFLVVSD PAAAKHVLRSTDNSQRNIYNKGLVAE (0) VSEFLFGKGFAISGGDAWKARRRAVGPSLHK (2) AYLEAMLDRVFGASSLFAADKLRKAAAEGTPVNMEALFSQLT LDIIGKSVFNYDFNSLTSDSPVIQAVYTALKETEQRATDLLPLWKVPGIGWLIPRQRKALEAVELIRKTT NDLIKQCKEMVDEEEMRAASAAAAA (1) (1) GTEYLIEAVPSVLRLLIPERAEVDSTQ (chlamy AFWX153863.b3 with frameshift DST/QLRDD) LRDDLLSMLVAGHETT (1) (1) APLTWTLYLLVNNPNKMYAP (0) 390458 (0) AEVDAVLGSRLSPTMADYGQLRYVMRCVNESMRLYPHPPVLLRRALVEDELPGGFK (0) 390625 428555 (0) VPVGQDVMISVYNIHHSPAVWDDPE (0) 428629 429002 AFIPERFGPLDGPVPNEQNTDFR ()429070 429495 YIPFSGGPRKCVGDQFALMEAVVALTVLLRQYDFQMVPNQQ (0) 429617 429980 IGMTTGATIHTTNGLYMYVKERGAAASGSSGVAGGKQLAAA* 430105 NVPVANAQPA is from trace file 658821390 422589 GKNIDSKGAGTSFTSPGWLTQLNMLWGGKSVS (0) 422684 (0) NVPVANAQPA 423126DIKELLGGALFKALYKWMQESGPIYLLPTGPVSSFLVVSD PAAAKHVLRSTDNSQRNIYNKGLVAE (0) VSEFLFGKGFAISGGDAWKARRRAVGPSLHK (2) AYLEAMLDRVFGASSLFAADKLRKAAAEGTPVNMEALFSQLT LDIIGKSVFNYDFNSLTSDSPVIQAVYTALKETEQRATDLLPLWKVPGIGWLIPRQRKALEAVELIRKTT NDLIKQCKEMVDEEEMRAA (2) Trace 335863205 continues seq (no match) (seq gap) (0) LRDDLLSMLVAGHETT (1) Trace file 650467013 matches mid region of gap in e_gwW.64.11.1 This seq has 436 (1) APLTWTLYLL (0) = 97C like seq 390464 (0) VDAVLGSRLSPTMADYGQLRYVMRCVNESMRLYPHPPVLLRRALVEDELPGGFK (0) 390625 This fragment matches scaffold 64 from 390464 to 390625 Missassembled 97C3 seq 428555 (0) VPVGQDVMISVYNIHHSPAVWDDPE (0) 428629 429002 AFIPERFGPLDGPVPNEQNTDFR ()429070 429495 YIPFSGGPRKCVGDQFALMEAVVALTVLLRQYDFQMVPNQQ (0) 429617 429980 IGMTTGATIHTTNGLYMYVKERGAAASGSSGVAGGKQLAAA* 430105 (0) dymndsdpsvlrfliaareevdstq (volvox trace 636376981) blast of Chlamy unplaced reads with Physcomitrella 97C seq >SYF31892.y1 CHROMAT_FILE: SYF31892.y1 PHD_FILE: SYF31892.y1.phd.1 [top] CHEM: term DYE: ET TIME: Mon May 20 17:26:52 2002 TEMPLATE: SYF31892 DIRECTION: rev Length = 786 Score = 76.5 bits (163), Expect = 7e-14 
Identities = 30/46 (65%), Positives = 40/46 (86%) Frame = +1
Query: 31 QLRDDLLSMLVAGHETTGSVLTWTVYLLSKNPAALAKVHEELDRVL 76
QLRDDL++ML+AGHETT +VLTWT+YLLS++P A A + +E+ RVL
Sbjct: 196 QLRDDLMTMLIAGHETTAAVLTWTLYLLSQHPEAAAAIRKEVRRVL 333
>AFWX152107.b2 CHROMAT_FILE: AFWX152107.b2 PHD_FILE: [top] AFWX152107.b2.phd.1 CHEM: term DYE: big TIME: Mon Nov 1 12:27:19 2004 Length = 1012
100% match to 97C1 Arab.
Query: 31 QLRDDLLSMLVAGHETTG 48
QLRDDLLSMLVAGHETTG
Sbjct: 15 QLRDDLLSMLVAGHETTG 68
GCAGGTGGATCACGCAGCTGCGCGACGACCTGCTGTCCATGCTGGTGGCGGGACACGAGACCACGGGTGAGGGGGGGCGGGGGCAGGGGCTTGTGCCGGCCACCCGTTAT
This matches trace file 587324724 in Chlamy
Also matches 636376981 in volvox
Use the volvox seq to look upstream
The volvox seq matches Physcomitrella 97C5 with no intron
Query: 317 QDYMNDSDPSVLRFLIAAREEVDSTQLRDDLLSMLVAGHETT 442 volvox
++Y+N+SDPSVLRFL+A+REEV S QLRDDLLSMLVAGHETT
Sbjct: 372 EEYVNESDPSVLRFLLASREEVSSVQLRDDLLSMLVAGHETT 413 moss 97C5
volvox 97C like I-helix
>ABSY190514.b1 CHROMAT_FILE: ABSY190514.b1 PHD_FILE: [top] ABSY190514.b1.phd.1 CHEM: term DYE: big TIME: Fri Nov 28 22:26:28 2003 Length = 1327
Score = 35.8 bits (73), Expect = 0.033
Identities = 16/40 (40%), Positives = 26/40 (65%) Frame = -1
Query: 18 LLASREEVSSVQLRDDLLSMLVAGHETTGSVLTWTLYLLS 57
LL + + +S+ +LR + +LVAG ETTG + W+L L+
Sbjct: 532 LLITGKPLSAKRLRCETAFLLVAGFETTGHGIAWSLLFLA 413
Volvox I-helix region 97B like seq
>ABSY134624.g1 CHROMAT_FILE: ABSY134624.g1 PHD_FILE: [top] ABSY134624.g1.phd.1 CHEM: term DYE: big TIME: Fri Nov 28 22:55:34 2003 Length = 1303
Score = 32.6 bits (66), Expect(2) = 0.001
Identities = 12/19 (63%), Positives = 18/19 (94%) Frame = -3
Query: 23 EEVSSVQLRDDLLSMLVAG 41
E+V++ QLRDDL++ML+AG
Sbjct: 137 EDVTNKQLRDDLMTMLIAG 81
Score = 22.8 bits (44), Expect(2) = 0.001
Identities = 9/14 (64%), Positives = 11/14 (78%) Frame = -3
Query: 11 DPSILRFLLASREE 24
DPS+LRFL+ R E
Sbjct: 176 DPSLLRFLVDMRGE 135
Volvox EXXR region for 97C
>ABSY52309.x1 CHROMAT_FILE: ABSY52309.x1 PHD_FILE: [top]
Query: 21
PTIQDMKKLKYTTRVMNESLRLYPQPPVLIRRSIDNDIL 59 PT+ D +L+Y R +NES+RLYP PPVL+RR++ D L Sbjct: 168 PTLADYGQLRYVMRCVNESMRLYPHPPVLLRRALVEDEL 284 (0) VESVMGSRTAPTLAD YGQLRYVMRCVNESMRLYPHPPVLLRRALVEDELPGGYK (0) GTGGAGTCCGTGATGGGCAGCCGTACCGCCCCCACCCTGGCGG ACTACGGCCAGCTGCGGTACGTGATGCGCTGTGTGAACGAGTCCATGCGGCTCTACCCGC ACCCGCCCGTGCTGCTGAGGAGGGCGCTGGTGGAGGACGAGCTGCCGGGGGGCTACAAG This volvox DNA matches trace files for Chlamydomonas 90% 336308963, 335368868, 335328342 (0) VDAVLGSRLSPTMADYGQLRYVMRCVNESMRLYPHPPVLL RRALVEDELPGGFK (0) This fragment matches scaffold 64 from 390464 to 390625 Missassembled 97C3 seq N-term region >gi|93288035|dbj|BW989539.1| BW989539 Chamaecyparis obtusa cambium and surrounding tissues Chamaecyparis obtusa cDNA clone CO02636 5', mRNA sequence. Length=565 Score = 85.9 bits (211), Expect = 7e-16 Identities = 41/78 (52%), Positives = 54/78 (69%), Gaps = 1/78 (1%) Frame = +3 Query 5 DSKGAGTSFTSPGWLTQLNMLWGGKSVSNVPVANAQPADIKELLGGALFKALYKWMQESG 64 D GAG S+ SP WLT + G S +P+ANA+ D+K+LLGGALF L+KWM+ESG Sbjct 234 DKAGAGLSWVSPDWLTSFMKMRTGPDESGIPMANAKLDDVKDLLGGALFLPLFKWMKESG 413 Query 65 PIYLLPTGPVSSFLVVSD 82 P+Y L GP +F+V+SD Sbjct 414 PVYRLAAGP-RNFVVISD 464 ISPSLPSITSNVAVSLPKQSTRKKKTRLLRIQCRVDEKSTSTDKAGAGLSWVSPDWLTSF MKMRTGPDESGIPMANAKLDDVKDLLGGALFLPLFKWMKESGPVYRLAAGPRNFVVISDP EAAKHVLRNYGKYGKGLVSEVSQFLFGSGFAIAEGELWMVRRKAVLPSIHRKYLSVMVDR VFCKCAERLVEKLNRDTEMAVEVNME volvox >ABSY209455.b1 CHROMAT_FILE: ABSY209455.b1 PHD_FILE: [top] ABSY209455.b1.phd.1 CHEM: term DYE: big TIME: Fri Nov 28 23:45:44 2003 Length = 1108 Query: 25 GKNIDSKGAGTSFTSPGWLTQLNMLWGGKSVS 56 GK+ID+ GAG SFTSPGWLTQLNMLWGGK VS Sbjct: 236 GKSIDAAGAGASFTSPGWLTQLNMLWGGKGVS 331 Volvox >ABSY179960.b1 CHROMAT_FILE: ABSY179960.b1 PHD_FILE: [top] Query: 61 EAVELIRKTTNDLIKQCKEMVDEEEMRAA 89 +AVELIR+TTNDLI++CKEMVDEEE AA Sbjct: 443 KAVELIRQTTNDLIRKCKEMVDEEEREAA (1) 357 agt >CX541939.1| s13dNF0BH03GS032_467186 Germinating Seed Medicago truncatula Query 1 LDIIGKSVFNYDFNSLTSDSPVIQAVYTALKETEQRATDLLPLWKVPGIGWLIPRQRKAL 60 
LD+IG SVFNY+F++L SDSPVI+AVYTALKE E R+TDLLP WK+ + +IPRQ KA Sbjct 120 LDVIGLSVFNYNFDALNSDSPVIEAVYTALKEAEARSTDLLPYWKIDFLCKIIPRQIKAE 299 Query 61 EAVELIRKTTNDLIKQCKEMVDEEEMR 87 AV +IRKT DLI+QCKE+V+ E R Sbjct 300 NAVTVIRKTVEDLIEQCKEIVESEGER 380 SIMVDRVFCKCAERLVEKLQADAVNGTAVNMEDKFSQLTLDVIGLSVFNYNFDALNSDSP VIEAVYTALKEAEARSTDLLPYWKIDFLCKIIPRQIKAENAVTVIRKTVEDLIEQCKEIV ESEGERIDADEYVNDADPSILRFLLASREEVSSVQ >CYP710B1 C_1540014 10 EXONS 43% to 710A1 exon 1 predicted by genscan. EST SUPPORT BI719962.1 There are two possible start codons 15aa apart. 20577 MNATGLLNDGLASLG MSGFGDNLASGPALVAAGGALALGYALWEQMKFRWYRSDKNGNMLP (1) 20356 20000 GPASVTPIIGGIVEMVKDPYGFWERQRLYSFP 19905 19904 GMSWNSIVGIFTVFVTDPALSRYVFSHNSSDSLLLALHPN (1) 19785 19644 AEWILGKTNIAFMSGPEHKALRKSFLALFTRKALGLYVLKQDDVIRKHFNEWMQ (0) 19498 19355 TAGPREIRPFIRDLNAYTSQEVFVGPYLDDPT (0) 19269 18917 EREKFSDAYRAMTDGFLAFPLLLPGTGVWKGRQGRQFIVK (0) 18802 18583 TLTRAAARSKVRMAAGQEPECLLDFWTKQ (0) 18497 18215 ILSDIKDAADAGQEAPFYADDKKIAETVMDFLFASQDASTASLVWTITLMAEHPEVLAR (0) 18012 17722 VRDEQYRLRPNPEEKVTGDMLNEMHYTRQVVKEILRFRPAAPMVPMRAKAPFKLTETYTAPKGALIVPSLVAACKQ 17456 (0) 17279 GYSNPDSFDPDRFSPERAEDIKYASNFLVFGHGPHYCVGKE 17155 (0) 16995 YAMNHLTVFLALLATSLDFPRIRSKVSDDIIYLPTLYPGDSIFDLSWSAKK* 16840 newest data: version 3 checked April 30, 2006 Name: estExt_gwp_1H.C_660048 Protein ID: 132687 Location: Chlre3/scaffold_66:390953-394690 394690 MNATGLLNDGLASLG 394645 MSGFGDNLASGPALVAAGGALALGYALWEQMKFRWYRSDKNGNMLP (1) 394505 394113 GPASVTPIIGGIVEMVKDPYGFWERQRLYSFP GMSWNSIVGIFTVFVTDPALSRYVFSHNSSDSLLLALHPN (1) 393898 393757 AEWILGKTNIAFMSGPEHKALRKSFLALFTRKALGLYVLKQDDVIRKHFNEWMQ (0) 393596 393468 TAGPREIRPFIRDLNAYTSQEVFVGPYLDDPT (0) 393373 393030 EREKFSDAYRAMTDGFLAFPLLLPGTGVWKGRQGRQFIVK (0) 392911 392696 TLTRAAARSKVRMAAGQEPECLLDFWTKQ (0) 392610 392328 ILSDIKDAADAGQEAPFYADDKKIAETVMDFLFASQDASTASLVWTITLMAEHPEVLAR (0) 392152 391835 VRDEQYRLRPNPEEKVTGDMLNEMHYTRQVVKEILRFRPAAPMVPMRAKAPFKLT ETYTAPKGALIVPSLVAACKQ 391608 (0) 391392 
GYSNPDSFDPDRFSPERAEDIKYASNFLVFGHGPHYCVGKE 391270 (0) 391108 YAMNHLTVFLALLATSLDFPRIRSKVSDDIIYLPTLYPGDSIFDLSWSAKK* 390953 >CYP737A1 C_470024 I cannot detect the N-terminal sequence for this gene. (about 100 aa) 13432 (2) SWPAATVAMLGTDSVTFST 13379 (1) 13145 GAYHRSLRRLLGPCFSPQ 13092 (0) C-helix 12878 AVEGYLPSIQAICERYCAEWAAETTAAAAAAAPAATGGDSSAVIEQLPKLQKG (0) ARMLTFEVMSHVVAGFHFSPQQLASLSDAFDVFVRGIFAPVALAIPGS 12322 (1) 12098 NYAKASAARKVMVAALTQQLELLKGGSGGGGNGGGANGGGDGDS (0) DLAINLLFAGHETTATSIVRLML (0) VLRSRPDVVSRLREEQAAAVRQHGAAIS (1) 10590 GSSIRDMPYLDAVVKETWRCHPVVPMVPRRAVRDFTLGGHDVPQ (0) GWGVVLGLVEPMRDLPAWSGLTPDSPLHPSHFNPDR (2) WLSGRSSASGNSSNSASSSAL QQQDGTATADGDDVASAAAAASVGGGGGAAGSGTLSSPM GMLPPQMLTFGGGGRYCLGANLAWAELK (0) VFVAVLLRGYDFTSPLPELEVKLFPALTVAQGFPIE (0) VRAR* newest data: version 3 checked April 30, 2006 Name: Chlre2_kg.scaffold_41000082 Protein ID: 151890 Location: Chlre3/scaffold_41:636238-640632 640648 (2) SWPAATVAMLGTDSVTFST 640592 (1) 640358 GAYHRSLRRLLGPCFSPQ 640305 (0) C-helix 640091 AVEGYLPSIQAICERYCAEWAAETTAAAAAAAPAATGGDSSAVIEQLPKLQKG (0) 639933 639681 ARMLTFEVMSHVVAGFHFSPQQLASLSDAFDVFVRGIFAPVALAIPGS (1) 639538 639314 NYAKASAARKVMVAALTQQLELLKGGSGGGGNGGGANGGGDGDS (0) 639183 638938 DLAINLLFAGHETTATSIVRLML (0) 638870 638116 VLRSRPDVVSRLREEQAAAVRQHGAAIS (1) 638033 637803 GSSIRDMPYLDAVVKETWRCHPVVPMVPRRAVRDFTLGGHDVPQ (0) 637672 637263 GWGVVLGLVEPMRDLPAWSGLTPDSPLHPSHFNPDR (2) 637156 636987 WLSGRSSASGNGSSNSASSSAL QQQDGTATADGDDVASAAAAASVGGGGGAAGSGTLSSPM GMLPPQMLTFGGGGRYCLGANLAWAELK (0) 636721 636345 VFVAVLLRGYDFTSPLPELEVKLFPALTVAQGFPIE (0) 636238 635814 VRAR* 635800 >CYP738A1 C_570052 a member of the CYP85 clan There is a problem between exons 3 and 4. In almost all members of the CYP85 clan (CYP85, CYP707, CYP90 etc.) There are 28 amino acids between TVM and LVG in this gene there is no way to accomplish this spacing. I suspect an error. 
The yellow sequence can be inserted if a T to A change occurs at 78905 creating an AG boundary, but the sequence is still 5 aa short. Need an EST 78090 MRSSSRGAKIGRAYPTAHHIDGRASGGRPLHFGLHPCHRPCLRAKAAQSGLAE LPLPEGSLGLPVVGETLELITN (1) 78317 78475 GDTFGTSRRERYGDVYKTNILGAPTVM 78555 (0) 78907 VAAPMARRYACICFRFSCQVTST 78976 LVGPDSLNLLTGPRHGAVKRALSDAFADRALRRHVPAIAELVQ 79104 (0) AVFDRVVLGGAGSRDRAAQLQAVMSALQAGFNTPPVQLPFT (1) 79935 AYGKAVAARQEFGQLVSQSIQRSRQHTAASAT 80030 VSVSPSSAPAFDCAMSDVVAAAAAAAATGTALPDSLLVDNAAAAFFGNAST GPSLAKALQHLATNAAGPNGGATGGVMAALRQEQ (0) DIVSRHGPAITAEALDEMSYGTAVARELLRITPAVPAVFRLALVDFELQGRRIPK 80709 (0) 81002 GWRVWCHVGDSVTRYNKDQFQPERWLGSSG 81091 (1) MAAGGCPMHAGGGGAARGA 81230 QPEYSLPFGSGVRTCLGRNLVMTELLVVLAVLARGYEWEAVNPAEQWGVVPSPAPKEGLRVRLHRRL* 81433 newest data: version 3 checked April 30, 2006 Name: fgenesh2_pg.C_scaffold_6000379 Protein ID: 167934 Location: Chlre3/scaffold_6:2860971-2865055 2864314 MRSSSRGAKIGRAYPTAHHIDGRASGGRPLHFGLHPCHRPCLRAKAAQSGLAE LPLPEGSLGLPVVGETLELITN (1) 2864090 2863929 GDTFGTSRRERYGDVYKTNILGAPTVMVYGE (0) 2863837 2863692 DAVRAVLAAEDRLVASDWPQ (0) 2863631 2853440 VTSTLVGPDSLNLLTGPRHGAVKRALSDAFADRALRRHVPAIAELVQ (0) 2863300 2862803 AVFDRVVLGGAGSRDRAAQLQAVMSALQAGFNTPPVQLPFT (1) 2862681 2862469 AYGKAVAARQEFGQLVSQSIQRSRQHTAASAT VSVSPSSAPAFDCAMSDVVAAAAAAAATGTALPDSLLVDNAAAAFFGNAST GPSLAKALQHLATNAAGPNGGATGGVMAALRQEQQ (0) 2862116 2861859 DIVSRHGPAITAEALDEMSYGTAVARELLRITPAVPAVFRLALVDFELQGRRIPK (0) 2861695 2861402 GWRVWCHVGDSVTRYNKDQFQPERWLGSSG (1) 2861313 2861231 MAAGGCPMHAGGGGAARGAQPEYSLPFGSGVRTCLGRNL VMTELLVVLAVLARGYEWEAVNPAEQWGVVPSPAPKEGLRVRLHRRL* 2860971 >CYP739A1 C_130004 no ESTs inserts in exon 3 and exon 6 INSERTION IN EXON 8 newest data: version 3 checked May 1, 2006 Name:Chlre2_kg.scaffold_8000154 Protein ID:140983 Location:Chlre3/scaffold_8:1065299-1068007 1064933 MAVFGFRELFASMYIPGLSPVLSTITCLAGVLLFLAWQRHSR ATSVPRLGPLLTIPLLGDVAWLAADPTRFVFGR (2) 1065157 1065263 FQRYGPTFILNLMGVPLYVLTQPADLRGPYRDQGAEPDVP FSSFRRLMEVAPGRPYDVQADKAAHGPW (0) 
1065466 1065649 RRMFLSALGPAGLQALLPRAQAVMQAHLAQWEAAGTAAGGRSGGGCIPSLFRQ (0) 1065807 1065921 VRLLSVDLAIEVIAEVPLPPGVERIAFREQ (0) 1066010 1066110 LLCFLDGLFGLPLALPGSSVARALAAKEELVAALGPLVAADRQRMAKR (0) 1066253 1066445 WRAAGSSYAALVDTLTAASAAVGGSAAAEAAAGVQAAEPSAAAAARVTVRDAVISGFMALG (2) 1066627 1066780 RAAAVSVLHAVVAGADTTRFALFNTLALVAMSARVQEEIFAEQER (0) 1066914 1067117 VVAEHGPELSARVLGSAAITPYLDAVVREAMRLLPATPGN MRRLTADLRVGAGRGGPASELVIPK (1) 1067311 1067464 GSMVWRFVPLMHCLDPVLWDGDTSVDVPAHMDWRSNFEG AFRPERWLSEDTKPKYYYTFGSDNHLCVGQNLAYM (0) 1067685 1067865 EVKLLLAMLLRKYRLQLHTPDMLARASQMFPFVIPRRGTDRVLLEPR* 1068008 >CYP739A2 C_130006 EST support BI724239.1 1031069F06.y1 note micro exon of 24 nucleotides (phase 1 boundaries) newest data: version 3 checked May 1, 2006 Name:Chlre2_kg.scaffold_8000156 Protein ID:140985 Location:Chlre3/scaffold_8:1078611-1085528 1078648 MAGLATFEPSAQTPLTWSLALFSSFVAGLYVTFAIYRSFGKGAKKLPPGPLLHVPLLGDG VLMAAGNPVKMFWDR (2) 1078872 1081962 YRRYGSVFRTMMLGSRIWVVTDLDALRGPLRDEGAYLEIPFKAFQRLV (2) 1082105 1082294 SAESFLNRPGVHGPW (0) 1082338 1082426 RKIFSATLAPPRLAAMVPKIAQ (0) 1082491 1082664 LMQSHLSKWEEQGQVTIFRA (0) 1082723 1082865 ARVMGVDLAVDVILDIKLLDGTDRAWVKSQ (0) 1082954 1083213 VEDYLDGLYGLPLNLPGSTLSKALAARARLVEVFLRQPDVAAMQAQF (0) 1083353 1083542 WEAIGKSPQAYAAAVLDQHTSTGDKPAGVAAEEEPSGKAAGAPTPAAP GSRPAVLPPSIMTAQLMGRAMLK (2) 1083754 1084122 PSELADGAMSLLHMLVASADTTRFALFNTWTLLAMSPRVQDKLYEEQKK (0) 1084268 1084520 VMAEYGEELSYAATCHMPYMDATLKECMRLLPASAGGIRKLTADMQVGGYTVPA (1) 1084681 1084836 GEYVWYHA (1) 1084859 1085017 GLMHYIDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWLSEETRPRYM FTFGTGAHLCIGMNLVYL (0) 1085214 1085385 EVKLLLSMVLRKYRLRLHTPDMLLRCERLFPFFLPAKGTDTVLLEPR* 1085528 >CYP739A3 C_130125 PTQ11643.x1 PTQ6387.y1 insertion of 15 aa in the WEEG region (DRWT) end of exon 3 Also insertion in exon 6 169549 MDYMQLLVGLLAILLASILLLRSSGKRLSPRFRVPLLGDTIKMAKRPAEFLFSR (2) 169388 169172 FKEFGPVFTLDLMGSTYWVVADMDAQRRFLYRTEGASAEIPIKSFKMLTELPSPNSDRVNHATW (0) 168982 168818 
RKATMAAVGPHALHTLFPPVLEVIRAHADRWTQQAQQQQGGGGGGGGGGQLQIYRA (0) 168370 QRKLGLDLSVDVVAGVDLPQSVDRGEFKKQ (0) 168342 168037 VEVWLDGLFVLPLALPGTKLARAMAAKKWLLATLMPALSDVHGRFSKQ (0) 167894 WSQVGGDMAAMSELLIQQLDQQEGDDMGASSSSGGGGGGGGGGGPEAAAPAPQGQQQ SLFRLPQAVMLGFFGLK (2) 167270 ATGLRESAIAVLQAVAAAADTTRVTLFTVLALVAMSPRVQEEIFAEQQK (0) 167136 166905 VIAEYGSELSYKVVSDMPYLEAVVKEAMRLLPPAAGGMRVLSEPLTVGDVTLPT (1) 166687 166388 GALLLSYSFLMHCIDPALWDGDTSVDVPAHMDWRNNFEG 166275 166274 AFRPERWLSEETKPKYYYTFGVGKHMCAGIHLVYM (0) 166155 165982 EVKTMVALLVRKHRLKLQTPDMFERATWLPFTTPAPGTDTVLFEPR* 165842 newest data: version 3 checked May 1, 2006 Name:Chlre2_kg.scaffold_8000164 Protein ID:140993 Location:Chlre3/scaffold_8:1105803-1109384 1109510 MDYMQLLVGLLAILLASILLLRSSGKRLSPRFRVPLLGDTIKMAKRPAEFLFSR (2) 1109349 1109134 FKEFGPVFTLDLMGSTYWVVADMDAQRRFLYRTEGASAEI PIKSFKMLTELPSPNSDRVNHATW (0) 1108943 1108779 RKATMAAVGPHALHTLFPPVLEVIRAHADRWTQQAQQQQGGGGGGGGGGQLQIYRA (0) 1108612 1108334 CRKLGLDLSVDVVAGVDLPQSVDRGEFKKQ (0) 1108245 1107998 VEVWLDGLFVLPLALPGTKLARAMAAKKWLLATLMPALSDVHGRFSKQ (0) 1107855 1107661 WSQVGGDMAAMSELLIQQLDQQEGDDMGASSSSGGGGGG GGGGGPEAAAPAPQGQQQSLFRLPQAVMLGFFGLK (2) 1107440 1107243 ATGLRESAIAVLQAVAAAADTTRVTLFTVLALVAMSPRVQEEIFAEQQK (0) 1107097 1106866 VIAEYGSELSYKVVSDMPYLEAVVKEAMRLLPPAAGGMRVLSEPLTVGDVTLPT (1) 1106705 1106352 GALLLSYSFLMHCIDPALWDGDTSVDVPAHMDWRNNFEG AFRPERWLSEETKPKYYYTFGVGKHMCAGIHLVYM (0) 1106131 1105943 EVKTMVALLVRKHRLKLQTPDMFERATWLPFTTPAPGTDTVLFEPR* 1105803 >CYP739A4 C_130009 no ESTs insert in exon 8, 52% to 739A5 MLEPELAVAGLRGLLSDPRIVGTLFAALIAALAVWASGIVGTKLHLPGPYIT (0) WPFLGDAVELGITSDLSRLM (2) 7765 FKKYGRVFRLNLLGHTAFV (0) 7434 VSDEAALRGVLSDDGAIATIPFRAFS (2) 7411 7197 DLMGEYGTQSVKEIHGPW (0) 7181 6868 RKLIMAAVNGRGLSELVPGVAGVMARHVAGWAQAGRVELFQA (0) SHAMGLDLSTDVIANVHFTALDRGWFKQQMRTFTAGMW (1) 5973 GLPVRLPGSDYSAALAAKERLIAALMPEMRDAHAAMLKRWEAAGRSGPALAAALLEE QERQREAAREAEARGQKATPPDLSIKEAMLTAYFIGGWVR (2) 5465 HTALRDAPMTILNAVVAAADTTRFSLFTFWAMVAMSTRVQEEIFGEQQR (0) 5420 4094 
VVAAHGPELTPAALSSMPYLEACFKEAMRLLPTGGGAVRHLTKELKAGSVTLPAGEWVWY 3915 3914 HPHLMHCIDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWLSEETKPKYYFTFGSGVHLCAGVNLVYL (0) 3498 EAKLVMAMLVRRFRLRLSAPDMLARCTRVFPFMQPVPGTDKVELLPREQPLPVPGIDL* newest data: version 3 checked May 1, 2006 (Join two models) Name:fgenesh2_pg.C_scaffold_8000179 Protein ID:165902 Location:Chlre3/scaffold_8:1131576-1134169 Name:fgenesh2_pg.C_scaffold_8000180 Protein ID:165903 Location:Chlre3/scaffold_8:1135368-1136663 1131245 MLEPELAVAGLRGLLSDPRIVGTLFAALIAALAVWASGIVGTKLHLPGPYIT (0) 1131400 1131576 WPFLGDAVELGITSDLSRLM (2) 1131635 1131729 FKKYGRVFRLNLLGHTAFV (0) 1131785 1132054 VSDEAALRGVLSDDGAIATIPFRAFS (2) 1132131 1132291 DLMGEYGTQSVKEIHGPW (0) 1132344 1132623 RKLIMAAVNGRGLSELVPGVAGVMARHVAGWAQAGRVELFQA (0) 1132748 1133096 SHAMGLDLSTDVIANVHFTALDRGWFKQQMRTFTAGMW (1) 1133209 1133518 GLPVRLPGSDYSAALAAKERLIAALMPEMRDAHAAMLKRWEAAGRSGPALAAALLEE QERQREAAREAEARGQKATPPDLSIKEAMLTAYFIGGWVR (2) 1133808 1134023 HTALRDAPMTILNAVVAAADTTRFSLFTFWAMVAMSTRVQEEIFGEQQR (0) 1134169 1135197 VVAAHGPELTPAALSSMPYLEACFKEAMRLLPTGGGAVRHLTKELKAGSVTLPAGEWVWY HPHLMHCIDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWL SEETKPKYYFTFGSGVHLCAGVNLVYL (0) 1135580 1135793 EAKLVMAMLVRRFRLRLSAPDMLARCTRVFPFMQPVPGTDKVELLPREQPLPVPGIDL* 1135969 >CYP739A5 C_130009 C_22500001 EST SUPPORT BI527318 BG852189 BE129324 BI527323 BI527331 BU651784.1 MICRO EXON 13 NUCLEOTIDES newest data: version 3 checked May 1, 2006 (Join two models) Name:fgenesh2_pg.C_scaffold_8000177 Protein ID:165900 Location:Chlre3/scaffold_8:1125087-1127174 Name:estExt_fgenesh2_pg.C_80173 Protein ID:186291 Location:Chlre3/scaffold_8:1128094-1131067 1125087 MGEQGAAAGTPLALAATLLAGTILVFYIYQQLKPSKSRLPGPLF SWPFLGDTIEFATTDPTKFLFGR (2) 1125287 1125418 FKRYGR (2) 1125435 1125617 VFRLSLLGFTAYVTADPEALRPLLADEGGHFTIPVQTFTALMGAYNLQAHKEVHAAW (0) 1125787 1126081 RKVLMAALTGSGMAKLVPGVVAVMGRHVEGWAQAGRVELYEA (0) 1126206 1126538 ARTLGLDLAVDVLSGVKLEERGIQPAWLKSR MADFLGGLYGLPLALPGSPLAKALAAKEELLRVLVPAVEGRQQELLKL (0) 1126774 1127068 
WEDNDRSAAAVATKLASSPETATIADANLLGFTARG (2) 1127175 1128397 CTTPRDAAMTVLHAVMGAADTTRFALFNTWAILAMSPRVQDLIYEEQKK (0) 1128543 1128758 VVAENGPELTYKTAMSMP (2) 1128811 1129253 YLDAAFKEAMRLLPASAGGFRMLTKELRVGDVLLPP (1) 1129360 1129696 GTIIW (2) 1129710 1130071 FHALLLQTLDPVLWDGDTSVDVPVHMDWRNNFEGAFRPERWLSEET KPRSYYIFGQGAHLCAGMVLVTL (0) 1130277 1130498 EVKLLLAMVLRKWRLQLEVPDMLARAELFPYTKPAKGTGGMRLIAREQPVA* 1130653 >CYP739A6 C_130012 C_5270001 33% to 707A2, 85 clan member, 57% to 739A2 ESTs BU647654.1 BI528139 28201 MDLTKIHEDPIGLLLAMIAGALVAFFLLARKEKRPLGPMFTLPILGDTVALALSEQSRFMFSR (2) 28013 27729 YKKYGSVFRLNLLGKHMYILSDLEALRGPYRDEGAIPEVPFPTFKLLMGDFNVAGGGKHIHGPW (0)27538 26890 RKASLAALGPAGLQSMFPPVLRVMQSHLSEWEAAGRVEVFQS (0) 26765 26576 ARRMGLELAVDVVADVELSPAVDRAWFKQQ (0) 26487 26101 AETWLYGMWGLPVPLPGS (2) 26048 25807 ALAKALAARKVLLRVLGQELAADHEDYKSR (0) 25718 25284 WTELGSSGAAMADDLVAKASAAPGAEGAKGLGAPRLSHVIRLGLFGLG (2) 24803 ATEVEHSALAVLHAVMASADTTRFALFNTWALVAQSARVQEKLYEEQQK (0) 24672 24589 VIEEFGPELSYKAASSMP (2) 24536 24153 YMDATIKECMRLLPASAGGPRKLTQDLKVGEVVLPA (1) 24046 23660 GSFVWMYSYLLHCLDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWLSEETKPKYY (0) 23493 23362 FTFGYGNHLCAGINLAYL (0) 23309 23164 EIRTMLALVIRKYRLRLQTPDMLSRARYFPFVEPSPGTDTVLLEAR* 23024 newest data: version 3 checked May 1, 2006 (Join two models) Name:estExt_fgenesh2_pg.C_80177 Protein ID:186292 Location:Chlre3/scaffold_8:1145690-1151969 1145820 MDLTKIHEDPIGLLLAMIAGALVAFFLLARKEKRPLGP MFTLPILGDTVALALSEQSRFMFSR (2) 1146008 1146292 YKKYGSVFRLNLLGKHMYILSDLEALRGPYRDEGAIPEVPFP TFKLLMGDFNVAGGGKHIHGPW (0) 1146483 1146925 RKASLAALGPAGLQSMFPPVLRVMQSHLSEWEAAGRVEVFQS (0) 1147050 1147239 ARRMGLELAVDVVADVELSPAVDRAWFKQQ (0) 1147328 1147714 AETWLYGMWGLPVPLPGS (2) 1147767 1148008 ALAKALAARKVLLRVLGQELAADHEDYKSR (0) 1148097 1148531 WTELGSSGAAMADDLVAKASAAPGAEGAKGLGAPRLSHVIRLGLFGLG (2) 1148674 1148997 ATEVEHSALAVLHAVMASADTTRFALFNTWALVAQSARVQEKLYEEQQK (0) 1149143 1149226 VIEEFGPELSYKAASSMP (2) 1149279 1149662 YMDATIKECMRLLPASAGGPRKLTQDLKVGEVVLPA (1) 1149769 1150155 
GSFVWMYSYLLHCLDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWLSEETKPKYY (0) 1150322 1150453 FTFGYGNHLCAGINLAYL (0) 1150506 1150651 EIRTMLALVIRKYRLRLQTPDMLSRARYFPFVEPSPGTDTVLLEAR* 1150791 >CYP740A1 C_1080041 ONE EXON IN A SEQ GAP C_1080041 3676 MAPLLDAKQLELLGIGMQLAAVLLVLYYLLKWLAGKRGGVPGPAFYLPAIGETLSLFASPTRYMWK (0) 3500 NWLEYGPFFRTHLLGYPLYVVGSPGLLKPVLGDDSAFEFF (0) VPGKTFTMLISDIRHMQVPEQHAVF (0) RRRLGQALNPGALSRHVMAPLRVVLERHLDAWEAAGRVQLAEA (0) CAAVSLDVALEVLTGVPLPGGPETRAEVRRGTGG (0) LFRTLAGLYGVPLPWLPGTAIHSALRAQRRLMALLGPPELDREVAELAGK (0) SRLPTGGTAWHRTPRPGSAACPRGPTADAGSRRSHRHRHHQLLLRHRGAHAHAGGPGRCGPALPHRHAFLR (2) TGTPLSLTKEQIFERALGVVIASDDTSKHLFFFELVAAAMLPGVWAKLEEEQKQ (0) AMRKYGDELSYSILNDMPYLDAVIK (0) 497 ETIRVFPTAVGGFRRALKDVP (0) TIN268971.x1 XXXXXXXXXXXXXXXXXXXXX PKG motif in seq gap (0) AFRPERWLSDETRPRQFAGFGGGQHLCLGMHLAHAE (0) ARMLLALVVRRFHLRLEQPQLLSRVTYFPGPVPRKGADGLVLMPRRLEP* newest data: version 3 checked May 1, 2006 (Join two models) Name:Chlre2_kg.scaffold_68000022 Protein ID:153850 Location:Chlre3/scaffold_68:173935-177729 172336 MAPLLDAKQLELLGIGMQLAAVLLVLYYLLKWLAGKRGGVPGP AFYLPAIGETLSLFASPTRYMWK (0) 172533 172751 NWLEYGPFFRTHLLGYPLYVVGSPGLLKPVLGDDSAFEFF (0) 172870 173104 VPGKTFTMLISDIRHMQVPEQHAVF (0) 173178 173539 RRRLGQALNPGALSRHVMAPLRVVLERHLDAWEAAGRVQLAEA (0) 173667 173950 CAAASLDVALEVLTGVPLPAAPETRAEVRRGTGG (0) 174051 174373 LFRTALAGLYGVPLPWLPGTAIHSALRAQRRLMALLGPELDREVAELAGK (0) 174522 174680 SRLPTGGTAWHETHLAHARTPRPGSAACPRGPTADAGSRRS HRHRHHQLLLRHRGAHAHAGGPGRCGPALPHRHAFLR (2) 174913 174948 TGTPLSLTKEQIFERALGVVIASDDTSKHLFFFELVAAAMLPGVWAKLEEEQKQ (0) 175109 175468 AMRKYGDELSYSILNDMPYLDAVIK (0) 175542 175867 ETIRVFPTAVGGFRRALKDVP (0) 175929 176602 VEGGQLIPAGSIVFYSTHLLNAADPALLPRSLAPE (2) 176706 176860 ALEGPTGLPAHLDYE (0) 176904 177170 CRLEEAFRPERWLSDETRPRQFAGFGGGQHLCLGMHLAHAE (0) 177292 177581 ARMLLALVVRRFHLRLEQPQLLSRVTYFPGPVPRKGADGLVLMPRRLEP* 177730 >CYP741A1 14 exons C_980053, C_980058 (N-term part) 72A9 LIKE exons 3,4,5,13,14 not well supported MDGFWKTLGLGALLSPVLYALYLASLIVIPYLK 
SLPLRRKLRHLPGPPVTGFFLLGNVPDLVRTP (1) 8225 8408 VHQCMARWAEQYGKIFKLELPTMT (0) 8479 Missing approx. 80 aa between exon 2 and VMTG. This is a very poorly conserved region, so it is very hard to identify the missing piece(s) without cDNA. 10741 VMTGLAAAGPSAALDLDRVAQRLTIDVIGRFAFDRDFGATADIAKTNEALQ (0) 10893 11059 VVGELMTALQRMLNPLNRWFWWRK (0) 11130 11410 EARGLWASRRRYDALVRRALEDLRSSPPAQHTLLHHLMSLTDPDT (1) 11544 11782 GKPLSARRLRSETALFWIAGFETTAHAIGWTLMFIAGSPE (0) 11901 13254 VESRVAAELEGAGLLAVPGRPEPRQLAWGDLGGLKYLNA (1) 13370 13544 VIHESMRLMPPTSGGTVR (2) 13597 13750 VVPRDTQLAGHVLPKGTMLW (0) 13809 14146 IPFYAMQRSERVWGPDAAQFRPERWLAAAAGAGGPG (0) 14253 14541 ARGFLPFSEGPRNCVGQSLALLELRTALALLCGSFR (2) 14648 14920 FRLADDMGGVEG (1) 14955 15160 AVSEARQHITLKPGDRGLLMHAIPRVPA* 15246 newest data: version 3 checked May 1, 2006 Note: the version 2 seq is better than version 3 at this gene. Name: fgenesh2_pg.C_scaffold_71000048 Protein ID: 179637 Location: Chlre3/scaffold_71:380138-384009 Name: fgenesh2_pg.C_scaffold_846000001 Protein ID: 181363 Location: Chlre3/scaffold_846:2079-5042 34% identical to CYP767A1, 29% to Cyp3a11 Drosophila 4 clan member 380138 MDGFWKTLGLGALLSPVLYALYLASLIVIPYLK SLPLRRKLRHLPGPPVTGFFLLGNVPDLVRTP (1) 380332 380494 VHQCMARWAEQYGKIFKLELPTMT (0) 380589 381479 AVVLTDPEAVSQVLKVDRFEKLTTSYQNMEK (0) 381582 382152 LTAEQQPNILTEPLSAYYKAVRRAVTPAFSTANLR (2) 382211 328813 RFFPLLLDITQQ 382849 VMTGLAAAGPSAALDLDRVAQRLTIDVIGRFAFDRDFGATADIAKTNEALQ (0) 383001 383167 VVGELMTALQRMLNPLNRWFWWRK (0) 383238 383518 EARGLWASRRRYDALVRRALEDLRSSPPAQHTLLHHLMSLTDPDT (1) 383652 383890 GKPLSARRLRSETALFWIAGFETTAHAIGWTLMFIAGSPE (0) 383878 VESRVAAELEGAGLLAVPGRPEPRQLAWGDLGGLKYLNA (1) (in a seq gap) 5043 VIHESMRLMPPTSGGTVR (2) 4990 4837 VVPRDTQLAGHVLPKGTMLW (0) 4778 IPFYAMQRSERVWGPDAAQFRPERWLAAAAGAGGPG (0) (in a seq gap) 4242 ARGFLPFSEGPRNCVGQSLALLELRTALALLCGSFR (2) 4135 3863 FRLADDMGGVEG (1) 3828 AVSEARQHITLKPGDRGLLMHAIPRVPA* (in a seq gap) >CYP742A1 C_60077 29% to 741A1 YELLOW COULD BE REMOVED AS AN INTRON newest
data: version 3 checked May 1, 2006 Name: Chlre2_kg.scaffold_37000075 Protein ID: 151489 Location: Chlre3/scaffold_37:480605-486413 Note: the model Chlre2_kg.scaffold_37000075 is short at the N-term. It is missing the first 63 aa. It is also wrong at the end of the second exon, after WRA. 486602 MHTAPRRIHAARCRPLHASTGASTPGPAGAPDLPPLQRA PGPPGLPWLGQLPAYLATKFFPKKMLEWSEQYNGVYAMEIVGRKYLVVT (1) 486339 486182 EPSLIAGIVGRGSAGLPKSTGYAMWDSAIS (2) 486093 485810 PHAGVQGLFTVAENTTTWRAVRRAYGPAIGPGSMS (2) 485706 485124 SGTSTSTSSSSTASINSTTGLTSHEMNHLAKCLTLDMLGLSAFG IDFRCLDDPAAAQLPSLIES (0) 484933 484532 AMHECGERARSVGRRLLPWLYEEEARAGAADMAAFHALVE (0) 484413 484091 DVWRQIRARGAPTEDDNSFGAQLLRLADPSLAP (1) 483993 483712 GGAALSDEQICAEIATVIIAGYETTAN (2) 483632 483379 TLTWMLYGLHAHKDASEQLVAELRGA (1) 483302 483023 GLVPDTSSSSSPSSVDPTTASFASLAGAHEALGGLPVLDAYVRECLRLYSTAP NGLIKEVPKNGPPARV (1) 482817 482606 GPFAADPGVVVWIPFWSLHLSNLNWEQPHDFQL (0) 482508 482279 SRWLGKDPRTAGSLTASRCP (0) VSGTLNALRAATSSSSSSSSSSSSSSSSSSSSSSGSDSDGEGGSSSGGRGSK (0) AIRFMPFGDGSRNCVGQHLGMLQLK (0) 481989 481146 LSLAYLAARFDLVLDEARMGGSAAAALERQRVNLTLEVDGGM (2) 481021 480681 YLLGASVHSHARVYWYQLVSCEPKC* 480604 >CYP743A1 C_180013 16 exons MLRALSCLALLAAGAARLAAAAGATDSA (0) 14775 VSRALAVLALLLALHVLADPLQRWRLRHIP (1) 14852 15120 GPPALPLLGSVPAMMRAGGPFFFRQCFAKYGPVFK (0?)
15414 VAMGRKWVVVVADAELMRQ (0) AGQRLRSHVIIEPNLNRGHLRRLDAEGLFQAH (2) 16227 GEFWRLLRGAWQPAFSSAALSGYLPLMSACGLRLAQQLQA (0) GGGARPAAGYVDVWRALGGMTLQVVGSTAYG (2) 16969 RLAVACGDVFRFGSALHGSS (2) 17034 17266 YQRIGLLLPELVPALVPLAHSLPDPPFKRLQR (0) 17364 17660 ARSTLLAACMELIRSWRQQHHATT 17722 (large insert here) TRTAGGTTATGVAAAAEAPAAMCGAAVPAAAAAVDGAAAPAGPEEADAAARGGGV GGGGGDGSGVGGSGVAAGSFLDLMLAARDKANGAALTDRMVAAQ (0) VQTFLLAGYETTANALAFAIYCVATHPE (1) 181099 VESRLLAEVDAVLGRDR (2) 181146 18987 PPTESDLPRLPYTEAVLNEAMRLFPPAHATTRIVEAGAPLQ (0) 19333 LGGVSLPPRTPLILAIYSAHHDPAVWPRPEDFIPERFLP (0) 19479 19665 ASPLHSEVAARVPGAHAPFGYGSRMCIGWKFAMQ (0) 19715 19932 EAKLVLALLYQRLLFRLQPGQVPLPTATALTLAPRDGLWVRPVLRRAARAE* 20069 newest data: version 3 checked May 1, 2006 Name: e_gwW.1.412.1 Protein ID: 116541 Location: Chlre3/scaffold_1:5612270-5613769 5617553 MLRALSCLALLAAGAARLAAAAGATDSA (0) 5617470 5617234 VSRALAVLALLLALHVLADPLQRWRLRHIP (1) 5617145 5616874 GPPALPLLGSVPAMMRAGGPFFFRQCFAKYGPVFK (0) 5616770 5616574 VAMGRKWVVVVADAELMRQ (0) 5616518 5616251 AGQRLRSHVIIEPNLNRGHLRRLDAEGLFQAH (2) 5616156 5615776 GEFWRLLRGAWQPAFSSAALSGYLPLMSACGLRLAQQLQA (0) 5615657 5615530 GGGARPAAGYVDVWRALGGMTLQVVGSTAYG (2) 5615438 5615016 RLAVACGDVFRFGSALHGSS (2) 5614957 5614725 YQRIGLLLPELVPALVPLAHSLPDPPFKRLQR (0) 5614630 5614331 ARSTLLAACMELIRSWRQQHHATT (large insert here) 5614263 5614262 TRTAGGTTATGVAAAAEAPAAMCGAAVPAAAAAVDGAAAPAGPEEADAAARGGGV GGGGGDGSGVGGSGVAAGSFLDLMLAARDKANGAALTDRMVAAQ (0) 5613966 5613766 VQTFLLAGYETTANALAFAIYCVATHPE (1) 5613683 VESRLLAEVDAVLGRDR (2) 5613004 PPTESDLPRLPYTEAVLNEAMRLFPPAHATTRIVEAGAPLQ (0) 5612882 5612664 LGGVSLPPRTPLILAIYSAHHDPAVWPRPEDFIPERFLP (0) 5612548 5612380 ASPLHSEVAARVPGAHAPFGYGSRMCIGWKFAMQ (0) 5612279 5612062 EAKLVLALLYQRLLFRLQPGQVPLPTATALTLAPRDGLWVRPVLRRAARAE* 5611907 >CYP743A2 C_420091 33% to CYP711A1 16 exons EST support BM002146 BI728655 BE726345 N-term to C-helix 37486 MQDVISFLLNGLGFAAVGLVVL (0) 37551 37671 QLVLSLDLYKRWKLRHLP (1) 37724 37956 GPPALPLLGNLPQILAKGSPAFFRECRAKYGPVFR (0) 
38060 38400 VAFGRNWMVVVAEPDLLRQ (0) 38456 38720 VGGKLLNHSMFRGLLGGEFAKLDDWGLVSAR (2) 38812 39351 DDFWRKVRAAWQPAFSAPSLSGYFPLMTDCAVRLADKLEGLARRQPG 568697 GQQGAGKEEEAAGKAGKAEAEGGSGGGGGSSTRVDIWRELGAMTLQVVGSTAYG (2) VDFQAMESLPAAGTGEGGADTKPAAAP 39994 APASSSYG RVLVQACRDVFKYSSVVYGSK (2) 40059 40319 YSRVGLLFPEWRPVVAILANAAPDLPFKMLKT (0) 40408 40595 ARTHLRDACMSLIDGWKKQEASG 40654 VQDGKSKQEEQNGDANGHTAASTAGAKGDGAVSGAGAANAIGE insert 567380 AAAAVGTAAGGVGGLSAGSFLGLMLAAR 567309 DKSTGEGLTDLQVAAQ (0) 41049 VQTFILAGYETTANALAFAVYCLATNPE (1) 41141 AEAKLLAEIDAVLGPDR (2) 41578 LPTEADLPRLPYTEAVFNETMRLYPPAHATNRHTDKAPMQ (0) 41700 42000 VGPYTLPKDTTLFMSIFSAHHNTDVWPRVNDFVPERFLP (0) 42119 42294 ESPLYPEVAARVPHAHAPFGFGSRMCIGWKFAVQ (0) 42395 42712 EAKVALAALYQRLTFELEPGQ (0) 42771 43237 VPLQTAVGITLSPRNGVWVRPVARRLTPRQPTTPPVGSAAK* 43362 newest data: version 3 checked May 1, 2006 Name: estExt_fgenesh2_pg.C_160079 Protein ID: 189550 Location: Chlre3/scaffold_16:612970-615574 Name: e_gwW.16.62.1 Protein ID: 116043 Location: Chlre3/scaffold_16:610198-611929 scaffold_16: 609616-615492 615492 MQDVISFLLNGLGFAAVGLVVL (0) 615427 615307 QLVLSLDLYKRWKLRHLP (1) 615254 615022 GPPALPLLGNLPQILAKGSPAFFRECRAKYGPVFR (0) 614918 614578 VAFGRNWMVVVAEPDLLRQ (0) 614522 614258 VGGKLLNHSMFRGLLGGEFAKLDDWGLVSAR (2) 614166 613627 DDFWRKVRAAWQPAFSAPSLSGYFPLMTDCAVRLADKLEGLARRQPG 613487 613489 GQQGAGKEEEAAGKAGKAEAEGGSGGGGGSSTRVDIWRELGAMTLQVVGSTAYG (2) 613328 613083 VDFQAMESLPAAGTGEGGADTKPAAAP APASSSYGRVLVQACRDVFKYSSVVYGSK (2) 612916 612659 YSRVGLLFPEWRPVVAILANAAPDLPFKMLKT (0) 612564 612380 ARTHLRDACMSLIDGWKKQEASG VQDGKSKQEEQNGDANGHTAASTAGAKGDGAVSGAGAANAIGE insert AAAAVGTAAGGVGGLSAGSFLGLMLAAR DKSTGEGLTDLQVAAQ (0) 612051 611926 VQTFILAGYETTANALAFAVYCLATNPE (1) 611843 611651 AEAKLLAEIDAVLGPDR (2) 611601 611397 LPTEADLPRLPYTEAVFNETMRLYPPAHATNRHTDKAPMQ (0) 611278 610981 VGPYTLPKDTTLFMSIFSAHHNTDVWPRVNDFVPERFLP (0) 610865 610684 ESPLYPEVAARVPHAHAPFGFGSRMCIGWKFAVQ (0) 610583 610269 EAKVALAALYQRLTFELEPGQ (0) 610207 609741 
VPLQTAVGITLSPRNGVWVRPVARRLTPRQPTTPPVGSAAK* 609616 >CYP743B1 scaffold 98 unannotated region adjacent to a large gap C_32340001 inserts in large sequence gap of scaffold 98 and completes the P450 gene. first exon is best guess 251670 MVASASWQLDLLGALSGAPSPQM (0) 251602 251481 AAAGLALLLASLLIYLLDPIQRWRLRKVPGER (?) 251386 251227 (1) GPPARPLLGCLPQLRAQPMPLFLQSCAQTYGPVFKAS (1) 251117 251064 SAEVQGIAVIPHHVS 251020 251017 RMQVALGRKWAVVLADAEMQRQVRGTGAERG 250925 2384 GSTWRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAGA 2241 2240 TASGATAGGGSSVDMWRELGGMTLQVVGSTAYG 2142 1933 VDFHSINEEDQAGSGSGSGSAIATAGATAAAKGRGDDGYGKQLAAACGQIFRYTSSAHGSP 1751 1592 YLRVAMLFPELRRLLVPLAHTLPDKRFAILMQ 1497 1323 ARNRLSGAVFQLMDSWKQQHIAAAGSGAAGKGSSGKADACQ 1198 1119 SSNGVGAAATSGRGGMAGVAPGSFLDLMLG 1030 1029 HRQGGGSGSGGKKAEGEEGVEHAPLTDEQVAGQ 931 805 VQLFILAGYETTANALAFAVYCIATHPE 722 526 VESRLLREVDDVLPGSDQLPGESDLPRLAYTEAVVNEALRLFPPAHLTSRVVPPGETLT 350 266 VGGFNIPAGIPIFLPMYIAHRDPAVWPRADVFLPERFLH (0) 153 newest data: version 3 checked May 2, 2006 Name:e_gwW.71.18.1 Protein ID:122749 Location:Chlre3/scaffold_71:125260-130065 Frameshift at HHVS/RMQ, GC boundary at RKVP? 
125260 MVASASWQLDLLGALSGAPSPQM (0) 125328 125449 AAAGLALLLASLLIYLLDPIQRWRLRKVP (1) 125535 125703 GPPARPLLGCLPQLRAQPMPLFLQSCAQTYGPVFKAS (1) 125813 125869 AEVQGIAVIPHHVS 125910 125913 RMQVALGRKWAVVLADAEMQRQVRGTGAERG (2) 126005 126948 GSTWRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAGA TASGATAGGGSSVDMWRELGGMTLQVVGSTAYG (2) 127190 127399 VDFHSINEEDQAGSGSGSGSAIATAGATAAAKGRGDDGYGKQLAAACGQIFRYTSSAHGSP (2) 127581 127740 YLRVAMLFPELRRLLVPLAHTLPDKRFAILMQ (0) 127835 128012 ARNRLSGAVFQLMDSWKQQHIAAAGSGAAGKGSSGKADA (2) 128128 128216 SNGVGAAATSGRGGMAGVAPGSFLDLMLG HRQGGGSGSGGKKAEGEEGVEHAPLTDEQVAGQ (0) 128401 128527 VQLFILAGYETTANALAFAVYCIATHPE (1) 128610 128806 VESRLLREVDDVLPGSDQLPGESDLPRLAYTEAVVNEALRLFPPAHLTSRVVPPGETLT (0) 128982 129066 VGGFNIPAGIPIFLPMYIAHRDPAVWPRADVFLPERFLH (0) 129182 129643 PRGAAQQHAHAPFGYGSRMCIGYKFAMQ (0) 129726 129919 EAKVALATLYRRLTFTLEPGQQPLQVEASLTMAPRGGLRVTPVPRRKL* 130065 >CYP743B2 C_8600001 also inserts in same gap of scaffold 98 141391 XXRVAMLFPELRSLLLTLAHTLPDEKFTILTK (0) 141480 ARTRLCNTVFQLIDSWKEQHRAEAEIDAAASSGKPDVGAGRHSSN GVGAAATSGRGGLSGVAPGSFLDLMLGQRQGGERGSGGKKAEGEEGVEHAPLTDEQVAGQ (0) VQLFILAGYETTANALAFAVYCIATHPE (1) (seq gap) 3945 (0) SSPLYESLQPRGAAQQHAHAPFGYGSRMCIGYKFAMQ 3838 3648 EAKVVLATLYRRLTFTLEPGQQPLQVEASLTMAPRGGLRVMPVPRRKL* 3511 CYP743B2 scaffold_71:130374-138996 a duplication of 743B3. Note: 1 extra S in exon 1. CYP743B3 has some defects, so CYP743B2 may be the intact gene, while CYP743B3 may be a pseudogene copy.
130374 MSNVFANWPSGSGAPLGGLLRSLGM (0) 130448 130571 VAAGFALLLVSLIIYLLDPIKRWRLRKIP (1) 130657 130846 GPGPRGRPVLGCLPQLRAQPMPLFLQSCAQTYGPVFKAS (1) 130962 131029 AEVQGIAVILHRVSRMQVALGRKWVVVLADAEMQRQVDGAG (2) 131151 (seq gap, missing six exons) 138577 PRGAAQQHAHAPFGYGSRMCIGYKFAMQ (0) 138660 138850 EAKVVLATLYRRLTFTLEPGQQPLQVEASLTMAPRGGLRVMPVPRRKL* 138996 >CYP743B3 C_980035 C_8600002 same sequence 2544 MSNVFANWPSGSGAPLGGLLRLGM (0) 2615 2745 VAAGFALLLVSLIIYLLDPI 242088 KRWRLRKIPG 242059 241862 PGPRGRPVLGCLPQLRAQPMPLFLQ 241788 241786 SCAQTYGPVFKAS 241748 241692 AEVQGIAVILHRVSRMQVALGRKWVVVLADAEMQRQVDGAG 241570 240969 GSTWRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAGA 240826 240825 TASGATAGGGSSVDMWRELGGMTLQVVGSTAYG 240727 240592 VDFHSINEEDQAGSGSGSATATAGATAAAKGRGDDGYGKQLAAACGQIFR 240443 240442 YGSPVHGSP 240416 240284 YLRVAMLFPELRSLLLTLAHTLPDEKFTILTK 240189 240021 ARTRLCNTVFQLIDSWKQQHSAEGATAAGASSGKPDAGAGQSNN 239890 239889 GVGAAATGGRGLSGVAPGSFLDLMLGHRQGGGSGSGGKKAEGEEGVEHAP 239740 239739 LTDEQVAGQ 239714 239592 VQLFILAGYETTANALAFAVYCIATHPE 239509 239320 VESRLLREVDDVLPGSDQLPGESDLPRLAYTEAVVNEALRLFPPAHLTSR 239171 239170 VVPPGETLTVGGYTIPGGTAVYLPMYLAHRDPAVWPRAEEFLPERFLP 239027 238674 PRGAAQQHAHAPFGYGSRMCIGYKFAMQ 238591 238461 EAKVALATLYRRLTFTLEPGQQPLKLVASVTMSPRGGLHVTPVPRRKL* 238315 newest data: version 3 checked May 2, 2006 Name:e_gwW.71.20.1 Protein ID:122730 Location:Chlre3/scaffold_71:139305-143478 2 Frameshifts and one small duplication of AEVQ these defects were not in the ver 2 seq (see above) 139305 MSNVFANWPSGSGAPLGGLLRLGM (0) 139375 139498 VAAGFALLLVSLIIYLLDPIKRWRLRKIP (1) 139584 139787 GPRGRPVLGCLPQLRAQPMPLFLQSCAQTYGPVFKAS (1) 139897 139966 AEVQA 139967 139969 EVQGTAVLLHHVSRMQVALGRKWVVVLADAEMQRQVDGAG (2) 140088 140701 GSTWRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAGATASX 140853 140856 ATAGGGSSVDMWRELGGMTLQVVGSTAYG (2) 140942 141077 VDFHSINEEDQAGSGSGS 141130 141131 ATATAGATAAAKGRGDDGYGKQLAAACGQIFRYGSPVHGSP (2) 141253 141391 YLRVAMLFPELRSLLLTLAHTLPDEKFTILTK (0) 141480 141648 
ARTRLCNTVFQLIDSWKQQHSAEGATAAGASSGKPDAGAG 141767 141768 QSNNGVGAAATGGRGLSGVAPGSFLDLMLGH 141860 141861 RQGGGSGSGGKKAEGEEGVEHAPLTDEQVAGQ (0) 141956 142077 VQLFILAGYETTANALAFAVYCIATHPE (1) 142160 142349 VESRLLREVDDVLPGSDQLPGESDLPRLAYTEAVVNEAL RLFPPAHLTSRVVPPGETLTVGGYTIPGGTAVYLPMYLAHRDPAVWPRAEEFLPERFLP (0)142642 143119 PRGAAQQHAHAPFGYGSRMCIGYKFAMQ (0)143202 143332 EAKVALATLYRRLTFTLEPGQQPLKLVASVTMSPRGGLHVTPVPRRKL* 143478 >CYP743C1 C_1130014 C_9610001 AV627084.1 top part = scaf 961 35102 MTFLQLLPGVPLVLLGVLALPV (0) 35037 34921 VITLVQEVITKRKYRHIP 34868 (1) 34694 GPKPQPISGNLREFLTSPGGLLGCLEGW (0) 34611 VK (seq gap about 146 aa) followed by scaf 113 118674 AVALPCLLPAVRHLAAAAPDPVLALHIQ 118591 (0) 118264 SRQVLRQVSTKLITAWRDSHTAAS ANGSSTNSTSGSSSSTGVAPGSFLGLMLAARDRSRKEGGAAATAKDG 31374 MAPTLTDAQIEAQVQTFLLA 31315 (1) (I-helix) 31010 GFETTANALTFAVYLLACHPE 30948 (0) (87 aa seq gap) 29287 (0) AFRPERFLSPDVPGSAPELAARHPHVHLPFGSGPRMCIGWRFAMQ (0) 29156 28541 EAKTVLSRLVQAVDFTLAPGQAAPLDTVAGLTLAPRNGVWVRLSPR GGGGSGGGGGRGQEVATAAAKGAAVRSAAA* 28308 Name:Chlre2_kg.scaffold_17000165 Protein ID:147793 Location:Chlre3/scaffold_17:1492638-1496177 CYP743C1 scaffold_17:1489349-1496178 1489349 MTFLQLLPGVPLVLLGVLALPV (0) 1489414 1489530 VITLVQEVITKRKYRHIP (1) 1489583 1489757 GPKPQPISGNLREFLTSPGGLLGCLEGW (0) 1489840 1490285 VKQYGDLLTFRLGSRQFVLVADPDAAR (2) 1490365 (small gap in C-helix region) 1491618 (0) PVFTARVFLTQIVFPHTARSLRGYQALMDREAVALAGRLR RQAAAGGGGGGGGGGGGGGGDKAGEIEVMSEMSRVTLAVVGTAAYG (2) 1491875 1492617 CNDFFRTMSPAARSSWSW 1492670 1492671 AVALPCLLPAVRHLAAAAPDPVLALHIQ (0) 1492754 1493081 SRQVLRQVSTKLITAWRDSHTAAS ANGSSTNSTSGSSSSTGVAPGSFLGLMLAARDRSRKEGGAAATAKDG 1493293 1493357 MAPTLTDAQIEAQVQTFLLA (1) 1493416 (I-helix) 1493721 GFETTANALTFAVYLLACHPE (0) 1493783 (EXXR missing in a seq gap) 1494664 (0) IQGHRIPAGSTLWLSIAHLHTRDGVWPEPQ (0) 1494753 1495196 AFRPERFLSPDVPGSAPELAARHPHVHLPFGSGPRMCIGWRFAMQ (0) 1495330 1495948 EAKTVLSRLVQAVDFTLAPGQAAPLDTVAGLTLAPRNGVWVRLSPR GGGGSGGGGGRGQEVATAAAKGAAVRSAAA* 1496178 
>CYP744A1 C_940015 (N-term part), C_940016 MALSSAWALAGLFL (0) AMFVFFGYSLRKRWQLRKIP (1) 23868 GALGWPFLGSIPEFSIYGYEYVLGLSAKLGN (0) 23773 23439 AWLGVEPLIIICDPALIR 23386 (2) 23162 KYAYKCVSKPPSMSEYGHVLTGFNYDVDQASAFVAS (2) 23058 22787 GEVWRRGRRVFEASVINGVR (2) 22557 LAAHLPAINRCANRFVAQL AQRVAAPAAAHSGKTLGEEGIDMFS 22396 IVGGYTMAVTGEVAYG 22349 (2) HVPAVTRGVRPFWQVEHSTLYLPLG (0) 21478 VMFPWARPLVRWLATHFPDRAQREHMAARTQI 21446 IANISRLLMERWATSKKAAAAAAGTG TGTGTAITADSKAGTASAPPAEAARADGAAAAGKGAEEA IKEVGGGISSSSFMAAMMEGRRGAPQEERLSDVE (0) VIAQSFTFV 20858 MAGFETTALTLSLVTFMLATHPE (0) 20791 AAARLTAEVDGLGPGELTHEVLAE (0) 20358 KLPYTEAVIKETLRLHPPIPYFIREAREDLDLGNGMVAPK (2) 20233 19945 GSYLTMYMHAVHLNPDVWPHPERFLPQRFLPEGSAAFGPADPGAWAPFGIGARMCVGHKLAMM (0) 19775 19557 MAKTLLVRMYQRFRIELHPRQPLPLKMKTGLSRVPVDGVWVTLTER* Name:Chlre2_kg.scaffold_23000133 Protein ID:148983 Location:Chlre3/scaffold_23:958944-961028 N-term part, missing two exons Name:e_gwW.23.96.1 Protein ID:118452 Location:Chlre3/scaffold_23:962118-963240 Only covers I-helix to heme scaffold_23 958703 MALSSAWALAGLFL (0) 958744 958941 AMFVFFGYSLRKRWQLRKIP (1) 959000 959120 GALGWPFLGSIPEFSIYGYEYVLGLSAKLGN (0) 959212 959546 AWLGVEPLIIICDPALIR (2) 959599 959823 KYAYKCVSKPPSMSEYGHVLTGFNYDVDQASAFVAS (2) 959930 960201 GEVWRRGRRVFEASVINGVR (2) 960260 960457 LAAHLPAINRCANRFVAQLAQRVAAPAAAHSGKTL GEEGIDMFSIVGGYTMAVTGEVAYG (2) 960636 961230 HVPAVTRGVRPFWQVEHSTLYLPLG (0) 961304 961510 VMFPWARPLVRWLATHFPDRAQREHMAARTQ 961602 961603 IIANISRLLMERWATSKKAAAAAAGTG TGTGTAITADSKAGTASAPPAEAARADGAAAAGKGAEEA IKEVGGGISSSSFMAAMMEGRRGAPQEERLSDVE (0) 961899 962100 VIAQSFTFVMAGFETTALTLSLVTFMLATHPE (0) 962195 962367 AAARLTAEVDGLGPGELTHEVLAE (0) 962438 962633 KLPYTEAVIKETLRLHPPIPYFIREAREDLDLGNGMVAPK (2) 962752 963039 GSYLTMYMHAVHLNPDVWPHPERFLPQRFLPEGSAAFGPADPGAWAPFGIGARMCVGHKLAMM (0) 963228 19557 MAKTLLVRMYQRFRIELHPRQPLPLKMKTGLSRVPVDGVWVTLTER* last exon is in a seq gap use ver 2 seq here >CYP744A2 C_940017 PTQ5694.x1 K-helix to heme = PTQ11662.x1 PTQ243.x1 PTQ52.x1 
PTQ9722.x1 MWNVAELGLALVPVV (0) 18913 AFVWLAYNLPERWRLRRIP (1) 18854 18740 GPVGLPFLGNILSFSTYGHDYFAMMEKYGR (0) 18648 18338 IWFGVNPWIVVSDPALLR (2) 18285 18027 KLAYKCVGKPASMSEYGHVLTGENYEIEQANAFVAS (2) 17775 GEVWRRGRRVFEASVIHPTR (2) 17722 17477 LAAHLPAINRCANRF 17427 VTRLAQRVAAPAAEPGAGGKDDGHSGGTGNDGGGAGFDFFA 17301 EVGSYTMAVVGEVAYG 17256 (2) WRLAERESRQGKPAMMSWCPTMCRLPCRLPLPHVHTQVENATKYLPLR (0) 16350 VMFPWARPLVRWLATHFPDRAQREHMAARTQI 16318 IANISRLLMERWAASKKAAAAAAGTG GGAGNAAGAGGDRAGG FKEVGGGISSSSFMAAMMEGRRGAPQEERLSDVE (0) 15798 VIAQSFLFVLAGFETSADTLALTCYLLATHPE (0) 15691 AAARLVAEVDAVGGRELTAELLAE (0) 15294 GLPYTEAVIKEAMRLYPPVPYLLRQAREDLDLGKGMVAPK (2) 15175 HSYVVLYVHSMHLNPDVWPHPERFLPQRFLPEGSAAFGPADPGAWAPFGIGARMCVGHKLAMM (0) MAKTLLVRMYQRYRVALHPSQPLPLRMKAGLSRVPLDGIWLTLTEREAAAAAVAVP* Name:e_gwW.23.89.1 Protein ID:118526 Location: Chlre3/scaffold_23:969108-971162 I-helix to heme only, seq gap above this Use earlier version for the top half scaffold_23 MWNVAELGLALVPVV (0) 18913 AFVWLAYNLPERWRLRRIP (1) 18854 18740 GPVGLPFLGNILSFSTYGHDYFAMMEKYGR (0) 18648 18338 IWFGVNPWIVVSDPALLR (2) 18285 18027 KLAYKCVGKPASMSEYGHVLTGENYEIEQANAFVAS (2) 17775 GEVWRRGRRVFEASVIHPTR (2) 17722 17477 LAAHLPAINRCANRF 17427 VTRLAQRVAAPAAEPGAGGKDDGHSGGTGNDGGGAGFDFFA 17301 EVGSYTMAVVGEVAYG 17256 (2) WRLAERESRQGKPAMMSWCPTMCRLPCRLPLPHVHTQVENATKYLPLR (0) 969108 VMFPWARPLVRWLATHFPDRAQREHMAARTQI 969200 969201 IANISRLLMERWAASKKAAAAAAGTGGGAGNAAGAGGDRAGG FKEVGGGISSSSFMAAMMEGRRGAPQEERLSDVE (0) 969428 969669 VIAQSFLFVLAGFETSADTLALTCYLLATHPE (0) 969764 969927 AAARLVAEVDAVGGRELTAELLAE (0) 969998 970161 GLPYTEAVIKEAMRLYPPVPYLLRQAREDLDLGKGMVAPK (2) 970280 970565 HSYVVLYVHSMHLNPDVWPHPERFLPQRFLPEGSAAFGPADPGAWAPFG 970711 970712 IGARMCVGHKLAMM (0) 970753 970992 MAKTLLVRMYQRYRVALHPSQPLPLRMKAGLSRVPLDGIWLTLTEREAAAAAVAVP* 971162 >CYP744A3 C_940044 MPGLGALLAFIQTPLGA (0) ITWLGWYPLRRYAFRKFP (1) 3380 GPFGLPFLGNLPQ (0) 3430 3597 IAAMDTTAFLTSSAVKYGPVCK (0) 3662 3831 VWFGTRPWVLINDPELIR 3884 (2) 4267 RHSFRWPARPANFASYFHVMTGENRAIDRAGVVLAE (2) 
4371 TIN460677.b1 GEVWRRGRRAFEGSIIHPAR (2) WEQ17438.g11, TIN285957.x1 5063 LAAHVPAMLRCLGRFTARLDRHAGSAQPLDVAAALGDLMLAAMGQIAYG 5218 (2) VDFGCEEGADSSASNSSGVAGELVAALRDLFETMRMENATAYLPLQ (0) 5902 LMFPALEPLWLWAAHHMPDAKQTKAMRARSK 5994 (0) VAEVSRLLMEQWQANKAAAVAAAASGGAGGADGGDRAGG FKEVGGGISSSSFMAAMMEGRRGAVEDRLSDIE (0) 6987 VIGQGFTFLAAGYETTSAATSLALFLLATHPE (0) 7049 7448 AAARLAAEVDAVLGGRELTAELLAE (0) 8071 KLPYTEAVIKETLRLHPGITFLVREATEDVDLGAGRVVPR 8190 (2) 8546 GSTLCMATHAVMHDPDIWPEPEAFRPERFLPEGSAGGGGSSSLWPTAGGNNPHVWA 8714 PFGMGTRMCVGHKLAMM (0) 9146 ASKATLVSLCQRFSFALHPKQPLPLKLKTGLTYGPADGVWMTVTRRG* Name:e_gwW.23.108.1 Protein ID:118465 Location:Chlre3/scaffold_23:976166-982342 I-helix to heme only Exons 6 and 7 are in a seq gap and are taken from trace archive files 982342 MPGLGALLAFIQTPLGA (0) 982292 982144 ITWLGWYPLRRYAFRKFP (1) 982091 981891 GPFGLPFLGNLPQ (0) 981853 981686 IAAMDTTAFLTSSAVKYGPVCK (0) 981621 981452 VWFGTRPWVLINDPELIR 3884 (2) 981399 981016 RHSFRWPARPANFASYFHVM 980957 TGENRAIDRAGVVLAE (2) TIN460677.b1 GEVWRRGRRAFEGSIIHPAR (2) WEQ17438.g11, TIN285957.x1 980414 LAAHVPAMLRCLGRFTARLDRHAGSAQPLDVAAALGDLMLAAMGQIAYG (2) 980268 979957 VDFGCEEGADSSASNSSGVAGELVAALRDLFETMRMENATAYLPLQ (0) 979820 979584 LMFPALEPLWLWAAHHMPDAKQTKAMRARSK (0) 979492 979090 VAEVSRLLMEQWQANKAAAVAAAASGGAGGADGGDRAGG 978974 978973 FKEVGGGISSSSFMAAMMEGRRGAVEDRLSDIE (0) 978875 978529 VIGQGFTFLAAGYETTSAATSLALFLLATHPE (0) 978434 978038 AAARLAAEVDAVLGGRELTAELLAE (0) 977961 977384 KLPYTEAVIKETLRLHPGITFLVREATEDVDLGAGRVVPR (2) 977265 976909 GSTLCMATHAVMHDPDIWPEPEAFRPERFLPEGSAGGGGSSSLWPTAGGNNPHVWA PFGMGTRMCVGHKLAMM (0) 976691 976309 ASKATLVSLCQRFSFALHPKQPLPLKLKTGLTYGPADGVWMTVTRRG* 976166 >CYP744A4 between C_239009 and C_239004 not annotated AV641971 35% to 703A2 N-term to C-helix 51492 MYAALALVLSPVLL (0) 51451 51367 ALLWAIINPVERWKTRKIPG (2) 51308 51224 PPGLPLLGHLLNFATGDATDFTVEAVKKYGNVVA (0)51123 50867 IWFGNRAWITIADPALIR (2)50814 50325 KLGFKFLNRPARMTDFGH (0) 50272 49795 VLVGHNAEVDNAGAFVAR (2)49706 49574 
GEVWRRGRRAFEASIIHPAS (2) 49515 58799 LAAHLPAINRCANRFVARLARRAAAAAAAAADASLGSAGGGAAQGEQQGKAALAMKQQGG GGGGGVEILTEAGNYTMAAVGEVAYG (2) 58542 (SEQ GAP) 47764 (0) LMFPALRPLWRWMAEHLPDAAQTENMRARSK (0) 47672 VAEVSRLLMEQWQANKAAAAAAAASGGDGGADGGDRAGGF 56888 KEVGGGISSSSFMAAMMEGRRGAVEDRLSDIE (0) 56811 46972 VIGQGFTFLVAGYETSSNTTTMASYLLATHPAAQQRMADEIDAVLG 46832 46831 PWRAGAGAGEGACAGGELTPELLAK (0) 46757 46326 LPYTEAVLQETLRLYPAAPYLLREAREEVDLGGGRVVPK (2) 46288 46008 DSVLVLHVHSMQRDPDVWPQPEAFLPQRYLPEGQAALGPADPNGWAPFGVGARMCVGHKLAMM (0) 45820 45561 VTKVALVRMYQRFRVSLHPRQPLPLKMKTGLVRVPADGVWLTLTER* 45421 Note: scaffold 121 has part of the last exon as a duplication 61920 VTKVALVRMYQRFRVSLHPRQPLPLKMKTGLL 61825 Name:fgenesh1_est.C_scaffold_23000031 Protein ID:95157 Location:Chlre3/scaffold_23:1143890-1147747 CYP744A4 N-term Name:e_gwH.23.61.1 Protein ID:103666 Location:Chlre3/scaffold_23:1141463-1143101 CYP744A4 I-helix to end 1147747 MYAALALVLSPVLL (0) 1147706 1147622 ALLWAIINPVERWKTRKIPG (2) 1147563 1147479 PPGLPLLGHLLNFATGDATDFTVEAVKKYGNVVA (0) 1147378 1147122 IWFGNRAWITIADPALIR (2) 1147069 1146580 KLGFKFLNRPARMTDFGH (0) 1146527 1146050 VLVGHNAEVDNAGAFVAR (2) 1145997 1145829 GEVWRRGRRAFEASIIHPAS (2) 1145770 1145397 LAAHLPAINRCANRFVARLARRAAAAAAAAADASLGSAGGGAAQ GEQQGKAALAMKQQGGGGGGGVEILTEAGNYTMAAVGEVAYG (2) 1145140 (missing exon 9 in a SEQ GAP) 1143893 (0) LMFPALRPLWRWMAEHLPDAAQTENMRARSK (0) 1143801 1143522 VAEVSRLLMEQWQANKAAAAAAAASGGDGGADGGDRAGGF KEVGGGISSSSFMAAMMEGRRGAVEDRLSDIE (0) 1143307 1143098 VIGQGFTFLVAGYETSSNTTTMASYLLATHPAAQQRMADEIDAVLG PWRAGAGAGEGACAGGELTPELLAK (0) 1142886 1142455 LPYTEAVLQETLRLYPAAPYLLREAREEVDLGGGRVVPK (2) 1142339 1142048 DSVLVLHVHSMQRDPDVWPQPEAFLPQRYLPEGQAALG PADPNGWAPFGVGARMCVGHKLAMM (0) 1141860 1141603 VTKVALVRMYQRFRVSLHPRQPLPLKMKTGLVRVPADGVWLTLTER* 1141463 >CYP744A5P pseudogene C_1730009 C-helix 81% to 744A3 PROBABLE pseudogene WITH PART OF EXON 3, EXONS 4,5 AND 6 13946 QIAAMDTTAFLTSSAVKYGPVCK 13875 13607 AWFSTQPWVINDPKLVR 13551 
RHSFRWRARPSLFASYFQVMTGENRAIDRAGVGAGG 12773 GEAWRRTRRVLEGSIIHPAR 12705 Name:Chlre2_kg.scaffold_21000002 Protein ID:148389 Location:Chlre3/scaffold_21:6347-7649 Frameshift at PSL/FASY, bad boundary 6350 QIAAMDTTAFLTSSAVKYGPVCK (0) 6418 6683 AWFSTQPWVINDPKLVR (2) 6733 7163 RRHSFRWRARPSL 7201 7200 FASYFQVMTGENRAIDRAGVGAGG (0) 7271 7533 (2) GEAWRRTRRVLEGSIIHPAR (2) 7592 >CYP744B1 C_8650001 C_940020 FIRST TWO EXONS FOUND BY WALKING. FIRST EXON IS A BEST GUESS MELVSGLALAGVALFIL (0) TIN33450.x1 GFIWAGFNPIERYLSPLRRFP (1) TIN292840.x1 WALKED TO THIS READ GPAPLPFLGNLVSVATRDLTAYLADCRQAYGG (0) 220 IWLGNQPWVCVADPDLIR (BAD BOUNDARY, SHOULD BE phase 2) 568 RVAYRVLSRPFSHTDSIHLLAGEQWEVDCNTLVFLK (2) 672 SEQ GAP (EXON 6) 1533 (2) LAGHLPAVWRCVRRYTPRLERHAAT (1) 1589 1838 GEPLDLSSDLADLTLAVVGEAAYG 1882 (2) VDFRTTDEQQDGGRPADPSAPGPALVAAVRECFDCLDVNKTTMYGPLK (0) 2710 MIWPGLTPLWRWMAKHLPDAAQTRHMR (0) 2737 VADVSRQLMAQWQAAKAKTAAAADTAGATAASGAGAEAGAGVGVGAGAQAKPGGGGAVQA FVEVGGGISSSSFMASLLEGRRGAAKEEERLTDLQ (0) 3663 IVAQCLTFLLAGFETTAATISFTAFCLATHPEAQARLLAE 3782 VDEHFARQAAAEQQQQGQQQREGDDALPE (0) 4526 LPYLDAVLKESMRLYPAGSALIRKSPQPLDLGRDGLVIPG (2) 4645 NTFVCLATHAV 4956 MHDPAIWPEPEAFRPERFLPEGSSSLGPMVGGAAASAPAGGGADAAAAAWVPFGMGPRM 5132 5133 CVGSKFATM 5153 (0) 5425 VSKAVLLQIYRRFTFELHPKQ (0) 5484 VLPLRTRTALTHAPRDGIWVVVKAR* 5818 Name:e_gwW.23.77.1 Protein ID:118428 Location:Chlre3/scaffold_23:1014183-1020804 Probable GC boundary at exon 4 DLIR = AGGC 1014183 MELVSGLALAGVALFIL (0) 1014233 1014336 GFIWAGFNPIERYLSPLRRFP (1) 1014398 1014778 GPAPLPFLGNLVSVATRDLTAYLADCRQAYGG (0) 1014873 1015176 IWLGNQPWVCVADPDLIR 1015229 (GC BOUNDARY?) 
1015524 RVAYRVLSRPFSHTDSIHLLAGEQWEVDCNTLVFLK (2) 1015631 1015978 NGPTWRLARRAFESSIIHPQS (2) 1016040 1016491 LAGHLPAVWRCVRRYTPRLERHAAT (1) 1016565 1016769 GEPLDLSSDLADLTLAVVGEAAYG (2) 1016840 1017260 VDFRTTDEQQDGGRPADPSAPGPALVAAVRECFDCLDVNKTTMYGPLK (0) 1017403 1017671 MIWPGLTPLWRWMAKHLPDAAQTRHMR (0) 1017751 1018161 VADVSRQLMAQWQAAKAKTAAAADTAGATAASGAGAEAGAGVGVGAGAQAKPGGGGAVQA FVEVGGGISSSSFMASLLEGRRGAAKEEERLTDLQ (0) 1018451 1018621 IVAQCLTFLLAGFETTAATISFTAFCLATHPEAQARLLAE VDEHFARQAAAEQQQQGQQQREGDDALPE (0) 1018827 1019484 LPYLDAVLKESMRLYPAGSALIRKSPQPLDLGRDGLVIPG (2) 1019603 1019881 NTFVCLATHAVMHDPAIWPEPEAFRPERFLPEGSSSLGPMVGGAAASAPA GGGADAAAAAWVPFGMGPRMCVGSKFATM (0) 1020117 1020380 VSKAVLLQIYRRFTFELHPKQ (0) 1020442 1020727 VLPLRTRTALTHAPRDGIWVVVKAR* 1020804 >CYP744C1 C_1370013 43% to 744A2 2459 MQLTWLGWAPVTRWRLRNIP (1) 2400 1885 GPFALPFLGHLPAISARDLVHFCHDVARQYGP (0) 1787 1503 VWVAARPWIVVSDPVAARKIAYR (2) 1423 1222 SLARPSTVASFTHALVGEPRQVDDESIFWNR (2) 1142 784 GPAWKASRRAFETSVLRPDRL 722 721 AAHMPAVRRCTERFLARLAPYADGSTAVDMKDEYGVIALAITGEVAY (1) VSFWPSDEDAALLAAPTGGSGAATSSSSSSSSSSKSPSSALVRACHECMACFELPLATMYLPLQ (0) MLLPALRPLWLALAAALPDAAQRRHMEARQAVADVSRRLMREWQQQ (0) AAARANDSGGDGLLLKDQTPVVNGGSSSSGSGISSSSFLAAMLKDQTGSNTACASSSGTDGG (0) VISQGLSFILAGYDTTGTTLALTTFLLAHNPTTQE (2) KLRAELVENRELLDSADGLAQ (0) LPYLDAVLKESQRLHPAVGHFWRDATSDIALPEMGGLVIPK (2) 508 GSFVSISIYNMHRDPAHWKEPERFIPERFLQ (1) 603 905 ATGGALGPTDPGAYVPFGSGPRMCVGYKMAIM (0) 1539 VVKSVLAGLLLRYRVALHPRQPLPLRLKTGLTLEPADG 1652 VWVTLQPLLLPGAK* Name:fgenesh2_pg.C_scaffold_39000151 Protein ID:177201 Location:Chlre3/scaffold_39:932071-938361 932071 MQLTWLGWAPVTRWRLRNIP (1) 932130 932645 GPFALPFLGHLPAISARDLVHFCHDVARQYGP (0) 932740 933039 VWVAARPWIVVSDPVAARKIAYR (2) 933107 933305 SLARPSTVASFTHALVGEPRQVDDESIFWNR (2) 933397 933746 GPAWKASRRAFETSVLRPDRL AAHMPAVRRCTERFLARLAPYADGSTAVDMKDEYGVIALAITGEVAY (1) 933949 934140 VSFWPSDEDAALLAAPTGGSGAATSSSSSSSSSS KSPSSALVRACHECMACFELPLATMYLPLQ (0) 934331 935234 
MLLPALRPLWLALAAALPDAAQRRHMEARQAVADVSRRLMREWQQQ (0) 935371 935723 AAARANDSGGDGLLLKDQTPVVNGGSSSSGSGGI SSSSFLAAMLKDQTGSNTACASSSGTDGG (0) 935911 936216 VISQGLSFILAGYDTTGTTLALTTFLLAHNPTTQE (2) 936320 936586 KLRAELVENRELLDSADGLAQ (0) 936648 936914 LPYLDAVLKESQRLHPAVGHFWRDATSDIALPEMGGLVIPK (2) 937036 937175 GSFVSISIYNMHRDPAHWKEPERFIPERFLQ (1) 937267 937572 ATGGALGPTDPGAYVPFGSGPRMCVGYKMAIM (0) 937667 938203 VVKSVLAGLLLRYRVALHPRQPLPLRLKTGLTLEPADGVWVTLQPLLLPGAK* 938361 >CYP745A1 C_1860018 AV623700 N-term 31% to CYP735A4 rice, 28% to CYP97A4 rice similar to CYP97 and CYP72 clans MASSSSPLEELLAFAGVKDGTISSPRLALVVLGAALAAYALVFAVINVVDYIRIARGLSAIPSAPGGVPLLGHVIPMLT CVSQNKGAWDIMEDWMDAKGPIVKYNIAGTQGVAVRDPKAMKRIFQTGYKLYEKDLKLSYRPFLPILGTGLVTS DGALWQKQRMLMGPALRVDVLDDIIR IAKKAIDRLCEKLSHHAGKGDIVDIEEEFRLLTLQ (0) VIGEAVLSLGPEECDR (0) VFPQLYLPV MNEANRRVLRPYRMYLPTPEWFRFSSRMGQLNGFLIDLFRRRWQARQAAAAAAQGEGSSS SKPKPADILDRIMEAIE ESGAKWDAALETQLCYEVKTFLAGHETSAAMLTWSTLELAAHSQAADK (0) VVEEARAAFGPRGESEAGRRAVDEMIYTLAVLK (0) ECLQLRLPVIMSE (0?) this may be wrong there is a seq gap that may have the true exon AEDDPQGLLGYPLPRGTMVACHLQ (0) GTHRLYESPDEFRPDRFMPGGEYDQFDDADRAYMFLPFIQ (0) GPRNCLGQHLALLEARVVLGLLHARFSFKPAPSVHPDPASLFMRHPTVIPVGPIRGLKVLVEQRK* Name:Chlre2_kg.scaffold_74000010 Protein ID:154128 Location:Chlre3/scaffold_74:79791-84023 Revised the EXXR exon This seq is most like CYP97 or CYP746 sequences. It clusters with the 72 clan or the 97 clan and these two cluster with each other. 
84023 MASSSSPLEELLAFAGVKDGTISSPRLALVVLGAALAAYALVFAVINVVD 83874 83873 YIRIARGLSAIPSAPGGVPLLGHVIPMLTCVSQNKGAWDIMEDWMDAKGP 83724 83723 IVKYNIAGTQGVAVRDPKAMKRIFQTGYKLYEKDLKLSYRPFLPILGTGL 83574 83573 VTSDGALWQKQRMLMGPALRVDVLDDIIRIAKKAIDRLCEKLSHHAGKGD 83424 83423 IVDIEEEFRLLTLQ (0) 83382 83181 VIGEAVLSLGPEECDR (0) 83134 82841 VFPQLYLPVMNEANRRVLRPYRMYLPTPEWFRFSSRMGQLN 82719 82718 GFLIDLFRRRWQARQAAAAAAQGEGSSSSKPKPADILDRIMEAIE (0) 82584 82327 ESGAKWDAALETQLCYEVKTFL 82262 82261 LAGHETSAAMLTWSTLELAAHSQAADK (0) 82181 81879 VVEEARAAFGPRGESEAGRRAVDEMIYTLAVLK (0) 81781 81308 EGLRKYSVVPVVTRVL (0) 81263 80886 AEDDPQGLLGYPLPRGTMVACHLQ (0) 80815 80411 GTHRLYESPDEFRPDRFMPGGEYDQFDDADRAYMFLPFIQ (0) 80292 79988 GPRNCLGQHLALLEARVVLGLLHARFSFKPAPSVHPDPASLFMRHPTV 79845 79844 IPVGPIRGLKVLVEQRK* 79791 >CYP746A1 C_28140001 = C_250032 C-helix exon duplication This is a bacterial related seq like CYP252A1, CYP197A1, CYP208A1 N-term is probably in a seq gap. C-term runs off the end scaf 2814 is a repeat of the C-helix exon 39% to CYP252A1 from Streptomyces peucetius, but not bacterial because it has introns. MLALAGGLQSMLQVSSPLVTHKITYGSL (0) RLSSPPPPAFPAGPSGDQTLPLLTDPLRFLTDAT (SEQ GAP HERE) 31584 GNGLLVSDGPVWQRQRRLSNPAFRRAAV 31495 EAYGGAMVAATEDMMRRVWGPA (1) GGTRDVYADFNELTLQVTLEALFGF SEDAAQIVAAVEKAFTFFTQR (2) AATGFVIPEWLPTWDNLEFAAAVQQLDRVVYGMINRRRQELAAAF (1?) 
30612 AGVPSDLLTSLLLARDEDGSGMSDQALRDELMTLLVAGQ (0) 30502 30091 ETSAILLGWASALLAAHPEVQAAAAAEVAAVCGGPEAGTPTPAS (2) 29766 VRHMPYLESVVLETLRLYSPAYMVGRCARRDAALGPYVLPAG TTVLVSPYVMHRDPEVWEEPEVFRPERWQELQRR 29548 29296 EGYSGYMGLMSNLGPNGAYLPFGGGPRNC 29261 (SEQ GAP HERE) KPLLTLRPEAVVLRISPRRQ* Name:e_gwW.1.470.1 Protein ID:116510 Location:Chlre3/scaffold_1:3570907-3575049 50% to CYP746B1 Physcomitrella patens (moss) top 26 hits in nr section of genbank all bacterial followed by CYP97A of glycine max 3575049 MLALAGGLQSMLQVSSPLVTHKITYGSL (0) 3574966 3574076 RLSSPPPPAFPAGPSGDQTLPLLTDPLRFLTDAT 3573975 3573974 ATYGPVVGLLLGGERVALVTGRAEARA VLVEAAGEVYVKEGTAFFPGSSLA (1) 3573822 3573413 GNGLLVSDGPVWQRQRRLSNPAFRRAAV EAYGGAMVAATEDMMRRVWGPA (1) 3573264 3573081 GGTRDVYADFNELTLQVTLEALFGF () 3573001 3572874 SEDAAQIVAAVEKAFTFFTQR (2) 3572812 3572660 AATGFVIPEWLPTWDNLEFAAAVQQLDRVVYGMINRRRQELAAAF (1?) 3572496 3572453 AGVPSDLLTSLLLARDEDGSGMSDQALRDELMTLLVAGQ (0) 3572289 3571932 ETSAILLGWASALLAAHPEVQAAAAAEVAAVCGGPEAGTPTPAS (2) 3571801 3571661 VRHMPYLESVVLETLRLYSPAYMVGRCARRDAALGPYVLPAG TTVLVSPYVMHRDPEVWEEPEVFRPERWQELQRS (2) 3571326 3571149 NLGPNGAYLPFGGGPRNC 3571090 3571089 IGTGFAMMEALLVLAALLQRYSLALPPAAGSSSGGAFPKP 3570969 KPLLTLRPEAVVLRISPRRQ* 3570907 >CYP747A1 C_900050 41% to CYP743B2 C-term EXXR to PERF IN SEQ GAP I HELIX LOCATED 28000BP AWAY ON SMALL FRAGMENT (MISSASSEMBLY?) 
FIRST EXON IS A BEST GUESS 352943 MKSALSAFVRDSGDQVAETGAPTATRPIPGPAPLSLEALK 352824 (0) 352717 DVSVIFFEGLHVAQLKFSEKYGPVCR 352640 (2) 352462 FANPASLNGATSWVFINSPENIQHVCATNVRNYS 352361 RRYLPDIYT (2) 352115 YVTHGKGILGSQ 352080 (0) 351877 DEYNARHRRLCSGPFRNKWQLQRFSSVVVER 351785 (2) 351348 SKRLVDIFSAAAAADPSGAFTTDVATQTQRLTLDVVGLVAFSHDFACVEQVQR 351190 (2) 350690 DLAGATAGDGRSGVLQDRVLWAVNTFGEVLAQVFITPLPLLK 350565 (0) 350317 AMDRLGAPHLRQLGEAVSVMRAAMLDVIA 350231 (0) 378450 ATEDDGRGLSDEELWEDVHDIMGAGHETTATTTAALLYCISAHPHVRQRLEEELDAVLAGG 378271 (0) 348405 (0) REARQHRFQWLPFGAGPRMCLGASFAQ (0) 348325 348100 MSVALMAATLLQRFRFTPLAPCSPLIPVGYDITMNFGPSGGLRMRVAPRQRGQQQ* 347933 Name:e_gwH.96.3.1 Protein ID:108849 Location:Chlre3/scaffold_96:178714-184286 Model only covers I-helix to heme region This seq is now complete, 38% to 97A6 in C-term half 178714 MKSALSAFVRDSGDQVAETGAPTATRPIPGPAPLSLEALK (0) 178833 178940 DVSVIFFEGLHVAQLKFSEKYGPVCR (2) 179017 179195 FANPASLNGATSWVFINSPENIQHVCATNVRNYSRRYLPDIYT (2) 179323 179686 YVTHGKGILGSQ (0) 179721 179924 DEYNARHRRLCSGPFRNKWQLQRFSSVVVER (2) 180016 180453 SKRLVDIFSAAAAADPSGAFTTDVATQTQRLTLDVVGLVAFSHDFACVEQVQR (2) 180611 181108 RDLAGATAGDGRSGVLQDRVLWAVNTFGEVLAQVFITPLPLLK (0) 181236 181484 AMDRLGAPHLRQLGEAVSVMRAAMLDVIA (0) 181570 182398 ATEDDGRGLSDEELWEDVHDIMGAGHETTATTTAALLYCISAHPHVRQRL EEELDAVLA (1) 182574 182854 DGEAPTYESLERMPYLQ (0) 182904 183327 ACAKEVMRLYPAIPVFPREAARPDVLPTGHGVAAGDVVFMSS YALGRSEAVWGPDVLEFDPDR (2) 183515 183802 FSPEREARQHRFQWLPFGAGPRMCLGASFAQ (0) 183895 184119 MSVALMAATLLQRFRFTPLAPCSPLIPVGYDITMNFGPSGGLRMRVAPRQRGQQQ* 184286 >CYP748A1 C_1820019 about 40% to C-term half of 741A1 >C_1820019 N-terminal missing (about 65aa) This seq begins at the KYG motif (TYG) There is a seq gap before this seq, which is probably where the true N-terminal is located. 
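A minimal sketch, not part of the original annotation, of a sanity check for the coordinate-flanked segments in these listings. It assumes that the two numbers around a peptide segment are the first and last nucleotide of the translated region shown, so the span should be 3 nt per listed residue, with a trailing * counted as one codon. The names exon_span_matches and segments are hypothetical; the three segments are copied from the CYP747A1 model on scaffold_96.

```python
def exon_span_matches(start: int, peptide: str, end: int) -> bool:
    """Return True when the coordinate span equals 3 nt per listed residue.

    A trailing '*' (stop) counts as one codon. Only the absolute distance
    between the two coordinates is used, so the check works for segments
    listed on either strand.
    """
    span = abs(end - start) + 1
    return span == 3 * len(peptide)


# Segments copied from the CYP747A1 listing (first exon, third exon, last exon).
segments = [
    (178714, "MKSALSAFVRDSGDQVAETGAPTATRPIPGPAPLSLEALK", 178833),
    (179195, "FANPASLNGATSWVFINSPENIQHVCATNVRNYSRRYLPDIYT", 179323),
    (184119, "MSVALMAATLLQRFRFTPLAPCSPLIPVGYDITMNFGPSGGLRMRVAPRQRGQQQ*", 184286),
]

for start, pep, end in segments:
    # Report each segment together with the result of the length check.
    print(start, end, exon_span_matches(start, pep, end))
```

A segment that fails this check is not necessarily wrong: a coordinate may have been carried over from an older assembly or a trace read, as noted for several exons in these listings.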
Name:e_gwW.9.168.1 Protein ID:114278 Location:Chlre3/scaffold_9:2353835-2358515 2353322 MSSALDELRFYGTLAATLLGPRYDLGRVPGPPGHPLLGNITAVMRPDYHVQ (0) 2353474 2353835 MLEWANTYGGIFKFSLGFQPVVVVSDPAVAVQVLGRAPGRAIPRKCVGYKFFDL (0) 2353996 2354237 ATNASGAHSFFTTSDEGQWAAVRKAAAAAFSSANVK (2) 2354344 2354560 KAFPIALRHLLL (0) 2354595 2355565 LSLLHVFVEALFGVTPEDFP (1) 2355624 2355926 GRQVAADMNLVLEEANSRLKVPLSGLARAVTQPV (0) 2356030 2356150 VGWREGGTGHVSRGFGARNSRAWGSGEKEWTEENWEPR (0) 2356263 2356454 AVTDLWACLGRVRHPRT (1) 2356504 2356839 GELLGRQGLVPEIGALMMAGFDTSSHSVAWALFALAANPEAQQRVRQELDGRGLLRRP (1) 2357012 2357269 GTAAPPRLPVLDDLPQLPYLNACIDEAMRMYPVAATASVR (2) 2357388 2357569 EVTEPTRVGDFVIPPGVIVWPMLYALHNSVHNWDQPDVFKPERWLQSNAGGSS (1) 2357727 2357950 GKGGGGGKRYMPFSDGMKSCLGQ (0) 2358018 2358133 ALGLMEVRTALVVLLGR (2) 2358183 2358396 YAFALDPGHGGEAAVRRSMIMSLTLKIRGGLRLVATPLG* 2358515 >volvox CYP748A1 79% to Chlamydomonas 748A1 ABSY209135.g1 exon 1 ABSY189778.b1 exon 2a ABSY140806.b1 exon 2b ABSY42643.g1 exon 3 ABSY86219.x1 exon 4 ABSY93957.g1 exons 5, 6 ABSY112787.y1 exons 8, 9 fused ABSY106164.g1 exons 10, 11 506 MSSSWEELCFYGHLASTLFSPKYDLARVPGPRGSFGLGNITAVMRPDYHVQ (0) 348 203 MLEWANQYGGVYKFSLGFQWVVVVSDPRIAVQ (0) 298 289 VLGRGPDSIPRKCVGYKFFDL (0) 248 33 ATNAAGAHSFFTTSDETQWAAVRKAAAAAFSSANVR (2) 140 394 KAFPIALRHSRL (0) 716 LSMLHVFMEALFGIRPEDFP (1) 657 290 GRQVAADMNLVLEEANERLKVPLRKVAMALVRPT (0) 189 623 GVTDLWACLGRVRHPVT 573 GAPLGRDALVPEIGALMMAGFDTSSHSVAWVLFALAAHPGAQLRCRQELAARGLVAEGA (1) 984 GSAQRGPTLDDLIQLPYLNAVIDETMRMSPVAATASVR (2) 871 357 EVTQPTRVGDYVIPPGVIVWPMLYALHNAVHNWDRPDEFLPERWLPGSGAA (1) 199 2357672 AGCCAGACGTCTTCAAGCCCGAGCGATGGCTGCAGAGCAACGCCGGCGGCAGCAGCAGTGACAGCGGTGGCAGCAGCAGCAAGGGCGGCAACGAGGAAGC 2357772 GGGGGTGGCCGGTGCCGGTGGCGGTGGCGCGGGAGGCGCTCGTTCGGCCGCGGCTAACGACGAGGGCAGCGGCGGCGCTGCGGGTGGCTTGGGCGGTGGC 2357872 GGCAGTGGCGCCAGCAGCAGGAGCGGCTCCTCCGCCGCCCTGGGTGCGGCGGCGGCGGCGGCGGCAGACGGCGGCGGAGGCAAGGGCGGCGGCGGCGGCA 2357972 
AGCGCTACATGCCGTTCAGTGACGGCATGAAGAGCTGCCTGGGGCAGGTGGGTGGGTGGGCTCTGGGGGTATGTCGTGGTTAGATTCCGCCCCTCACCTT 2358072 TCCCTCCCTTCTCCCGCGCGAAACTTCCCTCATGCTTTCCGCCCTCCTCCTCCCGCCGCAGGCTCTGGGGCTTATGGAGGTGCGCACCGCACTGGTGGTG 2358172 CTGCTGGGCAGGTGCGTGCGTGCGGCGCCGGGGCAGGGGTGGGCGTGGGGGCATGAGGGGGAATGGCCTCAGTGAGATGACGC 2352872 GAAGCAGCGCTTAGTGGTGGTGGCGGGGGAGCTGGGCGGAGCCGCGACCCACGGAGGCGCCGGCCGGCGACTGGAGCACAACGCTTCGCTTCGGCGCTGT 2352972 GCCACTGCTGCAACACAACTGAACATAGGATTCACAGCACTGTTGCTACTGGACGCCACGTCGAGGCTATCGCAGCTATCCCAGAGGACCGCCGCCGGAG 2353072 CCGGGAGGCCACCCCTCAACGCACGCCGCCGTGTGCAGCCAGCCAGTCGGTCCTTTGCCGCCGGCGCAATCAGCACCACCAGCAGCGCAAACAGCCGGCA 2353172 CACACACAGACACCGTACAGCAGCTAACTTGCCAGCCCAACTGCATAGCAGCAGCTCTCCGCCTTTCTACCCCACATCACCCACCCACGCACCCAAGCCT 2353272 CTCCAAGCCACCGCTCCCCTCACCTCTCCCGCTGCAACACACGCCGCACCATGTCCTCGGCCCTGGATGAGCTGCGCTTCTACGGCACCCTGGCCGCCAC 2353372 GCTGCTGGGCCCGCGCTACGACCTGGGCCGCGTGCCGGGGCCGCCCGGGCACCCGCTGCTGGGCAACATCACCGCAGTCATGAGGCCCGACTACCACGTG 2353472 CAGGTGTGCTGACGACCGGCGGGGCGGATGGGGGTTGGGCGGGGGGCAAGGGGAGGTGGGGGACTATGGCGCGGAGGAGTTTGGGTGGGGAAGGGATTTG 2353572 GTATTGTGTGGGGTGGGTGGGGTGGGGTGGGGCAGAGGGTTCAGGGGCTGGGTCGGTGTCGAGGCGGCAAGGGGTAGGATGATCATGACCCGGGGGGATA 2353672 GGAGCGTGTGCGGCTCAGCTGCTGCTGGCCGCGCGCCACCACAGCTGCCGCGGCACTACCCTATGCGCCGCTCCGCACCAGGAACAGCACCTCCCCCCAC 2353772 CGCACCGCATCGTGTGTCACGCCCACGCACCTGACTGCTGCTGCCCGGCCTGCTGCCCGCCAGATGCTGGAGTGGGCCAACACCTACGGCGGCATCTTCA 2353872 AGTTCAGCCTGGGCTTCCAGCCGGTGGTGGTGGTGTCCGACCCGGCGGTGGCGGTGCAGGTGCTGGGCCGCGCGCCGGGCCGAGCCATCCCGCGCAAGTG 2354372 GAGGGGACTCCCCACGCTTGCGGCACCCTTGCGCACGTGTGTGACTTGCCTAGCATCCATCGATCCCCGGCTCAAGCGCCTGATATGCCTCCACCGCTTC 2354472 TGTTCAACCGCCCCGTTGTAATCTCCTGCTACTCATGCTCCCTCTCCCTCCCGCTCGCTGCTCCGGATGGCCCTAACGCCGTCGCAGGAAGGCGTTCCCC 2354572 ATCGCGCTGCGTCACTTGCTGCTGGTGGCGGAGTCCCTGGACCCAGCCGGCCCCCACACGCCCGGCAACCCCTACCTCGACCTCACACACCACTCCCAAC 2354672 AACAGCACCAACAACACCAACGGCACCCGCAAATGCAGAAGGGTGACGGTGCCGCCGCCGCACGCATGAGTGGCGGCGACGGCAGCGAAGCGGCGCCTGC 2354772 
AGGCAAGGGCGACAGCAGCAGTGGCGCTGCTAGCAGCAGGTGGCTGTGGCGGACGCCCGACCTCAACTGGATGCGGAGCGGCCTCAGCCTCGGGTTCCGC 2354872 CGGCGCAGCCGCAGCCGCCCCGGCAACAGCACCGCCGCCGCCAAGCCTGCTTCTACGCCGCCAGGGAGCGCTACTACTTCCACTGGTGCTACAGCCAACG >ABSY86219.x1 CHROMAT_FILE: ABSY86219.x1 PHD_FILE: [top] ABSY86219.x1.phd.1 CHEM: term DYE: ET TIME: Wed Sep 10 11:54:44 2003 Length = 781 Query: 180 RRRKAFPIALRHLLLVAESLDPAGPHTPGNPYLDLT 287 RRRKAFPIALRH LVA LDPA P NPY+ LT Sbjct: 382 RRRKAFPIALRHSRLVAAGLDPAVQPDPANPYIQLT 489 2354972 CCACACATGCGGCAGACGCAGCTCCCAGTGCCAGCAGCAGCTTTGTGGACCTGGGCAGCAGCTGCGTAGGTGCTGACAGCAGTGCGAGCCTCGCCTCTCG 2355072 GTCGTCCTCGCCCTCGGCCACCGCGTCTGCGCCCTGCTCCTGCGGCCGCTGCGGCGCAAACAGCCCGCGCCGCGCCGTCGCCGCCGCCACTGCCACTGCC 2355172 GACACTAAGGGTGGCGGCGCAGAGCGGACGGCCGCGGCGCCGGCGGGGCCGGCGGAGGCGGAGGAGCTTGCGGCGGGCGGCGTGGGAGCGGGCGCGCCGG 2355272 GCGCTGCCGCTGGCGGGCGCTCCATCCACAGCCACCCCTTTGACTGCGGCACCGACGAAACGAGCAGTGTAGACGAGACGCCTCCGCACGCCACGGCTGC 2355372 ACCCGCCGCCGCCACCTGCACCGCCCCAGCCGGCGCTGGCAGCGGCAGCGGCAGCGCCACGGATGCCGGCACCAGCGCTAGCGGCACCATCGACGCGGAA 2355472 AGTAGCACCGGCGCCGGCACTAGCGGCAACCCTAGCGGCGGCGGCACAGGCGGCGGCCCTGCTGCTGTGGTGGACATCCAGGAACACTTGGAGCTGAGCC none 2355572 TGCTGCATGTGTTTGTGGAGGCACTGTTCGGAGTGACGCCGGAGGACTTCCCGGGTAGGTGCCGGGGACGGACGGAGGGGGAACAAAGAGGCGAGGCAAG 2355672 GCGAGGCGGTCTGGGCAGACGGGAGGGAGGTTGGTGCCAATTGGCGCCATTCAGTGCTTGCTACTCTGCTGTTTCTATCTCGCCAGTATGTGCTAGCGCA 2355772 CTGTCTGCTGACTGGGCACTGACACGTCACCTGGCTGCTCCCTCCGACCCATATCGCCTTCGCACCTCACACGCTCCCCGCCCACCACCTCCCCGCCCGC 2355872 CTGCCCCCTGCTCCTCCCTCCATTCCCCCCTCTTGTCCCGCCCCTCCCGCCCTCCCAGGCCGCCAGGTGGCTGCCGACATGAACCTGGTGCTGGAGGAGG 2355972 CCAACAGCCGCCTCAAGGTGCCGCTCAGCGGGCTGGCAAGAGCCGTCACACAGCCGGTGGTGGGTGCGGGGCTGGGGCGGTTATGTGCCCGAGCGCAATG 2356072 GAGTCGGTCCCAATAATAGTCAAGGAGTCGTCGCGGGACTGGCCATGGGGCGGGGCGGACTGGGGCCAATTGGGTGAGGTCGGGTGGAGGGAGGGAGGGA >ABSY93957.g1 CHROMAT_FILE: ABSY93957.g1 PHD_FILE: [top] ABSY93957.g1.phd.1 CHEM: term DYE: big TIME: Sun Sep 14 12:58:11 2003 Length = 1195 Query: 3 LHVFVEALFGVTPEDFP 53 LHVF+EALFG+ PEDFP Sbjct: 707 
LHVFMEALFGIRPEDFP 657 Query: 358 GRQVAADMNLVLEEANSRLKVPLSGLARAVTQPVVGAGLG 477 GRQVAADMNLVLEEAN RLKVPL +A A+ +P V G G Sbjct: 290 GRQVAADMNLVLEEANERLKVPLRKVAMALVRPTVRRGGG 171 2356172 CAGGACACGTGAGTCGGGGATTTGGAGCCAGGAACTCAAGGGCTTGGGGAAGCGGGGAAAAGGAATGGACCGAGGAGAACTGGGAACCGAGGGTTACGCA 2356272 GGCACGCCGCCGCAACCCCCAGTCTGACGTGCGACGCTGCTAGCCGCCACCCCTCCTCCACGCACGCGCACACGCCCAACCCCACACAGGCCCAGGCCCG 2356372 CATCCGCGCCGCCCAGGTGCGGCTGGCTGCGGTGTACGGCAGCCTGTACGACGTCATCCGGGCCCGTGGGCCGCAGCCCGAGGCCGTGACAGACCTGTGG 2356472 GCGTGCCTGGGCCGCGTGCGACACCCCAGGACAGGTGGGGGGGCGGTTGTGGCGTGTGGGTTGACGCGGGTACGTGGGGACACAGGGAGGGGGTGGGGGC 2356572 ACTGCTGGGTGGGTGTGCGCGCGGCACGCCGCCGCGGCCCCGAGTTACTGACTCTGGAGGAAACCATGCTGCAACTCACTTGCCCTGCCGCATGGACCGC 2356672 GGCCCGCAGCACCTCCACCGCGCCTGCACCAGACCTCCCCCACCTTTGCCCTAACCCACCCTTTTCTTCCTTATCCAGCCACCAATCACGGACTTCGCTC >ABSY112787.y1 CHROMAT_FILE: ABSY112787.y1 PHD_FILE: [top] ABSY112787.y1.phd.1 CHEM: term DYE: ET TIME: Wed Sep 10 10:18:50 2003 Length = 799 Query: 277 PEAVTDLWACLGRVRHPRTG 336 PE VTDLWACLGRVRHP TG Sbjct: 629 PEGVTDLWACLGRVRHPVTG 570 >CYP-un1Chlre pseudogene 1, family not identified, C_140094 half of gene, very different 63125 (0) HAALLPRLLCRPELSRAEAVANCHSCLLAGYETTAHTLACCLLHLGQRPQ 62976 VGRGRERGGRELARMEVKRGGDRF (2) 62528 GMALLGAVIRETLRVNPPVIGLPRVVSAPGGITVRLPAGS (1?) 62412 61349 WDPTRTAAPAGAVGADGAAPSDPFAEARPFGIGPRACPAGSLSVVIVREALAALLTKYRWRL 61164 61163 YDEVGDRDWMSGAVSTPTMAFRPPLRVVFARVVEDGGESS* 61041 scaffold_48:305112-303028 no model Name:estExt_fgenesh2_pg.C_480037 Protein ID:193769 Location:Chlre3/scaffold_48:289896-330340 Note this is a very long gene model that contain s the EXXR exon But no other exons. 
It misses the heme signature sequence and the I-helix motif.
305112 HAALLPRLLCRPELSRAEAVANCHSCLLAGYETTAHTLACCLLHLGQRPQ 304963
304962 VGRGRERGGRELARMEVKRGGDR 304894
304518 GMALLGAVIRETLRVNPPVIGLPRVVSAPGGITVRLPAGSS (1) 304396
303336 WDPTRTAAPAGAVGADGAAPSDPFAEARPFGIGPRACPAGSLSVVIVREA 303187
303186 LAALLTKYRWRLYDEVGDRDWMSGAVSTPTMAFRPPLRVVFARVVEDGGESS* 303028
volvox has no ortholog
$$$$$$$$$
>CYP767A1
Green my predictions
Yellow JGI predictions that work in blast
CYAN = motifs
Name: fgenesh2_pg.C_scaffold_9000240 Protein ID: 169101 Location: Chlre3/scaffold_9:1625885-1634209
Exon 13 is in a seq gap; the older version of the seq is used here
fgenesh2_pg.C_scaffold_9000240 [Chlre3:169101] similar to 741A1
C_340039 unnamed C-term P450 fragment PKG to heme
1625885 MDGWPPSSPGSIRLQTLQLHAVPPAEPSSSPFITGPPPT (2) 1626001
1628184 LRSLLLPRYDLDSIPGPWPHALPLLGNMLSVLRPDFHRVLLRWADQYGGVVRIKFLWQ (0) 1628357
1628626 DSLLVTDPAALASICGRGEGACDKAAAIYTPIN (1) 1628724
1629069 AMCTPRGHVNLLTSPANDAWRAVRKAVAVSFSWNNIKNKFPIIR (2) 1629200
1629464 DRTSELVEWLRAEGPAASVDVDQAALRVTLDVIGL (0) 1629568
1629948 TAFGHDYGCVRLRQVPPEHLIRVLPRAFTEVMRRIANPLRALAPRLVKKGTK (1) 1630103
1630519 GLQAFRDFQAHMQQLLREVLDRGPPPPEDTDIGAQL (2) 1630636
1630701 EAQR (0) 1630712
1630736 PAITEERILSE (0) 1630768
1630963 IGILFVEGFETTGHTISWTLFNIATTP (1) 1631043
1631243 GVQEAVAAELGGLGLLVRPHAMGGR 1631317
1631318 GAARPLALEDLKRLPYLTACVKEAMRMYPVVSIMGRITQ (0) 1631434
1631650 HPTRVGKYLVPAGTPIGTALFAIHNTRHNWTDPLAFRPQRWM 1631775
1631776 GESSSERASGRASERARDSGR (2) 1631838
4554 YMPFSEGPRSCVGQSLAKLEVMTVLATLLAHFRVDLAEE (0) 4646
1634099 MGGREGVHKRESTHLTLQTAGTRGIQMHLHPREDDP* 1634209
Note: most similar to animal CYP46, CYP24 and CYP4 sequences
34% to 741A1
The first exon is probably not right (too far away)
Short EAQR exon is required to join GAQL to PAIT; there may be some revision needed here
(see volvox ortholog)
trace file 652853255 from PQRWM
walked down to 650266898
these two covered the ver 3 assembly to 1633054 with 100% matches
walked down to 337758911 goes to gap region in assembly 100%
used the very end of the assembly to search again and found 335096672.
This seq has the missing P450 exon seq
336483811 has the end of this exon
>Volvox ortholog assembled from blasts for each exon; exons 7, 8 found by comparing DNA for Chlamydomonas and volvox in this region for matches.
Missing exon 1
ABSY171556.g1 exon 2 PKY
ABSY46806.x3 exon 3 DGL fused with exon 4
ABSY46806.x3 exon 4 MCT
ABSY5198.y1 exon 5 DRT
ABSY140583.g1 exon 6 SAF
ABSY56673.x2 exon 7 GLT
ABSY90166.y3, ABSY10903.x1, ABSY90166.y1, ABSY125944.g1 exon 8 PAI
Missing exon 9
ABSY174072.y1 exon 10 GTQ
Missing exon 11
ABSY225235.b1 exon 12 FMP
ABSY176428.b2 exon 13 MGG
270 (0) PKYDLDLIPGPWTHALPFIGNLLQFLRPDFHRVCLRWADKYGGIVR (2) 133
343 (2) IKFLWHDGLLVTDPPALAAICGRGEGAVDKAANIYSPIN 459
460 QMCTPHAYPNLLTSLADDRWRAVRKAIALSFAFGNIRKKFPLIR (2) 591
494 (2) DRTGELLEWLRGVGPLESVDVDQAALRVTLDVIGL (0) 598
(0) SAFGHDYGCTRLQQVPYNHLLRVLPRAFTEVMRRIANPFRSFAPGLVKNGKK (1) 723
GLTSFKDFQRHMQELLGEIKARGPPARGDADIGAQLYRVLEAAR (0) 779
(0) PAITDERILSE (0)
322 (1) GTQEAVAEELSSLGLLVRPKSEGGRSAARQLELDDLKRLRYLTACVKESMRMYPVVSIMGRWRMR (0) 516
702 (2) FMPFSEGPRSCVGQSLAKLEVMTVLAMLLANFRIELSDE (0) 818
744 (0) MGGREGVRQRESTHLTLQTRGTRGIRMHLHPRDQE* 851
$$$$$$
>CYP768A1
Chlre2_kg.scaffold_23000190 [Chlre3:149040] this P450 model is upstream and covers an N-term up to I-helix motif
2000bp space between N- and C-term parts
C_1530020 unnamed C-term P450 fragment PKG to end
Chlre2_kg.scaffold_23000191 [Chlre3:149041]
31% to 4Z1, 31% to 4B1 in C-term part
24% to CYP46 over most of the length
35% to Ciona 4V5 like seq C-term part
Name: Chlre2_kg.scaffold_23000190 Protein ID: 149040 Location: Chlre3/scaffold_23:1470852-1473965
Name: Chlre2_kg.scaffold_23000191 Protein ID: 149041 Location: Chlre3/scaffold_23:1476142-1477663
1470852
MPAAQLFKFLLKPQYDLAKLPQPPVADWVLGHVKHLLRK (1) 1470968 1471402 DYHRVILGWAKQYGRIFKLR (2) 1471461 1471649 ILNEWTVVITDPAAAAQVLATVPGRTHNYKHIDE (0) 1471750 1471901 VLGGPGKIS (2) 1471927 1472089 MFGTPDEVHWRNARKATAPAFSMAN (0) 1472163 1472706 VPDATALPGFDELASNILLLMAEANAQ (0) 1472786 1473064 VTDPLRAFFYFTPIAPLVSK (0) 1473123 1473316 HVARCRAALKQVVMFHGRTAARILAR (2) 1473393 1473587 PEPSPDNTLLWACLHRLRHPHTGRKLTPGQLHPE (1) 1473688 1473904 VGMYTAAGFDTTASTVGWCM (2) 1473963 1474237 YAASLWPEQQQAVAAELRAAGIFGPAAVVE (0) 1474326 1474621 ELAKLPRLNAFINE (0) 1474662 1475863 VMRMFPPTAVSAER (2) 1475904 1476109 LTPDEPVTIMGMTFPAK (0) 1476159 1476468 TVLWCITYGIHMSDANWEDAAKFK (2) 1476539 1476792 PERWLEDPRCAFAKSPGAGGAAAAPATAGGAEGPAAAIGGAAEEEPPNTA PRRFVPFGQGPKNCVGQ (0) 1476992 1477404 NFGITVVRAVVALLLRRYHVDLHPDMDTSPEGDKLGGGGGGGDGNSSGSGQAGG CRHSAEDTARLTHVAVITKLKKLRLVLQRRDD* 1477664 Volvox CYP768A1 ortholog ABSY165990.g1 exons 1,2,3 ABSY147804.y1 exon 4 C-helix partial ABSY193853.g1 exon 4 C-helix partial ABSY111272.b1 exon 4 intact ABSY75276.y2 exons 5,6 fused ABSY73799.g1 exons 9,10 ABSY165990.b1 exons 14,15 fused, 16, 17 ABSY22806.b1 exons 18, 19 MWDTLRFYYSTHGPLGAWTPAIVLLLNILGIALALAVTKFIGLYFA 258 PSYDLRKIPTPPVGDAILGHVKFLLRPDYHRVILAWTRKYGKIFRLR (2) 398 599 ILTQWTVVITDPAAAAQVLAVVPGRTHNYTLVDE (0) 700 986 GLGGPGKIS (2) 546 MFGTRDEAHWRNVRKATAPAFSMAN (0) 620 362 (0) VPDARALPGFDLLVPRILLLMAEANRQIVDPLWALWYRTPLAPLLSK (0) 222 221 PDPPSDNTLLWACLHRLRHHITGARLTPTQLHPE (1) 322 649 GGMYTTAGFDTTASTLGWCL (2) 705 957 (2) VSPDRPVAVGPFTLPPGVVLWPLVYGIHMSDANWDEPEAFR (2) 835 542 MERWLEDPRCAFARGE (1) 495 314 RGPGASGAPRRFLPFADGPKNCVGQ (0) 261 469 NFGLVVVRAVLALLLSRYRVALHGDM 546 no boundary after DM 822 (0) VAVVTKLSKLRLVMTPRD* 878 $$$$$$$ note: the next three sequences are partial missing the N-term. It is nearly impossible to assemble the N-term part without cDNA. 
>CYP771A1 C_4150003 unnamed CYP97 like C-term P450 fragment estExt_fgenesh2_pg.C_210032 [Chlre3:191092] TIN347338.x1 CANNOT DETECT N-TERM HALF, EXXR TO PERF MISSING I-helix present and heme signature present Gray region is 39% to a Xenopus seq EAASLWLLMALPVPNELLPG YGTYEANVRRLDELVYDM LVTMLLGGTDTSALTVAFAAWHLAAEPQLQAELRRE (0) VLGVLGGRALGELRAEDVKAMPLLAAVVNETLRLHPPLAEITRVATQ (SEQ GAP) (0) PNAFLPFGVGSRSCIGRHFGLLSTQ (0) LTLAALVARFEVLPPAPPAPTALDWSQSIVITSRSGVWLRLRPIRQ* Name:estExt_fgenesh2_pg.C_210032 Protein ID:191092 Location:Chlre3/scaffold_21:297178-306479 302461 (0) IEAASLWLLMALPVPNELLPGYGTYEANVRRLDELVYDM (0) 302577 303433 LVTMLLGGTDTSALTVAFAAWHLAAEPQLQAELRRE (0) 303540 303889 VLGVLGGRALGELRAEDVKAMPLLAAVVNETLRLHPPLAEITRVATQ (0) 304029 305941 PNAFLPFGVGSRSCIGRHFGLLSTQ (0) 306015 306339 LTLAALVARFEVLPPAPPAPTALDWSQSIVITSRSGVWLRLRPIRQ* 306479 volvox matches ABSY89114.y1 ABSY732.g2 I-helix 787 (0) NAALWLLLQLPIPDHLLPGYDKYMANIATLDEL (0) 689 278 (0) LVTMFFGGTDTSALALTLTAYHLAHCPEAQRAARAE (0) 385 $$$$$$ >CYP770A1 C_7970001 unnamed C-term P450 fragment fgenesh2_pg.C_scaffold_15000041 [Chlre3:170931] runs off end NOTE: CANNOT FIND AN AG-GT BOUNDARY AT LAST EXON. THIS MIGHT HAVE A LONG INSERT IN IT AND NO INTRON Very low sequence identity to other P450s LLVSEGQQWRLMHALATPAF (C-helix) 34% to CYP714A2 GVALTLVGMGHENVSATAAWALLLLAAHPEQQQALYRELRH (2) (I-helix) SRTAALLRLPYLDAVLRETLRLYPPVPMLSRQLMQ (0) (EXXR) (0) DTTIGGVMLPKD (0) 5873 (0) VELVVSPYVLHRLPRLWGPHAACFQPERFMPPPPRP (?) 
5766 5066 PPAAGGGCTEPAAAGPYLPFGAGPRACPGASFGSAEVKLLVAHVVMRYSLELLQPPPPSPR 4884 4643 (0) QLFVSLRPGPGVRVCFVPRHQQQVE* 4563 Name:fgenesh2_pg.C_scaffold_15000041 Protein ID:170931 Location:Chlre3/scaffold_15:453166-458216 39% to 746A1 454318 LLVSEGQQWRLMHALATPAF 454377 KAELLERGAFAAALRGVMEEWHRRAVALLPLWRLQAA (0) possible exon like 97B6/97C3 455773 (0) GVALTLVGMGHENVSATAAWALLLLAAHPEQQQALYRELRQ (2) 455895 456058 GCGFPTSRFIQSHPSRTAALLRLPYLDAVLRETLRLYPPVPMLSRQLMQ (0) 456204 456449 DTTIGGVMLPKD (0) 456484 456906 VELVVSPYVLHRLPRLWGPHAACFQPERFMPPPPRP (?) 457013 457713 PPAAGGGCTEPAAAGPYLPFGAGPRACPGASFGSAEVKLLVAHVVM 457850 457851 RYSLELLQPPPPSPR (?) 457895 458139 (0) QLFVSLRPGPGVRVCFVPRHQQQVE* 458216 >volvox matches ABSY135777.g1 661 IMGAGHETTATTTAALLYCISAHPDVRQRVEQEL 560 I-helix ABSY182504.y1 326 LPYTEAVLKETMRLYPALPMMHRHARNDIRLEDGRVAPK 210 (EXXR motif) ABSY179247.b1 alternative 106 LESIVLETLRLYSPAYMVGRCAQVDATLGPYSLPTGTTVLVSPFVMHRDAAVW 264 715 GAYLPFGGGPRNCIGTGFAMMEGMLVLAAVLQRYDLTLPPQTL 843 ABSY65293.y2 888 DATLGPTSVPTGTTVLVSPFVMHRDXPVW 802 351 GAYLPFGGGPRNCIGTGFAMMEGMLVLAAVLQRYDLTLPPQTL 223 $$$$$$ >CYP769A1 fgenesh2_pg.C_scaffold_24000071 [Chlre3:173996] C_10690001 unnamed C-term P450 fragment 43% to 97A5 cannot extend upstream possible C-helix exon VLHSPPAPSLLTSTAAAQWRAARRSLLFAFSRSELEQDFE seq gap here RLLGEVAEEWDARRRRLLPAWAAPWLLDSAAEASSKCRILQDFIEG AG region of I-helix here LLLGHEPVGHSLAWALGCLARNRAAQDKLVAELKREG () VYDAPHTALTWTMLHRLPFLDCCVREALRLYPAQPCPATVRQLNK () DVVLAGWSVPAGAEVWVDVHAMHRNPQLWRDPDRFNPERWAEH (0) ASEAPLCSPLAFMPFGSGPRSCLGQQLAAAELKAALAVL LCFLALEPTGDPADEPRPAAGLFLRPAGGLHLLLVHRQRGQRAGAA* Name:fgenesh2_pg.C_scaffold_24000071 Protein ID:173996 Location:Chlre3/scaffold_24:545063-551204 Not a bacterial contamination since there are exons and an ortholog in volvox 551626 MRALQLRNRCNLTGHTSRQPLQPSHLPTLWVLDS (1) 551525 550850 LPPALPLLGHWLALRARGRGSEPGDTHLRTLRRWAEAHGGAFRLLLPRAW 550701 550097 VLHSPPAPSLLTSTAAAQWRAARRSLLFAFSRSELEQDFE (0) 549978 (seq gap here) 548586 (0) 
AVEATGQVLLLRLLGEVAEEWDARRRRLLPAWAAPWLLDSAAEASSKCRILQDFIEG (0) 548416 547904 LLLGHEPVGHSLAWALGCLARNRAAQDKLVAELKREG (1) 547794 546390 GVYDAPHTALTWTMLHRLPFLDCCVREALRLYPAQPCPATVRQLNK (0) 546253 545752 DVVLAGWSVPAGAEVWVDVHAMHRNPQLWRDPDRFNPERWAEH (0) 545624 545320 ASEAPLCSPLAFMPFGSGPRSCLGQQLAAAELKAALAVLLCFLALEPTGD 545171 545170 PADEPRPAAGLFLRPAGGLHLLLVHRQRGQRAGAA* 545063 >Volvox about 60% to Chlamydomonas seq above >ABSY24005.y1 CHROMAT_FILE: ABSY24005.y1 PHD_FILE: [top] Query: 5 QLRNRCNLTGHTSRQPLQPSHLPTLW 30 +L RCNL G SR+ LQ HL T W Sbjct: 526 RLNYRCNLRGRVSRRALQDVHLSTRW 603 MSIDARLDRRLNYRCNLRGRVSRRALQDVHLSTRWTKTAR (1) volvox MRALQLRNRCNLTGHTSRQPLQPSHLPTLWVLDSR (1) Chlamy 685709806 AOBN322434.y1, also ABSY28503.b1 ABSY17828.g1 goes upstream of this exon different N-term (probably both same with one having errors) SPPGVPLLGHSLAYARAPWKWGGVPRARVPGEPSFLW (errors) (1) APPGVPLLGHSLTLRAWPSWTWWWFRSG GPRGDQLLLRALLRWSEQYDGAFQLRNGWL APPGVPLPGHSLTLPAWPSLDMGVGSGAEGPRATTLHWGRCCPGPSSTMVLFT ABSY207904.g1 Trace archive files 685812629 AOBN472902.y1 684986793 ABSY385036.g1 683183378 AOBO82354.b1 710612050 AOBN690035.g1 550752068 ABSY207904.g1 689850606 AOBN318993.b1 685709806 AOBN322434.y1 85 VLHPNAVPSSATATSSAQWRLLRRSLLHAFSDSELQLDFE (0) 204 689851374 = mate pair of 689850606 above (C-helix) 2 exons 955 () GPGAVVDVNDAALRLSLDVMGLSKLGYDFQVGMAVVRQNEKL (?) 
830 (0) AVESQGEVLMLRLLGEVAAEWAVRRRRLLGRWAPWISDGAAEGQTR CRILHHFIEQ (0) ABSY202948.b1 (+) (0) LLLAHGPTGHSIAWALGCLAARRGVQEKLVAELKKE (1) ABSY223271.b1 246 (1) GIFNDPLRLTYDMLSKLPYLDCVVREVLRLYPTMPCPATVRTLKK 112 ABSY130123.g1 (+) 348 (0) DVALHGRTLTAASDVWVDVFSMHRSPKWWRDPHHFKPERWTA 474 (0) 711 LCYPEAFMPFSFGSRN*LGQKLPVAQIKAALAMLL*FLGLKPS 839 SPPPLAPLCSPEAFMPFSFGSRSCLGQKLAVAQIKAALAMLLCFLVFEPS (1) trace archive 712749567 VAPWGLGLFLRPEGGMQLLVAPRKKNS* 687335561 Assembled volvox CYP769A1 seq 56% to Chlamydomonas CYP769A1 seq MSIDARLDRRLNYRCNLRGRVSRRALQDVHLSTRWTKTA (1) PPPGVPLLGHSLTLRAWPSWTWWWFRSGGPRGDQLLLRALLRWSEQYDGAFQLRNGWL VLHPNAVPSSATATSSAQWRLLRRSLLHAFSDSELQLDFE (0) 204 GPGAVVDVNDAALRLSLDVMGLSKLGYDFQVGMAVVRQNEKL (?) 830 AVESQGEVLMLRLLGEVAAEWAVRRRRLLGRWAPWISDGAAEGQTR CRILHHFIEQ (0) LLLAHGPTGHSIAWALGCLAARRGVQEKLVAELKKE (1) GIFNDPLRLTYDMLSKLPYLDCVVREVLRLYPTMPCPATVRTLKK DVALHGRTLTAASDVWVDVFSMHRSPKWWRDPHHFKPERWTA (0) SPPPLAPLCSPEAFMPFSFGSRSCLGQKLAVAQIKAALAMLLCFLVFEPS (1) VAPWGLGLFLRPEGGMQLLVAPRKKNS* >ABSY207904.g1 CHROMAT_FILE: ABSY207904.g1 PHD_FILE: ABSY207904.g1.phd.1 CHEM: term DYE: big TIME: Sun Nov 30 14:23:34 2003 NNNNNNAAGCGCTGAATACCCTCCTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXGTTCTTCACCCGAATGCCGTACCCAGCAGCGCTACA GCCACCTCCTCGGCCCAGTGGCGGTTACTGCGGAGGTCGCTGCTACACGCCTTTTCCGAC TCGGAGCTTCAACTGGACTTTGAG GTGCGTGGAGGTGCATGTGTTCGTGTGATATGTGTC TATCTGTCTGTGTATCTGTCGTCGTAGGCCAGGCGTCACTGTCCAAGAGAACCCTTCACA AGAGGCCAAGAGAACCCACCCCCACCCCCACCCCCACCCTCACCCTCACCCTCCCACCCC CTCCCCCCACCCTCACCCTCACCCCCACCCTCCCACCCCCACCCTCATCCTCACCTCCAC CCTCCCACCCCCACCCCCTAACCCTAAAAAAAAAATCCCCACAAAAACCTCATCTATATA TNCTTCATCCCCAATCCCAACTCCACTATCCAACATCTTTAAATCATCACCCATTTCTCC CACTCTAACCTCCACCCCAACCTCAACTTCTTACCCAACCCTTATAAAAATCAACTCCCT TTTTTAAATCCCCAAACCTCAAATCCTATTCCCTACCCAATTATCCTTTCACATCTATAC CCCATATCTATTCATAAACCTTAACCAACCCCTCACTTACCCTTTACCTTTAAAATCATA AAACTCACCACCTTTCCATACTATCTTTTCAAATACCCATAACTTTTCCCCACATCAAAA TAAAAATTTTTTTCCTATTAATACAACACTTTTTATACCCCCTCTCTACACTATAAACAT 
CCCCTTAATTTTATATATTTCCCCTAAANATACTTCCCCCATTTCTACTTTATCATATAA
AAAAATAATTTTCCAACTTCCTTAAAAAACCTTCTTAAAAATTATTTCTATTTAACACTC
TCATTTATAATTTTCTTACCTATTATTAAATTTCCTTAAAATCTCAAAAAAACTCTCTTC
CCCATTAACAAACTATTCATTATTCCCCTTACNACTAACAAATAAAAATAAAAAATTTTT
TTTCTTTTCTCCCCCTCATAATACAAATAAAAATAATTTCCCAAAAAACACCCACACACC
ATCATACATTCCAATTATCTTTATAAAACAATTTCCCTNTCCACATACAATATAAAAAAT
AAAATATCTCCCTATCTTACATAAATCTATTCATCTTANTCTAATATCCCTTCTCCTACC
TCTTTCAACCTCTTTTAATCAATAATCTTTTTATACCCTCACAACTCTTTTCTACTCACT
ATCACTCTC

>ABSY109519.b1 CHROMAT_FILE: ABSY109519.b1 PHD_FILE: [top] ABSY109519.b1.phd.1 CHEM: term DYE: big TIME: Sun Sep 14 12:57:01 2003
Length = 1136

Score = 26.7 bits (53), Expect = 23
Identities = 12/18 (66%), Positives = 13/18 (72%)
Frame = -2

Query: 2   AAAATHPARTGYGAARSA 19
           AAAAT  A +GYGA R A
Sbjct: 511 AAAATSRAASGYGAERGA 458

Match to Kineococcus radiotolerans SRS30216 ctg215, whole genome shotgun (bacteria)
ACCESSION AAEF02000013

MVRAVPAIVRAPHLFLAEVTRRHGPVAAIPLPRTPVLVLADPDGVRRVLVENARGYGKATIQY
SALATVTGPGLLAGDGEVWKQHRRTVQPAFHHGSLEDVA
AHAVHAARGLVAEADALPPGTPLEVLGATSRAGLEVVGHTLAAADLSGDAPLLVEAVG
RALELVVRRAASPVPAAWPTPARRRLAREVAVIDEVCARIVATRRARPLEDPRDVVGL
MLAAGMDDR
QVRDELVTFVVAGHETVASSLTWTLDLLARAPSVLARVHAELAGALGGR
EPGWDDLGKLPLLRAVVDESLRLYPPAWVVTRQALADDVVAGVAVPAGTLVIVCTWGL
HRDPALWEAPEEFRPDRFLDAPRPAAGSYVPFGAGPRLCIGRDLALVEEVLVLATLLC
ERTVRPAGPAPRVDALVTLRPRGGLPL
HVERLAPSAS

Score = 122 bits (306), Expect = 4e-25
Identities = 81/206 (39%), Positives = 110/206 (53%), Gaps = 14/206 (6%)
Frame = +1

Query 49    RILQDFIEGLLLGHEPVGHSLAWALGCLARNRAAQDKLVAELKREGGVYDAPHTALTWTM 108
            ++  + +  ++ GHE V  SL W L  LAR  +   ++ AEL    G  +       W
Sbjct 46849 QVRDELVTFVVAGHETVASSLTWTLDLLARAPSVLARVHAELAGALGGREPG-----WDD 47013

Query 109   LHRLPFLDCCVREALRLYPAQPCPATVRQLNKDVVLAGWSVPAGAEVWVDVHAMHRNPQL 168
            L +LP L   V E+LRLYP  P     RQ   D V+AG +VPAG  V V    +HR+P L
Sbjct 47014 LGKLPLLRAVVDESLRLYP--PAWVVTRQALADDVVAGVAVPAGTLVIVCTWGLHRDPAL 47187

Query 169   WRDPDRFNPERWAEHASEAPLCSPLAFMPFGSGPRSCLGQQLAAAELKAALAVLLCFLAL 228
            W  P+ F P+R+     +AP  +  +++PFG+GPR C+G+ LA  E    LA LLC   +
Sbjct 47188 WEAPEEFRPDRFL----DAPRPAAGSYVPFGAGPRLCIGRDLALVEEVLVLATLLCERTV 47355

Query 229   EPTGDPADEPRPAAGLFLRPAGGLHL 254
             P G PA  PR  A + LRP GGL L
Sbjct 47356 RPAG-PA--PRVDALVTLRPRGGLPL 47424

Match to 4F3

>CYP4F3 NM_000896
Length = 520

Score = 80.9 bits (198), Expect = 5e-18
Identities = 59/193 (30%), Positives = 93/193 (48%), Gaps = 33/193 (17%)

Query: 96  RLLGEVAEEWDARRRRLLPAWAAPWLLDSAAEASSKCRILQDFIEGLLL----------- 144
           RL+ +  ++    RRR LP+      +D   +A +K + L DFI+ LLL
Sbjct: 261 RLVHDFTDDVIQERRRTLPSQG----VDDFLQAKAKSKTL-DFIDVLLLSKDEDGKKLSD 315

Query: 145 -------------GHEPVGHSLAWALGCLARNRAAQDKLVAELKREGVYDAPHTALTWTM 191
                        GH+     L+W L  LA++   Q++   E++ E + D     + W
Sbjct: 316 EDIRAEADTFMFEGHDTTASGLSWVLYHLAKHPEYQERCRQEVQ-ELLKDREPKEIEWDD 374

Query: 192 LHRLPFLDCCVREALRLYPAQPCPATVRQLNKDVVLA-GWSVPAGAEVWVDVHAMHRNPQ 250
           L +LPFL  C++E+LRL+P  P PA  R   +D+VL  G  +P G    + V   H NP
Sbjct: 375 LAQLPFLTMCIKESLRLHP--PVPAVSRCCTQDIVLPDGRVIPKGIICLISVFGTHHNPA 432

Query: 251 LWRDPDRFNPERW 263
           +W DP+ ++P R+
Sbjct: 433 VWPDPEVYDPFRF 445

>CYP4F12 mRNA for cytochrome P450, complete cds. AB035130
Length = 524

Score = 123 bits (308), Expect = 1e-30
Identities = 74/194 (38%), Positives = 107/194 (55%), Gaps = 9/194 (4%)

Query: 187 GHEPVGHSLAWALGCLARNRAAQDKLVAELKREGVYDAPHTALTWTMLHRLPFLDCCVRE 246
           GH+     L+W L  LAR+   Q++   E++ E + D     + W  L +LPFL  CV+E
Sbjct: 329 GHDTTASGLSWVLYNLARHPEYQERCRQEVQ-ELLKDRDPKEIEWDDLAQLPFLTMCVKE 387

Query: 247 ALRLYPAQPCPATVRQLNKDVVLA-GWSVPAGAEVWVDVHAMHRNPQLWRDPDRFNPERW 305
           +LRL+P  P P   R   +D+VL  G  +P G    +D+  +H NP +W DP+ ++P R+
Sbjct: 388 SLRLHP--PAPFISRCCTQDIVLPDGRVIPKGITCLIDIIGVHHNPTVWPDPEVYDPFRF 445

Query: 306 AEHASEAPLCSPLAFMPFGSGPRSCLGQQLAAAELKAALAVLLCFLALEPTGDPADEPRP 365
               S+    SPLAF+PF +GPR+C+GQ  A AE+K  LA++L      P      EPR
Sbjct: 446 DPENSKGR--SPLAFIPFSAGPRNCIGQAFAMAEMKVVLALMLLHFRFLP---DHTEPRR 500

Query: 366 AAGLFLRPAGGLHL 379
              L +R  GGL L
Sbjct: 501 KLELIMRAEGGLWL 514

>e_gwH.661.2.1 [Chlre3:109783] Name:e_gwH.661.2.1 Protein ID:109783 Location:Chlre3/scaffold_661:7589-8149
bacterial contamination 81% to Arthrobacter seq NZ_AAHG01000018.1 Arthrobacter sp. FB24
MDFRASPEYQLDPFPYYERMREAAPVYYDEQSGSWHIFRYDDVQRTLSEYATFSSHMGGDDASGTAQLFA
SSLIATDPPRHRQLRSLVTQAFTPKAVDALAPRIAGLTDELLEGIAARGSADLIKELAYPLPVIVISELM
GIPAQDRERFKQWSDVIVSQTRTGSASGNHIAANMEMTEYFLALIDE

Query 1     MDFRASPEYQLDPFPYYERMREAAPVYYDEQSGSWHIFRYDDVQRTLSEYATFSSHMGGD 60
            MDF A+ E  LDPFPYYERMREAAPV++DEQSGSWH+FRYDDVQR LSEYATFSS MGGD
Sbjct 49739 MDFAAANENPLDPFPYYERMREAAPVFHDEQSGSWHVFRYDDVQRVLSEYATFSSRMGGD 49560

Query 61    DASGTAQLFASSLIATDPPRHRQLRSLVTQAFTPKAVDALAPRIAGLTDELLEGIAARGS 120
            D S T QLFASSLI TDPPRHR LRSLVTQAFTPKAVDALAPRI+ LT+ELL+GI +RG
Sbjct 49559 DPSETGQLFASSLITTDPPRHRHLRSLVTQAFTPKAVDALAPRISELTEELLDGIVSRGG 49380

Query 121   ADLIKELAYPLPVIVISELMGIPAQDRERFKQWSDVIVSQTRTGSASGNHIAANMEMTEY 180
            ADLI+ELAYPLPVIVISELMGIPA DR+RFKQWSDVIVSQTRT +A+ +H A N EMT Y
Sbjct 49379 ADLIEELAYPLPVIVISELMGIPADDRDRFKQWSDVIVSQTRTNAATEDHQATNREMTGY 49200

Query 181   FLALIDE 187
            FL LI++
Sbjct 49199 FLDLIEQ 49179
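The "Identities = n/m (p%)" figures in the BLAST reports above can be rechecked directly from an aligned Query/Sbjct pair. The following is a minimal Python sketch, not part of the original analysis: the helper names `count_identities` and `percent_truncated` are hypothetical, and it assumes the report truncates percentages rather than rounding them, which is consistent with the values printed above (for example 12/18 shown as 66%). It uses the short ABSY109519.b1 hit as input.

```python
def count_identities(query: str, sbjct: str) -> tuple[int, int]:
    """Return (identical columns, alignment length) for two aligned rows.

    Both rows must be the same length; '-' marks a gap and never
    counts as an identity.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    identical = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return identical, len(query)


def percent_truncated(part: int, whole: int) -> int:
    """Integer percentage, truncated (assumed to match the report style)."""
    return 100 * part // whole


# Aligned rows copied from the ABSY109519.b1 hit (Query 2-19, Sbjct 511-458).
query = "AAAATHPARTGYGAARSA"
sbjct = "AAAATSRAASGYGAERGA"

identical, length = count_identities(query, sbjct)
print(f"Identities = {identical}/{length} "
      f"({percent_truncated(identical, length)}%)")
# prints "Identities = 12/18 (66%)"
```

For a multi-block alignment, the rows of each block would be concatenated before counting so that the totals cover the whole hit.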