Chlamydomonas reinhardtii cytochrome P450s
D. Nelson, Sept. 2, 2004
Under revision May 11, 2006
39 named genes, 2 named pseudogenes, + one bacterial contaminant
families = 51, 55, 97, 710, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746,
747, 748, 767, 768, 769, 770, 771 (5 old families, 16 new families)
51 is in the 51 clan (sterol 14 alpha demethylase)
55 is of fungal origin (nitrite/nitrate reductase, soluble enzyme)
710 is in the 61 clan (C-22 sterol desaturase in fungi [CYP61] and plants [CYP710])
737, 738, 739, 740 are in the CYP85 clan
97 is in the CYP97 clan (carotenoid hydroxylases of epsilon and beta rings)
743, 744 are in the CYP711 clan (CYP711A1 produces a carotenoid hormone in Arabidopsis)
745 may be a new plant clan, CYP97A like
CYP747 is hard to place. 38% to CYP97A6 in C-term half
741 and 742 sometimes cluster with 97 but not always.
741, 742, 748, 767, 768, 769 cluster together and have best hits to CYP4 clan members
746 may be of bacterial origin, best hit is to CYP252A1 Streptomyces peucetius
top 26 hits all bacterial
CYP746 and CYP770 may be the Chlamydomonas precursors of the CYP72 clan
There is a CYP746 in moss
A link to the 2003 Chlamydomonas P450 page
P450s sorted by gene model number using the JGI annotation
* indicates more than one gene model for a single gene.
C_60077 CYP742A1
C_130004 CYP739A1
C_130006 CYP739A2
C_130009 CYP739A4
C_130009 CYP739A5
C_130012 CYP739A6
C_130125 CYP739A3
C_140094 CYP-un1Chlre pseudogene 1, family not identified, half of gene
C_180013 CYP743A1
CYP744A4 between C_239009 and C_239004 not annotated
C_250032 CYP746A1, 39% to Streptomyces peucetius CYP252A1
C_310063 CYP97A6
C_340039 unnamed C-term P450 fragment PKG to heme
C_410095 CYP97B6
C_420091 CYP743A2
C_470024 CYP737A1
C_570052 CYP738A1
C_680007 CYP51G1
C_900050 CYP747A1, 41% to CYP743B2 C-term
*C_940015 CYP744A1
C_940016 CYP744A1, N-term = C_940015
C_940017 CYP744A2
C_940020 CYP744B1
C_940044 CYP744A3
C_980035 CYP743B3
C_980053 CYP741A1
*C_980058 CYP741A1 N-term
C_1040015 CYP97A5
C_1080041 CYP740A1
C_1130014 CYP743C1
C_1340038 CYP97C3 70% to 97C2
C_1370013 CYP744C1
C_1530020 unnamed C-term P450 fragment PKG to end
C_1540014 CYP710B1
C_1730009 CYP744A5P pseudogene 81% to 744A3
C_1820019 CYP748A1 about 40% to C-term half of 741A1
C_1860018 CYP745A1
C_2580005 CYP55B1, 43% to CYP55A6
C_4150003 unnamed CYP97 like C-term P450 fragment
*C_4260002 CYP97A5
*C_5270001 CYP739A6
C_7970001 unnamed C-term P450 fragment
C_8600001 CYP743B2 falls in a seq gap of scaffold 98
C_8600002 CYP743B3 same as C_980035
*C_8650001 CYP744B1
*C_9610001 CYP743C1
C_10690001 unnamed C-term P450 fragment
*C_22500001 CYP739A5
*C_28140001 CYP746A1 = C_250032 C-helix exon duplication
C_32340001 CYP743B1 falls in a seq gap of scaffold 98
P450s sorted by CYP name (version 2 assembly)
CYP51G1 C_680007
CYP55B1 C_2580005 43% to CYP55A6
CYP97A5 *C_4260002
CYP97A5 C_1040015
CYP97A6 C_310063
CYP97B6 C_410095
CYP97C3 C_1340038 70% to 97C2
CYP710B1 C_1540014
CYP737A1 C_470024
CYP738A1 C_570052
CYP739A1 C_130004
CYP739A2 C_130006
CYP739A3 C_130125
CYP739A4 C_130009
CYP739A5 *C_22500001
CYP739A5 C_130009
CYP739A6 *C_5270001
CYP739A6 C_130012
CYP740A1 C_1080041
CYP741A1 *C_980058 N-term
CYP741A1 C_980053
CYP742A1 C_60077
CYP743A1 C_180013
CYP743A2 C_420091
CYP743B1 C_32340001
CYP743B2 C_8600001
CYP743B3 C_8600002 same as C_980035
CYP743C1 *C_9610001
CYP743C1 C_1130014
CYP744A1 *C_940015
CYP744A1 C_940016 N-term = C_940015
CYP744A2 C_940017
CYP744A3 C_940044
CYP744A4 between C_239009 and C_239004 not annotated
CYP744A5P C_1730009 pseudogene 81% to 744A3
CYP744B1 *C_8650001
CYP744B1 C_940020
CYP744C1 C_1370013
CYP745A1 C_1860018
CYP746A1 *C_28140001 = C_250032 C-helix exon duplication
CYP746A1 C_250032, 39% to Streptomyces peucetius CYP252A1
CYP747A1 C_900050 41% to CYP743B2 C-term
CYP748A1 C_1820019 about 40% to C-term half of 741A1
C_140094 CYP-un1Chlre pseudogene 1, family not identified, half of gene
C_340039 unnamed C-term P450 fragment PKG to heme
C_1530020 unnamed C-term P450 fragment PKG to end
C_4150003 unnamed CYP97 like C-term P450 fragment
C_7970001 unnamed C-term P450 fragment
C_10690001 unnamed C-term P450 fragment
P450s sorted by CYP name (version 3 assembly)
CYP51G1 scaffold_7:2481399-2484780 Protein ID: 126254
CYP55B1 scaffold_52:370660-375180 Protein ID: 121742
CYP97A5 scaffold_55:373287-377786 Protein ID: 39257
CYP97A6 scaffold_42:732596-737181 Protein ID: 121076
CYP97B6 scaffold_1:2256360-2261776 Protein ID: 116601
CYP97C3 scaffold_64:422589-430105 Protein ID: 122396
CYP710B1 scaffold_66:390953-394690 Protein ID: 132687
CYP737A1 scaffold_41:635800-640648 Protein ID: 151890
CYP738A1 scaffold_6:2860971-2864314 Protein ID: 167934
CYP739A1 scaffold_8:1064933-1068008 Protein ID: 140983
CYP739A2 scaffold_8:1078648-1085528 Protein ID: 140985
CYP739A3 scaffold_8:1105803-1109510 Protein ID: 140993
CYP739A4a scaffold_8:1131245-1134169 Protein ID: 165902
CYP739A4b scaffold_8:1135368-1135969 Protein ID: 165903
CYP739A5a scaffold_8:1125087-1127174 Protein ID: 165900
CYP739A5b scaffold_8:1128094-1130653 Protein ID: 186291
CYP739A6 scaffold_8:1145820-1150791 Protein ID: 186292
CYP740A1 scaffold_68:172336-177730 Protein ID: 153850
CYP741A1a scaffold_71:380138-383878 Protein ID: 179637
CYP741A1b scaffold_846:3828-5043 Protein ID: 181363
CYP742A1 scaffold_37:480604-486602 Protein ID: 151489
CYP743A1 scaffold_1:5611907-5617553 Protein ID: 116541
CYP743A2a scaffold_16:609616-615492 Protein ID: 189550
CYP743A2b scaffold_16:609616-615492 Protein ID: 116043
CYP743B1 scaffold_71:125260-130065 Protein ID: 122749
CYP743B2 scaffold_71:130374-138996 Partial seq not annotated
CYP743B3 scaffold_71:139305-143478 Protein ID: 122730
CYP743C1 scaffold_17:1489349-1496178 Protein ID: 147793
CYP744A1a scaffold_23:958703-961028 Protein ID: 148983
CYP744A1b scaffold_23:962118-963228+ Protein ID: 118452
CYP744A2 scaffold_23:969108-971162 Protein ID: 118526
CYP744A3 scaffold_23:976166-982342 Protein ID: 118465
CYP744A4a scaffold_23:1143890-1147747 Protein ID: 95157
CYP744A4b scaffold_23:1141463-1143101 Protein ID: 103666
CYP744A5P scaffold_21:6347-7649 Protein ID: 148389
CYP744B1 scaffold_23:1014183-1020804 Protein ID: 118428
CYP744C1 scaffold_39:932071-938361 Protein ID: 177201
CYP745A1 scaffold_74:79791-84023 Protein ID: 154128
CYP746A1 scaffold_1:3570907-3575049 Protein ID: 116510
CYP747A1 scaffold_96:178714-184286 Protein ID: 108849
CYP748A1 scaffold_9:2353835-2358515 Protein ID: 114278
CYP767A1 scaffold_9:1625885-1634209 Protein ID: 169101
CYP768A1a scaffold_23:1470852-1473965 Protein ID: 149040
CYP768A1b scaffold_23:1476142-1477663 Protein ID: 149041
C_140094 scaffold_48:305112-303028 Partial seq not annotated
C_4150003 scaffold_21:297178-306479 Protein ID: 191092
C_7970001 scaffold_15:453166-458216 Protein ID: 170931
C_10690001 scaffold_24:545063-551204 Protein ID: 173996
Bacterial scaffold_661:7589-8149 Protein ID: 109783
P450s sorted by scaffold location (version 3 assembly)
CYP97B6 scaffold_1:2256360-2261776 Protein ID: 116601
CYP746A1 scaffold_1:3570907-3575049 Protein ID: 116510
CYP743A1 scaffold_1:5611907-5617553 Protein ID: 116541
CYP738A1 scaffold_6:2860971-2864314 Protein ID: 167934
CYP51G1 scaffold_7:2481399-2484780 Protein ID: 126254
CYP739A1 scaffold_8:1064933-1068008 Protein ID: 140983
CYP739A2 scaffold_8:1078648-1085528 Protein ID: 140985
CYP739A3 scaffold_8:1105803-1109510 Protein ID: 140993
CYP739A5a scaffold_8:1125087-1127174 Protein ID: 165900
CYP739A5b scaffold_8:1128094-1130653 Protein ID: 186291
CYP739A4a scaffold_8:1131245-1134169 Protein ID: 165902
CYP739A4b scaffold_8:1135368-1135969 Protein ID: 165903
CYP739A6 scaffold_8:1145820-1150791 Protein ID: 186292
CYP767A1 scaffold_9:1625885-1634209 Protein ID: 169101
CYP748A1 scaffold_9:2353835-2358515 Protein ID: 114278
C_7970001 scaffold_15:453166-458216 Protein ID: 170931
CYP743A2a scaffold_16:609616-615492 Protein ID: 189550
CYP743A2b scaffold_16:609616-615492 Protein ID: 116043
CYP743C1 scaffold_17:1489349-1496178 Protein ID: 147793
CYP744A5P scaffold_21:6347-7649 Protein ID: 148389
C_4150003 scaffold_21:297178-306479 Protein ID: 191092
CYP744A1a scaffold_23:958703-961028 Protein ID: 148983
CYP744A1b scaffold_23:962118-963228+ Protein ID: 118452
CYP744A2 scaffold_23:969108-971162 Protein ID: 118526
CYP744A3 scaffold_23:976166-982342 Protein ID: 118465
CYP744B1 scaffold_23:1014183-1020804 Protein ID: 118428
CYP744A4a scaffold_23:1143890-1147747 Protein ID: 95157
CYP744A4b scaffold_23:1141463-1143101 Protein ID: 103666
CYP768A1a scaffold_23:1470852-1473965 Protein ID: 149040
CYP768A1b scaffold_23:1476142-1477663 Protein ID: 149041
C_10690001 scaffold_24:545063-551204 Protein ID: 173996
CYP742A1 scaffold_37:480604-486602 Protein ID: 151489
CYP744C1 scaffold_39:932071-938361 Protein ID: 177201
CYP737A1 scaffold_41:635800-640648 Protein ID: 151890
CYP97A6 scaffold_42:732596-737181 Protein ID: 121076
C_140094 scaffold_48:305112-303028 Partial seq not annotated
CYP55B1 scaffold_52:370660-375180 Protein ID: 121742
CYP97A5 scaffold_55:373287-377786 Protein ID: 39257
CYP97C3 scaffold_64:422589-430105 Protein ID: 122396
CYP710B1 scaffold_66:390953-394690 Protein ID: 132687
CYP740A1 scaffold_68:172336-177730 Protein ID: 153850
CYP743B1 scaffold_71:125260-130065 Protein ID: 122749
CYP743B2 scaffold_71:130374-138996 Partial seq not annotated
CYP743B3 scaffold_71:139305-143478 Protein ID: 122730
CYP741A1a scaffold_71:380138-383878 Protein ID: 179637
CYP741A1b scaffold_846:3828-5043 Protein ID: 181363
CYP745A1 scaffold_74:79791-84023 Protein ID: 154128
CYP747A1 scaffold_96:178714-184286 Protein ID: 108849
Bacterial scaffold_661:7589-8149 Protein ID: 109783
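The scaffold-ordered listing above can, in principle, be regenerated from the name-ordered rows. The following is a minimal sketch, assuming each row has the form "NAME scaffold_N:START-END Protein ID: ID" (with an optional trailing "+" on the range, and "Partial seq not annotated" in place of a protein ID). The names Locus, parse_row and sort_by_location are hypothetical helpers introduced only for illustration.

```python
import re
from typing import NamedTuple

class Locus(NamedTuple):
    name: str
    scaffold: int
    start: int
    end: int
    protein_id: str  # empty string when the row carries no Protein ID

# One row of the version 3 tables, e.g.
# "CYP51G1 scaffold_7:2481399-2484780 Protein ID: 126254"
ROW = re.compile(
    r"^(?P<name>\S+)\s+scaffold_(?P<scaffold>\d+):\s*"
    r"(?P<start>\d+)-(?P<end>\d+)\+?"
    r"(?:\s+Protein ID:\s*(?P<pid>\d+))?"
)

def parse_row(line: str) -> Locus:
    """Split one table row into name, scaffold number, coordinates and ID."""
    m = ROW.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized row: {line!r}")
    return Locus(m["name"], int(m["scaffold"]), int(m["start"]),
                 int(m["end"]), m["pid"] or "")

def sort_by_location(rows):
    """Order rows by scaffold number, then by the lower coordinate."""
    loci = (parse_row(r) for r in rows)
    return sorted(loci, key=lambda l: (l.scaffold, min(l.start, l.end)))

rows = [
    "CYP51G1 scaffold_7:2481399-2484780 Protein ID: 126254",
    "CYP97B6 scaffold_1:2256360-2261776 Protein ID: 116601",
    "C_140094 scaffold_48:305112-303028 Partial seq not annotated",
    "CYP746A1 scaffold_1:3570907-3575049 Protein ID: 116510",
]
for locus in sort_by_location(rows):
    print(locus.name, locus.scaffold, locus.start, locus.end, locus.protein_id)
```

Sorting on the lower of the two coordinates keeps reverse-listed rows such as C_140094 (305112-303028) in a sensible position.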
P450 sequences
Note: the P450 sequences have many apparent insertions of poly Ala, poly Gly, poly Ser and mixtures of these. These are found in some ESTs, so they are real. It is not clear why these sequences are inserted or what they do to the structure of these P450s.
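One way to locate such insertions in a protein sequence is to scan for long runs made up only of Ala, Gly and Ser. This is a minimal sketch, not the method used to build this page: the function name low_complexity_runs and the threshold of 6 residues are assumptions chosen for illustration.

```python
import re

def low_complexity_runs(protein: str, residues: str = "AGS", min_len: int = 6):
    """Return (0-based offset, run) for each stretch of at least min_len
    consecutive residues drawn only from the given residue set."""
    pattern = re.compile(f"[{residues}]{{{min_len},}}")
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]

# Fragment taken from the CYP737A1 sequence on this page.
fragment = "LELLKGGSGGGGNGGGANGGGDGDS"
for offset, run in low_complexity_runs(fragment):
    print(offset, run)
```

Lowering min_len reports shorter runs such as GGGA; raising it restricts the output to the longest insertions.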
>CYP51G1 C_680007 10 EXONS 56% TO 51G1 Arab
EST SUPPORT BI717817 BU649818 BI726293 BM001590 AV642299
60124
MDLPPELAVLADKVLSLSPVVLVALGSAVLILALAVGRVLFNLLPSKRPPVWEGLPFIGGLLKFTG 59927
59843
GPWKLLENGYAKFGECFTVPVAHRRVTFLIGPEVSPHFFKAGDDEMSQSE 59694
59394
VYDFNIPTFGRGVVFDVEQKVRTEQFRMFTEALTKNRLKSYVPHFNKEAE 59245
59108
EYFAKWGETGVVDFKDEFSKLITLTAARTLL 59016
58765
GREVREQLFDEVADLLHGLDEGMVPLSVFFPYAPIPVHFKRDR (2) 58637
58412
CRKDLAAIFAKIIRARRESGRREEDVLQQFIDAR 58311
58119
YQNVNGGRALTEEEITGLLIAVLFAGQHTSSITTSWTGIFMAANK 57985
57667
EHYNKAAEEQQDIIRKFGNELSFETLSEMEVLHRNITEALRMHPPLLLVMRYAKKPFSVTTSTGKSYVIPK 57455
57191
GDVVAASPNFSHMLPQCFNNPKAYDPDRFAPPREEQNKPYAFIGFGAGRHACIGQNFAYLQ (0) 57009
56877
IKSIWSVLLRNFEFELLDPVPEADYESMVIGPKPCRVRYTRRKL* 56743
newest data: version 3 checked April 24, 2006
Name: estExt_gwp_1H.C_70049
Protein ID: 126254
Location: Chlre3/scaffold_7:2481399-2484780
100% match
2481399
MDLPPELAVLADKVLSLSPVVLVALGSAVLILALAVGRVLFNLLPSKRPPVWEGLPFIGGLLKFTG (0) 2481596
2481680
GPWKLLENGYAKFGECFTVPVAHRRVTFLIGPEVSPHFFKAGDDEMSQSE (0) 2481829
2482129
VYDFNIPTFGRGVVFDVEQKVRTEQFRMFTEALTKNRLKSYVPHFNKEAE (0) 2482278
2482415
EYFAKWGETGVVDFKDEFSKLITLTAARTLL (1) 2482507
2482758
GREVREQLFDEVADLLHGLDEGMVPLSVFFPYAPIPVHFKRDR (2) 2482886
2483111
CRKDLAAIFAKIIRARRESGRREEDVLQQFIDAR (2) 2483212
2483404
YQNVNGGRALTEEEITGLLIAVLFAGQHTSSITTSWTGIFMAANK (0) 2483538
2483856
EHYNKAAEEQQDIIRKFGNELSFETLSEMEVLHRNITEALRMHPPLLLVMRYAKKPFSVTTSTGKSYVIPK (0)
2484068
2484332
GDVVAASPNFSHMLPQCFNNPKAYDPDRFAPPREEQNKPYAFIGFGAGRHACIGQNFAYLQ (0) 2484514
2484646
IKSIWSVLLRNFEFELLDPVPEADYESMVIGPKPCRVRYTRRKL* 2484780
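In the exon listing above, each exon's coordinate span appears to equal three nucleotides per listed residue, with the terminal * counted as the stop codon. A quick check of that relationship can flag mistyped coordinates. This is a sketch under that assumption (codons split by phase 1 or 2 introns seem to be assigned whole to one exon); the function name exon_span_matches is hypothetical.

```python
def exon_span_matches(start: int, end: int, peptide: str) -> bool:
    """True when the coordinate span covers exactly 3 nt per listed residue.

    Works for either strand, since only the absolute distance is used.
    A trailing '*' in the peptide is counted as the stop codon.
    """
    span = abs(end - start) + 1
    return span == 3 * len(peptide)

exon1 = "MDLPPELAVLADKVLSLSPVVLVALGSAVLILALAVGRVLFNLLPSKRPPVWEGLPFIGGLLKFTG"
last_exon = "IKSIWSVLLRNFEFELLDPVPEADYESMVIGPKPCRVRYTRRKL*"

print(exon_span_matches(2481399, 2481596, exon1))      # version 3 coordinates
print(exon_span_matches(60124, 59927, exon1))          # version 2, reverse order
print(exon_span_matches(2484646, 2484780, last_exon))  # final exon with stop
```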
>CYP55B1 C_2580005 (possible CYP55 fungal origin), 42% to 105T1
MAPQHD (1)
47793
FPFSRPKGVEPPAEYKELRSKCPVAPGRLFDGSKIWLISRHKELKEVLQDGRFSK 47629 (0)
47243
VRTLPGFPELSPGGKAAAQSGNAATFVDMDPPEHTKYRY 47127 (0)
missing about 20aa here ? seq gap
AKADKLVDAMIARGGPLDLNEAFSMPLPFR 46168
(0) (same intron loc. as 55A6)
45913
VIYDFIGIPEADFAYLSANVAVRSSGSSNAKDAAAAADDLVKYMDNL 45773 (0)
45601 VAEKERNPTGKDLISELVTKQ
45539 (0)
45264
LRPGHMTREQLVQTAFLMLVAGNATVATQINLGVISLLQHPDQ 45136 (0)
44693 LAAMKADPARLVPAATEEICRFHTGSSYALRRLAVADVQVDGQ
44565 (0)
44256
LVKKGEGIIALNQSANRDESVFPDPDRFDIHRQSNPQQ
44143 (0)
43755 VGFGYGTHVCVAEWLARAEIQVAIGTLFRRLPNLRLAVPESQIQYSDPARDVGLAALPVTW*
43573
newest data: version 3 checked April 24, 2006
Name: e_gwW.52.47.1
Protein ID: 121742
Location: Chlre3/scaffold_52:370660-375180
Note: gene model is too long at SMPLPFRVGGW, shorten by 4 amino acids
First exon is still my best guess, not in gene model e_gwW.52.47.1
51% to CYP55A5v1 Aspergillus oryzae
48% to CYP55A3 Cylindrocarpon tonkinense
42% to 105T1 Burkholderia fungorum (bacteria)
370660 MAPQH
(1) 370674
370738 DFPFSRPKGVEPPAEYKELRSKCPVAPGRLFDGSKIWLISRHKELKEVLQDGRFSK
(0) 370905
371291 VRTLPGFPELSPGGKAAAQSGNAATFVDMDPPEHTKYR (2) 371404
371628
GMVWPYLTPEAVEQLRPSIQ (0) 371677
372474 AKADKLVDAMIARGGPLDLNEAFSMPLPFR (0) 372563
372818 VIYDFIGIPEADFAYLSANVAVRSSGSSNAKDAAAAADDLVKYMDNL (0) 372958
373130 VAEKERNPTGKDLISELVTKQ (0) 373192
373467 LRPGHMTREQLVQTAFLMLVAGNATVATQINLGVISLLQHPDQ (0) 373595
374038 LAAMKADPARLVPAATEEICRFHTGSSYALRRLAVADVQVDGQ (0) 374166
374475 LVKKGEGIIALNQSANRDESVFPDPDRFDIHRQSNPQQ (0) 374588
374995 VGFGYGTHVCVAEWLARAEIQVAIGTLFRRLPNLRLAVPESQIQYSDPARDVGLAALPVTW*
375180
>CYP97A5 15 EXONS 60% TO 97A3 FIRST EXON PREDICTED BY GENSCAN
C_4260002
C_1040015
no mRNA or homology evidence for exon 1
note: CYP97A6 has homology to exon 2, but no upstream match for 5000bp
EST support = cyan BM003139 BI725954 BE441929 BI719213 CF555158
Gray resembles a cycad EST
13351 MPPDVSGNMLSFSTSISGCRF (1)
373428
GRSAARFLADLGRQWRAEASKRMPE
(0) 373502
12913
ARGDIREIVGQPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQILLTNADKYSKGLLSEILDFVMGT
12698 (0)
12532
GLIPADGEIWKARRRAVVPALHRK
12461
12332
YVMSMVDMFGDCAAHGASATLDKYAASG
12249
11994
TSLDMENFFSRLGLDIIGKAVFNYDFDSLAHDDPVIQ
11884
11707
AVYTLLREAEHRSTAPIAYWNIPGIQFV
11624
11493
VPRQKRCQEALVLVNECLDGLIDKCKKLV
11407
11269
EEEDAVFGEEFLSERDPSILHFLLASGDEISSKQ
(0) 11168
11003
LRDDLMTMLIAGHETTAAVLTWTLYLLSQHPEAAAAIRKE
(0) 10884
10681
VDELLGDRKPGVEDLRALK (0)
10625
10448
MTTRVINEAMRLYPQPPVLIRRALQ
10374
10118
DDHFDQFTVPAGSDLFISVWNLHRSPKLWDEPDKFKPER
10002
9580 FGPLDSPIPNEVTENFAYLPFGGGRRKCIGDQ 9485
9358 FALFEAVVALAMLMRRYEFNLDESKGTVGMTT 9263
9124 GATIHTTNGLNMFVRRRDPLTVPPTSSSVAETVSTGYAFACG
PAVMPVASAEVVAAPATAAGGGCPFHTAAGAAVPAATMSLRPTGPPSA*
8852
newest data: version 3 checked April 28, 2006
Name: gwH.55.10.1
Protein ID: 39257
Location: Chlre3/scaffold_55:373287-377786
This model differs from seq below at ends
100% match from ARGDIRE to DPLTVP
EST support = cyan BM003139 BI725954 BE441929 BI719213 CF555158
Gray resembles a cycad EST
scaffold_55 16 exons
373287
MPPDVSGNMLSFSTSISGCRF (1) 373349
373428
GRSAARFLADLGRQWRAEASKRMPE
(0) 373502
373725
ARGDIREIVGQPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQIL 373874
373875
LTNADKYSKGLLSEILDFVMGT
(0) 373940
374106
GLIPADGEIWKARRRAVVPALHRK
(2) 374177
374306
YVMSMVDMFGDCAAHGASATLDKYAAS
(1) 374386
374641
GTSLDMENFFSRLGLDIIGKAVFNYDFDSLAHDDPVIQ
(0) 374754
374931
AVYTLLREAEHRSTAPIAYWNIPGIQFV
(0) 375014
375145
VPRQKRCQEALVLVNECLDGLIDKCKKL
(0) 375228
375366
VEEEDAVFGEEFLSERDPSILHFLLASGDEISSKQ
(0) 375470
375635
LRDDLMTMLIAGHETTAAVLTWTLYLLSQHPEAAAAIRKE
(0) 375754
375957
VDELLGDRKPGVEDLRALK (0)
376013
376190
MTTRVINEAMRLYPQPPVLIRRALQ
(0) 376264
376520
DDHFDQFTVPAGSDLFISVWNLHRSPKLWDEPDKFKPER
(2) 376636
377058
FGPLDSPIPNEVTENFAYLPFGGGRRKCIGDQ
(0) 377153
377280
FALFEAVVALAMLMRRYEFNLDESKGTVGMTT
(1) 377375
377514
GATIHTTNGLNMFVRRRDPLTVPPTSSSVAETVSTGYAFACGPAVMPVAS
377663
377664
AEVVAAPATAAGGGCPFHTAAGAAVPAATMSLRPTGPPSA*
377786
>CB092428.1 hf05f08.g1 Cycad Leaf Library (NYBG) Cycas rumphii cDNA clone hf05f08, mRNA sequence.
Length=609
This seq supports the second and third exons above.
Query 40
GRSAARFLADLGRQWRAEASKRMPEVRLELRPCDGGGRASCPVLGKSTYTARGDIREIVG 99
GR+ A+ +A ++WRA + +MPE
ARG++R + G
Sbjct 383 GRALAKSIAVAEQKWRAHNASKMPE-------------------------ARGNVRAVAG 487
Query 100 QPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQIL 139
QP FVPLY LFL YG +FRL+FGPKSFVI+SDPA AK IL
Sbjct 488 QPFFVPLYNLFLTYGGVFRLTFGPKSFVIVSDPAIAKHIL 607
VVQCAGQAGIRPGFEARAIAWPRCVFVSAKTRGFRLNKRVSNDFLGRQLTIKSFSNRQRG
GKIRAATVSSLNEGGGGNEPAVERVERLTEEDRAELSVRIAAGEFTAEPVTLNLLKIRLF
LIKFGAP
GRALAKSIAVAEQKWRAHNASKMPEARGNVRAVAGQPFFVPLYNLFLTYGGVF
RLTFGPKSFVIVSDPAIAKHIL
volvox matches
>ABSY36486.y1 CHROMAT_FILE: ABSY36486.y1 PHD_FILE: ABSY36486.y1.phd.1 CHEM: term DYE: ET TIME: Fri Sep 5
Query: 22  GRSAARFLADLGRQWRAEASKRMPE 46
           GR  ARFLADLGR+WR+EA+KRMPE
Sbjct: 240 GRPVARFLADLGRRWRSEAAKRMPE 314
>ABSY25604.b1 CHROMAT_FILE: ABSY25604.b1 PHD_FILE: ABSY25604.b1.phd.1 CHEM: term DYE: big TIME: Tue Sep 16 11:06:39 2003
Length = 1069
Query: 46  EARGDIREIVGQPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQILLTNADKYSK 105
           +ARGDIREIVGQPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQILLTNADKYSK
Sbjct: 340 QARGDIREIVGQPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQILLTNADKYSK 519
Query: 106 GLLSEILDFVMGT 118
           GLLSEILDFVMGT
Sbjct: 520 GLLSEILDFVMGT 558
>CYP97A6
C_310063 missing exon 1
(0) VRVPLNNVGKVPIFQLLYELYSS (2)
(2) HGGVFRMRLGPKSFLVLSDPGAVRQVLVGAVDKYS (2)
9247
KGILAEILEFVMGN (0) 9306
seq gap missing 2 exons
9705
XSVDMESFFSRLSLDIIGKSVFDYDFDSLRHDDPVIQ 9812
10081
AVYSVLRESTVRSTAPFP 10128 (1)
10371
YWKLPGISLLVPRLRESDAALAIVNDTLDRLIARCKSM 10487 (0)
LEAEGSIPMPASPSSPSSSTATSSSAPSSPSAPLEESSA
10853
PTVLHFLLGSGEALNSRQLRDDLMTLLIAGHETTAAV 10963
11275
LTWALHLLVAHPEVMKRVRDE 11277
11605
VDWVLGDRLPGSDDLPLLRYTTRVVNEALRLYPQPPVLIRRAMQ 11736
11956
DDVLPGGHVVAAGTDLFISVWNLHHSPQLWERPEAFDPDR 12075
12251
FGPLDSPPPTEFSTDFRFLPFGGGRRKCVGDMFAIAECVVALAVVLRRYDFAPDTSFGPVGFKS 12442
12584
GATINTSNGLHMLISRRDLT 12643
12644
GVPPPAPRAPAAAAGAAAGSCPHAAAAAATAAAAAAVGCPHAAAAATSGAPAGVTP 12811
newest data: version 3 checked April 27, 2006
Name: e_gwW.42.59.1
Protein ID: 121076
Location: Chlre3/scaffold_42:732596-737181
100% to e_gwW.42.59.1 from VRVPL to MLISRR
scaffold_42
cannot identify exon 1
732596
VRVPLNNVGKVPIFQLLYELYSS
(2) 732667
733002
HGGVFRMRLGPKSFLVLSDPGAVRQVLVGAVDKYS
(2) 733106
733345
KGILAEILEFVMGN (0) 733386
733631
GLLAADGEHWIARRRVVAPALQRK (2) 733702
733949
FVSSQVALFGAATAHGLPQLEAAAAAAAAAAGDSRGGGA 734065
734066
ASVDMESFFSRLSLDIIGKSVFDYDFDSLRHDDPVIQ (0) 734176
734445
AVYSVLRESTVRSTAPFP (1) 734498
734738
YWKLPGISLLVPRLRESDAALAIVNDTLDRLIARCKSMVGRCCGGGGGGGGG (0) 734893
SSAPTVLHFLLGSGEALNSRQLRDDLMTLLIAGHETTAA
(0) 735324
735636
ALTWALHLLVAHPEVMKRVRDE (0) 735701
735969
VDWVLGDRLPGSDDLPLLRYTTRVVNEALRLYPQPPVLIRRAMQ (0) 736100
736320
DDVLPGGHVVAAGTDLFISVWNLHHSPQLWERPEAFDPDR (2) 736439
736615
FGPLDSPPPTEFSTDFRFLPFGGGRRKCVGDMFAIAECVVALA
VVLRRYDFAPDTSFGPVGFKS (1) 736806
736948
GATINTSNGLHMLISRRDLTGGVPPPAPRAPAAAAGAAAGSCPHAAAAAATAAAAAA
VGCPHAAAAATSGAPAGVTPQ* 737181
54% to DY932408.1 plains sunflower Helianthus petiolaris
MAASLTTLQFPSPYLNTPTTKFKLKSPSTSFPKSYGVSRSCGIKCSYSNGRKPD
SGEEKSGKKVEMTPEEKRRAELSARIASGAFTVEQPSLGSLLVSGLAKLGVPSNILEPVS
NLINSGGNYPKIPEAKGAISAIRSEAFFIP
LYELFLTYGGIFRLTFGPKSFLIVSDPNIA
KHILKDNAKAYSKGILAEILEFVMGTGLIPADGEVWRVRRRVIVPALHLKYVAAMIGLFG
EATDRLCKKLDDAAYNGEDVEMESLFSRLTLDIIGKSVFNYDFDSLD
>CYP97B6 on top of gene model C_410095 but annotation is in the wrong frame
strongly suspect ARGN... is N-term part of CYP97B6, but no proof
compare to 97A5 exon 2.
ESTs BI996334.1 AV390436.1
ALIAHKTLLQLY
ARGNIREIVGQTATVPLNKLFLVYVQIFRVSFRPRASGSSLSPHDAKEILRTNADKYSMGLLTKILDLVMST
64% identical to 97A5 exon 2 but not in ver 3 of genome
on the Bac ends from ver 2
PTQ4692.y1 CHROMAT_FILE: PTQ4692.y1 PHD_FILE: PTQ4692.y1.phd.1
This is probably a real exon 2 of a CYP97 like seq
HE
479653 DMESEFLSLGLDIIGLGVFNFDFGSINSESPVIK
479552
479264 AVYGVLKEAEHRSTFYLPYWNLPLADVLVPRQAKFR
479154
ADLKVINECLDNLIKQARDTRVAEDAEALQNRDYSKVSDPSLLRFLVD
MRGEEPTNKQLRDDLMTMLIGGHETTAAV
(44 aa sequence gap up to EXXR)
CLGESLRMY
477871
PQPPILIRRALAEDTLPAGLRGDPAGYPIGKGADLFISVWNLHR 477740
477549 SPYLWKDPDTFRPERFFEPN 477484
SNPDFGGKWAGYRPDAVTGGAALY
PNEVASDFAFIPFGGGARKCVGDQFAMFEATVAAAMLLRRFTFRLAVPAEKV
(1?)
476620 GMATGATIHTANGLSMRVTRRTP 476552
SGGSGSGAPGAAAKVPATV*
>PTQ4692.y1 CHROMAT_FILE: PTQ4692.y1 PHD_FILE:
PTQ4692.y1.phd.1 CHEM: unknown DYE: unknown TIME: Thu Jan 10 11:26:57 2002
TEMPLATE: PTQ4692 DIRECTION: rev
= trace file 334400148
no other trace files match, may have errors
TCTATGTGACCTATACAAACTCTCGCTTGGCGAGACCTGGAGGATCACTC
CAGTCTGGCGAATTCGCGGACTCGGGCTCGAAAAAGAGATTGGACTCGAT
CCCTGTCGCCAAGTGCTGAGGAAGGATCCGCTTGTTGGCGATGCAAATTG
CAAAAACGGAATTCAGGAAGCGGAGCGCACGCACTAGATGCCTCCACATG
ACACCGGTAATATGATGACCATTTCAACATAGCATATCACGATGCCGATA
TGGGTGCTGTGCATGACCGACCTTTGGACCAGGGGGTGCCCCATCGTCCA
CGCCCAACTGCCTGCTTGGCTCTGACACAGGACGGTCTGCAGCTCGCTTC
CTGGTGGACTTGGGCCGCCACCGGCGTGCCAAGGCCACAAAGCGCATGCC
TGACGTGAGGTTATAGCTGCGGACCTGCTGACGGCGGTGGGCAGATGCAG
CTGCCCGGTAACTGGGCAAATCCACGTATACTGCATGGTGTGCAATGCAT
GGGGCGTCAGTATACTTGTAAAGGGTGTACTCTCACCTATCAGTGGGCTC
ATATGACCGGGGCCTGCGACTCCGTCCTGAAATCGACAAAAAGCTAGCGC
CCTTGATTGCCCACAAAACTCTCTTGCAATTGTACGCACGCGGCAACATA
CGGGAGATTGTGGGCCAAACAGCGACTGTGCCGCTGAACAAACTGTTCCT
GGTGTACGTGCAGATCTTCCGGGTGTCTTTCCGGCCCAGAGCTTCTGGAT
CATCTCTGAGCCCGCATGATGCGAAGGAGATCCTGCGCACGAACGCTGAC
AAGTACAGCATGGGGCTGCTCACGAAGATCCTGGATCTCGTGATGAGCAC
GCACGGTGCGCGTTGC
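The candidate exon 2 peptide quoted earlier (ARGNIREIVGQ...) can be recovered from this trace by conceptual translation. Below is a minimal sketch using the standard genetic code; the function name translate and the use of X for incomplete or ambiguous codons are illustrative choices, and the reading frame has to be supplied by the caller.

```python
# Standard genetic code, laid out in TCAG order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA in the given forward frame (0, 1 or 2).

    Whitespace is removed first, so a sequence pasted across several
    lines can be passed in directly. Unknown codons become 'X'.
    """
    dna = "".join(dna.split()).upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

# Two stretches copied from the trace above.
print(translate("ATTGCCCACAAAACTCTCTTGCAATTGTAC"))
print(translate("GCACGCGGCAACATACGGGAGATTGTGGGCCAA"))
```

The two stretches translate to IAHKTLLQLY and ARGNIREIVGQ, which agree with the peptide lines given for this gene.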
newest data: version 3 checked April 27, 2006
Name: e_gwW.1.53.1
Protein ID: 116601
Location: Chlre3/scaffold_1:2256360-2261776
Green supported by identical ESTs
Gray supported by related ESTs, but not identical
Two small gaps and the N-term are missing
Note: yellow region is out of order, but supported by an EST
This seq agrees with model at FIDS to PAFH, GSAVV to AKFR, LEDL to VTRR
Seq gap here
AV390436.1
BI996334.1
2261776 FIDSGGVYKLVFGPKAFIVVSDPVVVRHILK
(0) 2261684
2261461 ENAFNYDKGVLAEILEPIMGKGLIPADLETWKVRRRAVVPAFHK
(2) 2261331
lyleamvkvfsdcsekmilkseklireketssgedtiel
Arabidopsis
2259284
(0) GSAVVDMESEFLSLGLDIIGLGVFNFDFGSINSESPVIK
(0) 2259168
2258877
AVYGVLKEAEHRSTFYLPYWNLPLADVLVPRQAKFR
(2) 2258770
ADLKVINECLDNLIKQARDTRVAEDAEAL
2263466 QNRDYSKVSDPSLLRFLVDMRGEEPTNKQLRDDLMTMLIAGHETTAAV 2263323
LTWAMFCLVQ (0)
ntdklvkaqaeidtildqrkp Ginkgo
(1) SLEDLKAMPYLRA
2257791
CLGESLRMYPQPPILIRRALAEDTLPAGLRGDPAGYPIGKGADLFISVWNLHR
(2) 2257633
2257436
SPYLWKDPDTFRPERFFEPNSNPDFGGKWA
(1) 2257347
2256904
GYRPDAVTGGAALYPNEVASDFAFIPFGGGARKCVGDQFAMFEATVAAA
2256758
2256757
MLLRRFTFRLAVPAEK (0)
2256710
2256491
VGMATGATIHTANGLSMRVTRRTPSGGSGSGAPGAAAKVPATV* 2256360
>CYP97C3 C_1340038 RUNS OFF END 70% to 97C1
44288
VPLGQDVMISVYNIHHSPAVWDDPE (0) 44214
43839
AFIPERFGPLDGPVPNEQNTDFR 43777 (2)
43352
YIPFSGGPRKCVGDQFALMEAVVALTVLLRQYDFQMVPNQQ (0) 43227
42864
IGMTTGATIHTTNGLYMYVKER 42799
GAAASGSSGVAGGKQLAAA*
Name: e_gwW.64.11.1
Protein ID: 122396
Location: Chlre3/scaffold_64:422589-430105
e_gwW.64.11.1 has an internal seq between EEMRAA and VPVGQD that is not right
This seq agrees with model from DIKE to EEMRAA, VPVG to YMYV
>e_gwW.64.11.1 [Chlre3:122396] green parts look right compared to Arab.
The first exon shown matches a volvox seq.
The true N-term is not identified.
assembled pieces
422589
GKNIDSKGAGTSFTSPGWLTQLNMLWGGKSVS (0) 422684
(0)
NVPVANAQPA
423126DIKELLGGALFKALYKWMQESGPIYLLPTGPVSSFLVVSD
PAAAKHVLRSTDNSQRNIYNKGLVAE (0)
VSEFLFGKGFAISGGDAWKARRRAVGPSLHK (2)
AYLEAMLDRVFGASSLFAADKLRKAAAEGTPVNMEALFSQLT
LDIIGKSVFNYDFNSLTSDSPVIQAVYTALKETEQRATDLLPLWKVPGIGWLIPRQRKALEAVELIRKTT
NDLIKQCKEMVDEEEMRAASAAAAA (1)
(1) GTEYLIEAVPSVLRLLIPERAEVDSTQ (chlamy AFWX153863.b3 with frameshift DST/QLRDD)
LRDDLLSMLVAGHETT (1)
(1) APLTWTLYLLVNNPNKMYAP (0)
390458 (0) AEVDAVLGSRLSPTMADYGQLRYVMRCVNESMRLYPHPPVLLRRALVEDELPGGFK
(0) 390625
428555
(0) VPVGQDVMISVYNIHHSPAVWDDPE
(0) 428629
429002
AFIPERFGPLDGPVPNEQNTDFR ()429070
429495
YIPFSGGPRKCVGDQFALMEAVVALTVLLRQYDFQMVPNQQ
(0) 429617
429980
IGMTTGATIHTTNGLYMYVKERGAAASGSSGVAGGKQLAAA*
430105
NVPVANAQPA is from trace file 658821390
422589
GKNIDSKGAGTSFTSPGWLTQLNMLWGGKSVS (0) 422684
(0)
NVPVANAQPA
423126DIKELLGGALFKALYKWMQESGPIYLLPTGPVSSFLVVSD
PAAAKHVLRSTDNSQRNIYNKGLVAE (0)
VSEFLFGKGFAISGGDAWKARRRAVGPSLHK (2)
AYLEAMLDRVFGASSLFAADKLRKAAAEGTPVNMEALFSQLT
LDIIGKSVFNYDFNSLTSDSPVIQAVYTALKETEQRATDLLPLWKVPGIGWLIPRQRKALEAVELIRKTT
NDLIKQCKEMVDEEEMRAA (2)
Trace 335863205 continues seq (no match)
(seq gap)
(0) LRDDLLSMLVAGHETT (1)
Trace file 650467013 matches mid region of gap in e_gwW.64.11.1
This seq has 436 (1) APLTWTLYLL (0) = 97C like seq
390464 (0) VDAVLGSRLSPTMADYGQLRYVMRCVNESMRLYPHPPVLLRRALVEDELPGGFK
(0) 390625
This fragment matches scaffold 64 from 390464 to 390625
Misassembled 97C3 seq
428555
(0) VPVGQDVMISVYNIHHSPAVWDDPE
(0) 428629
429002
AFIPERFGPLDGPVPNEQNTDFR ()429070
429495
YIPFSGGPRKCVGDQFALMEAVVALTVLLRQYDFQMVPNQQ
(0) 429617
429980
IGMTTGATIHTTNGLYMYVKERGAAASGSSGVAGGKQLAAA*
430105
(0) dymndsdpsvlrfliaareevdstq (volvox trace 636376981)
blast of Chlamy unplaced reads with Physcomitrella 97C seq
>SYF31892.y1 CHROMAT_FILE: SYF31892.y1 PHD_FILE: SYF31892.y1.phd.1 CHEM: term DYE: ET TIME: Mon May 20 17:26:52 2002 TEMPLATE: SYF31892 DIRECTION: rev
Length = 786
Score = 76.5 bits (163), Expect = 7e-14
Identities = 30/46 (65%), Positives = 40/46 (86%)
Frame = +1
Query: 31  QLRDDLLSMLVAGHETTGSVLTWTVYLLSKNPAALAKVHEELDRVL 76
           QLRDDL++ML+AGHETT +VLTWT+YLLS++P A A + +E+ RVL
Sbjct: 196 QLRDDLMTMLIAGHETTAAVLTWTLYLLSQHPEAAAAIRKEVRRVL 333
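The Identities figure in a BLAST record like the one above can be recomputed directly from the aligned Query and Sbjct strings. This is a minimal sketch for equal-length aligned strings; the function name identities is a hypothetical helper, and gap characters are assumed to be written as "-".

```python
def identities(query: str, subject: str):
    """Count identical aligned positions; returns (matches, alignment length).

    Both strings must already be aligned (same length). Positions where
    both strings carry a gap character are not counted as matches.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return matches, len(query)

query = "QLRDDLLSMLVAGHETTGSVLTWTVYLLSKNPAALAKVHEELDRVL"
sbjct = "QLRDDLMTMLIAGHETTAAVLTWTLYLLSQHPEAAAAIRKEVRRVL"
matches, length = identities(query, sbjct)
print(f"Identities = {matches}/{length} ({round(100 * matches / length)}%)")
```

For this pair the count is 30/46 (65%), the same value BLAST reports for the hit.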
>AFWX152107.b2 CHROMAT_FILE: AFWX152107.b2 PHD_FILE: AFWX152107.b2.phd.1 CHEM: term DYE: big TIME: Mon Nov 1 12:27:19 2004
Length = 1012
100% match to 97C1 Arab.
Query: 31 QLRDDLLSMLVAGHETTG 48
          QLRDDLLSMLVAGHETTG
Sbjct: 15 QLRDDLLSMLVAGHETTG 68
GCAGGTGGATCACGCAGCTGCGCGACGACCTGCTGTCCATGCTGGTGGCGGGACACGAGACCACGGGTGAGGGGGGGCGGGGGCAGGGGCTTGTGCCGGCCACCCGTTAT
This matches trace file 587324724 in Chlamy
Also matches 636376981 in volvox
Use the volvox seq to look upstream
The volvox seq matches Physcomitrella 97C5 with no intron
Query: 317 QDYMNDSDPSVLRFLIAAREEVDSTQLRDDLLSMLVAGHETT 442 volvox
           ++Y+N+SDPSVLRFL+A+REEV S QLRDDLLSMLVAGHETT
Sbjct: 372 EEYVNESDPSVLRFLLASREEVSSVQLRDDLLSMLVAGHETT 413 moss 97C5
volvox 97C like I-helix
>ABSY190514.b1 CHROMAT_FILE: ABSY190514.b1 PHD_FILE: ABSY190514.b1.phd.1 CHEM: term DYE: big TIME: Fri Nov 28 22:26:28 2003
Length = 1327
Score = 35.8 bits (73), Expect = 0.033
Identities = 16/40 (40%), Positives = 26/40 (65%)
Frame = -1
Query: 18  LLASREEVSSVQLRDDLLSMLVAGHETTGSVLTWTLYLLS 57
           LL + + +S+ +LR +   +LVAG ETTG  + W+L  L+
Sbjct: 532 LLITGKPLSAKRLRCETAFLLVAGFETTGHGIAWSLLFLA 413
Volvox I-helix region 97B like seq
>ABSY134624.g1 CHROMAT_FILE: ABSY134624.g1 PHD_FILE: ABSY134624.g1.phd.1 CHEM: term DYE: big TIME: Fri Nov 28 22:55:34 2003
Length = 1303
Score = 32.6 bits (66), Expect(2) = 0.001
Identities = 12/19 (63%), Positives = 18/19 (94%)
Frame = -3
Query: 23  EEVSSVQLRDDLLSMLVAG 41
           E+V++ QLRDDL++ML+AG
Sbjct: 137 EDVTNKQLRDDLMTMLIAG 81
Score = 22.8 bits (44), Expect(2) = 0.001
Identities = 9/14 (64%), Positives = 11/14 (78%)
Frame = -3
Query: 11  DPSILRFLLASREE 24
           DPS+LRFL+  R E
Sbjct: 176 DPSLLRFLVDMRGE 135
Volvox EXXR region for 97C
>ABSY52309.x1 CHROMAT_FILE: ABSY52309.x1
Query: 21  PTIQDMKKLKYTTRVMNESLRLYPQPPVLIRRSIDNDIL 59
           PT+ D  +L+Y  R +NES+RLYP PPVL+RR++  D L
Sbjct: 168 PTLADYGQLRYVMRCVNESMRLYPHPPVLLRRALVEDEL 284
(0)
VESVMGSRTAPTLAD
YGQLRYVMRCVNESMRLYPHPPVLLRRALVEDELPGGYK
(0)
GTGGAGTCCGTGATGGGCAGCCGTACCGCCCCCACCCTGGCGG
ACTACGGCCAGCTGCGGTACGTGATGCGCTGTGTGAACGAGTCCATGCGGCTCTACCCGC
ACCCGCCCGTGCTGCTGAGGAGGGCGCTGGTGGAGGACGAGCTGCCGGGGGGCTACAAG
This volvox DNA matches trace files for Chlamydomonas 90%
336308963, 335368868, 335328342
(0)
VDAVLGSRLSPTMADYGQLRYVMRCVNESMRLYPHPPVLL
RRALVEDELPGGFK (0)
This fragment matches scaffold 64 from 390464 to 390625
Misassembled 97C3 seq
N-term region
>gi|93288035|dbj|BW989539.1| BW989539 Chamaecyparis obtusa cambium and surrounding tissues Chamaecyparis obtusa cDNA clone CO02636 5', mRNA sequence.
Length=565
Score = 85.9 bits (211), Expect = 7e-16
Identities = 41/78 (52%), Positives = 54/78 (69%), Gaps = 1/78 (1%)
Frame = +3
Query 5
DSKGAGTSFTSPGWLTQLNMLWGGKSVSNVPVANAQPADIKELLGGALFKALYKWMQESG 64
D GAG S+ SP WLT + G S
+P+ANA+ D+K+LLGGALF L+KWM+ESG
Sbjct 234 DKAGAGLSWVSPDWLTSFMKMRTGPDESGIPMANAKLDDVKDLLGGALFLPLFKWMKESG 413
Query 65 PIYLLPTGPVSSFLVVSD 82
P+Y L GP +F+V+SD
Sbjct 414 PVYRLAAGP-RNFVVISD
464
ISPSLPSITSNVAVSLPKQSTRKKKTRLLRIQCRVDEKSTSTDKAGAGLSWVSPDWLTSF
MKMRTGPDESGIPMANAKLDDVKDLLGGALFLPLFKWMKESGPVYRLAAGPRNFVVISDP
EAAKHVLRNYGKYGKGLVSEVSQFLFGSGFAIAEGELWMVRRKAVLPSIHRKYLSVMVDR
VFCKCAERLVEKLNRDTEMAVEVNME
volvox
>ABSY209455.b1 CHROMAT_FILE: ABSY209455.b1 PHD_FILE: ABSY209455.b1.phd.1 CHEM: term DYE: big TIME: Fri Nov 28 23:45:44 2003
Length = 1108
Query: 25  GKNIDSKGAGTSFTSPGWLTQLNMLWGGKSVS 56
           GK+ID+ GAG SFTSPGWLTQLNMLWGGK VS
Sbjct: 236 GKSIDAAGAGASFTSPGWLTQLNMLWGGKGVS 331
Volvox
>ABSY179960.b1 CHROMAT_FILE: ABSY179960.b1
Query: 61  EAVELIRKTTNDLIKQCKEMVDEEEMRAA 89
           +AVELIR+TTNDLI++CKEMVDEEE  AA
Sbjct: 443 KAVELIRQTTNDLIRKCKEMVDEEEREAA (1) 357 agt
>CX541939.1|
s13dNF0BH03GS032_467186 Germinating Seed Medicago truncatula
Query 1
LDIIGKSVFNYDFNSLTSDSPVIQAVYTALKETEQRATDLLPLWKVPGIGWLIPRQRKAL 60
LD+IG SVFNY+F++L SDSPVI+AVYTALKE E R+TDLLP WK+ + +IPRQ KA
Sbjct 120 LDVIGLSVFNYNFDALNSDSPVIEAVYTALKEAEARSTDLLPYWKIDFLCKIIPRQIKAE 299
Query 61 EAVELIRKTTNDLIKQCKEMVDEEEMR 87
AV +IRKT DLI+QCKE+V+ E R
Sbjct 300 NAVTVIRKTVEDLIEQCKEIVESEGER 380
SIMVDRVFCKCAERLVEKLQADAVNGTAVNMEDKFSQLTLDVIGLSVFNYNFDALNSDSP
VIEAVYTALKEAEARSTDLLPYWKIDFLCKIIPRQIKAENAVTVIRKTVEDLIEQCKEIV
ESEGERIDADEYVNDADPSILRFLLASREEVSSVQ
>CYP710B1 C_1540014 10 EXONS 43% to 710A1 exon 1 predicted by genscan.
EST SUPPORT BI719962.1 There are two possible start codons 15aa apart.
20577 MNATGLLNDGLASLG
MSGFGDNLASGPALVAAGGALALGYALWEQMKFRWYRSDKNGNMLP (1) 20356
20000 GPASVTPIIGGIVEMVKDPYGFWERQRLYSFP
19905
19904
GMSWNSIVGIFTVFVTDPALSRYVFSHNSSDSLLLALHPN (1) 19785
19644 AEWILGKTNIAFMSGPEHKALRKSFLALFTRKALGLYVLKQDDVIRKHFNEWMQ
(0) 19498
19355 TAGPREIRPFIRDLNAYTSQEVFVGPYLDDPT (0) 19269
18917 EREKFSDAYRAMTDGFLAFPLLLPGTGVWKGRQGRQFIVK (0) 18802
18583 TLTRAAARSKVRMAAGQEPECLLDFWTKQ (0) 18497
18215 ILSDIKDAADAGQEAPFYADDKKIAETVMDFLFASQDASTASLVWTITLMAEHPEVLAR
(0) 18012
17722 VRDEQYRLRPNPEEKVTGDMLNEMHYTRQVVKEILRFRPAAPMVPMRAKAPFKLTETYTAPKGALIVPSLVAACKQ
17456 (0)
17279
GYSNPDSFDPDRFSPERAEDIKYASNFLVFGHGPHYCVGKE 17155 (0)
16995
YAMNHLTVFLALLATSLDFPRIRSKVSDDIIYLPTLYPGDSIFDLSWSAKK* 16840
newest data: version 3 checked April 30, 2006
Name: estExt_gwp_1H.C_660048
Protein ID: 132687
Location: Chlre3/scaffold_66:390953-394690
394690 MNATGLLNDGLASLG
394645
MSGFGDNLASGPALVAAGGALALGYALWEQMKFRWYRSDKNGNMLP (1) 394505
394113 GPASVTPIIGGIVEMVKDPYGFWERQRLYSFP
GMSWNSIVGIFTVFVTDPALSRYVFSHNSSDSLLLALHPN (1) 393898
393757 AEWILGKTNIAFMSGPEHKALRKSFLALFTRKALGLYVLKQDDVIRKHFNEWMQ
(0) 393596
393468 TAGPREIRPFIRDLNAYTSQEVFVGPYLDDPT (0) 393373
393030 EREKFSDAYRAMTDGFLAFPLLLPGTGVWKGRQGRQFIVK (0) 392911
392696 TLTRAAARSKVRMAAGQEPECLLDFWTKQ (0) 392610
392328 ILSDIKDAADAGQEAPFYADDKKIAETVMDFLFASQDASTASLVWTITLMAEHPEVLAR
(0) 392152
391835 VRDEQYRLRPNPEEKVTGDMLNEMHYTRQVVKEILRFRPAAPMVPMRAKAPFKLT
ETYTAPKGALIVPSLVAACKQ 391608
(0)
391392
GYSNPDSFDPDRFSPERAEDIKYASNFLVFGHGPHYCVGKE 391270 (0)
391108 YAMNHLTVFLALLATSLDFPRIRSKVSDDIIYLPTLYPGDSIFDLSWSAKK*
390953
>CYP737A1 C_470024
I cannot detect the N-terminal sequence for this gene (about 100 aa).
13432
(2) SWPAATVAMLGTDSVTFST 13379 (1)
13145
GAYHRSLRRLLGPCFSPQ 13092 (0) C-helix
12878
AVEGYLPSIQAICERYCAEWAAETTAAAAAAAPAATGGDSSAVIEQLPKLQKG (0)
ARMLTFEVMSHVVAGFHFSPQQLASLSDAFDVFVRGIFAPVALAIPGS 12322 (1)
12098
NYAKASAARKVMVAALTQQLELLKGGSGGGGNGGGANGGGDGDS
(0)
DLAINLLFAGHETTATSIVRLML (0)
VLRSRPDVVSRLREEQAAAVRQHGAAIS (1)
10590
GSSIRDMPYLDAVVKETWRCHPVVPMVPRRAVRDFTLGGHDVPQ (0)
GWGVVLGLVEPMRDLPAWSGLTPDSPLHPSHFNPDR (2)
WLSGRSSASGNSSNSASSSAL
QQQDGTATADGDDVASAAAAASVGGGGGAAGSGTLSSPM
GMLPPQMLTFGGGGRYCLGANLAWAELK (0)
VFVAVLLRGYDFTSPLPELEVKLFPALTVAQGFPIE (0)
VRAR*
newest data: version 3 checked April 30, 2006
Name: Chlre2_kg.scaffold_41000082
Protein ID: 151890
Location: Chlre3/scaffold_41:636238-640632
640648
(2) SWPAATVAMLGTDSVTFST 640592 (1)
640358
GAYHRSLRRLLGPCFSPQ 640305 (0) C-helix
640091
AVEGYLPSIQAICERYCAEWAAETTAAAAAAAPAATGGDSSAVIEQLPKLQKG (0) 639933
639681 ARMLTFEVMSHVVAGFHFSPQQLASLSDAFDVFVRGIFAPVALAIPGS (1) 639538
639314
NYAKASAARKVMVAALTQQLELLKGGSGGGGNGGGANGGGDGDS
(0) 639183
638938
DLAINLLFAGHETTATSIVRLML (0) 638870
638116
VLRSRPDVVSRLREEQAAAVRQHGAAIS (1) 638033
637803
GSSIRDMPYLDAVVKETWRCHPVVPMVPRRAVRDFTLGGHDVPQ (0) 637672
637263
GWGVVLGLVEPMRDLPAWSGLTPDSPLHPSHFNPDR (2) 637156
636987
WLSGRSSASGNGSSNSASSSAL
QQQDGTATADGDDVASAAAAASVGGGGGAAGSGTLSSPM
GMLPPQMLTFGGGGRYCLGANLAWAELK (0) 636721
636345
VFVAVLLRGYDFTSPLPELEVKLFPALTVAQGFPIE (0) 636238
635814
VRAR* 635800
>CYP738A1
C_570052 a member of the CYP85 clan
There is a problem between exons 3 and 4.
In almost all members of the CYP85 clan (CYP85, CYP707, CYP90, etc.) there are 28 amino acids between TVM and LVG.
In this gene there is no way to accomplish this spacing. I suspect an error.
The yellow sequence can be inserted if a T to A change occurs at 78905, creating an AG boundary, but the sequence is still 5 aa short. Need an EST.
78090
MRSSSRGAKIGRAYPTAHHIDGRASGGRPLHFGLHPCHRPCLRAKAAQSGLAE
LPLPEGSLGLPVVGETLELITN (1) 78317
78475
GDTFGTSRRERYGDVYKTNILGAPTVM
78555 (0)
78907
VAAPMARRYACICFRFSCQVTST
78976
LVGPDSLNLLTGPRHGAVKRALSDAFADRALRRHVPAIAELVQ 79104 (0)
AVFDRVVLGGAGSRDRAAQLQAVMSALQAGFNTPPVQLPFT
(1)
79935
AYGKAVAARQEFGQLVSQSIQRSRQHTAASAT 80030
VSVSPSSAPAFDCAMSDVVAAAAAAAATGTALPDSLLVDNAAAAFFGNAST
GPSLAKALQHLATNAAGPNGGATGGVMAALRQEQ (0)
DIVSRHGPAITAEALDEMSYGTAVARELLRITPAVPAVFRLALVDFELQGRRIPK 80709 (0)
81002
GWRVWCHVGDSVTRYNKDQFQPERWLGSSG 81091 (1)
MAAGGCPMHAGGGGAARGA
81230
QPEYSLPFGSGVRTCLGRNLVMTELLVVLAVLARGYEWEAVNPAEQWGVVPSPAPKEGLRVRLHRRL*
81433
newest data: version 3 checked April 30, 2006
Name: fgenesh2_pg.C_scaffold_6000379
Protein ID: 167934
Location: Chlre3/scaffold_6:2860971-2865055
2864314
MRSSSRGAKIGRAYPTAHHIDGRASGGRPLHFGLHPCHRPCLRAKAAQSGLAE
LPLPEGSLGLPVVGETLELITN
(1) 2864090
2863929
GDTFGTSRRERYGDVYKTNILGAPTVMVYGE
(0) 2863837
2863692
DAVRAVLAAEDRLVASDWPQ (0)
2863631
2863440
VTSTLVGPDSLNLLTGPRHGAVKRALSDAFADRALRRHVPAIAELVQ (0) 2863300
2862803 AVFDRVVLGGAGSRDRAAQLQAVMSALQAGFNTPPVQLPFT (1) 2862681
2862469
AYGKAVAARQEFGQLVSQSIQRSRQHTAASAT
VSVSPSSAPAFDCAMSDVVAAAAAAAATGTALPDSLLVDNAAAAFFGNAST
GPSLAKALQHLATNAAGPNGGATGGVMAALRQEQQ (0) 2862116
2861859
DIVSRHGPAITAEALDEMSYGTAVARELLRITPAVPAVFRLALVDFELQGRRIPK (0) 2861695
2861402
GWRVWCHVGDSVTRYNKDQFQPERWLGSSG (1) 2861313
2861231
MAAGGCPMHAGGGGAARGAQPEYSLPFGSGVRTCLGRNL
VMTELLVVLAVLARGYEWEAVNPAEQWGVVPSPAPKEGLRVRLHRRL* 2860971
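The spacing argument in the CYP738A1 note (28 amino acids expected between TVM and LVG in CYP85 clan members) can be checked mechanically. The following is a minimal sketch and not part of the original annotation: the helper name `motif_spacing` is hypothetical, and the exon strings are copied from the older and the version 3 entries for this gene.

```python
def motif_spacing(protein, left="TVM", right="LVG"):
    """Count residues between the end of `left` and the start of `right`.

    Returns None if `left` is absent or `right` does not occur after it.
    """
    i = protein.find(left)
    if i < 0:
        return None
    after_left = i + len(left)
    j = protein.find(right, after_left)
    if j < 0:
        return None
    return j - after_left


# Older assembly: exon ending in TVM, the candidate inserted sequence,
# and the exon starting with LVG.
older = (
    "GDTFGTSRRERYGDVYKTNILGAPTVM"
    + "VAAPMARRYACICFRFSCQVTST"
    + "LVGPDSLNLLTGPRHGAVKRALSDAFADRALRRHVPAIAELVQ"
)

# Version 3 model: the three exons covering the same region.
version3 = (
    "GDTFGTSRRERYGDVYKTNILGAPTVMVYGE"
    + "DAVRAVLAAEDRLVASDWPQ"
    + "VTSTLVGPDSLNLLTGPRHGAVKRALSDAFADRALRRHVPAIAELVQ"
)

print("older assembly spacing:", motif_spacing(older))
print("version 3 spacing:", motif_spacing(version3))
```

With these inputs the older reconstruction gives 23 residues, which is 5 short of 28, and the version 3 model gives 28.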
>CYP739A1
C_130004 no ESTs; inserts in exon 3 and exon 6; insertion in exon 8
newest data: version 3 checked May 1, 2006
Name:Chlre2_kg.scaffold_8000154
Protein ID:140983
Location:Chlre3/scaffold_8:1065299-1068007
1064933
MAVFGFRELFASMYIPGLSPVLSTITCLAGVLLFLAWQRHSR
ATSVPRLGPLLTIPLLGDVAWLAADPTRFVFGR (2) 1065157
1065263
FQRYGPTFILNLMGVPLYVLTQPADLRGPYRDQGAEPDVP
FSSFRRLMEVAPGRPYDVQADKAAHGPW (0) 1065466
1065649
RRMFLSALGPAGLQALLPRAQAVMQAHLAQWEAAGTAAGGRSGGGCIPSLFRQ (0) 1065807
1065921
VRLLSVDLAIEVIAEVPLPPGVERIAFREQ (0) 1066010
1066110
LLCFLDGLFGLPLALPGSSVARALAAKEELVAALGPLVAADRQRMAKR (0) 1066253
1066445
WRAAGSSYAALVDTLTAASAAVGGSAAAEAAAGVQAAEPSAAAAARVTVRDAVISGFMALG (2) 1066627
1066780
RAAAVSVLHAVVAGADTTRFALFNTLALVAMSARVQEEIFAEQER (0) 1066914
1067117
VVAEHGPELSARVLGSAAITPYLDAVVREAMRLLPATPGN
MRRLTADLRVGAGRGGPASELVIPK (1) 1067311
1067464
GSMVWRFVPLMHCLDPVLWDGDTSVDVPAHMDWRSNFEG
AFRPERWLSEDTKPKYYYTFGSDNHLCVGQNLAYM (0) 1067685
1067865
EVKLLLAMLLRKYRLQLHTPDMLARASQMFPFVIPRRGTDRVLLEPR* 1068008
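Each version 3 record carries a `Location:` line of the form assembly/scaffold:start-end. A small parser for that line might look like the sketch below. The function name `parse_location` and the regular expression are assumptions made for illustration; they are based only on the lines shown in this file, which appear both with and without a space after the colon.

```python
import re

# Matches e.g. "Location:Chlre3/scaffold_8:1065299-1068007"
# and "Location: Chlre3/scaffold_66:390953-394690".
LOCATION_RE = re.compile(r"Location:\s*(\w+)/(\w+):(\d+)-(\d+)")


def parse_location(line):
    """Return (assembly, scaffold, start, end) or None if the line does not match."""
    m = LOCATION_RE.search(line)
    if m is None:
        return None
    assembly, scaffold, start, end = m.groups()
    return assembly, scaffold, int(start), int(end)


print(parse_location("Location:Chlre3/scaffold_8:1065299-1068007"))
```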
>CYP739A2 C_130006 EST support: BI724239.1 1031069F06.y1
note: micro exon of 24 nucleotides (phase 1 boundaries)
newest data: version 3 checked May 1, 2006
Name:Chlre2_kg.scaffold_8000156
Protein ID:140985
Location:Chlre3/scaffold_8:1078611-1085528
1078648
MAGLATFEPSAQTPLTWSLALFSSFVAGLYVTFAIYRSFGKGAKKLPPGPLLHVPLLGDG
VLMAAGNPVKMFWDR (2) 1078872
1081962
YRRYGSVFRTMMLGSRIWVVTDLDALRGPLRDEGAYLEIPFKAFQRLV
(2) 1082105
1082294
SAESFLNRPGVHGPW (0) 1082338
1082426
RKIFSATLAPPRLAAMVPKIAQ (0) 1082491
1082664
LMQSHLSKWEEQGQVTIFRA (0) 1082723
1082865
ARVMGVDLAVDVILDIKLLDGTDRAWVKSQ
(0) 1082954
1083213
VEDYLDGLYGLPLNLPGSTLSKALAARARLVEVFLRQPDVAAMQAQF (0) 1083353
1083542
WEAIGKSPQAYAAAVLDQHTSTGDKPAGVAAEEEPSGKAAGAPTPAAP
GSRPAVLPPSIMTAQLMGRAMLK (2) 1083754
1084122
PSELADGAMSLLHMLVASADTTRFALFNTWTLLAMSPRVQDKLYEEQKK (0) 1084268
1084520
VMAEYGEELSYAATCHMPYMDATLKECMRLLPASAGGIRKLTADMQVGGYTVPA (1) 1084681
1084836
GEYVWYHA (1) 1084859
1085017
GLMHYIDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWLSEETRPRYM
FTFGTGAHLCIGMNLVYL (0) 1085214
1085385
EVKLLLSMVLRKYRLRLHTPDMLLRCERLFPFFLPAKGTDTVLLEPR* 1085528
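The 24-nucleotide micro exon noted for CYP739A2 can be confirmed from the exon coordinates (1084836 to 1084859, encoding GEYVWYHA). The sketch below assumes that the coordinates printed on each exon line are inclusive. The function names and the 30 nt cutoff for calling an exon "micro" are illustrative choices, not values taken from the annotation.

```python
def exon_length(start, end):
    """Length in nucleotides of an exon with inclusive coordinates.

    Works for both strands, since minus-strand exons list the larger
    coordinate first.
    """
    return abs(end - start) + 1


def is_micro_exon(start, end, max_nt=30):
    """True if the exon is no longer than max_nt nucleotides (assumed cutoff)."""
    return exon_length(start, end) <= max_nt


# CYP739A2 micro exon GEYVWYHA: 8 codons.
print(exon_length(1084836, 1084859), is_micro_exon(1084836, 1084859))

# A minus-strand exon from CYP737A1 (SWPAATVAMLGTDSVTFST, 19 codons).
print(exon_length(640648, 640592))
```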
>CYP739A3
C_130125 PTQ11643.x1 PTQ6387.y1
Insertion of 15 aa in the WEEG region (DRWT) at the end of exon 3.
Also an insertion in exon 6.
169549
MDYMQLLVGLLAILLASILLLRSSGKRLSPRFRVPLLGDTIKMAKRPAEFLFSR (2) 169388
169172
FKEFGPVFTLDLMGSTYWVVADMDAQRRFLYRTEGASAEIPIKSFKMLTELPSPNSDRVNHATW (0) 168982
168818
RKATMAAVGPHALHTLFPPVLEVIRAHADRWTQQAQQQQGGGGGGGGGGQLQIYRA (0)
168370
QRKLGLDLSVDVVAGVDLPQSVDRGEFKKQ (0) 168342
168037
VEVWLDGLFVLPLALPGTKLARAMAAKKWLLATLMPALSDVHGRFSKQ (0) 167894
WSQVGGDMAAMSELLIQQLDQQEGDDMGASSSSGGGGGGGGGGGPEAAAPAPQGQQQ
SLFRLPQAVMLGFFGLK (2)
167270
ATGLRESAIAVLQAVAAAADTTRVTLFTVLALVAMSPRVQEEIFAEQQK (0) 167136
166905
VIAEYGSELSYKVVSDMPYLEAVVKEAMRLLPPAAGGMRVLSEPLTVGDVTLPT (1) 166687
166388
GALLLSYSFLMHCIDPALWDGDTSVDVPAHMDWRNNFEG 166275
166274
AFRPERWLSEETKPKYYYTFGVGKHMCAGIHLVYM (0) 166155
165982
EVKTMVALLVRKHRLKLQTPDMFERATWLPFTTPAPGTDTVLFEPR* 165842
newest data: version 3 checked May 1, 2006
Name:Chlre2_kg.scaffold_8000164
Protein ID:140993
Location:Chlre3/scaffold_8:1105803-1109384
1109510 MDYMQLLVGLLAILLASILLLRSSGKRLSPRFRVPLLGDTIKMAKRPAEFLFSR
(2) 1109349
1109134 FKEFGPVFTLDLMGSTYWVVADMDAQRRFLYRTEGASAEI
PIKSFKMLTELPSPNSDRVNHATW (0) 1108943
1108779
RKATMAAVGPHALHTLFPPVLEVIRAHADRWTQQAQQQQGGGGGGGGGGQLQIYRA (0) 1108612
1108334
CRKLGLDLSVDVVAGVDLPQSVDRGEFKKQ (0) 1108245
1107998
VEVWLDGLFVLPLALPGTKLARAMAAKKWLLATLMPALSDVHGRFSKQ (0) 1107855
1107661
WSQVGGDMAAMSELLIQQLDQQEGDDMGASSSSGGGGGG
GGGGGPEAAAPAPQGQQQSLFRLPQAVMLGFFGLK (2) 1107440
1107243
ATGLRESAIAVLQAVAAAADTTRVTLFTVLALVAMSPRVQEEIFAEQQK (0) 1107097
1106866
VIAEYGSELSYKVVSDMPYLEAVVKEAMRLLPPAAGGMRVLSEPLTVGDVTLPT (1) 1106705
1106352
GALLLSYSFLMHCIDPALWDGDTSVDVPAHMDWRNNFEG
AFRPERWLSEETKPKYYYTFGVGKHMCAGIHLVYM (0) 1106131
1105943
EVKTMVALLVRKHRLKLQTPDMFERATWLPFTTPAPGTDTVLFEPR* 1105803
>CYP739A4
C_130009 no ESTs; insert in exon 8; 52% to 739A5
MLEPELAVAGLRGLLSDPRIVGTLFAALIAALAVWASGIVGTKLHLPGPYIT (0)
WPFLGDAVELGITSDLSRLM
(2)
7765
FKKYGRVFRLNLLGHTAFV (0)
7434
VSDEAALRGVLSDDGAIATIPFRAFS (2) 7411
7197
DLMGEYGTQSVKEIHGPW (0) 7181
6868
RKLIMAAVNGRGLSELVPGVAGVMARHVAGWAQAGRVELFQA (0)
SHAMGLDLSTDVIANVHFTALDRGWFKQQMRTFTAGMW (1)
5973
GLPVRLPGSDYSAALAAKERLIAALMPEMRDAHAAMLKRWEAAGRSGPALAAALLEE
QERQREAAREAEARGQKATPPDLSIKEAMLTAYFIGGWVR
(2)
5465
HTALRDAPMTILNAVVAAADTTRFSLFTFWAMVAMSTRVQEEIFGEQQR (0) 5420
4094
VVAAHGPELTPAALSSMPYLEACFKEAMRLLPTGGGAVRHLTKELKAGSVTLPAGEWVWY 3915
3914
HPHLMHCIDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWLSEETKPKYYFTFGSGVHLCAGVNLVYL (0)
3498
EAKLVMAMLVRRFRLRLSAPDMLARCTRVFPFMQPVPGTDKVELLPREQPLPVPGIDL*
newest data: version 3 checked May 1, 2006 (Join two models)
Name:fgenesh2_pg.C_scaffold_8000179
Protein ID:165902
Location:Chlre3/scaffold_8:1131576-1134169
Name:fgenesh2_pg.C_scaffold_8000180
Protein ID:165903
Location:Chlre3/scaffold_8:1135368-1136663
1131245
MLEPELAVAGLRGLLSDPRIVGTLFAALIAALAVWASGIVGTKLHLPGPYIT (0) 1131400
1131576
WPFLGDAVELGITSDLSRLM (2) 1131635
1131729
FKKYGRVFRLNLLGHTAFV (0) 1131785
1132054
VSDEAALRGVLSDDGAIATIPFRAFS (2) 1132131
1132291
DLMGEYGTQSVKEIHGPW (0) 1132344
1132623
RKLIMAAVNGRGLSELVPGVAGVMARHVAGWAQAGRVELFQA (0) 1132748
1133096
SHAMGLDLSTDVIANVHFTALDRGWFKQQMRTFTAGMW (1) 1133209
1133518
GLPVRLPGSDYSAALAAKERLIAALMPEMRDAHAAMLKRWEAAGRSGPALAAALLEE
QERQREAAREAEARGQKATPPDLSIKEAMLTAYFIGGWVR
(2) 1133808
1134023
HTALRDAPMTILNAVVAAADTTRFSLFTFWAMVAMSTRVQEEIFGEQQR (0) 1134169
1135197
VVAAHGPELTPAALSSMPYLEACFKEAMRLLPTGGGAVRHLTKELKAGSVTLPAGEWVWY
HPHLMHCIDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWL
SEETKPKYYFTFGSGVHLCAGVNLVYL (0) 1135580
1135793
EAKLVMAMLVRRFRLRLSAPDMLARCTRVFPFMQPVPGTDKVELLPREQPLPVPGIDL* 1135969
>CYP739A5
C_130009 C_22500001
EST support: BI527318 BG852189 BE129324 BI527323 BI527331 BU651784.1
Micro exon of 13 nucleotides
newest data: version 3 checked May 1, 2006 (Join two models)
Name:fgenesh2_pg.C_scaffold_8000177
Protein ID:165900
Location:Chlre3/scaffold_8:1125087-1127174
Name:estExt_fgenesh2_pg.C_80173
Protein ID:186291
Location:Chlre3/scaffold_8:1128094-1131067
1125087
MGEQGAAAGTPLALAATLLAGTILVFYIYQQLKPSKSRLPGPLF
SWPFLGDTIEFATTDPTKFLFGR (2) 1125287
1125418
FKRYGR (2) 1125435
1125617
VFRLSLLGFTAYVTADPEALRPLLADEGGHFTIPVQTFTALMGAYNLQAHKEVHAAW
(0) 1125787
1126081
RKVLMAALTGSGMAKLVPGVVAVMGRHVEGWAQAGRVELYEA
(0) 1126206
1126538
ARTLGLDLAVDVLSGVKLEERGIQPAWLKSR
MADFLGGLYGLPLALPGSPLAKALAAKEELLRVLVPAVEGRQQELLKL (0) 1126774
1127068
WEDNDRSAAAVATKLASSPETATIADANLLGFTARG
(2) 1127175
1128397
CTTPRDAAMTVLHAVMGAADTTRFALFNTWAILAMSPRVQDLIYEEQKK
(0) 1128543
1128758
VVAENGPELTYKTAMSMP (2) 1128811
1129253
YLDAAFKEAMRLLPASAGGFRMLTKELRVGDVLLPP
(1) 1129360
1129696
GTIIW (2) 1129710
1130071
FHALLLQTLDPVLWDGDTSVDVPVHMDWRNNFEGAFRPERWLSEET
KPRSYYIFGQGAHLCAGMVLVTL
(0) 1130277
1130498
EVKLLLAMVLRKWRLQLEVPDMLARAELFPYTKPAKGTGGMRLIAREQPVA*
1130653
>CYP739A6
C_130012 C_5270001 33% to 707A2, 85 clan member, 57% to 739A2
ESTs: BU647654.1 BI528139
28201
MDLTKIHEDPIGLLLAMIAGALVAFFLLARKEKRPLGPMFTLPILGDTVALALSEQSRFMFSR
(2) 28013
27729
YKKYGSVFRLNLLGKHMYILSDLEALRGPYRDEGAIPEVPFPTFKLLMGDFNVAGGGKHIHGPW
(0)27538
26890
RKASLAALGPAGLQSMFPPVLRVMQSHLSEWEAAGRVEVFQS
(0) 26765
26576
ARRMGLELAVDVVADVELSPAVDRAWFKQQ (0) 26487
26101
AETWLYGMWGLPVPLPGS (2) 26048
25807
ALAKALAARKVLLRVLGQELAADHEDYKSR (0) 25718
25284
WTELGSSGAAMADDLVAKASAAPGAEGAKGLGAPRLSHVIRLGLFGLG (2)
24803
ATEVEHSALAVLHAVMASADTTRFALFNTWALVAQSARVQEKLYEEQQK (0) 24672
24589
VIEEFGPELSYKAASSMP (2) 24536
24153
YMDATIKECMRLLPASAGGPRKLTQDLKVGEVVLPA (1) 24046
23660
GSFVWMYSYLLHCLDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWLSEETKPKYY (0) 23493
23362
FTFGYGNHLCAGINLAYL (0) 23309
23164
EIRTMLALVIRKYRLRLQTPDMLSRARYFPFVEPSPGTDTVLLEAR* 23024
newest data: version 3 checked May 1, 2006 (Join two models)
Name:estExt_fgenesh2_pg.C_80177
Protein ID:186292
Location:Chlre3/scaffold_8:1145690-1151969
1145820
MDLTKIHEDPIGLLLAMIAGALVAFFLLARKEKRPLGP
MFTLPILGDTVALALSEQSRFMFSR (2)
1146008
1146292
YKKYGSVFRLNLLGKHMYILSDLEALRGPYRDEGAIPEVPFP
TFKLLMGDFNVAGGGKHIHGPW (0)
1146483
1146925
RKASLAALGPAGLQSMFPPVLRVMQSHLSEWEAAGRVEVFQS
(0) 1147050
1147239
ARRMGLELAVDVVADVELSPAVDRAWFKQQ (0) 1147328
1147714
AETWLYGMWGLPVPLPGS (2) 1147767
1148008
ALAKALAARKVLLRVLGQELAADHEDYKSR (0) 1148097
1148531
WTELGSSGAAMADDLVAKASAAPGAEGAKGLGAPRLSHVIRLGLFGLG (2) 1148674
1148997
ATEVEHSALAVLHAVMASADTTRFALFNTWALVAQSARVQEKLYEEQQK (0) 1149143
1149226
VIEEFGPELSYKAASSMP (2) 1149279
1149662
YMDATIKECMRLLPASAGGPRKLTQDLKVGEVVLPA (1) 1149769
1150155
GSFVWMYSYLLHCLDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWLSEETKPKYY (0) 1150322
1150453
FTFGYGNHLCAGINLAYL (0) 1150506
1150651
EIRTMLALVIRKYRLRLQTPDMLSRARYFPFVEPSPGTDTVLLEAR* 1150791
>CYP740A1
C_1080041 ONE EXON IN A SEQ GAP C_1080041
3676
MAPLLDAKQLELLGIGMQLAAVLLVLYYLLKWLAGKRGGVPGPAFYLPAIGETLSLFASPTRYMWK (0) 3500
NWLEYGPFFRTHLLGYPLYVVGSPGLLKPVLGDDSAFEFF
(0)
VPGKTFTMLISDIRHMQVPEQHAVF (0)
RRRLGQALNPGALSRHVMAPLRVVLERHLDAWEAAGRVQLAEA (0)
CAAVSLDVALEVLTGVPLPGGPETRAEVRRGTGG
(0)
LFRTLAGLYGVPLPWLPGTAIHSALRAQRRLMALLGPPELDREVAELAGK (0)
SRLPTGGTAWHRTPRPGSAACPRGPTADAGSRRSHRHRHHQLLLRHRGAHAHAGGPGRCGPALPHRHAFLR
(2)
TGTPLSLTKEQIFERALGVVIASDDTSKHLFFFELVAAAMLPGVWAKLEEEQKQ
(0)
AMRKYGDELSYSILNDMPYLDAVIK
(0) 497
ETIRVFPTAVGGFRRALKDVP (0) TIN268971.x1
XXXXXXXXXXXXXXXXXXXXX PKG motif in seq gap
(0) AFRPERWLSDETRPRQFAGFGGGQHLCLGMHLAHAE (0)
ARMLLALVVRRFHLRLEQPQLLSRVTYFPGPVPRKGADGLVLMPRRLEP*
newest data: version 3 checked May 1, 2006 (Join two models)
Name:Chlre2_kg.scaffold_68000022
Protein ID:153850
Location:Chlre3/scaffold_68:173935-177729
172336
MAPLLDAKQLELLGIGMQLAAVLLVLYYLLKWLAGKRGGVPGP
AFYLPAIGETLSLFASPTRYMWK (0) 172533
172751
NWLEYGPFFRTHLLGYPLYVVGSPGLLKPVLGDDSAFEFF
(0) 172870
173104
VPGKTFTMLISDIRHMQVPEQHAVF (0) 173178
173539
RRRLGQALNPGALSRHVMAPLRVVLERHLDAWEAAGRVQLAEA (0) 173667
173950
CAAASLDVALEVLTGVPLPAAPETRAEVRRGTGG (0) 174051
174373
LFRTALAGLYGVPLPWLPGTAIHSALRAQRRLMALLGPELDREVAELAGK (0) 174522
174680
SRLPTGGTAWHETHLAHARTPRPGSAACPRGPTADAGSRRS
HRHRHHQLLLRHRGAHAHAGGPGRCGPALPHRHAFLR (2) 174913
174948
TGTPLSLTKEQIFERALGVVIASDDTSKHLFFFELVAAAMLPGVWAKLEEEQKQ
(0) 175109
175468
AMRKYGDELSYSILNDMPYLDAVIK (0) 175542
175867 ETIRVFPTAVGGFRRALKDVP (0) 175929
176602
VEGGQLIPAGSIVFYSTHLLNAADPALLPRSLAPE (2) 176706
176860 ALEGPTGLPAHLDYE (0) 176904
177170 CRLEEAFRPERWLSDETRPRQFAGFGGGQHLCLGMHLAHAE
(0) 177292
177581
ARMLLALVVRRFHLRLEQPQLLSRVTYFPGPVPRKGADGLVLMPRRLEP*
177730
>CYP741A1
14 exons C_980053, C_980058 (N-term part)
72A9-like; exons 3, 4, 5, 13, 14 not well supported
8031
MDGFWKTLGLGALLSPVLYALYLASLIVIPYLK
SLPLRRKLRHLPGPPVTGFFLLGNVPDLVRTP
(1) 8225
8408 VHQCMARWAEQYGKIFKLELPTMT (0) 8479
Missing approx 80 aa between exon 2 and VMTG.
This is a very poorly conserved region, so it is very hard to identify the missing piece(s) without cDNA.
10741
VMTGLAAAGPSAALDLDRVAQRLTIDVIGRFAFDRDFGATADIAKTNEALQ
(0) 10893
11059
VVGELMTALQRMLNPLNRWFWWRK (0) 11130
11410
EARGLWASRRRYDALVRRALEDLRSSPPAQHTLLHHLMSLTDPDT (1) 11544
11782
GKPLSARRLRSETALFWIAGFETTAHAIGWTLMFIAGSPE (0) 11901
13254
VESRVAAELEGAGLLAVPGRPEPRQLAWGDLGGLKYLNA (1) 13370
13544
VIHESMRLMPPTSGGTVR (2) 13597
13750
VVPRDTQLAGHVLPKGTMLW (0) 13809
14146
IPFYAMQRSERVWGPDAAQFRPERWLAAAAGAGGPG (0) 14253
14541
ARGFLPFSEGPRNCVGQSLALLELRTALALLCGSFR (2) 14648
14920
FRLADDMGGVEG (1) 14955
15160
AVSEARQHITLKPGDRGLLMHAIPRVPA* 15246
newest data: version 3 checked May 1, 2006
note: version 2 seq is better than ver 3 at this gene
Name: fgenesh2_pg.C_scaffold_71000048
Protein ID: 179637
Location: Chlre3/scaffold_71:380138-384009
Name: fgenesh2_pg.C_scaffold_846000001
Protein ID: 181363
Location: Chlre3/scaffold_846:2079-5042
34% identical to CYP767A1, 29% to Cyp3a11 Drosophila 4 clan member
380138
MDGFWKTLGLGALLSPVLYALYLASLIVIPYLK
SLPLRRKLRHLPGPPVTGFFLLGNVPDLVRTP (1) 380332
380494
VHQCMARWAEQYGKIFKLELPTMT (0) 380589
381479 AVVLTDPEAVSQVLKVDRFEKLTTSYQNMEK (0) 381582
382152
LTAEQQPNILTEPLSAYYKAVRRAVTPAFSTANLR
(2) 382211
382813
RFFPLLLDITQQ
382849
VMTGLAAAGPSAALDLDRVAQRLTIDVIGRFAFDRDFGATADIAKTNEALQ (0) 383001
383167
VVGELMTALQRMLNPLNRWFWWRK (0) 383238
383518
EARGLWASRRRYDALVRRALEDLRSSPPAQHTLLHHLMSLTDPDT (1) 383652
383890
GKPLSARRLRSETALFWIAGFETTAHAIGWTLMFIAGSPE (0)
383878
VESRVAAELEGAGLLAVPGRPEPRQLAWGDLGGLKYLNA (1) (in a seq gap)
5043 VIHESMRLMPPTSGGTVR (2) 4990
4837 VVPRDTQLAGHVLPKGTMLW (0) 4778
IPFYAMQRSERVWGPDAAQFRPERWLAAAAGAGGPG
(0) (in a seq gap)
4242 ARGFLPFSEGPRNCVGQSLALLELRTALALLCGSFR (2) 4135
3863 FRLADDMGGVEG (1) 3828
AVSEARQHITLKPGDRGLLMHAIPRVPA* (in a seq gap)
>CYP742A1 C_60077 29% to 741A1
YELLOW COULD BE REMOVED AS AN INTRON
newest data: version 3 checked May 1, 2006
Name: Chlre2_kg.scaffold_37000075
Protein ID: 151489
Location: Chlre3/scaffold_37:480605-486413
Note: the model Chlre2_kg.scaffold_37000075 is short at the N-term. It is missing the first 63 aa.
It is also wrong at the end of the second exon, after WRA.
486602 MHTAPRRIHAARCRPLHASTGASTPGPAGAPDLPPLQRA
PGPPGLPWLGQLPAYLATKFFPKKMLEWSEQYNGVYAMEIVGRKYLVVT (1) 486339
486182 EPSLIAGIVGRGSAGLPKSTGYAMWDSAIS (2) 486093
485810 PHAGVQGLFTVAENTTTWRAVRRAYGPAIGPGSMS (2) 485706
485124
SGTSTSTSSSSTASINSTTGLTSHEMNHLAKCLTLDMLGLSAFG
IDFRCLDDPAAAQLPSLIES (0) 484933
484532 AMHECGERARSVGRRLLPWLYEEEARAGAADMAAFHALVE (0) 484413
484091 DVWRQIRARGAPTEDDNSFGAQLLRLADPSLAP
(1) 483993
483712 GGAALSDEQICAEIATVIIAGYETTAN (2) 483632
483379 TLTWMLYGLHAHKDASEQLVAELRGA (1) 483302
483023 GLVPDTSSSSSPSSVDPTTASFASLAGAHEALGGLPVLDAYVRECLRLYSTAP
NGLIKEVPKNGPPARV (1) 482817
482606 GPFAADPGVVVWIPFWSLHLSNLNWEQPHDFQL (0) 482508
482279 SRWLGKDPRTAGSLTASRCP (0)
VSGTLNALRAATSSSSSSSSSSSSSSSSSSSSSSGSDSDGEGGSSSGGRGSK
(0)
AIRFMPFGDGSRNCVGQHLGMLQLK (0) 481989
481146 LSLAYLAARFDLVLDEARMGGSAAAALERQRVNLTLEVDGGM (2) 481021
480681 YLLGASVHSHARVYWYQLVSCEPKC* 480604
>CYP743A1
C_180013 16 exons
MLRALSCLALLAAGAARLAAAAGATDSA (0)
14775
VSRALAVLALLLALHVLADPLQRWRLRHIP (1) 14852
15120
GPPALPLLGSVPAMMRAGGPFFFRQCFAKYGPVFK (0?)
15414
VAMGRKWVVVVADAELMRQ (0)
AGQRLRSHVIIEPNLNRGHLRRLDAEGLFQAH (2)
16227
GEFWRLLRGAWQPAFSSAALSGYLPLMSACGLRLAQQLQA (0)
GGGARPAAGYVDVWRALGGMTLQVVGSTAYG (2)
16969
RLAVACGDVFRFGSALHGSS (2) 17034
17266
YQRIGLLLPELVPALVPLAHSLPDPPFKRLQR (0) 17364
17660
ARSTLLAACMELIRSWRQQHHATT 17722 (large insert here)
TRTAGGTTATGVAAAAEAPAAMCGAAVPAAAAAVDGAAAPAGPEEADAAARGGGV
GGGGGDGSGVGGSGVAAGSFLDLMLAARDKANGAALTDRMVAAQ
(0)
VQTFLLAGYETTANALAFAIYCVATHPE (1)
181099
VESRLLAEVDAVLGRDR (2) 181146
18987
PPTESDLPRLPYTEAVLNEAMRLFPPAHATTRIVEAGAPLQ (0)
19333
LGGVSLPPRTPLILAIYSAHHDPAVWPRPEDFIPERFLP (0) 19479
19665
ASPLHSEVAARVPGAHAPFGYGSRMCIGWKFAMQ (0) 19715
19932
EAKLVLALLYQRLLFRLQPGQVPLPTATALTLAPRDGLWVRPVLRRAARAE* 20069
newest data: version 3 checked May 1, 2006
Name: e_gwW.1.412.1
Protein ID: 116541
Location: Chlre3/scaffold_1:5612270-5613769
5617553
MLRALSCLALLAAGAARLAAAAGATDSA (0) 5617470
5617234
VSRALAVLALLLALHVLADPLQRWRLRHIP (1) 5617145
5616874
GPPALPLLGSVPAMMRAGGPFFFRQCFAKYGPVFK (0) 5616770
5616574
VAMGRKWVVVVADAELMRQ (0) 5616518
5616251
AGQRLRSHVIIEPNLNRGHLRRLDAEGLFQAH (2) 5616156
5615776
GEFWRLLRGAWQPAFSSAALSGYLPLMSACGLRLAQQLQA (0) 5615657
5615530
GGGARPAAGYVDVWRALGGMTLQVVGSTAYG (2) 5615438
5615016 RLAVACGDVFRFGSALHGSS
(2) 5614957
5614725
YQRIGLLLPELVPALVPLAHSLPDPPFKRLQR (0) 5614630
5614331
ARSTLLAACMELIRSWRQQHHATT (large insert here) 5614263
5614262 TRTAGGTTATGVAAAAEAPAAMCGAAVPAAAAAVDGAAAPAGPEEADAAARGGGV
GGGGGDGSGVGGSGVAAGSFLDLMLAARDKANGAALTDRMVAAQ
(0) 5613966
5613766
VQTFLLAGYETTANALAFAIYCVATHPE (1) 5613683
VESRLLAEVDAVLGRDR (2)
5613004
PPTESDLPRLPYTEAVLNEAMRLFPPAHATTRIVEAGAPLQ (0) 5612882
5612664
LGGVSLPPRTPLILAIYSAHHDPAVWPRPEDFIPERFLP (0) 5612548
5612380
ASPLHSEVAARVPGAHAPFGYGSRMCIGWKFAMQ (0) 5612279
5612062
EAKLVLALLYQRLLFRLQPGQVPLPTATALTLAPRDGLWVRPVLRRAARAE* 5611907
>CYP743A2
C_420091 33% to CYP711A1
16 exons; EST support: BM002146 BI728655 BE726345 (N-term to C-helix)
37486
MQDVISFLLNGLGFAAVGLVVL (0)
37551
37671 QLVLSLDLYKRWKLRHLP (1) 37724
37956 GPPALPLLGNLPQILAKGSPAFFRECRAKYGPVFR
(0) 38060
38400 VAFGRNWMVVVAEPDLLRQ (0) 38456
38720 VGGKLLNHSMFRGLLGGEFAKLDDWGLVSAR (2)
38812
39351
DDFWRKVRAAWQPAFSAPSLSGYFPLMTDCAVRLADKLEGLARRQPG 568697
GQQGAGKEEEAAGKAGKAEAEGGSGGGGGSSTRVDIWRELGAMTLQVVGSTAYG (2)
VDFQAMESLPAAGTGEGGADTKPAAAP
39994
APASSSYG RVLVQACRDVFKYSSVVYGSK (2) 40059
40319
YSRVGLLFPEWRPVVAILANAAPDLPFKMLKT
(0) 40408
40595 ARTHLRDACMSLIDGWKKQEASG 40654
VQDGKSKQEEQNGDANGHTAASTAGAKGDGAVSGAGAANAIGE insert
567380 AAAAVGTAAGGVGGLSAGSFLGLMLAAR 567309
DKSTGEGLTDLQVAAQ (0)
41049 VQTFILAGYETTANALAFAVYCLATNPE (1)
41141
AEAKLLAEIDAVLGPDR (2)
41578
LPTEADLPRLPYTEAVFNETMRLYPPAHATNRHTDKAPMQ (0) 41700
42000
VGPYTLPKDTTLFMSIFSAHHNTDVWPRVNDFVPERFLP (0) 42119
42294 ESPLYPEVAARVPHAHAPFGFGSRMCIGWKFAVQ
(0) 42395
42712 EAKVALAALYQRLTFELEPGQ (0) 42771
43237
VPLQTAVGITLSPRNGVWVRPVARRLTPRQPTTPPVGSAAK* 43362
newest data: version 3 checked May 1, 2006
Name: estExt_fgenesh2_pg.C_160079
Protein ID: 189550
Location: Chlre3/scaffold_16:612970-615574
Name: e_gwW.16.62.1
Protein ID: 116043
Location: Chlre3/scaffold_16:610198-611929
scaffold_16:609616-615492
615492
MQDVISFLLNGLGFAAVGLVVL
(0) 615427
615307
QLVLSLDLYKRWKLRHLP (1)
615254
615022
GPPALPLLGNLPQILAKGSPAFFRECRAKYGPVFR
(0) 614918
614578
VAFGRNWMVVVAEPDLLRQ (0)
614522
614258
VGGKLLNHSMFRGLLGGEFAKLDDWGLVSAR
(2) 614166
613627
DDFWRKVRAAWQPAFSAPSLSGYFPLMTDCAVRLADKLEGLARRQPG
613487
613489 GQQGAGKEEEAAGKAGKAEAEGGSGGGGGSSTRVDIWRELGAMTLQVVGSTAYG (2) 613328
613083
VDFQAMESLPAAGTGEGGADTKPAAAP
APASSSYGRVLVQACRDVFKYSSVVYGSK (2) 612916
612659
YSRVGLLFPEWRPVVAILANAAPDLPFKMLKT
(0) 612564
612380
ARTHLRDACMSLIDGWKKQEASG
VQDGKSKQEEQNGDANGHTAASTAGAKGDGAVSGAGAANAIGE
insert
AAAAVGTAAGGVGGLSAGSFLGLMLAAR
DKSTGEGLTDLQVAAQ (0) 612051
611926
VQTFILAGYETTANALAFAVYCLATNPE
(1) 611843
611651
AEAKLLAEIDAVLGPDR (2)
611601
611397
LPTEADLPRLPYTEAVFNETMRLYPPAHATNRHTDKAPMQ (0) 611278
610981
VGPYTLPKDTTLFMSIFSAHHNTDVWPRVNDFVPERFLP (0) 610865
610684
ESPLYPEVAARVPHAHAPFGFGSRMCIGWKFAVQ
(0) 610583
610269
EAKVALAALYQRLTFELEPGQ
(0) 610207
609741
VPLQTAVGITLSPRNGVWVRPVARRLTPRQPTTPPVGSAAK*
609616
>CYP743B1
scaffold 98 unannotated region adjacent to a large gap
C_32340001 inserts in the large sequence gap of scaffold 98 and completes the P450 gene.
First exon is best guess.
251670
MVASASWQLDLLGALSGAPSPQM (0) 251602
251481
AAAGLALLLASLLIYLLDPIQRWRLRKVPGER (?) 251386
251227
(1) GPPARPLLGCLPQLRAQPMPLFLQSCAQTYGPVFKAS (1) 251117
251064
SAEVQGIAVIPHHVS 251020
251017
RMQVALGRKWAVVLADAEMQRQVRGTGAERG 250925
2384
GSTWRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAGA 2241
2240
TASGATAGGGSSVDMWRELGGMTLQVVGSTAYG 2142
1933
VDFHSINEEDQAGSGSGSGSAIATAGATAAAKGRGDDGYGKQLAAACGQIFRYTSSAHGSP 1751
1592
YLRVAMLFPELRRLLVPLAHTLPDKRFAILMQ 1497
1323
ARNRLSGAVFQLMDSWKQQHIAAAGSGAAGKGSSGKADACQ 1198
1119
SSNGVGAAATSGRGGMAGVAPGSFLDLMLG 1030
1029
HRQGGGSGSGGKKAEGEEGVEHAPLTDEQVAGQ 931
805 VQLFILAGYETTANALAFAVYCIATHPE 722
526
VESRLLREVDDVLPGSDQLPGESDLPRLAYTEAVVNEALRLFPPAHLTSRVVPPGETLT 350
266 VGGFNIPAGIPIFLPMYIAHRDPAVWPRADVFLPERFLH
(0) 153
newest data: version 3 checked May 2, 2006
Name:e_gwW.71.18.1
Protein ID:122749
Location:Chlre3/scaffold_71:125260-130065
Frameshift at HHVS/RMQ, GC boundary at RKVP?
125260
MVASASWQLDLLGALSGAPSPQM (0) 125328
125449
AAAGLALLLASLLIYLLDPIQRWRLRKVP (1) 125535
125703
GPPARPLLGCLPQLRAQPMPLFLQSCAQTYGPVFKAS (1) 125813
125869
AEVQGIAVIPHHVS 125910
125913
RMQVALGRKWAVVLADAEMQRQVRGTGAERG (2) 126005
126948
GSTWRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAGA
TASGATAGGGSSVDMWRELGGMTLQVVGSTAYG (2) 127190
127399
VDFHSINEEDQAGSGSGSGSAIATAGATAAAKGRGDDGYGKQLAAACGQIFRYTSSAHGSP (2) 127581
127740
YLRVAMLFPELRRLLVPLAHTLPDKRFAILMQ (0) 127835
128012
ARNRLSGAVFQLMDSWKQQHIAAAGSGAAGKGSSGKADA (2) 128128
128216
SNGVGAAATSGRGGMAGVAPGSFLDLMLG
HRQGGGSGSGGKKAEGEEGVEHAPLTDEQVAGQ
(0) 128401
128527
VQLFILAGYETTANALAFAVYCIATHPE (1) 128610
128806
VESRLLREVDDVLPGSDQLPGESDLPRLAYTEAVVNEALRLFPPAHLTSRVVPPGETLT (0) 128982
129066
VGGFNIPAGIPIFLPMYIAHRDPAVWPRADVFLPERFLH (0) 129182
129643 PRGAAQQHAHAPFGYGSRMCIGYKFAMQ (0) 129726
129919
EAKVALATLYRRLTFTLEPGQQPLQVEASLTMAPRGGLRVTPVPRRKL* 130065
>CYP743B2
C_8600001 also inserts in same gap of scaffold 98
141391
XXRVAMLFPELRSLLLTLAHTLPDEKFTILTK (0) 141480
ARTRLCNTVFQLIDSWKEQHRAEAEIDAAASSGKPDVGAGRHSSN
GVGAAATSGRGGLSGVAPGSFLDLMLGQRQGGERGSGGKKAEGEEGVEHAPLTDEQVAGQ
(0)
VQLFILAGYETTANALAFAVYCIATHPE (1)
(seq gap)
3945
(0) SSPLYESLQPRGAAQQHAHAPFGYGSRMCIGYKFAMQ 3838
3648
EAKVVLATLYRRLTFTLEPGQQPLQVEASLTMAPRGGLRVMPVPRRKL* 3511
CYP743B2 scaffold_71:130374-138996
A duplication of 743B3. Note: 1 extra S in exon 1.
CYP743B3 has some defects, so CYP743B2 may be the intact gene, while CYP743B3 may be a pseudogene copy.
130374
MSNVFANWPSGSGAPLGGLLRSLGM
(0) 130448
130571
VAAGFALLLVSLIIYLLDPIKRWRLRKIP (1) 130657
130846
GPGPRGRPVLGCLPQLRAQPMPLFLQSCAQTYGPVFKAS (1) 130962
131029
AEVQGIAVILHRVSRMQVALGRKWVVVLADAEMQRQVDGAG (2) 131151
(seq gap, missing six exons)
138577
PRGAAQQHAHAPFGYGSRMCIGYKFAMQ (0) 138660
138850
EAKVVLATLYRRLTFTLEPGQQPLQVEASLTMAPRGGLRVMPVPRRKL* 138996
>CYP743B3 C_980035 C_8600002 same sequence
2544
MSNVFANWPSGSGAPLGGLLRLGM (0) 2615
2745
VAAGFALLLVSLIIYLLDPI
242088 KRWRLRKIPG
242059
241862
PGPRGRPVLGCLPQLRAQPMPLFLQ 241788
241786 SCAQTYGPVFKAS
241748
241692
AEVQGIAVILHRVSRMQVALGRKWVVVLADAEMQRQVDGAG 241570
240969 GSTWRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAGA
240826
240825
TASGATAGGGSSVDMWRELGGMTLQVVGSTAYG 240727
240592
VDFHSINEEDQAGSGSGSATATAGATAAAKGRGDDGYGKQLAAACGQIFR 240443
240442 YGSPVHGSP 240416
240284
YLRVAMLFPELRSLLLTLAHTLPDEKFTILTK 240189
240021 ARTRLCNTVFQLIDSWKQQHSAEGATAAGASSGKPDAGAGQSNN
239890
239889
GVGAAATGGRGLSGVAPGSFLDLMLGHRQGGGSGSGGKKAEGEEGVEHAP 239740
239739 LTDEQVAGQ 239714
239592
VQLFILAGYETTANALAFAVYCIATHPE 239509
239320
VESRLLREVDDVLPGSDQLPGESDLPRLAYTEAVVNEALRLFPPAHLTSR 239171
239170 VVPPGETLTVGGYTIPGGTAVYLPMYLAHRDPAVWPRAEEFLPERFLP
239027
238674
PRGAAQQHAHAPFGYGSRMCIGYKFAMQ 238591
238461
EAKVALATLYRRLTFTLEPGQQPLKLVASVTMSPRGGLHVTPVPRRKL* 238315
newest data: version 3 checked May 2, 2006
Name:e_gwW.71.20.1
Protein ID:122730
Location:Chlre3/scaffold_71:139305-143478
2 frameshifts and one small duplication of AEVQ.
These defects were not in the ver 2 seq (see above).
139305
MSNVFANWPSGSGAPLGGLLRLGM (0) 139375
139498
VAAGFALLLVSLIIYLLDPIKRWRLRKIP (1) 139584
139787
GPRGRPVLGCLPQLRAQPMPLFLQSCAQTYGPVFKAS (1) 139897
139966
AEVQA 139967
139969
EVQGTAVLLHHVSRMQVALGRKWVVVLADAEMQRQVDGAG (2) 140088
140701
GSTWRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAGATASX 140853
140856
ATAGGGSSVDMWRELGGMTLQVVGSTAYG (2) 140942
141077
VDFHSINEEDQAGSGSGS 141130
141131
ATATAGATAAAKGRGDDGYGKQLAAACGQIFRYGSPVHGSP (2) 141253
141391
YLRVAMLFPELRSLLLTLAHTLPDEKFTILTK (0) 141480
141648
ARTRLCNTVFQLIDSWKQQHSAEGATAAGASSGKPDAGAG 141767
141768
QSNNGVGAAATGGRGLSGVAPGSFLDLMLGH 141860
141861
RQGGGSGSGGKKAEGEEGVEHAPLTDEQVAGQ (0) 141956
142077
VQLFILAGYETTANALAFAVYCIATHPE (1) 142160
142349
VESRLLREVDDVLPGSDQLPGESDLPRLAYTEAVVNEAL
RLFPPAHLTSRVVPPGETLTVGGYTIPGGTAVYLPMYLAHRDPAVWPRAEEFLPERFLP (0)142642
143119
PRGAAQQHAHAPFGYGSRMCIGYKFAMQ (0)143202
143332
EAKVALATLYRRLTFTLEPGQQPLKLVASVTMSPRGGLHVTPVPRRKL* 143478
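The CYP743B2 entry notes one extra S in exon 1 relative to CYP743B3. A difference like this can be located with Python's standard `difflib` module, as in the sketch below. The helper name `inserted_residues` is hypothetical; the two exon 1 strings are copied from the CYP743B3 and CYP743B2 records above, and the reported position is a 0-based index into the reference string.

```python
from difflib import SequenceMatcher


def inserted_residues(reference, query):
    """List (reference_index, residues) for each insertion in query vs. reference."""
    matcher = SequenceMatcher(None, reference, query, autojunk=False)
    return [
        (i1, query[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "insert"
    ]


b3_exon1 = "MSNVFANWPSGSGAPLGGLLRLGM"   # CYP743B3
b2_exon1 = "MSNVFANWPSGSGAPLGGLLRSLGM"  # CYP743B2

print(inserted_residues(b3_exon1, b2_exon1))
```

For these two strings the only insertion reported is a single S following the LLR tripeptide. `SequenceMatcher` is a general-purpose diff, not a biological alignment method, so it is only suitable for near-identical sequences like this pair.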
>CYP743C1
C_1130014 C_9610001 AV627084.1 top part = scaf 961
35102 MTFLQLLPGVPLVLLGVLALPV (0) 35037
34921 VITLVQEVITKRKYRHIP 34868 (1)
34694 GPKPQPISGNLREFLTSPGGLLGCLEGW (0) 34611
VK
(seq gap about 146 aa) followed by scaf 113
118674
AVALPCLLPAVRHLAAAAPDPVLALHIQ 118591 (0)
118264
SRQVLRQVSTKLITAWRDSHTAAS
ANGSSTNSTSGSSSSTGVAPGSFLGLMLAARDRSRKEGGAAATAKDG
31374
MAPTLTDAQIEAQVQTFLLA 31315 (1) (I-helix)
31010
GFETTANALTFAVYLLACHPE 30948 (0)
(87 aa seq gap)
29287
(0) AFRPERFLSPDVPGSAPELAARHPHVHLPFGSGPRMCIGWRFAMQ (0) 29156
28541
EAKTVLSRLVQAVDFTLAPGQAAPLDTVAGLTLAPRNGVWVRLSPR
GGGGSGGGGGRGQEVATAAAKGAAVRSAAA* 28308
Name:Chlre2_kg.scaffold_17000165
Protein ID:147793
Location:Chlre3/scaffold_17:1492638-1496177
CYP743C1 scaffold_17:1489349-1496178
1489349
MTFLQLLPGVPLVLLGVLALPV (0) 1489414
1489530
VITLVQEVITKRKYRHIP (1) 1489583
1489757
GPKPQPISGNLREFLTSPGGLLGCLEGW (0) 1489840
1490285
VKQYGDLLTFRLGSRQFVLVADPDAAR (2)
1490365
(small gap in C-helix region)
1491618
(0) PVFTARVFLTQIVFPHTARSLRGYQALMDREAVALAGRLR
RQAAAGGGGGGGGGGGGGGGDKAGEIEVMSEMSRVTLAVVGTAAYG (2) 1491875
1492617
CNDFFRTMSPAARSSWSW 1492670
1492671 AVALPCLLPAVRHLAAAAPDPVLALHIQ (0) 1492754
1493081 SRQVLRQVSTKLITAWRDSHTAAS
ANGSSTNSTSGSSSSTGVAPGSFLGLMLAARDRSRKEGGAAATAKDG
1493293
1493357
MAPTLTDAQIEAQVQTFLLA (1) 1493416 (I-helix)
1493721
GFETTANALTFAVYLLACHPE (0)
1493783
(EXXR missing in a seq gap)
1494664 (0) IQGHRIPAGSTLWLSIAHLHTRDGVWPEPQ (0) 1494753
1495196
AFRPERFLSPDVPGSAPELAARHPHVHLPFGSGPRMCIGWRFAMQ (0) 1495330
1495948
EAKTVLSRLVQAVDFTLAPGQAAPLDTVAGLTLAPRNGVWVRLSPR
GGGGSGGGGGRGQEVATAAAKGAAVRSAAA* 1496178
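Several notes in this file refer to conserved P450 landmarks (EXXR, PKG, heme). A quick regular-expression scan for two of them is sketched below. The patterns `F..G.R.C.G` (heme-binding signature) and `E..R` (K-helix EXXR) are commonly cited consensus forms and are assumptions here, not definitions taken from this annotation. Some sequences in this file deviate from the strict heme pattern (for example FGHGPHYCVG has H where the pattern expects R), and `E..R` is short enough to match by chance, so hits are only hints.

```python
import re

# Assumed consensus patterns; "." matches any residue.
HEME_RE = re.compile(r"F..G.R.C.G")
EXXR_RE = re.compile(r"E..R")


def find_motif(pattern, protein):
    """Return (0-based start, matched residues) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]


# Heme-region exon of CYP743C1 (version 3).
heme_exon = "AFRPERFLSPDVPGSAPELAARHPHVHLPFGSGPRMCIGWRFAMQ"
# K-helix exon of CYP737A1.
k_helix_exon = "GSSIRDMPYLDAVVKETWRCHPVVPMVPRRAVRDFTLGGHDVPQ"

print(find_motif(HEME_RE, heme_exon))
print(find_motif(EXXR_RE, k_helix_exon))
```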
>CYP744A1
C_940015 (N-term part), C_940016
MALSSAWALAGLFL
(0)
AMFVFFGYSLRKRWQLRKIP (1)
23868
GALGWPFLGSIPEFSIYGYEYVLGLSAKLGN (0) 23773
23439
AWLGVEPLIIICDPALIR 23386 (2)
23162
KYAYKCVSKPPSMSEYGHVLTGFNYDVDQASAFVAS (2) 23058
22787
GEVWRRGRRVFEASVINGVR (2)
22557
LAAHLPAINRCANRFVAQL
AQRVAAPAAAHSGKTLGEEGIDMFS
22396
IVGGYTMAVTGEVAYG 22349 (2)
HVPAVTRGVRPFWQVEHSTLYLPLG
(0)
21478
VMFPWARPLVRWLATHFPDRAQREHMAARTQI 21446
IANISRLLMERWATSKKAAAAAAGTG
TGTGTAITADSKAGTASAPPAEAARADGAAAAGKGAEEA
IKEVGGGISSSSFMAAMMEGRRGAPQEERLSDVE (0)
VIAQSFTFV 20858
MAGFETTALTLSLVTFMLATHPE (0) 20791
AAARLTAEVDGLGPGELTHEVLAE (0)
20358
KLPYTEAVIKETLRLHPPIPYFIREAREDLDLGNGMVAPK
(2) 20233
19945
GSYLTMYMHAVHLNPDVWPHPERFLPQRFLPEGSAAFGPADPGAWAPFGIGARMCVGHKLAMM (0) 19775
19557
MAKTLLVRMYQRFRIELHPRQPLPLKMKTGLSRVPVDGVWVTLTER*
Name:Chlre2_kg.scaffold_23000133
Protein ID:148983
Location:Chlre3/scaffold_23:958944-961028
N-term part, missing two exons
Name:e_gwW.23.96.1
Protein ID:118452
Location:Chlre3/scaffold_23:962118-963240
Only covers I-helix to heme
scaffold_23
958703
MALSSAWALAGLFL (0) 958744
958941
AMFVFFGYSLRKRWQLRKIP (1) 959000
959120
GALGWPFLGSIPEFSIYGYEYVLGLSAKLGN (0) 959212
959546
AWLGVEPLIIICDPALIR (2) 959599
959823
KYAYKCVSKPPSMSEYGHVLTGFNYDVDQASAFVAS (2) 959930
960201
GEVWRRGRRVFEASVINGVR (2) 960260
960457
LAAHLPAINRCANRFVAQLAQRVAAPAAAHSGKTL
GEEGIDMFSIVGGYTMAVTGEVAYG (2) 960636
961230
HVPAVTRGVRPFWQVEHSTLYLPLG
(0) 961304
961510
VMFPWARPLVRWLATHFPDRAQREHMAARTQ 961602
961603
IIANISRLLMERWATSKKAAAAAAGTG
TGTGTAITADSKAGTASAPPAEAARADGAAAAGKGAEEA
IKEVGGGISSSSFMAAMMEGRRGAPQEERLSDVE (0) 961899
962100
VIAQSFTFVMAGFETTALTLSLVTFMLATHPE (0) 962195
962367
AAARLTAEVDGLGPGELTHEVLAE (0) 962438
962633
KLPYTEAVIKETLRLHPPIPYFIREAREDLDLGNGMVAPK
(2) 962752
963039 GSYLTMYMHAVHLNPDVWPHPERFLPQRFLPEGSAAFGPADPGAWAPFGIGARMCVGHKLAMM (0) 963228
19557
MAKTLLVRMYQRFRIELHPRQPLPLKMKTGLSRVPVDGVWVTLTER*
last exon is in a seq gap; use ver 2 seq here
>CYP744A2
C_940017
PTQ5694.x1 K-helix to heme = PTQ11662.x1 PTQ243.x1
PTQ52.x1 PTQ9722.x1
MWNVAELGLALVPVV
(0)
18913
AFVWLAYNLPERWRLRRIP (1) 18854
18740
GPVGLPFLGNILSFSTYGHDYFAMMEKYGR (0) 18648
18338
IWFGVNPWIVVSDPALLR (2) 18285
18027
KLAYKCVGKPASMSEYGHVLTGENYEIEQANAFVAS (2)
17775
GEVWRRGRRVFEASVIHPTR (2) 17722
17477
LAAHLPAINRCANRF 17427 VTRLAQRVAAPAAEPGAGGKDDGHSGGTGNDGGGAGFDFFA
17301
EVGSYTMAVVGEVAYG 17256 (2)
WRLAERESRQGKPAMMSWCPTMCRLPCRLPLPHVHTQVENATKYLPLR
(0)
16350
VMFPWARPLVRWLATHFPDRAQREHMAARTQI 16318
IANISRLLMERWAASKKAAAAAAGTG
GGAGNAAGAGGDRAGG
FKEVGGGISSSSFMAAMMEGRRGAPQEERLSDVE (0)
15798
VIAQSFLFVLAGFETSADTLALTCYLLATHPE (0) 15691
AAARLVAEVDAVGGRELTAELLAE (0)
15294
GLPYTEAVIKEAMRLYPPVPYLLRQAREDLDLGKGMVAPK (2) 15175
HSYVVLYVHSMHLNPDVWPHPERFLPQRFLPEGSAAFGPADPGAWAPFGIGARMCVGHKLAMM
(0)
MAKTLLVRMYQRYRVALHPSQPLPLRMKAGLSRVPLDGIWLTLTEREAAAAAVAVP*
Name:e_gwW.23.89.1
Protein ID:118526
Location:Chlre3/scaffold_23:969108-971162
I-helix to heme only, seq gap above this
Use earlier version for the top half
scaffold_23
MWNVAELGLALVPVV (0)
18913 AFVWLAYNLPERWRLRRIP (1) 18854
18740 GPVGLPFLGNILSFSTYGHDYFAMMEKYGR (0)
18648
18338 IWFGVNPWIVVSDPALLR (2) 18285
18027 KLAYKCVGKPASMSEYGHVLTGENYEIEQANAFVAS
(2)
17775 GEVWRRGRRVFEASVIHPTR (2) 17722
17477 LAAHLPAINRCANRF 17427 VTRLAQRVAAPAAEPGAGGKDDGHSGGTGNDGGGAGFDFFA
17301 EVGSYTMAVVGEVAYG 17256 (2)
WRLAERESRQGKPAMMSWCPTMCRLPCRLPLPHVHTQVENATKYLPLR
(0)
969108
VMFPWARPLVRWLATHFPDRAQREHMAARTQI 969200
969201
IANISRLLMERWAASKKAAAAAAGTGGGAGNAAGAGGDRAGG
FKEVGGGISSSSFMAAMMEGRRGAPQEERLSDVE (0) 969428
969669
VIAQSFLFVLAGFETSADTLALTCYLLATHPE (0) 969764
969927
AAARLVAEVDAVGGRELTAELLAE (0) 969998
970161
GLPYTEAVIKEAMRLYPPVPYLLRQAREDLDLGKGMVAPK (2) 970280
970565
HSYVVLYVHSMHLNPDVWPHPERFLPQRFLPEGSAAFGPADPGAWAPFG 970711
970712
IGARMCVGHKLAMM (0) 970753
970992
MAKTLLVRMYQRYRVALHPSQPLPLRMKAGLSRVPLDGIWLTLTEREAAAAAVAVP* 971162
>CYP744A3
C_940044
MPGLGALLAFIQTPLGA (0)
ITWLGWYPLRRYAFRKFP
(1)
3380
GPFGLPFLGNLPQ (0) 3430
3597
IAAMDTTAFLTSSAVKYGPVCK (0) 3662
3831
VWFGTRPWVLINDPELIR 3884 (2)
4267
RHSFRWPARPANFASYFHVMTGENRAIDRAGVVLAE (2) 4371 TIN460677.b1
GEVWRRGRRAFEGSIIHPAR
(2) WEQ17438.g11, TIN285957.x1
5063
LAAHVPAMLRCLGRFTARLDRHAGSAQPLDVAAALGDLMLAAMGQIAYG 5218 (2)
VDFGCEEGADSSASNSSGVAGELVAALRDLFETMRMENATAYLPLQ
(0)
5902
LMFPALEPLWLWAAHHMPDAKQTKAMRARSK 5994 (0)
VAEVSRLLMEQWQANKAAAVAAAASGGAGGADGGDRAGG
FKEVGGGISSSSFMAAMMEGRRGAVEDRLSDIE (0)
6987
VIGQGFTFLAAGYETTSAATSLALFLLATHPE (0) 7049
7448
AAARLAAEVDAVLGGRELTAELLAE (0)
8071
KLPYTEAVIKETLRLHPGITFLVREATEDVDLGAGRVVPR 8190 (2)
8546
GSTLCMATHAVMHDPDIWPEPEAFRPERFLPEGSAGGGGSSSLWPTAGGNNPHVWA
8714
PFGMGTRMCVGHKLAMM (0)
9146
ASKATLVSLCQRFSFALHPKQPLPLKLKTGLTYGPADGVWMTVTRRG*
Name:e_gwW.23.108.1
Protein ID:118465
Location:Chlre3/scaffold_23:976166-982342
I-helix to heme only
Exons 6 and 7 are in a seq gap and are taken from trace archive files
982342
MPGLGALLAFIQTPLGA (0) 982292
982144
ITWLGWYPLRRYAFRKFP (1) 982091
981891
GPFGLPFLGNLPQ (0) 981853
981686
IAAMDTTAFLTSSAVKYGPVCK (0) 981621
981452
VWFGTRPWVLINDPELIR (2) 981399
981016
RHSFRWPARPANFASYFHVM 980957 TGENRAIDRAGVVLAE (2) TIN460677.b1
GEVWRRGRRAFEGSIIHPAR (2) WEQ17438.g11, TIN285957.x1
980414
LAAHVPAMLRCLGRFTARLDRHAGSAQPLDVAAALGDLMLAAMGQIAYG (2) 980268
979957
VDFGCEEGADSSASNSSGVAGELVAALRDLFETMRMENATAYLPLQ
(0) 979820
979584
LMFPALEPLWLWAAHHMPDAKQTKAMRARSK (0) 979492
979090
VAEVSRLLMEQWQANKAAAVAAAASGGAGGADGGDRAGG 978974
978973
FKEVGGGISSSSFMAAMMEGRRGAVEDRLSDIE (0) 978875
978529
VIGQGFTFLAAGYETTSAATSLALFLLATHPE (0) 978434
978038
AAARLAAEVDAVLGGRELTAELLAE (0) 977961
977384
KLPYTEAVIKETLRLHPGITFLVREATEDVDLGAGRVVPR (2) 977265
976909
GSTLCMATHAVMHDPDIWPEPEAFRPERFLPEGSAGGGGSSSLWPTAGGNNPHVWA
PFGMGTRMCVGHKLAMM (0) 976691
976309
ASKATLVSLCQRFSFALHPKQPLPLKLKTGLTYGPADGVWMTVTRRG* 976166
>CYP744A4
between C_239009 and C_239004 not annotated
AV641971 35% to 703A2 N-term to C-helix
51492 MYAALALVLSPVLL (0) 51451
51367 ALLWAIINPVERWKTRKIPG (2) 51308
51224 PPGLPLLGHLLNFATGDATDFTVEAVKKYGNVVA
(0)51123
50867 IWFGNRAWITIADPALIR (2)50814
50325 KLGFKFLNRPARMTDFGH (0) 50272
49795 VLVGHNAEVDNAGAFVAR (2)49706
49574 GEVWRRGRRAFEASIIHPAS (2) 49515
58799
LAAHLPAINRCANRFVARLARRAAAAAAAAADASLGSAGGGAAQGEQQGKAALAMKQQGG
GGGGGVEILTEAGNYTMAAVGEVAYG (2) 58542
(SEQ GAP)
47764
(0) LMFPALRPLWRWMAEHLPDAAQTENMRARSK (0) 47672
VAEVSRLLMEQWQANKAAAAAAAASGGDGGADGGDRAGGF
56888
KEVGGGISSSSFMAAMMEGRRGAVEDRLSDIE (0) 56811
46972
VIGQGFTFLVAGYETSSNTTTMASYLLATHPAAQQRMADEIDAVLG 46832
46831
PWRAGAGAGEGACAGGELTPELLAK (0) 46757
46326
LPYTEAVLQETLRLYPAAPYLLREAREEVDLGGGRVVPK (2) 46288
46008
DSVLVLHVHSMQRDPDVWPQPEAFLPQRYLPEGQAALGPADPNGWAPFGVGARMCVGHKLAMM (0) 45820
45561
VTKVALVRMYQRFRVSLHPRQPLPLKMKTGLVRVPADGVWLTLTER* 45421
Note: scaffold 121 has part of the last exon as a duplication
61920
VTKVALVRMYQRFRVSLHPRQPLPLKMKTGLL 61825
Name:fgenesh1_est.C_scaffold_23000031
Protein ID:95157
Location:Chlre3/scaffold_23:1143890-1147747
CYP744A4 N-term
Name:e_gwH.23.61.1
Protein ID:103666
Location:Chlre3/scaffold_23:1141463-1143101
CYP744A4 I-helix to end
1147747
MYAALALVLSPVLL (0) 1147706
1147622
ALLWAIINPVERWKTRKIPG (2) 1147563
1147479
PPGLPLLGHLLNFATGDATDFTVEAVKKYGNVVA
(0) 1147378
1147122
IWFGNRAWITIADPALIR (2) 1147069
1146580
KLGFKFLNRPARMTDFGH (0) 1146527
1146050
VLVGHNAEVDNAGAFVAR (2) 1145997
1145829
GEVWRRGRRAFEASIIHPAS (2) 1145770
1145397
LAAHLPAINRCANRFVARLARRAAAAAAAAADASLGSAGGGAAQ
GEQQGKAALAMKQQGGGGGGGVEILTEAGNYTMAAVGEVAYG (2) 1145140
(missing exon 9 in a SEQ GAP)
1143893
(0) LMFPALRPLWRWMAEHLPDAAQTENMRARSK (0) 1143801
1143522
VAEVSRLLMEQWQANKAAAAAAAASGGDGGADGGDRAGGF
KEVGGGISSSSFMAAMMEGRRGAVEDRLSDIE (0) 1143307
1143098
VIGQGFTFLVAGYETSSNTTTMASYLLATHPAAQQRMADEIDAVLG
PWRAGAGAGEGACAGGELTPELLAK (0) 1142886
1142455
LPYTEAVLQETLRLYPAAPYLLREAREEVDLGGGRVVPK (2) 1142339
1142048
DSVLVLHVHSMQRDPDVWPQPEAFLPQRYLPEGQAALG
PADPNGWAPFGVGARMCVGHKLAMM
(0) 1141860
1141603
VTKVALVRMYQRFRVSLHPRQPLPLKMKTGLVRVPADGVWLTLTER* 1141463
>CYP744A5P
pseudogene C_1730009 C-helix 81% to 744A3
PROBABLE pseudogene WITH PART OF EXON 3, EXONS 4, 5 AND 6
13946
QIAAMDTTAFLTSSAVKYGPVCK 13875
13607
AWFSTQPWVINDPKLVR 13551
RHSFRWRARPSLFASYFQVMTGENRAIDRAGVGAGG
12773
GEAWRRTRRVLEGSIIHPAR 12705
Name:Chlre2_kg.scaffold_21000002
Protein ID:148389
Location:Chlre3/scaffold_21:6347-7649
Frameshift at PSL/FASY, bad boundary
6350
QIAAMDTTAFLTSSAVKYGPVCK (0) 6418
6683
AWFSTQPWVINDPKLVR (2) 6733
7163
RRHSFRWRARPSL 7201
7200
FASYFQVMTGENRAIDRAGVGAGG (0) 7271
7533
(2) GEAWRRTRRVLEGSIIHPAR
(2) 7592
>CYP744B1
C_8650001 C_940020
FIRST TWO EXONS FOUND BY WALKING. FIRST EXON IS A BEST GUESS
MELVSGLALAGVALFIL (0)
TIN33450.x1
GFIWAGFNPIERYLSPLRRFP
(1) TIN292840.x1 WALKED TO THIS READ
GPAPLPFLGNLVSVATRDLTAYLADCRQAYGG (0)
220 IWLGNQPWVCVADPDLIR (BAD BOUNDARY, SHOULD BE phase 2)
568 RVAYRVLSRPFSHTDSIHLLAGEQWEVDCNTLVFLK
(2) 672
SEQ GAP (EXON 6)
1533
(2) LAGHLPAVWRCVRRYTPRLERHAAT (1) 1589
1838
GEPLDLSSDLADLTLAVVGEAAYG 1882 (2)
VDFRTTDEQQDGGRPADPSAPGPALVAAVRECFDCLDVNKTTMYGPLK (0)
2710
MIWPGLTPLWRWMAKHLPDAAQTRHMR (0) 2737
VADVSRQLMAQWQAAKAKTAAAADTAGATAASGAGAEAGAGVGVGAGAQAKPGGGGAVQA
FVEVGGGISSSSFMASLLEGRRGAAKEEERLTDLQ
(0)
3663
IVAQCLTFLLAGFETTAATISFTAFCLATHPEAQARLLAE 3782
VDEHFARQAAAEQQQQGQQQREGDDALPE (0)
4526
LPYLDAVLKESMRLYPAGSALIRKSPQPLDLGRDGLVIPG (2) 4645
NTFVCLATHAV
4956
MHDPAIWPEPEAFRPERFLPEGSSSLGPMVGGAAASAPAGGGADAAAAAWVPFGMGPRM 5132
5133
CVGSKFATM 5153 (0)
5425
VSKAVLLQIYRRFTFELHPKQ (0) 5484
VLPLRTRTALTHAPRDGIWVVVKAR* 5818
Name:e_gwW.23.77.1
Protein ID:118428
Location:Chlre3/scaffold_23:1014183-1020804
Probable GC boundary at exon 4 DLIR = AGGC
1014183
MELVSGLALAGVALFIL (0) 1014233
1014336
GFIWAGFNPIERYLSPLRRFP (1) 1014398
1014778
GPAPLPFLGNLVSVATRDLTAYLADCRQAYGG (0) 1014873
1015176
IWLGNQPWVCVADPDLIR 1015229 (GC BOUNDARY?)
1015524
RVAYRVLSRPFSHTDSIHLLAGEQWEVDCNTLVFLK (2) 1015631
1015978
NGPTWRLARRAFESSIIHPQS (2) 1016040
1016491
LAGHLPAVWRCVRRYTPRLERHAAT (1) 1016565
1016769
GEPLDLSSDLADLTLAVVGEAAYG (2) 1016840
1017260 VDFRTTDEQQDGGRPADPSAPGPALVAAVRECFDCLDVNKTTMYGPLK (0) 1017403
1017671
MIWPGLTPLWRWMAKHLPDAAQTRHMR (0) 1017751
1018161
VADVSRQLMAQWQAAKAKTAAAADTAGATAASGAGAEAGAGVGVGAGAQAKPGGGGAVQA
FVEVGGGISSSSFMASLLEGRRGAAKEEERLTDLQ
(0) 1018451
1018621
IVAQCLTFLLAGFETTAATISFTAFCLATHPEAQARLLAE
VDEHFARQAAAEQQQQGQQQREGDDALPE (0) 1018827
1019484
LPYLDAVLKESMRLYPAGSALIRKSPQPLDLGRDGLVIPG (2) 1019603
1019881
NTFVCLATHAVMHDPAIWPEPEAFRPERFLPEGSSSLGPMVGGAAASAPA
GGGADAAAAAWVPFGMGPRMCVGSKFATM (0) 1020117
1020380
VSKAVLLQIYRRFTFELHPKQ (0) 1020442
1020727
VLPLRTRTALTHAPRDGIWVVVKAR* 1020804
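Note: a probable GC boundary is flagged for exon 4 of CYP744B1 above. A minimal sketch of how an intron can be labelled by its donor and acceptor dinucleotides, assuming the intron sequence is given 5' to 3' on the coding strand. The function name and the toy sequences are invented for illustration and are not taken from the assembly.

```python
def classify_intron(intron: str) -> str:
    """Label an intron by its first two (donor) and last two (acceptor) bases."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to classify")
    donor, acceptor = seq[:2], seq[-2:]
    if acceptor != "AG":
        return f"non-canonical ({donor}-{acceptor})"
    if donor == "GT":
        return "canonical GT-AG"
    if donor == "GC":
        return "GC-AG (minor donor)"
    return f"non-canonical ({donor}-{acceptor})"


# Toy introns, invented for illustration only.
print(classify_intron("GTGAGTccccttgcAG"))  # canonical GT-AG
print(classify_intron("GCAAGTccccttgcAG"))  # GC-AG (minor donor)
```

This only inspects four bases, so it cannot tell a true GC donor from a misplaced boundary; it is a quick filter before looking at the alignment.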
>CYP744C1
C_1370013 43% to 744A2
2459
MQLTWLGWAPVTRWRLRNIP (1) 2400
1885
GPFALPFLGHLPAISARDLVHFCHDVARQYGP (0) 1787
1503
VWVAARPWIVVSDPVAARKIAYR (2) 1423
1222
SLARPSTVASFTHALVGEPRQVDDESIFWNR (2) 1142
784 GPAWKASRRAFETSVLRPDRL 722
721
AAHMPAVRRCTERFLARLAPYADGSTAVDMKDEYGVIALAITGEVAY (1)
VSFWPSDEDAALLAAPTGGSGAATSSSSSSSSSSKSPSSALVRACHECMACFELPLATMYLPLQ
(0)
MLLPALRPLWLALAAALPDAAQRRHMEARQAVADVSRRLMREWQQQ (0)
AAARANDSGGDGLLLKDQTPVVNGGSSSSGSGISSSSFLAAMLKDQTGSNTACASSSGTDGG (0)
VISQGLSFILAGYDTTGTTLALTTFLLAHNPTTQE (2)
KLRAELVENRELLDSADGLAQ
(0)
LPYLDAVLKESQRLHPAVGHFWRDATSDIALPEMGGLVIPK (2)
508 GSFVSISIYNMHRDPAHWKEPERFIPERFLQ (1)
603
905 ATGGALGPTDPGAYVPFGSGPRMCVGYKMAIM (0)
1539
VVKSVLAGLLLRYRVALHPRQPLPLRLKTGLTLEPADG 1652
VWVTLQPLLLPGAK*
Name:fgenesh2_pg.C_scaffold_39000151
Protein ID:177201
Location:Chlre3/scaffold_39:932071-938361
932071
MQLTWLGWAPVTRWRLRNIP (1) 932130
932645
GPFALPFLGHLPAISARDLVHFCHDVARQYGP (0) 932740
933039
VWVAARPWIVVSDPVAARKIAYR (2) 933107
933305
SLARPSTVASFTHALVGEPRQVDDESIFWNR (2) 933397
933746
GPAWKASRRAFETSVLRPDRL
AAHMPAVRRCTERFLARLAPYADGSTAVDMKDEYGVIALAITGEVAY (1) 933949
934140
VSFWPSDEDAALLAAPTGGSGAATSSSSSSSSSS
KSPSSALVRACHECMACFELPLATMYLPLQ (0) 934331
935234
MLLPALRPLWLALAAALPDAAQRRHMEARQAVADVSRRLMREWQQQ (0) 935371
935723
AAARANDSGGDGLLLKDQTPVVNGGSSSSGSGGI
SSSSFLAAMLKDQTGSNTACASSSGTDGG (0) 935911
936216
VISQGLSFILAGYDTTGTTLALTTFLLAHNPTTQE (2) 936320
936586
KLRAELVENRELLDSADGLAQ (0) 936648
936914
LPYLDAVLKESQRLHPAVGHFWRDATSDIALPEMGGLVIPK (2) 937036
937175
GSFVSISIYNMHRDPAHWKEPERFIPERFLQ (1) 937267
937572
ATGGALGPTDPGAYVPFGSGPRMCVGYKMAIM (0) 937667
938203
VVKSVLAGLLLRYRVALHPRQPLPLRLKTGLTLEPADGVWVTLQPLLLPGAK* 938361
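Note: each gene model above carries a Location line of the form assembly/scaffold:start-end. A minimal sketch of parsing that line and computing the span of the model, assuming inclusive coordinates. The class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class Location(NamedTuple):
    assembly: str
    scaffold: str
    start: int
    end: int

    @property
    def length(self) -> int:
        """Inclusive span of the gene model in nucleotides."""
        return self.end - self.start + 1


LOCATION_RE = re.compile(r"Location:\s*(\w+)/(\w+):(\d+)-(\d+)")


def parse_location(line: str) -> Location:
    """Parse a 'Location:Chlre3/scaffold_N:start-end' line."""
    m = LOCATION_RE.search(line)
    if m is None:
        raise ValueError(f"not a Location line: {line!r}")
    assembly, scaffold, start, end = m.groups()
    return Location(assembly, scaffold, int(start), int(end))


loc = parse_location("Location:Chlre3/scaffold_39:932071-938361")
print(loc.scaffold, loc.length)  # scaffold_39 6291
```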
>CYP745A1 C_1860018
AV623700
N-term 31% to CYP735A4 rice, 28% to CYP97A4 rice
similar to CYP97 and CYP72 clans
MASSSSPLEELLAFAGVKDGTISSPRLALVVLGAALAAYALVFAVINVVDYIRIARGLSAIPSAPGGVPLLGHVIPMLT
CVSQNKGAWDIMEDWMDAKGPIVKYNIAGTQGVAVRDPKAMKRIFQTGYKLYEKDLKLSYRPFLPILGTGLVTS
DGALWQKQRMLMGPALRVDVLDDIIR
IAKKAIDRLCEKLSHHAGKGDIVDIEEEFRLLTLQ
(0)
VIGEAVLSLGPEECDR
(0)
VFPQLYLPV
MNEANRRVLRPYRMYLPTPEWFRFSSRMGQLNGFLIDLFRRRWQARQAAAAAAQGEGSSS
SKPKPADILDRIMEAIE
ESGAKWDAALETQLCYEVKTFLAGHETSAAMLTWSTLELAAHSQAADK (0)
VVEEARAAFGPRGESEAGRRAVDEMIYTLAVLK
(0)
ECLQLRLPVIMSE
(0?) this may be wrong there is a seq gap that may have the true exon
AEDDPQGLLGYPLPRGTMVACHLQ
(0)
GTHRLYESPDEFRPDRFMPGGEYDQFDDADRAYMFLPFIQ
(0)
GPRNCLGQHLALLEARVVLGLLHARFSFKPAPSVHPDPASLFMRHPTVIPVGPIRGLKVLVEQRK*
Name:Chlre2_kg.scaffold_74000010
Protein ID:154128
Location:Chlre3/scaffold_74:79791-84023
Revised the EXXR exon
This seq is most like CYP97 or CYP746 sequences.
It clusters with the 72 clan or the 97 clan, and these two cluster with each other.
84023
MASSSSPLEELLAFAGVKDGTISSPRLALVVLGAALAAYALVFAVINVVD 83874
83873
YIRIARGLSAIPSAPGGVPLLGHVIPMLTCVSQNKGAWDIMEDWMDAKGP 83724
83723
IVKYNIAGTQGVAVRDPKAMKRIFQTGYKLYEKDLKLSYRPFLPILGTGL 83574
83573
VTSDGALWQKQRMLMGPALRVDVLDDIIRIAKKAIDRLCEKLSHHAGKGD 83424
83423
IVDIEEEFRLLTLQ (0) 83382
83181
VIGEAVLSLGPEECDR (0) 83134
82841
VFPQLYLPVMNEANRRVLRPYRMYLPTPEWFRFSSRMGQLN 82719
82718
GFLIDLFRRRWQARQAAAAAAQGEGSSSSKPKPADILDRIMEAIE (0) 82584
82327
ESGAKWDAALETQLCYEVKTFL 82262
82261
LAGHETSAAMLTWSTLELAAHSQAADK (0) 82181
81879
VVEEARAAFGPRGESEAGRRAVDEMIYTLAVLK (0) 81781
81308
EGLRKYSVVPVVTRVL (0) 81263
80886
AEDDPQGLLGYPLPRGTMVACHLQ (0) 80815
80411
GTHRLYESPDEFRPDRFMPGGEYDQFDDADRAYMFLPFIQ (0) 80292
79988
GPRNCLGQHLALLEARVVLGLLHARFSFKPAPSVHPDPASLFMRHPTV 79845
79844
IPVGPIRGLKVLVEQRK* 79791
>CYP746A1
C_28140001 = C_250032 C-helix exon duplication
This is a bacterial-related seq like CYP252A1, CYP197A1, CYP208A1
N-term is probably in a seq gap. C-term runs off the end
scaf 2814 is a repeat of the C-helix exon
39% to CYP252A1 from Streptomyces peucetius, but not bacterial because it has introns.
MLALAGGLQSMLQVSSPLVTHKITYGSL
(0)
RLSSPPPPAFPAGPSGDQTLPLLTDPLRFLTDAT
(SEQ GAP HERE)
31584
GNGLLVSDGPVWQRQRRLSNPAFRRAAV 31495
EAYGGAMVAATEDMMRRVWGPA (1)
GGTRDVYADFNELTLQVTLEALFGF
SEDAAQIVAAVEKAFTFFTQR (2)
AATGFVIPEWLPTWDNLEFAAAVQQLDRVVYGMINRRRQELAAAF (1?)
30612
AGVPSDLLTSLLLARDEDGSGMSDQALRDELMTLLVAGQ (0) 30502
30091
ETSAILLGWASALLAAHPEVQAAAAAEVAAVCGGPEAGTPTPAS
(2)
29766
VRHMPYLESVVLETLRLYSPAYMVGRCARRDAALGPYVLPAG
TTVLVSPYVMHRDPEVWEEPEVFRPERWQELQRR 29548
29296
EGYSGYMGLMSNLGPNGAYLPFGGGPRNC 29261
(SEQ GAP HERE)
KPLLTLRPEAVVLRISPRRQ*
Name:e_gwW.1.470.1
Protein ID:116510
Location:Chlre3/scaffold_1:3570907-3575049
50% to CYP746B1 Physcomitrella patens (moss)
top 26 hits in nr section of GenBank all bacterial, followed by CYP97A of Glycine max
3575049
MLALAGGLQSMLQVSSPLVTHKITYGSL (0) 3574966
3574076
RLSSPPPPAFPAGPSGDQTLPLLTDPLRFLTDAT 3573975
3573974
ATYGPVVGLLLGGERVALVTGRAEARA
VLVEAAGEVYVKEGTAFFPGSSLA (1) 3573822
3573413
GNGLLVSDGPVWQRQRRLSNPAFRRAAV
EAYGGAMVAATEDMMRRVWGPA
(1) 3573264
3573081 GGTRDVYADFNELTLQVTLEALFGF () 3573001
3572874
SEDAAQIVAAVEKAFTFFTQR (2) 3572812
3572660
AATGFVIPEWLPTWDNLEFAAAVQQLDRVVYGMINRRRQELAAAF (1?) 3572496
3572453
AGVPSDLLTSLLLARDEDGSGMSDQALRDELMTLLVAGQ (0) 3572289
3571932
ETSAILLGWASALLAAHPEVQAAAAAEVAAVCGGPEAGTPTPAS
(2) 3571801
3571661
VRHMPYLESVVLETLRLYSPAYMVGRCARRDAALGPYVLPAG
TTVLVSPYVMHRDPEVWEEPEVFRPERWQELQRS (2) 3571326
3571149
NLGPNGAYLPFGGGPRNC 3571090
3571089
IGTGFAMMEALLVLAALLQRYSLALPPAAGSSSGGAFPKP
3570969
KPLLTLRPEAVVLRISPRRQ* 3570907
>CYP747A1 C_900050 41% to CYP743B2 C-term
EXXR to PERF IN SEQ GAP
I HELIX LOCATED 28000BP AWAY ON SMALL FRAGMENT (MISASSEMBLY?)
FIRST EXON IS A BEST GUESS
352943
MKSALSAFVRDSGDQVAETGAPTATRPIPGPAPLSLEALK 352824 (0)
352717 DVSVIFFEGLHVAQLKFSEKYGPVCR 352640 (2)
352462
FANPASLNGATSWVFINSPENIQHVCATNVRNYS 352361 RRYLPDIYT (2)
352115 YVTHGKGILGSQ 352080 (0)
351877 DEYNARHRRLCSGPFRNKWQLQRFSSVVVER
351785 (2)
351348
SKRLVDIFSAAAAADPSGAFTTDVATQTQRLTLDVVGLVAFSHDFACVEQVQR 351190 (2)
350690 DLAGATAGDGRSGVLQDRVLWAVNTFGEVLAQVFITPLPLLK
350565 (0)
350317
AMDRLGAPHLRQLGEAVSVMRAAMLDVIA 350231 (0)
378450
ATEDDGRGLSDEELWEDVHDIMGAGHETTATTTAALLYCISAHPHVRQRLEEELDAVLAGG
378271 (0)
348405 (0) REARQHRFQWLPFGAGPRMCLGASFAQ (0) 348325
348100 MSVALMAATLLQRFRFTPLAPCSPLIPVGYDITMNFGPSGGLRMRVAPRQRGQQQ*
347933
Name:e_gwH.96.3.1
Protein ID:108849
Location:Chlre3/scaffold_96:178714-184286
Model only covers I-helix to heme region
This seq is now complete, 38% to 97A6 in C-term half
178714
MKSALSAFVRDSGDQVAETGAPTATRPIPGPAPLSLEALK (0) 178833
178940
DVSVIFFEGLHVAQLKFSEKYGPVCR (2) 179017
179195
FANPASLNGATSWVFINSPENIQHVCATNVRNYSRRYLPDIYT (2) 179323
179686
YVTHGKGILGSQ (0) 179721
179924
DEYNARHRRLCSGPFRNKWQLQRFSSVVVER (2) 180016
180453
SKRLVDIFSAAAAADPSGAFTTDVATQTQRLTLDVVGLVAFSHDFACVEQVQR (2) 180611
181108
RDLAGATAGDGRSGVLQDRVLWAVNTFGEVLAQVFITPLPLLK (0) 181236
181484
AMDRLGAPHLRQLGEAVSVMRAAMLDVIA (0) 181570
182398
ATEDDGRGLSDEELWEDVHDIMGAGHETTATTTAALLYCISAHPHVRQRL
EEELDAVLA
(1) 182574
182854 DGEAPTYESLERMPYLQ (0) 182904
183327
ACAKEVMRLYPAIPVFPREAARPDVLPTGHGVAAGDVVFMSS
YALGRSEAVWGPDVLEFDPDR (2) 183515
183802
FSPEREARQHRFQWLPFGAGPRMCLGASFAQ (0) 183895
184119
MSVALMAATLLQRFRFTPLAPCSPLIPVGYDITMNFGPSGGLRMRVAPRQRGQQQ* 184286
>CYP748A1
C_1820019 about 40% to C-term half of 741A1
>C_1820019
N-terminal missing (about 65aa). This seq begins at the KYG motif (TYG)
There is a seq gap before this seq, which is probably where the true N-terminal is located.
Name:e_gwW.9.168.1
Protein ID:114278
Location:Chlre3/scaffold_9:2353835-2358515
2353322
MSSALDELRFYGTLAATLLGPRYDLGRVPGPPGHPLLGNITAVMRPDYHVQ (0) 2353474
2353835 MLEWANTYGGIFKFSLGFQPVVVVSDPAVAVQVLGRAPGRAIPRKCVGYKFFDL (0) 2353996
2354237 ATNASGAHSFFTTSDEGQWAAVRKAAAAAFSSANVK (2) 2354344
2354560 KAFPIALRHLLL
(0) 2354595
2355565
LSLLHVFVEALFGVTPEDFP (1) 2355624
2355926
GRQVAADMNLVLEEANSRLKVPLSGLARAVTQPV (0) 2356030
2356150
VGWREGGTGHVSRGFGARNSRAWGSGEKEWTEENWEPR (0) 2356263
2356454
AVTDLWACLGRVRHPRT (1) 2356504
2356839 GELLGRQGLVPEIGALMMAGFDTSSHSVAWALFALAANPEAQQRVRQELDGRGLLRRP (1) 2357012
2357269 GTAAPPRLPVLDDLPQLPYLNACIDEAMRMYPVAATASVR (2) 2357388
2357569 EVTEPTRVGDFVIPPGVIVWPMLYALHNSVHNWDQPDVFKPERWLQSNAGGSS (1) 2357727
2357950 GKGGGGGKRYMPFSDGMKSCLGQ (0) 2358018
2358133 ALGLMEVRTALVVLLGR (2) 2358183
2358396 YAFALDPGHGGEAAVRRSMIMSLTLKIRGGLRLVATPLG* 2358515
>Volvox CYP748A1 79% to Chlamydomonas 748A1
ABSY209135.g1 exon 1
ABSY189778.b1 exon 2a
ABSY140806.b1 exon 2b
ABSY42643.g1 exon 3
ABSY86219.x1 exon 4
ABSY93957.g1 exons 5, 6
ABSY112787.y1 exons 8, 9 fused
ABSY106164.g1 exons 10, 11
506
MSSSWEELCFYGHLASTLFSPKYDLARVPGPRGSFGLGNITAVMRPDYHVQ (0) 348
203
MLEWANQYGGVYKFSLGFQWVVVVSDPRIAVQ (0) 298
289
VLGRGPDSIPRKCVGYKFFDL (0) 248
33 ATNAAGAHSFFTTSDETQWAAVRKAAAAAFSSANVR
(2) 140
394
KAFPIALRHSRL (0)
716
LSMLHVFMEALFGIRPEDFP (1) 657
290
GRQVAADMNLVLEEANERLKVPLRKVAMALVRPT (0) 189
623
GVTDLWACLGRVRHPVT 573
GAPLGRDALVPEIGALMMAGFDTSSHSVAWVLFALAAHPGAQLRCRQELAARGLVAEGA (1)
984
GSAQRGPTLDDLIQLPYLNAVIDETMRMSPVAATASVR (2) 871
357
EVTQPTRVGDYVIPPGVIVWPMLYALHNAVHNWDRPDEFLPERWLPGSGAA (1) 199
2357672
AGCCAGACGTCTTCAAGCCCGAGCGATGGCTGCAGAGCAACGCCGGCGGCAGCAGCAGTGACAGCGGTGGCAGCAGCAGCAAGGGCGGCAACGAGGAAGC
2357772
GGGGGTGGCCGGTGCCGGTGGCGGTGGCGCGGGAGGCGCTCGTTCGGCCGCGGCTAACGACGAGGGCAGCGGCGGCGCTGCGGGTGGCTTGGGCGGTGGC
2357872
GGCAGTGGCGCCAGCAGCAGGAGCGGCTCCTCCGCCGCCCTGGGTGCGGCGGCGGCGGCGGCGGCAGACGGCGGCGGAGGCAAGGGCGGCGGCGGCGGCA
2357972
AGCGCTACATGCCGTTCAGTGACGGCATGAAGAGCTGCCTGGGGCAGGTGGGTGGGTGGGCTCTGGGGGTATGTCGTGGTTAGATTCCGCCCCTCACCTT
2358072 TCCCTCCCTTCTCCCGCGCGAAACTTCCCTCATGCTTTCCGCCCTCCTCCTCCCGCCGCAGGCTCTGGGGCTTATGGAGGTGCGCACCGCACTGGTGGTG
2358172
CTGCTGGGCAGGTGCGTGCGTGCGGCGCCGGGGCAGGGGTGGGCGTGGGGGCATGAGGGGGAATGGCCTCAGTGAGATGACGC
2352872
GAAGCAGCGCTTAGTGGTGGTGGCGGGGGAGCTGGGCGGAGCCGCGACCCACGGAGGCGCCGGCCGGCGACTGGAGCACAACGCTTCGCTTCGGCGCTGT
2352972
GCCACTGCTGCAACACAACTGAACATAGGATTCACAGCACTGTTGCTACTGGACGCCACGTCGAGGCTATCGCAGCTATCCCAGAGGACCGCCGCCGGAG
2353072
CCGGGAGGCCACCCCTCAACGCACGCCGCCGTGTGCAGCCAGCCAGTCGGTCCTTTGCCGCCGGCGCAATCAGCACCACCAGCAGCGCAAACAGCCGGCA
2353172
CACACACAGACACCGTACAGCAGCTAACTTGCCAGCCCAACTGCATAGCAGCAGCTCTCCGCCTTTCTACCCCACATCACCCACCCACGCACCCAAGCCT
2353272
CTCCAAGCCACCGCTCCCCTCACCTCTCCCGCTGCAACACACGCCGCACCATGTCCTCGGCCCTGGATGAGCTGCGCTTCTACGGCACCCTGGCCGCCAC
2353372 GCTGCTGGGCCCGCGCTACGACCTGGGCCGCGTGCCGGGGCCGCCCGGGCACCCGCTGCTGGGCAACATCACCGCAGTCATGAGGCCCGACTACCACGTG
2353472
CAGGTGTGCTGACGACCGGCGGGGCGGATGGGGGTTGGGCGGGGGGCAAGGGGAGGTGGGGGACTATGGCGCGGAGGAGTTTGGGTGGGGAAGGGATTTG
2353572
GTATTGTGTGGGGTGGGTGGGGTGGGGTGGGGCAGAGGGTTCAGGGGCTGGGTCGGTGTCGAGGCGGCAAGGGGTAGGATGATCATGACCCGGGGGGATA
2353672
GGAGCGTGTGCGGCTCAGCTGCTGCTGGCCGCGCGCCACCACAGCTGCCGCGGCACTACCCTATGCGCCGCTCCGCACCAGGAACAGCACCTCCCCCCAC
2353772
CGCACCGCATCGTGTGTCACGCCCACGCACCTGACTGCTGCTGCCCGGCCTGCTGCCCGCCAGATGCTGGAGTGGGCCAACACCTACGGCGGCATCTTCA
2353872
AGTTCAGCCTGGGCTTCCAGCCGGTGGTGGTGGTGTCCGACCCGGCGGTGGCGGTGCAGGTGCTGGGCCGCGCGCCGGGCCGAGCCATCCCGCGCAAGTG
2354372
GAGGGGACTCCCCACGCTTGCGGCACCCTTGCGCACGTGTGTGACTTGCCTAGCATCCATCGATCCCCGGCTCAAGCGCCTGATATGCCTCCACCGCTTC
2354472
TGTTCAACCGCCCCGTTGTAATCTCCTGCTACTCATGCTCCCTCTCCCTCCCGCTCGCTGCTCCGGATGGCCCTAACGCCGTCGCAGGAAGGCGTTCCCC
2354572
ATCGCGCTGCGTCACTTGCTGCTGGTGGCGGAGTCCCTGGACCCAGCCGGCCCCCACACGCCCGGCAACCCCTACCTCGACCTCACACACCACTCCCAAC
2354672
AACAGCACCAACAACACCAACGGCACCCGCAAATGCAGAAGGGTGACGGTGCCGCCGCCGCACGCATGAGTGGCGGCGACGGCAGCGAAGCGGCGCCTGC
2354772
AGGCAAGGGCGACAGCAGCAGTGGCGCTGCTAGCAGCAGGTGGCTGTGGCGGACGCCCGACCTCAACTGGATGCGGAGCGGCCTCAGCCTCGGGTTCCGC
2354872
CGGCGCAGCCGCAGCCGCCCCGGCAACAGCACCGCCGCCGCCAAGCCTGCTTCTACGCCGCCAGGGAGCGCTACTACTTCCACTGGTGCTACAGCCAACG
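Note: the genomic DNA above can be checked against the listed peptides by translating it in frame. A minimal sketch using the standard genetic code; the helper name is illustrative. The test bases are the first 27 nt of exon 1 of CYP748A1, starting at 2353322 in the DNA listed above.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; unknown codons (e.g. containing N) become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


print(translate("ATGTCCTCGGCCCTGGATGAGCTGCGC"))  # MSSALDELR
```

For exons that start in phase 1 or 2, the leading bases of the split codon would have to be trimmed or joined to the previous exon before translating.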
>ABSY86219.x1 CHROMAT_FILE: ABSY86219.x1 PHD_FILE: ABSY86219.x1.phd.1 CHEM: term DYE: ET TIME: Wed Sep 10 11:54:44 2003
Length = 781
Query: 180 RRRKAFPIALRHLLLVAESLDPAGPHTPGNPYLDLT 287
           RRRKAFPIALRH  LVA  LDPA    P NPY+ LT
Sbjct: 382 RRRKAFPIALRHSRLVAAGLDPAVQPDPANPYIQLT 489
2354972
CCACACATGCGGCAGACGCAGCTCCCAGTGCCAGCAGCAGCTTTGTGGACCTGGGCAGCAGCTGCGTAGGTGCTGACAGCAGTGCGAGCCTCGCCTCTCG
2355072
GTCGTCCTCGCCCTCGGCCACCGCGTCTGCGCCCTGCTCCTGCGGCCGCTGCGGCGCAAACAGCCCGCGCCGCGCCGTCGCCGCCGCCACTGCCACTGCC
2355172
GACACTAAGGGTGGCGGCGCAGAGCGGACGGCCGCGGCGCCGGCGGGGCCGGCGGAGGCGGAGGAGCTTGCGGCGGGCGGCGTGGGAGCGGGCGCGCCGG
2355272
GCGCTGCCGCTGGCGGGCGCTCCATCCACAGCCACCCCTTTGACTGCGGCACCGACGAAACGAGCAGTGTAGACGAGACGCCTCCGCACGCCACGGCTGC
2355372 ACCCGCCGCCGCCACCTGCACCGCCCCAGCCGGCGCTGGCAGCGGCAGCGGCAGCGCCACGGATGCCGGCACCAGCGCTAGCGGCACCATCGACGCGGAA
2355472
AGTAGCACCGGCGCCGGCACTAGCGGCAACCCTAGCGGCGGCGGCACAGGCGGCGGCCCTGCTGCTGTGGTGGACATCCAGGAACACTTGGAGCTGAGCC
2355572
TGCTGCATGTGTTTGTGGAGGCACTGTTCGGAGTGACGCCGGAGGACTTCCCGGGTAGGTGCCGGGGACGGACGGAGGGGGAACAAAGAGGCGAGGCAAG
2355672
GCGAGGCGGTCTGGGCAGACGGGAGGGAGGTTGGTGCCAATTGGCGCCATTCAGTGCTTGCTACTCTGCTGTTTCTATCTCGCCAGTATGTGCTAGCGCA
2355772
CTGTCTGCTGACTGGGCACTGACACGTCACCTGGCTGCTCCCTCCGACCCATATCGCCTTCGCACCTCACACGCTCCCCGCCCACCACCTCCCCGCCCGC
2355872
CTGCCCCCTGCTCCTCCCTCCATTCCCCCCTCTTGTCCCGCCCCTCCCGCCCTCCCAGGCCGCCAGGTGGCTGCCGACATGAACCTGGTGCTGGAGGAGG
2355972
CCAACAGCCGCCTCAAGGTGCCGCTCAGCGGGCTGGCAAGAGCCGTCACACAGCCGGTGGTGGGTGCGGGGCTGGGGCGGTTATGTGCCCGAGCGCAATG
2356072
GAGTCGGTCCCAATAATAGTCAAGGAGTCGTCGCGGGACTGGCCATGGGGCGGGGCGGACTGGGGCCAATTGGGTGAGGTCGGGTGGAGGGAGGGAGGGA
>ABSY93957.g1 CHROMAT_FILE: ABSY93957.g1 PHD_FILE: ABSY93957.g1.phd.1 CHEM: term DYE: big TIME: Sun Sep 14 12:58:11 2003
Length = 1195
Query: 3   LHVFVEALFGVTPEDFP 53
           LHVF+EALFG+ PEDFP
Sbjct: 707 LHVFMEALFGIRPEDFP 657
Query: 358 GRQVAADMNLVLEEANSRLKVPLSGLARAVTQPVVGAGLG 477
           GRQVAADMNLVLEEAN RLKVPL  +A A+ +P V  G G
Sbjct: 290 GRQVAADMNLVLEEANERLKVPLRKVAMALVRPTVRRGGG 171
2356172
CAGGACACGTGAGTCGGGGATTTGGAGCCAGGAACTCAAGGGCTTGGGGAAGCGGGGAAAAGGAATGGACCGAGGAGAACTGGGAACCGAGGGTTACGCA
2356272
GGCACGCCGCCGCAACCCCCAGTCTGACGTGCGACGCTGCTAGCCGCCACCCCTCCTCCACGCACGCGCACACGCCCAACCCCACACAGGCCCAGGCCCG
2356372
CATCCGCGCCGCCCAGGTGCGGCTGGCTGCGGTGTACGGCAGCCTGTACGACGTCATCCGGGCCCGTGGGCCGCAGCCCGAGGCCGTGACAGACCTGTGG
2356472
GCGTGCCTGGGCCGCGTGCGACACCCCAGGACAGGTGGGGGGGCGGTTGTGGCGTGTGGGTTGACGCGGGTACGTGGGGACACAGGGAGGGGGTGGGGGC
2356572
ACTGCTGGGTGGGTGTGCGCGCGGCACGCCGCCGCGGCCCCGAGTTACTGACTCTGGAGGAAACCATGCTGCAACTCACTTGCCCTGCCGCATGGACCGC
2356672
GGCCCGCAGCACCTCCACCGCGCCTGCACCAGACCTCCCCCACCTTTGCCCTAACCCACCCTTTTCTTCCTTATCCAGCCACCAATCACGGACTTCGCTC
>ABSY112787.y1 CHROMAT_FILE: ABSY112787.y1 PHD_FILE: ABSY112787.y1.phd.1 CHEM: term DYE: ET TIME: Wed Sep 10 10:18:50 2003
Length = 799
Query: 277 PEAVTDLWACLGRVRHPRTG 336
           PE VTDLWACLGRVRHP TG
Sbjct: 629 PEGVTDLWACLGRVRHPVTG 570
>CYP-un1Chlre pseudogene 1, family not identified, C_140094
half of gene, very different
63125
(0) HAALLPRLLCRPELSRAEAVANCHSCLLAGYETTAHTLACCLLHLGQRPQ 62976
VGRGRERGGRELARMEVKRGGDRF
(2)
62528
GMALLGAVIRETLRVNPPVIGLPRVVSAPGGITVRLPAGS (1?) 62412
61349
WDPTRTAAPAGAVGADGAAPSDPFAEARPFGIGPRACPAGSLSVVIVREALAALLTKYRWRL 61164
61163
YDEVGDRDWMSGAVSTPTMAFRPPLRVVFARVVEDGGESS* 61041
scaffold_48:305112-303028 no model
Name:estExt_fgenesh2_pg.C_480037
Protein ID:193769
Location:Chlre3/scaffold_48:289896-330340
Note: this is a very long gene model that contains the EXXR exon but no other exons. It misses the heme signature sequence and the I-helix motif.
305112
HAALLPRLLCRPELSRAEAVANCHSCLLAGYETTAHTLACCLLHLGQRPQ
304963
304962
VGRGRERGGRELARMEVKRGGDR 304894
304518
GMALLGAVIRETLRVNPPVIGLPRVVSAPGGITVRLPAGSS
(1) 304396
303336
WDPTRTAAPAGAVGADGAAPSDPFAEARPFGIGPRACPAGSLSVVIVREA
303187
303186
LAALLTKYRWRLYDEVGDRDWMSGAVSTPTMAFRPPLRVVFARVVEDGGESS* 303028
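Note: these entries repeatedly refer to the EXXR, PERF and heme signature motifs. A minimal sketch of locating them in a peptide with regular expressions. The patterns are simplified consensus forms chosen for illustration (E..R, PER[FW], F..G.R.C.G) and are an assumption, not the criteria used for the annotations in this file. The test peptide is the set of C-terminal exons of CYP744A3 as listed in this file, joined without the phase marks.

```python
import re

# Simplified consensus patterns; real P450s deviate from these in places.
MOTIFS = {
    "EXXR": re.compile(r"E..R"),
    "PERF": re.compile(r"PER[FW]"),
    "heme": re.compile(r"F..G.R.C.G"),
}


def find_motifs(peptide: str) -> dict:
    """Return the first match (0-based offset, matched text) per motif, or None."""
    hits = {}
    for name, pattern in MOTIFS.items():
        m = pattern.search(peptide)
        hits[name] = (m.start(), m.group()) if m else None
    return hits


c_term = (
    "KLPYTEAVIKETLRLHPGITFLVREATEDVDLGAGRVVPR"
    "GSTLCMATHAVMHDPDIWPEPEAFRPERFLPEGSAGGGGSSSLWPTAGGNNPHVWA"
    "PFGMGTRMCVGHKLAMM"
)
hits = find_motifs(c_term)
for name, hit in hits.items():
    print(name, hit)
```

Because E..R is very short, the first hit is not always the true EXXR motif; its position relative to PERF and the heme signature should be checked by eye.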
Volvox has no ortholog
$$$$$$$$$
>CYP767A1
Green = my predictions
Yellow = JGI predictions that work in blast
Cyan = motifs
Name: fgenesh2_pg.C_scaffold_9000240
Protein ID: 169101
Location: Chlre3/scaffold_9:1625885-1634209
Exon 13 in a seq gap; use older version of seq here
fgenesh2_pg.C_scaffold_9000240 [Chlre3:169101] similar to 741A1
C_340039 unnamed C-term P450 fragment PKG to heme
1625885
MDGWPPSSPGSIRLQTLQLHAVPPAEPSSSPFITGPPPT
(2) 1626001
1628184 LRSLLLPRYDLDSIPGPWPHALPLLGNMLSVLRPDFHRVLLRWADQYGGVVRIKFLWQ (0) 1628357
1628626
DSLLVTDPAALASICGRGEGACDKAAAIYTPIN
(1) 1628724
1629069
AMCTPRGHVNLLTSPANDAWRAVRKAVAVSFSWNNIKNKFPIIR (2) 1629200
1629464
DRTSELVEWLRAEGPAASVDVDQAALRVTLDVIGL (0) 1629568
1629948
TAFGHDYGCVRLRQVPPEHLIRVLPRAFTEVMRRIANPLRALAPRLVKKGTK
(1)1630103
1630519
GLQAFRDFQAHMQQLLREVLDRGPPPPEDTDIGAQL (2) 1630636
1630701 EAQR (0) 1630712
1630736
PAITEERILSE (0) 1630768
1630963
IGILFVEGFETTGHTISWTLFNIATTP (1)
1631043
1631243
GVQEAVAAELGGLGLLVRPHAMGGR 1631317
1631318
GAARPLALEDLKRLPYLTACVKEAMRMYPVVSIMGRITQ (0) 1631434
1631650
HPTRVGKYLVPAGTPIGTALFAIHNTRHNWTDPLAFRPQRWM 1631775
1631776
GESSSERASGRASERARDSGR (2) 1631838
4554 YMPFSEGPRSCVGQSLAKLEVMTVLATLLAHFRVDLAEE
(0) 4646
1634099
MGGREGVHKRESTHLTLQTAGTRGIQMHLHPREDDP*
1634209
Note: most similar to animal CYP46, CYP24 and CYP4 sequences
34% to 741A1
The first exon is probably not right (too far away)
Short EAQR exon is required to join GAQL to PAIT; there may be some revision needed here (see Volvox ortholog).
trace file 652853255 from PQRWM
walked down to 650266898
these two covered the ver 3 assembly to 1633054 with 100% matches
walked down to 337758911, goes to gap region in assembly 100%
used the very end of the assembly to search again and found 335096672. This seq has the missing P450 exon seq
336483811 has the end of this exon
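Note: the walking described above extends a known sequence read by read across a gap. A minimal sketch of that idea, assuming exact suffix/prefix overlaps and reads already oriented on the same strand. Real trace reads contain errors and need an alignment tool such as BLAST; the function names and toy reads are invented for illustration.

```python
def extend_right(contig: str, read: str, min_overlap: int = 20) -> str:
    """Extend contig with read if a suffix of contig equals a prefix of read."""
    max_len = min(len(contig), len(read))
    for k in range(max_len, min_overlap - 1, -1):  # prefer the longest overlap
        if contig.endswith(read[:k]):
            return contig + read[k:]
    return contig  # no sufficiently long exact overlap


def walk(contig: str, reads: list[str], min_overlap: int = 20) -> str:
    """Repeatedly extend to the right until no read adds new sequence."""
    extended = True
    while extended:
        extended = False
        for read in reads:
            longer = extend_right(contig, read, min_overlap)
            if len(longer) > len(contig):
                contig, extended = longer, True
    return contig


# Toy reads, invented for illustration (real trace reads are hundreds of bases long).
seed = "ACGTACGGTCA"
reads = ["GGTCATTGCAAC", "GCAACCGTTAG"]
print(walk(seed, reads, min_overlap=5))  # ACGTACGGTCATTGCAACCGTTAG
```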
>Volvox ortholog assembled from blasts for each exon, exons 7,8 found
by comparing DNA for Chlamydomonas and Volvox in this region for matches.
Missing exon 1
ABSY171556.g1 exon 2 PKY…
ABSY46806.x3 exon 3 DGL… fused with exon 4
ABSY46806.x3 exon 4 MCT…
ABSY5198.y1 exon 5 DRT…
ABSY140583.g1 exon 6 SAF…
ABSY56673.x2 exon 7 GLT…
ABSY90166.y3, ABSY10903.x1, ABSY90166.y1, ABSY125944.g1 exon 8 PAI…
Missing exon 9
ABSY174072.y1 exon 10 GTQ…
Missing exon 11
ABSY225235.b1 exon 12 FMP…
ABSY176428.b2 exon 13 MGG…
270
(0) PKYDLDLIPGPWTHALPFIGNLLQFLRPDFHRVCLRWADKYGGIVR (2) 133
343
(2) IKFLWHDGLLVTDPPALAAICGRGEGAVDKAANIYSPIN 459
460
QMCTPHAYPNLLTSLADDRWRAVRKAIALSFAFGNIRKKFPLIR (2) 591
494
(2) DRTGELLEWLRGVGPLESVDVDQAALRVTLDVIGL (0) 598
(0)
SAFGHDYGCTRLQQVPYNHLLRVLPRAFTEVMRRIANPFRSFAPGLVKNGKK (1)
723
GLTSFKDFQRHMQELLGEIKARGPPARGDADIGAQLYRVLEAAR (0) 779
(0) PAITDERILSE (0)
322
(1) GTQEAVAEELSSLGLLVRPKSEGGRSAARQLELDDLKRLRYLTACVKESMRMYPVVSIMGRWRMR (0) 516
702
(2) FMPFSEGPRSCVGQSLAKLEVMTVLAMLLANFRIELSDE (0) 818
744
(0) MGGREGVRQRESTHLTLQTRGTRGIRMHLHPRDQE* 851
$$$$$$
>CYP768A1
Chlre2_kg.scaffold_23000190 [Chlre3:149040]
this P450 model is upstream and covers an N-term up to I-helix motif
2000bp space between N- and C-term parts
C_1530020 unnamed C-term P450 fragment PKG to end
Chlre2_kg.scaffold_23000191 [Chlre3:149041]
31% to 4Z1, 31% to 4B1 in C-term part
24% to CYP46 over most of the length
35% to Ciona 4V5 like seq C-term part
Name: Chlre2_kg.scaffold_23000190
Protein ID: 149040
Location: Chlre3/scaffold_23:1470852-1473965
Name: Chlre2_kg.scaffold_23000191
Protein ID: 149041
Location: Chlre3/scaffold_23:1476142-1477663
1470852
MPAAQLFKFLLKPQYDLAKLPQPPVADWVLGHVKHLLRK
(1) 1470968
1471402
DYHRVILGWAKQYGRIFKLR
(2) 1471461
1471649
ILNEWTVVITDPAAAAQVLATVPGRTHNYKHIDE (0) 1471750
1471901
VLGGPGKIS (2) 1471927
1472089
MFGTPDEVHWRNARKATAPAFSMAN (0) 1472163
1472706
VPDATALPGFDELASNILLLMAEANAQ (0) 1472786
1473064
VTDPLRAFFYFTPIAPLVSK (0) 1473123
1473316
HVARCRAALKQVVMFHGRTAARILAR (2) 1473393
1473587
PEPSPDNTLLWACLHRLRHPHTGRKLTPGQLHPE (1) 1473688
1473904
VGMYTAAGFDTTASTVGWCM
(2) 1473963
1474237 YAASLWPEQQQAVAAELRAAGIFGPAAVVE (0) 1474326
1474621
ELAKLPRLNAFINE (0) 1474662
1475863
VMRMFPPTAVSAER (2) 1475904
1476109 LTPDEPVTIMGMTFPAK (0) 1476159
1476468
TVLWCITYGIHMSDANWEDAAKFK
(2) 1476539
1476792
PERWLEDPRCAFAKSPGAGGAAAAPATAGGAEGPAAAIGGAAEEEPPNTA
PRRFVPFGQGPKNCVGQ (0) 1476992
1477404
NFGITVVRAVVALLLRRYHVDLHPDMDTSPEGDKLGGGGGGGDGNSSGSGQAGG
CRHSAEDTARLTHVAVITKLKKLRLVLQRRDD*
1477664
Volvox CYP768A1 ortholog
ABSY165990.g1 exons 1,2,3
ABSY147804.y1 exon 4 C-helix partial
ABSY193853.g1 exon 4 C-helix partial
ABSY111272.b1 exon 4 intact
ABSY75276.y2 exons 5,6 fused
ABSY73799.g1 exons 9,10
ABSY165990.b1 exons 14,15 fused, 16, 17
ABSY22806.b1 exons 18, 19
MWDTLRFYYSTHGPLGAWTPAIVLLLNILGIALALAVTKFIGLYFA
258
PSYDLRKIPTPPVGDAILGHVKFLLRPDYHRVILAWTRKYGKIFRLR (2) 398
599
ILTQWTVVITDPAAAAQVLAVVPGRTHNYTLVDE (0) 700
986
GLGGPGKIS (2)
546
MFGTRDEAHWRNVRKATAPAFSMAN (0) 620
362
(0) VPDARALPGFDLLVPRILLLMAEANRQIVDPLWALWYRTPLAPLLSK (0) 222
221
PDPPSDNTLLWACLHRLRHHITGARLTPTQLHPE (1) 322
649
GGMYTTAGFDTTASTLGWCL (2) 705
957
(2) VSPDRPVAVGPFTLPPGVVLWPLVYGIHMSDANWDEPEAFR (2) 835
542
MERWLEDPRCAFARGE
(1) 495
314
RGPGASGAPRRFLPFADGPKNCVGQ (0) 261
469
NFGLVVVRAVLALLLSRYRVALHGDM 546 no boundary after DM
822
(0) VAVVTKLSKLRLVMTPRD* 878
$$$$$$$
note: the next three sequences are partial, missing the N-term.
It is nearly impossible to assemble the N-term part without cDNA.
>CYP771A1
C_4150003 unnamed CYP97 like C-term P450 fragment
estExt_fgenesh2_pg.C_210032 [Chlre3:191092]
TIN347338.x1
CANNOT DETECT N-TERM HALF, EXXR TO PERF MISSING
I-helix present and heme signature present
Gray region is 39% to a Xenopus seq
EAASLWLLMALPVPNELLPG
YGTYEANVRRLDELVYDM
LVTMLLGGTDTSALTVAFAAWHLAAEPQLQAELRRE (0)
VLGVLGGRALGELRAEDVKAMPLLAAVVNETLRLHPPLAEITRVATQ
(SEQ GAP)
(0)
PNAFLPFGVGSRSCIGRHFGLLSTQ
(0)
LTLAALVARFEVLPPAPPAPTALDWSQSIVITSRSGVWLRLRPIRQ*
Name:estExt_fgenesh2_pg.C_210032
Protein ID:191092
Location:Chlre3/scaffold_21:297178-306479
302461
(0) IEAASLWLLMALPVPNELLPGYGTYEANVRRLDELVYDM (0) 302577
303433
LVTMLLGGTDTSALTVAFAAWHLAAEPQLQAELRRE
(0) 303540
303889
VLGVLGGRALGELRAEDVKAMPLLAAVVNETLRLHPPLAEITRVATQ
(0) 304029
305941
PNAFLPFGVGSRSCIGRHFGLLSTQ
(0) 306015
306339
LTLAALVARFEVLPPAPPAPTALDWSQSIVITSRSGVWLRLRPIRQ* 306479
Volvox matches
ABSY89114.y1
ABSY732.g2
I-helix
787
(0) NAALWLLLQLPIPDHLLPGYDKYMANIATLDEL (0) 689
278
(0) LVTMFFGGTDTSALALTLTAYHLAHCPEAQRAARAE (0) 385
$$$$$$
>CYP770A1
C_7970001 unnamed C-term P450 fragment
fgenesh2_pg.C_scaffold_15000041 [Chlre3:170931]
runs off end
NOTE: CANNOT FIND AN AG-GT BOUNDARY AT LAST EXON.
THIS MIGHT HAVE A LONG INSERT IN IT AND NO INTRON
Very low sequence identity to other P450s
LLVSEGQQWRLMHALATPAF (C-helix)
34% to CYP714A2
GVALTLVGMGHENVSATAAWALLLLAAHPEQQQALYRELRH
(2) (I-helix)
SRTAALLRLPYLDAVLRETLRLYPPVPMLSRQLMQ (0) (EXXR)
(0) DTTIGGVMLPKD (0)
5873
(0) VELVVSPYVLHRLPRLWGPHAACFQPERFMPPPPRP (?) 5766
5066
PPAAGGGCTEPAAAGPYLPFGAGPRACPGASFGSAEVKLLVAHVVMRYSLELLQPPPPSPR
4884
4643
(0) QLFVSLRPGPGVRVCFVPRHQQQVE* 4563
Name:fgenesh2_pg.C_scaffold_15000041
Protein ID:170931
Location:Chlre3/scaffold_15:453166-458216
39% to 746A1
454318
LLVSEGQQWRLMHALATPAF 454377
KAELLERGAFAAALRGVMEEWHRRAVALLPLWRLQAA (0) possible exon like 97B6/97C3
455773
(0) GVALTLVGMGHENVSATAAWALLLLAAHPEQQQALYRELRQ (2) 455895
456058
GCGFPTSRFIQSHPSRTAALLRLPYLDAVLRETLRLYPPVPMLSRQLMQ
(0) 456204
456449
DTTIGGVMLPKD (0) 456484
456906
VELVVSPYVLHRLPRLWGPHAACFQPERFMPPPPRP (?) 457013
457713
PPAAGGGCTEPAAAGPYLPFGAGPRACPGASFGSAEVKLLVAHVVM 457850
457851
RYSLELLQPPPPSPR (?) 457895
458139
(0) QLFVSLRPGPGVRVCFVPRHQQQVE* 458216
>Volvox matches
ABSY135777.g1
661
IMGAGHETTATTTAALLYCISAHPDVRQRVEQEL 560 I-helix
ABSY182504.y1
326
LPYTEAVLKETMRLYPALPMMHRHARNDIRLEDGRVAPK 210 (EXXR motif)
ABSY179247.b1 alternative
106
LESIVLETLRLYSPAYMVGRCAQVDATLGPYSLPTGTTVLVSPFVMHRDAAVW 264
715
GAYLPFGGGPRNCIGTGFAMMEGMLVLAAVLQRYDLTLPPQTL 843
ABSY65293.y2
888
DATLGPTSVPTGTTVLVSPFVMHRDXPVW 802
351
GAYLPFGGGPRNCIGTGFAMMEGMLVLAAVLQRYDLTLPPQTL 223
$$$$$$
>CYP769A1
fgenesh2_pg.C_scaffold_24000071 [Chlre3:173996]
C_10690001 unnamed C-term P450 fragment
43% to 97A5, cannot extend upstream
possible C-helix exon
VLHSPPAPSLLTSTAAAQWRAARRSLLFAFSRSELEQDFE
seq gap here
RLLGEVAEEWDARRRRLLPAWAAPWLLDSAAEASSKCRILQDFIEG
AG region of I-helix here
LLLGHEPVGHSLAWALGCLARNRAAQDKLVAELKREG
()
VYDAPHTALTWTMLHRLPFLDCCVREALRLYPAQPCPATVRQLNK
()
DVVLAGWSVPAGAEVWVDVHAMHRNPQLWRDPDRFNPERWAEH
(0)
ASEAPLCSPLAFMPFGSGPRSCLGQQLAAAELKAALAVL
LCFLALEPTGDPADEPRPAAGLFLRPAGGLHLLLVHRQRGQRAGAA*
Name:fgenesh2_pg.C_scaffold_24000071
Protein ID:173996
Location:Chlre3/scaffold_24:545063-551204
Not a bacterial contamination since there are exons and an ortholog in Volvox
551626
MRALQLRNRCNLTGHTSRQPLQPSHLPTLWVLDS (1) 551525
550850
LPPALPLLGHWLALRARGRGSEPGDTHLRTLRRWAEAHGGAFRLLLPRAW 550701
550097
VLHSPPAPSLLTSTAAAQWRAARRSLLFAFSRSELEQDFE
(0) 549978
(seq gap here)
548586
(0) AVEATGQVLLLRLLGEVAEEWDARRRRLLPAWAAPWLLDSAAEASSKCRILQDFIEG (0) 548416
547904
LLLGHEPVGHSLAWALGCLARNRAAQDKLVAELKREG (1) 547794
546390
GVYDAPHTALTWTMLHRLPFLDCCVREALRLYPAQPCPATVRQLNK
(0) 546253
545752
DVVLAGWSVPAGAEVWVDVHAMHRNPQLWRDPDRFNPERWAEH (0) 545624
545320
ASEAPLCSPLAFMPFGSGPRSCLGQQLAAAELKAALAVLLCFLALEPTGD
545171
545170
PADEPRPAAGLFLRPAGGLHLLLVHRQRGQRAGAA* 545063
>Volvox about 60% to Chlamydomonas seq above
>ABSY24005.y1 CHROMAT_FILE: ABSY24005.y1 PHD_FILE:
Query: 5   QLRNRCNLTGHTSRQPLQPSHLPTLW 30
           +L  RCNL G  SR+ LQ  HL T W
Sbjct: 526 RLNYRCNLRGRVSRRALQDVHLSTRW 603
MSIDARLDRRLNYRCNLRGRVSRRALQDVHLSTRWTKTAR (1) Volvox
MRALQLRNRCNLTGHTSRQPLQPSHLPTLWVLDSR (1) Chlamy
685709806 AOBN322434.y1, also ABSY28503.b1
ABSY17828.g1 goes upstream of this exon
different N-term (probably both same with one having errors)
SPPGVPLLGHSLAYARAPWKWGGVPRARVPGEPSFLW
(errors)
(1) APPGVPLLGHSLTLRAWPSWTWWWFRSG
GPRGDQLLLRALLRWSEQYDGAFQLRNGWL
APPGVPLPGHSLTLPAWPSLDMGVGSGAEGPRATTLHWGRCCPGPSSTMVLFT
ABSY207904.g1
Trace archive files
685812629 AOBN472902.y1
684986793 ABSY385036.g1
683183378 AOBO82354.b1
710612050 AOBN690035.g1
550752068 ABSY207904.g1
689850606 AOBN318993.b1
685709806 AOBN322434.y1
85 VLHPNAVPSSATATSSAQWRLLRRSLLHAFSDSELQLDFE (0)
204
689851374 = mate pair of 689850606 above (C-helix) 2 exons
955 () GPGAVVDVNDAALRLSLDVMGLSKLGYDFQVGMAVVRQNEKL (?) 830
(0)
AVESQGEVLMLRLLGEVAAEWAVRRRRLLGRWAPWISDGAAEGQTR
CRILHHFIEQ (0)
ABSY202948.b1 (+)
(0) LLLAHGPTGHSIAWALGCLAARRGVQEKLVAELKKE (1)
ABSY223271.b1
246
(1) GIFNDPLRLTYDMLSKLPYLDCVVREVLRLYPTMPCPATVRTLKK 112
ABSY130123.g1 (+)
348
(0) DVALHGRTLTAASDVWVDVFSMHRSPKWWRDPHHFKPERWTA 474 (0)
711 LCYPEAFMPFSFGSRN*LGQKLPVAQIKAALAMLL*FLGLKPS 839
SPPPLAPLCSPEAFMPFSFGSRSCLGQKLAVAQIKAALAMLLCFLVFEPS (1)
trace archive 712749567
VAPWGLGLFLRPEGGMQLLVAPRKKNS*
687335561
Assembled Volvox CYP769A1 seq 56% to Chlamydomonas CYP769A1 seq
MSIDARLDRRLNYRCNLRGRVSRRALQDVHLSTRWTKTA (1)
PPPGVPLLGHSLTLRAWPSWTWWWFRSGGPRGDQLLLRALLRWSEQYDGAFQLRNGWL
VLHPNAVPSSATATSSAQWRLLRRSLLHAFSDSELQLDFE (0)
204
GPGAVVDVNDAALRLSLDVMGLSKLGYDFQVGMAVVRQNEKL (?) 830
AVESQGEVLMLRLLGEVAAEWAVRRRRLLGRWAPWISDGAAEGQTR
CRILHHFIEQ (0)
LLLAHGPTGHSIAWALGCLAARRGVQEKLVAELKKE (1)
GIFNDPLRLTYDMLSKLPYLDCVVREVLRLYPTMPCPATVRTLKK
DVALHGRTLTAASDVWVDVFSMHRSPKWWRDPHHFKPERWTA
(0)
SPPPLAPLCSPEAFMPFSFGSRSCLGQKLAVAQIKAALAMLLCFLVFEPS (1)
VAPWGLGLFLRPEGGMQLLVAPRKKNS*
>ABSY207904.g1 CHROMAT_FILE: ABSY207904.g1 PHD_FILE:
ABSY207904.g1.phd.1 CHEM: term DYE: big TIME: Sun Nov 30 14:23:34 2003
NNNNNNAAGCGCTGAATACCCTCCTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXGTTCTTCACCCGAATGCCGTACCCAGCAGCGCTACA
GCCACCTCCTCGGCCCAGTGGCGGTTACTGCGGAGGTCGCTGCTACACGCCTTTTCCGAC
TCGGAGCTTCAACTGGACTTTGAG
GTGCGTGGAGGTGCATGTGTTCGTGTGATATGTGTC
TATCTGTCTGTGTATCTGTCGTCGTAGGCCAGGCGTCACTGTCCAAGAGAACCCTTCACA
AGAGGCCAAGAGAACCCACCCCCACCCCCACCCCCACCCTCACCCTCACCCTCCCACCCC
CTCCCCCCACCCTCACCCTCACCCCCACCCTCCCACCCCCACCCTCATCCTCACCTCCAC
CCTCCCACCCCCACCCCCTAACCCTAAAAAAAAAATCCCCACAAAAACCTCATCTATATA
TNCTTCATCCCCAATCCCAACTCCACTATCCAACATCTTTAAATCATCACCCATTTCTCC
CACTCTAACCTCCACCCCAACCTCAACTTCTTACCCAACCCTTATAAAAATCAACTCCCT
TTTTTAAATCCCCAAACCTCAAATCCTATTCCCTACCCAATTATCCTTTCACATCTATAC
CCCATATCTATTCATAAACCTTAACCAACCCCTCACTTACCCTTTACCTTTAAAATCATA
AAACTCACCACCTTTCCATACTATCTTTTCAAATACCCATAACTTTTCCCCACATCAAAA
TAAAAATTTTTTTCCTATTAATACAACACTTTTTATACCCCCTCTCTACACTATAAACAT
CCCCTTAATTTTATATATTTCCCCTAAANATACTTCCCCCATTTCTACTTTATCATATAA
AAAAATAATTTTCCAACTTCCTTAAAAAACCTTCTTAAAAATTATTTCTATTTAACACTC
TCATTTATAATTTTCTTACCTATTATTAAATTTCCTTAAAATCTCAAAAAAACTCTCTTC
CCCATTAACAAACTATTCATTATTCCCCTTACNACTAACAAATAAAAATAAAAAATTTTT
TTTCTTTTCTCCCCCTCATAATACAAATAAAAATAATTTCCCAAAAAACACCCACACACC
ATCATACATTCCAATTATCTTTATAAAACAATTTCCCTNTCCACATACAATATAAAAAAT
AAAATATCTCCCTATCTTACATAAATCTATTCATCTTANTCTAATATCCCTTCTCCTACC
TCTTTCAACCTCTTTTAATCAATAATCTTTTTATACCCTCACAACTCTTTTCTACTCACT
ATCACTCTC
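The unmasked start of read ABSY207904.g1 (the 120 nt before the line break at GTGCGT) appears to encode the VLHPNAV...QLDFE stretch of the assembled Volvox CYP769A1 sequence. A minimal sketch that checks this by translating those bases with the standard genetic code. `translate` and `EXON` are illustrative names, and reading frame 1 is an assumption based on the match.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate DNA in frame 1; codons containing N or X become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

# Bases copied from the read, from the end of the X-masked region to GAG.
EXON = (
    "GTTCTTCACCCGAATGCCGTACCCAGCAGCGCTACA"
    "GCCACCTCCTCGGCCCAGTGGCGGTTACTGCGGAGGTCGCTGCTACACGCCTTTTCCGAC"
    "TCGGAGCTTCAACTGGACTTTGAG"
)
print(translate(EXON))
```

The bases that follow begin with GT, which is consistent with a phase 0 intron starting right after this exon.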
>ABSY109519.b1 CHROMAT_FILE: ABSY109519.b1 PHD_FILE: ABSY109519.b1.phd.1 CHEM: term DYE: big TIME: Sun Sep 14 12:57:01 2003
Length = 1136
Score = 26.7 bits (53), Expect = 23
Identities = 12/18 (66%), Positives = 13/18 (72%)
Frame = -2
Query: 2 AAAATHPARTGYGAARSA 19
AAAAT A +GYGA R A
Sbjct: 511 AAAATSRAASGYGAERGA 458
Match to Kineococcus radiotolerans SRS30216 ctg215, whole genome shotgun (bacteria)
ACCESSION AAEF02000013
MVRAVPAIVRAPHLFLAEVTRRHGPVAAIPLPRTPVLVLADPDGVRRVLVENARGYGKATIQY
SALATVTGPGLLAGDGEVWKQHRRTVQPAFHHGSLEDVA
AHAVHAARGLVAEADALPPGTPLEVLGATSRAGLEVVGHTLAAADLSGDAPLLVEAVG
RALELVVRRAASPVPAAWPTPARRRLAREVAVIDEVCARIVATRRARPLEDPRDVVGL
MLAAGMDDR
QVRDELVTFVVAGHETVASSLTWTLDLLARAPSVLARVHAELAGALGGR
EPGWDDLGKLPLLRAVVDESLRLYPPAWVVTRQALADDVVAGVAVPAGTLVIVCTWGL
HRDPALWEAPEEFRPDRFLDAPRPAAGSYVPFGAGPRLCIGRDLALVEEVLVLATLLC
ERTVRPAGPAPRVDALVTLRPRGGLPL HVERLAPSAS
Score = 122 bits (306), Expect = 4e-25
Identities = 81/206 (39%), Positives = 110/206 (53%), Gaps = 14/206 (6%)
Frame = +1
Query 49 RILQDFIEGLLLGHEPVGHSLAWALGCLARNRAAQDKLVAELKREGGVYDAPHTALTWTM 108
++ + + ++ GHE V SL W L LAR + ++ AEL G + W
Sbjct 46849 QVRDELVTFVVAGHETVASSLTWTLDLLARAPSVLARVHAELAGALGGREPG-----WDD 47013
Query 109 LHRLPFLDCCVREALRLYPAQPCPATVRQLNKDVVLAGWSVPAGAEVWVDVHAMHRNPQL 168
L +LP L V E+LRLYP P RQ D V+AG +VPAG V V +HR+P L
Sbjct 47014 LGKLPLLRAVVDESLRLYP--PAWVVTRQALADDVVAGVAVPAGTLVIVCTWGLHRDPAL 47187
Query 169 WRDPDRFNPERWAEHASEAPLCSPLAFMPFGSGPRSCLGQQLAAAELKAALAVLLCFLAL 228
W P+ F P+R+ +AP + +++PFG+GPR C+G+ LA E LA LLC +
Sbjct 47188 WEAPEEFRPDRFL----DAPRPAAGSYVPFGAGPRLCIGRDLALVEEVLVLATLLCERTV 47355
Query 229 EPTGDPADEPRPAAGLFLRPAGGLHL 254
P G PA PR A + LRP GGL L
Sbjct 47356 RPAG-PA--PRVDALVTLRPRGGLPL 47424
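The statistics of the Kineococcus hit above can be cross-checked with simple arithmetic. The sketch below is an illustration, not BLAST's own code: it assumes the reported percentages are truncated to whole numbers (14/206 is about 6.8% but is printed as 6%) and that subject coordinates of a translated hit count nucleotides, three per residue. Both function names are hypothetical.

```python
def blast_percent(count, length):
    """Whole-number percentage, truncated (assumed to match the report)."""
    return count * 100 // length

def translated_residues(start, end):
    """Number of codons spanned by nucleotide coordinates start..end."""
    return (abs(end - start) + 1) // 3

# Values copied from the report above.
print(blast_percent(81, 206), blast_percent(110, 206), blast_percent(14, 206))
print(translated_residues(46849, 47424))
```

The subject span 46849 to 47424 covers 576 nt, or 192 residues, which equals the 206 alignment columns minus the 14 gap positions in the subject line, so the coordinates and gap count agree.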
Match to 4F3
>CYP4F3 NM_000896
Length = 520
Score = 80.9 bits (198), Expect = 5e-18
Identities = 59/193 (30%), Positives = 93/193 (48%), Gaps = 33/193 (17%)
Query: 96 RLLGEVAEEWDARRRRLLPAWAAPWLLDSAAEASSKCRILQDFIEGLLL----------- 144
RL+ + ++ RRR LP+ +D +A +K + L DFI+ LLL
Sbjct: 261 RLVHDFTDDVIQERRRTLPSQG----VDDFLQAKAKSKTL-DFIDVLLLSKDEDGKKLSD 315
Query: 145 -------------GHEPVGHSLAWALGCLARNRAAQDKLVAELKREGVYDAPHTALTWTM 191
GH+ L+W L LA++ Q++ E++ E + D + W
Sbjct: 316 EDIRAEADTFMFEGHDTTASGLSWVLYHLAKHPEYQERCRQEVQ-ELLKDREPKEIEWDD 374
Query: 192 LHRLPFLDCCVREALRLYPAQPCPATVRQLNKDVVLA-GWSVPAGAEVWVDVHAMHRNPQ 250
L +LPFL C++E+LRL+P P PA R +D+VL G +P G + V H NP
Sbjct: 375 LAQLPFLTMCIKESLRLHP--PVPAVSRCCTQDIVLPDGRVIPKGIICLISVFGTHHNPA 432
Query: 251 LWRDPDRFNPERW 263
+W DP+ ++P R+
Sbjct: 433 VWPDPEVYDPFRF 445
>CYP4F12 mRNA for cytochrome P450, complete cds. AB035130
Length = 524
Score = 123 bits (308), Expect = 1e-30
Identities = 74/194 (38%), Positives = 107/194 (55%), Gaps = 9/194 (4%)
Query: 187 GHEPVGHSLAWALGCLARNRAAQDKLVAELKREGVYDAPHTALTWTMLHRLPFLDCCVRE 246
GH+ L+W L LAR+ Q++ E++ E + D + W L +LPFL CV+E
Sbjct: 329 GHDTTASGLSWVLYNLARHPEYQERCRQEVQ-ELLKDRDPKEIEWDDLAQLPFLTMCVKE 387
Query: 247 ALRLYPAQPCPATVRQLNKDVVLA-GWSVPAGAEVWVDVHAMHRNPQLWRDPDRFNPERW 305
+LRL+P P P R +D+VL G +P G +D+ +H NP +W DP+ ++P R+
Sbjct: 388 SLRLHP--PAPFISRCCTQDIVLPDGRVIPKGITCLIDIIGVHHNPTVWPDPEVYDPFRF 445
Query: 306 AEHASEAPLCSPLAFMPFGSGPRSCLGQQLAAAELKAALAVLLCFLALEPTGDPADEPRP 365
S+ SPLAF+PF +GPR+C+GQ A AE+K LA++L P EPR
Sbjct: 446 DPENSKGR--SPLAFIPFSAGPRNCIGQAFAMAEMKVVLALMLLHFRFLP---DHTEPRR 500
Query: 366 AAGLFLRPAGGLHL 379
L +R GGL L
Sbjct: 501 KLELIMRAEGGLWL 514
>e_gwH.661.2.1 [Chlre3:109783]
Name:e_gwH.661.2.1
Protein ID:109783
Location:Chlre3/scaffold_661:7589-8149
bacterial contamination, 81% to Arthrobacter seq NZ_AAHG01000018.1 Arthrobacter sp. FB24
MDFRASPEYQLDPFPYYERMREAAPVYYDEQSGSWHIFRYDDVQRTLSEYATFSSHMGGDDASGTAQLFA
SSLIATDPPRHRQLRSLVTQAFTPKAVDALAPRIAGLTDELLEGIAARGSADLIKELAYPLPVIVISELM
GIPAQDRERFKQWSDVIVSQTRTGSASGNHIAANMEMTEYFLALIDE
Query 1 MDFRASPEYQLDPFPYYERMREAAPVYYDEQSGSWHIFRYDDVQRTLSEYATFSSHMGGD 60
MDF A+ E LDPFPYYERMREAAPV++DEQSGSWH+FRYDDVQR LSEYATFSS MGGD
Sbjct 49739 MDFAAANENPLDPFPYYERMREAAPVFHDEQSGSWHVFRYDDVQRVLSEYATFSSRMGGD 49560
Query 61 DASGTAQLFASSLIATDPPRHRQLRSLVTQAFTPKAVDALAPRIAGLTDELLEGIAARGS 120
D S T QLFASSLI TDPPRHR LRSLVTQAFTPKAVDALAPRI+ LT+ELL+GI +RG
Sbjct 49559 DPSETGQLFASSLITTDPPRHRHLRSLVTQAFTPKAVDALAPRISELTEELLDGIVSRGG 49380
Query 121 ADLIKELAYPLPVIVISELMGIPAQDRERFKQWSDVIVSQTRTGSASGNHIAANMEMTEY 180
ADLI+ELAYPLPVIVISELMGIPA DR+RFKQWSDVIVSQTRT +A+ +H A N EMT Y
Sbjct 49379 ADLIEELAYPLPVIVISELMGIPADDRDRFKQWSDVIVSQTRTNAATEDHQATNREMTGY 49200
Query 181 FLALIDE 187
FL LI++
Sbjct 49199 FLDLIEQ 49179