Chlamydomonas reinhardtii cytochrome P450s

 

D. Nelson, Sept. 2, 2004

Under revision May 11, 2006

 

39 named genes, 2 named pseudogenes,

+ one bacterial contaminant

families = 51, 55, 97, 710, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746,

747, 748, 767, 768, 769, 770, 771 (5 old families, 16 new families)

 

51 is in the 51 clan (sterol 14 alpha demethylase)

55 is of fungal origin (nitric oxide reductase, soluble enzyme)

710 is in the 61 clan (C-22 sterol desaturase in fungi [CYP61] and plants [CYP710])

737, 738, 739, 740 are in the CYP85 clan

97 is in the CYP97 clan (carotenoid hydroxylases of epsilon and beta rings)

743, 744 are in the CYP711 clan (CYP711A1 produces a carotenoid hormone in Arabidopsis)

745 may be a new plant clan, CYP97A like

CYP747 is hard to place. 38% to CYP97A6 in C-term half

741 and 742 sometimes cluster with 97 but not always.

741, 742, 748, 767, 768, 769 cluster together and have best hits to CYP4 clan members

746 may be of bacterial origin, best hit is to CYP252A1 Streptomyces peucetius

top 26 hits are all bacterial

CYP746 and CYP770 may be the Chlamydomonas precursors of the CYP72 clan

There is a CYP746 in moss

 

Chlamydomonas P450 tree

 

A link to the 2003 Chlamydomonas P450 page

 

P450s sorted by gene model number using the JGI annotation

 

* indicates more than one gene model for a single gene.

 

C_60077      CYP742A1

C_130004     CYP739A1

C_130006     CYP739A2

C_130009     CYP739A4

C_130009     CYP739A5

C_130012     CYP739A6

C_130125     CYP739A3

C_140094     CYP-un1Chlre pseudogene 1, family not identified, half of gene

C_180013     CYP743A1

             CYP744A4 between C_239009 and C_239004 not annotated

C_250032     CYP746A1, 39% to Streptomyces peucetius CYP252A1

C_310063     CYP97A6

C_340039     unnamed C-term P450 fragment PKG to heme

C_410095     CYP97B6

C_420091     CYP743A2

C_470024     CYP737A1

C_570052     CYP738A1

C_680007     CYP51G1

C_900050     CYP747A1, 41% to CYP743B2 C-term

*C_940015    CYP744A1

C_940016     CYP744A1, N-term = C_940015

C_940017     CYP744A2

C_940020     CYP744B1

C_940044     CYP744A3

C_980035     CYP743B3

C_980053     CYP741A1

*C_980058    CYP741A1 N-term

C_1040015    CYP97A5

C_1080041    CYP740A1

C_1130014    CYP743C1

C_1340038    CYP97C3 70% to 97C2

C_1370013    CYP744C1

C_1530020    unnamed C-term P450 fragment PKG to end

C_1540014    CYP710B1

C_1730009    CYP744A5P pseudogene 81% to 744A3

C_1820019    CYP748A1 about 40% to C-term half of 741A1

C_1860018    CYP745A1

C_2580005    CYP55B1, 43% to CYP55A6

C_4150003    unnamed CYP97 like C-term P450 fragment

*C_4260002   CYP97A5

*C_5270001   CYP739A6

C_7970001    unnamed C-term P450 fragment

C_8600001    CYP743B2 falls in a seq gap of scaffold 98

C_8600002    CYP743B3 same as C_980035

*C_8650001   CYP744B1

*C_9610001   CYP743C1

C_10690001   unnamed C-term P450 fragment

*C_22500001  CYP739A5

*C_28140001  CYP746A1 = C_250032 C-helix exon duplication

C_32340001   CYP743B1 falls in a seq gap of scaffold 98
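Because a "*" row means one gene is split across several JGI models, and one model (C_130009) also spans two genes (CYP739A4 and CYP739A5), it can help to index the table both ways. A minimal sketch, assuming the "model  name ..." column layout used above; `index_models` is a hypothetical helper, and rows for unnamed fragments or unannotated genes would need their own handling.

```python
from collections import defaultdict

# A few rows copied from the gene-model table above; "*" marks an extra model.
ROWS = """\
C_130009     CYP739A4
C_130009     CYP739A5
*C_940015    CYP744A1
C_940016     CYP744A1, N-term = C_940015
C_940017     CYP744A2
*C_22500001  CYP739A5
"""


def index_models(text):
    """Map gene -> models and model -> genes from 'C_model  CYPname ...' rows."""
    models_per_gene = defaultdict(list)
    genes_per_model = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Skip blank rows and rows without a JGI model in the first column.
        if len(fields) < 2 or not fields[0].lstrip("*").startswith("C_"):
            continue
        model = fields[0].lstrip("*")
        gene = fields[1].rstrip(",")
        models_per_gene[gene].append(model)
        genes_per_model[model].append(gene)
    return dict(models_per_gene), dict(genes_per_model)


by_gene, by_model = index_models(ROWS)
for gene, models in sorted(by_gene.items()):
    if len(models) > 1:
        print(gene, "is split across", ", ".join(models))
for model, genes in sorted(by_model.items()):
    if len(genes) > 1:
        print(model, "covers", ", ".join(genes))
```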

 

P450s sorted by CYP name (version 2 assembly)

 

CYP51G1      C_680007    

CYP55B1      C_2580005 43% to CYP55A6

CYP97A5     *C_4260002  

CYP97A5      C_1040015   

CYP97A6      C_310063    

CYP97B6      C_410095    

CYP97C3      C_1340038 70% to 97C2

CYP710B1     C_1540014   

CYP737A1     C_470024    

CYP738A1     C_570052    

CYP739A1     C_130004    

CYP739A2     C_130006

CYP739A3     C_130125    

CYP739A4     C_130009

CYP739A5    *C_22500001 

CYP739A5     C_130009    

CYP739A6    *C_5270001  

CYP739A6     C_130012     

CYP740A1     C_1080041   

CYP741A1    *C_980058 N-term

CYP741A1     C_980053    

CYP742A1     C_60077     

CYP743A1     C_180013    

CYP743A2     C_420091    

CYP743B1     C_32340001  

CYP743B2     C_8600001   

CYP743B3     C_8600002 same as C_980035

CYP743C1    *C_9610001  

CYP743C1     C_1130014   

CYP744A1    *C_940015   

CYP744A1     C_940016 N-term = C_940015

CYP744A2     C_940017    

CYP744A3     C_940044    

CYP744A4     between C_239009 and C_239004 not annotated

CYP744A5P    C_1730009 pseudogene 81% to 744A3

CYP744B1    *C_8650001  

CYP744B1     C_940020    

CYP744C1     C_1370013   

CYP745A1     C_1860018   

CYP746A1    *C_28140001 = C_250032 C-helix exon duplication

CYP746A1     C_250032, 39% to Streptomyces peucetius CYP252A1

CYP747A1     C_900050 41% to CYP743B2 C-term

CYP748A1     C_1820019 about 40% to C-term half of 741A1

C_140094     CYP-un1Chlre pseudogene 1, family not identified, half of gene

C_340039     unnamed C-term P450 fragment PKG to heme

C_1530020    unnamed C-term P450 fragment PKG to end

C_4150003    unnamed CYP97 like C-term P450 fragment

C_7970001    unnamed C-term P450 fragment

C_10690001   unnamed C-term P450 fragment

 

P450s sorted by CYP name (version 3 assembly)

 

CYP51G1      scaffold_7:2481399-2484780  Protein ID: 126254

CYP55B1      scaffold_52:370660-375180   Protein ID: 121742

CYP97A5      scaffold_55:373287-377786   Protein ID:  39257

CYP97A6      scaffold_42:732596-737181   Protein ID: 121076

CYP97B6      scaffold_1:2256360-2261776  Protein ID: 116601

CYP97C3      scaffold_64:422589-430105   Protein ID: 122396

CYP710B1     scaffold_66:390953-394690   Protein ID: 132687

CYP737A1     scaffold_41:635800-640648   Protein ID: 151890

CYP738A1     scaffold_6:2860971-2864314  Protein ID: 167934

CYP739A1     scaffold_8:1064933-1068008  Protein ID: 140983

CYP739A2     scaffold_8:1078648-1085528  Protein ID: 140985

CYP739A3     scaffold_8:1105803-1109510  Protein ID: 140993

CYP739A4a    scaffold_8:1131245-1134169  Protein ID: 165902

CYP739A4b    scaffold_8:1135368-1135969  Protein ID: 165903

CYP739A5a    scaffold_8:1125087-1127174  Protein ID: 165900

CYP739A5b    scaffold_8:1128094-1130653  Protein ID: 186291

CYP739A6     scaffold_8:1145820-1150791  Protein ID: 186292

CYP740A1     scaffold_68:172336-177730   Protein ID: 153850

CYP741A1a    scaffold_71:380138-383878   Protein ID: 179637

CYP741A1b    scaffold_846:3828-5043      Protein ID: 181363

CYP742A1     scaffold_37:480604-486602   Protein ID: 151489

CYP743A1     scaffold_1:5611907-5617553  Protein ID: 116541

CYP743A2a    scaffold_16:609616-615492   Protein ID: 189550

CYP743A2b    scaffold_16:609616-615492   Protein ID: 116043

CYP743B1     scaffold_71:125260-130065   Protein ID: 122749

CYP743B2     scaffold_71:130374-138996   Partial seq not annotated

CYP743B3     scaffold_71:139305-143478   Protein ID: 122730

CYP743C1     scaffold_17:1489349-1496178 Protein ID: 147793

CYP744A1a    scaffold_23:958703-961028   Protein ID: 148983

CYP744A1b    scaffold_23:962118-963228+  Protein ID: 118452

CYP744A2     scaffold_23:969108-971162   Protein ID: 118526

CYP744A3     scaffold_23:976166-982342   Protein ID: 118465

CYP744A4a    scaffold_23:1143890-1147747 Protein ID:  95157

CYP744A4b    scaffold_23:1141463-1143101 Protein ID: 103666

CYP744A5P    scaffold_21:6347-7649       Protein ID: 148389

CYP744B1     scaffold_23:1014183-1020804 Protein ID: 118428

CYP744C1     scaffold_39:932071-938361   Protein ID: 177201

CYP745A1     scaffold_74:79791-84023     Protein ID: 154128

CYP746A1     scaffold_1:3570907-3575049  Protein ID: 116510

CYP747A1     scaffold_96:178714-184286   Protein ID: 108849

CYP748A1     scaffold_9:2353835-2358515  Protein ID: 114278

CYP767A1     scaffold_9:1625885-1634209  Protein ID: 169101

CYP768A1a    scaffold_23:1470852-1473965 Protein ID: 149040

CYP768A1b    scaffold_23:1476142-1477663 Protein ID: 149041

C_140094     scaffold_48:305112-303028   Partial seq not annotated

C_4150003    scaffold_21:297178-306479   Protein ID: 191092

C_7970001    scaffold_15:453166-458216   Protein ID: 170931

C_10690001   scaffold_24:545063-551204   Protein ID: 173996

Bacterial    scaffold_661:7589-8149      Protein ID: 109783

 

 

P450s sorted by scaffold location (version 3 assembly)

 

CYP97B6      scaffold_1:2256360-2261776  Protein ID: 116601

CYP746A1     scaffold_1:3570907-3575049  Protein ID: 116510

CYP743A1     scaffold_1:5611907-5617553  Protein ID: 116541

CYP738A1     scaffold_6:2860971-2864314  Protein ID: 167934

CYP51G1      scaffold_7:2481399-2484780  Protein ID: 126254

CYP739A1     scaffold_8:1064933-1068008  Protein ID: 140983

CYP739A2     scaffold_8:1078648-1085528  Protein ID: 140985

CYP739A3     scaffold_8:1105803-1109510  Protein ID: 140993

CYP739A5a    scaffold_8:1125087-1127174  Protein ID: 165900

CYP739A5b    scaffold_8:1128094-1130653  Protein ID: 186291

CYP739A4a    scaffold_8:1131245-1134169  Protein ID: 165902

CYP739A4b    scaffold_8:1135368-1135969  Protein ID: 165903

CYP739A6     scaffold_8:1145820-1150791  Protein ID: 186292

CYP767A1     scaffold_9:1625885-1634209  Protein ID: 169101

CYP748A1     scaffold_9:2353835-2358515  Protein ID: 114278

C_7970001    scaffold_15:453166-458216   Protein ID: 170931

CYP743A2a    scaffold_16:609616-615492   Protein ID: 189550

CYP743A2b    scaffold_16:609616-615492   Protein ID: 116043

CYP743C1     scaffold_17:1489349-1496178 Protein ID: 147793

CYP744A5P    scaffold_21:6347-7649       Protein ID: 148389

C_4150003    scaffold_21:297178-306479   Protein ID: 191092

CYP744A1a    scaffold_23:958703-961028   Protein ID: 148983

CYP744A1b    scaffold_23:962118-963228+  Protein ID: 118452

CYP744A2     scaffold_23:969108-971162   Protein ID: 118526

CYP744A3     scaffold_23:976166-982342   Protein ID: 118465

CYP744B1     scaffold_23:1014183-1020804 Protein ID: 118428

CYP744A4a    scaffold_23:1143890-1147747 Protein ID:  95157

CYP744A4b    scaffold_23:1141463-1143101 Protein ID: 103666

CYP768A1a    scaffold_23:1470852-1473965 Protein ID: 149040

CYP768A1b    scaffold_23:1476142-1477663 Protein ID: 149041

C_10690001   scaffold_24:545063-551204   Protein ID: 173996

CYP742A1     scaffold_37:480604-486602   Protein ID: 151489

CYP744C1     scaffold_39:932071-938361   Protein ID: 177201

CYP737A1     scaffold_41:635800-640648   Protein ID: 151890

CYP97A6      scaffold_42:732596-737181   Protein ID: 121076

C_140094     scaffold_48:305112-303028   Partial seq not annotated

CYP55B1      scaffold_52:370660-375180   Protein ID: 121742

CYP97A5      scaffold_55:373287-377786   Protein ID:  39257

CYP97C3      scaffold_64:422589-430105   Protein ID: 122396

CYP710B1     scaffold_66:390953-394690   Protein ID: 132687

CYP740A1     scaffold_68:172336-177730   Protein ID: 153850

CYP743B1     scaffold_71:125260-130065   Protein ID: 122749

CYP743B2     scaffold_71:130374-138996   Partial seq not annotated

CYP743B3     scaffold_71:139305-143478   Protein ID: 122730

CYP741A1a    scaffold_71:380138-383878   Protein ID: 179637

CYP741A1b    scaffold_846:3828-5043      Protein ID: 181363

CYP745A1     scaffold_74:79791-84023     Protein ID: 154128

CYP747A1     scaffold_96:178714-184286   Protein ID: 108849

Bacterial    scaffold_661:7589-8149      Protein ID: 109783
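The scaffold-ordered list can be regenerated from the name-ordered one by sorting on the scaffold number as an integer and then on the lower coordinate. A minimal sketch, assuming every entry contains a `scaffold_N:start-end` field as in the lists above; `scaffold_key` is a hypothetical helper.

```python
import re

# Matches e.g. "scaffold_16:609616-615492"; a stray space after the colon is tolerated.
LOCATION = re.compile(r"scaffold_(\d+):\s*(\d+)-(\d+)")


def scaffold_key(entry):
    """Sort key: scaffold number as an integer, then the lower coordinate."""
    match = LOCATION.search(entry)
    if match is None:
        raise ValueError(f"no scaffold location in {entry!r}")
    scaffold, a, b = (int(x) for x in match.groups())
    # Some entries (e.g. C_140094, 305112-303028) list the coordinates high to low.
    return scaffold, min(a, b)


entries = [
    "CYP748A1     scaffold_9:2353835-2358515",
    "CYP743A2a    scaffold_16: 609616-615492",
    "CYP767A1     scaffold_9:1625885-1634209",
    "C_140094     scaffold_48:305112-303028",
    "CYP746A1     scaffold_1:3570907-3575049",
    "CYP97B6      scaffold_1:2256360-2261776",
]
for entry in sorted(entries, key=scaffold_key):
    print(entry.split()[0])
```

A plain string sort would place scaffold_16 before scaffold_9, hence the integer key. The hand-made list above deliberately keeps CYP741A1b (scaffold_846) next to CYP741A1a; a purely numeric sort would move it after scaffold_661.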

 

P450 sequences

 

Note: the P450 sequences have many apparent insertions of poly Ala, poly Gly,

poly S and mixtures of these.  These are found in some ESTs so they are

real.  It is not clear why these sequences are inserted or what they do to the

structure of these P450s.
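One simple way to list candidate insertions of this kind is to scan each protein for stretches made only of Ala, Gly and Ser. This is a plain composition scan, not a formal low-complexity filter such as SEG, and the six-residue minimum is an arbitrary assumption; `ags_runs` is a hypothetical helper.

```python
import re


def ags_runs(protein, min_len=6):
    """Return (0-based start, run) for each stretch of >= min_len residues drawn only from A, G and S."""
    pattern = re.compile(r"[AGS]{%d,}" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]


# Last exon of CYP97B6 as given in its version 3 listing further down.
c_term = "VGMATGATIHTANGLSMRVTRRTPSGGSGSGAPGAAAKVPATV*"
print(ags_runs(c_term))
```

On this CYP97B6 C-terminus the scan reports only the SGGSGSGA stretch; the shorter GAAA run falls below the threshold.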

 

 

>CYP51G1 C_680007 10 EXONS 56% TO 51G1 Arab 

EST SUPPORT BI717817 BU649818 BI726293 BM001590 AV642299

 

60124 MDLPPELAVLADKVLSLSPVVLVALGSAVLILALAVGRVLFNLLPSKRPPVWEGLPFIGGLLKFTG 59927

59843 GPWKLLENGYAKFGECFTVPVAHRRVTFLIGPEVSPHFFKAGDDEMSQSE 59694

59394 VYDFNIPTFGRGVVFDVEQKVRTEQFRMFTEALTKNRLKSYVPHFNKEAE 59245

59108 EYFAKWGETGVVDFKDEFSKLITLTAARTLL 59016

58765 GREVREQLFDEVADLLHGLDEGMVPLSVFFPYAPIPVHFKRDR (2) 58637

58412 CRKDLAAIFAKIIRARRESGRREEDVLQQFIDAR 58311

58119 YQNVNGGRALTEEEITGLLIAVLFAGQHTSSITTSWTGIFMAANK 57985

57667 EHYNKAAEEQQDIIRKFGNELSFETLSEMEVLHRNITEALRMHPPLLLVMRYAKKPFSVTTSTGKSYVIPK 57455

57191 GDVVAASPNFSHMLPQCFNNPKAYDPDRFAPPREEQNKPYAFIGFGAGRHACIGQNFAYLQ (0) 57009

56877 IKSIWSVLLRNFEFELLDPVPEADYESMVIGPKPCRVRYTRRKL* 56743

 

newest data: version 3 checked April 24, 2006

Name: estExt_gwp_1H.C_70049

Protein ID: 126254

Location: Chlre3/scaffold_7:2481399-2484780

100% match

 

2481399 MDLPPELAVLADKVLSLSPVVLVALGSAVLILALAVGRVLFNLLPSKRPPVWEGLPFIGGLLKFTG (0) 2481596

2481680 GPWKLLENGYAKFGECFTVPVAHRRVTFLIGPEVSPHFFKAGDDEMSQSE (0) 2481829

2482129 VYDFNIPTFGRGVVFDVEQKVRTEQFRMFTEALTKNRLKSYVPHFNKEAE (0) 2482278

2482415 EYFAKWGETGVVDFKDEFSKLITLTAARTLL (1) 2482507

2482758 GREVREQLFDEVADLLHGLDEGMVPLSVFFPYAPIPVHFKRDR (2) 2482886

2483111 CRKDLAAIFAKIIRARRESGRREEDVLQQFIDAR (2) 2483212

2483404 YQNVNGGRALTEEEITGLLIAVLFAGQHTSSITTSWTGIFMAANK (0) 2483538

2483856 EHYNKAAEEQQDIIRKFGNELSFETLSEMEVLHRNITEALRMHPPLLLVMRYAKKPFSVTTSTGKSYVIPK (0) 2484068

2484332 GDVVAASPNFSHMLPQCFNNPKAYDPDRFAPPREEQNKPYAFIGFGAGRHACIGQNFAYLQ (0) 2484514

2484646 IKSIWSVLLRNFEFELLDPVPEADYESMVIGPKPCRVRYTRRKL* 2484780
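In the CYP51G1 version 3 listing above, the inclusive span of every exon line (end - start + 1) is exactly three times the number of residues shown, counting the stop. That is an observation about these lines rather than a documented convention, but it allows a quick consistency check. A minimal sketch, assuming the "<start> <residues> (<phase>) <end>" layout used here; `parse_exons` and `span_mismatches` are hypothetical helpers.

```python
import re

# CYP51G1 exon lines from the version 3 listing above.
EXON_LINES = """\
2481399 MDLPPELAVLADKVLSLSPVVLVALGSAVLILALAVGRVLFNLLPSKRPPVWEGLPFIGGLLKFTG (0) 2481596
2481680 GPWKLLENGYAKFGECFTVPVAHRRVTFLIGPEVSPHFFKAGDDEMSQSE (0) 2481829
2482129 VYDFNIPTFGRGVVFDVEQKVRTEQFRMFTEALTKNRLKSYVPHFNKEAE (0) 2482278
2482415 EYFAKWGETGVVDFKDEFSKLITLTAARTLL (1) 2482507
2482758 GREVREQLFDEVADLLHGLDEGMVPLSVFFPYAPIPVHFKRDR (2) 2482886
2483111 CRKDLAAIFAKIIRARRESGRREEDVLQQFIDAR (2) 2483212
2483404 YQNVNGGRALTEEEITGLLIAVLFAGQHTSSITTSWTGIFMAANK (0) 2483538
2483856 EHYNKAAEEQQDIIRKFGNELSFETLSEMEVLHRNITEALRMHPPLLLVMRYAKKPFSVTTSTGKSYVIPK (0) 2484068
2484332 GDVVAASPNFSHMLPQCFNNPKAYDPDRFAPPREEQNKPYAFIGFGAGRHACIGQNFAYLQ (0) 2484514
2484646 IKSIWSVLLRNFEFELLDPVPEADYESMVIGPKPCRVRYTRRKL* 2484780
"""

# "<start> <residues> (<phase>) <end>"; the last exon has no phase field.
EXON = re.compile(r"^(\d+)\s+([A-Z*]+)\s*(?:\((\d)\))?\s*(\d+)$")


def parse_exons(text):
    """Return (start, residues, phase or None, end) tuples for lines that match the layout."""
    exons = []
    for line in text.splitlines():
        match = EXON.match(line.strip())
        if match:
            start, residues, phase, end = match.groups()
            exons.append((int(start), residues, phase, int(end)))
    return exons


def span_mismatches(exons):
    """Exons whose inclusive span is not 3 x residues; abs() also handles descending (version 2) coordinates."""
    return [e for e in exons if abs(e[3] - e[0]) + 1 != 3 * len(e[1])]


exons = parse_exons(EXON_LINES)
protein = "".join(residues for _, residues, _, _ in exons)
print(len(exons), "exons,", len(protein.rstrip("*")), "aa, mismatches:", span_mismatches(exons))
```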

 

>CYP55B1 C_2580005 (possible CYP55 fungal origin), 42% to 105T1

      MAPQHD (1)

47793 FPFSRPKGVEPPAEYKELRSKCPVAPGRLFDGSKIWLISRHKELKEVLQDGRFSK 47629 (0)

47243 VRTLPGFPELSPGGKAAAQSGNAATFVDMDPPEHTKYRY 47127 (0)

      missing about 20aa here ? seq gap

      AKADKLVDAMIARGGPLDLNEAFSMPLPFR 46168 (0) (same intron loc. as 55A6)

45913 VIYDFIGIPEADFAYLSANVAVRSSGSSNAKDAAAAADDLVKYMDNL 45773 (0)

45601 VAEKERNPTGKDLISELVTKQ 45539 (0)

45264 LRPGHMTREQLVQTAFLMLVAGNATVATQINLGVISLLQHPDQ 45136 (0)

44693 LAAMKADPARLVPAATEEICRFHTGSSYALRRLAVADVQVDGQ 44565 (0)

44256 LVKKGEGIIALNQSANRDESVFPDPDRFDIHRQSNPQQ 44143 (0)

43755 VGFGYGTHVCVAEWLARAEIQVAIGTLFRRLPNLRLAVPESQIQYSDPARDVGLAALPVTW* 43573

 

newest data: version 3 checked April 24, 2006

Name: e_gwW.52.47.1

Protein ID: 121742

Location: Chlre3/scaffold_52:370660-375180

Note gene model is too long at SMPLPFRVGGW, shorten by 4 amino acids

First exon is still my best guess, not in gene model e_gwW.52.47.1

51% to CYP55A5v1 Aspergillus oryzae

48% to CYP55A3 Cylindrocarpon tonkinense

42% to 105T1 Burkholderia fungorum (bacteria)

370660 MAPQH (1) 370674

370738 DFPFSRPKGVEPPAEYKELRSKCPVAPGRLFDGSKIWLISRHKELKEVLQDGRFSK (0) 370905

371291 VRTLPGFPELSPGGKAAAQSGNAATFVDMDPPEHTKYR (2) 371404

371628 GMVWPYLTPEAVEQLRPSIQ (0) 371677

372474 AKADKLVDAMIARGGPLDLNEAFSMPLPFR (0) 372563

372818 VIYDFIGIPEADFAYLSANVAVRSSGSSNAKDAAAAADDLVKYMDNL (0) 372958

373130 VAEKERNPTGKDLISELVTKQ (0) 373192

373467 LRPGHMTREQLVQTAFLMLVAGNATVATQINLGVISLLQHPDQ (0) 373595

374038 LAAMKADPARLVPAATEEICRFHTGSSYALRRLAVADVQVDGQ (0) 374166

374475 LVKKGEGIIALNQSANRDESVFPDPDRFDIHRQSNPQQ (0) 374588

374995 VGFGYGTHVCVAEWLARAEIQVAIGTLFRRLPNLRLAVPESQIQYSDPARDVGLAALPVTW* 375180

 

>CYP97A5 15 EXONS 60% TO 97A3 FIRST EXON PREDICTED BY GENSCAN

C_4260002 C_1040015

no mRNA or homology evidence for exon 1

note: CYP97A6 has homology to exon 2, but no upstream match for 5000bp

EST support = cyan BM003139 BI725954 BE441929 BI719213 CF555158

Gray resembles a cycad EST

13351 MPPDVSGNMLSFSTSISGCRF (1)

373428 GRSAARFLADLGRQWRAEASKRMPE (0) 373502

12913 ARGDIREIVGQPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQILLTNADKYSKGLLSEILDFVMGT 12698 (0)

12532 GLIPADGEIWKARRRAVVPALHRK 12461

12332 YVMSMVDMFGDCAAHGASATLDKYAASG 12249

11994 TSLDMENFFSRLGLDIIGKAVFNYDFDSLAHDDPVIQ 11884

11707 AVYTLLREAEHRSTAPIAYWNIPGIQFV 11624

11493 VPRQKRCQEALVLVNECLDGLIDKCKKLV 11407

11269 EEEDAVFGEEFLSERDPSILHFLLASGDEISSKQ (0) 11168

11003 LRDDLMTMLIAGHETTAAVLTWTLYLLSQHPEAAAAIRKE (0) 10884

10681 VDELLGDRKPGVEDLRALK (0) 10625

10448 MTTRVINEAMRLYPQPPVLIRRALQ 10374

10118 DDHFDQFTVPAGSDLFISVWNLHRSPKLWDEPDKFKPER 10002

 9580 FGPLDSPIPNEVTENFAYLPFGGGRRKCIGDQ 9485

 9358 FALFEAVVALAMLMRRYEFNLDESKGTVGMTT 9263

 9124 GATIHTTNGLNMFVRRRDPLTVPPTSSSVAETVSTGYAFACG

      PAVMPVASAEVVAAPATAAGGGCPFHTAAGAAVPAATMSLRPTGPPSA* 8852

 

newest data: version 3 checked April 28, 2006

Name:   gwH.55.10.1

Protein ID:    39257

Location:       Chlre3/scaffold_55:373287-377786

This model differs from seq below at ends

100% match from ARGDIRE to DPLTVP

EST support = cyan BM003139 BI725954 BE441929 BI719213 CF555158

Gray resembles a cycad EST

 

scaffold_55 16 exons

373287 MPPDVSGNMLSFSTSISGCRF (1) 373349

373428 GRSAARFLADLGRQWRAEASKRMPE (0) 373502

373725 ARGDIREIVGQPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQIL 373874

373875 LTNADKYSKGLLSEILDFVMGT (0) 373940

374106 GLIPADGEIWKARRRAVVPALHRK (2) 374177

374306 YVMSMVDMFGDCAAHGASATLDKYAAS (1) 374386

374641 GTSLDMENFFSRLGLDIIGKAVFNYDFDSLAHDDPVIQ (0) 374754

374931 AVYTLLREAEHRSTAPIAYWNIPGIQFV (0) 375014

375145 VPRQKRCQEALVLVNECLDGLIDKCKKL (0) 375228

375366 VEEEDAVFGEEFLSERDPSILHFLLASGDEISSKQ (0) 375470

375635 LRDDLMTMLIAGHETTAAVLTWTLYLLSQHPEAAAAIRKE (0) 375754

375957 VDELLGDRKPGVEDLRALK (0) 376013

376190 MTTRVINEAMRLYPQPPVLIRRALQ (0) 376264

376520 DDHFDQFTVPAGSDLFISVWNLHRSPKLWDEPDKFKPER (2) 376636

377058 FGPLDSPIPNEVTENFAYLPFGGGRRKCIGDQ (0) 377153

377280 FALFEAVVALAMLMRRYEFNLDESKGTVGMTT (1) 377375

377514 GATIHTTNGLNMFVRRRDPLTVPPTSSSVAETVSTGYAFACGPAVMPVAS 377663

377664 AEVVAAPATAAGGGCPFHTAAGAAVPAATMSLRPTGPPSA* 377786

 

>CB092428.1 hf05f08.g1 Cycad Leaf Library (NYBG) Cycas rumphii cDNA clone

hf05f08, mRNA sequence.

Length=609

 

This seq supports the second and third exons above.

 

Query  40   GRSAARFLADLGRQWRAEASKRMPEVRLELRPCDGGGRASCPVLGKSTYTARGDIREIVG  99

            GR+ A+ +A   ++WRA  + +MPE                         ARG++R + G

Sbjct  383  GRALAKSIAVAEQKWRAHNASKMPE-------------------------ARGNVRAVAG  487

 

Query  100  QPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQIL  139

            QP FVPLY LFL YG +FRL+FGPKSFVI+SDPA AK IL

Sbjct  488  QPFFVPLYNLFLTYGGVFRLTFGPKSFVIVSDPAIAKHIL  607

 

VVQCAGQAGIRPGFEARAIAWPRCVFVSAKTRGFRLNKRVSNDFLGRQLTIKSFSNRQRG

GKIRAATVSSLNEGGGGNEPAVERVERLTEEDRAELSVRIAAGEFTAEPVTLNLLKIRLF

LIKFGAP GRALAKSIAVAEQKWRAHNASKMPEARGNVRAVAGQPFFVPLYNLFLTYGGVF

RLTFGPKSFVIVSDPAIAKHIL

 

volvox matches

>ABSY36486.y1  CHROMAT_FILE: ABSY36486.y1 PHD_FILE:

           ABSY36486.y1.phd.1 CHEM: term DYE: ET TIME: Fri Sep  5

 

Query: 22  GRSAARFLADLGRQWRAEASKRMPE 46

           GR  ARFLADLGR+WR+EA+KRMPE

Sbjct: 240 GRPVARFLADLGRRWRSEAAKRMPE 314

 

>ABSY25604.b1  CHROMAT_FILE: ABSY25604.b1 PHD_FILE:

           ABSY25604.b1.phd.1 CHEM: term DYE: big TIME: Tue Sep 16

           11:06:39 2003

          Length = 1069

 

Query: 46  EARGDIREIVGQPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQILLTNADKYSK 105

           +ARGDIREIVGQPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQILLTNADKYSK

Sbjct: 340 QARGDIREIVGQPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQILLTNADKYSK 519

 

Query: 106 GLLSEILDFVMGT 118

           GLLSEILDFVMGT

Sbjct: 520 GLLSEILDFVMGT 558

 

>CYP97A6 C_310063 missing exon 1

(0)  VRVPLNNVGKVPIFQLLYELYSS (2)

(2)  HGGVFRMRLGPKSFLVLSDPGAVRQVLVGAVDKYS (2)

9247 KGILAEILEFVMGN (0) 9306

 seq gap missing 2 exons

 

 9705 XSVDMESFFSRLSLDIIGKSVFDYDFDSLRHDDPVIQ 9812

10081 AVYSVLRESTVRSTAPFP 10128 (1)

10371 YWKLPGISLLVPRLRESDAALAIVNDTLDRLIARCKSM 10487 (0)

      LEAEGSIPMPASPSSPSSSTATSSSAPSSPSAPLEESSA

10853 PTVLHFLLGSGEALNSRQLRDDLMTLLIAGHETTAAV 10963

11275 LTWALHLLVAHPEVMKRVRDE 11277

11605 VDWVLGDRLPGSDDLPLLRYTTRVVNEALRLYPQPPVLIRRAMQ 11736

11956 DDVLPGGHVVAAGTDLFISVWNLHHSPQLWERPEAFDPDR 12075

12251 FGPLDSPPPTEFSTDFRFLPFGGGRRKCVGDMFAIAECVVALAVVLRRYDFAPDTSFGPVGFKS 12442

12584 GATINTSNGLHMLISRRDLT 12643

12644 GVPPPAPRAPAAAAGAAAGSCPHAAAAAATAAAAAAVGCPHAAAAATSGAPAGVTP 12811

 

newest data: version 3 checked April 27, 2006

Name:e_gwW.42.59.1

Protein ID:121076

Location:Chlre3/scaffold_42:732596-737181

100% to e_gwW.42.59.1 from VRVPL to MLISRR

scaffold_42

cannot identify exon 1

732596 VRVPLNNVGKVPIFQLLYELYSS (2) 732667

733002 HGGVFRMRLGPKSFLVLSDPGAVRQVLVGAVDKYS (2) 733106

733345 KGILAEILEFVMGN (0) 733386

733631 GLLAADGEHWIARRRVVAPALQRK (2) 733702

733949 FVSSQVALFGAATAHGLPQLEAAAAAAAAAAGDSRGGGA 734065

734066 ASVDMESFFSRLSLDIIGKSVFDYDFDSLRHDDPVIQ (0) 734176

734445 AVYSVLRESTVRSTAPFP (1) 734498

734738 YWKLPGISLLVPRLRESDAALAIVNDTLDRLIARCKSMVGRCCGGGGGGGGG (0) 734893

       SSAPTVLHFLLGSGEALNSRQLRDDLMTLLIAGHETTAA (0) 735324

735636 ALTWALHLLVAHPEVMKRVRDE (0) 735701

735969 VDWVLGDRLPGSDDLPLLRYTTRVVNEALRLYPQPPVLIRRAMQ (0) 736100

736320 DDVLPGGHVVAAGTDLFISVWNLHHSPQLWERPEAFDPDR (2) 736439

736615 FGPLDSPPPTEFSTDFRFLPFGGGRRKCVGDMFAIAECVVALA

       VVLRRYDFAPDTSFGPVGFKS (1) 736806

736948 GATINTSNGLHMLISRRDLTGGVPPPAPRAPAAAAGAAAGSCPHAAAAAATAAAAAA

       VGCPHAAAAATSGAPAGVTPQ* 737181

 

54% to DY932408.1 plains sunflower Helianthus petiolaris

MAASLTTLQFPSPYLNTPTTKFKLKSPSTSFPKSYGVSRSCGIKCSYSNGRKPD

SGEEKSGKKVEMTPEEKRRAELSARIASGAFTVEQPSLGSLLVSGLAKLGVPSNILEPVS

NLINSGGNYPKIPEAKGAISAIRSEAFFIP

LYELFLTYGGIFRLTFGPKSFLIVSDPNIA

KHILKDNAKAYSKGILAEILEFVMGTGLIPADGEVWRVRRRVIVPALHLKYVAAMIGLFG

EATDRLCKKLDDAAYNGEDVEMESLFSRLTLDIIGKSVFNYDFDSLD

 

>CYP97B6 on top of gene model C_410095 but annotation is in the wrong frame

strongly suspect ARGN... is N-term part of CYP97B6, but no proof

compare to 97A5 exon 2.

ESTs BI996334.1 AV390436.1

ALIAHKTLLQLY

ARGNIREIVGQTATVPLNKLFLVYVQIFRVSFRPRASGSSLSPHDAKEILRTNADKYSMGLLTKILDLVMST

64% identical to 97A5 exon 2 but not in ver 3 of genome

on the BAC ends from ver 2

PTQ4692.y1  CHROMAT_FILE: PTQ4692.y1 PHD_FILE: PTQ4692.y1.phd.1

This is probably a real exon 2 of a CYP97 like seq

HE

479653 DMESEFLSLGLDIIGLGVFNFDFGSINSESPVIK 479552

479264 AVYGVLKEAEHRSTFYLPYWNLPLADVLVPRQAKFR 479154

ADLKVINECLDNLIKQARDTRVAEDAEALQNRDYSKVSDPSLLRFLVD

MRGEEPTNKQLRDDLMTMLIGGHETTAAV

(44 aa sequence gap up to EXXR)

       CLGESLRMY

477871 PQPPILIRRALAEDTLPAGLRGDPAGYPIGKGADLFISVWNLHR 477740

477549 SPYLWKDPDTFRPERFFEPN 477484

SNPDFGGKWAGYRPDAVTGGAALY

       PNEVASDFAFIPFGGGARKCVGDQFAMFEATVAAAMLLRRFTFRLAVPAEKV (1?)

476620 GMATGATIHTANGLSMRVTRRTP 476552

SGGSGSGAPGAAAKVPATV*

 

>PTQ4692.y1  CHROMAT_FILE: PTQ4692.y1 PHD_FILE: PTQ4692.y1.phd.1 CHEM: unknown DYE: unknown TIME: Thu Jan 10 11:26:57 2002 TEMPLATE: PTQ4692 DIRECTION: rev

= trace file 334400148

no other trace files match, may have errors

TCTATGTGACCTATACAAACTCTCGCTTGGCGAGACCTGGAGGATCACTC

CAGTCTGGCGAATTCGCGGACTCGGGCTCGAAAAAGAGATTGGACTCGAT

CCCTGTCGCCAAGTGCTGAGGAAGGATCCGCTTGTTGGCGATGCAAATTG

CAAAAACGGAATTCAGGAAGCGGAGCGCACGCACTAGATGCCTCCACATG

ACACCGGTAATATGATGACCATTTCAACATAGCATATCACGATGCCGATA

TGGGTGCTGTGCATGACCGACCTTTGGACCAGGGGGTGCCCCATCGTCCA

CGCCCAACTGCCTGCTTGGCTCTGACACAGGACGGTCTGCAGCTCGCTTC

CTGGTGGACTTGGGCCGCCACCGGCGTGCCAAGGCCACAAAGCGCATGCC

TGACGTGAGGTTATAGCTGCGGACCTGCTGACGGCGGTGGGCAGATGCAG

CTGCCCGGTAACTGGGCAAATCCACGTATACTGCATGGTGTGCAATGCAT

GGGGCGTCAGTATACTTGTAAAGGGTGTACTCTCACCTATCAGTGGGCTC

ATATGACCGGGGCCTGCGACTCCGTCCTGAAATCGACAAAAAGCTAGCGC

CCTTGATTGCCCACAAAACTCTCTTGCAATTGTACGCACGCGGCAACATA

CGGGAGATTGTGGGCCAAACAGCGACTGTGCCGCTGAACAAACTGTTCCT

GGTGTACGTGCAGATCTTCCGGGTGTCTTTCCGGCCCAGAGCTTCTGGAT

CATCTCTGAGCCCGCATGATGCGAAGGAGATCCTGCGCACGAACGCTGAC

AAGTACAGCATGGGGCTGCTCACGAAGATCCTGGATCTCGTGATGAGCAC

GCACGGTGCGCGTTGC
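Reading this trace in the right frame recovers the peptide given earlier for the proposed CYP97B6 exon 2 region. A minimal sketch using the standard genetic code (an assumption for these nuclear genes); `translate` and `reverse_complement` are hypothetical helper names, and the reverse complement is only needed if a read has to be examined on the opposite strand.

```python
from itertools import product

# Standard genetic code; codons are enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Reverse complement of a DNA string (for reading the opposite strand)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna, frame=0):
    """Translate one forward reading frame (0, 1 or 2); codons with N or other symbols become 'X'."""
    dna = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3)
    )


# Lines 13 and 14 of the PTQ4692.y1 trace, starting at the GCC codon.
fragment = ("GCCCACAAAACTCTCTTGCAATTGTACGCACGCGGCAACATA"
            "CGGGAGATTGTGGGCCAAACAGCG")
print(translate(fragment))
```

The 66 nt shown translate to AHKTLLQLYARGNIREIVGQTA, matching the ...AHKTLLQLY / ARGNIREIVGQTA... region given above for CYP97B6.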

 

newest data: version 3 checked April 27, 2006

Name:e_gwW.1.53.1

Protein ID:116601

Location:Chlre3/scaffold_1:2256360-2261776

Green supported by identical ESTs

Gray supported by related ESTs, but not identical

Two small gaps and the N-term are missing

Note: yellow region is out of order, but supported by an EST

This seq agrees with model at FIDS to PAFH, GSAVV to AKFR, LEDL to VTRR

Seq gap here

AV390436.1 BI996334.1

2261776 FIDSGGVYKLVFGPKAFIVVSDPVVVRHILK (0) 2261684

2261461 ENAFNYDKGVLAEILEPIMGKGLIPADLETWKVRRRAVVPAFHK (2) 2261331

        lyleamvkvfsdcsekmilkseklireketssgedtiel Arabidopsis

2259284 (0) GSAVVDMESEFLSLGLDIIGLGVFNFDFGSINSESPVIK (0) 2259168

2258877 AVYGVLKEAEHRSTFYLPYWNLPLADVLVPRQAKFR (2) 2258770

        ADLKVINECLDNLIKQARDTRVAEDAEAL

2263466 QNRDYSKVSDPSLLRFLVDMRGEEPTNKQLRDDLMTMLIAGHETTAAV 2263323

        LTWAMFCLVQ (0)

        ntdklvkaqaeidtildqrkp Ginkgo

    (1) SLEDLKAMPYLRA

2257791 CLGESLRMYPQPPILIRRALAEDTLPAGLRGDPAGYPIGKGADLFISVWNLHR (2) 2257633

2257436 SPYLWKDPDTFRPERFFEPNSNPDFGGKWA (1) 2257347

2256904 GYRPDAVTGGAALYPNEVASDFAFIPFGGGARKCVGDQFAMFEATVAAA 2256758

2256757 MLLRRFTFRLAVPAEK (0) 2256710

2256491 VGMATGATIHTANGLSMRVTRRTPSGGSGSGAPGAAAKVPATV* 2256360

 

>CYP97C3 C_1340038 RUNS OFF END 70% to 97C1

44288 VPLGQDVMISVYNIHHSPAVWDDPE (0) 44214

43839 AFIPERFGPLDGPVPNEQNTDFR 43777 (2)

43352 YIPFSGGPRKCVGDQFALMEAVVALTVLLRQYDFQMVPNQQ (0) 43227

42864 IGMTTGATIHTTNGLYMYVKER 42799

      GAAASGSSGVAGGKQLAAA*

 

Name:e_gwW.64.11.1

Protein ID:122396

Location:Chlre3/scaffold_64:422589-430105

e_gwW.64.11.1 has an internal seq between EEMRAA and VPVGQD that is not right

This seq agrees with model from DIKE to EEMRAA, VPVG to YMYV

>e_gwW.64.11.1 [Chlre3:122396] green parts look right compared to Arab.

The first exon shown matches a volvox seq.  The true N-term is not identified.

assembled pieces

422589 GKNIDSKGAGTSFTSPGWLTQLNMLWGGKSVS (0) 422684

(0) NVPVANAQPA

423126 DIKELLGGALFKALYKWMQESGPIYLLPTGPVSSFLVVSD

PAAAKHVLRSTDNSQRNIYNKGLVAE (0)

VSEFLFGKGFAISGGDAWKARRRAVGPSLHK (2)

AYLEAMLDRVFGASSLFAADKLRKAAAEGTPVNMEALFSQLT

LDIIGKSVFNYDFNSLTSDSPVIQAVYTALKETEQRATDLLPLWKVPGIGWLIPRQRKALEAVELIRKTT

NDLIKQCKEMVDEEEMRAASAAAAA (1)

(1) GTEYLIEAVPSVLRLLIPERAEVDSTQ  (chlamy AFWX153863.b3 with frameshift DST/QLRDD)

    LRDDLLSMLVAGHETT (1)

(1) APLTWTLYLLVNNPNKMYAP (0)

390458 (0) AEVDAVLGSRLSPTMADYGQLRYVMRCVNESMRLYPHPPVLLRRALVEDELPGGFK (0) 390625

428555 (0) VPVGQDVMISVYNIHHSPAVWDDPE (0) 428629

429002 AFIPERFGPLDGPVPNEQNTDFR () 429070

429495 YIPFSGGPRKCVGDQFALMEAVVALTVLLRQYDFQMVPNQQ (0) 429617

429980 IGMTTGATIHTTNGLYMYVKERGAAASGSSGVAGGKQLAAA* 430105

 

NVPVANAQPA is from trace file 658821390

422589 GKNIDSKGAGTSFTSPGWLTQLNMLWGGKSVS (0) 422684

(0) NVPVANAQPA

423126 DIKELLGGALFKALYKWMQESGPIYLLPTGPVSSFLVVSD

PAAAKHVLRSTDNSQRNIYNKGLVAE (0)

VSEFLFGKGFAISGGDAWKARRRAVGPSLHK (2)

AYLEAMLDRVFGASSLFAADKLRKAAAEGTPVNMEALFSQLT

LDIIGKSVFNYDFNSLTSDSPVIQAVYTALKETEQRATDLLPLWKVPGIGWLIPRQRKALEAVELIRKTT

NDLIKQCKEMVDEEEMRAA (2)

Trace 335863205 continues seq (no match)

 (seq gap)

(0) LRDDLLSMLVAGHETT (1)

Trace file 650467013 matches mid region of gap

in e_gwW.64.11.1 This seq has 436 (1) APLTWTLYLL (0) = 97C like seq

390464 (0) VDAVLGSRLSPTMADYGQLRYVMRCVNESMRLYPHPPVLLRRALVEDELPGGFK (0) 390625

This fragment matches scaffold 64 from 390464 to 390625

Misassembled 97C3 seq

 

428555 (0) VPVGQDVMISVYNIHHSPAVWDDPE (0) 428629

429002 AFIPERFGPLDGPVPNEQNTDFR () 429070

429495 YIPFSGGPRKCVGDQFALMEAVVALTVLLRQYDFQMVPNQQ (0) 429617

429980 IGMTTGATIHTTNGLYMYVKERGAAASGSSGVAGGKQLAAA* 430105

 

 

 (0) dymndsdpsvlrfliaareevdstq (volvox trace 636376981)

 

blast of Chlamy unplaced reads with Physcomitrella 97C seq

>SYF31892.y1  CHROMAT_FILE: SYF31892.y1 PHD_FILE: SYF31892.y1.phd.1

           CHEM: term DYE: ET TIME: Mon May 20 17:26:52 2002

           TEMPLATE: SYF31892 DIRECTION: rev

          Length = 786

 

 Score = 76.5 bits (163), Expect = 7e-14

 Identities = 30/46 (65%), Positives = 40/46 (86%)

 Frame = +1

 

Query: 31  QLRDDLLSMLVAGHETTGSVLTWTVYLLSKNPAALAKVHEELDRVL 76

           QLRDDL++ML+AGHETT +VLTWT+YLLS++P A A + +E+ RVL

Sbjct: 196 QLRDDLMTMLIAGHETTAAVLTWTLYLLSQHPEAAAAIRKEVRRVL 333
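The "Identities" figure is the number of identical aligned columns over the alignment length, gap columns included. A minimal sketch that reproduces the 30/46 count above from the two aligned strings; `identities` and `blast_percent` are hypothetical helpers. The percentages in these reports look truncated rather than rounded (Positives = 40/46 is printed as 86% although 40/46 is 86.96%, and 18/19 further down is printed as 94%), so the sketch uses integer division. Positives need a substitution matrix and are not computed here.

```python
def identities(query, sbjct):
    """Count identical columns in a pairwise alignment; '-' marks a gap and never counts as identical."""
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have equal length")
    same = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return same, len(query)


def blast_percent(numerator, denominator):
    """Integer percentage truncated toward zero, as these reports appear to print it."""
    return 100 * numerator // denominator


query = "QLRDDLLSMLVAGHETTGSVLTWTVYLLSKNPAALAKVHEELDRVL"
sbjct = "QLRDDLMTMLIAGHETTAAVLTWTLYLLSQHPEAAAAIRKEVRRVL"
same, length = identities(query, sbjct)
print(f"Identities = {same}/{length} ({blast_percent(same, length)}%)")
```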

 

>AFWX152107.b2  CHROMAT_FILE: AFWX152107.b2 PHD_FILE:

          AFWX152107.b2.phd.1 CHEM: term DYE: big TIME: Mon Nov

          1 12:27:19 2004

          Length = 1012

 

100% match to 97C1 Arab.

Query: 31 QLRDDLLSMLVAGHETTG 48

          QLRDDLLSMLVAGHETTG

Sbjct: 15 QLRDDLLSMLVAGHETTG 68

GCAGGTGGATCACGCAGCTGCGCGACGACCTGCTGTCCATGCTGGTGGCGGGACACGAGACCACGGGTGAGGGGGGGCGGGGGCAGGGGCTTGTGCCGGCCACCCGTTAT

This matches trace file 587324724 in Chlamydomonas

Also matches 636376981 in volvox

Use the volvox seq to look upstream

 

The volvox seq matches Physcomitrella 97C5 with no intron

Query:   317 QDYMNDSDPSVLRFLIAAREEVDSTQLRDDLLSMLVAGHETT 442 volvox

             ++Y+N+SDPSVLRFL+A+REEV S QLRDDLLSMLVAGHETT

Sbjct:   372 EEYVNESDPSVLRFLLASREEVSSVQLRDDLLSMLVAGHETT 413 moss 97C5

 

volvox 97C like I-helix

>ABSY190514.b1  CHROMAT_FILE: ABSY190514.b1 PHD_FILE:

           ABSY190514.b1.phd.1 CHEM: term DYE: big TIME: Fri Nov 28

           22:26:28 2003

          Length = 1327

 

 Score = 35.8 bits (73), Expect = 0.033

 Identities = 16/40 (40%), Positives = 26/40 (65%)

 Frame = -1

 

Query: 18  LLASREEVSSVQLRDDLLSMLVAGHETTGSVLTWTLYLLS 57

           LL + + +S+ +LR +   +LVAG ETTG  + W+L  L+

Sbjct: 532 LLITGKPLSAKRLRCETAFLLVAGFETTGHGIAWSLLFLA 413

 

Volvox I-helix region 97B like seq

>ABSY134624.g1  CHROMAT_FILE: ABSY134624.g1 PHD_FILE:

           ABSY134624.g1.phd.1 CHEM: term DYE: big TIME: Fri Nov 28

           22:55:34 2003

          Length = 1303

 

 Score = 32.6 bits (66), Expect(2) = 0.001

 Identities = 12/19 (63%), Positives = 18/19 (94%)

 Frame = -3

 

Query: 23  EEVSSVQLRDDLLSMLVAG 41

           E+V++ QLRDDL++ML+AG

Sbjct: 137 EDVTNKQLRDDLMTMLIAG 81

 

 

 

 Score = 22.8 bits (44), Expect(2) = 0.001

 Identities = 9/14 (64%), Positives = 11/14 (78%)

 Frame = -3

 

Query: 11  DPSILRFLLASREE 24

           DPS+LRFL+  R E

Sbjct: 176 DPSLLRFLVDMRGE 135

 

Volvox EXXR region for 97C

>ABSY52309.x1  CHROMAT_FILE: ABSY52309.x1 PHD_FILE:

 

Query: 21  PTIQDMKKLKYTTRVMNESLRLYPQPPVLIRRSIDNDIL 59

           PT+ D  +L+Y  R +NES+RLYP PPVL+RR++  D L

Sbjct: 168 PTLADYGQLRYVMRCVNESMRLYPHPPVLLRRALVEDEL 284

 

(0) VESVMGSRTAPTLAD

YGQLRYVMRCVNESMRLYPHPPVLLRRALVEDELPGGYK (0)

 

GTGGAGTCCGTGATGGGCAGCCGTACCGCCCCCACCCTGGCGG

ACTACGGCCAGCTGCGGTACGTGATGCGCTGTGTGAACGAGTCCATGCGGCTCTACCCGC

ACCCGCCCGTGCTGCTGAGGAGGGCGCTGGTGGAGGACGAGCTGCCGGGGGGCTACAAG

This volvox DNA matches these Chlamydomonas trace files at 90%

336308963, 335368868, 335328342

(0) VDAVLGSRLSPTMADYGQLRYVMRCVNESMRLYPHPPVLL

RRALVEDELPGGFK (0)

This fragment matches scaffold 64 from 390464 to 390625

Misassembled 97C3 seq

 

 

 

N-term region

>gi|93288035|dbj|BW989539.1|  BW989539 Chamaecyparis obtusa cambium and surrounding tissues

Chamaecyparis obtusa cDNA clone CO02636 5', mRNA sequence.

Length=565

 

 Score = 85.9 bits (211),  Expect = 7e-16

 Identities = 41/78 (52%), Positives = 54/78 (69%), Gaps = 1/78 (1%)

 Frame = +3

 

Query  5    DSKGAGTSFTSPGWLTQLNMLWGGKSVSNVPVANAQPADIKELLGGALFKALYKWMQESG  64

            D  GAG S+ SP WLT    +  G   S +P+ANA+  D+K+LLGGALF  L+KWM+ESG

Sbjct  234  DKAGAGLSWVSPDWLTSFMKMRTGPDESGIPMANAKLDDVKDLLGGALFLPLFKWMKESG  413

 

Query  65   PIYLLPTGPVSSFLVVSD  82

            P+Y L  GP  +F+V+SD

Sbjct  414  PVYRLAAGP-RNFVVISD  464

 

ISPSLPSITSNVAVSLPKQSTRKKKTRLLRIQCRVDEKSTSTDKAGAGLSWVSPDWLTSF

MKMRTGPDESGIPMANAKLDDVKDLLGGALFLPLFKWMKESGPVYRLAAGPRNFVVISDP

EAAKHVLRNYGKYGKGLVSEVSQFLFGSGFAIAEGELWMVRRKAVLPSIHRKYLSVMVDR

VFCKCAERLVEKLNRDTEMAVEVNME

 

volvox

>ABSY209455.b1  CHROMAT_FILE: ABSY209455.b1 PHD_FILE:

           ABSY209455.b1.phd.1 CHEM: term DYE: big TIME: Fri Nov 28

           23:45:44 2003

          Length = 1108

 

Query: 25  GKNIDSKGAGTSFTSPGWLTQLNMLWGGKSVS 56

           GK+ID+ GAG SFTSPGWLTQLNMLWGGK VS

Sbjct: 236 GKSIDAAGAGASFTSPGWLTQLNMLWGGKGVS 331

 

Volvox

>ABSY179960.b1  CHROMAT_FILE: ABSY179960.b1 PHD_FILE:

 

Query: 61  EAVELIRKTTNDLIKQCKEMVDEEEMRAA 89

           +AVELIR+TTNDLI++CKEMVDEEE  AA

Sbjct: 443 KAVELIRQTTNDLIRKCKEMVDEEEREAA (1) 357 agt

 

>CX541939.1| s13dNF0BH03GS032_467186 Germinating Seed Medicago truncatula

Query  1    LDIIGKSVFNYDFNSLTSDSPVIQAVYTALKETEQRATDLLPLWKVPGIGWLIPRQRKAL  60

            LD+IG SVFNY+F++L SDSPVI+AVYTALKE E R+TDLLP WK+  +  +IPRQ KA

Sbjct  120  LDVIGLSVFNYNFDALNSDSPVIEAVYTALKEAEARSTDLLPYWKIDFLCKIIPRQIKAE  299

 

Query  61   EAVELIRKTTNDLIKQCKEMVDEEEMR  87

             AV +IRKT  DLI+QCKE+V+ E  R

Sbjct  300  NAVTVIRKTVEDLIEQCKEIVESEGER  380

 

SIMVDRVFCKCAERLVEKLQADAVNGTAVNMEDKFSQLTLDVIGLSVFNYNFDALNSDSP

VIEAVYTALKEAEARSTDLLPYWKIDFLCKIIPRQIKAENAVTVIRKTVEDLIEQCKEIV

ESEGERIDADEYVNDADPSILRFLLASREEVSSVQ

 

>CYP710B1 C_1540014 10 EXONS 43% to 710A1 exon 1 predicted by genscan.

EST SUPPORT BI719962.1. There are two possible start codons 15 aa apart.

20577 MNATGLLNDGLASLG

      MSGFGDNLASGPALVAAGGALALGYALWEQMKFRWYRSDKNGNMLP (1) 20356

20000 GPASVTPIIGGIVEMVKDPYGFWERQRLYSFP 19905

19904 GMSWNSIVGIFTVFVTDPALSRYVFSHNSSDSLLLALHPN (1) 19785

19644 AEWILGKTNIAFMSGPEHKALRKSFLALFTRKALGLYVLKQDDVIRKHFNEWMQ (0) 19498

19355 TAGPREIRPFIRDLNAYTSQEVFVGPYLDDPT (0) 19269

18917 EREKFSDAYRAMTDGFLAFPLLLPGTGVWKGRQGRQFIVK (0) 18802

18583 TLTRAAARSKVRMAAGQEPECLLDFWTKQ (0) 18497

18215 ILSDIKDAADAGQEAPFYADDKKIAETVMDFLFASQDASTASLVWTITLMAEHPEVLAR (0) 18012

17722 VRDEQYRLRPNPEEKVTGDMLNEMHYTRQVVKEILRFRPAAPMVPMRAKAPFKLTETYTAPKGALIVPSLVAACKQ 17456 (0)

17279 GYSNPDSFDPDRFSPERAEDIKYASNFLVFGHGPHYCVGKE 17155 (0)

16995 YAMNHLTVFLALLATSLDFPRIRSKVSDDIIYLPTLYPGDSIFDLSWSAKK* 16840

 

newest data: version 3 checked April 30, 2006

Name:   estExt_gwp_1H.C_660048

Protein ID:    132687

Location:       Chlre3/scaffold_66:390953-394690

 

394690 MNATGLLNDGLASLG

394645 MSGFGDNLASGPALVAAGGALALGYALWEQMKFRWYRSDKNGNMLP (1) 394505

394113 GPASVTPIIGGIVEMVKDPYGFWERQRLYSFP

       GMSWNSIVGIFTVFVTDPALSRYVFSHNSSDSLLLALHPN (1) 393898

393757 AEWILGKTNIAFMSGPEHKALRKSFLALFTRKALGLYVLKQDDVIRKHFNEWMQ (0) 393596

393468 TAGPREIRPFIRDLNAYTSQEVFVGPYLDDPT (0) 393373

393030 EREKFSDAYRAMTDGFLAFPLLLPGTGVWKGRQGRQFIVK (0) 392911

392696 TLTRAAARSKVRMAAGQEPECLLDFWTKQ (0) 392610

392328 ILSDIKDAADAGQEAPFYADDKKIAETVMDFLFASQDASTASLVWTITLMAEHPEVLAR (0) 392152

391835 VRDEQYRLRPNPEEKVTGDMLNEMHYTRQVVKEILRFRPAAPMVPMRAKAPFKLT

       ETYTAPKGALIVPSLVAACKQ 391608 (0)

391392 GYSNPDSFDPDRFSPERAEDIKYASNFLVFGHGPHYCVGKE 391270 (0)

391108 YAMNHLTVFLALLATSLDFPRIRSKVSDDIIYLPTLYPGDSIFDLSWSAKK* 390953
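Each exon line above gives a start coordinate, the translated residues, the intron phase in parentheses and an end coordinate. On the minus strand the start is larger than the end. The sketch below is a minimal sanity check. It assumes the inclusive coordinates should span three nucleotides per residue shown, with the stop `*` counted as one codon. Exons bordered by phase 1 or phase 2 introns have a codon split across the boundary, so a difference of a nucleotide or two there is not necessarily an error. The helper names are hypothetical.

```python
def span_nt(start: int, end: int) -> int:
    """Inclusive genomic span; works on either strand (start > end on minus)."""
    return abs(end - start) + 1


def codon_mismatch(start: int, end: int, peptide: str) -> int:
    """Nucleotides by which the span differs from 3 nt per residue.

    '*' (stop) is counted as one codon; 0 means the coordinates agree.
    """
    return span_nt(start, end) - 3 * len(peptide)


# Two phase-0-bounded exons of the version 3 CYP710B1 model (minus strand).
exons = [
    (391392, 391270, "GYSNPDSFDPDRFSPERAEDIKYASNFLVFGHGPHYCVGKE"),
    (391108, 390953, "YAMNHLTVFLALLATSLDFPRIRSKVSDDIIYLPTLYPGDSIFDLSWSAKK*"),
]
for start, end, pep in exons:
    print(start, end, span_nt(start, end), codon_mismatch(start, end, pep))

# Distance between the two candidate start codons, 394690 and 394645.
print((394690 - 394645) // 3, "codons apart")
```

On an exon bounded by phase-0 introns, a nonzero result may point to a mistyped coordinate rather than a real feature of the gene model.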

 

>CYP737A1 C_470024    

I cannot detect the N-terminal sequence for this gene (about 100 aa).

13432 (2) SWPAATVAMLGTDSVTFST 13379 (1)

13145 GAYHRSLRRLLGPCFSPQ 13092 (0) C-helix

12878 AVEGYLPSIQAICERYCAEWAAETTAAAAAAAPAATGGDSSAVIEQLPKLQKG (0)

      ARMLTFEVMSHVVAGFHFSPQQLASLSDAFDVFVRGIFAPVALAIPGS 12322 (1)

12098 NYAKASAARKVMVAALTQQLELLKGGSGGGGNGGGANGGGDGDS (0)

      DLAINLLFAGHETTATSIVRLML (0)

      VLRSRPDVVSRLREEQAAAVRQHGAAIS (1)

10590 GSSIRDMPYLDAVVKETWRCHPVVPMVPRRAVRDFTLGGHDVPQ (0)

      GWGVVLGLVEPMRDLPAWSGLTPDSPLHPSHFNPDR (2)

      WLSGRSSASGNSSNSASSSAL

      QQQDGTATADGDDVASAAAAASVGGGGGAAGSGTLSSPM

      GMLPPQMLTFGGGGRYCLGANLAWAELK (0)

      VFVAVLLRGYDFTSPLPELEVKLFPALTVAQGFPIE (0)

      VRAR*

 

newest data: version 3 checked April 30, 2006

Name:   Chlre2_kg.scaffold_41000082

Protein ID:    151890

Location:       Chlre3/scaffold_41:636238-640632

 

640648 (2) SWPAATVAMLGTDSVTFST 640592 (1)

640358 GAYHRSLRRLLGPCFSPQ 640305 (0) C-helix

640091 AVEGYLPSIQAICERYCAEWAAETTAAAAAAAPAATGGDSSAVIEQLPKLQKG (0) 639933

639681 ARMLTFEVMSHVVAGFHFSPQQLASLSDAFDVFVRGIFAPVALAIPGS (1) 639538

639314 NYAKASAARKVMVAALTQQLELLKGGSGGGGNGGGANGGGDGDS (0) 639183

638938 DLAINLLFAGHETTATSIVRLML (0) 638870

638116 VLRSRPDVVSRLREEQAAAVRQHGAAIS (1) 638033

637803 GSSIRDMPYLDAVVKETWRCHPVVPMVPRRAVRDFTLGGHDVPQ (0) 637672

637263 GWGVVLGLVEPMRDLPAWSGLTPDSPLHPSHFNPDR (2) 637156

636987 WLSGRSSASGNGSSNSASSSAL

       QQQDGTATADGDDVASAAAAASVGGGGGAAGSGTLSSPM

       GMLPPQMLTFGGGGRYCLGANLAWAELK (0) 636721

636345 VFVAVLLRGYDFTSPLPELEVKLFPALTVAQGFPIE (0) 636238

635814 VRAR* 635800

 

 

>CYP738A1 C_570052 a member of the CYP85 clan

There is a problem between exons 3 and 4. In almost all members of the CYP85 clan (CYP85, CYP707, CYP90, etc.) there are 28 amino acids between TVM and LVG;

in this gene there is no way to achieve this spacing. I suspect a sequence error.

The yellow sequence can be inserted if a T-to-A change occurs at 78905,

creating an AG boundary, but the sequence is still 5 aa short. An EST is needed.

78090 MRSSSRGAKIGRAYPTAHHIDGRASGGRPLHFGLHPCHRPCLRAKAAQSGLAE

      LPLPEGSLGLPVVGETLELITN (1) 78317

78475 GDTFGTSRRERYGDVYKTNILGAPTVM 78555 (0)

78907 VAAPMARRYACICFRFSCQVTST

78976 LVGPDSLNLLTGPRHGAVKRALSDAFADRALRRHVPAIAELVQ 79104 (0)

      AVFDRVVLGGAGSRDRAAQLQAVMSALQAGFNTPPVQLPFT (1)

79935 AYGKAVAARQEFGQLVSQSIQRSRQHTAASAT 80030

      VSVSPSSAPAFDCAMSDVVAAAAAAAATGTALPDSLLVDNAAAAFFGNAST

      GPSLAKALQHLATNAAGPNGGATGGVMAALRQEQ (0)

      DIVSRHGPAITAEALDEMSYGTAVARELLRITPAVPAVFRLALVDFELQGRRIPK 80709 (0)

81002 GWRVWCHVGDSVTRYNKDQFQPERWLGSSG 81091 (1)

      MAAGGCPMHAGGGGAARGA

81230 QPEYSLPFGSGVRTCLGRNLVMTELLVVLAVLARGYEWEAVNPAEQWGVVPSPAPKEGLRVRLHRRL* 81433

 

newest data: version 3 checked April 30, 2006

Name:   fgenesh2_pg.C_scaffold_6000379

Protein ID:    167934

Location:       Chlre3/scaffold_6:2860971-2865055

 

2864314 MRSSSRGAKIGRAYPTAHHIDGRASGGRPLHFGLHPCHRPCLRAKAAQSGLAE

        LPLPEGSLGLPVVGETLELITN (1) 2864090

2863929 GDTFGTSRRERYGDVYKTNILGAPTVMVYGE (0) 2863837

2863692 DAVRAVLAAEDRLVASDWPQ (0)   2863631

2863440 VTSTLVGPDSLNLLTGPRHGAVKRALSDAFADRALRRHVPAIAELVQ (0) 2863300

2862803 AVFDRVVLGGAGSRDRAAQLQAVMSALQAGFNTPPVQLPFT (1) 2862681

2862469 AYGKAVAARQEFGQLVSQSIQRSRQHTAASAT

        VSVSPSSAPAFDCAMSDVVAAAAAAAATGTALPDSLLVDNAAAAFFGNAST

        GPSLAKALQHLATNAAGPNGGATGGVMAALRQEQQ (0) 2862116

2861859 DIVSRHGPAITAEALDEMSYGTAVARELLRITPAVPAVFRLALVDFELQGRRIPK  (0) 2861695

2861402 GWRVWCHVGDSVTRYNKDQFQPERWLGSSG (1) 2861313

2861231 MAAGGCPMHAGGGGAARGAQPEYSLPFGSGVRTCLGRNL

        VMTELLVVLAVLARGYEWEAVNPAEQWGVVPSPAPKEGLRVRLHRRL* 2860971
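The CYP738A1 spacing argument says CYP85-clan members have 28 residues between TVM and LVG. This can be recounted once the exons are joined. The sketch below assumes plain substring matching, with the first TVM found and then the next LVG after it. `residues_between` is a hypothetical helper. The earlier model joins exon 3, the sequence starting at 78907 (presumably the yellow insert) and the LVG exon. The version 3 model uses the corresponding scaffold_6 exons, which differ in this region.

```python
def residues_between(seq: str, left: str, right: str) -> int:
    """Residues strictly between the first `left` motif and the next `right` motif."""
    i = seq.find(left)
    if i < 0:
        raise ValueError(f"{left} not found")
    start = i + len(left)
    j = seq.find(right, start)
    if j < 0:
        raise ValueError(f"{right} not found after {left}")
    return j - start


# Earlier model: exon 3 + the 78907 insert + start of the LVG exon.
old = ("GDTFGTSRRERYGDVYKTNILGAPTVM"
       "VAAPMARRYACICFRFSCQVTST"
       "LVGPDSLNLLTGPRHG")
# Version 3 model: the corresponding exons on scaffold_6.
v3 = ("GDTFGTSRRERYGDVYKTNILGAPTVMVYGE"
      "DAVRAVLAAEDRLVASDWPQ"
      "VTSTLVGPDSLNLLTGPRHG")

for name, seq in (("old", old), ("v3", v3)):
    gap = residues_between(seq, "TVM", "LVG")
    print(name, gap, "short by", 28 - gap)
```

The earlier model gives 23 residues, 5 short of 28, which matches the note. The version 3 exons give exactly 28.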

 

>CYP739A1 C_130004 no ESTs; inserts in exon 3 and exon 6; INSERTION IN EXON 8

newest data: version 3 checked May 1, 2006

Name:Chlre2_kg.scaffold_8000154

Protein ID:140983

Location:Chlre3/scaffold_8:1065299-1068007

1064933 MAVFGFRELFASMYIPGLSPVLSTITCLAGVLLFLAWQRHSR

        ATSVPRLGPLLTIPLLGDVAWLAADPTRFVFGR (2) 1065157

1065263 FQRYGPTFILNLMGVPLYVLTQPADLRGPYRDQGAEPDVP

        FSSFRRLMEVAPGRPYDVQADKAAHGPW (0) 1065466

1065649 RRMFLSALGPAGLQALLPRAQAVMQAHLAQWEAAGTAAGGRSGGGCIPSLFRQ (0) 1065807

1065921 VRLLSVDLAIEVIAEVPLPPGVERIAFREQ (0) 1066010

1066110 LLCFLDGLFGLPLALPGSSVARALAAKEELVAALGPLVAADRQRMAKR (0) 1066253

1066445 WRAAGSSYAALVDTLTAASAAVGGSAAAEAAAGVQAAEPSAAAAARVTVRDAVISGFMALG (2) 1066627

1066780 RAAAVSVLHAVVAGADTTRFALFNTLALVAMSARVQEEIFAEQER (0) 1066914

1067117 VVAEHGPELSARVLGSAAITPYLDAVVREAMRLLPATPGN

        MRRLTADLRVGAGRGGPASELVIPK (1) 1067311

1067464 GSMVWRFVPLMHCLDPVLWDGDTSVDVPAHMDWRSNFEG

        AFRPERWLSEDTKPKYYYTFGSDNHLCVGQNLAYM (0) 1067685

1067865 EVKLLLAMLLRKYRLQLHTPDMLARASQMFPFVIPRRGTDRVLLEPR* 1068008

 

>CYP739A2 C_130006 EST support BI724239.1 1031069F06.y1

note micro exon of 24 nucleotides (phase 1 boundaries)

newest data: version 3 checked May 1, 2006

Name:Chlre2_kg.scaffold_8000156

Protein ID:140985

Location:Chlre3/scaffold_8:1078611-1085528

1078648 MAGLATFEPSAQTPLTWSLALFSSFVAGLYVTFAIYRSFGKGAKKLPPGPLLHVPLLGDG

        VLMAAGNPVKMFWDR (2) 1078872

1081962 YRRYGSVFRTMMLGSRIWVVTDLDALRGPLRDEGAYLEIPFKAFQRLV (2) 1082105

1082294 SAESFLNRPGVHGPW (0) 1082338

1082426 RKIFSATLAPPRLAAMVPKIAQ (0) 1082491

1082664 LMQSHLSKWEEQGQVTIFRA (0) 1082723

1082865 ARVMGVDLAVDVILDIKLLDGTDRAWVKSQ (0) 1082954

1083213 VEDYLDGLYGLPLNLPGSTLSKALAARARLVEVFLRQPDVAAMQAQF (0) 1083353

1083542 WEAIGKSPQAYAAAVLDQHTSTGDKPAGVAAEEEPSGKAAGAPTPAAP

        GSRPAVLPPSIMTAQLMGRAMLK (2) 1083754

1084122 PSELADGAMSLLHMLVASADTTRFALFNTWTLLAMSPRVQDKLYEEQKK (0) 1084268

1084520 VMAEYGEELSYAATCHMPYMDATLKECMRLLPASAGGIRKLTADMQVGGYTVPA (1) 1084681

1084836 GEYVWYHA (1) 1084859

1085017 GLMHYIDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWLSEETRPRYM

        FTFGTGAHLCIGMNLVYL (0) 1085214

1085385 EVKLLLSMVLRKYRLRLHTPDMLLRCERLFPFFLPAKGTDTVLLEPR* 1085528

 

>CYP739A3 C_130125 PTQ11643.x1 PTQ6387.y1

Insertion of 15 aa in the WEEG region (DRWT), end of exon 3.

Also an insertion in exon 6.

169549 MDYMQLLVGLLAILLASILLLRSSGKRLSPRFRVPLLGDTIKMAKRPAEFLFSR (2) 169388

169172 FKEFGPVFTLDLMGSTYWVVADMDAQRRFLYRTEGASAEIPIKSFKMLTELPSPNSDRVNHATW (0) 168982

168818 RKATMAAVGPHALHTLFPPVLEVIRAHADRWTQQAQQQQGGGGGGGGGGQLQIYRA (0)

168370 QRKLGLDLSVDVVAGVDLPQSVDRGEFKKQ (0) 168342

168037 VEVWLDGLFVLPLALPGTKLARAMAAKKWLLATLMPALSDVHGRFSKQ (0) 167894

       WSQVGGDMAAMSELLIQQLDQQEGDDMGASSSSGGGGGGGGGGGPEAAAPAPQGQQQ

       SLFRLPQAVMLGFFGLK (2)

167270 ATGLRESAIAVLQAVAAAADTTRVTLFTVLALVAMSPRVQEEIFAEQQK (0) 167136

166905 VIAEYGSELSYKVVSDMPYLEAVVKEAMRLLPPAAGGMRVLSEPLTVGDVTLPT (1) 166687

166388 GALLLSYSFLMHCIDPALWDGDTSVDVPAHMDWRNNFEG 166275

166274 AFRPERWLSEETKPKYYYTFGVGKHMCAGIHLVYM (0) 166155

165982 EVKTMVALLVRKHRLKLQTPDMFERATWLPFTTPAPGTDTVLFEPR* 165842

 

newest data: version 3 checked May 1, 2006

Name:Chlre2_kg.scaffold_8000164

Protein ID:140993

Location:Chlre3/scaffold_8:1105803-1109384

1109510 MDYMQLLVGLLAILLASILLLRSSGKRLSPRFRVPLLGDTIKMAKRPAEFLFSR (2) 1109349

1109134 FKEFGPVFTLDLMGSTYWVVADMDAQRRFLYRTEGASAEI

        PIKSFKMLTELPSPNSDRVNHATW (0) 1108943

1108779 RKATMAAVGPHALHTLFPPVLEVIRAHADRWTQQAQQQQGGGGGGGGGGQLQIYRA (0) 1108612

1108334 CRKLGLDLSVDVVAGVDLPQSVDRGEFKKQ (0) 1108245

1107998 VEVWLDGLFVLPLALPGTKLARAMAAKKWLLATLMPALSDVHGRFSKQ (0) 1107855

1107661 WSQVGGDMAAMSELLIQQLDQQEGDDMGASSSSGGGGGG

        GGGGGPEAAAPAPQGQQQSLFRLPQAVMLGFFGLK (2) 1107440

1107243 ATGLRESAIAVLQAVAAAADTTRVTLFTVLALVAMSPRVQEEIFAEQQK (0) 1107097

1106866 VIAEYGSELSYKVVSDMPYLEAVVKEAMRLLPPAAGGMRVLSEPLTVGDVTLPT (1) 1106705

1106352 GALLLSYSFLMHCIDPALWDGDTSVDVPAHMDWRNNFEG

        AFRPERWLSEETKPKYYYTFGVGKHMCAGIHLVYM (0) 1106131

1105943 EVKTMVALLVRKHRLKLQTPDMFERATWLPFTTPAPGTDTVLFEPR* 1105803

 

>CYP739A4 C_130009 no ESTs; insert in exon 8; 52% to 739A5

     MLEPELAVAGLRGLLSDPRIVGTLFAALIAALAVWASGIVGTKLHLPGPYIT (0)

     WPFLGDAVELGITSDLSRLM (2)

7765 FKKYGRVFRLNLLGHTAFV (0)

7434 VSDEAALRGVLSDDGAIATIPFRAFS (2) 7411

7197 DLMGEYGTQSVKEIHGPW (0) 7181

6868 RKLIMAAVNGRGLSELVPGVAGVMARHVAGWAQAGRVELFQA (0)

     SHAMGLDLSTDVIANVHFTALDRGWFKQQMRTFTAGMW (1)

5973 GLPVRLPGSDYSAALAAKERLIAALMPEMRDAHAAMLKRWEAAGRSGPALAAALLEE

     QERQREAAREAEARGQKATPPDLSIKEAMLTAYFIGGWVR (2)

5465 HTALRDAPMTILNAVVAAADTTRFSLFTFWAMVAMSTRVQEEIFGEQQR (0) 5420

4094 VVAAHGPELTPAALSSMPYLEACFKEAMRLLPTGGGAVRHLTKELKAGSVTLPAGEWVWY 3915

3914 HPHLMHCIDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWLSEETKPKYYFTFGSGVHLCAGVNLVYL (0)

3498 EAKLVMAMLVRRFRLRLSAPDMLARCTRVFPFMQPVPGTDKVELLPREQPLPVPGIDL*

 

newest data: version 3 checked May 1, 2006 (Join two models)

Name:fgenesh2_pg.C_scaffold_8000179

Protein ID:165902

Location:Chlre3/scaffold_8:1131576-1134169

Name:fgenesh2_pg.C_scaffold_8000180

Protein ID:165903

Location:Chlre3/scaffold_8:1135368-1136663

1131245 MLEPELAVAGLRGLLSDPRIVGTLFAALIAALAVWASGIVGTKLHLPGPYIT (0) 1131400

1131576 WPFLGDAVELGITSDLSRLM (2) 1131635

1131729 FKKYGRVFRLNLLGHTAFV (0) 1131785

1132054 VSDEAALRGVLSDDGAIATIPFRAFS (2) 1132131

1132291 DLMGEYGTQSVKEIHGPW (0) 1132344

1132623 RKLIMAAVNGRGLSELVPGVAGVMARHVAGWAQAGRVELFQA (0) 1132748

1133096 SHAMGLDLSTDVIANVHFTALDRGWFKQQMRTFTAGMW (1) 1133209

1133518 GLPVRLPGSDYSAALAAKERLIAALMPEMRDAHAAMLKRWEAAGRSGPALAAALLEE

        QERQREAAREAEARGQKATPPDLSIKEAMLTAYFIGGWVR (2) 1133808

1134023 HTALRDAPMTILNAVVAAADTTRFSLFTFWAMVAMSTRVQEEIFGEQQR (0) 1134169

1135197 VVAAHGPELTPAALSSMPYLEACFKEAMRLLPTGGGAVRHLTKELKAGSVTLPAGEWVWY

        HPHLMHCIDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWL

        SEETKPKYYFTFGSGVHLCAGVNLVYL (0) 1135580

1135793 EAKLVMAMLVRRFRLRLSAPDMLARCTRVFPFMQPVPGTDKVELLPREQPLPVPGIDL* 1135969

 

>CYP739A5 C_130009 C_22500001

EST SUPPORT BI527318 BG852189 BE129324 BI527323 BI527331 BU651784.1

MICRO EXON 13 NUCLEOTIDES

 

newest data: version 3 checked May 1, 2006 (Join two models)

Name:fgenesh2_pg.C_scaffold_8000177

Protein ID:165900

Location:Chlre3/scaffold_8:1125087-1127174

Name:estExt_fgenesh2_pg.C_80173

Protein ID:186291

Location:Chlre3/scaffold_8:1128094-1131067

1125087 MGEQGAAAGTPLALAATLLAGTILVFYIYQQLKPSKSRLPGPLF

        SWPFLGDTIEFATTDPTKFLFGR (2) 1125287

1125418 FKRYGR (2) 1125435

1125617 VFRLSLLGFTAYVTADPEALRPLLADEGGHFTIPVQTFTALMGAYNLQAHKEVHAAW (0) 1125787

1126081 RKVLMAALTGSGMAKLVPGVVAVMGRHVEGWAQAGRVELYEA (0) 1126206

1126538 ARTLGLDLAVDVLSGVKLEERGIQPAWLKSR

        MADFLGGLYGLPLALPGSPLAKALAAKEELLRVLVPAVEGRQQELLKL (0) 1126774

1127068 WEDNDRSAAAVATKLASSPETATIADANLLGFTARG (2) 1127175

1128397 CTTPRDAAMTVLHAVMGAADTTRFALFNTWAILAMSPRVQDLIYEEQKK (0) 1128543

1128758 VVAENGPELTYKTAMSMP (2) 1128811

1129253 YLDAAFKEAMRLLPASAGGFRMLTKELRVGDVLLPP (1) 1129360

1129696 GTIIW (2) 1129710

1130071 FHALLLQTLDPVLWDGDTSVDVPVHMDWRNNFEGAFRPERWLSEET

        KPRSYYIFGQGAHLCAGMVLVTL (0) 1130277

1130498 EVKLLLAMVLRKWRLQLEVPDMLARAELFPYTKPAKGTGGMRLIAREQPVA* 1130653

 

>CYP739A6 C_130012 C_5270001 33% to 707A2, 85 clan member, 57% to 739A2

ESTs BU647654.1 BI528139

28201 MDLTKIHEDPIGLLLAMIAGALVAFFLLARKEKRPLGPMFTLPILGDTVALALSEQSRFMFSR (2) 28013

27729 YKKYGSVFRLNLLGKHMYILSDLEALRGPYRDEGAIPEVPFPTFKLLMGDFNVAGGGKHIHGPW (0) 27538

26890 RKASLAALGPAGLQSMFPPVLRVMQSHLSEWEAAGRVEVFQS (0) 26765

26576 ARRMGLELAVDVVADVELSPAVDRAWFKQQ (0) 26487

26101 AETWLYGMWGLPVPLPGS (2) 26048

25807 ALAKALAARKVLLRVLGQELAADHEDYKSR (0) 25718

25284 WTELGSSGAAMADDLVAKASAAPGAEGAKGLGAPRLSHVIRLGLFGLG (2)

24803 ATEVEHSALAVLHAVMASADTTRFALFNTWALVAQSARVQEKLYEEQQK (0) 24672

24589 VIEEFGPELSYKAASSMP (2) 24536

24153 YMDATIKECMRLLPASAGGPRKLTQDLKVGEVVLPA (1) 24046

23660 GSFVWMYSYLLHCLDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWLSEETKPKYY (0) 23493

23362 FTFGYGNHLCAGINLAYL (0) 23309

23164 EIRTMLALVIRKYRLRLQTPDMLSRARYFPFVEPSPGTDTVLLEAR* 23024

 

newest data: version 3 checked May 1, 2006 (Join two models)

Name:estExt_fgenesh2_pg.C_80177

Protein ID:186292

Location:Chlre3/scaffold_8:1145690-1151969

1145820 MDLTKIHEDPIGLLLAMIAGALVAFFLLARKEKRPLGP

        MFTLPILGDTVALALSEQSRFMFSR (2) 1146008

1146292 YKKYGSVFRLNLLGKHMYILSDLEALRGPYRDEGAIPEVPFP

        TFKLLMGDFNVAGGGKHIHGPW (0) 1146483

1146925 RKASLAALGPAGLQSMFPPVLRVMQSHLSEWEAAGRVEVFQS (0) 1147050

1147239 ARRMGLELAVDVVADVELSPAVDRAWFKQQ (0) 1147328

1147714 AETWLYGMWGLPVPLPGS (2) 1147767

1148008 ALAKALAARKVLLRVLGQELAADHEDYKSR (0) 1148097

1148531 WTELGSSGAAMADDLVAKASAAPGAEGAKGLGAPRLSHVIRLGLFGLG (2) 1148674

1148997 ATEVEHSALAVLHAVMASADTTRFALFNTWALVAQSARVQEKLYEEQQK (0) 1149143

1149226 VIEEFGPELSYKAASSMP (2) 1149279

1149662 YMDATIKECMRLLPASAGGPRKLTQDLKVGEVVLPA (1) 1149769

1150155 GSFVWMYSYLLHCLDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWLSEETKPKYY (0) 1150322

1150453 FTFGYGNHLCAGINLAYL (0) 1150506

1150651 EIRTMLALVIRKYRLRLQTPDMLSRARYFPFVEPSPGTDTVLLEAR* 1150791

 

>CYP740A1 C_1080041  ONE EXON IN A SEQ GAP C_1080041

3676 MAPLLDAKQLELLGIGMQLAAVLLVLYYLLKWLAGKRGGVPGPAFYLPAIGETLSLFASPTRYMWK (0) 3500

NWLEYGPFFRTHLLGYPLYVVGSPGLLKPVLGDDSAFEFF (0)

VPGKTFTMLISDIRHMQVPEQHAVF (0)

RRRLGQALNPGALSRHVMAPLRVVLERHLDAWEAAGRVQLAEA (0)

CAAVSLDVALEVLTGVPLPGGPETRAEVRRGTGG (0)

LFRTLAGLYGVPLPWLPGTAIHSALRAQRRLMALLGPPELDREVAELAGK (0)

SRLPTGGTAWHRTPRPGSAACPRGPTADAGSRRSHRHRHHQLLLRHRGAHAHAGGPGRCGPALPHRHAFLR (2)

TGTPLSLTKEQIFERALGVVIASDDTSKHLFFFELVAAAMLPGVWAKLEEEQKQ (0)

AMRKYGDELSYSILNDMPYLDAVIK (0) 497

ETIRVFPTAVGGFRRALKDVP (0) TIN268971.x1

XXXXXXXXXXXXXXXXXXXXX PKG motif in seq gap

(0) AFRPERWLSDETRPRQFAGFGGGQHLCLGMHLAHAE (0)

ARMLLALVVRRFHLRLEQPQLLSRVTYFPGPVPRKGADGLVLMPRRLEP*
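Several notes locate sequence gaps relative to conserved P450 landmarks, such as the PKG motif and the heme-binding region (FGGGQHLCLG in the CYP740A1 exon above). The sketch below locates the heme-binding cysteine region with a regular expression. The pattern is my transcription of the PROSITE cytochrome P450 cysteine heme-iron ligand signature (PS00086) and is an assumption. Check it against the current PROSITE entry before relying on it. The test exons are copied from the gene lists on this page.

```python
import re

# Assumed transcription of PROSITE PS00086:
# [FW]-[SGNH]-x-[GD]-{F}-[RKHPT]-{P}-C-[LIVMFAP]-[GAD]
HEME_SIGNATURE = re.compile(r"[FW][SGNH].[GD][^F][RKHPT][^P]C[LIVMFAP][GAD]")


def find_heme_signature(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in HEME_SIGNATURE.finditer(protein)]


# Heme-region exons copied from this page.
exons = {
    "CYP740A1": "AFRPERWLSDETRPRQFAGFGGGQHLCLGMHLAHAE",
    "CYP710B1": "GYSNPDSFDPDRFSPERAEDIKYASNFLVFGHGPHYCVGKE",
    "CYP739A1": "AFRPERWLSEDTKPKYYYTFGSDNHLCVGQNLAYM",
}
for name, seq in exons.items():
    print(name, find_heme_signature(seq))
```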

 

newest data: version 3 checked May 1, 2006 (Join two models)

Name:Chlre2_kg.scaffold_68000022

Protein ID:153850

Location:Chlre3/scaffold_68:173935-177729

172336 MAPLLDAKQLELLGIGMQLAAVLLVLYYLLKWLAGKRGGVPGP

       AFYLPAIGETLSLFASPTRYMWK (0) 172533

172751 NWLEYGPFFRTHLLGYPLYVVGSPGLLKPVLGDDSAFEFF (0) 172870

173104 VPGKTFTMLISDIRHMQVPEQHAVF (0) 173178

173539 RRRLGQALNPGALSRHVMAPLRVVLERHLDAWEAAGRVQLAEA (0) 173667

173950 CAAASLDVALEVLTGVPLPAAPETRAEVRRGTGG (0) 174051

174373 LFRTALAGLYGVPLPWLPGTAIHSALRAQRRLMALLGPELDREVAELAGK (0) 174522

174680 SRLPTGGTAWHETHLAHARTPRPGSAACPRGPTADAGSRRS

       HRHRHHQLLLRHRGAHAHAGGPGRCGPALPHRHAFLR (2) 174913

174948 TGTPLSLTKEQIFERALGVVIASDDTSKHLFFFELVAAAMLPGVWAKLEEEQKQ (0) 175109

175468 AMRKYGDELSYSILNDMPYLDAVIK (0) 175542

175867 ETIRVFPTAVGGFRRALKDVP (0) 175929

176602 VEGGQLIPAGSIVFYSTHLLNAADPALLPRSLAPE (2) 176706

176860 ALEGPTGLPAHLDYE (0) 176904

177170 CRLEEAFRPERWLSDETRPRQFAGFGGGQHLCLGMHLAHAE (0) 177292

177581 ARMLLALVVRRFHLRLEQPQLLSRVTYFPGPVPRKGADGLVLMPRRLEP* 177730

 

>CYP741A1 14 exons C_980053, C_980058 (N-term part)

72A9-like; exons 3, 4, 5, 13, 14 not well supported

8031       MDGFWKTLGLGALLSPVLYALYLASLIVIPYLK

         SLPLRRKLRHLPGPPVTGFFLLGNVPDLVRTP (1) 8225

8408  VHQCMARWAEQYGKIFKLELPTMT (0) 8479

 

Missing approx. 80 aa between exon 2 and VMTG.

This is a very poorly conserved region, so without cDNA it is very hard to

identify the missing piece(s).

 

10741 VMTGLAAAGPSAALDLDRVAQRLTIDVIGRFAFDRDFGATADIAKTNEALQ (0) 10893

11059 VVGELMTALQRMLNPLNRWFWWRK (0) 11130

11410 EARGLWASRRRYDALVRRALEDLRSSPPAQHTLLHHLMSLTDPDT (1) 11544

11782 GKPLSARRLRSETALFWIAGFETTAHAIGWTLMFIAGSPE (0) 11901

13254 VESRVAAELEGAGLLAVPGRPEPRQLAWGDLGGLKYLNA (1) 13370

13544 VIHESMRLMPPTSGGTVR (2) 13597

13750 VVPRDTQLAGHVLPKGTMLW (0) 13809

14146 IPFYAMQRSERVWGPDAAQFRPERWLAAAAGAGGPG (0) 14253

14541 ARGFLPFSEGPRNCVGQSLALLELRTALALLCGSFR (2) 14648

14920 FRLADDMGGVEG (1) 14955

15160 AVSEARQHITLKPGDRGLLMHAIPRVPA* 15246

 

newest data: version 3 checked May 1, 2006

Note: the version 2 sequence is better than version 3 for this gene.

Name:   fgenesh2_pg.C_scaffold_71000048

Protein ID:    179637

Location:       Chlre3/scaffold_71:380138-384009

Name:   fgenesh2_pg.C_scaffold_846000001

Protein ID:    181363

Location:       Chlre3/scaffold_846:2079-5042

34% identical to CYP767A1, 29% to Cyp3a11 Drosophila 4 clan member

380138 MDGFWKTLGLGALLSPVLYALYLASLIVIPYLK

       SLPLRRKLRHLPGPPVTGFFLLGNVPDLVRTP (1) 380332

380494 VHQCMARWAEQYGKIFKLELPTMT (0) 380589

381479 AVVLTDPEAVSQVLKVDRFEKLTTSYQNMEK (0) 381582

382152 LTAEQQPNILTEPLSAYYKAVRRAVTPAFSTANLR (2) 382211

382813 RFFPLLLDITQQ

382849 VMTGLAAAGPSAALDLDRVAQRLTIDVIGRFAFDRDFGATADIAKTNEALQ (0) 383001

383167 VVGELMTALQRMLNPLNRWFWWRK (0) 383238

383518 EARGLWASRRRYDALVRRALEDLRSSPPAQHTLLHHLMSLTDPDT (1) 383652

383890 GKPLSARRLRSETALFWIAGFETTAHAIGWTLMFIAGSPE (0) 384009

       VESRVAAELEGAGLLAVPGRPEPRQLAWGDLGGLKYLNA (1) (in a seq gap)

  5043 VIHESMRLMPPTSGGTVR (2) 4990

  4837 VVPRDTQLAGHVLPKGTMLW (0) 4778

       IPFYAMQRSERVWGPDAAQFRPERWLAAAAGAGGPG (0) (in a seq gap)

  4242 ARGFLPFSEGPRNCVGQSLALLELRTALALLCGSFR (2) 4135

  3863 FRLADDMGGVEG (1) 3828

       AVSEARQHITLKPGDRGLLMHAIPRVPA* (in a seq gap)
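The earlier CYP741A1 note estimated about 80 aa missing between exon 2 and VMTG. The version 3 model places three exons in that interval, after exon 2 (ending LELPTMT) and before the VMTG exon. A short tally, assuming those three exons are exactly the missing piece, checks the estimate.

```python
# Version 3 exons lying between exon 2 (...LELPTMT) and the VMTG exon.
gap_exons = [
    "AVVLTDPEAVSQVLKVDRFEKLTTSYQNMEK",
    "LTAEQQPNILTEPLSAYYKAVRRAVTPAFSTANLR",
    "RFFPLLLDITQQ",
]
lengths = [len(e) for e in gap_exons]  # residues per exon
total = sum(lengths)                    # residues filling the gap
print(lengths, total)
```

The three exons contribute 31 + 35 + 12 = 78 residues, consistent with the "approx 80 aa" estimate.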

 

>CYP742A1 C_60077 29% to 741A1

YELLOW COULD BE REMOVED AS AN INTRON

newest data: version 3 checked May 1, 2006

Name:   Chlre2_kg.scaffold_37000075

Protein ID:    151489

Location:       Chlre3/scaffold_37:480605-486413

Note: the model Chlre2_kg.scaffold_37000075 is short at the N-term. It is

missing the first 63 aa. It is also wrong at the end of the second exon,

after WRA.

486602 MHTAPRRIHAARCRPLHASTGASTPGPAGAPDLPPLQRA

       PGPPGLPWLGQLPAYLATKFFPKKMLEWSEQYNGVYAMEIVGRKYLVVT (1) 486339

486182 EPSLIAGIVGRGSAGLPKSTGYAMWDSAIS (2) 486093

485810 PHAGVQGLFTVAENTTTWRAVRRAYGPAIGPGSMS (2) 485706

485124 SGTSTSTSSSSTASINSTTGLTSHEMNHLAKCLTLDMLGLSAFG

       IDFRCLDDPAAAQLPSLIES (0) 484933

484532 AMHECGERARSVGRRLLPWLYEEEARAGAADMAAFHALVE (0) 484413

484091 DVWRQIRARGAPTEDDNSFGAQLLRLADPSLAP (1) 483993

483712 GGAALSDEQICAEIATVIIAGYETTAN (2) 483632

483379 TLTWMLYGLHAHKDASEQLVAELRGA (1) 483302

483023 GLVPDTSSSSSPSSVDPTTASFASLAGAHEALGGLPVLDAYVRECLRLYSTAP

       NGLIKEVPKNGPPARV (1) 482817

482606 GPFAADPGVVVWIPFWSLHLSNLNWEQPHDFQL (0) 482508

482279 SRWLGKDPRTAGSLTASRCP (0)

       VSGTLNALRAATSSSSSSSSSSSSSSSSSSSSSSGSDSDGEGGSSSGGRGSK (0)

       AIRFMPFGDGSRNCVGQHLGMLQLK (0) 481989

481146 LSLAYLAARFDLVLDEARMGGSAAAALERQRVNLTLEVDGGM (2) 481021

480681 YLLGASVHSHARVYWYQLVSCEPKC* 480604

 

>CYP743A1 C_180013 16 exons

      MLRALSCLALLAAGAARLAAAAGATDSA (0)

14775 VSRALAVLALLLALHVLADPLQRWRLRHIP (1) 14852

15120 GPPALPLLGSVPAMMRAGGPFFFRQCFAKYGPVFK (0?)

15414 VAMGRKWVVVVADAELMRQ (0)

      AGQRLRSHVIIEPNLNRGHLRRLDAEGLFQAH (2)

16227 GEFWRLLRGAWQPAFSSAALSGYLPLMSACGLRLAQQLQA (0)

      GGGARPAAGYVDVWRALGGMTLQVVGSTAYG (2)

16969 RLAVACGDVFRFGSALHGSS (2) 17034

17266 YQRIGLLLPELVPALVPLAHSLPDPPFKRLQR (0) 17364

17660 ARSTLLAACMELIRSWRQQHHATT 17722 (large insert here)

      TRTAGGTTATGVAAAAEAPAAMCGAAVPAAAAAVDGAAAPAGPEEADAAARGGGV

      GGGGGDGSGVGGSGVAAGSFLDLMLAARDKANGAALTDRMVAAQ (0)

      VQTFLLAGYETTANALAFAIYCVATHPE (1)

181099 VESRLLAEVDAVLGRDR (2) 181146

18987 PPTESDLPRLPYTEAVLNEAMRLFPPAHATTRIVEAGAPLQ (0)

19333 LGGVSLPPRTPLILAIYSAHHDPAVWPRPEDFIPERFLP (0) 19479

19665 ASPLHSEVAARVPGAHAPFGYGSRMCIGWKFAMQ (0) 19715

19932 EAKLVLALLYQRLLFRLQPGQVPLPTATALTLAPRDGLWVRPVLRRAARAE* 20069

 

newest data: version 3 checked May 1, 2006

Name:   e_gwW.1.412.1

Protein ID:    116541

Location:       Chlre3/scaffold_1:5612270-5613769

5617553 MLRALSCLALLAAGAARLAAAAGATDSA (0) 5617470

5617234 VSRALAVLALLLALHVLADPLQRWRLRHIP (1) 5617145

5616874 GPPALPLLGSVPAMMRAGGPFFFRQCFAKYGPVFK (0) 5616770

5616574 VAMGRKWVVVVADAELMRQ (0) 5616518

5616251 AGQRLRSHVIIEPNLNRGHLRRLDAEGLFQAH (2) 5616156

5615776 GEFWRLLRGAWQPAFSSAALSGYLPLMSACGLRLAQQLQA (0) 5615657

5615530 GGGARPAAGYVDVWRALGGMTLQVVGSTAYG (2) 5615438

5615016 RLAVACGDVFRFGSALHGSS (2) 5614957

5614725 YQRIGLLLPELVPALVPLAHSLPDPPFKRLQR (0) 5614630

5614331 ARSTLLAACMELIRSWRQQHHATT  (large insert here) 5614263

5614262 TRTAGGTTATGVAAAAEAPAAMCGAAVPAAAAAVDGAAAPAGPEEADAAARGGGV

        GGGGGDGSGVGGSGVAAGSFLDLMLAARDKANGAALTDRMVAAQ (0) 5613966

5613766 VQTFLLAGYETTANALAFAIYCVATHPE (1) 5613683

        VESRLLAEVDAVLGRDR (2)

5613004 PPTESDLPRLPYTEAVLNEAMRLFPPAHATTRIVEAGAPLQ (0) 5612882

5612664 LGGVSLPPRTPLILAIYSAHHDPAVWPRPEDFIPERFLP (0) 5612548

5612380 ASPLHSEVAARVPGAHAPFGYGSRMCIGWKFAMQ (0) 5612279

5612062 EAKLVLALLYQRLLFRLQPGQVPLPTATALTLAPRDGLWVRPVLRRAARAE* 5611907

 

>CYP743A2 C_420091 33% to CYP711A1

16 exons EST support BM002146 BI728655 BE726345 N-term to C-helix

37486 MQDVISFLLNGLGFAAVGLVVL (0) 37551

37671 QLVLSLDLYKRWKLRHLP (1) 37724

37956 GPPALPLLGNLPQILAKGSPAFFRECRAKYGPVFR (0) 38060

38400 VAFGRNWMVVVAEPDLLRQ (0) 38456

38720 VGGKLLNHSMFRGLLGGEFAKLDDWGLVSAR (2) 38812

39351 DDFWRKVRAAWQPAFSAPSLSGYFPLMTDCAVRLADKLEGLARRQPG 568697

      GQQGAGKEEEAAGKAGKAEAEGGSGGGGGSSTRVDIWRELGAMTLQVVGSTAYG (2)

      VDFQAMESLPAAGTGEGGADTKPAAAP

39994 APASSSYG RVLVQACRDVFKYSSVVYGSK (2) 40059

40319 YSRVGLLFPEWRPVVAILANAAPDLPFKMLKT (0) 40408

40595 ARTHLRDACMSLIDGWKKQEASG 40654

      VQDGKSKQEEQNGDANGHTAASTAGAKGDGAVSGAGAANAIGE insert

567380 AAAAVGTAAGGVGGLSAGSFLGLMLAAR 567309

       DKSTGEGLTDLQVAAQ (0)

41049 VQTFILAGYETTANALAFAVYCLATNPE (1) 41141

      AEAKLLAEIDAVLGPDR (2)

41578 LPTEADLPRLPYTEAVFNETMRLYPPAHATNRHTDKAPMQ (0) 41700

42000 VGPYTLPKDTTLFMSIFSAHHNTDVWPRVNDFVPERFLP (0) 42119

42294 ESPLYPEVAARVPHAHAPFGFGSRMCIGWKFAVQ (0) 42395

42712 EAKVALAALYQRLTFELEPGQ (0) 42771

43237 VPLQTAVGITLSPRNGVWVRPVARRLTPRQPTTPPVGSAAK* 43362

 

newest data: version 3 checked May 1, 2006

Name:   estExt_fgenesh2_pg.C_160079

Protein ID:    189550

Location:       Chlre3/scaffold_16:612970-615574

Name:   e_gwW.16.62.1

Protein ID:    116043

Location:       Chlre3/scaffold_16:610198-611929

 

scaffold_16: 609616-615492

615492 MQDVISFLLNGLGFAAVGLVVL (0) 615427

615307 QLVLSLDLYKRWKLRHLP (1) 615254

615022 GPPALPLLGNLPQILAKGSPAFFRECRAKYGPVFR (0) 614918

614578 VAFGRNWMVVVAEPDLLRQ (0) 614522

614258 VGGKLLNHSMFRGLLGGEFAKLDDWGLVSAR (2) 614166

613627 DDFWRKVRAAWQPAFSAPSLSGYFPLMTDCAVRLADKLEGLARRQPG 613487

613489 GQQGAGKEEEAAGKAGKAEAEGGSGGGGGSSTRVDIWRELGAMTLQVVGSTAYG (2) 613328

613083 VDFQAMESLPAAGTGEGGADTKPAAAP

       APASSSYGRVLVQACRDVFKYSSVVYGSK (2) 612916

612659 YSRVGLLFPEWRPVVAILANAAPDLPFKMLKT (0) 612564

612380 ARTHLRDACMSLIDGWKKQEASG

       VQDGKSKQEEQNGDANGHTAASTAGAKGDGAVSGAGAANAIGE insert

       AAAAVGTAAGGVGGLSAGSFLGLMLAAR

       DKSTGEGLTDLQVAAQ (0) 612051

611926 VQTFILAGYETTANALAFAVYCLATNPE (1) 611843

611651 AEAKLLAEIDAVLGPDR (2) 611601

611397 LPTEADLPRLPYTEAVFNETMRLYPPAHATNRHTDKAPMQ (0) 611278

610981 VGPYTLPKDTTLFMSIFSAHHNTDVWPRVNDFVPERFLP (0) 610865

610684 ESPLYPEVAARVPHAHAPFGFGSRMCIGWKFAVQ (0) 610583

610269 EAKVALAALYQRLTFELEPGQ (0) 610207

609741 VPLQTAVGITLSPRNGVWVRPVARRLTPRQPTTPPVGSAAK* 609616

 

>CYP743B1 scaffold 98 unannotated region adjacent to a large gap

C_32340001 inserts in large sequence gap of scaffold 98 and

completes the P450 gene.

First exon is a best guess.

251670 MVASASWQLDLLGALSGAPSPQM (0) 251602

251481 AAAGLALLLASLLIYLLDPIQRWRLRKVPGER (?) 251386

251227 (1) GPPARPLLGCLPQLRAQPMPLFLQSCAQTYGPVFKAS (1) 251117

251064 SAEVQGIAVIPHHVS 251020

251017 RMQVALGRKWAVVLADAEMQRQVRGTGAERG 250925

2384 GSTWRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAGA 2241

2240 TASGATAGGGSSVDMWRELGGMTLQVVGSTAYG 2142

1933 VDFHSINEEDQAGSGSGSGSAIATAGATAAAKGRGDDGYGKQLAAACGQIFRYTSSAHGSP 1751

1592 YLRVAMLFPELRRLLVPLAHTLPDKRFAILMQ 1497

1323 ARNRLSGAVFQLMDSWKQQHIAAAGSGAAGKGSSGKADACQ 1198

1119 SSNGVGAAATSGRGGMAGVAPGSFLDLMLG 1030

1029 HRQGGGSGSGGKKAEGEEGVEHAPLTDEQVAGQ 931

 805 VQLFILAGYETTANALAFAVYCIATHPE 722

 526 VESRLLREVDDVLPGSDQLPGESDLPRLAYTEAVVNEALRLFPPAHLTSRVVPPGETLT 350

 266 VGGFNIPAGIPIFLPMYIAHRDPAVWPRADVFLPERFLH (0) 153

 

newest data: version 3 checked May 2, 2006

Name:e_gwW.71.18.1

Protein ID:122749

Location:Chlre3/scaffold_71:125260-130065

Frameshift at HHVS/RMQ, GC boundary at RKVP?

125260 MVASASWQLDLLGALSGAPSPQM (0) 125328

125449 AAAGLALLLASLLIYLLDPIQRWRLRKVP (1) 125535

125703 GPPARPLLGCLPQLRAQPMPLFLQSCAQTYGPVFKAS (1) 125813

125869 AEVQGIAVIPHHVS 125910

125913 RMQVALGRKWAVVLADAEMQRQVRGTGAERG (2) 126005

126948 GSTWRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAGA

       TASGATAGGGSSVDMWRELGGMTLQVVGSTAYG (2) 127190

127399 VDFHSINEEDQAGSGSGSGSAIATAGATAAAKGRGDDGYGKQLAAACGQIFRYTSSAHGSP (2) 127581

127740 YLRVAMLFPELRRLLVPLAHTLPDKRFAILMQ (0) 127835

128012 ARNRLSGAVFQLMDSWKQQHIAAAGSGAAGKGSSGKADA (2) 128128

128216 SNGVGAAATSGRGGMAGVAPGSFLDLMLG

       HRQGGGSGSGGKKAEGEEGVEHAPLTDEQVAGQ (0) 128401

128527 VQLFILAGYETTANALAFAVYCIATHPE (1) 128610

128806 VESRLLREVDDVLPGSDQLPGESDLPRLAYTEAVVNEALRLFPPAHLTSRVVPPGETLT (0) 128982

129066 VGGFNIPAGIPIFLPMYIAHRDPAVWPRADVFLPERFLH (0) 129182

129643 PRGAAQQHAHAPFGYGSRMCIGYKFAMQ (0) 129726

129919 EAKVALATLYRRLTFTLEPGQQPLQVEASLTMAPRGGLRVTPVPRRKL* 130065

 

>CYP743B2 C_8600001 also inserts in same gap of scaffold 98

 

141391 XXRVAMLFPELRSLLLTLAHTLPDEKFTILTK (0) 141480

ARTRLCNTVFQLIDSWKEQHRAEAEIDAAASSGKPDVGAGRHSSN

GVGAAATSGRGGLSGVAPGSFLDLMLGQRQGGERGSGGKKAEGEEGVEHAPLTDEQVAGQ (0)

VQLFILAGYETTANALAFAVYCIATHPE (1)

(seq gap)

3945 (0) SSPLYESLQPRGAAQQHAHAPFGYGSRMCIGYKFAMQ 3838

3648 EAKVVLATLYRRLTFTLEPGQQPLQVEASLTMAPRGGLRVMPVPRRKL* 3511

 

CYP743B2 scaffold_71:130374-138996

A duplication of 743B3. Note: 1 extra S in exon 1.

CYP743B3 has some defects, so CYP743B2 may be the intact gene,

while CYP743B3 may be a pseudogene copy.

130374 MSNVFANWPSGSGAPLGGLLRSLGM (0) 130448

130571 VAAGFALLLVSLIIYLLDPIKRWRLRKIP (1) 130657

130846 GPGPRGRPVLGCLPQLRAQPMPLFLQSCAQTYGPVFKAS (1) 130962

131029 AEVQGIAVILHRVSRMQVALGRKWVVVLADAEMQRQVDGAG (2) 131151

(seq gap, missing six exons)

138577 PRGAAQQHAHAPFGYGSRMCIGYKFAMQ (0) 138660

138850 EAKVVLATLYRRLTFTLEPGQQPLQVEASLTMAPRGGLRVMPVPRRKL* 138996
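The note that CYP743B2 carries one extra S in exon 1 can be confirmed by diffing its exon 1 against CYP743B3 exon 1 (MSNVFANWPSGSGAPLGGLLRLGM on this page). The sketch below uses Python's standard `difflib.SequenceMatcher`. This is a generic string diff, not a biological alignment with a substitution matrix, so it is only suitable for near-identical copies like this duplication. `extra_residues` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def extra_residues(a: str, b: str) -> list[tuple[int, str]]:
    """Residues present in `a` but absent from `b` (1-based positions in `a`)."""
    ops = SequenceMatcher(None, a, b, autojunk=False).get_opcodes()
    # 'delete' opcodes are stretches of `a` with no counterpart in `b`.
    return [(i1 + 1, a[i1:i2]) for tag, i1, i2, j1, j2 in ops if tag == "delete"]


b2_exon1 = "MSNVFANWPSGSGAPLGGLLRSLGM"  # CYP743B2, scaffold_71:130374
b3_exon1 = "MSNVFANWPSGSGAPLGGLLRLGM"   # CYP743B3, scaffold_71:139305
print(extra_residues(b2_exon1, b3_exon1))
```

The only difference reported is the S at position 22 of the CYP743B2 exon.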

 

>CYP743B3 C_980035 C_8600002 same sequence

 

2544 MSNVFANWPSGSGAPLGGLLRLGM (0) 2615

2745 VAAGFALLLVSLIIYLLDPI

242088 KRWRLRKIPG 242059

241862 PGPRGRPVLGCLPQLRAQPMPLFLQ 241788

241786 SCAQTYGPVFKAS 241748

241692 AEVQGIAVILHRVSRMQVALGRKWVVVLADAEMQRQVDGAG 241570

240969 GSTWRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAGA 240826

240825 TASGATAGGGSSVDMWRELGGMTLQVVGSTAYG 240727

240592 VDFHSINEEDQAGSGSGSATATAGATAAAKGRGDDGYGKQLAAACGQIFR 240443

240442 YGSPVHGSP 240416

240284 YLRVAMLFPELRSLLLTLAHTLPDEKFTILTK 240189

240021 ARTRLCNTVFQLIDSWKQQHSAEGATAAGASSGKPDAGAGQSNN 239890

239889 GVGAAATGGRGLSGVAPGSFLDLMLGHRQGGGSGSGGKKAEGEEGVEHAP 239740

239739 LTDEQVAGQ 239714

239592 VQLFILAGYETTANALAFAVYCIATHPE 239509

239320 VESRLLREVDDVLPGSDQLPGESDLPRLAYTEAVVNEALRLFPPAHLTSR 239171

239170 VVPPGETLTVGGYTIPGGTAVYLPMYLAHRDPAVWPRAEEFLPERFLP 239027

238674 PRGAAQQHAHAPFGYGSRMCIGYKFAMQ 238591

238461 EAKVALATLYRRLTFTLEPGQQPLKLVASVTMSPRGGLHVTPVPRRKL* 238315

 

newest data: version 3 checked May 2, 2006

Name:e_gwW.71.20.1

Protein ID:122730

Location:Chlre3/scaffold_71:139305-143478

Two frameshifts and one small duplication of AEVQ.

These defects were not in the version 2 sequence (see above).

139305 MSNVFANWPSGSGAPLGGLLRLGM (0) 139375

139498 VAAGFALLLVSLIIYLLDPIKRWRLRKIP (1) 139584

139787 GPRGRPVLGCLPQLRAQPMPLFLQSCAQTYGPVFKAS (1) 139897

139966 AEVQA 139967

139969 EVQGTAVLLHHVSRMQVALGRKWVVVLADAEMQRQVDGAG (2) 140088

140701 GSTWRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAGATASX 140853

140856 ATAGGGSSVDMWRELGGMTLQVVGSTAYG (2) 140942

141077 VDFHSINEEDQAGSGSGS 141130

141131 ATATAGATAAAKGRGDDGYGKQLAAACGQIFRYGSPVHGSP (2) 141253

141391 YLRVAMLFPELRSLLLTLAHTLPDEKFTILTK (0) 141480

141648 ARTRLCNTVFQLIDSWKQQHSAEGATAAGASSGKPDAGAG 141767

141768 QSNNGVGAAATGGRGLSGVAPGSFLDLMLGH 141860

141861 RQGGGSGSGGKKAEGEEGVEHAPLTDEQVAGQ (0) 141956

142077 VQLFILAGYETTANALAFAVYCIATHPE (1) 142160

142349 VESRLLREVDDVLPGSDQLPGESDLPRLAYTEAVVNEAL

       RLFPPAHLTSRVVPPGETLTVGGYTIPGGTAVYLPMYLAHRDPAVWPRAEEFLPERFLP (0) 142642

143119 PRGAAQQHAHAPFGYGSRMCIGYKFAMQ (0) 143202

143332 EAKVALATLYRRLTFTLEPGQQPLKLVASVTMSPRGGLHVTPVPRRKL* 143478

 

>CYP743C1 C_1130014 C_9610001 AV627084.1 top part = scaf 961

35102 MTFLQLLPGVPLVLLGVLALPV (0) 35037

34921 VITLVQEVITKRKYRHIP  34868 (1)

34694 GPKPQPISGNLREFLTSPGGLLGCLEGW (0) 34611

      VK

(seq gap about 146 aa) followed by scaf 113

118674 AVALPCLLPAVRHLAAAAPDPVLALHIQ 118591 (0)

118264 SRQVLRQVSTKLITAWRDSHTAAS

ANGSSTNSTSGSSSSTGVAPGSFLGLMLAARDRSRKEGGAAATAKDG

31374 MAPTLTDAQIEAQVQTFLLA  31315 (1) (I-helix)

31010 GFETTANALTFAVYLLACHPE 30948 (0)

(87 aa seq gap)

29287 (0) AFRPERFLSPDVPGSAPELAARHPHVHLPFGSGPRMCIGWRFAMQ (0) 29156

28541 EAKTVLSRLVQAVDFTLAPGQAAPLDTVAGLTLAPRNGVWVRLSPR

      GGGGSGGGGGRGQEVATAAAKGAAVRSAAA* 28308

 

Name:Chlre2_kg.scaffold_17000165

Protein ID:147793

Location:Chlre3/scaffold_17:1492638-1496177

CYP743C1 scaffold_17:1489349-1496178

1489349 MTFLQLLPGVPLVLLGVLALPV (0) 1489414

1489530 VITLVQEVITKRKYRHIP (1) 1489583

1489757 GPKPQPISGNLREFLTSPGGLLGCLEGW (0) 1489840

1490285 VKQYGDLLTFRLGSRQFVLVADPDAAR (2) 1490365

        (small gap in C-helix region)

1491618 (0) PVFTARVFLTQIVFPHTARSLRGYQALMDREAVALAGRLR

        RQAAAGGGGGGGGGGGGGGGDKAGEIEVMSEMSRVTLAVVGTAAYG (2) 1491875

1492617 CNDFFRTMSPAARSSWSW 1492670

1492671 AVALPCLLPAVRHLAAAAPDPVLALHIQ (0) 1492754

1493081 SRQVLRQVSTKLITAWRDSHTAAS

        ANGSSTNSTSGSSSSTGVAPGSFLGLMLAARDRSRKEGGAAATAKDG 1493293

1493357 MAPTLTDAQIEAQVQTFLLA (1) 1493416 (I-helix)

1493721 GFETTANALTFAVYLLACHPE (0) 1493783

        (EXXR missing in a seq gap)

1494664 (0) IQGHRIPAGSTLWLSIAHLHTRDGVWPEPQ (0) 1494753

1495196 AFRPERFLSPDVPGSAPELAARHPHVHLPFGSGPRMCIGWRFAMQ (0) 1495330

1495948 EAKTVLSRLVQAVDFTLAPGQAAPLDTVAGLTLAPRNGVWVRLSPR

        GGGGSGGGGGRGQEVATAAAKGAAVRSAAA* 1496178

 

>CYP744A1 C_940015 (N-term part), C_940016

      MALSSAWALAGLFL (0)

      AMFVFFGYSLRKRWQLRKIP (1)

23868 GALGWPFLGSIPEFSIYGYEYVLGLSAKLGN (0) 23773

23439 AWLGVEPLIIICDPALIR 23386 (2)

23162 KYAYKCVSKPPSMSEYGHVLTGFNYDVDQASAFVAS (2) 23058

22787 GEVWRRGRRVFEASVINGVR (2)

22557 LAAHLPAINRCANRFVAQL

      AQRVAAPAAAHSGKTLGEEGIDMFS

22396 IVGGYTMAVTGEVAYG 22349 (2)

      HVPAVTRGVRPFWQVEHSTLYLPLG (0)

21478 VMFPWARPLVRWLATHFPDRAQREHMAARTQI 21446

      IANISRLLMERWATSKKAAAAAAGTG

      TGTGTAITADSKAGTASAPPAEAARADGAAAAGKGAEEA

      IKEVGGGISSSSFMAAMMEGRRGAPQEERLSDVE (0)

      VIAQSFTFV 20858 MAGFETTALTLSLVTFMLATHPE (0) 20791

      AAARLTAEVDGLGPGELTHEVLAE (0)

20358 KLPYTEAVIKETLRLHPPIPYFIREAREDLDLGNGMVAPK (2) 20233

19945 GSYLTMYMHAVHLNPDVWPHPERFLPQRFLPEGSAAFGPADPGAWAPFGIGARMCVGHKLAMM (0) 19775

19557 MAKTLLVRMYQRFRIELHPRQPLPLKMKTGLSRVPVDGVWVTLTER*

 

Name:Chlre2_kg.scaffold_23000133

Protein ID:148983

Location:Chlre3/scaffold_23:958944-961028

N-term part, missing two exons

Name:e_gwW.23.96.1

Protein ID:118452

Location:Chlre3/scaffold_23:962118-963240

Only covers I-helix to heme

scaffold_23

958703 MALSSAWALAGLFL (0) 958744

958941 AMFVFFGYSLRKRWQLRKIP (1) 959000

959120 GALGWPFLGSIPEFSIYGYEYVLGLSAKLGN (0) 959212

959546 AWLGVEPLIIICDPALIR (2) 959599

959823 KYAYKCVSKPPSMSEYGHVLTGFNYDVDQASAFVAS (2) 959930

960201 GEVWRRGRRVFEASVINGVR (2) 960260

960457 LAAHLPAINRCANRFVAQLAQRVAAPAAAHSGKTL

       GEEGIDMFSIVGGYTMAVTGEVAYG (2) 960636

961230 HVPAVTRGVRPFWQVEHSTLYLPLG (0) 961304

961510 VMFPWARPLVRWLATHFPDRAQREHMAARTQ 961602

961603 IIANISRLLMERWATSKKAAAAAAGTG

       TGTGTAITADSKAGTASAPPAEAARADGAAAAGKGAEEA

       IKEVGGGISSSSFMAAMMEGRRGAPQEERLSDVE (0) 961899

962100 VIAQSFTFVMAGFETTALTLSLVTFMLATHPE (0) 962195

962367 AAARLTAEVDGLGPGELTHEVLAE (0) 962438

962633 KLPYTEAVIKETLRLHPPIPYFIREAREDLDLGNGMVAPK (2) 962752

963039 GSYLTMYMHAVHLNPDVWPHPERFLPQRFLPEGSAAFGPADPGAWAPFGIGARMCVGHKLAMM (0) 963228

 19557 MAKTLLVRMYQRFRIELHPRQPLPLKMKTGLSRVPVDGVWVTLTER*

 

last exon is in a seq gap; use the ver 2 seq here

 

>CYP744A2 C_940017    

PTQ5694.x1  K-helix to heme = PTQ11662.x1 PTQ243.x1 PTQ52.x1 PTQ9722.x1

      MWNVAELGLALVPVV (0)

18913 AFVWLAYNLPERWRLRRIP (1) 18854

18740 GPVGLPFLGNILSFSTYGHDYFAMMEKYGR (0) 18648

18338 IWFGVNPWIVVSDPALLR (2) 18285

18027 KLAYKCVGKPASMSEYGHVLTGENYEIEQANAFVAS (2)

17775 GEVWRRGRRVFEASVIHPTR (2) 17722

17477 LAAHLPAINRCANRF 17427 VTRLAQRVAAPAAEPGAGGKDDGHSGGTGNDGGGAGFDFFA

17301 EVGSYTMAVVGEVAYG 17256 (2)

      WRLAERESRQGKPAMMSWCPTMCRLPCRLPLPHVHTQVENATKYLPLR (0)

16350 VMFPWARPLVRWLATHFPDRAQREHMAARTQI 16318

      IANISRLLMERWAASKKAAAAAAGTG

      GGAGNAAGAGGDRAGG

      FKEVGGGISSSSFMAAMMEGRRGAPQEERLSDVE (0)

15798 VIAQSFLFVLAGFETSADTLALTCYLLATHPE (0) 15691

      AAARLVAEVDAVGGRELTAELLAE (0)

15294 GLPYTEAVIKEAMRLYPPVPYLLRQAREDLDLGKGMVAPK (2) 15175

HSYVVLYVHSMHLNPDVWPHPERFLPQRFLPEGSAAFGPADPGAWAPFGIGARMCVGHKLAMM (0)

MAKTLLVRMYQRYRVALHPSQPLPLRMKAGLSRVPLDGIWLTLTEREAAAAAVAVP*

 

Name:e_gwW.23.89.1

Protein ID:118526

Location: Chlre3/scaffold_23:969108-971162

I-helix to heme only, seq gap above this

Use earlier version for the top half

scaffold_23

       MWNVAELGLALVPVV (0)

18913  AFVWLAYNLPERWRLRRIP (1) 18854

18740  GPVGLPFLGNILSFSTYGHDYFAMMEKYGR (0) 18648

18338  IWFGVNPWIVVSDPALLR (2) 18285

18027  KLAYKCVGKPASMSEYGHVLTGENYEIEQANAFVAS (2)

17775  GEVWRRGRRVFEASVIHPTR (2) 17722

17477  LAAHLPAINRCANRF 17427 VTRLAQRVAAPAAEPGAGGKDDGHSGGTGNDGGGAGFDFFA

17301  EVGSYTMAVVGEVAYG 17256 (2)

       WRLAERESRQGKPAMMSWCPTMCRLPCRLPLPHVHTQVENATKYLPLR (0)

969108 VMFPWARPLVRWLATHFPDRAQREHMAARTQI 969200

969201 IANISRLLMERWAASKKAAAAAAGTGGGAGNAAGAGGDRAGG

       FKEVGGGISSSSFMAAMMEGRRGAPQEERLSDVE (0) 969428

969669 VIAQSFLFVLAGFETSADTLALTCYLLATHPE (0) 969764

969927 AAARLVAEVDAVGGRELTAELLAE (0) 969998

970161 GLPYTEAVIKEAMRLYPPVPYLLRQAREDLDLGKGMVAPK (2) 970280

970565 HSYVVLYVHSMHLNPDVWPHPERFLPQRFLPEGSAAFGPADPGAWAPFG 970711

970712 IGARMCVGHKLAMM (0) 970753

970992 MAKTLLVRMYQRYRVALHPSQPLPLRMKAGLSRVPLDGIWLTLTEREAAAAAVAVP* 971162

 

>CYP744A3 C_940044

     MPGLGALLAFIQTPLGA (0)

     ITWLGWYPLRRYAFRKFP (1)

3380 GPFGLPFLGNLPQ (0) 3430

3597 IAAMDTTAFLTSSAVKYGPVCK (0) 3662

3831 VWFGTRPWVLINDPELIR 3884 (2)

4267 RHSFRWPARPANFASYFHVMTGENRAIDRAGVVLAE (2) 4371 TIN460677.b1

     GEVWRRGRRAFEGSIIHPAR (2) WEQ17438.g11, TIN285957.x1

5063 LAAHVPAMLRCLGRFTARLDRHAGSAQPLDVAAALGDLMLAAMGQIAYG 5218 (2)

     VDFGCEEGADSSASNSSGVAGELVAALRDLFETMRMENATAYLPLQ (0)

5902 LMFPALEPLWLWAAHHMPDAKQTKAMRARSK 5994 (0)

     VAEVSRLLMEQWQANKAAAVAAAASGGAGGADGGDRAGG

     FKEVGGGISSSSFMAAMMEGRRGAVEDRLSDIE (0)

6987 VIGQGFTFLAAGYETTSAATSLALFLLATHPE (0) 7049

7448 AAARLAAEVDAVLGGRELTAELLAE (0)

8071 KLPYTEAVIKETLRLHPGITFLVREATEDVDLGAGRVVPR 8190 (2)

8546 GSTLCMATHAVMHDPDIWPEPEAFRPERFLPEGSAGGGGSSSLWPTAGGNNPHVWA

8714 PFGMGTRMCVGHKLAMM (0)

9146 ASKATLVSLCQRFSFALHPKQPLPLKLKTGLTYGPADGVWMTVTRRG*

 

Name:e_gwW.23.108.1

Protein ID:118465

Location:Chlre3/scaffold_23:976166-982342

I-helix to heme only

Exons 6 and 7 are in a seq gap and are taken from trace archive files

982342 MPGLGALLAFIQTPLGA (0) 982292

982144 ITWLGWYPLRRYAFRKFP (1) 982091

981891 GPFGLPFLGNLPQ (0) 981853

981686 IAAMDTTAFLTSSAVKYGPVCK (0) 981621

981452 VWFGTRPWVLINDPELIR (2) 981399

981016 RHSFRWPARPANFASYFHVM 980957 TGENRAIDRAGVVLAE (2) TIN460677.b1

       GEVWRRGRRAFEGSIIHPAR (2) WEQ17438.g11, TIN285957.x1

980414 LAAHVPAMLRCLGRFTARLDRHAGSAQPLDVAAALGDLMLAAMGQIAYG (2) 980268

979957 VDFGCEEGADSSASNSSGVAGELVAALRDLFETMRMENATAYLPLQ (0) 979820

979584 LMFPALEPLWLWAAHHMPDAKQTKAMRARSK (0) 979492

979090 VAEVSRLLMEQWQANKAAAVAAAASGGAGGADGGDRAGG 978974

978973 FKEVGGGISSSSFMAAMMEGRRGAVEDRLSDIE (0) 978875

978529 VIGQGFTFLAAGYETTSAATSLALFLLATHPE (0) 978434

978038 AAARLAAEVDAVLGGRELTAELLAE (0) 977961

977384 KLPYTEAVIKETLRLHPGITFLVREATEDVDLGAGRVVPR (2) 977265

976909 GSTLCMATHAVMHDPDIWPEPEAFRPERFLPEGSAGGGGSSSLWPTAGGNNPHVWA

       PFGMGTRMCVGHKLAMM (0) 976691

976309 ASKATLVSLCQRFSFALHPKQPLPLKLKTGLTYGPADGVWMTVTRRG* 976166

 

>CYP744A4 between C_239009 and C_239004 not annotated

AV641971 35% to 703A2 N-term to C-helix

51492 MYAALALVLSPVLL (0) 51451

51367 ALLWAIINPVERWKTRKIPG (2) 51308

51224 PPGLPLLGHLLNFATGDATDFTVEAVKKYGNVVA (0) 51123

50867 IWFGNRAWITIADPALIR (2) 50814

50325 KLGFKFLNRPARMTDFGH (0) 50272

49795 VLVGHNAEVDNAGAFVAR (2) 49706

49574 GEVWRRGRRAFEASIIHPAS (2) 49515

58799 LAAHLPAINRCANRFVARLARRAAAAAAAAADASLGSAGGGAAQGEQQGKAALAMKQQGG

      GGGGGVEILTEAGNYTMAAVGEVAYG (2) 58542

(SEQ GAP)

47764 (0) LMFPALRPLWRWMAEHLPDAAQTENMRARSK (0) 47672

      VAEVSRLLMEQWQANKAAAAAAAASGGDGGADGGDRAGGF

56888 KEVGGGISSSSFMAAMMEGRRGAVEDRLSDIE (0) 56811

46972 VIGQGFTFLVAGYETSSNTTTMASYLLATHPAAQQRMADEIDAVLG 46832

46831 PWRAGAGAGEGACAGGELTPELLAK (0) 46757

46326 LPYTEAVLQETLRLYPAAPYLLREAREEVDLGGGRVVPK (2) 46288

46008 DSVLVLHVHSMQRDPDVWPQPEAFLPQRYLPEGQAALGPADPNGWAPFGVGARMCVGHKLAMM (0) 45820

45561 VTKVALVRMYQRFRVSLHPRQPLPLKMKTGLVRVPADGVWLTLTER* 45421

 

Note: scaffold 121 has part of the last exon as a duplication

61920 VTKVALVRMYQRFRVSLHPRQPLPLKMKTGLL 61825

 

Name:fgenesh1_est.C_scaffold_23000031

Protein ID:95157

Location:Chlre3/scaffold_23:1143890-1147747

CYP744A4 N-term

Name:e_gwH.23.61.1

Protein ID:103666

Location:Chlre3/scaffold_23:1141463-1143101

CYP744A4 I-helix to end

1147747 MYAALALVLSPVLL (0) 1147706

1147622 ALLWAIINPVERWKTRKIPG (2) 1147563

1147479 PPGLPLLGHLLNFATGDATDFTVEAVKKYGNVVA (0) 1147378

1147122 IWFGNRAWITIADPALIR (2) 1147069

1146580 KLGFKFLNRPARMTDFGH (0) 1146527

1146050 VLVGHNAEVDNAGAFVAR (2) 1145997

1145829 GEVWRRGRRAFEASIIHPAS (2) 1145770

1145397 LAAHLPAINRCANRFVARLARRAAAAAAAAADASLGSAGGGAAQ

        GEQQGKAALAMKQQGGGGGGGVEILTEAGNYTMAAVGEVAYG (2) 1145140

(missing exon 9 in a SEQ GAP)

1143893 (0) LMFPALRPLWRWMAEHLPDAAQTENMRARSK (0) 1143801

1143522 VAEVSRLLMEQWQANKAAAAAAAASGGDGGADGGDRAGGF

        KEVGGGISSSSFMAAMMEGRRGAVEDRLSDIE (0) 1143307

1143098 VIGQGFTFLVAGYETSSNTTTMASYLLATHPAAQQRMADEIDAVLG

        PWRAGAGAGEGACAGGELTPELLAK (0) 1142886

1142455 LPYTEAVLQETLRLYPAAPYLLREAREEVDLGGGRVVPK (2) 1142339

1142048 DSVLVLHVHSMQRDPDVWPQPEAFLPQRYLPEGQAALG

        PADPNGWAPFGVGARMCVGHKLAMM (0) 1141860

1141603 VTKVALVRMYQRFRVSLHPRQPLPLKMKTGLVRVPADGVWLTLTER* 1141463
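Each coordinate line reads: start coordinate, residues, intron phase in parentheses, end coordinate. Sometimes the phase comes after the end coordinate instead. When the two coordinates bracket exactly the codons of the listed residues, the span should be three nucleotides per residue, with a trailing `*` counting as the stop codon. Below is a rough Python sketch of that sanity check, assuming this line layout. It will also flag lines whose coordinates legitimately include part of a codon split by an intron, so a mismatch means "look again", not "error". `check_exon_line` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

PHASE = r"(?:\(\d\??\)\s*)?"  # optional "(0)", "(2)" or "(1?)"
EXON_RE = re.compile(
    r"^\s*(\d+)\s+" + PHASE + r"([A-Z*]+)\s*" + PHASE + r"(\d+)\s*" + PHASE + r"$"
)


def check_exon_line(line: str) -> Optional[Tuple[int, int, bool]]:
    """Return (span_nt, expected_nt, consistent), or None if the line lacks two coordinates."""
    m = EXON_RE.match(line)
    if m is None:
        return None
    start, residues, end = m.groups()
    span = abs(int(end) - int(start)) + 1  # inclusive span, works for either strand
    expected = 3 * len(residues)  # '*' is the stop codon, also 3 nt
    return span, expected, span == expected


for row in (
    "1141603 VTKVALVRMYQRFRVSLHPRQPLPLKMKTGLVRVPADGVWLTLTER* 1141463",
    "46326 LPYTEAVLQETLRLYPAAPYLLREAREEVDLGGGRVVPK (2) 46288",
):
    print(check_exon_line(row))
# (141, 141, True)
# (39, 117, False)
```

The version 3 last exon checks out at 141 nt (46 residues plus the stop). The earlier-version line beginning 46326 spans only 39 nt for 39 residues, so the sketch would flag it for a second look.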

 

>CYP744A5P pseudogene C_1730009 C-helix 81% to 744A3

PROBABLE pseudogene WITH PART OF EXON 3, EXONS 4,5 AND 6

13946 QIAAMDTTAFLTSSAVKYGPVCK 13875

13607 AWFSTQPWVINDPKLVR 13551

      RHSFRWRARPSLFASYFQVMTGENRAIDRAGVGAGG

12773 GEAWRRTRRVLEGSIIHPAR 12705

 

Name:Chlre2_kg.scaffold_21000002

Protein ID:148389

Location:Chlre3/scaffold_21:6347-7649

Frameshift at PSL/FASY, bad boundary

6350 QIAAMDTTAFLTSSAVKYGPVCK (0) 6418

6683 AWFSTQPWVINDPKLVR (2) 6733

7163 RRHSFRWRARPSL 7201

7200 FASYFQVMTGENRAIDRAGVGAGG (0) 7271

7533 (2) GEAWRRTRRVLEGSIIHPAR (2) 7592

 

>CYP744B1 C_8650001 C_940020

FIRST TWO EXONS FOUND BY WALKING.  FIRST EXON IS A BEST GUESS

     MELVSGLALAGVALFIL (0) TIN33450.x1

     GFIWAGFNPIERYLSPLRRFP (1) TIN292840.x1 WALKED TO THIS READ

     GPAPLPFLGNLVSVATRDLTAYLADCRQAYGG (0)

 220 IWLGNQPWVCVADPDLIR (BAD BOUNDARY, SHOULD BE phase 2)

 568 RVAYRVLSRPFSHTDSIHLLAGEQWEVDCNTLVFLK (2) 672

     SEQ GAP (EXON 6)

1533 (2) LAGHLPAVWRCVRRYTPRLERHAAT (1) 1589

1838 GEPLDLSSDLADLTLAVVGEAAYG 1882 (2)

     VDFRTTDEQQDGGRPADPSAPGPALVAAVRECFDCLDVNKTTMYGPLK (0)

2710 MIWPGLTPLWRWMAKHLPDAAQTRHMR (0) 2737

     VADVSRQLMAQWQAAKAKTAAAADTAGATAASGAGAEAGAGVGVGAGAQAKPGGGGAVQA

     FVEVGGGISSSSFMASLLEGRRGAAKEEERLTDLQ (0)

3663 IVAQCLTFLLAGFETTAATISFTAFCLATHPEAQARLLAE 3782

     VDEHFARQAAAEQQQQGQQQREGDDALPE (0) 

4526 LPYLDAVLKESMRLYPAGSALIRKSPQPLDLGRDGLVIPG (2) 4645

     NTFVCLATHAV

4956 MHDPAIWPEPEAFRPERFLPEGSSSLGPMVGGAAASAPAGGGADAAAAAWVPFGMGPRM 5132

5133 CVGSKFATM 5153 (0)

5425 VSKAVLLQIYRRFTFELHPKQ (0) 5484

     VLPLRTRTALTHAPRDGIWVVVKAR* 5818
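The numbers in parentheses after each exon, and the "SHOULD BE phase 2" note on the exon 4 line, read as intron phases. Assuming the usual convention (0 = intron between codons, 1 = after the first base of a codon, 2 = after the second), each intron's phase follows from the running total of coding nucleotides. The phase also tells you how many bases at the start of the next exon finish the split codon. Below is a minimal Python sketch with made-up exon lengths. The coordinate lines here bracket whole codons, so their spans are not the full exon lengths this calculation needs.

```python
from itertools import accumulate
from typing import List


def intron_phases(exon_lengths_nt: List[int]) -> List[int]:
    """Phase of the intron after each exon except the last.

    Exon lengths must be coding nucleotides only (no UTR), in transcript order.
    """
    totals = list(accumulate(exon_lengths_nt))
    return [total % 3 for total in totals[:-1]]


def bases_to_finish_codon(upstream_phase: int) -> int:
    """Leading bases of an exon that complete the codon split by the upstream intron."""
    return (3 - upstream_phase) % 3


# Hypothetical exon lengths, for illustration only.
print(intron_phases([42, 61, 95]))  # [0, 1]
print([bases_to_finish_codon(p) for p in (0, 1, 2)])  # [0, 2, 1]
```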

 

Name:e_gwW.23.77.1

Protein ID:118428

Location:Chlre3/scaffold_23:1014183-1020804

Probable GC boundary at exon 4 DLIR = AGGC
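One reading of this note is that the exon 4 donor is GC rather than the usual GT. With phase 2, the first two bases of the arginine codon (AG) end the exon and the intron starts GC, which gives AGGC across the boundary. Most spliceosomal introns start GT and end AG, and GC donors are a known minor class. Below is a small Python sketch for checking the dinucleotides at a proposed boundary. It assumes a plus-strand sequence and 0-based, end-exclusive intron coordinates. The example sequence is made up to mirror the DLIR case and is not the real scaffold_23 sequence.

```python
def donor_type(genomic: str, intron_start: int) -> str:
    """Classify the 5' splice site by the first two intron bases.

    Returns 'GT' (canonical), 'GC' (minor class) or 'other'.
    `intron_start` is the 0-based index of the first intron base.
    """
    pair = genomic[intron_start:intron_start + 2].upper()
    return pair if pair in ("GT", "GC") else "other"


def acceptor_is_ag(genomic: str, intron_end: int) -> bool:
    """True if the intron ends in AG; `intron_end` is 0-based and exclusive."""
    return genomic[intron_end - 2:intron_end].upper() == "AG"


# Made-up example: GAC CTG ATC AG | GC...AG | G GCC  ->  D L I R A
exon_a, intron, exon_b = "GACCTGATCAG", "GCGAGTTTCAG", "GGCC"
seq = exon_a + intron + exon_b
start, end = len(exon_a), len(exon_a) + len(intron)
print(donor_type(seq, start), acceptor_is_ag(seq, end))  # GC True
```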

1014183 MELVSGLALAGVALFIL (0) 1014233

1014336 GFIWAGFNPIERYLSPLRRFP (1) 1014398

1014778 GPAPLPFLGNLVSVATRDLTAYLADCRQAYGG (0) 1014873

1015176 IWLGNQPWVCVADPDLIR 1015229 (GC BOUNDARY?)

1015524 RVAYRVLSRPFSHTDSIHLLAGEQWEVDCNTLVFLK (2) 1015631

1015978 NGPTWRLARRAFESSIIHPQS (2) 1016040

1016491 LAGHLPAVWRCVRRYTPRLERHAAT (1) 1016565

1016769 GEPLDLSSDLADLTLAVVGEAAYG (2) 1016840

1017260 VDFRTTDEQQDGGRPADPSAPGPALVAAVRECFDCLDVNKTTMYGPLK (0) 1017403

1017671 MIWPGLTPLWRWMAKHLPDAAQTRHMR (0) 1017751

1018161 VADVSRQLMAQWQAAKAKTAAAADTAGATAASGAGAEAGAGVGVGAGAQAKPGGGGAVQA

        FVEVGGGISSSSFMASLLEGRRGAAKEEERLTDLQ (0) 1018451

1018621 IVAQCLTFLLAGFETTAATISFTAFCLATHPEAQARLLAE

        VDEHFARQAAAEQQQQGQQQREGDDALPE (0) 1018827

1019484 LPYLDAVLKESMRLYPAGSALIRKSPQPLDLGRDGLVIPG (2) 1019603

1019881 NTFVCLATHAVMHDPAIWPEPEAFRPERFLPEGSSSLGPMVGGAAASAPA

        GGGADAAAAAWVPFGMGPRMCVGSKFATM (0) 1020117

1020380 VSKAVLLQIYRRFTFELHPKQ (0) 1020442

1020727 VLPLRTRTALTHAPRDGIWVVVKAR* 1020804

 

>CYP744C1 C_1370013 43% to 744A2

2459 MQLTWLGWAPVTRWRLRNIP (1) 2400

1885 GPFALPFLGHLPAISARDLVHFCHDVARQYGP (0) 1787

1503 VWVAARPWIVVSDPVAARKIAYR (2) 1423

1222 SLARPSTVASFTHALVGEPRQVDDESIFWNR (2) 1142

 784 GPAWKASRRAFETSVLRPDRL 722

 721 AAHMPAVRRCTERFLARLAPYADGSTAVDMKDEYGVIALAITGEVAY  (1)

     VSFWPSDEDAALLAAPTGGSGAATSSSSSSSSSSKSPSSALVRACHECMACFELPLATMYLPLQ (0)

     MLLPALRPLWLALAAALPDAAQRRHMEARQAVADVSRRLMREWQQQ (0)

     AAARANDSGGDGLLLKDQTPVVNGGSSSSGSGISSSSFLAAMLKDQTGSNTACASSSGTDGG (0)

     VISQGLSFILAGYDTTGTTLALTTFLLAHNPTTQE (2)

     KLRAELVENRELLDSADGLAQ (0)

     LPYLDAVLKESQRLHPAVGHFWRDATSDIALPEMGGLVIPK (2)

 508 GSFVSISIYNMHRDPAHWKEPERFIPERFLQ (1) 603

 905 ATGGALGPTDPGAYVPFGSGPRMCVGYKMAIM (0)

1539 VVKSVLAGLLLRYRVALHPRQPLPLRLKTGLTLEPADG 1652

     VWVTLQPLLLPGAK*

 

Name:fgenesh2_pg.C_scaffold_39000151

Protein ID:177201

Location:Chlre3/scaffold_39:932071-938361

932071 MQLTWLGWAPVTRWRLRNIP (1) 932130

932645 GPFALPFLGHLPAISARDLVHFCHDVARQYGP (0) 932740

933039 VWVAARPWIVVSDPVAARKIAYR (2) 933107

933305 SLARPSTVASFTHALVGEPRQVDDESIFWNR (2) 933397

933746 GPAWKASRRAFETSVLRPDRL

       AAHMPAVRRCTERFLARLAPYADGSTAVDMKDEYGVIALAITGEVAY (1) 933949

934140 VSFWPSDEDAALLAAPTGGSGAATSSSSSSSSSS

       KSPSSALVRACHECMACFELPLATMYLPLQ (0) 934331

935234 MLLPALRPLWLALAAALPDAAQRRHMEARQAVADVSRRLMREWQQQ (0) 935371

935723 AAARANDSGGDGLLLKDQTPVVNGGSSSSGSGGI

       SSSSFLAAMLKDQTGSNTACASSSGTDGG (0) 935911

936216 VISQGLSFILAGYDTTGTTLALTTFLLAHNPTTQE (2) 936320

936586 KLRAELVENRELLDSADGLAQ (0) 936648

936914 LPYLDAVLKESQRLHPAVGHFWRDATSDIALPEMGGLVIPK (2) 937036

937175 GSFVSISIYNMHRDPAHWKEPERFIPERFLQ (1) 937267

937572 ATGGALGPTDPGAYVPFGSGPRMCVGYKMAIM (0) 937667

938203 VVKSVLAGLLLRYRVALHPRQPLPLRLKTGLTLEPADGVWVTLQPLLLPGAK* 938361

 

>CYP745A1 C_1860018

AV623700 N-term 31% to CYP735A4 rice, 28% to CYP97A4 rice

similar to CYP97 and CYP72 clans

MASSSSPLEELLAFAGVKDGTISSPRLALVVLGAALAAYALVFAVINVVDYIRIARGLSAIPSAPGGVPLLGHVIPMLT

CVSQNKGAWDIMEDWMDAKGPIVKYNIAGTQGVAVRDPKAMKRIFQTGYKLYEKDLKLSYRPFLPILGTGLVTS

DGALWQKQRMLMGPALRVDVLDDIIR

IAKKAIDRLCEKLSHHAGKGDIVDIEEEFRLLTLQ (0)

VIGEAVLSLGPEECDR (0)

VFPQLYLPV

MNEANRRVLRPYRMYLPTPEWFRFSSRMGQLNGFLIDLFRRRWQARQAAAAAAQGEGSSS

SKPKPADILDRIMEAIE ESGAKWDAALETQLCYEVKTFLAGHETSAAMLTWSTLELAAHSQAADK (0)

VVEEARAAFGPRGESEAGRRAVDEMIYTLAVLK (0)

ECLQLRLPVIMSE (0?) this may be wrong; there is a seq gap that may contain the true exon

AEDDPQGLLGYPLPRGTMVACHLQ (0)

GTHRLYESPDEFRPDRFMPGGEYDQFDDADRAYMFLPFIQ (0)

GPRNCLGQHLALLEARVVLGLLHARFSFKPAPSVHPDPASLFMRHPTVIPVGPIRGLKVLVEQRK*

 

Name:Chlre2_kg.scaffold_74000010

Protein ID:154128

Location:Chlre3/scaffold_74:79791-84023

Revised the EXXR exon

This seq is most like CYP97 or CYP746 sequences.

It clusters with the 72 clan or the 97 clan and these two

cluster with each other.

84023 MASSSSPLEELLAFAGVKDGTISSPRLALVVLGAALAAYALVFAVINVVD 83874

83873 YIRIARGLSAIPSAPGGVPLLGHVIPMLTCVSQNKGAWDIMEDWMDAKGP 83724

83723 IVKYNIAGTQGVAVRDPKAMKRIFQTGYKLYEKDLKLSYRPFLPILGTGL 83574

83573 VTSDGALWQKQRMLMGPALRVDVLDDIIRIAKKAIDRLCEKLSHHAGKGD 83424

83423 IVDIEEEFRLLTLQ (0) 83382

83181 VIGEAVLSLGPEECDR (0) 83134

82841 VFPQLYLPVMNEANRRVLRPYRMYLPTPEWFRFSSRMGQLN 82719

82718 GFLIDLFRRRWQARQAAAAAAQGEGSSSSKPKPADILDRIMEAIE (0) 82584

82327 ESGAKWDAALETQLCYEVKTFL 82262

82261 LAGHETSAAMLTWSTLELAAHSQAADK (0) 82181

81879 VVEEARAAFGPRGESEAGRRAVDEMIYTLAVLK (0) 81781

81308 EGLRKYSVVPVVTRVL (0) 81263

80886 AEDDPQGLLGYPLPRGTMVACHLQ (0) 80815

80411 GTHRLYESPDEFRPDRFMPGGEYDQFDDADRAYMFLPFIQ (0) 80292

79988 GPRNCLGQHLALLEARVVLGLLHARFSFKPAPSVHPDPASLFMRHPTV 79845

79844 IPVGPIRGLKVLVEQRK* 79791

 

>CYP746A1 C_28140001 = C_250032 C-helix exon duplication

This is a bacteria-related seq, like CYP252A1, CYP197A1 and CYP208A1

N-term is probably in a seq gap.  C-term runs off the end

scaf 2814 is a repeat of the C-helix exon

39% to CYP252A1 from Streptomyces peucetius,

but not bacterial because it has introns.

MLALAGGLQSMLQVSSPLVTHKITYGSL (0)

RLSSPPPPAFPAGPSGDQTLPLLTDPLRFLTDAT

(SEQ GAP HERE)

31584 GNGLLVSDGPVWQRQRRLSNPAFRRAAV 31495

      EAYGGAMVAATEDMMRRVWGPA (1)

      GGTRDVYADFNELTLQVTLEALFGF

      SEDAAQIVAAVEKAFTFFTQR (2)

      AATGFVIPEWLPTWDNLEFAAAVQQLDRVVYGMINRRRQELAAAF (1?)

30612 AGVPSDLLTSLLLARDEDGSGMSDQALRDELMTLLVAGQ (0) 30502

30091 ETSAILLGWASALLAAHPEVQAAAAAEVAAVCGGPEAGTPTPAS (2)

29766 VRHMPYLESVVLETLRLYSPAYMVGRCARRDAALGPYVLPAG

      TTVLVSPYVMHRDPEVWEEPEVFRPERWQELQRR 29548

29296 EGYSGYMGLMSNLGPNGAYLPFGGGPRNC 29261

      (SEQ GAP HERE)

      KPLLTLRPEAVVLRISPRRQ*

 

Name:e_gwW.1.470.1

Protein ID:116510

Location:Chlre3/scaffold_1:3570907-3575049

50% to CYP746B1 Physcomitrella patens (moss)

top 26 hits in the nr section of GenBank are all bacterial

followed by CYP97A of Glycine max

3575049 MLALAGGLQSMLQVSSPLVTHKITYGSL (0) 3574966

3574076 RLSSPPPPAFPAGPSGDQTLPLLTDPLRFLTDAT 3573975

3573974 ATYGPVVGLLLGGERVALVTGRAEARA

        VLVEAAGEVYVKEGTAFFPGSSLA (1) 3573822

3573413 GNGLLVSDGPVWQRQRRLSNPAFRRAAV

        EAYGGAMVAATEDMMRRVWGPA (1) 3573264

3573081 GGTRDVYADFNELTLQVTLEALFGF () 3573001

3572874 SEDAAQIVAAVEKAFTFFTQR (2) 3572812

3572660 AATGFVIPEWLPTWDNLEFAAAVQQLDRVVYGMINRRRQELAAAF (1?) 3572496

3572453 AGVPSDLLTSLLLARDEDGSGMSDQALRDELMTLLVAGQ (0) 3572289

3571932 ETSAILLGWASALLAAHPEVQAAAAAEVAAVCGGPEAGTPTPAS (2) 3571801

3571661 VRHMPYLESVVLETLRLYSPAYMVGRCARRDAALGPYVLPAG

        TTVLVSPYVMHRDPEVWEEPEVFRPERWQELQRS (2) 3571326

3571149 NLGPNGAYLPFGGGPRNC 3571090

3571089 IGTGFAMMEALLVLAALLQRYSLALPPAAGSSSGGAFPKP

3570969 KPLLTLRPEAVVLRISPRRQ* 3570907

 

>CYP747A1 C_900050 41% to CYP743B2 C-term

EXXR to PERF IN SEQ GAP

I-HELIX LOCATED 28000 BP AWAY ON A SMALL FRAGMENT (MISASSEMBLY?)

FIRST EXON IS A BEST GUESS

352943 MKSALSAFVRDSGDQVAETGAPTATRPIPGPAPLSLEALK 352824 (0)

352717 DVSVIFFEGLHVAQLKFSEKYGPVCR 352640 (2)

352462 FANPASLNGATSWVFINSPENIQHVCATNVRNYS 352361 RRYLPDIYT (2)

352115 YVTHGKGILGSQ 352080 (0)

351877 DEYNARHRRLCSGPFRNKWQLQRFSSVVVER 351785 (2)

351348 SKRLVDIFSAAAAADPSGAFTTDVATQTQRLTLDVVGLVAFSHDFACVEQVQR 351190 (2)

350690 DLAGATAGDGRSGVLQDRVLWAVNTFGEVLAQVFITPLPLLK 350565 (0)

350317 AMDRLGAPHLRQLGEAVSVMRAAMLDVIA 350231 (0)

378450 ATEDDGRGLSDEELWEDVHDIMGAGHETTATTTAALLYCISAHPHVRQRLEEELDAVLAGG 378271 (0)

 

 

348405 (0) REARQHRFQWLPFGAGPRMCLGASFAQ (0) 348325

348100 MSVALMAATLLQRFRFTPLAPCSPLIPVGYDITMNFGPSGGLRMRVAPRQRGQQQ* 347933

 

Name:e_gwH.96.3.1

Protein ID:108849

Location:Chlre3/scaffold_96:178714-184286

Model only covers I-helix to heme region

This seq is now complete, 38% to 97A6 in C-term half

178714 MKSALSAFVRDSGDQVAETGAPTATRPIPGPAPLSLEALK (0) 178833

178940 DVSVIFFEGLHVAQLKFSEKYGPVCR (2) 179017

179195 FANPASLNGATSWVFINSPENIQHVCATNVRNYSRRYLPDIYT (2) 179323

179686 YVTHGKGILGSQ (0) 179721

179924 DEYNARHRRLCSGPFRNKWQLQRFSSVVVER (2) 180016

180453 SKRLVDIFSAAAAADPSGAFTTDVATQTQRLTLDVVGLVAFSHDFACVEQVQR (2) 180611

181108 RDLAGATAGDGRSGVLQDRVLWAVNTFGEVLAQVFITPLPLLK (0) 181236

181484 AMDRLGAPHLRQLGEAVSVMRAAMLDVIA (0) 181570

182398 ATEDDGRGLSDEELWEDVHDIMGAGHETTATTTAALLYCISAHPHVRQRL

       EEELDAVLA (1) 182574

182854 DGEAPTYESLERMPYLQ (0) 182904

183327 ACAKEVMRLYPAIPVFPREAARPDVLPTGHGVAAGDVVFMSS

       YALGRSEAVWGPDVLEFDPDR (2) 183515

183802 FSPEREARQHRFQWLPFGAGPRMCLGASFAQ (0) 183895

184119 MSVALMAATLLQRFRFTPLAPCSPLIPVGYDITMNFGPSGGLRMRVAPRQRGQQQ* 184286

 

>CYP748A1 C_1820019 about 40% to C-term half of 741A1

>C_1820019

N-terminal missing (about 65 aa). This seq begins at the KYG motif (TYG).

There is a seq gap before this seq, which is probably where the true

N-terminal is located.

Name:e_gwW.9.168.1

Protein ID:114278

Location:Chlre3/scaffold_9:2353835-2358515

2353322 MSSALDELRFYGTLAATLLGPRYDLGRVPGPPGHPLLGNITAVMRPDYHVQ (0) 2353474

2353835 MLEWANTYGGIFKFSLGFQPVVVVSDPAVAVQVLGRAPGRAIPRKCVGYKFFDL (0) 2353996

2354237 ATNASGAHSFFTTSDEGQWAAVRKAAAAAFSSANVK (2) 2354344

2354560 KAFPIALRHLLL (0) 2354595

2355565 LSLLHVFVEALFGVTPEDFP (1) 2355624

2355926 GRQVAADMNLVLEEANSRLKVPLSGLARAVTQPV (0) 2356030

2356150 VGWREGGTGHVSRGFGARNSRAWGSGEKEWTEENWEPR (0) 2356263

2356454 AVTDLWACLGRVRHPRT (1) 2356504

2356839 GELLGRQGLVPEIGALMMAGFDTSSHSVAWALFALAANPEAQQRVRQELDGRGLLRRP (1) 2357012

2357269 GTAAPPRLPVLDDLPQLPYLNACIDEAMRMYPVAATASVR (2) 2357388

2357569 EVTEPTRVGDFVIPPGVIVWPMLYALHNSVHNWDQPDVFKPERWLQSNAGGSS (1) 2357727

2357950 GKGGGGGKRYMPFSDGMKSCLGQ (0) 2358018

2358133 ALGLMEVRTALVVLLGR (2) 2358183

2358396 YAFALDPGHGGEAAVRRSMIMSLTLKIRGGLRLVATPLG* 2358515

 

>volvox CYP748A1 79% to Chlamydomonas 748A1

ABSY209135.g1 exon 1

ABSY189778.b1 exon 2a

ABSY140806.b1 exon 2b

ABSY42643.g1 exon 3

ABSY86219.x1 exon 4

ABSY93957.g1 exons 5, 6

 

ABSY112787.y1 exons 8, 9 fused

ABSY106164.g1 exons 10, 11

506 MSSSWEELCFYGHLASTLFSPKYDLARVPGPRGSFGLGNITAVMRPDYHVQ (0) 348

203 MLEWANQYGGVYKFSLGFQWVVVVSDPRIAVQ (0) 298

289 VLGRGPDSIPRKCVGYKFFDL (0) 248

 33 ATNAAGAHSFFTTSDETQWAAVRKAAAAAFSSANVR (2) 140

394 KAFPIALRHSRL (0)

716 LSMLHVFMEALFGIRPEDFP (1) 657

290 GRQVAADMNLVLEEANERLKVPLRKVAMALVRPT (0) 189

 

623 GVTDLWACLGRVRHPVT 573

    GAPLGRDALVPEIGALMMAGFDTSSHSVAWVLFALAAHPGAQLRCRQELAARGLVAEGA (1)

984 GSAQRGPTLDDLIQLPYLNAVIDETMRMSPVAATASVR (2) 871

357 EVTQPTRVGDYVIPPGVIVWPMLYALHNAVHNWDRPDEFLPERWLPGSGAA (1) 199

 

2357672 AGCCAGACGTCTTCAAGCCCGAGCGATGGCTGCAGAGCAACGCCGGCGGCAGCAGCAGTGACAGCGGTGGCAGCAGCAGCAAGGGCGGCAACGAGGAAGC

2357772 GGGGGTGGCCGGTGCCGGTGGCGGTGGCGCGGGAGGCGCTCGTTCGGCCGCGGCTAACGACGAGGGCAGCGGCGGCGCTGCGGGTGGCTTGGGCGGTGGC

2357872 GGCAGTGGCGCCAGCAGCAGGAGCGGCTCCTCCGCCGCCCTGGGTGCGGCGGCGGCGGCGGCGGCAGACGGCGGCGGAGGCAAGGGCGGCGGCGGCGGCA

2357972 AGCGCTACATGCCGTTCAGTGACGGCATGAAGAGCTGCCTGGGGCAGGTGGGTGGGTGGGCTCTGGGGGTATGTCGTGGTTAGATTCCGCCCCTCACCTT

2358072 TCCCTCCCTTCTCCCGCGCGAAACTTCCCTCATGCTTTCCGCCCTCCTCCTCCCGCCGCAGGCTCTGGGGCTTATGGAGGTGCGCACCGCACTGGTGGTG

2358172 CTGCTGGGCAGGTGCGTGCGTGCGGCGCCGGGGCAGGGGTGGGCGTGGGGGCATGAGGGGGAATGGCCTCAGTGAGATGACGC

 

 

2352872 GAAGCAGCGCTTAGTGGTGGTGGCGGGGGAGCTGGGCGGAGCCGCGACCCACGGAGGCGCCGGCCGGCGACTGGAGCACAACGCTTCGCTTCGGCGCTGT

2352972 GCCACTGCTGCAACACAACTGAACATAGGATTCACAGCACTGTTGCTACTGGACGCCACGTCGAGGCTATCGCAGCTATCCCAGAGGACCGCCGCCGGAG

2353072 CCGGGAGGCCACCCCTCAACGCACGCCGCCGTGTGCAGCCAGCCAGTCGGTCCTTTGCCGCCGGCGCAATCAGCACCACCAGCAGCGCAAACAGCCGGCA

2353172 CACACACAGACACCGTACAGCAGCTAACTTGCCAGCCCAACTGCATAGCAGCAGCTCTCCGCCTTTCTACCCCACATCACCCACCCACGCACCCAAGCCT

2353272 CTCCAAGCCACCGCTCCCCTCACCTCTCCCGCTGCAACACACGCCGCACCATGTCCTCGGCCCTGGATGAGCTGCGCTTCTACGGCACCCTGGCCGCCAC
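The genomic blocks here can be checked against the protein listing. Translating from the ATG in the 2353272 line with the standard genetic code gives MSSALDELRFYGTLAA starting at 2353322, which matches the start of the CYP748A1 listing (2353322 MSSALDELRFYGT...). Below is a self-contained Python sketch, assuming the standard code (NCBI table 1) applies to these nuclear genes. For genes listed with descending coordinates, reverse-complement the region first.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna: str) -> str:
    """Minus-strand sequence, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str, frame: int = 0) -> str:
    """Translate complete codons from `frame`; codons with N or other symbols become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


line = ("CTCCAAGCCACCGCTCCCCTCACCTCTCCCGCTGCAACACACGCCGCACC"
        "ATGTCCTCGGCCCTGGATGAGCTGCGCTTCTACGGCACCCTGGCCGCCAC")
start = line.find("ATG")
print(2353272 + start, translate(line[start:]))  # 2353322 MSSALDELRFYGTLAA
```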

 

 

2353372 GCTGCTGGGCCCGCGCTACGACCTGGGCCGCGTGCCGGGGCCGCCCGGGCACCCGCTGCTGGGCAACATCACCGCAGTCATGAGGCCCGACTACCACGTG

2353472 CAGGTGTGCTGACGACCGGCGGGGCGGATGGGGGTTGGGCGGGGGGCAAGGGGAGGTGGGGGACTATGGCGCGGAGGAGTTTGGGTGGGGAAGGGATTTG

2353572 GTATTGTGTGGGGTGGGTGGGGTGGGGTGGGGCAGAGGGTTCAGGGGCTGGGTCGGTGTCGAGGCGGCAAGGGGTAGGATGATCATGACCCGGGGGGATA

2353672 GGAGCGTGTGCGGCTCAGCTGCTGCTGGCCGCGCGCCACCACAGCTGCCGCGGCACTACCCTATGCGCCGCTCCGCACCAGGAACAGCACCTCCCCCCAC

2353772 CGCACCGCATCGTGTGTCACGCCCACGCACCTGACTGCTGCTGCCCGGCCTGCTGCCCGCCAGATGCTGGAGTGGGCCAACACCTACGGCGGCATCTTCA

2353872 AGTTCAGCCTGGGCTTCCAGCCGGTGGTGGTGGTGTCCGACCCGGCGGTGGCGGTGCAGGTGCTGGGCCGCGCGCCGGGCCGAGCCATCCCGCGCAAGTG

 

2354372 GAGGGGACTCCCCACGCTTGCGGCACCCTTGCGCACGTGTGTGACTTGCCTAGCATCCATCGATCCCCGGCTCAAGCGCCTGATATGCCTCCACCGCTTC

2354472 TGTTCAACCGCCCCGTTGTAATCTCCTGCTACTCATGCTCCCTCTCCCTCCCGCTCGCTGCTCCGGATGGCCCTAACGCCGTCGCAGGAAGGCGTTCCCC

2354572 ATCGCGCTGCGTCACTTGCTGCTGGTGGCGGAGTCCCTGGACCCAGCCGGCCCCCACACGCCCGGCAACCCCTACCTCGACCTCACACACCACTCCCAAC

2354672 AACAGCACCAACAACACCAACGGCACCCGCAAATGCAGAAGGGTGACGGTGCCGCCGCCGCACGCATGAGTGGCGGCGACGGCAGCGAAGCGGCGCCTGC

2354772 AGGCAAGGGCGACAGCAGCAGTGGCGCTGCTAGCAGCAGGTGGCTGTGGCGGACGCCCGACCTCAACTGGATGCGGAGCGGCCTCAGCCTCGGGTTCCGC

2354872 CGGCGCAGCCGCAGCCGCCCCGGCAACAGCACCGCCGCCGCCAAGCCTGCTTCTACGCCGCCAGGGAGCGCTACTACTTCCACTGGTGCTACAGCCAACG

 

>ABSY86219.x1  CHROMAT_FILE: ABSY86219.x1 PHD_FILE:

           ABSY86219.x1.phd.1 CHEM: term DYE: ET TIME: Wed Sep 10

           11:54:44 2003

          Length = 781

 

Query: 180 RRRKAFPIALRHLLLVAESLDPAGPHTPGNPYLDLT 287

           RRRKAFPIALRH  LVA  LDPA    P NPY+ LT

Sbjct: 382 RRRKAFPIALRHSRLVAAGLDPAVQPDPANPYIQLT 489
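The percent identity figures in these notes (for example "79% to Chlamydomonas 748A1") come from pairwise alignments like this one. In the middle line of a BLAST pairwise alignment, identical residues appear as letters and conservative substitutions as '+'. Below is a minimal Python sketch that recomputes identity from the two aligned rows. It assumes equal-length rows with '-' for gaps and uses the alignment length as the denominator, which is one of several conventions in use. `identity` is a hypothetical helper.

```python
from typing import Tuple


def identity(query: str, subject: str) -> Tuple[int, int, float]:
    """Return (identical positions, alignment length, percent identity)."""
    if not query or len(query) != len(subject):
        raise ValueError("aligned rows must be non-empty and the same length")
    same = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    length = len(query)
    return same, length, 100.0 * same / length


q = "RRRKAFPIALRHLLLVAESLDPAGPHTPGNPYLDLT"
s = "RRRKAFPIALRHSRLVAAGLDPAVQPDPANPYIQLT"
same, length, pct = identity(q, s)
print(f"{same}/{length} = {pct:.0f}%")  # 25/36 = 69%
```

The count of 25 matches the 25 letters in the middle line of the alignment.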

 

2354972 CCACACATGCGGCAGACGCAGCTCCCAGTGCCAGCAGCAGCTTTGTGGACCTGGGCAGCAGCTGCGTAGGTGCTGACAGCAGTGCGAGCCTCGCCTCTCG

2355072 GTCGTCCTCGCCCTCGGCCACCGCGTCTGCGCCCTGCTCCTGCGGCCGCTGCGGCGCAAACAGCCCGCGCCGCGCCGTCGCCGCCGCCACTGCCACTGCC

2355172 GACACTAAGGGTGGCGGCGCAGAGCGGACGGCCGCGGCGCCGGCGGGGCCGGCGGAGGCGGAGGAGCTTGCGGCGGGCGGCGTGGGAGCGGGCGCGCCGG

2355272 GCGCTGCCGCTGGCGGGCGCTCCATCCACAGCCACCCCTTTGACTGCGGCACCGACGAAACGAGCAGTGTAGACGAGACGCCTCCGCACGCCACGGCTGC

2355372 ACCCGCCGCCGCCACCTGCACCGCCCCAGCCGGCGCTGGCAGCGGCAGCGGCAGCGCCACGGATGCCGGCACCAGCGCTAGCGGCACCATCGACGCGGAA

2355472 AGTAGCACCGGCGCCGGCACTAGCGGCAACCCTAGCGGCGGCGGCACAGGCGGCGGCCCTGCTGCTGTGGTGGACATCCAGGAACACTTGGAGCTGAGCC

 

none

 

2355572 TGCTGCATGTGTTTGTGGAGGCACTGTTCGGAGTGACGCCGGAGGACTTCCCGGGTAGGTGCCGGGGACGGACGGAGGGGGAACAAAGAGGCGAGGCAAG

2355672 GCGAGGCGGTCTGGGCAGACGGGAGGGAGGTTGGTGCCAATTGGCGCCATTCAGTGCTTGCTACTCTGCTGTTTCTATCTCGCCAGTATGTGCTAGCGCA

2355772 CTGTCTGCTGACTGGGCACTGACACGTCACCTGGCTGCTCCCTCCGACCCATATCGCCTTCGCACCTCACACGCTCCCCGCCCACCACCTCCCCGCCCGC

2355872 CTGCCCCCTGCTCCTCCCTCCATTCCCCCCTCTTGTCCCGCCCCTCCCGCCCTCCCAGGCCGCCAGGTGGCTGCCGACATGAACCTGGTGCTGGAGGAGG

2355972 CCAACAGCCGCCTCAAGGTGCCGCTCAGCGGGCTGGCAAGAGCCGTCACACAGCCGGTGGTGGGTGCGGGGCTGGGGCGGTTATGTGCCCGAGCGCAATG

2356072 GAGTCGGTCCCAATAATAGTCAAGGAGTCGTCGCGGGACTGGCCATGGGGCGGGGCGGACTGGGGCCAATTGGGTGAGGTCGGGTGGAGGGAGGGAGGGA

 

>ABSY93957.g1  CHROMAT_FILE: ABSY93957.g1 PHD_FILE:

           ABSY93957.g1.phd.1 CHEM: term DYE: big TIME: Sun Sep 14

           12:58:11 2003

          Length = 1195

Query: 3   LHVFVEALFGVTPEDFP 53

           LHVF+EALFG+ PEDFP

Sbjct: 707 LHVFMEALFGIRPEDFP 657

 

Query: 358 GRQVAADMNLVLEEANSRLKVPLSGLARAVTQPVVGAGLG 477

           GRQVAADMNLVLEEAN RLKVPL  +A A+ +P V  G G

Sbjct: 290 GRQVAADMNLVLEEANERLKVPLRKVAMALVRPTVRRGGG 171

 

2356172 CAGGACACGTGAGTCGGGGATTTGGAGCCAGGAACTCAAGGGCTTGGGGAAGCGGGGAAAAGGAATGGACCGAGGAGAACTGGGAACCGAGGGTTACGCA

2356272 GGCACGCCGCCGCAACCCCCAGTCTGACGTGCGACGCTGCTAGCCGCCACCCCTCCTCCACGCACGCGCACACGCCCAACCCCACACAGGCCCAGGCCCG

2356372 CATCCGCGCCGCCCAGGTGCGGCTGGCTGCGGTGTACGGCAGCCTGTACGACGTCATCCGGGCCCGTGGGCCGCAGCCCGAGGCCGTGACAGACCTGTGG

2356472 GCGTGCCTGGGCCGCGTGCGACACCCCAGGACAGGTGGGGGGGCGGTTGTGGCGTGTGGGTTGACGCGGGTACGTGGGGACACAGGGAGGGGGTGGGGGC

2356572 ACTGCTGGGTGGGTGTGCGCGCGGCACGCCGCCGCGGCCCCGAGTTACTGACTCTGGAGGAAACCATGCTGCAACTCACTTGCCCTGCCGCATGGACCGC

2356672 GGCCCGCAGCACCTCCACCGCGCCTGCACCAGACCTCCCCCACCTTTGCCCTAACCCACCCTTTTCTTCCTTATCCAGCCACCAATCACGGACTTCGCTC

 

>ABSY112787.y1  CHROMAT_FILE: ABSY112787.y1 PHD_FILE:

           ABSY112787.y1.phd.1 CHEM: term DYE: ET TIME: Wed Sep 10

           10:18:50 2003

          Length = 799

 

Query: 277 PEAVTDLWACLGRVRHPRTG 336

           PE VTDLWACLGRVRHP TG

Sbjct: 629 PEGVTDLWACLGRVRHPVTG 570

 

 

 

 

>CYP-un1Chlre pseudogene 1, family not identified, C_140094

half of gene, very different

63125 (0) HAALLPRLLCRPELSRAEAVANCHSCLLAGYETTAHTLACCLLHLGQRPQ 62976

VGRGRERGGRELARMEVKRGGDRF (2)

62528 GMALLGAVIRETLRVNPPVIGLPRVVSAPGGITVRLPAGS (1?)  62412

 

61349 WDPTRTAAPAGAVGADGAAPSDPFAEARPFGIGPRACPAGSLSVVIVREALAALLTKYRWRL 61164

61163 YDEVGDRDWMSGAVSTPTMAFRPPLRVVFARVVEDGGESS* 61041

 

scaffold_48:305112-303028 no model

Name:estExt_fgenesh2_pg.C_480037

Protein ID:193769

Location:Chlre3/scaffold_48:289896-330340

Note: this is a very long gene model that contains the EXXR exon

but no other exons. It misses the heme signature sequence

and the I-helix motif.

305112 HAALLPRLLCRPELSRAEAVANCHSCLLAGYETTAHTLACCLLHLGQRPQ 304963

304962 VGRGRERGGRELARMEVKRGGDR  304894

304518 GMALLGAVIRETLRVNPPVIGLPRVVSAPGGITVRLPAGSS (1) 304396

303336 WDPTRTAAPAGAVGADGAAPSDPFAEARPFGIGPRACPAGSLSVVIVREA 303187

303186 LAALLTKYRWRLYDEVGDRDWMSGAVSTPTMAFRPPLRVVFARVVEDGGESS* 303028

 

volvox has no ortholog

 

$$$$$$$$$

 

>CYP767A1

Green = my predictions

Yellow = JGI predictions that work in BLAST

Cyan = motifs

Name:   fgenesh2_pg.C_scaffold_9000240

Protein ID:    169101

Location:       Chlre3/scaffold_9:1625885-1634209

Exon 13 is in a seq gap; use the older version of the seq here

fgenesh2_pg.C_scaffold_9000240 [Chlre3:169101]  similar to 741A1

C_340039 unnamed C-term P450 fragment PKG to heme

1625885 MDGWPPSSPGSIRLQTLQLHAVPPAEPSSSPFITGPPPT  (2) 1626001

1628184 LRSLLLPRYDLDSIPGPWPHALPLLGNMLSVLRPDFHRVLLRWADQYGGVVRIKFLWQ (0) 1628357

1628626 DSLLVTDPAALASICGRGEGACDKAAAIYTPIN (1) 1628724

1629069 AMCTPRGHVNLLTSPANDAWRAVRKAVAVSFSWNNIKNKFPIIR (2) 1629200

1629464 DRTSELVEWLRAEGPAASVDVDQAALRVTLDVIGL (0) 1629568

1629948 TAFGHDYGCVRLRQVPPEHLIRVLPRAFTEVMRRIANPLRALAPRLVKKGTK (1) 1630103

1630519 GLQAFRDFQAHMQQLLREVLDRGPPPPEDTDIGAQL (2) 1630636

1630701 EAQR (0) 1630712

1630736 PAITEERILSE (0) 1630768

1630963 IGILFVEGFETTGHTISWTLFNIATTP (1) 1631043

1631243 GVQEAVAAELGGLGLLVRPHAMGGR 1631317

1631318 GAARPLALEDLKRLPYLTACVKEAMRMYPVVSIMGRITQ (0) 1631434

1631650 HPTRVGKYLVPAGTPIGTALFAIHNTRHNWTDPLAFRPQRWM 1631775

1631776 GESSSERASGRASERARDSGR (2) 1631838

   4554 YMPFSEGPRSCVGQSLAKLEVMTVLATLLAHFRVDLAEE (0)  4646

1634099 MGGREGVHKRESTHLTLQTAGTRGIQMHLHPREDDP* 1634209

Note: most similar to animal CYP46, CYP24 and CYP4 sequences

34% to 741A1

The first exon is probably not right (too far away)

A short EAQR exon is required to join GAQL to PAIT; some revision may be

needed here (see volvox ortholog).

trace file 652853255 from PQRWM

walked down to 650266898

these two covered the ver 3 assembly to 1633054 with 100% matches

walked down to 337758911, which goes into the gap region in the assembly (100%)

used the very end of the assembly to search again and found

335096672.  This seq has the missing P450 exon seq

336483811 has the end of this exon

 

>Volvox ortholog assembled from BLAST searches for each exon; exons 7 and 8 were found

by comparing Chlamydomonas and volvox DNA in this region for matches.

Missing exon 1

ABSY171556.g1 exon  2 PKY…

ABSY46806.x3  exon  3 DGL… fused with exon 4

ABSY46806.x3  exon  4 MCT…

ABSY5198.y1   exon  5 DRT…

ABSY140583.g1 exon  6 SAF…

ABSY56673.x2  exon  7 GLT…

ABSY90166.y3, ABSY10903.x1, ABSY90166.y1, ABSY125944.g1 exon 8 PAI…

Missing exon 9

ABSY174072.y1 exon 10 GTQ…

Missing exon 11

ABSY225235.b1 exon 12 FMP…

ABSY176428.b2 exon 13 MGG…

270 (0) PKYDLDLIPGPWTHALPFIGNLLQFLRPDFHRVCLRWADKYGGIVR (2) 133

343 (2) IKFLWHDGLLVTDPPALAAICGRGEGAVDKAANIYSPIN  459

460     QMCTPHAYPNLLTSLADDRWRAVRKAIALSFAFGNIRKKFPLIR (2) 591

494 (2) DRTGELLEWLRGVGPLESVDVDQAALRVTLDVIGL (0) 598

    (0) SAFGHDYGCTRLQQVPYNHLLRVLPRAFTEVMRRIANPFRSFAPGLVKNGKK (1)

723     GLTSFKDFQRHMQELLGEIKARGPPARGDADIGAQLYRVLEAAR (0) 779

    (0) PAITDERILSE (0)

 

322 (1) GTQEAVAEELSSLGLLVRPKSEGGRSAARQLELDDLKRLRYLTACVKESMRMYPVVSIMGRWRMR (0) 516

 

702 (2) FMPFSEGPRSCVGQSLAKLEVMTVLAMLLANFRIELSDE (0) 818

744 (0) MGGREGVRQRESTHLTLQTRGTRGIRMHLHPRDQE* 851

 

$$$$$$

 

>CYP768A1 Chlre2_kg.scaffold_23000190 [Chlre3:149040]

this P450 model is upstream and covers the N-term up to the I-helix motif

2000 bp space between the N- and C-term parts

C_1530020 unnamed C-term P450 fragment PKG to end

Chlre2_kg.scaffold_23000191 [Chlre3:149041]

31% to 4Z1, 31% to 4B1 in C-term part

24% to CYP46 over most of the length

35% to Ciona 4V5 like seq C-term part

 

Name:   Chlre2_kg.scaffold_23000190

Protein ID:    149040

Location:       Chlre3/scaffold_23:1470852-1473965

Name:   Chlre2_kg.scaffold_23000191

Protein ID:    149041

Location:       Chlre3/scaffold_23:1476142-1477663

 

1470852 MPAAQLFKFLLKPQYDLAKLPQPPVADWVLGHVKHLLRK (1) 1470968

1471402 DYHRVILGWAKQYGRIFKLR (2) 1471461

1471649 ILNEWTVVITDPAAAAQVLATVPGRTHNYKHIDE (0) 1471750

1471901 VLGGPGKIS (2) 1471927

1472089 MFGTPDEVHWRNARKATAPAFSMAN (0) 1472163

1472706 VPDATALPGFDELASNILLLMAEANAQ (0) 1472786

1473064 VTDPLRAFFYFTPIAPLVSK (0) 1473123

1473316 HVARCRAALKQVVMFHGRTAARILAR (2) 1473393

1473587 PEPSPDNTLLWACLHRLRHPHTGRKLTPGQLHPE (1) 1473688

1473904 VGMYTAAGFDTTASTVGWCM (2) 1473963

1474237 YAASLWPEQQQAVAAELRAAGIFGPAAVVE (0) 1474326

1474621 ELAKLPRLNAFINE (0) 1474662

1475863 VMRMFPPTAVSAER (2) 1475904

1476109 LTPDEPVTIMGMTFPAK (0) 1476159

1476468 TVLWCITYGIHMSDANWEDAAKFK (2) 1476539

1476792 PERWLEDPRCAFAKSPGAGGAAAAPATAGGAEGPAAAIGGAAEEEPPNTA

        PRRFVPFGQGPKNCVGQ (0) 1476992

1477404 NFGITVVRAVVALLLRRYHVDLHPDMDTSPEGDKLGGGGGGGDGNSSGSGQAGG

        CRHSAEDTARLTHVAVITKLKKLRLVLQRRDD* 1477664

 

Volvox CYP768A1 ortholog

ABSY165990.g1 exons 1,2,3

ABSY147804.y1 exon 4 C-helix partial

ABSY193853.g1 exon 4 C-helix partial

ABSY111272.b1 exon 4 intact

ABSY75276.y2 exons 5,6 fused

ABSY73799.g1 exons 9,10

ABSY165990.b1 exons 14,15 fused, 16, 17

ABSY22806.b1 exons 18, 19

    MWDTLRFYYSTHGPLGAWTPAIVLLLNILGIALALAVTKFIGLYFA

258 PSYDLRKIPTPPVGDAILGHVKFLLRPDYHRVILAWTRKYGKIFRLR (2) 398

599 ILTQWTVVITDPAAAAQVLAVVPGRTHNYTLVDE (0) 700

986 GLGGPGKIS (2)

546 MFGTRDEAHWRNVRKATAPAFSMAN (0) 620

362 (0) VPDARALPGFDLLVPRILLLMAEANRQIVDPLWALWYRTPLAPLLSK (0) 222

 

221 PDPPSDNTLLWACLHRLRHHITGARLTPTQLHPE (1) 322

649 GGMYTTAGFDTTASTLGWCL (2) 705

 

 

 

957 (2) VSPDRPVAVGPFTLPPGVVLWPLVYGIHMSDANWDEPEAFR (2) 835

542 MERWLEDPRCAFARGE (1) 495

314 RGPGASGAPRRFLPFADGPKNCVGQ (0) 261

469 NFGLVVVRAVLALLLSRYRVALHGDM 546 no boundary after DM

822 (0) VAVVTKLSKLRLVMTPRD* 878

 

$$$$$$$

 

note: the next three sequences are partial, missing the N-term.

It is nearly impossible to assemble the N-term part without cDNA.

 

>CYP771A1  C_4150003 unnamed CYP97-like C-term P450 fragment

estExt_fgenesh2_pg.C_210032 [Chlre3:191092]

TIN347338.x1 CANNOT DETECT N-TERM HALF, EXXR TO PERF MISSING

I-helix present and heme signature present

Gray region is 39% to a Xenopus seq

EAASLWLLMALPVPNELLPG

YGTYEANVRRLDELVYDM

LVTMLLGGTDTSALTVAFAAWHLAAEPQLQAELRRE (0)

VLGVLGGRALGELRAEDVKAMPLLAAVVNETLRLHPPLAEITRVATQ

(SEQ GAP)

(0) PNAFLPFGVGSRSCIGRHFGLLSTQ (0)

LTLAALVARFEVLPPAPPAPTALDWSQSIVITSRSGVWLRLRPIRQ*

 

Name:estExt_fgenesh2_pg.C_210032

Protein ID:191092

Location:Chlre3/scaffold_21:297178-306479

302461 (0) IEAASLWLLMALPVPNELLPGYGTYEANVRRLDELVYDM (0) 302577

303433 LVTMLLGGTDTSALTVAFAAWHLAAEPQLQAELRRE (0) 303540

303889 VLGVLGGRALGELRAEDVKAMPLLAAVVNETLRLHPPLAEITRVATQ (0) 304029

 

305941 PNAFLPFGVGSRSCIGRHFGLLSTQ (0) 306015

306339 LTLAALVARFEVLPPAPPAPTALDWSQSIVITSRSGVWLRLRPIRQ* 306479

 

volvox matches

ABSY89114.y1

ABSY732.g2 I-helix

787 (0) NAALWLLLQLPIPDHLLPGYDKYMANIATLDEL (0) 689

278 (0) LVTMFFGGTDTSALALTLTAYHLAHCPEAQRAARAE (0) 385

 

$$$$$$

 

>CYP770A1 C_7970001 unnamed C-term P450 fragment

fgenesh2_pg.C_scaffold_15000041 [Chlre3:170931]

runs off end

NOTE: CANNOT FIND AN AG-GT BOUNDARY AT LAST EXON. 

THIS MIGHT HAVE A LONG INSERT IN IT AND NO INTRON

Very low sequence identity to other P450s

LLVSEGQQWRLMHALATPAF (C-helix)

34% to CYP714A2

GVALTLVGMGHENVSATAAWALLLLAAHPEQQQALYRELRH (2) (I-helix)

SRTAALLRLPYLDAVLRETLRLYPPVPMLSRQLMQ (0) (EXXR)

(0) DTTIGGVMLPKD (0)

5873 (0) VELVVSPYVLHRLPRLWGPHAACFQPERFMPPPPRP (?) 5766

5066 PPAAGGGCTEPAAAGPYLPFGAGPRACPGASFGSAEVKLLVAHVVMRYSLELLQPPPPSPR 4884

4643 (0) QLFVSLRPGPGVRVCFVPRHQQQVE* 4563

 

Name:fgenesh2_pg.C_scaffold_15000041

Protein ID:170931

Location:Chlre3/scaffold_15:453166-458216

39% to 746A1

454318 LLVSEGQQWRLMHALATPAF 454377

       KAELLERGAFAAALRGVMEEWHRRAVALLPLWRLQAA (0) possible exon like 97B6/97C3

455773 (0) GVALTLVGMGHENVSATAAWALLLLAAHPEQQQALYRELRQ (2) 455895

456058 GCGFPTSRFIQSHPSRTAALLRLPYLDAVLRETLRLYPPVPMLSRQLMQ (0) 456204

456449 DTTIGGVMLPKD (0) 456484

456906 VELVVSPYVLHRLPRLWGPHAACFQPERFMPPPPRP (?) 457013

457713 PPAAGGGCTEPAAAGPYLPFGAGPRACPGASFGSAEVKLLVAHVVM 457850

457851 RYSLELLQPPPPSPR (?) 457895

458139 (0) QLFVSLRPGPGVRVCFVPRHQQQVE* 458216

 

>Volvox matches

ABSY135777.g1

661 IMGAGHETTATTTAALLYCISAHPDVRQRVEQEL 560 I-helix

ABSY182504.y1

326 LPYTEAVLKETMRLYPALPMMHRHARNDIRLEDGRVAPK 210 (EXXR motif)

ABSY179247.b1 alternative

   106 LESIVLETLRLYSPAYMVGRCAQVDATLGPYSLPTGTTVLVSPFVMHRDAAVW 264

715 GAYLPFGGGPRNCIGTGFAMMEGMLVLAAVLQRYDLTLPPQTL 843

ABSY65293.y2

888 DATLGPTSVPTGTTVLVSPFVMHRDXPVW 802

351 GAYLPFGGGPRNCIGTGFAMMEGMLVLAAVLQRYDLTLPPQTL 223

 

$$$$$$

 

>CYP769A1 fgenesh2_pg.C_scaffold_24000071 [Chlre3:173996]

C_10690001 unnamed C-term P450 fragment

43% to CYP97A5; cannot extend upstream

possible C-helix exon

VLHSPPAPSLLTSTAAAQWRAARRSLLFAFSRSELEQDFE

seq gap here

RLLGEVAEEWDARRRRLLPAWAAPWLLDSAAEASSKCRILQDFIEG

AG region of I-helix here

LLLGHEPVGHSLAWALGCLARNRAAQDKLVAELKREG (1)

VYDAPHTALTWTMLHRLPFLDCCVREALRLYPAQPCPATVRQLNK (0)

DVVLAGWSVPAGAEVWVDVHAMHRNPQLWRDPDRFNPERWAEH (0)

ASEAPLCSPLAFMPFGSGPRSCLGQQLAAAELKAALAVL

LCFLALEPTGDPADEPRPAAGLFLRPAGGLHLLLVHRQRGQRAGAA*

 

Name:fgenesh2_pg.C_scaffold_24000071

Protein ID:173996

Location:Chlre3/scaffold_24:545063-551204

Not a bacterial contaminant, since the gene has multiple exons and an ortholog in Volvox

551626 MRALQLRNRCNLTGHTSRQPLQPSHLPTLWVLDS (1) 551525

550850 LPPALPLLGHWLALRARGRGSEPGDTHLRTLRRWAEAHGGAFRLLLPRAW 550701

550097 VLHSPPAPSLLTSTAAAQWRAARRSLLFAFSRSELEQDFE (0) 549978

(seq gap here)

548586 (0) AVEATGQVLLLRLLGEVAEEWDARRRRLLPAWAAPWLLDSAAEASSKCRILQDFIEG (0) 548416

547904 LLLGHEPVGHSLAWALGCLARNRAAQDKLVAELKREG (1) 547794

546390 GVYDAPHTALTWTMLHRLPFLDCCVREALRLYPAQPCPATVRQLNK (0) 546253

545752 DVVLAGWSVPAGAEVWVDVHAMHRNPQLWRDPDRFNPERWAEH (0) 545624

545320 ASEAPLCSPLAFMPFGSGPRSCLGQQLAAAELKAALAVLLCFLALEPTGD 545171

545170 PADEPRPAAGLFLRPAGGLHLLLVHRQRGQRAGAA* 545063

 

>Volvox about 60% to Chlamydomonas seq above

>ABSY24005.y1  CHROMAT_FILE: ABSY24005.y1 PHD_FILE:

 

Query: 5   QLRNRCNLTGHTSRQPLQPSHLPTLW 30

           +L  RCNL G  SR+ LQ  HL T W

Sbjct: 526 RLNYRCNLRGRVSRRALQDVHLSTRW 603

MSIDARLDRRLNYRCNLRGRVSRRALQDVHLSTRWTKTAR (1) volvox

     MRALQLRNRCNLTGHTSRQPLQPSHLPTLWVLDSR (1) Chlamy
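The percent identities quoted throughout these notes come from counting identical columns in an alignment such as the Query/Sbjct pair above. A minimal Python sketch, assuming two equal-length aligned strings where `-` marks a gap; the function name `identity` is hypothetical. The denominator is the number of alignment columns, which matches how the BLAST reports in this listing give "Identities"; the reports here appear to truncate the percentage rather than round it.

```python
def identity(query: str, sbjct: str) -> tuple[int, int]:
    """Hypothetical helper: count identical columns in two aligned strings.

    Gap characters ('-') never count as identities. Returns
    (identical_columns, alignment_length).
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have equal length")
    same = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return same, len(query)

# Chlamydomonas CYP769A1 N-term vs. Volvox ABSY24005.y1, from the alignment above.
q = "QLRNRCNLTGHTSRQPLQPSHLPTLW"
s = "RLNYRCNLRGRVSRRALQDVHLSTRW"
n, total = identity(q, s)
print(f"{n}/{total} ({100 * n // total}%)")  # 14/26 (53%)
```

The 14 identities agree with the 14 letters on the BLAST midline for this short segment.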

685709806  AOBN322434.y1, also ABSY28503.b1

ABSY17828.g1 goes upstream of this exon

different N-term versions (probably the same sequence, with one read containing errors)

     SPPGVPLLGHSLAYARAPWKWGGVPRARVPGEPSFLW (errors)

 (1) APPGVPLLGHSLTLRAWPSWTWWWFRSG GPRGDQLLLRALLRWSEQYDGAFQLRNGWL

     APPGVPLPGHSLTLPAWPSLDMGVGSGAEGPRATTLHWGRCCPGPSSTMVLFT

ABSY207904.g1

Trace archive files

685812629  AOBN472902.y1

684986793  ABSY385036.g1

683183378  AOBO82354.b1 

710612050  AOBN690035.g1

550752068  ABSY207904.g1

689850606  AOBN318993.b1

685709806  AOBN322434.y1

85  VLHPNAVPSSATATSSAQWRLLRRSLLHAFSDSELQLDFE (0) 204

689851374 = mate pair of 689850606 above (C-helix) 2 exons

955 () GPGAVVDVNDAALRLSLDVMGLSKLGYDFQVGMAVVRQNEKL (?) 830

(0) AVESQGEVLMLRLLGEVAAEWAVRRRRLLGRWAPWISDGAAEGQTR

CRILHHFIEQ (0)

ABSY202948.b1 (+)

(0) LLLAHGPTGHSIAWALGCLAARRGVQEKLVAELKKE (1)

ABSY223271.b1

246 (1) GIFNDPLRLTYDMLSKLPYLDCVVREVLRLYPTMPCPATVRTLKK 112

ABSY130123.g1 (+)

348 (0) DVALHGRTLTAASDVWVDVFSMHRSPKWWRDPHHFKPERWTA 474 (0)

   711 LCYPEAFMPFSFGSRN*LGQKLPVAQIKAALAMLL*FLGLKPS 839

SPPPLAPLCSPEAFMPFSFGSRSCLGQKLAVAQIKAALAMLLCFLVFEPS (1) trace archive 712749567

VAPWGLGLFLRPEGGMQLLVAPRKKNS* 687335561

 

Assembled Volvox CYP769A1 seq, 56% to Chlamydomonas CYP769A1 seq

MSIDARLDRRLNYRCNLRGRVSRRALQDVHLSTRWTKTA (1)

PPPGVPLLGHSLTLRAWPSWTWWWFRSGGPRGDQLLLRALLRWSEQYDGAFQLRNGWL

VLHPNAVPSSATATSSAQWRLLRRSLLHAFSDSELQLDFE (0) 204

GPGAVVDVNDAALRLSLDVMGLSKLGYDFQVGMAVVRQNEKL (?) 830

AVESQGEVLMLRLLGEVAAEWAVRRRRLLGRWAPWISDGAAEGQTR

CRILHHFIEQ (0)

LLLAHGPTGHSIAWALGCLAARRGVQEKLVAELKKE (1)

GIFNDPLRLTYDMLSKLPYLDCVVREVLRLYPTMPCPATVRTLKK

DVALHGRTLTAASDVWVDVFSMHRSPKWWRDPHHFKPERWTA (0)

SPPPLAPLCSPEAFMPFSFGSRSCLGQKLAVAQIKAALAMLLCFLVFEPS (1)

VAPWGLGLFLRPEGGMQLLVAPRKKNS*

 

>ABSY207904.g1  CHROMAT_FILE: ABSY207904.g1 PHD_FILE: ABSY207904.g1.phd.1 CHEM: term DYE: big TIME: Sun Nov 30 14:23:34 2003

NNNNNNAAGCGCTGAATACCCTCCTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

XXXXXXXXXXXXXXXXXXXXXXXXGTTCTTCACCCGAATGCCGTACCCAGCAGCGCTACA

GCCACCTCCTCGGCCCAGTGGCGGTTACTGCGGAGGTCGCTGCTACACGCCTTTTCCGAC

TCGGAGCTTCAACTGGACTTTGAG GTGCGTGGAGGTGCATGTGTTCGTGTGATATGTGTC

TATCTGTCTGTGTATCTGTCGTCGTAGGCCAGGCGTCACTGTCCAAGAGAACCCTTCACA

AGAGGCCAAGAGAACCCACCCCCACCCCCACCCCCACCCTCACCCTCACCCTCCCACCCC

CTCCCCCCACCCTCACCCTCACCCCCACCCTCCCACCCCCACCCTCATCCTCACCTCCAC

CCTCCCACCCCCACCCCCTAACCCTAAAAAAAAAATCCCCACAAAAACCTCATCTATATA

TNCTTCATCCCCAATCCCAACTCCACTATCCAACATCTTTAAATCATCACCCATTTCTCC

CACTCTAACCTCCACCCCAACCTCAACTTCTTACCCAACCCTTATAAAAATCAACTCCCT

TTTTTAAATCCCCAAACCTCAAATCCTATTCCCTACCCAATTATCCTTTCACATCTATAC

CCCATATCTATTCATAAACCTTAACCAACCCCTCACTTACCCTTTACCTTTAAAATCATA

AAACTCACCACCTTTCCATACTATCTTTTCAAATACCCATAACTTTTCCCCACATCAAAA

TAAAAATTTTTTTCCTATTAATACAACACTTTTTATACCCCCTCTCTACACTATAAACAT

CCCCTTAATTTTATATATTTCCCCTAAANATACTTCCCCCATTTCTACTTTATCATATAA

AAAAATAATTTTCCAACTTCCTTAAAAAACCTTCTTAAAAATTATTTCTATTTAACACTC

TCATTTATAATTTTCTTACCTATTATTAAATTTCCTTAAAATCTCAAAAAAACTCTCTTC

CCCATTAACAAACTATTCATTATTCCCCTTACNACTAACAAATAAAAATAAAAAATTTTT

TTTCTTTTCTCCCCCTCATAATACAAATAAAAATAATTTCCCAAAAAACACCCACACACC

ATCATACATTCCAATTATCTTTATAAAACAATTTCCCTNTCCACATACAATATAAAAAAT

AAAATATCTCCCTATCTTACATAAATCTATTCATCTTANTCTAATATCCCTTCTCCTACC

TCTTTCAACCTCTTTTAATCAATAATCTTTTTATACCCTCACAACTCTTTTCTACTCACT

ATCACTCTC
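The ABSY207904.g1 read above is raw genomic sequence. After the vector-masked X run, positions 85 to 204 (the coordinates on the Volvox peptide line `85 VLHPNAVPSSATATSSAQWRLLRRSLLHAFSDSELQLDFE (0) 204`) appear to be coding, and the space in the read appears to mark the GT splice donor. The Python sketch below translates that stretch in frame 1 with the standard genetic code (NCBI translation table 1). The names `CODON_TABLE` and `translate` are hypothetical; any codon containing N, X or another ambiguity code is translated as X.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons with N, X or other ambiguity codes become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

# Trace positions 85-204 of ABSY207904.g1: from the first base after the X run
# up to the base before the space (assumed exon/intron boundary).
exon = (
    "GTTCTTCACCCGAATGCCGTACCCAGCAGCGCTACA"
    "GCCACCTCCTCGGCCCAGTGGCGGTTACTGCGGAGGTCGCTGCTACACGCCTTTTCCGAC"
    "TCGGAGCTTCAACTGGACTTTGAG"
)
print(translate(exon))  # VLHPNAVPSSATATSSAQWRLLRRSLLHAFSDSELQLDFE
```

The output matches the Volvox C-helix region peptide listed earlier for this gene. For the reverse-strand hits (for example Frame = -2 in the next block), the read would first need to be reverse-complemented.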

 

>ABSY109519.b1  CHROMAT_FILE: ABSY109519.b1 PHD_FILE: ABSY109519.b1.phd.1 CHEM: term DYE: big TIME: Sun Sep 14 12:57:01 2003

          Length = 1136

 

 Score = 26.7 bits (53), Expect =    23

 Identities = 12/18 (66%), Positives = 13/18 (72%)

 Frame = -2

 

Query: 2   AAAATHPARTGYGAARSA 19

           AAAAT  A +GYGA R A

Sbjct: 511 AAAATSRAASGYGAERGA 458

 

 

Match to Kineococcus radiotolerans SRS30216 ctg215, whole genome shotgun (bacteria)

ACCESSION   AAEF02000013

MVRAVPAIVRAPHLFLAEVTRRHGPVAAIPLPRTPVLVLADPDGVRRVLVENARGYGKATIQY

SALATVTGPGLLAGDGEVWKQHRRTVQPAFHHGSLEDVA

AHAVHAARGLVAEADALPPGTPLEVLGATSRAGLEVVGHTLAAADLSGDAPLLVEAVG

RALELVVRRAASPVPAAWPTPARRRLAREVAVIDEVCARIVATRRARPLEDPRDVVGL

MLAAGMDDR QVRDELVTFVVAGHETVASSLTWTLDLLARAPSVLARVHAELAGALGGR

EPGWDDLGKLPLLRAVVDESLRLYPPAWVVTRQALADDVVAGVAVPAGTLVIVCTWGL

HRDPALWEAPEEFRPDRFLDAPRPAAGSYVPFGAGPRLCIGRDLALVEEVLVLATLLC

ERTVRPAGPAPRVDALVTLRPRGGLPL HVERLAPSAS

 

Score =  122 bits (306),  Expect = 4e-25

 Identities = 81/206 (39%), Positives = 110/206 (53%), Gaps = 14/206 (6%)

 Frame = +1

 

Query  49     RILQDFIEGLLLGHEPVGHSLAWALGCLARNRAAQDKLVAELKREGGVYDAPHTALTWTM  108

              ++  + +  ++ GHE V  SL W L  LAR  +   ++ AEL    G  +       W 

Sbjct  46849  QVRDELVTFVVAGHETVASSLTWTLDLLARAPSVLARVHAELAGALGGREPG-----WDD  47013

 

Query  109    LHRLPFLDCCVREALRLYPAQPCPATVRQLNKDVVLAGWSVPAGAEVWVDVHAMHRNPQL  168

              L +LP L   V E+LRLYP  P     RQ   D V+AG +VPAG  V V    +HR+P L

Sbjct  47014  LGKLPLLRAVVDESLRLYP--PAWVVTRQALADDVVAGVAVPAGTLVIVCTWGLHRDPAL  47187

 

Query  169    WRDPDRFNPERWAEHASEAPLCSPLAFMPFGSGPRSCLGQQLAAAELKAALAVLLCFLAL  228

              W  P+ F P+R+     +AP  +  +++PFG+GPR C+G+ LA  E    LA LLC   +

Sbjct  47188  WEAPEEFRPDRFL----DAPRPAAGSYVPFGAGPRLCIGRDLALVEEVLVLATLLCERTV  47355

 

Query  229    EPTGDPADEPRPAAGLFLRPAGGLHL  254

               P G PA  PR  A + LRP GGL L

Sbjct  47356  RPAG-PA--PRVDALVTLRPRGGLPL  47424
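When comparing many hits like the Kineococcus match above, it helps to pull the summary numbers out of the BLAST text. A minimal Python sketch, assuming the summary line has the `Name = count/length (pct%)` layout seen in this listing; `parse_summary` and `percent_consistent` are hypothetical names, and the floor check reflects how the percentages here appear to be truncated (for example 12/18 is reported as 66%).

```python
import re

# Matches fields such as "Identities = 81/206 (39%)" in a BLAST summary line.
FIELD = re.compile(r"(\w+) = (\d+)/(\d+) \((\d+)%\)")

def parse_summary(line: str) -> dict[str, tuple[int, int, int]]:
    """Hypothetical parser: {field: (count, alignment_length, reported_percent)}."""
    return {name: (int(n), int(d), int(p)) for name, n, d, p in FIELD.findall(line)}

def percent_consistent(count: int, length: int, pct: int) -> bool:
    """Assumes the report truncates (floors) the percentage rather than rounding it."""
    return 100 * count // length == pct

line = " Identities = 81/206 (39%), Positives = 110/206 (53%), Gaps = 14/206 (6%)"
fields = parse_summary(line)
for name, (n, d, p) in fields.items():
    print(name, n, d, p, percent_consistent(n, d, p))
```

Checking the reported percentage against the raw counts is a cheap way to catch transcription errors when summary lines are copied by hand.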

 

Match to CYP4F3

>CYP4F3 NM_000896

          Length = 520

 

 Score = 80.9 bits (198), Expect = 5e-18

 Identities = 59/193 (30%), Positives = 93/193 (48%), Gaps = 33/193 (17%)

 

Query: 96  RLLGEVAEEWDARRRRLLPAWAAPWLLDSAAEASSKCRILQDFIEGLLL----------- 144

           RL+ +  ++    RRR LP+      +D   +A +K + L DFI+ LLL           

Sbjct: 261 RLVHDFTDDVIQERRRTLPSQG----VDDFLQAKAKSKTL-DFIDVLLLSKDEDGKKLSD 315

 

Query: 145 -------------GHEPVGHSLAWALGCLARNRAAQDKLVAELKREGVYDAPHTALTWTM 191

                        GH+     L+W L  LA++   Q++   E++ E + D     + W 

Sbjct: 316 EDIRAEADTFMFEGHDTTASGLSWVLYHLAKHPEYQERCRQEVQ-ELLKDREPKEIEWDD 374

 

Query: 192 LHRLPFLDCCVREALRLYPAQPCPATVRQLNKDVVLA-GWSVPAGAEVWVDVHAMHRNPQ 250

           L +LPFL  C++E+LRL+P  P PA  R   +D+VL  G  +P G    + V   H NP

Sbjct: 375 LAQLPFLTMCIKESLRLHP--PVPAVSRCCTQDIVLPDGRVIPKGIICLISVFGTHHNPA 432

 

Query: 251 LWRDPDRFNPERW 263

           +W DP+ ++P R+

Sbjct: 433 VWPDPEVYDPFRF 445

 

>CYP4F12 mRNA for cytochrome P450, complete cds. AB035130

          Length = 524

 

 Score =  123 bits (308), Expect = 1e-30

 Identities = 74/194 (38%), Positives = 107/194 (55%), Gaps = 9/194 (4%)

 

Query: 187 GHEPVGHSLAWALGCLARNRAAQDKLVAELKREGVYDAPHTALTWTMLHRLPFLDCCVRE 246

           GH+     L+W L  LAR+   Q++   E++ E + D     + W  L +LPFL  CV+E

Sbjct: 329 GHDTTASGLSWVLYNLARHPEYQERCRQEVQ-ELLKDRDPKEIEWDDLAQLPFLTMCVKE 387

 

Query: 247 ALRLYPAQPCPATVRQLNKDVVLA-GWSVPAGAEVWVDVHAMHRNPQLWRDPDRFNPERW 305

           +LRL+P  P P   R   +D+VL  G  +P G    +D+  +H NP +W DP+ ++P R+

Sbjct: 388 SLRLHP--PAPFISRCCTQDIVLPDGRVIPKGITCLIDIIGVHHNPTVWPDPEVYDPFRF 445

 

Query: 306 AEHASEAPLCSPLAFMPFGSGPRSCLGQQLAAAELKAALAVLLCFLALEPTGDPADEPRP 365

               S+    SPLAF+PF +GPR+C+GQ  A AE+K  LA++L      P      EPR

Sbjct: 446 DPENSKGR--SPLAFIPFSAGPRNCIGQAFAMAEMKVVLALMLLHFRFLP---DHTEPRR 500

 

Query: 366 AAGLFLRPAGGLHL 379

              L +R  GGL L

Sbjct: 501 KLELIMRAEGGLWL 514

 

>e_gwH.661.2.1 [Chlre3:109783]

Name:e_gwH.661.2.1

Protein ID:109783

Location:Chlre3/scaffold_661:7589-8149

bacterial contaminant, 81% to Arthrobacter seq NZ_AAHG01000018.1

Arthrobacter sp. FB24

MDFRASPEYQLDPFPYYERMREAAPVYYDEQSGSWHIFRYDDVQRTLSEYATFSSHMGGDDASGTAQLFA

SSLIATDPPRHRQLRSLVTQAFTPKAVDALAPRIAGLTDELLEGIAARGSADLIKELAYPLPVIVISELM

GIPAQDRERFKQWSDVIVSQTRTGSASGNHIAANMEMTEYFLALIDE

 

Query  1      MDFRASPEYQLDPFPYYERMREAAPVYYDEQSGSWHIFRYDDVQRTLSEYATFSSHMGGD  60

              MDF A+ E  LDPFPYYERMREAAPV++DEQSGSWH+FRYDDVQR LSEYATFSS MGGD

Sbjct  49739  MDFAAANENPLDPFPYYERMREAAPVFHDEQSGSWHVFRYDDVQRVLSEYATFSSRMGGD  49560

 

Query  61     DASGTAQLFASSLIATDPPRHRQLRSLVTQAFTPKAVDALAPRIAGLTDELLEGIAARGS  120

              D S T QLFASSLI TDPPRHR LRSLVTQAFTPKAVDALAPRI+ LT+ELL+GI +RG

Sbjct  49559  DPSETGQLFASSLITTDPPRHRHLRSLVTQAFTPKAVDALAPRISELTEELLDGIVSRGG  49380

 

Query  121    ADLIKELAYPLPVIVISELMGIPAQDRERFKQWSDVIVSQTRTGSASGNHIAANMEMTEY  180

              ADLI+ELAYPLPVIVISELMGIPA DR+RFKQWSDVIVSQTRT +A+ +H A N EMT Y

Sbjct  49379  ADLIEELAYPLPVIVISELMGIPADDRDRFKQWSDVIVSQTRTNAATEDHQATNREMTGY  49200

 

Query  181    FLALIDE  187

              FL LI++

Sbjct  49199  FLDLIEQ  49179