Chlamydomonas P450s

This file last modified Sept. 10, 2003

D. Nelson

For a link to the old file on Chlamydomonas P450s see old file

CHLAMYDOMONAS HAS AT LEAST 33-34 P450 GENES (SEE C-TERM ALIGNMENT), one may be a duplicate (scaf 2693).

The sequences include a clear CYP51 (scaffold 58) and a clear CYP710 (scaffold 690). There are several CYP97s, including four CYP97As and a probable CYP97B. These are the sequences with the best percent identity to Arabidopsis P450s. Scaffold 1399 is 61% identical to CYP97A3. There is a cluster of CYP707-like sequences (probable CYP85 clan), another cluster of 10 CYP711-like sequences, and a cluster of 5 CYP709-like sequences (probable CYP72 clan). There is no CYP74 (allene oxide synthase) or CYP73 (cinnamate 4-hydroxylase). There are no obvious CYP71 clan members (plant group A P450s). This is rather surprising, since they dominate the Arabidopsis and rice P450 collections. The single sequence that is most like the CYP71s is on scaffold 175. It is so different from other known P450s that I cannot assemble it completely. A BLAST search of this sequence against Arabidopsis turns up CYP71s as the best hits, but with only around 29% identity from the I-helix to the heme. Identity is about 35% from the K-helix to the heme.
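The percent identity figures quoted above come from BLAST comparisons. As a rough illustration of the arithmetic only, the sketch below computes identity over two pre-aligned fragments. The function name and the choice to skip gap columns are assumptions made for this example, and BLAST's own reporting may count gap columns differently. The two fragments are the N-terminal pieces of seqs 6 and 16 from the alignment on this page.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap are ignored (an assumption
    made for this sketch, not necessarily what BLAST reports).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# N-terminal fragments of seq 6 (scaffold 1399) and seq 16 from this page.
seq6 = "ARGDIREIVGQPVF"
seq16 = "ARGNIREIVGQTAT"
identity = percent_identity(seq6, seq16)
print(f"{identity:.1f}% identical over {len(seq6)} aligned positions")
```

A short window like this one says little about overall similarity. The same function can be applied to a longer aligned region, such as I-helix to heme, to get a figure comparable to those in the text.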

The analysis of these sequences is continuing. Six or seven sequences have very little resemblance to known P450s in the N-terminal half. I have not been able to assemble the complete gene in these cases, even when using the Genscan server and blasts against plant ESTs as aids. These may require EST support to be able to assemble them correctly.

Note on sequence nucleotide numbering. The JGI genome blast server does not count nucleotide numbering correctly in its output for TBLASTN searches. The nucleotide numbers are correct on the 5 prime or left side, but the 3 prime side has been counted as amino acids and not as 3 times that value. Not all of these errors have been corrected below.

  • For a link to the JGI Chlamydomonas blast server see: JGI Server
  • For a link to the Chlamydomonas EST blast server (which also does the genome with correct nucleotide counting) see: EST blast server
  • For the GenScan server see: Genscan server
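The numbering correction described in the note above can be sketched as follows. This is a minimal illustration, not the procedure actually used to fix the entries on this page. The function name is hypothetical, the strand is assumed from whether the listed coordinates rise or fall, and the peptide is assumed to contain no gaps or split codons.

```python
def corrected_end(start: int, peptide: str, strand: str) -> int:
    """Return the 3' nucleotide coordinate implied by the 5' coordinate.

    Each amino acid spans 3 nucleotides, so a segment of n residues
    covers 3 * n bases, counting the start base itself.
    """
    n_bases = 3 * len(peptide)
    if strand == "+":
        return start + n_bases - 1
    if strand == "-":
        return start - n_bases + 1
    raise ValueError("strand must be '+' or '-'")


# A segment of seq 10 (scaffold 1959a) is listed on this page as
# 7765 ... 7742: the end was advanced by 24 residues, not 72 bases.
peptide = "RFKKYGRVFRLNLLGHTAFVVRTP"
print(corrected_end(7765, peptide, "-"))  # 7694
```

For comparison, the 72-residue segment of seq 6 listed as 12913 to 12698 already satisfies this relation, so it appears to be one of the corrected entries.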

Scaffold numbers are given followed by arbitrary seq numbers for my use in [n]

5 frag [29] frag EXXR to end, runs off end upstream (Join with scaf 171? No)
25a [30] four genes 125k     almost complete
25b [12] four genes 138-145k almost complete
25c [18] four genes 166-169k almost complete
25d [9]  four genes 186k     almost complete this one joins scaf 1959b
33 frag [28] C-term in seq gap missing N-term and middle (not recognizable)
58 complete [7 8] = CYP51
156a FRAG [31]    almost complete
156b frag [11 19] almost complete
156c FRAG [32]    almost complete
171 [40] runs off end missing C-term EXXR to end (Join with scaf 5? No)
175 [44]  partial, very different sequence (cannot recognize N-term half)
200 (2/3) [20]   almost complete
306 N-term + heme [1] missing EXXR to PERF in seq gap
437a frag [34]   almost complete
437b frag [35]   almost complete
467 frag [36]    almost complete
479 frag [3]     almost complete
521 frag [4 14]  almost complete
574 frag [25] PKG to end, runs off end 
636 frag [23] cannot recognize N-term half
668 frag [38] EXXR to end, runs off end upstream
683 frag [43] I-helix [seq gap] heme to end (cannot recognize N-term half)
690 complete [21]
712 frag [13] N-term EST, I-helix, seq gap, heme to end (cannot recognize middle) 
781 frag [26] N-term and C-term (cannot recognize middle)
806 frag [33]    almost complete missing N-term exon(s) to KYG
846 complete [2]
946 frag [5 22] EXXR to end runs off end
1199 complete [24]
1285 [42] C-helix MAY BE ACCIDENTAL
1399 complete [6 15]
1959a [10] (3-4k) almost complete
1959b + 25 [9]    almost complete small gap in middle 
2262 [41] N-term only, runs off end
2628 frag [37] C-term, runs off end upstream
2693 frag [39]    almost complete almost identical to scaf 437 (duplicate?)
3547 frag [27] partial C-term seq with large seq gap
no scaffold frag 16. mid region only SIMILAR TO SCAF 1399
no scaffold frag 17. mid region only SIMILAR TO SCAF 1399
THESE LAST TWO COULD BE UPSTREAM OF SCAF 946 OR DOWNSTREAM OF 574
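The list above describes fragments by landmarks such as EXXR and the heme region. The sketch below shows one way to locate those two landmarks in a protein fragment. The regular expressions are illustrative assumptions and are not taken from this page: E..R for the K-helix motif and a loose F..G.[RH].C.G for the heme-binding signature. Divergent sequences may not match them. The example sequence is the C-terminal part of seq 8 (scaffold 58, CYP51).

```python
import re

# Illustrative patterns only (assumptions for this sketch).
EXXR = re.compile(r"E..R")
HEME = re.compile(r"F..G.[RH].C.G")


def find_landmarks(protein):
    """Return 0-based offsets of the EXXR and heme motifs, or None.

    Alignment gap characters ('-') are removed before searching.
    The EXXR reported is the last one upstream of the heme motif.
    """
    seq = protein.replace("-", "").upper()
    heme = HEME.search(seq)
    if heme is None:
        return None
    exxr_starts = [m.start() for m in EXXR.finditer(seq, 0, heme.start())]
    return {
        "sequence": seq,
        "exxr": exxr_starts[-1] if exxr_starts else None,
        "heme": heme.start(),
        "cys": heme.start() + 7,  # conserved cysteine within the motif
    }


example = (
    "ITEALRMHPPLLLVMRYAKKPFSVTTSTGKSYVIPK"
    "GDVVAASPNFSHMLPQCFNNPKAYDPDRFAPPREEQNKPYAFIGFGAGRHACIGQNFAYLQ"
)
landmarks = find_landmarks(example)
if landmarks is not None:
    s = landmarks["sequence"]
    print("EXXR:", s[landmarks["exxr"]:landmarks["exxr"] + 4])
    print("heme:", s[landmarks["heme"]:landmarks["heme"] + 10])
```

A fragment described as "EXXR to end" would be expected to return both offsets, while an N-terminal-only fragment would return None.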

A partial seq alignment is given below.  Dunaliel is an EST fragment from Dunaliella, another green alga.

seqs. 1-4, 6, 7, 9, 10, 12, 13, 16, 18, 21, 24, 26, 30, 34-36, 40, 41, 43 
are N-TERMINAL PARTS

1. MASSSSPLEELLAFAGVKDGTISSPRLALVVLGAALAAYALVFAVINVVDYIRIARGLSAIPSAPGGVPLLGHVIPMLT----
21. MNATGLLNDGLASLGMSGFGDNLASGPALVAAGGALALGYALWEQMKFRWYRSDKNGNMLPGPASVTPIIGGIVEMVKDPYG
34.                                  GFALLLVSLIIYLLDPIKRWRLRKIPG-PGPRGRPVLGCLPQLRAQPMP
35.                                  GLALLLASLLIYLLDPIQRWRLRKVPGERGPPARPLLGCLPQLRAQPMP
36.                                   AVLALLLALHVLADPLQRWRLRHIPG--GPPALPLLGSVPAMMRAGGP
30.             MAVFGFRELFASMYIPGLSPVLSTITCLAGVLLFLAWQRHSRATSVPRLGPLLTIPLLGDVAWLAADPTR
10.                                              GAFPYPTTHPSTITLHVTITQWPFLGDAVELGITXXX
18.                                  MDYMQLLVGLLAILLASILLLRSSGKRLSPRFRVPLLGDTIKMAKRPAE
41.                                           MQLTWLGWAPVTRWRLRNIPGPFALPFLGHLPAISARDLV
40.                     MAPLLDAKQLELLGIGMQLAAVLLVLYYLLKWLAGKRGGVPGPAFYLPAIGETLSLFASPTR
26.     MRSSSRGAKIGRAYPTAHHIDGRASGGRPLHFGLHPCHRPCLRAKAAQSGLAELPLPEGSLGLPVVGETLELITN-GD
7.              MDLPPELAVLADKVLSLSPVVLVALGSAVLILALAVGRVLFNLLPSKRPPVWEGLPFIGGLLKFTGGPWK
24.                MDGFWKTLGLGALLSPVLYALYLASLIVIPYLKSLPLRRKLRHLPGPPVTGFFLLGNVPDLVRTPVH
2.                           MDLTKIHEDPIGLLLAMIAGALVAFFLLARKEKRPLGPMFTLPILGDTVALALSEQS
42.                                                                          MQIAAMDTT
3.                               MYAALALVLSPVLLALLWAIINPVERWKTRKIPGPPGLPLLGHLLNFATGDAT
4.                        MQDVISFLLNGLGFAAVGLVVLQLVLSLDLYKRWKLRHLPGPPALPLLGNLPQILAKGSP
13.                       MTFLQLLPGVPLVLLGVLALPVVITLVQEVITKRKYRHIPGPKPQPISGNLREFLTSPGG
9.                       MGEQGAAAGTPLALAATLLAGTILVFYIYQQLKPSKSRLPGPLFSWPFLGDTIEFATTDPT
12.              MAGLATFEPSAQTPLTWSLALFSSFVAGLYVTFAIYRSFGKGAKKLPPGXLLHVPLLGDGVLMAAGNPV
6.                                                                      ARGDIREIVGQPVF
16.                                                                     ARGNIREIVGQTAT

13.            LLGCLEGWVK
2.              RFMFSRYKKYGSVFRLNLLGQ
24.             QCMARWAEQYGKIFKLELPTMT
10.             XXXXXRFKKYGRVFRLNLLGHTAFVVRTP
30.             FVFGRRFQRYGPTFILNLMGVPLYVLTQPADLRGPYRDQGAEPDVPFSSFRRLM
34.             LFLQSCAQTYGPVFKASQVALGRKWVVVLADAEMQRQVDGAGSERGQGGGAQIRL
35.             LFLQSCAQTYGPVFKASQVALGRKWAVVLADAEMQRQVRGTGAERG
36.             FFFRQCFAKYGPVFKAQVAMGRKWVVVVADAELMRQ
41.             HFCHDVARQYGPVSLTQVWVAARPWIVVSDPVAARKIAYR
40.             YMWK-NWLEYGPFFRTHLLGYPLYVVGSPGLLKPVLGDDSAFEFF
26.             TFGTSRRERYGDVYKTNILGAPTVMV
3.              DFTVEAVKKYGNVVAIWFGNRAWITIADPALIRKLGFKFLNRPARMTDFGHVLVGHNAEV
42.             AFLTSSAVKYGPVCK-WFSTQPWV-INDPKLVRWVG
4.              AFFRECRAKYGPVFRVAFGRNWMVVVAEPDLLRQVGGKLLN----HSMFRGLLGGEFAKL
9.              KFLFGRFKRYGRVFRLSLLGFTAYVTADPEALRPLLA-DEG---GHFTIPVQTFTALMGA
18.             -FLFSR-KEFGPVFTLDLMGSTYWVVADMDAQRRFLYRTEG---ASAEIPIKSFKMLTEL
12.             KMFWDRYRRYGSVFRTMMLGSRIWVVTDLDALRGPL-RDEG---AYLEIPFKAFQRLVSA
6.              VPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQILLTNAD--KYSKGLLSEILDFVMGT
16.             VPLNKLFLVYVQIFRVSFRPRASGSSLSPHDAKEILRTNAD--KYSMGLLTKILDLVM--
7.                        GECFTVPVAHRRVTFLIGPEVSPHFFKAGDDEMSQSEVYDFNIPTFGRGV

28.         IPGNGLLVSDGPVWQRQRRLSNPAFRRAAV
1.                   DGALWQKQRMLMGPALRVDVLDDIIR
41.                   GPAWKASRRAFETSVLRPDRL
3.           DNAGAFMASGEVWRRGRRAFEASIIHPASL
42.               FPYRGEAWRRTRRVLEGSIIHPA
34.                      WRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAG
35.                      WRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAG
36.                      WRLLRGAWQPAFSSAALSGYLPLMSACGLRLAQQLQA 
4.           DDWGLVSARDDFWRKVRAAWQPAFXAPSLSGYFPLMTDCA
9.              YNLQAHKEVHAAWRKVLMAALTGSGMAKLVPGVVAVMGRHVEGWAQAGRV
18.             PSPNSDRVNHATWRKATMAAVGPHALHTLFPPVLEVIRAHADRWTQQAQ
12.             ESFLNRPGVHGPWRKIFSATLAPPRLAAMVPKIAQLMQSHLSKWEEQGQV
6.              GLIPADGEIWKARRRAVVPALHRKYVMSMVDMFGDCAAHGASATLDKYAA
7.              VFDVEQKVRTEQFRMFTE-ALTKNRLKSYVPHFNKEAEEYFAKWGETGVV 360

12.                  TIFRAARVMGVDLAVDVILDIKLLDGTDRAWVKS 664
6.              SGTSLDMENFFSRLGLDIIGKAVFNYDFDSLAHDDPVIQAVYTLLREAEHRSTAP
17.                HEDMESEFLSLGLDIIGLGVFNFDFGSINSESPVIKAVYGVLKEAEHRSTFY
7.                DFKDEFSKLITLTAARTLLGREVREQLFDEVADLLHGLDEGMVP 429

6.              IAYWNIPGIQFVVPRQKRCQEALVLVNECLDGLIDKCKKLVEEEDAVFGGG
17.             LPYWNLPLADVLVPRQAKFRADLKVINECLDNLIKQARDTRVAEDAEALQN

17.             RDYSKVSDPSLLRFLVDMRGEEPTNKQLRDDLMTML

SEQS 5-6,8-11, 14, 19-21, 25-28, 30-34, 36-39, 43 ARE C-TERMINAL PARTS
8.              TRKDLAAIFAKIIRARRESGRREEDVLQQFIDARYQNVNGGRALTEEEITGLLIAVL


21.             FASQDASTASLV-WTITLMAEHPEVLARVRDEQ---------YRLRPNPEEKVTGDMLN-EMHYTRQV
8.              FAGQHTSSITTS-WTGIFMAANKEHYNKAAEEQQ---------DIIRKFGNELSFETLS-EMEVLHRN
43.             LGGTDTSALTVA-FAAWHLAAEPQLQAELRREVGARSG------------------------------
14.             LAGYETTANALA-FAVYCLATNPE----------AEAKLLAEIDAVLGP-DRLPTEADLPRLPYTEAV
13.             LAGFETTANALT-FAVYLLACHPE--------------------------------------------
36.             LAGYETTANALA-FAIYCVATHPEGEWRSE----------GPSEDGTAARYRPPTESDLPRLPYTEAV
35.             LAGYETTANALA-FAVYCIATHPEGTATYRPLAAVESRLLREVDDVLPGSDQLPGESDLPRLAYTEAV
34.             LAGYETTANALA-FAVYCIATHPEGTATYRPLAAVESRLLHEVDDVLPGSDQL---------PYTEAV
31.             -AGYETTSAATS-LALFLLATHPEAAARLA----------AEVDAVLGRELTAELLAE--KLPYTEAV
28.             ----ETSAILLG-WASALLAAHPEVQAAAA----------AEVAAVCGSASPADCSVR--HMPYLESV
38.             -------------------------------------------------------------LPYLDAV
32.             MAGFETTALTLS-LVTFMLATHPEAAARLT----------AEVDGLGPGELTHEVLAE--KLPYTEAV
3.              VAGYETSSNTTT-MASYLLATHPAAQQRMA----------DEIDAVLGGELTPELLAK---LPYTEAV
11.             LAGFETSADTLA-LTCYLLATHPEAAARLV----------AEVDAVGGRELTAELLAE--GLPYTEAV
33.             LAGFETTAATIS-FTAFCLATHPEAQARLL----------AEVDEGQQQREGDDALPE---LPYLDAV
6.              IAGHETTAAVLT-WTLYLLSQHPEAAAAIR----------KEVDELLGDRKPGVEDLRA--LKMTTRV
17.             IGGHETTAAV----------------------------------------------------------
Dunaliel        --------------------------------------------------------------------
25.             --------------------------------------------------------------------
20.             IAGHETTAAVALTWALHLLVAHPEVMKRVR----------DEVDWVL--GDRLPGSDDLPLLRYTTRV
5.              -------------------------------------------------------------------C
23.             FAGHETTATSIVRLMLVLANPRPDVVSRLREEQA----------AAVRQHGAAISGSSIRDMPYLDAV
24.             IAGFETTAHAIG-WTLMFIAGSPEVESRVA-----AELEGAGLLAVPGRPEPRQLAWGDLGGLKYLNA
1.              --------------------------------------------------------------------
27.             --------------------------------------------------------------------
29.             --------------------------------------------------------------------
37.             --------------------------------------------------------------------
26.             SMVCLNHLNALSTWWPVMTRIAVPPWPTAVRQD------------IVSRHGPAITAEALDEMSYGTAV
9.              -GAADTTRFALF-NTWAILAMSPRVQDLIYEEQKK----------VVAENGPELTYKTAMSMPYLDAA
40.             IASDDTSKHLFF-FELVAAAMLPGVWAKLEEEQKQ----------AMRKYGDELSYSILNDMPYLDAV
2.              MASADTTRFALF-NTWALVAQSARVQEKLYEEQQK----------VIEEFGPELSYKAASSMPYMDAT
30.             --------------------------------------------------------------------
12.             --------------------------------------------------------------------
18.             AAAADTTRVTLF-TVLALVAMSPRVQEEIFAEQQK----------VIAEYGSELSYKVVSDMPYLEAV
10.             ---------------------------------------------FVAAHGPELTPAALSSMPYLEAC


21.             VKEILRFRPAAPMVPMRAKAPFKL----TETYTAPKGA---------------------
8.              ITEALRMHPPLLLVMRYAKKPFSVTTSTGKSYVIPKGDVVAASPNF-------------
43.             -----------------------------------------------------------
14.             FNETMRLYPPAHATNRHTDK-APMQ----GPYTLPKDTTL-------------------
13.             -----------------------------------------------------------
36.             LNEAMRLFPPAHATTRIVEAGAPLQ----GGVSLPPRTPL-------------------
35.             VNEALRLFPPAHLTSRVVPPGETLT------FNIPAGIPI-------------------
34.             VNEALRLFPPAHLTSRVVPPGETLT------FNIPAGIPI-------------------
31.             IKETLRLHPGITFLVREATEDVDL----GAGRVVPRGSTL-------------------
28.             VLETLRLYSPAYMVGRCARRDAAL-----GPYVLPAGTTV-------------------
38.             LKESQRLHPAVGHFWRDATSDIALP--EMGGLVIPKGSFV-------------------
32.             IKETLRLHPPIPYFIREAREDLDL----GNGMVAPKGSYL-------------------
3.              LQETLRLYPAAPYLLREAREEVDL----GGGRVVPKDSVL-------------------
11.             IKEAMRLYPPVPYLLRQAREDLDL----GKGMVAPKHSYV-------------------
33.             LKESMRLYPAGSALIRKSPQPLDL---GRDGLVIPG-----------------------
6.              INEAMRLYPQPPVLIRRALQDDHF-----DQFTVPAGSDL-------------------
17.             -----------------------------------------------------------
Dunaliel        -----------------------------------------------------------
25.             ---------------------------------------VPLGQDV-------------
20.             VNEALRLYPQPPVLIRRAMQDDVL----------PGGHVVAAGTDL-------------
5.              LGESLRMYPQPPILIRRALAEDTL---PAGLRGDPAGYPIGKGADL-------------
23.             VKETWRCHPVVPMVPRRAVRDFTL-----GGHDVPQGWGVVLGLVEPM-----------
24.             IHESMRLMPPTSGGTVRVVPRDTQ----LAGHVLPKGTMLW------------------
1.              -----------------------------------------------------------
27.             -------LCPTSASLSRCRPQHPT---RVGKYLVPAGTPIGTALFA-------------
29.             ----------------VVPPLQDV---VLAGWSVPAGAEV-------------------
37.             -------------------------------------------ELV-------------
26.             ARELLRITPAVPAVFRLALVDFEL-----------------------------------
9.              FKEAMR--LLPASAGGFRMLTKEL---RVGDVLLPPGTIIWFHALLLQTL--DPVLWDG
40.             IK---------------------------------------------------------
2.              IKECMR--LLPASAGGPRKLTQDL---KVGEVVLPAGSFVWMYSYLLHCL--DPVLWDG
30.             VREAMR--LLPATPGNMRRLTADL---RVG----PAGSMVWRFVPLMHCL--DPVLWDG
12.             LKECMR--LLPASAGGIRKLTADM---QVGGYTVPAG--------LMHYI--DPVLWDG
18.             VKEAMR--LLPPAAGGMRVLSEPL---TVGDVTLPTGALLLSYSFLMHCI--DPALWDG
10.             FRRAMRFSLLPTGGGALRHFTKEL---KAGSVTLPAGEWFSGVYHPHLMHCIDPVLWDG



21.             LIVPSLVAACKQGYSNPDSFDPDRF------SPERAEDIKYA------------------
8.              ------SHMLPQCFNNPKAYDPDRF------APPREEQNKP-------------------
43.             -------------------------------DVWHTDTSLPLPAMPAAPLFCPSRPQP--
14.             FMSIFSAHHNTDVWPRVNDFVPERFLP----ESPLY---PEVAARVP-------------
13.             -------PAAFLRPQHTQAFRPERFL-----SPDVPGSAPELAARHP-------------
36.             ILAIYSAHHDPAVWPRPEDFIPERFLP----ASPLH---SEVAARVP-------------
35.             FLPMYIAHRDPAVWPRADVFLPERFLH----SSPLYESLQPRGAAQQ-------------
34.             FLPLYIAHRDPAVWPRAEEFLPERFLP----SSPQYESLQPRGAAQQ-------------
31.             CMATHAVMHDPDIWPEPEAFRPERFLPEGS------------------------------
28.             LVSPYVMHRDPEVWEEPEVFRPERWQELQRRCER------------------GLMSNLGP
38.             SISIYNMHRDPAHWKEPERFIPERFLQ----GALGPTDP---------------------
32.             TMYMHAVHLNPDVWPHPERFLPQRFLPEGS-AAFGPADP---------------------
3.              VLHVHSMQRDPDVWPQPEAFLPQRYLPEGQ-AALGPADP---------------------
11.             VLYVHSMHLNPDVWPHPERFLPQRFLPEGS-AAFGPADP---------------------
33.             -------MHDPAIWPEPEAFRPERFLPEGS------------------------------
6.              FISVWNLHRSPKLWDEPDKFKPERF---------GPLDSPI------------PNEVTEN
17.             ------------------------------------------------------------
Dunaliel        ------------------------------------------------------------
25.             MISVYNIHHSPAVWDDPE-FIPERF---------GPLDGPV------------PNEQNTD
20.             FISVWNLHHSPQLWERPEAFDPDRF---------GPLDSPP------------PTEFSTD
5.              FISVWNLHRSPYLWKDPDTFRPERFFEPNSNPDFGGKWAGYRPDAVTGGAALYPNEVASD
23.             RDLPAWSGLTPDSPLHPSHFNPDRWLSGRSSASGN---------------------GMLP
24.             IPFYAMQRSERVWGPDAAQFRPERWLAAAAGAGGPG-----------------------A
1.              ------------------------------------------------------------
27.             ------IHNTRHNWTDPLAFRPQRWMGESS------------------------------
29.             WVDVHAMHRNPQLWRDPDRFNPERW-----------------------------------
37.             VSPYVLHRLPRLWGPHAACFQPERFMP---------------------------------
44.             ------------------------------------------------------------
26.             ------------------------------------------------------------
9.              DTSVDVPVHMDWRNNFEGAFRPERWLSEETKP----------------------------
40.             ------------------------------------------------------------
2.              DTSVDVPAHMDWRNNFEGAFRPERWLSEETKP----------------------------
30.             DTSVDVPAHMDWRSNFEGAFRPERWLSEDTKP----------------------------
12.             DTSVDVPAHMDWRNNFEGAFRPERWLSEETRP----------------------------
18.             DTSVDVPAHMDWRNNFEGAFRPERWLSEETKP----------------------------
10.             DTSVDVPAHMDWRNNFEGAFRPERWLSEETKP----------------------------

21.       SNFLVFGHGPHYCVGKEYAMNHLTVFLALLATSLDFPRIRSKVSDDIIYLPTLYPGDSIF
8.        YAFIGFGAGRHACIGQNFAYLQIKSIWSVLLRNFEFELLDPVPEAD---YESMVIGPKP-
43.       NAFLPFGVGSRSCIGRHFGLLSTQ------------------------------------
14.       HAHAPFGFGSRMCIGWKFAVQ-EAKVALAALYQRLTFELEPGQV-PLQTAVGITLSPRNG
13.       HVHLPFGSGPRMCIGWRFAMQ-EAKTVLSRLVQAVDFTLAPGQAAPLDTVAGLTLAPRNG
36.       GAHAPFGYGSRMCIGWKFAMQ-EAKLVLALLYQRLLFRLQPGQV-PLPTATALTLAPRDG
35.       HAHAPFGYGSRMCIGYKFAMQ-EAKVALATLYRRLTFTLEPGQQ-PLQVEASLTMAPRGG
34.       HAHAPFGYGSRMCIGYKFAMQ-EAKVALATLYRRLTFTLEPGQQ-PLKLVASVTMSPRGG
31.       HVWAPFGMGTRMCVGHKLAMM-ASKATLVSLCQRFSFALHPKQPLPLKLKTGLTYGPADG
28.       GAYLPFGGGPRN------------------------------------------------
38.       GAYVPFGSGPRMCVGYKMAIM-VVKSVLAGLLLRYRVALHPRQPLPLRLKTGLTLEPADG
32.       GAWAPFGIGARMCVGHKLAMM-MAKTLLVRMYQRFRIELHPRQPLPLKMKTGLSRVPVDG
3.        NGWAPFGVGARMCVGHKLAMM-VTKVALVRMYQRFRVSLHPRQPLPLKMKTGLVRVPADG
11.       GAWAPFGIGARMCVGHKLAMM-MAKTLLVRMYQRYRVALHPSQPLPLRMKAGLSRVPLDG
33.       AAWVPFGMGPRMCVGSKFATM-VSKAVLLQIYRRFTFELHPKQVLPLRTRTALTHAPRDG
6.        FAYLPFGGGRRKCIGDQFALF-EAVVALAMLMRRYEFNLDESKGTV-GMTTGATIHTTNG
17.       ------------------------------------------------------------
Dunaliel  --------------------H-EAVVALSVLLKNFNEAQLVRNQTI-GMTTGATIHTTNG
25.       FSYIPFSGGPRKCVGDQFALM-EAVVALTVLLRQYDF-QMVPNQQI-GMTTGATIHTTNG
20.       FRFLPFGGGRRKCVGDMFAIA-ECVVALAVVLRRYDFAPDTSFGPV-GFKSGATINTSNG
5.        FAFIPFGGGARKCVGDQFAMF-EATVAAAMLLRRFTFRLAVPAEKV-GMATGATIHTANG
23.       PQMLTFGGGGRYCLGANLAWA-ELKVFVAVLLRGYDFTSPLPELE---------------
24.       RGFLPFSEGPRNCVGQSLALL-ELRTALALLCGSFR------------------------
1.        --------GPRNCLGQHLALL-EARVVLGLLHARFSFKPAPSVHPDPASLFMRHPTVIPV
27.       LSYMPFSEGPRSCVGQSLAKL-EVMTVLATLLAHFR------------------------
29.       LAFMPFGSGPRSCLGQQLAAA-ELKAALAVLLCFLALEPTGDPADEPRPAAGLFLRPAGG
37.       GPYLPFGAGPRACPGASFGSA-EVKLLVAHVVMRYSLELLQPPPPSPRQQLFVSLRPGPG
44.       AEARPFGIGPRACPAGSLSVV-IVREALAALLTKYRWRLYDEVGDRDWMSGAVSTPTMAF
26.       EYSLPFGSGVRTCLGRNLVMT-ELLVVLAVLARGYEWEAVNPAEQW-G--VVPSPAPKEG
9.        RSYYIFGQGAHLCAGMVLVTL-EVKLLLAMVLRKWRLQLEVPDMLAR-AELFPYPKPPKG
40.       ------------------------------------------------------------
2.        KYYFTFGYGNHLCAGINLAYL-EIRTMLALVIRKYRLRLQTPDMLSR-ARYFPFVEPSPG
30.       KYYYTFGSDNHLCVGQNLAYM-EVKLLLAMLLRKYRLQLHTPDMLARASQMFPFVIPRRG
12.       RYMFTFGTGAHLCIGMNLVYL-EVKLLLSMVLRKYRLRLHTPDMLLRCERLFPFFLPAKG
18.       KYYYTFGVGKHMCAGIHLVYM-VRVQMVALLVRKHRLKLQTPDMFER-ATWLPFTTPAPG
10.       KYYFTFGSGVHLCAGVNLVYL-EAKLVMAMLVRRFRLRLSAPDMLARCTRVFPFMQPVPG

21.             DLSWSAKK*----------------------
8.              CRVRYTRRKL*--------------------
43.             -------------------------------
14.             VWVRPVARRLTPRQPTTPPVGSAAK*-----
13.             VWVRLSPRGGGGSGGGGGRGQEVATAAAKGAAVRSAAA
36.             LWVRPVLRR----------------------
35.             LRVTPVPRR----------------------
34.             LHVTPVPRR----------------------
31.             VWMTVTRR-----------------------
28.             -------------------------------
38.             -------------------------------
32.             VWVTLTER-----------------------
3.              VWLTLTER*----------------------
11.             IWLTLTER-----------------------
33.             -------------------------------
6.              LNMFVRRRDPLTVPPPSSSMAEAVSTGYAF-
17.             -------------------------------
Dunaliel        LYMTVKERTPAAAALAGATA*----------
25.             LYMYVKER-----------------------
20.             LHMLISRRDLT--------------------
5.              LSMRVTRRTPGGGSGSGAPG-----------
23.             -------------------------------
24.             -------------------------------
1.              GPIRGLKVLVEQRK*----------------
27.             -------------------------------
29.             LHLLLVHRQ----------------------
37.             VRVCFVPRHQQQVE*----------------
44.             RPPLRVVFARVVEDGGESS*-----------
26.             LRVRLHRR-----------------------
9.              TGGIRLIAREQPLALGARTQGGVNLGSRVFE
40.             -------------------------------
2.              TDTVLLEAR*---------------------
30.             TDRVLLEPR*---------------------
12.             TDTVLLEPR----------------------
18.             TDTVLFEPR*---------------------
10.             TDKVELLPREQPLPVASIDL-----------


>44. scaffold 175 very different
63125 (0) HAALLPRLLCRPELSRAEAVANCHSCLLAGYETTAHTLACCLLHLGQRPQ 62976
VGRGRERGGRELARMEVKRGGDRF (2)
62528 GMALLGAVIRETLRVNPPVIGLPRVVSAPGGITVRLPAGSa (1?)  62412

61349 WDPTRTAAPAGAVGADGAAPSDPFAEARPFGIGPRACPAGSLSVVIVREALAALLTKYRWRL 61164
61163 YDEVGDRDWMSGAVSTPTMAFRPPLRVVFARVVEDGGESS* 61041


>6. 15. SCAF 1399 15 EXONS 60% TO 97A3 FIRST EXON PREDICTED BY GENSCAN
BM003139 BI725954 BE441929 BI719213
13327 MLSFSTSISGCRFGSWAVPSFGPRRAPTSTPTCRLGFDTGRSAARFLADLGRQWRAEASKRMP
      EVRLELRPCDGGGRASCPVLGKSTYT (0) 13061
12913 ARGDIREIVGQPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQILLTNADKYSKGLLSEILDFVMGT 12698
12532 GLIPADGEIWKARRRAVVPALHRK 12461
12332 YVMSMVDMFGDCAAHGASATLDKYAASG 12249
11994 TSLDMENFFSRLGLDIIGKAVFNYDFDSLAHDDPVIQ 11884
11707 AVYTLLREAEHRSTAPIAYWNIPGIQFV 11624
11493 VPRQKRCQEALVLVNECLDGLIDKCKKLV 11407
11269 EEEDAVFGEEFLSERDPSILHFLLASGDEISSKQ (0) 11168
11003 LRDDLMTMLIAGHETTAAVLTWTLYLLSQHPEAAAAIRKE (0) 10884
10681 VDELLGDRKPGVEDLRALK (0) 10625
10448 MTTRVINEAMRLYPQPPVLIRRALQ 10374 
10118 DDHFDQFTVPAGSDLFISVWNLHRSPKLWDEPDKFKPER 10002
9580 FGPLDSPIPNEVTENFAYLPFGGGRRKCIGDQ 9485
9358 FALFEAVVALAMLMRRYEFNLDESKGTVGMTT 9263
9124 GATIHTTNGLNMFVRRRDPLTVPPTSSSVAETVSTGYAFACG
     PAVMPVASAEVVAAPATAAGGGCPFHTAAGAAVPAATMSLRPTGPPSA* 8852

>21. Scaffold 690 10 EXONS 43% to 710A1 exon 1 predicted by genscan. EST SUPPORT
20577 MNATGLLNDGLASLGMSGFGDNLASGPALVAAGGALALGYALWEQMKFRWYRSDKNGNMLP (1) 20356
20000 GPASVTPIIGGIVEMVKDPYGFWERQRLYSFP 19905 
19904 GMSWNSIVGIFTVFVTDPALSRYVFSHNSSDSLLLALHPN (1) 19785 
19644 AEWILGKTNIAFMSGPEHKALRKSFLALFTRKALGLYVLKQDDVIRKHFNEWMQ (0) 19498 
19355 TAGPREIRPFIRDLNAYTSQEVFVGPYLDDPT (0) 19269 
18917 EREKFSDAYRAMTDGFLAFPLLLPGTGVWKGRQGRQFIVK (0) 18802
18583 TLTRAAARSKVRMAAGQEPECLLDFWTKQ (0) 18497
18215 ILSDIKDAADAGQEAPFYADDKKIAETVMDFLFASQDASTASLVWTITLMAEHPEVLAR (0) 18012
17722 VRDEQYRLRPNPEEKVTGDMLNEMHYTRQVVKEILRFRPAAPMVPMRAKAPFKLTETYTAPKGALIVPSLVAACKQ 17456 (0)
17279 GYSNPDSFDPDRFSPERAEDIKYASNFLVFGHGPHYCVGKE 17155 (0)
16995 YAMNHLTVFLALLATSLDFPRIRSKVSDDIIYLPTLYPGDSIFDLSWSAKK* 16840
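The exon listings in entries like the one above follow a loose "start PEPTIDE (n) end" layout. The sketch below parses such lines and joins the exon peptides. The regular expression and function names are hypothetical. The reading of the bracketed digit as the phase of the following intron is an assumption, and lines without coordinates or with free-text comments are skipped. The two sample lines are the last two exons of seq 21 (scaffold 690).

```python
import re

# start, peptide, optional "(n)" before or after the end coordinate, end.
EXON_LINE = re.compile(
    r"^\s*(\d+)\s+([A-Za-z*]+)\s*(?:\((\d)\??\))?\s*(\d+)\s*(?:\((\d)\??\))?\s*$"
)


def parse_exon(line):
    """Parse one exon line into a dict, or return None if it does not fit."""
    m = EXON_LINE.match(line)
    if m is None:
        return None
    start, peptide, phase_a, end, phase_b = m.groups()
    phase = phase_a if phase_a is not None else phase_b
    return {
        "start": int(start),
        "end": int(end),
        "peptide": peptide,
        "phase": None if phase is None else int(phase),
    }


def assemble(lines):
    """Concatenate exon peptides in listed order, dropping a final '*'."""
    exons = [e for e in (parse_exon(line) for line in lines) if e is not None]
    return "".join(e["peptide"] for e in exons).rstrip("*")


sample = [
    "17279 GYSNPDSFDPDRFSPERAEDIKYASNFLVFGHGPHYCVGKE 17155 (0)",
    "16995 YAMNHLTVFLALLATSLDFPRIRSKVSDDIIYLPTLYPGDSIFDLSWSAKK* 16840",
]
protein = assemble(sample)
print(len(protein), protein[:12])
```

Parsed coordinates can then be checked against the peptide length to flag entries whose 3' coordinate still carries the amino acid count described in the numbering note.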


>7. 8. SCAF 58 10 EXONS 56% TO 51A2  EST SUPPORT BI717817 BU649818 BI726293 BM001590 BI718677 more
AV642299

60124 MDLPPELAVLADKVLSLSPVVLVALGSAVLILALAVGRVLFNLLPSKRPPVWEGLPFIGGLLKFTG 59927
59843 GPWKLLENGYAKFGECFTVPVAHRRVTFLIGPEVSPHFFKAGDDEMSQSE 59694
59394 VYDFNIPTFGRGVVFDVEQKVRTEQFRMFTEALTKNRLKSYVPHFNKEAE 59245
59108 EYFAKWGETGVVDFKDEFSKLITLTAARTLL 59016
58765 GREVREQLFDEVADLLHGLDEGMVPLSVFFPYAPIPVHFKRDR (2) 58637
58412 CRKDLAAIFAKIIRARRESGRREEDVLQQFIDAR 58311
58119 YQNVNGGRALTEEEITGLLIAVLFAGQHTSSITTSWTGIFMAANK 57985
57667 EHYNKAAEEQQDIIRKFGNELSFETLSEMEVLHRNITEALRMHPPLLLVMRYAKKPFSVTTSTGKSYVIPK 57455
57191 GDVVAASPNFSHMLPQCFNNPKAYDPDRFAPPREEQNKPYAFIGFGAGRHACIGQNFAYLQ (0) 57009
56877 IKSIWSVLLRNFEFELLDPVPEADYESMVIGPKPCRVRYTRRKL* 56743

>24. SCAF 1199 14 exons 72A9 LIKE exons 3,4,5,13,14 not well supported
8031  MDGFWKTLGLGALLSPVLYALYLASLIVIPYLKSLPLRRKLRHLPGPPVTGFFLLGNVPDLVRTP (1) 8225
8408  VHQCMARWAEQYGKIFKLELPTMT (0) 8479
8633  ELIRLTNITTRLGLVYDTGRTFGT 8704
8705  RAKRRPGGSLPRTWPRDVAASMQYDALIQPDLSIVSAYKRDSKSCRT (2) 8845
8884  GNAKTGPNGRCDTARTLTRQDTTPGHREA (0) 8970
10741 VMTGLAAAGPSAALDLDRVAQRLTIDVIGRFAFDRDFGATADIAKTNEALQ (0) 10893
11059 VVGELMTALQRMLNPLNRWFWWRK (0) 11130
11410 EARGLWASRRRYDALVRRALEDLRSSPPAQHTLLHHLMSLTDPDT (1) 11544
11782 GKPLSARRLRSETALFWIAGFETTAHAIGWTLMFIAGSPE (0) 11901
13254 VESRVAAELEGAGLLAVPGRPEPRQLAWGDLGGLKYLNA (1) 13370
13544 VIHESMRLMPPTSGGTVR (2) 13597
13750 VVPRDTQLAGHVLPKGTMLW (0) 13809
14146 IPFYAMQRSERVWGPDAAQFRPERWLAAAAGAGGPG (0) 14253
14541 ARGFLPFSEGPRNCVGQSLALLELRTALALLCGSFR (2) 14648
14920 FRLADDMGGVEG (1) 14955
15160 AVSEARQHITLKPGDRGLLMHAIPRVPA* 15246
17630 ATGITAGGRGGAW 17668 ????

>20. SCAF 200 94E3 aaaa01014899.1 LIKE 
 seq gap
 9247 SSRHSKGILAEILEFVMGN (0) 9306
 seq gap
 9705 SVDMESFFSRLSLDIIGKSVFDYDFDSLRHDDPVIQ 9812
10081 AVYSVLRESTVRSTAPX 10128
10371 ADWKLPGISLLVPRLRESDAALAIVNDTLDRLIARCKSM 10487

10853 PTVLHFLLGSGEALNSRQLRDDLMTLLIAGHETTAAV 10963
11275 LTWALHLLVAHPEVMKRVRDE 11277
11605 VDWVLGDRLPGSDDLPLLRYTTRVVNEALRLYPQPPVLIRRAMQ 11736
11956 DDVLPGGHVVAAGTDLFISVWNLHHSPQLWERPEAFDPDR 12075
12251 FGPLDSPPPTEFSTDFRFLPFGGGRRKCVGDMFAIAECVVALAVVLRRYDFAPDTSFGPVGFKS 12442
12584 GATINTSNGLHMLISRRDLT 12643
12644 GVPPPAPRAPAAAAGAAAGSCPHAAAAAATAAAAAAVGCPHAAAAATSGAPAGVTP 12811

>5. 22 scaf 946 CYP97B AV390436 Chlamydomonas reinhardtii EXXR to end No additional ESTs
CLGESLRMYPQPPILIRRALAEDTLPAglrgdpagypigkGADLFISVWNLHRSPYLWKDPDT 
    FRPERFFEPNSNPDFGGKW (?)
188 AGYRPDAVTGGAALYPNEVASDFAFIPFGGGARKCVGDQFAMFEATVAAAMLLRRFTFRLAVPAEK (0) 385
604 VGMATGATIHTANGLSMRVTRRTPGGGSGSGAPG 705

>4. 14. scaf 521 BE726345 N-term to C-helix 33% to CYP711A1 BM002146 BI728655
37486 MQDVISFLLNGLGFAAVGLVVL (0) 37551
37671 QLVLSLDLYKRWKLRHLP (1?) 37724
37956 GPPALPLLGNLPQILAKGSPAFFRECRAKYGPVFR (0) 38060
38400 VAFGRNWMVVVAEPDLLRQ (0) 38456
38720 VGGKLLNHSMFRGLLGGEFAKLDDWGLVSAR 38812
39351 DDFWRKVRAAWQPAFXAPSLSGYFPLMTDCAVRLADKLEGLARRQPG 39491
39994 YGRVLVQACRDVFKYSSVVYGS 40059
40319 YSRVGLLFPEWRPVVAILANAAPDLPFKML 40408
40595 QARTHLRDACMSLIDGWKKQ 40654

41049 QVQTFILAGYETTANALAFAVYCLATNPEGE 41141
41578 RLPTEADLPRLPYTEAVFNETMRLYPPAHATNRHTDKAPMQ 41700 
42000 GPYTLPKDTTLFMSIFSAHHNTDVWPRVNDFVPERFLPVS 42119
42294 ESPLYPEVAARVPHAHAPFGFGSRMCIGWKFAVQ (?) 42395
42712 QEAKVALAALYQRLTFELEPGQ (0) 42771
43237 VPLQTAVGITLSPRNGVWVRPVARRLTPRQPTTPPVGSAAK* 43362

>28. scaffold_33
31584 IPGNGLLVSDGPVWQRQRRLSNPAFRRAAV 31495
30609 PSDLLTSLLLARDEDGSGMSDQALRDELMTLLVAGQ (0) 30502
30091 ETSAILLGWASALLAAHPEVQAAAAAEVAAVCG
29766 VRHMPYLESVVLETLRLYSPAYMVGRCARRDAALGPYVLPAG
      TTVLVSPYVMHRDPEVWEEPEVFRPERWQELQRR 29548
29296 EGYSGYMGLMSNLGPNGAYLPFGGGPRN 29261 (SEQ GAP HERE)

>23. SCAF 636 
13432 WPAATVAMLGTDSVTFST 13379
13145 GAYHRSLRRLLGPCFSPQ 13092 C-helix?
12878 AVEGYLPSIQAICERYCAEWAAETTAAAAAAAPAATGGDSSA

12363 GIFAPVALAIPGSN 12322
12098 YAKASAARKVMVAAL 12054

      LLFAGHETTATSIVRLMLVLANP
      RPDVVSRLREEQAAAVRQHGAAIS
10590 GSSIRDMPYLDAVVKETWRCHPVVPMVPRRAVRDFTLGGHDVPQ (0)
      GWGVVLGLVEPMRDLPAWSGLTPDSPLHPSHFNPDR (1)
      WLSGRSSASGN (?)
      GMLPPQMLTFGGGGRYCLGANLAWAELK (0)
      VFVAVLLRGYDFTSPLPELEVKLFPALTVAQGFPIE

>40. scaf 171 runs off end
3676 MAPLLDAKQLELLGIGMQLAAVLLVLYYLLKWLAGKRGGVPGPAFYLPAIGETLSLFASPTRYMWK (0) 3500
NWLEYGPFFRTHLLGYPLYVVGSPGLLKPVLGDDSAFEFF (0)
VPGKTFTMLISDIRHMQVPEQHAVF
RRRLGQALNPGALSRHVMAPLRVVLERHLDAWEAAGRVQLAEA

1651 LAGLYGVPLPWLPGTAIHSALRAQRRLMALLGP 1553
GTPLSLTKEQIFERALGVVIASDDTSKHLFFFELVAAAMLPGVWAKLEEEQKQ (0)
ectvirygtllpdwphattviitamtsq  (THIS MAY BE INTRON)
AMRKYGDELSYSILNDMPYLDAVIK (0) 497

>27. scaffold_3547 Length = 4667 partial seq with large gap
LCPTSASLSRCRPQHPTRVGKYLVPAGTPIGTALFAIHNTRHNWTDPLAFRPQRWMGESS
4545 LSYMPFSEGPRSCVGQSLAKLEVMTVLATLLAHFR 4646

>25. scaffold_574 RUNS OFF END Length = 44,663 97C1 LIKE 68%
44288 VPLGQDVMISVYNIHHSPAVWDDPEX 44214
43839 FIPERFGPLDGPVPNEQNTDF 43777
43352 SYIPFSGGPRKCVGDQFALMEAVVALTVLLRQYDFQMVPNQQ 43227
42864 IGMTTGATIHTTNGLYMYVKER 42799

>1. scaf 306  AV623700 N-term
MASSSSPLEELLAFAGVKDGTISSPRLALVVLGAALAAYALVFAVINVVDYIRIARGLSAIPSAPGGVPLLGHVIPMLT
CVSQNKGAWDIMEDWMDAKGPIVKYNIAGTQGVAVRDPKAMKRIFQTGYKLYEKDLKLSYRPFLPILGTGLVTS
DGALWQKQRMLMGPALRVDVLDDIIR
IAKKAIDRLCEKLSHHAGKGDIVDIEEEFRLLTLQVIGEAVLSLGPEECDRVFPQLYLPV
MNEANRRVLRPYRMYLPTPEWFRFSSRMGQLNGFLIDLFRRRWQARQAAAAAAQGEGSSS
SKPKPADILDRIMEAIEESGAKWDAALETQLCYEVKTFLLAGHETSAAMLTWSTLELAAH
SQAADKVVEEARAAFGPR
(SEQUENCE GAP)
GPRNCLGQHLALLEARVVLGL
LHARFSFKPAPSVHPDPASLFMRHPTVIPVGPIRGLKVLVEQRK

>41. scaffold_2262 N-term, runs off end
2459 MQLTWLGWAPVTRWRLRNIP (1?) 2400
1885 GPFALPFLGHLPAISARDLVHFCHDVARQYGPV 1787
1503 SLTQVWVAARPWIVVSDPVAARKIAYR 1423
1222 LARPSTVASFTHALVGEPRQVDDESIF 1142
 784 GPAWKASRRAFETSVLRPDRL 722
 721 AAHMPAVRRCTERFLARL 712

>42. scaffold_1285 C-helix similar to scaf 479
gene region on scaffold not large enough for a P450 
BUT THIS MAY BE INCORRECTLY ANNOTATED
13946 MQIAAMDTTAFLTSSAVKYGPVCK 13875
13607 WFSTQPWVINDPKLVRWVG 13551
12773 FPYRGEAWRRTRRVLEGSIIHPA 12705

>3. scaf 479 AV641971 35% to 703A2 N-term to C-helix 
51492 MYAALALVLSPVLL (0) 51451
51367 ALLWAIINPVERWKTRKIPG (2) 51308
51224 PPGLPLLGHLLNFATGDATDFTVEAVKKYGNVVA (0)51123
50867 IWFGNRAWITIADPALIR (2)50814
50325 KLGFKFLNRPARMTDFGH (0) 50272
49795 VLVGHNAEVDNAGAFVAR (2)49706
49574 GEVWRRGRRAFEASIIHPAS (2) 49515
49142 LAAHLPAINRCANRF (0) 49098
48929 AGNYTMAAVGEVAYG (2) 48885

47764 (0) LMFPALRPLWRWMAEHLPDAAQTENMRARSK (0) 47672
46972 VIGQGFTFLVAGYETSSNTTTMASYLLATHPAAQQRMADEIDAVLG 46832
46831 pwragagagegacagGELTPELLAK (0) 46757
46326 LPYTEAVLQETLRLYPAAPYLLREAREEVDLGGGRVVPK (2) 46288
46008 DSVLVLHVHSMQRDPDVWPQPEAFLPQRYLPEGQAALGPADPNGWAPFGVGARMCVGHKLAMM (0) 45820
45561 VTKVALVRMYQRFRVSLHPRQPLPLKMKTGLVRVPADGVWLTLTER* 45421

>BM446811 halotolerant green alga, Dunaliella salina New not Chlamy 
    HEAVVALSVLLKNFNXEAQLVRNQTIGMTTGATIHTTNGLYMTVKERTPAAAALAGATA*

>16. no scaf PTQ4692.y1 Mid region of CYP97 like seq similar to seq 6 64% TO SCAF 1399
 ARGNIREIVGQTATVPLNKLFLVYVQIFRVSFRPRASGSSLSPHDAKEILRTNADKYSMGLLTKILDLVM ST851

>9. scaffold 1959b 25D BI527318 BG852189 BE129324 BI527323 BI527331 28% to 702As
MGEQGAAAGTPLALAATLLAGTILVFYIYQQLKPSKSRLPGPLFSWPFLGDTIEFATTDPTKFLFGR
FKRYGR 
185655 VFRLSLLGFTAYVTADPEALRPLLADEGGHFTIPVQTFTALMGAYNLQAHKEVHAAW 185825
RKVLMAALTGSGMAKLVPGVVAVMGRHVEGWAQAGRV
ELYEAARTLGLDLAVDVLSGVKLEERGIQPXWLKSRMADFLXRLYGLPLALPGSLAR

FTARGCTTPRDAAMTVLHAVMGAADTTRFALFNTWAILAMSPRVQDLIYEEQKK (0)
VVAENGPELTYKTAMSMPYLDAAFKEAMRLLPASAGGFRMLTKELRVGDVLLPPGTIIWFHALLLQTLDPVLWDGDTSVDVP
VHMDWRNNFEGAFRPERWLSEETKPRSYYIFGQGAHLCAGMVLVTLEVKL
LLAMVLRKWRLQLEVPDMLARAELFPYPKPPKGTGGIRLIAREQPLALGARTQGGVNLGSRVFEF*

>2. scaf 846 BI528139 33% to 707A2 possible 85 clan member (complete)
28201 MDLTKIHEDPIGLLLAMIAGALVAFFLLARKEKRPLGPMFTLPILGDTVALALSEQSRFMFSR (2) 28013
27729 YKKYGSVFRLNLLGKHMYILSDLEALRGPYRDEGAIPEVPFPTFKLLMGDFNVAGGGKHIHGPW (0)27538
26890 RKASLAALGPAGLQSMFPPVLRVMQSHLSEWEAAGRVEVFQS (0) 26765
26576 ARRMGLELAVDVVADVELSPAVDRAWFKQQ (0) 26487
26101 AETWLYGMWGLPVPLPGS (2) 26048
25807 ALAKALAARKVLLRVLGQELAADHEDYKSR (0) 25718
25284 WTELGSSGAAMADDL (0) 25240
24803 HSALAVLHAVMASADTTRFALFNTWALVAQSARVQEKLYEEQQK (0) 24672
24589 VIEEFGPELSYKAASSMP (2) 24536
24153 YMDATIKECMRLLPASAGGPRKLTQDLKVGEVVLPA (1) 24046
23660 GSFVWMYSYLLHCLDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWLSEETKPKYY (0) 23493
23362 FTFGYGNHLCAGINLAYL (0) 23309
23164 EIRTMLALVIRKYRLRLQTPDMLSRARYFPFVEPSPGTDTVLLEAR* 23024

>10. scaffold 1959a no ESTs
7978 GAFPYPTTHPSTITLHVTITQWPFLGDAVELGITXXXXXXXX 7937
7765 RFKKYGRVFRLNLLGHTAFVVRTP 7742
7434 SDEAALRGVLSDDGAIATIPFRAF 7411
7197 LMGEYGTQSVKEIHGPW 7181
6868 RKLIMAAVNGRGLSELVPGVAGVMARHVAGWAQAGRV 6832
5973 GLPVRLPGSDYSAALAAKERLIAALMPEMRDAHAAMLKRWEAAGRSGPALAAALLEE 5803
5465 TALRDAPMTILNAVVAAADTTRFSLFTFWAMVAMSTRVQEEIFGEQQR (0) 5420
4094 VVAAHGPELTPAALSSMPYLEACFKEAMRLLPTGGGAVRHLTKELKAGSVTLPAGEWVWY 3915
3914 HPHLMHCIDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWLSEETKPKYYFTFGSGVHLCAGVNLVYL 3711
3498 EAKLVMAMLVRRFRLRLSAPDMLARCTRVFPFMQPVPGTDKVELLPREQPL 3346
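
Note on checking the numbering in entries like the one above. As described in the numbering note at the top of this file, some 3 prime coordinates were counted in amino acids instead of nucleotides. For example, the line "7765 RFKKYGRVFRLNLLGHTAFVVRTP 7742" has 24 residues but spans only 24 nucleotides, where 72 would be expected. The following is a minimal Python sketch of how such lines might be flagged. It is not part of the original analysis. The function name check_exon_line, the regular expression, and the 2-nucleotide slack (to tolerate codons split at intron phases) are all assumptions made for illustration, and the stop codon "*" is counted as one residue.

```python
import re

# Assumed line layout: start coordinate, peptide, optional phase such as "(0)",
# end coordinate. Lines without both coordinates are skipped (None is returned).
LINE_RE = re.compile(
    r"^\s*(\d+)\s+(?:\(\d\)\s*)?([A-Za-z*]+)\s*(?:\(\d\??\)\s*)?(\d+)\s*$"
)


def check_exon_line(line, slack=2):
    """Classify one exon line and return (status, expected_end).

    status is "ok" when the span is about 3 nt per residue,
    "end_counted_as_aa" when the span is about 1 per residue,
    and "unclear" otherwise.
    """
    m = LINE_RE.match(line)
    if m is None:
        return None
    start, seq, end = int(m.group(1)), m.group(2), int(m.group(3))
    n_res = len(seq)                    # residues, including a trailing "*"
    span = abs(end - start) + 1         # inclusive span as written in the file
    step = 1 if end >= start else -1    # -1 for genes on the reverse strand
    if abs(span - 3 * n_res) <= slack:
        status = "ok"
    elif abs(span - n_res) <= slack:
        status = "end_counted_as_aa"
    else:
        status = "unclear"
    # End coordinate implied by the start and 3 nt per residue.
    expected_end = start + step * (3 * n_res - 1)
    return status, expected_end


if __name__ == "__main__":
    for text in (
        "7765 RFKKYGRVFRLNLLGHTAFVVRTP 7742",
        "3380 RLPAGPFGLPFLGNL 3424",
    ):
        print(text, "->", check_exon_line(text))
```

The expected end is only a suggestion based on the 5 prime coordinate. Exons that begin or end inside a codon can differ by one or two nucleotides, so flagged lines should still be checked against the scaffold sequence.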

>30. scaf 25a no ESTs 
125241 MAVFGFRELFASMYIPGLSPVLSTITCLAGVLLFLAWQRHSRATSVPRLGPLLTIPLLGDVAWLAADPTRFVFGR 125465
125568 RFQRYGPTFILNLMGVPLYVLTQPADLRGPYRDQGAEPDVPFSSFRRLM 125714
125957 XXXXXXXXXXXXXXXRRMFLSALGPAGLQALLPRAQAVMQAHLAQWEAAG 126061
126427 FLDGLFGLPLALPGSSVARALAAKEELVAALGPLVAADRQRMAKR 126561
126753 WRAAGSSYAALVDTL 126797
127088 RAAAVSVLHAVVAGADTTRFALFNTLALVAMSARVQEEIFAEQER 127222
127425 VVAEHGPELSARVLGSAAITPYLDAVVREAMRLLPATPGNMRRLTADLRVGXXXX 
127766 PAGSMVWRFVPLMHCLDPVLWDGDTSVDVP AHMDWRSNFEG 127888
127889 AFRPERWLSEDTKPKYYYTFGSDNHLCVGQNLAYM 127993
128173 EVKLLLAMLLRKYRLQLHTPDMLARASQMFPFVIPRRGTDRVLLEPR* 128316


>12. scaf 25b BI724239.1 1031069F06.y1 C. reinhardtii CC-1690, Stress II EST support 
138956 MAGLATFEPSAQTPLTWSLALFSSFVAGLYVTFAIYRSFGKGAKKLPPGPLLHVPLLGDGVLMAAGNPVKMFWDR 139180
142270 YRRYGSVFRTMMLGSRIWVVTDLDALRGPLRDEGAYLEIPFKAFQRLV 142413
142602 SAESFLNRPGVHGPW 142646
142734 RKIFSATLAPPRLAAMVPKIAQ (0) 142799
142972 LMQSHLSKWEEQGQVTIFRA 143031
143173 ARVMGVDLAVDVILDIKLLDGTDRAWVKS 143259
143521 VEDYLDGLYGLPLNLPGSTLSKALAARARLVEVFLRQPDVAAMQAQF 143661
143850 WEAIGKSPQAYAAAVLDQ 143903
144445 DGAMSLLHMLVASADTTRFALFNTWTLLAMSPRVQDKLYEEQKK 144576
144828 VMAEYGEELSYAATCHMPYMDATLKECMRLLPASAGGIRKLTADMQVGGYTVPAG 144992
145328 XXXXXXXXLMHYIDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWLSEETRPRYMFTFGTGAHLCIGMNLVYL 145522
145693 EVKLLLSMVLRKYRLRLHTPDMLLRCERLFPFFLPAKGTDTVLLEPR 145833

>18. scaf 25c PTQ11643.x1 PTQ6387.y1 N-term 38% to seq 2 and seq 12
169549 MDYMQLLVGLLAILLASILLLRSSGKRLSPRFRVPLLGDTIKMAKRPAEFLFSR 169388
169172 FKEFGPVFTLDLMGSTYWVVADMDAQRRFLYRTEGASAEIPIKSFKMLTELPSPNSDRVNHATW (0) 168982
168818 RKATMAAVGPHALHTLFPPVLEVIRAHADRWT 168723
168370 RKLGLDLSVDVVAGVDLPQSVDRGEFKKQ 168342
168037 VEVWLDGLFVLPLALPGTKLARAMAAKKWLLATLMPALSDVHGRFSKQ (0) 167894
167270 (2) TGLRESAIAVLQAVAAAADTTRVTLFTVLALVAMSPRVQEEIFAEQQK 167136
166905 VIAEYGSELSYKVVSDMPYLEAVVKEAMRLLPPAAGGMRVLSEPLTVGDVTLPTG 166687
166388 ALLLSYSFLMHCIDPALWDGDTSVDVP AHMDWRNNFEG 166275
166274 AFRPERWLSEETKPKYYYTFGVGKHMCAGIHLVYMVRVQQ (0) 166155
165982 EVKTMVALLVRKHRLKLQTPDMFERATWLPFTTPAPGTDTVLFEPR* 165842

>31. scaf 156a  
3380 RLPAGPFGLPFLGNL 3424
3591 IQIAAMDTTAFLTSSAVKYGPVCK (0) 3662
3831 VWFGTRPWVLINDPELIR 3884
4264 RRHSFRWPARPANFASYFHVMTGENRAIDRAGVVLA 4371
5063 PPSLAAHVPAMLRCLGRFTARL 5128
5177 GDLMLAAMGQIAYG 5218
5902 LMFPALEPLWLWAAHHMPDAKQTKAMRARSK 5994
     VAEVSRLLMEQWQANKAAAVAAAASGGAGGADGGDRAGGFKEVGGGISSSSFMAAMMEGRRGAVEDRLSDIE
6987 AGYETTSAATSLALFLLATHPE (0) 7049
7448 AAARLAAEVDAVLGGRELTAELLAE (0)
8071 KLPYTEAVIKETLRLHPGITFLVREATEDVDLGAGRVVPR 8190
8546 GSTLCMATHAVMHDPDIWPEPEAFRPERFLPEGS 
8714 PFGMGTRMCVGHKLAMMVS
9143 QASKATLVSLCQRFSFALHPKQPLPLKLKTGLTYGPADGVWMTVTRR

>11. +19 SCAF 156B PTQ5694.x1  K-helix to heme = PTQ11662.x1 PTQ243.x1 PTQ52.x1 PTQ9722.x1
18913 AFVWLAYNLPERWRLRRIPG 18854
18740 GPVGLPFLGNILSFSTYGHDYFAMMEKYGRV 18648
18338 IWFGVNPWIVVSDPALLR 18285
18027 KLAYKCVGKPASMSEYGHVLTGENYEIEQANAFVAS
17778 RGEVWRRGRRVFEASVIHP 17722
17477 ASLAAHLPAINRCANRF 17427
17301 EVGSYTMAVVGEVAYG 17256
16350 Q VMFPWARPLVRWLATHFPDRAQREHMAARTQI 16318
IANISRLLMERWAASKKAAAAAAGTGGGAGNAAGAGGDRAGGFKEVGGGISSSSFMAAMMEGRRGAPQEERLSDVE
15798 FIQQVIAQSFLFVLAGFETSADTLALTCYLLATHPE (0) 15691
      AAARLVAEVDAVGGRELTAELLAE (0)
15294 GLPYTEAVIKEAMRLYPPVPYLLRQAREDLDLGKGMVAPK (2) 15175
HSYVVLYVHSMHLNPDVWPHPERFLPQRFLPEGSAAFGPADPGAWAPFGIGARMCVGHKLAMM
MAKTLLVRMYQRYRVALHPSQPLPLRMKAGLSRVPLDGIWLTLTER

>32. scaf 156c  
23868 PGALGWPFLGSIPEFSIYGYEYVLGLSAKLGN (0) 23773
23439 AWLGVEPLIIICDPALIR 23386
23162 KYAYKCVSKPPSMSEYGHVLTGFNYDVDQASAFVA 23058
22787 RGEVWRRGRRVFEASVING (0)
22557 PASLAAHLPAINRCANRFVAQL (0?)
22396 IVGGYTMAVTGEVAYG 22349
21478 Q VMFPWARPLVRWLATHFPDRAQREHMAARTQI 21446
IANISRLLMERWATSKKAAAAAA
GKGAEEAIKEVGGGISSSSFMAAMMEGRRGAPQEERLSDVE
      VIIAQVIAQSFTFV
20858 MAGFETTALTLSLVTFMLATHPE (0) 20791
      AAARLTAEVDGLGPGELTHEVLAE (0)
20358 KLPYTEAVIKETLRLHPPIPYFIREAREDLDLGNGMVAPK 20233
19945 GSYLTMYMHAVHLNPDVWPHPERFLPQRFLPEGSAAFGPADPGAWAPFGIGARMCVGHKLAMM 19775
19557 QMAKTLLVRMYQRFRIELHPRQPLPLKMKTGLSRVPVDGVWVTLTER 

>43 scaffold_683
LVTMLLGGTDTSALTVAFAAWHLAAEPQLQAELRREVGARSG
8306 DVWHTDTSLPLPAMPAAPLFCPSRPQPNAFLPFGVGSRSCIGR HFGLLSTQ  8178
VRRGGGWKRSGGGEGSGGEGWGTHGAWPRAFRDRGPHGGKWVWLLQGPGLLV* ?????
LAALVARFEVLP 7789
PAPPAPTALDWSQSIVITSRSGVWLR 7711

>13. SCAF 712 AV627084.1 Chlamydomonas reinhardtii 5% to 0.04% CO2 cDNA clone 
35102 MTFLQLLPGVPLVLLGVLALPV (0) 35037
34921 VITLVQEVITKRKYRHIP  34868
34694 GPKPQPISGNLREFLTSPGGLLGCLEGW (0) 34611
      VK
SANGSSTNSTSGSSSSTGVAPGSFLGLML
31374 MAPTLTDAQIEAQVQTFLLA  31315
31010 GFETTANALTFAVYLLACHPE 30948

29287 FRPERFLSPDVPGSAPELAARHPHVHLPFGSGPRMCIGWRFAMQ (0) 29156
28541 EAKTVLSRLVQAVDFTLAPGQAAPLDTVAGLTLAPRNGVWVRLSPR
GGGGSGGGGGRGQEVATAAAKGAAVRSAAA* 28308

>17. no scaf 20021010.6327.2 new seq Length = 408 mid gene
HEDMESEFLSLGLDIIGLGVFNFDFGSINSESPVIKAVYGVLKEAEHRSTFYLPYWNLPL
ADVLVPRQAKFRADLKVINECLDNLIKQARDTRVAEDAEALQNRDYSKVSDPSLLRFLVD
MRGEEPTNKQLRDDLMTMLIGGHETTAAV

>26. scaffold_781
MRSSSRGAKIGRAYPTAHHIDGRASGGRPLHFGLHPCHRPCLRAKAAQSGLAE
LPLPEGSLGLPVVGETLELITN (1)
GDTFGTSRRERYGDVYKTNILGAPTVMV

STPLTAYGKAVAARQEFGQLVSQSIQRSRQHTA
12675 RYAHVSRNGRERRLTPEPHLSMVCLNHLNALSTWWPVMTRIAVPPWPTAVRQDIVSRHGPA
      ITAEALDEMSYGTAVARELLRITPAVPAVFRLALVDFELQGRRIPK 12376
      GWRVWCHVGDSVTRYNKDQFQPERWLGSSG
11834 QPEYSLPFGSGVRTCLGRNLVMTELLVVLAVLARGYEWEAVNPAEQWGVVPSPAPKEGLRVRLHRR 11637

>33. scaffold_806 missing N-term to KYG motif
220 IWLGNQPWVCVADPDLIR
568 RVAYRVLSRPFSHTDSIHLLAGEQWEVDCNTLVSS 672
1530 SLAGHLPAVWRCVRRYTPRL 1589
1838 LADLTLAVVGEAAYG 1882
2710 QMIWPGLTPLWRWMAKHLPDAAQTRHMR 2737
VADVSRQLMAQWQAAKAKTAAAA
FVEVGGGISSSSFMASLLEGRRGAAKEEERLTDLQ
3660 QIVAQCLTFLLAGFETTAATISFTAFCLATHPEAQARLLAE 3782
     GQQQREGDDALPE
4526 LPYLDAVLKESMRLYPAGSALIRKSPQPLDLGRDGLVIPG 4645
4956 MHDPAIWPEPEAFRPERFLPEGSSSLGPMVGGAAASAPAGGGADAAAAAWVPFGMGPRM 5132
5133 CVGSKFATM 5153
5425 VSKAVLLQIYRRFTFELHPKQ (0) 5484
     VLPLRTRTALTHAPRDG 5818

>34 scaffold_437a similar to scaf 521 95% to 437b
36702 GFALLLVSLIIYLLDPIKRWRLRKIPG
36433 PGPRGRPVLGCLPQLRAQPMPLFLQSCAQTYGPVFKAS 36320
36203 QVALGRKWVVVLADAEMQRQVDGAGSERGQGGGAQIRL
      WRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAG 35376
35354 AGGGSSVDMWRELGGMTLQV
35034 GYGKQLAAACGQIFRYGSPVHGSP 34963
34836 HSYLRLAMLFPELRSLLLTLAHTLPDEKFTIL 34741
34577 LQARTRLCNTVFQLIDSWKEQH 34512
34445 SSNGVGAAATSGRGGLSGVAPGSFLDLMLGQRQGGERGSGGKKAEGEEGVEHAPLTDEQVAGQ 34257
34116 VQLFILAGYETTANALAFAVYCIATHPEGTATYR 34015
33849 PLAAVESRLLHEVDDVLPGSDQLPGESDLPRLAYTEAVVNEALRLFPPAHLTSRVVPPGETLT
33568 FNIPAGIPIFLPLYIAHRDPAVWPRAEEFLPERFLP (0)  33452
31822 SSPQYESLQPRGAAQQHAHAPFGYGSRMCIGYKFAMQ 31712
31582 EAKVALATLYRRLTFTLEPGQQPLKLVASVTMSPRGGLHVTPVPRR 31445


>35 scaffold_437b
41842 GLALLLASLLIYLLDPIQRWRLRKVPGER 41756
41597 GPPARPLLGCLPQLRAQPMPLFLQSCAQTYGPVFKAS 41487
      QVALGRKWAVVLADAEMQRQVRGTGAERG 41295
      WRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAG 40216
40194 AGGGSSVDMWRELGGMTLQV
39794 GYGKQLAAACGQIFRYTSSAHGSP 39723
39570 HSYLRVAMLFPELRRLLVPLAHTLPDKRFAILMQ 39469
39298 LQARNRLSGAVFQLMDSWKQQH 39233
39091 SSNGVGAAATSGRGGMAGVAPGSFLDLMLGHRQGGGSGSGGKKAEGEEGVEHAPLTDEQVAGQ 38903
38777 VQLFILAGYETTANALAFAVYCIATHPEGTATYR 38676
38510 PLAAVESRLLREVDDVLPGSDQLPGESDLPRLAYTEAVVNEALRLFPPAHLTSRVVPPGETLTVRVTN
38229 FNIPAGIPIFLPMYIAHRDPAVWPRADVFLPERFLH (0) 38128
37666 SSPLYESLQPRGAAQQHAHAPFGYGSRMCIGYKFAMQ 37556
37363 EAKVALATLYRRLTFTLEPGQQPLQVEASLTMAPRGGLRVTPVPRR 37226

>39. scaffold_2693 Length = 5632 frags out of order ALMOST IDENTICAL TO SCAF 437
2984 GFALLLVSLIIYLLDPIKRWRLRKIPG 2904
2701 PRGRPVLGCLPQLRAQPMPLFLQSCAQTYGPVFKAS 2594
2490 QVALGRKWVVVLADAEMQRQVDGAGSGRGRGGGAQIRLRDVAGTMAHGR 2344
1800 WRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAG 1669
1342 GYGKQLAAACGQIFRYGSPVHGSP 1271
1145 HSYLRVAMLFPELRSLLLTLAHTLPDEKFTIL 1050
882 LQARTRLCNTVFQLIDSWKQQH 817
753 SNNGVGAAATGGRGLS GVAPGSFLDLMLGHRQGGGSGSGGKKAEGEEGVEHAPLTDEQVAGQ 568
447 VQLFILAGYETTANALAFAVYCIATHPEGT 358
187 PLAAVESRLLREVDDVLPGSDQLPGESDLPRLAYTEAVVNEALRLFPPAHLTSRVVPPGETL 2

3945 SPLYESLQPRGAAQQHAHAPFGYGSRMCIGYKFAMQ 3838
3648 EAKVVLATLYRRLTFTLEPGQQPLQVEASLTMAPRGGLRVMPVPRR 3511

>29. scaffold_5 RUNS OFF INTO A SEQ GAP
5199 (0) DVVPPLQDVVLAGWSVPAGAEVWVDVHAMHRNPQLWRDPDRFNPERWAEH (0) 5050
4746 ASEAPLCSPLAFMPFGSGPRSCLGQQLAAAELKAALAVLLCFLALEPTGDPADE 4585
     PRPAAGLFLRPAGGLHLLLVHRQRGQRAGAA* 4489

>36. scaf 467 
14775 AVLALLLALHVLADPLQRWRLRHIPG 14852
15120 GPPALPLLGSVPAMMRAGGPFFFRQCFAKYGPVFK
15414 AQVAMGRKWVVVVADAELMRQ
16227 WRLLRGAWQPAFSSAALSGYLPLMSACGLRLAQQLQA 
16969 YGRRLAVACGDVFRFGSALHGS 17034
17266 SYQRIGLLLPELVPALVPLAHSLPDPPFKRLQR 17364
17660 QARSTLLAACMELIRSWRQQH 17722
18159 PTAAHTYIHSPLAWPRGHTAHPQVQTFLLAGYETTANALAFAIYCVATHPEGEWRSEGP 18335
18336 RGRAGERRAGSEDGTAARY 18392
18987 RPPTESDLPRLPYTEAVLNEAMRLFPPAHATTRIVEAGAPLQ 
19333 GGVSLPPRTPLILAIYSAHHDPAVWPRPEDFIPERFLP (0) 19479
19665 ASPLHSEVAARVPGAHAPFGYGSRMCIGWKFAMQ 19715
19932 EAKLVLALLYQRLLFRLQPGQVPLPTATALTLAPRDGLWVRPVLRR 20069

>37. scaffold_2628 Length = 5882 runs off end
NOTE: CANNOT FIND AN AG-GT BOUNDARY AT LAST EXON.  
THIS MIGHT HAVE A LONG INSERT IN IT AND NO INTRON
5870 ELVVSPYVLHRLPRLWGPHAACFQPERFMPPPPRP 5766
5066 PPAAGGGCTEPAAAGPYLPFGAGPRACPGASFGSAEVKLLVAHVVMRYSLELLQPPPPSPR 4884
4643 QQLFVSLRPGPGVRVCFVPRHQQQVE* 4563

>38. scaffold_668 runs off end
508 RGSFVSISIYNMHRDPAHWKEPERFIPERFLQ 603
905 AATGGALGPTDPGAYVPFGSGPRMCVGYKMAIM (0) 
1539 VVKSVLAGLLLRYRVALHPRQPLPLRLKTGLTLEPADG 1652