This file last modified Sept. 10, 2003
D. Nelson
For a link to the old file on Chlamydomonas P450s see old file
CHLAMYDOMONAS HAS AT LEAST 33-34 P450 GENES (SEE C-TERM ALIGNMENT), one may be a duplicate (scaf 2693).
The sequences include a clear CYP51 (scaffold 58) and a clear CYP710 (scaffold 690). There are several CYP97s, including four CYP97As and a probable CYP97B. These are the sequences with the best percent identity to Arabidopsis P450s. Scaffold 1399 is 61% identical to CYP97A3. There is a cluster of CYP707-like sequences (probable CYP85 clan), another cluster of 10 CYP711-like sequences, and a cluster of 5 CYP709-like sequences (probable CYP72 clan). There is no CYP74 (allene oxide synthase) or CYP73 (cinnamate 4-hydroxylase). There are no obvious CYP71 clan members (or Plant group A P450s). This is rather surprising, since they dominate the Arabidopsis and rice P450 collections. The single sequence that is most like CYP71s is on scaffold 175. It is so different from other known P450s that I cannot assemble it completely. A Blast search of this seq against Arabidopsis turns up CYP71s as the best hits, but with only around 29% identity from the I-helix to the heme. Identity is about 35% from the K-helix to the heme.
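The percent identities quoted above come from Blast comparisons. As a rough illustration only, the sketch below computes percent identity between two rows of an alignment, assuming (this is an assumption, not the method used for the numbers in this file) that columns with a gap in either row are skipped and the remaining columns form the denominator. The function name `percent_identity` is hypothetical, and Blast itself may report a different value because it counts over its own alignment length.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither row has a gap.

    Assumption: gapped columns are excluded from the denominator.
    Other tools may instead divide by the full alignment length.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both rows carry a residue.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Toy rows, not taken from a real gene: 7 comparable columns, 5 matches.
    print(round(percent_identity("FGPG-RAC", "FGSGPRSC"), 1))
```

Rows copied from the partial alignment in this file can be passed in directly, provided both rows are padded to the same length.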
The analysis of these sequences is continuing. Six or seven sequences have very little resemblance to known P450s in the N-terminal half. I have not been able to assemble the complete gene in these cases, even when using the Genscan server and blasts against plant ESTs as aids. These may require EST support to be able to assemble them correctly.
Note on sequence nucleotide numbering. The JGI genome blast server does not count nucleotide numbering correctly in its output for TBLASTN searches. The nucleotide numbers are correct on the 5 prime or left side, but the 3 prime side has been counted as amino acids and not as 3 times that value. Not all of these errors have been corrected below.
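The numbering problem described above can be undone arithmetically. The sketch below assumes that the reported 3 prime value equals the 5 prime value plus or minus (number of residues minus 1), which is an inference from the note rather than a documented property of the server. It appears consistent with uncorrected entries in this file, for example the scaffold 1959a segment listed as 7765 to 7742, which spans 24 residues. The function name `corrected_end` and its `strand` parameter are hypothetical.

```python
def corrected_end(start: int, reported_end: int, strand: int = 0) -> int:
    """Return the 3' nucleotide coordinate implied by a residue-counted end.

    Assumption: the server reported start +/- (residues - 1) instead of
    start +/- (3 * residues - 1). Pass strand=+1 or -1 when it cannot be
    inferred (a one-residue hit where start == reported_end).
    """
    if strand == 0:
        if reported_end == start:
            raise ValueError("strand is ambiguous; pass strand=+1 or -1")
        strand = 1 if reported_end > start else -1
    # Number of amino acids covered by the hit (inclusive count).
    residues = abs(reported_end - start) + 1
    # Each residue covers three nucleotides; the span is inclusive.
    return start + strand * (3 * residues - 1)


if __name__ == "__main__":
    # Minus-strand segment: 24 residues starting at 7765.
    print(corrected_end(7765, 7742))
```

Entries whose coordinates already differ by about three times the residue count have been corrected and should not be passed through this function.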
- For a link to the JGI Chlamydomonas blast server see: JGI Server
- For a link to the Chlamydomonas EST blast server (which also does the genome with correct nucleotide counting) see: EST blast server
- For the GenScan server see: Genscan server
Scaffold numbers are given followed by arbitrary seq numbers for my use in [n]
5 frag [29] frag EXXR to end, runs off end upstream (Join with scaf 171? No)
25a [30] four genes 125k almost complete
25b [12] four genes 138-145k almost complete
25c [18] four genes 166-169k almost complete
25d [9] four genes 186k almost complete this one joins scaf 1959b
33 frag [28] C-term in seq gap missing N-term and middle (not recognizable)
58 complete [7 8] = CYP51
156a FRAG [31] almost complete
156b frag [11 19] almost complete
156c FRAG [32] almost complete
171 [40] runs off end missing C-term EXXR to end (Join with scaf 5? No)
175 [44] partial, very different sequence (cannot recognize N-term half)
200 (2/3) [20] almost complete
306 N-term + heme [1] missing EXXR to PERF in seq gap
437a frag [34] almost complete
437b frag [35] almost complete
467 frag [36] almost complete
479 frag [3] almost complete
521 frag [4 14] almost complete
574 frag [25] PKG to end, runs off end
636 frag [23] cannot recognize N-term half
668 frag [38] EXXR to end, runs off end upstream
683 frag [43] I-helix [seq gap] heme to end (cannot recognize N-term half)
690 complete [21]
712 frag [13] N-term EST, I-helix, seq gap, heme to end (cannot recognize middle)
781 frag [26] N-term and C-term (cannot recognize middle)
806 frag [33] almost complete missing N-term exon(s) to KYG
846 complete [2]
946 frag [5 22] EXXR to end runs off end
1199 complete [24]
1285 [42] C-helix MAY BE ACCIDENTAL
1399 complete [6 15]
1959a [10] (3-4k) almost complete
1959b + 25 [9] almost complete small gap in middle
2262 [41] N-term only, runs off end
2628 frag [37] C-term, runs off end upstream
2693 frag [39] almost complete almost identical to scaf 437 (duplicate?)
3547 frag [27] partial C-term seq with large seq gap
no scaffold frag 16. mid region only SIMILAR TO SCAF 1399
no scaffold frag 17. mid region only SIMILAR TO SCAF 1399
THESE LAST TWO COULD BE UPSTREAM OF SCAF 946 OR DOWNSTREAM OF 574
A partial seq alignment is given below. Dunaliel is an EST frag. from Dunaliella, another green alga.
seqs. 1-4, 6, 7, 9, 10, 12, 13, 16, 18, 21, 24, 26, 30, 34-36, 40, 41, 43 are N-TERMINAL PARTS
1. MASSSSPLEELLAFAGVKDGTISSPRLALVVLGAALAAYALVFAVINVVDYIRIARGLSAIPSAPGGVPLLGHVIPMLT----
21. MNATGLLNDGLASLGMSGFGDNLASGPALVAAGGALALGYALWEQMKFRWYRSDKNGNMLPGPASVTPIIGGIVEMVKDPYG
34. GFALLLVSLIIYLLDPIKRWRLRKIPG-PGPRGRPVLGCLPQLRAQPMP
35. GLALLLASLLIYLLDPIQRWRLRKVPGERGPPARPLLGCLPQLRAQPMP
36. AVLALLLALHVLADPLQRWRLRHIPG--GPPALPLLGSVPAMMRAGGP
30. MAVFGFRELFASMYIPGLSPVLSTITCLAGVLLFLAWQRHSRATSVPRLGPLLTIPLLGDVAWLAADPTR
10. GAFPYPTTHPSTITLHVTITQWPFLGDAVELGITXXX
18. MDYMQLLVGLLAILLASILLLRSSGKRLSPRFRVPLLGDTIKMAKRPAE
41. MQLTWLGWAPVTRWRLRNIPGPFALPFLGHLPAISARDLV
40. MAPLLDAKQLELLGIGMQLAAVLLVLYYLLKWLAGKRGGVPGPAFYLPAIGETLSLFASPTR
26. MRSSSRGAKIGRAYPTAHHIDGRASGGRPLHFGLHPCHRPCLRAKAAQSGLAELPLPEGSLGLPVVGETLELITN-GD
7. MDLPPELAVLADKVLSLSPVVLVALGSAVLILALAVGRVLFNLLPSKRPPVWEGLPFIGGLLKFTGGPWK
24. MDGFWKTLGLGALLSPVLYALYLASLIVIPYLKSLPLRRKLRHLPGPPVTGFFLLGNVPDLVRTPVH
2. MDLTKIHEDPIGLLLAMIAGALVAFFLLARKEKRPLGPMFTLPILGDTVALALSEQS
42. MQIAAMDTT
3. MYAALALVLSPVLLALLWAIINPVERWKTRKIPGPPGLPLLGHLLNFATGDAT
4. MQDVISFLLNGLGFAAVGLVVLQLVLSLDLYKRWKLRHLPGPPALPLLGNLPQILAKGSP
13. MTFLQLLPGVPLVLLGVLALPVVITLVQEVITKRKYRHIPGPKPQPISGNLREFLTSPGG
9. MGEQGAAAGTPLALAATLLAGTILVFYIYQQLKPSKSRLPGPLFSWPFLGDTIEFATTDPT
12. MAGLATFEPSAQTPLTWSLALFSSFVAGLYVTFAIYRSFGKGAKKLPPGXLLHVPLLGDGVLMAAGNPV
6. ARGDIREIVGQPVF
16. ARGNIREIVGQTAT
13. LLGCLEGWVK
2. RFMFSRYKKYGSVFRLNLLGQ
24. QCMARWAEQYGKIFKLELPTMT
10. XXXXXRFKKYGRVFRLNLLGHTAFVVRTP
30. FVFGRRFQRYGPTFILNLMGVPLYVLTQPADLRGPYRDQGAEPDVPFSSFRRLM
34. LFLQSCAQTYGPVFKASQVALGRKWVVVLADAEMQRQVDGAGSERGQGGGAQIRL
35. LFLQSCAQTYGPVFKASQVALGRKWAVVLADAEMQRQVRGTGAERG
36. FFFRQCFAKYGPVFKAQVAMGRKWVVVVADAELMRQ
41. HFCHDVARQYGPVSLTQVWVAARPWIVVSDPVAARKIAYR
40. YMWK-NWLEYGPFFRTHLLGYPLYVVGSPGLLKPVLGDDSAFEFF
26.
TFGTSRRERYGDVYKTNILGAPTVMV 3. DFTVEAVKKYGNVVAIWFGNRAWITIADPALIRKLGFKFLNRPARMTDFGHVLVGHNAEV 42. AFLTSSAVKYGPVCK-WFSTQPWV-INDPKLVRWVG 4. AFFRECRAKYGPVFRVAFGRNWMVVVAEPDLLRQVGGKLLN----HSMFRGLLGGEFAKL 9. KFLFGRFKRYGRVFRLSLLGFTAYVTADPEALRPLLA-DEG---GHFTIPVQTFTALMGA 18. -FLFSR-KEFGPVFTLDLMGSTYWVVADMDAQRRFLYRTEG---ASAEIPIKSFKMLTEL 12. KMFWDRYRRYGSVFRTMMLGSRIWVVTDLDALRGPL-RDEG---AYLEIPFKAFQRLVSA 6. VPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQILLTNAD--KYSKGLLSEILDFVMGT 16. VPLNKLFLVYVQIFRVSFRPRASGSSLSPHDAKEILRTNAD--KYSMGLLTKILDLVM-- 7. GECFTVPVAHRRVTFLIGPEVSPHFFKAGDDEMSQSEVYDFNIPTFGRGV 28. IPGNGLLVSDGPVWQRQRRLSNPAFRRAAV 1. DGALWQKQRMLMGPALRVDVLDDIIR 41. GPAWKASRRAFETSVLRPDRL 3. DNAGAFMASGEVWRRGRRAFEASIIHPASL 42. FPYRGEAWRRTRRVLEGSIIHPA 34. WRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAG 35. WRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAG 36. WRLLRGAWQPAFSSAALSGYLPLMSACGLRLAQQLQA 4. DDWGLVSARDDFWRKVRAAWQPAFXAPSLSGYFPLMTDCA 9. YNLQAHKEVHAAWRKVLMAALTGSGMAKLVPGVVAVMGRHVEGWAQAGRV 18. PSPNSDRVNHATWRKATMAAVGPHALHTLFPPVLEVIRAHADRWTQQAQ 12. ESFLNRPGVHGPWRKIFSATLAPPRLAAMVPKIAQLMQSHLSKWEEQGQV 6. GLIPADGEIWKARRRAVVPALHRKYVMSMVDMFGDCAAHGASATLDKYAA 7. VFDVEQKVRTEQFRMFTE-ALTKNRLKSYVPHFNKEAEEYFAKWGETGVV 360 12. TIFRAARVMGVDLAVDVILDIKLLDGTDRAWVKS 664 6. SGTSLDMENFFSRLGLDIIGKAVFNYDFDSLAHDDPVIQAVYTLLREAEHRSTAP 17. HEDMESEFLSLGLDIIGLGVFNFDFGSINSESPVIKAVYGVLKEAEHRSTFY 7. DFKDEFSKLITLTAARTLLGREVREQLFDEVADLLHGLDEGMVP 429 6. IAYWNIPGIQFVVPRQKRCQEALVLVNECLDGLIDKCKKLVEEEDAVFGGG 17. LPYWNLPLADVLVPRQAKFRADLKVINECLDNLIKQARDTRVAEDAEALQN 17. RDYSKVSDPSLLRFLVDMRGEEPTNKQLRDDLMTML SEQS 5-6,8-11, 14, 19-21, 25-28, 30-34, 36-39, 43 ARE C-TERMINAL PARTS 8 TRKDLAAIFAKIIRARRESGRREEDVLQQFIDARYQNVNGGRALTEEEITGLLIAVL 21. FASQDASTASLV-WTITLMAEHPEVLARVRDEQ---------YRLRPNPEEKVTGDMLN-EMHYTRQV 8. FAGQHTSSITTS-WTGIFMAANKEHYNKAAEEQQ---------DIIRKFGNELSFETLS-EMEVLHRN 43. LGGTDTSALTVA-FAAWHLAAEPQLQAELRREVGARSG------------------------------ 14. LAGYETTANALA-FAVYCLATNPE----------AEAKLLAEIDAVLGP-DRLPTEADLPRLPYTEAV 13. 
LAGFETTANALT-FAVYLLACHPE-------------------------------------------- 36. LAGYETTANALA-FAIYCVATHPEGEWRSE----------GPSEDGTAARYRPPTESDLPRLPYTEAV 35. LAGYETTANALA-FAVYCIATHPEGTATYRPLAAVESRLLREVDDVLPGSDQLPGESDLPRLAYTEAV 34. LAGYETTANALA-FAVYCIATHPEGTATYRPLAAVESRLLHEVDDVLPGSDQL---------PYTEAV 31. -AGYETTSAATS-LALFLLATHPEAAARLA----------AEVDAVLGRELTAELLAE--KLPYTEAV 28. ----ETSAILLG-WASALLAAHPEVQAAAA----------AEVAAVCGSASPADCSVR--HMPYLESV 38. -------------------------------------------------------------LPYLDAV 32. MAGFETTALTLS-LVTFMLATHPEAAARLT----------AEVDGLGPGELTHEVLAE--KLPYTEAV 3. VAGYETSSNTTT-MASYLLATHPAAQQRMA----------DEIDAVLGGELTPELLAK---LPYTEAV 11. LAGFETSADTLA-LTCYLLATHPEAAARLV----------AEVDAVGGRELTAELLAE--GLPYTEAV 33. LAGFETTAATIS-FTAFCLATHPEAQARLL----------AEVDEGQQQREGDDALPE---LPYLDAV 6. IAGHETTAAVLT-WTLYLLSQHPEAAAAIR----------KEVDELLGDRKPGVEDLRA--LKMTTRV 17. IGGHETTAAV---------------------------------------------------------- Dunaliel -------------------------------------------------------------------- 25. -------------------------------------------------------------------- 20. IAGHETTAAVALTWALHLLVAHPEVMKRVR----------DEVDWVL--GDRLPGSDDLPLLRYTTRV 5. -------------------------------------------------------------------C 23. FAGHETTATSIVRLMLVLANPRPDVVSRLREEQA----------AAVRQHGAAISGSSIRDMPYLDAV 24. IAGFETTAHAIG-WTLMFIAGSPEVESRVA-----AELEGAGLLAVPGRPEPRQLAWGDLGGLKYLNA 1. -------------------------------------------------------------------- 27. -------------------------------------------------------------------- 29. -------------------------------------------------------------------- 37. -------------------------------------------------------------------- 26. SMVCLNHLNALSTWWPVMTRIAVPPWPTAVRQD------------IVSRHGPAITAEALDEMSYGTAV 9. -GAADTTRFALF-NTWAILAMSPRVQDLIYEEQKK----------VVAENGPELTYKTAMSMPYLDAA 40. IASDDTSKHLFF-FELVAAAMLPGVWAKLEEEQKQ----------AMRKYGDELSYSILNDMPYLDAV 2. MASADTTRFALF-NTWALVAQSARVQEKLYEEQQK----------VIEEFGPELSYKAASSMPYMDAT 30. 
-------------------------------------------------------------------- 12. -------------------------------------------------------------------- 18. AAAADTTRVTLF-TVLALVAMSPRVQEEIFAEQQK----------VIAEYGSELSYKVVSDMPYLEAV 10. ---------------------------------------------FVAAHGPELTPAALSSMPYLEAC 21. VKEILRFRPAAPMVPMRAKAPFKL----TETYTAPKGA--------------------- 8. ITEALRMHPPLLLVMRYAKKPFSVTTSTGKSYVIPKGDVVAASPNF------------- 43. ----------------------------------------------------------- 14. FNETMRLYPPAHATNRHTDK-APMQ----GPYTLPKDTTL------------------- 13. ----------------------------------------------------------- 36. LNEAMRLFPPAHATTRIVEAGAPLQ----GGVSLPPRTPL------------------- 35. VNEALRLFPPAHLTSRVVPPGETLT------FNIPAGIPI------------------- 34. VNEALRLFPPAHLTSRVVPPGETLT------FNIPAGIPI------------------- 31. IKETLRLHPGITFLVREATEDVDL----GAGRVVPRGSTL------------------- 28. VLETLRLYSPAYMVGRCARRDAAL-----GPYVLPAGTTV------------------- 38. LKESQRLHPAVGHFWRDATSDIALP--EMGGLVIPKGSFV------------------- 32. IKETLRLHPPIPYFIREAREDLDL----GNGMVAPKGSYL------------------- 3. LQETLRLYPAAPYLLREAREEVDL----GGGRVVPKDSVL------------------- 11. IKEAMRLYPPVPYLLRQAREDLDL----GKGMVAPKHSYV------------------- 33. LKESMRLYPAGSALIRKSPQPLDL---GRDGLVIPG----------------------- 6. INEAMRLYPQPPVLIRRALQDDHF-----DQFTVPAGSDL------------------- 17. ----------------------------------------------------------- Dunaliel ----------------------------------------------------------- 25. ---------------------------------------VPLGQDV------------- 20. VNEALRLYPQPPVLIRRAMQDDVL----------PGGHVVAAGTDL------------- 5. LGESLRMYPQPPILIRRALAEDTL---PAGLRGDPAGYPIGKGADL------------- 23. VKETWRCHPVVPMVPRRAVRDFTL-----GGHDVPQGWGVVLGLVEPM----------- 24. IHESMRLMPPTSGGTVRVVPRDTQ----LAGHVLPKGTMLW------------------ 1. ----------------------------------------------------------- 27. -------LCPTSASLSRCRPQHPT---RVGKYLVPAGTPIGTALFA------------- 29. ----------------VVPPLQDV---VLAGWSVPAGAEV------------------- 37. 
-------------------------------------------ELV------------- 26. ARELLRITPAVPAVFRLALVDFEL----------------------------------- 9. FKEAMR--LLPASAGGFRMLTKEL---RVGDVLLPPGTIIWFHALLLQTL--DPVLWDG 40. IK--------------------------------------------------------- 2. IKECMR--LLPASAGGPRKLTQDL---KVGEVVLPAGSFVWMYSYLLHCL--DPVLWDG 30. VREAMR--LLPATPGNMRRLTADL---RVG----PAGSMVWRFVPLMHCL--DPVLWDG 12. LKECMR--LLPASAGGIRKLTADM---QVGGYTVPAG--------LMHYI--DPVLWDG 18. VKEAMR--LLPPAAGGMRVLSEPL---TVGDVTLPTGALLLSYSFLMHCI--DPALWDG 10. FRRAMRFSLLPTGGGALRHFTKEL---KAGSVTLPAGEWFSGVYHPHLMHCIDPVLWDG 21. LIVPSLVAACKQGYSNPDSFDPDRF------SPERAEDIKYA------------------ 8. ------SHMLPQCFNNPKAYDPDRF------APPREEQNKP------------------- 43. -------------------------------DVWHTDTSLPLPAMPAAPLFCPSRPQP-- 14. FMSIFSAHHNTDVWPRVNDFVPERFLP----ESPLY---PEVAARVP------------- 13. -------PAAFLRPQHTQAFRPERFL-----SPDVPGSAPELAARHP------------- 36. ILAIYSAHHDPAVWPRPEDFIPERFLP----ASPLH---SEVAARVP------------- 35. FLPMYIAHRDPAVWPRADVFLPERFLH----SSPLYESLQPRGAAQQ------------- 34. FLPLYIAHRDPAVWPRAEEFLPERFLP----SSPQYESLQPRGAAQQ------------- 31. CMATHAVMHDPDIWPEPEAFRPERFLPEGS------------------------------ 28. LVSPYVMHRDPEVWEEPEVFRPERWQELQRRCER------------------GLMSNLGP 38. SISIYNMHRDPAHWKEPERFIPERFLQ----GALGPTDP--------------------- 32. TMYMHAVHLNPDVWPHPERFLPQRFLPEGS-AAFGPADP--------------------- 3. VLHVHSMQRDPDVWPQPEAFLPQRYLPEGQ-AALGPADP--------------------- 11. VLYVHSMHLNPDVWPHPERFLPQRFLPEGS-AAFGPADP--------------------- 33. -------MHDPAIWPEPEAFRPERFLPEGS------------------------------ 6. FISVWNLHRSPKLWDEPDKFKPERF---------GPLDSPI------------PNEVTEN 17. ------------------------------------------------------------ Dunaliel ------------------------------------------------------------ 25. MISVYNIHHSPAVWDDPE-FIPERF---------GPLDGPV------------PNEQNTD 20. FISVWNLHHSPQLWERPEAFDPDRF---------GPLDSPP------------PTEFSTD 5. FISVWNLHRSPYLWKDPDTFRPERFFEPNSNPDFGGKWAGYRPDAVTGGAALYPNEVASD 23. 
RDLPAWSGLTPDSPLHPSHFNPDRWLSGRSSASGN---------------------GMLP 24. IPFYAMQRSERVWGPDAAQFRPERWLAAAAGAGGPG-----------------------A 1. ------------------------------------------------------------ 27. ------IHNTRHNWTDPLAFRPQRWMGESS------------------------------ 29. WVDVHAMHRNPQLWRDPDRFNPERW----------------------------------- 37. VSPYVLHRLPRLWGPHAACFQPERFMP--------------------------------- 44. ------------------------------------------------------------ 26. ------------------------------------------------------------ 9. DTSVDVPVHMDWRNNFEGAFRPERWLSEETKP---------------------------- 40. ------------------------------------------------------------ 2. DTSVDVPAHMDWRNNFEGAFRPERWLSEETKP---------------------------- 30 DTSVDVPAHMDWRSNFEGAFRPERWLSEDTKP---------------------------- 12. DTSVDVPAHMDWRNNFEGAFRPERWLSEETRP---------------------------- 18. DTSVDVPAHMDWRNNFEGAFRPERWLSEETKP---------------------------- 10. DTSVDVPAHMDWRNNFEGAFRPERWLSEETKP---------------------------- 21. SNFLVFGHGPHYCVGKEYAMNHLTVFLALLATSLDFPRIRSKVSDDIIYLPTLYPGDSIF 8. YAFIGFGAGRHACIGQNFAYLQIKSIWSVLLRNFEFELLDPVPEAD---YESMVIGPKP- 43. NAFLPFGVGSRSCIGRHFGLLSTQ------------------------------------ 14. HAHAPFGFGSRMCIGWKFAVQ-EAKVALAALYQRLTFELEPGQV-PLQTAVGITLSPRNG 13. HVHLPFGSGPRMCIGWRFAMQ-EAKTVLSRLVQAVDFTLAPGQAAPLDTVAGLTLAPRNG 36. GAHAPFGYGSRMCIGWKFAMQ-EAKLVLALLYQRLLFRLQPGQV-PLPTATALTLAPRDG 35. HAHAPFGYGSRMCIGYKFAMQ-EAKVALATLYRRLTFTLEPGQQ-PLQVEASLTMAPRGG 34. HAHAPFGYGSRMCIGYKFAMQ-EAKVALATLYRRLTFTLEPGQQ-PLKLVASVTMSPRGG 31. HVWAPFGMGTRMCVGHKLAMM-ASKATLVSLCQRFSFALHPKQPLPLKLKTGLTYGPADG 28. GAYLPFGGGPRN------------------------------------------------ 38. GAYVPFGSGPRMCVGYKMAIM-VVKSVLAGLLLRYRVALHPRQPLPLRLKTGLTLEPADG 32. GAWAPFGIGARMCVGHKLAMM-MAKTLLVRMYQRFRIELHPRQPLPLKMKTGLSRVPVDG 3. NGWAPFGVGARMCVGHKLAMM-VTKVALVRMYQRFRVSLHPRQPLPLKMKTGLVRVPADG 11. GAWAPFGIGARMCVGHKLAMM-MAKTLLVRMYQRYRVALHPSQPLPLRMKAGLSRVPLDG 33. AAWVPFGMGPRMCVGSKFATM-VSKAVLLQIYRRFTFELHPKQVLPLRTRTALTHAPRDG 6. 
FAYLPFGGGRRKCIGDQFALF-EAVVALAMLMRRYEFNLDESKGTV-GMTTGATIHTTNG 17. ------------------------------------------------------------ Dunaliel --------------------H-EAVVALSVLLKNFNEAQLVRNQTI-GMTTGATIHTTNG 25 FSYIPFSGGPRKCVGDQFALM-EAVVALTVLLRQYDF-QMVPNQQI-GMTTGATIHTTNG 20. FRFLPFGGGRRKCVGDMFAIA-ECVVALAVVLRRYDFAPDTSFGPV-GFKSGATINTSNG 5. FAFIPFGGGARKCVGDQFAMF-EATVAAAMLLRRFTFRLAVPAEKV-GMATGATIHTANG 23. PQMLTFGGGGRYCLGANLAWA-ELKVFVAVLLRGYDFTSPLPELE--------------- 24. RGFLPFSEGPRNCVGQSLALL-ELRTALALLCGSFR------------------------ 1. --------GPRNCLGQHLALL-EARVVLGLLHARFSFKPAPSVHPDPASLFMRHPTVIPV 27. LSYMPFSEGPRSCVGQSLAKL-EVMTVLATLLAHFR------------------------ 29. LAFMPFGSGPRSCLGQQLAAA-ELKAALAVLLCFLALEPTGDPADEPRPAAGLFLRPAGG 37. GPYLPFGAGPRACPGASFGSA-EVKLLVAHVVMRYSLELLQPPPPSPRQQLFVSLRPGPG 44. AEARPFGIGPRACPAGSLSVV-IVREALAALLTKYRWRLYDEVGDRDWMSGAVSTPTMAF 26. EYSLPFGSGVRTCLGRNLVMT-ELLVVLAVLARGYEWEAVNPAEQW-G--VVPSPAPKEG 9. RSYYIFGQGAHLCAGMVLVTL-EVKLLLAMVLRKWRLQLEVPDMLAR-AELFPYPKPPKG 40. ------------------------------------------------------------ 2. KYYFTFGYGNHLCAGINLAYL-EIRTMLALVIRKYRLRLQTPDMLSR-ARYFPFVEPSPG 30. KYYYTFGSDNHLCVGQNLAYM-EVKLLLAMLLRKYRLQLHTPDMLARASQMFPFVIPRRG 12. RYMFTFGTGAHLCIGMNLVYL-EVKLLLSMVLRKYRLRLHTPDMLLRCERLFPFFLPAKG 18. KYYYTFGVGKHMCAGIHLVYM-VRVQMVALLVRKHRLKLQTPDMFER-ATWLPFTTPAPG 10. KYYFTFGSGVHLCAGVNLVYL-EAKLVMAMLVRRFRLRLSAPDMLARCTRVFPFMQPVPG 21. DLSWSAKK*---------------------- 8. CRVRYTRRKL*-------------------- 43. ------------------------------- 14. VWVRPVARRLTPRQPTTPPVGSAAK*----- 13. VWVRLSPRGGGGSGGGGGRGQEVATAAAKGAAVRSAAA 36. LWVRPVLRR---------------------- 35. LRVTPVPRR---------------------- 34. LHVTPVPRR---------------------- 31. VWMTVTRR----------------------- 28. ------------------------------- 38. ------------------------------- 32. VWVTLTER----------------------- 3. VWLTLTER*---------------------- 11. IWLTLTER----------------------- 33. ------------------------------- 6. LNMFVRRRDPLTVPPPSSSMAEAVSTGYAF- 17. 
------------------------------- Dunaliel LYMTVKERTPAAAALAGATA*---------- 25. LYMYVKER----------------------- 20. LHMLISRRDLT-------------------- 5. LSMRVTRRTPGGGSGSGAPG----------- 23. ------------------------------- 24. ------------------------------- 1. GPIRGLKVLVEQRK*---------------- 27. ------------------------------- 29. LHLLLVHRQ---------------------- 37. VRVCFVPRHQQQVE*---------------- 44. RPPLRVVFARVVEDGGESS*----------- 26 LRVRLHRR----------------------- 9. TGGIRLIAREQPLALGARTQGGVNLGSRVFE 40. ------------------------------- 2. TDTVLLEAR*--------------------- 30. TDRVLLEPR*--------------------- 12. TDTVLLEPR---------------------- 18. TDTVLFEPR*--------------------- 10. TDKVELLPREQPLPVASIDL----------- >44. scaffold 175 very different 63125 (0) HAALLPRLLCRPELSRAEAVANCHSCLLAGYETTAHTLACCLLHLGQRPQ 62976 VGRGRERGGRELARMEVKRGGDRF (2) 62528 GMALLGAVIRETLRVNPPVIGLPRVVSAPGGITVRLPAGSa (1?) 62412 61349 WDPTRTAAPAGAVGADGAAPSDPFAEARPFGIGPRACPAGSLSVVIVREALAALLTKYRWRL 61164 61163 YDEVGDRDWMSGAVSTPTMAFRPPLRVVFARVVEDGGESS* 61041 >6. 15. SCAF 1399 15 EXONS 60% TO 97A3 FIRST EXON PREDICTED BY GENSCAN BM003139 BI725954 BE441929 BI719213 13327 MLSFSTSISGCRFGSWAVPSFGPRRAPTSTPTCRLGFDTGRSAARFLADLGRQWRAEASKRMP EVRLELRPCDGGGRASCPVLGKSTYT (0) 13061 12913 ARGDIREIVGQPVFVPLYKLFLVYGKIFRLSFGPKSFVIISDPAYAKQILLTNADKYSKGLLSEILDFVMGT 12698 12532 GLIPADGEIWKARRRAVVPALHRK 12461 12332 YVMSMVDMFGDCAAHGASATLDKYAASG 12249 11994 TSLDMENFFSRLGLDIIGKAVFNYDFDSLAHDDPVIQ 11884 11707 AVYTLLREAEHRSTAPIAYWNIPGIQFV 11624 11493 VPRQKRCQEALVLVNECLDGLIDKCKKLV 11407 11269 EEEDAVFGEEFLSERDPSILHFLLASGDEISSKQ (0) 11168 11003 LRDDLMTMLIAGHETTAAVLTWTLYLLSQHPEAAAAIRKE (0) 10884 10681 VDELLGDRKPGVEDLRALK (0) 10625 10448 MTTRVINEAMRLYPQPPVLIRRALQ 10374 10118 DDHFDQFTVPAGSDLFISVWNLHRSPKLWDEPDKFKPER 10002 9580 FGPLDSPIPNEVTENFAYLPFGGGRRKCIGDQ 9485 9358 FALFEAVVALAMLMRRYEFNLDESKGTVGMTT 9263 9124 GATIHTTNGLNMFVRRRDPLTVPPTSSSVAETVSTGYAFACG PAVMPVASAEVVAAPATAAGGGCPFHTAAGAAVPAATMSLRPTGPPSA* 8852 >21. 
Scaffold 690 10 EXONS 43% to 710A1 exon 1 predicted by genscan. EST SUPPORT 20577 MNATGLLNDGLASLGMSGFGDNLASGPALVAAGGALALGYALWEQMKFRWYRSDKNGNMLP (1) 20356 20000 GPASVTPIIGGIVEMVKDPYGFWERQRLYSFP 19905 19904 GMSWNSIVGIFTVFVTDPALSRYVFSHNSSDSLLLALHPN (1) 19785 19644 AEWILGKTNIAFMSGPEHKALRKSFLALFTRKALGLYVLKQDDVIRKHFNEWMQ (0) 19498 19355 TAGPREIRPFIRDLNAYTSQEVFVGPYLDDPT (0) 19269 18917 EREKFSDAYRAMTDGFLAFPLLLPGTGVWKGRQGRQFIVK (0) 18802 18583 TLTRAAARSKVRMAAGQEPECLLDFWTKQ (0) 18497 18215 ILSDIKDAADAGQEAPFYADDKKIAETVMDFLFASQDASTASLVWTITLMAEHPEVLAR (0) 18012 17722 VRDEQYRLRPNPEEKVTGDMLNEMHYTRQVVKEILRFRPAAPMVPMRAKAPFKLTETYTAPKGALIVPSLVAACKQ 17456 (0) 17279 GYSNPDSFDPDRFSPERAEDIKYASNFLVFGHGPHYCVGKE 17155 (0) 16995 YAMNHLTVFLALLATSLDFPRIRSKVSDDIIYLPTLYPGDSIFDLSWSAKK* 16840 >7. 8. SCAF 58 10 EXONS 56% TO 51A2 EST SUPPORT BI717817 BU649818 BI726293 BM001590 BI718677 more AV642299 60124 MDLPPELAVLADKVLSLSPVVLVALGSAVLILALAVGRVLFNLLPSKRPPVWEGLPFIGGLLKFTG 59927 59843 GPWKLLENGYAKFGECFTVPVAHRRVTFLIGPEVSPHFFKAGDDEMSQSE 59694 59394 VYDFNIPTFGRGVVFDVEQKVRTEQFRMFTEALTKNRLKSYVPHFNKEAE 59245 59108 EYFAKWGETGVVDFKDEFSKLITLTAARTLL 59016 58765 GREVREQLFDEVADLLHGLDEGMVPLSVFFPYAPIPVHFKRDR (2) 58637 58412 CRKDLAAIFAKIIRARRESGRREEDVLQQFIDAR 58311 58119 YQNVNGGRALTEEEITGLLIAVLFAGQHTSSITTSWTGIFMAANK 57985 57667 EHYNKAAEEQQDIIRKFGNELSFETLSEMEVLHRNITEALRMHPPLLLVMRYAKKPFSVTTSTGKSYVIPK 57455 57191 GDVVAASPNFSHMLPQCFNNPKAYDPDRFAPPREEQNKPYAFIGFGAGRHACIGQNFAYLQ (0) 57009 56877 IKSIWSVLLRNFEFELLDPVPEADYESMVIGPKPCRVRYTRRKL* 56743 >24. 
SCAF 1199 14 exons 72A9 LIKE exons 3,4,5,13,14 not well supported 8031 MDGFWKTLGLGALLSPVLYALYLASLIVIPYLKSLPLRRKLRHLPGPPVTGFFLLGNVPDLVRTP (1) 8225 8408 VHQCMARWAEQYGKIFKLELPTMT (0) 8479 8633 ELIRLTNITTRLGLVYDTGRTFGT 8704 8705 RAKRRPGGSLPRTWPRDVAASMQYDALIQPDLSIVSAYKRDSKSCRT (2) 8845 8884 GNAKTGPNGRCDTARTLTRQDTTPGHREA (0) 8970 10741 VMTGLAAAGPSAALDLDRVAQRLTIDVIGRFAFDRDFGATADIAKTNEALQ (0) 10893 11059 VVGELMTALQRMLNPLNRWFWWRK (0) 11130 11410 EARGLWASRRRYDALVRRALEDLRSSPPAQHTLLHHLMSLTDPDT (1) 11544 11782 GKPLSARRLRSETALFWIAGFETTAHAIGWTLMFIAGSPE (0) 11901 13254 VESRVAAELEGAGLLAVPGRPEPRQLAWGDLGGLKYLNA (1) 13370 13544 VIHESMRLMPPTSGGTVR (2) 13597 13750 VVPRDTQLAGHVLPKGTMLW (0) 13809 14146 IPFYAMQRSERVWGPDAAQFRPERWLAAAAGAGGPG (0) 14253 14541 ARGFLPFSEGPRNCVGQSLALLELRTALALLCGSFR (2) 14648 14920 FRLADDMGGVEG (1) 14955 15160 AVSEARQHITLKPGDRGLLMHAIPRVPA* 15246 17630 ATGITAGGRGGAW 17668 ???? >20. SCAF 200 94E3 aaaa01014899.1 LIKE seq gap 9247 SSRHSKGILAEILEFVMGN (0) 9306 seq gap 9705 SVDMESFFSRLSLDIIGKSVFDYDFDSLRHDDPVIQ 9812 10081 AVYSVLRESTVRSTAPX 10128 10371 ADWKLPGISLLVPRLRESDAALAIVNDTLDRLIARCKSM 10487 10853 PTVLHFLLGSGEALNSRQLRDDLMTLLIAGHETTAAV 10963 11275 LTWALHLLVAHPEVMKRVRDE 11277 11605 VDWVLGDRLPGSDDLPLLRYTTRVVNEALRLYPQPPVLIRRAMQ 11736 11956 DDVLPGGHVVAAGTDLFISVWNLHHSPQLWERPEAFDPDR 12075 12251 FGPLDSPPPTEFSTDFRFLPFGGGRRKCVGDMFAIAECVVALAVVLRRYDFAPDTSFGPVGFKS 12442 12584 GATINTSNGLHMLISRRDLT 12643 12644 GVPPPAPRAPAAAAGAAAGSCPHAAAAAATAAAAAAVGCPHAAAAATSGAPAGVTP 12811 >5. 22 scaf 946 CYP97B AV390436 Chlamydomonas reinhardtii EXXR to end No additional ESTs CLGESLRMYPQPPILIRRALAEDTLPAglrgdpagypigkGADLFISVWNLHRSPYLWKDPDT FRPERFFEPNSNPDFGGKW (?) 188 AGYRPDAVTGGAALYPNEVASDFAFIPFGGGARKCVGDQFAMFEATVAAAMLLRRFTFRLAVPAEK (0) 385 604 VGMATGATIHTANGLSMRVTRRTPGGGSGSGAPG 705 >4. 14. scaf 521 BE726345 N-term to C-helix 33% to CYP711A1 BM002146 BI728655 37486 MQDVISFLLNGLGFAAVGLVVL (0) 37551 37671 QLVLSLDLYKRWKLRHLP (1?) 
37724 37956 GPPALPLLGNLPQILAKGSPAFFRECRAKYGPVFR (0) 38060 38400 VAFGRNWMVVVAEPDLLRQ (0) 38456 38720 VGGKLLNHSMFRGLLGGEFAKLDDWGLVSAR 38812 39351 DDFWRKVRAAWQPAFXAPSLSGYFPLMTDCAVRLADKLEGLARRQPG 39491 39994 YGRVLVQACRDVFKYSSVVYGS 40059 40319 YSRVGLLFPEWRPVVAILANAAPDLPFKML 40408 40595 QARTHLRDACMSLIDGWKKQ 40654 41049 QVQTFILAGYETTANALAFAVYCLATNPEGE 41141 41578 RLPTEADLPRLPYTEAVFNETMRLYPPAHATNRHTDKAPMQ 41700 42000 GPYTLPKDTTLFMSIFSAHHNTDVWPRVNDFVPERFLPVS 42119 42294 ESPLYPEVAARVPHAHAPFGFGSRMCIGWKFAVQ (?) 42395 42712 QEAKVALAALYQRLTFELEPGQ (0) 42771 43237 VPLQTAVGITLSPRNGVWVRPVARRLTPRQPTTPPVGSAAK* 43362 >28. scaffold_33 31584 IPGNGLLVSDGPVWQRQRRLSNPAFRRAAV 31495 30609 PSDLLTSLLLARDEDGSGMSDQALRDELMTLLVAGQ (0) 30502 30091 ETSAILLGWASALLAAHPEVQAAAAAEVAAVCG 29766 VRHMPYLESVVLETLRLYSPAYMVGRCARRDAALGPYVLPAG TTVLVSPYVMHRDPEVWEEPEVFRPERWQELQRR 29548 29296 EGYSGYMGLMSNLGPNGAYLPFGGGPRN 29261 (SEQ GAP HERE) >23. SCAF 636 13432 WPAATVAMLGTDSVTFST 13379 13145 GAYHRSLRRLLGPCFSPQ 13092 C-helix? 12878 AVEGYLPSIQAICERYCAEWAAETTAAAAAAAPAATGGDSSA 12363 GIFAPVALAIPGSN 12322 12098 YAKASAARKVMVAAL 12054 LLFAGHETTATSIVRLMLVLANP RPDVVSRLREEQAAAVRQHGAAIS 10590 GSSIRDMPYLDAVVKETWRCHPVVPMVPRRAVRDFTLGGHDVPQ (0) GWGVVLGLVEPMRDLPAWSGLTPDSPLHPSHFNPDR (1) WLSGRSSASGN (?) GMLPPQMLTFGGGGRYCLGANLAWAELK (0) VFVAVLLRGYDFTSPLPELEVKLFPALTVAQGFPIE >40. scaf 171 runs off end 3676 MAPLLDAKQLELLGIGMQLAAVLLVLYYLLKWLAGKRGGVPGPAFYLPAIGETLSLFASPTRYMWK (0) 3500 NWLEYGPFFRTHLLGYPLYVVGSPGLLKPVLGDDSAFEFF (0) VPGKTFTMLISDIRHMQVPEQHAVF RRRLGQALNPGALSRHVMAPLRVVLERHLDAWEAAGRVQLAEA 1651 LAGLYGVPLPWLPGTAIHSALRAQRRLMALLGP 1553 GTPLSLTKEQIFERALGVVIASDDTSKHLFFFELVAAAMLPGVWAKLEEEQKQ (0) ectvirygtllpdwphattviitamtsq (THIS MAY BE INTRON) AMRKYGDELSYSILNDMPYLDAVIK (0) 497 >27. scaffold_3547 Length = 4667 partial seq with large gap LCPTSASLSRCRPQHPTRVGKYLVPAGTPIGTALFAIHNTRHNWTDPLAFRPQRWMGESS 4545 LSYMPFSEGPRSCVGQSLAKLEVMTVLATLLAHFR 4646 >25. 
scaffold_574 RUNS OFF END Length = 44,663 97C1 LIKE 68% 44288 VPLGQDVMISVYNIHHSPAVWDDPEX 44214 43839 FIPERFGPLDGPVPNEQNTDF 43777 43352 SYIPFSGGPRKCVGDQFALMEAVVALTVLLRQYDFQMVPNQQ 43227 42864 IGMTTGATIHTTNGLYMYVKER 42799 >1. scaf 306 AV623700 N-term MASSSSPLEELLAFAGVKDGTISSPRLALVVLGAALAAYALVFAVINVVDYIRIARGLSAIPSAPGGVPLLGHVIPMLT CVSQNKGAWDIMEDWMDAKGPIVKYNIAGTQGVAVRDPKAMKRIFQTGYKLYEKDLKLSYRPFLPILGTGLVTS DGALWQKQRMLMGPALRVDVLDDIIR IAKKAIDRLCEKLSHHAGKGDIVDIEEEFRLLTLQVIGEAVLSLGPEECDRVFPQLYLPV MNEANRRVLRPYRMYLPTPEWFRFSSRMGQLNGFLIDLFRRRWQARQAAAAAAQGEGSSS SKPKPADILDRIMEAIEESGAKWDAALETQLCYEVKTFLLAGHETSAAMLTWSTLELAAH SQAADKVVEEARAAFGPR (SEQUENCE GAP) GPRNCLGQHLALLEARVVLGL LHARFSFKPAPSVHPDPASLFMRHPTVIPVGPIRGLKVLVEQRK >41. scaffold_2262 N-term, runs off end 2459 MQLTWLGWAPVTRWRLRNIP (1?) 2400 1885 GPFALPFLGHLPAISARDLVHFCHDVARQYGPV 1787 1503 SLTQVWVAARPWIVVSDPVAARKIAYR 1423 1222 LARPSTVASFTHALVGEPRQVDDESIF 1142 784 GPAWKASRRAFETSVLRPDRL 722 721 AAHMPAVRRCTERFLARL 712 >42. scaffold_1285 C-helix similar to scaf 479 gene region on scaffold not large enough for a P450 BUT THIS MAY BE INCORRECTLY ANNOTATED 13946 MQIAAMDTTAFLTSSAVKYGPVCK 13875 13607 WFSTQPWVINDPKLVRWVG 13551 12773 FPYRGEAWRRTRRVLEGSIIHPA 12705 >3. 
scaf 479 AV641971 35% to 703A2 N-term to C-helix 51492 MYAALALVLSPVLL (0) 51451 51367 ALLWAIINPVERWKTRKIPG (2) 51308 51224 PPGLPLLGHLLNFATGDATDFTVEAVKKYGNVVA (0)51123 50867 IWFGNRAWITIADPALIR (2)50814 50325 KLGFKFLNRPARMTDFGH (0) 50272 49795 VLVGHNAEVDNAGAFVAR (2)49706 49574 GEVWRRGRRAFEASIIHPAS (2) 49515 49142 LAAHLPAINRCANRF (0) 49098 48929 AGNYTMAAVGEVAYG (2) 48885 47764 (0) LMFPALRPLWRWMAEHLPDAAQTENMRARSK (0) 47672 46972 VIGQGFTFLVAGYETSSNTTTMASYLLATHPAAQQRMADEIDAVLG 46832 46831 pwragagagegacagGELTPELLAK (0) 46757 46326 LPYTEAVLQETLRLYPAAPYLLREAREEVDLGGGRVVPK (2) 46288 46008 DSVLVLHVHSMQRDPDVWPQPEAFLPQRYLPEGQAALGPADPNGWAPFGVGARMCVGHKLAMM (0) 45820 45561 VTKVALVRMYQRFRVSLHPRQPLPLKMKTGLVRVPADGVWLTLTER* 45421 >BM446811 halotolerant green alga, Dunaliella salina New not chalmy HEAVVALSVLLKNFNXEAQLVRNQTIGMTTGATIHTTNGLYMTVKERTPAAAALAGATA* >16. no scaf PTQ4692.y1 Mid region of CYP97 like seq similar to seq 6 64% TO SCAF 1399 ARGNIREIVGQTATVPLNKLFLVYVQIFRVSFRPRASGSSLSPHDAKEILRTNADKYSMGLLTKILDLVM ST851 >9. scaffold 1959b 25D BI527318 BG852189 BE129324 BI527323 BI527331 28% to 702As MGEQGAAAGTPLALAATLLAGTILVFYIYQQLKPSKSRLPGPLFSWPFLGDTIEFATTDPTKFLFGR FKRYGR 185655 VFRLSLLGFTAYVTADPEALRPLLADEGGHFTIPVQTFTALMGAYNLQAHKEVHAAW 185825 RKVLMAALTGSGMAKLVPGVVAVMGRHVEGWAQAGRV ELYEAARTLGLDLAVDVLSGVKLEERGIQPXWLKSRMADFLXRLYGLPLALPGSLAR FTARGCTTPRDAAMTVLHAVMGAADTTRFALFNTWAILAMSPRVQDLIYEEQKK (0) VVAENGPELTYKTAMSMPYLDAAFKEAMRLLPASAGGFRMLTKELRVGDVLLPPGTIIWFHALLLQTLDPVLWDGDTSVDVP VHMDWRNNFEGAFRPERWLSEETKPRSYYIFGQGAHLCAGMVLVTLEVKL LLAMVLRKWRLQLEVPDMLARAELFPYPKPPKGTGGIRLIAREQPLALGARTQGGVNLGSRVFEF* >2. 
scaf 846 BI528139 33% to 707A2 possible 85 clan member (complete) 28201 MDLTKIHEDPIGLLLAMIAGALVAFFLLARKEKRPLGPMFTLPILGDTVALALSEQSRFMFSR (2) 28013 27729 YKKYGSVFRLNLLGKHMYILSDLEALRGPYRDEGAIPEVPFPTFKLLMGDFNVAGGGKHIHGPW (0)27538 26890 RKASLAALGPAGLQSMFPPVLRVMQSHLSEWEAAGRVEVFQS (0) 26765 26576 ARRMGLELAVDVVADVELSPAVDRAWFKQQ (0) 26487 26101 AETWLYGMWGLPVPLPGS (2) 26048 25807 ALAKALAARKVLLRVLGQELAADHEDYKSR (0) 25718 25284 WTELGSSGAAMADDL (0) 25240 24803 HSALAVLHAVMASADTTRFALFNTWALVAQSARVQEKLYEEQQK (0) 24672 24589 VIEEFGPELSYKAASSMP (2) 24536 24153 YMDATIKECMRLLPASAGGPRKLTQDLKVGEVVLPA (1) 24046 23660 GSFVWMYSYLLHCLDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWLSEETKPKYY (0) 23493 23362 FTFGYGNHLCAGINLAYL (0) 23309 23164 EIRTMLALVIRKYRLRLQTPDMLSRARYFPFVEPSPGTDTVLLEAR* 23024 >10. scaffold 1959a no ESTs 7978 GAFPYPTTHPSTITLHVTITQWPFLGDAVELGITXXXXXXXX 7937 7765 RFKKYGRVFRLNLLGHTAFVVRTP 7742 7434 SDEAALRGVLSDDGAIATIPFRAF 7411 7197 LMGEYGTQSVKEIHGPW 7181 6868 RKLIMAAVNGRGLSELVPGVAGVMARHVAGWAQAGRV 6832 5973 GLPVRLPGSDYSAALAAKERLIAALMPEMRDAHAAMLKRWEAAGRSGPALAAALLEE 5803 5465 TALRDAPMTILNAVVAAADTTRFSLFTFWAMVAMSTRVQEEIFGEQQR (0) 5420 4094 VVAAHGPELTPAALSSMPYLEACFKEAMRLLPTGGGAVRHLTKELKAGSVTLPAGEWVWY 3915 3914 HPHLMHCIDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWLSEETKPKYYFTFGSGVHLCAGVNLVYL 3711 3498 EAKLVMAMLVRRFRLRLSAPDMLARCTRVFPFMQPVPGTDKVELLPREQPL 3346 >30. scaf 25a no ESTs 125241 MAVFGFRELFASMYIPGLSPVLSTITCLAGVLLFLAWQRHSRATSVPRLGPLLTIPLLGDVAWLAADPTRFVFGR 125465 125568 RFQRYGPTFILNLMGVPLYVLTQPADLRGPYRDQGAEPDVPFSSFRRLM 125714 125957 XXXXXXXXXXXXXXXRRMFLSALGPAGLQALLPRAQAVMQAHLAQWEAAG 126061 126427 FLDGLFGLPLALPGSSVARALAAKEELVAALGPLVAADRQRMAKR 126561 126753 WRAAGSSYAALVDTL 126797 127088 RAAAVSVLHAVVAGADTTRFALFNTLALVAMSARVQEEIFAEQER 127222 127425 VVAEHGPELSARVLGSAAITPYLDAVVREAMRLLPATPGNMRRLTADLRVGXXXX 127766 PAGSMVWRFVPLMHCLDPVLWDGDTSVDVP AHMDWRSNFEG 127888 127889 AFRPERWLSEDTKPKYYYTFGSDNHLCVGQNLAYM 127993 128173 EVKLLLAMLLRKYRLQLHTPDMLARASQMFPFVIPRRGTDRVLLEPR* 128316 >12. scaf 25b BI724239.1 1031069F06.y1 C. 
reinhardtii CC-1690, Stress II EST support 138956 MAGLATFEPSAQTPLTWSLALFSSFVAGLYVTFAIYRSFGKGAKKLPPGPLLHVPLLGDGVLMAAGNPVKMFWDR 139180 142270 YRRYGSVFRTMMLGSRIWVVTDLDALRGPLRDEGAYLEIPFKAFQRLV 142413 142602 SAESFLNRPGVHGPW 142646 142734 RKIFSATLAPPRLAAMVPKIAQ (0) 142799 142972 LMQSHLSKWEEQGQVTIFRA 143031 143173 ARVMGVDLAVDVILDIKLLDGTDRAWVKS 143259 143521 VEDYLDGLYGLPLNLPGSTLSKALAARARLVEVFLRQPDVAAMQAQF 143661 143850 WEAIGKSPQAYAAAVLDQ 143903 144445 DGAMSLLHMLVASADTTRFALFNTWTLLAMSPRVQDKLYEEQKK 144576 144828 VMAEYGEELSYAATCHMPYMDATLKECMRLLPASAGGIRKLTADMQVGGYTVPAG 144992 145328 XXXXXXXXLMHYIDPVLWDGDTSVDVPAHMDWRNNFEGAFRPERWLSEETRPRYMFTFGTGAHLCIGMNLVYL 145522 145693 EVKLLLSMVLRKYRLRLHTPDMLLRCERLFPFFLPAKGTDTVLLEPR 145833 >18. scaf 25c PTQ11643.x1 PTQ6387.y1 N-term 38% to seq 2 and seq 12 169549 MDYMQLLVGLLAILLASILLLRSSGKRLSPRFRVPLLGDTIKMAKRPAEFLFSR 169388 169172 FKEFGPVFTLDLMGSTYWVVADMDAQRRFLYRTEGASAEIPIKSFKMLTELPSPNSDRVNHATW (0) 168982 168818 RKATMAAVGPHALHTLFPPVLEVIRAHADRWT 168723 168370 RKLGLDLSVDVVAGVDLPQSVDRGEFKKQ 168342 168037 VEVWLDGLFVLPLALPGTKLARAMAAKKWLLATLMPALSDVHGRFSKQ (0) 167894 167270 (2) TGLRESAIAVLQAVAAAADTTRVTLFTVLALVAMSPRVQEEIFAEQQK 167136 166905 VIAEYGSELSYKVVSDMPYLEAVVKEAMRLLPPAAGGMRVLSEPLTVGDVTLPTG 166687 166388 ALLLSYSFLMHCIDPALWDGDTSVDVP AHMDWRNNFEG 166275 166274 AFRPERWLSEETKPKYYYTFGVGKHMCAGIHLVYMVRVQQ (0) 166155 165982 EVKTMVALLVRKHRLKLQTPDMFERATWLPFTTPAPGTDTVLFEPR* 165842 >31. scaf 156a 3380 RLPAGPFGLPFLGNL 3424 3591 IQIAAMDTTAFLTSSAVKYGPVCK (0) 3662 3831 VWFGTRPWVLINDPELIR 3884 4264 RRHSFRWPARPANFASYFHVMTGENRAIDRAGVVLA 4371 5063 PPSLAAHVPAMLRCLGRFTARL 5128 5177 GDLMLAAMGQIAYG 5218 5902 LMFPALEPLWLWAAHHMPDAKQTKAMRARSK 5994 VAEVSRLLMEQWQANKAAAVAAAASGGAGGADGGDRAGGFKEVGGGISSSSFMAAMMEGRRGAVEDRLSDIE 6987 AGYETTSAATSLALFLLATHPE (0) 7049 7448 AAARLAAEVDAVLGGRELTAELLAE (0) 8071 KLPYTEAVIKETLRLHPGITFLVREATEDVDLGAGRVVPR 8190 8546 GSTLCMATHAVMHDPDIWPEPEAFRPERFLPEGS 8714 PFGMGTRMCVGHKLAMMVS 9143 QASKATLVSLCQRFSFALHPKQPLPLKLKTGLTYGPADGVWMTVTRR >11. 
+19 SCAF 156B PTQ5694.x1 K-helix to heme = PTQ11662.x1 PTQ243.x1 PTQ52.x1 PTQ9722.x1
18913 AFVWLAYNLPERWRLRRIPG 18854 18740 GPVGLPFLGNILSFSTYGHDYFAMMEKYGRV 18648 18338 IWFGVNPWIVVSDPALLR 18285 18027 KLAYKCVGKPASMSEYGHVLTGENYEIEQANAFVAS 17778 RGEVWRRGRRVFEASVIHP 17722 17477 ASLAAHLPAINRCANRF 17427 17301 EVGSYTMAVVGEVAYG 17256 16350 Q VMFPWARPLVRWLATHFPDRAQREHMAARTQI 16318 IANISRLLMERWAASKKAAAAAAGTGGGAGNAAGAGGDRAGGFKEVGGGISSSSFMAAMMEGRRGAPQEERLSDVE 15798 FIQQVIAQSFLFVLAGFETSADTLALTCYLLATHPE (0) 15691 AAARLVAEVDAVGGRELTAELLAE (0) 15294 GLPYTEAVIKEAMRLYPPVPYLLRQAREDLDLGKGMVAPK (2) 15175 HSYVVLYVHSMHLNPDVWPHPERFLPQRFLPEGSAAFGPADPGAWAPFGIGARMCVGHKLAMM MAKTLLVRMYQRYRVALHPSQPLPLRMKAGLSRVPLDGIWLTLTER

>32. scaf 156c
23868 PGALGWPFLGSIPEFSIYGYEYVLGLSAKLGN (0) 23773 23439 AWLGVEPLIIICDPALIR 23386 23162 KYAYKCVSKPPSMSEYGHVLTGFNYDVDQASAFVA 23058 22787 RGEVWRRGRRVFEASVING (0) 22557 PASLAAHLPAINRCANRFVAQL (0?) 22396 IVGGYTMAVTGEVAYG 22349 21478 Q VMFPWARPLVRWLATHFPDRAQREHMAARTQI 21446 IANISRLLMERWATSKKAAAAAA GKGAEEAIKEVGGGISSSSFMAAMMEGRRGAPQEERLSDVE VIIAQVIAQSFTFV 20858 MAGFETTALTLSLVTFMLATHPE (0) 20791 AAARLTAEVDGLGPGELTHEVLAE (0) 20358 KLPYTEAVIKETLRLHPPIPYFIREAREDLDLGNGMVAPK 20233 19945 GSYLTMYMHAVHLNPDVWPHPERFLPQRFLPEGSAAFGPADPGAWAPFGIGARMCVGHKLAMM 19775 19557 QMAKTLLVRMYQRFRIELHPRQPLPLKMKTGLSRVPVDGVWVTLTER

>43 scaffold_683
LVTMLLGGTDTSALTVAFAAWHLAAEPQLQAELRREVGARSG 8306 DVWHTDTSLPLPAMPAAPLFCPSRPQPNAFLPFGVGSRSCIGR HFGLLSTQ 8178 VRRGGGWKRSGGGEGSGGEGWGTHGAWPRAFRDRGPHGGKWVWLLQGPGLLV* ????? LAALVARFEVLP 7789 PAPPAPTALDWSQSIVITSRSGVWLR 7711

>13. SCAF 712 AV627084.1 Chlamydomonas reinhardtii 5% to 0.04% CO2 cDNA clone
35102 MTFLQLLPGVPLVLLGVLALPV (0) 35037 34921 VITLVQEVITKRKYRHIP 34868 34694 GPKPQPISGNLREFLTSPGGLLGCLEGW (0) 34611 VK SANGSSTNSTSGSSSSTGVAPGSFLGLML 31374 MAPTLTDAQIEAQVQTFLLA 31315 31010 GFETTANALTFAVYLLACHPE 30948 29287 FRPERFLSPDVPGSAPELAARHPHVHLPFGSGPRMCIGWRFAMQ (0) 29156 28541 EAKTVLSRLVQAVDFTLAPGQAAPLDTVAGLTLAPRNGVWVRLSPR GGGGSGGGGGRGQEVATAAAKGAAVRSAAA* 28308

>17. no scaf 20021010.6327.2 new seq Length = 408 mid gene
HEDMESEFLSLGLDIIGLGVFNFDFGSINSESPVIKAVYGVLKEAEHRSTFYLPYWNLPL ADVLVPRQAKFRADLKVINECLDNLIKQARDTRVAEDAEALQNRDYSKVSDPSLLRFLVD MRGEEPTNKQLRDDLMTMLIGGHETTAAV

>26. scaffold_781
MRSSSRGAKIGRAYPTAHHIDGRASGGRPLHFGLHPCHRPCLRAKAAQSGLAE LPLPEGSLGLPVVGETLELITN (1) GDTFGTSRRERYGDVYKTNILGAPTVMV STPLTAYGKAVAARQEFGQLVSQSIQRSRQHTA 12675 RYAHVSRNGRERRLTPEPHLSMVCLNHLNALSTWWPVMTRIAVPPWPTAVRQDIVSRHGPA ITAEALDEMSYGTAVARELLRITPAVPAVFRLALVDFELQGRRIPK 12376 GWRVWCHVGDSVTRYNKDQFQPERWLGSSG 11834 QPEYSLPFGSGVRTCLGRNLVMTELLVVLAVLARGYEWEAVNPAEQWGVVPSPAPKEGLRVRLHRR 11637

>33. scaffold_806 missing N-term to KYG motif
220 IWLGNQPWVCVADPDLIR 568 RVAYRVLSRPFSHTDSIHLLAGEQWEVDCNTLVSS 672 1530 SLAGHLPAVWRCVRRYTPRL 1589 1838 LADLTLAVVGEAAYG 1882 2710 QMIWPGLTPLWRWMAKHLPDAAQTRHMR 2737 VADVSRQLMAQWQAAKAKTAAAA FVEVGGGISSSSFMASLLEGRRGAAKEEERLTDLQ 3660 QIVAQCLTFLLAGFETTAATISFTAFCLATHPEAQARLLAE 3782 GQQQREGDDALPE 4526 LPYLDAVLKESMRLYPAGSALIRKSPQPLDLGRDGLVIPG 4645 4956 MHDPAIWPEPEAFRPERFLPEGSSSLGPMVGGAAASAPAGGGADAAAAAWVPFGMGPRM 5132 5133 CVGSKFATM 5153 5425 VSKAVLLQIYRRFTFELHPKQ (0) 5484 VLPLRTRTALTHAPRDG 5818

>34 scaffold_437a similar to scaf 521 95% to 437b
36702 GFALLLVSLIIYLLDPIKRWRLRKIPG 36433 PGPRGRPVLGCLPQLRAQPMPLFLQSCAQTYGPVFKAS 36320 36203 QVALGRKWVVVLADAEMQRQVDGAGSERGQGGGAQIRL WRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAG 35376 35354 AGGGSSVDMWRELGGMTLQV 35034 GYGKQLAAACGQIFRYGSPVHGSP 34963 34836 HSYLRLAMLFPELRSLLLTLAHTLPDEKFTIL 34741 34577 LQARTRLCNTVFQLIDSWKEQH 34512 34445 SSNGVGAAATSGRGGLSGVAPGSFLDLMLGQRQGGERGSGGKKAEGEEGVEHAPLTDEQVAGQ 34257 34116 VQLFILAGYETTANALAFAVYCIATHPEGTATYR 34015 33849 PLAAVESRLLHEVDDVLPGSDQLPGESDLPRLAYTEAVVNEALRLFPPAHLTSRVVPPGETLT 33568 FNIPAGIPIFLPLYIAHRDPAVWPRAEEFLPERFLP (0) 33452 31822 SSPQYESLQPRGAAQQHAHAPFGYGSRMCIGYKFAMQ 31712 31582 EAKVALATLYRRLTFTLEPGQQPLKLVASVTMSPRGGLHVTPVPRR 31445

>35 scaffold_437b
41842 GLALLLASLLIYLLDPIQRWRLRKVPGER 41756 41597 GPPARPLLGCLPQLRAQPMPLFLQSCAQTYGPVFKAS 41487 QVALGRKWAVVLADAEMQRQVRGTGAERG 41295 WRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAG 40216 40194 AGGGSSVDMWRELGGMTLQV 39794 GYGKQLAAACGQIFRYTSSAHGSP 39723 39570 HSYLRVAMLFPELRRLLVPLAHTLPDKRFAILMQ 39469 39298 LQARNRLSGAVFQLMDSWKQQH 39233 39091 SSNGVGAAATSGRGGMAGVAPGSFLDLMLGHRQGGGSGSGGKKAEGEEGVEHAPLTDEQVAGQ 38903 38777 VQLFILAGYETTANALAFAVYCIATHPEGTATYR 38676 38510 PLAAVESRLLREVDDVLPGSDQLPGESDLPRLAYTEAVVNEALRLFPPAHLTSRVVPPGETLTVRVTN 38229 FNIPAGIPIFLPMYIAHRDPAVWPRADVFLPERFLH (0) 38128 37666 SSPLYESLQPRGAAQQHAHAPFGYGSRMCIGYKFAMQ 37556 37363 EAKVALATLYRRLTFTLEPGQQPLQVEASLTMAPRGGLRVTPVPRR 37226

>39. scaffold_2693 Length = 5632 frags out of order ALMOST IDENTICAL TO SCAF 437
2984 GFALLLVSLIIYLLDPIKRWRLRKIPG 2904 2701 PRGRPVLGCLPQLRAQPMPLFLQSCAQTYGPVFKAS 2594 2490 QVALGRKWVVVLADAEMQRQVDGAGSGRGRGGGAQIRLRDVAGTMAHGR 2344 1800 WRQLRAAWQPAFAPASLAGYLPLMTGCADQLARRLEAKATAAAG 1669 1342 GYGKQLAAACGQIFRYGSPVHGSP 1271 1145 HSYLRVAMLFPELRSLLLTLAHTLPDEKFTIL 1050 882 LQARTRLCNTVFQLIDSWKQQH 817 753 SNNGVGAAATGGRGLS GVAPGSFLDLMLGHRQGGGSGSGGKKAEGEEGVEHAPLTDEQVAGQ 568 447 VQLFILAGYETTANALAFAVYCIATHPEGT 358 187 PLAAVESRLLREVDDVLPGSDQLPGESDLPRLAYTEAVVNEALRLFPPAHLTSRVVPPGETL 2 3945 SPLYESLQPRGAAQQHAHAPFGYGSRMCIGYKFAMQ 3838 3648 EAKVVLATLYRRLTFTLEPGQQPLQVEASLTMAPRGGLRVMPVPRR 3511

>29. scaffold_5 RUNS OFF INTO A SEQ GAP
5199 (0) DVVPPLQDVVLAGWSVPAGAEVWVDVHAMHRNPQLWRDPDRFNPERWAEH (0) 5050 4746 ASEAPLCSPLAFMPFGSGPRSCLGQQLAAAELKAALAVLLCFLALEPTGDPADE 4585 PRPAAGLFLRPAGGLHLLLVHRQRGQRAGAA* 4489

>36. scaf 467
14775 AVLALLLALHVLADPLQRWRLRHIPG 14852 15120 GPPALPLLGSVPAMMRAGGPFFFRQCFAKYGPVFK 15414 AQVAMGRKWVVVVADAELMRQ 16227 WRLLRGAWQPAFSSAALSGYLPLMSACGLRLAQQLQA 16969 YGRRLAVACGDVFRFGSALHGS 17034 17266 SYQRIGLLLPELVPALVPLAHSLPDPPFKRLQR 17364 17660 QARSTLLAACMELIRSWRQQH 17722 18159 PTAAHTYIHSPLAWPRGHTAHPQVQTFLLAGYETTANALAFAIYCVATHPEGEWRSEGP 18335 18336 RGRAGERRAGSEDGTAARY 18392 18987 RPPTESDLPRLPYTEAVLNEAMRLFPPAHATTRIVEAGAPLQ 19333 GGVSLPPRTPLILAIYSAHHDPAVWPRPEDFIPERFLP (0) 19479 19665 ASPLHSEVAARVPGAHAPFGYGSRMCIGWKFAMQ 19715 19932 EAKLVLALLYQRLLFRLQPGQVPLPTATALTLAPRDGLWVRPVLRR 20069

>37. scaffold_2628 Length = 5882 runs off end NOTE: CANNOT FIND AN AG-GT BOUNDARY AT LAST EXON. THIS MIGHT HAVE A LONG INSERT IN IT AND NO INTRON
5870 ELVVSPYVLHRLPRLWGPHAACFQPERFMPPPPRP 5766 5066 PPAAGGGCTEPAAAGPYLPFGAGPRACPGASFGSAEVKLLVAHVVMRYSLELLQPPPPSPR 4884 4643 QQLFVSLRPGPGVRVCFVPRHQQQVE* 4563

>38. scaffold_668 runs off end
508 RGSFVSISIYNMHRDPAHWKEPERFIPERFLQ 603 905 AATGGALGPTDPGAYVPFGSGPRMCVGYKMAIM (0) 1539 VVKSVLAGLLLRYRVALHPRQPLPLRLKTGLTLEPADG 1652
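
The numbering note at the top of this file says that some right-hand coordinates were counted in amino acids rather than nucleotides, and that not all of them have been corrected. A rough way to flag suspect fragments is to compare each coordinate span with the peptide length. The Python sketch below is only an illustration, not part of the original analysis. The names FRAGMENT, classify, corrected_end and check are hypothetical, and it assumes that a fragment is written as "start PEPTIDE end", that a trailing * occupies one codon, and that the left coordinate is the correct one. Fragments where only one coordinate is given, or where a phase marker such as (0) sits between peptide and coordinate, are simply skipped or reported as unresolved.

```python
import re

# Hypothetical pattern for one fragment written as "<start> <PEPTIDE> <end>".
# A trailing "*" (stop codon) is kept as part of the peptide.
FRAGMENT = re.compile(r"(\d+) ([A-Z]+\*?) (\d+)")

def classify(start, peptide, end):
    """Compare the coordinate span with the peptide length."""
    span = abs(end - start) + 1      # inclusive span on either strand
    codons = len(peptide)            # assumption: "*" counts as one codon
    if span == 3 * codons:
        return "consistent"
    if span == codons:
        return "right side counted in amino acids"
    return "unresolved"              # gap, run-on fragment, or other issue

def corrected_end(start, peptide, end):
    """Recompute the right-hand coordinate, trusting the left-hand one."""
    step = 1 if end >= start else -1  # plus strand counts up, minus counts down
    return start + step * (3 * len(peptide) - 1)

def check(text):
    """Return (start, peptide, end, verdict) for every fragment matched."""
    return [(int(s), p, int(e), classify(int(s), p, int(e)))
            for s, p, e in FRAGMENT.findall(text)]

# Two fragments copied from the scaffold_806 entry.
sample = "1530 SLAGHLPAVWRCVRRYTPRL 1589 2710 QMIWPGLTPLWRWMAKHLPDAAQTRHMR 2737"
for start, peptide, end, verdict in check(sample):
    print(start, end, len(peptide), verdict)
```

A verdict of "unresolved" does not mean the entry is wrong. In many records one number closes a fragment and the next number opens another, so the simple pattern can pair coordinates that do not belong together.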