Diatom P450s from Thalassiosira pseudonana

Data from JGI (Joint Genome Institute)

Ten P450s were found in the genome.  One (CYP97E2) is a clear CYP97E
sequence, 85% identical to CYP97E1 from the diatom Skeletonema costatum.
A second CYP97 (CYP97F1) is related to CYP97B3 from plants and to a CYP97
from Chlamydomonas.  A CYP51 (CYP51C1) is also present.  The other seven
P450s are highly divergent; each is named as the first member of its own
family (CYP5018A1 through CYP5024A1).  Only CYP97E2 has introns; the
others are intron free.
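
The entries below mix FASTA-style headers, free-text notes, and sequence
lines.  In the CYP97E2 entry some sequence lines end in markers such as
(0) and (2?), which appear to mark intron positions.  A minimal Python
sketch for reading such records follows.  The function name parse_records,
the line-classification heuristic, and the toy input are assumptions made
for illustration, not part of the original data or of any JGI tool.

```python
import re

# Assumed marker format: "(0)" or "(2?)" at the end of a sequence line.
INTRON_MARK = re.compile(r"\s*\(\d\??\)\s*$")
# Heuristic: a sequence line is uppercase letters with an optional stop "*".
SEQ_LINE = re.compile(r"^[A-Z]+\*?$")


def parse_records(text):
    """Return {name: (sequence, marker_count)} from annotated FASTA-like text."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            # The first token after ">" is taken as the record name.
            name = line[1:].split()[0]
            records[name] = ["", 0]
            continue
        if name is None or not line:
            continue
        stripped, n_marks = INTRON_MARK.subn("", line)
        if SEQ_LINE.match(stripped):
            records[name][0] += stripped.rstrip("*")
            records[name][1] += n_marks
        # Any other line is treated as a free-text note and skipped.
    return {k: (v[0], v[1]) for k, v in records.items()}


# Toy input (made-up sequences, not real data).
toy = """>CYPX1 scaffold 1 (toy record, not real data)
A free-text note line
MAAK (2?)
RRAV (0)
LLK*
>CYPX2 scaffold 2
MKK*
"""

for name, (seq, n_marks) in parse_records(toy).items():
    print(name, len(seq), n_marks)
# CYPX1 11 2
# CYPX2 3 0
```

The heuristic skips any line containing lowercase letters, digits, or
spaces, so a note written entirely in capitals with no spaces would be
misread as sequence.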

Last modified July 26, 2004
D. Nelson

>CYP51C1 scaffold 68 48% to 51G1 Arabidopsis
MYCNSPTRLTTNPQRIYKCNPSRLSLTHTQFFHKRLTFLIGPEAQEPFFKAPDEVLSQN
EVYGFMKPVFGPGIVYDASKKNRQVQFQSMANGLRTARLKGYTAKIERETRQYLESWGES
GELDLFHALSELTILTASRCLHGDDVRENLFKEVSELYHDLDQGLTPLTVFFPNAPTKSH
MKRNAARAKMVELFSKVIKNRRDNPDVQHSDGTDILSIFMDVKYKDGSNITDEQVTGLLI
ALLFAGQHTSCITSTWTSLFILNNPAILKRIIAEQNDVFGSQPDADVDYKMVNEDMPLLH
NSMKEALRLCPPLILLIRYALKDVKVKAAGKDYTIPKGDMVLISPSVGMRIPEVFKEPNT
FDPDRFGPDREEDKSSPFAYMGFGGGMHSCMGQNFAFVQVKTILSVLFREFELEMVSETM
PDIDYEAMVVGPKGDCRVRYKRRQ*

>CYP97E1 from the diatom Skeletonema costatum AF459441
The diatom mRNA may be truncated at the N-terminus.
RQAFLALSIGLLSVGGVNSFQAPVAGSRVVTAAPSITQLFSTLDKKEVVETKKQQSSTST
TTPLVDVSSSRVDVDVNVLD
MASYESDLLSTWDEDPSLQKGFDWEIEKLRRYFAGLRQTPDGRW
VRKSTLFEFLVTNSPSKVVGVGPDGERYESPPKPVNIFDVGVLVGKNTLTWLGFGPNL
GMAAVPDAVIQKYEGSFFTFIKGALGGDLQTLAGGPLFLLLAKYYTDHGPIFNLSFGP
KSFLVISDPVMARHILRDSSPEQYCKGMLAEILEPIMGDGLIPADPKIWKVRRRAVVP
GFHKKWLNSMIGLFGDCGDRLVDDLEKRSTSDKPVIDMEERFCSVTLDIIGKAVFNYD
FGSVTKESPIVKAVYRVLREAEHRSSSFIPYWNLPYAEKWMVGQVEFRKDMGMLDDIL
AKLINRAVETRQEATVEELEERETSDDPSLLRFLVDMRGEDLTSKVLRDDLMTMLIAG
HETTAAMLTWTMFGLVSNDPGMMKEIQAEVRTVMGNKSRPDYDDVVAMKKLRYALIEA
LRLYPEPPVLIRRARQEDTLPPGGTGLSGGVKVLRGTDIFISTWNLHRAPEYWENADK
YDPTRWERPFKNPGVKGWNGYDPEKQSSQSLYPNEITSDYAFLPFGAGKRKCIGDQFA
MLEASVTLSMIMNKFDFTLVGTPEDVGMKTGATIHTMNGLNMMVSPRSETNPIPGTNE
WWTKQHLMRGLSSTGRPYTSDEDAAWTTSANGMRP

>CYP97E2 52% to 97B3 from Arabidopsis (from GDLQ to VVSRR)
85% identical to Skeletonema costatum AF459441 (CYP97E1) over 654 aa
scaffold 12 170174 to 172672; 53% to the scaffold 23 sequence (CYP97F1)
MCTKLSSRRTLLALYFAFTGCTAFQLPSATPSRASITKAYSTHLDKEIKSKTPLVNPSKIYT
QADIDTLDLSSYENELLAAWDTDSSLQRGFDWEIEKLRRNFAGLRQREDGQWVRKPSLFD
FLVTNTPSNVVGVSNTGERYESPPKPVNMLDVGLLITKNLLNTLGFGPSLGMAAVPDAVI
QKYEGSFFSFIKGVLGGDLQTLAGGPLFLLLAKYYQDYGPIFNLSFGPKSFLVISDPVMA
RHILRDSSPEQYCKGMLAEILEPIMGDGLIPADPKIWKVR (2?)
RRAVVPGFHKKWLNNMVTLFGDCGERLVNDLDARATAKTPVDMEERFCSV
TLDIIGKAVFNYDFGSVTKESPIVKAVYRVLREAEHRSSSFIPYWDLPYADKWMGGQVEF
RKDMGMLDDILTKLINRAIETRDEASVEELEDRDVGDDPSLLRFLADMRGEDLTSKVLRD
DLMTMLIAGHETTA (0)
AMLTWTVFGLVSNDSGLMKEIQAEVRTVMGDKLRPDYDDIAKMKKMRYALIEALRLYPEPPVLIRRARSEDN
LPAGGSGLSGGVKVLRGTDIFISTWNLHRAPEYWENPEKYDPTRWERRFKNPGVKGWNGY
DPEKQSESSLYPNEITADYAFLPFGAGKRKCIGDQFAMLEASVTL (0)
AMIINKFDFTLVGSPKDVGMKTGATIHTMNGLNMVVSRRSEDNPIPETNDYWIQQHLSRGLNVNGRPYSS
NEDAAWTASSRDKNEGVVSRLVN*

>CYP97F1 Scaffold 23 132652 to 134655 also on scaffold 784 
52% to 97B3 Arabidopsis, 52% to 97E1 from another diatom
MMIHHSRVCV
LQLRIGQHNTARAKEKPNPNMKFTTALAVLCWTSVTNAFVPSSFTSPALKNEQQQVRASS
PLYALDTKEKEETTTATSASSTDTSSTPAAAATEESEGLPWWWEYIWKLPVMQPAEPGTD
IIFADSARVLRTNIEQIYGGFPSLDQCPLAEGEITDIADGTMFIGLQRYQQQYGSPYKLC
FGPKSFLVISDPVQAKHVLRDANTLYDKGILAEILKPIMGKGLIPADPETWSVRRRAIVP
AFHKAWLNHMVGLFGYCNEGLIASLEEAAKKNDAPNGQQGGKIEMEEKFCSVALDIIGLS
VFNYEFGSVSEESPVIKAVYSALVEAEHRSMTPAPYWDLPFANEVVPRLRKFNSDLKVLD
DVLTDLIDRAKNSRQVEDIEELEKRDYANVKDPSLLRFLVDMRGADIDNKQLRDDLMTML
IAGHETTAAVLTWALFELTKHPEQMAKVRAEIDSVLGDRTPTYDDIKEMQYLRLVVAETL
RLYPEPPLLIRRCRTENKLPKGGGREATVIRGMDIFLSLYNLHHDERFWPEPNEFKPERW
ESKYINPEVPEWAGYDPAKWINTNLYPNEVASDFAYLPFGGGARKCVGDEFATLEATVTL
AMLLRRFEFEFDSAKLAASKIDIMDHPEDLEHAVGMRTGATIHTRKGLHMVIRKREL*

>CYP5018A1 Scaffold 2 194927 to 196540 (minus strand) also on scaffold 853
MFDTMTAPSSPSSSRSALVTAAAASLV
SLSLLSIYKRRRTTSNNELPYPPTPPDRNYFLGHAMSLRRVPGEPKKSHDLLFLNWMNKL
NSKVVMFELPFLGRLFGLGRMICVGDAEIARHILVTANYNKSPTYSVLQPLIGMSSMVAT
EGKMWKDQRKLYNPGFSPEFLRNCVSTIIEKCNRFIVRCDGDVENGVATDMLARSIDLTS
DVIVQVAFGEDWGVDSKDKHGIETLQTIRDLTVAVGENMTNPLRKYFGLRSIWRTRRLSA
ALDQDMQNLVKRRLAQVLAGDADLEKDILSLTLSGVLEAKQESKSGAISLSKDEMERMTS
QLKTFYFAGHDTSSSAIAWAYWLLTKHPESLQRAREEVVSHLGRDWSDEALTGDSLCNTT
YQCLQKCEYLDAVARETLRLYPPAASTRWATDAKGANAGGFNLEKSVVHVNFYAIQRDPD
VWENPDSFVPERFLGEEGRKRILSYSFLPFSKGSRDCIGKYFALLEIKIALAALISRYDA
SVVNENEQYVIRLTSVPHDGCKVNLSRRRK*

>CYP5019A1 Scaffold 43 84821 to 86698 (minus strand)
MLSLCSHIFFHIVHVHLQPS
WLLGHYPLFVNPDKHHQTFTAHATPSGISALWGPSTDKFFSSVRADHCRSILRQSSSRNF
VSFIVRHGRRTLGEESIILINGGKRWKRQRKVIQKAFHLEVVKGGREAVGEVADVVVDWI
LRACSGRSDSGVHDGDCGGRLVGADGNEKVCVEAEDFFKLFALEVFGKVAMGYDFRCFPS
LATSDDNGNSNSNVALHNKKDVVSNGTHTYNDNACNCLQMPPDAQSFDFLNVDIGNRSTP
TSLMNPCMQFYSIPTPHNKKYHHHMDRIKGLVGKIIGLQLNRLCSDGGVMEGDTNMITHL
LQSTIEENFSPTDTDGNTGCPFSSSLPSKSIPDTVISNLTPSDKDQIIESVSKMLITFLM
AGYETTAISMSFVVYFLSKYKRCQERCAEEARRVLGRFGVHGTDIDDDELVYCRAVFMET
IRLHLPVMFTTRVTEKEMSFDTGLEEGHNVTIPKGTRCVVCPTVVHMDERNFERAEEFLP
ERWVRWERGRWVERDYETEGLKSTALPSITEDEQDSPPISAKYDEENNSASSISAADPHN
FFSFSDGARNCVGKRLAIMESTILIAVLLRDVCVDFAEEGFEMKKVRRFVTCGPESLPVVFWRRE*

>CYP5020A1 scaffold 88 80450 to 82390 (plus strand)
MITAHSPIIIHSSRRQCNRKRYTKSLSPTMEGSSSSSSSSAMLFRLSSFL
MTQQEDSLLSPPLFDNLIPSSFNNNSLLVLLTSITIILLL
IIIIKQLHSHHTRQPFHPKHNLPPSVPNPHPILGHHAYTAYTPTSPQSDHVFKHHANKQG
ISSFWFFSTPSISITRASHARSILRHSVERRHMSFFLRRFKRVMGKDSLIYVEKDQLWRS
HRSIVQRAFTAEAVMGMGRKVWRVADSFAGSILEESRKNSGADAGTGGYRKDALELFKWV
IIDVFGVVALGYDFQCTKSLKLAPLAQSFDYSIDDVQERTLASEMCNPFMQCYWLPTERN
RRYAFHNVRVQGLMSDICEKRKSTIDSEEVRRPAADSTSHMQLRTFSSSLASRGDDLLTI
LLKTPSPNNNQGNDNVDVMSTKEVTEMLLTLFYAGYDTSSTALSCAMYLLSTHKEIQTIC
ANEARAASISLQQDPSTWNEQLAYCKAVVMESLRLHSPVTLTSRTLETNVQLDECTSVPK
GTRVYIPIELIQTSECNFARATEFLPERWVRRDESTGLWVERDYKSEESQSFVSKDSSYI
PPANPHNIFAFSDGARNCVGRRLAMMESTMVLACMVRDLVVNVPEGFKLEMKKKFVMTTP
EEVPLIFTERCRDGRV*

>CYP5021A1 Scaffold 8 518667 to 520907 (plus strand) also on scaffold 849 and scaffold 3
MDNRPLSSLTPECTIDRY
PNATTSSFQSFGGAATCSGSSTASTTTHLLLSSILIGLLSPIISCLFVILLLMFFKRRQT
RHELEGGKCNLPTVVWRPRFMNYTSKDEGSDEDIEIDDYEAWAREYARSLQSDDGNNSGK
STHMKKLGSSAITNILPRMERLNGPYGMYATVYGVSTKVLHVAHPVPARAILTGSGVVDV
GGMNNGIGECFERQNSSFLGEISQSVTRPFKRLSSGMEGAAISPSEERKQRRRSSALRLL
TGSTKYPAYDHFKNFSGDGVFTADGSDWKAKRASVLHCLLRSGGADCMLEKEINRAADSF
EREVTWAKQTMNKEGDDKDGPVMNVVTMLQRSTIGLIYRIITHHNVEFSPDIDTNEQFIC
SPKSSAASLTSLDKNQHNGAKASEDDNHTKPDVKKDSQMKLLLPIYLDAVTKIRMIVLAQ
SRSIWYLLPRWAYRTFSPMYRDEERTMVPIRQFARLACENAVEGSPLELLSQRSSHASKE
GEATSAVSKDLLDEAITLLFAGQDTSAATLSWTLHLLSLHPQEQQKVVEEVRSVLSSLDE
GEMVSKNTISQLPYLDAVIKESMRLYPVAPFIVRKLTTDVTIPIESQSVEDDATTTTIPE
STFACIWIYALQRNPKLWTKPDEFIPERWIEPDLRSNDLGQQEVGSYMPFALGPRNCLGQ
PIAQVILRVLLARILNKYEVRDPKFDALQRLGEETGEAFDTKYLLKDMQAGFTVLPSNGL
RIKLVERC*

>CYP5022A1 Scaffold 45 115153 to 116619
MGEMTESTVVASLLFAYDVLNSYPFVPPFRSVYGMSILGDDELIICDPKVFDKYVVRAEDKYPIGGAEA
VTTFTDFYKENNLTKALEGTGHGPEWKAWRKSLDPDMYVAWETYLPTIADAANKISKVAG
TSNIEFVDFLSRSAFDMFTAVMYGESPQTTNNNVASKEDIEFVRASQSAFDVTGRLLSNP
LDKLFGGILQSEFNVNMEKTYYFANLRTKQYADGATELQRAAKTAHGEESESKCPITAIK
TQFLNPSFIERLVHRGELSNDNIAELAAPLLMAGVDTTAYVMSWLYLNLASNLDVQAKLA
EELKSVLDGADLTTKEQMDSLPYLKACIRESHRLTPATPILAKTLEKDIDVVVDDACYKV
EAGHRISLNLRAIPMDPAYVDNPTLFQPERFLPSAVEARRGTPSEIIDHPSFADPFGRGK
RRCLGVNVAVAEIMILAARLVQDWEIGLVDTEDRHRWTAKQKLMLKADPYPSMTVSPRS*

>CYP5023A1 Scaffold 85 7912 to 9600 (plus strand)
MQQMATTCFVSSFTLPSSRITGPPTSFGRTSIQHEHLPSLTNLCAIYERGSTQLKA
ALLPSVVTKISPPLRNTLVLAAAAVAIYKNRHRLYLGSDPDPNFSEPLPEGSLGCPLLGN
LGFFTKNGTPETGPGEFFRSQAKTVDNASIFKYMALGKPVAMVSGMKNVKSAFNTEFKKI
RTGSLIKNFYRLFGKQSLLFISDADRHQYLRRLVGQSMTPEAISNAMPALVNVATDQIEL
LSEHPITVMESALTNYTLDVAWRQILGLDLKEDEIVTFYDAVDNWIGGITNVRTLFLPGM
ENTKAGKGLIYLKSKIERRIDELLANGPDGSTMSYMVYAKDEEDATKSMTREEIIDNALL
LILAGSETAASTLTVAMLALGLNKDAFQKLKDEQRQLISRHGEELTRSMLDKDCPYLDAV
VKETMRIKPLAGTGAVRIAEETIVVDGKQIPKGYGVAFNIFLTHASDPVVKEEDGSHMDV
AKGFKPERWLSDETKPTEYMPFGYGARFCLGYNLAMAEMKVFLALFARRVDYDLVNMTPD
HVTWKKMSIIPKPDDGAVISVTSISK*

>CYP5024A1 Scaffold 15 295386 to 296981 (plus strand)
MLIRTNTIAPLAVILLSVGGCHSFAPVVQHQCHGASLFSSTAAAEQKDVSKL
PLPPNPEKKDISELPLPPNLGMNLFRNIRDTFSYLSNPDRFVADRSAKLGPIFLAYQFFK
PTVYCGGKENVAEFISGTELKNKVIHPALPESFVELHTKWGALNMDATESMFKEARVLFG
DVLSSREALEQYSAAADREISDYVDNLAERVKTNPQQPIYLAPELKSLCLQIFSKIFSGE
GLSEQQMQQFNDYNDALLALSKGTDQYKKGKTALDELRVEMLRRFRALDDPNIPSDTPGK
WYHDQIFGRENFDNEERIATGMVLFIWGAFIECASLCVDSLALSYKYGLQEKIDGVREEY
ATRQATGLSSSDPKFWNTNDMPYTNGILRETLRTAPPGAGVPRFSYDDFELAGYRIPANY
PVMLDPRIGNMDANLYTKPEQFEPLRWVPTKAKESACPFQGSALNLGIGSWFPGGFGAHQ
CPGVPLAELVGRMFITKISNEFDAWEFSGEGLDKSGDIDYVKIPVRIPRDDFGMRFTLN*