Undiscovered Human P450s hidden in the EST database

David R. Nelson     Feb. 8, 1999; revised Sept. 20, 1999, Oct. 13, 1999,
Dec. 11, 1999 and Feb. 10, 2000

	In November 1995, I did a comprehensive search of all the human ESTs available at 
the time, looking for P450s not yet identified or cloned.  There were 297,363 human 
ESTs on Nov. 30, 1995.  I found 12 P450s that had not been described.  Over the last 
five years, 11 of these have been cloned; one remains.  I had planned to write this 
up and publish it, but the current pace of human genome sequencing would make such a 
paper obsolete.  Therefore, I have decided to publish these findings on the web and 
let the P450 cloners go get these sequences.

CYP2S1  52% identical to CYP2B subfamily members, 50% to CYP2A 
members and 50% to CYP2G1.  This sequence is probably in a new subfamily.  
It has been named CYP2S1.

PGTEFTNKNMLMTVIYXLFAGTMTVSTTVGYTLLLXMKYPHVQKWVRX
ELNRELGAGQAPSLGDRTRLPYTDAVLHEAQRLLALVPMGIPRTLMRTTR
FRGYTLPQGTEVFPLLGSILHEPNIFKHPEEFNPDRFLDADGRFRKHEAFLP
FSLGKRVCLGEGLAKAEVFLFFTTILQAFSLESPCPPDTLSLKPTVSGLFNIPP
AFQLQVRPTDLHSTTQTR*

This sequence is a consensus of ESTs T84852, AA315278, AA300981 and AA301039.
There was no UNIGENE entry for any of these ESTs.
Note: on Sept. 17, 1999 I found a mouse homolog of this sequence that has the 
full C-terminal sequence.  Searching with this mouse C-terminal, I also found the 
human C-terminal in ESTs AA316621 and AA496320.

Mouse ortholog of CYP2S1 (starts at amino acid 61)
ESTs AA546445, AI585412
LSKKYGPVFTVYLGPWRRVVVLVGHDAVREALGGQAEEFSGRGTLATLDKTFDGHGVFF
ANGERWKQLRKFTLLALRDLGMGKREGEELIQAEVQSLVEAFRKTGGRPFNPSMLLGPA
TSNVVCSLVFGIRFAYDDKEFQAVNQAASGTLLGSSSQWGQ

GAP

ESTs AA967201, AA543966
RLPYTDAVLHEAQRLLALVPMGMPHTITRTTCFRGYTLPKGTEVFPLIGSILHDPAVFQNPGEFHPGRFLDEDGRLRKHEAFL
PYSLGKRVCLGEGLARAELWLFFTSILQAFSLETPCPPGDLSLKPAISGLFNIPPDFQLRVWPTGDQSR*


Human genomic DNA matches to 2S1

gb|AC011505.2|AC011505 Homo sapiens chromosome 19 clone CITB-H1_2081K17, WORKING DRAFT
            SEQUENCE, 10 unordered pieces
            Length = 166330
            
LOCUS       AC011510   130776 bp    DNA             HTG       06-FEB-2000
gb|AC011510.3|AC011510 Homo sapiens chromosome 19 clone CT-2195B23, WORKING DRAFT SEQUENCE, 9
              ordered pieces
              Length = 130776

Nine exons.  UNIGENE entry Hs.98370

MEATGTWALLLALALLLLLTLALSGTRARGHLPPGPTPLPLLGNLLQLRPGALYSGLMR*
LSKKYGPVFTIYLGPWRPVVVLVGQEAVREALGGQAEEFSGRGTVAMLEGTFDGH*
GVFFSNGERWRQLRKFTMLALRDLGMGKREGEELIQAEARCLVETFQGTE*
GRPFDPSLLLAQATSNVVCSLLFGLRFSYEDKEFQAVVRAAGGTLLGVSSQGGQ*
TYEMFSWFLRPLPGPHKQLLHHVSTLAAFTVRQVQQHQGNLDASGPARDLVDAFLLKMAQ*
EEQNPGTEFTNKNMLMTVIYLLFAGTMTVSTTVGYTLLLLMKYPHVQ*
KWVREELNRELGAGQAPSLGDRTRLPYTDAVLHEAQRLLALVPMGIPRTLMRTTRFRGYTLPQ*
GTEVFPLLGSILHEPNIFKHPEEFNPDRFLDADGRFRKHEAFLPFSL*
GKRVCLGEGLAKAEVFLFFTTILQAFSLESPCPPDTLSLKPTVSGLFNIPPAFQLQVRPTDLHSTTQTR*

Exon 1 comp(108139-108315)
Exon 2 comp(106871-107036)
Exon 3 comp(103651-103801)
Exon 4 comp(102958-103118)
Exon 5 comp(102692-102871)
Exon 6 comp(100208-100349)
Exon 7 comp(98748-98935)
Exon 8 comp(96286-96427)
Exon 9 comp(95897-96105)
3 prime UTR  comp(94852-95896) = EST AI445492 that has a poly A tail
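
A cheap sanity check on an exon table like the one above is to add up the exon lengths: a complete coding region, start codon through stop codon, should be a multiple of three. The sketch below is a minimal illustration; the regular expression assumes the `Exon N comp(start-end)` layout used here, and `comp(...)` is read as an inclusive range on the minus strand. For these nine exons it gives 1516 nt, which leaves a remainder of 1 when divided by three, so one boundary coordinate may be off by a base. The check alone does not say which exon.

```python
import re

# CYP2S1 exon coordinates on AC011510, copied from the table above.
# "comp(a-b)" = inclusive range on the complementary (minus) strand.
EXON_TABLE = """\
Exon 1 comp(108139-108315)
Exon 2 comp(106871-107036)
Exon 3 comp(103651-103801)
Exon 4 comp(102958-103118)
Exon 5 comp(102692-102871)
Exon 6 comp(100208-100349)
Exon 7 comp(98748-98935)
Exon 8 comp(96286-96427)
Exon 9 comp(95897-96105)
"""

# Matches "Exon N start-end" with an optional "comp(" prefix.
EXON_RE = re.compile(r"Exon\s+(\d+)\s+(comp\()?(\d+)-(\d+)")


def exon_lengths(table):
    """Return (exon_number, length_nt, on_minus_strand) for each exon line."""
    rows = []
    for m in EXON_RE.finditer(table):
        number, comp, start, end = m.groups()
        rows.append((int(number), int(end) - int(start) + 1, comp is not None))
    return rows


rows = exon_lengths(EXON_TABLE)
total = sum(length for _, length, _ in rows)
for number, length, minus in rows:
    print(f"exon {number}: {length} nt{' (minus strand)' if minus else ''}")
print(f"total {total} nt, remainder mod 3 = {total % 3}")
```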

Human ESTs = AA315278, AA300981, AA316621, AA496320, T84852, AA301039
Plus 17 ESTs in the 3 prime UTR in UNIGENE entry Hs.98370

*********************

There is a new human sequence in the genomic data from Genbank.  It has been named 
CYP3A43     human
            GenEMBL AC011904 8902-46787 13 exons
            Gene assembled from genomic sequence by Henry Strobel 
            and David Nelson on Dec 11, 1999
            intron exon boundaries defined by comparison to rat 3A9
            and human 3A sequences.  GT AG pairs found for all introns
            ESTs AA417369 zu08d03.s1 AA416822 zu08d03.r1 Soares testis
            Opposite ends of same clone
            67% identical to rat 3A9

Assembled gene (* = intron-exon boundary, ** = EST support for this boundary)

MDLIPNFAMETWVLVATSLVLLYI*
YGTHSHKLFKKLGIPGPTPLPFLGTILFYLR*
GLWNFDRECNEKYGEMWG*
LYEGQQPMLVIMDPDMIKTVLVKECYSVFTNQM*
PLGPMGFLKSALSFAEDEEWKRIRTLLSPAFTSVKFKE*
MVPIISQCGDMLVRSLRQEAENSKSINLKE*
DFFGAYTMDVITGTLFGVNLDSLNNPQDPFLKNMKKLLKLDFLDPFLLLI* 
SLFPFLTPVFEALNIGLFPKDVTHFLKNSIERMKESRLKDKQK*
HRVDFFQQMIDSQNSKETKSHK*
ALSDLELVAQSIIIIFAAYDTTSTTLPFIMYELATHPDVQQKLQEEIDAVLPNK**
APVTYDALVQMEYLDMVVNETLRLFPVVSRVTRVCKKDIEINGVFIPKGLAVMVPIYALHHDPKYWTEPEKFCPE**
RFSKKNKDSIDLYRYIPFGAGPRNCIGMRFALTNIKLAVIRALQNFSFKPCKETQ**
IPLKLDNLPILQPEKPIVLKVHLRDGITSGP*

coding region 504 amino acids

exon 1 8902-8972
exon 2 17239-17332
exon 3 19906-19958
exon 4 24929-25028
exon 5 28274-28387
exon 6 28952-29040
exon 7 30332-30480
exon 8 36377-36504
exon 9 37619-37685
exon 10 40616-40776
exon 11 42399-42625
exon 12 44323-44485
exon 13 46692-46787
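
The stated length can be cross-checked against the exon coordinates. In the sketch below (names are illustrative; coordinates are treated as inclusive, forward-strand and coding-only, with exon 1 starting at the ATG), the thirteen exons add up to 1512 nt, exactly 504 codons. That matches the 504 amino acids above if exon 13 ends just before the stop codon; the listing does not say which convention was used. The same running total gives each intron's phase (0 = between codons, 1 or 2 = after the first or second base of a codon), which is the quantity to compare when asking whether two P450s share an intron position.

```python
# CYP3A43 exons on AC011904, copied from the list above (inclusive start, end).
EXONS = [
    (8902, 8972), (17239, 17332), (19906, 19958), (24929, 25028),
    (28274, 28387), (28952, 29040), (30332, 30480), (36377, 36504),
    (37619, 37685), (40616, 40776), (42399, 42625), (44323, 44485),
    (46692, 46787),
]


def coding_length(exons):
    """Total length in nucleotides of a list of inclusive (start, end) exons."""
    return sum(end - start + 1 for start, end in exons)


def intron_phases(exons):
    """Phase of each intron: bases of a split codon left at the end of the exon before it."""
    phases, running = [], 0
    for start, end in exons[:-1]:  # the last exon is not followed by an intron
        running += end - start + 1
        phases.append(running % 3)
    return phases


total = coding_length(EXONS)
print(total, "nt =", total // 3, "codons, remainder", total % 3)
print("intron phases:", intron_phases(EXONS))
```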

The ten ESTs that have been cloned are:

 

T91507 and T91536 CYP2R1 

(still confidential) UNIGENE entry Hs.16846 (14 ESTs)


CYP4F11 
This sequence was assembled from two of the original 12 ESTs, covering the N- and 
C-terminal regions.  The sequence has been named CYP4F11.

GenEMBL AC005336 cosmid F20129 end of cosmid
ESTs T56269 and T56204 opposite ends of clone yb89b03
ESTs T69576 and T69645 opposite ends of clone yc44c03
EST AA991369 and EST W23003
G07004 human STS WI-8821
The N-terminal region up through the C helix and the C-terminal region from the I helix to the end are present.
The middle region is not present.  The rest of this gene should be on cosmid 
R28342, which is not in the database yet.  About 22 amino acids are missing at the N-terminal.

LLLVGGSWLLARVLAWTYTFYDNCRRLQCFPQPPKQNWFWGHQGLVTPTE
EGMKTLTQLVTTYPQGFKLWLGPTFPLLILCHPDIIRPITSASAAVAPKD
MIFYXXLKPWLGDGLLLSXXDKWNRQRRM
(167 amino acid gap)
IRGEXDTXMXGGHDTTASGLSWVLYHLKRHPEYQEQCRKEVKEXL
KDREPIEIEWDDLAQXPFLTMCIKESLRLXPPVPVISRCXTQDXVLPDGRXIPKXIV
CLINIIG
IHYNPTVWPDPEVYDPFRFDQENIKERSPLAFIPFSAGPRNCIGQAFAM
AEMKVVLALTLLHFRILPTHTEPRRKPELILRAEGGLWLRVEPLGANSQ*

The 4F11 sequence is now known in full from a cDNA submitted by Henry Strobel and Xiaoming Cui

cDNA sequence:
N-terminal is on AC011517; the rest is on AC020950
MPQLSLSWLGLGPVAASPWLLLLLVGGSWLLARVLAWTYTFYDNCRRLQC
FPQPPKQNWFWGHQGLVTPTEEGMKTLTQLVTTYPQGFKLWLGPTFPLLI
LCHPDIIRPITSASAAVAPKDMIFYGFLKPWLGDGLLLSGGDKWSRHRRML
TPAFHFNLKPYMKIFNKSVNIMHDKWQRLASEGSARLDMFEHISLMTLDS
LQKCVFSFESNCQEKPSEYIAAILELSAFVEKRNQQILLHTDFLYYLTPDGQR
FRRACHLVHDFTDAVIQERRRTLPTQGIDDFLKNKAKSKTLDFIDVLLLSKD
EDGKELSDEDIRAEADTFMFEGHDTTASGLSWVLYHLAKHPEYQEQCRQEV
QELLKDREPIEIEWDDLAQLPFLTMCIKESLRLHPPVPVISRCCTQDFVLPDG
RVIPKGIVCLINIIGIHYNPTVWPDPEVYDPFRFNQENIKERSPLAFIPFSAGP
RNCIGQAFAMAEMKVVLALTLLHFRILPTHIEPRRKPELILRAEGGLWLRVEPLGANSQ

4F11 genomic DNA 12 exons

Exon 1 AC011517 comp(11030-11223) four frameshifts in this exon
MPQLSLSWLGLGPVAASPWLLLLLVGGSWLLARVLAWTYTFYDNCRRLQCFPQPPKQNWFWGHQGL

Exon 2 AC011517 comp(7979-8120) missing G base at end of exon before GT pair
Three frameshifts in this exon
VTPTEEGMKTLTQLVTTYPQGFKLWLGPTFPLLILCHPDIIRPITSAS

Exon 3 AC020950 comp(22900-22954)
AAVAPKDMIFYGFLKPWLG

Exon 4 AC020950 comp(22683-22811)
DGLLLSGGDKWSRHRRMLTPAFHFNLKPYMKIFNKSVNIMH

Exon 5 AC020950 comp(18976-19097)
DKWQRLASEGSARLDMFEHISLMTLDSLQKCVFSFESNCQ

Exon 6 AC020950 comp(18027-18297)
EKPSEYIAAILELSAFVEKRNQQILLHTDFLYYLTPDGQRFRRACH
LVHDFTDAVIQERRRTLPTQGIDDFLKNKAKSKTLDFIDVLLLSK

Exon 7 AC020950 7459-7522 three frameshifts in this exon
DEDGKELSDEDIRAEADTFMFE

Exon 8 AC020950 7717-7849 one frameshift in this exon
GHDTTASGLSWVLYHLAKHPEYQEQCRQEVQELLKDREPIEIE

Exon 9 AC020950 15360-15493
WDDLAQLPFLTMCIKESLRLHPPVPVISRCCTQDFVLPDGRVIPK

Exon 10 AC020950 15595-15662
GIVCLINIIGIHYNPTVWPDPEV

Exon 11 AC005336 comp(36538-36620)
YDPFRFNQENIKERSPLAFIPFSAGPR

Exon 12 AC020950 comp(17336-17514) one frameshift in this exon
NCIGQAFAMAEMKVVLALTLLHFRILPTHIEPRRKPELILRAEGGLWLRVEPLGANSQ*


Exons 1 and 2 are on one contig of AC011517
Exons 3 and 4 are on one contig of AC020950
Exons 5 and 6 are on one contig of AC020950
Exons 7 and 8 are on one contig of AC020950
Exons 9 and 10 are on one contig of AC020950
Exons 11 and 12 are on one contig of AC005336



T98002 CYP4F12 

GenEMBL AC004523  missing N-terminal
UNIGENE entry Hs.110130 (12 ESTs)
ITPTEEGLKNSTQMSATYSQGFTIWLGPIIPFIVLCHPDTIRSI
TNASAAIAPKDNLFIRFLKPWLGEGILLSGGDKWSRHRRMLTPAFHFNILKSYITIFN
KSANIMLDKWQHLASEGSSCLDMFEHISLMTLDSLQKCIFSFDSHCQERPSEYIATIL
ELSALVEKRSQHILQHMDFLYYLSHDGRRFHRACRLVHDFTDAVIRERRRTLPTQGID
DFFKDKAKSKTLDFIDVLLLSKDEDGKALSDEDIRAEADTFMFGGHDTTASGLSWVLY
NLARHPEYQERCRQEVQELLKDRDPKEIEWDDLAQLPFLTMCVKESLRLHPPAPFISR
CCTQDIVLPDGRVIPKGITCLIDIIGVHHNPTVWPDPEVYDPFRFDPENSKGRSPLAF
IPFSAGPRNCIGQAFAMAEMKVVLALMLLHFRFLPDHTEPRRKLELIMRAEGGLWLRV
EPLNVSLQ

R53456 CYP4X1 

(still confidential) UNIGENE entry Hs.26040 (13 ESTs)

R21282 CYP26

CYP26A1     human
            GenEMBL NM_000783
            White,J.A., Beckett-Jones,B., Guo,Y.D., Dilworth,F.J., Bonasoro,J.,
            Jones,G. and Petkovich,M.
            cDNA cloning of human retinoic acid-metabolizing enzyme (hP450RAI)
            identifies a novel family of cytochromes P450
            J. Biol. Chem. 272 (30), 18538-18541 (1997)
            Note: new family in mammals, homolog to human ESTs R51129 and R21282
MGLPALLASALCTFVLPLLLFLAAIKLWDLYCVSGRDRSCALPL
PPGTMGFPFFGETLQMVLQRRKFLQMKRRKYGFIYKTHLFGRPTVRVMGADNVRRILL
GDDRLVSVHWPASVRTILGSGCLSNLHDSSHKQRKKVIMRAFSREALECYVPVITEEV
GSSLEQWLSCGERGLLVYPEVKRLMFRIAMRILLGCEPQLAGDGDSEQQLVEAFEEMT
RNLFSLPIDVPFSGLYRGMKARNLIHARIEQNIRAKICGLRASEAGQGCKDALQLLIE
HSWERGERLDMQALKQSSTELLFGGHETTASAATSLITYLGLYPHVLQKVREELKSKG
LLCKSNQDNKLDMEILEQLKYIGCVIKETLRLNPPVPGGFRVALKTFELNGYQIPKGW
NVIYSICDTHDVAEIFTNKEEFNPDRFMLPHPEDASRFSFIPFGGGLRSCVGKEFAKI
LLKIFTVELARHCDWQLLNGPPTMKTSPTVYPVDNLPARFTHFHGEI

CYP39A1 A new P450 family in humans

The EST R07010 covers the C-terminal part of a P450.  Two ESTs with coding regions 
are not in UNIGENE, but R11221, the opposite end of EST R11279, is in UNIGENE entry 
Hs.20766 with 16 EST sequences.  
This sequence is most like CYP7B1 and CYP8B1, but the percent identity is only 
28%, so the sequence is in a new family.  It has been named CYP39A1.  The *s indicate 
predicted intron-exon boundaries.  An h after the * indicates that the joint is 
confirmed by a human EST, an m that it is supported by a mouse EST, and a c that it 
is supported by a chicken EST.  The N-terminal exon was identified from the genomic 
sequence, but this identification is tentative and requires confirmation.  The 
N-terminal region is covered by EST AA398040 (zt89c07.r1), which is part of UNIGENE 
entry Hs.119154 with 5 ESTs.  These appear to come from an untranslated region of a 
gene, including a poly A tail.  I suspect that the AA398040 EST is flawed and contains 
a retained intron sequence.  The N-terminal exon and the intron sequence are both 
found in the genomic clone AC008104.  
CYP39A1 has 12 exons.  Only the boundaries after exons 1, 2 and 3 are not confirmed 
by EST data.  The 2nd and 3rd exons were found by running the genomic DNA through 
a gene-finding program called FGENESH at Baylor College of Medicine's web site for 
sequence analysis.  The second exon was also found by searching the genomic DNA 
with CYP8B1 as a query sequence.  The 1st exon was found by searching with CYP7B1.  
CYP8B1 in mouse and human has no introns, but CYP8A1 has 10 exons.  It is probable 
that CYP8B1 evolved from a processed mRNA that had the introns removed.  CYP8A1 and 
CYP39A1 do not share any intron-exon boundaries.  CYP7B1 is known only as mRNA, so 
no intron boundaries can be defined.  One CYP7B1 intron break is seen in GSS data 5 
amino acids before the EXXR pair, and it is not shared with CYP39A1.  CYP7A1 has 6 
exons and may share one intron-exon boundary at the end of the 2nd exon, but the 
alignment is not very good here.  CYP51 shares one intron-exon boundary at the KYG 
motif (end of exon 1 in CYP39A1), which corresponds to the end of exon 2 in CYP51.
The KYG motif is often associated with introns, and it may be an ancient site for a 
very early intron.  I speculate that the conservation of sequence at this site, as 
well as at some other well-conserved intron locations, may be due to the intron and 
not to any structural requirements of the P450s.


CYP39A1 is most like CYP7B1 and CYP8B1, with CYP7A1 and CYP51 as additional 
matches.  Since CYP7B1 is an oxysterol 7-alpha-hydroxylase, CYP7A1 is cholesterol 
7-alpha-hydroxylase and CYP8B1 is a sterol 12-alpha-hydroxylase, CYP39A1 probably 
has a sterol as substrate.  However, CYP8A1 is prostacyclin synthase, so this 
prediction may be incorrect.  

MELISPTVIIILGCLALFLLLQRKNLRRPPCIKGWIPWIGVGFEFGKAPLEFIEKARIK*
YGPIFTVFAMGNRMTFVTEEEGINVFLKSKKVDFELAVQNIVYRT*
ASIPKNVFLALHEKLYIMLKGKMGTVNLHQFTGQLTEELHEQLENLGTHGTMDLNNLVR*
HLLYPVTVNMLFNKSLFSTNKKKIKEFHQYFQVYDEDFEYGSQLPECLLR* m 
NWSKSKKWFLELFEKNIPDIKACKSAKDNSM* m     see 22757 (-) on AL035670
TLLQATLDIVETETSKENSPNYGLLLLWASLSNAVP* m c
VAFWTLAYVLSHPDIHKAIMEGISSVFGKAG* m c
KDKIKVSEDDLENLLLIKWCVLETIRLKAPGVITRKVVKPVEIL* h m c
NYIIPSGDLLMLSPFWLHRNPKYFPEPELFKPERW*  h (sequence not perfect at this boundary)
EKGKFRRKHSFLGTASWAFGAGSSQCPGKV* m               
FALLEVQMCIILILYKYDCSLLDPLPKQ* h m
SYLHLVGVPQPEGQCRIEYKQRI*
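
The annotation convention above (a * followed by h, m or c) is regular enough to tabulate automatically. A minimal sketch, assuming one exon per line as listed and that any free-text note comes after the support letters; the names are illustrative. It reproduces the statement that only the boundaries after exons 1, 2 and 3 lack EST support.

```python
# Exon lines of the CYP39A1 model, copied from the listing above.
# "*" closes each exon; letters after it mark EST support for that joint
# (h = human, m = mouse, c = chicken). The last line ends at the stop
# codon, so it does not define an intron-exon boundary.
EXON_LINES = [
    "MELISPTVIIILGCLALFLLLQRKNLRRPPCIKGWIPWIGVGFEFGKAPLEFIEKARIK*",
    "YGPIFTVFAMGNRMTFVTEEEGINVFLKSKKVDFELAVQNIVYRT*",
    "ASIPKNVFLALHEKLYIMLKGKMGTVNLHQFTGQLTEELHEQLENLGTHGTMDLNNLVR*",
    "HLLYPVTVNMLFNKSLFSTNKKKIKEFHQYFQVYDEDFEYGSQLPECLLR* m",
    "NWSKSKKWFLELFEKNIPDIKACKSAKDNSM* m     see 22757 (-) on AL035670",
    "TLLQATLDIVETETSKENSPNYGLLLLWASLSNAVP* m c",
    "VAFWTLAYVLSHPDIHKAIMEGISSVFGKAG* m c",
    "KDKIKVSEDDLENLLLIKWCVLETIRLKAPGVITRKVVKPVEIL* h m c",
    "NYIIPSGDLLMLSPFWLHRNPKYFPEPELFKPERW*  h (sequence not perfect at this boundary)",
    "EKGKFRRKHSFLGTASWAFGAGSSQCPGKV* m",
    "FALLEVQMCIILILYKYDCSLLDPLPKQ* h m",
    "SYLHLVGVPQPEGQCRIEYKQRI*",
]

SPECIES = {"h": "human", "m": "mouse", "c": "chicken"}


def boundary_support(lines):
    """Map boundary number (1 = end of exon 1) to the set of supporting species."""
    support = {}
    for number, line in enumerate(lines[:-1], start=1):
        _, tail = line.split("*", 1)
        found = set()
        for token in tail.split():
            if token not in SPECIES:
                break  # a free-text note follows the support letters
            found.add(SPECIES[token])
        support[number] = found
    return support


support = boundary_support(EXON_LINES)
unsupported = [n for n, species in support.items() if not species]
print("boundaries without EST support:", unsupported)
```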

human genomic clones that contain this gene:
AC008104 Homo sapiens clone 446_F_17, WORKING DRAFT SEQUENCE, 
13 unordered pieces Length = 180699
AL035670.15|HS347E1 Homo sapiens chromosome 6 clone dJ347E1, 
WORKING DRAFT SEQUENCE, in unordered pieces Length = 99785

Mouse ortholog CYP39A1 known from ESTs

Mouse EST AI118926 ue22b04.y1 MELFSPIAIAVLGSCVLFLFSRLKNLLGPPCIQGWIPWIGAGLEFGKAPLEFI

QFKTYDEGFEYGSQLPEWLLRNWSKSKRWLLALFEKNIGNIKAHGSAGHSGTLLQAILEVVETETRQYSPNYGLVVLWAALAN
APPIAFWTLGYILSHPDIHRTVLESISSVFGTAGKDKIKVSEDDLKKLLIIKWCILESVRLRAPGVITRKVVKPVKILNHTVP
SGDLLMLSPFWLHRNPKYFPEPESFKPERWKEANLDKYIFLDYFMAFGGRKFQCPGKWFALLEIQLCIILVLYKYECSLLDPL
PKQSSRHLVGVPQPAGKCRIEYKQRA*
exact matches for mouse
AI118926 ue22b04.y1 N-terminal fragment 
AA272844 va97h09.r1
AA606237 vo06d06.r1
AI552260 vf73b10.y1 extends to 3 prime
AA457858 vf73b10.r1 extends to 3 prime

Mouse EST MELFSPIAIAVLGSCVLFLFSRLKNLLGPPCIQGWIPWIGAGLEFGKAPLEFI
          ||| ||  |  ||   |||    |||  |||  ||||||| | ||||||||||
Human 39A MELISPTVIIILGCLALFLLLQRKNLRRPPCIKGWIPWIGVGFEFGKAPLEFIEKARIK*

66% identical
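
The identity figure can be recomputed by counting identical positions over the ungapped overlap. A minimal sketch (the function name is illustrative): counted directly, the 53-residue overlap has 36 identical positions, about 68%. The match line drawn above has 35 bars (35/53, about 66%), one fewer than the direct count.

```python
def percent_identity(a, b):
    """Identical positions over an ungapped, left-aligned overlap of a and b.

    Returns (matches, overlap_length, percent).
    """
    overlap = min(len(a), len(b))
    matches = sum(x == y for x, y in zip(a, b))  # zip stops at the shorter sequence
    return matches, overlap, 100.0 * matches / overlap


MOUSE = "MELFSPIAIAVLGSCVLFLFSRLKNLLGPPCIQGWIPWIGAGLEFGKAPLEFI"
HUMAN = "MELISPTVIIILGCLALFLLLQRKNLRRPPCIKGWIPWIGVGFEFGKAPLEFIEKARIK"

matches, overlap, pid = percent_identity(MOUSE, HUMAN)
print(f"{matches}/{overlap} = {pid:.1f}%")
```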


These sequences are similar to a chicken sequence

gb|AI979980.1|AI979980 pat.pk0008.g11 chicken activated T cell cDNA Gallus gallus 
cDNA
           clone pat.pk0008.g11 5' similar to cholesterol 7-alpha
           hydroxylase, mRNA sequence
           Length = 500
           
 Score =  185 bits (465), Expect = 2e-46
 Identities = 86/159 (54%), Positives = 119/159 (74%)
 Frame = +2

Query: 50  SGTLLQAILEVVETETRQYSPNYGLVVLWAALANAPPIAFWTLGYILSHPDIHRTVLESI 109
           S  LLQ +L+    + +   PNYGL++LWA+ ANA PIAFWTL +ILS P +++ V+E +
Sbjct: 29  SKXLLQHLLD--NLQGKHLXPNYGLLMLWASQANAVPIAFWTLVFILSSPSVYKKVMEDL 202

Query: 110 SSVFGTAGKDKIKVSEDDXXXXXXXXXXXXESVRLRAPGVITRKVVKPVKILNHTVPSGD 169
           +SVFG AGKD+I+VSE+D            E++RLR+PG IT+KV+KP++I + T+P+GD
Sbjct: 203 TSVFGNAGKDEIEVSEEDLKNLPYIKWCTLEAIRLRSPGAITKKVIKPIRIQSFTIPAGD 382

Query: 170 LLMLSPFWLHRNPKYFPEPESFKPERWKEANLDKYIFLD 208
           +LMLSP+WLHRNPKYFP+PE FKP+RWKE  + +  FLD
Sbjct: 383 MLMLSPYWLHRNPKYFPDPEMFKPDRWKE-EI*RRXFLD 496


H06539 and R36281 CYP46 

(in Genbank Aug 10, 1999) no UNIGENE entry for these ESTs
opposite end of R36281 = R49568 with UNIGENE entry Hs.25121 (5 ESTs)
The UNIGENE entry is only the 3 prime untranslated region of the mRNA
CYP46      human
           GenEMBL NM_006668
           Lund EG, Guileyardo JM and Russell DW.
           cDNA cloning of cholesterol 24-hydroxylase, a mediator of
           cholesterol homeostasis in the brain.
           Proc. Natl. Acad. Sci. U.S.A. 96, 7238-7243 (1999)
           32% identity with Drosophila 4D2
           ESTs H06539, H51951, R36281
           mouse homolog EST AA096922
MSPGLLLLGSAVLLAFGLCCTFVHRARSRYEHIPGPPRPSFLLG
HLPCFWKKDEVGGRVLQDVFLDWAKKYGPVVRVNVFHKTSVIVTSPESVKKFLMSTKY
NKDSKMYRALQTVFGERLFGQGLVSECNYERWHKQRRVIDLAFSRSSLVSLMETFNEK
AEQLVEILEAKADGQTPVSMQDMLTYTAMDILAKAAFGMETSMLLGAQKPLSQAVKLM
LEGITASRNTLAKFLPGKRKQLREVRESIRFLRQVGRDWVQRRREALKRGEEVPADIL
TQILKAEEGAQDDEGLLDNFVTFFIAGHETSANHLAFTVMELSRQPEIVARLQAEVDE
VIGSKRYLDFEDLGRLQYLSQVLKESLRLYPPAWGTFRLLEEETLIDGVRVPGNTPLL
FSTYVMGRMDTYFEDPLTFNPDRFGPGAPKPRFTYFPFSLGHRSCIGQQFAQMEVKVV
MAKLLQRLEFRLVPGQRFGLQEQATLKPLDPVLCTLRPRGWQPAPPPPPC

The one EST that still has not been cloned is:


As of Feb. 4, 2000 it appears to have been cloned (see below), but the N-terminal 167 amino acids are still missing.

CYP4Z1X = CYP4A20
H21976 is 55% identical to CYP4A11 in the C-terminal region.  It does have a 
UNIGENE entry, Hs.176588, with eight ESTs from only two different clones.
It is 60% identical to mouse 4A14 and 58% identical to rabbit 4B1.
Since neither the 4A nor the 4B subfamily is clearly the closer match, this is probably 
a new subfamily.  It has been named CYP4Z1 until the full-length sequence is known.  
The name may have to be changed later.

The assembled gene is 57% identical to 4A11, so this is not 4Z1 but a new human CYP4A20.
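
The renaming follows from identity thresholds. A minimal sketch, assuming the commonly cited P450 nomenclature rule of thumb (above 40% amino acid identity for the same family, above 55% for the same subfamily), which the text does not state explicitly: 57% to 4A11 puts the assembled gene in 4A, 52% puts CYP2S1 in a new family-2 subfamily, and 28% puts CYP39A1 in a new family. These are guidelines, and identities computed from partial sequences, such as the C-terminal fragment of H21976, can mislead.

```python
def nomenclature_call(percent_identity):
    """Rough naming call from percent amino acid identity to a named P450.

    Assumes the commonly cited rule of thumb: more than 40% identity places
    a sequence in the same family, more than 55% in the same subfamily.
    """
    if percent_identity > 55:
        return "same subfamily"
    if percent_identity > 40:
        return "same family, new subfamily"
    return "new family"


# Identity values quoted in this document.
for label, pid in [("assembled 4Z1/4A20 vs 4A11", 57),
                   ("CYP2S1 vs CYP2B", 52),
                   ("CYP39A1 vs CYP7B1/CYP8B1", 28)]:
    print(f"{label}: {pid}% -> {nomenclature_call(pid)}")
```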

Missing first 167 amino acids
NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS
SYLKAVFNLSKISNQRMNNFLHHNDLVFKFSSQGQIFSKFNQELHQFT
HLEKVIQDRKESLKDKLKQDTTQKRRWDFLDILLSAKV
ENTKDFSEADLQAEVKTFMFAGHDTTSSAISWILYCLAKYPEHQQRCRDEIRELLGDGSSITW
EHLSQMPYTTMCIKECLRLYAPVVNISRLLDKPITFPDGRSLPA
GITVFINIWALHHNPYFWEDPQV
FNPLRFSRENSEKIHPYAFIPFSAG
PRNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKKV

Related gene on AJ131016 = 4A11

Query: 1      NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS 43
              +KWEE + Q+S LE+FQHVSLMTLD+IMK AFSHQGSIQ+DRS
Sbjct: 134795 DKWEELLGQDSPLEVFQHVSLMTLDTIMKSAFSHQGSIQVDRS 134667

Seq upstream of this seq

4A11

MSVSVLSPSRLLGDVSGILQAASLLILLLLLIKAVQLYLHRQWL
LKALQQFPCPPSHWLFGHIQELQQDQELQRIQKWVETFPSACPHWLWGGKVRVQLYDP
DYMKVILGRSDPKSHGSYRFLAPWIGYGLLLLNGQTWFQHRRMLTPAFHYDILKPYVG
LMADSVRVMLDKWEELLGQDSPLEVFQHVSLMTLDTIMKCAFSHQGSIQVDRNSQSYI
QAISDLNNLVFSRVRNAFHQNDTIYSLTSAGRWTHRACQLAHQHTDQVIQLRKAQLQK
EGELEKIKRKRHLDFLDILLLAKMENGSILSDKDLRAEVDTFMFEGHDTTASGISWIL
YALATHPKHQERCREEIHSLLGDGASITWNHLDQMPYTTMCIKEALRLYPPVPGIGRE
LSTPVTFPDGRSLPKGIMVLLSIYGLHHNPKVWPNPEVFDPSRFAPGSAQHSHAFLPF
SGGSRNCIGKQFAMNELKVATALTLLRFELLPDPTRIPIPIARLVLKSKNGIHLRLRR
LPNPCEDKDQL

Probable exon of 4A11 gene
GYGLLLLNGQTWFQHRRMLTPAFHNDILKPYVGLMADSVRVML

LPTTLFNLYLTAVIITWM*QLN*KIHMPEIYLLFLHLPMMWLLLDSSSPRKPPFSLPNCHQSSLGPCSL*YTHIHIFVSTLGYGLLLLNGQTWFQHRRMLTPAFHNDILKPYVGLMADSVRVMLVSPCLSPLPTPTHSTHSHQLHSQPCVPQAAIDIDTWNNTLRSLL*E*RVPQSCIR*ENTQASGHIHTLFPTIGIEIHGEHKALLLPLWNLSTRDWRHMQSCWTVGLPMLAGLGNGSFSGSMDRPRCHPPRLWPGAPGFCMDLSHPGKEMNTRSMFSEQYLP*IASSPVRTLISHPTVPSTF**MEKN*SPCFLYNTVPILAP*AIFGTETFCPKVGVLLVPQN*LSSQNTEESPIIYPRTTITTKFSQAYVLTESQNSILNPCYCERIC*MLY*KCVRIRCQFTNTGFTVPVSTMHLLCKVKNPNQDICKLQEKTRYLQVHEKARPY*QSTSWRLRRSMPPPTTGQMGRAPWPGFPSGGLSARLLDDPGHHHEECL

CPQPFSTFISQLS*SHGCSSSIRKFTCLKYTSFSCICP*CGFFWIVLPPGSHLSPSQIVTSHP*VRAASDTHTYTYSCLP*GTACSC*MGRHGSSIDGC*PQPSTMTS*SHTWGSWQTLYE*CW*VHVSLLSPHPLTAHTLTNSTLNPVSHRQP*T*THGTTLSGHCCESRGFPRVV*GRRTPRHQVISTLCSPP*E*RFMVNTRPFSSHFGTSAQGTGGICNLVGQ*DFLCWLAWAMAASVAAWTGQDVTHPGSGLGPQVSAWI*ATLGRK*TPGLCSQSNTFHR*HHLQSGL*FLTQLCPAHFDEWKRIEVPVFSTTQCPY*PLKLSLALKLSALKLVSY*SLRTSSALRTLKSPPLFIPEQQ*QQNFHKHMS*LNLKIQYLIHAIVRGSVECCIKSV*ELGVSLPTLALQSQFLQCICSVK*RIPTKISASYRKKQDICRYMKKPGLISKALLGD*EGPCPLPPQDKWEELLGQDSPLEVFQHVSLMTLDTIMKSA

AHNPFQPLSHSCHNHMDVAAQLENSHA*NIPPFPAFAHDVASSG*FFPQEATFLPPKLSPVIPRSVQPLIHTHTHIRVYLRVRLAPVEWADMVPASTDADPSLPQ*HPEAIRGAHGRLCTSDAGESMSLSSPHTHSQHTLSPTPLSTLCPTGSHRHRHMEQHSQVIAVRVEGSPELYKVGEHPGIRSYPHFVPHHRNRDSW*TQGPSPPTLEPQHKGLEAYAILLDSRTSYAGWPGQWQLQWQHGQAKMSPTQALAWGPRFLHGFKPPWEGNEHQVYVLRAIPSIDSIISSQDSDFSPNCAQHILMNGKELKSLFSLQHSAHTSPLSYLWH*NFLP*SWCPTSPSELAQLSEH*RVPHYLSQNNNNNKIFTSICPN*ISKFNT*SMLL*EDLLNVVLKVCEN*VSVYQHWLYSPSFYNASAL*SEESQPRYLQATGKNKISAGT*KSQALLAKHFLEIKKVHAPSHHRTNGKSSLARIPLWRSFSTSP**PWTPS*RVP

Possible N-term?
MNWEAIILSKLTQQQKTKYHMFSLISGNYMLDTYGHKYGNDRHWGLLEG*

Upstream seq of AQ394813 (probably in intron region)
EF*YIVEGFLS*LLTANSLRTRTISNSFLNSQWIWEKTCQVLNIFRDLFLESKTSVYLCSLQMFLNSYLFFIP

Mouse CYP4A14 sequence related to this human sequence (60% identical)
GITATISIYGLHHNPRFWPNPKVFDPSRFAPDSSHHSHAYLPFSGGS
RNCIGKQFAMNELKVAVALTLLRFELLPDPTRIPVPIARLVLKSKNGIHLCLKKLR

BLAST of 4Z1 shows two genes on one GenBank entry

emb|AJ131016.1|HSA131016 Homo sapiens SCL gene locus
              Length = 193471
              
 Score =  108 bits (268), Expect = 4e-23
 Identities = 52/54 (96%), Positives = 52/54 (96%)
 Frame = -1

Query: 50     RNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVCQKK 103
              RNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHV  KK
Sbjct: 160300 RNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKK 160139

 Score = 67.9 bits (163), Expect = 7e-11
 Identities = 34/54 (62%), Positives = 43/54 (78%)
 Frame = -3
This sequence shows three differences from 4A11
Query: 46     SAGLRNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHV 99
              SA  RNCIG+ FA+ + KVA ALTLLRF+L PD +R P P+ ++VLKSKNGIH+
Sbjct: 129476 SAWPRNCIGKQFAMNQLKVARALTLLRFELLPDPTRIPIPMARLVLKSKNGIHL 129315
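
The Identities and Positives counts can be recovered from the match line between Query and Sbjct: an identical residue is printed as the letter itself, a similar substitution as '+', and a mismatch or gap as a space. A small sketch (the function name is illustrative), assuming the match line is copied with its internal spacing intact, applied to the two hits above:

```python
def match_line_counts(match_line):
    """Return (identities, positives, alignment_length) for a BLAST match line.

    Letters are identities; '+' marks a positive (similar) substitution;
    spaces are mismatches or gaps.
    """
    identities = sum(ch.isalpha() for ch in match_line)
    positives = identities + match_line.count("+")
    return identities, positives, len(match_line)


# Match lines copied from the two hits above (indentation removed, spacing kept).
HIT_1 = "RNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHV  KK"
HIT_2 = "SA  RNCIG+ FA+ + KVA ALTLLRF+L PD +R P P+ ++VLKSKNGIH+"

print(match_line_counts(HIT_1))  # BLAST reported 52/54 identities, 52/54 positives
print(match_line_counts(HIT_2))  # BLAST reported 34/54 identities, 43/54 positives
```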

 Score = 64.0 bits (153), Expect = 1e-09
 Identities = 29/29 (100%), Positives = 29/29 (100%)
 Frame = -3

Query: 22     QVFNPLRFSRENSEKIHPYAFIPFSAGLR 50
              QVFNPLRFSRENSEKIHPYAFIPFSAGLR
Sbjct: 161417 QVFNPLRFSRENSEKIHPYAFIPFSAGLR 161331

 Score = 58.2 bits (138), Expect = 6e-08
 Identities = 23/23 (100%), Positives = 23/23 (100%)
 Frame = -3

Query: 1      GITVFINIWALHHNPYFWEDPQV 23
              GITVFINIWALHHNPYFWEDPQV
Sbjct: 162536 GITVFINIWALHHNPYFWEDPQV 162468
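
On minus-strand frames the Sbjct coordinates run downward, which is easy to misread when placing exons on the genomic clone. A minimal sketch (names are illustrative) that normalises a hit to strand, low and high coordinates and length, and checks that an ungapped hit spans exactly three nucleotides per aligned residue:

```python
def subject_span(start, end, residues):
    """Summarise a tblastn subject range.

    On minus-strand frames (-1, -2, -3) BLAST prints the subject start
    larger than the end. For an ungapped hit, each aligned residue
    covers 3 nt. Returns (strand, low, high, length_nt, length_ok).
    """
    strand = "-" if start > end else "+"
    low, high = min(start, end), max(start, end)
    length = high - low + 1
    return strand, low, high, length, length == 3 * residues


# Ungapped hits from the 4Z1 BLAST against AJ131016 above: (start, end, residues).
HITS = [(160300, 160139, 54), (129476, 129315, 54),
        (161417, 161331, 29), (162536, 162468, 23)]
for hit in HITS:
    print(subject_span(*hit))
```

Sorting the normalised hits by position gives the exon order along the clone; for a gene on the minus strand, the N-terminal part of the query maps to the highest coordinates, as with query residues 1-23 at 162536 above.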

Blast of human genomic DNA with mouse 4A14

emb|AJ131016.1|HSA131016 Homo sapiens SCL gene locus
              Length = 193471
              
Score = 80.8 bits (196), Expect = 8e-14
 Identities = 37/62 (59%), Positives = 42/62 (67%)
 Frame = -1

Query: 1      MGFFLFSPTRYLDGISGFFQWAFLLSLFLVLFKAVQFYLRRQWLLKTLQHFPCMPSHWLW 60
              M   + SP+R L G+SG  Q   LL L L+L KA Q YL RQWLLK LQ FPC PSHWL+
Sbjct: 140578 MSVSVLSPSRRLGGVSGILQVTSLLILLLLLIKAAQLYLHRQWLLKALQQFPCPPSHWLF 140399

Query: 61     GH 62
              GH
Sbjct: 140398 GH 140393

Score = 55.0 bits (130), Expect = 4e-06
 Identities = 26/44 (59%), Positives = 33/44 (74%)
 Frame = -1

Query: 66     DKELQQILIWVEKFPSACLQCLSGSNIRVLLYDPDYVKVVLGRS 109
              D+ELQ+I   V+ FPSAC   + G  +RV LYDPDY+KV+LGRS
Sbjct: 137275 DQELQRIQERVKTFPSACPYWIWGGKVRVQLYDPDYMKVILGRS 137144

Score = 82.7 bits (201), Expect = 2e-14
 Identities = 37/48 (77%), Positives = 41/48 (85%)
 Frame = -2

Query: 120    FAPWIGYGLLLLNGKKWFQHRRMLTPAFHYDILKPYVKIMADSVNIML 167
              F   +GYGLLLLNG+ WFQHRRMLTPAFH DILKPYV +MADSV +ML
Sbjct: 135972 FVSTLGYGLLLLNGQTWFQHRRMLTPAFHNDILKPYVGLMADSVRVML 135829

Score = 67.5 bits (162), Expect = 7e-10
 Identities = 30/43 (69%), Positives = 37/43 (85%)
 Frame = -3

Query: 168    DKWEKLDGQDHPLEIFHCVSLMTLDTVMKCAFSYQGSVQLDEN 210
              DKWE+L GQD PLE+F  VSLMTLDT+MK AFS+QGS+Q+D +
Sbjct: 134795 DKWEELLGQDSPLEVFQHVSLMTLDTIMKSAFSHQGSIQVDRS 134667

Score = 64.4 bits (154), Expect = 6e-09
 Identities = 27/54 (50%), Positives = 41/54 (75%)
 Frame = -2

Query: 210    NSKLYTKAVEDLNNLTFFRLRNAFYKYNIIYNMSSDGRLSHHACQIAHEHTDGV 263
              NS+ Y +A+ DLN+L F  +RNAF++ + IY+++S GR +H ACQ+AH+HT  V
Sbjct: 134301 NSQSYIQAISDLNSLVFCCMRNAFHENDTIYSLTSAGRWTHRACQLAHQHTGSV 134140

Score = 55.0 bits (130), Expect(2) = 1e-53
 Identities = 25/38 (65%), Positives = 33/38 (86%)
 Frame = -1

Query: 260    TDGVIKMRKSQLQNEEELQKARKKRHLDFLDILLFARM 297
              TD VI++RK+QLQ E EL+K ++KRHLDFLDILL A++
Sbjct: 133711 TDQVIQLRKAQLQKEGELEKIKRKRHLDFLDILLLAKV 133598

Score =  179 bits (450), Expect(2) = 1e-53
 Identities = 87/112 (77%), Positives = 102/112 (90%), Gaps = 32/112 (28%)
 Frame = -3

Query: 295    ARMEDRNSLSDEDLRAEVDTFMFEGHDTTASGISWIFYALATHPEHQQRCREEVQSILGD 354
              ++ME+ + LSD+DLRAEVDTFMFEGHDTTASGISWI YALATHP+HQ+RCREE+  +LGD
Sbjct: 133520 SQMENGSILSDKDLRAEVDTFMFEGHDTTASGISWILYALATHPKHQERCREEIHGLLGD 133341

Query: 355    GTSVTW--------------------------------DHLGQMPYTTMCIKEALRLYPP 382
              G S+TW                                +HL QMPYTTMCIKEALRLYPP
Sbjct: 133340 GASITW*VRAQKMGFPAFSTGAPGLPRPCWCSGWNCFRNHLDQMPYTTMCIKEALRLYPP 133161

Query: 383    VISVSRELSSPVTFPDGRSIPKGI 406
              V  + RELS+PVTFPDGRS+PKG+
Sbjct: 133160 VPGIGRELSTPVTFPDGRSLPKGM 133089

Score = 41.4 bits (95), Expect = 0.053
 Identities = 16/26 (61%), Positives = 19/26 (72%)
 Frame = -1

Query: 405    GITATISIYGLHHNPRFWPNPKVFDP 430
              GI   +SIYGLHHNP+ WPN +V  P
Sbjct: 132199 GIMVLLSIYGLHHNPKVWPNLEVCGP 132122

 Score = 52.3 bits (123), Expect = 3e-05
 Identities = 22/27 (81%), Positives = 25/27 (92%)
 Frame = -3

Query: 426    KVFDPSRFAPDSSHHSHAYLPFSGGSR 452
              +VFDPSRFAP S+ HSHA+LPFSGGSR
Sbjct: 131990 QVFDPSRFAPGSAQHSHAFLPFSGGSR 131910

Score =  100 bits (247), Expect = 8e-20
 Identities = 49/59 (83%), Positives = 54/59 (91%)
 Frame = -3

Query: 448    SGGSRNCIGKQFAMNELKVAVALTLLRFELLPDPTRIPVPIARLVLKSKNGIHLCLKKL 506
              S   RNCIGKQFAMN+LKVA ALTLLRFELLPDPTRIP+P+ARLVLKSKNGIHL L++L
Sbjct: 129476 SAWPRNCIGKQFAMNQLKVARALTLLRFELLPDPTRIPIPMARLVLKSKNGIHLRLRRL 129300


This part is the 4Z1 sequence

Score = 58.6 bits (139), Expect = 4e-07
 Identities = 26/43 (60%), Positives = 35/43 (80%)
 Frame = -1

Query: 168    DKWEKLDGQDHPLEIFHCVSLMTLDTVMKCAFSYQGSVQLDEN 210
              +KWE+   Q+  LE+F  VSLMTLD++MKCAFS+QGS+QLD +
Sbjct: 193444 NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS 193316


An EST covers this region and can extend to the next exon 

gb|AI675602.1|AI675602 wc02e11.x1 NCI_CGAP_Pr28 Homo sapiens cDNA clone IMAGE:2314028 3'
           similar to SW:CP41_RAT P08516 CYTOCHROME P450 4A1 ;,
           mRNA sequence
           Length = 469
           
 Score = 91.7 bits (224), Expect = 1e-18
 Identities = 43/43 (100%), Positives = 43/43 (100%)
 Frame = +2

Query: 1   NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS 43
           NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS
Sbjct: 179 NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS 307

Three frames of this EST.  The P450 match does not extend 5 prime or 3 prime; the EST may have retained introns.

FHFQY*QGVLFYMVVIIRQIHFVHFIFSNPSKL*KGHKIT*KLGKDGHVYDHDIHPLPQNKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRSVVKGR*LFANNCVTH*HVVPSSLFQYPGLIPESSVQP*QNLQPAHEQFSTSQRP

FSFSVLTRGFILYGCHNKANSFCTLYIFKPQQTLKGT*NNLKIGKRWACV*S*YSSPAPEQMGGTHCPKLTSGALSTCLPDDPGQHHEVCLQPPGQHPVGQVSGKRKVIVCQ*LCHPLTCCSIFPIPVPWTHT*KQCSTLAKSPTSA*TIFYITTT

FIFSTDKGFYFIWLS**GKFILYTLYFQTPANSKRDIK*LKNWEKMGMCMIMIFIPCPRTNGRNTLPKTHVWSSFNMSP**PWTAS*SVPSATRAASSWTGQW*KEGNCLPITVSPTNMLFHLPYSSTLDSYLKAVFNLSKISNQRMNNFLHHND
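
The three-frame listings in these notes are translations of the EST and GSS nucleotide sequences, which are not reproduced here. A minimal sketch of such a translation, using the standard genetic code (NCBI table 1) and marking stops as * as in the listings; the function names are illustrative, and codons containing N or other ambiguity codes come out as X. Translating the reverse complement gives the three minus-strand frames.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna, frame=0):
    """Translate one reading frame (0, 1 or 2); unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def three_frames(dna):
    """Forward-strand translations in frames 0, 1 and 2."""
    return [translate(dna, frame) for frame in range(3)]


def reverse_complement(dna):
    """Reverse complement, for the three minus-strand frames."""
    return dna.upper().translate(COMPLEMENT)[::-1]


print(three_frames("ATGGCCTAA"))
print(three_frames(reverse_complement("ATGGCCTAA")))
```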

Compare to earlier section of earlier gene
FVSTLGYGLLLLNGQTWFQHRRMLTPAFHNDILKPYVGLMADSVRVML

Compare to next section of earlier gene
NSQSYIQAISDLNSLVFCCMRNAFHENDTIYSLTSAGRWTHRACQLAHQHTGSV

Score = 96.3 bits (236), Expect = 2e-18
 Identities = 40/63 (63%), Positives = 51/63 (80%)
 Frame = -3

Query: 298    EDRNSLSDEDLRAEVDTFMFEGHDTTASGISWIFYALATHPEHQQRCREEVQSILGDGTS 357
              E+    S+ DL+AEV TFMF GHDTT+S ISWI Y LA +PEHQQRCR+E++ +LGDG+S
Sbjct: 178967 ENTKDFSEADLQAEVKTFMFAGHDTTSSAISWILYCLAKYPEHQQRCRDEIRELLGDGSS 178788
Query: 358    VTW 360
              +TW
Sbjct: 178787 ITW 178779


Score = 77.6 bits (188), Expect = 7e-13
 Identities = 32/46 (69%), Positives = 39/46 (84%)
 Frame = -3

Query: 361    DHLGQMPYTTMCIKEALRLYPPVISVSRELSSPVTFPDGRSIPKGI 406
              +HL QMPYTTMCIKE LRLY PV+++SR L  P+TFPDGRS+P G+
Sbjct: 171935 EHLSQMPYTTMCIKECLRLYAPVVNISRLLDKPITFPDGRSLPAGL 171798

Score = 40.6 bits (93), Expect = 0.090
 Identities = 18/37 (48%), Positives = 24/37 (64%)
 Frame = -3

Query: 391    SSPVTFPDGRSIPKGITATISIYGLHHNPRFWPNPKV 427
              SS +  P   S+  GIT  I+I+ LHHNP FW +P+V
Sbjct: 162578 SSLMHLPAFSSVYSGITVFINIWALHHNPYFWEDPQV 162468

Score = 76.9 bits (186), Expect = 1e-12
 Identities = 38/60 (63%), Positives = 47/60 (78%)
 Frame = -1

Query: 447    FSGGSRNCIGKQFAMNELKVAVALTLLRFELLPDPTRIPVPIARLVLKSKNGIHLCLKKL 506
              F G  RNCIG+ FA+ E KVAVALTLLRF+L PD +R P P+ ++VLKSKNGIH+  KK+
Sbjct: 160315 FLGIPRNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKKV 160136



From GSS

gb|AQ394813|AQ394813 CITBI-E1-2542J12.TF CITBI-E1 Homo sapiens genomic clone 2542J12,
           genomic survey sequence [Homo sapiens]
           Length = 715
           
 Score = 91.7 bits (224), Expect = 7e-19
 Identities = 43/43 (100%), Positives = 43/43 (100%)
 Frame = +1

Query: 1   NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS 43
           NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS
Sbjct: 523 NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS 651

Three frames of this entry

PNSNI*LRVFCLDS*LQTL*EQELSLIPF*ILNGSGKKLVKY*IFLGIFS*NQKPLFIYALYRCF*ILIYFLFHVLSA*FFHDKSHINCTNET*KRGKNSYFPTLTKSQIASKAQYHFQY*QGVLFYMVVIIRQIHFVHFIFSNPSKL*KGHKIT*KLGKDGHVYDHDIHPLPQNKWEEHIAQNSRLELFQHVSLMTLDS

RILIYS*GFSVLTLNCKLFENKNYL*FLSKFSMDLGKNLSSTKYF*GSFPRIKNLCLFMLSTDVFEFLFIFYSMFCQLSFSMIKVI*IVLMKLEKEGKIAISPPLQSHK*HLRHSIIFSTDKVFYFIWLS**GKFILYTLYFQTPANSKRDIK*LRNWEKMGMCMIMIFIPCPRTNGRNTLPKTHVWSSFNMSP**PWT

EF*YIVEGFLS*LLTANSLRTRTISNSFLNSQWIWEKTCQVLNIFRDLFLESKTSVYLCSLQMFLNSYLFFIPCFVSLVFP**KSYKLY**NLKKREK*LFPHPYKVTNSI*GTVSFSVLTRCFILYGCHNKANSFCTLYIFKPQQTLKGT*NNLEIGKRWACV*S*YSSPAPEQMGGTHCPKLTSGALSTCLPDDPGQ

This entry extends the 5 prime end

emb|AL136278.1|HS819P24S H.sapiens STS from genomic clone 819P24, sequence tagged site [Homo
           sapiens]
           Length = 499
           
 Score = 45.7 bits (106), Expect = 2e-04
 Identities = 24/25 (96%), Positives = 24/25 (96%)
 Frame = +2

Query: 1   EF*YIVEGFLS*LLTANSLRTRTIS 25
           EF*YIVEGFLS*LLTANSLR RTIS
Sbjct: 425 EF*YIVEGFLS*LLTANSLRIRTIS 499

Three frames of this entry

TQASLSWKSTP*TDAPLFESKAPS**CPLQPAPTTASPCNALTHIQTDTTPIITYLLLFLYSSWLHDSCNLSEHLILVTCLEKSFQIELIN*FTLQS**QLNSV*IFILKWRTSSQ*TFGF*QSEIFWDIGRYVQGGK*HLNSNI*LRVFCLDS*LQTL*E*ELS

NSSFLELEEHTLN*CTSLRV*GSFLMMPPSACSNHCLSLQCPYPHPNRHHPHYHLPIALSLLFLAP*QLQSFRTSDFSNVFGKKLSN*IN*LIHITELITA*QCINIYIEMENFVSINIWLLTK*NFLGYWQVCPRG*ITSEF*YIVEGFLS*LLTANSLRIRTIS

KLKLP*VGRAHPELMHLSSSLRLLPDDAPFSLLQPLPLLAMPLPTSKQTPPPLSPTYCSFSTLLGSMTVAIFQNI*F**RVWKKAFKLN*LINSHYRVNNSLTVYKYLY*NGELRLNKHLAFDKVKFSGILAGMSKGVNNI*ILIYS*GFSVLTLNCKLFENKNYL

These two entries from GSS extend 5 prime again

gb|AQ061657|AQ061657 CIT-HSP-2348A12.TR CIT-HSP Homo sapiens genomic clone 2348A12
           Length = 475
           
 Score =  101 bits (249), Expect = 1e-21
 Identities = 51/54 (94%), Positives = 51/54 (94%)
 Frame = +3

Query: 1   KLKLP*VGRAHPELMHLSSSLRLLPDDAPFSLLQPLPLLAMPLPTSKQTPPPLS 54
           KLK  *VGRAHPELMHLSSSLRLLPDDAPFSLLQPLPLLAMPLPTSKQTPP LS
Sbjct: 312 KLKPX*VGRAHPELMHLSSSLRLLPDDAPFSLLQPLPLLAMPLPTSKQTPPXLS 473

Three frames of this entry

LFR*HYRILKLIISFYTTTVLQFCCIISILLVRILRCRD*AVCLKSQGRPGAVAHACNPSTLGGQGGRMA*AQEFKTSLANMAKPHLY*KYMIIRQIRGRVEFKLKPLELEEHTLN*CT

AI*VTL*NTQAYYILLYYYSLTILLYYLHFASKDIEMQRLSSLFKVTRQARCSGSCL*SQHFGRPRWADGLSPGVQDQPGQHGQTPSLLKIHDHKAD*GKS*VQTQAS*VGRAHPELMH

SYLGDTIEYSSLLYPFILLQSYNFAVLSPFC**GY*DAEIKQFV*SHKAGQVQWLMPVIPALWEAKVGGWLEPRSSRPAWPTWPNPISTKNT*S*GRLGEELSSNSSLLSWKSTP*TDA

This sequence contains an Alu repeat

The entry below may extend the sequence
gb|AC006028.2|AC006028 Homo sapiens clone GS165O14, WORKING DRAFT SEQUENCE, 5 unordered pieces
              Length = 284673
              
 Score = 30.9 bits (68), Expect = 6.1
 Identities = 13/15 (86%), Positives = 14/15 (92%)
 Frame = -1

Query: 1      SYLGDTIEYSSLLYP 15
              SYLGDTIEYSSL +P
Sbjct: 284577 SYLGDTIEYSSLTWP 284533

Three frames of the sequence
This frame has a possible ETAM exon of a P450
VMLTVDSRGSPCVEL*ADNNFTQETAMTMITPSYLGDTIEYSSLTWPLPCWAASALGASVPSQ*PQPAAGLPLGSPAHLNRHWV*DTLWSL

*C*RSTLEDPLVWNCERITISHRKQL*P*LRQAI*VTL*NTQA*PGHFLAGQPLP*EPQCPLSDHSQLLASHWVPPHT*TVTGFKTPCGR

DADGRL*RIPLCGIVSG*QFHTGNSYDHDYAKLFR*HYRILKLDLATSLLGSLCLRSLSALSVTTASCWPPTGFPRTPEPSLGLRHPVVV

These ESTs contain this ETAM exon (the match may be accidental):


 gb|AA514190|AA514190 HFLEST-741 Human fetal liver (S.Xue) Homo sapiens cDNA
           Length = 503
           
 Score = 46.9 bits (109), Expect = 3e-05
 Identities = 25/35 (71%), Positives = 28/35 (79%)
 Frame = -2

Query: 1   ETAMTMITPSYLGDTIEYSSLTWPLPCWAASALGA 35
           ETAMTMITPSYLGDTIEYSS       +A++ALGA
Sbjct: 340 ETAMTMITPSYLGDTIEYSS-------YASNALGA 257

 gb|H00072|H00072 ph5g11u_19/1TV Homo sapiens cDNA clone ph5g11u_19/1TV.
           Length = 334
           
 Score = 46.9 bits (109), Expect = 3e-05
 Identities = 25/35 (71%), Positives = 28/35 (79%)
 Frame = -3

Query: 1   ETAMTMITPSYLGDTIEYSSLTWPLPCWAASALGA 35
           ETAMTMITPSYLGDTIEYSS       +A++ALGA
Sbjct: 269 ETAMTMITPSYLGDTIEYSS-------YASNALGA 186

 gb|H00069|H00069 ph5f06u_19/1TV Homo sapiens cDNA clone ph5f06u_19/1TV.
           Length = 405
           
 Score = 45.7 bits (106), Expect = 8e-05
 Identities = 24/35 (68%), Positives = 28/35 (79%)
 Frame = -2

Query: 1   ETAMTMITPSYLGDTIEYSSLTWPLPCWAASALGA 35
           ETAMTMITPSYLGDTIEYSS       ++++ALGA
Sbjct: 164 ETAMTMITPSYLGDTIEYSS-------YSSNALGA 81

 gb|T48598|T48598 ph6f9_19/1TV Homo sapiens cDNA clone ph6f9_19/1TV.
           Length = 375
           
 Score = 45.7 bits (106), Expect = 8e-05
 Identities = 24/35 (68%), Positives = 28/35 (79%)
 Frame = -3

Query: 1   ETAMTMITPSYLGDTIEYSSLTWPLPCWAASALGA 35
           ETAMTMITPSYLGDTIEYSS       ++++ALGA
Sbjct: 271 ETAMTMITPSYLGDTIEYSS-------YSSNALGA 188

 emb|AL045794.1|AL045794 DKFZp434G226_r1 434 (synonym: htes3) Homo sapiens cDNA clone
           DKFZp434G226 5', mRNA sequence
           Length = 508
           
 Score = 45.3 bits (105), Expect = 1e-04
 Identities = 21/21 (100%), Positives = 21/21 (100%)
 Frame = -2

Query: 1   ETAMTMITPSYLGDTIEYSSL 21
           ETAMTMITPSYLGDTIEYSSL
Sbjct: 120 ETAMTMITPSYLGDTIEYSSL 58

 gb|AI535983.1|AI535983 xu.P6.B5 conorm Homo sapiens cDNA 3', mRNA sequence
          Length = 517
          
 Score = 45.3 bits (105), Expect = 1e-04
 Identities = 21/21 (100%), Positives = 21/21 (100%)
 Frame = -2

Query: 1  ETAMTMITPSYLGDTIEYSSL 21
          ETAMTMITPSYLGDTIEYSSL
Sbjct: 99 ETAMTMITPSYLGDTIEYSSL 37

 gb|T41397|T41397 ph4c2_19/1TV Homo sapiens cDNA clone ph4c2_19/1TV.
           Length = 308
           
 Score = 44.5 bits (103), Expect = 2e-04
 Identities = 24/35 (68%), Positives = 27/35 (76%)
 Frame = -1

Query: 1   ETAMTMITPSYLGDTIEYSSLTWPLPCWAASALGA 35
           E AMTMITPSYLGDTIEYSS       +A++ALGA
Sbjct: 308 EPAMTMITPSYLGDTIEYSS-------YASNALGA 225

 gb|AI535783.1|AI535783 jun1.M13-Control conorm Homo sapiens cDNA 3', mRNA sequence
          Length = 506
          
 Score = 34.4 bits (77), Expect = 0.19
 Identities = 15/15 (100%), Positives = 15/15 (100%)
 Frame = -2

Query: 1  ETAMTMITPSYLGDT 15
          ETAMTMITPSYLGDT
Sbjct: 58 ETAMTMITPSYLGDT 14

 gb|AQ059217|AQ059217 CIT-HSP-2348B20.TR CIT-HSP Homo sapiens genomic clone 2348B20
           Length = 403
           
 Score = 63.6 bits (152), Expect = 2e-10
 Identities = 31/32 (96%), Positives = 31/32 (96%)
 Frame = +2

Query: 1   KLKLP*VGRAHPELMHLSSSLRLLPDDAPFSL 32
           KLK P*VGRAHPELMHLSSSLRLLPDDAPFSL
Sbjct: 308 KLKPP*VGRAHPELMHLSSSLRLLPDDAPFSL 403

Sequence from ESTs:

GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAGLRNCIGQHF
AIIECKVAVALTLLR
FKLAPDHSRPPQPVRQVVLKSKNGIHVCQKKFA*

gb|AA193450|AA193450 zr40e07.r1 Soares NhHMPu S1 Homo sapiens cDNA clone 665892 5'
           Length = 623
           
 Score =  100 bits (246), Expect = 3e-20
 Identities = 47/47 (100%), Positives = 47/47 (100%)
 Frame = -2

Query: 45  YLKAVFNLSKISNQRMNNFLHHNDLVFKFSSQGQIFSKFNQELHQFT 91
           YLKAVFNLSKISNQRMNNFLHHNDLVFKFSSQGQIFSKFNQELHQFT
Sbjct: 262 YLKAVFNLSKISNQRMNNFLHHNDLVFKFSSQGQIFSKFNQELHQFT 122

Three frames of this EST

GHFFLNPRETLKGP*NNLKLGKDGPCDDQIFIPCPRTKWGEHIAPKPHGLGLFNMSPPGWTLGQHHGRWSLQAPGQPSKLTGQVVKGKVIGLPINWGHPTNMVFHLPLIPVPWTHYLKAVFNLSKISNQRMNNFLHHNDLVFKFSSQGQIFSNLTKNFISSQVSPGIYMARVHYETSHCLRH*LWLWLLLLWTYGMVIIQI

DTFF*TPGKL*RGLKIT*NWEKMGHVMIRYSSLAPEQNGGNTLPQNLTVWGSLTCLPLDGPWDSIMEGGPFRHQGSHPS*QVRW*KGR*LVCQLTGVTQLTWCSIFP*FQYPGLIT*KQCSTLAKSPTSA*TIFYITTTWFSNSALKAKSFQI*PRTSSVHRLVLGFTWPESTMRHLIV*DIDSGYGCFYYGHMAWSSFRL

WTLFFKPPGNSKGALK*LKIGKRWAM**SDIHPLPQNKMGGTHCPKTSRSGAL*HVSPWMDPGTASWKVVPSGTRAAIQVDRSGGKREGNWFAN*LGSPN*HGVPSSPNSSTLDSLPESSVQP*QNLQPAHEQFSTSQRPGFQIQLSRPNLFKFNQELHQFTG*SWDLHGQSPL*DISLSETLTLVMAASTMDIWHGHHSDW

Compare with these sequences.
Next region upstream:
NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS

Preceding section, from a related gene:
FVSTLGYGLLLLNGQTWFQHRRMLTPAFHNDILKPYVGLMADSVRVML


gb|H21976|H21976 yl38c11.r1 Homo sapiens cDNA clone 160532 5' similar to
           SP:CP4B_RABIT P15128 CYTOCHROME P450 IVB1 ;.
           Length = 332
           
 Score =  105 bits (259), Expect = 8e-22
 Identities = 45/45 (100%), Positives = 45/45 (100%)
 Frame = +1

Query: 239 GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPF 283
           GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPF
Sbjct: 46  GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPF 180

 Score = 81.1 bits (197), Expect = 2e-14
 Identities = 40/51 (78%), Positives = 41/51 (79%)
 Frame = +2

Query: 273 EKIHPYAFIPFSAGPRNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQP 323
           +K  P     FSAG RNCIGQHFAIIEC VAVALTLLRFKLAPD SRPPQP
Sbjct: 149 KKYIPMPSYHFSAGLRNCIGQHFAIIECXVAVALTLLRFKLAPDXSRPPQP 301

gb|AI675602.1|AI675602 wc02e11.x1 NCI_CGAP_Pr28 Homo sapiens cDNA clone IMAGE:2314028 3'
           similar to SW:CP41_RAT P08516 CYTOCHROME P450 4A1 ;,
           mRNA sequence
           Length = 469
           
 Score = 91.7 bits (224), Expect(2) = 4e-28
 Identities = 43/43 (100%), Positives = 43/43 (100%)
 Frame = +2

Query: 1   NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS 43
           NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS
Sbjct: 179 NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS 307

 Score = 55.8 bits (132), Expect(2) = 4e-28
 Identities = 25/25 (100%), Positives = 25/25 (100%)
 Frame = +3

Query: 44  SYLKAVFNLSKISNQRMNNFLHHND 68
           SYLKAVFNLSKISNQRMNNFLHHND
Sbjct: 393 SYLKAVFNLSKISNQRMNNFLHHND 467

gb|AI668602.1|AI668602 yl48g04.x5 Soares breast 3NbHBst Homo sapiens cDNA clone
           IMAGE:161526 3' similar to SW:CP4B_RABIT P15128
           CYTOCHROME P450 4B1 ;, mRNA sequence
           Length = 629
           
 Score =  153 bits (384), Expect = 2e-36
 Identities = 75/79 (94%), Positives = 75/79 (94%)
 Frame = -1

Query: 264 PLRFSRENSEKIHPYAFIPFSAGPRNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQP 323
           PLRFSRE SE IHPYAFIPFSAG RNCIGQHFAIIEC VAVALTLLRFKLAPDHSRPPQP
Sbjct: 629 PLRFSREISEXIHPYAFIPFSAGLRNCIGQHFAIIECXVAVALTLLRFKLAPDHSRPPQP 450

Query: 324 VRQVVLKSKNGIHVFAKKV 342
           VRQVVLKSKNGIHVFAKKV
Sbjct: 449 VRQVVLKSKNGIHVFAKKV 393

gb|AI668594.1|AI668594 yl38c11.x5 Soares breast 3NbHBst Homo sapiens cDNA clone
           IMAGE:160532 3' similar to SW:CP4B_RABIT P15128
           CYTOCHROME P450 4B1 ;, mRNA sequence
           Length = 629
           
 Score =  159 bits (399), Expect = 3e-38
 Identities = 77/79 (97%), Positives = 77/79 (97%)
 Frame = -1

Query: 264 PLRFSRENSEKIHPYAFIPFSAGPRNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQP 323
           PLRFSRENSE IHPYAFIPFSAG RNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQP
Sbjct: 629 PLRFSRENSEXIHPYAFIPFSAGLRNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQP 450

Query: 324 VRQVVLKSKNGIHVFAKKV 342
           VRQVVLKSKNGIHVFAKKV
Sbjct: 449 VRQVVLKSKNGIHVFAKKV 393

gb|AI820775.1|AI820775 yl38c11.y5 Soares breast 3NbHBst Homo sapiens cDNA clone
           IMAGE:160532 5' similar to SW:CP4B_RABIT P15128
           CYTOCHROME P450 4B1 ;, mRNA sequence
           Length = 548
           
 Score =  220 bits (555), Expect = 2e-56
 Identities = 103/104 (99%), Positives = 103/104 (99%)
 Frame = +2

Query: 239 GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAGPRNCIGQHFAII 298
           GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAG RNCIGQHFAII
Sbjct: 101 GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAGLRNCIGQHFAII 280

Query: 299 ECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKKV 342
           ECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKKV
Sbjct: 281 ECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKKV 412

 gb|AI733538.1|AI733538 yl48g04.y5 Soares breast 3NbHBst Homo sapiens cDNA clone
           IMAGE:161526 5' similar to SW:CP4B_RABIT P15128
           CYTOCHROME P450 4B1 ;, mRNA sequence
           Length = 535
           
 Score =  220 bits (555), Expect = 2e-56
 Identities = 103/104 (99%), Positives = 103/104 (99%)
 Frame = +3

Query: 239 GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAGPRNCIGQHFAII 298
           GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAG RNCIGQHFAII
Sbjct: 87  GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAGLRNCIGQHFAII 266

Query: 299 ECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKKV 342
           ECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKKV
Sbjct: 267 ECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKKV 398

 gb|H25624|H25624 yl48g04.r1 Homo sapiens cDNA clone 161526 5' similar to gb:J02871
           CYTOCHROME P450 IVB1 (HUMAN);contains Alu repetitive
           element;.
           Length = 432
           
 Score =  201 bits (507), Expect = 7e-51
 Identities = 96/102 (94%), Positives = 97/102 (94%), Gaps = 1/102 (0%)
 Frame = +3

Query: 239 GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAGPRNCIGQHFAII 298
           GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAG RNCIGQHFAII
Sbjct: 87  GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAGLRNCIGQHFAII 266

Query: 299 ECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKN-GIHVFAK 340
           ECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSK    HVF +
Sbjct: 267 ECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKXWEFHVFCQ 395

For a list of UNIGENE entries for human P450s, see the human UNIGENE P450s page.