Rice P450 Links

From Chlamydomonas to rice: the evolution of green P450s. A link to my talk given in Los Angeles, August 21, 2002.

Note: the japonica rice strain has been revised (9/19/2007). See the two links below. The first is an Excel spreadsheet of all japonica P450s, including name, possible function, references and accession numbers. The second file contains the sequences.

Includes ortholog pairs and 953 sequences from japonica and indica

Sept. 25, 2002: There are 458 different P450 gene sequences in rice. Some of these are short pseudogene fragments of only one exon or part of one exon. I count 309 full length, probably functional, P450 genes and another 14 that are now incomplete but may be completed as the genome is finished. That would give 323 full length P450s and 135 pseudogenes.

This compares to 249 full length P450s in Arabidopsis and 23 pseudogenes.
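The arithmetic behind these counts can be checked with a short sketch. It assumes, as the note does, that the 14 incomplete genes will eventually count as full length; the variable names are illustrative only.

```python
# Counts quoted in the Sept. 25, 2002 note.
total_sequences = 458   # all distinct rice P450 gene sequences
full_length = 309       # full length, probably functional
incomplete = 14         # incomplete now, may be completed later

# Assumption from the note: incomplete genes are counted as full length.
projected_full_length = full_length + incomplete
pseudogenes = total_sequences - projected_full_length

# Arabidopsis figure from the same note, for comparison.
arabidopsis_full_length = 249
ratio = projected_full_length / arabidopsis_full_length

print(projected_full_length, pseudogenes)  # 323 135
print(f"{ratio:.2f}")                      # 1.30
```

On these numbers rice has about 1.3 times as many full length P450s as Arabidopsis.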

Several sequences may join together to form complete sequences: (98A17 and 98A18), (714D1 and 714D2), and various CYP730 fragments. This may reduce the gene count by three or four.

August 9, 2002: 875 total rice contigs; 341 are japonica and 534 are indica. 250 japonica sequences are full length. 293 rice sequences are full length (from both subspecies).

I will be trying to name all the rice genes before August 20. All gene pieces have been sorted into family bins and many have been named, but some, like the 94 family, still need to be sorted into subfamilies. (8/19/02)

As of March 1, 2002, I have updated the rice P450 page. There are now 202 full length P450s, 61 pseudogene fragments and 61 incomplete sequences. A blast server for the rice P450 sequences is available. The 324 sequence contigs are non-redundant, meaning I have blasted them against each other and combined overlapping fragments into contigs. I have done systematic searches of the Genbank database for rice P450 ESTs, GSSs, sequences in nr and HTGS hits. These are compiled into the set of sequences on the blast server. The public data has now been completely searched. I will now be looking at the private 5X coverage of rice, but the results from this cannot be presented until they are approved according to legal agreements, so I will not be able to add them daily.
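The idea of comparing every fragment against every other and combining the overlapping ones into non-redundant contigs can be illustrated with a toy sketch. This is not the procedure used for the rice set, which relied on BLAST; it assumes exact string overlaps, and the names `overlap_length`, `merge_fragments` and `min_overlap`, as well as the example fragments, are made up for illustration.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_fragments(fragments, min_overlap=5):
    """Greedily merge exactly overlapping fragments into non-redundant contigs."""
    # Drop exact duplicates and fragments fully contained in a longer one.
    unique = sorted(set(fragments), key=len, reverse=True)
    contigs = []
    for frag in unique:
        if not any(frag in kept for kept in contigs):
            contigs.append(frag)

    # Repeat all-against-all comparison until no pair overlaps any more.
    merged = True
    while merged:
        merged = False
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size:
                    joined = left + right[size:]
                    contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
                    contigs.append(joined)
                    merged = True
                    break
            if merged:
                break
    return contigs


# Made-up fragments: three overlap end to end, one is contained, one stands alone.
toy = ["MALLLAVFLAS", "VFLASLLFLR", "LLFLRKWRRS", "FLAS", "GGPPTPFLGN"]
print(sorted(merge_fragments(toy)))
```

Real sequence data contains mismatches and gaps, so an alignment tool is needed in practice; the exact-match rule here only shows the bookkeeping of collapsing fragments into contigs.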

Note from Oct. 16, 2001

Since 1993, there are 197241 rice entries in Genbank.

The majority are recent additions.

2001  32693 up to Oct. 16, 2001
2000  43284
1999  67898
1998  34465
1997   8170
1996   1529
1995   3733
1994   3468
1993   4866

no entries before 1993

Of the total number of entries (197241):
93164 are GSS sequences = Genome Survey Sequence (from BAC ends)
92634 are ESTs
553 are STS (Sequence Tagged Sites for mapping)
277 are patents
1145 are HTGS (high throughput genomic sequences)
6667 are microsatellite repeats from the Monsanto data (in NR section with accession numbers starting with AY)
2801 are other NR section sequences: mRNAs, genes and finished BACs
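As a quick check, the category counts above add up to the stated total. The sketch below verifies this and prints each category's share; the dictionary keys are shorthand labels chosen here, not Genbank terms.

```python
total_entries = 197241

# Entry counts by category, as listed above.
categories = {
    "GSS": 93164,
    "EST": 92634,
    "STS": 553,
    "patent": 277,
    "HTGS": 1145,
    "microsatellite": 6667,
    "other NR": 2801,
}

category_sum = sum(categories.values())
print(category_sum == total_entries)  # True

# Share of each category, largest first.
for name, count in sorted(categories.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} {count:7d} {100 * count / total_entries:5.1f}%")

gss_est_share = 100 * (categories["GSS"] + categories["EST"]) / total_entries
print(f"GSS + EST: {gss_est_share:.1f}%")  # GSS + EST: 94.2%
```

GSS and EST entries together make up about 94% of all rice entries.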

in 1999 the activity was

01/  6291
02/  4294
03/   688
04/  4191
05/  4860  1709 GSSs + 3094 ESTs
06/ 10776  6162 GSSs + 4527 ESTs
07/  6370  5050 GSSs + 1284 ESTs
08/  3785  3239 GSSs +  444 ESTs
09/  1577  1188 GSSs +  360 ESTs
10/  1089   839 GSSs +  216 ESTs
11/ 18077 17800 GSSs +  261 ESTs
12/ 11507 11412 GSSs +   72 ESTs

in 2000 the activity was

43284 20085 GSSs + 22558 ESTs for the whole year

in 2001 the activity was

32639    62 GSSs + 23831 ESTs (up to 10/16/01)

01/   916    0 GSSs +   793 ESTs
02/  7063    0 GSSs +  1276 ESTs (5637 = microsatellite sequences)
03/  1833   51 GSSs +  1614 ESTs
04/  1311    0 GSSs +  1212 ESTs
05/  2766    0 GSSs +  2632 ESTs
06/   702    0 GSSs +   595 ESTs
07/  3647    2 GSSs +  3355 ESTs
08/   576    0 GSSs +    85 ESTs
09/   420    9 GSSs +   420 ESTs
10/ 13405    0 GSSs + 12269 ESTs (12269 ESTs deposited Oct. 5)
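The monthly totals and GSS counts for 2001 can be summed to cross-check the summary line for the year (32639 entries, 62 GSSs). A minimal sketch, with the numbers copied from the table above and variable names chosen for illustration; only these two columns are checked here.

```python
# Monthly figures for 2001 (January through October 16): (total entries, GSSs).
monthly_2001 = {
    "01": (916, 0),
    "02": (7063, 0),
    "03": (1833, 51),
    "04": (1311, 0),
    "05": (2766, 0),
    "06": (702, 0),
    "07": (3647, 2),
    "08": (576, 0),
    "09": (420, 9),
    "10": (13405, 0),
}

total_2001 = sum(total for total, _ in monthly_2001.values())
gss_2001 = sum(gss for _, gss in monthly_2001.values())
busiest_month = max(monthly_2001, key=lambda m: monthly_2001[m][0])

print(total_2001)     # 32639
print(gss_2001)       # 62
print(busiest_month)  # 10
```

October dominates the year because of the single large EST deposit noted on that row.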