puffer.html

Last modified March 25, 2002

D. Nelson

New: see the poster "Comparative genomics of Fugu and Human P450s".

On Oct 26, 2001 the JGI released a draft assembly of Fugu. Fugu P450s have been assembled from the genomic data at the Fugu blast servers. There are 75 different contigs: 35 are complete assembled genes, and 12 more are nearly complete, missing only small parts of the sequence, from a few amino acids up to one or two exons. These 47 Fugu P450s have been aligned to 60 human sequences and 8 other fish sequences and used to make a tree. In this tree human branches are red, Fugu branches are blue and other fish are gray.

The remaining 28 Fugu sequence contigs are made up of 11 pseudogene pieces (2K12P, 2K13P, 2K14P, 2K15P, 2P5P, 2R2P, 2R3P, 2X5P, 3A50P, 8A3P, 8B3P) and 17 partials. The 17 partials will probably collapse to no more than 8-12 genes, since some appear to be from different parts of the same gene. For example, CYP2P4 is missing exons 4, 7, 8 and 9, and there are two fragments covering exon 7 and exons 8 and 9. CYP2X3 and CYP2X4 are both missing exons 1 and 2. There are two CYP2X fragments that cover exon 1 and exon 2, but it is not possible to tell whether they belong to 2X3 or 2X4. CYP27A3 is missing exons 3 and 4; however, there are two exon 4 CYP27A fragments to choose from. One probably belongs to 27A3 and the other may be a pseudogene fragment or an alternatively spliced exon 4.

The sequences CYP1A1, 2K11, 3A49 and 11B1 are probably real genes that are missing more than half of their sequence, or they are hybrids made from several fragments that may or may not belong together (like 11B1). There are two CYP17A fragments and one 3A fragment that are short. They might be from real genes or from pseudogenes. More sequence data are needed to clarify their status.

Based on these arguments there is good evidence for 55 Fugu P450s plus 11-15 pseudogenes. There is a blast server set up to search these Fugu P450s. Go to the P450 blast server. The sequences used in this BLAST search database are shown in more detail in the FASTA file below.

This FASTA file has some sequences assembled with zebrafish sequence, Tetraodon sequence or human sequence to fill gaps in the assembly. These are not included in the blast file, to avoid confusion between species. To see the alphabetical list of accession numbers, go to Alpha List.
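The contig bookkeeping above can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the counts quoted in the text. The variable names are illustrative, and reading the figure of 55 genes as the 47 aligned genes plus the low end of the 8-12 estimate for the partials is an assumption, not something the text states outright.

```python
# Counts quoted in the text above.
TOTAL_CONTIGS = 75
COMPLETE_GENES = 35
NEARLY_COMPLETE_GENES = 12
PSEUDOGENE_PIECES = [
    "2K12P", "2K13P", "2K14P", "2K15P", "2P5P", "2R2P",
    "2R3P", "2X5P", "3A50P", "8A3P", "8B3P",
]
PARTIAL_CONTIGS = 17
# Estimated number of real genes the 17 partials will collapse into.
PARTIALS_AS_GENES = (8, 12)

# Genes good enough to align and place in the tree.
aligned_genes = COMPLETE_GENES + NEARLY_COMPLETE_GENES
# Everything else: pseudogene pieces plus partials.
remaining_contigs = TOTAL_CONTIGS - aligned_genes


def gene_estimate():
    """Return (low, high) gene counts.

    Assumption: aligned genes plus the estimated range for the partials.
    """
    low = aligned_genes + PARTIALS_AS_GENES[0]
    high = aligned_genes + PARTIALS_AS_GENES[1]
    return low, high


if __name__ == "__main__":
    print("aligned genes:", aligned_genes)
    print("remaining contigs:", remaining_contigs)
    print("pseudogene pieces + partials:",
          len(PSEUDOGENE_PIECES) + PARTIAL_CONTIGS)
    print("gene estimate (low, high):", gene_estimate())
```

The low end of the range matches the 55 genes quoted in the text. The high end depends on how many of the partials turn out to be separate genes.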

The Tetraodon sequences have not been searched yet. There are 189,036 GSSs for Tetraodon nigroviridis (freshwater pufferfish) and 45,707 GSSs for Fugu rubripes (saltwater Japanese pufferfish) in GenBank, so there should be many P450 sequence fragments in these collections.
D. Nelson

May 24, 2001

June 22, 2001

The blast server at http://bahama.jgi-psf.org/prod/bin/blast_fugu.cgi holds 412,000 Fugu sequences. These represent about 200 million bases of sequence, about half the genome size of Fugu. Allowing for some redundancy, less than half of the genome is represented in this set. I have searched these with 18 mammalian P450s and found 107 accession numbers that have P450 sequence. These have been sorted into 80 contigs and 17 different P450 families. Only CYP39 is still missing.

August 27, 2001

The Fugu blast server at http://fugu.hgmp.mrc.ac.uk/blast/ has 979,612 sequences; 421,163,906 total letters, or just over 1X coverage of the genome. These sequences have been searched and the results posted in two files below. There are now 272 accession numbers in 108 contigs. The sequences are still very fragmentary, consisting mostly of single exons or a few exons on a single contig. There is some uncertainty about how to join these into a complete sequence. For example, a complete sequence of CYP3A can be assembled from individual reads, but there are at least two different CYP3As in Fugu, so this is a chimeric sequence.

The other complication with ray-finned fish is a predicted whole genome duplication that occurred in this lineage after divergence from tetrapods (us). That is why fish have seven Hox gene clusters instead of the four seen in mammals. After this duplication many parts of the duplicated genome were lost; however, some parts were retained. It is known that fish have two CYP19 sequences, probably as a result of the genome duplication. It looks like there might be two CYP17 sequences, two CYP46 sequences and two CYP26B1 sequences. This will add to the difficulty of assembling these genes, and other P450s can be expected to have similar duplications.

In my assembly of contigs, I have tried to get full-length P450s at the risk of making chimeric sequences. I expect these will be self-correcting as the genome is more deeply sequenced and the contig size increases, so beware of this when looking at the assembled genes. CYP8B1 is shown as a complete sequence, but there are minor conserved sequence variations between reads, so there are probably two CYP8B1s in Fugu.
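The coverage figures quoted above can be reproduced with a short calculation. This is a rough sketch: the genome size of about 400 million bases is an assumption, inferred from the statement that 200 million bases is about half the genome. The result is raw coverage, with no correction for redundant or overlapping reads.

```python
# Assumed genome size: the text says 200 million bases is "about half".
GENOME_SIZE_ESTIMATE = 2 * 200_000_000

# Figures quoted for the August 27, 2001 blast database.
NUM_SEQUENCES = 979_612
TOTAL_LETTERS = 421_163_906


def raw_coverage(total_bases, genome_size):
    """Total sequenced bases divided by genome size (ignores redundancy)."""
    return total_bases / genome_size


def mean_read_length(total_bases, num_reads):
    """Average number of bases per sequence in the database."""
    return total_bases / num_reads


coverage = raw_coverage(TOTAL_LETTERS, GENOME_SIZE_ESTIMATE)
read_length = mean_read_length(TOTAL_LETTERS, NUM_SEQUENCES)

if __name__ == "__main__":
    print(f"raw coverage: {coverage:.2f}X")
    print(f"mean read length: {read_length:.0f} bases")
```

At roughly 1X raw coverage a large share of the genome is still unsampled, which is consistent with the fragmentary single-exon contigs described above.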