Ciona P450s

Last modified July 17, 2003 David Nelson

An alignment has been made of all CYP4 clan members from Ciona savignyi and 
Ciona intestinalis.  The alignment was made to help refine the sequences by 
comparing them to others and to help in identifying intron-exon boundaries.  
There are 35 sequences in this set.  One set of 10 sequences from savignyi has 
expanded recently, since there is only a single intestinalis ortholog.
An alignment of CYP4 clan Ciona sequences
Some intron-exon boundaries are marked, but this is not finished.
Blue = phase 0, magenta = phase 1, green = phase 2.
A tree of CYP4 clan Ciona sequences
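
For readers unfamiliar with intron phase, here is a minimal Python sketch of how the phase of each intron can be derived from the coding lengths of the exons, using the color key given above. It assumes the usual convention (phase = number of coding bases before the intron, modulo 3). The function name and the exon lengths are hypothetical, made up for illustration, and are not taken from any Ciona gene.

```python
# Color key as stated in the alignment legend.
PHASE_COLORS = {0: "blue", 1: "magenta", 2: "green"}

def intron_phases(exon_lengths):
    """Return the phase of each intron, given coding exon lengths in order.

    Only coding bases should be counted, starting at the first base of
    the start codon. Phase 0 means the intron falls between two codons,
    phase 1 after the first base of a codon, phase 2 after the second.
    """
    phases = []
    coding_bases = 0
    for length in exon_lengths[:-1]:  # no intron follows the last exon
        coding_bases += length
        phases.append(coding_bases % 3)
    return phases

# Hypothetical exon lengths, for illustration only.
example_exons = [120, 85, 142, 200]
phases = intron_phases(example_exons)
colors = [PHASE_COLORS[p] for p in phases]
```

Two genes whose introns fall at the same alignment position with the same phase are more likely to share that intron by descent, which is why the phases are color coded in the alignment.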

Most Ciona sequences have now been identified from both species; however, the 
other sequences are not refined as to their full length and intron-exon 
boundaries.  This is a work in progress.  The partially assembled sequences are 
here.  There are more savignyi CYPs (about 97).  
Sequences of all CYP Ciona sequences
Table of all Ciona savignyi CYP sequences by scaffold number

section below modified Jan. 7, 2002  David Nelson

New info:  An assembly of the C. intestinalis genome has become available.
I have searched this assembly for P450s and have used it to complete the 
partial sequences I had already assembled from individual reads.  I am in 
the process of improving my sequences by comparison with the assembled 
genome.  I have made a new file of 113 sequences: revised Ciona P450 contigs. 
This includes 83 C. intestinalis sequences and 30 C. savignyi 
sequences.  I have made a tree of the complete sequences
(80 C. intestinalis and 18 C. savignyi sequences)
leaving out those partials and pseudogenes that would affect the tree 
building algorithm.  The new tree is
new Ciona tree.  
This tree shows the typical expansion of a single clan to make P450s 
for the organism.  The clan used is the CYP2 clan and there are 51 
C. intestinalis P450s in this clan.

A complete cross reference table has been constructed to link my assemblies 
with the JGI assemblies.  The JGI sequences are linked from this table.

Only two P450 pseudogenes have been found in Ciona intestinalis.
These are sequence 91 (a possible pseudogene of sequence 36), 
and sequence 232 (a possible pseudogene of sequence 112).  
Sequence 231 is incomplete, but it is 80% identical 
to sequence 64 and may represent part of a complete gene.  Sequence 
231 is not found in the JGI assembly v1.0.  It may be upstream of scaffold 638.
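
As a rough illustration of what a figure such as "80% identical" means, the sketch below computes percent identity over two already-aligned protein sequences. This is an assumption about the convention, not necessarily how the number above was obtained: here gapped columns are skipped, and other tools divide by alignment length or by the shorter sequence instead. The two peptides are hypothetical and are not sequences 231 or 64.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the ungapped aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared

# Hypothetical aligned peptides, for illustration only.
identity = percent_identity("FGAGRRICPG", "FGLGKRICPG")
```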

The 83 C. intestinalis sequences include the two pseudogenes (seq 91 and seq 232).
This leaves 81 predicted functional genes (assuming seq 231 will be an intact gene).

The Ciona genes will be named after the Anopheles genes are named.

The genome of Ciona savignyi has been sequenced to 14X 
coverage at the Whitehead Institute (genome size 180 Mb).
The genome of Ciona intestinalis is being sequenced at the 
Joint Genome Institute (2.5X coverage, Blast searchable).

I am interested in finding all the P450s from this model urochordate genus, which is simpler than Fugu yet more closely related to mammals/vertebrates than to echinoderms. This fits in with comparative P450 studies already done on mammals and Fugu. To this end, Rob Edwards and I have downloaded the 44 sequence files from the Whitehead Institute and set them up on a local Blast server so they can be searched. We attempted to assemble the genome with Phrap from the 4.3 million reads and their associated quality files, but our Linux Dell PC could not do this.

The MSCI814 Bioinformatics class (25 students and some auditors) that Rob and I taught last semester scoured the data for every P450 hit and tried to assemble the genes. This part of the course was very difficult for the students. In fact, no one got a complete Ciona P450 gene assembled. There were too many exons per gene and the reads were too short to easily link the exons. Extensive chromosome walking was required and the students did not fare too well at this task. I have been working on it myself in October.

So far, 77 different P450 sequences have been found in the Ciona intestinalis data. I have assembled 75 sequences from the I-helix to the end of the gene. 31 are completely assembled. 13 more savignyi P450s are completely assembled. The sequences found in Ciona savignyi will be blasted against the intestinalis Blast server at JGI to find the orthologs in that species and to help in assembly if there are any gaps in the savignyi sequences. The sequence coverage at JGI is lower than that of the Whitehead data, so some missing sequences are expected.

To see the detailed progress in analyzing these genomes for P450s, see the bioinformatics course pages on this process.
Accession hunting gene assembly
An alignment of 77 C-terminals is shown here.
Alignment
A phylogenetic tree of 75 of these sequences is shown here.
Tree
Because the font size is too small to read in this picture, see Bare Tree for a tree with readable labels but no clan annotations.
Bare Tree 2: a tree with some extra Fugu reference sequences

Summary of older information:

Blast searches have been done with P450s against the JGI Ciona intestinalis sequence data. The CYP1A1 blast gave 250 valid P450 hits. The protein sequences from these hits have been extracted and blast searched against each other to find overlaps. Since then 25 more P450s have been used to find accession numbers and sequences. The resulting assemblies are
210 Ciona P450 contigs.
All 18 mammalian P450 families and 8 additional subfamilies (1B1, 2A6, 2D6, 2F1, 2W1, 26B1, 27B1, 27C1) have now been searched against Ciona at JGI. There are 780 accessions so far, with a few more expected. All of these have been translated and assembled. Including some Ciona savignyi sequences, the Blast file now has 210 contigs. 44 genes are complete.
For a FASTA file, see the FASTA list.
For a more detailed list with accession numbers, see the master sequence file.

In addition, Rob Edwards has blast searched 41 human P450s (one from each subfamily) against all 4.3 million reads of Ciona savignyi. These reads were in 44 separate files, since we have not been able to assemble them. 1804 blast files covering all of mammalian P450 space are collected, but these have not been analyzed yet. Each sequence read contains only one or two exons, so there are many fragments that are probably from the same gene, but they have not been joined due to lack of overlap within exons. This may pose a problem that will require comparison of the intron sequences and walking to join the fragments.
The accession numbers sorted by sequence are listed here.
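
To make the overlap step concrete, here is a simplified Python sketch of joining translated fragments by exact suffix/prefix overlap. It is not the procedure actually used for these assemblies, which relied on Blast searches and manual inspection; it is a toy greedy merge that ignores sequencing errors. The function names, the `min_overlap` threshold, and the peptide fragments are all hypothetical.

```python
def overlap_length(left, right, min_overlap=5):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def merge_fragments(fragments, min_overlap=5):
    """Greedily join fragments, always taking the longest exact overlap first."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left that overlaps
        joined = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(joined)

# Hypothetical peptide fragments, for illustration only.
fragments = ["MLLAVWTGLLFSR", "GLLFSRKHPNLPPGP", "QQQWERTY"]
contigs = merge_fragments(fragments)
```

In this toy input the first two fragments share six residues and are joined, while the third has no overlap and stays separate. That is the situation described above: when reads end at exon boundaries, fragments from the same gene share no coding overlap, so intron sequence comparison and walking are needed to link them.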