The Molecular History of Eukaryotic Life: The Synapomorphies
David Nelson Dec. 10, 2000; modified Jan. 11, 2001; last modified Jan. 19, 2001
Note: some data were removed on April 18, 2001 to satisfy the requirement of no prior publication for Science. Once publication is in press, these data will be restored.
Given below is a list of the insertions currently used for making phylogenetic inferences. The list is followed by some general and specific discussion of these insertions. The ARP3, enolase and xanthine dehydrogenase/aldehyde oxidase insertions are original with this website.
  • EF1 alpha insertion for fungal (including Microsporidia) + animal clade
  • actin related protein 3 (ARP3) insertion for animal + fungal + Amoebozoa clade
  • alanyl tRNA synthetase replacement from the mitochondrial genome for Diplomonadida exclusive clade. This may include Parabasala also, but no sequence is available yet.
  • Ribosomal protein L24 insertion for Euglenozoa specific clade
  • xanthine dehydrogenase/aldehyde oxidase insertion for plant specific clade. This may include Rhodophyta and Glaucophyta, but no sequences are available yet.
  • Ribosomal protein L34 insertion for plant specific clade. This may include Rhodophyta and Glaucophyta, but no sequences are available yet.
Rare insertions or deletions do occur in proteins, resulting in length variations. If the change in length is small and it is not in a critical region, so that the protein can tolerate the change, these mutations become part of the history of the protein. They are passive markers that are inherited by all future descendants of this gene.

The thirteen eukaryotic branches that represent all extant eukaryotic life each had a beginning. There was a time when each line branched off from a common ancestor and existed for a while as a single species before splitting again. The interval between the first branching and the second branching had a duration. Certainly, mutations occurred during this time, and probably some neutral insertions or deletions occurred that have been preserved to the present day. These are the 12 synapomorphies that we are searching for to define the genealogy of the main eukaryote lines.

Twelve is a small number. Genomes, on the other hand, are large, with thousands of genes. At the bare minimum there will be a few hundred of these genes that have orthologs in every species of eukaryote. It is only a matter of looking at enough sequence data to find the insertions/deletions that occurred in the right time interval at the beginning of a branch. With genome projects begun on a number of human pathogens like Plasmodium, Leishmania, Trypanosoma, Giardia and Encephalitozoon cuniculi, and on other key non-pathogenic species on the eukaryotic tree like Dictyostelium, it will not be long before massive amounts of data are available to search.

Already two of these 12 branchings are resolved. The first was the plant, animal, fungi trichotomy. The molecular feature that supported the joining of animals and fungi was an insertion in the EF1 alpha protein sequence. This 12 amino acid insertion was present in animals and fungi and in no other eukaryote or prokaryote phyla with sequence data(1). The same was also true for three short deletions in enolase.
These are synapomorphies that define a clade. The structure of the cells is not important, and in this case it was misleading. Since the microsporidia have been moved into the fungi, I checked a microsporidian sequence for the 12 amino acid insertion and found it present as an 11 amino acid insertion (see below). The data are from Figure 1A of (1), with the sequence of Glugea plecoglossi added and some relabeling of higher order taxa to fit the new results of Baldauf et al. (2). EST sequences from Alveolates, Stramenopiles and Oxymonads (Parabasala) have been added to make the set more complete. Note the KTL sequence after the gap in Eimeria. This is most like the animal and fungal sequences. Two of the Rhodophyta sequences have an insert of four amino acids, seen in six ESTs from Porphyra yezoensis and one complete mRNA from Porphyra purpurea.

DEFINITION Glugea plecoglossi peptide elongation factor 1 alpha, complete cds.
ACCESSION D84253

MAEEKPILNVCFIGHVDSGKSTTVGNLAFQLGAIDARKMDKLKK
EAEERGRGTFSYAYVMDMSAAERERGITITTSLMKLETSKHMLNVIDCPGHQDFIKNM
VTGAAQADVGVVLVPCATGEFESCISGGTLKDHIMISGVLGCRKLIVCVNKVDTIDEK
NRISRFDEVAKEMKGIIAKSHPDKDPIIIPISGYLGINIVEKGDKFEWFKGWKPVSGA
GDSIFTLEGALNSQIPPPRPIDKPLRMPIDSIHKIPGIGMVYTGRVSTGAIKPGMVVS
SQPTGVVAEVKTLEIHKQSRAAVVSGENCGVALKAASQGNPALIKPGHVFSNTKDSPV
EIFEAARAKIVVVAHPKGIKPGYCPTMDLGTHHVPCQITKFISKRMPGIKEEIPSPDV
VQKGENVTCIIHPQKQVVMETLKEVPSLARFALRDAGRIVGIGAIEARYTKAEYETEV
PSTTGKGRKATRGPG

Animals Xenopus laevis              GDNMLEPSPNM....PWFKGWKITRKEGSGSGTTLLEALDCILPPSRP
Animals Drosophila melanogaster     GDNMLEASDRL....PWYKGWNIERKEGKADGKTLLDALDAILPPSRP
Animals Onchocerca volvulus         GDNMLEPSANM....PWFKGWSVERKEGTMTGKTLLEALDSVVPPQRP
Fungi Saccharomyces cerevisiae      GDNMIEATTNA....PWYKGWEKETKAGVVKGKTLLEAIDAIEQPSRP
Fungi Mucor racemosus               GDNMLDESTNM....PWFKGWNKETKAGSKTGKTLLEAIDAIEPPVRP
Microsporidia Glugea plecoglossi    GINIVEKGDKF....EWFKGWKPVSGAGD-SIFTLEGALNSQIPPPRP
Amoebozoa Dictyostelium discoideum  GDNMLERSDKM....EWYKG............PTLLEALDAIVEPKRP
Amoebozoa Entamoeba histolytica     GDNMIEPSTNM....PWYKG............PTLIGALDSVTPPERP
Plants Arabidopsis thaliana         GDNMIERSTNL....DWYKG............PTLLEALDQINEPKRP
Plants Triticum aestivum (wheat)    GDNMIERSTNL....DWYKG............PTLLEALDQINEPKRP
Rhodophyta Porphyra yezoensis       GENLFERTDKTHALGKWYKG............PCLLEALDNCDPPKRP
Rhodophyta Porphyra purpurea tef-c  GENLFERTGGDHALGKWYKG............PCLLEALDACDPPKRP
Rhodophyta Porphyra purpurea tef-s  GDNMLEKSTNM....PWYTG............PTLFEVLDAMKPPKRP
Glaucophyta Cyanophora paradoxa     GDNMLEPSSNL....GWYKG............PTLVEALDQVEEPKRP
Heterokonta Phytophthora infestans  GDNMIDRSTNM....PWYKG............PFLLEALDNLNAPKRP
Heterokonta Laminaria digitata      GDNMVDKSTNM....PWYKG............PYLMEALDTMKEPTRP
Alveolata Eimeria S5-2              GDNMVERSSNM....GWYKG............KTLVEALDSVEPPKRP
Alveolata Plasmodium falciparum     GDNLIEKSDKT....PWYKG............RTLIEALDTMEPPKRP
Discicristata Euglena gracilis      GDNMIEASENM....GWYKG............LTLIGALDNLEPPKRP
Discicristata Trypanosoma cruzi     GDNMIDKSENM....PWYKG............PTLLEALDMLEPPVRP
Discicristata Acrasis rosea         GDNMLEKSTNM....PWYKG............PTLLEALDALEPPKRP
Diplomonadida Giardia lamblia       GDNIMEKSDKM....PWYEG............PCLIDAIDGLKAPKRP
Parabasala Dinenympha exilis        GDNMLDRSTNM....PWYKD............PILFDALDLLEVPKRP
Parabasala Pyrsonympha grandis      GDNMLERSPNM....SWYKD............PILFEALDLLEVPKRP
Parabasala Trichomonas vaginalis    GDNMTEKSPNM....PWYNG............PYLLEALDSLQPPKRP
Parabasala Trichonympha agilis      GDNMTEKSDKM....PWWKG............LTLLEALDTLEPPKRP
Archaea Sulfolobus acidocaldarius   GDNVTHKSTKM....PWYNG............PTLEELLDQLEIPPKP
Archaea Halobacterium halobium      GDNIAEESEHT....GWYDG............EILLEALNELPAPEPP
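The presence or absence of the insertion can be read off the alignment mechanically. What follows is a minimal Python sketch, not part of the original analysis. It assumes the rows are aligned exactly as printed above, so that the animal/fungal insertion occupies alignment columns 20 to 31 (counting from 0); the helper name insert_length and the choice of six example rows are illustrative only.

```python
# Rows copied from the EF1 alpha alignment above:
# (higher taxon, species, aligned fragment).
ROWS = [
    ("Animals", "Xenopus laevis", "GDNMLEPSPNM....PWFKGWKITRKEGSGSGTTLLEALDCILPPSRP"),
    ("Fungi", "Saccharomyces cerevisiae", "GDNMIEATTNA....PWYKGWEKETKAGVVKGKTLLEAIDAIEQPSRP"),
    ("Microsporidia", "Glugea plecoglossi", "GINIVEKGDKF....EWFKGWKPVSGAGD-SIFTLEGALNSQIPPPRP"),
    ("Amoebozoa", "Dictyostelium discoideum", "GDNMLERSDKM....EWYKG............PTLLEALDAIVEPKRP"),
    ("Plants", "Arabidopsis thaliana", "GDNMIERSTNL....DWYKG............PTLLEALDQINEPKRP"),
    ("Archaea", "Halobacterium halobium", "GDNIAEESEHT....GWYDG............EILLEALNELPAPEPP"),
]

# Assumed position of the 12-column insertion in this printed alignment.
INSERT_COLUMNS = slice(20, 32)


def insert_length(aligned, columns=INSERT_COLUMNS):
    """Count residues (letters) in the given columns; '.' and '-' are gaps."""
    return sum(ch.isalpha() for ch in aligned[columns])


insert_by_species = {species: insert_length(seq) for _, species, seq in ROWS}

for taxon, species, _ in ROWS:
    print(f"{taxon:14s} {species:26s} {insert_by_species[species]:2d}")
```

Under these assumptions the animal and fungal rows report a full-length insert, Glugea reports one residue fewer, and the remaining rows report none, which is the pattern described in the text.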

The second branch to be resolved is Giardia's position as one of the most distant members of the eukaryotes, based on alanyl tRNA synthetases (ATSs). There is a paper in the Oct. 24 PNAS on alanyl tRNA synthetases(3). Eukaryote ATSs appear to have come from the mitochondrial ancestor by displacing the host ATS of archaeal origin. These two types of ATS are easy to distinguish because of a fairly large insertion in the eukaryotes that is missing in the archaea. When this gene was sequenced from Giardia (a diplomonad, thought to be a very ancient eukaryote), the insert was not there. It looked like the archaeal ATS genes. Giardia do not have mitochondria, but they do have mitochondrial-like nuclear genes such as HSP70s. This suggests that they did have the mitochondrial ancestor and did transfer some genes from it to the nucleus. However, the ATS gene did not replace its host gene counterpart in this line (possibly the only line of eukaryotes where this failed to happen). The authors argue that the diplomonads diverged very early, before this key evolutionary event happened. This qualifies as a very rare, unique evolutionary landmark, placing Giardia outside all other eukaryote lineages. This gene needs to be sequenced from other members of the Archezoa group, like Trichomonas vaginalis, to see if they all have the archaeal sequence. If they do, then the last branch on the eukaryote tree is fixed and every other eukaryote species falls between the animal-fungal clade and the Archezoa clade.

The next branch to be fixed in place is probably the Mycetozoa. With most of the Dictyostelium genome sequenced, there is a good chance that a key synapomorphy could be found that unites this clade with the animal-fungal clade and excludes plants and protists.

[Jan. 16, 2001] An insertion has been found in ARP3 (actin related protein 3) that unites animals, fungi and Amoebozoa (Mycetozoa + Lobosea). This supports the tree in Baldauf et al.(2).
ARP3 is part of a conserved eukaryotic complex called the ARP2/3 complex, which nucleates actin filaments. The alignment of the region of interest is shown below. Because this is an actin related gene and there are hundreds of actin sequences and many isoforms, it is difficult to do the comparison. Actin numbering in Arabidopsis goes at least up to actin 12. The insertion that seems relevant is the one shown in sequences 1-7 and 12 below, and probably the one in the two Entamoeba histolytica sequences (14, 15; Amoebozoa). The latter two have two additional amino acids, which probably reflects a further modification of this insert.

The plants have a five amino acid insert that is conserved across monocots and dicots. There is an intron in the Arabidopsis gene that occurs between SK and CEM. The three animals shown do not have an intron in the insertion. The fungi do not have an intron here. Arabidopsis actin genes have a phase 1 intron in the G of GIVLD (Actin 1, 2, 3, 4, 7, 8, 11, 12). The intron in the Arabidopsis ARP3 gene is phase 0 and much larger. Note that potatoes have at least eight kinds of actin and none of those have this insert. I speculate that the plant ARP3 insert is derived from a separate event related to the modification of the intron seen in other plant actins. Note that the Arabidopsis ARP3 gene does not have the intron seen in the other Arabidopsis actins mentioned above. I point to the insertion in enolase that unites plants with heterokonts and alveolates to the exclusion of animals and fungi as additional support for this interpretation.

The fact that there are Amoebozoa (and fungi and animals, not shown) actin sequences without the insert suggests it is specific to ARP3. It is not clear if ARP3 exists in all the species shown, though the ARP2/3 complex seems to be well conserved among eukaryotes. The ARP3 sequence fragment shown ends in GYVIGS instead of the GY[AS]LPH of the other actins, and that sequence has not yet been found in the heterokonts, alveolates, discicristata, diplomonadida and parabasala.

 1. PGLYIAVQAVLALAASWTSrqvgert..lTGTVIDSGDGVTHVIPVAEGYVIGS ANIMALS
 2. PGLYIAVQAVLALAASWTSrqvgert..lTGTVIDSGDGVTHVIPVAEGYVIGS ANIMALS
 3. PGLYIAVQAVLALAASWASrsaeert..lTGIVVDSGDGVTDVIPVAEGYVIGS ANIMALS
 4. AGLYIAVQAVLALAASWTSskvqdrs..lTGTVIDSGDGVTHVIPVAEGYVIGS FUNGI
 5. AGLYIAVQAVLALAASWTSskvtdrs..lTGTVVDSGDGVTHIIPVAEGYVIGS FUNGI
 6. AGLYIAVQAVLALAASWTSskvtdrs..lTGTVIDSGDGVTHVIPVAEGYVIGS FUNGI
 7. PGLYIAVQAVLALAASWTSkna.ekt..lTGTVIDSGDGVTHVIPISEGYVIGS AMOEBOZOA
 8. PAMYVAIQAVLSLYASGRT..........TGIVMDSGDGVSHTVPIYEGYALPH AMOEBOZOA
 9. PAMYVAIQAVLSLYASGRT..........TGIVMDSGDGVSHTVPIYEGYALPH AMOEBOZOA
10. PAMYVAIQAALSLYASGRT..........TGIVMDSGDGVSHTVPIYEGYALPH AMOEBOZOA
11. PAMYVAIQAVLSLYASGRT..........TGIVMDSGDGVSHTVPIYEGYSLPH AMOEBOZOA
12. PGLYIAVQAVLALAASWTSkqvtekt..lTGTVIDSGDGVTHVIPVAEGYVIGS AMOEBOZOA
13. PAMYVAIQAVLSLYASGRT..........TGIVLDSGDGVTHTVPIYEGYALPH AMOEBOZOA
14. PGMYIAVQAVLAIVASWSRkdnnpanarlTGTVIDSGDGVTHIIPVADGYVIGS Entamoeba hist.
15. PGMYIAVQAVLAMCNHGSRkdnnpanaelTGTVIDSGDGVTHIIPVADGYVIGS Entamoeba hist.
16. PGLYIAVNSVLALAAGYTTsk.....cemTGVVVDVGDGATHVVPVAEGYVIGS Arabidopsis intron in gap
17. PGLYIAVNSVLALXAGYTTsk.....cxmTGVVXDIGDGAT              Medicago truncatula EST
18.          VLALAAGYTTtk.....cemTGVVVDVGDGATHIVPAADGYVIGS Zea mays EST
19. PGLYIAVXSVLALAAGYTTsk.....cemTGIVVDGGDGATHVVPGADRYVIGS Glycine max EST
20. PGLYIAVQPVLALAAGYTAsk.....cemTGVVVDIGDGATHVVPVAEGYVIGS Lycopersicon (tomato) EST
21. PAMYVAIQAVLSLYASGRT..........TGIVLDSGDGVSHTVPIYEGYALPH PLANTS
22. PAMYVAIQAVLSLYASGRT..........TGIVLDSGDGVSHTVPIYEGYALPH PLANTS
23. PAMYVAIQAVLSLYASGRT..........TGIVLDSGDGVSHTVPIYEGYALPH PLANTS
24. PAMYVAIQAVLSLYASGRT..........TGIVLDSGDGVSHTVPIYEGYALPH PLANTS
25. PAMYVAIQAVLSLYASGRT..........TGIVLDSGDGVSHTVPIYEGYALPH PLANTS
26. PAMYVAIQAVLSLYASGRT..........TGIVLDSGDGVSHTVPIYEGYALPH PLANTS
27. PAMYVAIQAVLSLYASGRT..........TGIVLDSGDGVSHTVPIYEGYALPH PLANTS
28. PAMYVAIQAVLSLYASGRT..........TGIVLDSGDGVSHTVPIYEGYALPH PLANTS
29. PAMYVAIQAVLSLYASGRT..........TGIVLDSGDGVSHTVPIYEGYALPH PLANTS
30. PAMYVAIQAVLSLYASGRT..........TGIVLDSGDGVSHTVPIYEGYALPH PLANTS
31. PAMYVAIQAVLSLYASGRT..........TGIVLDSGDGVSHTVPIYEGYALPH PLANTS
32. PAMYVAIQAVLSLYASGRT..........TGIVMDSGDGVSHTVPIYEGYALPH PLANTS
33. PAMYVAIQAVLSLYASGRT..........TGIVLDSGDGVSHTVPIYEGYALPH PLANTS
34. PAFYVAIQAVLSLYASGRT..........SGIVVDSGDGVTHTVPIYEGYSLPH RHODOPHYTA
35. PAMYVAIQAVLSLYASGRT..........TGIVLDSGDGVTHTVPIYEGXALPH GLAUCOPHYTA
36. PAMYVNIQAVLSLYASGRT..........TGCVLDSGDGVSHTVPIYEGYALPH HETEROKONTA
37. PAMYVNIQAVLSLYASGRT..........TGCVLDSGDGVSHTVPIYEGYALPH HETEROKONTA
38. PAMYVNIQAVLSLYASGRT..........TGCVLDSGDGVSHTVPIYEGYALPH HETEROKONTA
39. PAMYVNIQAVLSLYASGRT..........TGCVLDSGDGVSHTVPIYEGYALPH HETEROKONTA
40. PAMYVNIQAVLSLYASGRT..........TGCVLDSGDGVSHTVPIYEGYALPH HETEROKONTA
41. LAMYVNIQAVLSLYASGST..........TGCVLDSGDGVSHTVPIYEGYALPH HETEROKONTA
42. PAMYVAIQAVLSLYSSGRT..........TGIVLDSGDGVSHTVPIYEGYALPH ALVEOLATA
43. PAMYVSIQAILSLYASGRT..........TGIVLDSGDGVSHTVPIYEGYVLPH ALVEOLATA
44. PAMYVAIQAVLSLYSSGRT..........TGIVLDSGDGVSHTVPIYEGYALPH ALVEOLATA
45. PAMYVNIQAVLSLYASGRT..........TGIVLDSGDGVSHTVPIYEGYAIPH ALVEOLATA
46. PALYVSIQAVLSLYSSGRT..........TGIVLDCGDGVSHTVPIYEGYSLPH DISCICRISTATA
47. PAMYVNIQAVLSLYASGRT..........TGCVLDSGDGVSHTVPIYEGYALPH DISCICRISTATA
48. PAMYVAIQAVLSLYASGRT..........TGIVLDSGDGVSHTVPIYEGYALPH DISCICRISTATA
49. PAMYVAIQAVLSLYASGRT..........TGIVLDSGDGVSHTVPIYEGYALPH DISCICRISTATA
50. PAFYVQVQAVLALYSSGRT..........TGIVIDTGDGVTHTVPVYEGYSLPH DIPLOMONADIDA
51. PSFYVGIQAVLSLYSSGRT..........TGIVFDAGDGVSHTVPIYEGYSLPH PARABASALA
52. PSFYVGIQAVLSLYSSGRT..........TGIVFDAGDGVSHTVPIYEGYSLPH PARABASALA
53. PSFYVGIQAVLSLYSSGRT..........TGIVFDAGDGVSHTVPIYEGYSLPH PARABASALA
54. PSFYVGIQAVLSLYSSGRT..........TGIVFDAGDGVSHTVPIYEGYSLPH PARABASALA
55. PSFYVGIQAVLSLYSSGRT..........TGIVFDAGDGVSHTVPIFEGYSLPH PARABASALA

 1. = Fugu rubripes pufferfish AF034581.1 actin-related protein (ARP3) gene
 2. = ARP3_HUMA human AC027120.4 Homo sapiens chromosome 3 clone RP11-141B14
 3. = ARP3_DROME Drosophila melanogaster AE003556.2 genomic scaffold
 4. = ARP3_NEUCR NEUROSPORA CRASSA
 5. = ARP3_SCHPO SCHIZOSACCHAROMYCES POMBE
 6. = ARP3_YEAST SACCHAROMYCES CEREVISIAE
 7. = ARP3_DICDI Dictyostelium discoideum
 8. = ACT1_DICDI Dictyostelium discoideum
 9. = ACT8_DICDI Dictyostelium discoideum
10. = ACT2_DICDI Dictyostelium discoideum
11. = ACT3_DICDI Dictyostelium discoideum
12. = ARP3_ACACA Acanthamoeba castellanii (Amoeba)
13. = ACT1_ACACA Acanthamoeba castellanii (Amoeba)
14. = AZ680014.1 ENTIP12TF Entamoeba histolytica
15. = AZ527984.1 ENTCL76TF Entamoeba histolytica
16. = AC007357 gene= F3F19.20 Strong similarity to gb|U29610 Actin-like protein (Arp3)
17. = BF638396.1 NF057B03PL1F1027 Phosphate starved leaf Medicago truncatula
18. = BE025380.1|BE025380 945026A07.Y1 945 - Mixed adult tissues Zea mays
19. = AW202092.1 sf11a07.y1 Gm-c1027 Glycine max
20. = BE441091.1 EST408361 tomato immature green fruit Lycopersicon esculentum
21. = ACTB_ARATH ARABIDOPSIS
22. = ACT1_ARATH ARABIDOPSIS
23. = ACT3_ARATH ARABIDOPSIS
24. = ACT4_ARATH ARABIDOPSIS
25. = ACTC_ARATH ARABIDOPSIS
26. = ACTD_SOLTU SOLANUM TUBEROSUM (POTATO)
27. = ACTC_SOLTU SOLANUM TUBEROSUM (POTATO)
28. = ACT8_SOLTU SOLANUM TUBEROSUM (POTATO)
29. = ACTA_SOLTU SOLANUM TUBEROSUM (POTATO)
30. = ACT5_SOLTU SOLANUM TUBEROSUM (POTATO)
31. = ACT3_SOLTU SOLANUM TUBEROSUM (POTATO)
32. = ACT9_SOLTU SOLANUM TUBEROSUM (POTATO)
33. = ACTB_SOLTU SOLANUM TUBEROSUM (POTATO)
34. = AV435141.1 Porphyra yezoensis
35. = U90325.1 Cyanophora paradoxa actin gene
36. = ACT1_PHYIN Phytophthora infestans
37. = ACT2_PHYIN Phytophthora infestans
38. = ACT_PHYME Phytophthora megasperma (Potato pink rot fungus)
39. = ACT_COSCS Costaria costata
40. = ACT_FUCVE Fucus vesiculosus
41. = ACT_FUCDI Fucus distichus
42. = ACT1_PLAFA Plasmodium falciparum
43. = ACT2_PLAFA Plasmodium falciparum
44. = ACT_TOXGO Toxoplasma gondii
45. = ACT_CRYPV Cryptosporidium parvum
46. = EUGLENA GRACILIS O65204
47. = ACT_ACHBI Achlya bisexualis
48. = ACT1_NAEFO Naegleria fowleri
49. = ACT2_NAEFO Naegleria fowleri
50. = ACT_GIALA GIARDIA LAMBLIA
51. = U63122.1 Trichomonas vaginalis actin gene
52. = U63124.1 Trichomonas vaginalis Type2 actin mRNA
53. = U63126.1 Trichomonas vaginalis Type3 actin mRNA
54. = U63125.1 Trichomonas vaginalis Type4 actin mRNA
55. = U63123.1 Trichomonas vaginalis Type5 actin mRNA
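Two features of the alignment above lend themselves to a mechanical check: the C-terminal motif that separates ARP3 from the ordinary actins (GYVIGS versus GY[AS]LPH), and the length of the insert, which is printed in lowercase. The Python sketch below is illustrative only. It assumes the lowercase convention of the printed alignment holds for every row, and the function names classify and insert_length are hypothetical.

```python
import re

# Fragments copied from the alignment above; the number in each key is the
# sequence number in that alignment.
FRAGMENTS = {
    "Fugu ARP3 (1)": "PGLYIAVQAVLALAASWTSrqvgert..lTGTVIDSGDGVTHVIPVAEGYVIGS",
    "Dictyostelium ARP3 (7)": "PGLYIAVQAVLALAASWTSkna.ekt..lTGTVIDSGDGVTHVIPISEGYVIGS",
    "Dictyostelium actin (8)": "PAMYVAIQAVLSLYASGRT..........TGIVMDSGDGVSHTVPIYEGYALPH",
    "Entamoeba (14)": "PGMYIAVQAVLAIVASWSRkdnnpanarlTGTVIDSGDGVTHIIPVADGYVIGS",
    "Arabidopsis ARP3 (16)": "PGLYIAVNSVLALAAGYTTsk.....cemTGVVVDVGDGATHVVPVAEGYVIGS",
    "Arabidopsis actin (22)": "PAMYVAIQAVLSLYASGRT..........TGIVLDSGDGVSHTVPIYEGYALPH",
}

ARP3_END = re.compile(r"GYVIGS$")      # ending of the ARP3 fragments
ACTIN_END = re.compile(r"GY[AS]LPH$")  # ending given in the text for other actins


def classify(fragment):
    """Label a fragment by its C-terminal motif."""
    if ARP3_END.search(fragment):
        return "ARP3-like"
    if ACTIN_END.search(fragment):
        return "actin-like"
    return "unclassified"


def insert_length(fragment):
    """Count lowercase residues, which mark the insert in the printed alignment."""
    return sum(ch.islower() for ch in fragment)


summary = {name: (classify(seq), insert_length(seq)) for name, seq in FRAGMENTS.items()}

for name, (kind, length) in summary.items():
    print(f"{name:24s} {kind:11s} insert={length}")
```

A few actin rows in the alignment end in variants such as GYVLPH or GYAIPH, so a stricter or looser pattern may be needed if the full table is processed.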

XANTHINE DEHYDROGENASE / ALDEHYDE OXIDASE PLANT SPECIFIC INSERTION

The xanthine dehydrogenase and aldehyde oxidase genes code for molybdenum-containing proteins with FeS centers and flavins. They share a common ancestor(4). So far only the plants show the 8 amino acid insert below. Arabidopsis has at least three aldehyde oxidase genes and one xanthine dehydrogenase(5). The lower eukaryotes have not been sampled yet. Depending on the time of this insertion, the question of Plantae as a monophyletic group may be answered if the insertion is found in Rhodophyta and Glaucophyta but in no other lower eukaryote groups. At present, sequences are not available from these species.

ANIMALS   1  GQGLHTKMVQVASRALK........IPTSKIYISETSTNTVPNTSPTAA  HUMAN XDH
ANIMALS   2  GQGVHTKMIQVVSRELR........MPMSNVHLRGTSTETVPNANISGG  HUMAN AOX
ANIMALS   3  GQGLNQKMLQVCSEALK........RPIDTITIVDCSTDKITNAPETGA  C. ELEGANS AF038614
ANIMALS   4  GQGLHTKILQIAARCLE........IPIERIHIHDTSTDKVPNASATAA  C. ELEGANS Z83318
ANIMALS   5  GQGLNTKMIQCAARALG........IPSELIHISETATDKVPNTSPTAA  D. MELANOGASTER
ANIMALS      GQGMNTKISQVAAHTLG........IPMEQVRIEASDTINGANSMVTGG  D. MELANOGASTER
ANIMALS   6  GQGVNTKVAQVAAHILG........IPMTKISIKTMSSLTSPNASVSGG  CULEX MOSQUITO
FUNGI     7  GQGLHTKMTMIAAEALG........VPLSDVFISETATNTVANTSSTAA  ASPERGILLUS
FUNGI     8  GQGLHTKMTQIAAQALN........VPLENVFISETATNTVANASATAA  NEUROSPORA
PLANTS       GQGLHTKVAQVAASAFN........IPLSSVFVSETSTDKVPNASPTAA  Arab XDH AL079347
PLANTS    9  GQGLWTKVQQMVAYGLGmikcegsdDLLERIRLLQTDTLSMSQSSYTAG  ARABIDOPSIS
PLANTS   10  GQGLWTKVKQMTAFGLGqlcpgggeSLLDKVRVIQADTLSMIQGGVTGG  ZEA MAYS
PLANTS   11  AQGLWTKVRQMTAYALGsiesswaeDLVEKVRVIQADTLSVVQGGLTAG  TOMATO
PLANTS   12  GQGLWTKVKQMTAYALSliqcagneELLEKVRVIQADTLSLIQGGFTSG  COTTON
PLANTS   13  SQGLMTKVKQMAAFALGavqcdridSLLDKVRVVQTDTVSLIQGGLTAG  GLYCINE MAX
PLANTS   14  GQGLWTKVQQMTAFGLGelcpdggeSLLDKVRVIQADTLSMIQGGFTGG  SORGHUM BICOLOR
PLANTS   15  GQGLWTKVKQMAAFGLGqlwadrsqDLLERVRVIQADTFIVVQGGWTTG  WHEAT
PLANTS   16  GQGLWTKVRQMTAYALGsiesswaeDLVEKVRVIQADTLSVVQGGPTNG  POTATO
BACTERIA 17  GQGLHAKMVQVAAAVLG........IDPVQVRITATDTSKVPNTSATAA  RHODOBACTER CAPSULATUS

 1 = NM_000379.2 Homo sapiens xanthine dehydrogenase (XDH)
 2 = L11005.1 Human aldehyde oxidase (hAOX) mRNA
 3 = U53333.1 Caenorhabditis elegans cosmid F36A4 THIS SEQ HAS AN INTRON IN THE GAP
 4 = Z83318.1 C. elegans cosmid F55B11 Aldehyde oxidase/ xanthine dehydrogenase
 5 = AE003698.2 Drosophila melanogaster
 6 = AF202953 Culex pipiens quinquefasciatus aldehyde oxidase (AO1)
 7 = X82827 hxA gene; xanthine dehydrogenase. Aspergillus nidulans.
 8 = AL391572 Neurospora crassa probable xanthine dehydrogenase
 9 = AB037271 Arabidopsis thaliana AAO4 mRNA for aldehyde oxidase
10 = D88452 Zea mays mRNA for aldehyde oxidase-2
11 = U82559.1 Lycopersicon esculentum aldehyde oxidase 1 homolog (TAO1)
12 = AI728545 BNLGHi11006 Six-day Cotton fiber Gossypium hirsutum
13 = BE801522 sr15f03.y1 Gm-c1050 Glycine max + BF066257 st28c11.y1 Gm-c1067 Glycine max THESE TWO ESTS ARE NOT FROM IDENTICAL GENES. AA 1-9 FROM BF066257
14 = BE596841 PI1_59_C08.b1_A002 Pathogen induced 1 (PI1) Sorghum bicolor
15 = BE497404 WHE0751_A10_A19ZS heat-stressed seedling cDNA library Triticum aestivum
16 = BE921778 EST425547 potato leaves and petioles Solanum tuberosum
17 = AJ001013.1|RCXDHAB Rhodobacter capsulatus xdhB (XANTHINE DEHYDROGENASE B)

TREE IS A UPGMA TREE WITHOUT THE INSERT INCLUDED, BASED ON THE 41 AMINO ACID ALIGNMENT ONLY
XDH genes cluster at the top of the tree. Animal genes below the bacterial XDH gene are assumed to be aldehyde oxidase genes, though only human is known to be AOX.

[UPGMA tree. Leaves, top to bottom: ANIMALS 1 HUM XDH, ANIMALS 5 DROS XDH, FUNGI 7 ASPER XDH, FUNGI 8 NEUROS XDH, PLANTS ARAB XDH, ANIMALS 4 C ELEGANS XDH, BACTERIA 17 XDH, ANIMALS 2 HUM AOX, ANIMALS DROS AOX, ANIMALS 6 CULEX AOX, ANIMALS 3 C ELEGANS AOX, then the plant AOX GENES: PLANTS 9, PLANTS 10, PLANTS 14, PLANTS 11, PLANTS 16, PLANTS 12, PLANTS 15]
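The tree was built from the 41 aligned positions outside the insert. The Python sketch below shows one way such input could be prepared: drop the insert columns, then compute a pairwise distance. The distance used here (the fraction of differing positions, often called a p-distance) is an assumption for illustration, since the distance measure behind the tree is not stated; the column range 17 to 24 (counting from 0) and the function names are likewise hypothetical.

```python
# Aligned fragments copied from the table above (numbers as in that table).
ALIGNED = {
    "human XDH (1)": "GQGLHTKMVQVASRALK........IPTSKIYISETSTNTVPNTSPTAA",
    "human AOX (2)": "GQGVHTKMIQVVSRELR........MPMSNVHLRGTSTETVPNANISGG",
    "Arabidopsis AOX (9)": "GQGLWTKVQQMVAYGLGmikcegsdDLLERIRLLQTDTLSMSQSSYTAG",
    "Zea mays AOX (10)": "GQGLWTKVKQMTAFGLGqlcpgggeSLLDKVRVIQADTLSMIQGGVTGG",
}

# Assumed columns of the 8-residue plant insert in the printed alignment.
INSERT = slice(17, 25)


def strip_insert(seq):
    """Remove the insert columns, leaving the 41 shared positions."""
    return seq[:INSERT.start] + seq[INSERT.stop:]


def p_distance(a, b):
    """Fraction of the shared positions at which two aligned sequences differ."""
    a, b = strip_insert(a), strip_insert(b)
    return sum(x != y for x, y in zip(a, b)) / len(a)


names = list(ALIGNED)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = p_distance(ALIGNED[first], ALIGNED[second])
        print(f"{first:20s} {second:20s} {d:.2f}")
```

A matrix of such distances is the usual input to UPGMA clustering; in this small sample the two plant AOX fragments are closer to each other than either is to human XDH.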

The EF1 alpha sequences have a two amino acid deletion that appears in several groups, so it may have occurred multiple times. It may not be very useful as a cladistic marker. It is interesting to note that Entamoeba histolytica (seq. 15) clustered with Euplotes (seqs. 31, 32 in Alveolata) in a 1997 paper by Baldauf and Doolittle(6) that used EF1 alpha sequences to look at eukaryote evolution. Perhaps the Entamoeba sequence does belong with the Alveolates and not with the Amoebozoa, though the text of that paper claims this is an artefact. The close association between Euplotes and Entamoeba EF1 alpha may arise from a lateral gene transfer. If this were true, then the 2 amino acid deletion would have happened only in the fungi and some of the alveolates. It is complicated by the observation that some but not all ciliophorans have the deletion, while all apicomplexans seem to have it. Note: one of the Euplotes sequences is probably an animal contamination. It contains the animal specific sequence STEPPYS. One scenario for these results is that the ancestor of the alveolates suffered the 2 amino acid deletion, and the line of ciliophorans including Paramecium, Tetrahymena, Colpoda, Stentor and Stylonychia regained the 2 amino acids as an insert.

 1. NKMDSTEPPYSQKRYEEIVKEVSTYIKKIGY XM_004398.1 ANIMALS
 2. NKMDSTEPPYSQARFEEITKEVSAYIKKIGY L23807      ANIMALS
 3. NKMDSTEPPYSEARYEEIKKEVSSYIKKIGY AE003779    ANIMALS
 4. NKMDSTEPPFSEARFTEITNEVSGFIKKIGY U51994.1    ANIMALS
 5. NKMD..TTGWSQARFEEIVKETSNFIKKVGF D82571.1    FUNGI
 6. NKMD..TTQWSQTRFEEIIKETKNFIKKVGY D45837.1    FUNGI
 7. NKMD..SVKWDESRFQEIVKETSNFIKKVGY M15666.1    FUNGI
 8. NKMD..SVKWDESRFQEIVKETSNFIKKVGY M15667.1    FUNGI
 9. NKMD..STKYSEARYNEIVKEVSTFIKKIGF AF157303    FUNGI
10. NKMD..STKYSEARYNEIVKEVSTFIKKIGY AF157274    FUNGI
11. NKMD..STKYSEARYNEIVKEVSTFIKKIGY AF157283    FUNGI
12. NKMD..STKYSEARYNEIVKEVSTFIKKIGY AF157250    FUNGI
13. NKMDEKSVNWSQARYDEIVKETSSFVKKIGY AF016243    AMOEBOZOA
14. NKMDEKSTNYSQARYDEIVKEVSSFIKKIGY X55972.1    AMOEBOZOA
15. NKMD..AIQYKQERYEEIKKEISAFLKKTGY M92073.1    AMOEBOZOA ?
16. NKMDATTPKYSKARYDEIIKEVSSYLKKVGY AC026875.4  PLANTS
17. NKMDATTPKYSKARYDEIVKEVSSYLKKVGY X14449.1    PLANTS
18. NKMDATTPKYSKARYDEIVKEVSSYLKKVGY U76259.1    PLANTS
19. NKMDATTPKYSKARYDEIVKEVSSYLKKVGY D63582.1    PLANTS
20. NKMDDKNVNWSKERYEEVSKEMDLYLKKVGY U08844.1    RHODOPHYTA
21. NKMDEKSVNYGQPRFEEIKKEVSAYLKKIGY AF092951.1  GLAUCOCYSTOPHYCEAE
22. NKMDDKSVNYSEARYKEIKAEMTSFLTKVGY AF091355.2  HETEROKONTA
23. NKMDDSSVMYGQARYEEIKSEVTTYLKKVGY AJ249839.1  HETEROKONTA
24. NKMDDSSVMYGESRYTEIKEEVAIYLKKVGY AW400940    HETEROKONTA
25. NKMD..TVKYSEDRYEEIKKEVKDYLKKVGY AJ224150.1  ALVEOLATA Apicomplexa
26. NKMD..TVKYSEDRYEEIKKEVKDYLKKVGY AJ224151.1  ALVEOLATA Apicomplexa
27. NKMD..TVKYSEDRYEEIKKEVKDYLKKVGY AJ224153.1  ALVEOLATA Apicomplexa
28. NKMD..TVKYSEDRYEEIKKEVKDYLKKVGY X60488      ALVEOLATA Apicomplexa
29. NKMD..TCEYKQSRFDEIFNEVDGYLKKVGY U71180.1    ALVEOLATA Apicomplexa
30. NKMD..SCNYSEDRFNEIQKEVAMYLKKVGY N81319      ALVEOLATA Apicomplexa
31. NKMD..AAEYDEERYNEIKKEVSEYLAKVGY AF056100.1  ALVEOLATA Ciliophora
32. NKMD..AAEYDETRYKEIKKEVSEYLDKVGY AF056099.1  ALVEOLATA Ciliophora
33. NKMD..TIDYDEERFNEIVENVSDHLGKIGY U26267.1    ALVEOLATA Ciliophora
34. NKMD..AVQYNEERFTDIKKEVIDYLKKMGS AF056104.1  ALVEOLATA Ciliophora
35. NKMD..AISYNEERFTDIKKEVIDYLKKLGF AF056105.1  ALVEOLATA Ciliophora
36. NKMDEKTVNYAQGRYDEIVKEMRDYLKKVGY AF172083.2  ALVEOLATA Ciliophora
37. NKMDEKTVNFSEERYQEIKKELSDYLKKVGY D11083.1    ALVEOLATA Ciliophora
38. NKMDDKSVNWDQGRFLEIKKELSDYLKKIGY AF056108    ALVEOLATA Ciliophora
39. NKMDEKTVAWSQSRFEEIQKEVQEYLKKVGY AF056098    ALVEOLATA Ciliophora
40. NKMDDKSCNWSQERYEEIKKEVSQYLKKVGY AF056106    ALVEOLATA Ciliophora
41. NKMDSTEPPYSEDRYEEIKKEVSTFLAKVGY U26260.1    ALVEOLATA Probable ANIMAL contaminant
42. NKMDDKTVNYGQERYDEIVKEVSAYIKKVGY AJ234094.1  EUGLENOZOA
43. NKMDDKSVNFAQERYDEIVKEVSAYLKKVGY AI069750    EUGLENOZOA
44. NKMDDKTVQYSQARYEEISKEVGTYLKRVGY U72244.1    EUGLENOZOA
45. NKFDDKTVKYSQARYEEIKKEVSGYLKKVGY X16890.1    EUGLENOZOA
46. NKFDDTSVSYKEDRYNEIKSEVGRYLKGLGF AF058283.1  HETEROLOBOSEA
47. NKMDDKSVQYKEDRYKEIQKEVADYLKKVGY AF190771.1  HETEROLOBOSEA
48. NKMDDPNVNYSKDRYNEIKTEMTKTLVAIGY U94406.1    DIPLOMONADIDA
49. NKMDDPQVNYSEARYTEIKTEMQKTFKQIGF U37081.1    DIPLOMONADIDA
50. NKMDDPQVNYSEARYKEIKEEMQKNLKQIGY U29442.1    DIPLOMONADIDA
51. NKMDDGQVKYSKERYDEIKGEMMKQLKNIGW D14342.1    DIPLOMONADIDA
52. NKMDDNTVNYAESRYKEITEEMKNVLKQVGW AF230353.1  PARABASALA
53. NKMDDKTVNYNKARFDEIQSEMTRILTGIGY D78479.1    PARABASALA
54. NKMDDKTVNYNKARFDEITAEMTRILTGIGY AF058282.1  PARABASALA
55. NKMDDKSVNWAESRYNEIKTEMRTYLKKIGY AF230349.1  PARABASALA Oxymonadida
56. NKMDDKSVNWAESRYNEVKTEMGTYLKKIGY AB007029.1  PARABASALA Oxymonadida

 1. Homo sapiens EF 1 alpha mRNA
 2. Danio rerio EF 1 alpha mRNA
 3. Drosophila melanogaster
 4. Caenorhabditis elegans
 5. S. pombe mRNA for EF 1 alpha-A
 6. Neurospora crassa EF 1-alpha gene
 7. S. cerevisiae EF1 alpha-A gene
 8. S. cerevisiae EF1 alpha-B gene
 9. Zychaea mexicana gene
10. Phascolomyces articulosus gene
11. Rhizomucor pusillus gene
12. Fennellomyces linderi gene
13. Physarum polycephalum (tef1) gene
14. D. discoideum gene
15. Entamoeba histolytica mRNA
16. Arabidopsis thaliana genomic. There are 4 EF1-alpha genes in Arabidopsis, A1-A4
17. Tomato LeEF-1 mRNA cds
18. Zea mays mRNA
19. Oryza sativa mRNA
20. Porphyra purpurea
21. Cyanophora paradoxa Glaucocystophyceae
22. Blastocystis hominis gene
23. Phytophthora infestans mRNA
24. Laminaria digitata cDNA
25. Plasmodium berghei EF-1alpha A-gene
26. Plasmodium berghei EF-1alpha B-gene
27. Plasmodium knowlesi EF-1alpha A-gene
28. Plasmodium falciparum MEF-1 gene
29. Cryptosporidium parvum mRNA
30. Toxoplasma gondii cDNA
31. Euplotes aediculatus TEF2 gene
32. Euplotes aediculatus TEF1 gene
33. Euplotes crassus EFA2 gene
34. Spathidium sp. (TEF1) gene CILIOPHORA
35. Spathidium sp. (TEF2) gene CILIOPHORA
36. Paramecium tetraurelia gene
37. Tetrahymena pyriformis mRNA
38. Stylonychia mytilus TEF1 gene
39. Colpoda inflata TEF1 gene
40. Stentor coeruleus TEF1 gene
41. Euplotes crassus GENE 80% identical to Drosophila, 70% to seq 29 Euplotes contains STEPPY seq found in animals. Probable animal DNA contamination
42. Trypanosoma brucei cDNA
43. T. cruzi cDNA
44. Leishmania braziliensis gene
45. Euglena gracilis tef mRNA
46. Naegleria andersoni gene
47. Acrasis rosea (tef1) gene Heterolobosea
48. Spironucleus vortens gene Diplomonad
49. Hexamita inflata gene Diplomonad
50. Diplomonad ATCC50330 gene Diplomonad
51. Giardia lamblia mRNA Diplomonad
52. Trichonympha agilis mRNA Parabasala
53. Trichomonas tenax gene Parabasala
54. Trichomonas vaginalis mRNA Parabasala
55. Dinenympha exilis mRNA Oxymonadida
56. Unidentified Oxymonadida A-14 mRNA
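Whether the two amino acid deletion is a clean marker can be judged by tallying its state within each labeled group of the alignment above: a group that contains both states signals a repeated event, a reversal, or a misplaced sequence. The following Python sketch illustrates the tally on a subset of rows. It assumes the deletion sits at alignment columns 4 and 5 (counting from 0), right after the leading NKMD, and the names has_deletion and mixed_groups are hypothetical.

```python
from collections import defaultdict

# (sequence number, aligned fragment, group label) copied from the table above.
ROWS = [
    (1, "NKMDSTEPPYSQKRYEEIVKEVSTYIKKIGY", "ANIMALS"),
    (5, "NKMD..TTGWSQARFEEIVKETSNFIKKVGF", "FUNGI"),
    (7, "NKMD..SVKWDESRFQEIVKETSNFIKKVGY", "FUNGI"),
    (13, "NKMDEKSVNWSQARYDEIVKETSSFVKKIGY", "AMOEBOZOA"),
    (15, "NKMD..AIQYKQERYEEIKKEISAFLKKTGY", "AMOEBOZOA"),
    (16, "NKMDATTPKYSKARYDEIVKEVSSYLKKVGY", "PLANTS"),
    (25, "NKMD..TVKYSEDRYEEIKKEVKDYLKKVGY", "ALVEOLATA Apicomplexa"),
    (31, "NKMD..AAEYDEERYNEIKKEVSEYLAKVGY", "ALVEOLATA Ciliophora"),
    (36, "NKMDEKTVNYAQGRYDEIVKEMRDYLKKVGY", "ALVEOLATA Ciliophora"),
    (42, "NKMDDKTVNYGQERYDEIVKEVSAYIKKVGY", "EUGLENOZOA"),
]


def has_deletion(fragment):
    """True when the two columns after the leading NKMD are gap characters."""
    return fragment[4:6] == ".."


# Collect the set of character states (deleted / not deleted) seen per group.
states = defaultdict(set)
for _, fragment, group in ROWS:
    states[group].add(has_deletion(fragment))

# Groups showing both states are the ones that complicate the marker.
mixed_groups = {group for group, seen in states.items() if len(seen) == 2}

for group, seen in states.items():
    label = "mixed" if len(seen) == 2 else ("deleted" if True in seen else "intact")
    print(f"{group:24s} {label}")
```

On this subset the mixed groups are the Amoebozoa (because of the Entamoeba sequence, number 15) and the Ciliophora, which are the two cases discussed in the text.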

Ribosomal protein L24 has a four amino acid insert in Euglenozoa. More work is needed to clarify the extent of this insert in other groups.

MRIEKCYFCSSPIYPGHGIQFV....RNDSTVFKFCRSRCNKLF C. elegans Animals
MKVELCSFSGYKIYPGHGRRYA....RTDGKVFQFLNAKCESAF Human Animals
MKVEIDSFSGAKIYPGRGTLFV....RGDSKIFRFQSSKSASLF K. lactis Fungi
MRVHTCYFCSGPVYPGHGIMFV....RNDSKVFRFCRSKCHKNF S. pombe Fungi
MKVEVDSFSGAKIYPGRGTLFV....RGDSKIFRFQNSKSASLF yeast B Fungi
LKTELCRFSGQKIYPGRGIRFI....RSDSQVFLFLNSKCKRYF Arabidopsis Plants
LKTELCRFSGAKIYPGRGIRFI....RGDSQVFLFVNSKCKRYF Cicer arietinum (chickpea) Plants
LKTELCRFSGQKIYPGKGIRFI....RSDSQVFLFANSKCKRYF Hordeum vulgare (barley) Plants
VKTEVCQYSGFRIYPGHGSRFI....RVDGKSYVFANSKCEASF Porphyra yezoensis AV436717 Rhodophyta
 RTSLCNYSEFKIYPARGMKFV....RGDSKVFHFINTKVESLF Dicty Mycetozoa
 KTQVCAFSGFKIPVAKGRKYV....RLDLKSFTFINKKSLMQF Entamoeba histolytica AJ012925 Entamoebidae
IKTETCSFSESRIYPGHGSRFI....RRDGSAYVFINSKSKSLF Phytophthora sojae BE583183 Heterokonta
IKTDMCSFSEYRIYPGRGQRFV....AKDGKVHTFIHRKEASLF Toxoplasma gondii cDNA AA531998 Alveolata
IKTELCSFSEYRIYPGRGQRFV....AKDGKVHTFLFRKEASLF Sarcocystis neurona BE635379 Alveolata
IKTELCSFSEYRIYPGRGLKFV....SRDGKVHTYIHSKEARLG Eimeria tenella BE028025 Alveolata
MRTIDCEFSHFAVHPGHGRRYVPFAFLSTKPVLTFARPKCFAMY Trypanosoma brucei Euglenozoa
MRTIECEFSHFAVHPGHGRRYVPFAFLSTKPVLTFSRPKCFALY Leishmania major embl CAC02879 Euglenozoa
MEKRVCSFCGYDIEPGTGKMYV....RRDGRVFYFCSGKCEKNM Archaeoglobus fulgidus Archaea
PEWRTCSFCGYEIEPGKGKMVV....EKDGTVLYFCSSKCEKSY Methanococcus jannaschii Archaea

Ribosomal protein L34 (large subunit) plant specific insertion (look for other taxa with the insert, especially Rhodophyta and Glaucophyta)

CGVCPGKLRGVRPVRPKVLMR--LSKTKKHVSRAYGG human Animals
CGVCPGRLRGVRAVRPKVLMR--LSKTKKHVQPGLWW rat Animals
CGVLPGRLRGVVAVRPKVLMR--LSKTKKHVQQGLWW pig Animals
CGQCKEKLSGIKPSRPSERPR--MCRRLKTVTRTFGG Aedes albopictus Animals
CGDCGSALQGISTLRPRQYAT--VSKTHKTVSRAYGG yeast A gene Fungi
CGDTGVPLQGIPALRPREFAR--LSHNKKTVQRAYGG S. pombe Fungi
CGECGVNLAGIPALRPYQYKN--LPKSRRTVSRAYGG Dictyostelium AU038253 Mycetozoa
CGDCGCVLAGIRHIRPHQYGW--LGKSKRTVTRAYGG Entamoeba histolytica AZ687907 Entamoebidae
CPVTGKRIQGIPHLRPTEYKRSRLSRNRRTVNRAYGG tobacco Plants
CPVTGKRIQGIPHLRPTEYKRSRLSRNRRTVNRAYGG pea Plants
CPVTGKRIQGIPHLRPSEYKRSRLSRNRRTVNRAYGG Arabidopsis Plants
CGDCGISLPGIKHLRPKQYKN--LKKREKTVSRAYGG Laminaria digitata AW400906 Heterokonta
CGDCKQTLKGIKHLQSKEYKN--VSKTQRTVSRAYGG Phytophthora infestans BE775634 Heterokonta
CGDCKKPLAGIPACAPYEMKH--LKKRERTVARAYGG Cryptosporidium parvum AA555437 Alveolata
CGGCGRLLPGIPARRPPQFRL--LKKRERTVNRAYGG Eimeria tenella AI757126 Alveolata
CANCHRALPGIPAVAPHKMKL--LKKKDRTVHRAYGG Sarcocystis neurona BF324030 Alveolata
CGNCDRALPGIPAVAPHRLRL--LKKRERTVHRAYGG Toxoplasma gondii AA520444 Alveolata
CAICGAELHGVPRGRPVEIRK--LPKSQRRPERPYGG Methanococcus jannaschii Archaea
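The working definition of a synapomorphy used throughout this page (present in every sampled member of a clade, absent from everything else sampled) can be written as a small test. The Python sketch below applies it to a subset of the L34 rows above. It is a sketch under assumptions: the insert is taken to occupy alignment columns 21 and 22 (counting from 0), the function names are hypothetical, and a True result only reflects the taxa sampled so far, since Rhodophyta and Glaucophyta are still missing.

```python
# (name, group, aligned fragment) copied from the L34 alignment above.
ROWS = [
    ("human", "Animals", "CGVCPGKLRGVRPVRPKVLMR--LSKTKKHVSRAYGG"),
    ("yeast A gene", "Fungi", "CGDCGSALQGISTLRPRQYAT--VSKTHKTVSRAYGG"),
    ("Dictyostelium", "Mycetozoa", "CGECGVNLAGIPALRPYQYKN--LPKSRRTVSRAYGG"),
    ("tobacco", "Plants", "CPVTGKRIQGIPHLRPTEYKRSRLSRNRRTVNRAYGG"),
    ("Arabidopsis", "Plants", "CPVTGKRIQGIPHLRPSEYKRSRLSRNRRTVNRAYGG"),
    ("Laminaria digitata", "Heterokonta", "CGDCGISLPGIKHLRPKQYKN--LKKREKTVSRAYGG"),
    ("Toxoplasma gondii", "Alveolata", "CGNCDRALPGIPAVAPHRLRL--LKKRERTVHRAYGG"),
    ("Methanococcus jannaschii", "Archaea", "CAICGAELHGVPRGRPVEIRK--LPKSQRRPERPYGG"),
]

# Assumed columns of the two-residue insert in the printed alignment.
INSERT = slice(21, 23)


def has_insert(fragment):
    """True when the insert columns hold residues rather than gap characters."""
    return fragment[INSERT].isalpha()


def is_clade_specific(rows, clade):
    """True when every sampled member of `clade` has the insert and no other row does."""
    inside = [has_insert(seq) for _, group, seq in rows if group == clade]
    outside = [has_insert(seq) for _, group, seq in rows if group != clade]
    return bool(inside) and all(inside) and not any(outside)


plants_specific = is_clade_specific(ROWS, "Plants")
animals_specific = is_clade_specific(ROWS, "Animals")
print("Plants:", plants_specific)
print("Animals:", animals_specific)
```

The same test can be rerun as new taxa are added to ROWS; a single non-plant row with residues in the insert columns would turn the plant result to False.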

References
1. Baldauf SL, Palmer JD. Animals and fungi are each other's closest relatives: congruent evidence from multiple proteins. Proc Natl Acad Sci U S A. 1993 Dec 15;90(24):11558-62.
2. Baldauf SL, Roger AJ, Wenk-Siefert I, Doolittle WF. A kingdom-level phylogeny of eukaryotes based on combined protein data. Science. 2000 Nov 3;290(5493):972-7.
3. Chihade JW, Brown JR, Schimmel PR, Ribas de Pouplana L. Origin of mitochondria in relation to evolutionary history of eukaryotic alanyl-tRNA synthetase. Proc Natl Acad Sci U S A. 2000;97:12153-12157.
4. Terao M, Kurosaki M, Demontis S, Zanotta S, Garattini E. Isolation and characterization of the human aldehyde oxidase gene: conservation of intron/exon boundaries with the xanthine oxidoreductase gene indicates a common origin. Biochem J. 1998 Jun 1;332 (Pt 2):383-93.
5. Hoff T, Frandsen GI, Rocher A, Mundy J. Biochemical and genetic characterization of three molybdenum cofactor hydroxylases in Arabidopsis thaliana. Biochim Biophys Acta. 1998 Jul 9;1398(3):397-402. From the abstract: Aldehyde oxidases and xanthine dehydrogenases/oxidases belong to the molybdenum cofactor dependent hydroxylase class of enzymes. Zymograms show that Arabidopsis thaliana has at least three different aldehyde oxidases and one xanthine oxidase.
6. Baldauf SL, Doolittle WF. Origin and evolution of the slime molds. Proc Natl Acad Sci U S A. 1997 Oct 28;94(22):12007-12.