Molecular Biology & Evolution. AOP
Estimating translational selection in Eukaryotic genomes
Natural selection on codon usage is a pervasive force that acts on a large variety of prokaryotic and eukaryotic genomes. Despite this, obtaining reliable estimates of selection on codon usage has proved complicated, perhaps because the selection coefficients involved are very small. In this work, a population genetics model is used to measure the strength of selected codon usage bias, S, in ten eukaryotic genomes. It is shown that the strength of selection is closely linked to expression, and that reliable estimates of selection coefficients can only be obtained for genes with very similar expression levels. We compare the strength of selected codon usage for orthologous genes across all ten genomes classified according to expression categories. Fungal genomes present the largest S values (2.24-2.56), while multicellular invertebrate and plant genomes present more moderate values (0.61-1.91). The large mammalian genomes (human and mouse) show low S values (0.22-0.51) for the most highly expressed genes. This might not be evidence for selection in these organisms, as the technique used here to estimate S does not properly account for nucleotide composition heterogeneity along such genomes. The relationship between estimated S values and empirical estimates of population size is presented here for the first time. It is shown, as theoretically expected, that population size plays an important role in the effectiveness of translational selection.
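The abstract does not spell out the population genetics model, so the following is only a minimal sketch under an assumption: that codon usage sits at a mutation-selection-drift equilibrium of the Li-Bulmer type, in which the odds of using the preferred codon are multiplied by e^S relative to a reference gene set shaped by mutation bias alone. The function names and the choice of reference set are hypothetical and are not taken from the paper.

```python
import math


def _logit(p):
    """Log-odds of a frequency strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("frequencies must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def selection_strength(p_high, p_ref):
    """Estimate S as the shift in log-odds of the preferred codon.

    p_high: frequency of the preferred codon in highly expressed genes.
    p_ref:  frequency in a reference set ASSUMED to reflect mutation
            bias only (for example, lowly expressed genes).
    """
    return _logit(p_high) - _logit(p_ref)


def expected_preferred_frequency(s, p_ref):
    """Inverse relation: equilibrium preferred-codon frequency for a given S."""
    odds = math.exp(s) * p_ref / (1.0 - p_ref)
    return odds / (1.0 + odds)
```

Under this assumed model, equal frequencies in the two gene sets give S = 0, and the two functions are inverses of each other for a fixed reference frequency. The caveat raised in the abstract applies directly: if nucleotide composition varies along the genome, the reference frequency no longer captures mutation bias alone.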
Spontaneous Mutational and Standing Genetic (Co)Variation at Dinucleotide Microsatellites in Caenorhabditis briggsae and C. elegans
Understanding the evolutionary processes responsible for shaping genetic variation within and between species requires separating the effects of mutation and selection. Differences between the patterns of genetic variation observed in nature and when mutations are allowed to accumulate in the relative absence of selection can reveal biases imposed by selection. We characterize the genetic variation at dinucleotide microsatellite repeats in four sets of 250-generation mutation accumulation (MA) lines, two in the species Caenorhabditis briggsae and two in C. elegans, and compare the mutational variation to the standing variation in those species. We also compare the mutational properties of microsatellites to the cumulative effects of mutations on fitness in the same lines. Integrated over the whole genome, we infer that the mutation rate of C. briggsae is about twice that of C. elegans, consistent with the cumulative mutational effects on fitness. The mutational spectrum (ratio of insertions to deletions) differs between repeat types and, in some cases, between species. The per-locus mutation rate is significantly positively correlated with the standing genetic variation at the same locus in both species, providing justification for the common practice of using the standing genetic variance as a surrogate for the mutation rate.
A signature of evolutionary constraint on a subset of ectopically expressed olfactory receptor genes
Olfactory receptor (OR) genes constitute the basis for the sense of smell. It has long been observed that a subset of mammalian OR genes are expressed in non-olfactory tissues, in addition to their expression in the olfactory epithelium. However, it is unknown whether OR genes have alternative functions in the non-olfactory tissues. Using a dedicated microarray, we surveyed OR gene expression in olfactory epithelium as well as a number of non-olfactory tissues, in human and chimpanzee. Our observations suggest that ectopically expressed OR orthologous genes are expressed in the same non-olfactory tissues in human and chimpanzee more often than expected by chance alone. Moreover, we found that the subset of orthologous OR genes with conserved ectopic expression evolve under stronger evolutionary constraint than OR genes expressed exclusively in the olfactory epithelium. Thus, although we cannot provide direct functional data, our observations are consistent with the notion that a subset of ectopically expressed OR genes have additional functions in non-olfactory tissues.
Inferring selection on amino acid preference in protein domains
Models that explicitly account for the effect of selection on new mutations have been proposed to account for ‘codon bias’, or the excess of ‘preferred’ codons that results from selection for translational efficiency and/or accuracy. In principle such models can be applied to any mutation that results in a preferred allele, but in most cases the fitness effect of a specific mutation cannot be predicted. Here we show that it is possible to assign preferred and un-preferred states to amino acid changing mutations that occur in protein domains. We propose that mutations that lead to more common amino acids (at a given position in a domain) can be considered ‘preferred alleles’ just as are synonymous mutations leading to codons for more abundant tRNAs.
We use genome-scale polymorphism data to show that alleles for preferred amino acids in protein domains occur at higher frequencies in the population, as has been shown for preferred codons. We show that this effect is quantitative, such that there is a correlation between the shift in frequency of preferred alleles and the predicted fitness effect. As expected, we also observe a reduction in the numbers of polymorphisms and substitutions at more important positions in domains, consistent with stronger selection at those positions. We examine the derived-allele frequency distribution and polymorphism to divergence ratios of preferred and un-preferred differences, and find evidence for both negative and positive selection acting to maintain protein domains in the human population. Finally, we analyze a model for selection on amino acid preferences in protein domains, and find that it is consistent with the quantitative effects that we observe.
Gene expression levels are a target of recent natural selection in the human genome
Changes in gene expression may represent an important mode of human adaptation. However, to date, there are relatively few known examples in which selection has been shown to act directly on levels or patterns of gene expression. In order to test whether SNPs that affect gene expression in cis are frequently targets of positive natural selection in humans, we analyzed genome-wide SNP and expression data from cell lines associated with the International HapMap Project. Using a haplotype-based test for selection that was designed to detect incomplete selective sweeps, we found that SNPs showing signals of selection are more likely than random SNPs to be associated with gene expression levels in cis. This signal is significant in the Yoruba (which is the population that shows the strongest signals of selection overall), and shows a trend in the same direction in the other HapMap populations. Our results argue that selection on gene expression levels is an important type of human adaptation. Finally, our work provides an analytical framework for tackling a more general problem that will become increasingly important: namely, testing whether selection signals overlap significantly with SNPs that are associated with phenotypes of interest.
The single mitochondrial porin of Trypanosoma brucei is the main metabolite transporter in the outer mitochondrial membrane
All mitochondria have integral outer membrane proteins with β-barrel structures including the conserved metabolite transporter VDAC and the conserved protein import channel Tom40. Bioinformatic searches of the Trypanosoma brucei genome for either VDAC or Tom40 identified a single open reading frame, with sequence analysis suggesting that VDACs and Tom40s are ancestrally related and should be grouped into the same protein family: the mitochondrial porins. The single T. brucei mitochondrial porin is essential only under growth conditions that depend on oxidative phosphorylation. Mitochondria isolated from homozygous knock-out cells did not produce ATP in response to added substrates, but ATP production was restored by physical disruption of the outer membrane. These results demonstrate that the mitochondrial porin identified in T. brucei is the main metabolite channel in the outer membrane and therefore the functional orthologue of VDAC. No distinct Tom40 was identified in T. brucei. In addition to mitochondrial proteins, T. brucei imports all mitochondrial tRNAs from the cytosol. Isolated mitochondria from the VDAC knock-out cells import tRNA as efficiently as wild-type. Thus, unlike the scenario in plants, VDAC is not required for mitochondrial tRNA import in T. brucei.
Selection on cis-regulatory variation at B4galnt2 and its influence on von Willebrand Factor in house mice
The RIIIS/J inbred mouse strain is a model for type 1 von Willebrand disease (VWD), a common human bleeding disorder. Low von Willebrand factor (VWF) levels in RIIIS/J are due to a regulatory mutation, Mvwf1, which directs a tissue-specific switch in expression of a glycosyltransferase, B4GALNT2, from intestine to blood vessel. We recently found that Mvwf1 lies on a founder allele common among laboratory mouse strains. To investigate the evolutionary forces operating at B4galnt2, we conducted a survey of DNA sequence polymorphism and microsatellite variation spanning the B4galnt2 gene region in natural Mus musculus domesticus populations. Two divergent haplotypes segregate in these natural populations, one of which corresponds to the RIIIS/J sequence. Different local populations display dramatic differences in the frequency of these haplotypes, and reduced microsatellite variability near B4galnt2 within the RIIIS/J haplotype is consistent with the recent action of natural selection. The level and pattern of DNA sequence polymorphism in the 5’ flanking region of the gene significantly deviates from the neutral expectation and suggests that variation in B4galnt2 expression may be under balancing selection and/or arose from a recently introgressed allele that subsequently increased in frequency due to natural selection. However, coalescent simulations indicate that the heterogeneity in divergence between haplotypes is greater than expected under an introgression model. Analysis of a population where the RIIIS/J haplotype is in high frequency reveals an association between this haplotype, the B4galnt2 tissue-specific switch, and a significant decrease in plasma VWF levels. Given these observations, we propose that low VWF levels may represent a fitness cost that is offset by a yet unknown benefit of the B4galnt2 tissue-specific switch. Similar mechanisms may account for the variability in VWF levels and high prevalence of VWD in other mammals, including humans.
The Chloroplast Genomes of the Green Algae Pyramimonas, Monomastix and Pycnococcus Shed New Light on the Evolutionary History of Prasinophytes and the Origin of the Secondary Chloroplasts of Euglenids
Because they represent the earliest divergences of the Chlorophyta and include the smallest known eukaryotes (e.g. the coccoid Ostreococcus), the morphologically diverse unicellular green algae making up the Prasinophyceae are central to our understanding of the evolutionary patterns that accompanied the radiation of chlorophytes and the reduction of cell size in some lineages. Seven prasinophyte lineages, 4 of which exhibit a coccoid cell organization (neither flagella nor scales), were uncovered from analysis of nuclear-encoded 18S rDNA data; however, their order of divergence remains unknown. In this study, the chloroplast genome sequences of the scaly quadriflagellate Pyramimonas parkeae (clade I), the coccoid Pycnococcus provasolii (clade V) and the scaly uniflagellate Monomastix (unknown affiliation) were determined, annotated and compared to those previously reported for green algae/land plants, including 2 prasinophytes (Nephroselmis olivacea, clade III and Ostreococcus tauri, clade II). The chlorarachniophyte Bigelowiella natans and the euglenid Euglena gracilis, whose chloroplasts presumably originate from distinct green algal endosymbionts, were also included in our comparisons. The 3 newly sequenced prasinophyte genomes differ considerably from one another and from their homologs in overall structure, gene content and gene order, with the 80,211-bp Pycnococcus and 114,528-bp Monomastix genomes (98 and 94 conserved genes, respectively) resembling the 71,666-bp Ostreococcus genome (88 genes) in featuring a significantly reduced gene content. The 101,605-bp Pyramimonas genome (110 genes) features 2 conserved genes (rpl22 and ycf65) and ancestral gene linkages previously unrecognized in chlorophytes as well as a DNA primase gene putatively acquired from a virus. The Pyramimonas and Euglena cpDNAs revealed uniquely shared derived gene clusters.
Besides providing unequivocal evidence that the green algal ancestor of the euglenid chloroplasts belonged to the Pyramimonadales, phylogenetic analyses of concatenated chloroplast genes and proteins elucidated the position of Monomastix and showed that the Mamiellales, a clade comprising Ostreococcus and Monomastix, are sister to the Pyramimonadales + Euglena clade. Our results also revealed that major reduction in gene content and restructuring of the chloroplast genome occurred in conjunction with important changes in cell organization in at least 2 independent prasinophyte lineages, the Mamiellales and the Pycnococcaceae.
Explosive speciation of Takifugu: another use of fugu as a model system for evolutionary biology
Although the fugu Takifugu rubripes has attracted attention as a model organism for genomic studies because of its compact genome, it is not generally appreciated that there are approximately 25 closely related species with limited distributions in the waters of East Asia. We performed molecular phylogenetic analyses and constructed a timetree using whole mitochondrial genome sequences from 15 Takifugu species together with 10 outgroups to examine patterns of diversification. The resultant timetree showed that the modern Takifugu species underwent explosive speciation during the Pliocene 1.8–5.3 MYA, which is comparable to that of the Malawi cichlids and the tropheine cichlids in Lake Tanganyika. Considering their limited distributions and remarkable variations in coloration, morphology, and behavior, the results of the present study strongly suggest that Takifugu species are strong candidates as a model system for evolutionary studies of speciation mechanisms in marine environments where few such organisms are available.
Adaptive evolution of 5'HoxD genes in the origin and diversification of the cetacean flipper
The homeobox genes Hoxd12 and Hoxd13 control digit patterning and limb formation in tetrapods. Both show strong expression in the limb bud during embryonic development, are highly conserved across vertebrates, and show mutations that are associated with carpal, metacarpal and phalangeal deformities. The most dramatic evolutionary reorganization of the mammalian limb has occurred in cetaceans (whales, dolphins and porpoises), in which the hindlimbs have been lost and the forelimbs have evolved into paddle-shaped flippers. We reconstructed the phylogeny of digit patterning in mammals, and inferred that digit number has changed twice in the evolution of the cetacean forelimb. First, the divergence of the early cetaceans from their even-toed relatives coincided with the reacquisition of the pentadactyl forelimb, whereas the ancestors of tetradactyl baleen whales (Mysticeti) later lost a digit again. To test whether the evolution of the cetacean forelimb is associated with positive selection or relaxation of Hoxd12 and Hoxd13, we sequenced these genes in a wide range of mammals. In Hoxd12, we found evidence of Darwinian selection associated with both episodes of cetacean forelimb reorganization. In Hoxd13, we found a novel expansion of a polyalanine tract in cetaceans compared to other mammals (17/18 residues versus 14/15 residues, respectively), lengthening of which has previously been shown to be linked to synpolydactyly in humans and mice. Both genes also show much greater sequence variation among cetaceans than across other mammalian lineages. Our results strongly implicate 5’HoxD genes in the modulation of digit number, web forming, and the high morphological diversity of the cetacean manus.
Origin of primate orphan genes: a comparative genomics approach
Genomes contain a large number of genes that do not have recognisable homologues in other species, and which are likely to be involved in important species-specific adaptive processes. The origin of many such "orphan" genes remains unknown. Here we present the first systematic study of the characteristics and mechanisms of formation of primate-specific orphan genes. We determine that codon usage values for most orphan genes fall within the bulk of the codon usage distribution of bona-fide human proteins, supporting their current protein-coding annotation. We also show that primate orphan genes display distinctive features in relation to genes of wider phylogenetic distribution: higher tissue specificity, more rapid evolution, and shorter peptide size. We estimate that around 24% are highly divergent members of mammalian protein families. Interestingly, around 53% of the orphan genes contain sequences derived from transposable elements and are mostly located in primate-specific genomic regions. This indicates frequent recruitment of transposable elements as part of novel genes. Finally, we also obtain evidence that a small fraction of primate orphan genes, around 5.5%, might have originated de novo from mammalian non-coding genomic regions.
Near neutrality, rate heterogeneity, and linkage govern mitochondrial genome evolution in Atlantic Cod (Gadus morhua) and other gadine fish
The mtDNA genome figures prominently in evolutionary investigations of vertebrate animals due to a suite of characteristics that include absence of Darwinian selection, high mutation rate, and inheritance as a single linkage group. Given complete linkage and selective neutrality, mtDNA gene trees are expected to correspond to intraspecific phylogenies, and mtDNA diversity will reflect population size. The validity of these assumptions is however rarely tested on a genome-wide scale. Here, we analyze rates and patterns of molecular evolution among 32 whole mitochondrial genomes of Atlantic Cod (Gadus morhua) as compared with its sister taxon, the walleye pollock (G. (Theragra) chalcogrammus), and genomes of seven other gadine codfish. We evaluate selection within Gadus morhua, between sister species, and among species, and intra-specific measures of linkage disequilibrium and recombination within G. morhua. Strong rate heterogeneity occurs among sites and genes at all levels of hierarchical comparison, consistent with variation in mutation rates across the genome. Neutrality indices (dN/dS) are significantly greater than unity among G. morhua genomes, and between sister species, which suggests that polymorphisms within species are slightly deleterious, as expected under the nearly-neutral theory of molecular evolution. Among species of gadines, dN/dS ratios are heterogeneous among genes, consistent with purifying selection and variation in functional constraint among genes rather than positive selection. The dN/dS ratio for ND4L is anomalously high across all hierarchical levels. There is no evidence for recombination within G. morhua. These patterns contrast strongly with those reported for humans: genome-wide patterns in other vertebrates should be investigated to elucidate the complex patterns of mtDNA molecular evolution.
Comparative and Functional Characterization of Intragenic Tandem Repeats in Ten Aspergillus Genomes
Intragenic tandem repeats (ITRs) are consecutive repeats of three or more nucleotides found in coding regions. ITRs are the underlying cause of several human genetic diseases, and have been associated with phenotypic variation, including pathogenesis, in several clades of the tree of life. We have examined the evolution and functional role of ITRs in ten genomes spanning the fungal genus Aspergillus, a clade of relevance to medicine, agriculture, and industry. We identified several hundred ITRs in each of the species examined. ITR content varied extensively between species, with an average 79% of ITRs unique to a given species. For the fraction of conserved ITR regions, sequence comparisons within species and between close relatives revealed that they were highly variable. ITR-containing proteins were evolutionarily less conserved, compositionally distinct, and overrepresented for domains associated with cell-surface localization and function relative to the rest of the proteome. Furthermore, ITRs were preferentially found in proteins involved in transcription, cellular communication and cell-type differentiation but were underrepresented in proteins involved in metabolism and energy. Importantly, although ITRs are evolutionarily labile, their functional associations appear to be remarkably conserved across eukaryotes. Fungal ITRs likely participate in a variety of developmental processes and cell-surface associated functions, suggesting that their contribution to fungal lifestyle and evolution may be more general than previously assumed.
Large number of ultraconserved elements were already present in the jawed vertebrate ancestor
Stephen et al. (2008, 25: 402; doi:10.1093/molbev/msm268) identified 13,736 ultraconserved elements (UCEs) in placental mammals and investigated their evolution in opossum, chicken, frog and fugu. They found that there was a massive expansion of UCEs during tetrapod evolution and that the substitution rate in UCEs showed a significant decline in tetrapods compared to fugu, suggesting they were exapted in tetrapods. They considered it unlikely that these elements are ancient but evolved at a higher rate in teleost fishes. In this study, we investigated the evolution of UCEs in a cartilaginous fish, the elephant shark, and show that nearly half the UCEs were present in the jawed vertebrate ancestor. The substitution rate in UCEs is higher in fugu than in elephant shark, and approximately one-third of ancient UCEs have diverged beyond recognition in teleost fishes. These data indicate that UCEs have evolved at a higher rate in teleost fishes, which may have implications for their vast diversity and evolutionary success.
Problems and solutions for estimating indel rates and length distributions
Insertions and deletions (indels) are fundamental but understudied components of molecular evolution. Here we present an expectation-maximization algorithm built on a pair hidden Markov model that is able to properly handle indels in neutrally evolving DNA sequences. From a dataset of orthologous introns, we estimate relative rates and length distributions of indels among primates and rodents. This technique has the advantage of potentially handling large genomic datasets. We find that a zeta power-law model of indel lengths provides a much better fit than the traditional geometric model and that indel processes are conserved between our taxa. The estimated relative rates are about 12–16 indels per 100 substitutions, and the estimated power-law magnitudes are about 1.6–1.7. More significantly, we find that using the traditional geometric/affine model of indel lengths introduces artifacts into evolutionary analysis, casting doubt on studies of the evolution and diversity of indel formation using traditional models and invalidating measures of species divergence that include indel lengths.
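The contrast between the zeta power-law and geometric length models can be illustrated with a small likelihood comparison. This is only a sketch that assumes indel lengths have already been observed directly; it does not implement the expectation-maximization procedure or the pair hidden Markov model described in the abstract. All function names and the toy length sample are hypothetical.

```python
import math


def zeta(a, n=1000):
    """Riemann zeta for a > 1: partial sum plus Euler-Maclaurin tail terms."""
    if a <= 1.0:
        raise ValueError("the zeta power law requires an exponent > 1")
    head = sum(k ** -a for k in range(1, n))
    tail = n ** (1.0 - a) / (a - 1.0) + 0.5 * n ** -a + a * n ** (-a - 1.0) / 12.0
    return head + tail


def zeta_loglik(lengths, a):
    """Log-likelihood under P(k) = k**(-a) / zeta(a), for k = 1, 2, ..."""
    return -a * sum(math.log(k) for k in lengths) - len(lengths) * math.log(zeta(a))


def geometric_loglik(lengths, q):
    """Log-likelihood under P(k) = (1 - q) * q**(k - 1), for k = 1, 2, ..."""
    return sum(math.log(1.0 - q) + (k - 1) * math.log(q) for k in lengths)


def geometric_mle(lengths):
    """Closed-form maximum-likelihood q; needs a mean length above 1."""
    mean = sum(lengths) / len(lengths)
    if mean <= 1.0:
        raise ValueError("mean length must exceed 1 for a non-degenerate fit")
    return 1.0 - 1.0 / mean


def zeta_mle(lengths, lo=1.05, hi=4.0, steps=296):
    """Grid-search maximum-likelihood exponent (coarse, for illustration)."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda a: zeta_loglik(lengths, a))


# Hypothetical heavy-tailed sample of indel lengths (not data from the study).
toy_lengths = [1] * 60 + [2] * 20 + [3] * 9 + [4] * 5 + [10] * 3 + [25] * 2 + [60]

a_hat = zeta_mle(toy_lengths)
q_hat = geometric_mle(toy_lengths)
print("zeta exponent:", round(a_hat, 2), "log-lik:", zeta_loglik(toy_lengths, a_hat))
print("geometric q:  ", round(q_hat, 2), "log-lik:", geometric_loglik(toy_lengths, q_hat))
```

Both models have a single free parameter, so their maximized log-likelihoods can be compared directly. A geometric model forces an exponentially decaying tail, which is why occasional long indels penalize it heavily on heavy-tailed samples such as the toy one here.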
Molecular evolution, functional variation and proposed nomenclature of the gene family that includes sphingomyelinase D in sicariid spider venoms
The venom enzyme sphingomyelinase D (SMase D) in the spider family Sicariidae (brown, or fiddleback spiders (Loxosceles) and six-eyed sand spiders (Sicarius)) causes dermonecrosis in mammals. SMase D is in a gene family with multiple venom-expressed members that vary in functional specificity. We analyze molecular evolution of this family, and variation in SMase D activity among crude venoms using a dataset that represents the phylogenetic breadth of Loxosceles and Sicarius. We isolated a total of 190 non-redundant nucleotide sequences encoding 168 non-redundant amino acid sequences of SMase D homologs from 21 species. Bayesian phylogenies support 2 major clades, which we name α and β, within which we define 7 and 3 subclades respectively. Sequences in the α clade are exclusively from New World Loxosceles and L. rufescens and include published genes for which expression products have SMase D and dermonecrotic activity. The β clade includes paralogs from New World Loxosceles that have no, or reduced, SMase D and no dermonecrotic activity, and also paralogs from Sicarius and African Loxosceles of unknown activity. Gene duplications are frequent, consistent with a birth-and-death model, and there is evidence of purifying selection with episodic positive directional selection. Despite having venom-expressed SMase D homologs, venoms from New World Sicarius have reduced, or no, detectable SMase D activity, and Loxosceles in the Southern African spinulosa group have low SMase D activity. Sequence conservation mapping shows >98% conservation of proposed catalytic residues of the active site and around a plug motif at the opposite end of the TIM barrel, but the α and β clades differ in conservation of key residues surrounding the apparent substrate binding pocket. Based on these combined results we propose an inclusive nomenclature for the gene family, renaming it SicTox, and discuss emerging patterns of functional diversification.
No Rosetta stone for a sense-antisense origin of aminoacyl tRNA synthetase classes
Amino-acyl tRNA synthetases (aaRS) are crucial enzymes that join amino acids to their cognate tRNAs, thereby implementing the genetic code. These enzymes fall into two unrelated structural classes whose evolution has not been explained. The leading hypothesis, proposed by Rodin and Ohno, is that the two classes originated as a pair of sense/antisense genes encoded on opposite strands of a single DNA molecule. This unusual idea obtained its main support from reports of a "Rosetta stone": a locus where genes for heat shock protein 70 (HSP70) and an NAD-specific glutamate dehydrogenase (NAD-GDH), which are structurally homologous to the two classes of aaRS, overlap extensively on complementary DNA strands. This remarkable locus was first characterized in the oomycete Achlya klebsiana, and has since been reported in many other species. Here we present evidence that the ORFs on the antisense strand of HSP70 genes are spurious, and we identify a more probable candidate for the gene encoding the oomycete NAD-GDH enzyme. These results cast extensive doubt on the Rosetta Stone argument.
Mitochondrial Heteroplasmy and Paternal Leakage in Natural Populations of Silene vulgaris, a Gynodioecious Plant
It is currently thought that most angiosperms transmit their mitochondrial genomes maternally. Maternal transmission limits opportunities for genetic heterogeneity (heteroplasmy) of the mitochondrial genome within individuals. Recent studies of the gynodioecious species Silene vulgaris and S. acaulis, however, document both direct and indirect evidence of mitochondrial heteroplasmy, suggesting that the mitochondrial genome is at times transmitted via paternal leakage. This heteroplasmy allows the generation of multi-locus recombinants, as documented in recent studies of both species. A prior study that employed quantitative PCR (q-PCR) on a limited sample provided direct evidence of heteroplasmy in the mitochondrial gene atp1 in S. vulgaris. Here, we apply the q-PCR methods to a much larger sample and extend them to incorporate the study of an additional atp1 haplotype along with two other haplotypes of the mitochondrial gene cox1 to evaluate the origin, extent, and transmission of mitochondrial genome heteroplasmy in S. vulgaris. We first calibrate our q-PCR methods experimentally and then use them to quantify heteroplasmy in 408 S. vulgaris individuals sampled from 22 natural populations located in Virginia, New York, and Tennessee. Sixty-one individuals exhibit heteroplasmy, including five that exhibited the joint heteroplasmy at both loci that is a prerequisite for effective recombination. The heteroplasmic individuals were distributed among 18 of the populations studied, demonstrating that heteroplasmy is a widespread phenomenon in this species. Further, we compare mother and offspring from 71 families to determine the rate of heteroplasmy gained and lost via paternal leakage and vegetative sorting across generations. 
Of 17 sibships exhibiting cox1 heteroplasmy and 14 sibships exhibiting atp1 heteroplasmy, more than half of the observations of heteroplasmy are generated via paternal leakage at the time of fertilization, with the rest being inherited from a heteroplasmic mother. Moreover, we show that the average paternal contribution during paternal leakage is about 12%. These findings are surprising, given that the current understanding of gynodioecy assumes that mitochondrial cytoplasmic male sterility (CMS) elements are strictly maternally inherited. Knowledge of the dynamics of mitochondrial populations within individuals plays an important role in understanding the evolution of gynodioecy, and we discuss our findings within this context.
Gnathostome phylogenomics utilizing lungfish EST sequences
The relationship between the Chondrichthyes (cartilaginous fishes), the Actinopterygii (ray-finned fishes) and the piscine Sarcopterygii (lobe-finned fishes), and how the Tetrapoda (four-limbed terrestrial vertebrates) are related to these, has been a contentious issue for more than a century. A general consensus about the relationship of these vertebrate clades has gradually emerged among morphologists, but no molecular study has yet provided conclusive evidence for any specific hypothesis. In order to examine these relationships on the basis of more extensive sequence data, we have produced almost 1,000,000 base pairs of expressed sequence tags (ESTs) from the African marbled lungfish, Protopterus aethiopicus. This new data set yielded 771 transcribed nuclear sequences that had not been previously described. The lungfish EST sequences were combined with EST data from two cartilaginous fishes and whole genome data from an agnathan, four ray-finned fishes and four tetrapods. Phylogenomic analysis of these data yielded, for the first time, significant maximum likelihood support for a traditional gnathostome tree with a split between the Chondrichthyes and the remaining (bony) gnathostomes. The sister group relationship between Dipnoi (lungfishes) and Tetrapoda also received conclusive support. Previously proposed hypotheses, such as the monophyly of fishes, could be rejected significantly. The divergence time between lungfishes and tetrapods was estimated at 382-390 million years ago using the current data set and six calibration points.


