Angiosperm Phylogeny Based on 18S/26S rDNA Sequence Data: Constructing a Large Data Set Using Next-Generation Sequence Data
The utility of 18S and 26S rDNA in broad phylogenetic analyses has been much maligned, due in large part to the low phylogenetic signal in both genes. However, few analyses have employed complete 26S rDNA sequences across a broad range of taxa, and most alignments of the two genes are done de novo, without taking into account the secondary structure of the two rRNAs. Here we mine next-generation sequence data to compile large matrices (429 taxa) of complete 18S + 26S gene sequences, and we compare de novo alignments with curated alignments done by eye that take into account secondary structure and hard-to-align regions (profile alignments). The combined 18S + 26S topology is very similar overall to recently published trees for the angiosperms based on three or more genes. Support for the backbone (framework) of the combined tree is low (bootstrap support below 50%), and few major clades have bootstrap support above 50%; most well-supported clades are tip clades (families and orders sensu APG III 2009). Importantly, the 18S + 26S rDNA topology is consistent with current estimates of relationships: the basalmost angiosperms are recovered (Amborellaceae, Nymphaeales, Austrobaileyales), as are most major clades, including Mesangiospermae, eudicots (Eudicotyledoneae sensu Cantino et al. 2007), core eudicots (Gunneridae sensu Cantino et al. 2007), rosids (Rosidae sensu Cantino et al. 2007), asterids (Asteridae sensu Cantino et al. 2007), and Caryophyllales. Most clades recognized at the ordinal level (sensu APG III 2009) are also recovered. There are some unusual placements in the 18S + 26S topology, but none of these receives bootstrap support above 50%. The profile and de novo alignments gave very similar topologies. Sequences of 18S + 26S rDNA remain a useful source of data in large combined analyses.
This is the first study to employ complete 26S gene sequences for a data set of this size, and 26S in particular proved phylogenetically useful. We do not advocate targeted sequencing of 18S/26S rDNA, but because these regions carry useful phylogenetic information and are abundant in next-generation sequencing runs, we suggest that these data be used rather than discarded.