Abstract Detail


Boutte, Julien [1], Fishbein, Mark [2], Straub, Shannon C.K. [1].

Indel characters and phylogenomic analyses: application to milkweeds (Asclepias).

Phylogenomic approaches allow characterization of evolutionary relationships and plant diversity using a considerable number of genes and informative characters. However, coded indels, a potential source of phylogenetically informative characters (PICs), are often not utilized in phylogenomic analyses because they may result from misassembly of short reads rather than true insertion or deletion of nucleotides. Studies at the species level, especially in groups that have undergone recent, rapid evolutionary radiations, often recover little phylogenetically informative variation in coding regions and require non-coding region sequences, which are richer in indels, to resolve gene trees. To study the impact of indel characters on phylogenomic analyses, we developed a pipeline to evaluate and code indels in sequence alignments. We applied this pipeline to a Hyb-Seq data set (768 loci including targeted exons and their flanking intron regions, or “splash zone”) for the American milkweeds (Asclepias L., Apocynaceae; ca. 130 species), which are the result of a rapid and recent evolutionary radiation and whose phylogeny has been difficult to resolve. For each sequenced locus, we assembled a mean of 1,827 bp of exon and 1,067 bp of internal and flanking splash zones with HybPiper. Using custom Python scripts, we identified false-positive PICs (due to low sequencing depth and/or sequencing error) and putative chimeric large insertion/deletion regions (created during the reassembly process due to low read depth or read depth variation). After removing erroneous assemblies, each locus contained a mean of 5.62 PICs per 100 bp due to single nucleotide polymorphism (SNP) variation and a mean of 8.44 PICs per 100 bp including sites that could be coded as indels. We conducted phylogenomic analyses on the concatenated loci using RAxML and 2MATRIX, a program that codes informative indels as binary characters. We used ASTRAL to estimate the species trees from the collection of 767 gene trees. We then compared, separately, the impact of including coded small and large indels on gene trees and on species trees derived from concatenated loci and from sets of gene trees, and assessed the impact of indel coding on discordance among these phylogenetic approaches. Our new pipeline represents a step forward in making maximal use of the information content in phylogenomic datasets.
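As an illustration of what coding indels as binary characters can look like, the following is a minimal Python sketch. It is not the authors' pipeline and does not reproduce how 2MATRIX works internally. It assumes a toy alignment held in a dictionary, treats each internal gap with distinct start and end coordinates as one presence/absence character (in the spirit of simple indel coding), ignores terminal gaps as missing data, and counts a nucleotide column as a PIC when at least two bases each occur in at least two sequences. All function names and the example data are hypothetical.

```python
from collections import Counter
from itertools import groupby


def count_snp_pics(alignment):
    """Count columns in which >= 2 bases each occur in >= 2 sequences."""
    pics = 0
    for column in zip(*alignment.values()):
        # Gaps and ambiguity codes are ignored when tallying states.
        counts = Counter(b.upper() for b in column if b.upper() in "ACGT")
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            pics += 1
    return pics


def gap_runs(seq):
    """Return (start, end) half-open coordinates of internal gap runs."""
    runs, pos = [], 0
    for is_gap, group in groupby(seq, key=lambda c: c == "-"):
        length = len(list(group))
        # Skip runs touching either end of the alignment (treated as missing).
        if is_gap and pos > 0 and pos + length < len(seq):
            runs.append((pos, pos + length))
        pos += length
    return runs


def code_indels(alignment):
    """Code each distinct internal gap as a binary character per sequence.

    '1' = sequence has exactly this gap, '0' = it does not,
    '?' = the gap lies inside a longer gap, so its state is unknown.
    """
    runs = {name: gap_runs(seq) for name, seq in alignment.items()}
    indels = sorted({r for rs in runs.values() for r in rs})
    matrix = {}
    for name, rs in runs.items():
        row = []
        for start, end in indels:
            if (start, end) in rs:
                row.append("1")
            elif any(s <= start and end <= e for s, e in rs):
                row.append("?")
            else:
                row.append("0")
        matrix[name] = "".join(row)
    return indels, matrix


# Hypothetical four-taxon alignment, for illustration only.
example = {
    "sp1": "ACGTACGTAC",
    "sp2": "ACG--CGTAC",
    "sp3": "ACG--CGTAT",
    "sp4": "AC----GTAT",
}
snp_pics = count_snp_pics(example)          # 1 (only the last column)
indels, indel_matrix = code_indels(example)
# indels       -> [(2, 6), (3, 5)]
# indel_matrix -> {'sp1': '00', 'sp2': '01', 'sp3': '01', 'sp4': '1?'}
```

Appending the resulting binary columns to the nucleotide matrix is what lets indels contribute characters alongside SNPs. In this sketch a sequence whose longer gap spans a shorter indel is scored as unknown for that shorter indel rather than as absent.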

1 - Hobart and William Smith Colleges, Department of Biology, 300 Pulteney Street, Geneva, NY, 14456, USA
2 - Oklahoma State University, Plant Biology, Ecology, & Evolution, 301 Physical Sciences, Stillwater, OK, 74078, USA

Phylogenetically informative characters (PICs)
Large insertions/deletions
Non-coding regions

Presentation Type: Oral Paper
Session: 33, Phylogenomics II
Location: Fort Worth Ballroom 4/Omni Hotel
Date: Wednesday, June 28th, 2017
Time: 8:00 AM
Number: 33001
Abstract ID: 400
Candidate for Awards: None

Copyright © 2000-2017, Botanical Society of America. All rights reserved