Treetrimmer: A method for phylogenetic dataset size reduction

Shinichiro Maruyama, Robert J. M. Eveleigh, John M. Archibald

    Research output: Contribution to journal (Article)

    12 Citations (Scopus)

    Abstract

    Background: With rapid advances in genome sequencing and bioinformatics, it is now possible to generate phylogenetic trees containing thousands of operational taxonomic units (OTUs) from a wide range of organisms. However, applying rigorous tree-building methods to such large datasets is computationally prohibitive, and manual 'pruning' of sequence alignments is time-consuming and raises concerns over reproducibility. There is a need for bioinformatic tools that carry out such pruning procedures objectively.

    Findings: Here we present 'TreeTrimmer', a bioinformatics procedure that removes unnecessary redundancy in large phylogenetic datasets, alleviating the size effect on more rigorous downstream analyses. The method identifies and removes user-defined 'redundant' sequences, e.g., orthologous sequences from closely related organisms and 'recently' evolved lineage-specific paralogs. Representative OTUs are retained for more rigorous re-analysis.

    Conclusions: TreeTrimmer reduces the OTU density of phylogenetic trees without sacrificing taxonomic diversity while retaining the original tree topology, thereby speeding up downstream computer-intensive analyses, e.g., Bayesian and maximum likelihood tree reconstructions, in a reproducible fashion.
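
    The pruning idea described in the abstract can be illustrated with a minimal sketch. This is not the TreeTrimmer implementation: the nested-list tree representation, the function names `leaves` and `trim`, the `genus` helper, and the rule of keeping the first leaf as the representative are all assumptions made for illustration. The sketch collapses any clade whose leaves all fall in the same user-defined category into a single representative OTU and leaves the rest of the topology untouched.

```python
from typing import Callable, List, Union

# Hypothetical representation: a tree is a leaf label (str) or a list of subtrees.
Tree = Union[str, List["Tree"]]


def leaves(tree: Tree) -> List[str]:
    """Return the leaf labels of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def trim(tree: Tree, category: Callable[[str], str]) -> Tree:
    """Collapse each clade whose leaves share one category into a single leaf.

    Assumption for illustration: the first leaf of a redundant clade is kept
    as its representative. Clades with mixed categories are kept and their
    children are trimmed recursively.
    """
    if isinstance(tree, str):
        return tree
    names = leaves(tree)
    if len({category(name) for name in names}) == 1:
        return names[0]
    return [trim(child, category) for child in tree]


def genus(label: str) -> str:
    """Hypothetical category function: the text before the first underscore."""
    return label.split("_")[0]


example = [
    [["Homo_sapiens", "Homo_neanderthalensis"], "Pan_troglodytes"],
    ["Mus_musculus", ["Mus_spretus", "Mus_caroli"]],
]
trimmed = trim(example, genus)
print(trimmed)
```

    Passing a different `category` function (for example, one that maps each label to a family or phylum) changes how aggressively the tree is reduced, which mirrors the user-defined notion of redundancy in the abstract.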

    Original language: English
    Article number: 145
    Journal: BMC Research Notes
    Volume: 6
    Issue number: 1
    Publication status: Published - 2013

    Keywords

    • Dereplication
    • Phylogenetic tree
    • Pruning
    • Taxonomic category
    • TreeTrimmer

    ASJC Scopus subject areas

    • Biochemistry, Genetics and Molecular Biology (all)
