BUSTED+MSS: Synonymous rate corrections
Method Summary
Standard codon-based tests for natural selection, including BUSTED, assume that synonymous substitutions (mutations that do not alter the amino acid) are selectively neutral and occur at a uniform background rate (dS).
This assumption is increasingly challenged by evidence of purifying selection acting on synonymous sites to optimize translation efficiency, preserve mRNA secondary structure, or control folding kinetics. When models fail to account for this synonymous selection, the background rate (dS) is underestimated, inflating the dN/dS ratio and leading to false positive inferences of positive selection.
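To see why an underestimated synonymous rate inflates the ratio, consider a worked example. The numbers and the hat notation for an estimate are illustrative assumptions, not values from the paper. Suppose the nonsynonymous rate is 0.40 and the neutral background rate is 0.50, but selection on synonymous sites depresses the estimated synonymous rate to 0.35:

$$
\omega = \frac{dN}{dS}, \qquad
\omega_{\text{neutral baseline}} = \frac{0.40}{0.50} = 0.80, \qquad
\hat{\omega} = \frac{0.40}{0.35} \approx 1.14
$$

With the nonsynonymous rate unchanged, the smaller denominator alone moves the ratio from below 1 to above 1, which a test could misread as positive selection.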
This paper introduces BUSTED+S+MSS, which incorporates Multiclass Synonymous Substitution (MSS) models into the BUSTED framework. It partitions synonymous rates into multiple empirically derived classes, correcting for global synonymous selection constraints.
What It Does
- Models Synonymous Variation: Replaces the single background rate with a multiclass distribution, capturing site-to-site variation in synonymous selection.
- Reduces False Positives: Minimizes spurious positive selection signals caused by unmodeled purifying selection on synonymous sites.
- Integrates an "Error Sink": Combines synonymous rate classes with an error-sink parameter to absorb alignment artifacts, preventing them from biasing biological selection tests.
How to Use It in HyPhy
The MSS synonymous selection correction is fully integrated into the standard HyPhy BUSTED package.
- Prepare Input: You need a coding sequence alignment and a phylogenetic tree.
- Execute the Analysis:
Select the synonymous rate variation option when running BUSTED via the CLI:
```bash
hyphy busted --alignment data.fas --tree tree.nwk --syn-rate-classes 3
```

Specifying `--syn-rate-classes 3` enables the Multiclass Synonymous Substitution (MSS) model, splitting synonymous rates into three distinct categories.
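When running many alignments, it can help to check the inputs before launching HyPhy. The following is a minimal sketch, not part of HyPhy: `build_busted_cmd` is a hypothetical helper that verifies both files exist, validates the class count, and prints the command shown above so it can be inspected before it is run. It uses only the flags quoted in this section and assumes file paths contain no whitespace.

```bash
# Hypothetical helper (not part of HyPhy): validate inputs, then print
# the BUSTED command using only the flags quoted in this section.
build_busted_cmd() {
  alignment=$1
  tree=$2
  classes=${3:-3}   # default to three synonymous rate classes

  # Fail early if either input file is missing.
  [ -f "$alignment" ] || { echo "missing alignment: $alignment" >&2; return 1; }
  [ -f "$tree" ] || { echo "missing tree: $tree" >&2; return 1; }

  # The class count must be a positive integer with no leading zero.
  case $classes in
    ''|*[!0-9]*|0*) echo "invalid class count: $classes" >&2; return 1 ;;
  esac

  # Print the command instead of executing it.
  printf 'hyphy busted --alignment %s --tree %s --syn-rate-classes %s\n' \
    "$alignment" "$tree" "$classes"
}
```

Calling `build_busted_cmd data.fas tree.nwk 3` prints the command line for review. Printing rather than executing keeps the helper safe to use in a loop over many genes, where a missing tree file would otherwise surface only after the run has started.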
Key Findings & Significance
- Empirical Validation: The method was applied to datasets from five diverse clades (Drosophila, Caenorhabditis, Enterobacteria, Saccharomyces, and Primates), and including MSS consistently improved model fit.
- Fewer False Positives: MSS corrections reduced the number of genes falsely inferred to be under positive selection, particularly in highly divergent alignments.
- Dual Correction: Information-theoretic analyses show that while site-specific synonymous rate variation (SRV) provides the primary correction, global synonymous rate variation (MSS) acts as a crucial second-order correction.