SupplemntaryInformation_Palm_mBio_ver2_180822

Supplementary information

In explorative analyses, evolutionary rates were estimated using a strict or relaxed (uncorrelated lognormal) clock model, a Bayesian skyline plot or constant population size demographic model, and a partitioned (1, 3) or non-partitioned model (Table S9). All analyses were performed using the HKY nucleotide substitution model. We also compared the estimates obtained when analyzing each individual separately (unlinked) with those obtained when analyzing all individuals simultaneously as two different progressor groups, as implemented in the recently described HPM incorporating fixed effects (1). Analyses using a Bayesian skyline plot demographic model did not converge; thus, only the constant size demographic model was used for further analysis. The relaxed clock nucleotide models did not converge well for all individuals and were therefore discarded from further analyses. Based on these results, we proceeded with the strict clock codon and nucleotide models and with the relaxed clock codon model in subsequent analyses. In explorative analyses, we did not find any significant differences between the estimated evolutionary rates when comparing the unlinked model with the HPM, or when comparing the partitioned and non-partitioned nucleotide models (data not shown). Thus, subsequent analyses were performed using the HPM, a constant size model, and both the codon and non-partitioned nucleotide models (referred to simply as the nucleotide model in the main manuscript).
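The HKY substitution model used in all analyses can be sketched as follows. This is a minimal Python illustration of the standard HKY85 rate matrix, not the BEAST implementation used in the study; the function name `hky_rate_matrix`, the base order A, C, G, T and the example parameter values are hypothetical. Each off-diagonal rate is proportional to the frequency of the target base and is multiplied by the transition/transversion ratio kappa for transitions (A<->G, C<->T). The matrix is then scaled so that the mean substitution rate at equilibrium is 1, so that branch lengths are in expected substitutions per site.

```python
# Nucleotide order assumed throughout this sketch: A, C, G, T
BASES = "ACGT"
# Purine-purine and pyrimidine-pyrimidine changes are transitions
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky_rate_matrix(kappa, freqs):
    """Return a normalized HKY85 rate matrix Q as a list of lists (hypothetical helper).

    kappa: transition/transversion rate ratio.
    freqs: equilibrium base frequencies (pi_A, pi_C, pi_G, pi_T), summing to 1.
    """
    if len(freqs) != 4 or abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("expected four base frequencies summing to 1")
    q = [[0.0] * 4 for _ in range(4)]
    for i, bi in enumerate(BASES):
        for j, bj in enumerate(BASES):
            if i != j:
                # Rate i -> j is pi_j, scaled by kappa if the change is a transition
                rate = freqs[j]
                if (bi, bj) in TRANSITIONS:
                    rate *= kappa
                q[i][j] = rate
        # Diagonal entry (still 0.0 here) is set so that the row sums to zero
        q[i][i] = -sum(q[i])
    # Scale so the expected rate at equilibrium, -sum_i pi_i * q_ii, equals 1
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[x / mean_rate for x in row] for row in q]
```

For example, `hky_rate_matrix(4.0, [0.4, 0.2, 0.2, 0.2])` returns a matrix whose rows sum to zero and whose A->G rate is four times its A->C rate, because G and C have equal frequencies in this example.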

Codon substitution rates were estimated using both a strict and a relaxed clock, with the relaxed clock generally estimating a significantly faster codon substitution rate than the strict clock. However, the differences between the relaxed and strict clocks were similar for all patients, as were the differences between the groups (Tables S4 and S6).
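The contrast between the two clock models compared here can be illustrated with a small simulation. This Python sketch is an illustration under stated assumptions, not the BEAST configuration used in the study: under a strict clock every branch shares a single rate, whereas under an uncorrelated lognormal relaxed clock each branch draws its own rate independently from a lognormal distribution. The function names, the choice to parameterize the lognormal by its arithmetic mean (`mean_rate`) and the standard deviation of the log-rates (`log_sd`), and any example values are hypothetical.

```python
import math
import random


def strict_clock_lengths(durations, rate):
    """Strict clock: every branch evolves at the same rate (substitutions/site/time unit)."""
    return [rate * t for t in durations]


def ucld_clock_lengths(durations, mean_rate, log_sd, rng):
    """Uncorrelated lognormal relaxed clock (hypothetical parameterization).

    Each branch receives an independent rate drawn from a lognormal whose
    arithmetic mean is mean_rate; log_sd is the standard deviation of the
    log-rates. Returns (branch lengths, branch rates).
    """
    # For a lognormal, E[rate] = exp(mu + sigma^2 / 2); solve for mu
    mu = math.log(mean_rate) - 0.5 * log_sd ** 2
    rates = [rng.lognormvariate(mu, log_sd) for _ in durations]
    lengths = [r * t for r, t in zip(rates, durations)]
    return lengths, rates
```

With `log_sd` set to 0, every branch rate equals `mean_rate` (up to floating-point rounding), so the strict clock can be seen as a special case of the relaxed one; larger values of `log_sd` allow more rate variation among branches.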

Convergence with high effective sample sizes (ESSs) was reached for all datasets using the strict clock codon model, except for individual DL2051, which displayed a bimodal posterior rate distribution. To test whether this could have influenced the observed differences between the progressor groups, we reanalyzed our data for all individuals using either (1) only the sample states of the bimodal posterior rate distribution resulting