Supplementary information
In exploratory analyses, evolutionary rates were estimated using a strict or relaxed (uncorrelated lognormal)
clock model, a Bayesian skyline plot or constant population size model, and a partitioned (1st+2nd, 3rd codon positions)
or non-partitioned model (Table S9). All analyses were performed using the HKY nucleotide substitution
model. We also compared the estimates obtained when analyzing each individual separately (unlinked) or
simultaneously as two different progressor groups as implemented in the recently described HPM
incorporating fixed effects (1). Analyses using a Bayesian skyline plot demographic model did not converge.
Thus, only the constant size demographic model was used for further analysis. The relaxed clock nucleotide
models did not converge well for all individuals and were therefore discarded from further analyses. Based on
these results, we proceeded with the strict clock codon and nucleotide models and with the relaxed clock
codon model in subsequent analyses. In exploratory analyses, we did not find any significant differences
between the estimated evolutionary rates when comparing the unlinked model with the HPM or between the
partitioned and non-partitioned nucleotide models (data not shown). Thus, subsequent analyses were
performed using the HPM, a constant size model, and both the codon and non-partitioned nucleotide models
(the latter referred to simply as the nucleotide model in the main manuscript).
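As an illustration of the (1st+2nd, 3rd) partitioning scheme mentioned above, the following minimal Python sketch splits an in-frame coding alignment into one partition holding the first and second codon positions and one holding the third. The function names and the toy sequences are hypothetical and serve only as an example; this is not the procedure or software used in the study, where partitioning would normally be specified in the phylogenetic software itself.

```python
def split_codon_positions(seq):
    """Split an in-frame coding sequence into two partitions:
    (1st+2nd codon positions, 3rd codon positions)."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3 (in-frame)")
    # Take the first two bases of every codon.
    first_second = "".join(seq[i:i + 2] for i in range(0, len(seq), 3))
    # Take every third base, starting at index 2.
    third = seq[2::3]
    return first_second, third


def partition_alignment(alignment):
    """Apply the split to every sequence in an alignment.

    alignment: dict mapping a sequence name to an aligned, in-frame sequence.
    Returns two dicts, one per partition, keyed by the same names.
    """
    part_12, part_3 = {}, {}
    for name, seq in alignment.items():
        part_12[name], part_3[name] = split_codon_positions(seq)
    return part_12, part_3


# Toy alignment (hypothetical data, for illustration only).
toy = {"seq_a": "ATGGCCAAA", "seq_b": "ATGGCTAAG"}
p12, p3 = partition_alignment(toy)
```

In this toy alignment the two sequences differ only at third codon positions, so the differences end up entirely in the second partition. A partitioned model lets each partition have its own substitution parameters, whereas a non-partitioned model applies a single set to all sites.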
Codon substitution rates were estimated using both a strict and a relaxed clock, with the relaxed clock
generally estimating a significantly faster codon substitution rate than the strict clock. However, the
differences between the relaxed and strict clocks were similar for all patients, as were the differences
between the groups (Tables S4 and S6).
Convergence with high effective sample sizes (ESSs) was reached for all datasets using the strict clock
codon model, except for individual DL2051, which displayed a bimodal posterior rate distribution. To assess whether
this could have influenced the observed differences between the progressor groups, we reanalyzed our data
for all individuals using either (1) only the sample states of the bimodal posterior rate distribution resulting