Technology
Medicine wants to move away from a 'one-size-fits-all' approach and instead use the vast amounts of information now accessible to tailor treatments optimally: 'the right drug at the right time for the right patient' (precision medicine). To achieve this, just collecting data is not enough. One needs intelligent quantitative algorithms that can predict, from the data of individual patients, how their disease would evolve under each possible treatment regime. Moreover, these algorithms must be interpretable.
The main barrier in extracting patterns from medical data is the need to distinguish 'signal' from 'noise'. When an algorithm mistakes noise for signal, this is called 'overfitting'. The danger of overfitting increases if we include more data per patient, or if the number of patients is not large enough relative to the number of measurements per patient. The conventional statistical methods used in medicine were not designed to deal with overfitting.
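A small simulation can make the point about noise and signal concrete. The sketch below is purely illustrative and is not part of any Saddle Point pipeline: the outcome and all covariates are pure noise, and the helper names `pearson` and `strongest_spurious_correlation` are hypothetical. It tracks the strongest correlation found between any covariate and the outcome as more covariates per patient are included.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def strongest_spurious_correlation(n_patients, n_covariates, seed=0):
    """Running maximum of |correlation| between noise covariates and a noise outcome.

    Entry k-1 of the returned list is the strongest apparent association
    found among the first k covariates. No covariate carries real signal.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n_patients)]
    running, history = 0.0, []
    for _ in range(n_covariates):
        covariate = [rng.gauss(0.0, 1.0) for _ in range(n_patients)]
        running = max(running, abs(pearson(covariate, outcome)))
        history.append(running)
    return history


# Assumed toy setting: 20 patients, up to 500 measured covariates each.
running_max = strongest_spurious_correlation(n_patients=20, n_covariates=500)
for k in (5, 50, 500):
    print(k, "covariates -> strongest |correlation|:", round(running_max[k - 1], 2))
```

With the number of patients held fixed, the strongest apparent association can only grow as covariates are added, even though none of them is informative. An algorithm that trusts the top-ranked covariate in such data is mistaking noise for signal.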
​
Our approach to reliable inference from high-dimensional medical data is to develop new mathematical and statistical models, in collaboration with medical professionals and academia, designed to meet the challenges of modern data-driven personalized medicine. They combine the precision and interpretability of Bayesian approaches with mathematical tools from theoretical physics, and focus on:
- Undoing the effects of overfitting in clinical outcome prediction for high-dimensional signals or small data sets.
- Decontaminating inferences for the effects of disease interactions.
- Inferring responder subgroups in clinical trials prospectively.
- Identifying optimal personalized treatments and drug doses.
Below we give a more in-depth description of the technology behind our two main inference pipelines, spsSIGNATURE and spsMOSAICS.
Both have by now been used in multiple medical studies and publications.
spsSIGNATURE
spsSIGNATURE is a fully automated pipeline that computes reliable risk and treatment response signatures for high-dimensional data, i.e. in the overfitting regime. It can handle multiple clinical outcome types: time-to-event, ordinal class outcomes, and real-valued clinical scales.
It suppresses overfitting effects by combining iterative removal of irrelevant covariates based on probabilistic arguments (rather than on z-scores) with bias removal and optimal regularization based on a statistical mechanical theory of overfitting in Generalized Linear Models (GLMs). This theory, developed by Saddle Point in collaboration with academic partners (see the publications list), is based on the so-called replica method. Compared with the more standard regularization approaches to overfitting used in other statistical software packages, this strategy has two benefits: (i) no data need to be sacrificed to optimize the hyperparameters, which are either not needed (in bias removal mode) or computed analytically (in regularization mode), and (ii) nuisance parameters (e.g. base hazards) are also corrected for overfitting.
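For contrast, the conventional approach referred to above can be sketched as follows. This is a generic hold-out tuning of a ridge penalty on simulated data, not Saddle Point's method: part of the cohort is set aside purely to choose the hyperparameter. All names (`solve`, `ridge_fit`, `mse`), the penalty grid, and the data are assumptions made for illustration.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ridge_fit(X, y, lam):
    """Ridge regression coefficients: solve (X^T X + lam * I) beta = X^T y."""
    p = len(X[0])
    gram = [[sum(row[j] * row[k] for row in X) + (lam if j == k else 0.0)
             for k in range(p)] for j in range(p)]
    rhs = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return solve(gram, rhs)


def mse(X, y, beta):
    """Mean squared prediction error of a linear model."""
    errors = [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(X, y)]
    return sum(e * e for e in errors) / len(errors)


# Simulated cohort (assumption): 40 patients, 15 covariates, only the first is informative.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(15)] for _ in range(40)]
y = [row[0] + rng.gauss(0.0, 1.0) for row in X]

# A quarter of the patients is sacrificed: used only to pick the penalty.
X_train, y_train = X[:30], y[:30]
X_val, y_val = X[30:], y[30:]

grid = [0.1, 1.0, 10.0, 100.0]
val_error = {lam: mse(X_val, y_val, ridge_fit(X_train, y_train, lam)) for lam in grid}
best_lam = min(val_error, key=val_error.get)
print("selected penalty:", best_lam)
```

The ten held-out patients contribute nothing to the coefficient estimates. That loss is what an analytically computed hyperparameter, or a bias-removal step that needs no hyperparameter, is meant to avoid.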
​
Other features include:
- Handling of informative covariate missingness.
- Optimized personalized risk and treatment response signatures, and personalized optimal drug doses.
- Shadow analysis (analysis with outcome-randomized data, to exclude false positive inferences).
- Generation of simulated clinical data.
- Validation of predictions on further data sets.
- Fully automated report generation.
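The idea behind shadow analysis, as listed above, can be illustrated with a generic outcome-randomization check. The sketch below is an assumption about the general principle only, not the spsSIGNATURE implementation: it uses a simple correlation statistic on simulated data, and the names `abs_correlation` and `shadow_p_value` are hypothetical.

```python
import math
import random


def abs_correlation(xs, ys):
    """Absolute Pearson correlation, used here as a simple association statistic."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return abs(sxy) / math.sqrt(sxx * syy)


def shadow_p_value(covariate, outcome, n_shadows=500, seed=0):
    """Fraction of outcome-randomized ('shadow') data sets whose association
    is at least as strong as the one observed in the real data."""
    rng = random.Random(seed)
    observed = abs_correlation(covariate, outcome)
    shuffled = list(outcome)
    hits = 0
    for _ in range(n_shadows):
        rng.shuffle(shuffled)  # breaks any real covariate-outcome link
        if abs_correlation(covariate, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_shadows + 1)  # add-one correction avoids p = 0


# Simulated data (assumption): one genuinely informative marker, one pure-noise marker.
rng = random.Random(1)
biomarker = [rng.gauss(0.0, 1.0) for _ in range(40)]
outcome = [2.0 * b + rng.gauss(0.0, 1.0) for b in biomarker]
noise_marker = [rng.gauss(0.0, 1.0) for _ in range(40)]

p_real = shadow_p_value(biomarker, outcome)
p_noise = shadow_p_value(noise_marker, outcome)
print("informative marker:", p_real, "noise marker:", p_noise)
```

If an analysis finds equally strong associations in the outcome-randomized copies as in the real data, its finding on the real data is likely a false positive.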
spsMOSAICS
spsMOSAICS finds latent subgroups in patient cohorts, allowing for prospective responder identification in clinical trials and thereby supporting better drug targeting.
spsSIGNATURE can construct treatment response scores if required, but these are limited to scenarios where the available covariates carry (possibly complex) information on the likelihood of an individual's treatment response. The Bayesian pipeline spsMOSAICS goes further: it also detects responder subgroups in cases where the measured covariates are not informative. More generally, it uses Bayesian model selection to infer:
- The number and size of statistically significant patient subgroups in a given clinical data set.
- Whether these subgroups differ in frailties, risk associations, base hazard rates, or combinations of these three.
- The sizes and quantitative characteristics of each subgroup.
- The probability that each individual patient in the data set belongs to each of the detected subgroups.
- Whether and how subgroup membership can be predicted a priori from the covariates.
If (some of) the subgroups are distinct in terms of their association with the treatment variable, this results in responder identification. spsMOSAICS thus tells the user (i) the extent to which a cohort is stratifiable and (ii) the characteristics of the strata, and (iii) provides a rational tool for finding stratifying biomarkers (variables that correlate with the reported class membership probabilities). The pipeline can handle multiple risks, decontaminate survival predictions for competing-risk effects, and generate fully automated analysis reports.
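The notions of latent subgroups and per-patient membership probabilities can be illustrated with a deliberately simplified toy model. The sketch below is not the spsMOSAICS algorithm, which works with survival data, frailties and hazard rates. It assumes a single real-valued response measure, fits a two-component Gaussian mixture by expectation-maximization, and uses the BIC as a crude stand-in for Bayesian model selection between one and two subgroups. All function names and the simulated data are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def memberships(values, weights, means, sds):
    """Posterior probability of each subgroup for every patient."""
    result = []
    for v in values:
        joint = [w * normal_pdf(v, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(joint)
        result.append([j / total for j in joint])
    return result


def log_likelihood(values, weights, means, sds):
    return sum(
        math.log(sum(w * normal_pdf(v, m, s) for w, m, s in zip(weights, means, sds)))
        for v in values
    )


def fit_one_group(values):
    """Maximum-likelihood fit of a single Gaussian (no subgroups)."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [1.0], [mean], [sd]


def fit_two_groups(values, n_iter=200):
    """EM for a two-component Gaussian mixture, started from the data extremes."""
    _, _, (sd,) = fit_one_group(values)
    weights, means, sds = [0.5, 0.5], [min(values), max(values)], [sd, sd]
    for _ in range(n_iter):
        resp = memberships(values, weights, means, sds)  # E-step
        for k in range(2):  # M-step
            mass = sum(r[k] for r in resp)
            weights[k] = mass / len(values)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / mass
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / mass
            sds[k] = max(math.sqrt(var), 1e-3)  # guard against a collapsing component
    return weights, means, sds


def bic(values, params, n_free):
    """Bayesian information criterion; lower is better."""
    return n_free * math.log(len(values)) - 2.0 * log_likelihood(values, *params)


# Simulated cohort (assumption): 30 non-responders and 20 responders.
rng = random.Random(0)
response = [rng.gauss(0.0, 1.0) for _ in range(30)] + [rng.gauss(6.0, 1.0) for _ in range(20)]

one = fit_one_group(response)
two = fit_two_groups(response)
bic_one = bic(response, one, n_free=2)  # mean, sd
bic_two = bic(response, two, n_free=5)  # 2 means, 2 sds, 1 free weight
probabilities = memberships(response, *two)
print("two subgroups preferred:", bic_two < bic_one)
```

The list `probabilities` plays the role of the class membership probabilities described above: a candidate stratifying biomarker would be a covariate that correlates with these values.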