Overfitting in medical inference
Overfitting correction
When a data set contains too few patients relative to the number of measurements per patient, statistical methods are known to start 'overfitting', i.e. to misinterpret noise as signal. This leads to unreliable predictions, and implies that many expensive data remain under-used (see description of the BFI method).
Using mathematical tools from statistical physics (the replica method), we have shown that, for a large family of statistical models, it is possible to predict the relation between the incorrect inferences of an overfitting model and what would have been reported in the absence of overfitting. Our results, illustrated by application to linear, logistic, and Cox regression, enable us to correct statistical inferences in the hitherto forbidden overfitting regime.
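The idea of a predictable relation between an overfitted inference and its overfitting-free counterpart can be illustrated in the simplest case, ordinary linear regression. The sketch below is not the replica-based result described above. It uses only the textbook fact that the maximum-likelihood estimate of the noise variance is too small by a factor 1 - p/n, with p the number of covariates and n the number of samples, so that dividing by this factor undoes the bias. The function names, the Gaussian covariates, and the simulation settings are assumptions chosen for illustration.

```python
import numpy as np


def simulate_ml_noise_variance(n, p, sigma=1.0, trials=200, seed=0):
    """Average maximum-likelihood noise variance over simulated regressions.

    Hypothetical helper: draws Gaussian covariates and Gaussian noise,
    fits least squares, and returns the mean of RSS / n over all trials.
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(trials):
        X = rng.standard_normal((n, p))             # n samples, p covariates
        beta = rng.standard_normal(p) / np.sqrt(p)  # true association parameters
        y = X @ beta + sigma * rng.standard_normal(n)
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        residuals = y - X @ beta_hat
        estimates.append(np.mean(residuals ** 2))   # ML estimate: RSS / n
    return float(np.mean(estimates))


def correct_noise_variance(sigma2_ml, n, p):
    """Undo the overfitting bias of the ML noise variance (requires p < n)."""
    return sigma2_ml / (1.0 - p / n)


n, p = 200, 100  # strong overfitting regime: p / n = 0.5
sigma2_ml = simulate_ml_noise_variance(n, p)
sigma2_corrected = correct_noise_variance(sigma2_ml, n, p)
print(f"true noise variance:      {1.0:.3f}")
print(f"ML estimate (overfitted): {sigma2_ml:.3f}")
print(f"corrected estimate:       {sigma2_corrected:.3f}")
```

With p/n = 0.5 the uncorrected estimate comes out at roughly half the true noise variance, because the model has absorbed part of the noise into its fitted coefficients. For logistic and Cox regression no such closed-form factor is available from elementary arguments, which is where the replica analysis is needed.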