At The Stats People, we are against black boxes that require users to place all their faith in scientific-looking numeric outputs which say nothing about how reliable the estimates would be across different samples or on new cases.

For some time now, we have been advocating the use of predictive modelling methods that are optimised according to how well they perform on new cases, or at least on cases different from those on which they were built. See The Curse of Overfitting. These methods are integrated into our CCR algorithm.

This approach is known as “cross-validation” and is particularly relevant for smaller samples with many highly correlated predictors. We have been conscious for some time that not all of the methods we offer use, or lend themselves to, cross-validation. Those methods always converge to estimates, even on tiny samples, and so require “faith” in their validity.
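To make the idea concrete, here is a minimal sketch of k-fold cross-validation in Python, using scikit-learn, synthetic data, and an ordinary ridge regression as a stand-in for a predictive model. It is purely illustrative and is not the CCR algorithm itself: each candidate setting is scored only on folds it was not fitted to, and the setting that predicts best out-of-sample is preferred, rather than the one that fits best in-sample.

```python
# Illustrative k-fold cross-validation (not CCR): choose a model setting
# by how well it predicts cases it was not fitted on.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 50, 30                        # small sample, many predictors
X = rng.normal(size=(n, p))
X[:, 1:] += 0.8 * X[:, [0]]          # make the predictors highly correlated
y = X[:, 0] + rng.normal(scale=0.5, size=n)

for alpha in (0.01, 0.1, 1.0, 10.0):
    # 5-fold cross-validation: each fold is held out once for scoring
    scores = cross_val_score(Ridge(alpha=alpha), X, y,
                             cv=5, scoring="neg_mean_squared_error")
    print(f"alpha={alpha:<5}  out-of-sample MSE={-scores.mean():.3f}")
```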

To counter this, we are currently putting resources into developing bootstrapping techniques for these methods, which will deliver reliable “confidence intervals” for their outputs without reliance on “in-sample” estimates.
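As an illustration of the general principle, the sketch below computes a simple percentile-bootstrap interval for a regression slope, again in Python with scikit-learn on synthetic data. It is a generic example of case resampling, not the specific technique we are developing: cases are resampled with replacement, the model is refitted each time, and the interval is read off the distribution of refitted estimates, with no reliance on in-sample standard errors.

```python
# Illustrative percentile bootstrap: resample cases, refit, and take
# percentiles of the refitted coefficients as an interval estimate.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 40
x = rng.normal(size=(n, 1))
y = 2.0 * x[:, 0] + rng.normal(size=n)

boot_coefs = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)          # resample rows with replacement
    fit = LinearRegression().fit(x[idx], y[idx])
    boot_coefs.append(fit.coef_[0])

lo, hi = np.percentile(boot_coefs, [2.5, 97.5])
print(f"95% bootstrap interval for the slope: ({lo:.2f}, {hi:.2f})")
```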

These extensions will allow clients to place greater faith in the results and to answer reliably the question: how significantly different are the results for these predictors? We hope to roll this out to other methods over time.