Resource title

The design and analysis of benchmark experiments

Resource description

The assessment of the performance of learners by means of benchmark experiments is an established exercise. In practice, benchmark studies are a tool to compare the performance of several competing algorithms for a certain learning problem. Cross-validation or resampling techniques are commonly used to derive point estimates of the performances, which are then compared to identify algorithms with good properties. For several benchmarking problems, test procedures that take the variability of those point estimates into account have been suggested. Most of the recently proposed inference procedures are based on special variance estimators for the cross-validated performance. We introduce a theoretical framework for inference problems in benchmark experiments and show that standard statistical test procedures can be used to test for differences in the performances. The theory is based on well-defined distributions of performance measures, which can be compared with established tests. To demonstrate the usefulness in practice, the theoretical results are applied to benchmark studies in a supervised learning situation based on artificial and real-world data. (author's abstract); Series: Report Series SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
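The idea summarized in the abstract, drawing a sample of performance values by resampling and then comparing two learners with a standard test, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the report: the artificial data-generating process, the two toy learners (`fit_mean`, `fit_line`), the use of out-of-bootstrap observations as test data, and the choice of a paired t statistic, which treats the performance differences as an independent sample.

```python
import random
import statistics


def fit_mean(xs, ys):
    """Toy learner 1 (hypothetical): always predict the training mean."""
    m = statistics.fmean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Toy learner 2 (hypothetical): simple least-squares regression line."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def mse(model, xs, ys):
    """Mean squared error of a fitted model on test data."""
    return statistics.fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


def benchmark(n=100, B=50, seed=1):
    """Return B paired performance differences (mean learner minus line learner)."""
    rng = random.Random(seed)
    # Assumed artificial data: y = 2x + Gaussian noise.
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]
    diffs = []
    for _ in range(B):
        # Bootstrap learning sample; out-of-bootstrap observations serve as test data.
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        lx, ly = [xs[i] for i in idx], [ys[i] for i in idx]
        tx, ty = [xs[i] for i in oob], [ys[i] for i in oob]
        # Both learners see the same learning and test samples, so the values are paired.
        diffs.append(mse(fit_mean(lx, ly), tx, ty) - mse(fit_line(lx, ly), tx, ty))
    return diffs


def paired_t(diffs):
    """Paired t statistic for the null hypothesis of equal mean performance."""
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / len(diffs) ** 0.5)
```

Calling `paired_t(benchmark())` gives a statistic that would be compared against a t distribution with `B - 1` degrees of freedom; a large positive value suggests that the regression line has the lower error on this artificial problem.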

Resource author

Torsten Hothorn, Friedrich Leisch, Achim Zeileis, Kurt Hornik

Resource publisher

Resource publish date

Resource language


Resource content type


Resource resource URL

Resource license

Adapt according to the license agreement. Always reference the original source and author.