IEEE International Conference on Performance, Computing, and Communications, 2004
The paper introduces the concept of regression benchmarking as a variant of regression testing focused on detecting performance regressions. Applying regression benchmarking in the area of middleware development, the paper explains how regression benchmarking differs from middleware benchmarking in general. On a real-world example of TAO, the paper shows why the existing benchmarks do not give results sufficient for …
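To make the idea concrete, the following is a minimal sketch, not the paper's method, of what a regression-benchmarking check can look like: benchmark results from an old and a new build are compared, and a regression is reported only when the new build is slower by more than a chosen margin. The 5% threshold, the normal-approximation interval, and all names below are assumptions made for illustration.

```python
# Illustrative sketch (not the paper's procedure): flag a performance regression
# by comparing benchmark samples from an old and a new build.
import math
import statistics

def regression_detected(old_runs, new_runs, threshold=0.05, z=1.96):
    """Return True if the new build is slower than the old one by more than
    `threshold`, judged by a simple confidence interval on the difference of means."""
    mean_old, mean_new = statistics.mean(old_runs), statistics.mean(new_runs)
    se = math.sqrt(statistics.variance(old_runs) / len(old_runs)
                   + statistics.variance(new_runs) / len(new_runs))
    diff = mean_new - mean_old            # positive => new build is slower
    lower = diff - z * se                 # lower bound of an approximate 95% interval
    return lower > threshold * mean_old   # report only a clearly-above-threshold slowdown

# Example: response times (ms) from repeated benchmark executions of each build.
old = [10.1, 10.3, 9.9, 10.2, 10.0]
new = [11.4, 11.6, 11.2, 11.5, 11.3]
print(regression_detected(old, new))      # -> True
```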
Papers by Tomas Kalibera
In contrast, we provide a statistically rigorous methodology for repetition and summarising results that makes efficient use of experimentation time. Time efficiency comes from two key observations. First, a given benchmark on a given platform is typically prone to much less non-determinism than the common worst-case of published corner-case studies. Second, repetition is most needed where most uncertainty arises (whether between builds, between executions or between iterations). We capture experimentation cost with a novel mathematical model, which we use to identify the number of repetitions at each level of an experiment necessary and sufficient to obtain a given level of precision.
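The following sketch illustrates the underlying idea for a two-level experiment (executions, each running several iterations). It follows from minimising the cost needed to reach a target standard error when Var(mean) = var_exec/n_exec + var_iter/(n_exec·n_iter); the paper itself handles an arbitrary number of levels and uses its own unbiased per-level variance estimators, so the exact formulas differ and all numbers here are invented.

```python
# Approximation of the repetition-planning idea for a two-level experiment,
# not the paper's published formulas.
import math

def optimal_iterations_per_execution(cost_exec, cost_iter, var_exec, var_iter):
    """Iterations per execution that minimise cost for a given precision.
    cost_exec: fixed cost of starting one execution (e.g. VM start-up, warm-up)
    cost_iter: cost of one measured iteration
    var_exec:  variance between executions
    var_iter:  variance between iterations within an execution
    """
    return max(1, round(math.sqrt((cost_exec * var_iter) / (cost_iter * var_exec))))

def executions_for_precision(var_exec, var_iter, n_iter, target_se):
    """Executions needed so the standard error of the grand mean <= target_se."""
    return max(1, math.ceil((var_exec + var_iter / n_iter) / target_se ** 2))

# Example with made-up numbers: starting an execution costs 30 s, an iteration 1 s,
# and iteration-level noise dominates execution-level noise.
n_iter = optimal_iterations_per_execution(cost_exec=30.0, cost_iter=1.0,
                                           var_exec=0.04, var_iter=1.0)
n_exec = executions_for_precision(var_exec=0.04, var_iter=1.0,
                                  n_iter=n_iter, target_se=0.1)
print(n_iter, n_exec)   # repeat iterations where repetition is cheap, executions where it matters
```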
We present our methodology as a cookbook that guides researchers on the number of repetitions they should run to obtain reliable results. We also show how to present results with an effect size confidence interval. As an example, we show how to use our methodology to conduct throughput experiments with the DaCapo and SPEC CPU benchmarks on three recent platforms.
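As a simplified illustration of reporting an effect size confidence interval, the sketch below computes a percentile-bootstrap interval for the ratio of mean execution times of two systems. The paper derives its intervals for hierarchical data (e.g. executions and iterations); the flat resampling, sample sizes and times here are assumptions made only to keep the example short.

```python
# Simplified sketch: percentile-bootstrap confidence interval for a speed-up
# (ratio of mean execution times), not the paper's hierarchical construction.
import random
import statistics

def ratio_of_means_ci(old_times, new_times, reps=10000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        old_sample = [rng.choice(old_times) for _ in old_times]   # resample with replacement
        new_sample = [rng.choice(new_times) for _ in new_times]
        ratios.append(statistics.mean(old_sample) / statistics.mean(new_sample))
    ratios.sort()
    lo = ratios[int((alpha / 2) * reps)]
    hi = ratios[int((1 - alpha / 2) * reps) - 1]
    return lo, hi   # e.g. (1.08, 1.18) reads "new is 8-18% faster" at ~95% confidence

old = [10.2, 10.5, 9.8, 10.1, 10.4, 10.0]
new = [9.1, 9.3, 8.9, 9.2, 9.0, 9.4]
print(ratio_of_means_ci(old, new))
```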