Skip to content

Latest commit

 

History

History
 
 

bench_timers

Benchmark test for RIOT timers

This test is intended to collect statistics about the runtime delays in the periph_timer implementation. This tool is mainly intended to detect problems in the low level driver implementation which can be difficult to detect from higher level systems such as xtimer.

Testing periph/timer

The basic idea of the test is to generate a wide variety of calls to the periph/timer API, in order to detect problems with the software implementation. The test consists of a single thread of execution which will set random timeouts on a periph_timer device. A reference timer is used to measure the time it takes until the callback is called. The difference between the expected time and the measured time is computed, and the mean and variance of the recorded values are calculated. The results are printed as a table on stdout every 30 seconds. All of the test scenarios used in this application are based on experience from real world bugs encountered during the development of RIOT and other systems.

periph/timer functions tested

Both timer_set and timer_set_absolute calls are mixed in a random order, to ensure that both functions are working correctly.

Rescheduling

Some bugs may occur only when the application attempts to replace an already set timer by calling timer_set again while the timer is running. This is dubbed "resched" in this application. The application will issue rescheduling timer calls in the normal operation of the test to catch this class of problems.

Setting stopped timers

Another class of bugs is the kind where a timer is not behaving correctly if it was stopped before setting a timer target. The timer should set the new target, but not begin counting towards the target until timer_start is called. This is covered by this application under the "stopped" category of test inputs.

General benchmark behaviour

The benchmark application uses some techniques to improve the quality of the tests.

Avoiding phase lock

The CPU time used for generation and setting of the next timer target in software is approximately constant across iterations. When the application flow is driven by the timer under test, via a mutex which is unlocked from the timer callback, the application will be "phase locked" to the timer under test. This peculiarity may hide certain implementation race conditions which only occur at specific phases of the timer ticks. In the test application, this problem is avoided by inserting random CPU delay loops at key locations:

  • After timer_stop
  • Before timer_set/timer_set_absolute
  • Before timer_start

Estimating benchmark CPU overhead

An estimation of the overhead resulting from the CPU processing delays inside the benchmarking code is performed during benchmark initialization. The estimation algorithm attempts to perform the exact same actions that the real benchmark will perform, but without setting any timers. The measured delay is assumed to originate in the benchmark code and will be subtracted when computing the difference between expected and actual values. A warning is printed when the variance of the estimated CPU overhead is too high, this can be a sign that some other process is running on the CPU and disrupting the estimation, or a sign of serious problems inside the timer_read function.

Testing xtimer

The benchmark application can be reconfigured for using xtimer instead of the low level periph/timer interface. Use the Makefile target test-xtimer to build a test for xtimer.

xtimer functions tested

The application will mix calls to the following xtimer functions:

  • _xtimer_set
  • _xtimer_set_absolute
  • _xtimer_periodic_wakeup
  • _xtimer_spin

These functions cover almost all uses of xtimer. The higher level functions such as xtimer_usleep and xtimer_set_msg all use these functions internally in the implementations.

Results

When the test has run for a certain amount of time, the current results will be presented as a table on stdout. To assist with finding the source of any discrepancies, the results are split according to three parameters:

  • Function used: timer_set, timer_set_absolute
  • Rescheduled: yes/no
  • Stopped: yes/no

Expected results

It is difficult to set a general expected result which applies to all platforms. However, the mean and variance of the differences should be small, in relation to the tick duration of the timer under test. What is "small" also depends the CPU core frequency, because of CPU processing time spent in the driver code and in the test code. For example, when testing a 1 MHz timer and comparing with another 1 MHz timer, the mean difference should likely be in single digits on a Cortex-M CPU. Testing a 32768 Hz timer and referencing a 1 MHz timer on the other hand should have a mean difference in double digits, the variance will also be greater because of the quantization errors and rounding in the tick conversion routine. If the mean or variance of the difference is exceptionally large, the row will be marked with "SIC!" to draw attention to the fact.

Interpreting timer_set_xxx statistics

The first part of the results compares the reference time elapsed against the expected time. A negative value means that the timer alarm is triggered too soon, compared to the expected time. A positive value means that the timer alarm is triggered late, compared to the expected time.

Expected variance

The variance of the results should generally be as small as possible, except for the case where the reference timer is running at a higher frequency than the timer under test. There will always be a truncation error when setting a timer target because of the discrete timer counts that can be set. The benchmark will compute limits for the expected variance and show a message if the variance is less than or greater than the expected variance.

Variance too low

A variance lower than expected indicates that there is a phase correlation between the reference timer and the timer under test. This can only be explained by a software bug or a hardware failure. The primary suspect will be the reference timer.

Variance too high

A variance greater than expected indicates that there is some extra randomness in the timeout length, which can be caused by other processes interfering with the test, a software bug in the timer under test, or hardware failure.

Mathematical explanation

Let X0 be the exact time that a timer_set call occurs, X0 is a real number. Let T be the desired timeout, T is a positive integer because the timer can only count integer number of ticks. Let x0 = floor(X0) be the count that the timer is currently on when the timer is set, x0 is an integer. If X0 is uniformly random, then the truncation error X0' = (X0 - x0) will be a sample from a continuous uniform distribution in the interval [0,1) ticks. The timer target, x1, in the timer under test can be expressed as x1 = x0 + T. Let k be a real number that satisfies Y = kx, where x is an interval in the unit used by the timer under test, and Y is an interval in the unit used by the reference timer, Y is a real number. The time measured by the reference timer can be expressed as Y = k(x1 - X0) = k(x0 + T - X0) = k(X0 - X0' + T - X0) = k(T - X0'). Y will be a sample from a uniform random distribution in the interval [kT, k(T + 1)]. The variance of Y is defined as Var(Y) = (k(T + 1) - kT)^2 / 12.

Interpreting timer_read statistics

A separate table is displayed for statistics on timer_read. This table compares the expected timer under test time after a timer has triggered, against the actual reported value from timer_read(TIM_TEST_DEV). A positive value means that the TUT time has passed the timer target time. A negative value means that, according to the timer under test, the alarm target time has not yet been reached, even though the timer callback has been executed.

A note on BOARD=native

When running on native, the application will be running as a normal process in a multi process system which means that most results will have a much greater variance than expected and the resulting differences will also be much larger than on a bare metal system. Be careful when drawing conclusions on results from native.

Configuration

The timer under test and the reference timer can be chosen at compile time by defining TIM_TEST_DEV and TIM_REF_DEV, respectively. The frequencies for these timers can be configured through TIM_DEV_FREQ and TIM_REF_FREQ. For example, to compare timer device 2 running at 32768 Hz against a reference timer on timer device 0 (default) running at 1 MHz (default) on Mulle:

CFLAGS='-DTIM_TEST_DEV=TIMER_DEV\(2\) -DTIM_TEST_FREQ=32768' BOARD=mulle make flash

Default configuration

reference timer: TIMER_DEV(0) timer under test: TIMER_DEV(1) timer frequency: 1 MHz

Define USE_REFERENCE=0 to compare a timer against itself. Be aware that this may hide a class of errors where the timer is losing ticks in certain situations, which a separate reference timer would be able to detect.

Example output

Below is a short sample of a test of a 32768 Hz timer with a 1 MHz reference:

------------- BEGIN STATISTICS --------------
Limits: mean: [-10, 41], variance: [58, 99]
Target error (actual trigger time - expected trigger time), in reference timer ticks
positive: timer is late, negative: timer is early
=== timer_set running ===
   interval    count       sum       sum_sq    min   max  mean  variance
   1 -    2:   25705    455646      2002373      2    35    17     77
   3 -    4:   25533    457326      1973191      2    35    17     77
   5 -    8:   25412    451046      1970014      2    35    17     77
   9 -   16:   25699    458563      2017198      2    36    17     78
  17 -   32:   25832    457768      2018909      1    35    17     78
  33 -   64:   25758    440454      1978830      1    35    17     76
  65 -  128:   25335    414320      1942196      0    34    16     76
 129 -  256:   25254    367689      1979616     -3    32    14     78
 257 -  512:   25677    285822      2020736     -7    31    11     78
      TOTAL   230205   3788634     18459378     -7    36    16     80

...

The statistics above show that the timer implementation introduces a small delay on short timers. Note however that it is not clear whether this delay is in timer_set, or timer_read, but the combined effect of all error sources make the timer overshoot its target by on average 17 microseconds. The difference between min and max is close to the expected difference of approximately 1/32768 s = 30.51 µs. The lower min and mean for the longer timer intervals are most likely caused by a drift between the 1 MHz clock and the 32.768 kHz clocks inside the CPU, and is expected when using two separate clock sources for the timers being compared.

Below is a short sample of a test where there is something wrong with the timer implementation, again a 32768 Hz timer tested with a 1 MHz reference:

=== timer_set_absolute resched ===
   interval    count       sum       sum_sq    min   max  mean  variance
   1 -    2:  152217  82324291  13339518497     38  1071   540  87635  <=== SIC!
   3 -    4:  152199  82254909  13301794832     38  1071   540  87397  <=== SIC!
   5 -    8:  152023  82085454  13264873203     38  1071   539  87256  <=== SIC!
   9 -   16:  152156  82236539  13265351119     38  1071   540  87183  <=== SIC!
  17 -   32:  152639  82606815  13267666812     38  1071   541  86922  <=== SIC!
  33 -   64:  151883  81836155  13268661551     37  1070   538  87361  <=== SIC!
  65 -  128:  152114  82151495  13228100922     36  1070   540  86962  <=== SIC!
 129 -  256:  152363  81806042  13278055359     34  1069   536  87148  <=== SIC!
 257 -  512:  152360  81235185  13303811951     29  1066   533  87318  <=== SIC!
      TOTAL  1369954 738536885 119197349719     29  1071   539  87008  <=== SIC!

We can see that the variance, the maximum error, and the mean are very large. This particular timer implementation needs some work on its timer_set_absolute implementation.

Configuration details

Configuration macros used by the application are described below

Settings related to timer hardware

TIM_TEST_DEV

Timer under test. Default: TIMER_DEV(1)

TIM_REF_DEV

Reference timer used for measuring the callback time error. Default: TIMER_DEV(0)

TIM_TEST_FREQ

Frequency of the timer under test, used when calling timer_init during the application startup. Default: 1000000, 1 MHz

TIM_REF_FREQ

Frequency of the reference timer, used when calling timer_init during the application startup. Default: 1000000, 1 MHz

TIM_TEST_CHAN

Timer channel used on the timer under test. Default: 0

USE_REFERENCE

Shorthand for setting the reference timer to the same as the timer under test. Default: 0

Settings related to result processing

DETAILED_STATS

Keep statistics per timer target offset value if set to 1, otherwise keep statistics only per scenario. Default: 1

LOG2_STATS

Only used when DETAILED_STATS == 1. Statistics are grouped according to 2-logarithms of the timer offset value, e.g. 1-2, 3-4, 5-8, 9-16 etc. This reduces memory consumption and creates a shorter result table for easier overview. Default: 1

TEST_PRINT_INTERVAL_TICKS

The result table will be printed to standard output when this many reference timer ticks have passed since the last printout. Default: ((TIM_REF_FREQ) * 30)

Settings related to timer input generation

TEST_MIN

Minimum timer offset tested for timer_set_absolute, in timer under test ticks. Default: 1 for timers < 1 MHz, 16 otherwise, to prevent setting a timer target in the past

TEST_MIN_REL

Minimum timer offset tested for timer_set, in timer under test ticks. Default: 0

TEST_MAX

Maximum timer offset tested, in timer under test ticks. Default: 128

SPIN_MAX_TARGET

The CPU busy wait loop will be calibrated during application start up so that the maximum random spin length is approximately equal to this many timer under test ticks. Default: 16

TEST_UNEXPECTED_STDDEV

Mark the output row if the computed variance implies a standard deviation which is greater than this value. This value will be automatically recomputed to compensate for the variance resulting from the truncation in the tick conversion if the reference timer is running at a higher frequency than the timer under test. Default: 4

TEST_UNEXPECTED_MEAN

Mark the output row if the computed mean absolute difference is greater than this number of timer under test ticks. This value will be automatically recomputed to compensate for the error resulting from the truncation in the tick conversion if the reference timer is running at a higher frequency than the timer under test. Default: 2