NAME
lmbench - benchmarking toolbox
SYNOPSIS
#include "lmbench.h"
typedef u_long iter_t
typedef void (*benchmp_f)(iter_t iterations, void* cookie)
void benchmp(benchmp_f initialize, benchmp_f benchmark, benchmp_f cleanup, int enough, int parallel, int warmup, int repetitions, void* cookie)
uint64 get_n()
void milli(char *s, uint64 n)
void micro(char *s, uint64 n)
void nano(char *s, uint64 n)
void mb(uint64 bytes)
void kb(uint64 bytes)
DESCRIPTION
Creating benchmarks using the lmbench timing harness is easy. Because measuring performance with lmbench is so easy, questions that arise during system design, development, or tuning can be answered quickly. For example, image processing
Two attributes are critical for performance: latency and bandwidth. lmbench's timing harness makes it easy to measure and report results for both. Latency is usually important for frequently executed operations, and bandwidth is usually important when moving large chunks of data.
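To make the two quantities concrete, here is a minimal sketch of how each could be derived from one timing interval. The helper names latency_usecs and bandwidth_mb_per_sec are hypothetical and are not part of lmbench. The sketch assumes the elapsed time is available in microseconds and takes one megabyte as 1024 * 1024 bytes, matching the MB macro in the bcopy example; the unit that mb() itself uses is not stated here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: average latency in microseconds per operation,
 * given the elapsed time of a timing interval and the operation count. */
double
latency_usecs(double elapsed_usecs, uint64_t ops)
{
    assert(ops > 0);
    return elapsed_usecs / (double)ops;
}

/* Hypothetical helper: bandwidth in megabytes per second.
 * Assumption: 1 MB = 1024 * 1024 bytes. */
double
bandwidth_mb_per_sec(uint64_t bytes, double elapsed_usecs)
{
    double seconds;

    assert(elapsed_usecs > 0.0);
    seconds = elapsed_usecs / 1e6;
    return ((double)bytes / (1024.0 * 1024.0)) / seconds;
}
```

For instance, 500 operations in 1000 microseconds is a latency of 2 microseconds per operation, and 8 * 1024 * 1024 bytes moved in one second is 8 MB per second under the stated assumption.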
There are a number of factors to consider when building benchmarks.
The timing harness requires that the benchmarked operation be idempotent so that it can be repeated indefinitely.
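As a rough illustration of this requirement, the sketch below shows a loop body that leaves the process in the same state after every pass, so it can be repeated any number of times. The function open_close_passes is hypothetical and assumes a POSIX system. A body that only called close() on a descriptor would not qualify, because the second pass would find the descriptor already closed.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical repeatable loop body: every pass opens the file and closes
 * it again, so each pass starts from the same state as the first.
 * Returns the number of passes that completed. */
unsigned long
open_close_passes(const char *path, unsigned long iterations)
{
    unsigned long done = 0;

    assert(path != NULL);
    while (iterations-- > 0) {
        int fd = open(path, O_RDONLY);

        if (fd < 0)
            break;      /* stop counting if the file cannot be opened */
        close(fd);
        done++;
    }
    return done;
}
```

On a system that provides /dev/null, open_close_passes("/dev/null", 1000) should complete all 1000 passes.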
The timing subsystem, benchmp, is passed up to three function pointers. Some benchmarks may need as few as one function pointer (for benchmark).
void benchmp(initialize, benchmark, cleanup, enough, parallel, warmup, repetitions, cookie)
measures the performance of benchmark repeatedly and reports the median result. benchmp creates parallel sub-processes which run benchmark in parallel. This allows lmbench to measure the system's ability to scale as the number of client processes increases. Each sub-process executes initialize with iterations set to 0 before starting the benchmarking cycle. It then calls initialize, benchmark, and cleanup, with iterations set to the number of iterations in the timing loop, several times in order to collect repetitions results. The calls to benchmark are surrounded by start and stop calls that time how long it takes to perform the benchmarked operation iterations times. After all the benchmark results have been collected, cleanup is called with iterations set to 0 to clean up any resources that may have been allocated by initialize or benchmark. cookie is a void pointer to a block of memory that can be used to store any parameters or state needed by the benchmark.
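The call order described above can be sketched with a small stand-in. This is not lmbench's implementation: sketch_benchmp, sketch_iter_t, sketch_f, and the trace callbacks are all hypothetical names, the stand-in runs in a single process, and it performs no timing. It only reproduces the sequence of callback invocations so that the role of the iterations argument is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long sketch_iter_t;
typedef void (*sketch_f)(sketch_iter_t iterations, void *cookie);

/* Single-process stand-in for the documented call order. No timing and
 * no sub-processes; initialize and cleanup may be NULL. */
void
sketch_benchmp(sketch_f initialize, sketch_f benchmark, sketch_f cleanup,
    sketch_iter_t iterations, int repetitions, void *cookie)
{
    int i;

    assert(benchmark != NULL);
    if (initialize) initialize(0, cookie);        /* one-time setup */
    for (i = 0; i < repetitions; i++) {
        if (initialize) initialize(iterations, cookie);
        benchmark(iterations, cookie);            /* the region lmbench times */
        if (cleanup) cleanup(iterations, cookie);
    }
    if (cleanup) cleanup(0, cookie);              /* final teardown */
}

/* Example callbacks that append one letter per call to a trace buffer. */
struct trace { char log[64]; size_t len; };

static void
note(struct trace *t, char c)
{
    if (t->len + 1 < sizeof(t->log)) {
        t->log[t->len++] = c;
        t->log[t->len] = '\0';
    }
}

void trace_init(sketch_iter_t n, void *cookie)  { note(cookie, n ? 'i' : 'I'); }
void trace_bench(sketch_iter_t n, void *cookie) { (void)n; note(cookie, 'b'); }
void trace_clean(sketch_iter_t n, void *cookie) { note(cookie, n ? 'c' : 'C'); }

/* Returns 1 when the recorded trace equals expected. */
int
trace_is(const struct trace *t, const char *expected)
{
    return strcmp(t->log, expected) == 0;
}
```

With repetitions set to 2, the trace reads IibcibcC: one setup call with iterations equal to 0, two initialize/benchmark/cleanup rounds with a nonzero count, and one final cleanup call with iterations equal to 0.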
void* benchmp_getstate()
returns a void pointer to the lmbench-internal state used during benchmarking. The state is not to be used or accessed directly by clients; instead, it is passed to benchmp_interval.
iter_t benchmp_interval(void* state)
returns the number of times the benchmark should execute its benchmark loop during this timing interval. This is used only for weird benchmarks which cannot implement the benchmark body in a function which can return, such as the page fault handler. Please see lat_sig.c for sample usage.
uint64 get_n()
returns the number of times the benchmark loop body was executed during the timing interval.
void milli(char *s, uint64 n)
prints the time per operation in milliseconds. n is the number of operations during the timing interval; it is passed as a parameter because each loop body can contain several operations.
void micro(char *s, uint64 n)
prints the time per operation in microseconds.
void nano(char *s, uint64 n)
prints the time per operation in nanoseconds.
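The reason n is a separate argument can be shown with a short sketch. When one pass through the loop body performs several operations, the operation count is the pass count multiplied by the operations per pass. The helpers below (total_ops, per_op_millis, per_op_micros, per_op_nanos) are hypothetical and assume the elapsed time of the interval is known in microseconds; they illustrate the arithmetic only and do not reproduce lmbench's reporting code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: operations performed when each pass through the loop
 * body carries out ops_per_pass operations. */
uint64_t
total_ops(uint64_t passes, uint64_t ops_per_pass)
{
    return passes * ops_per_pass;
}

/* Hypothetical: time per operation, from an interval in microseconds. */
double
per_op_millis(double elapsed_usecs, uint64_t n)
{
    assert(n > 0);
    return (elapsed_usecs / 1000.0) / (double)n;
}

double
per_op_micros(double elapsed_usecs, uint64_t n)
{
    assert(n > 0);
    return elapsed_usecs / (double)n;
}

double
per_op_nanos(double elapsed_usecs, uint64_t n)
{
    assert(n > 0);
    return (elapsed_usecs * 1000.0) / (double)n;
}
```

For example, 500 passes with 4 operations each give n = 2000; over a 4-second interval that is 2 milliseconds, 2000 microseconds, or 2000000 nanoseconds per operation.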
void mb(uint64 bytes)
prints the bandwidth in megabytes per second.
void kb(uint64 bytes)
prints the bandwidth in kilobytes per second.
USING lmbench
Here is an example of a simple benchmark that measures the latency of the random number generator lrand48():
#include "lmbench.h"

void
benchmark_lrand48(iter_t iterations, void* cookie) {
    /* call lrand48() once per requested iteration */
    while(iterations-- > 0)
        lrand48();
}

int
main(int argc, char *argv[])
{
    /* no initialize or cleanup function is needed */
    benchmp(NULL, benchmark_lrand48, NULL, 0, 1, 0, TRIES, NULL);
    micro("lrand48()", get_n());
}
Here is a simple benchmark that measures and reports the bandwidth of bcopy:
#include "lmbench.h"

#define MB (1024 * 1024)
#define SIZE (8 * MB)

struct _state {
    int size;
    char* a;
    char* b;
};

void
initialize_bcopy(iter_t iterations, void* cookie) {
    struct _state* state = (struct _state*)cookie;

    if (!iterations) return;
    state->a = malloc(state->size);
    state->b = malloc(state->size);
    /* give up if either buffer could not be allocated */
    if (state->a == NULL || state->b == NULL)
        exit(1);
}

void
benchmark_bcopy(iter_t iterations, void* cookie) {
    struct _state* state = (struct _state*)cookie;

    while(iterations-- > 0)
        bcopy(state->a, state->b, state->size);
}

void
cleanup_bcopy(iter_t iterations, void* cookie) {
    struct _state* state = (struct _state*)cookie;

    if (!iterations) return;
    free(state->a);
    free(state->b);
}

int
main(int argc, char *argv[])
{
    struct _state state;

    state.size = SIZE;
    benchmp(initialize_bcopy, benchmark_bcopy, cleanup_bcopy,
        0, 1, 0, TRIES, &state);
    /* bytes moved = loop executions times bytes per copy */
    mb(get_n() * state.size);
}
A slightly more complex version of the bcopy benchmark might measure bandwidth as a function of memory size and parallelism. The main procedure in this case might look something like this:
int
main(int argc, char *argv[])
{
    int size, par;
    struct _state state;

    for (size = 64; size <= SIZE; size <<= 1) {
        for (par = 1; par < 32; par <<= 1) {
            state.size = size;
            benchmp(initialize_bcopy, benchmark_bcopy,
                cleanup_bcopy, 0, par, 0, TRIES, &state);
            /* label each result with its size and parallelism */
            fprintf(stderr, "%d\t%d\t", size, par);
            mb(par * get_n() * state.size);
        }
    }
}
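The sweep above can be checked with a small sketch that needs no lmbench headers. The names SKETCH_MB, SKETCH_SIZE, aggregate_bytes, and sweep_points are hypothetical. The sketch assumes, as the par * get_n() * state.size expression suggests, that get_n() reports the count for one process, so the aggregate is scaled by the number of processes.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MB   (1024 * 1024)
#define SKETCH_SIZE (8 * SKETCH_MB)

/* Hypothetical: bytes moved by all processes in one timing interval,
 * assuming each of par processes executed n copies of size bytes. */
uint64_t
aggregate_bytes(int par, uint64_t n, uint64_t size)
{
    assert(par > 0);
    return (uint64_t)par * n * size;
}

/* Count the (size, par) points visited by the same two loops. */
int
sweep_points(void)
{
    int size, par, count = 0;

    for (size = 64; size <= SKETCH_SIZE; size <<= 1)
        for (par = 1; par < 32; par <<= 1)
            count++;
    return count;
}
```

The loops visit 18 sizes (64 bytes through 8 MB, doubling each step) and 5 parallelism levels (1, 2, 4, 8, 16), so sweep_points() returns 90. Because the inner condition is par < 32, the sweep stops at 16 processes.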
VARIABLES
There are three environment variables that can be used to modify the lmbench timing subsystem: ENOUGH, TIMING_O, and LOOP_O.
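A program can check whether any of these variables is set with the standard getenv() call. The sketch below is a hypothetical helper, env_or, that returns the raw string value or a fallback. It makes no assumption about the units or numeric format of ENOUGH, TIMING_O, or LOOP_O, which are not described here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: return the value of an environment variable,
 * or fallback when the variable is unset or empty. */
const char *
env_or(const char *name, const char *fallback)
{
    const char *value;

    assert(name != NULL);
    value = getenv(name);
    return (value != NULL && value[0] != '\0') ? value : fallback;
}
```

For example, env_or("ENOUGH", "(unset)") yields the configured string when the variable is present in the environment and "(unset)" otherwise.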
FUTURES
Development of lmbench is continuing.
SEE ALSO
lmbench(8), timing(3), reporting(3), results(3).
AUTHOR
Carl Staelin and Larry McVoy
Comments, suggestions, and bug reports are always welcome.