NAME

lmbench - system benchmarks

DESCRIPTION

lmbench is a series of microbenchmarks intended to measure basic operating system and hardware metrics. The benchmarks fall into three general classes: bandwidth, latency, and "other".

Most of the lmbench benchmarks use a standard timing harness described in timing(3) and have a few standard options: parallelism, warmup, and repetitions. Parallelism specifies the number of benchmark processes to run in parallel. This is primarily useful when measuring the performance of SMP or distributed computers and can be used to evaluate the system’s performance scalability. Warmup is the minimum number of microseconds the benchmark should execute the benchmarked capability before it begins measuring performance. Again, this is primarily useful for SMP or distributed systems, and it is intended to give the process scheduler time to "settle" and migrate processes to other processors. By measuring performance over various warmup periods, users may evaluate the scheduler’s responsiveness. Repetitions is the number of measurements that the benchmark should take. This allows lmbench to provide greater or lesser statistical strength to the results it reports. The default number of repetitions is 11.
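The warmup and repetitions options can be illustrated with a minimal sketch. This is not the harness described in timing(3); the function name measure and its parameters are hypothetical, parallelism is left out, and summarizing the samples with a median is this sketch's own choice.

```python
import statistics
import time


def measure(operation, warmup_us=0, repetitions=11):
    """Return one timing in microseconds per repetition (hypothetical helper)."""
    # Warmup: run the operation, unmeasured, for at least warmup_us microseconds.
    deadline = time.perf_counter() + warmup_us / 1e6
    while time.perf_counter() < deadline:
        operation()

    # Measurement: take the requested number of timed samples.
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1e6)
    return samples


if __name__ == "__main__":
    samples = measure(lambda: sum(range(1000)), warmup_us=1000)
    print(f"{len(samples)} samples, median {statistics.median(samples):.2f} us")
```

Running the same operation with several warmup_us values and comparing the samples is one way to see how long the system takes to settle.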

BANDWIDTH MEASUREMENTS

Data movement is fundamental to the performance of most computer systems. The bandwidth measurements are intended to show how fast the system can move data. The results of the bandwidth metrics can be compared, but care must be taken to understand what is being compared. The bandwidth benchmarks can be reduced to two main components: operating system overhead and memory speeds. The bandwidth benchmarks report their results as megabytes moved per second, but please note that the data moved is not necessarily the same as the memory bandwidth used to move the data. Consult the individual man pages for more information.
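The difference between data moved and memory bandwidth used can be made concrete with a rough sketch. The function copy_bandwidth is hypothetical, a megabyte is assumed to be 10^6 bytes, and the factor of two is a simplification: a copy reads each byte once and writes it once, while the traffic on real hardware also depends on caches and write policy.

```python
import time


def copy_bandwidth(nbytes, repeats=5):
    """Time a buffer copy; return (MB moved per second, MB of traffic per second)."""
    src = bytes(nbytes)
    dst = bytearray(nbytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src                      # copy nbytes from src into dst
        best = min(best, time.perf_counter() - start)
    moved_mb = nbytes / 1e6               # data moved, as a benchmark would report it
    traffic_mb = 2 * nbytes / 1e6         # assumed traffic: one read plus one write per byte
    return moved_mb / best, traffic_mb / best


if __name__ == "__main__":
    moved, traffic = copy_bandwidth(8 * 1024 * 1024)
    print(f"moved {moved:.0f} MB/s, assumed memory traffic {traffic:.0f} MB/s")
```

Under these assumptions a copy result and a read result are not directly comparable, because the copy touches twice as much memory for the same number of bytes moved.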

Each of the bandwidth benchmarks is listed below with a brief overview of the intent of the benchmark.

bw_file_rd

reading and summing of a file via the read(2) interface.

bw_mem_cp

memory copy.

bw_mem_rd

memory reading and summing.

bw_mem_wr

memory writing.

bw_mmap_rd

reading and summing of a file via the memory mapping mmap(2) interface.

bw_pipe

reading of data via a pipe.

bw_tcp

reading of data via a TCP/IP socket.

bw_unix

reading data from a UNIX socket.

LATENCY MEASUREMENTS

Control messages are also fundamental to the performance of most computer systems. The latency measurements are intended to show how fast a system can be told to do some operation. The results of the latency metrics can be compared to each other for the most part. In particular, the pipe, rpc, tcp, and udp transactions are all identical benchmarks carried out over different system abstractions.

Most latency results are reported in microseconds per operation.
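The transaction shared by several of the latency benchmarks is a token passed back and forth between two parties. The following is a minimal sketch of that pattern over a pair of pipes, not the lmbench implementation: the function pipe_round_trip_us is hypothetical, and the peer is a thread rather than a second process only to keep the example self-contained, so the numbers it prints do not measure inter-process latency.

```python
import os
import threading
import time


def pipe_round_trip_us(round_trips=1000):
    """Average microseconds per round trip of a one-byte token through two pipes."""
    to_peer_r, to_peer_w = os.pipe()
    from_peer_r, from_peer_w = os.pipe()

    def peer():
        # Echo each token straight back until the sender closes its write end.
        while True:
            token = os.read(to_peer_r, 1)
            if not token:
                break
            os.write(from_peer_w, token)

    worker = threading.Thread(target=peer)
    worker.start()

    start = time.perf_counter()
    for _ in range(round_trips):
        os.write(to_peer_w, b"x")        # pass the token to the peer
        os.read(from_peer_r, 1)          # wait for it to come back
    elapsed = time.perf_counter() - start

    os.close(to_peer_w)                  # end-of-file tells the peer to stop
    worker.join()
    for fd in (to_peer_r, from_peer_r, from_peer_w):
        os.close(fd)
    return elapsed / round_trips * 1e6


if __name__ == "__main__":
    print(f"{pipe_round_trip_us():.2f} us per round trip")
```

Replacing the two pipes with a connected socket pair while keeping the loop unchanged gives the same transaction over a different abstraction, which is what makes such results comparable.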

lat_connect

the time it takes to establish a TCP/IP connection.

lat_ctx

context switching; the number and size of processes is varied.

lat_fcntl

fcntl file locking.

lat_fifo

"hot potato" transaction through a UNIX FIFO.

lat_fs

creating and deleting small files.

lat_pagefault

the time it takes to fault in a page from a file.

lat_mem_rd

memory read latency (accurate to the ~2-5 nanosecond range, reported in nanoseconds).

lat_mmap

time to set up a memory mapping.

lat_ops

basic processor operations, such as integer XOR, ADD, SUB, MUL, DIV, and MOD, and float ADD, MUL, DIV, and double ADD, MUL, DIV.

lat_pipe

"hot potato" transaction through a UNIX pipe.

lat_proc

process creation times (various sorts).

lat_rpc

"hot potato" transaction through Sun RPC over UDP or TCP.

lat_select

select(2) latency.

lat_sig

signal installation and catch latencies, as well as protection fault signal latency.

lat_syscall

nontrivial entry into the operating system.

lat_tcp

"hot potato" transaction through TCP.

lat_udp

"hot potato" transaction through UDP.

lat_unix

"hot potato" transaction through UNIX sockets.

lat_unix_connect

the time it takes to establish a UNIX socket connection.

OTHER MEASUREMENTS

mhz

processor cycle time.

tlb

TLB size and TLB miss latency.

line

cache line size (in bytes).

cache

cache statistics, such as line size, cache sizes, and memory parallelism.

stream

John McCalpin’s STREAM benchmark.

par_mem

memory subsystem parallelism: how many requests the memory subsystem can service in parallel, which may depend on the location of the data in the memory hierarchy.

par_ops

basic processor operation parallelism.

SEE ALSO

bargraph(1), graph(1), lmbench(3), results(3), timing(3), bw_file_rd(8), bw_mem_cp(8), bw_mem_wr(8), bw_mmap_rd(8), bw_pipe(8), bw_tcp(8), bw_unix(8), lat_connect(8), lat_ctx(8), lat_fcntl(8), lat_fifo(8), lat_fs(8), lat_http(8), lat_mem_rd(8), lat_mmap(8), lat_ops(8), lat_pagefault(8), lat_pipe(8), lat_proc(8), lat_rpc(8), lat_select(8), lat_sig(8), lat_syscall(8), lat_tcp(8), lat_udp(8), lmdd(8), par_ops(8), par_mem(8), mhz(8), tlb(8), line(8), cache(8), stream(8)

ACKNOWLEDGEMENT

Funding for the development of these tools was provided by Sun Microsystems Computer Corporation.

A large number of people have contributed to the testing and development of lmbench.

COPYING

The benchmarking code is distributed under the GPL with additional restrictions; see the COPYING file.

AUTHOR

Carl Staelin and Larry McVoy

Comments, suggestions, and bug reports are always welcome.