This article is an attempt to sum up a small number of generic rules that serve as useful rules of thumb when writing high-performance programs. It first establishes two fundamental causes of performance problems, then derives practical rules from them.
Two fundamental causes of performance problems
Memory Latency: A big performance problem on modern computers is the latency of SDRAM: when a read misses the caches, the CPU waits idly for the data to come back from main memory.
Context Switching: When a CPU switches context, the memory the new context accesses is most likely unrelated to the memory the previous context was accessing. This often evicts much of the previous context's cache and forces the switched-to context to reload its data from RAM, which is slow.
Rules to help balance the forces of evil
Batch work: To avoid the cost of context switches, it makes sense to trigger them as rarely as possible. You may not have much control over the operating system's scheduling, but you can control how often you enter the kernel: batch your work. For example, there are vectored versions of system calls, such as writev() and readv(), that operate on more than one buffer per call. The implication is that you should merge as many writes as possible into a single call.
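As a minimal sketch of this on a POSIX system, Python exposes the same vectored write as os.writev(); the record contents below are purely illustrative, and a pipe stands in for a socket or log file descriptor:

```python
import os

# Three small records we want to flush; in a real server these might be
# queued log lines or protocol frames (the contents here are illustrative).
records = [b"alpha\n", b"beta\n", b"gamma\n"]

r, w = os.pipe()  # stand-in for a socket or log file descriptor

# One writev() pushes all buffers in a single system call, instead of
# paying the user/kernel transition once per record.
written = os.writev(w, records)
assert written == sum(len(x) for x in records)

data = os.read(r, 1024)
assert data == b"".join(records)
os.close(r)
os.close(w)
```

The same idea applies to any per-call cost, not just syscalls: amortize the fixed overhead over as many items as you can carry in one trip.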
Avoid Magic Numbers: They don't scale. Waking a thread every 100 ms, or whenever 100 jobs are queued, or using fixed-size buffers, doesn't adapt to changing circumstances.
Allocate memory buffers up front: This avoids extra copying and keeps memory usage predictable.
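One common shape of this rule is a single buffer allocated at startup and refilled in place on every read, rather than a fresh allocation per call. A hedged sketch, using io.BytesIO as a stand-in for a socket or file (the 4 KiB capacity is an assumed startup choice, not a recommendation):

```python
import io

BUF_SIZE = 4096              # capacity chosen once, up front (assumed value)
buf = bytearray(BUF_SIZE)    # the only allocation in the whole loop
view = memoryview(buf)

src = io.BytesIO(b"x" * 10000)  # stands in for a socket or file
total = 0
while True:
    n = src.readinto(view)   # fills the existing buffer; no new bytes object
    if n == 0:
        break
    total += n               # process buf[:n] here
assert total == 10000
```

With a socket you would use recv_into() the same way; the point is that memory usage is fixed and known before the hot path ever runs.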
Organically adapt your job-batching sizes to the granularity of the scheduler and the time it takes your thread to wake up.
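A toy sketch of what "organic" batch sizing means: the worker simply processes whatever accumulated while it was busy or asleep, so the batch size tracks the arrival rate and the wakeup latency instead of a hard-coded constant. The names (pending, worker_pass) are illustrative:

```python
from collections import deque

pending = deque()

def worker_pass():
    """Drain everything currently queued as one batch; return the batch size."""
    batch = len(pending)         # batch size = whatever piled up, no constant
    for _ in range(batch):
        pending.popleft()        # process one job
    return batch

# Simulate bursts of different sizes arriving between wakeups.
sizes = []
for burst in (3, 50, 7):
    pending.extend(range(burst))
    sizes.append(worker_pass())
assert sizes == [3, 50, 7]       # batches adapt to the load, not a magic number
```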
Adapt receive buffer sizes for sockets, while at the same time avoiding copying memory out of the kernel.
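Kernel-side receive buffers are one of the knobs here: SO_RCVBUF can be resized per socket rather than fixed at a magic default. A minimal sketch (the 64 KiB request is an assumed starting point; note that Linux roughly doubles the requested value for bookkeeping, so we only check a lower bound):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
requested = 1 << 16  # 64 KiB; an assumed starting point, not a recommendation
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)

# Read back the effective size the kernel actually granted.
effective = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
assert effective >= requested
s.close()
```

An adaptive design would resize this based on observed throughput; the per-platform clamps (e.g. rmem_max on Linux) bound what the kernel will grant.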
Always complete all work queued up for a thread before letting it go back to sleep.
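The worker's loop structure follows directly from this rule: block once for the first job, then drain everything that piled up before sleeping again. A hedged sketch (jobs and the None shutdown sentinel are illustrative choices):

```python
import queue
import threading

jobs = queue.Queue()
done = []

def worker():
    while True:
        item = jobs.get()              # sleep until at least one job arrives
        while item is not None:        # then drain everything that piled up
            done.append(item)          # "process" the job
            try:
                item = jobs.get_nowait()
            except queue.Empty:
                break                  # queue fully drained; sleep again
        if item is None:
            break                      # None is our shutdown sentinel

for i in range(5):
    jobs.put(i)
jobs.put(None)

t = threading.Thread(target=worker)
t.start()
t.join()
assert done == [0, 1, 2, 3, 4]
```

Draining before sleeping means one wakeup can pay for many jobs, which is exactly the batching this article argues for.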
Only signal a worker thread to wake up when the number of jobs on its queue goes from 0 to more than 0. Any other signal is redundant and a waste of time.