Commit graph

4 commits

Author SHA1 Message Date
Marko Mäkelä
709f0510e3 MDEV-19845: Adjust for Skylake based on benchmarks
Even though the PAUSE instruction latency was increased from
about 10 to 140 clock cycles in the Intel Skylake microarchitecture,
it seems to be optimal to reduce the amount of subsequently executed
PAUSE instructions not to 1/14, but to 1/2.
2019-07-02 17:44:05 +03:00
Marko Mäkelä
f5c080c735 MDEV-19845: Fix the build on some platforms
On some platforms, MY_RELAX_CPU() falls back to an atomic
memory operation, but my_cpu.h fails to include my_atomic.h.
2019-06-27 15:04:00 +03:00
Marko Mäkelä
0b7fa5a05d MDEV-19845: Fix the build on some x86 targets
The RDTSC instruction, which was introduced in the Intel Pentium,
has been used in MariaDB for a long time. But, the __rdtsc()
wrapper is not available by default in some x86 build environments.
The simplest solution seems to replace the inlined instruction
with a call to the wrapper function my_timer_cycles(). The overhead
for the call should not affect the measurement threshold.

On Windows and on AMD64, we will keep using __rdtsc() directly.
2019-06-27 12:19:51 +03:00
Marko Mäkelä
042fc29597 MDEV-19845: Adaptive spin loops
Starting with the Intel Skylake microarchitecture, the PAUSE
instruction latency is about 140 clock cycles instead of earlier 10.
On AMD processors, the latency could be 10 or 50 clock cycles,
depending on microarchitecture.

Because of this big range of latency, let us scale the loops around
the PAUSE instruction based on timing results at server startup.

my_cpu_relax_multiplier: New variable: How many times to invoke PAUSE
in a loop. Only defined for IA-32 and AMD64.

my_cpu_init(): Determine with RDTSC the time to run 16 PAUSE instructions
in two unrolled loops according, and based on the quicker of the two
runs, initialize my_cpu_relax_multiplier. This form of calibration was
suggested by Mikhail Sinyavin from Intel.

LF_BACKOFF(), ut_delay(): Use my_cpu_relax_multiplier when available.

ut_delay(): Define inline in my_cpu.h.

UT_COMPILER_BARRIER(): Remove. This does not seem to have any effect,
because in our ut_delay() implementation, no computations are being
performed inside the loop. The purpose of UT_COMPILER_BARRIER() was to
prohibit the compiler from reordering computations. It was not
emitting any code.
2019-06-27 10:53:18 +03:00