MDEV-24630: MY_RELAX_CPU assembly instruction upgrade/research for

memory barrier on ARM

As suggested in the said JIRA ticket based on the contribution done by
the community (in an attempt to optimize the spin-loop) the said approach
was evaluated against MariaDB Server 10.5 and found to help improve
throughput in the range of 2-5%.

Note: 10.6 timing graph and model are different as home-brew
mutexes are replaced with pthread mutexes. Said patch has mixed
impact on 10.6 so not recommended for 10.6.
This commit is contained in:
Krunal Bauskar 2021-03-30 15:57:14 +08:00 committed by Marko Mäkelä
parent 8c2e3259c1
commit 76d2846a71

View file

@ -84,7 +84,7 @@ static inline void MY_RELAX_CPU(void)
__ppc_get_timebase();
#elif defined __GNUC__ && (defined __arm__ || defined __aarch64__)
/* Mainly, prevent the compiler from optimizing away delay loops */
__asm__ __volatile__ ("":::"memory");
__asm__ __volatile__ ("isb":::"memory");
#else
int32 var, oldval = 0;
my_atomic_cas32_strong_explicit(&var, &oldval, 1, MY_MEMORY_ORDER_RELAXED,