-------------------------------------------------------------
revno: 2877
committer: Davi Arnaut <Davi.Arnaut@Sun.COM>
branch nick: 35164-6.0
timestamp: Wed 2008-10-15 19:53:18 -0300
message:
Bug#35164: Large number of invalid pthread_attr_setschedparam calls
Bug#37536: Thread scheduling causes performance degradation at low thread count
Bug#12702: Long queries take 100% of CPU and freeze other applications under Windows
The problem is that although having threads with different priorities
yields marginal improvements [1] in some platforms [2], relying on some
statically defined priorities (QUERY_PRIOR and WAIT_PRIOR) to play well
(or to work at all) with different scheduling practices and disciplines
is, at best, a shot in the dark as the meaning of priority values may
change depending on the scheduling policy set for the process.
Another problem is that increasing priorities can hurt other concurrent
(running on the same hardware) applications (such as AMP) by causing
starvation problems as MySQL threads will successively preempt lower
priority processes. This can be evidenced by Bug#12702.
The solution is to not change the threads priorities and rely on the
system scheduler to perform its job. This also enables a system admin
to increase or decrease the scheduling priority of the MySQL process,
if intended.
Furthermore, the internal wrappers and code for changing the priority
of threads is being removed as they are now unused and ancient.
1. Due to unintentional side effects. On Solaris this could artificially
help benchmarks as calling the priority changing syscall millions of
times is more beneficial than the actual setting of the priority.
2. Where it actually works. It has never worked on Linux as the default
scheduling policy SCHED_OTHER only accepts the static priority 0.
http://lists.mysql.com/commits/59686
Cleanup pthread_self(), pthread_create(), pthread_join() implementation on Windows.
Prior implementation is was unnecessarily complicated and even differs in embedded
and non-embedded case.
Improvements in this patch:
* pthread_t is now the unique thread ID, instead of HANDLE returned by beginthread
This simplifies pthread_self() to be just straight GetCurrentThreadId().
prior it was much art involved in passing the beginthread() handle from the caller
to the TLS structure in the child thread ( did not work for the main thread of
course)
* remove MySQL specific my_thread_init()/my_thread_end() from pthread_create.
No automagic is done on Unix on pthread_create(). Having the same on Windows will
improve portability and avoid extra #ifdef's
* remove redefinition of getpid() - it was defined as GetCurrentThreadId()
htttp://lists.mysql.com/commits/50957?f=plain
Always use TLS functions instead of __declspec(thread) to access
thread local storage variables.
The change removes the necessity to recomplile the same source
files twice - with USE_TLS for DLLs and without USE_TLS for EXEs.
Real benefit of this change is better readability and maintainability
of TLS functions within MySQL.
There is a performance loss using TlsXXX functions compared to __declspec
but the difference is negligible in practice. In a sysbench-like benchmark
I ran with with TlsGetValue, pthread_[get|set]_specific was called 600000000
times and took 0.17sec of total 35min CPU time, or 0.008%.
Improved data typing of server variables in InnoDB to avoid bugs on
Windows.
Workaround for bug in gcc assembler on 32-bit Mac OS X as well as
on Solaris.
The problem is that the function used by the server to increase
the thread's priority (pthread_setschedparam) has the unintended
side-effect of changing the calling thread scheduling policy,
possibly overwriting a scheduling policy set by a sysadmin.
The solution is to rely on the pthread_setschedprio function, if
available, as it only changes the scheduling priority and does not
change the scheduling policy. This function is usually available on
Solaris and Linux, but it use won't work by default on Linux as the
the default scheduling policy only accepts a static priority 0 -- this
is acceptable for now as priority changing on Linux is broken anyway.
The problem is that MySQL's 'fast' mutex implementation uses the
random() routine to determine the spin delay. Unfortunately, the
routine interface is not thead-safe and some implementations (eg:
glibc) might use a internal lock to protect the RNG state, causing
excessive locking contention if lots of threads are spinning on
a MySQL's 'fast' mutex. The code was also misusing the value
of the RAND_MAX macro, this macro represents the largest value
that can be returned from the rand() function, not random().
The solution is to use the quite simple Park-Miller random number
generator. The initial seed is set to 1 because the previously used
generator wasn't being seeded -- the initial seed is 1 if srandom()
is not called.
Futhermore, the 'fast' mutex implementation has several shortcomings
and provides no measurable performance benefit. Therefore, its use is
not recommended unless it provides directly measurable results.
The problem is that unimplemented WIN32 version of pthread_kill
is returning ESRCH no matter the arguments, causing calls to
mysqld_list_processes to set the procinfo to dead because
pthread_kill returns non zero. The dead procinfo would show
up on a second invocation of show processlist.
When locking a "fast" mutex a static variable cpu_count
was used as a flag to initialize itself on the first usage
by calling sysconf() and setting non-zero value.
This is not thread and optimization safe on some
platforms. That's why the global initialization needs
to be done once in a designated function.
This will also speed up the usage (by a small bit)
because it won't have to check if it's initialized on
every call.
Fixed by moving the fast mutexes initialization out of
my_pthread_fastmutex_lock() to fastmutex_global_init()
and call it from my_init()
MySQL uses _beginthread()/_endthread() instead of
_beginthreadex()/_endthreadex() to create/end its threads
on Windows.
According to MSDN _endthread() does close the thread handle.
So there's no need the handle to be closed explicitly.
Besides : WaitForSingleObject(, INFINITE) != WAIT_OBJECT_0) is
true for all practical cases as the other two possible return
codes (according to MSDN) cannot happen in that case the
CloseHandle() was actually a dead code.
Fixed by removing the CloseHandle() call. No test case added
because it's not possible to test for absence of dead code.
The problem reported is a compile bug,
reported by the development GCC team with GCC 4.2.
The original issue can no longer be reproduced in MySQL 5.1,
since the configure script no longer define HAVE_ATOMIC_ADD,
which caused the Linux atomic functions to be used (and cause a problem
with an invalid cast).
This patch implements some code cleanup for 5.1 only, which was identified
during the investigation of this issue.
With this patch, statistics maintained in THD::status_var are by definition
owned by the running thread, and do not need to be protected against race
conditions. These statistics are maintained by the status_var_* helpers,
which do not require any lock.
The following type conversions was done:
- Changed byte to uchar
- Changed gptr to uchar*
- Change my_string to char *
- Change my_size_t to size_t
- Change size_s to size_t
Removed declaration of byte, gptr, my_string, my_size_t and size_s.
Following function parameter changes was done:
- All string functions in mysys/strings was changed to use size_t
instead of uint for string lengths.
- All read()/write() functions changed to use size_t (including vio).
- All protocoll functions changed to use size_t instead of uint
- Functions that used a pointer to a string length was changed to use size_t*
- Changed malloc(), free() and related functions from using gptr to use void *
as this requires fewer casts in the code and is more in line with how the
standard functions work.
- Added extra length argument to dirname_part() to return the length of the
created string.
- Changed (at least) following functions to take uchar* as argument:
- db_dump()
- my_net_write()
- net_write_command()
- net_store_data()
- DBUG_DUMP()
- decimal2bin() & bin2decimal()
- Changed my_compress() and my_uncompress() to use size_t. Changed one
argument to my_uncompress() from a pointer to a value as we only return
one value (makes function easier to use).
- Changed type of 'pack_data' argument to packfrm() to avoid casts.
- Changed in readfrm() and writefrom(), ha_discover and handler::discover()
the type for argument 'frmdata' to uchar** to avoid casts.
- Changed most Field functions to use uchar* instead of char* (reduced a lot of
casts).
- Changed field->val_xxx(xxx, new_ptr) to take const pointers.
Other changes:
- Removed a lot of not needed casts
- Added a few new cast required by other changes
- Added some cast to my_multi_malloc() arguments for safety (as string lengths
needs to be uint, not size_t).
- Fixed all calls to hash-get-key functions to use size_t*. (Needed to be done
explicitely as this conflict was often hided by casting the function to
hash_get_key).
- Changed some buffers to memory regions to uchar* to avoid casts.
- Changed some string lengths from uint to size_t.
- Changed field->ptr to be uchar* instead of char*. This allowed us to
get rid of a lot of casts.
- Some changes from true -> TRUE, false -> FALSE, unsigned char -> uchar
- Include zlib.h in some files as we needed declaration of crc32()
- Changed MY_FILE_ERROR to be (size_t) -1.
- Changed many variables to hold the result of my_read() / my_write() to be
size_t. This was needed to properly detect errors (which are
returned as (size_t) -1).
- Removed some very old VMS code
- Changed packfrm()/unpackfrm() to not be depending on uint size
(portability fix)
- Removed windows specific code to restore cursor position as this
causes slowdown on windows and we should not mix read() and pread()
calls anyway as this is not thread safe. Updated function comment to
reflect this. Changed function that depended on original behavior of
my_pwrite() to itself restore the cursor position (one such case).
- Added some missing checking of return value of malloc().
- Changed definition of MOD_PAD_CHAR_TO_FULL_LENGTH to avoid 'long' overflow.
- Changed type of table_def::m_size from my_size_t to ulong to reflect that
m_size is the number of elements in the array, not a string/memory
length.
- Moved THD::max_row_length() to table.cc (as it's not depending on THD).
Inlined max_row_length_blob() into this function.
- More function comments
- Fixed some compiler warnings when compiled without partitions.
- Removed setting of LEX_STRING() arguments in declaration (portability fix).
- Some trivial indentation/variable name changes.
- Some trivial code simplifications:
- Replaced some calls to alloc_root + memcpy to use
strmake_root()/strdup_root().
- Changed some calls from memdup() to strmake() (Safety fix)
- Simpler loops in client-simple.c
Fixed compile-pentium64 scripts
Fixed wrong estimate of update_with_key_prefix in sql-bench
Merge bk-internal.mysql.com:/home/bk/mysql-5.1 into mysql.com:/home/my/mysql-5.1
Fixed unsafe define of uint4korr()
Fixed that --extern works with mysql-test-run.pl
Small trivial cleanups
This also fixes a bug in counting number of rows that are updated when we have many simultanous queries
Move all connection handling and command exectuion main loop from sql_parse.cc to sql_connection.cc
Split handle_one_connection() into reusable sub functions.
Split create_new_thread() into reusable sub functions.
Added thread_scheduler; Preliminary interface code for future thread_handling code.
Use 'my_thread_id' for internal thread id's
Make thr_alarm_kill() to depend on thread_id instead of thread
Make thr_abort_locks_for_thread() depend on thread_id instead of thread
In store_globals(), set my_thread_var->id to be thd->thread_id.
Use my_thread_var->id as basis for my_thread_name()
The above changes makes the connection we have between THD and threads more soft.
Added a lot of DBUG_PRINT() and DBUG_ASSERT() functions
Fixed compiler warnings
Fixed core dumps when running with --debug
Removed setting of signal masks (was never used)
Made event code call pthread_exit() (portability fix)
Fixed that event code doesn't call DBUG_xxx functions before my_thread_init() is called.
Made handling of thread_id and thd->variables.pseudo_thread_id uniform.
Removed one common 'not freed memory' warning from mysqltest
Fixed a couple of usage of not initialized warnings (unlikely cases)
Suppress compiler warnings from bdb and (for the moment) warnings from ndb
- The condition variable implementation "lost" a signal to
WaitOnSingleObject when a semaphore was released.
- The signal could be consumed by a new call to pthread_cond_wait
before all waiting threads had awoken.
- The new implementation of pthread_cond_* uses events
instead of semaphores. It also uses an extra lock to protect entry
into new cond wait before the broadcast has finished.