Also added support for MAP_SYNC. It allows decent performance to be achieved
with DAX devices even when libpmem is unavailable.
Fixed the Windows version of my_msync(): according to the manual, FlushViewOfFile()
may return before the flush is actually completed. It is advised to issue
FlushFileBuffers() after FlushViewOfFile().
Fix:
===
Add "REPLICA" as an alias for "SLAVE". All commands which use the "SLAVE" keyword
can be used with the new alias "REPLICA".
List of commands:
On Master:
=========
SHOW REPLICA HOSTS <--> SHOW SLAVE HOSTS
Privilege "SLAVE" <--> "REPLICA"
On Slave:
=========
START SLAVE <--> START REPLICA
START ALL SLAVES <--> START ALL REPLICAS
START SLAVE UNTIL <--> START REPLICA UNTIL
STOP SLAVE <--> STOP REPLICA
STOP ALL SLAVES <--> STOP ALL REPLICAS
RESET SLAVE <--> RESET REPLICA
RESET SLAVE ALL <--> RESET REPLICA ALL
SLAVE_POS <--> REPLICA_POS
The variable `wsrep_new_cluster` should be set to false after `wsrep_init_startup`.
The problem was that, when mysqldump is used as the SST method, it was set to false
before that point, so the option wsrep-new-cluster did not have any effect.
Support for Galera GTID consistency throughout the cluster. All nodes in the cluster
should have the same GTID for replicated events that originate from the cluster.
Commands originating from the cluster need to contain a sequential WSREP GTID seqno.
Manual setting of gtid_seq_no=X is ignored.
In a master-slave scenario where the master is a non-Galera node, the replicated GTID
is preserved on all nodes.
To achieve this, domain_id, server_id and seqnos should be the same on all nodes.
The node that bootstraps the cluster sends its domain_id and
server_id to the other nodes, and this combination is used to write the GTID for events
that are replicated inside the cluster.
Cluster nodes that execute non-replicated events will give them a different
GTID than replicated ones; the difference will be visible in the domain part of the GTID.
With wsrep_gtid_domain_id you can set the domain_id for the WSREP cluster.
Functions WSREP_LAST_WRITTEN_GTID, WSREP_LAST_SEEN_GTID and
WSREP_SYNC_WAIT_UPTO_GTID now work with the "native" GTID format.
Fixed galera tests to reflect these changes.
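To illustrate the "domain part" mentioned above: a GTID is written as
domain_id-server_id-seq_no. The following is only a sketch; the struct and the
helpers format_gtid() and same_domain() are hypothetical names, not server code.

```cpp
#include <cstdint>
#include <string>

// Hypothetical value type mirroring the domain_id-server_id-seq_no layout.
struct Gtid
{
  std::uint32_t domain_id;
  std::uint32_t server_id;
  std::uint64_t seq_no;
};

// Render a GTID in the textual domain-server-seqno form.
inline std::string format_gtid(const Gtid &g)
{
  return std::to_string(g.domain_id) + "-" + std::to_string(g.server_id) +
         "-" + std::to_string(g.seq_no);
}

// Events replicated inside the cluster would share one domain, while
// node-local (non-replicated) events would carry a different domain.
inline bool same_domain(const Gtid &a, const Gtid &b)
{
  return a.domain_id == b.domain_id;
}
```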
Add variable to manually update WSREP GTID seqno in cluster
Add a variable to manipulate and change the WSREP GTID seqno. The next command
originating from the cluster on the same thread will have the seqno that was set, and
the cluster should change its internal counter to that value.
The behavior is the same as using @@gtid_seq_no for a non-WSREP transaction.
Starting with commit 373443903b
we would invoke memcmp() unconditionally, even if the length is zero.
But, a call to memcmp() is undefined if any parameter is a null pointer,
even if the length is zero.
In the following tests, a null pointer is being passed to the comparison:
vcol.vcol_keys_innodb gcol.gcol_keys_innodb main.func_group_innodb
innodb.innodb_bug53592
cmp_data(): Keep WITH_UBSAN happy and avoid potential future bugs
in optimized builds, like the one addressed by
commit fc168c3a5e (MDEV-15587).
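A minimal sketch of the kind of guard this implies, assuming the comparison can
simply be skipped for empty values. compare_bytes() is a hypothetical name, not
the actual cmp_data() code.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: compare two byte ranges of equal length.
// memcmp() is only called when len is nonzero, so null pointers are
// never passed to it for empty values.
inline int compare_bytes(const void *a, const void *b, std::size_t len)
{
  return len ? std::memcmp(a, b, len) : 0;
}
```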
InnoDB crash recovery used a special type of mem_heap_t that
allocates backing store from the buffer pool. That incurred
a significant overhead, leading to underutilization of memory,
and limiting the maximum contiguous allocated size of a log record.
recv_sys_t::blocks: A linked list of buf_block_t that are allocated
by buf_block_alloc() for redo log records. Replaces recv_sys_t::heap.
We repurpose buf_block_t::unzip_LRU for linking the elements.
recv_sys_t::max_log_blocks: Renamed from recv_n_pool_free_frames.
recv_sys_t::max_blocks(): Accessor for max_log_blocks.
recv_sys_t::alloc(): Allocate memory from the current recv_sys_t::blocks
element, or allocate another block. In debug builds, various free()
member functions must be invoked, because we repurpose
buf_page_t::buf_fix_count for tracking allocations.
recv_sys_t::free_corrupted_page(): Renamed from recv_recover_corrupt_page()
recv_sys_t::is_memory_exhausted(): Renamed from recv_sys_heap_check()
recv_sys_t::pages and its elements are allocated directly by the
system memory allocator.
recv_parse_log_recs(): Remove the parameter available_memory.
We rename some variables 'store_to_hash' to 'store', because
recv_sys.pages is not actually a hash table.
This is joint work with Thirunarayanan Balathandayuthapani.
[Variant 2 of the fix: collect the attached conditions]
Problem:
make_join_select() has a section of code which starts with
"We plan to scan all rows. Check again if we should use an index."
The code in that section will [unnecessarily] re-run the range
optimizer using this condition:
condition_attached_to_current_table AND current_table's_ON_expr
Note that the original invocation of range optimizer in
make_join_statistics was done using the whole select's WHERE condition.
Taking the whole select's WHERE condition and using multiple-equalities
allowed the range optimizer to infer more range restrictions.
The fix:
- Do range optimization using a condition that is an AND of this table's
condition and all of the previous tables' conditions.
- Also, fix the range optimizer to prefer SEL_ARGs with type=KEY_RANGE
over SEL_ARGs with type=MAYBE_KEY, regardless of the key part.
Computing
key_and(
SEL_ARG(type=MAYBE_KEY key_part=1),
SEL_ARG(type=KEY_RANGE, key_part=2)
)
will now produce the SEL_ARG with type=KEY_RANGE.
class log_file_t: more or less sane RAII wrapper around redo log file
descriptor and its path.
This change is motivated by the need of using that log_file_t somewhere else.
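For readers unfamiliar with the pattern, a rough sketch of an RAII wrapper that
owns a file descriptor together with its path. The class name file_guard and
its members are hypothetical and POSIX-only; the real log_file_t interface may
differ.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <string>
#include <utility>

// Hypothetical RAII wrapper: owns a descriptor and remembers its path.
class file_guard
{
public:
  explicit file_guard(std::string path) : m_path(std::move(path)) {}
  file_guard(const file_guard &) = delete;
  file_guard &operator=(const file_guard &) = delete;
  file_guard(file_guard &&o) noexcept
    : m_fd(std::exchange(o.m_fd, -1)), m_path(std::move(o.m_path)) {}
  ~file_guard() { close(); }       // the descriptor cannot leak

  bool open(int flags, mode_t mode = 0644)
  {
    close();                       // release any previously opened file
    m_fd = ::open(m_path.c_str(), flags, mode);
    return m_fd >= 0;
  }
  void close()
  {
    if (m_fd >= 0)
    {
      ::close(m_fd);
      m_fd = -1;
    }
  }
  bool is_opened() const { return m_fd >= 0; }
  const std::string &path() const { return m_path; }

private:
  int m_fd = -1;
  std::string m_path;
};
```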
Problem:
=======
P1) Conditional jump or move depends on uninitialised value(s)
sql_ex_info::init(char const*, char const*, bool) (log_event.cc:3083)
code: All the following variables are not initialized.
----
return ((cached_new_format != -1) ? cached_new_format :
(cached_new_format=(field_term_len > 1 || enclosed_len > 1 ||
line_term_len > 1 || line_start_len > 1 || escaped_len > 1)));
P2) Conditional jump or move depends on uninitialised value(s)
Rows_log_event::Rows_log_event(char const*, unsigned
int, Format_description_log_event const*) (log_event.cc:9571)
Code: An uninitialized value is reported for the 'var_header_len' variable.
----
if (var_header_len < 2 || event_len < static_cast<unsigned
int>(var_header_len + (post_start - buf)))
P3) Conditional jump or move depends on uninitialised value(s)
Table_map_log_event::pack_info(Protocol*) (log_event.cc:11553)
code:'m_table_id' is uninitialized.
----
void Table_map_log_event::pack_info(Protocol *protocol)
...
size_t bytes= my_snprintf(buf, sizeof(buf), "table_id: %lu (%s.%s)",
m_table_id, m_dbnam, m_tblnam);
Fix:
===
P1 - Fix)
Initialize the cached_new_format, field_term_len, enclosed_len, line_term_len,
line_start_len and escaped_len members in the default constructor.
P2 - Fix)
"var_header_len" is initialized by reading the event buffer. In case of an
invalid event the buffer will contain invalid data. Hence added a check to
validate the event data. If event_len is smaller than valid header length
return immediately.
P3 - Fix)
'm_table_id' within Table_map_log_event is initialized by reading data from
the event buffer. Use 'VALIDATE_BYTES_READ' macro to validate the current
state of the buffer. If it is invalid return immediately.
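The idea behind the P2 and P3 fixes can be sketched as a bounds check that runs
before a field is read from the event buffer. read_u16_checked() is a
hypothetical helper written for illustration; it is not the VALIDATE_BYTES_READ
macro and does not reflect the real event layout.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: read a 2-byte little-endian field at 'offset',
// but only if the field lies entirely inside the event buffer.
// On failure *out is left untouched, so the caller can return immediately.
inline bool read_u16_checked(const unsigned char *buf, std::size_t event_len,
                             std::size_t offset, std::uint16_t *out)
{
  if (offset > event_len || event_len - offset < 2)
    return false;                 // truncated or invalid event
  *out = static_cast<std::uint16_t>(buf[offset] | (buf[offset + 1] << 8));
  return true;
}
```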
os_file_flush_data_func(): fix builds on POSIX OSs where fdatasync()
is not available
log_t::files::flush_data_only(): rename from fdatasync()
log_t::files::fsync(): removed and replaced with flush_data_only().
It will flush everything we need for using redo log files.
This is the only symlink in the repository. Symlinks can cause
trouble when using file systems or operating systems that do not
support them.
Also remove the unused file DartConfig.cmake that refers to the script.
cmake -DWITH_INNODB_EXTRA_DEBUG:BOOL=ON
was broken ever since commit 8777458a6e
(MDEV-6076 Persistent AUTO_INCREMENT for InnoDB).
There is a race condition between page reads that call
page_zip_validate() (while holding clustered index root page S-latch)
and writes that update PAGE_ROOT_AUTO_INC
(with buf_block_t::lock SX-latch, compatible with S-latch).
page_zip_validate_low(): Skip the PAGE_ROOT_AUTO_INC field on
clustered index root pages in order to avoid false positives.
dict_table_t::parse_name(): Properly calculate the *tbl_name_len.
A failure was easily repeatable during the test
innodb.innodb-alter-debug for the table name test.① ("test/@2460").
The UTF-8 representation of U+2460 is only 3 bytes "\xe2\x91\xa0"
while the filename-safe encoded counterpart of it in dict_table_t::name
is 5 bytes "@2460".
This bug, introduced by commit ea37b14409
(MDEV-16678), could cause a purge task to hang.
Post-push fix. aria_pack_mdev14183 test is unstable.
The fix is the following:
1. Disable the test for embedded server.
2. Create non-"transactional" Aria table in the test, as aria_pack does not
support "transactional" Aria tables.
While waiting for the mutex in thread_pool_generic::wait_begin(), the
current task can be marked long-running. This is done by the periodic
maintenance task, which runs in parallel.
The fix is to recheck is_long_task() after the mutex acquisition.
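The recheck can be sketched as follows. The types task_state and pool_counters
and the function begin_wait() are hypothetical stand-ins, not the actual
thread_pool_generic code.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-ins for the task and the pool bookkeeping.
struct task_state
{
  std::atomic<bool> long_running{false};  // set by a maintenance task
};

struct pool_counters
{
  std::mutex mtx;
  unsigned waiting = 0;
};

// Returns true if the task was counted as waiting.
inline bool begin_wait(pool_counters &pool, task_state &task)
{
  if (task.long_running.load())
    return false;                         // cheap check before locking
  std::lock_guard<std::mutex> lk(pool.mtx);
  // The flag may have been set while this thread was blocked on the
  // mutex, so it is checked again before the counter is touched.
  if (task.long_running.load())
    return false;
  ++pool.waiting;
  return true;
}
```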
I found that memcpy_aligned was used incorrectly in the redo log code and decided to put
assertions in the aligned functions. That revealed even more incorrect cases.
Given the number of bugs discovered, I left the assertions in to prevent future bugs.
my_assume_aligned(): instead of MY_ASSUME_ALIGNED macro
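A sketch of what an asserting alignment helper could look like. The names
assume_aligned_checked() and memcpy_aligned_checked() are hypothetical, and the
builtin used is specific to GCC and Clang; the actual my_assume_aligned() may
be written differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: promise the compiler that ptr is aligned, and
// verify that promise in debug builds.
template <std::size_t Alignment, class T>
inline T *assume_aligned_checked(T *ptr)
{
  static_assert(Alignment && !(Alignment & (Alignment - 1)),
                "Alignment must be a power of 2");
  assert(reinterpret_cast<std::uintptr_t>(ptr) % Alignment == 0);
#if defined(__GNUC__) || defined(__clang__)
  return static_cast<T *>(__builtin_assume_aligned(ptr, Alignment));
#else
  return ptr;  // no hint available, the assertion above still applies
#endif
}

// Hypothetical aligned copy built on top of the checked helper.
template <std::size_t Alignment>
inline void *memcpy_aligned_checked(void *dest, const void *src,
                                    std::size_t n)
{
  return std::memcpy(assume_aligned_checked<Alignment>(dest),
                     assume_aligned_checked<Alignment>(src), n);
}
```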
This is 10.4 version.
The idea is to create a monitor thread for both donor and joiner that will
periodically, if needed, extend the systemd timeout while SST is being
processed. In 10.4 the actual SST is executed by running the SST script
and exchanging messages over a pipe using blocking fgets. This fix
starts the monitoring thread before the SST script is started and
stops it when SST has been completed.
Since commit f52bf92014 the type
Sql_sort is non-trivial, because it includes a data member
Bounds_checked_array<SORT_FIELD> local_sortorder.
There still is no vtable, so memset() is safe to invoke, but
we must add a cast to silence a warning in GCC 8 or later.
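A small sketch of the cast, using a hypothetical struct sort_state in place of
Sql_sort. The user-provided constructor makes the type non-trivial, which is
what triggers -Wclass-memaccess in GCC 8 and later.

```cpp
#include <cstring>

// Hypothetical non-trivial type: it has a user-provided constructor,
// but no vtable and only plain integer members.
struct sort_state
{
  sort_state() : rows(0), flags(0) {}
  unsigned long rows;
  unsigned flags;
};

inline void reset_sort_state(sort_state *s)
{
  // Casting to void* signals that the raw clear is intentional, which
  // silences the warning without changing behavior.
  std::memset(static_cast<void *>(s), 0, sizeof *s);
}
```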
For Merge_chunk structures, first set the start and end positions of the buffer and
then adjust the end position of the buffer if the records are dynamic in nature.
maybe_wake_or_create_thread()
A task that is being executed could be counted as waiting (after wait_begin(),
before wait_end()) or as long-running (the callback runs for a long time).
If a task is marked both waiting and long-running, then the calculation of
current concurrency (# of executing tasks - # of long tasks - # of waiting tasks)
is wrong, as the task is counted twice.
Thus current concurrency could go negative, but with unsigned arithmetic
it becomes a huge number.
As a result, maybe_wake_or_create_thread() would neither wake nor create
a thread when it should, which may result in a deadlock.