Fixed tpool::pread() and tpool::pwrite() to return SSIZE_T on Windows,
so that huge numbers are not converted to negatives.
Also, make sure to never attempt reading/writing more bytes than
DWORD can accomodate (4G)
Removed all dependencies of command line arguments based on positions in
an array (this kind of code should never have been written).
Instead use option names, which are stable.
Reviewer: Sergei Golubchik
mtr_t::is_block_dirtied(), mtr_t::memo_push(): Never set m_made_dirty
for pages of the temporary tablespace. Ever since
commit 5eb539555b
we never add those pages to buf_pool.flush_list.
mtr_t::commit(): Implement part of mtr_t::prepare_write() here,
and avoid acquiring log_sys.mutex if no log is written.
During IMPORT TABLESPACE fixup, we do not write log, but we must
add pages to buf_pool.flush_list and for that, be prepared
to acquire log_sys.flush_order_mutex.
mtr_t::do_write(): Replaces mtr_t::prepare_write().
The aim of the InnoDB change buffer is to avoid delays when a leaf page
of a secondary index is not present in the buffer pool, and a record needs
to be inserted, delete-marked, or purged. Instead of reading the page into
the buffer pool for making such a modification, we may insert a record to
the change buffer (a special index tree in the InnoDB system tablespace).
The buffered changes are guaranteed to be merged if the index page
actually needs to be read later.
The change buffer could be useful when the database is stored on a
rotational medium (hard disk) where random seeks are slower than
sequential reads or writes.
Obviously, the change buffer will cause write amplification, due to
potentially large amount of metadata that is being written to the
change buffer. We will have to write redo log records for modifying
the change buffer tree as well as the user tablespace. Furthermore,
in the user tablespace, we must maintain a change buffer bitmap page
that uses 2 bits for estimating the amount of free space in pages,
and 1 bit to specify whether buffered changes exist. This bitmap needs
to be updated on every operation, which could reduce performance.
Even if the change buffer were free of bugs such as MDEV-24449
(potentially causing the corruption of any page in the system tablespace)
or MDEV-26977 (corruption of secondary indexes due to a currently
unknown reason), it will make diagnosis of other data corruption harder.
Because of all this, it is best to disable the change buffer by default.
The problem was that "group_min_max optimization" does not work if
some aggregate functions, like COUNT(*), is used.
The function get_best_group_min_max() is using the join->sum_funcs
array to check which aggregate functions are used.
The bug was that aggregates in HAVING where not yet added to
join->sum_funcs at the time get_best_group_min_max() was called.
Fixed by populate join->sum_funcs already in prepare, which means that
all sum functions will be in join->sum_funcs in get_best_group_min_max().
A benefit of this approach is that we can remove several calls to
make_sum_func_list() from the code and simplify the function.
I removed some wrong setting of 'sort_and_group'.
This variable is set when alloc_group_fields() is called, as part
of allocating the cache needed by end_send_group() and does not need
to be set by other functions.
One problematic thing was that Spider is using *join->sum_funcs to detect
at which stage the optimizer is and do internal calculations of aggregate
functions. Updating join->sum_funcs early caused Spider to fail when trying
to find min/max values in opt_sum_query().
Fixed by temporarily resetting sum_funcs during opt_sum_query().
Reviewer: Sergei Petrunia
The problem was that get_best_group_min_max() did not check if fields used
by the "group_min_max optimization" where used in sub queries.
Because of this, it did not detect that a key (b,a) was used in the WHERE
clause for the statement:
SELECT DISTINCT b FROM t1 WHERE EXISTS ( SELECT 1 FROM DUAL WHERE a > 1 ).
Fixed by also traversing the sub queries when checking if a field is used.
This disables group_min_max_optimization for the above query.
Reviewer: Sergei Petrunia
MENT-328 wrongly assumed that the backup failed because of warnings from
mariabackup about not found files. This is normal (and the error message
should be deleted).
randgen failed because mariabackup didn't retry BACKUP STAGE BLOCK DDL
if it failed with a deadlock.
To simplify things, I implemented the retry loop in the server as
this particular deadlock should be quickly resolved.
When a server is compiled with -fPIE, my_addr_resolve needs to
subtract the info.dli_fbase from symbol addresses in memory for
addr2line to recognize them. When a server is compiled without -fPIE,
my_addr_resolve should not do it. Unfortunately not all compilers
define __PIE__ when -fPIE was used (e.g. older gcc doesn't), so we
have to resort to run-time detection.
- Changed SST scripts to use /usr/bin/env bash instead of
/bin/bash for better portability.
- Fixed use of mktemp on non-Linux platforms to produce
temporary file instead of directory.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
(Backport Varun Gupta's patch + edit the commit comment)
Name resolution code produced errors for valid queries with window
functions (but not for queries which used aggregate functions as
window functions).
Name resolution code worked incorrectly, because window function
objects had is_window_func_sum_expr()=false. This was so, because
mark_as_window_func_sum_expr() was only called for aggregate functions
used as window functions.
The fix is to call it for any window function.
If innodb_flush_method=O_DSYNC, log_sys.flushed_to_disk_lsn is changed
without 'flush_lock' protection inside log_write().
This leads to a race condition, if there are 2 threads running in parallel,
doing log_write_up_to() with different values for 'flush_to_disk'
In this case, log_write() and log_write_flush_to_disk_low() can execute at
the same time, and both would change flushed_lsn.
The fix is to remove special treatment of durable writes from log_write().
There is no apparent reason for this special treatment, log_write_flush_to_disk_low()
is already optimized for durable writes.
Nor there is an apparent reason to call log_flush_notify() more often in
for O_DSYNC.
buf_page_get_low(): If the page was read-fixed, validate the page ID
because the page could have been marked as corrupted. We should retry
the page read in this case, instead of returning a soon-to-be-evicted
corrupted page to the caller.
This was initially only observed on Microsoft Windows.
On Linux, this was repeated after adding a sleep
to buf_pool_t::corrupted_evict() between
bpage->zip.fix.fetch_sub() and bpage->lock.x_unlock().
In commit 9bc874a594 (MDEV-23497)
the configuration option innodb_read_only_compressed was introduced
to giver users advance notice of a plan to remove ROW_FORMAT=COMPRESSED
support for InnoDB.
Based on user feedback, this plan has been scrapped.
Even though ROW_FORMAT=COMPRESSED is a dead end and causes some
overhead for InnoDB data structures, we can live with that.
Now that we know that some users really want to keep using
ROW_FORMAT=COMPRESSED, the previous default value of the parameter
innodb_read_only_compressed=ON should be changed to OFF, to allow
smooth upgrades to 10.6 and later versions, without requiring users
to update any configuration file.
- Store the deferred tablespace name while loading the tablespace
for backup process.
- Mariabackup stores the list of space ids which has page0 INIT_PAGE
records. backup_first_page_op() and first_page_init() was introduced
to track the page0 INIT_PAGE records.
- backup_file_op() and log_file_op() was changed to handle
FILE_MODIFY redo log records. It is used to identify the
deferred tablespace space id.
- Whenever file operation redo log was processed by backup,
backup_file_op() should check whether the space name exist
in deferred tablespace. If it is then it needs to store the
space id, name when FILE_MODIFY, FILE_RENAME redo log processed
and it should delete the tablespace name from defer list in other
cases.
- backup_fix_ddl() should check whether deferred tablespace has
any page0 init records. If it is then consider the tablespace
as newly created tablespace. If not then backup should try
to reload the tablespace with SRV_BACKUP_NO_DEFER mode to
avoid the deferring of tablespace.
This happens for example if one installs into home directory of a user
C:\Users\<username>\mariadb
The reason is that the service user "NT SERVICE\<service_name>" does
not have read and execute permissions for service executable mysqld.exe
in this directory.
Moreover, it would not have read permissions for server.dll loaded by the
exe, or to plugin directory, where plugins may reside.
The fix is to give service users read and execute permissions to bin, share, and
lib\plugin subdirectories.
The permission setting is doneby mysql_install_db.exe, but also in MSI.
It is important to do that in MSI, as we want permissions to survive
the MSI upgrade.
when it's run directly after main.mysql_json_mysql_upgrade
because mysqld--help-aria starts a second mysqld that reads the plugin
table, so it has to be flushed and closed at that time.