This is a fixup of MDEV-26345 commit
77ed235d50.
In MDEV-26345 the spider group by handler was updated so that it uses
the item_ptr fields of Query::group_by and Query::order_by, instead of
item. This was, and still is, because the call to
join->set_items_ref_array(join->items1), which happens during the
execution stage just before the execution, replaces the order-by /
group-by item arrays with Item_temptable_field objects.
Spider traverses the item tree during the group by handler (gbh)
creation at the end of the optimization stage, and decides whether a
gbh could handle the execution of the query. Basically, the spider gbh
can handle the execution if it can construct a well-formed query,
execute it on the data node, and store the results in the correct
places. If so, spider creates a gbh; otherwise it returns NULL and the
execution uses the usual handler (ha_spider instead of
spider_group_by_handler). To that end, the general principle is that
the items checked during creation should be the same items later used
for query construction. Since MDEV-26345 changed the query
construction to use the item_ptr field instead of the item field of
order-by and group-by, in this patch we do the same for the gbh
creation.
The item_ptr field could still be the uninitialised NULL value during
the gbh creation. This is because the optimizer may replace a DISTINCT
with a GROUP BY, which only happens if the original GROUP BY is empty.
It creates the artificial GROUP BY by calling create_distinct_group(),
which creates the corresponding ORDER object with its item field
pointing somewhere into ref_pointer_array, but leaves item_ptr as NULL.
When spider finds that item_ptr is NULL, it knows there is some
optimizer skullduggery and it has been passed a query different from the
original. Without a clear contract between the server layer and the
gbh, it is better to be safe than sorry and not create the gbh in this
case.
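As a minimal illustrative sketch of the creation-time check (not the
actual Spider code; the member names item and item_ptr are those of the
server's ORDER struct, everything else is made up for illustration):

  /* The ORDER struct carries both item (a pointer into
     ref_pointer_array, which the execution stage later redirects to
     Item_temptable_field) and item_ptr (the original item).  The gbh
     creation check must look at item_ptr and bail out when it is NULL. */
  struct Item;
  struct ORDER { ORDER *next; Item **item; Item *item_ptr; };

  static bool gbh_can_handle(ORDER *order_by)
  {
    for (ORDER *order= order_by; order; order= order->next)
    {
      if (!order->item_ptr)        /* optimizer-created GROUP BY */
        return false;              /* do not create the gbh */
      /* ... otherwise check that order->item_ptr can be rendered
         in a query that the data node understands ... */
    }
    return true;
  }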
Also add a check and error reporting for the unlikely case of item_ptr
changing from non-NULL at gbh construction to NULL at execution, to
prevent a server crash.
Also, we remove a check added in MDEV-29480 on order by items being
aggregate functions. That check was added under the premise that
spider was including auxiliary SELECT items which are referenced by
ORDER BY items. This premise has not been true since MDEV-26345: the
inclusion had caused problems such as MDEV-29546, which was fixed by
MDEV-26345.
btr_cur_t::search_leaf(): In the BTR_SEARCH_PREV and BTR_MODIFY_PREV
modes, reset the previous search status before invoking
page_cur_search_with_match(). Otherwise, the search could end up
in a totally wrong subtree.
This fixes a regression that was introduced in
commit de4030e4d4 (MDEV-30400).
buf_block_alloc(): Define as an alias in buf0lru.h, which defines
the underlying buf_LRU_get_free_block().
buf_block_free(): Define as an alias of the non-inline function
buf_pool.free_block(block).
Reviewed by: Vladislav Lesin
Instead of repurposing buf_page_t::access_time for state()==MEMORY
blocks that are part of recv_sys.pages, let us define an anonymous
union around buf_page_t::hash. In this way, we will be able to
declare access_time private.
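A sketch of the idea (not the real buf_page_t layout; recv_offset is a
made-up member name standing in for whatever recv_sys.pages stores):

  #include <cstdint>

  class buf_page_t_sketch
  {
  public:
    union
    {
      buf_page_t_sketch *hash;   /* page_hash chain link, for pages that
                                    are attached to buf_pool.page_hash */
      uint32_t recv_offset;      /* used only while state()==MEMORY and
                                    the block belongs to recv_sys.pages */
    };
  private:
    uint32_t access_time;        /* no longer repurposed; can be private */
  };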
Reviewed by: Vladislav Lesin
row_purge_remove_sec_if_poss_leaf(): If there is an active transaction
that is not newer than PAGE_MAX_TRX_ID, return the bogus value 1
so that row_purge_remove_sec_if_poss_tree() is guaranteed to recheck if
the record needs to be purged. It could be the case that an active
transaction would insert this record between the time this check
completed and row_purge_remove_sec_if_poss_tree() acquired a latch
on the secondary index leaf page again.
row_purge_del_mark_error(), row_purge_check(): Some unlikely code
refactored into separate non-inline functions.
trx_sys_t::find_same_or_older_low(): Move the unlikely and bulky
part of trx_sys_t::find_same_or_older() to a non-inline function.
trx_sys_t::find_same_or_older_in_purge(): A variant of
trx_sys_t::find_same_or_older() for use in the purge subsystem,
with potential concurrent access of the same trx_t object from
multiple threads.
trx_t::max_inactive_id_atomic: An Atomic_relaxed alias of the
regular data field trx_t::max_inactive_id, which we
use on systems that have native 64-bit loads or stores.
On any 64-bit system that seems to be supported by GCC, Clang or MSVC,
relaxed atomic loads and stores use the regular load and store
instructions. On -march=i686 the 64-bit atomic loads and stores
would use an XMM register.
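A self-contained sketch of such an alias (std::atomic stands in for
Atomic_relaxed here; the real wrapper has the same size and layout as
the wrapped type, which is what makes the union trick valid):

  #include <atomic>
  #include <cstdint>

  typedef uint64_t trx_id_t;

  struct trx_t_sketch
  {
    union
    {
      trx_id_t max_inactive_id;                      /* plain accesses */
      std::atomic<trx_id_t> max_inactive_id_atomic;  /* relaxed accesses
                                                        from purge threads */
    };
    trx_t_sketch() : max_inactive_id(0) {}
  };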
This fixes a regression that had been introduced in
commit b7b9f3ce82 (MDEV-34515).
There would be messages
[ERROR] InnoDB: tried to purge non-delete-marked record in index
in the server error log, and an assertion ut_ad(0) would cause a
crash of debug instrumented builds. This could also cause incorrect
results for MVCC reads and corrupted secondary indexes.
The debug instrumented test case was written by Debarun Banerjee.
Reviewed by: Debarun Banerjee
Ignore snapshot isolation conflict during fragment removal, before
streaming transaction commits. This happens when a streaming
transaction creates a read view that precedes the INSERTion of
fragments into the streaming_log table. Fragments are INSERTed
using a different transaction. These fragments are then removed
as part of COMMIT of the streaming transaction. This fragment
removal operation could fail when the fragments were not part of
the transaction's read view, thus violating snapshot isolation.
We periodically observe assertion failures in the mtr tests,
specifically in the /storage/innobase/row/row0ins.cc file,
following a WSREP error. The error message is: 'WSREP: record
locking is disabled in this thread, but the table being modified
is not mysql/wsrep_streaming_log: mysql/innodb_table_stats.'
This issue seems to occur because, upon opening the table,
innodb_stats_auto_recalc may trigger, which Galera does not
anticipate. This commit should fix this bug.
The existing default value 1000 is too big and could result in
"hanging" when failing to connect to a remote server. Three tries in
total is a more sensible default.
ha_storage_put_memlim(): Initialize node->next in order to avoid a
crash on a subsequent invocation, due to dereferencing an uninitialized
pointer.
This fixes a regression that had been introduced in
commit ccb6cd8053 (MDEV-35189).
Reviewed by: Debarun Banerjee
It is `ulint` on 10.6 and `uint32_t` on 10.11+, but I included its
format specifier change in 10.6 (MDEV-35430, merged #3493) rather
than 10.11. This commit reverts that change so 10.11 can reapply it.
`table->space_id` is `ulint` on 10.6 and `uint32_t` on 10.11+, but
I included its format specifier change in 10.6 rather than 10.11.
PR #3650 reverts the change from 10.6; this commit reapplies it
on 10.11 as a follow-up to its batch (MDEV-35431, merged #3518).
Create a MY_WARNING_FLAGS_NON_FATAL for testing warnings.
Add -Weffc++ as a no-error aspect, disabling it for RocksDB due
to excessive errors that will be corrected later.
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict
Various additional fixes, each too small to put into
their own commit.
Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict
Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict
Change the type of my_hash_get_key to:
1) Return const
2) Change the context parameter to be const void*
Also fix casting in hash-adjacent areas.
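For illustration, the intended shape of the typedef (a sketch from
memory; see include/hash.h for the authoritative declarations; the
uchar/my_bool typedefs are repeated here only to keep the sketch
self-contained):

  #include <cstddef>

  typedef unsigned char uchar;
  typedef char my_bool;

  /* old shape: non-const key, uchar* record argument */
  typedef uchar *(*my_hash_get_key_old)(const uchar *record, size_t *length,
                                        my_bool first);

  /* new shape: const key, const void* "context" (record) argument */
  typedef const uchar *(*my_hash_get_key)(const void *record, size_t *length,
                                          my_bool first);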
Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict
The functions queue_compare, qsort2_cmp, and qsort_cmp2
all had similar interfaces, and were used interchangeably
and unsafely cast to one another.
This patch consolidates all of the functions into the
qsort_cmp2 interface.
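For illustration, the consolidated comparator shape and a comparator
written directly against it (a sketch; the real typedef lives in the
my_sys headers and the exact parameter types there may differ):

  #include <cstring>

  /* an opaque per-sort argument followed by the two elements to compare */
  typedef int (*qsort_cmp2)(void *arg, const void *a, const void *b);

  /* implemented against the interface instead of being declared with a
     different signature and unsafely cast */
  static int cmp_fixed_keys(void *arg, const void *a, const void *b)
  {
    size_t key_len= *static_cast<size_t*>(arg);
    return memcmp(a, b, key_len);
  }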
Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
A race condition was observed between two buf_page_get_zip() for a page.
One of them had proceeded to buf_read_page(), allocating and x-latching
a buf_block_t that initially comprises only an uncompressed page frame.
While that thread was waiting inside buf_block_alloc(), another thread
would try to access the same page. Without acquiring a page latch, it
would wrongly conclude that there is corruption because no compressed
page frame exists for the block.
buf_page_get_zip(): Simplify the logic and correct the documentation.
Always acquire a shared latch to prevent any race condition with a
concurrent read operation. No longer increment a buffer-fix; the latch
is sufficient for preventing page relocation or eviction.
buf_read_page(): Add the parameter bool unzip=true. In buf_page_get_zip()
there is no need to allocate an uncompressed page frame for reading a
compressed BLOB page. We only need that for other ROW_FORMAT=COMPRESSED
pages, or for writing compressed BLOB pages.
btr_copy_zblob_prefix(): Remove the message "Cannot load compressed BLOB"
because buf_page_get_zip() will already have reported a more specific
error whenever it returns nullptr.
row_merge_buf_add(): Do not crash on BLOB corruption, but return an
error instead. (In debug builds, an assertion will fail if this
corruption is noticed.)
Reviewed by: Debarun Banerjee
When srv_page_size and innodb_page_size were introduced,
the functions page_align() and page_offset() got more expensive.
Let us try to replace such calls with simpler pointer arithmetic
relative to the buffer page frame.
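To illustrate the cost difference (a self-contained sketch; in InnoDB
the frame pointer would come from a buf_block_t that the caller already
holds):

  #include <cstddef>
  #include <cstdint>

  typedef unsigned char byte;

  static size_t srv_page_size= 16384;   /* a run-time value in InnoDB */

  /* page_align() must load the global page size and mask the pointer */
  static const byte *page_align_sketch(const byte *ptr)
  {
    return reinterpret_cast<const byte*>
      (reinterpret_cast<uintptr_t>(ptr) & ~uintptr_t(srv_page_size - 1));
  }

  /* with the block at hand, the frame is known and the offset is a
     plain pointer subtraction */
  static size_t page_offset_sketch(const byte *frame, const byte *ptr)
  {
    return size_t(ptr - frame);
  }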
page_rec_get_next_non_del_marked(): Add a page frame as a parameter,
and template<bool comp>.
page_rec_next_get(): A more efficient variant of page_rec_get_next(),
with template<bool comp> and const page_t* parameters.
lock_get_heap_no(): Replaces page_rec_get_heap_no() outside debug checks.
fseg_free_step(), fseg_free_step_not_header(): Take the header block
as a parameter.
Reviewed by: Vladislav Lesin
ut_hash_ulint(): Remove. The exclusive OR before a modulus operation
does not serve any useful purpose; it is only obfuscating code and
wasting some CPU cycles.
Reviewed by: Debarun Banerjee
ut_fold_ull(): For SIZEOF_SIZE_T < 8, we simulate universal hashing
(Carter and Wegman, 1977) by pretending that SIZE_T_MAX + 1
is a prime. In other words, we implement a Rabin–Karp rolling
hash algorithm similar to java.lang.String.hashCode().
This is used for representing 64-bit dict_index_t::id or
dict_table_t::id in the native word size.
For SIZEOF_SIZE_T >= 8, we just use an identity mapping.
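A self-contained sketch of the idea (the multiplier is illustrative;
the actual constant used by ut_fold_ull() may differ):

  #include <cstdint>
  #include <cstddef>

  static size_t fold_u64(uint64_t id)
  {
  #if SIZE_MAX >= UINT64_MAX
    return size_t(id);                 /* identity mapping on 64-bit */
  #else
    /* Rabin-Karp style rolling hash over the two 32-bit halves,
       pretending that SIZE_T_MAX + 1 is a prime modulus */
    return size_t(id >> 32) * 31 + size_t(id);
  #endif
  }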
Reviewed by: Debarun Banerjee
The HASH_ macros are unnecessarily obfuscating the logic,
so we had better replace them.
hash_cell_t::search(): Implement most of the HASH_DELETE logic,
for a subsequent insert or remove().
hash_cell_t::remove(): Remove an element.
hash_cell_t::find(): Implement the HASH_SEARCH logic.
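A self-contained sketch of what the cell interface amounts to (member
and parameter names are assumptions; the real class lives in
hash0hash.h):

  /* each cell heads a singly linked chain; the link is an intrusive
     member pointer inside the element type */
  struct hash_cell_sketch
  {
    void *node;                        /* first element, or nullptr */

    /* like HASH_SEARCH: first element satisfying f() */
    template<typename T, typename F>
    T *find(T *T::*next, F f) const
    {
      for (T *e= static_cast<T*>(node); e; e= e->*next)
        if (f(e))
          return e;
      return nullptr;
    }

    /* like HASH_DELETE: unlink element e from the chain */
    template<typename T>
    void remove(T &e, T *T::*next)
    {
      T **prev= reinterpret_cast<T**>(&node);
      while (*prev != &e)
        prev= &((*prev)->*next);
      *prev= e.*next;
    }
  };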
xb_filter_hash_free(): Avoid any hash table lookup;
just traverse the hash bucket chains and free each element.
xb_register_filter_entry(): Search databases_hash only once.
rm_if_not_found(): Make use of find_filter_in_hashtable().
dict_sys_t::acquire_temporary_table(), dict_sys_t::find_table():
Define non-inline to avoid unnecessary code duplication.
dict_sys_t::add(dict_table_t *table), dict_table_rename_in_cache():
Look for duplicate while finding the insert position.
dict_table_change_id_in_cache(): Merged to the only caller
row_discard_tablespace().
hash_insert(): Helper function of dict_sys_t::resize().
fil_space_t::create(): Look for a duplicate (and crash if found)
when searching for the insert position.
lock_rec_discard(): Take the hash array cell as a parameter
to avoid a duplicated lookup.
lock_rec_free_all_from_discard_page(): Remove a parameter.
Reviewed by: Debarun Banerjee
Problem:
========
Starting with version 10.6.13 (86767bcc0f),
the InnoDB purge thread frees TRX_UNDO_CACHED undo segments.
A data directory created before 10.6.13 can contain TRX_UNDO_CACHED
undo segments in the system tablespace even though it
has external undo tablespaces.
During a slow shutdown, InnoDB collects the segments of tables
that reside in the system tablespace, and the cached undo segments
in the system tablespace, as used segments of the system tablespace.
While shrinking the system tablespace, the last used extent can be
occupied by a cached undo segment. Such an extent effectively blocks
the shrinking of the system tablespace.
Solution:
========
While freeing the unused segments, InnoDB should free the cached
undo segment header pages that exist in the system tablespace and
reset TRX_RSEG_UNDO_SLOTS to 0xff in the rollback segment
header pages that exist in the system tablespace. This can further
improve the shrinking of the system tablespace.
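The reset itself amounts to filling the slot array in the rollback
segment header page with 0xff bytes; as an illustration (FIL_NULL is
0xFFFFFFFF, so an all-0xff slot reads back as unused):

  #include <cstdint>
  #include <cstring>

  /* each undo slot stores a page number; FIL_NULL (0xFFFFFFFF) means
     the slot is unused, so the reset is a memset of 0xff over all slots */
  static void reset_undo_slots(uint32_t *slots, size_t n_slots)
  {
    memset(slots, 0xff, n_slots * sizeof *slots);
  }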
buf_pool_t::LRU_warn(): Also clear the try_LRU_scan flag, to ensure
that need_LRU_eviction() will hold. This should ensure progress when
buf_LRU_get_free_block() is expecting buf_flush_page_cleaner() to
make some room, even when buf_pool.LRU.count is small.
This hang was observed in trx_lists_init_at_db_start() while the last
batch of crash recovery was in progress, but it could theoretically
be possible also when a large part of the buffer pool is occupied by
record locks or the adaptive hash index.
Reviewed by: Debarun Banerjee
That PR uncovered countless issues in `my_snprintf` uses.
This commit backports a squashed subset of their fixes.
(Excludes previous parts #3485 and #3493)
- While saving the InnoDB table statistics, InnoDB clones
the table object and its statistics to protect the metadata
in case the table is being dropped. From 10.6 onwards, any
background task inside InnoDB on the table takes an MDL,
so the metadata is protected by the MDL of the table. Avoid
cloning the table and its associated index statistics.
error C2664: 'bool TestHr(PGLOBAL,HRESULT)': cannot convert argument 2
from 'MSXML2::IXMLDOMNodePtr' to 'HRESULT'
Prior to 17.12, there was a code-analysis warning C6216 at the affected
places (compiler generated cast between semantically different integral
types).
ha_storage_put_memlim(): Invoke my_crc32c() to "fold", and traverse
the hash table only once.
fold_lock(): Remove some redundant conditions and use my_crc32c()
instead of ut_fold_ulint_pair().
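For example, a fold over a byte string becomes a single CRC-32C pass
(a sketch; my_crc32c() takes an initial value, a buffer and its length,
and a stand-in with the same shape is used here so the sketch compiles
on its own):

  #include <cstdint>
  #include <cstddef>

  typedef uint32_t (*crc32c_fn)(uint32_t crc, const void *data, size_t len);

  static uint32_t fold_bytes(crc32c_fn my_crc32c,
                             const void *data, size_t len)
  {
    /* one pass over the data instead of chained ut_fold_ulint_pair() */
    return my_crc32c(0, data, len);
  }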
trx_i_s_cache_t::add(): Replaces add_lock_to_cache(),
search_innodb_locks(), and locks_row_eq_lock(). Avoid duplicated
traversal of the hash table.
Reviewed by: Debarun Banerjee
- Replace statement fails with a duplicate key error when multiple
unique key conflicts happen. The reason is that the server expects
the InnoDB engine to store the conflicting keys in ascending order.
But InnoDB does not store the conflicting keys
in ascending order.
Fix:
===
- Enable HA_DUPLICATE_KEY_NOT_IN_ORDER for the InnoDB storage engine
only when the unique index order differs between the .frm file and the
InnoDB dictionary.
Problem:
========
- dict_stats_table_clone_create() does not initialize the
flag stats_error_printed in either dict_table_t or dict_index_t.
Because dict_stats_save_index_stat() is operating on a copy
of a dict_index_t object, it appears that
dict_index_t::stats_error_printed will always be false
for actual metadata objects, and uninitialized in
dict_stats_save_index_stat().
Solution:
=========
dict_stats_table_clone_create(): Assign stats_error_printed
for table and index while copying the statistics.