Quick read record uses different handler (H1) for finding records. It
cannot use ha_delete_row() handler (H2) as it is different search
mode: inited == INDEX for H1, inited == RND for H2. So, read handler
H1 uses index while write handler H2 uses random access.
For going next record in H1 there is info->last_pos optimization for
stepping index via tree_search_next(). This optimization can work with
deleted rows only if delete is conducted in the same handler, there
is:
67 int hp_rb_delete_key(HP_INFO *info, register HP_KEYDEF *keyinfo,
68 const uchar *record, uchar *recpos, int flag)
69 {
...
74 if (flag)
75 info->last_pos= NULL; /* For heap_rnext/heap_rprev */
But this cannot work for different handler. So, last_pos in H1 after
delete in H2 contains stale info->parents array and last_pos points
into that parents. In the specific test case last_pos' parent is
already freed node and tree_search_next() steps into it.
The fix invalidates local savings of info->parents and info->last_pos
based on key_version. Record deletion increments share->key_version in
H2, so in H1 we know the tree might be changed.
Another good measure would be to use H1 for delete. But this is bigger
refactoring than just bug fixing.
Heap tables are allocated blocks to store rows according to
my_default_record_cache (mapped to the server global variable
read_buffer_size).
This causes performance issues when the record length is big
(> 1000 bytes) and the my_default_record_cache is small.
Changed to instead split the default heap allocation to 1/16 of the
allowed space and not use my_default_record_cache anymore when creating
the heap. The allocation is also aligned to be just under a power of 2.
For some test that I have been running, which was using record length=633,
the speed of the query doubled thanks to this change.
Other things:
- Fixed calculation of max_records passed to hp_create() to take
into account padding between records.
- Updated calculation of memory needed by heap tables. Before we
did not take into account internal structures needed to access rows.
- Changed block sized for memory_table from 1 to 16384 to get less
fragmentation. This also avoids a problem where we need 1K
to manage index and row storage which was not counted for before.
- Moved heap memory usage to a separate test for 32 bit.
- Allocate all data blocks in heap in powers of 2. Change reported
memory usage for heap to reflect this.
Reviewed-by: Sergei Golubchik <serg@mariadb.org>
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict
The functions queue_compare, qsort2_cmp, and qsort_cmp2
all had similar interfaces, and were used interchangable
and unsafely cast to one another.
This patch consolidates the functions all into the
qsort_cmp2 interface.
Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
on disable_indexes(HA_KEY_SWITCH_NONUNIQ_SAVE) the engine does
not know that the long unique is logically unique, because on the
engine level it is not. And the engine disables it,
Change the disable_indexes/enable_indexes API. Instead of the enum
mode, send a key_map of indexes that should be enabled. This way the
server will decide what is unique, not the engine.
This patch is the result of running
run-clang-tidy -fix -header-filter=.* -checks='-*,modernize-use-equals-default' .
Code style changes have been done on top. The result of this change
leads to the following improvements:
1. Binary size reduction.
* For a -DBUILD_CONFIG=mysql_release build, the binary size is reduced by
~400kb.
* A raw -DCMAKE_BUILD_TYPE=Release reduces the binary size by ~1.4kb.
2. Compiler can better understand the intent of the code, thus it leads
to more optimization possibilities. Additionally it enabled detecting
unused variables that had an empty default constructor but not marked
so explicitly.
Particular change required following this patch in sql/opt_range.cc
result_keys, an unused template class Bitmap now correctly issues
unused variable warnings.
Setting Bitmap template class constructor to default allows the compiler
to identify that there are no side-effects when instantiating the class.
Previously the compiler could not issue the warning as it assumed Bitmap
class (being a template) would not be performing a NO-OP for its default
constructor. This prevented the "unused variable warning".
This fixed the MySQL bug# 20338 about misuse of double underscore
prefix __WIN__, which was old MySQL's idea of identifying Windows
Replace it by _WIN32 standard symbol for targeting Windows OS
(both 32 and 64 bit)
Not that connect storage engine is not fixed in this patch (must be
fixed in "upstream" branch)
ha_heap::external_lock contains some consistency checks for the table,#
in a debug compilation.
This code suffers from lack of synchronization, in a rare case
where mysql_lock_tables() fail, and unlock is forced, even if lock was
not previously taken.
To workaround, require EXTRA_DEBUG compile definition in order to activate
the consistency checks.The code still might be useful in some cases - but
the audience are developers looking for errors in single-threaded scenarios,
rather than multiuser stress-tests.
I_S tables were materialized too late, an attempt to use table
statistics before the table was created caused a crash.
Let's move table creation up. it only needs read_set to
be calculated properly, this happens in JOIN::optimize_inner(),
after semijoin transformation.
Note that tables are not populated at that point, so most of the
statistics would make no sense anyway. But at least field sizes
will be correct. And it won't crash.
`ha_heap::clone` was creating a handler by share's handlerton, which is
partition handlerton.
handler's handlerton should be used instead.
Here in particular, HEAP handlerton will be used and it will create ha_heap
handler.
Reimplement MDEV-14275 Improving memory utilization for information schema
Postpone temp table instantiation until after setup_fields().
Replace all unused (not marked in read_set) columns in an I_S table
with CHAR(0). This can drastically reduce the footprint of a MEMORY
table (a TABLE_CATALOG alone is 1538 bytes per row).
This does not change the engine. If the table was decided to be Aria
(because of, say, blobs) then after optimization it'll stay Aria
even if all blobs were removed.
Note 1: when transforming table structure, share->blob_fields is
preserved, otherwise Aria might switch from DYNAMIC to STATIC row format
and expect a special field for a deleted mark, which create_tmp_tabe
didn't provide.
Note 2: optimizer was doing handler::info() (to know the number of rows)
before the temp table is populated. That didn't make much sense. Now
it's done before the table is even instantiated. Preserve the old
behavior and report 0 rows.
This reverts e2664ee836 and a8458a2345
Final added to:
- All reasonable classes inhereted from Field
- All classes inhereted from Protocol
- Almost all Handler classes
- Some important Item classes
The stripped size of mariadbd is just 4K smaller, but several object files
showed notable improvements in common execution paths.
- Checked field.o and item_sum.o
Other things:
- Added 'override' to a few class functions touched by this patch.
- Removed 'virtual' from a new class functions that had/got 'override'
- Changed Protocol_discard to inherit from Protocol instad of Protocol_text
TDC_RT_REMOVE_ALL -> tdc_remove_table(). Some occurrences replaced with
TDC_element::flush() (whenver TABLE_SHARE is available).
TDC_RT_REMOVE_NOT_OWN[_KEEP_SHARE] -> TDC_element::flush(). These modes
assume that current thread owns TABLE_SHARE reference, which means we can
avoid hash lookup and flush unused TABLE instances directly.
TDC_RT_REMOVE_UNUSED -> TDC_element::flush_unused(). Only [ab]used by
mysql_admin_table() currently. Should be removed eventually.
Part of MDEV-17882 - Cleanup refresh version
- multi_range_read_info_const now uses the new records_in_range interface
- Added handler::avg_io_cost()
- Don't calculate avg_io_cost() in get_sweep_read_cost if avg_io_cost is
not 1.0. In this case we trust the avg_io_cost() from the handler.
- Changed test_quick_select to use TIME_FOR_COMPARE instead of
TIME_FOR_COMPARE_IDX to align this with the rest of the code.
- Fixed bug when using test_if_cheaper_ordering where we didn't use
keyread if index was changed
- Fixed a bug where we didn't use index only read when using order-by-index
- Added keyread_time() to HEAP.
The default keyread_time() was optimized for blocks and not suitable for
HEAP. The effect was the HEAP prefered table scans over ranges for btree
indexes.
- Fixed get_sweep_read_cost() for HEAP tables
- Ensure that range and ref have same cost for simple ranges
Added a small cost (MULTI_RANGE_READ_SETUP_COST) to ranges to ensure
we favior ref for range for simple queries.
- Fixed that matching_candidates_in_table() uses same number of records
as the rest of the optimizer
- Added avg_io_cost() to JT_EQ_REF cost. This helps calculate the cost for
HEAP and temporary tables better. A few tests changed because of this.
- heap::read_time() and heap::keyread_time() adjusted to not add +1.
This was to ensure that handler::keyread_time() doesn't give
higher cost for heap tables than for normal tables. One effect of
this is that heap and derived tables stored in heap will prefer
key access as this is now regarded as cheap.
- Changed cost for index read in sql_select.cc to match
multi_range_read_info_const(). All index cost calculation is now
done trough one function.
- 'ref' will now use quick_cost for keys if it exists. This is done
so that for '=' ranges, 'ref' is prefered over 'range'.
- scan_time() now takes avg_io_costs() into account
- get_delayed_table_estimates() uses block_size and avg_io_cost()
- Removed default argument to test_if_order_by_key(); simplifies code
Prototype change:
- virtual ha_rows records_in_range(uint inx, key_range *min_key,
- key_range *max_key)
+ virtual ha_rows records_in_range(uint inx, const key_range *min_key,
+ const key_range *max_key,
+ page_range *res)
The handler can ignore the page_range parameter. In the case the handler
updates the parameter, the optimizer can deduce the following:
- If previous range's last key is on the same block as next range's first
key
- If the current key range is in one block
- We can also assume that the first and last block read are cached!
This can be used for a better calculation of IO seeks when we
estimate the cost of a range index scan.
The parameter is fully implemented for MyISAM, Aria and InnoDB.
A separate patch will update handler::multi_range_read_info_const() to
take the benefits of this change and also remove the double
records_in_range() calls that are not anymore needed.
The default keyread_time() was optimized for blocks and not suitable for
HEAP. The effect was the HEAP prefered table scans over ranges for btree
indexes.
Fixed also get_sweep_read_cost() for HEAP tables.
A conflict between MDEV-19514 (b42294bc64)
and MDEV-20934 (d7a2401750)
was resolved. We will not invoke the function ibuf_delete_recs()
from ibuf_merge_or_delete_for_page(). Instead, we will add that
logic to the function ibuf_read_merge_pages().