mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-31 02:51:44 +01:00

Author	SHA1	Message	Date
Alexander Barkov	db2013787d	MDEV-23570 deprecate keep_files_on_create	2022-01-26 15:22:26 +04:00
Marko Mäkelä	e9aac09153	MDEV-25440: Indexed CHAR columns are broken with NO_PAD collations cmp_data(): Compare different-length CHAR fields with the new strnncollsp_nchars function that will pad spaces if needed. Any InnoDB ROW_FORMAT except the original one that was named ROW_FORMAT=REDUNDANT in MySQL 5.0.3 will internally store CHAR(n) columns as variable-length if the character encoding is variable length. Spaces may be trimmed from the end. For NOT NULL values, the minimum length is always n*mbminlen. In cmp_data() we only know the lengths in bytes and we cannot easily know the ROW_FORMAT. is_strnncoll_compatible(): Refactored from innobase_mysql_cmp(). innobase_mysql_cmp(): Merged to cmp_whole_field(). cmp_whole_field(): Invoke strnncollsp_nchars for the DATA_MYSQL (the CHAR type with any other collation than latin1_swedish_ci). Reviewed by: Alexander Barkov Tested by: Roel Van de Paar	2022-01-26 12:42:17 +02:00
Marko Mäkelä	37144afbb0	Cleanup: Simplify cmp_geometry_field() and cmp_whole_field() Let us always compare DATA_GEOMETRY with cmp_geometry_field().	2022-01-26 12:21:05 +02:00
Marko Mäkelä	2cbf92522b	Cleanup: Remove an unused parameter of fts_add_doc_by_id()	2022-01-26 12:19:48 +02:00
Oleksandr Byelkin	7db489fc7d	new CC	2022-01-26 10:42:01 +01:00
Oleksandr Byelkin	b09a744383	new CC 3.2	2022-01-26 09:51:22 +01:00
Vladislav Vaintroub	2925d0f2ee	MDEV-27612 Connect : check buffer sizes, fix string format errors	2022-01-26 09:38:22 +01:00
Lena Startseva	b9623383cc	MDEV-8652: Partitioned table creation problem when creating from procedure context twice in same session The problem was solved in MDEV-7990; this commit contains only a test	2022-01-26 15:08:18 +07:00
Alexey Botchkov	020dc54dab	MDEV-20770 Server crashes in JOIN::transform_in_predicates_into_in_subq upon 2nd execution of PS/SP comparing GEOMETRY with other types. The Item_in_subselect::in_strategy keeps the value and as the error happens the condition isn't modified. That leads to wrong ::fix_fields execution on second PS run. Also the select->table_list is merged but not restored if an error happens, which causes hanging loops on the third PS execution.	2022-01-26 07:48:09 +04:00
Igor Babaev	0041265671	MDEV-27510 Query returns wrong result when using split optimization This bug may affect the queries that use a grouping derived table with grouping list containing references to columns from different tables if the optimizer decides to employ the split optimization for the derived table. In some very specific cases it may affect queries with a grouping derived table that refers to only one base table. This bug was caused by an improper fix for the bug MDEV-25128. The fix tried to get rid of the equality conditions pushed into the where clause of the grouping derived table T to which the split optimization had been applied. The fix erroneously assumed that only those pushed equalities that were used for ref access of the tables referenced by T were needed. In fact the function remove_const() that figures out what columns from the group list can be removed if the split optimization is applied can use other pushed equalities as well. This patch actually provides a proper fix for MDEV-25128. Rather than trying to remove invalid pushed equalities referencing the fields of SJM tables with a look-up access the patch attempts not to push such equalities. Approved by Oleksandr Byelkin <sanja@mariadb.com>	2022-01-25 17:12:37 -08:00
Brandon Nesterenko	8b15d0d4e0	MDEV-16091: Seconds_Behind_Master spikes to millions of seconds This patch addresses two problems with rpl.rpl_seconds_behind_master_spike. First, --sync_slave_with_master / select master_pos_wait seems to have a bug where it will hang after all master events have been executed. This patch removes the sync_slave_with_master command from the test, where it is not required anyway as it is used to declare explicit cleanup. Second, the test uses timestamps to ensure that the Seconds_Behind_Master value does not point to a time too far in the past. The checks of these timestamps were too strict, because they could be slightly inconsistent with the master and the SBM would be counted as invalid when it was actually correct. To fix this, a slight buffer was added to the check to ensure the value is valid but still does not point too far in the past. Reviewed By: Andrei Elkin <andrei.elkin@mariadb.com>	2022-01-25 15:32:23 -07:00
Alexander Barkov	216834b068	A cleanup for MDEV-18918/MDEV-20254 Adjusting rocksdb tests results.	2022-01-25 17:48:44 +04:00
Vladislav Vaintroub	be1d965384	MDEV-27373 wolfSSL 5.1.1 - compile wolfcrypt with kdf.c, to avoid undefined symbols in tls13.c - define WOLFSSL_HAVE_ERROR_QUEUE to avoid endless loop SSL_get_error - Do not use SSL_CTX_set_tmp_dh/get_dh2048, this would require additional compilation options in WolfSSL. Disable it for WolfSSL build, it works without it anyway. - fix "macro already defined" Windows warning.	2022-01-25 11:19:00 +01:00
Oleksandr Byelkin	8db47403ff	WolfSSL v5.1.1	2022-01-25 11:19:00 +01:00
Oleksandr Byelkin	157e66273b	5.7.37	2022-01-25 11:13:39 +01:00
Jan Lindström	0f7fececbf	Revert "MDEV-26223 Galera cluster node consider old server_id value even after modification of server_id [wsrep_gtid_mode=ON]" This reverts commit `a0f711e928`.	2022-01-25 11:05:41 +02:00
Alexey Botchkov	50e66db018	MDEV-25917 create table like fails if source table is partitioned and engine is myisam or aria with data directory. Create table like removes data_file_path/index_file_path from the thd->work_partition_info.	2022-01-25 12:58:17 +04:00
Jan Lindström	057178072c	Add have_debug.inc	2022-01-25 10:53:37 +02:00
Marko Mäkelä	93756c992f	MDEV-27229 fixup: GCC -Wunused-function	2022-01-25 09:00:18 +02:00
Alexander Barkov	62e320c86d	MDEV-18918 SQL mode EMPTY_STRING_IS_NULL breaks RBR upon CREATE TABLE .. SELECT The 10.5 version of the patch. Removing DEFAULT from INFORMATION_SCHEMA columns. DEFAULT in read-only tables is rather meaningless. Upgrade should go smoothly. Also fixes: MDEV-20254 Problems with EMPTY_STRING_IS_NULL and I_S tables	2022-01-25 10:31:55 +04:00
Alexander Barkov	da37bfd8d6	MDEV-18918 SQL mode EMPTY_STRING_IS_NULL breaks RBR upon CREATE TABLE .. SELECT Removing DEFAULT from INFORMATION_SCHEMA columns. DEFAULT in read-only tables is rather meaningless. Upgrade should go smoothly. Also fixes: MDEV-20254 Problems with EMPTY_STRING_IS_NULL and I_S tables	2022-01-25 10:31:03 +04:00
Marko Mäkelä	882f820c66	MDEV-27451 gcol.virtual_index_drop fails with LeakSanitizer errors Because commit `24773bf380` made dict_v_col_t encapsulate v_indexes, we must invoke dict_v_col_t::~dict_v_col_t() to destruct the container. This basically is a fixup of the merge commit `5008171b05` of the 10.2 commit `cf2c6b7f8d` (MDEV-24971). I did not debug why no leaks are reported for 10.2 or 10.3.	2022-01-24 20:23:35 +02:00
Oleksandr Byelkin	ebc77c6d17	Merge remote-tracking branch 'connect/10.2' into 10.2	2022-01-24 17:28:34 +01:00
Alexander Barkov	050508672c	A clean-up for MDEV-10654 add support IN, OUT, INOUT parameter qualifiers for stored functions Changes: 1. Enabling IN/OUT/INOUT mode for sql_mode=DEFAULT, adding tests for sql_mode=DEFAULT, mostly by translating compat/oracle.sp-inout.test to SQL/PSM with minor changes (e.g. testing trigger OLD.column and NEW.column as IN/OUT parameters). 2. Removing duplicate grammar: sp_pdparam and sp_fdparam implemented exactly the same syntax after - the first patch for MDEV-10654 (for sql_mode=ORACLE) - the change #1 from this patch (for sql_mode=DEFAULT) Removing separate rules and adding a single "sp_param" rule instead, which now covers both PROCEDURE and FUNCTION parameters (and CURSOR parameters as well!). 3. Adding a helper rule sp_param_name_and_mode, which is a combination of the parameter name and the IN/OUT/INOUT mode. It allows the grammar to be simplified a bit. 4. The first patch unintentionally allowed IN/OUT/INOUT mode to be specified in CURSOR parameters. This is good for the IN keyword - it is allowed in PL/SQL CURSORs. This is not good for the OUT/INOUT keywords - they should not be allowed. Adding an additional semantic post-check.	2022-01-24 19:46:27 +04:00
ManoharKB	4572dc23f7	MDEV-10654 add support IN, OUT, INOUT parameter qualifiers for stored functions Problem: Currently stored function does not support IN/OUT/INOUT parameter qualifiers. This is needed for Oracle compatibility (sql_mode = ORACLE). Solution: Implemented parameter qualifier support to CREATE FUNCTION (reference: CREATE PROCEDURE) Implemented return by reference for OUT/INOUT parameters in execute_function() (reference: execute_procedure()) Files changed: sql/sql_yacc.yy: Added IN, OUT, INOUT parameter qualifiers for CREATE FUNCTION. sql/sp_head.cc: Added input and output parameter binding for IN/OUT/INOUT parameters in execute_function() so that OUT/INOUT can return by reference. sql/share/errmsg-utf8.txt: Added error message to restrict OUT/INOUT parameters while function being called from SQL query. mysql-test/suite/compat/oracle/t/sp-inout.test: Added test cases mysql-test/suite/compat/oracle/r/sp-inout.result: Added test results Reviewed-by: iqbal@hasprime.com	2022-01-24 19:46:27 +04:00
Sergei Golubchik	8acc7fb39c	MDEV-24088 Assertion in InnoDB's FTS code may be triggered by repeated words fed to simple_parser plugin increment `position` for every word, because the plugin doesn't (FTS API doesn't use positions that InnoDB FTS relies on)	2022-01-24 11:30:48 +01:00
Nayuta Yanagisawa	5595ed9d9f	MDEV-27521 SIGSEGV in spider_parse_connect_info in MDEV-27106 branch Add NULL check to SPIDER_OPTION_STR_LIST.	2022-01-24 19:26:09 +09:00
Nayuta Yanagisawa	0599dd9014	MDEV-26858 Spider: Remove dead code related to HandlerSocket Remove the dead-code, in Spider, which is related to the Spider's HandlerSocket support. The code has been disabled for a long time and it is unlikely that the code will be enabled.	2022-01-24 19:26:09 +09:00
Nayuta Yanagisawa	72f34df349	MDEV-27106 Spider: specify connection to data node by engine-defined attributes We introduce engine-defined attributes to specify remote data nodes. The engine attributes do not cover all the existing DSN parameters because most of them need not be specified at the table level. We introduce the following three attributes: REMOTE_SERVER, REMOTE_DATABASE, REMOTE_TABLE. One cannot specify both DSN parameter, in COMMENT or CONNECT, and engine-defined attribute that are for the same SPIDER_SHARE attribute. For example, Spider returns an error if both COMMENT='table "t1"' and REMOTE_TABLE="t2" are specified for a single Spider table or a single partition in a Spider table.	2022-01-24 19:26:09 +09:00
Nayuta Yanagisawa	c5d09f731a	MDEV-5271 Support engine-defined attributes per partition Make it possible to specify engine-defined attributes on partitions as well as tables. If an engine-defined attribute is only specified at the table level, it applies to all the partitions in the table. This is a backward-compatible behavior. If the same attribute is specified both at the table level and the partition level, the per-partition one takes precedence. So, we can consider per-table attributes as default values. One cannot specify engine-defined attributes on subpartitions. Implementation details: * We store per-partition attributes in the partition_element class because we already have the part_comment field, which is for per-partition comments. * In the case of ALTER TABLE statements, the partition_elements in table->part_info is set up by mysql_unpack_partition(). So, we parse per-partition attributes after the call of the function.	2022-01-24 19:26:09 +09:00
Daniele Sciascia	49e3bd2cbc	MDEV-27553 Assertion `inited==INDEX' failed: in ha_index_end() In wsrep_schema code, call ha_index_end() only if the corresponding ha_index_init() call succeeded. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2022-01-24 09:46:21 +02:00
Daniel Black	83dd7db69d	MDEV-27314 InnoDB Buffer Pool Resize output cleanup (mtr postfix) More tests depending on 'Completed resizing buffer pool.' output	2022-01-24 17:28:06 +11:00
Haidong Ji	d0ca235d16	MDEV-27314 InnoDB Buffer Pool Resize output cleanup Cleaned up the log messages as suggested, with a minor code formatting change. On bullet point 13, I decided to not include timestamp in output message. In most (all?) cases, the output goes to the log file, which has timestamp already.	2022-01-24 11:14:26 +11:00
Otto Kekäläinen	c5c61b51b6	Extend the Gitlab-CI pipeline to run mini benchmark Implement new mini-benchmark script for simple CPU bound benchmark for the duration of 5 minutes. The script can be run stand-alone or as part of a CI pipeline. Extend Gitlab-CI to run mini-benchmark on every commit to catch if there are severe performance regressions. Also bump MARIADB_MAJOR_VERSION to 10.8 which is needed on the 10.8 branch.	2022-01-22 13:47:39 -08:00
Marko Mäkelä	1f5fc7b745	MDEV-27208: mtr --ps-protocol test fixup The test ./mtr --ps-protocol main.func_math was broken in commit `5b3ad94c7b` because in that mode, one of several truncation warnings for a single integer literal would be omitted. Those warnings are issued by the parser somewhere outside CRC32() or CRC32C().	2022-01-22 10:24:47 +02:00
Marko Mäkelä	2c16fd9baf	MDEV-24827, MDEV-20516 fixup: Use C90, plug memory leaks	2022-01-22 10:17:05 +02:00
Jan Lindström	2b6f235ae0	MDEV-21308 : WSREP: binlog ... cache not empty warnings on server with WSREP disabled Remove output if wsrep is not enabled.	2022-01-22 09:14:26 +02:00
Dmitry Shulga	f99d141cd2	MDEV-20516: Assertion `!lex->proc_list.first && !lex->result && !lex->param_list.elements' failed in mysql_create_view Execution of the CREATE VIEW statement sent via binary protocol where the flags of the COM_STMT_EXECUTE request a cursor to be opened before running the statement results in an assert failure. This assert fails since the data member thd->lex->result has a non-null value pointing to an instance of the class Select_materialize. The data member thd->lex->result is assigned a pointer to the class Select_materialize in the function mysql_open_cursor() that is invoked in case the packet COM_STMT_EXECUTE requests a cursor to be opened. After thd->lex->result is assigned a pointer to an instance of the class Select_materialize the function mysql_create_view() is called (indirectly via the function mysql_execute_statement()) and the assert fails. The assert DBUG_ASSERT(!lex->proc_list.first && !lex->result && !lex->param_list.elements); was added by the commit `591c06d4b7`. Unfortunately, the condition !lex->result was specified incorrectly. It was supposed that the thd->lex->result is set only by the parser on handling the clauses SELECT ... INTO, but in fact it is also set inside mysql_open_cursor(), and that fact was missed by the assert's condition. So, the fix for this issue is to just remove the condition !lex->result from the failing assert.	2022-01-22 12:46:06 +07:00
Eugene Kosov	faaecc8fcf	MDEV-27273 Confusing column count in IMPORT TABLESPACE error message It is misleading to compare the number of columns with the number of fields and report both to the user. Thus, it would be better to remove that check and let the user see a subsequent error message about a missing or misplaced column. row_import::match_schema(): remove misleading check	2022-01-21 20:25:56 +03:00
Marko Mäkelä	5b3ad94c7b	MDEV-27208: Extend CRC32() and implement CRC32C() We used to define a native unary function CRC32() that computes the CRC-32 of a string using the ISO 3309 polynomial that is being used by zlib and many others. Often, a CRC is computed in pieces. To facilitate this, we introduce a 2-ary variant of the function that inputs a previous CRC as the first argument: CRC32('MariaDB')=CRC32(CRC32('Maria'),'DB'). InnoDB and MyRocks use a different polynomial, which was implemented in SSE4.2 instructions that were introduced in the Intel Nehalem microarchitecture. This is commonly called CRC-32C (Castagnoli). We introduce a native function that uses the Castagnoli polynomial: CRC32C('MariaDB')=CRC32C(CRC32C('Maria'),'DB'). This allows SELECT...INTO DUMPFILE to be used for the creation of files with valid checksums, such as a logically empty InnoDB redo log file ib_logfile0 corresponding to a particular log sequence number.	2022-01-21 19:24:00 +02:00
Alexander Barkov	e4b302e436	MDEV-27018 IF and COALESCE lose "json" property Hybrid functions (IF, COALESCE, etc) did not preserve the JSON property from their arguments. The same problem was repeatable for single row subselects. The problem happened because the method Item::is_json_type() was inconsistently implemented across the Item hierarchy. For example, Item_hybrid_func and Item_singlerow_subselect did not override is_json_type(). Solution: - Removing Item::is_json_type() - Implementing specific JSON type handlers: Type_handler_string_json Type_handler_varchar_json Type_handler_tiny_blob_json Type_handler_blob_json Type_handler_medium_blob_json Type_handler_long_blob_json - Reusing the existing data type infrastructure to pass JSON type handlers across all item types, including classes Item_hybrid_func and Item_singlerow_subselect. Note, these two classes themselves do not need any changes! - Extending the data type infrastructure so data types can inherit their properties (e.g. aggregation rules) from their base data types. E.g. VARCHAR/JSON acts as VARCHAR, LONGTEXT/JSON acts as LONGTEXT when mixed to a non-JSON data type. This is done by: - adding virtual method Type_handler::type_handler_base() - adding a helper class Type_handler_pair - refactoring Type_handler_hybrid_field_type methods aggregate_for_result(), aggregate_for_min_max(), aggregate_for_num_op() to use Type_handler_pair. This change also fixes: MDEV-27361 Hybrid functions with JSON arguments do not send format metadata Also, adding mtr tests for JSON replication. It was not covered yet. And the current patch changes the replication code slightly.	2022-01-21 19:28:48 +04:00
Maheedhar PV	991d5dce32	Bug#31374305 - FORMAT() NOT DISPLAYING WHOLE NUMBER SIDE CORRECTLY FOR ES_MX AND ES_ES LOCALES Changed the grouping and decimal separator for spanish locales as per ICU. Change-Id: I5d80fa59d3e66372d904e17c22c532d4dd2c565b	2022-01-21 16:02:34 +01:00
Sergei Golubchik	4504e6d14e	test cases for MySQL bugs also fix a comment, and update a macro just in case	2022-01-21 16:02:34 +01:00
Sergei Golubchik	c9beef4315	don't build with OpenSSL 3.0, it doesn't work before MDEV-25785	2022-01-21 16:02:34 +01:00
Marko Mäkelä	b07920b634	MDEV-27199: Remove FIL_PAGE_FILE_FLUSH_LSN The only purpose of the field FIL_PAGE_FILE_FLUSH_LSN was to store the log sequence number for a new ib_logfile0 when the InnoDB redo log was missing at startup. Because FIL_PAGE_FILE_FLUSH_LSN no longer serves any purpose, we will stop updating it. The writes of that field were inherently risky, because they were covered by neither the redo log nor the doublewrite buffer. Warning: After MDEV-14425 and before this change, users could perform a clean shutdown of the server, replace the ib_logfile0 with a 0-length file, and expect a valid log file to be created on the next server startup. After this change, if the FIL_PAGE_FILE_FLUSH_LSN had ever been updated in the past, the server would still create a log file in such a scenario, but possibly with an incorrect (too small) LSN. Users should not manipulate log files directly!	2022-01-21 16:16:32 +02:00
Marko Mäkelä	88d9fbb484	Disable adaptive spinning on buf_pool.mutex During the testing of MDEV-14425, buf_pool.mutex and log_sys.mutex were identified as the main bottlenecks for write workloads. Let us disable spinning also for buf_pool.mutex, except on ARMv8 where spinning was enabled for log_sys.mutex in commit `f7684f0ca5` (MDEV-26855). This was tested on AMD64 and recommended by Axel Schwenke. According to Krunal Bauskar, removing the spinloops did not improve performance in his tests on ARMv8.	2022-01-21 16:13:28 +02:00
Marko Mäkelä	5d54fd611f	Cleanup: Replace ut_crc32c(x,y) with my_crc32c(0,x,y)	2022-01-21 16:13:04 +02:00
Marko Mäkelä	685d958e38	MDEV-14425 Improve the redo log for concurrency The InnoDB redo log used to be formatted in blocks of 512 bytes. The log blocks were encrypted and the checksum was calculated while holding log_sys.mutex, creating a serious scalability bottleneck. We remove the fixed-size redo log block structure altogether and essentially turn every mini-transaction into a log block of its own. This allows encryption and checksum calculations to be performed on local mtr_t::m_log buffers, before acquiring log_sys.mutex. The mutex only protects a memcpy() of the data to the shared log_sys.buf, as well as the padding of the log, in case the to-be-written part of the log would not end in a block boundary of the underlying storage. For now, the "padding" consists of writing a single NUL byte, to allow recovery and mariadb-backup to detect the end of the circular log faster. Like the previous implementation, we will overwrite the last log block over and over again, until it has been completely filled. It would be possible to write only up to the last completed block (if no more recent write was requested), or to write dummy FILE_CHECKPOINT records to fill the incomplete block, by invoking the currently disabled function log_pad(). This would require adjustments to some logic around log checkpoints, page flushing, and shutdown. An upgrade after a crash of any previous version is not supported. Logically empty log files from a previous version will be upgraded. An attempt to start up InnoDB without a valid ib_logfile0 will be refused. Previously, the redo log used to be created automatically if it was missing. Only with innodb_force_recovery=6 is it possible to start InnoDB in read-only mode even if the log file does not exist. This allows the contents of a possibly corrupted database to be dumped. 
Because a prepared backup from an earlier version of mariadb-backup will create a 0-sized log file, we will allow an upgrade from such log files, provided that the FIL_PAGE_FILE_FLUSH_LSN in the system tablespace looks valid. The 512-byte log checkpoint blocks at 0x200 and 0x600 will be replaced with 64-byte log checkpoint blocks at 0x1000 and 0x2000. The start of log records will move from 0x800 to 0x3000. This allows us to use 4096-byte aligned blocks for all I/O in a future revision. We extend the MDEV-12353 redo log record format as follows. (1) Empty mini-transactions or extra NUL bytes will not be allowed. (2) The end-of-minitransaction marker (a NUL byte) will be replaced with a 1-bit sequence number, which will be toggled each time when the circular log file wraps back to the beginning. (3) After the sequence bit, a CRC-32C checksum of all data (excluding the sequence bit) will be written. (4) If the log is encrypted, 8 bytes will be written before the checksum and included in it. This is part of the initialization vector (IV) of encrypted log data. (5) File names, page numbers, and checkpoint information will not be encrypted. Only the payload bytes of page-level log will be encrypted. The tablespace ID and page number will form part of the IV. (6) For padding, arbitrary-length FILE_CHECKPOINT records may be written, with all-zero payload, and with the normal end marker and checksum. The minimum size is 7 bytes, or 7+8 with innodb_encrypt_log=ON. In mariadb-backup and in Galera snapshot transfer (SST) scripts, we will no longer remove ib_logfile0 or create an empty ib_logfile0. Server startup will require a valid log file. When resizing the log, we will create a logically empty ib_logfile101 at the current LSN and use an atomic rename to replace ib_logfile0 with it. See the test innodb.log_file_size. Because there is no mandatory padding in the log file, we are able to create a dummy log file as of an arbitrary log sequence number. 
See the test mariabackup.huge_lsn. The parameter innodb_log_write_ahead_size and the INFORMATION_SCHEMA.INNODB_METRICS counter log_padded will be removed. The minimum value of innodb_log_buffer_size will be increased to 2MiB (because log_sys.buf will replace recv_sys.buf) and the increment adjusted to 4096 bytes (the maximum log block size). The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed: os_log_fsyncs os_log_pending_fsyncs log_pending_log_flushes log_pending_checkpoint_writes The following status variables will be removed: Innodb_os_log_fsyncs (this is included in Innodb_data_fsyncs) Innodb_os_log_pending_fsyncs (this was limited to at most 1 by design) log_sys.get_block_size(): Return the physical block size of the log file. This is only implemented on Linux and Microsoft Windows for now, and for the power-of-2 block sizes between 64 and 4096 bytes (the minimum and maximum size of a checkpoint block). If the block size is anything else, the traditional 512-byte size will be used via normal file system buffering. If the file system buffers can be bypassed, a message like the following will be issued: InnoDB: File system buffers for log disabled (block size=512 bytes) InnoDB: File system buffers for log disabled (block size=4096 bytes) This has been tested on Linux and Microsoft Windows with both sizes. On Linux, only enable O_DIRECT on the log for innodb_flush_method=O_DSYNC. Tests in 3 different environments where the log is stored in a device with a physical block size of 512 bytes are yielding better throughput without O_DIRECT. This could be due to the fact that in the event the last log block is being overwritten (if multiple transactions would become durable at the same time, and each of them will write a small number of bytes to the last log block), it should be faster to re-copy data from log_sys.buf or log_sys.flush_buf to the kernel buffer, to be finally written at fdatasync() time. 
The parameter innodb_flush_method=O_DSYNC will imply O_DIRECT for data files. This option will enable O_DIRECT on the log file on Linux. It may be unsafe to use when the storage device does not support FUA (Force Unit Access) mode. When the server is compiled WITH_PMEM=ON, we will use memory-mapped I/O for the log file if the log resides on a "mount -o dax" device. We will identify PMEM in a start-up message: InnoDB: log sequence number 0 (memory-mapped); transaction id 3 On Linux, we will also invoke mmap() on any ib_logfile0 that resides in /dev/shm, effectively treating the log file as persistent memory. This should speed up "./mtr --mem" and increase the test coverage of PMEM on non-PMEM hardware. It also allows users to estimate how much the performance would be improved by installing persistent memory. On other tmpfs file systems such as /run, we will not use mmap(). mariadb-backup: Eliminated several variables. We will refer directly to recv_sys and log_sys. backup_wait_for_lsn(): Detect non-progress of xtrabackup_copy_logfile(). In this new log format with arbitrary-sized blocks, we can only detect log file overrun indirectly, by observing that the scanned log sequence number is not advancing. xtrabackup_copy_logfile(): On PMEM, do not modify the sequence bit, because we are not allowed to modify the server's log file, and our memory mapping is read-only. trx_flush_log_if_needed_low(): Do not use the callback on pmem. Using neither flush_lock nor write_lock around PMEM writes seems to yield the best performance. The pmem_persist() calls may still be somewhat slower than the pwrite() and fdatasync() based interface (PMEM mounted without -o dax). recv_sys_t::buf: Remove. We will use log_sys.buf for parsing. recv_sys_t::MTR_SIZE_MAX: Replaces RECV_SCAN_SIZE. recv_sys_t::file_checkpoint: Renamed from mlog_checkpoint_lsn. recv_sys_t, log_sys_t: Removed many data members. recv_sys.lsn: Renamed from recv_sys.recovered_lsn. 
recv_sys.offset: Renamed from recv_sys.recovered_offset. log_sys.buf_size: Replaces srv_log_buffer_size. recv_buf: A smart pointer that wraps log_sys.buf[recv_sys.offset] when the buffer is being allocated from the memory heap. recv_ring: A smart pointer that wraps a circular log_sys.buf[] that is backed by ib_logfile0. The pointer will wrap from recv_sys.len (log_sys.file_size) to log_sys.START_OFFSET. For the record that wraps around, we may copy file name or record payload data to the auxiliary buffer decrypt_buf in order to have a contiguous block of memory. The maximum size of a record is less than innodb_page_size bytes. recv_sys_t::parse(): Take the smart pointer as a template parameter. Do not temporarily add a trailing NUL byte to FILE_ records, because we are not supposed to modify the memory-mapped log file. (It is attached in read-write mode already during recovery.) recv_sys_t::parse_mtr(): Wrapper for recv_sys_t::parse(). recv_sys_t::parse_pmem(): Like parse_mtr(), but if PREMATURE_EOF would be returned on PMEM, use recv_ring to wrap around the buffer to the start. mtr_t::finish_write(), log_close(): Do not enforce log_sys.max_buf_free on PMEM, because it has no meaning on the mmap-based log. log_sys.write_to_buf: Count writes to log_sys.buf. Replaces srv_stats.log_write_requests and export_vars.innodb_log_write_requests. Protected by log_sys.mutex. Updated consistently in log_close(). Previously, mtr_t::commit() conditionally updated the count, which was inconsistent. log_sys.write_to_log: Count swaps of log_sys.buf and log_sys.flush_buf, for writing to log_sys.log (the ib_logfile0). Replaces srv_stats.log_writes and export_vars.innodb_log_writes. Protected by log_sys.mutex. log_sys.waits: Count waits in append_prepare(). Replaces srv_stats.log_waits and export_vars.innodb_log_waits. recv_recover_page(): Do not unnecessarily acquire log_sys.flush_order_mutex. 
We are inserting the blocks in arbitrary order anyway, to be adjusted in recv_sys.apply(true). We will change the definition of flush_lock and write_lock to avoid potential false sharing. Depending on sizeof(log_sys) and CPU_LEVEL1_DCACHE_LINESIZE, the flush_lock and write_lock could share a cache line with each other or with the last data members of log_sys. Thanks to Matthias Leich for providing https://rr-project.org traces for various failures during the development, and to Thirunarayanan Balathandayuthapani for his help in debugging some of the recovery code. And thanks to the developers of the rr debugger for a tool without which extensive changes to InnoDB would be very challenging to get right. Thanks to Vladislav Vaintroub for useful feedback and to him, Axel Schwenke and Krunal Bauskar for testing the performance.	2022-01-21 16:03:47 +02:00
Marko Mäkelä	c1d7b4575e	MDEV-26870 --skip-symbolic-links does not disallow .isl file creation The InnoDB DATA DIRECTORY attribute is not implemented via symbolic links but something similar, *.isl files that contain the names of data files. InnoDB failed to ignore the DATA DIRECTORY attribute even though the server was started with --skip-symbolic-links. Native ALTER TABLE in InnoDB will retain the DATA DIRECTORY attribute of the table, no matter if the table will be rebuilt or not. Generic ALTER TABLE (with ALGORITHM=COPY) as well as TRUNCATE TABLE will discard the DATA DIRECTORY attribute. All tests have been run with and without the ./mtr option --mysqld=--skip-symbolic-links and some tests that use the InnoDB DATA DIRECTORY attribute have been adjusted for this.	2022-01-21 14:43:59 +02:00
Thirunarayanan Balathandayuthapani	28e166d643	MDEV-26784 [Warning] InnoDB: Difficult to find free blocks in the buffer pool Problem: InnoDB ran out of memory during recovery and it fails to flush the dirty LRU blocks. The reason is that the buffer pool can run out before the LRU list length reaches the BUF_LRU_OLD_MIN_LEN(256) threshold. Fix: During recovery, InnoDB should write out and evict all dirty blocks.	2022-01-21 14:15:18 +05:30
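
The chaining identity stated in commit 5b3ad94c7b (MDEV-27208), CRC32('MariaDB')=CRC32(CRC32('Maria'),'DB'), can be illustrated outside the server. The following is a minimal Python sketch, not MariaDB code. It assumes that Python's zlib.crc32 uses the ISO 3309 polynomial that the commit message attributes to zlib. The helper crc32c() is a hypothetical, slow bitwise implementation of the Castagnoli polynomial written only for this illustration; the server is described as using SSE4.2 instructions instead. Note that zlib.crc32 takes the previous CRC as its second argument, whereas the SQL function described in the commit takes it first.

```python
import zlib

# Reflected form of the Castagnoli polynomial (CRC-32C).
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c(data: bytes, crc: int = 0) -> int:
    """Hypothetical bitwise CRC-32C; `crc` is the CRC of the preceding bytes."""
    crc ^= 0xFFFFFFFF  # undo the final inversion of the previous piece
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final inversion


def crc32_chained(first: bytes, second: bytes) -> int:
    """CRC-32 of first+second, computed in two pieces via zlib."""
    return zlib.crc32(second, zlib.crc32(first))


if __name__ == "__main__":
    # Computing the checksum in pieces gives the same value as one pass.
    print(crc32_chained(b"Maria", b"DB") == zlib.crc32(b"MariaDB"))
    print(crc32c(b"DB", crc32c(b"Maria")) == crc32c(b"MariaDB"))
```

Both comparisons hold because each function removes the final bit inversion of the previous piece before continuing, so the internal state carries over unchanged.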


195337 commits