for large transaction
Description
===========
When a transaction commits, it copies the binlog events from
binlog cache to binlog file. Very large transactions
(eg. gigabytes) can stall other transactions for a long time
because the data is copied while holding LOCK_log, which blocks
other commits from binlogging.
The solution in this patch is to rename the binlog cache file to
a binlog file instead of copy, if the commiting transaction has
large binlog cache. Rename is a very fast operation, it doesn't
block other transactions a long time.
Design
======
* binlog_large_commit_threshold
type: ulonglong
scope: global
dynamic: yes
default: 128MB
Only the binlog cache temporary files large than 128MB are
renamed to binlog file.
* #binlog_cache_files directory
To support rename, all binlog cache temporary files are managed
as normal files now. `#binlog_cache_files` directory is in the same
directory with binlog files. It is created at server startup if it doesn't
exist. Otherwise, all files in the directory is deleted at startup.
The temporary files are named with ML_ prefix and the memorary address
of the binlog_cache_data object which guarantees it is unique.
* Reserve space
To supprot rename feature, It must reserve enough space at the
begin of the binlog cache file. The space is required for
Format description, Gtid list, checkpoint and Gtid events when
renaming it to a binlog file.
Since binlog_cache_data's cache_log is directly accessed by binlog log,
online alter and wsrep. It is not easy to update all the code. Thus
binlog cache will not reserve space if it is not session binlog cache or
wsrep session is enabled.
- m_file_reserved_bytes
Stores the bytes reserved at the begin of the cache file.
It is initialized in write_prepare() and cleared by reset().
The reserved file header is hide to callers. Thus there is no
change for callers. E.g.
- get_byte_position() still get the length of binlog data
written to the cache, but not the file length.
- truncate(0) will truncate the file to m_file_reserved_bytes but not 0.
- write_prepare()
write_prepare() is called everytime when anything is being written
into the cache. It will call init_file_reserved_bytes() to create
the cache file (if it doesn't exist) and reserve suitable space if
the data written exceeds buffer's size.
* Binlog_commit_by_rotate
It is used to encapsulate the code for remaing a binlog cache
tempoary file to binlog file.
- should_commit_by_rotate()
it is called by write_transaction_to_binlog_events() to check if
a binlog cache should be rename to a binlog file.
- commit()
That is the entry to rename a binlog cache and commit the
transaction. Both rename and commit are protected by LOCK_log,
Thus not other transactions can write anything into the renamed
binlog before it.
Rename happens in a rotation. After the new binlog file is generated,
replace_binlog_file() is called to:
- copy data from the new binlog file to its binlog cache file.
- write gtid event.
- rename the binlog cache file to binlog file.
After that the rotation will continue to succeed. Then the transaction
is committed in a seperated group itself. Its cache file will be
detached and cache log will be reset before calling
trx_group_commit_with_engines(). Thus only Xid event be written.
One change is that if the port is not supplied or out of bound, the
old behaviour is to print 3306. The new behaviour is to not print
it (if not supplied) or the out of bound value.
The existing syntax for CREATE SERVER
CREATE [OR REPLACE] SERVER [IF NOT EXISTS] server_name
FOREIGN DATA WRAPPER wrapper_name
OPTIONS (option [, option] ...)
option:
{ HOST character-literal
| DATABASE character-literal
| USER character-literal
| PASSWORD character-literal
| SOCKET character-literal
| OWNER character-literal
| PORT numeric-literal }
With this change we have:
option:
{ HOST character-literal
| DATABASE character-literal
| USER character-literal
| PASSWORD character-literal
| SOCKET character-literal
| OWNER character-literal
| PORT numeric-literal
| PORT quoted-numerical-literal
| identifier character-literal}
We store these options as a JSON field in the mysql.servers system
table. We retain the restriction that PORT needs to be a number, but
also allow it to be a quoted number, so that SHOW CREATE SERVER can be
used for dumping. Without an accompanied implementation of SHOW CREATE
SERVER, some mysqldump tests will fail. Therefore this commit should
be immediately followed by the one implementating SHOW CREATE SERVER,
with testing covering both.
The limit of socket length on unix according to libc is 108, see
sockaddr_un::sun_path, but in the table it is a string of max length
64, which results in truncation of socket and failure to connect by
plugins using servers such as spider.
In specifying a derived table with a union, for example
CREATE TABLE t (c1 INT KEY,c2 INT,c3 INT) ENGINE=MyISAM;
SELECT * FROM (SELECT * FROM t UNION SELECT * FROM t) AS d (d1,d2);
we bypass an earlier check for the correct number of specified column
names, causing a crash.
Fixed by adding a check for the correct number of supplied arguments
in st_select_lex_unit::rename_types_list()
Fix for MDEV-31466 - add optional derived table column names.
Column names within a SELECT_LEX structure can be left in a non-reparsable
state (as printed out from *::print) after JOIN::prepare. This caused
an incorrect view definition to be written into the .FRM file.
Fixed by resetting item list names in SELECT_LEX structures representing
derived tables before writing out the view definition.
Reviewed by Igor Babaev (igor@mariadb.com)
Extend derived table syntax to support column name assignment.
(subquery expression) [as|=] ident [comma separated column name list].
Prior to this patch, the optional comma separated column name list is
not supported.
Processing within the unit of the subquery expression will use
original column names, outside the unit will use the new names.
For example, in the query
select a1, a2 from
(select c1, c2, c3 from t1 where c2 > 0) as dt (a1, a2, a3)
where a2 > 10;
we see the second column of the derived table dt being used both within,
(where c2 > 0), and outside, (where a2 > 10), the specification.
Both conditions apply to t1.c2.
When multiple unit preparations are required, such as when being used within
a prepared statement or procedure, original column names are needed for
correct resolution. Original names are reset within mysql_derived_reinit().
Item_holder items, used for result tables in both TVC and union preparations
are renamed before use within st_select_lex_unit::prepare().
During wildcard expansion, if column names are present, items names are
set directly after creation.
Reviewed by Igor Babaev (igor@mariadb.com)
Update `SESSION_USER()` behaviour to be comparable with `CURRENT_USER()`.
`SESSION_USER()` will return the user and host columns from `mysql.user`
used to authenticate the user when the session was created.
Historically `SESSION_USER()` was an alias of `USER()` function. The
main difference with `USER()` behaviour after this changes is that
`SESSION_USER()` now returns the host column from `mysql.user` instead of
the client host or ip.
NOTE: `SESSION_USER_IS_USER` old mode is added to make the change
backward compatible.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.
During sql_mode=ORACLE, ignore the NOCOPY keyword in stored routine
parameters. The optimization (pass-by-reference instead of
pass-by-value) helping to avoid value copying will be done in a separate
task when needed.
When calculate_cond_selectivity_for_table() takes into account multi-
column selectivities from range access, it tries to take-into account
that selectivity for some columns may have been already taken into account.
For example, for range access on IDX1 using {kp1, kp2}, the selectivity
of restrictions on "kp2" might have already been taken into account
to some extent.
So, the code tries to "discount" that using rec_per_key[] estimates.
This seems to be wrong and unreliable: the "discounting" may produce a
rselectivity_multiplier number that hints that the overall selectivity
of range access on IDX1 was greater than 1.
Do a conservative fix: if we arrive at conclusion that selectivity of
range access on condition in IDX1 >1.0, clip it down to 1.
Analysis:
The value gets appended as string instead of unescaped json value
Fix:
Append the value of json in a temporary string and then store it in the
field instead of directly storing as string.
SSL_CTX_set_ciphersuites() sets the TLSv1.3 cipher suites.
SSL_CTX_set_cipher_list() sets the ciphers for TLSv1.2 and below.
The current TLS configuration logic will not perform SSL_CTX_set_cipher_list()
to configure TLSv1.2 ciphers if the call to SSL_CTX_set_ciphersuites() was
successful. The call to SSL_CTX_set_ciphersuites() is successful if any TLSv1.3
cipher suite is passed into `--ssl-cipher`.
This is a potential security vulnerability because users trying to restrict
specific secure ciphers for TLSv1.3 and TLSv1.2, would unknowingly still have
the database support insecure TLSv1.2 ciphers.
For example:
If setting `--ssl_cipher=TLS_AES_128_GCM_SHA256:ECDHE-RSA-AES128-GCM-SHA256`,
the database would still support all possible TLSv1.2 ciphers rather than only
ECDHE-RSA-AES128-GCM-SHA256.
The solution is to execute both SSL_CTX_set_ciphersuites() and
SSL_CTX_set_cipher_list() even if the first call succeeds.
This allows the configuration of exactly which TLSv1.3 and TLSv1.2 ciphers to
support.
Note that there is 1 behavior change with this. When specifying only TLSv1.3
ciphers to `--ssl-cipher`, the database will not support any TLSv1.2 cipher.
However, this does not impose a security risk and considering TLSv1.3 is the
modern protocol, this behavior should be fine.
All TLSv1.3 ciphers are still supported if only TLSv1.2 ciphers are specified
through `--ssl-cipher`.
All new code of the whole pull request, including one or several files that are
either new files or modified ones, are contributed under the BSD-new license. I
am contributing on behalf of my employer Amazon Web Services, Inc.
Move memory allocations performed during Sys_var_gtid_binlog_state::do_check
to Sys_var_gtid_binlog_state::global_update where they will be freed before
the latter method returns.
The code erroneously called sec_since_epoch() for dates with zeros,
e.g. '2024-00-01'.
Fixi: adding a test that the date does not have zeros before
calling TIME_to_native().
The code in my_strtoll10_mb2 and my_strtoll10_utf32
could hit undefinite behavior by negation of LONGLONG_MIN.
Fixing to avoid this.
Also, fixing my_strtoll10() in the same style.
The previous reduction produced a redundant warning on
CAST(_latin1'-9223372036854775808' AS SIGNED)
The code in my_strntoull_8bit() and my_strntoull_mb2_or_mb4()
could hit undefinite behavior by negating of LONGLONG_MIN.
Fixing the code to avoid this.
Updated tests: cases with bugs or which cannot be run
with the cursor-protocol were excluded with
"--disable_cursor_protocol"/"--enable_cursor_protocol"
Fix for v.10.5
The loose regex for the MDEV-34539 test ended up
matching the opensuse in the path in buildbot.
Adjust to more complete regex including space,
backtick and \n, which becomes much less common
as a path name.
The loose regex for the MDEV-34539 test ended up
matching the opensuse in the path in buildbot.
Adjust to more complete regex including space,
backtick and \n, which becomes much less common
as a path name.
A CHAR column cannot be longer than 1024, because
Binlog_type_info_fixed_string::Binlog_type_info_fixed_string
replies on this fact - it cannot store binlog metadata for longer columns.
In case of the filename character set mbmaxlen is equal to 5,
so only 1024/5=204 characters can fit into the 1024 limit.
- In strict mode:
Disallowing creation of a CHAR column with octet length grater than 1024.
- In non-strict mode:
Automatically convert CHAR with octet length>1024 into VARCHAR.