Some collations were causing IBMDB2I to report
inaccurate key range estimations to the optimizer
for LIKE clauses that select substrings. This can
be seen by running EXPLAIN. This problem primarily
affects multi-byte and Unicode character sets.
This patch involves substantial changes to several
modules to address a number of problems with
character set and collation handling. These problems
have been or are being fixed, and a comprehensive
test has been included which should provide much
better coverage than was available before. This test
is enabled only for IBM i 6.1, because that version
has support for the greatest number of collations.
mysql-test/suite/ibmdb2i/r/ibmdb2i_collations.result:
Bug#45803 Inaccurate estimates for partial key values with IBMDB2I
Result file for test case.
mysql-test/suite/ibmdb2i/t/ibmdb2i_collations.test:
Bug#45803 Inaccurate estimates for partial key values with IBMDB2I
Tests for character sets and collations. This test
is enabled only for IBM i 6.1, because that version
has support for the greatest number of collations.
storage/ibmdb2i/db2i_conversion.cc:
Bug#45803 Inaccurate estimates for partial key values with IBMDB2I
- Added support in convertFieldChars to enable records_in_range
to determine how many substitute characters were inserted and
to suppress conversion warnings.
- Fixed bug which was causing all multi-byte and Unicode fields
to be created as UTF16 (CCSID 1200) fields in DB2. The corrected
code will now create UCS2 fields as UCS2 (CCSID 13488), UTF8
fields (except for utf8_general_ci) as UTF8 (CCSID 1208), and
all other multi-byte or Unicode fields as UTF16. This will only
affect tables that are newly created through the IBMDB2I storage
engine. Existing IBMDB2I tables will retain the original CCSID
until recreated. The previous behavior (creating these fields
as UTF16) is believed to be functionally correct, but it may
negatively impact performance by causing unnecessary character
conversion. Additionally, users
accessing IBMDB2I tables through DB2 should be aware that mixing
tables created before and after this change may require extra type
casts or other workarounds. For this reason, users who have
existing IBMDB2I tables using a Unicode collation other than
utf8_general_ci are encouraged to recreate their tables (e.g.
ALTER TABLE t1 ENGINE=IBMDB2I) in order to get the updated CCSIDs
associated with their DB2 tables.
- Improved error reporting for unsupported character sets by forcing
a check for the iconv conversion table at table creation time,
rather than at data access time.
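The CCSID selection described in the second item above can be
sketched as follows. This is a minimal illustration only, not the
code in db2i_conversion.cc: the function name
ccsidForUnicodeField, the enum names, and the use of plain
strings for the character set and collation are assumptions made
for the example. It covers only multi-byte and Unicode fields;
single-byte character sets are outside its scope.

```cpp
#include <cassert>
#include <string>

// CCSID values quoted in the commit message above.
enum Ccsid {
    CCSID_UTF16 = 1200,
    CCSID_UTF8  = 1208,
    CCSID_UCS2  = 13488
};

// Hypothetical helper (not part of the IBMDB2I sources): choose the
// DB2 CCSID for a newly created multi-byte or Unicode field.
inline int ccsidForUnicodeField(const std::string& charset,
                                const std::string& collation)
{
    assert(!charset.empty());

    // UCS2 fields are created as UCS2.
    if (charset == "ucs2")
        return CCSID_UCS2;

    // UTF8 fields are created as UTF8, except for utf8_general_ci.
    if (charset == "utf8" && collation != "utf8_general_ci")
        return CCSID_UTF8;

    // utf8_general_ci and all other multi-byte or Unicode fields
    // stay UTF16, which is what every such field used before.
    return CCSID_UTF16;
}
```

Under these assumptions, only fields that fall through to the last
branch keep the CCSID they had before the change, which is why
tables using utf8_general_ci do not need to be recreated.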
storage/ibmdb2i/db2i_myconv.h:
Bug#45803 Inaccurate estimates for partial key values with IBMDB2I
Set errno when iconv fails.
storage/ibmdb2i/db2i_rir.cc:
Bug#45803 Inaccurate estimates for partial key values with IBMDB2I
Significant improvements were made to the records_in_range code
that handles partial length string data in keys for optimizer plan
estimation. Previously, to obtain an estimate for a partial key
value, the implementation would perform any necessary character
conversion and then attempt to determine the unpadded length of
the partial key by searching for the minimum or maximum sort
character. While this algorithm was sufficient for most single-byte
character sets, it did not treat Unicode and multi-byte strings
correctly. Furthermore, due to an operating system limitation,
partial keys having UTF8 collations (ICU sort sequences in DB2)
could not be estimated with this method.
With this patch, the code no longer attempts to explicitly determine
the unpadded length of the key. Instead, the entire key is converted
(if necessary), including padding, and then passed to the operating
system for estimation. Depending on the source and target character
sets and collations, additional logic is required to correctly
handle cases in which MySQL pads the key with values that are
unconvertible or weighted differently. The bulk of the patch exists
to implement this additional logic.
storage/ibmdb2i/ha_ibmdb2i.h:
Bug#45803 Inaccurate estimates for partial key values with IBMDB2I
The convertFieldChars declaration was updated to support additional
optional behaviors.
wmemset was being used to fill the row buffers.
wmemset was intended to fill the buffer with
16-bit UCS2 pad values. However, the 64-bit
version of wmemset uses 32-bit wide characters
and thus filled the buffer incorrectly. In some
cases, the null byte map would be overwritten,
causing ctype_utf8.test and ibmdb2i_rir.test to
fail, giving the error message CPF5035.
This patch eliminates the use of wmemset to fill
the row buffer. wmemset has been replaced with
memset16, which always fills memory with 16-bit
values.
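The difference can be illustrated with a short sketch of a
fixed-width fill. This is not the memset16 from db2i_misc.h; the
name memset16_sketch and its signature are assumptions made for
the example. The point is that the element width is fixed at 16
bits, whereas wmemset writes sizeof(wchar_t) bytes per element,
which is 4 bytes on platforms with a 32-bit wchar_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for memset16: store `count` copies of a
// 16-bit value starting at `dest`. Exactly 2 * count bytes are
// written, regardless of how wide wchar_t is on the platform.
inline void* memset16_sketch(void* dest, std::uint16_t value,
                             std::size_t count)
{
    assert(dest != nullptr || count == 0);

    unsigned char* out = static_cast<unsigned char*>(dest);
    for (std::size_t i = 0; i < count; ++i) {
        // memcpy avoids alignment assumptions about the row buffer.
        std::memcpy(out + i * sizeof(value), &value, sizeof(value));
    }
    return dest;
}
```

With a 32-bit wchar_t, wmemset(buf, pad, n) writes 4 * n bytes,
so a caller that sized the field for n 16-bit characters overruns
it by 2 * n bytes. That is the kind of overrun that could reach
the null byte map.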
storage/ibmdb2i/db2i_conversion.cc:
Bug#44811 Tests with utf8 charset fail with ibmdb2i on 64bit MySQL
Eliminate the use of wmemset to fill
the row buffer. Replace wmemset with
memset16, which always fills memory
with 16-bit values.
storage/ibmdb2i/db2i_misc.h:
Bug#44811 Tests with utf8 charset fail with ibmdb2i on 64bit MySQL
Eliminate the use of wmemset to fill
the row buffer. Replace wmemset with
memset16, which always fills memory
with 16-bit values.
When a user selected an unsupported character set for an
IBMDB2I table, error 2501 or 2511 could be returned,
giving the appearance of an internal programming error.
This patch consolidates these errors into a single descriptive
error message for the common case of an unsupported character
set.
The new error number is 2504 and indicates a user error.
The errors 2501 and 2511 remain to indicate cases of internal
programming errors.
storage/ibmdb2i/db2i_charsetSupport.cc:
Bug#44232 Error msg should be improved when collation not supported.
Consolidate errors 2501 and 2511 into a single
descriptive error message for the common case
of an unsupported character set.
storage/ibmdb2i/db2i_conversion.cc:
Bug#44232 Error msg should be improved when collation not supported.
Consolidate errors 2501 and 2511 into a single
descriptive error message for the common case
of an unsupported character set.
storage/ibmdb2i/db2i_errors.cc:
Bug#44232 Error msg should be improved when collation not supported.
Consolidate errors 2501 and 2511 into a single
descriptive error message for the common case
of an unsupported character set.
storage/ibmdb2i/db2i_errors.h:
Bug#44232 Error msg should be improved when collation not supported.
Consolidate errors 2501 and 2511 into a single
descriptive error message for the common case
of an unsupported character set.
Modify plugins.m4 configuration framework so that plugins which are
not built still get added to the source distribution during make dist.
This came up now because we can only build ibmdb2i on i5/OS, and we
can't bootstrap our source dist on that platform. The solution is to
specify DIST_SUBDIRS, which contains all plugins, separately from
SUBDIRS, which contains only the plugins that are actually built.
This ibmdb2i code is from the ibmdb2i-ga3-src.zip file, with a patch
to plug.in to disable the plugin if the PASE environment isn't available.