Problem: The functions my_like_range_xxx() returned
badly formed maximum strings for Asian character sets,
which made problems for storage engines.
Fix:
- Removed a number my_like_range_xxx() implementations,
which were in fact dumplicate code pieces.
- Using generic my_like_range_mb() instead.
- Setting max_sort_char member properly for Asian character sets
- Adding unittest/strings/strings-t.c,
to test that my_like_range_xxx() return well-formed
min and max strings.
Notes:
- No additional tests in mysql/t/ available.
Old tests cover the affected code well enough.
Fix warnings flagged by the new warning option -Wunused-but-set-variable
that was added to GCC 4.6 and that is enabled by -Wunused and -Wall. The
option causes a warning whenever a local variable is assigned to but is
later unused. It also warns about meaningless pointer dereferences.
Although the C standard mandates that sprintf return the number
of bytes written, some very ancient systems (i.e. SunOS 4)
returned a pointer to the buffer instead. Since these systems
are not supported anymore and are hopefully long dead by now,
simply remove the portability wrapper that dealt with this
discrepancy. The autoconf check was causing trouble with GCC.
Apart strict-aliasing warnings, fix the remaining warnings
generated by GCC 4.4.4 -Wall and -Wextra flags.
One major source of warnings was the in-house function my_bcmp
which (unconventionally) took pointers to unsigned characters
as the byte sequences to be compared. Since my_bcmp and bcmp
are deprecated functions whose only difference with memcmp is
the return value, every use of the function is replaced with
memcmp as the special return value wasn't actually being used
by any caller.
There were also various other warnings, mostly due to type
mismatches, missing return values, missing prototypes, dead
code (unreachable) and ignored return values.
strict aliasing violations.
Essentially, the problem is that large parts of the server were
developed in simpler times (last decades, pre C99 standard) when
strict aliasing and compilers supporting such optimizations were
rare to non-existent. Thus, when compiling the server with a modern
compiler that uses strict aliasing rules to perform optimizations,
there are several places in the code that might trigger undefined
behavior.
As evinced by some recent bugs, GCC does a somewhat good of job
misoptimizing such code, but on the other hand also gives warnings
about suspicious code. One problem is that the warnings aren't
always accurate, yet we can't afford to just shut them off as we
might miss real cases. False-positive cases are aggravated mostly
by casts that are likely to trigger undefined behavior.
The solution is to start a cleanup process focused on fixing and
reducing the amount of strict-aliasing related warnings produced
by GCC and others compilers. A good deal of noise reduction can
be achieved by just removing useless casts that are product of
historical cruft and are likely to trigger undefined behavior if
dereferenced.
32bit builds with the --enable-assembler flag (enabled by default)
fail with an error message: undefined reference to `strmov_overlapp'.
Since the fix for bug 48866 we use a home-grown strmov function
instead of the ctpcpy function, but the source file for this
function was missed in the Makefile.am.
The strings/Makefile.am file has been modified to include strmov.c
file into ASSEMBLER_x86 and ASSEMBLER_sparc32 sections.
strmov() is not guaranteed to work correctly on overlapping
source and destination buffers. On some OSes it may work,
but Fedora 12 has a stpcpy() that's not working correctly
on overlapping buffers.
Fixed to use the overlap-safe version of strmov instead.
Re-vitalized the overlap-safe version of strmov.
Problem: the "caseinfo" member of CHARSET_INFO structure was not
initialized for user-defined Unicode collations, which made the
server crash.
Fix: initializing caseinfo properly.
In MySQL when the mapping for space is changed to something other than
0x20 by defining a different collation, then space is not ignored when
comparing two strings.
This was happening because the function that performs the comparison
of two strings while ignoring ending spaces, was comparing the collation
value of a space with the ascii value of the ' ' character. This should
be changed to do comparison between the collated values.
with gcc 4.3.2
This patch fixes a number of GCC warnings about variables used
before initialized. A new macro UNINIT_VAR() is introduced for
use in the variable declaration, and LINT_INIT() usage will be
gradually deprecated. (A workaround is used for g++, pending a
patch for a g++ bug.)
GCC warnings for unused results (attribute warn_unused_result)
for a number of system calls (present at least in later
Ubuntus, where the usual void cast trick doesn't work) are
also fixed.
Problem:
Crash happened with a user-defined utf8 collation,
on attempt to insert a value longer than the column
to store.
Reason:
The "ctype" member was not initialized (NULL) when
allocating a user-defined utf8 collation, so an attempt
to call my_ctype(cs, *str) to check if we loose any important
data when truncating the value made the server crash.
Fix:
Initializing tge "ctype" member to a proper value.
mysql-test/r/ctype_ldml.result
Adding tests
mysql-test/t/ctype_ldml.test
Adding tests
strings/ctype-uca.c
Adding initialization of "ctype" member.
modified:
mysql-test/r/ctype_ldml.result
mysql-test/t/ctype_ldml.test
strings/ctype-uca.c
on cp932 and sjis environment.
Problem: case conversion erroneously changes the second bytes
of multi-byte sequences because single-byte functions were
called in a mistake.
Fix: call multi-byte aware functions instead.
The reference manual has instructions for adding new character
sets, and refers to the string/CHARSET_INFO.txt file. This file
is currently not present in the distribution.
Modify the build to include this file in the distribution.
- Remove bothersome warning messages. This change focuses on the warnings
that are covered by the ignore file: support-files/compiler_warnings.supp.
- Strings are guaranteed to be max uint in length
- Remove bothersome warning messages. This change focuses on the warnings
that are covered by the ignore file: support-files/compiler_warnings.supp.
- Strings are guaranteed to be max uint in length
The MONTHNAME/DAYNAME functions
returns binary string, so the LOWER/UPPER functions
are not effective on the result of MONTHNAME/DAYNAME call.
Character set of the MONTHNAME/DAYNAME function
result has been changed to connection character set.
Problem:
XML syntax parser allowed to use quoted strings as attribute names,
and tried to put them into parser state stack instead of identifiers.
After that parser failed, if quoted string contained some slash characters.
Fix:
- Disallowing quoted strings in regular tags.
- Allowing quoted string in DOCTYPE declararion, but
don't push it into parse state stack (just skip it).
When the fractional part in a multiplication of DECIMALs
overflowed, we truncated the first operand rather than the
longest. Now truncating least significant places instead
for more precise multiplications.
(Queuing at demand of Trudy/Davi.)
Grouping or ordering of long values in not indexed BLOB/TEXT columns
with GBK or BIG5 charsets crashes the server.
MySQL server uses sorting (the filesort procedure) in the temporary
table to evaluate the GROUP BY clause in case of lack of suitable index.
That procedure takes into account only first @max_sort_length bytes
(system variable, usually 1024) of TEXT/BLOB sorting key string.
The my_strnxfrm_gbk and my_strnxfrm_big5 fill temporary keys
with data of whole blob length instead of @max_sort_length bytes
length. That buffer overrun has been fixed.
Problem: some collation handlers called incorrect version
of my_like_range_xxx(), which led to wrong min_str and max_str,
so like range optimizer threw away good records.
Fix: changing the wrong handlers to call proper version of
my_like_range_xxx().