Commit graph

216 commits

Author SHA1 Message Date
Monty
a206658b98 Change CHARSET_INFO character set and collaction names to LEX_CSTRING
This change removed 68 explict strlen() calls from the code.

The following renames was done to ensure we don't use the old names
when merging code from earlier releases, as using the new variables
for print function could result in crashes:
- charset->csname renamed to charset->cs_name
- charset->name renamed to charset->coll_name

Almost everything where mechanical changes except:
- Changed to use the new Protocol::store(LEX_CSTRING..) when possible
- Changed to use field->store(LEX_CSTRING*, CHARSET_INFO*) when possible
- Changed to use String->append(LEX_CSTRING&) when possible

Other things:
- There where compiler issues with ensuring that all character set names
  points to the same string: gcc doesn't allow one to use integer constants
  when defining global structures (constant char * pointers works fine).
  To get around this, I declared defines for each character set name
  length.
2021-05-19 22:54:07 +02:00
Sergei Golubchik
25d9d2e37f Merge branch 'bb-10.4-release' into bb-10.5-release 2021-02-15 16:43:15 +01:00
Sergei Golubchik
00a313ecf3 Merge branch 'bb-10.3-release' into bb-10.4-release
Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution"
was null-merged. 10.4 version of the fix is coming up separately
2021-02-12 17:44:22 +01:00
Sergei Golubchik
60ea09eae6 Merge branch '10.2' into 10.3 2021-02-01 13:49:33 +01:00
Daniel Black
f2fea295b4 ucs2: cppcheck - add va_end 2021-01-21 16:46:59 +11:00
Marko Mäkelä
cf87f3e08c Merge 10.4 into 10.5 2020-08-14 11:33:35 +03:00
Marko Mäkelä
2f7b37b021 Merge 10.3 into 10.4, except MDEV-22543
Also, fix GCC -Og -Wmaybe-uninitialized in run_backup_stage()
2020-08-13 18:48:41 +03:00
Marko Mäkelä
4bd56a697f Merge 10.2 into 10.3 2020-08-13 18:18:25 +03:00
Marko Mäkelä
31aef3ae99 Fix GCC 10.2.0 -Og -Wmaybe-uninitialized
For some reason, GCC emits more -Wmaybe-uninitialized warnings
when using the flag -Og than when using -O2. Many of the warnings
look genuine.
2020-08-11 15:58:16 +03:00
Monty
dbcd3384e0 MDEV-7947 strcmp() takes 0.37% in OLTP RO
This patch ensures that all identical character sets shares the same
cs->csname.
This allows us to replace strcmp() in my_charset_same() with comparisons
of pointers. This fixes a long standing performance issue that could cause
as strcmp() for every item sent trough the protocol class to the end user.

One consequence of this patch is that we don't allow one to add a character
definition in the Index.xml file that changes the csname of an existing
character set. This is by design as changing character set names of existing
ones is extremely dangerous, especially as some storage engines just records
character set numbers.

As we now have a hash over character set's csname, we can in the future
use that for faster access to a specific character set. This could be done
by changing the hash to non unique and use the hash to find the next
character set with same csname.
2020-07-23 10:54:33 +03:00
Alexander Barkov
cfe5ee90c8 MDEV-22043 Special character leads to assertion in my_wc_to_printable_generic on 10.5.2 (debug)
The code did not take into account that:
- U+005C (backslash) can occupy more than mbminlen characters (e.g. in sjis)
- Some character sets do not have a code for U+005C (e.g. swe7)

Adding a new function my_wc_to_printable into MY_CHARSET_HANDLER to
cover all special cases easier.
2020-05-09 16:01:30 +04:00
Alexander Barkov
f1e13fdc8d MDEV-21581 Helper functions and methods for CHARSET_INFO 2020-01-28 12:29:23 +04:00
Oleksandr Byelkin
c07325f932 Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
Marko Mäkelä
be85d3e61b Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
Marko Mäkelä
26a14ee130 Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru
cb248f8806 Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
Vicențiu Ciorbaru
5543b75550 Update FSF Address
* Update wrong zip-code
2019-05-11 21:29:06 +03:00
Alexander Barkov
a8efe7ab1f MDEV-17502 MDEV-17474 Change Unicode xxx_general_ci and xxx_bin collation implementation to "inline" style 2018-10-19 14:35:01 +04:00
Alexander Barkov
6eae037c4c MDEV-17474 Change Unicode collation implementation from "handler" to "inline" style 2018-10-17 06:44:40 +04:00
Marko Mäkelä
05459706f2 Merge 10.2 into 10.3 2018-08-03 15:57:23 +03:00
Marko Mäkelä
ef3070e997 Merge 10.1 into 10.2 2018-08-02 08:19:57 +03:00
Oleksandr Byelkin
cb5952b506 Merge branch '10.0' into bb-10.1-merge-sanja 2018-07-25 22:24:40 +02:00
Alexander Barkov
e2ac4098ed Simplify caseup() and casedn() in charsets
After the MDEV-13118 fix there's no code in the server that
wants caseup/casedn to change the argument in place for simple
charsets.  Let's remove this logic and always return the result in a
new string for all charsets, both simple and complex.

1. Removing the optimization that *some* character sets used in casedn()
  and caseup(), which allowed (and required) to change the case in-place,
  overwriting the string passed as the "src" argument.
  Now all CHARSET_INFO's work in the same way:
  non of them change the source string in-place, all of them now convert
  case from the source string to the destination string, leaving
  the source string untouched.

2. Adding "const" qualifier to the "char *src" parameter
   to caseup() and casedn().

3. Removing duplicate implementations in ctype-mb.c.
  Now both caseup() and casedn() implementations for all CJK character sets
  use internally the same function my_casefold_mb()
  (the former my_casefold_mb_varlen()).

4. Removing the "unused" attribute from parameters of some my_case{up|dn}_xxx()
   implementations, as the affected parameters are now *used* in the code.
   Previously these parameters were used only in DBUG_ASSERT().
2018-07-19 13:02:14 +04:00
luz.paz
3dd01669b4 Misc. typos
Found via `codespell -i 3 -w --skip="./debian/po" -I ../mariadb-server-word-whitelist.txt  ./cmake/ ./debian/ ./Docs/ ./include/ ./man/ ./plugin/ ./strings/`
2018-04-05 15:26:57 +04:00
Alexander Barkov
c7a2f23a7b Merge remote-tracking branch 'origin/bb-10.2-ext' into 10.3 2018-01-29 12:44:20 +04:00
Vladislav Vaintroub
9891ee5a2a Fix and reenable Windows compiler warning C4800 (size_t conversion). 2018-01-26 10:37:46 +00:00
Marko Mäkelä
2c1067166d Merge bb-10.2-ext into 10.3 2017-10-04 08:24:06 +03:00
Vladislav Vaintroub
7354dc6773 MDEV-13384 - misc Windows warnings fixed 2017-09-28 17:20:46 +00:00
Monty
536215e32f Added DBUG_ASSERT_AS_PRINTF compile flag
If compiling a non DBUG binary with
-DDBUG_ASSERT_AS_PRINTF asserts will be
changed to printf + stack trace (of stack
trace are enabled).

- Changed #ifndef DBUG_OFF to
  #ifdef DBUG_ASSERT_EXISTS
  for those DBUG_OFF that was just used to enable
  assert
- Assert checking that could greatly impact
  performance where changed to DBUG_ASSERT_SLOW which
  is not affected by DBUG_ASSERT_AS_PRINTF
- Added one extra option to my_print_stacktrace() to
  get more silent in case of stack trace printing as
  part of assert.
2017-08-24 01:05:50 +02:00
Sergei Golubchik
4a5d25c338 Merge branch '10.1' into 10.2 2016-12-29 13:23:18 +01:00
kevg
780db8e252 fix build and some warnings 2016-11-24 17:36:02 +03:00
Alexander Barkov
1d8eafbeaf Removing the unused function my_bincmp() from strings/ctype-ucs2.c 2016-11-24 15:55:55 +04:00
Alexey Botchkov
27025221fe MDEV-9143 JSON_xxx functions.
strings/json_lib.c added as a JSON library.
        SQL frunction added with sql/item_jsonfunc.h/cc
2016-10-19 14:10:03 +04:00
Alexander Barkov
5058ced5df MDEV-7769 MY_CHARSET_INFO refactoring# On branch 10.2
Part 3 (final): removing MY_CHARSET_HANDLER::well_formed_len().
2016-10-10 14:36:09 +04:00
Sergei Golubchik
66d9696596 Merge branch '10.0' into 10.1 2016-09-28 17:55:28 +02:00
Sergei Golubchik
77ce4ead81 Merge branch '5.5' into 10.0 2016-09-27 09:21:19 +02:00
Sergei Golubchik
a229091953 potential signedness issue
different fix for 07a33cdcef:

Bug #23296299 : HANDLE_FATAL_SIGNAL (SIG=11) IN MY_TOSORT_UTF32
2016-09-12 16:41:54 +02:00
Alexander Barkov
ee19806b8e MDEV-9711 NO PAD collations
Based on the patch from Daniil Medvedev (a Google Summer of Code task)
2016-09-06 12:50:02 +04:00
Alexander Barkov
e7ff281d2e MDEV-6353 my_ismbchar() and my_mbcharlen() refactoring 2016-05-17 15:27:10 +04:00
Alexander Barkov
e09299511e MDEV-9665 Remove cs->cset->ismbchar()
Using a more powerfull cs->cset->charlen() instead.
2016-03-16 10:55:12 +04:00
Alexander Barkov
82bec8bfdf MDEV-9265 SuSE patches: Suspicious implicit sign extension 2015-12-15 11:04:51 +04:00
Alexander Barkov
af3c67056d MDEV-9265 SuSE patches: Suspicious implicit sign extension 2015-12-15 10:57:28 +04:00
Alexander Barkov
310c718cff MDEV-9178 Wrong result for CAST(CONVERT('1IJ3' USING ucs2) AS SIGNED)
Also, fixing compilation warnings in ctype-mb.ic (Windows).
2015-11-24 22:47:42 +04:00
Alexander Barkov
78b80cb6ba Adding MY_CHARSET_HANDLER::native_to_mb().
This is a pre-requisite patch for:
- MDEV-8433 Make field<'broken-string' use indexes
- MDEV-8625 Bad result set with ignorable characters when using a prefix key
- MDEV-8626 Bad result set with contractions when using a prefix key
2015-08-14 18:34:41 +04:00
Alexander Barkov
e4f8cea356 MDEV-8419 utf32: compare broken bytes as "greater than any non-broken character" 2015-07-07 09:15:58 +04:00
Alexander Barkov
3a606ba210 Fixing a bug in MDEV-8418 (utf16, utf16le) and MDEV-8417 (utf8mb4).
Fixing non-BMP characters to have the same weight, as it was before
MDEV-8418 and MDEV-8417.
2015-07-06 18:59:33 +04:00
Alexander Barkov
b2e324a21f MDEV-8416 ucs2: compare broken bytes as "greater than any non-broken character"
MDEV-8418 utf16: compare broken bytes as "greater than any non-broken character"
2015-07-06 15:50:56 +04:00
Alexander Barkov
197afb413f MDEV-6566 Different INSERT behaviour on bad bytes with and without character set conversion 2015-03-13 16:51:36 +04:00
Alexander Barkov
b1b6101af2 A preparatory patch for MDEV-6566.
Adding a new virtual function MY_CHARSET_HANDLER::copy_abort().
Moving character set specific code into the correspoding implementations
(for simple, multi-byte and mbmaxlen>1 character sets).
2015-03-02 18:24:22 +04:00
Sergei Golubchik
2ae7541bcf cleanup: s/const CHARSET_INFO/CHARSET_INFO/
as CHARSET_INFO is already const, using const on it
is redundant and results in compiler warnings (on Windows)
2014-12-04 10:41:51 +01:00