mariadb/include
Alexander Barkov 6075f12c65 MDEV-31071 Refactor case folding data types in Unicode collations
This is a non-functional change. It changes how case folding data
and weight data (for simple Unicode collations) are stored:

- Removing data types MY_UNICASE_CHARACTER, MY_UNICASE_INFO
- Using data types MY_CASEFOLD_CHARACTER, MY_CASEFOLD_INFO instead.

This patch changes simple Unicode collations in a way similar to
how MDEV-30695 previously changed Asian collations.

No new MTR tests are needed. The underlying code is thoroughly
covered by a number of ctype_*_ws.test and ctype_*_casefold.test
files, which were added recently in preparation
for this change.

Old and new Unicode data layout
-------------------------------

Case folding data is now stored in separate tables
consisting of MY_CASEFOLD_CHARACTER elements with two members:

    typedef struct casefold_info_char_t
    {
      uint32 toupper;
      uint32 tolower;
    } MY_CASEFOLD_CHARACTER;

while weight data (for simple non-UCA collations xxx_general_ci
and xxx_general_mysql500_ci) is stored in separate arrays of
uint16 elements.

Before this change, case folding data and simple weight data were
stored together in tables whose elements had three members:

    typedef struct unicase_info_char_st
    {
      uint32 toupper;
      uint32 tolower;
      uint32 sort;          /* weights for simple collations */
    } MY_UNICASE_CHARACTER;

This data format was redundant, because weights (the "sort" member) were
needed only for these two simple Unicode collations:
- xxx_general_ci
- xxx_general_mysql500_ci

Adding case folding information for Unicode-14.0.0 using the old
format would waste memory on a "sort" member that is never used.

Detailed changes
----------------
- Changing the underlying data types as described above

- Including unidata-dump.c in the sources.
  This program was previously used to dump UnicodeData.txt
  (e.g. https://www.unicode.org/Public/14.0.0/ucd/UnicodeData.txt)
  into MySQL / MariaDB source files.
  It was originally written in 2002, but until now it has not been
  distributed together with the MySQL / MariaDB sources.

- Removing the old-format Unicode data previously dumped from UnicodeData.txt
  (versions 3.0.0 and 5.2.0) from ctype-utf8.c.
  Adding Unicode data in the new format in separate header files,
  to make the code easier to maintain:

    - ctype-unicode300-casefold.h
    - ctype-unicode300-casefold-tr.h
    - ctype-unicode300-general_ci.h
    - ctype-unicode300-general_mysql500_ci.h
    - ctype-unicode520-casefold.h

- Adding a new file ctype-unidata.c as an aggregator for
  the header files listed above.
2023-04-18 11:29:25 +04:00
atomic
mysql Merge 10.6 into 10.8 2023-04-12 15:50:08 +03:00
providers
aligned.h
aria_backup.h
assume_aligned.h
big_endian.h
byte_order_generic.h
byte_order_generic_x86.h
byte_order_generic_x86_64.h
CMakeLists.txt post fix for "move alloca() definition from all *.h files to one new header file" 2023-03-08 17:36:36 +01:00
decimal.h
dur_prop.h
errmsg.h
ft_global.h
handler_ername.h
handler_state.h
hash.h
heap.h Merge 10.4 into 10.5 2023-01-03 17:08:42 +02:00
ilist.h Merge 10.6 into 10.8 2023-02-10 13:43:53 +02:00
json_lib.h Merge 10.8 into 10.9 2023-01-10 14:50:58 +02:00
keycache.h
lf.h
little_endian.h
m_ctype.h MDEV-31071 Refactor case folding data types in Unicode collations 2023-04-18 11:29:25 +04:00
m_string.h Merge 10.5 into 10.6 2023-04-11 16:15:19 +03:00
ma_dyncol.h Merge 10.4 into 10.5 2023-01-03 17:08:42 +02:00
maria.h
mariadb_capi_rename.h
my_alarm.h
my_alloc.h
my_alloca.h Post-MDEV-30700: moving alloca() definitions from all *.h files to new header file 2023-03-13 17:41:06 +01:00
my_atomic.h
my_atomic_wrapper.h Merge 10.5 into 10.6 2023-02-10 13:03:01 +02:00
my_attribute.h
my_base.h Merge 10.8 into 10.9 2023-01-10 14:50:58 +02:00
my_bit.h
my_bitmap.h
my_byteorder.h
my_check_opt.h
my_compare.h Merge 10.4 into 10.5 2023-01-03 17:08:42 +02:00
my_compiler.h
my_counter.h Apply clang-tidy to remove empty constructors / destructors 2023-02-09 16:09:08 +02:00
my_cpu.h
my_crypt.h
my_dbug.h
my_decimal_limits.h header typos 2022-12-20 08:55:48 +11:00
my_default.h
my_dir.h Merge 10.6 into 10.7 2023-01-04 14:52:25 +02:00
my_getopt.h
my_global.h Merge 10.8 into 10.9 2023-03-17 06:58:33 +02:00
my_handler_errors.h
my_libwrap.h
my_list.h
my_md5.h
my_minidump.h
my_net.h
my_nosys.h
my_pthread.h
my_rdtsc.h Merge 10.7 into 10.8 2023-01-10 14:42:50 +02:00
my_rnd.h
my_service_manager.h
my_stack_alloc.h
my_stacktrace.h
my_sys.h Merge 10.9 into 10.10 2023-03-17 06:59:46 +02:00
my_time.h
my_tree.h
my_uctype.h
my_user.h
my_valgrind.h
my_xml.h
myisam.h Merge 10.4 into 10.5 2023-01-03 17:08:42 +02:00
myisamchk.h
myisammrg.h Merge 10.4 into 10.5 2023-01-03 17:08:42 +02:00
myisampack.h
mysql.h Merge 10.6 into 10.7 2023-01-04 14:52:25 +02:00
mysql_com.h
mysql_com_server.h
mysql_embed.h
mysql_time.h
mysql_version.h.in
mysqld_default_groups.h
mysys_err.h
pack.h
password.h
pfs_file_provider.h
pfs_idle_provider.h
pfs_memory_provider.h
pfs_metadata_provider.h
pfs_socket_provider.h
pfs_stage_provider.h
pfs_statement_provider.h
pfs_table_provider.h
pfs_thread_provider.h
pfs_transaction_provider.h
probes_mysql.d.base
probes_mysql.h
probes_mysql_nodtrace.h.in
queues.h header typos 2022-12-20 08:55:48 +11:00
rijndael.h
scope.h
service_versions.h
source_revision.h.in
span.h Apply clang-tidy to remove empty constructors / destructors 2023-02-09 16:09:08 +02:00
sql_common.h
ssl_compat.h Merge 10.6 into 10.8 2023-04-12 15:50:08 +03:00
sslopt-case.h
sslopt-longopts.h
sslopt-vars.h
t_ctype.h
thr_alarm.h
thr_lock.h
thr_timer.h
typelib.h
violite.h
waiting_threads.h header typos 2022-12-20 08:55:48 +11:00
welcome_copyright_notice.h
wqueue.h
wsrep.h