The weight scanner routine scanner_next() did not properly handle the case
where a contraction produces no weights (i.e. is ignorable).
Adding a helper routine my_uca_scanner_set_weight() and using
it in all cases:
- A single ASCII character
- A contraction starting with an ASCII character
- A multi-byte character
- A contraction starting with a multi-byte character
Also adding two other helper routines:
- my_uca_scanner_next_expansion_weight()
- my_uca_scanner_set_weight_outside_maxchar()
to avoid using scanner->wbeg directly inside scanner_next().
This reduces the probability of similar future bugs.
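
The idea behind the helpers can be sketched as follows. This is a minimal
illustration only, not the code from the patch: the sketch_* names and the
one-member scanner struct are hypothetical, and it assumes weight strings
are zero-terminated arrays of 16-bit weights.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified scanner: only the weight cursor is modelled. */
typedef struct
{
  const uint16_t *wbeg; /* Next pending weight; points at 0 when exhausted */
} sketch_scanner;

/* An empty weight string, used when nothing is pending. */
static const uint16_t sketch_nochar[]= {0};

/*
  Point the scanner at a zero-terminated weight string.
  Returns the first weight, or 0 if the sequence is ignorable,
  in which case the caller should move on to the next character.
*/
static uint16_t sketch_set_weight(sketch_scanner *scanner,
                                  const uint16_t *weight)
{
  assert(weight != NULL);
  if (weight[0] == 0)
  {
    scanner->wbeg= sketch_nochar; /* Nothing pending */
    return 0;
  }
  scanner->wbeg= weight + 1;      /* The remaining weights are pending */
  return weight[0];
}

/* Return the next pending expansion weight, or 0 when none are left. */
static uint16_t sketch_next_expansion_weight(sketch_scanner *scanner)
{
  if (scanner->wbeg[0] == 0)
    return 0;
  return *scanner->wbeg++;
}
```

With all four cases going through one helper, the ignorable check lives in
a single place and the main loop never has to assign wbeg itself.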
This patch prepares the code for upcoming changes:
MDEV-27009 Add UCA-14.0.0 collations
MDEV-27042 UCA: Resetting contractions to ignorable does not work well
1. Adding "const" qualifiers to return type and parameters in functions:
- my_uca_contraction2_weight()
- my_wmemcmp()
- my_uca_contraction_weight()
- my_uca_scanner_contraction_find()
- my_uca_previous_context_find()
- my_uca_context_weight_find()
2. Adding a helper function my_uca_true_contraction_eq()
3. Changing how scanner->wbeg is set during context weight handling.
It was previously set inside functions:
- my_uca_scanner_contraction_find()
- my_uca_previous_context_find()
Now it's set inside scanner_next(), which makes the code more symmetric
for context-free and context-dependent sequences.
This makes the upcoming fix for MDEV-27042 simpler.
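
A rough sketch of the pattern described in item 3, under loud assumptions:
the sketch_* types and functions are hypothetical, the contraction is
limited to two characters, and the weights in any example data are made up.
The lookup is a pure function over const data, and only the caller touches
the scanner state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified scanner and contraction types. */
typedef struct
{
  const uint16_t *wbeg;  /* Next pending weight */
} sketch_scanner;

typedef struct
{
  uint32_t ch[2];        /* The two characters of the contraction */
  uint16_t weight[4];    /* Zero-terminated weight string */
} sketch_contraction;

/* Pure lookup: returns the matching contraction or NULL, no side effects. */
static const sketch_contraction *
sketch_contraction_find(const sketch_contraction *list, size_t count,
                        uint32_t a, uint32_t b)
{
  for (size_t i= 0; i < count; i++)
  {
    if (list[i].ch[0] == a && list[i].ch[1] == b)
      return &list[i];
  }
  return NULL;
}

/*
  The caller decides what to do with the result. Returns the first
  weight (0 if the contraction is ignorable), or -1 if the pair
  is not a contraction.
*/
static int sketch_next(sketch_scanner *scanner,
                       const sketch_contraction *list, size_t count,
                       uint32_t a, uint32_t b)
{
  const sketch_contraction *c= sketch_contraction_find(list, count, a, b);
  if (c == NULL)
    return -1;
  /* wbeg is assigned here, in the caller, not inside the lookup */
  scanner->wbeg= c->weight[0] ? c->weight + 1 : c->weight;
  return c->weight[0];
}
```

Because the lookup no longer mutates the scanner, its parameters and return
type can be const-qualified, and the ignorable case can be handled at the
single place where wbeg is assigned.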
Additional changes:
1. Adding a fast path for ASCII characters
2. Adding dedicated MY_COLLATION_HANDLERs for collations with no contractions
(for utf8 and for utf8mb4 character sets). The choice between
the full-featured handler and the "no contraction" handler is
made at collation initialization time.
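
The handler choice in item 2 can be pictured roughly like this. It is a
sketch under assumptions, not the actual MY_COLLATION_HANDLER layout: all
sketch_* names are hypothetical, the handler has a single function pointer,
and both compare functions are strcmp() placeholders standing in for the
real weight-based comparisons.

```c
#include <stddef.h>
#include <string.h>

typedef struct sketch_collation sketch_collation;

/* Hypothetical handler: a table of function pointers (one, for brevity). */
typedef struct
{
  int (*compare)(const sketch_collation *cs, const char *a, const char *b);
} sketch_handler;

struct sketch_collation
{
  size_t contraction_count;       /* Number of contractions defined */
  const sketch_handler *handler;  /* Chosen once, at initialization */
};

/* Placeholder: a real version would scan weights, handling contractions. */
static int sketch_compare_full(const sketch_collation *cs,
                               const char *a, const char *b)
{
  (void) cs;
  return strcmp(a, b);
}

/* Placeholder: a real version would skip all contraction lookups. */
static int sketch_compare_nocontractions(const sketch_collation *cs,
                                         const char *a, const char *b)
{
  (void) cs;
  return strcmp(a, b);
}

static const sketch_handler sketch_handler_full=
  {sketch_compare_full};
static const sketch_handler sketch_handler_nocontractions=
  {sketch_compare_nocontractions};

/* Pick the handler once, so the hot path has no per-call branching. */
static void sketch_collation_init(sketch_collation *cs)
{
  cs->handler= cs->contraction_count ? &sketch_handler_full
                                     : &sketch_handler_nocontractions;
}
```

Deciding at initialization keeps the per-character loop of the "no
contraction" variant free of contraction checks.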