Mirror of https://github.com/MariaDB/server.git, synced 2026-04-12 11:26:33 +02:00
Summary:
The charset definition files sql/share/charsets/Index.xml and
mysql-test/std_data/ldml/Index.xml contained duplicate "flag" attributes
on single <collation> elements, violating XML well-formedness rules.
Conforming XML parsers (for example libxml2, which backs xmllint) reject
duplicate attributes, so neither file could be parsed by spec-compliant tools.
Root Cause:
When nopad_bin collations were added, their flags were specified as
XML attributes: flag="binary" flag="nopad". The XML specification
(Section 3.1, Well-Formedness Constraint: Unique Att Spec) prohibits
duplicate attribute names on a single element. MariaDB's custom XML
parser in strings/xml.c happened to process both duplicates because
it handles attributes sequentially in a while loop, but this is
non-standard behavior that breaks interoperability with standard
XML tooling.
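As a rough illustration of the interoperability problem, the sketch below feeds the duplicate-attribute form to Python's standard xml.etree.ElementTree module (backed by expat), used here only as a stand-in for a conforming parser. The helper name is_well_formed and the collation name in the sample strings are hypothetical and not part of the MariaDB tree.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if a conforming parser accepts xml_text, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # expat reports a duplicated attribute name as a parse error
        return False


# Same attribute pattern as described above: flag="binary" flag="nopad"
DUPLICATE = '<collation name="example_nopad_bin" flag="binary" flag="nopad"/>'
SINGLE = '<collation name="example_bin" flag="binary"/>'

if __name__ == "__main__":
    print("duplicate flags accepted:", is_well_formed(DUPLICATE))
    print("single flag accepted:", is_well_formed(SINGLE))
```

A lenient sequential parser would accept both strings, which is why the problem can go unnoticed until an external tool reads the file.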
What the patch does:
Converts all 24 affected <collation> elements from self-closing
elements with duplicate flag attributes to elements with child <flag>
nodes. This follows the pattern already used by many collations in
the same file (e.g., big5_chinese_ci, latin1_swedish_ci,
utf8mb3_general_ci).
Before (invalid XML):
<collation name="latin2_nopad_bin" id="1101" flag="binary" flag="nopad"/>
After (valid XML):
<collation name="latin2_nopad_bin" id="1101">
  <flag>binary</flag>
  <flag>nopad</flag>
</collation>
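The before/after transformation can be sketched mechanically. The commit does not say how the edits were actually made, so the following is only one possible approach, assuming double-quoted attributes and two-space indentation for the child elements. The function name expand_flags is hypothetical.

```python
import re

# A self-closing <collation .../> element and its raw attribute text.
SELF_CLOSING = re.compile(r'<collation\b(?P<attrs>[^<>]*?)\s*/>')
# One name="value" pair (double-quoted values only, an assumption).
ATTR = re.compile(r'([\w:-]+)="([^"]*)"')


def expand_flags(text: str) -> str:
    """Rewrite collations with repeated flag attributes to use <flag> children."""

    def rewrite(match: re.Match) -> str:
        attrs = ATTR.findall(match.group("attrs"))
        flags = [value for name, value in attrs if name == "flag"]
        if len(flags) < 2:
            return match.group(0)  # already well-formed, leave untouched
        kept = "".join(f' {name}="{value}"' for name, value in attrs if name != "flag")
        children = "".join(f"\n  <flag>{value}</flag>" for value in flags)
        return f"<collation{kept}>{children}\n</collation>"

    return SELF_CLOSING.sub(rewrite, text)


BEFORE = '<collation name="latin2_nopad_bin" id="1101" flag="binary" flag="nopad"/>'

if __name__ == "__main__":
    print(expand_flags(BEFORE))
```

Elements with zero or one flag attribute are returned unchanged, so running the sketch over a whole file would only touch the malformed entries.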
No C code changes are required. The _CS_FLAG handler in
strings/ctype.c (around line 621) already processes <flag> child
elements using bitwise OR (|=) to accumulate flags, so both "binary"
(MY_CS_BINSORT) and "nopad" (MY_CS_NOPAD) flags are correctly applied.
Files modified:
- sql/share/charsets/Index.xml (23 collations fixed)
- mysql-test/std_data/ldml/Index.xml (1 collation fixed)
Complete list of 24 collations fixed:
sql/share/charsets/Index.xml:
1. latin2_nopad_bin (id=1101)
2. dec8_nopad_bin (id=1093)
3. cp850_nopad_bin (id=1104)
4. hp8_nopad_bin (id=1096)
5. koi8r_nopad_bin (id=1098)
6. swe7_nopad_bin (id=1106)
7. ascii_nopad_bin (id=1089)
8. cp1251_nopad_bin (id=1074)
9. hebrew_nopad_bin (id=1095)
10. latin7_nopad_bin (id=1103)
11. koi8u_nopad_bin (id=1099)
12. greek_nopad_bin (id=1094)
13. cp1250_nopad_bin (id=1090)
14. cp1257_nopad_bin (id=1082)
15. latin5_nopad_bin (id=1102)
16. armscii8_nopad_bin (id=1088)
17. cp866_nopad_bin (id=1092)
18. keybcs2_nopad_bin (id=1097)
19. macce_nopad_bin (id=1067)
20. macroman_nopad_bin (id=1077)
21. cp852_nopad_bin (id=1105)
22. cp1256_nopad_bin (id=1091)
23. geostd8_nopad_bin (id=1117)
mysql-test/std_data/ldml/Index.xml:
24. ascii2_nopad_bin (id=325)
Validation:
- xmllint --noout passes cleanly on both files after the fix
- Zero duplicate flag attributes remain (verified with grep)
- The fix is consistent with the existing pattern used by other
collations in the same files
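The checks above were done with xmllint and grep. For environments without those tools, a rough Python equivalent of the duplicate-attribute scan might look like the sketch below. It is not what was run for this commit. It assumes double-quoted attribute values and does not skip XML comments or CDATA sections. The name duplicate_attributes is hypothetical.

```python
import re
from collections import Counter

# A start tag (or self-closing tag) with double-quoted attributes.
START_TAG = re.compile(r'<([A-Za-z_][\w.-]*)((?:\s+[\w:.-]+\s*=\s*"[^"]*")*)\s*/?>')
# Captures only the attribute name of each name="value" pair.
ATTR_NAME = re.compile(r'([\w:.-]+)\s*=\s*"[^"]*"')


def duplicate_attributes(xml_text: str) -> list:
    """Return (element, attribute) pairs where an attribute name repeats."""
    found = []
    for tag in START_TAG.finditer(xml_text):
        counts = Counter(ATTR_NAME.findall(tag.group(2)))
        found.extend((tag.group(1), name) for name, n in counts.items() if n > 1)
    return found


if __name__ == "__main__":
    sample = '<collation name="latin2_nopad_bin" id="1101" flag="binary" flag="nopad"/>'
    print(duplicate_attributes(sample))
```

An empty result for both Index.xml files would correspond to the "zero duplicate flag attributes" check listed above.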
Co-Authored-By: Claude AI <noreply@anthropic.com>
| File |
|---|
| ascii2.xml |
| Index.xml |
| latin1.xml |