Commit graph

80 commits

Author SHA1 Message Date
Alexander Barkov
4382ff0917 Merging from mysql-5.1-bugteam 2010-07-26 10:47:03 +04:00
Alexander Barkov
e497d6e6e1 Bug#45012 my_like_range_cp932 generates invalid string
Problem: The functions my_like_range_xxx() returned
badly formed maximum strings for Asian character sets,
which made problems for storage engines.

Fix: 
- Removed a number my_like_range_xxx() implementations,
  which were in fact dumplicate code pieces.
- Using generic my_like_range_mb() instead.
- Setting max_sort_char member properly for Asian character sets
- Adding unittest/strings/strings-t.c, 
  to test that my_like_range_xxx() return well-formed 
  min and max strings.

Notes:

- No additional tests in mysql/t/ available.
  Old tests cover the affected code well enough.
2010-07-26 09:06:18 +04:00
Davi Arnaut
b3d22cef93 WL#5486: Remove code for unsupported platforms
Remove MS-DOS specific code.
2010-07-15 08:16:06 -03:00
Alexander Barkov
5f6c630907 WL#4583 Case conversion in Asian character sets
modified:
  include/m_ctype.h
  - Changing type for tolower/toupper members, to store values >= 0xFFFF.
  - Adding function prototypes

  mysql-test/r/ctype_big5.result
  mysql-test/r/ctype_cp932_binlog_stm.result
  mysql-test/r/ctype_eucjpms.result*
  mysql-test/r/ctype_euckr.result
  mysql-test/r/ctype_gb2312.result
  mysql-test/r/ctype_gbk.result
  mysql-test/r/ctype_sjis.result
  mysql-test/r/ctype_ujis.result
  mysql-test/t/ctype_big5.test
  mysql-test/t/ctype_cp932_binlog_stm.test
  mysql-test/t/ctype_eucjpms.test
  mysql-test/t/ctype_euckr.test
  mysql-test/t/ctype_gb2312.test
  mysql-test/t/ctype_gbk.test
  mysql-test/t/ctype_sjis.test
  mysql-test/t/ctype_ujis.test
  -  Adding tests

  strings/ctype-big5.c
  strings/ctype-cp932.c
  strings/ctype-euc_kr.c
  strings/ctype-eucjpms.c
  strings/ctype-gb2312.c
  strings/ctype-gbk.c
  strings/ctype-sjis.c
  - Adding upper/lower case conversion data

  strings/ctype-mb.c
  - Adding handling of upper/lower conversion for multi-byte characters.

  strings/ctype-ujis.c
  - Implementing shared upper/lower conversion
    functions  for ujis and eucjpms
  - Adding upper/lower case conversion data for ujis
2010-01-14 15:17:57 +04:00
mkindahl@dl145h.mysql.com
759d2f6658 Merge dl145h.mysql.com:/data0/mkindahl/mysql-5.0-rpl
into  dl145h.mysql.com:/data0/mkindahl/mysql-5.1-rpl
2008-02-20 19:49:26 +01:00
bar@mysql.com/bar.myoffice.izhnet.ru
3a438cf57a Bug#32510 LIKE search fails with indexed 'eucjpms' and 'ujis' char column
Problem: some collation handlers called incorrect version
of my_like_range_xxx(), which led to wrong min_str and max_str,
so like range optimizer threw away good records.
Fix: changing the wrong handlers to call proper version of
my_like_range_xxx().
2008-02-04 11:10:40 +04:00
ramil/ram@ramil.myoffice.izhnet.ru
b5347808ed Merge mysql.com:/home/ram/work/b31070/b31070.5.0
into  mysql.com:/home/ram/work/b31070/b31070.5.1
2007-10-04 12:10:15 +05:00
ramil/ram@ramil.myoffice.izhnet.ru
fa252fec68 Merge mysql.com:/home/ram/work/b31070/b31070.4.1
into  mysql.com:/home/ram/work/b31070/b31070.5.0
2007-10-04 10:54:51 +05:00
ramil/ram@mysql.com/ramil.myoffice.izhnet.ru
bc9b4834e1 Fix for bug #31069: crash in 'sounds like'
and for bug #31070: crash during conversion of charsets

Problem: passing a 0 byte length string to some my_mb_wc_XXX() 
functions leads to server crash due to improper argument check.

Fix: properly check arguments passed to my_mb_wc_XXX() functions.
2007-10-04 10:20:00 +05:00
monty@mysql.com/narttu.mysql.fi
088e2395f1 WL#3817: Simplify string / memory area types and make things more consistent (first part)
The following type conversions was done:

- Changed byte to uchar
- Changed gptr to uchar*
- Change my_string to char *
- Change my_size_t to size_t
- Change size_s to size_t

Removed declaration of byte, gptr, my_string, my_size_t and size_s. 

Following function parameter changes was done:
- All string functions in mysys/strings was changed to use size_t
  instead of uint for string lengths.
- All read()/write() functions changed to use size_t (including vio).
- All protocoll functions changed to use size_t instead of uint
- Functions that used a pointer to a string length was changed to use size_t*
- Changed malloc(), free() and related functions from using gptr to use void *
  as this requires fewer casts in the code and is more in line with how the
  standard functions work.
- Added extra length argument to dirname_part() to return the length of the
  created string.
- Changed (at least) following functions to take uchar* as argument:
  - db_dump()
  - my_net_write()
  - net_write_command()
  - net_store_data()
  - DBUG_DUMP()
  - decimal2bin() & bin2decimal()
- Changed my_compress() and my_uncompress() to use size_t. Changed one
  argument to my_uncompress() from a pointer to a value as we only return
  one value (makes function easier to use).
- Changed type of 'pack_data' argument to packfrm() to avoid casts.
- Changed in readfrm() and writefrom(), ha_discover and handler::discover()
  the type for argument 'frmdata' to uchar** to avoid casts.
- Changed most Field functions to use uchar* instead of char* (reduced a lot of
  casts).
- Changed field->val_xxx(xxx, new_ptr) to take const pointers.

Other changes:
- Removed a lot of not needed casts
- Added a few new cast required by other changes
- Added some cast to my_multi_malloc() arguments for safety (as string lengths
  needs to be uint, not size_t).
- Fixed all calls to hash-get-key functions to use size_t*. (Needed to be done
  explicitely as this conflict was often hided by casting the function to
  hash_get_key).
- Changed some buffers to memory regions to uchar* to avoid casts.
- Changed some string lengths from uint to size_t.
- Changed field->ptr to be uchar* instead of char*. This allowed us to
  get rid of a lot of casts.
- Some changes from true -> TRUE, false -> FALSE, unsigned char -> uchar
- Include zlib.h in some files as we needed declaration of crc32()
- Changed MY_FILE_ERROR to be (size_t) -1.
- Changed many variables to hold the result of my_read() / my_write() to be
  size_t. This was needed to properly detect errors (which are
  returned as (size_t) -1).
- Removed some very old VMS code
- Changed packfrm()/unpackfrm() to not be depending on uint size
  (portability fix)
- Removed windows specific code to restore cursor position as this
  causes slowdown on windows and we should not mix read() and pread()
  calls anyway as this is not thread safe. Updated function comment to
  reflect this. Changed function that depended on original behavior of
  my_pwrite() to itself restore the cursor position (one such case).
- Added some missing checking of return value of malloc().
- Changed definition of MOD_PAD_CHAR_TO_FULL_LENGTH to avoid 'long' overflow.
- Changed type of table_def::m_size from my_size_t to ulong to reflect that
  m_size is the number of elements in the array, not a string/memory
  length.
- Moved THD::max_row_length() to table.cc (as it's not depending on THD).
  Inlined max_row_length_blob() into this function.
- More function comments
- Fixed some compiler warnings when compiled without partitions.
- Removed setting of LEX_STRING() arguments in declaration (portability fix).
- Some trivial indentation/variable name changes.
- Some trivial code simplifications:
  - Replaced some calls to alloc_root + memcpy to use
    strmake_root()/strdup_root().
  - Changed some calls from memdup() to strmake() (Safety fix)
  - Simpler loops in client-simple.c
2007-05-10 12:59:39 +03:00
kent@kent-amd64.(none)
be15e3bc15 Merge mysql.com:/home/kent/bk/main/mysql-5.0
into  mysql.com:/home/kent/bk/main/mysql-5.1
2006-12-23 20:20:40 +01:00
kent@mysql.com/kent-amd64.(none)
226a5c833f Many files:
Changed header to GPL version 2 only
2006-12-23 20:17:15 +01:00
bar@bar.intranet.mysql.r18.ru
3cfbe36adc After merge fix 2006-10-03 16:00:09 +05:00
bar@mysql.com/bar.intranet.mysql.r18.ru
29bc5cc179 Bug#6147: Traditional: Assigning a string to a numeric column has unexpected results
The problem was that when converting a string to an exact number,
rounding didn't work, because conversion didn't understand
approximate numbers notation.
Fix: a new function for string-to-number conversion was implemented,
which is aware of approxinate number notation (with decimal point
and exponent, e.g. -19.55e-1)
2006-07-20 13:41:12 +05:00
bar@mysql.com
8620ac4d64 Merge abarkov@bk-internal.mysql.com:/home/bk/mysql-4.1
into  mysql.com:/usr/home/bar/mysql-4.1.b15376
2006-03-23 14:29:43 +04:00
bar@mysql.com
b2b020b318 Merge mysql.com:/usr/home/bar/mysql-5.0
into  mysql.com:/usr/home/bar/mysql-5.1-new
2006-03-23 14:14:32 +04:00
bar@mysql.com
1008146d86 Merge mysql.com:/usr/home/bar/mysql-4.1.b15376
into  mysql.com:/usr/home/bar/mysql-5.0
2006-03-23 12:41:28 +04:00
bar@mysql.com
d7c773834b WL#1386 - CTYPE table for unicode character sets
A prerequisite for several fulltext and XML bugs.
MY_CHARSET_HANDLER now has a new function "ctype"
to detect a type of the next character in a string
(i.e. digit, letter, space, punctuation, control, etc),
which now works correctly for both 8bit and multibyte charsets.
Previously only 8bit charsets worked correctly,
while any multibyte character was considered as letter
in multibyte charsets.
Many files:
  Adding new function
Makefile.am:
  Adding build rules for uctypedump,
  a dump tool to create my_uctype.h
  using Unicode Character Database file.
m_ctype.h:
  Adding declaration of my_uni_ctype,
  ctype data for Unicode.
  Adding new member into MY_CHARSET_HANDLER
Makefile.am:
  Adding my_uctype.h into noinst_HEADERS
my_uctype.h, uctypedump.c:
  new files:
  ctype data for unicode,
  and the tool to generate it from 
  a Unicode Character Database file.
2006-02-02 10:07:47 +04:00
bar@mysql.com
949aced164 Merge mysql.com:/usr/home/bar/mysql-4.1.b15377
into  mysql.com:/usr/home/bar/mysql-5.0
2006-01-13 16:39:33 +04:00
bar@mysql.com
9ac6e558d4 Bug#15375 Unassigned multibyte codes are broken
into parts when converting to Unicode.
m_ctype.h:
  Reorganizing mb_wc return codes to be able
  to return "an unassigned N-byte-long character".
sql_string.cc:
  Adding code to detect and properly handle
  unassigned characters (i.e. the those character
  which are correctly formed according to the 
  character specifications, but don't have Unicode
  mapping).
Many files:
  Fixing conversion function to return new codes.
ctype_ujis.test, ctype_gbk.test, ctype_big5.test:
  Adding a test case.
ctype_ujis.result, ctype_gbk.result, ctype_big5.result:
  Fixing results accordingly.
2005-12-12 21:42:09 +04:00
bar@mysql.com
b99f9ab723 Bug#15377 Valid multibyte sequences are truncated on INSERT
ctype-euc_kr.c:
ctype-gb2312.c:
  Adding specific well_formed_length functions
  for gb2312 and euckr, to allow storing characters
  which are correct according to the character set
  specifications but just don't have Unicode mapping.
  Previously only those which have Unicode mapping
  could be stored, while unassigned characters lead
  to data truncation.
Many files:
  new file
2005-12-09 16:37:58 +04:00
bar@mysql.com
39b0712cf7 type_binary.result, type_binary.test:
new file
mysql_fix_privilege_tables.sql, mysql_create_system_tables.sh:
  Adding true BINARY/VARBINARY: fixing "password" type, not to be 0x00-padding.
Many files:
  Adding true BINARY/VARBINARY: fixing tests not to output 0x00 bytes.
  Adding true BINARY/VARBINARY: new pad_char structure member.
ctype-bin.c:
  Adding true BINARY/VARBINARY: new pad_char structure member.
  New strnxfrm, with two trailing length bytes.
field.cc:
  Adding true BINARY/VARBINARY.
2005-10-13 19:16:19 +05:00
elliot@mysql.com
55b81af671 Merge mysql.com:/home/emurphy/src/bk-clean/mysql-4.1
into  mysql.com:/home/emurphy/src/work/mysql-5.0
2005-08-19 15:29:30 -04:00
elliot@mysql.com
197782605f BUG#11338 (logging of prepared statement w/ blob type)
In cp932, '\' character can be the second byte in a 
multi-byte character stream. This makes it difficult to use
mysql_escape_string. Added flag to indicate which languages allow
'\' as second byte of multibyte sequence so that when putting a prepared
statement into the binlog we can decide at runtime whether hex encoding
is really needed.
2005-08-17 04:26:32 -04:00
bar@mysql.com
2df945d87b Bug#8610: The ucs2_turkish_ci collation fails with upper('i')
UPPER/LOWER now can return a string with different length.

mi_test1.c:
  Adding new arguments.
Many files:
  Changeing caseup/casedn to return a result with different
  length than argument.
sql_string.h:
  Removing unused method,
mysql_priv.h:
  Removing unused method
2005-06-06 16:54:15 +05:00
bar@noter.(none)
153b928c10 Bug#9509 Optimizer: wrong result after AND with latin1_german2_ci
We cannot propagate constants with tricky collations.
2005-05-05 21:13:57 +05:00
bar@mysql.com
8828884f4c CSC#4385: slow sorting for UTF8 large table:
my_strnxfrm_utf8 now requires 2 bytes per character
in filesort key, instead of 3 bytes per character.
Shorter filesort keys make sorting faster.
2005-01-26 16:34:09 +04:00
bar@noter.intranet.mysql.r18.ru
362de9467c Incorporating new faster string->number converter functions
into MY_CHARSET_INFO structure.
2004-09-25 15:29:33 +05:00
bar@mysql.com
501954ddb5 Bug #3453 MySQL output formatting in multibyte character sets 2004-09-09 18:21:31 +05:00
bar@mysql.com
f81edf4afd A new function to meassure terminal screen cells number for a string. 2004-08-25 11:39:43 +05:00
bar@mysql.com
1cd108ff97 Many files:
LIKE crashed mysqld for binary collations in some cases
2004-08-18 12:07:54 +05:00
bar@mysql.com
cbd3e61c8d Unicode collation algorithm: contraction support.
E.g. 'Ch' is treated as a separate letter in Czech,
not as a combination of C+h.
2004-06-12 20:36:58 +05:00
bar@mysql.com
f8b15e8bb6 Initialize max_sort_char only if a character set is requested. 2004-06-11 17:50:20 +05:00
bar@mysql.com
c64d93b274 Allocate memory when a character set is requested:
- For simple character sets: from_uni convertion table.
- For UCA: alternative weight arrays.
Use mbminlen instead of MY_CS_NONTEXT
2004-06-11 16:29:16 +05:00
bar@mysql.com
34d413a6a0 Optimization to use less memory. 2004-06-10 19:10:21 +05:00
bar@bar.intranet.mysql.r18.ru
8962ed3c7d WL#916: Unicode collations for some languages 2004-06-08 17:56:15 +05:00
bar@bar.intranet.mysql.r18.ru
391d5629f6 Preparation for user-defined Unicode collations:
weights data now comes from a static variables
but from the charset structure.
2004-05-25 17:40:20 +05:00
bar@bar.intranet.mysql.r18.ru
fc17aad767 min_sort_char was added, for the future UCA implementation.
UCS2 now has its own my_like_range function.
2004-03-19 10:00:46 +04:00
monty@mysql.com
e9315f984d Changed wellformedlen to well_formed_len
Fixed that blobs >16M can be inserted/updated
Fixed bug when doing CREATE TEMPORARY TABLE ... LIKE
2004-02-17 01:35:17 +02:00
bar@bar.intranet.mysql.r18.ru
d13ad0822e Problem fix:
http://bugs.mysql.com/bug.php?id=2366
Wrong utf8 behaviour when data is trancated
2004-02-06 16:59:25 +04:00
bar@bar.intranet.mysql.r18.ru
f802ec0215 UCS-2 aligning 0xAA -> 0x00AA 2004-01-19 19:16:30 +04:00
monty@mysql.com
ca953d5e7d Merge bk-internal.mysql.com:/home/bk/mysql-4.1
into mysql.com:/my/mysql-4.1
2003-12-08 12:26:10 +02:00
monty@mysql.com
e8aef44349 Portability fixes for Windows 2003-12-08 12:25:37 +02:00
serg@serg.mylan
b2e6b36487 fix for my_mbcharlen(charset, c) to return 1 for single-byte characters
(isn't it obvious ?)
2003-12-06 19:05:26 +01:00
bar@bar.mysql.r18.ru
e0a0790ebb Fixed that multibyte charsets didn't honor multibyte
sequence boundaries in functions LIKE and LOCATE in
the case of "binary" collation. Comparison was done
like if the strings were just a binary strings without
character set assumption.
2003-09-19 15:18:19 +05:00
bar@bar.mysql.r18.ru
83c6946232 Bug fix:
http://bugs.mysql.com/bug.php?id=1264
2003-09-16 15:43:17 +05:00
bar@bar.mysql.r18.ru
762ca8b6f8 Fix strnxfrm_multiplye from 0 to 1 for charsets that do not use strnxfrm 2003-08-18 17:24:50 +05:00
bar@bar.mysql.r18.ru
78c7d40986 BINARY collations for every character set 2003-05-23 18:39:55 +05:00
bar@bar.mysql.r18.ru
8192d169a2 CHARSET_INFO structure reorganization for easier maintainance 2003-05-23 17:45:52 +05:00
bar@bar.mysql.r18.ru
2b1e1f6494 Variables were rename, binary collation names were added
Fixed that SHOW CHARACTER SET displayed non-dynamic charsets even if they were not really compiled
2003-05-22 17:20:19 +05:00