MALFORMED XPATH EXP
Problem:
A malformed XPATH expression in the ExtractValue query is causing
a server crash. This malformed XPATH expression is resulted when
the position attribute in the substring function contains ".." in
the beginning.
Solution:
The original crash is happening because the "../" is being evaluated
prematurely. It tries to access XML while it hasn't been parsed yet.
The premature evaluation is happening because the val_nodeset function
is being set to constant, in which case we proceed to evaluate them in
JOIN:prepare stage only. The solution to this is setting the val_nodeset
functions as non-constant. This forces us to evaluate the function in
the JOIN:exec stage and thus avoid any premature evaluation of the
XML strings.
WITH MALFORMED XPATH EXP
Problem:
A malformed XPATH expression in the ExtractValue query is
causing a server crash. This malformed XPATH expression is
resulted when the position attribute in the substring function
contains ".." in the beginning.
Solution:
The original crash is happening because the "../" is being
evaluated prematurely. It tries to access XML while it
hasn't been parsed yet. The premature evaluation is happening
because the val_nodeset function is being set to constant,
in which case we proceed to evaluate them in JOIN:prepare
stage only. The solution to this is setting the val_nodeset
functions as non-constant. This forces us to evaluate the function
in the JOIN:exec stage and thus avoid any premature evaluation of
the XML strings.
The XPATH implementation was not handling correctly the XPATH
production #19
(http://www.w3.org/TR/1999/REC-xpath-19991116/#node-sets),
namely
PathExpr ::= | FilterExpr '/' RelativeLocationPath
| FilterExpr '//' RelativeLocationPath
It was lacking context for the RelativeLocationPath and it was just
ignoring the second slash instead of treating it as a different axis
specifier.
Fixed the above two problems and added a test case.
Bug#57820 extractvalue crashes
Problem: ExtractValue and Replace crashed in some cases
due to invalid handling of empty and NULL arguments.
Per file comments:
@mysql-test/r/ctype_ujis.result
@mysql-test/r/xml.result
@mysql-test/t/ctype_ujis.test
@mysql-test/t/xml.test
Adding tests
@sql/item_strfunc.cc
Make sure Item_func_replace::val_str safely handles empty strings.
@sql/item_xmlfunc.cc
set null_value if nodeset_func returned NULL,
which is possible when the second argument is an
unset user variable.
This patch:
- Moves all definitions from the mysql_priv.h file into
header files for the component where the variable is
defined
- Creates header files if the component lacks one
- Eliminates all include directives from mysql_priv.h
- Eliminates all circular include cycles
- Rename time.cc to sql_time.cc
- Rename mysql_priv.h to sql_priv.h
added:
include/ctype_numconv.inc
mysql-test/include/ctype_numconv.inc
mysql-test/r/ctype_binary.result
mysql-test/t/ctype_binary.test
Adding tests
modified:
mysql-test/r/bigint.result
mysql-test/r/case.result
mysql-test/r/create.result
mysql-test/r/ctype_cp1251.result
mysql-test/r/ctype_latin1.result
mysql-test/r/ctype_ucs.result
mysql-test/r/func_gconcat.result
mysql-test/r/func_str.result
mysql-test/r/metadata.result
mysql-test/r/ps_1general.result
mysql-test/r/ps_2myisam.result
mysql-test/r/ps_3innodb.result
mysql-test/r/ps_4heap.result
mysql-test/r/ps_5merge.result
mysql-test/r/show_check.result
mysql-test/r/type_datetime.result
mysql-test/r/type_ranges.result
mysql-test/r/union.result
mysql-test/suite/ndb/r/ps_7ndb.result
mysql-test/t/ctype_cp1251.test
mysql-test/t/ctype_latin1.test
mysql-test/t/ctype_ucs.test
mysql-test/t/func_str.test
Fixing tests
@ sql/field.cc
- Return str result using my_charset_numeric.
- Using real multi-byte aware str_to_XXX functions
to handle tricky charset values propely (e.g. UCS2)
@ sql/field.h
- Changing derivation of non-string field types to DERIVATION_NUMERIC.
- Changing binary() for numeric/datetime fields to always
return TRUE even if charset is not my_charset_bin. We need
this to keep ha_base_keytype() return HA_KEYTYPE_BINARY.
- Adding BINARY_FLAG into some fields, because it's not
being set automatically anymore with
"my_charset_bin to my_charset_numeric" change.
- Changing derivation for numeric/datetime datatypes to a weaker
value, to make "SELECT concat('string', field)" use character
set of the string literal for the result of the function.
@ sql/item.cc
- Implementing generic val_str_ascii().
- Using max_char_length() instead of direct read of max_length
to make "tricky" charsets like UCS2 work.
NOTE: in the future we'll possibly remove all direct reads of max_length
- Fixing Item_num::safe_charset_converter().
Previously it alligned binary string to
character string (for example by adding leading 0x00
when doing binary->UCS2 conversion). Now it just
converts from my_charset_numbner to "tocs".
- Using val_str_ascii() in Item::get_time() to make UCS2 arguments work.
- Other misc changes
@ sql/item.h
- Changing MY_COLL_CMP_CONV and MY_COLL_ALLOW_CONV to
bit operations instead of hard-coded bit masks.
- Addding new method DTCollation.set_numeric().
- Adding new methods to Item.
- Adding helper functions to make code look nicer:
agg_item_charsets_for_string_result()
agg_item_charsets_for_comparison()
- Changing charset for Item_num-derived items
from my_charset_bin to my_charset_numeric
(which is an alias for latin1).
@ sql/item_cmpfunc.cc
- Using new helper functions
- Other misc changes
@ sql/item_cmpfunc.h
- Fixing strcmp() to return max_length=2.
Previously it returned 1, which was wrong,
because it did not fit '-1'.
@ sql/item_func.cc
- Using new helper functions
- Other minor changes
@ sql/item_func.h
- Removing unused functions
- Adding helper functions
agg_arg_charsets_for_string_result()
agg_arg_charsets_for_comparison()
- Adding set_numeric() into constructors of numeric items.
- Using fix_length_and_charset() and fix_char_length()
instead of direct write to max_length.
@ sql/item_geofunc.cc
- Changing class for Item_func_geometry_type and
Item_func_as_wkt from Item_str_func to
Item_str_ascii_func, to make them return UCS2 result
properly (when character_set_connection=ucs2).
@ sql/item_geofunc.h
- Changing class for Item_func_geometry_type and
Item_func_as_wkt from Item_str_func to
Item_str_ascii_func, to make them return UCS2 result
properly (when @@character_set_connection=ucs2).
@ sql/item_strfunc.cc
- Implementing Item_str_func::val_str().
- Renaming val_str to val_str_ascii for some items,
to make them work with UCS2 properly.
- Using new helper functions
- All single-argument functions that expect string
result now call this method:
agg_arg_charsets_for_string_result(collation, args, 1);
This enables character set conversion to @@character_set_connection
in case of pure numeric input.
@ sql/item_strfunc.h
- Introducing Item_str_ascii_func - for functions
which return pure ASCII data, for performance purposes,
as well as for the cases when the old implementation
of val_str() was heavily 8-bit oriented and implementing
a UCS2-aware version is tricky.
@ sql/item_sum.cc
- Using new helper functions.
@ sql/item_timefunc.cc
- Using my_charset_numeric instead of my_charset_bin.
- Using fix_char_length(), fix_length_and_charset()
and fix_length_and_charset_datetime()
instead of direct write to max_length.
- Using tricky-charset aware function str_to_time_with_warn()
@ sql/item_timefunc.h
- Using new helper functions for charset and length initialization.
- Changing base class for Item_func_get_format() to make
it return UCS2 properly (when character_set_connection=ucs2).
@ sql/item_xmlfunc.cc
- Using new helper function
@ sql/my_decimal.cc
- Adding a new DECIMAL to CHAR converter
with real multibyte support (e.g. UCS2)
@ sql/mysql_priv.h
- Introducing a new derivation level for numeric/datetime data types.
- Adding macros for my_charset_numeric and MY_REPERTOIRE_NUMERIC.
- Adding prototypes for str_set_decimal()
- Adding prototypes for character-set aware str_to_xxx() functions.
@ sql/protocol.cc
- Changing charsetnr to "binary" client-side metadata for
numeric/datetime data types.
@ sql/time.cc
- Adding to_ascii() helper function, to convert a string
in any character set to ascii representation. In the
future can be extended to understand digits written
in various non-Latin word scripts.
- Adding real multy-byte character set aware versions for str_to_XXXX,
to make these these type of queries work correct:
INSERT INTO t1 SET datetime_column=ucs2_expression;
@ strings/ctype-ucs2.c
- endptr was not calculated correctly. INSERTing of UCS2
values into numeric columns returned warnings about
truncated wrong data.
When values of different types are compared they're converted to a type that
allows correct comparison. This conversion is done for each comparison and
takes some time. When a constant is being compared it's possible to cache the
value after conversion to speedup comparison. In some cases (large dataset,
complex WHERE condition with many type conversions) query might be executed
7% faster.
A test case isn't provided because all changes are internal and isn't visible
outside.
The behavior of the Item_cache is changed to cache values on the first request
of cached value rather than at the moment of storing item to be cached.
A flag named value_cached is added to the Item_cache class. It's set to TRUE
when cache holds the value of the last stored item.
Function named cache_value() is added to the Item_cache class and derived classes.
This function actually caches the value of the saved item.
Item_cache_xxx::store functions now only store item to be cached and set
value_cached flag to FALSE.
Item_cache_xxx::val_xxx functions are changed to call cache_value function
prior to returning cached value if value_cached is FALSE.
The Arg_comparator::set_cmp_func function now calls cache_converted_constant
to cache constants if they need a type conversion.
The Item_cache::get_cache function is overloaded to allow setting of the
cache type.
The cache_converted_constant function is added to the Arg_comparator class.
It checks whether a value can and should be cached and if so caches it.
The problem is that XML functions(items) do not reset null_value
before their execution and further item excution may use
null_value value of the previous result.
The fix is to reset null_value.
Problem:
RelativeLocationPath can appear only after a node-set expression
in the third and the fourth branches of this rule:
PathExpr :: = LocationPath
| FilterExpr
| FilterExpr '/' RelativeLocationPath
| FilterExpr '//' RelativeLocationPath
XPatch code didn't check the type of FilterExpr and crashed.
Fix:
If FilterExpr is a scalar expression
(variable reference, literal, number, scalar function call)
return error.
Faster thr_alarm()
Added 'Opened_files' status variable to track calls to my_open()
Don't give warnings when running mysql_install_db
Added option --source-install to mysql_install_db
I had to do the following renames() as used polymorphism didn't work with Forte compiler on 64 bit systems
index_read() -> index_read_map()
index_read_idx() -> index_read_idx_map()
index_read_last() -> index_read_last_map()
Fixed compiler warnings, errors and link errors
Fixed new bug on Solaris with gethrtime()
Added --debug-check option to all mysql clients to print errors and memory leaks
Added --debug-info to all clients. This now works as --debug-check but also prints memory and cpu usage
--long-query-time is now given in seconds with microseconds as decimals
--min_examined_row_limit added for slow query log
long_query_time user variable is now double with 6 decimals
Added functions to get time in microseconds
Added faster time() functions for system that has gethrtime() (Solaris)
We now do less time() calls.
Added field->in_read_set() and field->in_write_set() for easier field manipulation by handlers
set_var.cc and my_getopt() can now handle DOUBLE variables.
All time() calls changed to my_time()
my_time() now does retry's if time() call fails.
Added debug function for stopping in mysql_admin_table() when tables are locked
Some trivial function and struct variable renames to avoid merge errors.
Fixed compiler warnings
Initialization of some time variables on windows moved to my_init()
Problem: Memory overrun happened in attempts to generate
error messages (e.g. in case of incorrect XPath syntax).
Reason: set_if_bigger() was used instead of set_if_smaller().
Change: replacing wrong set_if_bigger() to set_if_smaller(),
and making minor additional code clean-ups.
The following type conversions was done:
- Changed byte to uchar
- Changed gptr to uchar*
- Change my_string to char *
- Change my_size_t to size_t
- Change size_s to size_t
Removed declaration of byte, gptr, my_string, my_size_t and size_s.
Following function parameter changes was done:
- All string functions in mysys/strings was changed to use size_t
instead of uint for string lengths.
- All read()/write() functions changed to use size_t (including vio).
- All protocoll functions changed to use size_t instead of uint
- Functions that used a pointer to a string length was changed to use size_t*
- Changed malloc(), free() and related functions from using gptr to use void *
as this requires fewer casts in the code and is more in line with how the
standard functions work.
- Added extra length argument to dirname_part() to return the length of the
created string.
- Changed (at least) following functions to take uchar* as argument:
- db_dump()
- my_net_write()
- net_write_command()
- net_store_data()
- DBUG_DUMP()
- decimal2bin() & bin2decimal()
- Changed my_compress() and my_uncompress() to use size_t. Changed one
argument to my_uncompress() from a pointer to a value as we only return
one value (makes function easier to use).
- Changed type of 'pack_data' argument to packfrm() to avoid casts.
- Changed in readfrm() and writefrom(), ha_discover and handler::discover()
the type for argument 'frmdata' to uchar** to avoid casts.
- Changed most Field functions to use uchar* instead of char* (reduced a lot of
casts).
- Changed field->val_xxx(xxx, new_ptr) to take const pointers.
Other changes:
- Removed a lot of not needed casts
- Added a few new cast required by other changes
- Added some cast to my_multi_malloc() arguments for safety (as string lengths
needs to be uint, not size_t).
- Fixed all calls to hash-get-key functions to use size_t*. (Needed to be done
explicitely as this conflict was often hided by casting the function to
hash_get_key).
- Changed some buffers to memory regions to uchar* to avoid casts.
- Changed some string lengths from uint to size_t.
- Changed field->ptr to be uchar* instead of char*. This allowed us to
get rid of a lot of casts.
- Some changes from true -> TRUE, false -> FALSE, unsigned char -> uchar
- Include zlib.h in some files as we needed declaration of crc32()
- Changed MY_FILE_ERROR to be (size_t) -1.
- Changed many variables to hold the result of my_read() / my_write() to be
size_t. This was needed to properly detect errors (which are
returned as (size_t) -1).
- Removed some very old VMS code
- Changed packfrm()/unpackfrm() to not be depending on uint size
(portability fix)
- Removed windows specific code to restore cursor position as this
causes slowdown on windows and we should not mix read() and pread()
calls anyway as this is not thread safe. Updated function comment to
reflect this. Changed function that depended on original behavior of
my_pwrite() to itself restore the cursor position (one such case).
- Added some missing checking of return value of malloc().
- Changed definition of MOD_PAD_CHAR_TO_FULL_LENGTH to avoid 'long' overflow.
- Changed type of table_def::m_size from my_size_t to ulong to reflect that
m_size is the number of elements in the array, not a string/memory
length.
- Moved THD::max_row_length() to table.cc (as it's not depending on THD).
Inlined max_row_length_blob() into this function.
- More function comments
- Fixed some compiler warnings when compiled without partitions.
- Removed setting of LEX_STRING() arguments in declaration (portability fix).
- Some trivial indentation/variable name changes.
- Some trivial code simplifications:
- Replaced some calls to alloc_root + memcpy to use
strmake_root()/strdup_root().
- Changed some calls from memdup() to strmake() (Safety fix)
- Simpler loops in client-simple.c
Problem: when replacing the root element, UpdateXML
erroneously tried to mix old XML content with the
replacement string, which led to crash.
Fix: don't use the old XML content in these cases,
just return the replacement string.
Removed a lot of compiler warnings
Removed not used variables, functions and labels
Initialize some variables that could be used unitialized (fatal bugs)
%ll -> %l
- Removed not used variables
- Changed some ulong parameters/variables to ulonglong (possible serious bug)
- Added casts to get rid of safe assignment from longlong to long (and similar)
- Added casts to function parameters
- Fixed signed/unsigned compares
- Added some constructores to structures
- Removed some not portable constructs
Better fix for bug Bug #21428 "skipped 9 bytes from file: socket (3)" on "mysqladmin shutdown"
(Added new parameter to net_clear() to define when we want the communication buffer to be emptied)
Problem: "greater than" and "less than" XPath operators appeared to have been implemented in reverse.
Fix: swap arguments to eq_func() and eq_func_reverse() to provide correct operation result.
fragment is not well-formed xml
Problem:
- ExtractValue silently returned NULL if a wrong XML value is passed.
- In some cases "unexpected END-OF-INPUT" error was not detected, and
a non-NULL result could be returned for a bad XML value.
Fix:
- Adding warning messages, to make user aware why NULL was returned.
- Missing "unexpected END-OF-INPUT" error is reported now.
Problem source:
Qualified names (aka QName) didn't work as tag names and attribute names,
because the parser lacked a real rule to scan QName, so it understood
only non-qualified names without prefixes.
Solution:
New rule was added to check both "ident" and "ident:ident" sequences.
ExtractValue didn't understand tag and attribute names
consisting of "tricky" national letters (e.g. latin accenter letters).
It happened because XPath lex parser recognized only basic
latin letter a..z ad a part of an identifier.
Fixed to recognize all letters by means of new "full ctype" which
was added recently.