When using GROUP_CONCAT with ORDER BY, a tree is used for the sorting, as
opposed to the normal nested loops join processing used when there is no ORDER BY.
The tree traversal that generates the result counts the rows that get cut
(they are truncated to the field's max_size).
However, the check of that count was done before the tree traversal, so no
warning was generated when the output was truncated.
Fixed by moving the check to after the tree traversal.
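A minimal, self-contained illustration of this pattern in plain C++ (not the
server code; the real logic lives in Item_func_group_concat and the names
below are invented): truncation is only known once the ordered rows have been
traversed, so the warning check has to run after the traversal, not before it.

  #include <iostream>
  #include <set>
  #include <string>

  // Build an ordered, length-limited concatenation and count the rows that got cut.
  static std::string concat_ordered(const std::multiset<std::string> &rows,
                                    std::string::size_type max_len,
                                    std::size_t *cut_rows) {
    std::string result;
    *cut_rows = 0;
    for (const std::string &row : rows) {          // the "tree traversal"
      if (result.size() >= max_len) {              // row is cut entirely
        ++*cut_rows;
        continue;
      }
      if (result.size() + row.size() > max_len) {  // row is cut partially
        result.append(row, 0, max_len - result.size());
        ++*cut_rows;
      } else {
        result += row;
      }
    }
    return result;
  }

  int main() {
    std::multiset<std::string> rows{"bbb", "aaa", "ccc"};
    std::size_t cut = 0;
    std::string s = concat_ordered(rows, 5, &cut);
    if (cut)  // check AFTER the traversal; only now is the truncation count known
      std::cerr << "Warning: " << cut << " row(s) truncated\n";
    std::cout << s << "\n";
    return 0;
  }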
The following type conversions were done:
- Changed byte to uchar
- Changed gptr to uchar*
- Changed my_string to char *
- Changed my_size_t to size_t
- Changed size_s to size_t
Removed the declarations of byte, gptr, my_string, my_size_t and size_s.
The following function parameter changes were made (a signature sketch
follows this list):
- All string functions in mysys/strings were changed to use size_t
instead of uint for string lengths.
- All read()/write() functions changed to use size_t (including vio).
- All protocol functions changed to use size_t instead of uint
- Functions that used a pointer to a string length were changed to use size_t*
- Changed malloc(), free() and related functions from using gptr to using void *
as this requires fewer casts in the code and is more in line with how the
standard functions work.
- Added extra length argument to dirname_part() to return the length of the
created string.
- Changed (at least) the following functions to take uchar* as argument:
- db_dump()
- my_net_write()
- net_write_command()
- net_store_data()
- DBUG_DUMP()
- decimal2bin() & bin2decimal()
- Changed my_compress() and my_uncompress() to use size_t. Changed one
argument to my_uncompress() from a pointer to a value as we only return
one value (makes function easier to use).
- Changed type of 'pack_data' argument to packfrm() to avoid casts.
- Changed the type of the 'frmdata' argument in readfrm(), writefrm(),
ha_discover() and handler::discover() to uchar** to avoid casts.
- Changed most Field functions to use uchar* instead of char* (reduced a lot of
casts).
- Changed field->val_xxx(xxx, new_ptr) to take const pointers.
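As an illustration of the direction of these signature changes, a
declarations-only sketch (the stand-in typedefs are invented here, and the
exact prototypes in mysys differ in argument names and flag types):

  #include <cstddef>

  typedef unsigned char uchar;   /* type now used for raw memory            */
  typedef int           File;    /* stand-in for the mysys file handle type */
  typedef unsigned long myf;     /* stand-in for the mysys flag word        */

  /* Before (roughly): byte pointers and uint lengths, e.g.
   *   uint my_write(File fd, const byte *buffer, uint count, myf flags);
   * After: buffers are uchar*, lengths and return values are size_t.
   */
  std::size_t my_write(File fd, const uchar *buffer, std::size_t count, myf flags);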
Other changes:
- Removed a lot of unneeded casts
- Added a few new casts required by other changes
- Added some casts to my_multi_malloc() arguments for safety (as string lengths
need to be uint, not size_t).
- Fixed all calls to hash-get-key functions to use size_t*. (This needed to be done
explicitly as the conflict was often hidden by casting the function to
hash_get_key.)
- Changed some buffers and memory regions to uchar* to avoid casts.
- Changed some string lengths from uint to size_t.
- Changed field->ptr to be uchar* instead of char*. This allowed us to
get rid of a lot of casts.
- Some changes from true -> TRUE, false -> FALSE, unsigned char -> uchar
- Include zlib.h in some files as we needed declaration of crc32()
- Changed MY_FILE_ERROR to be (size_t) -1.
- Changed many variables that hold the result of my_read() / my_write() to be
size_t. This was needed to properly detect errors (which are
returned as (size_t) -1); see the sketch after this list.
- Removed some very old VMS code
- Changed packfrm()/unpackfrm() to not depend on uint size
(portability fix)
- Removed Windows-specific code to restore the cursor position, as this
causes a slowdown on Windows and we should not mix read() and pread()
calls anyway as this is not thread safe. Updated the function comment to
reflect this. Changed the one function that depended on the original behavior
of my_pwrite() to restore the cursor position itself.
- Added some missing checking of return value of malloc().
- Changed definition of MOD_PAD_CHAR_TO_FULL_LENGTH to avoid 'long' overflow.
- Changed type of table_def::m_size from my_size_t to ulong to reflect that
m_size is the number of elements in the array, not a string/memory
length.
- Moved THD::max_row_length() to table.cc (as it does not depend on THD).
Inlined max_row_length_blob() into this function.
- More function comments
- Fixed some compiler warnings when compiled without partitions.
- Removed setting of LEX_STRING() arguments in declaration (portability fix).
- Some trivial indentation/variable name changes.
- Some trivial code simplifications:
- Replaced some calls to alloc_root + memcpy to use
strmake_root()/strdup_root().
- Changed some calls from memdup() to strmake() (Safety fix)
- Simpler loops in client-simple.c
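A standalone sketch of the error-checking pattern the (size_t) -1 convention
enables (MY_FILE_ERROR and my_read are the real mysys names; the wrapper and
the surrounding scaffolding below are invented for illustration):

  #include <cstddef>
  #include <cstdio>
  #include <unistd.h>

  typedef unsigned char uchar;

  // mysys convention: size_t-returning I/O wrappers report failure as (size_t) -1.
  static const std::size_t MY_FILE_ERROR = (std::size_t) -1;

  // Stand-in for my_read(): returns bytes read, or MY_FILE_ERROR on failure.
  static std::size_t my_read_sketch(int fd, uchar *buf, std::size_t count) {
    ssize_t n = read(fd, buf, count);
    return n < 0 ? MY_FILE_ERROR : (std::size_t) n;
  }

  int main() {
    uchar buf[128];
    // The result variable must be size_t (not uint/int); otherwise the
    // (size_t) -1 error value cannot be detected reliably.
    std::size_t got = my_read_sketch(0, buf, sizeof(buf));
    if (got == MY_FILE_ERROR) {
      std::perror("read");
      return 1;
    }
    std::printf("read %zu bytes\n", got);
    return 0;
  }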
Validity checks for nested set functions
were not taking into account that the enclosed
set function may be on a nest level that is
lower than the nest level of the enclosing set
function.
Fixed by:
- propagating max_sum_func_level
up the chain of enclosing set functions.
- updating the max_sum_func_level of the
enclosing set function when the enclosed set
function is aggregated above or on the same
nest level as the level of the enclosing
set function.
- updating the max_arg_level of the enclosing
set function on a reference that refers to
an item above or on the same nest level
as the level of the enclosing set function.
- Treating both Item_field and Item_ref as possibly
referencing items from outer nest levels.
If a set function with an outer reference s(outer_ref) cannot be aggregated
in the outer query against which the reference has been resolved, then MySQL
interprets s(outer_ref) in the same way as it would interpret s(const).
However, the standard requires throwing an error in this situation.
Added some code to support this requirement in ANSI mode.
Corrected another minor bug in Item_sum::check_sum_func.
When creating a temporary table the concise column type
of a string expression is decided based on its length:
- if its length is under 512 it is stored as either
varchar or char.
- otherwise it is stored as a BLOB.
There is a flag (convert_blob_length) to create_tmp_field
that, when > 0, forces creation of a varchar if the
max blob length is under convert_blob_length.
However, it must be verified that convert_blob_length
(settable through an SQL option in some cases) is
under the maximum that can be stored in a varchar column.
While performing that check for expressions in
create_tmp_field_from_item(), the max length of the blob was
used instead. This causes blob columns to be created in the
heap temp table used by GROUP_CONCAT (where blobs must not
be created in the temp table because of the constant
convert_blob_length that is passed to create_tmp_field()).
And since these blob columns are not expected in that place,
we get wrong results.
Fixed by checking that the value of the flag variable fits
within the limits of a VARCHAR, instead of checking the max length
of the blob column.
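A hedged sketch of the corrected decision, with invented names and constants
(the real code is in create_tmp_field_from_item(), and the server's thresholds
and exact conditions differ):

  #include <cstdint>

  /* Illustrative constants, not the server's actual values. */
  static const uint32_t STRING_TO_BLOB_THRESHOLD = 512;   /* "under 512 -> char/varchar" */
  static const uint32_t MAX_VARCHAR_LENGTH       = 65535; /* upper bound for a VARCHAR   */

  enum TmpFieldType { TMP_CHAR_OR_VARCHAR, TMP_BLOB };

  /* Decide the column type for a string expression in a temporary table.
     convert_blob_length == 0 means "no forced conversion". */
  TmpFieldType tmp_string_field_type(uint32_t item_max_length,
                                     uint32_t convert_blob_length) {
    if (item_max_length < STRING_TO_BLOB_THRESHOLD)
      return TMP_CHAR_OR_VARCHAR;                    /* short strings never become BLOBs */
    if (convert_blob_length &&
        convert_blob_length <= MAX_VARCHAR_LENGTH && /* the fix: check the flag value,
                                                        not the blob's max length */
        item_max_length <= convert_blob_length)
      return TMP_CHAR_OR_VARCHAR;                    /* forced conversion to VARCHAR */
    return TMP_BLOB;
  }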
Fixed a memory corruption observed in a test case from func_group.test
after the patch for bug #27229 had been applied.
The memory corruption happened because in some rare cases the function
count_field_types underestimated the number of elements in
the array param->items_to_copy.
Fixed a problem that occurred when a set function aggregated in outer
context was used as an argument of GROUP_CONCAT.
Ensured correct setting of the depended_from field in references
generated for set functions aggregated in outer selects.
A wrong value of this field resulted in wrong maps returned by
used_tables() for these references.
Made sure that a temporary table field is added for any set function
aggregated in outer context when creation of a temporary table is
needed to execute the inner subquery.
The problem in this bug arises when we create temporary tables. When
temporary tables are created for unions, there is some
inference being carried out regarding the type of the column.
Whenever this column type is inferred to be REAL (i.e. FLOAT or
DOUBLE), MySQL will always try to maintain exact precision, and
if that is not possible (there are hardware limits, since FLOAT
and DOUBLE are stored as approximate values) will switch to
using approximate values. The problem here is that at this point
the information about the number of significant digits is not
available. Furthermore, the number of significant digits should
be increased for the AVG function; however, this was not properly
handled. There are 4 parts to the problem:
#1: DOUBLE and FLOAT fields don't display their proper display
lengths in max_display_length(). This is hard-coded as 53 for
DOUBLE and 24 for FLOAT. Now changed to instead return the
field_length.
#2: Type holders for temporary tables do not preserve the
max_length of the Items from which they are created; it is
instead reverted to the 53 and 24 from above. This causes
*all* fields to get non-fixed significant digits.
#3: AVG function does not update max_length (display length)
when updating number of decimals.
#4: The function that switches to a non-fixed number of
significant digits should use DBL_DIG + 2 or FLT_DIG + 2 as
cut-off values, since fixed precision does not use the 'e'
notation (a small sketch follows below).
Of these points, #1 is the controversial one, but this
change is preferred and has been cleared with Monty. The
function causes quite a few unit tests to blow up and they had
to be changed, but each one is annotated and motivated. We
frequently see the magical 53 and 24 give way to more relevant
numbers.
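A small sketch of the cut-off idea in point #4 (illustrative only; the
comparison direction and the call site in the server may differ):

  #include <cfloat>

  /* Switch to a non-fixed number of significant digits only when the
     requested number of decimals exceeds what the type can hold,
     using DBL_DIG + 2 / FLT_DIG + 2 as the cut-off values. */
  static bool needs_nonfixed_digits(bool is_double, unsigned decimals) {
    const unsigned cutoff = is_double ? DBL_DIG + 2 : FLT_DIG + 2;
    return decimals >= cutoff;
  }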
Subqueries containing a set function
aggregated in outer context returned wrong results.
This happened only if the subquery did not contain any references
to outer fields.
As there were no references to outer fields, the subquery was erroneously
taken for a non-correlated one.
Now any set function aggregated in outer context makes the subquery
correlated.
To correctly decide which predicates can be evaluated with a given table
the optimizer must know the exact set of tables that a predicate depends
on. If that mask is too wide (refer to non-existing tables) the optimizer
can erroneously skip a predicate.
One such case of a wrong table usage mask was the aggregate functions.
They had an all-1 mask (meaning they depend on all tables, including non-existent
ones).
Fixed by making a real used_tables mask for the aggregates. The mask is
constructed in the following way :
1. OR the table dependency masks of all the arguments of the aggregate.
2. If all the arguments of the function are from the local name resolution
context and it is evaluated in the same name resolution
context where it is referenced, all the tables from that name resolution
context are OR-ed into the dependency mask. This is to denote that an
aggregate function depends on the number of rows it processes.
3. Handle correctly the case of an aggregate function optimization (such that
the aggregate function can be pre-calculated and made a constant).
Made sure that an aggregate function is never a constant (unless it is the subject
of a specific optimization and pre-calculation).
One other flaw was revealed and fixed in the process : references were
not calling the recalculation method for used_tables of their targets.
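A minimal sketch of the mask construction described above, with invented
names (the real logic lives in Item_sum and its used_tables handling):

  #include <cstddef>
  #include <cstdint>

  typedef uint64_t table_map;   // one bit per table, as in the server

  // args_used[i] is the used_tables() mask of the i-th argument.
  // context_tables is the map of all tables in the aggregation's
  // name resolution context.
  table_map aggregate_used_tables(const table_map *args_used, std::size_t arg_count,
                                  table_map context_tables,
                                  bool all_args_local,
                                  bool aggregated_where_referenced) {
    table_map used = 0;
    for (std::size_t i = 0; i < arg_count; i++)
      used |= args_used[i];                    // step 1: OR argument dependencies
    if (all_args_local && aggregated_where_referenced)
      used |= context_tables;                  // step 2: result depends on the
                                               // number of rows processed
    // Step 3: even if 'used' came out empty, the aggregate is not treated as a
    // constant (barring the specific pre-calculation optimization).
    return used;
  }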
"Server Variables for Plugins"
Implement support for plugins to declare server variables.
Demonstrate functionality by removing InnoDB specific code from sql/*
New feature for HASH - HASH_UNIQUE flag
New feature for DYNAMIC_ARRAY - initializer accepts preallocated ptr.
Completed support for plugin reference counting.
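For the server-variable support, a plugin could declare a variable roughly as
below. This is a sketch using the MYSQL_SYSVAR macro family from the plugin
API; the exact macro arguments shown follow later MySQL releases, so details
may differ from this initial implementation:

  #include <mysql/plugin.h>

  // Backing storage for the variable; becomes @@example_lookahead.
  static unsigned long example_lookahead;

  // name, C variable, options, comment, check fn, update fn, default, min, max, blocksize
  static MYSQL_SYSVAR_ULONG(lookahead, example_lookahead, PLUGIN_VAR_RQCMDARG,
                            "Number of pages to read ahead (example plugin).",
                            NULL, NULL, 8, 1, 1024, 0);

  // Registered through the plugin descriptor's system_vars slot.
  static struct st_mysql_sys_var *example_system_vars[] = {
    MYSQL_SYSVAR(lookahead),
    NULL
  };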
When only one row was present, the subtraction of nearly the same number
resulted in catastrophic cancellation, introducing an error in the
VARIANCE calculation near 1e-15. That was then sqrt()ed to get STDDEV, so the
error was escalated to near 1e-8.
The simple fix of testing for a row count of 1 and forcing that to yield
0.0 is insufficient, as two rows of the same value should also have a
variance of 0.0, yet the error would be about the same.
So, this patch changes the formula that computes the VARIANCE to be one
that is not subject to catastrophic cancellation.
In addition, it now uses only (faster-than-decimal) floating point numbers
to calculate, and renders that to other types on demand.
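One standard recurrence with this property is Welford's method, sketched
below in floating point; the server's actual implementation may differ in
detail:

  #include <cmath>
  #include <cstdio>

  // Running (population) variance without catastrophic cancellation:
  // keep the running mean and the sum of squared deviations, so that
  // no large, nearly equal quantities are ever subtracted.
  struct RunningVariance {
    double count = 0, mean = 0, sq_dev_sum = 0;

    void add(double x) {
      count += 1;
      double delta = x - mean;
      mean += delta / count;
      sq_dev_sum += delta * (x - mean);   // uses the updated mean
    }
    double variance() const { return count > 0 ? sq_dev_sum / count : 0.0; }
    double stddev()   const { return std::sqrt(variance()); }
  };

  int main() {
    RunningVariance v;
    // Two equal rows: the naive sum-of-squares formula can produce a small
    // nonzero (even negative) result here due to cancellation; this
    // recurrence gives exactly 0.
    v.add(1e9 + 0.1);
    v.add(1e9 + 0.1);
    std::printf("variance = %.17g, stddev = %.17g\n", v.variance(), v.stddev());
    return 0;
  }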