The bug was caused by several issues.
2 problems in seek_io_cache. Due to wrong offsets used, we would end up
seeking way too much (first change), or over the intended seek point
(second change). Fixing it requires correctly detecting available data
in buffer (first change), and not using "IO_SIZE alligned" reads. The
second is needed because _my_b_cache_read adjusts the pos_in_file itself
based on read_pos and read_end. Pretending buffer is empty when we want
to force a read will aleviate this problem.
Secondly, the big-table cursors didn't repect the interface definitions
of always returning the rownumber that Table_read_cursor::fetch() would activate.
At the same time, next(), prev() and move_to() should not perform any
row activation.
Window functions need to be computed after applying the HAVING clause.
An optimization that we have for regular, non-window function, cases is
to apply having only during sending of the rows to the client. This
allows rows that should be filtered from the temporary table used to
store aggregation results to be stored there.
This behaviour is undesireable for window functions, as we have to
compute window functions on the result-set after HAVING is applied.
Storing extra rows in the table leads to wrong values as the frame
bounds might capture those -to be filtered afterwards- rows.
The same approach is needed for LAST_VALUE, otherwise the LAST_VALUE sum
functions are not cleared correctly. Now LAST_VALUE behaves as NTH_VALUE
with 0 offset, only that the frame that it is examining is the bottom bound,
not the top bound.
There was no implementation of the virtual method print()
for the Item_window_func class. As a result for a view
containing window function an invalid view definition could
be written in the frm file. When a query that refers to
this view was executed a syntax error was reported.
Add support for having multiple IO_CACHEs with type=READ_CACHE to share
the file they are reading from.
Each IO_CACHE keeps its own in-memory buffer. When doing a read or seek
operation on the file, it notifies other IO_CACHEs that the file position
has been changed.
Make Rowid_seq_cursor use cloned IO_CACHE when reading filesort result.
Refactour out (into a copy for now) the logic of Item_sum_hybrid, to
allow for multiple arguments. It does not contain the comparator
members. The result is the class Item_sum_hybrid_simple.
LEAD and LAG make use of this Item to store previous rows in a chache.
It also helps in specifying the field type. Currently LEAD/LAG do not
support default values.
NTH_VALUE behaves identical to LEAD and LAG, except that the starting
position cursor is placed on the top of the frame instead of the current
row.
Make window functions work with an empty over clause by forcing
a sort on the first column of the current join_tab. This is a temporary
fix until we get window functions to work with big tables.
The positional cursor now fetches rows based on the positional
cursor and an offset (if present). It will fetch rows, based on the
offset, only if the required position is not out of bounds.
With clever use of partition bounds, we only need to add one row to the
items at a time. This way we remove the need to "reset" the item and run
through the full partition again.
We can set values in the record buffer first and only perform one table
write call at the end. No need to write to file every time one column is
updated.
Also, remove unused method from Table_read_cursor.
This makes them behave exactly like CURRENT ROW. Standard specifies
unsigned integer, which includes the value 0.
Expand the win_min_max test to include this kind of frame definitions.
Cursors now report their current row number as the boundary of the
partition. This is used by Frame_scan_cursor to compute aggregate
functions that do not support removal.
Currently cursors automatically add values to the sum functions they
manage. There are use cases when we just want to figure out the frame
boundaries, without actually adding/removing values from them.
When specifying a RANGE type frame that exceeds the partition size, both
for the top and bottom cursors we end up removing more rows than added
to the aggregate function. This happens because our TOP range cursor,
which removes values from the aggregate function, would be allowed to breach
partition boundaries, while the BOTTOM range cursor would not.
To prevent this from happening, force the TOP range cursor to only move
within the current partition, as does the BOTTOM range cursor.
Perform only one table scan for each window function present. We do this
by keeping keeping cursors for each window function frame bound and
running them for each function for every row.
- Avoid some realloc() during startup
- Ensure that file_key_management_plugin frees it's memory early, even if
it's linked statically.
- Fixed compiler warnings from unused variables and missing destructors
- Fixed wrong indentation
Make Frame_range_current_row_bottom to take into account partition bounds.
Other partition bounds that could potentially hit the end of partition are
Frame_range_n_bottom, Frame_n_rows_following, Frame_unbounded_following,
and they all had end-of-partition protection.
To simplify the code, factored out end-of-partition checks into
class Partition_read_cursor.
This bug revealed a serious problem: if the same partition list
was used in two window specifications then the temporary table created
to calculate window functions contained fields for two identical
partitions. This problem was fixed as well.
n=0 in "ROWS 0 PRECEDING" is valid, add handling for it:
- Adjust the assert
- Bottom bound of 'ROW 0 PRECEDING' is actually looking at the current
row, that is, it needs to process partition's first row directly in
Frame_n_rows_preceding::next_partition().
- Added testcases
" The sort order for the sub-sequence of window functions starting
from the element marked by SORTORDER_CHANGE_FLAG up to the next
element marked by SORTORDER_CHANGE_FLAG must be taken from the
last element of the sub-sequence (not from the first one)."
- Rename Window_funcs_computation to Window_funcs_computation_step
- Introduce Window_func_sort which invokes filesort and then
invokes computation of all window functions that use this ordering.
- Expose Window functions' sort operations in EXPLAIN|ANALYZE FORMAT=JSON
that the call-back comparison function returns a positive
number when arg1 < arg2, and a negative number when arg1 > arg2.
This is not in line with other implementation of sorting
algorithm.
Changed bubble_sort: now a negative result from the comparison
function means that arg1 < arg2, and positive result means
that arg1 > arg2.
Changed accordingly all call-back functions that are used as
parameters in the call of bubble_sort.
Added a test case to check the proper sorting of window functions.