mariadb/mysql-test/suite/innodb/t/purge_secondary.test
Marko Mäkelä 7cffb5f6e8 MDEV-23399: Performance regression with write workloads
The buffer pool refactoring in MDEV-15053 and MDEV-22871 shifted
the performance bottleneck to the page flushing.

The configuration parameters will be changed as follows:

innodb_lru_flush_size=32 (new: how many pages to flush on LRU eviction)
innodb_lru_scan_depth=1536 (old: 1024)
innodb_max_dirty_pages_pct=90 (old: 75)
innodb_max_dirty_pages_pct_lwm=75 (old: 0)

Note: The parameter innodb_lru_scan_depth will only affect LRU
eviction of buffer pool pages when a new page is being allocated. The
page cleaner thread will no longer evict any pages. It used to
guarantee that some pages will remain free in the buffer pool. Now, we
perform that eviction 'on demand' in buf_LRU_get_free_block().
The parameter innodb_lru_scan_depth(srv_LRU_scan_depth) is used as follows:
 * When the buffer pool is being shrunk in buf_pool_t::withdraw_blocks()
 * As a buf_pool.free limit in buf_LRU_list_batch() for terminating
   the flushing that is initiated e.g., by buf_LRU_get_free_block()
The parameter also used to serve as an initial limit for unzip_LRU
eviction (evicting uncompressed page frames while retaining
ROW_FORMAT=COMPRESSED pages), but now we will use a hard-coded limit
of 100 or unlimited for invoking buf_LRU_scan_and_free_block().

The status variables will be changed as follows:

innodb_buffer_pool_pages_flushed: This also includes the count of
innodb_buffer_pool_pages_LRU_flushed and should work reliably,
updated one by one in buf_flush_page() to give more real-time
statistics. The function buf_flush_stats(), which we are removing,
was not called in every code path. For both counters, we will use
regular variables that are incremented in a critical section of
buf_pool.mutex. Note that show_innodb_vars() directly links to the
variables, and reads of the counters will *not* be protected by
buf_pool.mutex, so you cannot get a consistent snapshot of both variables.

The following INFORMATION_SCHEMA.INNODB_METRICS counters will be
removed, because the page cleaner no longer deals with writing or
evicting least recently used pages, and because the single-page writes
have been removed:
* buffer_LRU_batch_flush_avg_time_slot
* buffer_LRU_batch_flush_avg_time_thread
* buffer_LRU_batch_flush_avg_time_est
* buffer_LRU_batch_flush_avg_pass
* buffer_LRU_single_flush_scanned
* buffer_LRU_single_flush_num_scan
* buffer_LRU_single_flush_scanned_per_call

When moving to a single buffer pool instance in MDEV-15058, we missed
an opportunity to simplify the buf_flush_page_cleaner thread. It was
unnecessarily using a mutex and some complex data structures, even
though we always have a single page cleaner thread.

Furthermore, the buf_flush_page_cleaner thread had separate 'recovery'
and 'shutdown' modes where it was waiting to be triggered by some
other thread, adding unnecessary latency and potential for hangs in
relatively rarely executed startup or shutdown code.

The page cleaner was also running two kinds of batches in an
interleaved fashion: "LRU flush" (writing out some least recently used
pages and evicting them on write completion) and the normal batches
that aim to increase the MIN(oldest_modification) in the buffer pool,
to help the log checkpoint advance.

The buf_pool.flush_list flushing was being blocked by
buf_block_t::lock for no good reason. Furthermore, if the FIL_PAGE_LSN
of a page is ahead of log_sys.get_flushed_lsn(), that is, what has
been persistently written to the redo log, we would trigger a log
flush and then resume the page flushing. This would unnecessarily
limit the performance of the page cleaner thread and trigger the
infamous messages "InnoDB: page_cleaner: 1000ms intended loop took 4450ms.
The settings might not be optimal" that were suppressed in
commit d1ab89037a unless log_warnings>2.

Our revised algorithm will make log_sys.get_flushed_lsn() advance at
the start of buf_flush_lists(), and then make a 'best effort' attempt to
write out all pages. The flush batches will skip pages that were modified
since the log was written, or are currently exclusively locked.
The MDEV-13670 message "page_cleaner: 1000ms intended loop took"
will be removed, because by design, the buf_flush_page_cleaner() should
not be blocked during a batch for extended periods of time.
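
The skip rule described above can be sketched in C++ roughly as follows.
This is a minimal model and not InnoDB code: DirtyPage and flush_batch()
are hypothetical names, the list is a plain vector rather than
buf_pool.flush_list, and "writing" a page only sets a flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a dirty page in the flush list.
struct DirtyPage {
  uint64_t page_lsn;  // plays the role of FIL_PAGE_LSN
  bool x_latched;     // true if a writer holds the page exclusively
  bool written;       // set once the page has been "written out"
};

// Best-effort batch: write at most max_n pages and never wait for the
// redo log or for a page latch. Returns the number of pages written.
size_t flush_batch(std::vector<DirtyPage>& flush_list,
                   uint64_t flushed_lsn, size_t max_n) {
  size_t n = 0;
  for (DirtyPage& page : flush_list) {
    if (n == max_n) break;
    if (page.written) continue;
    if (page.page_lsn > flushed_lsn) continue;  // log not yet durable
    if (page.x_latched) continue;               // would need to wait
    page.written = true;  // a real batch would submit a write here
    ++n;
  }
  assert(n <= max_n);
  return n;
}
```

Pages that are skipped simply stay in the list, so a later call with a
larger flushed_lsn (or after the latch has been released) picks them up.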

We will remove the single-page flushing altogether. Related to this,
the debug parameter innodb_doublewrite_batch_size will be removed,
because all of the doublewrite buffer will be used for flushing
batches. If a page needs to be evicted from the buffer pool and all
100 least recently used pages in the buffer pool have unflushed
changes, buf_LRU_get_free_block() will execute buf_flush_lists() to
write out and evict innodb_lru_flush_size pages. At most one thread
will execute buf_flush_lists() in buf_LRU_get_free_block(); other
threads will wait for that LRU flushing batch to finish.
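
The "at most one thread flushes, the others wait" rule can be
illustrated with a counter and a condition variable. The sketch below is
an assumption-laden model: LruFlushGate and flush_or_wait() are
hypothetical, and only the member names n_flush_LRU and done_flush_LRU
are borrowed from this commit message.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical model of "one LRU flush batch at a time".
struct LruFlushGate {
  std::mutex mutex;                        // plays the role of buf_pool.mutex
  std::condition_variable done_flush_LRU;  // signalled when n_flush_LRU == 0
  size_t n_flush_LRU = 0;                  // number of active LRU batches
  size_t batches_run = 0;                  // bookkeeping for this example only

  // Returns true if the caller ran the batch itself, false if it only
  // waited for a batch that another thread had already started.
  template <typename Batch>
  bool flush_or_wait(Batch&& batch) {
    std::unique_lock<std::mutex> lk(mutex);
    if (n_flush_LRU) {
      done_flush_LRU.wait(lk, [this] { return n_flush_LRU == 0; });
      return false;
    }
    n_flush_LRU = 1;
    lk.unlock();
    batch();  // write out and evict pages without holding the mutex
    lk.lock();
    assert(n_flush_LRU == 1);
    n_flush_LRU = 0;
    ++batches_run;
    lk.unlock();
    done_flush_LRU.notify_all();
    return true;
  }
};
```

A waiting thread would rescan for a free block after flush_or_wait()
returns false. The sketch does not handle a batch that throws.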

To improve concurrency, we will replace the InnoDB ib_mutex_t and
os_event_t with native mutexes and condition variables in this area of code.
Most notably, this means that the buffer pool mutex (buf_pool.mutex)
is no longer instrumented via any InnoDB interfaces. It will continue
to be instrumented via PERFORMANCE_SCHEMA.

For now, both buf_pool.flush_list_mutex and buf_pool.mutex will be
declared with MY_MUTEX_INIT_FAST (PTHREAD_MUTEX_ADAPTIVE_NP). The critical
sections of buf_pool.flush_list_mutex should be shorter than those for
buf_pool.mutex, because in the worst case, they cover a linear scan of
buf_pool.flush_list, while the worst case of a critical section of
buf_pool.mutex covers a linear scan of the potentially much longer
buf_pool.LRU list.

mysql_mutex_is_owner(), safe_mutex_is_owner(): New predicate, usable
with SAFE_MUTEX. Some InnoDB debug assertions need this predicate
instead of mysql_mutex_assert_owner() or mysql_mutex_assert_not_owner().
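
To show why a predicate is more flexible than an assert-only macro, here
is a minimal sketch of an owner-tracking mutex. OwnedMutex is a
hypothetical class written for this illustration; it is not the
SAFE_MUTEX implementation and makes no claim about how
safe_mutex_is_owner() works internally.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical wrapper that remembers which thread holds the mutex.
class OwnedMutex {
  std::mutex m_;
  std::atomic<std::thread::id> owner_{std::thread::id()};

public:
  void lock() {
    assert(!is_owner());  // no recursive locking
    m_.lock();
    owner_.store(std::this_thread::get_id(), std::memory_order_relaxed);
  }
  void unlock() {
    assert(is_owner());
    owner_.store(std::thread::id(), std::memory_order_relaxed);
    m_.unlock();
  }
  // Predicate form: can be combined with other conditions in an assertion.
  bool is_owner() const {
    return owner_.load(std::memory_order_relaxed) ==
           std::this_thread::get_id();
  }
};
```

With a predicate, a debug assertion can express a compound condition
such as assert(page_is_private || mutex.is_owner()), which a macro that
only asserts ownership cannot.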

buf_pool_t::n_flush_LRU, buf_pool_t::n_flush_list:
Replaces buf_pool_t::init_flush[] and buf_pool_t::n_flush[].
The number of active flush operations.

buf_pool_t::mutex, buf_pool_t::flush_list_mutex: Use mysql_mutex_t
instead of ib_mutex_t, to have native mutexes with PERFORMANCE_SCHEMA
and SAFE_MUTEX instrumentation.

buf_pool_t::done_flush_LRU: Condition variable for !n_flush_LRU.

buf_pool_t::done_flush_list: Condition variable for !n_flush_list.

buf_pool_t::do_flush_list: Condition variable to wake up the
buf_flush_page_cleaner when a log checkpoint needs to be written
or the server is being shut down. Replaces buf_flush_event.
We will keep using timed waits (the page cleaner thread will wake
_at least_ once per second), because the calculations for
innodb_adaptive_flushing depend on fixed time intervals.
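
The timed wait can be sketched with a standard condition variable. The
names PageCleanerWakeup, request() and wait() are hypothetical; only
do_flush_list is taken from the description above, and the one-second
default reflects the "at least once per second" statement.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical model of the page cleaner's wake-up logic.
struct PageCleanerWakeup {
  std::mutex mutex;
  std::condition_variable do_flush_list;
  bool requested = false;  // set when a checkpoint or shutdown needs a flush

  void request() {
    {
      std::lock_guard<std::mutex> g(mutex);
      requested = true;
    }
    do_flush_list.notify_one();
  }

  // Returns true if woken by request(), false if the interval elapsed.
  // Either way the caller runs one iteration of its loop afterwards.
  bool wait(std::chrono::milliseconds interval =
                std::chrono::milliseconds(1000)) {
    assert(interval.count() > 0);
    std::unique_lock<std::mutex> lk(mutex);
    bool woken =
        do_flush_list.wait_for(lk, interval, [this] { return requested; });
    requested = false;
    return woken;
  }
};
```

Using the predicate overload of wait_for() means spurious wake-ups do
not shorten the interval, which matters if the caller's calculations
assume a fixed period.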

buf_dblwr: Allocate statically, and move all code to member functions.
Use a native mutex and condition variable. Remove code to deal with
single-page flushing.

buf_dblwr_check_block(): Make the check debug-only. We were spending
a significant amount of execution time in page_simple_validate_new().

flush_counters_t::unzip_LRU_evicted: Remove.

IORequest: Make more members const. FIXME: m_fil_node should be removed.

buf_flush_sync_lsn: Protect by std::atomic, not page_cleaner.mutex
(which we are removing).

page_cleaner_slot_t, page_cleaner_t: Remove many redundant members.

pc_request_flush_slot(): Replaces pc_request() and pc_flush_slot().

recv_writer_thread: Remove. Recovery works just fine without it, if we
simply invoke buf_flush_sync() at the end of each batch in
recv_sys_t::apply().

recv_recovery_from_checkpoint_finish(): Remove. We can simply call
recv_sys.debug_free() directly.

srv_started_redo: Replaces srv_start_state.

SRV_SHUTDOWN_FLUSH_PHASE: Remove. logs_empty_and_mark_files_at_shutdown()
can communicate with the normal page cleaner loop via the new function
flush_buffer_pool().

buf_flush_remove(): Assert that the calling thread is holding
buf_pool.flush_list_mutex. This removes unnecessary mutex operations
from buf_flush_remove_pages() and buf_flush_dirty_pages(),
which replace buf_LRU_flush_or_remove_pages().

buf_flush_lists(): Renamed from buf_flush_batch(), with simplified
interface. Return the number of flushed pages. Clarified comments and
renamed min_n to max_n. Identify LRU batch by lsn=0. Merge all the functions
buf_flush_start(), buf_flush_batch(), buf_flush_end() directly to this
function, which was their only caller, and remove 2 unnecessary
buf_pool.mutex release/re-acquisition cycles that we used to perform around
the buf_flush_batch() call. At the start, if not all log has been
durably written, wait for a background task to do it, or start a new
task to do it. This allows the log write to run concurrently with our
page flushing batch. Any pages that were skipped due to too recent
FIL_PAGE_LSN or due to them being latched by a writer should be flushed
during the next batch, unless there are further modifications to those
pages. It is possible that a page that we must flush due to small
oldest_modification also carries a recent FIL_PAGE_LSN or is being
constantly modified. In the worst case, all writers would then end up
waiting in log_free_check() to allow the flushing and the checkpoint
to complete.

buf_do_flush_list_batch(): Clarify comments, and rename min_n to max_n.
Cache the last looked up tablespace. If neighbor flushing is not applicable,
invoke buf_flush_page() directly, avoiding a page lookup in between.

buf_flush_space(): Auxiliary function to look up a tablespace for
page flushing.

buf_flush_page(): Defer the computation of space->full_crc32(). Never
call log_write_up_to(), but instead skip persistent pages whose latest
modification (FIL_PAGE_LSN) is newer than the redo log. Also skip
pages on which we cannot acquire a shared latch without waiting.

buf_flush_try_neighbors(): Do not bother checking buf_fix_count
because buf_flush_page() will no longer wait for the page latch.
Take the tablespace as a parameter, and only execute this function
when innodb_flush_neighbors>0. Avoid repeated calls of page_id_t::fold().

buf_flush_relocate_on_flush_list(): Declare as cold, and push down
a condition from the callers.

buf_flush_check_neighbor(): Take id.fold() as a parameter.

buf_flush_sync(): Ensure that the buf_pool.flush_list is empty,
because the flushing batch will skip pages whose modifications have
not yet been written to the log or were latched for modification.

buf_free_from_unzip_LRU_list_batch(): Remove redundant local variables.

buf_flush_LRU_list_batch(): Let the caller buf_do_LRU_batch() initialize
the counters, and report n->evicted.
Cache the last looked up tablespace. If neighbor flushing is not applicable,
invoke buf_flush_page() directly, avoiding a page lookup in between.

buf_do_LRU_batch(): Return the number of pages flushed.

buf_LRU_free_page(): Only release and re-acquire buf_pool.mutex if
adaptive hash index entries are pointing to the block.

buf_LRU_get_free_block(): Do not wake up the page cleaner, because it
will no longer perform any useful work for us, and we do not want it
to compete for I/O while buf_flush_lists(innodb_lru_flush_size, 0)
writes out and evicts at most innodb_lru_flush_size pages. (The
function buf_do_LRU_batch() may complete after writing fewer pages if
more than innodb_lru_scan_depth pages end up in buf_pool.free list.)
Eliminate some mutex release-acquire cycles, and wait for the LRU
flush batch to complete before rescanning.

buf_LRU_check_size_of_non_data_objects(): Simplify the code.

buf_page_write_complete(): Remove the parameter evict, and always
evict pages that were part of an LRU flush.

buf_page_create(): Take a pre-allocated page as a parameter.

buf_pool_t::free_block(): Free a pre-allocated block.

recv_sys_t::recover_low(), recv_sys_t::apply(): Preallocate the block
while not holding recv_sys.mutex. During page allocation, we may
initiate a page flush, which in turn may initiate a log flush, which
would require acquiring log_sys.mutex, which should always be acquired
before recv_sys.mutex in order to avoid deadlocks. Therefore, we must
not be holding recv_sys.mutex while allocating a buffer pool block.
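
The lock-ordering argument can be illustrated with a small sketch. The
Recovery and Block types are hypothetical; log_mutex and recv_mutex
stand in for log_sys.mutex and recv_sys.mutex, and the allocation merely
takes log_mutex to model a page flush that triggers a log flush.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

struct Block { int page_no = -1; };

// Hypothetical model: log_mutex must always be acquired before recv_mutex.
struct Recovery {
  std::mutex log_mutex;
  std::mutex recv_mutex;
  std::vector<std::unique_ptr<Block>> applied;  // protected by recv_mutex

  // Allocation may need log_mutex, so it must not run under recv_mutex.
  std::unique_ptr<Block> allocate_block() {
    std::lock_guard<std::mutex> g(log_mutex);
    return std::unique_ptr<Block>(new Block());
  }

  // Returns true if the preallocated block was consumed.
  bool apply(int page_no, bool needs_new_page) {
    std::unique_ptr<Block> spare = allocate_block();  // recv_mutex not held
    assert(spare);
    std::lock_guard<std::mutex> g(recv_mutex);
    if (!needs_new_page) return false;  // spare is freed on return
    spare->page_no = page_no;
    applied.push_back(std::move(spare));
    return true;
  }
};
```

The cost of this pattern is an occasional allocation that turns out to
be unnecessary; the benefit is that no code path acquires log_mutex
while holding recv_mutex.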

BtrBulk::logFreeCheck(): Skip a redundant condition.

row_undo_step(): Do not invoke srv_inc_activity_count() for every row
that is being rolled back. It should suffice to invoke the function in
trx_flush_log_if_needed() during trx_t::commit_in_memory() when the
rollback completes.

sync_check_enable(): Remove. We will enable innodb_sync_debug from the
very beginning.

Reviewed by: Vladislav Vaintroub
2020-10-15 17:04:56 +03:00

--source include/have_innodb.inc
--source include/have_sequence.inc
# Ensure that the history list length will actually be decremented by purge.
SET @saved_frequency = @@GLOBAL.innodb_purge_rseg_truncate_frequency;
SET GLOBAL innodb_purge_rseg_truncate_frequency = 1;
CREATE TABLE t1 (
a SERIAL, b CHAR(255) NOT NULL DEFAULT '', c BOOLEAN DEFAULT false,
l LINESTRING NOT NULL DEFAULT ST_linefromtext('linestring(448 -689,
453 -684,451 -679,453 -677,458 -681,463 -681,468 -678,470 -676,470 -678,
468 -675,472 -675,472 -675,474 -674,479 -676,477 -675,473 -676,475 1324,
479 1319,484 1322,483 1323,486 1323,491 1328,492 1325,496 1325,498 1325,
501 1330,498 1331,500 1331,504 1330,508 1329,512 1332,513 1337,518 1339,
518 1339,513 1344,513 1344,512 1346,514 1351,515 1353,519 1358,518 1362,
522 1365,525 1360,526 1362,527 1362,528 1367,525 1371,528 1366,532 1369,
536 1374,539 1377,543 1379,539 1381,541 1382,543 1383,546 1388,549 1393,
554 1393,554 1395,554 1392,550 1394,550 1392,546 1394,549 1397,550 1393,
549 1394,554 1390,554 1391,549 1396,551 1396,547 1400,547 1402,551 1407,
554 1412,554 1415,558 1418,463 -681,465 -677,465 -675,470 -670,470 -665,
470 -660,470 -659,473 -656,476 -656,481 -655,482 -652,486 -654,486 -652,
486 -648,491 -646,490 -651,494 -646,493 -644,493 -644,490 -644,491 2356,
495 2359,495 2364,500 2359,503 5359,504 5364,509 5368,504 5367,499 5368,
498 5371,498 5369,500 5370,504 5370,508 5370,511 5370,507 5374,508 5378,
511 5382,507 5387,509 5389,512 5388,515 5393,520 5396,517 5397,517 5402,
515 5404,520 5402,521 5405,525 5405,526 5408,530 7408,535 7413,533 7415,
529 7412,532 7416,4532 7416,4534 7421,4533 7417,4536 7413,4536 7418,
4540 3418,4545 3418,4549 3415,4551 3419,4554 3421,4559 3423,4559 3426,
4557 3424,4561 3428,4558 3428,4563 3431,4565 3435,4569 3439,4569 3439,
4569 3444,4567 3444,4572 3446,4577 3447,4581 3444,4581 3448,4584 3448,
4579 3447,4580 3450,4583 3449,4583 3453,4587 3455,4588 3458,4593 3463,
4598 3465,4601 3468,4598 3464,4598 3460,4593 5460,4595 5461,4600 5464,
4600 5465,4601 5466,4606 5466,4608 5466,4605 5464,4608 5467,4607 5468,
4609 5465,4614 5461,4618 5463,4621 5467,4623 5470,4622 5470,4622 5470,
4625 6470,4627 6471,4627 6472,4627 6473,6627 6474,6625 6474,6628 6477,
6633 6481,6633 6480,6637 6475,7637 6479,7638 6482,7643 6487,7644 6492,
7647 6492,7648 6495,7646 6498,7650 6499,7646 6494,7644 6499,7644 6497,
7644 6499,7647 6502,7649 6504,7650 6501,7647 6503,7649 6504,7650 6508,
7651 6503,7652 6508,7655 6508,7650 6511,7655 6515,7658 6513,7663 6513,
7665 6514,7669 6512,7667 6510,7664 6510,472 -675,477 -670,479 -666,
482 -663,484 -668,484 -666,485 -664,481 -664,479 -659,482 -659,484 -658,
483 -659,488 2341,493 2339,489 2338,491 2342,491 2346,494 2346,490 2348,
493 2348,498 2349,498 2350,499 2349,502 2350,503 2348,506 2348,506 2348,
507 2353,507 2355,504 2359,504 2364,504 2361,499 2365,502 2360,502 2358,
503 2357,504 2353,504 2357,500 2356,497 2355,498 2355,500 2359,502 2361,
505 2364,508 2364,506 2368,506 2370,504 2373,499 2373,496 2372,493 2377,
497 2380,495 2383,496 7383,493 7386,497 7391,494 7387,495 7389,498 7392,
498 7392,495 7395,493 7398,498 7401,498 7403,503 7400,498 8400,501 8401,
503 8401,503 8401,501 10401,496 10396,491 10401,492 10399,493 10403,
496 10403,491 10403,493 10407,489 10410,493 10407,489 10403,498 7403,
497 7399,496 7403,500 7405,500 7407,503 7411,508 7415,511 7415,511 7420,
515 7420,520 7423,523 7423,520 7427,523 7427,523 7427,522 7432,525 4432,
527 4434,530 4437,534 4441,529 4446,529 4441,534 4436,537 4436,535 4437,
532 4437,534 4432,535 4429,538 4430,542 4427,542 4431,538 4431,541 4431,
541 4433,543 4433,545 4432,549 4428,552 4426,556 4427,557 4423,560 4427,
561 4428,558 4430,559 4434,559 4432,561 4434,561 4437,563 4435,559 4430,
561 4435,4561 4437,4566 4441,4568 4446,4568 4450,4569 4455,4565 4458,
4561 4463,4561 9463,4564 9463,4565 9461,9565 9463,9560 9467,9560 9466,
9555 9469,9555 9471,9559 9469,9557 9473,9553 9478,9555 9480,9557 9481,
9557 9481,9557 9483,9562 9487,9558 9487,9558 9490,9561 9493,9562 9493,
9557 9493,9560 9496,9555 9501,9553 9503,9553 9506,9557 9510,9558 9511,
9561 9514,9563 9512,9568 9514,9567 9514,9567 13514,9570 13517,9566 13521,
9571 13521,9571 13526,9573 13521,9571 13521,9576 10521,9580 10526,9582 10525,
9584 10528,9584 10531,9584 10533,9589 10533,9588 10537,9588 10541,9589 10542,
9593 10544,9595 10540,9597 10541,9600 10545,9601 15545,9603 15549,9605 15553,
9601 15558,9601 15553,9605 15551,9605 15550,9605 15554,9607 15556,9605 15556,
9604 15561,9607 15559,9603 15559,9603 15562,9604 15563,9608 15566,9612 15570,
9617 15565,9622 15568,9627 15566,9628 15564,9629 15564,9633 15569,9636 15569,
9634 15571,9634 15572,9636 15574,9634 15570,9629 15570,9631 15567,9629 15570,
9626 15574,9626 15575,498 7401,502 7401,506 7397,506 7395,502 7398,497 7401,
502 7402,505 7397,508 7400,504 7404,3504 7409,3505 7405,3508 7410,3511 7413,
3511 7416,3511 7419,3511 7419,3513 7421,3517 7424,3519 7426,3520 11426,
3523 11421,3527 11418,3530 11415,3530 11416,3533 11418,7533 11415,7531 11415,
7531 11417,7536 11420,7541 11424,7543 11425,7543 11427,7543 11429,7540 11429,
7542 11425,7541 11420,7542 11421,7542 11422,7540 11424,7540 11423,7543 11422,
7546 11426,7550 11431,7553 11436,7555 16436,7553 16438,7558 16438,7559 16438,
7560 16439,7565 16437,7560 16435,7563 16435,7566 16440,7566 16444,7564 16447,
7559 16443,7561 16443,7566 16448,7570 16451,7574 16456,7578 16459,
12578 16459,12578 20459,12577 20456,12581 20454,12585 20456,12585 20456,
12585 20456,12583 20456,12579 20459,12580 20461,12580 20462,12580 20460,
12585 20465,12586 20467,12590 20470,12590 20470,12589 20471,12584 20471,
12589 20471,9589 20472,9594 20472,9595 20472,9596 20477,9598 20482,
9603 20480,9608 20484,9613 20484,9610 20486,9608 20488,9608 20489,9610 20489,
9614 20486,9619 20481,9620 20481,9618 21481,9621 21483,9626 21483,9628 21485,
9623 21487,9622 21490,9626 21493,9621 21495,9626 21498,9622 21499,9624 21504,
9625 21499,9629 21501,9633 21498,9637 21495,9639 21498,9644 21501,9557 9481,
9560 9485,9561 9490,9563 9488,9560 9486,9558 9488,9561 9492,9563 9495,
9567 9492,9567 9488,9564 9490,9559 9495,9559 9498,9557 9502,9562 9506,
9564 9509,9569 9512,9569 9516,9569 9518,9569 9515,9571 9513,9571 9512,
9573 9513,9578 9516,9581 9516,9585 11516,9585 11521,9590 10521,9586 10524,
9589 10529,9589 10527,9589 10527,9594 10532,9594 10534,9598 10536,9598 10540,
9600 10542,9604 10538,9607 10538,9609 10543,9613 10538,9613 10533,9613 10537,
9610 10537,9614 10542,9609 10542,9610 10543,9610 10548,9611 10553,9616 7553,
9620 7553,9621 7557,9618 7559,9618 7554,9622 7557,9622 7561,9622 7556,
9622 7560,9619 7560,9620 7565,9622 7563,9627 7566,9630 7570,9630 7571,
9632 7573,9637 7576,9639 7578,9640 7576,9640 7579,9640 7575,9642 7570,
9646 7570,9651 7574,9653 7577,9652 7572,9653 7576,9653 7576,9651 7581,
9656 7585,9660 7586,9659 7591,9657 7594,9661 7598,9664 7602,9668 12602,
9673 12604,9676 12606,9679 12602,9682 12605,9677 12610,9674 12606,9674 12601,
9674 12603,9672 9603,9668 9605,9671 9606,9668 9611,9668 9606,9671 9611,
9675 9615,9677 9620,9678 9622,9679 9624,9684 9626,9685 9627,9685 9622,
9685 9626,9689 9628,9694 9633,9699 9637,9699 9637,9704 9636,9708 9637,
9709 9638,9707 9639,9705 9642,9707 9647,9710 9649,9711 9653,9716 9649,
9716 9648,9720 9650,9721 9648,9723 9648,9726 4648,12726 4653,12731 4655,
12734 4660,12730 4661,12733 4664,12733 4665,12735 4670,12737 4674,12741 4674,
12738 4675,12740 4675,12737 4675,12742 4678,12743 4681,12746 4677)'),
INDEX(b,c), SPATIAL INDEX `sidx`(l)
) ENGINE=InnoDB ROW_FORMAT=REDUNDANT;
INSERT INTO t1 () VALUES (),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),();
SELECT LENGTH(l) FROM t1;
INSERT INTO t1 (a) SELECT NULL FROM t1;
INSERT INTO t1 (a) SELECT NULL FROM t1;
CHECK TABLE t1;
UPDATE t1 SET c=true, l=ST_linefromtext('linestring(0 0,1 1,2 2)');
DELETE FROM t1;
CHECK TABLE t1;
source include/wait_all_purged.inc;
ANALYZE TABLE t1;
SELECT OTHER_INDEX_SIZE FROM INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS
WHERE NAME='test/t1';
# Work around MDEV-13942: drop the spatial index to avoid a possible hang.
ALTER TABLE t1 DROP INDEX `sidx`;
INSERT INTO t1 (a) SELECT * FROM seq_1_to_544;
ALTER TABLE t1 FORCE, ALGORITHM=INPLACE;
ALTER TABLE t1 FORCE, ALGORITHM=INPLACE;
SELECT (variable_value > 0) FROM information_schema.global_status
WHERE LOWER(variable_name) LIKE 'INNODB_BUFFER_POOL_PAGES_FLUSHED';
--echo # Note: The OTHER_INDEX_SIZE does not cover any SPATIAL INDEX.
--echo # To test that all indexes were emptied, replace DROP TABLE
--echo # with the following, and examine the root pages in t1.ibd:
--echo # FLUSH TABLES t1 FOR EXPORT;
--echo # UNLOCK TABLES;
DROP TABLE t1;
SET GLOBAL innodb_purge_rseg_truncate_frequency = @saved_frequency;