The root cause is the WAL logging of file operation when the actual
operation fails afterwards. It creates a situation with a log entry for
a operation that would always fail. I could simulate both the backup
scenario error and Innodb recovery failure exploiting the weakness.
We are following WAL for file rename operation and once logged the
operation must eventually complete successfully, or it is a major
catastrophe. Right now, we fail for rename and handle it as normal error
and it is the problem.
I created a patch to address RENAME operation to a non existing schema
where the destination schema directory is missing. The patch checks for
the missing schema before logging in an attempt to avoid the failure
after WAL log is written/flushed. I also checked that the schema cannot
be dropped or there cannot be any race with other rename to the same
file. This is protected by the MDL lock in SQL today.
The patch should this be a good improvement over the current situation
and solves the issue at hand.
Fix one more bug in "DDL redo" phase in prepare
If table was renamed, and then new table was created with the old name,
prepare can be confused, and .ibd can end up with wrong name.
Fix the order of how DDL fixup is applied , once again - ".new" files
should be processed after renames.