I want to start off by saying that there are a lot of causes for the error in the title of this article, most of which are easy to fix conditions centering around misconfiguration. However none of them applied to our situation here.

After weeks/months of troubleshooting and isolating on a customer’s (unfortunately production) servers we believe we’ve identified a key bug in MySQL’s row based replication method (RBR) which produces errors like this on the “slave” server (we’re using GTID multi-master so both servers are slaved to each other):

Could not execute Update_rows event on table customer1.table1; Can't find record in 'table1', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log db1-bin.000001, end_log_pos 344884

Could not execute Update_rows event on table customer1.table1; Can't find record in 'table1', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log db1-bin.000001, end_log_pos 9685352

Could not execute Update_rows event on table customer2.table1; Can't find record in 'table1', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log db1-bin.000001, end_log_pos 28653123

Could not execute Delete_rows event on table customer3.table2; Can't find record in 'table2', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log db1-bin.000001, end_log_pos 179801477

Our eventual conclusion was that these tables did not have any indexes which had a UNIQUE value set, which apparently causes big problems for RBR. Once we added an index which did have UNIQUE set, the problem vanished. In our case we created an extra column specifically for this and indexed that so it didn’t interfere with the existing data layout.

We’ve opened a MySQL bug report for this but have yet to see any movement on it:

We also created a script which identifies all tables in all databases that meet this condition:

Hopefully this helps others become aware of this and work around this issue.

4 Responses to “MySQL RBR (Row Based Replication) /w GTID halts with Error_code: 1032; handler error HA_ERR_END_OF_FILE”

  1. Mary

    We have the same problem however, the work around that you did in fixing your problem doesn’t work for us. All of our tables has primary keys which were set to auto-increment.

    Now, we don’t know how to fix this problem anymore.

    Reply
  2. Bobby

    We’ve got the same problem here. After running SQL_SLAVE_SKIP_COUNTER a couple of times and restating the slave server, the replication starts again. But this is not ideal as we need to constantly monitor the status.

    Do you think that switching from RBR to Mixed, would be a good idea?

    Also has anyone test that with MariaDB?

    Reply
    • Adam Strohl

      If you’re having to skip transactions, in my view, that isn’t a workable solution. Not only will this result in corruption or differences in the data between the two peers, it indicates there is still an underlying problem. I’ve never had a situation where we “just had to skip one transaction and everything was fine”. There will always be more. You’re also risking replication stopping randomly (possibly unnoticed if you’re not monitoring it closely) and then having to fail over to a peer with old data, too.

      For the customer that caused this issue to come up for us we tested extensively RBR, Mixed and Statement and for them RBR was the best. For this situation this was due to a lot of non-deterministic stored procedures which was producing different data on each peer which then, later, was causing replication failures due to collisions.

      For your situation this might not make sense. Generally I prefer Mixed or Statement replication unless you, like our above situation, absolutely had to use RBR.

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.