In this blog post, we'll discuss pt-online-schema-change and how to use it correctly.

Always use pt-osc?

Altering large tables can still be a problematic DBA task, even now that Online DDL features have improved in MySQL 5.6 and 5.7. Some ALTER types are still not online, or are sometimes simply too expensive to execute on a busy production master.

So in some cases, we may want to apply an ALTER on the slaves first, taking them out of the traffic pool one by one and bringing them back after the ALTER is done. In the end, we can promote one of the already-altered slaves to be the new master, so that the downtime/maintenance time is greatly minimized. The ex-master can be altered later, without affecting production. Of course, this method works best when the schema change is backwards-compatible.

So far so good, but there is another problem. Let's say the table is huge, and the ALTER takes a lot of time on the slave. When it is a DML-blocking type of ALTER (perhaps when using MySQL 5.5.x or older), there will be long slave lag (if the table is being written to by the replication SQL thread at the same time, for example). So what can we do to speed up the process and avoid lag on the altered slave? One tempting idea is to use pt-online-schema-change on the slave, since it can do the ALTER in a non-blocking fashion.

Let's see how that would work. I need to rebuild a big table on a slave running MySQL 5.6.16 (the "null" ALTER only became online as of 5.6.17) to reclaim disk space after some rows were deleted.

This example demonstrates the process (db1 is the master, db2 is the slave):
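
Here's a minimal sketch of the kind of command involved, run against the slave; the test database, big_table table, and sample values in the snippets below are assumptions for illustration:

    # On the slave (db2): rebuild the table with a "null" ALTER via pt-osc.
    # test.big_table is a hypothetical table name used throughout these examples.
    pt-online-schema-change \
      --alter "ENGINE=InnoDB" \
      h=db2,D=test,t=big_table \
      --execute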

The tool is still working during the operation, and the table receives some writes on the master.
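
For example, a simple write like this (the data column and id value are hypothetical):

    # On the master (db1): the table keeps taking writes while pt-osc copies rows.
    mysql -h db1 -e "UPDATE test.big_table SET data = 'new value' WHERE id = 100;"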

These writes are applied on the slave right away, as the table allows writes the whole time.
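
A quick check on the slave confirms it (same hypothetical names as above):

    # On the slave (db2): the freshly written row is already visible.
    mysql -h db2 -e "SELECT id, data FROM test.big_table WHERE id = 100;"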

Done! No slave lag, and the table is rebuilt. But… let's just make sure the data is consistent between the master and the slave. For that, you can use pt-table-checksum.
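
A sketch of the checksum run; it is executed on the master, so the tool can compare the slaves' data through replication:

    # On the master (db1): checksum the table; differences found on slaves are reported.
    pt-table-checksum h=db1 --databases=test --tables=big_table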

No, it is not! The slave is clearly missing the updates that happened during the pt-osc run. Why?

The explanation is simple: pt-online-schema-change relies on triggers. The triggers make writes happening to the original table also apply to the new table copy, so that both tables are consistent when the final table switch happens at the end of the process. So what is the problem here? It's the binary log format: with ROW-based replication, triggers are not fired on the slave! And my master is running in ROW mode:
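
A quick way to verify, with the output as it appeared for this setup:

    # On the master (db1): confirm the binary log format.
    mysql -h db1 -e "SHOW GLOBAL VARIABLES LIKE 'binlog_format';"
    # +---------------+-------+
    # | Variable_name | Value |
    # +---------------+-------+
    # | binlog_format | ROW   |
    # +---------------+-------+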

So, if I had used pt-online-schema-change on the master, the data inconsistency problem would not have happened. But using it on the slave is just dangerous!

Conclusion

Whenever you use pt-online-schema-change, make sure you are not executing it on a slave instance. For that reason, I escalated this bug report: https://bugs.launchpad.net/percona-toolkit/+bug/1221372. Also, in many cases a normal ALTER will work well enough. As in my example, to rebuild the table separately on each slave in lockless mode, I would just need to upgrade to a more recent 5.6 version.
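
For reference, a sketch of that lockless rebuild on MySQL 5.6.17 or later (same hypothetical table name as before):

    # On each slave in turn: a "null" ALTER rebuilds the table online in 5.6.17+,
    # without blocking DML from the replication SQL thread.
    mysql -h db2 -e "ALTER TABLE test.big_table ENGINE=InnoDB, ALGORITHM=INPLACE, LOCK=NONE;"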

By the way, if you're wondering about Galera replication (used in Percona XtraDB Cluster, etc.), which also uses a ROW-based format: it's not a problem there. The pt-osc triggers are created on all nodes thanks to Galera's synchronous, write-anywhere replication nature. It does not matter which node you start pt-online-schema-change on, or which other nodes your applications write to at the same time. No slaves, no problem! 🙂

4 Comments

Piotr Mazurkiewicz

MariaDB has the ability to run triggers on the slave for row-based replication starting from 10.1.1, which I have ported to PS 5.6 successfully.

Todd Lomaro

What if you have master-master replication enabled?

zhuomin chen

This should NOT be the issue: even though the trigger on the original table on the slave is not firing, rows written to the new table on the master should be replicated to the new table on the slave via the binlog. I do not see the need for the slave to fire the trigger at all.

Jimmy

Igor

I was using pt-online-schema-change with --set-vars="SQL_LOG_BIN=OFF" to prevent propagation to the connected slave. I was expecting the change to take place on the 3-node Galera cluster but to be missed on the connected slave. Well, the change did happen on the Galera cluster, but the table on 2 of the Galera nodes is EMPTY! BEWARE!!!