In this blog post, we'll discuss pt-online-schema-change and how to use it correctly.

Always use pt-osc?

Altering large tables can still be a problematic DBA task, even now that Online DDL features have improved in MySQL 5.6 and 5.7. Some ALTER types are still not online, or are sometimes simply too expensive to execute on a busy production master.

So in some cases, we may want to apply an ALTER on the slaves first, taking them out of the traffic pool one by one and bringing them back after the ALTER is done. In the end, we can promote one of the already-altered slaves to be the new master, so that the downtime/maintenance time is greatly minimized. The ex-master can be altered later, without affecting production. Of course, this method works best when the schema change is backwards-compatible.

So far so good, but there is another problem. Let's say the table is huge, and the ALTER takes a lot of time on the slave. When it is a DML-blocking type of ALTER (perhaps when using MySQL 5.5.x or older), there will be long slave lag (if the table is being written to by the replication SQL thread at the same time, for example). So what can we do to speed up the process and avoid lag on the altered slave? One tempting idea is to use pt-online-schema-change on the slave, since it can do the ALTER in a non-blocking fashion.

Let's see how that would work. I need to rebuild a big table on a slave running MySQL 5.6.16 (the "null" ALTER only became online as of 5.6.17) to reclaim disk space after some rows were deleted.

This example demonstrates the process (db1 is the master, db2 is the slave):
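
Here's a minimal sketch of the kind of command involved, run against the slave; the test database, big_table table, and sample values in the snippets below are assumptions for illustration:

    # On the slave (db2): rebuild the table with a "null" ALTER via pt-osc.
    # test.big_table is a hypothetical table name used throughout these examples.
    pt-online-schema-change \
      --alter "ENGINE=InnoDB" \
      h=db2,D=test,t=big_table \
      --execute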

The tool is still working during the operation, and the table receives some writes on the master.
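
For example, a simple write like this (the data column and id value are hypothetical):

    # On the master (db1): the table keeps taking writes while pt-osc copies rows.
    mysql -h db1 -e "UPDATE test.big_table SET data = 'new value' WHERE id = 100;"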

These writes are applied on the slave right away, as the table allows writes the whole time.
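
A quick check on the slave confirms it (same hypothetical names as above):

    # On the slave (db2): the freshly written row is already visible.
    mysql -h db2 -e "SELECT id, data FROM test.big_table WHERE id = 100;"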

Done! No slave lag, and the table is rebuilt. But… let's just make sure the data is consistent between the master and the slave. For that, you can use pt-table-checksum.
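
A sketch of the checksum run; it is executed on the master, so the tool can compare the slaves' data through replication:

    # On the master (db1): checksum the table; differences found on slaves are reported.
    pt-table-checksum h=db1 --databases=test --tables=big_table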

No, it is not! The slave is clearly missing the updates that happened during the pt-osc run. Why?

The explanation is simple: pt-online-schema-change relies on triggers. The triggers make writes happening to the original table also apply to the new table copy, so that both tables are consistent when the final table switch happens at the end of the process. So what is the problem here? It's the binary log format: with ROW-based replication, triggers are not fired on the slave! And my master is running in ROW mode:
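
A quick way to verify, with the output as it appeared for this setup:

    # On the master (db1): confirm the binary log format.
    mysql -h db1 -e "SHOW GLOBAL VARIABLES LIKE 'binlog_format';"
    # +---------------+-------+
    # | Variable_name | Value |
    # +---------------+-------+
    # | binlog_format | ROW   |
    # +---------------+-------+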

So, if I had used pt-online-schema-change on the master, the data inconsistency problem would not have happened. But using it on the slave is just dangerous!

Conclusion

Whenever you use pt-online-schema-change, make sure you are not executing it on a slave instance. For that reason, I escalated this bug report: https://bugs.launchpad.net/percona-toolkit/+bug/1221372. Also, in many cases a normal ALTER will work well enough. As in my example, to rebuild the table separately on each slave in lockless mode, I would just need to upgrade to a more recent 5.6 version.
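
For reference, a sketch of that lockless rebuild on MySQL 5.6.17 or later (same hypothetical table name as before):

    # On each slave in turn: a "null" ALTER rebuilds the table online in 5.6.17+,
    # without blocking DML from the replication SQL thread.
    mysql -h db2 -e "ALTER TABLE test.big_table ENGINE=InnoDB, ALGORITHM=INPLACE, LOCK=NONE;"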

By the way, if you're wondering about Galera replication (used in Percona XtraDB Cluster, etc.), which also uses a ROW-based format: it's not a problem there. The pt-osc triggers are created on all nodes thanks to Galera's synchronous, write-anywhere replication nature. It does not matter which node you start pt-online-schema-change on, or which other nodes your applications write to at the same time. No slaves, no problem! 🙂

4 Comments

Piotr Mazurkiewicz

MariaDB has the ability to run triggers on the slave for row-based replication starting from 10.1.1, which I have ported to PS 5.6 successfully.

Todd Lomaro

What if you have master-master replication enabled?

zhuomin chen

This should NOT be the issue: even though the trigger on the original table on the slave is not firing, rows written to the new table on the master should be replicated to the new table on the slave via the binlog. I do not see the need for the slave to fire the trigger at all.

Jimmy

Igor

I was using pt-online-schema-change with --set-vars="SQL_LOG_BIN=OFF" to prevent propagation to the connected slave. I was expecting the change to take place on the 3-node Galera cluster but to be missed on the connected slave. Well, the change did happen on the Galera cluster, but the table on 2 of the Galera nodes is EMPTY! BEWARE!!!