3 handy tools to remove problematic MySQL processes

DBAs often encounter situations where they need to kill queries to ensure there are no long-running queries on a MySQL server that would impact performance. Long-running queries can be the result of many factors. Fortunately, Percona Server contains some handy tools to remove problematic MySQL processes. I will highlight all of these tools via some examples in this post.

pt-kill:
There have been some good posts on this blog about the pt-kill tool, like this one by Arunjith Aravindan titled “How a set of queries can be killed in MySQL using Percona Toolkit’s pt-kill.” Let’s dive into pt-kill a bit further with a few more examples. What does pt-kill do? It kills MySQL connections. Say you wanted to run pt-kill from a cronjob and then get an email on every killed process/query. Here is a typical example of that.
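A minimal sketch of such a command (the host, credentials, and log path are placeholders; since cron entries must fit on one line, you may want to wrap this in a small shell script that cron invokes):

    pt-kill --host=localhost --user=pt_user --password=xxxxx \
            --busy-time 5 --match-info "(?i)^select" \
            --kill --print --run-time 240 --interval 5 \
            --log /var/log/pt-kill.log --daemonize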

Assume this is running from a cronjob. When pt-kill executes, it will kill queries running longer than 5 seconds (--busy-time 5). By default, pt-kill runs forever; the --run-time option tells pt-kill how long to run before exiting. If the --interval and --busy-time parameters are used together, then the explicit --interval value is used; otherwise, the default interval is 30 seconds. Note: this will only kill read queries, as per the --match-info parameter.

The above command will log all killed queries in the file referenced by the --log option; --daemonize should be used along with --log for logging to work while pt-kill runs in the background. If you need to be notified via email for every killed query, the script below will do it. Of course, you need to have the system configured to send email.
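A workaround could be a small watcher script along these lines (the log path and recipient address are placeholders, and it assumes a working mail command on the system):

    #!/bin/bash
    # Follow the pt-kill log and mail every new killed-query entry.
    tail -n 0 -F /var/log/pt-kill.log | while read -r line; do
        echo "$line" | mail -s "pt-kill: query killed" dba@example.com
    done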

You can execute this shell script in the background within screen or with the nohup utility to continuously monitor the log file. It will send an email whenever a new killed query is appended to the log file referenced in the pt-kill command. Unfortunately, pt-kill has no notify-by-email option at the moment, so this is sort of a workaround.

In order to log all killed queries into a database table, you will need to use the --log-dsn option, as per the example below.
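Something like this, assuming a percona.kill_log table (--create-log-table will create it if it does not exist; host and credentials are placeholders):

    pt-kill --host=localhost --user=pt_user --password=xxxxx \
            --busy-time 1 --interval 1 --match-info "(?i)^select" \
            --kill --print --daemonize \
            --create-log-table --log-dsn "h=localhost,D=percona,t=kill_log"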

All killed queries will be logged into the percona.kill_log table. The --daemonize option will run this command in the background forever and will kill all SELECT queries running longer than 1 second (--busy-time 1). The --interval option instructs pt-kill to scan processes every 1 second (--interval 1).

By logging killed queries into a database table, you can easily get trends and statistics on killed queries via SQL.
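For example, here is a quick report of which users had the most queries killed over the last day, assuming the default columns that pt-kill creates with --create-log-table:

    SELECT User, COUNT(*) AS killed
    FROM percona.kill_log
    WHERE timestamp > NOW() - INTERVAL 1 DAY
    GROUP BY User
    ORDER BY killed DESC;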

By default, the tool kills the oldest of the queries that have been running for more than the given --busy-time. If you need to kill all the threads that have been busy for more than a specified --busy-time, then this will do it:
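The --victims all option overrides the default behavior (--victims oldest); host and credentials are again placeholders:

    pt-kill --host=localhost --user=pt_user --password=xxxxx \
            --victims all --busy-time 5 --kill --print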

Statement Timeout in Percona Server:
The max-statement-time feature is ported from the Twitter patches. This feature can be used to limit the query execution time by specifying the timeout value in the max_statement_time variable. When the specified number of milliseconds is reached, the server aborts the statement and returns the error below to the client.
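The error looks like the following (the exact error number and SQLSTATE may vary between versions, so treat them as illustrative):

    ERROR 1877 (70100): Query execution was interrupted, max_statement_time exceeded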

Let me demonstrate this through another example:
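A quick session sketch (the 1000 ms timeout value is arbitrary, and the error number is again illustrative):

    mysql> SET SESSION max_statement_time = 1000; -- timeout in milliseconds
    Query OK, 0 rows affected (0.00 sec)

    mysql> SELECT SLEEP(5);
    ERROR 1877 (70100): Query execution was interrupted, max_statement_time exceeded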

As you can see from this example, the statement timeout feature works for all statements, including SELECT, DML, and DDL queries.
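You can check the feature’s counters via status variables, as shown below (the values here are illustrative):

    mysql> SHOW GLOBAL STATUS LIKE 'Max_statement_time%';
    +-------------------------------+-------+
    | Variable_name                 | Value |
    +-------------------------------+-------+
    | Max_statement_time_exceeded   | 1     |
    | Max_statement_time_set        | 1     |
    | Max_statement_time_set_failed | 0     |
    +-------------------------------+-------+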

The above-mentioned status variables are the stats for the statement timeout feature. Max_statement_time_exceeded shows the total number of statements that exceeded the defined timeout. Max_statement_time_set shows the number of statements for which an execution time limit was set. You can find more details in this documentation. The statement timeout feature was introduced in Percona Server 5.6. You can check whether your specific version of Percona Server supports this feature via the have_statement_timeout variable.

Bugs you should be aware of:

https://bugs.launchpad.net/percona-server/+bug/1388533 -> This affects how the feature interacts with stored procedures. If you use stored procedures, max_statement_time might not behave as you expect.
https://bugs.launchpad.net/percona-server/+bug/1307432 -> This is a documentation bug. Percona Server timeouts might not be safe for some statements like DDL and should not be used with such queries; the documentation does not reflect this. You should be very careful if you set a global statement timeout, as it affects data-changing queries as well. For best results, set the max_statement_time variable in a session before running queries that you want killed if they execute too long, instead of using a global variable.
https://bugs.launchpad.net/percona-server/+bug/1376934 -> This affects the statement timeout feature at the query level. You must set max_statement_time in a session or globally instead. However, this bug is fixed in the latest version, i.e. Percona Server 5.6.22-72.0.

InnoDB Kill Idle Transactions:
This feature was introduced in Percona Server 5.5. It limits the age of idle XtraDB transactions and will kill idle transactions that have been open longer than the threshold specified by innodb_kill_idle_transaction. This feature is useful when autocommit is disabled on the server side, you are relying on the application to commit transactions, and you want to avoid long-running transactions that are uncommitted. Application logic errors sometimes leave transactions uncommitted. Let me demonstrate it quickly through one example:
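A sketch of what this looks like (test.t is a placeholder table, and the 5-second limit is arbitrary):

    mysql> SET GLOBAL innodb_kill_idle_transaction = 5; -- idle limit in seconds
    Query OK, 0 rows affected (0.00 sec)

    mysql> BEGIN;
    Query OK, 0 rows affected (0.00 sec)

    mysql> INSERT INTO test.t VALUES (1);
    Query OK, 1 row affected (0.01 sec)

    -- leave the transaction idle for more than 5 seconds, then:

    mysql> COMMIT;
    ERROR 2006 (HY000): MySQL server has gone away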

Conclusion:
In this post, I shared some tools that can help you get rid of long-running queries and transactions to help ensure that you don’t run into performance issues. This is one of the many good reasons to use Percona Server, which has some extra features on top of the vanilla MySQL server.

3 Comments
Aftab

I think you may also highlight the need to identify and optimise slow queries.

Jaime Crespo

Twitter’s MAX_STATEMENT_TIME feature is also in upstream 5.7, both as a SELECT syntax http://dev.mysql.com/doc/refman/5.7/en/select.html and as a global and session variable: http://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_max_statement_time

Masroor Farooqi

Excellent review of these tools and helpful examples to illustrate them. It’s good to have these tools under your belt for situations when they are needed.

However, I think Aftab’s comment was getting to the root of the problem. One needs to be worried about long queries, but I would be equally worried about a process having the power of a sledgehammer. A simple badly formed SELECT query doing a table scan can be killed easily by any of these tools. But imagine an UPDATE query running on a MyISAM table. The update runs for a long time and then gets quashed before all the updates are complete. Now I have committed the cardinal sin of a DBA; I have left the database in an inconsistent state. Definitely not cool.

Even in a transaction-safe world, I do an insert/update, and halfway through Brutus (my pt-kill process name :-)) comes and kills the process. Now the database has to do a rollback. We all know a rollback is an expensive process, so at this point I’m stuck between a rock and a hard place. Also not cool.

The point I’m trying to make is that automated tools are excellent, but they come at a price. In the database world, the price is a risk (small as it may be) that there may be unintended consequences. As long as we are ready to take those risks, bring them on.