In this blog post, we’ll discuss the ramifications of the Galera warning “Failed to report last committed” (Interrupted system call).

I have recently seen this error with Percona XtraDB Cluster (or Galera):

[Warning] WSREP: Failed to report last committed 549684236, -4 (Interrupted system call)

It was reported on Launchpad as a bug in 2013: https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1434646

My colleague Przemek replied and explained it as follows:

Reporting the last committed transaction is just a part of the certification index purge process. In case it fails for some reason (it occasionally does), the cert index purge may be a little delayed. But it does not mean the transaction was not applied successfully. This is a warning after all.

If we look up this error in the source code, we see that it reuses the Linux system error codes, reported with a negative sign. Specifically:

#define EINTR 4 /* Interrupted system call */
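
To make the mapping concrete, here is a minimal sketch of how a negative return code such as the -4 in the warning can be turned back into readable text. The helper name describe_wsrep_errno is hypothetical and not part of Galera; it only relies on the standard std::strerror call, and the exact message text depends on the platform's C library.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper (not part of Galera): translate a negative
// errno-style return code, such as the -4 in the warning, into text.
std::string describe_wsrep_errno(int ret)
{
    if (ret >= 0) return "success";
    // The code is logged negated, so flip the sign before the lookup.
    return std::string(std::strerror(-ret));
}
```

On Linux, passing -EINTR (that is, -4) should give the same "Interrupted system call" text that appears in the warning.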

As there isn’t much documentation regarding this error, and internet searches did not bring up useful information, my colleague David Bennett and I delved into the source code (as we do on occasion).

If we look at gcs_sm.hpp in the Galera source code, we see:

* @retval -EINTR  - was interrupted by another thread

We also see:

/* was interrupted, will be handled by someone else */

This means that the thread was interrupted, but the server will retry on another thread. Since it is only a warning, it is nothing to be too concerned about, unless these warnings begin to pile up (which could be a sign of concurrency issues).
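
One way to check whether the warnings are piling up is to count them per error code in the MySQL error log. The following is a rough sketch only: the function name count_report_failures is hypothetical, and the pattern is inferred from the warning text shown in this post, so it may need adjusting for other log formats.

```cpp
#include <map>
#include <regex>
#include <string>
#include <vector>

// Count "Failed to report last committed" warnings per error code.
// Assumption: the line layout matches the warning quoted in this post.
std::map<int, int> count_report_failures(const std::vector<std::string>& lines)
{
    static const std::regex pattern(
        R"re(WSREP: Failed to report last committed (\d+), (-?\d+) \()re");
    std::map<int, int> counts;
    for (const std::string& line : lines) {
        std::smatch m;
        if (std::regex_search(line, m, pattern)) {
            // Group 2 is the negative errno value, e.g. -4 for EINTR.
            ++counts[std::stoi(m[2].str())];
        }
    }
    return counts;
}
```

Feeding it the lines of an error log and watching how the count for -4 changes over time gives a simple indication of whether the warning is occasional or persistent.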

The specific warning is thrown from galera_service_thd.cpp here:

if (gu_unlikely(ret < 0))
{
    log_warn << "Failed to report last committed "
             << data.last_committed_ << ", " << ret
             << " (" << strerror (-ret) << ')';
    // @todo: figure out what to do in this case
}

This warning could be handled better, so that it neither floods the logs nor sounds cryptic enough to alarm administrators.
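
As an illustration of what "handled better" might look like, here is a small sketch of a log throttle that lets one warning through per time interval and counts the rest. This is not how Galera does it, and the WarnThrottle class and its methods are hypothetical names invented for this example.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical throttle, not Galera code: allow one log line per
// interval and count how many were suppressed in between.
class WarnThrottle
{
public:
    using Clock = std::chrono::steady_clock;

    explicit WarnThrottle(Clock::duration interval) : interval_(interval) {}

    // Returns true if the caller should emit the warning now.
    bool should_log(Clock::time_point now)
    {
        if (!first_ && now - last_ < interval_) {
            ++suppressed_;
            return false;
        }
        first_ = false;
        last_ = now;
        return true;
    }

    // Number of warnings suppressed since the last call; resets the count.
    std::size_t take_suppressed()
    {
        std::size_t n = suppressed_;
        suppressed_ = 0;
        return n;
    }

private:
    Clock::duration interval_;
    Clock::time_point last_{};
    bool first_ = true;
    std::size_t suppressed_ = 0;
};
```

A caller could check should_log(WarnThrottle::Clock::now()) before writing the warning and append the value of take_suppressed() to the message, so that a burst of interrupted calls shows up as a single line with a count rather than one line per failure.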

3 Comments
Roel Van de Paar

On PXC 5.7, using a local sysbench, I see this message on an HDD:

2016-08-09T09:43:58.131621Z 0 [Warning] WSREP: Failed to report last committed 71, -4 (Interrupted system call)
2016-08-09T09:44:01.746062Z 0 [Warning] WSREP: Failed to report last committed 43, -4 (Interrupted system call)
2016-08-09T09:44:04.974485Z 0 [Warning] WSREP: Failed to report last committed 45, -4 (Interrupted system call)
2016-08-09T09:44:12.695912Z 0 [Warning] WSREP: Failed to report last committed 51, -4 (Interrupted system call)
2016-08-09T09:44:16.046751Z 0 [Warning] WSREP: Failed to report last committed 53, -4 (Interrupted system call)

The same does not happen on an SSD in the same server (all else being equal).

Vasiliy Petrov

I had a slightly different message:
[Warning] WSREP: Failed to report last committed 285293519, -110 (Connection timed out)
What does it mean?

Krunal Bauskar

This simply means that the node was unable to send the commit report notification to the group channel, probably due to heavy network traffic. It falls into the same category and can be ignored, but it is also an important signal that you probably want to re-evaluate your load and available network bandwidth. Things will not break immediately, but if the load keeps growing this way, you may see nodes dropping out of the cluster in the future due to network issues.