
Fabian Tech Tips

Replication States used to check replication status
Oct 13, 2024
7 min read
0
0
0
Replication States
In order to monitor replication effectively, you need to know and understand the various states that the master and slave can occupy. Server states can be displayed by using the SHOW PROCESSLIST statement on the master and the slave.
At least one line of the results will be related to the replication activities for the user account associated with replication. Following the examples of this chapter, the account is replicant on the master and system user on the slave. In the Command column, on the master the value will be Binlog Dump, meaning a binary log thread; on the slave the value will be Connect.
The results will also contain a field called State, in which the state of the thread will be given. Here is an example from a slave:
SHOWPROCESSLIST \G***************************1.row***************************Id: 16User: systemuserHost: db: NULLCommand: ConnectTime: 119255State: Waiting formaster tosend event Info: NULL
These results show only one thread, the I/O thread waiting on the master. If the server were processing entries from the master’s binary log, there would probably be another row shown in the results for the SQL thread.
What follows is a list of all of the possible server states that you may see on master and slave servers, along with descriptions of each. In addition to understanding these traits, you may want to develop a script to check that replication is running on the slave and not stalled and to notify you if it’s not running.
Replication on MySQL is very stable, but if it does stop, it’s very quiet about it. Fortunately, it’s very good about rapidly catching up once you restart it.
Master BinLog Dump Thread States
Here is an overview of master server replication states that can be reported for binary log threads (Binlog Dump):
Has sent all binlog to slave; waiting for binlog to be updated: This is the most common status message you should see for a slave connection on the master. In this state, the master is basically doing nothing regarding replication. It has sent the slave all entries requested and is now waiting for another event to occur that will cause its binary log to be updated. Notice that it says it is waiting for the binary log to be updated. It doesn’t say it’s waiting for the databases to be updated. That’s handled by a different component of MySQL. The thread lives only to provide information about the binary log to the slave.
Sending binlog event to slave: After the binary log has been updated, the master informs the slave that one or more new entries have been made. If the slave requests the entries, the master enters this state, indicating that it is in the process of sending a slave entries or information on pertinent database events. There are obviously other states in between, but they are so fast and short-lived that they are not registered and therefore will not show up in the results of SHOW PROCESSLIST.
Finished reading one binlog; switching to next binlog: If a slave has been offline for a while, the master may have flushed its logs in the interim. Whenever the master does this, it will start a new log file, saving the previous ones. When a slave requests log entries that span more than one log file as the master switches from one file to the next, it enters this state.
Waiting to finalize termination: Once the master has completed the process of updating a slave, the master shows this status as it’s closing the binary log file and winding down the communication with the slave. When it is finished, the master will return to the first thread state (Has sent all binlog to slave; waiting for binlog to be updated) in which it is waiting for more changes to the binary log.
Slave I/O Thread States
Here is a list of replication states that can be found on the slave server for I/O threads:
Connecting to master: This state indicates that the slave I/O thread is attempting to connect to the master. If it can’t connect, it may stay in this state for a while as it retries.
Checking master version: After the slave connects to the master, it compares versions of MySQL with the master to ensure compatibility. This is very quick.
Registering slave on master: After the slave connects to the master, it registers itself with the master as a replication slave server. During this process, it will be in this state. On the master side of the connection, the Binlog Dump state will be Has sent all binlog to slave; waiting for binlog to be updated, as described previously.
Requesting binlog dump: When the slave has been informed of changes to the master binary log, it enters this state to request the new entries. Also, when it first connects to a server — either for the first time or after having been disconnected for a while — it enters this state briefly to request all entries since the last master binary log position that it gives the master. If no changes have occurred, none are returned. If there are new entries, the entries starting from the position given until the end of the master’s binary log will be transmitted to the slave. On the master side, you will see the state Sending binlog event to slave as a result of the request.
Waiting to reconnect after a failed binlog dump request: If the request for new entries mentioned in the previous state fails to be received from the master, the slave enters this state as it waits to be able to connect to the master periodically. This timeout period is configured using the --master-connect-retry and defaults to 60 seconds. The number of retries it will make can be found in the master.info file shown earlier in this chapter. Each time the slave attempts to reconnect, it will enter the next state.
Reconnecting after a failed binlog dump request: If the slave failed to stay connected to the master while trying to retrieve entries to the master’s binary log (as mentioned in the previous state description), this state indicates that the slave is trying to reconnect. If it fails again, it will go back to the previous state and wait to retry. By default, it will try 60 times before stopping. You can change the number of retries with the --master-connect-retry option.
Waiting for master to send event: This state is the most common that you will see, unless your server is very busy. The SQL thread is currently connected to the master and is waiting for the master to send it binary log updates. If there is no activity after a while, the connection will time out. The number of seconds that will elapse before timeout is reached can be found in the variable slave_net_timeout (previously slave_read_timeout). A timeout is the same as a lost connection for the slave. Therefore, it will become active and attempt to reconnect to the master, then inquire about any changes to the master’s binary log, before entering this state again.
Queueing master event to the relay log: This state occurs when the slave I/O thread has received changes to the master’s binary log from the master and is writing the SQL statements and the related information to the slave’s relay log. Once it’s done, the slave’s SQL thread will read the relay log and execute the new SQL statements written to the log. On the SQL thread, this is the Reading event from the relay log state described in the next section.
Waiting to reconnect after a failed master event read: If the connection to the slave failed while reading an event (represented by an entry in the master’s binary log), the slave will wait in this state for a certain amount of time before attempting to reconnect to the master. The number of seconds that the slave will wait before retrying is found in the master-connect-retry variable on the slave. When the slave attempts to reconnect, it enters the next state.
Reconnecting after a failed master event read: This state occurs after the slave I/O thread loses its connection to the master while receiving an entry from the master binary log.
Waiting for the slave SQL thread to free enough relay log space: If the SQL thread isn’t processing the entries in the relay log fast enough, and the backlog has caused the relay log files to become too large, the I/O thread will enter this state. In this state, it’s waiting for the SQL thread to process enough of the entries in the relay log so that the I/O thread can delete some of the older content of the log. The maximum amount of space allocated for the relay log files is found in the relay_log_space_limit variable. The slave SQL thread automatically deletes relay log files. The FLUSH LOGS statement, though, causes the slave to rotate log files and to consider deleting old files.
Waiting for slave mutex on exit: When the I/O thread has been terminated, it enters this state as it closes. The term mutex stands for mutual exclusion. The SQL thread gets the mutex to prevent any other slave replication activities so that replication can be shut down without loss of data or file corruption.
Slave SQL Thread States
Here is a list of replication states that can be found on the slave server for SQL threads:
Has read all relay log; waiting for the slave I/O thread to update it: Because replication is so fast, you will usually see the slave’s SQL thread in this state unless you have a very busy database system with data constantly being updated. This state indicates that the slave’s SQL thread has read all of the entries in its relay log and has executed all of the SQL statements that it contains. It has no further updates to make to its databases and is waiting for the slave’s I/O thread to add more entries to the relay log