Hybrid replication, i.e. combining Galera and asynchronous MySQL replication in the same setup, became much easier with MySQL 5.6 and GTID. Although it was fairly straightforward to replicate from a standalone MySQL server to a Galera Cluster, doing it the other way round (Galera → standalone MySQL) was a bit more challenging. At least until MySQL 5.6 and GTID.
There are a few good reasons to attach an asynchronous slave to a Galera Cluster. For one, long-running reporting/OLAP type queries on a Galera node might slow down an entire cluster, if the reporting load is so intensive that the node has to spend considerable effort coping with it. So reporting queries can be sent to a standalone server, effectively isolating Galera from the reporting load. In a belts and suspenders approach, an asynchronous slave can also serve as a remote live backup.
In this blog post, we will show you how to replicate a Galera Cluster to a MySQL server with GTID, and how to failover the replication in case the master node fails.
**Take note that MariaDB Galera Cluster 10 doesn't work with the method described in this blog post since MariaDB and MySQL have different GTID implementation. For more details kindly review following articles:
In MySQL 5.5, resuming a broken replication requires you to determine the last binary log file and position, which are distinct on all Galera nodes if binary logging is enabled. We can illustrate this situation with the following figure:
If the MySQL master fails, replication breaks and the slave will need to switch to another master. You will need pick a new Galera node, and manually determine a new binary log file and position of the last transaction executed by the slave. Another option is to dump the data from the new master node, restore it on slave and start replication with the new master node. These options are of course doable, but not very practical in production.
GTID (Global Transaction Identifier) provides a better transactions mapping across nodes, and is supported in MySQL 5.6. In Galera Cluster, all nodes will generate different binlog files. Binlog events are same and in same order, but binlog file offsets may vary. With GTID, slaves can see a unique transaction coming in from several masters and this could easily being mapped into the slave execution list if it needs to restart or resume replication.
All necessary information for synchronizing with the master is obtained directly from the replication stream. This means that, when you are using GTIDs for replication, you do not need to include MASTER_LOG_FILE or MASTER_LOG_POS options in the CHANGE MASTER TO statement; instead, it is necessary only to enable the MASTER_AUTO_POSITION option. More on GTID in the MySQL Documentation page.
Make sure the Galera nodes (masters) and slave(s) are running on MySQL 5.6 before proceeding with this setup. We have a database called sbtest in Galera, which we will replicate to the slave node.
1. Enable required replication options by specifying following lines inside each DB node’s my.cnf (including the slave node):
For master (Galera) nodes:
gtid_mode=ON log_bin=binlog log_slave_updates=1 enforce_gtid_consistency expire_logs_days=7 server_id=1 # 1 for master1, 2 for master2, 3 for master3 binlog_format=ROW
For slave node:
gtid_mode=ON log_bin=binlog log_slave_updates=1 enforce_gtid_consistency expire_logs_days=7 server_id=101 # 101 for slave binlog_format=ROW replicate_do_db=sbtest slave_net_timeout=60
For ClusterControl users, you can go to Manage > Configurations to modify the config files for all Galera nodes.
2. Perform a cluster rolling restart of the Galera Cluster (from ClusterControl UI > Manage > Upgrade > Rolling Restart). This will reload each node with the new configurations, and enable GTID. Restart the slave as well.
3. Create a slave replication user and run following statement on one of the Galera nodes:
mysql> GRANT REPLICATION SLAVE ON *.* TO 'slave'@'%' IDENTIFIED BY 'slavepassword'; mysql> FLUSH PRIVILEGES;
For ClusterControl users, you can use Manage > Schemas and Users, similar to screenshot below:
4. Log into the slave and dump database sbtest from one of the Galera nodes:
$ mysqldump -uroot -p -h192.168.0.201 --single-transaction --skip-add-locks --triggers --routines --events sbtest > sbtest.sql
5. Restore the dump file onto the slave server:
$ mysql -uroot -p < sbtest.sql
6. Start replication on the slave node:
To verify that replication is running correctly, examine the output of slave status:
If the designated master goes down, the slave will retry to reconnect again in slave_net_timeout value (our setup is 60 seconds - default is 1 hour). You should see following error on slave status:
Last_IO_Errno: 2003 Last_IO_Error: error reconnecting to master 'firstname.lastname@example.org:3306' - retry-time: 60 retries: 1
To change the master node, you can simply do as follows:
In some cases, you might encounter a “Duplicate entry .. for key ..” error after the master node changed:
Last_Errno: 1062 Last_Error: Could not execute Write_rows event on table sbtest.sbtest; Duplicate entry '1089775' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysqld-bin.000009, end_log_pos 85789000
In older versions of MySQL, you can just use SET GLOBAL SQL_SLAVE_SKIP_COUNTER = n to skip statements, but it does not work with GTID. Miguel from Percona wrote a great blog post on how to repair this by injecting empty transactions.
Another approach, for smaller databases, could also be to just get a fresh dump from any of the available Galera nodes, restore it and use RESET MASTER statement:
mysql> STOP SLAVE; mysql> RESET MASTER; mysql> DROP SCHEMA sbtest; CREATE SCHEMA sbtest; USE sbtest; mysql> SOURCE /root/sbtest_from_galera2.sql; -- repeat step #4 above to get this dump mysql> CHANGE MASTER TO MASTER_HOST = '192.168.0.202', MASTER_PORT = 3306, MASTER_USER = 'slave', MASTER_PASSWORD = 'slavepassword', MASTER_AUTO_POSITION = 1; mysql> START SLAVE;
You may also use pt-table-checksum to verify the replication integrity, more information in this blog post.
Note: Since the slave server applier is single-threaded, do not expect the async replication performance to be the same as Galera’s parallel replication. Also, if the reporting load is intensive and continuous, the slave lag will just keep growing. We have seen cases where slave could never catch up with the master.
Let us know if you would like to see this feature in ClusterControl. Happy hybrid clustering!