1. Introduction
  2. What is MySQL Replication?
  3. Issues with MySQL Replication
  4. Recommended topology for MySQL Replication
  5. How do I deploy this MySQL Replication setup?
  6. Why use a semi-sync backup master and slaves?
  7. Automated fail-over with ClusterControl™
    1. Failure of master server
    2. Failure of backup master
    3. Failure of a slave
  8. Upgrade instructions from ClusterControl™ v1.1.8 to v1.1.9
  9. Management Scripts
    1. Show Replication Status
    2. Start Replication
    3. Stop Replication
    4. Fail-over
    5. Stopping a Slave server
    6. Start a MySQL server
    7. View server status
  10. Troubleshooting
    1. Cannot connect to Master server

 

1. Introduction

ClusterControl™ for MySQL Replication enables customers to Deploy, Manage, Monitor and Scale a clustered database platform based on the standard MySQL Replication.

 

2. What is MySQL Replication?

Replication enables data from one MySQL server (the master) to be replicated to one or more MySQL servers (the slaves). Replication is asynchronous – slaves do not need to be connected permanently to receive updates from the master.

MySQL Replication is very easy to set up, and is used to scale out read workloads, provide high availability and geographic redundancy, and offload backups and analytics.

 

3. Issues with MySQL Replication

Because it is simple to set up, MySQL Replication is probably the most widely used mechanism to provide high availability. Unfortunately, it is also somewhat fragile:

  • Fail-over is not automatic and has to be performed by a skilled administrator
  • Slaves can easily end up with different data from the master, due to hardware problems, software bugs or the use of non-deterministic functions (see the example below). Diverging datasets on master and slave servers cause replication to stop.
  • A crashing master can cause corruption of the binary log. When the master is restarted, the slave servers may not be able to continue from the last binary log position.
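
To illustrate the non-deterministic function issue, a statement such as the following (against a hypothetical table t1) is evaluated independently on each server when it is replicated statement-based, so the master and its slaves can end up storing different values:

mysql> INSERT INTO t1 (token) VALUES (UUID());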

It is also possible to set up two-way replication between two MySQL servers. However, ring topologies are not recommended. MySQL Replication currently does not support any locking protocol between master and slave to guarantee the atomicity of a distributed update across two different servers.

 

4. Recommended topology for MySQL Replication

A recommended MySQL Replication topology would be constructed so that it takes advantage of the strengths of Replication, and minimizes the weaknesses of the protocol.

 

 

Some recommendations:

  • Replication only flows in one direction, applications only write on the master
  • The master pushes changes to a backup master and to one or more slaves
  • Use semi-synchronous replication (available in MySQL 5.5) between the master and the backup master
    • The master sends the update to the backup master and waits before committing the transaction
    • The backup master receives the update, writes it to its relay log and flushes it to disk. The backup master then acknowledges receipt of the transaction to the master.
    • The master proceeds with the transaction commit.
    • Semi-sync replication has a performance impact, but the risk of data loss is minimized
  • Have only the replication process make changes on the slaves, so as to minimize the risk of data conflicts on the slave.
    • Slaves should therefore be started in read-only mode
    • Applications will not be able to modify data directly on the slaves, but the Replication process will still function on a read-only server
  • Replication can send large packets between the servers, so max_allowed_packet should be set to a high value in order to avoid replication errors (see the example configuration after this list)
  • Binary logs need to be available to bring a new slave up to date
    • Provisioning of a new slave requires a combination of the last backup, and all transactions that happened after the backup
  • Replication connection parameters should not be placed in the my.cnf file. For instance, a slave may have its configuration file overwritten, and not know from what point to continue the replication.
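
A minimal my.cnf sketch illustrating these recommendations, assuming MySQL 5.5 with the semi-synchronous replication plugins already installed (all values below are illustrative, not the exact settings generated by the Configurator):

# --- master / backup master (illustrative values) ---
server-id          = 1          # must be unique on every server
log-bin            = binlog     # binary logs are needed to provision or catch up slaves
max_allowed_packet = 64M        # high value to avoid replication errors on large events
rpl_semi_sync_master_enabled = 1
rpl_semi_sync_master_timeout = 1000   # ms to wait for the backup master's acknowledgement
rpl_semi_sync_slave_enabled  = 1      # the backup master also acts as a semi-sync slave

# --- slaves (illustrative values) ---
server-id          = 3
read_only          = 1          # applications cannot write, the replication thread still can
max_allowed_packet = 64M
# Note: no master-host/master-user entries here; the replication connection is
# configured with CHANGE MASTER TO, so it is not lost if my.cnf is overwritten.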


5. How do I deploy this MySQL Replication setup?

A MySQL Replication cluster can be configured using the Severalnines Configurator: http://www.severalnines.com/replication-configurator/

The wizard collects some high level data (e.g. IP addresses of machines, size of database, amount of RAM in machines, type of database workload, etc.). The application then generates the configuration of the Replication Cluster. At the end, a deployment package is automatically generated and emailed to the user.

The deployment package automates a number of tasks:

  • copy and install binaries on all servers
  • create datadirs
  • create OS user
  • create the mysql user for MySQL Servers
  • install system tables
  • set permissions for MySQL
  • set up replication links (Master->Backup Master->Slaves)
  • install ClusterControl™ on all hosts to monitor the cluster, detect failures and automate fail-over  

The deployment ends with a verification phase that checks whether the cluster has been properly set up. The Replication Cluster is now deployed.

 

6. Why use a semi-sync backup master and slaves?

We recommend a backup master to reduce the impact of a master failure:

  • No data loss
  • Simplified reconfiguration of the replication cluster
  • The overall system is more stable
  • The master handles connections from different applications, executes write transactions and writes them to disk.

The backup master will synchronize with the master using semi-sync replication.  Since it has the same data as the master, we are minimizing the risk of data loss in case we lose the master.
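
One way to verify that semi-sync replication is actually in effect (these are standard MySQL 5.5 status variables, not ClusterControl™ commands):

mysql> SHOW STATUS LIKE 'Rpl_semi_sync_master_status';   -- on the master: ON when the backup master acknowledges transactions
mysql> SHOW STATUS LIKE 'Rpl_semi_sync_slave_status';    -- on the backup master: ON when it runs as a semi-sync slave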

In case of failure of a database node, the other nodes in the Replication cluster might need to be reconfigured in order for the system to reach a stable state. The backup master facilitates this. See section on "Automated fail-over with ClusterControl™ ".

 

7. Automated fail-over with ClusterControl™

In order to keep the replication cluster stable and running, it is important for the system to be resilient to failures. Failures are caused by either software bugs or hardware problems, and can happen at any time. In case a server goes down, failure handling, failover and reconfiguration of the Replication cluster needs to be automatic, so as to minimize downtime.

ClusterControl™ automates fail-over and reconfiguration of the replication cluster. We describe 3 scenarios below:

  • master failure
  • backup master failure
  • slave failure

 

7.1 Failure of master server 

Updates are done on the master server only.

If the master fails, replication will stop. Applications doing updates reconnect to the backup master and then continue to operate.

 

Effectively, the backup master has become the new master. The slaves are failed over to the new master and continue to get updates.
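
Conceptually, failing a slave over to the new master corresponds to statements like the following on each slave. ClusterControl™ automates this; the replication account and binary log coordinates shown here are purely illustrative:

STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST='app02',              /* the backup master that has become the new master */
  MASTER_USER='repl_user',          /* illustrative replication account */
  MASTER_PASSWORD='repl_pass',
  MASTER_LOG_FILE='binlog.000004',  /* illustrative coordinates on the new master */
  MASTER_LOG_POS=107;
START SLAVE;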

 

When the old master comes up again, it will synchronize with the new master (which is handling application updates). This is orchestrated by ClusterControl.

 

Once the old master is up-to-date with the new master, ClusterControl™ turns it into a backup master. Semi-sync replication between the master and the backup master ensures that both instances have the same data. The Replication cluster has been restored to its original topology.

 

7.2 Failure of backup master

In case the backup master fails, the master will still continue to replicate its updates to the slaves. Applications still do their updates on the master. The replication cluster does not need reconfiguration.

 

ClusterControl™ detects the failure and will try to restart the backup master. When the backup master comes up again, it will synchronize with the master.


When the backup master has caught up, it will stay in sync with the master via semi-sync replication.  

At this point, the Replication cluster has been restored to its original topology.

7.3 Failure of a slave

If a slave fails, applications connected to the slave can connect to another slave and continue to operate.

 

When the slave comes up again, it will resynchronize with the master and be available to the applications. Depending on how far behind the slave is lagging, ClusterControl™ will either resynchronize the missing transactions or discard the old data and provision the slave from scratch.
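
For reference, how far a slave is lagging can be inspected manually with a standard MySQL command; the Seconds_Behind_Master field shows the lag in seconds (NULL means replication is not running):

mysql> SHOW SLAVE STATUS\G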

 

At this point, the Replication cluster has been restored to its original topology.

 

8. Upgrade instructions from ClusterControl™ v1.1.8 to v1.1.9

To upgrade from version 1.1.8, follow the instructions below:

  • Log into the ClusterControl™ server
  • Download the new schema files

wget http://www.severalnines.com/downloads/cmon/cmon_db_mods-1.1.8-1.1.9.sql

wget http://www.severalnines.com/downloads/cmon/cmon_db-1.1.9.sql 

wget http://www.severalnines.com/downloads/cmon/cmon_data-1.1.9.sql

  • Apply the schema updates to the cmon database

mysql -ucmon -pcmon cmon < cmon_db_mods-1.1.8-1.1.9.sql

mysql -ucmon -pcmon cmon < cmon_db-1.1.9.sql

mysql -ucmon -pcmon cmon < cmon_data-1.1.9.sql

 

  • Change to the directory that has the deployment scripts

cd s9s-mysql-55/mysql/scripts/install

 

  • For tar ball upgrade:

./install-cmon -f /path/to/cmon-1.1.9-64bit-glibc23-mc70.tar.gz

 

  • For RPM upgrade:

./install-cmon -c /path/to/cmon-controller-1.1.9-1.x86_64.rpm  -a /path/to/cmon-agent-1.1.9-1.x86_64.rpm /path/to/cmon-www-1.1.9-1.noarch.rpm

 

9. Management Scripts  

Below is a list of scripts that come with the Deployment Package.  Scripts are located on the ClusterControl server, under:

s9s-mysql-55/mysql/scripts/ 

 

9.1 Show Replication Status

[root@clustercontrol scripts]# ./repl-status.sh

master_host --> slave_host   status   master_status       slave_status [binlog|m_pos|exec_pos|lag]

app01 --> app02              ok       binlog.000003:250   binlog.000003| 250| 250| 0

--- slaves follows ---

app02 --> app03              ok       binlog.000001:107   binlog.000001| 107| 107| 0

 

9.2 Start Replication

The following command starts replication between app02 (master) and app03 (slave).

 

[root@clustercontrol scripts]# ./start-repl.sh  -m app02 -s app03

starting replication between app02 --> app03 (change master=0, reset master=0)

 

9.3 Stop Replication

 

The following command stops replication between app02 (master) and app03 (slave).

[root@clustercontrol scripts]# ./stop-repl.sh -m app02 -s app03

stopping replication between app02 --> app03

replication [stopped] app02 --> app03

 

9.4 Fail-over

 

ClusterControl™ will detect failures and automatically perform fail-over. Slaves will always be kept aligned with the master.

Fail-over can also be done manually from the command line. However, please note that this is not recommended as it may lead to data loss. 

 

[root@clustercontrol scripts]# ./failover-repl.sh -m app01 -a

replication [stopped]  --> app03

starting replication between app01 --> app03 (change master=1, reset master=0)

replication [started] app01 --> app03

[root@clustercontrol scripts]# ./repl-status.sh -a

master_host --> slave_host   status   master_status       slave_status [binlog|m_pos|exec_pos|lag]

app01 --> app02              ok       binlog.000004:107   binlog.000004| 107| 107| 0

--- slaves follows ---

app01 --> app03              ok       binlog.000002:107   binlog.000002| 107| 107| 0


9.5 Stopping a Slave server

There are two ways of stopping a Slave server.

One way is to explicitly stop the Slave:

[root@clustercontrol scripts]# ./stop-slave.sh -h app03

app03: Executing '/etc/init.d/mysqld stop' [ok] 

 

It is also possible to use the generic command to stop the Slave: 

 

[root@clustercontrol scripts]# ./stop-mysqld.sh -h app03

app03: Executing '/etc/init.d/mysqld stop' [ok]

app03: Waiting 'mysqld' to stop (timeout=60): [ok]

app03: stopped mysql server

 

9.6 Start a MySQL server

The following command starts a MySQL Server. The script will automatically detect if the host is a Master or a Slave and copy out the correct configuration file (my.cnf.slave or my.cnf.master) to the server. 

[root@clustercontrol scripts]# ./start-mysqld.sh -h app03

app03: Copying '/tmp/my.cnf' [ok]

app03: Executing 'mv /tmp/my.cnf /etc/mysql/my.cnf' [ok]

app03: Executing '/etc/init.d/mysqld start' [ok]

 

It is also possible to explicitly start a slave server:

 

[root@clustercontrol scripts]# ./start-slave.sh -h app03

app03: Copying '/tmp/my.cnf' [ok]

app03: Executing 'mv /tmp/my.cnf /etc/mysql/my.cnf' [ok]

app03: Executing '/etc/init.d/mysqld start' [ok] 

 

9.7 View server status

The status.sh script pings each MySQL server (using both 'ping' and 'mysqladmin ping') and prints whether it is reachable or not.

 

[root@clustercontrol scripts]# ./status.sh  -a

Master hosts

------------

app01:    up

app02:    up

Slave hosts

------------

app03:    up 

 

10. Troubleshooting

 

10.1 Cannot connect to Master server

When adding a new slave, one would first have to add a new host using the 'Host Management' tab. This allows ClusterControl™ to provision the new host with a management agent.

A Slave can then be added using the 'Scale' tab, where the new host can be selected to commission the new slave.

If you are using the /etc/hosts file to resolve hostnames, it is important to keep the file updated on all hosts. The hostname of the new slave should be added to the /etc/hosts file on the ClusterControl server, the Backup Master as well as the Master.
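
For example, if a new slave named app04 (a hypothetical hostname) were added, an entry like the following would be needed in /etc/hosts on each of those servers (the IP address is illustrative):

192.168.1.14    app04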