December 23, 2013
By Severalnines

This post is a continuation of our previous post on Online Schema Upgrade in Galera using the TOI method. We will now show you how to perform a schema upgrade using the Rolling Schema Upgrade (RSU) method. This requires that the new schema is backward compatible with the old schema, so it needs a bit more planning than the TOI method. In some cases, you might also need to stop all connections to the target node.

 

If you are using non-uniform hardware across your Galera cluster, RSU might be preferable as it is the more predictable process. RSU does not slow down your cluster.

 

As described in detail in the previous post, an online schema upgrade using RSU can be performed in two ways:

  • RSU with wsrep_desync + wsrep_on 
  • RSU with node dropping

Each one has its own advantages/disadvantages, and you should have recoverable backups ready before performing any of these.

 

Test Case

 

We will be using the same data set as in the previous post, and we will perform the schema upgrade using both methods. All Galera nodes are running MySQL 5.5.34 (wsrep_25.9.r3928) with Galera wsrep provider version 25.2.8. We have a three-node Galera cluster fronted by one HAProxy node, as illustrated in the following figure:

 

 

Option 1: RSU + wsrep_desync + wsrep_on

 

When the wsrep_desync variable is set to ON, the node is desynced from the cluster. Toggling it back may require an IST or an SST, depending on how long it was desynced. The wsrep_on variable is used to enable or disable wsrep replication. Setting it to OFF on a Galera node will stop replication and make the node behave like a standalone MySQL server.
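As a quick sanity check before making any changes, you can inspect the current values of both variables; on a healthy, fully replicating node they should report OFF and ON respectively:

mysql> SHOW GLOBAL VARIABLES LIKE 'wsrep_desync';
mysql> SHOW SESSION VARIABLES LIKE 'wsrep_on';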

 

A node can be set to omit flow control by:

mysql> SET GLOBAL wsrep_desync=ON;

 

A session can be declared to not replicate anything:

mysql> SET wsrep_on=OFF;

 

Running DDL in such a session will result in a local schema change, and processing of the DDL will not hold back the cluster. However, all cluster transactions will still be replicated to the node, and if there are conflicts, the DDL will be aborted. This method is only suitable for non-conflicting operations, for example altering a table that is not being updated.

 

During this period, the node status will change to “Donor/Desynced” (state 2), and your HAProxy load balancer should detect it as unhealthy and exclude it from the load balancing set. For details on how this works, please review our tutorial on HAProxy.
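You can verify this from the MySQL client on the desynced node; while it is desynced, the state comment should report “Donor/Desynced”:

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';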

 

1. We will start with node1. Set the OSU method to RSU:

mysql> SET GLOBAL wsrep_OSU_method='RSU';

 

2. Omit flow control and turn off wsrep API:

mysql> SET GLOBAL wsrep_desync=ON; SET wsrep_on=OFF;

 

3. Perform the non-conflicting DDL operation (assuming there are no updates on tbl_points):

mysql> ALTER TABLE tbl_points ADD COLUMN remarks VARCHAR(255) DEFAULT "-";

At this point, the node is still part of the cluster and will receive incoming writesets from other nodes. The writesets will still be applied locally. However, if any writeset conflicts with the table being altered, the DDL operation will abort and you will get “ERROR 1317 (70100): Query execution was interrupted”.

 

4. Turn on wsrep API and resume flow control:

mysql> SET wsrep_on=ON; SET GLOBAL wsrep_desync=OFF;

At this point, your application will be talking to all nodes, but node1 has a new version of the schema. The data set on node1 contains the new column, but the column does not exist yet on node2 and node3. Therefore, your application should be upgraded before you perform the schema change, so it can handle both the old and the new schema.
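To see which nodes have picked up the change, you can check for the new column on each node (this uses the remarks column added above); at this stage it exists on node1 only:

mysql> SHOW COLUMNS FROM tbl_points LIKE 'remarks';

node1 returns one row, while node2 and node3 return an empty set until they are altered in turn.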

 

Repeat the schema change on the remaining nodes as per the process described above.

 

Once the schema upgrade is completed, you are advised to revert the OSU method to the default TOI so other DDL statements will not be RSU’ed:

mysql> SET GLOBAL wsrep_OSU_method='TOI';

 

Option 2: RSU + Node Dropping

 

Also known as “manual RSU”, this method drops a node from the cluster and runs the DDL on the stand-alone node. Joining the node back to the cluster must happen through IST, as an SST would bring back the old schema. We assume that users are aware of RBR limitations, since those are preserved in Galera as well.

 

This method will require you to:

  • Adjust the gcache size so it is big enough to allow IST after the DDL is over (see the example below)
  • Protect the target node (the one performing the DDL) from any production connections (application and load balancer)
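As an illustration, the gcache size is controlled through wsrep_provider_options in my.cnf on every node (the 1G value below is only an example; size it according to your write rate and how long the DDL is expected to run):

wsrep_provider_options="gcache.size=1G"

This setting is not dynamic, so it only takes effect after a restart, hence the rolling restart described next.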

 

Adjust the gcache size and perform a rolling restart as described in this blog post. If you are running ClusterControl, it is recommended to disable auto recovery so it will not try to recover from the “unusual” Galera behaviour during this exercise. Do this by adding the following line to /etc/cmon.cnf:

enable_autorecovery=0

 

Restart the CMON service to apply the change:

$ service cmon restart

 

1. Configure HAProxy to stop sending any front-end connections to the target node by commenting out the respective line:

listen  s9s1_33306_lb1
        bind *:33306
        mode tcp
        timeout client  60000ms
        timeout server  60000ms
        balance leastconn
        option httpchk
        option allbackups
        default-server port 9200 inter 2s downinter 5s rise 3 fall 2 slowstart 60s maxconn 256 maxqueue 128 weight 100
        #server node1 192.168.197.151:3306 check
        server node2 192.168.197.152:3306 check
        server node3 192.168.197.153:3306 check

 

2. Reload the HAProxy configuration:

$ service haproxy reload

 

3. We will start with node1. Set the OSU method to RSU:

mysql> SET GLOBAL wsrep_OSU_method='RSU';

 

4. Isolate the node from the existing cluster:

mysql> SET GLOBAL wsrep_cluster_address="gcomm://";

At this point, this node is expected to leave the current Galera cluster.
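You can confirm the node is now operating on its own by checking the cluster size; the isolated node should report 1, while node2 and node3 should report 2:

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';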

 

5. Perform the schema upgrade:

mysql> ALTER TABLE tbl_points ADD COLUMN remarks VARCHAR(255) DEFAULT "-";

 

6. Let the node rejoin the cluster:

mysql> SET GLOBAL wsrep_cluster_address="gcomm://node1:4567,node2:4567,node3:4567";

At this point, the node will perform an IST from the selected donor node. The table on node1 now returns rows with the new column, while on node2 and node3 the table structure remains unchanged.
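To double check that the node came back through IST rather than SST, look for an IST message in the MySQL error log on node1 (the log location and exact wording vary between setups) and verify that the node reports Synced again:

$ grep -i ist /var/log/mysqld.log

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';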

 

Repeat the same steps for the remaining nodes.

 

Once completed, revert the HAProxy and ClusterControl configurations back to their original state:

listen  s9s1_33306_lb1
        bind *:33306
        mode tcp
        timeout client  60000ms
        timeout server  60000ms
        balance leastconn
        option httpchk
        option allbackups
        default-server port 9200 inter 2s downinter 5s rise 3 fall 2 slowstart 60s maxconn 256 maxqueue 128 weight 100
        server node1 192.168.197.151:3306 check
        server node2 192.168.197.152:3306 check
        server node3 192.168.197.153:3306 check

 

And for /etc/cmon.cnf:

#enable_autorecovery=0

 

Restart the CMON service to apply the changes:

$ service cmon restart

 

Reload the HAProxy configuration:

$ service haproxy reload

 

Once the schema upgrade exercise is completed, you are advised to revert the OSU method to the default TOI so other DDL statements will not be RSU’ed:

mysql> SET GLOBAL wsrep_OSU_method='TOI';

 

References