blog

Live Migrations Using MySQL Replication

Krzysztof Ksiazek

Published: October 19, 2020
Last Updated: May 4, 2022

Migrating your database to a new datacenter can be a high-risk and time-consuming process. A database contains state, and can be much harder to migrate as compared to web servers, queues or cache servers.

In this blog post, we will give you some tips on how to migrate your data from one service provider to another. The process is somewhat similar to our previous post on how to upgrade MySQL, but there are a couple of important differences.

MySQL Replication or Galera Cluster?

Switching to another service provider (e.g., moving from AWS to Rackspace or from colocated servers to cloud) very often means one would build a brand new infrastructure in parallel, sync it with the old infrastructure and then switch over to it. To connect and sync them, you may want to leverage MySQL replication.

If you are using Galera Cluster, it might be easier to move your Galera nodes to a different datacenter. However, note that the whole cluster still has to be treated as a single database. This means that your production site might suffer from the additional latency introduced when stretching Galera Cluster over the WAN. It is possible to minimize impact by tuning network settings in both Galera and the operating system, but the impact cannot be entirely eliminated. It is also possible to set up asynchronous MySQL Replication between the old and the new cluster instead, if the latency impact is not acceptable.

Setting Up Secure Connectivity

MySQL Replication is unencrypted, and therefore not safe to use over the WAN. There are numerous ways of ensuring that your data will be transferred safely. You should investigate if it is possible to establish a VPN connection between your current infrastructure and your new service provider. Most of the providers (for example both Rackspace and AWS) provids such a service – you can connect your “cloudy” part to your existing datacenter via virtual private network.

If, for some reason, this solution does not work for you (maybe it requires hardware that you do not have access to), you can use software to build a VPN – one of them will be OpenVPN. This tool will work nicely to setup encrypted connections between your datacenters.

If OpenVPN is not an option, there are more ways to ensure replication will be encrypted. For example, you can use SSH to create a tunnel between old and new datacenters, and forward the 3306 port from the current MySQL slave (or master) to the new node. It can be done in a very simple way as long as you have SSH connectivity between the hosts:

$ ssh -L local_port:old_dc_host:mysql_port_in_old_dc root@old_dc_host -N &

For example:

$ ssh -L 3307:10.0.0.201:3306 [email protected] -N &

Now, you can connect to the remote server by using 127.0.0.1:3307

$ mysql -P3307 -h 127.0.0.1

It will work similarly for the replication, just remember to use 127.0.0.1 for the master_host and 3307 for the master_port

Last but not least you can encrypt your replication using SSL. This previous blog post explains how it can be done and what kind of impact it may have on the replication performance.

If you decided to use Galera replication across both datacenters, the above suggestions also apply here. When it comes to the SSL, we previously blogged about how to encrypt Galera replication traffic. For a more complete solution, you may want to encrypt all database connections from client applications and any management/monitoring infrastructure.

Setting Up the New Infrastructure

Once you have connectivity, you need to start building the new infrastructure. For that, you will probably use xtrabackup or mariabackup. It might be tempting to combine the migration with the MySQL upgrade, after all you are setting up whole new environment in the new location. We would not recommend to do that. Migrating to a new infrastructure is significant enough on it’s own so combining it with another major change increases complexity and risk. That’s true for other things too – you want to take step-by-step approach to changes. Only by changing things one at a time that you can understand the results of the changes, and how they impact your workload – if you made more than one change at a given time, you cannot be sure which one is responsible for a given (new) behavior that you’ve observed.

When you have a new MySQL instance up and running in the new datacenter, you need to slave it off the node in the old datacenter – to ensure that data in both datacenters will stay in sync. This will become handy as you prepare yourself for the final cutover. It’s also a nice way of ensuring that the new environment can handle your write load.

Next step will be to build a complete staging infrastructure in the new location and perform tests and benchmarks. This is a very important step that shouldn’t be skipped – the problem here is that you, as the DBA, have to understand the capacity of your infrastructure. When you change the provider, things also change. New hardware/vm’s are faster or slower. There’s more or less memory per instance. You need to understand again how your workload will fit in the hardware you are going to use. For that you’ll probably use Percona Playback or pt-log-player to replay some of the real life queries on the staging system. You’ll want to test the performance and ensure that it’s on a level which is acceptable for you. You also want to perform all of the standard acceptance tests that you run on your new releases – just to confirm that everything is up and running. In general, all applications should be built in a way that they do not rely on the hardware configuration and on a current setup. But something might have slipped through and your app may depend on some of the config tweaks or hardware solutions that you do not have in the new environment.

Finally, once you are happy with your tests, you’ll want to build a production-ready infrastructure. After this is done, you may want to run some read-only tests for final verification. This would be the final step before the cutover.

Cutover

After all those tests have been performed and after the infrastructure was deemed production-ready, the last step is to cutover traffic from the old infrastructure.

Globally speaking, this is a complex process but when we are looking at the database tier, it’s more or less the same thing as standard failover – something that you may have done multiple times in the past. We covered it in details in an earlier post, in short the steps are: stop the traffic, ensure it’s stopped, wait while the application is being moved to the new datacenter (DNS records change or what not), do some smoke tests to ensure all looks good, go live, monitor closely for a while.

This cutover will require some downtime, as we can see. The problem is to make sure we have consistent state across the old site and the new one. If we want to do it without downtime, then we would need to set up master-master replication. The reason is that as we refresh DNS and move over sessions from the old site to the new one, both systems will be in use at the same time – until all sessions are redirected to the new site. In the meantime, any changes on the new site need to be reflected on the old site.

Using Galera Cluster, as described in this blog post, can also be a way to keep data between the two sites in sync.

We are aware this is a very brief description of the data migration process. Hopefully, it will be enough to point you into a good direction and help you identify what additional information you need to look for.

Migration and upgrades: achieving near zero-downtime in PostgreSQL

Why Cloud Repatriation Matters Now More Than Ever

Automating Day 2 operations: Scaling, upgrades and maintenance

PostgreSQL Bi-Directional Logical Replication — A Deep Dive