One of the world's largest websites has offered users an insight into the steps it takes to ensure all the information held on its vast databases will be safe should anything catastrophic happen.
Facebook operates one of the largest MySQL installations in the world, Eric Barrett, an engineer at the firm, explained in a blog post on the site. With thousands of database servers in multiple regions around the world, he observed, it is unsurprising that the company has unique and highly demanding requirements to make sure those servers are resilient and, more importantly, can be recovered should the worst happen.
As a result, it has put in place a backup system that relies on a high level of automation. This allows the firm to deploy and manage hundreds of databases with minimal human intervention as the site grows, and to move many petabytes of information a week.
"Rather than extensive front-loaded testing, we emphasise rapid detection of failures and quick, automated correction," Mr Barrett said.
The system uses a three-stage process to protect this data. The first stage relies on rack backup servers, which collect second-by-second binary logs of database updates.
After this data is collected, it is immediately copied to the firm's large, customised database clusters, which are highly stable, replicated and have fixed retention terms. Once a week, this data is then copied again to discrete long-term storage solutions in a separate region.
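The three stages can be pictured as data flowing through successive storage tiers. The following is a minimal Python sketch of that flow, not Facebook's implementation: the `Tier` class, the function names, the log keys and the ten-day retention term are all hypothetical, chosen only to illustrate immediate copying, a weekly archive to a separate region and fixed retention.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Tier:
    """One storage tier (hypothetical); maps a log key to (payload, time stored)."""
    name: str
    items: dict = field(default_factory=dict)

    def put(self, key, payload, now):
        self.items[key] = (payload, now)


def copy_missing(src, dst, now):
    """Copy every entry in src that dst does not yet hold; return the count."""
    missing = [k for k in src.items if k not in dst.items]
    for key in missing:
        dst.put(key, src.items[key][0], now)
    return len(missing)


def archive_if_due(cluster, long_term, last_run, now, interval=timedelta(days=7)):
    """Run the weekly copy to long-term storage once the interval has elapsed."""
    if last_run is not None and now - last_run < interval:
        return last_run  # not due yet
    copy_missing(cluster, long_term, now)
    return now


def expire(tier, retention, now):
    """Drop entries that have been held longer than the fixed retention term."""
    for key in [k for k, (_, t) in tier.items.items() if now - t > retention]:
        del tier.items[key]


rack = Tier("rack-backup")      # stage 1: collects binary logs
cluster = Tier("cluster")       # stage 2: replicated, fixed retention
long_term = Tier("long-term")   # stage 3: separate region

start = datetime(2013, 1, 1)
rack.put("binlog.000001", b"update-1", start)
copied = copy_missing(rack, cluster, start)                 # immediate copy
last_run = archive_if_due(cluster, long_term, None, start)  # first weekly run

day2 = start + timedelta(days=1)
rack.put("binlog.000002", b"update-2", day2)
copy_missing(rack, cluster, day2)
last_run = archive_if_due(cluster, long_term, last_run, day2)  # skipped, not due
archived_after_day2 = len(long_term.items)

day8 = start + timedelta(days=7)
last_run = archive_if_due(cluster, long_term, last_run, day8)  # weekly run fires
archived_after_week = len(long_term.items)

# Assumed retention term of ten days, checked on day eleven.
expire(cluster, timedelta(days=10), start + timedelta(days=11))
remaining_in_cluster = sorted(cluster.items)
```

In this sketch the weekly archive runs before any retention clean-up, so nothing leaves the cluster tier until a copy already exists in long-term storage.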
In addition to this, Mr Barrett described the rigorous monitoring processes that check the system and score the severity of database backup failures, as well as the testing systems the business uses to make sure it can restore data quickly in the event of a serious incident.
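The article does not say how failures are scored, so the following Python sketch shows only one plausible approach: grading each database by the age of its last successful backup. The function name, the severity labels and the one-day and three-day thresholds are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def backup_severity(last_success, now,
                    warn_after=timedelta(days=1),
                    critical_after=timedelta(days=3)):
    """Grade a database by how long it has gone without a good backup.

    The thresholds are illustrative assumptions, not values from the article.
    """
    if last_success is None:
        return "critical"  # never backed up successfully
    age = now - last_success
    if age >= critical_after:
        return "critical"
    if age >= warn_after:
        return "warning"
    return "ok"


now = datetime(2013, 1, 10, 12, 0)
last_good = {
    "db-a": now - timedelta(hours=2),
    "db-b": now - timedelta(days=2),
    "db-c": now - timedelta(days=5),
    "db-d": None,
}
report = {name: backup_severity(ts, now) for name, ts in last_good.items()}
needs_attention = sorted(n for n, s in report.items() if s != "ok")
```

Grading by age rather than by individual failed runs means a single missed backup is tolerated, while a database that stays unprotected for days rises to the top of the list.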
"Backups are not the most glamorous type of engineering. They are technical, repetitive and when everything works, nobody notices," the expert said. However, they are also one of the most vital parts of a network, as without them, firms will be severely exposed when they suffer downtime.
For a firm such as Facebook, which stores vast amounts of personal data, reputation is also on the line. If it suffers prolonged downtime or data loss, it will quickly lose the trust of consumers, who often upload highly sensitive information.