What is data replication?
Data replication is the process of keeping copies of your data in multiple places and updating them together as the data changes. The goal of replication is to keep your data available to the users who rely on it to make decisions and to the customers who need it to perform transactions.
Data replication works by keeping the source and target data synchronized. That means that any changes to the source data are reflected accurately and quickly in the target data.
Depending on your data replication strategy, your target database can hold a complete copy of the source (full-database replication) or only a subset of it (partial replication). If your goal is high availability or disaster recovery, it makes sense to maintain full replicas. For analysis and reporting, you can reduce the workload on the source database by replicating subsets of the database, such as the data for a single region or business function, to separate targets.
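To make the mechanics concrete, here is a minimal Python sketch of log-based replication, assuming a hypothetical change log of (operation, key, row) records that the source appends to and each target replays in commit order. An optional row filter turns a full replica into a partial one, such as a per-region subset. The names `Change`, `Replica` and `apply` are illustrative only, not part of any replication product's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Change:
    """One entry in a hypothetical source change log."""
    op: str              # "upsert" or "delete"
    key: int             # primary key of the affected row
    row: Optional[dict]  # new row image for upserts, None for deletes


class Replica:
    """A target table kept in sync by replaying the source's change log."""

    def __init__(self, row_filter: Optional[Callable[[dict], bool]] = None):
        self.rows: dict[int, dict] = {}
        # No filter means full replication; a filter means partial replication.
        self.row_filter = row_filter

    def apply(self, change: Change) -> None:
        if change.op == "delete":
            self.rows.pop(change.key, None)
        elif change.op == "upsert":
            if self.row_filter is None or self.row_filter(change.row):
                self.rows[change.key] = change.row
            else:
                # The row does not belong in this subset (e.g. another region),
                # so drop any copy this partial replica may still hold.
                self.rows.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op}")


log = [
    Change("upsert", 1, {"region": "EMEA", "amount": 100}),
    Change("upsert", 2, {"region": "APAC", "amount": 250}),
    Change("upsert", 1, {"region": "EMEA", "amount": 120}),
    Change("upsert", 3, {"region": "EMEA", "amount": 75}),
    Change("delete", 3, None),
]

full = Replica()                                                # full-database replica
emea_only = Replica(row_filter=lambda r: r["region"] == "EMEA")  # partial replica
for change in log:  # replay in source commit order
    full.apply(change)
    emea_only.apply(change)
```

Replaying in commit order matters: applying the two updates to key 1 out of order would leave the stale amount of 100 on the target.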
What are examples of data replication?
The more you depend on your data, the more important it is to make sure you have no single point of failure. When you replicate your data from the production source to targets in other cities or time zones, you help ensure that users and customers can always access it.
- IT administrators often turn to data replication for disaster recovery. With their data safely maintained at two or three different sites, they are less vulnerable to business interruption in case of a system breach or disaster at any single site. And, because the replica is kept up to date, business continuity becomes largely a matter of redirecting traffic away from the disabled source to the target site.
- In an era when customer bases and development teams follow the sun, geo-diverse database replicas keep data close to the people who need it. Data replication is a useful strategy for reducing network latency and improving local access.
- Real-time analytics are integral to competitive advantage, so line-of-business managers want to run queries and base their decisions on current transactions. To keep those queries from burdening the source, administrators create and maintain replicas for use by analysts and offload that work from the production database.
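The three scenarios above share one pattern: applications decide which copy of the data to talk to. Below is a minimal Python sketch, assuming a hypothetical `Router` that knows one writable primary and a set of read-only replicas keyed by region. Writes go to the primary, reads such as analysts' reports go to the caller's regional replica to offload the source and cut latency, and losing the primary site is handled by promoting a replica. In practice this job is usually done by a proxy, a database driver or DNS; the endpoint names here are placeholders.

```python
class Router:
    """Hypothetical router for one writable primary and regional read replicas."""

    def __init__(self, primary: str, replicas: dict[str, str]):
        self.primary = primary          # endpoint that accepts writes
        self.replicas = dict(replicas)  # region -> read-only replica endpoint

    def for_write(self) -> str:
        # All transactions go to the current primary.
        return self.primary

    def for_read(self, region: str) -> str:
        # Reports and local reads use the caller's regional replica when one
        # exists, keeping that load off the primary; otherwise use the primary.
        return self.replicas.get(region, self.primary)

    def fail_over(self, region: str) -> None:
        # Disaster recovery: promote the replica in `region` to primary and
        # redirect writes to it. Raises KeyError if no replica is there.
        self.primary = self.replicas.pop(region)


router = Router("db-nyc", {"EU": "db-fra", "APAC": "db-sgp"})
print(router.for_write(), router.for_read("EU"), router.for_read("US"))

router.fail_over("EU")  # the New York site is lost
print(router.for_write(), router.for_read("APAC"))
```

Promoting a replica only preserves recent transactions if it was current when the primary failed, which is why replicas are kept in sync continuously rather than refreshed from periodic copies.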
The image below depicts a log-based replication architecture, with data flowing from source to target and cloud.
Why is data replication important?
Data replication technology lets your organization use your databases in two, five or a dozen places at the same time.
So, why is replication important? I’ll explain how you can replicate data to your advantage in three important areas:
- Analysis and reporting
- Upgrades and migrations
- High availability and disaster recovery
Analysis and reporting
“Replication isn’t such a big deal,” you say. “I can email a data file to 20 people. Then I’d have my data in 20 places at the same time.”
That’s true. But what if it were ever-changing sales data from your e-commerce site, or real-time data from your company’s social media channels? By the time recipients had opened your data file and started their analysis, they would be studying old news. It would be like reading a printed newspaper: the longer they studied the data in the file, the less current its story would be. You’d have to send out an updated file every time there was a new or changed transaction.
Besides, sending a data file doesn’t scale up very well. It may work for a 100KB or 700KB spreadsheet full of data, but what about a 500GB database? You couldn’t send that out every hour.
“In that case,” you say, “I’d let everybody log in to the production database and query it directly. Then we would all query and analyze exactly the same data at the same time.”
Yes, that way nobody would be studying old news.
But then you’d have a congestion problem in your database. The reports you ran would compete for memory and CPU cycles against the reports that all the other analysts were running. And all those reports would compete against the transactions of customers who pay your salary and keep the lights on.
Data replication is a more efficient, more elegant solution for putting nearly real-time data in front of the analysts who can take action on it.
Upgrades and migrations
The argument for replication is different in the data center, where IT administrators are trying to perform a migration or an upgrade. There, the pressure comes not from the need to run reports but from the need for business continuity. Customers and users don’t care that there’s a migration or upgrade going on; they want full access to the data without interruption.
“No problem,” you say. “When we migrate/upgrade, we can back up our databases and restore them to the target. Once we have all our data in two places, we’ll start the migration/upgrade. Then, as soon as it’s finished, we’ll point all our users to the new environment.”
But what about all the data that has been changed or added in the meantime? It will take you a while to bring the new environment into sync with the old one. And what will you do if there are problems in the new environment? You’ll have to roll things back to where they were before, then try again. That’s not business continuity; that’s business on-again-off-again.
Data replication lets you maintain an accurate, real-time copy of production data so you can upgrade and migrate databases with far less risk. It keeps source and target in sync until testing is complete, at which point you can confidently switch users over to the new, upgraded environment.
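Here is a minimal Python sketch of that approach, assuming a hypothetical source that numbers every change in its log. The backup records the log position it reflects; after restoring it to the new environment, you apply only the changes made after that position, then check that the target has caught up before switching users over. `Source`, `snapshot` and `catch_up` are illustrative names, not a real tool's interface.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical production database with an ordered change log."""
    rows: dict[str, int] = field(default_factory=dict)
    log: list[tuple[int, str, int]] = field(default_factory=list)  # (seq, key, value)

    def write(self, key: str, value: int) -> None:
        seq = len(self.log) + 1
        self.log.append((seq, key, value))
        self.rows[key] = value

    def snapshot(self) -> tuple[int, dict[str, int]]:
        # Like a backup: a copy of the data plus the log position it reflects.
        return len(self.log), dict(self.rows)


def catch_up(target: dict[str, int], source: Source, applied_seq: int) -> int:
    """Apply every change the target has not seen yet; return the new position."""
    # A real tool would read from the saved position instead of scanning the log.
    for seq, key, value in source.log:
        if seq > applied_seq:
            target[key] = value
            applied_seq = seq
    return applied_seq


src = Source()
src.write("order-1", 100)
position, target = src.snapshot()  # back up, then restore to the new environment
src.write("order-2", 250)          # customers keep transacting during the migration
src.write("order-1", 120)
position = catch_up(target, src, position)
ready_to_cut_over = target == src.rows and position == len(src.log)
```

Without the catch-up step, the restored copy would still show `order-1` at 100 and no `order-2` at all: a snapshot that went stale as soon as new transactions arrived.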
High availability and disaster recovery
Database administrators are responsible for keeping databases running smoothly while watching over high availability, disaster recovery and five nines (99.999%) of uptime. Unscheduled downtime results in the loss of service, data, money and customers, so the job is all about keeping multiple databases and platforms running efficiently. High availability ensures the data is always there for users, and disaster recovery is the big backstop in case the data suddenly isn’t there.
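The five-nines target is easier to reason about as a downtime budget. Allowed downtime per year is (1 - availability) × minutes per year; the short Python sketch below assumes an average year of 365.25 days, which puts five nines at roughly 5.26 minutes of downtime per year.

```python
# Average year length in minutes, counting leap years (an assumption;
# use 365 days if your service-level agreement defines it that way).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(availability: float) -> float:
    """Maximum downtime per year compatible with an availability target."""
    return (1 - availability) * MINUTES_PER_YEAR


for nines, availability in [(3, 0.999), (4, 0.9999), (5, 0.99999)]:
    print(f"{nines} nines: {allowed_downtime_minutes(availability):.2f} minutes/year")
```

A budget of a few minutes a year leaves no room to restore from backups during an outage, which is why high availability depends on a replica that is ready to take over.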
“Our database includes native tools for high availability,” you say. “And we use another tool to keep remote copies running for disaster recovery. That’s how we have our data in more than one place.”
Native tools are often expensive for the limited functionality they provide, and some still have a single point of failure: a shared database. If something happens to that database, your systems will be down while you recover. Besides, a copy is not a replica. A copy is a snapshot, and a snapshot of a database is obsolete as soon as a new transaction hits.
A replica, on the other hand, gives you true high availability. Replication means you have databases that can immediately take over for one another in case of failure.
With data replication, you achieve high availability and strengthen disaster recovery. Replication lets you switch users to a secondary system during maintenance or downtime to keep your production data available. Your applications don’t have to wait for you to spin up a copy of an entire disaster recovery database, which reduces both downtime and the risk of losing transactions. Plus, the right high availability product lets you use the same target database for disaster recovery.
Your data available in more than one place
You’ll be surprised at how many advantages there are to having your databases in more than one place at a time.
Here’s another: data integration projects. When you’re pulling together large amounts of data from multiple sources, sending data from all those sources to a replication target keeps production data available. Meanwhile, integration tools aggregate the data from the different silos and make it available for operations and analysis.
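A minimal Python sketch of that fan-in pattern follows, assuming each source emits (key, row) changes and that the shared target prefixes every key with its source name, so that record 7 in one silo and record 7 in another stay distinct. The source names `crm` and `erp` and the `consolidate` helper are hypothetical; the point is that aggregation runs against the target rather than against the production systems.

```python
from collections import defaultdict


def consolidate(streams: dict[str, list[tuple[int, dict]]]) -> dict[tuple[str, int], dict]:
    """Replicate changes from several sources into one shared target table."""
    target: dict[tuple[str, int], dict] = {}
    for source_name, changes in streams.items():
        for key, row in changes:
            # Namespace keys by source so identical local keys don't collide.
            target[(source_name, key)] = row
    return target


streams = {
    "crm": [(7, {"region": "EMEA", "amount": 40}), (8, {"region": "APAC", "amount": 10})],
    "erp": [(7, {"region": "EMEA", "amount": 60})],
}
target = consolidate(streams)

# Downstream aggregation reads the replication target, not the production sources.
totals: dict[str, int] = defaultdict(int)
for row in target.values():
    totals[row["region"]] += row["amount"]
```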
Being data-driven means removing as many roadblocks as possible between users and databases. Data replication is a big step toward getting your data in as many places as your users need it.