In the previous article, SAP HANA High Availability, we explained what high availability is and how SAP HANA supports it. In this article we will talk about disaster recovery in SAP HANA.
What is Disaster Recovery?
The term disaster recovery describes the activities needed to restore the database after a fire, earthquake, act of vandalism, or other catastrophic event.
A disaster recovery plan must be prepared and documented in advance in order to prevent catastrophic data loss.
What is Fault Recovery?
Fault recovery is the process of recovering and resuming system operations after an outage in the data center caused by a fault.
Recovery Point Objective (RPO) and Recovery Time Objective (RTO)
Before we start, we need to understand what our KPIs are when it comes to High Availability and Disaster Recovery, so we can make better-informed decisions.
There are two main objectives:
- Recovery Point Objective (RPO): the maximum tolerable period of time during which operational data is lost without the ability to recover.
- Recovery Time Objective (RTO): the maximum permissible time it takes to recover the system after an outage so that operations can resume.
These two KPIs help system architects choose optimal high availability and disaster recovery technologies and procedures in SAP HANA.
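To make the role of the two KPIs concrete, here is a minimal Python sketch that filters candidate recovery options against a required RPO and RTO. The class, the function, and all the figures are hypothetical and chosen only for illustration; they are not measurements or SAP recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryOption:
    name: str
    worst_case_data_loss_min: float  # achievable RPO, in minutes
    worst_case_recovery_min: float   # achievable RTO, in minutes


def meets_objectives(option, rpo_min, rto_min):
    """Return True if the option stays within both objectives."""
    return (option.worst_case_data_loss_min <= rpo_min
            and option.worst_case_recovery_min <= rto_min)


# Hypothetical figures for illustration only; measure your own landscape.
options = [
    RecoveryOption("nightly backup", 24 * 60, 240),
    RecoveryOption("storage replication", 1, 60),
    RecoveryOption("system replication", 0, 5),
]

# Hypothetical business requirement: lose at most 5 minutes of data
# and be back online within 90 minutes.
acceptable = [o.name for o in options
              if meets_objectives(o, rpo_min=5, rto_min=90)]
print(acceptable)
```

With these assumed numbers, the nightly backup is ruled out by the RPO alone, which is the kind of decision the two KPIs are meant to support.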
How does SAP HANA support Disaster Recovery?
SAP HANA offers three levels of disaster recovery support:
- Backups
- Storage replication
- System replication
SAP HANA uses in-memory technology, but it fully persists every transaction that changes data, such as row insertions, deletions and updates, so it can resume after a power outage without loss of data.
SAP HANA persists two types of data to storage:
- Transaction redo logs
- Data changes in the form of Savepoints
A transaction redo log is used to record a change. To make a transaction durable, it is not required to persist the complete data when the transaction is committed; instead it is sufficient to persist the redo log.
After an outage, the most recent consistent state of the database can be restored by replaying the changes recorded in the log, redoing completed transactions and rolling back incomplete ones.
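The idea of log replay can be sketched in a few lines of Python. This is a simplified illustration, not SAP HANA's actual log format or recovery algorithm: the record layout and the function name are assumptions, and instead of applying and then rolling back incomplete transactions, the sketch simply skips their writes, which leads to the same final state here.

```python
def replay_redo_log(state, log):
    """Rebuild a key/value state from redo records.

    Each record is a tuple: ("write", txn_id, key, value) or ("commit", txn_id).
    Only writes that belong to committed transactions are applied.
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    recovered = dict(state)
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, key, value = rec
            recovered[key] = value
    return recovered


log = [
    ("write", 1, "balance_a", 90),
    ("write", 1, "balance_b", 110),
    ("commit", 1),
    ("write", 2, "balance_a", 50),  # txn 2 never committed before the outage
]
recovered = replay_redo_log({"balance_a": 100, "balance_b": 100}, log)
print(recovered)
```

Transaction 1 is redone in full, while the uncommitted write of transaction 2 leaves no trace in the recovered state.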
A savepoint is a periodic point in time at which all changed data is written to storage in the form of pages. One goal of performing savepoints is to speed up restart: when starting up the system, logs need not be processed from the beginning, but only from the last savepoint position.
Savepoints are coordinated across all processes (called SAP HANA services) and instances of the database to ensure transaction consistency.
By default, savepoints are performed every five minutes, but this can be configured.
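The restart benefit of savepoints can be illustrated with a small Python sketch. It assumes a toy model in which the log is a list of (key, value) changes and the list index serves as the log position; none of this reflects SAP HANA's real page or log structures.

```python
def recover(savepoint_state, savepoint_position, log):
    """Start from the savepoint image and replay only newer log entries.

    Returns the recovered state and the number of log entries processed.
    """
    state = dict(savepoint_state)
    tail = log[savepoint_position:]
    for key, value in tail:
        state[key] = value
    return state, len(tail)


log = [("x", 1), ("y", 2), ("x", 3), ("z", 4), ("y", 5)]

# Full replay from an empty database processes every entry.
full_state, full_count = recover({}, 0, log)

# A savepoint taken after the third entry already contains x=3 and y=2,
# so only the remaining two entries have to be replayed.
fast_state, fast_count = recover({"x": 3, "y": 2}, 3, log)

print(full_state == fast_state, full_count, fast_count)
```

Both paths reach the same state, but the savepoint-based restart touches only the tail of the log, which is why more frequent savepoints tend to shorten restart time.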
What is a Snapshot?
Savepoints normally overwrite older savepoints, but it is possible to freeze a savepoint for future use; this is called a snapshot.
The advantage of Snapshots is that they can be replicated in the form of full data backups, which can be used to restore a database to a specific point in time. This can be useful in the event of data corruption, for instance.
Savepoints are written to local storage, and backups can additionally be saved to backup storage. Local recovery from a crash uses the latest savepoint and then replays the most recent logs to recover the database without any data loss.
One drawback of backups is the potential loss of data between the time of the last backup and the time of the failure.
A preferred solution, therefore, is to provide continuous replication of all persisted data.
In this method, all the data that the primary HANA system writes to the persistence layer (data and log volumes) is replicated to a secondary, usually cross-site, location. Because the secondary host is in a remote location, this solution requires a reliable, high-bandwidth, low-latency connection between the primary site and the secondary site.
The HANA instance at the secondary location can be used for other SAP HANA systems (such as a test or QA environment), depending on your hardware solution.
Synchronous vs. Asynchronous Storage Replication
Storage replication offers both synchronous and asynchronous replication options, but synchronous replication should only be used when the distance between the primary and secondary sites is no more than 100 km. Over longer distances, the latency for writing the redo log may increase.
Advantages of Storage Replication
Due to its continuous nature, storage replication (sometimes also called remote storage mirroring) can be a more attractive option than backups, as it reduces the amount of data that can be lost between the last copy and a failure.
Another advantage of storage replication is that it also enables a much shorter recovery time.
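The distance limit for synchronous replication mentioned earlier follows from simple physics, which the rough Python estimate below illustrates. It assumes light travels through optical fibre at about 200,000 km/s (roughly 5 microseconds per kilometre one way) and ignores switching, protocol and storage overhead, so real figures will be higher. The constant and function names are made up for this sketch.

```python
# Assumption: signal speed in optical fibre is about 200,000 km/s,
# i.e. roughly 5 microseconds per kilometre in one direction.
FIBRE_US_PER_KM = 5.0


def sync_round_trip_ms(distance_km):
    """Lower bound on added latency per synchronous log write: one round trip."""
    return 2 * distance_km * FIBRE_US_PER_KM / 1000.0


for km in (10, 100, 1000):
    print(f"{km:>5} km -> at least {sync_round_trip_ms(km):.1f} ms per log write")
```

Under these assumptions, every synchronous redo log write waits about 1 ms extra at 100 km and about 10 ms at 1000 km, which is why asynchronous replication is usually chosen for longer distances.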
System replication is an alternative high availability solution for the SAP HANA platform, providing an extremely short RTO.
Usually system replication is set up so that a secondary standby system is configured as an exact copy of the active primary system, with the same number of active hosts in each system. The number of standby hosts need not be identical.
The secondary system can be located near the primary system to serve as a rapid failover solution for planned downtime, or to handle storage corruption or other local faults, or it can be installed at a remote site to be used in a disaster recovery scenario.
Important points about System Replication:
- Compatible with all SAP HANA hardware partner solutions
- Secondary system has the same number of active nodes as the active, primary system
- The instances in the secondary system operate in live replication mode.
- In this mode, all secondary system services constantly communicate with their primary counterparts.
- System replication replicates data and persists data/logs, and finally loads data to memory.
- The logs and data can be compressed before shipping.
To learn more about system replication, refer to the article SAP HANA Disaster Recovery - System Replication.