Monday, July 29, 2019

Why Disaster Recovery Matters

First, let's go through the definition of the term "Disaster Recovery", commonly shortened by IT professionals to "DR".

From an IT point of view, a disaster is anything that stops IT services from operating in an organization, such as an infrastructure failure or a cyber attack.

The goal of a Disaster Recovery solution is to restore your organization's IT services to normal operation.

What are "RTO" and "RPO", and why do they matter?

Recovery Time Objective, or "RTO", is the time needed to recover your IT services after a disaster has occurred. You'll want your RTO to be as short as possible.

Recovery Point Objective, or "RPO", is the amount of data changes your company or organization can afford to lose if a disaster happens.
For example, if a disaster happens at 11:00 AM and your DR solution can recover your data with changes up to 10:45 AM, then your RPO is 15 minutes. If your DR solution can recover your data with changes up to 11:00 AM, then your RPO is 0.
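The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration: the function name `achieved_rpo` is made up for this sketch, and the timestamps are the ones from the example above.

```python
from datetime import datetime, timedelta


def achieved_rpo(disaster_time: datetime, last_recoverable_point: datetime) -> timedelta:
    """Return the window of lost changes: disaster time minus the last recoverable point."""
    if last_recoverable_point > disaster_time:
        raise ValueError("the recovery point cannot be later than the disaster")
    return disaster_time - last_recoverable_point


disaster = datetime(2019, 7, 29, 11, 0)

# Data recoverable up to 10:45 AM: 15 minutes of changes are lost.
print(achieved_rpo(disaster, datetime(2019, 7, 29, 10, 45)))  # 0:15:00

# Data recoverable up to 11:00 AM: nothing is lost, the RPO is 0.
print(achieved_rpo(disaster, disaster))  # 0:00:00
```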

Nutanix Metro Availability (Synchronous Replication)

The term "synchronous replication" means copying data over a storage area network, local area network, or wide area network so that there are multiple, up-to-date copies of the data.

In other words, when a write happens at the main site, a copy is sent to the DR site, and the main site waits for the DR site to confirm that the data has been written before committing it locally. This makes the RPO equal to 0. The round-trip latency between the two sites must be 5 ms or less, and you need to maintain adequate bandwidth to accommodate peak writes. It is also recommended that you have a redundant physical network between both sites.
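The write ordering described above can be modeled with a toy example. This is a conceptual sketch only, not how Nutanix implements Metro Availability: the `Site` class and `synchronous_write` function are hypothetical names, and each site is just an in-memory dictionary.

```python
class Site:
    """Toy stand-in for a storage site (hypothetical, in-memory only)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, key, data):
        self.blocks[key] = data
        return True  # acknowledgement that the data is written


def synchronous_write(main, dr, key, data):
    """Commit at the main site only after the DR site has acknowledged its copy."""
    if not dr.write(key, data):
        raise IOError("DR site did not acknowledge the write")
    main.write(key, data)
    return True


main_site, dr_site = Site("Main"), Site("DR")
synchronous_write(main_site, dr_site, "block-1", b"payload")

# Both sites hold the same data after every acknowledged write, so the RPO is 0.
print(main_site.blocks == dr_site.blocks)  # True
```

Because every write waits for the remote acknowledgement, the round-trip latency between the sites is added to each write, which is why the latency limit matters.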


Before you start, make sure that network connectivity between both sites is healthy, that each site is reachable from the other, and that your round-trip (RTT) latency is within 5 ms.
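As a rough pre-check, you could compare measured round-trip times against the 5 ms limit. The sketch below assumes you have already collected RTT samples in milliseconds (for example from a ping test); the function name and the sample values are hypothetical.

```python
METRO_MAX_RTT_MS = 5.0  # round-trip limit quoted in the text


def rtt_within_limit(samples_ms, limit_ms=METRO_MAX_RTT_MS):
    """Return True only if every measured round trip is at or under the limit."""
    if not samples_ms:
        raise ValueError("need at least one RTT sample")
    return max(samples_ms) <= limit_ms


# Hypothetical measurements in milliseconds, for illustration only.
print(rtt_within_limit([1.8, 2.4, 3.1]))  # True
print(rtt_within_limit([2.0, 6.5, 2.2]))  # False
```

Checking the worst sample rather than the average is a deliberate choice here, since a single slow round trip delays the write that is waiting on it.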

When using Nutanix Metro Availability over VMware ESXi or Microsoft Hyper-V (Nutanix AHV coming soon), Nutanix handles stretching and replicating the storage between the Main and DR sites.

Nutanix Near Synchronous (Near-sync DR)


NearSync DR provides the best of both worlds (sync and async): zero impact on primary I/O latency (like async replication) together with a very low RPO (like sync replication with Metro Availability). This allows users to have a very low RPO without the overhead of synchronous replication for writes.
With the NearSync DR feature, the RPO can be as low as 1 minute; you can set your schedule anywhere from 1 minute to 15 minutes.
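A small check of that schedule range might look like the following. It is only a sketch of the 1 to 15 minute window stated above; the function name is hypothetical and this is not a Nutanix API.

```python
NEARSYNC_MIN_MINUTES = 1   # lowest schedule mentioned in the text
NEARSYNC_MAX_MINUTES = 15  # highest schedule mentioned in the text


def is_valid_nearsync_schedule(minutes):
    """Return True if the schedule falls inside the 1 to 15 minute NearSync window."""
    return NEARSYNC_MIN_MINUTES <= minutes <= NEARSYNC_MAX_MINUTES


print(is_valid_nearsync_schedule(1))   # True
print(is_valid_nearsync_schedule(15))  # True
print(is_valid_nearsync_schedule(30))  # False
```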

Before you start, make sure that network connectivity between both sites is healthy and that each site is reachable from the other.

From the Main site Prism interface, click the main tab, go to Data Protection, and click it.

Nutanix Cross Hypervisor DR

Nutanix provides cross-hypervisor disaster recovery for migration and DR services between two sites running different hypervisors (ESXi and AHV clusters).

Now let's go step by step through building your cross-hypervisor DR scenario. Before you start, make sure that network connectivity between both sites is healthy and that each site is reachable from the other.

From the Main site Prism interface, click the main tab, go to Data Protection, and click it.

Nutanix Async DR - Basic setup and Protection Domain Configuration

The term "asynchronous replication" refers to a store-and-forward approach. In other words, data is written to the primary storage array first and is then replicated to the replication targets according to a specific schedule.
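The store-and-forward idea can be illustrated with a toy model. This is a conceptual sketch, not Nutanix code: the `AsyncReplicator` class is hypothetical, and both the primary and the target are plain dictionaries.

```python
class AsyncReplicator:
    """Toy store-and-forward model: writes land locally, then ship on a schedule."""

    def __init__(self):
        self.primary = {}  # primary storage
        self.target = {}   # replication target
        self.pending = {}  # changes written locally but not yet replicated

    def write(self, key, data):
        # The write completes as soon as the primary has it.
        self.primary[key] = data
        self.pending[key] = data

    def replicate(self):
        """Forward all pending changes; meant to be called by the schedule."""
        sent = len(self.pending)
        self.target.update(self.pending)
        self.pending.clear()
        return sent


rep = AsyncReplicator()
rep.write("block-1", b"a")
rep.write("block-2", b"b")
print(rep.target)       # {} (nothing replicated until the schedule fires)
print(rep.replicate())  # 2
print(rep.target == rep.primary)  # True
```

Anything still in `pending` when a disaster hits is what would be lost, which is why the schedule interval determines the RPO.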

Nutanix provides native snapshots per VM or per volume group. With the Async DR feature, the lowest RPO is 1 hour: the snapshot schedule should match your desired RPO, and you can set it to 1 hour or more.

Before you start, make sure that network connectivity between both sites is healthy and that each site is reachable from the other.

From the Main site Prism interface, click the main tab, go to Data Protection, and click it.
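To see how the snapshot schedule translates into data loss, here is a short sketch that finds the last snapshot taken before a disaster. The function name, the start time, and the disaster time are all assumptions made for illustration; only the 1-hour minimum comes from the text.

```python
from datetime import datetime, timedelta

MIN_ASYNC_INTERVAL = timedelta(hours=1)  # minimum Async DR schedule from the text


def last_snapshot_before(disaster, first_snapshot, interval):
    """Return the time of the most recent scheduled snapshot at or before the disaster."""
    if interval < MIN_ASYNC_INTERVAL:
        raise ValueError("Async DR schedule must be 1 hour or more")
    if disaster < first_snapshot:
        raise ValueError("no snapshot exists before the disaster")
    completed = (disaster - first_snapshot) // interval  # whole intervals elapsed
    return first_snapshot + completed * interval


first = datetime(2019, 7, 29, 0, 0)      # hypothetical first snapshot
disaster = datetime(2019, 7, 29, 10, 59)  # hypothetical disaster time
snapshot = last_snapshot_before(disaster, first, timedelta(hours=1))

print(snapshot)             # 2019-07-29 10:00:00
print(disaster - snapshot)  # 0:59:00
```

With an hourly schedule the loss in this example is 59 minutes, just under the 1-hour RPO.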

Nutanix Async DR - Recover from a Disaster

In the previous post, we went through the steps of configuring the Main site and the DR site as remote sites of each other and creating a protection domain that defines which VMs are protected from the Main site to the DR site. Details are at the following link:

Async DR - Basic setup and Protection Domain Configuration (Click Here)

In this post, we will go through the steps of recovering protected VMs from a disaster, from the Main site to the DR site.

In this scenario, we assume that we have lost the Main site (there is no access to it anymore) and need to recover from the DR site.

From the DR site, go to the Data Protection page, select the protection domain you need to recover, and select "Activate".

Nutanix Async DR - Planned Failover

In the previous post, we went through the steps of configuring the Main site and the DR site as remote sites of each other and creating a protection domain that defines which VMs are protected from the Main site to the DR site. Details are at the following link:

Async DR - Basic setup and Protection Domain Configuration (Click Here)

In this post, we will go through the steps of a planned migration of protected VMs from the Main site to the DR site.

As a first step, we need to double-check that everything looks normal at the DR site. Go to the Data Protection page and make sure the protection domain is listed. You will find the protection domain there, but not as active, and if you check the VM page you will not find the VMs, because they are currently running at the Main site.

Nutanix Async DR - Cloning VM from Snapshot to DR Site

In the previous post, we went through the steps of configuring the Main site and the DR site as remote sites of each other and creating a protection domain that defines which VMs are protected from the Main site to the DR site. Details are at the following link:

Async DR - Basic setup and Protection Domain Configuration (Click Here)

In this post, we will go through the steps of cloning a protected VM at the DR site.

In this scenario, we have normal access to the Main site and the original VM is working with no issues, but we need to clone another copy of the same VM at the DR site.

From the DR site, go to the Data Protection page and select the protection domain from the list. Once it is selected, you will be able to select "Local Snapshot" and choose the version you want to restore (clone) your VM from. For example, if your replication schedule runs every hour with a retention of the last 10 copies, you will find 10 versions here covering the last 10 hours.
Select your snapshot and click "Restore".
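The retention example above (hourly schedule, last 10 copies) can be sketched as a list of snapshot times. The function name and the timestamp of the latest snapshot are hypothetical and used only for illustration.

```python
from datetime import datetime, timedelta


def retained_snapshots(latest_snapshot, interval, retention):
    """Return the snapshot times still kept, newest first."""
    return [latest_snapshot - i * interval for i in range(retention)]


latest = datetime(2019, 7, 29, 11, 0)  # hypothetical time of the newest snapshot
versions = retained_snapshots(latest, timedelta(hours=1), retention=10)

print(len(versions))  # 10
print(versions[0])    # 2019-07-29 11:00:00
print(versions[-1])   # 2019-07-29 02:00:00
```

Each entry corresponds to one version you could pick in the "Local Snapshot" list before clicking "Restore".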