Friday, September 13, 2019

Part-5: Test Failover

In the previous posts we deployed Prism Central at the Main site and the DR site, enabled Leap, configured Availability Zones, created Protection Policies, and created, configured and validated Recovery Plans:

Part-1: Prism Central One-Click Deployment (Click Here)
Part-2: Enabling Leap and Configure Availability Zones (Click Here)
Part-3: Create and Configure Protection Policies (Click Here)
Part-4: Create and Configure Recovery Plans (Click Here)

In this post we will test the Recovery Plans by running a Test Failover.

Once validation has completed, let's run a Test Failover to make sure everything works. As with validation, select your recovery plan, click Actions, and then click "Test".


Thursday, September 12, 2019

Part-4: Create and Configure Recovery Plans

In the previous posts we deployed Prism Central at the Main site and the DR site, enabled Leap, configured Availability Zones and created Protection Policies:

Part-1: Prism Central One-Click Deployment (Click Here)
Part-2: Enabling Leap and Configure Availability Zones (Click Here)
Part-3: Create and Configure Protection Policies (Click Here)

In this post we will go through creating, configuring and validating Recovery Plans.

From Prism Central on the Main site, click the Entities menu in the left corner and go to Policies > Recovery Plans.



Click "Create New Recovery Plan".

Part-3: Create Protection Policies

In the previous posts we deployed Prism Central at the Main site and the DR site, enabled Leap and configured Availability Zones:

Part-1: Prism Central One-Click Deployment (Click Here)
Part-2: Enabling Leap and Configure Availability Zones (Click Here)

In this post we will go through creating and configuring Protection Policies.

From Prism Central on the Main site, click the Entities menu in the left corner and go to Policies > Protection Policies.

Part-2: Enabling Leap and Configure Availability Zones

In the previous post we deployed Prism Central at the Main site and the DR site:

Part-1: Prism Central One-Click Deployment (Click Here)

Before you start, make sure that network connectivity between the two sites is healthy and that each site can reach the other.

Now let's start our journey with Nutanix Leap.
From Prism Central (PC) at the Main site (then repeat the same steps at the DR site), open the Settings menu and, under Setup, click "Leap".

Part-1: Prism Central One-Click Deployment

In this post we will cover the steps required to deploy Prism Central.

On the Prism Element home page, look for Prism Central on the upper-left side and click "Register or Create new".



If you are registering your cluster with a Prism Central instance that is already deployed, either locally or on another cluster, click "Connect". In our scenario we need to deploy a new instance, so we click "Deploy".

DR Orchestration (On-Prem Leap)

Nutanix Leap helps you set up, configure, orchestrate and automate all your DR services from a central location: Nutanix Prism Central. From AOS 5.11 onward, Nutanix adds protection policies and recovery plans to Prism Central for AHV and ESXi, offering an easy way to orchestrate operations around migrations and unplanned failures. You can now apply orchestration policies from a central location, ensuring consistency across all your sites and clusters. To help manage these protection policies and recovery plans, Nutanix uses a construct called Availability Zones, each managed by one Prism Central. An availability zone can also represent a region in Nutanix Xi Cloud Services.

Before you start, make sure that network connectivity between the two sites is healthy and that each site can reach the other.

My lab consists of two Nutanix clusters, the Main site and the DR site. On each site we will deploy Prism Central, and the two instances will be connected to provide a single logical management layer for our Leap DR services.

Monday, July 29, 2019

Why Disaster Recovery Matters!

First, let's define the term "Disaster Recovery", or "DR" as it is commonly abbreviated by IT professionals.

From an IT point of view, a disaster is anything that stops an organization's IT services from operating, such as an infrastructure failure or a cyber attack.

The goal of a Disaster Recovery solution is to restore your organization's IT services to normal operation.

What are "RTO" and "RPO", and why do they matter?

Recovery Time Objective ("RTO") is the time needed to restore your IT services after a disaster has occurred. You'll want your RTO to be as short as possible.

Recovery Point Objective ("RPO") is the amount of data change your organization can afford to lose if a disaster happens, measured as the time between the last recoverable copy of your data and the moment of the disaster.
For example: if a disaster happens at 11:00 AM and your DR solution can recover your data with changes up to 10:45 AM, your RPO is 15 minutes; if your DR solution can recover your data with changes up to 11:00 AM, your RPO is 0.
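To make the arithmetic concrete, here is a small Python sketch that computes the RPO as the gap between the moment of the disaster and the timestamp of the last recoverable change. The function name `rpo` and the example date are illustrative only.

```python
from datetime import datetime, timedelta


def rpo(disaster_at: datetime, last_recoverable_at: datetime) -> timedelta:
    """Data-loss window: time between the last recoverable change and the disaster."""
    if last_recoverable_at > disaster_at:
        raise ValueError("the recoverable point cannot be after the disaster")
    return disaster_at - last_recoverable_at


# The two cases from the example above (the date is arbitrary).
disaster = datetime(2019, 7, 29, 11, 0)
recovered_to_1045 = rpo(disaster, datetime(2019, 7, 29, 10, 45))
recovered_to_1100 = rpo(disaster, datetime(2019, 7, 29, 11, 0))

print(recovered_to_1045)  # 0:15:00
print(recovered_to_1100)  # 0:00:00
```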

Nutanix Metro Availability (Synchronous Replication)

The term "Synchronous Replication" means copying data over a storage area network, local area network or wide area network so that there are multiple up-to-date copies of the data.

In other words, when a write happens at the Main site, a copy is sent to the DR site, and the Main site waits for the DR site to confirm that the data has been written before committing the write locally. That makes the RPO equal to "0". The round-trip latency between both sites must be 5 ms or less, and you must maintain enough bandwidth to accommodate peak writes. It is also recommended that you have a redundant physical network between both sites.
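The write path described above can be sketched as a toy in-memory model in Python: the primary commits only after the DR site acknowledges. The `Site` and `SyncReplicatedVolume` classes are hypothetical illustrations of the protocol, not Nutanix code; real Metro Availability has its own failure-handling options that are not modeled here.

```python
class Site:
    """A toy storage site that keeps committed writes in a dict."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = {}

    def write(self, key, value):
        """Persist the write and acknowledge it; refuse if the site is down."""
        if not self.available:
            return False
        self.data[key] = value
        return True


class SyncReplicatedVolume:
    """Synchronous replication: the primary commits only after the DR site acknowledges."""

    def __init__(self, primary, dr):
        self.primary = primary
        self.dr = dr

    def write(self, key, value):
        # 1. Send the write to the DR site first and wait for its acknowledgement.
        if not self.dr.write(key, value):
            return False  # no acknowledgement, so nothing is committed on the primary
        # 2. Only now commit on the primary; both copies match, hence RPO = 0.
        return self.primary.write(key, value)


main, dr = Site("Main"), Site("DR")
vol = SyncReplicatedVolume(main, dr)
print(vol.write("block-1", "A"))  # True
print(main.data == dr.data)       # True

dr.available = False
print(vol.write("block-2", "B"))  # False
print("block-2" in main.data)     # False
```

Every acknowledged write pays one round trip to the DR site, which is why the inter-site latency budget matters so much for synchronous replication.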


Before you start, make sure that network connectivity between the two sites is healthy, that each site can reach the other, and that the round-trip time (RTT) latency is within 5 ms.
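One way to sanity-check the 5 ms requirement is to time TCP handshakes from one site to the other, as in the Python sketch below. This is an approximation (ping or a dedicated network test tool is the more usual measurement), the DR hostname is hypothetical, and port 9440 is assumed here as the Prism port. The check is deliberately conservative: every sample, not just the average, must stay within the limit.

```python
import socket
import time
from typing import List

METRO_RTT_LIMIT_MS = 5.0  # round-trip budget stated in this post


def measure_tcp_rtt_ms(host: str, port: int, samples: int = 5, timeout: float = 2.0) -> List[float]:
    """Approximate RTT by timing TCP connects (connect() returns after the SYN/SYN-ACK exchange)."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass
        results.append((time.perf_counter() - start) * 1000.0)
    return results


def within_metro_budget(rtt_samples_ms: List[float], limit_ms: float = METRO_RTT_LIMIT_MS) -> bool:
    """True only if there are samples and the worst one is within the limit."""
    return bool(rtt_samples_ms) and max(rtt_samples_ms) <= limit_ms


# Hypothetical DR-site address; uncomment to measure against a real target.
# samples = measure_tcp_rtt_ms("dr-cluster.example.local", 9440)
print(within_metro_budget([1.8, 2.4, 3.1]))  # True
print(within_metro_budget([1.8, 2.4, 6.2]))  # False
```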

When using Nutanix Metro Availability with VMware ESXi or Microsoft Hyper-V (Nutanix AHV support coming soon), Nutanix handles stretching and replicating the storage between the Main and DR sites.

Nutanix Near Synchronous (Near-sync DR)


NearSync DR provides the best of both worlds (sync and async): zero impact on primary I/O latency (like async replication) together with a very low RPO (like sync replication, i.e. Metro). This lets users achieve a very low RPO without the overhead of synchronous replication for writes.
With the NearSync DR feature, the RPO can be as low as 1 minute; you can set the snapshot schedule anywhere from 1 to 15 minutes.

Before you start, make sure that network connectivity between the two sites is healthy and that each site can reach the other.

From the Main site's Prism interface, click the main tab and select Data Protection.

Nutanix Cross Hypervisor DR

Nutanix provides cross-hypervisor disaster recovery, allowing migration and DR services between two sites running different hypervisors (ESXi and AHV clusters).

Now let's go step by step through building a cross-hypervisor DR scenario. Before you start, make sure that network connectivity between the two sites is healthy and that each site can reach the other.

From the Main site's Prism interface, click the main tab and select Data Protection.

Nutanix Async DR - Basic setup and Protection Domain Configuration

The term "Asynchronous Replication" refers to a store-and-forward approach: data is written to the primary storage array first and then replicated to the replication targets on a specific schedule.
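For contrast with synchronous replication, the store-and-forward idea can be sketched as a toy Python model: writes are acknowledged as soon as the primary has them, and queued changes are forwarded when the schedule fires. The `AsyncReplicatedVolume` class is hypothetical, and it uses a change queue purely for illustration; Nutanix Async DR itself replicates snapshots, as described next.

```python
from collections import deque


class AsyncReplicatedVolume:
    """Store-and-forward: commit locally at once, ship queued changes on a schedule."""

    def __init__(self):
        self.primary = {}
        self.dr = {}
        self.pending = deque()  # committed locally but not yet replicated

    def write(self, key, value):
        # Acknowledged as soon as the primary has it; no waiting on the DR site.
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Run by the schedule (e.g. hourly): forward everything queued so far, in order."""
        while self.pending:
            key, value = self.pending.popleft()
            self.dr[key] = value


vol = AsyncReplicatedVolume()
vol.write("block-1", "A")
print(vol.dr)  # {}  (not replicated yet: this gap is the RPO window)
vol.replicate()
print(vol.dr)  # {'block-1': 'A'}
```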

Nutanix provides native snapshots per VM or per volume group. With the Async DR feature, the RPO can be as low as 1 hour; the snapshot schedule should match your desired RPO, and you can set it to 1 hour or more.
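Putting together the ranges stated across these posts (Metro: RPO 0; NearSync: 1 to 15 minute schedules; Async: schedules of 1 hour or more), the Python sketch below picks the least aggressive option that still meets a target RPO, along with a snapshot schedule. The helper `plan_replication` is hypothetical; a real choice also depends on inter-site latency (Metro needs 5 ms RTT or less), bandwidth, hypervisor and licensing.

```python
from typing import Optional, Tuple


def plan_replication(target_rpo_minutes: int) -> Tuple[str, Optional[int]]:
    """Return (option, snapshot schedule in minutes) using only the ranges stated in these posts."""
    if target_rpo_minutes < 0:
        raise ValueError("RPO cannot be negative")
    if target_rpo_minutes >= 60:
        # Async: the snapshot schedule should equal the desired RPO.
        return ("Async", target_rpo_minutes)
    if target_rpo_minutes >= 1:
        # NearSync: schedules run from 1 to 15 minutes, so cap at 15.
        return ("NearSync", min(target_rpo_minutes, 15))
    # Below 1 minute only synchronous replication helps; it has no snapshot schedule.
    return ("Metro Availability", None)


print(plan_replication(240))  # ('Async', 240)
print(plan_replication(30))   # ('NearSync', 15)
print(plan_replication(5))    # ('NearSync', 5)
print(plan_replication(0))    # ('Metro Availability', None)
```

A 30-minute target maps to NearSync with a 15-minute schedule because no Async schedule below 1 hour is stated here, and a tighter schedule always satisfies a looser RPO.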

Before you start, make sure that network connectivity between the two sites is healthy and that each site can reach the other.

From the Main site's Prism interface, click the main tab and select Data Protection.

Nutanix Async DR - Recover from a Disaster

In the previous post we configured the Main site and the DR site as remote sites of each other and created a protection domain that defines which VMs are protected from the Main site to the DR site. Details are at the following link:

Async DR - Basic setup and Protection Domain Configuration (Click Here)

In this post we will go through the steps to recover the protected VMs at the DR site after a disaster at the Main site.

In this scenario we assume that we have lost the Main site (there is no longer any access to it) and need to recover at the DR site.

From the DR site, go to the Data Protection page, select the protection domain you need to recover, and click "Activate".

Nutanix Async DR - Planned Failover

In the previous post we configured the Main site and the DR site as remote sites of each other and created a protection domain that defines which VMs are protected from the Main site to the DR site. Details are at the following link:

Async DR - Basic setup and Protection Domain Configuration (Click Here)

In this post we will go through the steps of a planned migration of the protected VMs from the Main site to the DR site.

As a first step, double-check that everything looks normal at the DR site: go to the Data Protection page and make sure the protection domain is listed. You will find the protection domain, but it will not be active, and if you check the VM page you will not find the VMs, because they are currently running at the Main site.

Nutanix Async DR - Cloning VM from Snapshot to DR Site

In the previous post we configured the Main site and the DR site as remote sites of each other and created a protection domain that defines which VMs are protected from the Main site to the DR site. Details are at the following link:

Async DR - Basic setup and Protection Domain Configuration (Click Here)

In this post we will go through the steps of cloning a protected VM at the DR site.

In this scenario we have normal access to the Main site and the original VM is working with no issues, but we need to clone another copy of the same VM at the DR site.

From the DR site, go to the Data Protection page and select the protection domain from the list. Once it is selected, you can open "Local Snapshot" and choose the version you want to restore (clone) your VM from. For example, if your replication schedule runs every hour and keeps the last 10 copies, you will find 10 versions covering the last 10 hours.
Select your snapshot and click "Restore".
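The retention example above (hourly schedule, last 10 copies kept) can be checked with a short Python sketch that lists which restore points you should expect to see. The helper `available_snapshots` is hypothetical and assumes every scheduled snapshot succeeded and was replicated, with retention keeping exactly the N most recent copies.

```python
from datetime import datetime, timedelta
from typing import List


def available_snapshots(latest: datetime, interval: timedelta, retention: int) -> List[datetime]:
    """Timestamps of the snapshots kept at the DR site, newest first."""
    if retention < 1:
        raise ValueError("retention must be at least 1")
    return [latest - i * interval for i in range(retention)]


snaps = available_snapshots(datetime(2019, 7, 29, 11, 0), timedelta(hours=1), retention=10)
print(len(snaps))  # 10
print(snaps[0])    # 2019-07-29 11:00:00
print(snaps[-1])   # 2019-07-29 02:00:00
```

The oldest restore point is (retention - 1) intervals before the newest one, so 10 hourly copies span 9 hours between the first and the last snapshot.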