As you transition from application architecture to infrastructure (mainly cloud) architecture, you start hearing the terms High Availability and Disaster Recovery thrown around a lot (and often interchangeably!).

While there is some overlap between the two, in general, they serve different purposes.

  1. A Highly Available architecture is one that minimizes downtime (for both applications and the underlying infrastructure). It is usually built by adding redundant components, such as mirrors and standby nodes (e.g. an Oracle RAC provides redundancy at the database level). Think REDUNDANCY when you think of HA.
  2. A Disaster Recovery plan comes into play when even High Availability does not cut it. Imagine your entire Oracle RAC, with all its redundant nodes, getting wiped out by a disaster (fire, flood, whatever). Think of EVERYTHING going down: your HA cluster, and possibly even the underlying networking infrastructure.
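To make the REDUNDANCY idea concrete, here is a minimal Python sketch of HA failover, assuming a cluster of interchangeable nodes and a health check you supply. The `Node` class, the `pick_active` function and the node names are hypothetical illustrations, not part of Oracle RAC or any other product. When no node is healthy, the sketch returns `None`, which is exactly the point where HA runs out of options and a DR plan has to take over.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """One redundant replica, e.g. a database node (hypothetical model)."""
    name: str
    healthy: bool = True


def pick_active(nodes: List[Node], is_healthy: Callable[[Node], bool]) -> Optional[Node]:
    """Return the first node that passes its health check, or None if all are down."""
    for node in nodes:
        if is_healthy(node):
            return node
    # No redundant node is left: HA cannot help any more.
    return None


def check(node: Node) -> bool:
    """Stand-in for a real health probe (ping, query, heartbeat...)."""
    return node.healthy


if __name__ == "__main__":
    cluster = [Node("node-a"), Node("node-b"), Node("node-c")]

    cluster[0].healthy = False                 # a single node fails
    print(pick_active(cluster, check).name)    # node-b takes over

    for n in cluster:                          # the whole cluster is wiped out
        n.healthy = False
    print(pick_active(cluster, check))         # None: time to invoke the DR plan
```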

In such a scenario, you need a step-by-step approach for recreating your entire production environment. Whether this is done from tape backups or cloud backups needs to be part of your plan.

Which people, which locations, and the exact steps each person must take (roles and responsibilities) all need to be part of your DR plan.
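One way to keep such a plan honest is to write it down as data and check it. The sketch below is a hypothetical Python model, not a standard runbook format: the `RecoveryStep` fields, the role names and the locations are illustrative assumptions. Each step records who does what, where, and in which order, and `validate_plan` flags steps that lack an owner or a location or are numbered out of sequence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecoveryStep:
    """One step of a DR runbook (field names are hypothetical)."""
    order: int     # position in the recovery sequence, starting at 1
    action: str    # what has to be done
    owner: str     # role responsible for the step
    location: str  # where the step is carried out


def validate_plan(steps: List[RecoveryStep]) -> List[str]:
    """Return a list of problems; an empty list means the plan passes these checks."""
    problems = []
    # Steps must be numbered 1..N with no gaps or duplicates.
    if sorted(s.order for s in steps) != list(range(1, len(steps) + 1)):
        problems.append("steps must be numbered 1..N with no gaps or duplicates")
    for s in steps:
        if not s.owner:
            problems.append(f"step {s.order} ({s.action}) has no owner")
        if not s.location:
            problems.append(f"step {s.order} ({s.action}) has no location")
    return problems


if __name__ == "__main__":
    # Illustrative plan; roles and locations are made up for the example.
    plan = [
        RecoveryStep(1, "Declare the disaster and notify the team", "Incident lead", "Backup office"),
        RecoveryStep(2, "Restore networking at the recovery site", "Network engineer", "Recovery site"),
        RecoveryStep(3, "Restore the database from the latest backup", "DBA", "Recovery site"),
        RecoveryStep(4, "Redeploy applications and verify", "App owner", "Recovery site"),
    ]
    print(validate_plan(plan) or "plan OK")
    for step in sorted(plan, key=lambda s: s.order):
        print(f"{step.order}. {step.action} [{step.owner} @ {step.location}]")
```

Keeping the plan as data means it can be reviewed and re-checked whenever people or locations change.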


The Cloud blurs the boundary (between HA and DR)

Prior to the advent of the cloud, everything I said above held true: DR contingency planning started where HA planning ended.

However, with the cloud, your HA solution can ITSELF provide you with a Disaster Recovery plan. This is made possible by constructs such as Availability Zones and Multi-Region tenancy. Availability Zones let you spread your HA nodes across different data centers. If one of the data centers is struck by disaster, your redundant HA node in the second data center takes over, so the same setup solves the problem of disaster recovery as well. However, the Availability Zones of a region are not guaranteed to be geographically far apart, so your redundant data centers may all sit within a few miles of one another. A single large disaster could therefore affect all of them.
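The weakness described above can be modeled in a few lines of Python. This is a minimal sketch under simplified assumptions: each replica is tagged with a hypothetical region and zone name (real cloud providers use their own naming schemes), and a disaster is represented as the set of zones it knocks out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Replica:
    """A redundant node pinned to one Availability Zone (names are hypothetical)."""
    name: str
    region: str
    zone: str


def survivors(replicas: Iterable[Replica], failed_zones: Set[str]) -> List[Replica]:
    """Return the replicas whose zone was not hit by the disaster."""
    return [r for r in replicas if r.zone not in failed_zones]


# Two redundant nodes in two zones of the SAME region.
deployment = [
    Replica("db-1", region="northwest", zone="nw-a"),
    Replica("db-2", region="northwest", zone="nw-b"),
]

if __name__ == "__main__":
    # A fire in one data center: the node in the other zone takes over.
    print([r.name for r in survivors(deployment, {"nw-a"})])          # ['db-2']
    # A regional event that hits every zone in the region leaves nothing running.
    print([r.name for r in survivors(deployment, {"nw-a", "nw-b"})])  # []
```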

If you go a step further, you can spread your nodes across Availability Zones in different GEOGRAPHIC REGIONS (e.g. NorthWest would be one REGION and SouthWest would be another). With your nodes separated by such a wide geographic distance, the chances of a disaster striking both nodes are minimal.
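The placement idea can be sketched as follows, assuming you control which region and zone each replica lands in. The `place_replicas` and `survives_regional_disaster` functions and the region and zone names are hypothetical; real clouds expose placement through their own APIs. Replicas are assigned round-robin across regions first, then across zones within each region, so losing a whole region still leaves copies elsewhere.

```python
from itertools import cycle
from typing import Dict, List, Tuple

Placement = List[Tuple[str, str]]  # (region, zone) for each replica


def place_replicas(count: int, zones_by_region: Dict[str, List[str]]) -> Placement:
    """Assign `count` replicas to (region, zone) pairs, alternating regions first."""
    if not zones_by_region or any(not zones for zones in zones_by_region.values()):
        raise ValueError("every region needs at least one zone")
    # One zone iterator per region, so each region cycles through its own zones.
    zone_iters = {region: cycle(zones) for region, zones in zones_by_region.items()}
    region_iter = cycle(list(zones_by_region))
    placement: Placement = []
    for _ in range(count):
        region = next(region_iter)
        placement.append((region, next(zone_iters[region])))
    return placement


def survives_regional_disaster(placement: Placement, lost_region: str) -> bool:
    """True if at least one replica lives outside the region that was lost."""
    return any(region != lost_region for region, _ in placement)


if __name__ == "__main__":
    zones = {"northwest": ["nw-a", "nw-b"], "southwest": ["sw-a", "sw-b"]}
    multi_region = place_replicas(4, zones)
    print(multi_region)
    print(survives_regional_disaster(multi_region, "northwest"))  # True

    single_region = place_replicas(2, {"northwest": ["nw-a", "nw-b"]})
    print(survives_regional_disaster(single_region, "northwest"))  # False
```

Alternating regions before zones is a deliberate choice: with only two replicas, they end up in different regions rather than in two zones of the same region.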

Summary

DR and HA are often used interchangeably, but they mean different things. HA tries to provide uninterrupted uptime for your I.T. assets. DR goes a step further and takes over when HA cannot hold up (as in the case of a real disaster).

Prior to cloud computing, these two strategies (DR and HA) were genuinely different and independent of one another.

With the advent of the cloud, your HA strategy can also cover you in the case of a disaster. In fact, it can cost a fraction of what it would in a non-cloud environment.

Footnote: Impact of Disaster

  1. A study (University of Texas, Austin) found that 85 percent of businesses (tech and non-tech alike) are heavily (or entirely) dependent on the uptime of their I.T. systems.
  2. The longer it takes to restore communications after a disaster, the more severe the impact on businesses.

Where can I learn more?

Anuj Varma offers a 1-day seminar covering all emerging technologies.

This seminar covers Cloud Computing, Big Data, NoSQL, mobile computing, JavaScript libraries and other customized content not available elsewhere.

All content is based on real-world implementations.

Anuj holds professional certifications in Google Cloud, AWS as well as certifications in Docker and App Performance Tools such as New Relic. He specializes in Cloud Security, Data Encryption and Container Technologies.


Anuj Varma, Hands-On Technology Architect, Clean Air Activist.