Disasters come in all shapes and sizes, from a full site failure, to failure of something like a storage array. There may be a natural disaster where your data center is, or you may even have a malicious actor inside your environment seeking to harm your business.
Whatever the case, you need a plan, and not all disaster recovery plans are created equal. Then, once you have your plan, you need to test it and update it on a regular basis.
Create a disaster recovery plan for your business
Creating a disaster recovery plan is essential to your business’ function. After all, how will things continue to work if a disaster occurs?
The first thing to do is to get an understanding of the applications and services in your IT environment. After you have this list, you must perform a business impact analysis, or BIA.
A BIA is an important step in determining how you will recover your applications, because it is where you determine the recovery point objective (RPO) and recovery time objective (RTO) for every application or service in your environment.
For each application and service, you then must determine several important items:
- The components or servers that make up the application or service
- Dependencies on other applications or services in your environment
- How to restore each component to meet your RPO and RTO
- How to verify components have been restored correctly
- How to protect the data in your disaster recovery site
- How to fail back to production when the disaster is over, or how to make the DR site the permanent site
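The dependency item in the list above lends itself to a small illustration. The following is a minimal sketch, assuming a made-up four-tier application (the component names are hypothetical and not taken from any real environment), of how recorded dependencies could be turned into a recovery order using Python's standard `graphlib` module:

```python
from graphlib import TopologicalSorter

# Hypothetical components of one application. Each key maps to the set of
# components that must be running before that component can be recovered.
dependencies = {
    "database": set(),
    "app-server": {"database"},
    "web-server": {"app-server"},
    "load-balancer": {"web-server"},
}


def recovery_order(deps):
    """Return components in an order that respects their dependencies."""
    # static_order() yields each component only after all of its
    # prerequisites; it raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(deps).static_order())


order = recovery_order(dependencies)
print(order)
```

A circular dependency raises an error here, which is arguably useful: it is better to discover during planning than during a recovery that two components each expect the other to start first.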
Throughout this disaster recovery process, there is something very important to remember: the human component. Recovering from a disaster is stressful! That’s why making sure you have a unified approach to each application is so important.
While RPOs, RTOs and restoration steps may differ for each application, it is important to follow the same processes for each.
It is also important to find a tool that will make the creation, testing and execution of your plan simple for your staff to perform.
Disaster recovery plan template
Perhaps one of the most important parts of your planning is your disaster recovery plan template. Unfortunately, it is often overlooked in the planning process.
Why is it so important? Simple. We are all human. During a disaster, tensions and emotions are running high. Having each disaster recovery plan look the same, and contain the same information is key to its success.
From a planning perspective, it is also much easier to have documentation with a unified look and feel, so key stakeholders can quickly review documentation and get an understanding of what is in each plan.
Let’s take a closer look at things to think about when creating a new disaster recovery plan.
What are the key steps of a disaster recovery plan?
If we boil a disaster recovery plan down to just a few key steps, it really comes down to bringing the application back online and making sure it works correctly.
Key steps for a successful disaster recovery:
- Declaring the disaster
- Beginning recovery of plan components
- Testing plan components as they are recovered
- Testing the application to ensure it is working correctly
- Protecting the data in the DR location
- Notifying application owners and key stakeholders as the plan progresses
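As a rough illustration of how the steps above fit together, here is a minimal sketch of a recovery loop that recovers each component, tests it as it comes back, and notifies stakeholders along the way. The `run_recovery` function, the tuple layout, and the callbacks are all hypothetical names invented for this example; a real orchestration tool would do considerably more.

```python
def run_recovery(components, notify):
    """Recover components in order, verifying each one before moving on.

    components: list of (name, recover, verify) tuples, already sorted into
                recovery order. recover and verify are callables; verify
                returns True when the component is healthy.
    notify:     callable that receives a status message for stakeholders.
    """
    notify("Disaster declared; recovery starting")
    for name, recover, verify in components:
        recover()
        if not verify():
            # Stop at the first failed check so later steps are not
            # built on top of a broken component.
            notify(f"{name}: verification failed; recovery halted")
            return False
        notify(f"{name}: recovered and verified")
    notify("All components recovered; begin application-level testing")
    return True


# Example run with stand-in callbacks: the second component fails its check.
messages = []
succeeded = run_recovery(
    [
        ("database", lambda: None, lambda: True),
        ("web-server", lambda: None, lambda: False),
    ],
    messages.append,
)
print(succeeded)
print(messages)
```

Halting on the first failed verification is one possible design choice; some teams may prefer to continue with independent components and report all failures at the end.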
Are there more things to think of for your DR plan? Of course, but if your plan cannot bring the failed application back online and make it work again, you have not successfully recovered. It’s just that simple.
Let’s take a closer look at what needs to be included in a DR plan to make this happen.
What to include:
There is some information that each disaster recovery plan should contain, no matter what type of plan it is.
Here’s the most important information to include:
- Disaster recovery plan name
- Name of application or system to be recovered
- Name and contact information for application owner
- RPO and RTO of application or system
- Name of each server or component to be recovered
- Steps to be taken on servers or components
- Validation and verification steps to ensure application is running correctly
Let’s take a closer look at why these things are so important.
Each plan should have a distinct name so it’s easy to reference. This may be something as simple as “XYZ Application Disaster Recovery Plan.” It should also clearly list the application owner as well as their contact information.
The next important item is the plan’s RPO and RTO. These should be right at the beginning of the plan so whoever is doing the recovery knows how much data loss is tolerable and how much time they have to complete the recovery.
It should also list every asset in the plan, as well as the steps that must be taken to recover each of them. If there is a specific order that applications or components need to be recovered in, it should be listed. Likewise, if the applications recovered require any sort of verification or validation, these instructions should be included in the plan. If the application being recovered is complex, it’s also always nice to have an application topology diagram.
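One way to keep every plan in the same shape is to treat the template as a data structure. The sketch below is only an illustration: the class names, field names, and the sample values (such as `owner@example.com` and `db01`) are assumptions made up for this example, not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryStep:
    component: str     # server or component to be recovered
    action: str        # step to take on that component
    verification: str  # how to confirm the step worked


@dataclass
class DisasterRecoveryPlan:
    name: str
    application: str
    owner_name: str
    owner_contact: str
    rpo_minutes: int
    rto_minutes: int
    steps: list[RecoveryStep] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of template fields that are still empty."""
        return [k for k, v in vars(self).items() if v in ("", None, [])]

    def render(self) -> str:
        """Render the plan as text, with RPO and RTO near the top."""
        lines = [
            self.name,
            f"RPO: {self.rpo_minutes} min | RTO: {self.rto_minutes} min",
            f"Application: {self.application}",
            f"Owner: {self.owner_name} ({self.owner_contact})",
        ]
        for number, step in enumerate(self.steps, start=1):
            lines.append(
                f"{number}. {step.component}: {step.action} "
                f"[verify: {step.verification}]"
            )
        return "\n".join(lines)


# Hypothetical plan with the owner name left blank on purpose.
plan = DisasterRecoveryPlan(
    name="XYZ Application Disaster Recovery Plan",
    application="XYZ",
    owner_name="",
    owner_contact="owner@example.com",
    rpo_minutes=15,
    rto_minutes=60,
    steps=[RecoveryStep("db01", "Restore from latest replica", "Run a test query")],
)
print(plan.missing_fields())
print(plan.render())
```

Because every plan is built from the same fields, a check like `missing_fields()` can flag incomplete plans before a reviewer ever opens them.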
Recovery plan considerations
As you create recovery plans, there are several considerations to keep in mind.
First and foremost, your recovery plan needs to be easy to read and use! This means you should make sure your recovery plan is reviewed by others during the creation of the plan.
Besides checking things like spelling and grammar, it’s always good to have another set of eyes on the plan to make sure it flows correctly.
For example, a plan that validates the application before all of its components are online could cause problems, both from an audit perspective and for the application functioning properly.
It’s also a good idea to make sure someone who does not know that particular application reviews the disaster recovery plan. After all, who knows who may be executing the recovery?
Building a disaster recovery team
It’s important to identify what personnel will be recovering applications and services in the event of a disaster so they can test recovery and become familiar with the methods that will be used.
It’s good to have several disaster recovery teams ready to go in the event of a disaster. Because we can’t predict disasters, we also can’t predict the impact to our disaster recovery teams.
If we hedge our bets on a single person, or single team of people, we may be in trouble if no one has power to turn on their computer to begin recovering applications!
This is one of the reasons an easy-to-follow disaster recovery plan is so important.
We need to make sure that anyone can log in and begin recovery at any moment, even if they are not familiar with the application they are recovering.
By taking a unified approach and using the same disaster recovery template for all plans, we can help reduce the risk of the human component.
It’s also important to try to automate and orchestrate the execution of our plan when possible. In an ideal world, someone would just need to access the disaster recovery tool and click a button to begin application recovery.
Types of disaster recovery plans
Often, people think that there is just one disaster recovery plan they need to recover their whole business. This is not the case, and it is simply not practical. There are many types of plans, and ways to group them.
Think of the single document you would have to have to recover your business. It would be so large and so complex, that there would be almost no way to test it, or successfully execute it. Not to mention it would be nearly impossible to update when things change in an environment.
The truth is that most organizations have many types of business continuity and DR plans.
Here are a few examples:
- Application-level failure
- Site-level failure
- Infrastructure component failure
- Mission-critical applications
- Dev/test applications
A plan needs to be much more granular and flexible, while sharing the same look and feel.
Recovering services at the application level makes sense for several reasons. First and foremost, it allows the application owners to get involved in disaster recovery. After all, they know their applications the best and are aware of any nuances to their particular application, such as the order the servers need to be started in. They are also aware of any special requirements from an audit perspective.
A site-level plan simply lists the applications that would need to be recovered in the event of a site failure, as well as any other important information that may be applicable at the site level. Then the application-level plan can be easily used.
There may also be a plan at the component level for the recovery of a storage array in the event of a failure.
Maintain several types of plan for disaster recovery in your environments, so you can respond to different types of disasters accordingly. While these plans should have the same look and feel, the steps taken for recovery may be different based on the type of failure.
Testing your plan
After creating a plan, the most important thing you can do is test it. Why? Simple. You need to know if the plan you put together works.
There is a tendency to not fully test disaster recovery plans, or not test them at all. At best, most organizations sort of test their DR plans once or twice a year.
Continuous testing is important, especially since applications are constantly changing. Plans must be updated any time a change is made to an application, such as adding more servers for additional capacity, or removing older servers.
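A simple way to notice that a plan has fallen behind the application is to compare the servers listed in the plan with the servers currently running. This is a minimal sketch; the `plan_drift` function and the server names are hypothetical, and in practice the current list would come from whatever inventory source your environment provides.

```python
def plan_drift(plan_servers, current_servers):
    """Compare the servers named in a plan with the current inventory."""
    planned, current = set(plan_servers), set(current_servers)
    return {
        # Servers running today that the plan would not recover.
        "missing_from_plan": sorted(current - planned),
        # Servers the plan still lists that no longer exist.
        "stale_in_plan": sorted(planned - current),
    }


# Hypothetical example: web02 was retired and web03 was added for capacity.
drift = plan_drift(
    plan_servers=["web01", "web02", "db01"],
    current_servers=["web01", "web03", "db01"],
)
print(drift)
```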
When testing, be sure to pay special attention to what did not go as desired. This is the only way your disaster recovery plan will improve.
Sure, you can restore one server and call it a successful DR test, but has the test served its purpose?
The true purpose of a DR test is to find out if your plan works or not. Don’t cheat at your DR test or you may find yourself in an even worse spot in the event of an actual disaster.
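To make "did the plan work?" concrete, a test can be scored against the plan's RPO and RTO. The sketch below assumes three timestamps are recorded during the test; the `DrTestResult` class, its field names, and the sample times are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrTestResult:
    disaster_declared: datetime      # when the (simulated) disaster was declared
    application_verified: datetime   # when the application passed its checks
    last_recoverable_data: datetime  # timestamp of the restore point used


def evaluate(result, rpo, rto):
    """Compare measured recovery time and data loss with the objectives."""
    recovery_time = result.application_verified - result.disaster_declared
    data_loss = result.disaster_declared - result.last_recoverable_data
    return {
        "recovery_time": recovery_time,
        "data_loss": data_loss,
        "rto_met": recovery_time <= rto,
        "rpo_met": data_loss <= rpo,
    }


# Hypothetical test: 90 minutes to recover, 10 minutes of data lost.
result = DrTestResult(
    disaster_declared=datetime(2024, 1, 1, 9, 0),
    application_verified=datetime(2024, 1, 1, 10, 30),
    last_recoverable_data=datetime(2024, 1, 1, 8, 50),
)
report = evaluate(result, rpo=timedelta(minutes=15), rto=timedelta(minutes=60))
print(report)
```

In this made-up run the RPO is met but the RTO is missed by 30 minutes, which is exactly the kind of finding a DR test exists to surface.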
How are organizations using a disaster recovery plan?
First and foremost, DR and business continuity plans are created to ensure your business will continue to run in the event of a disaster. While everyone hopes they will never have to execute a DR plan, it’s important that it’s ready and waiting to be executed at any time.
Besides protecting your business, disaster recovery planning is a big audit point. Regulated environments need to prove that they are ready for a disaster and capable of recovering, or face consequences. The disaster recovery plan is a very important asset when an auditor comes knocking.
We also can’t forget our disaster recovery tests. DR plans are used during DR tests to see if they actually work or not and must be revised and re-tested if they do not.
How to successfully plan for and recover from a disaster
When it comes to successful disaster recovery, there are several important things to focus on.
First and foremost is focusing on the creation of your disaster recovery plans. Be sure to use a template to make sure all your plans look and feel the same way for ease of use when it comes time to execute them. Be sure to clearly state the application’s RTO and RPO at the beginning of each plan so there is no confusion come recovery time.
Another equally important component of disaster recovery and business continuity is testing. Be sure to fully test your disaster recovery plans as often as possible, especially after there have been changes to the application or service the plan is based on. Be sure to pay attention to what doesn’t work during a disaster recovery test and fix it as soon as possible.
While we tend to focus on technology for disaster recovery, don’t forget about your people. Be sure to have a good understanding of who will be recovering an application should a disaster arise. This is one of the reasons it’s so important to have easy-to-follow disaster recovery plans. The person who wrote the plan may not be the person performing the disaster recovery.
Finally, consider an automation and orchestration solution for your disaster recovery planning and testing.
Veeam Availability Orchestrator can do everything we talked about in this guide, and then some. From creating full-application disaster recovery in less than 10 minutes to one-click testing and disaster recovery plan execution, Veeam Availability Orchestrator will simplify your VMware disaster recovery planning.
Veeam Availability Orchestrator will also generate all your disaster recovery plan documentation from a unified template dynamically. No more having to manually update a plan when applications are changed.
To learn more about Veeam Availability Orchestrator, download a 30-day free trial.