Hurricanes, earthquakes, fires, floods, terrorist attacks, and cyberattacks: unfortunately, any of these incidents could strike your organization at any time, and most are beyond your control. However, you can control how well your organization is equipped to respond. Having a formal Disaster Recovery Plan (DRP) in place helps ensure that you can successfully recover and continue operating in the event of a disaster.
Perhaps you have a DRP but it’s been a while since you’ve given it any thought. The document is only as good as the actual use and accuracy of the information within it. Do you know where to locate your organization’s DRP? How thorough is your plan? When was the plan last updated? Has it been tested recently? Have you taken into account new technologies and services?
Often combined with your business continuity program, IT disaster recovery planning involves the processes, policies, and procedures that define the delivery of critical technical services needed to support business operations in the event of a natural or man-made disaster. Without ongoing, focused attention on the disaster recovery plan beforehand, your organization may be unable to deliver the necessary services during a disaster, resulting in significant business interruption and potential loss of data.
Below is some high-level guidance to consider when creating your organization’s Disaster Recovery Plan.
BUSINESS IMPACT ANALYSIS
In order to adequately create a DRP or assess your existing plan, you should start by performing a risk assessment or Business Impact Analysis (BIA) to accurately identify all IT services that currently support your organization’s critical business activities. A BIA will allow you to properly prioritize business processes and system resources.
You need to identify all critical systems, applications, and key infrastructure components in use, and prioritize them from most to least critical. Include a complete inventory of hardware and software applications in priority order, along with vendor contract information and technical support contact numbers so you can refer to them as needed. Make sure your inventory covers all systems and applications hosted internally as well as those outsourced to third parties. As more organizations move core services to the cloud, outsourced hosting may itself help with disaster recovery. For example, if your institution uses Gmail, most employees would still be able to log in to their accounts even if a disaster wiped out the entire organizational network, because the service is hosted offsite.
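An inventory like this is essentially a small data structure. As a minimal sketch, assuming a Python representation with hypothetical field names (`priority`, `hosting`, `vendor`, `support_contact`) and made-up sample entries, the inventory can be kept sortable by priority and filtered to show which systems are outsourced:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class InventoryItem:
    """One entry in a hypothetical DRP system inventory."""
    name: str
    priority: int          # 1 = most critical
    hosting: str           # "internal" or "outsourced" (assumed labels)
    vendor: str
    support_contact: str   # placeholder contact details


def prioritized(inventory: list[InventoryItem]) -> list[InventoryItem]:
    """Return the inventory ordered from most to least critical."""
    return sorted(inventory, key=lambda item: item.priority)


def outsourced(inventory: list[InventoryItem]) -> list[InventoryItem]:
    """Return only the systems hosted by third parties."""
    return [item for item in inventory if item.hosting == "outsourced"]


# Made-up sample entries for illustration only.
INVENTORY = [
    InventoryItem("Intranet wiki", 3, "internal", "Example Vendor A", "support@vendor-a.example"),
    InventoryItem("ERP system", 1, "internal", "Example Vendor B", "support@vendor-b.example"),
    InventoryItem("Email (Gmail)", 2, "outsourced", "Google", "see vendor contract"),
]

for item in prioritized(INVENTORY):
    print(item.priority, item.name, item.hosting, item.support_contact)
```

Keeping the contact details in the same record as the priority means the list you consult during an outage is already in the order you will need it.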
Determine your tolerance for downtime and develop recovery time objectives (RTOs). How much unplanned downtime can your organization sustain? Many organizations divide systems into three tiers. Tier 1 includes any systems or applications you would need to access immediately, the ones you can’t do business without. Tier 2 includes systems that are essential but not needed right away; you could operate without them for 8 to 10 hours, or even 24 hours. Tier 3 systems are those that can wait a few days to be recovered.
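The tiering rule can be written down as a simple function. This is a sketch under stated assumptions: the cut-offs (under 8 hours for Tier 1, up to 24 hours for Tier 2, longer for Tier 3) are illustrative values read from the tier descriptions above, and `assign_tier` and `group_by_tier` are hypothetical helper names, not part of any standard.

```python
from __future__ import annotations


def assign_tier(max_tolerable_downtime_hours: float) -> int:
    """Map tolerable downtime (hours) to a recovery tier.

    The cut-offs are illustrative assumptions drawn from the tier
    descriptions above; set your own from the BIA.
    """
    if max_tolerable_downtime_hours < 0:
        raise ValueError("downtime cannot be negative")
    if max_tolerable_downtime_hours < 8:
        return 1  # needed (almost) immediately
    if max_tolerable_downtime_hours <= 24:
        return 2  # essential, but can wait up to about a day
    return 3      # can wait several days


def group_by_tier(downtime_hours: dict[str, float]) -> dict[int, list[str]]:
    """Group systems by tier, listing names alphabetically within each tier."""
    tiers: dict[int, list[str]] = {1: [], 2: [], 3: []}
    for system, hours in sorted(downtime_hours.items()):
        tiers[assign_tier(hours)].append(system)
    return tiers


# Hypothetical tolerable-downtime estimates, in hours.
print(group_by_tier({"payroll": 48, "email": 4, "ERP system": 12}))
```

Encoding the thresholds in one place makes it easy to re-tier every system when the BIA changes the tolerances.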
In March of this year, we saw the City of Atlanta crippled by a ransomware attack. Many services were down for a week, leaving residents unable to pay water bills and parking tickets, and police unable to file reports electronically. The final cost of the attack is now estimated at around $17 million. Moving forward, the City is implementing a complete overhaul of all software and systems, and its disaster recovery and business continuity plan will most likely be a high priority. Lesson learned!
Next you should list and categorize all possible threats and their potential impact on various tiers or systems. For example, if a hurricane takes out your hosting center, do you have a transition/failover plan in place? If a ransomware attack compromises your primary network server, can you still access those applications from the fallback server?
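One way to walk through threats systematically is to map each threat to the systems it would affect and flag critical systems that lack a failover plan. The sketch below assumes hypothetical threat names, tier assignments, and a `FAILOVER_PLANS` set; it illustrates the bookkeeping only and is not a formal risk-assessment method.

```python
from __future__ import annotations

# Hypothetical threat scenarios mapped to the systems each would affect.
THREATS: dict[str, set[str]] = {
    "hurricane at hosting center": {"ERP system", "file server"},
    "ransomware on primary server": {"file server"},
}

# Hypothetical tier assignments from the BIA (1 = most critical).
# Every affected system is assumed to appear here.
TIERS: dict[str, int] = {"ERP system": 1, "file server": 2, "email": 1}

# Systems that already have a documented failover plan (assumed).
FAILOVER_PLANS: set[str] = {"ERP system"}


def failover_gaps(threats: dict[str, set[str]], tiers: dict[str, int],
                  failover_plans: set[str], max_tier: int = 1) -> list[tuple[str, str]]:
    """List (threat, system) pairs where a system in tier <= max_tier
    would be affected but has no documented failover plan."""
    gaps = []
    for threat, systems in threats.items():
        for system in systems:
            if tiers[system] <= max_tier and system not in failover_plans:
                gaps.append((threat, system))
    return sorted(gaps)


print(failover_gaps(THREATS, TIERS, FAILOVER_PLANS))              # Tier 1 only
print(failover_gaps(THREATS, TIERS, FAILOVER_PLANS, max_tier=2))  # Tiers 1 and 2
```

Raising `max_tier` widens the check from the most critical systems to the essential ones, which mirrors working through the tiers in priority order.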
It may be helpful to create a list of each possible scenario and then walk through each one. Specifically, you will need to clearly define the steps and equipment needed to bring critical services back online. This information should outline all necessary details including who to contact, where data is stored, where data can be restored, etc. Consider issues such as budgets, availability of resources, human constraints, technological constraints, and regulatory obligations.
Your DRP should include a list of key personnel and their contact information. A common mistake many organizations make is to focus primarily on technology and not enough on people and process. Determine who is responsible for what, and define key roles and responsibilities for your entire staff, from C-level executives on down. The list should record staff availability, identify backup personnel, and include a succession plan in case a key staff member cannot perform their assigned role (e.g., because they are on vacation or on an airplane). It is important for all employees to understand that disaster recovery is not just an IT function, and that they too play a role in protecting the organization’s data and ensuring its ongoing availability.
The role of the individual is especially critical when working with third-party vendors or service providers. All parties involved need to be aware of each other’s responsibilities so the DRP can operate as efficiently as possible. Make sure your contract agreements include emergency procedures and contacts, and define the level of service you can expect in the event of a disaster. Where relevant, the vendor agreement should also specify acceptable response times, who will be involved, and what their roles will be.
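A succession plan amounts to an ordered list of people per role. As a minimal sketch, assuming a hypothetical `ROSTER` in which the primary owner comes first and backups follow in succession order (the names are made up), you can look up who is currently responsible for a role and which roles have no available owner at all:

```python
from __future__ import annotations

# Hypothetical roster: primary owner first, then backups in succession order.
ROSTER: dict[str, list[str]] = {
    "incident commander": ["Alice", "Bob", "Carol"],
    "network recovery": ["Dave", "Erin"],
}


def responsible_person(role: str, unavailable: set[str],
                       roster: dict[str, list[str]] = ROSTER) -> str | None:
    """Return the first available person in the role's succession list, or None."""
    for person in roster.get(role, []):
        if person not in unavailable:
            return person
    return None


def uncovered_roles(unavailable: set[str],
                    roster: dict[str, list[str]] = ROSTER) -> list[str]:
    """Return roles for which everyone in the succession list is unavailable."""
    return sorted(role for role in roster
                  if responsible_person(role, unavailable, roster) is None)


# Example: Alice is on an airplane, Dave and Erin are on vacation.
away = {"Alice", "Dave", "Erin"}
print(responsible_person("incident commander", away))
print(uncovered_roles(away))
```

An empty result from `uncovered_roles` for realistic absence scenarios is a quick sanity check that the succession plan is deep enough.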
Be sure that your DRP includes acceptable practices for handling sensitive information, and define procedures covering the protection, maintenance, and storage of any sensitive data. Comprehensive backup and disaster recovery strategies also support compliance with regulations and standards such as FERPA, HIPAA, and PCI DSS; the HIPAA Security Rule, for example, explicitly requires a data backup plan and a disaster recovery plan.
Beyond the steps and equipment needed to bring critical services back online, your disaster recovery plan should record what data elements are being stored, where that data is stored and replicated, and who to contact. Every step and action to be taken during an incident should be documented in writing to facilitate a successful recovery.
You also need to review physical facilities and identify alternate work areas within the same location, at other organizational locations, or at third-party sites. Assess physical security, staff access procedures, ID badges, etc. at secondary sites just as you would at your primary location. Don’t overlook heating, ventilation, and air conditioning (HVAC) for IT systems, sufficient electrical power, data infrastructure, the distance of the alternate site from the primary site, staffing at the alternate site, the availability of failover (switching to a backup system) and failback (returning to normal operations) technologies to facilitate recovery, and support for legacy systems.
Prepare checklists or recovery plan actions for every disaster or emergency scenario you documented during the threat analysis. While it is impossible to anticipate every scenario, the more you think through in advance, the less likely it is that your teams will have to develop recovery actions on the fly. Over time you can add more scenarios, increasing your chances of having an effective playbook to follow. The more detailed your organization’s disaster recovery plan is, the more likely it is that affected IT assets can be recovered quickly and business can return to normal operations.
In the event of a disaster, how are you going to communicate with your employees? What happens if your e-mail system is down? What if the cell towers are overloaded or down? Do employees know how to gain access to necessary documentation? You may need alternative methods of contacting your employees and keeping them updated throughout the incident. Have these alternate communication processes documented and in place, and reference the necessary procedures in the DRP.
Once your disaster recovery plan has been completed, it is ready to be tested. This is often done through a tabletop exercise, which helps determine whether you can recover and restore IT assets as planned. The plan itself should describe how the environments will be tested, including the methods, frequency, and schedule of tests. Many unknown issues can break a plan, and the only way to find them is to test it when you can afford to fail. Make sure all key stakeholders are included in the test, and document both successes and weaknesses so you can update the plan accordingly. When it comes down to it, you are only as good as your last test.
A big mistake organizations make is failing to update their disaster recovery plan after changes to their environment (e.g., new systems or technologies). If you have recently implemented significant system upgrades or rolled out new technologies, review your DRP and make sure the changes are reflected in your business impact analysis.
Taking the time and resources to audit your disaster recovery plan can also be a valuable exercise to ensure it addresses all relevant controls and people, process, and technology issues. During an IT audit, evidence is collected and evaluated to identify any weaknesses in planning or procedures that could affect your ability to recover from a disaster. An audit can locate areas of the plan that are incomplete, lack suitable procedures or documentation, are untested, do not align with the recovery time objectives from the BIA, or are out of date. It can also verify that all staff understand their individual roles and responsibilities, and identify any gaps there as well.
Creating and annually validating the DRP will require staff resources. To secure executive support for this work, make the case for disaster recovery by analyzing the potential financial losses from an event or breach. Work with your legal and financial departments to document the total losses per day your organization would face if it could not quickly recover its core IT and related business services.
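Two of the audit checks mentioned here, untested systems and recovery times that miss the BIA's objectives, are mechanical enough to automate. This is a minimal sketch assuming hypothetical dictionaries of RTOs and measured recovery times from the most recent test (all figures made up); a real audit covers far more than this.

```python
from __future__ import annotations


def audit_findings(rto_hours: dict[str, float],
                   tested_recovery_hours: dict[str, float]) -> list[str]:
    """Flag systems that were never tested or whose tested recovery
    time exceeded the RTO recorded in the BIA."""
    findings = []
    for system in sorted(rto_hours):
        measured = tested_recovery_hours.get(system)
        if measured is None:
            findings.append(f"{system}: untested")
        elif measured > rto_hours[system]:
            findings.append(
                f"{system}: recovered in {measured} h, RTO is {rto_hours[system]} h"
            )
    return findings


# Hypothetical RTOs from the BIA and recovery times measured in the last test.
RTO_HOURS = {"ERP system": 4, "email": 8, "intranet wiki": 72}
TESTED = {"ERP system": 6, "email": 2}

for finding in audit_findings(RTO_HOURS, TESTED):
    print(finding)
```

Driving the loop from the BIA's RTO list, rather than from the test results, is what surfaces systems that were never tested at all.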
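The per-day loss figure feeds a simple calculation: for each service, multiply its daily loss by the number of days it is expected to be down, then sum. The sketch below uses made-up dollar amounts and outage durations and a hypothetical `estimated_outage_loss` helper; the real inputs should come from your legal and financial departments.

```python
from __future__ import annotations


def estimated_outage_loss(daily_loss: dict[str, float],
                          days_down: dict[str, float]) -> float:
    """Sum (daily loss x days unavailable) across services.

    Services missing from days_down are assumed to stay available.
    """
    return sum(loss * days_down.get(service, 0)
               for service, loss in daily_loss.items())


# Made-up figures for illustration only.
DAILY_LOSS = {"online payments": 50_000, "permit processing": 10_000}
DAYS_DOWN = {"online payments": 5, "permit processing": 7}

print(estimated_outage_loss(DAILY_LOSS, DAYS_DOWN))  # 320000
```

Running the same calculation for several outage lengths gives executives a concrete sense of how quickly losses grow compared with the cost of preparation.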
RESOURCES
National Institute of Standards and Technology (NIST) Special Publication 800-34: Contingency Planning Guide for Federal Information Systems
ISO/IEC 24762: Guidelines for Information and Communications Technology Disaster Recovery Services
Below is some additional guidance from our Security Advisor Team:
[Campbell]: The article above covers most of the major considerations for DRP creation, testing, maintenance, etc., so I don’t have much to add. I will, however, point out that while you can roll your own plan based on these steps and elements, your institution might be better served by selecting a more formal DRP framework and audit plan. A couple of examples are listed under Resources above, and other options with varying levels of formality are available, such as guidance from ISACA. Find one that fits your institution and use it. Bring in internal audit to help the staff who live and breathe the plan evaluate it with fresh eyes, and consider periodically bringing in completely fresh outside eyes to audit your plan and preparedness. DRP is a great example of looking beyond checkbox compliance: when you suffer a disaster, all of that planning, preparation, and training will suddenly not look like busy work, because your team will be as ready as possible to get operations running so that you can safely conduct business.