Business Continuity and Disaster Recovery in a business context
In a world and society characterised by a high degree of digitalisation, companies face ever higher demands to deliver services to customers quickly, flexibly and with high performance and a good user experience (UX). These requirements often conflict with the goals of availability, resilience and reliability. Nevertheless, the failure of a service leaves customers disappointed or drives them away, which affects the company’s finances or reputation: because competitors offer comparable services, customers can switch quickly if the service is available elsewhere.
It is therefore of great importance to be able to provide services reliably, which falls under the area of Business Continuity Management (BCM):
• Within the framework of BCM, business continuity strategies are developed to help companies ensure the continuity of their business activities
• Normally, this is the responsibility of a Business Continuity Manager outside the specialist departments
Within this BCM framework, business continuity measures include the following:
PDCA Cycle (Deming wheel) — serves to establish continuous improvements in process management for companies. Within BCM, appropriate participants in BC and DR measures can then be defined.
• Plan — People involved in the process of planning
• Do — Implementing agents
• Check — Review by other departments
• Act — Constant operational implementation by specialist department
Business Continuity Plan (BCP)
• Outlines advance preparations to ensure that critical business functions can continue.
• Procedures for all phases of recovery to ensure fast, effective execution of recovery strategies for critical business functions
• The objective of the BCP is to coordinate the recovery of critical business functions, and to manage and support business recovery, in the event of an interruption caused by an emergency or crisis.
- This covers short-term or long-term disturbances due to human or technical failure.
Business Impact Analysis (BIA)
• Predicting and analysing the consequences of interruptions to business processes, i.e. to the IT service provided
• Collecting information needed to develop recovery strategies.
• Calculation of financial and reputational damage.
• Determination of MAD/MTD/MTPD (see below)
• Enables determination of RPO and RTO (see below)
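The link between financial damage and tolerable downtime can be illustrated with a minimal sketch. It assumes a deliberately simple linear loss model (a constant loss per hour of outage) and a loss threshold set by management; the class `ServiceImpact`, the function `estimate_mtd_hours` and all figures are hypothetical and only serve as an illustration, not as a prescribed BIA method.

```python
from dataclasses import dataclass


@dataclass
class ServiceImpact:
    """Hypothetical BIA input for one IT service."""
    name: str
    loss_per_hour: float   # assumed financial loss per hour of outage
    tolerable_loss: float  # assumed loss the business is willing to absorb


def estimate_mtd_hours(impact: ServiceImpact) -> float:
    """Hours of outage until the cumulative loss reaches the tolerable loss.

    Assumes the loss grows linearly with downtime, which is a simplification.
    """
    if impact.loss_per_hour <= 0:
        raise ValueError("loss_per_hour must be positive")
    return impact.tolerable_loss / impact.loss_per_hour


# Illustrative values only.
webshop = ServiceImpact(name="webshop", loss_per_hour=5_000.0, tolerable_loss=40_000.0)
mtd = estimate_mtd_hours(webshop)
print(f"{webshop.name}: estimated MTD of {mtd:.1f} hours")
```

In practice the loss curve is rarely linear, and reputational damage is hard to quantify, so such a figure is a starting point for discussion rather than a result.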
General risk assessment approach based on business continuity threats of the BCM institute:
• Loss of facilities (office)
• Loss of facilities (data centres)
• Loss of technology and essential equipment
• Loss of key personnel
• Loss of services provided by (sub)suppliers
• These threats are the basis for the analysis of each application, each platform and each service
• Companies determine the acceptance, avoidance, transfer or mitigation of business continuity risks in a risk assessment
The BIA, however, considers all the components of a business process that are necessary to provide an IT service over its entire life cycle. The basis for this is provided by ITIL and its ITSCM process:
Information Technology Infrastructure Library (ITIL)
• Defines processes, functions and roles for the corporate IT infrastructure
• Defines 26 core processes to which the IT infrastructure is subject (ITIL version 3)
• Measurability of the processes is always a prerequisite
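• ISO/IEC 20000 (first edition 2005) serves as the certification standard for organisations whose IT service management is aligned with ITIL
• Since ITIL 4, the framework has shifted to a Service Value System focused on value creation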
IT Service Continuity Management (ITSCM)
• ITIL process that addresses the risks that can have a serious impact on IT services
• ITSCM must be set up to complement the management systems for business continuity and information security
• Used to assess the risks of process failure: financial, reputational, compliance and strategic
• In the case of business continuity, these are determined via the BIA
Since business processes combine different sub-processes, people, front-end and back-end components etc., they are usually grouped into clusters in order to derive appropriate measures:
Business Continuity Response Cluster
• Bundling applications into a response cluster derived from the risks to business continuity in order to be able to react appropriately.
• Based on related applications within a business area
- These applications then include all the components and processes required to deliver the application.
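As a minimal sketch, such a bundling can be thought of as grouping an application inventory by business area. The application names, the business areas and the helper `build_response_clusters` below are hypothetical; a real inventory would also carry the components and processes each application depends on.

```python
from collections import defaultdict

# Hypothetical inventory: (application, business area).
applications = [
    ("order-frontend", "sales"),
    ("payment-gateway", "sales"),
    ("payroll", "hr"),
    ("time-tracking", "hr"),
    ("warehouse-db", "logistics"),
]


def build_response_clusters(apps):
    """Group related applications into one response cluster per business area."""
    clusters = defaultdict(list)
    for name, area in apps:
        clusters[area].append(name)
    return dict(clusters)


clusters = build_response_clusters(applications)
for area, members in clusters.items():
    print(f"{area}: {', '.join(members)}")
```

Grouping by business area is only one possible criterion; clusters could equally be formed by shared platform or by the continuity threat they are exposed to.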
Disaster Recovery (DR) focuses on failures caused by long-term disruptions or disasters, such as
• Fires, explosions, chemical accidents
• Floods, earthquakes
• Terrorism
• Extended power supply interruptions
Disaster Recovery focuses on the recovery of technological equipment and platforms or other required technological infrastructure in the event of a permanent loss of equipment and/or technology, including data.
Terms used in the context of BCDR measures
Business continuity and disaster recovery measures are often considered together.
• Business continuity ensures that services continue to operate during or after incidents
• Disaster Recovery defines how services are restored after a complete failure
RTO = Recovery Time Objective
The recovery time objective is the time span after an incident, such as
• Computer, platform or server failure
• Application crash
• Network disruption
within which a product or service must be resumed before unacceptable consequences occur.
This includes restoration of
• Resources
• Applications
• Services
• Normal operation
RPO = Recovery Point Objective
The recovery point objective is the point in time to which information used by an activity must be restored so that business activities can function when resumed.
• This defines parameters for the intervals at which backups are performed for recovery
• The RPO value indicates the maximum period of time for which data may be lost and depends on the sources from which the transaction data can be obtained.
• This time is relatively short if the backups can be retrieved directly for recovery, but it becomes longer if there are media breaks in the backups (different backup media). The recovered data is only as current as the last backup.
Example:
• If backups are only made every 24 hours, the achievable RPO is 24 hours.
• Data that has changed during this period may then be lost
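The arithmetic behind this example can be sketched as follows. The function names `data_loss_window` and `meets_rpo` and the timestamps are hypothetical; the sketch assumes that everything changed since the last successful backup is lost and that, in the worst case, an incident occurs just before the next backup would have run.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, incident: datetime) -> timedelta:
    """Time span of changes lost when restoring from the last backup."""
    if incident < last_backup:
        raise ValueError("incident must not precede the last backup")
    return incident - last_backup


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst case: the loss equals one full backup interval."""
    return backup_interval <= rpo


# Illustrative values: nightly backup at 02:00, incident at 23:30 the same day.
last_backup = datetime(2024, 1, 1, 2, 0)
incident = datetime(2024, 1, 1, 23, 30)
lost = data_loss_window(last_backup, incident)
print(f"Changes from the last {lost} are lost")

# A 24-hour backup interval cannot satisfy a 4-hour RPO.
print(meets_rpo(timedelta(hours=24), timedelta(hours=4)))
```

If the required RPO is shorter than the backup interval, either the interval has to shrink or another source for the transaction data is needed.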
MTD (Maximum Tolerable Downtime) and MAD (Maximum Allowable Downtime) are to be understood in the same way and are often used synonymously with MTPD (Maximum Tolerable Period of Disruption). They indicate how long a company can tolerate the loss of a service; this is determined by an appropriate BIA (Business Impact Analysis).
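Since the MTD marks the point at which an outage becomes intolerable, a planned RTO should not exceed it. A minimal consistency check could look like the sketch below; the service names, the values in hours and the helper `rto_violations` are hypothetical.

```python
# Hypothetical per-service objectives in hours (illustrative values only).
services = {
    "webshop": {"rto": 4, "mtd": 8},
    "reporting": {"rto": 48, "mtd": 24},
    "email": {"rto": 12, "mtd": 12},
}


def rto_violations(objectives):
    """Return the names of services whose RTO exceeds their MTD."""
    return sorted(
        name
        for name, values in objectives.items()
        if values["rto"] > values["mtd"]
    )


violations = rto_violations(services)
print(violations)
```

A service flagged here needs either a faster recovery strategy or a renewed BIA that confirms a longer tolerable downtime.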
The framework for business continuity and disaster recovery is therefore comprehensive and affects many areas of the business. For the entire IT infrastructure and all business processes, it is very important that BCDR measures are planned and tracked from the very beginning. For this purpose, appropriate positions must be created within a company, as this cannot be done by the specialist departments themselves. The role of the BC Manager, supported by business analysts, should be filled accordingly. Above all, the BIA and the risk assessment are a management concern and are decisive for the continuation of business activities. If services can no longer be offered to customers, this has a direct impact on the company’s reputation and spreads quickly on social media. When planning the IT infrastructure, business continuity should therefore follow immediately after the information security measures. Only when these components can be guaranteed can the company concentrate on the actual design of the service. So BCDR belongs to the basic equipment, like an umbrella, rubber boots and a mackintosh when it rains: you don’t want to get wet, do you?