Elimination of single points of failure. This means adding redundancy so that the failure of one component does not cause the failure of the entire system. Redundancy should be built in at every possible level: storage, network, filesystem and OS.
Detection of failures as they occur.
Reliable failover.
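A rough worked example of why redundancy helps. This is a minimal sketch that assumes component failures are independent, which often does not hold when components share power, network or storage. The function names are illustrative only.

```python
from typing import Iterable


def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components, any one of which can serve.

    Assumes failures are independent (an idealisation).
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("at least one copy is required")
    # The group is down only when every copy is down at the same time.
    return 1.0 - (1.0 - component_availability) ** copies


def serial_availability(availabilities: Iterable[float]) -> float:
    """Availability of a chain in which every component is a single point of failure."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result
```

Under these assumptions, two redundant copies of a 99% available component give roughly 99.99% availability, whereas two such components chained with no redundancy give roughly 98%.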
Deploy across multiple failure groups (e.g., different availability zones or regions)
Monitor all components in availability groups
Plan for automatic failover and test regularly
Use redundant networking, power, and storage
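The practices above can be sketched in code. This is a simplified illustration, not a production design: `Node`, `AvailabilityGroup`, the `is_healthy` probe and the `failure_threshold` value are all hypothetical names and assumptions. It monitors the primary, fails over automatically after repeated failed probes, and prefers a standby in a different failure group.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    name: str
    failure_group: str              # e.g. an availability zone (assumed label)
    is_healthy: Callable[[], bool]  # hypothetical health probe


class AvailabilityGroup:
    def __init__(self, nodes: List[Node], failure_threshold: int = 3):
        if not nodes:
            raise ValueError("an availability group needs at least one node")
        self.nodes = list(nodes)
        self.primary = self.nodes[0]
        self.failure_threshold = failure_threshold
        self._consecutive_failures = 0

    def check_once(self) -> Node:
        """Probe the primary once and return the node that is primary afterwards."""
        if self.primary.is_healthy():
            self._consecutive_failures = 0
            return self.primary
        self._consecutive_failures += 1
        # Require several failed probes so a single blip does not cause a failover.
        if self._consecutive_failures >= self.failure_threshold:
            self._failover()
        return self.primary

    def _failover(self) -> None:
        candidates = [
            n for n in self.nodes if n is not self.primary and n.is_healthy()
        ]
        if not candidates:
            raise RuntimeError("no healthy standby available")
        # Standbys in a different failure group sort first (False < True).
        candidates.sort(key=lambda n: n.failure_group == self.primary.failure_group)
        self.primary = candidates[0]
        self._consecutive_failures = 0
```

Calling `check_once()` on a schedule, and deliberately failing the probe during a test, is one way to exercise the failover path regularly rather than discovering problems during a real incident.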
Business continuity refers to the planning and preparation undertaken to ensure critical functions and operations can continue or be quickly resumed in the event of disruption. This involves a set of processes, strategies, and practices aimed at minimising the impact of disruptive incidents and maintaining the overall resilience of the business.
Business Continuity Management involves:
Planning
Recovery
Management
BAU
Risk Management
Resilience
Procedures
Availability Group - a logical grouping of systems, services, or resources designed to work together to provide continuous service, even when one or more components fail. See Redundancy and Failover.
BAU - Business As Usual
BCI - Business Continuity Institute
BCM - Business Continuity Management
BCMS - Business Continuity Management System
BCP - Business Continuity Plan
BIA - Business Impact Analysis
Business Continuity Exercise - practising the BCP, e.g. a DR test.
DR - Disaster Recovery
Failover - the immediate, automatic switch to a redundant copy when a component fails.
Failure Group - a set of resources that are likely to fail together due to shared dependencies (like shared power, network switch, or storage).
IMT - Incident Management Team
Incident - an event that threatens BAU. Examples include power outages, floods, fires, IT outages, cyber-attacks and pandemics.
Invocation - The act of declaring that the BCP needs to be put into effect
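ISO 22301 - the international standard for Business Continuity Management Systems (BCMS), against which organisations can be certified.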
IT DR Management - the ongoing process of planning, developing, testing and implementing disaster recovery procedures, processes and associated solutions. Where applicable, these processes and procedures ensure that IT DR solutions are tested regularly and that plans are available in the event of a DR invocation.
IT Service Continuity - See IT DR Management.
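Redundancy - the duplication of components so that the failure of one does not cause the failure of the service. Redundant components should be distributed across different failure groups.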
RPO - Recovery Point Objective
RTO - Recovery Time Objective
SPOF - Single Point Of Failure
Switchover - a manually invoked failover, normally planned so that steps can be taken beforehand to avoid data loss.
WAR - Work Area Recovery - the location to which BAU activities relocate during an incident. Sometimes referred to as the War Room.
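As an illustration of how RPO and RTO are typically used when reviewing a business continuity exercise, the sketch below assumes their usual meanings: RPO is the maximum tolerable data loss, expressed as a period of time before the incident, and RTO is the maximum tolerable time to restore service. `ExerciseResult` and `meets_objectives` are hypothetical names, and the timestamps in the usage example are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


@dataclass
class ExerciseResult:
    last_good_backup: datetime   # most recent recoverable copy of the data
    incident_start: datetime     # when BAU was disrupted
    service_restored: datetime   # when service was available again


def meets_objectives(
    result: ExerciseResult, rpo: timedelta, rto: timedelta
) -> Dict[str, bool]:
    """Compare the measured data-loss window and downtime with the RPO and RTO."""
    data_loss_window = result.incident_start - result.last_good_backup
    downtime = result.service_restored - result.incident_start
    return {
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


# Illustrative DR test: backup at 02:00, incident at 02:45, restored at 05:00.
example = ExerciseResult(
    last_good_backup=datetime(2024, 1, 1, 2, 0),
    incident_start=datetime(2024, 1, 1, 2, 45),
    service_restored=datetime(2024, 1, 1, 5, 0),
)
outcome = meets_objectives(example, rpo=timedelta(hours=1), rto=timedelta(hours=2))
```

In this made-up case the 45-minute data-loss window is within a one-hour RPO, but the 2 hours 15 minutes of downtime exceeds a two-hour RTO, so the exercise would flag the recovery procedure for improvement.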