Availability

Infrastructure

Principles of High Availability

Elimination of single points of failure: add redundancy so that the failure of one component does not bring down the entire system. Apply redundancy at every level: storage, network, filesystem and OS.
Detection of failures as they occur.
Reliable failover.
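The detection and failover principles above can be illustrated with a heartbeat-based controller. This is a minimal sketch under stated assumptions, not the mechanism used by any particular product: the `Node` and `FailoverController` names, the 3-second timeout and the "first healthy standby wins" rule are all hypothetical. The current time is passed in explicitly so the logic can be exercised without real clocks.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical service instance that periodically reports a heartbeat."""
    name: str
    last_heartbeat: float = 0.0  # time (seconds) of the last heartbeat received


class FailoverController:
    """Sketch: detect a failed primary by heartbeat timeout, then promote a standby."""

    def __init__(self, primary: Node, standbys: list[Node], timeout: float = 3.0):
        self.primary = primary
        self.standbys = standbys
        self.timeout = timeout  # assumed: seconds of silence before a node counts as failed

    def is_healthy(self, node: Node, now: float) -> bool:
        # Detection: a node is healthy if it has sent a heartbeat within the timeout.
        return now - node.last_heartbeat <= self.timeout

    def check(self, now: float) -> str:
        """Run one detection cycle and return the name of the active primary."""
        if self.is_healthy(self.primary, now):
            return self.primary.name
        # Failover: the primary missed its heartbeat window, so promote the
        # first healthy standby and demote the failed node to the standby list.
        for i, standby in enumerate(self.standbys):
            if self.is_healthy(standby, now):
                failed = self.primary
                self.primary = self.standbys.pop(i)
                self.standbys.append(failed)
                return self.primary.name
        raise RuntimeError("no healthy standby available: service is down")


primary = Node("db-a", last_heartbeat=100.0)
standby = Node("db-b", last_heartbeat=104.0)
ctl = FailoverController(primary, [standby], timeout=3.0)
print(ctl.check(now=102.0))  # db-a (last heartbeat 2 s ago)
print(ctl.check(now=105.0))  # db-b (db-a silent for 5 s, db-b heard 1 s ago)
```

A real deployment would also need fencing, so that a primary which was only unreachable (rather than down) cannot keep accepting writes after the standby is promoted.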

Deploy across multiple failure groups (e.g., different availability zones or regions)
Monitor all components in availability groups
Plan for automatic failover and test it regularly
Use redundant networking, power, and storage
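Whether replicas are really spread across failure groups can be checked from a placement map. The sketch below assumes a hypothetical mapping of replica name to failure group (for example an availability zone) and reports every group whose loss would leave fewer than `min_remaining` replicas running. For a quorum-based cluster of three, `min_remaining=2` is the natural choice; the replica and zone names are illustrative only.

```python
from collections import Counter


def risky_failure_groups(placement: dict[str, str], min_remaining: int = 1) -> list[str]:
    """Return failure groups whose loss would leave fewer than `min_remaining` replicas.

    `placement` maps each replica to the failure group it runs in.
    An empty result means the service survives the loss of any single group.
    """
    total = len(placement)
    per_group = Counter(placement.values())  # replicas hosted in each failure group
    return sorted(g for g, n in per_group.items() if total - n < min_remaining)


# Both web replicas share one zone: that zone is a single point of failure.
print(risky_failure_groups({"web-1": "zone-a", "web-2": "zone-a"}))  # ['zone-a']

# Three database replicas, one per zone, keep a majority after losing any zone.
print(risky_failure_groups(
    {"db-1": "zone-a", "db-2": "zone-b", "db-3": "zone-c"}, min_remaining=2))  # []
```

This only models one layer of failure groups; shared dependencies that cut across zones (a common storage array or network switch) would need to be added as groups of their own.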

Windows Availability

Oracle Availability

MySQL High Availability

MSSQL Availability

Business Continuity Management involves:

Risk Management

Availability Group - a logical grouping of systems, services or resources designed to work together to provide continuous service, even when one or more components fail (through redundancy and failover).
BAU - Business As Usual
BCI - Business Continuity Institute
BCM - Business Continuity Management
BCMS - Business Continuity Management System
BCP - Business Continuity Plan
BIA - Business Impact Analysis
Business Continuity Exercise - practicing the BCP, e.g. a DR test
DR - Disaster Recovery

Failover - the immediate, automatic switch to a redundant copy when a component fails.
Failure Group - a set of resources that are likely to fail together due to shared dependencies (like shared power, network switch, or storage).
IMT - Incident Management Team
Incident - An event that threatens BAU, e.g. power outages, floods, fires, IT outages, cyber-attacks or pandemics.
Invocation - The act of declaring that the BCP needs to be put into effect
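ISO 22301 - the international standard specifying requirements for a BCMS; organisations can be certified against it.
Redundancy - duplicate components that can take over when one fails. Distribute redundant components across different failure groups so that a single shared dependency cannot take them all down.
RPO - Recovery Point Objective - the maximum acceptable data loss, expressed as the time between the last recoverable copy of the data and the incident.
RTO - Recovery Time Objective - the maximum acceptable time to restore a service after an incident.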
SPOF - Single Point Of Failure
Switchover - manual invocation of a failover (normally allowing for actions to eliminate data loss)
WAR - Work Area Recovery - where BAU activities will relocate during an incident. Sometimes referred to as the War Room.
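The RPO and RTO entries in the glossary can be checked mechanically against a backup schedule and the results of a DR test. A minimal sketch, assuming that every backup completes successfully and that, in the worst case, a failure strikes just before the next backup so one full interval of changes is lost; the function names and the figures in the examples are hypothetical.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss is one full backup interval (assumes no failed backups)."""
    return backup_interval <= rpo


def meets_rto(measured_recovery: timedelta, rto: timedelta) -> bool:
    """Compare the recovery time observed in a DR test with the RTO."""
    return measured_recovery <= rto


# Nightly backups cannot satisfy a 4-hour RPO.
print(meets_rpo(timedelta(hours=24), timedelta(hours=4)))    # False
# Copying changes every 15 minutes satisfies a 1-hour RPO.
print(meets_rpo(timedelta(minutes=15), timedelta(hours=1)))  # True
# A DR test that took 3 hours misses a 2-hour RTO.
print(meets_rto(timedelta(hours=3), timedelta(hours=2)))     # False
```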
