SQL Server AlwaysOn is a marketing term for the HA and DR solutions introduced when SQL Server 2012 was launched. Specifically:
AlwaysOn Failover Clustering Instances (AlwaysOn FCI) - new in SQL2012
AlwaysOn Availability Groups (AlwaysOn AG) - new in SQL2012
AlwaysOn Basic Availability Groups (AlwaysOn BAG) - new in SQL2016
AlwaysOn Distributed Availability Groups (AlwaysOn DAG) - new in SQL2016
Does not need shared disk
Is available in SE since SQL2016
Only two replicas (primary and secondary)
AlwaysOn BAG provides failover support for a single database only, replacing database mirroring which is deprecated.
Not 100% uptime. Failing over involves a short outage.
VMware does not support VMware vSphere vMotion® or DRS on clustered SQL Server virtual machines within a shared disk architecture (such as AlwaysOn Failover Cluster Instances). This restriction does not apply to AlwaysOn Availability Groups that are built on a non-shared disk architecture. VMware fully supports using vSphere HA, vSphere vMotion, and DRS with AlwaysOn Availability Groups. (1)
"The HA choice comes down to where you want your HA to be managed. VMware HA pushes the high availability question out of the SQL Server realm and into the VMware infrastructure. More than anything else, this is a business decision – just be sure you’re happy with the decision of which team is managing your uptime." (2)
A primary replica and multiple secondary replicas form an availability group. Each replica is hosted on a different failure group (like different servers or data centers) to avoid simultaneous failures.
Always On AGs don’t eliminate disconnects — they make them short enough that good applications ride straight through them. (ChatGPT)
A few seconds of downtime is unavoidable at the SQL Server engine level.
During AG failover:
The primary replica changes
The Listener IP moves
Existing TCP sessions are terminated
In-flight transactions are rolled back
This results in a short (maybe 5 to 30 seconds) outage.
The outage can be shortened by overiding the conservative WSFC failover sensitivity (using PowerShell)...
(Get-Cluster).SameSubnetDelay
(Get-Cluster).SameSubnetThreshold
Start by setting...
SameSubnetDelay = 1000 # ms
SameSubnetThreshold = 5
Tune carefully to avoid false positivesZero Transaction Loss is achievable with application-side retry logic.
Use:
Short connection timeouts
Automatic retry on transient errors
Idempotent transactions where possible
Typical retry pattern:
1–3 retries
1–5 second backoff
Total retry window: ~15 seconds
Errors to retry on:
4060 (Cannot open database)
10053, 10054, 10060 (network errors)
64 (network name deleted)
233 (connection initialization error)
ADO.NET
MultiSubnetFailover=True;
Connect Timeout=5;
JDBC
jdbc:sqlserver://MyAgListener.company.local:1433;
databaseName=MyDatabase;
applicationName=MyApp;
integratedSecurity=false;
user=myuser;
password=mypassword;
multiSubnetFailover=true;
loginTimeout=5;
connectRetryCount=3;
connectRetryInterval=5;
For each server licensed with SQL Server 2017 and covered by active SA, customers can run up to the same number of passive failover instances in a separate, OSE to support failover events. A passive SQL Server instance is one that is not serving SQL Server data to clients or running active SQL Server workloads.
http://download.microsoft.com/download/7/8/c/78cdf005-97c1-4129-926b-ce4a6fe92cf5/sql_server_2017_licensing_guide.pdfFor each on-prem server OSE licensed with SQL Server 2019 and covered by active SA, customers can use the following passive replicas in anticipation of a failover event:
One passive fail-over replica for High Availability in a separate OSE
One passive fail-over replica for Disaster Recovery in a separate OSE
One passive fail-over replica for Disaster Recovery in a single VM on Azure
VMWare Stretch Cluster
AlwaysOn Availability Groups
Pros
Storage manages storage replication - bi-directional
Pros
Readable
Backup
CheckDB
Corruption Repair
Query (needs license)
Flows Primary to Secondary - resync can be more difficult