High availability (HA) is the ability of a system to keep running without interruption. It is a measure of system availability and one of the criteria for designing a system. An HA system can stay in service longer than any of its individual components. HA is typically achieved by improving the fault tolerance of a system.
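The claim that a redundant system outlasts its components can be made concrete with a little arithmetic. The following is a minimal sketch, assuming that replicas fail independently and that the system is up as long as at least one replica is up. The function name `combined_availability` and the 99% figure are illustrative assumptions, not MogDB values.

```python
def combined_availability(component_availability: float, replicas: int) -> float:
    """Availability of a system that is up while at least one replica is up.

    Assumes replica failures are independent, which real deployments
    only approximate.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("at least one replica is required")
    # The system is down only if every replica is down at the same time.
    return 1.0 - (1.0 - component_availability) ** replicas


if __name__ == "__main__":
    # Compare a single instance with two and three redundant instances.
    for n in (1, 2, 3):
        print(n, combined_availability(0.99, n))
```

Under these assumptions, each added replica multiplies the probability of a full outage by the unavailability of a single component, which is why fault tolerance is the usual route to HA.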
In the database field, HA means that when the primary database fails, a manual or automatic primary/standby switchover can be performed, which shortens database downtime and reduces the impact on services. This applies to deployments with one primary database and one standby database, and to deployments with one primary database and multiple standby databases.
The primary/standby architecture is mainly used in database scenarios, such as a primary/standby MogDB or MySQL database. It also applies to stateful application services. Primary/standby refers to a primary database instance and a standby database instance, each with its own independent data files. The two instances synchronize data through a data operation log. The primary database instance allows read and write operations, whereas the standby database instance allows only read operations. By replaying the operation log as soon as it arrives, the standby database instance keeps the delay of its data view within a specific range (usually the latest data transaction).
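The relationship between the two instances can be sketched with a toy model. This is a hypothetical illustration, not MogDB code: the class `Instance`, its dictionary-based "data files", and the `replay` method are all assumptions made for the example.

```python
class Instance:
    """Toy database instance with its own data store (hypothetical model)."""

    def __init__(self, role: str) -> None:
        self.role = role      # "primary" or "standby"
        self.data = {}        # independent data files, modeled as a dict
        self.log = []         # operation log produced by a primary
        self.applied = 0      # number of log records already replayed

    def write(self, key: str, value: str) -> None:
        # Only the primary accepts writes; each write is also logged.
        if self.role != "primary":
            raise PermissionError("standby instances are read-only")
        self.data[key] = value
        self.log.append((key, value))

    def read(self, key: str):
        # Both roles accept reads.
        return self.data.get(key)

    def replay(self, log: list) -> None:
        """Apply the log records that have not been replayed yet."""
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)


if __name__ == "__main__":
    primary = Instance("primary")
    standby = Instance("standby")
    primary.write("account", "100")
    print(standby.read("account"))   # the standby has not replayed the log yet
    standby.replay(primary.log)
    print(standby.read("account"))   # the standby view has caught up
```

The gap between a write on the primary and the matching `replay` call on the standby corresponds to the data view delay described above.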
Databases support two kinds of replication: physical replication (streaming replication) and logical replication.
- Physical replication uses redo logs, which record changes to data blocks. It can be used to copy a primary database instance and obtain an identical standby database instance. Physical replication can be synchronous or asynchronous. It aims to keep the data files of the primary and standby database instances consistent, thereby protecting data to the greatest extent.
- Logical replication transfers logical operation logs. The data state of the primary and standby database instances is consistent, but the data files stored on disk are different.
Compared with physical replication, logical replication is more flexible. However, in specific situations it is more likely than physical replication to result in data inconsistency.
In asynchronous streaming replication, a transaction committed on the primary database instance does not wait to be received by the standby database instance. A successful response is returned as soon as the transaction is written to the WAL. If the primary database instance breaks down, transactions committed on it may not have been sent to the standby database instance, which causes data loss. The amount of data missing from the standby database instance depends on the WAL replication delay: the greater the delay, the more data is lost.
In synchronous streaming replication, when a transaction is committed on the primary database instance, a successful response is returned only after the standby database instance has received the WAL and the primary database instance has received its confirmation. This ensures data integrity but increases the transaction response time. Therefore, the throughput of synchronous streaming replication is lower than that of asynchronous streaming replication.
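The difference between the two commit paths can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not the MogDB replication protocol: the classes `Primary` and `Standby`, the `unsent` queue, and the helper `lost_on_crash` are hypothetical names, and network transfer is modeled as a direct method call.

```python
class Standby:
    """Toy standby that stores the WAL records it receives."""

    def __init__(self) -> None:
        self.wal = []

    def receive(self, record: str) -> bool:
        self.wal.append(record)
        return True  # acknowledgement sent back to the primary


class Primary:
    """Toy primary that commits in synchronous or asynchronous mode."""

    def __init__(self, standby: Standby, synchronous: bool) -> None:
        self.standby = standby
        self.synchronous = synchronous
        self.wal = []
        self.unsent = []  # records written locally but not yet shipped

    def commit(self, record: str) -> bool:
        self.wal.append(record)  # the local WAL write always happens first
        if self.synchronous:
            # Report success only after the standby acknowledges the record.
            return self.standby.receive(record)
        # Asynchronous: report success now and ship the record later.
        self.unsent.append(record)
        return True

    def ship_pending(self) -> None:
        # Background shipping of records that were committed asynchronously.
        while self.unsent:
            self.standby.receive(self.unsent.pop(0))


def lost_on_crash(primary: Primary) -> list:
    """Records the standby would be missing if the primary failed right now."""
    return [r for r in primary.wal if r not in primary.standby.wal]


if __name__ == "__main__":
    async_primary = Primary(Standby(), synchronous=False)
    sync_primary = Primary(Standby(), synchronous=True)
    for txn in ("t1", "t2", "t3"):
        async_primary.commit(txn)
        sync_primary.commit(txn)
    print("async, lost on crash:", lost_on_crash(async_primary))
    print("sync, lost on crash:", lost_on_crash(sync_primary))
```

In this model, the records still sitting in `unsent` stand in for the WAL replication delay: the longer that queue is at the moment of failure, the more committed transactions the standby is missing.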
Switchover means that a standby server is switched to the primary role as a planned operation, typically for maintenance. Data loss can be prevented during a switchover.
Failover means that a standby server is switched to the primary role after the original primary server fails.
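The contrast between the two role changes can be sketched as follows. This is a hypothetical model rather than a MogDB procedure: the functions `switchover` and `failover` are illustrative names, WAL is modeled as a list of records, and the sketch assumes asynchronous replication in which the WAL on the standby is a prefix of the WAL on the primary.

```python
def switchover(primary_wal: list, standby_wal: list) -> tuple:
    """Planned role change: ship the remaining WAL, then promote the standby.

    Returns the WAL of the new primary and the list of lost records.
    """
    # The old primary is still reachable, so the standby can catch up first.
    caught_up = standby_wal + primary_wal[len(standby_wal):]
    return caught_up, []


def failover(primary_wal: list, standby_wal: list) -> tuple:
    """Unplanned role change: the old primary is gone, promote the standby as is.

    Returns the WAL of the new primary and the list of lost records.
    """
    # Records that never reached the standby cannot be recovered here.
    lost = primary_wal[len(standby_wal):]
    return list(standby_wal), lost


if __name__ == "__main__":
    primary_wal = ["t1", "t2", "t3"]
    standby_wal = ["t1"]  # the standby is lagging behind by two records
    print("switchover:", switchover(primary_wal, standby_wal))
    print("failover:", failover(primary_wal, standby_wal))
```

Under synchronous replication the standby would already hold every acknowledged record, so the lost list in `failover` would be empty for acknowledged transactions in this model.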