Enhancement of WAL Redo Performance

Availability

This feature is available since MogDB 5.0.2.

Introduction

This feature improves the WAL redo performance of MogDB.

Benefits

In primary/standby deployments, WAL redo performance is improved. In a TPC-C scenario with 1000 warehouses, more than 100 concurrent requests, and one primary and one standby database server, WAL redo performance improves by 50% and the RTO is shortened by one third.
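The two figures are consistent with each other if failover time is dominated by replaying a fixed WAL backlog. That is an assumption made here, not something the benchmark states. A minimal sketch of the arithmetic, using a hypothetical backlog size and baseline rate chosen only to show the ratio:

```python
from fractions import Fraction


def redo_time(backlog, rate):
    """Time needed to replay a fixed WAL backlog at a constant redo rate."""
    return Fraction(backlog) / Fraction(rate)


# Hypothetical numbers; only the ratio between the two rates matters.
backlog = 6000                                   # units of WAL to replay
baseline_rate = 100                              # units replayed per second
improved_rate = baseline_rate * Fraction(3, 2)   # redo is 50% faster

before = redo_time(backlog, baseline_rate)
after = redo_time(backlog, improved_rate)

# Fraction of the replay time that is saved by the faster redo rate.
reduction = 1 - after / before
print(reduction)
```

Replaying the same backlog 1.5 times faster takes two thirds of the original time, which matches a one-third reduction in replay time.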

Description

In a primary/standby deployment, the standby database obtains WAL records from the primary database and redoes them to keep its data synchronized with the primary database. When the primary database can no longer provide services, the standby database takes over. Before it is promoted to primary and starts providing services, the standby database must finish redoing all WAL records sent by the primary database.

When redo on the standby database is slow, a failover takes a long time, as does a switchover during a primary/standby switchover drill. On the one hand, the database cannot provide services for a long time, so user services are interrupted for a long time. On the other hand, the standby database lags further behind the primary database, so WAL files accumulate on the standby database and occupy disk space.

MogDB provides a parallel redo mechanism that allows multiple threads to work simultaneously during redo. This feature optimizes table-level parallel redo and provides redo performance views to query the redo status. Specific optimization points are as follows:

Hand over WAL records from the startup thread in larger batches to reduce the performance degradation caused by passing WAL records between threads.
Modify the table distribution policy to distribute redo tasks more evenly across worker threads.
Observation view: You can view the time spent and WAL redo status in each stage of the redo process.
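The first two points can be illustrated with a small sketch. It is not MogDB's dispatch algorithm: the function name, the record shape `(table, lsn)`, and the greedy "least-loaded worker" rule are all assumptions made for illustration. The sketch keeps every table on one worker so that per-table order is preserved, assigns each new table to the worker with the least load so far, and hands records over in batches instead of one at a time. It omits timeout-driven flushing of partially filled batches.

```python
def dispatch(records, num_workers, batch_size):
    """Distribute (table, lsn) records to workers in batches.

    Returns (queues, load, handoffs), where queues[w] is the list of
    batches handed to worker w, load[w] is the number of records it
    received, and handoffs counts batch hand-over operations.
    """
    owner = {}                                   # table -> worker index
    load = [0] * num_workers                     # records per worker so far
    pending = [[] for _ in range(num_workers)]   # buffered, not yet handed over
    queues = [[] for _ in range(num_workers)]    # batches already handed over
    handoffs = 0

    for table, lsn in records:
        # A table stays on one worker so its records are redone in order.
        if table not in owner:
            owner[table] = min(range(num_workers), key=lambda w: load[w])
        w = owner[table]
        load[w] += 1
        pending[w].append((table, lsn))

        # Hand over a full batch in one operation.
        if len(pending[w]) >= batch_size:
            queues[w].append(pending[w])
            pending[w] = []
            handoffs += 1

    # Flush whatever is still buffered at the end of the input.
    for w in range(num_workers):
        if pending[w]:
            queues[w].append(pending[w])
            pending[w] = []
            handoffs += 1

    return queues, load, handoffs


# Hypothetical workload: 20 records spread over 4 tables.
records = [(f"t{i % 4}", i) for i in range(20)]
queues, load, handoffs = dispatch(records, num_workers=2, batch_size=5)
_, _, unbatched_handoffs = dispatch(records, num_workers=2, batch_size=1)
```

With this workload each worker receives the same number of records, and batching cuts the number of hand-over operations compared with handing over one record at a time.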

Parameters:

No.	Parameter Description
1	enable_batch_dispatch: specifies whether to enable "batch optimization + load balancing optimization".
2	enable_time_report: specifies whether to collect the statistics required by redo_time_detail().
3	parallel_recovery_batch: specifies the number of WAL records temporarily stored in the startup thread during page-level parallel recovery.
4	parallel_recovery_timeout: specifies how long the startup thread waits, when no WAL records have been distributed, before it distributes the WAL records it has temporarily stored during page-level parallel recovery.
5	parallel_recovery_dispatch_algorithm: specifies the distribution algorithm used by the startup thread during page-level parallel recovery.

Functions:

No.	Function Description
1	redo_stat_detail(): queries the speed at which a standby database has received, flushed, and applied WAL records over a recent time period. It shows the WAL processing capability of the standby database at a glance.
2	redo_time_detail(): provides data for analyzing redo problems.
3	dispatch_stat_detail(): queries the redo load of each worker thread to determine whether the load is balanced across worker threads.
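One simple way to judge balance from per-worker load figures is to compare the busiest worker with the average. The sketch below is an assumption-based illustration: the columns returned by dispatch_stat_detail() are not listed here, so the input is a plain list of hypothetical per-worker record counts, and the helper name `load_imbalance` is invented for this example.

```python
def load_imbalance(per_worker_counts):
    """Return the busiest worker's load divided by the mean load.

    A value of 1.0 means the load is perfectly even; larger values mean
    one worker carries a disproportionate share of the redo work.
    """
    total = sum(per_worker_counts)
    if total == 0:
        return 1.0  # no work dispatched, so nothing is out of balance
    mean = total / len(per_worker_counts)
    return max(per_worker_counts) / mean


# Hypothetical per-worker record counts, not real output.
even = load_imbalance([250, 250, 250, 250])
skewed = load_imbalance([700, 100, 100, 100])
```

A ratio close to 1.0 suggests the work is spread evenly, while a ratio well above 1.0 suggests that a few worker threads are handling most of the redo work.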
