Enhancement of Dirty Pages Flushing Performance

Availability

This feature is available since MogDB 5.0.8.

Introduction

In MogDB's incremental checkpoint mode (the default dirty page flushing mode), a large number of dirty pages can accumulate when the database is under heavy write pressure, with the following consequences:

  1. Long checkpoint duration
  2. Long switchover duration
  3. Long downtime, etc.

MogDB 5.0.8 supports the ultimate dirty page flushing feature, which is enabled by setting the parameter extreme_flush_dirty_page = on. If the operations listed above are slow on the current system, enable this parameter to speed up flushing under heavy write pressure. Upper-level operations then respond faster, which shortens checkpoints, switchovers, and restarts and lowers the RTO.

New GUC Parameters

extreme_flush_dirty_page

Parameter Description: Whether to enable the ultimate dirty page flushing mode (enabling it can speed up flushing, but increases write amplification)

This parameter is of the POSTMASTER type.

Value Range: Boolean

Default Value: off

Note: Before turning on this parameter, make sure that slow flushing on the current system is not caused by limited I/O capacity. Monitoring tools such as iostat and Node-exporter can be used to confirm that there is no disk I/O bottleneck. For shared storage services, also make sure that the I/O capacity limit of the service has not been reached.

checkpoint_target_time

Parameter Description: The desired maximum duration of a checkpoint. A smaller value makes flushing faster and shortens the actual checkpoint duration, but increases write amplification. If I/O becomes a bottleneck, a very low value may affect the business workload. The upper-level operations affected are shutdown (stop), switchover (primary-standby switch), and manual execution of the checkpoint statement.

This parameter is of the POSTMASTER type.

Value Range: 5 - 60s

Default Value: 30s
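
As a minimal sketch of how the two parameters fit together, the hypothetical Python helper below (render_flush_settings is not part of MogDB) renders the two settings as configuration lines and rejects a checkpoint_target_time outside the documented range of 5 to 60 seconds. It assumes the lines are placed in the instance's postgresql.conf and that, because both parameters are of the POSTMASTER type, the instance is restarted for them to take effect.

```python
# Hypothetical helper, not part of MogDB: renders configuration lines for the
# two GUC parameters described above and rejects out-of-range values.


def render_flush_settings(enable: bool, target_time_s: int = 30) -> str:
    """Return config lines for extreme_flush_dirty_page and checkpoint_target_time."""
    # The documented value range for checkpoint_target_time is 5 to 60 seconds.
    if not 5 <= target_time_s <= 60:
        raise ValueError(
            f"checkpoint_target_time must be within 5..60 s, got {target_time_s}"
        )
    lines = [
        f"extreme_flush_dirty_page = {'on' if enable else 'off'}",
        f"checkpoint_target_time = {target_time_s}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Aggressive setting used in the performance test: target time of 5 seconds.
    print(render_flush_settings(True, 5))
```

Validating the range before writing the configuration avoids a failed restart caused by an invalid value.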

New Function

local_pagewriter_flush_detail()

Description: Displays detailed information about the flushing process, including GUC parameters related to flushing, variable information in the flushing process, etc. When the system's flushing is slow, calling this function can help analyze the bottleneck.

Permissions: Any user can call it.

Return Values:

Column Name                    Description
node_name                      Node name
pagewriter_sleep(ms)           Flushing cycle duration
max_io_capacity(M)             Maximum I/O capacity
dirty_page_percent_max         Maximum dirty page ratio
candidate_buf_percent_target   Target ratio of candidate buffers
max_redo_log_size(M)           Maximum redo log size
main_pagewriter_detail         Main pagewriter details: start time, waiting time, flush time
sub_pagewriter_detail          Sub-pagewriter details: ID (sub-pagewriter number), wait_cost (waiting time of the last flush cycle), flush_cost (actual flush time of the last flush cycle)
theoritical_max_io             Theoretical maximum = theoretical maximum for 'scanning buffer to candidate queue' + theoretical maximum for flushing from the dirty page queue
lsn_percent                    LSN ratio
actual_max_io                  Actual maximum = actual maximum for 'scanning buffer to candidate queue' + actual maximum for flushing from the dirty page queue
actual_flush_num               Actual flush value = actual value for 'scanning buffer to candidate queue' + actual value for flushing from the dirty page queue
remain_actual_dirty_page_num   Actual number of remaining dirty pages
list_flush_detail              Details for scanning buffer to candidate queue: current candidate buffer count, total buffer count
queue_flush_detail             Details for flushing from the dirty page queue: dirty percent
forecast                       Forecast: current speed, estimated time for the current checkpoint to complete
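
The following sketch shows one way to read these columns from a client program. It assumes a Python DB-API 2.0 driver that can connect to MogDB and assumes that the function can be queried as a row source with SELECT * FROM; the helper name fetch_flush_detail is hypothetical. Column names are taken from the cursor metadata rather than hard-coded, so the sketch does not depend on the exact column list.

```python
from typing import Any, Dict, List

# Assumption: the function can be used as a row source in a FROM clause.
QUERY = "SELECT * FROM local_pagewriter_flush_detail();"


def fetch_flush_detail(cursor) -> List[Dict[str, Any]]:
    """Run the function on an open DB-API cursor and return one dict per row."""
    cursor.execute(QUERY)
    # cursor.description holds one sequence per column; item 0 is the column name.
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]
```

With a real connection, a call such as fetch_flush_detail(conn.cursor()) would return rows keyed by the column names listed above, for example row["forecast"].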

Constraints

  • Enabling the ultimate dirty page flushing mode increases write amplification. If I/O is already a bottleneck, enabling it brings no significant improvement and may lower tpmC. Enable the mode only when machine I/O is not the bottleneck of the current system.

Performance Improvement

After the flush optimization is enabled, the checkpoint duration and the switchover time during SwitchOver improve by more than 47%, with no significant loss in average tpmC.

  • The average SwitchOver RTO decreases by between 47% and 67.5%.

    With the optimization disabled, the average is 41.55 seconds. With it enabled, the average drops to 13.5 seconds when checkpoint_target_time=5 and to 22 seconds when checkpoint_target_time=30.

  • The average checkpoint duration during SwitchOver decreases by between 49% and 73%.

    With the optimization disabled, the average is 38.68 seconds. With it enabled, the average drops to 10.42 seconds when checkpoint_target_time=5 and to 19.67 seconds when checkpoint_target_time=30.

  • The average tpmC with the flush optimization enabled is close to the average tpmC without it.
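
The percentages above follow directly from the reported averages. As a quick check, the short Python sketch below recomputes each reduction as (baseline - improved) / baseline; the function and variable names are illustrative only, and the numbers are the ones quoted in this section.

```python
def reduction_pct(baseline_s: float, improved_s: float) -> float:
    """Percentage reduction from baseline_s to improved_s."""
    return (baseline_s - improved_s) / baseline_s * 100.0


# Averages in seconds: (baseline, {checkpoint_target_time: average when enabled}).
measurements = {
    "SwitchOver RTO": (41.55, {5: 13.5, 30: 22.0}),
    "Checkpoint during SwitchOver": (38.68, {5: 10.42, 30: 19.67}),
}


def summarize(data):
    """Return (metric, checkpoint_target_time, reduction in %) tuples."""
    rows = []
    for name, (baseline, by_target) in data.items():
        for target, value in by_target.items():
            rows.append((name, target, round(reduction_pct(baseline, value), 1)))
    return rows


if __name__ == "__main__":
    for name, target, pct in summarize(measurements):
        print(f"{name}, checkpoint_target_time={target}: -{pct}%")
```

The smaller reduction in each pair corresponds to checkpoint_target_time=30 and the larger one to checkpoint_target_time=5.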

TPC-C and hardware configuration:

  1. TPC-C: 3000 warehouses, 500/600 terminals, 10-minute run
  2. Hardware configuration: ARM, 48 CPUs, 200 GB memory, 3 TB disk (RAID 0, 2 NVMe SSDs)

