Ecological Tools
Doc Menu

Optimizing Database Parameters

Optimizing Database Memory Parameters

The performance of complex query statements strongly depends on the configuration parameters of the database memory. The database memory parameters include the control parameters for logical memory management and parameters determining whether execution operators are spilled to disks.

Parameter for Logical Memory Management

max_process_memory is a parameter used for logical memory management.

Use the following formula to calculate the available memory for job execution:

max_process_memory – Shared memory (including shared_buffers) – cstore_buffers

Therefore, the memory available to job execution depends on shared_buffers and cstore_buffers.

Views for logical memory management are provided to display the used memory and peak information in each database block. You can connect to a database node and run pg_total_memory_detail to query information about the memory usage on this database node. Alternatively, you can connect to the primary node of the database and run pgxc_total_memory_detail to query information about the memory usage on all the database nodes.

When the specified physical memory is insufficient, work_mem determines whether to write additional operator calculation data into temporary tables based on query characteristics and concurrency. This reduces performance by five to 10 times and prolongs the query response time from seconds to minutes.

  • For complex serial queries, each query requires five to ten associated operations. Set work_mem using the following formula:work_mem = 50% of the memory/10.
  • For simple serial queries, each query requires two to five associated operations. Set work_mem using the following formula:work_mem = 50% of the memory/5.
  • For concurrent queries, set work_mem using the following formula:work_mem = work_mem for serial queries/Number of concurrent SQL statements.

Parameter Determining Whether to Spill Execution Operators to Disks

work_mem sets the used memory threshold. Execution operators that can be spilled to disks will be written when the used memory exceeds the threshold. Such execution operators include Hash(VecHashJoin), Agg(VecAgg), Sort(VecSort), Material(VecMaterial), SetOp(VecSetOp), and WindowAgg(VecWindowAgg). They can be vectorized or non-vectorized. This parameter ensures concurrent throughput and the performance of a single query job. Therefore, you need to optimize the parameter based on the output of Explain Performance.

Optimizing Concurrent Queue Parameters

You can globally or locally control concurrent queues.

Global Concurrent Queues

In a global concurrent queue, max_active_statements controls the number of concurrent jobs on the primary node of the database. All common users' jobs are controlled, regardless of their complexity. When the number of concurrent jobs reaches the specified threshold, the rest of the jobs wait in a queue. Administrators' jobs are not under such control.

Set this parameter based on system capacities, such as memory and I/O usage. In a resource pool associated with common users, if the jobs of different priorities occupy different portions, they will be queued by priority first. Then, the jobs of the same priority are queued. Jobs in the queue of highest priority will be woken up first.

img NOTE:

  • In a high transactional concurrency scenario, you are advised to set max_active_statements to -1, indicating that global concurrency is not limited.
  • In an analytical query scenario, set max_active_statements to the number of CPU cores divided by the number of database nodes. Generally, its value ranges from 5 to 8.

Local Concurrent Queues

You can locally control the number of concurrent jobs within the same resource pool on the primary node of the database. The number of concurrent complex jobs are controlled based on their cost.

parctl_min_cost is used to determine whether a job is complex.