
SQL Running Status Observation

Availability

This feature is available since MogDB 3.0.0.

Introduction

By default, the sampling thread takes one sample per second; the interval is controlled by the GUC parameter asp_sample_interval. Up to 100000 rows of sampled data are kept in memory, as controlled by asp_sample_num; when that upper limit is reached, the in-memory data is flushed to disk.
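
As a quick check, the current values of both parameters can be viewed with standard GUC commands; this is only a minimal illustration, and how a changed value takes effect depends on each parameter's level in your deployment.

    MogDB=# show asp_sample_interval;
    MogDB=# show asp_sample_num;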

Benefits

When a SQL statement shows a performance bottleneck but the execution of its individual operators cannot otherwise be queried from the sampled views, plan_node_id lets you locate the performance problem at the operator level.

Operator: the specific action taken at each step of executing the SQL statement (for example, SELECT, SUM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT).

Description

A new column plan_node_id is added to dbe_perf.local_active_session and GS_ASP to record the execution of each operator of the SQL statement.

The existing monitoring level is defined by the GUC parameter resource_track_level, which has three possible values:

  • none: Resources are not monitored.
  • query: Resources used at the query level are monitored.
  • operator: Resources used at query and operator levels are monitored.

Therefore, the individual operators of a SQL statement are sampled only when resource_track_level is set to operator.
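
For example, operator-level sampling can be enabled for the current session before running the scenario below; this is a minimal sketch, and in production the parameter is usually configured for the whole instance rather than per session.

    MogDB=# set resource_track_level = operator;
    MogDB=# show resource_track_level;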

MogDB starts a background sampling worker thread when the database starts. To avoid wasting resources, this thread does not sample continuously; instead, once per sampling period it takes a snapshot of the running instance and stores it in memory. The real-time samples can be queried through dbe_perf.local_active_session. The sampling period is defined by the GUC parameter asp_sample_interval and defaults to 1s. Each time 100000 rows (controlled by the GUC parameter asp_sample_num) have accumulated in memory, MogDB flushes them to the GS_ASP table for historical queries. Note that a statement's running information is captured by the sampling thread only if its execution time is longer than the sampling interval.
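
Both the real-time and the historical samples can be queried directly. The sketch below assumes the columns sample_time and query_id exist in both views alongside the new plan_node_id column; the exact column set may vary by version.

    -- Real-time samples still held in memory
    select sample_time, query_id, plan_node_id
      from dbe_perf.local_active_session
     order by sample_time desc limit 5;

    -- Historical samples already flushed to disk
    select sample_time, query_id, plan_node_id
      from gs_asp
     order by sample_time desc limit 5;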

Scenarios

  1. Create the table test in session1 and perform the insert operation.

    MogDB=# create table test(c1 int);
    CREATE TABLE
    MogDB=# insert into test select generate_series(1, 1000000000);
  2. In session2, look up the query_id of the SQL statement from the active session view pg_stat_activity.

    MogDB=# select query,query_id from pg_stat_activity where query like 'insert into test select%';
                               query                           |    query_id
    -----------------------------------------------------------+-----------------
     insert into test select generate_series(1, 1000000000);   | 562949953421368
    (1 row)
  3. In session2, use the query_id to query the statement's execution plan, including plan_node_id, from the active job management view dbe_perf.statement_complex_runtime. (The statement's execution cost must exceed the GUC parameter resource_track_cost to be recorded in the view; its default value is 100000 and it can be changed at session level, so for easier testing you can set it to 10.)

    MogDB=# set resource_track_cost = 10;

    MogDB=# select query_plan from dbe_perf.statement_complex_runtime where queryid = 562949953421368;
                                     query_plan
    ----------------------------------------------------------------------------
     Coordinator Name: datanode1                                               +
     1 | Insert on test  (cost=0.00..17.51 rows=1000 width=8)                         +
     2 |  ->  Subquery Scan on "*SELECT*"  (cost=0.00..17.51 rows=1000 width=8)    +
     3 |   ->  Result  (cost=0.00..5.01 rows=1000 width=0)                         +
                                                                            +
    (1 row)
  4. In session2, query the samples of the statement from the sampling view dbe_perf.local_active_session by query_id, and analyze performance together with the execution plan obtained in the previous step.

    MogDB=# select plan_node_id, count(plan_node_id) from dbe_perf.local_active_session where query_id = 562949953421368 group by plan_node_id;
     plan_node_id | count
    --------------+-------
                3 |    12
                1 |   366
                2 |     2
    (3 rows)
  5. In session2, once the in-memory data reaches the upper limit (controlled by the GUC parameter asp_sample_num), the existing in-memory samples are flushed to the gs_asp table; after the flush, the samples of the statement can be queried from gs_asp.

    MogDB=# select plan_node_id, count(plan_node_id) from gs_asp where query_id = 562949953421368 group by plan_node_id;
     plan_node_id | count
    --------------+-------
                3 |    19
                1 |   582
                2 |     3
    (3 rows)

Conclusion

When a performance bottleneck is found in insert into test select generate_series(1, 1000000000), the steps above show that the Insert operator accounts for the largest number of samples over the whole SQL execution (plan_node_id = 1, count = 366), so it is the operation to optimize.
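
The per-operator breakdown used above can also be written as a single ranking query. The sketch below is only an example built from the columns shown in this scenario, with the query_id taken from step 2; replace it with your own query_id.

    select plan_node_id,
           count(*) as samples,
           round(100.0 * count(*) / sum(count(*)) over (), 1) as pct
      from dbe_perf.local_active_session
     where query_id = 562949953421368
     group by plan_node_id
     order by samples desc;

The operator with the largest share of samples (here the Insert node, plan_node_id = 1) is the first place to look when optimizing.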

Related pages: GS_ASP, LOCAL_ACTIVE_SESSION
