
Usage Guide

Prerequisites

  • You have obtained training data.
  • If you use the provided tool to collect training data, enable the WDR function first. The parameters involved are track_stmt_stat_level and log_min_duration_statement; for details, see the following sections.
  • To ensure prediction accuracy, the historical statement logs provided by users should be as comprehensive and representative as possible.

Collecting SQL Statements

This tool requires users to prepare data in advance, with one sample per line. The training data format is as follows:

SQL,EXECUTION_TIME

The prediction data format is as follows:

SQL

SQL indicates the text of an SQL statement, and EXECUTION_TIME indicates the execution time of the SQL statement. For details about the sample data, see train.csv and predict.csv in sample_data.
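As an illustration, the following Python sketch parses the training format described above. The SQL statements in it are hypothetical examples, not taken from the bundled sample_data; since an SQL text may itself contain commas, the sketch splits each line on the last comma:

```python
def load_training_samples(text):
    """Parse sqldiag-style training data: one 'SQL,EXECUTION_TIME' sample per line.

    The SQL text may itself contain commas, so split on the LAST comma,
    which separates the statement from its execution time.
    """
    samples = []
    for line in text.strip().splitlines():
        sql, _, exec_time = line.rpartition(",")
        if not sql:
            raise ValueError("missing execution time: %r" % line)
        samples.append((sql, float(exec_time)))
    return samples

# Hypothetical samples in the train.csv format described above.
data = ("SELECT count(*) FROM t1,0.25\n"
        "UPDATE t1 SET c1 = 0 WHERE id = 1,0.02")
print(load_training_samples(data))
# -> [('SELECT count(*) FROM t1', 0.25), ('UPDATE t1 SET c1 = 0 WHERE id = 1', 0.02)]
```

The prediction format is the same minus the trailing execution time, i.e. one bare SQL statement per line.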

You can collect training data in the required format yourself. The tool also provides the load_sql_from_wdr.py script for automatic collection, which obtains SQL information from WDR reports. The parameters involved are log_min_duration_statement and track_stmt_stat_level:

  • log_min_duration_statement specifies the slow SQL threshold, in milliseconds. If the value is 0, all statements are collected.
  • track_stmt_stat_level specifies the information capture level. You are advised to set it to 'L0,L0'.
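For reference, the two parameters above might be set with the gs_guc tool; this is a sketch, and the data directory path is a placeholder you must replace with your own:

```shell
# Sketch: enable full SQL collection for the WDR-based collector.
# Replace /path/to/datanode with your instance's actual data directory.
gs_guc reload -D /path/to/datanode -c "log_min_duration_statement = 0"
gs_guc reload -D /path/to/datanode -c "track_stmt_stat_level = 'L0,L0'"
```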

After these parameters are set, a certain amount of system resources is occupied, but usage is generally low. In continuous high-concurrency scenarios, this may cause a performance loss of less than 5%; if database concurrency is low, the performance loss is negligible. The following script is stored in the sqldiag root directory ($GAUSSHOME/bin/components/sqldiag).

Use the script to obtain the training set:
load_sql_from_wdr.py [-h] --port PORT --start_time START_TIME
                            --finish_time FINISH_TIME [--save_path SAVE_PATH]
Example:
    python load_sql_from_wdr.py --start_time "2021-04-25 00:00:00" --finish_time "2021-04-26 14:00:00" --port 5432  --save_path ./data.csv

Procedure

  1. Provide historical logs for model training.

  2. Perform training and prediction.

    Template-based training and prediction:
       gs_dbmind component sqldiag [train, predict] -f FILE --model template --model-path template_model_path 
    DNN-based training and prediction:
       gs_dbmind component sqldiag [train, predict] -f FILE --model dnn --model-path dnn_model_path

Examples

Use the provided test data to perform template-based training:

gs_dbmind component sqldiag train -f ./sample_data/train.csv --model template --model-path ./template 

Use the provided test data for template-based prediction:

gs_dbmind component sqldiag predict -f ./sample_data/predict.csv --model template --model-path ./template --predicted-file ./result/t_result

Use the provided test data to update the template-based model:

gs_dbmind component sqldiag finetune -f ./sample_data/train.csv --model template --model-path ./template 

Use the provided test data to perform DNN-based training:

gs_dbmind component sqldiag train -f ./sample_data/train.csv --model dnn --model-path ./dnn_model 

Use the provided test data for DNN-based prediction:

gs_dbmind component sqldiag predict -f ./sample_data/predict.csv --model dnn --model-path ./dnn_model --predicted-file 

Use the provided test data to update the DNN-based model:

gs_dbmind component sqldiag finetune -f ./sample_data/train.csv --model dnn --model-path ./dnn_model
Copyright © 2011-2024 www.enmotech.com All rights reserved.