Documentation: v2.1

Usage Guide

Prerequisites

  • Ensure that training data is available.
  • If training data is collected with the provided tools, the WDR function must be enabled; the parameters involved, track_stmt_stat_level and log_min_duration_statement, are described in the following sections.
  • To ensure prediction accuracy, the historical statement logs provided should be as comprehensive and representative as possible.
  • The Python 3.6+ environment and dependencies have been configured as required.

Environment Configuration

This function requires Python 3.6+ to run. The required third-party dependency packages are listed in the requirements.txt file and can be installed via pip, for example:

pip install -r requirements.txt
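Before installing dependencies, it can help to confirm that the interpreter meets the 3.6+ requirement. A minimal check (assuming the same interpreter will run the tool):

```python
import sys

# The tool requires Python 3.6 or later; fail fast with a clear message otherwise.
if sys.version_info < (3, 6):
    raise RuntimeError(
        "Python 3.6+ is required, found %d.%d" % sys.version_info[:2]
    )
print("Python version OK:", sys.version.split()[0])
```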

Collecting SQL Statements

This tool requires the user to prepare the data in advance. The training data must be in the following format, with one sample per line:

SQL,EXECUTION_TIME

The format of the data to be predicted is as follows:

SQL

SQL denotes the text of the SQL statement and EXECUTION_TIME denotes its execution time. Sample data are shown in train.csv and predict.csv under sample_data.
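Because the SQL text itself may contain commas, a reader of this format should split each training line on the last comma only. A small illustrative sketch (the parsing helper below is not part of the tool, and the sample rows are made up):

```python
def parse_train_line(line):
    """Split a 'SQL,EXECUTION_TIME' line on the LAST comma,
    since the SQL text may itself contain commas."""
    sql, execution_time = line.rstrip("\n").rsplit(",", 1)
    return sql, float(execution_time)

# Hypothetical sample rows in the training-data format.
lines = [
    "SELECT id, name FROM t WHERE id = 1,0.52",
    "UPDATE t SET name = 'x' WHERE id = 2,1.30",
]
samples = [parse_train_line(l) for l in lines]
print(samples[0])  # first parsed (sql_text, execution_time) pair
```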

Users can collect training data by themselves in the required format. The tool also provides a script for automatic collection (load_sql_from_wdr.py), which obtains SQL information from WDR reports and involves the parameters log_min_duration_statement and track_stmt_stat_level.

  • log_min_duration_statement sets the slow-SQL threshold in milliseconds; if set to 0, all statements are collected.
  • track_stmt_stat_level sets the level of information capture; the recommended setting is track_stmt_stat_level='L0,L0'.

Enabling this capture consumes some system resources, but the usage is generally low. Sustained high-concurrency scenarios may incur a performance loss of less than 5%; when database concurrency is low, the loss is negligible.

# Use the script to get the training data:
load_sql_from_wdr.py [-h] --port PORT --start_time START_TIME
                            --finish_time FINISH_TIME [--save_path SAVE_PATH]
# For example:
    python load_sql_from_wdr.py --start_time "2021-04-25 00:00:00" --finish_time "2021-04-26 14:00:00" --port 5432  --save_path ./data.csv
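The --start_time and --finish_time arguments use the "YYYY-MM-DD HH:MM:SS" format shown above. If the collection is scripted, the time window can be built with standard datetime formatting; the 38-hour window below simply reproduces the example values:

```python
from datetime import datetime, timedelta

# Build the collection window used in the example invocation above.
finish = datetime(2021, 4, 26, 14, 0, 0)
start = finish - timedelta(hours=38)

fmt = "%Y-%m-%d %H:%M:%S"  # format expected by --start_time / --finish_time
print(start.strftime(fmt))   # 2021-04-25 00:00:00
print(finish.strftime(fmt))  # 2021-04-26 14:00:00
```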

Procedure

  1. Provide historical logs for model training.

  2. Perform training and prediction.

    # Training and prediction based on template:
        python main.py [train, predict] -f FILE --model template --model-path template_model_path
    # Training and prediction based on DNN:
        python main.py [train, predict] -f FILE --model dnn --model-path dnn_model_path

Examples

In the root directory of this tool, run the following commands to achieve the corresponding functions.

Use the provided test data for template training:

python main.py train -f ./sample_data/train.csv --model template --model-path ./template

Use the provided test data to make templated predictions:

python main.py predict -f ./sample_data/predict.csv --model template --model-path ./template --predicted-file ./result/t_result

Use the provided test data to update the templated model:

python main.py finetune -f ./sample_data/train.csv --model template --model-path ./template

Use the provided test data for DNN training:

python main.py train -f ./sample_data/train.csv --model dnn --model-path ./dnn_model

Use the provided test data to make DNN predictions:

python main.py predict -f ./sample_data/predict.csv --model dnn --model-path ./dnn_model --predicted-file ./result/d_result

Use the provided test data to update the DNN model:

python main.py finetune -f ./sample_data/train.csv --model dnn --model-path ./dnn_model
Copyright © 2011-2024 www.enmotech.com All rights reserved.