Configuring LLVM

Low Level Virtual Machine (LLVM) dynamic compilation can be used to generate customized machine code for each query to replace the original generic functions. Query performance is improved by reducing redundant condition checks and virtual function calls, and by improving data locality during query execution.

LLVM needs extra time to pre-generate intermediate representation (IR) and compile it into machine code. Therefore, if the data volume is small or the query itself takes little time, performance may deteriorate.
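The trade-off above can be made concrete with a break-even calculation. The following is a minimal sketch, not MogDB's actual cost model: it assumes a fixed one-off compilation overhead and constant per-row costs expressed in hypothetical integer cost units, and the function names are illustrative only.

```python
def break_even_rows(compile_overhead, interp_cost_per_row, codegen_cost_per_row):
    """Smallest row count for which compiling first is strictly cheaper.

    compiled total    = compile_overhead + rows * codegen_cost_per_row
    interpreted total = rows * interp_cost_per_row
    Compiled code wins when rows * (interp - codegen) > compile_overhead.
    All costs are hypothetical integer units.
    """
    saving = interp_cost_per_row - codegen_cost_per_row
    if saving <= 0:
        raise ValueError("generated code must be cheaper per row to ever pay off")
    return compile_overhead // saving + 1


def codegen_wins(rows, compile_overhead, interp_cost_per_row, codegen_cost_per_row):
    """True if compiling and then running generated code is strictly cheaper."""
    return compile_overhead + rows * codegen_cost_per_row < rows * interp_cost_per_row


# Hypothetical numbers: 50,000 units to compile, 20 units/row interpreted,
# 10 units/row with generated code.
print(break_even_rows(50_000, 20, 10))  # 5001
```

With these assumed numbers, a query touching only a few hundred rows pays the full compilation overhead and ends up slower, which is the small-data case described above.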

LLVM Application Scenarios and Restrictions

Application Scenarios

Expressions supporting LLVM

Query statements that contain the following expressions support LLVM optimization:
1. Case…when…
2. IN
3. Bool (AND/OR/NOT)
4. BooleanTest (IS_NOT_KNOWN/IS_UNKNOWN/IS_TRUE/IS_NOT_TRUE/IS_FALSE/IS_NOT_FALSE)
5. NullTest (IS_NOT_NULL/IS_NULL)
6. Operator
7. Function (lpad, substring, btrim, rtrim, length)
8. Nullif
Supported data types for expression computing are bool, tinyint, smallint, int, bigint, float4, float8, numeric, date, time, timetz, timestamp, timestamptz, interval, bpchar, varchar, text, and oid.

LLVM dynamic compilation and optimization applies to these expressions only when they appear in the following positions in a vectorized executor: filter in the Scan node; complicated hash condition, hash join filter, and hash join target in the Hash Join node; filter and join filter in the Nested Loop node; merge join filter and merge join target in the Merge Join node; and filter in the Group node.

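The expression restrictions above can be read as a checklist. The sketch below encodes the listed expression kinds, functions, data types, and plan positions as plain data and checks one expression against them. The string labels and the function `expression_eligible` are hypothetical and are not identifiers used by MogDB; passing the check is necessary but not sufficient, because the cost threshold still decides whether code is actually generated.

```python
# Hypothetical encoding of the support lists in this section; the labels are
# illustrative strings, not identifiers used by MogDB itself.
SUPPORTED_EXPRESSIONS = {
    "case_when", "in", "bool", "boolean_test",
    "null_test", "operator", "function", "nullif",
}
SUPPORTED_FUNCTIONS = {"lpad", "substring", "btrim", "rtrim", "length"}
SUPPORTED_TYPES = {
    "bool", "tinyint", "smallint", "int", "bigint", "float4", "float8",
    "numeric", "date", "time", "timetz", "timestamp", "timestamptz",
    "interval", "bpchar", "varchar", "text", "oid",
}
# (plan node, expression position) pairs in a vectorized plan.
SUPPORTED_POSITIONS = {
    ("Scan", "filter"),
    ("Hash Join", "complicated hash condition"),
    ("Hash Join", "hash join filter"),
    ("Hash Join", "hash join target"),
    ("Nested Loop", "filter"),
    ("Nested Loop", "join filter"),
    ("Merge Join", "merge join filter"),
    ("Merge Join", "merge join target"),
    ("Group", "filter"),
}


def expression_eligible(kind, data_type, node, position,
                        vectorized, function_name=None):
    """Return True if the expression meets every listed LLVM precondition."""
    if not vectorized:
        # Non-vectorized execution paths cannot be code-generated.
        return False
    if (node, position) not in SUPPORTED_POSITIONS:
        return False
    if kind not in SUPPORTED_EXPRESSIONS:
        return False
    if kind == "function" and function_name not in SUPPORTED_FUNCTIONS:
        return False
    return data_type in SUPPORTED_TYPES


print(expression_eligible("function", "text", "Scan", "filter", True, "substring"))  # True
```

For example, the same substring call in a non-vectorized plan, or a call to a function outside the five listed ones, would fail the check.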
Operators supporting LLVM
1. Join: HashJoin
2. Agg: HashAgg
3. Sort
HashJoin supports only hash inner joins, and the corresponding hash condition supports comparisons of int4, bigint, and bpchar. HashAgg supports sum and avg operations on the bigint and numeric data types. Group By supports int4, bigint, bpchar, text, varchar, and timestamp, as well as the count(*) aggregation operation. Sort supports only comparisons of the int4, bigint, numeric, bpchar, text, and varchar data types. LLVM dynamic compilation and optimization cannot be used for any operations other than those listed above. You can use EXPLAIN PERFORMANCE to check whether LLVM dynamic compilation and optimization is applied to a query.
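The operator restrictions can likewise be expressed as simple predicates. This is an illustrative sketch only: the function names and the way aggregates are represented as `(function, argument type)` pairs are assumptions made for the example, and the authoritative check remains the EXPLAIN PERFORMANCE output.

```python
# Hypothetical encoding of the operator restrictions described above.
HASH_JOIN_KEY_TYPES = {"int4", "bigint", "bpchar"}
SUM_AVG_TYPES = {"bigint", "numeric"}
GROUP_BY_KEY_TYPES = {"int4", "bigint", "bpchar", "text", "varchar", "timestamp"}
SORT_KEY_TYPES = {"int4", "bigint", "numeric", "bpchar", "text", "varchar"}


def hash_join_eligible(join_type, key_types):
    """Only hash inner joins whose hash-condition keys use supported types."""
    return (join_type == "inner" and bool(key_types)
            and all(t in HASH_JOIN_KEY_TYPES for t in key_types))


def hash_agg_eligible(aggregates, group_key_types):
    """aggregates: list of (function, argument type); count(*) is ("count", "*")."""
    for func, arg_type in aggregates:
        if func == "count" and arg_type == "*":
            continue
        if func in ("sum", "avg") and arg_type in SUM_AVG_TYPES:
            continue
        return False  # any other aggregate disables code generation
    return all(t in GROUP_BY_KEY_TYPES for t in group_key_types)


def sort_eligible(key_types):
    """Sort keys must all use a supported comparison type."""
    return bool(key_types) and all(t in SORT_KEY_TYPES for t in key_types)


print(hash_agg_eligible([("sum", "bigint"), ("count", "*")], ["varchar"]))  # True
```

A left join, an avg over float8, or a sort on a date column would each return False under these rules.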

Non-applicable Scenarios

Tables with a small amount of data cannot be dynamically compiled.
Code cannot be generated for query jobs whose execution path is not vectorized.

Other Factors Affecting LLVM Performance

The LLVM optimization effect depends on not only operations and computing in the database, but also the selected hardware environment.

Number of C functions called by expressions

CodeGen cannot always be applied to every part of an expression: some subexpressions use CodeGen while others invoke the original C code for calculation. The more subexpressions in an expression that invoke the original C code, the more likely it is that LLVM dynamic compilation and optimization will reduce calculation performance instead of improving it. By setting log_min_messages to DEBUG1, you can view the expressions that directly invoke C code.

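To see why mixed expressions matter, it helps to count how much of an expression tree falls back to C code. The sketch below is purely illustrative: the `Expr` class, the `codegen_supported` flag (which the caller supplies), and the fallback ratio are hypothetical constructs, not something MogDB computes; in practice the fallback expressions are found through the DEBUG1 log output described above.

```python
from dataclasses import dataclass, field


@dataclass
class Expr:
    """Hypothetical expression-tree node used only for this illustration."""
    name: str
    codegen_supported: bool
    children: list = field(default_factory=list)  # list of Expr


def fallback_nodes(expr):
    """Names of nodes (pre-order) that would invoke the original C code."""
    found = [] if expr.codegen_supported else [expr.name]
    for child in expr.children:
        found.extend(fallback_nodes(child))
    return found


def count_nodes(expr):
    """Total number of nodes in the tree."""
    return 1 + sum(count_nodes(child) for child in expr.children)


def fallback_ratio(expr):
    """Share of nodes that fall back to C; higher means less LLVM benefit."""
    return len(fallback_nodes(expr)) / count_nodes(expr)


# AND(a > 1, my_udf(b), lpad(c, ...)) where my_udf is a hypothetical user function.
tree = Expr("AND", True, [
    Expr(">", True),
    Expr("my_udf", False),
    Expr("lpad", True),
])
print(fallback_nodes(tree), fallback_ratio(tree))  # ['my_udf'] 0.25
```

The ratio is only a rough proxy: one expensive fallback function called per row can outweigh several cheap generated operators.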
Memory resources

One of the key LLVM features is data locality: data should be kept in registers as much as possible, while data loading is reduced at the same time. Therefore, when LLVM optimization is used, work_mem must be set large enough for the code generated by LLVM to process data in memory. Otherwise, performance deteriorates.

Optimizer cost estimation

The LLVM feature implements a simple cost estimation model, which determines whether to use LLVM dynamic compilation and optimization for the current node based on the tables involved in the node's computation. If the optimizer underestimates or overestimates the actual number of rows involved, the expected benefit may not be obtained.
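The effect of row misestimation can be illustrated by separating the decision (made on the estimate) from the outcome (determined by the actual row count). This is a simplified sketch that assumes a single hypothetical break-even row count; the real model compares an estimated cost against codegen_cost_threshold rather than comparing rows directly.

```python
def classify(estimated_rows, actual_rows, break_even_rows):
    """Classify a codegen decision; break_even_rows is a hypothetical threshold.

    break_even_rows is the smallest row count at which generated code is
    assumed to be cheaper than the original C code, compilation included.
    """
    chose_codegen = estimated_rows >= break_even_rows  # decided on the estimate
    pays_off = actual_rows >= break_even_rows          # judged on reality
    if chose_codegen and pays_off:
        return "benefit"
    if chose_codegen:
        return "loss"      # overestimate: compiled for too few rows
    if pays_off:
        return "missed"    # underestimate: skipped a profitable compile
    return "skipped"       # correctly avoided compiling


print(classify(100_000, 200, 5001))  # loss
```

Only the "loss" case makes a query slower than it would be without LLVM; the "missed" case leaves performance where it was.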

Recommended Suggestions for LLVM

Currently, LLVM is enabled by default in the database kernel, and users can configure it as needed. The overall suggestions are as follows:

Set work_mem to an appropriate value, as large as conditions allow. If a large amount of data is spilled to disk, you are advised to disable LLVM dynamic compilation and optimization by setting enable_codegen to off.
Set codegen_cost_threshold to an appropriate value (the default value is 10000) so that LLVM dynamic compilation and optimization is not used when the data volume is small. If database performance still deteriorates because of LLVM dynamic compilation and optimization after codegen_cost_threshold is set, you are advised to increase the parameter value.
If a large number of C functions are called, you are advised not to use LLVM dynamic compilation and optimization.
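The suggestions above can be summarized as a small decision helper that emits session-level SQL. This is a hedged sketch: it assumes enable_codegen and codegen_cost_threshold can be changed with SET in the current session, the function name and its boolean inputs are hypothetical, and doubling the threshold is an arbitrary illustrative step rather than an official tuning rule.

```python
def recommend_settings(spills_to_disk, many_c_function_calls,
                       regressed_at_threshold, current_threshold=10000):
    """Return hypothetical SET statements following the suggestions above."""
    if spills_to_disk or many_c_function_calls:
        # Heavy spilling or many C-function fallbacks: turn LLVM off.
        return ["SET enable_codegen = off;"]
    if regressed_at_threshold:
        # Still slower with LLVM: raise the threshold. Doubling is an
        # arbitrary illustrative step, not an official rule.
        return [f"SET codegen_cost_threshold = {current_threshold * 2};"]
    return []  # keep the current configuration


print(recommend_settings(False, False, True))  # ['SET codegen_cost_threshold = 20000;']
```

Any change suggested this way should be confirmed by comparing EXPLAIN PERFORMANCE output before and after.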

NOTE: If resources are sufficient, the larger the data volume, the greater the performance improvement.
