HomeMogDBMogDB StackUqbar
v5.0

Documentation:v5.0

Supported Versions:

Other Versions:

Optimizer

Optimizer aims to create an optimized execution plan. A certain SQL query (query tree) can be actually executed in multiple modes, in which the same result set will generate. If possible, a query optimizer may check each possible execution plan and finally choose the one run in highest peed.

imgNote: In some cases, checking all possible execution modes of a query may cost much time and memory, especially for some queries in execution that involve a large number of connection operations. Optimizer is used to judge a proper query plan instead of an optimized query plan in proper time.

An optimizer works with the data structure called paths. The data structure is a planned simplified version, including necessary information used for decisions making by an optimizer. After finding an economical path, it will make an integrated plan tree to pass it to an executor. It includes enough details, that is the plans to be executed, which can be read by the executor and run. In the following sections, the distinctions between paths and plans are ignored.

A Possible Plan Generated

The optimizer generates a plan for each relation (table) that appears in the scan query. The possible plans are determined by indexes available for each relation. A sequential search is always possible once for a relation, so a sequential search plan is always created. Suppose there is an index defined on a relation (such as a BTree index) and a query contains the constraint relation.attribute OPR constant. If relation.attribute happens to match a keyword in a BTree index and OPR is one of the operators listed in the operator class of the index, then another plan will be created that scans the relation using the B-tree index. If there are other indexes, and the constraints in the query match the keywords in that index, more plans will be generated.

If the query asks to link two or more relations, the plan for the linked relation will not be considered until all feasible plans have been found while scanning a single relation. There are three possible join strategies:

  • Nested loop join: The right relation is scanned once for each row found in the left relation. This strategy is easy to implement, but can be time-consuming. However, if the right relation can be scanned with an index, then this might be a good strategy. The index scan of the right relation can be performed using the number from the current row of the left relation as the key.
  • Merge join: Before the join starts, each relation sorts the join fields. The two relations are then scanned concurrently, and the matching rows are combined to form the join row. This union is more attractive because each relation is scanned only once. The required sorting step can be done either by an explicit sorting step or by scanning the relations in the appropriate order using the index on the join key.
  • Hash join: First the right relation is scanned and loaded into a Hash table with the joined field as a hash key, then the left relation is scanned and each row found is used as a hash key to locate the matching row in the table.

If there are more than two relations in a query, the final result must be built through a tree of join steps, each with two inputs. The optimizer examines the possible join orders and finds the one with the least cost.

If the query uses fewer relations than geqo_threshold, a near-exhaustive search is run in order to find the best access sequence. The planner will prefer to connect between any two relationships that have a corresponding entry clause in the WHERE qualification (eg. there is a relationship like where rel1.attr1=rel2.attr2). Consider joining only when there is no other choices. There is no join clause for a particular relation, that is, no join clause is available for another relation. The planner thinks of all possible plans for each joining pair, and one of the criteria for choosing a plan is (presumably) to choose the cheapest one.

The finished query tree consists of a sequential or index scan of the underlying relation, plus nested loop, merge, and hash join nodes as needed, plus any auxiliary steps needed, such as sort nodes or aggregate function calculation nodes. Most of these planning node types have additional selection (discard rows that don't match a specified boolean condition) and projection (compute a derived set of fields based on the given field values, that is to compute a scalar expression if needed). One responsibility of the optimizer is to attach a selection condition from the WHERE clause and the output expression needed to compute the most appropriate node for the plan tree.

Copyright © 2011-2024 www.enmotech.com All rights reserved.