Ecological Tools
Doc Menu

Reviewing and Modifying a Table Definition


To properly define a table, you must:

  1. Reduce the data volume scanned by using the partition pruning mechanism.
  2. Minimize random I/Os by using clustering or partial clustering.

The table definition is created during the database design and is reviewed and modified during the SQL statement optimization.

Selecting a Storage Model

During database design, some key factors about table design will greatly affect the subsequent query performance of the database. Table design affects data storage as well. Scientific table design reduces I/O operations and minimizes memory usage, improving the query performance.

Selecting a model for table storage is the first step of table definition. Select a proper storage model for your service based on the following table.

Storage Model Application Scenario
Row storage Point queries (simple index-based queries that only return a few records)
Scenarios requiring frequent addition, deletion, and modification operations
Column storage Statistics analysis query, in which operations, such as group and join, are performed many times

Using PCKs

  1. The PCK is the column-store-based technology. It can minimize or maximize sparse indexes to quickly filter base tables. You are advised to select a maximum of two columns as PCKs. Use the following principles to specify PCKs:

    1. The selected PCKs must be restricted by simple expressions in base tables. Such constraints are usually represented by col op const, in which col indicates the column name, op indicates operators, (including =, >, >=, <=, and <), and const indicates constants.
    2. Select columns that are frequently selected (to filter much more undesired data) in simple expressions.
    3. List the less frequently selected columns on the top.
    4. List the columns of the enumerated type at the top.

Using Partitioned Tables

Partitioning refers to splitting what is logically one large table into smaller physical pieces based on specific schemes. The table based on the logic is called a partitioned table, and a physical piece is called a partition. Data is stored in physical partitions not the logical table. A partitioned table has the following advantages over an ordinary table:

  1. High query performance: You can specify partitions when querying partitioned tables, improving query efficiency.
  2. High availability: If a certain partition in a partitioned table is faulty, data in the other partitions is still available.
  3. Easy maintenance: To fix a partitioned table having a faulty partition, you only need to fix the partition.

MogDB supports range partitioned tables.

Range partitioned table: Data in different ranges is mapped to different partitions. The range is determined by the partition key specified during the partitioned table creation. The partition key is usually a date. For example, sales data is partitioned by month.

Selecting a Data Type

Use the following principles to select efficient data types:

  1. Select data types that facilitate data calculation.

    Generally, the calculation of integers (including common comparison calculations, such as =, >, <, ≥, ≤, and ≠ and group by) is more efficient than that of strings and floating point numbers. For example, if you need to perform a point query on a column-store table whose numeric column is used as a filter condition, the query will take over 10s. If you change the data type from NUMERIC to INT, the query duration will be reduced to 1.8s.

  2. Select data types with a short length.

    Data types with short length reduce both the data file size and the memory used for computing, improving the I/O and computing performance. For example, use SMALLINT instead of INT, and INT instead of BIGINT.

  3. Use the same data type for a join.

    You are advised to use the same data type for a join. To join columns with different data types, the database needs to convert them to the same type, which leads to additional performance overheads.