MogDB
Ecological Tools
Doc Menu

Column-store Tables Management

What is Column-store

Row-store stores tables to disk partitions by row, and column-store stores tables to disk partitions by column. By default, a row-store table is created. For details about differences between row storage and column storage, see Figure 1.

Figure 1 Differences between row storage and column storage

img

In the preceding figure, the upper left part is a row-store table, and the upper right part shows how the row-store table is stored on a disk; the lower left part is a column-store table, and the lower right part shows how the column-store table is stored on a disk. From the above figure, you can clearly see that the data of a row-store table are put together, but they are kept separately in column-store table.

Advantages and Disadvantages of Row-store and Column-store Tables and Their Usage Scenario

Both storage models have benefits and drawbacks.

Storage Model Benefit Drawback
Row storage Record data is stored together. Data can be easily inserted and updated. All the columns of a record are read after the SELECT statement is executed even if only certain columns are required.
Column storage Only the columns involved in a query are read. Projections are efficient. Any column can serve as an index. The selected columns need to be reconstructed after the SELECT statement is executed. Data cannot be easily inserted or updated.

Generally, if a table contains many columns (called a wide table) and its query involves only a few columns, column storage is recommended. Row storage is recommended if a table contains only a few columns and a query involves most of the fields.

Storage Model Application Scenarios
Row storage Point queries (simple index-based queries that only return a few records)Scenarios requiring frequent addition, deletion, and modification
Column storage Statistical analysis queries (requiring a large number of association and grouping operations)Ad hoc queries (using uncertain query conditions and unable to utilize indexes to scan row-store tables)

MogDB supports hybrid row storage and column storage. Each storage model applies to specific scenarios. Select an appropriate model when creating a table. Generally, MogDB is used for transactional processing databases. By default, row storage is used. Column storage is used only when complex queries in large data volume are performed.

Selecting a Storage Model

  • Update frequency

    If data is frequently updated, use a row-store table.

  • Data insertion frequency

    If a small amount of data is frequently inserted each time, use a row-store table. If a large amount of data is inserted at a time, use a column-store table.

  • Number of columns

    If a table is to contain many columns, use a column-store table.

  • Number of columns to be queried

    If only a small number of columns (less than 50% of the total) is queried each time, use a column-store table.

  • Compression ratio

    The compression ratio of a column-store table is higher than that of a row-store table. High compression ratio consumes more CPU resources.

Constraints of Column-store Table

  • The column-store table does not support arrays.
  • The number of column-store tables is recommended to be no more than 1000.
  • The table-level constraints of the column-store table only support PARTIAL CLUSTER KEY, and do not support table-level constraints such as primary and foreign keys.
  • The field constraints of the column-store table only support NULL, NOT NULL and DEFAULT constant values.
  • The column-store table does not support the alter command to modify field constraints.
  • The column-store table supports the delta table, which is controlled by the parameter enable_delta_store whether to enable or not, and the threshold value for entering the delta table is controlled by the parameter deltarow_threshold.
  • cstore_buffers

    The size of the shared buffer used by the column-store, the default value: 32768KB.

  • partition_mem_batch

    Specify the number of caches. In order to optimize the batch insertion of column-store partition tables, the data will be cached during the batch insertion process and then written to disk in batches. Default value: 256.

  • partition_max_cache_size

    Specify the size of the data buffer area. In order to optimize the batch insertion of column-store partition tables, the data will be cached during the batch insertion process and then written to disk in batches. Default value: 2GB.

  • enable_delta_store

    In order to enhance the performance of single data import in column-store and solve the problem of disk redundancy, whether it is necessary to enable the function of column-store delta table and use it in conjunction with the parameter DELTAROW_THRESHOLD. Default value: off.

Create Table Commands

MogDB creates normal tables as uncompressed row-store tables by default.

mogdb=# \dt
No relations found.
mogdb=# create table test_t(id serial primary key ,col1 varchar(8),col2 decimal(6,2),create_time timestamptz not null default now());
NOTICE:  CREATE TABLE will create implicit sequence "test_t_id_seq" for serial column "test_t.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "test_t_pkey" for table "test_t"
CREATE TABLE
mogdb=# \dt+
                                     List of relations
 Schema |  Name  | Type  | Owner |  Size   |             Storage              | Description
--------+--------+-------+-------+---------+----------------------------------+-------------
 public | test_t | table | omm   | 0 bytes | {orientation=row,compression=no} |
(1 row)

mogdb=# 

To create a column-store table, you need to specify orientation=column, the default compression level is low.

mogdb=# create table column_t(id serial,col1 varchar(8),col2 decimal(6,2),create_time timestamptz not null default now()) with (orientation=column );
NOTICE:  CREATE TABLE will create implicit sequence "column_t_id_seq" for serial column "column_t.id"
CREATE TABLE
mogdb=# \dt+
                                        List of relations
 Schema |   Name   | Type  | Owner |  Size   |               Storage                | Description
--------+----------+-------+-------+---------+--------------------------------------+-------------
 public | column_t | table | omm   | 16 kB   | {orientation=column,compression=low} |
 public | test_t   | table | omm   | 0 bytes | {orientation=row,compression=no}     |
(2 rows)

mogdb=# \d+ column_t
                                                        Table "public.column_t"
   Column    |           Type           |                       Modifiers                       | Storage  | Stats target | Description
-------------+--------------------------+-------------------------------------------------------+----------+--------------+-------------
 id          | integer                  | not null default nextval('column_t_id_seq'::regclass) | plain    |              |
 col1        | character varying(8)     |                                                       | extended |              |
 col2        | numeric(6,2)             |                                                       | main     |              |
 create_time | timestamp with time zone | not null default now()                                | plain    |              |
Has OIDs: no
Options: orientation=column, compression=low

Add partial clustered storage columns to the column-store table.

mogdb=# \d+ column_t
                                                        Table "public.column_t"
   Column    |           Type           |                       Modifiers                       | Storage  | Stats target | Description
-------------+--------------------------+-------------------------------------------------------+----------+--------------+-------------
 id          | integer                  | not null default nextval('column_t_id_seq'::regclass) | plain    |              |
 col1        | character varying(8)     |                                                       | extended |              |
 col2        | numeric(6,2)             |                                                       | main     |              |
 create_time | timestamp with time zone | not null default now()                                | plain    |              |
Has OIDs: no
Options: orientation=column, compression=low

mogdb=# alter table column_t add PARTIAL CLUSTER KEY(id);
ALTER TABLE
mogdb=# \d+ column_t
                                                        Table "public.column_t"
   Column    |           Type           |                       Modifiers                       | Storage  | Stats target | Description
-------------+--------------------------+-------------------------------------------------------+----------+--------------+-------------
 id          | integer                  | not null default nextval('column_t_id_seq'::regclass) | plain    |              |
 col1        | character varying(8)     |                                                       | extended |              |
 col2        | numeric(6,2)             |                                                       | main     |              |
 create_time | timestamp with time zone | not null default now()                                | plain    |              |
Partial Cluster :
    "column_t_cluster" PARTIAL CLUSTER KEY (id)
Has OIDs: no
Options: orientation=column, compression=low

mogdb=#

Create column-store tables with partial clustered storage directly.

NOTICE:  CREATE TABLE will create implicit sequence "column_c_id_seq" for serial column "column_c.id"
CREATE TABLE
mogdb=# \d+ column_c
                                                        Table "public.column_c"
   Column    |           Type           |                       Modifiers                       | Storage  | Stats target | Description
-------------+--------------------------+-------------------------------------------------------+----------+--------------+-------------
 id          | integer                  | not null default nextval('column_c_id_seq'::regclass) | plain    |              |
 col1        | character varying(8)     |                                                       | extended |              |
 col2        | numeric(6,2)             |                                                       | main     |              |
 create_time | timestamp with time zone | not null default now()                                | plain    |              |
Partial Cluster :
    "column_c_cluster" PARTIAL CLUSTER KEY (id)
Has OIDs: no
Options: orientation=column, compression=low

mogdb=#

Please refer to Supported Data Types -> Data Types Supported by Column-store Tables under the Developer Guide for the data types supported by column-store tables.

Column-store versus Row-store

Used disk space

  • The default size of the column-store table is 16K, the compression level is low.
  • The default size of the row-store table is 0bytes, the compression level is no.
  • Insert 1 million pieces of data into the two tables separately , and compare the occupied disk size.

    mogdb=# \dt+
                                          List of relations
    Schema |   Name    | Type  | Owner |  Size   |                 Storage                 | Description
    --------+-----------+-------+-------+---------+-----------------------------------------+-------------
    public | column_t  | table | omm   | 16 kB   | {orientation=column,compression=low}    |
    public | column_th | table | omm   | 16 kB   | {orientation=column,compression=high}   |
    public | column_tm | table | omm   | 16 kB   | {orientation=column,compression=middle} |
    public | row_tc    | table | omm   | 0 bytes | {orientation=row,compression=yes}       |
    public | test_t    | table | omm   | 0 bytes | {orientation=row,compression=no}        |
    (5 rows)
    
    mogdb=# insert into column_t select generate_series(1,1000000),left(md5(random()::text),8),random()::numeric(6,2);
    INSERT 0 1000000
    Time: 11328.880 ms
    mogdb=# insert into column_th select generate_series(1,1000000),left(md5(random()::text),8),random()::numeric(6,2);
    INSERT 0 1000000
    Time: 10188.634 ms
    mogdb=# insert into column_tm select generate_series(1,1000000),left(md5(random()::text),8),random()::numeric(6,2);
    INSERT 0 1000000
    Time: 9802.739 ms
    mogdb=# insert into test_t select generate_series(1,1000000),left(md5(random()::text),8),random()::numeric(6,2);
    INSERT 0 1000000
    Time: 17404.945 ms
    mogdb=# insert into row_tc select generate_series(1,1000000),left(md5(random()::text),8),random()::numeric(6,2);
    INSERT 0 1000000
    Time: 12394.866 ms
    mogdb=# \dt+
                                           List of relations
    Schema |   Name    | Type  | Owner |   Size   |                 Storage                 | Description
    --------+-----------+-------+-------+----------+-----------------------------------------+-------------
    public | column_t  | table | omm   | 12 MB    | {orientation=column,compression=low}    |
    public | column_th | table | omm   | 8304 kB  | {orientation=column,compression=high}   |
    public | column_tm | table | omm   | 10168 kB | {orientation=column,compression=middle} |
    public | row_tc    | table | omm   | 58 MB    | {orientation=row,compression=yes}       |
    public | test_t    | table | omm   | 58 MB    | {orientation=row,compression=no}        |
    (5 rows)
    
    mogdb=#
  • The compression level of the column-store table is higher.
  • The less disk space it uses, after the row-store table is compressed, the size of the disk space dose not decrease significantly.
  • Column-store table take up nearly 6 times less disk space than row-store table.

DML Comparison

Search for a single column:

---
---Search by range, column-store is nearly 20 times faster than row-store
---
mogdb=# select col1 from test_t where id>=100010 and id<100020;
   col1
----------
 4257a3f3
 3d397284
 64343438
 6eb7bdb7
 d1c9073d
 6aeb037c
 1d424974
 223235ab
 329de235
 2f02adc1
(10 rows)

Time: 77.341 ms
mogdb=# select col1 from column_t where id>=100010 and id<100020;
   col1
----------
 d4837c30
 87a46f7a
 2f42a9c9
 4481c793
 68800204
 613b9205
 9d8f4a0a
 5cc4ff9e
 f948cd10
 f2775cee
(10 rows)

Time: 3.884 ms

---
---Search Randomly, column-store is nearly 35 times faster than row-store
---

mogdb=# select col1 from test_t limit 10;
   col1
----------
 c2780d93
 294be14d
 4e53b761
 2c10f8a2
 ae776743
 7d683c66
 b3b40054
 7e56edf9
 a7b7336e
 ea3d47d9
(10 rows)

Time: 249.887 ms
mogdb=# select col1 from column_t limit 10;
   col1
----------
 a745d77b
 4b6df494
 76fed9c1
 70c9664d
 3384de8a
 4158f3bf
 5d1c3b9f
 341876bb
 f396f4ed
 abfd78bb
(10 rows)

Time: 7.738 ms

Search for all the data:

---
---Row-store is 30% faster than column-store search
---
mogdb=# select * from test_t limit 10;
 id |   col1   | col2 |          create_time
----+----------+------+-------------------------------
  1 | c2780d93 |  .37 | 2020-10-26 14:27:33.304108+08
  2 | 294be14d |  .57 | 2020-10-26 14:27:33.304108+08
  3 | 4e53b761 |  .98 | 2020-10-26 14:27:33.304108+08
  4 | 2c10f8a2 |  .27 | 2020-10-26 14:27:33.304108+08
  5 | ae776743 |  .97 | 2020-10-26 14:27:33.304108+08
  6 | 7d683c66 |  .58 | 2020-10-26 14:27:33.304108+08
  7 | b3b40054 |  .44 | 2020-10-26 14:27:33.304108+08
  8 | 7e56edf9 |  .43 | 2020-10-26 14:27:33.304108+08
  9 | a7b7336e |  .31 | 2020-10-26 14:27:33.304108+08
 10 | ea3d47d9 |  .42 | 2020-10-26 14:27:33.304108+08
(10 rows)

Time: 6.822 ms

mogdb=# select * from column_t limit 10;
 id |   col1   | col2 |          create_time
----+----------+------+-------------------------------
  1 | a745d77b |  .33 | 2020-10-26 14:28:20.633253+08
  2 | 4b6df494 |  .42 | 2020-10-26 14:28:20.633253+08
  3 | 76fed9c1 |  .73 | 2020-10-26 14:28:20.633253+08
  4 | 70c9664d |  .74 | 2020-10-26 14:28:20.633253+08
  5 | 3384de8a |  .48 | 2020-10-26 14:28:20.633253+08
  6 | 4158f3bf |  .59 | 2020-10-26 14:28:20.633253+08
  7 | 5d1c3b9f |  .63 | 2020-10-26 14:28:20.633253+08
  8 | 341876bb |  .97 | 2020-10-26 14:28:20.633253+08
  9 | f396f4ed |  .73 | 2020-10-26 14:28:20.633253+08
 10 | abfd78bb |  .30 | 2020-10-26 14:28:20.633253+08
(10 rows)

Time: 9.982 ms

Update data:

---
---Update a field directly, column-store is nearly 7 times faster than row-store
---
mogdb=# update test_t set col1=col1;
UPDATE 1000000
Time: 19779.978 ms
mogdb=# update column_t set col1=col1;
UPDATE 1000000
Time: 2702.339 ms

Conclusion

  1. The Column-store table saves nearly 6 times the disk space usage compared to the row-store table.
  2. When searching for the specified field, the column-store table is about 20-35 times faster than the row-store table.
  3. When searching for all the data, the column-store table is 30% slower than the row-store table.
  4. When importing data in batches in the default compression mode, and column-store table is 40% faster than the row-store table.