文档中心MogDBMogDB StackUqbar
v5.0

文档:v5.0

支持的版本:

其他版本:

pg_bulkload

pg_bulkload简介

pg_bulkload是一种用于MogDB的高速数据加载工具,相比copy命令速度更快,能够跳过shared buffer、wal buffer直接写文件。


pg_bulkload安装

手动安装

  1. 访问MogDB下载页面,下载所需版本的pg_bulkload插件。

  2. 解压插件包,例如:

    tar -xzvf pg_bulkload-1.0-x.x.x-01-CentOS-x86_64.tar.gz
  3. 进入插件所在目录下,执行make install命令。

    cd pg_bulkload/
    make install

PTK安装

参见插件安装


pg_bulkload参数

pg_bulkload --help

pg_bulkload is a bulk data loading tool for PostgreSQL

Usage:
  Dataload: pg_bulkload [dataload options] control_file_path
  Recovery: pg_bulkload -r [-D DATADIR]

Dataload options:
  -i, --input=INPUT         INPUT path or function
  -O, --output=OUTPUT       OUTPUT path or table
  -l, --logfile=LOGFILE     LOGFILE path
  -P, --parse-badfile=*     PARSE_BADFILE path
  -u, --duplicate-badfile=* DUPLICATE_BADFILE path
  -o, --option="key=val"    additional option

Recovery options:
  -r, --recovery            execute recovery
  -D, --pgdata=DATADIR      database directory

Connection options:
  -d, --dbname=DBNAME       database to connect
  -h, --host=HOSTNAME       database server host or socket directory
  -p, --port=PORT           database server port
  -U, --username=USERNAME   user name to connect as
  -w, --no-password         never prompt for password
  -W, --password            force password prompt

Generic options:
  -e, --echo                echo queries
  -E, --elevel=LEVEL        set output message level
  --help                    show this help, then exit
  --version                 output version information, then exit

pg_bulkload使用示例

连接数据库:

gsql -p 5432 -d postgres -r

创建extension:

MogDB=# CREATE EXTENSION pg_bulkload;
CREATE EXTENSION

创建测试表:

MogDB=# create table test_bulkload(id int, name varchar(128));
CREATE TABLE

创建一个txt文件,写10W条数据:

seq 100000| awk '{print $0"|bulkload"}' > bulkload_output.txt

使用参数

文件创建成功,执行如下命令:

pg_bulkload -i ./bulkload_output.txt -O test_bulkload -l test_bulkload.log -p 5432 -o "TYPE=csv" -o "DELIMITER=|" -d postgres -U hlv

其中“5432”为数据库端口号,“hlv”为用户名。

回显如下:

WARNING:  It is invalid to set pg_temp or pg_catalog behind other schemas in search path explicitly. The priority order is pg_temp, pg_catalog         and other schemas.
NOTICE: BULK LOAD START
NOTICE: BULK LOAD END
        0 Rows skipped.
        100000 Rows successfully loaded.
        0 Rows not loaded due to parse errors.
        0 Rows not loaded due to duplicate errors.
        0 Rows replaced with new rows.

连接数据库,查看数据是否导入成功:

MogDB=# select count(1) from test_bulkload;
 count
--------
 100000
(1 row)

使用控制文件

在使用控制文件进行数据导入之前,需要先清空之前表中导入的数据。

TRUNCATE TABLE test_bulkload;

编写.ctl文件

INPUT=/vdb/MogDB-server/dest/bulkload_output.txt
LOGFILE = /vdb/MogDB-server/dest/test_bulkload.log
LIMIT = INFINITE
PARSE_ERRORS = 0
CHECK_CONSTRAINTS = NO
TYPE = CSV
SKIP = 5  (该参数设置跳过几行)
DELIMITER = |
QUOTE = "\""
ESCAPE = "\""
OUTPUT = test_bulkload
MULTI_PROCESS = NO
WRITER = DIRECT
DUPLICATE_ERRORS = 0
ON_DUPLICATE_KEEP = NEW
TRUNCATE = YES

注意:代码逻辑中以换行符来识别.ctl文件中的参数,因此.ctl文件的最后一行需要换行以避免参数识别错误。

执行命令:

pg_bulkload ./lottu.ctl -d postgres -U hlv

回显如下:

WARNING:  It is invalid to set pg_temp or pg_catalog behind other schemas in search path explicitly. The priority order is pg_temp, pg_catalog and other schemas.
NOTICE: BULK LOAD START
NOTICE: BULK LOAD END
        5 Rows skipped.
        99995 Rows successfully loaded.
        0 Rows not loaded due to parse errors.
        0 Rows not loaded due to duplicate errors.
        0 Rows replaced with new rows.

连接数据库,查看数据是否导入成功:

MogDB=# select count(1) from test_bulkload;
 count
-------
 99995
(1 row)
Copyright © 2011-2024 www.enmotech.com All rights reserved.