MVD Usage Guide

MVD is a pure command-line tool and can only be run from a shell. Use the -h option to view the command help, shown below:


Introduction:
    MVD is a data verification tool for Heterogeneous Databases.

Options:
    -h, --help           : Show help message
    -v, --version        : Show tool version [3.5.1]
    -x, --debug          : Run in debug mode, means more output logs
        --debug-md5      : Debug for print data before calculator MD5
    -c, --config-file    : Using a config file with format json
        --mtk-config     : Using a config file from MTK tool with format json
    -C, --category       : Compare category: A=All, M=Metadata, D=Data
    -m, --mode           : Data compare mode, default is [R]
                           [R] Row mode, compare data row by row
                           [S] Summary mode, compare summary data, include row count and data signature
                           [A] Automatic mode, Compare summary and compare row when summary does not matched
    -d, --func-dimension : [Advanced Option] Functions used in data comparison
                           Default: avg:a,min:np,max:np,median:np
                           Format: <name>:<primary_type>    -- Same function name among all database
                                   <primary_type> := a|p|np, a = all, p = primary table, np = not [p]
                                                     can be ignored, then use default [a]
                           Format: <name>:<function_list>:<primary_type>
                                   <function_list> := <db_type> = <function_name> | <db_type> = <function_name>
                                   <db_type> := ORACLE|ORACLE2|DB2|MYSQL|POSTGRESQL|MOGDB|OPENGAUSS|SQLSERVER|INFORMIX
                           Example 'testmin:oracle=min|mogdb=min|mysql=min|db2=min:p'
    -s, --source-db      : Source database to be verified
                           Format: <db_type>:<ip>:<port>:<name>:<user>:<password>
                                   <db_type> := ORACLE|ORACLE2|DB2|MYSQL|POSTGRESQL|MOGDB|OPENGAUSS|SQLSERVER|INFORMIX
    -t, --target-db      : Target database to be verified, format see also '-s'
        --passwd-encrypt : DB password is encrypted
    -w, --workers        : Parallel workers (1~32), default: 8
    -W, --table-workers  : Parallel workers within one table when partition/sampling parallel used (1~32), default: 4
    -T, --table          : Check a single table (source table info)
                           Format: <owner>.<table_name>
                                   <source_owner>.<source_table>:<target_owner>.<target_table>
                             Tips: 1. if this option was specified, then '-i' and '-e' will be ignored
                                   2. Name with mixed-case means case sensitive
                                   3. Name with quotation mark means keep original case
    -i, --include        : Source White list, patterns used for object filter, all patterns combined with comma
                           Format: <type>:<owner>.<object_name>,...
                                   <type>:<object_name>
                                   <owner>.<object_name>
                                   <object_name>
                             Tips: 1. <type> can use: */%/TABLE/VIEW/SEQUENCE/PROCEDURE/FUNCTION/OTHERS
                                   2. Can use */% in <owner> and <object_name> field, means to match all
                                   3. Name is case insensitive
    -e, --exclude        : Source Black list, patterns used for object filter, format see also '-i'
    -r, --remap-schema   : Schema transformation in comparison
                           Format: <source_schema>:<target_schema>,<source_schema>:<target_schema>...
        --column-list    : Set valid column list for data comparison, combined with comma
                           Format: <colume_item>,<colume_item>,<colume_item>,...
                                   <colume_item> := <source_schema>.<source_name>.<source_colume>:<target_column>
                             Tips: 1. if column name does not changed then ':<target_column>' can be removed
                                   2. if no <source_schema> then it's the limitation for all tables with the same name in each schema
                                   3. if no <source_schema>.<table_name> then it's the limitation for all tables
                                   4. Name with mixed-case means case sensitive
                                   5. Global/Table column mapping/limitation can be used together
                                   6. If global column mapping/limitation used, then all tables columns must in column-list
                                   7. If only table column mapping/limitation used, the column-list scope is only for the table
        --data-filter    : Set data filter for comparison, combined with |
                           Format: <filter_1>|<source_schema>.<source_name>:<filter_source>:<filter_target>,...
                           Example: hongye.test_tab:created > sysdate - 356:created > now() - interval '365 days'
        --sample-size    : Minimal size in MB when using sample comparison (partial data comparison), default 10240 means >= 10GB
        --sample-pct     : Sample percent in sample comparison (value must between 0 and 1), default 1 means compare all data
        --detail-mode    : Result data in detail mode (show data even no differences found)
    -f, --result-file    : Result file, used to save result, default to print result to screen
    -F, --result-format  : Result file data format: json (default), plain
    -R, --row-dir        : Row directory for differences data (MD5 & KEY)
        --row-feedback   : Query row data when differences found, otherwise just key condition listed
        --ignore-float   : Ignore float data type in comparison
        --ora-float-prec : Oracle float precision in data comparison, Range: -1 ~ 128, Default: -1
        --float-prec     : Float precision in data comparison, Range: -1 ~ 128, Default: -1
        --double-prec    : Double precision in data comparison, Range: -1 ~ 128, Default: -1
        --fraction-prec  : Fraction precision in data comparison, Range: 0-6, Default: 6 (Informix is 5)
    -z, --zero-char      : Specify a char for chr(0) in comparison, Default is empty char
    -Z, --time-zone      : Specify timezone for DB client, set empty use local, default is UTC(+00:00)
    -l, --logfile        : Write output information to logfile
    -L, --license        : Specify license file, default is: ./license.json
        --apply-license  : Apply for a new license from server
        --upgrade        : Upgrade current binary MVD command
        --callback       : Use callback interface to get PID and result asynchronously
        --generate-repair: Whether to generate repair scripts for the target database
        --repair-compared: Whether to just repair compared, default is try to repair all columns matched
        --rtrim-varchar  : Whether to rtrim blanks after varchar data, by default blanks after varchar is kept

Usage:
    1. Apply a license
       ./mvd_linux_x86_64 --apply-license
    2. Verify a single schema  (Using MD5 ROW-BY-ROW)
       ./mvd_linux_x86_64 -s 'ORACLE:127.0.0.1:1521:orcl:scott:tiger' -t 'MOGDB:127.0.0.1:5432:postgres:hongye:pwd' -i 'HONGYE.*' -R './diff'
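
The `-s` and `-t` values follow the `<db_type>:<ip>:<port>:<name>:<user>:<password>` format described in the help text. The following is a minimal sketch of assembling such a string from separate values. The variable names are illustrative, and the connection details are the placeholders from the usage example rather than real credentials. The command is only printed, not executed, and the printed line omits the shell quoting.

```shell
# Placeholder connection details (illustrative values only)
SRC_TYPE=ORACLE
SRC_HOST=127.0.0.1
SRC_PORT=1521
SRC_NAME=orcl
SRC_USER=scott
SRC_PASS=tiger

# <db_type>:<ip>:<port>:<name>:<user>:<password>
SOURCE_DB="${SRC_TYPE}:${SRC_HOST}:${SRC_PORT}:${SRC_NAME}:${SRC_USER}:${SRC_PASS}"
TARGET_DB='MOGDB:127.0.0.1:5432:postgres:hongye:pwd'

# Print the command for review; drop the leading 'echo' to run it.
echo ./mvd_linux_x86_64 -s "$SOURCE_DB" -t "$TARGET_DB" -i 'HONGYE.*' -R ./diff
```

Keeping the parts in separate variables makes it easier to reuse one host or credential across several comparison runs.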

Command option descriptions

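| Option | Description |
| --- | --- |
| -h, --help | Show the tool's help information |
| -v, --version | Show the tool's current version |
| -x, --debug | Enable DEBUG mode, which writes more detailed logs; use it only when debugging after a problem occurs |
| --debug-md5 | Debug option dedicated to printing values before the Python MD5 calculation. It outputs all data in the DB2/Oracle tables and consumes a large amount of disk log space, so use it with caution |
| -c, --config-file | Specify a configuration file (JSON format) whose parameters replace the individual command-line options |
| --mtk-config | Specify an MTK runtime configuration file (JSON format) to verify the objects and table data involved in the MTK synchronization |
| -C, --category | Comparison category: A = compare everything, M = compare object structure only, D = compare data only |
| -m, --mode | Comparison mode: R = row-by-row comparison, S = full summary (signature) comparison, A = automatic comparison |
| -d, --func-dimension | Advanced option; do not use it unless you clearly understand its meaning. Specifies the statistical function dimensions used during data comparison. Input formats: 1. `<name>:<primary_type>` 2. `<name>:<function_list>:<primary_type>`. `primary_type` can be a = all tables, p = tables with a primary key, np = tables without a primary key. `name` is the function name; if it is the actual function name in every database, `function_list` is not needed, otherwise specify how the function is expressed in each database. `function_list` gives the function's form in each database as `<db_type>=<function_name>`, where `db_type` is one of: ORACLE, DB2, MYSQL, POSTGRESQL, MOGDB, OPENGAUSS, INFORMIX, SQLSERVER |
| -s, --source-db | Source database for the comparison, in the format `<db_type>:<ip>:<port>:<name>:<user>:<password>`, where `db_type` is one of: ORACLE, DB2, MYSQL, POSTGRESQL, MOGDB, OPENGAUSS, INFORMIX, SQLSERVER |
| -t, --target-db | Target database for the comparison; same format as `-s` |
| --passwd-encrypt | Whether the database password is MDB-encrypted; not encrypted by default |
| -w, --workers | Number of concurrent processes used during data comparison, range 1~32, default 8 |
| -T, --table | Compare a single table only, in the format `<owner>.<table_name>` or `<source_owner>.<source_table>:<target_owner>.<target_table>`. Wildcards are not allowed, and the option conflicts with `-i` and `-e` (they are ignored when `-T` is given). Names may be wrapped in quotation marks (single quotes, double quotes, or backticks) to keep the original case of the schema and object names |
| -i, --include | Objects to include in the comparison; multiple patterns may be given, separated by commas. Pattern formats: `<type>:<owner>.<object_name>`, `<owner>.<object_name>`, `<object_name>`, ... `*` or `%` can be used as a wildcard in the owner and object name. Supported types: `*/%/TABLE/VIEW/SEQUENCE/PROCEDURE/FUNCTION/OTHERS` |
| -e, --exclude | Objects to exclude; same format as `-i` |
| -r, --remap-schema | Schema mapping between source and target for the comparison; by default the source and target schemas have the same name and no mapping is applied |
| --column-list | Compare only the columns listed in this option |
| --data-filter | Compare only the data that matches the given filter conditions |
| --sample-size | Table size threshold for dynamic sampling; only tables larger than the threshold are considered for sampling |
| --sample-pct | Proportion of data to sample, in the range (0, 1] |
| --detail-mode | Show detailed comparison results, including results for tables with no differences |
| -f, --result-file | Output file for the comparison result; by default no file is written and the result is shown in the current terminal |
| -F, --result-format | Format of the comparison result: `json` or `plain` (human-readable text); the help output above lists `json` as the default |
| -R, --row-dir | Run the row-by-row MD5 comparison and specify the output directory for row differences; one difference file is created in this directory for each table that has differences |
| --row-feedback | Whether to show the differing column data of differing rows (tables with a primary key only); by default only the KEY of each differing row is shown (ROWID in Oracle, CTID in PG, and so on) |
| --ignore-float | Whether to ignore floating-point types (float, double, real, and other inexact types) during data comparison; not ignored by default |
| -z, --zero-char | Replacement character for chr(0) in the comparison; the default is the empty character, which removes the invisible chr(0) character |
| -Z, --time-zone | Time zone for client data queries; an empty string uses the local operating system time zone, and leaving the option unset uses UTC (+00:00) |
| -l, --logfile | Log file for the tool run |
| -L, --license | Location of the license file; must be given explicitly if license.json is not in the current directory |
| --apply-license | Apply for a license |
| --upgrade | Upgrade the currently running MVD binary |
| --callback | Use the callback interface to obtain the comparison process PID and the comparison result asynchronously |
| --generate-repair | Whether to generate data repair scripts for the target database |
| --repair-compared | Whether the repair scripts cover only the compared columns; by default they cover all columns matched on both sides |
| --rtrim-varchar | Whether to strip trailing spaces from varchar data; trailing spaces are kept by default |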
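
The `-r`, `--column-list`, and `--data-filter` values are each structured strings, so it can help to see them side by side. The following is a sketch only: the schema, table, and column names are hypothetical, the formats are taken from the help text, and how MVD combines these options internally is not confirmed here. The command is printed rather than executed.

```shell
# Hypothetical schema, table, and column names, for illustration only.
SOURCE_DB='ORACLE:127.0.0.1:1521:orcl:scott:tiger'
TARGET_DB='MOGDB:127.0.0.1:5432:postgres:hongye:pwd'

# <source_schema>:<target_schema>
REMAP='SCOTT:hongye'
# <source_schema>.<source_name>.<source_column>[:<target_column>], comma-separated
COLUMNS='SCOTT.EMP.EMPNO,SCOTT.EMP.ENAME:emp_name'
# <source_schema>.<source_name>:<filter_source>:<filter_target>
FILTER="SCOTT.EMP:hiredate > sysdate - 365:hiredate > now() - interval '365 days'"

# Print the assembled command for review; drop 'echo' to run it.
echo ./mvd_linux_x86_64 -s "$SOURCE_DB" -t "$TARGET_DB" \
    -r "$REMAP" --column-list "$COLUMNS" --data-filter "$FILTER"
```

The filter carries one predicate per side because date arithmetic differs between the source and target SQL dialects.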

Common command examples

The following are command examples for common scenarios (using the 2.0 Linux x86_64 build as an example).

Apply for a license
```
./mvd_linux_x86_64 --apply-license
```
During execution you are prompted for the email address that should receive the license.

Compare with a config file

Run a comparison task from a configuration file prepared in advance. For details on the configuration file settings, see: Configuration File
```
./mvd_linux_x86_64 -c config.json
```

Compare structure and data differences from Oracle to MogDB:
```
./mvd_linux_x86_64 -s 'ORACLE:127.0.0.1:1521:orcl:hongye:pwd' -t 'MOGDB:127.0.0.1:5432:omm:hongye:pwd' -i 'mtk.*' -R './diff'
```

Compare structure and data differences from DB2 to MogDB:
```
./mvd_linux_x86_64 -s 'DB2:127.0.0.1:50000:HONGYE:db2inst1:pwd' -t 'MOGDB:127.0.0.1:5432:db2_mtk1:hongye:pwd' -i 'mtk.*' -R './diff'
```

Compare structure and data differences from MySQL to MogDB:
```
./mvd_linux_x86_64 -s 'MYSQL:127.0.0.1:3306:hongye:root:pwd' -t 'MOGDB:127.0.0.1:5432:mysql_mtk:hongye:pwd' -i 'mtk.*' -R './diff'
```

Compare data for a single table only (summary comparison):
```
./mvd_linux_x86_64 -s 'ORACLE:127.0.0.1:1521:orcl:scott:tiger' -t 'MOGDB:127.0.0.1:5432:postgres:hongye:pwd' -T 'HONGYE.TEST'
```
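
When the table has a different owner or name on the target side, `-T` also accepts the `<source_owner>.<source_table>:<target_owner>.<target_table>` form from the help text. A minimal sketch of building that value, with hypothetical owner and table names; the command is printed rather than executed:

```shell
# Hypothetical owner and table names on each side.
SRC_OWNER=SCOTT
SRC_TABLE=EMP
TGT_OWNER=hongye
TGT_TABLE=emp

# <source_owner>.<source_table>:<target_owner>.<target_table>
TABLE_MAP="${SRC_OWNER}.${SRC_TABLE}:${TGT_OWNER}.${TGT_TABLE}"

# Print the command for review; drop 'echo' to run it.
echo ./mvd_linux_x86_64 -s 'ORACLE:127.0.0.1:1521:orcl:scott:tiger' \
    -t 'MOGDB:127.0.0.1:5432:postgres:hongye:pwd' -T "$TABLE_MAP"
```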

Compare data and structure, and pinpoint the differing rows:
```
./mvd_linux_x86_64 -s 'ORACLE:127.0.0.1:1521:orcl:scott:tiger' -t 'MOGDB:127.0.0.1:5432:postgres:hongye:pwd' -i 'mtk.*' -R './diff'
```
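
The `-i` pattern can be narrowed with `-e`, which uses the same pattern format. The sketch below uses hypothetical object names and assumes the exclude list is applied on top of the include list, which the help text implies but does not state outright. The command is printed rather than executed.

```shell
# Include everything in schema mtk ...
INCLUDE='mtk.*'
# ... but skip all sequences and one specific (hypothetical) table.
# Pattern format: <type>:<owner>.<object_name>, comma-separated
EXCLUDE='SEQUENCE:mtk.*,mtk.audit_log'

# Print the command for review; drop 'echo' to run it.
echo ./mvd_linux_x86_64 -s 'ORACLE:127.0.0.1:1521:orcl:scott:tiger' \
    -t 'MOGDB:127.0.0.1:5432:postgres:hongye:pwd' \
    -i "$INCLUDE" -e "$EXCLUDE" -R ./diff
```

Quoting the patterns keeps the shell from expanding `*` against local file names before MVD sees it.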

Verify data after migration, based on the MTK configuration file:
```
./mvd_linux_x86_64 --mtk-config oracle2opengauss.json
```
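Compare data using dynamic sampling

The --sample-size parameter sets the table size above which dynamic sampling is used: 100 means tables larger than 100 MB are sampled. The --sample-pct parameter sets the sampling ratio: 0.1 means 10% of the data is sampled and compared.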
```
./mvd_linux_x86_64 -s 'ORACLE:127.0.0.1:1521:orcl:scott:tiger' -t 'MOGDB:127.0.0.1:5432:postgres:hongye:pwd' -T 'mtk.test_big_table' -R './diff' --sample-size 100 --sample-pct 0.1
```
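
To get a feel for how the two sampling parameters interact, the sketch below estimates how much data would be compared for one table. It is a rough illustration only: the table size is a hypothetical value, and treating the sampled share as a fraction of table size is an assumption, not a description of how MVD selects samples internally.

```shell
SAMPLE_SIZE_MB=100   # value passed to --sample-size
SAMPLE_PCT=0.1       # value passed to --sample-pct
TABLE_SIZE_MB=2048   # hypothetical table size in MB

if [ "$TABLE_SIZE_MB" -gt "$SAMPLE_SIZE_MB" ]; then
    # Table exceeds the threshold, so only a sample is compared.
    # awk performs the fractional multiplication that sh arithmetic lacks.
    COMPARED_MB=$(awk -v s="$TABLE_SIZE_MB" -v p="$SAMPLE_PCT" 'BEGIN { printf "%.1f", s * p }')
    MODE=sampled
else
    # Table is at or below the threshold, so all data is compared.
    COMPARED_MB=$TABLE_SIZE_MB
    MODE=full
fi

echo "$MODE: roughly ${COMPARED_MB} MB compared"
```

With these values the table is above the 100 MB threshold, so about a tenth of it would be compared; a 50 MB table would fall back to a full comparison.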