MVD Usage

MVD is command line tool can be run only in the Shell interface. You can query the command help information by running the -h command.


Introduction:
    MVD is a data verification tool for Heterogeneous Databases.

Options:
    -h, --help           : Show help message
    -v, --version        : Show tool version [3.5.1]
    -x, --debug          : Run in debug mode, means more output logs
        --debug-md5      : Debug for print data before calculator MD5
    -c, --config-file    : Using a config file with format json
        --mtk-config     : Using a config file from MTK tool with format json
    -C, --category       : Compare category: A=All, M=Metadata, D=Data
    -m, --mode           : Data compare mode, default is [R]
                           [R] Row mode, compare data row by row
                           [S] Summary mode, compare summary data, include row count and data signature
                           [A] Automatic mode, Compare summary and compare row when summary does not matched
    -d, --func-dimension : [Advanced Option] Functions used in data comparison
                           Default: avg:a,min:np,max:np,median:np
                           Format: <name>:<primary_type>    -- Same function name among all database
                                   <primary_type> := a|p|np, a = all, p = primary table, np = not [p]
                                                     can be ignored, then use default [a]
                           Format: <name>:<function_list>:<primary_type>
                                   <function_list> := <db_type> = <function_name> | <db_type> = <function_name>
                                   <db_type> := ORACLE|ORACLE2|DB2|MYSQL|POSTGRESQL|MOGDB|OPENGAUSS|SQLSERVER|INFORMIX
                           Example 'testmin:oracle=min|mogdb=min|mysql=min|db2=min:p'
    -s, --source-db      : Source database to be verified
                           Format: <db_type>:<ip>:<port>:<name>:<user>:<password>
                                   <db_type> := ORACLE|ORACLE2|DB2|MYSQL|POSTGRESQL|MOGDB|OPENGAUSS|SQLSERVER|INFORMIX
    -t, --target-db      : Target database to be verified, format see also '-s'
        --passwd-encrypt : DB password is encrypted
    -w, --workers        : Parallel workers (1~32), default: 8
    -W, --table-workers  : Parallel workers within one table when partition/sampling parallel used (1~32), default: 4
    -T, --table          : Check a single table (source table info)
                           Format: <owner>.<table_name>
                                   <source_owner>.<source_table>:<target_owner>.<target_table>
                             Tips: 1. if this option was specified, then '-i' and '-e' will be ignored
                                   2. Name with mixed-case means case sensitive
                                   3. Name with quotation mark means keep original case
    -i, --include        : Source White list, patterns used for object filter, all patterns combined with comma
                           Format: <type>:<owner>.<object_name>,...
                                   <type>:<object_name>
                                   <owner>.<object_name>
                                   <object_name>
                             Tips: 1. <type> can use: */%/TABLE/VIEW/SEQUENCE/PROCEDURE/FUNCTION/OTHERS
                                   2. Can use */% in <owner> and <object_name> field, means to match all
                                   3. Name is case insensitive
    -e, --exclude        : Source Black list, patterns used for object filter, format see also '-i'
    -r, --remap-schema   : Schema transformation in comparison
                           Format: <source_schema>:<target_schema>,<source_schema>:<target_schema>...
        --column-list    : Set valid column list for data comparison, combined with comma
                           Format: <colume_item>,<colume_item>,<colume_item>,...
                                   <colume_item> := <source_schema>.<source_name>.<source_colume>:<target_column>
                             Tips: 1. if column name does not changed then ':<target_column>' can be removed
                                   2. if no <source_schema> then it's the limitation for all tables with the same name in each schema
                                   3. if no <source_schema>.<table_name> then it's the limitation for all tables
                                   4. Name with mixed-case means case sensitive
                                   5. Global/Table column mapping/limitation can be used together
                                   6. If global column mapping/limitation used, then all tables columns must in column-list
                                   7. If only table column mapping/limitation used, the column-list scope is only for the table
        --data-filter    : Set data filter for comparison, combined with |
                           Format: <filter_1>|<source_schema>.<source_name>:<filter_source>:<filter_target>,...
                           Example: hongye.test_tab:created > sysdate - 356:created > now() - interval '365 days'
        --sample-size    : Minimal size in MB when using sample comparison (partial data comparison), default 10240 means >= 10GB
        --sample-pct     : Sample percent in sample comparison (value must between 0 and 1), default 1 means compare all data
        --detail-mode    : Result data in detail mode (show data even no differences found)
    -f, --result-file    : Result file, used to save result, default to print result to screen
    -F, --result-format  : Result file data format: json (default), plain
    -R, --row-dir        : Row directory for differences data (MD5 & KEY)
        --row-feedback   : Query row data when differences found, otherwise just key condition listed
        --ignore-float   : Ignore float data type in comparison
        --ora-float-prec : Oracle float precision in data comparison, Range: -1 ~ 128, Default: -1
        --float-prec     : Float precision in data comparison, Range: -1 ~ 128, Default: -1
        --double-prec    : Double precision in data comparison, Range: -1 ~ 128, Default: -1
        --fraction-prec  : Fraction precision in data comparison, Range: 0-6, Default: 6 (Informix is 5)
    -z, --zero-char      : Specify a char for chr(0) in comparison, Default is empty char
    -Z, --time-zone      : Specify timezone for DB client, set empty use local, default is UTC(+00:00)
    -l, --logfile        : Write output information to logfile
    -L, --license        : Specify license file, default is: ./license.json
        --apply-license  : Apply for a new license from server
        --upgrade        : Upgrade current binary MVD command
        --callback       : Use callback interface to get PID and result asynchronously
        --generate-repair: Whether to generate repair scripts for the target database
        --repair-compared: Whether to just repair compared, default is try to repair all columns matched
        --rtrim-varchar  : Whether to rtrim blanks after varchar data, by default blanks after varchar is kept

Usage:
    1. Apply a license
       ./mvd_linux_x86_64 --apply-license
    2. Verify a single schema  (Using MD5 ROW-BY-ROW)
       ./mvd_linux_x86_64 -s 'ORACLE:127.0.0.1:1521:orcl:scott:tiger' -t 'MOGDB:127.0.0.1:5432:postgres:hongye:pwd' -i 'HONGYE.*' -R './diff'

Command Line Option Description

Command Option	Description
-h, --help	Shows the help information of the tool.
-v, --version	Shows the current version of the tool.
-x, --debug	Enables the debug mode and outputs more detailed log content. This option is used only in the debugging process.
--debug-md5	Outputs the original value of the Python MD5 computing result. This option will open all data of a DB2/Oracle table and therefore large disk log space will be occupied. Please use it with caution.
-c, --config-file	Specifies configuration parameters (JSON format) used during program execution. Configuration parameters can replace the specific command line options.
--mtk-config	Specifies the configuration file (JSON format) used during MTK execution. It is used for verifying objects and table data involved in MTK synchronization.
-C, --catagory	Specifies the comparison category. A refers to comparison of all information. M refers to comparison of only object structure. D refers to comparison of only data.
-m, --mode	Specifies the comparison mode: R = Row Compare, S = Summary Compare, A = Automatic Compare
-d, --func-dimension	This is an advanced option. If you are not very clear about its meaning, do not use it. Specifies the dimension of the statistical function during data comparison. The input format is as follows: 1. <name>:<primary_type> 2. <name>:<function_list>:<primary_type> Where: The value of `primary_type` an be a (all tables), p (primary key tables), and np (non-key value tables). name indicates the function name. If the function name is the actual function used in a database, the name does not need to be specified. Otherwise, the name needs to be specified. function_list indicates the format of the function in different databases. The format is <db_type>=<function_name> \| <db_type>=<function_name> \| ... The value of db_type can be ORACLE, DB2, MYSQL, POSTGRESQL, MOGDB, OPENGAUSS, INFORMIX, and SQLSERVER.
-s, --source-db	Specifies the source database. The input format is <db_type>:<ip>:<port>:<name>:<user>:<password>. The value of `db_type` can be ORACLE, DB2, MYSQL, POSTGRESQL, MOGDB, OPENGAUSS, INFORMIX, and SQLSERVER.
-t, --target-db	Specifies the target database. The input format is the same as that of `-s`.
--passwd-encrypt	DB password is encrypted in MDB, default is not encrypted
-w, --workers	Specifies the number of concurrent processes during data comparison. The value ranges from 1 to 32. The default value is 8.
-T, --table	Specifies a single table to be compared. The input format is <owner>.<table_name> or <source_owner>.<source_table>:<target_owner>.<target_table> . Wildcard characters are not allowed to be used because they will conflict with `-i` and `-e`. This option supports wrapping with quotes (single, double, backquotes) to preserve the original case of schema and object names.
-i, --include	Specifies a list of objects to be included for comparison. You can specify multiple matching modes and separate them with commas. The matching format includes <type>:<owner> .<object_name>, <owner>.<object_name>, <object_name>, .... `` or `%` can be used in OWNER and OBJECT_NAME to mark wildcard characters. TYPE includes /%/TABLE/VIEW/SEQUENCE/PROCEDURE/FUNCTION/OTHERS.
-e, --exclude	Specifies the list of objects to be excluded. The format is the same as that of `-i`.
-r, --remap-schema	Specifies the mapping of schemas in the source and target databases during comparison. The schemas in the source and target databases have the same name but do not have the mapping relation by default.
--column-list	Specify to compare only the data in the fields listed within this option
--data-filter	Specify to compare only the data matchs the given filter within this option
--sample-size	Size threshold for dynamic sampling compares, and tables larger than the threshold are considered for dynamic sampling
--sample-pct	Percentage of dynamically sampled data, takes the value (0, 1]
--detail-mode	Displays detailed data comparison results, including tables without data difference
-f, --result-file	Specifies the comparison result file. By default, no file will be generated, and the result is shown in the command line window.
-F, --result-format	Specifies the comparison result format, including json and plain. The default value is plain, indicating the text format which is convenient for the user to read.
-R, --row-dir	Specifies the folder where the row difference result is generated upon the execution of the MD5 row-by-row comparison mode. For each table with a difference, a difference file is created in that directory.
--row-feedback	Specifies whether to show the different column data of a difference row (For primary key tables only). Only KEY of a difference row is shown by default. (The KEY indicates ROWID in Oracle and CTID in PostgreSQL.)
--ignore-float	Specifies whether to ignore the floating-point type (float, double, real, and other non-precise types) during data comparison. The floating-point type is not ignored by default.
-z, --zero-char	Specifies the replacing character of the chr(0) character during comparison. The default value is null, which is to remove the chr(0) invisible characters.
-Z, --time-zone	Specifies the time zone of the client data query, set the empty string to use the local OS time zone, if not set, use UTC (+00:00) time zone
-l, --logfile	Specifies the log file for tool running.
-L, --license	Specify the location of the license file, if license.json is not in the current directory, you need to specify it manually with this option
--apply-license	Apply for a license
--upgrade	Upgrade current binary MVD command
--callback	Use callback interface to get PID and result asynchronously
--generate-repair	Whether to generate repair scripts for the target database
--repair-compared	Whether to just repair compared, default is try to repair all columns matched
--rtrim-varchar	Whether to rtrim blanks after varchar data, by default blanks after varchar is kept

Examples of Common Commands

The following lists command examples in common scenarios (using the Linux x86_64 2.0 as an example).

Note: The method of comparing statistical eigenvalues is inefficient. It is recommended to use the MD5 row-by-row comparison method.

Apply for a license
```
./mvd_linux_x86_64 --apply-license
```
You need to enter the email address for receive the license during execution.
Compare with a config file

Execute comparison task with a pre-edited configuration file. For configuration instructions of the specific configuration file, please refer to: MVD Configuration
```
./mvd_linux_x86_64 -c config.json
```

Compare the structures and data between Oracle and MogDB.

./mvd_linux_x86_64 -s 'ORACLE:127.0.0.1:1521:orcl:hongye:pwd' -t 'MOGDB:127.0.0.1:5432:omm:hongye:pwd' -i 'mtk.*' -R './diff'

Compare the structures and data between DB2 to MogDB.

./mvd_linux_x86_64 -s 'DB2:127.0.0.1:50000:HONGYE:db2inst1:pwd' -t 'MOGDB:127.0.0.1:5432:db2_mtk1:hongye:pwd' -i 'mtk.*' -R './diff'

Compare the structures and data between MySQL to MogDB.

./mvd_linux_x86_64 -s 'MYSQL:127.0.0.1:3306:hongye:root:pwd' -t 'MOGDB:127.0.0.1:5432:mysql_mtk:hongye:pwd' -i 'mtk.*' -R './diff'

Compare data in a table (eigenvalue comparison).

./mvd_linux_x86_64 -s 'ORACLE:127.0.0.1:1521:orcl:scott:tiger' -t 'MOGDB:127.0.0.1:5432:postgres:hongye:pwd' -T 'HONGYE.TEST'

Compare data and structures and precisely recognize rows that involve difference.

./mvd_linux_x86_64 -s 'ORACLE:127.0.0.1:1521:orcl:scott:tiger' -t 'MOGDB:127.0.0.1:5432:postgres:hongye:pwd' -i 'mtk.*' -R './diff'

Perform data verification after migration according to the MTK configuration file.
```
./mvd_linux_x86_64 --mtk-config oracle2opengauss.json
```
Compare a table using sampling slice.

Dynamic sampling parameter --sample-size controls sampling threshold: 100 means use sampling when table size is bigger than 100MB, and --sample-pct controls sampling percentage: 0.1 means compare 10% of whole table data.
```
./mvd_linux_x86_64 -s 'ORACLE:127.0.0.1:1521:orcl:scott:tiger' -t 'MOGDB:127.0.0.1:5432:postgres:hongye:pwd' -t 'mtk.test_big_table' -R './diff' --sample-size 100 --sample-pct 0.1
```