MVD Usage
MVD is a command-line tool that runs only in a shell. To view the command help information, run it with the -h option.
Introduction:
MVD is a data verification tool for Heterogeneous Databases.
Options:
-h, --help : Show help message
-v, --version : Show tool version [3.5.1]
-x, --debug : Run in debug mode, means more output logs
--debug-md5 : Debug for print data before calculator MD5
-c, --config-file : Using a config file with format json
--mtk-config : Using a config file from MTK tool with format json
-C, --category : Compare category: A=All, M=Metadata, D=Data
-m, --mode : Data compare mode, default is [R]
[R] Row mode, compare data row by row
[S] Summary mode, compare summary data, include row count and data signature
[A] Automatic mode, Compare summary and compare row when summary does not matched
-d, --func-dimension : [Advanced Option] Functions used in data comparison
Default: avg:a,min:np,max:np,median:np
Format: <name>:<primary_type> -- Same function name among all database
<primary_type> := a|p|np, a = all, p = primary table, np = not [p]
can be ignored, then use default [a]
Format: <name>:<function_list>:<primary_type>
<function_list> := <db_type> = <function_name> | <db_type> = <function_name>
<db_type> := ORACLE|ORACLE2|DB2|MYSQL|POSTGRESQL|MOGDB|OPENGAUSS|SQLSERVER|INFORMIX
Example 'testmin:oracle=min|mogdb=min|mysql=min|db2=min:p'
-s, --source-db : Source database to be verified
Format: <db_type>:<ip>:<port>:<name>:<user>:<password>
<db_type> := ORACLE|ORACLE2|DB2|MYSQL|POSTGRESQL|MOGDB|OPENGAUSS|SQLSERVER|INFORMIX
-t, --target-db : Target database to be verified, format see also '-s'
--passwd-encrypt : DB password is encrypted
-w, --workers : Parallel workers (1~32), default: 8
-W, --table-workers : Parallel workers within one table when partition/sampling parallel used (1~32), default: 4
-T, --table : Check a single table (source table info)
Format: <owner>.<table_name>
<source_owner>.<source_table>:<target_owner>.<target_table>
Tips: 1. if this option was specified, then '-i' and '-e' will be ignored
2. Name with mixed-case means case sensitive
3. Name with quotation mark means keep original case
-i, --include : Source White list, patterns used for object filter, all patterns combined with comma
Format: <type>:<owner>.<object_name>,...
<type>:<object_name>
<owner>.<object_name>
<object_name>
Tips: 1. <type> can use: */%/TABLE/VIEW/SEQUENCE/PROCEDURE/FUNCTION/OTHERS
2. Can use */% in <owner> and <object_name> field, means to match all
3. Name is case insensitive
-e, --exclude : Source Black list, patterns used for object filter, format see also '-i'
-r, --remap-schema : Schema transformation in comparison
Format: <source_schema>:<target_schema>,<source_schema>:<target_schema>...
--column-list : Set valid column list for data comparison, combined with comma
Format: <colume_item>,<colume_item>,<colume_item>,...
<colume_item> := <source_schema>.<source_name>.<source_colume>:<target_column>
Tips: 1. if column name does not changed then ':<target_column>' can be removed
2. if no <source_schema> then it's the limitation for all tables with the same name in each schema
3. if no <source_schema>.<table_name> then it's the limitation for all tables
4. Name with mixed-case means case sensitive
5. Global/Table column mapping/limitation can be used together
6. If global column mapping/limitation used, then all tables columns must in column-list
7. If only table column mapping/limitation used, the column-list scope is only for the table
--data-filter : Set data filter for comparison, combined with |
Format: <filter_1>|<source_schema>.<source_name>:<filter_source>:<filter_target>,...
Example: hongye.test_tab:created > sysdate - 365:created > now() - interval '365 days'
--sample-size : Minimal size in MB when using sample comparison (partial data comparison), default 10240 means >= 10GB
--sample-pct : Sample percent in sample comparison (value must between 0 and 1), default 1 means compare all data
--detail-mode : Result data in detail mode (show data even no differences found)
-f, --result-file : Result file, used to save result, default to print result to screen
-F, --result-format : Result file data format: json (default), plain
-R, --row-dir : Row directory for differences data (MD5 & KEY)
--row-feedback : Query row data when differences found, otherwise just key condition listed
--ignore-float : Ignore float data type in comparison
--ora-float-prec : Oracle float precision in data comparison, Range: -1 ~ 128, Default: -1
--float-prec : Float precision in data comparison, Range: -1 ~ 128, Default: -1
--double-prec : Double precision in data comparison, Range: -1 ~ 128, Default: -1
--fraction-prec : Fraction precision in data comparison, Range: 0-6, Default: 6 (Informix is 5)
-z, --zero-char : Specify a char for chr(0) in comparison, Default is empty char
-Z, --time-zone : Specify timezone for DB client, set empty use local, default is UTC(+00:00)
-l, --logfile : Write output information to logfile
-L, --license : Specify license file, default is: ./license.json
--apply-license : Apply for a new license from server
--upgrade : Upgrade current binary MVD command
--callback : Use callback interface to get PID and result asynchronously
--generate-repair: Whether to generate repair scripts for the target database
--repair-compared: Whether to just repair compared, default is try to repair all columns matched
--rtrim-varchar : Whether to rtrim blanks after varchar data, by default blanks after varchar is kept
Usage:
1. Apply a license
./mvd_linux_x86_64 --apply-license
2. Verify a single schema (Using MD5 ROW-BY-ROW)
./mvd_linux_x86_64 -s 'ORACLE:127.0.0.1:1521:orcl:scott:tiger' -t 'MOGDB:127.0.0.1:5432:postgres:hongye:pwd' -i 'HONGYE.*' -R './diff'
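The -s and -t values follow the format <db_type>:<ip>:<port>:<name>:<user>:<password>. As a minimal sketch, the connection string can be assembled and checked before MVD is invoked. The helper name mvd_conn is hypothetical and not part of MVD. It accepts only the upper-case db_type spellings listed in the help text, and how MVD treats a ':' inside a password is not covered here.

```shell
# mvd_conn: hypothetical helper that builds a connection string in the
# documented form <db_type>:<ip>:<port>:<name>:<user>:<password>.
mvd_conn() {
    if [ "$#" -ne 6 ]; then
        echo "usage: mvd_conn <db_type> <ip> <port> <name> <user> <password>" >&2
        return 2
    fi
    # Accept only the db_type values listed in the help output.
    case "$1" in
        ORACLE|ORACLE2|DB2|MYSQL|POSTGRESQL|MOGDB|OPENGAUSS|SQLSERVER|INFORMIX) ;;
        *) echo "unsupported db_type: $1" >&2; return 1 ;;
    esac
    printf '%s:%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

src=$(mvd_conn ORACLE 127.0.0.1 1521 orcl scott tiger)
tgt=$(mvd_conn MOGDB 127.0.0.1 5432 postgres hongye pwd)

# Print the command instead of running it, so the sketch works without the binary.
echo "./mvd_linux_x86_64 -s '$src' -t '$tgt' -i 'HONGYE.*' -R './diff'"
```

Rejecting an unknown db_type up front gives a clearer error than a failed connection attempt later.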
Command Line Option Description
Command Option | Description |
---|---|
-h, --help | Shows the help information of the tool. |
-v, --version | Shows the current version of the tool. |
-x, --debug | Enables the debug mode and outputs more detailed log content. This option is used only in the debugging process. |
--debug-md5 | Outputs the original data before the MD5 value is calculated. This option writes all data of a DB2/Oracle table to the log and can therefore occupy a large amount of disk space. Use it with caution. |
-c, --config-file | Specifies configuration parameters (JSON format) used during program execution. Configuration parameters can replace the specific command line options. |
--mtk-config | Specifies the configuration file (JSON format) used during MTK execution. It is used for verifying objects and table data involved in MTK synchronization. |
-C, --category | Specifies the comparison category. A compares all information. M compares only object structure (metadata). D compares only data. |
-m, --mode | Specifies the comparison mode: R = Row Compare, S = Summary Compare, A = Automatic Compare |
-d, --func-dimension | This is an advanced option. Do not use it unless you fully understand its meaning. Specifies the statistical functions used as dimensions during data comparison. The input format is one of the following: 1. <name>:<primary_type> 2. <name>:<function_list>:<primary_type> Where: primary_type can be a (all tables), p (primary key tables), or np (tables without a primary key); if omitted, a is used. name indicates the function name. If the same function name is used in every database, function_list does not need to be specified; otherwise, it must be specified. function_list gives the function name in each database, in the format <db_type>=<function_name> | <db_type>=<function_name> | ... The value of db_type can be ORACLE, DB2, MYSQL, POSTGRESQL, MOGDB, OPENGAUSS, INFORMIX, and SQLSERVER. |
-s, --source-db | Specifies the source database. The input format is <db_type>:<ip>:<port>:<name>:<user>:<password>. The value of db_type can be ORACLE, DB2, MYSQL, POSTGRESQL, MOGDB, OPENGAUSS, INFORMIX, and SQLSERVER. |
-t, --target-db | Specifies the target database. The input format is the same as that of -s . |
--passwd-encrypt | DB password is encrypted in MDB, default is not encrypted |
-w, --workers | Specifies the number of concurrent processes during data comparison. The value ranges from 1 to 32. The default value is 8. |
-T, --table | Specifies a single table to be compared. The input format is <owner>.<table_name> or <source_owner>.<source_table>:<target_owner>.<target_table> . Wildcard characters are not allowed to be used because they will conflict with -i and -e . This option supports wrapping with quotes (single, double, backquotes) to preserve the original case of schema and object names. |
-i, --include | Specifies a list of objects to be included for comparison. You can specify multiple patterns, separated by commas. The supported pattern formats are <type>:<owner>.<object_name>, <type>:<object_name>, <owner>.<object_name>, and <object_name>. * or % can be used as a wildcard in OWNER and OBJECT_NAME. TYPE can be */%/TABLE/VIEW/SEQUENCE/PROCEDURE/FUNCTION/OTHERS. |
-e, --exclude | Specifies the list of objects to be excluded. The format is the same as that of -i . |
-r, --remap-schema | Specifies the mapping between schemas in the source and target databases during comparison. By default, no mapping is applied and a schema is compared with the schema of the same name in the target database. |
--column-list | Compares only the data in the columns listed in this option. |
--data-filter | Compares only the data that matches the filter given in this option. |
--sample-size | Size threshold for dynamic sampling compares, and tables larger than the threshold are considered for dynamic sampling |
--sample-pct | Percentage of dynamically sampled data, takes the value (0, 1] |
--detail-mode | Displays detailed data comparison results, including tables without data difference |
-f, --result-file | Specifies the comparison result file. By default, no file will be generated, and the result is shown in the command line window. |
-F, --result-format | Specifies the comparison result format, including json and plain. The default value is plain, indicating the text format which is convenient for the user to read. |
-R, --row-dir | Specifies the folder where the row difference result is generated upon the execution of the MD5 row-by-row comparison mode. For each table with a difference, a difference file is created in that directory. |
--row-feedback | Specifies whether to show the different column data of a difference row (For primary key tables only). Only KEY of a difference row is shown by default. (The KEY indicates ROWID in Oracle and CTID in PostgreSQL.) |
--ignore-float | Specifies whether to ignore the floating-point type (float, double, real, and other non-precise types) during data comparison. The floating-point type is not ignored by default. |
-z, --zero-char | Specifies the character that replaces chr(0) during comparison. The default is an empty character, which removes the invisible chr(0) characters. |
-Z, --time-zone | Specifies the time zone used by the client when querying data. Set it to an empty string to use the local OS time zone. If it is not set, the UTC (+00:00) time zone is used. |
-l, --logfile | Specifies the log file for tool running. |
-L, --license | Specifies the location of the license file. If license.json is not in the current directory, you need to specify it manually with this option. |
--apply-license | Apply for a license |
--upgrade | Upgrade current binary MVD command |
--callback | Use callback interface to get PID and result asynchronously |
--generate-repair | Whether to generate repair scripts for the target database |
--repair-compared | Whether to just repair compared, default is try to repair all columns matched |
--rtrim-varchar | Whether to rtrim blanks after varchar data, by default blanks after varchar is kept |
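The two -d, --func-dimension formats can be hard to read at a glance. The following sketch splits a single spec into its parts according to the formats described above. The helper parse_func_dimension is hypothetical and not part of MVD. Treating a spec of the form <name>:<function_list> (no primary_type) as defaulting to a is an assumption based on the statement that primary_type can be omitted.

```shell
# parse_func_dimension: hypothetical helper that splits one func-dimension
# spec into name, function list and primary_type.
parse_func_dimension() {
    spec=$1
    name=${spec%%:*}          # text before the first ':'
    rest=${spec#"$name"}      # remainder, empty or starting with ':'
    rest=${rest#:}
    funcs=""
    ptype="a"                 # documented default when primary_type is omitted
    case "$rest" in
        "") ;;                                       # <name>
        a|p|np) ptype=$rest ;;                       # <name>:<primary_type>
        *:*) funcs=${rest%:*}; ptype=${rest##*:} ;;  # <name>:<function_list>:<primary_type>
        *) funcs=$rest ;;                            # <name>:<function_list> (assumed default a)
    esac
    case "$ptype" in
        a|p|np) ;;
        *) echo "invalid primary_type: $ptype" >&2; return 1 ;;
    esac
    [ -n "$funcs" ] || funcs="(same name in all databases)"
    printf 'name=%s functions=%s primary_type=%s\n' "$name" "$funcs" "$ptype"
}

# Walk through the documented default value, one item per line.
printf '%s\n' 'avg:a,min:np,max:np,median:np' | tr ',' '\n' |
while read -r item; do
    parse_func_dimension "$item"
done

# The example from the help text, with a per-database function list.
parse_func_dimension 'testmin:oracle=min|mogdb=min|mysql=min|db2=min:p'
```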
Examples of Common Commands
The following lists command examples in common scenarios (using the Linux x86_64 2.0 as an example).
Note: The method of comparing statistical eigenvalues is inefficient. It is recommended to use the MD5 row-by-row comparison method.
-
Apply for a license
./mvd_linux_x86_64 --apply-license
During execution, you need to enter the email address at which you want to receive the license.
-
Compare with a config file
Run a comparison task with a configuration file prepared in advance. For details about the configuration file, see MVD Configuration.
./mvd_linux_x86_64 -c config.json
-
Compare the structures and data between Oracle and MogDB.
./mvd_linux_x86_64 -s 'ORACLE:127.0.0.1:1521:orcl:hongye:pwd' -t 'MOGDB:127.0.0.1:5432:omm:hongye:pwd' -i 'mtk.*' -R './diff'
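The -i 'mtk.*' pattern selects every object owned by mtk. As a rough illustration of the documented matching rules (* and % act as wildcards, names are case insensitive), the sketch below checks object names against an <owner>.<object_name> pattern. The helper matches_pattern is hypothetical. It approximates the rules with shell globbing, does not handle the optional <type>: prefix, and may differ from MVD's actual matching in edge cases.

```shell
# matches_pattern: hypothetical helper approximating -i/-e pattern matching.
# Both '*' and '%' match any sequence; comparison is case insensitive.
matches_pattern() {
    pattern=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr '%' '*')
    object=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    # The unquoted $pattern is interpreted as a glob by 'case'.
    case "$object" in
        $pattern) return 0 ;;
        *) return 1 ;;
    esac
}

for obj in MTK.ORDERS mtk.customers hr.employees; do
    if matches_pattern 'mtk.*' "$obj"; then
        echo "$obj: included"
    else
        echo "$obj: skipped"
    fi
done
```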
-
Compare the structures and data between DB2 and MogDB.
./mvd_linux_x86_64 -s 'DB2:127.0.0.1:50000:HONGYE:db2inst1:pwd' -t 'MOGDB:127.0.0.1:5432:db2_mtk1:hongye:pwd' -i 'mtk.*' -R './diff'
-
Compare the structures and data between MySQL and MogDB.
./mvd_linux_x86_64 -s 'MYSQL:127.0.0.1:3306:hongye:root:pwd' -t 'MOGDB:127.0.0.1:5432:mysql_mtk:hongye:pwd' -i 'mtk.*' -R './diff'
-
Compare data in a table (eigenvalue comparison).
./mvd_linux_x86_64 -s 'ORACLE:127.0.0.1:1521:orcl:scott:tiger' -t 'MOGDB:127.0.0.1:5432:postgres:hongye:pwd' -T 'HONGYE.TEST'
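The -m option selects how data is compared, and its documented default is row mode [R]. Assuming that "eigenvalue comparison" refers to the summary mode [S] from the help text (row count plus data signature), the mode can be passed explicitly. The sketch below only prints the resulting commands. mvd_table_cmd is a hypothetical helper, and the connection strings are the sample values used in this document.

```shell
# mvd_table_cmd: hypothetical helper that prints a single-table MVD command
# for a given compare mode (R, S or A, as listed under -m/--mode).
mvd_table_cmd() {
    mode=$1    # compare mode
    table=$2   # <owner>.<table_name>
    case "$mode" in
        R|S|A) ;;
        *) echo "mode must be R, S or A" >&2; return 1 ;;
    esac
    printf "./mvd_linux_x86_64 -s '%s' -t '%s' -T '%s' -m %s\n" \
        'ORACLE:127.0.0.1:1521:orcl:scott:tiger' \
        'MOGDB:127.0.0.1:5432:postgres:hongye:pwd' \
        "$table" "$mode"
}

mvd_table_cmd S 'HONGYE.TEST'   # summary comparison only
mvd_table_cmd A 'HONGYE.TEST'   # summary first, rows when the summary differs
```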
-
Compare data and structures, and pinpoint the rows that differ.
./mvd_linux_x86_64 -s 'ORACLE:127.0.0.1:1521:orcl:scott:tiger' -t 'MOGDB:127.0.0.1:5432:postgres:hongye:pwd' -i 'mtk.*' -R './diff'
-
Perform data verification after migration according to the MTK configuration file.
./mvd_linux_x86_64 --mtk-config oracle2opengauss.json
-
Compare a table using sampling.
The dynamic sampling parameter --sample-size sets the sampling threshold: 100 means sampling is used when the table is larger than 100 MB. The --sample-pct parameter sets the sampling percentage: 0.1 means 10% of the table data is compared.
./mvd_linux_x86_64 -s 'ORACLE:127.0.0.1:1521:orcl:scott:tiger' -t 'MOGDB:127.0.0.1:5432:postgres:hongye:pwd' -T 'mtk.test_big_table' -R './diff' --sample-size 100 --sample-pct 0.1
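To get a feel for how --sample-size and --sample-pct interact, the following sketch estimates how much data would be compared for a table of a given size. The helper sample_plan is hypothetical and only restates the arithmetic described above. Whether MVD applies the threshold as ">" or ">=" is an assumption (">=" follows the help text), and the actual sampled volume may differ.

```shell
# sample_plan: hypothetical helper estimating the compared data volume.
# Arguments: table size in MB, --sample-size in MB, --sample-pct in (0, 1].
sample_plan() {
    table_mb=$1; sample_size_mb=$2; sample_pct=$3
    awk -v t="$table_mb" -v s="$sample_size_mb" -v p="$sample_pct" 'BEGIN {
        if (p <= 0 || p > 1) { print "sample-pct must be in (0, 1]"; exit 1 }
        # Sampling applies only at or above the threshold and when pct < 1.
        if (t >= s && p < 1) printf "sampled: about %.0f MB of %d MB\n", t * p, t
        else                 printf "full: all %d MB\n", t
    }'
}

sample_plan 50 100 0.1     # below the threshold: compared in full
sample_plan 2048 100 0.1   # above the threshold: roughly 10% is compared
```

With the default --sample-pct of 1, every table is compared in full regardless of its size.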