MVD Configurations

MVD can executing comparison task by a configuration file, which is a file in json format.

Description of configuration options

Options	Type	Defaults	Description
SOURCE_DB	dict		Source database connection, including type, host, port, service, user, password
TARGET_DB	dict		Target database connection, including type, host, port, service, user, password
INCLUDE	list		Object whitelist for comparison, can use object_type, schema, object_name, and corresponding matching expressions
EXCLUDE	list		Object blacklist for comparison, can use object_type, schema, object_name, and corresponding matching expressions
TABLE	str		Single table comparison, if this option used, then INCLUDE/EXCLUDE will be ignored
ROW_DIR	str	/tmp/mvd_diff	Row difference directory, each table has one file if data difference found
CATEGORY	str	all	Comparison category, can compare metadata (meta) or data (data) separately
MODE	str	Row	Comparison mode, can be: Summary Mode ([S]ummary), Row Mode ([R]ow), Auto Mode ([A]utomatic), default is Row
WORKERS	int	8	Table parallel workers, valid range is: (0, 32]
TABLE_WORKERS	int	4	Parallels within a table when partition/sampling/data-filter used, valid range is: (0, 32]
DATA_FILTER	dict		Specify filter conditions, either for a single table or for a group of tables, either a single filter condition (partial comparison) or multiple filters (manual concurrency)
SAMPLE_SIZE	int	10240	Dynamic sampling threshold, the table size exceeds this threshold can considering dynamic sampling, the unit is: MB
SAMPLE_PCT	int	1	Dynamic sampling ratio, the range of values: (0, 1], the value of 1 for the full table sampling
REMAP_SCHEMA	dict		Use this option to specify the schema mapping if there is a schema inconsistency between the two databases
COLUMN_LIST	dict		Specify the range of column names to be compared, and the mapping relationship of the column names
SKIP_COLUMN_TYPE	dict	{}	Skip the specified column type, specify to skip the structure comparison, skip the data comparison, or both
SKIP_COLUMN_NAME	dict	{}	Skip the specified column name, specify to skip the structure comparison, skip the data comparison, or both
IGNORE_FLOAT	bool	false	Whether to ignore floating point columns during comparing, floating point columns may have precision differences which can caused failure of line comparison
ORA_FLOAT_PRECISION	int	-1	Specify Oracle float precision, the data with float datatype in Oracle will be rounded during comparing
FLOAT_PRECISION	int	-1	Specify float precision, the data with float datatype will be rounded during comparing
DOUBLE_PRECISION	int	-1	Specify double precision, the data with double datatype will be rounded during comparing
FRACTION_PRECISION	int	6	Specify millisecond precision, Informix range is [0, 5] and other database range is [0, 6]
ZERO_CHAR	str		Specify the replacement character for the null chr(0) character, the default is to remove the chr(0) character directly
TIME_ZONE	str	+00:00	Specify the time zone to be used in the database client when comparing, ensure that both time zones are the same
EXCLUDE_SYS_TABLE	list	[]	Exclude specified system tables
RTRIM_VARCHAR	bool	false	Whether to remove trailing spaces from variable length strings before comparison
LOCAL_MD5	bool	false	Whether to force calculate md5 locally in Python program
CALLBACK	str		Specify the result callback URL, support getting the execution result by callback
RUN_MODE	str	INFO	Specify the mode of execution, the logs for different modes may vary greatly, support: INFO/DEBUG/WARNING/ERROR/CRITICAL, etc
DEBUG_MD5	bool	false	For DEBUG mode of MD5 data comparison, this option will output logs of the same order of magnitude as the number of rows in the table when enabled, do not enable it unless necessary
LOGFILE	str	mvd_logfile.log	Output the log to the specified log file
PASSWD_ENCRYPT	bool	false	The database user password is encrypted by MDB and needs to be decrypted to use
MTK_CONFIG	str		Specify the MTK configuration file and support data comparison after migration by parsing the necessary parameters in the MTK configuration file
RESULT_FILE	str		Specify the output location of the result file, the default is only output to the screen, no result file is generated
RESULT_FORMAT	str	json	Specify the format of result file, support：json, plain
DETAIL_MODE	bool	false	Whether to output the data comparison result in detailed mode, detailed mode will output the data comparison result of all tables, default concise mode will only output the comparison result of tables with differences
ROW_FEEDBACK	bool	false	Try to found column data differences for tables with primary key, and write differences into 'DIFF_DETAIL' field in diff result file
GENERATE_REPAIR	bool	false	Whether to generate repair scripts for the target database
REPAIR_COMPARED	bool	false	Whether to repair all columns matched, default is to repair all columns matched on both sides
FUNCTION_DIMENSION	dict		The old column statistics value comparison dimension, not recommended and may be deprecated in the future
LICENSE_FILE	str	./license.json	Specify the location of the license file

A sample configuration file is as follows (single line comments with // are supported in the configuration file):

{
    // Mandantory configurations
    // =============================================================================================
    "SOURCE_DB":                // Source Database connection information
    {
        "TYPE": "ORACLE",       // DB Type: ORACLE|ORACLE2|DB2|MYSQL|POSTGRESQL|MOGDB|OPENGAUSS|SQLSERVER|INFORMIX
        "HOST": "127.0.0.1",    // DB Host IP
        "PORT": "1521",         // DB Port
        "SERVICE": "orcl",      // DB Name/Service
        "USER": "",             // DB User
        "PASSWORD": ""          // DB Password
    },
    "TARGET_DB":                // Target Database connection information
    {
        "TYPE": "MOGDB",        // DB Type: ORACLE|ORACLE2|DB2|MYSQL|POSTGRESQL|MOGDB|OPENGAUSS|SQLSERVER|INFORMIX
        "HOST": "127.0.0.1",    // DB Host IP
        "PORT": "26000",        // DB Port
        "SERVICE": "postgres",  // DB Name/Service
        "USER": "omm",          // DB User
        "PASSWORD": ""          // DB Password
    },
    "INCLUDE":                  // Source white list to be compared
    [                                                   // Tips: 1. <OBJECT_TYPE> can use: %/TABLE/VIEW/SEQUENCE/PROCEDURE/FUNCTION/OTHERS
        // ["OBJECT_TYPE", "SCHEMA", "OBJECT_NAME"],    //       2. Can use % in <SCHEMA> and <OBJECT_NAME> field, means to match all
        // ["%", "HONGYE", "%"]                         //       3. Name is case insensitive
    ],
    "EXCLUDE": [],              // Source black list to be compared, format like "INCLUDE"
    "TABLE": "",                // Single table comparison, when this option used, then "INCLUDE" and "EXCLUDE" are ignored
    // "TABLE": "source_schema.source_table:target_schema.target_table",  // source and target can have different schema or table name
    "ROW_DIR": "/tmp/mvd_diff",   // Row differences data directory, each diff table has a file in this directory
    "CATEGORY": "all",          // Compare category: all, meta, data
    "MODE": "Row",              // Data compare mode, default is [R]
                                //   [R] Row mode, compare data row by row
                                //   [S] Summary mode, compare summary data, include row count and data signature
                                //   [A] Automatic mode, Compare summary and compare row when summary does not matched


    // Optional configurations
    // =============================================================================================
    // Performance related configurations
    "WORKERS": 8,               // Parallel between tables
    "TABLE_WORKERS": 4,         // Parallel within a table (split by partition/sampling)
    "DATA_FILTER":              // Customer defined data filter for data comparison
    {
        // "*": [["id < 100", "id < 100"]]
        // "HONGYE.TEST_BIG_PART_BY_ID": [["ID < 10000", "ID < 10000"],
        //                                ["ID between 10000 and 20000", "ID between 10000 and 20000"]
        //                               ]

        // Follow 3 configurations does the same thing:
        // "*":["col_id < 100", "col_id > 100 and col_id < 200"]
        // "*":[["col_id < 100"], ["col_id > 100 and col_id < 200"]]
        // "*":[["col_id < 100", "col_id < 100"], ["col_id > 100 and col_id < 200", "col_id > 100 and col_id < 200"]]
    },
    "SAMPLE_SIZE": 10240,       // Sampling threshold in MB, default 10240MB, 0 means do not split data
    "SAMPLE_PCT": 1,            // Sampling percentage, values between 0 and 1, 1 means split and compare all data

    // Remapping configurations
    "REMAP_SCHEMA":             // Remap schema in comparison: Source:Target
    {
        // "SOURCE_SCHEMA": "TARGET_SCHEMA"
    },
    "COLUMN_LIST":              // Column limitation and remapping
    {
         // "*": [["SOURCE_COLUMN_FOR_ALL_TABLE", "TARGET_COLUMN_FOR_ALL_TABLE"]],
         // "HONGYE.TEST_BIG_PART_BY_ID": [["OBJECT_NAME", "NAME"]],
         // "SCHEMA.TABLE_NAME": [["SOURCE_COLUMN_NAME", "TARGET_COLUMN_NAME"], ["SOURCE_COLUMN_NAME_2", "TARGET_COLUMN_NAME_2_NEW"]]

        // Follow 3 configurations does the same thing:
        // "*":["COL_INT", "COL_SMALLINT"]
        // "*":[["COL_INT"], ["COL_SMALLINT"]]
        // "*":[["COL_INT", "COL_INT"], ["COL_SMALLINT", "COL_SMALLINT"]]
    },

    // Data related configurations
    "IGNORE_FLOAT": false,      // Ignore float column in data comparison
    "ORA_FLOAT_PRECISION": -1,  // Oracle float precision (-1 ~ 128), -1 means automatic round to column "precision - 1"
    "FLOAT_PRECISION": -1,      // Float precision (-1 ~ 128), -1 means automatic round to column "precision - 1"
    "DOUBLE_PRECISION": -1,     // Double precision (-1 ~ 128), -1 means automatic round to column "precision - 1"
    "FRACTION_PRECISION": 6,    // Time microseconds fraction, informix max to 5
    "ZERO_CHAR": "",            // Replace chr(0) with given char
    "TIME_ZONE": "+00:00",
    "EXCLUDE_SYS_TABLE": [],    // Exclude system object: object_name or schema.object_name. case insensitive
    "RTRIM_VARCHAR": false,     // Client time zone, default depend to server setting

    // Other configurations
    "CALLBACK": "",
    "RUN_MODE": "INFO",         // Log mode: INFO, DEBUG
    "DEBUG_MD5": false,         // Debug MD5 data before Python Calculation and Comparison
    "LOGFILE": "",
    "PASSWD_ENCRYPT": false,    // DB Password are encrypted
    "MTK_CONFIG": "",           // MTK config file
    "RESULT_FILE": "",          // Write result to a file
    "RESULT_FORMAT": "json",    // Result format: json, plain
    "DETAIL_MODE": false,       // Default only list table with data differences, detail mode will list all tables in data comparison
    "ROW_FEEDBACK": false,      // Try to found column data differences for tables with primary key, and write differences into 'DIFF_DETAIL' field in diff result file
    "GENERATE_REPAIR": false,   // Whether to generate repair scripts for the target database
    "REPAIR_COMPARED": false,   // Whether to repair all columns matched, default is to repair all columns matched on both sides
    "ROW_FEEDBACK": false,      // Reversed for feature use
    "FUNCTION_DIMENSION":       // Column summary statistics comparison
    {
        "AVG": {"TYPE": "A", "FUNCTIONS": {"*": "AVG"}},
        "MIN": {"TYPE": "NP", "FUNCTIONS": {"*": "MIN"}},
        "MAX": {"TYPE": "NP", "FUNCTIONS": {"*": "MAX"}},
        "MEDIAN": {"TYPE": "NP", "FUNCTIONS": {"*": "MEDIAN"}}
    },
    "LICENSE_FILE": "./license.json"
}

INCLUDE/EXCLUDE

The scope of the comparison can be specified by the INCLUDE/EXCLUDE option, one is a whitelist and the other is a blacklist, the comparison will be executed if the object is in the whitelist and not in the blacklist.

Both are two-dimensional arrays of configuration items with the following configuration rules:

Each array in the inner layer corresponds to an object rule
The inner array contains 3 elements, corresponding to the object type, schema and object name
The % can be used in the inner data element to indicate a generic matching rule, similar to the matching rule in a database LIKE operation
Case-insensitive data in names

Example:

    "INCLUDE":                  // Source white list to be compared
    [                                                   // Tips: 1. <OBJECT_TYPE> can use: %/TABLE/VIEW/SEQUENCE/PROCEDURE/FUNCTION/OTHERS
        // ["OBJECT_TYPE", "SCHEMA", "OBJECT_NAME"],    //       2. Can use % in <SCHEMA> and <OBJECT_NAME> field, means to match all
        // ["%", "HONGYE", "%"]                         //       3. Name is case insensitive
    ],
    "EXCLUDE": [],              // Source black list to be compared, format like "INCLUDE"

TABLE

Single table comparison mode, the INCLUDE/EXCLUDE option is disabled when this option is used.

In single table mode, schema.table_name must be specified, supporting two formats:

If both schema and table_name are the same in the source and target databases, use schema.table_name
If there is a difference in schema or table_name between the source and target databases, use source_schema.source_table:target_schema.target_table

DATA_FILTER

By specifying filter conditions, the following two requirements can be achieved:

For large tables, compare only partial of the data, for example, only some of the more recent data
For large tables, ranges can be manually divided according to a column, and data can be compared between multiple ranges concurrently

This option is a dictionary structure:

KEY is the table information: [schema.]table_name, or you can use * to enable the filter for all tables in the range
VALUE is a two-dimensional array of filter conditions, the inner array elements are specific filter conditions, containing two elements: source conditions, target conditions, if source filter is the same as target filter, then the inner array can contains only one element or using string directly.

Example:

    "DATA_FILTER":              // Customer defined data filter for data comparison
    {
        // "*": [["id < 100", "id < 100"]]
        // "HONGYE.TEST_BIG_PART_BY_ID": [["ID < 10000", "ID < 10000"],
        //                                ["ID between 10000 and 20000", "ID between 10000 and 20000"]
        //                               ]
    }

REMAP_SCHEMA

Use this option to specify the schema mapping relationship if there is a schema inconsistency between the two sides of the comparison.

This option is a dictionary structure, where KEY is the source schema and VALUE is the target schema.

Example:

    "REMAP_SCHEMA":             // Remap schema in comparison: Source:Target
    {
        // "SOURCE_SCHEMA": "TARGET_SCHEMA"
    }

COLUMN_LIST

This option is used to specify the range of column names, and the mapping of column names. The following two requirements can be achieved simultaneously:

Column Restriction: Restrict comparison to specified columns only
Column Mapping: Set the column mapping between source and target

This option is a dictionary structure:

KEY is the table information: [schema.]table_name, or you can use * to enable the column restriction and mapping for all tables in the range
VALUE is the column information, a two-dimensional array, each inner array corresponds to a column, which contains two elements: the source column, the target column, if source column is the same as target column, then the inner array can contains only one element or using string directly.

Example:

    "COLUMN_LIST":              // Column limitation and remapping
    {
         // "*": [["SOURCE_COLUMN_FOR_ALL_TABLE", "TARGET_COLUMN_FOR_ALL_TABLE"]],
         // "HONGYE.TEST_BIG_PART_BY_ID": [["OBJECT_NAME", "NAME"]],
         // "SCHEMA.TABLE_NAME": [["SOURCE_COLUMN_NAME", "TARGET_COLUMN_NAME"], ["SOURCE_COLUMN_NAME_2", "TARGET_COLUMN_NAME_2_NEW"]]
    }