MVD Configurations
MVD can executing comparison task by a configuration file, which is a file in json format.
Description of configuration options
Options | Type | Defaults | Description |
---|---|---|---|
SOURCE_DB | dict | Source database connection, including type, host, port, service, user, password | |
TARGET_DB | dict | Target database connection, including type, host, port, service, user, password | |
INCLUDE | list | Object whitelist for comparison, can use object_type, schema, object_name, and corresponding matching expressions | |
EXCLUDE | list | Object blacklist for comparison, can use object_type, schema, object_name, and corresponding matching expressions | |
TABLE | str | Single table comparison, if this option used, then INCLUDE/EXCLUDE will be ignored | |
ROW_DIR | str | /tmp/mvd_diff | Row difference directory, each table has one file if data difference found |
CATEGORY | str | all | Comparison category, can compare metadata (meta) or data (data) separately |
MODE | str | Row | Comparison mode, can be: Summary Mode ([S]ummary), Row Mode ([R]ow), Auto Mode ([A]utomatic), default is Row |
WORKERS | int | 8 | Table parallel workers, valid range is: (0, 32] |
TABLE_WORKERS | int | 4 | Parallels within a table when partition/sampling/data-filter used, valid range is: (0, 32] |
DATA_FILTER | dict | Specify filter conditions, either for a single table or for a group of tables, either a single filter condition (partial comparison) or multiple filters (manual concurrency) | |
SAMPLE_SIZE | int | 10240 | Dynamic sampling threshold, the table size exceeds this threshold can considering dynamic sampling, the unit is: MB |
SAMPLE_PCT | int | 1 | Dynamic sampling ratio, the range of values: (0, 1], the value of 1 for the full table sampling |
REMAP_SCHEMA | dict | Use this option to specify the schema mapping if there is a schema inconsistency between the two databases | |
COLUMN_LIST | dict | Specify the range of column names to be compared, and the mapping relationship of the column names | |
SKIP_COLUMN_TYPE | dict | {} | Skip the specified column type, specify to skip the structure comparison, skip the data comparison, or both |
SKIP_COLUMN_NAME | dict | {} | Skip the specified column name, specify to skip the structure comparison, skip the data comparison, or both |
IGNORE_FLOAT | bool | false | Whether to ignore floating point columns during comparing, floating point columns may have precision differences which can caused failure of line comparison |
ORA_FLOAT_PRECISION | int | -1 | Specify Oracle float precision, the data with float datatype in Oracle will be rounded during comparing |
FLOAT_PRECISION | int | -1 | Specify float precision, the data with float datatype will be rounded during comparing |
DOUBLE_PRECISION | int | -1 | Specify double precision, the data with double datatype will be rounded during comparing |
FRACTION_PRECISION | int | 6 | Specify millisecond precision, Informix range is [0, 5] and other database range is [0, 6] |
ZERO_CHAR | str | Specify the replacement character for the null chr(0) character, the default is to remove the chr(0) character directly | |
TIME_ZONE | str | +00:00 | Specify the time zone to be used in the database client when comparing, ensure that both time zones are the same |
EXCLUDE_SYS_TABLE | list | [] | Exclude specified system tables |
RTRIM_VARCHAR | bool | false | Whether to remove trailing spaces from variable length strings before comparison |
LOCAL_MD5 | bool | false | Whether to force calculate md5 locally in Python program |
CALLBACK | str | Specify the result callback URL, support getting the execution result by callback | |
RUN_MODE | str | INFO | Specify the mode of execution, the logs for different modes may vary greatly, support: INFO/DEBUG/WARNING/ERROR/CRITICAL, etc |
DEBUG_MD5 | bool | false | For DEBUG mode of MD5 data comparison, this option will output logs of the same order of magnitude as the number of rows in the table when enabled, do not enable it unless necessary |
LOGFILE | str | mvd_logfile.log | Output the log to the specified log file |
PASSWD_ENCRYPT | bool | false | The database user password is encrypted by MDB and needs to be decrypted to use |
MTK_CONFIG | str | Specify the MTK configuration file and support data comparison after migration by parsing the necessary parameters in the MTK configuration file | |
RESULT_FILE | str | Specify the output location of the result file, the default is only output to the screen, no result file is generated | |
RESULT_FORMAT | str | json | Specify the format of result file, support:json, plain |
DETAIL_MODE | bool | false | Whether to output the data comparison result in detailed mode, detailed mode will output the data comparison result of all tables, default concise mode will only output the comparison result of tables with differences |
ROW_FEEDBACK | bool | false | Try to found column data differences for tables with primary key, and write differences into 'DIFF_DETAIL' field in diff result file |
GENERATE_REPAIR | bool | false | Whether to generate repair scripts for the target database |
REPAIR_COMPARED | bool | false | Whether to repair all columns matched, default is to repair all columns matched on both sides |
FUNCTION_DIMENSION | dict | The old column statistics value comparison dimension, not recommended and may be deprecated in the future | |
LICENSE_FILE | str | ./license.json | Specify the location of the license file |
A sample configuration file is as follows (single line comments with //
are supported in the configuration file):
{
// Mandantory configurations
// =============================================================================================
"SOURCE_DB": // Source Database connection information
{
"TYPE": "ORACLE", // DB Type: ORACLE|ORACLE2|DB2|MYSQL|POSTGRESQL|MOGDB|OPENGAUSS|SQLSERVER|INFORMIX
"HOST": "127.0.0.1", // DB Host IP
"PORT": "1521", // DB Port
"SERVICE": "orcl", // DB Name/Service
"USER": "", // DB User
"PASSWORD": "" // DB Password
},
"TARGET_DB": // Target Database connection information
{
"TYPE": "MOGDB", // DB Type: ORACLE|ORACLE2|DB2|MYSQL|POSTGRESQL|MOGDB|OPENGAUSS|SQLSERVER|INFORMIX
"HOST": "127.0.0.1", // DB Host IP
"PORT": "26000", // DB Port
"SERVICE": "postgres", // DB Name/Service
"USER": "omm", // DB User
"PASSWORD": "" // DB Password
},
"INCLUDE": // Source white list to be compared
[ // Tips: 1. <OBJECT_TYPE> can use: %/TABLE/VIEW/SEQUENCE/PROCEDURE/FUNCTION/OTHERS
// ["OBJECT_TYPE", "SCHEMA", "OBJECT_NAME"], // 2. Can use % in <SCHEMA> and <OBJECT_NAME> field, means to match all
// ["%", "HONGYE", "%"] // 3. Name is case insensitive
],
"EXCLUDE": [], // Source black list to be compared, format like "INCLUDE"
"TABLE": "", // Single table comparison, when this option used, then "INCLUDE" and "EXCLUDE" are ignored
// "TABLE": "source_schema.source_table:target_schema.target_table", // source and target can have different schema or table name
"ROW_DIR": "/tmp/mvd_diff", // Row differences data directory, each diff table has a file in this directory
"CATEGORY": "all", // Compare category: all, meta, data
"MODE": "Row", // Data compare mode, default is [R]
// [R] Row mode, compare data row by row
// [S] Summary mode, compare summary data, include row count and data signature
// [A] Automatic mode, Compare summary and compare row when summary does not matched
// Optional configurations
// =============================================================================================
// Performance related configurations
"WORKERS": 8, // Parallel between tables
"TABLE_WORKERS": 4, // Parallel within a table (split by partition/sampling)
"DATA_FILTER": // Customer defined data filter for data comparison
{
// "*": [["id < 100", "id < 100"]]
// "HONGYE.TEST_BIG_PART_BY_ID": [["ID < 10000", "ID < 10000"],
// ["ID between 10000 and 20000", "ID between 10000 and 20000"]
// ]
// Follow 3 configurations does the same thing:
// "*":["col_id < 100", "col_id > 100 and col_id < 200"]
// "*":[["col_id < 100"], ["col_id > 100 and col_id < 200"]]
// "*":[["col_id < 100", "col_id < 100"], ["col_id > 100 and col_id < 200", "col_id > 100 and col_id < 200"]]
},
"SAMPLE_SIZE": 10240, // Sampling threshold in MB, default 10240MB, 0 means do not split data
"SAMPLE_PCT": 1, // Sampling percentage, values between 0 and 1, 1 means split and compare all data
// Remapping configurations
"REMAP_SCHEMA": // Remap schema in comparison: Source:Target
{
// "SOURCE_SCHEMA": "TARGET_SCHEMA"
},
"COLUMN_LIST": // Column limitation and remapping
{
// "*": [["SOURCE_COLUMN_FOR_ALL_TABLE", "TARGET_COLUMN_FOR_ALL_TABLE"]],
// "HONGYE.TEST_BIG_PART_BY_ID": [["OBJECT_NAME", "NAME"]],
// "SCHEMA.TABLE_NAME": [["SOURCE_COLUMN_NAME", "TARGET_COLUMN_NAME"], ["SOURCE_COLUMN_NAME_2", "TARGET_COLUMN_NAME_2_NEW"]]
// Follow 3 configurations does the same thing:
// "*":["COL_INT", "COL_SMALLINT"]
// "*":[["COL_INT"], ["COL_SMALLINT"]]
// "*":[["COL_INT", "COL_INT"], ["COL_SMALLINT", "COL_SMALLINT"]]
},
// Data related configurations
"IGNORE_FLOAT": false, // Ignore float column in data comparison
"ORA_FLOAT_PRECISION": -1, // Oracle float precision (-1 ~ 128), -1 means automatic round to column "precision - 1"
"FLOAT_PRECISION": -1, // Float precision (-1 ~ 128), -1 means automatic round to column "precision - 1"
"DOUBLE_PRECISION": -1, // Double precision (-1 ~ 128), -1 means automatic round to column "precision - 1"
"FRACTION_PRECISION": 6, // Time microseconds fraction, informix max to 5
"ZERO_CHAR": "", // Replace chr(0) with given char
"TIME_ZONE": "+00:00",
"EXCLUDE_SYS_TABLE": [], // Exclude system object: object_name or schema.object_name. case insensitive
"RTRIM_VARCHAR": false, // Client time zone, default depend to server setting
// Other configurations
"CALLBACK": "",
"RUN_MODE": "INFO", // Log mode: INFO, DEBUG
"DEBUG_MD5": false, // Debug MD5 data before Python Calculation and Comparison
"LOGFILE": "",
"PASSWD_ENCRYPT": false, // DB Password are encrypted
"MTK_CONFIG": "", // MTK config file
"RESULT_FILE": "", // Write result to a file
"RESULT_FORMAT": "json", // Result format: json, plain
"DETAIL_MODE": false, // Default only list table with data differences, detail mode will list all tables in data comparison
"ROW_FEEDBACK": false, // Try to found column data differences for tables with primary key, and write differences into 'DIFF_DETAIL' field in diff result file
"GENERATE_REPAIR": false, // Whether to generate repair scripts for the target database
"REPAIR_COMPARED": false, // Whether to repair all columns matched, default is to repair all columns matched on both sides
"ROW_FEEDBACK": false, // Reversed for feature use
"FUNCTION_DIMENSION": // Column summary statistics comparison
{
"AVG": {"TYPE": "A", "FUNCTIONS": {"*": "AVG"}},
"MIN": {"TYPE": "NP", "FUNCTIONS": {"*": "MIN"}},
"MAX": {"TYPE": "NP", "FUNCTIONS": {"*": "MAX"}},
"MEDIAN": {"TYPE": "NP", "FUNCTIONS": {"*": "MEDIAN"}}
},
"LICENSE_FILE": "./license.json"
}
INCLUDE/EXCLUDE
The scope of the comparison can be specified by the INCLUDE/EXCLUDE
option, one is a whitelist and the other is a blacklist, the comparison will be executed if the object is in the whitelist and not in the blacklist.
Both are two-dimensional arrays of configuration items with the following configuration rules:
- Each array in the inner layer corresponds to an object rule
- The inner array contains 3 elements, corresponding to the object type, schema and object name
- The % can be used in the inner data element to indicate a generic matching rule, similar to the matching rule in a database LIKE operation
- Case-insensitive data in names
Example:
"INCLUDE": // Source white list to be compared
[ // Tips: 1. <OBJECT_TYPE> can use: %/TABLE/VIEW/SEQUENCE/PROCEDURE/FUNCTION/OTHERS
// ["OBJECT_TYPE", "SCHEMA", "OBJECT_NAME"], // 2. Can use % in <SCHEMA> and <OBJECT_NAME> field, means to match all
// ["%", "HONGYE", "%"] // 3. Name is case insensitive
],
"EXCLUDE": [], // Source black list to be compared, format like "INCLUDE"
TABLE
Single table comparison mode, the INCLUDE/EXCLUDE
option is disabled when this option is used.
In single table mode, schema.table_name must be specified, supporting two formats:
- If both schema and table_name are the same in the source and target databases, use
schema.table_name
- If there is a difference in schema or table_name between the source and target databases, use
source_schema.source_table:target_schema.target_table
DATA_FILTER
By specifying filter conditions, the following two requirements can be achieved:
- For large tables, compare only partial of the data, for example, only some of the more recent data
- For large tables, ranges can be manually divided according to a column, and data can be compared between multiple ranges concurrently
This option is a dictionary structure:
- KEY is the table information: [schema.]table_name, or you can use * to enable the filter for all tables in the range
- VALUE is a two-dimensional array of filter conditions, the inner array elements are specific filter conditions, containing two elements: source conditions, target conditions, if source filter is the same as target filter, then the inner array can contains only one element or using string directly.
Example:
"DATA_FILTER": // Customer defined data filter for data comparison
{
// "*": [["id < 100", "id < 100"]]
// "HONGYE.TEST_BIG_PART_BY_ID": [["ID < 10000", "ID < 10000"],
// ["ID between 10000 and 20000", "ID between 10000 and 20000"]
// ]
}
REMAP_SCHEMA
Use this option to specify the schema mapping relationship if there is a schema inconsistency between the two sides of the comparison.
This option is a dictionary structure, where KEY is the source schema and VALUE is the target schema.
Example:
"REMAP_SCHEMA": // Remap schema in comparison: Source:Target
{
// "SOURCE_SCHEMA": "TARGET_SCHEMA"
}
COLUMN_LIST
This option is used to specify the range of column names, and the mapping of column names. The following two requirements can be achieved simultaneously:
- Column Restriction: Restrict comparison to specified columns only
- Column Mapping: Set the column mapping between source and target
This option is a dictionary structure:
- KEY is the table information: [schema.]table_name, or you can use * to enable the column restriction and mapping for all tables in the range
- VALUE is the column information, a two-dimensional array, each inner array corresponds to a column, which contains two elements: the source column, the target column, if source column is the same as target column, then the inner array can contains only one element or using string directly.
Example:
"COLUMN_LIST": // Column limitation and remapping
{
// "*": [["SOURCE_COLUMN_FOR_ALL_TABLE", "TARGET_COLUMN_FOR_ALL_TABLE"]],
// "HONGYE.TEST_BIG_PART_BY_ID": [["OBJECT_NAME", "NAME"]],
// "SCHEMA.TABLE_NAME": [["SOURCE_COLUMN_NAME", "TARGET_COLUMN_NAME"], ["SOURCE_COLUMN_NAME_2", "TARGET_COLUMN_NAME_2_NEW"]]
}