HomeMogDBMogDB StackUqbar

Documentation:v3.4

Supported Versions:

Other Versions:

MVD Configurations

MVD can executing comparison task by a configuration file, which is a file in json format.

Description of configuration options

Options Type Defaults Description
SOURCE_DB dict Source database connection, including type, host, port, service, user, password
TARGET_DB dict Target database connection, including type, host, port, service, user, password
INCLUDE list Object whitelist for comparison, can use object_type, schema, object_name, and corresponding matching expressions
EXCLUDE list Object blacklist for comparison, can use object_type, schema, object_name, and corresponding matching expressions
TABLE str Single table comparison, if this option used, then INCLUDE/EXCLUDE will be ignored
ROW_DIR str /tmp/mvd_diff Row difference directory, each table has one file if data difference found
CATEGORY str all Comparison category, can compare metadata (meta) or data (data) separately
MODE str Row Comparison mode, can be: Summary Mode ([S]ummary), Row Mode ([R]ow), Auto Mode ([A]utomatic), default is Row
WORKERS int 8 Table parallel workers, valid range is: (0, 32]
TABLE_WORKERS int 4 Parallels within a table when partition/sampling/data-filter used, valid range is: (0, 32]
DATA_FILTER dict Specify filter conditions, either for a single table or for a group of tables, either a single filter condition (partial comparison) or multiple filters (manual concurrency)
SAMPLE_SIZE int 10240 Dynamic sampling threshold, the table size exceeds this threshold can considering dynamic sampling, the unit is: MB
SAMPLE_PCT int 1 Dynamic sampling ratio, the range of values: (0, 1], the value of 1 for the full table sampling
REMAP_SCHEMA dict Use this option to specify the schema mapping if there is a schema inconsistency between the two databases
COLUMN_LIST dict Specify the range of column names to be compared, and the mapping relationship of the column names
SKIP_COLUMN_TYPE dict {} Skip the specified column type, specify to skip the structure comparison, skip the data comparison, or both
SKIP_COLUMN_NAME dict {} Skip the specified column name, specify to skip the structure comparison, skip the data comparison, or both
IGNORE_FLOAT bool false Whether to ignore floating point columns during comparing, floating point columns may have precision differences which can caused failure of line comparison
ORA_FLOAT_PRECISION int -1 Specify Oracle float precision, the data with float datatype in Oracle will be rounded during comparing
FLOAT_PRECISION int -1 Specify float precision, the data with float datatype will be rounded during comparing
DOUBLE_PRECISION int -1 Specify double precision, the data with double datatype will be rounded during comparing
FRACTION_PRECISION int 6 Specify millisecond precision, Informix range is [0, 5] and other database range is [0, 6]
ZERO_CHAR str Specify the replacement character for the null chr(0) character, the default is to remove the chr(0) character directly
TIME_ZONE str +00:00 Specify the time zone to be used in the database client when comparing, ensure that both time zones are the same
EXCLUDE_SYS_TABLE list [] Exclude specified system tables
RTRIM_VARCHAR bool false Whether to remove trailing spaces from variable length strings before comparison
LOCAL_MD5 bool false Whether to force calculate md5 locally in Python program
CALLBACK str Specify the result callback URL, support getting the execution result by callback
RUN_MODE str INFO Specify the mode of execution, the logs for different modes may vary greatly, support: INFO/DEBUG/WARNING/ERROR/CRITICAL, etc
DEBUG_MD5 bool false For DEBUG mode of MD5 data comparison, this option will output logs of the same order of magnitude as the number of rows in the table when enabled, do not enable it unless necessary
LOGFILE str mvd_logfile.log Output the log to the specified log file
PASSWD_ENCRYPT bool false The database user password is encrypted by MDB and needs to be decrypted to use
MTK_CONFIG str Specify the MTK configuration file and support data comparison after migration by parsing the necessary parameters in the MTK configuration file
RESULT_FILE str Specify the output location of the result file, the default is only output to the screen, no result file is generated
RESULT_FORMAT str json Specify the format of result file, support:json, plain
DETAIL_MODE bool false Whether to output the data comparison result in detailed mode, detailed mode will output the data comparison result of all tables, default concise mode will only output the comparison result of tables with differences
ROW_FEEDBACK bool false Try to found column data differences for tables with primary key, and write differences into 'DIFF_DETAIL' field in diff result file
GENERATE_REPAIR bool false Whether to generate repair scripts for the target database
REPAIR_COMPARED bool false Whether to repair all columns matched, default is to repair all columns matched on both sides
FUNCTION_DIMENSION dict The old column statistics value comparison dimension, not recommended and may be deprecated in the future
LICENSE_FILE str ./license.json Specify the location of the license file

A sample configuration file is as follows (single line comments with // are supported in the configuration file):

{
    // Mandantory configurations
    // =============================================================================================
    "SOURCE_DB":                // Source Database connection information
    {
        "TYPE": "ORACLE",       // DB Type: ORACLE|ORACLE2|DB2|MYSQL|POSTGRESQL|MOGDB|OPENGAUSS|SQLSERVER|INFORMIX
        "HOST": "127.0.0.1",    // DB Host IP
        "PORT": "1521",         // DB Port
        "SERVICE": "orcl",      // DB Name/Service
        "USER": "",             // DB User
        "PASSWORD": ""          // DB Password
    },
    "TARGET_DB":                // Target Database connection information
    {
        "TYPE": "MOGDB",        // DB Type: ORACLE|ORACLE2|DB2|MYSQL|POSTGRESQL|MOGDB|OPENGAUSS|SQLSERVER|INFORMIX
        "HOST": "127.0.0.1",    // DB Host IP
        "PORT": "26000",        // DB Port
        "SERVICE": "postgres",  // DB Name/Service
        "USER": "omm",          // DB User
        "PASSWORD": ""          // DB Password
    },
    "INCLUDE":                  // Source white list to be compared
    [                                                   // Tips: 1. <OBJECT_TYPE> can use: %/TABLE/VIEW/SEQUENCE/PROCEDURE/FUNCTION/OTHERS
        // ["OBJECT_TYPE", "SCHEMA", "OBJECT_NAME"],    //       2. Can use % in <SCHEMA> and <OBJECT_NAME> field, means to match all
        // ["%", "HONGYE", "%"]                         //       3. Name is case insensitive
    ],
    "EXCLUDE": [],              // Source black list to be compared, format like "INCLUDE"
    "TABLE": "",                // Single table comparison, when this option used, then "INCLUDE" and "EXCLUDE" are ignored
    // "TABLE": "source_schema.source_table:target_schema.target_table",  // source and target can have different schema or table name
    "ROW_DIR": "/tmp/mvd_diff",   // Row differences data directory, each diff table has a file in this directory
    "CATEGORY": "all",          // Compare category: all, meta, data
    "MODE": "Row",              // Data compare mode, default is [R]
                                //   [R] Row mode, compare data row by row
                                //   [S] Summary mode, compare summary data, include row count and data signature
                                //   [A] Automatic mode, Compare summary and compare row when summary does not matched


    // Optional configurations
    // =============================================================================================
    // Performance related configurations
    "WORKERS": 8,               // Parallel between tables
    "TABLE_WORKERS": 4,         // Parallel within a table (split by partition/sampling)
    "DATA_FILTER":              // Customer defined data filter for data comparison
    {
        // "*": [["id < 100", "id < 100"]]
        // "HONGYE.TEST_BIG_PART_BY_ID": [["ID < 10000", "ID < 10000"],
        //                                ["ID between 10000 and 20000", "ID between 10000 and 20000"]
        //                               ]

        // Follow 3 configurations does the same thing:
        // "*":["col_id < 100", "col_id > 100 and col_id < 200"]
        // "*":[["col_id < 100"], ["col_id > 100 and col_id < 200"]]
        // "*":[["col_id < 100", "col_id < 100"], ["col_id > 100 and col_id < 200", "col_id > 100 and col_id < 200"]]
    },
    "SAMPLE_SIZE": 10240,       // Sampling threshold in MB, default 10240MB, 0 means do not split data
    "SAMPLE_PCT": 1,            // Sampling percentage, values between 0 and 1, 1 means split and compare all data

    // Remapping configurations
    "REMAP_SCHEMA":             // Remap schema in comparison: Source:Target
    {
        // "SOURCE_SCHEMA": "TARGET_SCHEMA"
    },
    "COLUMN_LIST":              // Column limitation and remapping
    {
         // "*": [["SOURCE_COLUMN_FOR_ALL_TABLE", "TARGET_COLUMN_FOR_ALL_TABLE"]],
         // "HONGYE.TEST_BIG_PART_BY_ID": [["OBJECT_NAME", "NAME"]],
         // "SCHEMA.TABLE_NAME": [["SOURCE_COLUMN_NAME", "TARGET_COLUMN_NAME"], ["SOURCE_COLUMN_NAME_2", "TARGET_COLUMN_NAME_2_NEW"]]

        // Follow 3 configurations does the same thing:
        // "*":["COL_INT", "COL_SMALLINT"]
        // "*":[["COL_INT"], ["COL_SMALLINT"]]
        // "*":[["COL_INT", "COL_INT"], ["COL_SMALLINT", "COL_SMALLINT"]]
    },

    // Data related configurations
    "IGNORE_FLOAT": false,      // Ignore float column in data comparison
    "ORA_FLOAT_PRECISION": -1,  // Oracle float precision (-1 ~ 128), -1 means automatic round to column "precision - 1"
    "FLOAT_PRECISION": -1,      // Float precision (-1 ~ 128), -1 means automatic round to column "precision - 1"
    "DOUBLE_PRECISION": -1,     // Double precision (-1 ~ 128), -1 means automatic round to column "precision - 1"
    "FRACTION_PRECISION": 6,    // Time microseconds fraction, informix max to 5
    "ZERO_CHAR": "",            // Replace chr(0) with given char
    "TIME_ZONE": "+00:00",
    "EXCLUDE_SYS_TABLE": [],    // Exclude system object: object_name or schema.object_name. case insensitive
    "RTRIM_VARCHAR": false,     // Client time zone, default depend to server setting

    // Other configurations
    "CALLBACK": "",
    "RUN_MODE": "INFO",         // Log mode: INFO, DEBUG
    "DEBUG_MD5": false,         // Debug MD5 data before Python Calculation and Comparison
    "LOGFILE": "",
    "PASSWD_ENCRYPT": false,    // DB Password are encrypted
    "MTK_CONFIG": "",           // MTK config file
    "RESULT_FILE": "",          // Write result to a file
    "RESULT_FORMAT": "json",    // Result format: json, plain
    "DETAIL_MODE": false,       // Default only list table with data differences, detail mode will list all tables in data comparison
    "ROW_FEEDBACK": false,      // Try to found column data differences for tables with primary key, and write differences into 'DIFF_DETAIL' field in diff result file
    "GENERATE_REPAIR": false,   // Whether to generate repair scripts for the target database
    "REPAIR_COMPARED": false,   // Whether to repair all columns matched, default is to repair all columns matched on both sides
    "ROW_FEEDBACK": false,      // Reversed for feature use
    "FUNCTION_DIMENSION":       // Column summary statistics comparison
    {
        "AVG": {"TYPE": "A", "FUNCTIONS": {"*": "AVG"}},
        "MIN": {"TYPE": "NP", "FUNCTIONS": {"*": "MIN"}},
        "MAX": {"TYPE": "NP", "FUNCTIONS": {"*": "MAX"}},
        "MEDIAN": {"TYPE": "NP", "FUNCTIONS": {"*": "MEDIAN"}}
    },
    "LICENSE_FILE": "./license.json"
}

INCLUDE/EXCLUDE

The scope of the comparison can be specified by the INCLUDE/EXCLUDE option, one is a whitelist and the other is a blacklist, the comparison will be executed if the object is in the whitelist and not in the blacklist.

Both are two-dimensional arrays of configuration items with the following configuration rules:

  1. Each array in the inner layer corresponds to an object rule
  2. The inner array contains 3 elements, corresponding to the object type, schema and object name
  3. The % can be used in the inner data element to indicate a generic matching rule, similar to the matching rule in a database LIKE operation
  4. Case-insensitive data in names

Example:

    "INCLUDE":                  // Source white list to be compared
    [                                                   // Tips: 1. <OBJECT_TYPE> can use: %/TABLE/VIEW/SEQUENCE/PROCEDURE/FUNCTION/OTHERS
        // ["OBJECT_TYPE", "SCHEMA", "OBJECT_NAME"],    //       2. Can use % in <SCHEMA> and <OBJECT_NAME> field, means to match all
        // ["%", "HONGYE", "%"]                         //       3. Name is case insensitive
    ],
    "EXCLUDE": [],              // Source black list to be compared, format like "INCLUDE"

TABLE

Single table comparison mode, the INCLUDE/EXCLUDE option is disabled when this option is used.

In single table mode, schema.table_name must be specified, supporting two formats:

  • If both schema and table_name are the same in the source and target databases, use schema.table_name
  • If there is a difference in schema or table_name between the source and target databases, use source_schema.source_table:target_schema.target_table

DATA_FILTER

By specifying filter conditions, the following two requirements can be achieved:

  1. For large tables, compare only partial of the data, for example, only some of the more recent data
  2. For large tables, ranges can be manually divided according to a column, and data can be compared between multiple ranges concurrently

This option is a dictionary structure:

  • KEY is the table information: [schema.]table_name, or you can use * to enable the filter for all tables in the range
  • VALUE is a two-dimensional array of filter conditions, the inner array elements are specific filter conditions, containing two elements: source conditions, target conditions, if source filter is the same as target filter, then the inner array can contains only one element or using string directly.

Example:

    "DATA_FILTER":              // Customer defined data filter for data comparison
    {
        // "*": [["id < 100", "id < 100"]]
        // "HONGYE.TEST_BIG_PART_BY_ID": [["ID < 10000", "ID < 10000"],
        //                                ["ID between 10000 and 20000", "ID between 10000 and 20000"]
        //                               ]
    }

REMAP_SCHEMA

Use this option to specify the schema mapping relationship if there is a schema inconsistency between the two sides of the comparison.

This option is a dictionary structure, where KEY is the source schema and VALUE is the target schema.

Example:

    "REMAP_SCHEMA":             // Remap schema in comparison: Source:Target
    {
        // "SOURCE_SCHEMA": "TARGET_SCHEMA"
    }

COLUMN_LIST

This option is used to specify the range of column names, and the mapping of column names. The following two requirements can be achieved simultaneously:

  1. Column Restriction: Restrict comparison to specified columns only
  2. Column Mapping: Set the column mapping between source and target

This option is a dictionary structure:

  • KEY is the table information: [schema.]table_name, or you can use * to enable the column restriction and mapping for all tables in the range
  • VALUE is the column information, a two-dimensional array, each inner array corresponds to a column, which contains two elements: the source column, the target column, if source column is the same as target column, then the inner array can contains only one element or using string directly.

Example:

    "COLUMN_LIST":              // Column limitation and remapping
    {
         // "*": [["SOURCE_COLUMN_FOR_ALL_TABLE", "TARGET_COLUMN_FOR_ALL_TABLE"]],
         // "HONGYE.TEST_BIG_PART_BY_ID": [["OBJECT_NAME", "NAME"]],
         // "SCHEMA.TABLE_NAME": [["SOURCE_COLUMN_NAME", "TARGET_COLUMN_NAME"], ["SOURCE_COLUMN_NAME_2", "TARGET_COLUMN_NAME_2_NEW"]]
    }
Copyright © 2011-2024 www.enmotech.com All rights reserved.