
Target Kafka Description

This article describes the target-side Kafka configurations and the data formats used by the synchronization channel from MogDB to Kafka.

1. Synchronization Limitation

  • Synchronization from MogDB to Kafka only supports table data synchronization, including an initial consistent full data synchronization and subsequent incremental data synchronization. Incremental data covers DML synchronization and some DDL synchronization.
  • Data synchronization from MogDB to Kafka does not support the synchronization of large object data, including but not limited to BLOB, CLOB, NCLOB, BYTEA, LONGTEXT, TEXT, SMALLTEXT, TINYTEXT, RAW, etc.

2. Target Kafka Configurations

Topic configuration entry

  • After creating the channel from MogDB to Kafka, open the More drop-down menu at the upper right corner of the object list to find the Topic configuration entry.


Clicking Topic configuration opens a pop-up window with the topic configuration page. Topic configuration falls into three major categories:

  • Global Topic: All tables within the channel use the same topic (full and incremental can be specified separately).
  • Independent Topic: Each table within the channel uses an independent topic (full and incremental can be specified separately).
  • Custom Topic: You can freely customize the full or incremental topic for each table.

The topicParams field on the right sets advanced parameters for all topics within the channel. By default it does not need to be configured; it is only used in scenarios with special business requirements.

Global Topic

This is the default configuration method for MogDB to Kafka. All tables within the channel will use the same full and incremental Topics.

You can specify the full and incremental Topic names separately, or you can use the same Topic name for both full and incremental.


Independent Topic

Each table uses a topic derived from its own table name. Depending on whether full and incremental data are merged, there are two storage types (a consumer-side subscription sketch follows the list):

  • Merged storage: For each table, a topic is created, and all full and incremental data of that table is sent to this Topic.
  • Separate storage: For each table, two topics are created to store full and incremental data respectively.
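
With independent topics, a downstream consumer usually has to read many per-table topics at once. The following sketch uses the kafka-python client to subscribe by regular expression; the pattern mogdb\..*, the broker address, and the deserialization details are placeholders for whatever topic naming scheme and Kafka cluster you actually configured:

import json
from kafka import KafkaConsumer

# Subscribe to every per-table topic via a regex pattern; "mogdb\..*" is a
# placeholder for the naming scheme configured on the topic configuration page.
consumer = KafkaConsumer(
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)
consumer.subscribe(pattern=r"mogdb\..*")

for message in consumer:
    print(message.topic, message.value)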


Custom Topic

Custom topics let you freely set the full or incremental topic name for each table, which gives the highest degree of configuration flexibility. For channels with a large number of tables, however, the configuration process is somewhat more involved.

You can first select Global Topic or Independent Topic as the base configuration and save it, then enter the Kafka topic configuration page for fine-tuning.

For custom topic adjustments, you can modify the full or incremental topic for a single table directly in the table list, or batch-select multiple tables and configure their full or incremental topics at once.


3. Data Format Explanation

3.1 Full snapshot data

In addition to the snapshot data of the source table, the data synchronized to Kafka also includes snapshot identification messages sent before and after the full snapshot.

The consumer side can use these snapshot identifications to implement functions such as cleaning up snapshot data, stitching snapshot and incremental data together, and table-level resumption from a breakpoint.

3.1.1 Full snapshot identification

  1. Snapshot start identification.
{
  "sourceSchemaName": "test_schema",
  "sourceTableName": "test_table",
  "status": "SNAPSHOT_BEGIN"
}
  2. Snapshot end identification.
{
  "sourceSchemaName": "test_schema",
  "sourceTableName": "test_table",
  "status": "SNAPSHOT_END"
}

Data Description

  • sourceSchemaName: Source schema name
  • sourceTableName: Source table name
  • status: Full snapshot identification. SNAPSHOT_BEGIN indicates the start of full data, and SNAPSHOT_END indicates the end of full data
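
A minimal consumer-side sketch of how these markers can be used is shown below; handle_snapshot_marker is an illustrative helper, not part of MogDB or Kafka:

def handle_snapshot_marker(msg: dict) -> None:
    # React to the SNAPSHOT_BEGIN / SNAPSHOT_END markers for one table.
    table = f'{msg["sourceSchemaName"]}.{msg["sourceTableName"]}'
    status = msg["status"].strip()  # tolerate incidental whitespace
    if status == "SNAPSHOT_BEGIN":
        print(f"full snapshot of {table} started: clear any stale local copy")
    elif status == "SNAPSHOT_END":
        print(f"full snapshot of {table} finished: switch to incremental data")

handle_snapshot_marker({
    "sourceSchemaName": "test_schema",
    "sourceTableName": "test_table",
    "status": "SNAPSHOT_BEGIN",
})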

3.1.2 Full snapshot data

The actual data of the full snapshot is as follows:

{
  "source": {
    "db": "test_db",
    "schema": "test_schema",
    "table": "test_tab"
  },
  "syncType": "full",
  "op": "c",
  "after": {
    "a1": "1",
    "b1": "test"
}

Data Description

  • source: Source table information, including the source database name, source schema name, and source table name
    - db: The database name to which the table belongs
    - schema: Source schema name
    - table: Source table name
  • syncType: Data synchronization type. full indicates full data, and sync indicates incremental data
  • op: Operation type. For full snapshot data, the operation type is always c
  • after: Full snapshot row data in a key-value structure, where the key is the column name and the value is the column data

3.2 Incremental

Incremental data is divided into two types: DDL and DML.

3.2.1 DDL

{
  "source": {
    "db": "test_db",
    "schema": "test_schema",
    "table": "test_tab"
  },
  "ddlContent": "DROP INDEX  s1.idx_a1",
  "ddlType": "ALTER",
  "syncType": "sync"
}

Data Description

  • source: Source table information, including the source database name, source schema name, and source table name
    - db: The database name to which the table belongs
    - schema: Source schema name
    - table: Source table name
  • syncType: Data synchronization type. full indicates full snapshot data, and sync indicates incremental data
  • ddlContent: The DDL statement executed in the source database. There is no ddlContent field for a truncate statement (ddlType=t)
  • ddlType: DDL type. Possible values are CREATE, ALTER, DROP, and t (truncate)
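
A consumer that replays DDL on the target side has to handle the truncate case separately, because such messages carry no ddlContent. A minimal sketch, assuming the source table maps directly to a target table of the same name; apply_ddl is an illustrative helper:

def apply_ddl(msg: dict) -> str:
    # Return the statement to run on the target for one DDL message.
    src = msg["source"]
    if msg["ddlType"] == "t":
        # truncate messages have no ddlContent, so the statement is rebuilt
        return f'TRUNCATE TABLE {src["schema"]}.{src["table"]}'
    return msg["ddlContent"]

print(apply_ddl({
    "source": {"db": "test_db", "schema": "test_schema", "table": "test_tab"},
    "ddlContent": "DROP INDEX s1.idx_a1",
    "ddlType": "ALTER",
    "syncType": "sync",
}))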

3.2.2 DML

{
  "source": {
    "db": "dbName",
    "schema": "s1",
    "table": "t1"
  },
  "op": "u",
  "after": {
    "a1": "1",
    "b1": "WW"
  },
  "before": {
    "a1": "1",
    "b1": "2"
  },
  "hasPrimaryKey": false,
  "syncType": "sync"
}

Data Description

  • source: Source table information, including the source database name, source schema name, and source table name
    - db: The database name to which the table belongs
    - schema: Source schema name
    - table: Source table name
  • syncType: Data synchronization type. full indicates full snapshot data, and sync indicates incremental data
  • hasPrimaryKey: Whether the source table has a primary key
  • op: DML operation type. Possible values are c, u, and d, meaning insert, update, and delete respectively
  • before: Pre-image data. With a primary key, it contains the pre-image of the primary key columns; without a primary key, it contains the pre-image of all columns
  • after: Post-image data of all columns after the change
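
To illustrate how the before image is used, the sketch below maps a DML message to a SQL statement, building the WHERE clause from the before columns (primary-key columns when hasPrimaryKey is true, all columns otherwise); build_dml and quote are illustrative helpers with naive quoting, not part of MogDB:

def quote(v) -> str:
    return "'" + str(v).replace("'", "''") + "'"  # naive SQL quoting, demo only

def build_dml(msg: dict) -> str:
    src = msg["source"]
    target = f'{src["schema"]}.{src["table"]}'
    if msg["op"] == "c":
        cols = list(msg["after"])
        vals = ", ".join(quote(msg["after"][c]) for c in cols)
        return f'INSERT INTO {target} ({", ".join(cols)}) VALUES ({vals})'
    # "before" identifies the affected row: primary-key columns if the table
    # has a primary key, otherwise all columns
    where = " AND ".join(f"{c} = {quote(v)}" for c, v in msg["before"].items())
    if msg["op"] == "u":
        sets = ", ".join(f"{c} = {quote(v)}" for c, v in msg["after"].items())
        return f"UPDATE {target} SET {sets} WHERE {where}"
    if msg["op"] == "d":
        return f"DELETE FROM {target} WHERE {where}"
    raise ValueError(f"unexpected op: {msg['op']}")

print(build_dml({
    "source": {"db": "dbName", "schema": "s1", "table": "t1"},
    "op": "u",
    "after": {"a1": "1", "b1": "WW"},
    "before": {"a1": "1", "b1": "2"},
    "hasPrimaryKey": False,
    "syncType": "sync",
}))
# UPDATE s1.t1 SET a1 = '1', b1 = 'WW' WHERE a1 = '1' AND b1 = '2'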