
Manual Rectification of the Brain Split Fault in a Database Cluster

Symptom

In the brain split scenario of a database cluster, one primary instance is stopped and placed in the Manually stopped status so that only one primary instance keeps running normally. You can determine whether a brain split fault has occurred using either of the following two methods.

  • Error log

    The log of the cm_server primary instance contains the error split brain failure in db service, as shown in the following examples.

    CM_AGENT ERROR: [Primary], line 1529: split brain failure in db service, more dynamic primary and their term(7504) are the most(7504). Due to auto crash recovery is disabled, no need send restart msg to instance(6002) that had been restarted, waiting for manual intervention.

    Or

    CM_AGENT ERROR: line 570: split brain failure in db service, instance 6002 local term(7403) is not max term(7504). Due to auto crash recovery is disabled, will not restart current instance, waiting for manual intervention.
  • Cluster status

    Run the cm_ctl query -Cvp -z ALL command to query the cluster status, as shown in the following figure.

    img

    Note: In extreme scenarios, both database instances will be stopped. In this case, start the instance with the smaller instance ID, such as instance 6001, so that it can continue providing services. The start command is as follows (a combined command sketch also follows this list).

    cm_ctl start -n <nodeid> -D <datapath>
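
The checks above can be combined into a short shell sequence. This is a minimal sketch only: the CM log directory is environment-specific, and the placeholders are the same as in the note above.

# Search the CM logs for the split-brain error reported in the Symptom section.
grep -r "split brain failure in db service" <cm log directory>

# Query the overall cluster status.
cm_ctl query -Cvp -z ALL

# If both instances were stopped, start the one with the smaller instance ID (6001 here).
cm_ctl start -n <nodeid> -D <datapath>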

Cause Analysis

If a network partitioning fault occurs, a failover may be triggered for any of the following reasons, so that the partition holding more votes can continue providing services.

  • The CMS primary cannot perceive the status of the primary database instance in the cluster.
  • The standby instance cannot communicate with the primary instance, for example, due to a streaming replication error.
  • The connectivity between the standby instance and the configured virtual IP address is abnormal.

Once the network partitioning fault is rectified, a brain split may occur. Because the automatic fault rectification function of the database cluster (cms_enable_db_crash_recovery) is disabled, the dual-primary arbitration is not performed automatically. Instead, one primary instance is stopped so that only one primary instance remains available, and a DBA must then handle the issue manually to ensure data consistency of the database cluster.
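
To confirm that automatic recovery is indeed disabled, the cms_enable_db_crash_recovery setting can be checked in the cm_server configuration. This is a sketch only: the configuration file name and its directory are assumptions and vary by environment.

# Check whether automatic dual-primary recovery is disabled (the path is a placeholder).
grep -i "cms_enable_db_crash_recovery" <cm_server data directory>/cm_server.conf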

Procedure

Tool Download

First, access the MogDB Downloads page of the MogDB official website, switch to the Package Download tab, choose a version, OS, and CPU architecture, and then choose a package named in the ToolKits-<version>-<platform-cpuarch>.tar.gz format, as shown in the following figure.

img

Put the software package in the bin directory of MogDB and decompress it. The mog_xlogdump tool is used for data verification. The decompression result is as follows.

img
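
A minimal shell sketch for placing and unpacking the package, assuming $GAUSSHOME points to the MogDB installation directory and using the package name pattern above as a placeholder:

# Copy the toolkit package into the MogDB bin directory and unpack it there.
cp ToolKits-<version>-<platform-cpuarch>.tar.gz $GAUSSHOME/bin/
cd $GAUSSHOME/bin && tar -zxvf ToolKits-<version>-<platform-cpuarch>.tar.gz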

The tool depends on the following parameters.

img

As shown in the figure, mog_xlogdump depends on the transaction ID (-x), the OID of a data table (-o), and the xlog file (-p).
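
For reference, the full option list can be printed by the tool itself, and a bare-bones invocation built only from the parameters named above would look like the following. Both lines are sketches: the --help convention and the exact combination of options should be confirmed against your version of the tool, and all values are placeholders.

# Print the complete option list (assuming the usual --help convention).
mog_xlogdump --help

# Parse the records of one table for one transaction from the given xlog path (placeholders only).
mog_xlogdump -x <transaction id> -o <table OID> -p <xlog path>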

Tool Usage

A brain split fault causes the logs of the two database instances to diverge from a certain point in time. If the fault lasts a long time, tens of thousands of data records in one or more tables may become inconsistent (in a production environment, even when such divergence occurs, it usually lasts only a short time and is controllable). In this case, the DBA needs to confirm the range of the data difference using the mog_xlogdump tool to facilitate data merging.

Experimental Data Table

The experimental data table is test_example_01.

(1) Create a table on the primary instance and insert five data records.

create table test_example_01(id integer primary key, user_name varchar(10), register_time timestamp);

insert into test_example_01 values(1, 'zhangsan', '2022-09-22 10:57:10');

insert into test_example_01 values(2, 'lisi', '2022-09-22 10:58:10'); 

insert into test_example_01 values(3, 'wangwu', '2022-09-22 17:03:10');

insert into test_example_01 values(4, 'mazi', '2022-09-22 17:05:10');

insert into test_example_01 values(5, 'laoliu', '2022-09-22 17:09:10');

(2) Query the table information before the network partitioning fault occurs.

\d+ test_example_01; //Query the table structure. 

img

select pg_relation_filenode('test_example_01'); // Query the filenode, that is, the OID. 

select oid from pg_class where relname='test_example_01'; // Query the OID of the table. 

img

select xmin, xmax, * from test_example_01; // Query the transaction ID of the table, that is, xmin.
select pg_current_xlog_location(); // Query the current xlog write position of the primary instance, which identifies the current WAL file, such as 000000010000000000000004.
  • wrz-cm-test-01 node

    img
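
The same metadata can also be collected from the shell with gsql. This is a minimal sketch assuming a local connection; the database name and port are placeholders.

# OID of the table and current xlog write position, queried on the primary instance.
gsql -d postgres -p <port> -c "select oid from pg_class where relname='test_example_01';"
gsql -d postgres -p <port> -c "select pg_current_xlog_location();"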

Confirm the Data Change Range Based on the Start and End LSN

For tables where it is unclear whether the data is inconsistent, the mog_xlogdump tool can be used to find inconsistent data within a specified start and end LSN range (-s and -e). You can run either of the following commands (a concrete example follows the note below).

mog_xlogdump -c <database connection> -s <start lsn> -e <end lsn> <wal log file of the primary instance>

mog_xlogdump -c <database connection> -p <xlog directory> -s <start lsn> -e <end lsn>

Note:

  1. The database connection string follows the rule postgres://<db_user>:<db_pwd>@<host_ip>:<port>/<db_name>. You do not need to specify it when running the tool locally.
  2. Because no OID is specified, the data obtained using this method contains the data of all tables, which makes later data merging difficult.
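
For illustration, a hypothetical invocation of the first form. The start LSN and WAL file name reuse sample values shown elsewhere on this page; the connection string, port, and end LSN are placeholders.

mog_xlogdump -c postgres://<db_user>:<db_pwd>@192.168.122.231:<port>/postgres -s 0/4016230 -e <end lsn> 000000010000000000000004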

For a table whose data may be inconsistent, the mog_xlogdump tool can parse the data of that specific table by specifying its OID and column types (-o and -R) together with the start and end LSN (-s and -e). You can run either of the following commands.

mog_xlogdump -c <database connection> -o <OID of the table> -R <column types of the table> -s <start lsn> -e <end lsn> <wal log file of the primary instance>

mog_xlogdump -c <database connection> -p <xlog directory> -o <OID of the table> -R <column types of the table> -s <start lsn> -e <end lsn>

Note: The database connection string follows the rule postgres://<db_user>:<db_pwd>@<host_ip>:<port>/<db_name>. You do not need to specify it when running the tool locally.
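
The column types needed for -R can be read from the system catalog. A minimal gsql sketch, assuming a local connection and a placeholder port; the exact format that -R expects should be confirmed against the tool's help output.

# List the column names and types of test_example_01.
gsql -d postgres -p <port> -c "select attname, format_type(atttypid, atttypmod) from pg_attribute where attrelid = 'test_example_01'::regclass and attnum > 0 and not attisdropped;"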

(1) Query the cluster status to make sure that the cluster is running normally.

img

(2) Use a firewall to simulate network isolation between the primary and standby nodes.

Run the following command on one node of the cluster to isolate its network from the other node (the IP address below is that of the peer node).

iptables -I OUTPUT -d 192.168.122.232 -j DROP; iptables -I INPUT -s 192.168.122.232 -j DROP
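
To verify that the isolation has taken effect, a quick check from the same node (the peer IP is the one used above):

# The inserted DROP rules should be listed, and the peer should no longer answer pings.
iptables -L OUTPUT -n | grep 192.168.122.232
iptables -L INPUT -n | grep 192.168.122.232
ping -c 3 192.168.122.232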

(3) Query the cluster status to make sure that, in each partition, both the CMS role and the database instance role are primary.

  • wrz-cm-test-01 node

    img

  • wrz-cm-test-02 node

    img

Perform the Write, Update, and Delete Operations on the Primary Instances in Different Partitions

(1) Perform the write operation on the wrz-cm-test-01 node.

insert into test_example_01 values(6, 'xiaoming', '2022-09-22 17:09:10'); // Duplicate ID across the two partitions

img

(2) Perform the write operation on the wrz-cm-test-02 node.

insert into test_example_01 values(6, 'xiaoming', '2022-09-22 17:10:10'); // Duplicate ID across the two partitions

insert into test_example_01 values(7, 'huluwa', '2022-09-22 17:12:10');

img

Rectify the Network Partitioning Fault

(1) Clear the firewall.

Run the following command on the node where the firewall rules were added in step (2) of the network partitioning simulation.

iptables -I OUTPUT -d 192.168.122.232 -j ACCEPT; iptables -I INPUT -s 192.168.122.232 -j ACCEPT
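
Because step (2) of the simulation inserted DROP rules, an equivalent cleanup is to delete those rules directly instead of inserting ACCEPT rules above them. A sketch:

# Remove the DROP rules added during the simulation.
iptables -D OUTPUT -d 192.168.122.232 -j DROP
iptables -D INPUT -s 192.168.122.232 -j DROP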

(2) Query the cluster status.

The cluster status is consistent with that described in the Symptom section: one database instance is in the Manually stopped status.

img

Data Recovery and Merging

This procedure depends on the xlog files; if the log data is lost, data recovery will fail. Assume that test_example_01 is the only table whose data has changed since the network partitioning occurred.

(1) Confirm the LSN information before the network partitioning fault occurs and when the brain split occurs.

  • Judge the time point at which the network partitioning fault occurred.

    Based on the cluster status before the fault was simulated, the CMS role and the database instance role on the wrz-cm-test-01 node change to primary after the network partitioning fault occurs. Therefore, the time point at which the CMS was promoted to primary can be found in the cm_server log of the wrz-cm-test-01 node. From that time point, the LSN corresponding to the last data synchronization between the primary and standby instances while the cluster was still running normally can be obtained. (This LSN is used as the start position -s when mog_xlogdump parses the log; a grep sketch follows this list.)

    The log entry showing that the CMS was switched to primary is as follows.

    /cm_server-2022-09-23_104436-current.log:2022-09-23 15:16:23.005 tid=1245295  LOG: [DccNotifyStatus] g_dbRole is 1, roleType is 1.

    The LSN information reported by the CMA before the CMS was switched to primary is as follows.

    2022-09-23 14:59:27.495 tid=1245411 CM_AGENT LOG: [InstanceIsUnheal], current report instance is 6001, node 1, instId[6001: 6002], node[1: 2], staticRole[2=Standby: 1=Primary], dynamicRole[2=Standby: 5=Down], term[703: 604], lsn[0/4016230: 0/4007840], dbState[2: 0], buildReason[2=Disconnected: 7=Unknown], doubleRestarting[0: 0], disconn_mode[3=prohibit_connection: 1=polling_connection], disconn[:0, :0], local[192.168.122.231:27001, 192.168.122.232:27001], redoFinished[1: 1], arbiTime[104: 100], syncList[cur: (sync list is empty), exp: (sync list is empty), vote: (dynamic status is empty)], groupTerm[604], sync_standby_mode[0: 0: 0].
  • Judge the time point at which the brain split occurred in the cluster.

    Based on the brain split rectification mechanism, the LSN corresponding to the time point at which the brain split occurred can be taken as the current xlog position of the manually stopped database instance. (This position is used as the end position -e when mog_xlogdump parses the log.)

    Under network isolation, start the manually stopped database instance, log in to the instance using gsql, and run the following command to obtain the xlog position.

    select pg_current_xlog_location();
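
The two log entries quoted above can be located with grep. A minimal sketch, assuming the CM log directory on the wrz-cm-test-01 node; the directory path and file name patterns are environment-specific.

# Time point at which the CMS on this node was promoted to primary.
grep "DccNotifyStatus" <cm log directory>/cm_server-*.log | tail -n 1

# Last status report carrying the primary/standby LSNs before that time point.
grep "InstanceIsUnheal" <cm log directory>/*.log | tail -n 1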

(2) Use a firewall to isolate the node where the database instance was manually stopped, and start the database instance on that node.

img

Obtain the data of the test_example_01 table and the LSN at the time the brain split fault occurred.

img

(3) Use mog_xlogdump to confirm the data change range, and export the data.

Data change on the wrz-cm-test-01 node:

img

Data change on the wrz-cm-test-02 node:

img

Generate the data to be merged into the primary instance of the current database cluster.

insert into test_example_01 values(6, 'xiaoming', '2022-09-22 17:10:10'); // ID conflict: keep the record with the later timestamp.
insert into test_example_01 values(7, 'huluwa', '2022-09-22 17:12:10');

After confirming the data, stop the database instance on the isolated node again.

(4) Import the changed data into the primary instance of the database cluster.

Use gsql to log in to the primary instance and execute the above insert statements.

Query the database record after data is merged.
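
A minimal sketch of the import and verification, assuming the merge statements were saved to a hypothetical file named merge.sql and that a local connection to the primary is used; the database name and port are placeholders.

# Apply the merged rows on the cluster primary and check the result.
gsql -d postgres -p <port> -f merge.sql
gsql -d postgres -p <port> -c "select xmin, xmax, * from test_example_01;"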

(5) Add the manually stopped database instance to the database cluster.

Following step (2), make sure that the database instance on the node is in the stopped status, clear the firewall to cancel the network isolation, and start the database instance on the node by running the cm_ctl start -n <nodeid> -D <datapath> command.

The instance will be added to the database cluster as a standby instance.
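
A combined sketch of this step, run on the previously isolated node. It assumes the isolation in step (2) was created with the same iptables DROP rules as in the simulation; the peer IP, node ID, and data path are placeholders.

# Cancel the network isolation by deleting the firewall rules added in step (2).
iptables -D OUTPUT -d <peer node IP> -j DROP
iptables -D INPUT -s <peer node IP> -j DROP

# Start the previously stopped instance; it rejoins the cluster as a standby.
cm_ctl start -n <nodeid> -D <datapath>

# Confirm the cluster status.
cm_ctl query -Cvp -z ALL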

(6) Query the cluster status to ensure that all instances are normal.
