
MogHA Management

Overview

MogHA is an enterprise-grade high availability (HA) product developed by EnMotech for MogDB. It is designed to handle server crashes, instance crashes, and similar failures. MogHA can reduce database downtime after a fault from minutes to seconds, keeping system services running continuously, and the fault recovery process is transparent to customers.

With MogHA, if a server crashes in a deployment with one primary and one standby server, or with one primary and multiple standby servers, the customer can perform a primary/standby switchover manually or automatically so that a standby server takes over services from the primary. This shortens database downtime and minimizes the impact on services.

Basic concepts

  • Primary/standby

    In a primary/standby setup, the primary and standby instances do not share data files; each instance has its own independent data files. The primary and standby servers keep their data synchronized through data operation logs.

    The primary server accepts both read and write operations, whereas a standby server accepts only read operations. By replaying the operation logs, the standby server keeps its data lag within a specified range (typically, up to the latest transaction).

  • Physical replication

    The database supports physical replication, in which redo logs record changes to data blocks. This keeps the data files on the primary and standby servers consistent and provides the highest level of data protection.

  • Logical replication

    Logical replication replicates logical operations rather than data block changes. The data on the primary and standby servers is the same, but their data files differ.

    Compared with physical replication, logical replication is more flexible.

  • Database switchover (using a single HA group as an example)

    The main HA architecture is as follows.

    (Figure: MogHA HA architecture)

    One HA group includes one primary database and one standby database. The main components include:

    • Agent: An agent is deployed on each server to perform HA-related operations.
    • VIP: Short for virtual IP address. The VIP is provided by a virtual NIC (vNIC) attached to a server. If that server crashes, the vNIC can be attached to another server, so applications do not need to change their database connection settings and database downtime is reduced. The VIP is mainly used for primary/standby switchover.
    • Arbitration: To simplify configuration, the gateway of the service subnet where the primary or standby database resides serves as the arbitration node, which is used to detect network isolation. When the standby database checks whether it can ping the primary database, it also checks whether it can ping the arbitration node. If the arbitration node cannot be pinged, the standby database concludes that its own network is disconnected and does not trigger a primary/standby switchover.
  • Deployment mode

    MogHA supports two deployment modes: Lite mode and Full mode.

    Lite mode

    • The HA service is enabled only on the primary database server and the synchronous standby database server, which are in the same equipment room.
    • Only a single switchover is supported.
    • Manual operations are required after a switchover.
    • The user must configure a new synchronous standby database server by setting synchronous_standby_names.
    • The HA service must be enabled on the new standby database server.
    • MogHA does not change the database configuration.

    Full mode

    • The HA service must be enabled on all instances.
    • If the synchronous standby database server crashes, an asynchronous standby database server automatically becomes the synchronous standby.
    • Consecutive switchovers are supported without manual intervention.
    • MogHA changes the HA-related database configuration, for example when it switches an asynchronous standby to synchronous.
  • Deployment and O&M


    MogHA runs as a systemd service: systemd starts supervisord, which in turn manages the web and heartbeat processes.

    The web process handles internal communication between MogHA components (access is allowed only among the servers of the same primary/standby group). The heartbeat process performs the actual health checks and HA operations.
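The arbitration rule described under Database switchover can be sketched as a small decision function on the standby. This is an illustration under stated assumptions, not MogHA's implementation: the names `can_ping` and `should_switch_over` are hypothetical, the probe shells out to Linux iputils `ping` with `-c 1 -W 1`, and comparing the outage duration with `primary_lost_timeout` from node.conf is an assumed reading of that parameter.

```python
import subprocess


def can_ping(host: str) -> bool:
    """Send one ICMP echo with Linux iputils ping; True if the host answered."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:  # ping is not installed on this host
        return False
    return result.returncode == 0


def should_switch_over(primary_reachable: bool, arbiter_reachable: bool,
                       primary_lost_seconds: float,
                       primary_lost_timeout: float) -> bool:
    """Hypothetical standby-side decision that follows the arbitration rule."""
    if primary_reachable:
        # The primary still answers: no switchover.
        return False
    if not arbiter_reachable:
        # The gateway is unreachable too: assume this standby is the one
        # isolated from the network and do not trigger a switchover.
        return False
    # The primary is gone but the network is fine: switch over only once the
    # outage has lasted longer than the configured timeout.
    return primary_lost_seconds > primary_lost_timeout
```

On a standby, a caller would combine the two, for example `should_switch_over(can_ping("192.168.122.201"), can_ping("192.168.122.1"), 12, 10)`, where the addresses are the sample primary (host1) and zone1 gateway from the node.conf example under Deployment and Installation. Checking the gateway is what keeps an isolated standby from promoting itself while the primary is still serving clients.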

Topology

MogHA supports deployments of up to one primary and eight standby database servers. The following example uses an architecture with one primary and six standby database servers.

The two-city, three-data-center architecture includes one primary database server, two synchronous standbys, three asynchronous standbys, and one cascaded standby, which ensures at least two database nodes in each data center. Real-time data synchronization between the two equipment rooms in the same city prevents data loss or service unavailability if a single equipment room fails. The equipment room in the remote city is used for disaster recovery and holds additional data copies. The cluster architecture is as follows.

(Figure: two-city, three-data-center cluster architecture)

Three database servers are deployed in the primary equipment room: the primary database server, one synchronous standby, and one asynchronous standby. Both standbys connect directly to the primary upstream. When data changes on the primary, the commit completes only after the change has been written to disk on the synchronous standby; the asynchronous standby, by contrast, may lag behind in writing data to disk. If the primary fails, the synchronous standby holds the same data as the primary, so it is chosen first as the new primary, with no data loss. If the original primary cannot be recovered quickly, modify the synchronous_standby_names parameter on the new primary and reload the configuration so that an asynchronous standby becomes synchronous. This keeps two synchronous standbys in the cluster and two database servers in each equipment room.
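The reconfiguration step above amounts to computing a new synchronous_standby_names value on the new primary. The following is a minimal sketch under assumptions: the helper `new_sync_standby_names` and the standby names are hypothetical, the order of asynchronous candidates (for example, same-room standbys first) is left to the caller, and the `FIRST n (name, ...)` priority syntax is the PostgreSQL-style form that MogDB is assumed to accept.

```python
from typing import List, Sequence


def new_sync_standby_names(sync_standbys: Sequence[str], promoted: str,
                           async_standbys: Sequence[str], wanted: int = 2) -> str:
    """Build a synchronous_standby_names value after `promoted` became primary.

    sync_standbys:  standbys that were synchronous before the failover.
    promoted:       the synchronous standby that is now the primary.
    async_standbys: asynchronous standbys, in the order they should be upgraded.
    wanted:         number of synchronous standbys to keep (two in this topology).
    """
    # The promoted node is now the primary, so it cannot stay in the list.
    chosen: List[str] = [name for name in sync_standbys if name != promoted]
    # Upgrade asynchronous standbys until enough synchronous ones remain.
    for name in async_standbys:
        if len(chosen) >= wanted:
            break
        if name != promoted and name not in chosen:
            chosen.append(name)
    if len(chosen) < wanted:
        raise ValueError(f"need {wanted} synchronous standbys, found {len(chosen)}")
    return f"FIRST {wanted} ({', '.join(chosen)})"
```

For example, `new_sync_standby_names(["sync1", "sync2"], "sync1", ["async1", "async2"])` returns `FIRST 2 (sync2, async1)`; that string would then be set on the new primary, followed by a configuration reload.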

MogHA monitors the status of all nodes in the cluster. If the primary node becomes abnormal, a standby node takes over its services so that the database cluster as a whole remains available. When the application's JDBC connection string lists the IP addresses of the nodes, the driver automatically determines which node is the primary and which are standbys, without application intervention.
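PostgreSQL-compatible JDBC drivers accept a comma-separated host list together with a `targetServerType` parameter that makes the driver connect to whichever node is currently the primary. The helper below builds such a URL. It is a sketch that assumes the `jdbc:postgresql://` prefix and the value `master` (newer PostgreSQL drivers also accept `primary`); check the documentation of the MogDB JDBC driver you use for the exact prefix and parameter. The function name and the database name `postgres` are placeholders.

```python
from typing import Sequence
from urllib.parse import urlencode


def build_jdbc_url(hosts: Sequence[str], port: int, database: str,
                   target_server_type: str = "master") -> str:
    """Build a multi-host JDBC URL such as jdbc:postgresql://h1:p,h2:p/db?targetServerType=master."""
    if not hosts:
        raise ValueError("at least one host is required")
    # Every node is listed; the driver probes them and keeps the primary.
    host_list = ",".join(f"{host}:{port}" for host in hosts)
    query = urlencode({"targetServerType": target_server_type})
    return f"jdbc:postgresql://{host_list}/{database}?{query}"
```

With the sample node addresses from node.conf, `build_jdbc_url(["192.168.122.201", "192.168.122.202"], 26000, "postgres")` produces `jdbc:postgresql://192.168.122.201:26000,192.168.122.202:26000/postgres?targetServerType=master`. Connecting through the VIP is the alternative when a single address is preferred.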

Deployment and Installation

Prerequisites

  • The database has been deployed.
  • The OS is Red Flag 7.6 on x86.
  • Python 3 has been installed.
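A short script can confirm part of these prerequisites on each node before installation. The sketch is hypothetical: it checks the CPU architecture and that the interpreter step 4 links to (`/usr/bin/python3.7` in this document) is executable, and it only prints `PRETTY_NAME` from `/etc/os-release` for a manual comparison with Red Flag 7.6, because the exact contents of that file on Red Flag are an assumption.

```python
import os
import platform
from typing import List

# Interpreter that step 4 links into the venv; change it if your Python 3 lives elsewhere.
PYTHON3_PATH = "/usr/bin/python3.7"


def check_prerequisites(machine: str, python3_path: str) -> List[str]:
    """Return readable problems; an empty list means both checks passed."""
    problems = []
    if machine.lower() not in ("x86_64", "amd64"):
        problems.append(f"expected an x86_64 machine, found {machine!r}")
    if not os.access(python3_path, os.X_OK):
        problems.append(f"no executable Python 3 interpreter at {python3_path}")
    return problems


def os_pretty_name(path: str = "/etc/os-release") -> str:
    """Return PRETTY_NAME from an os-release file, or 'unknown' if unavailable."""
    try:
        with open(path) as handle:
            for line in handle:
                if line.startswith("PRETTY_NAME="):
                    return line.split("=", 1)[1].strip().strip('"')
    except OSError:
        pass
    return "unknown"


if __name__ == "__main__":
    # The OS name is printed for a manual check against Red Flag 7.6.
    print("OS:", os_pretty_name())
    problems = check_prerequisites(platform.machine(), PYTHON3_PATH)
    for problem in problems:
        print("FAIL:", problem)
    if not problems:
        print("Architecture and Python 3 checks passed")
```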

Procedure

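  1. Disable the firewall. Alternatively, keep it enabled and open only the ports that MogHA and the database use (such as agent_port and db_port in node.conf, and the replication ports).
  2. Use NTP or chronyd to synchronize the time on the primary and standby database servers.
  3. Configure sudo permission for user omm so that it can run ifconfig without a password.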

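    chmod +w /etc/sudoers
    which ifconfig
    # Sample output (use the path returned on your server):
    # /usr/sbin/ifconfig
    vi /etc/sudoers
    # Add the following line to /etc/sudoers:
    omm   ALL=(ALL)    NOPASSWD: /usr/sbin/ifconfig
    chmod -w /etc/sudoers
    visudo -c    # Validate the sudoers syntax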
  4. Prepare the Python running environment.

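    In the /home/omm/ha/venv/bin directory, create symbolic links named python and python3 that point to the system Python 3 interpreter (/usr/bin/python3.7 in this example; adjust the path to the installed version). The venv directory must already be in /home/omm/ha (see sub-step d of step 5).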

    ln -s /usr/bin/python3.7 /home/omm/ha/venv/bin/python
    ln -s /usr/bin/python3.7 /home/omm/ha/venv/bin/python3
  5. Modify the configuration files.

    a. In the postgresql.conf file in the data directory, set the listening addresses as required, for example:

    listen_addresses='*'

    b. In the postgresql.conf file of each node, set the ports in replconninfo1 and the subsequent replconninfo entries, in order, to 26009, 26008, 26007, 26009, 26008, and 26007, and then restart the database cluster.

    c. Add the IP addresses that are allowed to access the server to the pg_hba.conf file.

    For example: host all all 21.0.21.23/32 md5

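    d. Place the ha and venv directories in /home/omm/ha. The resulting file layout is as follows (mogha.service is copied to /usr/lib/systemd/system/ in sub-step i):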

    /home/omm/ha/ha/node.conf
                    env.sh
                    supervisord.conf
                    mogha.service
    /home/omm/ha/venv/bin/python
                          python3
    /usr/lib/systemd/system/mogha.service

    e. Modify the node.conf configuration file in the /home/omm/ha/ha directory.

    [config]
    heartbeat_interval=3                               #HA heartbeat interval (s)
    primary_lost_timeout=10                            #Maximum time (s) the primary database node can be lost
    primary_lonely_timeout=10                          #Maximum time (s) the primary database node can be disconnected from the other nodes
    double_primary_timeout=10                          #Maximum time (s) that two primary database nodes can coexist
    agent_port=8081                                    #Web port of HA
    db_port=26000                                      #Database service port
    db_user=omm                                        #OS user of the database
    db_datadir=/data/dn1                               #Data directory
    primary_info=/home/omm/ha/ha/primary_info.json     #Path of the JSON data file of the primary database node (in the HA directory by default)
    standby_info=/home/omm/ha/ha/standby_info.json     #Path of the JSON data file of the standby database node (in the HA directory by default)
    taskset=True
    [meta]                                          #Metadatabase
    ha_name=ms1
    host=192.168.2.1
    port=26000
    db=monitordb
    user=monitor
    password=monitor
    schema=public
    [host1]                                            #Information of node 1, generally referring to the primary database node
    ip=192.168.122.201
    heartbeat_ips=192.168.100.201
    [host2]                                            #Information of node 2
    ip=192.168.122.202
    heartbeat_ips=192.168.100.202
    [host3]                                             #Information of node 3
    ip=192.168.122.205
    heartbeat_ips=192.168.100.205
    [host4]                                             #Information of node 4
    ip=192.168.122.206
    heartbeat_ips=192.168.100.206
    [zone1]                                            #Primary equipment room
    vip=192.168.122.211
    arping=192.168.122.1
    ping_list=192.168.122.1
    hosts=host1,host2
    [zone2]                                             #Standby equipment room
    vip=192.168.122.212
    arping=192.168.122.1
    ping_list=192.168.122.1
    hosts=host3
    cascades=host4                                      #Cascaded database

    f. Modify the env.sh file in the /home/omm/ha/ha directory so that GAUSSHOME, PGDATA, and LD_LIBRARY_PATH match your database installation.

    export GAUSSHOME=/home/postgres/openGauss
    export PGDATA=$GAUSSHOME/data
    export LD_LIBRARY_PATH=$GAUSSHOME/lib

    g. Modify the supervisord.conf file in the /home/omm/ha/ha directory. Set the environment entries of both programs to match your database installation.

    [supervisord]
    logfile=/tmp/mogha_supervisord.log ;Log file path. The default is $CWD/supervisord.log.
    logfile_maxbytes=50MB        ;Maximum size of a log file before it is rotated. The default is 50 MB; 0 means unlimited.
    logfile_backups=10           ;Number of rotated log files to keep. The default is 10; 0 means no backups are kept.
    loglevel=info                ;Log level. The default is info; other values include debug, warn, and trace.
    pidfile=/tmp/mogha_supervisord.pid ;PID file of supervisord
    
    nodaemon=true                ;Run supervisord in the foreground so that systemd (Type=simple) can track it. The default is false, which runs supervisord as a daemon.
    minfds=1024                  ;Minimum number of file descriptors that must be available before supervisord starts. The default is 1024.
    minprocs=200                 ;Minimum number of process descriptors that must be available before supervisord starts. The default is 200.
    [program:web]
    command=/home/omm/ha/venv/bin/python  /home/omm/ha/ha/main.py --config /home/omm/ha/ha/node.conf --web
    autostart=true 
    startsecs=10   
    autorestart=true  
    startretries=3 
    user=omm
    redirect_stderr=true 
    stdout_logfile_maxbytes=20MB
    stdout_logfile_backups = 20 
    stdout_logfile=/home/omm/ha/ha/mogha_web.log
    environment=PYTHONUNBUFFERED=1,GAUSSHOME=/opt/gaussdb/app,PGDATA=/opt/gaussdb/data/db1,LD_LIBRARY_PATH=/opt/gaussdb/app/lib:/opt/mogdb/tools/lib:/opt/mogdb/tools/script/gspylib/clib
    directory=/home/omm/ha/ha/
    [program:heartbeat]
    command=/home/omm/ha/venv/bin/python  /home/omm/ha/ha/main.py --config /home/omm/ha/ha/node.conf --heartbeat
    autostart=true
    startsecs=10
    autorestart=true
    startretries=3
    user=omm
    redirect_stderr=true
    stdout_logfile_maxbytes=20MB
    stdout_logfile_backups = 20
    stdout_logfile=/home/omm/ha/ha/mogha_heartbeat.log
    environment=GAUSSHOME=/opt/gaussdb/app,PGDATA=/opt/gaussdb/data/db1,LD_LIBRARY_PATH=/opt/gaussdb/app/lib:/opt/mogdb/tools/lib:/opt/mogdb/tools/script/gspylib/clib
    directory=/home/omm/ha/ha

    h. Modify the mogha.service file in the /home/omm/ha/ha directory. Set the Environment entries to match your database installation.

    [Unit]
    Description=MogHA high availability service
    After=network.target remote-fs.target nss-lookup.target
    
    [Service]
    Environment=GAUSSHOME=/gauss/openGauss/app_101
    Environment=PGDATA=/gaussdata/openGauss/db1
    Environment=LD_LIBRARY_PATH=/gauss/openGauss/app_101/lib:/gauss/openGauss/om/lib:/gauss/openGauss/om/script/gspylib/clib
    Type=simple
    User=omm
    WorkingDirectory=/home/omm/ha/ha
    ExecStart=/home/omm/ha/venv/bin/supervisord -c /home/omm/ha/ha/supervisord.conf 
    KillSignal=SIGTERM
    TimeoutStopSec=5
    KillMode=process
    PrivateTmp=false
    [Install]
    WantedBy=multi-user.target

    i. As user root, copy the mogha.service file to the /usr/lib/systemd/system/ directory and run systemctl daemon-reload so that systemd loads the new unit.

  6. Start, stop, and enable MogHA.

    a. Run the following commands on all nodes as user root:

    su - root
    systemctl [start|stop|restart] mogha

    b. Check the log files on each node for errors.

    tail -f /home/omm/ha/ha/mogha_web.log
    tail -f /home/omm/ha/ha/mogha_heartbeat.log

    c. Enable the HA service to start automatically at server boot.

    su - root
    systemctl enable mogha
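If the heartbeat log points to configuration problems, or before restarting MogHA after editing node.conf, a quick static check of node.conf can help. The sketch below is hypothetical and is not MogHA's own validation: it assumes node.conf is INI-style and readable with Python's configparser (with `#` as the inline comment prefix). Its rules come from this document (zones must reference defined [hostN] sections; at most one primary and eight standbys) plus one assumed sanity rule (each timeout should exceed heartbeat_interval).

```python
import configparser
from typing import List

MAX_HOSTS = 9  # one primary plus at most eight standbys (see Topology)
TIMEOUT_KEYS = ("primary_lost_timeout", "primary_lonely_timeout", "double_primary_timeout")


def validate_node_conf(text: str) -> List[str]:
    """Return problems found in node.conf text; an empty list means none were found."""
    # '#' starts inline comments in the sample; interpolation is off so that
    # '%' in values such as passwords is read literally.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",), interpolation=None)
    parser.read_string(text)
    problems: List[str] = []

    if not parser.has_section("config"):
        return ["missing [config] section"]
    if not parser.has_option("config", "heartbeat_interval"):
        problems.append("[config] is missing heartbeat_interval")
    else:
        interval = parser.getfloat("config", "heartbeat_interval")
        for key in TIMEOUT_KEYS:
            if not parser.has_option("config", key):
                problems.append(f"[config] is missing {key}")
            elif parser.getfloat("config", key) <= interval:
                problems.append(f"{key} should be larger than heartbeat_interval")

    # Node sections are named host1, host2, ... in the sample.
    hosts = [name for name in parser.sections() if name.startswith("host")]
    if len(hosts) > MAX_HOSTS:
        problems.append(f"{len(hosts)} hosts defined; at most {MAX_HOSTS} are supported")
    for host in hosts:
        if not parser.has_option(host, "ip"):
            problems.append(f"[{host}] is missing ip")

    # Every host or cascade named in a zone must have its own section.
    zones = [name for name in parser.sections() if name.startswith("zone")]
    for zone in zones:
        for key in ("hosts", "cascades"):
            for entry in parser.get(zone, key, fallback="").split(","):
                entry = entry.strip()
                if entry and entry not in hosts:
                    problems.append(f"[{zone}] {key} refers to undefined [{entry}]")
    return problems
```

To check a node, pass the file contents, for example `validate_node_conf(open("/home/omm/ha/ha/node.conf").read())`; for the sample configuration in sub-step e it returns an empty list.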

MogHA Uninstallation

Procedure

  1. Log in to each database node as user omm.
  2. Delete the venv and ha directories under /home/omm/ha on each node.
  3. Switch to user root.
  4. Delete the mogha.service file from /usr/lib/systemd/system/ on each node.
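The uninstallation steps can be scripted per node. This sketch is hypothetical and defaults to a dry run that only returns the planned actions. It assumes the default paths used in this document and runs as root, and it adds stopping and disabling the service before the files are deleted, plus a final `systemctl daemon-reload`; these are common precautions but are not part of the procedure above.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

HA_ROOT = Path("/home/omm/ha")
UNIT_FILE = Path("/usr/lib/systemd/system/mogha.service")


def uninstall_mogha(dry_run: bool = True) -> List[str]:
    """Return the uninstall actions; execute them only when dry_run is False."""
    actions = [
        "systemctl stop mogha",
        "systemctl disable mogha",
        f"remove {HA_ROOT / 'venv'}",
        f"remove {HA_ROOT / 'ha'}",
        f"remove {UNIT_FILE}",
        "systemctl daemon-reload",
    ]
    if dry_run:
        return actions
    # Stop the service first so supervisord is not running from deleted files.
    subprocess.run(["systemctl", "stop", "mogha"], check=False)
    subprocess.run(["systemctl", "disable", "mogha"], check=False)
    for directory in (HA_ROOT / "venv", HA_ROOT / "ha"):
        shutil.rmtree(directory, ignore_errors=True)
    if UNIT_FILE.exists():
        UNIT_FILE.unlink()
    # Make systemd forget the removed unit.
    subprocess.run(["systemctl", "daemon-reload"], check=False)
    return actions


if __name__ == "__main__":
    for action in uninstall_mogha(dry_run=True):
        print(action)
```

Reviewing the dry-run output on each node before calling `uninstall_mogha(dry_run=False)` keeps the destructive part explicit.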