MogDB
Ecological Tools
Doc Menu
Documentation:v1.1.0
Supported Versions:

Checking MogDB Health Status

Check Method

Use the gs_check tool provided by MogDB to check the MogDB health status.

Precautions

  • Only user root is authorized to check new nodes added during cluster scale-out. In other cases, the check can be performed only by user omm.
  • Parameter -i or -e must be set. -i specifies a single item to be checked, and -e specifies an inspection scenario where multiple items will be checked.
  • If -i is not set to a root item or no such items are contained in the check item list of the scenario specified by -e, you do not need to enter the name or password of user root.
  • You can run -skip-root-items to skip root items.
  • Check the consistency between the new node and existing nodes. Run the gs_check command on an existing node and specify the -hosts parameter. The IP address of the new node needs to be written into the hosts file.

Procedure

Method 1:

  1. Log in as the OS user omm to the primary node of the database.
  2. Run the following command to check the MogDB database status:

    gs_check -i CheckClusterState

    In the command, -i indicates the check item and is case-sensitive. The format is -i CheckClusterState, -i CheckCPU or -i CheckClusterState,CheckCPU.

    Checkable items are listed in "Server Tools > gs_check > MogDB status checks" in the MogDB Tool Reference. You can create a check item as needed.

Method 2:

  1. Log in as the OS user omm to the primary node of the database.
  2. Run the following command to check the MogDB database health status:

    gs_check -e inspect

    In the command, -e indicates the inspection scenario and is case-sensitive. The format is -e inspect or -e upgrade.

    The inspection scenarios include inspect (routine inspection), upgrade (inspection before upgrade), Install (install inspection ), binary_upgrade (inspection before in-place upgrade), slow_node (node inspection), longtime (time-consuming inspection) and health (health inspection). You can create an inspection scenario as needed.

The MogDB inspection is performed to check MogDB status during MogDB running or to check the environment and conditions before critical operations, such as upgrade or scale-out. For details about the inspection items and scenarios, see "Server Tools > gs_check > MogDB status checks" in the MogDB Tool Reference.

Examples

Check result of a single item:

perfadm@lfgp000700749:/opt/huawei/perfadm/tool/script> gs_check -i CheckCPU
Parsing the check items config file successfully
Distribute the context file to remote hosts successfully
Start to health check for the cluster. Total Items:1 Nodes:3

Checking...               [=========================] 1/1
Start to analysis the check result
CheckCPU....................................OK
The item run on 3 nodes.  success: 3

Analysis the check result successfully
Success. All check items run completed. Total:1  Success:1  Failed:0
For more information please refer to /opt/mogdb/tools/script/gspylib/inspection/output/CheckReport_201902193704661604.tar.gz

Local execution result:

perfadm@lfgp000700749:/opt/huawei/perfadm/tool/script> gs_check -i CheckCPU -L

2017-12-29 17:09:29 [NAM] CheckCPU
2017-12-29 17:09:29 [STD] Check the CPU usage of the host. If the value of idle is greater than 30% and the value of iowait is less than 30%, this item passes the check. Otherwise, this item fails the check.
2017-12-29 17:09:29 [RST] OK

2017-12-29 17:09:29 [RAW]
Linux 4.4.21-69-default (lfgp000700749)  12/29/17  _x86_64_

17:09:24        CPU     %user     %nice   %system   %iowait    %steal     %idle
17:09:25        all      0.25      0.00      0.25      0.00      0.00     99.50
17:09:26        all      0.25      0.00      0.13      0.00      0.00     99.62
17:09:27        all      0.25      0.00      0.25      0.13      0.00     99.37
17:09:28        all      0.38      0.00      0.25      0.00      0.13     99.25
17:09:29        all      1.00      0.00      0.88      0.00      0.00     98.12
Average:        all      0.43      0.00      0.35      0.03      0.03     99.17

Check result of a scenario:

[perfadm@SIA1000131072 Check]$ gs_check -e inspect
Parsing the check items config file successfully
The below items require root privileges to execute:[CheckBlockdev CheckIOrequestqueue CheckIOConfigure CheckCheckMultiQueue CheckFirewall CheckSshdService CheckSshdConfig CheckCrondService CheckNoCheckSum CheckSctpSeProcMemory CheckBootItems CheckFilehandle CheckNICModel CheckDropCache]
Please enter root privileges user[root]:root
Please enter password for user[root]:
Please enter password for user[root] on the node[10.244.57.240]:
Check root password connection successfully
Distribute the context file to remote hosts successfully
Start to health check for the cluster. Total Items:59 Nodes:2

Checking...               [                         ] 21/59
Checking...               [=========================] 59/59
Start to analysis the check result
CheckClusterState...........................OK
The item run on 2 nodes.  success: 2

CheckDBParams...............................OK
The item run on 1 nodes.  success: 1

CheckDebugSwitch............................OK
The item run on 2 nodes.  success: 2

CheckDirPermissions.........................OK
The item run on 2 nodes.  success: 2

CheckReadonlyMode...........................OK
The item run on 1 nodes.  success: 1

CheckEnvProfile.............................OK
The item run on 2 nodes.  success: 2  (consistent)
The success on all nodes value:
GAUSSHOME        /usr1/mogdb/app
LD_LIBRARY_PATH  /usr1/mogdb/app/lib
PATH             /usr1/mogdb/app/bin


CheckBlockdev...............................OK
The item run on 2 nodes.  success: 2

CheckCurConnCount...........................OK
The item run on 1 nodes.  success: 1

CheckCursorNum..............................OK
The item run on 1 nodes.  success: 1

CheckPgxcgroup..............................OK
The item run on 1 nodes.  success: 1

CheckDiskFormat.............................OK
The item run on 2 nodes.  success: 2

CheckSpaceUsage.............................OK
The item run on 2 nodes.  success: 2

CheckInodeUsage.............................OK
The item run on 2 nodes.  success: 2

CheckSwapMemory.............................OK
The item run on 2 nodes.  success: 2

CheckLogicalBlock...........................OK
The item run on 2 nodes.  success: 2

CheckIOrequestqueue.....................WARNING
The item run on 2 nodes.  warning: 2
The warning[host240,host157] value:
On device (vdb) 'IO Request' RealValue '256' ExpectedValue '32768'
On device (vda) 'IO Request' RealValue '256' ExpectedValue '32768'

CheckMaxAsyIOrequests.......................OK
The item run on 2 nodes.  success: 2

CheckIOConfigure............................OK
The item run on 2 nodes.  success: 2

CheckMTU....................................OK
The item run on 2 nodes.  success: 2  (consistent)
The success on all nodes value:
1500

CheckPing...................................OK
The item run on 2 nodes.  success: 2

CheckRXTX...................................NG
The item run on 2 nodes.  ng: 2
The ng[host240,host157] value:
NetWork[eth0]
RX: 256
TX: 256


CheckNetWorkDrop............................OK
The item run on 2 nodes.  success: 2

CheckMultiQueue.............................OK
The item run on 2 nodes.  success: 2

CheckEncoding...............................OK
The item run on 2 nodes.  success: 2  (consistent)
The success on all nodes value:
LANG=en_US.UTF-8

CheckFirewall...............................OK
The item run on 2 nodes.  success: 2

CheckKernelVer..............................OK
The item run on 2 nodes.  success: 2  (consistent)
The success on all nodes value:
3.10.0-957.el7.x86_64

CheckMaxHandle..............................OK
The item run on 2 nodes.  success: 2

CheckNTPD...................................OK
host240: NTPD service is running, 2020-06-02 17:00:28
host157: NTPD service is running, 2020-06-02 17:00:06


CheckOSVer..................................OK
host240: The current OS is centos 7.6 64bit.
host157: The current OS is centos 7.6 64bit.

CheckSysParams..........................WARNING
The item run on 2 nodes.  warning: 2
The warning[host240,host157] value:
Warning reason: variable 'net.ipv4.tcp_retries1' RealValue '3' ExpectedValue '5'.
Warning reason: variable 'net.ipv4.tcp_syn_retries' RealValue '6' ExpectedValue '5'.
Warning reason: variable 'net.sctp.path_max_retrans' RealValue '5' ExpectedValue '10'.
Warning reason: variable 'net.sctp.max_init_retransmits' RealValue '8' ExpectedValue '10'.

CheckTHP....................................OK
The item run on 2 nodes.  success: 2

CheckTimeZone...............................OK
The item run on 2 nodes.  success: 2  (consistent)
The success on all nodes value:
+0800

CheckCPU....................................OK
The item run on 2 nodes.  success: 2

CheckSshdService............................OK
The item run on 2 nodes.  success: 2

CheckSshdConfig.........................WARNING
The item run on 2 nodes.  warning: 2
The warning[host240,host157] value:

Warning reason: UseDNS parameter is not set; expected: no

CheckCrondService...........................OK
The item run on 2 nodes.  success: 2

CheckStack..................................OK
The item run on 2 nodes.  success: 2  (consistent)
The success on all nodes value:
8192

CheckNoCheckSum.............................OK
The item run on 2 nodes.  success: 2  (consistent)
The success on all nodes value:
Nochecksum value is N,Check items pass.

CheckSysPortRange...........................OK
The item run on 2 nodes.  success: 2

CheckMemInfo................................OK
The item run on 2 nodes.  success: 2  (consistent)
The success on all nodes value:
totalMem: 31.260929107666016G

CheckHyperThread............................OK
The item run on 2 nodes.  success: 2

CheckTableSpace.............................OK
The item run on 1 nodes.  success: 1

CheckSctpService............................OK
The item run on 2 nodes.  success: 2

CheckSysadminUser...........................OK
The item run on 1 nodes.  success: 1

CheckGUCConsistent..........................OK
All DN instance guc value is consistent.

CheckMaxProcMemory..........................OK
The item run on 1 nodes.  success: 1

CheckBootItems..............................OK
The item run on 2 nodes.  success: 2

CheckHashIndex..............................OK
The item run on 1 nodes.  success: 1

CheckPgxcRedistb............................OK
The item run on 1 nodes.  success: 1

CheckNodeGroupName..........................OK
The item run on 1 nodes.  success: 1

CheckTDDate.................................OK
The item run on 1 nodes.  success: 1

CheckDilateSysTab...........................OK
The item run on 1 nodes.  success: 1

CheckKeyProAdj..............................OK
The item run on 2 nodes.  success: 2

CheckProStartTime.......................WARNING
host157:
STARTED COMMAND
Tue Jun  2 16:57:18 2020 /usr1/dmuser/dmserver/metricdb1/server/bin/mogdb --single_node -D /usr1/dmuser/dmb1/data -p 22204
Mon Jun  1 16:15:15 2020 /usr1/mogdb/app/bin/mogdb -D /usr1/mogdb/data/dn1 -M standby


CheckFilehandle.............................OK
The item run on 2 nodes.  success: 2

CheckRouting................................OK
The item run on 2 nodes.  success: 2

CheckNICModel...............................OK
The item run on 2 nodes.  success: 2  (consistent)
The success on all nodes value:
version: 1.0.0
model: Red Hat, Inc. Virtio network device


CheckDropCache..........................WARNING
The item run on 2 nodes.  warning: 2
The warning[host240,host157] value:
No DropCache process is running

CheckMpprcFile..............................NG
The item run on 2 nodes.  ng: 2
The ng[host240,host157] value:
There is no mpprc file

Analysis the check result successfully
Failed. All check items run completed. Total:59   Success:52   Warning:5   NG:2
For more information please refer to /usr1/mogdb/tool/script/gspylib/inspection/output/CheckReport_inspect611.tar.gz

Exception Handling

Troubleshoot exceptions detected in the inspection by following instructions in this section.

Table 1 Check of MogDB running status

Check Item Abnormal Status Solution
CheckClusterState (Checks the MogDB status.) MogDB or MogDB instances are not started. Run the following command to start MogDB and instances:

gs_om -t start
The status of MogDB or MogDB instances is abnormal. Check the status of hosts and instances. Troubleshoot this issue based on the status information.
gs_check -i CheckClusterState
CheckDBParams (Checks database parameters.) Database parameters have incorrect values. Use the gs_guc tool to set the parameters to specified values.
CheckDebugSwitch (Checks debug logs.) The log level is incorrect. Use the gs_guc tool to set log_min_messages to specified content.
CheckDirPermissions (Checks directory permissions.) The permission for a directory is incorrect. Change the directory permission to a specified value (750 or 700).
chmod 750 DIR
CheckReadonlyMode (Checks the read-only mode.) The read-only mode is enabled. Verify that the usage of the disk where database nodes are located does not exceed the threshold (60% by default) and no other O&M operations are performed.
gs_check -i CheckDataDiskUsage ps ux
Use the gs_guc tool to disable the read-only mode of MogDB.
gs_guc reload -N all -I all -c 'default_transaction_read_only = off'
CheckEnvProfile (Checks environment variables.) Environment variables are inconsistent. Update the environment variable information.
CheckBlockdev (Checks pre-read blocks.) The size of a pre-read block is not 16384 KB. Use the gs_checkos tool to set the size of the pre-read block to 16384 KB and write the setting into the auto-startup file.
gs_checkos -i B3
CheckCursorNum (Checks the number of cursors.) The number of cursors fails to be checked. Check whether the database is properly connected and whether the MogDB status is normal.
CheckPgxcgroup (Checks the data redistribution status.) There are pgxc_group tables that have not been redistributed. Proceed with the redistribution.
gs_expand、gs_shrink
CheckDiskFormat (Checks disk configurations.) Disk configurations are inconsistent between nodes. Configure disk specifications to be consistent between nodes.
CheckSpaceUsage (Checks the disk space usage.) Disk space is insufficient. Clear or expand the disk for the directory.
CheckInodeUsage (Checks the disk index usage.) Disk indexes are insufficient. Clear or expand the disk for the directory.
CheckSwapMemory (Checks the swap memory.) The swap memory is greater than the physical memory. Reduce or disable the swap memory.
CheckLogicalBlock (Checks logical blocks.) The size of a logical block is not 512 KB. Use the gs_checkos tool to set the size of the logical block to 512 KB and write the setting into the auto-startup file.
gs_checkos -i B4
CheckIOrequestqueue (Checks I/O requests.) The requested I/O is not 32768. Use the gs_checkos tool to set the requested I/O to 32768 and write the setting into the auto-startup file.
gs_checkos -i B4
CheckCurConnCount (Checks the number of current connections.) The number of current connections exceeds 90% of the allowed maximum number of connections. Break idle primary database node connections.
CheckMaxAsyIOrequests (Checks the maximum number of asynchronous requests.) The maximum number of asynchronous requests is less than 104857600 or (Number of database instances on the current node x 1048576). Use the gs_checkos tool to set the maximum number of asynchronous requests to the larger one between 104857600 and (Number of database instances on the current node x 1048576).
gs_checkos -i B4
CheckMTU (Checks MTU values.) MTU values are inconsistent between nodes. Set the MTU value on each node to 1500 or 8192.
ifconfig eth* MTU 1500
CheckIOConfigure (Checks I/O configurations.) The I/O mode is not deadline. Use the gs_checkos tool to set the I/O mode to deadline and write the setting into the auto-startup file.
gs_checkos -i B4
CheckRXTX (Checks the RX/TX value.) The NIC RX/TX value is not 4096. Use the checkos tool to set the NIC RX/TX value to 4096 for MogDB.
gs_checkos -i B5
CheckPing (Checks whether the network connection is normal.) There are MogDB IP addresses that cannot be pinged. Check the network settings, network status, and firewall status between the abnormal IP addresses.
CheckNetWorkDrop (Checks the network packet loss rate.) The network packet loss rate is greater than 1%. Check the network load and status between the corresponding IP addresses.
CheckMultiQueue (Checks the NIC multi-queue function.) Multiqueue is not enabled for the NIC, and NIC interruptions are not bound to different CPU cores. Enable multiqueue for the NIC, and bind NIC interruptions to different CPU cores.
CheckEncoding (Checks the encoding format.) Encoding formats are inconsistent between nodes. Write the same encoding format into /etc/profile for each node.
echo "export LANG=XXX" >> /etc/profile
CheckActQryCount (Checks the archiving mode.) The archiving mode is enabled, and the archiving directory is not under the primary database node directory. Disable archiving mode or set the archiving directory to be under the primary database node directory.
CheckFirewall (Checks the firewall.) The firewall is enabled. Disable the firewall.
systemctl disable firewalld.service
CheckKernelVer (Checks kernel versions.) Kernel versions are inconsistent between nodes.
CheckMaxHandle (Checks the maximum number of file handles.) The maximum number of handles is less than 1000000. Set the soft and hard limits in the 91-nofile.conf or 90-nofile.conf file to 1000000.
gs_checkos -i B2
CheckNTPD (Checks the time synchronization service.) The NTPD service is disabled or the time difference is greater than 1 minute. Enable the NTPD service and set the time to be consistent.
CheckSysParams (Checks OS parameters.) OS parameter settings do not meet requirements. Use the gs_checkos tool or manually set parameters to values meeting requirements.
gs_checkos -i B1 vim /etc/sysctl.conf
CheckTHP (Checks the THP service.) The THP service is disabled. Use the gs_checkos to enable the THP service.
gs_checkos -i B6
CheckTimeZone (Checks time zones.) Time zones are inconsistent between nodes. Set time zones to be consistent between nodes.
cp /usr/share/zoneinfo/\$primary time zone/$secondary time zone\ /etc/localtime
CheckCPU (Checks the CPU.) CPU usage is high or I/O waiting time is too long. Upgrade CPUs or improve disk performance.
CheckSshdService (Checks the SSHD service.) The SSHD service is disabled. Enable the SSHD service and write the setting into the auto-startup file.
service sshd start echo "server sshd start" >> initFile
CheckSshdConfig (Checks SSHD configurations.) The SSHD service is incorrectly configured. Reconfigure the SSHD service.
PasswordAuthentication=no; MaxStartups=1000; UseDNS=yes; ClientAliveInterval=10800/ClientAliveInterval=0
Restart the service.
server sshd start
CheckCrondService (Checks the Crond service.) The Crond service is disabled. Install and enable the Crond service.
CheckStack (Checks the stack size.) The stack size is less than 3072. Use the gs_checkos tool to set the stack size to 3072 and restart the processes with a smaller stack size.
gs_checkos -i B2
CheckNoCheckSum (Checks the NoCheckSum parameter.) NoCheckSum is incorrectly set or its value is inconsistent on each node. Set NoCheckSum to a consistent value on each node. If redHat6.4 or redHat6.5 with the NIC binding mode bond0 exists, set NoCheckSum to Y. In other cases, set it to N.
echo Y > /sys/module/sctp/parameters/no_checksums
CheckSysPortRange (Checks OS port configurations.) OS IP ports are not within the required port range or MogDB ports are within the OS IP port range. Set the OS IP ports within 26000 to 65535 and set the MogDB ports beyond the OS IP port range.
vim /etc/sysctl.conf
CheckMemInfo (Checks the memory information.) Memory sizes are inconsistent between nodes. Use physical memory of the same specifications between nodes.
CheckHyperThread (Checks the hyper-threading.) The CPU hyper-threading is disabled. Enable the CPU hyper-threading.
CheckTableSpace (Checks tablespaces.) The tablespace path is nested with the MogDB path or nested with the path of another tablespace. Migrate tablespace data to the tablespace with a valid path.
CheckSctpService (Checks the SCTP service.) The SCTP service is disabled. Install and enable the SCTP service.
modprobe sctp