Documentation:v2.1

gs_check

Background

gs_check has been enhanced to unify the functions of various check tools, such as gs_check and gs_checkos. It checks the MogDB runtime, OS, network, and database running environments, and it can run comprehensive checks on these environments before major operations in MogDB to help ensure that the operations proceed smoothly.

Precautions

  • Parameter -i or -e must be set. -i specifies a single item to be checked, and -e specifies an inspection scenario where multiple items will be checked.
  • If the item specified by -i is not a root item, or the scenario specified by -e contains no root items, you do not need to enter the name or password of a user with root permissions.
  • You can specify --skip-root-items to skip root items.
  • If the MTU values of the nodes are inconsistent, the check may be slow or the check process may stop responding. If the inspection tool reports inconsistent MTU values, set the MTU of all nodes to the same value and then run the inspection again.
  • If the switch does not support the configured MTU value, communication problems may cause processes to stop responding even if the MTU values are the same. In this case, adjust the MTU to a value that the switch supports.
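
As a rough illustration of the MTU precaution, the following sketch groups nodes by MTU value so that mismatches are easy to spot before an inspection. The helper find_mtu_mismatches and the sample node names and values are hypothetical; they are not part of gs_check, and how you collect the MTU of each node is left open.

```python
from collections import defaultdict


def find_mtu_mismatches(mtu_by_node):
    """Group node names by MTU; return an empty dict when all nodes agree."""
    groups = defaultdict(list)
    for node, mtu in sorted(mtu_by_node.items()):
        groups[mtu].append(node)
    return dict(groups) if len(groups) > 1 else {}


# Hypothetical sample data: node name -> MTU of the NIC used for database traffic.
sample = {"node1": 1500, "node2": 1500, "node3": 8192}
for mtu, nodes in sorted(find_mtu_mismatches(sample).items()):
    print("MTU %d: %s" % (mtu, ", ".join(nodes)))
```

If the helper returns an empty dict, the MTU values are consistent and the inspection can proceed.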

Syntax

  • Check a single item.

    gs_check -i ITEM [...] [-U USER] [-L] [-l LOGFILE] [-o OUTPUTDIR] [--skip-root-items] [--set] [--routing]
  • Check a scenario.

    gs_check -e SCENE_NAME [-U USER] [-L] [-l LOGFILE] [-o OUTPUTDIR] [--skip-root-items] [--time-out=SECS] [--set] [--routing] [--skip-items]
  • Display help information.

    gs_check -? | --help
  • Display version information.

    gs_check -V | --version
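
The two check forms above can be assembled programmatically. The sketch below builds an argument list using only options that appear in the syntax. The function build_gs_check_command is a hypothetical helper, and it assumes that -i and -e are not combined, because the syntax lists them as separate forms.

```python
import shlex


def build_gs_check_command(item=None, scene=None, user=None, local=False,
                           log_file=None, output_dir=None,
                           skip_root_items=False):
    """Build a gs_check argument list for one item (-i) or one scenario (-e)."""
    # Assumption: exactly one of -i / -e is given per invocation.
    if (item is None) == (scene is None):
        raise ValueError("set exactly one of item (-i) or scene (-e)")
    argv = ["gs_check"]
    argv += ["-i", item] if item is not None else ["-e", scene]
    if user:
        argv += ["-U", user]
    if local:
        argv.append("-L")
    if log_file:
        argv += ["-l", log_file]
    if output_dir:
        argv += ["-o", output_dir]
    if skip_root_items:
        argv.append("--skip-root-items")
    return argv


# Print a shell-quoted command line for a local CPU check.
print(shlex.join(build_gs_check_command(item="CheckCPU", local=True)))
```

Returning a list rather than a string keeps the result safe to pass to subprocess.run without shell quoting concerns.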

Parameter Description

  • -U

    Specifies the name of the user for running MogDB.

    Value range: Name of the user for running MogDB

  • -L

    Specifies that the check is locally performed.

  • -i

    Specifies a check item. Its format is -i CheckXX. For details about check items, see MogDB status checklist.

  • -e

    Specifies the scenario to check. Default scenarios include inspect (routine inspection), upgrade (pre-upgrade inspection), binary_upgrade (local pre-upgrade inspection), health (health check), and install (installation). You can also define your own scenarios as required.

  • -l

    Specifies a log file path. Add the .log suffix when specifying the path.

  • -o

    Specifies the path of the check result output folder.

  • --skip-root-items

    Skips the check items that require root permissions.

  • --skip-items

    Skips specified check items.

  • --format

    Specifies the format of the result report.

  • --set

    Fixes the abnormal items that support automatic fixing.

  • --cid

    Specifies the check ID. It is used only by the internal check process.

  • --time-out

    Specifies the timeout period, in seconds. The default value is 1500. A user-defined timeout period must not be less than 1500 seconds.

  • --routing

    Specifies the network segment for service IP addresses. The format is IP address:Subnet mask.

  • --disk-threshold=PERCENT

    Specifies the alarm threshold used when you check disk usage. The value is an integer ranging from 1 to 99. The default value is 90. This parameter is not required for other check items.

  • -?, --help

    Displays help information.

  • -V, --version

    Displays version information.
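
To make two of the value formats above concrete, here is a minimal sketch that validates a routing value (IP address:Subnet mask) and a disk threshold (integer from 1 to 99, default 90). Both helper functions are hypothetical, and the sketch assumes IPv4 addresses. How gs_check itself parses these values is not shown in this document.

```python
import ipaddress


def parse_routing(value):
    """Parse 'IP address:Subnet mask' (IPv4 assumed) into a network object."""
    address, sep, mask = value.partition(":")
    if not sep or not address or not mask:
        raise ValueError("expected the format IP address:Subnet mask")
    # strict=False masks off any host bits instead of rejecting them.
    return ipaddress.ip_network("%s/%s" % (address, mask), strict=False)


def validate_disk_threshold(value=90):
    """Return the threshold as an int if it lies within 1..99."""
    percent = int(value)
    if not 1 <= percent <= 99:
        raise ValueError("disk threshold must be an integer from 1 to 99")
    return percent


print(parse_routing("192.168.0.0:255.255.240.0"))
print(validate_disk_threshold("85"))
```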

MogDB status checklist

  • OS

    Check Item Description --set Supported or Not
    CheckCPU Checks the CPU usage of the host. If idle is greater than 30% and iowait is less than 30%, this item passes the check. Otherwise, this item fails the check. No
    CheckFirewall Checks the firewall status of the host. If the firewall is disabled, this item passes the check. Otherwise, this item fails the check. Yes
    CheckTimeZone Checks whether nodes in openGauss use the same time zone. If they do, this item passes the check. Otherwise, this item fails the check. No
    CheckSysParams Checks whether the values of OS parameters for each node are as expected. If the parameters do not meet the requirements of the warning field, a warning is reported. If the parameters do not meet the requirements of the NG field, this item fails the check, and the failed parameters are printed. Yes
    CheckOSVer Checks the OS version of each node in openGauss. If every version is in the version compatibility list and the OS versions of all nodes belong to the same version list, this item passes the check. Otherwise, this item fails the check. No
    CheckNTPD Checks the NTPD service. If the service is enabled and the time difference across nodes is within 1 minute, this item passes the check. Otherwise, this item fails the check. No
    CheckTHP Checks the THP service. If the service is enabled, this item passes the check. Otherwise, this item fails the check. Yes
    CheckSshdService Checks whether the sshd service is started. If yes, this item passes the check. Otherwise, this item fails the check. No
    CheckCrondService Checks whether the crontab service is started. If yes, this item passes the check. Otherwise, this item fails the check. Yes
    CheckCrontabLeft Checks whether the crontab file contains remaining Gauss information. If no, this item passes the check. Otherwise, this item fails the check. Yes
    CheckDirLeft Checks whether the /opt/enmo/Bigdata/, /var/log/Bigdata/, and /home/omm directories exist. If they do not exist or exist only in the mount directory, this item passes the check. Otherwise, this item fails the check. Yes
    CheckProcessLeft Checks whether the gaussdb and omm processes exist. If no, this item passes the check. Otherwise, this item fails the check. Yes
    CheckStack Checks stack depths. If the stack depths across nodes are inconsistent, a warning is reported. If the stack depths are consistent and greater than or equal to 3072, this item passes the check. If the stack depths are consistent but less than 3072, this item fails the check. Yes
    CheckOmmUserExist Checks whether user omm exists. If no, this item passes the check. Otherwise, this item fails the check. Yes
    CheckPortConflict Checks whether database node ports are occupied. If they are not, this item passes the check. Otherwise, this item fails the check. Yes
    CheckSysPortRange Checks the value range of the system parameter ip_local_port_range. If the value range is 26000 to 65535, this item passes the check. Otherwise, this item fails the check. Yes
    CheckEtcHosts If localhost is not configured for /etc/hosts, there is a mapping whose comment contains #openGauss, or the names of hosts having the same IP address are different, this item fails the check. Otherwise, this item passes the check. In addition, if host names are the same but IP addresses are different, this item also fails the check. No
    CheckCpuCount Checks the number of CPU cores. If the number is different from that of available CPUs, this item fails the check. If the two numbers are the same but unavailability messages exist, a warning is reported. If the CPU information of all nodes is different, this item fails the check. No
    CheckHyperThread Checks hyper-threading. If it is started, this item passes the check. Otherwise, this item fails the check. No
    CheckMemInfo Checks whether the total memory size of each node is the same. If yes, this item passes the check. Otherwise, a warning is reported. No
    CheckSshdConfig Checks the /etc/ssh/sshd_config file.
    (a)PasswordAuthentication=yes;
    (b)MaxStartups=1000;
    (c)UseDNS=no;
    (d) ClientAliveInterval is greater than 10800 or equal to 0.
    If the above information is configured, this item passes the check. If a and c configurations are incorrect, a warning is reported. If b and d configurations are incorrect, this item fails the check.
    Yes
    CheckMaxHandle Checks the maximum handle value of the OS. If the value is greater than or equal to 1 million, this item passes the check. Otherwise, this item fails the check. Yes
    CheckKernelVer Checks the kernel version of each node. If the version information is consistent, this item passes the check. Otherwise, a warning is reported. No
    CheckEncoding Checks the system encoding of each node in openGauss. If the encodings are consistent, this item passes the check. Otherwise, this item fails the check. No
    CheckBootItems Checks whether there are manually added startup items. If no, this item passes the check. Otherwise, this item fails the check. No
    CheckDropCache Checks whether there is a dropcache process running on each node. If yes, this item passes the check. Otherwise, this item fails the check. No
    CheckFilehandle Checks the following conditions. If both conditions are met, this item passes the check. Otherwise, this item fails the check.
    - The number of handles opened by each gaussdb process does not exceed 800,000.
    - The number of handles used by the slave process does not exceed the number of handles used by the master process.
    No
    CheckKeyProAdj Checks all key processes. If the oom_adj value for all key processes is 0, this item passes the check. Otherwise, this item fails the check. No
    CheckMaxProcMemory Checks whether the value of max_process_memory on the database nodes is greater than 1 GB. If no, this item passes the check. Otherwise, this item fails the check. Yes
  • Device

    Check Item Description --set Supported or Not
    CheckSwapMemory Checks the swap memory and total memory sizes. If the check result is 0, this item passes the check. Otherwise, a warning is reported. If the result is greater than the total memory, this item fails the check. Yes
    CheckLogicalBlock Checks the logical block size of a disk. If the result is 512, this item passes the check. Otherwise, this item fails the check. Yes
    CheckIOrequestqueue Checks the I/O value. If the value is 32768, this item passes the check. Otherwise, this item fails the check. Yes
    CheckMaxAsyIOrequests Checks the maximum number of asynchronous requests. If the number of asynchronous I/O requests is greater than 104857600 and greater than the number of database instances on the current node x 1048576, this item passes the check. Otherwise, this item fails the check. Yes
    CheckIOConfigure Checks the I/O configuration. If the result is deadline, this item passes the check. Otherwise, this item fails the check. Yes
    CheckBlockdev Checks the size of the pre-read block. If the result is 16384, this item passes the check. Otherwise, this item fails the check. Yes
    CheckDiskFormat Checks the XFS format information about a disk. If the result is rw,noatime,inode64,allocsize=16m, this item passes the check. Otherwise, a warning is reported. No
    CheckInodeUsage Checks openGauss paths GAUSSHOME/PGHOST/GPHOME/GAUSSLOG/tmp and instance directories.
    Checks the usage of the above directories. If the usage exceeds the warning threshold (60% by default), a warning is reported. If the usage exceeds the NG threshold (80% by default), this item fails the check. If the usage is less than or equal to the NG threshold, this item passes the check.
    No
    CheckSpaceUsage Checks openGauss paths GAUSSHOME/PGHOST/GPHOME/GAUSSLOG/tmp and instance directories.
    Checks the usage of the above directories. If the usage exceeds the warning threshold (70% by default), a warning is reported. If the usage exceeds the NG threshold (90% by default), this item fails the check. Also checks the available space of the GAUSSHOME/PGHOST/GPHOME/GAUSSLOG/tmp/data directory. If the space is less than the threshold, this item fails the check. Otherwise, this item passes the check.
    No
    CheckDiskConfig Checks whether disk configurations are consistent. If the names, sizes, and mount points of disks are the same, this item passes the check. If any of them is inconsistent, a warning is reported. No
    CheckXid Checks the value of xid. If the value is greater than 1 billion, a warning is reported. If the value is greater than 1.8 billion, this item fails the check. No
    CheckSysTabSize Checks the system catalog capacity of each instance. If the available capacity of each disk is greater than the total capacity of system catalogs for all instances on the disk, this item passes the check. Otherwise, this item fails the check. No
  • Cluster

    Check Item Description --set Supported or Not
    CheckClusterState Checks the fencedUDF status. If it is down, a warning is reported. In this case, check the openGauss status. If it is Normal, this item passes the check. Otherwise, this item fails the check. No
    CheckDBParams For the primary database node, checks the size of the shared buffer and the Sem parameter.
    For database nodes, checks the size of the shared buffer and the maximum number of connections.
    The shared buffer size should be greater than 128 KB, less than shmmax, and less than shmall x PAGESIZE.
    If there is the primary database node, Sem must be greater than the rounded up result of (Maximum number of database node connections + 150)/16.
    If the above items are met, this item passes the check. If any of them is not met, this item fails the check.
    Yes
    CheckDebugSwitch Checks the value of the log_min_messages parameter in the configuration file of each instance on each node. If the value is empty, the default log level warning is used. In this case, if the actual log level is not warning, a warning is reported. Yes
    CheckUpVer Checks the version of the upgrade package on each node in openGauss. If the versions are consistent, this item passes the check. Otherwise, this item fails the check. You need to specify the path of the upgrade software package. No
    CheckDirPermissions Checks permissions for the node directories (instance Xlog path, GAUSSHOME, GPHOME, PGHOST, and GAUSSLOG). If the directories allow for the write permission and at most 750 permission, this item passes the check. Otherwise, this item fails the check. Yes
    CheckEnvProfile Checks the environment variables ($GAUSSHOME, $LD_LIBRARY_PATH, and $PATH) of nodes and those of the CMS, CMA, and database node processes. If there are node environment variables that are correctly configured and process environment variables exist, this item passes the check. Otherwise, this item fails the check. No
    CheckGaussVer Checks whether the gaussdb version of each node is consistent. If the versions are consistent, this item passes the check. Otherwise, this item fails the check. No
    CheckPortRange Checks the port range. If the value of ip_local_port_range is within the threshold (26000 to 65535 by default) and an instance port is out of the range, this item passes the check. Otherwise, this item fails the check. No
    CheckReadonlyMode Checks the read only mode. If the value of default_transaction_read_only on the database nodes in openGauss is off, this item passes the check. Otherwise, this item fails the check. No
    CheckCatchup Checks whether the CatchupMain function can be found in the gaussdb process stack. If no, this item passes the check. Otherwise, this item fails the check. No
    CheckProcessStatus Checks the owner of the gaussdb processes. If their owner is only user omm, this item passes the check. Otherwise, this item fails the check. No
    CheckSpecialFile Checks whether the files in the tmp directory (PGHOST), OM directory (GPHOME), log directory (GAUSSLOG), data directory, and program directory (GAUSSHOME) contain special characters or whether there are files that do not belong to user omm. If none of them exists, this item passes the check. Otherwise, this item fails the check. No
    CheckCollector Checks whether information is successfully collected in the output directory. If yes, this item passes the check. Otherwise, this item fails the check. No
    CheckLargeFile Checks whether there is a file over 4 GB in the directory of each database node. If there is such a file in any database node directory and its subdirectories, this item fails the check. Otherwise, this item passes the check. No
    CheckProStartTime Checks whether the interval for starting key processes exceeds 5 minutes. If no, this item passes the check. Otherwise, this item fails the check. No
    CheckDilateSysTab Checks whether a system catalog is bloated. If no, this item passes the check. Otherwise, this item fails the check. Yes
    CheckMpprcFile Checks whether the environment variable isolation file is modified. If no, this item passes the check. Otherwise, this item fails the check. No
  • Database

    Check Item Description --set Supported or Not
    CheckLockNum Checks the number of database locks. If a result is returned, this item passes the check. No
    CheckArchiveParameter Checks the database archive parameter. If the parameter is not enabled or is enabled for database nodes, this item passes the check. If it is enabled but not for database nodes, this item fails the check. Yes
    CheckCurConnCount Checks the number of database connections. If the number is less than 90% of the maximum connection quantity, this item passes the check. Otherwise, this item fails the check. No
    CheckCursorNum Checks the number of cursors in the database. If a result is returned, this item passes the check. Otherwise, this item fails the check. No
    CheckMaxDatanode Checks the maximum number of database nodes. If the number is less than the number of nodes configured in the XML file multiplied by the number of database nodes (90 x 5 by default), a warning is reported. Otherwise, this item passes the check. Yes
    CheckPgPreparedXacts Checks the pgxc_prepared_xacts parameter. If no 2PC transactions are found, this item passes the check. Otherwise, this item fails the check. Yes
    CheckPgxcgroup Checks the number of redistributed records in the pgxc_group table. If the result is 0, this item passes the check. Otherwise, this item fails the check. No
    CheckLockState Checks whether openGauss is locked. If no, this item passes the check. Otherwise, this item fails the check. No
    CheckIdleSession Checks the number of non-idle sessions. If the result is 0, this item passes the check. Otherwise, this item fails the check. No
    CheckDBConnection Checks whether the database can be connected. If yes, this item passes the check. Otherwise, this item fails the check. No
    CheckGUCValue Checks the result of [(max_connections + max_prepared_transactions) x max_locks_per_transaction]. If it is greater than or equal to 1 million, this item passes the check. Otherwise, this item fails the check. Yes
    CheckPMKData Checks whether the PMK schema of the database contains abnormal data. If no, this item passes the check. Otherwise, this item fails the check. Yes
    CheckSysTable Checks the system catalog. If the check can be performed, this item passes the check. No
    CheckSysTabSize Checks the system catalog capacity of each instance. If the available capacity of each disk is greater than the total capacity of system catalogs for all instances on the disk, this item passes the check. Otherwise, this item fails the check. No
    CheckTableSpace Checks tablespace paths. If no tablespace path and openGauss path are nested and no tablespace paths are nested, this item passes the check. Otherwise, this item fails the check. No
    CheckTableSkew Checks the skew of table data. If a table has unbalanced data distribution among openGauss database nodes, and the database node with the most data has over 100,000 records more than the database node with the least data, this item fails the check. Otherwise, this item passes the check. No
    CheckDNSkew Checks the skew of table data at the database node level. If the database node with the most amount of data has 5% more than the database node with the smallest amount of data, this item fails the check. Otherwise, this item passes the check. No
    CheckUnAnalyzeTable Checks for a table that has not been analyzed. If there is such a table and the table contains at least one record, this item fails the check. Otherwise, this item passes the check. Yes
    CheckCreateView If the query statement for creating a view contains sub-queries, and parsing and rewriting sub-query results lead to duplicate aliases, the check result is failed. Otherwise, the check result is passed. No
    CheckHashIndex Checks whether there are hash indexes. If yes, this item fails the check. Otherwise, this item passes the check. No
    CheckNextvalInDefault Checks whether a DEFAULT expression contains nextval (sequence). If yes, this item fails the check. Otherwise, this item passes the check. No
    CheckNodeGroupName Checks whether the name of a Node Group contains non-SQL_ASCII characters. If yes, this item fails the check. Otherwise, this item passes the check. Yes
    CheckPgxcRedistb Checks whether any temporary table remains in the database after data redistribution. If yes, this item fails the check. Otherwise, this item passes the check. No
    CheckReturnType Checks whether a user-defined function contains invalid return value types. If yes, this item fails the check. Otherwise, this item passes the check. No
    CheckSysadminUser Checks whether there are database administrators in addition to the owner of openGauss. If yes, this item fails the check. Otherwise, this item passes the check. No
    CheckTDDate Checks whether the ORC table in a Teradata database contains columns of the date type. If yes, this item fails the check. Otherwise, this item passes the check. No
    CheckDropColumn Checks whether there are tables on which DROP COLUMN has been performed. If yes, this item fails the check. Otherwise, this item passes the check. No
    CheckDiskFailure Checks for disk faults. If there is an error during full data query in openGauss, this item fails the check. Otherwise, this item passes the check. No
  • Network

    Check Item Description --set Supported or Not
    CheckPing Checks the connectivity of all nodes in openGauss. If all their IP addresses can be pinged from each other, this item passes the check. Otherwise, this item fails the check. No
    CheckRXTX Checks the RX/TX value for backIP of a node. If it is 4096, this item passes the check. Otherwise, this item fails the check. Yes
    CheckMTU Checks the MTU value of the NIC corresponding to backIP of a node (ensure that the physical NICs are consistent after bonding). If the result is not 8192 or 1500, a warning is reported. In this case, if MTU values in openGauss are the same, this item passes the check. Otherwise, this item fails the check. Yes
    CheckNetWorkDrop Checks the packet loss rate of each IP address within 1 minute. If the rate does not exceed 1%, this item passes the check. Otherwise, this item fails the check. No
    CheckBond Checks whether BONDING_OPTS or BONDING_MODULE_OPTS is configured. If no, a warning is reported. In this case, checks whether the bond mode of each node is the same. If yes, this item passes the check. Otherwise, this item fails the check. Yes
    CheckMultiQueue Checks cat /proc/interrupts. If multiqueue is enabled for NICs and different CPUs are bound, this item passes the check. Otherwise, this item fails the check. Yes
    CheckUsedPort Checks the value of net.ipv4.ip_local_port_range. If the value is greater than or equal to the default value of the OS (32768 to 61000), this item passes the check.
    Checks the number of random TCP ports. If the number is less than 80% of the total number of random ports, this item passes the check.
    No
    CheckNICModel Checks whether NIC models or driver versions are consistent across nodes. If yes, this item passes the check. Otherwise, a warning is reported. No
    CheckRouting Checks the number of IP addresses on the service network segment for each node. If the number exceeds 1, a warning is reported. Otherwise, this item passes the check. No
    CheckNetSpeed When the network is fully loaded, checks whether the average NIC receiving bandwidth is greater than 600 MB. If yes, this item passes the check.
    When the network is fully loaded, checks the network ping value. If it is shorter than 1s, this item passes the check.
    When the network is fully loaded, checks the NIC packet loss rate. If it is less than 1%, this item passes the check.
    No
  • Others

    Check Item Description --set Supported or Not
    CheckDataDiskUsage Checks the disk usage of the database node directory. If the usage is lower than 90%, this item passes the check. Otherwise, this item fails the check. No
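
Several checklist rules above are plain arithmetic. The following sketch works through three of them as described in the tables: CheckGUCValue, CheckXid, and the Sem bound in CheckDBParams. The function names and the sample parameter values are hypothetical and chosen only for illustration; this is not the gs_check implementation.

```python
import math


def guc_value_passes(max_connections, max_prepared_transactions,
                     max_locks_per_transaction):
    """CheckGUCValue: (connections + prepared) x locks must reach 1 million."""
    product = ((max_connections + max_prepared_transactions)
               * max_locks_per_transaction)
    return product >= 1000000


def xid_status(xid):
    """CheckXid: warn above 1 billion, fail above 1.8 billion."""
    if xid > 1800000000:
        return "NG"
    if xid > 1000000000:
        return "WARNING"
    return "OK"


def sem_lower_bound(max_connections):
    """CheckDBParams: Sem must be greater than ceil((connections + 150) / 16)."""
    return math.ceil((max_connections + 150) / 16)


# Hypothetical sample values.
print(guc_value_passes(5000, 800, 256))   # (5000 + 800) x 256 = 1,484,800
print(xid_status(1200000000))
print(sem_lower_bound(5000))              # ceil(5150 / 16)
```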

NOTE: Constraints on the CheckNetSpeed check item are as follows:

  • Do not use -L to check CheckNetSpeed, because doing so cannot produce enough network load and the check result will be inaccurate.
  • If the number of nodes is less than six, the network load produced by speed_test may not fully occupy the bandwidth, and the check result will be inaccurate.
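
The two CheckNetSpeed constraints can be expressed as a small pre-flight guard. This is only a sketch: netspeed_warnings is a hypothetical helper, and it assumes you already know whether -L will be used and how many nodes take part in the check.

```python
def netspeed_warnings(local_mode, node_count):
    """Return warnings for conditions that make CheckNetSpeed inaccurate."""
    warnings = []
    if local_mode:
        warnings.append("-L cannot produce enough network load for CheckNetSpeed")
    if node_count < 6:
        warnings.append("fewer than six nodes may not fully occupy the bandwidth")
    return warnings


# Example: a local check on a three-node deployment triggers both warnings.
for message in netspeed_warnings(local_mode=True, node_count=3):
    print("WARNING:", message)
```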

Defining a Scenario

  1. Log in as the OS user omm to the primary node of the database.

  2. Create the scenario configuration file scene_XXX.xml in the script/gspylib/inspection/config directory.

  3. Write check items to the scenario configuration file in the following format:

    <?xml version="1.0" encoding="utf-8" ?>
    <scene name="XXX" desc="check cluster parameters before XXX.">
        <configuration/>
        <allowitems>
            <item name="CheckXXX"/>
            <item name="CheckXXX"/>
        </allowitems>
    </scene>

    item name indicates the check item name.

    Note: You need to ensure that the user-defined XML file is correct.

  4. Run the following command in the home/package/script/gspylib/inspection/config directory to deploy the file on each node where the check is to be performed:

    scp scene_upgrade.xml SIA1000068994:home/package/script/gspylib/inspection/config/

    NOTE: home/package/script/gspylib/inspection/config is the absolute path of the new scenario configuration file.

  5. Switch to user omm and run the following command on an old node to view the check result:

    gs_check -e XXX
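
Because the scenario file must be well-formed XML, it can help to generate it rather than type it by hand. The sketch below builds a file in the format shown in step 3 with Python's standard xml.etree.ElementTree module. The function build_scene_xml, the scenario name demo, and the item list are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET


def build_scene_xml(scene_name, description, item_names):
    """Return the text of a scenario file listing the given check items."""
    scene = ET.Element("scene", {"name": scene_name, "desc": description})
    ET.SubElement(scene, "configuration")
    allow_items = ET.SubElement(scene, "allowitems")
    for item_name in item_names:
        ET.SubElement(allow_items, "item", {"name": item_name})
    body = ET.tostring(scene, encoding="unicode")
    return '<?xml version="1.0" encoding="utf-8" ?>\n' + body + "\n"


# "demo" and the item list are placeholder values chosen for illustration.
xml_text = build_scene_xml("demo", "check cluster parameters before demo.",
                           ["CheckCPU", "CheckMTU"])
# Parse the text back to confirm that it is well-formed before deploying it.
root = ET.fromstring(xml_text.encode("utf-8"))
print("scene_%s.xml lists %d items" % (root.get("name"),
                                       len(root.find("allowitems"))))
```

Writing xml_text to scene_demo.xml in the configuration directory would then correspond to step 2.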

Defining a Check Item

  1. Add a check item. Modify the script/gspylib/inspection/config/items.xml file in the following format:

    <checkitem id="10010" name="CheckCPU">
        <title>
            <zh>Check the CPU usage.</zh>
            <en>Check CPU Idle and I/O wait</en>
        </title>
        <threshold>
            StandardCPUIdle=30;
            StandardWIO=30
        </threshold>
        <suggestion>
            <zh>If idle is too low, the CPU is heavily loaded and the nodes should be scaled out. If iowait is too high, the disk is the performance bottleneck and its capacity should be expanded.</zh>
        </suggestion>
        <standard>
            <zh>Check the CPU usage of the host. If the value of idle is greater than 30% and the value of iowait is less than 30%, this item passes the check. Otherwise, this item fails the check.</zh>
        </standard>
        <category>os</category>
        <permission>user</permission>
        <scope>all</scope>
        <analysis>default</analysis>
    </checkitem>
    • id: specifies the check item ID.

    • name: specifies the name of the check script.

    • title: specifies the check item description. It allows multiple languages.

      zh: specifies the content in Chinese.

      en: specifies the content in English.

    • standard: specifies the check standards. It allows multiple languages.

    • suggestion: provides advice on how to fix check item problems. It allows multiple languages.

    • threshold: specifies the check item threshold. Multiple values are separated using semicolons (;), for example, Key1=Value1;Key2=Value2.

    • category: specifies the check item type. It is optional. Its value can be os, device, network, cluster, database, or other.

    • permission: specifies the permission required for checking an item. It is optional. Its value can be root or user (default).

    • scope: specifies the nodes on which an item is checked. It is optional. cn indicates that only the primary database node is checked. local indicates that only the current node is checked. all is the default value, indicating that all nodes in MogDB are checked.

    • analysis: specifies how the check result is analyzed. default is the default value, indicating that the result on every node is checked and that an item passes the check only if it passes the check on all the nodes. consistent indicates that each node returns a result and that an item passes the check if all the results are consistent. custom indicates other analysis methods.

    Note: You need to ensure that the user-defined XML file is correct.

  2. Create a check script named CheckXXXX.py in the script/gspylib/inspection/items directory. The directory contains multiple folders, each storing one type of script. The format is as follows:

    # The module paths below are an assumption based on the gspylib inspection
    # package layout; adjust them to match your installation.
    from gspylib.inspection.common import SharedFuncs
    from gspylib.inspection.common.CheckItem import BaseItem
    from gspylib.inspection.common.CheckResult import ResultStatus


    class CheckCPU(BaseItem):
        def __init__(self):
            super(CheckCPU, self).__init__(self.__class__.__name__)
            self.idle = None
            self.wio = None
            self.standard = None

        def preCheck(self):
            # check the threshold was set correctly
            if ('StandardCPUIdle' not in self.threshold
                    or 'StandardWIO' not in self.threshold):
                raise Exception("threshold can not be empty")
            self.idle = self.threshold['StandardCPUIdle']
            self.wio = self.threshold['StandardWIO']

            # format the standard by threshold
            self.standard = self.standard.format(idle=self.idle, iowait=self.wio)

        def doCheck(self):
            # sample the CPU usage once per second, five times
            cmd = "sar 1 5 2>&1"
            output = SharedFuncs.runShellCmd(cmd)
            self.result.raw = output
            # check the result with threshold
            d = next(n.split() for n in output.splitlines() if "Average" in n)
            iowait = d[-3]
            idle = d[-1]
            rst = ResultStatus.OK
            vals = []
            # compare as numbers: the parsed values and thresholds are strings
            if float(iowait) > float(self.wio):
                rst = ResultStatus.NG
                vals.append("The %s actual value %s is greater than expected value %s" % ("IOWait", iowait, self.wio))
            if float(idle) < float(self.idle):
                rst = ResultStatus.NG
                vals.append("The %s actual value %s is less than expected value %s" % ("Idle", idle, self.idle))
            self.result.rst = rst
            if vals:
                self.result.val = "\n".join(vals)

    A script is developed based on the BaseItem class, which defines the common check process, result analysis method, and default output format. The following methods can be extended:

    • doCheck: contains specific ways to check an item. The check result is in the following format:

      result.rst: (optional) specifies the check result. Its value can be:

      • OK: indicates that the item passes the check.
      • NA: indicates that the check does not cover the node.
      • NG: indicates that the item failed the check.
      • WARNING: indicates that the check is complete and that a warning is reported.
      • ERROR: indicates that the check is interrupted due to an internal error.
    • preCheck: checks prerequisites. Its value can be cnPreCheck, which checks whether a primary database node instance is deployed on the current execution node; or localPreCheck, which checks whether the current execution node is specified for the check. You can set it using scope in the check item configuration file. This method can be overridden to perform customized pre-checks.

    • postAnalysis: specifies how the check result is analyzed. Its value can be default or consistent. You can set it using analysis in the check item configuration file. This method can be overridden to perform customized result analysis.

    Note: The name of a user-defined check item cannot be the same as the name of an existing check item. In addition, you need to ensure that the user-defined check item script is standard.

  3. Deploy the script on all execution nodes.

  4. Log in to the nodes added in a scale-out as user root or to old nodes as user omm. Run the following commands as required and view the result:

    To locally perform a check, run the following command:

    gs_check -i CheckXXX -L

    To remotely perform a check, run the following command:

    gs_check -i CheckXXX
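
The BaseItem contract described in step 2 can be tried out without a MogDB installation. The following is a minimal sketch, not the gspylib implementation: ResultStatus, CheckResult, and BaseItem are simplified stand-ins written for illustration, and the run driver and the CheckDemoThreshold item are hypothetical. Only the method names preCheck and doCheck, the result.rst and result.val fields, and the five status values come from the description above.

```python
from enum import Enum


class ResultStatus(Enum):
    """Stand-in for the five result values listed in step 2."""
    OK = "OK"
    NA = "NA"
    NG = "NG"
    WARNING = "WARNING"
    ERROR = "ERROR"


class CheckResult:
    """Stand-in result holder exposing the rst and val fields."""
    def __init__(self):
        self.rst = ResultStatus.NA
        self.val = ""


class BaseItem:
    """Simplified stand-in, NOT the real gspylib BaseItem."""
    def __init__(self, name):
        self.name = name
        self.result = CheckResult()

    def preCheck(self):
        """Prerequisite hook; subclasses may override it."""

    def doCheck(self):
        raise NotImplementedError("subclasses implement the check here")

    def run(self):
        # Hypothetical driver: an exception in either hook maps to ERROR.
        try:
            self.preCheck()
            self.doCheck()
        except Exception as exc:
            self.result.rst = ResultStatus.ERROR
            self.result.val = str(exc)
        return self.result


class CheckDemoThreshold(BaseItem):
    """Hypothetical item: passes when a measured value stays within a limit."""
    def __init__(self, measured, limit):
        super().__init__(self.__class__.__name__)
        self.measured = measured
        self.limit = limit

    def preCheck(self):
        # Customized pre-check: refuse to run without a configured threshold.
        if self.limit is None:
            raise ValueError("threshold 'limit' is not configured")

    def doCheck(self):
        if self.measured > self.limit:
            self.result.rst = ResultStatus.NG
            self.result.val = ("The actual value %s is greater than expected value %s"
                               % (self.measured, self.limit))
        else:
            self.result.rst = ResultStatus.OK


if __name__ == "__main__":
    for item in (CheckDemoThreshold(10, 30), CheckDemoThreshold(45, 30)):
        outcome = item.run()
        print(item.name, outcome.rst.value, outcome.val)
```

In the sketch, a failed prerequisite surfaces as ERROR while a threshold violation surfaces as NG, mirroring the distinction between an interrupted check and a failed check.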

OS Parameters

Table 2 OS parameters

Parameter Description Recommended Value
net.ipv4.tcp_max_tw_buckets Specifies the maximum number of TCP/IP connections concurrently remaining in the TIME_WAIT state. If the number of TCP/IP connections concurrently remaining in the TIME_WAIT state exceeds the value of this parameter, the TCP/IP connections in the TIME_WAIT state will be released immediately, and alarm information will be printed. 10000
net.ipv4.tcp_tw_reuse Reuses sockets whose status is TIME_WAIT for new TCP connections.
- 0: This function is disabled.
- 1: This function is enabled.
1
net.ipv4.tcp_tw_recycle Rapidly reclaims sockets whose status is TIME_WAIT in TCP connections.
- 0: This function is disabled.
- 1: This function is enabled.
1
net.ipv4.tcp_keepalive_time Specifies how often keepalive messages are sent through TCP connections when keepalive is enabled. 30
net.ipv4.tcp_keepalive_probes Specifies the number of keepalive detection packets sent through a TCP connection before the connection is regarded as invalid. The value of this parameter multiplied by the value of the tcp_keepalive_intvl parameter determines the response timeout duration after a keepalive message is sent through a connection. 9
net.ipv4.tcp_keepalive_intvl Specifies how often a detection packet is re-sent when the previous packets are not acknowledged. 30
net.ipv4.tcp_retries1 Specifies the maximum TCP reattempts during connection establishment. 5
net.ipv4.tcp_syn_retries Specifies the maximum SYN packet reattempts in the TCP. 5
net.ipv4.tcp_synack_retries Specifies the maximum SYN response packet reattempts in the TCP. 5
net.ipv4.tcp_retries2 Specifies the number of times that the kernel re-sends data to a connected remote host. A smaller value leads to earlier detection of an invalid connection to the remote host, and the server can quickly release this connection.
If "connection reset by peer" is displayed, increase the value of this parameter to avoid the problem.
12
vm.overcommit_memory Specifies the kernel check method during memory allocation.
- 0: The system accurately calculates the current available memory.
- 1: The system returns a success message without a kernel check.
- 2: The system returns a failure message if the memory size you have applied for exceeds the result of the following formula: Total memory size x Value of vm.overcommit_ratio / 100 + Total SWAP size.
The default value for a kernel is 2, which is too conservative. The recommended value is 0. If system loads are high, set this parameter to 1.
0
net.ipv4.tcp_rmem Specifies the free memory in the TCP receiver buffer. Three values, in bytes, are provided: min, default, and max. 8192 250000 16777216
net.ipv4.tcp_wmem Specifies the free memory in the TCP sender buffer. Three values, in bytes, are provided: min, default, and max. 8192 250000 16777216
net.core.wmem_max Specifies the maximum size of the socket sender buffer. 21299200
net.core.rmem_max Specifies the maximum size of the socket receiver buffer. 21299200
net.core.wmem_default Specifies the default size of the socket sender buffer. 21299200
net.core.rmem_default Specifies the default size of the socket receiver buffer. 21299200
net.ipv4.ip_local_port_range Specifies the range of temporary ports that can be used by a physical server. 26000-65535
kernel.sem Specifies the kernel semaphore. 250 6400000 1000 25600
vm.min_free_kbytes Specifies the minimum free physical memory reserved for unexpected page faults. 5% of the total system memory
net.core.somaxconn Specifies the maximum length of the listening queue of each port. This is a global parameter. 65535
net.ipv4.tcp_syncookies Specifies whether to enable SYN cookies to guard the OS against SYN attacks when the SYN waiting queue overflows.
- 0: The SYN cookies are disabled.
- 1: The SYN cookies are enabled.
1
net.core.netdev_max_backlog Specifies the maximum number of data packets that can be sent to the queue when the rate at which the network device receives data packets is higher than that at which the kernel processes the data packets. 65535
net.ipv4.tcp_max_syn_backlog Specifies the maximum number of unacknowledged connection requests to be recorded. 65535
net.ipv4.tcp_fin_timeout Specifies the default timeout, in seconds, for which an orphaned connection remains in the FIN_WAIT_2 state. 60
kernel.shmall Specifies the total shared free memory of the kernel. 1152921504606846720
kernel.shmmax Specifies the maximum value of a shared memory segment. 18446744073709551615
net.ipv4.tcp_sack Specifies whether selective acknowledgment is enabled. Selective acknowledgment of out-of-order packets can increase system performance because the sender retransmits only the lost packets. Enable it for wide area network communication, but note that it increases CPU usage.
- 0: This function is disabled.
- 1: This function is enabled.
1
net.ipv4.tcp_timestamps Specifies whether TCP timestamps are enabled. The timestamp (12 bytes added to the TCP packet header) allows a more accurate RTT calculation than the retransmission timeout method (for details, see RFC 1323), which improves performance.
- 0: This function is disabled.
- 1: This function is enabled.
1
vm.extfrag_threshold When system memory is insufficient, Linux scores the current system memory fragments. If the score is higher than the value of vm.extfrag_threshold, kswapd triggers memory compaction. When the value of this parameter is close to 1000, the system tends to swap out old pages when processing memory fragments to meet the application requirements. When the value of this parameter is close to 0, the system tends to do memory compaction when processing memory fragments. 500
vm.overcommit_ratio When the system uses the policy under which memory usage never exceeds the threshold, the total memory address space of the system cannot exceed the swap size plus the RAM size multiplied by the percentage specified by this parameter. This parameter takes effect when vm.overcommit_memory is set to 2. 90
MTU Specifies the maximum transmission unit (MTU) for a node NIC. The default value in the OS is 1500. You can set it to 8192 to improve the performance of sending and receiving data using SCTP. 8192
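
The recommended values in Table 2 can be compared against a running Linux host by reading the files under /proc/sys. The following is a minimal sketch, not part of gs_check: every function name is hypothetical, the RECOMMENDED dictionary copies only a few rows of the table, and the two arithmetic helpers restate the formulas given in the tcp_keepalive_probes and vm.overcommit_memory rows.

```python
def sysctl_path(name, root="/proc/sys"):
    """Map a sysctl name such as net.ipv4.tcp_tw_reuse to its /proc/sys file."""
    return root + "/" + name.replace(".", "/")


def normalize(value):
    """Collapse whitespace so tab-separated kernel output matches the table form."""
    return " ".join(str(value).split())


def compare_sysctl(recommended, read_value):
    """Return {name: (actual, expected)} for every parameter that differs."""
    mismatches = {}
    for name, expected in recommended.items():
        actual = normalize(read_value(name))
        if actual != normalize(expected):
            mismatches[name] = (actual, normalize(expected))
    return mismatches


def read_from_proc(name):
    """Read the live value of a parameter on Linux."""
    with open(sysctl_path(name)) as handle:
        return handle.read()


def keepalive_probe_timeout(probes, interval_s):
    """tcp_keepalive_probes multiplied by tcp_keepalive_intvl, in seconds."""
    return probes * interval_s


def commit_limit_kb(total_ram_kb, swap_kb, overcommit_ratio):
    """Total memory size x vm.overcommit_ratio / 100 + total swap size."""
    return total_ram_kb * overcommit_ratio // 100 + swap_kb


# A few rows copied from Table 2; extend as needed.
RECOMMENDED = {
    "net.ipv4.tcp_max_tw_buckets": "10000",
    "net.ipv4.tcp_tw_reuse": "1",
    "net.ipv4.tcp_keepalive_time": "30",
    "net.ipv4.tcp_rmem": "8192 250000 16777216",
    "net.core.somaxconn": "65535",
}

if __name__ == "__main__":
    print("keepalive probe timeout (s):", keepalive_probe_timeout(9, 30))
    try:
        for name, (actual, expected) in compare_sysctl(RECOMMENDED, read_from_proc).items():
            print("%s: actual %s, recommended %s" % (name, actual, expected))
    except OSError as exc:
        # Not a Linux host, or the parameter does not exist on this kernel.
        print("cannot read sysctl value:", exc)
```

Passing the reader as an argument keeps the comparison testable without touching the host: a dictionary lookup can stand in for read_from_proc.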

File System Parameters

  • soft nofile

    Indicates the soft limit. The number of file handles used by a user can exceed this parameter value. However, an alarm will be reported.

    Recommended value: 1000000

  • hard nofile

    Indicates the hard limit. The number of file handles used by a user cannot exceed this parameter value.

    Recommended value: 1000000

  • stack size

    Specifies the thread stack size.

    Recommended value: 3072
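
The limits that apply to the current process can be read with Python's standard resource module and compared with the recommended values above. The following is a minimal sketch under two assumptions that the text does not state: the stack size of 3072 is taken to be in KB (the unit used by ulimit -s), and a limit is treated as acceptable when it is at least the recommended value. The function names are hypothetical.

```python
import resource

RECOMMENDED_NOFILE = 1000000
RECOMMENDED_STACK_KB = 3072  # assumption: KB, as reported by ulimit -s


def meets(limit, recommended):
    """An unlimited limit always satisfies the recommendation."""
    return limit == resource.RLIM_INFINITY or limit >= recommended


def evaluate_limits(soft_nofile, hard_nofile, stack_bytes):
    """Return a pass/fail flag for each file system parameter."""
    return {
        "soft nofile": meets(soft_nofile, RECOMMENDED_NOFILE),
        "hard nofile": meets(hard_nofile, RECOMMENDED_NOFILE),
        # getrlimit reports the stack limit in bytes, so convert KB to bytes.
        "stack size": meets(stack_bytes, RECOMMENDED_STACK_KB * 1024),
    }


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    stack_soft, _ = resource.getrlimit(resource.RLIMIT_STACK)
    for name, ok in evaluate_limits(soft, hard, stack_soft).items():
        print("%-12s %s" % (name, "OK" if ok else "below recommended value"))
```

The result reflects the limits of the user who runs the script, so run it as the user that runs MogDB.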

Examples

Check result of a single item:

perfadm@lfgp000700749:/opt/huawei/perfadm/tool/script> gs_check -i CheckCPU
Parsing the check items config file successfully
Distribute the context file to remote hosts successfully
Start to health check for the cluster. Total Items:1 Nodes:3

Checking...               [=========================] 1/1
Start to analysis the check result
CheckCPU....................................OK
The item run on 3 nodes.  success: 3

Success. All check items run completed. Total:1  Success:1  Failed:0
For more information please refer to /opt/mogdb/tools/script/gspylib/inspection/output/CheckReport_201902193704661604.tar.gz

Local execution result:

perfadm@lfgp000700749:/opt/huawei/perfadm/tool/script> gs_check -i CheckCPU -L

2017-12-29 17:09:29 [NAM] CheckCPU
2017-12-29 17:09:29 [STD] Check the CPU usage of the host. If the value of idle is greater than 30% and the value of iowait is less than 30%, this item passes the check. Otherwise, this item fails the check.
2017-12-29 17:09:29 [RST] OK

2017-12-29 17:09:29 [RAW]
Linux 4.4.21-69-default (lfgp000700749)  12/29/17  _x86_64_

17:09:24        CPU     %user     %nice   %system   %iowait    %steal     %idle
17:09:25        all      0.25      0.00      0.25      0.00      0.00     99.50
17:09:26        all      0.25      0.00      0.13      0.00      0.00     99.62
17:09:27        all      0.25      0.00      0.25      0.13      0.00     99.37
17:09:28        all      0.38      0.00      0.25      0.00      0.13     99.25
17:09:29        all      1.00      0.00      0.88      0.00      0.00     98.12
Average:        all      0.43      0.00      0.35      0.03      0.03     99.17
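
The [STD] rule in the output above can be reproduced offline from the [RAW] section. The following is a minimal sketch, assuming sar output laid out as in this example; parse_sar_average and evaluate_cpu are hypothetical helper names, and the 30% thresholds are taken from the [STD] line rather than from the check item configuration file. Values are converted to float before they are compared.

```python
SAMPLE = """\
17:09:24        CPU     %user     %nice   %system   %iowait    %steal     %idle
17:09:25        all      0.25      0.00      0.25      0.00      0.00     99.50
Average:        all      0.43      0.00      0.35      0.03      0.03     99.17
"""


def parse_sar_average(sar_output):
    """Return (iowait, idle) as floats from the 'Average:' row of sar output."""
    rows = [line.split() for line in sar_output.splitlines() if line.strip()]
    header = next(row for row in rows if "%idle" in row)
    average = next(row for row in rows if row[0].startswith("Average"))

    def column(name):
        # Index from the right so a differing number of leading tokens is harmless.
        return float(average[header.index(name) - len(header)])

    return column("%iowait"), column("%idle")


def evaluate_cpu(sar_output, max_iowait=30.0, min_idle=30.0):
    """Apply the thresholds and return (status, messages)."""
    iowait, idle = parse_sar_average(sar_output)
    messages = []
    if iowait > max_iowait:
        messages.append("The %s actual value %s is greater than expected value %s"
                        % ("IOWait", iowait, max_iowait))
    if idle < min_idle:
        messages.append("The %s actual value %s is less than expected value %s"
                        % ("Idle", idle, min_idle))
    return ("NG" if messages else "OK"), messages


if __name__ == "__main__":
    status, messages = evaluate_cpu(SAMPLE)
    print(status)
    for message in messages:
        print(message)
```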

Check result of a scenario:

[perfadm@SIA1000131072 Check]$ gs_check -e inspect
Skip CheckHdfsForeignTabEncoding because it only applies to V1R5 upgrade V1R6 with cluster.
Parsing the check items config file successfully
The below items require root privileges to execute:[CheckBlockdev CheckIOConfigure CheckMTU CheckRXTX CheckMultiQueue CheckFirewall CheckSshdService CheckSshdConfig CheckCrondService CheckMaxProcMemory CheckBootItems CheckFilehandle CheckNICModel CheckDropCache]
Please enter root privileges user[root]:
Please enter password for user[root]:
Check root password connection successfully
Distribute the context file to remote hosts successfully
Start to health check for the cluster. Total Items:57 Nodes:3
Checking...               [=========================] 57/57
Start to analysis the check result
CheckClusterState...........................OK
The item run on 3 nodes.  success: 3
CheckDBParams...............................OK
.........................................................................
CheckMpprcFile..............................OK
The item run on 3 nodes.  success: 3

Analysis the check result successfully
Failed. All check items run completed. Total:57   Success:49   Warning:5   NG:3   Error:0
For more information please refer to /opt/huawei/wisequery/script/gspylib/inspection/output/CheckReport_inspect_201902207129254785.tar.gz
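
When gs_check runs from a scheduled job, the summary line can be turned into counters for alerting. The following is a minimal sketch that assumes the summary keeps the "Name:number" layout shown in the two examples above; parse_summary is a hypothetical helper and not part of gs_check.

```python
import re


def parse_summary(line):
    """Split a gs_check summary line into (verdict, {counter: value})."""
    # The verdict is the word before the first period, e.g. "Failed" or "Success".
    verdict = line.split(".", 1)[0].strip()
    counts = {key: int(num) for key, num in re.findall(r"(\w+):\s*(\d+)", line)}
    return verdict, counts


if __name__ == "__main__":
    scenario = ("Failed. All check items run completed. "
                "Total:57   Success:49   Warning:5   NG:3   Error:0")
    verdict, counts = parse_summary(scenario)
    print(verdict, counts)
    if counts.get("NG", 0) or counts.get("Error", 0):
        print("at least one item needs attention")
```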

gs_checkos, gs_checkperf
