gs_cgroup

Background

When jobs are batch processed in a cluster, loads on servers significantly vary due to the complexity of batch processing. To fully use cluster resources, you need to manage loads. gs_cgroup is a load management tool provided by MogDB. It can create default Cgroups and user-defined Cgroups, delete default and user-defined Cgroups, update resource quotas and allocations, display the configuration files of Cgroups and the Cgroup tree, and delete all Cgroups.

gs_cgroup creates Cgroup configuration files for the OS user of a database and generates the Cgroups that the OS user sets in the OS. gs_cgroup also allows users to add or delete Cgroups, update Cgroup resource quotas, allocate CPU cores or I/O resources, set exception thresholds, and handle the exceptions. gs_cgroup operates only on the Cgroups of the node where it is run. To configure a cluster consistently, run the same command on every node.

For details, see "Resource Load Management" in Developer Guide.
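
Because gs_cgroup only acts on the local node, one way to keep nodes consistent is to issue the same command on each of them. The following is a minimal sketch, not part of the tool: the node names in NODES are hypothetical, and it assumes passwordless SSH and that gs_cgroup is on the PATH (with GAUSSHOME set) in the remote shell. It only prints the commands; replace printf with a real invocation once the output looks right.

```shell
#!/bin/sh
# Hypothetical node names; replace with the hosts of your own cluster.
NODES="node1 node2 node3"

# Dry run: print the ssh command that would run the same gs_cgroup
# statement on every node listed in NODES.
run_on_all_nodes() {
  for node in $NODES; do
    printf '%s\n' "ssh $node gs_cgroup $*"
  done
}

run_on_all_nodes -c -S class1 -s 40
```

Printing first makes it easy to review exactly which statement each node will receive before anything is changed.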

Examples

  • Commands executed by a common user or the database administrator:

    1. Prerequisites: The GAUSSHOME environment variable is set to the database installation directory, and user root has created the default Cgroups for common users.

    2. Create Cgroups and set the corresponding resource quotas so that database jobs can be assigned to a Cgroup and use its resources. The database administrator creates Class Cgroups for each database user.

      1. Create class and workload Cgroups.

        gs_cgroup -c -S class1 -s 40

        Create the class1 Cgroup and allocate 40% of Class resources to it.

        gs_cgroup -c -S class1 -G grp1 -g 20

        Create the grp1 Workload Cgroup under the class1 Cgroup and allocate 20% of class1 Cgroup resources to the Workload Cgroup.

      2. Delete the created grp1 Cgroup and class1 Cgroup.

        gs_cgroup -d -S class1 -G grp1

        Delete the created grp1 Cgroup.

        gs_cgroup -d -S class1

        Delete the created class1 Cgroup.

        NOTICE: If a Class Cgroup is deleted, its Workload Cgroups will be deleted as well.

    3. Update the resource quota for created Cgroups.

      1. Update dynamic resource quota.

        gs_cgroup -u -S class1 -G grp1 -g 30

        Update the resources allocated to the grp1 Workload Cgroup under the class1 Cgroup for the current user to 30% of class1 resources.

      2. Update the resource limitation quota.

        gs_cgroup --fixed -u -S class1 -G grp1 -g 30

        Set the number of CPU cores allocated to the grp1 Cgroup to 30% of cores allocated to its parent Cgroup class1.

    4. Update the range of the CPU cores in the mogdb Cgroup.

      gs_cgroup -u -T mogdb -f 0-20

      Set the range of CPU cores used by the mogdb process to 0-20.

      NOTE: The -f parameter can only be used to set the range of CPU cores in the mogdb Cgroup. To set the cores of other Cgroups, use the --fixed parameter.

    5. Set exception handling information. (class:wg group must exist.)

      1. Terminate a job under the class:wg Cgroup when the job has been blocked for 1200s or has been executing for 2400s.

        gs_cgroup -S class -G wg -E "blocktime=1200,elapsedtime=2400" -a

      2. Terminate a job under the class:wg Cgroup when the size of its spilled data reaches 256 MB or the size of its broadcast data reaches 100 MB.

        gs_cgroup -S class -G wg -E "spillsize=256,broadcastsize=100" -a

      3. Demote a job under the Class Cgroup when the total CPU time taken to execute the job on all nodes reaches 100s.

        gs_cgroup -S class -E "allcputime=100" --penalty

      4. Demote a job under the Class Cgroup when the total time taken to execute the job on all nodes reaches 2400s and the skew of the CPU time reaches 90 percent.

        gs_cgroup -S class -E "qualificationtime=2400,cpuskewpercent=90"

        NOTICE: To set exception handling information for a Cgroup, ensure that the Cgroup has been created. Multiple thresholds are separated by commas (,). If no operation is specified, --penalty is used by default.

    6. Set the range of CPU cores for a Cgroup.

      Set the range of cores for the class:wg Cgroup to 20% of Class cores.

      gs_cgroup -S class -G wg -g 20 --fixed -u

      NOTICE: The range of cores for a Class or Workload Cgroup must be specified by the --fixed parameter.

    7. Roll back the previous step.

      gs_cgroup --recover

      NOTE: The --recover parameter can only roll back the latest addition, deletion, or modification made to the Class and Workload Cgroups.

    8. View information about Cgroups that have been created.

      1. View Cgroup information in configuration files.

        gs_cgroup -p

        Cgroup configuration

        gs_cgroup -p
        
        Top Group information is listed:
        GID:   0 Type: Top    Percent(%): 1000( 50) Name: Root                  Cores: 0-47
        GID:   1 Type: Top    Percent(%):  833( 83) Name: mogdb:omm           Cores: 0-20
        GID:   2 Type: Top    Percent(%):  333( 40) Name: Backend               Cores: 0-20
        GID:   3 Type: Top    Percent(%):  499( 60) Name: Class                 Cores: 0-20
        
        Backend Group information is listed:
        GID:   4 Type: BAKWD  Name: DefaultBackend   TopGID:   2 Percent(%): 266(80) Cores: 0-20
        GID:   5 Type: BAKWD  Name: Vacuum           TopGID:   2 Percent(%):  66(20) Cores: 0-20
        
        Class Group information is listed:
        GID:  20 Type: CLASS  Name: DefaultClass     TopGID:   3 Percent(%): 166(20) MaxLevel: 1 RemPCT: 100 Cores: 0-20
        GID:  21 Type: CLASS  Name: class1           TopGID:   3 Percent(%): 332(40) MaxLevel: 2 RemPCT:  70 Cores: 0-20
        
        Workload Group information is listed:
        GID:  86 Type: DEFWD  Name: grp1:2           ClsGID:  21 Percent(%):  99(30) WDLevel:  2 Quota(%): 30 Cores: 0-5
        
        Timeshare Group information is listed:
        GID: 724 Type: TSWD   Name: Low              Rate: 1
        GID: 725 Type: TSWD   Name: Medium           Rate: 2
        GID: 726 Type: TSWD   Name: High             Rate: 4
        GID: 727 Type: TSWD   Name: Rush             Rate: 8
        
        Group Exception information is listed:
        GID:  20 Type: EXCEPTION Class: DefaultClass
        PENALTY: QualificationTime=1800 CPUSkewPercent=30
        
        GID:  21 Type: EXCEPTION Class: class1
        PENALTY: AllCpuTime=100 QualificationTime=2400 CPUSkewPercent=90
        
        GID:  86 Type: EXCEPTION Group: class1:grp1:2
        ABORT: BlockTime=1200 ElapsedTime=2400

      2. View the Cgroup tree in the OS.

        gs_cgroup -P displays a Cgroup tree. In the tree, shares indicates the value of cpu.shares, which specifies the dynamic quota of CPU resources in the OS, and cpus indicates the value of cpuset.cpus, which specifies the dynamic quota of CPUSET resources in the OS (number of cores that a Cgroup can use).

        gs_cgroup -P
        Mount Information:
        cpu:/dev/cgroup/cpu
        blkio:/dev/cgroup/blkio
        cpuset:/dev/cgroup/cpuset
        cpuacct:/dev/cgroup/cpuacct
        
        Group Tree Information:
        - mogdb:wangrui (shares: 5120, cpus: 0-20, weight: 1000)
                - Backend (shares: 4096, cpus: 0-20, weight: 400)
                        - Vacuum (shares: 2048, cpus: 0-20, weight: 200)
                        - DefaultBackend (shares: 8192, cpus: 0-20, weight: 800)
                - Class (shares: 6144, cpus: 0-20, weight: 600)
                        - class1 (shares: 4096, cpus: 0-20, weight: 400)
                                - RemainWD:1 (shares: 1000, cpus: 0-20, weight: 100)
                                        - RemainWD:2 (shares: 7000, cpus: 0-20, weight: 700)
                                                - Timeshare (shares: 1024, cpus: 0-20, weight: 500)
                                                        - Rush (shares: 8192, cpus: 0-20, weight: 800)
                                                        - High (shares: 4096, cpus: 0-20, weight: 400)
                                                        - Medium (shares: 2048, cpus: 0-20, weight: 200)
                                                        - Low (shares: 1024, cpus: 0-20, weight: 100)
                                        - grp1:2 (shares: 3000, cpus: 0-5, weight: 300)
                                - TopWD:1 (shares: 9000, cpus: 0-20, weight: 900)
                        - DefaultClass (shares: 2048, cpus: 0-20, weight: 200)
                                - RemainWD:1 (shares: 1000, cpus: 0-20, weight: 100)
                                        - Timeshare (shares: 1024, cpus: 0-20, weight: 500)
                                                - Rush (shares: 8192, cpus: 0-20, weight: 800)
                                                - High (shares: 4096, cpus: 0-20, weight: 400)
                                                - Medium (shares: 2048, cpus: 0-20, weight: 200)
                                                - Low (shares: 1024, cpus: 0-20, weight: 100)
                                - TopWD:1 (shares: 9000, cpus: 0-20, weight: 900)
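
The configuration listing printed by gs_cgroup -p can be filtered with standard text tools. The following is a minimal sketch that prints the name and quota percentage of each Class Cgroup. The field layout is inferred from the sample output above rather than from a documented format, and list_class_quotas is a hypothetical helper name, so treat the parsing as an assumption.

```shell
# Read gs_cgroup -p style text on stdin and print "<name> <quota %>"
# for every line whose type is CLASS.
list_class_quotas() {
  awk '$3 == "Type:" && $4 == "CLASS" {
    name = ""; pct = ""
    for (i = 1; i < NF; i++) {
      if ($i == "Name:") name = $(i + 1)
      if ($i == "Percent(%):") {
        pct = $(i + 1)
        sub(/^[0-9]+\(/, "", pct)   # drop the leading absolute value and "("
        sub(/\)$/, "", pct)         # drop the trailing ")"
      }
    }
    print name, pct
  }'
}

# Usage (requires a configured environment):
#   gs_cgroup -p | list_class_quotas
```

For the sample listing above, the two CLASS lines would be reported as DefaultClass with 20 and class1 with 40.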

Parameter Description

  • -a [--abort]

    Terminates a job when it exceeds an exception threshold.

  • -b pct

    Specifies the percentage of resources of the Top Backend Cgroup taken by a Backend Cgroup. The -B backendname parameter must be specified as well.

    Value Range

    The value ranges from 1 to 99.

    If this parameter is not set, the Vacuum Cgroup takes 20% of the CPU quota and the DefaultBackend Cgroup takes 80% by default. The quota sum for the Vacuum and DefaultBackend Cgroups must be less than 100%.

  • -B name

    Specifies the name of a Backend Cgroup. Only the -u parameter can be used to change the resource quota for this Cgroup.

    The -b percent and -B backendname parameters need to be specified to set the resource proportion of database backend threads.

    Value range: a string with a maximum of 64 bytes.

  • -c

    Creates a Cgroup and specifies its name.

    A common user can specify -c and -S classname to create a Class Cgroup. If -G groupname is specified as well, a Workload Cgroup will be created under the Class Cgroup. The Workload Cgroup is placed at the bottom layer of the Class Cgroup (layer 4 is the bottom layer).

  • -d

    Deletes Cgroups.

    A common user can specify the -d and -S classname parameters to delete a created Class Cgroup. If the -G groupname parameter is specified as well, a Workload Cgroup under the Class Cgroup is deleted, and related threads are put into the DefaultClass:DefaultWD:1 Cgroup. If the Workload Cgroup to be deleted is at a higher level (level 1 is the top level), the hierarchy of the lower-level Cgroups is adjusted, and the related threads are loaded into the newly created Cgroups.

  • -E data

    Specifies the exception thresholds, including blocktime, elapsedtime, allcputime, spillsize, broadcastsize, qualificationtime, and cpuskewpercent. The thresholds are separated by commas (,). 0 indicates that the setting is canceled. If the parameter is set to an invalid value, an error will be prompted.

    Table 2 Exception threshold types

    | Exception Threshold Type | Description | Value Range (0 Indicates Setting Canceled) | Operation upon Exception |
    |---|---|---|---|
    | blocktime | Job blocking duration, in seconds. It includes the total time spent in global and local concurrent queuing. | 0-UINT_MAX | abort |
    | elapsedtime | Execution time of a job that has not finished, in seconds, measured from the start of execution to the current time. | 0-UINT_MAX | abort |
    | allcputime | Total CPU time spent in executing a job on all nodes, in seconds. | 0-UINT_MAX | abort, penalty |
    | cpuskewpercent | CPU time skew of a job executed on nodes. The value depends on the setting of qualificationtime. | 0-100 | abort, penalty |
    | qualificationtime | Interval for checking the CPU skew, in seconds. This parameter must be set together with cpuskewpercent. | 0-UINT_MAX | none |
    | spillsize | Amount of job data spilled to disks on nodes, in MB. | 0-UINT_MAX | abort |
    | broadcastsize | Size of broadcast operators of a job on nodes, in MB. | 0-UINT_MAX | abort |

  • -h [--help]

    Displays the command help information.

  • -H

    Specifies the $GAUSSHOME information in the current user's environment.

    Value range: a string with a maximum of 1023 characters.

  • -f

    Specifies the range of CPU cores used by the mogdb Cgroup. The range format can be a-b or a. For other Cgroups, use the --fixed parameter to set the range of cores.

  • --fixed

    Specifies the percentage of the cores allocated to a Cgroup's parent group that the Cgroup can use, or specifies the I/O resources.

    To set the core ratio, --fixed is used together with -s, -g, -t, or -b.

    The ratio ranges from 0 to 100. The sum of the ratios of Cgroups at the same level must be less than or equal to 100. The value 0 indicates that a Cgroup uses the same cores as its upper-level Cgroup. By default, the value is 0 for all Cgroups. -f and --fixed cannot be configured at the same time. After --fixed is set, the range set by -f automatically becomes invalid. The ratio is displayed in the output of -p as the quota value.

    To set the I/O resource quota, --fixed is used together with -R, -r, -W, or -w.

  • -g pct

    Specifies the percentage of resources in a Class Cgroup taken by a Workload Cgroup. The -G groupname parameter needs to be specified as well. The -g pct parameter can be used with the -c parameter to create a Cgroup or with the -u parameter to update a Workload Cgroup.

    Value range: 1 to 99. By default, the CPU quota of a Workload Cgroup is 20%. The sum of CPU quotas for all Workload Cgroups must be less than 99%.

  • -G name

    Specifies the name of a Workload Cgroup. The -S classname parameter needs to be set to specify the Class Cgroup to which the Workload Cgroup belongs. The -G name parameter can be used with -c to create a Cgroup, with -d to delete a Cgroup, and with -u to update the resource quota for a Cgroup. Note that name in the -G name parameter cannot be a default Timeshare Cgroup name, including Low, Medium, High, and Rush.

    The name of a Workload Cgroup created by a user must not contain colons (:). Cgroup names must be unique.

    Value range: a string with a maximum of 28 bytes

  • -N [--group] name

    Specifies the Cgroup name, which can be written in the short form class:wg.

  • -p

    Shows information about Cgroup configuration files.

  • -P

    Shows the structure of the Cgroup tree.

  • --penalty

    Demotes a job when the job exceeds an exception threshold. If no operation is specified, --penalty is used by default.

  • -r data

    Only updates the upper limit of the data read rate for I/O resources, that is, sets the value of blkio.throttle.read_bps_device. This parameter is a string consisting of major:minor value, in which major indicates the major device number of the disk to be accessed, minor indicates the minor device number, and value indicates the upper limit of the read rate in bytes per second. The upper limit ranges from 0 to ULONG_MAX, and 0 indicates that the read rate is not restricted. This parameter needs to be used with the -u parameter and Cgroup names. If both the Class Cgroup name and Workload Cgroup name are specified, this parameter is used for the Workload Cgroup.

    Value range: a string with a maximum of 32 characters.

  • -R data

    Only updates the upper limit of read operations per second for I/O resources, that is, sets the value of blkio.throttle.read_iops_device. The value uses the same major:minor value format as the -r parameter, with value indicating the number of read operations per second. This parameter needs to be used with the -u parameter and Cgroup names. If both the Class Cgroup name and Workload Cgroup name are specified, this parameter is used for the Workload Cgroup.

    Value range: a string with a maximum of 32 characters.

  • --recover

    Rolls back only the latest addition, deletion, or modification made to the Class and Workload Cgroups.

  • --revert

    Restores the Cgroups to their default state.

  • -D mpoint

    Specifies a mount point. The default mount point is /dev/cgroup/subsystem.

  • -m

    Mounts the Cgroup.

  • -M

    Unmounts the Cgroup.

  • -U

    Specifies the database username.

  • --refresh

    Updates the status of the Cgroup.

  • -s pct

    Specifies the percentage of resources in the top Class Cgroup taken by a Class Cgroup. The -S classname parameter needs to be specified as well. The -s pct parameter can be used with the -c parameter to create a Cgroup or with the -u parameter to update a Class Cgroup.

    Value range: 1 to 99. By default, the CPU quota of the Class Cgroup is set to 20%. In R6C10, the CPU quota of the Class Cgroup is set to 40%. During the upgrade, the quota is not updated. The sum of the CPU quota of the newly created Class Cgroup and the default DefaultClass quota must be less than 100%.

  • -S name

    Specifies the name of a Class Cgroup. This parameter can be used with -c to create a Cgroup, with -d to delete a Cgroup, or with -u to update resource quota for a Cgroup. The name of a sub-Class Cgroup cannot contain the colon (:).

    Value range: a string with a maximum of 31 bytes.

  • -t percent

    Specifies the percentage of resources for a top Cgroup (Root, mogdb:omm, Backend, or Class). The -T name parameter needs to be specified as well. For the Root Cgroup (-T Root; the name shown in the Cgroup configuration file is Root), percent indicates the percentage of the value of blkio.weight, and its minimum value is 10%. The CPU resource quota, such as the value of cpu.shares, cannot be changed. For the mogdb:omm Cgroup, percent indicates the percentage of CPU resources taken by the mogdb:omm Cgroup. (The cpu.shares value for the mogdb:omm Cgroup is derived from the quota 1024 of the Root Cgroup, on the condition that only one database is available in the current system.) The I/O resource quota is 1000 and does not change. For the Class or Backend Cgroup, percent indicates the percentage of resources in the mogdb Cgroup taken by the Class or Backend Cgroup.

    Value range: 1 to 99. By default, the quota of the Class Cgroup is 60%, and the quota of the Backend Cgroup is 40%. When the quota of the Class Cgroup is modified, the quota of the Backend Cgroup is automatically updated so that the sum of the Backend and Class Cgroup quotas is 100%.

  • -T name

    Specifies the names of top Cgroups.

    Value range: a string with a maximum of 64 bytes.

  • -u

    Updates Cgroups.

  • -V [--version]

    Displays version information about the gs_cgroup tool.

  • -w data

    Only updates the upper limit of the data write rate for I/O resources, that is, sets the value of blkio.throttle.write_bps_device. The value uses the same major:minor value format as the -r parameter, with value indicating the write rate in bytes per second. The -u parameter and the Cgroup name need to be specified as well. If both the Class Cgroup name and Workload Cgroup name are specified, this parameter is used for the Workload Cgroup.

    Value range: a string with a maximum of 32 characters.

  • -W data

    Only updates the upper limit of write operations per second for I/O resources, that is, sets the value of blkio.throttle.write_iops_device. The value uses the same major:minor value format as the -r parameter, with value indicating the number of write operations per second. The -u parameter and the Cgroup name need to be specified as well. If both the Class Cgroup name and Workload Cgroup name are specified, this parameter is used for the Workload Cgroup.

    Value range: a string with a maximum of 32 characters.
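
The rules in the exception threshold table under -E can be checked before the string is handed to the tool. The following is a minimal sketch under stated assumptions: check_thresholds is a hypothetical helper, the accepted names are those listed in the table, and the rule that qualificationtime and cpuskewpercent appear together is one reading of the table's wording. It does not check the UINT_MAX upper bound.

```shell
# Return 0 if the -E string only uses threshold names from the table and
# non-negative integer values; otherwise print a reason to stderr and return 1.
check_thresholds() {
  has_qual=0
  has_skew=0
  old_ifs=$IFS
  IFS=','
  set -- $1            # split the comma-separated list into key=value pairs
  IFS=$old_ifs
  for pair in "$@"; do
    key=${pair%%=*}
    val=${pair#*=}
    case $val in
      ''|*[!0-9]*) echo "invalid value in '$pair'" >&2; return 1 ;;
    esac
    case $key in
      blocktime|elapsedtime|allcputime|spillsize|broadcastsize) ;;
      qualificationtime) has_qual=1 ;;
      cpuskewpercent)
        has_skew=1
        # cpuskewpercent is the only threshold limited to 0-100.
        [ "$val" -le 100 ] || { echo "cpuskewpercent must be 0-100" >&2; return 1; }
        ;;
      *) echo "unknown threshold '$key'" >&2; return 1 ;;
    esac
  done
  # Assumption: the two skew-related thresholds are only valid as a pair.
  [ "$has_qual" -eq "$has_skew" ] || {
    echo "qualificationtime and cpuskewpercent must be set together" >&2
    return 1
  }
  return 0
}

# Usage sketch:
#   check_thresholds "blocktime=1200,elapsedtime=2400" &&
#     gs_cgroup -S class -G wg -E "blocktime=1200,elapsedtime=2400" -a
```

Checking locally gives a clearer message than a failed invocation, but the tool's own validation remains authoritative.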

NOTE: Use the following method to obtain the major:minor value for a disk. For example, obtain the device number of the disk that holds the /mpp directory.

df
Filesystem      1K-blocks      Used  Available Use% Mounted on
/dev/sda1       524173248  41012784  456534008   9% /
devtmpfs         66059264       236   66059028   1% /dev
tmpfs            66059264        88   66059176   1% /dev/shm
/dev/sdb1      2920486864 135987592 2784499272   5% /data
/dev/sdc1      2920486864  24747868 2895738996   1% /data1
/dev/sdd1      2920486864  24736704 2895750160   1% /mpp
/dev/sde1      2920486864  24750068 2895736796   1% /mpp1
ls -l /dev/sdd
brw-rw---- 1 root disk 8, 48 Feb 26 11:20 /dev/sdd

NOTICE: Check the device number of the disk (sdd) rather than the partition (sdd1). Otherwise, an error will be reported. If the I/O quota string exceeds the maximum allowed string length after an update, the update is not saved in the configuration file. For example, if the maximum string length is 96 and I/O resources of more than eight disks are updated, the limit may be exceeded. In that case the update succeeds but is not saved in the configuration file.
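
The lookup above can be scripted. The following is a minimal sketch that turns one line of ls -l output for a block device into the major:minor pair and builds the data string for -r, -R, -w, or -W. Both helper names are hypothetical, and the single space between major:minor and the limit is an assumption based on the "major:minor value" wording in the parameter description.

```shell
# Extract "major:minor" from one line of `ls -l` output for a block device,
# for example: brw-rw---- 1 root disk 8, 48 Feb 26 11:20 /dev/sdd
device_numbers() {
  awk '{ sub(/,$/, "", $5); print $5 ":" $6 }'
}

# Join the device numbers and a limit into one data string.
# Assumption: the tool expects "major:minor value" separated by one space.
io_limit_arg() {
  printf '%s %s\n' "$1" "$2"
}

# Usage sketch (whole disk, not a partition; class1 and grp1 must exist):
#   dev=$(ls -l /dev/sdd | device_numbers)
#   gs_cgroup -u -S class1 -G grp1 -r "$(io_limit_arg "$dev" 10485760)"
```

Parsing ls output depends on its column layout. Where util-linux is available, lsblk with the MAJ:MIN output column reports the same pair directly.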
