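Checking MogDB Health Status
Check Method
Use the gs_check tool provided by MogDB to check the health status of MogDB.
Precautions
- Checks on new nodes added during scale-out must be run as the root user. All other scenarios must be run as the omm user.
- Either -i or -e must be specified. -i checks the specified individual items; -e checks all items configured for the specified scenario.
- If the items given to -i, or the item list of the -e scenario, contain no root-class check items, you are not prompted for a root-privileged user and its password.
- Use --skip-root-items to skip the root-class check items so that you do not need to enter a root-privileged user and password.
- To check the consistency between new scale-out nodes and existing nodes, run gs_check on an existing node with the --hosts parameter. The hosts file must contain the IP addresses of the new nodes.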
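The rules in these notes can be sketched as a small command builder. This is only an illustration: the function name gs_check_cmd and its argument layout are hypothetical, the flag is assumed to be spelled --skip-root-items, and the function prints a command line instead of running gs_check.

```shell
# Hypothetical helper (not part of MogDB): prints a gs_check command line
# that follows the rules in the notes above. It does not run gs_check.
# Usage: gs_check_cmd <-i|-e> <items-or-scenario> [skip-root]
gs_check_cmd() {
    # One of -i (individual items) or -e (scenario) is required.
    case "${1:-}" in
        -i|-e) ;;
        *) echo "error: specify -i or -e" >&2; return 1 ;;
    esac
    if [ -z "${2:-}" ]; then
        echo "error: $1 needs a value" >&2
        return 1
    fi
    if [ "${3:-}" = "skip-root" ]; then
        # Skip root-class items so that no root credentials are requested.
        echo "gs_check $1 $2 --skip-root-items"
    else
        echo "gs_check $1 $2"
    fi
}
```

For example, gs_check_cmd -e inspect skip-root prints a scenario check that avoids the root password prompt, and a call without -i or -e is rejected.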
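Procedure
Method 1:
- Log in to the primary database node as the OS user omm.
- Run the following command to check the MogDB database status:
gs_check -i CheckClusterState
-i specifies the check items and is case-sensitive. Format: -i CheckClusterState, -i CheckCPU, or -i CheckClusterState,CheckCPU.
The value can be the name of any supported check item. For the full list, see "Server Tools > gs_checkos > MogDB status check table" in the MogDB Tool Reference. You can also write new check items as needed.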
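When several items are checked in one run, -i takes them as a single comma-separated value. A minimal sketch of building that value follows; the helper name join_items is hypothetical.

```shell
# Hypothetical helper: join check item names into the comma-separated
# form accepted by -i, for example CheckClusterState,CheckCPU.
join_items() {
    # "$*" joins the arguments with the first character of IFS.
    # The subshell keeps the IFS change from leaking to the caller.
    ( IFS=,; echo "$*" )
}

# Possible usage (run as omm on the primary node):
#   gs_check -i "$(join_items CheckClusterState CheckCPU)"
```

Item names are passed through unchanged, so the case-sensitive spelling must already be correct.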
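Method 2:
- Log in to the primary database node as the OS user omm.
- Run the following command to perform a health check on the MogDB database:
gs_check -e inspect
-e specifies the scenario name and is case-sensitive. Format: -e inspect or -e upgrade.
The value can be the name of any supported inspection scenario. The default list includes inspect (routine inspection), upgrade (pre-upgrade inspection), install (installation), binary_upgrade (inspection before an in-place upgrade), health (health check inspection), slow_node (node), and longtime (long-running inspection). You can also write your own scenarios as needed.
MogDB inspection checks whether the overall MogDB status is normal while MogDB is running, and confirms before major operations (such as upgrade or scale-out) that MogDB meets the required environment and status conditions. For details about the inspection items and scenarios, see "Server Tools > gs_checkos > MogDB status check table" in the MogDB Tool Reference.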
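Because scenario names are case-sensitive, a script may want to compare a name against the default list before calling gs_check -e. The sketch below does only that; the function name is_default_scene is hypothetical, and a name it rejects may still be a valid user-written scenario.

```shell
# Hypothetical helper: succeeds only for the default scenario names
# listed above. The comparison is case-sensitive, like -e itself.
is_default_scene() {
    case "${1:-}" in
        inspect|upgrade|install|binary_upgrade|health|slow_node|longtime)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Possible usage:
#   is_default_scene "$scene" || echo "not a default scenario: $scene" >&2
```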
Examples
Result of checking a single item:
perfadm@lfgp000700749:/opt/huawei/perfadm/tool/script> gs_check -i CheckCPU
Parsing the check items config file successfully
Distribute the context file to remote hosts successfully
Start to health check for the cluster. Total Items:1 Nodes:3
Checking... [=========================] 1/1
Start to analysis the check result
CheckCPU....................................OK
The item run on 3 nodes. success: 3
Analysis the check result successfully
Success. All check items run completed. Total:1 Success:1 Failed:0
For more information please refer to /opt/mogdb/tools/script/gspylib/inspection/output/CheckReport_201902193704661604.tar.gz
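The last two lines of this output carry the overall result and the location of the report archive. A rough sketch of extracting both from saved output follows. It assumes the wording shown in the sample, which may differ between versions, and the function names are hypothetical.

```shell
# Both helpers read saved gs_check output on standard input.

# Print the leading word of the summary line ("Success" or "Failed").
overall_status() {
    sed -n 's/^\([A-Za-z]*\)\. All check items run completed\..*/\1/p'
}

# Print the report archive path from the final line of the output.
report_path() {
    sed -n 's/^For more information please refer to //p'
}

# Possible usage:
#   gs_check -i CheckCPU > check.log
#   overall_status < check.log
#   report_path < check.log
```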
Result of local execution:
perfadm@lfgp000700749:/opt/huawei/perfadm/tool/script> gs_check -i CheckCPU -L
2017-12-29 17:09:29 [NAM] CheckCPU
2017-12-29 17:09:29 [STD] Checks the host CPU usage. The item passes if idle is greater than 30% and iowait is less than 30%; otherwise it fails.
2017-12-29 17:09:29 [RST] OK
2017-12-29 17:09:29 [RAW]
Linux 4.4.21-69-default (lfgp000700749) 12/29/17 _x86_64_
17:09:24 CPU %user %nice %system %iowait %steal %idle
17:09:25 all 0.25 0.00 0.25 0.00 0.00 99.50
17:09:26 all 0.25 0.00 0.13 0.00 0.00 99.62
17:09:27 all 0.25 0.00 0.25 0.13 0.00 99.37
17:09:28 all 0.38 0.00 0.25 0.00 0.13 99.25
17:09:29 all 1.00 0.00 0.88 0.00 0.00 98.12
Average: all 0.43 0.00 0.35 0.03 0.03 99.17
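The pass criterion stated in the [STD] line can be re-evaluated against the Average row of the raw data. The sketch below is not how gs_check itself is implemented. It assumes the column layout of the sample above, with %iowait as the sixth field and %idle as the eighth, and the function name cpu_verdict is hypothetical.

```shell
# Hypothetical helper: reads the [RAW] CPU table on standard input and
# prints OK when idle > 30 and iowait < 30 on the Average row, else NG.
cpu_verdict() {
    awk '$1 == "Average:" && $2 == "all" {
        iowait = $6 + 0   # %iowait column in the sample layout
        idle   = $8 + 0   # %idle column in the sample layout
        if (idle > 30 && iowait < 30) print "OK"; else print "NG"
    }'
}
```

Fed the Average row from the sample (iowait 0.03, idle 99.17), the function prints OK.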
Result of checking a scenario:
[perfadm@SIA1000131072 Check]$ gs_check -e inspect
Parsing the check items config file successfully
The below items require root privileges to execute:[CheckBlockdev CheckIOrequestqueue CheckIOConfigure CheckCheckMultiQueue CheckFirewall CheckSshdService CheckSshdConfig CheckCrondService CheckNoCheckSum CheckSctpSeProcMemory CheckBootItems CheckFilehandle CheckNICModel CheckDropCache]
Please enter root privileges user[root]:root
Please enter password for user[root]:
Please enter password for user[root] on the node[10.244.57.240]:
Check root password connection successfully
Distribute the context file to remote hosts successfully
Start to health check for the cluster. Total Items:59 Nodes:2
Checking... [ ] 21/59
Checking... [=========================] 59/59
Start to analysis the check result
CheckClusterState...........................OK
The item run on 2 nodes. success: 2
CheckDBParams...............................OK
The item run on 1 nodes. success: 1
CheckDebugSwitch............................OK
The item run on 2 nodes. success: 2
CheckDirPermissions.........................OK
The item run on 2 nodes. success: 2
CheckReadonlyMode...........................OK
The item run on 1 nodes. success: 1
CheckEnvProfile.............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
GAUSSHOME /usr1/mogdb/app
LD_LIBRARY_PATH /usr1/mogdb/app/lib
PATH /usr1/mogdb/app/bin
CheckBlockdev...............................OK
The item run on 2 nodes. success: 2
CheckCurConnCount...........................OK
The item run on 1 nodes. success: 1
CheckCursorNum..............................OK
The item run on 1 nodes. success: 1
CheckPgxcgroup..............................OK
The item run on 1 nodes. success: 1
CheckDiskFormat.............................OK
The item run on 2 nodes. success: 2
CheckSpaceUsage.............................OK
The item run on 2 nodes. success: 2
CheckInodeUsage.............................OK
The item run on 2 nodes. success: 2
CheckSwapMemory.............................OK
The item run on 2 nodes. success: 2
CheckLogicalBlock...........................OK
The item run on 2 nodes. success: 2
CheckIOrequestqueue.....................WARNING
The item run on 2 nodes. warning: 2
The warning[host240,host157] value:
On device (vdb) 'IO Request' RealValue '256' ExpectedValue '32768'
On device (vda) 'IO Request' RealValue '256' ExpectedValue '32768'
CheckMaxAsyIOrequests.......................OK
The item run on 2 nodes. success: 2
CheckIOConfigure............................OK
The item run on 2 nodes. success: 2
CheckMTU....................................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
1500
CheckPing...................................OK
The item run on 2 nodes. success: 2
CheckRXTX...................................NG
The item run on 2 nodes. ng: 2
The ng[host240,host157] value:
NetWork[eth0]
RX: 256
TX: 256
CheckNetWorkDrop............................OK
The item run on 2 nodes. success: 2
CheckMultiQueue.............................OK
The item run on 2 nodes. success: 2
CheckEncoding...............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
LANG=en_US.UTF-8
CheckFirewall...............................OK
The item run on 2 nodes. success: 2
CheckKernelVer..............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
3.10.0-957.el7.x86_64
CheckMaxHandle..............................OK
The item run on 2 nodes. success: 2
CheckNTPD...................................OK
host240: NTPD service is running, 2020-06-02 17:00:28
host157: NTPD service is running, 2020-06-02 17:00:06
CheckOSVer..................................OK
host240: The current OS is centos 7.6 64bit.
host157: The current OS is centos 7.6 64bit.
CheckSysParams..........................WARNING
The item run on 2 nodes. warning: 2
The warning[host240,host157] value:
Warning reason: variable 'net.ipv4.tcp_retries1' RealValue '3' ExpectedValue '5'.
Warning reason: variable 'net.ipv4.tcp_syn_retries' RealValue '6' ExpectedValue '5'.
Warning reason: variable 'net.sctp.path_max_retrans' RealValue '5' ExpectedValue '10'.
Warning reason: variable 'net.sctp.max_init_retransmits' RealValue '8' ExpectedValue '10'.
CheckTHP....................................OK
The item run on 2 nodes. success: 2
CheckTimeZone...............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
+0800
CheckCPU....................................OK
The item run on 2 nodes. success: 2
CheckSshdService............................OK
The item run on 2 nodes. success: 2
Warning reason: UseDNS parameter is not set; expected: no
CheckCrondService...........................OK
The item run on 2 nodes. success: 2
CheckStack..................................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
8192
CheckNoCheckSum.............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
Nochecksum value is N,Check items pass.
CheckSysPortRange...........................OK
The item run on 2 nodes. success: 2
CheckMemInfo................................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
totalMem: 31.260929107666016G
CheckHyperThread............................OK
The item run on 2 nodes. success: 2
CheckTableSpace.............................OK
The item run on 1 nodes. success: 1
CheckSctpService............................OK
The item run on 2 nodes. success: 2
CheckSysadminUser...........................OK
The item run on 1 nodes. success: 1
CheckGUCConsistent..........................OK
All DN instance guc value is consistent.
CheckMaxProcMemory..........................OK
The item run on 1 nodes. success: 1
CheckBootItems..............................OK
The item run on 2 nodes. success: 2
CheckHashIndex..............................OK
The item run on 1 nodes. success: 1
CheckPgxcRedistb............................OK
The item run on 1 nodes. success: 1
CheckNodeGroupName..........................OK
The item run on 1 nodes. success: 1
CheckTDDate.................................OK
The item run on 1 nodes. success: 1
CheckDilateSysTab...........................OK
The item run on 1 nodes. success: 1
CheckKeyProAdj..............................OK
The item run on 2 nodes. success: 2
CheckProStartTime.......................WARNING
host157:
STARTED COMMAND
Tue Jun 2 16:57:18 2020 /usr1/dmuser/dmserver/metricdb1/server/bin/mogdb --single_node -D /usr1/dmuser/dmb1/data -p 22204
Mon Jun 1 16:15:15 2020 /usr1/mogdb/app/bin/mogdb -D /usr1/mogdb/data/dn1 -M standby
CheckFilehandle.............................OK
The item run on 2 nodes. success: 2
CheckRouting................................OK
The item run on 2 nodes. success: 2
CheckNICModel...............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
version: 1.0.1
model: Red Hat, Inc. Virtio network device
CheckDropCache..........................WARNING
The item run on 2 nodes. warning: 2
The warning[host240,host157] value:
No DropCache process is running
CheckMpprcFile..............................NG
The item run on 2 nodes. ng: 2
The ng[host240,host157] value:
There is no mpprc file
Analysis the check result successfully
Failed. All check items run completed. Total:59 Success:52 Warning:5 NG:2
For more information please refer to /usr1/mogdb/tool/script/gspylib/inspection/output/CheckReport_inspect611.tar.gz
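With 59 items in the output, the entries that need attention are easy to miss. A minimal sketch for listing only the items reported as WARNING or NG from saved output follows. It assumes the "ItemName....STATUS" line format shown above, and the function name non_ok_items is hypothetical.

```shell
# Hypothetical helper: reads gs_check output on standard input and prints
# "<item> <status>" for every item whose status is WARNING or NG.
non_ok_items() {
    awk '$1 ~ /^Check[A-Za-z0-9]+\.+(WARNING|NG)$/ {
        # Split "CheckRXTX.....NG" on the run of dots.
        n = split($1, part, /\.+/)
        print part[1], part[n]
    }'
}

# Possible usage:
#   gs_check -e inspect --skip-root-items > inspect.log
#   non_ok_items < inspect.log
```

The detail lines that follow each item in the output are not printed, so the full output or the report archive is still needed to see the actual and expected values.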
Exception Handling
If a check result is abnormal, use the following information to fix the problem.
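Several fixes in Table 1 consist of running gs_checkos with a specific item code. The lookup below collects only the pairs that the table states. The function name checkos_fix_for is hypothetical, and it prints the command rather than running it.

```shell
# Hypothetical helper: print the gs_checkos command that Table 1 gives
# for a check item, or fail if the table names no gs_checkos fix for it.
checkos_fix_for() {
    case "${1:-}" in
        CheckSysParams)
            echo "gs_checkos -i B1" ;;
        CheckMaxHandle|CheckStack)
            echo "gs_checkos -i B2" ;;
        CheckBlockdev)
            echo "gs_checkos -i B3" ;;
        CheckLogicalBlock|CheckIOrequestqueue|CheckMaxAsyIOrequests|CheckIOConfigure)
            echo "gs_checkos -i B4" ;;
        CheckRXTX)
            echo "gs_checkos -i B5" ;;
        CheckTHP)
            echo "gs_checkos -i B6" ;;
        *)
            return 1 ;;
    esac
}
```

Items such as CheckCPU or CheckPing have no gs_checkos entry in the table, so the function fails for them and the manual handling described in the table applies.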
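Table 1 Checking the MogDB running status
Check Item | Abnormal Status | Handling Method |
CheckClusterState (checks the MogDB status) | MogDB or a MogDB instance is not started. | Run the following command to start MogDB and its instances: gs_om -t start |
CheckClusterState (checks the MogDB status) | The MogDB status or a MogDB instance is abnormal. | Check the status of each host and instance, and troubleshoot based on the status information: gs_check -i CheckClusterState |
CheckDBParams (checks database parameters) | A database parameter is incorrect. | Use the gs_guc tool to set the database parameter to the specified value. |
CheckDebugSwitch (checks debug logs) | The log level is incorrect. | Use the gs_guc tool to set log_min_messages to the specified value. |
CheckDirPermissions (checks directory permissions) | A path permission is incorrect. | Change the permission of the directory to the specified value (750/700): chmod 750 DIR |
CheckReadonlyMode (checks read-only mode) | Read-only mode is enabled. | Confirm that the usage of the disk hosting the database node does not exceed the threshold (60% by default) and that no other O&M operation is in progress, using gs_check -i CheckDataDiskUsage and ps ux. Then use the gs_guc tool to disable MogDB read-only mode: gs_guc reload -N all -I all -c 'default_transaction_read_only = off' |
CheckEnvProfile (checks environment variables) | Environment variables are inconsistent. | Run the preinstallation step again to update the environment variables. |
CheckBlockdev (checks disk read-ahead blocks) | The disk read-ahead block size is not 16384. | Use gs_checkos to set the read-ahead block size to 16384 KB and write the setting to the startup file: gs_checkos -i B3 |
CheckCursorNum (checks the number of cursors) | The cursor count check fails. | Check whether the database can be connected and whether the MogDB status is normal. |
CheckPgxcgroup (checks the redistribution status) | A pgxc_group table has unfinished redistribution. | Continue the data redistribution of the scale-out or scale-in operation: gs_expand, gs_shrink |
CheckDiskFormat (checks the disk configuration) | Disk configurations differ between nodes. | Change the disk specifications of all nodes to be the same. |
CheckSpaceUsage (checks disk space usage) | Available disk space is insufficient. | Clean up or expand the disk where the directory resides. |
CheckInodeUsage (checks disk inode usage) | Available disk inodes are insufficient. | Clean up or expand the disk where the directory resides. |
CheckSwapMemory (checks swap memory) | Swap memory is larger than physical memory. | Reduce or disable swap memory. |
CheckLogicalBlock (checks disk logical blocks) | The disk logical block size is not 512. | Use gs_checkos to set the disk logical block size to 512 KB and write the setting to the boot startup file: gs_checkos -i B4 |
CheckIOrequestqueue (checks I/O requests) | The I/O request value is not 32768. | Use gs_checkos to set the I/O request value to 32768 and write the setting to the boot startup file: gs_checkos -i B4 |
CheckCurConnCount (checks the current number of connections) | The number of current connections exceeds 90% of the maximum number of connections. | Disconnect unused connections to the primary database node. |
CheckMaxAsyIOrequests (checks the maximum number of asynchronous requests) | The maximum number of asynchronous requests is less than 104857600, or less than the number of database instances on the current node multiplied by 1048576. | Use gs_checkos to set the maximum number of asynchronous requests to the larger of 104857600 and the number of database instances on the current node multiplied by 1048576: gs_checkos -i B4 |
CheckMTU (checks the MTU value) | MTU values are inconsistent. | Set the MTU of all nodes to the same value, 1500 or 8192: ifconfig eth* mtu 1500 |
CheckIOConfigure (checks the I/O configuration) | The I/O configuration is not deadline. | Use gs_checkos to set the I/O configuration to deadline and write the setting to the boot startup file: gs_checkos -i B4 |
CheckRXTX (checks the RX/TX values) | The NIC RX/TX value is not 4096. | Use gs_checkos to set the RX/TX value of the physical NIC used by MogDB to 4096: gs_checkos -i B5 |
CheckPing (checks network connectivity) | Some MogDB IP addresses cannot be pinged. | Check the network settings, network status, and firewall status between the abnormal IP addresses. |
CheckNetWorkDrop (checks the network packet loss rate) | The network packet loss rate is higher than 1%. | Check the network load and status between the corresponding IP addresses. |
CheckMultiQueue (checks NIC multi-queue) | NIC multi-queue is not enabled and NIC interrupts are not bound to different CPU cores. | Enable NIC multi-queue and bind the NIC queue interrupts to different CPU cores. |
CheckEncoding (checks the encoding format) | Encoding formats differ between nodes. | Write the same encoding setting to /etc/profile on all nodes: echo "export LANG=XXX" >> /etc/profile |
CheckActQryCount (checks the archive mode) | Archive mode is enabled and the archive directory is not under the primary database node directory. | Disable archive mode or set the archive directory under the primary database node directory. |
CheckFirewall (checks the firewall) | The firewall is not disabled. | Disable the firewall service: systemctl disable firewalld.service |
CheckKernelVer (checks the kernel version) | Kernel versions differ between nodes. | |
CheckMaxHandle (checks the maximum number of file handles) | The maximum number of file handles is less than 1000000. | Set the soft and hard limits of the maximum number of file handles in 91-nofile.conf/90-nofile.conf to 1000000: gs_checkos -i B2 |
CheckNTPD (checks the time synchronization service) | The NTPD service is not started or the time difference exceeds one minute. | Start the NTPD service and synchronize the clocks. |
CheckSysParams (checks OS parameters) | OS parameter settings do not meet the requirements. | Set the parameters with gs_checkos or manually: gs_checkos -i B1 or vim /etc/sysctl.conf |
CheckTHP (checks the THP service) | The THP service is not enabled. | Use gs_checkos to set the THP service: gs_checkos -i B6 |
CheckTimeZone (checks the time zone) | Time zones are inconsistent. | Set all nodes to the same time zone: cp /usr/share/zoneinfo/$REGION/$TIMEZONE /etc/localtime |
CheckCPU (checks the CPU) | CPU usage or I/O wait is too high. | Upgrade the CPU configuration or the disk performance. |
CheckSshdService (checks the SSHD service) | The SSHD service is not started. | Start the SSHD service and add it to the boot startup file: service sshd start, then echo "service sshd start" >> initFile |
CheckSshdConfig (checks the SSHD configuration) | The SSHD service is configured incorrectly. | Set the SSHD service to PasswordAuthentication=no; MaxStartups=1000; UseDNS=yes; ClientAliveInterval=10800/ClientAliveInterval=0, and then restart the service: service sshd start |
CheckCrondService (checks the Crond service) | The Crond service is not started. | Install and enable the Crond service. |
CheckStack (checks the stack size) | The stack size is less than 3072. | Use gs_checkos to set the stack size to 3072 and restart the processes whose stack value is too small: gs_checkos -i B2 |
CheckNoCheckSum (checks the NoCheckSum parameter) | NoCheckSum is set incorrectly or inconsistently. | Set NoCheckSum to the same value on all nodes (Y on all nodes if Red Hat 6.4/6.5 with bond0 is present, otherwise N on all nodes): echo Y > /sys/module/sctp/parameters/no_checksums |
CheckSysPortRange (checks the system port settings) | The system IP port range is outside the expected range, or the MogDB port is within the system IP port range. | Set the system IP port range parameter within 26000-65535, and set the MogDB port outside the system IP port range: vim /etc/sysctl.conf |
CheckMemInfo (checks memory information) | Memory sizes differ between nodes. | Use physical memory of the same specifications. |
CheckHyperThread (checks hyper-threading) | CPU hyper-threading is not enabled. | Enable CPU hyper-threading. |
CheckTableSpace (checks tablespaces) | A tablespace path is nested with the MogDB path, or tablespace paths are nested with each other. | Migrate the tablespace data to a tablespace with a valid path. |
CheckSctpService (checks the SCTP service) | The SCTP service is not enabled. | Deploy and enable the SCTP service: modprobe sctp |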