2022-05-12

数据库日常运维手册

一. 检查数据库基本状况

  1. Oracle 实例状态

    select instance_name,host_name,startup_time,status,database_status from v$instance;
    
    INSTANCE_NAME     HOST_NAME  STARTUP_TIME         STATUS   DATABASE_STATUS
    ---------------- -------- ------------------ ------------ -----------------
    ORCLCDB       ecs-398213     11-MAY-22         OPEN    ACTIVE
    
  2. Oracle 服务进程

    ps -ef | grep oracle | grep -E "dbw|lgw|smon|pmon|arc|ckpt|reco"
    
    oracle     22823       1  0 May10 ?        00:00:02 ora_pmon_ORCLCDB //监听客户端连接状态
    oracle     22857       1  0 May10 ?        00:00:03 ora_dbw0_ORCLCDB //写数据文件进程
    oracle     22859       1  0 May10 ?        00:00:02 ora_lgwr_ORCLCDB //写日志文件进程 
    oracle     22861       1  0 May10 ?        00:00:10 ora_ckpt_ORCLCDB //进行检查点进程
    oracle     22865       1  0 May10 ?        00:00:01 ora_smon_ORCLCDB //监听实例状态进程
    oracle     22871       1  0 May10 ?        00:00:00 ora_reco_ORCLCDB //恢复进程
    oracle     32465       1  0 14:04 ?        00:00:00 ora_arc0_ORCLCDB //日志归档进程
    oracle     32467       1  0 14:04 ?        00:00:00 ora_arc1_ORCLCDB
    oracle     32469       1  0 14:04 ?        00:00:00 ora_arc2_ORCLCDB
    oracle     32471       1  0 14:04 ?        00:00:00 ora_arc3_ORCLCDB
    
  3. Oracle 监听状态

    lsnrctl status
    
    Services Summary...
    Service "ORCLCDB" has 1 instance(s).
      Instance "ORCLCDB", status READY, has 1 handler(s) for this service...
    Service "ORCLCDBXDB" has 1 instance(s).
      Instance "ORCLCDB", status READY, has 1 handler(s) for this service...
    Service "dea333d52e005a62e0530100007fd019" has 1 instance(s).
      Instance "ORCLCDB", status READY, has 1 handler(s) for this service...
    Service "orclpdb1" has 1 instance(s).
      Instance "ORCLCDB", status READY, has 1 handler(s) for this service...
    The command completed successfully
    

二. 检查系统和Oracle日志文件

  1. 操作系统日志文件:查看是否有跟Oracle相关的报错

    cat /var/log/messages | grep  failed
    
    [system] Activation via systemd failed for unit 'org.freedesktop.resolve1.service':
    
  2. Oracle日志文件

    SQL> show parameter dump_dest
    
    NAME                  TYPE    VALUE
    ------------------------------------ ----------- ------------------------------
    background_dump_dest string /opt/oracle/product/19c/dbhome_1/rdbms/log
    core_dump_dest           string /opt/oracle/diag/rdbms/orclcdb/ORCLCDB/cdump
    user_dump_dest           string /opt/oracle/product/19c/dbhome_1/rdbms/log
    
    SQL> select * from v$logfile;
    
    GROUP# STATUS      TYPE       MEMBER                     IS_RECOVERY_DES CON_ID
    ------ ---------- ---------- -------------------------------------------------- ------
         3         ONLINE     /opt/oracle/oradata/ORCLCDB/redo03.log     NO           0
         2         ONLINE     /opt/oracle/oradata/ORCLCDB/redo02.log     NO           0
         1         ONLINE     /opt/oracle/oradata/ORCLCDB/redo01.log     NO           0
    
  1. Oracle核心转储目录

    ls $ORACLE_BASE/admin/ORCLCDB
    
    [oracle@ecs-398213 ~]$ ls $ORACLE_BASE/admin/ORCLCDB
    adump  dpdump  pfile  xdb_wallet
    
  1. ROOT用户和ORACLE用户的Email

    tail –n 200 /var/mail/root  
    tail –n 200 /var/mail/oracle #检查有无跟oracle用户相关的报错信息
    

三. 检查Oracle对象状态

  1. Oracle控制文件状态: STATUS为空表示状态正常

    SQL> select status ,name from v$controlfile;
    
    STATUS      NAME
    ---------- --------------------------------------------------
        /opt/oracle/oradata/ORCLCDB/control01.ctl
        /opt/oracle/oradata/ORCLCDB/control02.ctl
    
  2. Oracle在线日志状态:STATUS为空表示正常

    SQL> select group#,status,type,member from v$logfile;
    
    GROUP# STATUS            TYPE         MEMBER
    ------ -------------------- -------------------- --------------------------------
         3               ONLINE       /opt/oracle/oradata/ORCLCDB/redo03.log
         2               ONLINE       /opt/oracle/oradata/ORCLCDB/redo02.log
         1               ONLINE       /opt/oracle/oradata/ORCLCDB/redo01.log
    
  3. Oracle表空间状态:STATUS 状态应该都为ONLINE

    SQL> select tablespace_name,status from dba_tablespaces;
    
    TABLESPACE_NAME                 STATUS
    -------------------------------------------------- --------------------
    SYSTEM                          ONLINE
    SYSAUX                          ONLINE
    UNDOTBS1                    ONLINE
    TEMP                        ONLINE
    USERS                           ONLINE
    
  4. Oracle所有数据文件状态:STATUS状态应该为ONLINE

    SQL> select name,status from v$datafile;
    
    NAME                                            STATUS
    -------------------------------------------------- --------------------
    /opt/oracle/oradata/ORCLCDB/system01.dbf        SYSTEM
    /opt/oracle/oradata/ORCLCDB/sysaux01.dbf        ONLINE
    /opt/oracle/oradata/ORCLCDB/undotbs01.dbf           ONLINE
    /opt/oracle/oradata/ORCLCDB/pdbseed/system01.dbf   SYSTEM
    /opt/oracle/oradata/ORCLCDB/pdbseed/sysaux01.dbf   ONLINE
    /opt/oracle/oradata/ORCLCDB/users01.dbf         ONLINE
    /opt/oracle/oradata/ORCLCDB/pdbseed/undotbs01.dbf  ONLINE
    /opt/oracle/oradata/ORCLCDB/ORCLPDB1/system01.dbf  SYSTEM
    /opt/oracle/oradata/ORCLCDB/ORCLPDB1/sysaux01.dbf  ONLINE
    /opt/oracle/oradata/ORCLCDB/ORCLPDB1/undotbs01.dbf ONLINE
    /opt/oracle/oradata/ORCLCDB/ORCLPDB1/users01.dbf   ONLINE
    
  1. 检查无效对象:无返回表示正常

    SQL> select owner,object_name,object_type from dba_objects where status != 'VALID' and owner!='SYS' and owner!='SYSTEM';
    
    no rows selected
    
  1. 检查所有回滚段状态

    概述:回滚段用于存放数据修改之前的值(位置和值)。分为系统回滚段和非系统回滚段(public回滚段和private回滚段)。系统回滚段应该放在SYSTEM表空间中,并且应该永远保持ONLINE;public面向所有数据库实例,private某个实例私有。

    select owner,segment_id,segment_name,tablespace_name,status from dba_rollback_segs;
    
    OWNER  SEGMENT_ID SEGMENT_NAME            TABLESPACE_NAME                    STATUS
    ------ ---------- ------------------------------ -------------------------------------
    SYS          0         SYSTEM             SYSTEM                         ONLINE
    PUBLIC       1 _SYSSMU1_1261223759$       UNDOTBS1                       ONLINE
    PUBLIC       2 _SYSSMU2_27624015$         UNDOTBS1                       ONLINE
    PUBLIC       3 _SYSSMU3_2421748942$       UNDOTBS1                       ONLINE
    PUBLIC       4 _SYSSMU4_625702278$        UNDOTBS1                       ONLINE
    PUBLIC       5 _SYSSMU5_2101348960$       UNDOTBS1                       ONLINE
    PUBLIC       6 _SYSSMU6_813816332$        UNDOTBS1                       ONLINE
    PUBLIC       7 _SYSSMU7_2329891355$       UNDOTBS1                       ONLINE
    PUBLIC       8 _SYSSMU8_399776867$        UNDOTBS1                       ONLINE
    PUBLIC       9 _SYSSMU9_1692468413$       UNDOTBS1                       ONLINE
    PUBLIC       10 _SYSSMU10_930580995$      UNDOTBS1                       ONLINE
    

四. 检查Oracle相关资源使用情况

  1. Oracle初始化文件中的相关参数

    SQL> select max_utilization, initial_allocation, limit_value from v$resource_limit;
    
  2. 数据库连接情况

    SQL> select count(*) from v$process;   #查看当前数据库连接数
    
      COUNT(*)
    ----------
     62
    
    SQL> select count(*) from v$session;  #当前session连接数
    
      COUNT(*)
    ----------
     54
    
  3. 系统磁盘空间

    fdisk -l
    Device     Boot Start      End  Sectors Size Id Type
    /dev/vda1  *     2048 83886079 83884032  40G 83 Linux
    
    [oracle@ecs-398213 ~]$ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    devtmpfs        1.9G     0  1.9G   0% /dev
    tmpfs           1.9G  896M  973M  48% /dev/shm
    tmpfs           1.9G  8.5M  1.9G   1% /run
    tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
    /dev/vda1        40G   20G   18G  54% /
    tmpfs           374M     0  374M   0% /run/user/0
    
  4. 检查一些扩展异常的对象

    SQL> select Segment_Name, Segment_Type, TableSpace_Name, (Extents / Max_extents) * 100 Percent From sys.DBA_Segments Where Max_Extents != 0 and (Extents / Max_extents) * 100 >= 95 order By Percent;
    
    no rows selected
    

    如果有记录返回,则这些对象的扩展已经快达到它定义时的最大扩展值。对于这些对象要修改它的存储结构参数.

  5. 检查System表空间的内容

    SQL> select f.tablespace_name,a.total,f.free,round((f.free/a.total)*100)"%Free" from (select tablespace_name,sum(bytes/(1024*1024))total from dba_data_files group by tablespace_name)a, (select tablespace_name, round(sum(bytes/(1024*1024)))free from dba_free_space group by tablespace_name)f WHERE a.tablespace_name = f.tablespace_name(+);
    
    TABLESPACE_NAME          TOTAL   FREE      %Free
    ------------------------------ ---------- ---------- ----------
    SYSTEM                     910      7          1
    SYSAUX                     560     31          6
    UNDOTBS1               340    320         94
    USERS                       5       2         40
    
  6. 检查对象的下一步扩展与表空间的最大扩展值

    SQL> select a.table_name, a.next_extent, a.tablespace_name from all_tables a, (select tablespace_name, max(bytes) as big_chunk from dba_free_space group by tablespace_name) f where f.tablespace_name = a.tablespace_name and a.next_extent > f.big_chunk union select a.index_name, a.next_extent, a.tablespace_name from all_indexes a, (select tablespace_name, max(bytes) as big_chunk from dba_free_space group by tablespace_name) f where f.tablespace_name = a.tablespace_name and a.next_extent > f.big_chunk;
    
    no rows selected
    

    如果有记录返回,则表明这些对象的下一个扩展大于该对象所属表空间的最大扩展值,需调整相应表空间的存储参数。

五. 检查Oracle数据库备份结果

  1. 检查数据库备份日志信息

    假设:备份的临时目录为/backup/hotbakup,我们需要检查2009年7月22日的备份结果,则用下面的命令来检查:

    cat /backup/hotbackup/hotbackup-09-7-22.log|grep –i error
    

    备份脚本的日志文件为hotbackup-月份-日期-年份.log,在备份的临时目录下面。如果文件中存在“ERROR:”,则表明备份没有成功,存在问题需要检查。

  2. 检查backup卷中文产生的时间

    #ls –lt /backup/hotbackup
    

    backup卷是备份的临时目录,查看输出结果中文件的日期,都应当是在当天凌晨由热备份脚本产生的。如果时间不对则表明热备份脚本没执行成功。

  3. 检查oracle用户的email

    #tail –n 300 /var/mail/oracle
    

    热备份脚本是通过Oracle用户的cron去执行的。cron执行完后操作系统就会发一条Email通知Oracle用户任务已经完成。查看Oracle email中今天凌晨部分有无ORA-,Error,Failed等出错信息,如果有则表明备份不正常。

六. 检查Oracle数据库性能

  1. 数据库的等待时间

    select sid, event, p1, p2, p3, WAIT_TIME, SECONDS_IN_WAIT from v$session_wait where event not like 'SQL%' and event not like 'rdbms%';
    

    如果数据库长时间持续出现大量像latch free,enqueue,buffer busy waits,db file sequential read,db file scattered read等等待事件时,需要对其进行分析,可能存在问题的语句。

  2. Disk Read最高的SQL语句的获取

    select sql_text from (select * from v$sqlarea order by disk_reads) where rownum <= 5;
    
  3. 查找前十条性能差的sql

    select * from (select parsing_user_id executions, sorts, command_type, disk_reads, sql_text from v$sqlarea order by disk_reads desc) where rownum < 10;
    
  4. 等待时间最多的5个系统等待事件的获取

    select * from (select * from v$system_event where event not like 'SQL%' order by total_waits desc) where rownum <= 5;
    
  5. 检查运行很久的SQL

    SQL> SELECT USERNAME, SID, OPNAME, ROUND(SOFAR * 100 / TOTALWORK, 0) || '%' AS PROGRESS, TIME_REMAINING, SQL_TEXT FROM V$SESSION_LONGOPS, V$SQL WHERE TIME_REMAINING <> 0 AND SQL_ADDRESS = ADDRESS AND SQL_HASH_VALUE = HASH_VALUE;
    
    no rows selected
    
  6. 检查消耗CPU最高的进程

    SQL> SELECT segment_name table_name, COUNT(*) extents FROM dba_segments WHERE owner NOT IN ('SYS', 'SYSTEM') GROUP BY segment_name HAVING COUNT(*) = (SELECT MAX(COUNT(*)) FROM dba_segments GROUP BY segment_name);
    
    no rows selected
    
  7. 检查表空间的I/O比例

    SELECT DF.TABLESPACE_NAME NAME, DF.FILE_NAME "FILE", F.PHYRDS PYR, F.PHYBLKRD PBR, F.PHYWRTS PYW, F.PHYBLKWRT PBW FROM V$FILESTAT F, DBA_DATA_FILES DF WHERE F.FILE# = DF.FILE_ID ORDER BY DF.TABLESPACE_NAME; 
    
  8. 检查文件系统的I/O比例

    SELECT SUBSTR(A.FILE#, 1, 2) "#", SUBSTR(A.NAME, 1, 30) "NAME", A.STATUS, A.BYTES, B.PHYRDS, B.PHYWRTS FROM V$DATAFILE A, V$FILESTAT B WHERE A.FILE# = B.FILE#; 
    
  9. 检查死锁及处理

    SQL> select sid, serial#, sid || ',' || serial# kill_id, username, schemaname, osuser, machine, terminal, program, owner, object_name, object_type, o.object_id from dba_objects o, v$locked_object l, v$session s where o.object_id = l.object_id and s.sid = l.session_id;
    
    no rows selected
    
    alter system kill session '&sid,&serial#';   #Oracle级kill session
    
    kill -9 pid      #操作系统级kill session
    

七. 检查数据库系统资源使用情况

  1. CPU使用情况

    [oracle@ecs-398213 ~]$ top
    top - 16:35:48 up 2 days,  2:59,  2 users,  load average: 0.02, 0.02, 0.00
    Tasks: 159 total,   1 running, 158 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  0.2 us,  0.3 sy,  0.0 ni, 99.0 id,  0.2 wa,  0.2 hi,  0.2 si,  0.0 st
    

    剩余cpu平均值低于10%视为异常。

  2. 内存使用情况

    [oracle@ecs-398213 ~]$ free -m
                  total        used        free      shared  buff/cache   available
    Mem:           3736         519         147         899        3068        2082
    Swap:           511          52         459
    

    剩余内存低于10%视为异常。

  3. IO使用情况

    iostat -k 1 3
    
  4. 系统负载情况

    [oracle@ecs-398213 ~]$ uptime
     16:42:26 up 2 days,  3:05,  2 users,  load average: 0.01, 0.02, 0.00
    

    后面的3个数值如果有高于2.5的时候就表明系统在超负荷运转,视为异常。

  5. 查看是否有僵死进程

    SQL> select spid from v$process where addr not in (select paddr from v$session);
    
    SPID
    ------------------------
    32450
    
  6. 检查缓冲区命中率

    SQL> SELECT a.VALUE + b.VALUE logical_reads, c.VALUE phys_reads, round(100 * (1 - c.value / (a.value + b.value)), 4) hit_ratio FROM v$sysstat a, v$sysstat b, v$sysstat c WHERE a.NAME = 'db block gets' AND b.NAME = 'consistent gets' AND c.NAME = 'physical reads';
    
    LOGICAL_READS PHYS_READS  HIT_RATIO
    ------------- ---------- ----------
           279329       11195    95.9922
    

    如果命中率低于90%则需加大数据库参数db_cache_size。

  7. 检查共享池命中率

    SQL> select sum(pinhits) / sum(pins) * 100 from v$librarycache;
    
    SUM(PINHITS)/SUM(PINS)*100
    --------------------------
         84.0048095
    

    如低于95%,则需要调整应用程序使用绑定变量,或者调整数据库参数shared pool的大小。

  8. 检查排序区

    SQL> select name,value from v$sysstat where name like '%sort%';
    
    NAME                                   VALUE
    ---------------------------------------------------------------- ----------
    IM ADG journal sort                            0
    sorts (memory)                                  6667
    sorts (disk)                               0
    sorts (rows)                              132264
    

    如果disk/(memoty+row)的比例过高,则需要调整sort_area_size(workarea_size_policy=false)或pga_aggregate_target(workarea_size_policy=true)。

  9. 检查日志缓冲区

    SQL> select name,value from v$sysstat where name in ('redo entries','redo buffer allocation retries');
    
    NAME                                   VALUE
    ---------------------------------------------------------------- ----------
    redo entries                               20229
    redo buffer allocation retries                         0
    

    如果redo buffer allocation retries/redo entries超过1%,则需要增大log_buffer。

八. 检查Oracle数据库安全性

  1. 检查系统安全日志信息

    grep -i accepted /var/log/secure   #查看登录成功的用户
    
    grep -i inval /var/log/secure      #查看登录失败的用户
    

    如果没有(Invalid、refused)视为系统正常,出现错误提示,应作出系统告警通知。

  2. 检查用户修改密码

    alter user USER_NAME identified by PASSWORD; 
    

九. 其他检查

  1. 检查当前crontab -l 是否正常

    crontab -l
    
  2. Oracle Job 是否有失败

    SQL> select job,what,last_date,next_date,failures,broken from dba_jobs Where schema_user='CAIKE';
    
    no rows selected
    

    如有问题建议重建job,如:

    exec sys.dbms_job.remove(1); 
    commit; 
    exec sys.dbms_job.isubmit(1,'REFRESH_ALL_SNAPSHOT;',SYSDATE+1/1440,'SYSDATE+4/1440'); 
    commit;
    
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容

  • 一、数据库切换1、业务环境:主库:10.100.0.9备库:10.100.0.18空闲:10.100.0.201 ...
    yangqing阅读 11,787评论 0 0
  • 一、恢复的相关概念 1.1)实例恢复 是指数据库在遭遇突然情况下崩溃,在处理的事务中有提交的,有...
    个人精进成长营阅读 3,826评论 0 0
  • What is Undo 当发布一条回滚语句时,撤销记录用于撤销未提交事务读数据库所做的修改。 在数据...
    个人精进成长营阅读 2,523评论 0 0
  • 一、建立用户/表空间/分配权限/删除表空间 1)注意表空间存放目录 2)/*分为四步 */ /*第1步:创建临时表...
    MrGago阅读 4,191评论 0 0
  • 脱机冷备 冷备份发生在数据库已经正常关闭的情况下 拷贝文件,假如数据库文件都在目录A,拷贝这些数据库文件到其他目录...
    wqh8384阅读 5,366评论 0 1