问题描述: 10月25日上午滨州网通的工程师报告OSS应用系统运行缓慢,具体操作是通过OSS系统查询话单,很长时间才能返回结果,严重影响了客户的正常使用。
问题处理: 1.登陆数据库主机,用sar命令看到idle的值持续为0,CPU的资源已经耗尽:
bz_db1# sar 2 4
SunOS kest 5.8 Generic_108528-19 sun4u 10/26/04
10:56:46 %usr %sys %wio %idle 10:56:48 1 4 95 0 10:56:50 1 5 94 0 10:56:52 0 6 93 0 10:56:54 1 6 93 0
Average 1 5 94 0
2.使用TOP命令看到有两个明显占用CPU利用率过高的进程,以下是top命令的结果:
bz_db1# top
last pid: 1664;load averages: 3.26, 3.24, 3.69 159 processes: 152 sleeping, 2 running, 2 zombie, 1 stopped, 2 on cpu CPU states: 1.5% idle, 72.5% user, 17.9% kernel, 8.0% iowait, 0.0% swap Memory: 2.0G real, 233M free, 2.0G swap in use, 3.4G swap free
PID USERNAME THR PR NCE SIZE RES STATE TIME FLTS CPU COMMAND 27420 oracle 1 10 0 1.3G 1.2G cpu01 22.9H 2 31.94% oracle 27418 oracle 1 10 0 1.3G 1.2G run 23.0H 6 26.86% oracle 5943 oracle 1 59 0 1.3G 1.2G sleep 25:26 37 4.92% oracle 6295 oracle 1 55 0 1.3G 1.2G run 25:14 74 4.90% oracle 7778 oracle 1 43 0 1.3G 1.2G sleep 11:43 110 4.86% oracle 13270 oracle 1 59 0 1.3G 1.2G sleep 210.6H 0 0.96% oracle 13056 oracle 1 48 0 1.3G 1.2G sleep 303:30 0 0.37% oracle 10653 root 1 58 0 2560K 1624K cpu00 0:00 0 0.32% top 18827 oracle 1 58 0 1.3G 1.2G sleep 18.4H 0 0.31% oracle 12748 oracle 258 58 0 1.3G 1.2G sleep 555:14 0 0.21% oracle 10634 oracle 1 59 0 1.3G 1.2G sleep 0:01 0 0.21% oracle 28458 oracle 1 58 0 1.3G 1.2G sleep 535:02 0 0.18% oracle 13075 oracle 1 59 0 1.3G 1.2G sleep 326:33 0 0.15% oracle 13173 oracle 1 58 0 1.3G 1.2G sleep 593:07 0 0.13% oracle 4927 oracle 1 59 0 1.3G 1.2G sleep 33.4H 0 0.11% oracle
可以看到这两个进程号分别是27420和27418.
3.捕获占用CPU利用率过高的SQL语句:
以下用到了我总结的SQL语句:
SQL>set line 240 SQL>set verify off SQL>column sid format 999 SQL>column pid format 999 SQL>column S_# format 999 SQL>column username format A9 heading "ORA User" SQL>column program format a29 SQL>column SQL format a60 SQL>COLUMN OSname format a9 Heading "OS User" SQL>SELECT P.pid pid,S.sid sid,P.spid spid,S.username username, S.osuser osname,P.serial# S_#,P.terminal,P.program program, P.background,S.status,RTRIM(SUBSTR(a.sql_text, 1, 80)) SQL FROM v$process P, v$session S,v$sqlarea A WHERE P.addr = s.paddr AND S.sql_address = a.address (+) AND P.spid LIKE '%&1%';
Enter value for 1: 27420(注意这里应输入占用CPU最高的进程对应的PID)
得到以下SQL语句:
Select nvl(sum(LOCALCHARGE),0),nvl(sum(usage),0) from LOCALUSAGE where to_char(ENDTIME,'YYYYMMDD')=20041016 and LOCALCHARGE>0 and caller like '0543886%';
27418进程对应的SQL语句如下: select nvl(sum(LOCALCHARGE),0) from LOCALUSAGE where to_char(ENDTIME,'YYYYMMDD')=20041016 and caller like '0543888%';
4.使用相关用户连接到数据库,检查其执行计划: SQL>connect wacos/oss Connected.
SQL>@?/rdbms/admin/utlxplan.sql Table created.
SQL>set autotrace on
SQL>set timing on
SQL>Select nvl(sum(LOCALCHARGE),0),nvl(sum(usage),0) from LOCALUSAGE where to_char(ENDTIME,'YYYYMMDD')=20041016 and LOCALCHARGE>0 and caller like '0543886%';
NVL(SUM(LOCALCHARGE),0) NVL(SUM(USAGE),0) ----------------------- ----------------- 0 0
Elapsed: 00:02:56.37
Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=13435 Card=1 Bytes=5 3)
1 0 SORT (AGGREGATE) 2 1 PARTITION RANGE (ALL) 3 2 TABLE ACCESS (FULL) OF 'LOCALUSAGE' (Cost=13435 Card=1 81 Bytes=9593)
Statistics ---------------------------------------------------------- 258 recursive calls 0 db block gets 88739 consistent gets 15705 physical reads 0 redo size 580 bytes sent via SQL*Net to client 651 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 8 sorts (memory) 0 sorts (disk) 1 rows processed
发现对localusage表做了全表扫描,什么记录也没有返回居然用了2分多钟。
SQL> select nvl(sum(LOCALCHARGE),0) from LOCALUSAGE where to_char(ENDTIME,'YYYYMMDD')=20040816 and caller like '0543888%';
NVL(SUM(LOCALCHARGE),0) ----------------------- 27.6
Elapsed: 00:03:56.46
Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=13435 Card=1 Bytes=4 0)
1 0 SORT (AGGREGATE) 2 1 PARTITION RANGE (ALL) 3 2 TABLE ACCESS (FULL) OF 'LOCALUSAGE' (Cost=13435 Card=3 615 Bytes=144600)
Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 88588 consistent gets 15615 physical reads 0 redo size 507 bytes sent via SQL*Net to client 651 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
这个SQL语句有结果返回,发现也是对localusage表做了全表扫描,但速度也很慢,用了3分多钟。
SQL> select count(*) from localusage;
COUNT(*) ---------- 5793776
该表有579万多条记录,数据量很大,全表扫描已经不再适合。
5.检查该表的类型:
SQL> SELECT INDEX_NAME, TABLE_NAME, STATUS, PARTITIONED FROM USER_INDEXES WHERE TABLE_NAME='LOCALUSAGE';
INDEX_NAME TABLE_NAME STATUS PAR ------------------------------ ------------------------------ -------- --- I_LOCALUSAGE_SID LOCALUSAGE N/A YES UI_LOCALUSAGE_ST_SEQ LOCALUSAGE N/A YES
SQL> SELECT index_name,table_name,locality FROM user_part_indexes where table_name='LOCALUSAGE';
INDEX_NAME TABLE_NAME LOCALI ------------------------------ ------------------------------ ------ I_LOCALUSAGE_SID LOCALUSAGE LOCAL UI_LOCALUSAGE_ST_SEQ LOCALUSAGE LOCAL
发现该表是分区表,并在SERVICEID,STARTIME和CDRSEQUENCE列上建立了分区索引,索引类型是local索引。
6.查看分区索引的索引键值:
SQL> select INDEX_NAME,COLUMN_NAME,INDEX_OWNER from dba_ind_columns where TABLE_NAME='LOCALUSAGE';
INDEX_NAME COLUMN_NAME INDEX_OWNER -------------------- -------------------- ------------------------------ I_LOCALUSAGE_SID SERVICEID WACOS UI_LOCALUSAGE_ST_SEQ STARTTIME WACOS UI_LOCALUSAGE_ST_SEQ CDRSEQUENCE WACOS
发现在endtime和caller列上都没有建立索引,这也是导致SQL语句做全表扫描的最终原因。
7.决定创建新的分区索引以消除全表扫描:
(1).首先查看localusage表分区情况:
SQL> select PARTITION_NAME,tablespace_name from user_tab_partitions where table_name='LOCALUSAGE';
PARTITION_NAME TABLESPACE_NAME ------------------------------ ------------------------------ LOCALUSAGE_200312 WACOS LOCALUSAGE_200401 WACOS LOCALUSAGE_200402 WACOS LOCALUSAGE_200404 WACOS LOCALUSAGE_200405 WACOS LOCALUSAGE_200406 WACOS LOCALUSAGE_200407 WACOS LOCALUSAGE_200409 WACOS LOCALUSAGE_200410 WACOS LOCALUSAGE_200411 WACOS LOCALUSAGE_200403 WACOS LOCALUSAGE_200408 WACOS LOCALUSAGE_200412 WACOS
13 rows selected.
(2).在caller列上创建local分区索引: SQL>set timing on SQL>create index I_LOCALUSAGE_CALLER on localusage(caller) LOCAL ( PARTITION LOCALUSAGE_200312, PARTITION LOCALUSAGE_200401, PARTITION LOCALUSAGE_200402, PARTITION LOCALUSAGE_200404, PARTITION LOCALUSAGE_200405, PARTITION LOCALUSAGE_200406, PARTITION LOCALUSAGE_200407, PARTITION LOCALUSAGE_200409, PARTITION LOCALUSAGE_200410, PARTITION LOCALUSAGE_200411, PARTITION LOCALUSAGE_200403, PARTITION LOCALUSAGE_200408, PARTITION LOCALUSAGE_200412 ) TABLESPACE wacos STORAGE( INITIAL 6553600 NEXT 6553600 MAXEXTENTS unlimited PCTINCREASE 0) PCTFREE 5 NOLOGGING;
Index created.
Elapsed: 00:06:27.90 (由于数据量比较大,耗时6分钟)
8.再次查看执行计划: SQL>Select nvl(sum(LOCALCHARGE),0),nvl(sum(usage),0) from LOCALUSAGE where to_char(ENDTIME,'YYYYMMDD')=20041016 and LOCALCHARGE>0 and caller like '0543886%';
NVL(SUM(LOCALCHARGE),0) NVL(SUM(USAGE),0) ----------------------- ----------------- 0 0
Elapsed: 00:00:03.00
Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=22 Card=1 Bytes=53) 1 0 SORT (AGGREGATE) 2 1 PARTITION RANGE (ALL) 3 2 TABLE ACCESS (BY LOCAL INDEX ROWID) OF 'LOCALUSAGE' (Cost=22 Card=181 Bytes=9593) 4 3 INDEX (RANGE SCAN) OF 'I_LOCALUSAGE_CALLER' (NON-UNIQUE) (Cost=14 Card=65063)
Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 16813 consistent gets 569 physical reads 0 redo size 580 bytes sent via SQL*Net to client 651 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
这次走了索引后速度明显快多了,用了3秒钟就返回了结果。
SQL>select nvl(sum(LOCALCHARGE),0) from LOCALUSAGE where to_char(ENDTIME,'YYYYMMDD')=20040816 and caller like '0543888%';
NVL(SUM(LOCALCHARGE),0) ----------------------- 27.6
Elapsed: 00:00:24.73
Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=22 Card=1 Bytes=40) 1 0 SORT (AGGREGATE) 2 1 PARTITION RANGE (ALL) 3 2 TABLE ACCESS (BY LOCAL INDEX ROWID) OF 'LOCALUSAGE' (Cost=22 Card=3615 Bytes=144600) 4 3 INDEX (RANGE SCAN) OF 'I_LOCALUSAGE_CALLER' (NON-UNIQUE) (Cost=14 Card=65063)
Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 129336 consistent gets 7241 physical reads 0 redo size 507 bytes sent via SQL*Net to client 651 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
这个SQL语句走了索引,用了24秒钟返回结果,性能明显提高了很多。

|