Some time ago I ran into a bizarre problem. Only after opening an SR and chasing it with Oracle Support for two weeks did we get an answer, and not a perfect one. I'm writing it up here to share.

1. Problem Description
On 12/13, the MES at one of our company's sites brought the whole production line to a halt; the system was hung and unusable, and I was called in to investigate. The alert log showed the tablespace was out of space. How could that be? I had deployed a script that automatically adds datafiles when usage crosses a threshold, plus a tablespace-usage alerting script, and no alert had fired in the mail. (The auto-extend script is described here: Oracle自动处理表空间不足脚本-CSDN博客.) Querying the tablespace at that moment showed 73 GB still free, so why the out-of-space error? To get the system back quickly, I first added four datafiles as a workaround. Below is the SQL I use to check tablespace usage:
SELECT t.*
  FROM (SELECT a.tablespace_name,
               a.unalloc_size,
               nvl(f.free_size, 0) free_size,
               a.used_size - nvl(f.free_size, 0) used_size,
               round((a.used_size - nvl(f.free_size, 0)) /
                     (a.unalloc_size + a.used_size), 2) capacity
          FROM (SELECT tablespace_name,
                       round(SUM(bytes) / 1024 / 1024) free_size
                  FROM dba_free_space
                 GROUP BY tablespace_name) f,
               (SELECT tablespace_name,
                       round(SUM(user_bytes) / 1024 / 1024) used_size,
                       round(SUM(decode(autoextensible,
                                        'YES', decode(sign(maxbytes - user_bytes),
                                                      -1, 0,
                                                      maxbytes - user_bytes),
                                        0)) / 1024 / 1024) unalloc_size
                  FROM dba_data_files
                 GROUP BY tablespace_name) a
         WHERE 1 = 1
           AND a.tablespace_name = f.tablespace_name(+)) t
 --WHERE capacity > 0.60
 --AND (unalloc_size + free_size) < 4000
 --AND (unalloc_size + free_size) < used_size / 2
 ORDER BY capacity DESC;
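The capacity column in the query above is just arithmetic on three per-tablespace numbers. A minimal Python sketch of the same calculation (the sample values are made up for illustration, loosely resembling the incident's 73 GB free / 109 MB unallocated):

```python
def tablespace_capacity(used_mb: float, free_mb: float, unalloc_mb: float) -> float:
    """Mirror the SQL: capacity = (used - free) / (unalloc + used).

    used_mb    - SUM(user_bytes) over the tablespace's datafiles
    free_mb    - SUM(bytes) from dba_free_space
    unalloc_mb - remaining autoextend room (maxbytes - user_bytes, floored at 0)
    """
    return round((used_mb - free_mb) / (unalloc_mb + used_mb), 2)

# Hypothetical numbers: 500,000 MB allocated, ~73 GB of it free, 109 MB unallocated.
print(tablespace_capacity(used_mb=500_000, free_mb=74_752, unalloc_mb=109))
```

With almost no autoextend headroom left (unalloc_mb near zero), capacity is driven almost entirely by how much of the already-allocated space is actually used.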
2. Root-Cause Investigation
2.1 Was it free-space fragmentation?
With space clearly still available, why the out-of-space error? The query result showed only 109 MB unallocated but 73 GB free in this tablespace. My first suspicion was that free space was too fragmented to allocate a run of 8192 contiguous blocks (8 KB block size × 8192 blocks = 64 MB). The script to check tablespace fragmentation is as follows.
Reference: Script to Detect Tablespace Fragmentation (Doc ID 1020182.6)
SET ECHO off
REM NAME:  TFSTSFRM.SQL
REM USAGE: path/tfstsfgm
REM ------------------------------------------------------------------------
REM REQUIREMENTS:
REM    SELECT ON DBA_FREE_SPACE
REM ------------------------------------------------------------------------
REM PURPOSE:
REM    The following is a script that will determine how many extents
REM    of contiguous free space you have in Oracle as well as the
REM    total amount of free space you have in each tablespace. From
REM    these results you can detect how fragmented your tablespace is.
REM
REM    The ideal situation is to have one large free extent in your
REM    tablespace. The more extents of free space there are in the
REM    tablespace, the more likely you will run into fragmentation
REM    problems. The size of the free extents is also very important.
REM    If you have a lot of small extents (too small for any next
REM    extent size) but the total bytes of free space is large, then
REM    you may want to consider defragmentation options.
REM ------------------------------------------------------------------------
REM DISCLAIMER:
REM    This script is provided for educational purposes only. It is NOT
REM    supported by Oracle World Wide Technical Support.
REM    The script has been tested and appears to work as intended.
REM    You should always run new scripts on a test instance initially.
REM ------------------------------------------------------------------------
REM Main text of script follows:

create table SPACE_TEMP (
  TABLESPACE_NAME   CHAR(30),
  CONTIGUOUS_BYTES  NUMBER)
/

declare
  cursor query is select *
                    from dba_free_space
                   order by tablespace_name, block_id;
  this_row      query%rowtype;
  previous_row  query%rowtype;
  total         number;
begin
  open query;
  fetch query into this_row;
  previous_row := this_row;
  total := previous_row.bytes;
  loop
    fetch query into this_row;
    exit when query%notfound;
    if this_row.block_id = previous_row.block_id + previous_row.blocks then
      total := total + this_row.bytes;
      insert into SPACE_TEMP (tablespace_name)
             values (previous_row.tablespace_name);
    else
      insert into SPACE_TEMP
             values (previous_row.tablespace_name, total);
      total := this_row.bytes;
    end if;
    previous_row := this_row;
  end loop;
  insert into SPACE_TEMP
         values (previous_row.tablespace_name, total);
end;
.
/

set pagesize 60
set newpage 0
set echo off
ttitle center 'Contiguous Extents Report' skip 3
break on "TABLESPACE NAME" skip page duplicate
spool contig_free_space.lis
rem
column "CONTIGUOUS BYTES" format 999,999,999
column "COUNT"            format 999
column "TOTAL BYTES"      format 999,999,999
column "TODAY"            noprint new_value new_today format a1
rem
select TABLESPACE_NAME  "TABLESPACE NAME",
       CONTIGUOUS_BYTES "CONTIGUOUS BYTES"
  from SPACE_TEMP
 where CONTIGUOUS_BYTES is not null
   and tablespace_name = 'TBS_MYCIM'
   and CONTIGUOUS_BYTES > 100000000
 order by TABLESPACE_NAME, CONTIGUOUS_BYTES desc;

select tablespace_name, count(*) "# OF EXTENTS",
       sum(contiguous_bytes) "TOTAL BYTES"
  from space_temp
 group by tablespace_name;

spool off

drop table SPACE_TEMP
/
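The PL/SQL above walks dba_free_space ordered by block_id and merges chunks that touch each other into contiguous runs. The same merge logic in a small Python sketch (the input tuples are hypothetical (block_id, blocks) rows for a single datafile):

```python
def contiguous_runs(free_chunks, block_size=8192):
    """Merge adjacent free chunks into contiguous runs, like the MOS script.

    free_chunks: iterable of (block_id, blocks) sorted by block_id,
                 mimicking dba_free_space rows within one datafile.
    Returns a list of run sizes in bytes.
    """
    runs = []
    prev_id = prev_blocks = None
    total = 0
    for block_id, blocks in free_chunks:
        if prev_id is not None and block_id == prev_id + prev_blocks:
            total += blocks * block_size      # this chunk touches the previous one
        else:
            if prev_id is not None:
                runs.append(total)            # close out the previous run
            total = blocks * block_size
        prev_id, prev_blocks = block_id, blocks
    if prev_id is not None:
        runs.append(total)
    return runs

# Two touching chunks (128 + 128 blocks) plus one isolated 8192-block chunk:
chunks = [(100, 128), (228, 128), (50_000, 8192)]
print(contiguous_runs(chunks))  # run sizes in bytes
```

The isolated 8192-block chunk comes out as a single 64 MB run, exactly the allocation size the fragmentation theory said might be impossible to find.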
I adjusted the script to also report file_id, and excluded the file_ids of the four newly added datafiles, so the query shows only the contiguous free space in the tablespace's original datafiles. Adding a datafile does not rebalance existing data the way adding a disk to ASM does; space in the old files is only released by actions such as a shrink or a move. In other words, these contiguous runs already existed before the incident. From the query result below, many datafiles still held large runs of contiguous free blocks; there is no way an allocation of 8192 contiguous blocks should have failed.
SQL> @/home/oracle/jobs/checktbs.sql
TABLESPACE NAME FILE ID CONTIGUOUS BYTES
------------------------------ ------- ----------------
TBS_XXXXX 80 3,672,113,152
TBS_XXXXX 81 3,538,026,496
TBS_XXXXX 79 2,817,523,712
TBS_XXXXX 79 2,779,316,224
TBS_XXXXX 81 2,483,552,256
TBS_XXXXX 80 2,078,867,456
TBS_XXXXX 79 1,238,564,864

2.2 Could it be an abnormal SQL operation?
While working with Oracle Support, it came up that a failed CTAS can leave the space it consumed unreleased. So let's look at what SQL was running in the problem window.
Check ASH for the hour before the problem and see which SQL opnames appeared:

select distinct sql_opname
  from dba_hist_active_sess_history
 where sample_time >= to_timestamp('2024/12/13 17:00:00', 'yyyy/mm/dd hh24:mi:ss')
   and sample_time <= to_timestamp('2024/12/13 18:00:00', 'yyyy/mm/dd hh24:mi:ss');
Apart from one CREATE INDEX there was nothing unusual. The system's DDL log for that time had no CREATE INDEX record; since I record DDL with an AFTER DDL trigger, a statement that fails is never logged. Checking the ASH rows for that CREATE INDEX, it was captured in only one sample (about 10 s) and came from a scheduler-related background process; in that short a time it could hardly have consumed this much space.
select sample_time, sql_id, session_id, session_type, sql_opname, program
  from dba_hist_active_sess_history
 where sample_time >= to_timestamp('2024/12/13 17:00:00', 'yyyy/mm/dd hh24:mi:ss')
   and sample_time <= to_timestamp('2024/12/13 18:00:00', 'yyyy/mm/dd hh24:mi:ss')
   and sql_opname = 'CREATE INDEX';

Afterwards I went through the ASH SQL records for the problem window again systematically with Oracle Support. Bearing in mind that blocks extended by an INSERT are not released even if the transaction rolls back, we still found no abnormal operation such as a CTAS (create table as select).
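The point about rollback deserves emphasis: space grabbed by extending a segment stays allocated after the transaction rolls back, because the segment's high-water mark never moves back down on its own. A deliberately simplified Python model of that bookkeeping (this is an illustration of the idea, not Oracle internals):

```python
class Segment:
    """Toy model: rows can be rolled back, but the high-water mark never drops."""

    def __init__(self):
        self.hwm_blocks = 0   # blocks ever formatted for this segment
        self.rows = 0

    def insert(self, rows, rows_per_block=100):
        self.rows += rows
        needed = -(-self.rows // rows_per_block)       # ceiling division
        self.hwm_blocks = max(self.hwm_blocks, needed) # HWM only ratchets upward

    def rollback(self, rows):
        self.rows -= rows                              # undo removes the rows ...

seg = Segment()
seg.insert(1_000_000)      # extends the segment to 10,000 blocks
seg.rollback(1_000_000)    # ... but the formatted blocks stay with the segment
print(seg.rows, seg.hwm_blocks)
```

This is why a huge rolled-back INSERT would still have shown up as consumed space in the tablespace, and why we hunted for one in ASH even though no committed data explained the usage.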
PS: a failed CTAS cannot be dug out afterwards with LogMiner either.

3. Conclusion
After two weeks of back-and-forth with Oracle Support, including engineers in India and Dalian and internal Oracle discussion, the final answer was: an unknown bug. Oracle 11.2.0.4 has been in production for more than 11 years and is still one of the mainstream versions in the field; I have maintained a great many 11g databases since it shipped and have genuinely never seen this before. With the data we were able to collect, there is no self-consistent, logical explanation. If any readers have other troubleshooting ideas, please leave a comment.

This incident cost two hours of production downtime, and in a case like that management will absolutely dig for the root cause. Without vendor support there would have been no way to close it out. That is why an earlier article of mine, "谁是DBA圈里最大的背锅侠", argued that the vendor is the biggest scapegoat in the DBA world; of course, if it really is a bug, the blame is not unfairly placed. If I had chased this on my own and failed to find the root cause, all I could say is "I couldn't find it", and the boss would question my competence. When the vendor says "unknown bug", that becomes the conclusion.
谁是DBA圈里最大的背锅侠 - CSDN博客