数据库版本11.2.0.4,linux,rac环境 一条
sql,按照正常执行计划执行会报Oracle内部
错误:
sql> SELECT * 2 FROM (SELECT A.BID,3 A.B1SUBCASE,A.B1NO,4 5 A.B1INGDATE,6 A.DEALTOTAL,7 A.CUSTID,8 A.POSMEMID,9 A.OUTSTOREROOMID,C.CUSTID AS C_CUSTID,10 11 D.STORESID,12 (SELECT SUM(TRADEMONEY) AS TRADEMONEY 13 FROM TBL_B1ITEMS AA 14 WHERE AA.BID = A.BID) AS SUM_TRADEMONEY,(SELECT COUNT(IBIID) AS CNTIBIID 15 16 FROM TBL_B1ITEMS AA 17 WHERE AA.BID = A.BID) AS CNTIBIID,18 (SELECT SUM(DEBIT) FROM TBL_B1CHECKOUT A1,TBL_DATADICTIONARY A2 19 20 WHERE A1.DATADICTIONARYID = A2.DATADICTIONARYID 21 AND A2.DICTIONARYCODE = 1 22 AND A1.FORECODE IS NULL AND A1.BID = A.BID) AS DEBIT 23 24 FROM TBL_B1 A INNER JOIN TBL_B1CASE B 25 26 ON A.B1TYPE = B.B1TYPE 27 AND A.B1SUBCASE = B.B1SUBCASE 28 LEFT JOIN TBL_CUSTOM C 29 ON A.CUSTID = C.CUSTID LEFT JOIN TBL_STORES D 30 31 ON A.OUTSTOREROOMID = D.STORESID 32 WHERE A.RDATE = TO_DATE('2016-05-11','yyyy-mm-dd') 33 AND A.B1SUBCASE IN (11,12,13,21,22,24,41,42,61,62,63,71,72,73,91) 34 35 AND B.AUTOREFER = 1 36 AND A.SERVICSTATUS = 0 AND A.BCOMPLETE = 1 37 38 AND NVL(A.BERRMEMID,0) <> 1 39 AND NVL(IINVAUTOREFERB1,0) <> 3 40 ORDER BY A.TIMEORDER,B.POSREFERORDER) 41 WHERE ROWNUM <= 10000; FROM TBL_B1CHECKOUT A1,TBL_DATADICTIONARY A2 * 第 19 行出现
错误: ORA-00600: 内部
错误代码,参数: [12406],[],[] 首先确定这是Oracle的bug无疑,
错误的stacktrace如下: ORA-00600: <C4>?<BF><B4><ED><CE><F3><B4><FA><C2><EB>,<B2><CE><CA><FD>: [12406],[] ========= Dump for incident 216219 (ORA 600 [12406]) ======== *** 2016-05-11 10:52:10.633 dbkedDefDump(): Starting incident default dumps (flags=0x2,level=3,mask=0x0) ----- Current
sql Statement for this session (
sql_id=gq026hrdjh7sd) ----- SELECT * FROM (SELECT A.BID,A.B1SUBCASE,A.B1INGDATE,A.DEALTOTAL,A.CUSTID,A.POSMEMID,A.OUTSTOREROOMID,D.STORESID,(SELECT SUM(TRADEMONEY) AS TRADEMONEY FROM TBL_B1ITEMS AA WHERE AA.BID=A.BID) AS SUM_ TRADEMONEY,(SELECT COUNT(IBIID) AS CNTIBIID FROM TBL_B1ITEMS AA WHERE AA.BID=A.BID) AS CNTIBIID,(SELECT SUM(DEBIT) FROM TBL_B1CHECKOUT A1,TBL_DATADICTIONARY A2 WHERE A1.DATADICTIONARYID=A2.DATADICTIONARYID AND A2.DICTIONARYCODE=1 AND A1.FORECODE IS NULL AND A1.BID=A.BID) AS DEBIT FROM TBL_B1 A INNER JOIN TBL_B1CASE B ON A.B1TYPE=B.B1TYPE AND A.B1SUBCASE=B.B1SUBCASE LEFT JOIN TBL_CUSTOM C ON A.CUSTID=C.CUSTID LEFT JOIN TBL_STORES D ON A.OUTSTOREROOMID =D.STORESID WHERE A.RDATE =TO_DATE(:B1,'yyyy-mm-dd') AND A.B1SUBCASE IN (11,91) AND B.AUTOREFER = 1 AND A.SERVICSTATUS=0 AND A.BCOMPLETE=1 AND NVL(A.BERRMEMID,0)<>1 AND NVL(IINVAUTOREFERB1,0)< > 3 ORDER BY A.TIMEORDER,B.POSREFERORDER) WHERE ROWNUM<=10000 ----- PL/
sql Stack ----- ----- PL/
sql Call Stack ----- object line object handle number name 0xdb38ea018 109 procedure PROD.PROC1 0xf49353b10 1 anonymous block ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dub
IoUs value) -------------------- -------- -------------------- ---------------------------- skdstdst()+41 call kgdsdst() 000000000 ? 000000000 ? 7FFF3A9F9A20 ? 7FFF3A9F9AF8 ? 7FFF3A9FE5A0 ? 000000002 ? ksedst1()+103 call skdstdst() 000000000 ? 000000000 ? 7FFF3A9F9A20 ? 7FFF3A9F9AF8 ? 7FFF3A9FE5A0 ? 000000002 ? ksedst()+39 call ksedst1() 000000000 ? 000000001 ? 7FFF3A9F9A20 ? 7FFF3A9F9AF8 ? 7FFF3A9FE5A0 ? 000000002 ? dbkedDefDump()+2746 call ksedst() 000000000 ? 000000001 ? 7FFF3A9F9A20 ? 7FFF3A9F9AF8 ? 7FFF3A9FE5A0 ? 000000002 ? ksedmp()+41 call dbkedDefDump() 000000003 ? 000000002 ? 7FFF3A9F9A20 ? 7FFF3A9F9AF8 ? 7FFF3A9FE5A0 ? 000000002 ? ksfdmp()+69 call ksedmp() 000000003 ? 000000002 ? 7FFF3A9F9A20 ? 7FFF3A9F9AF8 ? 7FFF3A9FE5A0 ? 000000002 ? dbgexPhaseII()+1764 call ksfdmp() 000000003 ? 000000002 ? 7FFF3A9F9A20 ? 7FFF3A9F9AF8 ? 7FFF3A9FE5A0 ? 000000002 ? dbgexProcessError() call dbgexPhaseII() 7F988CD217A8 ? 7F988CB73F60 ? +2680 7FFF3AA03078 ? 7FFF3A9F9AF8 ? 7FFF3A9FE5A0 ? 000000002 ? dbgeExecuteForError call dbgexProcessError() 7F988CD217A8 ? 7F988CB73F60 ? ()+88 000000001 ? 000000000 ? 7FFF3A9FE5A0 ? 000000002 ? dbgePostErrorKGE()+ call dbgeExecuteForError 7F988CD217A8 ? 7F988CB73F60 ? 2136 () 000000001 ? 000000001 ? 000000000 ? 000000002 ? dbkePostKGE_kgsf()+ call dbgePostErrorKGE() 00C0CC9E0 ? 7F988CABB520 ? 71 000000258 ? 000000001 ? 000000000 ? 000000002 ? kgeade()+351 call dbkePostKGE_kgsf() 00C0CC9E0 ? 7F988CABB520 ? 000000258 ? 000000001 ? 000000000 ? 000000002 ? kgeriv_int()+125 call kgeade() 00C0CC9E0 ? 00C0CCBA0 ? 7F988CABB520 ? 000000258 ? 000000000 ? 000000002 ? kgeriv()+17 call kgeriv_int() 00C0CC9E0 ? 00C0CCBA0 ? 7F988CABB520 ? 000000258 ? 000000000 ? 000000002 ? kgesiv()+115 call kgeriv() 00C0CC9E0 ? 00C0CCBA0 ? 7F988CABB520 ? 000000258 ? 000000000 ? 000000002 ? ksesic0()+193 call kgesiv() 00C0CC9E0 ? 7F988CABB520 ? 000003076 ? 000000000 ? 7FFF3AA03D30 ? 000000002 ? qesmaHandleBadRowID call ksesic0() 00C0CC9E0 ? 000000001 ? ()+982 000000001 ? 000000001 ? 7FFF3AA02D30 ? 000000000 ? qertbFetchByRowID() call qesmaHandleBadRowID 7F988C6642B8 ? 000000001 ? +9426 () 000000001 ? 000000001 ? 7FFF3AA02D30 ? 000000000 ? qergsFetch()+837 call qertbFetchByRowID() 7F988C6642B8 ? 7F988C64D560 ? 00957693A ? 000000001 ? 从call stack trace中的这几个
函数(qergsFetch->qertbFetchByRowID->qesmaHandleBadRowID)来看是在fetch时查到某个索引,并通过rowid回表时触发了Oracle的Bug。 STAT #1
40396469154440 id=25 cnt=1 pid=16 pos=2 obj=176780 op='TABLE ACCESS BY INDEX ROWID TBL_B1CASE (cr=1 pr=0 pw=0 time=10 us cost=1 size=12 card=1)' WAIT #1
40396469154440: nam='
sql*Net break/reset to client' ela= 3 driver id=1650815232 break?=1 p3=0 obj#=177516 tim=1462949120406296 WAIT #1
40396469154440: nam='
sql*Net break/reset to client' ela= 62 driver id=1650815232 break?=0 p3=0 obj#=177516 tim=1462949120406373 (END) 同时可以通过obj#得到有问题的索引PROD.XFXB1ITEMS_BID_P,尝试不走这个索引执行
sql,发现是可以正常出结果的。
sql> set lines 200 pages 999
sql> SELECT * 2 FROM (SELECT A.BID,10 C.CUSTID AS C_CUSTID,11 D.STORESID,(SELECT /*+full(AA) */ SUM(TRADEMONEY) AS TRADEMONEY 12 13 FROM TBL_B1ITEMS AA 14 WHERE AA.BID = A.BID) AS SUM_TRADEMONEY,15 (SELECT /*+full(AA) */ COUNT(IBIID) AS CNTIBIID 16 FROM TBL_B1ITEMS AA WHERE AA.BID = A.BID) AS CNTIBIID,17 18 (SELECT SUM(DEBIT) 19 FROM TBL_B1CHECKOUT A1,TBL_DATADICTIONARY A2 WHERE A1.DATADICTIONARYID = A2.DATADICTIONARYID 20 21 AND A2.DICTIONARYCODE = 1 22 AND A1.FORECODE IS NULL 23 AND A1.BID = A.BID) AS DEBIT 24 FROM TBL_B1 A INNER JOIN TBL_B1CASE B 25 26 ON A.B1TYPE = B.B1TYPE 27 AND A.B1SUBCASE = B.B1SUBCASE 28 LEFT JOIN TBL_CUSTOM C 29 ON A.CUSTID = C.CUSTID LEFT JOIN TBL_STORES D 30 31 ON A.OUTSTOREROOMID = D.STORESID 32 WHERE A.RDATE = TO_DATE('2016-05-11','yyyy-mm-dd') 33 AND A.B1SUBCASE IN 34 (11,91) AND B.AUTOREFER = 1 35 36 AND A.SERVICSTATUS = 0 37 AND A.BCOMPLETE = 1 38 AND NVL(A.BERRMEMID,0) <> 1 AND NVL(IINVAUTOREFERB1,0) <> 3 39 40 ORDER BY A.TIMEORDER,B.POSREFERORDER) 41 WHERE ROWNUM <= 10000; BID B1SUBCASE B1NO B1INGDATE ---------- ----------- ------------------------------------------------------------------------------------------------------------------------------------------------------ ------------------- DEALTOTAL CUSTID ---------- ---------- POSMEMID OUTSTOREROOMID ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ -------------- C_CUSTID STORESID SUM_TRADEMONEY CNTIBIID DEBIT ---------- ---------- -------------- ---------- ---------- 4624638 21 0401
50206457-160511000089 2016/05/11 15:58:57 32 630220 1708 1708 0 4624640 21 0401
50206457-160511000090 2016/05/11 16:04:35 18 1708 1708 0 18 问题肯定是出在索引PROD.XFXB1ITEMS_BID_P上,这是分区表上的一个普通的全局索引,做了坏块检查,并没发现存在坏块,对整个数据
文件都进行了检查,无坏块方面的问题: RMAN> backup validate datafile '+oradata/XZGdb/datafile/PROD.dbf'; 启动 backup 于 2016/05/11 15:03:33 使用目标
数据库控制
文件替代恢复目录 分配的通道: ORA_DISK_1 通道 ORA_DISK_1: SID=1166 实例=XZGdb1 设备类型=DISK 通道 ORA_DISK_1: 正在启动全部数据
文件备份集 通道 ORA_DISK_1: 正在指定备份集内的数据
文件 输入数据
文件:
文件号=00058
名称=+oradata/XZGdb/datafile/PROD.dbf 通道 ORA_DISK_1: 备份集已完成,经过时间:00:00:07 数据
文件列表 =================
文件状态
标记为损坏 空块 已检查的块 高 SCN ---- ------ -------------- ------------ --------------- ---------- 58 OK 0 134018 622080 170429924655
文件名: +oradata/XZGdb/datafile/PROD.dbf 块类型 失败的块 已处理的块 ---------- -------------- ---------------- 数据 0 178590 索引 0 297832 其他 0 11640 完成 backup 于 2016/05/11 15:03:42 RMAN> 猜测应该可以通过rebuild或者recreate这个索引
解决这个问题,于是在非业务高峰期对索引进行了rebuild: alter index PROD.XFXB1ITEMS_BID_P rebuild online parallel 8 nologging; alter index PROD.XFXB1ITEMS_BID_P parallel 1; 还是一样的报错,于是尝试recreate: drop index PROD.XFXB1ITEMS_BID_P; CREATE INDEX "PROD"."XFXB1ITEMS_BID_P" ON "PROD"."TBL_B1ITEMS" ("BID") PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS NOLOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "PROD" ; 再次执行的时候终于正常了。 查到官方的描述: ORA-00600: Internal Error Code,Arguments: [12406] (文档 ID 1567438.1) 转到
底部转到
底部 In this Document Symptoms Cause Solution APPLIES TO: Oracle Database - Enterprise Edition - Version 11.2.0.2 and later Information in this document applies to any platform. SYMPTOMS ORA-00600 [12406] is reported in 11.2.0.2 environment when running below
sql statement: SELECT ZELLE,:"SYS_B_0" SHAPE,OBJECTID,LTE800_LABEL_GELB_AKT.SHAPE.points,LTE800_LABEL_GELB_AKT.SHAPE.numpts,LTE800_LABEL_GELB_AKT.SHAPE.entity,LTE800_LABEL_GELB_AKT.SHAPE.minx,LTE800_LABEL_GELB_AKT.SHAPE.miny,LTE800_LABEL_GELB_AKT.SHAPE.maxx,LTE800_LABEL_GELB_AKT.SHAPE.maxy,LTE800_LABEL_GELB_AKT.rowid FROM GEODATA.LTE800_LABEL_GELB_AKT LTE800_LABEL_GELB_AKT WHERE SDE.ST_EnvIntersects(LTE800_LABEL_GELB_AKT.SHAPE,:1,:2,:3,:4) = :"SYS_B_1" ----- Call Stack Trace ----- qesmaHandleBadRowID qertbFetchByRowID opifch2 opiefn opiodr ttcpip opitsk opiino opiodr opidrv CAUSE Error is caused due to unpublished Bug 13571154 ORA 600 [12406] - QESMAHANDLEBADROWID [PART]<-- SIGNALING SOLUTION This issue will be fixed in version 12. It has been determined that the fix is too risky to be back ported to 11g. OR Drop and recreate indexes on the table used in the query.