最近遇到一个问题,有一套 Oracle 11GR2 数据库,使用数据泵 impdp 进行大数据量导入时遇到异常问题,worker 进程反复被终止并报错 ORA-00028: your session has been killed,经过深入分析,终于找到问题原因。
本文记录完整的分析排查过程,供后续参考。
问题现象
数据泵导入日志显示 worker 进程异常终止,具体错误如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
Starting "SYS"."SYS_IMPORT_FULL_01": sys/******** AS SYSDBA directory=data_pump_dir dumpfile=lucifer_20250828171754.dmp logfile=lucifer_imp_20250902.log table_exists_action=replace
ORA-39014: One or more workers have prematurely exited. ORA-39029: worker 1 with process name "DW00" prematurely terminated ORA-31671: Worker process DW00 had an unhandled exception. ORA-00028: your session has been killed ORA-06512: at "SYS.KUPD$DATA_INT", line 58 ORA-06512: at "SYS.KUPD$DATA", line 3434 ORA-06512: at "SYS.KUPW$WORKER", line 16370 ORA-06512: at "SYS.KUPW$WORKER", line 4549 ORA-06512: at "SYS.KUPW$WORKER", line 10464 ORA-06512: at "SYS.KUPW$WORKER", line 1824 ORA-06512: at line 2
Alert 日志显示 worker 进程被异常终止:
1 2
DW01 terminating with fatal err=28, pid=26, wid=2, job LUCIFER.SYS_IMPORT_FULL_02 opidrv aborting process DW01 ospid (58023) as a result of ORA-28
ORA-00028 错误通常表示会话被外部程序终止,这种情况在数据泵导入中较为罕见。在 MOS 中也未找到相关解决方案:
SQL> conn /as sysdba SQL>CREATEOR REPLACE TRIGGER sys.set_dp_trace AFTER LOGON ON DATABASE DECLARE v_program v$session.program%TYPE; BEGIN SELECT substr(program, -5, 2) INTO v_program FROM v$session WHERE sid = (SELECTDISTINCT sid FROM v$mystat);
IF v_program ='DW'or v_program='DM'THEN EXECUTE IMMEDIATE 'alter session set tracefile_identifier = '||'DPTRC'; EXECUTE IMMEDIATE 'alter session set statistics_level=ALL'; EXECUTE IMMEDIATE 'alter session set max_dump_file_size=UNLIMITED'; EXECUTE IMMEDIATE 'alter session set events ''10046 trace name context forever, level 12'''; END IF; END; /
Trace DM DW ORA Lines level trc trc trc in (hex) file file file trace Purpose ------- ---- ---- ---- ------ ----------------------------------------------- 10300 x x x SHDW: To trace the Shadow process (API) (expdp/impdp) 20300 x x x KUPV: To trace Fixed table 40300 x x x 'div' To trace Process services 80300 x KUPM: To trace Master Control Process (MCP) (DM) 100300 x x KUPF: To trace File Manager 200300 x x x KUPC: To trace Queue services 400300 x KUPW: To trace Worker process(es) (DW) 800300 x KUPD: To trace Data Package 1000300 x META: To trace Metadata Package ------- 'Bit AND' 1FF0300 x x x 'all' To trace all components (full tracing)
WAIT #140098774867664: nam='Streams AQ: enqueue blocked on low memory' ela= 3000069 p1=0 p2=0 p3=0 obj#=408355 tim=1756989768058379 KUPM:20:42:48.058: ORA-39029: worker 5 with process name "DW04" prematurely terminated KUPM:20:42:48.059: ORA-31671: Worker process DW04 had an unhandled exception. ORA-00028: your session has been killed
猜测是内存不足导致 DataPump worker 进程被杀。
系统资源检查
检查数据库内存配置:
1 2 3 4 5 6 7 8 9
SQL>SELECT name, value/1024/1024as MB FROM v$parameter WHERE name IN ('memory_target','memory_max_target','sga_target','pga_aggregate_target');
SQL>SELECT pool, name, bytes/1024/1024as MB FROM v$sgastat WHERE pool ISNOT NULLAND bytes >10485760 ORDERBY bytes DESC;
-- 结果显示: streams pool 64MB,free memory 63.98MB,几乎完全空闲 POOL NAME MB ------------ -------------------------- ---------- ...... streams pool free memory 63.9820099 ...... streams pool KGH: NO ACCESS 31.992218 ......
检查 Streams 池状态:
1 2 3 4 5 6 7 8 9
SQL>SELECT component, current_size/1024/1024as current_mb, min_size/1024/1024as min_mb, max_size/1024/1024as max_mb FROM v$memory_dynamic_components WHERE component LIKE'%stream%'OR component LIKE'%queue%';
SQL>SELECT text FROM dba_source WHERE name ='PROC_KILLSESSION' ORDERBY owner, line;
TEXT ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ PROCEDURE proc_killsession AS v_sid NUMBER :=0; v_serial NUMBER :=0; v_sql VARCHAR2(2000); v_record_time VARCHAR2(19) := to_char(sysdate,'yyyy-mm-dd hh24:mi:ss'); v_object_name VARCHAR2(30); CURSOR block_session_list IS SELECT s1.sid, s1.serial#, ao.object_name FROM v$session s1, v$session s2, all_objects ao WHERE s1.sid = s2.final_blocking_session AND s2.blocking_session ISNOT NULL AND s2.blocking_session_status ='VALID' AND s2.state ='WAITING' AND s2.row_wait_obj# = ao.object_id AND s2.last_call_et >60; BEGIN INSERT INTO block_event SELECT v_record_time record_time, to_char(s1.logon_time, 'yyyy-mm-dd hh24:mi:ss') logon_time, nvl(to_char(t.start_date, 'yyyy-mm-dd hh24:mi:ss'),'~') start_time, s1.username||'@'||s1.machine||'('||s1.client_info||') '||s1.program||' '||s1.action ||' (sid='||s1.sid||' serial#='||s1.serial#||' sql_id='||s1.sql_id||' | prev_sql_id='||s1.prev_sql_id||')' blocker, s2.username||'@'||s2.machine||'('||s2.client_info||') '||s2.program||' '||s2.action ||' (sid='||s2.sid||' serial#='||s2.serial#||' sql_id='||s2.sql_id||')' blocked, q1.sql_text blocking_text, q2.sql_text blocked_text, l1.lmode lock_mode, l2.request request_mode, l1.ctime CTIME1,l2.ctime CTIME2, s1.event blocking_event,s2.event blocked_event,ao.owner||'.'||ao.object_name object_name, DBMS_ROWID.rowid_create(1, ao.data_object_id, s2.row_wait_file#, s2.row_wait_block#, s2.row_wait_row#) BLOCKED_ROWID, s2.final_blocking_session final_blocking_session FROM v$lock l1, v$session s1, v$lock l2, v$session s2, v$locked_object lo, all_objects ao, v$transaction T, v$sql q1, v$sql q2 WHERE s1.sid=l1.sid AND s2.sid=l2.sid AND s2.last_call_et >60 AND l1.block=1AND l2.request>0 AND l1.id1=l2.id1 AND l1.id2=l2.id2 AND s1.sid=lo.session_id AND lo.object_id=ao.object_id AND hextoraw(s1.saddr) = hextoraw(t.ses_addr(+)) AND q2.sql_id = s2.sql_id -- AND ao.owner='LUCIFER' AND q1.sql_id IN (s1.sql_id,s1.prev_sql_id); COMMIT; OPEN block_session_list; FETCH block_session_list INTO v_sid,v_serial,v_object_name; LOOP EXIT WHEN block_session_list%notfound; v_sql :='insert into block_sql select :v_record_time,sql_id,hash_value,sql_fulltext,first_load_time,last_load_time,last_active_time from v$sql where upper(sql_text) like ''% '||v_object_name||' %'' AND command_type=6 AND last_active_time > sysdate-10/1440'; EXECUTE IMMEDIATE v_sql using v_record_time; v_sql :='alter system kill session '''||v_sid||','||v_serial||''''; --DBMS_OUTPUT.PUT_LINE(v_sql); EXECUTE IMMEDIATE v_sql; FETCH block_session_list INTO v_sid,v_serial,v_object_name; END LOOP; CLOSE block_session_list; COMMIT; EXCEPTION WHEN OTHERS THEN DBMS_OUTPUT.put_line('Error NO:'||SQLCODE); END proc_killsession;
impdp 'userid="/ as sysdba"' parfile=/home/oracle/LUCIFER_IMP.par
等待一段时间后,数据成功导入:
导入成功完成后重新启用 Job,并且删除 10046 触发器:
1 2 3 4 5 6
SQL>BEGIN DBMS_JOB.BROKEN(24, FALSE); END; /
SQL>DROPTRIGGER SYS.SET_DP_TRACE;
永久解决方案
优化 proc_killsession 存储过程,排除 DataPump 进程:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
-- 在游标定义中添加排除条件 CURSOR block_session_list IS SELECT s1.sid, s1.serial#, ao.object_name FROM v$session s1, v$session s2, all_objects ao WHERE s1.sid = s2.final_blocking_session AND s2.blocking_session ISNOT NULL AND s2.blocking_session_status ='VALID' AND s2.state ='WAITING' AND s2.row_wait_obj# = ao.object_id AND s2.last_call_et >60 -- 排除DataPump相关进程 AND s1.program NOTLIKE'%DW%' AND s1.program NOTLIKE'%MCP%' AND s1.module NOTLIKE'Data Pump%'; -- 其余逻辑保持不变