Zombie process concepts
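Informally, a zombie process is a process that has already terminated but still retains some information while it waits for its parent to collect it. More formally: when a process ends and its parent has not waited for it (by calling wait / waitpid), it becomes a zombie process. In ps output such a process carries the defunct marker. A zombie is a process that died some time ago yet still occupies a slot in the process table.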
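The life cycle described above can be reproduced in a few lines. This is a minimal sketch, assuming Linux and Python 3: it reads `/proc/<pid>/stat` to observe the state, and the helper names `proc_state` and `make_and_reap_zombie` are made up for illustration. The `WNOWAIT` flag lets the parent wait for the child to exit without collecting it, so the zombie can be observed before `waitpid` removes it.

```python
import os


def proc_state(pid):
    """Return the one-letter state of `pid` from /proc, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            # Format: "pid (comm) state ..."; comm may contain spaces or ')'.
            return f.read().rsplit(")", 1)[1].split()[0]
    except (OSError, IndexError):
        return None


def make_and_reap_zombie(exit_code=7):
    """Fork a child, observe it as a zombie, then reap it with waitpid()."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)          # child: terminate immediately

    # Block until the child has exited, but leave it unreaped (WNOWAIT),
    # so it stays in the process table as a zombie.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state_before = proc_state(pid)   # expected to be "Z" on Linux

    # Reading the exit status with waitpid() is what removes the zombie.
    _, status = os.waitpid(pid, 0)
    state_after = proc_state(pid)    # the /proc entry is gone, so None
    return state_before, os.WEXITSTATUS(status), state_after


if __name__ == "__main__":
    print(make_and_reap_zombie())
```

The exit code survives in the zombie's process-table entry until the parent reads it, which is the only reason the entry is kept at all.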
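If the parent has already exited first, however, the process does not remain a zombie. When a process exits, the kernel hands any children it still has over to the init process, which becomes their new parent, so every process always has a parent. init automatically waits on its children, so a process adopted by init is reaped as soon as it terminates and does not linger as a zombie.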
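The adoption step can be observed directly. The sketch below, assuming a Unix-like system and Python 3, creates a grandchild, lets the intermediate parent exit first, and has the grandchild report its parent PID through a pipe. The function name `adoptive_parent` is hypothetical. On a classic setup the new parent is PID 1; on newer Linux systems it may instead be a process registered as a "child subreaper", so the code only reports what it sees.

```python
import os
import time


def adoptive_parent(timeout=5.0):
    """Orphan a grandchild and return (original_ppid, new_ppid) as it sees them."""
    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        os.close(read_fd)
        me = os.getpid()
        if os.fork() == 0:
            # Grandchild: poll until the kernel has re-parented us.
            deadline = time.monotonic() + timeout
            while os.getppid() == me and time.monotonic() < deadline:
                time.sleep(0.01)
            os.write(write_fd, f"{me} {os.getppid()}".encode())
            os._exit(0)
        os._exit(0)                  # intermediate parent exits first

    os.close(write_fd)
    os.waitpid(child, 0)             # reap the intermediate parent
    data = os.read(read_fd, 64)      # blocks until the grandchild reports
    os.close(read_fd)
    original, new = map(int, data.split())
    return original, new


if __name__ == "__main__":
    print(adoptive_parent())
```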
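Besides ZOMBIE, the other process states include RUNNING (running or waiting to run), UNINTERRUPTIBLE (uninterruptible blocked state), INTERRUPTIBLE (interruptible blocked state), STOPPED (suspended), and others.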
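These states show up as the first letter of the STAT column in ps output. As a small illustration, assuming the procps `ps` found on Linux, the lookup below maps that letter to a readable name; the names `PS_STATES` and `describe_stat` are made up for this sketch. Any further characters are modifiers (for example, the `s` in `Zs` marks a session leader) and are ignored here.

```python
# First letter of the STAT column printed by procps `ps` on Linux.
PS_STATES = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
    "T": "stopped",
    "Z": "zombie (defunct)",
}


def describe_stat(stat):
    """Translate a STAT value such as 'Zs' into a readable state name."""
    return PS_STATES.get(stat[:1], "unknown")


if __name__ == "__main__":
    print(describe_stat("Zs"))
```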
The Wikipedia description of zombie processes:
On Unix and Unix-like computer operating systems, a zombie process or defunct process is a process that has completed execution (via the exit system call) but still has an entry in the process table: it is a process in the "Terminated state". This occurs for child processes, where the entry is still needed to allow the parent process to read its child's exit status: once the exit status is read via the wait system call, the zombie's entry is removed from the process table and it is said to be "reaped". A child process always first becomes a zombie before being removed from the resource table. In most cases, under normal system operation zombies are immediately waited on by their parent and then reaped by the system – processes that stay zombies for a long time are generally an error and cause a resource leak.
The term zombie process derives from the common definition of zombie — an undead person. In the term's metaphor, the child process has "died" but has not yet been "reaped". Also, unlike normal processes, the kill command has no effect on a zombie process.
Zombie processes should not be confused with orphan processes: an orphan process is a process that is still executing, but whose parent has died. These do not remain as zombie processes; instead, (like all orphaned processes) they are adopted by init (process ID 1), which waits on its children. The result is that a process that is both a zombie and an orphan will be reaped automatically.
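Viewing zombie processes
There are many ways to see which zombie processes exist on a system, for example the top command and the ps command.
Combining ps with grep also works, and it can be written in many forms, as shown below.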
[root@mylnx01 ~]# ps aux | grep Zs | grep -v grep
oracle 2002 0.0 0.0 0 0 ? Zs 02:44 0:00 [sh] <defunct>
oracle 2013 0.0 0.0 0 0 ? Zs 02:46 0:00 [sh] <defunct>
[root@mylnx01 ~]#
[root@mylnx01 ~]# ps -ef | grep defunct
oracle 2002 4788 0 02:44 ? 00:00:00 [sh] <defunct>
oracle 2013 4788 0 02:46 ? 00:00:00 [sh] <defunct>
[root@mylnx01 ~]#
[root@mylnx01 ~]# ps -A -ostat,ppid,pid,cmd | grep -e '^[Zz]'
Zs 4788 2002 [sh] <defunct>
Zs 4788 2013 [sh] <defunct>
[root@mylnx01 ~]#
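When there are many zombies it helps to group them by parent, since the parent is the process that has to be fixed. The following is a rough sketch, assuming Python 3.7+ and a procps-style `ps`. It parses the same `stat,ppid,pid,cmd` columns used in the last command above; the function names `zombies_by_parent` and `current_zombies` are hypothetical.

```python
import subprocess
from collections import defaultdict


def zombies_by_parent(ps_output):
    """Group zombie PIDs by parent PID from `ps -A -o stat,ppid,pid,cmd` output."""
    groups = defaultdict(list)
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 3 or not fields[0].upper().startswith("Z"):
            continue                 # header, blank line, or non-zombie
        try:
            ppid, pid = int(fields[1]), int(fields[2])
        except ValueError:
            continue                 # malformed line, skip it
        groups[ppid].append(pid)
    return dict(groups)


def current_zombies():
    """Run ps on the local machine (assumes a procps-style ps is installed)."""
    out = subprocess.run(
        ["ps", "-A", "-o", "stat,ppid,pid,cmd"],
        capture_output=True, text=True, check=True,
    ).stdout
    return zombies_by_parent(out)
```

Fed the two `Zs` lines from the transcript above, `zombies_by_parent` returns `{4788: [2002, 2013]}`, which points at process 4788 as the one to investigate.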
Command to count the zombie processes:
[root@mylnx01 ~]# ps -ef | grep defunct | grep -v grep | wc -l
2
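Killing zombie processes
Getting rid of zombie processes can be a headache: sometimes they are hard to kill, and sometimes they must not be killed carelessly.
There are generally two ways to get rid of a zombie process:
1: Find the parent of the defunct process and kill that parent; the defunct process then disappears automatically.
2: Reboot the server.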
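Both methods act on something other than the zombie itself, because a zombie has already exited and a signal sent to it has nothing left to stop. The sketch below, assuming Linux and Python 3, illustrates this: SIGKILL is accepted without error, yet the zombie is still there afterwards, and its recorded exit status is unchanged. The function name `kill_zombie_demo` is made up for illustration.

```python
import os
import signal


def kill_zombie_demo(exit_code=3):
    """Send SIGKILL to a zombie and show that only waitpid() removes it."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)          # child: exit normally

    # Wait for the child to exit without reaping it: it is now a zombie.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)

    os.kill(pid, signal.SIGKILL)     # accepted, but nothing is left to kill

    # If the zombie were gone, waitid() would raise ChildProcessError.
    try:
        os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
        still_there = True
    except ChildProcessError:
        still_there = False

    _, status = os.waitpid(pid, 0)   # the parent reaps; the entry disappears
    return still_there, os.WIFEXITED(status), os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(kill_zombie_demo())
```

Killing the parent works for a different reason: the zombie is handed to init, and init performs the wait that the original parent never did.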
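Find the zombie processes and generate the kill commands (this only prints the commands, each listing the zombie's PID and its parent's PID, so they can be reviewed before anything is run):
ps -ef | grep defunct | grep -v grep | awk '{print "kill -9 " $2, $3}'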
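In general, killing these processes rashly is not recommended. It is better to investigate the cause first and then decide case by case, as shown below.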
[root@mylnx01 ~]# ps -ef | grep defunct
oracle 2002 4788 0 02:44 ? 00:00:00 [sh] <defunct>
oracle 2013 4788 0 02:46 ? 00:00:00 [sh] <defunct>
root 12348 10441 0 12:18 pts/11 00:00:00 grep defunct
[root@mylnx01 ~]# cat /proc/2002/stack
[<ffffffff8105b9f5>] do_exit+0x67d/0x696
[<ffffffff8105baae>] sys_exit_group+0x0/0x1b
[<ffffffff8105bac5>] sys_exit_group+0x17/0x1b
[<ffffffff81011db2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
[root@mylnx01 ~]# cat /proc/2013/stack
[<ffffffff8105b9f5>] do_exit+0x67d/0x696
[<ffffffff8105baae>] sys_exit_group+0x0/0x1b
[<ffffffff8105bac5>] sys_exit_group+0x17/0x1b
[<ffffffff81011db2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
[root@mylnx01 ~]# cat /proc/4788/stack
[<ffffffff811de86e>] sys_semtimedop+0x68b/0x7e7
[<ffffffff81011db2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
[root@mylnx01 ~]#
[root@mylnx01 ~]# lsof -p 4788
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
oracle 4788 oracle cwd DIR 253,6 4096 7880901 /u01/app/oracle/product/10.2.0/db_1/dbs
oracle 4788 oracle rtd DIR 253,0 4096 2 /
oracle 4788 oracle txt REG 253,6 104559054 7884256 /u01/app/oracle/product/10.2.0/db_1/bin/oracle
oracle 4788 oracle DEL REG 0,4 3211268 /SYSVdf6790e8
oracle 4788 oracle mem REG 253,0 143600 8421721 /lib64/ld-2.5.so
oracle 4788 oracle mem REG 253,0 1722304 8421722 /lib64/libc-2.5.so
oracle 4788 oracle mem REG 253,0 615136 8421739 /lib64/libm-2.5.so
oracle 4788 oracle mem REG 253,0 23360 8421607 /lib64/libdl-2.5.so
oracle 4788 oracle mem REG 253,0 145824 8421724 /lib64/libpthread-2.5.so
oracle 4788 oracle mem REG 253,0 114352 8421738 /lib64/libnsl-2.5.so
oracle 4788 oracle mem REG 253,0 53880 8421403 /lib64/libnss_files-2.5.so
oracle 4788 oracle mem CHR 1,5 4603 /dev/zero
oracle 4788 oracle mem REG 253,0 3768 10426606 /usr/lib64/libaio.so.1.0.1
oracle 4788 oracle mem REG 253,6 1552 7893073 /u01/app/oracle/product/10.2.0/db_1/dbs/hc_epps.dat
oracle 4788 oracle mem REG 253,6 3796601 7888182 /u01/app/oracle/product/10.2.0/db_1/lib/libnnz10.so
oracle 4788 oracle mem REG 253,6 123345 7885115 /u01/app/oracle/product/10.2.0/db_1/lib/libdbcfg10.so
oracle 4788 oracle mem REG 253,6 64041 7887888 /u01/app/oracle/product/10.2.0/db_1/lib/libclsra10.so
oracle 4788 oracle mem REG 253,6 11385162 7883147 /u01/app/oracle/product/10.2.0/db_1/lib/libjox10.so
oracle 4788 oracle mem REG 253,6 516097 7887854 /u01/app/oracle/product/10.2.0/db_1/lib/libocrutl10.so
oracle 4788 oracle mem REG 253,6 691049 7887853 /u01/app/oracle/product/10.2.0/db_1/lib/libocrb10.so
oracle 4788 oracle mem REG 253,6 681761 7887852 /u01/app/oracle/product/10.2.0/db_1/lib/libocr10.so
oracle 4788 oracle mem REG 253,6 8545 7885226 /u01/app/oracle/product/10.2.0/db_1/lib/libskgxn2.so
oracle 4788 oracle mem REG 253,6 1772385 7887887 /u01/app/oracle/product/10.2.0/db_1/lib/libhasgen10.so
oracle 4788 oracle mem REG 253,6 177809 7884216 /u01/app/oracle/product/10.2.0/db_1/lib/libskgxp10.so
oracle 4788 oracle 0r CHR 1,3 4601 /dev/null
oracle 4788 oracle 1r CHR 1,3 4601 /dev/null
oracle 4788 oracle 2w REG 253,6 1447 7995467 /u01/app/oracle/admin/epps/bdump/epps_psp0_4788.trc
oracle 4788 oracle 3r CHR 1,3 4601 /dev/null
oracle 4788 oracle 4r CHR 1,3 4601 /dev/null
oracle 4788 oracle 5w REG 253,6 663 1638412 /u01/app/oracle/admin/epps/udump/epps_ora_4784.trc (deleted)
oracle 4788 oracle 6w REG 253,6 30440 7995465 /u01/app/oracle/admin/epps/bdump/alert_epps.log.20150904 (deleted)
oracle 4788 oracle 7u REG 253,6 0 6930433 /u01/app/oracle/product/10.2.0/db_1/dbs/lkinstepps (deleted)
oracle 4788 oracle 8w REG 253,6 30440 7995465 /u01/app/oracle/admin/epps/bdump/alert_epps.log.20150904 (deleted)
oracle 4788 oracle 9u REG 253,6 1552 7893073 /u01/app/oracle/product/10.2.0/db_1/dbs/hc_epps.dat
oracle 4788 oracle 10r CHR 1,5 4603 /dev/zero
oracle 4788 oracle 11r REG 253,6 849408 7887921 /u01/app/oracle/product/10.2.0/db_1/rdbms/mesg/oraus.msb
oracle 4788 oracle 12r CHR 1,5 4603 /dev/zero
oracle 4788 oracle 13u REG 253,6 1552 7893073 /u01/app/oracle/product/10.2.0/db_1/dbs/hc_epps.dat
oracle 4788 oracle 14uR REG 253,6 24 7893074 /u01/app/oracle/product/10.2.0/db_1/dbs/lkEPPS
oracle 4788 oracle 15r REG 253,6 849408 7887921 /u01/app/oracle/product/10.2.0/db_1/rdbms/mesg/oraus.msb
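Looking at the parent of the zombie processes shows that it is Oracle's PSP0 process (the trace file epps_psp0_4788.trc in the lsof output belongs to it). I was not sure whether this process could safely be killed, so rebooting the server was the safer choice.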
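The same kind of check on a parent can be scripted. As a rough sketch, assuming Linux, the helper below reads the name, state and parent PID of a process from `/proc/<pid>/status`. The function names `parse_proc_status` and `describe_process` are hypothetical, and the result only helps identify the process; it does not say whether killing it is safe.

```python
def parse_proc_status(text, keys=("Name", "State", "PPid")):
    """Extract selected 'Key:<tab>value' fields from /proc/<pid>/status text."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in keys:
            info[key] = value.strip()
    return info


def describe_process(pid):
    """Return name, state and parent PID of `pid`, or None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return parse_proc_status(f.read())
    except OSError:
        return None
```

Calling `describe_process` on the PPID column of a defunct entry, and then on the parent's own parent, shows which service the zombies ultimately belong to.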
References:
https://en.wikipedia.org/wiki/Zombie_process
http://linux.alai.net/viewblog.php?id=48189