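I am reading Jesse Storimer's excellent book Working with Unix Processes. In the section on trapping signals from child processes that have exited, he gives a code example.
I modified that code slightly (see below) to get a clearer view of what is going on: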
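> the parent resumes its own execution between signals (I can see its puts output),
> a single trap block performs the wait for several children (sometimes I get "Received a CHLD signal" followed by multiple "child pid exited" lines).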
Expected output
    parent is working hard
    Received a CHLD signal
    child pid 73408 exited
    parent is working hard
    parent is working hard
    parent is working hard
    Received a CHLD signal
    child pid 73410 exited
    child pid 73409 exited
    All children exited - parent exiting too.
Occasional error
But every once in a while I get an error like this:
    trapping_signals.rb:17:in `write': deadlock; recursive locking (ThreadError)
        from trapping_signals.rb:17:in `puts'
        from trapping_signals.rb:17:in `puts'
        from trapping_signals.rb:17:in `block in <main>'
        from trapping_signals.rb:17:in `call'
        from trapping_signals.rb:17:in `write'
        from trapping_signals.rb:17:in `puts'
        from trapping_signals.rb:17:in `puts'
        from trapping_signals.rb:17:in `block in <main>'
        from trapping_signals.rb:40:in `call'
        from trapping_signals.rb:40:in `sleep'
        from trapping_signals.rb:40:in `block in <main>'
        from trapping_signals.rb:38:in `loop'
        from trapping_signals.rb:38:in `<main>'
Can anyone explain to me what is going wrong here?
    child_processes = 3
    dead_processes = 0
    # We fork 3 child processes.
    child_processes.times do
      fork do
        # Each sleeps between 0 and 5 seconds
        sleep rand(5)
      end
    end

    # Our parent process will be busy doing some work.
    # But still wants to know when one of its children exits.

    # By trapping the :CHLD signal our process will be notified by the kernel
    # when one of its children exits.
    trap(:CHLD) do
      puts "Received a CHLD signal"

      # Since Process.wait queues up any data that it has for us we can ask for it
      # here, since we know that one of our child processes has exited.

      # We loop over a non-blocking Process.wait to ensure that any dead child
      # processes are accounted for.
      # Here we wait without blocking.
      while pid = Process.wait(-1, Process::WNOHANG)
        puts "child pid #{pid} exited"
        dead_processes += 1

        # We exit ourselves once all the child processes are accounted for.
        if dead_processes == child_processes
          puts "All children exited - parent exiting too."
          exit
        end
      end
    end

    # Work it.
    loop do
      puts "parent is working hard"
      sleep 1
    end
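Solution
I looked through the Ruby sources to see where that particular error is raised. It is only ever raised when the current thread tries to acquire a lock that it already holds. This implies that the lock is not re-entrant: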
    m = Mutex.new
    m.lock
    m.lock #=> same error as yours
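To see that failure without crashing the script, the second lock attempt can be wrapped in a rescue. This is only a minimal sketch; the variable names `m` and `message` are illustrative and not part of the original program.

```ruby
# A plain Mutex is not re-entrant: locking it twice from the same
# thread raises ThreadError instead of blocking forever.
m = Mutex.new
m.lock

message = nil
begin
  m.lock # second acquisition by the thread that already holds the lock
rescue ThreadError => e
  message = e.message
end

puts "#{ThreadError}: #{message}"
```

The message printed here should match the first line of the stack trace in the question.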
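Now at least we know what happens, but not yet why and where. The error message indicates that it happens during a call to puts. When puts is called, it eventually ends up in io_binwrite. stdout is not synchronized, but it is buffered, so the if condition there is satisfied on the first call, and a buffer plus a write lock for that buffer are set up. The write lock is important to guarantee the atomicity of writes to stdout: two threads writing to stdout at the same time must not mix up each other's output. To demonstrate what I mean: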
    t1 = Thread.new { 100.times { print "aaaaa" } }
    t2 = Thread.new { 100.times { print "bbbbb" } }
    t1.join
    t2.join
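Although both threads take turns writing to stdout, a single write is never broken up: you will always see the full five a's or b's in sequence. That is what the write lock is there for.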
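What goes wrong now is a race condition on the write lock. The parent process loops and writes to stdout every second ("parent is working hard"). But the same thread also ends up executing the trap block and tries to write to stdout again ("Received a CHLD signal"). You can verify that it really is the same thread by adding #{Thread.current} to the puts statements. If the two events happen close enough together, you end up in the same situation as in the first example: the same thread tries to acquire the same lock twice, which ultimately triggers the error.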
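One way to avoid the race is to do no IO inside the trap block at all. The sketch below is an assumption about how the program could be restructured, not the book's code: the handler only sets a flag (the name `chld_pending` is made up for illustration), and the main loop does the reaping and all the printing. The sleep durations are shortened so the example finishes quickly.

```ruby
child_processes = 3
dead_processes  = 0
chld_pending    = false # hypothetical flag, set by the signal handler

# Fork the children; each sleeps for a random fraction of a second.
child_processes.times do
  fork { sleep rand }
end

# The handler does no IO and takes no locks: it only records that
# at least one CHLD signal arrived.
trap(:CHLD) { chld_pending = true }

until dead_processes == child_processes
  puts "parent is working hard"

  if chld_pending
    chld_pending = false
    begin
      # Reap every child that has exited so far, without blocking.
      while pid = Process.wait(-1, Process::WNOHANG)
        puts "child pid #{pid} exited"
        dead_processes += 1
      end
    rescue Errno::ECHILD
      # No children left to wait for.
    end
  end

  sleep 0.2
end

puts "All children exited - parent exiting too."
```

Because every puts now runs in the normal flow of the main loop, the handler can never interrupt a write that is already holding the stdout write lock. The trade-off is that an exit is only noticed on the next pass through the loop.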