Saturday, September 27, 2014

Unclogged

Older, wiser programmers will have figured out that the bug in my previous post was due to an uncaught SIGPIPE.  I'd completely forgotten about those.  :)

The shifting nature of the bug (which is what really had me confused) was due to a race condition between the parent process writing to the pipe, and the child closing it (by exiting).  Here's an overly simplified version of what happens in both processes after the clone():
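/* parent - writer */
write(...)      /* raises SIGPIPE if the child has already exited */
close(...)
waitpid(...)    /* collects the child's exit status */

/* child - reader */
execve("/bin/false", ...)   /* never reads from the pipe */
exit(1)                     /* done by /bin/false itself; implicitly closes the pipe */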
If the parent attempts to write to the pipe after the child's exit() has closed the other end, the parent receives a SIGPIPE.  However, since our simple string is small enough to fit in the pipe's buffer, the parent's write() can succeed immediately, and the parent may very well get the chance to close the pipe before the child exits.  In that case, the child's exit simply discards any data left in the pipe's buffer; its exit status is then returned by the parent's waitpid(), stored in $?, and causes Perl's close() to return a false value.

The script example given in the previous post is simple enough (without autodie) for the parent to get there first.  Adding autodie then introduces just enough complexity for the parent to take a little bit more time, giving the child a chance to exit first.  (Even using strace is enough to influence the result, making this a true heisenbug.)

Note that in the parent-closes-first case, the failure has nothing to do with the pipe itself (which is why $! is left empty), but is simply due to the child returning a non-zero exit status.  (Thus, replacing false with true would make the script fail sometimes, but not always.  Now there's a real head-scratcher.)
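
The parent-closes-first case can be sketched the same way.  This is again only an illustration under assumptions: close_before_reader_exit() is a made-up name, and the second pipe (sync_fds) is not part of the original scenario; it exists only to hold the child back until the parent has closed the data pipe, so that the ordering is deterministic.  The write and the close both succeed, and the only sign of trouble is the exit status collected by waitpid():

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the child's exit status when write() and
 * close() both succeed, or -1 on any error. */
int close_before_reader_exit(void)
{
    int data[2], sync_fds[2];
    if (pipe(data) == -1 || pipe(sync_fds) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        /* child - reader: never reads the data pipe; it blocks until the
         * parent closes the sync pipe, then fails like /bin/false */
        char c;
        close(data[1]);
        close(sync_fds[1]);
        while (read(sync_fds[0], &c, 1) > 0) { }  /* 0 means EOF */
        _exit(1);
    }
    close(data[0]);
    close(sync_fds[0]);

    /* The child still holds the read end, so this small write just lands
     * in the pipe's buffer and no SIGPIPE is raised. */
    ssize_t n = write(data[1], "hello\n", 6);
    int rc = close(data[1]);
    close(sync_fds[1]);  /* release the child only after our close() */

    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    if (n != 6 || rc != 0 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);  /* the value that ends up in $? */
}
```

Here no pipe operation fails and errno is never involved; the non-zero result comes entirely from the child's exit status.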
