gdb 6.3, debugging a program that vforked: no stdin/out of gdb

Post by mai » Sat, 20 Dec 2008 19:42:26


Hi,

in a trace application I'm developing at the moment, I occasionally have
to dereference user pointers (that is, pointers in user code, which may
be invalid or uninitialized). To do this without crashing my app when a
pointer is invalid, I spawn a new process (the "victim") that may crash,
and communicate with it through shared memory. To supervise its
execution, get notified when it crashes, and respawn it, a "watchdog"
thread is started. That thread vforks the child (it must be vfork, not
fork, so that the child works in the same address space). The parent
(main) thread meanwhile waits for the result (either the value read
through the pointer, or the report that the victim crashed).
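The probe-and-supervise idea above can be sketched in C roughly like this. It is not the tracer's actual code: it is a minimal sketch that swaps in a fresh fork() plus a pipe for every probe instead of one long-lived vforked victim talking through shared memory, because POSIX leaves a vfork() child's behaviour undefined if it does anything beyond storing vfork()'s return value and calling _exit() or an exec function. A forked child starts with a copy of the parent's mappings, so a pointer that is valid in the parent is readable in the child, while a bad pointer only kills the child, and the parent learns about that from waitpid(). The helper name probe_read_byte() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not the original tracer's API): try to read one byte
 * at addr in a short-lived child process.  Returns 0 and stores the byte in
 * *out if the read succeeded; returns -1 if the child was killed by a signal
 * (e.g. SIGSEGV for an unmapped address) or if a system call failed. */
static int probe_read_byte(const void *addr, unsigned char *out)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child ("victim"): has a copy of the parent's mappings, so a pointer
         * valid in the parent is readable here.  Only async-signal-safe calls
         * are used after fork(). */
        close(fds[0]);
        unsigned char b = *(const volatile unsigned char *)addr; /* may crash */
        ssize_t n = write(fds[1], &b, 1);
        _exit(n == 1 ? 0 : 1);
    }

    /* Parent ("watchdog"): close our write end so read() returns EOF if the
     * child dies before writing anything. */
    close(fds[1]);
    unsigned char b = 0;
    ssize_t n;
    do {
        n = read(fds[0], &b, 1);
    } while (n == -1 && errno == EINTR);
    close(fds[0]);

    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }

    /* Killed by a signal (SIGSEGV, SIGBUS, ...); WTERMSIG(status) says which. */
    if (WIFSIGNALED(status))
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0 || n != 1)
        return -1;

    *out = b;
    return 0;
}
```

Probing the address of a live object returns 0 with the byte filled in; probing an address that is unmapped on typical Linux systems, such as (const void *)8, returns -1 and leaves the output untouched. A fork() per probe is much more expensive than the persistent victim described above, and the child only sees memory as it was at the moment of the fork.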

Now, I have problems debugging this system with gdb (GNU gdb Red Hat
Linux (6.3.0.0-1.159.el4rh)). The system basically works already, but
as it is new code, it's not error-free, and debugging it with gdb
doesn't work as I expected. I learned that when a process vforks, the
parent process is suspended until the child process terminates or
exec's something. For this reason, the parent process in my case (the
"watchdog") does not resume until the child crashes. It then calls
waitpid and immediately gets the signal that killed the child. When I
attach gdb, though, the whole system hangs, and I haven't yet found out
exactly where. The strangest thing is that none of my stdin input
reaches gdb at this point, and none of gdb's stdout is displayed, so I
have no way of breaking in and inspecting code at all. The input I type
only reaches the shell I launched gdb from *after* gdb is killed (I
have to kill it from the outside to abort everything).

The processes then look like this (test is just a small test app that I
dynamically link the tracer against; gdb is attached to test; the
parent process 16834 is in state Tl (stopped, multi-threaded), and the
child 16943 is running but waiting for jobs):

muenchen  6044  0.0  0.0  5040  1808 pts/19 Ss Dec12 0:01 \_ /bin/bash
muenchen 16832  0.0  0.0  5224  1100 pts/19 S  10:12 0:00     \_ /bin/sh ./localrun.sh
muenchen 16834  0.0  0.0 15396  1504 pts/19 Tl 10:12 0:00     |   \_ test
muenchen 16943  0.0  0.0 15396  1504 pts/19 S  10:12 0:00     |       \_ test
muenchen 16841  0.6  0.6 17528 12664 pts/19 S  10:12 0:00     \_ gdb test

When I set gdb to "follow-fork-mode child" before attaching, the app
with my library runs normally (quite surprisingly!)...

muenchen  6044  0.0  0.0  5040  1808 pts/19 Ss Dec12 0:01 \_ /bin/bash
muenchen 21650  0.0  0.0  4424  1096 pts/19 S  10:31 0:00     \_ /bin/sh ./localrun.sh
muenchen 21652 11.5  2.5 92472 52832 pts/19 Sl 10:31 0:05     |   \_ test
muenchen 21753  0.1  2.5 92472 52832 pts/19 S  10:32 0:00     |       \_ test
muenchen 21664  0.6  0.6 17860 12664 pts/19 S  10:31 0:00     \_ gdb test

...until the victim (child process) crashes for the first time. At this
point, I'd expect gdb to stop and let me inspect the frame and so on.
Instead, everything hangs again, and gdb accepts no input:

muenchen  6044  0.0  0.0  5040  1808 pts/19 Ss Dec12 0:01 \_ /bin/bash
muenchen 21650  0.0  0.0  4424  1096 pts/19 S  10:31 0:00     \_ /bin/sh ./localrun.sh
muenchen 21652  4.4  2.5 92604 53000 pts/19 Sl 10:31 0:05     |   \_ test
muenchen 21753  0.1  2.5 92604 53000 pts/19 T  10:32 0:00     |       \_ test
muenchen 21664  0.2  0.6 17860 12664 pts/19 S  10:31 0:00     \_ gdb test

So, now the child is in state T (that's

Post by Nate Eldre » Sun, 21 Dec 2008 03:15:27

"XXXX@XXXXX.COM" <XXXX@XXXXX.COM> writes:


This is really, really not what vfork is for. The suspension of the
parent process is specifically intended to keep you from using it to
make threads. You should be using the pthreads interface if you need
multiple threads that share the same memory.

That design seems kind of bogus as well, though. I don't think you
really want the "watchdog" sharing its memory with the "victim"; if the
victim scribbles over its memory the watchdog would crash also. A
better design, and the way that debuggers and trace programs are usually
done, is to have the watchdog be an entirely separate process which
starts the victim using fork(). If the watchdog needs to peek at the
victim's memory, it can do so using the functionality available through
the ptrace() system call. Alternatively, you can read larger chunks
of the victim's memory from /proc/<pid>/mem (e.g. with pread() at the
target address).

This way the watchdog can watch the victim, but the victim can't
interfere with the watchdog.
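A minimal sketch of that separate-watchdog approach, assuming Linux and a kernel/security setup that permits a parent to trace its own child (some containers and Yama settings restrict ptrace). The watchdog forks the victim, which calls PTRACE_TRACEME and stops itself; the watchdog then reads a variable from the victim's memory both with PTRACE_PEEKDATA (one word per call) and with pread() on /proc/<pid>/mem. It reads that file rather than mapping it, since, as far as I know, current Linux kernels do not implement mmap() on /proc/<pid>/mem. The names shared_value, read_from_stopped() and read_victim_value() are illustrative only; a real watchdog would get the target address from the traced application.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative variable: the victim overwrites its own copy after fork(), so
 * reading 42 shows the watchdog is looking at the victim's memory.
 * volatile keeps the compiler from dropping the store in the child. */
static volatile long shared_value = 1;

/* Read shared_value out of a ptrace-stopped victim in two ways. */
static int read_from_stopped(pid_t pid, long *via_peek, long *via_mem)
{
    /* PTRACE_PEEKDATA returns one word; -1 can be valid data, so check errno. */
    errno = 0;
    long word = ptrace(PTRACE_PEEKDATA, pid, (void *)&shared_value, (void *)0);
    if (errno != 0)
        return -1;
    *via_peek = word;

    /* Larger reads: pread() on /proc/<pid>/mem with the virtual address
     * used as the file offset. */
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/mem", (int)pid);
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    ssize_t n = pread(fd, via_mem, sizeof *via_mem,
                      (off_t)(uintptr_t)&shared_value);
    close(fd);
    return n == (ssize_t)sizeof *via_mem ? 0 : -1;
}

/* Watchdog side: fork a traced victim, read from it, then kill it. */
static int read_victim_value(long *via_peek, long *via_mem)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Victim: let the parent trace us, change the value, then stop. */
        if (ptrace(PTRACE_TRACEME, 0, (void *)0, (void *)0) == -1)
            _exit(1);
        shared_value = 42;
        raise(SIGSTOP);
        _exit(0);
    }

    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    if (!WIFSTOPPED(status))
        return -1; /* victim exited (and was reaped) without stopping */

    int rc = read_from_stopped(pid, via_peek, via_mem);

    /* SIGKILL works on a ptrace-stopped tracee; then reap it. */
    kill(pid, SIGKILL);
    while (waitpid(pid, &status, 0) == -1 && errno == EINTR)
        ;
    return rc;
}
```

After a successful call both outputs hold 42 while the watchdog's own shared_value is still 1, and nothing the victim does can write into the watchdog's memory.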

I'm not sure about your I/O problems, however.

Post by mai » Tue, 30 Dec 2008 17:03:40

Hi Nate, thanks for answering.

I'm aware that the suspension of the parent is there to avoid
concurrent writes etc., which would lead to undefined behaviour and
crashes. In my case, I *think* it's okay, because the 'victim' only
reads memory and never writes it. But I may still be mistaken there.
Simply using a pthread to access the user app's address space didn't
work, as a crash in the child thread seemingly also crashes the parent
thread, or leads to undefined behaviour in it. So I changed that to a
forked child. Also, the victim is not the user process, just a tiny
function that tries to read the memory at the given address (just 2 or
3 lines of code).

I'm also aware that the whole design seems kind of bogus, as you call
it, or kind of "thought backwards"... The reason is that the trace
library existed before without the victim/watchdog scheme, and the
dereferencing of pointers has only just been added to it.
Additionally, it is designed so that it can be loaded dynamically to
trace the app at any time, and the way the user app is launched
shouldn't be changed (and actually couldn't be changed at all), so I'm
not able to start a wrapper instead of the user app that then launches
the app via fork or something like that.

My main concern at the moment is the I/O problem in gdb, as the whole
system seems to work fine as long as no debugger is attached. I've been
looking into this for the last few weeks (including Xmas ;) ) and
haven't found a solution yet, or even anyone else with a similar
problem with gdb...