(Boehm) gc question

Post by Marco van » Thu, 26 Mar 2009 21:16:59

Hi, I recently talked with somebody who wanted to use the Boehm GC library
with a compiler I also use. He was having problems with stability. My
knowledge of GC internals (especially outside of Java/.NET) is a bit basic,
so maybe it is no problem at all:

I brainstormed a bit, and remembered how these GCs find pointers: by
scanning the bss/stack for values in the range of allocated memory segments.

Now this language has a dual layer allocation scheme, a basic malloc/free
scheme (not unlike C, except that the size of the block is always stored in
the heap admin), but also, for automated types a scheme where a slightly
larger block is allocated for some extra data about the automated type:

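An allocation looks like this:

void *allocateauto(size_t size)
{
    /* allocate room for the hidden header plus the user data */
    autostruct *pauto = malloc(sizeof(autostruct) + size);
    if (pauto == NULL)
        return NULL;
    pauto->fields = data;   /* admin data for the automated type */
    /* return a pointer just past the header; the block start is not kept */
    return (char *)pauto + sizeof(autostruct);
}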

An automatedfree simply subtracts sizeof(autostruct) from the pointer and
then frees the result. The system's typing makes autotype distinct, so no
pointer to the start of the allocation is kept anywhere.
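
To make the round trip concrete, here is a minimal sketch in C on top of plain malloc/free. The header fields (refcount, length) and the helper autoheader are hypothetical; the real runtime's header layout is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hidden header; the real runtime's fields are not given. */
typedef struct autostruct {
    size_t refcount;
    size_t length;
} autostruct;

/* Allocate header + payload and hand out a pointer to the payload only.
 * Note: the payload is aligned only as well as sizeof(autostruct) allows. */
void *allocateauto(size_t size)
{
    autostruct *hdr = malloc(sizeof(autostruct) + size);
    if (hdr == NULL)
        return NULL;
    hdr->refcount = 1;
    hdr->length = size;
    return hdr + 1;             /* first byte after the header */
}

/* Recover the header (the true start of the block) from a payload pointer. */
autostruct *autoheader(void *p)
{
    assert(p != NULL);
    return (autostruct *)p - 1;
}

/* Free the whole block by stepping back to its real start. */
void automatedfree(void *p)
{
    if (p != NULL)
        free(autoheader(p));
}
```

After allocateauto returns, the only surviving reference points sizeof(autostruct) bytes into the block, which is exactly the situation a conservative collector has to recognise.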

Is this one of the cases that could potentially break a standard Boehm GC?
The GC FAQ mentions the GC being safe with basic pointer arithmetic, but this
is non-local, and no reference to the original block is kept.

Note that this is purely about system stability, not about whether the GC
correctly frees such allocations. If it doesn't, that is also fine.

Post by cr8819 » Thu, 26 Mar 2009 23:27:36

Yeah, if Boehm (or some other conservative GC) is mixed with memory it
doesn't own (say, as gained from the likes of mmap or malloc or similar),
then one can expect all sorts of difficulties...

The only real options are to keep references around somewhere visible, to
find some way to put the memory in question under the control of the GC, or
to use whatever GC the language provides, ...


Post by Marco van » Fri, 27 Mar 2009 20:06:44

It does own the memory; it is just that there is no reference to the
beginning of the allocated segment, only to somewhere in the middle.

If I think about it more, it might be doable, because the GC just has to keep
an internal administration of blocks and then check whether the pointer
values it finds fall inside an existing allocation. It maybe makes
determining and comparing references a bit more costly, and the risk of
false positives is larger (the space of allowed pointer values becomes
larger), but it should be possible I think.
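
A toy sketch of that check, in C. This only illustrates the idea and is not how libgc is implemented. The block table, find_block and mark_from_roots are invented names, and the interior argument stands in for a collector's "accept interior pointers" setting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry in the collector's (hypothetical) table of live allocations. */
typedef struct block {
    uintptr_t start;   /* address of the first byte of the block */
    size_t    size;    /* block size in bytes */
    int       marked;  /* set when a root appears to reference the block */
} block;

/* Return the block that addr refers to, or NULL.
 * With interior == 0 only a pointer to the block start counts;
 * with interior != 0 any address inside the block counts. */
block *find_block(block *table, size_t n, uintptr_t addr, int interior)
{
    for (size_t i = 0; i < n; i++) {
        block *b = &table[i];
        if (addr == b->start)
            return b;
        if (interior && addr > b->start && addr - b->start < b->size)
            return b;
    }
    return NULL;
}

/* Treat every root word as a potential pointer and mark what it hits.
 * Returns the number of blocks newly marked. */
size_t mark_from_roots(block *table, size_t n,
                       const uintptr_t *roots, size_t nroots, int interior)
{
    size_t newly_marked = 0;
    for (size_t i = 0; i < nroots; i++) {
        block *b = find_block(table, n, roots[i], interior);
        if (b != NULL && !b->marked) {
            b->marked = 1;
            newly_marked++;
        }
    }
    return newly_marked;
}
```

With interior set to 0, a root that points past a hidden header matches nothing, so the block looks unreachable. With interior set to nonzero it is found, at the price of more integer values being mistaken for pointers.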

... but maybe not using the common Boehm implementation (libgc)

Doing that on the front might not be a problem, but how to throw that away
at the end of the life? That's what the GC is supposed to do.

The GC would be in near complete control (a few pretty static allocations
on startup, before GC initialization and maybe TLS as exceptions)

The language provides reference counting, but only for non-nestable types
(in other words, not for classes; these are manually allocated). Most of
these objects are allocated through the memory manager that is plugged by
the GC. I don't think the internally allocated objects are the problem,
except for the reference_points_to_somewhere_in_the_block problem described
above.

The externally allocated objects are typically COM (or JNI/CORBA, whatever),
and I assume these are situated in memory managed by Windows, so if the GC
limits itself to the mapped memory of the app, these shouldn't be a problem.

Post by Mark Woodi » Fri, 27 Mar 2009 20:34:04

Marco van de Voort < XXXX@XXXXX.COM > writes:

If the GC_all_interior_pointers flag is turned off then the GC will
probably prematurely free blocks for which there's no pointer to the
start, which will cause all manner of problems. This flag is set by
default on my system (Debian GNU/Linux) but you should probably set it

-- [mdw]

Post by Marco van » Mon, 30 Mar 2009 02:14:43

Thanks. That gives me a direction to search for.