Questions about libbzip2

Post by irota » Mon, 03 Dec 2007 16:03:49


Hi,

I have two questions regarding libbzip2 (v1.0.4):

1) All the compressors I've looked at to date either provide a
function or at least a documented equation to compute the upper bound
of the compressed size of a given buffer length. However, I can't find
any such function or equation anywhere in libbzip2. Am I blind?

2) All of the compression and decompression functions take non-const
pointers to the source buffer. I can't imagine any reason why it'd
need to modify the source buffer, so I don't know why this is. Can it
modify the source buffer, or is it just not const-correct?

Thanks,
Adam
 
 
 


Post by Hans-Peter » Mon, 03 Dec 2007 19:22:05


The library probably calls functions that require non-const pointers, e.g. ungetc().

DoDi

 
 
 


Post by irota » Wed, 05 Dec 2007 03:42:22


To answer my own question, I finally found the following statement in
the bzip2 documentation:
"To guarantee that the compressed data will fit in its buffer,
allocate an output buffer of size 1% larger than the uncompressed
data, plus six hundred extra bytes."

Indeed, bzip2's own unzcrash.c does the following:
uchar zbuf[M_BLOCK + 600 + (M_BLOCK / 100)];
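
For anyone who wants this as a function, the documented rule is easy to
wrap. The following is only a sketch: the name bz2_max_compressed_size is
hypothetical and not part of libbzip2. It rounds the 1% term up, which is
slightly more conservative than the truncating division in unzcrash.c,
and it returns 0 if the bound would overflow size_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (NOT part of libbzip2): worst-case output size for
 * n input bytes, following the documented rule "1% larger than the
 * uncompressed data, plus six hundred extra bytes".
 * Returns 0 if the result would not fit in size_t. */
static size_t bz2_max_compressed_size(size_t n)
{
    /* Round the 1% term up so small inputs are not under-allocated. */
    size_t one_percent = n / 100 + (n % 100 != 0);
    size_t extra = one_percent + 600;

    if (n > SIZE_MAX - extra)
        return 0; /* overflow: the caller must treat this as an error */

    size_t bound = n + extra;
    assert(bound >= extra); /* sanity check: no wrap-around happened */
    return bound;
}
```

A caller would allocate bz2_max_compressed_size(len) bytes for the output
buffer and treat a return value of 0 as "input too large".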


Thanks,
Adam
 
 
 


Post by irota » Wed, 05 Dec 2007 03:44:54


That may be, but it's still uncertain to me whether this is a const-
correctness issue (i.e., you are required to pass a non-const pointer
even though the buffer won't be changed), or if the buffer pointed to
by the source pointer may actually be changed. The latter would seem
to me to be a little insane, but so far I've not found any assurances
in the documentation that this isn't the case.


Thanks again,
Adam
 
 
 


Post by Hans-Peter » Wed, 05 Dec 2007 09:39:54


I observed that one compressor used ungetc(), which, per the C
specification/implementation, may modify the input stream. Because that
code used the function in an illegal/unspecified way, to put back more
than one byte, I designed my own stream with a larger unget buffer.
This model removes the need to check for a pushed-back byte: ungetting
simply decrements the input pointer and places the byte into the file
buffer (with a configurable underrun size). Another model would simply
ignore the ungetc() argument, or verify that the pushed-back byte equals
the byte in the buffer, so that only the buffer pointer has to be updated.
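
To make the first model concrete, here is a minimal sketch of such a
stream. It is not the original code: the names in_stream, in_open,
in_getc and in_ungetc and the sizes UNDERRUN and DATA_CAP are all
hypothetical, chosen only for illustration. The spare bytes in front of
the data are the underrun area, so a push-back is just a pointer
decrement plus a store.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UNDERRUN 8   /* hypothetical push-back capacity, in bytes */
#define DATA_CAP 64  /* hypothetical payload capacity, in bytes */

/* Hypothetical input stream: buf[0..UNDERRUN) is spare room in front of
 * the data, reserved for pushed-back bytes. */
typedef struct {
    unsigned char buf[UNDERRUN + DATA_CAP];
    unsigned char *pos; /* next byte to read */
    unsigned char *end; /* one past the last valid byte */
} in_stream;

/* Copies len bytes of data into the stream, leaving the underrun area free. */
static void in_open(in_stream *s, const void *data, size_t len)
{
    assert(len <= DATA_CAP);
    memcpy(s->buf + UNDERRUN, data, len);
    s->pos = s->buf + UNDERRUN;
    s->end = s->pos + len;
}

/* Returns the next byte, or -1 at end of input. */
static int in_getc(in_stream *s)
{
    return s->pos < s->end ? *s->pos++ : -1;
}

/* Pushes c back by stepping the read pointer back and storing the byte.
 * Returns 0 on success, -1 if the underrun area is exhausted. */
static int in_ungetc(in_stream *s, int c)
{
    if (s->pos == s->buf)
        return -1;
    *--s->pos = (unsigned char)c;
    return 0;
}
```

Note that in_getc needs no special case for a pushed-back byte, and the
caller's source data is copied in through a const pointer, so it is never
modified.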

That's why I think that the use of standard library functions can
introduce unwanted security risks, such as the non-const property of
some pointers. And last but not least, that's one of the reasons why C
is not my favorite programming language ;-)

DoDi