"Thomas Richter" < XXXX@XXXXX.COM > wrote in message
news:d1rbvg$3gc$ XXXX@XXXXX.COM ...
ok, however, I don't know how to determine them.
by comparison with other encoders, and endless fiddling, I can gradually
drive up the ratio, and I have managed to match ratios, eg, with winrar and
7zip (I am also in roughly the same speed range as 7zip, but not winrar).
I have found my arithmetic coder to be the main time limiter though. I can
drop the effort the lz encoder goes through (I have changed the algo I am
using for matching, mostly as my old approach was using far too much
memory). the new one doesn't really need more memory to have better matches,
but isn't exactly that "light" either.
my new algo can go faster and still do better though, eg, by limiting the
effort it puts into matching.
however, little can be done for the arithmetic coder.
the biggie is a single spot: one division done per bit to calculate the
weights, which seems to be eating up nearly all the time... somehow this
division is quite expensive, and I haven't found a cheaper way to get
around it...
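for reference, the kind of per-bit update I mean looks roughly like this (a
stripped-down sketch with made-up names, not my actual code):

#include <stdint.h>

typedef struct { uint32_t c0, c1; } BitModel;   /* 0/1 counts for one context */

/* probability of a 0 bit, scaled to 16 bits; this division runs once per
   coded bit and is where nearly all the time goes */
static uint32_t bit_prob0(const BitModel *m)
{
    return (uint32_t)(((uint64_t)m->c0 << 16) / (m->c0 + m->c1));
}

static void bit_update(BitModel *m, int bit)
{
    if (bit) m->c1 += 32; else m->c0 += 32;
    if (m->c0 + m->c1 > 65536) {        /* rescale to keep counts bounded */
        m->c0 = (m->c0 >> 1) | 1;
        m->c1 = (m->c1 >> 1) | 1;
    }
}

the usual way people dodge this (which I haven't adopted here) is to keep a
single scaled probability per context and nudge it toward 0 or 1 with shifts
on each bit, so there are no counts and no division at all.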
the cost: the plain entropy coder has an upper limit of about 3.3MB/s with
the "-O3" compiler option...
this greatly limits my decode speed, eg, my lz decoder is only going about
8.4MB/s and taking about 9.4s (some of the speed is made back up by the
inflation, ie, more bytes come out than the entropy coder has to pull in).
some minor speed gains are made by limiting my context bits (to an order-1
vs. order-2 context), eg: 2.8MB/s encode (28s for 79MB) and 10.7MB/s decode
(7.39s), but this is not that significant (code complexity is unaffected; my
guess is it can be attributed to needing to touch less memory).
dropping the order does have a minor effect on the ratio though, eg, it goes
from 14.53% to 14.74% of original size.
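just to put rough numbers on the memory guess (a sketch with made-up table
layouts, not my actual model):

#include <stdint.h>

typedef struct { uint16_t p0; } Bin;    /* one probability bin per tree node */

/* order-1: previous byte selects the sub-model,
   256 * 256 bins = 128KB, which mostly stays in cache */
static Bin order1_tab[256][256];

/* order-2: previous two bytes select the sub-model,
   65536 * 256 bins = 32MB, so nearly every coded bit risks a cache miss */
static Bin order2_tab[65536][256];

static Bin *pick_o1(unsigned prev1, unsigned node)
{ return &order1_tab[prev1 & 255][node & 255]; }

static Bin *pick_o2(unsigned prev1, unsigned prev2, unsigned node)
{ return &order2_tab[((prev1 << 8) | prev2) & 65535][node & 255]; }

with the order-2 table blown far past cache, each coded bit is liable to
miss, which fits with the speed difference showing up even though the code
itself doesn't change.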
I guess this is a cost, and I guess others have probably run into this as
well. various hacks do little, as most of them cost more than the problem
(trying everything from precomputed indices to locking the model per entry).
no good fix seems to exist, I guess this is a limit...
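to give an idea of the kind of thing I mean by precomputed indices, one
variant (not necessarily the exact form I tried) caches reciprocals so the
per-bit division becomes a multiply and a shift:

#include <stdint.h>

#define MAX_TOTAL 65536

/* recip[t] ~= 2^32 / t; counts are kept >= 1 each, so total >= 2 and the
   entries fit in 32 bits */
static uint32_t recip[MAX_TOTAL + 1];

static void recip_init(void)
{
    for (uint32_t t = 2; t <= MAX_TOTAL; t++)
        recip[t] = (uint32_t)(((uint64_t)1 << 32) / t);
}

/* same 16-bit probability as the divide version, off by at most a step or
   two from truncation, which the coder tolerates */
static uint32_t bit_prob0_fast(uint32_t c0, uint32_t c1)
{
    return (uint32_t)((((uint64_t)c0 << 16) * recip[c0 + c1]) >> 32);
}

of course the 256KB side table is yet more memory to touch per bit, which
may be part of why these things tend to cost more than they save.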
my simpler bytewise encoder, which gets somewhat worse compression due to
its lack of entropy coding (eg: presently about 17.8% of original size), has
the advantage of being much faster (and I could speed it up more by
propagating, eg, some of the newer hashing algos and similar back into it...).
it is somewhat amusing that in some cases entropy coding doesn't offer much.
and as an added bonus: my bytewise decoder is very much simpler...
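to give an idea of why (a generic byte-aligned decode loop with a made-up
token format, not my actual one):

#include <stddef.h>
#include <stdint.h>

/* token 0x00..0x7f: literal run of (token+1) bytes follows;
   token 0x80..0xff: match, low 7 bits give length-3, then a 16-bit offset;
   no bounds checking, purely illustrative */
static size_t lz_decode(const uint8_t *src, size_t slen, uint8_t *dst)
{
    size_t si = 0, di = 0;
    while (si < slen) {
        uint8_t t = src[si++];
        if (t < 0x80) {
            size_t run = (size_t)t + 1;
            while (run--) dst[di++] = src[si++];
        } else {
            size_t len = (size_t)(t & 0x7f) + 3;
            size_t off = (size_t)src[si] | ((size_t)src[si + 1] << 8);
            si += 2;
            while (len--) { dst[di] = dst[di - off]; di++; }
        }
    }
    return di;
}

nothing but byte loads and copies, no bit-level coder state to drag along,
which is where most of the decode speed (and the simplicity) comes from.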
it would just seem quite odd though if entropy coding is worth as little as
my experience seems to be showing. it works so well by itself, and seems so
good on paper, yet somehow can only squeeze a few extra percent out of
lz-encoded data?...
maybe it is because, in my comparison trials on the reference encoding, my
bytewise scheme didn't do too much worse than the arithmetic-coded scheme
(eg: within "a few bits").
why is it that I can damn near beat out bzip2 without any entropy coding
anyways? is it the almighty lz-style string matching?...
a few percent worse, 20-100x faster for the decoder...
why then do nearly all major algos use entropy coding if it makes so little
difference? maybe it is the "in-general" case, eg, it makes a big difference
there (not all files are likely to compress well with lz, but may do much
better, say, with order-3 arithmetic coding)...