I have succeeded in writing my first (functional) arithmetic coder (it can encode input, and the generated data decodes correctly).

it was a long and difficult process, taking much of the day...

amusingly, it ended up looking a lot like paq, eg:

I am encoding 1 bit at a time, calling the encode bit function with a weight

indicating the probability that the value is 0.

likewise, I have ended up shifting values in and out of the range window 8

bits at a time (I started with 1 bit, then 4, and finally 8, noting that

using a larger unit has little effect on the ratio...).

for various reasons the bytes are unaligned, but this shouldn't be too hard

to change.

(or at least, in general ways it looks sort of like paq...).

for overflow, I check whether the midpoint collides with either the lower or upper bound of the window; if so, I shift values into/out of the window until the range is normalized (again, ending up as 8-bit operations, as this didn't seem to hurt anything).

(all this sounds simple enough, but it was amusingly difficult to come up with this).
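
as a rough sketch, the scheme described above (low/high bounds, a midpoint split by a weight, byte-at-a-time renormalization) could look like the following. this is a carry-less fpaq0/paq-style coder, not the poster's actual code; all names and the 12-bit weight size are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* carry-less binary arithmetic coder sketch: 32-bit low/high window,
   12-bit weight w0 = P(bit == 0), renormalizing 8 bits at a time.
   Illustrative only; names are made up. */
typedef struct { uint32_t low, high; uint8_t *buf; size_t pos; } AEnc;
typedef struct { uint32_t low, high, code; const uint8_t *buf; size_t pos; } ADec;

static void enc_init(AEnc *e, uint8_t *buf) {
    e->low = 0; e->high = 0xFFFFFFFFu; e->buf = buf; e->pos = 0;
}

static void enc_bit(AEnc *e, int bit, uint32_t w0) { /* w0 in [1,4095] */
    /* split the window in proportion to P(bit = 0) = w0/4096 */
    uint32_t mid = e->low + (uint32_t)(((uint64_t)(e->high - e->low) * w0) >> 12);
    if (bit == 0) e->high = mid; else e->low = mid + 1;
    /* shift whole bytes out once the leading bytes of low/high agree */
    while ((e->low ^ e->high) < (1u << 24)) {
        e->buf[e->pos++] = (uint8_t)(e->low >> 24);
        e->low <<= 8;
        e->high = (e->high << 8) | 0xFF;
    }
}

static void enc_flush(AEnc *e) { /* emit low to pin down the final interval */
    for (int i = 0; i < 4; i++) { e->buf[e->pos++] = (uint8_t)(e->low >> 24); e->low <<= 8; }
}

static void dec_init(ADec *d, const uint8_t *buf) {
    d->low = 0; d->high = 0xFFFFFFFFu; d->buf = buf; d->pos = 0; d->code = 0;
    for (int i = 0; i < 4; i++) d->code = (d->code << 8) | d->buf[d->pos++];
}

static int dec_bit(ADec *d, uint32_t w0) {
    uint32_t mid = d->low + (uint32_t)(((uint64_t)(d->high - d->low) * w0) >> 12);
    int bit = d->code > mid;
    if (bit == 0) d->high = mid; else d->low = mid + 1;
    while ((d->low ^ d->high) < (1u << 24)) {
        d->low <<= 8;
        d->high = (d->high << 8) | 0xFF;
        d->code = (d->code << 8) | d->buf[d->pos++];
    }
    return bit;
}
```

with a neutral weight of 2048 (P = 1/2) this just passes bytes through plus a 4-byte flush, which is a handy sanity check.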

this came out seeming a lot simpler than the "underflow bits" scheme used by another encoder I have looked at a lot more closely (which seems to limit everything to 1 bit at a time, and which I was unable to get working...).

for testing, I was using it in combination with a huffman encoder (eg: arithmetic-assisted huffman).

ratios aren't so good, it seems the coding doesn't really "shine" here (both

sides of a huffman node are pretty close wrt weight...). it would probably

do better if the bit-depth were fixed, eg, sorting values by probability,

and effectively encoding the index as 8 bits, but having each bit weight be

based on the weight distribution at that bit.

I am thinking it could be possible, eg, to encode values just using a simple

"curve" giving the weight for each bit in a value.
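
that per-bit "curve" idea could be made concrete as a table of counts indexed only by bit position within a fixed-width value; the struct and names here are invented, with 12-bit weights assumed:

```c
#include <assert.h>
#include <stdint.h>

/* per-bit-position "curve": adaptive 0/1 counts kept per position of
   a fixed-width (here 8-bit) value, giving a 12-bit weight for each
   position.  Illustrative sketch only. */
typedef struct { uint32_t n0[8], n1[8]; } BitCurve;

/* 12-bit weight: P(bit at position i is 0), with +1 smoothing */
static uint32_t curve_w0(const BitCurve *c, int i) {
    return ((c->n0[i] + 1) << 12) / (c->n0[i] + c->n1[i] + 2);
}

static void curve_update(BitCurve *c, int i, int bit) {
    if (bit) c->n1[i]++; else c->n0[i]++;
}
```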

speed:

speed is acceptable I guess.

in combination with huffman, I am getting about 5MB/s for encode.

probably by making some changes (eg: direct byte-based access to buffers,

and more avoidance of single-bit io) I could get it faster.

likely, having it encode more directly rather than working as a backend for

a huffman encoder could improve things as well.

model:

the current model is effectively static. this is not part of the core

encoder functions (which don't care, they just get a bit and a weight).

I guess maybe the model is the slow part?...

ratio:

dunno, still more messing with this will be needed before I can determine

whether or not my ratios/speed are acceptable vs other stuff.

comments?...

Hi,

If you just need to encode bits, this is of course fine. However, in

most cases you have an alphabet that is a bit larger than just two

letters, and in that case a more generic coder would be fine. For reference, you'll find a good arithmetic coder in a Dr. Dobb's article by Mark Nelson about arithmetic coding. Just google for these keywords and you'll find it immediately.

For fast binary arithmetic coding, you might want to google for

the QM and the MQ coder. Both are patented IIRC, but as long as you just

want to play with them that should be fine. They are remarkable in the

sense that they do not need to carry out any multiplications or divisions.

The MQ/QM coder scheme uses a "bit stuffing" strategy to cover the problem

of "carry-overs" instead of the classical bit-count. This is also something

you might find interesting.

Good models should be adaptive to the data. Try this as an exercise. (-;

You can compute the quality of the code, actually: compare the output size (in bits) with the (binary) zero-order entropy of the data. If the coder is good, it should create a file that is very near the zero-order entropy of the input data.

So long,

Thomas


"Thomas Richter" < XXXX@XXXXX.COM > wrote in message

news:d1m2ug$l96$ XXXX@XXXXX.COM ...

afaik it is faster to encode one bit at a time than one character at a time,

given a reduction in total computations?...

I can encode a whole byte, eg, as 8 bits easily enough.

this would be impressive.

yeah, some amount of assignments and arithmetic is needed to encode each bit...

a division is needed per bit by the model, but not by the core encoder.

I want to stay free of patents if possible.

dunno, the technique I use seems to work (basically, once the window "collapses", I generally flush it and such and renormalize the window).

my encoder is presently shifting in/out 8 bits at a time.

somehow fpaq0 seems to avoid some problems, eg, if the probability gets too

close to either extreme, and also the case of underflow.

I don't know, the underflow checking and weight clamping likely costs some

performance.

I need them, as I seem to have problems with underflow and getting too close

to the extremes...

current model now is adaptive, albeit I lifted the algo from fpaq0, though with more bits for the context...

I am generally using a 24 bit context, as too much more uses too much

memory.

I use different models, eg, for references. actually, a model that seems to

work here is giving a weight for each bit. there seems to be little real

pattern in the bits.

dunno what this is...

I am mostly just comparing against other encoders.

order-2 arithmetic coding + lz77:

for tar archive given on site:

http://www.elis.ugent.be/~wheirman/compression/

I am now getting it to about 16.4% original size, and taking about 200s

(about 378kB/s). slow, but ok compression...

more difficult is actually competing against my bytewise encoders/decoders, which don't compress quite as well, but decompress quite a bit faster.

or such...

I am too drunk and too tired to really write much.


Hi,

But you lose context in that case. Consider the case where you encode

ASCII text; wouldn't it help to detect the letter "E" as an 8-bit entity

rather than just seeing the 0's and 1's of the ASCII encoding?

Yes, but this is sub-optimal. In other words, the "bit model" is typically not suitable for the random sources to be considered.

In the simplest case: "A model" = a set of empirical probabilities for

the symbols you want to encode.

In your case:

Compute: p_0 = probability of a zero bit, p_1 = probability of a one bit. Then

compare: # of input bits * (p_0 * log(p_0) + p_1 * log(p_1)), where log() is the logarithm to base 2, with the length of the output (compressed) message in bits.

The bracketed expression (p_0...log()) is called the zero-order entropy

(of the bit-model of your random source).

So long,

Thomas

"Thomas Richter" < XXXX@XXXXX.COM > wrote in message

news:d1mc61$s5p$ XXXX@XXXXX.COM ...

dunno.

I don't think it makes much difference.

processing things in terms of bit-patterns seems to work.

for a large glob of source:

order-0, 9 ctx bits, 75.1% original size;

order-0.5, 16 ctx bits, 47.7% original size;

order-1, 18 ctx bits, 41.6% original size;

order-1.5, 24 ctx bits, 32.8% original size.

order 1.5 is order-1, but with 6 low order bits from the oldest byte.

generally using the .5 convention for any uneven orders.

values presently being encoded are 9 bits (only the lower 8 are used for

most typical values).

big english text file:

order-0, 9 ctx bits, 61.6% original size;

order-0.5, 16 ctx bits, 39.1% original size;

order-1, 18 ctx bits, 35.3% original size;

order-1.5, 24 ctx bits, 30.4% original size.

dunno.

my experiences show that my ratios are coming out better than I was getting

out of huffman. afaik, the unaligned bit-centric model may actually help,

since there is some "smearing" between the bytes and thus a possibly

stronger association.

well, maybe, but in this case I don't have the probabilities for the bytes

themselves, rather just a prediction of the next bit from the previous bits.

for references, which are more random, I am gathering a weight for each bit

position.

using this approach for textual data (much less context, but not as good):

83.2% original size for the large glob of source;

75.4% original size for the big english text file.

it, however, works much better on references (lengths and offsets).

dunno.

the number I get is a lot larger than my output bits.

your description wasn't so clear either.

p_0: 0.613998

p_1: 0.385986

p_0*log2(c_0): 13.5

p_1*log2(c_1): 8.5

e: 634097953

where p_0 is the probability of 0 in the input, and c_0 is the output 0

bits.

computing probabilities for output instead:

p_0: 0.495941

p_1: 0.504044

p_0*log2(c_0): 10.9

p_1*log2(c_1): 11.1

e: 634177413

final output bits: 8415768

e/outbits: 75.4

changing it to be as-written, e becomes negative, but its absolute value is still much bigger than the output bits...



IIRC this is done by rounding the least probable symbol probability to a power of 1/2, so you give up some compression. (This is done in the arithmetic coding mode of JPEG.)

The arithmetic coder in fpaq0 is a simplified version of the one in

paq. It uses multiplication, but that is not so expensive on modern

processors. The code is within about 0.0001 bpc of the Shannon limit.

The coder requires 8 coding operations to code one byte of data by

coding the bits one at a time. This is faster than the approach of

splitting up the range into 256 segments in proportion to each symbol

probability because you don't have to add up all 256 frequencies for

normalization. However it requires the model to maintain statistics

for partial byte contexts. For a zero order model, there are 255

contexts: one context for the first bit, 2 for the second, 4 for the

third, 8 for the fourth, etc. For each context, the statistics

consists of a 0 count and a 1 count.
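
The 255 partial-byte contexts described here can be indexed heap-style, starting at 1 for the empty context and descending with ctx*2+bit; a sketch (not fpaq0's actual code, and the count-based weight with +1 smoothing is an assumption):

```c
#include <assert.h>
#include <stdint.h>

/* order-0 bitwise model: 255 contexts for the partial byte seen so
   far, heap-indexed (1 = empty context, ctx*2+bit = child).  Each
   context keeps a 0 count and a 1 count; counts are not rescaled
   here, which a real coder would need. */
typedef struct { uint16_t n0[256], n1[256]; } Model0;

static void model_init(Model0 *m) {
    for (int i = 0; i < 256; i++) { m->n0[i] = 0; m->n1[i] = 0; }
}

/* 12-bit probability that the next bit is 1, given partial-byte ctx */
static uint32_t model_p1(const Model0 *m, int ctx) {
    return ((m->n1[ctx] + 1) << 12) / (m->n0[ctx] + m->n1[ctx] + 2);
}

static int model_update(Model0 *m, int ctx, int bit) {
    if (bit) m->n1[ctx]++; else m->n0[ctx]++;
    ctx = ctx * 2 + bit;          /* descend to the child context */
    return (ctx < 256) ? ctx : 1; /* byte complete: restart at the root */
}
```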

fpaq0 codes 9 bits per symbol, one bit to code the EOF and 8 bits to

code the byte. It could be slightly faster by coding the length

explicitly and just coding 8 bits per byte (as paq does). However,

compressed size should not be affected.

AFAIK the paq/fpaq0 coder is not covered by patents.

Older arithmetic coders use 16 bit integers to maintain the lower and

upper bounds of the range, plus a carry counter so that when you have a

range like [.011111..., .100000...] the range can still be represented

with 16 bits precision. The coder outputs one bit at a time as the

bits become known.

However paq/fpaq0 uses 32 bit integers to maintain the range, and no

carry counter, and it outputs 8 bits at a time. When the leading 8

bits of the lower and upper bounds match, a byte is output. Thus, the

range normally has 24 to 32 bits precision. A probability is

represented with 12 bits. The new range is found by dropping the low

12 bits of the range and multiplying, which normally maintains 12

significant bits of precision in the result.

Occasionally the range can straddle a byte boundary, resulting in some

precision loss. In the extreme case, the probability is forced to 1/2,

but this is rare. For example, if the range is [0x37ffffff,

0x38000000], then the next coding operation forces the range to either

[0x37ffffff, 0x37ffffff] or [0x38000000, 0x38000000], in which case the

bit is coded inefficiently, but all four bytes can be output and the

range becomes [0x00000000, 0xffffffff]. You can think of the lower

bound as followed by an infinite stream of 0 bits, and the upper bound

as followed by an infinite stream of 1 bits.

A probability with 12 bits of precision is quite adequate for high end

compression. As a rule of thumb, an error E in a probability estimate

P will result in an increase in compressed size of (E/P)^2 over the

Shannon limit, so even large errors don't have much cost.

-- Matt Mahoney

"Matt Mahoney" < XXXX@XXXXX.COM > wrote in message

news: XXXX@XXXXX.COM ...

yes.

in mine, I went back to 8 bits per symbol (overloading/escaping some of the

low-probability ones). in general this has increased compression, given a

slight increase in the range of the contexts (this may cost some for the low

probability bytes, but this may be made up for given the small set of values

that can come after an escape, and the fact that, in general, I can have a

few more context bits in memory...).

yes, this is cool.

I am using 16 bits presently, but am considering increasing it.

yes, I wondered about that...

I guess it has an advantage that it avoids needing to use a longer

representation (eg: unsigned long long) to do the multiplications, but I was

paranoid it might hurt the accuracy.

eg:

fpaq0 style: (r>>12)*w

current style: (r*w)>>16

could also just use ull for some things, and see if my performance is hurt

too bad (current processors include 64 bit int math anyways with some ops,

but I guess some depends on if gcc uses them...).

I guess it can't hurt too bad, as I won't be doing an ull division, and x86 generates the high bits anyways (which are afaik typically just discarded).

if I am lucky, for something like: (uint)(((ull)r*(ull)w)>>32) gcc could be

generating something efficient...

hmm, good point.

however, I had found (with mine at least) that I still needed to manually

renormalize sometimes otherwise I tend to have problems with some files.

I guess the need would be greatly reduced if I were using 32 bit ranges.

cool.

however, by this point I will have basically just ripped off paq entirely, instead of using a kind of hybrid style from 2 different arithmetic coders...

however, doing it more like paq may allow me to squeeze out a little more

compression and speed, which would be good...

[some time has passed]

yes, a lot has been done.

now I am still using 32 bit math (long long expensive), and 12 bit

probabilities, with the ((r>>12)*w) approach.

with some math fiddling, I was able to eliminate the need for underflow

testing, and also needing to keep probabilities from ramming into one of the

extremes.

basic arithmetic coder compression ratios have also gone up, so I may be

able to re-integrate this with my lz77 coder and get better ratios (possibly

<15% for the gimp sources, as 15% was what I was getting previously, and

baseline ratios have improved by approx 5% with the encoder changes...).

I have now nearly doubled the speed as well (right now about 1.2MB/s).

and such...


Hi,

There's no c_0 in the logarithm, there's a p_0. And I forgot a minus sign, thus:

H_0 = - (p_0 * log2(p_0) + p_1 * log2(p_1))

No output bits. p_0/p_1 are the input probabilities. Entropy is a measure

of the source bitstream.

Sorry, I forgot the minus sign. The above quantity (H_0) is always positive.

However, your encoder seems to model a random source with memory (i.e. you

predict data) in which case the entropy H of the source will be of course always

smaller than H_0.

So long,

Thomas


ok.

ok.

ok.

I don't quite get what the relevance of all this is though...

current h_0:

25942679

output bits:

7367424


You are in luck.

#include <stdlib.h>

int main(int argc, char** argv) {
    return int((unsigned long long)atoi(argv[1])
               * (unsigned long long)atoi(argv[2]) >> 32);
}

when compiled with g++ -S (MINGW 3.4.1, Windows Me)

.file "x.cpp"

.def ___main; .scl 2; .type 32; .endef

.text

.align 2

.globl _main

.def _main; .scl 2; .type 32; .endef

_main:

pushl %ebp

movl %esp, %ebp

pushl %ebx

subl $20, %esp

andl $-16, %esp

movl $0, %eax

addl $15, %eax

addl $15, %eax

shrl $4, %eax

sall $4, %eax

movl %eax, -8(%ebp)

movl -8(%ebp), %eax

call __alloca

call ___main

movl 12(%ebp), %eax

addl $4, %eax

movl (%eax), %eax

movl %eax, (%esp)

call _atoi

movl %eax, %ebx

movl 12(%ebp), %eax

addl $8, %eax

movl (%eax), %eax

movl %eax, (%esp)

call _atoi

imull %ebx

movl %edx, %eax

movl $0, %edx

movl -4(%ebp), %ebx

leave

ret

.def _atoi; .scl 3; .type 32; .endef

So the 64 bit multiply is done with one instruction with 32 bit

operands (input in eax and ebx, result low bits in eax and high bits in

edx) and the 32 bit shift is just a register move. This is without

optimization too.

imull %ebx

movl %edx, %eax

Unfortunately, Borland doesn't know what "long long" is.

-- Matt Mahoney

<snip fragments>

yes, cool.

yeah, that is lame, also given long long is in the c99 spec...

however, on my computer at least, trying to use long long's was coming out

slightly slower than just using unsigned ints, and I ended up using a

paq-style multiply.

then by looking at stuff, I noticed the round off from said paq-style

multiply was significant (caused drift and some inflation, verifiable, eg,

when doing an encode with all neutral weights), so I added another line to

adjust the results (basically computing a difference using the low order

bits and adding this into the result).

I am just a little fussy probably, I like it so that when I am using all

neutral weights, I essentially get the input back out the other end

(possibly with a little extra padding).

I may need to check again which is faster:

v=whuff_rmin+(r>>16)*w;

v+=((r&65535)*w)>>16;

vs:

v=whuff_rmin+(((ull)r*(ull)w)>>16);

or I could actually utilize the feature, eg:

v=whuff_rmin+(((ull)r*(ull)(w<<16))>>32);

I just checked, yes, cool, it is a bit faster...

note:

I need to shift it like this since I am using 16 bit weights (I don't really

know of a good/fast way to compute 32 bit ones vs. 16 bit ones and otherwise

deal with the issue that it would take more space for storing the counts to

actually utilize higher accuracies...).

ull division is afaik a lot more expensive than an extra shift anyways...
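
for what it's worth, the two-step 32-bit version and the ull version shown above compute exactly the same value for 16-bit weights, since r*w = (r>>16)*w*2^16 + (r&0xFFFF)*w splits the product exactly; a sketch with made-up names:

```c
#include <assert.h>
#include <stdint.h>

/* two-step 32-bit multiply vs. the wide 64-bit one; identical results
   as long as w fits in 16 bits (so the 32-bit intermediates cannot
   overflow).  Names are invented for illustration. */
static uint32_t mul_shift16_split(uint32_t r, uint32_t w) {
    uint32_t v = (r >> 16) * w;      /* high half of r, exact */
    v += ((r & 0xFFFFu) * w) >> 16;  /* low half's contribution, floored */
    return v;
}

static uint32_t mul_shift16_wide(uint32_t r, uint32_t w) {
    return (uint32_t)(((uint64_t)r * w) >> 16);
}
```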

or such...

misc: gimp tarball now down to about 14.7%, albeit slow taking approx 186s

to compress (most of which going into my function for looking up strings,

for which I am considering trying a larger and shallower hash...).

oh well, so on goes the trudge...

Hi,

As said, it allows you to determine the quality of the encoder without

having to do "comparisons" with others since the (zero-order) entropy

gives a lower bound to the compression (with a zero-order model, that is).

Which means that with the higher-order model you're using you're

significantly better than H_0, which is a good sign. The next step to

determine the quality of the code would be to check against higher

order entropies, namely those that approximate your model better.

So long,

Thomas

"Thomas Richter" < XXXX@XXXXX.COM > wrote in message

news:d1rbvg$3gc$ XXXX@XXXXX.COM ...

ok.

ok, however, I don't know how to determine them.

by comparison with other encoders, and endless fiddling, I can gradually

drive up the ratio, and I have managed to match ratios, eg, with winrar and

7zip (I am also within the same time domain as 7zip, but not winrar).

I have found my arithmetic coder to be the main time limiter though. I can

drop the effort the lz encoder goes through (I have changed the algo I am

using for matching, mostly as my old approach was using far too much

memory). the new one doesn't really need more memory to have better matches,

but isn't exactly that "light" either.

my new algo can go faster and still do better though, eg, by limiting the

"search depth".

however, little can be done for the arithmetic coder.

the biggie is a single spot, namely a single division done per bit to

calculate the weights, which seems to be using up nearly all the time...

ctx<<=1;

i=model[ctx|0]+1;

j=model[ctx|1]+1;

k=(i<<16)/(i+j);

somehow, this division is quite expensive...

just, I haven't found a cheaper way to get around it...
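
for reference, one standard way around the per-bit division (the style later paq versions use, not the poster's code) is to store the 12-bit probability itself as the per-context state and nudge it toward each observed bit with a shift; the adaptation rate (>>5) here is an arbitrary choice:

```c
#include <assert.h>
#include <stdint.h>

/* division-free adaptive probability: p1 = P(bit == 1) in [0, 4096).
   Updating moves p1 a fixed fraction of the way toward the observed
   bit; no counts, no division.  Rate >>5 is illustrative. */
static uint16_t p_update(uint16_t p1, int bit) {
    if (bit) p1 += (4096 - p1) >> 5;  /* move toward 1 */
    else     p1 -= p1 >> 5;           /* move toward 0 */
    return p1;
}
```

the coder then uses p1 (or 4096 - p1 for the 0-weight) directly, replacing both the counts and the k=(i<<16)/(i+j) step.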

the cost: the plain entropy coder has an upper limit of about 3.3MB/s with the "-O3" compiler option...

this greatly limits my decode speed, eg, my lz decoder is only going about

8.4MB/s and taking about 9.4s (some speed is made up by the inflation).

some minor speed gains are made by limiting my context bits (to an order-1 vs. order-2 context), eg: 2.8MB/s encode (28s for 79MB) and 10.7MB/s decode (7.39s), but this is not significant (code complexity is unaffected; my guess is this can be attributed to needing to access less memory).

dropping the order has a minor effect, eg, the ratio goes from 14.53% to

14.74% original size.

I guess this is a cost, and I guess others have probably run into this

problem.

various hacks do little, as most of them cost more than the problem (trying

things from precomputed indices, to trying to lock the model per-entry).

no good fix seems to exist, I guess this is a limit...

my simpler bytewise encoder, which gets kind-of worse compression due to

its lack of entropy coding (eg: presently about 17.8% original size) has

the advantage of being much faster (and I could speed it up more by

propagating, eg, some newer hashing algos and similar back into it...).

it is amusing somewhat: in some cases entropy coding doesn't offer much

really...

and as an added bonus: my bytewise decoder is very much simpler...

it would just seem quite odd though, eg, if entropy encoding is worth as

little as my experience seems to be showing. it works so well by itself, and

seems so good on paper, yet, somehow, can only squeeze a few extra percent

out of lz-encoded data?...

maybe it is because in my vs. trials for reference encoding, my bytewise

scheme didn't do too much worse than the arithmatic coded scheme (eg: within

"a few bits").

why is it that I can damn near beat out bzip2 without any entropy coding anyways? is it the almighty lz style string matching?...

a few percent worse, 20-100x faster for the decoder...

why then do nearly all major algos use entropy coding if it makes so little

difference? maybe it is the "in-general" case, eg, it makes a big difference

(not all files are likely to compress well with lz, but may do much better,

say, with order-3 ar

news:d1rbvg$3gc$ XXXX@XXXXX.COM ...

ok.

ok, however, I don't know how to determine them.

by comparrison with other encoders, and endless fiddling, I can gradually

drive up the ratio, and I have managed to match ratios, eg, with winrar and

7zip (I am also within the same time domain as 7zip, but not winrar).

I have found my arithmatic coder to be the main time limiter though. I can

drop the effort the lz encoder goes through (I have changed the algo I am

using for matching, mostly as my old approach was using far too much

memory). the new one doesn't really need more memory to have better matches,

but isn't exactly that "light" either.

my new algo can go faster and still do better though, eg, by limiting the

"search depth".

however, little can be done for the arithmatic coder.

the biggie is a single spot, namely a single division done per bit to

calculate the weights, which seems to be using up nearly all the time...

ctx<<=1;

i=model[ctx|0]+1;

j=model[ctx|1]+1;

k=(i<<16)/(i+j);

somehow, this division is quite expensive...

just, I haven't found a cheaper way to get around it...

the cost: the plain entopy coder has an upper limit of about 3.3MB/s with

the "-O3" compiler option...

this greatly limits my decode speed, eg, my lz decoder is only going about

8.4MB/s and taking about 9.4s (some speed is made up by the inflation).

some minor speed gains are made by limiting my context bits (to an order-1

vs. order-2 context), eg: 2.8MB encode (28s for 79MB) and 10.7MB decode

(7.39s), but this is not signifigant (code complexity is uneffected, my

guess is this can be attributed to needing to access less memory).

dropping the order has a minor effect, eg, the ratio goes from 14.53% to

14.74% original size.

I guess this is a cost, and I guess others have probably ran into this

problem.

various hacks do little, as most of them cost more than the problem (trying

things from precomputed indices, to trying to lock the model per-entry).

no good fix seems to exist, I guess this is a limit...

my simpler bytewise encoder, which gets somewhat worse compression due to
its lack of entropy coding (eg: presently about 17.8% of original size), has
the advantage of being much faster (and I could speed it up more by
propagating, eg, some newer hashing algos and similar back into it...).

it is somewhat amusing: in some cases entropy coding doesn't offer much,
really...

and as an added bonus: my bytewise decoder is very much simpler...

it would just seem quite odd, though, if entropy coding is worth as little
as my experience seems to be showing. it works so well by itself, and seems
so good on paper, yet, somehow, can only squeeze a few extra percent out of
lz-encoded data?...

maybe it is because, in my side-by-side trials for reference encoding, my
bytewise scheme didn't do too much worse than the arithmetic coded scheme
(eg: within "a few bits").

why is it that I can damn near beat out bzip2 without any entropy coding
anyways? is it the almighty lz style string matching?...

a few percent worse, 20-100x faster for the decoder...

why then do nearly all major algos use entropy coding if it makes so little

difference? maybe it is the "in-general" case, eg, it makes a big difference

(not all files are likely to compress well with lz, but may do much better,

say, with order-3 arithmetic coding).

Floating point division is faster than int on x86 (and float is faster
than double).

It might be that BW is best suited for uniform data, and the gimp
tarball has a mix of data types. LZ is more adaptive.

gzip only uses a 32K buffer. The gimp tarball has some redundancies
that span the length of the file. Any compressor that uses a lot of
memory will do better.

"Matt Mahoney" < XXXX@XXXXX.COM > wrote in message

news: XXXX@XXXXX.COM ...

ok.

however, simply changing the types causes a slowdown, so if it is faster I
guess the change would be more "involved" than some simple typecasts...
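a sketch of what the floating point version of the earlier divide might look like (hypothetical; given the slowdown just mentioned, the real change would presumably need more restructuring than this):

```c
#include <stdint.h>

/* sketch: compute the 16-bit probability with a single-precision
   divide instead of the integer divide k=(i<<16)/(i+j) */
uint32_t prob0_float(uint32_t i, uint32_t j)
{
    return (uint32_t)((float)i * 65536.0f / (float)(i + j));
}
```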

yes, possibly.

yes, ok.

my guess is gzip is probably a rather efficient coding, just one that

doesn't scale so well?...

my bytewise encoder isn't so efficient I guess, but wins due to its rather
large dictionary, and has the advantage of being rather fast (at least for
in-memory decompression).

for encoding the gimp tarball, my bytewise encoder uses approx 500MB ram,
and my arithmetic coded variant uses about the same.

decode uses much less, eg, around the size of the encoded data, and about 2x
the decoded size (plus any memory needed for arithmetic decoding, eg, 256kB
for an order-1 context, and about 32kB for the length/distance contexts).

I am thinking of trying a file-only decompressor for the bytewise variant

(basically, it will not use any explicit buffers), and seeing if the speed

is still good...

at present a good upper limit is a few-hundred MB at once, as otherwise the

memory requirements get too high (part of why I had changed the hashing algo

was that my older one was taking a lot more ram, and I was getting a >1GB

process image...).

a file-only decoder could at least decode much larger files.

another thought is it may be necessary (for the encoder) to divide large

files into "chunks", effectively limiting the dictionary to that chunk.

with a 64MB chunk size, memory requirements would at least be constrained.
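the chunking idea could be sketched roughly like this (hypothetical names; compress_chunk stands in for the real encoder, stubbed here as a plain copy):

```c
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE ((size_t)64 << 20)   /* 64MB chunks, as above */

/* stand-in for the real encoder: compresses one chunk with its own
   dictionary and returns the output size (stubbed as a raw copy) */
static size_t compress_chunk(const unsigned char *src, size_t n,
                             unsigned char *dst)
{
    memcpy(dst, src, n);
    return n;
}

/* encode a large buffer as independent chunks, so encoder memory is
   bounded by the chunk size instead of the file size */
size_t compress_chunked(const unsigned char *src, size_t len,
                        unsigned char *dst)
{
    size_t in = 0, out = 0;
    while (in < len) {
        size_t n = len - in;
        if (n > CHUNK_SIZE)
            n = CHUNK_SIZE;
        out += compress_chunk(src + in, n, dst + out);
        in += n;
    }
    return out;
}
```

the cost is that matches can no longer cross a chunk boundary, so the ratio would drop some on long-range redundancy (exactly the thing the big dictionary was winning on).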

I guess my arithmetic coded variant is better for smaller files (or bigger
ones, if one can tolerate the roughly 2 and 4MB/s encode/decode speeds).

it beats out gzip at least for not-quite-so-large files (hundreds of kB to a

few MB). for small files (eg: <100k), however, gzip wins quite easily.

a thing I am noticing now is that maybe my arithmetic coder adjusts too
slowly? changing the predictor such that it keeps the counts smaller (eg:
<256) causes the ratio to improve some, and improves encode/decode speed
some (more so decode, which rose to about 5.5MB/s vs about 4.1).
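keeping the counts small might be done something like the sketch below (hypothetical; halving both counts when one hits the cap is one common way to make the model favor recent bits):

```c
#include <stdint.h>

#define COUNT_CAP 255            /* keep counts below 256, as above */

static uint16_t model[1 << 17];  /* per-context bit counts */

/* update the model for context ctx after coding 'bit'; when a count
   reaches the cap, halve both counts so older bits fade out */
void update_model(uint32_t ctx, int bit)
{
    if (++model[ctx | bit] >= COUNT_CAP) {
        model[ctx | 0] >>= 1;
        model[ctx | 1] >>= 1;
    }
}
```

the halving is what speeds up adaptation: the effective window over past bits becomes a few hundred instead of unbounded.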

I guess it is probably ok, decoding the gimp tarball only takes about 15s

anyways, still this is a lot slower than the others...

