equivalent byte encoding

Post by erpy » Tue, 31 Oct 2006 02:23:46



Hi all,
I have some questions about information theory and the binary representation
of information.

Basically, I have always wondered whether the binary encoding system,
especially when byte-aligned as in most if not all computer architectures,
could waste information to *any* extent - or, better, whether some other
system could allow for more information compared to the byte-aligned one.
And this not within an entire file, but within the alignment unit
itself - 8 bits.

For instance, if one chooses any equivalent byte-representation encoding
system, the total amount of information - the number of possibilities -
should be the same as for any single byte-aligned piece of information.
I imagine the simplest "equivalent representation", in these terms, that
one can think of is a signed byte (128 + 128 combinations == 1 byte's
worth of combinations), which is basically a 4+4 bits alignment.

Now: does there exist, or would it be interesting to find, an equivalent
representation system that allows for more information (not directly "to
compress", though I assume the concept could be exploited to some extent
for data compression) - the additional information being any number of
combinations beyond the 256 of a single byte - while still *always*
storing a total of 8 bits on file?
Would such a system violate the counting theorem in any way?

And, if such a system exists, what benefits, if any, could this
equivalent representation bring to information theory? (if not directly
to data compression)

If it could instead be applied to data compression directly, how would
you exploit such properties of the equivalent representation in order to
compress data efficiently?
In other words, what kind of compression system could benefit from always
being able to encode 1 byte that represents no more than 8 *additional*
combinations over the "common" byte-aligned system? (i.e. a modest number
of extra "bits" for each "byte")

The simplest use I can think of is this: if I could store one more
"decimal number" in each extra combination, I could avoid storing those
in 2 bytes - i.e. it would be possible to transmit the decimal value 256
in one byte instead of two, and so on - but that doesn't look very useful
or efficient to me, for one reason or another (the extra encoding
information required being one).
But since there are plenty of experts here, I'd think some of you can
come up with far more efficient schemes/ideas.



Thank you,
Emanuele.

equivalent byte encoding

Post by davvid_a_s » Tue, 31 Oct 2006 02:57:26

erpy wrote:


You can bijectively transform any string to a byte-style file, or
transform a byte file to a string file. Even doing it sloppily, for a
long file the difference would be less than 2 bytes, or 16 bits, worst
case. I suspect that in the future, if anything, bytes may become 16,
32, 64, 128 or 256 bits; it's not likely they will get smaller. Again,
for large files, bits versus bytes is really no big deal.
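
One way to see such a bijection concretely is a rank-based construction: enumerate all bit strings and all byte strings shortest-first, and map each bit string to the byte string with the same rank. This is a generic sketch under my own assumptions (the function names are made up, and it is not necessarily the transform arb255 actually uses):

```python
def bits_rank(s):
    # Rank of a bit string in shortest-first order:
    # "", "0", "1", "00", "01", ...; exactly 2^len - 1 strings are shorter.
    return (1 << len(s)) - 1 + (int(s, 2) if s else 0)

def rank_to_bytes(r):
    # The same enumeration over byte strings: (256^L - 1) / 255 strings
    # are strictly shorter than length L.
    L = 0
    while (256 ** (L + 1) - 1) // 255 <= r:
        L += 1
    offset = r - (256 ** L - 1) // 255
    return offset.to_bytes(L, "big")

def bits_to_bytefile(s):
    # Compose the two enumerations: bit string -> rank -> byte string.
    return rank_to_bytes(bits_rank(s))
```

Since a bit string of 8n bits has rank on the order of 2^(8n), its image is about n bytes long, so the overhead stays within a couple of bytes, in line with the worst case mentioned above.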

As far as compression goes, there are compressors such as arb255 which
bijectively compress a file from a 256-symbol iid source. It converts
the file to the underlying string via the bijective transform from bytes
to strings, compresses the string, and on output does the reverse
bijective transform from string back to an 8-bit byte-style file.
Changing a bijective compressor like this to use different byte sizes
would be very easy.

David A. Scott


equivalent byte encoding

Post by cr8819 » Tue, 31 Oct 2006 05:27:01


8 bits is 8 bits, and one can't store any more than what 8 bits can hold.
now, we can represent a bitstream in bytes, so nothing really is lost.


no, 4 bits gives you 16 combinations, and 16+16=32, not 256.

a signed byte is closer to a 1+7 combination of bits (one sign bit plus
7 value bits), and even then not exactly (2s complement).



nope, no such system exists.



yes, it would violate the counting theorem.


think of it this way: you have 2-bit values:
00=0
01=1
10=2
11=3

now, in this case, is there any way you can represent 4?...
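
the counting argument above can be checked by brute force (a tiny sketch; nothing here comes from the post beyond the 2-bit table itself):

```python
from itertools import product

# every 2-bit code word; a decoder is just a mapping from code words to
# meanings, so it can distinguish at most 4 values, never a 5th
codes = ["".join(bits) for bits in product("01", repeat=2)]
print(codes)  # ['00', '01', '10', '11']

# the same pigeonhole for a full byte: 2^8 = 256 combinations, so no
# 8-bit "equivalent representation" can carry 257 distinct messages
assert len(codes) == 2 ** 2
assert 2 ** 8 == 256
```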


<snip>


well, here is something close:
always a bytewise input and a bytewise output, not using any dictionary
techniques.


the algo here will be simplistic, but could be improved further.

so, we maintain a large hash table, with each hash index referring to an
array of sorted byte values.
we transform each input byte by first figuring out a hash from the last
2 or 3 bytes, looking up the byte's place in that hash entry's array (i),
and then moving the byte value to the front of the array.

so, now we can transform our bytewise input into a large string of
byte-valued indices, often near 0.
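
a minimal sketch of that transform, with made-up details where the post leaves them open (the hash function, the table size, and the choice of a 2-byte context are my assumptions):

```python
HASH_BITS = 8  # assumed table size; the post just says "a large hash table"

def ctx_hash(c1, c2):
    # assumed hash of the last 2 bytes; the post allows 2 or 3
    return (c1 * 31 + c2) % (1 << HASH_BITS)

def mtf_transform(data):
    # one move-to-front list of all 256 byte values per hash bucket
    tables = [list(range(256)) for _ in range(1 << HASH_BITS)]
    c1 = c2 = 0
    out = []
    for b in data:
        t = tables[ctx_hash(c1, c2)]
        i = t.index(b)   # the byte's place in this context's array
        out.append(i)    # often near 0 on repetitive input
        t.pop(i)
        t.insert(0, b)   # move the byte value to the front
        c1, c2 = c2, b
    return out

def mtf_inverse(indices):
    # exact mirror of the forward transform, so the scheme is lossless
    tables = [list(range(256)) for _ in range(1 << HASH_BITS)]
    c1 = c2 = 0
    out = bytearray()
    for i in indices:
        t = tables[ctx_hash(c1, c2)]
        b = t.pop(i)
        t.insert(0, b)
        out.append(b)
        c1, c2 = c2, b
    return bytes(out)
```

on repeated text the second occurrence of each (context, byte) pair yields index 0, which is what makes the index stream compressible.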

one could try using RLE at this point, but better is possible.

likewise, a simplistic (bytewise) entropy coder with RLE:
00=3x 2 bit values;
01=2x 3 bit values;
10=single 6 bit value;
11=6 bit rle length:
0=larger value (stored in next byte);
1=reserved;
2..63=length (repeats last value).
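
my reading of that table, sketched as a greedy packer/unpacker pair (the grouping heuristics, the function names, and capping runs at 63 are my assumptions; only the tag layout comes from the post, and input indices are assumed to fit in a byte):

```python
def pack(indices):
    # greedy encoder for the 2-bit-tag scheme above
    out = bytearray()
    i, last = 0, None
    while i < len(indices):
        # tag 11, payload 2..63: run repeating the last value
        run = 0
        while (last is not None and i + run < len(indices)
               and indices[i + run] == last and run < 63):
            run += 1
        if run >= 2:
            out.append(0b11000000 | run)
            i += run
            continue
        v = indices[i]
        if v > 63:
            # tag 11, payload 0: larger value stored in the next byte
            out.append(0b11000000)
            out.append(v)
            last = v
            i += 1
            continue
        three = indices[i:i + 3]
        if len(three) == 3 and max(three) < 4:
            # tag 00: three 2-bit values in one byte
            out.append((three[0] << 4) | (three[1] << 2) | three[2])
            last = three[2]
            i += 3
            continue
        two = indices[i:i + 2]
        if len(two) == 2 and max(two) < 8:
            # tag 01: two 3-bit values
            out.append(0b01000000 | (two[0] << 3) | two[1])
            last = two[1]
            i += 2
            continue
        # tag 10: a single 6-bit value
        out.append(0b10000000 | v)
        last = v
        i += 1
    return bytes(out)

def unpack(data):
    out, last = [], None
    it = iter(data)
    for b in it:
        tag = b >> 6
        if tag == 0b00:
            out += [(b >> 4) & 3, (b >> 2) & 3, b & 3]
        elif tag == 0b01:
            out += [(b >> 3) & 7, b & 7]
        elif tag == 0b10:
            out.append(b & 63)
        elif (b & 63) == 0:
            out.append(next(it))      # literal larger value follows
        else:
            out += [last] * (b & 63)  # repeat the last value
        last = out[-1]
    return out
```

a quick roundtrip check is `unpack(pack(idx)) == idx`; a run of a repeated index collapses to one RLE byte per 63 repeats.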


now, in any case, we have a crappy entropy coding scheme not using any
bitstreams or arithmetic/range coding.


actually, I may test this just to see how well (or how poorly) it does.