by erpy » Tue, 31 Oct 2006 02:23:46
Hi all,
I have some questions about information theory and binary representation
of the information itself.
Basically I always wondered whether the binary encoding system,
especially if byte-aligned as in most if not all computer architectures,
could waste information to *any* extent - or better - could allow for
more information compared to the byte-aligned system. And this being not
within an entire file, but within the alignment itself - 8bits.
For instance, if one chooses any equivalent byte-representation encoding
system, the total amount of information - possibilities - should be the
same as writing any single byte-aligned piece of information. I imagine
the simplest "equivalent representation", in these terms, that one can
think of is a signed byte (128 + 128 combinations == 1 byte of
combinations), which is basically a 4+4 bits alignment.
Now, there exists, or is it interesting to find, an equivalent
representation system that allows for more information, (not directly
"to compress," but I can assume the concept can be exploited to some
extent for data compression) being the additional information any number
of additional combinations over the 256 of a single byte, but that
actually *always* stores on file a total of 8 bits ?
Would such a system violate the counting theorem anyhow ?
And, if it exists, what benefits, if any, could such an equivalent
representation bring to information theory ? (if not directly to data
compression)
If it could be applied to data compression directly instead, how would
you exploit such properties of the equivalent representation in order to
efficiently compress data ?
In other words, what kind of compression system could benefit from being
able to always encode 1 byte that can represent no more than 8
*additional* combinations than the "common" byte-aligned system ? (i.e.
a modest number of extra "bits" for each "byte")
The simplest I can think of is that if I can store one more "decimal
number" in each extra combination I could avoid storing those in 2 bytes
- i.e. it would be possible to transmit the decimal "256" in one byte
instead of two, and so on - but it doesn't look very useful/efficient to
me, for one reason or the other. (extra encoding information required,
being one)
But since there's plenty of experts here, I'd think any of you can come
up with way more efficient schemes/ideas.
Thank you,
Emanuele.