Let me first introduce myself. My name is Ken Helberg and I've been
reading this newsgroup for over a year simply out of a general interest
in compression; learning what I can when it interests me. With my
limited knowledge, I have little to add to various discussions that
occur in this group. My work also only allows me to read this via
google groups, so I apologize in advance for anything funky that may
occur with this message.
I'm at a point with a particular project where I need to consider
compressing some data in order to make space available for other pieces.
In the system I have a small scale audio DSP that has a 35-bit,
512-word program space. The code is downloaded to the DSP only once at
startup and never changes. The 35-bit values are currently stored as
40-bit values in flash, so the DSP program code takes up 2560 bytes
(512 words x 5 bytes).
If possible, I would like to get that smaller. The tradeoff, of course,
is that whatever code I add to decompress needs to be smaller than the
savings in compressed data. The time it takes to decompress is of
little concern since it's only done once.
The first thing I thought I could do is store the data in 36-bit
chunks, possibly with the upper nibbles of the 36-bit words kept in
their own array so I don't have to swap nibbles for every odd-numbered
word, as I would if I stored them consecutively. Storing the full
35-bit values and shifting the bits around would be too costly in code
just to save one more bit per word over the 36-bit method. This little
compression scheme might take 40 bytes of code, giving a savings of
216 bytes - not much.
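Roughly, I'm picturing the unpack side looking something like this
(shrunk to a 4-word toy table instead of 512 so the layout is visible;
the array names and the nibble order - even word in the low nibble -
are arbitrary choices on my part):

```c
#include <stdint.h>
#include <stddef.h>

/* Toy example with 4 words instead of 512; layout is the same. */
#define NWORDS 4

/* Low 32 bits of each 36-bit word (2048 bytes for the real table). */
static const uint32_t prog_low[NWORDS] = {
    0x12345678u, 0x9ABCDEF0u, 0x00000001u, 0xFFFFFFFFu
};

/* Upper nibbles, packed two per byte (256 bytes for the real table):
   even-numbered word in the low nibble, odd-numbered in the high. */
static const uint8_t prog_hi[NWORDS / 2] = {
    0x21,   /* word 0 -> 0x1, word 1 -> 0x2 */
    0xF0    /* word 2 -> 0x0, word 3 -> 0xF */
};

/* Reconstruct the i-th 36-bit word as a 64-bit value. */
uint64_t get_word(size_t i)
{
    uint8_t nib = (i & 1) ? (uint8_t)(prog_hi[i / 2] >> 4)
                          : (uint8_t)(prog_hi[i / 2] & 0x0F);
    return ((uint64_t)nib << 32) | prog_low[i];
}
```

That's 2048 + 256 = 2304 bytes of tables, which is where the 256-byte
raw savings (216 after the code) comes from.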
Another method I considered: there are various runs of (what look to
be) NOPs that I could take out and pseudo-RLE, storing where each run
of NOPs exists and the length of the run. I could store each run's
offset relative to the start or end of the previous run to try and
save space, but unfortunately one of the gaps between NOP runs is over
256 words, which means I'd still have to store the offset as a 16-bit
value. I could take a small hit in code complexity and pack the length
of the run as well as the offset into a single 16-bit value. With the
current DSP program code I have, this saves approximately an
additional 600 bytes. Understandably, this varies with the program
code, so I'd like to find a better method.
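For those packed 16-bit run entries, the decode side might look
something like this. The 12-bit offset / 4-bit length split is just a
guess at what would fit my data (the real split depends on the largest
gap and run), and NOP here is a placeholder for the real NOP encoding:

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed packing: low 12 bits = offset (literal words since the end
   of the previous run), high 4 bits = NOP run length. */
#define RUN_OFFSET(x) ((uint16_t)(x) & 0x0FFFu)
#define RUN_LENGTH(x) ((uint16_t)(x) >> 12)

#define NOP 0u  /* placeholder for the real NOP word encoding */

/* Expand the run table: copy literal words from src, inserting a NOP
   run at each recorded position. Returns the expanded word count. */
size_t expand_runs(uint64_t *dst, const uint64_t *src, size_t nsrc,
                   const uint16_t *runs, size_t nruns)
{
    size_t di = 0, si = 0;
    for (size_t r = 0; r < nruns; r++) {
        size_t off = RUN_OFFSET(runs[r]);
        size_t len = RUN_LENGTH(runs[r]);
        while (off--)            /* literal words before this run */
            dst[di++] = src[si++];
        while (len--)            /* the NOP run itself */
            dst[di++] = NOP;
    }
    while (si < nsrc)            /* trailing literal words */
        dst[di++] = src[si++];
    return di;
}
```

So for my 512-word program, src would be the non-NOP words and runs
the packed table, with dst filling the full program image.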
...which brings me to why I'm really writing to the group. I've got no
information as to how the opcodes are set up; which bits mean what. I
was curious if I could use a method similar to how the probabilities
are gathered for arithmetic coding to determine the bit-to-bit or
bit-to-multibit correlation. Figuring I could, I wondered if anyone
here has done that and has some sample code I could tweak for my needs.
This information would help me to find where the different pieces of
the opcode are split, allowing me to then analyze each word and the
probability of the separate opcode parts. With the small micro I have,
it would probably be easier to do Huffman with known probabilities than
it would to do arithmetic. With my limited RAM (1K) and processing
capability, I don't believe adaptive is an option, either. But, maybe
you folks have some more insight into this.
Here's a text representation of the DSP program code:
Thanks in advance for any help,