## Generating Huffman codes with maximum length constraint

### Generating Huffman codes with maximum length constraint

The standard algorithm for generating Huffman codes assumes that
the codes can be arbitrarily long. Is there an efficient algorithm
to generate an optimal set of codes subject to the constraint that
no code can be longer than a specified length?

The only algorithm I have found that addresses this problem is the
one in gzip, which does not produce an optimal result. It works in
three passes:

1. Generate an optimal Huffman tree. If no code exceeds the maximum
length, we are done.

2. Modify the tree so that its depth does not exceed max_depth, where
max_depth is the maximum code length. In this step, we don't actually
perform the modifications described below. Instead, we just keep track
of the number of leaves at each depth that would exist if we performed
the modifications.

A. Remove all nodes whose depth is greater than max_depth from the
tree. Store the removed leaf nodes in set S, and discard the
non-leaf nodes.

B. Remove all non-leaf nodes whose depth is max_depth from the
tree, replacing each with a leaf node taken from set S. Remove
those leaf nodes from set S.

C. For each node N1 in set S, choose a leaf node N2 from the tree
such that
1. The depth of N2 is less than max_depth.
2. The depth of N2 is at least as great as that of any other leaf
whose depth is less than max_depth.
Remove N2 from the tree, and replace it with a non-leaf node
whose children are N1 and N2.

3. Count the number of leaves at each depth in the tree produced in
step 2. Sort the leaves by symbol frequency, and assign depths
to the leaves such that the leaves representing the most frequently
occurring symbols have the smallest depth values.

The following example demonstrates that the algorithm used in gzip doesn't
produce optimal results. The first two columns below specify the frequency
counts for a set of symbols. The third column shows the code lengths for
an optimal Huffman tree, assuming that there is no limit on the length of
a code. The fourth column shows the code lengths produced by the gzip
algorithm when the maximum code length is set to three. The last column
shows a set of code lengths which are better than those produced by the
gzip algorithm.

                  -------- lengths --------
symbol  frequency  initial   gzip  optimal
  A         8         1        1      2
  B         7         2        3      2
  C         4         3        3      2
  D         1         4        3      3
  E         1         4        3      3

compressed size:     42       47     44
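To make the comparison concrete, here is a minimal Python sketch of the three passes (a hedged illustration: the function names are mine, and pass 2 follows the histogram-repair loop used in zlib's gen_bitlen(); a real implementation works on tree arrays directly). Run on the frequencies above, it reproduces the "gzip" column:

```python
import heapq
from collections import Counter

def huffman_depths(freqs):
    """Pass 1: ordinary (unrestricted) Huffman code lengths."""
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tag = len(heap)                      # unique tiebreaker for the heap
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tag,
                              {s: d + 1 for s, d in {**d1, **d2}.items()}))
        tag += 1
    return heap[0][2]

def gzip_limited_lengths(freqs, max_len):
    """Passes 2 and 3: clamp the length histogram the way zlib's
    gen_bitlen() does, then hand lengths back out by frequency."""
    depths = huffman_depths(freqs)
    count, overflow = Counter(), 0
    for d in depths.values():
        if d > max_len:
            overflow += 1                # leaf pushed up to max_len
            d = max_len
        count[d] += 1
    while overflow > 0:                  # zlib's histogram repair loop
        bits = max_len - 1
        while count[bits] == 0:
            bits -= 1
        count[bits] -= 1                 # move one leaf one level down...
        count[bits + 1] += 2             # ...making room for an overflow leaf
        count[max_len] -= 1
        overflow -= 2
    # Pass 3: the most frequent symbols get the shortest lengths.
    symbols = iter(sorted(freqs, key=lambda s: -freqs[s]))
    lengths = {}
    for d in sorted(count):
        for _ in range(count[d]):
            lengths[next(symbols)] = d
    return lengths

freqs = {"A": 8, "B": 7, "C": 4, "D": 1, "E": 1}
print(gzip_limited_lengths(freqs, 3))
# reproduces the "gzip" column: A -> 1, B..E -> 3 (47 bits total)
```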

### Generating Huffman codes with maximum length constraint

[snip]

Could you detail what causes such a requirement?
Are there any practical applications for that?
Why doesn't canonical Huffman code suit your application?

Regards,
Alex Vinokur

### Generating Huffman codes with maximum length constraint

Yes, for example for performance reasons. Limiting the code length to 32
bits can improve the performance of encoding/decoding. A Google search
for "huffman length limited" gives some interesting results.

/Jesper Nordenberg

### Generating Huffman codes with maximum length constraint

XXXX@XXXXX.COM (Kenneth Almquist) wrote in

One way, though not the fastest, is to increase the weights of the less
common symbols until you first obtain a shorter tree that matches the
length you are striving for. In this example, start at 1 and go up in
small amounts. Say you reach 2.1; then the frequency counts are
A 8
B 7
C 4
D 2.1
E 2.1

You bring D and E together for a weight of 4.2, then combine that with
the next lowest one, which is C with a weight of 4; the tree now weighs
8.2. The next two lowest nodes are A and B, so they are combined, and
then the last two are merged, giving the following:
symbol length huffman code
A 2 10
B 2 01
C 2 11
D 3 001
E 3 000

If you don't shorten, you get the following, where D and E have a frequency of 1:
symbol length huffman code
A 1 1
B 2 01
C 3 001
D 4 0001
E 4 0000
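A minimal Python sketch of this weight-inflation heuristic (my own rendering; the step size of 0.1 and the tie handling are arbitrary illustrative choices, and the result is not guaranteed optimal in general):

```python
import heapq

def huffman_lengths(weights):
    """Code lengths from an ordinary (unrestricted) Huffman build."""
    heap = [(w, i, {i: 0}) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    tag = len(heap)                      # unique tiebreaker for the heap
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tag,
                              {s: d + 1 for s, d in {**d1, **d2}.items()}))
        tag += 1
    return heap[0][2]

def shorten_by_inflation(weights, max_len, step=0.1):
    """Heuristic from the post above: raise every weight to at least a
    floor t, growing t in small steps, until the ordinary Huffman build
    first yields a tree no deeper than max_len."""
    assert len(weights) <= 2 ** max_len, "no such tree exists"
    t = min(weights)
    while True:
        depths = huffman_lengths([max(w, t) for w in weights])
        if max(depths.values()) <= max_len:
            return [depths[i] for i in range(len(weights))]
        t += step

print(shorten_by_inflation([8, 7, 4, 1, 1], 3))  # -> [2, 2, 2, 3, 3]
```

On the thread's example the floor stops near 2.1, matching the weights shown above.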

David A. Scott
--
My Crypto code
http://www.yqcomputer.com/
http://www.yqcomputer.com/
My Compression code http://www.yqcomputer.com/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said, "any cryptographic
system is only as strong as its weakest link"

### Generating Huffman codes with maximum length constraint

"Alex Vinokur" < XXXX@XXXXX.COM > asked:

Two reasons. First, with a canonical Huffman code, the tree is encoded
by encoding the length of each code. So the amount of space required to
store the tree depends on how much space is required to store the lengths.
Less space is required if you have a known maximum code length.

Second, if the maximum length of a code is reasonably small, you can
perform decoding using a single table lookup. Otherwise, you need a
more complicated scheme using multiple tables.
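To illustrate the single-lookup scheme, here is a hypothetical sketch (my own; the canonical-code convention used is one common choice, not taken from any particular codec). The table has 2^max_len entries, indexed by the next max_len input bits:

```python
def canonical_codes(lengths):
    """Canonical codes from lengths: shorter codes first, ties by symbol.
    Returns {symbol: (code_value, code_length)}."""
    codes, code, prev = {}, 0, 0
    for s in sorted(lengths, key=lambda s: (lengths[s], s)):
        code <<= lengths[s] - prev       # extend to the next code length
        codes[s] = (code, lengths[s])
        code, prev = code + 1, lengths[s]
    return codes

def build_decode_table(lengths, max_len):
    """table[next max_len input bits] = (symbol, bits to consume)."""
    table = [None] * (1 << max_len)
    for s, (code, l) in canonical_codes(lengths).items():
        base = code << (max_len - l)     # all entries sharing this prefix
        for pad in range(1 << (max_len - l)):
            table[base + pad] = (s, l)
    return table

def decode(bits, n_syms, lengths, max_len):
    """Decode n_syms symbols from a string of '0'/'1' characters."""
    table = build_decode_table(lengths, max_len)
    bits += "0" * max_len                # padding so the last peek is in range
    out, pos = [], 0
    for _ in range(n_syms):
        sym, used = table[int(bits[pos:pos + max_len], 2)]
        out.append(sym)
        pos += used
    return "".join(out)

lengths = {"A": 2, "B": 2, "C": 2, "D": 3, "E": 3}
# canonical codes here: A=00, B=01, C=10, D=110, E=111
print(decode("00" + "110" + "01", 3, lengths, 3))  # -> ADB
```

The inner loop of decode is one table index and one addition, which is the kind of body a compiler can inline.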

I am trying to read and write data structures. To read in a data
structure, I write a parser which calls get_next_symbol(encoding)
when it's ready to process the next symbol. The "encoding" parameter
specifies the Huffman codes used to represent the symbol; different
symbol encodings are used at different points in the parsing process.
The point is that get_next_symbol is called in a lot of places. I
would like it to be simple enough that the compiler will inline it.
Kenneth Almquist

### Generating Huffman codes with maximum length constraint

Thanks. The "package-merge" algorithm by Lawrence Larmore and Daniel
Hirschberg runs in O(n*L) time, where n is the number of symbols
and L is the maximum code length.[1] There's also the "warm-up"
algorithm, which is not always optimal but produces optimal results
in many cases and has a provable worst-case bound.[2, 3]

In the case of gzip, replacing the existing algorithm with an optimal
algorithm such as package-merge produces a minimal benefit. The best
result reported in [4] is that using the package-merge algorithm
reduced the size of a compressed file from 1,817,536 bytes to 1,817,533
bytes, a savings of 0.00017%. On the other hand, in the example I gave at
the start of the thread, using an optimal algorithm reduced the size
of the compressed data by 6.4%, so there are values of L and n for
which using an optimal algorithm will produce a significant benefit.
Kenneth Almquist

[1] A Fast Algorithm for Optimal Length-Limited Huffman Codes, by
Lawrence L. Larmore and Daniel S. Hirschberg
<http://www.yqcomputer.com/~dan/pubs/LenLimHuff.ps.gz>

[2] The WARM-UP Algorithm: A Lagrangean Construction of Length
Restricted Huffman Codes, by Ruy Luiz Milidiu, Eduardo Sany Laber
<ftp://ftp.inf.puc-rio.br/pub/docs/techreports/96_15_milidiu.ps.gz>

[3] Efficient Implementation of the WARM-UP Algorithm for the
Construction of Length-Restricted Prefix Codes, by
Ruy Luiz Milidiu, Artur Alves Pessoa, Eduardo Sany Laber

[4] Improved GNU gzip Data Compression, by Roger Sayle
<http://www.yqcomputer.com/>
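For reference, the basic (unoptimized) package-merge algorithm of [1] is compact enough to sketch; this is my own Python rendering, not code from the paper. On the example at the start of the thread it produces the optimal lengths:

```python
def package_merge(weights, L):
    """Optimal length-limited code lengths (Larmore-Hirschberg).
    Minimizes sum(w_i * l_i) subject to l_i <= L and Kraft's inequality."""
    n = len(weights)
    assert n <= 2 ** L, "no prefix code with n codewords of length <= L"
    if n == 1:
        return [1]
    # An item is (weight, tuple of symbol indices it covers).
    leaves = sorted((w, (i,)) for i, w in enumerate(weights))
    items = leaves
    for _ in range(L - 1):
        # Package adjacent pairs (dropping an odd item), then merge with leaves.
        packages = [(items[j][0] + items[j + 1][0],
                     items[j][1] + items[j + 1][1])
                    for j in range(0, len(items) - 1, 2)]
        items = sorted(leaves + packages)
    # Take the 2n-2 cheapest items; every appearance of symbol i in one
    # of them deepens its codeword by one level.
    lengths = [0] * n
    for _, syms in items[:2 * n - 2]:
        for i in syms:
            lengths[i] += 1
    return lengths

print(package_merge([8, 7, 4, 1, 1], 3))  # -> [2, 2, 2, 3, 3]
```

With L = 4 the length limit is slack for this input and the same routine returns the unrestricted Huffman lengths 1, 2, 3, 4, 4.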

### Generating Huffman codes with maximum length constraint

See

[1] Lawrence L. Larmore, Daniel S. Hirschberg "A Fast Algorithm for
Optimal Length-Limited Huffman Codes"

[2] Andrew Turpin, Alistair Moffat "Practical Length-Limited Codes
for Large Alphabets"

In [1] an optimal O(nL)-time algorithm is presented, where n is the
symbol count and L the maximum code word length. This algorithm is
further optimized in [2] in terms of memory demand and speed.

The problem "OLBPC" (optimal length-bounded prefix code) can be
reduced to the CCP (Coin Collector's Problem), which is in turn similar
to the knapsack problem but can be solved much faster.

The optimized algorithm presented in [2] is nearly as fast as an
optimized Huffman algorithm (still O(nL) time complexity, but much
faster than the first version by Larmore and Hirschberg).

This optimized algorithm looks surprisingly simple, though you won't
understand why and how it works by looking at the pseudo code alone;
you have to study these papers and their proofs.

HTH,
Sebastian

### Generating Huffman codes with maximum length constraint

Oops... sorry, I didn't see this message.
Yeah, the package-merge algorithm is the way to go. :)

You might be interested in the optimization of Turpin and Moffat as
well. It's a very fast version of the package-merge algo.

Ghis!
Sebastian

### Generating Huffman codes with maximum length constraint

"Kenneth Almquist" < XXXX@XXXXX.COM > wrote in message news:WKr6d.3195\$OX.2510@trndny07...
[snip]

In the context of this question, several kinds of criteria for building Huffman trees can be considered.
Here are some of them:
1) Minimum cost (weighted external path length of the tree) - the criterion of the canonical Huffman algorithm;
2) Limit on the height of the tree;
3) Limit on the sum of codeword sizes.

Minimum cost is a Data Transmission criterion;
limits on height and sum are Data Storage criteria.

These criteria can be combined into a triple-criterion.

Here is a table of possible triple-criteria for Huffman codes.

Table 1. Huffman algorithm triple-criteria
----------------------------------------------------------------------------------
| Criterion : Triple-criterion : Note |
| : priorities P[i] : |
| :--------------------------------------: |
| : P[1] : P[2] : P[3] : |
|--------------------------------------------------------------------------------|
| TC-0 : Cost : No : No : Canonical Huffman algorithm |
| TC-1 : Cost : Height : Code-Sizes : Canonical Huffman algorithm |
| TC-2 : Cost : Code-Sizes : Height : Canonical Huffman algorithm |
| TC-3 : Height : Cost : Code-Sizes : Limited Length |
| TC-4 : Height : Code-Sizes : Cost : Limited Length & Code-Sizes |
| TC-5 : Code-Sizes : Cost : Height : Limited Code-Sizes Code |
| TC-6 : Code-Sizes : Height : Cost : Limited Code-Sizes & Length |
----------------------------------------------------------------------------------
Note. 'Cost' means 'Huffman tree cost'
'Code-Sizes' means 'Sum of codeword sizes'

For instance, TC-3 means: while building a Huffman tree,
first of all one should limit the height of the tree,
after that minimize the cost of the tree,
after that minimize the sum of codeword sizes.

Of course, the Huffman trees built for different triple-criteria are usually quite different.

How does one build Huffman trees for the different triple-criteria?
The n-ary Huffman Template Algorithm was used to do that.
That algorithm can be seen at
* http://alexvn.freeservers.com/s1/huffman_template_algorithm.html
* http://sourceforge.net/projects/huffman-ta/

The Huffman Template Algorithm is not aimed at _efficiently_ generating Huffman trees.
It is aimed
* at using the _canonical_ Huffman algorithm with non-numerical weights as well,
and
* at producing _comprehensive_ output.

To build Huffman trees for the different triple-criteria, an extended weight was used.
An extended weight is a pair <actual weight, delta weight>.
The actual weight is the ordinary weight (of a symbol); the delta weight is generated during the build loop.

Here is a test set of symbols and their (actual) weights.

A 1
B 1
C 1
D 2
E 3
F 7
G 10
H 13
I 15
J 17
K 26
L 35

Table 2. Several Huffman codes for the set above.
-----------------------------------------------------------------
| Method : Cri- : Tree : Tree cost : Code- : Note |
| : te- : Height :----------------: Sizes : |
| : rion : : Actual : Delta : : |
|---------------

### Generating Huffman codes with maximum length constraint

XXXX@XXXXX.COM (Kenneth Almquist) wrote in

Actually, this is not true. There are several ways to store the tree
if one must. Storing the lengths is not the best way, not even close. As
a simple example, suppose you wish to store the tree for a file that has
the basic 256 symbols, where roughly half the symbols are used and only
three code lengths occur, say 10 bits, 8 bits, and 4 bits. You need only
store 00 for "not in use", 01 for the shortest length, 10 for the next
length, and 11 for the longest length. This carries exactly the same
information but requires less space than the actual lengths; there would
be only one way to construct the tree, so storing the actual lengths
would be a waste.

You could also use a fixed-length permutation table, or a variable-length
permutation table if one wanted, which would save even more space. You
could even do it completely bijectively to the file space, though that
would be harder to code; if needed, I suspect an adaptive approach would
be better most of the time.

As computers get faster, and since memory only seems to get cheaper,
this should seldom be a concern.

David A. Scott

### Generating Huffman codes with maximum length constraint

Thanks. The search results are very interesting.
Are there any available program implementations of building length-limited Huffman codes?

--
Alex Vinokur
email: alex DOT vinokur AT gmail DOT com

### Generating Huffman codes with maximum length constraint

[snip]

Do those algorithms (Larmore and Hirschberg's; Turpin and Moffat's) produce optimal results?

--
Alex Vinokur

### Generating Huffman codes with maximum length constraint

On Mon, 8 Nov 2004 15:55:07 +0200, "Alex Vinokur"

Although we know that standard Huffman coding is optimal, the
strict optimality of achieving the Shannon entropy holds, in general,
if and only if the length of the source sequence is infinite.

Thus, if you describe in more detail what you mean by "optimal
results", it will probably be easier for him to explain his statement again.

- James ( XXXX@XXXXX.COM )

### Generating Huffman codes with maximum length constraint

Given: a set of symbols and their weights; the size of the set is 'n'.
The size of a tree is the number of its leaf nodes.

Definition 1. A binary tree T of size 'n' is optimal if T has minimum weighted path length over all possible trees of size 'n' with
the same set of leaf nodes.
The canonical Huffman algorithm builds an optimal binary tree.

Definition 2. A length-limited binary tree T of size 'n' and maximum height 'm' is optimal if T has minimum weighted path length
over all possible trees of size 'n' and maximum height 'm' with the same set of leaf nodes.

The question is: do Larmore and Hirschberg's algorithm and Turpin and Moffat's algorithm produce optimal binary trees (optimality
meant according to Definition 2)?

--
Alex Vinokur

### Generating Huffman codes with maximum length constraint

Yes. These algorithms' solutions of code word lengths l=(l_1, l_2,
..., l_n) minimize the weighted code word length

\sum_{i=1}^{n} w_i l_i // w_i being positive weights

while satisfying

1) \sum_i 2^{-l_i} <= 1 // so you can build a prefix-free code
2) \forall i: l_i <= L // code word lengths are bounded

Sometimes there is more than one optimal solution (all with the same
weighted code word length), but there is never a better one. So
it's an optimal solution.
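For small alphabets these conditions are easy to check by brute force. A throwaway Python sketch (mine, purely illustrative) that enumerates all bounded length vectors and confirms the 44-bit optimum for the example at the start of the thread:

```python
from itertools import product

def best_limited_lengths(weights, L):
    """Brute-force the optimality conditions above: try every length
    vector with 1 <= l_i <= L, keep those satisfying Kraft's inequality
    sum 2^-l_i <= 1, and return (cost, lengths) for the cheapest."""
    best = None
    for ls in product(range(1, L + 1), repeat=len(weights)):
        if sum(2.0 ** -l for l in ls) <= 1.0:          # prefix code exists
            cost = sum(w * l for w, l in zip(weights, ls))
            if best is None or cost < best[0]:
                best = (cost, ls)
    return best

# weights from the example at the start of the thread, L = 3
print(best_limited_lengths([8, 7, 4, 1, 1], 3))  # -> (44, (2, 2, 2, 3, 3))
```

This is exponential in the alphabet size and only serves to verify small cases; package-merge gives the same answer in O(nL).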

Ghis!
Sebastian