[added comp.compression]

This is actually a data compression question. Suppose you take a large

statistical sample of input strings and group them into equivalence classes

by hash value. In order to minimize the number of collisions, each

equivalence class should have about the same number of elements. This is the

same property that a good data compression algorithm has, if you group by a

fixed-length substring of the compressed output. Many compression algorithms

violate your ordering constraint, but order-preserving compression is not a

hard problem. The best way to construct your hash function (ignoring

performance issues) is to take the first N bits of the output of a good

order-preserving compressor applied to the input string.

Note that it's unavoidable that your hash values will only depend on the

first few bytes of the input for most inputs. E.g. for English text you

can't do better than about 8:1 compression, which means that an N-bit hash

will typically depend on only the first 8N bits of the input.

If your input strings are something like ASCII text, and you want a

reasonably fast hash function, you could try a static order-0 Huffman model.

Question for comp.compression people (since I'm curious): do any of the

algorithms mentioned at maximumcompression.com preserve lexicographic order?

-- Ben

Most of the PAQ algorithms, when used without preprocessing transforms,

preserve lexicographic order, or in some versions, reverse

lexicographic order. FPAQ0 does also, using a simpler model.

-- Matt Mahoney

1. Order-preserving hash of strings

2. Preserve insert order in a Hash

3. Creating an array of hashes from a database while preserving order

4. Printing the output of a hash of hashes in insertion order

5. Hash#values and Hash#keys order

6. Hash#keys, Hash#values order question

7. pgsql-server: Add: > * Order heap pointers on hash index pages by hash

8. Perl hash and rehash seeds; deterministic hash ordering

9. Order-preserving set difference

10. How to Display DropDownList with preserved order (custom order)

11. Change Legend Order Yet Preserve Plot Order

12. Hash of hashes, of hashes, of arrays of hashes

13. Hash#select returns an array but Hash#reject returns a hash...

14. Iterating over a hash of hash of hashes

15. How to print hashes consisting of hashes of hashes----

2 post • Page:**1** of **1**