Suppose I believe that every n-bit binary sequence is equally likely.

Now, suppose I observe a new sequence s. Can I quantify how consistent

that sequence is with my belief?

There are various tests I can do

1. Count number of 1's in the sequence. If number of 1's is

considerably higher than expected, then my belief is probably wrong

2. Count number of runs and if there are too many or too few runs of

1's, then this sequence is not very consistent with my belief.

3. Say that because any string is equally likely, then observing

111......1111 is fully consistent with my prior belief, and there's no

reason to update it.

So what would be the Bayesian way to choose between these approaches?

Yaroslav

This is a prior. One could call it the null.

Yes, you can use the likelihood of your model.

You're comparing your null hypothesis here with various alternative

hypotheses:

Alternative = Binomial

Alternative = HMM

Right. It depends, however, on whether you consider the string to be a

sequence or a sample. In 1) it is a sample (each 0 or 1 is an event),

in 2) it is a sequence (each 0 or 1 is an event in the context of the

previous event) and in 3) the whole sequence is a single event.

Aleks

--

mag. Aleks Jakulin

http://www.yqcomputer.com/

Artificial Intelligence Laboratory,

Faculty of Computer and Information Science,

University of Ljubljana, Slovenia.

>

OK, so the question here -- which representation should I choose?

The criteria here is to choose a prior consistent with my beliefs. So

I believe that true distribution is uniform, and want p(data).

A uniform distribution, in my belief:

1. Assigns low probability to strings with too many or too few ones

2. Assigns low probability to strings with too many or too few runs

3. Generally assigns low probability to strings with too much order --

the ones that can be compressed a lot

Ignoring belief 3 for now, I can come up with a good representation of

belief 1 (Binomial distribution), and good representation of belief 2

(runs test). But how would I combine these two distributions? The two

aren't really independent since lots of 1's mean few runs.

Belief 1 and belief 2 are really special cases of belief 3. Counting

number of 1's or number of runs in a string, is an approximation of

compressibility. General compressibility requires finding smallest

Turing Machine that generates string, and is uncomputable. The

practical question is, which other good approximation of

compressibility exist, and how do we combine them to form a consistent

prior?

OK, so the question here -- which representation should I choose?

The criteria here is to choose a prior consistent with my beliefs. So

I believe that true distribution is uniform, and want p(data).

A uniform distribution, in my belief:

1. Assigns low probability to strings with too many or too few ones

2. Assigns low probability to strings with too many or too few runs

3. Generally assigns low probability to strings with too much order --

the ones that can be compressed a lot

Ignoring belief 3 for now, I can come up with a good representation of

belief 1 (Binomial distribution), and good representation of belief 2

(runs test). But how would I combine these two distributions? The two

aren't really independent since lots of 1's mean few runs.

Belief 1 and belief 2 are really special cases of belief 3. Counting

number of 1's or number of runs in a string, is an approximation of

compressibility. General compressibility requires finding smallest

Turing Machine that generates string, and is uncomputable. The

practical question is, which other good approximation of

compressibility exist, and how do we combine them to form a consistent

prior?

My post assumes that you're interested in sequences.

Well, you can always have a mixture of two models. Like:

P(X_t|X_t-1, Theta) = theta_p P(X_t) + theta_q P(X_t | X_t-1)

here theta_p + theta_q = 1. Note that you need to model the two

submodels too, but I didn't list the parameters.

Anyway, this kind of stuff is in the domain of context modelling. With

additional data, the context "grows" in your model. For starters, you

could examine an application of context-modelling to the JPEG-LS image

compression standard:

http://www.yqcomputer.com/

http://www.yqcomputer.com/

and

Applications of universal context modeling to lossless compression of

gray-scale images.

Marcelo J. Weinberger, Jorma Rissanen, and Ron Arps.

IEEE Trans. Image Processing, 5(4):575--586, April 1996.

There are more papers referenced at

http://www.yqcomputer.com/

In addition to the universal modelling work started by Rissanen (and

of which JPEG-LS is a special case), Marcus Hutter told me that he

knows how to come up with these priors: http://www.yqcomputer.com/ ~marcus

Sorry, the references I'm giving you do not address your concerns

directly, but the questions about sequences you're asking have really

been studied far more in information theory than in statistics.

--

mag. Aleks Jakulin

http://www.yqcomputer.com/

Artificial Intelligence Laboratory,

Faculty of Computer and Information Science,

University of Ljubljana, Slovenia.

1. SuperCool random number generator - random sampling, random selection, random name generator

2. Whats the Edgeworth expansion of a random vector?

3. math.random.whats wrong with my code?!!

4. GENERATE RANDOM NUMBERS BUT EXCLUDE A NUMBER IN THE SEQUENCE

5. How can I match a random number with closest number from sequence?

7. Passing random-seed down to vi-MLS Sequence Waveform

8. Will this function yield a random sequence?

9. generate a sequence of random numbers

10. Updating records with a random sequence

11. Generating a random sequence of alpha-numeric characters.

12. random sequence of numbers

13. Generate random sequence based on probability

4 post • Page:**1** of **1**