Hi,

I'm trying to fit my data to one of the well-known distribution functions

using kstest. What I'd like to say at the end is something like

"The Gamma distribution passed the Kolmogorov-Smirnov test at the 90%

confidence level for my data" (the Gamma distribution and the 90% value

were arbitrarily chosen).

Please take a look at the following code.

I fitted three different distributions to my data 'xx' using

normfit, gamfit, and weibfit, and tested each fit as shown in the code.

'alpha' is the significance level, and what exactly it means here is

part of my question.

% Normal: fit, then K-S test against the fitted CDF

[norm_para(1), norm_para(2)] = normfit(xx);

[H(1),P(1),KSTAT(1),CV(1)] = kstest(xx', [xx' normcdf(xx,norm_para(1),norm_para(2))'], alpha);

% Gamma

gam_para = gamfit(xx);

[H(2),P(2),KSTAT(2),CV(2)] = kstest(xx', [xx' gamcdf(xx,gam_para(1),gam_para(2))'], alpha);

% Weibull

weib_para = weibfit(xx);

[H(3),P(3),KSTAT(3),CV(3)] = kstest(xx', [xx' weibcdf(xx,weib_para(1),weib_para(2))'], alpha);

First, I set alpha = 0.05 and ran the code.

The result is H=[1 1 0], P=[0.0127 0.0121 0.2041].

Second, with alpha = 0.01,

H=[0 0 0], P=[0.0127 0.0121 0.2041].

What can I conclude from this?

Does it mean that the third distribution (Weibull) fits best?

(Why? Because it has the largest p-value?)

Can I simply conclude that the distribution with the largest p-value is

the best fit?

What is the confidence level in this case? (What is the link between

the p-value, the confidence level, and the significance level?)

Thank you in advance.

Haewoon

Hi Haewoon -

A couple of things:

1) P-values from the one-sample K-S test are only valid if the distribution

you are testing against is _fully known in advance_. They are _not_ valid if

you test against a distribution that you have estimated from data. The

p-values in that case are typically "too large", because you are testing

against a distribution that is "too close" to your data. This will be

discussed in any stats text that describes the K-S test. That being said, it

is not unreasonable to use the p-values as a relative measure of goodness of

fit, as long as you don't interpret them too seriously as an absolute

indicator of fit.

2) The normal and the (logged) Weibull are both location-scale families, so it

is possible to get "correct" p-values. In the case of the normal, you can use

LILLIETEST. For the Weibull, it's not too hard to do something similar with a

Monte-Carlo test.

3) When choosing between distributions, and deciding on goodness of fit, it's

always a good idea to make CDF plots of your data against the fitted

distribution. The K-S test gives you a number; a good plot gives you the

whole story. If you have R14, a new GUI in the Stats Toolbox, called

DFITTOOL, is a big help in that direction. Or, use ECDF and STAIRS and

overlay the fitted CDF values on that.

4) If you have a recent release (R13 with Service Pack 1, or R14), you should

use WBLFIT and WBLCDF instead of WEIBFIT and WEIBCDF. The first two use a

somewhat better parameterization for the distribution.
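
As a rough sketch of points 2) to 4), the following shows one way a Monte-Carlo version of the K-S test for a fitted Weibull might look, together with the ECDF/STAIRS overlay. It is an illustration under assumptions, not a definitive recipe: the data 'xx' is simulated here only so the script runs on its own, and the names nSim, ksObs, ksSim, pMC and hMC are made up for this example. It uses the positional kstest(x, CDF, alpha) form from the question; newer releases may prefer name-value arguments. A test rejects when its p-value falls below alpha, which is why H changed between alpha = 0.05 and alpha = 0.01 in the question while P stayed the same.

```matlab
% Sketch only: xx is assumed to be a vector of positive observations.
% It is simulated here purely as a stand-in for the real data.
xx    = wblrnd(2, 1.5, 1, 100);   % hypothetical data, replace with your own
alpha = 0.05;                     % significance level (arbitrary choice)
nSim  = 1000;                     % number of Monte-Carlo replicates (arbitrary)

x = xx(:);                        % force column orientation
n = numel(x);

% Fit with the newer parameterization: p(1) = scale, p(2) = shape.
p = wblfit(x);

% Observed K-S statistic against the fitted CDF. The p-value returned here
% (pNaive) is not trustworthy, because the parameters were estimated from
% the same data that is being tested.
[hNaive, pNaive, ksObs] = kstest(x, [x wblcdf(x, p(1), p(2))], alpha);

% Reference distribution: simulate from the fitted model, refit each
% sample, and recompute the statistic exactly as was done for the data.
ksSim = zeros(nSim, 1);
for k = 1:nSim
    xs = wblrnd(p(1), p(2), n, 1);
    ps = wblfit(xs);
    [hTmp, pTmp, ksSim(k)] = kstest(xs, [xs wblcdf(xs, ps(1), ps(2))], alpha);
end

% Monte-Carlo p-value: share of simulated statistics at least as large as
% the observed one (with the usual +1 correction).
pMC = (1 + sum(ksSim >= ksObs)) / (nSim + 1);
hMC = pMC < alpha;                % 1 = reject the Weibull at level alpha

% Visual check: empirical CDF as a staircase, fitted CDF overlaid.
[F, xe] = ecdf(x);
stairs(xe, F);
hold on
xg = linspace(min(x), max(x), 200);
plot(xg, wblcdf(xg, p(1), p(2)), 'r-');
hold off
legend('Empirical CDF', 'Fitted Weibull CDF', 'Location', 'SouthEast');
xlabel('x');
ylabel('Cumulative probability');
```

Comparing pNaive with pMC gives a feel for how much the naive p-value overstates the fit, and the plot shows where along the range any mismatch occurs.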

Hope this helps.

- Peter Perkins

The MathWorks, Inc.
