clustering and data variance question

Post by ozgun.harm » Tue, 13 May 2008 00:19:07


Hello,
We have been clustering data to compare samples generated by two
different methods. One method is used to generate a sample x_1; we
then cluster x_1 using diana in R and choose the optimal clustering
by maximizing the Calinski-Harabasz index (as calculated by R). diana
(DIvisive ANAlysis) is a hierarchical divisive clustering method that
computes a tree, or dendrogram.
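
For readers who want to see what is being maximized, here is a minimal
sketch of the Calinski-Harabasz index in plain Python. It is not the R
routine used above. The function names and the toy data are
hypothetical, and it assumes Euclidean distance and that cluster
labels have already been obtained (for example, by cutting the diana
tree at some level).

```python
from collections import defaultdict


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(points, labels):
    """CH = [B / (k - 1)] / [W / (n - k)], where B is the between-cluster
    and W the within-cluster sum of squares."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    n, k = len(points), len(groups)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= k < n clusters")
    overall = centroid(points)
    between = 0.0
    within = 0.0
    for members in groups.values():
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    if within == 0.0:
        raise ValueError("within-cluster sum of squares is zero")
    return (between / (k - 1)) / (within / (n - k))


# Toy data (made up): two tight, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]
bad_labels = [0, 1, 0, 1, 0, 1]

ch_good = calinski_harabasz(points, good_labels)
ch_bad = calinski_harabasz(points, bad_labels)
```

A labeling that matches the two groups gives a much larger index than
one that mixes them, which is why the index is maximized over
candidate cuts of the tree.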

Our hypothesis is that one method should generate data that is less
scattered, meaning that cluster analysis should yield fewer clusters
for that method.

However, when we ran the cluster analysis on the generated samples,
we saw no clear difference in the number of clusters between the two
methods. But when I look at the trees generated by diana, it is
obvious to me that the method we expect to have fewer clusters shows
less spread in its tree.

I am thinking that we should also use the variance of the data within
the clusters, in addition to the number of clusters, to compare the
sampling methods. However, I could not find a theoretically grounded
way to do that. Could you suggest ideas, papers, or books I could
follow up on for this problem?
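
To make the idea concrete, here is a rough sketch of one possible
descriptive measure: the pooled within-cluster variance, W / (n - k),
where W is the within-cluster sum of squared distances to the cluster
centroids. This is only an illustration and not a theoretically
justified test. The function name, the samples, and the labels below
are all hypothetical, and Euclidean distance is assumed.

```python
from collections import defaultdict


def pooled_within_variance(points, labels):
    """Within-cluster sum of squares divided by (n - k)."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    n, k = len(points), len(groups)
    if n <= k:
        raise ValueError("need more points than clusters")
    within = 0.0
    for members in groups.values():
        dim = len(members[0])
        # Centroid of this cluster, coordinate by coordinate.
        c = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        within += sum(
            sum((x - y) ** 2 for x, y in zip(p, c)) for p in members
        )
    return within / (n - k)


# Made-up one-dimensional samples from two hypothetical methods, each
# split into the same number of clusters.
sample_a = [(0.0,), (0.2,), (5.0,), (5.2,)]
sample_b = [(0.0,), (1.0,), (5.0,), (6.0,)]
labels = [0, 0, 1, 1]

var_a = pooled_within_variance(sample_a, labels)
var_b = pooled_within_variance(sample_b, labels)
```

Here both samples have two clusters, so the cluster count cannot tell
them apart, but var_a is smaller than var_b because sample_a is less
scattered inside its clusters.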

I hope this makes sense.
Arif.
 
 
 

1. clustering-variance question

2. Question on Variances specially Duration Variance

We have just implemented Project Server 2003.
I would like to know when the Duration Variance field (and the other
variance fields, for that matter) is calculated by the system.
In the resource tab in PWA, the system can provide a list of
tasks assigned to a resource, including information such as when the task
was last updated and the task's duration variance. To see how the data in
that view changes, I created a small project with tasks assigned to me. I
noticed that the value for the duration variance remained blank even after I
approved the updates I had submitted.
But when I added the Duration Variance column to the project plan in MS
Project, the column contained values for each task.
Can anyone explain the reason for the difference?
Thank you very much in advance.

3. variance of a cluster

4. minimum-variance clustering methods

5. two questions: and's inside if's and checking the return code of a program

6. Wireless Authentication ands Ecryption Question

7. related question Question about computing a "moving variance"

8. Pivot Tables - Variance and Variance %

9. Show Start Variance and Finish Variance w graphical indicators

10. Pivot Tables - Variance and % Variance fields

11. Show Start Variance and Finish Variance w graphical indicators?

12. how does each row, column's variance related with the matrix variance?

13. Cost Variance and Work Variance (Resource) Not Correctly Computed?

14. Cost Variance and Work Variance (Resource) Not Correctly Compu

15. Negative Variance (for Duration Variance)