Hi,

I've got a two-wave (i.e., PRE/POST) study. 100 subjects filled out

the same measures before and after a course of treatment. The measure

I'm interested in here is a scale consisting of 20 items.

Wave 1 OF STUDY: I've extracted, rotated, and interpreted three

factors from the 20 items administered in Wave 1, using common factor

analysis (i.e., not principal components analysis), and I would like

to compute factor scores. This is easy enough to do in SPSS.

Wave 2 OF STUDY: I would now like to compute factor scores using the

Wave 2 administration of these same 20 items, but instead of running

another factor analysis, I need to "assume" the same factor structure

as was estimated for Wave 1, because my ultimate goal is to compute

DIFFERENCE SCORES for each of the 100 subjects. My reasoning is that

if two sets of factor scores from SEPARATE factor analyses are

subtracted from each other to form difference scores, it would be like

subtracting apples from oranges instead of apples from apples, to use

an extremely crude metaphor.

My question is: How do I calculate Wave 2 factor scores using the raw

scores at Wave 2 and the factor score coefficient matrix output from

the abovementioned Wave 1 factor analysis? I am not familiar with the

SPSS matrix language and am wondering if there's a way to do it

without resorting to it.

Thanks!

Of course a lot depends on the overall purposes of your project. What

happened to the respondents between the study waves?

3 scales from 20 items may be a little iffy. What did you use as a

stopping rule?

How did the eigenvalues obtained compare to the eigenvalues from a

parallel analysis?

I would suggest the conventional scale construction approach.

Common factor analysis is conventional, using varimax rotation and

keeping only items that load cleanly on the scales that represent the

factors. That way you get divergent validity.

With SPSS it helps to use the options to sort the items in the order

they load on the factors and to suppress loadings under .3 or so.

Get the scores with something like:

compute scalescore1 = mean(item11, item3, item17, item20).

compute scalescore2 = mean(item12, item4, item18, item1).

compute scalescore3 = mean(item13, item5, item19).

Using unit weights creates a clearer interpretation of the meaning of a

score, and provides a scoring key that is usable at other times with

less capitalization on chance.

Art Kendall

Social Research Consultants
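For readers working outside SPSS, here is a minimal Python sketch of the unit-weight scoring described above. The item-to-scale key is copied from the compute statements; the dictionary layout, the function names, and the sample responses are assumptions made up for illustration. The same key is applied at PRE and POST, so the difference scores compare like with like.

```python
from statistics import mean

# Scoring key taken from the compute statements above (item numbers only).
SCORING_KEY = {
    "scalescore1": [11, 3, 17, 20],
    "scalescore2": [12, 4, 18, 1],
    "scalescore3": [13, 5, 19],
}

def scale_scores(responses):
    """Unit-weighted scale scores for one subject at one wave.

    `responses` maps item number -> raw item score (hypothetical layout).
    """
    return {
        scale: mean(responses[item] for item in items)
        for scale, items in SCORING_KEY.items()
    }

def difference_scores(pre, post):
    """POST minus PRE on every scale, using the same key at both waves."""
    pre_scores, post_scores = scale_scores(pre), scale_scores(post)
    return {scale: post_scores[scale] - pre_scores[scale] for scale in SCORING_KEY}

# Made-up responses for a single subject: every item is 2 at PRE and 3 at POST.
pre = {item: 2 for item in range(1, 21)}
post = {item: 3 for item in range(1, 21)}
print(difference_scores(pre, post))
```

Because the key is fixed, nothing is re-estimated at Wave 2, which is the point of the unit-weight approach.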

On Feb 10, Art Kendall < XXXX@XXXXX.COM > wrote:

> compute scalescore3 = mean(item13, item5, item19).

> Using unit weights creates a clearer interpretation of the meaning of a

> score, and provides a scoring key that is usable at other times with

> lessened capitalization on chance.

> Art Kendall

> Social Research Consultants

I agree with Art: the conventional scale construction approach usually

leads to results that are far more likely to hold up under cross-

validation.

Fusto says: "My reasoning is that

if two sets of factor scores from SEPARATE factor analyses are

subtracted from each other to form difference scores, it would be like

subtracting apples from oranges instead of apples from apples, to use

an extremely crude metaphor."

This is exactly the problem with factor scores derived from exploratory

factor analysis: they are way too dependent upon the unique

characteristics of your measurement sample. This is especially true

since your number of cases is quite small compared to the number of

items under analysis.

By far the best approach would be to construct your scales based on

theoretical considerations. In that case you can do a confirmatory

factor analysis, which is less prone to sampling error. A very simple

technique for confirmatory factor analysis is the MultiGroup centroid

method (MGM). Basically it boils down to comparing the corrected

item-total correlation for an item to the correlations of that item with

the scales it does not belong to (in theory).
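A rough Python sketch of the comparison just described, for anyone who wants to try it by hand. This only illustrates the idea (the corrected item-total correlation against correlations with the other scales' totals) and is not a full MGM implementation. The function names, the two-scale key, and the six-subject data set are all made up for the example.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def scale_total(data, items, skip=None):
    """Per-subject sum over `items`, optionally leaving one item out."""
    used = [i for i in items if i != skip]
    n_subjects = len(data[used[0]])
    return [sum(data[i][s] for i in used) for s in range(n_subjects)]

def mgm_check(data, key, item, own_scale):
    """Return (corrected item-total r, {other scale: r}) for one item."""
    own_r = pearson(data[item], scale_total(data, key[own_scale], skip=item))
    other_r = {
        name: pearson(data[item], scale_total(data, items))
        for name, items in key.items()
        if name != own_scale
    }
    return own_r, other_r

# Hypothetical data: item name -> scores of six subjects.
data = {
    "a1": [1, 2, 3, 4, 5, 6], "a2": [2, 1, 4, 3, 6, 5], "a3": [1, 3, 2, 5, 4, 6],
    "b1": [3, 1, 2, 3, 1, 2], "b2": [2, 1, 3, 2, 1, 3], "b3": [3, 2, 1, 3, 2, 1],
}
key = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}

own_r, other_r = mgm_check(data, key, "a1", "A")
print(own_r, other_r)
```

An item supports its theoretical scale when the first number is clearly larger than every value in the second.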

Another thing: since you are computing difference scores, and your

number of items per factor is quite small, and your sample size is

quite modest, you might be running into power problems here. Mind that

the standard error of a difference score is about 1.4 (square root of

two) times as large as the standard error of the individual factor

scores, assuming independent and equally large errors at both waves!
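The square-root-of-two figure follows from the usual variance rule for a difference of two scores. A small Python check, assuming equal standard errors at both waves and uncorrelated errors by default (the function name and the `error_corr` parameter are just for illustration):

```python
from math import sqrt

def se_difference(se_pre, se_post, error_corr=0.0):
    """Standard error of (post - pre) given each score's standard error."""
    return sqrt(se_pre**2 + se_post**2 - 2 * error_corr * se_pre * se_post)

# Equal standard errors and uncorrelated errors give the sqrt(2) inflation.
ratio = se_difference(1.0, 1.0) / 1.0
print(round(ratio, 3))  # 1.414
```

If the errors at the two waves are positively correlated, the inflation is smaller than 1.4.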

On Mon, 9 Feb 2009 21:43:29 -0800 (PST), Fusto < XXXX@XXXXX.COM > wrote:

I agree with the posters who say that you will probably be happier

with your factors constructed from simple averages of a few items.

Those have intelligibility and can be separately described for item

reliability, etc.

However, if you are determined to use exact factors, you can see

some description of how to do it in my replies in this group in

January (16 and 19), in the thread

"Manual Factor Score vs. SPSS Factor Score".

--

Rich Ulrich
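For the original question (Wave 2 factor scores from the Wave 1 solution), the general idea can be sketched in a few lines of Python. This is not taken from the thread referenced above. It assumes the factor score coefficients apply to standardized items, as they do when the analysis is run on the correlation matrix, and it standardizes both waves with the Wave 1 means and SDs so that the two sets of scores share one metric. All numbers and names below are invented for a 3-item, 2-factor toy case.

```python
def factor_scores(raw, means, sds, coefficients):
    """Score one subject: standardize each item, then apply the weights.

    raw, means, sds : one value per item, in the same item order
    coefficients    : one row per item, one column per factor
    """
    z = [(x - m) / s for x, m, s in zip(raw, means, sds)]
    n_factors = len(coefficients[0])
    return [
        sum(z[i] * coefficients[i][f] for i in range(len(z)))
        for f in range(n_factors)
    ]

# Made-up Wave 1 item means, SDs, and coefficient matrix (not from the study).
wave1_means = [3.0, 2.0, 4.0]
wave1_sds = [1.0, 0.5, 2.0]
coef = [[0.5, 0.0],
        [0.3, 0.1],
        [0.0, 0.6]]

# Both waves are scored with the Wave 1 means, SDs, and coefficients.
pre = factor_scores([3.0, 2.0, 4.0], wave1_means, wave1_sds, coef)
post = factor_scores([4.0, 2.5, 6.0], wave1_means, wave1_sds, coef)
change = [b - a for a, b in zip(pre, post)]
print(change)
```

In SPSS terms this amounts to a handful of COMPUTE statements per factor, so the matrix language is not strictly needed.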

Hey Rich,

Hope it's okay if I ask a question. Is there any preference to taking

the means of items over, say, summing the items?

Thanks,

Ryan

Thanks to all of you for your very helpful responses!!!

On Tue, 10 Feb 2009 16:04:44 -0800 (PST), Ryan wrote:

[snip, previous. Keeping what is relevant to new question.]

[...]

It won't make any difference to your ANOVA tests, if

that is a worry to anyone, but I have a definite preference

for using the means over the sums. I'm thinking especially

of ad-hoc scales that are used in clinical trials. (Given someone's

"standard scale" that has often been reported, we are pretty

much stuck with it. Even though the BPRS or the Hamilton

might be easier to teach if they used averages.) Here are

the main reasons.

- The sums each have an arbitrary maximum, different for

scales with different numbers of items, so the only way to

know the 'meaning' of a total is to know the scale intimately.

That is an unnecessary burden on the reader, who is not the

PI who loves his scale. Or it is a burden on the statistician

who is dealing with dozens of scales, and wants to deal

intelligently with this scale without spending his life on it.

- The items have verbal labels which can be used to

interpret the average. For one thing, it gives labels like

"never". Also, it is the easiest way to show any reader

that a difference of 0.1 points is trivial, while a difference

of 1.0 points is large. The scoring reflects the units of

the "effect size" in the terms of the measurement.

- When there are occasional items that were blank, the

question of "What did you do with the missing?" is readily

answered. The score is the "average of those answered,

requiring (say) at least 3/4ths of the items to be present"

or it will be scored missing.
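The missing-data rule in the last point is easy to state in code. A minimal Python sketch, where the function name and the use of None for a blank item are assumptions for illustration; in SPSS syntax the same rule is commonly written with the MEAN.n form of the function, for example MEAN.3(item11, item3, item17, item20).

```python
def scale_mean(values, min_fraction=0.75):
    """Average of the answered items, or None if too few were answered.

    `values` holds one entry per item; None marks a blank item.
    """
    answered = [v for v in values if v is not None]
    if not values or len(answered) < min_fraction * len(values):
        return None  # scored missing
    return sum(answered) / len(answered)

print(scale_mean([4, 3, None, 5]))     # 3 of 4 answered -> 4.0
print(scale_mean([4, None, None, 5]))  # 2 of 4 answered -> None
```

With sums instead of means, a blank item would silently lower the total, which is one more argument for averaging.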

The other alternative that I have used frequently for

composite scores - not often for a Likert scale, but usually for

factors that are formed across different domains - is the T-score.

Scales are standardized with a mean of 50 and a standard deviation

of 10 - usually using the mean and SD for the whole sample at Pre.

That makes it relatively easy to look at group differences and

changes across time. (The standard deviation of 10 means

that you can report interesting differences without the clutter

of decimal points. The mean of 50 means that you don't

have the clutter of negative values for group means.)

--

Rich Ulrich
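The T-score rescaling described above can be sketched as follows. The helper name and the scores are made up, and the sample (n - 1) standard deviation is an assumption, since the post does not say which version is meant. The key point is that both waves are standardized against the PRE mean and SD.

```python
from statistics import mean, stdev

def t_scores(scores, reference):
    """Rescale `scores` to mean 50, SD 10 in the metric of `reference`."""
    ref_mean, ref_sd = mean(reference), stdev(reference)
    return [50 + 10 * (x - ref_mean) / ref_sd for x in scores]

pre = [2.0, 3.0, 4.0]    # made-up PRE scale scores
post = [3.0, 4.0, 5.0]   # made-up POST scale scores for the same subjects

pre_t = t_scores(pre, reference=pre)
post_t = t_scores(post, reference=pre)   # PRE norms reused on purpose
print(pre_t)   # [40.0, 50.0, 60.0]
print(post_t)  # [50.0, 60.0, 70.0]
```

Here every subject moved up by one PRE standard deviation, which shows up directly as a 10-point change.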

> of ad-hoc scales that are used in clinical trials. (Given someone's

> "standard scale" that has often been reported, we are pretty

> much stuck with it. Even though the BPRS or the Hamilton

> might be easier to teach if they used averages.) Here are

> the main reasons.

> - The sums each have an arbitrary Maximum, different for

> scales with different numbers of items, so the only way to

> know the 'meaning' of a total is to know the scale intimately.

> That is an unnecessary burden on the reader, who is not the

> PI who loves his scale. Or it is a burden on the statistician

> who is dealing with dozens of scales, and wants to deal

> intelligently with this scale without spending his life on it.

> - The items have verbal labels which can be used to

> interpret the average. On the one hand, it gives labels like

> "never". Also, it is the easiest way to show any reader,

> that a difference of 0.1 points is trivial, while a difference

> of 1.0 points is large. The scoring reflects the units of

> the "effect size" in the terms of the measurement.

I follow--makes sense. Thanks.

> - When there are occasional items that were blank, the

> question of "What did you do with the missing?" is readily

> answered. The score is the "average of those answered,

> requiring (say) at least 3/4ths of the items to be present"

> or it will be scored missing.

Right.

> The other alternative that I have used frequently for

> composite scores - not often for a Likert scale, but usually for

> factors that are formed across different domains - is the T-score.

> Scales are standardized with a mean of 50 and a standard deviation

> of 10 - usually using the mean and SD for the whole sample at Pre.

> That makes it relatively easy to look at group differences and

> changes across time. (The standard deviation of 10 means

> that you can report interesting differences without the clutter

> of decimal points. The mean of 50 means that you don't

> have the clutter of negative values for group means.)

This has been my common practice.

> --

> Rich Ulrich

Thank you,

Ryan
