In a repeated measures design, all participants experience every condition of the study. We have already seen that a repeated measures (within-subjects) t-test is statistically more sensitive than an independent groups t-test, and is therefore better able to detect effects. This is partly because we remove variability from the scores that can be attributed to individual differences. Repeated measures ANOVA lets us see much more clearly how variability attributable to individual differences can be literally removed from the equation, making the test much more sensitive than the independent groups ANOVA described previously. Just as with the independent designs, you would use a dependent samples t-test to compare two conditions and a repeated measures ANOVA to compare more than two conditions.
Let’s consider the same example from the video homework problem presented in the previous chapter:
| | Mozart | Relaxation | Silence |
| --- | --- | --- | --- |
| | 5 | 4 | 2 |
| | 7 | 6 | 5 |
| | 4 | 3 | 3 |
| | 4 | 2 | 1 |
| | 4 | 3 | 2 |
| mean | 4.8 | 3.6 | 2.6 |
We have already seen how to use an independent groups ANOVA to test for an effect of treatment, i.e. to determine whether there are significant differences among the three group means, and how to use post-hoc tests to determine which differences between pairs of means are significant.
Further, remember what is actually compared in ANOVA:
The numerator is a measure of the variability among the treatment means (literally, the variance among the treatment groups). Possible sources of this variability include effects of the treatment, individual differences, and other effects of chance. The denominator is a measure of the variability due to individual differences and other effects of chance. When the treatment has no effect (as the null hypothesis states), the value of F should be near 1.00.
Consider doing this study as a repeated measures design: every participant would listen to Mozart AND the relaxation tape AND silence, taking the IQ test after each listening session. We would, of course, need to counterbalance the conditions appropriately, so that order would not influence the outcome. We could use either complete or partial counterbalancing.
If all goes well, we may very well get data like these:
| Subject # | Mozart | Relaxation | Silence | Subject Averages |
| --- | --- | --- | --- | --- |
| S1 | 5 | 4 | 2 | 3.67 |
| S2 | 7 | 6 | 5 | 6.00 |
| S3 | 4 | 3 | 3 | 3.33 |
| S4 | 4 | 2 | 1 | 2.33 |
| S5 | 4 | 3 | 2 | 3.00 |
| mean | 4.8 | 3.6 | 2.6 | |
Notice that the table now includes Subject #, and the scores in each condition now represent the spatial IQ from the same person within a row. The table also shows averages across all listening conditions for each participant. Now consider the same questions from the previous chapter: Why are the treatment means different from each other? In this repeated measures design, you can no longer attribute the differences among treatment means to individual differences, because the same people experienced each condition. The other two sources of variability still hold true: there may be an effect of treatment, and other effects of chance certainly play a role: S4 may have been distracted by a random thought in the silence condition, S2 may have experienced some insight unrelated to the music in the Mozart condition.
Why are the scores within a treatment group not the same? Well, within a group the scores DO come from different people, and other effects of random chance may still play a role.
Finally, consider the variability across the subject averages. Obviously one source of this variability is the fact that the averages come from different people. Is it possible that treatment effects contribute to this variability? No: each subject's average is taken across all three treatments, so any treatment effect is included equally in every subject's average. Similarly, other effects of random chance are likely to average out across the three conditions. So the only source of variability among the subject averages is individual differences.
Now consider again the independent groups ANOVA statistic, F:
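That statistic is the ratio of the two mean squares:

$$
F = \frac{MS_{\text{between treatments}}}{MS_{\text{within treatments}}}
$$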
In an independent groups design, the numerator term, MSbetween treatments, is a measure of variability due possibly to treatment effects, individual differences, and other effects of chance. The denominator term, MSwithin treatments, is a measure of variability due to individual differences and other effects of chance.
A repeated measures ANOVA F would have the same numerator term, MSbetween treatments, but the sources of variability would NOT include individual differences, because the same people experienced each condition. The only potential sources of variability for the numerator term are treatment effects and other effects of chance. We need to find some way of removing variability due to individual differences from the denominator term to make a repeated measures F make sense.
| Variability | Measure | Sources of variability |
| --- | --- | --- |
| Among group means | SSbetween treatments | possible effects of treatment, chance |
| Within treatment groups | SSwithin treatments | individual differences, chance |
| Among subject means | SSbetween subjects | individual differences |
| Difference between SSwithin treatments and SSbetween subjects | SSerror | chance alone |
Because we can attribute the variability of the subject averages to individual differences alone, we can subtract a measure of this variability from SSwithin treatments to arrive at a denominator term that makes sense. The measurement of the subject average variability is called “SSbetween subjects”, and the remaining measure after subtracting SSbetween subjects from SSwithin treatments is called SSerror or SSresidual.
The formulae for SSbetween treatments and SSwithin treatments are still the same:
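In the notation of the data table below (T = treatment total, G = grand total, n = scores per treatment, N = total number of scores), those computational formulas are:

$$
SS_{\text{between treatments}} = \sum \frac{T^2}{n} - \frac{G^2}{N}
\qquad
SS_{\text{within treatments}} = \sum SS_{\text{inside each treatment}}
$$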
| Subject # | Mozart | Relaxation | Silence | P |
| --- | --- | --- | --- | --- |
| S1 | 5 | 4 | 2 | 11.00 |
| S2 | 7 | 6 | 5 | 18.00 |
| S3 | 4 | 3 | 3 | 10.00 |
| S4 | 4 | 2 | 1 | 7.00 |
| S5 | 4 | 3 | 2 | 9.00 |
| SS | 6.8 | 9.2 | 9.2 | |
| T | 24 | 18 | 13 | G = 55 |
| mean | 4.8 | 3.6 | 2.6 | N = 15, k = 3 |
The new formula for this chapter is for SSbetween subjects:
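The formula is:

$$
SS_{\text{between subjects}} = \sum \frac{P^2}{k} - \frac{G^2}{N}
$$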
where k is the number of conditions and P is the sum of the data for all conditions for each participant. Notice the similarity to the formula for SSbetween treatments. It is as though we are considering the rows of the data set (from different participants) as a treatment in and of itself, which is very much what we are doing.
SSerror is easily found by subtraction:
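That is:

$$
SS_{\text{error}} = SS_{\text{within treatments}} - SS_{\text{between subjects}}
$$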
The last step is to calculate the denominator term for the repeated measures F-ratio. Like every mean square, MSerror is its SS divided by its df:
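In symbols:

$$
MS_{\text{error}} = \frac{SS_{\text{error}}}{df_{\text{error}}}
$$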
How do we calculate dferror? Just as we subtracted SSbetween subjects from SSwithin treatments to find SSerror, we subtract the corresponding degrees of freedom:
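That is:

$$
df_{\text{error}} = df_{\text{within treatments}} - df_{\text{between subjects}} = (N - k) - (n - 1)
$$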
Finally, use MSbetween treatments in the numerator and MSerror in the denominator, where:
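Putting the pieces together:

$$
MS_{\text{between treatments}} = \frac{SS_{\text{between treatments}}}{df_{\text{between treatments}}}
\qquad
F = \frac{MS_{\text{between treatments}}}{MS_{\text{error}}}
$$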
If your observed value for F is beyond Fcrit, the proper statistical decision is to reject H0, and conclude that there is a significant difference somewhere among your group means. This can also be stated as a significant effect of the independent variable. It is necessary to follow a rejection of H0 with an appropriate post-hoc test to determine precisely where any significant differences among group means lie.
STEP ONE: STATE THE HYPOTHESES
The hypotheses are the same for one way ANOVA and repeated measures ANOVA with a slight wording change for the alternative hypothesis.
H0: µ1 = µ2 = µ3
H1: There is at least one difference in the means*
*leave out the word group because the groups are the same across the conditions when you use the same participants in each condition
Like with one way ANOVA, this version of the alternative hypothesis (H1) is a broad one covering all possible combinations of the differences you could find in the means. You could set up an alternative hypothesis that posits very specific mean differences as well.
STEP TWO: DETERMINE THE CRITICAL REGION
You use an F table to find the critical value for repeated measures ANOVA. You will need two different df values for the table. For one way ANOVA, you use dfbetween treatments and dfwithin treatments. For repeated measures ANOVA, you use dfbetween treatments for the df in the numerator of the F-ratio and dferror for the df in the denominator of the F-ratio.
For the dataset in this chapter:
dfbetween treatments = k – 1 = 3 – 1 = 2
dferror = (N – k) – (n – 1) = (15 – 3) – (5 – 1) = 12 – 4 = 8
With an alpha = 0.05 and df = (2, 8), the critical value in the F-table is 4.46. An F-table is also available on page 541 in the Morling book.
STEP THREE: CALCULATE YOUR STATISTIC
For the data in this chapter, this calculation would look like this:
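Reconstructing the arithmetic from the totals in the data table (T = 24, 18, 13; P = 11, 18, 10, 7, 9; G = 55; carrying full precision gives F = 26.00 exactly):

$$
\begin{aligned}
SS_{\text{between treatments}} &= \tfrac{24^2}{5} + \tfrac{18^2}{5} + \tfrac{13^2}{5} - \tfrac{55^2}{15} = 213.80 - 201.67 = 12.13\\
SS_{\text{within treatments}} &= 6.8 + 9.2 + 9.2 = 25.2\\
SS_{\text{between subjects}} &= \tfrac{11^2}{3} + \tfrac{18^2}{3} + \tfrac{10^2}{3} + \tfrac{7^2}{3} + \tfrac{9^2}{3} - \tfrac{55^2}{15} = 225.00 - 201.67 = 23.33\\
SS_{\text{error}} &= 25.2 - 23.33 = 1.87\\
F &= \frac{12.13/2}{1.87/8} = \frac{6.07}{0.23} = 26.00
\end{aligned}
$$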
Oftentimes, all of these calculations are displayed in a source table.
| | SS | df | MS | F |
| --- | --- | --- | --- | --- |
| Btwn Treatments | 12.13 | 2 | 6.07 | 26.00 |
| Within Treatments | 25.2 | 12 | | |
| Btwn Subj | 23.33 | 4 | | |
| Error | 1.87 | 8 | 0.23 | |
| Total | 37.33 | 14 | | |
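As a check on the hand calculations, the entire partition can be computed with a short numpy script (a sketch added here for illustration; the variable names are my own, not from the chapter):

```python
import numpy as np

# Scores: rows = subjects S1..S5, columns = Mozart, Relaxation, Silence
X = np.array([
    [5, 4, 2],
    [7, 6, 5],
    [4, 3, 3],
    [4, 2, 1],
    [4, 3, 2],
], dtype=float)

n, k = X.shape      # n = 5 subjects, k = 3 conditions
N = n * k           # 15 scores in total
G = X.sum()         # grand total, 55

T = X.sum(axis=0)   # treatment totals: 24, 18, 13
P = X.sum(axis=1)   # person totals: 11, 18, 10, 7, 9

ss_between = (T ** 2 / n).sum() - G ** 2 / N            # 12.13
ss_within = ((X - X.mean(axis=0)) ** 2).sum()           # 25.2
ss_between_subjects = (P ** 2 / k).sum() - G ** 2 / N   # 23.33
ss_error = ss_within - ss_between_subjects              # 1.87

df_between = k - 1            # 2
df_error = (N - k) - (n - 1)  # 8

F = (ss_between / df_between) / (ss_error / df_error)
print(round(F, 2))  # prints 26.0
```

For real analyses you would typically use a library routine (for example, statsmodels' AnovaRM), but the arithmetic above maps directly onto the formulas in this chapter.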
STEP FOUR: MAKE A DECISION
Next, we compare our calculated F to our critical value. If the calculated value is higher, we reject the null hypothesis and conclude that there is at least one difference in the treatment means. If the calculated F is lower, we fail to reject the null hypothesis and conclude that the treatment means are not reliably different. In this case our calculated F of 26.00 is larger than our critical value for F of 4.46; thus, we reject the null and conclude that there is a difference among the treatment means.
We need to include an interpretation in APA style to state our decision. For this study, one example of this would be:
The researchers examined whether 10 minutes of auditory exposure to Mozart music, a relaxation tape, or silence impacted performance on a spatial IQ test. There was a significant difference between the treatments, F(2, 8) = 26.00, p < .05.
From here, a researcher would want to investigate this difference further using planned comparisons or post hoc tests.
POST HOC TESTS FOR REPEATED MEASURES
Exactly the same procedures are used for post hoc tests following a repeated measures ANOVA as following a one way ANOVA. The only difference is that you must use MSerror and dferror in place of MSwithin treatments and dfwithin treatments.
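For example, if Tukey's HSD were the chosen post hoc test (an added illustration, with q taken from a studentized range table for k = 3 treatments and dferror = 8 at alpha = .05), the calculation for this chapter's data would be:

$$
HSD = q\sqrt{\frac{MS_{\text{error}}}{n}} \approx 4.04\sqrt{\frac{0.23}{5}} \approx 0.87
$$

All three pairwise differences among the treatment means (4.8 − 3.6 = 1.2, 3.6 − 2.6 = 1.0, and 4.8 − 2.6 = 2.2) exceed 0.87, so each pair of treatment means differs significantly.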