Let's talk about ANOVA. Despite its monstrous name -- it sounds like some kind of zombie -- it is fairly simple. ANOVA stands for ANalysis Of VAriance. If you know what variance is and understand the important issues in the t-test, which we previously discussed, it should be easy to follow.
Before going further, I need to clarify some terms. Suppose we have three groups and want to compare their exam scores. Each group has 20 members. When we say compare between groups, we mean we see each group as a unit and compare the three groups -- we are talking about the three groups, not their individual members. On the other hand, when we say compare (members) within a group, we mean we compare the members of each group: we set one group (out of the three) aside and compare each individual to the others within that group.
It is always better to attack the basics when we run into a complicated problem (the monster, ANOVA). So, let's talk about variance first, because, after all, the name ANOVA indicates that it has something to do with variance.
What is the variance? It is the squared standard deviation, often written as s² (for a sample) or σ² (for a population). The mathematical formula for the variance of a population is:

\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}

where i indicates the i-th individual, N indicates the total size of the population, and μ indicates the mean of the population.

So, basically, what this formula says is that we compare each individual's score to the mean (μ) of the population (to get the difference between the two), square the difference, keep doing this for all the individuals, sum the squared differences up, and divide the sum by the population size (N).
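If you would like to see the formula in action, here is a minimal sketch in Python. The scores are made-up example data, not from the article; the point is only that the code follows the formula step by step.

```python
# Population variance, computed exactly as the formula reads:
# compare each score to the mean, square, sum, divide by N.
scores = [82, 85, 88, 90, 95]  # hypothetical exam scores

N = len(scores)
mu = sum(scores) / N                       # population mean
squared_diffs = [(x - mu) ** 2 for x in scores]
variance = sum(squared_diffs) / N          # divide by N for a population

print(variance)  # prints 19.6
```

Note that the denominator is N here; we will see below why a sample uses n − 1 instead.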
Why do we square the differences? Because unless we do, the sum of the differences will always be zero. What does this variance mean? If you take a close look at the upper part and disregard the sigma sign, it is about how far an individual's score is from the mean of the population. In other words, it is about how much each score varies from the mean of the population -- that's right, hence the name variance. We can also note that if the individuals vary a lot from the mean, the sum in the upper part gets bigger. This means a bigger variance indicates that the individuals do not stick together around the mean. We can visualize this as follows.
[JPG image (29.79 KB)]
The first graph shows a variance of 100, which means its standard deviation is 10. The variance for the second is 4; hence, its standard deviation is 2. As we can see, the higher the variance, the wider the curve (which means the individuals vary a lot).
note: This discussion also applies to the concept of standard deviation, which is the square root of the variance. When we talked about standard deviation, we discussed the same characteristics.
The above formula is about the population. When we talk about a sample, the formula becomes slightly different -- but not by much.

s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}

where i indicates the i-th individual, n indicates the size of the sample (that's right, a big N for the population, a small n for the sample), and x̄ (x bar) indicates the mean of the sample.
note: This should remind you that we use Greek letters to represent the population and Latin letters to represent the sample. For example, μ and σ indicate the mean and the standard deviation of the population, while x̄ and s indicate the mean and the standard deviation of the sample.
Using Greek letters or Latin letters may not seem important. But the big difference between the two formulae above is N versus n − 1. Why do we use N for the population and n − 1 for the sample? One could argue that if both formulas are meant to get the same kind of thing, called variance, we should use the same kind of denominator regardless of whether we have a sample or a population (N or N − 1 for the population, and, correspondingly, n or n − 1 for the sample)! The reason for using n − 1 for a sample is this: a sample is always smaller than its population, and researchers and mathematicians have found that for small samples, dividing by n under-estimates the quantity they want to get. So, dividing by one less than the sample size estimates the variance better.
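We can see this under-estimation for ourselves with a small simulation (illustrative only; the population of scores 1 to 100 is an arbitrary choice): draw many small samples from a population whose variance we know exactly, and compare the average of the divide-by-n estimate with the divide-by-(n − 1) version.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# A population whose true variance we can compute directly.
population = list(range(1, 101))
mu = sum(population) / len(population)
true_var = sum((x - mu) ** 2 for x in population) / len(population)

biased, unbiased = [], []
for _ in range(5000):
    sample = [random.choice(population) for _ in range(5)]  # n = 5
    xbar = sum(sample) / len(sample)
    ss = sum((x - xbar) ** 2 for x in sample)
    biased.append(ss / len(sample))          # divide by n
    unbiased.append(ss / (len(sample) - 1))  # divide by n - 1

avg_biased = sum(biased) / len(biased)
avg_unbiased = sum(unbiased) / len(unbiased)

# On average, the divide-by-n estimate falls well short of the true
# variance, while the divide-by-(n - 1) estimate lands much closer.
print(true_var, avg_biased, avg_unbiased)
```

Running this, the divide-by-n average sits noticeably below the true variance, while the n − 1 average is close to it -- which is exactly why the sample formula uses n − 1.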
Now take a look at the lower part of the formula, n − 1. Have you seen this before? YES, YOU HAVE. Where? When we discussed the t-test and the chi-square test, we talked about degrees of freedom. The n − 1 here is, in fact, the degrees of freedom of the sample. So, the formula can be re-written as follows.
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{df}

where everything is the same as before, and df indicates the degrees of freedom.
We also call the upper part the Sum of Squared Differences, often abbreviated SS. So, if we rewrite the formula once more:

s^2 = \frac{SS}{df}

where SS is the Sum of Squared Differences, and df is the degrees of freedom.
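The SS/df view is worth making concrete, since ANOVA is built out of exactly these two pieces. A tiny sketch (with invented sample scores):

```python
# Sample variance written as SS / df.
sample = [10, 12, 14, 16, 18]  # hypothetical sample scores

n = len(sample)
xbar = sum(sample) / n
SS = sum((x - xbar) ** 2 for x in sample)  # Sum of Squared Differences
df = n - 1                                 # degrees of freedom
variance = SS / df

print(SS, df, variance)  # prints 40.0 4 10.0
```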
This, so far, has been the discussion of variance. Again, you should be able to deduce the characteristics of variance from the formula, rather than memorizing the formula.
note: I know that deducing the characteristics of variance and memorizing the formula can end up in the same place. But sometimes people try to memorize things without considering the rationale behind them. If that is your case, you will forget the formula right after the examination. If you know the meaning of variance, you will always be able to draw its characteristics from the formula. And by doing so, chances are you will memorize and retain the formula without any difficulty.
Now, I guess we need to discuss the first part of ANOVA, ANalysis.
First, we need to discuss when we use ANOVA. Suppose we have three different groups -- say, group A is the students who use the class forum page as a study guide, group B is those who use the review session as a study guide, and group C is those who just regularly attended the class. Then suppose they took an exam and we got the results. With this information (the exam results), we want to compare the three groups. That is, we want to see if there are differences among the three groups. We can visualize this as follows.
[JPG image (38.15 KB)]
We can compare the three groups by examining how the scores differ from the total (grand) mean. But there are two kinds of possible comparison. First, we can compare the mean of each group (85, 88, 82 for groups A, B, C) to the grand mean (85). From this, we can see how far each group varies from the grand mean. Second, we can compare how the scores of individuals in one group (s1, s2, s3, ..., s20 in group A) vary from the mean of that group (mean A = 85). This shows how each individual varies from the mean of the group to which he or she belongs.
We can conceptualize this nicely with the terms introduced earlier in this article. The first part -- comparing the mean of each group to the grand mean -- is comparing between groups (a between-group analysis). The second part -- comparing each member with the others within a group -- is comparing within groups (a within-group analysis).
And by analysis, we mean we obtain the variances -- to see how scores vary from means -- between groups and within groups. Let's name these two s²between (for between groups) and s²within (for within groups). We can then take the ratio of s²between to s²within, which is called the F ratio:

F = \frac{s^2_{between}}{s^2_{within}}
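To make the two variances and their ratio concrete, here is a hand-rolled sketch for three small hypothetical groups (the scores are invented for illustration; each group has 5 members rather than 20 to keep the arithmetic visible):

```python
# One-way ANOVA pieces by hand: between-group variance,
# within-group variance, and their ratio F.
groups = {
    "A": [83, 85, 86, 84, 87],
    "B": [88, 90, 87, 89, 91],
    "C": [80, 82, 81, 83, 79],
}

all_scores = [x for g in groups.values() for x in g]
grand_mean = sum(all_scores) / len(all_scores)
k = len(groups)        # number of groups
N = len(all_scores)    # total number of individuals

# Between groups: how far each group mean sits from the grand mean,
# weighted by group size; df = k - 1.
ss_between = sum(
    len(g) * ((sum(g) / len(g)) - grand_mean) ** 2 for g in groups.values()
)
s2_between = ss_between / (k - 1)

# Within groups: how far each score sits from its own group mean;
# df = N - k.
ss_within = 0.0
for g in groups.values():
    m = sum(g) / len(g)
    ss_within += sum((x - m) ** 2 for x in g)
s2_within = ss_within / (N - k)

F = s2_between / s2_within
print(s2_between, s2_within, F)  # prints 80.0 2.5 32.0
```

Here the group means (85, 89, 81) sit far apart while the scores hug their own group means, so the ratio comes out large -- exactly the "distinct between, tight within" picture discussed below.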
This is the basic idea of ANOVA. Here we might have a question: if we have three groups and want to compare whether their exam scores differ from each other, why not use the t-test? -- a valid question. With the t-test, we could compare group A and group B; group B and group C; and group C and group A, and see how each differs from the others.
[JPG image (35.3 KB)]
It looks good and simple enough, doesn't it? The problem is that when we do a t-test, we take a risk of being wrong -- usually 0.05 probability. When we compare A and B and state that there is a difference between the two, we take the risk of that statement being false, and the risk is about 5 out of 100. Remember that this risk is the maximum tolerance we accept. In an actual case, the comparison groups might be perfectly different, which would mean there is no chance of our conclusion being wrong. But because we do not know the real situation, when we do the statistics we still assume a 5% risk. Or our assumption about being wrong may fully reflect the real situation: the population may indeed vary in such a way that 5 out of every 100 sample examinations turn out against our conclusion. (Remember, we still have 95 cases out of 100 that show the conclusion is right -- after all, this is what we accepted, about 95% confidence, in the first place by taking the 0.05 risk.)
Here, we have the same phenomenon (the examination result) from one sample. If we do partial t-tests several times, and if reality really reflects our 0.05 risk, we actually accumulate the risk of being wrong. Of course, if the three groups really are different and there is no chance of our conclusion being false (remember, we, the researchers, do not know this; we still take the 5% risk without knowing it is unnecessary), it would be safe to use the t-test several times on the same phenomenon. But again, we do not know this reality! What this indicates is that we cannot do the t-test several times on one single phenomenon. It is like collecting beans in a box with a small hole: while collecting and counting the beans, we do not know how many we have lost! Therefore, we turn to ANOVA to see if there is any difference among the groups in exam performance.
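The "accumulating risk" point can be put in numbers. If the three pairwise t-tests were independent, each with a 5% false-positive risk, the overall chance of at least one false positive is no longer 5%:

```python
# Familywise false-positive risk of running several 0.05-level tests
# on the same phenomenon (assuming independent tests, which is an
# idealization -- the real pairwise tests share data).
alpha = 0.05
tests = 3  # A vs B, B vs C, A vs C

familywise_risk = 1 - (1 - alpha) ** tests
print(round(familywise_risk, 4))  # prints 0.1426
```

So three t-tests quietly turn the 5% risk into roughly 14% -- the "beans lost through the hole" that ANOVA avoids by testing all groups at once.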
So, below is a graphic demonstration of the ANOVA test. We would like to see if the groups (the attributes of the independent variable) differ from one another. The ideal picture of three truly different groups should show (1) a distinct difference between the groups and (2) minimal differences among the individuals within each group. The first is easy to understand -- each group should have a mean score distinct from the other two. The second means that each group should have relatively little variety within it. In other words, the individual scores in each group should stick together around their group mean. This shows that the individuals within a group are distinctively alike, and the aggregation of such individuals represents a distinct difference from the other groups. So, the figure below (Figure 1) is a good example of three groups that really are different: no group shares its mean score with the others, and the individual scores stick together around the mean of their own group.
[JPG image (75.73 KB)]
Figure. 1
[JPG image (26.13 KB)]
Figure 2
On the other hand, Figure 2 tells a different story. First, it should be noted that the shapes of the curves differ from one another. Group A has the largest variance while group B has the smallest (or standard deviation -- can you see why?). It should also be noted that if the bell-shaped curves are wide (hence, large within-group variance), the mean differences between groups need to be large for the groups to look different. If the within-group variances are small, even a small difference in mean scores will make the groups distinct from each other.
Let's go back to the F value, the ratio of the between-group variance to the within-group variance:

F = \frac{s^2_{between}}{s^2_{within}}

This shows the same thing we discussed above. To summarize, we can think of this formula this way.
(1) In the first place, we want to see if the three groups differ in terms of exam scores. In order to see a possible difference, (2) the upper part (s²between) should be big, and (3) the lower part (s²within) should be small. If this is the case, we can say that there is a difference among the three groups in the exam scores.
And because the F value is a ratio of two variance values, points (2) and (3) above are not fixed requirements. That is, we can also think of the relationship between the upper and lower parts this way.
If the lower part (s²within) is small enough, then even with a slight difference among the groups (a small between-group variance, s²between), the F value can become large enough. In other words, if each group has a small within-group variance, even a slight between-group variance will indicate that the three groups are distinctively different.
Anyway, you get the idea of ANOVA. The testing procedure is the same as for the t-test: obtain the F-ratio value (as we did for the t-test value or chi-square value); find the degrees of freedom (we have two -- k − 1 for between groups and N − k for within groups, where k is the number of groups and N the total number of individuals; please refer to the textbook for details); read the F-test table (p. 300 in the textbook); pin down the critical value (with the proper df); and compare them. If the F ratio is bigger than the critical value, what should we do? You should be able to figure this out by this time.
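The whole decision procedure can be sketched in a few lines. The F ratio and group sizes below are hypothetical, and the critical value is the standard tabled value for F(.05; 2, 12) -- looked up, not computed:

```python
# Sketch of the ANOVA decision procedure for a hypothetical
# three-group, five-members-per-group example.
k, N = 3, 15                 # 3 groups, 15 individuals in total
df_between = k - 1           # degrees of freedom, between groups
df_within = N - k            # degrees of freedom, within groups

F_ratio = 32.0               # obtained from the data (hypothetical)
critical_value = 3.89        # F(.05; 2, 12) from a standard F table

if F_ratio > critical_value:
    print("Reject the null: at least one group mean differs.")
else:
    print("Fail to reject: no evidence the groups differ.")
```

With df = (2, 12) and an F ratio of 32.0, the result is far beyond the critical value, so we would conclude the groups differ -- which is the answer to the closing question above.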