So far, you have been reading about statistics that describe sets of
data. In many research studies, sociologists might want to know the
extent to which two variables are related. Correlational statistics do
just that. Correlational statistics yield a number called the coefficient
of correlation. In size, the coefficient may vary from 0.00 to 1.00.
Correlations may also be either positive or negative. In a positive
correlation, scores on two different variables increase and decrease
together. For example, there is a positive correlation between high
school average and freshman grade point average in college. In a
negative correlation, scores on one variable increase as scores on the
other decrease. For example, there is a negative correlation
between absenteeism and course performance. The strength of a
correlation depends on its size, not its sign. For example, a correlation
of -.72 is stronger than a correlation of +.53.
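The comparison of size and sign can be checked in a few lines of Python. This is only a minimal sketch; the function name `stronger` is made up for illustration and does not come from the text.

```python
def stronger(r1, r2):
    """Return whichever correlation is stronger, judged by absolute size."""
    return r1 if abs(r1) >= abs(r2) else r2

# The two coefficients from the example above.
print(stronger(-0.72, 0.53))  # prints -0.72
```

The sign is ignored when judging strength; it only tells you the direction of the relationship.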
Correlational statistics are important because they permit us to
determine the strength and direction of the relationship between
different sets of data or to predict scores on one distribution based on
our knowledge of scores on another. If the correlation between two
sets of data were a perfect 1.00 (or -1.00), we could predict one score
from another with complete accuracy. But because correlations are almost
always less than perfect, we can predict one score from another only
approximately: the stronger the correlation, the more accurate the
prediction.
It cannot be stressed strongly enough that correlation does not mean
causation. For example, years ago, authorities presumed that autism in
children, which involves poor social and communication skills, was caused
by "refrigerator mothers." Mothers of autistic children were aloof from
them. This was taken as a sign that the children suffered from mothers
who were emotionally cold. Knowing that this is simply a correlation,
you might wonder whether causality was in the opposite direction.
Perhaps autistic children, who do not respond to their mothers, cause
their mothers to become aloof from them. Moreover, why would a
mother have several normal children, then an autistic child, and then
several more normal ones? It would be difficult to believe she was a
warm parent to all but one. Today, evidence indicates that autism is a
neurological problem that has nothing to do with the mother's
emotionality.
As another example, although there is a positive correlation between
smoking and cancer in human beings, this correlation is not, by itself,
scientifically acceptable evidence that smoking causes cancer. Perhaps
another factor (such as a level of stress tolerance) might make
someone prone to both smoking and cancer, without smoking's
necessarily causing cancer. Of course, correlation does not imply the
absence of causation. For example, there may indeed be a causal
relationship between smoking and cancer. The point is that if two
variables are strongly correlated, one of the variables may cause the
other, or there may not be a causal link: we just cannot tell for sure
based on a correlation coefficient. But remember that knowing that
two variables are related is still an important piece of information.
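The idea of a third factor can be illustrated with simulated data. The sketch below is hypothetical throughout: `z` stands for some unnamed third factor, `x` and `y` each depend on `z` plus their own random noise, and neither `x` nor `y` influences the other. The helper `pearson_r` is written out here only so the example is self-contained.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about each mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

random.seed(0)  # fixed seed so the run is repeatable

# z is a hypothetical third factor. x and y each depend on z plus
# independent noise; x never feeds into y, and y never feeds into x.
z = [random.gauss(0, 1) for _ in range(1000)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

r_xy = pearson_r(x, y)
print(round(r_xy, 2))  # a clearly positive value, although x does not cause y
```

The two variables come out correlated only because both share the influence of `z`. A correlation coefficient alone cannot distinguish this situation from one in which `x` really does cause `y`.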
Learning Check #15: Many studies have determined that there is
a positive correlation between viewing violence on television and
violent behavioral patterns. What does this mean?
Learning Check #16: Given that there is a positive correlation
between viewing violence on television and violent behavior, can
we conclude from these data that watching violence on
television causes children to behave violently?
Learning Check #17: Researchers used to believe that there was a
negative correlation between age and IQ. Recently, this
correlation has turned out to be much weaker than we originally
thought. Describe what is meant by a negative correlation
between age and IQ.
Scatter Plots
Correlational data are graphed using a scatter plot, also known as a
scattergram or scatter diagram. In a scatter plot, one variable is plotted
on the abscissa (the horizontal axis) and the other on the ordinate (the
vertical axis). Each participant's scores on both variables are indicated
by a dot placed at the intersection of those scores on the graph. This
produces one dot for each participant. The pattern of the dots gives a
rough impression of the size and direction of the correlation. A line
drawn through the dots, called the line of best fit, helps estimate
both. The closer the dots lie to a
straight line, the stronger the correlation. Figure B.6 illustrates several
kinds of correlation.
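One common way to place a line of best fit is the least-squares method. The text does not say which method it has in mind, so treat that choice as an assumption. The sketch below uses made-up scores for five participants, and the names `best_fit_line`, `hours_studied`, and `exam_score` are illustrative only.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical scores for five participants (made up for illustration).
hours_studied = [1, 2, 3, 4, 5]   # plotted on the abscissa
exam_score = [52, 60, 61, 70, 77]  # plotted on the ordinate

slope, intercept = best_fit_line(hours_studied, exam_score)
print(slope, intercept)  # prints 6.0 46.0
```

An upward-sloping line goes with a positive correlation and a downward-sloping line with a negative one. The slope is not the correlation coefficient itself; the strength of the correlation depends on how closely the dots cluster around the line.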
Pearson's Product-Moment Correlation
The most commonly used coefficient of correlation is Pearson's
product-moment correlation (Pearson's r), named for the English
statistician Karl Pearson. One formula for calculating it is presented in
Figure B.7. The example assesses the relationship between home runs
and stolen bases by five baseball players during one month of a
season. Recall that correlation coefficients range in size from 0 to 1.00
and can be either negative or positive. The correlation of -.23 obtained
in this example is considered to be a weak, negative correlation.
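A calculation of this kind can be sketched in Python. The code below uses the deviation-score form of Pearson's r, which is one standard formula and may be laid out differently from the one in Figure B.7. The data are hypothetical and are not the values from the figure, so the result is not expected to match -.23.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: summed cross-products of deviations divided by
    the square root of the product of the summed squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical monthly totals for five players (not the Figure B.7 data).
home_runs = [2, 4, 6, 8, 10]
stolen_bases = [5, 3, 6, 2, 4]

r = pearson_r(home_runs, stolen_bases)
print(round(r, 2))  # prints -0.3
```

With these made-up numbers the correlation is weak and negative: players with more home runs tend, slightly, to have fewer stolen bases.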
Learning Check #18: The Minnesota Twin study, a large study of
twins, found a correlation of +.71 between the IQ scores of
identical twins. Another study found that family income is
correlated +.30 with the IQ of children. What do these correlation
coefficients mean?
Coefficient of Determination
One last number that can be helpful in understanding the relationship
between two variables is the coefficient of determination. The
coefficient of determination is the proportion of variability in one
variable that can be accounted for by knowing a second variable. Think for
a moment of all the things that can have an impact on an exam score:
amount of time spent studying, how you feel the day of the exam,
amount of sleep the previous night, whether you were sick or felt well,
as well as a host of other factors. This means that the variability in your
exam scores (which usually differ from one exam to the next) is due to
many factors. A certain amount of the variability may be due to the
number of hours you studied for the exam. Suppose that you compute
the Pearson correlation between the number of hours you spent
studying for the exam and the score on the exam and find a correlation
of +.70. To get the coefficient of determination, you simply square the
Pearson correlation, which in this case is the square of .70, or .49. If
you multiply this result by 100 percent, you end up with 49 percent.
This indicates that 49 percent of the variability in your exam scores
can be accounted for by the amount of time spent studying.
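The arithmetic is short enough to sketch in Python. The function name `coefficient_of_determination` is chosen here for illustration. The two inputs are the +.70 from the exam example and the -.23 from the baseball example.

```python
def coefficient_of_determination(r):
    """Square Pearson's r to get the proportion of variability accounted for."""
    return r ** 2

for r in (0.70, -0.23):
    r_squared = coefficient_of_determination(r)
    print(f"r = {r:+.2f}  r squared = {r_squared:.2f}  ({r_squared * 100:.0f} percent)")
# prints:
# r = +0.70  r squared = 0.49  (49 percent)
# r = -0.23  r squared = 0.05  (5 percent)
```

Because the coefficient is squared, a weak correlation such as -.23 accounts for only about 5 percent of the variability.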
Learning Check #19: Given the correlation coefficients in
Learning Check #18 of +.71 and +.30, explain what you can
determine with respect to the coefficient of determination.
Learning Check #20: Suppose that the correlation coefficient
between two variables is -.80. Would this lead to a different
conclusion based on the coefficient of determination than a
correlation of +.80?