 Correlational Statistics

So far, you have been reading about statistics that describe sets of data. In many research studies, sociologists want to know the extent to which two variables are related. Correlational statistics do just that. Correlational statistics yield a number called the coefficient of correlation. The size of the coefficient may vary from 0.00 to 1.00, and its sign may be either positive or negative, so the full range runs from -1.00 to +1.00. In a positive correlation, scores on two different variables increase and decrease together. For example, there is a positive correlation between high school average and freshman grade point average in college. In a negative correlation, as scores on one variable increase, scores on the other variable decrease. For example, there is a negative correlation between absenteeism and course performance. The strength of a correlation depends on its size, not its sign. For example, a correlation of -.72 is stronger than a correlation of +.53.
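
The point about size versus sign can be checked with a few lines of Python. This is only a minimal sketch: the helper name `strength` is a hypothetical illustration, not something defined in this primer, and it simply treats the strength of a correlation as the absolute value of the coefficient.

```python
def strength(r):
    """Return the strength of a correlation: its size, ignoring the sign."""
    # A correlation coefficient cannot fall outside -1.00 to +1.00.
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1.00 and +1.00")
    return abs(r)


# The two coefficients from the example in the text.
r_negative = -0.72
r_positive = 0.53

# The negative correlation is the stronger one because .72 is larger than .53.
print(strength(r_negative) > strength(r_positive))  # prints True
```

The sign tells you only the direction of the relationship; the comparison of strength is made on the unsigned values.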

Correlational statistics are important because they permit us to determine the strength and direction of the relationship between different sets of data or to predict scores on one distribution based on our knowledge of scores on another. If the correlation between two sets of data were a perfect 1.00 (or -1.00), we could predict one score from another with complete accuracy. But because correlations are almost always less than perfect, we predict one score from another only with a particular probability of being correct--the higher the correlation, the higher the probability.

It cannot be stressed strongly enough that correlation does not mean causation. For example, years ago, authorities presumed that autism in children, which is marked by poor social and communication skills, was caused by "refrigerator mothers." Mothers of autistic children were observed to be aloof from them. This was taken as a sign that the children suffered from mothers who were emotionally cold. Knowing that this is simply a correlation, you might wonder whether causality was in the opposite direction. Perhaps autistic children, who do not respond to their mothers, cause their mothers to become aloof from them. Moreover, why would a mother have several normal children, then an autistic child, and then several more normal ones? It would be difficult to believe she was a warm parent to all but one. Today, evidence indicates that autism is a neurological problem that has nothing to do with the mother's emotionality.

As another example, although there is a positive correlation between smoking and cancer in human beings, this correlation is not, by itself, scientifically acceptable evidence that smoking causes cancer. Perhaps another factor (such as a level of stress tolerance) might make someone prone to both smoking and cancer, without smoking's necessarily causing cancer. Of course, correlation does not imply the absence of causation. For example, there may indeed be a causal relationship between smoking and cancer. The point is that if two variables are strongly correlated, one of the variables may cause the other, or there may not be a causal link: we just cannot tell for sure based on a correlation coefficient. But remember that knowing that two variables are related is still an important piece of information.

Learning Check #15: Many studies have determined that there is a positive correlation between viewing violence on television and violent behavioral patterns. What does this mean?

Learning Check #16: Given that there is a positive correlation between viewing violence on television and violent behavior, can we conclude from these data that watching violence on television causes children to behave violently?

Learning Check #17: Researchers used to believe that there was a negative correlation between age and IQ. Recently, this correlation has turned out to be much weaker than originally thought. Describe what is meant by a negative correlation between age and IQ.

Scatter Plots

Correlational data are graphed using a scatter plot, also known as a scattergram or scatter diagram. In a scatter plot, one variable is plotted on the abscissa and the other on the ordinate. Each participant's scores on both variables are indicated by a dot placed at the junction between those scores on the graph. This produces one dot for each participant. The pattern of the dots gives a rough impression of the size and direction of the correlation. In fact, a line drawn through the dots, or line of best fit, helps estimate this. The closer the dots lie to a straight line, the stronger the correlation. Figure B.6 illustrates several kinds of correlation.

Pearson's Product-Moment Correlation

The most commonly used coefficient of correlation is the Pearson product-moment correlation (Pearson's r), named for the English statistician Karl Pearson. One formula for calculating it is presented in Figure B.7. The example assesses the relationship between home runs and stolen bases by five baseball players during one month of a season. Recall that correlation coefficients range in size from 0 to 1.00 and can be either negative or positive. The correlation in this example, -.23, is considered a weak negative correlation.
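
To make the calculation concrete, here is a minimal Python sketch of Pearson's r. It uses the standard deviation-score form of the formula, which may not be laid out the same way as the formula in Figure B.7. The function name `pearson_r` and the two short lists of scores are made up for illustration; they are not the baseball data from the figure.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: sum of cross-products of deviations, divided by the
    square root of the product of the two sums of squared deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How far each pair of scores lies from its own mean, multiplied together.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when one variable has no variability")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical scores for five participants on two variables.
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]

r = pearson_r(x_scores, y_scores)
print(round(r, 2))  # prints 0.77
```

With these invented scores the result is a fairly strong positive correlation. Reversing the order of one list would flip the sign of r without changing its size.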

Learning Check #18: In a large study of twins, the Minnesota Twin study found a correlation of +.71 between the IQ scores of identical twins. Another study found that family income is correlated +.30 with the IQ of children. What do these correlation coefficients mean?

Coefficient of Determination

One last number that can be helpful in understanding the relationship between two variables is the coefficient of determination. The coefficient of determination is the proportion of variability in one variable that can be accounted for by knowing a second variable. Think for a moment of all the things that can have an impact on an exam score: amount of time spent studying, how you feel the day of the exam, amount of sleep the previous night, whether you were sick or felt well, as well as a host of other factors. This means that the variability in your exam scores (they are usually not all exactly the same) is due to many factors. A certain amount of the variability may be due to the number of hours you studied for the exam. Suppose that you compute the Pearson correlation between the number of hours you spent studying for the exam and the score on the exam and find a correlation of +.70. To get the coefficient of determination you simply square the Pearson correlation, which in this case is the square of .70, or .49. If you multiply this result by 100 percent, you end up with 49 percent. This indicates that 49 percent of the variability in your exam scores can be accounted for by the amount of time spent studying; the remaining 51 percent reflects other factors.
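
The arithmetic in the study-time example can be sketched in Python as follows. The function name `coefficient_of_determination` is a hypothetical helper chosen for illustration; the only value taken from the text is the correlation of +.70.

```python
def coefficient_of_determination(r):
    """Square a Pearson correlation to get the coefficient of determination."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1.00 and +1.00")
    return r ** 2


# Correlation between hours studied and exam score, from the example in the text.
r = 0.70
r_squared = coefficient_of_determination(r)

print(f"{r_squared:.2f}")                # prints 0.49
print(f"{r_squared * 100:.0f} percent")  # prints 49 percent
```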

Learning Check #19: Given the correlation coefficients in Learning Check #18 of +.71 and +.30, explain what you can determine with respect to the coefficient of determination.

Learning Check #20: Suppose that the correlation coefficient between two variables is -.80. Would this lead to a different conclusion based on the coefficient of determination than a correlation of +.80?