Suppose you gained access to the hundreds, or thousands, of high school
grade point averages of all the freshmen at your college or university.
What is the most typical score? How similar are the scores? Simply
scanning the scores would provide, at best, gross approximations of the
answers to these questions. To obtain precise answers, sociologists use
descriptive statistics, which include measures of central tendency and
measures of variability.
Measures of Central Tendency
A measure of central tendency is a single score that best represents an
entire set of scores. The measures of central tendency include the mode,
the median, and the mean.
Mode
The mode is the most frequently occurring score in a set of scores. In the
frequency distribution of exam scores discussed earlier, the mode is 90. If two
scores tie for the highest frequency, the distribution is bimodal. If the data
set consists of counts of cases in categories, then the category with the most
cases is considered the mode. For example, in determining the most
common academic major at your school, the mode is the major with the
most students. The winner of a presidential primary election in which there
are several candidates would represent the mode--the person selected by
more voters than any other.
The mode can be the best measure of central tendency for practical
reasons. Imagine a car dealership given the option of carrying a particular
model, but limited to selecting just one color. The dealership owner would
be wise to choose the modal color.
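As a rough illustration, the mode can be found with Python's standard statistics module. The car colors and exam scores below are invented for this sketch and are not data from this appendix. The multimode function returns every value tied for the highest frequency, so a result with two values signals a bimodal distribution.

```python
from statistics import multimode

# Hypothetical data, invented for illustration only.
car_colors = ["white", "black", "white", "red", "silver", "white", "black"]
exam_scores = [85, 90, 90, 92, 85, 88]  # 85 and 90 each occur twice


def modes(values):
    """Return every value that ties for the highest frequency."""
    return multimode(values)


print(modes(car_colors))   # ['white'] -> the modal color
print(modes(exam_scores))  # [85, 90] -> two modes, so bimodal
```

The same function works for scores and for categories, which mirrors the point that the mode applies to counts of categories as well as to numbers.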
Learning Check #5: A researcher is interested in the effect of family
size on self-esteem. To begin this study, 10 students are each asked
how many brothers and sisters they have. The responses are as
follows: 2, 3, 1, 0, 9, 2, 3, 2, 4, 2. What is the mode for this set of
data?
Mean
The mean is the arithmetic average, or simply the average, of a set of
scores. You are probably more familiar with it than any other measure of
central tendency. You encounter the mean in everyday life whenever you
calculate your exam average, batting average, gas mileage average, or a
host of other averages.
The mean of a sample is calculated by adding all the scores and dividing
by the number of scores.
Exam Scores: 99, 92, 93, 94, 97
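The calculation for these five exam scores can be sketched in Python as follows. The variable names are illustrative; the scores are the ones listed above.

```python
from statistics import mean

exam_scores = [99, 92, 93, 94, 97]  # the exam scores listed above

total = sum(exam_scores)   # add all the scores: 475
count = len(exam_scores)   # number of scores: 5
exam_mean = total / count  # divide the sum by the number of scores

print(exam_mean)           # 95.0
print(mean(exam_scores))   # the library function gives the same value
```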
Learning Check #6: What is the mean number of brothers and sisters
listed in Learning Check #5?
Median
The median is the middle score in a distribution of scores that have been
ranked in numerical order. If the median is located between two scores, it
is assigned the value of the midpoint between them (for example, the
median of 23, 34, 55, and 68 would equal 44.5). The median is the best
measure of central tendency for skewed distributions, because it is
unaffected by extreme scores. Note that in the example below the median
is the same (63) in both sets of exam scores, even though the second set
contains an extreme score. The means differ (48.4 for Exam A versus 54.6
for Exam B) because of the one extreme score on Exam B.
Exam A: 23, 25, 63, 64, 67
Exam B: 23, 25, 63, 64, 98
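The comparison of Exam A and Exam B can be checked with a short Python sketch using the standard statistics module. The scores are those listed above; the variable names are illustrative.

```python
from statistics import mean, median

exam_a = [23, 25, 63, 64, 67]
exam_b = [23, 25, 63, 64, 98]  # same scores except for one extreme score

for name, scores in (("Exam A", exam_a), ("Exam B", exam_b)):
    # median() sorts the scores and returns the middle one
    print(name, "median:", median(scores), "mean:", mean(scores))

# With an even number of scores, the median is the midpoint
# between the two middle scores.
print(median([23, 34, 55, 68]))  # 44.5
```

The median is 63 for both exams, while the mean rises from 48.4 to 54.6 when the single extreme score of 98 replaces 67.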
When Disraeli pointed out the ease of lying with statistics, he might have
been referring, in particular, to measures of central tendency. Suppose a
baseball general manager is negotiating with an agent about a salary for a
baseball catcher of average ability. Both might use a measure of central
tendency to prove their own points, perhaps based on the salaries of the
top seven catchers, as shown in Table B.2. The general manager might
claim that a salary of $340,000 (the median) would provide the player
with what he deserves, based on an average salary of the other players.
The agent might counter that a salary of $900,000 (the mean) would
provide the player with what he deserves, based on an average salary of
the other players. Note that neither would technically be lying: they would
simply be using statistics that favored their position. As Scottish writer
Andrew Lang (1844-1912) warned, beware of anyone who “uses
statistics as a drunken man uses lampposts--for support rather than for
illumination.”
Learning Check #7: What is the median number of brothers and
sisters listed in Learning Check #5?
Learning Check #8: Note that the mean number of brothers and
sisters is quite a bit different from the median number of brothers
and sisters. In this case, which measure of central tendency would be
most appropriate to report? Why?
Measures of Variability
Although a measure of central tendency is certainly important, it does not
completely represent a distribution by itself. Given a measure of central
tendency, you have an idea of where scores tend to fall, but you don’t
know to what extent the scores differ from one another. A measure of the
amount of dispersion contained within a data set is called a measure of
variability. Except when all scores in a data set are identical, all sets of
scores vary to some degree. Consider the members of your sociology
class. They would vary on a host of measures, including height, weight,
and grade point average. Measures of variability include the range, the
variance, and the standard deviation.
Range
The range is the difference between the highest and lowest scores in a
distribution. The range provides limited information, because distributions
in which scores bunch up toward the beginning, middle, or end of the
distribution might have the same range. Still, the range is useful as a
rough estimate of how a score compares with the highest and lowest in a
distribution. For example, a student might find it useful to know whether
he or she did near the best or the worst on an exam. The range of scores
in the distribution of 20 grades in the earlier example in Table B.1 would
be the difference between 94 and 80, or 14.
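The limitation described above can be seen in a small Python sketch. The two score lists are invented for illustration and are not the grades from Table B.1; they share a lowest score of 80 and a highest score of 94, but bunch up at opposite ends.

```python
def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)


# Hypothetical exam scores, invented for illustration only.
bunched_low = [80, 81, 81, 82, 83, 94]   # most scores near the bottom
bunched_high = [80, 91, 92, 93, 93, 94]  # most scores near the top

print(score_range(bunched_low))   # 14
print(score_range(bunched_high))  # 14
```

Both lists have a range of 14 even though their scores are distributed very differently, which is why the range alone says little about variability.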
Learning Check #9: A social researcher would like to know how
many digits people in different age categories can recall with only
one presentation of a list. She creates random lists of digits and
presents them to participants. The number of digits recalled by the
first 10 participants is as follows: 5, 9, 6, 10, 9, 7, 8, 7, 9, 12.
What is the range of this data set?
Variance
A more informative measure of variability is the variance, which
represents the variability of scores around their group mean. Unlike the
range, the variance takes into account every score in the distribution.
Technically, the variance is the average of the squared deviations from the
mean.
Suppose you wanted to calculate the variance for the sets of 10-point
quiz scores in Quiz A and Quiz B (Table B.3). First, find the group mean.
Second, find the deviation of each score from the group mean. Note that
deviation scores will be negative for scores that are below the mean. As a
check on your calculations, the sum of the deviation scores should equal
zero. Third, square the deviation scores. Squaring makes negative
deviations positive and gives extreme scores relatively more
weight. Fourth, find the sum of the squared deviation scores. Fifth, divide
the sum by the number of scores. This yields the variance. Note that the
variance for Quiz A is larger than that for Quiz B, indicating the students
were more varied in their performances on Quiz A.
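The five steps can be sketched in Python as shown below. The quiz scores are hypothetical, chosen so the arithmetic is easy to follow; they are not the scores from Table B.3. The function divides by the number of scores, as the text describes, which is the population form of the variance.

```python
def variance_by_steps(scores):
    """Variance as the average of the squared deviations from the mean."""
    n = len(scores)
    group_mean = sum(scores) / n                    # Step 1: group mean
    deviations = [x - group_mean for x in scores]   # Step 2: deviation scores
    # Check on the calculations: deviations should sum to zero.
    assert abs(sum(deviations)) < 1e-9
    squared = [d ** 2 for d in deviations]          # Step 3: square them
    sum_of_squares = sum(squared)                   # Step 4: sum the squares
    return sum_of_squares / n                       # Step 5: divide by n


quiz_scores = [4, 6, 7, 8, 10]  # hypothetical 10-point quiz scores

# Mean is 7; deviations are -3, -1, 0, 1, 3; squares sum to 20.
print(variance_by_steps(quiz_scores))  # 4.0
```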
Standard Deviation
The standard deviation, or S, is the square root of the variance. The
standard deviation of Quiz A would be
S = 3.19.
The standard deviation of Quiz B would be
S = 1.414.
Why not simply use the variance? One reason is that, unlike the variance,
the standard deviation is in the same units as the raw scores. This makes
the standard deviation more meaningful. Thus, it would make more sense
to discuss the variability of a set of IQ scores in IQ points than in squared
IQ points. The standard deviation is used in the calculation of many other
statistics.
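A minimal Python sketch of the relationship between the variance and the standard deviation follows. The quiz scores are hypothetical, not those in Table B.3. The standard library's pvariance and pstdev divide by the number of scores, matching the definition used in this appendix; stdev divides by one less than the number of scores (the sample form) and is shown only for contrast.

```python
import math
from statistics import pstdev, pvariance, stdev

quiz_scores = [4, 6, 7, 8, 10]  # hypothetical 10-point quiz scores

variance = pvariance(quiz_scores)  # average squared deviation: 4
s = math.sqrt(variance)            # square root of the variance

print(s)                           # 2.0, in quiz points rather than squared points
print(pstdev(quiz_scores))         # same value from the library function
print(stdev(quiz_scores))          # sample form, slightly larger
```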
Learning Check #10: The exam scores for two sections of
introductory sociology are listed below. Compute the standard
deviation for each section. Section #1: 42, 45, 56, 56, 60, 62, 67, 68,
70, 71. Section #2: 57, 57, 57, 70, 75, 77, 79, 83, 83, 92.
Learning Check #11: Suppose that there were two groups that
discussed issues related to abortion. Each member of each group
rated on a scale of 1 to 10 their opinion regarding abortion (1 =
Totally against abortion; 5 = Neutral; 10 = Totally in favor of
abortion). The mean for Group A was found to be 5 with a standard
deviation of 0.02. For Group B the mean was also 5, but the standard
deviation was 3.42. Which group would have the more lively
debates?
The Normal Curve and Percentiles
As illustrated in Figure B.5, the normal curve is a bell-shaped graph that
represents a hypothetical frequency distribution in which the frequency of
scores is greatest near the mean and progressively decreases toward the
extremes. In essence, the normal curve is a smooth frequency polygon
based on an infinite number of scores. The mean, median, and mode of a
normal curve are the same. Many human characteristics, such as height,
weight, and intelligence, fall approximately on a normal curve.
One useful characteristic of a normal curve is that certain percentages of
scores fall at certain distances (measured in standard deviation units) from
its mean. A special statistical table makes it a simple matter to determine
the percentage of scores that fall above or below a particular score or
between two scores on the curve. For example, about 68 percent of
scores fall between plus and minus one standard deviation from the mean;
about 95 percent fall between plus and minus two standard deviations
from the mean; and about 99.7 percent fall between plus and minus three
standard deviations from the mean.
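Instead of a printed table, these percentages can be computed with the NormalDist class in Python's standard statistics module (Python 3.8 or later). This is only a sketch; the helper name percent_within is an illustrative choice.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def percent_within(k):
    """Percentage of scores within k standard deviations of the mean."""
    return 100 * (standard_normal.cdf(k) - standard_normal.cdf(-k))


for k in (1, 2, 3):
    print(k, round(percent_within(k), 1))
```

The three values come out to roughly 68.3, 95.4, and 99.7 percent.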
For example, consider an aptitude test, with a mean of 100 and a standard
deviation of 15. What percentage of people score above 115? Because
aptitude scores fall on a normal curve, about 34 percent of the scores
fall between the mean and one standard deviation (in this case 15 points)
above the mean. We also know that for a normal distribution 50 percent
of the scores fall above the mean and 50 percent fall below the mean.
Thus, about 84 percent (50 percent below the mean and 34 percent
between the mean and a score of 115) of the scores fall below 115. If 84
percent fall below 115, then 16 percent (100 percent minus 84 percent) must
fall above a score of 115.
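The same reasoning can be checked in Python with the standard library's NormalDist, assuming, as the example does, that the aptitude scores are normally distributed with a mean of 100 and a standard deviation of 15. The variable names are illustrative.

```python
from statistics import NormalDist

aptitude = NormalDist(mu=100, sigma=15)

# cdf(x) is the proportion of scores at or below x.
percent_below_115 = 100 * aptitude.cdf(115)
percent_above_115 = 100 - percent_below_115

print(round(percent_below_115))  # 84
print(round(percent_above_115))  # 16
```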
Learning Check #12: An introductory sociology teacher who has
taught for years has developed a comprehensive final exam that is
normally distributed with a mean of 200 points and a standard
deviation of 25 points. (a) What percentage of the students score
above 200 points? (b) What percentage of the students score below
175 points? (c) What percentage of the students score more than 250
points?
Scores along the abscissa (the horizontal axis) of the normal curve also represent
percentiles--the scores at or below which particular percentages of
scores fall. Percentiles are frequently used, as they give us a quick idea of
how a score compares with the rest of the data set. If a score is equal to the
10th percentile, then you know that 10 percent of the scores fell at or
below that value and 90 percent of the scores were above that value.
With respect to IQ scores (mean of 100, standard deviation of 15), a score
of 115 would have a percentile rank of about 84.
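A percentile rank can be sketched in Python in two ways: by counting scores in an actual data set, or by using a normal curve. The list of scores below is hypothetical, and the function name percentile_rank is an illustrative choice. The normal-curve part assumes scores with a mean of 100 and a standard deviation of 15.

```python
from statistics import NormalDist


def percentile_rank(scores, value):
    """Percentage of scores that fall at or below `value`."""
    at_or_below = sum(1 for s in scores if s <= value)
    return 100 * at_or_below / len(scores)


# Hypothetical exam scores, invented for illustration only.
scores = [55, 60, 62, 70, 71, 75, 80, 84, 90, 98]
print(percentile_rank(scores, 60))  # 20.0 (2 of the 10 scores are at or below 60)

# Normal-curve version for a mean of 100 and standard deviation of 15.
iq = NormalDist(mu=100, sigma=15)
print(round(100 * iq.cdf(115)))     # 84
print(round(iq.inv_cdf(0.10), 1))   # score at the 10th percentile, about 80.8
```

The cdf method goes from a score to its percentile rank, and inv_cdf goes the other way, from a percentile to the score that sits at it.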
Learning Check #13: What are the percentile ranks for the three
scores listed in Learning Check #12: 200, 175, and 250?
Learning Check #14: Suppose you take your daughter Emily to the
doctor’s office for a well-check and find out that she is in the 5th
percentile for height and 7th percentile for weight. What do you now
know about Emily, as compared with other children her age?