Descriptive Statistics

Suppose you gained access to the high school grade point averages of all the freshmen at your college or university, hundreds or even thousands of them. What is the most typical score? How similar are the scores to one another? Simply scanning the scores would provide, at best, rough approximations of the answers to these questions. To obtain precise answers, psychologists use descriptive statistics, which include measures of central tendency and measures of variability.

Measures of Central Tendency

A measure of central tendency is a single score that best represents an entire set of scores. The measures of central tendency include the mode, the median, and the mean.

Mode

The mode is the most frequently occurring score in a set of scores. In the frequency distribution of exam scores discussed earlier, the mode is 90. If two different scores tie as the most frequent, the distribution is bimodal. If the data consist of counts of categories, then the category with the most cases is the mode. For example, in determining the most common academic major at your school, the mode is the major with the most students. Similarly, the winner of a presidential primary election with several candidates represents the mode: the person selected by more voters than any other.

The mode can be the best measure of central tendency for practical reasons. Imagine a car dealership given the option of carrying a particular model but limited to selecting just one color. The dealership owner would be wise to choose the modal color, that is, the most popular one.

Learning Check #5: A researcher is interested in the effect of family size on self-esteem. To begin this study, ten students are each asked how many brothers and sisters they have. The responses are as follows: 2, 3, 1, 0, 9, 2, 3, 2, 4, 2. What is the mode for this set of data?

Mean

The mean is the arithmetic average, or simply the average, of a set of scores. You are probably more familiar with it than with any other measure of central tendency.
You encounter the mean in everyday life whenever you calculate your exam average, batting average, average gas mileage, or a host of other averages. The mean of a sample is calculated by adding all the scores and dividing by the number of scores. For example:

Exam scores: 99, 92, 93, 94, 97
Mean = (99 + 92 + 93 + 94 + 97) / 5 = 475 / 5 = 95
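The two rules described so far can be checked with a few lines of Python. This is a minimal sketch, assuming Python 3.8 or later for `statistics.multimode`: it reproduces the exam-score mean by hand (sum divided by count) and uses `multimode`, which returns every value tied for most frequent, to show both an ordinary mode and a bimodal case. The quiz scores and majors below are hypothetical illustrations, not data from the text.

```python
from statistics import multimode

# Exam scores from the mean example in the text
exam_scores = [99, 92, 93, 94, 97]

# Mean: add all the scores, then divide by the number of scores
total = sum(exam_scores)
exam_mean = total / len(exam_scores)
print(total, exam_mean)  # 475 95.0

# Mode: multimode() returns every value tied for most frequent, in order of
# first appearance, so a bimodal set yields two values.
# These quiz scores and majors are hypothetical illustrations.
quiz_scores = [80, 85, 85, 90, 90, 95]
majors = ["psychology", "biology", "psychology", "history"]
print(multimode(quiz_scores))  # [85, 90]
print(multimode(majors))       # ['psychology']
```

Because `multimode` works on any hashable values, the same call covers both numeric scores and category counts such as academic majors.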
Learning Check #6: What is the mean number of brothers and sisters listed in Learning Check #5?

Median

The median is the middle score in a distribution of scores that have been ranked in numerical order. If the median falls between two scores, it is assigned the value of the midpoint between them (for example, the median of 23, 34, 55, and 68 is (34 + 55) / 2 = 44.5). The median is often the best measure of central tendency for skewed distributions, because it is unaffected by how extreme the highest or lowest scores are. In the example below, the median is the same for both sets of exam scores, even though the second set contains an extreme score. The means, however, differ considerably because of the one extreme score on Exam B.

Exam A: 23, 25, 63, 64, 67 (median = 63, mean = 48.4)
Exam B: 23, 25, 63, 64, 98 (median = 63, mean = 54.6)

When Disraeli pointed out the ease of lying with statistics, he might have been referring in particular to measures of central tendency. Suppose a baseball general manager is negotiating with an agent about the salary of a catcher of average ability. Each might use a measure of central tendency to support his own position, perhaps based on the salaries of the top seven catchers, as shown in Table B.2. The general manager might claim that a salary of $340,000 (the median) would give the player what he deserves, based on the average salary of the other players. The agent might counter that a salary of $900,000 (the mean) would give the player what he deserves, based on the average salary of the other players. Neither would technically be lying; each would simply be using the statistic that favored his position. As the Scottish writer Andrew Lang (1844-1912) warned, beware of anyone who uses statistics as a drunken man uses lampposts: for support rather than for illumination.

Learning Check #7: What is the median number of brothers and sisters listed in Learning Check #5?
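The Exam A and Exam B comparison can be reproduced with Python's standard-library `statistics` module. This sketch assumes nothing beyond `mean` and `median` from that module; it shows that raising a single extreme score moves the mean but leaves the median untouched, and that `median` averages the two middle scores when the count is even.

```python
from statistics import mean, median

# Exam A and Exam B differ only in their highest score (67 vs. 98)
exam_a = [23, 25, 63, 64, 67]
exam_b = [23, 25, 63, 64, 98]

for label, scores in (("Exam A", exam_a), ("Exam B", exam_b)):
    # median() sorts the scores itself; mean() adds them and divides by n
    print(label, "median:", median(scores), "mean:", mean(scores))
# Exam A median: 63 mean: 48.4
# Exam B median: 63 mean: 54.6

# With an even number of scores, median() returns the midpoint
# of the two middle scores
print(median([23, 34, 55, 68]))  # 44.5
```

Only the rank order of the middle scores matters to the median, which is why replacing 67 with 98 has no effect on it.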
Learning Check #8: Note that the mean number of brothers and sisters is quite different from the median number. In this case, which measure of central tendency would be most appropriate to report? Why?

Measures of Variability

Although a measure of central tendency is certainly important, it does not completely describe a distribution by itself. A measure of central tendency tells you where scores tend to fall, but not the extent to which the scores differ from one another. A measure of the amount of dispersion within a data set is called a measure of variability. Except when all the scores in a data set are identical, every set of scores varies to some degree. Consider the members of your psychology class: they would vary on a host of measures, including height, weight, and grade point average. Measures of variability include the range, the variance, and the standard deviation.

Range

The range is the difference between the highest and lowest scores in a distribution. The range provides limited information, because distributions in which scores bunch up toward the bottom, middle, or top might all have the same range. Still, the range is useful as a rough indication of how a score compares with the highest and lowest scores in a distribution. For example, a student might find it useful to know whether he or she scored near the top or near the bottom on an exam. The range of the 20 grades in the earlier example in Table B.1 would be the difference between 94 and 80, or 14.

Learning Check #9: A memory researcher would like to know how many digits a person can recall after a single presentation of a list. She creates random lists of digits and presents them to participants. The numbers of digits recalled by the first 10 participants are as follows: 5, 9, 6, 10, 9, 7, 8, 7, 9, 12. What is the range of this data set?
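The range, and its main limitation, can be illustrated with a short sketch. The function name `score_range` is a hypothetical choice (it avoids shadowing Python's built-in `range`), and the two grade lists are made-up examples, chosen so that both run from 80 to 94 like the Table B.1 grades while being bunched at opposite ends.

```python
def score_range(scores):
    """Range: the highest score minus the lowest score.

    Named score_range so it does not shadow Python's built-in range().
    """
    return max(scores) - min(scores)

# Two hypothetical sets of exam grades: one bunched near the bottom,
# one bunched near the top. Both run from 80 to 94.
bunched_low = [80, 80, 81, 82, 94]
bunched_high = [80, 92, 93, 94, 94]

print(score_range(bunched_low))   # 14
print(score_range(bunched_high))  # 14
```

Both sets report the same range even though their scores are distributed very differently, which is why the range gives only a rough picture of variability.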
Variance

A more informative measure of variability is the variance, which represents the variability of scores around their group mean. Unlike the range, the variance takes into account every score in the distribution. Technically, the variance is the average of the squared deviations from the mean.

Suppose you wanted to calculate the variance for the sets of 10-point quiz scores in Quiz A and Quiz B (Table B.3). First, find the group mean. Second, find the deviation of each score from the group mean. Deviation scores will be negative for scores below the mean; as a check on your calculations, the deviation scores should sum to zero. Third, square the deviation scores. Squaring makes negative deviations positive and gives extreme scores relatively more weight. Fourth, find the sum of the squared deviation scores. Fifth, divide that sum by the number of scores. The result is the variance. The variance for Quiz A is larger than that for Quiz B, indicating that the students' performances were more varied on Quiz A.

Standard Deviation

The standard deviation, or S, is the square root of the variance. The standard deviation of Quiz A would be S = 3.19; the standard deviation of Quiz B would be S = 1.414. Why not simply use the variance? One reason is that, unlike the variance, the standard deviation is expressed in the same units as the raw scores, which makes it more meaningful. For example, it makes more sense to discuss the variability of a set of IQ scores in IQ points than in squared IQ points. The standard deviation is also used in the calculation of many other statistics.

Learning Check #10: The exam scores for two sections of introductory psychology are listed below. Compute the standard deviation for each section.

Section #1: 42, 45, 56, 56, 60, 62, 67, 68, 70, 71
Section #2: 57, 57, 57, 70, 75, 77, 79, 83, 83, 92
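The five-step variance procedure translates directly into code. This is a minimal sketch: the function names `population_variance` and `standard_deviation` are hypothetical, and the example reuses the exam scores from the mean section (mean 95) rather than the Quiz A and Quiz B data, which appear only in Table B.3. Each line is tagged with the step it implements, including the text's "deviations sum to zero" check.

```python
import math

def population_variance(scores):
    """Variance computed with the five steps described in the text."""
    n = len(scores)
    group_mean = sum(scores) / n                    # Step 1: find the group mean
    deviations = [x - group_mean for x in scores]   # Step 2: deviation scores
    # Check from the text: deviation scores should sum to (about) zero
    assert math.isclose(sum(deviations), 0.0, abs_tol=1e-9)
    squared = [d * d for d in deviations]           # Step 3: square the deviations
    sum_squared = sum(squared)                      # Step 4: sum them
    return sum_squared / n                          # Step 5: divide by the number of scores

def standard_deviation(scores):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(population_variance(scores))

exam_scores = [99, 92, 93, 94, 97]  # the mean example from earlier (mean = 95)
print(population_variance(exam_scores))           # 6.8
print(round(standard_deviation(exam_scores), 3))  # 2.608
```

Because step five divides by the number of scores, these results match the standard library's `statistics.pvariance` and `statistics.pstdev`. The library's `statistics.variance` and `statistics.stdev` divide by N - 1 instead (the usual sample estimate), so they return somewhat larger values for the same data.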
Learning Check #11: Suppose that two groups discussed issues related to abortion. Each member of each group rated his or her opinion regarding abortion on a scale of 1 to 10 (1 = totally against abortion; 5 = neutral; 10 = totally in favor of abortion). The mean for Group A was 5, with a standard deviation of .02. For Group B the mean was also 5, but the standard deviation was 3.42. Which group would have the more lively debates?

The Normal Curve and Percentiles

As illustrated in Figure B.5, the normal curve is a bell-shaped graph that represents a hypothetical frequency distribution in which the frequency of scores is greatest near the mean and progressively decreases toward the extremes. In essence, the normal curve is a smooth frequency polygon based on an infinite number of scores. The mean, median, and mode of a normal curve are the same. Many physical and psychological characteristics, such as height, weight, and intelligence, approximately follow a normal curve.

One useful characteristic of a normal curve is that fixed percentages of scores fall within given distances (measured in standard deviation units) of its mean. A special statistical table makes it a simple matter to determine the percentage of scores that fall above or below a particular score, or between two scores, on the curve. For example, about 68 percent of scores fall within one standard deviation above or below the mean; about 95 percent fall within two standard deviations; and about 99.7 percent fall within three standard deviations.

To see how this works, consider an IQ test with a mean of 100 and a standard deviation of 15. What percentage of people score above 115? Because intelligence scores approximately follow a normal curve, about 34 percent of the scores fall between the mean and one standard deviation (in this case 15 points) above the mean.
We also know that in a normal distribution, 50 percent of the scores fall above the mean and 50 percent fall below it. Thus, about 84 percent of the scores (50 percent below the mean plus 34 percent between the mean and 115) fall below 115. If 84 percent fall below 115, then 16 percent (100 percent - 84 percent) must fall above a score of 115.

Learning Check #12: An introductory psychology teacher who has taught for years has developed a comprehensive final exam whose scores are normally distributed with a mean of 200 points and a standard deviation of 25 points. (a) What percentage of the students score above 200 points? (b) What percentage of the students score below 175 points? (c) What percentage of the students score more than 250 points?

Scores along the abscissa (horizontal axis) of the normal curve also represent percentiles: the scores at or below which particular percentages of scores fall. Percentiles are frequently used because they give a quick idea of how a score compares with the rest of the data set. If a score is at the 10th percentile, then 10 percent of the scores fell at or below that value and 90 percent were above it. With respect to IQ scores, a score of 115 would have a percentile rank of 84.

Learning Check #13: What are the percentile ranks for the three scores listed in Learning Check #12: 200, 175, and 250?

Learning Check #14: Suppose you take your daughter Emily to the doctor's office for a well-child checkup and find out that she is in the 5th percentile for height and the 7th percentile for weight. What do you now know about Emily, as compared with other children her age?

Answers to Learning Checks

Learning Check #5: The mode is the most frequently occurring number in a data set. In this case, the number 2 occurs most frequently (four times) and is therefore the mode.

Learning Check #6: The mean number of brothers and sisters is the sum of all the data points divided by the number of data points. In this case, the sum of the data points is 28 and there are 10 data points, for a mean of 28 / 10 = 2.8.
Learning Check #7: The median number of brothers and sisters is the middle value after the data points have been ordered from lowest to highest. Remember that if there is an even number of data points, the median is halfway between the two middle data points. In this case, the numbers in order are: 0, 1, 2, 2, 2, 2, 3, 3, 4, 9. The two middle data points are 2 and 2, which makes the median 2.

Learning Check #8: The median number of brothers and sisters is 2, whereas the mean is 2.8. The median is the better indicator here, because one extreme score (9) pulls the mean up considerably. The median is usually the preferred measure of central tendency when there is an extreme score.

Learning Check #9: The range is the highest number minus the lowest number. In this case it is simply 12 - 5 = 7, so the range is 7.

Learning Check #10: For Section #2, the mean is 730 / 10 = 73, the squared deviations from the mean sum to 1,394, the variance is 1,394 / 10 = 139.4, and the standard deviation is the square root of 139.4, or about 11.81. The same procedure for Section #1 (mean 59.7, sum of squared deviations 918.1, variance 91.81) gives a standard deviation of about 9.58.

Learning Check #11: Although both groups had a mean rating of neutral, the standard deviations tell a great deal here. With a standard deviation of .02, the scores hardly vary at all; nearly everyone in Group A gave a rating very close to 5. With a standard deviation of 3.42, the story is much different. Ratings in Group B deviate a great deal from 5, which indicates that the group holds many different and extreme positions. Group B would almost certainly have the more lively discussions.

Learning Check #12: (a) Fifty percent score higher than 200, because in a normal distribution 50 percent of the scores lie above the mean. (b) About 16 percent of the students score below 175. Note that 25 is the standard deviation, so 175 is one standard deviation below the mean. The table of percentages for the normal distribution shows that about 16 percent of scores fall below the point one standard deviation below the mean.
(c) A score of 250 is 50 points above the mean, which equals 2 standard deviation units of 25 points each (25 times 2 = 50). The table for the normal distribution shows that only about 2 percent of the scores are higher than 2 standard deviations above the mean.

Learning Check #13: (a) A score of 200 is at the 50th percentile. (b) Recall that about 16 percent of the scores fall below 175, which means that this score is at the 16th percentile. (c) You already determined that only about 2 percent of the scores lie above 250. If about 2 percent are above 250, then about 98 percent must be at or below it, so the answer is the 98th percentile.

Learning Check #14: The 5th percentile for height means that only 5 percent of children Emily's age are her height or shorter, so 95 percent are taller. Her weight is similar: only 7 percent of children her age weigh the same as or less than she does, so 93 percent weigh more. What you now know is that Emily is small for her age and that her height and weight are roughly in proportion.
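The normal-curve percentages used in the IQ example and in the answers to Learning Checks #12 and #13 can be checked without a printed table. This sketch assumes Python 3.8 or later, whose standard-library `statistics.NormalDist` provides a `cdf` method giving the proportion of a normal distribution at or below a score; multiplying that proportion by 100 gives the percentile rank.

```python
from statistics import NormalDist

# IQ example from the text: mean 100, standard deviation 15
iq = NormalDist(mu=100, sigma=15)
below_115 = iq.cdf(115)  # proportion of scores at or below 115
print(f"below 115: {below_115:.1%}")      # below 115: 84.1%
print(f"above 115: {1 - below_115:.1%}")  # above 115: 15.9%

# Percentage within 1, 2, and 3 standard deviations of the mean,
# using the standard normal curve (mean 0, standard deviation 1)
z = NormalDist()
for k in (1, 2, 3):
    print(f"within +/-{k} SD: {z.cdf(k) - z.cdf(-k):.1%}")
# within +/-1 SD: 68.3%
# within +/-2 SD: 95.4%
# within +/-3 SD: 99.7%

# Final exam from Learning Check #12: mean 200, standard deviation 25
exam = NormalDist(mu=200, sigma=25)
for score in (200, 175, 250):
    print(f"{score}: percentile rank {exam.cdf(score) * 100:.1f}")
# 200: percentile rank 50.0
# 175: percentile rank 15.9
# 250: percentile rank 97.7
```

The exact values (84.1, 15.9, 97.7) round to the 84th, 16th, and 98th percentiles given in the answers; the answers' "about 2 percent" above 250 corresponds to roughly 2.3 percent on the exact curve.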