
navidi_monk_essential_statistics_1e_ch1_3

130 Chapter 3 Numerical Summaries of Data

Objective 5 Understand the effects of outliers

Outliers

An outlier is a value that is considerably larger or considerably smaller than most of the values in a data set. Some outliers result from errors; for example a misplaced decimal point may cause a number to be much larger or smaller than the other values in a data set. Some outliers are correct values, and simply reflect the fact that the population contains some extreme values.

CAUTION Do not delete an outlier unless it is certain that it is an error.

When it is certain that an outlier resulted from an error, the value should be corrected or deleted. However, if it is possible that the value of an outlier is correct, it should remain in the data set. Deleting an outlier that is not an error will produce misleading results.

EXAMPLE 3.28 Determining whether an outlier should be deleted

The temperature in a downtown location in a certain city is measured for eight consecutive days during the summer. The readings, in degrees Fahrenheit, are 81.2, 85.6, 89.3, 91.0, 83.2, 8.45, 79.5, and 87.8. Which reading is an outlier? Is it certain that the outlier is an error, or is it possible that it is correct? Should the outlier be deleted?

Solution

The outlier is 8.45, which is much smaller than the rest of the data. This outlier is certainly an error; it is likely that a decimal point was misplaced. The outlier should be corrected if possible, or deleted.

EXAMPLE 3.29 Determining whether an outlier should be deleted

The following table presents the populations, as of July 2009, of the five largest cities in the United States.

City           Population in millions
New York       8.4
Los Angeles    3.8
Chicago        2.9
Houston        2.3
Phoenix        1.6

Source: U.S. Census Bureau

Which value is an outlier? Is it certain that the outlier is an error, or is it possible that it is correct? Should the outlier be deleted?
Solution

The population of New York, 8.4 million, is an outlier because it is much larger than the other values. This outlier is not an error. It should not be deleted. If it were deleted, the data would indicate that the largest city in the United States is Los Angeles, which would be incorrect.

The interquartile range

The interquartile range (IQR for short) is a measure of spread that is often used to detect outliers. The IQR is the difference between the first and third quartiles.

DEFINITION The interquartile range is found by subtracting the first quartile from the third quartile.

IQR = Q3 − Q1
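
The definition IQR = Q3 − Q1 can be illustrated with a short Python sketch applied to the temperature readings of Example 3.28. The function name interquartile_range is hypothetical, and the quartiles are computed with the "inclusive" interpolation rule of Python's statistics.quantiles. That rule is an assumption made for illustration; it is not necessarily the quartile procedure used in this text.

```python
from statistics import quantiles


def interquartile_range(values):
    """Return Q3 - Q1 for a list of numbers (hypothetical helper)."""
    # quantiles(..., n=4) returns the three cut points [Q1, Q2, Q3].
    # method="inclusive" is an assumed convention; other quartile rules
    # can give slightly different values of Q1 and Q3.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Temperature readings from Example 3.28, including the erroneous 8.45.
temps = [81.2, 85.6, 89.3, 91.0, 83.2, 8.45, 79.5, 87.8]
print(f"IQR = {interquartile_range(temps):.2f}")
```

Because quartile conventions differ among textbooks and software packages, the value printed here may not match a hand computation exactly. The subtraction of the first quartile from the third is the same under every convention.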

