navidi_monk_essential_statistics_1e_ch1_3

Chapter 3 Numerical Summaries of Data

Chapter 3 Summary

Section 3.1: We can describe the center of a data set with the mean or the median. When a data set is skewed to the left, the mean is generally less than the median, and when a data set is skewed to the right, the mean is generally greater than the median. The mode of a data set is the most frequently occurring value.

Section 3.2: The spread of a data set is most often measured with the standard deviation. For data sets that are unimodal and approximately symmetric, the Empirical Rule can be used to approximate the proportion of the data that lies within a given number of standard deviations of the mean. Chebyshev’s Inequality, which is valid for all data sets, provides a lower bound for the proportion of the data that lies within a given number of standard deviations of the mean. The coefficient of variation (CV) measures the spread of a data set relative to its mean. The CV provides a way to compare spreads of data sets whose values are in different units.

Section 3.3: For bell-shaped data sets, the z-score gives a good description of the position of a value in a data set. Quartiles and percentiles can be used to describe the positions for any data set. Quartiles are used to compute the five-number summary, which consists of the minimum value, the first quartile, the median, the third quartile, and the maximum value. Outliers are values that are considerably larger or smaller than most of the values in a data set. Boxplots are graphs that allow us to visualize the five-number summary, along with any outliers. Comparative boxplots allow us to visually compare the shapes of two or more data sets.
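To make the contrast between the Empirical Rule and Chebyshev’s Inequality in Section 3.2 concrete, the following is a minimal Python sketch. It assumes the standard statement of Chebyshev’s Inequality (at least 1 − 1/K² of the data lies within K standard deviations of the mean, for K > 1) and the usual Empirical Rule figures of roughly 68%, 95%, and almost all (about 99.7%) of the data within 1, 2, and 3 standard deviations. The function names and the small data set are hypothetical and chosen only for illustration, and the data set is treated as a population.

```python
from statistics import mean, pstdev

# Approximate Empirical Rule proportions (assumed standard values);
# these apply only to unimodal, approximately symmetric data.
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}


def chebyshev_lower_bound(k: float) -> float:
    """Lower bound on the proportion of data within k standard
    deviations of the mean; valid for any data set when k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's Inequality is informative only for k > 1")
    return 1 - 1 / k**2


def proportion_within(values: list[float], k: float) -> float:
    """Observed proportion of values within k population standard
    deviations of the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    inside = [x for x in values if abs(x - mu) <= k * sigma]
    return len(inside) / len(values)


# Hypothetical data set, treated as a population (mean 5, standard deviation 2).
data = [2, 4, 4, 4, 5, 5, 7, 9]

for k in (2, 3):
    print(k, chebyshev_lower_bound(k), proportion_within(data, k))
```

For any data set, the observed proportion should be at least the Chebyshev bound; the Empirical Rule figures are approximations that are appropriate only when the data are roughly bell-shaped.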
Vocabulary and Notation

$\sum x = x_1 + \cdots + x_n$ 89
arithmetic mean 88
boxplot 132
Chebyshev’s Inequality 114
coefficient of variation (CV) 116
comparative boxplots 134
degrees of freedom 108
deviation 105
Empirical Rule 112
first quartile $Q_1$ 127
five-number summary 129
interquartile range (IQR) 130
IQR method 131
mean 88
mean absolute deviation (MAD) 123
measure of center 88
measure of position 88
measure of spread 88
median 90
mode 94
modified boxplot 132
outlier 130
outlier boundaries 131
percentile 125
population mean $\mu$ 89
population standard deviation $\sigma$ 109
population variance $\sigma^2$ 106
quartile 127
range 105
resistant 92
sample mean $\bar{x}$ 89
sample standard deviation $s$ 109
sample variance $s^2$ 107
second quartile $Q_2$ 127
standard deviation 109
third quartile $Q_3$ 127
variance 105
whisker 132
z-score 124

Important Formulas

Sample mean: $\bar{x} = \dfrac{\sum x}{n}$

Population mean: $\mu = \dfrac{\sum x}{N}$

Range: Range = largest value − smallest value

Population variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$

Sample variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

Coefficient of variation: $CV = \dfrac{\sigma}{\mu}$

z-score: $z = \dfrac{x - \mu}{\sigma}$

Interquartile range: $\text{IQR} = Q_3 - Q_1$ = third quartile − first quartile

Lower outlier boundary: $Q_1 - 1.5\,\text{IQR}$

Upper outlier boundary: $Q_3 + 1.5\,\text{IQR}$

Chapter Quiz

1. Of the mean, median, and mode, which must be a value that actually appears in the data set?

2. The prices (in dollars) for a sample of personal computers are: 550, 700, 420, 580, 550, 450, 690, 390, 350. Calculate the mean, median, and mode for this sample.

3. If a computer with a price of $2000 were added to the list in Exercise 2, which would be affected more, the mean or the median?
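As a way to check hand calculations, the entries under Important Formulas can be sketched directly in Python. This is an illustrative sketch only: the function names and the small data set are hypothetical, and the quartiles are passed in as given values because quartile conventions differ between textbooks and software.

```python
def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, dividing by n - 1."""
    x_bar = mean(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


def coefficient_of_variation(sigma, mu):
    """Spread relative to the mean: CV = sigma / mu."""
    return sigma / mu


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


def outlier_boundaries(q1, q3):
    """Lower and upper outlier boundaries from the IQR method."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical data set, not taken from the text.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)
sigma = population_variance(data) ** 0.5  # standard deviation = square root of variance
print(mu, sigma, coefficient_of_variation(sigma, mu), z_score(9, mu, sigma))
```

The same functions can be applied to the data in the Chapter Quiz after working the exercises by hand.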

