Sunday, October 20, 2019

Example of Goodness of Fit Test

The chi-square goodness of fit test is useful for comparing a theoretical model to observed data. This test is a type of the more general chi-square test. As with any topic in mathematics or statistics, it can be helpful to work through an example in order to understand what is happening. What follows is an example of the chi-square goodness of fit test.

Consider a standard package of milk chocolate M&Ms. There are six different colors: red, orange, yellow, green, blue and brown. Suppose that we are curious about the distribution of these colors and ask, do all six colors occur in equal proportion? This is the type of question that can be answered with a goodness of fit test.

Setting

We begin by noting the setting and why the goodness of fit test is appropriate. Our variable of color is categorical. There are six levels of this variable, corresponding to the six colors that are possible. We will assume that the M&Ms we count will be a simple random sample from the population of all M&Ms.

Null and Alternative Hypotheses

The null and alternative hypotheses for our goodness of fit test reflect the assumption that we are making about the population. Since we are testing whether the colors occur in equal proportions, our null hypothesis will be that all colors occur in the same proportion. More formally, if p1 is the population proportion of red candies, p2 is the population proportion of orange candies, and so on, then the null hypothesis is that p1 = p2 = . . . = p6 = 1/6. The alternative hypothesis is that at least one of the population proportions is not equal to 1/6.

Actual and Expected Counts

The actual counts are the number of candies for each of the six colors. The expected count refers to what we would expect if the null hypothesis were true. We will let n be the size of our sample. The expected number of red candies is p1 x n, or n/6. In fact, for this example, the expected number of candies for each of the six colors is simply n times the null proportion for that color, or n/6.
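The expected-count rule described above can be sketched in a few lines of Python. This is only an illustration: the helper name expected_counts and the sample size of 60 are made up for the sketch and do not come from the example itself.

```python
from fractions import Fraction

# The six colors named in the text.
colors = ["red", "orange", "yellow", "green", "blue", "brown"]


def expected_counts(n, proportions):
    """Expected count for each level under the null hypothesis: n times p_i."""
    return {level: n * p for level, p in proportions.items()}


# Null hypothesis: every color has population proportion 1/6.
null_proportions = {color: Fraction(1, 6) for color in colors}

# n = 60 is a hypothetical sample size, chosen only to keep the numbers small.
expected = expected_counts(60, null_proportions)
for color, count in expected.items():
    print(f"{color}: {count}")
```

Using exact fractions keeps 1/6 from being rounded, so the expected counts always add back up to n.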
Chi-square Statistic for Goodness of Fit

We will now calculate a chi-square statistic for a specific example. Suppose that we have a simple random sample of 600 M&M candies with the following distribution:

212 of the candies are blue.
147 of the candies are orange.
103 of the candies are green.
50 of the candies are red.
46 of the candies are yellow.
42 of the candies are brown.

If the null hypothesis were true, then the expected count for each of these colors would be (1/6) x 600 = 100. We now use this in our calculation of the chi-square statistic.

We calculate the contribution to our statistic from each of the colors. Each is of the form (Actual – Expected)^2/Expected:

For blue we have (212 – 100)^2/100 = 125.44
For orange we have (147 – 100)^2/100 = 22.09
For green we have (103 – 100)^2/100 = 0.09
For red we have (50 – 100)^2/100 = 25
For yellow we have (46 – 100)^2/100 = 29.16
For brown we have (42 – 100)^2/100 = 33.64

We then total all of these contributions and determine that our chi-square statistic is 125.44 + 22.09 + 0.09 + 25 + 29.16 + 33.64 = 235.42.

Degrees of Freedom

The number of degrees of freedom for a goodness of fit test is simply one less than the number of levels of our variable. Since there were six colors, we have 6 – 1 = 5 degrees of freedom.

Chi-square Table and P-Value

The chi-square statistic of 235.42 that we calculated corresponds to a particular location on a chi-square distribution with five degrees of freedom. We now need a p-value, which is the probability of obtaining a test statistic at least as extreme as 235.42, assuming that the null hypothesis is true.

Microsoft Excel can be used for this calculation. We find that our test statistic with five degrees of freedom has a p-value of 7.29 x 10^-49. This is an extremely small p-value.

Decision Rule

We make our decision on whether to reject the null hypothesis based on the size of the p-value. Since our p-value is extremely small, we reject the null hypothesis.
We conclude that M&Ms are not evenly distributed among the six different colors. A follow-up analysis could be used to determine a confidence interval for the population proportion of one particular color.
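For readers who would like to check the arithmetic, the whole calculation can be sketched in Python using only the standard library. This is not the Excel calculation mentioned above. The helper chi2_sf is a hypothetical name introduced here, and it computes the upper-tail probability from a standard recurrence for the incomplete gamma function. The significance level of 0.05 is also an assumption, since the example does not state one.

```python
import math

# Observed counts from the sample of 600 candies described above.
observed = {"blue": 212, "orange": 147, "green": 103,
            "red": 50, "yellow": 46, "brown": 42}

n = sum(observed.values())
expected = n / len(observed)  # equal proportions under the null hypothesis

# Contribution of each color: (Actual - Expected)^2 / Expected
contributions = {color: (count - expected) ** 2 / expected
                 for color, count in observed.items()}
chi_square = sum(contributions.values())
df = len(observed) - 1  # one less than the number of levels


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Hypothetical helper (not a library function). Starts from
    Q(1, y) = exp(-y) for even df or Q(1/2, y) = erfc(sqrt(y)) for odd df,
    then applies Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


p_value = chi2_sf(chi_square, df)
alpha = 0.05  # assumed significance level; the example does not specify one

print(f"chi-square statistic: {chi_square:.2f}")
print(f"degrees of freedom: {df}")
print(f"p-value: {p_value:.3g}")
print("reject the null hypothesis" if p_value < alpha
      else "fail to reject the null hypothesis")
```

The printed p-value should come out on the order of 10^-49, so the decision to reject is the same at any conventional significance level.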
