Chapter 3 Experiments with a Single Factor: The Analysis of Variance

3-1 An example

A product development engineer is interested in maximizing the tensile strength of a new synthetic fiber that will be used to make cloth for men's shirts. According to previous experience, the strength is affected by the percentage of cotton in the fiber. For a permanent-press finish treatment, the cotton content should range from 10 percent to 40 percent.

The engineer decides to test specimens at five levels of cotton percentage: 15, 20, 25, 30, and 35 percent, and to test five specimens at each level. This is an example of a single-factor experiment with a = 5 levels of the factor and n = 5 replicates. The 25 runs should be made in random order.

Assignment of runs to cotton percentages:

Cotton percentage   Experiment run number
15                  1, 2, 3, 4, 5
20                  6, 7, 8, 9, 10
25                  11, 12, 13, 14, 15
30                  16, 17, 18, 19, 20
35                  21, 22, 23, 24, 25

Randomized test sequence:

Test sequence   Run number   Percentage of cotton
1               8            20
2               18           30
3               10           20
4               23           35
5               17           30
6               5            15
7               14           25
8               6            20
9               15           25
10              20           30
11              9            20
12              4            15
13              12           25
14              7            20
15              1            15
16              24           35
17              21           35
18              11           25
19              2            15
20              13           25
21              22           35
22              16           30
23              25           35
24              19           30
25              3            15

Table 3-1 Data (in lb/in^2) from the tensile strength experiment

Cotton percentage   Observations              Total y_i.   Average
15                   7   7  15  11   9         49           9.8
20                  12  17  12  18  18         77          15.4
25                  14  18  18  19  19         88          17.6
30                  19  25  22  19  23        108          21.6
35                   7  10  11  15  11         54          10.8
                                      y.. =   376          15.04

Table 3-2 Typical data for a single-factor experiment

Treatment (level)   Observations                    Totals    Averages
1                   y_11  y_12  ...  y_1n           y_1.      \bar{y}_1.
2                   y_21  y_22  ...  y_2n           y_2.      \bar{y}_2.
...                 ...                             ...       ...
a                   y_a1  y_a2  ...  y_an           y_a.      \bar{y}_a.
                                                    y_..      \bar{y}_..

We will find it useful to describe the observations with a linear statistical model:

y_{ij} = \mu + \tau_i + \epsilon_{ij},   i = 1, 2, ..., a;  j = 1, 2, ..., n

where y_{ij} is the (ij)th observation, \mu is the overall mean, \tau_i is called the ith treatment effect, and \epsilon_{ij} is a random error component.

3-3 Analysis of the fixed effects model

In this section, the single-factor analysis of variance for the fixed effects model is developed. In the fixed effects model, the treatment effects \tau_i are usually defined as deviations from the overall mean, so that

\sum_{i=1}^{a} \tau_i = 0

We use the dot-subscript notation: y_{i.} = \sum_{j=1}^{n} y_{ij} represents the total of the observations under the ith treatment, \bar{y}_{i.} = y_{i.}/n is the average of the observations under the ith treatment, y_{..} = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij} is the grand total of all the observations, and \bar{y}_{..} = y_{..}/N is the grand average, where N = an is the total number of observations.

The mean of the ith treatment is E(y_{ij}) = \mu_i = \mu + \tau_i, i = 1, 2, ..., a. Thus, the mean of the ith treatment consists of the overall mean plus the ith treatment effect.

We are interested in testing the equality of the a treatment means, that is,

H_0: \mu_1 = \mu_2 = ... = \mu_a
H_1: \mu_i \ne \mu_j for at least one pair (i, j)

Note that if H_0 is true, all treatments have a common mean \mu. An equivalent statement of the hypotheses in terms of the treatment effects is

H_0: \tau_1 = \tau_2 = ... = \tau_a = 0
H_1: \tau_i \ne 0 for at least one i

3-3.1 Decomposition of the total sum of squares

The name "analysis of variance" is derived from a partitioning of total variability into its component parts. The total corrected sum of squares is

SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2
     = n \sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2
     = SS_{Treatments} + SS_E                                                  (3-6)

SS_{Treatments} is called the sum of squares due to treatments (between treatments), and SS_E is called the sum of squares due to error (within treatments).

Degrees of freedom. There are an = N total observations, so SS_T has N - 1 degrees of freedom. There are a levels of the factor, so SS_{Treatments} has a - 1 degrees of freedom. Within any treatment there are n replicates providing n - 1 degrees of freedom with which to estimate the experimental error; since there are a treatments, we have a(n - 1) = an - a = N - a degrees of freedom for error.

The quantities

MS_{Treatments} = SS_{Treatments}/(a - 1)   and   MS_E = SS_E/(N - a)

are called mean squares.
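As a quick check of this partition, the following minimal Python sketch (assuming NumPy is available) recomputes the treatment totals, the grand total, and the three sums of squares directly from the Table 3-1 data. The variable names are illustrative, not from the text.

```python
import numpy as np

# Tensile strength data from Table 3-1 (rows = cotton percentages 15..35, columns = replicates)
y = np.array([
    [ 7,  7, 15, 11,  9],   # 15 percent cotton
    [12, 17, 12, 18, 18],   # 20 percent
    [14, 18, 18, 19, 19],   # 25 percent
    [19, 25, 22, 19, 23],   # 30 percent
    [ 7, 10, 11, 15, 11],   # 35 percent
])
a, n = y.shape                       # a = 5 treatments, n = 5 replicates
N = a * n

y_i_dot = y.sum(axis=1)              # treatment totals y_i.
y_bar_i = y.mean(axis=1)             # treatment averages
y_dot_dot = y.sum()                  # grand total y..
y_bar = y.mean()                     # grand average
print(y_i_dot, y_dot_dot)            # 49 77 88 108 54 and 376, matching Table 3-1

SS_T = ((y - y_bar) ** 2).sum()                 # total corrected sum of squares
SS_Tr = n * ((y_bar_i - y_bar) ** 2).sum()      # between treatments
SS_E = ((y - y_bar_i[:, None]) ** 2).sum()      # within treatments (error)
print(SS_T, SS_Tr + SS_E)            # both 636.96: the partition holds
print(N - 1, a - 1, N - a)           # degrees of freedom: 24 = 4 + 20

MS_Tr = SS_Tr / (a - 1)              # mean square for treatments
MS_E = SS_E / (N - a)                # mean square for error
```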
We now examine the expected values of these mean squares. Taking expectations, using E(\epsilon_{ij}) = 0 and replacing the terms involving \epsilon_{ij}^2 by \sigma^2, we find

E(MS_E) = \sigma^2

By a similar approach, we may also show that

E(MS_{Treatments}) = \sigma^2 + n \sum_{i=1}^{a} \tau_i^2 / (a - 1)

Thus, as we argued heuristically, MS_E estimates \sigma^2, and, if there are no differences in treatment means (which implies that \tau_i = 0), MS_{Treatments} also estimates \sigma^2. However, note that if treatment means do differ, the expected value of the treatment mean square is greater than \sigma^2. It seems clear that a test of the hypothesis of no difference in treatment means can be performed by comparing MS_{Treatments} and MS_E.

3-3.2 Statistical analysis

We now investigate how a formal test of the hypothesis of no differences in treatment means (H_0: \mu_1 = \mu_2 = ... = \mu_a, or equivalently H_0: \tau_1 = \tau_2 = ... = \tau_a = 0) can be performed. The test statistic

F_0 = MS_{Treatments} / MS_E

is distributed as F with a - 1 and N - a degrees of freedom when H_0 is true. Therefore, if F_0 > F_{\alpha, a-1, N-a}, we reject H_0.

From Equation 3-6 we obtain the computing formulas

SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - y_{..}^2/N
SS_{Treatments} = (1/n) \sum_{i=1}^{a} y_{i.}^2 - y_{..}^2/N

The error sum of squares is obtained by subtraction as

SS_E = SS_T - SS_{Treatments}

Example 3-1 (see Table 3-1 and Table 3-2)

The sums of squares required for the analysis of variance are computed as follows:

SS_T = (7)^2 + (7)^2 + (15)^2 + ... + (15)^2 + (11)^2 - (376)^2/25 = 636.96
SS_{Treatments} = (1/5)[(49)^2 + (77)^2 + (88)^2 + (108)^2 + (54)^2] - (376)^2/25 = 475.76
SS_E = SS_T - SS_{Treatments} = 636.96 - 475.76 = 161.20

The test statistic is F_0 = MS_{Treatments}/MS_E = 118.94/8.06 = 14.76. If we set \alpha = 0.01, then F_{0.01, 4, 20} = 4.43. Since F_0 > F_{0.01, 4, 20}, we reject H_0 and conclude that the treatment means differ; that is, the percentage of cotton in the fiber significantly affects the mean tensile strength.

Example 3-2 Coding the observations

We subtract 15 from each observation. The coded data are shown in Table 3-5. It is easy to verify that

SS_T = (-8)^2 + (-8)^2 + (0)^2 + ... + (-4)^2 - (1)^2/25 = 636.96

and SS_E = 161.20, the same values obtained from the original data.

Table 3-5 Coded tensile strength data for Example 3-2

Percentage of cotton   Observations              Totals y_i.
15                     -8  -8   0  -4  -6         -26
20                     -3   2  -3   3   3           2
25                     -1   3   3   4   4          13
30                      4  10   7   4   8          33
35                     -8  -5  -4   0  -4         -21
                                         y.. =       1

3-3.3 Estimation of the model parameters

We now develop estimators for the parameters in the single-factor model using the method of least squares. The least squares criterion is

L = \sum_{i=1}^{a}\sum_{j=1}^{n} \epsilon_{ij}^2 = \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \mu - \tau_i)^2        (3-11)

We choose values of \mu and \tau_i, say \hat{\mu} and \hat{\tau}_i, that minimize L. Differentiating Equation 3-11 with respect to \mu and \tau_i and equating the derivatives to zero, we obtain, after simplification, the normal equations

N\hat{\mu} + n\hat{\tau}_1 + n\hat{\tau}_2 + ... + n\hat{\tau}_a = y_{..}
n\hat{\mu} + n\hat{\tau}_i = y_{i.},   i = 1, 2, ..., a                                                          (3-12)

The a + 1 equations (Equation 3-12) in a + 1 unknowns are called the least squares normal equations. Notice that if we add the last a normal equations, we obtain the first normal equation. Therefore, the normal equations are not linearly independent, and no unique solution for \mu, \tau_1, ..., \tau_a exists. This difficulty can be overcome by several methods. Since we have defined the treatment effects as deviations from the overall mean, it seems reasonable to apply the constraint

\sum_{i=1}^{a} \hat{\tau}_i = 0

Using this constraint, we obtain as the solution to the normal equations

\hat{\mu} = \bar{y}_{..}
\hat{\tau}_i = \bar{y}_{i.} - \bar{y}_{..},   i = 1, 2, ..., a

A confidence interval estimate of the ith treatment mean may be easily determined. The mean of the ith treatment is \mu_i = \mu + \tau_i, and a point estimator of \mu_i is \hat{\mu} + \hat{\tau}_i = \bar{y}_{i.}. Using MS_E as an estimator of \sigma^2, we base the confidence interval on the t distribution. Therefore, a 100(1 - \alpha) percent confidence interval on the ith treatment mean \mu_i is

\bar{y}_{i.} - t_{\alpha/2, N-a} \sqrt{MS_E/n}  \le  \mu_i  \le  \bar{y}_{i.} + t_{\alpha/2, N-a} \sqrt{MS_E/n}    (3-15)

Example 3-3

Using the data in Example 3-1, we may find the estimates of the overall mean and the treatment effects as \hat{\mu} = 376/25 = 15.04 and

\hat{\tau}_1 = \bar{y}_{1.} - \bar{y}_{..} =  9.80 - 15.04 = -5.24
\hat{\tau}_2 = \bar{y}_{2.} - \bar{y}_{..} = 15.40 - 15.04 =  0.36
\hat{\tau}_3 = \bar{y}_{3.} - \bar{y}_{..} = 17.60 - 15.04 =  2.56
\hat{\tau}_4 = \bar{y}_{4.} - \bar{y}_{..} = 21.60 - 15.04 =  6.56
\hat{\tau}_5 = \bar{y}_{5.} - \bar{y}_{..} = 10.80 - 15.04 = -4.24

A 95 percent confidence interval on the mean of treatment 4 is computed from Equation 3-15 as

21.6 - 2.086\sqrt{8.06/5}  \le  \mu_4  \le  21.6 + 2.086\sqrt{8.06/5}

Thus, the desired confidence interval is 18.95 \le \mu_4 \le 24.25.
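The estimates in Example 3-3 and the interval from Equation 3-15 can be reproduced with a short Python sketch; it assumes SciPy is available for the t quantile and reuses the MS_E, n, and N values from Example 3-1. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Quantities from Examples 3-1 and 3-3 (tensile strength data)
y_bar_i = np.array([9.8, 15.4, 17.6, 21.6, 10.8])   # treatment averages
y_bar = 15.04                                        # grand average
MS_E, n, N, a = 8.06, 5, 25, 5

mu_hat = y_bar                       # estimate of the overall mean
tau_hat = y_bar_i - y_bar            # treatment effect estimates (they sum to zero)
print(tau_hat)                       # -5.24, 0.36, 2.56, 6.56, -4.24

# 95 percent confidence interval on the mean of treatment 4 (Equation 3-15)
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, N - a)           # t_{0.025, 20}, about 2.086
half_width = t_crit * np.sqrt(MS_E / n)
lo, hi = y_bar_i[3] - half_width, y_bar_i[3] + half_width
print(round(lo, 2), round(hi, 2))    # roughly 18.95 and 24.25
```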
3-3.4 Model adequacy checking: preview

3-3.5 The unbalanced case

In some single-factor experiments the number of observations taken within each treatment may be different. We then say that the design is unbalanced. Let n_i observations be taken under treatment i (i = 1, 2, ..., a), and let N = \sum_{i=1}^{a} n_i. The computational formulas for SS_T and SS_{Treatments} become

SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n_i} y_{ij}^2 - y_{..}^2/N

and

SS_{Treatments} = \sum_{i=1}^{a} y_{i.}^2/n_i - y_{..}^2/N

3-4 Comparison of individual treatment means

Comparisons between treatment means are made in terms of either the treatment totals (y_{i.}) or the treatment averages (\bar{y}_{i.}). The procedures for making these comparisons are usually called multiple comparison methods.

3-4.1 Graphical comparison of means

Reading.

3-4.2 Contrasts

Many multiple comparison methods use the idea of a contrast. Consider the synthetic fiber testing problem of Example 3-1. Since the hypothesis H_0: \tau_i = 0 was rejected, we know that some cotton percentages produce different tensile strengths than others, but which ones actually cause the difference? We might suspect, for example, that cotton percentages 4 and 5 produce the same tensile strength, implying the hypothesis

H_0: \mu_4 = \mu_5

This hypothesis could be tested by investigating an appropriate linear combination of treatment totals, say

y_{4.} - y_{5.} = 0

If we had suspected that the average of cotton percentages 1 and 3 did not differ from the average of cotton percentages 4 and 5, then the hypothesis would have been

H_0: \mu_1 + \mu_3 = \mu_4 + \mu_5

which implies the linear combination

y_{1.} + y_{3.} - y_{4.} - y_{5.} = 0

In general, a linear combination of treatment totals such as

C = \sum_{i=1}^{a} c_i y_{i.}

with the restriction that \sum_{i=1}^{a} c_i = 0, is called a contrast.

3-4.3 Orthogonal contrasts

A very important special case of the above procedure is that of orthogonal contrasts. Two contrasts with coefficients {c_i} and {d_i} are orthogonal if

\sum_{i=1}^{a} c_i d_i = 0

(for a balanced design; in the unbalanced case the condition is \sum_{i=1}^{a} n_i c_i d_i = 0).

Example 3-4

Consider the data in Example 3-1. There are five treatment means and four degrees of freedom between these treatments. A set of comparisons between these means and the associated orthogonal contrasts is

Hypothesis                                      Contrast
C_1: \mu_4 = \mu_5                              y_{4.} - y_{5.}
C_2: \mu_1 + \mu_3 = \mu_4 + \mu_5              y_{1.} + y_{3.} - y_{4.} - y_{5.}
C_3: \mu_1 = \mu_3                              y_{1.} - y_{3.}
C_4: 4\mu_2 = \mu_1 + \mu_3 + \mu_4 + \mu_5     -y_{1.} + 4y_{2.} - y_{3.} - y_{4.} - y_{5.}

Table 3-6 Analysis of variance for the tensile strength data

Source of variation                              Sum of squares   Degrees of freedom   Mean square   F_0
Cotton percentage                                475.76           4                    118.94        14.76 (a)
Orthogonal contrasts
  C_1: \mu_4 = \mu_5                             (291.60)         1                    291.60        36.18 (a)
  C_2: \mu_1 + \mu_3 = \mu_4 + \mu_5             (31.25)          1                    31.25          3.88
  C_3: \mu_1 = \mu_3                             (152.10)         1                    152.10        18.87 (a)
  C_4: 4\mu_2 = \mu_1 + \mu_3 + \mu_4 + \mu_5    (0.81)           1                    0.81           0.10
Error                                            161.20           20                   8.06
Total                                            636.96           24

(a) Significant at 1 percent.

3-4.5 Comparing pairs of treatment means

Suppose that we are interested in comparing all pairs of a treatment means and that the null hypotheses we wish to test are H_0: \mu_i = \mu_j for all i \ne j. We now present several methods for making such comparisons.

The least significant difference (LSD) method. To test H_0: \mu_i = \mu_j, we could employ the t statistic

t_0 = (\bar{y}_{i.} - \bar{y}_{j.}) / \sqrt{MS_E (1/n_i + 1/n_j)}

and declare \mu_i and \mu_j different if |t_0| > t_{\alpha/2, N-a}. Equivalently, \mu_i and \mu_j would be declared significantly different if

|\bar{y}_{i.} - \bar{y}_{j.}| > LSD

The quantity

LSD = t_{\alpha/2, N-a} \sqrt{MS_E (1/n_i + 1/n_j)}

is called the least significant difference. For a balanced design, LSD = t_{\alpha/2, N-a} \sqrt{2 MS_E / n}.

Example 3-5

To illustrate the procedure, if we use the data in Example 3-1, the LSD at \alpha = 0.05 is

LSD = t_{0.025, 20} \sqrt{2(8.06)/5} = 2.086(1.796) = 3.75

The five treatment averages are

\bar{y}_{1.} = 9.8,  \bar{y}_{2.} = 15.4,  \bar{y}_{3.} = 17.6,  \bar{y}_{4.} = 21.6,  \bar{y}_{5.} = 10.8

Any pair of treatment averages that differs by more than 3.75 implies that the corresponding pair of treatment means is significantly different.
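The single-degree-of-freedom sums of squares reported for the contrasts in Table 3-6, and the LSD of Example 3-5, can be checked with the sketch below. It assumes the standard contrast sum-of-squares formula for a balanced design, SS_C = C^2 / (n * sum(c_i^2)), which is not written out above but is consistent with the tabled values; names and layout are illustrative.

```python
import numpy as np
from scipy import stats

y_i_dot = np.array([49, 77, 88, 108, 54])    # treatment totals from Table 3-1
n, a, N, MS_E = 5, 5, 25, 8.06

# Orthogonal contrast coefficients from Example 3-4
contrasts = {
    "C1: mu4 = mu5":               np.array([ 0, 0,  0,  1, -1]),
    "C2: mu1 + mu3 = mu4 + mu5":   np.array([ 1, 0,  1, -1, -1]),
    "C3: mu1 = mu3":               np.array([ 1, 0, -1,  0,  0]),
    "C4: 4*mu2 = mu1+mu3+mu4+mu5": np.array([-1, 4, -1, -1, -1]),
}
for name, c in contrasts.items():
    C = (c * y_i_dot).sum()                      # contrast value
    SS_C = C ** 2 / (n * (c ** 2).sum())         # single-degree-of-freedom sum of squares
    print(name, round(SS_C, 2), round(SS_C / MS_E, 2))   # SS and F_0, matching Table 3-6

# Least significant difference for Example 3-5, alpha = 0.05
LSD = stats.t.ppf(0.975, N - a) * np.sqrt(2 * MS_E / n)
print(round(LSD, 2))    # about 3.75; any pair of averages differing by more is significant
```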
Duncan's multiple range test

A widely used procedure for comparing all pairs of means is the multiple range test developed by Duncan. The a treatment averages are ranked in ascending order, and the standard error of each average is determined as

S_{\bar{y}_{i.}} = \sqrt{MS_E / n}

For unequal sample sizes, replace n by the harmonic mean n_h of the {n_i}. From the table of significant ranges (Appendix Table VII), obtain the values r_\alpha(p, f) for p = 2, 3, ..., a, where \alpha is the significance level and f is the number of degrees of freedom for error. These ranges are converted into the least significant ranges

R_p = r_\alpha(p, f) S_{\bar{y}_{i.}},   p = 2, 3, ..., a

Procedure: the observed differences between means are tested, beginning with the largest versus the smallest, which is compared with the least significant range R_a. Next, the difference between the largest and the second smallest is computed and compared with the least significant range R_{a-1}, and so on.

Example 3-6

We can apply Duncan's multiple range test to the data of Example 3-1. Recall that MS_E = 8.06, N = 25, n = 5, and there are 20 error degrees of freedom. Ranking the treatment averages in ascending order, we have

\bar{y}_{1.} = 9.8,  \bar{y}_{5.} = 10.8,  \bar{y}_{2.} = 15.4,  \bar{y}_{3.} = 17.6,  \bar{y}_{4.} = 21.6

The standard error of each average is S_{\bar{y}_{i.}} = \sqrt{8.06/5} = 1.27. From the table of significant ranges in Appendix Table VII we obtain r_{0.05}(p, 20) for p = 2, 3, 4, 5 and form the least significant ranges R_2, ..., R_5; each difference between ordered averages is then compared with the appropriate R_p. From this analysis we see that there are significant differences between all pairs of means except 3 and 2, and 5 and 1.

3-4.6 Comparing treatments with a control

In many experiments, one of the treatments is a control, and the analyst is interested in comparing each of the other a - 1 treatment means with the control. Thus, there are only a - 1 comparisons to be made. A procedure for making these comparisons has been developed by Dunnett.

Suppose that treatment a is the control. Then we wish to test the hypotheses

H_0: \mu_i = \mu_a
H_1: \mu_i \ne \mu_a

for i = 1, 2, ..., a - 1. Dunnett's procedure is a modification of the usual t test. The null hypothesis H_0: \mu_i = \mu_a is rejected if

|\bar{y}_{i.} - \bar{y}_{a.}| > d_\alpha(a-1, f) \sqrt{MS_E (1/n_i + 1/n_a)}

where the constant d_\alpha(a-1, f) is given in Appendix Table IX.

Example 3-7

To illustrate Dunnett's test, consider the data from Example 3-1 with treatment 5 considered the control. In this example, a = 5, a - 1 = 4, f = N - a = 20, and n_i = n = 5. The observed differences between each treatment average and the control are

\bar{y}_{1.} - \bar{y}_{5.} =  9.8 - 10.8 = -1.0
\bar{y}_{2.} - \bar{y}_{5.} = 15.4 - 10.8 =  4.6
\bar{y}_{3.} - \bar{y}_{5.} = 17.6 - 10.8 =  6.8
\bar{y}_{4.} - \bar{y}_{5.} = 21.6 - 10.8 = 10.8

Comparing each difference with the critical quantity d_{0.05}(4, 20) \sqrt{2 MS_E / n}, we conclude that \mu_3 \ne \mu_5 and \mu_4 \ne \mu_5.
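A minimal sketch of the comparison in Example 3-7 follows, assuming NumPy. The Dunnett constant d_0.05(4, 20) cannot be computed from the text; the value 2.65 below is an assumed Appendix-table entry used only for illustration.

```python
import numpy as np

# Dunnett-style comparison with a control for Example 3-7 (treatment 5 is the control).
y_bar_i = np.array([9.8, 15.4, 17.6, 21.6, 10.8])   # treatment averages from Table 3-1
MS_E, n = 8.06, 5
d = 2.65                                  # assumed value of d_0.05(a-1 = 4, f = 20) from a Dunnett table

crit = d * np.sqrt(2 * MS_E / n)          # critical difference, roughly 4.76 under this assumption
control = y_bar_i[-1]
for i, ybar in enumerate(y_bar_i[:-1], start=1):
    diff = ybar - control
    print(f"treatment {i} vs control: diff = {diff:.1f}, significant = {abs(diff) > crit}")
# Only treatments 3 and 4 exceed the critical difference, matching the conclusion in the text.
```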
3-5 The random effects model

An experimenter is frequently interested in a factor that has a large number of possible levels. If the experimenter randomly selects a of these levels from the population of factor levels, then we say that the factor is random. Because the levels of the factor actually used in the experiment were chosen randomly, inferences are made about the entire population of factor levels. We assume that the population of factor levels is either of infinite size or is large enough to be considered infinite.

The linear statistical model is

y_{ij} = \mu + \tau_i + \epsilon_{ij},   i = 1, 2, ..., a;  j = 1, 2, ..., n

where both \tau_i and \epsilon_{ij} are random variables. If \tau_i has variance \sigma_\tau^2 and is independent of \epsilon_{ij}, the variance of any observation is

V(y_{ij}) = \sigma_\tau^2 + \sigma^2

To test hypotheses in this model, we require that the \epsilon_{ij} are NID(0, \sigma^2), that the \tau_i are NID(0, \sigma_\tau^2), and that \tau_i and \epsilon_{ij} are independent.

The sum of squares identity

SS_T = SS_{Treatments} + SS_E

is still valid. In the random effects model, instead of testing the equality of the treatment means, we test hypotheses about the variance component \sigma_\tau^2:

H_0: \sigma_\tau^2 = 0
H_1: \sigma_\tau^2 > 0

If \sigma_\tau^2 = 0, all treatments are identical; but if \sigma_\tau^2 > 0, variability exists between treatments. As before, we compute F_0 = MS_{Treatments}/MS_E and reject H_0 if F_0 > F_{\alpha, a-1, N-a}. Considering the expected mean squares

E(MS_E) = \sigma^2   and   E(MS_{Treatments}) = \sigma^2 + n\sigma_\tau^2

we see the sense of F_0: it grows with \sigma_\tau^2. Therefore, the estimators of the variance components are

\hat{\sigma}^2 = MS_E
\hat{\sigma}_\tau^2 = (MS_{Treatments} - MS_E)/n

Example 3-8

A textile company weaves a fabric on a large number of looms. They would like the looms to be homogeneous so that they obtain a fabric of uniform strength. The process engineer suspects that, in addition to the usual variation in strength within samples of fabric from the same loom, there may also be significant variation in strength between looms. To investigate this, she selects four looms at random and makes four strength determinations on the fabric manufactured on each loom. This experiment is run in random order, and the data obtained are shown in Table 3-7. The analysis of variance is conducted and is shown in Table 3-8.

Table 3-7 Strength data for Example 3-8

Loom   Observations        Total y_i.   Average
1      98  97  99  96      390          97.5
2      91  90  93  92      366          91.5
3      96  95  97  95      383          95.75
4      95  96  99  98      388          97.0
                   y.. =  1527          95.44

Table 3-8 Analysis of variance for the strength data

Source of variation   Sum of squares   Degrees of freedom   Mean square   F_0
Looms                 89.19            3                    29.73         15.68 (a)
Error                 22.75            12                   1.90
Total                 111.94           15

(a) Significant at 5 percent: F_{0.05, 3, 12} = 3.49.

The variance components are estimated by

\hat{\sigma}^2 = 1.90   and   \hat{\sigma}_\tau^2 = (29.73 - 1.90)/4 = 6.96

The variance of any observation on strength is estimated by \hat{\sigma}^2 + \hat{\sigma}_\tau^2 = 1.90 + 6.96 = 8.86. Most of this variability is attributable to differences between looms.
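The variance component estimates above can be reproduced directly from the Table 3-7 data with the following sketch (NumPy assumed); it simply applies \hat{\sigma}^2 = MS_E and \hat{\sigma}_\tau^2 = (MS_{Treatments} - MS_E)/n. Variable names are illustrative.

```python
import numpy as np

# Loom strength data from Table 3-7 (four randomly chosen looms, four observations each)
y = np.array([
    [98, 97, 99, 96],
    [91, 90, 93, 92],
    [96, 95, 97, 95],
    [95, 96, 99, 98],
])
a, n = y.shape
N = a * n

y_bar_i = y.mean(axis=1)
y_bar = y.mean()
SS_Tr = n * ((y_bar_i - y_bar) ** 2).sum()      # between looms, about 89.19
SS_E = ((y - y_bar_i[:, None]) ** 2).sum()      # within looms, about 22.75

MS_Tr = SS_Tr / (a - 1)                         # about 29.73
MS_E = SS_E / (N - a)                           # about 1.90
F0 = MS_Tr / MS_E                               # about 15.68

# Variance component estimates for the random effects model
sigma2_hat = MS_E                               # estimate of sigma^2
sigma2_tau_hat = (MS_Tr - MS_E) / n             # estimate of sigma_tau^2, about 6.96
print(round(sigma2_hat, 2), round(sigma2_tau_hat, 2), round(sigma2_hat + sigma2_tau_hat, 2))
```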