3 Moving to Higher Dimensions

New words
univariate /ˌjuːnɪˈveəriət/ adj.: involving a single variable

We have seen in the previous chapters how very simple graphical devices can help in understanding the structure and dependency of data. The graphical tools were based on either univariate (bivariate) data representations or on "slick" transformations of multivariate information perceivable by the human eye. Most of the tools are extremely useful in a modelling step, but unfortunately they do not give the full picture of the data set.

One reason for this is that the graphical tools presented capture only certain dimensions of the data and do not necessarily concentrate on those dimensions or subparts of the data under analysis that carry the maximum structural information. In Part III of this book, powerful tools for reducing the dimension of a data set will be presented. In this chapter, as a starting point, simple and basic tools are used to describe dependency. They are constructed from elementary facts of probability theory and introductory statistics (for example, the covariance and correlation between two variables).

- The covariance is a measure of dependence.
- Covariance measures only linear dependence.
- Covariance is scale dependent.
- There are nonlinear dependencies that have zero covariance.
- Zero covariance does not imply independence.
- Independence implies zero covariance.
- Negative covariance corresponds to downward-sloping scatterplots.
- Positive covariance corresponds to upward-sloping scatterplots.
- The covariance of a variable with itself is its variance: $\mathrm{Cov}(X, X) = \sigma_{XX} = \sigma_X^2$.
- For small $n$, we should replace the factor $1/n$ in the computation of the covariance by $1/(n-1)$.

- The correlation is a standardized measure of dependence.
- The absolute value of the correlation never exceeds one.
- Correlation measures only linear dependence.
- There are nonlinear dependencies that have zero correlation.
- Zero correlation does not imply independence.
- Independence implies zero correlation.
- Negative correlation corresponds to downward-sloping scatterplots.
- Positive correlation corresponds to upward-sloping scatterplots.
- Fisher's Z-transform helps us in testing hypotheses on correlation.
- For small samples, Fisher's Z-transform can be improved by a further transformation.

- The center of gravity of a data matrix is given by its mean vector $\bar{x} = n^{-1} \mathcal{X}^\top 1_n$.
- The dispersion of the observations in a data matrix is given by the empirical covariance matrix $S = n^{-1} \mathcal{X}^\top \mathcal{H} \mathcal{X}$, where $\mathcal{H} = I_n - n^{-1} 1_n 1_n^\top$ is the centering matrix.
- The empirical correlation matrix is given by $R = D^{-1/2} S D^{-1/2}$, where $D$ is the diagonal matrix of the empirical variances.
- A linear transformation $\mathcal{Y} = \mathcal{X} A^\top$ of a data matrix $\mathcal{X}$ has mean $A\bar{x}$ and empirical covariance $A S_X A^\top$.
- The Mahalanobis transformation is a linear transformation $z_i = S^{-1/2}(x_i - \bar{x})$ which gives a standardized, uncorrelated data matrix $\mathcal{Z}$.

- Simple ANOVA models an output $Y$ as a function of one factor.
- The reduced model is the hypothesis of equal means.
- The full model is the alternative hypothesis of different means.
- The F-test is based on a comparison of the sum of squares under the full and the reduced models.
- The degrees of freedom are calculated as the number of observations minus the number of parameters.
- The F-test rejects the null hypothesis if the F-statistic is larger than the 95% quantile of the $F_{df(r) - df(f),\, df(f)}$ distribution.
- The F-test statistic for the slope of the linear regression model $y_i = \alpha + \beta x_i + \varepsilon_i$ is the square of the t-test statistic.

Covariance Matrix
Correlation Matrix

4 Multivariate Distributions

The preceding chapter showed that by using the two first moments of a multivariate distribution (the mean and the covariance matrix), a lot of information on the relationship between the variables can be made available. Only basic statistical theory was used to derive tests of independence or of linear relationships.
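
Several of the facts recalled above about covariance, correlation and Fisher's Z-transform can be checked numerically. The following is a minimal sketch, not taken from the text: the function names (`covariance`, `correlation`, `fisher_z_test`) and the toy data are hypothetical, and the test assumes the usual large-sample approximation in which the transformed correlation is roughly normal with variance 1/(n - 3).

```python
import math
from statistics import NormalDist


def covariance(x, y, unbiased=False):
    """Empirical covariance with factor 1/n, or 1/(n-1) if unbiased=True."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross / (n - 1 if unbiased else n)


def correlation(x, y):
    """Empirical correlation: covariance standardized by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


def fisher_z_test(r, n, rho0=0.0):
    """Two-sided test of H0: rho = rho0 based on Fisher's Z-transform.

    Assumes W = atanh(r) is approximately normal with variance 1/(n - 3).
    Returns the standardized statistic and its p-value.
    """
    w = math.atanh(r)  # equals 0.5 * log((1 + r) / (1 - r))
    z = (w - math.atanh(rho0)) * math.sqrt(n - 3)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Toy data (made up for illustration only).
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_linear = [2 * a + 1 for a in x]   # exact linear dependence on x
y_square = [a * a for a in x]       # exact nonlinear dependence on x
x_scaled = [10 * a for a in x]      # same variable in different units

print(covariance(x, y_linear))                 # factor 1/n
print(covariance(x, y_linear, unbiased=True))  # factor 1/(n-1)
print(covariance(x_scaled, y_linear))          # covariance changes with scale
print(correlation(x_scaled, y_linear))         # correlation does not
print(covariance(x, y_square))                 # zero despite full dependence

# Hypothetical sample correlation r = 0.5 observed on n = 28 points.
print(fisher_z_test(0.5, 28))
```

The `y_square` case illustrates why zero covariance does not imply independence: `y_square` is a deterministic function of `x`, yet the covariance vanishes because the dependence is not linear.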
In this chapter we give an introduction to the basic probability tools useful in statistical multivariate analysis.

Means and covariances share many interesting and useful properties, but they represent only part of the information on a multivariate distribution. Section 4.1 presents the basic probability tools used to describe a multivariate random variable, including marginal and conditional distributions and the concept of independence. In Section 4.2, basic properties of means and covariances (marginal and conditional ones) are derived.

4.1 Distribution and Density Function