Chapter 11: Advanced Topics in Statistics


This chapter is an overview of advanced statistical topics that an experimenter may want to use in the pursuit of finding the appropriate model for understanding and predicting specific outcomes. Because the topics are very complex and the techniques very tedious, the chapter focuses on explaining some of the idiosyncrasies and identifying some of the tests. However, it is assumed that computer software will be used, and therefore no critical tables are provided. (Readers who are interested in table values should consult any statistics book that deals with these topics.)

WHAT ARE DISCRIMINANT ANALYSIS AND LOGISTIC REGRESSION?

In attempting to choose an appropriate analytical technique, we sometimes encounter a problem that involves a categorical dependent variable and several metric (measurable) independent variables. For example, we may wish to distinguish good from bad credit risks. If we had a metric measure of credit risk, we could use multiple regression. But we may be able to ascertain only whether someone is in the good or bad risk category, and this is not the metric type of measure required by regression analysis.

Discriminant analysis and logistic regression are the appropriate statistical techniques when the dependent variable is categorical (nominal or nonmetric) and the independent variables are metric. In many cases, the dependent variable consists of two groups or classifications (for example, male versus female or high versus low). In other instances, more than two groups are involved, such as low, medium, and high classifications. Discriminant analysis is capable of handling either two groups or multiple (three or more) groups. When two classifications are involved, the technique is referred to as two-group discriminant analysis. When three or more classifications are identified, the technique is referred to as multiple discriminant analysis (MDA). Logistic regression, also known as logit analysis, is limited in its basic form to two groups, although alternative formulations can handle more than two groups.

Discriminant analysis involves deriving a variate, the linear combination of the two (or more) independent variables that will discriminate best between a priori defined groups. Discrimination is achieved by setting the variate's weights for each variable to maximize the between-group variance relative to the within-group variance. The linear combination for a discriminant analysis, also known as the discriminant function, is derived from an equation that takes the following form:

Z_jk = a + W_1 X_1k + W_2 X_2k + ... + W_n X_nk

where a = intercept, Z_jk = discriminant Z score of discriminant function j for object k, W_i = discriminant weight for independent variable i, and X_ik = independent variable i for object k.

Discriminant analysis is the appropriate statistical technique for testing the hypothesis that the group means of a set of independent variables for two or more groups are equal. To do so, discriminant analysis multiplies each independent variable by its corresponding weight and adds these products together. The result is a single composite discriminant Z score for each individual in the analysis. By averaging the discriminant scores for all the individuals within a particular group, we arrive at the group mean. This group mean is referred to as a centroid. When the analysis involves two groups, there are two centroids; with three groups, there are three centroids; and so forth. The centroids indicate the most typical location of any individual from a particular group, and a comparison of the group centroids shows how far apart the groups are along the dimension being tested.
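As a rough illustration of these two steps, the following Python sketch computes discriminant Z scores and group centroids for two small groups; the intercept, weights, and data are hypothetical and are not taken from any example in this text.

import numpy as np

# Hypothetical intercept and discriminant weights (illustration only; not
# taken from any example in the text).
a = 0.5
w = np.array([1.2, -0.8])          # W1 and W2 for two independent variables

# Made-up observations for two a priori defined groups; columns are X1 and X2.
group_A = np.array([[2.0, 1.0], [2.5, 0.8], [3.0, 1.4]])
group_B = np.array([[1.0, 2.2], [0.8, 2.5], [1.3, 2.0]])

# Discriminant Z score for each object: Z = a + W1*X1 + W2*X2.
z_A = a + group_A @ w
z_B = a + group_B @ w

# The centroid of a group is the mean of its members' discriminant Z scores.
print("Group A centroid:", z_A.mean())
print("Group B centroid:", z_B.mean())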

The test for the statistical significance of the discriminant function is a generalized measure of the distance between the group centroids. It is computed by comparing the distributions of the discriminant scores for the groups. If the overlap in the distributions is small, the discriminant function separates the groups well. If the overlap is large, the function is a poor discriminator between the groups. Two distributions of discriminant scores shown in Figure 11.1 further illustrate this concept. The top diagram represents the distributions of discriminant scores for a function that separates the groups well, whereas the lower diagram shows the distributions of discriminant scores on a function that is a relatively poor discriminator between groups A and B. The shaded areas represent probabilities of misclassifying objects from group A into group B.

Figure 11.1: Univariate representation of discriminant Z scores.

Multiple discriminant analysis is unique in one characteristic among the dependence relationships of interest here: if there are more than two groups in the dependent variable, discriminant analysis will calculate more than one discriminant function. As a matter of fact, it will calculate NG - 1 functions, where NG is the number of groups. Each discriminant function will calculate a discriminant Z score. In the case of a three-group dependent variable, each object will have a score for discriminant functions one and two, allowing the objects to be plotted in two dimensions, with each dimension representing a discriminant function. Thus, discriminant analysis is not limited to a single variate, as is multiple regression, but creates multiple variates representing dimensions of discrimination among the groups.
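The sketch below illustrates this point using scikit-learn's LinearDiscriminantAnalysis, which is one possible tool (the text assumes generic statistical software); the three groups and four metric variables are made up for demonstration only.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Three made-up groups (NG = 3), each measured on four metric variables.
X = np.vstack([rng.normal(loc=m, size=(30, 4)) for m in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 30)

# Fit the discriminant analysis and project the objects onto the
# discriminant functions.
lda = LinearDiscriminantAnalysis().fit(X, y)
scores = lda.transform(X)

# With NG = 3 groups, NG - 1 = 2 discriminant functions are extracted, so each
# object receives two discriminant Z scores and can be plotted in two dimensions.
print(scores.shape)   # (90, 2)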

Logistic regression is a specialized form of regression that is formulated to predict and explain a binary (two-group) categorical variable rather than a metric dependent measure. The form of the logistic regression variate is similar to the variate in multiple regression. The variate represents a single multivariate relationship with regression-like coefficients that indicate the relative impact of each predictor variable. The differences between logistic regression and discriminant analysis will become more apparent in our discussion of logistic regression's unique characteristics later in this chapter. Yet many similarities also exist between the two methods. When the basic assumptions of both methods are met, they each give comparable predictive and classificatory results and employ similar diagnostic measures. Logistic regression, however, has the advantage of being less affected than discriminant analysis when the basic assumptions, particularly normality of the variables, are not met. It also can accommodate nonmetric variables through dummy-variable coding, just as regression can. Logistic regression is limited, however, to prediction of only a two-group dependent measure. Thus, in cases for which three or more groups form the dependent measure, discriminant analysis is better suited.
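As an illustration of a two-group logistic regression, the sketch below fits scikit-learn's LogisticRegression to made-up data; the predictors, the binary outcome, and the choice of library are assumptions for demonstration only, not the text's own example.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Made-up metric predictors and a binary (two-group) outcome,
# e.g. bad (0) versus good (1) credit risk.
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.5, -1.0, 0.5]) + rng.normal(size=100) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# The regression-like coefficients indicate the relative impact of each
# predictor; predict_proba gives the estimated probability of group membership.
print("Coefficients:", model.coef_)
print("P(group 1), first five objects:", model.predict_proba(X)[:5, 1])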

ANALOGY WITH REGRESSION AND MANOVA

The application and interpretation of discriminant analysis is much the same as in regression analysis; that is, the discriminant function is a linear combination (variate) of metric measurements for two or more independent variables and is used to describe or predict a single dependent variable. The key difference is that discriminant analysis is appropriate for research problems in which the dependent variable is categorical (nominal or nonmetric), whereas regression is utilized when the dependent variable is metric. As discussed earlier, logistic regression is a variant of regression, thus having many similarities except for the type of dependent variable. Discriminant analysis is also comparable to "reversing" multivariate analysis of variance (MANOVA). In discriminant analysis, the single dependent variable is categorical, and the independent variables are metric. The opposite is true of MANOVA, which involves metric dependent variables and categorical independent variable(s).

DISCRIMINANT ANALYSIS (DA)

DA was initially developed by Fisher (1936) for the purpose of classifying objects into one of two clearly defined groups. Shortly thereafter, DA was generalized to problems of classification into any number of groups and has been labeled Multiple Discriminant Analysis (MDA). For some time, DA was used exclusively for taxonomic problems in various disciplines (e.g., botany, biology, geology, clinical psychology, vocational guidance). In recent years, DA has come into use as a method of studying group differences on several variables simultaneously. Because of some common features of DA and Multivariate Analysis of Variance (MANOVA), some researchers treat the two as interchangeable methods for studying group differences on multiple variables. More often, however, it is suggested that DA be used after a MANOVA for the purpose of identifying the dimensions along which the groups differ. For a comprehensive review of the various uses of DA, see Huberty (1975). Good introductory treatments of DA can be found in Klecka (1980) and Tatsuoka (1970, 1976).

The discussion offered here is limited to the use of DA for the purpose of studying group differences. Sophisticated classification methods, of which DA is but one, are available and are discussed, among others, by Rulon et al. (1967), Overall and Klett (1972), Tatsuoka (1974, 1975), and Van Ryzin (1977).

To understand DA, it is necessary to discuss the concept of Sums of Squares and Cross Products (SSCP) matrices.

SSCP

In the univariate analysis of variance, the total sum of squares of the dependent variable is partitioned into two components: (1) the pooled within-groups sum of squares and (2) the between-groups sum of squares. With multiple dependent variables, it is possible to calculate the within- and between-groups sums of squares for each of them. In addition, the total sum of cross products between any two variables can be partitioned into (1) the pooled within-groups sum of cross products and (2) the between-groups sum of cross products. With multiple dependent variables, it is convenient to assemble the sums of squares and cross products in the following three matrices: W = pooled within-groups SSCP; B = between-groups SSCP; T = total SSCP. To clarify these notions, assume that there are only two dependent variables. Accordingly, the elements of the above matrices are:

W = | SS_W1   SCP_W |
    | SCP_W   SS_W2 |

where SS_W1 = pooled within-groups sum of squares for variable 1, SS_W2 = pooled within-groups sum of squares for variable 2, and SCP_W = pooled within-groups sum of cross products of variables 1 and 2.

B = | SS_b1   SCP_b |
    | SCP_b   SS_b2 |

where SS_b1 and SS_b2 are the between-groups sums of squares for variables 1 and 2, respectively, and SCP_b is the between-groups sum of cross products of variables 1 and 2.

T = | SS_1    SCP_12 |
    | SCP_12  SS_2   |

where SS_1 and SS_2 are the total sums of squares for variables 1 and 2, respectively, and SCP_12 is the total sum of cross products of variables 1 and 2. Note that the elements of T are calculated as if all the subjects belong to a single group.

Because T = W + B, the elements of the total SSCP matrix (T) can be obtained by adding W and B. This is an important concept; note that normally W, B, and T are obtained by using matrix operations on the raw-score matrices, which is how computer programs carry out the calculations. Also, as shown above, only two of the three matrices have to be calculated; the third may be obtained by addition or subtraction, as the case may be. Thus, T was obtained above by adding W and B. If, instead, T and W were calculated, then B = T - W; similarly, W = T - B.
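The following NumPy sketch, using made-up data for two groups measured on two dependent variables, computes W, B, and T directly and confirms the identity T = W + B; the data and variable names are hypothetical.

import numpy as np

# Two made-up groups measured on two dependent variables (columns).
g1 = np.array([[4.0, 2.0], [5.0, 3.0], [6.0, 2.5]])
g2 = np.array([[7.0, 5.0], [8.0, 4.5], [9.0, 6.0]])
groups = [g1, g2]
all_data = np.vstack(groups)

def sscp(x):
    """Sums of squares and cross products of x about its own column means."""
    d = x - x.mean(axis=0)
    return d.T @ d

# W: pooled within-groups SSCP (each group deviated about its own centroid).
W = sum(sscp(g) for g in groups)

# T: total SSCP, calculated as if all subjects belong to a single group.
T = sscp(all_data)

# B: between-groups SSCP from the group centroids, weighted by group size.
grand_mean = all_data.mean(axis=0)
B = sum(len(g) * np.outer(g.mean(axis=0) - grand_mean,
                          g.mean(axis=0) - grand_mean) for g in groups)

# The identity T = W + B; equivalently, B = T - W or W = T - B.
print(np.allclose(T, W + B))   # True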

ELEMENTS OF DA

Although the presentation of DA for two groups may be simplified (see, for example, Green, 1978, Chapter 4; Lindeman et al., 1980, Chapter 6), it is more instructive to present the general case, that is, for two or more groups. Therefore, although in the presentation that follows the equations are applied to DA with two groups, the same equations are applicable to DA with any number of groups. Calculation of DA, particularly of the eigenvalues, can become very complicated. Consequently, DA is generally carried out with a computer program.

The basic idea of DA is to find a set of weights, v, by which to weight the scores of each individual so that the ratio of B (between-groups SSCP) to W (pooled within-groups SSCP) is maximized, thereby leading to maximum discrimination among the groups. This may be expressed as follows:

λ = v'Bv / v'Wv

where v' and v are a row vector and a column vector of weights, respectively. λ is referred to as the discriminant criterion.

A solution for λ is obtained by solving the following determinantal equation:

| W⁻¹B - λI | = 0

where W⁻¹ is the inverse of W, and I is an identity matrix. λ is referred to as the largest eigenvalue, or characteristic root, of the matrix whose determinant is set equal to zero. With two groups, only one eigenvalue may be obtained. To solve this equation, the inverse of W must first be calculated (see Appendix A for a review of simple matrix calculations) and postmultiplied by B. We then solve for λ so that the determinant of the matrix W⁻¹B - λI is equal to zero.

At this point, having calculated λ, the weights, v, are calculated by solving the following:

(W⁻¹B - λI)v = 0

The terms in parentheses are those used previously in the determinantal equation; v is referred to as the eigenvector, or characteristic vector. Using the value of λ and the values of W⁻¹B obtained earlier, we can proceed to solve the homogeneous equations. The results are coefficients that are determined only up to a constant of proportionality.
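In practice, the eigenvalue and eigenvector are obtained numerically. The sketch below, reusing the hypothetical two-group data from the SSCP sketch above, extracts the largest eigenvalue (the discriminant criterion λ) and its eigenvector from W⁻¹B with NumPy and checks that λ = v'Bv / v'Wv; it is an illustration under those assumed data, not the text's own worked example.

import numpy as np

# Hypothetical two-group data reused from the SSCP sketch above.
g1 = np.array([[4.0, 2.0], [5.0, 3.0], [6.0, 2.5]])
g2 = np.array([[7.0, 5.0], [8.0, 4.5], [9.0, 6.0]])

def sscp(x):
    d = x - x.mean(axis=0)
    return d.T @ d

W = sscp(g1) + sscp(g2)
T = sscp(np.vstack([g1, g2]))
B = T - W

# Eigenvalues and eigenvectors of W^-1 B; with two groups only one
# eigenvalue is non-zero.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(W) @ B)
largest = np.argmax(eigvals.real)
lam = eigvals.real[largest]        # discriminant criterion, lambda
v = eigvecs.real[:, largest]       # weights, determined up to a constant

# Check the discriminant criterion: lambda = v'Bv / v'Wv.
print(lam, (v @ B @ v) / (v @ W @ v))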



