Thursday, 19 May 2011

Principal Component Analysis (PCA) and SAS

Introduction

PCA can be performed in SAS using either the PRINCOMP or the FACTOR procedure. However, the FACTOR procedure is more flexible, because it can also perform exploratory factor analysis. Because the analysis here is performed using the FACTOR procedure, the output will at times refer to factors rather than to principal components (i.e., component 1 will be referred to as FACTOR1 in the output, component 2 as FACTOR2, and so forth). PCA is a large-sample procedure, so it should be run on a data set with a large number of observations.

SAS Programming

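 PROC FACTOR DATA=Dummy
      SIMPLE CORR     /* print means, standard deviations, correlations */
      MINEIGEN=1      /* smallest eigenvalue for which a component is kept */
      SCREE           /* print a scree plot of the eigenvalues */
      OUT=D2          /* component scores; SAS documents that NFACTORS= is also needed for this */
      OUTSTAT=stat;   /* eigenvalues, loadings, and other statistics */
  VAR region PhysicianType clinicsize enrollpot setting healthcov
      racenation;
 RUN;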
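
For comparison, roughly the same extraction could be sketched with PROC PRINCOMP. This is only an illustration, assuming the same Dummy data set and variables; the output data set names D2_pc and stat_pc are made up for this example.

```sas
/* Sketch: the same seven variables analyzed with PROC PRINCOMP. */
/* D2_pc and stat_pc are hypothetical names used only here.      */
proc princomp data=Dummy out=D2_pc outstat=stat_pc;
   var region PhysicianType clinicsize enrollpot setting healthcov
       racenation;
run;
```

PROC PRINCOMP names the components Prin1, Prin2, and so on rather than Factor1, Factor2, and it analyzes the correlation matrix unless the COV option is given.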

 
Steps in Conducting Principal Component Analysis
Principal component analysis is normally conducted in a sequence of steps, with subjective decisions being made at many of these steps.

Step 1: Initial Extraction of the Components

In principal component analysis, the number of components extracted is equal to the number of variables being analyzed. Because seven variables are analyzed in the present study, seven components will be extracted. The first component can be expected to account for a fairly large amount of the total variance. Each succeeding component will account for progressively smaller amounts of variance. Although a large number of components may be extracted in this way, only the first few components will be important enough to be retained for interpretation.

Step 2: Determining the Number of “Meaningful” Components to Retain

A. The eigenvalue-one criterion.

In principal component analysis, one of the most commonly used criteria for solving the number-of-components problem is the eigenvalue-one criterion, also known as the Kaiser criterion (Kaiser, 1960). With this approach, you retain and interpret any component with an eigenvalue greater than 1.00.

The rationale is straightforward. Because the analysis is performed on the correlation matrix, every variable is standardized to a variance of 1, so each observed variable contributes one unit of variance to the total variance in the data set. Any component that displays an eigenvalue greater than 1.00 is accounting for a greater amount of variance than had been contributed by one variable. Such a component is therefore accounting for a meaningful amount of variance, and is worthy of being retained.

With the SAS System, the eigenvalue-one criterion can be implemented by including the MINEIGEN=1 option in the PROC FACTOR statement and omitting the NFACT= option. The use of MINEIGEN=1 will cause PROC FACTOR to retain any component with an eigenvalue greater than 1.00.
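
One way to check which components pass the criterion is to look at the eigenvalues saved by OUTSTAT=. The following is a minimal sketch, assuming the stat data set from the program above was created. In that data set the row with _TYPE_='EIGENVAL' holds the eigenvalues, stored under the names of the analysis variables in order (the first variable holds the first eigenvalue, and so on).

```sas
/* Sketch: count the components whose eigenvalue exceeds 1.     */
/* Assumes the OUTSTAT= data set "stat" from the earlier step.  */
data _null_;
   set stat;
   where _TYPE_ = 'EIGENVAL';
   array ev{*} region PhysicianType clinicsize enrollpot setting
               healthcov racenation;
   retained = 0;
   do i = 1 to dim(ev);
      if ev{i} > 1 then retained = retained + 1;
   end;
   put 'Components with eigenvalue > 1: ' retained;
run;
```

The count is written to the SAS log and should agree with the number of components that PROC FACTOR reports as retained under MINEIGEN=1.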

B. The scree test.

With the scree test (Cattell, 1966), you plot the eigenvalues associated with each component and look for a “break” between the components with relatively large eigenvalues and those with small eigenvalues. The components that appear before the break are assumed to be meaningful and are retained for rotation; those appearing after the break are assumed to be unimportant and are not retained. Sometimes a scree plot will display several large breaks. When this is the case, you should look for the last big break before the eigenvalues begin to level off. Only the components that appear before this last large break should be retained.

Specifying the SCREE option in the PROC FACTOR statement causes the SAS System to print an eigenvalue plot as part of the output.
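
In more recent SAS releases, the same plot can also be requested as an ODS graphic. The following is a sketch, assuming a release in which ODS Graphics is available and PROC FACTOR supports the PLOTS= option:

```sas
/* Sketch: request the scree plot as an ODS graphic. */
ods graphics on;

proc factor data=Dummy mineigen=1 plots=scree;
   var region PhysicianType clinicsize enrollpot setting healthcov
       racenation;
run;

ods graphics off;
```

The graphical version can make the "break" easier to judge than the line-printer plot produced by the SCREE option.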

C. Proportion of variance.

A third criterion for solving the number-of-components problem involves retaining a component if it accounts for a specified proportion (or percentage) of variance in the data set. This proportion can be calculated with a simple formula:

Proportion = (Eigenvalue for the component of interest) / (Total eigenvalues of the correlation matrix)
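
When a correlation matrix is analyzed, the total of the eigenvalues equals the number of variables, which is 7 in this study. As a purely hypothetical illustration, a component with an eigenvalue of 2.8 would account for 2.8 / 7 = 0.40, or 40%, of the total variance. PROC FACTOR already prints Proportion and Cumulative columns in its eigenvalue table, but the same figures could be recomputed from the OUTSTAT= data set. The following is a sketch, assuming the stat data set from the program above stores all seven eigenvalues in its _TYPE_='EIGENVAL' row; prop is a made-up data set name.

```sas
/* Sketch: proportion of variance for each component.    */
/* "prop" is a hypothetical output data set name.        */
data prop;
   set stat;
   where _TYPE_ = 'EIGENVAL';
   array ev{*} region PhysicianType clinicsize enrollpot setting
               healthcov racenation;
   total = sum(of ev{*});   /* number of variables for a correlation matrix */
   do component = 1 to dim(ev);
      eigenvalue = ev{component};
      proportion = eigenvalue / total;
      output;
   end;
   keep component eigenvalue proportion;
run;

proc print data=prop noobs;
run;
```

Each row of prop gives one component, its eigenvalue, and the share of total variance it accounts for.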

