Canonical Correlation Analysis, Redundancy Analysis and Canonical Correspondence Analysis
Hal Whitehead
BIOL4062/5062

- Canonical Correlation Analysis
- Redundancy Analysis
- Canonical Correspondence Analysis

Multivariate Statistics with Two Groups of Variables
Look at relationships between two groups of variables measured on the same units, for example:
- species variables vs. environmental variables (community ecology)
- genetic variables vs. environmental variables (population genetics)
[Figure: data matrix with units as rows and variables as columns, the variables split into a set of Xs and a set of Ys]

Canonical Correlation Analysis
- Multivariate extension of correlation analysis
- Looks at the relationship between two sets of variables

Given a linear combination of the X variables:
$F = f_1X_1 + f_2X_2 + \dots + f_pX_p$
and a linear combination of the Y variables:
$G = g_1Y_1 + g_2Y_2 + \dots + g_qY_q$

The first canonical correlation is the maximum correlation coefficient between F and G over all choices of F and G. $F_1 = (f_{11}, f_{12}, \dots, f_{1p})$ and $G_1 = (g_{11}, g_{12}, \dots, g_{1q})$, the coefficient vectors that achieve it, are the corresponding first canonical variates.

[Figure: units plotted in (X1, X2) space and in (Y1, Y2) space, labelled by unit number; each unit is projected onto an axis F in X space and an axis G in Y space (e.g. F(16), F(7), G(16), G(7) for units 16 and 7), and the axes are chosen to maximize r(F,G).]

The second canonical correlation is the maximum correlation coefficient between F and G over all F orthogonal to (uncorrelated with) $F_1$ and all G orthogonal to $G_1$. $F_2 = (f_{21}, f_{22}, \dots, f_{2p})$ and $G_2 = (g_{21}, g_{22}, \dots, g_{2q})$ are the corresponding second canonical variates, and so on.

- Each canonical correlation is associated with a pair of canonical variates.
- Canonical correlations decrease: the first is the largest, the second the next largest, and so on.
- Canonical correlations are higher than the simple correlations generally found between individual variables, because the coefficients are chosen to maximize the correlation. In particular, the first canonical correlation is at least as large as the absolute value of any simple correlation between one X and one Y variable.

Correlation matrix, partitioned into X and Y blocks:
$$R = \begin{pmatrix} A_{(p \times p)} & C_{(p \times q)} \\ C'_{(q \times p)} & B_{(q \times q)} \end{pmatrix}$$
where A holds the correlations among $X_1, \dots, X_p$, B the correlations among $Y_1, \dots, Y_q$, and C the correlations between the Xs and the Ys.

- The canonical correlations are the square roots of the eigenvalues of $B^{-1}C'A^{-1}C$.
- The coefficients of the canonical variates for the Y variables are the corresponding eigenvectors.
- Number of canonical correlations = min(number of Xs, number of Ys).
- We can test whether the canonical correlations are significantly different from 0.

Canonical Correlation Analysis: questions to answer
- What are the canonical correlations?
- Are they, in toto, significantly different from zero?
- Are some significant and others not? Which ones?
- What are the corresponding canonical variates? How does each original variable contribute to each canonical variate (use loadings)?
- How much of the joint covariance of the two sets of variables is explained by each pair of canonical variates?

Relationship to Canonical Variate Analysis
- We can define dummy (1/0) variables to identify groups of units: 1 = in group; 0 = not in group.
- A canonical correlation analysis between these dummy grouping variables and the original variables is equivalent to a canonical variate analysis.

Redundancy Analysis
[Figure: types of relationship between variables and the corresponding analyses]
y1 ↔ y2    Correlation analysis
x → y      Simple regression analysis
X → y      Multiple regression analysis (X = x1, x2, ...)
Y1 ↔ Y2    Canonical correlation analysis
X → Y      Redundancy analysis: how one set of variables (X) may explain another set (Y)

"Redundancy" expresses how much of the variance in one set of variables can be explained by the other.

Output of redundancy analysis:
- canonical variates describing how X explains Y
- non-canonical variates (principal components of the residuals of Y)
- results may be presented as a biplot: two types of points representing the units and the X variables, and vectors giving the Y variables

Example: hourly records of sperm whale behaviour
Data collected off the Galapagos Islands in 1985 and 1987
Units: hours spent following sperm whales (440 hours)
Physical variables: mean cluster size, max. cluster size, mean speed, heading consistency, fluke-up rate, breach rate, lobtail rate, spyhop rate, sidefluke rate
Acoustic variables: coda rate, creak rate, high click rate

Canonical Correlation Analysis: Physical vs. Acoustic Behaviour
Canonical variate pair                1       2       3
Canonical correlations             0.72    0.49    0.21
P-values                           0.00    0.00    0.06
Redundancies:
  V(Acoustic) | V(Physical)         34%     20%      1%
  V(Physical) | V(Acoustic)         32%      8%      1%
V(Acoustic) | V(Physical) is the percentage of the variance of the acoustic variables explained by the physical variables through each pair of canonical variates; V(Physical) | V(Acoustic) is the reverse.

Physical vs. Acoustic Behaviour: loadings
Canonical variate pair           1        2
Mean cluster size            -0.95     0.07
Max. cluster size            -0.85     0.47
Mean speed                    0.21     0.06
Heading consistency           0.32    -0.27
Fluke-up rate                 0.73     0.23
Breach rate                  -0.16     0.02
Lobtail rate                 -0.22     0.03
Spyhop rate                  -0.18     0.32
Sidefluke rate               -0.21     0.35
Coda rate                    -0.64     0.64
Creak rate                   -0.50     0.79
High click rate               0.76     0.64

Canonical Correspondence Analysis
- Canonical correlation analysis assumes a linear relationship between the two sets of variables.
- In some situations this is not reasonable (e.g. community ecology).
- Canonical correspondence analysis assumes a Gaussian (bell-shaped) relationship between the sets of variables: "species" variables are Gaussian functions of "environmental" variables.
- Software: CANOCO

[Figure: abundances of species A, B and C plotted against environmental variable X and environmental variable Y. Canonical correlation analysis treats abundance as a linear function of X and Y; canonical correspondence analysis treats it as a bell-shaped function and finds the best combination of X and Y (e.g. 1.4X + 0.2Y) along which to describe species abundance.]

Example: canonical correspondence analysis of Dutch spiders
- 26 environmental variables
- 12 spider species
- 100 samples (pit-fall traps)

Axes                                       1       2       3       4
Eigenvalues                             .535    .214    .063    .019
Species-environment correlations        .959    .934    .650    .782
Cumulative percentage variance
  of species data                       46.6    65.2    70.7    72.3
  of species-environment relation       63.2    88.5    95.9    98.2

[Figure: ordination diagram of the spider data, Axis 1 vs. Axis 2]

Canonical correspondence analysis can be detrended.

The Horseshoe effect
Presence (1) / absence (0) of species in samples ordered along an environmental gradient:
Environmental gradient →
Sp A    0 0 0 0 0 0 0 1 1
Sp B    0 0 0 0 0 0 1 1 0
Sp C    0 0 0 0 0 0 1 1 0
Sp D    0 0 0 0 0 1 1 0 0
Sp E    0 0 0 0 1 1 1 0 0
Sp F    0 0 0 1 1 1 0 0 0
Sp G    0 0 0 1 1 0 0 0 0
Sp H    0 0 1 1 0 0 0 0 0
Sp I    1 1 1 0 0 0 0 0 0

[Figure: ordination of these data on Axis 1 vs. Axis 2, where the single gradient appears as an arch (the horseshoe effect), and on Detrended Axis 1 vs. Detrended Axis 2 from detrended canonical correspondence analysis, which removes the arch.]

Summary
- Canonical correlation analysis: examines the relationship between two sets of variables.
- Redundancy analysis: examines how a set of dependent variables relates to a set of independent variables.
- Canonical correspondence analysis: the counterpart of canonical correlation and redundancy analyses when the relationship between the sets of variables is Gaussian, not linear.
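The correlation-matrix recipe for canonical correlation analysis given above (canonical correlations as square roots of the eigenvalues of $B^{-1}C'A^{-1}C$, Y coefficients as the eigenvectors) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the software behind the sperm whale results: the names `cca` and `redundancies` are hypothetical, the data are simulated, the X coefficients come from the standard relation $f \propto A^{-1}Cg$, loadings are taken as correlations between each variable and its own canonical variate, and the redundancy of a set for pair k is computed as $r_k^2$ times the mean squared loading of that set, which is one common definition.

```python
import numpy as np


def cca(X, Y):
    """Canonical correlation analysis from the partitioned correlation matrix.

    X is (n, p), Y is (n, q); rows are units. Returns the canonical correlations r
    (decreasing), coefficient matrices Fc (p x k) and Gc (q x k) for standardized
    variables, and the loadings of the X and Y variables on their own canonical
    variates, where k = min(p, q).
    """
    p, q = X.shape[1], Y.shape[1]
    R = np.corrcoef(np.hstack([X, Y]), rowvar=False)
    A, C, B = R[:p, :p], R[:p, p:], R[p:, p:]      # X-X, X-Y and Y-Y blocks
    # Eigen-decomposition of B^-1 C' A^-1 C (q x q); solve() avoids explicit inverses.
    M = np.linalg.solve(B, C.T) @ np.linalg.solve(A, C)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(vals.real)[::-1][: min(p, q)]
    r = np.sqrt(np.clip(vals.real[order], 0.0, 1.0))
    Gc = vecs.real[:, order]
    # X coefficients are proportional to A^-1 C g (assumes every r_k > 0).
    Fc = np.linalg.solve(A, C) @ Gc
    # Rescale so that every canonical variate has unit variance.
    Fc = Fc / np.sqrt(np.einsum("ik,ij,jk->k", Fc, A, Fc))
    Gc = Gc / np.sqrt(np.einsum("ik,ij,jk->k", Gc, B, Gc))
    # Loadings: correlation of each original variable with its canonical variate.
    return r, Fc, Gc, A @ Fc, B @ Gc


def redundancies(r, loadings):
    """Share of one set's variance explained by the other set, pair by pair:
    mean squared loading of that set on its own variate times r_k**2."""
    return (loadings ** 2).mean(axis=0) * r ** 2


# Usage with simulated (hypothetical) data: one signal shared by X1 and Y1.
rng = np.random.default_rng(1)
n = 300
shared = rng.normal(size=n)
X = np.column_stack([shared + rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([shared + rng.normal(size=n), rng.normal(size=n)])
r, Fc, Gc, load_X, load_Y = cca(X, Y)
print("canonical correlations:", r.round(2))
print("V(Y) | V(X):", redundancies(r, load_Y).round(3))
print("V(X) | V(Y):", redundancies(r, load_X).round(3))
```

A useful check on this definition: when there are no more Y than X variables, the V(Y) | V(X) redundancies summed over all pairs equal the average R² obtained by regressing each Y variable on all the X variables.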
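The Gaussian (bell-shaped) species response that canonical correspondence analysis assumes can be illustrated numerically. This sketch is not canonical correspondence analysis itself: it assumes a single hypothetical gradient z (standing in for a combination such as 1.4X + 0.2Y), noiseless abundances, and hypothetical helper names. It recovers the maximum c, optimum u and tolerance t by fitting a quadratic to log abundance, which works because the logarithm of a Gaussian curve is a quadratic. It also shows why a linear method can miss such a species: sampled symmetrically around its optimum, its linear correlation with the gradient is essentially zero.

```python
import numpy as np


def gaussian_response(z, c, u, t):
    """Expected abundance along gradient z: maximum c at optimum u, tolerance t."""
    return c * np.exp(-((z - u) ** 2) / (2.0 * t ** 2))


def fit_gaussian_response(z, y):
    """Recover (c, u, t) by fitting a quadratic to log abundance.

    ln y = b0 + b1*z + b2*z**2 with b2 = -1/(2 t**2) and b1 = u/t**2, so
    u = -b1/(2 b2), t = 1/sqrt(-2 b2) and c = exp(b0 - b1**2/(4 b2)).
    Needs y > 0 everywhere and b2 < 0 (a hump, not a trough).
    """
    b2, b1, b0 = np.polyfit(z, np.log(y), 2)   # coefficients, highest power first
    if b2 >= 0:
        raise ValueError("response is not unimodal: quadratic term is not negative")
    u = -b1 / (2.0 * b2)
    t = 1.0 / np.sqrt(-2.0 * b2)
    c = np.exp(b0 - b1 ** 2 / (4.0 * b2))
    return c, u, t


# Hypothetical samples along one gradient z (e.g. a combination such as 1.4X + 0.2Y).
z = np.linspace(-3.0, 3.0, 61)
y = gaussian_response(z, c=10.0, u=0.0, t=1.0)   # species peaking mid-gradient
print("linear correlation with gradient:", np.corrcoef(z, y)[0, 1])  # essentially zero
print("fitted (c, u, t):", fit_gaussian_response(z, y))               # close to (10, 0, 1)
```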
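The presence/absence table in the horseshoe example can be ordinated with plain (unconstrained) correspondence analysis, which rests on the same weighted-averaging idea as canonical correspondence analysis. This sketch has no environmental variables and no detrending, so it is not canonical correspondence analysis or its detrended form; it computes the axes from a singular value decomposition of the table's standardized residuals, one standard way to do correspondence analysis, and the function name `correspondence_analysis` is hypothetical. Each species score on an axis comes out as the presence-weighted average of the scores of the samples it occurs in; plotting the species scores on axis 2 against axis 1 is one way to look for the arch described above.

```python
import numpy as np

# Presence (1) / absence (0) of species A-I in nine samples ordered along the
# environmental gradient, copied from the horseshoe example.
SPECIES = list("ABCDEFGHI")
ROWS = ["000000011", "000000110", "000000110", "000001100", "000011100",
        "000111000", "000110000", "001100000", "111000000"]
N = np.array([[int(ch) for ch in row] for row in ROWS], dtype=float)


def correspondence_analysis(N):
    """Unconstrained correspondence analysis via SVD of standardized residuals.

    Returns eigenvalues (squared singular values), species scores in principal
    coordinates and sample scores in standard coordinates.
    """
    P = N / N.sum()
    r = P.sum(axis=1)                                    # species (row) masses
    c = P.sum(axis=0)                                    # sample (column) masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    species_scores = U / np.sqrt(r)[:, None] * s         # principal coordinates
    sample_scores = Vt.T / np.sqrt(c)[:, None]           # standard coordinates
    return s ** 2, species_scores, sample_scores


eig, sp_scores, sample_scores = correspondence_analysis(N)
print("eigenvalues of axes 1-2:", eig[:2].round(3))
for name, (a1, a2) in zip(SPECIES, sp_scores[:, :2]):
    print(f"Sp {name}: axis 1 = {a1:6.2f}, axis 2 = {a2:6.2f}")
```

Detrending, as in detrended (canonical) correspondence analysis, would then remove the systematic curvature of axis 2 with respect to axis 1; that step is not shown here.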