Canonical Correlation Analysis, Redundancy Analysis and Canonical Correspondence Analysis
Hal Whitehead
BIOL4062/5062

Canonical Correlation Analysis
Redundancy Analysis
Canonical Correspondence Analysis

Multivariate Statistics with Two Groups of Variables
Look at relationships between two groups of variables measured on the same units:
  species variables vs. environmental variables (community ecology)
  genetic variables vs. environmental variables (population genetics)
[Data layout: rows are units; columns are the X variables followed by the Y variables.]

Canonical Correlation Analysis
Multivariate extension of correlation analysis
Looks at the relationship between two sets of variables

Given a linear combination of the X variables:
$F = f_1 X_1 + f_2 X_2 + \dots + f_p X_p$
and a linear combination of the Y variables:
$G = g_1 Y_1 + g_2 Y_2 + \dots + g_q Y_q$
The first canonical correlation is the maximum correlation coefficient between F and G, over all possible F and G.
$F_1 = (f_{11}, f_{12}, \dots, f_{1p})$ and $G_1 = (g_{11}, g_{12}, \dots, g_{1q})$ are the corresponding first canonical variates.

[Figure: 20 units plotted in the X1 vs X2 plane and in the Y1 vs Y2 plane, each projected onto an axis F and an axis G (e.g. F(16), G(16) and F(7), G(7)); the directions of F and G are chosen to maximize r(F, G).]
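The definition of the first canonical correlation as the largest correlation attainable between any F and any G can be checked numerically. The sketch below uses one standard way to compute canonical correlations (whiten each set of variables, then take the singular values of the whitened cross-correlation matrix); it is not necessarily how any particular package does it. The function name `canonical_correlations`, the simulated data and the random seed are hypothetical.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations of X (n x p) and Y (n x q) by whitening + SVD.

    Returns the canonical correlations (largest first), coefficient vectors
    f (p x k) and g (q x k) with k = min(p, q), and the standardized data,
    so that F_k = Xs @ f[:, k] and G_k = Ys @ g[:, k].
    """
    n = X.shape[0]
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    A = Xs.T @ Xs / n          # correlations among the X variables
    B = Ys.T @ Ys / n          # correlations among the Y variables
    C = Xs.T @ Ys / n          # correlations between X and Y
    # Whitening matrices W with W A W' = I (inverse Cholesky factors).
    Wa = np.linalg.inv(np.linalg.cholesky(A))
    Wb = np.linalg.inv(np.linalg.cholesky(B))
    # Singular values of the whitened cross-correlation are the canonical correlations.
    U, r, Vt = np.linalg.svd(Wa @ C @ Wb.T, full_matrices=False)
    f = Wa.T @ U               # coefficients of the X canonical variates
    g = Wb.T @ Vt.T            # coefficients of the Y canonical variates
    return r, f, g, Xs, Ys


# Hypothetical data: three X and two Y variables sharing one latent factor.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
X = np.column_stack([latent + rng.normal(size=n),
                     rng.normal(size=n),
                     latent - rng.normal(size=n)])
Y = np.column_stack([latent + rng.normal(size=n),
                     rng.normal(size=n)])

r, f, g, Xs, Ys = canonical_correlations(X, Y)
F1, G1 = Xs @ f[:, 0], Ys @ g[:, 0]
largest_simple = np.abs(np.corrcoef(Xs.T, Ys.T)[:3, 3:]).max()
print("canonical correlations:", np.round(r, 3))
print("r(F1, G1):", round(np.corrcoef(F1, G1)[0, 1], 3))
print("largest simple |r|:", round(largest_simple, 3))
```

Because any single pair (X_i, Y_j) is itself one possible choice of F and G, the first canonical correlation can never be smaller than the largest simple correlation printed alongside it.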
The second canonical correlation is the maximum correlation coefficient between F and G over all F uncorrelated with (orthogonal to) $F_1$ and all G uncorrelated with $G_1$.
$F_2 = (f_{21}, f_{22}, \dots, f_{2p})$ and $G_2 = (g_{21}, g_{22}, \dots, g_{2q})$ are the corresponding second canonical variates, and so on for further canonical correlations.

So each canonical correlation is associated with a pair of canonical variates.
Successive canonical correlations decrease.
Canonical correlations tend to be higher than simple correlations because the coefficients are chosen to maximize the correlation; the first is never smaller than the largest simple correlation between an X and a Y variable.

Correlation matrix of the X and Y variables, partitioned into blocks:
$$\mathbf{R} = \begin{pmatrix} \mathbf{A}_{p \times p} & \mathbf{C}_{p \times q} \\ \mathbf{C}^{\top}_{q \times p} & \mathbf{B}_{q \times q} \end{pmatrix}$$
where A contains the correlations among X1, ..., Xp, B those among Y1, ..., Yq, and C those between the X and the Y variables.
The canonical correlations are the square roots of the eigenvalues of $\mathbf{B}^{-1}\mathbf{C}^{\top}\mathbf{A}^{-1}\mathbf{C}$.
The coefficients of the canonical variates for the Y variables are the corresponding eigenvectors.
Number of canonical correlations = min(number of Xs, number of Ys).
We can test whether the canonical correlations are significantly different from 0.

Canonical Correlation Analysis: questions to ask
What are the canonical correlations?
Are they, in toto, significantly different from zero?
Are some significant and others not? Which ones?
What are the corresponding canonical variates? How does each original variable contribute to each canonical variate (use loadings)?
How much of the joint covariance of the two sets of variables is explained by each pair of canonical variates?

Relationship to Canonical Variate Analysis
We can define dummy (1/0) variables to identify groups of units: 1 = in group, 0 = not in group (with g groups, g - 1 dummies suffice, since the full set always sums to 1).
A canonical correlation analysis between these dummy grouping variables and the original variables is equivalent to a canonical variate analysis.

Redundancy Analysis
Relationship                     Analysis
y1 ↔ y2                          Correlation analysis
x → y                            Simple regression analysis
X = (x1, x2, ...) → y            Multiple regression analysis
X ↔ Y = (Y1, Y2, ...)            Canonical correlation analysis
X → Y                            Redundancy analysis

Redundancy analysis asks how one set of variables (X) may explain another set (Y).
"Redundancy" expresses how much of the variance in one set of variables can be explained by the other.
Output:
  canonical variates describing how X explains Y
  non-canonical variates (principal components of the residuals of Y)
  results may be presented as a biplot: two types of points representing the units and the X variables, and vectors giving the Y variables

Hourly records of sperm whale behaviour
Variables: mean cluster size, max. cluster size, mean speed, heading consistency, fluke-up rate, breach rate, lobtail rate, spyhop rate, sidefluke rate, coda rate, creak rate, high click rate
Data collected: off the Galapagos Islands, 1985 and 1987
Units: hours spent following sperm whales (440 hours)
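A minimal sketch of the eigenvalue route (canonical correlations as square roots of the eigenvalues of $\mathbf{B}^{-1}\mathbf{C}^{\top}\mathbf{A}^{-1}\mathbf{C}$), assuming NumPy and simulated data rather than the sperm whale records. The redundancy formula is an assumption: it uses the Stewart-Love index, $r_k^2$ times the mean squared loading of the Y variables on $G_k$, which may differ from how the redundancies for the whale data were computed. The function name `cca_eigen` and the data are hypothetical.

```python
import numpy as np


def cca_eigen(X, Y):
    """Canonical correlations from the eigenvalues of B^-1 C' A^-1 C, with
    Y loadings and (assumed Stewart-Love) redundancies of Y given X."""
    p, q = X.shape[1], Y.shape[1]
    R = np.corrcoef(X, Y, rowvar=False)        # (p + q) x (p + q) correlation matrix
    A, C, B = R[:p, :p], R[:p, p:], R[p:, p:]
    M = np.linalg.solve(B, C.T @ np.linalg.solve(A, C))   # B^-1 C' A^-1 C
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1][:min(p, q)]
    r = np.sqrt(np.clip(eigvals.real[order], 0.0, None))  # canonical correlations
    g = eigvecs.real[:, order]                             # Y coefficient vectors
    g = g / np.sqrt(np.einsum("ik,ij,jk->k", g, B, g))     # scale so var(G_k) = 1
    loadings = B @ g                                       # corr(Y_j, G_k)
    # Assumed redundancy index: r_k^2 times the mean squared Y loading on G_k.
    redundancy = r**2 * np.mean(loadings**2, axis=0)
    return r, loadings, redundancy, A, B, C


# Hypothetical data: 4 X variables and 3 Y variables driven by two factors.
rng = np.random.default_rng(42)
n = 300
z1, z2 = rng.normal(size=(2, n))
X = np.column_stack([z1 + rng.normal(size=n), z2 + rng.normal(size=n),
                     rng.normal(size=n), z1 + z2 + rng.normal(size=n)])
Y = np.column_stack([z1 + 0.5 * rng.normal(size=n), z2 + rng.normal(size=n),
                     rng.normal(size=n)])

r, loadings, redundancy, A, B, C = cca_eigen(X, Y)
print("canonical correlations:", np.round(r, 3))
print("redundancy of Y given X, per variate:", np.round(redundancy, 3))
print("total:", round(redundancy.sum(), 3))
```

Summed over all canonical variates, these redundancies equal the mean $R^2$ from regressing each Y variable on all the X variables, i.e. $\operatorname{tr}(\mathbf{C}^{\top}\mathbf{A}^{-1}\mathbf{C})/q$.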
Physical variables: mean cluster size, max. cluster size, mean speed, heading consistency, and fluke-up, breach, lobtail, spyhop and sidefluke rates
Acoustic variables: coda, creak and high click rates

Canonical Correlation Analysis: Physical vs. Acoustic Behaviour

                              Canonical variate pair
                              1       2       3
Canonical correlation         0.72    0.49    0.21
P-value                       0.00    0.00    0.06
Redundancy:
  V(Acoustic | Physical)      34%     20%     1%
  V(Physical | Acoustic)      32%     8%      1%
(V(A | B): percentage of the variance in set A explained by set B through each pair of canonical variates)

Physical vs. Acoustic Behaviour: loadings on canonical variates 1 and 2
Variable                  1        2
Mean cluster size        -0.95     0.07
Max. cluster size        -0.85     0.47
Mean speed                0.21     0.06
Heading consistency       0.32    -0.27
Fluke-up rate             0.73     0.23
Breach rate              -0.16     0.02
Lobtail rate             -0.22     0.03
Spyhop rate              -0.18     0.32
Sidefluke rate           -0.21     0.35
Coda rate                -0.64     0.64
Creak rate               -0.50     0.79
High click rate           0.76     0.64

Canonical Correspondence Analysis
Canonical correlation analysis assumes a linear relationship between the two sets of variables.
In some situations this is not reasonable (e.g. community ecology).
Canonical correspondence analysis assumes a Gaussian (bell-shaped) relationship between the sets of variables: the "species" variables are Gaussian functions of the "environmental" variables.
Implemented in CANOCO.

[Figure: abundances of species A, B and C plotted against environmental variables X and Y. Canonical correlation analysis fits linear relationships; canonical correspondence analysis finds the best combination of X and Y (e.g. 1.4X + 0.2Y), along which species abundances are bell-shaped.]

Canonical correspondence analysis: Dutch spiders
26 environmental variables; 12 spider species; 100 samples (pitfall traps)

Axis                                        1       2       3       4
Eigenvalue                                .535    .214    .063    .019
Species-environment correlation           .959    .934    .650    .782
Cumulative percentage variance:
  of species data                         46.6    65.2    70.7    72.3
  of species-environment relation         63.2    88.5    95.9    98.2

[Figure: ordination diagrams of the spider data on axes 1 and 2.]

Canonical correspondence analysis can be detrended.

The Horseshoe effect
Presence (1) or absence (0) of species along an environmental gradient:
Sp A   0 0 0 0 0 0 0 1 1
Sp B   0 0 0 0 0 0 1 1 0
Sp C   0 0 0 0 0 0 1 1 0
Sp D   0 0 0 0 0 1 1 0 0
Sp E   0 0 0 0 1 1 1 0 0
Sp F   0 0 0 1 1 1 0 0 0
Sp G   0 0 0 1 1 0 0 0 0
Sp H   0 0 1 1 0 0 0 0 0
Sp I   1 1 1 0 0 0 0 0 0

[Figure: ordination of these data on axes 1 and 2, in which the sites fall on an arch (the horseshoe); detrended canonical correspondence analysis (detrended axes 1 and 2) removes the arch.]

Canonical Correlation Analysis: examines the relationship between two sets of variables
Redundancy Analysis: examines how a set of dependent variables relates to a set of independent variables
Canonical Correspondence Analysis: counterpart of canonical correlation and redundancy analyses when the relationship between the sets of variables is Gaussian rather than linear
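Canonical correspondence analysis assumes that species respond to a combination of environmental variables with bell-shaped curves. The sketch below simulates that situation and fits the analysis with a common matrix formulation (weighted regression of the chi-square residual matrix on the environmental variables, followed by an SVD). Everything here is an assumption for illustration: the function names `gaussian_response` and `cca_correspondence`, the gradient 1.4X + 0.2Y borrowed from the slides' example, and the species optima and tolerance. It is not how CANOCO is implemented, and it omits detrending and significance testing.

```python
import numpy as np


def gaussian_response(z, optimum, tolerance, height=10.0):
    """Expected abundance along gradient z: a bell-shaped (Gaussian) curve."""
    return height * np.exp(-0.5 * ((z - optimum) / tolerance) ** 2)


def cca_correspondence(N, E):
    """Canonical correspondence analysis in matrix form (hypothetical sketch).

    N: sites x species abundances (non-negative, no empty rows or columns).
    E: sites x environmental variables.
    Returns one eigenvalue per environmental variable and the axis-1 site
    scores, which are a linear combination of the environmental variables.
    """
    P = N / N.sum()
    r = P.sum(axis=1)                                   # site weights
    c = P.sum(axis=0)                                   # species weights
    Q = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # chi-square residuals
    Ec = E - r @ E                                      # weighted centring
    Ew = Ec / np.sqrt(r @ Ec**2)                        # weighted standardization
    Xr = np.sqrt(r)[:, None] * Ew
    coef, *_ = np.linalg.lstsq(Xr, Q, rcond=None)       # weighted regression on E
    U, s, Vt = np.linalg.svd(Xr @ coef, full_matrices=False)
    eigenvalues = s[: E.shape[1]] ** 2
    axis1_sites = U[:, 0] / np.sqrt(r)
    return eigenvalues, axis1_sites


# Hypothetical data: 8 species with Gaussian responses to the combined
# gradient z = 1.4X + 0.2Y (the example combination on the slides).
rng = np.random.default_rng(7)
X = rng.uniform(-2, 2, 100)
Y = rng.uniform(-2, 2, 100)
z = 1.4 * X + 0.2 * Y
optima = np.linspace(-2.5, 2.5, 8)
N = np.column_stack([gaussian_response(z, u, tolerance=0.8) for u in optima])

eigenvalues, axis1 = cca_correspondence(N, np.column_stack([X, Y]))
print("eigenvalues:", np.round(eigenvalues, 3))
print("|r(axis 1, z)|:", round(abs(np.corrcoef(axis1, z)[0, 1]), 3))
```

Because the simulated species respond only to z, the first axis should line up closely with z and the second eigenvalue should be much smaller than the first.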