Lecture Notes: Introduction - Phil Ender


Ed231A Multivariate Analysis: Introduction

Introduction to Education 231A Multivariate Analysis

Instructor: Phil Ender
Email: [email protected]
Office: Moore Hall 3030
Phone: (310) 206-3195

Textbook: Computer-Aided Multivariate Analysis (4th Edition)
Authors: Afifi, Clark and May
Publisher: Chapman & Hall/CRC
Year: 2004
ISBN: 1-58488-308-1

You can view textbook examples for this book using several different statistical software packages at the ATS website: Afifi, Clark & May -- Textbook Examples.

Topics Covered by Afifi et al vs Lecture

Textbook                          Lecture
                                  matrix algebra
simple linear regression          simple linear regression
multiple linear regression        multiple linear regression
                                  multivariate multiple regression
                                  Hotelling's T2
                                  multivariate analysis of variance
canonical correlation             canonical correlation
discriminant analysis             discriminant analysis
logistic regression               probit regression
survival analysis
principal components analysis     principal components analysis
factor analysis                   factor analysis
cluster analysis                  cluster analysis
log-linear analysis

Course Organization

No exams
10 Computer Assignments
Programming using either Stata, SAS or R
Note: There will be class the Wednesday before Thanksgiving

Electronic Support

Multivariate Course Webpage: http://www.philender.com/courses/multivariate/
Syllabus
Lecture Notes
Help Sheets
Computer Assignments
ed231a_583244200_ender

Lecture Notes

Lectures will be used in class.
Lectures will be available on the Multivariate Course Web site.

About Assignments

Write your own programs -- it is usually obvious when people copy someone else's program.
Make programs general.
Include comments & labels.

Computers Running Stata

16 Macs in Moore Hall*
20 Macs in GSE&IS Building*
Macs & PCs in CLICC Labs in Powell Library
PCs in Social Sciences Computing Lab**

*May require a technology fee
**Social Science students only

Relative Course Difficulty

Let's get started...

What makes a model multivariate? Is multiple regression multivariate?

The Afifi, Clark & May view of multivariate: every model has a lhs (left hand side) and a rhs (right hand side),

    model lhs = rhs

lhs variables are response variables (the so-called dependent or outcome variables). rhs variables are predictor or explanatory variables (aka independent variables).

Here are two univariate models:

    y = x
    y = x1 x2 x3

And two multivariate models:

    y1 y2 y3 = x
    y1 y2 y3 = x1 x2 x3

For the purposes of this class, multivariate will be taken to mean models with multiple lhs variables.

The concept of right hand side and left hand side equivalence: there are times when rhs variables and lhs variables can be exchanged and the two models yield the same results.

    y1 y2 y3 = x
    x = y1 y2 y3

Examples:

/* multivariate anova -- female is a rhs variable */

manova read write math = female

                           Number of obs =     200

                W = Wilks' lambda      L = Lawley-Hotelling trace
                P = Pillai's trace     R = Roy's largest root

      Source |  Statistic      df   F(df1,    df2) =   F   Prob>F
  -----------+-------------------------------------------------
      female | W    0.8501      1     3.0   196.0   11.52 0.0000 e
             | P    0.1499            3.0   196.0   11.52 0.0000 e
             | L    0.1763            3.0   196.0   11.52 0.0000 e
             | R    0.1763            3.0   196.0   11.52 0.0000 e
             |-------------------------------------------------
    Residual |               198
  -----------+-------------------------------------------------
       Total |               199
  -------------------------------------------------------------
                e = exact, a = approximate, u = upper bound on F

/* OLS regression -- female is a lhs variable */
/* in SAS: model female = read write math */

regress female read write math

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   11.52
       Model |  7.43351627     3  2.47783876           Prob > F      =  0.0000
    Residual |  42.1614837   196  .215109611           R-squared     =  0.1499
-------------+------------------------------           Adj R-squared =  0.1369
       Total |      49.595   199  .249221106           Root MSE      =   .4638

------------------------------------------------------------------------------
      female |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |  -.0112975   .0045153    -2.50   0.013    -.0202023   -.0023926
       write |   .0270844   .0046522     5.82   0.000     .0179095    .0362593
        math |  -.0102947   .0050408    -2.04   0.042     -.020236   -.0003535
       _cons |   .2476519   .2099033     1.18   0.239    -.1663071     .661611
------------------------------------------------------------------------------
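These two runs can be reproduced with a few lines of Stata. The sketch below assumes the data are the familiar 200-case hsb2 demonstration dataset (the variable names and sample size match, although these notes do not name the file), and the download URL is an assumption.

* assumed data source -- not stated in the original notes
use https://stats.idre.ucla.edu/stat/data/hsb2, clear

* female on the rhs: one-way manova of the three test scores
manova read write math = female

* female on the lhs: OLS regression of female on the same three scores
regress female read write math

* both runs should report F(3, 196) = 11.52, and
* R-squared = 1 - Wilks' lambda = 0.1499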

The role of matrix algebra in multivariate analysis

Matrix algebra gives us a concise and elegant way in which to represent multivariate models. If you are intimidated by it, please realize that the alternatives to matrix representation are worse.

Consider this univariate multiple regression model,

    b = (X'X)^-1 X'y     where X is n x m, y is n x 1 and b is m x 1

Contrast it with this multivariate multiple regression model,

    B = (X'X)^-1 X'Y     where X is n x m, Y is n x p and B is m x p

(A small Mata sketch of these two formulas follows the pseudo-code examples below.)

Some Examples of Multivariate Generalization of Univariate Models

These examples are in stat package pseudo-code.

Regression:
    model y = x1                   /* simple linear regression */
    model y = x1 x2 x3             /* multiple linear regression */
    model y1 y2 y3 = x1 x2 x3      /* multivariate multiple regression */

Probit Analysis (the z's are binary, 0/1, variables):
    model z = x1                   /* simple probit analysis */
    model z = x1 x2 x3             /* multiple probit analysis */
    model z1 z2 z3 = x1 x2 x3      /* multivariate probit analysis */

Correlation:
    model ry,x                     /* Pearson correlation */
    model Ry.x1,x2,x3              /* multiple correlation */
    model RC y1,y2,y3 = x1,x2,x3   /* canonical correlation */

Anova:
    model y = a                    /* one-way anova */
    model y = a b a*b              /* two-way anova */
    model y1 y2 y3 = a             /* one-way multivariate anova (manova) */
    model y1 y2 y3 = a b a*b       /* two-way multivariate anova (manova) */
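Here is the promised sketch: the two closed-form solutions evaluated directly in Stata's Mata. This is only an illustration under the assumption that a dataset (for example the hsb2 file used elsewhere in these notes) is already in memory; the particular variables chosen are illustrative, not part of the course materials.

mata:
    y = st_data(., "write")                          // n x 1 response vector
    Y = st_data(., ("read", "write", "math"))        // n x p matrix of responses
    X = (st_data(., "female"), J(st_nobs(), 1, 1))   // n x m design matrix: predictor plus constant
    b = invsym(X'*X)*X'*y                            // univariate:   b is m x 1
    B = invsym(X'*X)*X'*Y                            // multivariate: B is m x p
    b
    B
end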

Classifying Multivariate Models

I. Testing effects; discriminating among groups
   A. Hotelling's T2
   B. Multivariate Multiple Regression
   C. MANOVA / MANCOVA
   D. Discriminant Analysis
   E. Canonical Correlation Analysis

II. Simplification of variable structure; determining dimensionality; rank reduction
   A. Canonical Correlation Analysis
   B. Discriminant Analysis
   C. Principal Components Analysis
   D. Factor Analysis
   E. Multidimensional Scaling

III. Other
   A. Cluster Analysis
   B. Latent Class Analysis

Some Multivariate Analogs to Univariate Procedures

Student's t-test -> Hotelling's T2
anova -> manova
multiple linear regression -> multivariate multiple regression
multiple linear regression -> canonical correlation analysis

To be a well behaved multivariate analog, the multivariate procedure with one response variable should yield results equivalent to those of the univariate procedure.

Examples:

ttest write, by(female)

Two-sample t test with equal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |      91    50.12088    1.080274    10.30516    47.97473    52.26703
  female |     109    54.99083    .7790686    8.133715    53.44658    56.53507
---------+--------------------------------------------------------------------
combined |     200      52.775    .6702372    9.478586    51.45332    54.09668
---------+--------------------------------------------------------------------
    diff |           -4.869947    1.304191               -7.441835   -2.298059
------------------------------------------------------------------------------
Degrees of freedom: 198

              Ho: mean(male) - mean(female) = diff = 0

     Ha: diff < 0            Ha: diff != 0             Ha: diff > 0
       t =  -3.7341            t =  -3.7341              t =  -3.7341
   P < t =   0.0001        P > |t| =   0.0002        P > t =   0.9999

hotel write, by(female) notable

2-group Hotelling's T-squared = 13.943308
F test statistic: ((200-1-1)/(200-2)(1)) x 13.943308 = 13.943308

H0: Vectors of means are equal for the two groups
    F(1,198)        =   13.9433
    Prob > F(1,198) =    0.0002

display sqrt(r(T2))
3.7340739

anova write prog

                  Number of obs =     200     R-squared     =  0.1776
                  Root MSE      = 8.63918     Adj R-squared =  0.1693

          Source |  Partial SS    df       MS          F     Prob > F
      -----------+---------------------------------------------------
           Model |  3175.69786     2   1587.84893    21.27     0.0000
                 |
            prog |  3175.69786     2   1587.84893    21.27     0.0000
                 |
        Residual |  14703.1771   197    74.635417
      -----------+---------------------------------------------------
           Total |   17878.875   199    89.843593

manova write = prog

                           Number of obs =     200

                W = Wilks' lambda      L = Lawley-Hotelling trace
                P = Pillai's trace     R = Roy's largest root

      Source |  Statistic      df   F(df1,    df2) =   F   Prob>F
  -----------+-------------------------------------------------
        prog | W    0.8224      2     2.0   197.0   21.27 0.0000 e
             | P    0.1776            2.0   197.0   21.27 0.0000 e
             | L    0.2160            2.0   197.0   21.27 0.0000 e
             | R    0.2160            2.0   197.0   21.27 0.0000 e
             |-------------------------------------------------
    Residual |               197
  -----------+-------------------------------------------------
       Total |               199
  -------------------------------------------------------------
                e = exact, a = approximate, u = upper bound on F

regress write read female

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   77.21
       Model |  7856.32118     2  3928.16059           Prob > F      =  0.0000
    Residual |  10022.5538   197  50.8759077           R-squared     =  0.4394
-------------+------------------------------           Adj R-squared =  0.4337
       Total |   17878.875   199   89.843593           Root MSE      =  7.1327

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .5658869   .0493849    11.46   0.000      .468496    .6632778
      female |   5.486894   1.014261     5.41   0.000      3.48669    7.487098
       _cons |   20.22837   2.713756     7.45   0.000     14.87663    25.58011
------------------------------------------------------------------------------

display sqrt(.4394192130387506)   /* multiple correlation */
.66288703

mvreg write = read female

Equation          Obs  Parms        RMSE    "R-sq"          F        P
-----------------------------------------------------------------------
write             200      3    7.132735    0.4394   77.21062   0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
write        |
        read |   .5658869   .0493849    11.46   0.000      .468496    .6632778
      female |   5.486894   1.014261     5.41   0.000      3.48669    7.487098
       _cons |   20.22837   2.713756     7.45   0.000     14.87663    25.58011
------------------------------------------------------------------------------

canon (write) (read female)

Linear combinations for canonical correlation 1         Number of obs =     200

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
u            |
       write |    .105501   .0084684    12.46   0.000     .0888016    .1222004
-------------+----------------------------------------------------------------
v            |
        read |    .090063   .0078598    11.46   0.000     .0745639    .1055622
      female |   .8732598   .1614235     5.41   0.000     .5549397     1.19158
------------------------------------------------------------------------------
(Standard errors estimated conditionally)

Canonical correlations:
  0.6629

display .66288703^2   /* canonical correlation squared */
.43941921
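The analog claims can also be checked directly from Stata's stored results. A minimal sketch, assuming the same dataset is still in memory:

quietly ttest write, by(female)
display r(t)^2       // 13.9433 -- equals the 2-group Hotelling's T-squared above

quietly regress write read female
display sqrt(e(r2))  // .66289  -- equals the single canonical correlation above

With one response variable, the squared t statistic reproduces Hotelling's T2 and the multiple correlation reproduces the canonical correlation, which is exactly the equivalence the output above demonstrates.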

Multivariate Course Page

Phil Ender, 12jul07, 30sep05, 24jan05
