
TD0409-01 课件/Psy 202_10_Replication and Open Science_W22-1.pdf

Instructor:
Dr. Molly Metz

PSY 202H1:
STATISTICS II

MODULE 10:
STATISTICAL ISSUES AND THE

REPLICATION CRISIS

1

1. Best Practices in Research Psychology
1. Some scandal
2. From fraud to QRPs
3. Open science
4. Looking forward

GAME PLAN

2

BEST PRACTICES IN
RESEARCH PSYCHOLOGY

3

OUTLINE

 Best Practices
 Examples of fraud/data manipulation
 Questionable Research Practices (QRPs)
 Doing research ethically and responsibly

 Reproducibility and Replication Efforts
 Motivation
 History of Attempts

 The Future: Being a good consumer of psychological science

4

INCENTIVE STRUCTURE

 Published work is important for getting a job, getting tenure, being awarded grants, and being viewed favorably in our field.

 As a result, a “rat race” culture develops and people try to publish as much as they can.

 Balancing the desire to stay truthful to psychological science with the necessity to publish.

 This results in researchers taking shortcuts and sometimes worse…

5

RECENT CASES OF RESEARCH MISCONDUCT

Karen Ruggiero (late 90s, early 00s)
Marc Hauser (2007-2011)
Diederik Stapel (2011)
Dirk Smeesters (2011-2012)
Lawrence Sanna (2012)
Jens Förster (2014-2015)
Michael LaCour (2015)

6

7

… I think it is important to emphasize that I never
informed my colleagues of my inappropriate
behavior. I offer my colleagues, my PhD students, and
the complete academic community my sincere
apologies. I am aware of the suffering and sorrow
that I caused to them.

I did not withstand the pressure to score, to publish,
the pressure to get better in time. I wanted too much,
too fast. In a system where there are few checks and
balances, where people work alone, I took the wrong
turn. I want to emphasize that the mistakes that I
made were not born out of selfish ends.

-Brabants Dagblad. 31 October 2011.
-Translated from Dutch

8

http://www.nytimes.com/2013/04/28/magazine/diederik-stapels-audacious-academic-fraud.html?pagewanted=all

SCIENTIFIC FRAUD

9

10

NOT JUST PSYCHOLOGY. . .

 Drug studies: 20–25% replicate (Prinz)
 Cancer treatment: 11% replicate (Begley)

11

HOWEVER, OTHER PRACTICES DON’T
CONSTITUTE FRAUD

Questionable Research Practices

Decisions in design, analysis, and reporting
that increase the likelihood of achieving a
positive result
And a positive response from editors and reviewers

12

FALSE POSITIVE PSYCHOLOGY

 How do decisions in analyses affect the final results?

 Having small samples, collecting additional dependent
variables, peeking at data, dropping an experimental
condition

 If enough possibilities are entertained, the likelihood
of achieving a significant result could be over 80%!

Simmons et al., 2011
13
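The mechanics behind this point can be seen in a small Monte Carlo sketch. Everything below is illustrative (made-up sample sizes, number of DVs, and an approximate critical value), not Simmons et al.'s actual simulation: even with no true effect, testing several DVs and re-testing after adding participants pushes the "significance" rate well above the nominal 5%.

```python
import random
import statistics

random.seed(1)

CRIT = 2.02  # approximate two-tailed t critical value for alpha = .05, df ~ 40

def t_stat(a, b):
    """Independent-samples t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) +
           (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def flexible_study(n=20, n_dvs=3):
    """One null study (no true effect) analyzed 'flexibly': several DVs,
    plus a peek after adding 10 participants per group.
    Returns True if ANY of the tests comes out 'significant'."""
    for _ in range(n_dvs):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        if abs(t_stat(a, b)) > CRIT:
            return True
        a += [random.gauss(0, 1) for _ in range(10)]  # peek: add data, re-test
        b += [random.gauss(0, 1) for _ in range(10)]
        if abs(t_stat(a, b)) > CRIT:
            return True
    return False

false_positive_rate = sum(flexible_study() for _ in range(2000)) / 2000
print(f"'Significant' results under the null: {false_positive_rate:.1%}")
```

With only three DVs and one interim look, the rate already lands far above 5%; combining more researcher degrees of freedom pushes it higher still, as the slide notes.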

Did you get the effect you predicted?

Did you get ANY effect?

Publish

HARK!

HARKing: Hypothesizing After Results are Known

Figure by S. Vazire

14

Did you get the effect you predicted?

Did you get ANY effect?

Publish
Can you dig around and find one?

No

HARK!

p-hack!

Figure by S. Vazire

“p-hacking” = fishing around in your data for statistically significant results
Often involves redefining variables or running unplanned analyses

15

EXAMPLE:
IS THE U.S. ECONOMY

AFFECTED BY WHETHER
DEMOCRATS OR

REPUBLICANS ARE IN
OFFICE?

http://fivethirtyeight.com/features/science-isnt-broken/#part2
16

NOT SO SIMPLE…

 Do you look at the number of Republicans or
Democrats?

 Which politicians do you look at?

 How do you measure the U.S. economy?

 Should you look at it in general or excluding
economic recessions?

17


QUESTIONABLE RESEARCH PRACTICES

 John, Loewenstein, & Prelec (2012) surveyed 2,155
academic psychologists about the frequency of 10
different QRPs…

 Not reporting all measures, rounding off p-values, only
including data that “worked out”

 Up to 63.4% admission rates, and high levels of each practice
rated “defensible”

22

WHAT SHOULD RESEARCHERS DO?

 Increase disclosure in methods, results, and
hypothesis presentation

 Pre-register hypotheses and studies
 Data collection rules, analytic strategies

 Share data

 Be a responsible scientist regardless of outcome

23

CENTER FOR OPEN SCIENCE

 Open Science Framework

 Founded to increase the openness, integrity, and
reproducibility of scientific research
 Brian Nosek and Jeff Spies

 Open source software platform for pre-registering
hypotheses, archiving study materials, depositing
data and syntax

 Initiated the Reproducibility Project
24

CENTER FOR OPEN
SCIENCE

Video:
https://www.youtube.com/watch?v=DIxmLVrAQiw

25

PRODUCING RELIABLE FINDINGS

Reproducibility: A study can be duplicated in
method and/or analysis

Replicability: A study of a phenomenon
produces results similar to those of a previous
study of the same phenomenon.
Close/Exact Replications
Conceptual Replications

26

ARE PSYCHOLOGY
FINDINGS REPRODUCIBLE

AND REPLICABLE?

27

MANY LABS 1.0

 Started by running studies that could be done relatively
easily.

 Effects ranged from those known to replicate (classic
studies) to those whose replicability was unknown.

28

MANY LABS 1.0

29

MANY LABS 2.0/3.0 AND OTHER CHANGES

 Many Labs 2.0: Replication across sample and
setting

 Many Labs 3.0: Subject pool quality across the
academic semester

 Editorial policies of some journals changed
Report effect sizes, power, confidence intervals

 Special issues on replication

 Increase in meta-analyses
30

 Dissemination of Replication Attempts
 Journal of Null Results

 Psychfiledrawer.org: Archives attempted replications of specific
studies and whether replication was achieved

 Center for Open Science: Psychologist Brian Nosek, a champion
of replication in psychology, has created the Open Science
Framework, where replications can be reported.

 Association for Psychological Science: Has registered replications
of studies, with the overall results published in Perspectives on
Psychological Science.

 PLOS ONE: Public Library of Science—publishes a broad range of
articles, including failed replications, and there are occasional
summaries of replication attempts in specific areas.

 The Replication Index: Created in 2014 by Ulrich Schimmack, the
so-called “R Index” is a statistical tool for estimating the
replicability of studies, of journals, and even of specific
researchers.

 And more!!

RESPONSES TO REPLICATION CRISIS

31

SOME CRITICISMS

 Researchers cherry-pick studies because they have
some personal/intellectual axe to grind

 People who do replications are somehow not
qualified to do science

 Science is naturally self-correcting

 Unknown differences between studies
 Sample-specific reasons for non-replication

32

UNKNOWN DIFFERENCES

Approval at Time 1: 65%

Approval at Time 2: 32%

33

REPRODUCIBILITY PROJECT (2015)

Large-scale replication

100 studies from 3 different journals
Close/exact replications
Contacted original study authors
Open materials and data
 Reduces likelihood of “unknown differences” effect

How many do you think replicated?

34

WHY DIDN’T MORE FINDINGS REPLICATE?

Perhaps some difference between studies
Boundary effects

Or perhaps the effect didn’t exist in the first
place?
Some uncertainty in findings
File drawer problem

35

FILE DRAWER PROBLEM

36

https://www.youtube.com/watch?v=0Rnq1NpHdmw

JOHN OLIVER KNOWS (NSFW)

WHAT DOES GOOD
RESEARCH LOOK LIKE?

38

GOOD RESEARCH

 Good research is open research
 Materials and data are shared
publicly

 Good research features
experimental methods that are
strong and isolate a question
of interest

 Good research is adequately
“powered” research (see
Tutorial 7 for a review)

39

GOOD RESEARCH

Good research is reproducible

40

CONSUMING SCIENCE

 Be an informed consumer of science

 Don’t believe everything you read!
 If an effect seems unbelievable, it just might be.

 Pay attention to sample size
 How big is the sample?
 Effects are unreliable if the sample size is too low; a
2,000-person study is more reliable than a 50-person study.

41

CONSUMING SCIENCE

 Is the study you are reading the only demonstration of
this effect?
 Have people from other labs replicated this?

 Did the authors make their data available?

 Advocate for good research so we can understand
more about humans and why they do the things they
do

42

 Start Here
 A summary
 http://nobaproject.com/modules/the-replication-crisis-in-psychology

 A dissent
 http://www.nytimes.com/2015/09/01/opinion/psychology-is-not-in-crisis.html?_r=0

 Optional
 A counterpoint to the dissent
 http://www.theatlantic.com/notes/2015/09/sweeping-psychologys-problems-under-the-rug/403726/

 A possible solution, and preliminary findings
 http://as.virginia.edu/news/massive-collaboration-testing-reproducibility-psychology-studies-publishes-findings

 A response to the possible solution
 https://www.sciencenews.org/article/psychologys-replication-crisis-sparks-new-debate

 It’s not just us
 http://www.slate.com/articles/health_and_science/future_tense/2016/04/biomedicine_facing_a_worse_replication_crisis_than_the_one_plaguing_psychology.html

OPTIONAL READINGS: “REPLICABILITY CRISIS” IN
PSYCHOLOGY

43

REPLICATION CRISIS
OR

CREDIBILITY REVOLUTION?

44

Interviewer: “How much of
what you print is wrong?”
Maddox: “All of it. That’s
what science is about — new
knowledge constantly
arriving to correct the old.”

John Maddox,
editor of Nature for 22 years

45

 Data Analysis Project
 Due Tuesday April 5, 11:59pm

 Course Evals (see announcement)

 Final exam
 Tuesday Apr 12, 9am to Thursday April 14, 11:59pm
 Same basics as Midterm; see Assessment Page for more info

46

TO DO


TD0409-01 课件/Psy 202_7_Regress_W22.pdf

Instructor:
Dr. Molly Metz

PSY 202H1:
STATISTICS II

MODULE 7:
REGRESSION

1

1. Introduction to Regression
1. Linear Regression vs Correlation
2. Hypothesis Testing with Regression
3. Video

2. Multiple Regression
1. What is it, even?
2. What can we learn?

GAME PLAN

2

Correlation Review!

LINEAR REGRESSION

STATISTICAL TECHNIQUE USED TO
PREDICT THE UNKNOWN VALUE OF ONE

VARIABLE GIVEN A KNOWN VALUE OF
ANOTHER VARIABLE

3

REVIEW

Many studies aim to determine if two
variables have a Co-Varying Relationship with
one another

When the value of one variable reliably changes in
value with another variable
 Positive Covariance = When the two variables change in the
same direction
 E.g. Weight-Height / Study Time-Exam Performance

 Negative Covariance = When the two variables change in
opposite directions
 E.g. Stress-Meditation / Alcohol Intoxication-Coordination

4

INTRODUCTION TO LINEAR EQUATIONS
AND REGRESSION

 The Pearson correlation measures a linear relationship
between two variables.

 The line through the data
 Makes the relationship easier to see
 Shows the central tendency of the relationship
 Can be used for prediction

 Regression analysis precisely defines the line.

5

REVIEW
ASSESSING FOR THE PRESENCE OF COVARIATION

When a CVR exists between two variables, it is possible to accurately predict the
unknown value of one of the variables given a known value of the other variable

Perfect Relationships Allow Perfectly Precise Predictions to a Single Value

E.g., If X and Y were perfectly related (r = +/- 1.00), I could accurately predict Y to a single
value given a single value of X

For example, if X = 3, I would predict that Y = 5

6

REVIEW
ASSESSING FOR THE PRESENCE OF CVRS

When a CVR exists between two variables, it is possible to accurately predict the
unknown value of one of the variables given a known value of the other variable

Imperfect Relationships Allow for Predictions to a Range of Values (not perfectly precise)

E.g., If X and Y were imperfectly related, |r| < 1.00, I could accurately predict Y to a range
of values given a single value of X

For example, if X = 3, I would predict that Y would be between 4 and 6

7

REVIEW
ASSESSING FOR THE PRESENCE OF CVRS

When a CVR exists between two variables, it is possible to accurately predict the
unknown value of one of the variables given a known value of the other variable

The stronger the CVR between two variables, the more precise the predictions are (or, the
more narrow the range of predicted values of one variable)

Variables X & Y: r = +.79
If X = 3, Y is predicted to be between 4 and 6

Variables A & B: r = +.32
If A = 3, B is predicted to be between 2 and 8

More precise prediction / Less precise prediction

8

LINEAR REGRESSION

When a significant correlation has been found
between two variables, it is common for
researchers to want to generate an equation
that would be useful for predicting the value
of one of the variables given the known value
of the other variable.

Linear Regression is the technique to use in
order to accomplish this
Linear Regression utilizes the equation of a straight

line in order to make these predictions

9

LINEAR REGRESSION

Equation of a Straight Line

y = m(x) + b

m = slope of the line = (Change in Y) / (Change in X) = (Y2 – Y1) / (X2 – X1)

b = Y-intercept = the value of Y when X = 0

r = +1.00

10

LINEAR REGRESSION

Equation of a Straight Line

y = m(x) + b

m = slope of the line = [(Y2 – Y1) / (X2 – X1)] = [(11 – 9) / (6 – 5)] = (2 / 1) = +2.00

b = Y-intercept = the value of Y when X = 0 = -1.00
Based on subtracting 2 from Y for every 1-value decrease in X
(e.g. When X = 2, Y = 3; When X = 1, Y = 1; When X = 0, Y = -1)

r = +1.00

11

LINEAR REGRESSION

Equation of a Straight Line

y = m(x) + b
y = +2.00(x) + -1.00

Now, we can predict an unknown value of Y given a value of X
If X = 15, what is the predicted value of Y?

If X = 100, what is the predicted value of Y?

r = +1.00

12

LINEAR REGRESSION

Equation of a Straight Line

y = m(x) + b
y = +2.00(x) + -1.00

Now, we can predict an unknown value of Y given a value of X
y = +2.00 (15) + -1.00 = 29.00

y = +2.00 (100) + -1.00 = 199.00

r = +1.00

13
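The two predictions above can be checked with a one-line function, a direct transcription of the slide's fitted line y = 2x − 1:

```python
# The slide's fitted line: slope m = +2.00, intercept b = -1.00
def predict_y(x, m=2.0, b=-1.0):
    """Predict y from x using the equation of a straight line."""
    return m * x + b

print(predict_y(15))   # 29.0
print(predict_y(100))  # 199.0
```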

LINEAR REGRESSION

y = +2.00(x) + -1.00
y = +2.00 (15) + -1.00 = 29.00

y = +2.00 (100) + -1.00 = 199.00

Since X & Y were perfectly related, we can make precise, single-value
predictions of Y from a given value of X

r = +1.00

14

LINEAR REGRESSION

What line should we use to characterize the relationship, and how
do we determine its equation?

What single line do we draw in order to determine its equation?

15

LINEAR REGRESSION

What line should we use to characterize the relationship, and how do we
determine its equation?

We will want to choose the “Best Fitting Regression Line”
The Line That Has The Smallest Average Degree of Prediction Error

Prediction Error = Y’ – Y
Y’ = The Line’s Predicted Value of Y; Y = Actual Value of Y

[Figure: a data point’s actual value Y, the line’s predicted value Y′, and the vertical gap between them labeled Prediction Error = Y′ – Y]

16

LINEAR REGRESSION

What line should we use to characterize the relationship, and how do we
determine its equation?

Some Lines Have Smaller or Larger Average Degrees of Prediction Error Than Others

[Two panels: left, a line with a smaller average degree of prediction error; right, a line with a larger average degree of prediction error]

17


LINEAR REGRESSION

What line should we use to characterize the relationship, and how do
we determine its equation?

There will always be one single straight line that “best fits the data”
or,

There will always be one straight line that has a smaller average degree of
Prediction Error than all other possible straight lines

This line is what is known as the “Least Squares Regression Line”, or “Best Fitting
Regression Line”

We will want to determine this line’s equation and use it for prediction

19

LINEAR REGRESSION

Equation of the “Least Squares Regression Line”

Y’ = by (x) + ay
Conceptually the same as “y = m (x) + b”, but this is the more common regression notation

In order to determine the slope (by ) and y-intercept (ay) of the
“Best Fitting Regression Line”, use the following equations:

bY = SP / SSX   (calculate first)
aY = MY – bY(MX)   (calculate second)

20

LINEAR REGRESSION

Equation of the “Least Squares Regression Line”

Y’ = by (x) + ay
Conceptually the same as “y = m (x) + b”, but I will use this to be consistent with the textbook

In order to determine the slope (by ) and y-intercept (ay) of the
“Best Fitting Regression Line”, use the following equations:

Once you have calculated the Pearson r correlation
coefficient by hand, calculating this is easy, as most of
the work has been completed already

21

 Functional: Defining the line of best fit that we visually
estimated in our scatterplots

 Conceptual: How we discuss our results
 Correlation does not specify relationship directionality at all
 Regression can imply it, if not directly test it
 “Predicting Y FROM X”

 Statistical:
 Simple linear regression and correlation will yield the same results
 Beta (β) or b instead of Pearson’s r

 As statistics get more complex, regression gives us more functions
 Adding multiple predictors
LINEAR REGRESSION VS CORRELATION

22

LINEAR REGRESSION IS FOR THE BIRDS

23

Practice estimating slopes and intercepts here!
https://sophieehill.shinyapps.io/eyeball-regression/

24

EYEBALL REGRESSION

25

MORE ON PREDICTION
EQUATIONS

26

PREDICTING QUIZ SCORE (Y)
FROM HOURS STUDYING (X)

[Scatterplot: quiz SCORE (Y, 0–6) against HOURS studying (X, 0–8); Rsq = 0.5875]

r = .77

r2 = .59

27

PREDICTING QUIZ SCORE (Y)
FROM HOURS STUDYING (X)

[The same scatterplot with the regression line drawn; its slope (β) and intercept (α) are labeled. Axes: Hours Studying (X), Quiz Score (Y). Rsq = 0.5875]

How can we describe
this “regression” line?

28

LINEAR REGRESSION MODEL

 Population parameters: Yi = α + β(Xi)

 Sample statistics: Yi = a + b(Xi)

 a & b are constants
 Y, X, & e vary for each person (i)

29

SIMPLE LINEAR REGRESSION
EQUATION

 Y′ = Predicted value of Y
 a = Intercept

Value of Y when X = 0
 b = Slope, unstandardized regression coefficient

Change in Y for every 1-unit change in X
 X = Any value of X

 Note:
 a (intercept) & b (slope) are constants

 X & Y are variables

Y′ = a + bX

30

EXAMPLE OF A PREDICTION EQUATION

Predict quiz score (Y) from hour s studying (X)

Y′ = 1.5 + .5X

31

PREDICTING QUIZ SCORE (Y)
FROM HOURS STUDYING (X)

[Scatterplot of quiz score against hours studying with the fitted regression line; Rsq = 0.5875]

Quiz(Y′) = 1.5 + .5X

a (intercept) = 1.5
b (slope) = .5

32

EXAMPLES OF PREDICTION EQUATIONS

Predict marital satisfaction (Y) from conflict (X)

Predict depression (Y) from stressful events (X)

Y′ = 10 + (-1)X

Y′ = 10 + 2X

33

UNDERSTANDING THE SLOPE

 Expected change in Y for every 1-unit change in X

“Rise over run”

 Slope can be positive (Y increases as X increases)
or negative (Y decreases as X increases)

 The (unstandardized) slope is in the metric of Y

34

COMPUTING THE SLOPE & INTERCEPT

bYX = SP / SSX = Σ(X – MX)(Y – MY) / Σ(X – MX)²

or

bYX = r (sY / sX)

aYX = MY – bYX(MX)   (MX and MY are the means of X and Y)

• These are the formulas for
regression of Y on X

• They are not reciprocal!

35

SAMPLE COMPUTATIONS

• Predict quiz scores from hours studying
• Assume r = .77, sY = 1.63, sX = 2.50, MY = 3.33, MX = 3.67

bYX = r (sY / sX) = .77 (1.63 / 2.50) = .50

aYX = MY – bYX(MX) = 3.333 – (.50)(3.667) = 1.5

Y′Quiz = 1.5 + .5 (XStudy)

36
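The arithmetic of this example can be reproduced in a few lines, rounding b to two decimals before computing the intercept, as the hand computation does:

```python
# Slide example: r = .77, s_Y = 1.63, s_X = 2.50, means M_Y = 3.333, M_X = 3.667
r, s_y, s_x = 0.77, 1.63, 2.50
m_y, m_x = 3.333, 3.667

b = round(r * s_y / s_x, 2)    # b_YX = r(s_Y / s_X)
a = round(m_y - b * m_x, 2)    # a_YX = M_Y - b * M_X

print(b, a)  # 0.5 1.5
```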

PREDICTING QUIZ SCORE (Y)
FROM HOURS STUDYING (X)

[Scatterplot of quiz score against hours studying with the fitted regression line; Rsq = 0.5875]

Quiz(Y′) = 1.5 + .5X

a (intercept) = 1.5
b (slope) = .5

37

PREDICTION ERRORS

 How good a job are we doing at predicting
Y from X?

 Compute each Y′ from prediction equation
 Simply plug in X for each person to determine Y′

 How close is Y′ to Y?
 Y – Y′ = error = “residual”

38

RESIDUALS (ERRORS OF PREDICTION)

[Scatterplot with the regression line (“Prediction Line” / “Regression Line”) and a vertical segment from a data point to the line: Residual = Y – Y′]

PREDICTION ERRORS

39

Y′ = 1.5 + .50 (X)

40

PREDICTION ERRORS

How much error on average?
Σ(Y – Y′) = 0, so we have to square them! (just like when we
computed variance around a mean)

Variance of
the residuals:   s²Y.X = Σ(Y – Y′)² / (n – 2)

Standard deviation
of the residuals:   sY.X = √[ Σ(Y – Y′)² / (n – 2) ]   (“Standard error of prediction”)

THE STANDARD ERROR OF ESTIMATE &
CORRELATION

 The standard error of estimate (se) gives a measure
of the standard distance between a regression line
and the actual data points

 If the correlation is near +/-1.00, the standard error
of estimate will be small; as the correlation nears 0,
it will become larger
 Predicted variability = SSregression = r²SSY
 Unpredicted variability = SSresidual = (1 – r²)SSY
41

se = √( SSresidual / df ) = √( Σ(Y – Ŷ)² / (n – 2) ) = √( (1 – r²)SSY / (n – 2) )
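The dependence of the standard error of estimate on the correlation can be sketched numerically; the r, SS_Y, and n values below are made up for illustration:

```python
# se = sqrt(SS_residual / df), with SS_residual = (1 - r**2) * SS_Y and df = n - 2
def standard_error_of_estimate(r, ss_y, n):
    return ((1 - r ** 2) * ss_y / (n - 2)) ** 0.5

strong = standard_error_of_estimate(r=0.9, ss_y=100, n=12)  # correlation near 1
weak = standard_error_of_estimate(r=0.1, ss_y=100, n=12)    # correlation near 0
print(round(strong, 2), round(weak, 2))
```

The same SS_Y and n give a much smaller se when the correlation is strong, exactly as the slide describes.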

IS THE BEST FITTING LINE ALWAYS A
GOOD FITTING LINE?

[Two scatterplots, each with the best-fitting line for its data: in the left panel the points fall close to the line; in the right panel they scatter widely around it.]

Both of these figures show the best fitting lines for the data.
But would we say that both lines fit the data equally well? Clearly not!

42

“LINE OF BEST FIT” VS. “GOODNESS OF FIT”

Imagine that you’re shopping for a suit, and you find
the best fitting suit in Wal-Mart
 does that necessarily mean that the suit fits you
well?

• line of best fit = “best fitting suit”
• goodness of fit = “how well the suit fits”

43

“GOODNESS OF FIT”: R²

 We want to determine how much of the variability of y is
explained by x

 The “residual sum of squares” SSresidual tells us how much of the
variation in y is unexplained by our model

 We can also calculate how much of the variation in y is explained
by our model: SSregression

 The r² value, our measure of goodness of fit, tells us what
proportion of the total sum of squares of our outcome variable
(SSy) is explained by our model

r² = SSregression / SSy

44
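A tiny worked example (made-up data) ties the pieces together: fit the least-squares line, then compute r² as the proportion of SS_y the model explains:

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)

sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of products
ss_x = sum((x - mx) ** 2 for x in xs)
ss_y = sum((y - my) ** 2 for y in ys)                  # total variability of y

b = sp / ss_x            # least-squares slope
a = my - b * mx          # intercept
preds = [a + b * x for x in xs]

ss_residual = sum((y, p)[0] - p if False else (y - p) ** 2 for y, p in zip(ys, preds))
r2 = 1 - ss_residual / ss_y   # equivalently SS_regression / SS_y
print(round(b, 2), round(a, 2), round(r2, 2))  # 0.6 2.2 0.6
```

Here the model explains 60% of the variability in y; the remaining 40% is the residual sum of squares.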

UNDERSTANDING THE REGRESSION EQUATION

 Some precautions:
 The predicted value is not perfect (unless r = +/-1.00)

 The regression equation should not be used to make predictions for X
values that fall outside of the range of values covered by the original
data

 E.g., we wouldn’t want to predict creativity scores for someone with an IQ of 90 or 130,
because the relationship between IQ and creativity may be different for these values

45

46

HYPOTHESIS TESTS
WITH REGRESSION

TESTING THE SIGNIFICANCE OF THE
REGRESSION EQUATION

Analysis of regression
 Is the amount of variance predicted by the
regression equation significantly greater than what
we would expect by chance, i.e., if there were no
relationship between x and y?

47

TESTING THE SIGNIFICANCE OF THE
REGRESSION EQUATION

 Analysis of regression
 Is the amount of variance predicted by the regression
equation significantly greater than what we would expect by
chance, i.e., if there were no relationship between x and y?

 Regression variation (SSregression): The variance in y that is
related to or associated with changes in x. The closer data
points fall to the regression line, the larger the value of
regression variation.

 Residual variation (SSresidual): The variance in y that is not
related to changes in x. This is the variance in y that is left
over or remaining. The farther data points fall from the
regression line, the larger the value of residual variation.

48

TESTING THE SIGNIFICANCE OF THE
REGRESSION EQUATION

Analysis of regression
Very similar to analysis of variance:
 Uses an F-ratio of two mean square (MS) values
 Each MS is a SS divided by its df

F = MSregression / MSresidual
= (variance of y related to changes in X) / (variance of y not related to changes in X)

49
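For simple regression this F ratio can be written directly from r and SS_y (df_regression = 1, df_residual = n − 2). The inputs below are illustrative, not slide data:

```python
def regression_F(r, ss_y, n):
    ss_regression = r ** 2 * ss_y         # predicted variability
    ss_residual = (1 - r ** 2) * ss_y     # unpredicted variability
    ms_regression = ss_regression / 1     # each MS is its SS divided by its df
    ms_residual = ss_residual / (n - 2)
    return ms_regression / ms_residual

F = regression_F(r=0.77, ss_y=20, n=10)
print(round(F, 2))
```

Note that SS_y cancels out of the ratio: for simple regression, F depends only on r and n.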

Learning check!

STANDARDIZED REGRESSION EQUATION

 Involves transforming raw scores into z-scores before
finding the regression equation

X     Y     Zx     Zy     ZxZy    Zx²     Zy²
107   6    -.27   -.25    .0675   .073    .0625
110   8     .07    .58    .0406   .0049   .3364
101   4    -.96  -1.08   1.037    .9216  1.1664
105   5    -.50   -.66    .33     .25     .4356
124  10    1.66   1.41   2.34    2.756   1.9881

Recall that for any standardized distribution, M = 0 and SD = 1, so:
• a = 0 (intercept drops out of equation)
• beta (β) = r

β is the standardized regression coefficient
• Easier to interpret than b
• Useful for multiple regression (comparing multiple predictors)

Ŷ = bX + a becomes z′y = r(zx)
z′y = .95zx

50
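The table above can be verified in a few lines: standardize both variables (sample SDs, n − 1, matching the slide's z-scores) and the least-squares slope on z-scores comes out equal to r, about .95:

```python
import statistics

xs = [107, 110, 101, 105, 124]
ys = [6, 8, 4, 5, 10]

def zscores(values):
    """Convert raw scores to z-scores using the sample SD (n - 1)."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

zx, zy = zscores(xs), zscores(ys)

# Least-squares slope on standardized scores: beta = sum(zx*zy) / sum(zx**2);
# the intercept is 0 because both standardized means are 0.
beta = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)
print(round(beta, 2))  # 0.95
```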

51

HYPOTHESIS TESTING FOR B (SLOPE)

Three common hypothesis tests:
Is b significantly different from 0?

Is b significantly different from some non-zero value?

Are two bs significantly different from each other?

52

HYPOTHESIS TESTING FOR B

Is b significantly different from 0?
Population parameter: β*
Sample statistic: b
Two-tailed statistical hypotheses
H0: β* = 0
H1: β* ≠ 0

Conduct a single-sample t-test

53

HYPOTHESIS TESTING FOR B

Conduct a t-test:

t = (b – βhypoth) / sb,   df = n – 2

where   sb = sY.X / (sX √(n – 1))   (“Standard error of the slope”)

54
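Putting the formula together in code (the numbers for b, s_yx, and s_x are illustrative; only the formula itself comes from the slide):

```python
def t_for_slope(b, beta_hypoth, s_yx, s_x, n):
    """t test of a slope: t = (b - beta_hypoth) / s_b, with df = n - 2."""
    s_b = s_yx / (s_x * (n - 1) ** 0.5)  # standard error of the slope
    t = (b - beta_hypoth) / s_b
    return t, n - 2                      # t statistic and its df

t, df = t_for_slope(b=0.5, beta_hypoth=0.0, s_yx=1.1, s_x=2.5, n=10)
print(round(t, 2), df)  # 3.41 8
```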

HYPOTHESIS TESTING FOR B

Notice that the standard error of b will be influenced by 3
things:
Larger n = smaller standard error

Larger sY.X = larger standard error
Poor prediction overall leads to more error (variability) in sample
estimates of β*

Larger sX = smaller standard error
All else equal, greater variability in X results in more stable (less
variable) sample estimates of β*

MORE REGRESSION,
MORE VARIABLES

55

MULTIPLE REGRESSION

 One criterion variable and two or more predictor variables
determining a single comprehensive relationship

 Can help fix the third variable problem by adding controls
into the regression equation

Ŷ = bX + a becomes Ŷ = b1X1 + b2X2 + … + bnXn + a

56

57

RESEARCH PROBLEM

What is the association between a DV (Y)
and two or more IVs (Xs)?
Predicting exam grades (Y) from hours studying (X1) and

number of lectures attended (X2)

Predicting depression (Y) from stress (X1) and
social support (X2)

Predicting marital satisfaction (Y) from intimacy (X1), conflict
(X2), closeness (X3)

Can have any combination of numerical or
categorical predictor variables

58

Note that IQ predicts 40% of the variance in academic performance but adding
SAT scores as a second predictor increases the predicted portion by only 10%.

59

T YPES OF MULTIPLE REGRESSION

 Three types of multiple regression (MR):
 Simultaneous
 Enter all predictor variables at the same time
 “Standard” MR

 Hierarchical
 Enter predictor variables in predetermined sets

 Stepwise
 Computer program adds (or takes away) predictor variables one at

a time to optimize R2 (coefficient of multiple determination)
 Completely data driven

60

SIMULTANEOUS MULTIVARIATE REGRESSION
WITH TWO PREDICTORS

Y

Relationship
Satisfaction

X1
Intimacy

X2
Conflict

Y′ = a + b1 X1 + b2 X2

61

QUESTIONS OF INTEREST IN MR

What is the relationship between Y and each X?
 b (unstandardized slopes)
 t-tests for b’s

What is the relative importance of each X?
 β (standardized slopes)

62

SIMULTANEOUS MULTIVARIATE REGRESSION
WITH TWO PREDICTORS

Y

Relationship
Satisfaction

X1
Intimacy

X2
Conflict

b1

b2

b = unstandardized slope (regression coefficient)

β = standardized slope (regression coefficient)

β1

β2

MULTIPLE REGRESSION

 A statistical technique that assesses the effect of several
predictors (X) on a single criterion (outcome) measure (Y)

 Tells us the contribution of each variable above and beyond the
other variables in the equation

Unstandardized prediction equation:

Y′ = a + b1 X1 + b2 X2

Y′ = Predicted value of Y

a = Intercept
Value of Y when all the Xs = 0

bj = Partial slope for variable j
Change in Y for every one-unit change in Xj,
holding all other Xs constant

Xi = Value of X1 (or X2) for person i
63

64

EXAMPLES

Predicting exam scores (Y) from hours studying (X1)
and number of lectures attended (X2)

 Y′ = 25 + 3 (X1) + 1 (X2)

Predicting depression (Y) from stress (X1)
and social support (X2)

 Y′ = 5 + 2 (X1) + -4 (X2)
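The first equation, written as a function (a direct transcription of the slide's prediction equation):

```python
# Exam score (Y') = 25 + 3 * (hours studying) + 1 * (lectures attended)
def predict_exam(hours, lectures):
    return 25 + 3 * hours + 1 * lectures

print(predict_exam(hours=10, lectures=20))  # 25 + 30 + 20 = 75
```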

MULTIPLE REGRESSION:
CONTRIBUTION OF EACH PREDICTOR

 Do both of our predictor variables predict variability in Y?
What if only one of them is actually predictive? How can we
examine the relative contribution of each predictor variable?

 Examining the beta values works if we have standardized
our scores before finding the regression equation (i.e., we
are using the standardized form of the multiple regression
equation)
 Larger beta (β) = larger contribution

z′Y = 1.2zX1 + .65zX2

65

MULTIPLE REGRESSION:
CONTRIBUTION OF EACH PREDICTOR

 Can also test the significance of each contribution:
 Does adding the second predictor variable (X2) make our
predictions significantly more accurate?
 E.g., H0: b2 = 0

 To test this hypothesis, we follow 3 steps:
1. How much variance is predicted by using just the first predictor
variable?
2. What is the contribution made by the second variable?
3. Is this additional variance significant or not?

66

67

MORE ON MULTIPLE
REGRESSION

 Multiple regression
 A statistical technique that assesses the effect of several predictors (X) on a single criterion (outcome) measure (Y)
 AKA, multiple regression tells us about the effect of a variable on the outcome above and beyond the other variables in the model

 Can serve two goals:
 Rule out alternative explanations
 Give more predictive power

 This allows us to control for the effect of other variables statistically, even when we can't control them experimentally

 It does NOT
 Establish causation

68

WHAT CAN MULTIPLE REGRESSION TELL US?

MULTIPLE REGRESSION: EXAMPLE

Linear regression Multiple regression

What predicts behaviour problems in elementary kids?

69

THE THIRD VARIABLE PROBLEM

70

Multiple Regression Helps with the Third Variable Problem

71

Multiple Regression Helps with the Third Variable Problem

72

73

A friend looks at these data and says, "The only reason availability of recess predicts behaviour problems in the classroom is because there are so many boys in the class, and boys are obviously more active."

What do you say back?

74

MULTIVARIATE ANALYSIS

 Regression in the popular press
 Look for buzz words:
 Controlled for
 Taking into account
 Correcting for
 Adjusted for

Learning check!

75

SPECIAL TYPES OF MULTIPLE
REGRESSION

 Mediation analysis
 Assesses whether a third variable explains the relationship between X and Y
 Identifies possible causal mechanisms

 Moderation analysis
 Assesses whether a third variable changes the relationship between X and Y
 Identifies possible interactions among predictors

76

MEDIATION ANALYSIS

 Show that IV predicts DV

 Show that IV predicts mediator

 Include both the IV and the mediator as predictors of the DV

[Path diagram with standardized coefficients β11, β21, β31, β32]

*Results from Buying Time Promotes Happiness
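The three mediation steps can be sketched with simulated data (this is an illustration, not the Buying Time data; all values below are made up). The key pattern: when the mediator is added, the IV's direct effect (c′) shrinks relative to its total effect (c).

```python
import numpy as np

# Simulated mediation: IV -> mediator -> DV.
rng = np.random.default_rng(1)
n = 500
iv = rng.normal(size=n)
mediator = 0.7 * iv + rng.normal(scale=0.5, size=n)
dv = 0.6 * mediator + rng.normal(scale=0.5, size=n)

def slopes(predictors, y):
    """Least-squares slopes (intercept included, then dropped)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b[1:]

c = slopes([iv], dv)[0]                     # Step 1: IV predicts DV (total effect)
a = slopes([iv], mediator)[0]               # Step 2: IV predicts mediator
b, c_prime = slopes([mediator, iv], dv)     # Step 3: both predict DV
# If c' (direct effect) shrinks toward 0 relative to c, the mediator
# accounts for (part of) the IV -> DV relationship.
print(round(c, 2), round(c_prime, 2))
```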

77

MODERATION ANALYSIS

 When a relationship between two variables depends on a third variable
 Statistical interaction!

 In multiple regression, include the IV, the moderator, and an interaction term

 Example: Swearing moderates the relationship between catastrophising and cold-pressor latency

*Results from Swearing as a Response to Pain
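The "IV, moderator, and interaction term" recipe can be sketched with simulated data (again an illustration, not the swearing data; variable names and effect sizes below are assumptions). The interaction coefficient captures how much the IV's slope changes across levels of the moderator.

```python
import numpy as np

# Simulated moderation: the slope of x on y depends on the moderator m.
rng = np.random.default_rng(2)
n = 400
x = rng.normal(size=n)            # e.g., catastrophising
m = rng.integers(0, 2, size=n)    # e.g., swearing: 0 = no, 1 = yes
y = 2.0 * x - 1.5 * x * m + rng.normal(size=n)

# Include the IV, the moderator, AND their product (the interaction term):
X = np.column_stack([np.ones(n), x, m, x * m])
b0, b_x, b_m, b_xm = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(b_x, 1), round(b_xm, 1))  # slope of x when m = 0, and how it changes when m = 1
```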

78

https://www.blueprintincome.com/tools/life-expectancy-calculator-how-long-will-i-live/

MULTIPLE REGRESSION IN LIFE

79

http://time.com/8293/its-true-liberals-like-cats-more-than-conservatives-do/

MULTIPLE REGRESSION IN LIFE

Quiz score correlated
with actual political
preference at r = .68

http://time.com/510/can-time-predict-your-politics/

80

 Data analysis project – take a look at the instructions!

 MindTap + tutorial

 Review midterm with TAs

81

TO DO


TD0409-01 课件/Psy 202_4_IntrotoFactorial_W22.pdf

Instructor:
Dr. Molly Metz

PSY 202H1:
STATISTICS II

MODULE 4:
INTRO TO FACTORIAL ANOVA

1

1. Intro to Factorial ANOVA
1. Why factorial designs?
2. Structure of a Factorial ANOVA
3. A conceptual demonstration
4. What we can learn from Factorial ANOVA
5. Effects in graphs and text

2. Calculations – next module!

GAME PLAN

2

SOME QUICK CLARIFICATION

The Question: If my exact degrees of freedom isn't included on the table, which number should I use?
Answer: Choose the safest (most conservative) value
E.g., You're looking up a q value, and your dfwithin = 36  but the table jumps from 30 to 40
 Use 30 (which will indicate a larger q value  making it a more conservative test)
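The lookup rule above can be sketched as a tiny helper (an illustration; the table rows below are typical q-table values, not taken from any specific table): drop down to the largest tabled df at or below yours, since a smaller df gives a larger critical value and hence a more conservative test.

```python
# Choose the most conservative tabled df for a given actual df.
def conservative_df(df, tabled_dfs):
    """Largest tabled df that does not exceed the actual df
    (falls back to the smallest row if df is below the table)."""
    candidates = [d for d in tabled_dfs if d <= df]
    return max(candidates) if candidates else min(tabled_dfs)

table_rows = [20, 24, 30, 40, 60, 120]  # typical rows in a q table
print(conservative_df(36, table_rows))  # -> 30, per the example above
```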

3

SOME QUICK CLARIFICATION

The Question: How many decimal places should I round my final answer to? If I'm doing multiple calculations to get to my final answer, should I not round until the very end?
Answer: You should round your final answers to 2 decimal places (e.g., 4.87246  4.87). For the best accuracy, hold off on rounding until the last step. But do not spend time worrying about rounding – I am much more interested in whether you followed the correct steps, used the correct formulas, etc. So long as you've shown your work, there is no need to worry if your final answer is off by a couple of hundredths.

4

INTRODUCTION TO
FACTORIAL

DESIGN AND ANALYSIS

5

THE BIG PICTURE

Making comparisons to population (NO IVs):
 Single score  z score
 Sample mean, σ known  z test
 Sample mean, σ unknown  One sample t-test

Making comparisons between levels of IV(s) or groups:
 1 IV, 2 levels:
 Between subjects  Independent samples t-test
 Within subjects  Paired samples t-test
 1 IV, 3+ levels:
 Between subjects  One-Way Between ANOVA
 Within subjects  One-Way Repeated ANOVA
 More than 1 IV:
 All IVs between subjects  Between subj Factorial ANOVA
 All IVs within subjects  Repeated Measures Factorial ANOVA
 Mix of within and between  Mixed Model Factorial ANOVA

6

WHY FACTORIAL ANOVAS?

7

• So far, we have discussed designs where there is only one IV and only
one DV

• Complex designs include multiple IVs, multiple DVs, or both
• Multiple IVs  Factorial design & Assessing interactions

• More groups can offer more precision

• Include more experimental, control, or placebo conditions (add
levels of IV)

• Want to understand if your effect is moderated/affected by another
variable (add IVs)

WHEN WOULD YOU WANT TO STUDY
MORE THAN TWO GROUPS?

8

 REQUIREMENTS
Must have 2 or more IVs
Must have 2 or more levels of each IV
Must have quantitative DV

But why a Factorial ANOVA when t-tests and one-ways are just so great!?

FACTORIAL ANOVA

9

 Example

 IV (exercise): mild vs. intense
 IV (age group): young adult vs. elderly
DV: overall fitness

WHEN WOULD YOU WANT TO STUDY
MORE THAN TWO GROUPS?

Why not just run two t-tests?
• Exercise on fitness
• Age on fitness

10

Age on fitness test performance – t-test

[Bar graph: fitness test performance (0–7) for young vs. elderly]

11

Exercise on fitness test performance – t-test

[Bar graph: fitness test performance (0–7) for mild vs. intense exercise]

HUH??!!??
Exercise is
good right!?

12

AGE × EXERCISE ON FITNESS INTERACTION!

[Line graph: fitness test performance (0–7) for young vs. elderly, under mild vs. intense exercise]

13

MEASURING MORE THAN ONE OUTCOME

1. Manipulation checks (measuring the IV)
To ensure our manipulation worked

2. Multiple measures of the same variable or construct (same DV)
To assess convergent validity
To create composite scores

3. Measures of several different variables or constructs (multiple DVs)
To assess divergent (discriminant) validity
To assess possible confounds that can't be experimentally controlled

4. Multiple IVs
To assess interactions

14

FACTORIAL DESIGNS: ADVANTAGES

1. Allow for testing of multiple hypotheses within a
single study

1. Methodologically efficient
2. Statistically “cheaper”

2. Allows for more complex hypotheses and research
questions

3. Better understand the nuances of an ef fect
1. Interactions
2. Moderation

15

 When describing factorial ANOVAs statistically and
conceptually I’ll focus on 2 x 2 factorial designs
 As we start to calculate things, you’ll understand why!

 Thus the specifics in the remainder of this lecture
apply only to 2 x 2 Factorial ANOVAs, rather than
more complex designs.

DISCLAIMER

16

FACTORIAL DESIGN

17

MANIPULATING MULTIPLE FACTORS

 Allows us to answer questions about whether the effect of one independent variable depends on the level of another

 Factorial design: Each level of one IV is combined with each level of the others to produce all possible combinations of levels
 Non-manipulated IVs ok

18

WINE-RATING EXAMPLE

 What determines how highly a wine is rated?

Cf. Plassmann, O’Doherty, Shiv, & Rangel (2008; PNAS)

Quality? Price?

19

FACTORIAL DESIGN

 A research design investigating the effect of two or more independent variables (factors) on the dependent variable

                 Cheap Price                  Expensive Price
Low Quality      Low Quality, Cheap Price     Low Quality, Expensive Price
High Quality     High Quality, Cheap Price    High Quality, Expensive Price

Factors: Quality (levels: Low, High) and Price (levels: Cheap, Expensive)

2 levels x 2 levels = 4 conditions

20

FACTORIAL DESIGN TABLE

 The rows represent the levels of one independent variable, the columns represent the levels of a second independent variable, and each cell represents a condition.

2×2 DESIGN:
                     FACTOR B
                     Level 1        Level 2
FACTOR A   Level 1   Condition 1    Condition 2
           Level 2   Condition 3    Condition 4

21

BETWEEN- VS. WITHIN-SUBJECT
FACTORIAL DESIGN

 Between-subjects factorial design
 ALL of the factors are manipulated between subjects
 Each subject participates in just ONE condition

 Within-subjects factorial design
 ALL of the factors are manipulated within subjects
 Each subject participates in ALL conditions

 Mixed design factorial
 SOME of the factors are manipulated between subjects, SOME within subjects
 Each subject participates in MORE THAN ONE, but NOT ALL conditions

22

 Research on video games and aggression has been mixed. Studies often compare how violent and non-violent video games affect aggressive behavior, but you wonder if perhaps opponent type – whether the game is played against another person or the computer – also might matter.

 IV1: Game type – violent or non-violent
 IV2: Opponent type – real or computer
 DV: Aggressive behavior

AN EXAMPLE

23

 Between Subjects:
Each participant participates in one level of each IV

(i.e., in one of the four cells of the design).
All four cells of the design have different

participants.

TYPES OF FACTORIAL DESIGNS: BETWEEN

Violent, level 1 Non-violent, level 2

Against person, level 1 Participants #: 1-10 Participants #: 11-20

Against computer, level 2 Participants #: 21-30 Participants #: 31-40

24

 Repeated Measures:
Each participant participates in both levels of both

IVs (i.e., in all four cells of the design).
All four cells of the design have the same

participants.

TYPES OF FACTORIAL DESIGNS: WITHIN

Violent, level 1 Non-violent, level 2

Against person, level 1 Participants #: 1-40 Participants #: 1-40

Against computer, level 2 Participants #: 1-40 Participants #: 1-40

25

 Mixed Model:
Each participant participates in one level of one IV

and in both levels of the other IV (i.e., in two cells of
the design).
Two cells of the design have the same participants,

the other two have another set of participants.

TYPES OF FACTORIAL DESIGNS: MIXED

Violent, level 1 Non-violent, level 2

Against person, level 1 Participants #: 1-20 Participants #: 21-40

Against computer, level 2 Participants #: 1-20 Participants #: 21-40

26

 Structure
 Factors – new term for independent variable
 Levels – number of variations or categories in IV

 Notation
 A x B x C -> 2 x 2 x 3
 Where number of terms represents number of factors
 And the value of each term represents number of levels in that factor
 So the product of each term represents the total number of conditions

 Example: “We utilized a 2 (Game type: violent, non-violent) x 2
(Opponent type: person, computer) between-subjects factorial
design”
 Or, a 2×2

DESCRIBING A FACTORIAL DESIGN

27

Learning check!

A CONCEPTUAL
DEMONSTRATION

28

KINDS OF STATISTICAL EFFECTS

 Main effect
 On average, levels of Factor A differ from each other
 On average, levels of Factor B differ from each other

 Simple effect
 At a specific level of Factor A, levels of Factor B differ from each other
 At a specific level of Factor B, levels of Factor A differ from each other

 Interaction
 The effect of Factor A on the DV depends on the level of Factor B
 The difference between levels of Factor A is different for different levels of Factor B

29

We’ll come
back to this one
in a bit

2X2 EXAMPLE STUDY

• Wells & Petty (1980)

• Previous work shows that we sometimes infer
our attitudes and feelings by looking at our
behavior

• Suggested that we use our physical behavior
as an attitude cue

30

Asked participants to "help evaluate headphones"

Nod head up and down
Shake head left to right

Listened to persuasive argument

Advocate tuition increase
Advocate tuition decrease

Rated opinion on tuition change

                  Head nod   Head shake
Tuition increase      1           2
Tuition decrease      3           4

(Factors: head movement and speech topic; cells numbered 1–4)
31

POSSIBLE EFFECTS

When testing for effects in factorial designs,
several possible patterns of results:
No effects
One main effect (speech topic)
One main effect (head movement)
Two main effects
 Interaction

32

POSSIBLE EFFECTS

When testing for effects in factorial designs,
several possible patterns of results:
No effects

[Graph: Opinion on Tuition Change by head movement (nod, shake) and speech topic (increase, decrease)]

33

POSSIBLE EFFECTS

When testing for effects in factorial designs,
several possible patterns of results:
No effects
One main effect (speech topic)

[Graph: Opinion on Tuition Change by head movement (nod, shake) and speech topic (increase, decrease)]

34

POSSIBLE EFFECTS

When testing for effects in factorial designs,
several possible patterns of results:
No effects
One main effect (speech topic)
One main effect (head movement)

[Graph: Opinion on Tuition Change by head movement (nod, shake) and speech topic (increase, decrease)]

35

POSSIBLE EFFECTS

When testing for effects in factorial designs,
several possible patterns of results:
No effects
One main effect (speech topic)
One main effect (head movement)
Two main effects

[Graph: Opinion on Tuition Change by head movement (nod, shake) and speech topic (increase, decrease)]

36

[Two graphs: Opinion on Tuition Change by head movement and speech topic – one showing a spreading interaction, one a crossover interaction]

INTERACTIONS
• Effects of one IV on DV depend on presence of second IV
• Two types

• Spreading: effect exists at one level of the IV and is
weaker or nonexistent at a different level

• Crossover: no main effects of either IV because effects
are opposite at different levels of the other IV
37

CONCEPTUALLY,
WHAT CAN WE LEARN

FROM
FACTORIAL DESIGNS?

38

KINDS OF STATISTICAL EFFECTS

 Main effect
 On average, levels of Factor A differ from each other
 On average, levels of Factor B differ from each other

 Simple effect
 At a specific level of Factor A, levels of Factor B differ from each other
 At a specific level of Factor B, levels of Factor A differ from each other

 Interaction
 The effect of Factor A on the DV depends on the level of Factor B
 The difference between levels of Factor A is different for different levels of Factor B

39

WINE-RATING EXAMPLE

How do quality and price affect wine ratings?

                 Cheap Price                  Expensive Price
Low Quality      Low Quality, Cheap Price     Low Quality, Expensive Price
High Quality     High Quality, Cheap Price    High Quality, Expensive Price

Factors: Quality (levels: Low, High) and Price (levels: Cheap, Expensive)

2 levels x 2 levels = 4 conditions

40

                Cheap Price   Expensive Price   Marginal Means
Low Quality          35              87               61
High Quality         51              87               69
Marginal Means       43              87

MAIN EFFECTS

The effect of one factor on average across all levels of
the other factor(s); difference between marginal means

Main Effect of Price: 87 – 43 = 44
Main Effect of Quality: 69 – 61 = 8

Main effect of price: On average, expensive wines (M=87)
were rated 44 points higher than cheap wines (M=43)

Main effect of quality: On average, high-quality wines (M=69)
were rated 8 points higher than low-quality wines (M=61)
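The marginal means and main effects above can be checked with a few lines of arithmetic; the cell means are exactly those from the wine example.

```python
import numpy as np

# Cell means from the wine example (rows = quality, cols = price).
cells = np.array([[35, 87],   # low quality:  cheap, expensive
                  [51, 87]])  # high quality: cheap, expensive

quality_means = cells.mean(axis=1)  # marginal means for quality: [61, 69]
price_means = cells.mean(axis=0)    # marginal means for price:   [43, 87]

main_effect_quality = quality_means[1] - quality_means[0]  # 69 - 61 = 8
main_effect_price = price_means[1] - price_means[0]        # 87 - 43 = 44
print(main_effect_quality, main_effect_price)
```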

41

Cheap Price Expensive Price

Low Quality 35 87

High Quality 51 87

SIMPLE EFFECTS

Simple Effect of Price on Low Quality Wines: 87 – 35 = 52
Simple Effect of Price on High Quality Wines: 87 – 51 = 36
Simple Effect of Quality on Cheap Wines: 51 – 35 = 16
Simple Effect of Quality on Expensive Wines: 87 – 87 = 0

When the wine is cheap, high-quality wines (M=51) are rated 16 points higher than low-quality wines (M=35)

When the wine is expensive, high-quality wines (M=87) are rated the same as low-quality wines (M=87)

When the wine is high-quality, expensive wines (M=87) are rated 36 points higher than cheap wines (M=51)

When the wine is low-quality, expensive wines (M=87) are rated 52 points higher than cheap wines (M=35)

42

Cheap Price Expensive Price

Low Quality 35 87

High Quality 51 87

INTERACTIONS

Simple Effect: Price on Low Quality Wines: 87 – 35 = 52
Simple Effect: Price on High Quality Wines: 87 – 51 = 36
Simple Effect: Quality on Cheap Wines: 51 – 35 = 16
Simple Effect: Quality on Expensive Wines: 87 – 87 = 0

The effect of one factor depends on the levels of the
other factor(s); difference between simple effects

Interaction: The effect of quality is different for cheap wines vs. expensive wines (the effect of quality depends on price)

Interaction: The effect of price is different for high-quality wines vs. low-quality wines (the effect of price depends on quality)
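The "difference between simple effects" definition can be verified numerically with the same wine cell means: compute each simple effect, then subtract.

```python
import numpy as np

# Cell means from the wine example (rows = quality, cols = price).
cells = np.array([[35, 87],   # low quality:  cheap, expensive
                  [51, 87]])  # high quality: cheap, expensive

# Simple effects of quality at each price (high minus low quality):
quality_at_cheap = cells[1, 0] - cells[0, 0]      # 51 - 35 = 16
quality_at_expensive = cells[1, 1] - cells[0, 1]  # 87 - 87 = 0

# Interaction = difference between the simple effects:
interaction = quality_at_cheap - quality_at_expensive
print(interaction)  # nonzero -> the effect of quality depends on price
```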

43

HOW TO DESCRIBE INTERACTIONS

 Must describe at least two simple effects:

 Example 1: When the price is cheap, high-quality wines (M=51) are rated 16 points higher than low-quality wines (M=35), but when the price is expensive, high- and low-quality wines are rated equally high (M=87 for both).

 Example 2: When the quality is low, cheap wines (M=35) are rated 52 points lower than expensive wines (M=87), but when the quality is high, cheap wines (M=51) are rated only 36 points lower than expensive wines (M=87).

Cheap Price Expensive Price

Low Quality 35 87

High Quality 51 87

Learning check!
44

STATISTICAL EFFECTS IN
GRAPHS

45

MAIN EFFECTS

[Line graph: wine ratings by Quality (Low, High), one line per price – comparing averages of end-points]

Main effect of Quality: On average, high-quality wines were rated 8 points higher than low-quality wines

Main effect of price: On average, expensive wines were rated 44 points higher than cheap wines

46

SIMPLE EFFECTS

[Line graph: wine ratings by Quality (Low, High), one line per price – comparing any 2 end-points]

Simple Effect of Quality on Cheap Wines: 51 – 35 = 16
Simple Effect of Quality on Expensive Wines: 87 – 87 = 0

When the wine is cheap, high-quality wines are rated 16 points higher than low-quality wines

When the wine is expensive, high-quality wines are rated the same as low-quality wines

47

SIMPLE EFFECTS

[Line graph: wine ratings by Quality (Low, High), one line per price – comparing any 2 end-points]

Simple Effect of Price on High Quality Wines: 87 – 51 = 36
Simple Effect of Price on Low Quality Wines: 87 – 35 = 52

When the wine is high-quality, expensive wines are rated 36 points higher than cheap wines

When the wine is low-quality, expensive wines are rated 52 points higher than cheap wines

48

INTERACTIONS

[Line graph: wine ratings by Quality (Low, High), one line per price – are the differences different?]

When the wine is cheap, high-quality wines are rated 16 points higher than low-quality wines

When the wine is expensive, high-quality wines are rated the same as low-quality wines

Interaction: The effect of quality is different
for cheap wines vs. expensive wines

49

INTERACTIONS

[Line graph: wine ratings by Quality (Low, High), one line per price – are the differences different?]

When the wine is high-quality, cheap wines are rated 36 points lower than expensive wines

When the wine is low-quality, cheap wines are rated 52 points lower than expensive wines

Interaction: The effect of price is different
for high-quality wines vs. low-quality wines

50

INTERACTIONS

[Two line graphs of ratings by Quality (Low, High): one showing an interaction (non-parallel lines), one showing no interaction (parallel lines)]

Are the differences different?

Learning check!

51

LINE VS. BAR GRAPHS

[Line graph and bar graph of the same data: Rating (0–100) by Price (Cheap, Expensive)]

52

LINE VS. BAR GRAPHS

[Line graph and bar graph of the same data: Rating (0–100) by Price (Cheap, Expensive)]

53

MAIN EFFECTS IN BAR GRAPHS

[Bar graph: Rating (0–100) by Price (Cheap, Expensive), bars for Low Quality and High Quality]

Main effect of Quality: On average, high-quality wines were rated 8 points higher than low-quality wines

Main effect of price: On average, expensive wines were rated 44 points higher than cheap wines
54

INTERACTIONS IN BAR GRAPHS

[Bar graph: Rating (0–100) by Price (Cheap, Expensive), bars for Low Quality and High Quality]

Interaction: The effect of quality is different for cheap wines vs. expensive wines

Interaction: The effect of price is different for high-quality wines vs. low-quality wines

55

GRAPHING THE SAME DATA SEVERAL WAYS

[The same wine data graphed four ways: line and bar graphs with Price on the x-axis (grouped by Quality), and line and bar graphs with Quality on the x-axis (grouped by Price)]

56

HIGHER ORDER INTERACTIONS

This figure from “Retrieval Practice
Protects Against Stress” shows a
2x2x2 design
1. Test 1 (immediate) vs. Test 2

(delayed)
2. Study practice (SP) vs. Retrieval

practice (RP)
3. Stressed (white) vs. Non-

stressed (grey)

A 3-way interaction is when the
effect of one factor depends on 2
other factors:
• There is an interaction between

Study Method and Stress
Induction for Test 2 but not for
Test 1

• The effect of stress depends on
how you studied, but also on
when the test happened

57

INTERPRETING TEXT

58

INTERPRETING TEXT:
RECIPROCIT Y & CONFORMIT Y, STUDY 1

Main effect of
group behavior

Main effect of
partner behavior

Interaction between
group behavior &
partner behavior

59

INTERPRETING TEXT:
RECIPROCIT Y & CONFORMIT Y, STUDY 1

60

INTERPRETING TEXT
RECIPROCIT Y & CONFORMIT Y, STUDY 2

Interaction between
reciprocity/conformity &

partner knowledge

Simple effect of
reciprocity/conformity

when partner behavior is
known

Simple effect of
reciprocity/conformity

when partner behavior is
unknown

61

INTERPRETING TEXT
RECIPROCIT Y & CONFORMIT Y, STUDY 2

Learning check!

62

 Reading + MindTap

 First content-loaded tutorial (+ first assignment)

 Midterm exam will be in the week before reading week – this is a good time to start making a study plan, spread out over time, so you don't need to cram!
 Midterm info is already posted! See Assignments page of syllabus.

63

TO-DO


TD0409-01 课件/Psy 202_3_RepeatedANOVA_W22.pdf

Instructor:
Dr. Molly Metz

PSY 202H1:
STATISTICS II

MODULE 3:
REPEATED MEASURES ANOVA

1

1. Repeated Measures ANOVA
1. Sample Problem
2. Effect Size
3. Posthoc Tests

GAME PLAN

2

 Spread out your studying
 All else being equal, studying twice for 1.5 hours each is better than studying once for 3 hrs. Studying three times for 1 hr each is even better.

 The best studying matches the type of assessment
 The test will have computations? Do practice problems (MindTap, lectures, book problems, as well as a ton of other stuff online)
 The test will require you to explain things? Practice explaining things! To a study buddy, your notebook, or a lamp, it doesn't matter who is listening. What matters is the explaining (in your own words).
 The test will be closed book? Well, your studying better include you closing your book or notes and practicing recalling information!!

 In fact, this is the best advice I can offer: whatever else you do, your studying must include testing yourself. Close your notes and see what pops out. Then, use that to guide your studying.

 For more tips, visit The Learning Scientists
 https://www.learningscientists.org/downloadable-materials/

MY BEST STUDY ADVICE
(FOR ALL CLASSES, NOT JUST THIS ONE)

3

REPEATED MEASURES
(OR WITHIN-SUBJECT)

ANOVA

4

THE LOGIC OF ONE-WAY
REPEATED MEASURES ANOVA

F = variance between sample means
    variance expected by chance (error/natural variability)

WTF??!!??
This looks identical
to the one way
between ANOVA

5

 Independent-measures ANOVA uses multiple participant samples to test the treatments.
 If groups are different, what was responsible?
 Treatment differences?
 Participant group differences?

 Repeated-measures solves this problem by testing all treatments using one sample of participants.
 In an experiment, compare two or more manipulated treatment conditions using the same participants in all conditions
 In a nonexperimental study, compare a group of participants at two or more different times
 Before therapy; after therapy; 6-month follow-up
 Compare vocabulary at age 3, 4 and 5

REPEATED MEASURES ANOVA:
WITHIN-SUBJECTS DESIGN WITH MORE THAN 2 GROUPS

6

EXAMPLE!

https://www.scientificamerican.com/article/is-double-dipping-a-food-safety-problem-or-just-a-nasty-habit/ 7

F = variance between groups
variance within groups

 Two sources of variance:

 Between group variance — how big are differences
between groups
 Within group variance — how much error/natural

variability

THE LOGIC OF ONE-WAY
REPEATED ANOVA

8

BET WEEN GROUP VARIANCE IN
REPEATED MEASURES

Why do people in different groups differ?

1. Treatment effect = differences caused by our
experimental treatment

Systematic differences

2. Chance = differences due to random factors
including…
 Individual differences
Experimental error (noise)

Non-systematic, random differences

In a within-subjects
design, Ps are their
own controls, so
individual
differences can't
play a role
9

WITHIN GROUP VARIANCE IN
REPEATED MEASURES

Why do people within the same group differ?

1. Chance = differences due to random factors
including…
 Individual differences
Experimental error (noise)

Non-systematic, random differences

In a within-subjects
design, Ps are their
own controls, so
individual
differences can't
play a role

10

 Repeated-measures design allows control of the effects of participant characteristics
 Eliminated from the numerator by the research design
 Must be removed from the denominator statistically

 The biggest change between independent-measures ANOVA and repeated-measures ANOVA is the addition of a process to mathematically remove the individual differences variance component from the denominator of the F-ratio.

HOW DO WE DEAL WITH INDIVIDUAL
DIFFERENCES?

11

F = Between-group variance (Treatment + Chance, with individual differences removed)
     Within-group variance (Chance, with individual differences removed)

 If Null is True:

F = 0 + (Chance, individual differences removed)
      (Chance, individual differences removed)          ≈ 1

 If Null is False:

F = Treatment Effect + Chance
      (Chance, individual differences removed)          > 1

12

THE REPEATED
MEASURES F-RATIO

F = Between-group (Treatment, or just Experimental Error)
     Within-group (Experimental Error)

 If Null is True:

F = 0 + Experimental Error
      Experimental Error          ≈ 1

 If Null is False:

F = Treatment Effect + Chance
      Experimental Error          > 1

13

THE REPEATED
MEASURES F-RATIO

 F is the ratio between two variance estimates

 Denominator is called the "error term"  for repeated measures it contains experimental error only, because individual-difference variability has been removed

14

T WO STAGES OF THE REPEATED-
MEASURES ANOVA

First stage
 Identical to independent samples ANOVA
 Compute SStotal, SSbetween treatments and SSwithin treatments

Second stage
 Done to remove the individual differences from the denominator
 Compute SSbetween subjects and subtract it from SSwithin treatments to find SSerror (also called residual)

15

STRUCTURE OF THE REPEATED-MEASURES
ANOVA

If Within-Group variance can be partitioned into individual differences and error, then the sum of the between subjects and error values (i.e., SS, df) will always equal Within!

16
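The additivity shown on this slide can be verified numerically. Below is a minimal Python sketch (the data set is hypothetical, invented for illustration): SSerror is computed directly from the residuals, and adding it to SSbetween subjects recovers SSwithin exactly.

```python
# Minimal sketch of the repeated-measures partition, using hypothetical data
# (rows = participants, columns = treatment conditions).
data = [
    [3, 5, 6],
    [2, 4, 7],
    [4, 4, 5],
    [1, 3, 6],
]
n = len(data)          # number of participants
k = len(data[0])       # number of conditions
N = n * k
G = sum(x for row in data for x in row)          # grand total
grand_mean = G / N
col_means = [sum(row[j] for row in data) / n for j in range(k)]
row_means = [sum(row) / k for row in data]

# SS_within: deviations of scores from their own condition mean
ss_within = sum((row[j] - col_means[j]) ** 2 for row in data for j in range(k))

# SS_between_subjects: how much the participants (person means) differ
ss_between_subjects = k * sum((m - grand_mean) ** 2 for m in row_means)

# SS_error computed directly from the residuals
ss_error = sum(
    (row[j] - col_means[j] - row_means[i] + grand_mean) ** 2
    for i, row in enumerate(data)
    for j in range(k)
)

# The partition: SS_within = SS_between_subjects + SS_error
print(abs(ss_within - (ss_between_subjects + ss_error)) < 1e-9)  # True
```

The check prints True for any complete data set, because the cross-products in the decomposition sum to zero.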

REPEATED MEASURES DESIGNS:
PROS & CONS

• Repeated Pros:
 Participants serve as their own "controls" (reduced error, more power)
 Need fewer participants for the same research question (compared to between-subjects design)

• Repeated Cons:
 Order effects, practice effects
 May guess hypothesis / aware of what is being manipulated
 Longer studies
 Limits possibilities for experimental manipulations

17

EFFECT SIZE FOR THE
REPEATED-MEASURES ANOVA

Percentage of variance explained by the treatment differences
Partial η2 is the percentage of variability that has not already been explained by other factors

η2 = SSbetween treatments / (SStotal − SSbetween subjects)

or

η2 = SSbetween treatments / (SSbetween treatments + SSerror)

18

REPEATED-MEASURES ANOVA
POST HOC TESTS (POSTTESTS)

Significant F indicates that H0 ("all population means are equal") is wrong in some way.

Use post hoc tests to determine exactly where significant differences exist among more than two treatment means
 Tukey's HSD can be used
 Substitute SSerror and dferror in the formulas

19

REPEATED-MEASURES ANOVA
ASSUMPTIONS

 The observations within each treatment condition must be independent.

 The population distribution within each treatment
must be normal.

 The variances of the population distribution for each
treatment should be equivalent.

20

Learning check!

PRACTICE WITH
REPEATED MEASURES

ANOVA

21

STRUCTURE OF THE REPEATED-MEASURES
ANOVA

22

 Research Question
 A researcher is trying to determine the best way for individuals to recall a list of words. Eight participants each received three lists of words and tried to remember them using three different ways of memorizing (rote rehearsal, an imagery mnemonic technique, or a story mnemonic technique). After each study period, participants did a ten-minute distractor task then took a test on the word list. Was there a difference in recall based on the type of memory technique that participants used?

 IV: Memory technique
 3 levels: rote vs imagery vs story

 DV: Number of words recalled

REPEATED MEASURES ANOVA – LET'S PRACTICE!

23

Participant   Rote   Imagery   Story
A             2      4         5
B             3      2         3
C             3      5         6
D             3      7         6
E             2      5         8
F             5      4         7
G             6      8         10
H             4      5         9

DATA FROM MEMORY STUDY

M1 = 3.5   M2 = 5   M3 = 6.75

Are these 3 means significantly different from each other?

24

HYPOTHESIS TESTING WITH RM ANOVA

 Research question
 Does memory technique affect word recall?

 Step 1: Statistical Hypotheses
 H0: µ1 = µ2 = µ3
 H1: At least one mean is different from another

 Step 2: Decision Rule
 Look up critical value of F in Table

 Step 3: Compute observed F-ratio
 Track values in ANOVA Summary Table

 Step 4: Make a Decision (Reject or retain H0)

 **Step 5: If H0 rejected, conduct post-hoc comparisons

 Step 6: Compute Effect Size, Interpret and Report Findings
25

COMPUTING ANOVA

The ANOVA Summar y Table

Source             SS    df    MS    F
Between group      SSB   dfB   MSB   MSB/MSE
Within group       SSW   dfW
Between subjects   SSP   dfP
Error              SSE   dfE   MSE
Total              SST   dfT

26

FINDING THE CRITICAL VALUE

Find Fcritical in Table

 Need to know 3 things

 α level
 dfnumerator = dfbetween
 dfdenominator = dferror

 If α = .05 and df = 2, 14, Fcritical = 3.74

27
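If SciPy is available, the same critical values can be pulled from the F distribution directly instead of a printed table (a sketch, assuming `scipy` is installed):

```python
# Critical values of F for df = (2, 14), via the F distribution's
# percent-point function (inverse CDF).
from scipy import stats

f_crit_05 = stats.f.ppf(0.95, 2, 14)   # 95th percentile, for alpha = .05
f_crit_01 = stats.f.ppf(0.99, 2, 14)   # 99th percentile, for alpha = .01

print(round(f_crit_05, 2))  # 3.74, matching the table
print(round(f_crit_01, 2))  # 6.51
```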

CRITICAL VALUES OF F FOR DF=2,14

[Figure: F distribution for df = 2, 14, with the critical region (reject H0) beyond the critical values 3.74 (α = .05) and 6.51 (α = .01)]

28

COMPUTING ANOVA

 STEP 1: Compute Sums of Squares (SS)

SSTotal = ΣX2 − G2/N

SSBetween = ΣT2/n − G2/N

SSWithin = Σ(SS for each group) or SSTotal − SSBetween

Where:

• X = each value of X

• T = treatment group total (ΣX)

• G = grand total (ΣT)

• n = sample size of each group

• N = total sample size (Σn)

29

Participant   Rote   Imagery   Story
A             2      4         5
B             3      2         3
C             3      5         6
D             3      7         6
E             2      5         8
F             5      4         7
G             6      8         10
H             4      5         9

DATA FROM MEMORY STUDY

M1 = 3.5   M2 = 5   M3 = 6.75
n:            8      8         8            N = 24
Totals:       T1 = 28   T2 = 40   T3 = 54   G = 122

N = 24
n = 8
K = 3

30
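Stage 1 of the computation can be sketched in a few lines of Python using the data table above; the printed values match the SS entries used in the rest of the example.

```python
# Stage 1 sums of squares for the memory-study data.
rote    = [2, 3, 3, 3, 2, 5, 6, 4]
imagery = [4, 2, 5, 7, 5, 4, 8, 5]
story   = [5, 3, 6, 6, 8, 7, 10, 9]
groups = [rote, imagery, story]

n = 8                                   # participants per condition
N = sum(len(g) for g in groups)         # total sample size = 24
G = sum(sum(g) for g in groups)         # grand total = 122
sum_x_sq = sum(x * x for g in groups for x in g)

ss_total = sum_x_sq - G**2 / N
ss_between = sum(sum(g)**2 for g in groups) / n - G**2 / N
ss_within = ss_total - ss_between

print(round(ss_total, 2), round(ss_between, 2), round(ss_within, 2))
# 115.83 42.33 73.5
```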

COMPUTING ANOVA

 STEP 2: Compute Degrees of Freedom (df)

dfBetween = k – 1

Where:

• n = sample size of each group

• N = total sample size (Σn)

• k = number of groups

dfWithin = N – k or Σ(n-1)

dfTotal = N – 1

31

COMPUTING ANOVA

 STEP 3 (NEW): Compute Between Subject Values

Where:

• n = sample size of each group

• N = total sample size (Σn)

• G = grand total (ΣT)

• P = person totals (Σx for each
participant)

• k = number of groups

SSbetweensubjects = ΣP2/k − G2/N

32

Participant   Rote   Imagery   Story   P
A             2      4         5       2 + 4 + 5 = 11
B             3      2         3       3 + 2 + 3 = 8
C             3      5         6       14
D             3      7         6       16
E             2      5         8       15
F             5      4         7       16
G             6      8         10      24
H             4      5         9       18

DATA FROM MEMORY STUDY

M1 = 3.5   M2 = 5   M3 = 6.75
33

COMPUTING ANOVA

 STEP 3 (NEW): Compute Between Subject Values

Where:

• n = sample size of each group

• N = total sample size (Σn)

• G = grand total (ΣT)

• P = person totals (Σx for each
participant)

• k = number of groups

SSbetweensubjects = ΣP2/k − G2/N

SSerror = SSWithin – SSbetweensubjects

dfbetweensubjects = n – 1

dferror = dfwithin – dfbetweensubjects OR (N-k)-(n-1)

34

COMPUTING ANOVA

 STEP 4 (UPDATE): Compute Mean Squares (MS)

MSBetween = SSbetween / dfbetween

MSerror = SSerror / dferror

35

COMPUTING ANOVA

 STEP 4 (UPDATE): Compute the F -Ratio

F-Ratio = MSbetween / MSerror

36

THE ANOVA SUMMARY TABLE

Source             SS    df    MS    F
Between group      SSB   dfB   MSB   MSB/MSE
Within group       SSW   dfW
Between subjects   SSP   dfP
Error              SSE   dfE   MSE
Total              SST   dfT

37

THE ANOVA SUMMARY TABLE

Source             SS       df    MS      F
Between group      42.33    2     21.17   14.11
Within group       73.5     21
Between subjects   52.5     7
Error              21.0     14    1.5
Total              115.83   23

38
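The full two-stage computation can be reproduced from the raw scores in a short sketch; it also computes the effect size used on the upcoming slides.

```python
# Two-stage repeated-measures ANOVA for the memory-study data
# (rows = participants A-H; columns = rote, imagery, story).
data = [
    [2, 4, 5], [3, 2, 3], [3, 5, 6], [3, 7, 6],
    [2, 5, 8], [5, 4, 7], [6, 8, 10], [4, 5, 9],
]
n, k = len(data), len(data[0])         # 8 participants, 3 conditions
N = n * k
G = sum(x for row in data for x in row)

T = [sum(row[j] for row in data) for j in range(k)]   # condition totals
P = [sum(row) for row in data]                        # person totals

# Stage 1: identical to the independent-measures ANOVA
ss_total = sum(x * x for row in data for x in row) - G**2 / N
ss_between = sum(t**2 for t in T) / n - G**2 / N
ss_within = ss_total - ss_between

# Stage 2: remove individual differences from the denominator
ss_between_subjects = sum(p**2 for p in P) / k - G**2 / N
ss_error = ss_within - ss_between_subjects

df_between, df_error = k - 1, (N - k) - (n - 1)
ms_between = ss_between / df_between
ms_error = ss_error / df_error
F = ms_between / ms_error
eta_sq = ss_between / (ss_total - ss_between_subjects)   # partial eta squared

print(round(ss_between_subjects, 2), round(ss_error, 2))  # 52.5 21.0
print(round(F, 2), round(eta_sq, 2))                      # 14.11 0.67
```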

EFFECT SIZE FOR REPEATED MEASURES

For independent measures ANOVA:

η2 = SSbetween treatments / SStotal

For repeated measures ANOVA:

η2 = SSbetween treatments / (SStotal − SSbetween subjects)

39

EFFECT SIZE FOR REPEATED MEASURES

η2 = SSbetween treatments / (SStotal − SSbetween subjects)   For repeated measures ANOVA

η2 = 42.33 / (115.83 − 52.5) = .67

40

TUKEY HSD TEST

 Step 1: Find the value of “q”
 Need to know 3 things:
 α
 dfE
 k

 Step 2: Compute HSD

HSD = q √(MSerror / n)

Where n = group sample size, assuming equal n in each group

41

TUKEY HSD TEST

 Step 1: Find the value of “q”
 α = .05 dfE = 14 k = 3
 From Table B.5: q = 3.70

 Step 2: Compute HSD

HSD = q √(MSerror / n) = 3.70 √(1.5 / 8) = ± 1.60 words

So, a pair of means must differ by at least 1.60 in order to be significantly different
42

TUKEY HSD TEST

 Step 3: Compute dif ference between each
pair of means and compare to HSD

 M1 – M2 = 3.5 – 5 = 1.5   Does NOT exceed 1.60

 M1 – M3 = 3.5 – 6.75 = 3.25   Exceeds 1.60

 M2 – M3 = 5 – 6.75 = 1.75   Exceeds 1.60

43
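The HSD computation and all three pairwise comparisons fit in a short sketch (q = 3.70 from Table B.5; MSerror = 1.5 and the condition means come from the worked example):

```python
import math

# Tukey HSD for the memory study: q for alpha = .05, df_error = 14, k = 3.
q, ms_error, n = 3.70, 1.5, 8
hsd = q * math.sqrt(ms_error / n)
print(round(hsd, 2))  # 1.6

means = {"rote": 3.5, "imagery": 5.0, "story": 6.75}
pairs = [("rote", "imagery"), ("rote", "story"), ("imagery", "story")]
for a, b in pairs:
    diff = abs(means[a] - means[b])
    verdict = "significant" if diff > hsd else "not significant"
    print(f"{a} vs {b}: |diff| = {diff:.2f} -> {verdict}")
```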

TUKEY HSD TEST

 What do we conclude?

 M1 does not differ from M2
 There was no difference in word recall when participants used the rote rehearsal or imagery techniques

 M2 differs from M3
 People remembered significantly more words when using the story technique than the imagery technique

 M1 differs from M3
 People remembered significantly more words when using the story technique than the rote technique.

44

There was a significant effect of memory technique on word recall, F(2, 14) = 14.11, p < .05, η2 = .67. Tukey post-hoc comparisons indicated that participants remembered significantly more words when studying with the story technique (M = 6.75, SD = 2.3) than when they studied with rote rehearsal (M = 3.5, SD = 1.4) or with the imagery mnemonic (M = 5, SD = 1.9), ps < .05. Rote rehearsal and the imagery mnemonic did not lead to different results from each other.

FORMAL REPORT

SD for each group

SD = √(SS / (n − 1))

45

REPORTING A REPEATED MEASURES
F-STATISTIC

 A closer look…

F(2, 14) = 14.11, p < .05, η2 = .67

 F  test statistic
 (2, 14)  degrees of freedom (Between, Error)
 14.11  observed value
 .05  alpha level
 Significance: sig if p < α; nonsig if p > α
 η2  effect size
46

Learning check!

 Tutorial 1 now available!

 MindTap CH 12 – due Jan 30

47

TO-DO


TD0409-01 课件/Psy 202_1_Review.pdf

Instructor:
Dr. Molly Metz

PSY 202H1:
STATISTICS II

MODULE 1:
FOUNDATIONS REVIEW

1

1. Foundations Review
1. Very quick!
2. Not for teaching, but for reminding

GAME PLAN

2

 201 is a TRUE prereq

 Use your resources, early and often
 Form study groups
 Text + MindTap resources from Ch 1-11
 In “Psy 201 Review” folder at bottom of main page

 Appendix A
 Campus tutoring and study centres (like New College Stats Aid Centre)
 Web resources like
 https://www.learner.org/series/against-all-odds-inside-statistics/
 https://www.khanacademy.org/math/statistics-probability
 http://devpsy.org/links/open_source_textbooks (scroll down for stats)

A NOTE ABOUT THE COURSE PREREQ

3

SUPER SPEEDY REVIEW

4

STATISTICS

 Statistics is the science of gaining insight from data

 The term statistics refers to a set of mathematical procedures for organizing, summarizing, and interpreting information
 "statistics help researchers bring order out of chaos" (p. 5)

5

STATISTICS

 Two general purposes:
 To organize and summarize the information so that the researcher

can see what happened in the research study and can communicate
the results to others

 To answer the questions that initiated the research by determining
exactly when general conclusions are justified based on the specific
results that were obtained

6

INFERENTIAL STATISTICS

 Consist of those techniques that allow us to study samples and then make generalizations about the populations from which they were selected

7

INFERENTIAL STATISTICS

Many research situations begin with a
population that forms a normal distribution

A sample is selected from the population, and
receives a “treatment”, & the goal is to evaluate
the treatment

Probability is used to
decide whether the
treated sample is
“noticeably dif ferent”
from the population
 Do we reject the null

hypothesis or not?

8

 It is statistically IMPOSSIBLE to demonstrate a phenomenon is absolutely true (all about probability!)
 Researchers instead falsify
 Supporting evidence may not signal a theory is always true; disconfirming evidence does signal that a theory is not always true

 So, we seek to find evidence that it is unlikely that our hypothesis is false

• Process: Logic of the Null Hypothesis
• We determine what the population (distribution) would look like if the null hypothesis were true
• Then, we see if our sample data are likely to have come from this distribution
• In other words, we look for the likelihood that our data are consistent with the idea that there is no effect

PROVE

9

HYPOTHESIS TESTING

 A hypothesis test is a statistical procedure that uses data collected from a sample to evaluate a particular hypothesis about a population
 We make predictions about an unknown population

10

THE HYPOTHESIS TESTING
PROCEDURE

1. State the hypotheses.
2. Locate the critical region.

(Note: You must find the value for df and use the distribution
table for whichever statistic you are using.)

3. Calculate the test statistic.
4. Make a decision.
 Either “reject” or “fail to reject” the null hypothesis.

5. Report your findings
1. Formal APA statistical tag
2. Plain language description of nature of effect
3. Include effect size

11

THE HYPOTHESIS TESTING
PROCEDURE

 Step 1: State the hypotheses
 We have two opposing hypotheses about the population
 Null hypothesis (H0):
 Predicts that the independent variable (treatment) has no effect on the dependent variable for a population
 Alternative hypothesis (H1):
 Predicts that the independent variable (treatment) does have an effect on the dependent variable

12

THE HYPOTHESIS TESTING
PROCEDURE

 Step 2: Locate the critical region
 Must decide which sample means would be consistent with the null hypothesis (and therefore lead to accepting the null hypothesis), and which sample means would be at odds with the null hypothesis (and therefore lead to rejecting the null hypothesis)
 The alpha value, or the level of significance, is the probability value used to define "very unlikely"
 E.g., with α = .05, we separate the most unlikely 5% of the sample means (extreme values) from the most likely 95% of the sample means (central values)

13

THE HYPOTHESIS TESTING
PROCEDURE

 Step 2: Locate the critical region
 The critical region is defined by the alpha level
 E.g., An alpha of .05 (α = .05) indicates that the size of the critical region is p = .05 (5% of all possible sample means)

14

THE HYPOTHESIS TESTING
PROCEDURE

 Step 3: Calculate sample statistics
 E.g., Compare the sample mean (from your data) with the null hypothesis (e.g., that the population mean is the same as the original population)

Obtained (observed) difference between our data and hypothesis
______________________________________________________________
How much difference we would expect by chance (standard difference)

15

THE HYPOTHESIS TESTING
PROCEDURE

 Step 4: Make a decision
 Two possible outcomes:
 You reject the null hypothesis, and conclude that the treatment does

have an effect.
 You fail to demonstrate that the treatment has an effect, so you fail to

reject the null hypothesis.

16

SUMMARY: OUTCOMES OF HYPOTHESIS TESTING

                      True Status of H0
Decision              No Effect (H0 True)            Effect (H0 False)

Reject H0             Type I Error                   Correct
                      α = probability of T1 Error    1 – β = "power"

Retain H0             Correct                        Type II Error
(also called          1 – α = level of confidence    β = probability of T2 Error
"fail to reject")

17

REVIEW OF
HYPOTHESIS TESTING
WITH THE T STATISTIC

18

SINGLE-SAMPLE T STATISTIC

 Do newborn infants prefer to look at attractive
versus unattractive faces?

 Infants were shown two photographs of women’s
faces (one rated by adults as more attractive than
the other)

 Pair of faces remained on the screen until the baby
accumulated a total of 20 seconds of looking

 DV: Number of seconds spent looking at the
attractive face

 N = 9, M = 13 seconds, SS = 72

(Two-tailed test, α = .05)
19

SINGLE-SAMPLE T STATISTIC

1 . State the null and alternative hypotheses
Null hypothesis:
 The infants have no preference for either face
 H0: μattractive = 10 seconds

Alternative hypothesis:
 The infants prefer one face over the other
 H1: μattractive ≠ 10 seconds

20

SINGLE-SAMPLE T STATISTIC

2. Locate the critical region:

 df = n – 1 = 9 – 1 = 8
 Two-tailed test at the .05 level of significance
 Critical region consists of t values greater than

+2.306 or less than -2.306

tcrit = +/-2.306

21

SINGLE-SAMPLE T STATISTIC

3. Calculate the test statistic in 3 steps:

a. Sample variance
s2 = SS / (n − 1) = SS / df = 72 / 8 = 9   (SS given in the problem; df calculated previously)

b. Estimated standard error
sM = √(s2 / n) = √(9 / 9) = 1

c. t statistic
t = (M − μ) / sM = (13 − 10) / 1 = 3.00
22
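The three computational steps translate directly into code; a sketch for this example (SS = 72, n = 9, M = 13, μ = 10):

```python
import math

# Single-sample t for the infant-faces example.
SS, n, M, mu = 72, 9, 13, 10

s_sq = SS / (n - 1)            # sample variance = 9.0
se = math.sqrt(s_sq / n)       # estimated standard error = 1.0
t = (M - mu) / se              # t statistic

print(t)  # 3.0
```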

SINGLE-SAMPLE T STATISTIC

4. Make a decision regarding H0:
 The obtained t statistic of 3.00 falls within the critical region, so we reject H0 and conclude that babies do show a preference when given a choice between an attractive and unattractive face.

5. Report:
 "There was a significant effect of attractiveness on infant-looking time, t(8) = 3.00, p < .05, two-tailed. In other words, infants looked longer at attractive faces than expected."

23

BETWEEN-SUBJECTS OR
INDEPENDENT-MEASURES DESIGNS

Use a separate group of participants for each
treatment condition (or for each population)

We use subscripts to denote
which population or sample
we are referring to:

e.g., μ1, μ2

24

SINGLE-SAMPLE VERSUS INDEPENDENT
SAMPLES T FORMULAS

Single sample:

Independent samples:

• According to the null hypothesis, the population mean
difference is 0 (μ1 – μ2 = 0)

25

INDEPENDENT-MEASURES T STATISTIC

n = 10
M = 93
SS = 200

n = 10
M = 85
SS = 160

Do students who regularly watched Sesame Street when they
were growing up have better grades than students who did not
watch Sesame Street?

26

INDEPENDENT-MEASURES T STATISTIC

1. State the null and alternative hypotheses
 Null hypothesis:
 There is no difference between the high school grades for students who watched Sesame Street and those who did not
 H0: μ1 – μ2 = 0

 Alternative hypothesis:
 There is a difference between the high school grades for students who watched Sesame Street and those who did not
 H1: μ1 – μ2 ≠ 0

27

INDEPENDENT-MEASURES T STATISTIC

2. Locate the critical region:
 df = (n1 – 1) + (n2 – 1) = df1 + df2 = 9 + 9 = 18
 Two-tailed test with α = .05, tcrit = +/-2.101

28

INDEPENDENT-MEASURES T STATISTIC

3. Calculate the t statistic in 3 steps:

a. Pooled variance
sp2 = (SS1 + SS2) / (df1 + df2) = (200 + 160) / (9 + 9) = 360 / 18 = 20

b. Estimated standard error
s(M1–M2) = √(sp2/n1 + sp2/n2) = √(20/10 + 20/10) = √4 = 2

c. t statistic
t = (M1 – M2) / s(M1–M2) = (93 – 85) / 2 = 4.00
29
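A sketch of the same three steps in code, using the values from this example:

```python
import math

# Independent-measures t for the Sesame Street example:
# group 1 (watched): n = 10, M = 93, SS = 200
# group 2 (did not): n = 10, M = 85, SS = 160
n1, M1, SS1 = 10, 93, 200
n2, M2, SS2 = 10, 85, 160

sp_sq = (SS1 + SS2) / ((n1 - 1) + (n2 - 1))   # pooled variance = 20.0
se = math.sqrt(sp_sq / n1 + sp_sq / n2)       # estimated standard error = 2.0
t = (M1 - M2) / se                            # t statistic

print(t)  # 4.0
```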

INDEPENDENT-MEASURES T STATISTIC

4. Make a decision regarding H0:
 The obtained t statistic of 4.00 falls within the critical region, so we reject H0 and conclude that there is a significant difference between the high school grades of those students who watched Sesame Street and those who did not.

5. Report
 "There was a significant effect of program condition on academic achievement, t(18) = 4.00, p < .05, two-tailed. In other words, the students who watched Sesame Street had higher grades than those who did not watch the program"

30

REPEATED-MEASURES T STATISTIC

 The data we use in a repeated measures t test are difference scores:

Difference score = D = X2 – X1

 Numerator of t statistic measures actual difference between the data MD and the hypothesis μD

 Denominator measures the standard difference that is expected if H0 is true

 Same process as other tests

31

REPEATED-MEASURES T STATISTIC

 Does the colour red increase men's attraction to women? Researchers prepared a set of 30 women's photographs, 15 mounted on a red background and 15 mounted on a white background

 One picture is the "test photograph" and it appears twice, once mounted on red and once on white.

 A sample of n = 9 men rate each of the photographs on a 12-point scale. Is the test photograph judged significantly more attractive when presented on a red background?

32

REPEATED-MEASURES T STATISTIC

33

REPEATED-MEASURES T STATISTIC

1. State the null and alternative hypotheses
 N u l l hy p ot h esis :
 There is no difference in the attractiveness ratings between the red-

mounted versus white-mounted photo
 H0: μD= 0

 A l ter nati ve hy p ot hes is:
 There is a difference in the attractiveness ratings between the red-

mounted and white-mounted photo
 H1: μD ≠ 0

34

REPEATED-MEASURES T STATISTIC

2. Locate the critical region:
df = n – 1 = 9 – 1 = 8
Two-tailed test with α = .01 , tcrit = +/-3.355

35

REPEATED-MEASURES T STATISTIC

3. Calculate the t statistic in 3 steps:

a. Sample variance
s2 = SS / (n − 1) = SS / df = 18 / 8 = 2.25

b. Estimated standard error
sMD = √(s2 / n) = √(2.25 / 9) = .50

c. t statistic
t = (MD − μD) / sMD = (3.0 − 0) / .50 = 6.00
36
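A sketch of the three steps in code; note that MD = 3.0 does not appear on the slide but is implied by the reported t of 6.00 (t = MD / sMD):

```python
import math

# Repeated-measures t for the red-background example:
# SS of the difference scores = 18, n = 9, mean difference M_D = 3.0
# (M_D implied by the reported t of 6.00), null value mu_D = 0.
SS, n, M_D, mu_D = 18, 9, 3.0, 0

s_sq = SS / (n - 1)             # sample variance of D scores = 2.25
se = math.sqrt(s_sq / n)        # estimated standard error = 0.5
t = (M_D - mu_D) / se           # t statistic

print(t)  # 6.0
```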

REPEATED-MEASURES T STATISTIC

4. Make a decision regarding H0:
 The obtained t statistic of 6.00 falls within the critical region, so we reject H0 and conclude that the background colour has a significant effect on the judged attractiveness of the woman in the test photograph

5. Report:
 "Changing the background colour from white to red significantly increased the attractiveness rating of the woman in the photograph, t(8) = 6.00, p < .01, two-tailed. In other words, women on a red background were perceived as more attractive than women on a white background."

37

REVIEW:
HYPOTHESES FOR DIFFERENT TYPES OF T TESTS

 Single sample t test
 Comparing an unknown population mean (for our treatment condition) to a known population mean (the μ for the original population given in the problem)
 Is our unknown population mean (after treatment) the same as the mean in the original population? Or is there a difference?
 H0: μtreatment = 10 seconds
 H1: μtreatment ≠ 10 seconds

38

REVIEW:
HYPOTHESES FOR DIFFERENT TYPES OF T TESTS

 Independent measures t test
 Comparing two unknown population means (for each of our treatment conditions)
 Is the population mean for the first treatment condition the same as the population mean for the second treatment condition? Or is there a difference?
 H0: μ1 – μ2 = 0 (or μ1 = μ2)
 H1: μ1 – μ2 ≠ 0 (or μ1 ≠ μ2)

39

REVIEW:
HYPOTHESES FOR DIFFERENT TYPES OF T TESTS

 Repeated measures t test
 Remember that here we are interested in difference scores (treatment 2 score – treatment 1 score)
 Is the mean difference for the population equal to zero (no change between score 1 and score 2)? Or is there a difference?
 H0: μD = 0
 H1: μD ≠ 0

40


TD0409-01 课件/Psy 202_2_OneWayANOVA_W22.pdf

Instructor:
Dr. Molly Metz

PSY 202H1:
STATISTICS II

MODULE 2:
ONE WAY ANOVA

1

1. Intro to ANOVA
1. Designs with More Than Two Groups
2. ANOVA Basics
3. Example

GAME PLAN

2

INTRO TO DESIGNS WITH
MORE THAN 2 GROUPS

3

 Reasons (in service of precision):
 Allows researchers to compare multiple treatments
 …with no treatment (control group) or placebo as well
 Allows researchers to compare effects of multiple independent variables simultaneously
 Factorial designs: more in Psy 202!

 Up first: more than two levels of one IV
 IV (mood): Positive vs Negative
 IV (mood): Happy, Sad, Angry

WHEN WOULD YOU WANT TO STUDY
MORE THAN TWO GROUPS?

4

EFFECT OF ANTI-DEPRESSANT DOSAGE
ON MENTAL HEALTH

[Figure: bar graphs of mental-health scores for a two-level design (1 MG, 100 MG) versus a four-level design (1 MG, 50 MG, 100 MG, 150 MG), illustrating errors of interpolation and errors of extrapolation]

5

EFFECT OF CAFFEINE
ON TEST PERFORMANCE

[Figure: bar graphs of test performance for a two-level design (10 MG, 100 MG) versus a three-level design (10 MG, 50 MG, 100 MG), revealing "curvilinear effects"]

6

 Advantages
 Advance theory with precision (boundary conditions?)
 Insight into non-linear effects
 For complex effects, reduces both:
 The number of experiments conducted
 The number of participants needed

PROS AND CONS OF ADDING LEVELS TO IV

7

[Figure: a set of three 2-condition studies (n = 20 per group) requires Total N = 120; one study with one factor with three levels (n = 20 per group) requires Total N = 60]

8

 Advantages
 Advance theory with precision (boundary conditions?)
 Insight into non-linear effects
 For complex effects, reduces both:
 The number of experiments conducted
 The number of participants needed

 Costs
 Increase sample size for an individual study (from a study with 2 groups)
 Increases in time, money needed to conduct research

PROS AND CONS OF ADDING LEVELS TO IV

9

 How do we analyze the difference between 3 groups at a time?
 Option 1: A series of t-tests
 E.g., group 1 vs 2, group 2 vs 3, group 1 vs 3

 ALERT: This INFLATES the likelihood of Type I Error!
 "Test-wise α" = .05
 "Experiment-wise α" ≈ .14 for 3 t-tests
 1 – (1 – α)^c, where c = number of comparisons
 1 – (1 – .05)^3 = 1 – .86 = .14
 No longer within range of acceptable risk of Type I Error

 A possible solution: Bonferroni correction
 Divide desired alpha by number of comparisons*
 .05/3 = .017 – new cut-off for determining significance
 BUT, as we learned, by reducing the likelihood of a Type I error, we increase likelihood of Type II error (or, decrease power)

AND, A STATISTICAL COST

* Don’t forget, we should be planning out our hypothesis tests before we do them, so we know what this number is ahead of time

10
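The alpha-inflation formula and the Bonferroni correction are easy to sketch in code:

```python
# Experiment-wise alpha for c comparisons at a given test-wise alpha,
# plus the Bonferroni-corrected per-test cutoff.
def experimentwise_alpha(alpha, c):
    """Probability of at least one Type I error across c independent tests."""
    return 1 - (1 - alpha) ** c

def bonferroni_cutoff(alpha, c):
    """Per-test alpha that keeps the family-wise error rate near alpha."""
    return alpha / c

print(round(experimentwise_alpha(0.05, 3), 2))  # 0.14
print(round(bonferroni_cutoff(0.05, 3), 3))     # 0.017
```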

 One-Way ANOVA

 An analysis of the variance in a set of scores or observations; tests whether the differences in means across levels of some factor are significantly greater than the differences among scores in general

 Compares all group means simultaneously
 One statistic to interpret (initially)
 Just tells you there is A difference, not where the difference exists
 So, we do post hoc tests to clarify result
 Handles inflation of Type I Error

A BETTER STATISTICAL SOLUTION

11

THE BIG PICTURE

Making comparisons to population (NO IVs):
 Single score, σ known  z score
 Sample mean, σ known  z test
 Sample mean, σ unknown  One sample t-test

Making comparisons between levels of IV(s) or groups:

1 IV, 2 levels:
 Between subjects  Independent samples t-test
 Within subjects  Paired samples t-test

1 IV, 3+ levels:
 Between subjects  One-Way Between ANOVA
 Within subjects  One-Way Repeated ANOVA

More than 1 IV:
 All IVs between subjects  Between subj Factorial ANOVA
 All IVs within subjects  Repeated Measures Factorial ANOVA
 Mix of within and between  Mixed Model Factorial ANOVA

12

RESEARCH PROBLEM

 Does the presence of others during an emergency affect helping behavior?
 Conduct an experiment with 3 conditions
 Wait alone
 Wait with 1 other person
 Wait with 2 other people

 IV = Number of people present
 3 “levels” (0, 1, 2)

 DV = Time it takes (in seconds) to call for help

13

DATA FROM HELPING STUDY

 Seconds lapsed before calling for help

Alone   1 other present   2 others present
14      27                23
19      23                32
20      23                28
18      30                34
12      20                30
13      21                27

M1 = 16   M2 = 24   M3 = 29

Are these 3 means significantly different from each other?

14
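As a preview, the F-ratio defined on the next slides can be computed for these data; this sketch uses the standard one-way formulas (the computational details come later in the module):

```python
# One-way ANOVA F-ratio for the helping data (k = 3 groups, n = 6 each).
alone      = [14, 19, 20, 18, 12, 13]
one_other  = [27, 23, 23, 30, 20, 21]
two_others = [23, 32, 28, 34, 30, 27]
groups = [alone, one_other, two_others]

k = len(groups)
n = len(groups[0])
N = k * n
G = sum(sum(g) for g in groups)          # grand total

ss_between = sum(sum(g)**2 for g in groups) / n - G**2 / N
ss_within = sum(
    sum(x * x for x in g) - sum(g)**2 / n   # SS inside each group
    for g in groups
)

ms_between = ss_between / (k - 1)     # variance between sample means
ms_within = ss_within / (N - k)       # variance expected by chance (error)
F = ms_between / ms_within

print(round(F, 2))  # 18.79
```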

THE LOGIC OF ANOVA

t = difference between sample means
difference expected by chance (error)

F = variance between sample means
variance expected by chance (error)

15

THE LOGIC OF ANOVA

 Variance = dif ferences between scores

 Two sources of variance:

 Between group variance
 Within group variance

F = (variance between sample means) / (variance expected by chance, i.e., error)

16

BET WEEN GROUP VARIANCE

 Why do people in different groups differ?

Alone   1 other present   2 others present
  14          27                23
  19          23                32
  20          23                28
  18          30                34
  12          20                30
  13          21                27

M1 = 16   M2 = 24   M3 = 29

17

BET WEEN GROUP VARIANCE

Why do people in different groups differ?

1. Treatment effect = differences caused by our
experimental treatment

 Systematic differences

2. Chance = differences due to random factors including…
 Individual differences
 Experimental error

 Non-systematic, random differences

18

WITHIN GROUP VARIANCE

 Why do people within the same group differ, even though they were treated alike?

Alone   1 other present   2 others present
  14          27                23
  19          23                32
  20          23                28
  18          30                34
  12          20                30
  13          21                27

M1 = 16   M2 = 24   M3 = 29

19

WITHIN GROUP VARIANCE

Why do people within the same group differ?

1. Chance = differences due to random factors
including…
 Individual differences
 Experimental error

 Non-systematic, random differences

20

SOURCES OF VARIANCE

Total Variance splits into:

• Between Group Variance (numerator of F-ratio)
  • Treatment Effect
  • Chance (error)

• Within Group Variance (denominator of F-ratio)
  • Chance (error)
21

THE F-RATIO

F = Between-group variance (Treatment Effect + Chance) / Within-group variance (Chance)

 If H0 is True:  F = (0 + Chance) / Chance ≈ 1

 If H0 is False: F = (Treatment Effect + Chance) / Chance > 1

22

THE F-RATIO

 F is the ratio between two variance estimates

 Denominator is also called “error term”

 How large does observed F have to be to conclude there is a treatment effect (to reject H0)?

 Compare observed F to critical values

 Based on the sampling distribution of F

23

THE SAMPLING DISTRIBUTION OF F

 A family of distributions (just like t)

 Each with a pair of degrees of freedom (df)

 Critical values shown in F table

 Need 3 pieces of information

(1) α level

(2) dfbetween (dfnumerator )

(3) dfwithin (dfdenominator)

24

THE SAMPLING DISTRIBUTION OF F

 F -values are always positive
 Variance cannot be negative

 If H 0 is true then F ≈1
 So peak appears around 1

[Figure: the F sampling distribution, with p(F) peaked near 1]

25

THE SAMPLING DISTRIBUTION OF F

 Shape of distribution will change with df
 Large df will result in less spread to the right
 In practical terms, leads to smaller critical values of F (closer to 1.0)

[Figure: F sampling distributions for different df]

26

CRITICAL VALUES OF F

A portion of the F distribution table. Entries in regular type are critical values for the α =.05,
and bold type values are for the α=.01. The critical values for df = 2,12 have been
highlighted. Notice that we no longer differentiate between one-tailed or two-tailed
hypotheses. All values of F are positive, and all hypotheses are non-directional. Some
sources print separate tables for different alpha levels.


Learning check!

27

HYPOTHESIS TESTING
WITH ANOVA

28

RESEARCH PROBLEM

 Does the presence of others during an emergency affect helping behavior?
 Conduct an experiment with 3 conditions
 Wait alone
 Wait with 1 other person
 Wait with 2 other people

 IV = Number of people present
 3 “levels” (0, 1, 2)

 DV = Time it takes (in seconds) to call for help

29

DATA FROM HELPING STUDY

 Seconds lapsed before calling for help

Alone   1 other present   2 others present
  14          27                23
  19          23                32
  20          23                28
  18          30                34
  12          20                30
  13          21                27

M1 = 16   M2 = 24   M3 = 29

Are these 3 means significantly different from each other?

30

HYPOTHESIS TESTING WITH ANOVA

 Research question
 Does presence of others affect helping?

 Step 1: Statistical Hypotheses

 H0: µ1 = µ2 = µ3
 H1: At least one mean is different from another

 No longer differentiate between one-tailed and
two-tailed tests.
 All ANOVA tests are non-directional

 Why?

31

HYPOTHESIS TESTING WITH ANOVA

 Step 2: Decision Rule: Look up critical value of F in Table
 α level
 dfnumerator = dfbetween
 dfdenominator = dfwithin

 Step 3: Compute obser ved F -ratio

 Step 4: Make a Decision (Reject or retain H 0)

 **Step 5: If H 0 rejected, conduct post-hoc comparisons

 Step 6: Interpret and Repor t Findings

32

FINDING THE CRITICAL VALUE

Find Fcritical in Table

 Need to know 3 things

 α level
 dfnumerator = dfbetween
 dfdenominator = dfwithin

 If α = .05 and df = 2,15, Fcritical = 3.68

33
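If software is handy, the table lookup can be double-checked. A minimal Python sketch (assuming scipy is installed; not part of the course materials):

```python
from scipy import stats

# Critical value of F for alpha = .05 with df = (2, 15):
# the value cutting off the top 5% of the F(2, 15) distribution.
f_crit = stats.f.ppf(1 - 0.05, dfn=2, dfd=15)
print(round(f_crit, 2))  # 3.68
```

This reproduces the tabled value Fcritical = 3.68.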

CRITICAL VALUES OF F FOR DF=2,15

Critical region;
Reject H0

3.68 6.23

34

COMPUTING ANOVA

Steps in computing the ANOVA
Compute SS
Compute df (two values!)

Compute MS

Compute F

Keep track of your computations in an ANOVA Summary Table

35

COMPUTING ANOVA

The ANOVA summary table

36

COMPUTING ANOVA

Variance = “Mean Square” (MS) = SS / df

F = between-group variance / within-group variance = MSbetween / MSwithin

(Throwback to Module 3!)

37

COMPUTING ANOVA

 STEP 1: Compute Sums of Squares (SS)

SSTotal = ΣX² − G²/N

SSBetween = Σ(T²/n) − G²/N

SSWithin = Σ(SS for each group)  or  SSTotal − SSBetween

Where:

• X = each value of X

• T = treatment group total (ΣX)

• G = grand total (ΣT)

• n = sample size of each group

• N = total sample size (Σn)

38
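The three SS formulas can be sketched in Python against the helping-study data from these slides (an illustrative check, assuming numpy is installed):

```python
import numpy as np

# Helping-study data: seconds before calling for help, by group
groups = [
    [14, 19, 20, 18, 12, 13],  # alone
    [27, 23, 23, 30, 20, 21],  # 1 other present
    [23, 32, 28, 34, 30, 27],  # 2 others present
]
all_scores = np.concatenate(groups)
G, N = all_scores.sum(), all_scores.size  # grand total, total sample size

ss_total = (all_scores ** 2).sum() - G ** 2 / N
ss_between = sum(np.sum(g) ** 2 / len(g) for g in groups) - G ** 2 / N
ss_within = ss_total - ss_between
print(ss_total, ss_between, ss_within)  # 722.0 516.0 206.0
```

These match the hand computations on the following slides.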

COMPUTING ANOVA

 STEP 2: Compute Degrees of Freedom (df)

dfBetween = k – 1

Where:

• n = sample size of each group

• N = total sample size (Σn)

• k = number of groups

dfWithin = N – k or Σ(n-1)

dfTotal = N – 1

39

COMPUTING ANOVA

 STEP 3: Compute Mean Squares (MS)

MSBetween = SSbetween / dfbetween

MSWithin = SSwithin / dfwithin

40

COMPUTING ANOVA

 STEP 4: Compute the F -Ratio

F-Ratio = MSbetween / MSwithin

41

COMPUTING ANOVA

The ANOVA Summar y Table

Source                 SS    df    MS    F
Between group          SSB   dfB   MSB   MSB / MSW
Within group (error)   SSW   dfW   MSW
Total                  SST   dfT

42

COMPUTING ANOVA

Alone   1 other present   2 others present
  14          27                23
  19          23                32
  20          23                28
  18          30                34
  12          20                30
  13          21                27

n:      6          6          6         N = 18
Totals: T1 = 96    T2 = 144   T3 = 174  G = 414

43

COMPUTING ANOVA

Alone   1 other present   2 others present
  14          27                23
  19          23                32
  20          23                28
  18          30                34
  12          20                30
  13          21                27

n:      6          6          6         N = 18
Totals: T1 = 96    T2 = 144   T3 = 174  G = 414

SSTotal = ΣX² − G²/N = [14² + 19² + 20² + … + 27²] − 414²/18 = 10244 − 9522 = 722
44

COMPUTING ANOVA

Alone   1 other present   2 others present
  14          27                23
  19          23                32
  20          23                28
  18          30                34
  12          20                30
  13          21                27

n:      6          6          6         N = 18
Totals: T1 = 96    T2 = 144   T3 = 174  G = 414

SSBetween = Σ(T²/n) − G²/N = 96²/6 + 144²/6 + 174²/6 − 414²/18 = 10038 − 9522 = 516

45

COMPUTING ANOVA

Alone   1 other present   2 others present
  14          27                23
  19          23                32
  20          23                28
  18          30                34
  12          20                30
  13          21                27

n:      6          6          6         N = 18
Totals: T1 = 96    T2 = 144   T3 = 174  G = 414

SSWithin = 722 – 516 = 206

SSWithin= SSTotal − SSBetween

46

COMPUTING ANOVA

Alone   1 other present   2 others present
  14          27                23
  19          23                32
  20          23                28
  18          30                34
  12          20                30
  13          21                27

n:    6     6     6     N = 18
SS:   58    72    76    SSWithin = 206

SSWithin = ΣSS for each group = 58 + 72 + 76 = 206

(You will often be given these values)

47

COMPUTING ANOVA

Let’s fill in our SS values

Source                 SS    df   MS   F
Between group          516
Within group (error)   206
Total                  722

Notice: 722 = 516 + 206 (SST = SSB + SSW)

48

COMPUTING ANOVA

Now compute degrees of freedom (df)

Source                 SS    df    MS   F
Between group          516   k-1
Within group (error)   206   N-k
Total                  722   N-1

Where k = 3, N = 18

49

COMPUTING ANOVA

Source                 SS    df              MS   F
Between group          516   (k-1) 3–1 = 2
Within group (error)   206   (N-k) 18–3 = 15
Total                  722   (N-1) 18–1 = 17

Where k = 3, N = 18

50

COMPUTING ANOVA

Source                 SS    df   MS   F
Between group          516   2
Within group (error)   206   15
Total                  722   17

Notice: 17 = 2 + 15 (dfT = dfB + dfW)

51

COMPUTING ANOVA

Source                 SS    df   MS        F
Between group          516   2    SSB/dfB
Within group (error)   206   15   SSW/dfW
Total                  722   17

Now compute the Mean Squares (MS)

52

COMPUTING ANOVA

Source                 SS    df   MS               F
Between group          516   2    516/2 = 258
Within group (error)   206   15   206/15 = 13.73
Total                  722   17

Now compute the Mean Squares (MS)

53

COMPUTING ANOVA

Source                 SS    df   MS      F
Between group          516   2    258     MSB / MSW
Within group (error)   206   15   13.73
Total                  722   17

Now compute the F-Ratio

54

COMPUTING ANOVA

Source                 SS    df   MS      F
Between group          516   2    258     258 / 13.73 = 18.79
Within group (error)   206   15   13.73
Total                  722   17

Now compute the F-Ratio

55

COMPUTING ANOVA

Source                 SS    df   MS      F
Between group          516   2    258     18.79
Within group (error)   206   15   13.73
Total                  722   17

All of this work for the final F-ratio!

56
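All of the work in the summary table can be reproduced in a single call. A sketch using scipy (assuming it is installed; an illustration, not part of the course materials):

```python
from scipy import stats

alone = [14, 19, 20, 18, 12, 13]
one_other = [27, 23, 23, 30, 20, 21]
two_others = [23, 32, 28, 34, 30, 27]

# One-way between-subjects ANOVA: F = MS_between / MS_within
f, p = stats.f_oneway(alone, one_other, two_others)
print(round(f, 2), p < .05)  # 18.79 True
```

The F of 18.79 matches the hand-computed table, and p < .05 matches the decision on the next slide.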

MAKE A DECISION AND REPORT

 Does our observed F (18.79) exceed our critical value of F (3.68)?

 Yes!

 Reject H 0

 Basic format:
 “There was a significant effect of how many others were

present on the time it took participants to call for help, F
(2, 15) = 18.79, p < .05. [to be continued]”

57

REPORTING AN F-STATISTIC

 A closer look…

F(2, 15) = 18.79, p < .05
 F  the test statistic
 (2, 15)  degrees of freedom (Between, Within)
 18.79  the observed value
 .05  the alpha level
 Significance: significant if p < α; nonsignificant if p > α

Learning check!

58

INTERPRETING
FINDINGS

59

INTERPRETING FINDINGS FROM
ANOVA

 At least two of the means are significantly different from each other
 But, which ones?
 Must conduct additional analyses to pinpoint specific

mean differences
 Called “post hoc tests” (or posttests)

 In other words,
Omnibus test  the “main test” (in this case the

one-way ANOVA)
Post hoc test  the “follow-ups”

60

POST HOC TESTS

 Pinpoint specific group differences

 Conduct multiple comparisons, controlling for
experimentwise Type I error rate

 Many types of post hoc tests
 Mostly based on comparing absolute value of differences between

pairs of means to a critical value

 Common ones include
 Bonferroni Correction for Multiple Comparisons

 Fisher’s Least Significant Difference (LSD)

 Tukey Honestly Significant Difference (HSD)
61

TUKEY HSD TEST

 Tukey Honestly Significant Difference (HSD)
 HSD = minimum difference between means needed for statistical significance
 How big does the difference between two means have to be in order to conclude that they are significantly different from each other?
 Like a critical value, but a critical “mean difference”

 Assumes equal n

62

TUKEY HSD TEST

 Step 1: Find the value of “q”(Table)
 Need to know 3 things:
 α
 dfW
 k

 Step 2: Compute HSD

HSD = q √(MSwithin / n)

Where n = group sample size, assuming equal n in each group

63

TUKEY HSD TEST

 Step 3: Compute difference between each pair of means and compare to HSD
 M1 – M2 = ?
 M1 – M3 = ?
 M2 – M3 = ?

 Compare each mean difference to the HSD
 If the difference equals/exceeds the HSD, conclude that the means are significantly different from each other

64

COMPUTING TUKEY’S HSD

Alone   1 other present   2 others present
  14          27                23
  19          23                32
  20          23                28
  18          30                34
  12          20                30
  13          21                27

n:      6          6          6         N = 18
Totals: T1 = 96    T2 = 144   T3 = 174  G = 414
Means:  M1 = 16    M2 = 24    M3 = 29

65

TUKEY HSD TEST: EXAMPLE

 Step 1: Find the value of “q” (Q Table)
 α = .05 dfW = 15 k = 3

66

TUKEY HSD TEST: EXAMPLE

 Step 1: Find the value of “q” (Q Table)
 α = .05 dfW = 15 k = 3
 From Table : q = 3.67

 Step 2: Compute HSD

HSD = q √(MSwithin / n) = 3.67 √(13.73 / 6) = ± 5.55 seconds

So, a pair of means must differ by at least 5.55 in order to be significantly different

67

TUKEY HSD TEST: EXAMPLE

 Step 3: Compute difference between each pair of means and compare to HSD

 M1 – M2 = 16 – 24 = –8    exceeds 5.55
 M1 – M3 = 16 – 29 = –13   exceeds 5.55
 M2 – M3 = 24 – 29 = –5    does not exceed 5.55

68
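Steps 1-3 can also be sketched in Python, with `scipy.stats.studentized_range` (scipy ≥ 1.7) playing the role of the q table (an illustration under those assumptions, not part of the course materials):

```python
import math
from scipy import stats

ms_within, df_within, k, n = 13.73, 15, 3, 6

# Step 1: q is the alpha = .05 critical value of the studentized range
q = stats.studentized_range.ppf(0.95, k, df_within)

# Step 2: Tukey's HSD
hsd = q * math.sqrt(ms_within / n)

# Step 3: compare each absolute mean difference to the HSD
means = {"alone": 16, "one_other": 24, "two_others": 29}
print(round(q, 2), round(hsd, 2))
print(abs(means["alone"] - means["one_other"]) >= hsd)       # exceeds HSD
print(abs(means["one_other"] - means["two_others"]) >= hsd)  # does not
```

This reproduces q = 3.67 and HSD ≈ 5.55 from the worked example.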

TUKEY HSD TEST: EXAMPLE

 What do we conclude?

 M1 differs from M2 and M3
 People waiting alone helped significantly faster than people waiting with others

 M2 & M3 do NOT differ from each other
 There was no difference in helping times for individuals waiting with 1 other person and individuals waiting with 2 other people

69

MEASURE OF EFFECT SIZE

 Compute proportion of variance explained by the treatment effect

 Proportion of total variance accounted for by variability between groups

 In ANOVA, r² typically called η² (pronounced “eta squared”)

r² = SSBetween / SSTotal

70

MEASURE OF EFFECT SIZE: EXAMPLE

 71% of the variance in helping behavior (number of seconds lapsed before seeking help) is explained by the number of people present

r² = η² = SSBetween / SSTotal = 516 / 722 = .71

71
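As a quick check of the arithmetic from the summary table:

```python
# Effect size from the ANOVA summary table: eta squared = SS_between / SS_total
ss_between, ss_total = 516, 722
eta_sq = ss_between / ss_total
print(round(eta_sq, 2))  # 0.71
```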

REPORTING RESULTS OF AN ANOVA

Formal description of findings:
“There was a significant effect of the number of people present
on the time it took (in seconds) for participants to seek help,
F(2,15) = 18.79, p<.05, η2 = .71. Tukey post-hoc comparisons
indicated that participants who were waiting alone helped
significantly faster (M=16, SD=3.4) than participants who
waited with one other person (M=24, SD=3.8) or with two other
people (M=29, SD=3.9), p < .05.”

(η² = .71 reports the effect size; M and SD are reported for each group, where SD = √(SS / (n − 1)))

72

INDEPENDENT MEASURES ANOVA
ASSUMPTIONS

 The observations within each sample must be independent.
 The populations from which the samples are selected must be normal.
 The populations from which the samples are selected must have equal variances (homogeneity of variance).
 Violating the assumption of homogeneity of variance risks invalid test results.

Learning check!

73

 Mindtap Access
 Psy 201 last term? Just log in with same credentials and access course using course code on syllabus
 Psy 201 last summer? Submit course code request to form on syllabus webpage
 Everyone else, use direct link to bookstore on syllabus webpage

 Add everything to your calendar now!
 Mindtap and tutorial problem sets often overlap
 Due date is not “do date”

TO DO

74


TD0409-01 课件/Psy 202_9_advanced concepts_W22_topost-1.pdf

Instructor:
Dr. Molly Metz

PSY 202H1:
STATISTICS II

MODULE 9:
INTRO TO ADVANCED CONCEPTS

1

1. Nonparametric Tests

2. Intro to Advanced Stats
1. Multilevel modeling
2. Factor analysis
3. Mediation
4. Meta-analysis

GAME PLAN

2

“What do we need to know?”

NON-PARAMETRIC VS
PARAMETRIC TESTS

3

PARAMETRIC VS. NONPARAMETRIC TESTS

 Parametric tests make assumptions about the shape of the population distribution and other population parameters (e.g., μD = 0)
 Normal distribution in the population
 Homogeneity of variance in the population
 Numerical score for each individual

 Require data from an interval or ratio scale

 Nonparametric tests do not make these same assumptions
 Most do not state hypotheses in terms of specific population parameters
 Participants usually classified into categories
 Frequencies
 Nominal, ordinal

4

PARAMETRIC VS. NONPARAMETRIC TESTS

 E.g., What if we wanted to examine the relationship between the national ranking of US college basketball teams and the annual athletics budget of the college?

School        National Rank   Annual $ (in millions)
Gonzaga       1               19
Duke          2               67
Indiana       3               60
Louisville    4               69
Georgetown    5               47
Michigan      6               111
Kansas        7               53
…             …               …
Virginia      25              46

5

“GOODNESS OF FIT” TEST AND THE
ONE SAMPLE T TEST

 Nonparametric (chi-square) versus parametric (t) test

 Similarity: Both tests use data from one sample to test a hypothesis about a single population

 Level of measurement determines test:
 Numerical scores (interval/ratio scale) make it appropriate to compute a mean and use a t test
 Classification in non-numerical categories (ordinal or nominal scale) makes it appropriate to compute proportions or percentages to do a chi-square test

6

SPECIAL APPLICATIONS OF THE
CHI-SQUARE TESTS

 As a substitute for a parametric test:
 The chi-square test for independence & the Pearson correlation
 The chi-square test for independence & the independent-measures t test (or ANOVA)

 Under what circumstances would you choose to conduct the chi-square (nonparametric alternative)?

Learning Check:
1. Data do not meet the assumptions for a standard parametric test
2. Data consist of nominal or ordinal measurements

7

EXAMPLE: THE MANN-WHITNEY U TEST (VS. INDEPENDENT T)

 A business owner measured the job satisfaction of his day-shift and night-shift workers. Each employee rated job satisfaction on a scale from 1 (not satisfied at all) to 100 (completely satisfied). Test whether ratings of job satisfaction differed between the two groups using the Mann-Whitney U test at a .05 level of significance.

Day Shift   Night Shift
   88           24
   72           55
   93           70
   67           60
   62           50

8
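This example can be sketched with scipy (assuming it is installed; an illustration, not part of the course materials):

```python
from scipy import stats

day_shift = [88, 72, 93, 67, 62]
night_shift = [24, 55, 70, 60, 50]

# Mann-Whitney U: compares the two independent groups via ranks
u, p = stats.mannwhitneyu(day_shift, night_shift, alternative="two-sided")
print(u, round(p, 3))
```

At the .05 level, the ratings differ significantly between shifts (p < .05).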

EXAMPLE: THE WILCOXON SIGNED-RANKS T TEST (VS. REPEATED-MEASURES T)

 A researcher measured the number of cigarettes patients smoked per day in a sample of 6 patients before and 6 months after being diagnosed with lung cancer. Test whether patients significantly changed their smoking habits following the diagnosis using the Wilcoxon signed-ranks T test at a .05 level of significance.

Before Diagnosis   After Diagnosis
      23                 20
      25                  5
      13                  8
      12                 16
       9                 15
      22                 19

9
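A scipy sketch of this paired example (assuming scipy is installed; illustrative only):

```python
from scipy import stats

before = [23, 25, 13, 12, 9, 22]
after = [20, 5, 8, 16, 15, 19]

# Wilcoxon signed-rank test on the paired differences
t_stat, p = stats.wilcoxon(before, after)
print(t_stat, round(p, 3))
```

Here the change in smoking is not significant at the .05 level (p > .05).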

EXAMPLE: THE KRUSKAL-WALLIS H TEST (VS. ONE-WAY ANOVA)

 A researcher asks a sample of 15 students (5 per group) to view and rate how effectively they think one of three short video clips promoted safe driving. The students rated the clips from 1 (not effective at all) to 100 (extremely effective). Test whether ratings differ between groups using the Kruskal-Wallis H test at a .05 level of significance.

Clip A   Clip B   Clip C
  88       92       50
  67       76       55
  22       80       43
  14       77       65
  42       90       39
10
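A scipy sketch of this three-group example (assuming scipy is installed; illustrative only):

```python
from scipy import stats

clip_a = [88, 67, 22, 14, 42]
clip_b = [92, 76, 80, 77, 90]
clip_c = [50, 55, 43, 65, 39]

# Kruskal-Wallis H: rank-based alternative to the one-way ANOVA
h, p = stats.kruskal(clip_a, clip_b, clip_c)
print(round(h, 2), p < .05)  # 7.26 True
```

Ratings differ significantly between clips at the .05 level.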

EXAMPLE: THE FRIEDMAN TEST (VS. REPEATED ANOVA)

 A doctor is curious about whether women without health insurance make regular office visits throughout the course of their pregnancies. She selects a sample of 4 pregnant women and records the number of hospital visits made during each trimester of their pregnancy.

 Test whether there are differences in the number of office visits made over the course of the pregnancy using the Friedman test at a .05 level of significance.

Participant   1st Trimester   2nd Trimester   3rd Trimester
A             3               5               8
B             6               4               7
C             2               0               5
D             4               3               2
11
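A scipy sketch of this repeated-measures example (assuming scipy is installed; illustrative only):

```python
from scipy import stats

# Office visits per trimester; each position across the lists is one participant
trimester1 = [3, 6, 2, 4]
trimester2 = [5, 4, 0, 3]
trimester3 = [8, 7, 5, 2]

# Friedman test: rank-based alternative to the repeated-measures ANOVA
chi2, p = stats.friedmanchisquare(trimester1, trimester2, trimester3)
print(round(chi2, 2), round(p, 3))  # 2.0 0.368
```

The number of visits does not differ significantly across trimesters at the .05 level.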

NONPARAMETRIC TESTS OVERVIEW

Mann-Whitney: Compare two independent groups when assumptions for independent t not met. Example: determine whether a control group and treatment group are different when the DV is ordinal.

Wilcoxon Signed-Rank: Compare two matched or within-subject conditions when assumptions for dependent t not met. Example: determine whether ordinal ratings of academic skill are different from ratings of athletic skill for the same group.

Kruskal-Wallis: Compare two or more independent groups when assumptions for one-way between-subjects ANOVA not met. Example: determine whether test scores from three different instructional conditions are different when scores are not distributed normally.

Friedman’s ANOVA: Compare two or more matched or within-subject conditions when assumptions for repeated ANOVA not met. Example: determine whether ordinal ratings of academic skill, athletic skill, and social skill are different for the same group of students.

Chi Sq Goodness of Fit: Compare observed frequency distribution to null distribution. Example: determine whether there is a difference in the proportion of A, B, C, D, F grades awarded in a school.

Chi Sq Test of Independence: Determine whether two categorical variables are related. Example: test whether grade distributions differ by gender.

12

PARAMETRIC VS. NONPARAMETRIC TESTS

If you have a choice, which should you choose?

Things to consider:
Measurement
Assumptions
Variance
Undetermined scores

13

INTRO TO ADVANCED
PROCEDURES

14

INTRO TO SOME ADVANCED PROCEDURES

 Purpose:
 To provide you with some knowledge of additional procedures that are available to help answer research questions
 To allow you to recognize and better understand these more advanced procedures when you come across them in research articles

15

ADVANCED PROCEDURES:
MULTILEVEL MODELING

 Essentially, this refers to cases of regression with groups

 Example: A researcher is interested in how much the number of hours one spends studying for a statistics exam predicts scores on the exam. He surveys students from a dozen different statistics classes. Problem? Things could be very different in the different courses.
 Different teachers, different assignments, different tests, etc.

 Solution? May carry out the regression separately for each course, then average the regression coefficients across the different courses. May also go a step further and take into consideration some upper-level variables (group-level)  e.g., does teacher experience predict average test scores in their classes?
 Example of a standard multilevel modeling procedure (multilevel, because you are looking at both lower-level (individual) and upper-level (group) variables)

16
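The “regress per course, then average” idea can be sketched on made-up data (all numbers and variable names here are hypothetical, not from the survey described; assumes numpy is installed):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: exam scores vs. hours studied in 12 classes,
# where the true hours -> score slope is 3 points per hour
slopes = []
for course in range(12):
    hours = rng.uniform(0, 10, size=30)
    scores = 50 + 5 * course / 12 + 3.0 * hours + rng.normal(0, 5, size=30)
    slope, intercept = np.polyfit(hours, scores, 1)  # per-course regression
    slopes.append(slope)

# Average the per-course coefficients (the lower-level step described above)
avg_slope = np.mean(slopes)
print(round(avg_slope, 1))
```

Full multilevel software estimates both levels at once; this averaging is only the intuition.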

ADVANCED PROCEDURES:
MULTILEVEL MODELING

17

ADVANCED PROCEDURES:
FACTOR ANALYSIS

 Factor analysis is a statistical procedure applied in situations where many variables are measured. It identifies groups of variables (factors) that tend to be correlated with each other and not other variables.
 Factor loading  the correlation of a variable with a factor. Variables may have loadings on each factor, but usually have high loadings on only one.

 E.g., “Factor analysis of the Dental Fear Survey disclosed three stable and reliable factors. The first factor related to patterns of dental avoidance and anticipatory anxiety. The second factor related to fear associated with specific dental stimuli and procedures. Factor three concerned felt physiological arousal during dental treatment.”

18

 How do psychologists find underlying dimensions when we can only observe specific behaviours?

FROM BEHAVIOURS TO CONSTRUCTS

19

1 . HOW MANY SEA MONSTERS?

20

1 . HOW MANY SEA MONSTERS?

21

2. HOW MANY SEA MONSTERS?

22

2. HOW MANY SEA MONSTERS?

23

3. HOW MANY SEA MONSTERS?

24

3. HOW MANY SEA MONSTERS?

25

 How could you tell the number of sea monsters when you could only see parts of them? You saw visible parts move together and others move independently; you did an intuitive correlation.

 By looking at the correlations between all the parts we can see (observable behaviors), we can infer something about their underlying nature (theoretical constructs).

 Factor Analysis is a statistical method that looks at how lots of different observations correlate and determines how many theoretical constructs could most simply explain what you see.

FROM SEA MONSTERS TO FACTOR
ANALYSIS

26

ADVANCED PROCEDURES:
FACTOR ANALYSIS

What name would you
give to each of these
different factors?

27

ADVANCED PROCEDURES:
MEDIATIONAL ANALYSIS

 This is a particular type of path analysis that tests whether a presumed causal relationship between two variables is due to some particular intervening variable (M - mediating variable)
 E.g., Fraley & Aron, 2004: Strangers meeting while either doing something humourous or non-humourous. Those in the humourous condition felt closer to their partners. Researchers wanted to demonstrate that this result was mediated in part by the humour distracting people from the discomfort of meeting a stranger.
 In other words, the reason humour increased closeness is that it was distracting.

28

ADVANCED PROCEDURES:
MEDIATIONAL ANALYSIS

29

ADVANCED PROCEDURES:
MEDIATIONAL ANALYSIS

Baron & Kenny’s (1986) 4 steps for establishing mediation:
1. Show that X significantly predicts Y.
2. Show that X significantly predicts M.
3. Show that M predicts Y in the context of a multiple regression in which X is also included as a predictor.
4. Show that, when M is included as a predictor of Y (along with X), X no longer predicts Y (for full mediation) or that the prediction is weaker (known as partial mediation).

X  M  Y

*** not for cross-sectional designs!
30
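The four steps can be sketched on simulated data in which X affects Y only through M (all numbers here are hypothetical; assumes numpy is installed):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Simulated data: X -> M -> Y, with no direct X -> Y path
x = rng.normal(size=n)
m = 0.6 * x + rng.normal(size=n)   # step 2: X predicts M
y = 0.7 * m + rng.normal(size=n)   # Y depends only on M

# Step 1: X predicts Y (simple regression slope)
b_total = np.polyfit(x, y, 1)[0]

# Steps 3-4: regress Y on both X and M; X's slope should shrink toward 0
design = np.column_stack([np.ones(n), x, m])
b0, b_x, b_m = np.linalg.lstsq(design, y, rcond=None)[0]

print(round(b_total, 2), round(b_x, 2), round(b_m, 2))
```

With M in the model, the X coefficient drops toward zero while M remains a strong predictor, the pattern Baron & Kenny call full mediation.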

 I know what one study says…. But what about all of the others?

 Review paper: a qualitative summary of the state of the literature on a given research question

 Meta-analysis: a statistical analysis that yields a quantitative summary of a scientific literature.
 Or, a “study of studies”

 Unit of analysis: effect size!

ADVANCED PROCEDURES:
META – ANALYSIS

31

ADVANCED PROCEDURES:
META – ANALYSIS

32

 A major limitation to meta-analysis: The File Drawer Problem

 Caution: just because it is statistical doesn’t mean it is perfectly objective!

 Ego depletion
 Carter, E. C., Kofler, L. M., Forster, D. E., & McCullough, M. E. (2015). A series of meta-analytic tests of the depletion effect: self-control does not seem to rely on a limited resource. Journal of Experimental Psychology: General, 144(4), 796-815. https://doi.org/10.1037/xge0000083

 Hagger, M. S., Wood, C., Stiff, C., & Chatzisarantis, N. L. (2010). Ego depletion and the strength model of self-control: a meta-analysis. Psychological Bulletin, 136(4), 495-525. https://doi.org/10.1037/a0019486

 Ovulator y cycle effects
 Gildersleeve, K., Haselton, M. G., & Fales, M. R. (2014). Do women’s mate

preferences change across the ovulatory cycle? A meta-analytic
review. Psychological Bulletin, 140(5), 1205-1259.
https://psycnet.apa.org/doi/10.1037/a0035438

 Wood, W., Kressel, L., Joshi, P. D., & Louie, B. (2014). Meta-analysis of
menstrual cycle effects on women’s mate preferences. Emotion Review, 6(3),
229-249. https://doi.org/10.1177%2F1754073914523073

ADVANCED PROCEDURES:
META – ANALYSIS

33

META – ANALYSIS:
SOCIAL RELATIONSHIPS AND HEALTH

34

Better!

 Open work hours in tutorial session

 Data Analysis Project!

35

TO DO


TD0409-01 课件/Psy 202_8_Chi_Square_W22.pdf

Instructor:
Dr. Molly Metz

PSY 202H1:
STATISTICS II

MODULE 8:
INTRO TO CHI SQUARE

1

1. Introduction to Chi-Square
1. Research Spotlight: Selfies in the Wild
2. Hypothesis Testing Steps
3. When to use the Chi-Square
4. An example
5. Practice!

2. Chi-Square Test of Independence
1. When We Use It
2. A Research Example
3. Practice

3. Back to the Big Picture

GAME PLAN

2

INTRODUCTION TO
CHI-SQUARE ANALYSIS

3

 Hypothesis tests used thus far tested hypotheses about population parameters.

 Parametric tests share several assumptions
 Normal distribution in the population
 Homogeneity of variance in the population
 Numerical score for each individual

 Nonparametric tests are needed if research situation does not meet all these assumptions.
 More next week!

 Nonparametric tests…
 Make few assumptions about distribution (as compared to all of our assumptions about normality and variance for z, t, F, etc.)
 Usually use categories/frequencies

PARAMETRIC VS NONPARAMETRIC TESTS

4

Statistical Test                IV                           DV
Correlation/Linear regression   Continuous                   Continuous
Independent samples t-test      Two independent categories   Continuous
Paired sample t-test            Two related groups           Continuous
ANOVA                           Multiple categories          Continuous
Chi-square                      Two or more categories       Categorical
5

THE CHI-SQUARE STATISTIC

 Most statistical tests you learn require quantitative
data (correlation, z-test, t-test, etc.)

 What if we have questions about categories or
classifications?
 Do college students prefer Coke or Pepsi?
 Is the racial breakdown of UofT representative of the general

population?

 These questions involve counting the number of
people in different groups/categories

 They involve frequency distributions

6

THE CHI-SQUARE STATISTIC

 The Chi-Square statistic: χ2
 Tests whether one set of proportions is different from

another
 Done by comparing frequencies (counts)

 Two types of hypothesis tests
 χ2 Goodness-of-fit test
 χ2 Test of independence

7

χ2 TEST FOR GOODNESS-OF -FIT

 Goodness-of-fit test uses frequency data from
a sample to test hypotheses about proportions
in a population.

 Each individual is classified into ONE category
on the variable of interest.
 Do you prefer Coke or Pepsi?
 Do you prefer the original or prequel Star Wars movies?

 Simply count how many people in the sample
are in each category

8

χ2 TEST FOR GOODNESS-OF -FIT

 H0 specifies the proportion of the population
that should be in each category.

 The proportions from H0 are used to compute
expected frequencies

 The expected frequencies describe how the sample
would appear if H0 was true

 χ2 then compares observed frequencies (from the
sample) to expected frequencies (from H0)

9

χ2 TEST FOR GOODNESS-OF -FIT

 Why is it called “goodness-of-fit?”

 We test whether our “observed” frequencies
“fit” against our “expected” frequencies.

 Kind of like model testing (remember, R2 as a
statistic of “goodness of fit”)

10

https://informationisbeautiful.net/visualizations/diversity-in-tech/

RESEARCH SPOTLIGHT:
HAVE WE REACHED
GENDER PARITY IN TECH
COMPANIES?

11

 When would you say that gender equality had been achieved
in tech?

 Figure out demographic breakdowns of country
 51% female in Canada (Census 2016)

 Figure out demographic breakdowns of company
 https://informationisbeautiful.net/visualizations/diversity-in-tech/

 Do they match?

GENDER PARITY IN TECH

12

GENDER PARITY IN TECH

13

RESEARCH PROBLEM

 Does a new teaching method improve test performance
on a standardized math test?

 In prior years, 60% of students passed the test (40% failed).

 Data from the CURRENT school year (200 children):

 Is there a significant change in test performance?

Student Performance this Year
Pass   Fail
150    50     Total n = 200

14

 This is “frequency” (or “count”) data
 200 children were sampled
 150 children passed
 50 children failed

RESEARCH PROBLEM

Test Performance

Pass Fail
150 50 Total n = 200

15

STEP 1: STATISTICAL HYPOTHESES

H0: There is no change/difference in student
performance

The pass rate this year (with the new teaching method) will be
the same as the pass rate in prior years (60% pass, 40% fail).

H1: There is a change/difference in student
performance

16

STEP 2: FIND CRITICAL VALUE

Two pieces of information needed
 α level
 df = C-1 (where C = number of categories)

Critical value from Table
 α = .01
 df = 2 -1 = 1
 Critical value = 6.63

17

CRITICAL VALUE OF χ2

χ2
6.63

Decision Rule: If observed χ2 equals or exceeds 6.63, then reject Ho

18

STEP 3: COMPUTE OBSERVED χ2

 fo = observed frequency (for each cell)
 fe = expected frequency (for each cell) = pn
 p = proportion stated in the null hypothesis
 n = total sample size

χ² = Σ (fo − fe)² / fe

19

COMPUTE OBSERVED χ2

 How do you find p?
 We are given information about the known population

distribution in previous years.

 60% pass and 40% fail.

 Thus the proportions (p) under the null hypothesis
are:
 p = .60 pass
 p = .40 fail

 If the problem doesn’t specify, figure out what the
question is asking: e.g., if 2 sodas are preferred at equal
rates, what proportion of people should prefer each one?
What about 3 different sodas?

20
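The rule of thumb above can be sketched in a few lines of Python (a hypothetical helper, not part of the slides): under a "no preference" null hypothesis, each of C categories gets an expected proportion of 1/C.

```python
# Hypothetical helper (not from the slides): expected proportions under a
# "no preference" null hypothesis are simply 1 / (number of categories).

def null_proportions(n_categories):
    return [1 / n_categories] * n_categories

print(null_proportions(2))  # [0.5, 0.5] — two sodas preferred equally
print(null_proportions(3))  # each of three sodas gets p = 1/3
```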

COMPUTE OBSERVED χ2

 Compute expected frequencies (pn):

Student Performance
                     Pass   Fail
Observed
frequencies (fo)     150    50     Total n = 200

21

COMPUTE OBSERVED χ2

 Compute expected frequencies (pn):

Student Performance
                     Pass   Fail
Observed
frequencies (fo)     150    50     Total n = 200
Expected
frequencies (fe)     fe = pn

22

COMPUTE OBSERVED χ2

 Compute expected frequencies (pn):

Student Performance
                     Pass        Fail
Observed
frequencies (fo)     150         50          Total n = 200
Expected
frequencies (fe)     .60 × 200   .40 × 200

23

COMPUTE OBSERVED χ2

 Compute expected frequencies (pn):

Student Performance
                     Pass              Fail
Observed
frequencies (fo)     150               50            Total n = 200
Expected
frequencies (fe)     .60 × 200 = 120   .40 × 200 = 80

If H0 is true (and there is no change), we expect
to see 120 students pass and 80 fail.

24

COMPUTE OBSERVED χ2

 Step 4: Calculate χ2

Student Performance
                     Pass   Fail
Observed
frequencies (fo)     150    50
Expected
frequencies (fe)     120    80

25

COMPUTE OBSERVED χ2

 Step 4: Calculate χ2


=
e

eO

f
ff 22 )(χ

Student Performance

Pass Fail
Observed

frequencies (fo)
150 50

Expected
frequencies (fe)

120 80

26

COMPUTE OBSERVED χ2

 Step 4: Calculate χ2


=
e

eO

f
ff 22 )(χ

+∑

=
120

)120150( 22χ

Student Performance

Pass Fail
Observed

frequencies (fo)
150 50

Expected
frequencies (fe)

120 80

27

COMPUTE OBSERVED χ2

 Step 4: Calculate χ2


=
e

eO

f
ff 22 )(χ

80
)8050(

120
)120150( 222 −+∑


Student Performance

Pass Fail
Observed

frequencies (fo)
150 50

Expected
frequencies (fe)

120 80

28

COMPUTE OBSERVED χ2

 Step 4: Calculate χ2


=
e

eO

f
ff 22 )(χ

80
)8050(

120
)120150( 222 −+∑


=χ = 7.5 +11.25 = 18.75

Student Performance

Pass Fail
Observed

frequencies (fo)
150 50

Expected
frequencies (fe)

120 80

29
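The hand calculation above can be checked with a short Python sketch (illustrative code, not from the slides; the function name is made up):

```python
# Goodness-of-fit chi-square by hand: fe = p * n, then sum (fo - fe)^2 / fe.

def chi_square_gof(observed, null_proportions):
    n = sum(observed)
    expected = [p * n for p in null_proportions]
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# 150 passed, 50 failed; H0 says 60% pass, 40% fail.
chi2 = chi_square_gof([150, 50], [0.60, 0.40])
print(round(chi2, 2))  # 18.75 — exceeds the critical value of 6.63
```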

STEP 4: MAKE A DECISION

Reject Ho
Because observed χ2 (18.75) exceeds the

critical value (6.63)

30

STEP 5: REPORT RESULTS

 What does this mean?
 There was a significant change in test performance.

Students performed better this year (75% passed)
compared to prior years (60% passed).

 “Based on data from the current school year, test
performance was significantly improved with the new
teaching method, χ2 (1, N = 200) = 18.75, p < .01. A
larger percentage of students passed the test this year
(75%) compared to prior years (60%)”

31

REPORTING A χ2

 A closer look…

χ2(1, N = 200) = 18.75, p < .01

 χ2 → test statistic
 (1) → degrees of freedom
 N = 200 → sample size (reported for chi-square only)
 18.75 → observed value
 p < .01 → alpha level
 One- or two-tailed? Other tests can be either, but all chi-square tests are one-tailed!

Learning check!

32

CHI SQUARE TEST OF
INDEPENDENCE

33

THE CHI-SQUARE STATISTIC

 Working with categorical variables
 Two types of hypothesis tests
 χ2 Goodness-of-fit test
 χ2 Test of independence

 The Goodness of fit test
 We have one variable
 We test whether observed frequencies (proportions) match

expected or hypothesized frequencies

34

THE CHI-SQUARE STATISTIC

 What if we have more than one variable?

 What if we have questions about the relationship
between two categorical variables?
 Are women more likely than men to prefer Coke to Pepsi?
 Do students vs. faculty differ in their opinion

about raising student fees (yes/no)?

 Need the Chi-square test of independence

35

Lloyd, Hugenberg, & colleagues

RESEARCH SPOTLIGHT:
ARE THERE SYSTEMATIC
DIFFERENCES IN WHAT
KIND OF SELFIES MEN
AND WOMEN POST?
• Angle related to power
• Angle related to gender
• Power related to gender
• Downstream consequences?

36

STUDY 1 METHOD

 Compiled 932 selfies from www.iconosquare.com
(Instagram)

 4 trained raters
 Judged target gender
 Judged whether selfie was taken below, at, or above eye level

37

STUDY 1: METHOD

38

STUDY 1: RESULTS

          Low                 Neutral          High
          (below eye level)   (at eye level)   (above eye level)
Male      139                 240              87
Female    65                  230              171

χ2 (2, N = 932) = 54.40, p < .001
39

STUDY 1: DISCUSSION

 People take selfies from varied angles
 Angles chosen differ by target gender

40

RESEARCH PROBLEM

 Are people more likely to litter when the environment
is already dirty?

 Conduct an experiment:
Hand people a flier at the entrance to a parking lot
Parking lot is either dirty or clean
Measure whether person throws flier on the ground

 Is there a significant association between cleanliness of
the environment and littering?

 Kind of like a correlation, but for categorical variables

41

THE CHI-SQUARE STATISTIC

 Need a new chi-square statistic

 The Chi-Square test of independence
 Tests whether two categorical variables are related to each

other
 Whether two variables “depend” on each other
 Done by comparing frequencies (counts)

42

χ2 TEST OF INDEPENDENCE

 Test of independence uses frequency data from a sample
to test hypotheses about proportions in a population.

 Each individual is classified into one category based on
the combination of two variables
 Are women more likely than men to prefer Coke to Pepsi?
 Do students vs. faculty differ in their opinion
about raising student fees (yes/no)?

 Simply count how many people in the sample are in each
category

43

χ2 TEST OF INDEPENDENCE

 H O states that the two variables ARE NOT related
 Assumes that frequencies (proportions) on one variable are the

same across levels of the other variable

 H 1 states that the two variables ARE related

 χ2 then compares observed frequencies (from the
sample) to expected frequencies

 Expected frequencies are computed from
sample data

44

χ2 TEST OF INDEPENDENCE

 Why is it called “test of independence”?

 We test whether the frequencies (proportions) on
one variable are independent from another variable

45

COMPUTING χ2

 Same formula for χ2

 fo = observed frequency (for each cell)
 fe = expected frequency (for each cell)
 But getting expected frequencies (fe) is a bit more
complicated!

χ² = Σ (fo − fe)² / fe
46

IMPORTANT!!!
This is different
from Goodness
of Fit method!!!

COMPUTING χ2

 Calculate χ2

 where fo = observed frequency
 fe for each cell is:

   fe = (fc × fr) / n

 fc = column total
 fr = row total
 n = total sample size

χ² = Σ (fo − fe)² / fe

47

HYPOTHESIS TESTING STEPS

Step 1: State the statistical hypotheses

Step 2: Create a decision rule

Step 3: Collect data and compute “observed”
test statistic

Step 4: Make a decision

Step 5: Report and summarize your results

48

RESEARCH PROBLEM

 Are people more likely to litter when the environment
is already dirty?

 Conduct an experiment:

 Hand people a flier at the entrance to a parking lot

 Parking lot is either dirty or clean

 Measure whether person throws flier on the ground

 Is there a significant association between cleanliness of
the environment and littering?

49

RESEARCH PROBLEM

 Are people more likely to litter when the environment is
already dirty?

 Data from 100 participants:

 Is there a significant relationship between cleanliness
of the environment and littering behavior?

Subject’s response

Environment   No Litter   Litter
Clean         45          5
Dirty         30          20

Total n = 100

50

STATISTICAL HYPOTHESES

 State the Statistical Hypotheses
H0: There is no relationship between cleanliness of

the environment and littering

H1: There is a predictable relationship between
cleanliness of the environment and littering

51

FIND CRITICAL VALUE

 Create Decision Rule (find critical value)
Two pieces of information needed
 α level
 df = (R-1)(C-1)

(where R=number of rows, C = number of columns)

Critical value from Table
 α = .05
 df = (2 -1)(2-1) = 1
 Critical value = 3.84

52

CRITICAL VALUE OF χ2

χ2
3.84

Decision Rule: If observed χ2 equals or exceeds 3.84, then reject Ho

53

COMPUTE OBSERVED χ2

 Calculate χ2

 where fo = observed frequency
 fe for each cell is:

   fe = (fc × fr) / n

 fc = column total
 fr = row total
 n = total sample size

χ² = Σ (fo − fe)² / fe

54

IMPORTANT!!!
This is different
from Goodness
of Fit method!!!

COMPUTE EXPECTED
FREQUENCIES

 Observed frequencies with row and column totals:

Subject’s response
Environment     No Litter   Litter   Row totals
Clean           45          5        50
Dirty           30          20       50
Column totals   75          25       Total n = 100

 Next, compute expected frequency for each cell:  fe = (fc × fr) / n
55

COMPUTE EXPECTED
FREQUENCIES

 Expected frequency for each cell:  fe = (fc × fr) / n

Subject’s response
Environment     No Litter   Litter   Row totals
Clean           37.5        12.5     50
Dirty           37.5        12.5     50
Column totals   75          25       Total n = 100

fe(Clean, No Litter) = (75 × 50) / 100 = 37.5
fe(Clean, Litter)    = (25 × 50) / 100 = 12.5
fe(Dirty, No Litter) = (75 × 50) / 100 = 37.5
fe(Dirty, Litter)    = (25 × 50) / 100 = 12.5

IMPORTANT!!!
This is different
from Goodness
of Fit method!!!

56

COMPUTE OBSERVED χ2

 Calculate χ2

χ² = Σ (fo − fe)² / fe

χ² = (45 − 37.5)²/37.5 + (5 − 12.5)²/12.5 + (30 − 37.5)²/37.5 + (20 − 12.5)²/12.5

   = 1.5 + 4.5 + 1.5 + 4.5 = 12.00

Subject’s response (fo / fe)

Environment   No Litter   Litter
Clean         45/37.5     5/12.5
Dirty         30/37.5     20/12.5

57
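The same arithmetic can be sketched in Python (a hypothetical helper, not from the slides), computing each cell's fe = (fc × fr)/n and summing (fo − fe)²/fe:

```python
# Chi-square test of independence: fe for each cell is
# (row total x column total) / n.

def chi_square_independence(table):
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            chi2 += (fo - fe) ** 2 / fe
    return chi2

# Rows: clean, dirty environment; columns: no litter, litter.
print(chi_square_independence([[45, 5], [30, 20]]))  # 12.0 — reject H0 (crit = 3.84)
```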

MAKE A DECISION

 Make a decision
Reject Ho
Because observed χ2 (12.00) exceeds the

critical value (3.84)

58

 Report Results
 “Results revealed a significant association between cleanliness

of the environment and people’s tendency to litter, χ2 (1, N =
100) = 12.0, p < .05. Participants were much more likely to
litter in a dirty environment (40%) than in a clean environment
(10%).”
 The sample data suggest that there is a significant association

between cleanliness of the environment and people’s tendency
to litter. When people were in a dirty environment they were
much more likely to litter (40%) compared to when they were
in a clean environment (10%).

 Where did I get 40% and 10%?
 20/50 littered in the dirty condition = 40%
 5/50 littered in the clean condition = 10%

REPORT AND SUMMARIZE FINDINGS

59

REPORTING A χ2

 A closer look…

χ2(1, N = 100) = 12.0, p < .05

 χ2 → test statistic
 (1) → degrees of freedom
 N = 100 → sample size (reported for chi-square only)
 12.0 → observed value
 p < .05 → alpha level
 One- or two-tailed? Other tests can be either, but all chi-square tests are one-tailed!

60

COHEN’S W

 Cohen’s w can be used to measure effect size for both types
of chi-square tests:

   w = √( Σ (Po − Pe)² / Pe )

   where Po = fo / n (observed proportion) and Pe = expected proportion

 Cohen suggested that .10 is a small effect, .30 a medium
effect, and .50 a large effect.

 Cohen’s w does not use the sample size, therefore the sample
size does not affect the value of w.

61

THE PHI-COEFFICIENT

 For a 2×2 matrix, the phi coefficient (Φ) measures the
strength of the relationship

   Φ = √( χ² / n )

So Φ2 is the proportion of variance
accounted for, just like r2

62

EFFECT SIZE IN A LARGER MATRIX

 For a larger matrix, a modification of the
phi-coefficient is used: Cramér’s V

   V = √( χ² / (n × df*) )

 df* is the smaller of (R − 1) or (C − 1)

63
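A small Python sketch of the three effect-size formulas above (illustrative helper functions, applied to the values from the worked examples on these slides):

```python
import math

def cohens_w(observed_p, expected_p):
    # w = sqrt( sum (Po - Pe)^2 / Pe )
    return math.sqrt(sum((po - pe) ** 2 / pe
                         for po, pe in zip(observed_p, expected_p)))

def phi(chi2, n):
    # 2x2 tables only
    return math.sqrt(chi2 / n)

def cramers_v(chi2, n, df_star):
    # df* = smaller of (R - 1) and (C - 1)
    return math.sqrt(chi2 / (n * df_star))

print(round(cohens_w([0.75, 0.25], [0.60, 0.40]), 3))  # 0.306 — medium effect
print(round(phi(12.0, 100), 3))                        # 0.346
print(round(cramers_v(12.0, 100, 1), 3))               # 0.346 (matches phi for a 2x2)
```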

ASSUMPTIONS & R ESTRICTIONS
FOR CHI-SQUARE TEST S

 Independence of Observations
 E.g., The observation that Subject A is a Chemistry
major must be independent from the observation
that Subject B is an English major
 Random sampling

 Each observed frequency needs to come from a
different participant
 What if people can be double-majors?

64

ASSUMPTIONS & R ESTRICTIONS
FOR CHI-SQUARE TEST S

 Size of Expected Frequencies
 Cochran’s Rule: Cell frequencies should all be > 5

 More lenient updates to the rule:
 No expected cell frequency should be less than 1
 No more than 20% of the expected cell frequencies should be less than 5
 Note: For a 2×2 matrix this means a single cell

 Solutions?
 Increase your sample size
 Consider collapsing categories together (should be done with caution –
can make it more difficult to reject H0)

65
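The cell-frequency checks above can be expressed as a tiny Python helper (hypothetical code, for illustration only):

```python
# Check expected cell frequencies against Cochran's rule and the more
# lenient updates described above.

def check_expected_frequencies(expected):
    cells = [fe for row in expected for fe in row]
    return {
        "cochran_ok": all(fe > 5 for fe in cells),     # all cells > 5
        "none_below_1": all(fe >= 1 for fe in cells),  # no cell below 1
        "under_5_share_ok": sum(fe < 5 for fe in cells) / len(cells) <= 0.20,
    }

# Expected frequencies from the littering example: all checks pass.
print(check_expected_frequencies([[37.5, 12.5], [37.5, 12.5]]))
```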

ASSUMPTIONS & R ESTRICTIONS
FOR CHI-SQUARE TEST S

 E.g., Expected Frequencies

              Teens   Young Adults   Middle Aged   Seniors
Liberal       20      18             9             3
Conservative  4       12             2             8

              Young   Old
Liberal       38      12
Conservative  16      10

Learning check!

66

BACK TO THE BIG
PICTURE

67

What type of claim?

 Frequency (one variable) → Chi-Sq Goodness

 Association (two variables)
 Categorical? → Chi-Sq Ind
 Quantitative? → Correlation

HOW TO CHOOSE A TEST

68

 Data project
 *** Important! Updated Data File as of today, March 14

69

TO-DO


TD0409-01 课件/Psy 202_6_correlation_W22_topost.pdf

Instructor:
Dr. Molly Metz

PSY 202H1:
STATISTICS II

MODULE 6:
INTRO TO CORRELATION

1

1. Intro to Correlation
2. Hypothesis Testing with Correlation
3. What are correlations used for?
4. Interpreting Correlation

1. Issues to look out for

GAME PLAN

2

INTRO TO CORRELATION

3

THE BIG PICTURE

Making comparisons to population (NO IVs):
 Single score → z score
 Sample mean, σ known → z test
 Sample mean, σ unknown → One sample t-test

Making comparisons between levels of IV(s) or groups:
 1 IV, 2 levels:
  Between subjects → Independent samples t-test
  Within subjects → Paired samples t-test
 1 IV, 3+ levels:
  Between subjects → One-Way Between ANOVA
  Within subjects → One-Way Repeated ANOVA
 More than 1 IV:
  All IVs between subjects → Between subj Factorial ANOVA
  All IVs within subjects → Repeated Measures Factorial ANOVA
  Mix of within and between → Mixed Model Factorial ANOVA

4

Statistical Test                IV                           DV
Correlation/Linear regression   Quantitative                 Quantitative
Independent samples t-test      Two independent categories   Quantitative
Paired sample t-test            Two related groups           Quantitative
ANOVA                           Multiple categories          Quantitative
Chi-square                      Two or more categories       Categorical

5

RESEARCH PROBLEM

 What is the relationship between hours studying and
scores on a quiz?
 Conduct a non-experimental study
 n = 6 students

 Measure hours studying for an exam (X)

 Record each student’s quiz score (Y)

 Examine association between hours studying and quiz
scores

 Does study time predict quiz scores?

6

RESEARCH PROBLEM

 Correlation
 Direction and strength of an association between two variables

(X,Y)

 Typically (but not only) used in non-experimental research
(variables are measured, not manipulated)

 Other examples:
 Relationship between stressful life events (X) and number of

illness symptoms (Y)

 Relationship between years of education (X) and yearly income (Y)

7

TOOLS FOR CORRELATION

 The Scatterplot
 A figure

 Shows association between two variables

 The Pearson correlation coef ficient
 A statistic

 Describes the direction and strength of a linear association
between two continuous variables

8

THE SCATTERPLOT

 Hours studying and quiz scores

Student   Study Hrs (X)   Test Score (Y)
A         1               1
B         1               3
C         3               4
D         4               5
E         6               4
F         7               6

n = 6 people, 6 pairs of scores

9

THE SCATTERPLOT

 Hours studying and quiz scores

[Scatterplot: “Relationship Between Hours Studying and Quiz Score” — Hours Studying (X) vs. Quiz Score (Y)]

Student   Hours (X)   Score (Y)
A         1           1
B         1           3
C         3           4
D         4           5
E         6           4
F         7           6

10


[Scatterplot: Height (in) vs. Weight (lbs.)]

SEEING RELATIONSHIPS

15

SUMMARIZING RELATIONSHIPS

[Scatterplot: Height (in) vs. Weight (lbs.)]

Linear
relationship

Describes variables that
can be well-represented
by a straight line (i.e.,
there is a common ratio
between a score on one
and a score on the other)

16

[Scatterplot: Height (in) vs. Weight (lbs.)]

SUMMARIZING RELATIONSHIPS

17

SUMMARIZING RELATIONSHIPS

[Scatterplot: Grade Point Average vs. Party hours (week)]

18

SUMMARIZING RELATIONSHIPS

[Plot: Reported Anxiety (X) vs. Exam performance (Y)]

Curvilinear relationship

19

 http://www.pewresearch.org/fact-tank/2015/09/16/the-art-and-science-of-the-scatterplot/

NOT A GIVEN…

20

DESCRIBING RELATIONSHIPS

When we talk about statistical relationships,
we begin by assessing the covariance, or
degree to which two variables vary together.

This statistic is used as the basis for the
correlation coefficient, a statistic that
measures the relationship between variables.
 Pearson’s product-moment correlation: r
 Spearman’s rank-order correlation: rs
 Point-biserial correlation: rpb

21

THE CORRELATION COEFFICIENT:
BASICS

 Pearson Correlation Coefficient
 Symbol: r

 Ranges from -1.0 to +1.0

 Sign (+/-) indicates “direction” of relationship

 Value indicates “strength” of relationship

• Some general guidelines
• .10 is weak
• .30 is moderate
• .50 is strong

 Measures a linear relationship only

Remember: r2 guidelines
• .01 weak
• .09 moderate
• .25 strong

22

THE CORRELATION COEFFICIENT

Figure 16-3 (p. 524). Examples of positive and negative relationships. (a) Beer sales are
positively related to temperature. (b) Coffee sales are negatively related to temperature.

Positive Correlation
X = Temperature
Y = Beer Sales

Negative Correlation
X = Temperature
Y = Coffee Sales

23

THE CORRELATION COEFFICIENT

Figure 16-5 (p. 525). Examples of different values for linear correlations: (a) shows a strong positive
relationship, r = +.90; (b) shows a moderate negative correlation, r = –.40; (c) shows a perfect negative
correlation, r = –1.0; (d) shows no linear trend, r = 0.0.

r = +.90    r = −.40    r = −1.0    r = 0

How closely do the dots hug the line?

24

COMPUTING R

r = degree to which X & Y vary together
degree to which X & Y vary separately

r = Covariance of X & Y
Variance of X & Y

25

COVARIABILIT Y OF X AND Y

[Diagram: overlapping circles for X and Y — variance in X alone, variance in Y alone, and the overlap (XY) = covariance between X and Y]

• The greater the covariance, the greater the correlation (the closer r will be to ±1.0)
26

COMPUTING R

 Computational formulas for Pearson r

   SP = ΣXY − (ΣX)(ΣY) / n

   SSX = ΣX² − (ΣX)² / n        SSY = ΣY² − (ΣY)² / n

   r = SP / √(SSX · SSY)

Where:

• SP = “Sum of products”

• SS = “Sum of squares”

SP = similar to SS, but for COvariance
Learning check! 27
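These computational formulas can be sketched directly in Python (an illustrative helper, not from the slides):

```python
import math

# Pearson r via the computational formulas: SS_X, SS_Y, and SP.

def pearson_r(xs, ys):
    n = len(xs)
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sp / math.sqrt(ss_x * ss_y)

hours = [1, 1, 3, 4, 6, 7]   # X: hours studying
scores = [1, 3, 4, 5, 4, 6]  # Y: quiz scores
print(round(pearson_r(hours, scores), 3))  # 0.819 — matches the worked example
```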

HYPOTHESIS TESTING FOR R

 State the research question
 Is there a significant linear association between X & Y?

 Is r significantly different from zero?

ρ = “rho” the population parameter

 r = sample statistic

28

HYPOTHESIS TESTING FOR R

 Step 1: Statistical Hypotheses for r
 Almost always two-tailed (non-directional)
 H0: ρ = 0
 H1: ρ ≠ 0

 One-tailed upper (directional)
 H0: ρ ≤ 0
 H1: ρ > 0

 One-tailed lower (directional)
 H0: ρ ≥ 0
 H1: ρ < 0

29

HYPOTHESIS TESTING FOR R

 Step 2: Find critical value of r (Table)
 Need 3 pieces of information:
 α

 One-tailed or two-tailed?
 degrees of freedom: df = n−2

30

31

HYPOTHESIS TESTING FOR R

 Step 2: Find critical value of r (Table)
 Need 3 pieces of information:
 α

 One-tailed or two-tailed?
 degrees of freedom: df = n−2

 Step 3: Compute obser ved r
 Step 4: Make a decision
 Reject H0 if observed r exceeds rcritical

 Step 5: Summarize and repor t findings

32

LET’S PRACTICE!

 Research question
 Is there a significant linear association between hours

studying and quiz score?

 Is r significantly different from zero?

 Step 1: Statistical Hypotheses

 H0: ρ = 0
 H1: ρ ≠ 0

Step 2: Find rcritical in Table α = .05
 Two-tailed
 df = n−2; df = 6−2 = 4

33

34

LET’S PRACTICE!

 Research question
 Is there a significant linear association between hours studying

and quiz score?

 Is r significantly different from zero?

 Step 1: Statistical Hypotheses

 H0: ρ = 0
 H1: ρ ≠ 0

 Step 2: Find rcritical in Table  α = .05
 Two-tailed
 df = n−2; df = 6−2 = 4
 From table  rcrit = ±.811

35

LET’S PRACTICE

 Step 3: Compute observed r

 Steps in computing r:

 Compute SSX
 Compute SSY
 Compute SP

 Compute r

36

LET’S PRACTICE

 Hours studying and quiz scores

Student   Hours (X)   Score (Y)
A         1           1
B         1           3
C         3           4
D         4           5
E         6           4
F         7           6
n = 6     ΣX = 22

SSX = ΣX² − (ΣX)² / n

37

LET’S PRACTICE

 Hours studying and quiz scores

Student   Hours (X)   Score (Y)   X²
A         1           1           1
B         1           3           1
C         3           4           9
D         4           5           16
E         6           4           36
F         7           6           49
n = 6     ΣX = 22                 ΣX² = 112

SSX = ΣX² − (ΣX)² / n

38

COMPUTING R

 Compute SSX

SSX = ΣX² − (ΣX)² / n

SSX = 112 − (22)² / 6 = 31.333

39

LET’S PRACTICE

 Hours studying and quiz scores

Student   Hours (X)   Score (Y)   X²
A         1           1           1
B         1           3           1
C         3           4           9
D         4           5           16
E         6           4           36
F         7           6           49
n = 6     ΣX = 22     ΣY = 23     ΣX² = 112

SSY = ΣY² − (ΣY)² / n

40

LET’S PRACTICE

 Hours studying and quiz scores

Student   Hours (X)   Score (Y)   X²    Y²
A         1           1           1     1
B         1           3           1     9
C         3           4           9     16
D         4           5           16    25
E         6           4           36    16
F         7           6           49    36
n = 6   ΣX = 22   ΣY = 23   ΣX² = 112   ΣY² = 103

SSY = ΣY² − (ΣY)² / n

41

COMPUTING R

 Compute SSY

SSY = ΣY² − (ΣY)² / n

SSY = 103 − (23)² / 6 = 14.833

42

LET’S PRACTICE

 Hours studying and quiz scores

Student   Hours (X)   Score (Y)   X²    Y²
A         1           1           1     1
B         1           3           1     9
C         3           4           9     16
D         4           5           16    25
E         6           4           36    16
F         7           6           49    36
n = 6   ΣX = 22   ΣY = 23   ΣX² = 112   ΣY² = 103

SP = ΣXY − (ΣX)(ΣY) / n

43

LET’S PRACTICE

 Hours studying and quiz scores

Student   Hours (X)   Score (Y)   X²    Y²    XY
A         1           1           1     1     1
B         1           3           1     9     3
C         3           4           9     16    12
D         4           5           16    25    20
E         6           4           36    16    24
F         7           6           49    36    42
n = 6   ΣX = 22   ΣY = 23   ΣX² = 112   ΣY² = 103   ΣXY = 102

SP = ΣXY − (ΣX)(ΣY) / n

44

COMPUTING R

 Compute SP:

SP = ΣXY − (ΣX)(ΣY) / n

SP = 102 − (22)(23) / 6 = 17.667

45

COMPUTING R

 Finally, compute r!

r = SP / √(SSX · SSY)

r = 17.667 / √((31.333)(14.833)) = +.819

46

LET’S PRACTICE!

Step 4: Make a Decision

 Reject H0: robs (+.819) exceeds rcrit (±.811)

Step 5: Summarize and report finding

“There was a statistically significant positive correlation
between hours studying and quiz scores, r(4) = .82, p <
.05, two-tailed, r2 = .67. Students who studied longer
earned higher scores on the quiz.”

Notice: No causal
language!

47
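Steps 4 and 5 above amount to a simple comparison against the tabled critical value (a sketch only; rcrit = .811 comes from the slides' table for df = 4, α = .05, two-tailed):

```python
r_obs, r_crit = 0.819, 0.811

# Two-tailed decision rule: reject H0 if |r_obs| >= r_crit.
decision = "reject H0" if abs(r_obs) >= r_crit else "fail to reject H0"
r_squared = r_obs ** 2  # effect size: proportion of variance explained

print(decision)             # reject H0
print(round(r_squared, 2))  # 0.67
```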

LET’S PRACTICE!

Compute r2 (“coefficient of determination”)

 Effect size

 r2 = .8192 = .67

 67% of the variance in quiz scores is explained by
hours studying (and vice versa)

48

REPORTING AN R

 A closer look…

r(4) = .82, p < .05, two-tailed, r2 = .67

 r → test statistic
 (4) → degrees of freedom
 .82 → observed value
 p < .05 → alpha level
 two-tailed → one- or two-tailed
 r2 = .67 → effect size

49

 Quantitative data
 Independent observations
 Random sampling
 Linear relationship

ASSUMPTIONS FOR PEARSON’S R

Learning check!
50

SPOTLIGHT ON T WIN STUDIES

51

52

WHAT ARE
CORRELATIONS FOR?

COMMON USES FOR CORRELATIONS

Prediction
Note: this is NOT causal language

Measurement assessment
Validity (accuracy)
Reliability (consistency)

53

VARIOUS USES OF CORRELATIONS

 Prediction  If we know that two variables are related to one
another, we can use knowledge about one variable to make
predictions about the value of the other variable
 E.g., How tall do you think my niece is? Does it help if I tell you that
she just turned 5?

54

VARIOUS USES OF CORRELATIONS

 Validity of measures
 Convergent validity: How strongly does the measure correlate with

other measures of the same construct?
 E.g., Does the self-esteem measure you’ve just constructed correlate

positively with existing self-esteem measures? (good thing)

 Discriminant validity: How strongly does the measure correlate with
measures of unrelated constructs?
 E.g., Does the self-esteem measure you’ve just constructed correlate

positively with measures of unrelated constructs (e.g., mood)? (bad thing)

55

VARIOUS USES OF CORRELATIONS

 Reliability of measures
 Reliable measures should produce consistent, stable results
 E.g., If you are measuring IQ, or a personality trait, or any other

construct where you expect stable results, you would expect a
person’s scores from any two measurement sessions to be highly
correlated

56

VARIOUS USES OF CORRELATIONS

 Theory Verification  Many psychological theories involve
specific predictions about the relationship between two
variables
 One way these predictions can be tested is by determining the

correlation between the two variables
 E.g., The General Aggression Model predicts positive relationships

between recent exposure to violent media and a host of aggression-
related variables (hostile expectancies, aggressive cognitions,
physiological arousal, etc.)

57

INTERPRETING
CORRELATION

58

PROCEED WITH CAUTION…

1. Correlation is sensitive to outliers

2. Correlation is only appropriate for describing
linear relationships

3. Correlation is sensitive to restriction of range
(lack of generalization)

4. Beware of heterogeneous samples

5. Correlation does not imply causation

59

1. SENSITIVE TO OUTLIERS

[Two scatterplots of the same data: without the outlier, r = −.10; with a single
extreme point included, r = .94]

• An outlier is an extremely deviant individual in the sample
• Characterized by a much larger (or smaller) score than all the others in the sample
• In a scatterplot, the point is clearly different from all the other points
• Outliers produce a disproportionately large impact on the correlation coefficient

60
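The impact of a single deviant point is easy to demonstrate numerically. A sketch with made-up data (the `pearson_r` helper is the standard textbook computation, not taken from the slides):

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    return sp / sqrt(ss_x * ss_y)

# Five made-up points with no linear trend at all
x = [1, 2, 3, 4, 5]
y = [2, 1, 3, 1, 2]
r_without = pearson_r(x, y)          # exactly 0 for these points

# The same five points plus one extremely deviant individual
r_with = pearson_r(x + [20], y + [20])

print(f"without outlier: r = {r_without:+.2f}")
print(f"with outlier:    r = {r_with:+.2f}")
```

One added point drags a zero correlation up near +1, which is why a scatterplot should always be inspected first.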

2. LINEAR RELATIONSHIPS ONLY

[Scatterplot: exam performance (y-axis) vs. reported anxiety (x-axis) shows a
clear curvilinear relationship, yet r = .10]

61

3. RESTRICTION OF RANGE

[Scatterplot: score on general math test (y-axis) vs. score on IQ test (x-axis).
Across the full range of IQ scores (75–130), r = .82; within the restricted range
(105–125), r = −.13]

62
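Restriction of range can also be shown numerically. A sketch with fabricated scores (y tracks x plus alternating noise; the numbers are invented for illustration only):

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    return sp / sqrt(ss_x * ss_y)

# Made-up scores: y follows x across the full range, plus alternating noise
x = list(range(20))
y = [xi + 3 * (-1) ** xi for xi in x]

r_full = pearson_r(x, y)

# Keep only the individuals whose x falls in a narrow band (8 to 11)
pairs = [(xi, yi) for xi, yi in zip(x, y) if 8 <= xi <= 11]
xr = [p[0] for p in pairs]
yr = [p[1] for p in pairs]
r_restricted = pearson_r(xr, yr)

print(f"full range:       r = {r_full:+.2f}")
print(f"restricted range: r = {r_restricted:+.2f}")
```

Within the narrow band the noise dominates the small spread in x, so the strong full-range correlation nearly vanishes.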

4. HETEROGENEOUS SAMPLES

[Scatterplot: performance on exam (y-axis) vs. reported anxiety (x-axis), combining
two distinct subgroups; overall r = −.70]
63

AND NOW FOR SOME EXAMPLES…

64

65

66

67

68

69

We should all be texting while at Church (and
also having unprotected sex)!

Thinking of cleaning as women’s work is
actually better for both men and women
(especially if women do more housework, to
cut their risk of cancer)!

Any chance there is a problem here???

SO WHAT HAVE WE LEARNED?

70

5. CORRELATION IS NOT
CAUSATION

[Scatterplot: deaths by drowning (y-axis) vs. ice cream sales (x-axis); a strong
positive correlation]

If X and Y are correlated:

… does X cause Y?

… does Y cause X?

… does Z cause X and Y?
71

NAME THAT CORRELATION…

72

NAME THAT CORRELATION…

73

NAME THAT CORRELATION…

74

NAME THAT CORRELATION…

75

WHAT’S WRONG WITH THIS
PICTURE?

r = – .80

WHAT’S WRONG WITH THIS
PICTURE?

r = .85

WHAT’S WRONG WITH THIS
PICTURE?

r = .10

WHAT’S WRONG WITH THIS
RESEARCH?

“The data showed a strong and highly significant
positive correlation between date of onset of
sexual activity and current level of sexual activity
(r = 0.75, p < .01), suggesting that teenagers
who begin having sex at an earlier age are more
promiscuous in college as a result.”

“The negative correlation coef ficient shows that
there is no relationship between these traits.”

“The correlation was significant (r = -1.22)…”

79

http://guessthecorrelation.com/

MORE PRACTICE?

80

 Can more easily
identify issues
that might
interfere with your
ability to interpret
your data

PROTIP: LOOK AT A SCATTERPLOT FIRST

81

ALTERNATIVES TO THE
PEARSON CORRELATION

Pearson correlation has been developed
For data having linear relationships
With data from interval or ratio measurement scales

Other correlations have been developed
For data having non-linear relationships
With data from nominal or ordinal measurement

scales
 Point-biserial
 Spearman’s correlation

82
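Spearman's correlation is simply a Pearson correlation computed on ranks, which makes it sensitive to any monotonic (not only linear) relationship. A minimal sketch with hypothetical untied data:

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    return sp / sqrt(ss_x * ss_y)

def ranks(values):
    """Rank from 1 (smallest) to n; assumes no tied scores."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    return pearson_r(ranks(x), ranks(y))

# A perfectly monotonic but non-linear relationship
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print(f"Pearson  r   = {pearson_r(x, y):.3f}")      # below 1: not linear
print(f"Spearman rho = {spearman_rho(x, y):.3f}")   # exactly 1: perfectly monotonic
```

Real implementations (e.g., in statistical packages) also handle tied ranks, which this sketch omits.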

SUMMARY

Correlations var y in type and magnitude
Errors are commonly made when interpreting

correlations
Look at a scatterplot!

83

https://kotaku.com/antonin-scalias-landmark-defense-of-violent-video-games-1758990360

EVEN THE US SUPREME COURT
KNOWS WHAT’S UP

84

And remember for the rest of your life:
Correlation does NOT equal causation!

Practice interpreting correlations on the
discussion board

85

86

JUST TO COMPLICATE
THINGS A LITTLE…

87

…BUT SOMETIMES IT KIND OF IS

https://www.youtube.com/watch?v=HUti6vGctQM&fbclid=IwAR2orZs_ECdn0
94_eSkyyp-1ZKXWtIv3USW2PL6N9oZunqIBY1nlTuUxAh4

 “In essence, to logically infer that X caused Y, we need to meet
three requirements:
 We must know that X preceded Y. It is not possible for a cause to follow
or even coincide with an ef fect. It must come before it, even if it is
fractions of a second.
 X must covary with Y. In other words, Y must be more likely to
occur when X occurs than when X does not occur.
 The relationship between X and Y is free from confounding. What this
means is that no other variable also covaries with X when #1 and #2 are
met.”

 What about when a true experiment is not possible? Give up?

 It may be more useful to think of causality on a continuum
rather than as a dichotomous outcome

 See more: http://icbseverywhere.com/blog/2014/10/the-logic-of-causal-conclusions/

88

…BUT SOMETIMES IT KIND OF IS

 Keep up with tutorials!
 Data project information: coming soon

89

TO-DO


TD0409-01 课件/Psy 202_5_MoreFactorial_W22.pdf

Instructor:
Dr. Molly Metz

PSY 202H1:
STATISTICS II

MODULE 5:
HYPOTHESIS TESTING WITH

FACTORIAL ANOVA

1

1. Review
1. More on interactions and simple effects
2. Another 2 factor design

2. Hypothesis Testing with Factorial ANOVA
1. Sources of variance
2. Foundations of hypothesis test
3. Example, with numbers!

GAME PLAN

2

Learning check!
Factorial Design review

 Test will be posted Tuesday February 15 9am and will be due
Thursday February 17 11:59pm
 This is NOT a timed test. You may start it, take a break, and return to
it.
 However, I do NOT recommend taking the whole time to complete the
test!
 It will be written as if it could be completed like an in-person test, about 2
hours (assuming you prepare for it as if it were an in-person test).

 Content: All readings and lectures through Module 5
 Including things reviewed in text but not in lecture video

 Format of questions may include
 Multiple Choice
 Short Answer
 Computations

MIDTERM

3

 Permitted resources:
 Your book, your notes
 Simple calculator

 NOT permitted resources:
 Your friends/classmates
 Including any group chats, like Discord, GroupMe, Facebook, etc.
 Any other people, including but not limited to those who have taken this
course before
 Google (or any internet resources)

 IMPORTANT: Not just WHAT but WHY; application
 Is MindTap similar to the test? Kind of…

 Will it be written to be harder to make up for the fact that it is
open book?
 No, but…

MIDTERM

4

5

MORE ON:
INTERACTIONS AND

SIMPLE EFFECTS

SIMPLE EFFECTS
• Effects of one IV on DV at one particular level of other IV

[Bar graph: attitude toward tuition change (y-axis, 0–20) by head condition (nod
vs. shake) and tuition message (increase vs. decrease)]

1. Simple effect of tuition condition on attitude, for head nodding condition
2. Simple effect of tuition condition on attitude, for head shaking condition
3. Simple effect of head condition on attitude, for tuition increase condition
4. Simple effect of head condition on attitude, for tuition decrease condition

6

1. Main effect of head
condition?

2. Main effect of message
condition?

3. Interaction between
head and message
conditions?

ACTUAL RESULTS

[Graph: opinion on tuition change (y-axis) by head condition (nodding vs. shaking)
and message condition (tuition increase vs. decrease)]

7

ANOTHER EXAMPLE
TWO FACTOR DESIGN

8

TWO-FACTOR DESIGNS

                     FACTOR B
              Level 1 (B1)            Level 2 (B2)
FACTOR A
Level 1 (A1)  Test scores for boy     Test scores for girl
              students put in a       students put in a
              happy mood              happy mood
Level 2 (A2)  Test scores for boy     Test scores for girl
              students put in a       students put in a
              sad mood                sad mood
9

REVIEW: TERMINOLOGY

 Factor  The variable (independent or quasi-independent)
that designates the groups being compared
 E.g., Mood

 Level  The individual conditions or values that make up a
factor are called the levels of the factor
 E.g., Happy vs Sad

 Factorial design  Any study that combines two or more
factors
 Comparing how male and female students perform on a general
knowledge test after being put in either a sad or happy mood
 2 (gender: boy, girl) x 2 (mood: happy, sad) independent measures
design

10

TWO-FACTOR DESIGNS

                     FACTOR B
              Level 1 (B1)            Level 2 (B2)
FACTOR A
Level 1 (A1)  Test scores for boy     Test scores for girl
              students put in a       students put in a
              happy mood              happy mood
Level 2 (A2)  Test scores for boy     Test scores for girl
              students put in a       students put in a
              sad mood                sad mood

Main effect of FACTOR A (comparing Row 1 vs. Row 2)  Do
test scores differ depending on mood?

11

TWO-FACTOR DESIGNS

                     FACTOR B
              Level 1 (B1)            Level 2 (B2)
FACTOR A
Level 1 (A1)  Test scores for boy     Test scores for girl
              students put in a       students put in a
              happy mood              happy mood
Level 2 (A2)  Test scores for boy     Test scores for girl
              students put in a       students put in a
              sad mood                sad mood

Main effect of FACTOR B (comparing Column 1 vs. Column 2)  Do test
scores differ depending on the gender of the participant?
12

MAIN EFFECTS

 The mean differences among the levels of one factor
are referred to as the main effect of that factor

You’re not interested in the individual cells; you’re
interested in comparing the means of each row
(Factor A) and the means of each column (Factor B)

            Boys (B1)   Girls (B2)
Happy (A1)  M = 100     M = 110     MA1 = 105
Sad (A2)    M = 110     M = 100     MA2 = 105
            MB1 = 105   MB2 = 105
13
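The marginal means in this table can be checked mechanically from the cell means. A minimal sketch (pure Python, equal cell sizes assumed), showing why neither main effect appears here even though the individual cells clearly differ:

```python
# Cell means from the 2 x 2 example (rows = mood, columns = gender)
cells = [[100, 110],   # Happy (A1): boys, girls
         [110, 100]]   # Sad   (A2): boys, girls

# Main effects compare marginal (row and column) means,
# assuming equal n per cell
row_means = [sum(row) / len(row) for row in cells]        # factor A marginals
col_means = [sum(col) / len(col) for col in zip(*cells)]  # factor B marginals

print("MA1, MA2 =", row_means)   # equal -> no main effect of mood
print("MB1, MB2 =", col_means)   # equal -> no main effect of gender
```

Both sets of marginals equal 105, yet the cells form a crossover pattern, which is exactly the interaction described next.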

INTERACTIONS

 When the effect of one factor depends on the different
levels of a second factor, then there is an interaction
between the factors

 Now you’re interested in the individual cells: An interaction between
two factors occurs when the mean differences between individual
treatment conditions (or cells) are different from what would be
predicted from the overall main effects of the factors

            Boys (B1)   Girls (B2)
Happy (A1)  M = 100     M = 110     MA1 = 105
Sad (A2)    M = 110     M = 100     MA2 = 105
            MB1 = 105   MB2 = 105
14

TWO-FACTOR DESIGNS

                     FACTOR B
              Level 1 (B1)            Level 2 (B2)
FACTOR A
Level 1 (A1)  Test scores for boy     Test scores for girl
              students put in a       students put in a
              happy mood              happy mood
Level 2 (A2)  Test scores for boy     Test scores for girl
              students put in a       students put in a
              sad mood                sad mood

A x B INTERACTION: Does the effect of mood
depend on participant gender? 15

INTERACTIONS

 When the results of a two-factor study are presented in a
graph, the existence of non-parallel lines (i.e., lines that
cross or converge) indicates an interaction between the two
factors

[Line graph: test score (95–115) by mood (happy, sad), with non-parallel lines
for boys and girls]
16

Learning check!

Aka, what’s happening STATISTICALLY with a
factorial ANOVA?

17

HYPOTHESIS TESTING
WITH FACTORIAL ANOVA

FACTORIAL ANOVA LOGIC

“Effect” = “Variability between groups” = “Variability across group means”

“Error” = “Variability within groups” = “Natural variability”
18

TYPES OF VARIANCES IN FACTORIAL
ANOVA (2 X 2)

• More variables  More effects  More sources of
variance:

o Between-groups for main effect of IV 1
o Between-groups for main effect of IV 2
o Between-groups for interaction of IV 1 and IV 2  (3 possible between)

o Within-groups (natural variability)  (1 within)
19

FACTORIAL ANOVA

“Effect” = “Variability between groups” = “Variability across group means”

“Error” = “Variability within groups” = “Natural variability”

**Calculated separately for each main effect and
interaction

20

ANALYSIS OF VARIANCE

• Goal: explain the total variance in a set of scores by
determining how much is due to our IVs versus natural
variability

• In a one-way ANOVA, we had only two possible sources of
variance: between-groups and within-groups

• Now, we have many different sources:
• Main effect of IV1 (between-groups)
• Main effect of IV 2 (between-groups)
• Interaction (between-groups)
• Natural or error variability (within-groups)

21

TWO-FACTOR ANOVA

 In a two-factor study, we need to test for two main effects and an
interaction
 Main effect of factor A
 Main effect of factor B
 A x B interaction

 This means that we have three separate hypotheses, which are
tested with three separate F-ratios
 The ANOVA allows us to test for all three effects in a single analysis

 It is important to understand that each effect (each hypothesis) is
independent from the others  this means that any pattern of
significant/non-significant results is possible
 Two significant main effects, no interaction
 Main effect of factor A, but not factor B, and a significant interaction
 Only a significant interaction (no main effects)
 Etc.

22

HYPOTHESES FOR MAIN EFFECTS

 Factor A:
 Null: A has no effect on outcome
 H0: μA1 = μA2

 Alternative: A does have an effect on outcome
 H1: μA1 ≠ μA2

 Factor B:
 Null: B has no effect on outcome
 H0: μB1 = μB2

 Alternative: B does have an effect on outcome
 H1: μB1 ≠ μB2

23

HYPOTHESES FOR THE
INTERACTION

 Null hypothesis:
 H0: There is no interaction between factors A and B. All the mean
differences between treatment conditions are explained by the
main effects of the factors.

 Alternative hypothesis:
 H1: There is an interaction between factors A and B. The mean
differences between treatment conditions are not what would be
predicted from the overall main effects of the two factors.

In symbols, at level B1: μA1 = μA2, at level B2: μA1 = μA2; At level A1 = ….

24

THE THREE F-RATIOS IN A
TWO-FACTOR ANOVA

FA = variance (differences) between the means for factor A ÷
variance (differences) expected if there is no treatment effect

FB = variance (differences) between the means for factor B ÷
variance (differences) expected if there is no treatment effect

FAxB = variance (mean differences) not explained by main effects ÷
variance (differences) expected if there is no treatment effect

25

TWO STAGES OF THE TWO-FACTOR ANALYSIS
OF VARIANCE

 Stage 1: Same as independent measures ANOVA (or stage 1 of
the repeated measures ANOVA)  total variance is broken
down into between-treatments variance and within-treatments
variance (which becomes the denominator for all three F-
ratios)

 Stage 2: Between-treatments variance is broken down into
three separate components: differences attributable to
Factor A, to Factor B, and to the A x B interaction (which
become the numerators for each respective F-ratio)

26

27

Now, BETWEEN-group
variance gets partitioned,
into our three (or more)
effects of interest! So, the
sum of A, B, and AxB
values (i.e., SS, df) will
always equal Between!

28

TWO-FACTOR ANOVA
SUMMARY TABLE EXAMPLE

Source               SS    df   MS     F
Between treatments   200    3
  Factor A            40    1    40      4
  Factor B            60    1    60     *6
  A x B              100    1   100    **10
Within treatments    300   20    10
Total                500   23

F.05 (1, 20) = 4.35*
F.01 (1, 20) = 8.10**

(N = 24; n = 6)

29

EFFECT SIZE FOR TWO-FACTOR ANOVA:
PARTIAL ETA SQUARED

 η2 for each factor and the interaction is computed as
the percentage of variability not explained by the other
factors

30

TWO-FACTOR ANOVA ASSUMPTIONS

 The validity of the ANOVA depends on three assumptions
common to other hypothesis tests
 The observations within each sample must be independent of each
other
 The populations from which the samples are selected must be
normally distributed
 The populations from which the samples are selected must have
equal variances (homogeneity of variance)

Learning check!

31

32

WHAT THE HYPOTHESIS
TEST LOOKS LIKE WITH

NUMBERS

EXAMPLE: HYPOTHESIS TESTING
WITH THE T WO-FACTOR ANOVA

• The following data is
from a study examining
the effects of arousal
level and task difficulty
on performance scores
(higher scores indicate
better performance)

• We will use it to
illustrate the hypothesis
testing procedure for a
two-factor ANOVA

(Notice that this is a 2 x 3
factorial design)

33

HYPOTHESIS TESTING WITH THE
TWO-FACTOR ANOVA

Step 1: State the hypotheses
 Factor A  Task difficulty
 H0: μA1 = μA2 (or H0: μeasy = μdifficult)
 H1: μA1 ≠ μA2 (or H1: μeasy ≠ μdifficult)

 Factor B  Arousal level
 H0: μB1 = μB2 = μB3 (or H0: μlow = μmedium = μhigh)
 H1: At least one arousal mean differs from the others

 Interaction  Task difficulty x Arousal level
 H0: There is no interaction effect. The effect of either factor does not
depend on the levels of the other factor.
 H1: There is an interaction effect.

34

HYPOTHESIS TESTING WITH THE
TWO-FACTOR ANOVA

Step 2: Compute the three F-ratios in two stages

 Stage 1: Partition SStotal and dftotal
 (same as stage 1 for one-way repeated measures AND between-groups ANOVA)

35

HYPOTHESIS TESTING WITH THE
TWO-FACTOR ANOVA

Step 2: Compute the three F-ratios in two stages
 Stage 2 (NEW): Partition SSbetween treatments

SSA = Σ(T²row / nrow) − G²/N

SSB = Σ(T²col / ncol) − G²/N

SSAxB = SSbetween treatments − SSA − SSB
36
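The stage-2 partition can be sketched numerically. The tiny 2 x 2 dataset below is hypothetical (n = 2 per cell), used only to show that SSA, SSB, and SSAxB add up exactly to SSbetween treatments:

```python
# Hypothetical scores in a balanced 2 x 2 design, n = 2 per cell
data = {("A1", "B1"): [1, 3], ("A1", "B2"): [5, 7],
        ("A2", "B1"): [2, 2], ("A2", "B2"): [4, 4]}

all_scores = [s for cell in data.values() for s in cell]
G, N = sum(all_scores), len(all_scores)   # grand total and total n

def total(level, axis):
    """Sum of all scores in one row (axis=0) or column (axis=1)."""
    return sum(s for key, cell in data.items() if key[axis] == level for s in cell)

n_marginal = N // 2   # scores per row (and per column) in this balanced 2 x 2
ss_a = sum(total(a, 0) ** 2 / n_marginal for a in ("A1", "A2")) - G ** 2 / N
ss_b = sum(total(b, 1) ** 2 / n_marginal for b in ("B1", "B2")) - G ** 2 / N
ss_between = sum(sum(c) ** 2 / len(c) for c in data.values()) - G ** 2 / N
ss_axb = ss_between - ss_a - ss_b

print(f"SSA = {ss_a}, SSB = {ss_b}, SSAxB = {ss_axb}, SSbetween = {ss_between}")
```

For these invented numbers SSA = 2, SSB = 18, SSAxB = 2, and SSbetween = 22, so the three components sum to the between-treatments total as the formulas require.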

HYPOTHESIS TESTING WITH THE
TWO-FACTOR ANOVA

Step 2: Compute the three F-ratios in two stages
 Stage 2 (NEW): Partition dfbetween treatments

dfA = number of rows – 1

dfB = number of columns – 1

dfAxB = dfbetween treatments – dfA – dfB

37

HYPOTHESIS TESTING WITH THE
TWO-FACTOR ANOVA

Step 2: Compute the three F-ratios in two stages
 Stage 2: Calculate the four MS values

MSwithin treatments = SSwithin treatments / dfwithin treatments
(denominator for all three F-ratios)

MSA = SSA / dfA
MSB = SSB / dfB
MSAxB = SSAxB / dfAxB
(numerators for the three F-ratios)

38

HYPOTHESIS TESTING WITH THE
TWO-FACTOR ANOVA

Step 2: Compute the three F-ratios in two stages
 Stage 2: Calculate the three F-ratios

FA = MSA / MSwithin
FB = MSB / MSwithin
FAxB = MSAxB / MSwithin

39

SUMMARY TABLE FOR TWO-FACTOR ANOVA

Source                   SS    df   MS    F
Between treatments      260     5
  Factor A (difficulty) 120     1   120   F(1, 24) = 24.00
  Factor B (arousal)     80     2    40   F(2, 24) = 8.00
  A x B                  60     2    30   F(2, 24) = 6.00
Within treatments       120    24     5
Total                   380    29
40
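The MS and F columns follow mechanically from the SS and df entries (MS = SS/df; each effect's F divides its MS by MSwithin). A sketch that reproduces the table's values:

```python
# SS and df taken from the two-factor summary table above
ss = {"A": 120, "B": 80, "AxB": 60, "within": 120}
df = {"A": 1,   "B": 2,  "AxB": 2,  "within": 24}

ms = {src: ss[src] / df[src] for src in ss}                      # MS = SS / df
f = {src: ms[src] / ms["within"] for src in ("A", "B", "AxB")}   # F = MSeffect / MSwithin

for src in ("A", "B", "AxB"):
    print(f"F_{src}({df[src]}, {df['within']}) = {f[src]:.2f}")
```

This reproduces FA = 24.00, FB = 8.00, and FAxB = 6.00 from the table.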

HYPOTHESIS TESTING WITH THE
TWO-FACTOR ANOVA

Step 3: Find the critical F value for each F-ratio, compare with
the computed F-ratio, and make a decision regarding each H0
(all tested at .05 level)

 Factor A: df = 1, 24  Fcrit = 4.26 (FA = 24.00)
 Decision: Reject H0, conclude that there is a significant main effect
of task difficulty
 Factor B: df = 2, 24  Fcrit = 3.40 (FB = 8.00)
 Decision: Reject H0, conclude that there is a significant main effect
of arousal level
 A x B interaction: df = 2, 24  Fcrit = 3.40 (FAxB = 6.00)
 Decision: Reject H0, conclude that there is a significant interaction
between task difficulty and arousal
41

EFFECT SIZE FOR TWO-FACTOR ANOVA:
PARTIAL ETA SQUARED

How large is the effect of task difficulty?
η²A = SSA / (SStotal − SSB − SSAxB)

How large is the effect of arousal?
η²B = SSB / (SStotal − SSA − SSAxB)

How large is the interaction effect?
η²AxB = SSAxB / (SStotal − SSA − SSB)

42
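Plugging the SS values from the earlier summary table (SSA = 120, SSB = 80, SSAxB = 60, SStotal = 380) into these formulas reproduces the effect sizes reported in the APA-format write-up that follows:

```python
# SS values from the two-factor summary table
ss_a, ss_b, ss_axb, ss_total = 120, 80, 60, 380

# Partial eta squared: each effect's SS over the variability
# not explained by the other two effects
eta2_a = ss_a / (ss_total - ss_b - ss_axb)
eta2_b = ss_b / (ss_total - ss_a - ss_axb)
eta2_axb = ss_axb / (ss_total - ss_a - ss_b)

print(f"eta^2: A = {eta2_a:.2f}, B = {eta2_b:.2f}, AxB = {eta2_axb:.2f}")
```

The results are .50, .40, and .33, matching the values quoted in the report.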

REPORTING RESULTS IN APA FORMAT

“A two-factor between-subjects analysis of variance showed a
significant main effect for task difficulty, F(1, 24) = 24.00, p < .05,
η2 = .50, such that participants performed better on easy tasks
(M = 6, SD = 2.26) than on difficult tasks (M = 2, SD = 1.85).

There was also a significant main effect for arousal level, F(2, 24)
= 8.00, p < .05, η2 = .40, such that participants performed better
as arousal increased from low (M = 2, SD = 1.7) to medium (M = 4,
SD = 2.31) to high (M = 6, SD = 2.26).

Finally, there was a significant task difficulty x arousal
interaction, F(2, 24) = 6.00, p < .05, η2 = .33. As can be seen by
looking at Figure 1, increased levels of arousal led to consistently
better performance when the task was easy. However, when the
task was difficult, a moderate level of arousal led to the best
performance, with scores sharply decreasing as arousal increased
from moderate to high.”

43

Since this factor has 3 levels, we actually
need to do post hocs to establish which
means are different

POST HOC TESTS

 If you have a 2 x 2 design, post hoc tests for any significant
main effects are unnecessary (why?)

 However, if you have more than two levels of a factor and a
significant main effect, you may wish to conduct a post hoc
test (e.g., Tukey’s HSD) to determine which means are
significantly different from one another

44

POST HOC TESTS: TUKEY’S HSD

 Remember: You would only do this type of post hoc test if
there is no significant interaction, but a significant main
effect for a factor with more than two levels (e.g., if our
interaction was not significant, but there was a main effect
of arousal)

• q  To find the q value, you need to know: the alpha level (same as original
test), dfwithin (from original ANOVA), and k (the number of levels in the factor
you are testing)

• MSwithin  from original ANOVA
• n  the number of participants in each level you are comparing (e.g., how
many participants were in each arousal condition)
45
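Once q is looked up, the HSD computation itself is simple: HSD = q·sqrt(MSwithin/n), and any pair of means farther apart than HSD is declared significantly different. A sketch with hypothetical numbers (q = 3.53 stands in for a studentized-range table lookup at alpha = .05, k = 3, dfwithin = 24; the condition means and n are invented):

```python
from math import sqrt

# Hypothetical inputs: q from a studentized-range table, plus
# MSwithin and n per condition from the original ANOVA
q, ms_within, n = 3.53, 5.0, 10

hsd = q * sqrt(ms_within / n)   # minimum mean difference counted as significant

means = {"low": 2.0, "medium": 4.0, "high": 6.0}
names = list(means)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        diff = abs(means[a] - means[b])
        verdict = "significant" if diff > hsd else "not significant"
        print(f"{a} vs {b}: |diff| = {diff:.1f}, HSD = {hsd:.2f} -> {verdict}")
```

With these invented values only the low-vs-high comparison (difference of 4) exceeds the HSD of about 2.50.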

POST HOC TESTS

 If you have a 2 x 2 design, post hoc tests for any significant main
effects are unnecessary (why?)

 However, if you have more than two levels of a factor and a
significant main effect, you may wish to conduct a post hoc test
(e.g., Tukey’s HSD) to determine which means are significantly
different from one another
 If the interaction is significant, don’t worry too much about main effects
 Why?

 More importantly (or interestingly), if you have a significant
interaction, you may want to test for simple main effects…

46

TESTING FOR SIMPLE MAIN EFFECTS

 A significant interaction indicates that the effect of one factor
(e.g., arousal) on the dependent variable (e.g., performance)
depends on the levels of the other factor (e.g., whether the
task is easy or difficult)

 To better understand what is happening, we may wish to test
for the significance of mean differences within one column (or
row)
 Test the simple main effect of one factor for each level of the other
factor
 E.g., Test for significant differences between the levels of task
difficulty at each level of arousal (low, medium, high)

47

TESTING FOR SIMPLE MAIN EFFECTS

 Can think of it as dividing the data up into numerous single-
factor ANOVAs (or t-tests, if only two levels of a factor)

 Follows the same procedure as the one-way (or single factor)
independent measures ANOVA (or t-test)

48

TESTING FOR SIMPLE MAIN EFFECTS

E.g., At each level of arousal (factor B), we
test whether there is a significant difference
between the easy and difficult tasks (levels
of factor A)
 H0: μeasy = μdif ficult (μA1 = μA2)
 H1: μeasy ≠ μdif ficult (μA1 ≠ μA2)

F = variance (differences) for the means at this level of factor B ÷
variance (differences) expected by chance

F = MSbetween for the two conditions at this level of factor B ÷
MSwithin treatments from the original ANOVA

49

TESTING FOR SIMPLE MAIN EFFECTS

E.g., For the high level of arousal:

dfbetween treatments = k – 1 = 1
MSbetween treatments = 160 / 1 = 160
MSwithin treatments = 5 (from previous)
F = 160 / 5 = 32.00
Fcrit (1, 24) = 4.26

Thus, at the high level of
arousal, there is a
significant difference in
performance on the
easy and difficult tasks
(we reject H0). 50

TESTING FOR SIMPLE MAIN EFFECTS

E.g., For the low level of arousal:

dfbetween treatments = k – 1 = 1
MSbetween treatments = 10 / 1 = 10
MSwithin treatments = 5 (from previous)
F = 10 / 5 = 2.00
Fcrit (1, 24) = 4.26

Thus, at the low level of
arousal, there is not a
significant difference in
performance on the
easy and difficult tasks
(we fail to reject H0).

51
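The two simple-effect tests above share the same denominator (MSwithin from the original ANOVA); a sketch that reproduces both F-ratios and decisions:

```python
# Values from the slides: MSwithin from the original ANOVA, and
# Fcrit for df = 1, 24 at alpha = .05
ms_within, f_crit = 5.0, 4.26

def simple_effect_f(ms_between):
    """F for a simple main effect: MSbetween at one level over MSwithin."""
    return ms_between / ms_within

for level, ms_between in [("high", 160.0), ("low", 10.0)]:
    f = simple_effect_f(ms_between)
    decision = "reject H0" if f > f_crit else "fail to reject H0"
    print(f"{level} arousal: F = {f:.2f} vs Fcrit = {f_crit} -> {decision}")
```

This gives F = 32.00 (reject H0) at high arousal and F = 2.00 (fail to reject) at low arousal, matching the worked examples.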

HIGHER-ORDER FACTORIAL DESIGNS

 What about cases where we have a study design involving
three (or more) factors?

 Same logic, just extended  now we have a “factor C”, and
will need to test for a main effect of factor C and also whether
it interacts with factor A or B (i.e., A x C and B x C
interactions), as well as a potential three-way interaction: A x
B x C

 Three-way interactions are more challenging to interpret, but
can be interesting and valuable
 However, interactions involving four or more variables are often more
confusing than they are helpful!

52

EXAMPLE: THREE-WAY INTERACTION

 Perhaps the effects of arousal level and task difficulty differ
for males and females
 If we add gender to the mix, we now have a three-factor design (2 x 3 x
2)

[Two line graphs: performance (0–12) by arousal level (low, medium, high) for
easy vs. difficult tasks, plotted separately for female and male participants]
53

Learning check!

 Mindtap, Tutorial
 Midterm!
 Make a schedule – don’t cram it all in the few days before it is
posted.
 Midterm info posted online – Syllabus > Assignments > Term Tests

54

TO-DO

BONUS CONTENT:
ANOTHER EXAMPLE OF
A TWO FACTOR DESIGN

55

EXAMPLE: SELF-ESTEEM & PRESENCE OF AN
AUDIENCE

Three questions:
– Does the level of self-esteem (low or high) affect performance? (main effect)
– Does the presence or absence of the audience affect performance? (main effect)
– Does the effect of one factor (e.g., the audience) depend on the levels of the
other factor (e.g., self-esteem)? (interaction effect)

Three separate hypotheses and three separate F-ratios

56

HYPOTHESES FOR MAIN EFFECTS

 Factor A (Self-esteem):
 Null: Self-esteem has no effect on performance
 H0: μA1 = μA2

 Alternative: Self-esteem does have an effect on performance
 H1: μA1 ≠ μA2

 Factor B (Audience):
 Null: The absence or presence of an audience has no effect on
performance
 H0: μB1 = μB2

 Alternative: The absence or presence of an audience does have an
effect on performance
 H1: μB1 ≠ μB2

57

HYPOTHESES FOR THE
INTERACTION

 Null hypothesis:
 H0: There is no interaction between factors A and B. All the mean
differences between treatment conditions are explained by the
main effects of the factors.

 Alternative hypothesis:
 H1: There is an interaction between factors A and B. The mean
differences between treatment conditions are not what would be
predicted from the overall main effects of the two factors.

In symbols, at level B1: μA1 = μA2, at level B2: μA1 = μA2; At level A1 = ….

58

THE THREE F-RATIOS IN A
TWO-FACTOR ANOVA

FA = variance (differences) between the means for factor A ÷
variance (differences) expected if there is no treatment effect

FB = variance (differences) between the means for factor B ÷
variance (differences) expected if there is no treatment effect

FAxB = variance (mean differences) not explained by main effects ÷
variance (differences) expected if there is no treatment effect

59

Just main effects
(no interaction)

Main effects +
Interaction

60

EXAMPLE: MAIN EFFECT OF FACTOR A
(NO MAIN EFFECT OF FACTOR B, NO A X B INTERACTION)

[Bar graph: means for A1 and A2 at levels B1 and B2 (y-axis 0–25); the A1–A2
difference is the same at both levels of B]
61

EXAMPLE: MAIN EFFECTS FOR BOTH FACTORS
(BUT NO A X B INTERACTION)

[Bar graph: means for A1 and A2 at levels B1 and B2 (y-axis 0–50); parallel
A1–A2 differences across levels of B]
62

EXAMPLE: A × B INTERACTION
(BUT NO MAIN EFFECTS)

[Figure: line plot of cell means (y-axis 0–25) across B1 and B2, with crossing lines for A1 and A2]

63
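The three patterns in the example slides above (main effect of A only, both main effects, interaction only) can be diagnosed directly from a 2 × 2 table of cell means. A small Python sketch; `describe_2x2` is a hypothetical helper written for illustration, not part of the lecture:

```python
import numpy as np

def describe_2x2(cell_means):
    """Summarize a 2 x 2 table of cell means (rows = A1/A2, cols = B1/B2).

    Returns (main effect of A, main effect of B, interaction contrast);
    a value of 0 means that effect is absent in the cell means.
    """
    m = np.asarray(cell_means, dtype=float)
    main_a = m[0].mean() - m[1].mean()          # A1 marginal mean minus A2
    main_b = m[:, 0].mean() - m[:, 1].mean()    # B1 marginal mean minus B2
    interaction = (m[0, 0] - m[1, 0]) - (m[0, 1] - m[1, 1])
    return float(main_a), float(main_b), float(interaction)

# Parallel lines shifted apart: main effect of A only (like the first example).
print(describe_2x2([[20, 20], [10, 10]]))  # (10.0, 0.0, 0.0)

# Crossing lines: interaction with no main effects (like the last example).
print(describe_2x2([[20, 10], [10, 20]]))  # (0.0, 0.0, 20.0)
```

The crossing pattern is why an interaction can completely hide both main effects: the marginal means average out to equality even though the cells clearly differ.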
