Instructor: Dr. Molly Metz
PSY 202H1:
STATISTICS II
MODULE 10:
STATISTICAL ISSUES AND THE
REPLICATION CRISIS
1
1. Best Practices in Research Psychology
1. Some scandal
2. From fraud to QRPs
3. Open science
4. Looking forward
GAME PLAN
2
BEST PRACTICES IN
RESEARCH PSYCHOLOGY
3
OUTLINE
Best Practices
Examples of fraud/data manipulation
Questionable Research Practices (QRPs)
Doing research ethically and responsibly
Reproducibility and Replication Efforts
Motivation
History of Attempts
The Future: Being a good consumer of psychological science
4
INCENTIVE STRUCTURE
Published work is important for getting a job, getting tenure, being
awarded grants, and being viewed favorably in our field.
As a result, a “rat race” culture develops and people try to
publish as much as they can.
Researchers must balance the desire to stay truthful to
psychological science with the necessity to publish.
This results in researchers taking shortcuts and sometimes
worse…
5
RECENT CASES OF RESEARCH MISCONDUCT
Karen Ruggiero (late ’90s, early ’00s)
Marc Hauser (2007-2011)
Diederik Stapel (2011)
Dirk Smeesters (2011-2012)
Larry Sanna (2012)
Jens Förster (2014-2015)
Michael LaCour (2015)
6
7
… I think it is important to emphasize that I never
informed my colleagues of my inappropriate
behavior. I offer my colleagues, my PhD students, and
the complete academic community my sincere
apologies. I am aware of the suffering and sorrow
that I caused to them.
I did not withstand the pressure to score, to publish,
the pressure to get better in time. I wanted too much,
too fast. In a system where there are few checks and
balances, where people work alone, I took the wrong
turn. I want to emphasize that the mistakes that I
made were not born out of selfish ends.
-Brabants Dagblad. 31 October 2011.
-Translated from Dutch
8
http://www.nytimes.com/2013/04/28/magazine/diederik-stapels-audacious-academic-fraud.html?pagewanted=all
SCIENTIFIC FRAUD
9
10
NOT JUST PSYCHOLOGY. . .
Drug studies: 20-25% replicate (Prinz)
Cancer treatment: 11% replicate (Begley)
11
HOWEVER, OTHER PRACTICES DON’T
CONSTITUTE FRAUD
Questionable Research Practices
Decisions in design, analysis, and reporting
that increase the likelihood of achieving a
positive result
And a positive response from editors and reviewers
12
FALSE POSITIVE PSYCHOLOGY
How do decisions in analyses affect the final results?
Having small samples, collecting additional dependent
variables, peeking at data, dropping an experimental
condition
If enough possibilities are entertained, the likelihood
of achieving a significant result could be over 80%!
Simmons et al., 2011
13
Did you get the effect you predicted?
Did you get ANY effect?
Publish
HARK!
HARKing: Hypothesizing After Results are Known
Figure by S. Vazire
14
Did you get the effect you predicted?
Did you get ANY effect?
Publish
Can you dig around and find one?
No
HARK!
p-hack!
Figure by S. Vazire
“p-hacking” = fishing around in your data for statistically significant results
Often involves redefining variables or running unplanned analyses
15
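The multiplicity problem behind p-hacking can be illustrated with a short simulation (a hypothetical sketch, assuming independent tests whose p-values are uniform under the null; real DVs are usually correlated, so the true inflation varies):

```python
import random

random.seed(1)

def p_hacked_experiment(n_dvs=4, alpha=0.05):
    # Under the null hypothesis, each test's p-value is uniform on [0, 1].
    # A p-hacker tests several DVs and reports whichever comes out "significant".
    p_values = [random.random() for _ in range(n_dvs)]
    return min(p_values) < alpha

n_sims = 10_000
rate = sum(p_hacked_experiment() for _ in range(n_sims)) / n_sims
print(f"Chance of at least one 'significant' result: {rate:.3f}")
# Analytically 1 - 0.95**4 ≈ 0.185, nearly four times the nominal 5%.
```

Adding more researcher degrees of freedom (more DVs, optional stopping, dropped conditions) pushes this rate higher still.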
EXAMPLE:
IS THE U.S. ECONOMY
AFFECTED BY WHETHER
DEMOCRATS OR
REPUBLICANS ARE IN
OFFICE?
http://fivethirtyeight.com/features/science-isnt-broken/#part2
16
NOT SO SIMPLE…
Do you look at the number of Republicans or
Democrats?
Which politicians do you look at?
How do you measure the U.S. economy?
Should you look at it in general or excluding
economic recessions?
17
QUESTIONABLE RESEARCH PRACTICES
John, Loewenstein, & Prelec (2012) surveyed 2,155
academic psychologists about the frequency of 10
different QRPs…
Not reporting all measures, rounding off p-values, only
including data that “worked out”
Up to 63.4% admission and high levels of each being
“defensible”
22
WHAT SHOULD RESEARCHERS DO?
Increase disclosure in methods, results, and
hypothesis presentation
Pre-register hypotheses and studies
Data collection rules, analytic strategies
Share data
Be a responsible scientist regardless of outcome
23
CENTER FOR OPEN SCIENCE
Open Science Framework
Founded to increase the openness, integrity, and
reproducibility of scientific research
Brian Nosek and Jeff Spies
Open source software platform for pre-registering
hypotheses, archiving study materials, depositing
data and syntax
Initiated the Reproducibility Project
24
CENTER FOR OPEN
SCIENCE
Video:
https://www.youtube.com/watch?v=DIxmLVrAQiw
25
PRODUCING RELIABLE FINDINGS
Reproducibility: A study can be duplicated in
method and/or analysis
Replicability: A study about a phenomenon
produces similar results from a previous study
of the same phenomenon.
Close/Exact Replications
Conceptual Replications
26
ARE PSYCHOLOGY
FINDINGS REPRODUCIBLE
AND REPLICABLE?
27
MANY LABS 1 .0
Started running studies that could be done relatively
easily.
Effects ranged from those known to replicate
(classic studies) to those that were untested.
28
MANY LABS 1 .0
29
MANY LABS 2.0/3.0 AND OTHER CHANGES
Many Labs 2.0: Replication across sample and
setting
Many Labs 3.0: Subject pool quality across the
academic semester
Editorial policies of some journals changed
Report effect sizes, power, confidence intervals
Special issues on replication
Increase in meta-analyses
30
Dissemination of Replication Attempts
Journal of Null Results
Psychfiledrawer.org: Archives attempted replications of specific
studies and whether replication was achieved
Center for Open Science: Psychologist Brian Nosek, a champion
of replication in psychology, has created the Open Science
Framework, where replications can be reported.
Association for Psychological Science: Has registered replications
of studies, with the overall results published in Perspectives on
Psychological Science.
PLOS ONE (Public Library of Science): Publishes a broad range of
articles, including failed replications, and there are occasional
summaries of replication attempts in specific areas.
The Replication Index: Created in 2014 by Ulrich Schimmack, the
so-called “R Index” is a statistical tool for estimating the
replicability of studies, of journals, and even of specific
researchers.
And more!!
RESPONSES TO REPLICATION CRISIS
31
SOME CRITICISMS
Researchers cherry-pick studies because they have
some personal/intellectual ax to grind
People who do replications are somehow not
qualified to do science
Science is naturally self-correcting
Unknown differences between studies
Sample-specific reasons for non-replication
32
UNKNOWN DIFFERENCES
Approval at Time 1: 65%
Approval at Time 2: 32%
33
REPRODUCIBILITY PROJECT (2015)
Large-scale replication
100 studies from 3 different journals
Close/exact replications
Contacted original study authors
Open materials and data
Reduces likelihood of “unknown differences” effect
How many do you think replicated?
34
WHY DIDN’T MORE FINDINGS REPLICATE?
Perhaps some difference between studies
Boundary effects
Or perhaps the effect didn’t exist in the first
place?
Some uncertainty in findings
File drawer problem
35
FILE DRAWER PROBLEM
36
https://www.youtube.com/watch?v=0Rnq1NpHdmw
JOHN OLIVER KNOWS (NSFW)
WHAT DOES GOOD
RESEARCH LOOK LIKE?
38
GOOD RESEARCH
Good research is open research
Materials and data are shared
publicly
Good research features
experimental methods that are
strong and isolate a question
of interest
Good research is adequately
“powered” research (see
Tutorial 7 for a review)
39
GOOD RESEARCH
Good research is reproducible
40
CONSUMING SCIENCE
Be an informed consumer of science
Don’t believe ever ything you read!
If an effect seems unbelievable, it just might be.
Pay attention to sample size
How big is the sample?
Effects are unreliable if sample size is too low, a 2,000 person
study more reliable than a 50 person study.
41
CONSUMING SCIENCE
Is the study you are reading the only demonstration of
this effect?
Have people from other labs replicated this?
Did the authors make their data available?
Advocate for good research so we can understand
more about humans and why they do the things they
do
42
Start Here
A summary
http://nobaproject.com/modules/the-replication-crisis-in-psychology
A dissent
http://www.nytimes.com/2015/09/01/opinion/psychology-is-not-in-crisis.html?_r=0
Optional
A counterpoint to the dissent
http://www.theatlantic.com/notes/2015/09/sweeping-psychologys-problems-under-the-rug/403726/
A possible solution, and preliminary findings
http://as.virginia.edu/news/massive-collaboration-testing-reproducibility-psychology-studies-publishes-findings
A response to the possible solution
https://www.sciencenews.org/article/psychologys-replication-crisis-sparks-new-debate
It’s not just us
http://www.slate.com/articles/health_and_science/future_tense/2016/04/biomedicine_facing_a_worse_replication_crisis_than_the_one_plaguing_psychology.html
OPTIONAL READINGS: “REPLICABILITY CRISIS” IN
PSYCHOLOGY
43
REPLICATION CRISIS
OR
CREDIBILITY REVOLUTION?
44
Interviewer: “How much of
what you print is wrong?”
Maddox: “All of it. That’s
what science is about — new
knowledge constantly
arriving to correct the old.”
John Maddox,
editor of Nature for 22 years
45
Data Analysis Project
Due Tuesday April 5, 11:59pm
Course Evals (see announcement)
Final exam
Tuesday Apr 12 9am to Thursday April 14 11:59pm
Same basics as Midterm; see Assessment Page for more info
46
TO DO
Instructor: Dr. Molly Metz
PSY 202H1:
STATISTICS II
MODULE 7:
REGRESSION
1
1. Introduction to Regression
1. Linear Regression vs Correlation
2. Hypothesis Testing with Regression
3. Video
2. Multiple Regression
1. What is it, even?
2. What can we learn?
GAME PLAN
2
Correlation Review!
LINEAR REGRESSION
STATISTICAL TECHNIQUE USED TO
PREDICT THE UNKNOWN VALUE OF ONE
VARIABLE GIVEN A KNOWN VALUE OF
ANOTHER VARIABLE
3
REVIEW
Many studies aim to determine if two
variables have a Co-Varying Relationship with
one another
When the value of one variable reliably changes in
value with another variable
Positive Covariance = When the two variables change in the
same direction
E.g. Weight-Height / Study Time-Exam Performance
Negative Covariance = When the two variables change in
opposite directions
E.g. Stress-Meditation / Alcohol Intoxication-Coordination
4
INTRODUCTION TO LINEAR EQUATIONS
AND REGRESSION
The Pearson correlation measures a linear relationship
between two variables.
The line through the data
Makes the relationship easier to see
Shows the central tendency of the relationship
Can be used for prediction
Regression analysis precisely defines the line.
5
REVIEW
ASSESSING FOR THE PRESENCE OF COVARIATION
When a CVR exists between two variables, it is possible to accurately predict the
unknown value of one of the variables given a known value of the other variable
Perfect Relationships Allow Perfectly Precise Predictions to a Single Value
E.g. If X and Y were perfectly related (r = +/- 1.00), I could accurately predict Y to a single
value given a single value of X
For example, if X = 3, I would predict that Y = 5
6
REVIEW
ASSESSING FOR THE PRESENCE OF CVRS
When a CVR exists between two variables, it is possible to accurately predict the
unknown value of one of the variables given a known value of the other variable
Imperfect Relationships Allow for Predictions to a Range of Values (not perfectly precise)
E.g. If X and Y were imperfectly related, ǀ r ǀ < 1.00, I could accurately predict Y to a range
of values given a single value of X
For example, if X = 3, I would predict that Y would be between 4 – 6
7
REVIEW
ASSESSING FOR THE PRESENCE OF CVRS
When a CVR exists between two variables, it is possible to accurately predict the
unknown value of one of the variables given a known value of the other variable
The stronger the CVR between two variables, the more precise the predictions are (or, the
more narrow the range of predicted values of one variable)
Variables X & Y: r = +.79
If X = 3, Y is predicted to be between 4 – 6
Variables A & B: r = +.32
If A = 3, B is predicted to be between 2 – 8
More precise prediction / Less precise prediction
8
LINEAR REGRESSION
When a significant correlation has been found
between two variables, it is common for
researchers to want to generate an equation
that would be useful for predicting the value
of one of the variables given the known value
of the other variable.
Linear Regression is the technique to use in
order to accomplish this
Linear Regression utilizes the equation of a straight
line in order to make these predictions
9
LINEAR REGRESSION
Equation of a Straight Line
y = m(x) + b
m = slope of the line = (Change in Y) / (Change in X) = (Y2 – Y1) / (X2 – X1)
b = Y-intercept = the value of Y when X = 0
r = +1.00
10
LINEAR REGRESSION
Equation of a Straight Line
y = m(x) + b
m = slope of the line = [(Y2 – Y1) / (X2 – X1)] = [(11 – 9) / (6 – 5)] = (2 / 1) = +2.00
b = Y-intercept = the value of Y when X = 0 = -1.00
Based on subtracting 2 from Y for every 1-value decrease in X
(e.g. When X = 2, Y = 3; When X = 1, Y = 1; When X = 0, Y = -1)
r = +1.00
11
LINEAR REGRESSION
Equation of a Straight Line
y = m(x) + b
y = +2.00(x) + -1.00
Now, we can predict an unknown value of Y given a value of X
If X = 15, what is the predicted value of Y?
If X = 100, what is the predicted value of Y?
r = +1.00
12
LINEAR REGRESSION
Equation of a Straight Line
y = m(x) + b
y = +2.00(x) + -1.00
Now, we can predict an unknown value of Y given a value of X
y = +2.00 (15) + -1.00 = 29.00
y = +2.00 (100) + -1.00 = 199.00
r = +1.00
13
LINEAR REGRESSION
y = +2.00(x) + -1.00
y = +2.00 (15) + -1.00 = 29.00
y = +2.00 (100) + -1.00 = 199.00
Since X & Y were perfectly related, we can make precise, single-value
predictions of Y from a given value of X
r = +1.00
14
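This plug-and-predict step translates directly into code (a minimal sketch using the slide’s line, y = +2.00(x) + -1.00; the function name is ours):

```python
def predict_y(x, slope=2.0, intercept=-1.0):
    # y = m(x) + b with the slide's values: m = +2.00, b = -1.00
    return slope * x + intercept

print(predict_y(15))   # 29.0
print(predict_y(100))  # 199.0
```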
LINEAR REGRESSION
What line should we use to characterize the relationship, and how
do we determine its equation?
What single line do we draw in order to determine its equation?
15
LINEAR REGRESSION
What line should we use to characterize the relationship, and how do we
determine its equation?
We will want to choose the “Best Fitting Regression Line”
The Line That Has The Smallest Average Degree of Prediction Error
Prediction Error = Y’ – Y
Y’ = The Line’s Predicted Value of Y; Y = Actual Value of Y
[Figure: a point’s actual value Y, the line’s predicted value Y′, and the prediction error Y′ – Y]
16
LINEAR REGRESSION
What line should we use to characterize the relationship, and how do we
determine its equation?
Some Lines Have Smaller or Larger Average Degrees of Prediction Error Than Others
Smaller Average Degree of Prediction Error / Larger Average Degree of Prediction Error
17
LINEAR REGRESSION
What line should we use to characterize the relationship, and how do
we determine its equation?
There will always be one single straight line that “best fits the data”
or,
There will always be one straight line that has a smaller average degree of
Prediction Error than all other possible straight lines
This line is what is known as the “Least Squares Regression Line”, or “Best Fitting
Regression Line”
We will want to determine this line’s equation and use it for prediction
19
LINEAR REGRESSION
Equation of the “Least Squares Regression Line”
Y’ = by (x) + ay
Conceptually the same as “y = m (x) + b”, but this is the more common regression notation
In order to determine the slope (by) and y-intercept (ay) of the
“Best Fitting Regression Line”, use the following equations:
by = SP / SSX (calculate first)
ay = MY – by MX (calculate second)
20
LINEAR REGRESSION
Equation of the “Least Squares Regression Line”
Y’ = by (x) + ay
Conceptually the same as “y = m (x) + b”, but I will use this to be consistent with the textbook
In order to determine the slope (by ) and y-intercept (ay) of the
“Best Fitting Regression Line”, use the following equations:
Once you have calculated the
Pearson r correlation coefficient by hand,
then calculating this is easy as most of the
work has been completed already
21
Functional: Defining the line of best fit that we visually
estimated in our scatterplots
Conceptual: How we discuss our results
Correlation does not specify relationship directionality at all
Regression can imply it, if not directly test it
“Predicting Y FROM X”
Statistical:
Simple linear regression and correlation will yield same results
Beta (β) or b instead of Pearson’s r
As statistics get more complex, regression gives us more functions
Adding multiple predictors
LINEAR REGRESSION VS CORRELATION
22
LINEAR REGRESSION IS FOR THE BIRDS
23
Practice estimating slopes and intercepts here!
https://sophieehill.shinyapps.io/eyeball-regression/
24
EYEBALL REGRESSION
25
MORE ON PREDICTION
EQUATIONS
26
PREDICTING QUIZ SCORE (Y)
FROM HOURS STUDYING (X)
[Scatterplot: Score (0–6) vs Hours (0–8) with fitted regression line; Rsq = 0.5875]
r = .77
r2 = .59
27
PREDICTING QUIZ SCORE (Y)
FROM HOURS STUDYING (X)
[Scatterplot annotated: the Slope (β) is the tilt of the regression line;
the Intercept (α) is where it crosses the Y axis.
Axes: Hours Studying (X), Quiz Score (Y); Rsq = 0.5875]
How can we describe
this “regression” line?
28
LINEAR REGRESSION MODEL
Population parameters: Yi = α + β(Xi)
Sample statistics: Yi = a + b(Xi)
a & b are constants
Y, X, & e vary for each person (i)
29
SIMPLE LINEAR REGRESSION
EQUATION
Y′ = Predicted value of Y
a = Intercept
Value of Y when X = 0
b = Slope, unstandardized regression coefficient
Change in Y for every 1-unit change in X
X = Any value of X
Note:
a (intercept) & b (slope) are constants
X & Y are variables
Y′ = a + bX
30
EXAMPLE OF A PREDICTION EQUATION
Predict quiz score (Y) from hours studying (X)
Y′ = 1.5 + .5X
31
PREDICTING QUIZ SCORE (Y)
FROM HOURS STUDYING (X)
[Scatterplot: Score (0–6) vs Hours (0–8) with fitted regression line; Rsq = 0.5875]
Quiz(Y′) = 1.5 + .5X
a (intercept) = 1.5
b (slope) = .5
32
EXAMPLES OF PREDICTION EQUATIONS
Predict marital satisfaction (Y) from conflict (X)
Predict depression (Y) from stressful events (X)
Y′ = 10 + (-1)X
Y′ = 10 + 2X
33
UNDERSTANDING THE SLOPE
Expected change in Y for every 1-unit change in X
“Rise over run”
Slope can be positive (Y increases as X increases)
or negative (Y decreases as X increases)
The (unstandardized) slope is in the metric of Y
34
COMPUTING THE SLOPE & INTERCEPT
bYX = SP / SSX = Σ(X – MX)(Y – MY) / Σ(X – MX)²
or
bYX = r (sY / sX)
aYX = MY – bYX MX
• These are the formulas for
regression of Y on X
• They are not reciprocal!
35
SAMPLE COMPUTATIONS
• Predict quiz scores from hours studying
• Assume r = .77, sY = 1.63, sX = 2.50, MY = 3.33, MX = 3.67
bYX = r (sY / sX) = .77 (1.63 / 2.50) = .50
aYX = MY – bYX MX = 3.333 – (.50)(3.667) = 1.5
Y′Quiz = 1.5 + .5 (XStudy)
36
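The same computation in code (a sketch built from the slide’s summary statistics; using the rounded means 3.33 and 3.67 gives an intercept of 1.49 rather than the slide’s 1.5, which is only rounding):

```python
def regression_line(r, s_y, s_x, m_y, m_x):
    # Regression of Y on X: b = r(sY / sX), a = MY - b(MX)
    b = r * (s_y / s_x)
    a = m_y - b * m_x
    return b, a

b, a = regression_line(r=0.77, s_y=1.63, s_x=2.50, m_y=3.33, m_x=3.67)
print(f"Y' = {a:.2f} + {b:.2f}(X)")  # Y' = 1.49 + 0.50(X)
```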
PREDICTING QUIZ SCORE (Y)
FROM HOURS STUDYING (X)
[Scatterplot: Score (0–6) vs Hours (0–8) with fitted regression line; Rsq = 0.5875]
Quiz(Y′) = 1.5 + .5X
a (intercept) = 1.5
b (slope) = .5
37
PREDICTION ERRORS
How good a job are we doing at predicting
Y from X?
Compute each Y′ from prediction equation
Simply plug in X for each person to determine Y′
How close is Y′ to Y?
Y – Y′ = error = “residual”
38
RESIDUALS (ERRORS OF PREDICTION)
[Scatterplot: residuals drawn as vertical distances from each data point to the line Y′;
Rsq = 0.5875]
Residual = Y – Y′
The line is called the “Prediction Line” or “Regression Line”
PREDICTION ERRORS
39
Y′ = 1.5 + .50 (X)
40
PREDICTION ERRORS
How much error on average?
Σ(Y – Y′) = 0, so we have to square them! (just like when we
computed variance around a mean)
Variance of the residuals: s²Y.X = Σ(Y – Y′)² / n
Standard deviation of the residuals: sY.X = √( Σ(Y – Y′)² / n )
(“Standard error of prediction”)
THE STANDARD ERROR OF ESTIMATE & CORRELATION
The standard error of estimate (se) gives a measure
of the standard distance between a regression line
and the actual data points
If the correlation is near +/-1.00, the standard error
of estimate will be small; as the correlation nears 0,
it will become larger
Predicted variability = SSregression = r²SSY
Unpredicted variability = SSresidual = (1 – r²)SSY
41
SSresidual / df = Σ(Y – Ŷ)² / (n – 2) = (1 – r²)SSY / (n – 2)
IS THE BEST FITTING LINE ALWAYS A
GOOD FITTING LINE?
[Two scatterplots, each showing its best fitting line: in one the points hug the line; in the other they scatter widely]
Both of these figures show the best fitting lines for the data.
But would we say that both lines fit the data equally well? Clearly not!
42
“LINE OF BEST FIT” VS. “GOODNESS OF FIT”
Imagine that you’re shopping for a suit, and you find
the best fitting suit in Wal-Mart
does that necessarily mean that the suit fits you
well?
• line of best fit = “best fitting suit”
• goodness of fit = “how well the suit fits”
43
“GOODNESS OF FIT”: R²
We want to determine how much of the variability of y is
explained by x
The “residual sum of squares” SSresidual tells us how much of the
variation in y is unexplained by our model
We can also calculate how much of the variation in y is explained
by our model: SSregression
The r² value, our measure of goodness of fit, tells us what
proportion of the total sum of squares of our outcome variable
(SSY) is explained by our model
r² = SSregression / SSY
44
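The sums-of-squares split can be verified numerically (a sketch on a small made-up data set, not the quiz data from the slides):

```python
# Hypothetical (x, y) pairs, chosen only to illustrate the sums-of-squares split
xs = [0, 1, 2, 4, 5, 6]
ys = [1, 2, 2, 4, 4, 5]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # sum of products
ss_x = sum((x - mx) ** 2 for x in xs)
b = sp / ss_x                                           # slope
a = my - b * mx                                         # intercept

ss_y = sum((y - my) ** 2 for y in ys)                   # total variability of y
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
ss_regression = ss_y - ss_residual                      # explained

r_squared = ss_regression / ss_y
print(f"r^2 = {r_squared:.3f}")  # proportion of SSy explained by the model
```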
UNDERSTANDING THE REGRESSION EQUATION
Some precautions:
The predicted value is not perfect (unless r = +/-1.00)
The regression equation should not be used to make predictions for X
values that fall outside of the range of values covered by the original
data
E.g., We wouldn’t want to predict creativity scores for someone with an IQ of 90 or 130
because the relationship between IQ and creativity may be different for these values
45
46
HYPOTHESIS TESTS
WITH REGRESSION
TESTING THE SIGNIFICANCE OF THE
REGRESSION EQUATION
Analysis of regression
Is the amount of variance predicted by the
regression equation significantly greater than what
we would expect by chance if there was no
relationship between x and y?
47
TESTING THE SIGNIFICANCE OF THE
REGRESSION EQUATION
Analysis of regression
Is the amount of variance predicted by the regression
equation significantly greater than what we would expect by
chance if there was no relationship between x and y?
Regression variation (SSregression) The variance in y that is
related to or associated with changes in x. The closer data
points fall to the regression line, the larger the value of
regression variation.
Residual variation (SSresidual) The variance in y that is not
related to changes in x. This is the variance in y that is left
over or remaining. The farther data points fall from the
regression line, the larger the value of residual variation.
48
TESTING THE SIGNIFICANCE OF THE
REGRESSION EQUATION
Analysis of regression
Very similar to analysis of variance:
Uses an F-ratio of two mean square (MS) values
Each MS is a SS divided by its df
F = MSregression / MSresidual
= (Variance of y related to changes in X)
/ (Variance of y not related to changes in X)
49
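In code, the F-ratio is just the two mean squares divided (a sketch with hypothetical sums of squares for a simple regression with one predictor and n = 6 points):

```python
# Hypothetical sums of squares (one predictor, n = 6 data points)
ss_regression = 11.571
ss_residual = 0.429
n = 6

ms_regression = ss_regression / 1        # df_regression = number of predictors = 1
ms_residual = ss_residual / (n - 2)      # df_residual = n - 2
f_ratio = ms_regression / ms_residual
print(f"F(1, {n - 2}) = {f_ratio:.1f}")  # a large F -> significant regression
```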
Learning check!
STANDARDIZED REGRESSION EQUATION
Involves transforming raw scores into z-scores before
finding the regression equation
  X    Y     Zx     Zy    ZxZy    Zx²     Zy²
 107   6   -.27   -.25   .0675   .073    .0625
 110   8    .07    .58   .0406   .0049   .3364
 101   4   -.96  -1.08   1.037   .9216   1.1664
 105   5   -.50   -.66   .33     .25     .4356
 124  10   1.66   1.41   2.34    2.756   1.9881
Recall that for any standardized distribution, M = 0 and SD = 1, so:
• a = 0 (intercept drops out of equation)
• beta (β) = r
β is the standardized regression coefficient
• Easier to interpret than b
• Useful for multiple regression (comparing multiple predictors)
Raw-score form: Ŷ = bX + a   Standardized form: z’y = rzx
Here, z’y = .95zX
50
51
HYPOTHESIS TESTING FOR B (SLOPE)
Three common hypothesis tests:
Is b significantly different from 0?
Is b significantly different from some non-zero value?
Are two bs significantly different from each other?
52
HYPOTHESIS TESTING FOR B
Is b significantly different from 0?
Population parameter: β*
Sample statistic: b
Two-tailed statistical hypotheses
H0: β* = 0
H1: β* ≠ 0
Conduct a single-sample t-test
53
HYPOTHESIS TESTING FOR B
Conduct a t-test:
t = (b – βhypoth) / sb,  df = n – 2
where sb = sY.X / (sX √(n – 1))  (the “standard error of the slope”)
54
HYPOTHESIS TESTING FOR B
Notice that the standard error of b will be influenced by 3
things:
Larger n = smaller standard error
Larger sY.X = larger standard error
Poor prediction overall leads to more error (variability) in sample
estimates of β*
Larger sX = smaller error
All else equal, greater variability in X results in more stable (less
variable) sample estimates of β*
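Putting the pieces together (a sketch with hypothetical values for b, sY.X, sX, and n; the variable names are ours):

```python
import math

# Hypothetical sample values
b = 0.50      # estimated slope
s_yx = 1.0    # standard error of prediction (SD of the residuals)
s_x = 2.5     # SD of the predictor
n = 20        # sample size

s_b = s_yx / (s_x * math.sqrt(n - 1))  # standard error of the slope
t = (b - 0) / s_b                      # test H0: beta* = 0
df = n - 2
print(f"t({df}) = {t:.2f}")  # compare to the critical t with n - 2 df
```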
MORE REGRESSION,
MORE VARIABLES
55
MULTIPLE REGRESSION
One criterion variable and two or more predictor variables
determining a single comprehensive relationship
Can help fix the third variable problem, by adding controls
into the regression equation
Simple: Ŷ = bX + a    Multiple: Ŷ = b1X1 + b2X2 + … + bnXn + a
56
57
RESEARCH PROBLEM
What is the association between a DV (Y)
and two or more IVs (Xs)?
Predicting exam grades (Y) from hours studying (X1) and
number of lectures attended (X2)
Predicting depression (Y) from stress (X1) and
social support (X2)
Predicting marital satisfaction (Y) from intimacy (X1), conflict
(X2), closeness (X3)
Can have any combination of numerical or
categorical predictor variables
58
Note that IQ predicts 40% of the variance in academic performance but adding
SAT scores as a second predictor increases the predicted portion by only 10%.
59
T YPES OF MULTIPLE REGRESSION
Three types of multiple regression (MR):
Simultaneous
Enter all predictor variables at the same time
“Standard” MR
Hierarchical
Enter predictor variables in predetermined sets
Stepwise
Computer program adds (or takes away) predictor variables one at
a time to optimize R2 (coefficient of multiple determination)
Completely data driven
60
SIMULTANEOUS MULTIVARIATE REGRESSION
WITH TWO PREDICTORS
[Diagram: X1 = Intimacy and X2 = Conflict each predict Y = Relationship Satisfaction]
Y′ = a + b1 X1 + b2 X2
61
QUESTIONS OF INTEREST IN MR
What is the relationship between Y and each X?
b (unstandardized slopes)
t-tests for b’s
What is the relative importance of each X?
β (standardized slopes)
62
SIMULTANEOUS MULTIVARIATE REGRESSION
WITH TWO PREDICTORS
[Diagram: X1 = Intimacy and X2 = Conflict each predict Y = Relationship Satisfaction,
with slopes b1/β1 and b2/β2 on the paths]
b = unstandardized slope (regression coefficient)
β = standardized slope (regression coefficient)
MULTIPLE REGRESSION
A statistical technique that assesses the effect of several
predictors (X) on a single criterion (outcome) measure (Y)
Tells us the contribution of each variable above and beyond the
other variables in the equation
Unstandardized prediction equation:
Y′ = a + b1 X1 + b2 X2
Y′ = Predicted value of Y
a = Intercept
Value of Y when all the Xs = 0
bj = Partial slope for variable j
Change in Y for every one-unit change in Xj,
holding all other Xs constant
Xi = Value of X1 (or X2) for person i
63
64
EXAMPLES
Predicting exam scores (Y) from hours studying (X1)
and number of lectures attended (X2)
Y′ = 25 + 3 (X1) + 1 (X2)
Predicting depression (Y) from stress (X1)
and social support (X2)
Y′ = 5 + 2 (X1) + -4 (X2)
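These two equations translate directly into code (a sketch; the input values below are made up):

```python
def predict_exam(hours_studying, lectures_attended):
    # Y' = 25 + 3(X1) + 1(X2)
    return 25 + 3 * hours_studying + 1 * lectures_attended

def predict_depression(stress, social_support):
    # Y' = 5 + 2(X1) + -4(X2): more support -> lower predicted depression
    return 5 + 2 * stress + (-4) * social_support

print(predict_exam(10, 20))      # 75
print(predict_depression(3, 2))  # 3
```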
MULTIPLE REGRESSION:
CONTRIBUTION OF EACH PREDICTOR
Do both of our predictor variables predict variability in Y?
What if only one of them is actually predictive? How can we
examine the relative contribution of each predictor variable?
Examining the beta values works if we have standardized
our scores before finding the regression equation (i.e., we
are using the standardized form of the multiple regression
equation)
Larger beta (β) = larger contribution
z’Y = 1.2zX1 + .65zX2
65
MULTIPLE REGRESSION:
CONTRIBUTION OF EACH PREDICTOR
Can also test the significance of each contribution:
Does adding the second predictor variable (X2) make our
predictions significantly more accurate?
E.g., H0: b2 = 0
To test this hypothesis, we follow 3 steps:
1. How much variance is predicted by using just the first predictor
variable?
2. What is the contribution made by the second variable?
3. Is this additional variance significant or not?
66
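One standard way to run these three steps is an F-test on the change in R² when the second predictor is added (a sketch using the IQ/SAT figures from the earlier slide, with a hypothetical sample size of n = 100):

```python
# Step 1: variance predicted by the first predictor alone (IQ): R^2 = .40
r2_reduced = 0.40
# Step 2: variance predicted after adding the second predictor (SAT): R^2 = .50
r2_full = 0.50

n = 100       # hypothetical sample size
k_full = 2    # predictors in the full model

# Step 3: is the extra 10% of variance significant?
f_change = ((r2_full - r2_reduced) / 1) / ((1 - r2_full) / (n - k_full - 1))
print(f"F(1, {n - k_full - 1}) = {f_change:.1f}")
```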
67
MORE ON MULTIPLE
REGRESSION
Multiple regression
A statistical technique that assesses the effect of several predictors (X)
on a single criterion (outcome) measure (Y)
AKA, multiple regression tells us about the effect of a variable on the outcome
above and beyond the other variables in the model
Can serve two goals:
Rule out alternative explanations
Give more predictive power
This allows us to control for the effect of other variables statistically,
even when we can’t control them experimentally
It does NOT
Establish causation
68
WHAT CAN MULTIPLE REGRESSION TELL US?
MULTIPLE REGRESSION: EXAMPLE
Linear regression Multiple regression
What predicts behaviour problems in elementary kids?
69
THE THIRD VARIABLE PROBLEM
70
Multiple Regression Helps with the Third Variable Problem
71
72
73
A friend looks at these data and says, “The only reason
availability of recess predicts behaviour problems in the
classroom is because there are so many boys in the class,
and boys are obviously more active.”
What do you say back?
74
MULTIVARIATE ANALYSIS
Regression in the popular press
Look for buzz words:
Controlled for
Taking into account
Correcting for
Adjusted for
Learning check!
75
SPECIAL TYPES OF MULTIPLE
REGRESSION
Mediation analysis
Assesses whether a third variable explains the relationship between
X and Y
Identifies possible causal mechanisms
Moderation analysis
Assesses whether a third variable changes the relationship between
X and Y
Identifies possible interactions among predictors
76
MEDIATION ANALYSIS
Show that IV predicts DV
Show that IV predicts mediator
Include both the IV and the
mediator as predictors of the DV
[Path diagram with standardized coefficients β31, β21, β32, β11]
*Results from Buying Time Promotes Happiness
77
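The three steps above can be sketched with simulated data (all numbers are invented; a real analysis would also test each path's significance with dedicated software):

```python
# A sketch of the three mediation steps with simulated data (all
# numbers are invented); real analyses would also test each path.
import numpy as np

rng = np.random.default_rng(1)
n = 200
iv = rng.normal(size=n)
mediator = 0.6 * iv + rng.normal(size=n)             # IV -> mediator
dv = 0.5 * mediator + 0.1 * iv + rng.normal(size=n)  # effect mostly via mediator

def betas(predictors, outcome):
    """Standardized slopes (betas) from OLS on z-scored variables."""
    Xz = np.column_stack([(c - c.mean()) / c.std() for c in predictors])
    yz = (outcome - outcome.mean()) / outcome.std()
    b, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), Xz]), yz, rcond=None)
    return b[1:]

c_total = betas([iv], dv)[0]                 # Step 1: IV predicts DV
a_path = betas([iv], mediator)[0]            # Step 2: IV predicts mediator
c_prime, b_path = betas([iv, mediator], dv)  # Step 3: both predict DV
# Mediation pattern: the IV's direct effect shrinks once the
# mediator is in the model
print(round(c_total, 2), round(c_prime, 2), round(b_path, 2))
```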
MODERATION ANALYSIS
When a relationship between
two variables depends on a third
variable
Statistical interaction!
In multiple regression, include
the IV, the moderator, and an
interaction term
Example: Swearing moderates
the relationship between
catastrophising and cold-pressor
latency
*Results from Swearing as a Response to Pain
78
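Moderation can be sketched the same way: simulate data where the IV's slope depends on the moderator, then fit a regression containing the IV, the moderator, and their product (all values invented):

```python
# A sketch of moderation: the IV, the moderator, and their product
# (the interaction term) in one regression. Data are simulated with
# a true interaction coefficient of 0.5.
import numpy as np

rng = np.random.default_rng(2)
n = 200
iv = rng.normal(size=n)
moderator = rng.normal(size=n)
dv = 0.4 * iv + 0.2 * moderator + 0.5 * iv * moderator + rng.normal(size=n)

X = np.column_stack([np.ones(n), iv, moderator, iv * moderator])
b, *_ = np.linalg.lstsq(X, dv, rcond=None)
b0, b_iv, b_mod, b_inter = b
# A clearly nonzero interaction coefficient signals moderation
print(round(b_inter, 2))
```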
https://www.blueprintincome.com/tools/life-expectancy-calculator-how-long-will-i-live/
MULTIPLE REGRESSION IN LIFE
79
http://time.com/8293/its-true-liberals-like-cats-more-than-conservatives-do/
MULTIPLE REGRESSION IN LIFE
Quiz score correlated
with actual political
preference at r = .68
http://time.com/510/can-time-predict-your-politics/
80
Data analysis project – take a look at the instructions!
MindTap + tutorial
Review midterm with TAs
81
TO DO
TD0409-01 课件/Psy 202_4_IntrotoFactorial_W22.pdf
Instructor:
Dr. Molly Metz
PSY 202H1:
STATISTICS II
MODULE 4:
INTRO TO FACTORIAL ANOVA
1
1. Intro to Factorial ANOVA
1. Why factorial designs?
2. Structure of a Factorial ANOVA
3. A conceptual demonstration
4. What we can learn from Factorial ANOVA
5. Effects in graphs and text
2. Calculations – next module!
GAME PLAN
2
SOME QUICK CLARIFICATION
The Question: If my exact degrees of freedom
isn’t included on the table, which number
should I use?
Answer: Choose the safest (most conservative)
value
E.g., You’re looking up a q value, and your dfwithin
= 36 but the table jumps from 30 to 40
Use 30 (which will indicate a larger q value, making it a more
conservative test)
3
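The "most conservative value" rule can be sketched in a couple of lines (the table rows here are hypothetical):

```python
# Hypothetical rows from a q table; pick the largest tabled df that
# does not exceed the actual df (the conservative choice).
tabled_dfs = [20, 24, 30, 40, 60]
df_within = 36
conservative = max(d for d in tabled_dfs if d <= df_within)
print(conservative)  # → 30
```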
SOME QUICK CLARIFICATION
The Question: How many decimal places should I
round my final answer to? If I’m doing multiple
calculations to get to my final answer, should I
not round until the ver y end?
Answer: You should round your final answers to 2
decimal places (e.g., 4.87246 → 4.87). For the best
accuracy, hold off on rounding until the last step. But
do not spend time worrying about rounding – I am
much more interested in whether you followed the
correct steps, used the correct formulas, etc. So long
as you’ve shown your work, there is no need to worry
if your final answer is off by a couple of hundredths.
4
INTRODUCTION TO
FACTORIAL
DESIGN AND ANALYSIS
5
THE BIG PICTURE
Making comparisons to a population (NO IVs):
  Single score → z score
  Sample mean → σ known: z test; σ unknown: one-sample t-test
Making comparisons between levels of IV(s) or groups:
  1 IV, 2 levels → between subjects: independent-samples t-test; within subjects: paired-samples t-test
  1 IV, 3+ levels → between subjects: One-Way Between ANOVA; within subjects: One-Way Repeated ANOVA
  More than 1 IV → all IVs between subjects: Between-Subjects Factorial ANOVA; all IVs within subjects: Repeated Measures Factorial ANOVA; mix of within and between: Mixed Model Factorial ANOVA
6
WHY FACTORIAL ANOVAS?
7
• So far, we have discussed designs where there is only one IV and only
one DV
• Complex designs include multiple IVs, multiple DVs, or both
• Multiple IVs → Factorial design & assessing interactions
• More groups can offer more precision
• Include more experimental, control, or placebo conditions (add
levels of IV)
• Want to understand if your effect is moderated/affected by another
variable (add IVs)
WHEN WOULD YOU WANT TO STUDY
MORE THAN TWO GROUPS?
8
REQUIREMENTS
Must have 2 or more IVs
Must have 2 or more levels of each IV
Must have quantitative DV
But why a Factorial ANOVA when t-tests and one-
ways are just so great!?
FACTORIAL ANOVA
9
Example
IV (exercise): mild vs. intense
IV (age group): young adult vs. elderly
DV: overall fitness
WHEN WOULD YOU WANT TO STUDY
MORE THAN TWO GROUPS?
Why not just run two t-tests?
• Exercise on fitness
• Age on fitness
10
Age on fitness test performance – t-test
[Bar graph: fitness (0–7) for young vs. elderly]
11
Exercise on fitness test performance – t-test
[Bar graph: fitness (0–7) for mild vs. intense exercise]
HUH??!!??
Exercise is
good right!?
12
AGE × EXERCISE ON FITNESS INTERACTION!
[Graph: fitness (0–7) for young vs. elderly, separate lines for mild and intense exercise]
13
MEASURING MORE THAN ONE OUTCOME
1. Manipulation checks (measuring the IV)
To ensure our manipulation worked
2. Multiple measures of the same variable or construct (same DV)
To assess convergent validity
To create composite scores
3. Measures of several different variables or constructs (multiple DVs)
To assess divergent (discriminant) validity
To assess possible confounds that can't be experimentally controlled
4. Multiple IVs
To assess interactions
14
FACTORIAL DESIGNS: ADVANTAGES
1. Allow for testing of multiple hypotheses within a
single study
1. Methodologically efficient
2. Statistically “cheaper”
2. Allows for more complex hypotheses and research
questions
3. Better understand the nuances of an effect
1. Interactions
2. Moderation
15
When describing factorial ANOVAs statistically and
conceptually I’ll focus on 2 x 2 factorial designs
As we start to calculate things, you’ll understand why!
Thus the specifics in the remainder of this lecture
apply only to 2 x 2 Factorial ANOVAs, rather than
more complex designs.
DISCLAIMER
16
FACTORIAL DESIGN
17
MANIPULATING MULTIPLE FACTORS
Allows us to answer questions about whether the effect of
one independent variable depends on the level of another
Factorial design: Each level of one IV is combined with each
level of the others to produce all possible combinations of
levels
Non-manipulated IVs ok
18
WINE-RATING EXAMPLE
What determines how highly a wine is rated?
Cf. Plassmann, O’Doherty, Shiv, & Rangel (2008; PNAS)
Quality? Price?
19
FACTORIAL DESIGN
A research design investigating the effect of two or more
independent variables (factors) on the dependent variable
Cheap Price Expensive Price
Low Quality
Low Quality,
Cheap Price
Low Quality,
Expensive Price
High Quality
High Quality,
Cheap Price
High Quality,
Expensive Price
Price
Quality
Factors: Quality, Price
Levels: High vs. Low; Cheap vs. Expensive
2 levels x 2 levels = 4 conditions
20
FACTORIAL DESIGN TABLE
The rows represent the levels of one independent variable, the
columns represent the levels of a second independent variable, and
each cell represents a condition.
2×2 DESIGN:
                        FACTOR B
                        Level 1        Level 2
FACTOR A    Level 1     Condition 1    Condition 2
            Level 2     Condition 3    Condition 4
21
BETWEEN- VS. WITHIN-SUBJECT
FACTORIAL DESIGN
Between-subjects factorial design
ALL of the factors are manipulated between subjects
Each subject participates in just ONE condition
Within-subjects factorial design
ALL of the factors are manipulated within subjects
Each subject participates in ALL conditions
Mixed design factorial
SOME of the factors are manipulated between subjects, SOME within subjects
Each subject participates in MORE THAN ONE, but NOT ALL conditions
22
Research on video games and aggression has been
mixed. Studies often compare how violent and non-
violent video games affect aggressive behavior, but
you wonder if perhaps opponent type – whether the
game is played against another person or the
computer – also might matter.
IV1: Game type – violent or non-violent
IV2: Opponent type – real or computer
DV: Aggressive behavior
AN EXAMPLE
23
Between Subjects:
Each participant participates in one level of each IV
(i.e., in one of the four cells of the design).
All four cells of the design have different
participants.
TYPES OF FACTORIAL DESIGNS: BETWEEN
Violent, level 1 Non-violent, level 2
Against person, level 1 Participants #: 1-10 Participants #: 11-20
Against computer, level 2 Participants #: 21-30 Participants #: 31-40
24
Repeated Measures:
Each participant participates in both levels of both
IVs (i.e., in all four cells of the design).
All four cells of the design have the same
participants.
TYPES OF FACTORIAL DESIGNS: WITHIN
Violent, level 1 Non-violent, level 2
Against person, level 1 Participants #: 1-40 Participants #: 1-40
Against computer, level 2 Participants #: 1-40 Participants #: 1-40
25
Mixed Model:
Each participant participates in one level of one IV
and in both levels of the other IV (i.e., in two cells of
the design).
Two cells of the design have the same participants,
the other two have another set of participants.
TYPES OF FACTORIAL DESIGNS: MIXED
Violent, level 1 Non-violent, level 2
Against person, level 1 Participants #: 1-20 Participants #: 21-40
Against computer, level 2 Participants #: 1-20 Participants #: 21-40
26
Structure
Factors – new term for independent variable
Levels – number of variations or categories in IV
Notation
A x B x C -> 2 x 2 x 3
Where number of terms represents number of factors
And the value of each term represents number of levels in that factor
So the product of the terms represents the total number of conditions
Example: “We utilized a 2 (Game type: violent, non-violent) x 2
(Opponent type: person, computer) between-subjects factorial
design”
Or, a 2×2
DESCRIBING A FACTORIAL DESIGN
27
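The notation rule (product of the levels = number of conditions) can be sketched with itertools.product, using the hypothetical video-game design above:

```python
# Each level of one factor is crossed with each level of the others,
# so the number of conditions is the product of the numbers of levels.
from itertools import product

factors = {
    "Game type": ["violent", "non-violent"],
    "Opponent type": ["person", "computer"],
}
conditions = list(product(*factors.values()))
print(conditions)
print(len(conditions))  # → 4, i.e. 2 x 2
```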
Learning check!
A CONCEPTUAL
DEMONSTRATION
28
KINDS OF STATISTICAL EFFECTS
Main effect
On average, levels of Factor A differ from each other
On average, levels of Factor B differ from each other
Simple effect
At a specific level of Factor A, levels of Factor B differ from each other
At a specific level of Factor B, levels of Factor A differ from each other
Interaction
The effect of Factor A on the DV depends on the level of Factor B
The difference between levels of Factor A is different for different levels of Factor B
29
We’ll come
back to this one
in a bit
2X2 EXAMPLE STUDY
• Wells & Petty (1980)
• Previous work shows that we sometimes infer
our attitudes and feelings by looking at our
behavior
• Suggested that we use our physical behavior
as an attitude cue
30
Asked participants to "help evaluate headphones"
Nod head up and down
Shake head left to right
Listened to persuasive argument
Advocate tuition increase
Advocate tuition decrease
Rated opinion on tuition change
Head nod Head shake
Tuition
increase
1 2
Tuition
decrease
3 4
31
POSSIBLE EFFECTS
When testing for effects in factorial designs,
several possible patterns of results:
No effects
One main effect (speech topic)
One main effect (head movement)
Two main effects
Interaction
32
POSSIBLE EFFECTS
When testing for effects in factorial designs,
several possible patterns of results:
No effects
[Graph: Opinion on Tuition Change by head movement and speech topic, illustrating no effects]
33
POSSIBLE EFFECTS
When testing for effects in factorial designs,
several possible patterns of results:
No effects
One main effect (speech topic)
[Graph: Opinion on Tuition Change by head movement and speech topic, illustrating a main effect of speech topic]
34
POSSIBLE EFFECTS
When testing for effects in factorial designs,
several possible patterns of results:
No effects
One main effect (speech topic)
One main effect (head movement)
[Graph: Opinion on Tuition Change by head movement and speech topic, illustrating a main effect of head movement]
35
POSSIBLE EFFECTS
When testing for effects in factorial designs,
several possible patterns of results:
No effects
One main effect (speech topic)
One main effect (head movement)
Two main effects
[Graph: Opinion on Tuition Change by head movement and speech topic, illustrating two main effects]
36
[Two graphs: Opinion on Tuition Change by head movement and speech topic, illustrating the two interaction patterns]
INTERACTIONS
• Effects of one IV on DV depend on presence of second IV
• Two types
• Spreading: effect exists at one level of the IV and is
weaker or nonexistent at a different level
• Crossover: no main effects of either IV because effects
are opposite at different levels of the other IV
37
CONCEPTUALLY,
WHAT CAN WE LEARN
FROM
FACTORIAL DESIGNS?
38
KINDS OF STATISTICAL EFFECTS
Main effect
On average, levels of Factor A differ from each other
On average, levels of Factor B differ from each other
Simple effect
At a specific level of Factor A, levels of Factor B differ from each other
At a specific level of Factor B, levels of Factor A differ from each other
Interaction
The effect of Factor A on the DV depends on the level of Factor B
The difference between levels of Factor A is different for different levels of Factor B
39
WINE-RATING EXAMPLE
How do quality and price
affect wine ratings?
Cheap Price Expensive Price
Low Quality
Low Quality,
Cheap Price
Low Quality,
Expensive Price
High Quality
High Quality,
Cheap Price
High Quality,
Expensive Price
Price
Quality
Factors: Quality, Price
Levels: High vs. Low; Cheap vs. Expensive
2 levels x 2 levels = 4 conditions
40
Cheap Price Expensive Price
Low Quality 35 87 61
High Quality 51 87 69
43 87 Marginal Means
MAIN EFFECTS
The effect of one factor on average across all levels of
the other factor(s); difference between marginal means
Main Effect of Price
87 – 43 = 44
Main Effect of
Quality
69 – 61 = 8
Main effect of price: On average,
expensive wines (M=87) were rated 44
points higher than cheap wines (M=43)
Main effect of Quality:
On average, high-
quality wines (M=69)
were rated 8 points
higher than low-quality
wines (M=61)
41
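The marginal means and main effects above can be checked with a short sketch of the cell means from the slide:

```python
# Cell means from the wine-rating slide; marginal means are row/column
# averages, and each main effect is the difference between them.
import numpy as np

#                 cheap  expensive
cells = np.array([[35.0, 87.0],   # low quality
                  [51.0, 87.0]])  # high quality

quality_marginals = cells.mean(axis=1)  # [61., 69.]
price_marginals = cells.mean(axis=0)    # [43., 87.]

main_effect_quality = quality_marginals[1] - quality_marginals[0]
main_effect_price = price_marginals[1] - price_marginals[0]
print(main_effect_quality, main_effect_price)  # → 8.0 44.0
```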
Cheap Price Expensive Price
Low Quality 35 87
High Quality 51 87
SIMPLE EFFECTS
Simple Effect of Price
on Low Quality Wines
87 – 35 = 52
Simple Effect of Price
on High Quality Wines
87 – 51 = 36
Simple Effect of Quality
on Cheap Wines
51 – 35 = 16
Simple Effect of Quality
on Expensive Wines
87 – 87 = 0
When the wine is cheap,
high-quality wines (M=51)
are rated 16 points higher
than low-quality wines
(M=35)
When the wine is expensive,
high-quality (M=87) wines
are rated the same as low-
quality wines (M=87)
When the wine is high-
quality, expensive wines
(M=87) are rated 36 points
higher than cheap wines
(M=51)
When the wine is low-quality,
expensive wines (M=87) are
rated 52 points higher than
cheap wines (M=35)
42
Cheap Price Expensive Price
Low Quality 35 87
High Quality 51 87
INTERACTIONS
Simple Effect:
Price on Low Quality Wines
87 – 35 = 52
Simple Effect:
Price on High Quality Wines
87 – 51 = 36
Simple Effect:
Quality on Cheap Wines
51 – 35 = 16
Simple Effect:
Quality on Expensive Wines
87 – 87 = 0
The effect of one factor depends on the levels of the
other factor(s); difference between simple effects
Interaction: The effect of quality is
different for cheap wines vs. expensive
wines
(the effect of quality depends on price)
Interaction: The effect of
price is different for high-
quality wines vs. low-
quality wines (the effect of
price depends on quality)
43
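The "difference between simple effects" definition can be checked directly on the same cell means:

```python
# Simple effects of quality at each price level; the interaction is
# the difference between those simple effects.
import numpy as np

#                 cheap  expensive
cells = np.array([[35, 87],   # low quality
                  [51, 87]])  # high quality

quality_at_cheap = cells[1, 0] - cells[0, 0]      # 51 - 35 = 16
quality_at_expensive = cells[1, 1] - cells[0, 1]  # 87 - 87 = 0
interaction = quality_at_cheap - quality_at_expensive
print(quality_at_cheap, quality_at_expensive, interaction)  # → 16 0 16
```

A nonzero difference of differences is exactly what the non-parallel lines in the graphs show.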
HOW TO DESCRIBE INTERACTIONS
Must describe at least two simple effects:
Example 1: When the price is cheap, high-quality wines
(M=51) are rated 16 points higher than low-quality wines
(M=35), but when the price is expensive, high- and low-
quality wines are rated equally high (M=87 for both).
Example 2: When the quality is low, cheap wines (M=35) are
rated 52 points lower than expensive wines (M=87), but when
the quality is high, cheap wines (M=51) are rated only 36
points lower than expensive wines (M=87).
Cheap Price Expensive Price
Low Quality 35 87
High Quality 51 87
Learning check!
44
STATISTICAL EFFECTS IN
GRAPHS
45
MAIN EFFECTS
[Line graph: rating vs. price, with lines for Low and High Quality]
Comparing averages of end-points
Main effect of Quality:
On average, high-quality
wines were rated 8 points
higher than low-quality wines
Main effect of price:
On average, expensive
wines were rated 44 points
higher than cheap wines
46
SIMPLE EFFECTS
[Line graph: rating vs. price, with lines for Low and High Quality]
Comparing any 2 end-points
Simple Effect of Quality
on Cheap Wines
51 – 35 = 16
Simple Effect of Quality
on Expensive Wines
87 – 87 = 0
When the wine is cheap, high-
quality wines are rated 16 points
higher than low-quality wines
When the wine is expensive,
high-quality wines are rated
the same as low-quality wines
47
SIMPLE EFFECTS
[Line graph: rating vs. price, with lines for Low and High Quality]
Comparing any 2 end-points
Simple Effect of Price
on High Quality Wines
87 – 51 = 36
Simple Effect of Price
on Low Quality Wines
87 – 35 = 52
When the wine is high-quality,
expensive wines are rated 36
points higher than cheap wines
When the wine is low-quality,
expensive wines are rated 52
points higher than cheap wines
48
INTERACTIONS
[Line graph: rating vs. price, with lines for Low and High Quality]
Are the differences different?
When the wine is cheap, high-
quality wines are rated 16
points higher than low-quality
wines
When the wine is expensive,
high-quality wines are rated
the same as low-quality wines
Interaction: The effect of quality is different
for cheap wines vs. expensive wines
49
INTERACTIONS
[Line graph: rating vs. price, with lines for Low and High Quality]
Are the differences different?
When the wine is high-quality,
cheap wines are rated 36 points
lower than expensive wines
When the wine is low-quality,
cheap wines are rated 52 points
lower than expensive wines
Interaction: The effect of price is different
for high-quality wines vs. low-quality wines
50
INTERACTIONS
[Two line graphs: non-parallel lines showing an interaction, parallel lines showing no interaction]
Are the differences different?
Learning check!
51
LINE VS. BAR GRAPHS
[Line graph and bar graph of the same data: Rating (0–100) vs. Price (Cheap, Expensive)]
52
LINE VS. BAR GRAPHS
[Line graph and bar graph of the same data: Rating (0–100) vs. Price (Cheap, Expensive)]
53
MAIN EFFECTS IN BAR GRAPHS
[Bar graph: Rating (0–100) vs. Price (Cheap, Expensive), with bars for Low and High Quality]
Main effect of Quality:
On average, high-
quality wines were
rated 8 points higher
than low-quality wines
Main effect of price:
On average, expensive
wines were rated 44
points higher than
cheap wines
54
INTERACTIONS IN BAR GRAPHS
[Bar graph: Rating (0–100) vs. Price (Cheap, Expensive), with bars for Low and High Quality]
Interaction: The effect of
quality is different for cheap
wines vs. expensive wines
Interaction: The effect of
price is different for high-
quality wines vs. low-quality
wines
55
GRAPHING THE SAME DATA SEVERAL WAYS
[Four graphs of the same wine data: line and bar versions, with Price or Quality on the x-axis]
56
HIGHER ORDER INTERACTIONS
This figure from “Retrieval Practice
Protects Against Stress” shows a
2x2x2 design
1. Test 1 (immediate) vs. Test 2
(delayed)
2. Study practice (SP) vs. Retrieval
practice (RP)
3. Stressed (white) vs. Non-
stressed (grey)
A 3-way interaction is when the
effect of one factor depends on 2
other factors:
• There is an interaction between
Study Method and Stress
Induction for Test 2 but not for
Test 1
• The effect of stress depends on
how you studied, but also on
when the test happened
57
INTERPRETING TEXT
58
INTERPRETING TEXT:
RECIPROCITY & CONFORMITY, STUDY 1
Main effect of
group behavior
Main effect of
partner behavior
Interaction between
group behavior &
partner behavior
59
INTERPRETING TEXT:
RECIPROCITY & CONFORMITY, STUDY 1
60
INTERPRETING TEXT
RECIPROCITY & CONFORMITY, STUDY 2
Interaction between
reciprocity/conformity &
partner knowledge
Simple effect of
reciprocity/conformity
when partner behavior is
known
Simple effect of
reciprocity/conformity
when partner behavior is
unknown
61
INTERPRETING TEXT
RECIPROCITY & CONFORMITY, STUDY 2
Learning check!
62
Reading + MindTap
First content-loaded tutorial (+ first assignment)
Midterm exam will be in the week before reading week – this
is a good time to start making a study plan, spread out over
time, so you don't need to cram!
Midterm info is already posted! See Assignments page of syllabus.
63
TO-DO
TD0409-01 课件/Psy 202_3_RepeatedANOVA_W22.pdf
Instructor:
Dr. Molly Metz
PSY 202H1:
STATISTICS II
MODULE 3:
REPEATED MEASURES ANOVA
1
1. Repeated Measures ANOVA
1. Sample Problem
2. Effect Size
3. Posthoc Tests
GAME PLAN
2
Spread out your studying
All else being equal, studying twice for 1.5 hours each is better than
studying once for 3 hrs. Studying three times for 1 hr each is even better.
The best studying matches the type of assessment
The test will have computations? Do practice problems (MindTap,
lectures, book problems, as well as a ton of other stuff online)
The test will require you to explain things? Practice explaining things! To
a study buddy, your notebook, or a lamp, it doesn't matter who is
listening. What matters is the explaining (in your own words).
The test will be closed book? Well, your studying better include you
closing your book or notes and practicing recalling information!!
In fact, this is the best advice I can offer: whatever else you do,
your studying must include testing yourself. Close your notes and
see what pops out. Then, use that to guide your studying.
For more tips, visit The Learning Scientists
https://www.learningscientists.org/downloadable-materials/
MY BEST STUDY ADVICE
(FOR ALL CLASSES, NOT JUST THIS ONE)
3
REPEATED MEASURES
(OR WITHIN-SUBJECT)
ANOVA
4
THE LOGIC OF ONE-WAY
REPEATED MEASURES ANOVA
F = variance between sample means /
variance expected by chance (error/natural variability)
WTF??!!??
This looks identical
to the one-way
between ANOVA
5
Independent-measures ANOVA uses multiple participant
samples to test the treatments.
If groups are different, what was responsible?
Treatment differences?
Participant group differences?
Repeated-measures solves this by testing all
treatments using one sample of participants.
In an experiment, compare two or more manipulated treatment
conditions using the same participants in all conditions
In a nonexperimental study, compare a group of participants at two
or more different times
Before therapy; after therapy; 6-month follow-up
Compare vocabulary at age 3, 4 and 5
REPEATED MEASURES ANOVA:
WITHIN-SUBJECTS DESIGN WITH MORE THAN 2
GROUPS
6
EXAMPLE!
https://www.scientificamerican.com/article/is-double-dipping-a-food-safety-problem-or-just-a-nasty-habit/ 7
F = variance between groups
variance within groups
Two sources of variance:
Between group variance — how big are differences
between groups
Within group variance — how much error/natural
variability
THE LOGIC OF ONE-WAY
REPEATED ANOVA
8
BETWEEN GROUP VARIANCE IN
REPEATED MEASURES
Why do people in dif ferent groups dif fer?
1. Treatment effect = differences caused by our
experimental treatment
Systematic dif ferences
2. Chance = differences due to random factors
including…
Individual differences
Experimental error (noise)
Non-systematic, random dif ferences
In a within-subjects
design, Ps are their
own controls, so
individual
differences can't
play a role
9
WITHIN GROUP VARIANCE IN
REPEATED MEASURES
Why do people within the same group dif fer?
1. Chance = differences due to random factors
including…
Individual differences
Experimental error (noise)
Non-systematic, random differences
In a within-subjects
design, Ps are their
own controls, so
individual
differences can't
play a role
10
Repeated-measures design allows control of the effects of
participant characteristics
Eliminated from the numerator by the research design
Must be removed from the denominator statistically
The biggest change between independent-measures ANOVA
and repeated-measures ANOVA is the addition of a process to
mathematically remove the individual differences variance
component from the denominator of the F-ratio.
HOW DO WE DEAL WITH INDIVIDUAL
DIFFERENCES?
11
THE REPEATED MEASURES F-RATIO
F = Between-group (Treatment + Chance − individual differences) /
Within-group (Chance − individual differences)
If Null is True:
F = (0 + Chance − individual differences) / (Chance − individual differences) ≈ 1
If Null is False:
F = (Treatment Effect + Chance − individual differences) / (Chance − individual differences) > 1
12
THE REPEATED MEASURES F-RATIO
F = Between-group (Treatment + Experimental Error) /
Within-group (Experimental Error)
If Null is True:
F = (0 + Experimental Error) / Experimental Error ≈ 1
If Null is False:
F = (Treatment Effect + Experimental Error) / Experimental Error > 1
13
THE REPEATED MEASURES F-RATIO
F is the ratio between two variance estimates
Denominator is called “error term” composed of
individual difference variability and experimental error
14
TWO STAGES OF THE REPEATED-
MEASURES ANOVA
First stage
Identical to independent samples ANOVA
Compute SStotal, SSbetween treatments and SSwithin treatments
Second stage
Done to remove the individual differences from the
denominator
Compute SSbetween subjects and subtract it from SSwithin treatments
to find SSerror (also called residual)
15
STRUCTURE OF THE REPEATED-MEASURES
ANOVA
If Within-Group variance
can be partitioned into
individual differences
and error, then the sum
of between subjects and
error values (i.e., SS, df)
will always equal Within!
16
REPEATED MEASURES DESIGNS:
PROS & CONS
• Repeated Pros:
Participants serve as their own “controls” (reduced
error, more power)
Need fewer participants for the same research
question (compared to between-subjects design)
• Repeated Cons:
Order effects, practice effects
May guess hypothesis / aware of what is being
manipulated
Longer studies
Limits possibilities for experimental manipulations
17
EFFECT SIZE FOR THE
REPEATED-MEASURES ANOVA
Percentage of variance explained by the
treatment differences
Partial η2 is percentage of variability that has
not already been explained by other factors
η² = SSbetween treatments / (SStotal − SSbetween subjects)
or
η² = SSbetween treatments / (SSbetween treatments + SSerror)
18
REPEATED-MEASURES ANOVA
POST HOC TESTS (POSTTESTS)
Significant F indicates that H0 (“all
populations means are equal”) is wrong in
some way.
Use post hoc test to determine exactly where
significant differences exist among more than
two treatment means
Tukey’s HSD can be used
Substitute SSerror and dferror in the formulas
19
REPEATED-MEASURES ANOVA
ASSUMPTIONS
The observations within each treatment condition must be independent.
The population distribution within each treatment
must be normal.
The variances of the population distribution for each
treatment should be equivalent.
20
Learning check!
PRACTICE WITH
REPEATED MEASURES
ANOVA
21
STRUCTURE OF THE REPEATED-MEASURES
ANOVA
22
Research Question
A researcher is trying to determine the best way for individuals to recall a list of words. Eight participants each received three lists of words and tried to remember them using three different ways of memorizing (rote rehearsal, an imagery mnemonic technique, or a story mnemonic technique). After each study period, participants did a ten-minute distractor task then took a test on the word list. Was there a difference in recall based on the type of memory technique that participants used?
IV: Memory technique
3 levels: rote vs imagery vs story
DV: Number of words recalled
REPEATED MEASURES ANOVA – LET’S PRACTICE!
23
Participant Rote Imagery Story
A 2 4 5
B 3 2 3
C 3 5 6
D 3 7 6
E 2 5 8
F 5 4 7
G 6 8 10
H 4 5 9
DATA FROM MEMORY STUDY
M1 = 3.5 M2 = 5 M3 = 6.75
Are these 3
means
significantly
different
from each
other?
24
HYPOTHESIS TESTING WITH RM ANOVA
Research question
Does memory technique affect word recall?
Step 1: Statistical Hypotheses
H0: µ1 = µ2 = µ3
H1: At least one mean is different from another
Step 2: Decision Rule
Look up critical value of F in Table
Step 3: Compute observed F-ratio
Track values in ANOVA Summary Table
Step 4: Make a Decision (Reject or retain H0)
**Step 5: If H0 rejected, conduct post-hoc comparisons
Step 6: Compute Effect Size, Interpret and Report Findings
25
COMPUTING ANOVA
The ANOVA Summary Table

Source            SS    df    MS    F
Between group     SSB   dfB   MSB   MSB/MSE
Within group      SSW   dfW
Between subjects  SSP   dfP
Error             SSE   dfE   MSE
Total             SST   dfT
26
FINDING THE CRITICAL VALUE
Find Fcritical in Table
Need to know 3 things
α level
dfnumerator = dfbetween
dfdenominator = dferror
If α = .05 and df = 2, 14, Fcritical = 3.74
27
CRITICAL VALUES OF F FOR DF=2,14
Critical region; reject H0 beyond Fcrit = 3.74 (α = .05) or 6.51 (α = .01)
28
COMPUTING ANOVA
STEP 1: Compute Sums of Squares (SS)
SSTotal = ΣX² – G²/N
SSBetween = Σ(T²/n) – G²/N
SSWithin = Σ(SS for each group) or SSTotal − SSBetween
Where:
• X = each value of X
• T = treatment group total (ΣX)
• G = grand total (ΣT)
• n = sample size of each group
• N = total sample size (Σn)
29
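These Stage-1 formulas can be verified in plain Python against the memory-study data on the next slide; a minimal sketch (no libraries):

```python
# Stage-1 sums of squares for the memory-study data, computed directly
# from the formulas on this slide (plain Python, no stats libraries).
rote    = [2, 3, 3, 3, 2, 5, 6, 4]
imagery = [4, 2, 5, 7, 5, 4, 8, 5]
story   = [5, 3, 6, 6, 8, 7, 10, 9]
groups = [rote, imagery, story]

N = sum(len(g) for g in groups)            # total sample size = 24
G = sum(sum(g) for g in groups)            # grand total = 122

ss_total   = sum(x**2 for g in groups for x in g) - G**2 / N
ss_between = sum(sum(g)**2 / len(g) for g in groups) - G**2 / N
ss_within  = ss_total - ss_between

print(round(ss_total, 2), round(ss_between, 2), round(ss_within, 2))
```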
Participant Rote Imagery Story
A 2 4 5
B 3 2 3
C 3 5 6
D 3 7 6
E 2 5 8
F 5 4 7
G 6 8 10
H 4 5 9
DATA FROM MEMORY STUDY
M1 = 3.5 M2 = 5 M3 = 6.75
n 8 8 8 N = 24
Totals T1=28 T2=40 T3=54 G = 122
N = 24
n = 8
K = 3
30
COMPUTING ANOVA
STEP 2: Compute Degrees of Freedom (df)
dfBetween = k – 1
Where:
• n = sample size of each group
• N = total sample size (Σn)
• k = number of groups
dfWithin = N – k or Σ(n-1)
dfTotal = N – 1
31
COMPUTING ANOVA
STEP 3 (NEW): Compute Between Subject Values
Where:
• n = sample size of each group
• N = total sample size (Σn)
• G = grand total (ΣT)
• P = person totals (Σx for each
participant)
• k = number of groups
SSbetweensubjects = Σ(P²/k) – G²/N
32
Participant Rote Imagery Story P
A 2 4 5
B 3 2 3
C 3 5 6
D 3 7 6
E 2 5 8
F 5 4 7
G 6 8 10
H 4 5 9
DATA FROM MEMORY STUDY
M1 = 3.5 M2 = 5 M3 = 6.75
2 + 4 + 5 = 11
3 + 2 + 3 = 8
14
16
15
16
24
18
33
COMPUTING ANOVA
STEP 3 (NEW): Compute Between Subject Values
Where:
• n = sample size of each group
• N = total sample size (Σn)
• G = grand total (ΣT)
• P = person totals (Σx for each
participant)
• k = number of groups
SSbetweensubjects = Σ(P²/k) – G²/N
SSerror = SSWithin – SSbetweensubjects
dfbetweensubjects = n – 1
dferror = dfwithin – dfbetweensubjects OR (N-k)-(n-1)
34
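Stage 2 can be checked in plain Python against the memory data; a minimal sketch (the P totals are the row sums, e.g. participant A’s is 2 + 4 + 5 = 11):

```python
# Stage 2 of the repeated-measures ANOVA: remove individual-difference
# variability from the within-treatments SS. Rows are participants A-H,
# columns are the rote / imagery / story conditions.
rows = [(2, 4, 5), (3, 2, 3), (3, 5, 6), (3, 7, 6),
        (2, 5, 8), (5, 4, 7), (6, 8, 10), (4, 5, 9)]
n = len(rows)                      # 8 participants
k = len(rows[0])                   # 3 treatment conditions
N = n * k                          # 24 scores
G = sum(sum(r) for r in rows)      # grand total = 122
cols = list(zip(*rows))            # scores grouped by condition

ss_within = sum(sum(x**2 for x in c) - sum(c)**2 / n for c in cols)
ss_between_subjects = sum(sum(r)**2 for r in rows) / k - G**2 / N
ss_error = ss_within - ss_between_subjects

df_between_subjects = n - 1                 # 7
df_error = (N - k) - (n - 1)                # 14
print(round(ss_between_subjects, 1), round(ss_error, 1), df_error)
```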
COMPUTING ANOVA
STEP 4 (UPDATE): Compute Mean Squares (MS)
MSBetween = SSbetween / dfbetween
MSerror = SSerror / dferror
35
COMPUTING ANOVA
STEP 4 (UPDATE): Compute the F -Ratio
F-Ratio = MSbetween / MSerror
36
THE ANOVA SUMMARY TABLE
Source            SS    df    MS    F
Between group     SSB   dfB   MSB   MSB/MSE
Within group      SSW   dfW
Between subjects  SSP   dfP
Error             SSE   dfE   MSE
Total             SST   dfT
37
THE ANOVA SUMMARY TABLE
Source            SS      df    MS     F
Between group     42.33   2     21.17  14.11
Within group      73.5    21
Between subjects  52.5    7
Error             21      14    1.5
Total             115.83  23
38
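The full table can be rebuilt from the raw scores; here is a minimal plain-Python sketch (no statistics libraries) that recomputes every entry, plus the partial η2:

```python
# End-to-end repeated-measures F for the memory data, computed from the
# raw scores (rows = participants, columns = rote / imagery / story).
rows = [(2, 4, 5), (3, 2, 3), (3, 5, 6), (3, 7, 6),
        (2, 5, 8), (5, 4, 7), (6, 8, 10), (4, 5, 9)]
n, k = len(rows), len(rows[0])
N = n * k
G = sum(sum(r) for r in rows)
cols = list(zip(*rows))                       # the three conditions

ss_total   = sum(x**2 for r in rows for x in r) - G**2 / N
ss_between = sum(sum(c)**2 / n for c in cols) - G**2 / N
ss_within  = ss_total - ss_between
ss_subj    = sum(sum(r)**2 for r in rows) / k - G**2 / N
ss_error   = ss_within - ss_subj

df_between, df_error = k - 1, (N - k) - (n - 1)
ms_between = ss_between / df_between
ms_error   = ss_error / df_error
F = ms_between / ms_error                     # the observed F-ratio
eta_sq_partial = ss_between / (ss_total - ss_subj)
print(round(F, 2), round(eta_sq_partial, 3))
```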
EFFECT SIZE FOR REPEATED MEASURES
For independent measures ANOVA:
η2 = SSbetween treatments / SStotal

For repeated measures ANOVA:
η2 = SSbetween treatments / (SStotal – SSbetween subjects)
39
EFFECT SIZE FOR REPEATED MEASURES
For repeated measures ANOVA:
η2 = SSbetween treatments / (SStotal – SSbetween subjects)

η2 = 42.33 / (115.83 – 52.5) = .67
40
TUKEY HSD TEST
Step 1: Find the value of “q”
Need to know 3 things:
α
dfE
k
Step 2: Compute HSD
HSD = q √(MSerror / n)
Where n = group sample size, assuming equal n in each group
41
TUKEY HSD TEST
Step 1: Find the value of “q”
α = .05, dfE = 14, k = 3
From Table B.5: q = 3.70
Step 2: Compute HSD
HSD = q √(MSerror / n) = 3.70 √(1.5 / 8) = ± 1.60 words
So, a pair of means must differ by at least 1.60 in order to be significantly different
42
TUKEY HSD TEST
Step 3: Compute difference between each pair of means and compare to HSD
M1 – M2 = 3.5 – 5 = 1.5   Does NOT exceed 1.60
M1 – M3 = 3.5 – 6.75 = 3.25   Exceeds 1.60
M2 – M3 = 5 – 6.75 = 1.75   Exceeds 1.60
43
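These steps can be scripted; a plain-Python sketch using q = 3.70 (Table B.5, α = .05, k = 3, df = 14) and MSerror = SSerror/dferror = 21/14 = 1.5 recomputed from the raw scores:

```python
# Tukey HSD comparisons for the memory data. q comes from the
# studentized-range table; MS_error and the means come from the data.
import math

means = {"rote": 3.5, "imagery": 5.0, "story": 6.75}
q, ms_error, n = 3.70, 1.5, 8          # n = group sample size

hsd = q * math.sqrt(ms_error / n)      # honestly significant difference
significant = {}
for a, b in [("rote", "imagery"), ("rote", "story"), ("imagery", "story")]:
    diff = abs(means[a] - means[b])
    significant[(a, b)] = diff > hsd
    print(a, "vs", b, f"{diff:.2f}", "sig" if diff > hsd else "ns")
```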
TUKEY HSD TEST
What do we conclude?
M1 does not differ from M2
There was no difference in word recall when
participants used the rote rehearsal or imagery
techniques
M3 differs from M2
People remembered significantly more words when
using the story technique than the imagery technique
M3 differs from M1
People remembered significantly more words when
using the story technique than the rote technique.
44
There was a significant effect of memory technique on word recall, F(2, 14) = 14.11, p < .05, η2 = .67. Tukey post-hoc comparisons indicated that participants remembered significantly more words when studying with the story technique (M = 6.75, SD = 2.3) than when they studied with rote rehearsal (M = 3.5, SD = 1.4) or with the imagery mnemonic (M = 5, SD = 1.9), ps < .05. The imagery and rote rehearsal techniques did not differ.
FORMAL REPORT
SD for each group
SD = √(SS / (n – 1))
45
REPORTING A REPEATED MEASURES
F -STATISTIC
A closer look…
F(2, 14) = 14.11, p < .05, η2 = .67
Test statistic
Observed value
alpha level
Degrees of freedom (B, Error)
Significance: Sig? p < α; Nonsig? p > α
Effect size
46
Learning check!
Tutorial 1 now available!
MindTap CH 12 – due Jan 30
47
TO-DO
TD0409-01 课件/Psy 202_1_Review.pdf
Instructor:
Dr. Molly Metz
PSY 202H1:
STATISTICS II
MODULE 1:
FOUNDATIONS REVIEW
1
1. Foundations Review
1. Very quick!
2. Not for teaching, but for reminding
GAME PLAN
2
201 is a TRUE prereq
Use your resources, early and often
Form study groups
Text + MindTap resources from Ch 1-11
In “Psy 201 Review” folder at bottom of main page
Appendix A
Campus tutoring and study centres (like New College Stats Aid Centre)
Web resources like
https://www.learner.org/series/against-all-odds-inside-statistics/
https://www.khanacademy.org/math/statistics-probability
http://devpsy.org/links/open_source_textbooks (scroll down for stats)
A NOTE ABOUT THE COURSE PREREQ
3
SUPER SPEEDY REVIEW
4
STATISTICS
Statistics is the science of gaining insight from data
The term statistics refers to a set of mathematical procedures for organizing, summarizing, and interpreting information
“statistics help researchers bring order out of chaos” (p. 5)
5
STATISTICS
Two general purposes:
To organize and summarize the information so that the researcher
can see what happened in the research study and can communicate
the results to others
To answer the questions that initiated the research by determining
exactly when general conclusions are justified based on the specific
results that were obtained
6
INFERENTIAL STATISTICS
Consist of those techniques that allow us to study samples and then make generalizations about the populations from which they were selected
7
INFERENTIAL STATISTICS
Many research situations begin with a
population that forms a normal distribution
A sample is selected from the population, and
receives a “treatment”, & the goal is to evaluate
the treatment
Probability is used to
decide whether the
treated sample is
“noticeably different”
from the population
Do we reject the null
hypothesis or not?
8
It is statistically IMPOSSIBLE to demonstrate a phenomenon is absolutely true (all about probability!)
Researchers instead falsify
Supporting evidence may not signal a theory is always true; disconfirming evidence does signal that a theory is not always true
So, we seek to find evidence that it is unlikely that our hypothesis is false
• Process: Logic of the Null Hypothesis
• We determine what the population (distribution) would look like if the null
hypothesis were true
• Then, we see if our sample data are likely to have come from this distribution
• In other words, we look for the likelihood that our data are
consistent with the idea that there is no effect
PROVE
9
HYPOTHESIS TESTING
A hypothesis test is a statistical procedure that uses data collected from a sample to evaluate a particular hypothesis about a population
We make predictions about an unknown population
10
THE HYPOTHESIS TESTING
PROCEDURE
1. State the hypotheses.
2. Locate the critical region.
(Note: You must find the value for df and use the distribution
table for whichever statistic you are using.)
3. Calculate the test statistic.
4. Make a decision.
Either “reject” or “fail to reject” the null hypothesis.
5. Report your findings
1. Formal APA statistical tag
2. Plain language description of nature of effect
3. Include effect size
11
THE HYPOTHESIS TESTING
PROCEDURE
Step 1: State the hypotheses
We have two opposing hypotheses about the population
Null hypothesis (H0):
Predicts that the independent variable (treatment) has no effect on the dependent variable for a population
Alternative hypothesis (H1)
Predicts that the independent variable (treatment) does have an effect on the dependent variable
12
THE HYPOTHESIS TESTING
PROCEDURE
Step 2: Locate the critical region
Must decide which sample means would be consistent with the null
hypothesis (and therefore lead to accepting the null hypothesis), and
which sample means would be at odds with the null hypothesis (and
therefore lead to rejecting the null hypothesis)
The alpha value, or the level of significance, is the probability value used to define “very unlikely”
E.g., with α = .05, we separate the most unlikely 5% of the sample means
(extreme values) from the most likely 95% of the sample means (central
values)
13
THE HYPOTHESIS TESTING
PROCEDURE
Step 2: Locate the critical region
The critical region is defined by the alpha level
E.g., An alpha of .05 (α = .05) indicates that the size of the critical
region is p = .05 (5% of all possible sample means)
14
THE HYPOTHESIS TESTING
PROCEDURE
Step 3: Calculate sample statistics
E.g., Compare the sample mean (from your data) with the null
hypothesis (e.g., that the population mean is the same as the
original population)
Test statistic = (Obtained difference between our data and hypothesis) / (Standard difference between our data and hypothesis)
= (Observed difference) / (How much difference we would expect by chance)
15
THE HYPOTHESIS TESTING
PROCEDURE
Step 4: Make a decision
Two possible outcomes:
You reject the null hypothesis, and conclude that the treatment does
have an effect.
You fail to demonstrate that the treatment has an effect, so you fail to
reject the null hypothesis.
16
SUMMARY: OUTCOMES OF HYPOTHESIS TESTING

                     True Status of H0
Decision             No Effect (H0 True)           Effect (H0 False)
Reject H0            Type I Error                  Correct
                     α (probability of T1 error)   1 – β (“power”)
Retain H0            Correct                       Type II Error
(“fail to reject”)   1 – α (level of confidence)   β (probability of T2 error)
17
REVIEW OF
HYPOTHESIS TESTING
WITH THE T STATISTIC
18
SINGLE-SAMPLE T STATISTIC
Do newborn infants prefer to look at attractive
versus unattractive faces?
Infants were shown two photographs of women’s
faces (one rated by adults as more attractive than
the other)
Pair of faces remained on the screen until the baby
accumulated a total of 20 seconds of looking
DV: Number of seconds spent looking at the
attractive face
N = 9, M = 13 seconds, SS = 72
(Two-tailed test, α = .05)
19
SINGLE-SAMPLE T STATISTIC
1 . State the null and alternative hypotheses
Null hypothesis:
The infants have no preference for either face
H0: μattractive = 10 seconds
Alternative hypothesis:
The infants prefer one face over the other
H1: μattractive ≠ 10 seconds
20
SINGLE-SAMPLE T STATISTIC
2. Locate the critical region:
df = n – 1 = 9 – 1 = 8
Two-tailed test at the .05 level of significance
Critical region consists of t values greater than
+2.306 or less than -2.306
tcrit = +/-2.306
21
SINGLE-SAMPLE T STATISTIC
3. Calculate the test statistic in 3 steps:
a. Sample variance
b. Estimated standard error
c. t statistic
s2 = SS / (n – 1) = SS/df = 72/8 = 9
(SS given in the problem; df calculated previously)
22
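The three computational steps can be sketched in plain Python using the values given (N = 9, M = 13, SS = 72, null mean 10); the standard-error and t lines fill in the standard formulas the slides apply:

```python
# Single-sample t for the infant-looking example.
import math

n, M, SS, mu0 = 9, 13.0, 72.0, 10.0

s2 = SS / (n - 1)           # a. sample variance = 9
se = math.sqrt(s2 / n)      # b. estimated standard error = 1
t = (M - mu0) / se          # c. t statistic = 3.00
print(round(t, 2))
```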
SINGLE-SAMPLE T STATISTIC
4. Make a decision regarding H0:
The obtained t statistic of 3.00 falls within the critical region, so we reject H0 and conclude that babies do show a preference when given a choice between an attractive and unattractive face.
5. Report:
“There was a significant effect of attractiveness on infant looking time, t(8) = 3.00, p < .05, two-tailed. In other words, infants looked longer at attractive faces than expected.”
23
BETWEEN-SUBJECTS OR
INDEPENDENT-MEASURES DESIGNS
Use a separate group of participants for each
treatment condition (or for each population)
We use subscripts to denote
which population or sample
we are referring to:
e.g., μ1, μ2
24
SINGLE-SAMPLE VERSUS INDEPENDENT
SAMPLES T FORMULAS
Single sample:
Independent samples:
• According to the null hypothesis, the population mean
difference is 0 (μ1 – μ2 = 0)
25
INDEPENDENT-MEASURES T STATISTIC
n = 10
M = 93
SS = 200
n = 10
M = 85
SS = 160
Do students who regularly watched Sesame Street when they
were growing up have better grades than students who did not
watch Sesame Street?
26
INDEPENDENT-MEASURES T STATISTIC
1. State the null and alternative hypotheses
Null hypothesis:
There is no difference between the high school grades for students who watched Sesame Street and those who did not
H0: μ1 – μ2 = 0
Alternative hypothesis:
There is a difference between the high school grades for students who watched Sesame Street and those who did not
H1: μ1 – μ2 ≠ 0
27
INDEPENDENT-MEASURES T STATISTIC
2. Locate the critical region:
df = (n1 – 1) + (n2 – 1) = df1 + df2 = 9 + 9 = 18
Two-tailed test with α = .05, tcrit = +/-2.101
28
INDEPENDENT-MEASURES T STATISTIC
3. Calculate the t statistic in 3 steps:
a. Pooled variance
b. Estimated standard error
c. t statistic
sp2 = (SS1 + SS2) / (df1 + df2) = (200 + 160) / (9 + 9) = 360/18 = 20
29
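A plain-Python sketch of the same three steps, using the group summaries above (n = 10, M = 93, SS = 200 vs n = 10, M = 85, SS = 160):

```python
# Independent-samples t for the Sesame Street example.
import math

n1, M1, SS1 = 10, 93.0, 200.0
n2, M2, SS2 = 10, 85.0, 160.0

sp2 = (SS1 + SS2) / ((n1 - 1) + (n2 - 1))   # a. pooled variance = 20
se = math.sqrt(sp2 / n1 + sp2 / n2)         # b. estimated standard error = 2
t = (M1 - M2) / se                          # c. t statistic = 4.00
print(round(t, 2))
```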
INDEPENDENT-MEASURES T STATISTIC
4. Make a decision regarding H0:
The obtained t statistic of 4.00 falls within the critical region, so we reject H0 and conclude that there is a significant difference between the high school grades of those students who watched Sesame Street and those who did not.
5. Report
“There was a significant effect of program condition on academic achievement, t(18) = 4.00, p < .05, two-tailed. In other words, the students who watched Sesame Street had higher grades than those who did not watch the program”
30
REPEATED-MEASURES T STATISTIC
The data we use in a repeated measures t test are difference scores:
Difference score = D = X2 – X1
Numerator of t statistic measures actual difference between the data MD and the hypothesis μD
Denominator measures the standard difference that is expected if H0 is true
Same process as other tests
31
REPEATED-MEASURES T STATISTIC
Does the colour red increase men’s attraction to women? Researchers prepared a set of 30 women’s photographs, 15 mounted on a red background and 15 mounted on a white background
One picture is the “test photograph” and it appears twice, once mounted on red and once on white.
A sample of n = 9 men rate each of the photographs on a 12-point scale. Is the test photograph judged significantly more attractive when presented on a red background?
32
REPEATED-MEASURES T STATISTIC
33
REPEATED-MEASURES T STATISTIC
1. State the null and alternative hypotheses
Null hypothesis:
There is no difference in the attractiveness ratings between the red-mounted versus white-mounted photo
H0: μD = 0
Alternative hypothesis:
There is a difference in the attractiveness ratings between the red-mounted and white-mounted photo
H1: μD ≠ 0
34
REPEATED-MEASURES T STATISTIC
2. Locate the critical region:
df = n – 1 = 9 – 1 = 8
Two-tailed test with α = .01 , tcrit = +/-3.355
35
REPEATED-MEASURES T STATISTIC
3. Calculate the t statistic in 3 steps:
a. Sample variance
b. Estimated standard error
c. t statistic
s2 = SS / (n – 1) = SS/df = 18/8 = 2.25

sMD = √(s2/n) = √(2.25/9) = .50
36
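A plain-Python sketch of the three steps, using SS = 18 and n = 9 from the slides; the mean difference MD = 3 is not stated explicitly here but is implied by t = 6.00 with sMD = .50:

```python
# Repeated-measures t for the red-background example.
# MD = 3 is inferred from t = 6.00 and s_MD = 0.50 on these slides.
import math

n, SS, MD, muD = 9, 18.0, 3.0, 0.0

s2 = SS / (n - 1)             # a. sample variance of D scores = 2.25
s_MD = math.sqrt(s2 / n)      # b. estimated standard error = 0.50
t = (MD - muD) / s_MD         # c. t statistic = 6.00
print(round(t, 2))
```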
REPEATED-MEASURES T STATISTIC
4. Make a decision regarding H0:
The obtained t statistic of 6.00 falls within the critical region, so we reject H0 and conclude that the background colour has a significant effect on the judged attractiveness of the woman in the test photograph
5. Report:
“Changing the background colour from white to red significantly increased the attractiveness rating of the woman in the photograph, t(8) = 6.00, p < .01, two-tailed. In other words, women on a red background were perceived as more attractive than women on a white background.”
37
REVIEW:
HYPOTHESES FOR DIFFERENT TYPES OF T TESTS
Single sample t test
Comparing an unknown population mean (for our treatment
condition) to a known population mean (the μ for the original
population given in the problem)
Is our unknown population mean (after treatment) the same as the
mean in the original population? Or is there a difference?
H0: μtreatment = 10 seconds
H1: μtreatment ≠ 10 seconds
38
REVIEW:
HYPOTHESES FOR DIFFERENT TYPES OF T TESTS
Independent measures t test
Comparing two unknown population means (for each of our
treatment conditions)
Is the population mean for the first treatment condition the same as
the population mean for the second treatment condition? Or is there
a difference?
H0: μ1 – μ2 = 0 (or μ1 = μ2)
H1: μ1 – μ2 ≠ 0 (or μ1 ≠ μ2)
39
REVIEW:
HYPOTHESES FOR DIFFERENT TYPES OF T TESTS
Repeated measures t test
Remember that here we are interested in difference scores
(treatment 2 score – treatment 1 score)
Is the mean difference for the population equal to zero (no change
between score 1 and score 2)? Or is there a difference?
H0: μD = 0
H1: μD ≠ 0
40
TD0409-01 课件/Psy 202_2_OneWayANOVA_W22.pdf
Instructor:
Dr. Molly Metz
PSY 202H1:
STATISTICS II
MODULE 2:
ONE WAY ANOVA
1
1. Intro to ANOVA
1. Designs with More Than Two Groups
2. ANOVA Basics
3. Example
GAME PLAN
2
INTRO TO DESIGNS WITH
MORE THAN 2 GROUPS
3
Reasons (in service of precision):
Allows researchers to compare multiple treatments
…with no treatment (control group) or placebo as well
Allows researchers to compare effects of multiple independent variables simultaneously
Factorial designs: more in Psy 202!
Up first: more than two levels of one IV
IV (mood): Positive vs Negative
IV (mood): Happy, Sad, Angr y
WHEN WOULD YOU WANT TO STUDY
MORE THAN T WO GROUPS?
4
EFFECT OF ANTI-DEPRESSANT DOSAGE
ON MENTAL HEALTH
[Charts: mental health scores (0–30) for a 2-dose design, 1 mg vs 100 mg, and a 4-dose design, 1 mg, 50 mg, 100 mg, 150 mg]
(errors of interpolation) (errors of extrapolation) 5
EFFECT OF CAFFEINE
ON TEST PERFORMANCE
[Charts: test performance (0–10) for a 2-dose design, 10 mg vs 100 mg, and a 3-dose design, 10 mg, 50 mg, 100 mg]
“curvilinear effects”
6
Advantages
Advance theory with precision (boundary conditions?)
Insight into non-linear effects
For complex effects, reduces both:
The number of experiments conducted
The number of participants needed
PROS AND CONS OF ADDING LEVELS TO IV
7
[Diagram: a set of 2-condition studies totalling six groups of n = 20 (Total N = 120) versus one study with one factor with three levels, n = 20 per group (Total N = 60)]
8
Advantages
Advance theory with precision (boundary conditions?)
Insight into non-linear effects
For complex effects, reduces both:
The number of experiments conducted
The number of participants needed
Costs
Increase sample size for an individual study (from a study with 2 groups)
Increases in time, money needed to conduct research
PROS AND CONS OF ADDING LEVELS TO IV
9
How do we analyze the difference between 3 groups at a time?
Option 1: A series of t-tests
E.g., group 1 vs 2, group 2 vs 3, group 1 vs 3
ALERT: This INFLATES the likelihood of Type I Error!
“Test-wise α” = .05
“Experiment-wise α” ≈ .14 for 3 t-tests
1 – (1 – α)^c, where c = number of comparisons
1 – (1 – .05)³ = 1 – .86 = .14
No longer within range of acceptable risk of Type I Error
A possible solution: Bonferroni correction
Divide desired alpha by number of comparisons*
.05/3 = .017 – new cut-off for determining significance
BUT, as we learned, by reducing the likelihood of a Type I error, we increase likelihood of Type II
error (or, decrease power)
AND, A STATISTICAL COST
* Don’t forget, we should be planning out our hypothesis tests before we do them, so we know what this number is ahead of time
10
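The inflation arithmetic above can be checked directly; a minimal plain-Python sketch:

```python
# Experiment-wise Type I error rate for multiple comparisons, and the
# Bonferroni-corrected per-test cut-off.
alpha, c = 0.05, 3                          # test-wise alpha, 3 comparisons

experiment_wise = 1 - (1 - alpha) ** c      # ~.14 for three t-tests
bonferroni = alpha / c                      # ~.017 per-test cut-off
print(round(experiment_wise, 3), round(bonferroni, 3))
```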
One-Way ANOVA
An analysis of the variance in a set of scores or observations; tests whether the differences in means across levels of some factor are significantly greater than the differences among scores in general
Compares all group means simultaneously
One statistic to interpret (initially)
Just tells you there is A dif ference, not where the dif ference exists
So, we do post hoc tests to clarify result
Handles inflation of Type I Error
A BETTER STATISTICAL SOLUTION
11
THE BIG PICTURE

Making comparisons to population (NO IVs):
  Single score → z score
  Sample mean, σ known → z test
  Sample mean, σ unknown → one-sample t-test

Making comparisons between levels of IV(s) or groups:
  1 IV, 2 levels:
    Between subjects → independent samples t-test
    Within subjects → paired samples t-test
  1 IV, 3+ levels:
    Between subjects → One-Way Between ANOVA
    Within subjects → One-Way Repeated ANOVA
  More than 1 IV:
    All IVs between subjects → Between-subjects Factorial ANOVA
    All IVs within subjects → Repeated Measures Factorial ANOVA
    Mix of within and between → Mixed Model Factorial ANOVA
12
RESEARCH PROBLEM
Does the presence of others during an emergency
affect helping behavior?
Conduct an experiment with 3 conditions
Wait alone
Wait with 1 other person
Wait with 2 other people
IV = Number of people present
3 “levels” (0, 1, 2)
DV = Time it takes (in seconds) to call for help
13
DATA FROM HELPING STUDY
Seconds lapsed before calling for help
Alone
1 other
present
2 others
present
14 27 23
19 23 32
20 23 28
18 30 34
12 20 30
13 21 27
M1 = 16 M2 = 24 M3 = 29
Are these 3
means
significantly
different
from each
other?
14
THE LOGIC OF ANOVA
t = (difference between sample means) / (difference expected by chance, i.e., error)
F = (variance between sample means) / (variance expected by chance, i.e., error)
15
THE LOGIC OF ANOVA
Variance = dif ferences between scores
Two sources of variance:
Between group variance
Within group variance
F = (variance between sample means) / (variance expected by chance, i.e., error)
16
BET WEEN GROUP VARIANCE
Why do people in different groups differ?
Alone
1 other
present
2 others
present
14 27 23
19 23 32
20 23 28
18 30 34
12 20 30
13 21 27
M1 = 16 M2 = 24 M3 = 29
17
BET WEEN GROUP VARIANCE
Why do people in different groups differ?
1. Treatment effect = differences caused by our
experimental treatment
Systematic differences
2. Chance = differences due to random factors including…
Individual differences
Experimental error
Non-systematic, random differences
18
WITHIN GROUP VARIANCE
Why do people within the same group differ,
even though they were treated alike?
Alone
1 other
present
2 others
present
14 27 23
19 23 32
20 23 28
18 30 34
12 20 30
13 21 27
M1 = 16 M2 = 24 M3 = 29
19
WITHIN GROUP VARIANCE
Why do people within the same group differ?
1. Chance = differences due to random factors
including…
Individual differences
Experimental error
Non-systematic, random differences
20
SOURCES OF VARIANCE
Total Variance
Between Group
Variance
• Treatment Effect
• Chance (error)
Within Group
Variance
• Chance (error)
Numerator of F-ratio Denominator of F-ratio
21
THE F-RATIO
F = Between-group variance (Treatment Effect + Chance) / Within-group variance (Chance)
If H0 is True: F = (0 + Chance) / Chance ≈ 1
If H0 is False: F = (Treatment Effect + Chance) / Chance > 1
22
THE F-RATIO
F is the ratio between two variance estimates
Denominator is also called “error term”
How large does observed F have to be to conclude there is a treatment effect (to reject H0)?
Compare observed F to critical values
Based on the sampling distribution of F
23
THE SAMPLING DISTRIBUTION OF F
A family of distributions (just like t)
Each with a pair of degrees of freedom (df)
Critical values shown in F table
Need 3 pieces of information
(1) α level
(2) dfbetween (dfnumerator )
(3) dfwithin (dfdenominator)
24
THE SAMPLING DISTRIBUTION OF F
F -values are always positive
Variance cannot be negative
If H 0 is true then F ≈1
So peak appears around 1
25
THE SAMPLING DISTRIBUTION OF F
Shape of distribution will change with df
Large df will result in less spread to the right
In practical terms, leads to smaller critical values
of F (closer to 1.0)
26
CRITICAL VALUES OF F
A portion of the F distribution table. Entries in regular type are critical values for the α =.05,
and bold type values are for the α=.01. The critical values for df = 2,12 have been
highlighted. Notice that we no longer differentiate between one-tailed or two-tailed
hypotheses. All values of F are positive, and all hypotheses are non-directional. Some
sources print separate tables for different alpha levels.
df
Between
df
Within
α =.05
α =.01
Learning check!
27
HYPOTHESIS TESTING
WITH ANOVA
28
RESEARCH PROBLEM
Does the presence of others during an emergency
affect helping behavior?
Conduct an experiment with 3 conditions
Wait alone
Wait with 1 other person
Wait with 2 other people
IV = Number of people present
3 “levels” (0, 1, 2)
DV = Time it takes (in seconds) to call for help
29
DATA FROM HELPING STUDY
Seconds lapsed before calling for help
Alone
1 other
present
2 others
present
14 27 23
19 23 32
20 23 28
18 30 34
12 20 30
13 21 27
M1 = 16 M2 = 24 M3 = 29
Are these 3
means
significantly
different
from each
other?
30
HYPOTHESIS TESTING WITH ANOVA
Research question
Does presence of others affect helping?
Step 1: Statistical Hypotheses
H0: µ1 = µ2 = µ3
H1: At least one mean is different from another
No longer differentiate between one-tailed and
two-tailed tests.
All ANOVA tests are non-directional
Why?
31
HYPOTHESIS TESTING WITH ANOVA
Step 2: Decision Rule: Look up critical value of F in Table
α level
dfnumerator = dfbetween
dfdenominator = dfwithin
Step 3: Compute observed F-ratio
Step 4: Make a Decision (Reject or retain H 0)
**Step 5: If H 0 rejected, conduct post-hoc comparisons
Step 6: Interpret and Report Findings
32
FINDING THE CRITICAL VALUE
Find Fcritical in Table
Need to know 3 things
α level
dfnumerator = dfbetween
dfdenominator = dfwithin
If α = .05 and df = 2,15, Fcritical = 3.68
33
CRITICAL VALUES OF F FOR DF=2,15
Critical region: reject H0 beyond the critical value (3.68 for α = .05; 6.23 for α = .01)
34
COMPUTING ANOVA
Steps in computing the ANOVA
Compute SS
Compute df (two values!)
Compute MS
Compute F
Keep track of your computations in an ANOVA
Summar y Table
35
COMPUTING ANOVA
The ANOVA summar y table
36
COMPUTING ANOVA
Variance = “Mean Square” (MS) = SS/df
F = between-group variance / within-group variance
F = MSBetween / MSWithin
Throwback to Module 3!
37
COMPUTING ANOVA
STEP 1: Compute Sums of Squares (SS)
SSTotal = ΣX² − G²/N
SSBetween = Σ(T²/n) − G²/N
SSWithin = Σ(SS for each group) or SSTotal − SSBetween
Where:
• X = each value of X
• T = treatment group total (ΣX)
• G = grand total (ΣT)
• n = sample size of each group
• N = total sample size (Σn)
38
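The three SS formulas above can be sketched in plain Python (an illustration, not part of the course materials), using the helping-study data from this module:

```python
# Helping-study data: seconds elapsed before calling for help, by condition
groups = {
    "alone": [14, 19, 20, 18, 12, 13],
    "one_other": [27, 23, 23, 30, 20, 21],
    "two_others": [23, 32, 28, 34, 30, 27],
}

scores = [x for g in groups.values() for x in g]
N = len(scores)      # total sample size (18)
G = sum(scores)      # grand total (414)

# SS_Total = sum(X^2) - G^2 / N
ss_total = sum(x ** 2 for x in scores) - G ** 2 / N

# SS_Between = sum(T^2 / n) - G^2 / N, where T is each group's total
ss_between = sum(sum(g) ** 2 / len(g) for g in groups.values()) - G ** 2 / N

# SS_Within = SS_Total - SS_Between
ss_within = ss_total - ss_between

print(ss_total, ss_between, ss_within)   # 722.0 516.0 206.0
```

Running this reproduces the SS values worked out on the following slides (722, 516, 206).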
COMPUTING ANOVA
STEP 2: Compute Degrees of Freedom (df)
dfBetween = k – 1
Where:
• n = sample size of each group
• N = total sample size (Σn)
• k = number of groups
dfWithin = N – k or Σ(n-1)
dfTotal = N – 1
39
COMPUTING ANOVA
STEP 3: Compute Mean Squares (MS)
MSBetween = SSBetween / dfBetween
MSWithin = SSWithin / dfWithin
40
COMPUTING ANOVA
STEP 4: Compute the F -Ratio
F-Ratio = MSBetween / MSWithin
41
COMPUTING ANOVA
The ANOVA Summar y Table
Source SS df MS F
Between group SSB dfB MSB MSB/MSW
Within group (error) SSW dfW MSW
Total SST dfT
42
COMPUTING ANOVA
Alone
1 other
present
2 others
present
14 27 23
19 23 32
20 23 28
18 30 34
12 20 30
13 21 27
n = 6 6 6 N = 18
Totals T1=96 T2=144 T3=174 G = 414
43
COMPUTING ANOVA
Alone
1 other
present
2 others
present
14 27 23
19 23 32
20 23 28
18 30 34
12 20 30
13 21 27
n 6 6 6 N = 18
Totals T1=96 T2=144 T3=174 G = 414
SSTotal = ΣX² − G²/N
SSTotal = [14² + 19² + 20² + … + 27²] − 414²/18 = 10244 − 9522 = 722
44
COMPUTING ANOVA
Alone
1 other
present
2 others
present
14 27 23
19 23 32
20 23 28
18 30 34
12 20 30
13 21 27
n 6 6 6 N = 18
Totals T1=96 T2=144 T3=174 G = 414
SSBetween = Σ(T²/n) − G²/N
SSBetween = 96²/6 + 144²/6 + 174²/6 − 414²/18 = 10038 − 9522 = 516
45
COMPUTING ANOVA
Alone
1 other
present
2 others
present
14 27 23
19 23 32
20 23 28
18 30 34
12 20 30
13 21 27
n 6 6 6 N = 18
Totals T1=96 T2=144 T3=174 G = 414
SSWithin = 722 – 516 = 206
SSWithin= SSTotal − SSBetween
46
COMPUTING ANOVA
Alone
1 other
present
2 others
present
14 27 23
19 23 32
20 23 28
18 30 34
12 20 30
13 21 27
n 6 6 6 N = 18
SS 58 72 76 SSWithin= 206
SSWithin = ΣSS = 58 + 72 + 76 = 206
SSWithin = ΣSS for each group
You will often
be given
these values
47
COMPUTING ANOVA
Let’s fill in our SS values
Source SS df MS F
Between group 516
Within group
(error)
206
Total 722
Notice
722 = 516 + 206
SST = SSB + SSW
48
COMPUTING ANOVA
Now compute degrees of freedom (df)
Source SS df MS F
Between group 516 k-1
Within group
(error)
206 N-k
Total 722 N-1
Where k = 3 N = 18
49
COMPUTING ANOVA
Source SS df MS F
Between group 516 (k-1)
3–1 = 2
Within group
(error)
206 (N-k)
18–3 = 15
Total 722 (N-1)
18–1 = 17
Where k = 3 N = 18
50
COMPUTING ANOVA
Source SS df MS F
Between group 516 2
Within group
(error)
206 15
Total 722 17
Notice
17 = 15 + 2
dfT = dfB + dfW
51
COMPUTING ANOVA
Source SS df MS F
Between group 516 2 SSB/dfB
Within group
(error)
206 15 SSW/dfW
Total 722 17
Now compute the Mean Squares (MS)
52
COMPUTING ANOVA
Source SS df MS F
Between group 516 2 516/2=258
Within group
(error)
206 15 206/15=13.73
Total 722 17
Now compute the Mean Squares (MS)
53
COMPUTING ANOVA
Source SS df MS F
Between group 516 2 258 MSB/MSW
Within group
(error)
206 15 13.73
Total 722 17
Now compute the F-Ratio
54
COMPUTING ANOVA
Source SS df MS F
Between group 516 2 258 258/13.73 = 18.79
Within group
(error)
206 15 13.73
Total 722 17
Now compute the F-Ratio
55
COMPUTING ANOVA
Source SS df MS F
Between group 516 2 258 18.79
Within group
(error)
206 15 13.73
Total 722 17
All of this work for the final F-ratio!
56
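The remaining summary-table arithmetic can be sketched in Python (an illustration, not part of the course materials), starting from the SS values in the table (SSB = 516, SSW = 206, SST = 722) with k = 3 groups and N = 18 scores:

```python
k, N = 3, 18
ss_between, ss_within, ss_total = 516, 206, 722

df_between = k - 1        # 3 - 1 = 2
df_within = N - k         # 18 - 3 = 15
df_total = N - 1          # 18 - 1 = 17

ms_between = ss_between / df_between   # 516 / 2 = 258.0
ms_within = ss_within / df_within      # 206 / 15, about 13.73

F = ms_between / ms_within             # about 18.79
eta_squared = ss_between / ss_total    # effect size, about .71

print(round(F, 2), round(eta_squared, 2))   # 18.79 0.71
```

This matches the F-ratio in the summary table and the η² reported later in the module.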
MAKE A DECISION AND REPORT
Does our observed F (18.79) exceed our critical
value of F (3.68)?
Yes!
Reject H 0
Basic format:
“There was a significant effect of how many others were
present on the time it took participants to call for help, F
(2, 15) = 18.79, p < .05. [to be continued]”
57
REPORTING AN F-STATISTIC
A closer look…
F(2, 15) = 18.79, p < .05
Test statistic: F
Degrees of freedom (B, W): 2, 15
Observed value: 18.79
Alpha level: .05
Significance: Sig? p < α; Nonsig? p > α
Learning check!
58
INTERPRETING
FINDINGS
59
INTERPRETING FINDINGS FROM
ANOVA
At least two of the means are significantly different
from each other
But, which ones?
Must conduct additional analyses to pinpoint specific
mean differences
Called “post hoc tests” (or posttests)
In other words,
Omnibus test the “main test” (in this case the
one-way ANOVA)
Post hoc test the “follow-ups”
60
POST HOC TESTS
Pinpoint specific group differences
Conduct multiple comparisons, controlling for
experimentwise Type I error rate
Many types of post hoc tests
Mostly based on comparing absolute value of differences between
pairs of means to a critical value
Common ones include
Bonferroni Correction for Multiple Comparisons
Fisher’s Least Significant Difference (LSD)
Tukey Honestly Significant Difference (HSD)
61
TUKEY HSD TEST
Tukey Honestly Significant Difference (HSD)
HSD = minimum difference between means
needed for statistical significance
How big does the difference between two means have
to be in order to conclude that they are significantly
different from each other?
Like a critical value, but a critical “mean difference”
Assumes equal n
62
TUKEY HSD TEST
Step 1: Find the value of “q” (Table)
Need to know 3 things:
α
dfW
k
Step 2: Compute HSD
HSD = q √(MSWithin / n)
Where n = group sample size, assuming equal n in each group
63
TUKEY HSD TEST
Step 3: Compute difference between each
pair of means and compare to HSD
M1 – M2 = ?
M1 – M3 = ?
M2 – M3 = ?
Compare each mean difference to the HSD
If the difference equals/exceeds the HSD,
conclude that the means are significantly
different from each other
64
COMPUTING TUKEY’S HSD
Alone
1 other
present
2 others
present
14 27 23
19 23 32
20 23 28
18 30 34
12 20 30
13 21 27
n 6 6 6 N = 18
Totals T1=96 T2=144 T3=174 G = 414
Means M1=16 M2=24 M3=29
65
TUKEY HSD TEST: EXAMPLE
Step 1: Find the value of “q” (Q Table)
α = .05 dfW = 15 k = 3
66
TUKEY HSD TEST: EXAMPLE
Step 1: Find the value of “q” (Q Table)
α = .05 dfW = 15 k = 3
From Table : q = 3.67
Step 2: Compute HSD
HSD = q √(MSWithin / n) = 3.67 √(13.73 / 6) = ± 5.55 seconds
So, a pair of means must
differ by at least 5.55 in
order to be significantly
different
67
TUKEY HSD TEST: EXAMPLE
Step 3: Compute difference between each
pair of means and compare to HSD
M1 – M2 = 16 – 24 = – 8
M1 – M3 = 16 – 29 = -13
M2 – M3 = 24 – 29 = -5
Exceeds 5.55
Exceeds 5.55
Does not
exceed 5.55
68
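The HSD check above can be sketched in Python (an illustration, not part of the course materials), with q = 3.67 taken from the Studentized-range table for α = .05, dfW = 15, k = 3:

```python
import math

q, ms_within, n = 3.67, 13.73, 6
hsd = q * math.sqrt(ms_within / n)   # minimum significant mean difference, about 5.55

means = {"alone": 16, "one other": 24, "two others": 29}
pairs = [("alone", "one other"), ("alone", "two others"), ("one other", "two others")]
for a, b in pairs:
    diff = abs(means[a] - means[b])
    # a pair of means differs significantly only if |difference| >= HSD
    print(f"{a} vs {b}: |diff| = {diff}, significant = {diff >= hsd}")
```

Only the “one other vs two others” comparison (a difference of 5 seconds) falls short of the HSD, matching the conclusions above.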
TUKEY HSD TEST: EXAMPLE
What do we conclude?
M1 differs from M2 and M3
People waiting alone helped significantly faster than
people waiting with others
M2 & M3 do NOT differ from each other
There was no difference in helping times for individuals
waiting with 1 other person and individuals waiting with
2 other people
69
MEASURE OF EFFECT SIZE
Compute proportion of variance explained by the treatment effect
Proportion of total variance accounted for by variability between groups
In ANOVA , r 2 typically called η2 (pronounced “eta
squared”)
r² = SSBetween / SSTotal
70
MEASURE OF EFFECT SIZE: EXAMPLE
71% of the variance in helping behavior
(number of seconds elapsed before seeking
help) is explained by the number of people
present
r² = η² = SSBetween / SSTotal = 516 / 722 = .71
71
REPORTING RESULTS OF AN ANOVA
Formal description of findings:
“There was a significant effect of the number of people present
on the time it took (in seconds) for participants to seek help,
F(2,15) = 18.79, p<.05, η2 = .71. Tukey post-hoc comparisons
indicated that participants who were waiting alone helped
significantly faster (M=16, SD=3.4) than participants who
waited with one other person (M=24, SD=3.8) or with two other
people (M=29, SD=3.9), p < .05.”
effect
size
SD for each group
SD = √(SS / (n − 1))
72
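The group SDs in the write-up can be recovered from each group's SS via SD = √(SS/(n − 1)); a quick Python sketch (an illustration, not part of the course materials) using the SS values 58, 72, and 76 with n = 6:

```python
import math

n = 6
for label, ss in [("alone", 58), ("one other", 72), ("two others", 76)]:
    sd = math.sqrt(ss / (n - 1))   # sample standard deviation from SS
    print(label, round(sd, 1))     # 3.4, 3.8, 3.9 respectively
```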
INDEPENDENT MEASURES ANOVA
ASSUMPTIONS
The observations within each sample must be independent.
The populations from which the samples are selected must be normal.
The populations from which the samples are selected must have equal variances (homogeneity of variance).
Violating the assumption of homogeneity of variance risks invalid test results.
Learning check!
73
Mindtap Access
Psy 201 last term? Just log in with same credentials and access course using course code on syllabus
Psy 201 last summer? Submit course code request to form on syllabus webpage
Everyone else, use direct link to bookstore on syllabus webpage
Add everything to your calendar now!
Mindtap and tutorial problem sets often overlap
Due date is not “do date”
TO DO
74
TD0409-01 课件/Psy 202_9_advanced concepts_W22_topost-1.pdf
Instructor:
Dr. Molly Metz
PSY 202H1:
STATISTICS II
MODULE 9:
INTRO TO ADVANCED CONCEPTS
1
1. Nonparametic Tests
2. Intro to Advanced Stats
1. Multilevel modeling
2. Factor analysis
3. Mediation
4. Meta-analysis
GAME PLAN
2
“What do we need to know?”
NON-PARAMETRIC VS
PARAMETRIC TESTS
3
PARAMETRIC VS. NONPARAMETRIC TESTS
Parametric tests make assumptions about the shape of the population distribution and other population parameters (e.g., μD = 0)
Normal distribution in the population
Homogeneity of variance in the population
Numerical score for each individual
Require data from an interval or ratio scale
Nonparametric tests do not make these same assumptions
Most do not state hypotheses in terms of specific population parameters
Participants usually classified into categories
Frequencies
Nominal, ordinal
4
PARAMETRIC VS. NONPARAMETRIC TESTS
E.g., What if we wanted to examine the
relationship between the national ranking
of US college basketball teams and the
annual athletics budget of the college?
School National
Rank
Annual $
(in millions)
Gonzaga 1 19
Duke 2 67
Indiana 3 60
Louisville 4 69
Georgetown 5 47
Michigan 6 111
Kansas 7 53
… … …
Virginia 25 46
5
“GOODNESS OF FIT” TEST AND THE
ONE SAMPLE T TEST
Nonparametric (chi-square) versus parametric (t) test
Similarity: Both tests use data from one sample to test a hypothesis about a single population
Level of measurement determines test:
Numerical scores (interval/ratio scale) make it appropriate to compute a mean and use a t test
Classification in non-numerical categories (ordinal or nominal scale) makes it appropriate to compute proportions or percentages to do a chi-square test
6
SPECIAL APPLICATIONS OF THE CHI-SQUARE TESTS
As a substitute for a parametric test:
The chi-square test for independence & the Pearson correlation
The chi-square test for independence & the independent-measures t test (or ANOVA)
Under what circumstances would you choose to conduct the chi-square (nonparametric alternative)?
Learning Check:
1. Data do not meet the assumptions for a standard parametric test
2. Data consist of nominal or ordinal measurements
7
EXAMPLE: THE MANN- WHITNEY U TEST (VS.
INDEPENDENT T)
A business owner measured the job satisfaction of his day-shift and night-shift workers. Each employee rated job satisfaction on a scale from 1 (not satisfied at all) to 100 (completely satisfied). Test whether ratings of job satisfaction differed between the two groups using the Mann-Whitney U test at a .05 level of significance.
Day Shift Night Shift
88 24
72 55
93 70
67 60
62 50
8
EXAMPLE: THE WILCOXON SIGNED-RANKS T TEST (VS. REPEATED-MEASURES T)
A researcher measured the number of cigarettes patients smoked per day in a sample of 6 patients before and 6 months after being diagnosed with lung cancer. Test whether patients significantly changed their smoking habits following the diagnosis using the Wilcoxon signed-ranks T test at a .05 level of significance.
Before Diagnosis  After Diagnosis
23 20
25 5
13 8
12 16
9 15
22 19
9
EXAMPLE: THE KRUSKAL-WALLIS H TEST (VS. ONE-WAY ANOVA)
A researcher asks a sample of 15 students (5 per group) to view and rate how effectively they think one of three short video clips promoted safe driving. The students rated the clips from 1 (not effective at all) to 100 (extremely effective). Test whether ratings differ between groups using the Kruskal-Wallis H test at a .05 level of significance.
Clip A Clip B Clip C
88 92 50
67 76 55
22 80 43
14 77 65
42 90 39
10
EXAMPLE: THE FRIEDMAN TEST (VS. REPEATED ANOVA)
A doctor is curious about whether women without health insurance make regular office visits throughout the course of their pregnancies. She selects a sample of 4 pregnant women and records the number of hospital visits made during each trimester of their pregnancy.
Test whether there are differences in the number of office visits made over the course of the pregnancy using the Friedman test at a .05 level of significance.
Participant 1st Trimester 2nd Trimester 3rd Trimester
A 3 5 8
B 6 4 7
C 2 0 5
D 4 3 2
11
NONPARAMETRIC TESTS OVERVIEW
Statistic Purpose Example
Mann Whitney Compare two independent groups
when assumptions for independent t
not met
Determine whether a control group and
treatment group are different when the DV
is ordinal
Wilcoxon Signed-Rank Compare two matched or within
subject conditions when
assumptions for dependent t not
met
Determine whether ordinal ratings of
academic skill are different from ratings of
athletic skill for same group
Kruskal-Wallis Compare two or more independent
groups when assumptions for
oneway between subjects ANOVA
not met
Determine whether test scores from three
different instructional conditions are
different when scores are not distributed
normally
Friedman’s ANOVA Compare two or more matched or
within subject conditions when
assumptions for repeated ANOVA
not met
Determine whether ordinal ratings of
academic skill, athletic skill, and social skill
are different for same group of students
Chi Sq Goodness of Fit Compare observed frequency
distribution to null distribution
Determine whether there is a difference in
proportion of A, B, C, D, F grades awarded
in school
Chi Sq Test of Independence Determine whether two categorical
variables are related
Test whether grade distributions differ by
gender
12
PARAMETRIC VS. NONPARAMETRIC TESTS
If you have a choice, which should you choose?
Things to consider:
Measurement
Assumptions
Variance
Undetermined scores
13
INTRO TO ADVANCED
PROCEDURES
14
INTRO TO SOME ADVANCED PROCEDURES
Purpose:
To provide you with some knowledge of additional procedures that
are available to help answer research questions
To allow you to recognize and better understand these more
advanced procedures when you come across them in research
articles
15
ADVANCED PROCEDURES:
MULTILEVEL MODELING
Essentially, this refers to cases of regression with groups
Example: A researcher is interested in how much the number of hours one spends studying for a statistics exam predicts scores on the exam. He surveys students from a dozen different statistics classes. Problem? Things could be very different in the different courses.
Different teachers, different assignments, different tests, etc.
Solution? May carry out the regression separately for each course, then average the regression coefficients across the different courses. May also go a step further and take into consideration some upper-level (group-level) variables, e.g., does teacher experience predict average test scores in their classes?
Example of a standard multilevel modeling procedure (multilevel, because you are looking at both lower-level (individual) and upper-level (group) variables)
16
ADVANCED PROCEDURES:
MULTILEVEL MODELING
17
ADVANCED PROCEDURES:
FACTOR ANALYSIS
Factor analysis is a statistical procedure applied in situations where many variables are measured. It identifies groups of variables (factors) that tend to be correlated with each other and not other variables.
Factor loading: the correlation of a variable with a factor.
Variables may have loadings on each factor, but usually have high loadings on only one.
E.g., “Factor analysis of the Dental Fear Survey disclosed three stable and reliable factors. The first factor related to patterns of dental avoidance and anticipatory anxiety. The second factor related to fear associated with specific dental stimuli and procedures. Factor three concerned felt physiological arousal during dental treatment.”
18
How do psychologists find underlying dimensions when we can only observe specific behaviours?
FROM BEHAVIOURS TO CONSTRUCTS
19
1 . HOW MANY SEA MONSTERS?
20
1 . HOW MANY SEA MONSTERS?
21
2. HOW MANY SEA MONSTERS?
22
2. HOW MANY SEA MONSTERS?
23
3. HOW MANY SEA MONSTERS?
24
3. HOW MANY SEA MONSTERS?
25
How could you tell the number of sea monsters when you could only see parts of them? You saw visible parts move together and others move independently; you did an intuitive correlation.
By looking at the correlations between all the parts we can see (observable behaviors), we can infer something about their underlying nature (theoretical constructs).
Factor Analysis is a statistical method that looks at how lots of different observations correlate and determines how many theoretical constructs could most simply explain what you see.
FROM SEA MONSTERS TO FACTOR
ANALYSIS
26
ADVANCED PROCEDURES:
FACTOR ANALYSIS
What name would you
give to each of these
different factors?
27
ADVANCED PROCEDURES:
MEDIATIONAL ANALYSIS
This is a particular type of path analysis that tests whether a presumed causal relationship between two variables is due to some particular intervening variable (M – mediating variable)
E.g., Fraley & Aron, 2004: Strangers meeting while either doing
something humourous or non-humourous. Those in the humourous
condition felt closer to their partners. Researchers wanted to
demonstrate that this result was mediated in part by the humour
distracting people from the discomfort of meeting a stranger.
In other words, the reason humour increased closeness is that it
was distracting.
28
ADVANCED PROCEDURES:
MEDIATIONAL ANALYSIS
29
ADVANCED PROCEDURES:
MEDIATIONAL ANALYSIS
Baron & Kenny’s (1986) 4 steps for establishing mediation:
1. Show that X significantly predicts Y.
2. Show that X significantly predicts M.
3. Show that M predicts Y in the context of a multiple regression in which X is also included as a predictor.
4. Show that, when M is included as a predictor of Y (along with X), X no longer predicts Y (for full mediation) or that the prediction is weaker (known as partial mediation).
Path diagram: X → Y, with M as the mediator
*** not for cross-sectional designs!
30
I know what one study says… But what about all of the others?
Review paper: a qualitative summary of the state of the literature on a given research question
Meta-analysis: a statistical analysis that yields a quantitative summary of a scientific literature.
Or, a “study of studies”
Unit of analysis: effect size!
ADVANCED PROCEDURES:
META – ANALYSIS
31
ADVANCED PROCEDURES:
META – ANALYSIS
32
A major limitation to meta-analysis: The File Drawer Problem
Caution: just because it is statistical doesn’t mean it is perfectly objective!
Ego depletion
Carter, E. C., Kofler, L. M., Forster, D. E., & McCullough, M. E. (2015). A series of meta-analytic tests of the depletion effect: self-control does not seem to rely on a limited resource. Journal of Experimental Psychology: General, 144(4), 796-815. https://doi.org/10.1037/xge0000083
Hagger, M. S., Wood, C., Stiff, C., & Chatzisarantis, N. L. (2010). Ego depletion and the strength model of self-control: a meta-analysis. Psychological Bulletin, 136(4), 495–525. https://doi.org/10.1037/a0019486
Ovulator y cycle effects
Gildersleeve, K., Haselton, M. G., & Fales, M. R. (2014). Do women’s mate
preferences change across the ovulatory cycle? A meta-analytic
review. Psychological Bulletin, 140(5), 1205-1259.
https://psycnet.apa.org/doi/10.1037/a0035438
Wood, W., Kressel, L., Joshi, P. D., & Louie, B. (2014). Meta-analysis of
menstrual cycle effects on women’s mate preferences. Emotion Review, 6(3),
229-249. https://doi.org/10.1177%2F1754073914523073
ADVANCED PROCEDURES:
META – ANALYSIS
33
META – ANALYSIS:
SOCIAL RELATIONSHIPS AND HEALTH
34
Better!
Open work hours in tutorial session
Data Analysis Project!
35
TO DO
TD0409-01 课件/Psy 202_8_Chi_Square_W22.pdf
Instructor:
Dr. Molly Metz
PSY 202H1:
STATISTICS II
MODULE 8:
INTRO TO CHI SQUARE
1
1. Introduction to Chi-Square
1. Research Spotlight: Selfies in the Wild
2. Hypothesis Testing Steps
3. When to use the Chi-Square
4. An example
5. Practice!
2. Chi-Square Test of Independence
1. When We Use It
2. A Research Example
3. Practice
3. Back to the Big Picture
GAME PLAN
2
INTRODUCTION TO
CHI-SQUARE ANALYSIS
3
PARAMETRIC VS NONPARAMETRIC TESTS
Hypothesis tests used thus far tested hypotheses about population parameters.
Parametric tests share several assumptions
Normal distribution in the population
Homogeneity of variance in the population
Numerical score for each individual
Nonparametric tests are needed if research situation does not meet all these assumptions.
More next week!
Nonparametric tests…
Make few assumptions about distribution (as compared to all of our assumptions about normality and variance for z, t, F, etc.)
Usually use categories/frequencies
PARAMETRIC VS NONPARAMETRIC TESTS
4
Statistical Test IV DV
Correlation/Linear
regression
Continuous Continuous
Independent
samples t-test
Two independent
categories
Continuous
Paired sample
t-test
Two related groups Continuous
ANOVA Multiple categories Continuous
Chi-square Two or more
categories
Categorical
5
THE CHI-SQUARE STATISTIC
Most statistical tests you learn require quantitative
data (correlation, z-test, t-test, etc.)
What if we have questions about categories or
classifications?
Do college students prefer Coke or Pepsi?
Is the racial breakdown of UofT representative of the general
population?
These questions involve counting the number of
people in dif ferent groups/categories
They involve frequency distributions
6
THE CHI-SQUARE STATISTIC
The Chi-Square statistic: χ2
Tests whether one set of proportions is different from
another
Done by comparing frequencies (counts)
Two types of hypothesis tests
χ2 Goodness-of-fit test
χ2 Test of independence
7
χ2 TEST FOR GOODNESS-OF -FIT
Goodness-of-fit test uses frequency data from a sample to test hypotheses about proportions in a population.
Each individual is classified into ONE category on the variable of interest.
Do you prefer Coke or Pepsi?
Do you prefer the original or prequel Star Wars movies?
Simply count how many people in the sample are in each category
8
χ2 TEST FOR GOODNESS-OF -FIT
H0 specifies the proportion of the population that should be in each category.
The proportions from H0 are used to compute expected frequencies
The expected frequencies describe how the sample would appear if H0 was true
χ2 then compares observed frequencies (from the sample) to expected frequencies (from H0)
9
χ2 TEST FOR GOODNESS-OF -FIT
Why is it called “goodness-of-fit?”
We test whether our “observed” frequencies
“fit” against our “expected” frequencies.
Kind of like model testing (remember, R 2 as a
statistic of “goodness of fit”)
10
RESEARCH SPOTLIGHT: HAVE WE REACHED GENDER PARITY IN TECH COMPANIES?
https://informationisbeautiful.net/visualizations/diversity-in-tech/
11
When would you say that gender equality had been achieved in tech?
Figure out demographic breakdowns of country
51% female in Canada (Census 2016)
Figure out demographic breakdowns of company
https://informationisbeautiful.net/visualizations/diversity-in-tech/
Do they match?
GENDER PARITY IN TECH
12
GENDER PARITY IN TECH
13
RESEARCH PROBLEM
Does a new teaching method improve test performance on a standardized math test?
In prior years, 60% of students passed the test (40% failed).
Data from the CURRENT school year (200 children):
Is there a significant change in test per formance?
Student Performance this Year
Pass Fail
150 50 Total n = 200
14
This is “frequency” (or “count”) data
200 children were sampled
150 children passed
50 children failed
RESEARCH PROBLEM
Test Performance
Pass Fail
150 50 Total n = 200
15
STEP 1: STATISTICAL HYPOTHESES
H0: There is no change/difference in student
performance
The pass rate this year (with the new teaching method) will be
the same as the pass rate in prior years (60% pass, 40% fail).
H1: There is a change/difference in student
performance
16
STEP 2: FIND CRITICAL VALUE
Two pieces of information needed
α level
df = C-1 (where C = number of categories)
Critical value from Table
α = .01
df = 2 -1 = 1
Critical value = 6.63
17
CRITICAL VALUE OF χ2
χ2
6.63
Decision Rule: If observed χ2 equals or exceeds 6.63, then reject Ho
18
STEP 3: COMPUTE OBSERVED χ2
fo = observed frequency (for each cell)
fe = expected frequency (for each cell) = pn
p = proportion stated in the null hypothesis
n = total sample size
χ² = Σ (fo − fe)² / fe
19
COMPUTE OBSERVED χ2
How do you find p?
We are given information about the known population
distribution in previous years.
60% pass and 40% fail.
Thus the proportions (p) under the null hypothesis
are:
p = .60 pass
p = .40 fail
If the problem doesn’t specify, figure out what the
question is asking: e.g., if 2 sodas are preferred at equal
rates, what proportion of people should prefer each one?
What about 3 different sodas?
20
COMPUTE OBSERVED χ2
Compute expected frequencies (pn):
Student Performance
Pass Fail
Observed
frequencies (fo)
150 50
Total
n = 200
21
COMPUTE OBSERVED χ2
Compute expected frequencies (pn):
Student Performance
Pass Fail
Observed
frequencies (fo)
150 50
Total
n = 200
Expected
frequencies (fe)
fe = pn
22
COMPUTE OBSERVED χ2
Compute expected frequencies (pn):
Student Performance
Pass Fail
Observed
frequencies (fo)
150 50 Totaln = 200
Expected
frequencies (fe)
fe = pn
.60 × 200 .40 × 200
23
COMPUTE OBSERVED χ2
Compute expected frequencies (pn):
Student Performance
Pass Fail
Observed
frequencies (fo)
150 50 Totaln = 200
Expected
frequencies (fe)
fe = pn
.60 × 200
= 120
.40 × 200
= 80
If HO is true (and there
is no change), we expect
to see 120 students pass
and 80 fail.
24
COMPUTE OBSERVED χ2
Step 3 (continued): Calculate χ2

χ² = Σ (fo − fe)² / fe
χ² = (150 − 120)²/120 + (50 − 80)²/80 = 7.5 + 11.25 = 18.75

Student Performance
                           Pass   Fail
Observed frequencies (fo)  150    50
Expected frequencies (fe)  120    80
29
STEP 4: MAKE A DECISION
Reject Ho
Because observed χ2 (18.75) exceeds the
critical value (6.63)
30
STEP 5: REPORT RESULTS
What does this mean?
There was a significant change in test performance.
Students performed better this year (75% passed)
compared to prior years (60% passed).
“Based on data from the current school year, test
performance was significantly improved with the new
teaching method, χ2 (1, N = 200) = 18.75, p < .01. A
larger percentage of students passed the test this year
(75%) compared to prior years (60%)”
31
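The whole goodness-of-fit computation above can be verified in a few lines of Python (a sketch, not part of the course materials; assumes SciPy is installed):

```python
# Goodness-of-fit test for the pass/fail example:
# observed 150 pass / 50 fail, null proportions .60 / .40, n = 200.
from scipy.stats import chisquare

observed = [150, 50]
expected = [0.60 * 200, 0.40 * 200]  # fe = p * n -> [120, 80]
stat, p = chisquare(observed, f_exp=expected)
print(round(stat, 2), p < 0.01)  # 18.75 True
```

Because the observed statistic (18.75) exceeds the critical value (6.63), the p-value comes out well below .01, matching the decision on the slides.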
REPORTING A χ2
A closer look…
χ2(1, N = 200) = 18.75, p < .01

χ2 = test statistic
1 = degrees of freedom
N = 200 = sample size (Chi-Square only)
18.75 = observed value
p < .01 = alpha level
Other tests also report one- or two-tailed here, but all
Chi-Square tests are one-tailed!
Learning check!
32
CHI SQUARE TEST OF
INDEPENDENCE
33
THE CHI-SQUARE STATISTIC
Working with categorical variables
Two types of hypothesis tests
χ2 Goodness-of-fit test
χ2 Test of independence
The Goodness of fit test
We have one variable
We test whether observed frequencies (proportions) match
expected or hypothesized frequencies
34
THE CHI-SQUARE STATISTIC
What if we have more than one variable?
What if we have questions about the relationship
between two categorical variables?
Are women more likely than men to prefer Coke to Pepsi?
Do students vs. faculty differ in their opinion
about raising student fees (yes/no)?
Need the Chi-square test of independence
35
Lloyd, Hugenberg, & colleagues
RESEARCH SPOTLIGHT:
ARE THERE SYSTEMATIC
DIFFERENCES IN WHAT
KIND OF SELFIES MEN
AND WOMEN POST?
• Angle related to power
• Angle related to gender
• Power related to gender
• Downstream consequences?
36
STUDY 1 METHOD
Compiled 932 selfies from www.iconosquare.com
(Instagram)
4 trained raters
Judged target gender
Judged whether selfie was taken below, at, or above eye level
37
STUDY 1: METHOD
38
STUDY 1: RESULTS
          Low          Neutral      High
          (below eye   (at eye      (above eye
          level)       level)       level)
Male      139          240          87
Female    65           230          171
χ2 (2, N = 932) = 54.40, p < .001
39
STUDY 1: DISCUSSION
People take selfies from varied angles
Angles chosen differ by target gender
40
RESEARCH PROBLEM
Are people more likely to litter when the environment
is already dirty?
Conduct an experiment:
Hand people a flier at the entrance to a parking lot
Parking lot is either dirty or clean
Measure whether person throws flier on the ground
Is there a significant association between cleanliness of
the environment and littering?
Kind of like a correlation, but for categorical variables
41
THE CHI-SQUARE STATISTIC
Need a new chi-square statistic
The Chi-Square test of independence
Tests whether two categorical variables are related to each
other
Whether two variables “depend” on each other
Done by comparing frequencies (counts)
42
χ2 TEST OF INDEPENDENCE
Test of independence uses frequency data from a sample
to test hypotheses about propor tions in a population.
Each individual is classified into one categor y based on
the combination of two variables
Are women more likely than men to prefer Coke to Pepsi?
Do students vs. faculty differ in their opinion
about raising student fees (yes/no)?
Simply count how many people in the sample are in each
categor y
43
χ2 TEST OF INDEPENDENCE
H0 states that the two variables ARE NOT related
Assumes that frequencies (proportions) on one variable are the
same across levels of the other variable
H1 states that the two variables ARE related
χ2 then compares obser ved frequencies (from the
sample) to expected frequencies
Expected frequencies are computed from
sample data
44
χ2 TEST OF INDEPENDENCE
Why is it called “test of independence”?
We test whether the frequencies (proportions) on
one variable are independent from another variable
45
COMPUTING χ2
Same formula for χ2:
χ² = Σ (fo − fe)² / fe
fo = observed frequency (for each cell)
fe = expected frequency (for each cell)
But getting expected frequencies (fe) is a bit more complicated!
46
IMPORTANT!!!
This is different from the Goodness of Fit method!!!
COMPUTING χ2
Calculate χ2, where fo = observed frequency:
χ² = Σ (fo − fe)² / fe
fe for each cell is: fe = (fc × fr) / n
fc = column total
fr = row total
n = total sample size
47
HYPOTHESIS TESTING STEPS
Step 1: State the statistical hypotheses
Step 2: Create a decision rule
Step 3: Collect data and compute “obser ved”
test statistic
Step 4: Make a decision
Step 5: Repor t and summarize your results
48
RESEARCH PROBLEM
Are people more likely to litter when the environment
is already dirty?
Conduct an experiment:
Hand people a flier at the entrance to a parking lot
Parking lot is either dirty or clean
Measure whether person throws flier on the ground
Is there a significant association between cleanliness of
the environment and littering?
49
RESEARCH PROBLEM
Are people more likely to litter when the environment is
already dirty?
Data from 100 participants:
Is there a significant relationship between cleanliness
of the environment and littering behavior?
Subject’s response
Environment No Litter Litter
Clean 45 5
Dirty 30 20
Total n = 100
50
STATISTICAL HYPOTHESES
State the Statistical Hypotheses
H0: There is no relationship between cleanliness of
the environment and littering
H1: There is a predictable relationship between
cleanliness of the environment and littering
51
FIND CRITICAL VALUE
Create Decision Rule (find critical value)
Two pieces of information needed
α level
df = (R-1)(C-1)
(where R=number of rows, C = number of columns)
Critical value from Table
α = .05
df = (2 -1)(2-1) = 1
Critical value = 3.84
52
CRITICAL VALUE OF χ2
χ2
3.84
Decision Rule: If observed χ2 equals or exceeds 3.84, then reject Ho
53
COMPUTE OBSERVED χ2
Calculate χ2, where fo = observed frequency:
χ² = Σ (fo − fe)² / fe
fe for each cell is: fe = (fc × fr) / n
fc = column total
fr = row total
n = total sample size
54
IMPORTANT!!!
This is different from the Goodness of Fit method!!!
Subject’s response Row totals
Environment No Litter Litter
Clean 45 5 50
Dirty 30 20 50
Column totals 75 25
Total n = 100
COMPUTE EXPECTED
FREQUENCIES
Obser ved frequencies with row and column totals:
Next, compute expected frequency for each cell
fe = (fc × fr) / n
55
COMPUTE EXPECTED
FREQUENCIES
Expected frequency for each cell:
Subject’s response Row totals
Environment No Litter Litter
Clean 37.5 12.5 50
Dirty 37.5 12.5 50
Column totals 75 25 Total n = 100
fe = (fc × fr) / n
Clean, No Litter: (75 × 50)/100 = 37.5
Clean, Litter: (25 × 50)/100 = 12.5
Dirty, No Litter: (75 × 50)/100 = 37.5
Dirty, Litter: (25 × 50)/100 = 12.5
IMPORTANT!!!
This is different from the Goodness of Fit method!!!
56
COMPUTE OBSERVED χ2
Calculate χ2:
χ² = Σ (fo − fe)² / fe
χ² = (45 − 37.5)²/37.5 + (5 − 12.5)²/12.5 + (30 − 37.5)²/37.5 + (20 − 12.5)²/12.5
   = 1.5 + 4.5 + 1.5 + 4.5 = 12.00
Subject’s response (fo / fe)
Environment No Litter Litter
Clean 45/37.5 5/12.5
Dirty 30/37.5 20/12.5
57
MAKE A DECISION
Make a decision
Reject Ho
Because observed χ2 (12.00) exceeds the
critical value (3.84)
58
Report Results
“Results revealed a significant association between cleanliness
of the environment and people’s tendency to litter, χ2 (1, N =
100) = 12.0, p < .05. Participants were much more likely to
litter in a dirty environment (40%) than in a clean environment
(10%).”
The sample data suggest that there is a significant association
between cleanliness of the environment and people’s tendency
to litter. When people were in a dirty environment they were
much more likely to litter (40%) compared to when they were
in a clean environment (10%).
Where did I get 40% and 10%?
20/50 littered in the dirty condition = 40%
5/50 littered in the clean condition = 10%
REPORT AND SUMMARIZE FINDINGS
59
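The test of independence above can be verified with SciPy's contingency-table routine (a sketch, not part of the course materials). Note that `correction=False` disables Yates' continuity correction, which SciPy otherwise applies to 2×2 tables, so the result matches the hand computation:

```python
# Test of independence for the littering example.
from scipy.stats import chi2_contingency

table = [[45, 5],   # Clean: No Litter, Litter
         [30, 20]]  # Dirty: No Litter, Litter
stat, p, df, expected = chi2_contingency(table, correction=False)
print(round(stat, 2), df)   # 12.0 1
print(expected.tolist())    # [[37.5, 12.5], [37.5, 12.5]]
```

The routine also returns the expected frequencies, which match the fe = (fc × fr)/n values computed on the slides.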
REPORTING A χ2
A closer look…
χ2(1, N = 100) = 12.0, p < .05

χ2 = test statistic
1 = degrees of freedom
N = 100 = sample size (Chi-Square only)
12.0 = observed value
p < .05 = alpha level
Other tests also report one- or two-tailed here, but all
Chi-Square tests are one-tailed!
60
COHEN’S W
Cohen’s w can be used to measure effect size for both types
of chi-square tests:
w = √[ Σ (Po − Pe)² / Pe ]
Po = fo / n  (observed proportion)
Cohen suggested that .10 is a small effect, .30 a medium
effect, and .50 a large effect.
Cohen’s w does not use the sample size; therefore the sample
size does not affect the value of w.
61
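Applied to the goodness-of-fit example from earlier in the module (not shown on the original slide; a sketch using only the standard library):

```python
# Cohen's w for the pass/fail example: observed 150/50 of n = 200,
# null proportions .60 / .40.
import math

p_obs = [150 / 200, 50 / 200]   # observed proportions: .75, .25
p_exp = [0.60, 0.40]            # proportions under H0
w = math.sqrt(sum((po - pe) ** 2 / pe for po, pe in zip(p_obs, p_exp)))
print(round(w, 2))  # 0.31, roughly a medium effect

# Equivalently, w = sqrt(chi2 / n):
print(round(math.sqrt(18.75 / 200), 2))  # 0.31
```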
THE PHI-COEFFICIENT
For a 2×2 matrix, the phi coefficient (Φ) measures the
strength of the relationship:
Φ = √(χ² / n)
So Φ² is the proportion of variance
accounted for, just like r²
62
EFFECT SIZE IN A LARGER MATRIX
For a larger matrix, a modification of the
phi-coefficient is used: Cramér’s V
V = √[ χ² / (n × df*) ]
df* is the smaller of (R − 1) or (C − 1)
63
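Both effect sizes can be computed for the littering example (a sketch, not part of the original slides; standard library only):

```python
# Phi and Cramer's V for the littering example: chi2 = 12.0, n = 100, 2x2 table.
import math

chi2_stat, n = 12.0, 100
phi = math.sqrt(chi2_stat / n)
print(round(phi, 2), round(phi ** 2, 2))  # 0.35 0.12

# Cramer's V generalizes phi; df* is the smaller of (R - 1) and (C - 1).
rows, cols = 2, 2
df_star = min(rows - 1, cols - 1)
v = math.sqrt(chi2_stat / (n * df_star))
print(v == phi)  # True: for a 2x2 table, V reduces to phi
```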
ASSUMPTIONS & RESTRICTIONS
FOR CHI-SQUARE TESTS
Independence of Obser vations
E.g., The observation that Subject A is a Chemistry
major must be independent from the observation
that Subject B is an English major
Random sampling
Each observed frequency needs to come from a
different participant
What if people can be double-majors?
64
ASSUMPTIONS & RESTRICTIONS
FOR CHI-SQUARE TESTS
Size of Expected Frequencies
Cochran’s Rule: Cell frequencies should all be > 5
More lenient updates to the rule:
No expected cell frequency should be less than 1
No more than 20% of the expected cell frequencies should be less than 5
Note: For a 2×2 matrix this means a single cell
Solutions?
Increase your sample size
Consider collapsing categories together (should be done with caution –
can make it more dif ficult to reject H0)
65
ASSUMPTIONS & RESTRICTIONS
FOR CHI-SQUARE TESTS
E.g., Expected Frequencies
Teens Young
Adults
Middle
Aged
Seniors
Liberal 20 18 9 3
Conservative 4 12 2 8
Young Old
Liberal 38 12
Conservative 16 10
Learning check!
66
BACK TO THE BIG
PICTURE
67
What type of claim?
Frequency (one variable) → Chi-Sq Goodness
Association (two variables):
  Categorical? → Chi-Sq Ind
  Quantitative? → Correlation
HOW TO CHOOSE A TEST
68
Data project
*** Important! Updated Data File as of today, March 14
69
TO-DO
TD0409-01 课件/Psy 202_6_correlation_W22_topost.pdf
Instructor:
Dr. Molly Metz
PSY 202H1:
STATISTICS I
MODULE 6:
INTRO TO CORRELATION
1
1. Intro to Correlation
2. Hypothesis Testing with Correlation
3. What are correlations used for?
4. Interpreting Correlation
1. Issues to look out for
GAME PLAN
2
INTRO TO CORRELATION
3
THE BIG PICTURE
Single
score
1 IV
z score
z test
One sample t-
test
Making comparisons
to population (NO IVs)
Sample
mean
σ known σ unknown
Making comparisons
between levels of IV(s)
or groups
More than 1 IV
2 levels 3+ levels IV
Between
subjects
Within
subjects
Independent
samples t-test
Paired
samples t-test
Between
subjects
Within
subjects
One-Way
Between
ANOVA
One-Way
Repeated
ANOVA
All IVs
Between
subjects
All IVs
Within
subjects
Mix of
within and
between
Between subj
Factorial
ANOVA
Repeated
Measures
Factorial ANOVA
Mixed Model
Factorial
ANOVA 4
Statistical Test IV DV
Correlation/Linear
regression
Quantitative Quantitative
Independent
samples t-test
Two independent
categories
Quantitative
Paired sample
t-test
Two related groups Quantitative
ANOVA Multiple categories Quantitative
Chi-square Two or more
categories
Categorical
5
RESEARCH PROBLEM
What is the relationship between hours studying and
scores on a quiz?
Conduct a non-experimental study
n = 6 students
Measure hours studying for an exam (X)
Record each student’s quiz score (Y)
Examine association between hours studying and quiz
scores
Does study time predict quiz scores?
6
RESEARCH PROBLEM
Correlation
Direction and strength of an association between two variables
(X,Y)
Typically (but not only) used in non-experimental research
(variables are measured, not manipulated)
Other examples:
Relationship between stressful life events (X) and number of
illness symptoms (Y)
Relationship between years of education (X) and yearly income (Y)
7
TOOLS FOR CORRELATION
The Scatterplot
A figure
Shows association between two variables
The Pearson correlation coef ficient
A statistic
Describes the direction and strength of a linear association
between two continuous variables
8
THE SCATTERPLOT
Hours studying and quiz scores
Student
Study Hrs
(X)
Test Score
(Y)
A 1 1
B 1 3
C 3 4
D 4 5
E 6 4
F 7 6
n = 6 people,
6 pairs of
scores
n =6
9
THE SCATTERPLOT
Hours studying and quiz scores

[Scatterplot: Quiz Score (Y) vs. Hours Studying (X), plotting the
six (X, Y) pairs from the table above]
14
SEEING RELATIONSHIPS
[Scatterplot: Weight (lbs.) vs. Height (in)]
15
SUMMARIZING RELATIONSHIPS
[Scatterplot: Weight (lbs.) vs. Height (in)]
Linear relationship: describes variables that can be well-represented
by a straight line (i.e., there is a common ratio between a score on
one and a score on the other)
16
SUMMARIZING RELATIONSHIPS
[Scatterplot: Weight (lbs.) vs. Height (in)]
17
SUMMARIZING RELATIONSHIPS
[Scatterplot: Party hours (week) vs. Grade Point Average]
18
SUMMARIZING RELATIONSHIPS
[Plot: Exam performance vs. Reported Anxiety, a curvilinear relationship]
19
http://www.pewresearch.org/fact-tank/2015/09/16/the-art-and-science-of-the-scatterplot/
NOT A GIVEN…
20
DESCRIBING RELATIONSHIPS
When we talk about statistical relationships,
we begin by assessing the covariance, or
degree to which two variables var y together.
This statistic is used as the basis for the
correlation coefficient, a statistic that
measures the relationship between variables.
Pearson’s product-moment correlation: r
Spearman’s rank-order correlation: r s
Point-biserial correlation: rpb
21
THE CORRELATION COEFFICIENT:
BASICS
Pearson Correlation Coefficient
Symbol: r
Ranges from -1.0 to +1.0
Sign (+/-) indicates “direction” of relationship
Value indicates “strength” of relationship
• Some general guidelines
• .10 is weak
• .30 is moderate
• .50 is strong
Measures a linear relationship only
Remember: r2 guidelines
• .01 weak
• .09 moderate
• .25 strong
22
THE CORRELATION COEFFICIENT
Figure 16-3 (p. 524). Examples of positive and negative relationships. (a) Beer sales are
positively related to temperature. (b) Coffee sales are negatively related to temperature.
Positive Correlation
X = Temperature
Y = Beer Sales
Negative Correlation
X = Temperature
Y = Coffee Sales
23
THE CORRELATION COEFFICIENT
Figure 16-5 (p. 525). Examples of different values for linear correlations: (a) shows a strong positive
relationship, r = +.90; (b) shows a moderate negative correlation, r = –.40; (c) shows a perfect negative
correlation, r = –1.0; (d) shows no linear trend, r = 0.0.
r = +.90
r = −1.0
r = −.40
r = 0
How closely
do the dots
hug the
line?
24
COMPUTING R
r = degree to which X & Y vary together
degree to which X & Y vary separately
r = Covariance of X & Y
Variance of X & Y
25
COVARIABILITY OF X AND Y
[Venn diagram: variance in X alone, variance in Y alone, and the
covariance (overlap) between X and Y]
• The greater the covariance, the greater the correlation (the closer r will be to ±1.0)
26
COMPUTING R
Computational formulas for Pearson r:
SP = ΣXY − (ΣX)(ΣY)/n
SSX = ΣX² − (ΣX)²/n
SSY = ΣY² − (ΣY)²/n
r = SP / √(SSX × SSY)
Where:
• SP = “Sum of products”
• SS = “Sum of squares”
SP = similar to SS, but for COvariance
Learning check! 27
HYPOTHESIS TESTING FOR R
State the research question
Is there a significant linear association between X & Y?
Is r significantly different from zero?
ρ = “rho” the population parameter
r = sample statistic
28
HYPOTHESIS TESTING FOR R
Step 1: Statistical Hypotheses for r
Almost always two-tailed (non-directional)
H0: ρ = 0
H1: ρ ≠ 0
One-tailed upper (directional)
H0: ρ ≤ 0
H1: ρ > 0
One-tailed lower (directional)
H0: ρ ≥ 0
H1: ρ < 0
29
HYPOTHESIS TESTING FOR R
Step 2: Find critical value of r (Table)
Need 3 pieces of information:
α
One-tailed or two-tailed?
degrees of freedom: df = n−2
30
31
HYPOTHESIS TESTING FOR R
Step 2: Find critical value of r (Table)
Need 3 pieces of information:
α
One-tailed or two-tailed?
degrees of freedom: df = n−2
Step 3: Compute obser ved r
Step 4: Make a decision
Reject H0 if observed r exceeds rcritical
Step 5: Summarize and repor t findings
32
LET’S PRACTICE!
Research question
Is there a significant linear association between hours
studying and quiz score?
Is r significantly different from zero?
Step 1: Statistical Hypotheses
H0: ρ = 0
H1: ρ ≠ 0
Step 2: Find rcritical in Table α = .05
Two-tailed
df = n−2; df = 6−2 = 4
33
34
LET’S PRACTICE!
Research question
Is there a significant linear association between hours studying
and quiz score?
Is r significantly different from zero?
Step 1: Statistical Hypotheses
H0: ρ = 0
H1: ρ ≠ 0
Step 2: Find rcritical in Table α = . 0 5
Two-tailed
df = n−2; df = 6−2 = 4
From table rcrit = ±.811
35
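The tabled critical r can also be derived from the t distribution via the standard relationship r_crit = t_crit / √(t_crit² + df) (not shown on the original slides; a sketch assuming SciPy is installed):

```python
# Sketch: derive the tabled critical r from the t distribution,
# using r_crit = t_crit / sqrt(t_crit**2 + df).
import math
from scipy.stats import t

alpha, df = 0.05, 4                 # two-tailed, df = n - 2 = 4
t_crit = t.ppf(1 - alpha / 2, df)   # upper-tail t critical value
r_crit = t_crit / math.sqrt(t_crit ** 2 + df)
print(round(r_crit, 3))  # 0.811, matching the table
```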
LET’S PRACTICE
Step 3: Compute obser ved r
Steps in computing r:
Compute SSX
Compute SSY
Compute SP
Compute r
36
LET’S PRACTICE
Hours studying and quiz scores

Student   Hours (X)   Score (Y)   X²    Y²    XY
A         1           1           1     1     1
B         1           3           1     9     3
C         3           4           9     16    12
D         4           5           16    25    20
E         6           4           36    16    24
F         7           6           49    36    42
n = 6     ΣX = 22     ΣY = 23     ΣX² = 112   ΣY² = 103   ΣXY = 102

COMPUTING R
Compute SSX: SSX = ΣX² − (ΣX)²/n = 112 − 22²/6 = 31.333
Compute SSY: SSY = ΣY² − (ΣY)²/n = 103 − 23²/6 = 14.833
Compute SP: SP = ΣXY − (ΣX)(ΣY)/n = 102 − (22)(23)/6 = 17.667
Finally, compute r!
r = SP / √(SSX × SSY) = 17.667 / √((31.333)(14.833)) = +.819
46
LET’S PRACTICE!
Step 4: Make a Decision
Reject H0: robs (+.819) exceeds rcrit (±.811)
Step 5: Summarize and repor t finding
“There was a statistically significant positive correlation
between hours studying and quiz scores, r(4) = .82, p <
.05, two-tailed, r2 = .67. Students who studied longer
earned higher scores on the quiz.”
Notice: No causal
language!
47
LET’S PRACTICE!
Compute r² (“coefficient of determination”)
Effect size
r2 = .8192 = .67
67% of the variance in quiz scores is explained by
hours studying (and vice versa)
48
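The full hand computation can be mirrored in NumPy using the same SP / √(SSX·SSY) formula from the slides (a sketch, not part of the course materials):

```python
# Definitional computation of Pearson r for the study-time example.
import numpy as np

x = np.array([1, 1, 3, 4, 6, 7], dtype=float)  # hours studying
y = np.array([1, 3, 4, 5, 4, 6], dtype=float)  # quiz score
n = len(x)

sp = (x * y).sum() - x.sum() * y.sum() / n     # sum of products
ssx = (x ** 2).sum() - x.sum() ** 2 / n
ssy = (y ** 2).sum() - y.sum() ** 2 / n
r = sp / np.sqrt(ssx * ssy)

print(round(float(r), 2), round(float(r ** 2), 2))  # 0.82 0.67
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))       # True, matches NumPy
```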
REPORTING AN R
A closer look…
r(4) = .82, p < .05, two-tailed, r2 = .67

r = test statistic
4 = degrees of freedom
.82 = observed value
p < .05 = alpha level
two-tailed = one- or two-tailed
r2 = .67 = effect size
49
Quantitative data
Independent obser vations
Random sampling
Linear relationship
ASSUMPTIONS FOR PEARSON’S R
Learning check!
50
SPOTLIGHT ON T WIN STUDIES
51
52
WHAT ARE
CORRELATIONS FOR?
COMMON USES FOR CORRELATIONS
Prediction
Note: this is NOT causal language
Measurement assessment
Validity (accuracy)
Reliability (consistency)
53
VARIOUS USES OF CORRELATIONS
Prediction: If we know that two variables are related to one
another, we can use knowledge about one variable to make
predictions about the value of the other variable
E.g., How tall do you think my niece is? Does it help if I tell you that
she just turned 5?
54
VARIOUS USES OF CORRELATIONS
Validity of measures
Convergent validity: How strongly does the measure correlate with
other measures of the same construct?
E.g., Does the self-esteem measure you’ve just constructed correlate
positively with existing self-esteem measures? (good thing)
Discriminant validity: How strongly does the measure correlate with
measures of unrelated constructs?
E.g., Does the self-esteem measure you’ve just constructed correlate
positively with measures of unrelated constructs (e.g., mood)? (bad thing)
55
VARIOUS USES OF CORRELATIONS
Reliability of measures
Reliable measures should produce consistent, stable results
E.g., If you are measuring IQ, or a personality trait, or any other
construct where you expect stable results, you would expect a
person’s scores from any two measurement sessions to be highly
correlated
56
VARIOUS USES OF CORRELATIONS
Theory Verification: Many psychological theories involve
specific predictions about the relationship between two
variables
One way these predictions can be tested is by determining the
correlation between the two variables
E.g., The General Aggression Model predicts positive relationships
between recent exposure to violent media and a host of aggression-
related variables (hostile expectancies, aggressive cognitions,
physiological arousal, etc.)
57
INTERPRETING
CORRELATION
58
PROCEED WITH CAUTION…
1. Correlation is sensitive to outliers
2. Correlation is only appropriate for describing
linear relationships
3. Correlation is sensitive to restriction of range
(lack of generalization)
4. Beware of heterogeneous samples
5. Correlation does not imply causation
59
1. SENSITIVE TO OUTLIERS
[Two example scatterplots: one with r = −.10, one with r = .94]
• An outlier is an extremely deviant individual in the sample
• Characterized by a much larger (or smaller) score than all the others in the sample
• In a scatter plot, the point is clearly different from all the other points
• Outliers produce a disproportionately large impact on the correlation coefficient
60
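This sensitivity is easy to demonstrate with hypothetical data (invented for illustration, not taken from the slides):

```python
# Hypothetical illustration: a single extreme point can dramatically
# inflate the correlation coefficient.
import numpy as np

x = np.array([1.0, 2, 3, 4, 5])
y = np.array([3.0, 1, 4, 2, 5])
r_without = np.corrcoef(x, y)[0, 1]        # modest correlation (r = .50)

x_out = np.append(x, 30.0)                 # add one extreme case
y_out = np.append(y, 40.0)
r_with = np.corrcoef(x_out, y_out)[0, 1]   # near-perfect correlation

print(round(r_without, 2), round(r_with, 2))
```

One deviant case out of six is enough to turn a modest correlation into a near-perfect one, which is why inspecting a scatterplot first is essential.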
2. LINEAR RELATIONSHIPS ONLY
r = .10
[Curvilinear scatterplot: Exam performance vs. Reported anxiety]
61
3. RESTRICTION OF RANGE
[Scatterplot: Score on general math test vs. Score on IQ test.
Full IQ range (75–130): r = .82; restricted range (105–125): r = −.13]
62
4. HETEROGENEOUS SAMPLES
r = −.70
[Scatterplot: Performance on Exam vs. Reported Anxiety]
63
AND NOW FOR SOME EXAMPLES…
64
65
66
67
68
69
We should all be texting while at Church (and
also having unprotected sex)!
Thinking of cleaning as women’s work is
actually better for both men and women
(especially if women do more housework, to
cut their risk of cancer)!
Any chance there is a problem here???
SO WHAT HAVE WE LEARNED?
70
5. CORRELATION IS NOT
CAUSATION
[Scatterplot: Death by Drowning vs. Ice cream sales]
If X and Y are correlated:
… does X cause Y?
… does Y cause X?
… does Z cause X and Y?
71
NAME THAT CORRELATION…
72
NAME THAT CORRELATION…
73
NAME THAT CORRELATION…
74
NAME THAT CORRELATION…
75
WHAT’S WRONG WITH THIS
PICTURE?
r = – .80
WHAT’S WRONG WITH THIS
PICTURE?
r = .85
WHAT’S WRONG WITH THIS
PICTURE?
r = .10
WHAT’S WRONG WITH THIS
RESEARCH?
“The data showed a strong and highly significant
positive correlation between date of onset of
sexual activity and current level of sexual activity
(r = 0.75, p < .01), suggesting that teenagers
who begin having sex at an earlier age are more
promiscuous in college as a result.”
“The negative correlation coef ficient shows that
there is no relationship between these traits.”
“The correlation was significant (r = -1 .22)…”
79
http://guessthecorrelation.com/
MORE PRACTICE?
80
Can more easily identify issues that might interfere
with your ability to interpret your data
PROTIP: LOOK AT A SCATTERPLOT FIRST
81
ALTERNATIVES TO THE
PEARSON CORRELATION
Pearson correlation has been developed
For data having linear relationships
With data from interval or ratio measurement scales
Other correlations have been developed
For data having non-linear relationships
With data from nominal or ordinal measurement
scales
Point-biserial
Spearman’s correlation
82
SUMMARY
Correlations var y in type and magnitude
Errors are commonly made when interpreting
correlations
Look at a scatterplot!
83
https://kotaku.com/antonin-scalias-landmark-defense-of-violent-video-games-1758990360
EVEN THE US SUPREME COURT
KNOWS WHAT’S UP
84
And remember for the rest of your life:
Correlation does NOT equal causation!
Practice interpreting correlations on the
discussion board
85
86
JUST TO COMPLICATE
THINGS A LITTLE…
87
…BUT SOMETIMES IT KIND OF IS
https://www.youtube.com/watch?v=HUti6vGctQM&fbclid=IwAR2orZs_ECdn0
94_eSkyyp-1ZKXWtIv3USW2PL6N9oZunqIBY1nlTuUxAh4
“In essence, to logically infer that X caused Y, we need to meet
three requirements:
We must know that X preceded Y. It is not possible for a cause to follow
or even coincide with an ef fect. It must come before it, even if it is
fractions of a second.
X must covary with Y. In other words, Y must be more likely to
occur when X occurs than when X does not occur.
The relationship between X and Y is free from confounding. What this
means is that no other variable also covaries with X when #1 and #2 are
met.”
What about when a true experiment is not possible? Give up?
It may be more useful to think of causality on a continuum
rather than as a dichotomous outcome
See more: http://icbseverywhere.com/blog/2014/10/the-logic-of-causal-conclusions/
88
Keep up with tutorials!
Data project information: coming soon
89
TO-DO
TD0409-01 课件/Psy 202_5_MoreFactorial_W22.pdf
Instructor:
Dr. Molly Metz
PSY 202H1:
STATISTICS II
MODULE 5:
HYPOTHESIS TESTING WITH
FACTORIAL ANOVA
1
1. Review
1. More on interactions and simple effects
2. Another 2 factor design
2. Hypothesis Testing with Factorial ANOVA
1. Sources of variance
2. Foundations of hypothesis test
3. Example, with numbers!
GAME PLAN
2
Learning check!
Factorial Design review
Test will be posted Tuesday February 15, 9am and will be due
Thursday February 17, 11:59pm
This is NOT a timed test. You may start it, take a break, and return to
it.
However, I do NOT recommend taking the whole time to complete the
test!
It will be written as if it could be completed like an in-person test, about 2
hours (assuming you prepare for it as if it were an in-person test).
Content: All readings and lectures through Module 5
Including things reviewed in text but not in lecture video
Format of questions may include
Multiple Choice
Short Answer
Computations
MIDTERM
3
Permitted resources:
Your book, your notes
Simple calculator
NOT permitted resources:
Your friends/classmates
Including any group chats, like Discord, GroupMe, Facebook, etc.
Any other people, including but not limited to those who have taken this
course before
Google (or any internet resources)
IMPORTANT: Not just WHAT but WHY; application
Is MindTap similar to the test? Kind of…
Will it be written to be harder to make up for the fact that it is
open book?
No, but…
MIDTERM
4
5
MORE ON:
INTERACTIONS AND
SIMPLE EFFECTS
SIMPLE EFFECTS
• Effects of one IV on DV at one particular level of other IV
[Figure: bar graph of attitude toward tuition change, by head condition (nod vs. shake) and tuition message (increase vs. decrease)]
1. Simple effect of tuition condition on attitude, for head nodding condition
2. Simple effect of tuition condition on attitude, for head shaking condition
3. Simple effect of head condition on attitude, for tuition increase condition
4. Simple effect of head condition on attitude, for tuition decrease condition
6
[Figure: opinion on tuition change, by head condition (nodding vs. shaking) and tuition message (increase vs. decrease)]
1. Main effect of head condition?
2. Main effect of message condition?
3. Interaction between head and message conditions?
ACTUAL RESULTS
7
ANOTHER EXAMPLE
TWO FACTOR DESIGN
8
TWO-FACTOR DESIGNS

                        FACTOR B
                        Level 1 (B1)                                      Level 2 (B2)
FACTOR A
Level 1 (A1)            Test scores for boy students put in a happy mood  Test scores for girl students put in a happy mood
Level 2 (A2)            Test scores for boy students put in a sad mood    Test scores for girl students put in a sad mood
9
REVIEW: TERMINOLOGY
Factor: The variable (independent or quasi-independent)
that designates the groups being compared
E.g., Mood
Level: The individual conditions or values that make up a
factor are called the levels of the factor
E.g., Happy vs Sad
Factorial design: Any study that combines two or more
factors
Comparing how male and female students perform on a general
knowledge test after being put in either a sad or happy mood
2 (gender: boy, girl) x 2 (mood: happy, sad) independent measures
design
10
TWO-FACTOR DESIGNS

                        FACTOR B
                        Level 1 (B1)                                      Level 2 (B2)
FACTOR A
Level 1 (A1) (Row 1)    Test scores for boy students put in a happy mood  Test scores for girl students put in a happy mood
Level 2 (A2) (Row 2)    Test scores for boy students put in a sad mood    Test scores for girl students put in a sad mood

Main effect of FACTOR A: Do test scores differ depending on mood? (compare Row 1 vs. Row 2)
11
TWO-FACTOR DESIGNS

                        FACTOR B
                        Level 1 (B1) (Column 1)                           Level 2 (B2) (Column 2)
FACTOR A
Level 1 (A1)            Test scores for boy students put in a happy mood  Test scores for girl students put in a happy mood
Level 2 (A2)            Test scores for boy students put in a sad mood    Test scores for girl students put in a sad mood

Main effect of FACTOR B: Do test scores differ depending on the gender of the participant? (compare Column 1 vs. Column 2)
12
MAIN EFFECTS
The mean differences among the levels of one factor
are referred to as the main effect of that factor
You're not interested in the individual cells, you're
interested in comparing the means of each row
(Factor A) and the means of each column (Factor B)

              Boys (B1)   Girls (B2)
Happy (A1)    M = 100     M = 110      MA1 = 105
Sad (A2)      M = 110     M = 100      MA2 = 105
              MB1 = 105   MB2 = 105
13
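The marginal means on this slide can be reproduced with a quick sketch (plain Python; the cell means are the ones shown above):

```python
# Cell means from the slide: rows = mood (Happy, Sad), columns = gender (Boys, Girls)
cells = [[100, 110],   # Happy (A1): boys, girls
         [110, 100]]   # Sad (A2): boys, girls

# Main effect of Factor A: compare the row (mood) means
row_means = [sum(row) / len(row) for row in cells]

# Main effect of Factor B: compare the column (gender) means
col_means = [sum(col) / len(col) for col in zip(*cells)]

print(row_means, col_means)  # equal marginal means -> no main effect of either factor
```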
INTERACTIONS
When the effect of one factor depends on the different
levels of a second factor, then there is an interaction
between the factors
Now you're interested in the individual cells: An interaction between
two factors occurs when the mean differences between individual
treatment conditions (or cells) are different from what would be
predicted from the overall main effects of the factors

              Boys (B1)   Girls (B2)
Happy (A1)    M = 100     M = 110      MA1 = 105
Sad (A2)      M = 110     M = 100      MA2 = 105
              MB1 = 105   MB2 = 105
14
TWO-FACTOR DESIGNS

                        FACTOR B
                        Level 1 (B1)                                      Level 2 (B2)
FACTOR A
Level 1 (A1)            Test scores for boy students put in a happy mood  Test scores for girl students put in a happy mood
Level 2 (A2)            Test scores for boy students put in a sad mood    Test scores for girl students put in a sad mood

A x B INTERACTION: Does the effect of mood
depend on participant gender?
15
INTERACTIONS
When the results of a two-factor study are presented in a
graph, the existence of non-parallel lines (i.e., lines that
cross or converge) indicates an interaction between the two
factors
[Figure: line graph of test score (95 to 115) by mood (Happy vs. Sad), with separate lines for boys and girls; the lines cross]
16
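One way to see the non-parallelism numerically is a "difference of differences" on the same cell means; a minimal sketch:

```python
# Cell means: rows = mood (Happy, Sad), columns = gender (Boys, Girls)
cells = [[100, 110],
         [110, 100]]

# Simple effect of gender within each mood condition
effect_happy = cells[0][1] - cells[0][0]   # girls - boys, happy mood
effect_sad   = cells[1][1] - cells[1][0]   # girls - boys, sad mood

# If the lines were parallel, these two simple effects would be equal;
# a nonzero difference of differences is the signature of an interaction
interaction_contrast = effect_happy - effect_sad
print(interaction_contrast)
```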
Learning check!
Aka, what's happening STATISTICALLY with a
factorial ANOVA?
17
HYPOTHESIS TESTING
WITH FACTORIAL ANOVA
FACTORIAL ANOVA LOGIC
F = "Variability between groups" / "Variability within groups"
  = "Variability across group means" / "Natural variability"
  = "Effect" / "Error"
18
TYPES OF VARIANCES IN FACTORIAL
ANOVA (2 X 2)
• More variables means more effects, and so more sources of
variance:
o Between-groups for main effect of IV 1
o Between-groups for main effect of IV 2
o Between-groups for interaction of IV 1 and IV 2
o Within-groups (natural variability)
3 possible between-groups sources, 1 within-groups source
19
FACTORIAL ANOVA
F = "Variability between groups" / "Variability within groups"
  = "Variability across group means" / "Natural variability"
  = "Effect" / "Error"
**Calculated separately for each main effect and
interaction
20
ANALYSIS OF VARIANCE
• Goal: explain the total variance in a set of scores by
determining how much is due to our IVs versus natural
variability
• In a one-way ANOVA, we had only two possible sources of
variance: between-groups and within-groups
• Now, we have many different sources:
• Main effect of IV1 (between-groups)
• Main effect of IV 2 (between-groups)
• Interaction (between-groups)
• Natural or error variability (within-groups)
21
TWO-FACTOR ANOVA
In a two-factor study, we need to test for two main effects and an
interaction
Main effect of factor A
Main effect of factor B
A x B interaction
This means that we have three separate hypotheses, which are
tested with three separate F-ratios
The ANOVA allows us to test for all three effects in a single analysis
It is important to understand that each effect (each hypothesis) is
independent from the others; this means that any pattern of
significant/non-significant results is possible
Two significant main effects, no interaction
Main effect of factor A, but not factor B, and a significant interaction
Only a significant interaction (no main effects)
Etc.
22
HYPOTHESES FOR MAIN EFFECTS
Factor A:
Null: A has no effect on outcome
H0: μA1 = μA2
Alternative: A does have an effect on outcome
H1: μA1 ≠ μA2
Factor B:
Null: B has no effect on outcome
H0: μB1 = μB2
Alternative: B does have an effect on outcome
H1: μB1 ≠ μB2
23
HYPOTHESES FOR THE
INTERACTION
Null hypothesis:
H0: There is no interaction between factors A and B. All the mean
differences between treatment conditions are explained by the
main effects of the factors.
Alternative hypothesis:
H1: There is an interaction between factors A and B. The mean
differences between treatment conditions are not what would be
predicted from the overall main effects of the two factors.
In symbols, at level B1: μA1 = μA2, at level B2: μA1 = μA2; At level A1 = ….
24
THE THREE F-RATIOS IN A
TWO-FACTOR ANOVA
FA = variance (differences) between the means for factor A
     / variance (differences) expected if there is no treatment effect
FB = variance (differences) between the means for factor B
     / variance (differences) expected if there is no treatment effect
FAxB = variance (mean differences) not explained by main effects
     / variance (differences) expected if there is no treatment effect
25
TWO STAGES OF THE TWO-FACTOR ANALYSIS
OF VARIANCE
Stage 1: Same as independent measures ANOVA (or stage 1 of
the repeated measures ANOVA): total variance is broken
down into between-treatments variance and within-treatments
variance (which becomes the denominator for all three F-
ratios)
Stage 2: Between-treatments variance is broken down into
three separate components: differences attributable to
Factor A, to Factor B, and to the A x B interaction (which
become the numerators for each respective F-ratio)
26
27
Now, BETWEEN-group
variance gets partitioned
into our three (or more)
effects of interest! So, the
sum of A, B, and AxB
values (i.e., SS, df) will
always equal Between!
28
TWO-FACTOR ANOVA
SUMMARY TABLE EXAMPLE

Source                 SS    df    MS     F
Between treatments     200    3
  Factor A              40    1    40     4
  Factor B              60    1    60     6*
  A x B                100    1   100    10**
Within treatments      200   20    10
Total                  400   23

F.05 (1, 20) = 4.35*
F.01 (1, 20) = 8.10**
(N = 24; n = 6)
29
EFFECT SIZE FOR TWO-FACTOR ANOVA:
PARTIAL ETA SQUARED
η2 for each factor and the interaction is computed as
the percentage of variability not explained by other
factors
Two equivalent equations
30
TWO-FACTOR ANOVA ASSUMPTIONS
The validity of the ANOVA depends on three assumptions
common to other hypothesis tests
The obser vations within each sample must be independent of each
other
The populations from which the samples are selected must be
normally distributed
The populations from which the samples are selected must have
equal variances
(homogeneity of variance)
Learning check!
31
32
WHAT THE HYPOTHESIS
TEST LOOKS LIKE WITH
NUMBERS
EXAMPLE: HYPOTHESIS TESTING
WITH THE T WO-FACTOR ANOVA
• The following data is
from a study examining
the effects of arousal
level and task difficulty
on performance scores
(higher scores indicate
better performance)
• We will use it to
illustrate the hypothesis
testing procedure for a
two-factor ANOVA
(Notice that this is a 2 x 3
factorial design)
33
HYPOTHESIS TESTING WITH THE
TWO-FACTOR ANOVA
Step 1: State the hypotheses
Factor A: Task difficulty
H0: μA1 = μA2 (or H0: μeasy = μdifficult)
H1: μA1 ≠ μA2 (or H1: μeasy ≠ μdifficult)
Factor B: Arousal level
H0: μB1 = μB2 = μB3 (or H0: μlow = μmedium = μhigh)
H1: The means are not all equal (at least one of μlow, μmedium, μhigh differs)
Interaction: Task difficulty x Arousal level
H0: There is no interaction effect. The effect of either factor does not
depend on the levels of the other factor.
H1: There is an interaction effect.
34
HYPOTHESIS TESTING WITH THE
TWO-FACTOR ANOVA
Step 2: Compute the three F-ratios in two stages
Stage 1: Partition SS total and df total
(same as stage 1 for one-way repeated measures AND between groups ANOVA)
35
HYPOTHESIS TESTING WITH THE
TWO-FACTOR ANOVA
Step 2: Compute the three F-ratios in two stages
Stage 2 (NEW): Partition SS between treatments

SSA = Σ(Trow² / nrow) − G²/N
SSB = Σ(Tcol² / ncol) − G²/N
SSAxB = SSbetween treatments − SSA − SSB
36
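The subtraction step can be sketched with the SS and df values from the arousal example later in this module; the raw row and column totals are not shown on the slides, so the Σ(T²/n) terms are not computed here:

```python
# Stage 1 gave us SS between treatments; SSA and SSB come from the T and G totals
# (values below are from the 2 x 3 task difficulty x arousal example)
ss_between = 260.0   # SS between treatments
ss_A = 120.0         # factor A: task difficulty
ss_B = 80.0          # factor B: arousal level

# The interaction SS is whatever between-treatments variability is left over
ss_AxB = ss_between - ss_A - ss_B

# The same subtraction logic partitions the degrees of freedom
df_A = 2 - 1                  # number of rows - 1
df_B = 3 - 1                  # number of columns - 1
df_between = (2 * 3) - 1      # number of cells - 1
df_AxB = df_between - df_A - df_B

print(ss_AxB, df_AxB)
```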
HYPOTHESIS TESTING WITH THE
TWO-FACTOR ANOVA
Step 2: Compute the three F-ratios in two stages
Stage 2 (NEW): Partition df between-treatments
dfA = number of rows – 1
dfB = number of columns – 1
dfAxB = dfbetween treatments – dfA – dfB
37
HYPOTHESIS TESTING WITH THE
TWO-FACTOR ANOVA
Step 2: Compute the three F-ratios in two stages
Stage 2: Calculate the four MS values

MSwithin = SSwithin treatments / dfwithin treatments (the denominator for all three F-ratios)

MSA = SSA / dfA
MSB = SSB / dfB
MSAxB = SSAxB / dfAxB (the numerators for the three F-ratios)
38
HYPOTHESIS TESTING WITH THE
TWO-FACTOR ANOVA
Step 2: Compute the three F-ratios in two stages
Stage 2: Calculate the three F-ratios

FA = MSA / MSwithin
FB = MSB / MSwithin
FAxB = MSAxB / MSwithin
39
SUMMARY TABLE FOR TWO-FACTOR ANOVA

Source                    SS    df    MS    F
Between treatments        260    5
  Factor A (difficulty)   120    1   120    F(1, 24) = 24.00
  Factor B (arousal)       80    2    40    F(2, 24) = 8.00
  A x B                    60    2    30    F(2, 24) = 6.00
Within treatments         120   24     5
Total                     380   29
40
HYPOTHESIS TESTING WITH THE
TWO-FACTOR ANOVA
Step 3: Find the critical F value for each F-ratio, compare with
the computed F-ratio, and make a decision regarding each H0
(all tested at .05 level)
Factor A: df = 1, 24  Fcrit = 4.26 (FA = 24.00)
Decision: Reject H0, conclude that there is a significant main effect
of task difficulty
Factor B: df = 2, 24  Fcrit = 3.40 (FB = 8.00)
Decision: Reject H0, conclude that there is a significant main effect
of arousal level
A x B interaction: df = 2, 24  Fcrit = 3.40 (FAxB = 6.00)
Decision: Reject H0, conclude that there is a significant interaction
between task difficulty and arousal
41
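Steps 2 and 3 can be sketched end to end, starting from the SS and df values in the summary table; the critical values are the tabled ones quoted on the slide:

```python
# SS and df from the two-factor summary table
ss = {"A": 120.0, "B": 80.0, "AxB": 60.0, "within": 120.0}
df = {"A": 1,     "B": 2,    "AxB": 2,    "within": 24}

ms = {k: ss[k] / df[k] for k in ss}            # MS = SS / df
f = {k: ms[k] / ms["within"] for k in ("A", "B", "AxB")}

# Critical F values looked up from an F table at alpha = .05
f_crit = {"A": 4.26, "B": 3.40, "AxB": 3.40}

for effect, value in f.items():
    decision = "reject H0" if value > f_crit[effect] else "fail to reject H0"
    print(f"F_{effect} = {value:.2f} vs {f_crit[effect]} -> {decision}")
```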
EFFECT SIZE FOR TWO-FACTOR ANOVA:
PARTIAL ETA SQUARED
How large is the effect of task difficulty?
How large is the effect of arousal?
How large is the interaction effect?

η2A = SSA / (SStotal − SSB − SSAxB)
η2B = SSB / (SStotal − SSA − SSAxB)
η2AxB = SSAxB / (SStotal − SSA − SSB)
42
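Plugging the summary-table SS values into these formulas reproduces the reported effect sizes:

```python
# SS values from the two-factor summary table
ss_total, ss_A, ss_B, ss_AxB = 380.0, 120.0, 80.0, 60.0

# Partial eta squared: each effect's SS over the variability
# not explained by the other effects
eta2_A   = ss_A   / (ss_total - ss_B - ss_AxB)   # 120 / 240
eta2_B   = ss_B   / (ss_total - ss_A - ss_AxB)   # 80 / 200
eta2_AxB = ss_AxB / (ss_total - ss_A - ss_B)     # 60 / 180

print(round(eta2_A, 2), round(eta2_B, 2), round(eta2_AxB, 2))
```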
REPORTING RESULTS IN APA FORMAT
“A two-factor between-subjects analysis of variance showed a
significant main effect for task difficulty, F(1, 24) = 24.00, p < .05,
η2 = .50, such that participants performed better on easy tasks
(M = 6, SD = 2.26) than on difficult tasks (M = 2, SD = 1.85).
There was also a significant main effect for arousal level, F(2, 24)
= 8.00, p < .05, η2 = .40, such that participants performed better
as arousal increased from low (M = 2, SD = 1.7) to medium (M = 4,
SD = 2.31) to high (M = 6, SD = 2.26).
Finally, there was a significant task difficulty x arousal
interaction, F(2, 24) = 6.00, p < .05, η2 = .33. As can be seen by
looking at Figure 1, increased levels of arousal led to consistently
better performance when the task was easy. However, when the
task was difficult, a moderate level of arousal led to the best
performance, with scores sharply decreasing as arousal is
increased from moderate to high.”
43
Since this factor has 3 levels, we actually
need to do post hocs to establish which
means are different
POST HOC TESTS
If you have a 2 x 2 design, post hoc tests for any significant
main effects are unnecessary (why?)
However, if you have more than two levels of a factor and a
significant main effect, you may wish to conduct a post hoc
test (e.g., Tukey's HSD) to determine which means are
significantly different from one another
44
POST HOC TESTS: TUKEY’S HSD
Remember: You would only do this type of post hoc test if
there is no significant interaction, but a significant main
effect for a factor with more than two levels (e.g., if our
interaction was not significant, but there was a main effect
of arousal)
• q: To find the q value, you need to know: the alpha level (same as original
test), dfwithin (from original ANOVA), and k (the number of levels in the factor
you are testing)
• MSwithin: from the original ANOVA
• n: the number of participants in each level you are comparing (e.g., how
many participants were in each arousal condition)
45
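A minimal sketch of the HSD computation for the arousal factor, assuming the tabled studentized-range value q = 3.53 (α = .05, k = 3, dfwithin = 24) and n = 10 participants per arousal level (5 per cell x 2 difficulty levels); both values are assumptions here, not given on the slides:

```python
import math

ms_within = 5.0      # MS within treatments from the original two-factor ANOVA
n_per_level = 10     # assumed participants per arousal level (5 per cell x 2 difficulty levels)
q = 3.53             # assumed table lookup: studentized range, alpha = .05, k = 3, df = 24

# Tukey's HSD: the smallest pairwise mean difference that counts as significant
hsd = q * math.sqrt(ms_within / n_per_level)
print(round(hsd, 2))  # compare each pair of arousal means against this threshold
```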
POST HOC TESTS
If you have a 2 x 2 design, post hoc tests for any significant main
effects are unnecessary (why?)
However, if you have more than two levels of a factor and a
significant main effect, you may wish to conduct a post hoc test
(e.g., Tukey's HSD) to determine which means are significantly
different from one another
If the interaction is significant, don't worry too much about main effects
Why?
More importantly (or interestingly), if you have a significant
interaction, you may want to test for simple main effects…
46
TESTING FOR SIMPLE MAIN EFFECTS
A significant interaction indicates that the effect of one factor
(e.g., arousal) on the dependent variable (e.g., performance)
depends on the levels of the other factor (e.g., whether the
task is easy or difficult)
To better understand what is happening, we may wish to test
for the significance of mean differences within one column (or
row)
Test the simple main effect of one factor for each level of the other
factor
E.g., Test for significant differences between the levels of task
difficulty at each level of arousal (low, medium, high)
47
TESTING FOR SIMPLE MAIN EFFECTS
Can think of it as dividing the data up into numerous single-
factor ANOVAs (or t-tests, if only two levels of a factor)
Follows same procedure as the one-way (or single factor)
independent measures ANOVA (or t-test)
48
TESTING FOR SIMPLE MAIN EFFECTS
E.g., At each level of arousal (factor B), we
test whether there is a significant difference
between the easy and difficult tasks (levels
of factor A)
H0: μeasy = μdifficult (μA1 = μA2)
H1: μeasy ≠ μdifficult (μA1 ≠ μA2)
F = variance (differences) for the means at this level of factor B
variance (differences) expected by chance
F = MSbetween for the two conditions at this level of factor B
MSwithintreatments from the original ANOVA
49
TESTING FOR SIMPLE MAIN EFFECTS
E.g., For the high level of arousal:
dfbetween treatments = k – 1 = 1
MSbetween treatments = 160 / 1 = 160
MSwithin treatments = 5 (from previous)
F = 160 / 5 = 32.00
Fcrit (1, 24) = 4.26
Thus, at the high level of
arousal, there is a
significant difference in
performance on the
easy and difficult tasks
(we reject H0).
50
TESTING FOR SIMPLE MAIN EFFECTS
E.g., For the low level of arousal:
dfbetween treatments = k – 1 = 1
MSbetween treatments = 10 / 1 = 10
MSwithin treatments = 5 (from previous)
F = 10 / 5 = 2.00
Fcrit (1, 24) = 4.26
Thus, at the low level of
arousal, there is not a
significant difference in
performance on the
easy and difficult tasks
(we fail to reject H0).
51
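The two simple-main-effect tests above can be sketched together:

```python
# Simple main effect of task difficulty at each arousal level,
# using the values from the worked example
ms_within = 5.0    # MS within treatments from the original two-factor ANOVA
f_crit = 4.26      # F.05 (1, 24): same denominator df as the original ANOVA

# MS between the two difficulty conditions at each arousal level (from the slides)
ms_between = {"low": 10.0, "high": 160.0}

f_values = {level: ms / ms_within for level, ms in ms_between.items()}
for level, f_val in f_values.items():
    verdict = "significant" if f_val > f_crit else "not significant"
    print(f"{level} arousal: F = {f_val:.2f} -> {verdict}")
```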
HIGHER -ORDER FACTORIAL DESIGNS
What about cases where we have a study design involving
three (or more) factors?
Same logic, just extended: now we have a "factor C", and
will need to test for a main effect of factor C and also whether
it interacts with factor A or B (i.e., A x C and B x C
interactions), as well as a potential three-way interaction: A x
B x C
Three-way interactions are more challenging to interpret, but
can be interesting and valuable
However, interactions involving four or more variables are often more
confusing than they are helpful!
52
EXAMPLE: THREE-WAY INTERACTION
Perhaps the effects of arousal level and task difficulty differ
for males and females
If we add gender to the mix, we now have a three-factor design (2 x 3 x
2)
[Figure: two panels of performance by arousal level (low, medium, high) for easy vs. difficult tasks, plotted separately for female and male participants]
53
Learning check!
MindTap, Tutorial
Midterm!
Make a schedule – don’t cram it all in the few days before it is
posted.
Midterm info posted online – Syllabus > Assignments > Term Tests
54
TO-DO
BONUS CONTENT:
ANOTHER EXAMPLE OF
A TWO FACTOR DESIGN
55
EXAMPLE: SELF-ESTEEM & PRESENCE OF AN
AUDIENCE
Three questions:
– Does the level of self esteem (low or high) affect performance? (main effect)
– Does the presence or absence of the audience affect performance? (main effect)
– Does the effect of one factor (e.g., the audience) depend on the levels of the
other factor (e.g., self-esteem)? (interaction effect)
Three separate hypotheses and three separate F-ratios
56
HYPOTHESES FOR MAIN EFFECTS
Factor A (Self-esteem):
Null: Self-esteem has no effect on performance
H0: μA1 = μA2
Alternative: Self-esteem does have an effect on performance
H1: μA1 ≠ μA2
Factor B (Audience):
Null: The absence or presence of an audience has no effect on
performance
H0: μB1 = μB2
Alternative: The absence or presence of an audience does have an
effect on performance
H1: μB1 ≠ μB2
57
HYPOTHESES FOR THE
INTERACTION
Null hypothesis:
H0: There is no interaction between factors A and B. All the mean
differences between treatment conditions are explained by the
main effects of the factors.
Alternative hypothesis:
H1: There is an interaction between factors A and B. The mean
differences between treatment conditions are not what would be
predicted from the overall main effects of the two factors.
In symbols, at level B1: μA1 = μA2, at level B2: μA1 = μA2; At level A1 = ….
58
THE THREE F-RATIOS IN A
TWO-FACTOR ANOVA
FA = variance (differences) between the means for factor A
     / variance (differences) expected if there is no treatment effect
FB = variance (differences) between the means for factor B
     / variance (differences) expected if there is no treatment effect
FAxB = variance (mean differences) not explained by main effects
     / variance (differences) expected if there is no treatment effect
59
Just main effects
(no interaction)
Main effects +
Interaction
60
EXAMPLE: MAIN EFFECT OF FACTOR A
(NO MAIN EFFECT OF FACTOR B, NO A X B INTERACTION)
[Figure: line graph over levels B1 and B2, with separate lines for A1 and A2]
61
EXAMPLE: MAIN EFFECTS FOR BOTH FACTORS
(BUT NO A X B INTERACTION)
[Figure: line graph over levels B1 and B2, with separate lines for A1 and A2]
62
EXAMPLE: A X B INTERACTION
(BUT NO MAIN EFFECTS)
[Figure: line graph over levels B1 and B2, with separate lines for A1 and A2; the lines cross]
63