27 Questions TD0409-01 课件/Psy 202_10_Replication

27 Questions

TD0409-01 课件/Psy 202_10_Replication and Open Science_W22-1.pdf

In s t ruc to r :
Dr. M o l l y M et z

PSY 202H1:
STATISTICS II

MODULE 10:
STATISTICAL ISSUES AND THE

REPLICATION CRISIS

1. Best Practices in Research Psychology
1. Some scandal
2. From fraud to QRPs
3. Open science
4. Looking forward

GAME PLAN

BEST PRACTICES IN
RESEARCH PSYCHOLOGY

OUTLINE

 B e s t P r a c t i ces
 Examples of fraud/data manipulation
 Questionable Research Practices (QRPs)
 Doing research ethically and responsibly

 Re p ro d uci bilit y a nd Re p l ica tio n E f fo r t s
 Motivation
 Histor y of Attempts

 T h e Fu t u r e: B e i ng a g o o d c o ns u m e r o f p s yc ho lo gic al s c i e nc e

INCENTIVE STRUCTURE

 P u b l ish ed wo rk i s i m p o r t ant g et t i ng a j o b , g et t ing te nu r e , b e i ng
awa r ded g r a nt s , a nd b e i ng v i ewed f avo r a bly i n o u r fi e l d.

 A s a r e s u l t , a “ r a t r a c e ” c u l t u r e d evelo ps a nd p e o p l e t r y to
p u b lis h a s m u c h a s t h ey c a n.

 B a l a nc ing t h e d e s i re to s t ay t r u t h ful to p s yc ho lo gic al s c i e nc e
w i t h t h e ne c e s s it y to p u b l ish.

 T h i s r e s u lt s i n r e s e arc her s t a k i ng s h o r t c u ts a nd s o m et i mes
wo r s e …

RECENT CASES OF RESEARCH MISCONDUCT

Karen Ruggiero (late 90s, early 00’s)
Marc Hauser (2007-2011)
Diederick Stapel (2011)
Dirk Smeesters (2011-2012)
Larr y Sanna (2012)
Jens Förster (2014-2015)
Michael LaCour (2015)

… I think it is important to emphasize that I never
informed my colleagues of my inappropriate
behavior. I offer my colleagues, my PhD students, and
the complete academic community my sincere
apologies. I am aware of the suffering and sorrow
that I caused to them.

I did not withstand the pressure to score, to publish,
the pressure to get better in time. I wanted too much,
too fast. In a system where there are few checks and
balances, where people work alone, I took the wrong
turn. I want to emphasize that the mistakes that I
made were not born out of selfish ends.

-Brabants Dagblad. 31 October 2011.
-Translated from Dutch

h t t p : / / www. ny t imes . c o m/ 201 3/ 04 /2 8/ magaz ine/ dieder ik – st apels –
a u d ac io us – ac ademic – f raud . htm l?pagewanted= all

SCIENTIFIC FRAUD

http://www.nytimes.com/2013/04/28/magazine/diederik-stapels-audacious-academic-fraud.html?pagewanted=all

NOT JUST PSYCHOLOGY. . .

 D r u g s t u d ies : 2 0 – 2 5 % r e p l icate ( P r i nz e )
 C a nc e r t r e a t m ent: 1 1 % r e p l ic ate ( B e g ley)

HOWEVER, OTHER PRACTICES DON’T
CONSTITUTE FRAUD

Questionable Research Practices

Decisions in design, analysis, and reporting
that increase the likelihood of achieving a
positive result
And a positive response from editors and reviewers

FALSE POSITIVE PSYCHOLOGY

 How do decisions in analyses af fect the final results?

 Having small samples, collecting additional dependent
variables, peeking at data, dropping an experimental
condition

 If enough possibilities are enter tained, the likelihood
of achieving a significant result could be over 80%!

Simmons et al., 2011
13

Did you get the effect you predicted?

Did you get ANY effect?

Publish

HARK!

HARKing: Hypothesizing After Results are Known

Figure by S. Vazire14

Did you get the effect you predicted?

Did you get ANY effect?

Publish
Can you dig around and find one?

N
o

HARK!

p-hack!

Figure by S. Vazire

“p-hacking” = fishing around in your data for statistically significant results
Often involves redefining variables or running unplanned analyses

EXAMPLE:
IS THE U.S. ECONOMY

AFFECTED BY WHETHER
DEMOCRATS OR

REPUBLICANS ARE IN
OFFICE?

http://fivethirtyeight.com/features/science-isnt-broken/#part2
16

http://fivethirtyeight.com/features/science-isnt-broken/#part2

NOT SO SIMPLE…

 Do you look at the number of Republicans or
Democrats?

 Which politicians do you look at?

 How do you measure the U.S. economy?

 Should you look at it in general or excluding
economic recessions?

QUESTIONABLE RESEARCH PRACTICES

 John, Loewenstein, & Prelec (2012) sur veyed 2,155
academic psychologists about the frequency of 10
dif ferent QRPs…..

 Not reporting all measures, rounding off p-values, only
including data that “worked out”

 Up to 63.4% admission and high levels of each being
“defensible”

WHAT SHOULD RESEARCHERS DO?

 Increase disclosure in methods, results, and
hypothesis presentation

 Pre-register hypotheses and studies
 Data collection rules, analytic strategies

 Share data

 Be a responsible scientist regardless of outcome

CENTER FOR OPEN SCIENCE

 Open Science Framework

 Founded to increase to openness, integrity, and
reproducibility of scientific research
 Brian Nosek and Jeff Spies

 Open source sof tware platform for pre-registering
hypotheses, archiving study materials, depositing
data and syntax

 Initiated the Reproducibility Project
24

CENTER FOR OPEN
SCIENCE

Vi d e o :
h t t ps : / / w w w. yo ut u b e . c o m / wa t c h ? v= D Ix m LVr AQ i w

https://www.youtube.com/watch?v=DIxmLVrAQiw

PRODUCING RELIABLE FINDINGS

Reproducibility: A study can be duplicated in
method and/or analysis

Replicability: A study about a phenomenon
produces similar results from a previous study
of the same phenomenon.
Close/Exact Replications
Conceptual Replications

ARE PSYCHOLOGY
FINDINGS REPRODUCIBLE

AND REPLICABLE?

MANY LABS 1 .0

 Star ted running studies that could be done relatively
easily.

 Ef fects varied from those that have been known to
replicate (classic studies) and those that were
unknown.

MANY LABS 1 .0

MANY LABS 2.0/3.0 AND OTHER CHANGES

 Many Labs 2.0: Replication across sample and
setting

 Many Labs 3.0: Subject pool quality across the
academic semester

 Editorial policies of some journals changed
Report effect sizes, power, confidence intervals

 Special issues on replication

 Increase in meta-analyses
30

 Di s s e mi n at i o n o f Re pl i c at i o n At te mpt s
 Journal of Null Results

 P s yc h fi l e drawe r. o r g : Arc h i ve s at te mpte d re pl i c at i o n s o f s pe c i fi c
s t udi e s an d w h et h e r re pl i c at i o n was ac h i eve d

 Ce n te r fo r O pe n Sc i e n c e : P s yc h o l o g i s t B ri an N o s e k , a c h ampi o n
o f re pl i c at i o n i n p s yc h o l o g y, h as c re ate d t h e O pe n Sc i e n c e
Framewo rk , w h e re re pl i c at i o n s c an be re po r te d .

 As s o c i at i o n o f P s yc h o l o g i c al Sc i e n c e : Has re g i s te re d re pl i c at i o n s
o f s t udi e s , w i t h t h e ove ral l re s ul t s publ i s h e d i n Per spe c t ive s o n
Psyc h o l o g i c a l Sc ien c e .

 P l o s O n e : P ub l i c L i b rar y o f Sc i e n c e —pu bl i s h e s a bro ad ran g e o f
ar t i c l e s , i n c l udi n g f ai l e d re pl i c at i o n s , an d t h e re are o c c as i o n al
s ummari e s o f re pl i c at i o n at te mpt s i n s pe c i fi c are as .

 Th e Re pl i c at i o n In dex : Cre ate d i n 2 014 by U l ri c h Sc h i mmac k , t h e
s o – c al l e d ” R In dex ” i s a s t at i s t i c al to o l fo r e s t i mat i n g t h e
re p l i c abi l i t y o f s t udi e s , o f j o urn al s , an d eve n o f s p e c i fi c
re s e arc h e r s .

 An d mo re ! !

RESPONSES TO REPLICATION CRISIS

SOME CRITICISMS

 Researchers cherr y pick studies because they have
some personal/ intellectual ax to grind

 People who do replications are somehow not
qualified to do science

 Science is naturally self-correcting

 Unknown dif ferences between studies
 Sample-specific reasons for non-replication

UNKNOWN DIFFERENCES

Approval at Time 1: 65%

Approval at Time 2: 32%

REPRODUCIBILIT Y PROJECT (2015)

Large-scale replication

100 studies from 3 different journals
Close/exact replications
Contacted original study authors
Open materials and data
 Reduces likelihood of “unknown differences” effect

How many do you think replicated?

WHY DIDN’T MORE FINDINGS REPLICATE?

Perhaps some difference between studies
Boundary effects

Or perhaps the effect didn’t exist in the first
place?
Some uncertainty in findings
File drawer problem

FILE DRAWER PROBLEM

h t t p s : / / www. yo ut ube. co m / wat c h?v =0R nq1 NpH dmw

JOHN OLIVER KNOWS (NSFW)

https://www.youtube.com/watch?v=0Rnq1NpHdmw

WHAT DOES GOOD
RESEARCH LOOK LIKE?

GOOD RESEARCH

 Good research is open research
 Materials and data are shared

publicly

 Good research features
experimental methods that are
strong and isolate a question
of interest

 Good research is adequately
“powered” research (see
Tutorial 7 for a review)

GOOD RESEARCH

Good research is reproducible

CONSUMING SCIENCE

 Be an informed consumer of science

 Don’t believe ever ything you read!
 If an effect seems unbelievable, it just might be.

 Pay attention to sample size
 How big is the sample?
 Effects are unreliable if sample size is too low, a 2,000 person

study more reliable than a 50 person study.

CONSUMING SCIENCE

 Is the study you are reading the only demonstration of
this ef fect?
 Have people from other labs replicated this?

 Did the authors make their data available?

 Advocate for good research so we can understand
more about humans and why they do the things they
do

 St ar t He re
 A summar y
 http://nobaproject.com/modules/the-re plication-crisis-in-psychology

 A dissent
 http://www.nytimes.com/2015/09/01/opinion/psychology -is-not-in-

crisis.html?_r=0
 O pt i o n al
 A counterpoint to the dissent
 http://www.theatlanti c.com/notes/2015/09/swee ping-psychologys-problems-

under-the-rug/403726/
 A possible solution, and preliminar y findings
 http://as.virginia.edu/news/massive-collaboration-testing-re producibility –

psychology -studies-publishes-findings
 A response to the possible solution
 https://www.sciencenews.org/ar ticle/psychologys-re plication-crisis-sparks-

new -debate
 It’s not just us
 http://www.slate .com/a r ticles/health_and_science/future_ tense/2016/04/bi

omedicine_facing_a_wor se_replication_crisis_than_the_one_plaguing_psychol
ogy.html

O P T I O N A L R E A D I N G S : “ R E P L I C A B I L I T Y C R I S I S ” I N
P SYC H O LO GY

http://nobaproject.com/modules/the-replication-crisis-in-psychology

http://www.nytimes.com/2015/09/01/opinion/psychology-is-not-in-crisis.html?_r=0

http://www.theatlantic.com/notes/2015/09/sweeping-psychologys-problems-under-the-rug/403726/

http://as.virginia.edu/news/massive-collaboration-testing-reproducibility-psychology-studies-publishes-findings

https://www.sciencenews.org/article/psychologys-replication-crisis-sparks-new-debate

http://www.slate.com/articles/health_and_science/future_tense/2016/04/biomedicine_facing_a_worse_replication_crisis_than_the_one_plaguing_psychology.html

REPLICATION CRISIS
OR

CREDIBILIT Y REVOLUTION?

Interviewer: “How much of
what you print is wrong?”
Maddox: “All of it. That’s
what science is about — new
knowledge constantly
arriving to correct the old.”

John Maddox,
editor of Nature for 22 years

 D a t a A na l ys is P ro j e c t
 Due Tuesday April 5, 11:59pm

 C o u r s e E va l s ( s e e a nno u nc e me nt )

 F i na l exa m
 Tuesday Apr 12 9 am to Thursday April 14 11:59pm
 Same basics as Midterm; see Assessment Page for more info

TO DO

Psy 202H1: �Statistics iI��Module 10: �Statistical Issues and the Replication Crisis�
Game Plan
Best Practices in Research Psychology
Outline
Incentive Structure
Recent Cases of Research Misconduct
Slide Number 7
Slide Number 8
Scientific Fraud
Slide Number 10
Not just psychology. . .
However, other practices don’t constitute fraud
False Positive Psychology
Slide Number 14
Slide Number 15
EXAMPLE: �Is the U.s. economy affected by whether democrats or republicans are in office?
Not so simple…
Slide Number 18
Slide Number 19
Slide Number 20
Slide Number 21
Questionable Research Practices
What should researchers do?
Center for Open Science
Center for open science
Producing Reliable Findings
Are psychology findings reproducible and replicable?
Many Labs 1.0
Many Labs 1.0
Many Labs 2.0/3.0 and other changes
Responses to Replication Crisis
Some criticisms
Unknown differences
Reproducibility Project (2015)
Why didn’t more findings replicate?
File Drawer Problem
John Oliver Knows (NSFW)
What does good research look like?
Good Research
Good Research
Consuming Science
Consuming Science
Optional Readings: “Replicability Crisis” In Psychology
Replication Crisis�or �Credibility Revolution?
Slide Number 45
To Do

TD0409-01 课件/Psy 202_7_Regress_W22.pdf

In s t ruc to r :
Dr. M o l l y M et z

PSY 202H1:
STATISTICS II

MODULE 7:
REGRESSION

1. Introduction to Regression
1. Linear Regression vs Correlation
2. Hypothesis Testing with Regression
3. Video

2. Multiple Regression
1. What is it, even?
2. What can we learn?

GAME PLAN

Correlation Review!

LINEAR REGRESSION

STATISTICAL TECHNIQUE USED TO
PREDICT THE UNKNOWN VALUE OF ONE

VARIABLE GIVEN A KNOWN VALUE OF
ANOTHER VARIABLE

REVIEW

Many studies aim to determine if two
variables has a Co-Var ying Relationship with
one another

When the value of one variable reliably changes in
value with another variable
 Positive Covariance = When the two variables change in the

same direction
 E.g. Weight-Height / Study Time-Exam Performance

 Negative Covariance = When the two variables change in
opposite directions
 E.G. Stress-Meditation / Alcohol Intoxication – Coordination

INTRODUCTION TO LINEAR EQUATIONS
AND REGRESSION

 T h e Pe a r so n c o r r e l at io n m e a s ur es a l i ne ar r e l a t io ns hip
b et ween t wo va r i a bles.

 T h e l i ne t h ro u gh t h e d a t a
 Makes the relationship easier to see
 Shows the central tendency of the relationship
 Can be used for prediction

 Re g r es si o n a na l y si s p r e c i sely d e fi nes t h e l i ne .

REVIEW
ASSESSING FOR T HE PRESENCE OF COVARIAT ION

When a CVR exists between two variables, it is possible to accurately predict the
unknown value of one of the variables given a known value of the other variable

Perfect Relationships Allow Perfectly Precise Predictions to a Single Value

E.G. If X and Y were perfectly related (r = +/- 1.00), I could accurately predict Y to a single
value given a single value of X

For example, if X = 3, I would predict that Y = 5 6

REVIEW
ASSESSING FOR THE PRESENCE OF CVRS

When a CVR exists between two variables, it is possible to accurately predict the
unknown value of one of the variables given a known value of the other variable

Imperfect Relationships Allow for Predictions to a Range of Values (not perfectly precise)

E.G. If X and Y were imperfectly related, ǀ r ǀ < 1.00, I could accurately predict Y to a range
of values given a single value of X

For example, if X = 3, I would predict that Y would be between 4 – 6 7

REVIEW
ASSESSING FOR THE PRESENCE OF CVRS

When a CVR exists between two variables, it is possible to accurately predict the
unknown value of one of the variables given a known value of the other variable

The stronger the CVR between two variables, the more precise the predictions are (or, the
more narrow the range of predicted values of one variable)

Variables X & Y: r = +.79
If X = 3, Y is predicted to be between 4 – 6

Variables A & B: r = +.32
If X = 3, Y is predicted to be between 2 – 8

More precise prediction Less precise prediction 8

LINEAR REGRESSION

When a significant correlation has been found
between two variables, it is common for
researchers to want to generate an equation
that would be useful for predicting the value
of one of the variables given the known value
of the other variable.

Linear Regression is the technique to use in
order to accomplish this
Linear Regression utilizes the equation of a straight

line in order to make these predictions

LINEAR REGRESSION

Equation of a Straight Line

y = m(x) + b

m = slope of the line = (Change in Y) / (Change in X) = (Y2 – Y1) / (X2 – X1)

b = Y-intercept = the value of Y when X = 0

r = +1.00

LINEAR REGRESSION

Equation of a Straight Line

y = m(x) + b

m = slope of the line = [(Y2 – Y1) / (X2 – X1)] = [(11 – 9) / (6 – 5)] = (2 / 1) = +2.00

b = Y-intercept = the value of Y when X = 0 = -1.00
Based on subtracting 2 from Y for every 1-value decrease in X
(e.g. When X = 2, Y = 3; When X = 1, Y = 1; When X = 0, Y = -1)

r = +1.00

LINEAR REGRESSION

Equation of a Straight Line

y = m(x) + b
y = +2.00(x) + -1.00

Now, we can predict an unknown value of Y given a value of X
If X = 15, what is the predicted value of Y?

If X = 100, what is the predicted value of Y?

r = +1.00

LINEAR REGRESSION

Equation of a Straight Line

y = m(x) + b
y = +2.00(x) + -1.00

Now, we can predict an unknown value of Y given a value of X
y = +2.00 (15) + -1.00 = 29.00

y = +2.00 (100) + -1.00 = 199.00

r = +1.00

LINEAR REGRESSION

y = +2.00(x) + -1.00
y = +2.00 (15) + -1.00 = 29.00

y = +2.00 (100) + -1.00 = 199.00

Since X & Y were perfectly related, we can make precise, single-value
predictions of Y from a given value of X

r = +1.00

LINEAR REGRESSION

What line should we use to characterize the relationship, and how
do we determine its equation?

What single line do we draw in order to determine its equation?

LINEAR REGRESSION

What line should we use to characterize the relationship, and how do we
determine its equation?

We will want to choose the “Best Fitting Regression Line”
The Line That Has The Smallest Average Degree of Prediction Error

Prediction Error = Y’ – Y
Y’ = The Line’s Predicted Value of Y; Y = Actual Value of Y

Y’

YPrediction Error
Y’ – Y

LINEAR REGRESSION

What line should we use to characterize the relationship, and how do we
determine its equation?

Some Lines Have Smaller or Larger Average Degrees of Prediction Error Than Others

Smaller Average Degree of Prediction Error Larger Average Degree of Prediction Error17

LINEAR REGRESSION

What line should we use to characterize the relationship, and how do we
determine its equation?

Some Lines Have Smaller or Larger Average Degrees of Prediction Error Than Others

Smaller Average Degree of Prediction Error Larger Average Degree of Prediction Error18

LINEAR REGRESSION

What line should we use to characterize the relationship, and how do
we determine its equation?

There will always be one single straight line that “best fits the data”
or,

There will always be one straight line that has a smaller average degree of
Prediction Error than all other possible straight lines

This line is what is known as the “Least Squares Regression Line”, or “Best Fitting
Regression Line”

We will want to determine this line’s equation and use it for prediction

LINEAR REGRESSION

Equation of the “Least Squares Regression Line”

Y’ = by (x) + ay
Conceptually the same as “y = m (x) + b”, but this is the more common regression notation

In order to determine the slope (by ) and y-intercept (ay) of the
“Best Fitting Regression Line”, use the following equations:

Calculate First Calculate Second 20

LINEAR REGRESSION

Equation of the “Least Squares Regression Line”

Y’ = by (x) + ay
Conceptually the same as “y = m (x) + b”, but I will use this to be consistent with the textbook

In order to determine the slope (by ) and y-intercept (ay) of the
“Best Fitting Regression Line”, use the following equations:

Once you have calculated the
Pearson r correlation coefficient by hand,
then calculating this is easy as most of the

work has been completed already
21

 Fu nc t i o nal: D e fi ni ng t h e l i ne o f b e s t fi t t h a t we v i s u ally
e s t i m ated i n o u r s c a t terplot s

 C o nc e p t ual: H ow we d i s c us s o u r r e s u l ts
 Correlation does not specify relationship directionality at all
 Regression can imply it, if not directly test it
 “Predicting X FROM Y”

 St a t i s tic al:
 Simple linear regression and correlation will yield same results
 Beta (β) or b instead of Pearson’s r

 As statistics get more complex, regression gives us more functions
 Adding multiple predictors

LINEAR REGRESSION VS CORRELATION

LINEAR REGRESSION IS FOR THE BIRDS

P r a c t i ce e s t i m at ing s l o p e s a nd i nte rc ept s h e r e !
h t t p s : / / so phi eehill. s hinyap ps. io / eyebal l- regr es sio n/

EYEBALL REGRESSION

https://sophieehill.shinyapps.io/eyeball-regression/

TD0409-01 课件/Psy 202_4_IntrotoFactorial_W22.pdf

In s t ruc to r :
Dr. M o l l y M et z

PSY 202H1:
STATISTICS II

MODULE 4:
INTRO TO FACTORIAL ANOVA

1. Intro to Factorial ANOVA
1. Why factorial designs?
2. Structure of a Factorial ANOVA
3. A conceptual demonstration
4. What we can learn from Factorial ANOVA
5. Effects in graphs and text

2. Calculations – next module!

GAME PLAN

SOME QUICK CLARIFICATION

The Question: If my exact degrees of freedom
isn’t included on the table, which number
should I use?
Answer: Choose the safest (most conservative)

value
E.g., You’re looking up a q value, and your dfwithin

= 36  but the table jumps from 30 to 40
 Use 30 (which will indicate a larger q value  making it a more

conser vative test)

SOME QUICK CLARIFICATION

The Question: How many decimal places should I
round my final answer to? If I’m doing multiple
calculations to get to my final answer, should I
not round until the ver y end?
Answer: You should round your final answers to 2

decimal places (e.g., 4.87246  4.87). For the best
accuracy, hold off on rounding until the last step. But
do not spend time worrying about rounding – I am
much more interested in whether you followed the
correct steps, used the correct formulas, etc. So long
as you’ve shown your work, there is no need to worry
if your final answer is off by a couple of hundredths.

INTRODUCTION TO
FACTORIAL

DESIGN AND ANALYSIS

THE BIG PICTURE

Single
score

1 IV

z score

z test

One sample t-
test

Making comparisons
to population (NO IVs)

Sample
mean

σ known σ unknown

Making comparisons
between levels of IV(s)
or groups

More than 1 IV

2 levels 3+ levels IV

Between
subjects

Within
subjects

Independent
samples t-test

Paired
samples t-test

Between
subjects

Within
subjects

One-Way
Between
ANOVA

One-Way
Repeated
ANOVA

All IVs
Between
subjects

All IVs
Within
subjects

Mix of
within and
between

Between subj
Factorial
ANOVA

Repeated
Measures
Factorial ANOVA

Mixed Model
Factorial
ANOVA 6

WHY FACTORIAL ANOVAS?

• So far, we have discussed designs where there is only one IV and only
one DV

• Complex designs include multiple IVs, multiple DVs, or both
• Multiple IVs  Factorial design & Assessing interactions

• More groups can offer more precision

• Include more experimental, control, or placebo conditions (add
levels of IV)

• Want to understand if your effect is moderated/affected by another
variable (add IVs)

WHEN WOULD YOU WANT TO STUDY
MORE THAN T WO GROUPS?

 REQUIREMENTS
Must have 2 or more IVs
Must have 2 or more levels of each IV
Must have quantitative DV

But why a Factorial ANOVA when t-tests and one-
ways are just so great!?

FACTORIAL ANOVA

 Example

 IV (exercise): mild vs. intense
 IV (age group): young adult vs. elderly
DV: overall fitness

WHEN WOULD YOU WANT TO STUDY
MORE THAN T WO GROUPS?

Why not just run two t-tests?
• Exercise on fitness
• Age on fitness

A ge on fitness test per formance– t-test

young elderly

Exercise on fitness test per formance — t-test

mild intense

HUH??!!??
Exercise is
good right!?

AGE × EXERCISE ON FITNESS INTERACTION!

young elderly
mild intense

MEASURING MORE THAN ONE OUTCOME

1 . M a n i p u l a t i o n c h e c k s ( m e a s u r i n g t h e I V )
To ensure our manipulation worked

2 . M u l t i p l e m e a s u r e s o f t h e s a m e v a r i a b le o r c o n s t r u c t ( s a m e DV )
To assess convergent validity
To create composite scores

3 . M e a s u r e s o f s e v e r a l d i f f e r e n t v a r i a b l e s o r c o n s t r u c t s ( m u l t i p l e DV s )
To assess divergent (discriminant) validity
To assess possible confounds that can’t be experimentally controlled

4 . M u l t i p l e I V S
To assess interactions

FACTORIAL DESIGNS: ADVANTAGES

1. Allow for testing of multiple hypotheses within a
single study

1. Methodologically efficient
2. Statistically “cheaper”

2. Allows for more complex hypotheses and research
questions

3. Better understand the nuances of an ef fect
1. Interactions
2. Moderation

 When describing factorial ANOVAs statistically and
conceptually I’ll focus on 2 x 2 factorial designs
 As we start to calculate things, you’ll understand why!

 Thus the specifics in the remainder of this lecture
apply only to 2 x 2 Factorial ANOVAs, rather than
more complex designs.

DISCLAIMER

FACTORIAL DESIGN

MANIPULATING MULTIPLE FACTORS

 A l l ows us to a n swe r q ue stio ns a bo ut wh eth er th e e f fe ct o f
o n e in de pen dent va ria ble de pe nds o n th e leve l o f a n oth er

 Fa ctorial d e sign : Ea c h leve l o f o n e I V i s c o mbined with e a c h
leve l o f th e oth e r s to pro duc e a ll po ssible c o mbinatio ns o f
leve l s
 Non-manipulated IVs ok

WINE-RATING EXAMPLE

 What determines how highly a wine is rated?

Cf. Plassmann, O’Doherty, Shiv, & Rangel (2008; PNAS)

Quality? Price?

FACTORIAL DESIGN

 A re se a rc h de sig n
inve stiga ting th e
e f fe c t o f two o r
mo re in de pen dent
va ri a bl es (fa c tor s)
o n th e de pe ndent
va ri a bl e

Cheap Price Expensive Price

Low Quality
Low Quality,
Cheap Price

Low Quality,
Expensive Price

High Quality
High Quality,
Cheap Price

High Quality,
Expensive Price

Price

Quality

Factors: Quality Price

High LowLevels: Cheap Expensive

2 levels x 2 levels = 4 conditions

FACTORIAL DESIGN TABLE

 T h e r o w s r e p r e s e n t t h e l e v e l s o f o n e i n d e p e n d e n t v a r i a b l e , t h e
c o l u m n s r e p r e s e n t t h e l e v e l s o f a s e c o n d i n d e p e n d e n t v a r i a b l e , a n d
e a c h c e l l r e p r e s e n t s a c o n d i t i o n .

2×2 DESIGN: FACTOR B
Level 1 Level 2

FACTOR A
Level 1 Condition 1 Condition 2

Level 2 Condition 3 Condition 4

BET WEEN- VS. WITHIN-SUBJECT
FACTORIAL DESIGN

 B e t w e e n – s u b j e c t s f a c to r i a l d e s i g n
 ALL of the factor s are manipulated between subjects
 Each subject par ticipates in just ONE condition

 W i t h i n – s u b j e c t s f a c to r i a l d e s i g n
 ALL of the factor s are manipulated within subjects
 Each subject par ticipates in ALL conditions

 M i xe d d e s i g n f a c to r i a l
 SOME of the factor s are manipulated between subjects, SOME within subjects
 Each subject par ticipates in MORE THAN ONE, but NOT ALL conditions

 Research on video games and aggression has been
mixed. Studies of ten compare how violent and non-
violent video games af fect aggressive behavior, but
you wonder if perhaps opponent type – whether the
game is played against another person or the
computer – also might matter.

 IV1: Game type – violent or non-violent
 IV2: Opponent type – real or computer
 DV: A ggressive behavior

AN EXAMPLE

 Between Subjects:
Each participant participates in one level of each IV

(i.e., in one of the four cells of the design).
All four cells of the design have different

participants.

T YPES OF FACTORIAL DESIGNS: BET WEEN

Violent, level 1 Non-violent, level 2

Against person, level 1 Participants #: 1-10 Participants #: 11-20

Against computer, level 2 Participants #: 21-30 Participants #: 31-40

 Repeated Measures:
Each participant participates in both levels of both

IVs (i.e., in all four cells of the design).
All four cells of the design have the same

participants.

T YPES OF FACTORIAL DESIGNS: WITHIN

Violent, level 1 Non-violent, level 2

Against person, level 1 Participants #: 1-40 Participants #: 1-40

Against computer, level 2 Participants #: 1-40 Participants #: 1-40

 Mixed Model:
Each participant participates in one level of one IV

and in both levels of the other IV (i.e., in two cells of
the design).
Two cells of the design have the same participants,

the other two have another set of participants.

T YPES OF FACTORIAL DESIGNS: MIXED

Violent, level 1 Non-violent, level 2

Against person, level 1 Participants #: 1-20 Participants #: 21-40

Against computer, level 2 Participants #: 1-20 Participants #: 21-40

 Structure
 Factors – new term for independent variable
 Levels – number of variations or categories in IV

 Notation
 A x B x C -> 2 x 2 x 3
 Where number of terms represents number of factors
 And the value of each term represents number of levels in that factor
 So the product of each term represents the total number of conditions

 Example: “We utilized a 2 (Game type: violent, non-violent) x 2
(Opponent type: person, computer) between-subjects factorial
design”
 Or, a 2×2

DESCRIBING A FACTORIAL DESIGN

Learning check!

A CONCEPTUAL
DEMONSTRATION

KINDS OF STATISTICAL EFFECTS

 M a i n e f f e c t
 On average, levels of Factor A dif fer from each other
 On average, levels of Factor B dif fer from each other

 S i m p l e e f f e c t
 At a specific level of Factor A , levels of Factor B dif fer from each other
 At a specific level of Factor B, levels of Factor A dif fer from each other

 I n te r a c t i o n
 The ef fect of Factor A on the DV depends on the level of Factor B
 The dif ference between levels of Factor A is dif ferent for dif ferent levels of Factor B

We’ll come
back to this one
in a bit

2X2 EXAMPLE STUDY

• Wells & Petty (1980)

• Previous work shows that we sometimes infer
our attitudes and feelings by looking at our
behavior

• Suggested that we use our physical behavior
as an attitude cue

Asked par ticipants to “help evaluate headphones”

Nod head up and down
Shake head lef t to right

Listened to per suasive argument

Advocate tuition increase
Advocate tuition decrease

Rated opinion on tuition change

Head nod Head shake

Tuition
increase

1 2

Tuition
decrease

3 4

Factor A

Factor B

F
A
C
T
O
R

FACTOR A

POSSIBLE EFFECTS

When testing for ef fects in factorial designs,
several possible patterns of results:
No ef fects
One main ef fect (speech topic)
One main ef fect (head movement)
Two main ef fects
 Interaction

POSSIBLE EFFECTS

When testing for ef fects in factorial designs,
several possible patterns of results:
No ef fects

Head nod Head shake

Tuition
increase
Tuition
decrease

O
pi

ni
on

o
n

Tu
it

io
n

C
ha

ng
e

POSSIBLE EFFECTS

When testing for ef fects in factorial designs,
several possible patterns of results:
No ef fects
One main ef fect (speech topic)

Head nod Head shake

Tuition
increase
Tuition
decrease

O
pi

ni
on

o
n

Tu
it

io
n

C
ha

ng
e

POSSIBLE EFFECTS

When testing for ef fects in factorial designs,
several possible patterns of results:
No ef fects
One main ef fect (speech topic)
One main ef fect (head movement)

Head nod Head shake

Tuition
increase
Tuition
decrease

O
pi

ni
on

o
n

Tu
it

io
n

C
ha

ng
e

POSSIBLE EFFECTS

When testing for ef fects in factorial designs,
several possible patterns of results:
No ef fects
One main ef fect (speech topic)
One main ef fect (head movement)
Two main ef fects

Head nod Head shake

Tuition
increase
Tuition
decrease

O
pi

ni
on

o
n

Tu
it

io
n

C
ha

ng
e

Head nod Head shake

Tuition
increase
Tuition
decrease

Head nod Head shake

Tuition
increase
Tuition
decrease

O
pi

ni
on

o
n

Tu
it

io
n

C
ha

ng
e

O
pi

ni
on

o
n

Tu
it

io
n

C
ha

ng
e

INTERACTIONS
• Ef fects of one IV on DV depend on presence of second IV
• Two types

• Spreading: effect exists at one level of the IV and is
weaker or nonexistant at different level

• Crossover: no main effects of either IV because effects
are opposite at different levels of other IV

CONCEPTUALLY,
WHAT CAN WE LEARN

FROM
FACTORIAL DESIGNS?

KINDS OF STATISTICAL EFFECTS

 M a i n e f f e c t
 On average, levels of Factor A dif fer from each other
 On average, levels of Factor B dif fer from each other

 S i m p l e e f f e c t
 At a specific level of Factor A , levels of Factor B dif fer from each other
 At a specific level of Factor B, levels of Factor A dif fer from each other

 I n te r a c t i o n
 The ef fect of Factor A on the DV depends on the level of Factor B
 The dif ference between levels of Factor A is dif ferent for dif ferent levels of Factor B

WINE-RATING EXAMPLE

H o w d o q u a l i t y a n d p r i c e s
a f f e c t w i n e r a t i n g s ?

Cheap Price Expensive Price

Low Quality
Low Quality,
Cheap Price

Low Quality,
Expensive Price

High Quality
High Quality,
Cheap Price

High Quality,
Expensive Price

Price

Quality

Factors: Quality Price

High LowLevels: Cheap Expensive

2 levels x 2 levels = 4 conditions

Cheap Price Expensive Price

Low Quality 35 87 61

High Quality 51 87 69

43 87 Marginal Means

MAIN EFFECTS

The effect of one factor on average across all levels of
the other factor(s); difference between marginal means

Main Effect of Price
87 – 43 = 44

Main Effect of
Quality

69 – 61 = 8

Main effect of price: On average,
expensive wines (M=87) were rated 44
points higher than cheap wines (M=43)

Main effect of Quality:
On average, high-

quality wines (M=69)
were rated 8 points

higher than low-quality
wines (M=61)

Cheap Price Expensive Price

Low Quality 35 87

High Quality 51 87

SIMPLE EFFECTS

Simple Effect of Price
on Low Quality Wines

87 – 35 = 52

Simple Effect of Price
on High Quality Wines

87 – 51 = 36

Simple Effect of Quality
on Cheap Wines
51 – 35 = 16

Simple Effect of Quality
on Expensive Wines

87 – 87 = 0

When the wine is cheap,
high-quality wines (M=51)
are rated 16 points higher

than low-quality wines
(M=35)

When the wine is expensive,
high-quality (M=87) wines
are rated the same as low-

quality wines (M=87)

When the wine is high-
quality, expensive wines

(M=87) are rated 36 point
higher than cheap wines

(M=51)

When the wine is low-quality,
expensive wines (M=87) are
rated 52 points higher than

cheap wines (M=35)

Cheap Price Expensive Price

Low Quality 35 87

High Quality 51 87

INTERACTIONS

Simple Effect:
Price on Low Quality Wines

87 – 35 = 52

Simple Effect:
Price on High Quality Wines

87 – 51 = 36
Simple Effect:

Quality on Cheap Wines
51 – 35 = 16

Simple Effect:
Quality on Expensive Wines

87 – 87 = 0

The effect of one factor depends on the levels of the
other factor(s); difference between simple effects

Interaction: The effect of quality is
different for cheap wines vs. expensive

wines
(the effect of quality depends on price)

Interaction: The effect of
price is different for high-

quality wines vs. low-
quality wines (the effect of
price depends on quality)

HOW TO DESCRIBE INTERACTIONS

 M u s t d e s c r i b e a t l e a s t t w o s i m p l e e f f e c t s :

 E x a m p l e 1 : W h e n t h e p r i c e i s c h e a p , h i g h – q u a l i t y w i n e s
( M = 51 ) a r e r a t e d 1 6 p o i n t s h i g h e r t h a n l o w – q u a l i t y w i n e s
( M = 3 5 ) , b u t w h e n t h e p r i c e i s ex p e n s i v e , h i g h – a n d l o w –
q u a l i t y w i n e s a r e r a t e d e q u a l l y h i g h ( M = 87 f o r b o t h ) .

 E x a m p l e 2 : W h e n t h e q u a l i t y i s l o w, c h e a p w i n e s ( M = 3 5 ) a r e
r a t e d 5 2 p o i n t s l o w e r t h a n ex p e n s i v e w i n e s ( M = 87 ) , b u t w h e n
t h e q u a l i t y i s h i g h , c h e a p w i n e s ( M = 51 ) a r e r a t e d o n l y 3 6
p o i n t s l o w e r t h a n ex p e n s i v e w i n e s ( M = 87 ) .

Cheap Price Expensive Price

Low Quality 35 87

High Quality 51 87

Learning check!
44

STATISTICAL EFFECTS IN
GRAPHS

MAIN EFFECTS

Low

High

Quality

Comparing averages of end-points

Main effect of Quality:
On average, high-quality
wines were rated 8 points

higher than low-quality wines

Main effect of price:
On average, expensive

wines were rated 44 points
higher than cheap wines

SIMPLE EFFECTS

Low

High

Quality

Comparing any 2 end-points

Simple Effect of Quality
on Cheap Wines
51 – 35 = 16

Simple Effect of Quality
on Expensive Wines

87 – 87 = 0

When the wine is cheap, high-
quality wines are rated 16 points

higher than low-quality wines

When the wine is expensive,
high-quality wines are rated
the same as low-quality wines

SIMPLE EFFECTS

Low

High

Quality

Comparing any 2 end-points

Simple Effect of Price
on High Quality Wines

87 – 51 = 36

Simple Effect of Price
on Low Quality Wines

87 – 35 = 52

When the wine is high-quality,
expensive wines are rated 36
points higher than cheap wines

When the wine is low-quality,
expensive wines are rated 52
points higher than cheap wines

INTERACTIONS

Low

High

Quality

Are the differences different?

When the wine is cheap, high-
quality wines are rated 8

points higher than low-quality
wines

When the wine is expensive,
high-quality wines are rated
the same as low-quality wines

Interaction: The effect of quality is different
for cheap wines vs. expensive wines

INTERACTIONS

Low

High

Quality

Are the differences different?

When the wine is high-quality,
cheap wines are rated 36 points

lower than expensive wines

When the wine is low-quality,
cheap wines are rated 52 points

lower than expensive wines

Interaction: The effect of price is different
for high-quality wines vs. low-quality wines

INTERACTIONS

Low

High

Quality

Low

High

Quality

Interaction: No interaction:

Are the differences different?

Learning check!

LINE VS. BAR GRAPHS

0
10
20
30
40
50
60
70
80
90

100

Cheap Expensive

R
at

in
g

Price

0
10
20
30
40
50
60
70
80
90

100

Cheap Expensive

R
at

in
g

Price

LINE VS. BAR GRAPHS

0
10
20
30
40
50
60
70
80
90

100

Cheap Expensive

R
at

in
g

Price

0
10
20
30
40
50
60
70
80
90

100

Cheap Expensive

R
at

in
g

Price

MAIN EFFECTS IN BAR GRAPHS

0
10
20
30
40
50
60
70
80
90

100

Cheap Expensive

Low Quality High QualityMain effect of Quality:
On average, high-
quality wines were

rated 8 points higher
than low-quality wines

Main effect of price:
On average, expensive
wines were rated 44
points higher than

cheap wines
54

INTERACTIONS IN BAR GRAPHS

100

Cheap Expensive

Low Quality High Quality

Interaction: The effect of
quality is different for cheap

wines vs. expensive wines

Interaction: The effect of
price is different for high-

quality wines vs. low-quality
wines

GRAPHING THE SAME DATA SEVERAL WAYS

100

Low Quality High Quality

Cheap Expensive

0
20
40
60
80

100

Cheap Expensive

Low Quality High Quality

100

Low Quality High Quality

Cheap Expensive

0
20
40
60
80

100

Cheap Expensive

Low Quality High Quality

HIGHER ORDER INTERACTIONS

This figure from “Retrieval Practice
Protects Against Stress” shows a
2x2x2 design
1. Test 1 (immediate) vs. Test 2

(delayed)
2. Study practice (SP) vs. Retrieval

practice (RP)
3. Stressed (white) vs. Non-

stressed (grey)

A 3-way interaction is when the
effect of one factor depends on 2
other factors:
• There is an interaction between

Study Method and Stress
Induction for Test 2 but not for
Test 1

• The effect of stress depends on
how you studied, but also on
when the test happened

INTERPRETING TEXT

INTERPRETING TEXT:
RECIPROCIT Y & CONFORMIT Y, STUDY 1

Main effect of
group behavior

Main effect of
partner behavior

Interaction between
group behavior &
partner behavior

INTERPRETING TEXT:
RECIPROCIT Y & CONFORMIT Y, STUDY 1

INTERPRETING TEXT
RECIPROCIT Y & CONFORMIT Y, STUDY 2

Interaction between
reciprocity/conformity &

partner knowledge

Simple effect of
reciprocity/conformity

when partner behavior is
known

Simple effect of
reciprocity/conformity

when partner behavior is
unknown

INTERPRETING TEXT
RECIPROCIT Y & CONFORMIT Y, STUDY 2

Learning check!

 Re a d ing + M i nd t ap

 F i r s t c o nte nt – l o aded t u to r i al ( + fi r s t a s s i g nment )

 M i d ter m exa m w i l l b e i n t h e we e k b e fo r e r e a d ing we e k – t h i s
i s a g o o d t i m e to s t a r t m a k ing a s t u d y p l a n, s p r e a d o u t ove r
t i m e , s o yo u d o n’ t ne e d to c r a m !
 Midterm info is already posted! See Assignments page of syllabus.

TO-DO

Psy 202H1: �Statistics iI��Module 4: �Intro to Factorial ANOVA�
Game Plan
Some Quick Clarification
Some Quick Clarification
Introduction to Factorial �Design and Analysis
Slide Number 6
Why Factorial ANOVAS?
Slide Number 8
Factorial Anova
Slide Number 10
Slide Number 11
Slide Number 12
Age × Exercise on Fitness Interaction!
Measuring More than one outcome
Slide Number 15
DISCLAIMER
Factorial Design
Manipulating multiple factors
Wine-rating example
Factorial design
Factorial design table
Between- vs. within-subject factorial design
AN Example
Types of Factorial Designs: Between
Types of Factorial Designs: Within
Types of Factorial Designs: Mixed
Describing a Factorial Design
A Conceptual Demonstration
Kinds of Statistical effects
Slide Number 30
Slide Number 31
Possible Effects
Possible Effects
Possible Effects
Possible Effects
Possible Effects
Slide Number 37
Conceptually,� What can we learn from �Factorial Designs?
Kinds of Statistical effects
Wine-rating example
Main effects
Simple effects
Interactions
How to describe Interactions
Statistical effects in graphs
Main Effects
Simple Effects
Simple Effects
Interactions
Interactions
Interactions
Line vs. Bar graphs
Line vs. Bar graphs
Main effects in bar graphs
Interactions in bar graphs
Graphing the same data several ways
Higher Order Interactions
Interpreting text
Interpreting text:�reciprocity & conformity, Study 1
Interpreting text:�reciprocity & conformity, Study 1
Interpreting text �reciprocity & conformity, Study 2
Interpreting text �reciprocity & conformity, Study 2
To-Do

TD0409-01 课件/Psy 202_3_RepeatedANOVA_W22.pdf

In s t ruc to r :
Dr. M o l l y M et z

PSY 202H1:
STATISTICS II

MODULE 3:
REPEATED MEASURES ANOVA

1. Repeated Measures ANOVA
1. Sample Problem
2. Effect Size
3. Posthoc Tests

GAME PLAN

 Spre ad o ut yo ur s t udy i n g
 All else being equal, studying twice for 1 .5 hours each is better than

studying once for 3 hrs. Studying three times for 1 hr each is even better.

 Th e b e s t s t udy i n g mat c h e s t h e t y pe o f as s e s s me n t
 The test will have computations? Do practice problems (MindTap,

lectures, book problems, as well as a ton of other stuf f online)
 The test will require you to explain things? Practice explaining things! To

a study buddy, your notebook , or a lamp, it doesn’t matter who is
listening. What matters is the explaining (in your own words).
 The test will be closed book? Well, your studying better include you

closing your book or notes and practicing recalling information!!

 In f ac t , t h i s i s t h e be s t advi c e I c an o f fe r: w h ateve r e l s e yo u do ,
yo u s t udy i n g mus t i n c l ude te s t i n g yo ur s e l f . Cl o s e yo ur n ote s an d
s e e w h at po ps o ut . Th e n , us e t h at to g ui de yo ur s t udy i n g .

 Fo r mo re t i p s , vi s i t Th e L e arn i n g Sc i e n t i s t s
 https://www.learningscientists.org/downloadable-materials/

MY BEST STUDY ADVICE
(FOR ALL CLASSES, NOT JUST THIS ONE)

https://www.learningscientists.org/downloadable-materials/

REPEATED MEASURES
(OR WITHIN-SUBJECT)

ANOVA

THE LOGIC OF O N E – WAY
R E P E AT E D M E A S U R E S ANOVA

F = variance between sample means
variance expected by chance (error/natural
variability)

WTF??!!??
This looks identical
to the one way
between ANOVA

 I nd e p endent – meas ures A N OVA u s e s m u l t i ple p a r t i c ipant
s a m p les to te s t t h e t r e a t m ent s.
 If groups are different, what was responsible?
 Treatment dif ferences?
 Par ticipant group dif ferences?

 Re p e ated- m easur es s o l ve s t h i s p ro b l em by te s t i ng a l l
t r e a t ment s u s i ng o ne s a m p le o f p a r t i c ipant s .
 In an experiment, compare two or more manipulated treatment

conditions using the same participants in all conditions
 In a nonexperimental study, compare a group of participants at two

or more different times
 Before therapy; af ter therapy; 6-month follow -up
 Compare vocabular y at age 3, 4 and 5

REPEATED MEA SURES A NOVA :
WIT HIN-SUB JECT S DESIGN WITH MORE THA N 2

GROUPS

EXAMPLE!

https://www.scientificamerican.com/article/is-double-dipping-a-food-safety-problem-or-just-a-nasty-habit/ 7

https://www.scientificamerican.com/article/is-double-dipping-a-food-safety-problem-or-just-a-nasty-habit/

F = variance between groups
variance within groups

 Two sources of variance:

 Between group variance — how big are differences
between groups
 Within group variance — how much error/natural

variability

THE LOGIC OF O N E – WAY
R E P E AT E D ANOVA

BET WEEN GROUP VARIANCE IN
REPEATED MEASURES

Why do people in dif ferent groups dif fer?

1. Treatment effect = differences caused by our
experimental treatment

Systematic dif ferences

2. Chance = differences due to random factors
including…
 Individual differences
Experimental error (noise)

Non-systematic, random dif ferences

In a within subjects
designs Ps are their
own controls so
individual
differences can’t
play a role

WITHIN GROUP VARIANCE IN
REPEATED MEASURES

Why do people within the same group dif fer?

1. Chance = differences due to random factors
including…
 Individual differences
Experimental error (noise)

Non-systematic, random differences

In a within subjects
designs Ps are their
own controls so
individual
differences can’t
play a role

 Re p e ated- m easur es d e s i gn a l l ows c o nt ro l o f t h e e f fec t s o f
p a r t i c ipant c h a r ac ter is tic s
 Eliminated from the numerator by the research design
 Must be removed from the denominator statistically

 T h e b i g ges t c h a ng e b et ween i nd e pendent – m easur es A N OVA
a nd r e p e ated- meas ur es A N OVA i s t h e a d d it io n o f a p ro c e s s to
m a t h ema tic ally r e m ove t h e i nd i v idual d i f fer enc es va r i a nc e
c o m p o ne nt f ro m t h e d e no m i nato r o f t h e F – r a t i o .

HOW DO WE DEAL WITH INDIVIDUAL
DIFFERENCES?

 If Null is True:

 If Null is False:

F= Between-group (Treatment or Chance- individual differences )
Within-group (Chance -individual differences )

F = Treatment Effect + Chance
(Chance- individual differences)

F = 0 + (Chance- individual differences)
(Chance- individual differences)

≈ 1

> 1

THE R E P E AT E D
M E A S U R E S F -RATIO

 If Null is True:

 If Null is False:

F = Between-group (Treatment or Just Experimental Error)
Within-group (Experimental Error)

F = Treatment Effect + Chance
Experimental Error

F = 0 + Experimental Error
Experimental Error

≈ 1

> 1

THE R E P E AT E D
M E A S U R E S F -RATIO

 F is the ratio between two variance estimates

 Denominator is called “error term”  composed of
individual difference variability and experimental error

T WO STAGES OF THE REPEATED-
MEASURES ANOVA

First stage
 Identical to independent samples ANOVA
Compute SStotal, SSbetween treatments and SSwithin treatments

Second stage
Done to remove the individual differences from the

denominator
Compute SSbetween subjects and subtract it from SSwithin

treatments to find SSerror (also called residual)

STRUCTURE OF THE REPEATED-MEASURES
ANOVA

If Within-Group variance
can be partitioned into
individual differences
and error, then the sum
of between subjects and
error values (i.e., SS, df)
will always equal Within!

REPEATED MEASURES DESIGNS:
PROS & CONS

• Repeated Pros:
 Participants serve as their own “controls” (reduced

error, more power)
 Need fewer participants for the same research

question (compared to between-subjects design)

• Repeated Cons:
 Order effects, practice effects
 May guess hypothesis / aware of what is being

manipulated
 Longer studies
 Limits possibilities for experimental manipulations

EFFECT SIZE FOR THE
REPEATED-MEASURES ANOVA

Percentage of variance explained by the
treatment differences
Partial η2 is percentage of variability that has

not already been explained by other factors

or
subjectsbetween total

eatmentsbetween tr2

SS SS
SS

−
=η

errorSSSS
SS

+
=

eatmentsbetween tr

eatmentsbetween tr2η

REPEATED-MEASURES ANOVA
POST HOC TESTS (POSTTESTS)

Significant F indicates that H0 (“all
populations means are equal”) is wrong in
some way.

Use post hoc test to determine exactly where
significant differences exist among more than
two treatment means
Tukey’s HSD can be used
Substitute SSerror and dferror in the formulas

REPEATED-MEASURES ANOVA
ASSUMPTIONS

 The obser vations within each treatment condition
must be independent.

 The population distribution within each treatment
must be normal.

 The variances of the population distribution for each
treatment should be equivalent.

Learning check!

PRACTICE WITH
REPEATED MEASURES

ANOVA

STRUCTURE OF THE REPEATED-MEASURES
ANOVA

 Re s e arc h Qu e s t io n
 A researcher is tr ying to determine the best way for individuals to

recall a list of words. Eight participants each received three lists of
words and tried to remember them using three different ways of
memorizing (rote rehearsal, an imager y mnemonic technique, or a
stor y mnemonic technique). After each study period, participants did
a ten minute distractor task then took a test on the word list. Was
there a difference in recall based on the type of memor y technique
that participants used?

 IV: Memor y technique
 3 levels: rote vs imager y vs stor y

 DV: Number of words recalled

R E P E AT E D M E A S U R E S A N OVA – L E T ’ S P R AC T I C E !

Participant Rote Imagery Story
A 2 4 5

B 3 2 3

C 3 5 6

D 3 7 6

E 2 5 8

F 5 4 7

G 6 8 10

H 4 5 9

DATA FROM MEMORY STUDY

M1 = 3.5 M2 = 5 M3 = 6.75

Are these 3
means

significantly
different

from each
other?

HYPOTHESIS TESTING WITH RM ANOVA

 Re s e arc h qu e s t i o n
 Does memory technique affect word recall?

 Ste p 1 : St a t is ti cal H y p ot h eses

 H0: µ1 = µ2 = µ3
 H1: At least one mean is different from another

 Ste p 2 : D e c i sio n Ru l e
 Look up critical value of F in Table

 Ste p 3 : C o m p u te o b s e r ved F – r a t io
 Track values in ANOVA Summary Table

 Ste p 4 : M a ke a D e c i s io n ( Re j ec t o r r et a i n H 0)

 * * Ste p 5 : I f H 0 r e j e cted , c o nd u c t p o s t – h o c c o m p a r iso ns

 Ste p 6 : C o m p u te E f fe ct S i z e , I nte rp r et a nd Re p o r t F i nd ings
25

COMPUTING ANOVA

The ANOVA Summar y Table

Source SS df MS F

Between group SSB dfB MSB MSB
MSEWithin group SSW dfW

Between
Subjects

SSP dfP

Error SSE dfE MSE

Total SST dfT 26

FINDING THE CRITICAL VALUE

Find Fcritical in Table

 Need to know 3 things

 α level
 dfnumerator = dfbetween
 dfdenominator = dferror

 If α = .05 and df = 2, 14, Fcritical = 3.74

CRITICAL VALUES OF F FOR DF=2,14

Critical region;
Reject H0

3.74 6.51

COMPUTING ANOVA

 STEP 1: Compute Sums of Squares (SS)

SSTotal =
2

2 GX
N

−∑

SSBetween =
22 GT

Nn
−∑

SSWithin = Σ(SS for each group) or SSTotal − SSBetween

Where:

• X = each value of X

• T = treatment group total (ΣX)

• G = grand total (ΣT)

• n = sample size of each group

• N = total sample size (Σn)

Participant Rote Imagery Story
A 2 4 5

B 3 2 3

C 3 5 6

D 3 7 6

E 2 5 8

F 5 4 7

G 6 8 10

H 4 5 9

DATA FROM MEMORY STUDY

M1 = 3.5 M2 = 5 M3 = 6.75

n 8 8 8 N = 24
Totals T1=28 T2=40 T3=54 G = 122

N = 24
n = 8
K = 3

COMPUTING ANOVA

 STEP 2: Compute Degrees of Freedom (df)

dfBetween = k – 1

Where:

• n = sample size of each group

• N = total sample size (Σn)

• k = number of groups

dfWithin = N – k or Σ(n-1)

dfTotal = N – 1

COMPUTING ANOVA

 STEP 3 (NEW): Compute Between Subject Values

Where:

• n = sample size of each group

• N = total sample size (Σn)

• G = grand total (ΣT)

• P = person totals (Σx for each
participant)

• k = number of groups

SSbetweensubjects =
22 GP

Nk
−∑

Participant Rote Imagery Story P
A 2 4 5

B 3 2 3

C 3 5 6

D 3 7 6

E 2 5 8

F 5 4 7

G 6 8 10

H 4 5 9

DATA FROM MEMORY STUDY

M1 = 3.5 M2 = 5 M3 = 6.75

2 + 4 + 5 = 9

3 + 2 + 3 = 8

16
15

COMPUTING ANOVA

 STEP 3 (NEW): Compute Between Subject Values

Where:

• n = sample size of each group

• N = total sample size (Σn)

• G = grand total (ΣT)

• P = person totals (Σx for each
participant)

• k = number of groups

SSbetweensubjects =
22 GP

Nk
−∑

SSerror = SSWithin – SSbetweensubjects

dfbetweensubjects = n – 1

dferror = dfwithin – dfbetweensubjects OR (N-k)-(n-1)

COMPUTING ANOVA

 STEP 4 (UPDATE): Compute Mean Squares (MS)

MSBetween = between
between

df
SS

MSerror =
error

error

df
SS

COMPUTING ANOVA

 STEP 4 (UPDATE): Compute the F -Ratio

F-Ratio =
error

between

MS
MS

THE ANOVA SUMMARY TABLE

Source SS df MS F

Between group SSB dfB MSB MSB
MSEWithin group SSW dfW

Between
Subjects

SSP dfP

Error SSE dfE MSE

Total SST dfT

THE ANOVA SUMMARY TABLE

Source SS df MS F

Between group 42.33 2 21.17 8.64

Within group 73.5 21

Between
Subjects

52.5 7

Error 34.33 14 2.45

Total 115.83 23

EFFECT SIZE FOR REPEATED MEASURES

total

eatmentsbetween tr2

SS
SS

=η

jectsbetweensubtotal

eatmentsbetween tr2

SS SS
SS

−
=η

For independent measures ANOVA

For repeated measures ANOVA

EFFECT SIZE FOR REPEATED MEASURES

jectsbetweensubtotal

eatmentsbetween tr2

SS SS
SS

−
=η For repeated measures ANOVA

η2 = 4 2 . 3 3 = . 5 5 2
1 1 5 . 8 3 – 3 9 .17

TUKEY HSD TEST

 Step 1: Find the value of “q”
 Need to know 3 things:
 α
 dfE
 k

 Step 2: Compute HSD

HSD =
n

MSerrorq
Where n = group sample
size, assuming equal n

in each group

TUKEY HSD TEST

 Step 1: Find the value of “q”
 α = .05 dfE = 14 k = 3
 From Table B.5: q = 3.70

 Step 2: Compute HSD

HSD = = ± 2.05 words
8
45.2

70.3
n

MSerror =q

So, a pair of means must
differ by at least 2.05 in
order to be significantly

different

TUKEY HSD TEST

 Step 3: Compute dif ference between each
pair of means and compare to HSD

 M1 – M2 = 3.5 – 5 = 1.5

 M1 – M3 = 3.5 – 6.75 = 3.25

 M2 – M3 = 5 – 6.75 = 1.75

Does NOT
exceed 2.05

Exceeds 2.05

Does NOT
exceed 2.05

TUKEY HSD TEST

 What do we conclude?

 M1 does not differ from M2
 There was no difference in word recall when

participants used the rote rehearsal or imagery
techniques

 M2 does not differ from M3
 There was no difference in word recall when

participants used the story technique or imagery
technique

 Only M3 differs from M1
 People remembered significantly more words when

using the story technique than the rote technique.

T h e r e wa s a s i g ni fic ant e f fec t o f m e m o r y te c h niqu e o n wo r d

r e c a l l, F ( 2 , 14 ) = 8 . 6 4 , p < . 0 5 , η2 = . 5 5 . Tu key p o s t – h o c

c o m p a r iso ns i nd i c a ted t h a t p a r t i ci pant s r e m e m bered

s i g ni ficant ly m o r e wo r d s w h e n s t u d y ing w i t h t h e s to r y te c h niqu e

( M = 6 . 7 5, S D = 2 . 3 ) t h a n w h e n t h ey s t u d ied w i t h rote r e h e ar sal

( M = 3 . 5 , S D = 1 . 4 ) , p < . 0 5 . N e i t her te c h nique l e d to d i f fer ent

r e s u lt s f ro m t h e i m a ger y m ne m o ni c ( M = 5 , S D = 1 . 9 ) .

FORMAL REPORT

SD for each group

SD =
1

SS
−n 45

REPORTING A REPEATED MEASURES
F -STATISTIC

 A closer look…

F(2, 14) = 8.64, p < .05,
Test

statistic

Observed
value

alpha
level

Degrees of
freedom
(B, Error)

Significance
Sig? p < α

Nonsig? p > α

55.2 =η

Effect
size

Learning check!

 Tu to r ial 1 now ava i lable!

 M i nd t ap C H 1 2 – d u e J a n 3 0

TO-DO

Psy 202H1: �Statistics iI��Module 3: �Repeated Measures ANOVA��
Game Plan
My Best Study Advice �(for all classes, not just this one)
Repeated Measures (or Within-Subject) ANOVA
THE LOGIC OF ONE-Way �Repeated Measures ANOVA
Repeated Measures ANOVA: �Within-Subjects Design with more than 2 groups
Example!
Slide Number 8
BET WEEN GROUP VARIANCE in repeated mEASURES
WITHIN GROUP VARIANCE in repeated mEASURES
How do we deal with individual differences?
Slide Number 12
THE repeated MeaSURES F-RATIO
THE Repeated mEASURES F-RATIO
Two Stages of the Repeated-Measures ANOVA
Structure of the Repeated-Measures ANOVA
Repeated Measures Designs:�Pros & Cons
Effect size for the �Repeated-Measures ANOVA
Repeated-Measures ANOVA�Post Hoc Tests (Posttests)
Repeated-Measures ANOVA Assumptions
Practice with Repeated Measures ANOVA
Structure of the Repeated-Measures ANOVA
Repeated Measures ANOVA – Let’s Practice!
Data from Memory Study
Hypothesis testing with RM ANOVA
Computing ANOVA
Finding the Critical Value
Critical values of F for df=2,14
Computing ANOVA
Data from Memory Study
Computing ANOVA
Computing ANOVA
Data from Memory Study
Computing ANOVA
Computing ANOVA
Computing ANOVA
The ANOVA Summary Table
The ANOVA Summary Table
Effect Size for Repeated Measures
Effect Size for Repeated Measures
Tukey HSD Test
Tukey HSD Test
Tukey HSD Test
Tukey HSD Test
Formal Report
Reporting a Repeated Measures� F-statistic
To-DO

TD0409-01 课件/Psy 202_1_Review.pdf

In s t ruc to r :
Dr. M o l l y M et z

PSY 202H1:
STATISTICS II

MODULE 1:
FOUNDATIONS REVIEW

1. Foundations Review
1. Very quick!
2. Not for teaching, but for reminding

GAME PLAN

 2 01 i s a T RU E p r e r eq

 U s e yo u r r e s o u rc es, e a r l y a nd o f te n
 Form study groups
 Text + MindTap resources from Ch 1-11
 In “Psy 201 Review” folder at bottom of main page

 Appendix A
 Campus tutoring and study centres (like New College Stats Aid Centre)
 Web resources like
 https://www.learner.org/series/against-all-odds-inside-statistics/
 https://www.khanacademy.org/math/statistics-probability
 http://devpsy.org/links/open_source_textbooks (scroll down for stats)

A NOTE ABOUT THE COURSE PREREQ

https://www.learner.org/series/against-all-odds-inside-statistics/

https://www.khanacademy.org/math/statistics-probability

http://devpsy.org/links/open_source_textbooks

SUPER SPEEDY REVIEW

STATISTICS

 S t a t is ti cs i s t h e s c i enc e o f g a i ning i n s ight f r o m d a t a

 T h e te r m s t at is tics r e fe r s to a s et o f m a t h emat ic al p ro c e d ur es
fo r o r g a ni zing, s u m m ar izing, a nd i nte rp ret ing i nfo r m a t i o n
 “statistics help researchers bring order out of chaos” (p. 5)

STATISTICS

 Tw o g e ner a l p u rpo ses :
 To organize and summarize the information so that the researcher

can see what happened in the research study and can communicate
the results to others

 To answer the questions that initiated the research by determining
exactly when general conclusions are justified based on the specific
results that were obtained

INFERENTIAL STATISTICS

 C o ns i s t o f t h o s e te c h niques t h a t a l l ow u s to s t u d y s a m p les
a nd t h e n m a ke g e ne r alizat io ns a b o u t t h e p o p u l atio ns f ro m
w h i c h t h ey we r e s e l ec ted

INFERENTIAL STATISTICS

Many research situations begin with a
population that forms a normal distribution

A sample is selected from the population, and
receives a “treatment”, & the goal is to evaluate
the treatment

Probability is used to
decide whether the
treated sample is
“noticeably dif ferent”
from the population
 Do we reject the null

hypothesis or not?

 I t i s s t a t i s tic ally I M P OS SIB L E to d e m o ns t rate a p h e no m eno n
i s a b s o l utely t r u e ( a l l a b o u t p ro b a bilit y! )
 Researchers instead falsify
 Supporting evidence may not signal a theor y is always true;

disconfirming evidence does signal that a theor y is not always true

 S o , we s e e k to fi nd ev i denc e t h a t i s i t u nl i kely t h a t o u r
hy p ot hes is i s f a l s e

• P ro c e s s : L o g i c o f t h e N u l l H y p ot h e sis
• We determine what the population (distribution) would look like if the null

hypothesis were true
• Then, we see if our sample data are likely to have come from this distribution
• In other words, we look for the likelihood that our data are

consistent with the idea that there is no effect

PROVE

HYPOTHESIS TESTING

 A hy pothes i s tes t i s a s t a t i s ti cal p ro c e dur e t h a t u s e s d a t a
c o l l e c ted f ro m a s a m p le to eva l uate a p a r t i cu lar hy p ot hesi s
a b o u t a p o p u l at io n
 We make predictions about an unknown population

THE HYPOTHESIS TESTING
PROCEDURE

1. State the hypotheses.
2. Locate the critical region.

(Note: You must find the value for df and use the distribution
table for whichever statistic you are using.)

3. Calculate the test statistic.
4. Make a decision.
 Either “reject” or “fail to reject” the null hypothesis.

5. Report your findings
1. Formal APA statistical tag
2. Plain language description of nature of effect
3. Include effect size

THE HYPOTHESIS TESTING
PROCEDURE

 S te p 1 : S t a te t h e hy pot hes es
 We have two opposing hypotheses about the population
 Null hypothesis (H0):
 Predicts that the independent variable (treatment) has no ef fec t on the

dependent variable for a population

 Alternative hypothesis (H1)
 Predicts that the independent variable (treatment) does have an ef fec t on

the dependent variable

THE HYPOTHESIS TESTING
PROCEDURE

 S te p 2 : L o c a te t h e c r i t ic a l r e g io n
 Must decide which sample means would be consistent with the null

hypothesis (and therefore lead to accepting the null hypothesis), and
which sample means would be at odds with the null hypothesis (and
therefore lead to rejecting the null hypothesis)
 The alpha value, or the level of significance, is the probability value

used to define “ver y unlikely”
 E.g., with α = .05, we separate the most unlikely 5% of the sample means

(extreme values) from the most likely 95% of the sample means (central
values)

THE HYPOTHESIS TESTING
PROCEDURE

 S te p 2 : L o c a te t h e c r i t ic a l r e g io n
 The critical region is defined by the alpha level
 E.g., An alpha of .05 (α = .05) indicates that the size of the critical

region is p = .05 (5% of all possible sample means)

THE HYPOTHESIS TESTING
PROCEDURE

 S te p 3 : C a l c ula te s a m ple s t a t is t ics
 E.g., Compare the sample mean (from your data) with the null

hypothesis (e.g., that the population mean is the same as the
original population)

Obtained difference between our data and hypothesis
___________________________________________________________________________________________________________________________________

Standard difference between our data and hypothesis

Observed difference

How much difference we would expect by chance

THE HYPOTHESIS TESTING
PROCEDURE

 Step 4: Make a decision
 Two possible outcomes:
 You reject the null hypothesis, and conclude that the treatment does

have an effect.
 You fail to demonstrate that the treatment has an effect, so you fail to

reject the null hypothesis.

SUMMA RY: OUTCOMES OF HY POTHESIS TESTING

Decision

True Status of H0

No Effect
H0 True

Effect
H0 False

Reject H0 Type I Error

α
Probability of T1 Error

Correct

1 – β
‘power’

Retain H0
(also called

“fail to reject”)

Correct

1 – α
Level of confidence

Type II Error

β
Probability of T2 Error

REVIEW OF
HYPOTHESIS TESTING
WITH THE T STATISTIC

SINGLE-SAMPLE T STATISTIC

 Do newborn infants prefer to look at attractive
versus unattractive faces?

 Infants were shown two photographs of women’s
faces (one rated by adults as more attractive than
the other)

 Pair of faces remained on the screen until the baby
accumulated a total of 20 seconds of looking

 DV: Number of seconds spent looking at the
attractive face

 N = 9, M = 13 seconds, SS = 72

(Two-tailed test, α = .05)
19

SINGLE-SAMPLE T STATISTIC

1 . State the null and alternative hypotheses
Null hypothesis:
 The infants have no preference for either face
 H0: μattractive = 10 seconds

Alternative hypothesis:
 The infants prefer one face over the other
 H1: μattractive ≠ 10 seconds

SINGLE-SAMPLE T STATISTIC

2 . L o c a te t h e c r i t i c a l r e g i o n :

 df = n – 1 = 9 – 1 = 8
 Two-tailed test at the .05 level of significance
 Critical region consists of t values greater than

+2.306 or less than -2.306

tcrit = +/-2.306

SINGLE-SAMPLE T STATISTIC

3 . C a l c ula te t h e te s t s t a t i st ic i n 3 s tep s:

a. Sample variance

a. Estimated standard error

a. t statistic

s2 = SS
n-1

= SS = 72 (given in the problem) = 9
df 8 (calculated previously)

SINGLE-SAMPLE T STATISTIC

4 . M a ke a d e c isio n r e ga rdi ng H 0:
 T h e o b t a i ned t s t a t i s ti c o f 3 . 0 0 f a l l s w i t h i n t h e c r i t i c al r e g i o n,

s o we r e j e c t H 0 a nd c o nc l u d e t h a t b a b ies d o s h ow a
p r e fer ence w h e n g i ven a c h o i c e b et ween a n a t t r a c ti ve a nd
u na t t r ac t ive f a c e .

5 . Re p o r t:
 “ T h ere wa s a s i g ni fic ant e f fec t o f a t t r a c t ivenes s o n i nf a nt –

l o o k i ng t i m e , t ( 8 ) = 3 . 0 0 , p < . 0 5 , t wo – t a i led. I n ot h e r wo r d s ,
i nf a nt s l o o ke d l o ng e r a t a t t r a c tive f a c e s t h a n ex p ec ted. ”

BETWEEN-SUBJECTS OR
INDEPENDENT-MEASURES DESIGNS

Use a separate group of participants for each
treatment condition (or for each population)

We use subscripts to denote
which population or sample
we are referring to:

e.g., μ1, μ2

SI NGLE-SAMPLE VERSUS I NDEPENDENT
SAMPLES T FORMULAS

Single sample:

Independent samples:

• According to the null hypothesis, the population mean
difference is 0 (μ1 – μ2 = 0)

INDEPENDENT-MEASURES T STATISTIC

n = 10
M = 93
SS = 200

n = 10
M = 85
SS = 160

Do students who regularly watched Sesame Street when they
were growing up have better grades than students who did not
watch Sesame Street?

INDEPENDENT-MEASURES T STATISTIC

1 . S t a te t h e n u l l a n d a l ter na t ive hy pot hes es
 N u l l hy p ot h esis :
 There is no difference between the high school grades for students

who watched Sesame Street and those who did not
 H0: μ1 – μ2 = 0

 A l ter nati ve hy p ot hes is:
 There is a difference between the high school grades for students

who watched Sesame Street and those who did not
 H1: μ1 – μ2 ≠ 0

INDEPENDENT-MEASURES T STATISTIC

2. Locate the critical region:
 df = (n 1 – 1) + (n 2 – 1) = df 1 + df 2 = 9 + 9 = 18
 Two-tailed test with α = .05, t crit = +/-2.101

INDEPENDENT-MEASURES T STATISTIC

3 . C a l c ula te t h e t s t a t ist ic i n 3 s te ps:
a. Pooled variance

a. Estimated standard error

a. t statistic

sp2 = SS1 + SS2 = 200 + 160 = 360 = 20
df1 + df2 9 + 9 18

INDEPENDENT-MEASURES T STATISTIC

4 . M a ke a d e c isio n r e ga rdi ng H 0:
 T h e o b t a i ned t s t a t i s ti c o f 4 . 0 0 f a l l s w i t h i n t h e c r i t i c al r e g i o n,

s o we r e j e c t H 0 a nd c o nc l u d e t h a t t h e r e i s a s i g ni fic ant
d i f ferenc e b et ween t h e h i g h s c h o o l g r a des o f t h o s e s t u d e nt s
w h o wa t c h ed S e s am e St r e et a nd t h o s e w h o d i d not .

5 . Re p o r t
 “ T h ere wa s a s i g ni fic ant e f fec t o f p ro g r a m c o nd i t i o n o n

a c a d emic a c h i evement , t ( 1 8 ) = 4 . 0 0 , p < . 0 5 , t wo – t a i led. I n
ot h e r wo r d s , t h e s t u d ent s w h o wa t c h ed S e s a me St r e et h a d
h i g her g r a d es t h a n t h o s e w h o d i d not wa t c h t h e p ro g r am”

REPEATED-MEASURES T STATISTIC

 T h e d a t a we u s e i n a r e p e ated m e a s ur es t te s t a r e d i f fer enc e
s c o r es :

 N u m er ato r o f t s t a t i s t ic m e a s ur es a c t u al d i f fer enc e b et ween
t h e d a t a M D a nd t h e hy p ot hes is μD

 D e no m i na to r m e a s ur es t h e s t a nd ar d d i f fer ence t h a t i s
ex p ec ted i f H 0 i s t r u e

 S a m e p ro c e s s a s ot h e r te s t s

Difference score = D = X2 – X1

REPEATED-MEASURES T STATISTIC

 D o e s t h e c o l o u r r e d i nc r e a s e m e n’ s a t t r a c ti o n to
wo m e n? Re s e a rc her s p r e p ared a s et o f 3 0 wo m e n’ s
p h oto g r aphs , 1 5 m o u nte d o n a r e d b a c k gro und a nd 1 5
m o u nte d o n a w h i te b a c k gro und

 One p i c t u re i s t h e “ te s t p h oto g r aph” a nd i t a p p ear s
t w i c e , o nc e m o u nte d o n r e d a nd o nc e o n w h i te.

 A s a m p le o f n = 9 m e n r a te e a c h o f p h oto g r aphs o n a
1 2 – p o i nt s c a l e . I s t h e te s t p h oto g raph j u d ged
s i g ni ficant ly m o r e a t t r a c t ive w h e n p r e s e nted o n a r e d
b a c k gro und ?

REPEATED-MEASURES T STATISTIC

1 . S t a te t h e n u l l a n d a l ter na t ive hy pot hes es
 N u l l hy p ot h esis :
 There is no difference in the attractiveness ratings between the red-

mounted versus white-mounted photo
 H0: μD= 0

 A l ter nati ve hy p ot hes is:
 There is a difference in the attractiveness ratings between the red-

mounted and white-mounted photo
 H1: μD ≠ 0

REPEATED-MEASURES T STATISTIC

2. Locate the critical region:
df = n – 1 = 9 – 1 = 8
Two-tailed test with α = .01 , tcrit = +/-3.355

REPEATED-MEASURES T STATISTIC

3 . C a l c ula te t h e t s t a t ist ic i n 3 s te ps:
a. Sample variance

a. Estimated standard error

a. t statistic

s2 = SS
n-1

= SS = 18 = 2.25
df 8

50.
9
25.22

===
n
s

s
DM

REPEATED-MEASURES T STATISTIC

4 . M a ke a d e c isio n r e ga rdi ng H 0:
 T h e o b t a i ned t s t a t i s ti c o f 6 . 0 0 f a l l s w i t h i n t h e c r i t i c al r e g i o n,

s o we r e j e c t H 0 a nd c o nc l u d e t h a t t h e b a c k gro und c o l o u r h a s a
s i g ni ficant e f fe ct o n t h e j u d ged a t t r a c t iveness o f t h e wo m a n
i n t h e te s t p h oto g r aph

5 . Re p o r t:
 “ C h a nging t h e b a c k gro u nd c o l o u r f ro m w h i te to r e d

s i g ni ficant ly i nc r e a sed t h e a t t r a c t ivenes s r a t i ng o f t h e wo m a n
i n t h e p h oto g r aph, t ( 8 ) = 6 . 0 0 , p < . 01 , t wo – t a i led. I n ot h e r
wo r d s , wo m e n o n a r e d b a c k gro und we r e p e rc eived a s m o r e
a t t r a c ti ve t h a n wo m e n o n a w h i te b a c k gro u nd. ”

R E V I E W :
H Y P O T H E S E S F O R D I F F E R E N T T Y P E O F T T E S T S

 S i ng l e s a m p le t te s t
 Comparing an unknown population mean (for our treatment

condition) to a known population mean (the μ for the original
population given in the problem)
 Is our unknown population mean (after treatment) the same as the

mean in the original population? Or is there a difference?
 H0: μtreatment = 10 seconds
 H1: μtreatment ≠ 10 seconds

R E V I E W :
H Y P O T H E S E S F O R D I F F E R E N T T Y P E O F T T E S T S

 I nd e p endent m e a s ures t te s t
 Comparing two unknown population means (for each of our

treatment conditions)
 Is the population mean for the first treatment condition the same as

the population mean for the second treatment condition? Or is there
a difference?
 H0: μ1 – μ2 = 0 (or μ1 = μ2)
 H1: μ1 – μ2 ≠ 0 (or μ1 ≠ μ2)

R E V I E W :
H Y P O T H E S E S F O R D I F F E R E N T T Y P E O F T T E S T S

 Re p e ated m e a s ur es t te s t
 Remember that here we are interested in difference scores

(treatment 2 score – treatment 1 score)
 Is the mean difference for the population equal to zero (no change

between score 1 and score 2)? Or is there a difference?
 H0: μD = 0
 H1: μD ≠ 0

Psy 202H1: �Statistics iI��Module 1: �Foundations Review��
Game Plan
A note About the Course Prereq
Super Speedy Review
Statistics
Statistics
Inferential Statistics
Inferential Statistics
PROVE
Hypothesis Testing
The Hypothesis Testing Procedure
The Hypothesis Testing Procedure
The Hypothesis Testing Procedure
The Hypothesis Testing Procedure
The Hypothesis Testing Procedure
The Hypothesis Testing Procedure
Summary: Outcomes of Hypothesis Testing
Review of Hypothesis Testing with the t statistic
Single-Sample t Statistic
Single-Sample t Statistic
Single-Sample t Statistic
Single-Sample t Statistic
Single-Sample t Statistic
Between-Subjects or �Independent-Measures Designs
Single-Sample versus Independent Samples t formulas
Independent-Measures t Statistic
Independent-Measures t Statistic
Independent-Measures t Statistic
Independent-Measures t Statistic
Independent-Measures t Statistic
Repeated-Measures t Statistic
Repeated-Measures t Statistic
Repeated-Measures t Statistic
Repeated-Measures t Statistic
Repeated-Measures t Statistic
Repeated-Measures t Statistic
Repeated-Measures t Statistic
Review: �Hypotheses for different type of t tests
Review: �Hypotheses for different type of t tests
Review: �Hypotheses for different type of t tests

TD0409-01 课件/Psy 202_2_OneWayANOVA_W22.pdf

In s t ruc to r :
Dr. M o l l y M et z

PSY 202H1:
STATISTICS II

MODULE 2:
ONE WAY ANOVA

1. Intro to ANOVA
1. Designs with More Than Two Groups
2. ANOVA Basics
3. Example

GAME PLAN

INTRO TO DESIGNS WITH
MORE THAN 2 GROUPS

 Re a s o ns ( i n s e r v i c e o f p r e c is io n):
 Allows researchers to compare multiple treatments
 …with no treatment (control group) or placebo as well
 Allows researchers to compare effects of multiple independent

variables simultaneously
 Factorial designs: more in Psy 202!

 U p fi r s t: m o r e t h a n t wo l evels o f o ne I V
 IV (mood): Positive vs Negative
 IV (mood): Happy, Sad, Angr y

WHEN WOULD YOU WANT TO STUDY
MORE THAN T WO GROUPS?

EFFECT OF ANTI-DEPRESSANT DOSAGE
ON MENTAL HEALTH

1 MG 100 MG
0

1 MG 50 MG 100 MG 150 MG

(errors of interpolation) (errors of extrapolation) 5

EFFECT OF CAFFEINE
ON TEST PERFORMANCE

10 MG 100 MG
0

10 MG 50 MG 100 MG

“curvilinear effects”

 Ad vant ages
 Advance theor y with precision (boundar y conditions?)
 Insight into non-linear effects
 For complex effects, reduces both:
 The number of experiments conducted
 The number of par ticipants needed

PROS AND CONS OF ADDING LEVELS TO IV

n = 20 n = 20

n = 20 n = 20 n = 20 n = 20

Set of 2 condition studies: Total N = 120 One study with one factor with three levels: Total N = 60
8

 C o s t s
 Increase sample size for an individual study (from a study with 2 groups)
 Increases in time, money needed to conduct research

PROS AND CONS OF ADDING LEVELS TO IV

 H ow d o we a na l y ze t h e d i f fer enc e b et ween 3 g ro u p s a t a t i m e?
 Option 1: A series of t-tests
 E.g., group 1 vs 2, group 2 vs 3, group 1 vs 3

 ALERT: This INFLATES the likelihood of Type I Error!
 “Test-wise α” = .05

 “Experiment-wise α” ≈ .14 for 3 t-tests

 1 – (1- α)c, where c = number of comparisons
 1 – (1-.05)3 = 1 – .86 = .14
 No longer within range of acceptable risk of Type I Error

 A possible solution: Bonferroni correction
 Divide desired alpha by number of comparisons*
 .05/3 = .017 – new cut-off for determining significance
 BUT, as we learned, by reducing the likelihood of a Type I error, we increase likelihood of Type II

error (or, decrease power)

AND, A STATISTICAL COST

* Don’t forget, we should be planning out our hypothesis tests before we do them, so we know what this number is ahead of time

 O ne – Way A N OVA

 A n a n a ly sis o f t h e va r ianc e i n a s et o f s c o r e s o r o b s e r vatio ns ,
te s t s w h et her t h e d i f fer ences i n m e a ns a c ro s s l evel s o f s o m e
f a c to r i s s i g ni fic ant ly g r e ater t h a n t h e d i f fer enc es a m o ng
s c o r e s i n g e ne r al

 C o m p a r es a l l g ro u p m e a ns s i m u lt aneo us ly
 One statistic to interpret (initially)
 Just tells you there is A dif ference, not where the dif ference exists
 So, we do post hoc tests to clarify result
 Handles inflation of Type I Error

A BETTER STATISTICAL SOLUTION

THE BIG PICTURE

Single
score

1 IV

z score

z test

One sample t-
test

Making comparisons
to population (NO IVs)

Sample
mean

σ known σ unknown

Making comparisons
between levels of IV(s)
or groups

More than 1 IV

2 levels 3+ levels IV

Between
subjects

Within
subjects

Independent
samples t-test

Paired
samples t-test

Between
subjects

Within
subjects

One-Way
Between
ANOVA

One-Way
Repeated
ANOVA

All IVs
Between
subjects

All IVs
Within
subjects

Mix of
within and
between

Between subj
Factorial
ANOVA

Repeated
Measures
Factorial ANOVA

Mixed Model
Factorial
ANOVA12

RESEARCH PROBLEM

 Does the presence of others during an emergency
af fect helping behavior?
 Conduct an experiment with 3 conditions
 Wait alone
 Wait with 1 other person
 Wait with 2 other people

 IV = Number of people present
 3 “levels” (0, 1, 2)

 DV = Time it takes (in seconds) to call for help

DATA FROM HELPING STUDY

 Seconds lapsed before calling for help

Alone
1 other
present

2 others
present

14 27 23
19 23 32
20 23 28
18 30 34
12 20 30
13 21 27

M1 = 16 M2 = 24 M3 = 29

Are these 3
means

significantly
different

from each
other?

THE LOGIC OF ANOVA

t = difference between sample means
difference expected by chance (error)

F = variance between sample means
variance expected by chance (error)

THE LOGIC OF ANOVA

 Variance = dif ferences between scores

 Two sources of variance:

 Between group variance
 Within group variance

F = variance between sample means
variance expected by chance (error)

BET WEEN GROUP VARIANCE

 Why do people in dif ferent groups dif fer?

Alone
1 other
present

2 others
present

14 27 23
19 23 32
20 23 28
18 30 34
12 20 30
13 21 27

M2 = 24 M3 = 29M1 = 16

BET WEEN GROUP VARIANCE

Why do people in dif ferent groups dif fer?

1. Treatment effect = differences caused by our
experimental treatment

 Systematic differences

2. Chance = differences due to random factors including…
 Individual differences
 Experimental error

 Non-systematic, random differences

WITHIN GROUP VARIANCE

 Why do people within the same group dif fer,
even though they were treated alike?

Alone
1 other
present

2 others
present

14 27 23
19 23 32
20 23 28
18 30 34
12 20 30
13 21 27

M2 = 24 M3 = 29M1 = 16

WITHIN GROUP VARIANCE

Why do people within the same group dif fer?

1. Chance = differences due to random factors
including…
 Individual differences
 Experimental error

 Non-systematic, random differences

SOURCES OF VARIANCE

Total Variance

Between Group
Variance

• Treatment Effect
• Chance (error)

Within Group
Variance

• Chance (error)

Numerator of F-ratio Denominator of F-ratio
21

THE F-RATIO

 If H 0 is True:

 If H 0 is False:

F = Between-group (Treatment + Chance)
Within-group (Chance)

F = Treatment Effect + Chance
Chance

F = 0 + Chance
Chance

≈ 1

> 1

THE F-RATIO

 F is the ratio between two variance estimates

 Denominator is also called “error term”

 How large does obser ved F have to be to conclude
there is a treatment ef fect (to reject H 0)?

 Compare observed F to critical values

 Based on the sampling distribution of F

THE SAMPLING DISTRIBUTION OF F

 A family of distributions (just like t)

 Each with a pair of degrees of freedom (df)

 Critical values shown in F table

 Need 3 pieces of information

(1) α level

(2) dfbetween (dfnumerator )

(3) dfwithin (dfdenominator)

THE SAMPLING DISTRIBUTION OF F

 F -values are always positive
 Variance cannot be negative

 If H 0 is true then F ≈1
 So peak appears around 1

F0 1 2 3 4 5 6

THE SAMPLING DISTRIBUTION OF F

 Shape of distribution will change with df
 Large df will result in less spread to the right
 In practical terms, leads to smaller critical values

of F (closer to 1.0)

F0 1 2 3 4 5 6

CRITICAL VALUES OF F

A portion of the F distribution table. Entries in regular type are critical values for the α =.05,
and bold type values are for the α=.01. The critical values for df = 2,12 have been
highlighted. Notice that we no longer differentiate between one-tailed or two-tailed
hypotheses. All values of F are positive, and all hypotheses are non-directional. Some
sources print separate tables for different alpha levels.

df
Between

df
Within

α =.05

α =.01

Learning check!

HYPOTHESIS TESTING
WITH ANOVA

RESEARCH PROBLEM

 Does the presence of others during an emergency
af fect helping behavior?
 Conduct an experiment with 3 conditions
 Wait alone
 Wait with 1 other person
 Wait with 2 other people

 IV = Number of people present
 3 “levels” (0, 1, 2)

 DV = Time it takes (in seconds) to call for help

DATA FROM HELPING STUDY

 Seconds lapsed before calling for help

Alone
1 other
present

2 others
present

14 27 23
19 23 32
20 23 28
18 30 34
12 20 30
13 21 27

M1 = 16 M2 = 24 M3 = 29

Are these 3
means

significantly
different

from each
other?

HYPOTHESIS TESTING WITH ANOVA

 Research question
 Does presence of others affect helping?

 Step 1: Statistical Hypotheses

 H0: µ1 = µ2 = µ3
 H1: At least one mean is different from another

 No longer differentiate between one-tailed and
two-tailed tests.
 All ANOVA tests are non-directional

 Why?

HYPOTHESIS TESTING WITH ANOVA

 Step 2: Decision Rule: Look up critical value of F in Table
 α level
 dfnumerator = dfbetween
 dfdenominator = dfwithin

 Step 3: Compute obser ved F -ratio

 Step 4: Make a Decision (Reject or retain H 0)

 **Step 5: If H 0 rejected, conduct post-hoc comparisons

 Step 6: Interpret and Repor t Findings

FINDING THE CRITICAL VALUE

Find Fcritical in Table

 Need to know 3 things

 α level
 dfnumerator = dfbetween
 dfdenominator = dfwithin

 If α = .05 and df = 2,15, Fcritical = 3.68

CRITICAL VALUES OF F FOR DF=2,15

Critical region;
Reject H0

3.68 6.23

COMPUTING ANOVA

Steps in computing the ANOVA
Compute SS
Compute df (two values!)

Compute MS

Compute F

Keep track of your computations in an ANOVA
Summar y Table

COMPUTING ANOVA

The ANOVA summar y table

COMPUTING ANOVA

Variance = “Mean Square” (MS) =

F = between-group variance
within-group variance

SS
df

F = MS between
MS within

Throwback
to Module 3!

COMPUTING ANOVA

 STEP 1: Compute Sums of Squares (SS)

SSTotal =
2

2 GX
N

−∑

SSBetween =
22 GT

Nn
−∑

SSWithin = Σ(SS for each group) or SSTotal − SSBetween

Where:

• X = each value of X

• T = treatment group total (ΣX)

• G = grand total (ΣT)

• n = sample size of each group

• N = total sample size (Σn)

COMPUTING ANOVA

 STEP 2: Compute Degrees of Freedom (df)

dfBetween = k – 1

Where:

• n = sample size of each group

• N = total sample size (Σn)

• k = number of groups

dfWithin = N – k or Σ(n-1)

dfTotal = N – 1

COMPUTING ANOVA

 STEP 3: Compute Mean Squares (MS)

MSBetween =
between

between

df
SS

MSWithin =
within

within

df
SS

COMPUTING ANOVA

 STEP 4: Compute the F -Ratio

F-Ratio =
within

between

MS
MS

COMPUTING ANOVA

The ANOVA Summar y Table

Source SS df MS F

Between group SSB dfB MSB MSB
MSWWithin group

(error)
SSW dfW MSB

Total SST dfT

COMPUTING ANOVA

Alone
1 other
present

2 others
present

14 27 23

19 23 32

20 23 28

18 30 34

12 20 30

13 21 27

n = 6 6 6 6 N = 18
Totals T1=96 T2=144 T3=174 G = 414

COMPUTING ANOVA

Alone
1 other
present

2 others
present

14 27 23
19 23 32
20 23 28
18 30 34
12 20 30
13 21 27

n 6 6 6 N = 18
Totals T1=96 T2=144 T3=174 G = 414

SSTotal = 72295221024418
414

]27…201914[
2

2222 =−=−++

SSTotal =
2

2 GX
N

−∑

COMPUTING ANOVA

Alone
1 other
present

2 others
present

14 27 23
19 23 32
20 23 28
18 30 34
12 20 30
13 21 27

n 6 6 6 N = 18
Totals T1=96 T2=144 T3=174 G = 414

SSBetween = 51695221003818
414

6
174

6
144

6
96 2222

=−=−











++

SSBetween =
22 G

n
T

N
−∑

COMPUTING ANOVA

Alone
1 other
present

2 others
present

14 27 23
19 23 32
20 23 28
18 30 34
12 20 30
13 21 27

n 6 6 6 N = 18
Totals T1=96 T2=144 T3=174 G = 414

SSWithin = 722 – 516 = 206

SSWithin= SSTotal − SSBetween

COMPUTING ANOVA

Alone
1 other
present

2 others
present

14 27 23
19 23 32
20 23 28
18 30 34
12 20 30
13 21 27

n 6 6 6 N = 18
SS 58 72 76 SSWithin= 206

SSWithin = ΣSS = 58 + 72 + 76 = 206

SSWithin = ΣSS for each group

You will often
be given

these values

COMPUTING ANOVA

Let’s fill in our SS values

Source SS df MS F

Between group 516

Within group
(error)

206

Total 722

Notice
722 = 516 + 206
SST = SSB + SSW

COMPUTING ANOVA

Now compute degrees of freedom (df)

Source SS df MS F

Between group 516 k-1

Within group
(error)

206 N-k

Total 722 N-1

Where k = 3 N = 18

COMPUTING ANOVA

Source SS df MS F

Between group 516 (k-1)
3–1 = 2

Within group
(error)

206 (N-k)
18–3 = 15

Total 722 (N-1)
18–1 = 17

Where k = 3 N = 18

COMPUTING ANOVA

Source SS df MS F

Between group 516 2

Within group
(error)

206 15

Total 722 17

Notice
17 = 15 + 2

dfT = dfB + dfW

COMPUTING ANOVA

Source SS df MS F

Between group 516 2 SSB/dfB

Within group
(error)

206 15 SSW/dfW

Total 722 17

Now compute the Mean Squares (MS)

COMPUTING ANOVA

Source SS df MS F

Between group 516 2 516/2=258

Within group
(error)

206 15 206/15=13.73

Total 722 17

Now compute the Mean Squares (MS)

COMPUTING ANOVA

Source SS df MS F

Between group 516 2 258 MSB
MSW

Within group
(error)

206 15 13.73

Total 722 17

Now compute the F-Ratio

COMPUTING ANOVA

Source SS df MS F

Between group 516 2 258 258 = 18.79
13.73

Within group
(error)

206 15 13.73

Total 722 17

Now compute the F-Ratio

COMPUTING ANOVA

Source SS df MS F

Between group 516 2 258 18.79

Within group
(error)

206 15 13.73

Total 722 17

All of this work for the final F-ratio!

MAKE A DECISION AND REPORT

 Does our obser ved F (1 8.79) exceed our critical
value of F (3.68)?

 Yes!

 Reject H 0

 Basic format:
 “There was a significant effect of how many others were

present on the time it took participants to call for help, F
(2, 15) = 18.79, p < .05. [to be continued]”

REPORTING A F -STATISTIC

 A closer look…

F(2, 15) = 18.79, p < .05
Test

statistic

Observed
value

alpha
level

Degrees of
freedom
(B, W)

Significance
Sig? p < α

Nonsig? p > α

Learning check!

INTERPRETING
FINDINGS

INTERPRETING FINDINGS FROM
ANOVA

 At least two of the means are significantly dif ferent
from each other
 But, which ones?
 Must conduct additional analyses to pinpoint specific

mean differences
 Called “post hoc tests” or (posttests)

 In other words,
Omnibus test  the “main test” (in this case the

one-way ANOVA)
Post hoc test  the “follow-ups”

POST HOC TESTS

 Pinpoint specific group dif ferences

 Conduct multiple comparisons, controlling for
experimentwise Type I error rate

 Many types of post hoc tests
 Mostly based on comparing absolute value of differences between

pairs of means to a critical value

 Common ones include
 Bonferroni Correction for Multiple Comparisons

 Fisher’s Least Significant Difference (LSD)

 Tukey Honestly Significant Difference (HSD)
61

TUKEY HSD TEST

 Tukey Honestly Significant Dif ference (HSD)
 HSD = minimum dif ference between means

needed for statistical significance
 How big does the difference between two means have

to be in order to conclude that they are significantly
different from each other?
 Like a critical value, but a critical “mean difference”

 Assumes equal n

TUKEY HSD TEST

 Step 1: Find the value of “q”(Table)
 Need to know 3 things:
 α
 dfW
 k

 Step 2: Compute HSD

HSD =
n

MSwithinq
Where n = group sample
size, assuming equal n

in each group

TUKEY HSD TEST

 Step 3: Compute dif ference between each
pair of means and compare to HSD
 M1 – M2 = ?
 M1 – M3 = ?
 M2 – M3 = ?

 Compare each mean difference to the HSD
 If the difference equals/exceeds the HSD,

conclude that the means are significantly
different from each other

COMPUTING TUKEY’S HSD

Alone
1 other
present

2 others
present

14 27 23

19 23 32

20 23 28

18 30 34

12 20 30

13 21 27

n 6 6 6 N = 18
Totals T1=96 T2=144 T3=174 G = 414
Means M1=16 M2=24 M3=29

TUKEY HSD TEST: EXAMPLE

 Step 1: Find the value of “q” (Q Table)
 α = .05 dfW = 15 k = 3

TUKEY HSD TEST: EXAMPLE

 Step 1: Find the value of “q” (Q Table)
 α = .05 dfW = 15 k = 3
 From Table : q = 3.67

 Step 2: Compute HSD

HSD = = ± 5.55 seconds
6
73.13

67.3
n

MSwithin =q

So, a pair of means must
differ by at least 5.55 in
order to be significantly

different

TUKEY HSD TEST: EXAMPLE

 Step 3: Compute dif ference between each
pair of means and compare to HSD

 M1 – M2 = 16 – 24 = – 8

 M1 – M3 = 16 – 29 = -13

 M2 – M3 = 24 – 29 = -5

Exceeds 5.55

Does not
exceed 5.55

TUKEY HSD TEST: EXAMPLE

 What do we conclude?

 M1 differs from M2 and M3
 People waiting alone helped significantly faster than

people waiting with others

 M2 & M3 do NOT differ from each other

 There was no difference in helping times for individuals
waiting with 1 other person and individuals waiting with
2 other people

MEASURE OF EFFECT SIZE

 Compute propor tion of variance explained by the
treatment ef fect

 Propor tion of total variance accounted for by
variability between groups

 In ANOVA , r 2 typically called η2 (pronounced “eta
squared”)

r2 =
Total

Between

SS
SS

MEASURE OF EFFECT SIZE: EXAMPLE

 71% of the variance in helping behavior
(number of second lapsed before seeking
help) is explained by the number of people
present

r2 = η2 = 71.
722
516

SS
SS

Total

Between ==

REPORTING RESULTS OF AN ANOVA

Formal description of findings:
“There was a significant effect of the number of people present
on the time it took (in seconds) for participants to seek help,
F(2,15) = 18.79, p<.05, η2 = .71. Tukey post-hoc comparisons
indicated that participants who were waiting alone helped
significantly faster (M=16, SD=3.4) than participants who
waited with one other person (M=24, SD=3.8) or with two other
people (M=29, SD=3.9), p < .05.”

effect
size

SD for each group

SD =
1

SS
−n

INDEPENDENT MEASURES ANOVA
ASSUMPTIONS

 T h e o b s e r vat io ns w i t h in e a c h s a m p le m u s t b e i nd e p endent .
 T h e p o p u lat io n f ro m w h i c h t h e s a m p les a r e s e l e c ted m u s t b e

no r m a l .
 T h e p o p u lat io ns f ro m w h i c h t h e s a m p les a r e s e l e cted m u s t

h ave e qu a l va r i anc es ( h o m o g eneit y o f va r i a nce) .
 Violating the assumption of homogeneity of variance risks invalid

test results.

Learning check!

 M i nd t ap Ac c e ss
 Psy 201 last term? Just log in with same credentials and access course

using course code on syllabus
 Psy 201 last summer? Submit course code request to form on syllabus

webpage
 Ever yone else, use direct link to bookstore on syllabus webpage

 Ad d eve r yt hing to yo u r c a l e ndar now!
 Mindtap and tutorial problem sets often overlap
 Due date is not “do date”

TO DO

Psy 202H1: �Statistics iI��Module 2: �One Way ANOVA�
Game Plan
Intro to designs with more than 2 groups
When Would you want to study more than two groups?
Effect of Anti-depressant dosage on Mental Health
Effect of Caffeine �on Test Performance
Pros and Cons of Adding Levels to IV
Slide Number 8
Pros and Cons of Adding Levels to IV
AND, A statistical Cost
A Better Statistical Solution
Slide Number 12
Research problem
Data from Helping Study
The Logic of ANOVA
The Logic of ANOVA
Between Group Variance
Between Group Variance
Within Group Variance
Within Group Variance
Sources of Variance
The F-Ratio
The F-Ratio
The Sampling Distribution of F
The Sampling Distribution of F
The Sampling Distribution of F
Critical Values of F
Hypothesis Testing with ANOVA
Research problem
Data from Helping Study
Hypothesis testing with ANOVA
Hypothesis testing with ANOVA
Finding the Critical Value
Critical values of F for df=2,15
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Computing ANOVA
Make a Decision and report
Reporting a F-statistic
Interpreting Findings
Interpreting Findings from ANOVA
Post Hoc Tests
Tukey HSD Test
Tukey HSD Test
Tukey HSD Test
Computing Tukey’s HSD
Tukey HSD Test: Example
Tukey HSD Test: Example
Tukey HSD Test: Example
Tukey HSD Test: Example
Measure of Effect Size
Measure of Effect Size: Example
Reporting Results of an ANOVA
Independent Measures ANOVA Assumptions
To Do

TD0409-01 课件/Psy 202_9_advanced concepts_W22_topost-1.pdf

In s t ruc to r :
Dr. M o l l y M et z

PSY 202H1:
STATISTICS II

MODULE 9:
INTRO TO ADVANCED CONCEPTS

1. Nonparametic Tests

2. Intro to Advanced Stats
1. Multilevel modeling
2. Factor analysis
3. Mediation
4. Meta-analysis

GAME PLAN

“ W hat d o we ne e d to k now ? ”

NON-PARAMETRIC VS
PARAMETRIC TESTS

PA RAMETRIC VS . N O N PARAMETRIC T E ST S

 Pa r a met r ic te s t s m a ke a s s u m pti o ns a b o u t t h e s h a pe o f t h e
p o p u lat io n d i s t r ibut io n a nd ot h e r p o p u lat io n p a r a meter s ( e . g . ,
μD = 0 )
 Normal distribution in the population
 Homogeneity of variance in the population
 Numerical score for each individual

 Re qu i r e d a t a f ro m a n i nte r val o r r a t i o s c a l e

 N o np a r amet r ic te s t s d o not m a ke t h e s e s a m e a s s u m pt io ns
 Most do not state hypotheses in terms of specific population

parameters
 Participants usually classified into categories
 Frequencies
 Nominal, ordinal

PARAMETRIC VS. NONPARAMETRIC TESTS

 E.g., What if we wanted to examine the
relationship between the national ranking
of US college basketball teams and the
annual athletics budget of the college?

School National
Rank

Annual $
(in millions)

Gonzaga 1 19

Duke 2 67

Indiana 3 60

Louisville 4 69

Georgetown 5 47

Michigan 6 111

Kansas 7 53

… … …

Virginia 25 46 5

“GOODNESS OF FIT” TEST AND THE
ONE SAMPLE T TEST

 N o np a r amet r ic ( c h i – squ are) ve r s us p a r a met r ic ( t ) te s t

 S i m i lar it y: B ot h te s t s u s e d a t a f ro m o ne s a m p le to te s t a
hy p ot hes is a b o u t a s i ng l e p o p u l atio n

 L eve l o f m e a s ur ement d eter mine s te s t:
 Numerical scores (inter val / ratio scale) make it appropriate to

compute a mean and use a t test
 Classification in non-numerical categories (ordinal or nominal scale)

makes it appropriate to compute proportions or percentages to do a
chi-square test

SPECIAL APPLICATIONS OF THE
CHI-SQUARE TEST S

 A s a s u b s t it ute fo r a p a r a met r ic te s t:
 The chi-square test for independence & the Pearson correlation
 The chi-square test for independence & the independent-measures t

test (or ANOVA)

 U nd e r w h a t c i rc u ms t ances wo u l d yo u c h o o s e to c o nd u c t t h e
c h i – s qu are ( no np a r am etr ic a l ter nat ive)?

Learning Check:
1. Data do not meet the assumptions for a standard parametric test
2. Data consist of nominal or ordinal measurements

EXAMPLE: THE MANN- WHITNEY U TEST (VS.
I NDEPENDENT T)

 A business owner
measured the job
satisfaction of his day -shif t
and night-shif t workers.
Each employee rated job
satisfaction on a scale
from 1 (not satisfied at all)
to 100 (completely
satisfied). Test whether
ratings of job satisfaction
dif fered between the two
groups using the Mann-
Whitney U test at a .05
level of significance.

Day Shift Night Shift

88 24

72 55

93 70

67 60

62 50

EXAMPLE: THE W ILCOXON SI GNED-RANKS T
TEST (VS. R EPEATED-MEASURES T)

 A researcher measured the number of cigarettes
patients smoked per day in a sample of 6 patients
before and 6 months af ter being diagnosed with lung
cancer. Test whether patients significantly changed
their smoking habits following the diagnosis using
the Wilcoxon signed-ranks T test at a .05 level of
significance. Before

Diagnosis
After

Diagnosis

23 20

25 5

13 8

12 16

9 15

22 19 9

EXAMPLE: THE K RUSKAL- WALLIS H TEST (VS.
ONE- WAY ANOVA)

 A researcher asks a sample of 15 students (5 per
group) to view and rate how ef fectively they think
one of three shor t video clips promoted safe
driving. The students rated the clips from 1 (not
ef fective at all) to 100 (extremely ef fective). Test
whether ratings dif fer between groups using the
Kruskal-Wallis H test at a .05 level of significance.

Clip A Clip B Clip C

88 92 50
67 76 55
22 80 43

14 77 65

42 90 39
10

EXAMPLE: THE FRIEDMAN TEST (VS.
R EPEATED ANOVA)

 A d o c to r i s c u r i o u s a b o u t w h e t h e r w o m e n w i t h o u t
h e a l t h i n s u r a n c e m a ke r e g u l a r o f f i c e v i s i t s
t h r o u g h o u t t h e c o u r s e o f t h e i r p r e g n a n c i e s . S h e
s e l e c t s a s a m p l e o f 4 p r e g n a n t w o m e n a n d r e c o r d s
t h e n u m b e r o f h o s p i t a l v i s i t s m a d e d u r i n g e a c h
t r i m e s te r o f t h e i r p r e g n a n c y.

 Te s t w h e t h e r t h e r e a r e d i f f e r e n c e s i n t h e n u m b e r o f
o f f i c e v i s i t s m a d e ov e r t h e c o u r s e o f t h e p r e g n a n c y
u s i n g t h e Fr i e d m a n te s t a t a . 0 5 l e v e l o f
s i g n i f i c a n c e .

Participant 1st Trimester 2nd Trimester 3rd Trimester

A 3 5 8
B 6 4 7
C 2 0 5

D 4 3 2
11

NONPARAMETRIC TESTS OVERVIEW

Statistic Purpose Example

Mann Whitney Compare two independent groups
when assumptions for independent t
not met

Determine whether a control group and
treatment group are different when the DV
is ordinal

Wilcoxon Signed-Rank Compare two matched or within
subject conditions when
assumptions for dependent t not
met

Determine whether ordinal ratings of
academic skill are different from ratings of
athletic skill for same group

Kruskal-Wallis Compare two or more independent
groups when assumptions for
oneway between subjects ANOVA
not met

Determine whether test scores from three
different instructional conditions are
different when scores are not distributed
normally

Friedman’s ANOVA Compare two or more matched or
within subject conditions when
assumptions for repeated ANOVA
not met

Determine whether ordinal ratings of
academic skill, athletic skill, and social skill
are different for same group of students

Chi Sq Goodness of Fit Compare observed frequency
distribution to null distribution

Determine whether there is a different in
proportion of A, B, C, D, F grades awarded
in school

Chi Sq Test of Independence Determine whether two categorical
variables are related

Test whether grade distributions differ by
gender 12

PARAMETRIC VS. NONPARAMETRIC TESTS

If you have a choice, which should you choose?

Things to consider:
Measurement
Assumptions
Variance
Undetermined scores

INTRO TO ADVANCED
PROCEDURES

I NTRO TO SOME ADVANCED PROCEDURES

 P u rp o s e:
 To provide you with some knowledge of additional procedures that

are available to help answer research questions

 To allow you to recognize and better understand these more
advanced procedures when you come across them in research
articles

ADVANCED PROCEDURES:
MULTILEVEL MODELING

 E s s e n t i al l y, t h i s re fe r s to c as e s o f re g re s s i o n w i t h g ro ups

 E xampl e : A re s e arc h e r i s i n te re s te d i n h ow muc h t h e n umbe r o f
h o ur s o n e s pe n ds s t ud y i n g fo r a s t at i s t i c s exam pre di c t s s c o re s o n
t h e exam. He s ur vey s s t ude n t s f ro m a do z e n di f fe re n t s t at i s t i c s
c l as s e s . P ro bl e m? Th i n g s c o ul d be ve r y di f fe re n t i n t h e d i f fe re n t
c o ur s e s .
 Dif ferent teachers, dif ferent assignments, dif ferent tests, etc.

 So l ut i o n ? M ay c arr y o ut t h e re g re s s i o n s e parate l y fo r e ac h c o ur s e ,
t h e n ave rag e t h e re g re s s i o n c o e f fi c i e n t s ac ro s s t h e d i f fe re n t
c o ur s e s . M ay al s o g o a s te p f ur t h e r an d t ake i n to c o n s i de rat i o n
s o me u pper-lev e l va r ia b le s (g ro up-l eve l )  e . g . , do e s te ac h e r
ex pe ri e n c e pre d i c t ave rag e te s t s c o re s i n t h e i r c l as s e s ?
 Example of a standard multilevel modeling procedure (multilevel, because

you are looking at both lower-level (individual) and upper-level (group)
variables)

ADVANCED PROCEDURES:
MULTILEVEL MODELING

ADVANCED PROCEDURES:
FACTOR ANALYSIS

 Fa c to r a na l ys is i s a s t a t i s tic al p ro c e d ur e a p p lied i n
s i t u at io ns w h e r e m a ny va r i ables a r e m e a s ur ed. I t i d e nt ifies
g ro u p s o f va r i a bles ( f a c to r s ) t h a t te nd to b e c o r r e l ated w i t h
e a c h ot h e r a nd not ot h e r va r i a bles.
 Factor loading  the correlation of a variable with a factor.

Variables may have loadings on each factor, but usually have high
loadings on only one.

 E . g . , “ Fa c to r a na l y s is o f t h e D e nt a l Fe a r S u r vey d i s c l o s ed
t h r e e s t a b le a nd r e l i able f a c to r s . T h e fi r s t f a c to r r e l a ted to
p a t ter ns o f d e nt a l avo i d ance a nd a nt i c i pato r y a nx i et y. T h e
s e c o nd f a c to r r e l ated to fe a r a s s o c i ated w i t h s p e c ific d e nt a l
s t i m u li a nd p ro c e dur es . Fa c to r t h r e e c o nc e r ne d fe l t
p hy s io lo gic a ro u s a l d u r i ng d e nt a l t r e a t m ent . ”

 H ow d o p s yc ho lo gi st s fi nd u nd e rlying d i m e ns io ns wh e n we c a n
o nl y o b s e r ve s p e c i fic b e h avio ur s ?

FROM BEHAVIOURS TO CONSTRUCTS

1 . HOW MANY SEA MONSTERS?

2. HOW MANY SEA MONSTERS?

3. HOW MANY SEA MONSTERS?

 H ow c o u l d yo u te l l t h e nu m b er o f s e a m o ns te r s w h e n yo u
c o u l d o nl y s e e p a r t s o f t h e m ? Yo u s aw v i s i ble p a r t s m ove
to g et her a nd ot h e r s m ove i nd e pendent ly ; yo u d i d a n
i nt u i t ive c o r r ela t io n.

 B y l o o k i ng a t t h e c o r r e l at io ns b et ween a l l t h e p a r t s we c a n
s e e ( o b s e r vable b e h avio r s ), we c a n i nfe r s o m et h ing a b o u t
t h e i r u nd e r lyi ng na t u r e ( t h e o r etic al c o ns t r u c t s ) .

 Fa c to r A n a ly s is i s a s t a t i s t ica l m et h o d t h a t l o o k s a t h ow l ot s
o f d i f fer ent o b s e r vat io ns c o r r e l ate a nd d eter mine s h ow m a ny
t h e o r etic al c o ns t r u c t s c o u l d m o s t s i m p ly ex p la in w h a t yo u
s e e .

FROM SEA MONSTERS TO FACTOR
ANALYSIS

ADVANCED PROCEDURES:
FACTOR ANALYSIS

What name would you
give to each of these
different factors?

ADVANCED PROCEDURES:
MEDIATIONAL ANALYSIS

 T h i s i s a p a r t i cul ar t y p e o f p a t h a na l y sis t h a t te s t s w h et her a
p r e s um ed c a u s al r e l a tio ns hip b et ween t wo va r i ables i s d u e
to s o m e p a r t i c ular i nte r vening va r i a ble ( M – me d iat in g
var i able )
 E.g., Fraley & Aron, 2004: Strangers meeting while either doing

something humourous or non-humourous. Those in the humourous
condition felt closer to their partners. Researchers wanted to
demonstrate that this result was mediated in part by the humour
distracting people from the discomfort of meeting a stranger.
 In other words, the reason humour increased closeness is that it

was distracting.

ADVANCED PROCEDURES:
MEDIATIONAL ANALYSIS

B a ro n & Ke nny ’ s ( 1 9 8 6 ) 4 s te p s fo r e s t a blis hing m e d iat io n:
1 . S h ow t h a t X s i g ni fic ant ly p r e d ic t s Y.
2 . S h ow t h a t X s i g ni fic ant ly p r e d ic t s M .
3 . S h ow t h a t M p r e d ic ts Y i n t h e c o ntex t o f a m u l t iple

r e g r es si o n i n w h i c h X i s a l s o i nc l u ded a s a p r e d ic to r.
4 . S h ow t h a t , w h e n M i s i nc l u ded a s a p r e d ic to r o f Y ( a l o ng

w i t h X ) , X no l o ng e r p r e d ic ts Y ( fo r f u l l me d iat ion ) o r t h a t
t h e p r e d ic tio n i s we a ker ( k now n a s p ar t i al me d iat io n ).

x y

*** not for cross-sectional designs!
30

 I k now w h a t o ne s t u d y s ay s … . B u t w h a t a b o u t a l l o f t h e
ot h e r s ?

 Rev i ew p a p er : a qu a l it at ive s u m m ar y o f t h e s t a te o f t h e
l i te rat ur e o n a g i ven r e s e arc h qu e s t io n

 M et a – a na ly sis : a s t a t i st ic al a na l y sis t h a t y i e l ds a qu a nt i t ative
s u m m ar y o f a s c i e nt ific l i ter at ure.
 Or, a “study of studies”

 Unit of analysis: effect size!

ADVANCED PROCEDURES:
META – ANALYSIS

 A m a j o r l i m i ta tio n to m et a – analy sis : T h e F i l e D r awe r P r o ble m

 C a u t io n: j u s t b e c a us e i t i s s t a t i st ic al d o e s n’ t m e a n i t i s
p e r fect ly o b j e c ti ve!

 Ego depletion
 Ca r ter, E. C. , Ko f l er, L. M . , Fo rster, D. E. , & M cCul l o ugh, M . E. (2015). A s er i es o f meta-

a na l y ti c tests o f the depl eti o n effect: s el f – co ntro l do es no t s eem to rel y o n a l i mi ted
res o urce. Jo urnal o f Ex pe ri m e ntal Psyc ho l o gy: G e ne ral , 144 (4), 796- 815.
https :// do i . o rg /1 0. 1 0 37/ x ge 0 00 0 08 3

 H a g ger, M . S . , Wo o d, C. , S ti ff, C. , & Chatzi s a ra nti s , N . L. (2010). Ego depl eti o n a nd the
strength mo del o f s el f – co ntro l : a meta – a na l ys i s . Psyc ho l o gi cal B ul l eti n , 136(4), 495–
525. https :// do i . o rg /1 0. 1 0 37/a 00 1 94 8 6

 Ovulator y cycle effects
 Gildersleeve, K., Haselton, M. G., & Fales, M. R. (2014). Do women’s mate

preferences change across the ovulatory cycle? A meta-analytic
review. Psychological Bulletin, 140(5), 1205-1259.
https://psycnet.apa.org/doi/10.1037/a0035438

 Wood, W., Kressel, L., Joshi, P. D., & Louie, B. (2014). Meta-analysis of
menstrual cycle effects on women’s mate preferences. Emotion Review, 6(3),
229-249. https://doi.org/10.1177%2F1754073914523073

ADVANCED PROCEDURES:
META – ANALYSIS

https://doi.org/10.1037/xge0000083

https://doi.org/10.1037/a0019486

https://doi.org/10.1177%2F1754073914523073

META – ANALYSIS:
SOCIAL RELATIONSHIPS AND HEALTH

Better!

 Op e n wo rk h o u r s i n t u to r i a l s e s s i o n

 D a t a A na l ys is P ro j e c t!

TO DO

Psy 202H1: �Statistics iI��Module 9: �Intro to Advanced Concepts�
Game Plan
Non-Parametric vs Parametric Tests
Parametric vs. Nonparametric Tests
Parametric vs. Nonparametric Tests
“Goodness of Fit” Test and the �One Sample t Test
Special Applications of the� Chi-Square Tests
Example: The Mann-Whitney U Test (vs. independent t)
Example: The Wilcoxon Signed-Ranks T test (vs. repeated-measures t)
Example: The Kruskal-Wallis H test (vs. one-way anova)
Example: The Friedman test (vs. repeated anova)
NonParametric Tests Overview
Parametric vs. Nonparametric Tests
Intro to Advanced Procedures
Intro to Some Advanced Procedures
Advanced Procedures: �Multilevel Modeling
Advanced Procedures: �Multilevel Modeling
Advanced Procedures: �Factor Analysis
From Behaviours to Constructs
1. How many sea monsters?
1. How many sea monsters?
2. How many sea monsters?
2. How many sea monsters?
3. How many sea monsters?
3. How many sea monsters?
From sea monsters to factor analysis
Advanced Procedures: �Factor Analysis
Advanced Procedures: �Mediational Analysis
Advanced Procedures: �Mediational Analysis
Advanced Procedures: �Mediational Analysis
Advanced Procedures: �Meta-Analysis
Advanced Procedures: �Meta-Analysis
Advanced Procedures: �Meta-Analysis
Meta-Analysis: �Social Relationships and Health
To do

TD0409-01 课件/Psy 202_8_Chi_Square_W22.pdf

In s t ruc to r :
Dr. M o l l y M et z

PSY 202H1:
STATISTICS II

MODULE 8:
INTRO TO CHI SQUARE

1. Introduction to Chi-Square
1. Research Spotlight: Selfies in the Wild
2. Hypothesis Testing Steps
3. When to use the Chi-Square
4. An example
5. Practice!

2. Chi-Square Test of Independence
1. When We Use It
2. A Research Example
3. Practice

3. Back to the Big Picture

GAME PLAN

INTRODUCTION TO
CHI-SQUARE ANALYSIS

 Hy pot h e s i s te s t s us e d t h us f ar te s te d hy pot h e s e s abo ut
po p ul at i o n pa r a m eter s .

 Paramet ri c te s t s s h are s eve ral as s umpt i o n s
 Normal distribution in the population
 Homogeneity of variance in the population
 Numerical score for each individual

 N o n param et ri c te s t s are n e e de d i f re s e arc h s i t uat i o n do e s n ot
me et al l t h e s e as s umpt i o n s .
 More next week!

 N o n param et ri c te s t s …
 Make few assumptions about distribution (as compared to all of our

assumptions about normality and variance for z, t, F, etc.)
 Usually use categories/frequencies

PARAMETRIC VS NONPARAMETRIC TESTS

Statistical Test IV DV

Correlation/Linear
regression

Continuous Continuous

Independent
samples t-test

Two independent
categories

Continuous

Paired sample
t-test

Two related groups Continuous

ANOVA Multiple categories Continuous

Chi-square Two or more
categories

Categorical

THE CHI-SQUARE STATISTIC

 Most statistical tests you learn require quantitative
data (correlation, z-test, t-test, etc.)

 What if we have questions about categories or
classifications?
 Do college students prefer Coke or Pepsi?
 Is the racial breakdown of UofT representative of the general

population?

 These questions involve counting the number of
people in dif ferent groups/categories

 They involve frequency distributions

THE CHI-SQUARE STATISTIC

 The Chi-Square statistic: χ2
 Tests whether one set of proportions is different from

another
 Done by comparing frequencies (counts)

 Two types of hypothesis tests
 χ2 Goodness-of-fit test
 χ2 Test of independence

χ2 TEST FOR GOODNESS-OF -FIT

 Goodness-of-fit test uses frequency data from
a sample to test hypotheses about propor tions
in a population.

 Each individual is classified into ONE categor y
on the variable of interest.
 Do you prefer Coke or Pepsi?
 Do your prefer the original or prequel Star Wars movies?

 Simply count how many people in the sample
are in each categor y

χ2 TEST FOR GOODNESS-OF -FIT

 H O specifies the propor tion of the population
that should be in each categor y.

 The propor tions from H O are used to compute
expected frequencies

 The expected frequencies describe how the sample
would appear if H O was true

 χ2 then compares obser ved frequencies (from the
sample) to expected frequencies (from H O)

χ2 TEST FOR GOODNESS-OF -FIT

 Why is it called “goodness-of-fit?”

 We test whether our “obser ved” frequencies
“fit” against our “expected” frequencies.

 Kind of like model testing (remember, R 2 as a
statistic of “goodness of fit”)

h t t p s : / / info r mat io nis beaut iful . net/ v isual izat io ns /di ver s it y – in- tec h /

RESEARCH SPOTLIGHT:
HAVE WE REACHED

GENDER PARIT Y IN TECH
COMPANIES?

https://informationisbeautiful.net/visualizations/diversity-in-tech/

 W h en wo u l d yo u s ay t h a t g e nd er e qu a lit y h a d b e e n a c h i eved
i n te c h ?

 F i g ur e o u t d e m o gr aphic b r e a kdowns o f c o u nt r y
 51% female in Canada (Census 2016)

 F i g ur e o u t d e m o gr aphic b r e a kdowns o f c o m p a ny
 https://informationisbeautiful.net/visualizations/diversity -in-tech/

 D o t h ey m a t c h ?

GENDER PARIT Y IN TECH

https://informationisbeautiful.net/visualizations/diversity-in-tech/

GENDER PARIT Y IN TECH

RESEARCH PROBLEM

 D o e s a new te a c hing m et h o d i m p rove te s t p e r fo r manc e
o n a s t a nd ar dized m a t h te s t ?

 I n p r i o r ye a r s , 6 0 % o f s t u d ent s p a s s ed t h e te s t ( 4 0 % f a i l ed).

 D a t a f ro m t h e C U R R E NT s c h o o l ye a r ( 2 0 0 c h i l dr en) :

 Is there a significant change in test per formance?

Student Performance this Year

Pass Fail
150 50 Total n = 200

 This is “frequency” (or “count”) data
 200 children were sampled
 150 children passed
 50 children failed

RESEARCH PROBLEM

Test Performance

Pass Fail
150 50 Total n = 200

STEP 1: STATISTICAL HYPOTHESES

H0: There is no change/difference in student
performance

The pass rate this year (with the new teaching method) will be
the same as the pass rate in prior years (60% pass, 40% fail).

H1: The is a change/difference in student
performance

STEP 2: FIND CRITICAL VALUE

Two pieces of information needed
 α level
 df = C-1 (where C = number of categories)

Critical value from Table
 α = .01
 df = 2 -1 = 1
 Critical value = 6.63

CRITICAL VALUE OF χ2

χ2
6.63

Decision Rule: If observed χ2 equals or exceeds 6.63, then reject Ho

STEP 3: COMPUTE OBSERVED χ2

 f o = o b s e r ved f r e qu e ncy ( fo r e a c h c e l l )
 f e = ex p ec ted f r e qu enc y ( fo r e a c h c e l l ) = p n
 p = p ro p o r t i o n s t a te d i n t h e nu l l hy p ot hes is
 n = tot a l s a m p le s i z e

∑
−

=
e

f
ff 22 )(χ

COMPUTE OBSERVED χ2

 How do you find p?
 We are given information about the known population

distribution in previous years.

 60% pass and 40% fail.

 Thus the proportions (p) under the null hypothesis
are:
 p = .60 pass
 p = .40 fail

 If the problem doesn’t specify, figure out what the
question is asking: e.g., if 2 sodas are preferred at equal
rates, what proportion of people should prefer each one?
What about 3 different sodas?

COMPUTE OBSERVED χ2

 Compute expected frequencies (pn):

Student Performance
Pass Fail

Observed
frequencies (fo)

150 50
Total
n = 200

COMPUTE OBSERVED χ2

 Compute expected frequencies (pn):

Student Performance
Pass Fail

Observed
frequencies (fo)

150 50
Total
n = 200

Expected
frequencies (fe)

fe = pn

COMPUTE OBSERVED χ2

 Compute expected frequencies (pn):

Student Performance
Pass Fail

Observed
frequencies (fo)

150 50 Totaln = 200

Expected
frequencies (fe)

fe = pn

.60 × 200 .40 × 200

COMPUTE OBSERVED χ2

 Compute expected frequencies (pn):

Student Performance
Pass Fail

Observed
frequencies (fo)

150 50 Totaln = 200

Expected
frequencies (fe)

fe = pn

.60 × 200
= 120

.40 × 200
= 80

If HO is true (and there
is no change), we expect
to see 120 students pass

and 80 fail.

COMPUTE OBSERVED χ2

 Step 4: Calculate χ2

Student Performance

Pass Fail
Observed

frequencies (fo)
150 50

Expected
frequencies (fe)

120 80

COMPUTE OBSERVED χ2

 Step 4: Calculate χ2

∑
−

=
e

f
ff 22 )(χ

Student Performance

Pass Fail
Observed

frequencies (fo)
150 50

Expected
frequencies (fe)

120 80

COMPUTE OBSERVED χ2

 Step 4: Calculate χ2

∑
−

=
e

f
ff 22 )(χ

+∑
−

=
120

)120150( 22χ

Student Performance

Pass Fail
Observed

frequencies (fo)
150 50

Expected
frequencies (fe)

120 80

COMPUTE OBSERVED χ2

 Step 4: Calculate χ2

∑
−

=
e

f
ff 22 )(χ

80
)8050(

120
)120150( 222 −+∑

−
=χ

Student Performance

Pass Fail
Observed

frequencies (fo)
150 50

Expected
frequencies (fe)

120 80

COMPUTE OBSERVED χ2

 Step 4: Calculate χ2

∑
−

=
e

f
ff 22 )(χ

80
)8050(

120
)120150( 222 −+∑

−
=χ = 7.5 +11.25 = 18.75

Student Performance

Pass Fail
Observed

frequencies (fo)
150 50

Expected
frequencies (fe)

120 80

STEP 4: MAKE A DECISION

Reject Ho
Because observed χ2 (18.75) exceeds the

critical value (6.63)

STEP 5: REPORT RESULTS

 What does this mean?
 There was a significant change in test performance.

Students performed better this year (75% passed)
compared to prior years (60% passed).

 “Based on data from the current school year, test
performance was significantly improved with the new
teaching method, χ2 (1, N = 200) = 18.75, p < .01. A
larger percentage of students passed the test this year
(75%) compared to prior years (60%)”

REPORTING A

 A closer look…

χ2(1, N = 200) = 18.75, p < .01, two-tailed

Test
statistic

Observed
value

alpha
level Other tests:

One or two
tailed

All Chi-Sq
tests are one-

tailed!

Degrees of
freedom

2χ

Sample size
(Chi-Sq only)

Learning check!

CHI SQUARE TEST OF
INDEPENDENCE

THE CHI-SQUARE STATISTIC

 Working with categorical variables
 Two types of hypothesis tests
 χ2 Goodness-of-fit test
 χ2 Test of independence

 The Goodness of fit test
 We have one variable
 We test whether observed frequencies (proportions) match

expected or hypothesized frequencies

THE CHI-SQUARE STATISTIC

 What if we have more than one variable?

 What if we have questions about the relationship
between two categorical variables?
 Are women more likely than men to prefer Coke to Pepsi?
 Do students vs. faculty differ in their opinion

about raising student fees (yes/no)?

 Need the Chi-square test of independence

L l oy d , H u g e nber g, & c o l l e agues

RESEARCH SPOTLIGHT:
ARE THERE SYSTEMATIC

DIFFERENCES IN WHAT
KIND OF SELFIES MEN

AND WOMEN POST?
• A ng l e r e l a ted to p owe r
• A ng l e r e l a ted to g e nd er
• Powe r r e l a ted to g e nd er
• D ow ns t r e am c o ns e qu e nces ?

STUDY 1 METHOD

 Compiled 932 selfies from www.iconosquare.com
(Instagram)

 4 trained raters
 Judged target gender
 Judged whether selfie was taken below, at, or above eye level

http://www.iconosquare.com/

STUDY 1: METHOD

STUDY 1: RESULTS

Low
(below

eye level)

Neutral
(at eye
level)

High
(above eye

level)

Male 139 240 87

Female 65 230 171

X2 (2, N = 932) = 54.40, p < .001
39

STUDY 1: DISCUSSION

 Pe o ple ta ke se lfies fro m va rie d a n g les
 A n g les c h o sen dif fe r by ta rg et g e n der

RESEARCH PROBLEM

 Are people more likely to litter when the environment
is already dir ty?

 Conduct an experiment:
Hand people a flier at the entrance to a parking lot
Parking lot is either dirty or clean
Measure whether person throws flier on the ground

 Is there a significant association between cleanliness of
the environment and littering?

 Kind of like a correlation, but for categorical variables

THE CHI-SQUARE STATISTIC

 Need a new chi-square statistic

 The Chi-Square test of independence
 Tests whether two categorical variables are related to each

other
 Whether two variables “depend” on each other
 Done by comparing frequencies (counts)

χ2 TEST OF INDEPENDENCE

 Test of independence uses frequency data from a sample
to test hypotheses about propor tions in a population.

 Each individual is classified into one categor y based on
the combination of two variables
 Are women more likely than men to prefer Coke to Pepsi?
 Do students vs. faculty differ in their opinion

about raising student fees (yes/no)?

 Simply count how many people in the sample are in each
categor y

χ2 TEST OF INDEPENDENCE

 H O states that the two variables ARE NOT related
 Assumes that frequencies (proportions) on one variable are the

same across levels of the other variable

 H 1 states that the two variables ARE related

 χ2 then compares obser ved frequencies (from the
sample) to expected frequencies

 Expected frequencies are computed from
sample data

χ2 TEST OF INDEPENDENCE

 Why is it called “test of independence”?

 We test whether the frequencies (proportions) on
one variable are independent from another variable

COMPUTING χ2

 Same formula for χ2

 f o = o b s e r v e d f r e q u e n c y ( f o r e a c h c e l l )
 f e = e x p e c te d f r e q u e n c y ( f o r e a c h c e l l )
 B u t g e t t i n g e x p e c te d f r e q u e n c i e s ( f e) a r e a b i t m o r e

c o m p l i c a te d !

∑
−

=
e

f
ff 22 )(χ

IMPORTANT!!!
This is different
from Goodness
of Fit method!!!

COMPUTING χ2

 Calculate χ2

 w h e r e f o = o b s e r v e d f r e q u e n c y
f e f o r e a c h c e l l i s :

f c = c o l u m n to t a l

f r = r o w to t a l
n = to t a l s a m p l e s i z e

∑
−

=
e

f
ff 22 )(χ

n
ff

f rce =

HYPOTHESIS TESTING STEPS

Step 1: State the statistical hypotheses

Step 2: Create a decision rule

Step 3: Collect data and compute “obser ved”
test statistic

Step 4: Make a decision

Step 5: Repor t and summarize your results

RESEARCH PROBLEM

 A r e p e o p l e m o r e l i kely to l i t ter w h e n t h e e nv i ro nm ent
i s a l r e ady d i r t y ?

 C o nd u c t a n ex p erime nt:

 Hand people a flier at the entrance to a parking lot

 Parking lot is either dirty or clean

 Measure whether person throws flier on the ground

 Is there a significant association between cleanliness of
the environment and littering?

RESEARCH PROBLEM

 A r e p e o p l e m o r e l i kely to l i t ter w h e n t h e e nv i ro nm ent i s
a l r e ady d i r t y ?

 D a t a f ro m 10 0 p a r t i cipant s :

 Is there a significant relationship between cleanliness
of the environment and littering behavior?

Subject’s response

Environment No Litter Litter
Clean 45 5
Dirty 30 20

Total n = 100

STATISTICAL HYPOTHESES

 State the Statistical Hypotheses
H0: There is no relationship between cleanliness of

the environment and littering

H1: There is a predictable relationship between
cleanliness of the environment and littering

FIND CRITICAL VALUE

 Create Decision Rule (find critical value)
Two pieces of information needed
 α level
 df = (R-1)(C-1)

(where R=number of rows, C = number of columns)

Critical value from Table
 α = .05
 df = (2 -1)(2-1) = 1
 Critical value = 3.84

CRITICAL VALUE OF χ2

χ2
3.84

Decision Rule: If observed χ2 equals or exceeds 3.84, then reject Ho

COMPUTE OBSERVED χ2

 Calculate χ2

 w h e r e f o = o b s e r v e d f r e q u e n c y
f e f o r e a c h c e l l i s :

f c = c o l u m n to t a l
f r = r o w to t a l
n = to t a l s a m p l e s i z e

∑
−

=
e

f
ff 22 )(χ

n
ff

f rce =

IMPORTANT!!!
This is different
from Goodness
of Fit method!!!

Subject’s response Row totals

Environment No Litter Litter
Clean 45 5 50
Dirty 30 20 50

Column totals 75 25
Total n = 100

COMPUTE EXPECTED
FREQUENCIES

 Obser ved frequencies with row and column totals:

 Next, compute expected frequency for each cell

n
ff

f rce =
55

COMPUTE EXPECTED
FREQUENCIES

 Expected frequency for each cell:

Subject’s response Row totals

Environment No Litter Litter
Clean 37.5 12.5 50
Dirty 37.5 12.5 50

Column totals 75 25 Total n = 100

n
ff

f rce =
5.37

100
5075

, ==
×

NoLitterClean 5.12
100

5025
, ==

×
LitterClean

5.37
100

5075
, ==

×
NoLitterDirty 5.12100

5025
, ==

×
LitterDirty

IMPORTANT!!!
This is different
from Goodness
of Fit method!!!

COMPUTE OBSERVED χ2

 Calculate χ2 ∑
−

=
e

f
ff 22 )(χ

5.12
)5.1220(

5.37
)5.3730(

5.12
)5.125(

5.37
)5.3745( 22222 −+

−
+

−
+∑

−
=χ

= 1.5 + 4.5 + 1.5 + 4.5 = 12.00

Subject’s response

Environment No Litter Litter
Clean 45/37.5 5/12.5
Dirty 30/37.5 20/12.5

MAKE A DECISION

 Make a decision
Reject Ho
Because observed χ2 (12.00) exceeds the

critical value (3.84)

 Repor t Results
 “Results revealed a significant association between cleanliness

of the environment and people’s tendency to litter, χ2 (1, N =
100) = 12.0, p < .05. Participants were much more likely to
litter in a dirty environment (40%) than in a clean environment
(10%).”
 The sample data suggest that there is a significant association

between cleanliness of the environment and people’s tendency
to litter. When people were in a dirty environment they were
much more likely to litter (40%) compared to when they were
in a clean environment (10%).

 Where did I get 40% and 10%?
 20/50 littered in the dirty condition = 40%
 5/50 littered in the clean condition = 10%

REPORT AND SUMMA RIZE FIND INGS

REPORTING A χ2

 A closer look…

χ2(1, N = 100) = 12.0, p < .05, two-tailed

Test
statistic

Observed
value

alpha
level Other tests:

One or two
tailed

All Chi-Sq
tests are one-

tailed!

Degrees of
freedom

Sample size
(Chi-Sq only)

COHEN’S W

 C o h e n’ s w c a n b e u s e d to m e a s ur e e f fec t s i z e fo r b ot h t y p e s
o f c h i – s quar e te s t s :

 C o h e n s u g ges ted t h a t .10 i s a s m a l l e f fe ct , . 3 0 a m e d ium
e f fec t , a nd . 5 0 a l a r g e e f fec t .

 C o h e n’ s w d o e s not u s e t h e s a m p le s i z e , t h e r e fo re t h e s a m p le
s i z e d o e s not a f fe ct t h e va l u e o f w.

∑
−

=
e

P
PP

w
2)(

n
f

P
o

o =

Observed
proportion

THE PHI-COEFFICIENT

 Fo r a 2 x 2 m a t r i x , t h e p h i c o e f fi c ient (Φ) m e a s ur es t h e
s t r e ng t h o f t h e r e l a tio ns hip

So Φ2 is the proportion of variance
accounted for, just like r2n

φ
χ

EFFECT SIZE IN A LARGER MATRIX

 Fo r a l a r g er m a t r i x , a m o d i fi cat io n o f t h e
p h i – c o ef fic ient i s u s e d: C r a m er ’ s V

 d f * i s t h e s m a l ler o f ( R – 1 ) o r ( C – 1 )

*)(

dfn
V

χ
=

ASSUMPTIONS & R ESTRICTIONS
FOR CHI-SQUARE TEST S

 Independence of Obser vations
 E.g., The observation that Subject A is a Chemistry

major must be independent from the observation
that Subject B is an English major
 Random sampling

 Each observed frequency needs to come from a
different participant
 What if people can be double-majors?

ASSUMPTIONS & R ESTRICTIONS
FOR CHI-SQUARE TEST S

 S i z e o f E x p ec ted Fr e quenc ies
 Cochran’s Rule: Cell frequencies should all be > 5

 More lenient updates to the rule:
 No expected cell frequency should be less than 1
 No more than 20% of the expected cell frequencies should be less than 5
 Note: For a 2×2 matrix this means a single cell

 Solutions?
 Increase your sample size
 Consider collapsing categories together (should be done with caution –

can make it more dif ficult to reject H0)

ASSUMPTIONS & R ESTRICTIONS
FOR CHI-SQUARE TEST S

 E . g . , E x p ec ted Fr e qu enc ies

Teens Young
Adults

Middle
Aged

Seniors

Liberal 20 18 9 3

Conservative 4 12 2 8

Young Old

Liberal 38 12

Conservative 16 10

Learning check!

BACK TO THE BIG
PICTURE

What type of
claim?

Frequency
(one variable)

Chi-Sq
Goodness

Association
(two variables)

Categorical? Chi-Sq Ind

Quantitative? Correlation

HOW TO CHOOSE A TEST

 D a t a p ro j e ct
 *** Important! Updated Data File as of today, March 14

TO-DO

Psy 202H1: �Statistics iI��Module 8: �Intro to Chi Square�
Game Plan
Introduction to �Chi-Square Analysis
Parametric vs Nonparametric tests
Slide Number 5
The Chi-Square Statistic
The Chi-Square Statistic
2 Test for Goodness-of-Fit
2 Test for Goodness-of-Fit
2 Test for Goodness-of-Fit
Research Spotlight: Have we reached gender parity in Tech companies?
Gender parity in tech
Gender parity in Tech
Research Problem
Research Problem
Step 1: Statistical Hypotheses
Step 2: Find Critical Value
Critical value of 2
Step 3: Compute observed 2
Compute observed 2
Compute observed 2
Compute observed 2
Compute observed 2
Compute observed 2
Compute observed 2
Compute observed 2
Compute observed 2
Compute observed 2
Compute observed 2
Step 4: Make a decision
Step 5: Report results
Reporting a
Chi Square Test of Independence
The Chi-Square Statistic
The Chi-Square Statistic
Research Spotlight: Are there systematic differences in what kind of selfies men and women post?
Study 1 Method
Study 1: Method
Study 1: Results
Study 1: Discussion
Research Problem
The Chi-Square Statistic
2 Test of Independence
2 Test of Independence
2 Test of Independence
Computing 2
Computing 2
Hypothesis testing steps
Research Problem
Research Problem
Statistical Hypotheses
Find Critical Value
Critical value of 2
Compute observed 2
Compute expected frequencies
Compute expected frequencies
Compute observed 2
Make a decision
Report and Summarize Findings
Reporting a 2
Cohen’s w
The Phi-Coefficient
Effect Size in a Larger Matrix
Assumptions & Restrictions� for Chi-Square Tests
Assumptions & Restrictions� for Chi-Square Tests
Assumptions & Restrictions� for Chi-Square Tests
Back to the Big pIcture
How to Choose A Test
To-Do

TD0409-01 课件/Psy 202_6_correlation_W22_topost.pdf

In s t ruc to r :
Dr. M o l l y M et z

PSY 202H1:
STATISTICS I

MODULE 6:
INTRO TO CORRELATION

1. Intro to Correlation
2. Hypothesis Testing with Correlation
3. What are correlations used for?
4. Interpreting Correlation

1. Issues to look out for

GAME PLAN

INTRO TO CORRELATION

THE BIG PICTURE

Single
score

1 IV

z score

z test

One sample t-
test

Making comparisons
to population (NO IVs)

Sample
mean

σ known σ unknown

Making comparisons
between levels of IV(s)
or groups

More than 1 IV

2 levels 3+ levels IV

Between
subjects

Within
subjects

Independent
samples t-test

Paired
samples t-test

Between
subjects

Within
subjects

One-Way
Between
ANOVA

One-Way
Repeated
ANOVA

All IVs
Between
subjects

All IVs
Within
subjects

Mix of
within and
between

Between subj
Factorial
ANOVA

Repeated
Measures
Factorial ANOVA

Mixed Model
Factorial
ANOVA 4

Statistical Test IV DV

Correlation/Linear
regression

Quantitative Quantitative

Independent
samples t-test

Two independent
categories

Quantitative

Paired sample
t-test

Two related groups Quantitative

ANOVA Multiple categories Quantitative

Chi-square Two or more
categories

Categorical

RESEARCH PROBLEM

 What is the relationship between hours studying and
scores on a quiz?
 Conduct a non-experimental study
 n = 6 students

 Measure hours studying for an exam (X)

 Record each student’s quiz score (Y)

 Examine association between hours studying and quiz
scores

 Does study time predict quiz scores?

RESEARCH PROBLEM

 Correlation
 Direction and strength of an association between two variables

(X,Y)

 Typically (but not only) used in non-experimental research
(variables are measured, not manipulated)

 Other examples:
 Relationship between stressful life events (X) and number of

illness symptoms (Y)

 Relationship between years of education (X) and yearly income (Y)

TOOLS FOR CORRELATION

 The Scatterplot
 A figure

 Shows association between two variables

 The Pearson correlation coef ficient
 A statistic

 Describes the direction and strength of a linear association
between two continuous variables

THE SCATTERPLOT

 Hours studying and quiz scores

Student
Study Hrs

(X)
Test Score

(Y)
A 1 1
B 1 3
C 3 4
D 4 5
E 6 4
F 7 6

n = 6 people,
6 pairs of

scores
n =6

THE SCATTERPLOT

H o u r s s t u d y i n g a n d q u i z s c o r e s Relationship Between Hours
Studying and Quiz Score

0 1 2 3 4 5 6 7 8

Hours Studying (X)

Q
ui

z
S

co
re

(Y
)

Student Hours
(X)

Score
(Y)

A 1 1
B 1 3
C 3 4
D 4 5
E 6 4
F 7 6

THE SCATTERPLOT

H o u r s s t u d y i n g a n d q u i z s c o r e s Relationship Between Hours
Studying and Quiz Score

0 1 2 3 4 5 6 7 8

Hours Studying (X)

Q
ui

z
S

co
re

(Y
)

Student Hours
(X)

Score
(Y)

A 1 1
B 1 3
C 3 4
D 4 5
E 6 4
F 7 6

THE SCATTERPLOT

H o u r s s t u d y i n g a n d q u i z s c o r e s Relationship Between Hours
Studying and Quiz Score

0 1 2 3 4 5 6 7 8

Hours Studying (X)

Q
ui

z
S

co
re

(Y
)

Student Hours
(X)

Score
(Y)

A 1 1
B 1 3
C 3 4
D 4 5
E 6 4
F 7 6

THE SCATTERPLOT

H o u r s s t u d y i n g a n d q u i z s c o r e s Relationship Between Hours
Studying and Quiz Score

0 1 2 3 4 5 6 7 8

Hours Studying (X)

Q
ui

z
S

co
re

(Y
)

Student Hours
(X)

Score
(Y)

A 1 1
B 1 3
C 3 4
D 4 5
E 6 4
F 7 6

THE SCATTERPLOT

H o u r s s t u d y i n g a n d q u i z s c o r e s Relationship Between Hours
Studying and Quiz Score

0 1 2 3 4 5 6 7 8

Hours Studying (X)

Q
ui

z
S

co
re

(Y
)

Student Hours
(X)

Score
(Y)

A 1 1
B 1 3
C 3 4
D 4 5
E 6 4
F 7 6

Scatterplot

Height (in)

W
ei

gh
t

(lb
s.

)

48 54 60 66 72 78 84

7
0

0
0

3
0

6
0

9
0

1
0

SEEING RELATIONSHIPS

SUMMARIZING RELATIONSHIPS

Height (in)

W
ei

gh
t

(lb
s.

)

48 54 60 66 72 78 84

7
0

0
0

3
0

6
0

9
0

1
0

Linear
relationship

Describes variables that
can be well-represented
by a straight line (i.e.,
there is a common ratio
between a score on one
and a score on the other)

Height (in)

W
ei

gh
t

(lb
s.

)

48 54 60 66 72 78 84

7
0

0
0

3
0

6
0

9
0

1
0

SUMMARIZING RELATIONSHIPS

Grade Point Average

P
ar

ty
h

ou
rs

(w
ee

1.0 2.0 3.0 4.0

0
5

1
0

1
5

SUMMARIZING RELATIONSHIPS

Ex
am

p
er

fo
rm

an
ce

Curvilinear
relationship

Reported Anxiety

 h t t p : / / w w w. p e w r e s e a r c h . o r g / f a c t – t a n k / 2 01 5 / 0 9 / 1 6 / t h e – a r t – a n d – s c i e n c e – o f –
t h e – s c a t t e r p l o t /

NOT A GIVEN…

http://www.pewresearch.org/fact-tank/2015/09/16/the-art-and-science-of-the-scatterplot/

DESCRIBING RELATIONSHIPS

When we talk about statistical relationships,
we begin by assessing the covariance, or
degree to which two variables var y together.

This statistic is used as the basis for the
correlation coefficient, a statistic that
measures the relationship between variables.
 Pearson’s product-moment correlation: r
 Spearman’s rank-order correlation: r s
 Point-biserial correlation: rpb

THE CORRELATION COEFFICIENT:
BASICS

 Pear son Correlation Coef ficient
 Symbol: r

 Ranges from -1.0 to +1.0

 Sign (+/-) indicates “direction” of relationship

 Value indicates “strength” of relationship

• Some general guidelines
• .10 is weak
• .30 is moderate
• .50 is strong

 Measures a linear relationship only

Remember: r2 guidelines
• .01 weak
• .09 moderate
• .25 strong

THE CORRELATION COEFFICIENT

Figure 16-3 (p. 524). Examples of positive and negative relationships. (a) Beer sales are
positively related to temperature. (b) Coffee sales are negatively related to temperature.

Positive Correlation
X = Temperature
Y = Beer Sales

Negative Correlation
X = Temperature
Y = Coffee Sales

THE CORRELATION COEFFICIENT

Figure 16-5 (p. 525). Examples of different values for linear correlations: (a) shows a strong positive
relationship, r = +.90; (b) shows a moderate negative correlation, r = –.40; (c) shows a perfect negative
correlation, r = –1.0; (d) shows no linear trend, r = 0.0.

r = +.90

r = −1.0

r = −.40

r = 0

How closely
do the dots

hug the
line?

COMPUTING R

r = degree to which X & Y vary together
degree to which X & Y vary separately

r = Covariance of X & Y
Variance of X & Y

COVARIABILIT Y OF X AND Y

YX XY

Variance in
X alone

Variance in
Y alone

Covariance between
X and Y

• The greater the covariance, the greater the correlation (the closer r will be to ±1.0)
26

COMPUTING R

 Computational formulas for Pearson r

SP =
n

YX
XY ∑ ∑∑ −

SSX = n
X)(

X
2

2 ∑∑ − SSY =
n
Y)(

Y
2

2 ∑∑ −

r =
YXSSSS

Where:

• SP = “Sum of products”

• SS = “Sum of squares”

SP = similar
to SS, but for
COvariance

Learning check! 27

HYPOTHESIS TESTING FOR R

 State the research question
 Is there a significant linear association between X & Y?

 Is r significantly different from zero?

ρ = “rho” the population parameter

 r = sample statistic

HYPOTHESIS TESTING FOR R

 Step 1: Statistical Hypotheses for r
 Almost always two-tailed (non-directional)
 H0: ρ = 0
 H1: ρ ≠ 0

 One-tailed upper (directional)
 H0: ρ ≤ 0
 H1: ρ > 0

 One-tailed lower (directional)
 H0: ρ ≥ 0
 H1: ρ < 0

HYPOTHESIS TESTING FOR R

 Step 2: Find critical value of r (Table)
 Need 3 pieces of information:
 α

 One-tailed or two-tailed?
 degrees of freedom: df = n−2

HYPOTHESIS TESTING FOR R

 Step 2: Find critical value of r (Table)
 Need 3 pieces of information:
 α

 One-tailed or two-tailed?
 degrees of freedom: df = n−2

 Step 3: Compute obser ved r
 Step 4: Make a decision
 Reject H0 if observed r exceeds rcritical

 Step 5: Summarize and repor t findings

LET’S PRACTICE!

 Research question
 Is there a significant linear association between hours

studying and quiz score?

 Is r significantly different from zero?

 Step 1: Statistical Hypotheses

 H0: ρ = 0
 H1: ρ ≠ 0

Step 2: Find rcritical in Table α = .05
 Two-tailed
 df = n−2; df = 6−2 = 4

LET’S PRACTICE!

 Research question
 Is there a significant linear association between hours studying

and quiz score?

 Is r significantly different from zero?

 Step 1: Statistical Hypotheses

 H0: ρ = 0
 H1: ρ ≠ 0

 Step 2: Find rcritical in Table α = . 0 5
 Two-tailed
 df = n−2; df = 6−2 = 4
 From table  rcrit = ±.811

LET’S PRACTICE

 Step 3: Compute obser ved r

 Steps in computing r:

 Compute SSX
 Compute SSY
 Compute SP

 Compute r

LET’S PRACTICE

 H o u r s s t u d y ing a nd qu i z s c o r e s

Student
Hours

(X)
Score

(Y)

A 1 1
B 1 3
C 3 4
D 4 5
E 6 4
F 7 6

n =6 ΣX =22

SSX =
n
X)(

X
2

2 ∑∑ −

LET’S PRACTICE

 H o u r s s t u d y ing a nd qu i z s c o r e s

Student
Hours

(X)
Score

(Y) X2

A 1 1 1
B 1 3 1
C 3 4 9
D 4 5 16
E 6 4 36
F 7 6 49

n =6 ΣX =22 ΣX2 =112

SSX = n
X)(

X
2

2 ∑∑ −

COMPUTING R

 Compute SSx

SSX = n
X)(

X
2

2 ∑∑ −

SSX = 333.316
22

112
2
=−

LET’S PRACTICE

 H o u r s s t u d y ing a nd qu i z s c o r e s

Student
Hours

(X)
Score

(Y) X2

A 1 1 1
B 1 3 1
C 3 4 9
D 4 5 16
E 6 4 36
F 7 6 49

n =6 ΣX =22 ΣY =23 ΣX2 =112

SSY = n
Y)(

Y
2

2 ∑∑ −

LET’S PRACTICE

 H o u r s s t u d y ing a nd qu i z s c o r e s

Student
Hours

(X)
Score

(Y) X2 Y2

A 1 1 1 1
B 1 3 1 9
C 3 4 9 16
D 4 5 16 25
E 6 4 36 16
F 7 6 49 36

n =6 ΣX =22 ΣY =23 ΣX2 =112 ΣY2 =103

SSY = n
Y)(

Y
2

2 ∑∑ −

COMPUTING R

 Compute SSY

SSY = n
Y)(

Y
2

2 ∑∑ −

SSY = 833.146
23

103
2
=−

LET’S PRACTICE

 H o u r s s t u d y ing a nd qu i z s c o r e s

Student
Hours

(X)
Score

(Y) X2 Y2

A 1 1 1 1
B 1 3 1 9
C 3 4 9 16
D 4 5 16 25
E 6 4 36 16
F 7 6 49 36

n =6 ΣX =22 ΣY =23 ΣX2 =112 ΣY2 =103

SP = n
YX

XY ∑ ∑∑ −

LET’S PRACTICE

 H o u r s s t u d y ing a nd qu i z s c o r e s

Student
Hours

(X)
Score

(Y) X2 Y2 XY

A 1 1 1 1 1
B 1 3 1 9 3
C 3 4 9 16 12
D 4 5 16 25 20
E 6 4 36 16 24
F 7 6 49 36 42

n =6 ΣX =22 ΣY =23 ΣX2 =112 ΣY2 =103 ΣXY =102

SP = n
YX

XY ∑ ∑∑ −

COMPUTING R

 Compute SP:

SP =
n

YX
XY ∑ ∑∑ −

SP = 667.17
6

)23)(22(
102 =−

COMPUTING R

 Finally, compute r!

r =
YXSSSS

r = 819.571.21
667.17

4.833)(31.333)(1
17.776

+==

LET’S PRACTICE!

Step 4: Make a Decision

 Reject H0: robs (+.819) exceeds rcrit (±.811)

Step 5: Summarize and repor t finding

“There was a statistically significant positive correlation
between hours studying and quiz scores, r(4) = .82, p <
.05, two-tailed, r2 = .67. Students who studied longer
earned higher scores on the quiz.”

Notice: No causal
language!

LET’S PRACTICE!

Compute r2 ( “ c o e f fic ient o f d eter minat io n”)

 Effect size

 r2 = .8192 = .67

 67% of the variance in quiz scores is explained by
hours studying (and vice versa)

REPORTING AN R

 A closer look…

r(4) = .82, p < .05, two-tailed, r2 =.67
Test

statistic

Observed
value

alpha
level One- or two-

tailed
Degrees of

freedom

Effect size

 Quantitative data
 Independent obser vations
 Random sampling
 Linear relationship

ASSUMPTIONS FOR PEARSON’S R

Learning check!
50

SPOTLIGHT ON T WIN STUDIES

WHAT ARE
CORRELATIONS FOR?

COMMON USES FOR CORRELATIONS

Prediction
Note: this is NOT causal language

Measurement assessment
Validity (accuracy)
Reliability (consistency)

VARIOUS USES OF CORRELATIONS

 P r e dic t io n  I f we k now t h a t t wo va r i a bles a r e r e l a ted to o ne
a not h e r, we c a n u s e k n owl e d ge ab o u t o n e var iable to m a ke
p r e d ic tio ns a b o u t t h e va l u e o f t h e ot h e r va r i abl e
 E.g., How tall do you think my niece is? Does it help if I tell you that

she just turned 5?

VARIOUS USES OF CORRELATIONS

 Va l i dit y o f m e a s ur es
 Convergent validity: How strongly does the measure correlate with

other measures of the same construct?
 E.g., Does the self-esteem measure you’ve just constructed correlate

positively with existing self-esteem measures? (good thing)

 Discriminant validity: How strongly does the measure correlate with
measures of unrelated constructs?
 E.g., Does the self-esteem measure you’ve just constructed correlate

positively with measures of unrelated constructs (e.g., mood)? (bad thing)

VARIOUS USES OF CORRELATIONS

 Re l ia bili ty o f m e a s ures
 Reliable measures should produce consistent, stable results
 E.g., If you are measuring IQ, or a personality trait, or any other

construct where you expect stable results, you would expect a
person’s scores from any two measurement sessions to be highly
correlated

VARIOUS USES OF CORRELATIONS

 T h eo r y Ve r ifica t io n  M a ny p s yc ho lo gic al t h e o r i es i nvo l ve
s p e c ific p r e d ic tio ns a b o u t t h e r e l a t io ns hip b et ween t wo
va r i abl es
 One way these predictions can be tested is by determining the

correlation between the two variables
 E.g., The General Aggression Model predicts positive relationships

between recent exposure to violent media and a host of aggression-
related variables (hostile expectancies, aggressive cognitions,
physiological arousal, etc.)

INTERPRETING
CORRELATION

PROCEED WITH CAUTION…

1. Correlation is sensitive to outliers

2. Correlation is only appropriate for describing
linear relationships

3. Correlation is sensitive to restriction of range
(lack of generalization)

4. Beware of heterogeneous samples

5. Correlation does not imply causation

1 . SENSITIVE TO OUTLIERS

222120191817

0
-2

r = -.10

605040302010

-10

r = .94

• An outlier is an extremely deviant individual in the sample
• Characterized by a much larger (or smaller) score than all the others in the sample
• In a scatter plot, the point is clearly different from all the other points
• Outliers produce a disproportionately large impact on the correlation coefficient

2. LINEAR RELATIONSHIPS ONLY

r = .10

Reported anxiety

Ex
am

p
er

fo
rm

an
ce

3. RESTRICTION OF RANGE

Score on IQ test

S
co

re
o

n
ge

ne
ra

l m
at

h
te

r = .82

75 80 85 90 95 100 105 110 115 120 125 130

r = -.13

105 110 115 120 125

4. HETEROGENEOUS SAMPLES

r = – .70

Reported Anxiety

P
er

fo
rm

an
ce

o
n

Ex
am

AND NOW FOR SOME EXAMPLES…

We should all be texting while at Church (and
also having unprotected sex)!

Thinking of cleaning as women’s work is
actually better for both men and women
(especially if women do more housework, to
cut their risk of cancer)!

Any chance there is a problem here???

SO WHAT HAVE WE LEARNED?

5. CORRELATION IS NOT
CAUSATION

Ice cream sales

D
ea

th
b

y
D

ro
w

ni
ng

If X and Y are correlated:

… does X cause Y?

… does Y cause X?

… does Z cause X and Y?
71

NAME THAT CORRELATION…

WHAT’S WRONG WITH THIS
PICTURE?

r = – .80

WHAT’S WRONG WITH THIS
PICTURE?

r = .85

WHAT’S WRONG WITH THIS
PICTURE?

r = .10

WHAT’S WRONG WITH THIS
RESEARCH?

“The data showed a strong and highly significant
positive correlation between date of onset of
sexual activity and current level of sexual activity
(r = 0.75, p < .01), suggesting that teenagers
who begin having sex at an earlier age are more
promiscuous in college as a result.”

“The negative correlation coef ficient shows that
there is no relationship between these traits.”

“The correlation was significant (r = -1 .22)…”

h t t p : / / gues st hec o rr elat io n. c o m/

MORE PRACTICE?

 C a n m o r e e a s i ly
i d e nt if y i s s u e s
t h a t m i g ht
i nte r fer e w i t h yo u r
a b i li ty to i nte rpr et
yo u r d a t a

PROTIP: LOOK AT A SCATTERPLOT FIRST

ALTERNATIVES TO THE
PEARSON CORRELATION

Pearson correlation has been developed
For data having linear relationships
With data from interval or ratio measurement scales

Other correlations have been developed
For data having non-linear relationships
With data from nominal or ordinal measurement

scales
 Point-biserial
 Spearman’s correlation

SUMMARY

Correlations var y in type and magnitude
Errors are commonly made when interpreting

correlations
Look at a scatterplot!

h t t p s : / /kot aku. c o m/ anto nin – s ca lias- l andmark – defens e- o f – v io lent –
v i d e o – games – 17 58 99 036 0

EVEN THE US SUPREME COURT
KNOWS WHAT’S UP

https://kotaku.com/antonin-scalias-landmark-defense-of-violent-video-games-1758990360

And remember for the rest of your life:
Correlation does NOT equal causation!

Practice interpreting correlations on the
discussion board

JUST TO COMPLICATE
THINGS A LITTLE…

…BUT SOMETIMES IT KIND OF IS

https://www.youtube.com/watch?v=HUti6vGctQM&fbclid=IwAR2orZs_ECdn0
94_eSkyyp-1ZKXWtIv3USW2PL6N9oZunqIBY1nlTuUxAh4

https://www.youtube.com/watch?v=HUti6vGctQM&fbclid=IwAR2orZs_ECdn094_eSkyyp-1ZKXWtIv3USW2PL6N9oZunqIBY1nlTuUxAh4

 “In e s s e n c e , to l o g i c al l y i n fe r t h at X c aus e d Y, we n e e d to me et
t h re e re q ui re me n t s :
 We must know that X preceded Y. It is not possible for a cause to follow

or even coincide with an ef fect. It must come before it, even if it is
fractions of a second.
 X must covar y with Y. In other words, Y must be more likely to

occur when X occurs than when X does not occur.
 The relationship between X and Y is free from confounding. What this

means is that no other variable also covaries with X when #1 and #2 are
met.”

 W h at abo ut w h e n a t rue ex pe ri me n t i s n ot po s s i b l e ? Gi ve up?

 It may be mo re us e f ul to t h i n k o f c aus al i t y o n a c o n t i n uum
rat h e r t h an as a di c h oto mo us o ut c o me

 Se e mo re : h t t p: / / i c bs eve r y w h e r e . c o m/ bl o g / 2 01 4 / 10 / t h e -l o g i c –
o f – c aus al -c o n c l us i o n s /

…BUT SOMETIMES IT KIND OF IS

http://icbseverywhere.com/blog/2014/10/the-logic-of-causal-conclusions/

 Ke e p u p w i t h t u to r i als !
 Data project information: coming soon

TO-DO

Psy 202H1: �Statistics I��Module 6: �Intro to Correlation ��
Game Plan
Intro to Correlation
Slide Number 4
Slide Number 5
Research Problem
Research Problem
Tools for Correlation
The Scatterplot
The Scatterplot
The Scatterplot
The Scatterplot
The Scatterplot
The Scatterplot
Seeing relationships
Summarizing relationships
Summarizing relationships
Summarizing relationships
Summarizing relationships
Not a given…
Describing relationships
The Correlation Coefficient: Basics
The Correlation Coefficient
The Correlation Coefficient
Computing r
Covariability of X and Y
Computing r
Hypothesis testing for r
Hypothesis testing for r
Hypothesis testing for r
Slide Number 31
Hypothesis testing for r
Let’s Practice!
Slide Number 34
Let’s Practice!
Let’s Practice
Let’s Practice
Let’s Practice
Computing r
Let’s Practice
Let’s Practice
Computing r
Let’s Practice
Let’s Practice
Computing r
Computing r
Let’s Practice!
Let’s Practice!
Reporting an r
Assumptions for Pearson’s R
Spotlight on Twin Studies
What are correlations for?
Common Uses for Correlations
Various Uses of Correlations
Various Uses of Correlations
Various Uses of Correlations
Various Uses of Correlations
Interpreting Correlation
Proceed with caution…
1. Sensitive to outliers
2. Linear relationships only
3. Restriction of range
4. Heterogeneous samples
And now for some examples…
Slide Number 65
Slide Number 66
Slide Number 67
Slide Number 68
Slide Number 69
Slide Number 70
5. Correlation is not causation
Name that correlation…
Name that correlation…
Name that correlation…
Name that correlation…
What’s wrong with this picture?
What’s wrong with this picture?
What’s wrong with this picture?
What’s wrong with this research?
More Practice?
ProTIP: Look at a Scatterplot First
Alternatives to the � Pearson Correlation
Summary
Even the US Supreme Court �Knows What’s Up
Slide Number 85
Just to complicate things a little…
…but sometimes it kind of is
…but sometimes it kind of is
To-Do

TD0409-01 课件/Psy 202_5_MoreFactorial_W22.pdf

In s t ruc to r :
Dr. M o l l y M et z

PSY 202H1:
STATISTICS II

MODULE 5:
HYPOTHESIS TESTING WITH

FACTORIAL ANOVA

1. Review
1. More on interactions and simple effects
2. Another 2 factor design

2. Hypothesis Testing with Factorial ANOVA
1. Sources of variance
2. Foundations of hypothesis test
3. Example, with numbers!

GAME PLAN

Learning check!
Factorial Design review

 Te s t w i l l b e p o s te d Tu e sd ay Fe b r uar y 1 5 9 a m a nd w i l l b e d u e
T h u r sday Fe b r uar y 17 1 1 : 5 9 p m
 This is NOT a timed test. You may start it, take a break, and return to

it.
 However, I do NOT recommend taking the whole time to complete the

test!
 It will be written as if it could be completed like an in-person test, about 2

hours (assuming you prepare for it as if it were an in-person test).

 C o nte nt: A ll r e a d ings a nd l e c t u r es t h ro u g h M o d u le 5
 Including things reviewed in text but not in lecture video

 Fo r m a t o f qu e s t i o ns m ay i nc l u de
 Multiple Choice
 Short Answer
 Computations

MIDTERM

 Pe rmi t te d re s o urc e s :
 Your book , your notes
 Simple calculator

 N OT pe rmi t te d re s o urc e s :
 Your friends/classmates
 Including any group chats, like Discord, GroupMe, Facebook , etc.

 Any other people, including but not limited to those who have taken this
course before
 Google (or any internet resources)

 IM P O R TA N T: N ot j us t W HAT but W HY ; appl i c at i o n
 Is MindTap similar to the test? Kind of…

 W i l l i t be w ri t te n to b e h arde r to make up fo r t h e f ac t t h at i t i s
o pe n bo o k ?
 No, but…

MIDTERM

27 Questions TD0409-01 课件/Psy 202_10_Replication

TD0409-01 课件/Psy 202_10_Replication and Open Science_W22-1.pdf

TD0409-01 课件/Psy 202_7_Regress_W22.pdf

TD0409-01 课件/Psy 202_4_IntrotoFactorial_W22.pdf

TD0409-01 课件/Psy 202_3_RepeatedANOVA_W22.pdf

TD0409-01 课件/Psy 202_1_Review.pdf

TD0409-01 课件/Psy 202_2_OneWayANOVA_W22.pdf

TD0409-01 课件/Psy 202_9_advanced concepts_W22_topost-1.pdf

TD0409-01 课件/Psy 202_8_Chi_Square_W22.pdf

TD0409-01 课件/Psy 202_6_correlation_W22_topost.pdf

TD0409-01 课件/Psy 202_5_MoreFactorial_W22.pdf

We offer the best custom paper writing services. We have done this question before, we can also do it for you.

Why Choose Us

How It Works