Title: Ch.12_ Dependent-Samples T (1)
Research Methods, Statistics, and
Applications, Third Edition
Chapter 12: Dependent-Groups Designs
Stephen Lippi, Ph.D.
Spring 2025, Mercer University
# Repeated-Measures Designs
A repeated-measures design, or a within-subjects
design, is one in which the dependent variable (DV)
is measured two or more times for each individual in
a single sample
The same group of subjects is used in all of the treatment
conditions
ADVANTAGE: we use exactly the same individuals in all
treatment conditions, so there is no risk that participants in one
treatment group differ from participants in the
other
# Dependent-Groups
Dependent-groups design
Powerful design; we decrease random error in
participant characteristics
Sometimes, selecting related samples can be more practical
Selecting related samples minimizes standard error
Computing difference scores prior to computing your
test statistic eliminates the between-persons source of
error, which reduces the estimate of standard error
Increase in power
Reducing our estimate of standard error increases the
value of the test statistic
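To make the standard-error claim concrete, here is a minimal sketch using made-up scores (the data and variable names are hypothetical, not from the chapter). It compares the estimated standard error when the two conditions are treated as independent groups with the standard error obtained from difference scores.

```python
import math
import statistics

# Hypothetical scores for the same five participants in two conditions.
condition_1 = [60, 72, 85, 90, 68]
condition_2 = [64, 75, 90, 93, 73]
n = len(condition_1)

# Treated as independent groups: differences between people stay in the error term.
se_independent = math.sqrt(
    statistics.variance(condition_1) / n + statistics.variance(condition_2) / n
)

# Treated as related samples: subtract within each person first,
# then compute the standard error of the difference scores.
differences = [x2 - x1 for x1, x2 in zip(condition_1, condition_2)]
se_related = statistics.stdev(differences) / math.sqrt(n)

print(f"SE treating groups as independent: {se_independent:.2f}")
print(f"SE from difference scores:         {se_related:.2f}")
```

In this illustration each person's baseline level is subtracted out, so the difference scores vary far less than the raw scores. The smaller standard error gives a larger test statistic for the same mean difference.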
# Designs with Dependent Groups
Dependent-groups designs: The participants in different
conditions are related or are the same people.
Participants can be related in two ways:
They are observed in multiple conditions (repeated-measures
design): subjects experience every
condition in a study
Most common related-samples design
They are matched, experimentally or naturally, based
on the common characteristics or traits that they
share
## Rpt. Measures and Matched-Subjects Designs
In a matched-subjects study, each individual in one sample is
matched with an individual in the other sample
The matching is done so that the two individuals are equivalent (or nearly
equivalent) with respect to a specific variable that the researcher would like to
control
In repeated-measures or matched-subjects designs, we're comparing
two treatment conditions (typically a pre- vs. post-treatment design)
Two scores are obtained for each individual or each pair of subjects
# Matched v. Rpt. Measures
Matched Designs
Measuring some trait before
matching
Intelligence
Genetics
Repeated Measures
Within-subject
Pre-post or multiple levels;
same individuals in both
treatment conditions
## Comparing Repeated- and Independent-Measures Designs
A repeated-measures design typically requires fewer
subjects than an independent-measures design
The repeated-measures design is especially well suited
for studying learning, development, or other changes
that take place over time
The primary advantage of a repeated-measures design
is that it reduces or eliminates problems caused by
individual differences
Individual differences are characteristics such as age, IQ,
gender, and personality that vary from one individual to another
These individual differences can influence the scores obtained
in a research study, and they can affect the outcome of a
hypothesis test
## Comparing Repeated- and Independent-Measures Designs
Within-groups designs are more powerful than between-
groups designs
Easier to detect differences between conditions
Reason: we're keeping extraneous differences between
participants constant across all conditions (since they're the
same people)
Power is the probability that a study will show a statistically
significant result when an IV truly has an effect in the
population
Within-groups designs also generally require fewer participants overall
# Time-Related Factors and Order Effects
The primary disadvantage of a repeated-measures design is that
the structure of the design allows for factors other than the
treatment effect to cause a participant's score to change from one
treatment to the next
Specifically, in a repeated-measures design, each individual is measured in two
different treatment conditions, often at two different times
The order in which treatments are presented can also influence scores!
One way to deal with order effects is to counterbalance the order of
presentation of treatments
That is, the participants are randomly divided into two groups, with one group
receiving treatment 1 followed by treatment 2, and the other group receiving
treatment 2 followed by treatment 1
The goal of counterbalancing is to distribute any outside effects evenly over the
two treatments
# Avoiding Order Effects
Counterbalancing (when levels of the IV are presented to
participants in different orders/sequences)
The idea behind counterbalancing is that any order effects should cancel
each other out when all the data are collected
Two types:
Full counterbalancing - occurs when all the possible condition orders are
presented; example: with two conditions, there are two orders; with three
conditions, there are six orders
Partial counterbalancing (example: Latin square) - occurs when only some
of the possible condition orders are used; example: a researcher could
present a randomized order for each participant or use a Latin square (in
which each condition appears in each position at least once)
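The two types of counterbalancing described above can be sketched in a few lines. The function names are hypothetical, and the cyclic rotation used here is only one simple way to build a Latin square. It puts each condition in each position exactly once, but it does not balance which condition immediately precedes which.

```python
from itertools import permutations


def full_counterbalancing(conditions):
    """Return every possible order of the conditions."""
    return [list(order) for order in permutations(conditions)]


def latin_square(conditions):
    """Return a cyclic Latin square: row i is the condition list rotated by i."""
    k = len(conditions)
    return [[conditions[(row + pos) % k] for pos in range(k)] for row in range(k)]


conditions = ["A", "B", "C"]
orders = full_counterbalancing(conditions)
square = latin_square(conditions)

print(f"Full counterbalancing uses {len(orders)} orders:")
for order in orders:
    print("  ", " ".join(order))

print(f"A Latin square uses {len(square)} orders:")
for row in square:
    print("  ", " ".join(row))
```

Participants would then be assigned (ideally at random) to one of the listed orders, in roughly equal numbers per order.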
# Counterbalancing Examples
Full: In a repeated-measures
design with 3 conditions (A, B,
C), we get a total of 6 possible
sequences
Partial: Only some of the
possible condition orders are
used; present the
conditions in a randomized
order for every subject or use a
Latin square design
## Disadvantages of within-groups designs
Three main disadvantages:
1) Potential for order effects
Solution: counterbalancing
2) Might not be practical or possible
Ex: Comparing methods on how to teach children to ride a bike
3) Experiencing all levels of the IV changes the way participants act
(demand characteristics)
Demand characteristics (aka experimental demand) occur
when participants pick up on cues that lead them to guess the
experimenter's hypothesis
# Other potential disadvantages
Dependent-groups designs may not be effective
in situations where participants are changed in
some permanent way
Carryover effect (impact of treatment lasts
longer than the time between conditions)
Practice effect (participants' performance
improves as a result of repetition)
Fatigue effect (participants' performance
worsens as a result of repetition)
## The t statistic for a repeated-measures research design
The t statistic for a repeated-measures design is structurally
similar to our one-sample t test!
The major distinction of the related-samples t test is that it is
based on difference scores rather than raw scores (X values)
# Difference Values
Difference score - score or value obtained by
subtracting two scores
In a related-samples t test, this is obtained prior to
computing the test statistic
We subtract pairs of scores first, then compute the
test statistic, which eliminates the source of error
associated with observing different participants in
each group or treatment
# Formulas and Calculations
$M_D$ = Mean difference, or the average difference
between the scores of matched pairs or the scores
for the same participants across two conditions;
computed by subtracting one score of each matched
or repeated pair from the other score, summing the differences, and dividing
by $n_D$ (the number of difference scores)
$s_{MD}$ = Standard error of the mean difference, or
the standard deviation of the sampling distribution of
mean differences when $\mu_D = 0$
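Written out as formulas, the two definitions above look like this. The notation follows the chapter's worked example ($n_D$ is the number of difference scores). The computational form of the sum of squares, $SS$, is the standard one and is assumed here.

$$
\begin{aligned}
M_D &= \frac{\sum D}{n_D} \\
SS &= \sum D^2 - \frac{\left(\sum D\right)^2}{n_D} \\
s_D &= \sqrt{\frac{SS}{n_D - 1}} \\
s_{MD} &= \frac{s_D}{\sqrt{n_D}}
\end{aligned}
$$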
## The t statistic for a repeated-measures research design
The single-sample t statistic formula is used when developing the
dependent-samples t test!
For the repeated-measures design, the sample data we use are the
difference scores and are identified by the letter D, rather than X
We use D in the formula to emphasize that we are dealing with difference
scores instead of X values
The population mean that is of interest to us is the population mean difference
and we identify this parameter with the symbol $\mu_D$
With this formula, we compute the estimated
standard error the exact same way as it is
computed for the single-sample t test!
We compute variance for the sample of D
scores
## Hypothesis Tests for the Repeated-Measures Design
In a repeated-measures study, each individual is measured in
two different treatment conditions and we are interested in
whether there is a systematic difference between the scores
in the first treatment condition and the scores in the second
treatment condition
A difference score is computed for each person
The hypothesis test uses the difference scores from the sample to
evaluate the overall mean difference, $\mu_D$, for the entire population
The hypothesis test follows the same four-step process that
we've used for other tests
1) State hypotheses
2) Select alpha and locate the critical region
3) Calculate the t statistic
4) Make a decision!
## The Hypotheses for a Related-Samples t test
Our goal is to use the sample of difference scores to answer questions
about the general population
We want to know whether there is any difference between the two treatment conditions for
the general population
We are interested in difference scores
We want to know what would happen if every individual in the population were
measured in two treatment conditions ($X_1$ and $X_2$) and a difference score (D) was
computed for everyone
Our null hypothesis states that the mean difference for the general
population is zero ($H_0: \mu_D = 0$)
The alternative hypothesis states that there is a treatment effect that
causes the scores in one treatment condition to be systematically higher (or
lower) than the scores in the other condition ($H_1: \mu_D \neq 0$)
## Assumptions of the Related-samples t test
The related-samples t statistic requires two basic
assumptions:
1) The observations within each treatment condition must be
independent. That is, one participant's score within a treatment must not influence another participant's score in that treatment
2) The population distribution of difference scores (D values) must
be normal
# Related-samples t DF
The degrees of freedom for the related-samples
t test: $df = n_D - 1$, where $n_D$ is the number of difference scores
Very similar to the one-sample t test!
## Directional Hypotheses and one-tailed tests
In many repeated-measures and matched-subjects studies,
the researcher has a specific prediction concerning the
direction of the treatment effect
This kind of directional prediction can be incorporated into the
statement of the hypotheses, resulting in a directional, or one-tailed,
hypothesis test
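As a sketch, suppose the difference scores are computed so that the predicted effect produces positive D values. This is an assumption made for illustration, since the sign depends on the order of subtraction. The one-tailed hypotheses would then be:

$$
H_0: \mu_D \le 0 \qquad\qquad H_1: \mu_D > 0
$$

If the prediction were a decrease, the inequalities would be reversed. In either case the entire critical region sits in the tail predicted by $H_1$.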
# Our T statistic
The t statistic for a related-samples t test is:
$$t = \frac{M_D - \mu_D}{s_{MD}}$$
The larger our test statistic (in absolute value), the less likely a sample mean
difference would occur if the null hypothesis were true
The estimated standard error for difference scores is
placed in the denominator. The standard error for a
distribution of mean difference scores ($s_{MD}$) is computed
as $s_{MD} = s_D / \sqrt{n_D}$
> $M_D$ = sample mean difference; $\mu_D$ = population mean
> difference
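The statistic can be sketched as a small function that starts from the paired raw scores. The function name and the example scores are hypothetical. The steps (difference scores, then mean difference, then estimated standard error, then t) follow the description above.

```python
import math
import statistics


def related_samples_t(scores_1, scores_2, mu_d=0.0):
    """Return (t, df) for paired scores, testing H0: mu_D = mu_d."""
    if len(scores_1) != len(scores_2):
        raise ValueError("every score needs a partner in the other condition")
    # Step 1: one difference score per participant (or matched pair).
    diffs = [x2 - x1 for x1, x2 in zip(scores_1, scores_2)]
    n_d = len(diffs)
    # Step 2: sample mean difference and estimated standard error.
    m_d = statistics.mean(diffs)
    s_md = statistics.stdev(diffs) / math.sqrt(n_d)
    # Step 3: the t statistic and its degrees of freedom.
    return (m_d - mu_d) / s_md, n_d - 1


# Hypothetical scores for five participants measured twice.
before = [10, 12, 9, 14, 11]
after = [13, 14, 12, 15, 15]
t, df = related_samples_t(before, after)
print(f"t({df}) = {t:.2f}")
```

The sign of t depends on the order of subtraction (here, second condition minus first), so it is worth stating that order when reporting results.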
# Related-Samples t Example
Study conducted to test
whether or not caffeine
influences heart rate
Participants are given
decaf coffee in one
session and coffee with
caffeine in a second
session
Difference in heart rate
when given decaf coffee
or regular coffee
Do heart rates (beats per minute) differ
at a 0.05 level of significance?
# Step 1: State your Hypotheses
State the hypotheses
$H_0: \mu_D = 0$ (there is no mean difference in heart rate
when given decaf or caffeinated coffee)
$H_1: \mu_D \neq 0$ (there is a mean difference in heart rate in the
presence of caffeine)
## Step 2: Set your Criteria for Rejection and Dfs
Level of significance is 0.05
$df = n_D - 1 = 10 - 1 = 9$
Locate 9 df on the t table
Critical values are ±2.262
Two-tailed test (hence the +/-)
## Step 3: Calculate your Test Statistic
You need to calculate/know a few things:
Number of difference scores ($n_D$)
Difference scores (D)
Sum of your difference scores ($\Sigma D$)
Sum of the squared difference scores ($\Sigma D^2$)
Mean of difference scores ($M_D = \Sigma D / n_D$)
Variance of difference scores ($s_D^2$)
SD of difference scores: $s_D = \sqrt{s_D^2}$
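$M_D = \Sigma D / n_D = -81/10 = -8.1$
$\Sigma D^2 = 1{,}399$
$(\Sigma D)^2 / n_D = (-81)^2 / 10 = 656.1$
$SS = 1{,}399 - 656.1 = 742.9$
$s_D^2 = SS / (n_D - 1) = 742.9 / (10 - 1) = 742.9 / 9 = 82.54$
$s_D = \sqrt{82.54} = 9.09$
Standard deviation for difference scores = 9.09
# Test Statistic Time!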
What we now have:
$M_D = -8.1$
$s_D = 9.09$
$n_D = 10$
Now we compute the estimated standard error for difference scores:
$s_{MD} = s_D / \sqrt{n_D} = 9.09 / \sqrt{10} = 2.87$
Compute the test statistic:
$t = (M_D - \mu_D) / s_{MD} = (-8.1 - 0) / 2.87 = -2.82$
# Step 4: Make a Decision
Compare our obtained value to the critical value
Reject the null hypothesis if the absolute value of the obtained t exceeds
the critical value
$t_{obtained} = -2.82$, which falls beyond the lower critical value of -2.262
Falls within the rejection region, so the decision is to reject
the null
Participants had a significantly higher heart rate when
given caffeine than when given decaf, t(9) = -2.82, p <
0.05
Participant heart rate was significantly affected by
caffeine, t(9) = -2.82, p < 0.05.
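The arithmetic of the caffeine example can be checked with a short script. It is a sketch that starts from the summary values given in the example (the sum of the difference scores, the sum of their squares, the number of pairs, and the critical value read from the t table). The variable names are my own.

```python
import math

# Summary values from the worked caffeine example.
n_d = 10               # number of difference scores
sum_d = -81            # sum of the difference scores
sum_d_squared = 1399   # sum of the squared difference scores
t_critical = 2.262     # two-tailed, alpha = .05, df = 9 (from the t table)

m_d = sum_d / n_d                       # mean difference
ss = sum_d_squared - sum_d ** 2 / n_d   # sum of squares
variance_d = ss / (n_d - 1)             # variance of the difference scores
s_d = math.sqrt(variance_d)             # standard deviation of the difference scores
s_md = s_d / math.sqrt(n_d)             # estimated standard error
t_obtained = (m_d - 0) / s_md           # null hypothesis: mu_D = 0

# Two-tailed decision rule: compare |t| with the critical value.
reject_null = abs(t_obtained) > t_critical

print(f"M_D = {m_d}, SS = {ss:.1f}, s_D = {s_d:.2f}, s_MD = {s_md:.2f}")
print(f"t({n_d - 1}) = {t_obtained:.2f}, reject H0: {reject_null}")
```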
## Effect Size and Confidence intervals for the repeated-measures t
The most commonly used measures of effect size are
Cohen's d and $r^2$, the percentage of variance accounted for
The size of the treatment effect also can be described with a
confidence interval estimating the population mean difference,
$\mu_D$
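The slide names $r^2$ without giving a formula. A common definition for t tests is $r^2 = t^2 / (t^2 + df)$, which is assumed in the sketch below. The function name is hypothetical, and the inputs are the t and df values from this chapter's caffeine example.

```python
def r_squared(t, df):
    """Proportion of variance accounted for, computed from a t statistic."""
    return t ** 2 / (t ** 2 + df)


# t(9) = -2.82 comes from the caffeine example in this chapter.
r2 = r_squared(-2.82, 9)
print(f"r^2 = {r2:.2f}")
```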
# Effect size
We want to determine the magnitude or strength of
the effect of the new intervention. We can compute
Cohen's d, which is the effect size typically reported for
dependent-samples t tests. Cohen's d tells us the
magnitude of our effect in standard deviation terms
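A minimal sketch of that calculation, assuming the estimated d is the mean difference divided by the standard deviation of the difference scores (the form this chapter uses). The function name is hypothetical. The inputs are the caffeine example's values.

```python
def cohens_d_related(mean_difference, sd_difference):
    """Estimated Cohen's d for difference scores: M_D / s_D."""
    return mean_difference / sd_difference


# M_D = -8.1 and s_D = 9.09 come from the caffeine example.
d = cohens_d_related(-8.1, 9.09)
print(f"d = {d:.2f}")
```

The sign only reflects the order of subtraction. The size of the effect is judged from the absolute value.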
Estimated Cohen's $d = M_D / s_D$
Estimated Cohen's $d = -8.1 / 9.09 = -0.89$ (a large effect by Cohen's conventions)
# Confidence Intervals
The confidence interval describes the limits within which the mean
difference between two related populations is likely to be contained
In our example, students were given decaf at one time point and
caffeine in another:
Heart beat in bpm was recorded
$M_D = -8.1$
$s_{MD} = 2.87$
Find the 95% CI for these data
# Confidence Intervals
$M_D \pm t(s_{MD}) = -8.1 \pm 2.262(2.87)$
95% CI: -14.59 to -1.61
Note: there is not a 0 within our confidence interval!
Our 95% CI is the range of values that we can be 95%
confident contains the true mean difference in the population
We are 95% confident that the mean difference in the
population falls within this range because intervals built this way
capture the population mean difference for 95% of all
samples we could have selected from
this population
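The interval can be reproduced with a short sketch. The function name is hypothetical. The mean difference, standard error, and critical t value are the ones used in the caffeine example.

```python
def confidence_interval(mean_difference, standard_error, t_critical):
    """Return (lower, upper) limits: M_D -/+ t * s_MD."""
    margin = t_critical * standard_error
    return mean_difference - margin, mean_difference + margin


# M_D = -8.1, s_MD = 2.87, and t = 2.262 (df = 9, 95% confidence) from the example.
lower, upper = confidence_interval(-8.1, 2.87, 2.262)
contains_zero = lower <= 0 <= upper

print(f"95% CI: {lower:.2f} to {upper:.2f}")
print(f"Interval contains 0: {contains_zero}")
```

An interval that excludes 0 agrees with the decision to reject the null hypothesis in the two-tailed test at the .05 level.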
## Another Example!