
Single-System Studies

Mark A. Mattaini

Social work practice at all system levels involves action leading to behavioral or cultural change. The primary role of social work research is to provide knowledge that contributes to such professional action. While descriptive research about human and cultural conditions, as discussed elsewhere in this volume, can be valuable for guiding professional action, knowing how to most effectively support change is critical for practice. A central question for social work research, therefore, is "what works" in practice: what works to address which goals and issues, with which populations, under which contextual conditions. While descriptive research can suggest hypotheses, the only way to really determine how well any form of practice works is to test it, under the most rigorous conditions possible.

Experimental research is therefore critical for advancing social work practice. Unfortunately, only a small proportion of social work research is experimental (Thyer, 2001). Experimental research is of two types: group experiments (e.g., randomized clinical trials [RCTs]) and single-system research (SSR, also commonly referred to as single-case research, N of 1 research, or interrupted time-series experiments). Single-system experimental research, however, has often been underemphasized in social work, in part because of limited understanding of the logic of natural science among social scientists and social workers.

SSR is experimental research; its purpose, as noted by Horner and colleagues (2005), is "to document causal, or functional, relationships between independent and dependent variables" (p. 166). The methodology has been used with all system levels (micro, mezzo, and macro), making it widely applicable for studying social work concerns. For example, Moore, Delaney, and Dixon (2007) studied ways to enhance quality of life for quite impaired patients with Alzheimer's disease using single-system methods and were able to both individualize interventions and produce generalizable knowledge from their study in ways that perhaps no other research strategy could equal. In another example, Serna, Schumaker, Sherman, and Sheldon (1991) worked to improve family interactions in families with preteen and teenage children. The first several interventions they attempted (interventions that are common in social work practice) failed to produce changes that generalized to homes. Single-system procedures, however, allowed them to rigorously and sequentially test multiple approaches until an adequately powerful intervention strategy was refined. (Note that this would be impossible using group methods without undermining the rigor of the study.)



Turning to larger systems, single-system designs can be used, for example, to examine the relative effects of different sets of organizational and community contexts on the effectiveness of school violence prevention efforts (Mattaini, 2006). Furthermore, Jason, Braciszewski, Olson, and Ferrari (2005) used multiple baseline single-system methods to test the impact of policy changes on the rate of opening mutual help recovery homes for substance abusers across entire states. Embry and colleagues (2007) used a similar design to test the impact of a statewide intervention to reduce sales of tobacco to minors.

Although single-system methods are widely used for practice monitoring in social work, research and monitoring are different endeavors with different purposes. This chapter focuses on the utility of SSR for knowledge building. Readers interested in the use of single-system methods for practice monitoring are likely to find Bloom, Fischer, and Orme (2006) and Nugent, Sieppert, and Hudson (2001) particularly helpful.

Understanding Single-System Research

Single-system experimental research relies on natural science methodologies, while much of the rest of social work research, including a good deal of group experimental research, emphasizes social science methods. The differences are real and substantive. In 1993, Johnston and Pennypacker noted,

The natural sciences have spawned technologies that have dramatically transformed the human culture, and the pace of technological development only seems to increase. The social sciences have yet to offer a single well-developed technology that has had a broad impact on daily life. (p. 6)

There is little evidence that this situation has changed. The reasons involve both methods and philosophies of science. Critically, however, analysis is central in most natural sciences and is best achieved through the direct manipulation of variables and observation of the impact of those manipulations over a period of time. As one expert noted, the heart of SSR is demonstrating influence by "mak[ing] things go up and down" under precisely specified conditions (J. Moore, personal communication, 1998). Such analysis is often best done one case at a time.

SSR has particular strengths for social work research. SSR focuses on the individual system (the individual person, the individual family, the individual neighborhood), typically the level of analysis of primary interest in social work. Furthermore, SSR allows detailed analysis of intervention outcomes for both responders and nonresponders, which is critical for practice because each client, not just the average client, must be of concern. Relevant variables can then be further manipulated to understand and assist those who have not responded to the initial manipulations (Horner et al., 2005). Furthermore, as noted by Horner and colleagues (2005), rigorous SSR can be implemented in natural and near-natural conditions, making it a practical strategy for elaborating and refining interventions with immediate applicability in standard service settings.

Contrasts With Group Experimental Research

Most group experimental research relies on comparing the impact of one or more interventions (e.g., experimental treatment vs. standard care, placebo therapy, or no treatment) applied to more or less equivalent samples. Ideally, these samples are randomly selected from a larger population of interest, but in social work research, it is more common for samples to be chosen on the basis of availability or convenience. Comparison studies include (a) classical experiments with randomization and no-intervention controls, (b) contrast studies that compare one intervention with another, and (c) a wide range of quasi-experimental designs. While comparison studies, especially randomized clinical trials, are often regarded as the gold standard for experimental research, the often unacknowledged strategic and tactical limits of such comparison studies are serious (Johnston & Pennypacker, 1993, p. 119). Conclusions rely on probabilistic methods drawn from the social sciences, rather than on the analytic methods of SSR. As a result, Johnston and Pennypacker (1993) suggest that comparison studies "often lead to inappropriate inferences with poor generality, based on improper evidence gathered in support of the wrong question, thus wasting the field's limited experimental resources" (p. 120). (Similar criticisms have been made of much descriptive research.)

While comparison studies are useful for many purposes (as outlined elsewhere in this volume), it is important to understand their limits. As is true of most social science research, comparison studies at their core are actuarial. They attempt to determine which of two procedures produces better results on average (Johnston & Pennypacker, 1993). In pretty much all cases, however, some persons (or groups, organizations, or communities) will do better, some will show minimal change, and others will do worse. Comparison studies by their nature do not provide information about the variables that may explain why these within-group differences occur; rather, such differences, while acknowledged, are generally treated as error. Analytic natural science methods, however, including rigorous SSR, can do so.

In addition,

although two procedures may address the same general behavioral goal, a number of detailed differences among them may often make each an inappropriate metric for the other. These differences may include (a) the exact characteristics of the populations and settings where each works best, (b) the target behaviors and their controlling influences, or (c) a variety of more administrative considerations such as the characteristics of the personnel conducting each procedure. (Johnston & Pennypacker, 1993, p. 122)

Similar issues are present for large system work like that done in community practice and prevention science. Biglan, Ary, and Wagenaar (2000) note a number of limitations to the use of comparison studies in community research, including "(a) the high cost of research due to the number of communities needed in such studies, (b) the difficulty in developing generalizable theoretical principles about community change processes through randomized trials, (c) the obscuring of relationships that are unique to a subset of communities, and (d) the problem of diffusion of intervention activities from intervention to control communities" (p. 32). SSR, particularly the use of sophisticated time-series designs with matched communities (Biglan et al., 2000; Coulton, 2005), provides powerful alternatives that do not suffer from these limitations.

Analytic investigations, in contrast to actuarial studies, allow the researcher to manipulate identified variables one at a time, often with one system at a time, to explore the impact of those variables and the differences in such impacts across systems, as well as to test hypotheses about the differences found. This is the natural science approach to investigation, this is how generalizable theory is built, and this is primarily how scientific advance occurs. Kerlinger (1986) states, "The basic aim of science is theory. Perhaps less cryptically, the basic aim of science is to explain natural phenomena" (p. 8). Social work needs to be able to understand how personal and contextual factors important to client welfare and human rights can be influenced, and analytic studies are needed to move the field in that direction and thus "transform . . . human culture" (Johnston & Pennypacker, 1993, p. 6). Once the relevant variables and contingent relationships have been clarified through analytic studies, group experimental comparisons may have unique contributions to make in organizational cost-benefit comparisons and other areas as outlined elsewhere in this volume.

The Logic of Single-System Research

The basic logic underlying SSR is straightforward. Data on the behavior of interest are collected over a period of time until the baseline rate is clearly established. Intervention is then introduced as data continue to be collected. In more rigorous single-system studies, intervention is independently introduced at several points in time, while holding contextual conditions constant, to confirm the presence of functional (causal) relationships. (Repeated measurement of the dependent variable[s] over time, therefore, is central to SSR.) As discussed later, a great deal is now known about how to achieve high levels of experimental control and validity in the use of these procedures.

Behaviors of interest in SSR may include those of individuals (clients, family members, service providers, policy makers) as well as aggregate behaviors among a group (students in a class, residents in a state). In addition, behavior as used here includes all forms of actions in context (Lee, 1988), including motor behaviors (e.g., going to bed), visceral behaviors (e.g., bodily changes associated with emotions), verbal behaviors (e.g., speaking or covert self-talk), and observational behaviors (e.g., hearing or dreaming).

A number of dimensions of behavior can be explored and potentially changed in SSR, including rate (frequency by unit of time), intensity, duration, and variability. Single-system researchers therefore can measure the impact of intervention (or prevention) on (a) how often something occurs (e.g., rate of suicide in a state), (b) how strongly it is present (e.g., level of stress), (c) how long something occurs (e.g., length of tantrums), and (d) how stable a phenomenon is (e.g., whether spikes in violence can be eliminated in a neighborhood). Nearly everything that social work research might be interested in, therefore, can be studied using SSR techniques, from a client's emotional state to rates of violations of human rights within a population.

Nearly all SSR designs depend on first establishing a stable baseline, the rate (or intensity, duration, variability) of behavior before intervention. Since all behavior varies to some extent over time, multiple observations are generally necessary to establish the extent of natural variability. In some cases, a baseline of as few as three data points may be adequate; in general, however, the more data points collected to establish baseline rates, the greater the rigor of the study.

Once a stable baseline has been obtained, it is possible to introduce a systematic variation in conditions (i.e., an intervention, or one in a planned series of interventions) and to determine whether that intervention is followed by a change in the behavior(s) of interest. The general standard for change in SSR is a shift in level, trend, or variability that is large, clearly apparent, relatively immediate, and clinically substantive. (Technical details regarding how such changes can be assessed graphically and statistically are provided later in this chapter.) Figure 14.1 presents the most basic structure of the approach, depicting a clear change between phases. (Much more rigorous designs are discussed later in this chapter.)
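The three dimensions of change named here (level, trend, and variability) can each be summarized numerically. The following sketch computes all three for a baseline and an intervention phase; the data values and function names are hypothetical illustrations, not from the studies cited in this chapter:

```python
# Sketch: summarizing level, trend, and variability for the A (baseline)
# and B (intervention) phases of a single-system data series.
from statistics import mean, pstdev

def phase_summary(ys):
    """Level (mean), trend (slope of an ordinary least-squares line over
    successive observations), and variability (standard deviation)."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
            sum((x - x_bar) ** 2 for x in xs)
    return {"level": y_bar, "trend": slope, "variability": pstdev(ys)}

baseline     = [22, 25, 23, 24, 26, 23]   # hypothetical phase A counts
intervention = [14, 11, 9, 8, 8, 7]       # hypothetical phase B counts

a, b = phase_summary(baseline), phase_summary(intervention)
print(f"level shift: {a['level'] - b['level']:.1f}")   # prints "level shift: 14.3"
print(f"trend: A {a['trend']:.2f} vs B {b['trend']:.2f}")
```

Summaries like these can help a researcher judge whether a shift in level is large relative to baseline variability, although in SSR the graphed data, not summary statistics, remain the primary evidence.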

Figure 14.1 A graph of data for a simple single-system research design, with successive observations plotted on the horizontal axis and frequencies of a behavior of interest on the vertical axis. (This graph depicts an A-B [baseline-intervention] design, which will be discussed in detail later in the chapter.)

Rigorous SSR requires strong measurement, more complex designs comparing multiple phases, and sophisticated analytic techniques. Horner and colleagues (2005, Table 1) identify a series of quality indicators that can be used to judge the rigor of single-system investigations, including evaluation of descriptions and characteristics of participants, descriptions and characteristics of the setting, specification of independent and dependent variables, measurement procedures, establishment of experimental control, and procedures to ensure internal, external, and social validity. All of these dimensions are explored later in this chapter.

Two examples of methodologically straightforward single-system studies illustrate the core logic of SSR. Allday and Pakurar (2007) tested the effects of teacher greetings on rates of on-task behavior for three middle school students who had been nominated by their teachers for consistent difficulty in remaining on task during the beginning of the school day. Some existing research suggests that teacher greetings may have an impact on student behavior and achievement (Embry, 2004); Allday and Pakurar wanted to experimentally test this effect. They used a multiple baseline design (discussed in detail later), beginning by collecting observational data in classrooms. After three observations, one teacher began greeting the target student in her class with his name and a positive statement when he entered the classroom. Meanwhile, the two other students, who were in different schools, continued to be observed.

The rate of on-task behavior for the first student immediately improved, while there was no change for the other two. Shortly thereafter, the first student continued to be greeted, the second student also began to be greeted, and the third student continued to just be observed. On-task behavior for the first student remained high and improved substantially for the second, while there was no change for the third. At the next observation point, greetings for the third student were added; at this point, the data for all three showed improvement over baseline. Each time the intervention was introduced, and only when the intervention was introduced, the dependent variable showed a change. Each time change occurred concurrent with intervention, the presence of a causal relation became more convincing, reflecting the principle of unlikely successive coincidences (Thyer & Myers, 2007). In addition, two of the students showed greater improvements than the third.


Those data indicate that the intervention tested was adequate for the first two students but that refinements may be needed for the third. This level of precision is critical for clinical research.

In a second example, Davis and colleagues (2008) reported a single-system study with a 10-year-old boy who displayed multiple problem behaviors in the classroom that interfered with his own and others' learning. After tracking his behaviors over a baseline period of 5 days, a social skills and self-control intervention was initiated. As soon as these procedures were implemented, the level of behavior problems dropped dramatically. When the procedures were withdrawn for 5 days, behavior problems rapidly increased again. When the procedures were reintroduced, behavior problems dropped once more. The association between use of the intervention procedure and behavior problems becomes more persuasive each time they change in tandem. Much more sophisticated and rigorous studies are discussed below, some involving entire states in their sampling plans. What is important to note here, however, is the logic involved in demonstrating influence and control by introducing and withdrawing independent variables (interventions) in planned ways to test for functional relationships with dependent variables.

Rigor in SSR depends largely on two factors, the quality of the measurement used and the extent to which the design allows the investigator to rule out alternative explanations. In the Allday and Pakurar (2007) study, direct observation of the dependent variable was implemented, with two observers used during 20% of the observations. In the Davis et al. (2008) study, multiple measures, including direct onsite observation, were used (in 15% of observations, a second rater was used). In the Allday and Pakurar study, rigor was increased by introducing interventions one case at a time to determine whether intervention was functionally related to behavior change. By contrast, strengthening rigor in the Davis et al. study involved introducing and withdrawing procedures multiple times to determine whether presence or absence of the independent variable was consistently associated with behavior change.

Measurement in Single-System Research

There is a wide range of possible approaches for measuring independent and dependent variables in social work research. The most widely useful methods include direct observation; self-monitoring by the client or research participant; the use of scales, ratings, and standardized instruments completed by the client or other raters; and the use of goal attainment scaling (GAS) or behaviorally anchored rating scales (BARS).

Observation

Observation is the most direct and therefore often the most precise method of measuring behavior and behavior change. This is especially true when at least a sample of observations is conducted by more than one observer, which allows the calculation of interobserver reliability. Observation can be used to track such variables as the number of instances of self-injury, the percentage of 10-second intervals in which a student is on task, repeated patterns that occur in family communication, or the immediate responses of decision makers to assertive behavior by clients participating in advocacy efforts, for example.
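Interval recording of the kind mentioned here (the percentage of 10-second intervals in which a student is on task) reduces to a simple computation. A minimal sketch, with hypothetical data and an illustrative function name:

```python
# Sketch: interval recording, as in the on-task example above.
# Each entry marks whether the target behavior was observed during
# one 10-second interval of a session.
def percent_on_task(intervals):
    """Percentage of intervals in which the behavior occurred."""
    return 100 * sum(intervals) / len(intervals)

# True = student judged on task during that interval (hypothetical session)
session = [True, True, False, True, False, True, True, True, False, True]
print(f"{percent_on_task(session):.0f}% of intervals on task")  # prints "70% ..."
```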

Observation often involves less subjective judgments, inferences, or estimates than other measures. For example, use of a rating scale related to the incidence of child behavior problems may involve judgments as to whether the rate is "high" or "very high," while a simple count provides both more precision and perhaps a less value-laden measure. There are times when direct observation is impractical, but given its advantages, whenever possible, it is the strategy of choice in SSR. The wide availability of video recording equipment has contributed to both the practicality of observation and the possibility of recording in the moment and analyzing later, and it can also facilitate measuring interobserver, or interrater, reliability. (Careful refinement and pretesting of operational definitions and training procedures should be built into observation planning, as the quality of obtained data may otherwise be compromised.)

There are times when observation is not practical due to cost or intrusiveness, or when reactivity to observation is likely to influence the behaviors of interest. There also are times when observation and recording may raise ethical issues (as in some studies of illegal or antisocial behavior). Some issues of social work concern are also not directly observable; emotional states and covert self-talk are examples. Other measurement approaches are needed under such circumstances.

Self-Monitoring

Self-monitoring (self-observation) is a common and very useful approach for data collection in social work SSR. It is often not possible for the researcher to "go home with the client" to observe, for example, child behavior problems (although sometimes this is in fact realistic and useful). From hundreds of studies, however, it is clear that parents can record many kinds of data quite accurately, from the frequency of tantrums or successful toileting to the extent to which they are frustrated with the child. Couples can monitor the number of caring actions their partners take over the course of a week (e.g., in Stuart's [1980] "caring days" procedures). Depressed individuals can track their activities and levels of satisfaction on an hourly basis to prepare for behavioral activation procedures (Dimidjian et al., 2006; Mattaini, 1997). So long as the measurement procedures are clear and the participant has the capacity and motivation to complete them, self-monitoring can be both highly accurate and quite cost-effective. Simple charts that are clear and communicative for those completing them are usually essential and should be provided. Asking people to devise their own charting system often will not produce quality data, but collaborating with clients or participants to customize recording charts can work very well. (Studies involving multiple clients or participants require uniformity of recording.)

Self-monitoring can itself be motivating to clients and research participants, providing immediate feedback and often a sense of control over one's life (Kopp, 1993). As a result, self-monitoring procedures are often reactive; monitoring by itself may change behavior, usually in the desirable direction. (A similar issue can arise with other forms of monitoring, but this is a particular issue with self-monitoring.) This can be an advantage for intervention, when the primary interest is in working toward the client's goals, but can complicate analysis in research since recording constitutes an additional active variable that needs to be taken into account in analysis. Often the best option when reactivity may be a problem is to begin self-monitoring without the planned intervention and examine the resulting data over several measurement points. If the dependent variable shows improvement, monitoring alone should be continued until a stable level is achieved before introducing further experimental manipulation.

Rating Scales and Rapid Assessment Instruments

When observation is not possible or practical, rating scales can be a useful alternative. Either the participant (client) or another person (e.g., a social worker or a parent) can complete such scales. Self-anchored scales are completed by the client (rating one's level of anxiety on a 0 to 100 scale, for example). Such scales often have excellent psychometric properties (Nugent et al., 2001) and can often be completed very frequently, thus providing fine-grained data for analysis. Several such scales can be combined, as in Tuckman's (1988) Mood Thermometers or Azrin, Naster, and Jones's (1973) Marital Happiness Scale, to provide a more complete, nuanced, and multidimensional picture of personal or couple functioning. Clinicians can complete rating scales (e.g., the Clinical Rating Scale for family assessment; Epstein, Baldwin, & Bishop, 1983), and parents can complete ratings on child behavior.

There are many standardized scales and rating scales available; perhaps most useful for social work practice and research are rapid assessment instruments (RAIs). RAIs are brief instruments that can be completed quickly and are designed to be completed often. As a result, the researcher (or clinician) can collect an adequate number of data points to carefully track events in the case and thereby identify functional relationships. Please refer to Chapter 5 (this volume) for more information regarding such instruments.

Goal Attainment Scaling and Behaviorally Anchored Rating Scales

GAS (Bloom et al., 2006; Kiresuk, Smith, & Cardillo, 1994) is a measurement and monitoring approach for tracking progress, usually on more than one goal area at the same time, that has been used for practice and research at all system levels. GAS can be used to concurrently track multiple goal areas for a single client/participant system, while providing an aggregate index of progress. In addition, if GAS is used with a client population, the scores can be aggregated to measure program outcomes (Kiresuk et al., 1994).

GAS is organized around the Goal Attainment Follow-Up Guide, a graphic device that lists five levels of goal attainment on the vertical dimension (from most unfavorable outcome thought likely to most favorable outcome thought likely) and multiple scales (goal areas) with relative weights across the horizontal. This produces a matrix; the items in the matrix are typically individually tailored to the case. The middle level is the "expected level of success" for that scale within the timeframe specified. A scale for depression, for a case in which the initial scores over a baseline period ranged between 31 and 49 (a clinically significant level of depression) on the Generalized Contentment Scale (Hudson, 1982), might list an expected level of 20 to 29 (subclinical), a less than expected level of 30 to 49 (no change), and a most unfavorable level of 50 or greater. Two levels of greater than expected would also be identified. There might also be scales for anxiety, activity level, and quality of partner relationship on the same follow-up guide; depression could be weighted as twice as important as the other scales if that was determined to be the most important goal. Books listing many possible scale items have been produced for GAS to assist in preparation.

Formulas for calculating and aggregating standard scores on GAS guides are also available, and GAS has been widely used for program evaluation and research (e.g., Fisher & Hardie, 2002; Newton, 2002). Any goal or issue that can be framed in terms of expected and less than expected levels of progress can be incorporated into GAS, if the analyst has adequate familiarity with the substantive issue or goal.
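As one illustration of such a formula, the widely cited GAS summary T-score (Kiresuk et al., 1994) combines weighted attainment levels into a single standardized index. The sketch below assumes the conventional coding of attainment from -2 to +2 (with 0 as the expected level) and the customary assumed intercorrelation of 0.30 among scales; the weights echo the depression-weighted-double example above, but the attainment scores are hypothetical:

```python
# Sketch of the conventional GAS summary T-score, assuming the
# customary 0.30 intercorrelation among goal scales (rho).
from math import sqrt

def gas_t_score(scores, weights, rho=0.30):
    """scores: attainment levels coded -2..+2 (0 = expected level);
    weights: relative importance of each goal scale."""
    num = 10 * sum(w * x for w, x in zip(weights, scores))
    den = sqrt((1 - rho) * sum(w * w for w in weights)
               + rho * sum(weights) ** 2)
    return 50 + num / den

# Hypothetical outcome: depression (weight 2) slightly better than
# expected (+1), anxiety better than expected (+1), activity level
# short of expected (-1), partner relationship at expected level (0).
print(round(gas_t_score([1, 1, -1, 0], [2, 1, 1, 1]), 1))  # prints 55.7
```

A T-score of 50 corresponds to attainment at the expected level on every scale, which makes the index easy to read across cases or programs.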

BARS (Daniels, 2000; Mattaini, 2007) is a variation of goal attainment scaling methods in which each level is specified in clear and observable behavioral terms. BARS can, therefore, combine the advantages of observations and ratings with those of GAS, allowing aggregation of quite different measures for program evaluation, for example. At the same time, detailed analysis should primarily be done at the level of the case.


Existing Data

In many cases, the data needed to complete a single-system study are already being collected and need only to be accessed. This is particularly common in community and policy-level studies. For example, if investigators are interested in reducing levels of drug-related and violent crime in a neighborhood, as in a recent study by Swenson and colleagues in South Carolina, they will typically find that relevant data are collected and reported on a regular (often monthly) and relatively fine-grained basis (Swenson, Henggeler, Taylor, & Addison, 2005). The investigators initiated combined multisystemic therapy and neighborhood development initiatives, viewing the neighborhood as the single system. Using routinely collected data, they discovered that police calls for service in the neighborhood, once one of the highest crime areas in the state, had dropped by more than 80%.

Interobserver Reliability

When behavior is being directly observed and counted or when a variable is being rated by observers using some form of rating scale, it is often important to determine the objectivity of the measures reported. The most common approach used to do so is to measure the extent to which two or more observers see the same things happening or not happening. This can be particularly important when observation involves some judgment: for example, "Was that, or was that not, an act of physical aggression as we have operationally defined it?" There are a number of ways of reporting interobserver agreement. One of the simplest and often the most useful is the calculation of percentages of intervals in which the observers agree and disagree on the occurrence of a behavior of interest (e.g., in how many 10-second intervals was a child on task). (Similar percentages can be calculated for duration and frequency data.) In some cases, such percentages may be artificially high, as when the behavior of interest occurs in very few or in most intervals. In such cases, statistical tools such as kappa can correct for levels of agreement expected by chance. There are also circumstances in which correlations or other indices of agreement may be useful; see Bloom et al. (2006) for more information.
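Both computations described above, percent agreement and a chance-corrected kappa, are straightforward for interval data. A minimal sketch with hypothetical ratings and an illustrative function name; the kappa here is Cohen's kappa for two raters and a yes/no occurrence code:

```python
# Sketch: interobserver agreement for interval recording, plus Cohen's
# kappa to correct for agreement expected by chance. Data hypothetical.
def agreement_stats(obs1, obs2):
    n = len(obs1)
    p_o = sum(a == b for a, b in zip(obs1, obs2)) / n   # observed agreement
    p1, p2 = sum(obs1) / n, sum(obs2) / n               # base rates per rater
    p_e = p1 * p2 + (1 - p1) * (1 - p2)                 # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    return 100 * p_o, kappa

rater_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]   # behavior seen per interval
rater_b = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
pct, kappa = agreement_stats(rater_a, rater_b)
print(f"{pct:.0f}% agreement, kappa = {kappa:.2f}")  # prints "90% agreement, kappa = 0.78"
```

Note how kappa falls below raw percent agreement: the more lopsided the base rates, the more of the raw agreement is attributable to chance.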

When levels of agreement are not adequate (at least 80%, but in most cases at least 90% is highly desirable), a number of steps can be taken. First, the behavior(s) of interest may need to be more clearly and operationally defined. Additional training (and often retraining over time) and monitoring of the recording procedures may be necessary. It is also sometimes necessary to make changes in observational procedures. It is important that what is asked of observers is realistic and that they do not find the procedures too fatiguing, or accuracy will suffer.

Single-System Designs

The purpose of experimental design, whether in group experiments or SSR, is to confirm or disconfirm the presence of a functional relationship between the independent variables/interventions and the dependent variable(s), ruling out alternative explanations of change to the extent possible. In group experiments, this is commonly done using contrast groups or a variety of quasi-experimental manipulations. In SSR, target systems commonly serve as their own controls, using patterns of change over time. Some of the most common SSR designs are briefly summarized in this section.


The A-B Design

The simplest single-system design that can be used for research purposes is the A-B design. In this design, observations are collected over a period of time prior to introduction of the experimental manipulation; data collection should continue until a stable baseline has been established. Generally, more baseline data points are better than fewer, because it is more likely that the full pattern will emerge with an extended baseline and because the number of analytic possibilities expands with more data points. Once a stable baseline has been established, the intervention is introduced while observations continue to be collected, typically for about as long as the baseline data were collected. If the dependent variable changes quickly in a very apparent way, as in Figure 14.1, there is some evidence that the intervention may be responsible for the change.

It is possible, however, that something else occurred at the same time the intervention was introduced, so the evidence is not as strong as that provided by the more rigorous designs described later.
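Once baseline and intervention observations are in hand, the phase comparison can also be summarized numerically. The sketch below uses invented data and computes phase means plus the percentage of nonoverlapping data (PND), one common overlap summary in the SSR literature; treat it as an illustration, not a complete analysis:

```python
# Hypothetical weekly counts of a problem behavior (lower is better).
baseline     = [9, 10, 8, 9, 10, 9]    # phase A
intervention = [6, 5, 4, 4, 3, 3]      # phase B

def phase_mean(data):
    return sum(data) / len(data)

def percent_nonoverlap(a_phase, b_phase):
    """PND for a decrease goal: percentage of intervention points that
    fall below the single best (lowest) baseline point."""
    best_baseline = min(a_phase)
    outside = sum(1 for y in b_phase if y < best_baseline)
    return 100.0 * outside / len(b_phase)

print(round(phase_mean(baseline), 2), round(phase_mean(intervention), 2))  # 9.17 4.17
print(percent_nonoverlap(baseline, intervention))                          # 100.0
```

A PND of 100% means every intervention point improved on the best baseline point; with a stable baseline, that pattern is the numerical counterpart of the "very apparent" visual change described above.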

Note that A-B designs are a substantial improvement over case studies in which no baseline data are collected. (These are referred to in the SSR literature as B designs, since the label A is always used for baseline phases and B for the [first] intervention phase.) In a B design, data are simply collected during intervention; such a design can be useful for clinical monitoring but does not provide any information regarding causation (the presence or absence of a functional relationship). Such case studies are therefore not generally useful for SSR.

An example of the use of an A-B design is Nugent, Bruley, and Allen (1998), who tested the impact of introducing a form of aggression replacement training (ART; Goldstein, Glick, & Gibbs, 1998) in a shelter for adolescent runaways, in an effort to reduce behavior problems in the shelter. They introduced the intervention at a point when they had 310 days of baseline data available and continued to monitor data for 209 days after the introduction of ART. While the investigators used very sophisticated statistical analyses (discussed later) in the study, in terms of design, this study was a straightforward A-B design. Given the long baseline, the relative stability of improvement over a 7-month period, and the small statistical probability of a change of the magnitude found occurring by chance, the data are arguably persuasive despite the limitations of A-B designs.

In some situations, multiple cases facing similar issues may be of interest. For example, a clinician-researcher may be interested in the value of a group for parents of children with developmental disabilities. The extent to which each group member implements a particular parenting technique might be one of several dependent variables of interest. Each parent, therefore, would be the subject of a separate study, and their data could be tracked on a separate graph. Most likely, however, the researcher is also interested in the overall utility of the group. In this case, data for all parents could be shown on a single graph, along with a line showing mean data across all group members; see Figure 14.2 for an example (see Nugent et al., 2001, for more information).

Another common situation is one in which multiple dependent variables for a single case are of interest, for example, multiple dimensions of satisfaction with an intimate partner relationship. In this situation, multiple lines, one for each variable of interest, can be plotted on a single graph. Progress on each variable, but also the pattern among several variables, can then be assessed. Social workers are often interested in simultaneous progress on several issues or goals, and SSR can be ideal for tracking such cases and for studying multiple functional relationships at one time (see also multielement designs below).




[Figure 14.2 plot: line graph with phases labeled Baseline and Intervention; x-axis Measurement Points, 1-10.]

CHAPTER 14 • SINGLE-SYSTEM RESEARCH • 251

Figure 14.2 A graph showing hypothetical results of behavioral activation treatment for depression with four clients. Each line with open symbols represents one client; the darker line with closed circles shows the average score across clients at each point in time. Note that the average level of depression, as measured by the Generalized Contentment Scale (Hudson, 1982), is increasing during baseline, but that two of the four cases are primarily responsible for the increase (and may therefore need rapid intervention). There is an evident change on average and for each case beginning when intervention is initiated.

Withdrawal Designs

The study by Davis and colleagues (2008) discussed earlier in this chapter is an example of a withdrawal design.¹ It began with the collection of baseline data for several days; an intervention package was then introduced while data continued to be collected. The intervention was withdrawn after several days and then reintroduced several days later. Behavior improved immediately and significantly each time the intervention package was introduced and worsened when it was withdrawn, suggesting strongly that the intervention was responsible for the change. This A-B-A-B sequence is the standard pattern for withdrawal designs; with replications, it can be a very powerful design, although it is not a fit for every situation. See Figure 14.3 for the basic A-B-A-B model.

For example, I once worked in a partial residential program with adolescents with severe autism. Many of the behavior-analytic interventions we used, and asked families to use when our clients were at home for the weekends, were extremely time-consuming and demanding for staff and family, involving hours of discrete-trial training (highly programmed one-on-one work) and other skills training every day for behaviors such as expressive language and response to emergencies. It was important under those circumstances to determine whether an intervention was really necessary and to probe occasionally to determine whether it was still necessary. If we found, for example, that a client's rate of compliance with requests improved with application of a particular reinforcement arrangement, we did not know for certain that the reinforcers were responsible for the


[Figure 14.3 plot: panel titled "Functional Assessment and Treatment"; phases labeled FBA and Baseline, Intervention, Reversal, and Reinstatement; y-axis percentage of intervals with maladaptive behavior; x-axis Observation Sessions, 1-20; legend: Self-Initiated, Teacher Attention, Peer Attention, Academic Escape.]

Figure 14.3 This graph, from Davis et al. (2008), is an example of a withdrawal design (A-B-A-B). The figure depicts the percentage of overall time intervals during which each of several subtypes of maladaptive behaviors occurred during initial baseline, first intervention, withdrawal, and reinstatement of intervention. The percentage of intervals in which maladaptive behaviors occurred overall is quite high in the first baseline phase and also increased rapidly during the return to baseline. (Note that the withdrawal phase is labeled reversal, as is common in the literature; see Note 1.)

SOURCE: © 2008 Davis et al.; reprinted with permission.

change. It was common under those circumstances to discontinue the intervention briefly; if performance suffered, we could be relatively sure that the intervention was functionally related to the behavior and that we needed to continue it. After some time, however, it commonly made sense to again withdraw the intervention to determine whether natural consequences had become powerful enough to maintain the behavior on their own.

Putnam, Handler, Ramirez-Platt, and Luiselli (2003) used a withdrawal design to improve student behavior on school buses. The school involved was a low-income, urban elementary school in which behavior problems on buses were widespread. The intervention involved working with students to identify appropriate behaviors (a shared power technique; Mattaini & Lowery, 2007) and subsequently reinforcing appropriate behaviors by means of tickets given to students by bus drivers, which were entered into a prize drawing. This was not an extremely labor-intensive arrangement but did require consistency and coordination. The intervention package was therefore introduced for several months following a baseline period and then withdrawn. Office referrals and suspensions for bus behavior went down dramatically during the intervention period but increased again during the withdrawal phase. When intervention was reintroduced, problem data again


declined. It continued to be relatively low during several months of follow-up, when the program was maintained by the school without researcher involvement.

Withdrawal designs are clearly not appropriate under many circumstances. There are often ethical issues with withdrawing treatment; stakeholders also may raise reasonable objections to withdrawing treatment when things are going well. Furthermore, some interventions by design are expected to make irreversible changes. For example, cognitive therapy that changes a client's perspective on the world is designed to be short term, and the results are expected to last beyond the end of treatment. It might be logically possible, but would certainly be ethically indefensible, to use the techniques of cognitive therapy to try to change self-talk from healthy back to unhealthy and damaging, for example (this would be an example of an actual reversal design). Luckily, other rigorous designs discussed below can be used in circumstances where withdrawal or reversal are unrealistic or inappropriate.

Variations of Withdrawal Designs

Several variations of withdrawal designs can be useful for special research and practice situations. One of these is the A-B-A design. Following collection of baseline data, the intervention is introduced and subsequently discontinued. This design is not generally useful for clinical studies, since it is applicable only in circumstances where the expectation is that the impact of the intervention is temporary, and the study ends with a baseline phase, potentially leaving a client system in the same situation he or she was in to begin with. There are times in research, however, when the research interest is not immediately clinical but rather a simple question of causality.

Another occasionally useful design is the B-A-B design, which involves introducing an intervention for a period, withdrawing it, and then reintroducing it. This is not a common or particularly strong design for research purposes but does permit examining changes in the dependent variable concurrent with phase changes. It has been used in some clinical studies where the issue of concern required immediate intervention, and questions arose as to the need to continue that intervention. There are also times when a complex and expensive intervention is briefly withdrawn to be sure that it is still needed. Imagine, for example, that a child with a serious disability is accompanied by a one-on-one aide in a school setting. Given the costs involved, withdrawing this intensive service to determine whether it is necessary may be practically necessary. If behavior problems increase when the aide is withdrawn and decrease when the aide is subsequently reinstated, it suggests both that the presence of the aide is necessary and that it is functionally related to the level of problem behavior. (On occasion, B-A-B research reports are the result of unplanned interruptions in service, as when the person providing the intervention becomes ill for a period of time or is temporarily assigned to other tasks.)

Multiple Baseline Designs

While withdrawal designs offer considerable rigor, the need to withdraw service often precludes their use in both practice and research. Another SSR strategy, a set of design types called multiple baseline (MB) designs, also can provide strong evidence of a functional relationship between independent and dependent variables. The heart of MB designs is to concurrently begin collecting baseline data on two or more (preferably three or more) similar "cases," then introduce a common intervention with one case at a time, while collecting data continuously in all of the cases. See Figure 14.4 for the basic MB model.


[Figure 14.4 plot: three stacked panels, one per client (Tim, Kay, Jon), each showing percentage of intervals on task across sessions 1-11; phases labeled Baseline and Teacher Greeting, with staggered phase-change lines.]

Figure 14.4 A multiple baseline across clients study, taken from the study by Allday and Pakurar (2007, p. 319), described earlier in the chapter. Note that results for the first two clients are more persuasive than for the third, where there is overlap between baseline and intervention, although the average is improved. This might suggest the need for additional intervention intensity or alternative procedures.

SOURCE: © 2007 Journal of Applied Behavior Analysis; reprinted with permission.


The "cases" in MB designs may be individual systems (clients, neighborhoods, even states) but may also be settings or situations (school, home, bus) for the same client, or multiple behaviors or problems. In MB research, the intervention must be common across cases. The Allday and Pakurar (2007) study depicted in Figure 14.4 is an example in which a friendly greeting is the common manipulation. As with withdrawal designs, if a change in the dependent variable is consistently associated with intervention, the evidence for a functional relationship increases with each case (particularly with replications, as discussed later).

An interesting example of an MB across cases study was reported by Jason et al. (2005), who tested an approach for starting Oxford House (OH) programs (mutual help recovery homes for persons with substance abuse issues). OH programs appear to be cost-effective and useful for many clients. Jason and colleagues were interested in whether using state funds to hire recruiters and establish startup loan funds would meaningfully increase the number of homes established. Baseline data were straightforward; there were no OH programs in any of the 13 states studied during a 13-year baseline period (and probably ever before). As the result of a federal-level policy change offering funds that states might use in this manner, the recruiter-loan package was made available in 10 states. The number of OH homes increased in all 10 states over a period of 13 years, sometimes dramatically; 515 homes were opened in these 10 states during this time. During the first 9 of those years, data were also available for 3 states that did not establish the recruiter-loan arrangement; a total of 3 OH homes were opened in those states during those 9 years. The recruiter-loan arrangement then became available to those states, and immediate increases were seen, with 44 homes opening in a 4-year period. See Figure 14.5 for the data from this study.

This is somewhat of a hybrid study, with multiple concurrent replications in each phase. Overall, the data clearly support the conclusion that the recruiter-loan package was responsible for the dramatic increases in OH homes, in every state studied. This investigation also shows the potential for use of SSR in community- and policy-level research.

An example of an MB across settings/situations study is found in Mattaini, McGowan, and Williams (1996). Baseline data were collected on a mother's use of positive consequences for appropriate behavior by her developmentally delayed child, as well as other parenting behaviors not discussed here. In the situations in which training occurred, including putting away toys, playing with brother, and mealtimes, baseline data were collected within each of those settings for five sessions. An intensive behavioral training program was then conducted in the putting away toys situation only. This resulted in a large and immediate improvement in use of positive consequences in that condition, a very small carryover effect in the playing with brother condition, and no change in the mealtime condition. Training was then implemented in the playing with brother condition, resulting in a significant increase; improvement was maintained in the putting away toys condition, but there was still no improvement in the mealtime condition. When the intervention was introduced there, immediate improvement occurred. In other words, each time the training intervention was introduced, and only when the intervention was introduced, a large immediate effect was apparent.

By now the basic MB logic is probably clear, and research using MB across behaviors/problems is limited, so this discussion will be brief. The most likely situation that would be appropriate for this kind of design, for most social workers, would be the use of a relatively standardized intervention such as solution-focused brief therapy (SFBT) to sequentially work with a client on several problem areas. For example, if a teen client was having conflict with his custodial grandmother, was doing poorly academically, and had few friends, SFBT might be used sequentially with one issue at a time after a baseline period. (There is some risk of treatment for one issue carrying over to others


[Figure 14.5 plot: two stacked panels showing the cumulative number of Oxford Houses opened over time, with phases labeled Baseline and Intervention.]

Figure 14.5 Cumulative Number of New Oxford Houses Opened in Two Groups of States Over Time as a Function of Recruiters Plus a Loan Fund Intervention

SOURCE: Jason, Braciszewski, Olson, and Ferrari (2005, p. 76). © 2005 Leonard A. Jason, Jordan Braciszewski, Bradley D. Olson, and Joseph R. Ferrari; reprinted with permission.

in such circumstances, however.) Another example would be the use of a point system in a residential program, in which a client's multiple problems might be sequentially included in the point system.

Changing Intensity Designs

As discussed by Bloom et al. (2006), there are two types of changing intensity designs. In a changing criterion design, goal levels on the dependent variable are progressively stepped up (e.g., an exercise program with higher goals each week) or down (e.g., a smoking cessation program in which the target number of cigarettes smoked is progressively reduced, step by step, over time). If levels of behavior change in ways that are consistent with the


plan, a causal inference is supported, at least to a degree. In a changing program design, the intensity of an intervention is progressively stepped up in a planned manner. For example, the number of hours per week of one-on-one intervention with an autistic child might be increased in 4-hour increments until a pattern of relatively consistent improvement was achieved. This design is more likely to be used in clinical and exploratory studies, where the required intensity of intervention is unknown.

Multielement Designs

Alternating Interventions Design

One SSR design with considerable utility for clinical and direct practice research is the alternating interventions or alternating treatments design, the most common of the so-called multielement designs (one other, the simultaneous interventions design, is discussed below). In this design, two or more interventions are randomly and rapidly alternated to determine the differential impact of each for the subject (or group of subjects). For example, Jurbergs, Palcic, and Kelley (2007) tested the relative utility of two forms of school-home notes on the performance of low-income children diagnosed with attention-deficit hyperactivity disorders. A school-home note is a daily report on student performance and behavior sent home by the teacher; parents provide previously agreed rewards based on those reports. In this study, one type of note added a loss of points (a minor punishment contingency) arrangement to the standard note. Which type of note was used each day was randomly determined; students knew each day which note they were using. Both produced large results in academic performance and on-task behavior, with no meaningful differences found between the two conditions. Nonetheless, parents preferred the notes that included the punishment arrangement. This study also involved a withdrawal phase, so it is actually a hybrid involving both alternating interventions and A-B-A-B with follow-up design elements. Figure 14.6 shows data for one of the six subjects in the study.

In a second example, Saville, Zinn, Neef, Van Norman, and Ferreri (2006) compared the use of a lecture approach and an alternative teaching method called interteaching for college courses. Interteaching involves having students work in dyads (or occasionally in groups of three) to discuss study questions together; lecturing in interteaching courses typically is used only to clarify areas that students indicate on their interteaching records were difficult to understand. (There have been several earlier studies of interteaching [e.g., Boyce & Hineline, 2002; Saville, Zinn, & Elliott, 2005], all of which indicate that students perform better on examinations and prefer interteaching; clearly, this technique needs to be more widely known in social work education.) In the first of two studies reported by Saville and colleagues (2006), which of the two techniques would be used each day was randomly determined. Quiz scores on interteaching days averaged 3.3 on a 6-point scale (and had much smaller variance), while scores on lecture days averaged 1.7. In the second study reported in this article, two sections were used. Each day, one received lecture and the other interteaching. Test scores were higher in every case for the section using interteaching on that day.

There may be order and carryover effects in some alternating intervention studies (e.g., which intervention is experienced first may affect the later results), but those who have studied them believe that rapid alternations and counterbalancing can successfully minimize such effects. It is also always possible that the alternation itself may be an active variable in some cases, for example, because of the novelty involved or because alternation minimizes satiation related to one or both techniques. Usually, a more significant concern in alternating intervention studies is to determine how big a difference between interventions should be


[Figure 14.6 plot: percentage scores (y-axis, 0-100) across Observation # (x-axis); phases labeled Baseline, Treatment, Baseline, Treatment, and Follow-up; legend: Baseline, Response Cost, No Response Cost.]

Figure 14.6 Results for one case in the study by Jurbergs, Palcic, and Kelley (2007, p. 369) of the use of school notes of two types. In the response cost condition, a mild punishment condition was added to the standard reward arrangement. In the no-response cost condition, only the reward arrangement was in place. This is an alternating interventions study; notice how the two conditions are intermixed in random order during the treatment phases.

SOURCE: © 2007 School Psychology Quarterly; reprinted with permission.

regarded as meaningful. Using visual analysis, as is typical in such studies (see below), the most important question is commonly whether the difference found is clinically or socially meaningful. It is also in some cases possible to test differences statistically, for example, using online software to perform randomization tests (Ninness et al., 2002).
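A randomization test of this general kind can be run in a few lines of ordinary code. The sketch below (with invented session scores; this is not the software Ninness et al. describe) shuffles the condition labels many times and asks how often chance alone produces a difference as large as the one observed:

```python
import random

# Hypothetical quiz-style scores from sessions randomly assigned to
# intervention B or intervention C in an alternating interventions study.
b_scores = [7, 8, 6, 8, 7, 9]
c_scores = [5, 4, 6, 5, 4, 5]

def mean(xs):
    return sum(xs) / len(xs)

def randomization_test(b, c, n_resamples=10_000, seed=1):
    """Approximate one-tailed p-value: the proportion of random
    relabelings of sessions whose mean difference (B - C) is at least
    as large as the observed difference."""
    rng = random.Random(seed)
    observed = mean(b) - mean(c)
    pooled = b + c
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if mean(pooled[:len(b)]) - mean(pooled[len(b):]) >= observed:
            hits += 1
    return hits / n_resamples

p = randomization_test(b_scores, c_scores)
print(p < .05)  # True: a difference this large is very unlikely by chance
```

Because the design itself randomly assigned sessions to conditions, the shuffling directly mirrors the assignment mechanism, which is what makes this test a natural fit for alternating interventions data.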

Simultaneous Interventions Design

There is also a little-used design discussed by Barlow and Hersen (1984) called the simultaneous interventions or simultaneous treatments design, in which multiple interventions are provided at the same time. In the example they provide (Browning, 1967), different staff members handled a child behavior problem in different ways, and data were collected on frequency of time spent with each staff member. The underlying assumption of the study was that the child would spend more time with staff members whose approaches were the least aversive. No examples of this design appear to be present in the social work literature, and few anywhere else. Nonetheless, because the logic of the design for questions related to differential preferences is intriguing, it is included here so that its potential not be forgotten.

Successive Intervention and Interaction Designs

In some cases, the best way to compare two or more interventions is to introduce them sequentially, thus producing an A-B-C, A-B-C-D, A-B-C-B-C, or other design in which the alternatives are introduced sequentially. For example, after a baseline period in which crime data are collected, intensive police patrols might be used in a neighborhood for 4 weeks, followed by 4 weeks of citizen patrols (A-B-C design). If substantially different crime rates are found while one alternative is in place, there is at least reasonable evidence of differential effectiveness. The evidence could be strengthened using various patterns of


reversal or withdrawal of conditions. For example, if the data look much better when citizen patrols are in place, it may be important to reintroduce police patrols again, followed by citizen patrols, to see if their superiority is found consistently. If neither shows evidence of much effect, they might be introduced together (A-B-C-BC design), or another approach (say, an integrated multisystemic therapy and neighborhood coalition strategy; Swenson et al., 2005) might be introduced (A-B-C-D design).

There can be analytic challenges in all of these sequential designs. All can be strengthened somewhat by reintroducing baseline conditions between intervention phases (e.g., A-B-A-C or A-B-A-C-A-D). A further issue is the order in which interventions are introduced. For example, citizen patrols logically might only be effective if they are introduced after a period of police patrols, and the design described above cannot distinguish whether this is the case, even with reversals. It may occasionally be possible to compare results in different neighborhoods in which interventions are introduced in different orders, but the realities of engaging large numbers of neighborhoods for such a study are daunting. Bloom et al. (2006) and Barlow and Hersen (1984) discuss designs that may be helpful in sorting out some interaction effects. For example, Bloom and colleagues describe an interaction design consisting of A-B-A-B-BC-B-BC phases that allows the investigator to clarify the effects of two interventions separately and in combination.

While the designs described above are relatively standard, it is common for investigators to assemble elements of multiple designs in original ways to fit research questions and situations. Once one understands the logic of the common approaches, he or she can go on to develop a hybrid design, or occasionally an entirely novel approach, that remains consistent with the level of rigor required.

Internal, External, Social, and Ecological Validity in SSR

A number of threats to internal and external validity need to be considered to determine how strongly to trust causal assertions about the impact of independent variables (vs. possible rival explanations). The same threats generally need to be considered for both group experiments and SSR, although in some cases, control for those threats is established in different ways. See Chapter 4 for general information related to threats to internal and external validity.

Internal Validity in SSR

Some threats to internal validity that are handled differently in SSR include history, maturation, and regression to the mean. Sequential confounding is an additional threat that is commonly a greater issue in SSR than in group designs, simply because group designs tend to be limited to a single intervention.

History. Events other than the intervention could be responsible for observed change. As noted earlier, in SSR, one approach for controlling for history is through introducing, withdrawing, and reintroducing intervention, with the expectation that external events are unlikely to occur multiple times just when intervention does (see Withdrawal Designs, above). A second approach involves the use of staggered baselines across persons, settings, or behaviors, which is based on the same principle (see Multiple Baseline Designs, above). By contrast, in group experiments, the most common control for history is the use of


random assignment to intervention and control/comparison groups, on the assumption that external events on average should affect both groups equally.

Maturation. Maturation refers to the effect of ongoing developmental processes, for example, the effects of aging on performance in many areas. Group experiments again rely on random assignment here, while single-system designs generally rely on withdrawal/reversal or multiple baseline approaches. If intervention effects occur only when intervention occurs, maturation is unlikely. The more cases or reversals that are studied, the more persuasive this argument will be.

Regression to the Mean. In both group experiments and SSR, study participants are commonly selected because they are experiencing acute symptoms, when problems may be at their worst. It is therefore likely that at a later time, problem levels will naturally be somewhat lower. In group experiments, the impact of this effect is likely to be about the same across groups; the primary related problem in those studies is that regression may add measurement error to the analysis. In SSR, the best way to control for regression is to ensure that the baseline phase is long enough to demonstrate stability.
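Whether a baseline is long enough to demonstrate stability is usually judged visually, but a quick trend check can supplement the eyeball test. The sketch below (invented data; the least-squares slope here is a supplement to, not a replacement for, visual analysis) fits a trend line to the baseline series:

```python
# Hypothetical baseline scores across seven measurement points.
baseline = [12, 11, 13, 12, 12, 11, 13]

def ls_slope(ys):
    """Least-squares slope of scores regressed on measurement point."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

print(round(ls_slope(baseline), 3))  # 0.071: little trend per measurement point
```

A slope near zero (here about .07 points per measurement point, small relative to the scores themselves) is consistent with a stable baseline; a clearly rising or falling slope suggests that maturation or regression effects may still be in play and the baseline phase should be extended.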

Sequential Confounding. As briefly discussed in the earlier section of this chapter on successive intervention and interaction designs, in SSR involving more than one intervention phase (e.g., an A-B-C design), it is possible that the effects of a later intervention phase may be potentiated or weakened by an earlier intervention in the series. It is not always possible to completely eliminate this threat to internal validity, but it is often possible to reduce the likelihood of interference and interaction by returning to baseline conditions (e.g., A-B-A-C) or counterbalancing (e.g., A-B-A-C-A-B). Replications in which the order of phases is counterbalanced across cases can provide even stronger data for exploring interactions.

External Validity (Generalizability) in SSR

Researchers are usually interested in interventions with wide applicability. They want to assist the participants in their own study, but they are hoping that the results will apply to a much broader population. In nearly all cases, however, in both SSR and group experiments, study participants are drawn from those who are relatively easily available, and convenience samples are the norm. Despite efforts to ensure that the study sample is "representative" of a larger population, there is really no way to know this without drawing a random sample. Random samples can only be drawn when an exhaustive list of the population of interest is available, which is seldom the case. While there are lists of registered voters or licensed social workers within a state, no such list exists of persons meeting the criteria for schizophrenic disorder, of battered women, or in fact of most populations that social work research and practice are interested in. In general, no existing methodology provides assurance of generalizability of results to a larger population in most cases; rather, a logical case must be made for the likelihood of external validity.

In group designs, if random assignment to intervention and control groups is not possible, the question of generalizability becomes even more difficult. Adding more participants does not help much in establishing external validity when samples cannot be randomly selected from the population. While larger groups provide better statistical power to determine differences between the groups in the study sample, they are not necessarily more representative of a larger population. (Be careful not to confuse random assignment to groups with random selection from the population of interest.) The actuarial nature of most group experiments is also a threat to external validity, in that many in the experimental group often do not have good results, but we usually have no information as to why some and not others appear to benefit.

In the case of SSR, while the general concerns about generalizability in all experimental studies are also applicable, there is an established approach for building a case for generalizability through replication. In direct replication, once an effect has been obtained with a small number of cases, the experiment is repeated by the same experimenter in the same setting with other sets of cases; the more such replications that occur, the stronger the case for a real effect. Direct replications can be followed by systematic replications, in which one or more of the key variables in the experiment are varied (e.g., a different experimenter, a different setting, or a somewhat different client issue), and the data are examined to determine whether the same effect is found. Clinical replications may follow, in which field testing in regular practice settings by regular practitioners occurs. The more consistent the results across replications, the stronger the case for generalizability; in addition, cases that do not produce the expected results can be further analyzed and variations introduced, increasing both knowledge and perhaps effectiveness with unique cases.

Replication is clearly important and all too infrequent (in both SSR and group experiments, in fact). One criticism often heard of SSR is a concern about the small number of cases. Certainly, results with one or a small number of individuals are not as persuasive as those that have been widely replicated. On the other hand, most important intervention findings begin with small numbers of cases and are strengthened through multiple replications. For example, Lovaas's (1987; McEachin, Smith, & Lovaas, 1993) groundbreaking findings about the possibility of mainstreaming many autistic children using applied behavior analysis emerged from a long series of direct and systematic replications; group comparison studies were useful only after the basic parameters had been clarified through SSR. At the same time, so long as samples for either SSR or group experiments are not randomly selected from the population of interest, neither large nor small samples can be regarded as representative of that population, and external validity relies primarily on establishing a plausible logical case.

Social and Ecological Validity in SSR

Social validity, as the term is typically used in SSR, refers to (a) the social significance of the goals established, (b) the social acceptability of the intervention procedures used, and (c) the social importance of the effects (Wolf, 1978). Another use of the term social validity is that of Bloom et al. (2006), who use the term to refer to "the extent to which a measurement procedure is equally valid for clients with different social or cultural characteristics" (p. 85). This is clearly a different construct, however.

An intervention directed toward a goal that is not valued by clients or community; that relies on procedures that stakeholders find too difficult, expensive, or unpleasant; or that produces weak effects from the perspective of those stakeholders may be scientifically interesting but lacks social validity. (Social importance of the effects of intervention has also been called clinical significance.)

Social validity is clearly central to social work, as the mission of social work ties it fundamentally to issues of social importance at all system levels. Increases in internal validity sometimes reduce social validity; this is one of the central challenges to applied research. For example, it is relatively easy to introduce new practices for constructing developmentally nutritive cultures in schools when problems are few and the available resources are great; there is a large literature in this area. Our work suggests that it is much more difficult to introduce such changes in poor inner-city schools in neighborhoods where the rates of violence, drug crime, and family breakdown are high and resources sparse (Mattaini, 2006). Yet this is often where the need for social work intervention is highest, and a human rights framework suggests that we have an obligation to provide the highest quality services in such settings.

Ecological validity involves the extent to which observational procedures or other contextual parameters of the intervention are consistent with natural conditions (Barlow & Hersen, 1984). A critical consideration here is reactivity, the extent to which clients respond differently because of their awareness that they are being observed. A number of strategies for reducing the possible effects of observation have been developed, including unobtrusive measures, relying on observations by those who are naturally present, and providing opportunities for those observed to acclimate to the presence of observers or observational technologies before formal measurement begins (Barlow & Hersen, 1984). There is no way to protect completely from reactivity in either SSR or group experiments, but SSR does offer the possibility of varying observational procedures as one of the active variables built into the study. It is also possible to vary other contextual parameters in a deliberate and planned way within SSR, and it is often possible to conduct such research in natural settings (e.g., homes and classrooms) in ways that vary little from usual conditions.

Analysis of Single-System Data

There are two basic strategies for the analysis of SSR data: visual analysis and statistical analysis. Each has its strengths and limitations, but in some studies, it is possible to use both to explore the data more fully.

Visual Analysis

Visual analysis has been the primary approach used in the evaluation of SSR data from the beginning and is based on the assumption that only effects that are powerful enough to be obvious to the naked eye should be taken seriously. According to Parsonson and Baer (1978), "Differences between baseline and experimental conditions have to be clearly evident and reliable for a convincing demonstration of stable change to be claimed . . . an effect would probably have to be more powerful than that required to produce a statistically significant change" (p. 112). (Note that the magnitude of change sought visually is conceptually related to effect size in statistical analysis.) This search for strong effects is consistent with common social work sentiment, in that most client and community issues with which social workers intervene are quite serious and require very substantial levels of change. The change sought in visual analysis usually is in mean levels of a problem or goal over time (e.g., is the client more or less depressed than during baseline?). Besides level, however, both trend (e.g., has a problem that was getting worse over time stabilized or begun to improve?) and variability (e.g., has a child's erratic behavior leveled out?) are also often important considerations.
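The three features just described (level, trend, and variability) can each be summarized numerically for a phase. The sketch below, using hypothetical data and an illustrative function name, computes a phase mean, a least-squares slope, and a standard deviation; visual analysts typically judge these from the graph rather than computing them, so this is only an aid to reading such plots.

```python
from statistics import mean, stdev

def phase_summary(scores):
    """Summarize one phase of single-system data: mean level,
    least-squares slope (trend per observation), and standard
    deviation (variability)."""
    n = len(scores)
    level = mean(scores)
    xs = list(range(n))
    x_bar = mean(xs)
    # Ordinary least-squares slope of score on observation number.
    slope = (sum((x - x_bar) * (y - level) for x, y in zip(xs, scores))
             / sum((x - x_bar) ** 2 for x in xs))
    return level, slope, stdev(scores)

# Hypothetical weekly depression scores (higher = worse).
baseline = [20, 21, 19, 22, 20, 21]
intervention = [18, 16, 15, 13, 12, 10]

b_level, b_slope, b_sd = phase_summary(baseline)      # flat, stable phase
i_level, i_slope, i_sd = phase_summary(intervention)  # clear downward trend
```

Here the baseline is roughly flat (slope near zero) while the intervention phase shows a marked downward trend, the kind of pattern a graph would make immediately visible.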

Visual analysis relies on graphing; note the graphs used in earlier discussions of SSR designs in this chapter. Strong, consistent effects should be immediately obvious to the observer, and multiple independent observers should agree that an effect is present to accept that change is real. One common standard for judging the presence of such an effect is the extent of overlap in data between the baseline phase and the intervention phase. If there is no overlap (or almost none when there are many data points), the presence of a real effect usually can be accepted with confidence (see the left panel of Figure 14.7).
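The overlap criterion can also be quantified; the percentage of non-overlapping data (PND) statistic is one common way of doing so. A minimal sketch, with hypothetical data and an illustrative function name:

```python
def percent_nonoverlap(baseline, intervention, goal="decrease"):
    """Percentage of intervention-phase points falling entirely
    outside the baseline range, in the desired direction."""
    if goal == "decrease":
        boundary = min(baseline)
        outside = [y for y in intervention if y < boundary]
    else:
        boundary = max(baseline)
        outside = [y for y in intervention if y > boundary]
    return 100.0 * len(outside) / len(intervention)

# Hypothetical weekly tantrum counts; the goal is a decrease.
pnd = percent_nonoverlap([9, 11, 10, 12, 10], [7, 5, 4, 3, 3, 2])
```

A value near 100% corresponds to the no-overlap situation in the left panel of Figure 14.7; as the next paragraphs note, however, overlap statistics alone can mislead when there is a trend in the data.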


Figure 14.7 The data on the left panel show a clear discontinuity at the point of intervention, with no overlap between phases, suggesting an intervention effect. The data shown on the right, despite the nearly complete overlap between phases, are also convincing, as a clear trend reversed dramatically at the point of intervention.



Figure 14.8 The data on the left panel show a trend in the baseline data that generally continues into the intervention phase, suggesting little or no effect. By contrast, there is a clear discontinuity of level at the point of intervention in the data on the right, which suggests an effect even though the slopes within phases are similar.

Useful as that criterion can often be, there are other types of obvious effects. For example, the right panel of Figure 14.7 shows baseline data for a problem to be escalating over time (an upward trend). When intervention is introduced, the trend reverses. While there is complete data overlap between the phases, a strong and convincing effect is clearly present. On the other hand, as shown in the left panel of Figure 14.8, when there is a trend in the baseline data in the desired direction, and the trend appears to continue into the intervention phase, one cannot assume an intervention effect. If, however, there is a distinct discontinuity, as in the right panel of Figure 14.8, the evidence for an effect is more persuasive.

To be fully persuasive, changes in level, trend, or variability should usually be relatively immediate. The changes seen in Figure 14.7 are examples; in both cases, change began occurring as soon as intervention was introduced. If intervention requires a number of sessions, days, or weeks to begin to show an effect, the graph will usually be much less convincing. An exception to this principle would be a situation in which one predicts in advance, based on a convincing rationale, that change will not occur for a prespecified period of time. If change then occurs later as predicted, the data would also be persuasive.


What if the patterns identified in the data are ambiguous (Rubin & Knox, 1996)? Difficult as it may be for an investigator to accept, most single-subject researchers take the position that an ambiguous graph should usually result in accepting the null hypothesis (that the intervention does not have a meaningful effect; Mattaini, 1996). There are times when statistical and quasi-statistical methods as discussed below may help to sort out such situations, but in many such cases, any effect found is likely to be small and may not be of clinical or social significance. Another option is to extend the phase in which ambiguous data appear (Bloom et al., 2006), which may produce further stability. Bloom et al. (2006) discuss a number of types of data ambiguity and additional strategies that may be useful. Often, however, refinement and strengthening of the intervention is what is required, although there certainly are times when finding a possible but uncertain effect may be a step in the search for a more powerful one.

Issues. Unfortunately, there are other problems with visual analysis beyond the ambiguity associated with weak effects. While many initially believed that visual analysis was a conservative approach in which Type I (false alarm) errors were unlikely, studies have indicated that this is not always the case (DeProspero & Cohen, 1979; Matyas & Greenwood, 1990). Matyas and Greenwood (1990) found false alarm levels ranging from 16% to 84% with graphs designed to incorporate varying effect sizes, random variations, and degrees of autocorrelation. (Autocorrelation is discussed later.) Furthermore, DeProspero and Cohen (1979) found only modest agreement among raters with some kinds of graphs, despite using a sample of reviewers familiar with SSR. These findings certainly suggest accepting only clear and convincing graphic evidence and have led to increasing use of statistical and quasi-statistical methods to bolster visual analyses. Nonetheless, visual analysis, conservatively handled, remains central to determination of socially and clinically significant effects.

Statistical and Quasi-Statistical Analysis

The use of statistical methods has been and remains controversial in SSR. There has long been some concern that relying on statistical methods would be a distraction since it might result in finding many small and socially insignificant effects (Baer, 1977). Most single-system studies also involve only modest numbers of data points, which can severely limit the applicability and power of many types of statistical analyses. In this section, I briefly introduce several common approaches; space does not permit full development, and interested readers should therefore refer to the original sources for further information. Before looking at the options, however, the issue of autocorrelation in single-system data must be examined.

Autocorrelation. One serious and unfortunately apparently common issue in the statistical analysis of single-system data is what is termed autocorrelation. Most statistical techniques used in SSR assume that the data points are independent and that no single observation can be predicted from previous data points. Unfortunately, in some cases in SSR, values at one point in time can to some extent be predicted by earlier values; they are autocorrelated (or serially dependent). There has been considerable debate over the past several decades as to the extent and severity of the autocorrelation problem (Bloom et al., 2006; Huitema, 1988; Matyas & Greenwood, 1990). Autocorrelation can increase both Type I (false positive) and Type II (false negative) errors. For this reason, statistical methods that take autocorrelation into account (autoregressive integrated moving averages [ARIMA], for example) or that transform the data to remove it are preferred when possible. Bloom et al. (2006) provide statistical techniques to test for autocorrelation as well as transformations to reduce or remove autocorrelation from a data set. With smaller data sets, autocorrelation may well be present but may be difficult or impossible to identify or adjust for. In such cases, reliance on visual analysis may be best, but it is important to note that autocorrelation is often not evident to the eye and can affect the precision of visual analysis as well. Bloom et al. suggest that autocorrelation can be reduced by maximizing the interval between measurement points and by using the most valid and reliable measures possible.
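To make serial dependency concrete, a lag-1 autocorrelation estimate and a first-differencing transformation can be sketched as follows. This is a simplified illustration, not the exact procedures in Bloom et al. (2006); the data and function names are hypothetical.

```python
from statistics import mean

def lag1_autocorrelation(series):
    """Estimate lag-1 autocorrelation: how well each observation is
    predicted by the one immediately before it."""
    m = mean(series)
    num = sum((series[t] - m) * (series[t - 1] - m)
              for t in range(1, len(series)))
    den = sum((y - m) ** 2 for y in series)
    return num / den if den else 0.0  # zero-variance series: report 0.0

def first_difference(series):
    """First-differencing transformation: analyze the changes between
    successive points rather than the raw, serially dependent scores."""
    return [b - a for a, b in zip(series, series[1:])]

trending = [10, 11, 12, 13, 14, 15, 16, 17]  # steadily rising scores
r1 = lag1_autocorrelation(trending)          # strongly positive
# Differencing a steadily trending series leaves constant changes,
# so the serial dependency disappears.
r1_diff = lag1_autocorrelation(first_difference(trending))
```

A trending series like this one is strongly autocorrelated even though no data point "causes" the next; this is exactly the situation in which ordinary significance tests overstate their certainty.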

Statistical Process Control Charts and Variations. Statistical process control (SPC) charts are widely used in manufacturing and business settings to monitor whether variations in an ongoing, stable process are occurring. Determinations about what changes in the data should be regarded as reflecting real changes are based on decision rules that have a statistical rationale. A number of types of SPC charts have been developed for different types of data and research situations (Orme & Cox, 2001). SPC methods are generally useful even with small numbers of observations (although more are always better) and, with rigorous decision rules, are relatively robust even when autocorrelation or nonnormal distributions are present. Nugent et al. (2001) have developed variations of SPC charts that take into account such situations as small numbers of data points, nonlinear trends in baseline phase data, or when the first or last data point in a phase is a serious outlier. The analyses they suggest, although based on rigorous mathematical proofs, are easy to perform with a simple graph and a ruler and, like other SPC-based methods, rely on simple decision rules (e.g., "two of three data points falling more than two sigma units away from the extended trend line signify significant change," p. 127).
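As an illustration of how such a decision rule operates, here is a sketch of a two-of-three-points-beyond-two-sigma rule applied against the baseline mean (an X-chart-style rule in the same spirit as, but not identical to, Nugent et al.'s trend-line procedure; function name and data are hypothetical):

```python
from statistics import mean, stdev

def two_of_three_signal(baseline, intervention, direction="above"):
    """Flag a change if any two of three consecutive intervention-phase
    points fall more than two sigma units beyond the baseline mean,
    with sigma estimated from the baseline phase."""
    m, s = mean(baseline), stdev(baseline)
    limit = m + 2 * s if direction == "above" else m - 2 * s
    beyond = [(y > limit) if direction == "above" else (y < limit)
              for y in intervention]
    # Slide a three-point window; signal when two or more points qualify.
    return any(sum(beyond[i:i + 3]) >= 2 for i in range(len(beyond) - 2))

# Hypothetical data; the goal is a decrease, so look "below" the limit.
baseline = [10, 12, 11, 9, 10, 11, 12, 10]
signal = two_of_three_signal(baseline, [9, 8, 7, 7, 6], direction="below")
```

Because the control limits come from the stable baseline phase, a stable intervention phase at the same level produces no signal, while a sustained shift does.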

ARIMA Analysis. ARIMA procedures correct for several possible issues in single-system and time-series analyses (Gottman, 1981; Nugent et al., 1998). These include autocorrelation, including periodicity (cyclical and seasonal effects), moving average processes, and violations of the assumption of stationarity. The major obstacles to routine use of ARIMA procedures are its complexity and the requirement for large numbers (at least dozens) of data points in each phase. The only social work study using this procedure of which I am aware is Nugent et al. (1998), but particularly in policy analysis, there is considerable potential for the use of ARIMA and other related time-series analysis methods such as segmented regression analysis.

Other Statistical Techniques. The proportion/frequency approach uses the binomial distribution to compare data in the intervention phase to the typical pattern during baseline. If an adequate number of observations during intervention fall outside the typical baseline range (in the desired direction), the change can be regarded as statistically significant. The conservative dual-criteria (CDC) approach, described by Bloom et al. (2006), is a related approach in which results are viewed as statistically significant if and only if they fall above both an adjusted mean line and an adjusted regression line calculated from baseline data. The CDC approach appears to be somewhat more robust in the face of some types of autocorrelation than many other approaches. Under some circumstances, standard statistical methods such as t tests and chi-square tests can be used to test the differences between phases in SSR studies, although such use remains controversial and can be complicated by autocorrelation and the shapes of data distributions, among other concerns. Recent developments in the application of randomization tests using software that is freely available online (Ninness et al., 2002) are a major advance, as the shape of underlying distributions and the small number of observations are not issues with such analyses.
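The binomial logic of the proportion/frequency approach can be sketched as follows. This is a simplified illustration; published versions (e.g., Bloom et al., 2006) differ in how the desired zone and the chance expectation are defined, and the numbers here are hypothetical.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(k, n + 1))

# Treat the proportion of baseline points already in the desired zone
# as the chance expectation, then ask how surprising the intervention
# phase is. Hypothetical numbers: 2 of 10 baseline points were below
# the cutoff, versus 7 of 8 intervention points.
p = 2 / 10
p_value = binomial_tail(8, 7, p)
```

If the intervention-phase pattern would be very unlikely under the baseline expectation, the change is treated as statistically significant.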


Effect Sizes and Meta-Analysis. Measures of the magnitude of intervention effect, effect sizes (ESs), are increasingly important in SSR. ESs are the most common metrics used to compare and aggregate studies in meta-analyses, but they may also be useful for judging and reporting the power of interventions within individual studies (Parker & Hagan-Burke, 2007). The calculation of ES in a standard A-B design is straightforward. The mean value for the baseline phase A is subtracted from the mean for the intervention phase B, and the result is divided by the standard deviation of the baseline values: ES = (M_B - M_A) / SD_A.

(There are dozens of ways that ES can be calculated, but this is the most common.) A value of .2 is considered a small effect, .5 a medium effect, and .8 a large effect using this formula (Cohen, 1988). This measure assumes no meaningful trend in the data, which is not always the case; other related approaches can be applied in such circumstances (Bloom et al., 2006). Variations are needed in multiphase designs; for example, A-B ESs across participants in a multiple baseline study can be averaged to obtain an overall effect size.
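A minimal computation of the basic A-B effect size, together with the across-participants averaging just mentioned, can be sketched as follows (data are hypothetical; with a desired decrease, the same formula yields a negative ES):

```python
from statistics import mean, stdev

def ab_effect_size(phase_a, phase_b):
    """Basic A-B effect size: (intervention mean - baseline mean)
    divided by the baseline standard deviation."""
    return (mean(phase_b) - mean(phase_a)) / stdev(phase_a)

# Hypothetical goal-attainment ratings (higher = better).
es = ab_effect_size([3, 4, 3, 5, 4, 3], [6, 7, 6, 8, 7, 7])

# Averaging A-B effect sizes across participants in a multiple
# baseline study gives one overall summary figure.
per_client = [1.9, 2.4, 3.1]
overall = sum(per_client) / len(per_client)
```

Note that a single ES, however convenient, discards the within-phase patterns that the next paragraphs identify as a serious limitation.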

Parker and Hagan-Burke (2007) provide several arguments for the use of ES in SSR, including journal publication expectations and the widespread use of ES in the evidence-based practice movement. Furthermore, while recognizing that visual analysis is likely to remain the primary approach in SSR, Parker and Hagan-Burke suggest that ES can strengthen analysis in four ways: by increasing objectivity, increasing precision, permitting the calculation of confidence intervals, and increasing general credibility in terms of professional standards. Still, the use of ES in SSR is relatively new, there are many unresolved technical concerns (Bloom et al., 2006), and, most significantly, patterns in the data, which are often analytically important, are lost in reducing results to a single index.

One approach to examining generalizability in SSR is meta-analysis, in which the results of multiple studies are essentially pooled to increase the number of cases (and thus statistical power) and the breadth of contextual factors being examined. Meta-analysis has become common in group experimental research, and interest in meta-analysis for SSR is growing (Busk & Serlin, 1992). The number of meta-analytic studies of SSR has been small, however, so the final utility of the approach remains to be seen.

A number of serious issues associated with meta-analysis should not be minimized (Fischer, 1990). While there are statistical and methodological issues and limitations, perhaps the most serious concerns are (a) the loss of information related to patterns of change over time (Salzberg, Strain, & Baer, 1987) and (b) the extent to which the interventions applied across studies are in fact substantially identical. The first involves a trade-off between statistical and analytic power, recalling that manipulations of variables and observation of patterns of results over time are the heart of a natural science approach to behavior change. Standard meta-analytic techniques reduce the results of each study to a single effect size, thus losing much of the information provided by the study. With regard to the second concern, any experienced social worker can tell you that two professionals might use the same words with the same client and have wildly different results depending on history, skills, nonverbal and paraverbal behaviors, levels of warmth and authenticity, context, and a wide range of other factors. The contrast with giving the same pill in multiple sites is considerable (although contextual factors no doubt are important there, too). In practice research, achieving consistency across cases is difficult, and across studies is even more so. Nonetheless, the potential for meta-analytic methods in SSR is currently unknown and should continue to be explored.


Single-System Research and the Evidence-Based Practice Movement

Across all of the helping professions, demands for evidence-based and evidence-informed practice are strong and growing, and this is as it should be. When we know what is likely to help, there is a clear ethical obligation to use that information in work with clients and client groups. Requirements for the use of evidence-based practices are increasingly built into legislation and policy. In many cases, randomized clinical trials (RCTs, a very rigorous type of group experimental design) are regarded as the "gold standard" for determining which practices should be regarded as evidence based. A strong case can be made, however, that RCTs are sometimes not the best approach for validating best practices and that rigorous SSR offers an alternative for doing so under certain circumstances (Horner et al., 2005; Odom et al., 2005; Weisz & Hawley, 2002).

As elaborated by Horner and colleagues (2005), five standards should be used to determine that a practice has been documented as evidence based using SSR methods: "(a) the practice is operationally defined, (b) the context in which the practice is to be used is defined, (c) the practice is implemented with fidelity, (d) results from single-subject research document the practice to be functionally related to change in dependent measures, and (e) the experimental effects are replicated across a sufficient number of studies, researchers, and participants to allow confidence in the findings" (pp. 175-176). Most of these standards have been discussed in some depth earlier in this chapter. The specific standard discussed by Horner et al. for replication is worth particular note, however: "A practice may be considered evidence based when (a) a minimum of five single-subject studies that meet minimally acceptable methodological criteria and document experimental control have been published in peer-reviewed journals, (b) the studies are conducted by at least three different researchers across at least three different geographic locations, and (c) the five or more studies include a total of at least 20 participants" (p. 176).

Recalling some of the limitations of group comparison experiments outlined early in this chapter, the importance of rigorous single-system designs for determining which practices should be regarded as evidence based is clear. The flexibility and lower costs of SSR may produce more information about those best practices much more quickly than RCTs and other group experiments under circumstances where what is known is limited.

Single-System Research: An Example

A strong example of valuable SSR in social work is the article "Use of Single-System Research to Evaluate the Effectiveness of Cognitive-Behavioural Treatment of Schizophrenia" by William Bradshaw (2003). This study used a multiple-baseline-across-seven-subjects design to test the effects of cognitive-behavioral treatment (CBT) over a 36-month period on standardized measures of (a) psychosocial functioning, (b) severity of symptoms, and (c) attainment of self-selected goals. There has been only very limited work, especially in the United States, on the value of CBT for persons with schizophrenia, and the studies of short-term CBT intervention have demonstrated at best weak effects. The researcher hypothesized that longer term intervention with this long-term condition (at least 6 months of impairment are required for the diagnosis to be made) would produce substantial effects.



Ten adult clients were randomly selected from the ongoing caseload of a community mental health center; diagnoses were confirmed by psychiatric and social work diagnosticians with 100% agreement. Three of the 10 dropped out early in the study. Average length of illness for the remaining 7 was 11 years, and 6 of the 7 were on psychotropic medication throughout the study. Of these clients, 2 were assigned to 6-month baselines, 2 to 9-month baselines, and the remaining 3 to 12-month baselines. During baseline conditions, quarterly monitoring by a psychiatrist and a case manager was provided (standard treatment). At the end of the baseline period for each, weekly CBT was initiated. The treatment package included efforts to strengthen the therapeutic alliance (in part through the use of solution-focused techniques), education about the illness, and standard cognitive-behavioral interventions, including activity scheduling, exercise, relaxation exercises, and cognitive restructuring, among others. Quarterly evaluations on the measures of functioning and symptoms were independently conducted by the clinician (the researcher) and case managers for each client, with very close agreement.

Analysis

Study data were analyzed both visually and statistically. Quarterly scores for psychosocial functioning and psychiatric symptoms were plotted on standard graphs, from which patterns could be identified (see Figure 14.9 for one example).

The visual results were compelling, with a few anomalies as would be expected working with persons with severe mental illness. The data were tested for autocorrelation; none was found in the baseline data, but autocorrelation was found in all seven cases in the intervention phase data. As a result, a first differencing transformation (Bloom et al., 2006) was used to remove the effects of autocorrelation, and t tests were then conducted.

Results

All of the investigator's hypotheses were supported. As shown in Figure 14.9, all clients showed statistically significant improvements in psychosocial functioning, with an average effect size of 2.96 (a statistically large effect, generally reflecting improvement of about one and one-half levels on the 7-point scale). All showed statistically significant decreases in symptoms, with an average effect size of -2.19 (again a large effect). Every client also made greater than expected progress on self-selected goals from pretest to posttest, using standardized GAS scores. Visual analysis showed clear improvements for every client on each of the scales following flat baselines. Recognizing the limitations of this study being conducted by a single investigator in a single agency with a relatively homogeneous population, the researcher appropriately called for systematic replications by others.


Figure 14.9 Measures of psychosocial functioning for the seven clients included in the Bradshaw (2003) study described in the text.

SOURCE: © British Journal of Social Work; reprinted with permission.

Single-system research is, in many ways, an ideal methodology for social work research. Social work practitioners commonly work with one system at a time, under often unique contextual conditions, and SSR methods have the potential to make such work much more powerful. Every client's world and behavioral history are different, and unlike in many types of medicine, for example, standardized treatments are unlikely to be widely applicable without individualization. While social science methods, group experiments, and other forms of scholarship have important niches in social work research, perhaps no other strategy is at the same time as practical and as powerful for determining what helps and what hurts, under what conditions, as is single-system research grounded in natural science strategies.

Note

1. Withdrawal designs are sometimes called reversal designs; technically, however, in a reversal design, the intervention is applied during the reversal phase in ways that attempt to make the behavior of interest worse; there are few if any circumstances when social work researchers would have occasion to use this approach.


Allday, R. A., & Pakurar, K. (2007). Effects of teacher greetings on student on-task behavior. Journal of Applied Behavior Analysis, 40, 317-320.

Azrin, N. H., Naster, B. J., & Jones, R. (1973). Reciprocity counseling: A rapid learning-based procedure for marital counseling. Behaviour Research & Therapy, 11, 365-382.

Baer, D. M. (1977). Reviewer's comment: Just because it's reliable doesn't mean that you can use it. Journal of Applied Behavior Analysis, 10, 117-119.

Barlow, D. H., & Hersen, M. (1984). Single case experimental designs: Strategies for studying behavior change (2nd ed.). New York: Allyn & Bacon.

Biglan, A., Ary, D., & Wagenaar, A. C. (2000). The value of interrupted time-series experiments for community intervention research. Prevention Science, 1, 31-49.

Bloom, M., Fischer, J., & Orme, J. (2006). Evaluating practice: Guidelines for the accountable professional (5th ed.). Boston: Allyn & Bacon.

Boyce, T. E., & Hineline, P. N. (2002). Interteaching: A strategy for enhancing the user-friendliness of behavioral arrangements in the college classroom. The Behavior Analyst, 25, 215–226.

Bradshaw, W. (2003). Use of single-system research to evaluate the effectiveness of cognitive-behavioural treatment of schizophrenia. British Journal of Social Work, 33, 885–889.

Browning, R. M. (1967). A same-subject design for simultaneous comparison of three reinforcement contingencies. Behaviour Research and Therapy, 5, 237–243.

Busk, P. L., & Serlin, R. C. (1992). Meta-analysis for single-case research. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case research design and analysis: New directions for psychology and education (pp. 187–212). Hillsdale, NJ: Lawrence Erlbaum.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Coulton, C. (2005). The place of community in social work practice research: Conceptual and methodological developments. Social Work Research, 29(2), 73–86.

Daniels, A. C. (2000). Bringing out the best in people. New York: McGraw-Hill.

Davis, R. L., Ninness, C., Rumph, R., McCuller, G., Stahl, K., Ward, T., et al. (2008). Functional assessment of self-initiated maladaptive behaviors: A case study. Behavior and Social Issues, 17, 66–85.

DeProspero, A., & Cohen, S. (1979). Inconsistent visual analyses of intrasubject data. Journal of Applied Behavior Analysis, 12, 573–579.

Dimidjian, S., Hollon, S. D., Dobson, K. S., Schmaling, K. B., Kohlenberg, R. J., Addis, M. E., et al. (2006). Randomized trial of behavioral activation, cognitive therapy, and antidepressant medication in the acute treatment of adults with major depression. Journal of Consulting and Clinical Psychology, 74, 658–670.


Embry, D. D. (2004). Community-based prevention using simple, low-cost, evidence-based kernels and behavior vaccines. Journal of Community Psychology, 32, 575–591.

Embry, D. D., Biglan, A., Galloway, D., McDaniels, R., Nunez, N., Dahl, M. J., et al. (2007). Evaluation of reward and reminder visits to reduce tobacco sales to young people: A multiple-baseline across two states. Unpublished manuscript.

Epstein, N. B., Baldwin, L. M., & Bishop, D. S. (1983). The McMaster Family Assessment Device. Journal of Marital and Family Therapy, 9, 171–180.

Fischer, J. (1990). Problems and issues in meta-analysis. In L. Videka-Sherman & W. J. Reid (Eds.), Advances in clinical social work research (pp. 297–325). Washington, DC: NASW Press.

Fisher, K., & Hardie, R. J. (2002). Goal attainment scaling in evaluating a multidisciplinary pain management program. Clinical Rehabilitation, 16, 871–877.

Goldstein, A. P., Glick, B., & Gibbs, J. C. (1998). Aggression replacement training: A comprehensive intervention for aggressive youth (2nd ed.). Champaign, IL: Research Press.

Gottman, J. (1981). Time series analysis: A comprehensive introduction for social scientists. New York: Cambridge University Press.

Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S., & Wolery, M. (2005). The use of single-subject research to identify evidence-based practice in special education. Exceptional Children, 71, 165–179.

Hudson, W. W. (1982). The clinical measurement package: A field manual. Homewood, IL: Dorsey.

Huitema, B. E. (1988). Autocorrelation: 10 years of confusion. Behavioral Assessment, 10, 253–294.

Jason, L. A., Braciszewski, J., Olson, B. D., & Ferrari, J. R. (2005). Increasing the number of mutual help recovery homes for substance abusers: Effects of government policy and funding assistance. Behavior and Social Issues, 14, 71–79.

Johnston, J. M., & Pennypacker, H. S. (1993). Readings for "Strategies and tactics of behavioral research" (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Jurbergs, N., Palcic, J., & Kelley, M. L. (2007). School-home notes with and without response cost: Increasing attention and academic performance in low-income children with attention-deficit/hyperactivity disorder. School Psychology Quarterly, 22, 358–379.

Kerlinger, F. N. (1986). Foundations of behavioral research (3rd ed.). New York: Holt, Rinehart & Winston.

Kiresuk, T. J., Smith, A., & Cardillo, J. E. (1994). Goal attainment scaling: Applications, theory, and measurement. Hillsdale, NJ: Lawrence Erlbaum.

Kopp, J. (1993). Self-observation: An empowerment strategy in assessment. In J. B. Rauch (Ed.), Assessment: A sourcebook for social work practice (pp. 255–268). Milwaukee, WI: Families International.

Lee, V. L. (1988). Beyond behaviorism. Hillsdale, NJ: Lawrence Erlbaum.

Lovaas, O. I. (1987). Behavioral treatment and normal educational and intellectual functioning in young autistic children. Journal of Consulting and Clinical Psychology, 55, 3–9.

Mattaini, M. A. (1996). The abuse and neglect of single-case designs. Research on Social Work Practice, 6, 83–90.

Mattaini, M. A. (1997). Clinical intervention with individuals. Washington, DC: NASW Press.

Mattaini, M. A. (2006). Will cultural analysis become a science? Behavior and Social Issues, 15, 68–80.

Mattaini, M. A. (2007). Monitoring social work practice. In M. A. Mattaini & C. T. Lowery (Eds.), Foundations of social work practice (4th ed., pp. 147–167). Washington, DC: NASW Press.

Mattaini, M. A., & Lowery, C. T. (2007). Perspectives for practice. In M. A. Mattaini & C. T. Lowery (Eds.), Foundations of social work practice (4th ed., pp. 31–62). Washington, DC: NASW Press.

Mattaini, M. A., McGowan, B. G., & Williams, G. (1996). Child maltreatment. In M. A. Mattaini & B. A. Thyer (Eds.), Finding solutions to social problems: Behavioral strategies for change (pp. 223–266). Washington, DC: American Psychological Association.

Matyas, T. A., & Greenwood, K. M. (1990). Visual analysis of single-case time series: Effects of variability, serial dependence, and magnitude of intervention effects. Journal of Applied Behavior Analysis, 23, 341–351.

McEachin, J. J., Smith, T., & Lovaas, O. I. (1993). Long-term outcome for children with autism who received early intensive behavioural treatment. American Journal on Mental Retardation, 97, 359–372.


Moore, K., Delaney, J. A., & Dixon, M. R. (2007). Using indices of happiness to examine the influence of environmental enhancements for nursing home residents with Alzheimer's disease. Journal of Applied Behavior Analysis, 40, 541–544.

Newton, M. (2002). Evaluating the outcome of counselling in primary care using a goal attainment scale. Counselling Psychology Quarterly, 15, 85–89.

Ninness, C., Newton, R., Saxon, J., Rumph, R., Bradfield, A., Harrison, C., et al. (2002). Small group statistics: A Monte Carlo comparison of parametric and randomization tests. Behavior and Social Issues, 12, 53–63.

Nugent, W. R., Bruley, C., & Allen, P. (1998). The effects of aggression replacement training on antisocial behavior in a runaway shelter. Research on Social Work Practice, 8, 637–656.

Nugent, W. R., Sieppert, J. D., & Hudson, W. W. (2001). Practice evaluation for the 21st century. Belmont, CA: Brooks/Cole.

Odom, S. L., Brantlinger, E., Gersten, R., Horner, R. H., Thompson, B., & Harris, K. R. (2005). Research in special education: Scientific methods and evidence-based practices. Exceptional Children, 71, 137–148.

Orme, J. G., & Cox, M. E. (2001). Analyzing single-subject design data using statistical process control charts. Social Work Research, 25, 115–127.

Parker, R., & Hagan-Burke, S. (2007). Useful effect size interpretations for single case research. Behavior Therapy, 38, 95–105.

Parsonson, B. S., & Baer, D. M. (1978). The analysis and presentation of graphic data. In T. R. Kratochwill (Ed.), Single subject research: Strategies for evaluating change (pp. 101–165). New York: Academic Press.

Putnam, R. F., Handler, M. W., Ramirez-Platt, C. M., & Luiselli, J. K. (2003). Improving student bus-riding behavior through a whole-school intervention. Journal of Applied Behavior Analysis, 36, 583–590.

Rubin, A., & Knox, K. S. (1996). Data analysis problems in single-case evaluation: Issues for research on social work practice. Research on Social Work Practice, 6, 40–65.

Salzberg, C. L., Strain, P. S., & Baer, D. M. (1987). Meta-analysis for single subject research: When does it clarify, when does it obscure? Remedial and Special Education, 8, 43–48.

Saville, B. K., Zinn, T. E., & Elliott, M. P. (2005). Interteaching vs. traditional methods of instruction: A preliminary analysis. Teaching of Psychology, 32, 161–163.

Saville, B. K., Zinn, T. E., Neef, N. A., Van Norman, R., & Ferreri, S. J. (2006). A comparison of interteaching and lecture in the college classroom. Journal of Applied Behavior Analysis, 39, 49–61.

Serna, L. A., Schumaker, J. B., Sherman, J. A., & Sheldon, J. B. (1991). In-home generalization of social interactions in families of adolescents with behavior problems. Journal of Applied Behavior Analysis, 24, 733–746.

Stuart, R. B. (1980). Helping couples change. New York: Guilford.

Swenson, C. C., Henggeler, S. W., Taylor, I. S., & Addison, O. W. (2005). Multisystemic therapy and neighborhood partnerships. New York: Guilford.

Thyer, B. A. (2001). Guidelines for evaluating outcome studies on social work practice. Research on Social Work Practice, 1, 76–91.

Thyer, B. A., & Myers, L. L. (2007). A social worker's guide to evaluating practice outcomes. Alexandria, VA: Council on Social Work Education.

Tuckman, B. W. (1988). The scaling of mood. Educational and Psychological Measurement, 48, 419–427.

Weisz, J. R., & Hawley, K. M. (2002). Procedural and coding manual for identification of beneficial treatments. Washington, DC: American Psychological Association.

Wolf, M. M. (1978). Social validity: The case for subjective measurement or how applied behavior analysis is finding its heart. Journal of Applied Behavior Analysis, 11, 202–214.



Web Resources

http:// A nice PowerPoint presenting the distinctions between case studies and single-case research designs.

http:// Wikipedia entry on single-subject research.

http:// Web site for the Association for Behavior Analysis-International, the leading professional and research organization that focuses on research using single-case designs.

http:// Web site of the Journal of Applied Behavior Analysis, the oldest journal on applied research addressing issues of social significance, with a focus on single-case designs.


Discussion Questions

1. How are single-system research designs an improvement over traditional case studies?

2. When may the use of single-system research designs be preferred over using group research designs?

3. How can the use of one or more baseline phases enhance the internal validity of single-system research?

4. How can external validity be established for findings obtained from single-system research?
