Quasi-Experimental Designs for Causal Inference

When randomized experiments are infeasible, quasi-experimental designs can be exploited to evaluate causal treatment effects. The strongest quasi-experimental designs for causal inference are regression discontinuity designs, instrumental variable designs, matching and propensity score designs, and comparative interrupted time series designs. This article introduces for each design the basic rationale, discusses the assumptions required for identifying a causal effect, outlines methods for estimating the effect, and highlights potential validity threats and strategies for dealing with them. Causal estimands and identification results are formalized with the potential outcomes notation of the Rubin causal model.

Causal inference plays a central role in many social and behavioral sciences, including psychology and education. But drawing valid causal conclusions is challenging because they are warranted only if the study design meets a set of strong and frequently untestable assumptions. Thus, studies aiming at causal inference should employ designs and design elements that are able to rule out the most plausible threats to validity. Randomized controlled trials (RCTs) are considered the gold standard for causal inference because they rely on the fewest and weakest assumptions. But under certain conditions, quasi-experimental designs that lack random assignment can be as credible as RCTs ( Shadish, Cook, & Campbell, 2002 ).

This article discusses four of the strongest quasi-experimental designs for identifying causal effects: the regression discontinuity design, the instrumental variable design, matching and propensity score designs, and the comparative interrupted time series design. For each design, we outline the strategy and assumptions for identifying a causal effect, address estimation methods, and discuss practical issues and suggestions for strengthening the basic design. To highlight the design differences, throughout the article we use a hypothetical example with the following causal research question: What is the effect of attending a summer science camp on students’ science achievement?

POTENTIAL OUTCOMES AND RANDOMIZED CONTROLLED TRIAL

Before we discuss the four quasi-experimental designs, we introduce the potential outcomes notation of the Rubin causal model (RCM) and show how it is used in the context of an RCT. The RCM ( Holland, 1986 ) formalizes causal inference in terms of potential outcomes, which allow us to precisely define causal quantities of interest and to explicate the assumptions required for identifying them. The RCM considers a potential outcome for each possible treatment condition. For a dichotomous treatment variable (i.e., a treatment and a control condition), each subject $i$ has a potential treatment outcome $Y_i(1)$, which we would observe if subject $i$ received the treatment ($Z_i = 1$), and a potential control outcome $Y_i(0)$, which we would observe if subject $i$ received the control condition ($Z_i = 0$). The difference between the two potential outcomes, $Y_i(1) - Y_i(0)$, represents the individual causal effect.

Suppose we want to evaluate the effect of attending a summer science camp on students’ science achievement scores. Then each student has two potential outcomes: a potential control score for not attending the science camp and a potential treatment score for attending the camp. However, the individual causal effect of attending the camp cannot be inferred from data, because the two potential outcomes are never observed simultaneously. Instead, researchers typically focus on average causal effects. The average treatment effect (ATE) for the entire study population is defined as the difference in the expected potential outcomes, $\mathrm{ATE} = E[Y_i(1)] - E[Y_i(0)]$. Similarly, we can define the ATE for the treated subjects (ATT), $\mathrm{ATT} = E[Y_i(1) \mid Z_i = 1] - E[Y_i(0) \mid Z_i = 1]$. Although the expectations of the potential outcomes are not directly observable because not all potential outcomes are observed, we nonetheless can identify ATE or ATT under some reasonable assumptions. In an RCT, random assignment establishes independence between the potential outcomes and the treatment status, which allows us to infer ATE. Suppose that students are randomly assigned to the science camp and that all students comply with the assigned condition. Then random assignment guarantees that the camp attendance indicator $Z_i$ is independent of the potential achievement scores $Y_i(0)$ and $Y_i(1)$.

The independence assumption allows us to rewrite ATE in terms of observable expectations (i.e., with observed outcomes instead of potential outcomes). First, due to the independence (randomization), the unconditional expectations of the potential outcomes can be expressed as conditional expectations, $E[Y_i(1)] = E[Y_i(1) \mid Z_i = 1]$ and $E[Y_i(0)] = E[Y_i(0) \mid Z_i = 0]$. Second, because the potential treatment outcomes are actually observed for the treated, we can replace the potential treatment outcome with the observed outcome such that $E[Y_i(1) \mid Z_i = 1] = E[Y_i \mid Z_i = 1]$ and, analogously, $E[Y_i(0) \mid Z_i = 0] = E[Y_i \mid Z_i = 0]$. Thus, the ATE is expressible in terms of observable quantities rather than potential outcomes, $\mathrm{ATE} = E[Y_i(1)] - E[Y_i(0)] = E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]$, and we say that ATE is identified.
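To make this identification result concrete, the following minimal R sketch simulates an RCT and estimates ATE as a simple difference in observed group means. All variable names and effect sizes are hypothetical, chosen only for illustration.

```r
# Minimal RCT simulation (sketch; all numbers are hypothetical).
set.seed(0)
n  <- 1000
y0 <- rnorm(n, mean = 50, sd = 10)  # potential control outcomes Y(0)
y1 <- y0 + 5                        # potential treatment outcomes Y(1); true ATE = 5
z  <- rbinom(n, 1, 0.5)             # random assignment to the science camp
y  <- ifelse(z == 1, y1, y0)        # only one potential outcome is observed
mean(y[z == 1]) - mean(y[z == 0])   # difference in means estimates ATE
```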

This derivation also rests on the stable-unit-treatment-value assumption (SUTVA; Imbens & Rubin, 2015 ). SUTVA is required to properly define the potential outcomes, that is, (a) the potential outcomes of a subject depend neither on the assignment mode nor on other subjects’ treatment assignments, and (b) there is only one unique treatment and one unique control condition. Without mentioning it further, we assume SUTVA for all quasi-experimental designs discussed in this article.

REGRESSION DISCONTINUITY DESIGN

For ethical or budgetary reasons, random assignment is often infeasible in practice. Nonetheless, researchers may sometimes still retain full control over treatment assignment, as in a regression discontinuity (RD) design, where subjects are deterministically assigned to treatment conditions based on a continuous assignment variable and a cutoff score.

Suppose that the science camp is a remedial program and that only students whose grade point average (GPA) is less than or equal to 2.0 are eligible to participate. Figure 1 shows a scatterplot of hypothetical data where the x-axis represents the assignment variable ( GPA ) and the y-axis the outcome ( Science Score ). All subjects with a GPA below the cutoff attended the camp (circles), whereas all subjects scoring above the cutoff did not attend (squares). Because all low-achieving students are in the treatment group and all high-achieving students are in the control group, their respective GPA distributions do not overlap, not even at the cutoff. This lack of overlap complicates the identification of a causal effect because students in the treatment and control groups are not comparable at all (i.e., they have completely different distributions of GPA scores).

Figure 1. A hypothetical example of a regression discontinuity design. Note: GPA = grade point average.

One strategy for dealing with the lack of overlap is to rely on the linearity assumption of regression models and to extrapolate into the areas of nonoverlap. However, if the linear models do not correctly specify the functional form, the resulting ATE estimate is biased. A safer strategy is to evaluate the treatment effect only at the cutoff score, where treatment and control cases almost overlap and thus functional form assumptions and extrapolation are barely needed. Consider the treatment and control students who score right at the cutoff or just above it. Students with a GPA of 2.0 participate in the science camp, and students with a GPA of 2.1 are in the control condition (the status quo condition or a different camp). The two groups of students are essentially equivalent because the difference in their GPA scores is negligibly small (2.1 − 2.0 = .1) and likely due to random chance (measurement error) rather than a real difference in ability. Thus, in the very close neighborhood around the cutoff score, the RD design is equivalent to an RCT; therefore, the ATE at the cutoff (ATEC) is identified.

CAUSAL ESTIMAND AND IDENTIFICATION

ATEC is defined as the difference in the expected potential treatment and control outcomes for the subjects scoring exactly at the cutoff: $\mathrm{ATEC} = E[Y_i(1) \mid A_i = a_c] - E[Y_i(0) \mid A_i = a_c]$, where $A$ denotes the assignment variable and $a_c$ the cutoff score. Because we observe only treatment subjects and no control subjects right at the cutoff, we need two assumptions in order to identify ATEC ( Hahn, Todd, & Van der Klaauw, 2001 ): (a) the conditional expectations of the potential treatment and control outcomes are continuous at the cutoff ( continuity ), and (b) all subjects comply with treatment assignment ( full compliance ).

The continuity assumption can be expressed in terms of limits as $\lim_{a \downarrow a_c} E[Y_i(1) \mid A_i = a] = E[Y_i(1) \mid A_i = a_c] = \lim_{a \uparrow a_c} E[Y_i(1) \mid A_i = a]$ and $\lim_{a \downarrow a_c} E[Y_i(0) \mid A_i = a] = E[Y_i(0) \mid A_i = a_c] = \lim_{a \uparrow a_c} E[Y_i(0) \mid A_i = a]$. Thus, we can rewrite ATEC as a difference in limits, $\mathrm{ATEC} = \lim_{a \uparrow a_c} E[Y_i(1) \mid A_i = a] - \lim_{a \downarrow a_c} E[Y_i(0) \mid A_i = a]$, which solves the issue that no control subjects are observed directly at the cutoff. Then, by the full compliance assumption, the potential treatment and control outcomes can be replaced with the observed outcomes, such that $\mathrm{ATEC} = \lim_{a \uparrow a_c} E[Y_i \mid A_i = a] - \lim_{a \downarrow a_c} E[Y_i \mid A_i = a]$ is identified at the cutoff (i.e., ATEC is now expressed in terms of observable quantities). The difference in the limits represents the discontinuity in the mean outcomes exactly at the cutoff ( Figure 1 ).

Estimating ATEC

ATEC can be estimated with parametric or nonparametric regression methods. First, consider the parametric regression of the outcome $Y$ on the treatment $Z$, the cutoff-centered assignment variable $A - a_c$, and their interaction: $Y = \beta_0 + \beta_1 Z + \beta_2(A - a_c) + \beta_3(Z \times (A - a_c)) + e$. If the model correctly specifies the functional form, then $\hat{\beta}_1$ is an unbiased estimator of ATEC. In practice, an appropriate model specification frequently also involves quadratic and cubic terms of the assignment variable plus their interactions with the treatment indicator.
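A minimal sketch of this parametric approach in R, assuming simulated data with hypothetical variable names and effect sizes:

```r
# Parametric RD estimation (sketch; data and effect sizes are hypothetical).
set.seed(1)
n     <- 500
gpa   <- runif(n, 1.0, 3.0)        # assignment variable A
z     <- as.numeric(gpa <= 2.0)    # treatment if GPA at or below the cutoff
y     <- 60 + 8 * z + 5 * (gpa - 2.0) + rnorm(n, sd = 4)  # science score

gpa_c <- gpa - 2.0                 # cutoff-centered assignment variable
fit   <- lm(y ~ z + gpa_c + z:gpa_c)  # Y = b0 + b1*Z + b2*(A - ac) + b3*Z*(A - ac)
coef(fit)["z"]                     # estimate of ATEC
```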

To avoid overly strong functional form assumptions, semiparametric or nonparametric regression methods like generalized additive models or local linear kernel regression can be employed ( Imbens & Lemieux, 2008 ). These methods down-weight or even discard observations that are not in the close neighborhood around the cutoff. The R packages rdd ( Dimmery, 2013 ) and rdrobust ( Calonico, Cattaneo, & Titiunik, 2015 ), or the command rd in STATA ( Nichols, 2007 ) are useful for estimation and diagnostic purposes.
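A corresponding sketch with the rdrobust package, reusing the simulated gpa and y from above. Note that rdrobust reports the discontinuity as the right limit minus the left limit, so with treatment assigned below the cutoff the sign of the estimate is reversed relative to our treatment coding.

```r
# Local nonparametric RD estimation (sketch).
library(rdrobust)
summary(rdrobust(y, gpa, c = 2.0))  # local linear estimate with robust CIs
rdplot(y, gpa, c = 2.0)             # diagnostic plot of the discontinuity
```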

Practical Issues

A major validity threat for RD designs is the manipulation of the assignment score around the cutoff, which directly results in a violation of the continuity assumption ( Wong et al., 2012 ). For instance, if a teacher knows the assignment scores in advance and wants all of his students to attend the science camp, the teacher could falsely report a GPA of 2.0 or below for those students whose actual GPA exceeds the cutoff value.

Another validity threat is noncompliance, meaning that subjects assigned to the control condition may cross over to the treatment and subjects assigned to the treatment do not show up. An RD design with noncompliance is called a fuzzy RD design (instead of a sharp RD design with full compliance). A fuzzy RD design still allows us to identify the intention-to-treat effect or the local average treatment effect at the cutoff (LATEC). The intention-to-treat effect refers to the effect of treatment assignment rather than the actual treatment receipt. LATEC estimates ATEC for the subjects who comply with treatment assignment. LATEC is identified if one uses the assignment status as an instrumental variable for treatment receipt (see the upcoming Instrumental Variable section).

Finally, generalizability and statistical power are often mentioned as major disadvantages of RD designs. Because RD designs identify the treatment effect only at the cutoff, ATEC estimates are not automatically generalizable to subjects scoring further away from the cutoff. Statistical power for detecting a significant effect is an issue because the lack of overlap on the assignment variable results in increased standard errors. With semi- or nonparametric regression methods, power further diminishes.

Strengthening RD Designs

To avoid systematic manipulations of the assignment variable, it is desirable to conceal the assignment rule from study participants and administrators. If the assignment rule is known to them, manipulations can hardly be ruled out, particularly when the stakes are high. Researchers can use the McCrary test ( McCrary, 2008 ) to check for potential manipulations. The test investigates whether there is a discontinuity in the distribution of the assignment variable right at the cutoff. Plotting baseline covariates against the assignment variable, and regressing the covariates on the assignment variable and the treatment indicator also help in detecting potential discontinuities at the cutoff.
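A minimal sketch of the McCrary test via the rdd package, assuming the simulated gpa from above:

```r
# McCrary density test for manipulation of the assignment variable (sketch).
library(rdd)
DCdensity(gpa, cutpoint = 2.0)  # returns the p-value for a density discontinuity
```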

The RD design’s validity can be increased by combining the basic RD design with other designs. An example is the tie-breaking RD design, which uses two cutoff scores. Subjects scoring between the two cutoff scores are randomly assigned to treatment conditions, whereas subjects scoring outside the cutoff interval receive the treatment or control condition according to the RD assignment rule ( Black, Galdo & Smith, 2007 ). This design combines an RD design with an RCT and is advantageous with respect to the correct specification of the functional form, generalizability, and statistical power. Similar benefits can be obtained by adding pretest measures of the outcome or nonequivalent comparison groups ( Wing & Cook, 2013 ).

Imbens and Lemieux (2008) and Lee and Lemieux (2010) provided comprehensive introductions to RD designs. Lee and Lemieux also summarized many applications from economics. Angrist and Lavy (1999) applied the design to investigate the effect of class size on student achievement.

INSTRUMENTAL VARIABLE DESIGN

In practice, researchers often have no or only partial control over treatment selection. In addition, they might also lack reliable knowledge of the selection process. Nonetheless, even with limited control over and knowledge of the selection process, it is still possible to identify a causal treatment effect if an instrumental variable (IV) is available. An IV is an exogenous variable that is related to the treatment but completely unrelated to the outcome, except via treatment. An IV design requires researchers either to create an IV at the design stage of a study (as in an encouragement design; see below) or to find an IV in the data set at hand or a related database.

Consider the science camp example, but suppose that instead of random or deterministic treatment assignment, students decide on their own or together with their parents whether to attend the camp. Many factors may determine the decision, for instance, students’ science ability and motivation, parents’ socioeconomic status, or the availability of public transportation for the daily commute to the camp. Whereas the first three variables are presumably also related to the science outcome, public transportation might be unrelated to the science score (except via camp attendance). Thus, the availability of public transportation may qualify as an IV. Figure 2 illustrates such an IV design: Public transportation (IV) directly affects camp attendance but has no direct or indirect effect on science achievement (outcome) other than through camp attendance (treatment). The question mark represents unknown or unobserved confounders, that is, variables that simultaneously affect both camp attendance and science achievement. The IV design allows us to identify a causal effect even if some or all confounders are unknown or unobserved.

Figure 2. A diagram of an example instrumental variable design.

The strategy for identifying a causal effect is based on exploiting the variation in the treatment variable that is explained by the IV. In Figure 2 , the total variation in the treatment consists of (a) the variation induced by the IV and (b) the variation induced by confounders (question mark) and other exogenous variables (not shown in the figure). Identifying the camp’s effect requires us to isolate the treatment variation that is related to public transportation (IV) and then to use the isolated variation to investigate the camp’s effect on the science score. Because we exploit only the treatment variation induced by the IV and ignore the variation induced by unobserved or unknown confounders, the IV design identifies the ATE for the subpopulation of compliers only. In our example, the compliers are the students who attend the camp if public transportation is available and do not attend if it is unavailable. For students whose parents always use their own car to drop them off and pick them up at the camp location, we cannot infer the causal effect, because their camp attendance is completely unrelated to the availability of public transportation.

Causal Estimand and Identification

The complier average treatment effect (CATE) is defined as the expected difference in potential outcomes for the subpopulation of compliers: $\mathrm{CATE} = E[Y_i(1) \mid \text{Complier}] - E[Y_i(0) \mid \text{Complier}] = \tau_C$.

Identification requires us to distinguish between four latent groups: compliers (C), who attend the camp if public transportation is available but do not attend if unavailable; always-takers (A), who always attend the camp regardless of whether or not public transportation is available; never-takers (N), who never attend the camp regardless of public transportation; and defiers (D), who do not attend if public transportation is available but attend if unavailable. Because group membership is unknown, it is impossible to directly infer CATE from the data of compliers. However, CATE is identified from the entire data set if (a) the IV is predictive of the treatment ( predictive first stage ), (b) the IV is unrelated to the outcome except via treatment ( exclusion restriction ), and (c) no defiers are present ( monotonicity ; Angrist, Imbens, & Rubin, 1996 ; see Steiner, Kim, Hall, & Su, 2015 , for a graphical explanation).

First, notice that the IV’s effects on the treatment ($\gamma$) and the outcome ($\delta$) are directly identified from the observed data because the IV’s relations with the treatment and the outcome are unconfounded. In our example ( Figure 2 ), $\gamma$ denotes the effect of public transportation on camp attendance and $\delta$ the indirect effect of public transportation on the science score. Both effects can be written as weighted averages of the corresponding group-specific effects ($\gamma_C, \gamma_A, \gamma_N, \gamma_D$ and $\delta_C, \delta_A, \delta_N, \delta_D$ for compliers, always-takers, never-takers, and defiers, respectively): $\gamma = p(C)\gamma_C + p(A)\gamma_A + p(N)\gamma_N + p(D)\gamma_D$ and $\delta = p(C)\delta_C + p(A)\delta_A + p(N)\delta_N + p(D)\delta_D$, where $p(\cdot)$ represents the proportion of the respective latent group in the population and $p(C) + p(A) + p(N) + p(D) = 1$. Because the treatment choice of always-takers and never-takers is entirely unaffected by the instrument, the IV’s effect on their treatment status is zero, $\gamma_A = \gamma_N = 0$, and together with the exclusion restriction we also know that $\delta_A = \delta_N = 0$, that is, the IV has no effect on their outcomes. If no defiers are present, $p(D) = 0$ ( monotonicity ), then the IV’s effects on the treatment and outcome simplify to $\gamma = p(C)\gamma_C$ and $\delta = p(C)\delta_C$, respectively. Because $\delta_C = \gamma_C \tau_C$ and $\gamma \neq 0$ ( predictive first stage ), the ratio of the observable IV effects, $\delta$ and $\gamma$, identifies CATE: $\delta / \gamma = p(C)\gamma_C\tau_C \,/\, p(C)\gamma_C = \tau_C$.
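To make the ratio argument concrete, here is a minimal R sketch that computes the Wald estimator $\hat{\delta}/\hat{\gamma}$ on simulated data; the unobserved confounder u, the effect sizes, and all variable names are hypothetical.

```r
# Wald (ratio) estimator for CATE (sketch; data are simulated).
set.seed(2)
n  <- 2000
iv <- rbinom(n, 1, 0.5)                          # public transportation available?
u  <- rnorm(n)                                   # unobserved confounder
z  <- rbinom(n, 1, plogis(-0.5 + 1.5 * iv + u))  # camp attendance
y  <- 50 + 6 * z + 3 * u + rnorm(n)              # science score

gamma_hat <- coef(lm(z ~ iv))["iv"]              # IV effect on the treatment
delta_hat <- coef(lm(y ~ iv))["iv"]              # IV effect on the outcome
delta_hat / gamma_hat                            # Wald estimate of CATE
```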

Estimating CATE

A two-stage least squares (2SLS) regression is typically used for estimating CATE. In the first stage, the treatment $Z$ is regressed on the IV: $Z = \beta_0 + \beta_1 IV + e$. The linear first-stage model applies even with a dichotomous treatment variable (linear probability model). The second stage then regresses the outcome $Y$ on the predicted values $\hat{Z}$ from the first-stage model: $Y = \pi_0 + \pi_1 \hat{Z} + r$, where $\hat{\pi}_1$ is the CATE estimator. The two stages are automatically performed by the 2SLS procedure, which also provides an appropriate standard error for the effect estimate. The STATA commands ivregress and ivreg2 ( Baum, Schaffer, & Stillman, 2007 ) or the sem package in R ( Fox, 2006 ) perform the 2SLS regression.
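A sketch of the 2SLS estimation with the sem package’s tsls function, continuing the simulated iv, z, and y from above (running the two stages manually with lm would give the same point estimate but an inappropriate standard error):

```r
# 2SLS estimation of CATE (sketch; continues the simulated data above).
library(sem)
dat_iv <- data.frame(y, z, iv)
fit    <- tsls(y ~ z, instruments = ~ iv, data = dat_iv)
summary(fit)  # coefficient on z estimates CATE with a proper 2SLS standard error
```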

One challenge in implementing an IV design is to find a valid instrument that satisfies the assumptions just discussed. In particular, the exclusion restriction is untestable and frequently hard to defend in practice. In our example, if high-income families live in suburban areas with bad public transportation connections, then the availability of public transportation is likely related to the science score via household income (or socioeconomic status). Although conditioning on the observed household income can transform public transportation into a conditional IV (see below), one can frequently come up with additional scenarios that explain why the IV is related to the outcome and thus violates the exclusion restriction.

Another issue arises from “weak” IVs that are only weakly related to the treatment. Weak IVs cause efficiency problems ( Wooldridge, 2012 ). If the availability of public transportation barely affects camp attendance because most parents give their children a ride anyway, the IV’s effect on the treatment ($\gamma$) is close to zero. Because $\hat{\gamma}$ is the denominator in the CATE estimator, $\hat{\tau}_C = \hat{\delta}/\hat{\gamma}$, an imprecisely estimated $\hat{\gamma}$ results in a considerable over- or underestimation of CATE. Moreover, standard errors will be large.

One also needs to keep in mind that the substantive meaning of CATE depends on the chosen IV. Consider two slightly different IVs with respect to public transportation: the availability of (a) a bus service and (b) a subway service. For the first IV, the complier population consists of students who choose to (not) attend the camp depending on the availability of a bus service; for the second IV, it consists of students whose choice depends on the availability of a subway service. Because the two complier populations are very likely different from each other (students who are willing to take the subway might not be willing to take the bus), the corresponding CATEs refer to different subpopulations.

Strengthening IV Designs

Given the challenges in identifying a valid instrument from observed data, researchers should consider creating an IV at the design stage of a study. Although it might be impossible to directly assign subjects to treatment conditions, one might still be able to encourage participants to take the treatment. Subjects are randomly encouraged to sign up for treatment, but whether they actually comply with the encouragement is entirely their own decision ( Imai et al., 2011 ). Random encouragement qualifies as an IV because it very likely meets the exclusion restriction. For example, instead of collecting data on public transportation, researchers may advertise and recommend the science camp in a letter to the parents of a randomly selected sample of students.

With observational data it is hard to identify a valid IV because covariates that strongly predict the treatment are usually also related to the outcome. However, these covariates can still qualify as an IV if they affect the outcome only indirectly via other observed variables. Such covariates can be used as conditional IVs, that is, they meet the IV requirements conditional on the observed variables ( Brito & Pearl, 2002 ). Assume the availability of public transportation (IV) is associated with the science score via household income. Then, controlling for the reliably measured household income in both stages of the 2SLS analysis blocks the IV’s relation to the science score and turns public transportation into a conditional IV. However, controlling for a large set of variables does not guarantee that the exclusion restriction is more likely met. It may even result in more bias as compared to an IV analysis with fewer covariates ( Ding & Miratrix, 2015 ; Steiner & Kim, in press ). The choice of a valid conditional IV requires researchers to carefully select the control variables based on subject-matter theory.

The seminal article by Angrist et al. (1996) provides a thorough discussion of the IV design, and Steiner, Kim, et al. (2015 ) proved the identification result using graphical models. Excellent introductions to IV designs can be found in Angrist and Pischke (2009 , 2015) . Angrist and Krueger (1992) is an example of a creative application of the design with birthday as the IV. For encouragement designs, see Holland (1988) and Imai et al. (2011) .

MATCHING AND PROPENSITY SCORE DESIGN

This section considers quasi-experimental designs in which researchers lack control over treatment selection but have good knowledge about the selection mechanism, or at least about the confounders that simultaneously determine treatment selection and the outcome. Due to self-selection or third-person selection of subjects into treatment, the resulting treatment and control groups typically differ in observed but also in unobserved baseline covariates. If we have reliable measures of all confounding covariates, then matching or propensity score (PS) designs balance the groups on the observed baseline covariates and thus enable the identification of causal effects ( Imbens & Rubin, 2015 ). Regression analysis and the analysis of covariance can also remove the confounding bias, but because they rely on functional form assumptions and extrapolation, we discuss only nonparametric matching and PS designs.

Suppose that students decide on their own whether to attend the science camp. Although many factors can affect students’ decisions, teachers with several years of experience running the camp may know that selection is mostly driven by students’ science ability, their liking of science, and their parents’ socioeconomic status. If all the selection-relevant factors that also affect the outcome are known, the question mark in Figure 2 can be replaced by the known confounding covariates.

Given the set of confounding covariates, causal inference with matching or PS designs is straightforward, at least theoretically. The basic one-to-one matching design matches each treatment subject to a control subject that is equivalent, or at least very similar, in the observed covariates. To illustrate the idea of matching, consider a camp attendee with baseline measures of 80 on the science pretest, 6 on liking science, and 50 on socioeconomic status. A multivariate matching strategy then tries to find a nonattendee with exactly the same, or at least very similar, baseline measures. If we succeed in finding close matches for all camp attendees, the matched samples of attendees and nonattendees will have almost identical covariate distributions.

Although multivariate matching works well when the number of confounders is small and the pool of control subjects is large relative to the number of treatment subjects, it is usually difficult to find close matches with a large set of covariates or a small pool of control subjects. Matching on the PS helps to overcome this issue because the PS is a univariate score computed from the observed covariates ( Rosenbaum & Rubin, 1983 ). The PS is formally defined as the conditional probability of receiving the treatment given the set of observed covariates $X$: $PS = \Pr(Z = 1 \mid X)$.

Matching and PS designs usually investigate $\mathrm{ATE} = E[Y_i(1)] - E[Y_i(0)]$ or $\mathrm{ATT} = E[Y_i(1) \mid Z_i = 1] - E[Y_i(0) \mid Z_i = 1]$. Both causal effects are identified if (a) the potential outcomes are statistically independent of the treatment indicator given the set of observed confounders $X$, $\{Y(1), Y(0)\} \perp Z \mid X$ ( unconfoundedness ; $\perp$ denotes independence), and (b) the treatment probability is strictly between zero and one, $0 < \Pr(Z = 1 \mid X) < 1$ ( positivity ).

By the positivity assumption we get $E[Y_i(1)] = E_X[E[Y_i(1) \mid X]]$ and $E[Y_i(0)] = E_X[E[Y_i(0) \mid X]]$. If the unconfoundedness assumption holds, we can write the inner expectations as $E[Y_i(1) \mid X] = E[Y_i(1) \mid Z_i = 1; X]$ and $E[Y_i(0) \mid X] = E[Y_i(0) \mid Z_i = 0; X]$. Finally, because the treatment (control) outcomes of the treatment (control) subjects are actually observed, ATE is identified because it can be expressed in terms of observable quantities: $\mathrm{ATE} = E_X[E[Y_i \mid Z_i = 1; X]] - E_X[E[Y_i \mid Z_i = 0; X]]$. The same can be shown for ATT. The unconfoundedness and positivity assumptions are frequently referred to jointly as the strong ignorability assumption. Rosenbaum and Rubin (1983) proved that if the assignment is strongly ignorable given $X$, then it is also strongly ignorable given the PS alone.

Estimating ATE and ATT

Matching designs use a distance measure for matching each treatment subject to the closest control subject. The Mahalanobis distance is usually used for multivariate matching and the Euclidean distance on the logit of the PS for PS matching. Matching strategies differ with respect to the matching ratio (one-to-one or one-to-many), replacement of matched subjects (with or without replacement), use of a caliper (treatment subjects that do not have a control subject within a certain threshold remain unmatched), and the matching algorithm (greedy, genetic, or optimal matching; Sekhon, 2011 ; Steiner & Cook, 2013 ). Because we try to find at least one control subject for each treatment subject, matching estimators typically estimate ATT. Once treatment and control subjects are matched, ATT is computed as the difference in the mean outcome of the treatment and control group. An alternative matching strategy that allows for estimating ATE is full matching, which stratifies all subjects into the maximum number of strata, where each stratum contains at least one treatment and one control subject ( Hansen, 2004 ).
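A minimal sketch of one-to-one PS matching with the MatchIt package; the covariates pre and ses, the selection model, and all effect sizes are hypothetical, simulated only for illustration.

```r
# One-to-one PS matching and an ATT estimate (sketch; data are simulated).
library(MatchIt)
set.seed(3)
n   <- 1000
pre <- rnorm(n, 70, 10)   # science pretest
ses <- rnorm(n, 50, 10)   # socioeconomic status
z   <- rbinom(n, 1, plogis(0.05 * (pre - 70) + 0.03 * (ses - 50)))
y   <- 0.6 * pre + 0.2 * ses + 5 * z + rnorm(n, sd = 5)
dat <- data.frame(y, z, pre, ses)

m   <- matchit(z ~ pre + ses, data = dat, method = "nearest")  # PS via logistic reg.
summary(m)                                   # covariate balance diagnostics
md  <- match.data(m)                         # the matched sample
with(md, mean(y[z == 1]) - mean(y[z == 0]))  # ATT estimate in the matched sample
```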

The PS can also be used for PS stratification and inverse-propensity weighting. PS stratification stratifies the treatment and control subjects into at least five strata and estimates the treatment effect within each stratum. ATE or ATT is then obtained as the weighted average of the stratum-specific treatment effects. Inverse-propensity weighting follows the same logic as inverse-probability weighting in survey research ( Horvitz & Thompson, 1952 ) and requires the computation of weights that refer to either the overall population (ATE) or the population of treated subjects only (ATT). Given the inverse-propensity weights, ATE or ATT is usually estimated via weighted least squares regression.
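A sketch of inverse-propensity weighting on the same simulated data; the weight formulas for ATE and ATT are standard, and the PS is estimated here with logistic regression (discussed in the next paragraph).

```r
# Inverse-propensity weighting (sketch; continues dat from above).
ps    <- glm(z ~ pre + ses, data = dat, family = binomial)$fitted.values
w_ate <- ifelse(dat$z == 1, 1 / ps, 1 / (1 - ps))   # ATE weights
w_att <- ifelse(dat$z == 1, 1, ps / (1 - ps))       # ATT weights
coef(lm(y ~ z, data = dat, weights = w_ate))["z"]   # weighted ATE estimate
```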

Because the true PSs are unknown, they need to be estimated from the observed data. The most common method for estimating the PS is logistic regression, which regresses the binary treatment indicator Z on predictors of the observed covariates. The PS model is specified according to balance criteria (instead of goodness of fit criteria), that is, the estimated PSs should remove all baseline differences in observed covariates ( Imbens & Rubin, 2015 ). The predicted probabilities from the PS model represent the estimated PSs.
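A sketch of PS estimation with a balance diagnostic, assuming the simulated data from above; the smd helper and the quadratic term are hypothetical choices for illustration.

```r
# PS estimation via logistic regression, judged by balance criteria (sketch).
ps_fit <- glm(z ~ pre + ses + I(pre^2), data = dat, family = binomial)
dat$ps <- predict(ps_fit, type = "response")  # estimated PSs

# standardized mean difference of a covariate, with optional weights
smd <- function(x, z, w) {
  (weighted.mean(x[z == 1], w[z == 1]) -
   weighted.mean(x[z == 0], w[z == 0])) / sd(x)
}
w <- ifelse(dat$z == 1, 1 / dat$ps, 1 / (1 - dat$ps))  # ATE weights
smd(dat$pre, dat$z, rep(1, nrow(dat)))  # unweighted SMD (imbalanced)
smd(dat$pre, dat$z, w)                  # weighted SMD (should be near zero)
```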

All three PS designs—matching, stratification, and weighting—can benefit from additional covariance adjustments in an outcome regression. That is, for the matched, stratified or weighted data, the outcome is regressed on the treatment indicator and the additional covariates. Combining the PS design with a covariance adjustment gives researchers two chances to remove the confounding bias, by correctly specifying either the PS model or the outcome model. These combined methods are said to be doubly robust because they are robust against either the misspecification of the PS model or the misspecification of the outcome model ( Robins & Rotnitzky, 1995 ). The R packages optmatch ( Hansen & Klopfer, 2006 ) and MatchIt ( Ho et al., 2011 ) and the STATA command teffects , in particular teffects psmatch ( StataCorp, 2015 ), can be useful for matching or PS analyses.

The most challenging issue with matching and PS designs is the selection of covariates for establishing unconfoundedness. Ideally, subject-matter theory about the selection process and the outcome-generating model is used to select a set of covariates that removes all of the confounding ( Pearl, 2009 ). If strong subject-matter theories are not available, selecting the right covariates is difficult. In the hope of removing a major part of the confounding bias, if not all of it, a frequently applied strategy is to match on as many covariates as possible. However, recent literature shows that the thoughtless inclusion of covariates may increase rather than reduce the confounding bias ( Pearl, 2010 ; Steiner & Kim, in press). The risk of increasing bias can be reduced if the observed covariates cover a broad range of heterogeneous construct domains, including at least one reliable pretest measure of the outcome ( Steiner, Cook, et al., 2015 ). Besides selecting the right covariates, researchers also need to measure them reliably. The unreliable measurement of confounding covariates has an effect similar to the omission of a confounder: It results in a violation of the unconfoundedness assumption and thus in a biased effect estimate ( Steiner, Cook, & Shadish, 2011 ; Steiner & Kim, in press ).

Even if the set of reliably measured covariates establishes unconfoundedness, we still need to correctly specify the functional form of the PS model. Although parametric models like logistic regression, including higher order terms, might frequently approximate the correct functional form, they still rely on the linearity assumption. The linearity assumption can be relaxed if one estimates the PS with statistical learning algorithms like classification trees, neural networks, or the LASSO ( Keller, Kim, & Steiner, 2015 ; McCaffrey, Ridgeway, & Morral, 2004 ).
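As one example of such a statistical learning approach, a LASSO-based PS sketch with the glmnet package; the polynomial feature expansion is an arbitrary illustration, not a recommendation.

```r
# PS estimation with the LASSO (sketch; continues dat from above).
library(glmnet)
X  <- model.matrix(~ poly(pre, 3) * poly(ses, 3), data = dat)[, -1]
cv <- cv.glmnet(X, dat$z, family = "binomial")  # cross-validated LASSO
dat$ps_lasso <- as.numeric(
  predict(cv, newx = X, s = "lambda.min", type = "response")
)
```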

Strengthening Matching and PS Designs

The credibility of matching and PS designs relies heavily on the unconfoundedness assumption. Although it is empirically untestable, there are indirect ways to assess it. First, one can use unaffected (nonequivalent) outcomes, that is, outcomes known not to be affected by the treatment ( Shadish et al., 2002 ). For instance, we may expect that attendance at the science camp does not significantly affect the reading score. Thus, if we observe a significant group difference in the reading score after the PS adjustment, bias due to unobserved confounders (e.g., general intelligence) is still likely. Second, adding a second but conceptually different control group allows for a similar test as with the unaffected outcome ( Rosenbaum, 2002 ).

Because researchers rarely know whether the unconfoundedness assumption is actually met with the data at hand, it is important to assess the effect estimate’s sensitivity to potentially unobserved confounders. Sensitivity analyses investigate how strongly an estimate’s magnitude and significance would change if a confounder of a certain strength had been omitted from the analyses. Causal conclusions are much more credible if the effect’s direction, magnitude, and significance are rather insensitive to omitted confounders ( Rosenbaum, 2002 ). However, despite the value of sensitivity analyses, they are not informative about whether hidden bias is actually present.

Schafer and Kang (2008) and Steiner and Cook (2013) provided comprehensive introductions to matching and PS designs. Rigorous formalizations and technical details can be found in Imbens and Rubin (2015) . Rosenbaum (2002) discussed many important design issues for these designs.

COMPARATIVE INTERRUPTED TIME SERIES DESIGN

The designs discussed so far require researchers to have either full control over treatment assignment or reliable knowledge of the exogenous (IV) or endogenous part of the selection mechanism (i.e., the confounders). If none of these requirements are met, a comparative interrupted time series (CITS) design might be a viable alternative if (a) multiple measurements of the outcome ( time series ) are available for both the treatment and a comparison group and (b) the treatment group’s time series has been interrupted by an intervention.

Suppose that all students of one class in a school (say, an advanced science class) attend the camp, whereas all students of another class in the same school do not attend. Also assume that monthly measures of science achievement before and after the science camp are available. Figure 3 illustrates such a scenario where the x -axis represents time in Months and the y -axis the Science Score (aggregated at the class level). The filled symbols indicate the treatment group (science camp), open symbols the comparison group (no science camp). The science camp intervention divides both time series into a preintervention time series (circles) and a postintervention time series (squares). The changes in the levels and slopes of the pre- and postintervention regression lines represent the camp’s impact but possibly also the effect of other events that co-occur with the intervention. The dashed lines extrapolate the preintervention growth curves into the postintervention period, and thus represent the counterfactual situation where the intervention but also other co-occurring events are absent.

Figure 3. A hypothetical example of a comparative interrupted time series design.

The strength of a CITS design is its ability to discriminate between the intervention’s effect and the effects of co-occurring events. Such events might be other potentially competing interventions (history effects) or changes in the measurement of the outcome (instrumentation), for instance. If the co-occurring events affect the treatment and comparison group to the same extent, then subtracting the changes in the comparison group’s growth curve from the changes in the treatment group’s growth curve provides a valid estimate of the intervention’s impact. Because we investigate the difference in the changes (= differences) of the two growth curves, the CITS design is a special case of the difference-in-differences design ( Somers et al., 2013 ).

Assume that a daily TV series about Albert Einstein was broadcast in the evenings of the science camp week and that students of both classes were exposed to the TV series to the same extent. It follows that the comparison group’s change in its growth curve represents the TV series’ impact. The comparison group’s time series in Figure 3 indicates that the TV series might have had an immediate impact on the growth curve’s level but almost no effect on its slope. The treatment group’s change in its growth curve, on the other hand, is due to both the science camp and the TV series. Thus, by differencing out the TV series’ effect (estimated from the comparison group), we can identify the camp’s effect.

Let $t_c$ denote the time point of the intervention; then the intervention’s effect on the treated (ATT) at a postintervention time point $t \geq t_c$ is defined as $\tau_t = E[Y_{it}^T(1)] - E[Y_{it}^T(0)]$, where $Y_{it}^T(0)$ and $Y_{it}^T(1)$ are the potential control and treatment outcomes of subject $i$ in the treatment group ($T$) at time point $t$. The time series of the expected potential outcomes can be formalized as a sum of nonparametric but additive time-dependent functions. The treatment group’s expected potential control outcome can be represented as $E[Y_{it}^T(0)] = f_0^T(t) + f_E^T(t)$, where the control function $f_0^T(t)$ generates the expected potential control outcomes in the absence of any intervention ($I$) or co-occurring events ($E$), and the event function $f_E^T(t)$ adds the effects of co-occurring events. Similarly, the expected potential treatment outcome can be written as $E[Y_{it}^T(1)] = f_0^T(t) + f_E^T(t) + f_I^T(t)$, which adds the intervention’s effect $\tau_t = f_I^T(t)$ to the control and event functions. In the absence of a comparison group, we can try to identify the impact of the intervention by comparing the observed postintervention outcomes to the outcomes extrapolated from the preintervention time series (dashed line in Figure 3 ). Extrapolation is necessary because we do not observe any potential control outcomes in the postintervention period (only potential treatment outcomes are observed). Let $\hat{f}_0^T(t)$ denote the parametric extrapolation of the preintervention control function $f_0^T(t)$; then the observable pre–post-intervention difference ($PP_t^T$) in the expected outcome is $PP_t^T = f_0^T(t) + f_E^T(t) + f_I^T(t) - \hat{f}_0^T(t) = f_I^T(t) + (f_0^T(t) - \hat{f}_0^T(t)) + f_E^T(t)$. Thus, in the absence of a comparison group, ATT is identified (i.e., $PP_t^T = f_I^T(t) = \tau_t$) only if the control function is correctly specified ($f_0^T(t) = \hat{f}_0^T(t)$) and no co-occurring events are present ($f_E^T(t) = 0$).

The comparison group in a CITS design allows us to relax both of these identifying assumptions. To see this, we first define the expected control outcome of the comparison group ($C$) as a sum of two time-dependent functions, as before: $E[Y_{it}^C(0)] = f_0^C(t) + f_E^C(t)$. Then, extrapolating the comparison group’s preintervention function into the postintervention period, $\hat{f}_0^C(t)$, we can compute the pre–post-intervention difference for the comparison group: $PP_t^C = f_0^C(t) + f_E^C(t) - \hat{f}_0^C(t) = f_E^C(t) + (f_0^C(t) - \hat{f}_0^C(t))$. If the control function is correctly specified, $f_0^C(t) = \hat{f}_0^C(t)$, the effect of the co-occurring events is identified: $PP_t^C = f_E^C(t)$. However, we do not necessarily need a correctly specified control function, because in a CITS design we focus on the difference between the treatment and comparison groups’ pre–post-intervention differences, that is, $PP_t^T - PP_t^C = f_I^T(t) + \{(f_0^T(t) - \hat{f}_0^T(t)) - (f_0^C(t) - \hat{f}_0^C(t))\} + \{f_E^T(t) - f_E^C(t)\}$. Thus, ATT is identified, $PP_t^T - PP_t^C = f_I^T(t) = \tau_t$, if (a) both control functions are either correctly specified or misspecified to the same additive extent, such that $(f_0^T(t) - \hat{f}_0^T(t)) = (f_0^C(t) - \hat{f}_0^C(t))$ ( no differential misspecification ), and (b) the effect of co-occurring events is identical in the treatment and comparison groups, $f_E^T(t) = f_E^C(t)$ ( no differential event effects ).

Estimating ATT

CITS designs are typically analyzed with linear regression models that regress the outcome $Y$ on the centered time variable ($T - t_c$), the intervention indicator $Z$ ($Z = 0$ if $t < t_c$, otherwise $Z = 1$), the group indicator $G$ ($G = 1$ for the treatment group and $G = 0$ for the control group), and the corresponding two-way and three-way interactions:

$Y = \beta_0 + \beta_1(T - t_c) + \beta_2 Z + \beta_3(Z \times (T - t_c)) + \beta_4 G + \beta_5(G \times Z) + \beta_6(G \times (T - t_c)) + \beta_7(G \times Z \times (T - t_c)) + e.$

Depending on the number of subjects in each group, fixed or random effects for the subjects are included as well (time fixed or random effects can also be considered). $\hat{\beta}_5$ estimates the intervention’s immediate effect at the onset of the intervention (change in intercept) and $\hat{\beta}_7$ the intervention’s effect on the growth rate (change in slope). The inclusion of dummy variables for each postintervention time point (plus their interactions with the intervention and group indicators) would allow for a direct estimation of the time-specific effects. If the time series are long enough (at least 100 time points), then more careful modeling of the autocorrelation structure via time series models should be considered.
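A minimal sketch of this model in R with simulated class-level data (13 monthly measures per class); all effect sizes and the shared "TV series" shift are hypothetical.

```r
# CITS regression (sketch; simulated, hypothetical data).
set.seed(4)
months <- rep(-6:6, times = 2)      # time, centered at the intervention (t_c = 0)
G <- rep(c(1, 0), each = 13)        # treatment vs. comparison class
Z <- as.numeric(months >= 0)        # postintervention indicator
y <- 50 + 0.8 * months +            # shared preintervention trend
     1 * Z + 0.2 * Z * months +     # co-occurring event (affects both groups)
     2 * G +                        # baseline group difference
     4 * G * Z + 0.5 * G * Z * months +  # camp: level and slope effects
     rnorm(length(months))
fit <- lm(y ~ months * Z * G)       # all two- and three-way interactions
coef(fit)[c("Z:G", "months:Z:G")]   # immediate effect (beta5), slope change (beta7)
```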

Compared to other designs, CITS designs heavily rely on extrapolation and thus on functional form assumptions. Therefore, it is crucial that the functional forms of the pre- and postintervention time series (including their extrapolations) are correctly specified or at least not differentially misspecified. With short time series or measurement points that inadequately capture periodical variations, the correct specification of the functional form is very challenging. Another specification aspect concerns serial dependencies among the data points. Failing to model serial dependencies can bias effect estimates and their standard errors such that significance tests might be misleading. Accounting for serial dependencies requires autoregressive models (e.g., ARIMA models), but the time series should have at least 100 time points ( West, Biesanz, & Pitts, 2000 ). Standard fixed effects or random effects models deal at least partially with the dependence structure. Robust standard errors (e.g., Huber-White corrected ones) or the bootstrap can also be used to account for dependency structures.
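For instance, autocorrelation-robust (HAC) standard errors for the CITS fit above can be obtained with the sandwich and lmtest packages; with a series this short the correction is only illustrative.

```r
# Heteroskedasticity- and autocorrelation-robust tests (sketch; uses fit above).
library(sandwich)
library(lmtest)
coeftest(fit, vcov = vcovHAC(fit))  # HAC-corrected coefficient tests
```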

Events that co-occur with the intervention of interest, like history or instrumentation effects, are a major threat to the time series designs that lack a comparison group ( Shadish et al., 2002 ). CITS designs are rather robust to co-occurring events as long as the treatment and comparison groups are affected to the same additive extent. However, there is no guarantee that both groups are exposed to the same events and affected to the same extent. For example, if students who do not attend the camp are less likely to watch the TV series, its effect cannot be completely differenced out (unless the exposure to the TV series is measured). If one uses aggregated data like class or school averages of achievement scores, then differential compositional shifts over time can also invalidate the CITS design. Compositional shifts occur due to dropouts or incoming subjects over time.

Strengthening CITS Designs

If the treatment and comparison group’s preintervention time series are very different (different levels and slopes), then the assumption that history or instrumentation threats affect both groups to the same additive extent may not hold. Matching treatment and comparison subjects prior to the analysis can increase the plausibility of this assumption. Instead of using all nonparticipating students of the comparison class, we may select only those students who have a similar level and growth in the preintervention science scores as the students participating in the camp. We can also match on additional covariates like socioeconomic status or motivation levels. Multivariate or PS matching can be used for this purpose. If the two groups are similar, it is more likely that they are affected by co-occurring events to the same extent.

As with the matching and PS designs, using an unaffected outcome in CITS designs helps to probe the untestable assumptions ( Coryn & Hobson, 2011 ; Shadish et al., 2002 ). For instance, we might expect that attending the science camp does not affect students’ reading scores but that some validity threats (e.g., attrition) operate on both the reading and science outcome. If we find a significant camp effect on the reading score, the validity of the CITS design for evaluating the camp’s impact on the science score is in doubt.

Another strategy for avoiding validity threats is to control the time point of the intervention, if possible. Researchers can postpone the implementation of the treatment until they have enough preintervention measures to reliably estimate the functional form. They can also choose to intervene when threats to validity are less likely (e.g., avoiding the week of the TV series). Control over the intervention also allows researchers to introduce and remove the treatment in subsequent time intervals, maybe even with switching replications between two (or more) groups. If the treatment is effective, we expect the pattern of the intervention scheme to be directly reflected in the time series of the outcome (for more details, see Shadish et al., 2002 ; for the literature on single-case designs, see Kazdin, 2011 ).

A comprehensive introduction to the CITS design can be found in Shadish et al. (2002) , which also addresses many classical applications. For more technical details on its identification, refer to Lechner (2011) . Wong, Cook, and Steiner (2009) evaluated the effect of No Child Left Behind using a CITS design.

CONCLUDING REMARKS

This article discussed four of the strongest quasi-experimental designs for causal inference when randomized experiments are not feasible. For each design we highlighted the identification strategies and the required assumptions. In practice, it is crucial that the design assumptions are met; otherwise, biased effect estimates result. Because the most important assumptions, like the exclusion restriction or the unconfoundedness assumption, are not directly testable, researchers should always try to assess their plausibility via indirect tests and investigate the effect estimates’ sensitivity to violations of these assumptions.

Our discussion of RD, IV, PS, and CITS designs also made it very clear that, in comparison to RCTs, quasi-experimental designs rely on more or stronger assumptions. With perfect control over treatment assignment and treatment implementation (as in an RCT), causal inference is warranted by a minimal set of assumptions. But with limited control over and knowledge about treatment assignment and implementation, stronger assumptions are required, and causal effects might be identifiable only for local subpopulations. Nonetheless, observational data sometimes meet the assumptions of a quasi-experimental design, at least approximately, such that causal conclusions are credible. If so, the estimates of quasi-experimental designs, which exploit naturally occurring selection processes and real-world implementations of the treatment, are frequently more generalizable than the results from a controlled laboratory experiment. Thus, if external validity is a major concern, the results of randomized experiments should always be complemented by findings from valid quasi-experiments.

  • Angrist JD, Imbens GW, & Rubin DB (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91, 444–455.
  • Angrist JD, & Krueger AB (1992). The effect of age at school entry on educational attainment: An application of instrumental variables with moments from two samples. Journal of the American Statistical Association, 87, 328–336.
  • Angrist JD, & Lavy V (1999). Using Maimonides’ rule to estimate the effect of class size on scholastic achievement. Quarterly Journal of Economics, 114, 533–575.
  • Angrist JD, & Pischke JS (2009). Mostly harmless econometrics: An empiricist’s companion. Princeton, NJ: Princeton University Press.
  • Angrist JD, & Pischke JS (2015). Mastering ’metrics: The path from cause to effect. Princeton, NJ: Princeton University Press.
  • Baum CF, Schaffer ME, & Stillman S (2007). Enhanced routines for instrumental variables/generalized method of moments estimation and testing. The Stata Journal, 7, 465–506.
  • Black D, Galdo J, & Smith JA (2007). Evaluating the bias of the regression discontinuity design using experimental data (Working paper). Chicago, IL: University of Chicago.
  • Brito C, & Pearl J (2002). Generalized instrumental variables. In Darwiche A & Friedman N (Eds.), Uncertainty in artificial intelligence (pp. 85–93). San Francisco, CA: Morgan Kaufmann.
  • Calonico S, Cattaneo MD, & Titiunik R (2015). rdrobust: Robust data-driven statistical inference in regression-discontinuity designs (R package ver. 0.80). Retrieved from http://CRAN.R-project.org/package=rdrobust
  • Coryn CLS, & Hobson KA (2011). Using nonequivalent dependent variables to reduce internal validity threats in quasi-experiments: Rationale, history, and examples from practice. New Directions for Evaluation, 131, 31–39.
  • Dimmery D (2013). rdd: Regression discontinuity estimation (R package ver. 0.56). Retrieved from http://CRAN.R-project.org/package=rdd
  • Ding P, & Miratrix LW (2015). To adjust or not to adjust? Sensitivity analysis of M-bias and butterfly-bias. Journal of Causal Inference, 3(1), 41–57.
  • Fox J (2006). Structural equation modeling with the sem package in R. Structural Equation Modeling, 13, 465–486.
  • Hahn J, Todd P, & Van der Klaauw W (2001). Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica, 69(1), 201–209.
  • Hansen BB (2004). Full matching in an observational study of coaching for the SAT. Journal of the American Statistical Association, 99, 609–618.
  • Hansen BB, & Klopfer SO (2006). Optimal full matching and related designs via network flows. Journal of Computational and Graphical Statistics, 15, 609–627.
  • Ho D, Imai K, King G, & Stuart EA (2011). MatchIt: Nonparametric preprocessing for parametric causal inference. Journal of Statistical Software, 42(8), 1–28. Retrieved from http://www.jstatsoft.org/v42/i08/
  • Holland PW (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945–960.
  • Holland PW (1988). Causal inference, path analysis and recursive structural equations models. ETS Research Report Series. doi:10.1002/j.2330-8516.1988.tb00270.x
  • Horvitz DG, & Thompson DJ (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663–685.
  • Imai K, Keele L, Tingley D, & Yamamoto T (2011). Unpacking the black box of causality: Learning about causal mechanisms from experimental and observational studies. American Political Science Review, 105, 765–789.
  • Imbens GW, & Lemieux T (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142, 615–635.
  • Imbens GW, & Rubin DB (2015). Causal inference in statistics, social, and biomedical sciences. New York, NY: Cambridge University Press.
  • Kazdin AE (2011). Single-case research designs: Methods for clinical and applied settings. New York, NY: Oxford University Press.
  • Keller B, Kim JS, & Steiner PM (2015). Neural networks for propensity score estimation: Simulation results and recommendations. In van der Ark LA, Bolt DM, Chow S-M, Douglas JA, & Wang W-C (Eds.), Quantitative psychology research (pp. 279–291). New York, NY: Springer.
  • Lechner M (2011). The estimation of causal effects by difference-in-difference methods. Foundations and Trends in Econometrics, 4, 165–224.
  • Lee DS, & Lemieux T (2010). Regression discontinuity designs in economics. Journal of Economic Literature, 48, 281–355.
  • McCaffrey DF, Ridgeway G, & Morral AR (2004). Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological Methods, 9, 403–425.
  • McCrary J (2008). Manipulation of the running variable in the regression discontinuity design: A density test. Journal of Econometrics, 142, 698–714.
  • Nichols A (2007). rd: Stata modules for regression discontinuity estimation. Retrieved from http://ideas.repec.org/c/boc/bocode/s456888.html
  • Pearl J (2009). Causality: Models, reasoning, and inference (2nd ed.). New York, NY: Cambridge University Press.
  • Pearl J (2010). On a class of bias-amplifying variables that endanger effect estimates. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (pp. 425–432). Corvallis, OR: Association for Uncertainty in Artificial Intelligence.
  • Robins JM, & Rotnitzky A (1995). Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association, 90(429), 122–129.
  • Rosenbaum PR (2002). Observational studies. New York, NY: Springer.
  • Rosenbaum PR, & Rubin DB (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55.
  • Schafer JL, & Kang J (2008). Average causal effects from nonrandomized studies: A practical guide and simulated example. Psychological Methods, 13, 279–313.
  • Sekhon JS (2011). Multivariate and propensity score matching software with automated balance optimization: The Matching package for R. Journal of Statistical Software, 42(7), 1–52.
  • Shadish WR, Cook TD, & Campbell DT (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.
  • Somers M, Zhu P, Jacob R, & Bloom H (2013). The validity and precision of the comparative interrupted time series design and the difference-in-difference design in educational evaluation (MDRC working paper in research methodology). New York, NY: MDRC.
  • StataCorp. (2015). Stata treatment-effects reference manual: Potential outcomes/counterfactual outcomes. College Station, TX: Stata Press. Retrieved from http://www.stata.com/manuals14/te.pdf
  • Steiner PM, & Cook D (2013). Matching and propensity scores. In Little T (Ed.), The Oxford handbook of quantitative methods in psychology (Vol. 1, pp. 237–259). New York, NY: Oxford University Press.
  • Steiner PM, Cook TD, Li W, & Clark MH (2015). Bias reduction in quasi-experiments with little selection theory but many covariates. Journal of Research on Educational Effectiveness, 8, 552–576.
  • Steiner PM, Cook TD, & Shadish WR (2011). On the importance of reliable covariate measurement in selection bias adjustments using propensity scores. Journal of Educational and Behavioral Statistics, 36, 213–236.
  • Steiner PM, & Kim Y (in press). The mechanics of omitted variable bias: Bias amplification and cancellation of offsetting biases. Journal of Causal Inference.
  • Steiner PM, Kim Y, Hall CE, & Su D (2015). Graphical models for quasi-experimental designs. Sociological Methods & Research. Advance online publication. doi:10.1177/0049124115582272
  • West SG, Biesanz JC, & Pitts SC (2000). Causal inference and generalization in field settings: Experimental and quasi-experimental designs. In Reis HT & Judd CM (Eds.), Handbook of research methods in social and personality psychology (pp. 40–84). New York, NY: Cambridge University Press.
  • Wing C, & Cook TD (2013). Strengthening the regression discontinuity design using additional design elements: A within-study comparison. Journal of Policy Analysis and Management, 32, 853–877.
  • Wong M, Cook TD, & Steiner PM (2009). No Child Left Behind: An interim evaluation of its effects on learning using two interrupted time series each with its own non-equivalent comparison series (Working Paper No. WP-09–11) . Evanston, IL: Institute for Policy Research, Northwestern University. [ Google Scholar ]
  • Wong VC, Wing C, Steiner PM, Wong M, & Cook TD (2012). Research designs for program evaluation . Handbook of Psychology , 2 , 316–341. [ Google Scholar ]
  • Wooldridge J (2012). Introductory econometrics: A modern approach (5th ed.). Mason, OH: South-Western Cengage Learning. [ Google Scholar ]


Quasi-Experimental Design | Definition, Types & Examples

Published on July 31, 2020 by Lauren Thomas . Revised on January 22, 2024.

Like a true experiment , a quasi-experimental design aims to establish a cause-and-effect relationship between an independent and dependent variable .

However, unlike a true experiment, a quasi-experiment does not rely on random assignment . Instead, subjects are assigned to groups based on non-random criteria.

Quasi-experimental design is a useful tool in situations where true experiments cannot be used for ethical or practical reasons.

Quasi-experimental design vs. experimental design


There are several common differences between true and quasi-experimental designs.

  • Assignment to treatment: In a true experiment, the researcher randomly assigns subjects to control and treatment groups. In a quasi-experiment, some other, non-random method is used to assign subjects to groups.
  • Control over treatment: In a true experiment, the researcher usually designs the treatment. In a quasi-experiment, the researcher often does not design the treatment, but instead studies pre-existing groups that received different treatments after the fact.
  • Use of control groups: A true experiment requires the use of control groups. In a quasi-experiment, control groups are not required (although they are commonly used).

Example of a true experiment vs a quasi-experiment

Suppose you want to test whether a new therapy leads to better outcomes than the standard course of treatment for patients at a mental health clinic. In a true experiment, you would randomly assign patients to the two treatments. However, for ethical reasons, the directors of the mental health clinic may not give you permission to randomly assign their patients to treatments. In this case, you cannot run a true experiment.

Instead, you can use a quasi-experimental design. If one clinic has already adopted the new therapy while a comparable clinic continues with the standard course of treatment, you can use these pre-existing groups to study the symptom progression of the patients treated with the new therapy versus those receiving the standard course of treatment.


Many types of quasi-experimental designs exist. Here we explain three of the most common types: nonequivalent groups design, regression discontinuity, and natural experiments.

Nonequivalent groups design

In nonequivalent group design, the researcher chooses existing groups that appear similar, but where only one of the groups experiences the treatment.

In a true experiment with random assignment , the control and treatment groups are considered equivalent in every way other than the treatment. But in a quasi-experiment where the groups are not random, they may differ in other ways—they are nonequivalent groups .

When using this kind of design, researchers try to account for any confounding variables by controlling for them in their analysis or by choosing groups that are as similar as possible.

This is the most common type of quasi-experimental design.
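As a rough illustration of this kind of statistical control, the sketch below simulates a nonequivalent groups study in Python and adjusts the group comparison for a pretest confounder with ordinary least squares. All variable names, effect sizes, and data here are hypothetical.

```python
# Minimal sketch (hypothetical data): adjusting a nonequivalent groups
# comparison for a pretest confounder with ordinary least squares.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500

# Group membership depends on the pretest, so the groups are nonequivalent.
pretest = rng.normal(50, 10, n)
treated = (pretest + rng.normal(0, 5, n) > 50).astype(int)

# True treatment effect of 5 points; the pretest also drives the outcome.
posttest = 0.8 * pretest + 5 * treated + rng.normal(0, 5, n)

df = pd.DataFrame({"pretest": pretest, "treated": treated, "posttest": posttest})

naive = smf.ols("posttest ~ treated", data=df).fit()               # ignores the confounder
adjusted = smf.ols("posttest ~ treated + pretest", data=df).fit()  # controls for it

print("naive estimate:   ", naive.params["treated"])
print("adjusted estimate:", adjusted.params["treated"])
```

In this simulation the naive estimate is inflated because higher-pretest students select into treatment, while the adjusted estimate recovers something close to the true effect; with unmeasured confounders, of course, no such guarantee exists.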

Regression discontinuity

Many potential treatments that researchers wish to study are designed around an essentially arbitrary cutoff, where those above the threshold receive the treatment and those below it do not.

Near this threshold, the differences between the two groups are often so minimal as to be nearly nonexistent. Therefore, researchers can use individuals just below the threshold as a control group and those just above as a treatment group.

For example, suppose admission to a selective school depends on passing an entrance exam. Since the exact cutoff score is arbitrary, the students near the threshold (those who just barely pass the exam and those who fail by a very small margin) tend to be very similar, with the small differences in their scores mostly due to random chance. You can therefore attribute any later outcome differences between these two groups to the school they attended.
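A bare-bones way to exploit this logic is to compare mean outcomes within a narrow window on either side of the cutoff. The sketch below does this on simulated data; the cutoff, bandwidth, and effect size are all invented for illustration.

```python
# Minimal sketch (simulated data): comparing outcomes just below and just
# above an arbitrary cutoff, the core logic of regression discontinuity.
import numpy as np

rng = np.random.default_rng(1)
score = rng.uniform(0, 100, 5000)    # running variable, e.g., an exam score
cutoff, bandwidth = 60.0, 2.0        # both values invented for illustration
passed = score >= cutoff             # e.g., admitted to the selective school

# Outcomes rise smoothly with the score, plus a 4-point jump at the cutoff.
outcome = 0.5 * score + 4 * passed + rng.normal(0, 3, score.size)

near = np.abs(score - cutoff) <= bandwidth
effect = outcome[near & passed].mean() - outcome[near & ~passed].mean()
print(f"Estimated effect near the cutoff: {effect:.2f}")
```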

Natural experiments

In both laboratory and field experiments, researchers normally control which group the subjects are assigned to. In a natural experiment, an external event or situation (“nature”) results in the random or random-like assignment of subjects to the treatment group.

Even though some natural experiments involve random or as-if random assignment, they are not considered true experiments because they are observational in nature.

Although the researchers have no control over the independent variable , they can exploit this event after the fact to study the effect of the treatment.

In the Oregon Health Study discussed below, for example, the state could not afford to cover everyone it deemed eligible for its health insurance program, so it instead allocated spots in the program based on a random lottery.

Although true experiments have higher internal validity , you might choose to use a quasi-experimental design for ethical or practical reasons.

Sometimes it would be unethical to provide or withhold a treatment on a random basis, so a true experiment is not feasible. In this case, a quasi-experiment can allow you to study the same causal relationship without the ethical issues.

The Oregon Health Study is a good example. It would be unethical to randomly provide some people with health insurance but purposely prevent others from receiving it solely for the purposes of research.

However, since the Oregon government faced financial constraints and decided to provide health insurance via lottery, studying this event after the fact is a much more ethical approach to studying the same problem.

True experimental design may be infeasible to implement or simply too expensive, particularly for researchers without access to large funding streams.

At other times, too much work is involved in recruiting and properly designing an experimental intervention for an adequate number of subjects to justify a true experiment.

In either case, quasi-experimental designs allow you to study the question by taking advantage of data that has previously been paid for or collected by others (often the government).

Quasi-experimental designs have various pros and cons compared to other types of studies.

Advantages:

  • Higher external validity than most true experiments, because they often involve real-world interventions instead of artificial laboratory settings.
  • Higher internal validity than other non-experimental types of research, because they allow you to better control for confounding variables than other types of studies do.

Disadvantages:

  • Lower internal validity than true experiments—without randomization, it can be difficult to verify that all confounding variables have been accounted for.
  • The use of retrospective data that has already been collected for other purposes can be inaccurate, incomplete or difficult to access.

Frequently asked questions about quasi-experimental designs

A quasi-experiment is a type of research design that attempts to establish a cause-and-effect relationship. The main difference with a true experiment is that the groups are not randomly assigned.

In experimental research, random assignment is a way of placing participants from your sample into different groups using randomization. With this method, every member of the sample has a known or equal chance of being placed in a control group or an experimental group.

Quasi-experimental design is most useful in situations where it would be unethical or impractical to run a true experiment .

Quasi-experiments have lower internal validity than true experiments, but they often have higher external validity  as they can use real-world interventions instead of artificial laboratory settings.

Cite this Scribbr article


Thomas, L. (2024, January 22). Quasi-Experimental Design | Definition, Types & Examples. Scribbr. Retrieved September 3, 2024, from https://www.scribbr.com/methodology/quasi-experimental-design/



Quasi-Experimental Research Design – Types, Methods

Quasi-Experimental Design

Quasi-experimental design is a research method that seeks to evaluate the causal relationships between variables, but without the full control over the independent variable(s) that is available in a true experimental design.

In a quasi-experimental design, the researcher uses an existing group of participants that is not randomly assigned to the experimental and control groups. Instead, the groups are selected based on pre-existing characteristics or conditions, such as age, gender, or the presence of a certain medical condition.

Types of Quasi-Experimental Design

There are several types of quasi-experimental designs that researchers use to study causal relationships between variables. Here are some of the most common types:

Non-Equivalent Control Group Design

This design involves selecting two groups of participants that are similar in every way except for the independent variable(s) that the researcher is testing. One group receives the treatment or intervention being studied, while the other group does not. The two groups are then compared to see if there are any significant differences in the outcomes.

Interrupted Time-Series Design

This design involves collecting data on the dependent variable(s) over a period of time, both before and after an intervention or event. The researcher can then determine whether there was a significant change in the dependent variable(s) following the intervention or event.

Pretest-Posttest Design

This design involves measuring the dependent variable(s) before and after an intervention or event, but without a control group. This design can be useful for determining whether the intervention or event had an effect, but it does not allow for control over other factors that may have influenced the outcomes.
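As a small illustration, a paired t-test on simulated before/after measurements captures the basic pretest-posttest comparison. The data and gain below are hypothetical, and, as the text notes, this one-group design cannot rule out other influences between the two time points.

```python
# Minimal sketch (simulated data): paired t-test for a one-group
# pretest-posttest comparison, with no control group.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
pre = rng.normal(70, 8, 60)                    # scores before the intervention
post = pre + rng.normal(4, 5, 60)              # scores after (true gain of 4)

t_stat, p_value = stats.ttest_rel(post, pre)   # paired, since same participants
print(f"mean gain = {(post - pre).mean():.2f}, t = {t_stat:.2f}, p = {p_value:.4f}")
```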

Regression Discontinuity Design

This design involves selecting participants based on a specific cutoff point on a continuous variable, such as a test score. Participants on either side of the cutoff point are then compared to determine whether the intervention or event had an effect.

Natural Experiments

This design involves studying the effects of an intervention or event that occurs naturally, without the researcher’s intervention. For example, a researcher might study the effects of a new law or policy that affects certain groups of people. This design is useful when true experiments are not feasible or ethical.

Data Analysis Methods

Here are some data analysis methods that are commonly used in quasi-experimental designs:

Descriptive Statistics

This method involves summarizing the data collected during a study using measures such as mean, median, mode, range, and standard deviation. Descriptive statistics can help researchers identify trends or patterns in the data, and can also be useful for identifying outliers or anomalies.
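For instance, a few lines of Python with pandas can produce the usual descriptive summaries; the scores below are made up for illustration.

```python
# Minimal sketch (made-up scores): descriptive statistics with pandas.
import pandas as pd

scores = pd.Series([72, 85, 78, 90, 66, 85, 74, 81, 95, 70])

print(scores.describe())                  # count, mean, std, min, quartiles, max
print("median:", scores.median())
print("mode:  ", scores.mode().tolist())  # may contain several values
print("range: ", scores.max() - scores.min())
```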

Inferential Statistics

This method involves using statistical tests to determine whether the results of a study are statistically significant. Inferential statistics can help researchers make generalizations about a population based on the sample data collected during the study. Common statistical tests used in quasi-experimental designs include t-tests, ANOVA, and regression analysis.
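As a minimal illustration, the snippet below runs a two-sample (Welch's) t-test with SciPy on simulated treatment and comparison scores; the group means and sizes are hypothetical.

```python
# Minimal sketch (simulated scores): Welch's two-sample t-test with SciPy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
treatment = rng.normal(75, 10, 120)    # e.g., posttest scores, intervention group
comparison = rng.normal(71, 10, 130)   # e.g., posttest scores, comparison group

t_stat, p_value = stats.ttest_ind(treatment, comparison, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```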

Propensity Score Matching

This method is used to reduce bias in quasi-experimental designs by matching participants in the intervention group with participants in the control group who have similar characteristics. This can help to reduce the impact of confounding variables that may affect the study’s results.
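One possible implementation, sketched below on simulated data with statsmodels and NumPy, estimates propensity scores with a logistic regression and then performs 1:1 nearest-neighbor matching on the score. The covariates and effect size are hypothetical, and a fuller analysis would also check covariate balance after matching.

```python
# Minimal sketch (simulated data): propensity scores via logistic regression,
# then 1:1 nearest-neighbor matching (with replacement) on the score.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1000
x1, x2 = rng.normal(size=n), rng.normal(size=n)   # hypothetical covariates
treated = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * x1 - 0.5 * x2))))
outcome = 2.0 * treated + x1 + 0.5 * x2 + rng.normal(size=n)

# Step 1: estimate each unit's probability of treatment given the covariates.
X = sm.add_constant(np.column_stack([x1, x2]))
ps = sm.Logit(treated, X).fit(disp=0).predict(X)

# Step 2: match every treated unit to the control with the closest score.
t_idx = np.flatnonzero(treated == 1)
c_idx = np.flatnonzero(treated == 0)
nearest = np.argmin(np.abs(ps[c_idx][None, :] - ps[t_idx][:, None]), axis=1)
matched_controls = c_idx[nearest]

# Step 3: average outcome difference across matched pairs (effect on the treated).
print("ATT estimate:", (outcome[t_idx] - outcome[matched_controls]).mean())
```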

Difference-in-differences Analysis

This method is used to compare the difference in outcomes between two groups over time. Researchers can use this method to determine whether a particular intervention has had an impact on the target population over time.
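A common way to operationalize this is an OLS regression with a group-by-period interaction, whose coefficient is the difference-in-differences estimate. The sketch below uses simulated data and hypothetical effect sizes.

```python
# Minimal sketch (simulated data): difference-in-differences as an OLS
# regression with a group-by-period interaction term.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 400
group = rng.integers(0, 2, n)   # 1 = exposed to the intervention
post = rng.integers(0, 2, n)    # 1 = measured after the intervention

# Parallel trends by construction, plus a true effect of 3 in the exposed group.
y = 10 + 2 * group + 1.5 * post + 3 * group * post + rng.normal(0, 2, n)

df = pd.DataFrame({"y": y, "group": group, "post": post})
fit = smf.ols("y ~ group * post", data=df).fit()
print("DiD estimate:", fit.params["group:post"])
```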

Interrupted Time Series Analysis

This method is used to examine the impact of an intervention or treatment over time by comparing data collected before and after the intervention or treatment. This method can help researchers determine whether an intervention had a significant impact on the target population.
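A standard formulation is segmented regression, with terms for the baseline trend, a level change at the intervention, and a slope change afterward. The sketch below fits this model to simulated monthly data; a fuller analysis would also model autocorrelation in the series.

```python
# Minimal sketch (simulated monthly series): segmented regression for an
# interrupted time series, with level- and slope-change terms.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
t = np.arange(48)                          # 48 months of observations
post = (t >= 24).astype(int)               # intervention begins at month 24
t_since = np.where(post == 1, t - 24, 0)   # time elapsed since the intervention

# Baseline upward trend, a level drop of 6 at the intervention, new slope after.
y = 30 + 0.5 * t - 6 * post - 0.3 * t_since + rng.normal(0, 2, t.size)

df = pd.DataFrame({"y": y, "t": t, "post": post, "t_since": t_since})
fit = smf.ols("y ~ t + post + t_since", data=df).fit()
print(fit.params[["post", "t_since"]])     # estimated level and slope changes
```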

Regression Discontinuity Analysis

This method is used to compare the outcomes of participants who fall on either side of a predetermined cutoff point. This method can help researchers determine whether an intervention had a significant impact on the target population.
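One common estimation strategy is local linear regression within a bandwidth around the cutoff, allowing separate slopes on each side; the coefficient on the treatment indicator estimates the jump at the cutoff. The sketch below uses simulated data with an invented cutoff and bandwidth.

```python
# Minimal sketch (simulated data): local linear regression discontinuity,
# with separate slopes fitted on each side of the cutoff.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
score = rng.uniform(0, 100, 4000)        # running variable
cutoff, bandwidth = 50.0, 10.0           # invented for illustration
treat = (score >= cutoff).astype(int)
y = 20 + 0.4 * score + 5 * treat + rng.normal(0, 3, score.size)

df = pd.DataFrame({"y": y, "c": score - cutoff, "treat": treat})
local = df[df["c"].abs() <= bandwidth]   # keep only units near the cutoff

# `treat:c` lets the slope differ on each side; `treat` estimates the jump.
fit = smf.ols("y ~ treat + c + treat:c", data=local).fit()
print("RD estimate at the cutoff:", fit.params["treat"])
```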

Steps in Quasi-Experimental Design

Here are the general steps involved in conducting a quasi-experimental design:

  • Identify the research question: Determine the research question and the variables that will be investigated.
  • Choose the design: Choose the appropriate quasi-experimental design to address the research question. Examples include the pretest-posttest design, non-equivalent control group design, regression discontinuity design, and interrupted time series design.
  • Select the participants: Select the participants who will be included in the study. Participants should be selected based on specific criteria relevant to the research question.
  • Measure the variables: Measure the variables that are relevant to the research question. This may involve using surveys, questionnaires, tests, or other measures.
  • Implement the intervention or treatment: Deliver the intervention or treatment to the participants in the intervention group. This may involve training, education, counseling, or other interventions.
  • Collect data: Collect data on the dependent variable(s) before and after the intervention. Data collection may also include collecting data on other variables that may impact the dependent variable(s).
  • Analyze the data: Analyze the data collected to determine whether the intervention had a significant impact on the dependent variable(s).
  • Draw conclusions: Draw conclusions about the relationship between the independent and dependent variables. If the results suggest a causal relationship, then appropriate recommendations may be made based on the findings.

Quasi-Experimental Design Examples

Here are some examples of quasi-experimental designs in practice:

  • Evaluating the impact of a new teaching method: In this study, a group of students are taught using a new teaching method, while another group is taught using the traditional method. The test scores of both groups are compared before and after the intervention to determine whether the new teaching method had a significant impact on student performance.
  • Assessing the effectiveness of a public health campaign: In this study, a public health campaign is launched to promote healthy eating habits among a targeted population. The behavior of the population is compared before and after the campaign to determine whether the intervention had a significant impact on the target behavior.
  • Examining the impact of a new medication: In this study, a group of patients is given a new medication, while another group is given a placebo. The outcomes of both groups are compared to determine whether the new medication had a significant impact on the targeted health condition.
  • Evaluating the effectiveness of a job training program : In this study, a group of unemployed individuals is enrolled in a job training program, while another group is not enrolled in any program. The employment rates of both groups are compared before and after the intervention to determine whether the training program had a significant impact on the employment rates of the participants.
  • Assessing the impact of a new policy : In this study, a new policy is implemented in a particular area, while another area does not have the new policy. The outcomes of both areas are compared before and after the intervention to determine whether the new policy had a significant impact on the targeted behavior or outcome.

Applications of Quasi-Experimental Design

Here are some applications of quasi-experimental design:

  • Educational research: Quasi-experimental designs are used to evaluate the effectiveness of educational interventions, such as new teaching methods, technology-based learning, or educational policies.
  • Health research: Quasi-experimental designs are used to evaluate the effectiveness of health interventions, such as new medications, public health campaigns, or health policies.
  • Social science research: Quasi-experimental designs are used to investigate the impact of social interventions, such as job training programs, welfare policies, or criminal justice programs.
  • Business research: Quasi-experimental designs are used to evaluate the impact of business interventions, such as marketing campaigns, new products, or pricing strategies.
  • Environmental research: Quasi-experimental designs are used to evaluate the impact of environmental interventions, such as conservation programs, pollution control policies, or renewable energy initiatives.

When to use Quasi-Experimental Design

Here are some situations where quasi-experimental designs may be appropriate:

  • When the research question involves investigating the effectiveness of an intervention, policy, or program : In situations where it is not feasible or ethical to randomly assign participants to intervention and control groups, quasi-experimental designs can be used to evaluate the impact of the intervention on the targeted outcome.
  • When the sample size is small: In situations where the sample size is small, it may be difficult to randomly assign participants to intervention and control groups. Quasi-experimental designs can be used to investigate the impact of an intervention without requiring a large sample size.
  • When the research question involves investigating a naturally occurring event : In some situations, researchers may be interested in investigating the impact of a naturally occurring event, such as a natural disaster or a major policy change. Quasi-experimental designs can be used to evaluate the impact of the event on the targeted outcome.
  • When the research question involves investigating a long-term intervention: In situations where the intervention or program is long-term, it may be difficult to randomly assign participants to intervention and control groups for the entire duration of the intervention. Quasi-experimental designs can be used to evaluate the impact of the intervention over time.
  • When the research question involves investigating the impact of a variable that cannot be manipulated : In some situations, it may not be possible or ethical to manipulate a variable of interest. Quasi-experimental designs can be used to investigate the relationship between the variable and the targeted outcome.

Purpose of Quasi-Experimental Design

The purpose of quasi-experimental design is to investigate the causal relationship between two or more variables when it is not feasible or ethical to conduct a randomized controlled trial (RCT). Quasi-experimental designs attempt to emulate the randomized control trial by mimicking the control group and the intervention group as much as possible.

The key purpose of quasi-experimental design is to evaluate the impact of an intervention, policy, or program on a targeted outcome while controlling for potential confounding factors that may affect the outcome. Quasi-experimental designs aim to answer questions such as: Did the intervention cause the change in the outcome? Would the outcome have changed without the intervention? And was the intervention effective in achieving its intended goals?

Quasi-experimental designs are useful in situations where randomized controlled trials are not feasible or ethical. They provide researchers with an alternative method to evaluate the effectiveness of interventions, policies, and programs in real-life settings. Quasi-experimental designs can also help inform policy and practice by providing valuable insights into the causal relationships between variables.

Overall, the purpose of quasi-experimental design is to provide a rigorous method for evaluating the impact of interventions, policies, and programs while controlling for potential confounding factors that may affect the outcome.

Advantages of Quasi-Experimental Design

Quasi-experimental designs have several advantages over other research designs, such as:

  • Greater external validity : Quasi-experimental designs are more likely to have greater external validity than laboratory experiments because they are conducted in naturalistic settings. This means that the results are more likely to generalize to real-world situations.
  • Ethical considerations: Quasi-experimental designs often involve naturally occurring events, such as natural disasters or policy changes. This means that researchers do not need to manipulate variables, which can raise ethical concerns.
  • More practical: Quasi-experimental designs are often more practical than experimental designs because they are less expensive and easier to conduct. They can also be used to evaluate programs or policies that have already been implemented, which can save time and resources.
  • No random assignment: Quasi-experimental designs do not require random assignment, which can be difficult or impossible in some cases, such as when studying the effects of a natural disaster. This means that researchers can still make causal inferences, although they must use statistical techniques to control for potential confounding variables.
  • Greater generalizability : Quasi-experimental designs are often more generalizable than experimental designs because they include a wider range of participants and conditions. This can make the results more applicable to different populations and settings.

Limitations of Quasi-Experimental Design

There are several limitations associated with quasi-experimental designs, which include:

  • Lack of Randomization: Quasi-experimental designs do not involve randomization of participants into groups, which means that the groups being studied may differ in important ways that could affect the outcome of the study. This can lead to problems with internal validity and limit the ability to make causal inferences.
  • Selection Bias: Quasi-experimental designs may suffer from selection bias because participants are not randomly assigned to groups. Participants may self-select into groups or be assigned based on pre-existing characteristics, which may introduce bias into the study.
  • History and Maturation: Quasi-experimental designs are susceptible to history and maturation effects, where the passage of time or other events may influence the outcome of the study.
  • Lack of Control: Quasi-experimental designs may lack control over extraneous variables that could influence the outcome of the study. This can limit the ability to draw causal inferences from the study.
  • Limited Generalizability: Quasi-experimental designs may have limited generalizability because the results may only apply to the specific population and context being studied.



Quasi Experimental Design Overview & Examples

By Jim Frost

What is a Quasi Experimental Design?

A quasi experimental design is a method for identifying causal relationships that does not randomly assign participants to the experimental groups. Instead, researchers use a non-random process. For example, they might use an eligibility cutoff score or preexisting groups to determine who receives the treatment.


Quasi-experimental research is a design that closely resembles experimental research but is different. The term “quasi” means “resembling,” so you can think of it as a cousin to actual experiments. In these studies, researchers can manipulate an independent variable — that is, they change one factor to see what effect it has. However, unlike true experimental research, participants are not randomly assigned to different groups.


When to Use Quasi-Experimental Design

Researchers typically use a quasi-experimental design because they can’t randomize due to practical or ethical concerns. For example:

  • Practical Constraints : A school interested in testing a new teaching method can only implement it in preexisting classes and cannot randomly assign students.
  • Ethical Concerns : A medical study might not be able to randomly assign participants to a treatment group for an experimental medication when they are already taking a proven drug.

Quasi-experimental designs also come in handy when researchers want to study the effects of naturally occurring events, like policy changes or environmental shifts, where they can’t control who is exposed to the treatment.

Quasi-experimental designs occupy a unique position in the spectrum of research methodologies, sitting between observational studies and true experiments. This middle ground offers a blend of both worlds, addressing some limitations of purely observational studies while navigating the constraints often accompanying true experiments.

A significant advantage of quasi-experimental research over purely observational studies and correlational research is that it addresses the issue of directionality, determining which variable is the cause and which is the effect. In quasi-experiments, an intervention typically occurs during the investigation, and the researchers record outcomes before and after it, increasing the confidence that it causes the observed changes.

However, it’s crucial to recognize its limitations as well. Controlling confounding variables is a larger concern for a quasi-experimental design than a true experiment because it lacks random assignment.

In sum, quasi-experimental designs offer a valuable research approach when random assignment is not feasible, providing a more structured and controlled framework than observational studies while acknowledging and attempting to address potential confounders.

Types of Quasi-Experimental Designs and Examples

Quasi-experimental studies use various methods, depending on the scenario.

Natural Experiments

This design uses naturally occurring events or changes to create the treatment and control groups. Researchers compare outcomes between those whom the event affected and those it did not affect. Because the researchers do not control assignment, analysts use statistical controls to account for confounders, which the researchers must therefore also measure.

Natural experiments are related to observational studies, but they allow for a clearer causality inference because the external event or policy change provides both a form of quasi-random group assignment and a definite start date for the intervention.

For example, in a natural experiment utilizing a quasi-experimental design, researchers study the impact of a significant economic policy change on small business growth. The policy is implemented in one state but not in neighboring states. This scenario creates an unplanned experimental setup, where the state with the new policy serves as the treatment group, and the neighboring states act as the control group.

Researchers are primarily interested in small business growth rates but need to record various confounders that can impact growth rates. Hence, they record state economic indicators, investment levels, and employment figures. By recording these metrics across the states, they can include them in the model as covariates and control them statistically. This method allows researchers to estimate differences in small business growth due to the policy itself, separate from the various confounders.

Nonequivalent Groups Design

This method involves matching existing groups that are similar but not identical. Researchers attempt to find groups that are as equivalent as possible, particularly for factors likely to affect the outcome.

For instance, researchers use a nonequivalent groups quasi-experimental design to evaluate the effectiveness of a new teaching method in improving students’ mathematics performance. A school district considering the teaching method is planning the study. Students are already divided into schools, preventing random assignment.

The researchers matched two schools with similar demographics, baseline academic performance, and resources. The school using the traditional methodology is the control, while the other uses the new approach. Researchers are evaluating differences in educational outcomes between the two methods.

They perform a pretest to identify differences between the schools that might affect the outcome and include them as covariates to control for confounding. They also record outcomes before and after the intervention to have a larger context for the changes they observe.

Regression Discontinuity

This process assigns subjects to a treatment or control group based on a predetermined cutoff point (e.g., a test score). The analysis primarily focuses on participants near the cutoff point, as they are likely similar except for the treatment received. By comparing participants just above and below the cutoff, the design controls for confounders that vary smoothly around the cutoff.

For example, in a regression discontinuity quasi-experimental design focusing on a new medical treatment for depression, researchers use depression scores as the cutoff point. Individuals with depression scores just above a certain threshold are assigned to receive the latest treatment, while those just below the threshold do not receive it. This method creates two closely matched groups: one that barely qualifies for treatment and one that barely misses out.

By comparing the mental health outcomes of these two groups over time, researchers can assess the effectiveness of the new treatment. The assumption is that the only significant difference between the groups is whether they received the treatment, thereby isolating its impact on depression outcomes.

Controlling Confounders in a Quasi-Experimental Design

Accounting for confounding variables is a challenging but essential task for a quasi-experimental design.

In a true experiment, the random assignment process equalizes confounders across the groups to nullify their overall effect. It’s the gold standard because it works on all confounders, known and unknown.

Unfortunately, the lack of random assignment can allow differences between the groups to exist before the intervention. These confounding factors might ultimately explain the results rather than the intervention.

Consequently, researchers must use other methods to roughly equalize the groups, such as matching or cutoff values, or they must statistically adjust for preexisting differences they measure, in order to reduce the impact of confounders.

A key strength of quasi-experiments is their frequent use of “pre-post testing.” This approach involves conducting initial tests before collecting data to check for preexisting differences between groups that could impact the study’s outcome. By identifying these variables early on and including them as covariates, researchers can more effectively control potential confounders in their statistical analysis.
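One simple pre-analysis check along these lines is to compute standardized mean differences on candidate covariates; values far from zero flag preexisting group differences worth adjusting for. The sketch below uses hypothetical data, and the 0.1 threshold in the comment is a common convention rather than a firm rule.

```python
# Minimal sketch (hypothetical data): standardized mean differences as a
# pre-analysis check for preexisting differences between groups.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "group": rng.integers(0, 2, 300),      # 0 = control, 1 = treatment
    "pretest": rng.normal(70, 12, 300),
    "attendance": rng.normal(0.9, 0.05, 300),
})

def smd(a, b):
    """Difference in means, scaled by the pooled standard deviation."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

treat, ctrl = df[df["group"] == 1], df[df["group"] == 0]
for cov in ["pretest", "attendance"]:
    # |SMD| above roughly 0.1 is a common rule of thumb for imbalance.
    print(cov, round(smd(treat[cov], ctrl[cov]), 3))
```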

Additionally, researchers frequently track outcomes before and after the intervention to better understand the context for changes they observe.

Statisticians consider these methods to be less effective than randomization. Hence, quasi-experiments fall somewhere in the middle when it comes to internal validity , or how well the study can identify causal relationships versus mere correlation . They’re more conclusive than correlational studies but not as solid as true experiments.

In conclusion, quasi-experimental designs offer researchers a versatile and practical approach when random assignment is not feasible. This methodology bridges the gap between controlled experiments and observational studies, providing a valuable tool for investigating cause-and-effect relationships in real-world settings. Researchers can address ethical and logistical constraints by understanding and leveraging the different types of quasi-experimental designs while still obtaining insightful and meaningful results.


Quasi-experimental designs for causal inference: an overview

  • Published: 26 June 2024
  • Volume 25, pages 611–627 (2024)


  • Heining Cham (ORCID: orcid.org/0000-0002-2933-056X), Hyunjung Lee & Igor Migunov


The randomized control trial (RCT) is the primary experimental design in education research due to its strong internal validity for causal inference. However, in situations where RCTs are not feasible or ethical, quasi-experiments are alternatives to establish causal inference. This paper serves as an introduction to several quasi-experimental designs: regression discontinuity design, difference-in-differences analysis, interrupted time series design, instrumental variable analysis, and propensity score analysis with examples in education research.


Footnotes

1. The search engine by EBSCO does not offer searches within the publications’ keywords. We replicated the same search in PsycINFO, and its search engine allows searches within the publications’ keywords. The results from PsycINFO were, in general, consistent with the results from ERIC and are available upon request.

2. Latif and Miles (2020) had another group of students who were given in-class quizzes after midterm #1. For simplicity, we did not include this group in this paper.

References

Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91(434), 444–455. https://doi.org/10.1080/01621459.1996.10476902


Arpino, B., & Mealli, F. (2011). The specification of the propensity score in multilevel observational studies. Computational Statistics & Data Analysis, 55 (4), 1770–1780. https://doi.org/10.1016/j.csda.2010.11.008

Austin, P. C. (2009). Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Statistics in Medicine, 28 (25), 3083–3107. https://doi.org/10.1002/sim.3697

Austin, P. C. (2014). A comparison of 12 algorithms for matching on the propensity score. Statistics in Medicine, 33 (6), 1057–1069. https://doi.org/10.1002/sim.6004

Baiocchi, M., Cheng, J., & Small, D. S. (2014). Tutorial in biostatistics: Instrumental variable methods for causal inference. Statistics in Medicine, 33 (13), 2297–2340. https://doi.org/10.1002/sim.6128

Bloom, H. S. (2012). Modern regression discontinuity analysis. Journal of Research on Educational Effectiveness, 5 (1), 43–82. https://doi.org/10.1080/19345747.2011.578707

Cannas, M., & Arpino, B. (2019). A comparison of machine learning algorithms and covariate balance measures for propensity score matching and weighting. Biometrical Journal, 61 (4), 1049–1072. https://doi.org/10.1002/bimj.201800132

Cham, H. (2022). Quasi-experimental designs. In G. J. G. Asmundson (Ed.), Comprehensive clinical psychology (2nd ed., pp. 29–48). Elsevier.


Cham, H., & West, S. G. (2016). Propensity score analysis with missing data. Psychological Methods, 21 (3), 427–445. https://doi.org/10.1037/met0000076

Collier, Z. K., Zhang, H., & Liu, L. (2022). Explained: Artificial intelligence for propensity score estimation in multilevel educational settings. Practical Assessment, Research & Evaluation, 27 , 3.


Cook, T. D. (2008). “Waiting for life to arrive”: A history of the regression-discontinuity design in psychology, statistics and economics. Journal of Econometrics, 142 (2), 636–654. https://doi.org/10.1016/j.jeconom.2007.05.002

Cunningham, S. (2021). Causal inference: The mixtape. Yale University Press . https://doi.org/10.2307/j.ctv1c29t27

Diamond, A., & Sekhon, J. S. (2013). Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies. Review of Economics and Statistics, 95 (3), 932–945. https://doi.org/10.1162/REST_a_00318

Enders, C. K. (2022). Applied missing data analysis (2nd ed.). Guilford Press.

Feely, M., Seay, K. D., Lanier, P., Auslander, W., & Kohl, P. L. (2018). Measuring fidelity in research studies: A field guide to developing a comprehensive fidelity measurement system. Child and Adolescent Social Work Journal, 35 (2), 139–152. https://doi.org/10.1007/s10560-017-0512-6

Grimm, K. J., & McArdle, J. J. (2023). Latent curve modeling of longitudinal growth data. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (2nd ed., pp. 556–575). Guilford Press.

Hainmueller, J. (2012). Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis, 20 (1), 25–46. https://doi.org/10.1093/pan/mpr025

Ho, D., Imai, K., King, G., & Stuart, E. (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis, 15 (3), 199–236. https://doi.org/10.1093/pan/mpl013

Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81 (396), 945–960. https://doi.org/10.2307/2289064

Huang, H., Cagle, P. J., Mazumdar, M., & Poeran, J. (2019). Statistics in brief: Instrumental variable analysis: An underutilized method in orthopaedic research. Clinical Orthopaedics and Related Research, 477 (7), 1750–1755. https://doi.org/10.1097/CORR.0000000000000729

Hughes, J. N., West, S. G., Kim, H., & Bauer, S. S. (2018). Effect of early grade retention on school completion: A prospective study. Journal of Educational Psychology, 110 (7), 974–991. https://doi.org/10.1037/edu0000243

Imai, K., & Ratkovic, M. (2014). Covariate balancing propensity score. Journal of the Royal Statistical Society: Series B (statistical Methodology), 76 (1), 243–263.

Imbens, G. W., & Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142 (2), 615–635. https://doi.org/10.1016/j.jeconom.2007.05.001

Jacob, R., Zhu, P., Somers, M. A., & Bloom, H. (2012). A practical guide to regression discontinuity . MDRC.

Jennings, P. A., Brown, J. L., Frank, J. L., Doyle, S., Oh, Y., Davis, R., Rasheed, D., DeWeese, A., DeMauro, A. A., Cham, H., & Greenberg, M. T. (2017). Impacts of the CARE for teachers program on teachers’ social and emotional competence and classroom interactions. Journal of Educational Psychology, 109 (7), 1010–1028. https://doi.org/10.1037/edu0000187

Kang, J., Chan, W., Kim, M. O., & Steiner, P. M. (2016). Practice of causal inference with the propensity of being zero or one: Assessing the effect of arbitrary cutoffs of propensity scores. Communications for Statistical Applications and Methods, 23 (1), 1–20. https://doi.org/10.5351/CSAM.2016.23.1.001

Kang, J. D., & Schafer, J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science, 22 (4), 523–539. https://doi.org/10.1214/07-STS227

Kim, Y., & Steiner, P. (2016). Quasi-experimental designs for causal inference. Educational Psychologist, 51 (3–4), 395–405. https://doi.org/10.1080/00461520.2016.1207177

Kwok, O. M., West, S. G., & Green, S. B. (2007). The impact of misspecifying the within-subject covariance structure in multiwave longitudinal multilevel models: A Monte Carlo study. Multivariate Behavioral Research, 42 (3), 557–592. https://doi.org/10.1080/00273170701540537

Labrecque, J., & Swanson, S. A. (2018). Understanding the assumptions underlying instrumental variable analyses: A brief review of falsification strategies and related tools. Current Epidemiology Reports, 5 (3), 214–220. https://doi.org/10.1007/s40471-018-0152-1

Latif, E., & Miles, S. (2020). The impact of assignments and quizzes on exam grades: A difference-in-difference approach. Journal of Statistics Education, 28 (3), 289–294. https://doi.org/10.1080/10691898.2020.1807429

Lee, D. S., & Lemieux, T. (2010). Regression discontinuity designs in economics. Journal of Economic Literature, 48 (2), 281–355. https://doi.org/10.1257/jel.48.2.281

Lee, B. K., Lessler, J., & Stuart, E. A. (2010). Improving propensity score weighting using machine learning. Statistics in Medicine, 29 (3), 337–346. https://doi.org/10.1002/sim.3782

Leite, W. L., Jimenez, F., Kaya, Y., Stapleton, L. M., MacInnes, J. W., & Sandbach, R. (2015). An evaluation of weighting methods based on propensity scores to reduce selection bias in multilevel observational studies. Multivariate Behavioral Research, 50 (3), 265–284. https://doi.org/10.1080/00273171.2014.991018

Little, R. J., & Rubin, D. B. (2019). Statistical analysis with missing data (3rd ed.). John Wiley & Sons.

Lousdal, M. L. (2018). An introduction to instrumental variable assumptions, validation and estimation. Emerging Themes in Epidemiology, 22 (15), 1–7. https://doi.org/10.1186/s12982-018-0069-7

Maynard, C., & Young, C. (2022). The results of using a traits-based rubric on the writing performance of third grade students. Texas Journal of Literacy Education, 9 (2), 102–128.

McCaffrey, D. F., Ridgeway, G., & Morral, A. R. (2004). Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological Methods, 9 (4), 403–425. https://doi.org/10.1037/1082-989X.9.4.403

Neyman, J., Dabrowska, D. M., & Speed, T. P. (1990). On the application of probability theory to agricultural experiments: Essay on principles. Statistical Science, 5 (4), 465–472.

Nguyen, T. T., Tchetgen Tchetgen, E. J., Kawachi, I., Gilman, S. E., Walter, S., Liu, S. Y., Manly, J. J., & Glymour, M. M. (2016). Instrumental variable approaches to identifying the causal effect of educational attainment on dementia risk. Annals of Epidemiology, 26 (1), 71–76. https://doi.org/10.1016/j.annepidem.2015.10.006

Pearl, J. (2009). Causality: Models, reasoning, and inference (2nd ed.). Cambridge University Press.


Reichardt, C. S. (2019). Quasi-experimentation: A guide to design and analysis . Guilford Press.

Rubin, D. B. (2006). Matched sampling for causal effects . Cambridge University Press.

Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70 (1), 41–55. https://doi.org/10.1093/biomet/70.1.41

Roth, J., Sant’Anna, P. H., Bilinski, A., & Poe, J. (2023). What’s trending in difference-in-differences? A synthesis of the recent econometrics literature. Journal of Econometrics, 235 (2), 2218–2244. https://doi.org/10.1016/j.jeconom.2023.03.008

Sagarin, B. J., West, S. G., Ratnikov, A., Homan, W. K., Ritchie, T. D., & Hansen, E. J. (2014). Treatment noncompliance in randomized experiments: Statistical approaches and design issues. Psychological Methods, 19 (3), 317–333. https://doi.org/10.1037/met0000013

Schafer, J. L., & Kang, J. (2008). Average causal effects from nonrandomized studies: A practical guide and simulated example. Psychological Methods, 13 (4), 279–313. https://doi.org/10.1037/a0014268

Shadish, W., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference . Houghton Mifflin.

Steiner, P. M., Cook, T. D., Shadish, W. R., & Clark, M. H. (2010). The importance of covariate selection in controlling for selection bias in observational studies. Psychological Methods, 15 (3), 250–267. https://doi.org/10.1037/a0018719

Steiner, P. M., Shadish, W. R., & Sullivan, K. J. (2023). Frameworks for causal inference in psychological science. In H. Cooper, M. N. Coutanche, L. M. McMullen, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.), APA handbook of research methods in psychology: Foundations, planning, measures, and psychometrics (2nd ed., pp. 23–56). American Psychological Association.

Stuart, E. A., Huskamp, H. A., Duckworth, K., Simmons, J., Song, Z., Chernew, M. E., & Barry, C. L. (2014). Using propensity scores in difference-in-differences models to estimate the effects of a policy change. Health Services and Outcomes Research Methodology, 14 , 166–182. https://doi.org/10.1007/s10742-014-0123-z

Suk, Y., Steiner, P. M., Kim, J. S., & Kang, H. (2022). Regression discontinuity designs with an ordinal running variable: Evaluating the effects of extended time accommodations for English-language learners. Journal of Educational and Behavioral Statistics, 47 (4), 459–484. https://doi.org/10.3102/10769986221090275

Tarr, A., & Imai, K. (2021). Estimating average treatment effects with support vector machines. arXiv preprint. https://arxiv.org/abs/2102.11926

Thoemmes, F. J., & West, S. G. (2011). The use of propensity scores for nonrandomized designs with clustered data. Multivariate Behavioral Research, 46 (3), 514–543. https://doi.org/10.1080/00273171.2011.569395

U.S. Department of Education (2022). What works clearinghouse: Procedures and standards handbook (Version 5.0). https://ies.ed.gov/ncee/wwc/Docs/referenceresources/Final_WWC-HandbookVer5_0-0-508.pdf

West, S. G., Cham, H., & Liu, Y. (2014). Causal inference and generalization in field settings: Experimental and quasi-experimental designs. In H. T. Reis & C. M. Judd (Eds.), Handbook of research methods in social and personality psychology (2nd ed., pp. 49–80). Cambridge University Press.

Wong, V. C., Cook, T. D., Barnett, W. S., & Jung, K. (2008). An effectiveness-based evaluation of five state pre-kindergarten programs. Journal of Policy Analysis and Management: THe Journal of the Association for Public Policy Analysis and Management, 27 (1), 122–154. https://doi.org/10.1002/pam.20310

Wong, V. C., Wing, C., Steiner, P. M., Wong, M., & Cook, T. D. (2013). Research designs for program evaluation. In J. A. Schinka, W. F. Velicer, & I. B. Weiner (Eds.), Handbook of psychology: Research methods in psychology (2nd ed., pp. 316–341). John Wiley and Sons, Inc.


Acknowledgements

This research was supported by a R01 grant from the National Institute on Aging (NIA) (R01AG065110), R01 grants from the National Institute on Minority Health and Health Disparities (R01MD015763 and R01MD015715), and a R21 grant from the National Institute of Mental Health (R21MH124902). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute on Aging, National Institute on Minority Health and Health Disparities, or the National Institute of Mental Health. We thank Dr. Peter M. Steiner, Dr. Yongnam Kim, and the anonymous reviewers for their valuable comments and suggestions on the earlier draft of this paper.

Author information

Authors and Affiliations

Department of Psychology, Fordham University, 441 E. Fordham Road, Bronx, NY, 10461, USA

Heining Cham, Hyunjung Lee & Igor Migunov


Corresponding author

Correspondence to Heining Cham .

Ethics declarations

Conflict of interest

All authors declare that they have no conflicts of interest.

Ethical approval

This research article does not involve any human participants or animal subjects. No data collection is involved.


About this article

Cham, H., Lee, H. & Migunov, I. Quasi-experimental designs for causal inference: an overview. Asia Pacific Educ. Rev. 25, 611–627 (2024). https://doi.org/10.1007/s12564-024-09981-2


Received : 01 June 2023

Revised : 05 June 2024

Accepted : 14 June 2024

Published : 26 June 2024

Issue Date : September 2024

DOI : https://doi.org/10.1007/s12564-024-09981-2


Keywords: Quasi-experiment, Regression discontinuity, Difference-in-differences, Interrupted time series, Instrumental variable, Propensity score

  • Perspective
  • Published: 26 November 2018

Quasi-experimental causality in neuroscience and behavioural research

  • Ioana E. Marinescu,
  • Patrick N. Lawlor &
  • Konrad P. Kording (ORCID: orcid.org/0000-0001-8408-4499)

Nature Human Behaviour volume  2 ,  pages 891–898 ( 2018 ) Cite this article


Subjects: Neuroscience

In many scientific domains, causality is the key question. For example, in neuroscience, we might ask whether a medication affects perception, cognition or action. Randomized controlled trials are the gold standard to establish causality, but they are not always practical. The field of empirical economics has developed rigorous methods to establish causality even when randomized controlled trials are not available. Here we review these quasi-experimental methods and highlight how neuroscience and behavioural researchers can use them to do research that can credibly demonstrate causal effects.



Author information

Authors and affiliations

Department of Social Policy and Practice, University of Pennsylvania, Philadelphia, PA, USA

Ioana E. Marinescu

Division of Neurology, Children’s Hospital of Philadelphia, Philadelphia, PA, USA

Patrick N. Lawlor

Departments of Neuroscience and Bioengineering, Leonard Davis Institute, Warren Center for Network Science, Wharton Neuroscience Initiative, University of Pennsylvania, Philadelphia, PA, USA

Konrad P. Kording

Canadian Institute For Advanced Research, Toronto, Ontario, Canada


Corresponding author

Correspondence to Ioana E. Marinescu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Marinescu, I.E., Lawlor, P.N. & Kording, K.P. Quasi-experimental causality in neuroscience and behavioural research. Nat Hum Behav 2, 891–898 (2018). https://doi.org/10.1038/s41562-018-0466-5


Received: 18 March 2018

Accepted: 02 October 2018

Published: 26 November 2018

Issue Date: December 2018

DOI: https://doi.org/10.1038/s41562-018-0466-5


Quasi-Experimental Designs for Causal Inference

Affiliation

  • Department of Educational Psychology, University of Wisconsin-Madison.
  • PMID: 30100637
  • PMCID: PMC6086368
  • DOI: 10.1080/00461520.2016.1207177


Figure: A hypothetical example of regression discontinuity design. Note: GPA = grade point average.

Figure: A diagram of an example of instrumental variable design.

Figure: A hypothetical example of comparative interrupted time series design.

Similar articles

  • Graphical Models for Quasi-experimental Designs. Steiner PM, Kim Y, Hall CE, Su D. Sociol Methods Res. 2017;46(2):155-188. doi: 10.1177/0049124115582272. PMID: 30174355.
  • Quasi-experimental study designs series-paper 7: assessing the assumptions. Bärnighausen T, Oldenburg C, Tugwell P, et al. J Clin Epidemiol. 2017;89:53-66. doi: 10.1016/j.jclinepi.2017.02.017. PMID: 28365306.
  • Designs of Empirical Evaluations of Nonexperimental Methods in Field Settings. Wong VC, Steiner PM. Eval Rev. 2018;42(2):176-213. doi: 10.1177/0193841X18778918. PMID: 29954223.
  • Regression Discontinuity for Causal Effect Estimation in Epidemiology. Oldenburg CE, Moscoe E, Bärnighausen T. Curr Epidemiol Rep. 2016;3:233-241. doi: 10.1007/s40471-016-0080-x. PMID: 27547695.
  • Applying Causal Inference Methods in Psychiatric Epidemiology: A Review. Ohlsson H, Kendler KS. JAMA Psychiatry. 2020;77(6):637-644. doi: 10.1001/jamapsychiatry.2019.3758. PMID: 31825494.
  • Modified inverse propensity weighting method to alleviate estimation errors in the model with multiple endogenous variables. Dhakal B, McLeod GFH, Insch A, Boden JM. MethodsX. 2023;12:102513. doi: 10.1016/j.mex.2023.102513. PMID: 38192361.
  • Effect of high-risk versus low-risk pregnancy at the first antenatal care visit on the occurrence of complication during pregnancy and labour or delivery in Kenya: a double-robust estimation. Bagayoko M, Kadengye DT, Odero HO, Izudi J. BMJ Open. 2023;13(10):e072451. doi: 10.1136/bmjopen-2023-072451. PMID: 37899166.
  • Causal models and causal modelling in obesity: foundations, methods and evidence. Zoh RS, Yu X, Dawid P, Smith GD, French SJ, Allison DB. Philos Trans R Soc Lond B Biol Sci. 2023;378(1888):20220227. doi: 10.1098/rstb.2022.0227. PMID: 37661742.
  • A Systematic Review of Outcomes Related to Nurse Practitioner-Delivered Primary Care for Multiple Chronic Conditions. McMenamin A, Turi E, Schlak A, Poghosyan L. Med Care Res Rev. 2023;80(6):563-581. doi: 10.1177/10775587231186720. PMID: 37438917.
  • Do it fast! Early access to specialized care improved long-term outcomes in rheumatoid arthritis: data from the REAL multicenter observational study. Albuquerque CP, Reis APMG, Vargas Santos AB, et al. Adv Rheumatol. 2023;63(1):17. doi: 10.1186/s42358-023-00301-7. PMID: 37095556.

Grants and funding

  • P2C HD047873/HD/NICHD NIH HHS/United States


