Pretest-Posttest Designs

For many true experimental designs, pretest-posttest designs are the preferred method to compare participant groups and measure the degree of change occurring as a result of treatments or interventions.

Pretest-posttest designs grew from the simpler posttest only designs, and address some of the issues arising with assignment bias and the allocation of participants to groups.

One example is education, where researchers want to monitor the effect of a new teaching method upon groups of children. Other areas include evaluating the effects of counseling, testing medical treatments, and measuring psychological constructs. The only stipulation is that the subjects must be randomly assigned to groups, in a true experimental design, to properly isolate and nullify any nuisance or confounding variables.


The Posttest Only Design With Non-Equivalent Control Groups

Pretest-posttest designs are an expansion of the posttest only design with nonequivalent groups, one of the simplest methods of testing the effectiveness of an intervention.

In this design, which uses two groups, one group is given the treatment and the results are gathered at the end. The control group receives no treatment, over the same period of time, but undergoes exactly the same tests.

Statistical analysis can then determine if the intervention had a significant effect. One common example of this is in medicine; one group is given a medicine, whereas the control group is given none, and this allows the researchers to determine if the drug really works. This type of design, whilst commonly using two groups, can be slightly more complex. For example, if different dosages of a medicine are tested, the design can be based around multiple groups.

Whilst this posttest only design does find many uses, it is limited in scope and contains many threats to validity. It is very poor at guarding against assignment bias, because the researcher knows nothing about the individual differences within the control group and how they may have affected the outcome. Even with randomization of the initial groups, this failure to address assignment bias means that the statistical power is weak.

The results of such a study will always be limited in scope and, resources permitting, most researchers use a more robust design, of which pretest-posttest designs are one. The posttest only design with non-equivalent groups is usually reserved for experiments performed after the fact, such as a medical researcher wishing to observe the effect of a medicine that has already been administered.


The Two Group Control Group Design

This is, by far, the simplest and most common of the pretest-posttest designs, and is a useful way of ensuring that an experiment has a strong level of internal validity. The principle behind this design is relatively simple, and involves randomly assigning subjects between two groups, a test group and a control. Both groups are pre-tested, and both are post-tested, the ultimate difference being that one group was administered the treatment.


This test allows a number of distinct analyses, giving researchers the tools to filter out experimental noise and confounding variables. The internal validity of this design is strong, because the pretest ensures that the groups are equivalent. The analyses that can be performed on a two-group control group pretest-posttest design are (Fig 1):

[Figure 1: Pretest-Posttest Design With Control Group]

  • This design allows researchers to compare the final posttest results between the two groups, giving them an idea of the overall effectiveness of the intervention or treatment. (C)
  • The researcher can see how both groups changed from pretest to posttest, whether one, both or neither improved over time. If the control group also showed a significant improvement, then the researcher must attempt to uncover the reasons behind this. (A and A1)
  • The researchers can compare the scores in the two pretest groups, to ensure that the randomization process was effective. (B)

These checks evaluate the efficiency of the randomization process and also determine whether the group given the treatment showed a significant difference.
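To make these three checks concrete, here is a minimal Python sketch on simulated data; the group sizes, score scales, and the use of independent- and paired-samples t-tests are illustrative assumptions, not prescriptions from the article.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 50

# Simulated scores: randomization starts both groups from the same population,
# and only the treatment group gains about 5 points by the posttest.
treat_pre = rng.normal(50, 10, n)
ctrl_pre = rng.normal(50, 10, n)
treat_post = treat_pre + rng.normal(5, 5, n)   # treated: real improvement
ctrl_post = ctrl_pre + rng.normal(0, 5, n)     # control: no systematic change

# (B) Pretest comparison: did randomization produce equivalent groups?
print("B  pretest, treatment vs control :", stats.ttest_ind(treat_pre, ctrl_pre))

# (C) Posttest comparison: overall effectiveness of the intervention.
print("C  posttest, treatment vs control:", stats.ttest_ind(treat_post, ctrl_post))

# (A, A1) Change from pretest to posttest within each group.
print("A  change within treatment group :", stats.ttest_rel(treat_post, treat_pre))
print("A1 change within control group   :", stats.ttest_rel(ctrl_post, ctrl_pre))
```

If check B shows a clear pretest difference, the randomization (or the group assignment) deserves scrutiny before the posttest comparison is interpreted.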

Problems With Pretest-Posttest Designs

The main problem with this design is that it improves internal validity but sacrifices external validity to do so. There is no way of judging whether the process of pre-testing actually influenced the results because there is no baseline measurement against groups that remained completely untreated. For example, children given an educational pretest may be inspired to try a little harder in their lessons, and both groups would outperform children not given a pretest, so it becomes difficult to generalize the results to encompass all children.

The other major problem, which afflicts many sociological and educational research programs, is that it is impossible and unethical to isolate all of the participants completely. If two groups of children attend the same school, it is reasonable to assume that they mix outside of lessons and share ideas, potentially contaminating the results. On the other hand, if the children are drawn from different schools to prevent this, the chance of selection bias arises, because randomization is not possible.

The two-group control group design is an exceptionally useful research method, as long as its limitations are fully understood. For extensive and particularly important research, many researchers use the Solomon four group method, a design that is more costly, but avoids many weaknesses of the simple pretest-posttest designs.


Martyn Shuttleworth (Nov 3, 2009). Pretest-Posttest Designs. Retrieved Sep 03, 2024 from Explorable.com: https://explorable.com/pretest-posttest-designs

8.2 Non-Equivalent Groups Designs

Learning Objectives

  • Describe the different types of nonequivalent groups quasi-experimental designs.
  • Identify some of the threats to internal validity associated with each of these designs. 

Recall that when participants in a between-subjects experiment are randomly assigned to conditions, the resulting groups are likely to be quite similar. In fact, researchers consider them to be equivalent. When participants are not randomly assigned to conditions, however, the resulting groups are likely to be dissimilar in some ways. For this reason, researchers consider them to be nonequivalent. A  nonequivalent groups design , then, is a between-subjects design in which participants have not been randomly assigned to conditions. There are several types of nonequivalent groups designs we will consider.

Posttest Only Nonequivalent Groups Design

The first nonequivalent groups design we will consider is the posttest only nonequivalent groups design. In this design, participants in one group are exposed to a treatment, a nonequivalent group is not exposed to the treatment, and then the two groups are compared. Imagine, for example, a researcher who wants to evaluate a new method of teaching fractions to third graders. One way would be to conduct a study with a treatment group consisting of one class of third-grade students (taught by, say, Ms. Williams using the new method) and a control group consisting of another class of third-grade students (taught by Mr. Jones using the standard method). This design would be a nonequivalent groups design because the students are not randomly assigned to classes by the researcher, which means there could be important differences between them. For example, the parents of higher achieving or more motivated students might have been more likely to request that their children be assigned to Ms. Williams’s class. Or the principal might have assigned the “troublemakers” to Mr. Jones’s class because he is a stronger disciplinarian. Of course, the teachers’ styles, and even the classroom environments, might be very different and might cause different levels of achievement or motivation among the students. If at the end of the study there was a difference in the two classes’ knowledge of fractions, it might have been caused by the difference between the teaching methods, but it might equally have been caused by any of these confounding variables.

Of course, researchers using a posttest only nonequivalent groups design can take steps to ensure that their groups are as similar as possible. In the present example, the researcher could try to select two classes at the same school, where the students in the two classes have similar scores on a standardized math test and the teachers are the same sex, are close in age, and have similar teaching styles. Taking such steps would increase the internal validity of the study because it would eliminate some of the most important confounding variables. But without true random assignment of the students to conditions, there remains the possibility of other important confounding variables that the researcher was not able to control.

Pretest-Posttest Nonequivalent Groups Design

Another way to improve upon the posttest only nonequivalent groups design is to add a pretest. In the pretest-posttest nonequivalent groups design, there is a treatment group that is given a pretest, receives a treatment, and then is given a posttest. But at the same time there is a nonequivalent control group that is given a pretest, does not receive the treatment, and then is given a posttest. The question, then, is not simply whether participants who receive the treatment improve, but whether they improve more than participants who do not receive the treatment.

Imagine, for example, that students in one school are given a pretest on their attitudes toward drugs, then are exposed to an anti-drug program, and finally, are given a posttest. Students in a similar school are given the pretest, not exposed to an anti-drug program, and finally, are given a posttest. Again, if students in the treatment condition become more negative toward drugs, this change in attitude could be an effect of the treatment, but it could also be a matter of history or maturation. If it really is an effect of the treatment, then students in the treatment condition should become more negative than students in the control condition. But if it is a matter of history (e.g., news of a celebrity drug overdose) or maturation (e.g., improved reasoning), then students in the two conditions would be likely to show similar amounts of change. This type of design does not completely eliminate the possibility of confounding variables, however. Something could occur at one of the schools but not the other (e.g., a student drug overdose), so students at the first school would be affected by it while students at the other school would not.

Returning to the example of evaluating a new method of teaching fractions to third graders, this study could be improved by adding a pretest of students’ knowledge of fractions. The changes in scores from pretest to posttest would then be evaluated and compared across conditions to determine whether one group demonstrated a bigger improvement in knowledge of fractions than another. Of course, the teachers’ styles, and even the classroom environments, might still be very different and might cause different levels of achievement or motivation among the students that are independent of the teaching intervention. Once again, differential history also represents a potential threat to internal validity. If asbestos is found in one of the schools, causing it to be shut down for a month, then this interruption in teaching could produce a difference across groups on posttest scores.
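One common way to ask the “did they improve more?” question is to compare gain scores (posttest minus pretest) across the two intact classes. The sketch below uses invented data, and the choice of a simple t-test on gain scores is an assumption of this illustration; analysis of covariance on posttest scores with the pretest as a covariate is a frequently used alternative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical fraction-knowledge scores for two intact third-grade classes.
new_pre = rng.normal(40, 8, 30)              # class taught with the new method
old_pre = rng.normal(43, 8, 28)              # comparison class (may differ at pretest)
new_post = new_pre + rng.normal(12, 5, 30)   # larger average gain
old_post = old_pre + rng.normal(6, 5, 28)    # smaller average gain

gain_new = new_post - new_pre
gain_old = old_post - old_pre

t, p = stats.ttest_ind(gain_new, gain_old)
print(f"mean gain (new) = {gain_new.mean():.1f}, mean gain (old) = {gain_old.mean():.1f}")
print(f"difference in gains: t = {t:.2f}, p = {p:.4f}")
```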

If participants in this kind of design are randomly assigned to conditions, it becomes a true between-groups experiment rather than a quasi-experiment. In fact, it is the kind of experiment that Eysenck called for—and that has now been conducted many times—to demonstrate the effectiveness of psychotherapy.

Interrupted Time-Series Design with Nonequivalent Groups

One way to improve upon the interrupted time-series design is to add a control group. The interrupted time-series design with nonequivalent groups involves taking a set of measurements at intervals over a period of time, both before and after an intervention of interest, in two or more nonequivalent groups. Once again consider the manufacturing company that measures its workers’ productivity each week for a year before and after reducing work shifts from 10 hours to 8 hours. This design could be improved by locating another manufacturing company that does not plan to change its shift length and using it as a nonequivalent control group. If productivity increased rather quickly after the shortening of the work shifts in the treatment group but remained consistent in the control group, this would provide better evidence for the effectiveness of the treatment.

Similarly, in the example of examining the effects of taking attendance on student absences in a research methods course, the design could be improved by using students in another section of the research methods course as a control group. If a consistently higher number of absences was found in the treatment group before the intervention, followed by a sustained drop in absences after the treatment, while the nonequivalent control group showed consistently high absences across the semester, then this would provide superior evidence for the effectiveness of the treatment in reducing absences.
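A minimal way to summarize such data is to compare the pre- and post-intervention segments within each section; the case for the treatment is stronger when the level drops only in the section where attendance was taken. The weekly counts below are invented, and the simple comparison of segment means is an assumption of this sketch (a segmented regression, illustrated later in this compilation, is the more rigorous analysis).

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical weekly absences: weeks 0-7 before the attendance policy, weeks 8-14 after.
treated_section = np.concatenate([rng.poisson(10, 8), rng.poisson(4, 7)])  # drops after intervention
control_section = rng.poisson(10, 15)                                      # no intervention

for name, series in [("treated section", treated_section), ("control section", control_section)]:
    pre, post = series[:8].mean(), series[8:].mean()
    print(f"{name}: mean absences pre = {pre:.1f}, post = {post:.1f}, change = {post - pre:+.1f}")
```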

Pretest-Posttest Design With Switching Replication

Some of these nonequivalent control group designs can be further improved by adding a switching replication. In a pretest-posttest design with switching replication, nonequivalent groups are administered a pretest of the dependent variable, then one group receives a treatment while a nonequivalent control group does not, the dependent variable is assessed again, the treatment is then added to the control group, and finally the dependent variable is assessed one last time.

As a concrete example, let’s say we wanted to introduce an exercise intervention for the treatment of depression. We recruit one group of patients experiencing depression and a nonequivalent control group of students experiencing depression. We first measure depression levels in both groups, and then we introduce the exercise intervention to the patients experiencing depression, but we hold off on introducing the treatment to the students. We then measure depression levels in both groups. If the treatment is effective we should see a reduction in the depression levels of the patients (who received the treatment) but not in the students (who have not yet received the treatment). Finally, while the group of patients continues to engage in the treatment, we would introduce the treatment to the students with depression. Now and only now should we see the students’ levels of depression decrease.

One of the strengths of this design is that it includes a built-in replication. In the example given, we would get evidence for the efficacy of the treatment in two different samples (patients and students). Another strength of this design is that it provides more control over history effects. It becomes rather unlikely that some outside event would perfectly coincide with the introduction of the treatment in the first group and with the delayed introduction of the treatment in the second group. For instance, if a change in the weather occurred when we first introduced the treatment to the patients, and this explained their reductions in depression the second time that depression was measured, then we would see depression levels decrease in both groups. Similarly, the switching replication helps to control for maturation and instrumentation. Both groups would be expected to show the same rates of spontaneous remission of depression, and if the instrument for assessing depression happened to change at some point in the study, the change would be consistent across both of the groups. Of course, demand characteristics, placebo effects, and experimenter expectancy effects can still be problems. But they can be controlled for using some of the methods described in Chapter 5.

Switching Replication with Treatment Removal Design

In a basic pretest-posttest design with switching replication, the first group receives a treatment and the second group receives the same treatment a little bit later on (while the initial group continues to receive the treatment). In contrast, in a switching replication with treatment removal design, the treatment is removed from the first group when it is added to the second group. Once again, let’s assume we first measure the depression levels of patients with depression and students with depression. Then we introduce the exercise intervention to only the patients. After they have been exposed to the exercise intervention for a week, we assess depression levels again in both groups. If the intervention is effective, then we should see depression levels decrease in the patient group but not the student group (because the students haven’t received the treatment yet). Next, we would remove the treatment from the group of patients with depression. So we would tell them to stop exercising. At the same time, we would tell the student group to start exercising. After a week of the students exercising and the patients not exercising, we would reassess depression levels. Now if the intervention is effective, we should see that the depression levels have decreased in the student group but that they have increased in the patient group (because they are no longer exercising).

Demonstrating a treatment effect in two groups staggered over time and demonstrating the reversal of the treatment effect after the treatment has been removed can provide strong evidence for the efficacy of the treatment. In addition to providing evidence for the replicability of the findings, this design can also provide evidence for whether the treatment continues to show effects after it has been withdrawn.

Key Takeaways

  • Quasi-experimental research involves the manipulation of an independent variable without the random assignment of participants to conditions or counterbalancing of orders of conditions.
  • There are three types of quasi-experimental designs that are within-subjects in nature. These are the one-group posttest only design, the one-group pretest-posttest design, and the interrupted time-series design.
  • There are five types of quasi-experimental designs that are between-subjects in nature. These are the posttest only design with nonequivalent groups, the pretest-posttest design with nonequivalent groups, the interrupted time-series design with nonequivalent groups, the pretest-posttest design with switching replication, and the switching replication with treatment removal design.
  • Quasi-experimental research eliminates the directionality problem because it involves the manipulation of the independent variable. However, it does not eliminate the problem of confounding variables, because it does not involve random assignment to conditions or counterbalancing. For these reasons, quasi-experimental research is generally higher in internal validity than non-experimental studies but lower than true experiments.
  • Of all of the quasi-experimental designs, those that include a switching replication are highest in internal validity.

Exercises

  • Practice: Imagine that two professors decide to test the effect of giving daily quizzes on student performance in a statistics course. They decide that Professor A will give quizzes but Professor B will not. They will then compare the performance of students in their two sections on a common final exam. List five other variables that might differ between the two sections that could affect the results.

Experimental Design


  • Kim Koh, Werklund School of Education, University of Calgary, Calgary, AB, Canada

Synonyms: Experiments; Randomized clinical trial; Randomized trial

In quality-of-life and well-being research specifically, and in medical, nursing, social, educational, and psychological research more generally, experimental design can be used to test cause-and-effect relationships between the independent and dependent variables.

Description

Experimental design was pioneered by R. A. Fisher in the fields of agriculture and education (Fisher 1935 ). In studies that use experimental design, the independent variables are manipulated or controlled by researchers, which enables the testing of the cause-and-effect relationship between the independent and dependent variables. An experimental design can control many threats to internal validity by using random assignment of participants to different treatment/intervention and control/comparison groups. Therefore, it is considered one of the most statistically robust designs in quality-of-life and well-being research, as well as in...


References

Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Chicago: Rand McNally & Company.


Fisher, R. A. (1935). The design of experiments. Edinburgh: Oliver and Boyd.

Kerlinger, F. N., & Lee, H. B. (2000). Foundations of behavioral research (4th ed.). Belmont: Cengage Learning.

Schneider, B., Carnoy, M., Kilpatrick, J., Schmidt, W. H., & Shavelson, R. J. (2007). Estimating causal effects using experimental and observational designs. Washington, DC: American Educational Research Association.


Source: Koh, K. (2023). Experimental Design. In Maggino, F. (Ed.), Encyclopedia of Quality of Life and Well-Being Research. Springer, Cham. https://doi.org/10.1007/978-3-031-17299-1_967



Pretest-posttest designs and measurement of change


The article examines issues involved in comparing groups and measuring change with pretest and posttest data. Different pretest-posttest designs are presented in a manner that can help rehabilitation professionals to better understand and determine effects resulting from selected interventions. The reliability of gain scores in pretest-posttest measurement is also discussed in the context of rehabilitation research and practice.



The Use and Interpretation of Quasi-Experimental Studies in Medical Informatics


Quasi-experimental study designs, often described as nonrandomized, pre-post intervention studies, are common in the medical informatics literature. Yet little has been written about the benefits and limitations of the quasi-experimental approach as applied to informatics studies. This paper outlines a relative hierarchy and nomenclature of quasi-experimental study designs that is applicable to medical informatics intervention studies. In addition, the authors performed a systematic review of two medical informatics journals, the Journal of the American Medical Informatics Association (JAMIA) and the International Journal of Medical Informatics (IJMI), to determine the number of quasi-experimental studies published and how the studies are classified on the above-mentioned relative hierarchy. They hope that future medical informatics studies will implement higher level quasi-experimental study designs that yield more convincing evidence for causal links between medical informatics interventions and outcomes.

Quasi-experimental studies encompass a broad range of nonrandomized intervention studies. These designs are frequently used when it is not logistically feasible or ethical to conduct a randomized controlled trial. Examples of quasi-experimental studies follow. As one example of a quasi-experimental study, a hospital introduces a new order-entry system and wishes to study the impact of this intervention on the number of medication-related adverse events before and after the intervention. As another example, an informatics technology group is introducing a pharmacy order-entry system aimed at decreasing pharmacy costs. The intervention is implemented and pharmacy costs before and after the intervention are measured.

In medical informatics, the quasi-experimental, sometimes called the pre-post intervention, design often is used to evaluate the benefits of specific interventions. The increasing capacity of health care institutions to collect routine clinical data has led to the growing use of quasi-experimental study designs in the field of medical informatics as well as in other medical disciplines. However, little is written about these study designs in the medical literature or in traditional epidemiology textbooks. 1 , 2 , 3 In contrast, the social sciences literature is replete with examples of ways to implement and improve quasi-experimental studies. 4 , 5 , 6

In this paper, we review the different pretest-posttest quasi-experimental study designs, their nomenclature, and the relative hierarchy of these designs with respect to their ability to establish causal associations between an intervention and an outcome. The example of a pharmacy order-entry system aimed at decreasing pharmacy costs will be used throughout this article to illustrate the different quasi-experimental designs. We discuss limitations of quasi-experimental designs and offer methods to improve them. We also perform a systematic review of four years of publications from two informatics journals to determine the number of quasi-experimental studies, classify these studies into their application domains, determine whether the potential limitations of quasi-experimental studies were acknowledged by the authors, and place these studies into the above-mentioned relative hierarchy.

The authors reviewed articles and book chapters on the design of quasi-experimental studies. 4 , 5 , 6 , 7 , 8 , 9 , 10 Most of the reviewed articles referenced two textbooks that were then reviewed in depth. 4 , 6

Key advantages and disadvantages of quasi-experimental studies, as they pertain to the study of medical informatics, were identified. The potential methodological flaws of quasi-experimental medical informatics studies, which have the potential to introduce bias, were also identified. In addition, a summary table outlining a relative hierarchy and nomenclature of quasi-experimental study designs is described. In general, the higher the design is in the hierarchy, the greater the internal validity that the study traditionally possesses because the evidence of the potential causation between the intervention and the outcome is strengthened. 4

We then performed a systematic review of four years of publications from two informatics journals. First, we determined the number of quasi-experimental studies. We then classified these studies on the above-mentioned hierarchy. We also classified the quasi-experimental studies according to their application domain. The categories of application domains employed were based on categorization used by Yearbooks of Medical Informatics 1992–2005 and were similar to the categories of application domains employed by Annual Symposiums of the American Medical Informatics Association. 11 The categories were (1) health and clinical management; (2) patient records; (3) health information systems; (4) medical signal processing and biomedical imaging; (5) decision support, knowledge representation, and management; (6) education and consumer informatics; and (7) bioinformatics. Because the quasi-experimental study design has recognized limitations, we sought to determine whether authors acknowledged the potential limitations of this design. Examples of acknowledgment included mention of lack of randomization, the potential for regression to the mean, the presence of temporal confounders and the mention of another design that would have more internal validity.

All original scientific manuscripts published between January 2000 and December 2003 in the Journal of the American Medical Informatics Association (JAMIA) and the International Journal of Medical Informatics (IJMI) were reviewed. One author (ADH) reviewed all the papers to identify the number of quasi-experimental studies. Other authors (ADH, JCM, JF) then independently reviewed all the studies identified as quasi-experimental. The three authors then convened as a group to resolve any disagreements in study classification, application domain, and acknowledgment of limitations.

Results and Discussion

What Is a Quasi-experiment?

Quasi-experiments are studies that aim to evaluate interventions but that do not use randomization. Similar to randomized trials, quasi-experiments aim to demonstrate causality between an intervention and an outcome. Quasi-experimental studies can use both preintervention and postintervention measurements as well as nonrandomly selected control groups.

Using this basic definition, it is evident that many published studies in medical informatics utilize the quasi-experimental design. Although the randomized controlled trial is generally considered to have the highest level of credibility with regard to assessing causality, in medical informatics, researchers often choose not to randomize the intervention for one or more reasons: (1) ethical considerations, (2) difficulty of randomizing subjects, (3) difficulty of randomizing by location (e.g., by ward), and (4) small available sample size. Each of these reasons is discussed below.

Ethical considerations typically will not allow random withholding of an intervention with known efficacy. Thus, if the efficacy of an intervention has not been established, a randomized controlled trial is the design of choice to determine efficacy. But if the intervention under study incorporates an accepted, well-established therapeutic intervention, or if the intervention has either questionable efficacy or safety based on previously conducted studies, then the ethical issues of randomizing patients are sometimes raised. In the area of medical informatics, it is often believed prior to an implementation that an informatics intervention will likely be beneficial and thus medical informaticians and hospital administrators are often reluctant to randomize medical informatics interventions. In addition, there is often pressure to implement the intervention quickly because of its believed efficacy, thus not allowing researchers sufficient time to plan a randomized trial.

For medical informatics interventions, it is often difficult to randomize the intervention to individual patients or to individual informatics users. So while this randomization is technically possible, it is underused and thus compromises the eventual strength of concluding that an informatics intervention resulted in an outcome. For example, randomly allowing only half of medical residents to use pharmacy order-entry software at a tertiary care hospital is a scenario that hospital administrators and informatics users may not agree to for numerous reasons.

Similarly, informatics interventions often cannot be randomized to individual locations. Using the pharmacy order-entry system example, it may be difficult to randomize use of the system to only certain locations in a hospital or portions of certain locations. For example, if the pharmacy order-entry system involves an educational component, then people may apply the knowledge learned to nonintervention wards, thereby potentially masking the true effect of the intervention. When a design using randomized locations is employed successfully, the locations may be different in other respects (confounding variables), and this further complicates the analysis and interpretation.

In situations where it is known that only a small sample size will be available to test the efficacy of an intervention, randomization may not be a viable option. Randomization is beneficial because on average it tends to evenly distribute both known and unknown confounding variables between the intervention and control group. However, when the sample size is small, randomization may not adequately accomplish this balance. Thus, alternative design and analytical methods are often used in place of randomization when only small sample sizes are available.
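The claim that randomization balances confounders only on average, and less dependably in small samples, can be checked with a quick simulation. Everything below (the confounder, the group sizes, and the number of replications) is hypothetical and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def typical_imbalance(n_per_group, reps=2000):
    """Average absolute difference in a confounder (e.g., a severity-of-illness score)
    between two groups formed by random assignment."""
    diffs = []
    for _ in range(reps):
        severity = rng.normal(50, 10, 2 * n_per_group)
        rng.shuffle(severity)                      # random assignment to two groups
        a, b = severity[:n_per_group], severity[n_per_group:]
        diffs.append(abs(a.mean() - b.mean()))
    return float(np.mean(diffs))

for n in (10, 50, 500):
    print(f"n = {n:3d} per group: typical severity imbalance ≈ {typical_imbalance(n):.2f}")
```

The imbalance shrinks roughly in proportion to the square root of the sample size, which is why randomization is less reassuring in very small studies.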

What Are the Threats to Establishing Causality When Using Quasi-experimental Designs in Medical Informatics?

The lack of random assignment is the major weakness of the quasi-experimental study design. Associations identified in quasi-experiments meet one important requirement of causality since the intervention precedes the measurement of the outcome. Another requirement is that the outcome can be demonstrated to vary statistically with the intervention. Unfortunately, statistical association does not imply causality, especially if the study is poorly designed. Thus, in many quasi-experiments, one is most often left with the question: “Are there alternative explanations for the apparent causal association?” If these alternative explanations are credible, then the evidence of causation is less convincing. These rival hypotheses, or alternative explanations, arise from principles of epidemiologic study design.

Shadish et al. 4 outline nine threats to internal validity, which are listed in the table below. Internal validity is defined as the degree to which observed changes in outcomes can be correctly inferred to be caused by an exposure or an intervention. In quasi-experimental studies of medical informatics, we believe that the methodological principles that most often result in alternative explanations for the apparent causal effect include (a) difficulty in measuring or controlling for important confounding variables, particularly unmeasured confounding variables, which can be viewed as a subset of the selection threat in the table; and (b) results being explained by the statistical principle of regression to the mean. Each of these latter two principles is discussed in turn.

Threats to Internal Validity

1. Ambiguous temporal precedence: Lack of clarity about whether intervention occurred before outcome
2. Selection: Systematic differences over conditions in respondent characteristics that could also cause the observed effect
3. History: Events occurring concurrently with intervention could cause the observed effect
4. Maturation: Naturally occurring changes over time could be confused with a treatment effect
5. Regression: When units are selected for their extreme scores, they will often have less extreme subsequent scores, an occurrence that can be confused with an intervention effect
6. Attrition: Loss of respondents can produce artifactual effects if that loss is correlated with intervention
7. Testing: Exposure to a test can affect scores on subsequent exposures to that test
8. Instrumentation: The nature of a measurement may change over time or conditions
9. Interactive effects: The impact of an intervention may depend on the level of another intervention

Adapted from Shadish et al. 4

An inability to sufficiently control for important confounding variables arises from the lack of randomization. A variable is a confounding variable if it is associated with the exposure of interest and is also associated with the outcome of interest; the confounding variable leads to a situation where a causal association between a given exposure and an outcome is observed as a result of the influence of the confounding variable. For example, in a study aiming to demonstrate that the introduction of a pharmacy order-entry system led to lower pharmacy costs, there are a number of important potential confounding variables (e.g., severity of illness of the patients, knowledge and experience of the software users, other changes in hospital policy) that may have differed in the preintervention and postintervention time periods (see the figure below). In a multivariable regression, the first confounding variable could be addressed with severity of illness measures, but the second confounding variable would be difficult if not nearly impossible to measure and control. In addition, potential confounding variables that are unmeasured or immeasurable cannot be controlled for in nonrandomized quasi-experimental study designs and can only be properly controlled by the randomization process in randomized controlled trials.

[Figure: Example of confounding. To get the true effect of the intervention of interest, we need to control for the confounding variable.]
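To make the adjustment idea concrete, the following sketch regresses cost on an intervention indicator plus a measured severity score, using invented data in which patients in the postintervention period happen to be less severely ill. The variable names, effect sizes, and use of an ordinary least-squares fit are assumptions of this illustration, not details of any actual study.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400

post = np.repeat([0, 1], n // 2)                     # 0 = preintervention, 1 = postintervention period
severity = rng.normal(60 - 10 * post, 10)            # confounder: post-period patients are less ill
cost = 200 + 5 * severity - 30 * post + rng.normal(0, 40, n)   # simulated true intervention effect = -30

# Crude before/after comparison (confounded by the severity shift).
crude = cost[post == 1].mean() - cost[post == 0].mean()

# Adjusted estimate: ordinary least squares of cost on intervention indicator and severity.
X = np.column_stack([np.ones(n), post, severity])
coef, *_ = np.linalg.lstsq(X, cost, rcond=None)

print(f"crude pre/post difference    : {crude:7.1f}")
print(f"severity-adjusted coefficient: {coef[1]:7.1f}   (simulated true effect = -30)")
```

Adjustment of this kind only works for confounders that are measured; as the text notes, unmeasured confounders remain uncontrolled in a nonrandomized design.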

Another important threat to establishing causality is regression to the mean. 12 , 13 , 14 This widespread statistical phenomenon can result in wrongly concluding that an effect is due to the intervention when in reality it is due to chance. The phenomenon was first described in 1886 by Francis Galton, who measured the adult height of children and their parents. He noted that when the average height of the parents was greater than the mean of the population, the children tended to be shorter than their parents, and conversely, when the average height of the parents was shorter than the population mean, the children tended to be taller than their parents.

In medical informatics, what often triggers the development and implementation of an intervention is a rise in the rate above the mean or norm. For example, increasing pharmacy costs and adverse events may prompt hospital informatics personnel to design and implement pharmacy order-entry systems. If this rise in costs or adverse events is really just an extreme observation that is still within the normal range of the hospital's pharmaceutical costs (i.e., the mean pharmaceutical cost for the hospital has not shifted), then the statistical principle of regression to the mean predicts that these elevated rates will tend to decline even without intervention. However, often informatics personnel and hospital administrators cannot wait passively for this decline to occur. Therefore, hospital personnel often implement one or more interventions, and if a decline in the rate occurs, they may mistakenly conclude that the decline is causally related to the intervention. In fact, an alternative explanation for the finding could be regression to the mean.
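A small simulation makes the phenomenon visible: if monthly costs fluctuate randomly around a stable mean and an "intervention" is triggered only after an unusually expensive month, the following month tends to look better even though nothing was done. The numbers below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(11)

# 10,000 pairs of consecutive monthly pharmacy costs drawn from the SAME stable process.
month1 = rng.normal(100, 15, 10_000)
month2 = rng.normal(100, 15, 10_000)

# Suppose an intervention would be triggered only when month 1 looks alarmingly high.
triggered = month1 > 125

print(f"mean cost in triggering months : {month1[triggered].mean():.1f}")
print(f"mean cost in the months after  : {month2[triggered].mean():.1f}  (no intervention applied)")
```

The apparent improvement is pure regression to the mean, which is exactly the alternative explanation that a double pretest or an interrupted time series helps rule out.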

What Are the Different Quasi-experimental Study Designs?

In the social sciences literature, quasi-experimental studies are divided into four study design groups 4 , 6 :

  • A. Quasi-experimental designs without control groups
  • B. Quasi-experimental designs that use control groups but no pretest
  • C. Quasi-experimental designs that use control groups and pretests
  • D. Interrupted time-series designs

There is a relative hierarchy within these categories of study designs, with category D studies being sounder than categories C, B, or A in terms of establishing causality. Thus, if feasible from a design and implementation point of view, investigators should aim to design studies that fall into the higher-rated categories. Shadish et al. 4 discuss 17 possible designs, with seven designs falling into category A, three designs in category B, six designs in category C, and one major design in category D. In our review, we determined that most medical informatics quasi-experiments could be characterized by 11 of the 17 designs, with six study designs in category A, one in category B, three in category C, and one in category D, because the other study designs were not used or were not feasible in the medical informatics literature. Thus, for simplicity, we have summarized the 11 study designs most relevant to medical informatics research in the table below.

Relative Hierarchy of Quasi-experimental Designs

A. Quasi-experimental designs without control groups
    1. The one-group posttest-only design: X O1
    2. The one-group pretest-posttest design: O1 X O2
    3. The one-group pretest-posttest design using a double pretest: O1 O2 X O3
    4. The one-group pretest-posttest design using a nonequivalent dependent variable: (O1a, O1b) X (O2a, O2b)
    5. The removed-treatment design: O1 X O2 O3 (X removed) O4
    6. The repeated-treatment design: O1 X O2 (X removed) O3 X O4

B. Quasi-experimental designs that use a control group but no pretest
    1. Posttest-only design with nonequivalent groups: Intervention group: X O1 / Control group: O2

C. Quasi-experimental designs that use control groups and pretests
    1. Untreated control group with dependent pretest and posttest samples: Intervention group: O1a X O2a / Control group: O1b O2b
    2. Untreated control group design with dependent pretest and posttest samples using a double pretest: Intervention group: O1a O2a X O3a / Control group: O1b O2b O3b
    3. Untreated control group design with dependent pretest and posttest samples using switching replications: Intervention group: O1a X O2a O3a / Control group: O1b O2b X O3b

D. Interrupted time-series design
    1. Multiple pretest and posttest observations spaced at equal intervals of time: O1 O2 O3 O4 O5 X O6 O7 O8 O9 O10

O = Observational Measurement; X = Intervention Under Study. Time moves from left to right.

The nomenclature and relative hierarchy were used in the systematic review of four years of JAMIA and the IJMI. Similar to the relative hierarchy that exists in the evidence-based literature, which ranks randomized controlled trials, cohort studies, case-control studies, and case series, the hierarchy in the table is not absolute: in some cases it may be infeasible to perform a higher-level study, and there may be instances where an A6 design establishes stronger causality than a B1 design. 15 , 16 , 17

Quasi-experimental Designs without Control Groups

The One-Group Posttest-Only Design (A1)

X O1

Here, X is the intervention and O is the outcome variable (this notation is continued throughout the article). In this study design, an intervention (X) is implemented and a posttest observation (O1) is taken. For example, X could be the introduction of a pharmacy order-entry intervention and O1 could be the pharmacy costs following the intervention. This design is the weakest of the quasi-experimental designs that are discussed in this article. Without any pretest observations or a control group, there are multiple threats to internal validity. Unfortunately, this study design is often used in medical informatics when new software is introduced since it may be difficult to have pretest measurements due to time, technical, or cost constraints.

The One-Group Pretest-Posttest Design (A2)

O1 X O2

This is a commonly used study design. A single pretest measurement is taken (O1), an intervention (X) is implemented, and a posttest measurement is taken (O2). In this instance, period O1 frequently serves as the “control” period. For example, O1 could be pharmacy costs prior to the intervention, X could be the introduction of a pharmacy order-entry system, and O2 could be the pharmacy costs following the intervention. Including a pretest provides some information about what the pharmacy costs would have been had the intervention not occurred.

The One-Group Pretest-Posttest Design Using a Double Pretest (A3)

O1 O2 X O3

The advantage of this study design over A2 is that adding a second pretest prior to the intervention helps provide evidence that can be used to refute the phenomenon of regression to the mean and confounding as alternative explanations for any observed association between the intervention and the posttest outcome. For example, in a study where a pharmacy order-entry system led to lower pharmacy costs (O3 < O2 and O1), if one had two preintervention measurements of pharmacy costs (O1 and O2) and they were both elevated, this would suggest that there was a decreased likelihood that O3 is lower due to confounding and regression to the mean. Similarly, extending this study design by increasing the number of measurements postintervention could also help to provide evidence against confounding and regression to the mean as alternate explanations for observed associations.

The One-Group Pretest-Posttest Design Using a Nonequivalent Dependent Variable (A4)

(O1a, O1b) X (O2a, O2b)

This design involves the inclusion of a nonequivalent dependent variable (b) in addition to the primary dependent variable (a). Variables a and b should assess similar constructs; that is, the two measures should be affected by similar factors and confounding variables except for the effect of the intervention. Variable a is expected to change because of the intervention X, whereas variable b is not. Taking our example, variable a could be pharmacy costs and variable b could be the length of stay of patients. If our informatics intervention is aimed at decreasing pharmacy costs, we would expect to observe a decrease in pharmacy costs but not in the average length of stay of patients. However, a number of important confounding variables, such as severity of illness and knowledge of software users, might affect both outcome measures. Thus, if the average length of stay did not change following the intervention but pharmacy costs did, then the data are more convincing than if just pharmacy costs were measured.
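A minimal sketch of this logic on invented data: pharmacy cost per admission (primary variable a) and length of stay (nonequivalent dependent variable b) are each compared before and after the system goes live, and only the former is expected to move. The data, sample sizes, and use of independent-samples t-tests are assumptions of this illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 200   # admissions sampled before and after the intervention

cost_pre, cost_post = rng.normal(500, 80, n), rng.normal(450, 80, n)    # variable a: expected to drop
los_pre, los_post = rng.normal(6.0, 2.0, n), rng.normal(6.0, 2.0, n)    # variable b: expected to stay flat

print("pharmacy cost, post vs pre :", stats.ttest_ind(cost_post, cost_pre))
print("length of stay, post vs pre:", stats.ttest_ind(los_post, los_pre))
```

A drop in costs with no change in length of stay is more persuasive than the cost comparison alone, because shared confounders (such as a shift in severity of illness) would be expected to move both measures.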

The Removed-Treatment Design

O1 X O2 O3 (X removed) O4

This design adds a third posttest measurement (O3) to the one-group pretest-posttest design and then removes the intervention before a final measure (O4) is made. The advantage of this design is that it allows one to test hypotheses about the outcome in the presence of the intervention and in the absence of the intervention. Thus, if one predicts a decrease in the outcome between O1 and O2 (after implementation of the intervention), then one would predict an increase in the outcome between O3 and O4 (after removal of the intervention). One caveat is that if the intervention is thought to have persistent effects, then O4 needs to be measured after these effects are likely to have disappeared. For example, a study would be more convincing if it demonstrated that pharmacy costs decreased after pharmacy order-entry system introduction (O2 and O3 less than O1) and that when the order-entry system was removed or disabled, the costs increased (O4 greater than O2 and O3 and closer to O1). In addition, there are often ethical issues in this design in terms of removing an intervention that may be providing benefit.

The Repeated-Treatment Design

O1 X O2 (X removed) O3 X O4

The advantage of this design is that it demonstrates reproducibility of the association between the intervention and the outcome. For example, the association is more likely to be causal if one demonstrates that a pharmacy order-entry system results in decreased pharmacy costs when it is first introduced and again when it is reintroduced following an interruption of the intervention. As for design A5, the assumption must be made that the effect of the intervention is transient, which is most often applicable to medical informatics interventions. Because subjects in this design may serve as their own controls, it may yield greater statistical efficiency with fewer subjects.

Quasi-experimental Designs That Use a Control Group but No Pretest

Posttest-Only Design with Nonequivalent Groups (B1)

Intervention group: X O1
Control group:      O2

An intervention X is implemented for one group and compared to a second group. The use of a comparison group helps prevent certain threats to validity including the ability to statistically adjust for confounding variables. Because in this study design, the two groups may not be equivalent (assignment to the groups is not by randomization), confounding may exist. For example, suppose that a pharmacy order-entry intervention was instituted in the medical intensive care unit (MICU) and not the surgical intensive care unit (SICU). O1 would be pharmacy costs in the MICU after the intervention and O2 would be pharmacy costs in the SICU after the intervention. The absence of a pretest makes it difficult to know whether a change has occurred in the MICU. Also, the absence of pretest measurements comparing the SICU to the MICU makes it difficult to know whether differences in O1 and O2 are due to the intervention or due to other differences in the two units (confounding variables).

Quasi-experimental Designs That Use Control Groups and Pretests

The reader should note that with all the studies in this category, the intervention is not randomized; the control groups chosen are comparison groups. Obtaining pretest measurements on both the intervention and control groups allows one to assess the initial comparability of the groups. The assumption is that the more similar the intervention and control groups are at the pretest, the smaller the likelihood that important confounding variables differ between the two groups.

Intervention group:   O1a   X   O2a
Comparison group:    O1b         O2b

The use of both a pretest and a comparison group makes it easier to avoid certain threats to validity. However, because the two groups are nonequivalent (assignment to the groups is not by randomization), selection bias may exist. Selection bias exists when selection results in differences in unit characteristics between conditions that may be related to outcome differences. For example, suppose that a pharmacy order-entry intervention was instituted in the MICU and not the SICU. If preintervention pharmacy costs in the MICU (O1a) and SICU (O1b) are similar, it suggests that it is less likely that there are differences in the important confounding variables between the two units. If MICU postintervention costs (O2a) are less than preintervention MICU costs (O1a), but SICU costs (O1b) and (O2b) are similar, this suggests that the observed outcome may be causally related to the intervention.
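One common way to summarize this comparison, a simple analytic choice rather than part of the design itself, is a difference-in-differences contrast of the pre-to-post changes in the two units, which should be negative if the intervention lowered costs:

$$ \Delta = (O_{2a} - O_{1a}) - (O_{2b} - O_{1b}) $$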

Intervention group:   O1a   O2a   X   O3a
Comparison group:    O1b   O2b         O3b

In this design, the pretest is administered at two different time points in both groups. The main advantage of this design is that it controls for potentially different time-varying confounding effects in the intervention group and the comparison group. In our example, the two pretest measurements (O1 and O2 in each group) would allow assessment of preintervention, time-dependent changes in pharmacy costs (e.g., due to differences in the experience of residents) and of whether these changes were similar or different in the intervention and control groups.

Group 1:   O1a   X   O2a         O3a
Group 2:   O1b         O2b   X   O3b

With this study design, the researcher administers an intervention at a later time to a group that initially served as a nonintervention control. The advantage of this design over design C2 is that it demonstrates reproducibility in two different settings. The study design is not limited to two groups; in fact, the results have greater validity if the intervention effect is replicated in different groups at multiple times. In the example of a pharmacy order-entry system, one could intervene in the MICU and then, at a later time, intervene in the SICU. This design is often applicable to medical informatics, where new technology and new software are often introduced or made available gradually.

Interrupted Time-Series Designs

O1   O2   O3   O4   O5   X   O6   O7   O8   O9   O10

An interrupted time-series design is one in which a string of consecutive observations equally spaced in time is interrupted by the imposition of a treatment or intervention. The advantage of this design is that, with multiple measurements both pre- and postintervention, it is easier to address and control for confounding and regression to the mean. In addition, the analysis is statistically more robust: one can detect changes in the slope or intercept attributable to the intervention, in addition to a change in the mean values. 18 A change in intercept represents an immediate effect, whereas a change in slope represents a gradual effect of the intervention on the outcome. In the example of a pharmacy order-entry system, O1 through O5 could represent monthly pharmacy costs before the intervention and O6 through O10 monthly pharmacy costs after the introduction of the pharmacy order-entry system. Interrupted time-series designs can be further strengthened by incorporating many of the design features mentioned in the other categories (such as removal of the treatment, inclusion of a nonequivalent dependent variable, or the addition of a control group).
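As a rough illustration of how such a series might be analyzed, the sketch below fits a segmented (interrupted time-series) regression to hypothetical monthly cost data; the variable names and values are illustrative only and are not taken from any study discussed here.

```python
# Segmented-regression sketch for an interrupted time series (hypothetical data).
import pandas as pd
import statsmodels.formula.api as smf

costs = [110, 108, 109, 107, 106,   # O1-O5: monthly costs before the intervention
          98,  95,  93,  90,  88]   # O6-O10: monthly costs after the intervention
df = pd.DataFrame({
    "cost": costs,
    "time": range(1, 11),                       # overall time trend
    "post": [0] * 5 + [1] * 5,                  # 1 for post-intervention months
    "time_after": [0] * 5 + list(range(1, 6)),  # months elapsed since the intervention
})

# 'post' captures a change in intercept (immediate effect);
# 'time_after' captures a change in slope (gradual effect).
fit = smf.ols("cost ~ time + post + time_after", data=df).fit()
print(fit.summary())
```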

Systematic Review Results

The results of the systematic review are shown in the table below. In the four-year period of JAMIA publications that the authors reviewed, 25 quasi-experimental studies among 22 articles were published. Of these 25, 15 studies were of category A, five of category B, two of category C, and none of category D. Although there were no studies of category D (interrupted time-series analyses), three of the studies classified as category A had collected data that could have been analyzed as an interrupted time series. Nine of the 25 studies (36%) mentioned at least one of the potential limitations of the quasi-experimental study design. In the four-year period of IJMI publications reviewed by the authors, nine quasi-experimental studies among eight manuscripts were published. Of these nine, five studies were of category A, one of category B, one of category C, and two of category D. Two of the nine studies (22%) mentioned at least one of the potential limitations of the quasi-experimental study design.

Systematic Review of Four Years of Quasi-designs in JAMIA

Study | Journal | Informatics Topic Category | Quasi-experimental Design | Limitation of Quasi-design Mentioned in Article
Staggers and Kobus | JAMIA | 1 | Counterbalanced study design | Yes
Schriger et al. | JAMIA | 1 | A5 | Yes
Patel et al. | JAMIA | 2 | A5 (study 1, phase 1) | No
Patel et al. | JAMIA | 2 | A2 (study 1, phase 2) | No
Borowitz | JAMIA | 1 | A2 | No
Patterson and Harasym | JAMIA | 6 | C1 | Yes
Rocha et al. | JAMIA | 5 | A2 | Yes
Lovis et al. | JAMIA | 1 | Counterbalanced study design | No
Hersh et al. | JAMIA | 6 | B1 | No
Makoul et al. | JAMIA | 2 | B1 | Yes
Ruland | JAMIA | 3 | B1 | No
DeLusignan et al. | JAMIA | 1 | A1 | No
Mekhjian et al. | JAMIA | 1 | A2 (study design 1) | Yes
Mekhjian et al. | JAMIA | 1 | B1 (study design 2) | Yes
Ammenwerth et al. | JAMIA | 1 | A2 | No
Oniki et al. | JAMIA | 5 | C1 | Yes
Liederman and Morefield | JAMIA | 1 | A1 (study 1) | No
Liederman and Morefield | JAMIA | 1 | A2 (study 2) | No
Rotich et al. | JAMIA | 2 | A2 | No
Payne et al. | JAMIA | 1 | A1 | No
Hoch et al. | JAMIA | 3 | A2 | No
Laerum et al. | JAMIA | 1 | B1 | Yes
Devine et al. | JAMIA | 1 | Counterbalanced study design |
Dunbar et al. | JAMIA | 6 | A1 |
Lenert et al. | JAMIA | 6 | A2 |
Koide et al. | IJMI | 5 | D4 | No
Gonzalez-Hendrich et al. | IJMI | 2 | A1 | No
Anantharaman and Swee Han | IJMI | 3 | B1 | No
Chae et al. | IJMI | 6 | A2 | No
Lin et al. | IJMI | 3 | A1 | No
Mikulich et al. | IJMI | 1 | A2 | Yes
Hwang et al. | IJMI | 1 | A2 | Yes
Park et al. | IJMI | 1 | C2 | No
Park et al. | IJMI | 1 | D4 | No

JAMIA = Journal of the American Medical Informatics Association; IJMI = International Journal of Medical Informatics.

In addition, three studies from JAMIA were based on a counterbalanced design. A counterbalanced design is a higher-order study design than the other designs in category A, and is sometimes referred to as a Latin-square arrangement. In this design, all subjects receive all the different interventions, but the order of intervention assignment is not random. 19 This design can only be used when the intervention is compared against some existing standard; for example, a new PDA-based order entry system might be compared to a computer terminal-based order entry system. In this design, all subjects use both the new PDA-based order entry system and the old computer terminal-based order entry system. The counterbalanced design is a within-participants design in which the order of the intervention is varied (e.g., one group is given software A followed by software B, and another group is given software B followed by software A). The counterbalanced design is typically used when the available sample size is small, thus preventing the use of randomization. This design also allows investigators to study the potential effect of the ordering of the informatics intervention.

Although quasi-experimental study designs are ubiquitous in the medical informatics literature, as evidenced by 34 studies in the past four years of the two informatics journals, little has been written about the benefits and limitations of the quasi-experimental approach. As we have outlined in this paper, a relative hierarchy and nomenclature of quasi-experimental study designs exist, with some designs being more likely than others to permit causal interpretations of observed associations. Strengths and limitations of a particular study design should be discussed when presenting data collected in the setting of a quasi-experimental study. Future medical informatics investigators should choose the strongest design that is feasible given the particular circumstances.

Supplementary Material

Dr. Harris was supported by NIH grants K23 AI01752-01A1 and R01 AI60859-01A1. Dr. Perencevich was supported by a VA Health Services Research and Development Service (HSR&D) Research Career Development Award (RCD-02026-1). Dr. Finkelstein was supported by NIH grant RO1 HL71690.

METHODS article

Evaluating intervention programs with a pretest-posttest design: a structural equation modeling approach.

Guido Alessandri*

  • 1 Department of Psychology, Sapienza University of Rome, Rome, Italy
  • 2 Department of Psychology, Liverpool Hope University, Liverpool, UK

A common situation in the evaluation of intervention programs is that the researcher can rely on only two waves of data (i.e., pretest and posttest), which profoundly affects the choice of statistical analyses that can be conducted. Indeed, the evaluation of intervention programs based on a pretest-posttest design has usually been carried out using classic statistical tests, such as ANOVA-family analyses, which are strongly limited by analyzing intervention effects exclusively at the group level. In this article, we show how second-order multiple group latent curve modeling (SO-MG-LCM) can represent a useful methodological tool for a more realistic and informative assessment of intervention programs with two waves of data. We offer a practical step-by-step guide to properly implement this methodology, and we outline the advantages of the LCM approach over classic ANOVA analyses. Furthermore, we provide a real-data example by re-analyzing the implementation of the Young Prosocial Animation, a universal intervention program aimed at promoting prosociality among youth. Although previous studies have pointed to the usefulness of MG-LCM for evaluating intervention programs ( Muthén and Curran, 1997 ; Curran and Muthén, 1999 ), no previous study has shown that it is possible to use this approach even in pretest-posttest designs (i.e., with only two time points). Given the advantages of latent variable analyses in examining differences in interindividual and intraindividual changes ( McArdle, 2009 ), the methodological and substantive implications of our proposed approach are discussed.

Introduction

Evaluating intervention programs is at the core of many educational and clinical psychologists' research agendas ( Malti et al., 2016 ; Achenbach, 2017 ). From a methodological perspective, collecting data at several points in time (usually T ≥ 3) is important to test the long-term strength of intervention effects once the treatment is completed, as in classic designs including pretest, posttest, and follow-up assessments ( Roberts and Ilardi, 2003 ). However, several factors can hinder the researcher's capacity to collect data at follow-up assessments, in particular lack of funds, participants' poor monitoring compliance, participants' relocation to different areas, etc. Accordingly, the less advantageous pretest-posttest design (i.e., before and after the intervention) is a widely used methodological choice in the psychological intervention field. Indeed, a literature search of the PsycINFO database using the string " intervention AND pretest AND posttest AND follow-up ", limited to the abstract field and to publication dates from January 2006 to December 2016, returned 260 documents. When we changed "AND follow-up " to "NOT follow-up ", the search returned 1,544 documents (see Appendix A to replicate these literature search strategies).

A further matter of concern arises from the statistical approaches commonly used for evaluating intervention programs in pretest-posttest design, mostly ANOVA-family analyses, which heavily rely on statistical assumptions (e.g., normality, homogeneity of variance, independence of observations, absence of measurement error, and so on) rarely met in psychological research ( Schmider et al., 2010 ; Nimon, 2012 ).

However, all is not lost, and some analytical tools are available to help researchers better assess the efficacy of programs based on a pretest-posttest design (see McArdle, 2009 ). The goal of this article is to offer a formal presentation of a latent curve model approach (LCM; Muthén and Curran, 1997 ) to analyze intervention effects with only two waves of data. After a brief overview of the advantages of the LCM framework over classic ANOVA analyses, a step-by-step application of the LCM to real pretest-posttest intervention data is provided.

Evaluation Approaches: Observed Variables vs. Latent Variables

Broadly speaking, approaches to intervention evaluation can be distinguished into two categories: (1) approaches using observed variables and (2) approaches using latent variables . The first category includes widely used parametric tests such as Student's t , repeated measures analysis of variance (RM-ANOVA), analysis of covariance (ANCOVA), and ordinary least-squares regression (see Tabachnick and Fidell, 2013 ). However, despite their broad use, observed variable approaches suffer from several limitations, many of them stemming from the strong underlying statistical assumptions that must be satisfied. A first set of assumptions underlying classic parametric tests is that the data being analyzed are normally distributed and have equal population variances (also called the homogeneity of variance or homoscedasticity assumption). The normality assumption is not always met in real data, especially when the variables targeted by the treatment program are infrequent behaviors (i.e., externalizing conducts) or clinical syndromes ( Micceri, 1989 ). Likewise, the homoscedasticity assumption is rarely met in randomized controlled trials as a result of the experimental variable causing differences in variability between groups ( Grissom and Kim, 2012 ). Violation of the normality and homoscedasticity assumptions can compromise the results of classic parametric tests, in particular the rates of Type-I ( Tabachnick and Fidell, 2013 ) and Type-II error ( Wilcox, 1998 ). Furthermore, the inability to deal with measurement error can also lower the accuracy of inferences based on regression and ANOVA-family techniques, which assume that the variables are measured without error. However, the presence of some degree of measurement error is a common situation in psychological research, where the focus is often on constructs that are not directly observable, such as depression, self-esteem, or intelligence. Finally, observed variable approaches assume (without testing it) that the measurement structure of the construct under investigation is invariant across groups and/or time ( Meredith and Teresi, 2006 ; Millsap, 2011 ). Thus, unmet statistical assumptions and/or uncontrolled unreliability can lead to under- or overestimation of the true relations among the constructs analyzed (for a detailed discussion of these issues, see Cole and Preacher, 2014 ).

On the other side, latent variable approaches refer to the class of techniques termed under the label structural equation modeling (SEM; Bollen, 1989 ) such as confirmatory factor analysis (CFA; Brown, 2015 ) and mean and covariance structures analysis (MACS; Little, 1997 ). Although a complete overview of the benefits of SEM is beyond the scope of the present work (for a thorough discussion, see Little, 2013 ; Kline, 2016 ), it is worthwhile mentioning here those advantages that directly relate to the evaluation of intervention programs. First, SEM can easily accommodate the lack of normality in the data. Indeed, several estimation methods with standard errors robust to non-normal data are available and easy-to-use in many popular statistical programs (e.g., MLM, MLR, WLSMV, etc. in M plus ; Muthén and Muthén, 1998–2012 ). Second, SEM explicitly accounts for measurement error by separating the common variance among the indicators of a given construct (i.e., the latent variable) from their residual variances (which include both measurement error and unique sources of variability). Third, if multiple items from a scale are used to assess a construct, SEM allows the researcher to evaluate to what extent the measurement structure (i.e., factor loadings, item intercepts, residual variances, etc.) of such scale is equivalent across groups (e.g., intervention group vs. control group) and/or over time (i.e., pretest and posttest); this issue is known as measurement invariance (MI) and, despite its crucial importance for properly interpreting psychological findings, is rarely tested in psychological research (for an overview see Millsap, 2011 ; Brown, 2015 ). Finally, different competitive SEMs can be evaluated and compared according to their goodness of fit ( Kline, 2016 ). Many SEM programs, indeed, print in their output a series of fit indexes that help the researcher assess whether the hypothesized model is consistent with the data or not. In sum, when multiple indicators of the constructs of interest are available (e.g., multiple items from one scale, different informants, multiple methods, etc.), latent variables approaches offer many advantages and, therefore, they should be preferred over manifest variable approaches ( Little et al., 2009 ). Moreover, when a construct is measured using a single psychometric measure, there are still ways to incorporate the individuals' scores in the analyses as latent variables, and thus reduce the impact of measurement unreliability ( Cole and Preacher, 2014 ).

Latent Curve Models

Among latent variable models of change, latent curve models (LCMs; Meredith and Tisak, 1990 ) represent a useful and versatile tool to model stability and change in the outcomes targeted by an intervention program ( Muthén and Curran, 1997 ; Curran and Muthén, 1999 ). Specifically, in LCM, individual differences in the rate of change can be flexibly modeled through the use of two continuous random latent variables : the intercept (which usually represents the level of the outcome of interest at the pretest) and the slope (i.e., the mean-level change over time from the pretest to the posttest). In detail, both the intercept and the slope have a mean (i.e., the average initial level and the average rate of change, respectively) and a variance (i.e., the amount of inter-individual variability around the average initial level and the average rate of change). Importantly, if both the mean and the variance of the latent slope of the outcome y are statistically significant in the intervention group (whereas they are not significant in the control group), this means that there was not only an average effect of the intervention but also that participants were differentially affected by the program ( Muthén and Curran, 1997 ). Hence, the assumption that participants respond to the treatment in the same way (as in ANOVA-family analyses) can easily be relaxed in LCM. Indeed, although individual differences may also be present in the ANOVA design, change is modeled at the group level and, therefore, everyone is treated as being impacted in the same fashion after exposure to the treatment condition.

As discussed by Muthén and Curran (1997) , the LCM approach is particularly useful for evaluating intervention effects when it is conducted within a multiple group framework (i.e., MG-LCM), namely when the intercept and the slope of the outcome of interest are simultaneously estimated in the intervention and control groups. Indeed, as illustrated in our example, the MG-LCM allows the researcher to test whether both the mean and the variability of the outcome y at the pretest are similar across the intervention and control groups, as well as whether the mean rate of change and its inter-individual variability are similar between the two groups. Therefore, the MG-LCM provides information about the efficacy of an intervention program in terms of both (1) its average (i.e., group-level) effect and (2) participants' differential sensitivity to the treatment condition.

However, a standard MG-LCM cannot be empirically identified with two waves of data ( Bollen and Curran, 2006 ). Yet, the use of multiple indicators (at least 2) for each construct of interest could represent a possible solution to overcome this problem by allowing the estimation of the intercept and slope as second-order latent variables ( McArdle, 2009 ; Geiser et al., 2013 ; Bishop et al., 2015 ). Interestingly, although second-order LCMs are becoming increasingly common in psychological research due to their higher statistical power to detect changes over time in the variables of interest ( Geiser et al., 2013 ), their use in the evaluation of intervention programs is still less frequent. In the next section, we present a formal overview of a second-order MG-LCM approach, we describe the possible models of change that can be tested to assess intervention effects in pretest-posttest design, and we show an application of the model to real data.

Identification of a Two-Time Point Latent Curve Model using Parallel Indicators

When only two points in time are available, it is possible to estimate two LCMs: a No-Change Model (see Figure 1 Panel A) and a Latent Change Model (see Figure 1 Panel B). In the following, we describe in detail the statistical underpinnings of both models.


Figure 1. Second Order Latent Curve Models with parallel indicators (i.e., residual variances of observed indicators are equal within the same latent variable: ε 1 within η 1 and ε 2 within η 2 ) . All the intercepts of the observed indicators (Y) and endogenous latent variables (η) are fixed to 0 (not reported in figure). In model A, the residual variances of η 1 and η 2 (ζ 1 and ζ 2 , respectively) are freely estimated, whereas in Model B they are fixed to 0. ξ 1 , intercept; ξ 2 , slope; κ 1 , mean of intercept; κ 2 , mean of slope; ϕ 1 , variance of intercept; ϕ 2 , variance of slope; ϕ 12 , covariance between intercept and slope; η 1 , latent variable at T1; η 2 , latent variable at T2; Y, observed indicator of η; ε, residual variance/covariance of observed indicators.

Latent Change Model

A two-time-point latent change model implies two latent means (κ k ), two latent factor variances (ζ k ), plus the covariance between the intercept and slope factors (Φ k ). This results in a total of 5+T model parameters, where T is the number of error variances for (y k ) when VAR (∈ k ) is allowed to change over time. In the case of two waves of data (i.e., T = 2), this latent change model therefore has 7 parameters to estimate from a total of (2) (3)/2+2 = 5 identified means, variances, and covariances of the observed variables. Hence, two waves of data are insufficient to estimate this model. However, the latent change model can be made just-identified (i.e., zero degrees of freedom [df]) by constraining the residual variances of the observed variables to 0. This constraint should be considered structural and thus included in all two-time-point latent change models. In this case, the variances of the latent variables (i.e., the latent intercept representing the starting level, and the latent change score) are equivalent to those of the observed variables. Thus, when fallible variables are used, it becomes impossible to separate true scores from their error/residual terms.
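To make the counting explicit: with p observed variables, the number of identified moments (variances, covariances, and means) is

$$ \frac{p(p+1)}{2} + p = \frac{p(p+3)}{2}, $$

so p = 2 yields 5 identified moments against the 7 free parameters of the unconstrained model (2 latent means, 2 latent variances, 1 covariance, and 2 residual variances), whereas fixing the 2 residual variances to 0 leaves 5 parameters and a just-identified model.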

A possible way to make this latent change model over-identified (i.e., df ≥ 1) is to assume the availability of at least two observed indicators of the construct of interest at each time point (i.e., T1 and T2). Possible examples include two informants rating the same behavior (e.g., caregivers and teachers), two scales assessing the same construct, etc. However, even if the construct of interest is assessed by only one scale, it should be noted that psychological instruments are often composed of several items. Hence, as noted by Steyer et al. (1997) , it is possible to randomly partition the items composing the scale into two (or more) parcels that can be treated as parallel forms. By imposing appropriate constraints on the loadings (i.e., λ k = 1), the intercepts (τ k = 0), and the within-factor residuals (ε k = ε), and by fixing to 0 the residual variances of the first-order latent variables η k (ζ k = 0), the model can be specified as a first-order measurement model plus a second-order latent change model (see Figure 1 Panel B). Given the previous constraints on loadings, intercepts, and first-order factor residual variances, this model is over-identified because we have (4) (5)/2+4 = 14 observed variances, covariances, and means. Of course, when three or more indicators are available, identification issues cease to be a problem. In this paper, we restrict our attention to the two-parallel-indicators case to address the most basic situation that a researcher can encounter in the evaluation of a two-time-point intervention. Yet, our procedure can easily be extended to cases in which three or more indicators are available at each time point.

Specification

More formally, and under the usual assumptions ( Meredith and Tisak, 1990 ), the measurement model for the above two-time-point latent change model in group k takes the form shown below.
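In standard SEM matrix notation, consistent with the symbol definitions that follow, the measurement model is:

$$ y_k = \tau_{y_k} + \Lambda_{y_k}\,\eta_k + \epsilon_k \qquad (1) $$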

where y k is an mp × 1 random vector that contains the observed scores { y itk } for the ith variable at time t , with i ∈ {1, 2, …, p} and t ∈ {1, 2, …, m}. The intercepts are contained in the mp × 1 vector τ y k , Λ y k is an mp × mq matrix of factor loadings, η k is an mq × 1 vector of factor scores, and ∈ k is an mp × 1 vector of unobserved error terms. The population mean vector, μ y k , and covariance matrix, ∑ y k , or Means and Covariance Structure (MACS), are given below.
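In the same notation, the mean and covariance structure implied by equation (1) is:

$$ \mu_{y_k} = \tau_{y_k} + \Lambda_{y_k}\,\mu_{\eta_k}, \qquad \Sigma_{y_k} = \Lambda_{y_k}\,\Sigma_{\eta_k}\,\Lambda_{y_k}' + \Theta_{\epsilon_k} \qquad (2) $$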

where μ η k is a vector of latent factor means, ∑ η k is the modeled covariance matrix, and θ ε k is an mp × mp matrix of observed variable residual covariances. For each column, fixing an element of Λ y k to 1 and an element of τ y k to 0 identifies the model. By imposing increasingly restrictive constraints on the elements of the matrices Λ y and τ y , the above two-indicator, two-time-point model can be identified.

The general equations for the structural part of a second-order (SO) multiple group (MG) model are given below.
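A standard rendering of the second-order structural equation, consistent with the symbol definitions that follow, is:

$$ \eta_k = \Gamma_k\,\xi_k + \zeta_k \qquad (3) $$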

where Γ k is an mq × qr matrix containing second-order factor coefficients, ξ k is a qr × 1 vector of second-order latent variables, and ζ k is an mq × 1 vector containing latent variable disturbance scores. Note that q is the number of latent factors and r is the number of latent curves for each latent factor.

The population mean vector, μ η k , and covariance matrix, ∑ η k , based on (3), are given below.
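Writing α k for the vector of second-order latent means (the notation used later in the text), these can be expressed as:

$$ \mu_{\eta_k} = \Gamma_k\,\alpha_k, \qquad \Sigma_{\eta_k} = \Gamma_k\,\Phi_k\,\Gamma_k' + \Psi_k \qquad (4) $$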

where Φ k is an r × r covariance matrix of the second-order latent variables, and Ψ k is an mq × mq latent variable residual covariance matrix. In the current application, what differs between the two models is the way in which the matrices Γ k and Φ k are specified.

Application of the SO-MG-LCM to Intervention Studies Using a Pretest-Posttest Design

The application of the above two-time-point LCM to the evaluation of an intervention is straightforward. Usually, in intervention studies, individuals are randomly assigned to two different groups. The first group ( G 1 ) is exposed to an intervention that takes place somewhere after the initial time point. The second group ( G 2 ), also called the control group, does not receive any direct experimental manipulation. In light of the random assignment, G 1 and G 2 can be viewed as two equivalent groups drawn from the same population, and the effect of the intervention may be ascertained by comparing individuals' changes from T1 to T2 across these two groups.

Following Muthén and Curran (1997) , an intercept factor should be modeled in both groups; only in the intervention group, however, should an additional latent change factor be added. This factor is aimed at capturing the degree of change that is specific to the treatment group. Whereas the latent mean of this factor can be interpreted as the change determined by the intervention in the intervention group, a significant variance indicates meaningful heterogeneity in responding to the treatment. In this model, α y k is a vector containing freely estimated mean values for the intercept (i.e., ξ 1 ) and the slope (i.e., ξ 2 ). Γ y k is thus a 2 × 2 matrix containing the basis coefficients, fixed to [1 1] for the intercept (i.e., ξ 1 ) and [0 1] for the slope (i.e., ξ 2 ). Φ k is a 2 × 2 matrix containing the variances and covariance of the two latent factors representing the intercept and the slope.

Given randomization, restricting the parameters of the intercept to be equal across the control and treatment populations is warranted in a randomized intervention study. Yet, baseline differences can be introduced in field studies where randomization is not possible or, simply, the randomization failed during the course of the study ( Cook and Campbell, 1979 ). In such cases, the equality constraints related to the mean or to the variance of the intercept can be relaxed.

The influence of participants' initial status on the effect of the treatment in the intervention group can also be incorporated in the model ( Cronbach and Snow, 1977 ; Muthén and Curran, 1997 ; Curran and Muthén, 1999 ) by regressing the latent change factor onto the intercept factor, so that the mean and variance of the latent change factor in the intervention group are expressed as a function of the initial status. Accordingly, this analysis captures to what extent inter-individual initial differences on the targeted outcome can predispose participants to differently respond to the treatment delivered.

Sequence of Models

We suggest a four-step approach to intervention evaluation. By comparing the relative fit of each model, researchers can have important information to assess the efficacy of their intervention.

Model 1: No-Change Model

A no-change model is specified for both the intervention group (henceforth G1) and the control group (henceforth G2). As a first step, indeed, a researcher may assume that the intervention has not produced any meaningful effect, and therefore a no-change model (or strict stability model) should be simultaneously estimated in both the intervention and control groups. In its more general version, the no-change model includes only a second-order intercept factor, which represents the participants' initial level. Importantly, both the mean and variance of the second-order intercept factor are freely estimated across groups (see Figure 1 Panel A). More formally, in this model, Φ k is a qr × qr covariance matrix of the latent variables, and Γ k is an mq × qr matrix containing, for each latent variable, a set of basis coefficients for the latent curves.

Model 2: Latent Change Model in the Intervention Group

In this model, a slope growth factor is estimated in the intervention group only. As previously detailed, this additional latent factor is aimed at capturing any possible change in the intervention group. According to our premises, this model represents the "target" model, attesting to a significant intervention effect in G1 but not in G2. Model 1 is then compared with Model 2, and changes in fit indexes between the two models are used to evaluate the need for this further latent factor (see section Statistical Analysis).

Model 3: Latent Change Model in Both the Intervention and Control Group

In Model 3, a latent change model is estimated simultaneously in both G1 and G2. The fit of Model 2 is compared with the fit of Model 3, and changes in fit indexes between the two models are used to evaluate the need for this further latent factor in the control group. From a conceptual point of view, the goal of Model 3 is twofold, because it allows the researcher: (a) to rule out the possibility of "contamination effects" between the intervention and control groups ( Cook and Campbell, 1979 ); (b) to assess a possible, normative mean-level change in the control group (i.e., a change that cannot be attributed to the treatment delivered). In reference to (b), it should be noted that some variables may show a normative developmental increase during the period of the intervention. For instance, a consistent part of the literature has identified an overall increase in empathic capacities during early childhood (for an overview, see Eisenberg et al., 2015 ). Hence, researchers aiming to increase empathy-related responding in young children may find that both the intervention and control groups actually improved in their empathic response. In this situation, both the mean and variance of the latent slope should be constrained to equality across groups to mitigate the risk of confounding intervention effects with the normative development of the construct (for an alternative approach when more than two time points are available, see Muthén and Curran, 1997 ; Curran and Muthén, 1999 ). Importantly, the tenability of these constraints can easily be tested through a delta chi-square test (Δχ 2 ) comparing the chi-squares of the constrained and unconstrained models. A significant Δχ 2 (usually p < 0.05) indicates that the two models are not statistically equivalent, and the unconstrained model should be preferred. On the contrary, a non-significant Δχ 2 (usually p > 0.05) indicates that the two models are statistically equivalent, and the constrained model (i.e., the more parsimonious model) should be preferred.
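For illustration, the Δχ 2 test can be computed directly from the two models' chi-square values and degrees of freedom; the sketch below uses made-up values, not results from this study.

```python
# Delta chi-square (likelihood-ratio) test between nested models (illustrative values).
from scipy.stats import chi2

chisq_constrained, df_constrained = 12.4, 6      # hypothetical constrained model
chisq_unconstrained, df_unconstrained = 9.1, 4   # hypothetical unconstrained model

delta_chisq = chisq_constrained - chisq_unconstrained
delta_df = df_constrained - df_unconstrained
p_value = chi2.sf(delta_chisq, delta_df)  # upper-tail probability

# p < 0.05: models differ, keep the unconstrained model;
# p >= 0.05: models are equivalent, keep the constrained (more parsimonious) model.
print(f"delta chi2({delta_df}) = {delta_chisq:.2f}, p = {p_value:.3f}")
```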

Model 4: Sensitivity Model

After having identified the best fitting model, the parameters of the intercept (i.e., mean and variance) should be constrained to equality across groups. This sensitivity analysis is crucial to ensure that both groups started with an equivalent initial status on the targeted behavior, which is an important assumption in intervention programs. In line with the previous analyses, the plausibility of an equivalent initial status can easily be tested through the Δχ 2 test. Indeed, given randomization, it is reasonable to assume that participants in both groups are characterized by similar or identical starting levels, and that the groups have the same variability. These assumptions lead to a constrained no-change, no-group-difference model. This model is the same as the previous one, except that κ k = κ, or in our situation κ 1 = κ 2 . Moreover, in our situation, r = 1, q = 1, m = 2 , and hence Φ k = Φ is a scalar, Γ k = 1 2 , and Ψ k = ΨI 2 for each of the k populations.

In the next section, the above sequence of models is applied to the evaluation of a universal intervention program aimed at improving students' prosociality. We present results from every step of the methodology, and we provide a set of M plus syntaxes to allow researchers to estimate these models on their own data.

The Young Prosocial Animation Program

The Young Prosocial Animation (YPA; Zuffianò et al., 2012 ) is a universal intervention program ( Greenberg et al., 2001 ) designed to sensitize adolescents to prosocial and empathic values ( Zuffianò et al., 2012 ).

In detail, the YPA tries to valorize: (a) the status of people who behave prosocially, (b) the similarity between the "model" and the participants, and (c) the outcomes related to prosocial actions. Following Bandura's (1977) concept of modeling , people are more likely to engage in behaviors they value and whose model is perceived as similar and as having an admired status . The main idea is that valuing these three aspects could foster a prosocial sensitization among the participants ( Zuffianò et al., 2012 ). In other terms, the goal is to promote the cognitive and emotional aspects of prosociality in order to strengthen attitudes to act and think in a "prosocial way." The expected change, therefore, is at the level of personal dispositions, in terms of an increased receptiveness and propensity for prosocial thinking (i.e., the ability to take another's point of view and to be empathetic, as well as the ability to produce ideas and solutions that can help other people, rather than a direct effect on the behaviors enacted by the individuals; Zuffianò et al., 2012 ). Owing to these characteristics, the YPA can be conceived as a first phase of prosocial sensitization on which to build programs more directly aimed at increasing prosocial behavior (e.g., the CEPIDEA program; Caprara et al., 2014 ). The YPA aims to achieve this goal through a guided discussion following the viewing of selected prosocial scenes from the film "Pay It Forward" 1 . After viewing each scene, a trained researcher, using a standard protocol, guides a discussion among the participants highlighting: (i) the type of prosocial action (e.g., consoling, helping, etc.); (ii) the benefits for the actor and the target of the prosocial action; (iii) possible benefits of the prosocial action extended to the context (e.g., other persons, the broader community, etc.); (iv) requirements of the actor to behave prosocially (e.g., being empathetic, bravery, etc.); (v) the similarity between the participant and the actor of the prosocial behavior; (vi) the thoughts and the feelings experienced while viewing the scene. The researcher has to complete the intervention within 12 sessions (1 h per session, once a week).

For didactic purposes, in the present study we re-analyzed data from an implementation of the YPA in three schools located in a small city in the South of Italy (see Zuffianò et al., 2012 for details).

We expected Model 2 (a latent change model in the intervention group and a no-change model in the control group) to be the best fitting model. Indeed, from a developmental point of view, we had no reason to expect adolescents to show a normative change in prosociality after such a short period of time ( Eisenberg et al., 2015 ). In line with the goal of the YPA, we hypothesized a small-to-medium increase in prosociality in the intervention group. We also expected the two groups not to differ at T1 in their absolute level of prosocial behaviors, ensuring that the intervention and control groups were equivalent. Finally, we explored the influence of participants' initial status on the treatment effect, that is, whether participants with lower initial levels of prosociality benefited more from attending the YPA sessions.

The study followed a quasi-experimental design , with both the intervention and control groups assessed at two different time points: before the YPA intervention (Time 1) and 6 months after (Time 2). Twelve classrooms from three schools (one middle school and two high schools) participated in the study during the 2008–2009 school year. Each school ensured the participation of four classes, which were randomly assigned to the intervention and control groups (two classes to the intervention group and two classes to the control group). 2 In total, six classes were part of the intervention group and six classes of the control group. The students from the middle school were in the eighth grade (third year of secondary school in Italy), whereas the students from the two high schools were in the ninth (first year of high school in Italy) and tenth grade (second year of high school in Italy).

Participants

The YPA program was implemented in a city in the South of Italy. A total of 250 students participated in the study: 137 students (51.8% males) were assigned to the intervention group and 113 (54% males) to the control group. At T2, there were 113 students in the intervention group (retention rate = 82.5%) and 91 in the control group (retention rate = 80.5%). Little's test of missing completely at random showed a non-significant chi-squared value [χ 2 (2) = 4.698, p = 0.10], indicating that missingness at posttest was not related to levels of prosociality at pretest. The mean age was 14.2 ( SD = 1.09) in the intervention group and 15.2 ( SD = 1.76) in the control group. Considering socioeconomic status, 56.8% of families in the intervention group and 60.0% in the control group were one-income families. The professions most represented in the two groups were "worker" among the fathers (36.4% in the intervention group and 27.9% in the control group) and "housewife" among the mothers (56.0% in the intervention group and 55.2% in the control group). Parents' educational level was approximately the same in the two groups: most parents in the intervention group (43.5%) and in the control group (44.7%) had a middle school degree.

Prosociality

Participants rated their prosociality on a 16-item scale (5-point Likert scale: 1 = never/almost never true ; 5 = almost always/always true ) that assesses the degree of engagement in actions aimed at sharing, helping, taking care of others' needs, and empathizing with their feelings (e.g., “ I try to help others ” and “ I try to console people who are sad ”). The alpha reliability coefficient was 0.88 at T1 and 0.87 at T2. The scale has been validated on a large sample of respondents ( Caprara et al., 2005 ) and has been found to moderately correlate ( r > 0.50) with other-ratings of prosociality ( Caprara et al., 2012 ).

Statistical Analysis

All the preceding models were estimated by maximum likelihood (ML) using the M plus 7 program ( Muthén and Muthén, 1998–2012 ). Missing data were handled using full information maximum likelihood (FIML) estimation, which draws on all available data to estimate model parameters without imputing missing values ( Enders, 2010 ). To evaluate goodness of fit, we relied on different criteria. First, we evaluated the χ 2 likelihood ratio statistic for the overall model. Given that we were interested in the relative fit of the different models of change within G1 and G2, we also investigated the contribution of each group to the overall χ 2 value, in order to have a more precise indication of the impact of including the latent change factor in a specific group. We also examined the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), the Root Mean Square Error of Approximation (RMSEA) with its associated 90% confidence interval, and the Standardized Root Mean Square Residual (SRMR). We accepted CFI and TLI values >0.90, RMSEA values <0.08, and SRMR <0.08 (see Kline, 2016 ). Last, we used the Akaike Information Criterion (AIC; Burnham and Anderson, 2004 ). The AIC rewards goodness of fit and includes a penalty that is an increasing function of the number of parameters estimated. Burnham and Anderson (2004) recommend rescaling all the observed AIC values before selecting the best fitting model according to the following formula: Δi = AICi − AICmin, where AICmin is the minimum of the observed AIC values (among competing models). Practical guidelines suggest that a model which differs by less than Δi = 2 from the best fitting model (which has Δi = 0) in a specific dataset is "strongly supported by the evidence"; if the difference lies between 4 and 7 there is considerably less support, whereas models with Δi > 10 have essentially no support.
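The Δi rescaling rule can be applied with a few lines of code; the AIC values below are hypothetical and serve only to show the bookkeeping.

```python
# Rescaling AIC values to delta_i = AIC_i - AIC_min (Burnham & Anderson, 2004); hypothetical values.
aic = {"Model 1": 2510.3, "Model 2": 2498.7, "Model 3": 2500.1, "Model 4": 2499.2}

aic_min = min(aic.values())
for name, value in aic.items():
    delta_i = value - aic_min
    if delta_i < 2:
        support = "strongly supported by the evidence"
    elif 4 <= delta_i <= 7:
        support = "considerably less support"
    elif delta_i > 10:
        support = "essentially no support"
    else:
        support = "intermediate support"
    print(f"{name}: AIC = {value:.1f}, delta_i = {delta_i:.1f} -> {support}")
```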

We created two parallel forms of the prosociality scale by following the procedure described in Little et al. (2002 , p. 166). In Table 1 we report zero-order correlations, means, standard deviations, reliability, skewness, and kurtosis for each parallel form. Cronbach's alphas were good (≥0.74), and all correlations were significant at p < 0.001. Indices of skewness and kurtosis for each parallel form in both groups did not exceed |0.61|; therefore, the univariate distributions of the eight variables (4 variables for 2 groups) did not show substantial deviations from normality ( Curran et al., 1996 ). To check the multivariate normality assumption, we computed Mardia's two-sided multivariate test of fit for skewness and kurtosis. Given the well-known tendency of this coefficient to easily reject H 0 , we set the alpha level at 0.001 (in this regard, see Mecklin and Mundfrom, 2005 ; Villasenor Alva and Estrada, 2009 ). Mardia's two-sided multivariate tests of fit for skewness and kurtosis yielded p -values of 0.010 and 0.030, respectively. Therefore, the study variables showed acceptable, even if not perfect, multivariate normality. Given the modest deviation from the normality assumption, we decided to use maximum likelihood as the estimation method.
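A minimal sketch of the random item-parceling step is given below; the simulated responses, item names, and seed are hypothetical and stand in for the actual 16-item prosociality data.

```python
# Randomly split 16 items into two parallel forms and check their distributions (simulated data).
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(seed=1)
items = [f"item{i}" for i in range(1, 17)]
data = pd.DataFrame(rng.integers(1, 6, size=(250, 16)), columns=items)  # fake 5-point responses

shuffled = rng.permutation(items)
form_a, form_b = list(shuffled[:8]), list(shuffled[8:])   # two random halves of the scale
data["parcel_a"] = data[form_a].mean(axis=1)
data["parcel_b"] = data[form_b].mean(axis=1)

for parcel in ("parcel_a", "parcel_b"):
    print(parcel,
          "skew =", round(float(skew(data[parcel])), 2),
          "kurtosis =", round(float(kurtosis(data[parcel])), 2))
```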


Table 1. Descriptive statistics and zero-order correlations for each group separately ( N = 250) .

Evaluating the Impact of the Intervention

In Table 2 we report the fit indexes for the alternative models (see Appendices B1–B4 for annotated M plus syntaxes for each of them). As hypothesized, Model 2 (see also Figure 2 ) was the best fitting model. Trajectories of prosociality for the intervention and control groups are plotted separately in Figure 3 . The contribution of each group to the overall chi-squared values highlighted how the lack of the slope factor in the intervention group resulted in substantial misfit. On the contrary, adding a slope factor to the control group did not significantly change the overall fit of the model [Δχ 2 (1) = 0.765, p = 0.381]. Of interest, the intercept mean and variance were equal across groups (see Table 2 , Model 4), suggesting the equivalence of G1 and G2 at T1.


Table 2. Goodness-of-fit indices for the tested models .


Figure 2. Best fitting Second Order Multiple Group Latent Curve Model with parameter estimates for both groups . Parameters in bold were fixed. This model has parallel indicators (i.e., residual variances of observed indicators are equal within the same latent variable, in each group). All the intercepts of the observed indicators (Y) and endogenous latent variables (η) are fixed to 0 (not reported in figure). G1, intervention group; G2, control group; ξ 1 , intercept of prosociality; ξ 2 , slope of prosociality; η 1 , prosociality at T1; η 2 , prosociality at T2; Y, observed indicator of prosociality; ε, residual variance of observed indicator. n.s. p > 0.05; * p < 0.05; ** p < 0.01; *** p < 0.001.


Figure 3. Trajectories of prosocial behavior for intervention group (G1) and control group (G2) in the best fitting model (Model 2 in Table 2 ) .

In Figure 2 we report all the parameters of the best fitting model for both groups. The slope factor in the intervention group had a significant variance (φ 2 = 0.28, p < 0.001) and a positive and significant mean (κ 2 = 0.19, p < 0.01). Accordingly, we investigated the influence of the initial status on the treatment effect by regressing the slope onto the intercept in the intervention group. Note that this latter model has the same fit as Model 2; however, by estimating a regression path instead of a covariance, it allows one to control for the effect of individuals' initial status on their subsequent change. The significant effect of the intercept (i.e., β = –0.62, p < 0.001) on the slope ( R 2 = 0.38) indicated that participants who were less prosocial at the beginning showed a steeper increase in prosociality after the intervention.

Data collected in intervention programs are often limited to two points in time, namely before and after the delivery of the treatment (i.e., pretest and posttest). When analyzing intervention programs with two waves of data, researchers have so far mostly relied on ANOVA-family techniques, which are limited by their strong statistical assumptions and by the assumption that all participants are affected in the same fashion by the intervention. Although a general, average effect of the program is often plausible and theoretically sound, neglecting individual variability in responding to the treatment delivered can lead to partial or incorrect conclusions. In this article, we illustrated how latent variable models can help overcome these issues and provide the researcher with a clear model-building strategy to evaluate intervention programs based on a pretest-posttest design. To this aim, we outlined a sequence of four steps to be followed, which correspond to substantive research questions (e.g., efficacy of the intervention, normative development, etc.). In particular, Model 1, Model 2, and Model 3 included different combinations of no-change and latent change models in the intervention and control groups (see Table 2 ). These first three models are crucial to identify the best fitting trajectory of the targeted behavior across the two groups. Next, Model 4 was aimed at ascertaining whether the intervention and control groups were equivalent in their initial status (both in terms of average starting level and inter-individual differences) or whether, vice versa, this similarity assumption should be relaxed.

Importantly, even if the intervention and control groups differ in their initial level, this should not prevent the researcher from investigating the presence of moderation effects—such as a treatment-initial status interaction—if this is in line with the researcher's hypotheses. One of the major advantages of the proposed approach, indeed, is the possibility of modeling the intervention effect as a random latent variable (i.e., the second-order latent slope) characterized by both a mean (i.e., the average change) and a variance (i.e., the degree of variability around the average effect). As already emphasized by Muthén and Curran (1997) , a statistically significant variance indicates the presence of systematic individual differences in responding to the intervention program. Accordingly, the latent slope identified in the intervention group can be regressed onto the latent intercept in order to examine whether participants with different initial values on the targeted behavior were differently affected by the program. Importantly, the analysis of interaction effects need not be limited to the treatment-initial status interaction but can also include other external variables as moderators (e.g., sex, SES, IQ, behavioral problems, etc.; see Caprara et al., 2014 ).

To complement our formal presentation of the LCM procedure, we provided a real-data example by re-analyzing the efficacy of the YPA, a universal intervention program aimed at promoting prosociality in youths ( Zuffianò et al., 2012 ). Our four-step analysis indicated that participants in the intervention group showed a small yet significant increase in their prosociality after 6 months, whereas students in the control group did not show any significant change (see Model 1, Model 2, and Model 3 in Table 2 ). Furthermore, participants in the intervention and control groups did not differ in their initial levels of prosociality (Model 4), thereby ensuring the comparability of the two groups. These results replicated those reported by Zuffianò et al. (2012) and further attested to the effectiveness of the YPA in promoting prosociality among adolescents. Importantly, our results also indicated that there was significant variability among participants in responding to the YPA program, as indicated by the significant variance of the latent slope. Accordingly, we explored the possibility of a treatment-initial status interaction. The significant prediction of the slope by the intercept indicated that, after 6 months, participants with lower initial levels of prosociality were more responsive to the intervention delivered. On the contrary, participants who were already prosocial at the pretest remained overall stable in their high level of prosociality. Although this effect was not hypothesized a priori , we can speculate that less prosocial participants were more receptive to the content of the program because they appreciated the discussion about the importance and benefits of prosociality (topics that, very likely, were relatively new to them) more than their already prosocial counterparts did. However, it is important to note that the goal of the YPA was merely to sensitize youth to prosocial and empathic values and not to change their actual behaviors. Accordingly, our findings cannot be interpreted as an increase in prosocial conduct among less prosocial participants. Future studies are needed to examine to what extent the introduction of the YPA into more intensive school-based intervention programs (see Caprara et al., 2014 ) could further strengthen the promotion of concrete prosocial behaviors.

Limitations and Conclusions

Despite the advantages of the proposed LCM approach, several limitations should be acknowledged. First of all, the use of a second-order LCM with two available time points requires that the construct be measured by more than one observed indicator. As such, this technique cannot be used for single-item measures (e.g., Lucas and Donnellan, 2012 ). Second, as with any structural equation model, our SO-MG-LCM makes the strong assumption that the specified model is true in the population, an assumption that is likely to be violated in empirical studies. Moreover, the model must be empirically identified, which requires a set of constraints that leave aside substantive considerations. Third, in this paper we restricted our attention to the two-parallel-indicators case to address the most basic situation that a researcher can encounter in the evaluation of a two-time-point intervention. Our aim was indeed to confront researchers with the most restrictive case in terms of model identification; the case in which only two observed indicators are available is, in our opinion, one of the most intimidating for researchers. Moreover, when a scale is composed of a long set of items, or the target construct is a second-order construct loaded by two indicators (e.g., as in the case of psychological resilience; see Alessandri et al., 2012 ), and the sample size is not optimal (in terms of the ratio of estimated parameters to available subjects), it makes sense to conduct measurement invariance tests as a preliminary step, "before" testing the intervention effect, and then use the approach described above to be parsimonious and maximize statistical power. In these circumstances, the interest is indeed in estimating the LCM, and the invariance of indicators is likely a prerequisite. Measurement invariance issues should never be undervalued by researchers; instead, they should be routinely evaluated in preliminary research phases and, when possible, incorporated in the measurement model specification phase. Fourth, although intervention programs with two time points can still offer useful indications, the use of three (and possibly more) points in time provides the researcher with stronger evidence to assess the actual efficacy of the program at different follow-ups. Hence, the methodology described in this paper should be conceived as a support to make the best of pretest-posttest studies and not as an encouragement to collect only two waves of data. Finally, SEM techniques usually require relatively larger samples than classic ANOVA analyses. Therefore, our procedure may not be suited for the evaluation of intervention programs based on small samples. Although several rules of thumb have been proposed in the past for conducting SEM (e.g., N > 100), we encourage the use of Monte Carlo simulation studies to accurately plan the minimum sample size before starting data collection ( Bandalos and Leite, 2013 ; Wolf et al., 2013 ).

Despite these limitations, we believe that our LCM approach represents a useful and easy-to-use methodology that should be in the toolbox of psychologists and prevention scientists. Several factors, often uncontrollable, can oblige the researcher to collect data from only two points in time. Faced with this less than optimal scenario, all is not lost, and researchers should be aware that more accurate and informative analytical techniques than ANOVA are available to assess intervention programs based on a pretest-posttest design.

Author Contributions

GA proposed the research question for the study and the methodological approach, and the focus and style of the manuscript; he contributed substantially to the conception and revision of the manuscript, and wrote the first drafts of all manuscript sections and incorporated revisions based on the suggestions and feedback from AZ and EP. AZ contributed the empirical data set, described the intervention and part of the discussion section, and critically revised the content of the study. EP conducted analyses and revised the style and structure of the manuscript.

Acknowledgments

The authors thank the students who participated in this study. This research was supported in part by a Research Grant (named "Progetto di Ateneo," No. 1081/2016) awarded by Sapienza University of Rome to GA, and by a Mobility Research Grant (No. 4389/2016) awarded by Sapienza University of Rome to EP.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2017.00223/full#supplementary-material

1. ^ Directed by Leder (2000).

2. ^ Importantly, although classrooms were randomized across the two conditions (i.e., intervention group and control group), the selection of the four classrooms in each school was not random (i.e., each classroom in school X did not have the same probability of participating in the YPA). In detail, participating classrooms were chosen on the basis of the interest in the project shown by the head teachers.

Achenbach, T. M. (2017). Future directions for clinical research, services, and training: evidence-based assessment across informants, cultures, and dimensional hierarchies. J. Clin. Child Adolesc. Psychol. 46, 159–169. doi: 10.1080/15374416.2016.1220315

Alessandri, G., Vecchione, M., Caprara, G. V., and Letzring, T. D. (2012). The ego resiliency scale revised: a cross-cultural study in Italy, Spain, and the United States. Eur. J. Psychol. Assess. 28, 139–146. doi: 10.1027/1015-5759/a000102

Bandalos, D. L., and Leite, W. (2013). “Use of Monte Carlo studies in structural equation modeling research,” in Structural Equation Modeling: A Second Course, 2nd Edn., eds G. R. Hancock and R. O. Mueller (Charlotte, NC: Information Age Publishing), 625–666.

Bandura, A. (1977). Self-efficacy: toward a unifying theory of behavioral change. Psychol. Rev. 84, 191–215. doi: 10.1037/0033-295X.84.2.191

Bishop, J., Geiser, C., and Cole, D. A. (2015). Modeling latent growth with multiple indicators: a comparison of three approaches. Psychol. Methods 20, 43–62. doi: 10.1037/met0000018

Bollen, K. A. (1989). Structural Equations with Latent Variables. New York, NY: Wiley.

Bollen, K. A., and Curran, P. J. (2006). Latent Curve Models: A Structural Equation Perspective. Hoboken, NJ: Wiley.

Brown, T. A. (2015). Confirmatory Factor Analysis for Applied Research. New York, NY: The Guilford Press.

Burnham, K. P., and Anderson, D. R. (2004). Multimodel inference: understanding AIC and BIC in model selection. Sociol. Methods Res. 33, 261–304. doi: 10.1177/0049124104268644

Caprara, G. V., Alessandri, G., and Eisenberg, N. (2012). Prosociality: the contribution of traits, values, and self-efficacy beliefs. J. Pers. Soc. Psychol. 102, 1289–1303. doi: 10.1037/a0025626

Caprara, G. V., Luengo Kanacri, B. P., Gerbino, M., Zuffianò, A., Alessandri, G., Vecchio, G., et al. (2014). Positive effects of promoting prosocial behavior in early adolescents: evidence from a school-based intervention. Int. J. Behav. Dev. 4, 386–396. doi: 10.1177/0165025414531464

Caprara, G. V., Steca, P., Zelli, A., and Capanna, C. (2005). A new scale for measuring adults' prosocialness. Eur. J. Psychol. Assess. 21, 77–89. doi: 10.1027/1015-5759.21.2.77

Cole, D. A., and Preacher, K. J. (2014). Manifest variable path analysis: potentially serious and misleading consequences due to uncorrected measurement error. Psychol. Methods 19, 300–315. doi: 10.1037/a0033805

Cook, T. D., and Campbell, D. T. (1979). Quasi-Experimentation: Design & Analysis Issues for Field Settings. Boston, MA: Houghton Mifflin.

Cronbach, L. J., and Snow, R. E. (1977). Aptitudes and Instructional Methods: A Handbook for Research on Interactions. New York, NY: Irvington.

Curran, P. J., and Muthén, B. O. (1999). The application of latent curve analysis to testing developmental theories in intervention research. Am. J. Commun. Psychol. 27, 567–595. doi: 10.1023/A:1022137429115

Curran, P. J., West, S. G., and Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychol. Methods 1, 16–29. doi: 10.1037/1082-989X.1.1.16

Eisenberg, N., Spinrad, T. L., and Knafo-Noam, A. (2015). “Prosocial development,” in Handbook of Child Psychology and Developmental Science, Vol. 3, 7th Edn., eds M. E. Lamb and R. M. Lerner (Hoboken, NJ: Wiley), 610–656.

Enders, C. K. (2010). Applied Missing Data Analysis. New York, NY: Guilford Press.

Geiser, C., Keller, B. T., and Lockhart, G. (2013). First-versus second-order latent growth curve models: some insights from latent state-trait theory. Struct. Equ. Modeling 20, 479–503. doi: 10.1080/10705511.2013.797832

Greenberg, M. T., Domitrovich, C., and Bumbarger, B. (2001). The prevention of mental disorders in school-aged children: current state of the field. Prevent. Treat. 4:1a. doi: 10.1037/1522-3736.4.1.41a

Grissom, R. J., and Kim, J. J. (2012). Effect Sizes for Research: Univariate and Multivariate Applications, 2nd Edn. New York, NY: Routledge.

Kline, R. B. (2016). Principles and Practice of Structural Equation Modeling, 4th Edn. New York, NY: The Guilford Press.

Leder, M. (Director). (2000). Pay it Forward [Motion Picture]. Burbank, CA: Warner Bros.

Little, T. D. (1997). Mean and covariance structures (MACS) analyses of cross-cultural data: practical and theoretical issues. Multivariate Behav. Res. 32, 53–76. doi: 10.1207/s15327906mbr3201_3

Little, T. D. (2013). Longitudinal Structural Equation Modeling. New York, NY: The Guilford Press.

Little, T. D., Card, N. A., Preacher, K. J., and McConnell, E. (2009). “Modeling longitudinal data from research on adolescence,” in Handbook of Adolescent Psychology, Vol. 2, 3rd Edn., eds R. M. Lerner and L. Steinberg (Hoboken, NJ: Wiley), 15–54.

Little, T. D., Cunningham, W. A., Shahar, G., and Widaman, K. F. (2002). To parcel or not to parcel: exploring the question, weighing the merits. Struct. Equ. Modeling 9, 151–173. doi: 10.1207/S15328007SEM0902_1

Lucas, R. E., and Donnellan, M. B. (2012). Estimating the reliability of single-item life satisfaction measures: results from four national panel studies. Soc. Indic. Res. 105, 323–331. doi: 10.1007/s11205-011-9783-z

Malti, T., Noam, G. G., Beelmann, A., and Sommer, S. (2016). Good Enough? Interventions for child mental health: from adoption to adaptation—from programs to systems. J. Clin. Child Adolesc. Psychol. 45, 707–709. doi: 10.1080/15374416.2016.1157759

McArdle, J. J. (2009). Latent variable modeling of differences and changes with longitudinal data. Annu. Rev. Psychol. 60, 577–605. doi: 10.1146/annurev.psych.60.110707.163612

Mecklin, C. J., and Mundfrom, D. J. (2005). A Monte Carlo comparison of the Type I and Type II error rates of tests of multivariate normality. J. Stat. Comput. Simul. 75, 93–107. doi: 10.1080/0094965042000193233

Meredith, W., and Teresi, J. A. (2006). An essay on measurement and factorial invariance. Med. Care 44, S69–S77. doi: 10.1097/01.mlr.0000245438.73837.89

Meredith, W., and Tisak, J. (1990). Latent curve analysis. Psychometrika 55, 107–122. doi: 10.1007/BF02294746

Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychol. Bull. 105, 156–166. doi: 10.1037/0033-2909.105.1.156

Millsap, R. E. (2011). Statistical Approaches to Measurement Invariance. New York, NY: Routledge.

Muthén, B. O., and Curran, P. J. (1997). General longitudinal modeling of individual differences in experimental designs: a latent variable framework for analysis and power estimation. Psychol. Methods 2, 371–402. doi: 10.1037/1082-989X.2.4.371

Muthén, L. K., and Muthén, B. O. (1998–2012). Mplus User's Guide, 7th Edn. Los Angeles, CA: Muthén & Muthén.

Nimon, K. F. (2012). Statistical assumptions of substantive analyses across the general linear model: a mini-review. Front. Psychol. 3:322. doi: 10.3389/fpsyg.2012.00322

Roberts, M. C., and Ilardi, S. S. (2003). Handbook of Research Methods in Clinical Psychology. Oxford: Blackwell Publishing.

Schmider, E., Ziegler, M., Danay, E., Beyer, L., and Bühner, M. (2010). Is it really robust? Reinvestigating the robustness of ANOVA against violations of the normal distribution assumption. Methodology 6, 147–151. doi: 10.1027/1614-2241/a000016

Steyer, R., Eid, M., and Schwenkmezger, P. (1997). Modeling true intraindividual change: true change as a latent variable. Methods Psychol. Res. Online 2, 21–33.

Tabachnick, B. G., and Fidell, L. S. (2013). Using Multivariate Statistics, 6th Edn. New Jersey, NJ: Pearson.

Villasenor Alva, J. A., and Estrada, E. G. (2009). A generalization of Shapiro–Wilk's test for multivariate normality. Commun. Stat. Theor. Methods 38, 1870–1883. doi: 10.1080/03610920802474465

Wilcox, R. R. (1998). The goals and strategies of robust methods. Br. J. Math. Stat. Psychol. 51, 1–39. doi: 10.1111/j.2044-8317.1998.tb00659.x

Wolf, E. J., Harrington, K. M., Clark, S. L., and Miller, M. W. (2013). Sample size requirements for structural equation models: an evaluation of power, bias, and solution propriety. Educ. Psychol. Meas. 73, 913–934. doi: 10.1177/0013164413495237

Zuffianò, A., Alessandri, G., and Roche-Olivar, R. (2012). Valutazione di un programma di sensibilizzazione prosociale: young prosocial animation [evaluation of a prosocial sensitization program: the young prosocial animation]. Psicol. Educ. 2, 203–219.

Keywords: experimental design, pretest-posttest, intervention, multiple group latent curve model, second order latent curve model, structural equation modeling, latent variables

Citation: Alessandri G, Zuffianò A and Perinelli E (2017) Evaluating Intervention Programs with a Pretest-Posttest Design: A Structural Equation Modeling Approach. Front. Psychol. 8:223. doi: 10.3389/fpsyg.2017.00223

Received: 21 November 2016; Accepted: 06 February 2017; Published: 02 March 2017.

Copyright © 2017 Alessandri, Zuffianò and Perinelli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Guido Alessandri, [email protected]

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.


7.3 Quasi-Experimental Research

Learning Objectives

  • Explain what quasi-experimental research is and distinguish it clearly from both experimental and correlational research.
  • Describe three different types of quasi-experimental research designs (nonequivalent groups, pretest-posttest, and interrupted time series) and identify examples of each one.

The prefix quasi means “resembling.” Thus quasi-experimental research is research that resembles experimental research but is not true experimental research. Although the independent variable is manipulated, participants are not randomly assigned to conditions or orders of conditions (Cook & Campbell, 1979). Because the independent variable is manipulated before the dependent variable is measured, quasi-experimental research eliminates the directionality problem. But because participants are not randomly assigned—making it likely that there are other differences between conditions—quasi-experimental research does not eliminate the problem of confounding variables. In terms of internal validity, therefore, quasi-experiments are generally somewhere between correlational studies and true experiments.

Quasi-experiments are most likely to be conducted in field settings in which random assignment is difficult or impossible. They are often conducted to evaluate the effectiveness of a treatment—perhaps a type of psychotherapy or an educational intervention. There are many different kinds of quasi-experiments, but we will discuss just a few of the most common ones here.

Nonequivalent Groups Design

Recall that when participants in a between-subjects experiment are randomly assigned to conditions, the resulting groups are likely to be quite similar. In fact, researchers consider them to be equivalent. When participants are not randomly assigned to conditions, however, the resulting groups are likely to be dissimilar in some ways. For this reason, researchers consider them to be nonequivalent. A nonequivalent groups design, then, is a between-subjects design in which participants have not been randomly assigned to conditions.

Imagine, for example, a researcher who wants to evaluate a new method of teaching fractions to third graders. One way would be to conduct a study with a treatment group consisting of one class of third-grade students and a control group consisting of another class of third-grade students. This would be a nonequivalent groups design because the students are not randomly assigned to classes by the researcher, which means there could be important differences between them. For example, the parents of higher achieving or more motivated students might have been more likely to request that their children be assigned to Ms. Williams’s class. Or the principal might have assigned the “troublemakers” to Mr. Jones’s class because he is a stronger disciplinarian. Of course, the teachers’ styles, and even the classroom environments, might be very different and might cause different levels of achievement or motivation among the students. If at the end of the study there was a difference in the two classes’ knowledge of fractions, it might have been caused by the difference between the teaching methods—but it might have been caused by any of these confounding variables.

Of course, researchers using a nonequivalent groups design can take steps to ensure that their groups are as similar as possible. In the present example, the researcher could try to select two classes at the same school, where the students in the two classes have similar scores on a standardized math test and the teachers are the same sex, are close in age, and have similar teaching styles. Taking such steps would increase the internal validity of the study because it would eliminate some of the most important confounding variables. But without true random assignment of the students to conditions, there remains the possibility of other important confounding variables that the researcher was not able to control.

Pretest-Posttest Design

In a pretest-posttest design, the dependent variable is measured once before the treatment is implemented and once after it is implemented. Imagine, for example, a researcher who is interested in the effectiveness of an antidrug education program on elementary school students’ attitudes toward illegal drugs. The researcher could measure the attitudes of students at a particular elementary school during one week, implement the antidrug program during the next week, and finally, measure their attitudes again the following week. The pretest-posttest design is much like a within-subjects experiment in which each participant is tested first under the control condition and then under the treatment condition. It is unlike a within-subjects experiment, however, in that the order of conditions is not counterbalanced because it typically is not possible for a participant to be tested in the treatment condition first and then in an “untreated” control condition.
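As a concrete illustration of how data from a one-group pretest-posttest design are typically analyzed, the minimal Python sketch below compares simulated, purely hypothetical pretest and posttest attitude scores with a paired-samples t test. As the following paragraphs explain, a significant pre-post difference by itself cannot establish that the program caused the change.

```python
# Minimal sketch: analyzing one-group pretest-posttest data with a
# paired-samples t test (simulated, hypothetical attitude scores).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_students = 60
pretest = rng.normal(50, 10, size=n_students)            # attitudes before the program
posttest = pretest + rng.normal(3, 8, size=n_students)   # the same students, one week later

t_stat, p_value = stats.ttest_rel(posttest, pretest)     # paired (within-subjects) comparison
print(f"mean change = {np.mean(posttest - pretest):.2f}, t = {t_stat:.2f}, p = {p_value:.4f}")
```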

If the average posttest score is better than the average pretest score, then it makes sense to conclude that the treatment might be responsible for the improvement. Unfortunately, one often cannot conclude this with a high degree of certainty because there may be other explanations for why the posttest scores are better. One category of alternative explanations goes under the name of history. Other things might have happened between the pretest and the posttest. Perhaps an antidrug program aired on television and many of the students watched it, or perhaps a celebrity died of a drug overdose and many of the students heard about it. Another category of alternative explanations goes under the name of maturation. Participants might have changed between the pretest and the posttest in ways that they were going to anyway because they are growing and learning. If it were a yearlong program, participants might become less impulsive or better reasoners and this might be responsible for the change.

Another alternative explanation for a change in the dependent variable in a pretest-posttest design is regression to the mean. This refers to the statistical fact that an individual who scores extremely on a variable on one occasion will tend to score less extremely on the next occasion. For example, a bowler with a long-term average of 150 who suddenly bowls a 220 will almost certainly score lower in the next game. Her score will “regress” toward her mean score of 150. Regression to the mean can be a problem when participants are selected for further study because of their extreme scores. Imagine, for example, that only students who scored especially low on a test of fractions are given a special training program and then retested. Regression to the mean all but guarantees that their scores will be higher even if the training program has no effect. A closely related concept—and an extremely important one in psychological research—is spontaneous remission. This is the tendency for many medical and psychological problems to improve over time without any form of treatment. The common cold is a good example. If one were to measure symptom severity in 100 common cold sufferers today, give them a bowl of chicken soup every day, and then measure their symptom severity again in a week, they would probably be much improved. This does not mean that the chicken soup was responsible for the improvement, however, because they would have been much improved without any treatment at all. The same is true of many psychological problems. A group of severely depressed people today is likely to be less depressed on average in 6 months. In reviewing the results of several studies of treatments for depression, researchers Michael Posternak and Ivan Miller found that participants in waitlist control conditions improved an average of 10 to 15% before they received any treatment at all (Posternak & Miller, 2001). Thus one must generally be very cautious about inferring causality from pretest-posttest designs.
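A small simulation makes the regression-to-the-mean problem easy to see. In the hypothetical sketch below, students are retested with no intervention at all; those selected because they scored lowest the first time nevertheless improve on average, simply because observed scores combine a stable ability component with random error. All values are invented for illustration.

```python
# Regression to the mean: people selected for extreme (low) scores tend to
# score less extremely on retest, even with no treatment.
import numpy as np

rng = np.random.default_rng(0)

n = 10_000
true_ability = rng.normal(100, 15, size=n)         # stable component
test1 = true_ability + rng.normal(0, 10, size=n)   # observed score = ability + error
test2 = true_ability + rng.normal(0, 10, size=n)   # independent error on retest

low_scorers = test1 < np.percentile(test1, 10)     # "selected for a remedial program"
print(f"selected group, test 1 mean: {test1[low_scorers].mean():.1f}")
print(f"selected group, test 2 mean: {test2[low_scorers].mean():.1f}")  # noticeably higher
```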

Does Psychotherapy Work?

Early studies on the effectiveness of psychotherapy tended to use pretest-posttest designs. In a classic 1952 article, researcher Hans Eysenck summarized the results of 24 such studies showing that about two thirds of patients improved between the pretest and the posttest (Eysenck, 1952). But Eysenck also compared these results with archival data from state hospital and insurance company records showing that similar patients recovered at about the same rate without receiving psychotherapy. This suggested to Eysenck that the improvement that patients showed in the pretest-posttest studies might be no more than spontaneous remission. Note that Eysenck did not conclude that psychotherapy was ineffective. He merely concluded that there was no evidence that it was, and he wrote of “the necessity of properly planned and executed experimental studies into this important field” (p. 323). You can read the entire article here:

http://psychclassics.yorku.ca/Eysenck/psychotherapy.htm

Fortunately, many other researchers took up Eysenck’s challenge, and by 1980 hundreds of experiments had been conducted in which participants were randomly assigned to treatment and control conditions, and the results were summarized in a classic book by Mary Lee Smith, Gene Glass, and Thomas Miller (Smith, Glass, & Miller, 1980). They found that overall psychotherapy was quite effective, with about 80% of treatment participants improving more than the average control participant. Subsequent research has focused more on the conditions under which different types of psychotherapy are more or less effective.

Hans Eysenck

In a classic 1952 article, researcher Hans Eysenck pointed out the shortcomings of the simple pretest-posttest design for evaluating the effectiveness of psychotherapy.

Wikimedia Commons – CC BY-SA 3.0.

Interrupted Time Series Design

A variant of the pretest-posttest design is the interrupted time-series design. A time series is a set of measurements taken at intervals over a period of time. For example, a manufacturing company might measure its workers’ productivity each week for a year. In an interrupted time-series design, a time series like this is “interrupted” by a treatment. In one classic example, the treatment was the reduction of the work shifts in a factory from 10 hours to 8 hours (Cook & Campbell, 1979). Because productivity increased rather quickly after the shortening of the work shifts, and because it remained elevated for many months afterward, the researcher concluded that the shortening of the shifts caused the increase in productivity. Notice that the interrupted time-series design is like a pretest-posttest design in that it includes measurements of the dependent variable both before and after the treatment. It is unlike the pretest-posttest design, however, in that it includes multiple pretest and posttest measurements.

Figure 7.5 “A Hypothetical Interrupted Time-Series Design” shows data from a hypothetical interrupted time-series study. The dependent variable is the number of student absences per week in a research methods course. The treatment is that the instructor begins publicly taking attendance each day so that students know that the instructor is aware of who is present and who is absent. The top panel of Figure 7.5 “A Hypothetical Interrupted Time-Series Design” shows how the data might look if this treatment worked. There is a consistently high number of absences before the treatment, and there is an immediate and sustained drop in absences after the treatment. The bottom panel of Figure 7.5 “A Hypothetical Interrupted Time-Series Design” shows how the data might look if this treatment did not work. On average, the number of absences after the treatment is about the same as the number before. This figure also illustrates an advantage of the interrupted time-series design over a simpler pretest-posttest design. If there had been only one measurement of absences before the treatment at Week 7 and one afterward at Week 8, then it would have looked as though the treatment were responsible for the reduction. The multiple measurements both before and after the treatment suggest that the reduction between Weeks 7 and 8 is nothing more than normal week-to-week variation.

Figure 7.5 A Hypothetical Interrupted Time-Series Design

The top panel shows data that suggest that the treatment caused a reduction in absences. The bottom panel shows data that suggest that it did not.
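The logic of Figure 7.5 can be reproduced with a short simulation. In the hypothetical sketch below, weekly absences fluctuate randomly around a stable level; in the "treatment worked" series the level drops after the interruption at Week 8, while in the "no effect" series it does not. With many pre- and post-treatment weeks, comparing average levels before and after the interruption separates a real, sustained change from ordinary week-to-week variation. All numbers are invented.

```python
# Hypothetical interrupted time-series data: weekly absences before and after
# the instructor starts taking attendance at week 8.
import numpy as np

rng = np.random.default_rng(7)
weeks = np.arange(1, 15)
interruption = 8

effect_series = np.where(weeks < interruption, 8, 3) + rng.normal(0, 1.5, size=weeks.size)
no_effect_series = 8 + rng.normal(0, 1.5, size=weeks.size)   # same level throughout

for name, series in [("treatment worked", effect_series), ("no effect", no_effect_series)]:
    pre = series[weeks < interruption].mean()
    post = series[weeks >= interruption].mean()
    print(f"{name:16s}  pre mean = {pre:4.1f}   post mean = {post:4.1f}")
```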

Combination Designs

A type of quasi-experimental design that is generally better than either the nonequivalent groups design or the pretest-posttest design is one that combines elements of both. There is a treatment group that is given a pretest, receives a treatment, and then is given a posttest. But at the same time there is a control group that is given a pretest, does not receive the treatment, and then is given a posttest. The question, then, is not simply whether participants who receive the treatment improve but whether they improve more than participants who do not receive the treatment.

Imagine, for example, that students in one school are given a pretest on their attitudes toward drugs, then are exposed to an antidrug program, and finally are given a posttest. Students in a similar school are given the pretest, not exposed to an antidrug program, and finally are given a posttest. Again, if students in the treatment condition become more negative toward drugs, this could be an effect of the treatment, but it could also be a matter of history or maturation. If it really is an effect of the treatment, then students in the treatment condition should become more negative than students in the control condition. But if it is a matter of history (e.g., news of a celebrity drug overdose) or maturation (e.g., improved reasoning), then students in the two conditions would be likely to show similar amounts of change. This type of design does not completely eliminate the possibility of confounding variables, however. Something could occur at one of the schools but not the other (e.g., a student drug overdose), so students at the first school would be affected by it while students at the other school would not.
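One common way to analyze this combination design is to ask whether the treatment group changed more than the control group, for example by comparing pretest-to-posttest change scores across groups (or, equivalently, testing the group-by-time interaction). The Python sketch below is a hypothetical illustration with simulated attitude data, not a prescription for any particular study.

```python
# Hypothetical combination design (pretest-posttest with a control group):
# compare pre-to-post change scores between the treatment and control schools.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 80  # students per school

treat_pre = rng.normal(50, 10, size=n)
treat_post = treat_pre + rng.normal(6, 8, size=n)   # program plus history/maturation
ctrl_pre = rng.normal(50, 10, size=n)
ctrl_post = ctrl_pre + rng.normal(2, 8, size=n)     # history/maturation only

treat_change = treat_post - treat_pre
ctrl_change = ctrl_post - ctrl_pre
t_stat, p_value = stats.ttest_ind(treat_change, ctrl_change)
print(f"treatment change = {treat_change.mean():.1f}, control change = {ctrl_change.mean():.1f}")
print(f"difference in change: t = {t_stat:.2f}, p = {p_value:.4f}")
```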

Finally, if participants in this kind of design are randomly assigned to conditions, it becomes a true experiment rather than a quasi-experiment. In fact, it is the kind of experiment that Eysenck called for—and that has now been conducted many times—to demonstrate the effectiveness of psychotherapy.

Key Takeaways

  • Quasi-experimental research involves the manipulation of an independent variable without the random assignment of participants to conditions or orders of conditions. Among the important types are nonequivalent groups designs, pretest-posttest designs, and interrupted time-series designs.
  • Quasi-experimental research eliminates the directionality problem because it involves the manipulation of the independent variable. It does not eliminate the problem of confounding variables, however, because it does not involve random assignment to conditions. For these reasons, quasi-experimental research is generally higher in internal validity than correlational studies but lower than true experiments.
  • Practice: Imagine that two college professors decide to test the effect of giving daily quizzes on student performance in a statistics course. They decide that Professor A will give quizzes but Professor B will not. They will then compare the performance of students in their two sections on a common final exam. List five other variables that might differ between the two sections that could affect the results.

Discussion: Imagine that a group of obese children is recruited for a study in which their weight is measured, then they participate for 3 months in a program that encourages them to be more active, and finally their weight is measured again. Explain how each of the following might affect the results:

  • regression to the mean
  • spontaneous remission

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues in field settings. Boston, MA: Houghton Mifflin.

Eysenck, H. J. (1952). The effects of psychotherapy: An evaluation. Journal of Consulting Psychology, 16, 319–324.

Posternak, M. A., & Miller, I. (2001). Untreated short-term course of major depression: A meta-analysis of outcomes from studies using wait-list control groups. Journal of Affective Disorders, 66, 139–146.

Smith, M. L., Glass, G. V., & Miller, T. I. (1980). The benefits of psychotherapy. Baltimore, MD: Johns Hopkins University Press.

Research Methods in Psychology Copyright © 2016 by University of Minnesota is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


