
Evidence-Based Practice (EBP)


Selection Criteria


Inclusion and exclusion criteria are determined after formulating the research question but usually before the search is conducted (although preliminary scoping searches may need to be undertaken to determine appropriate criteria). It may be helpful to determine the inclusion criteria and exclusion criteria for each PICO component.

Be aware that you may introduce bias into the final review if these are not used thoughtfully.

Because inclusion and exclusion are two sides of the same coin, a single database filter can be described as either including or excluding, depending on your perspective. For instance, if articles must have been published within the last 3 years, that is an inclusion criterion; if articles cannot be more than 3 years old, that is an exclusion criterion.
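To make the "two sides of the same coin" point concrete, here is a minimal Python sketch. It assumes publication dates are compared at whole-year granularity, and real database limiters may count differently. The function names are hypothetical. The sketch only shows that the inclusion phrasing and the exclusion phrasing of the same date filter are logical complements.

```python
def meets_inclusion(pub_year: int, current_year: int, window: int = 3) -> bool:
    """Inclusion phrasing: the article must be published within the last `window` years."""
    return current_year - pub_year <= window


def meets_exclusion(pub_year: int, current_year: int, window: int = 3) -> bool:
    """Exclusion phrasing: the article is disqualified if it is more than `window` years old."""
    return current_year - pub_year > window


# Every publication year satisfies exactly one of the two phrasings.
for year in range(2015, 2025):
    assert meets_inclusion(year, 2024) != meets_exclusion(year, 2024)

print(meets_inclusion(2022, 2024), meets_exclusion(2019, 2024))  # True True
```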

The most straightforward way to include or exclude results is to use database limiters (filters), usually found on the left side of the search results page.

Inclusion Criteria

Inclusion criteria are the elements of an article that must be present in order for it to be eligible for inclusion in a literature review. Some examples are:

  • Included studies must have compared certain treatments
  • Included studies must be a certain type (e.g., only Randomized Controlled Trials)
  • Included studies must be located in a certain geographic area
  • Included studies must have been published in the last 5 years

Exclusion Criteria

Exclusion criteria are the elements of an article that disqualify the study from inclusion in a literature review. Some examples are:

  • Study used an observational design
  • Study used a qualitative methodology
  • Study was published more than 5 years ago
  • Study was published in a language other than English
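The exclusion examples above can be written as explicit, checkable rules. This is a minimal Python sketch under stated assumptions. The `Study` record, its field names, the design labels treated as "observational", and the screening year are all hypothetical. Keeping a human-readable reason next to each rule makes it easy to report why a study was excluded.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical screening record; field names are illustrative only."""
    title: str
    design: str    # e.g. "rct", "cohort", "case-control", "qualitative"
    year: int
    language: str


CURRENT_YEAR = 2024  # assumption: the year the screening takes place

# Each rule pairs a reportable reason with a test; mirrors the exclusion list above.
EXCLUSION_RULES = [
    ("observational design", lambda s: s.design in {"cohort", "case-control", "cross-sectional"}),
    ("qualitative methodology", lambda s: s.design == "qualitative"),
    ("published more than 5 years ago", lambda s: CURRENT_YEAR - s.year > 5),
    ("not published in English", lambda s: s.language.lower() != "english"),
]


def exclusion_reasons(study: Study) -> list[str]:
    """Return every exclusion criterion the study meets; an empty list means it passes."""
    return [reason for reason, applies in EXCLUSION_RULES if applies(study)]


trial = Study("Drug A versus placebo", "rct", 2022, "English")
interviews = Study("Patients' views on Drug A", "qualitative", 2016, "German")
print(exclusion_reasons(trial))       # []
print(exclusion_reasons(interviews))  # ['qualitative methodology', 'published more than 5 years ago', 'not published in English']
```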
  • Last Updated: May 16, 2024 2:44 PM
  • URL: https://libguides.umsl.edu/ebp


Systematic Reviews


Inclusion/Exclusion Criteria

Inclusion Criteria

Identify the criteria that will be used to determine which research studies will be included. The inclusion and exclusion criteria must be decided before you start the review. Inclusion criteria cover everything a study must have to be included. Exclusion criteria are the factors that would make a study ineligible to be included. Criteria that should be considered include:

Types of studies: It is important to select articles with a study design appropriate to the research question. Dates for the studies and a timeline of the problem or issue being examined may also need to be identified.

Types of participants: Identify the characteristics of the target population. It is important to define the target population's age, sex/gender, diagnosis, and any other relevant factors.

Types of interventions: Describe the intervention being investigated. Consider whether to include interventions carried out globally or only in the United States. Eligibility criteria for interventions should cover details such as the dose, delivery method, and duration of the investigated intervention. Interventions that are to be excluded may also need to be described here.

Types of outcome measures: Outcome measures usually refer to measurable outcomes or ‘clinical changes in health’. For example, these could include body structures and functions (such as pain and fatigue), activities (such as functional abilities), and participation or quality-of-life questionnaires.

Read Chapter 3 of the Cochrane Handbook

Exclusion Criteria

A balance of specific inclusion and exclusion criteria is paramount. For some systematic reviews, there may already be a large body of existing literature, and the search strategy may retrieve thousands of results that must be screened. Having explicit exclusion criteria from the beginning gives those conducting the screening an efficient workflow. The final review should include a section dedicated to 'Characteristics of excluded studies'. It is important to summarize why studies were excluded, especially if a study would appear eligible to a reader.

For example, a team conducting a systematic review of intervention options for treating opioid addiction may want to exclude studies that also involve alcohol addiction, so that the review isolates treatment interventions for opioid addiction alone.
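The paragraphs above describe reporting why studies were excluded. If each screening decision is logged together with its reason, the 'Characteristics of excluded studies' rows and the per-reason counts can be generated rather than compiled by hand. The sketch below assumes a simple log of (study label, reason) pairs. The study labels and reasons are hypothetical and loosely follow the opioid addiction example.

```python
from collections import Counter
from typing import Optional


def summarize_exclusions(log: list[tuple[str, Optional[str]]]) -> tuple[list[str], Counter]:
    """Build 'Characteristics of excluded studies' rows and count each exclusion reason.

    Each log entry is (study label, reason); a reason of None means the study was included.
    """
    excluded = [(label, reason) for label, reason in log if reason is not None]
    rows = [f"{label}: {reason}" for label, reason in excluded]
    counts = Counter(reason for _, reason in excluded)
    return rows, counts


# Hypothetical screening log for the opioid addiction example.
screening_log = [
    ("Study A", None),
    ("Study B", "participants also had alcohol addiction"),
    ("Study C", "not an intervention study"),
    ("Study D", "participants also had alcohol addiction"),
]

rows, counts = summarize_exclusions(screening_log)
for row in rows:
    print(row)
print(counts.most_common())
```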

  • Last Updated: Jul 29, 2024 2:41 PM
  • URL: https://libguides.sph.uth.tmc.edu/SystematicReviews

How to Conduct a Literature Review (Health Sciences and Beyond)


Selection Criteria



You may want to think about criteria that will be used to select articles for your literature review based on your research question. These are commonly known as inclusion criteria and exclusion criteria. Be aware that you may introduce bias into the final review if these are not used thoughtfully.

Inclusion criteria are the elements of an article that must be present in order for it to be eligible for inclusion in a literature review. Some examples are:

  • Included studies must have compared certain treatments
  • Included studies must be experimental
  • Included studies must have been published in the last 5 years

Exclusion criteria are the elements of an article that disqualify the study from inclusion in a literature review. Some examples are:

  • Study used an observational design
  • Study used a qualitative methodology
  • Study was published more than 5 years ago
  • Study was published in a language other than English
  • Last Updated: Mar 15, 2024 12:22 PM
  • URL: https://guides.library.vcu.edu/health-sciences-lit-review


Systematic Reviews for Health Sciences and Medicine


Inclusion and Exclusion Criteria

Inclusion and exclusion criteria set the boundaries for the systematic review. They are determined after setting the research question and usually before the search is conducted; however, scoping searches may need to be undertaken first to determine appropriate criteria. Many different factors can be used as inclusion or exclusion criteria. Information about the inclusion and exclusion criteria is usually recorded as a paragraph or table within the methods section of the systematic review. It may also be necessary to give the definitions, and the source of each definition, used for particular concepts in the research question (e.g. adolescence, depression).


Other inclusion/exclusion criteria can include the sample size, the method of sampling, or the availability of a relevant comparison group in the study. Where a single study is reported across multiple papers, the findings from the papers may be merged, or only the latest data may be included.
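The "latest data only" option for a study reported across several papers can be sketched as follows. The sketch assumes each paper has already been linked to its parent study by some identifier, for instance a trial registration number. The class and field names are hypothetical. Merging findings across papers, the other option mentioned, needs study-specific judgement and is not shown.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """One published paper; study_id is a hypothetical key linking papers from the same study."""
    study_id: str
    citation: str
    year: int


def latest_report_per_study(reports: list[Report]) -> dict[str, Report]:
    """Keep only the most recently published report for each study.

    On a tie in year, the report seen first is kept.
    """
    latest: dict[str, Report] = {}
    for report in reports:
        current = latest.get(report.study_id)
        if current is None or report.year > current.year:
            latest[report.study_id] = report
    return latest


reports = [
    Report("TRIAL-1", "Baseline paper", 2018),
    Report("TRIAL-1", "Follow-up paper", 2021),
    Report("TRIAL-2", "Main results", 2020),
]
for study_id, report in latest_report_per_study(reports).items():
    print(study_id, report.citation)
```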

  • Last Updated: Aug 27, 2024 2:17 PM
  • URL: https://unimelb.libguides.com/sysrev

University of Texas


Literature Reviews

Determine inclusion and exclusion criteria.


Once you have a clearly defined research question, make sure your database searches return precisely the right results by making decisions about these items:

  • Would the most recent five years be appropriate?
  • Is your research from a more historical perspective?
  • Where has this type of research taken place?
  • Will you confine your results to the United States?
  • To English-speaking countries?
  • Will you translate works if needed?
  • Is there a particular methodology, or population that you are focused on?
  • Are there adjacent fields in which this type of research has been conducted that you would like to include?
  • Is there a controversy or debate in your research field that you want to highlight?
  • Are you creating a historical overview? Is this background reading for your research?
  • Is there new technology that can shed light on an old problem or an old technology that can be used in a new way?
  • Last Updated: Aug 26, 2024 5:59 AM
  • URL: https://guides.lib.utexas.edu/literaturereviews

Creative Commons License


Systematic Reviews



Exercise for Developing Inclusion/Exclusion Criteria

Before developing your inclusion/exclusion criteria, please read Chapter 3 of the Cochrane Handbook, which reviews considerations for developing these criteria.

You will need a selection of relevant articles (a maximum of 5). Review the articles and, for each study, make a bulleted list of the reasons that study would be either included in or excluded from the review. This exercise can help jump-start your predefined inclusion and exclusion criteria. It should be done before you start the review.

Types of Study Design

There are different study types used for the evidence base in systematic reviews. Below are some definitions of the different study types that may be used. 

  • Randomized controlled trials (RCTs): A group of patients is randomized into an experimental group and a control group to test the efficacy of a treatment or intervention.
  • Cohort study: Involves identifying two groups (cohorts) of patients, one that received the exposure of interest and one that did not, and following these cohorts forward for the outcome of interest.
  • Case-control study: Involves identifying patients who have the outcome of interest (cases) and control patients without that outcome, and looking back to see whether they had the exposure of interest. Like cohort studies, case-control studies are observational.
  • Cross-sectional study: Typically involves surveying a randomly selected group to find out their opinions or other facts about them. These studies can answer questions such as how common a particular disease is, but they cannot establish the cause of a disease or how to treat it.
  • Qualitative study: Collects information from patients with diseases and those close to them. Requires specialized tools for analysis and interpretation. These studies typically aim to capture a person's experience.
  • Meta-analysis: A statistical analysis that can either be a study in its own right or a component of another study type. It uses quantitative methods to summarize the results of scientific studies.
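One common way a meta-analysis summarizes results is inverse-variance weighting under a fixed-effect model. The sketch below assumes each study supplies an effect estimate on an additive scale (for example a mean difference or a log risk ratio) together with its standard error. It is illustrative only. Many reviews use random-effects models or dedicated software instead, and the input values are hypothetical.

```python
import math


def fixed_effect_pool(estimates: list[float], std_errors: list[float]) -> tuple[float, float]:
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1 / se ** 2 for se in std_errors]  # more precise studies get more weight
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled_se


# Two hypothetical studies: the first is more precise (smaller standard error).
pooled, se = fixed_effect_pool([0.2, 0.4], [0.1, 0.2])
print(round(pooled, 3), round(se, 3))  # 0.24 0.089
```

With weights of 100 and 25, the pooled value (0.24) sits much closer to the more precise study's estimate than a simple average (0.3) would.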

For further reading

Check out the Institute for Quality and Efficiency in Health Care (IQWiG) article "What types of studies are there?", offered through the National Center for Biotechnology Information, U.S. National Library of Medicine.

  • Last Updated: Aug 13, 2024 1:11 PM
  • URL: https://libguides.library.tmc.edu/Systematic_Reviews

Library Guides

Systematic Reviews


Inclusion and exclusion criteria



Inclusion and exclusion criteria are a list of predefined characteristics to which literature must adhere to be included in a study. They are vital to the decision-making process about what to review when undertaking a systematic review, and they will also help with systematic literature reviews.

You should be able to establish your inclusion/exclusion criteria during the process of defining your question. These criteria clearly demonstrate the scope of the study and provide justification for the exclusion of any information that does not meet these characteristics.

Example criteria

  • Intervention, treatment, process or experience
  • Reported outcomes
  • Research methodology
  • Participants
  • Age of study
  • Sample size
  • Place of study
  • Type of publication

For example:

  • Stage 4 lung disease patients
  • Whether the study's reported outcomes are relevant to your study and have been presented objectively
  • Randomised controlled trial
  • Age, sex, ethnicity, etc.
  • Last 5 years
  • Over 100 participants
  • UK based
  • Primary research, peer-reviewed
  • Community-based care
  • English

Precision vs Sensitivity

You should aim to be as extensive as possible when conducting searches for systematic reviews. However, it may be necessary to strike a balance between the sensitivity and precision of your search.

  • Sensitivity: the number of relevant results identified divided by the total number of relevant results in existence
  • Precision: the number of relevant results identified divided by the total number of results identified
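Both definitions above are simple ratios, as the short Python sketch below shows with made-up counts. The total number of relevant records in existence is normally unknown, so in practice sensitivity can only be estimated, for example by checking whether the search retrieves a set of articles already known to be relevant.

```python
def sensitivity(relevant_found: int, relevant_in_existence: int) -> float:
    """Share of all relevant records that the search retrieved."""
    return relevant_found / relevant_in_existence


def precision(relevant_found: int, total_found: int) -> float:
    """Share of retrieved records that are relevant."""
    return relevant_found / total_found


# Hypothetical search: 2,000 records retrieved, 45 of them relevant,
# out of 50 relevant records assumed to exist.
print(sensitivity(45, 50))   # 0.9
print(precision(45, 2000))   # 0.0225
```

The example shows the trade-off described here: a search can find 90% of the relevant literature while fewer than 3 in 100 retrieved records are relevant.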

Increasing the comprehensiveness of a search will reduce its precision and will retrieve more non-relevant results. However, 

... at a conservatively-estimated reading rate of two abstracts per minute, the results of a database search can be ‘scanread’ at the rate of 120 per hour (or approximately 1000 over an 8-hour period), so the high yield and low precision associated with systematic review searching is not as daunting as it might at first appear in comparison with the total time to be invested in the review. (Cochrane Handbook for Systematic Reviews of Interventions, 2008, Section 6.4.4)
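The arithmetic in the quotation can be checked with a short sketch. The rate of two abstracts per minute comes from the quote. The 3,000-record search is a hypothetical figure used for illustration.

```python
def screening_hours(n_records: int, abstracts_per_minute: float = 2.0) -> float:
    """Hours needed to scan-read n_records at a constant reading rate."""
    return n_records / (abstracts_per_minute * 60)


per_hour = 2.0 * 60     # 120 abstracts per hour, as in the quote
per_day = per_hour * 8  # 960 in an 8-hour day, which the quote rounds to about 1000
print(per_hour, per_day)        # 120.0 960.0
print(screening_hours(3000))    # 25.0 hours for a hypothetical 3,000-record search
```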

  • Last Updated: Jan 23, 2024 10:52 AM
  • URL: https://plymouth.libguides.com/systematicreviews


Cochrane Training

Chapter 3: Defining the criteria for including studies and how they will be grouped for the synthesis

Joanne E McKenzie, Sue E Brennan, Rebecca E Ryan, Hilary J Thomson, Renea V Johnston, James Thomas

Key Points:

  • The scope of a review is defined by the types of population (participants), types of interventions (and comparisons), and the types of outcomes that are of interest. The acronym PICO (population, interventions, comparators and outcomes) helps to serve as a reminder of these.
  • The population, intervention and comparison components of the question, with the additional specification of types of study that will be included, form the basis of the pre-specified eligibility criteria for the review. It is rare to use outcomes as eligibility criteria: studies should be included irrespective of whether they report outcome data, but may legitimately be excluded if they do not measure outcomes of interest, or if they explicitly aim to prevent a particular outcome.
  • Cochrane Reviews should include all outcomes that are likely to be meaningful and not include trivial outcomes. Critical and important outcomes should be limited in number and include adverse as well as beneficial outcomes.
  • Review authors should plan at the protocol stage how the different populations, interventions, outcomes and study designs within the scope of the review will be grouped for analysis.

Cite this chapter as: McKenzie JE, Brennan SE, Ryan RE, Thomson HJ, Johnston RV, Thomas J. Chapter 3: Defining the criteria for including studies and how they will be grouped for the synthesis [last updated August 2023]. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.5. Cochrane, 2024. Available from www.training.cochrane.org/handbook.

3.1 Introduction

One of the features that distinguishes a systematic review from a narrative review is that systematic review authors should pre-specify criteria for including and excluding studies in the review (eligibility criteria, see MECIR Box 3.2.a).

When developing the protocol, one of the first steps is to determine the elements of the review question (including the population, intervention(s), comparator(s) and outcomes, or PICO elements) and how the intervention, in the specified population, produces the expected outcomes (see Chapter 2, Section 2.5.1 and Chapter 17, Section 17.2.1). Eligibility criteria are based on the PICO elements of the review question plus a specification of the types of studies that have addressed these questions. The population, interventions and comparators in the review question usually translate directly into eligibility criteria for the review, though this is not always a straightforward process and requires a thoughtful approach, as this chapter shows. Outcomes usually are not part of the criteria for including studies, and a Cochrane Review would typically seek all sufficiently rigorous studies (most commonly randomized trials) of a particular comparison of interventions in a particular population of participants, irrespective of the outcomes measured or reported. It should be noted that some reviews do legitimately restrict eligibility to specific outcomes. For example, the same intervention may be studied in the same population for different purposes; or a review may specifically address the adverse effects of an intervention used for several conditions (see Chapter 19).
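Eligibility usually rests on population, intervention, comparator and study design, and usually not on outcomes. A simplified Python sketch can illustrate this. Exact string matching stands in for what is really a reviewer's judgement, and every class, field and value here is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewCriteria:
    """Review-level eligibility built from P, I, C and study design (no outcomes)."""
    populations: set[str]
    interventions: set[str]
    comparators: set[str]
    designs: set[str]


@dataclass
class CandidateStudy:
    population: str
    intervention: str
    comparator: str
    design: str
    outcomes_reported: list[str] = field(default_factory=list)


def is_eligible(study: CandidateStudy, criteria: ReviewCriteria) -> bool:
    # outcomes_reported is deliberately ignored: a study is not excluded just
    # because it does not report outcome data.
    return (
        study.population in criteria.populations
        and study.intervention in criteria.interventions
        and study.comparator in criteria.comparators
        and study.design in criteria.designs
    )


criteria = ReviewCriteria(
    populations={"adults with hypertension"},
    interventions={"drug X"},
    comparators={"placebo"},
    designs={"randomized trial"},
)
silent_on_outcomes = CandidateStudy("adults with hypertension", "drug X", "placebo", "randomized trial")
print(is_eligible(silent_on_outcomes, criteria))  # True
```

A review that legitimately restricts eligibility to specific outcomes, as the chapter allows, would need an extra check that this sketch leaves out.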

Eligibility criteria do not exist in isolation, but should be specified with the synthesis of the studies they describe in mind. This will involve making plans for how to group variants of the PICO elements for synthesis. This chapter describes the processes by which the structure of the synthesis can be mapped out at the beginning of the review, and the interplay between the review question, considerations for the analysis and their operationalization in terms of eligibility criteria. Decisions about which studies to include (and exclude), and how they will be combined in the review’s synthesis, should be documented and justified in the review protocol.

A distinction between three different stages in the review at which the PICO construct might be used is helpful for understanding the decisions that need to be made. In Chapter 2, Section 2.3, we introduced the ideas of a review PICO (on which eligibility of studies is based), the PICO for each synthesis (defining the question that each specific synthesis aims to answer) and the PICO of the included studies (what was actually investigated in the included studies). In this chapter, we focus on the review PICO and the PICO for each synthesis as a basis for specifying which studies should be included in the review and planning its syntheses. These PICOs should relate clearly and directly to the questions or hypotheses that are posed when the review is formulated (see Chapter 2) and will involve specifying the population in question, and a set of comparisons between the intervention groups.

An integral part of the process of setting up the review is to specify which characteristics of the interventions (e.g. individual compounds of a drug), populations (e.g. acute and chronic conditions), outcomes (e.g. different depression measurement scales) and study designs, will be grouped together. Such decisions should be made independent of knowing which studies will be included and the methods of synthesis that will be used (e.g. meta-analysis). There may be a need to modify the comparisons and even add new ones at the review stage in light of the data that are collected. For example, important variations in the intervention may be discovered only after data are collected, or modifying the comparison may facilitate the possibility of synthesis when only one or few studies meet the comparison PICO. Planning for the latter scenario at the protocol stage may lead to less post-hoc decision making (Chapter 2, Section 2.5.3) and, of course, any changes made during the conduct of the review should be recorded and documented in the final report.

3.2 Articulating the review and comparison PICO

3.2.1 Defining types of participants: which people and populations

The criteria for considering types of people included in studies in a review should be sufficiently broad to encompass the likely diversity of studies and the likely scenarios in which the interventions will be used, but sufficiently narrow to ensure that a meaningful answer can be obtained when studies are considered together; they should be specified in advance (see MECIR Box 3.2.a). As discussed in Chapter 2, Section 2.3.1, the degree of breadth will vary, depending on the question being asked and the analytical approach to be employed. A range of evidence may inform the choice of population characteristics to examine, including theoretical considerations, evidence from other interventions that have a similar mechanism of action, and in vitro or animal studies. Consideration should be given to whether the population characteristic is at the level of the participant (e.g. age, severity of disease) or the study (e.g. care setting, geographical location), since this has implications for grouping studies and for the method of synthesis (Chapter 10, Section 10.11.5). It is often helpful to consider the types of people that are of interest in three steps.

MECIR Box 3.2.a Relevant expectations for conduct of intervention reviews

Predefining unambiguous criteria for participants

Predefined, unambiguous eligibility criteria are a fundamental prerequisite for a systematic review. The criteria for considering types of people included in studies in a review should be sufficiently broad to encompass the likely diversity of studies, but sufficiently narrow to ensure that a meaningful answer can be obtained when studies are considered in aggregate. Considerations when specifying participants include setting, diagnosis or definition of condition and demographic factors. Any restrictions to study populations must be based on a sound rationale, since it is important that Cochrane Reviews are widely relevant.

Predefining a strategy for studies with a subset of eligible participants

Sometimes a study includes some ‘eligible’ participants and some ‘ineligible’ participants, for example when an age cut-off is used in the review’s eligibility criteria. If data from the eligible participants cannot be retrieved, a mechanism for dealing with this situation should be pre-specified.

First, the diseases or conditions of interest should be defined using explicit criteria for establishing their presence (or absence). Criteria that will force the unnecessary exclusion of studies should be avoided. For example, diagnostic criteria that were developed more recently – which may be viewed as the current gold standard for diagnosing the condition of interest – will not have been used in earlier studies. Expensive or recent diagnostic tests may not be available in many countries or settings, and time-consuming tests may not be practical in routine healthcare settings.

Second, the broad population and setting of interest should be defined. This involves deciding whether a specific population group is within scope, determined by factors such as age, sex, race, educational status or the presence of a particular condition such as angina or shortness of breath. Interest may focus on a particular setting such as a community, hospital, nursing home, chronic care institution, or outpatient setting. Box 3.2.a outlines some factors to consider when developing population criteria.

Whichever criteria are used for defining the population and setting of interest, it is common to encounter studies that only partially overlap with the review’s population. For example, in a review focusing on children, a cut-point of less than 16 years might be desirable, but studies may be identified with participants aged from 12 to 18. Unless the study reports separate data from the eligible section of the population (in which case data from the eligible participants can be included in the review), review authors will need a strategy for dealing with these studies (see MECIR Box 3.2.a). This will involve balancing concerns about reduced applicability by including participants who do not meet the eligibility criteria, against the loss of data when studies are excluded. Arbitrary rules (such as including a study if more than 80% of the participants are under 16) will not be practical if detailed information is not available from the study. A less stringent rule, such as ‘the majority of participants are under 16’, may be sufficient. Although there is a risk of review authors’ biases affecting post-hoc inclusion decisions (which is why many authors endeavour to pre-specify these rules), this may be outweighed by a common-sense strategy in which eligibility decisions keep faith with the objectives of the review rather than with arbitrary rules. Difficult decisions should be documented in the review, checked with the advisory group (if available, see Chapter 1), and sensitivity analyses can assess the impact of these decisions on the review’s findings (see Chapter 10, Section 10.14 and MECIR Box 3.2.b).
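A pre-specified rule for studies that only partly overlap the review population, such as the "majority under 16" rule above, can be written down unambiguously. This is a minimal Python sketch. The function name, its inputs and the returned labels are hypothetical. "Majority" is read as strictly more than half. When the proportion is unknown, the case is flagged for a documented judgement rather than decided automatically.

```python
from typing import Optional


def partial_overlap_decision(
    proportion_eligible: Optional[float],
    eligible_data_reported_separately: bool,
    majority_threshold: float = 0.5,
) -> str:
    """Apply a pre-specified rule to a study that only partly matches the review population."""
    if eligible_data_reported_separately:
        # Use only the data from the eligible participants.
        return "include eligible subgroup"
    if proportion_eligible is None:
        # The rule cannot be applied: flag for a documented decision and a sensitivity analysis.
        return "unclear"
    # 'The majority of participants are under 16' means strictly more than half.
    return "include" if proportion_eligible > majority_threshold else "exclude"


# A hypothetical study of 12- to 18-year-olds in which two thirds are under 16.
print(partial_overlap_decision(0.67, eligible_data_reported_separately=False))  # include
```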

Box 3.2.a Factors to consider when developing criteria for ‘Types of participants’

MECIR Box 3.2.b Relevant expectations for conduct of intervention reviews

Changing eligibility criteria

Following pre-specified eligibility criteria is a fundamental attribute of a systematic review. However, unanticipated issues may arise. Review authors should make sensible post-hoc decisions about exclusion of studies, and these should be documented in the review, possibly accompanied by sensitivity analyses. Changes to the protocol must not be made on the basis of the findings of the studies or the synthesis, as this can introduce bias.

Third, there should be consideration of whether there are population characteristics that might be expected to modify the size of the intervention effects (e.g. different severities of heart failure). Identifying subpopulations may be important for implementation of the intervention. If relevant subpopulations are identified, two courses of action are possible: limiting the scope of the review to exclude certain subpopulations; or maintaining the breadth of the review and addressing subpopulations in the analysis.

Restricting the review with respect to specific population characteristics or settings should be based on a sound rationale. It is important that Cochrane Reviews are globally relevant, so the rationale for the exclusion of studies based on population characteristics should be justified. For example, focusing a review of the effectiveness of mammographic screening on women between 40 and 50 years old may be justified based on biological plausibility, previously published systematic reviews and existing controversy. On the other hand, focusing a review on a particular subgroup of people on the basis of their age, sex or ethnicity simply because of personal interests, when there is no underlying biologic or sociological justification for doing so, should be avoided, as these reviews will be less useful to decision makers and readers of the review.

Maintaining the breadth of the review may be best when it is uncertain whether there are important differences in effects among various subgroups of people, since this allows investigation of these differences (see Chapter 10, Section 10.11.5). Review authors may combine the results from different subpopulations in the same synthesis, examining whether a given subdivision explains variation (heterogeneity) among the intervention effects. Alternatively, the results may be synthesized in separate comparisons representing different subpopulations. Splitting by subpopulation risks there being too few studies to yield a useful synthesis (see Table 3.2.a and Chapter 2, Section 2.3.2). Consideration needs to be given to the subgroup analysis method, particularly for population characteristics measured at the participant level (see Chapter 10 and Chapter 26, Fisher et al 2017). All subgroup analyses should ideally be planned a priori and stated as a secondary objective in the protocol, and not driven by the availability of data.

In practice, it may be difficult to assign included studies to defined subpopulations because of missing information about the population characteristic, variability in how the population characteristic is measured across studies (e.g. variation in the method used to define the severity of heart failure), or because the study does not wholly fall within (or report the results separately by) the defined subpopulation. The latter issue mainly applies to participant characteristics but can also arise for settings or geographic locations where these vary within studies. Review authors should consider planning for these scenarios (see example reviews Hetrick et al 2012, Safi et al 2017; Table 3.2.a, column 3).
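One way to plan for these scenarios is to write down, before seeing any results, an explicit rule for assigning each study to a subpopulation, including fallback categories. The sketch below does this for heart failure severity. It assumes each study reports an LVEF eligibility range in percent and uses the LVEF bands from the beta-blocker example in Table 3.2.a (reduced ≤ 40%, mid-range > 40% and < 50%, preserved ≥ 50%); the function names and input format are illustrative assumptions.

```python
from typing import Optional, Tuple


def lvef_band(value: float) -> str:
    """Map a single LVEF value (%) to a severity band."""
    if value <= 40:
        return "reduced"
    if value < 50:
        return "mid-range"
    return "preserved"


def assign_lvef_subgroup(lvef_range: Optional[Tuple[float, float]]) -> str:
    """Assign a study to a subgroup from its (lowest, highest) eligible LVEF.

    Returns 'not specified' when the study does not report LVEF, and 'mixed'
    when its participants span more than one band, so that no study is forced
    into a subgroup it does not wholly fall within.
    """
    if lvef_range is None:
        return "not specified"
    low, high = lvef_range
    low_band, high_band = lvef_band(low), lvef_band(high)
    # Bands are ordered, so matching end points mean the whole range is in one band.
    return low_band if low_band == high_band else "mixed"


# A study enrolling people with LVEF <= 35% is recorded as the range (0, 35).
for study_range in [(0, 35), (30, 60), None]:
    print(study_range, assign_lvef_subgroup(study_range))
```

A 'mixed' study that reports results separately by band could still contribute to each band-specific subgroup; the rule above only decides where the study as a whole is placed.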

Table 3.2.a Examples of population attributes and characteristics

Intended recipient of intervention

Patient, carer, healthcare provider (general practitioners, nurses, allied health professionals), health system, policy maker, community

In a review of e-learning programmes for health professionals, a subgroup analysis was planned to examine if the effects were modified by the type of health professional (doctors, nurses or physiotherapists). The authors hypothesized that e-learning programmes for doctors would be more effective than for other health professionals, but did not provide a rationale (Vaona et al 2018).

Disease/condition (to be treated or prevented)

Type and severity of a condition

In a review of platelet-rich therapies for musculoskeletal soft tissue injuries, a subgroup analysis was undertaken to examine if the effects of platelet-rich therapies were modified by the type of injury (e.g. rotator cuff tear, anterior cruciate ligament reconstruction, chronic Achilles tendinopathy) (Moraes et al 2014).

In planning a review of beta-blockers for heart failure, subgroup analyses were specified to examine if the effects of beta-blockers are modified by the cause of heart failure (e.g. idiopathic dilated cardiomyopathy, ischaemic heart disease, valvular heart disease, hypertension) and the severity of heart failure (‘reduced left ventricular ejection fraction (LVEF)’ ≤ 40%, ‘mid-range LVEF’ > 40% and < 50%, ‘preserved LVEF’ ≥ 50%, mixed, not specified). Studies have shown that patient characteristics and comorbidities differ by heart failure severity, and that therapies reduce morbidity in ‘reduced LVEF’ patients, but the benefits in the other groups are uncertain (Safi et al 2017).

Participant characteristics

Age (neonate, child, adolescent, adult, older adult)

Race/ethnicity

Sex/gender

PROGRESS-Plus equity characteristics (e.g. place of residence, socio-economic status, education) (O’Neill et al 2014)

In a review of newer-generation antidepressants for depressive disorders in children and adolescents, a subgroup analysis was undertaken to examine if the effects of the antidepressants were modified by age. The rationale was based on the findings of another review that suggested that children and adolescents may respond differently to antidepressants. The age groups were defined as ‘children’ (aged approximately 6 to 12 years), ‘adolescents’ (aged approximately 13 to 18 years), and ‘children and adolescents’ (when the study included both children and adolescents, and results could not be obtained separately by these subpopulations) (Hetrick et al 2012).

Setting

Setting of care (primary care, hospital, community)

Rurality (urban, rural, remote)

Socio-economic setting (low and middle-income countries, high-income countries)

Hospital ward (e.g. intensive care unit, general medical ward, outpatient)

In a review of hip protectors for preventing hip fractures in older people, separate comparisons were specified based on setting (institutional care or community-dwelling) for the critical outcome of hip fracture (Santesso et al 2014).

3.2.2 Defining interventions and how they will be grouped

In some reviews, predefining the intervention (MECIR Box 3.2.c) may be straightforward. For example, in a review of the effect of a given anticoagulant on deep vein thrombosis, the intervention can be defined precisely. A more complicated definition might be required for a multi-component intervention composed of dietary advice, training and support groups to reduce rates of obesity in a given population.

The inherent complexity present when defining an intervention often comes to light when considering how it is thought to achieve its intended effect and whether the effect is likely to differ when variants of the intervention are used. In the first example, the anticoagulant warfarin is thought to reduce blood clots by blocking an enzyme (vitamin K epoxide reductase) that recycles vitamin K, which is needed to generate active clotting factors. In the second, the behavioural intervention is thought to increase individuals’ self-efficacy in their ability to prepare healthy food. In both examples, we cannot assume that all forms of the intervention will work in the same way. When defining drug interventions, such as anticoagulants, factors such as the drug preparation, route of administration, dose, duration, and frequency should be considered. For multi-component interventions (such as interventions to reduce rates of obesity), the common or core features of the interventions must be defined, so that the review authors can clearly differentiate them from other interventions not included in the review.

MECIR Box 3.2.c Relevant expectations for conduct of intervention reviews

Predefining unambiguous criteria for interventions and comparators

Predefined, unambiguous eligibility criteria are a fundamental prerequisite for a systematic review. Specification of comparator interventions requires particular clarity: are the experimental interventions to be compared with an inactive control intervention (e.g. placebo, no treatment, standard care, or a waiting list control), or with an active control intervention (e.g. a different variant of the same intervention, a different drug, a different kind of therapy)? Any restrictions on interventions and comparators, for example, regarding delivery, dose, duration, intensity, co-interventions and features of complex interventions should also be predefined and explained.

In general, it is useful to consider exactly what is delivered, who delivers it, how it is delivered, where it is delivered, when and how much is delivered, and whether the intervention can be adapted or tailored, and to consider this for each type of intervention included in the review (see the TIDieR checklist (Hoffmann et al 2014)). As argued in Chapter 17, separating interventions into ‘simple’ and ‘complex’ is a false dichotomy; all interventions can be complex in some ways. The critical issue for review authors is to identify the most important factors to be considered in a specific review. Box 3.2.b outlines some factors to consider when developing broad criteria for the ‘Types of interventions’ (and comparisons).

Box 3.2.b Factors to consider when developing criteria for ‘Types of interventions’

Once interventions eligible for the review have been broadly defined, decisions should be made about how variants of the intervention will be handled in the synthesis. Differences in intervention characteristics across studies occur in all reviews. If these reflect minor differences in the form of the intervention used in practice (such as small differences in the duration or content of brief alcohol counselling interventions), then an overall synthesis can provide useful information for decision makers. Where differences in intervention characteristics are more substantial (such as delivery of brief alcohol counselling by nurses versus doctors), and are expected to have a substantial impact on the size of intervention effects, these differences should be examined in the synthesis. What constitutes an important difference requires judgement, but in general differences that alter decisions about how an intervention is implemented or whether the intervention is used or not are likely to be important. In such circumstances, review authors should consider specifying separate groups (or subgroups) to examine in their synthesis.

Clearly defined intervention groups serve two main purposes in the synthesis. First, the way in which interventions are grouped for synthesis (meta-analysis or other synthesis) is likely to influence review findings. Careful planning of intervention groups makes best use of the available data, avoids decisions that are influenced by study findings (which may introduce bias), and produces a review focused on questions relevant to decision makers. Second, the intervention groups specified in a protocol provide a standardized terminology for describing the interventions throughout the review, overcoming the varied descriptions used by study authors (e.g. where different labels are used for the same intervention, or similar labels used for different techniques) (Michie et al 2013). This standardization enables comparison and synthesis of information about intervention characteristics across studies (common characteristics and differences) and provides a consistent language for reporting that supports interpretation of review findings.

Table 3.2.b outlines a process for planning intervention groups as a basis for (or precursor to) synthesis, and the decision points and considerations at each step. The table is intended to guide rather than to prescribe, and although it is presented as a sequence of steps, the process is likely to be iterative, and some steps may be done concurrently or in a different sequence. The process aims to minimize data-driven approaches that can arise once review authors have knowledge of the findings of the included studies. It also includes principles for developing a flexible plan that maximizes the potential to synthesize in circumstances where there are few studies, many variants of an intervention, or where the variants are difficult to anticipate. In all stages, review authors should consider how to categorize studies whose reports contain insufficient detail.

Table 3.2.b A process for planning intervention groups for synthesis

1. Identify intervention characteristics that may modify the effect of the intervention.

Consider whether differences in intervention characteristics might importantly modify the size of the intervention effect. Content-specific research literature and expertise should inform this step.

The TIDieR checklist, a tool for describing interventions, outlines the characteristics across which an intervention might differ (Hoffmann et al 2014). These include ‘what’ materials and procedures are used, ‘who’ provides the intervention, and ‘when and how much’ intervention is delivered. The iCAT-SR tool provides equivalent guidance for complex interventions (Lewin et al 2017).

Interventions can differ across multiple characteristics, which vary in importance depending on the review.

In a review of exercise for osteoporosis, whether the exercise is weight-bearing or non-weight-bearing may be a key characteristic, since the mechanism by which exercise is thought to work is by placing stress or mechanical load on bones (Howe et al 2011).

Different mechanisms apply in reviews of exercise for knee osteoarthritis (muscle strengthening), falls prevention (gait and balance) and cognitive function (cardiovascular fitness).

The differing mechanisms might suggest different ways of grouping interventions (e.g. by intensity, mode of delivery) according to potential modifiers of the intervention effects.

2a. Label and define intervention groups to be considered in the synthesis.


For each intervention group, provide a short label (e.g. supportive psychotherapy) and describe the core characteristics (criteria) that will be used to assign each intervention from an included study to a group.

Groups are often defined by intervention content (especially the active components), such as materials, procedures or techniques (e.g. a specific drug, an information leaflet, a behaviour change technique). Other characteristics may also be used, although some are more commonly used to define subgroups: the purpose or theoretical underpinning, mode of delivery, provider, dose or intensity, duration or timing of the intervention (Hoffmann et al 2014).

In specifying groups:

Logic models may help structure the synthesis.

In a review of psychological therapies for coronary heart disease, a single group was specified for meta-analysis that included all types of therapy. Subgroups were defined to examine whether intervention effects were modified by intervention components (e.g. cognitive techniques, stress management) or mode of delivery (e.g. individual, group) (Richards et al 2017).

In a review of psychological therapies for panic disorder (Pompoli et al 2016), eight types of therapy were specified:

1. psychoeducation;

2. supportive psychotherapy (with or without a psychoeducational component);

3. physiological therapies;

4. behaviour therapy;

5. cognitive therapy;

6. cognitive behaviour therapy (CBT);

7. third-wave CBT; and

8. psychodynamic therapies.

Groups were defined by the theoretical basis of each therapy (e.g. CBT aims to modify maladaptive thoughts through cognitive restructuring) and the component techniques used.

2b. Define levels for groups based on dose or intensity.

For groups based on ‘how much’ of an intervention is used (e.g. dose or intensity), criteria are needed to quantify each group. This may be straightforward for easy-to-quantify characteristics, but more complex for characteristics that are hard to quantify (e.g. duration or intensity of rehabilitation or psychological therapy).

The levels should be based on how the intervention is used in practice (e.g. cut-offs for low and high doses of a supplement based on recommended nutrient intake), or on a rationale for how the intervention might work.

In reviews of exercise, intensity may be defined by training time (session length, frequency, program duration), amount of work (e.g. repetitions), and effort/energy expenditure (exertion, heart rate) (Regnaux et al 2015).

In a review of organized inpatient care for stroke, acute stroke units were categorized as ‘intensive’, ‘semi-intensive’ or ‘non-intensive’ based on whether the unit had continuous monitoring, high nurse staffing, and life support facilities (Stroke Unit Trialists Collaboration 2013).

3. Determine whether there is an existing system for grouping interventions.


In some fields, intervention taxonomies and frameworks have been developed for labelling and describing interventions, and these can make it easier for those using a review to interpret and apply findings.

Using an agreed system is preferable to developing new groupings. Existing systems should be assessed for relevance and usefulness. The most useful systems:

Systems for grouping interventions may be generic, widely applicable across clinical areas, or specific to a condition or intervention type. Some Cochrane Groups recommend specific taxonomies.

The Behaviour Change Technique (BCT) Taxonomy (Michie et al 2013) categorizes intervention elements such as goal setting, self-monitoring and social support. A protocol for a review of social media interventions used this taxonomy to describe interventions and examine different BCTs as potential effect modifiers (Welch et al 2018).

The Behaviour Change Wheel has been used to group interventions (or components) by function (e.g. to educate, persuade, enable) (Michie et al 2011). This system was used to describe the components of dietary advice interventions (Desroches et al 2013).


Multiple reviews have used the consensus-based taxonomy developed by the Prevention of Falls Network Europe (ProFaNE) (e.g. Verheyden et al 2013, Kendrick et al 2014). The taxonomy specifies broad groups (e.g. exercise, medication, environment/assistive technology) within which are more specific groups (e.g. exercise: gait, balance and functional training; flexibility; strength and resistance) (Lamb et al 2011).

4. Plan how the specified groups will be used in synthesis and reporting.

Decide whether it is useful to pool all interventions in a single meta-analysis (‘lumping’), within which specific characteristics can be explored as effect modifiers (e.g. in subgroups). Alternatively, if pooling all interventions is unlikely to address a useful question, separate synthesis of specific interventions may be more appropriate (‘splitting’).

Determining the right analytic approach is discussed further in .

In a review of exercise for knee osteoarthritis, the different categories of exercise were combined in a single meta-analysis, addressing the question ‘what is the effect of exercise on knee osteoarthritis?’. The categories were also analysed as subgroups within the meta-analysis to explore whether the effect size varied by type of exercise (Fransen et al 2015). Other subgroup analyses examined mode of delivery and dose.

5. Decide how to group interventions with multiple components or co-interventions.

Some interventions, especially those considered ‘complex’, include multiple components that could also be implemented independently (Guise et al 2014, Lewin et al 2017). These components might be eligible for inclusion in the review alone, or eligible only if used alongside an eligible intervention.

Options for considering multi-component interventions may include the following.

and Welton et al 2009, Caldwell and Welton 2016, Higgins et al 2019).

The first two approaches may be challenging but are likely to be most useful (Caldwell and Welton 2016).

See Section 3.2.3.1 for the special case of when a co-intervention is administered in both treatment arms.

In a review of psychological therapies for panic disorder, two of the eight eligible therapies (psychoeducation and supportive psychotherapy) could be used alone or as part of a multi-component therapy. When accompanied by another eligible therapy, the intervention was categorized as the other therapy (i.e. psychoeducation + cognitive behavioural therapy was categorized as cognitive behavioural therapy) (Pompoli et al 2016).


In a review of psychosocial interventions for smoking cessation in pregnancy, two approaches were used. All intervention types were included in a single meta-analysis with subgroups for multi-component, single and tailored interventions. Separate meta-analyses were also performed for each intervention type, with categorization of multi-component interventions based on the ‘main’ component (Chamberlain et al 2017).

6. Build in contingencies by specifying both specific and broader intervention groups.

Consider grouping interventions at more than one level, so that studies of a broader group of interventions can be synthesized if too few studies are identified for synthesis in more specific groups. This will provide flexibility where review authors anticipate few studies contributing to specific groups (e.g. in reviews with diverse interventions, additional diversity in other PICO elements, or few studies overall).

In a review of psychosocial interventions for smoking cessation, the authors planned to group any psychosocial intervention in a single comparison (addressing the higher level question of whether, on average, psychosocial interventions are effective). Given that sufficient data were available, they also presented separate meta-analyses to examine the effects of specific types of psychosocial interventions (e.g. counselling, health education, incentives, social support) (Chamberlain et al 2017).
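The contingency described in step 6 can be sketched as a two-level grouping: every study is counted in its broader group, and a specific group is synthesized separately only when it has enough studies. The minimum of two studies, the study identifiers and the group labels below are hypothetical assumptions; a review would prespecify its own rule in the protocol.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def plan_synthesis(
    study_groups: Dict[str, str],
    broader_of: Dict[str, str],
    min_studies: int = 2,  # hypothetical threshold for a specific synthesis
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Group studies at a broad and a specific level.

    study_groups: study id -> specific intervention group
    broader_of:   specific intervention group -> broader group
    Returns (broad syntheses, feasible specific syntheses).
    """
    broad: Dict[str, List[str]] = defaultdict(list)
    specific: Dict[str, List[str]] = defaultdict(list)
    for study, group in study_groups.items():
        specific[group].append(study)
        broad[broader_of[group]].append(study)

    # Specific groups with too few studies contribute only to the broader synthesis.
    feasible = {g: s for g, s in specific.items() if len(s) >= min_studies}
    return dict(broad), feasible


studies = {
    "study-1": "counselling",
    "study-2": "counselling",
    "study-3": "counselling",
    "study-4": "incentives",
    "study-5": "health education",
}
broader = {g: "psychosocial" for g in ("counselling", "incentives", "health education")}
broad, specific = plan_synthesis(studies, broader)
print(broad)
print(specific)
```

Here all five hypothetical studies feed the broad "psychosocial" comparison, while only "counselling" has enough studies for its own synthesis, mirroring the two-level plan in the smoking cessation example.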

3.2.3 Defining which comparisons will be made

When articulating the PICO for each synthesis, defining the intervention groups alone is not sufficient for complete specification of the planned syntheses. The next step is to define the comparisons that will be made between the intervention groups. Setting aside for a moment more complex analyses such as network meta-analyses, which can simultaneously compare many groups (Chapter 11), standard meta-analysis (Chapter 10) aims to draw conclusions about the comparative effects of two groups at a time (i.e. which of two intervention groups is more effective?). These comparisons form the basis for the syntheses that will be undertaken if data are available. Cochrane Reviews sometimes include one comparison, but most often include multiple comparisons. Three commonly identified types of comparisons include the following (Davey et al 2011).

  • Intervention versus placebo, for example:
      ◦ newer-generation antidepressants versus placebo (Hetrick et al 2012); and
      ◦ vertebroplasty for osteoporotic vertebral compression fractures versus placebo (sham procedure) (Buchbinder et al 2018).
  • Intervention versus control, for example:
      ◦ chemotherapy or targeted therapy plus best supportive care (BSC) versus BSC for palliative treatment of esophageal and gastroesophageal-junction carcinoma (Janmaat et al 2017); and
      ◦ personalized care planning versus usual care for people with long-term conditions (Coulter et al 2015).
  • Intervention A versus intervention B, for example:
      ◦ early (commenced at less than two weeks of age) versus late (two weeks of age or more) parenteral zinc supplementation in term and preterm infants (Taylor et al 2017);
      ◦ high intensity versus low intensity physical activity or exercise in people with hip or knee osteoarthritis (Regnaux et al 2015); and
      ◦ multimedia education versus other education for consumers about prescribed and over the counter medications (Ciciriello et al 2013).

The first two types of comparisons aim to establish the effectiveness of an intervention, while the last aims to compare the effectiveness of two interventions. However, the distinction between placebo and control comparisons is often arbitrary, since any differences in the care provided between trials with a control arm and those with a placebo arm may be unimportant, especially where ‘usual care’ is provided to both. Therefore, placebo and control groups may be judged similar enough to be combined for synthesis.

In reviews including multiple intervention groups, many comparisons are possible. In some of these reviews, authors seek to synthesize evidence on the comparative effectiveness of all their included interventions, including where there may be only indirect comparison of some interventions across the included studies (Chapter 11, Section 11.2.1). However, in many reviews including multiple intervention groups, a limited subset of the possible comparisons will be selected. The chosen subset of comparisons should address the most important clinical and research questions. For example, if an established intervention (or dose of an intervention) is used in practice, then the synthesis would ideally compare novel or alternative interventions to this established intervention, and not, for example, to no intervention.
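The number of possible pairwise comparisons grows quickly: with k intervention groups there are k(k − 1)/2 of them. The sketch below contrasts enumerating every pair with selecting only comparisons against an established intervention, as in the example above. The group labels are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def all_pairwise_comparisons(groups: List[str]) -> List[Tuple[str, str]]:
    """Every possible pair of intervention groups: k * (k - 1) / 2 comparisons."""
    return list(combinations(groups, 2))


def comparisons_against(groups: List[str], reference: str) -> List[Tuple[str, str]]:
    """Compare each other group only with a chosen reference (e.g. established care)."""
    return [(group, reference) for group in groups if group != reference]


groups = ["established drug", "new drug A", "new drug B", "exercise", "no intervention"]
print(len(all_pairwise_comparisons(groups)))
print(comparisons_against(groups, "established drug"))
```

With five groups there are ten possible pairwise comparisons, but anchoring on the established intervention reduces them to four, each answering a question a decision maker is likely to ask.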

3.2.3.1 Dealing with co-interventions

Planning is needed for the special case where the same supplementary intervention is delivered to both the intervention and comparator groups. A supplementary intervention is an additional intervention delivered alongside the intervention of interest, such as massage in a review examining the effects of aromatherapy (i.e. aromatherapy plus massage versus massage alone). In many cases, the supplementary intervention will be unimportant and can be ignored. In other situations, the effect of the intervention of interest may differ according to whether participants receive the supplementary therapy. For example, the effect of aromatherapy among people who receive a massage may differ from the effect of the aromatherapy given alone. This will be the case if the intervention of interest interacts with the supplementary intervention, leading to larger (synergistic) or smaller (dyssynergistic/antagonistic) effects than the intervention of interest alone (Squires et al 2013). While qualitative interactions (where the effect of the intervention is in the opposite direction when combined with the supplementary intervention) are rare, it is possible that there will be more variation in the intervention effects (heterogeneity) when supplementary interventions are involved, and it is important to plan for this. Approaches for dealing with this in the statistical synthesis may include fitting a random-effects meta-analysis model that encompasses heterogeneity (Chapter 10, Section 10.10.4), or investigating whether the intervention effect is modified by the addition of the supplementary intervention through subgroup analysis (Chapter 10, Section 10.11.2).
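A random-effects model allows the true intervention effect to vary between studies, for example because of differing supplementary interventions. The sketch below uses the DerSimonian and Laird moment estimator of the between-study variance (tau squared), which is one common choice among several estimators. It assumes effects on an additive scale (e.g. log odds ratios) with known within-study variances; the example numbers are invented for illustration.

```python
import math
from typing import List, Tuple


def dersimonian_laird(effects: List[float], variances: List[float]) -> Tuple[float, float, float]:
    """Random-effects pooled estimate, its standard error, and tau^2 (DerSimonian-Laird)."""
    if len(effects) < 2:
        raise ValueError("at least two studies are needed to estimate heterogeneity")
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q: spread of study effects around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)

    # Re-weight each study by 1 / (within-study variance + between-study variance).
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    return pooled, math.sqrt(1.0 / sum_re), tau2


# Hypothetical log odds ratios from studies with and without the co-intervention.
print(dersimonian_laird([0.0, 2.0], [1.0, 1.0]))
```

Running the same function separately on studies with and without the supplementary intervention gives subgroup estimates that can then be compared, which is the second approach mentioned above.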

3.2.4 Selecting, prioritizing and grouping review outcomes

3.2.4.1 Selecting review outcomes

Broad outcome domains are decided at the time of setting up the review PICO (see Chapter 2). Once the broad domains are agreed, further specification is required to define the domains to facilitate reporting and synthesis (i.e. the PICO for comparison) (see Chapter 2, Section 2.3). The process for specifying and grouping outcomes largely parallels that used for specifying intervention groups.

Reporting of outcomes should rarely determine study eligibility for a review. In particular, studies should not be excluded because they do not report results of an outcome they may have measured, or provide ‘no usable data’ (MECIR Box 3.2.d). This is essential to avoid bias arising from selective reporting of findings by the study authors (see Chapter 13). However, in some circumstances, the measurement of certain outcomes may be a study eligibility criterion. This may be the case, for example, when the review addresses the potential for an intervention to prevent a particular outcome, or when the review addresses a specific purpose of an intervention that can be used in the same population for different purposes (such as hormone replacement therapy, or aspirin).

MECIR Box 3.2.d Relevant expectations for conduct of intervention reviews

Clarifying role of outcomes

Outcome measures should not always form part of the criteria for including studies in a review. However, some reviews do legitimately restrict eligibility to specific outcomes. For example, the same intervention may be studied in the same population for different purposes (e.g. hormone replacement therapy, or aspirin); or a review may address specifically the adverse effects of an intervention used for several conditions. If authors do exclude studies on the basis of outcomes, care should be taken to ascertain that relevant outcomes are not available because they have not been measured rather than simply not reported.

Predefining outcome domains

Full specification of the outcomes includes consideration of outcome domains (e.g. quality of life) and outcome measures (e.g. SF-36). Predefinition of outcomes reduces the risk of selective outcome reporting. The critical outcomes should be as few as possible and should normally reflect at least one potential benefit and at least one potential area of harm. It is expected that the review should be able to synthesize these outcomes if eligible studies are identified, and that the conclusions of the review will be based largely on the effects of the interventions on these outcomes. Additional important outcomes may also be specified. Up to seven critical and important outcomes will form the basis of the GRADE assessment and be summarized in the review’s abstract and other summary formats, although the review may measure more than seven outcomes.

Choosing outcomes

Cochrane Reviews are intended to support clinical practice and policy, and should address outcomes that are critical or important to consumers. These should be specified at protocol stage. Where available, established sets of core outcomes should be used. Patient-reported outcomes should be included where possible. It is also important to judge whether evidence of resource use and costs might be an important component of decisions to adopt the intervention or alternative management strategies around the world. Large numbers of outcomes, while sometimes necessary, can make reviews unfocused, unmanageable for the user, and prone to selective outcome reporting bias. Biochemical, interim and process outcomes should be considered where they are important to decision makers. Any outcomes that would not be described as critical or important can be left out of the review.

Predefining outcome measures

Having decided what outcomes are of interest to the review, authors should clarify acceptable ways in which these outcomes can be measured. It may be difficult, however, to predefine adverse effects.

C17: Predefining choices from multiple outcome measures

Prespecification guards against selective outcome reporting, and allows users to confirm that choices were not overly influenced by the results. A predefined hierarchy of outcome measures may be helpful. It may be difficult, however, to predefine adverse effects. A rationale should be provided for the choice of outcome measure.

C18: Predefining time points of interest

Prespecification guards against selective outcome reporting, and allows users to confirm that choices were not overly influenced by the results. Authors may consider whether all time frames or only selected time points will be included in the review. These decisions should be based on outcomes important for making healthcare decisions. One strategy to make use of the available data could be to group time points into prespecified intervals to represent ‘short-term’, ‘medium-term’ and ‘long-term’ outcomes and to take no more than one from each interval from each study for any particular outcome.
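The last strategy in the box can be written as a small, prespecified rule. In the sketch below the interval cut-offs (up to 12 weeks, over 12 and up to 26 weeks, over 26 weeks) and the choice of the latest time point within each interval are hypothetical assumptions for illustration; a review would define and justify its own in the protocol.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical prespecified intervals: (label, inclusive upper bound in weeks).
INTERVALS = [("short-term", 12.0), ("medium-term", 26.0), ("long-term", math.inf)]


def interval_for(week: float) -> str:
    """Return the prespecified interval containing a follow-up time (in weeks)."""
    for label, upper in INTERVALS:
        if week <= upper:
            return label
    raise ValueError(f"no interval for week {week}")


def select_time_points(measurements: List[Tuple[str, float]]) -> Dict[Tuple[str, str], float]:
    """Keep at most one time point per study per interval for a single outcome.

    measurements: (study id, follow-up week) pairs reported for the outcome.
    Within an interval, the latest time point is kept (an assumed rule).
    """
    chosen: Dict[Tuple[str, str], float] = {}
    for study, week in measurements:
        key = (study, interval_for(week))
        if key not in chosen or week > chosen[key]:
            chosen[key] = week
    return chosen


reported = [("study-1", 4), ("study-1", 8), ("study-1", 24), ("study-2", 12), ("study-2", 52)]
for (study, interval), week in select_time_points(reported).items():
    print(study, interval, week)
```

Because the rule is fixed before results are seen, the choice of time point cannot be steered by which one shows the largest effect.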

In general, systematic reviews should aim to include outcomes that are likely to be meaningful to the intended users and recipients of the reviewed evidence. This may include clinicians, patients (consumers), the general public, administrators and policy makers. Outcomes may include survival (mortality), clinical events (e.g. strokes or myocardial infarction), behavioural outcomes (e.g. changes in diet, use of services), patient-reported outcomes (e.g. symptoms, quality of life), adverse events, burdens (e.g. demands on caregivers, frequency of tests, restrictions on lifestyle) and economic outcomes (e.g. cost and resource use). It is critical that outcomes used to assess adverse effects as well as outcomes used to assess beneficial effects are among those addressed by a review (see Chapter 19).

Outcomes that are trivial or meaningless to decision makers should not be included in Cochrane Reviews. Inclusion of outcomes that are of little or no importance risks overwhelming and potentially misleading readers. Interim or surrogate outcome measures, such as laboratory results or radiologic results (e.g. loss of bone mineral content as a surrogate for fractures in hormone replacement therapy), while potentially helpful in explaining effects or determining intervention integrity (see Chapter 5, Section 5.3.4.1), can also be misleading since they may not predict clinically important outcomes accurately. Many interventions reduce the risk for a surrogate outcome but have no effect or have harmful effects on clinically relevant outcomes, and some interventions have no effect on surrogate measures but improve clinical outcomes.

Various sources can be used to develop a list of relevant outcomes, including input from consumers and advisory groups (see Chapter 2), the clinical experiences of the review authors, and evidence from the literature (including qualitative research about outcomes important to those affected (see Chapter 21)). A further driver of outcome selection is consideration of outcomes used in related reviews. Harmonizing outcomes across reviews that address related questions makes it easier to address broader evidence synthesis questions through Overviews of reviews (see Chapter V).

Outcomes considered to be meaningful, and therefore addressed in a review, may not have been reported in the primary studies. For example, quality of life is an important outcome, perhaps the most important outcome, for people considering whether or not to use chemotherapy for advanced cancer, even if the available studies are found to report only survival (see Chapter 18). A further example arises with timing of the outcome measurement, where time points determined as clinically meaningful in a review are not measured in the primary studies. Including and discussing all important outcomes in a review will highlight gaps in the primary research and encourage researchers to address these gaps in future studies.

3.2.4.2 Prioritizing review outcomes

Once a full list of relevant outcomes has been compiled for the review, authors should prioritize the outcomes and select the outcomes of most relevance to the review question. The GRADE approach to assessing the certainty of evidence (see Chapter 14 ) suggests that review authors separate outcomes into those that are ‘critical’, ‘important’ and ‘not important’ for decision making.

The critical outcomes are the essential outcomes for decision making, and are those that would form the basis of a ‘Summary of findings’ table or other summary versions of the review, such as the Abstract or Plain Language Summary. ‘Summary of findings’ tables provide key information about the amount of evidence for important comparisons and outcomes, the quality of the evidence and the magnitude of effect (see Chapter 14, Section 14.1 ). There should be no more than seven outcomes included in a ‘Summary of findings’ table, and those outcomes that will be included in summaries should be specified at the protocol stage. They should generally not include surrogate or interim outcomes. They should not be chosen on the basis of any anticipated or observed magnitude of effect, or because they are likely to have been addressed in the studies to be reviewed. Box 3.2.c summarizes the principal factors to consider when selecting and prioritizing review outcomes.
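To make these prioritization rules concrete, the following Python sketch filters a pre-specified outcome list down to the candidates for a ‘Summary of findings’ table. It assumes each outcome already carries one of the three GRADE importance labels and a surrogate flag; the `Outcome` class and `sof_outcomes` function are hypothetical names. Surrogate outcomes are excluded by default here, reflecting the guidance that they should generally not appear, and the seven-outcome limit is enforced.

```python
from dataclasses import dataclass
from typing import List

# Importance categories used by the GRADE approach.
CRITICAL = "critical"
IMPORTANT = "important"
NOT_IMPORTANT = "not important"

# Upper limit on outcomes in a 'Summary of findings' table.
MAX_SOF_OUTCOMES = 7


@dataclass(frozen=True)
class Outcome:
    name: str
    importance: str  # CRITICAL, IMPORTANT or NOT_IMPORTANT
    surrogate: bool  # True for interim or surrogate outcomes


def sof_outcomes(protocol: List[Outcome]) -> List[Outcome]:
    """Return the outcomes planned for the 'Summary of findings' table.

    Selection uses only the pre-specified importance rating and surrogate
    status; observed or anticipated effect sizes are never consulted.
    """
    chosen = [o for o in protocol if o.importance == CRITICAL and not o.surrogate]
    if len(chosen) > MAX_SOF_OUTCOMES:
        raise ValueError(
            f"{len(chosen)} critical outcomes; at most {MAX_SOF_OUTCOMES} fit the table"
        )
    return chosen


# Hypothetical protocol for a chemotherapy review.
protocol = [
    Outcome("overall survival", CRITICAL, surrogate=False),
    Outcome("quality of life", CRITICAL, surrogate=False),
    Outcome("tumour marker level", IMPORTANT, surrogate=True),
    Outcome("nausea", IMPORTANT, surrogate=False),
]
print([o.name for o in sof_outcomes(protocol)])  # ['overall survival', 'quality of life']
```

The function never receives effect estimates, which mirrors the requirement that summary outcomes are not chosen on the basis of any anticipated or observed magnitude of effect.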

Box 3.2.c Factors to consider when selecting and prioritizing review outcomes

3.2.4.3 Defining and grouping outcomes for synthesis

Table 3.2.c outlines a process for planning for the diversity in outcome measurement that may be encountered in the studies included in a review and which can complicate, and sometimes prevent, synthesis. Research has repeatedly documented inconsistency in the outcomes measured across trials in the same clinical areas (Harrison et al 2016, Williamson et al 2017). This inconsistency occurs across all aspects of outcome measurement, including the broad domains considered, the outcomes measured, the way these outcomes are labelled and defined, and the methods and timing of measurement. For example, a review of outcome measures used in 563 studies of interventions for dementia and mild cognitive impairment found that 321 unique measurement methods were used for 1278 assessments of cognitive outcomes (Harrison et al 2016). Initiatives like COMET ( Core Outcome Measures in Effectiveness Trials ) aim to encourage standardization of outcome measurement across trials (Williamson et al 2017), but these initiatives are comparatively new and review authors will inevitably encounter diversity in outcomes across studies.

The process begins by describing the scope of each outcome domain in sufficient detail to enable outcomes from included studies to be categorized (Table 3.2.c Step 1). This step may be straightforward in areas for which core outcome sets (or equivalent systems) exist (Table 3.2.c Step 2). The methods and timing of outcome measurement also need to be specified, giving consideration to how differences across studies will be handled (Table 3.2.c Steps 3 and 4). Subsequent steps consider options for dealing with studies that report multiple measures within an outcome domain (Table 3.2.c Step 5), planning how outcome domains will be used in synthesis (Table 3.2.c Step 6), and building in contingencies to maximize the potential to synthesize (Table 3.2.c Step 7).

Table 3.2.c A process for planning outcome groups for synthesis

1. Fully specify outcome domains.

For each outcome domain, provide a short label (e.g. cognition, consumer evaluation of care) and describe the domain in sufficient detail to enable eligible outcomes from each included study to be categorized. The definition should be based on the concept (or construct) measured, that is ‘what’ is measured. ‘When’ and ‘how’ the outcome is measured will be considered in subsequent steps.

Outcomes can be defined hierarchically, starting with very broad groups (e.g. physiological/clinical outcomes, life impact, adverse events), then outcome domains (e.g. functioning and perceived health status are domains within ‘life impact’). Within these may be narrower domains (e.g. physical function, cognitive function), and then specific outcome measures (Dodd et al 2018). The level at which outcomes are grouped for synthesis alters the question addressed, and so decisions should be guided by the review objectives.
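A hierarchy of this kind can be represented as a nested mapping. The sketch below is illustrative only: the labels ‘life impact’, ‘functioning’, ‘perceived health status’, ‘physical function’ and ‘cognitive function’ follow the example above, but the narrower domain ‘self-rated health’ and the placement of specific instruments are assumptions. Grouping the same reported measures at level 3 and then at level 2 shows how the chosen level changes what a synthesis would combine.

```python
# Illustrative hierarchy (placement of instruments is hypothetical):
# broad group -> outcome domain -> narrower domain -> specific measures.
HIERARCHY = {
    "life impact": {
        "functioning": {
            "physical function": ["6-minute walk test", "SF-36 physical functioning"],
            "cognitive function": ["MMSE", "ADAS-Cog"],
        },
        "perceived health status": {
            "self-rated health": ["EQ-5D VAS"],
        },
    },
}


def index_measures(tree, path=()):
    """Map every specific measure to the tuple of labels above it."""
    if isinstance(tree, list):
        return {measure: path for measure in tree}
    index = {}
    for label, subtree in tree.items():
        index.update(index_measures(subtree, path + (label,)))
    return index


def group_at(level, measures, index):
    """Group measures by their ancestor label at `level` (1 = broad group)."""
    groups = {}
    for measure in measures:
        key = index[measure][level - 1]
        groups.setdefault(key, []).append(measure)
    return groups


index = index_measures(HIERARCHY)
reported = ["MMSE", "6-minute walk test", "EQ-5D VAS"]
print(group_at(3, reported, index))  # three separate narrower domains
print(group_at(2, reported, index))  # MMSE and walk test now share 'functioning'
```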

In specifying outcome domains:

In a review of computer-based interventions for sexual health promotion, three broad outcome domains were defined (cognitions, behaviours, biological) based on a conceptual model of how the intervention might work. Each domain comprised more specific domains and outcomes (e.g. condom use, seeking health services such as STI testing); listing these helped define the broad domains and guided categorization of the diverse outcomes reported in included studies (Bailey et al 2010).

In a protocol for a review of social media interventions for improving health, the rationale for synthesizing broad groupings of outcomes (e.g. health behaviours, physical health) was based on prediction of a common underlying mechanism by which the intervention would work, and the review objective, which focused on overall health rather than specific outcomes (Welch et al 2018).

2. Determine whether there is an existing system for identifying and grouping important outcomes.

Systems for categorizing outcomes include core outcome sets, including those developed by the COMET and ICHOM initiatives, and outcome taxonomies (Dodd et al 2018). These systems define agreed outcomes that should be measured for specific conditions (Williamson et al 2017). They can be used to standardize the varied outcome labels used across studies and enable grouping and comparison (Kirkham et al 2013). Agreed terminology may help decision makers interpret review findings.

The COMET website provides a database of core outcome sets agreed or in development. Some Cochrane Groups have developed their own outcome sets. While the availability of outcome sets and taxonomies varies across clinical areas, several taxonomies exist for specifying broad outcome domains (e.g. Dodd et al 2018, ICHOM 2018).
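In practice, a core outcome set or taxonomy works partly as a lookup from the labels used in trial reports to agreed outcome names. The mapping below is a hypothetical stand-in for such a system, not content taken from COMET, ICHOM or any published taxonomy; labels it cannot match are set aside for the review team to categorize by hand.

```python
import re

# Hypothetical lookup from labels found in study reports to agreed outcome names.
AGREED_LABELS = {
    "quality of life": "quality of life",
    "qol": "quality of life",
    "hrqol": "quality of life",
    "health-related quality of life": "quality of life",
    "adverse events": "adverse events",
    "side effects": "adverse events",
}


def normalize(label: str) -> str:
    """Lower-case and collapse whitespace so trivial variants match."""
    return re.sub(r"\s+", " ", label.strip().lower())


def standardize(labels):
    """Split study labels into mapped (label -> agreed name) and unmapped."""
    mapped, unmapped = {}, []
    for raw in labels:
        agreed = AGREED_LABELS.get(normalize(raw))
        if agreed is None:
            unmapped.append(raw)  # flag for manual review by the review team
        else:
            mapped[raw] = agreed
    return mapped, unmapped


mapped, unmapped = standardize(["HRQoL", "Side  effects", "pain intensity"])
print(mapped)    # {'HRQoL': 'quality of life', 'Side  effects': 'adverse events'}
print(unmapped)  # ['pain intensity']
```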

In a review of combined diet and exercise for preventing gestational diabetes mellitus, a core outcome set agreed by the Cochrane Pregnancy and Childbirth group was used (Shepherd et al 2017).

In a review of decision aids for people facing health treatment or screening decisions (Stacey et al 2017), outcome domains were based on criteria for evaluating decision aids agreed in the International Patient Decision Aid Standards (IPDAS). Doing so helped to assess the use of aids across diverse clinical decisions.

The Cochrane Consumers and Communication Group has an agreed taxonomy to guide specification of outcomes of importance in evaluating communication interventions (Cochrane Consumers & Communication Group).

3. Define the outcome time points.

A key attribute of defining an outcome is specifying the time of measurement. In reviews, time frames, and not specific time points, are often specified to handle the likely diversity in timing of outcome measurement across studies (e.g. a ‘medium-term’ time frame might be defined as including outcomes measured between 6 and 12 months).
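Time frames can be applied mechanically once their boundaries are fixed in the protocol. In this sketch only the ‘medium-term’ window (6 to 12 months) comes from the text; the short-term and long-term cut-offs, the treatment of boundary values, the origin of the time scale, and the rule of keeping the latest time point when a study reports several within one frame are assumptions made for illustration.

```python
# Only 'medium-term' (6 to 12 months) comes from the text; the other
# boundaries are hypothetical and would be pre-specified in the protocol.
def time_frame(months: float) -> str:
    """Assign a measurement time (months from an assumed origin) to a frame."""
    if months < 0:
        raise ValueError("measurement time cannot be negative")
    if months < 6:
        return "short-term"
    if months <= 12:
        return "medium-term"
    return "long-term"


def pick_per_frame(timepoints):
    """Keep one time point per frame; this hypothetical rule keeps the latest."""
    chosen = {}
    for t in sorted(timepoints):
        chosen[time_frame(t)] = t  # later time points overwrite earlier ones
    return chosen


print([time_frame(m) for m in (3, 6, 9, 12, 18)])
print(pick_per_frame([3, 6, 12, 24]))  # {'short-term': 3, 'medium-term': 12, 'long-term': 24}
```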

In specifying outcome timing:

In a review of psychological therapies for panic disorder, the main outcomes were ‘short-term’ (≤6 months from treatment commencement). ‘Long-term’ outcomes (>6 months from treatment commencement) were considered important, but not specified as critical because of concerns about participant attrition (Pompoli et al 2018).

In contrast, in a review of antidepressants, a clinically meaningful time frame of 6 to 12 months might be specified for the critical outcome ‘depression’, since this is the recommended treatment duration. However, it may be anticipated that many studies will be of shorter duration with short-term follow-up, so an additional important outcome of ‘depression (<3 months)’ might also be specified.

4. Specify the measurement tool or measurement method.

For each outcome domain, specify:

Minimum criteria for inclusion of a measure may include:

adequate evidence of reliability (e.g. consistent scores across time and raters when the outcome is unchanged), and validity (e.g. comparable results to similar measures, including a gold standard if available); and

Measures may be identified from core outcome sets (e.g. Williamson et al 2017, ICHOM 2018) or systematic reviews of instruments (see COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative for a database of examples).

In a review of interventions to support women to stop smoking, objective (biochemically validated) and subjective (self-report) measures of smoking cessation were specified separately to examine bias due to the method used to measure the outcome (Step 6) (Chamberlain et al 2017).

In a review of high-intensity versus low-intensity exercise for osteoarthritis, measures of pain were selected based on relevance of the content and properties of the measurement tool (i.e. evidence of validity and reliability) (Regnaux et al 2015).

5. Specify how multiplicity of outcomes will be handled.

For a particular domain, multiple outcomes within a study may be available for inclusion. This may arise from:

Effects of the intervention calculated from these different sources of multiplicity are statistically dependent, since they have been calculated using the same participants. To deal with this dependency, select only one outcome per study for a particular comparison, or use a meta-analysis method that accounts for the dependency (see Step 6).

Pre-specify the method of selection from multiple outcomes or measures in the protocol, using an approach that is independent of the result (see ) (López-López et al 2018). Document all eligible outcomes or measures in the ‘Characteristics of included studies’ table, noting which was selected and why.
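A selection hierarchy can be written so that it is, by construction, independent of the results: the function below receives only the names of the measures each study reports, never their effect estimates. The pain-domain hierarchy, the `select_measure` function and the study data are all hypothetical. The returned record lists all eligible measures together with the one selected and the reason, which is the information the ‘Characteristics of included studies’ table should capture.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Selection:
    study: str
    selected: Optional[str]
    eligible: List[str] = field(default_factory=list)
    reason: str = ""


def select_measure(study: str, reported: List[str], hierarchy: List[str]) -> Selection:
    """Choose one measure per study from a pre-specified hierarchy.

    Only measure names are passed in; effect estimates are deliberately
    absent, so the choice cannot depend on the results.
    """
    eligible = [m for m in reported if m in hierarchy]
    for rank, measure in enumerate(hierarchy, start=1):
        if measure in reported:
            return Selection(study, measure, eligible, f"highest available (rank {rank})")
    return Selection(study, None, eligible, "no measure from the hierarchy reported")


# Hypothetical hierarchy for a pain domain, fixed in the protocol.
PAIN_HIERARCHY = ["pain VAS", "pain NRS", "WOMAC pain subscale"]

for study, reported in {
    "Study A": ["pain NRS", "pain VAS"],
    "Study B": ["WOMAC pain subscale", "pain NRS"],
    "Study C": ["function score"],
}.items():
    s = select_measure(study, reported, PAIN_HIERARCHY)
    print(s.study, "->", s.selected, "|", s.reason, "| eligible:", s.eligible)
```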

Multiplicity can arise from the reporting of multiple analyses of the same outcome (e.g. analyses that do and do not adjust for prognostic factors; intention-to-treat and per-protocol analyses) and multiple reports of the same study (e.g. journal articles, conference abstracts). Approaches for dealing with this type of multiplicity should also be specified in the protocol (López-López et al 2018).

It may be difficult to anticipate all forms of multiplicity when developing a protocol. Any post-hoc approaches used to select outcomes or results should be noted at the beginning of the Methods section or, if extensive, in additional supplementary material.

The following hierarchy was specified to select one outcome per domain in a review examining the effects of portion, package or tableware size (Hollands et al 2015):

Selection of the outcome was made blinded to the results. All available outcome measures were documented in the ‘Characteristics of included studies’ table.

In a review of audit and feedback for healthcare providers, the outcome domains were ‘provider performance’ (e.g. compliance with recommended use of a laboratory test) and ‘patient health outcomes’ (e.g. smoking status, blood pressure) (Ivers et al 2012). For each domain, outcomes were selected using the following hierarchy:

6. Plan how the specified outcome domains will be used in the synthesis.

When different measurement methods or tools have been used across studies, consideration must be given to how these will be synthesized. Options include the following.

There may be increased heterogeneity, warranting use of a random-effects model (see Chapter 10, Section 10.10.4).

In a review of interventions to support women to stop smoking, separate outcome domains were specified for biochemically validated measures of smoking and self-report measures. The two domains were meta-analysed together, but sensitivity analyses were undertaken restricting the meta-analyses to studies with only biochemically validated outcomes, to examine if the results were robust to the method of measurement (Chamberlain et al 2017).
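The sensitivity analysis in this example amounts to pooling twice: once with all studies and once with only those using the more rigorous measurement method. A minimal sketch follows, assuming hypothetical log risk ratios, standard errors and validation flags. For brevity it uses an inverse-variance fixed-effect model; where heterogeneity is expected, a random-effects model may be more appropriate.

```python
import math


def pool_fixed(effects):
    """Inverse-variance fixed-effect pooling of (estimate, standard error) pairs.

    Returns the pooled estimate and its 95% confidence interval.
    """
    weights = [1 / se ** 2 for _, se in effects]
    total = sum(weights)
    est = sum(w * y for w, (y, _) in zip(weights, effects)) / total
    se = math.sqrt(1 / total)
    return est, est - 1.96 * se, est + 1.96 * se


# Hypothetical studies: log risk ratio, its standard error, and whether
# abstinence was biochemically validated.
studies = [
    {"id": "S1", "log_rr": 0.30, "se": 0.15, "validated": True},
    {"id": "S2", "log_rr": 0.45, "se": 0.20, "validated": False},
    {"id": "S3", "log_rr": 0.10, "se": 0.12, "validated": True},
    {"id": "S4", "log_rr": 0.60, "se": 0.25, "validated": False},
]

main = pool_fixed([(s["log_rr"], s["se"]) for s in studies])
sensitivity = pool_fixed([(s["log_rr"], s["se"]) for s in studies if s["validated"]])

for label, (est, lo, hi) in [("all studies", main), ("validated only", sensitivity)]:
    print(f"{label}: RR {math.exp(est):.2f} (95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f})")
```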

In a review of psychological therapies for youth internalizing and externalizing disorders, most studies contributed multiple effects (e.g. in one meta-analysis of 443 studies, there were 5139 included measures). The authors used multilevel modelling to address the dependency among multiple effects contributed from each study (Weisz et al 2017).
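Multilevel modelling of the kind used by Weisz et al does not fit in a short example. A simpler way to respect the dependency, named here plainly as a different technique, is to average the effect estimates within each study and compute the variance of that average under an assumed correlation r between them. Because r is rarely reported, the sketch below treats it as an assumption to vary in sensitivity analyses; the `composite_effect` function and the data are hypothetical.

```python
import math


def composite_effect(estimates, variances, r):
    """Average m correlated effect estimates from one study.

    `r` is an assumed common correlation between every pair of estimates.
    Var(mean) = (1/m)^2 * (sum of variances + sum over i != j of r*sd_i*sd_j).
    Returns (mean effect, variance of the mean).
    """
    m = len(estimates)
    if m == 0 or m != len(variances):
        raise ValueError("need matching, non-empty estimates and variances")
    sds = [math.sqrt(v) for v in variances]
    total = sum(variances)
    for i in range(m):
        for j in range(m):
            if i != j:
                total += r * sds[i] * sds[j]
    return sum(estimates) / m, total / m ** 2


# Hypothetical study reporting two measures of the same domain (SMDs).
for r in (0.0, 0.5, 1.0):
    mean, var = composite_effect([0.20, 0.40], [0.04, 0.04], r)
    print(f"r = {r}: mean = {mean:.2f}, variance = {var:.3f}")
```

With r = 1 the average is no more precise than a single measure; with r = 0 it is treated as if the measures came from independent samples, which overstates precision when they come from the same participants.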

7. Where possible, build in contingencies by specifying both specific and broader outcome domains.

Consider building in flexibility to group outcomes at different levels or time intervals. Inflexible approaches can undermine the potential to synthesize, especially when few studies are anticipated, or there is likely to be diversity in the way outcomes are defined and measured and the timing of measurement. If insufficient studies report data for meaningful synthesis using the narrower domains, the broader domains can be used (see also ).

Consider a hypothetical review aiming to examine the effects of behavioural psychological interventions for the treatment of overweight and obese adults. A specific outcome is body mass index (BMI). However, also specifying a broader outcome domain ‘indicator of body mass’ will facilitate synthesis in the circumstance where few studies report BMI, but most report an indicator of body mass (such as weight or waist circumference). This is particularly important when few studies may be anticipated or there is expected diversity in the measurement methods or tools.
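The fallback described here can be expressed as a simple rule. In the sketch below the `choose_domain` function, the threshold `min_studies` and the study data are hypothetical; the rule itself would be stated in the protocol rather than decided after seeing which outcomes are reported.

```python
def choose_domain(studies, narrow, broad_label, broad_measures, min_studies):
    """Use the narrow outcome if enough studies report it; else fall back.

    `studies` maps study IDs to the set of measures each one reports.
    `min_studies` is a hypothetical threshold fixed in the protocol.
    """
    narrow_ids = [sid for sid, measures in studies.items() if narrow in measures]
    if len(narrow_ids) >= min_studies:
        return narrow, narrow_ids
    broad_ids = [sid for sid, measures in studies.items() if measures & broad_measures]
    return broad_label, broad_ids


studies = {
    "S1": {"BMI", "weight"},
    "S2": {"weight"},
    "S3": {"waist circumference"},
    "S4": {"BMI"},
}
domain, ids = choose_domain(
    studies,
    narrow="BMI",
    broad_label="indicator of body mass",
    broad_measures={"BMI", "weight", "waist circumference"},
    min_studies=3,
)
print(domain, ids)  # indicator of body mass ['S1', 'S2', 'S3', 'S4']
```

When the broader domain is used, measures on different scales (BMI, weight, waist circumference) would typically be combined as standardized mean differences, and a study reporting several of them still needs a single measure selected, as in Step 5.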

3.3 Determining which study designs to include

Some study designs are more appropriate than others for answering particular questions. Authors need to consider a priori what study designs are likely to provide reliable data with which to address the objectives of their review ( MECIR Box 3.3.a ). Sections 3.3.1 and 3.3.2 cover randomized and non-randomized designs for assessing treatment effects; Chapter 17, Section 17.2.5  discusses other study designs in the context of addressing intervention complexity.

MECIR Box 3.3.a Relevant expectations for conduct of intervention reviews

Predefining study designs ( )

Predefined, unambiguous eligibility criteria are a fundamental prerequisite for a systematic review. This is particularly important when non-randomized studies are considered. Some labels commonly used to define study designs can be ambiguous. For example, a ‘double blind’ study may not make it clear who was blinded; a ‘case-control’ study may be nested within a cohort, or be undertaken in a cross-sectional manner; or a ‘prospective’ study may have only some features defined or undertaken prospectively.

Justifying choice of study designs ( )

It might be difficult to address some interventions or some outcomes in randomized trials. Authors should be able to justify why they have chosen either to restrict the review to randomized trials or to include non-randomized studies. The particular study designs included should be justified with regard to appropriateness to the review question and with regard to potential for bias.

3.3.1 Including randomized trials

Because Cochrane Reviews address questions about the effects of health care, they focus primarily on randomized trials, and randomized trials should be included if they are feasible for the interventions of interest (MECIR Box 3.3.b). Randomization is the only way to prevent systematic differences between baseline characteristics of participants in different intervention groups in terms of both known and unknown (or unmeasured) confounders (see Chapter 8), and claims about cause and effect can be based on their findings with far more confidence than almost any other type of study. For clinical interventions, deciding who receives an intervention and who does not is influenced by many factors, including prognostic factors. Empirical evidence suggests that, on average, non-randomized studies produce effect estimates that indicate more extreme benefits of health care than randomized trials. However, the extent, and even the direction, of the bias is difficult to predict. These issues are discussed at length in Chapter 24, which provides guidance on when it might be appropriate to include non-randomized studies in a Cochrane Review.

Practical considerations also motivate the restriction of many Cochrane Reviews to randomized trials. In recent decades there has been considerable investment internationally in establishing infrastructure to index and identify randomized trials. Cochrane has contributed to these efforts, including building up and maintaining a database of randomized trials, developing search filters to aid their identification, working with MEDLINE to improve tagging and identification of randomized trials, and using machine learning and crowdsourcing to reduce author workload in identifying randomized trials ( Chapter 4, Section 4.6.6.2 ). The same scale of organizational investment has not (yet) been matched for the identification of other types of studies. Consequently, identifying and including other types of studies may require additional efforts to identify studies and to keep the review up to date, and might increase the risk that the result of the review will be influenced by publication bias. This issue and other bias-related issues that are important to consider when defining types of studies are discussed in detail in Chapter 7 and Chapter 13 .

Specific aspects of study design and conduct should be considered when defining eligibility criteria, even if the review is restricted to randomized trials. For example, review authors should decide whether cluster-randomized trials (Chapter 23, Section 23.1) and crossover trials (Chapter 23, Section 23.2) are eligible, and whether to apply other eligibility criteria such as use of a placebo comparison group, evaluation of outcomes blinded to allocation sequence, or a minimum period of follow-up. There will always be a trade-off between restrictive study design criteria (which might result in the inclusion of studies that are at low risk of bias, but very few in number) and more liberal design criteria (which might result in the inclusion of more studies, but at a higher risk of bias). Furthermore, excessively broad criteria might result in the inclusion of misleading evidence. If, for example, interest focuses on whether a therapy improves survival in patients with a chronic condition, it might be inappropriate to look at studies of very short duration, except to make explicit the point that they cannot address the question of interest.
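Eligibility decisions of this kind can be made explicit and auditable by recording, for each trial, the design features the criteria refer to. The sketch below is hypothetical throughout: the `Criteria` and `Trial` classes, their field names, the choice to admit cluster-randomized but not crossover trials, and the 12-month follow-up minimum are assumptions. Returning the reasons for exclusion makes it straightforward to document why each trial was excluded.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Criteria:
    allow_cluster: bool = True
    allow_crossover: bool = False
    require_placebo: bool = False
    require_blinded_assessment: bool = False
    min_follow_up_months: float = 0


@dataclass(frozen=True)
class Trial:
    design: str  # "parallel", "cluster" or "crossover"
    placebo_comparator: bool
    blinded_assessment: bool  # outcome assessors blinded to allocation
    follow_up_months: float


def check(trial: Trial, c: Criteria) -> Tuple[bool, List[str]]:
    """Return (eligible?, reasons for exclusion) for a randomized trial."""
    reasons = []
    if trial.design == "cluster" and not c.allow_cluster:
        reasons.append("cluster-randomized trials not eligible")
    if trial.design == "crossover" and not c.allow_crossover:
        reasons.append("crossover trials not eligible")
    if c.require_placebo and not trial.placebo_comparator:
        reasons.append("no placebo comparison group")
    if c.require_blinded_assessment and not trial.blinded_assessment:
        reasons.append("outcome assessment not blinded")
    if trial.follow_up_months < c.min_follow_up_months:
        reasons.append(f"follow-up under {c.min_follow_up_months:g} months")
    return (not reasons, reasons)


criteria = Criteria(min_follow_up_months=12)
print(check(Trial("parallel", True, True, 24), criteria))
print(check(Trial("crossover", True, True, 3), criteria))
```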

MECIR Box 3.3.b Relevant expectations for conduct of intervention reviews

Including randomized trials ( )

Include randomized trials as eligible for inclusion in the review if it is feasible to conduct them to evaluate the interventions and outcomes of interest.

Randomized trials are the best study design for evaluating the efficacy of interventions. If it is feasible to conduct them to evaluate questions that are being addressed by the review, they must be considered eligible for the review. However, appropriate exclusion criteria may be put in place, for example regarding length of follow-up.

3.3.2 Including non-randomized studies

The decision about whether non-randomized studies (and which types) will be included is made alongside the formulation of the review PICO. The main drivers that may lead to the inclusion of non-randomized studies include: (i) when randomized trials are unable to address the effects of the intervention on harm and long-term outcomes or in specific populations or settings; or (ii) for interventions that cannot be randomized (e.g. policy change introduced in a single or small number of jurisdictions) (see Chapter 24). Cochrane, in collaboration with others, has developed guidance for review authors to support their decision about when to look for and include non-randomized studies (Schünemann et al 2013).

Non-randomized designs have the commonality of not using randomization to allocate units to comparison groups, but their different design features mean that they are variable in their susceptibility to bias. Eligibility criteria should be based on explicit study design features, and not the study labels applied by the primary researchers (e.g. case-control, cohort), which are often used inconsistently (Reeves et al 2017; see Chapter 24 ).
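The point about labels can be illustrated by coding eligibility from design features alone. The three features and the decision rule below are hypothetical and are not the Cochrane Non-randomized Studies Methods Group taxonomy; the example shows two studies that their authors both label ‘cohort’ reaching different decisions because their features differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignFeatures:
    label: str                # label used by the primary researchers
    randomized: bool          # allocation by a random process?
    concurrent_control: bool  # comparison group followed over the same period?
    baseline_measured: bool   # outcome measured before the intervention?


def eligible_by_features(s: DesignFeatures) -> bool:
    """Hypothetical rule: randomized, or non-randomized with a concurrent
    control group and a baseline outcome measurement. The label is ignored."""
    if s.randomized:
        return True
    return s.concurrent_control and s.baseline_measured


a = DesignFeatures("cohort", randomized=False, concurrent_control=True, baseline_measured=True)
b = DesignFeatures("cohort", randomized=False, concurrent_control=False, baseline_measured=True)
print(eligible_by_features(a), eligible_by_features(b))  # True False
```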

When non-randomized studies are included, review authors should consider how the studies will be grouped and used in the synthesis. The Cochrane Non-randomized Studies Methods Group taxonomy of design features (see Chapter 24 ) may provide a basis for grouping together studies that are expected to have similar inferential strength and for providing a consistent language for describing the study design.

Once decisions have been made about grouping study designs, planning of how these will be used in the synthesis is required. Review authors need to decide whether it is useful to synthesize results from non-randomized studies and, if so, whether results from randomized trials and non-randomized studies should be included in the same synthesis (for the purpose of examining whether study design explains heterogeneity among the intervention effects), or whether the effects should be synthesized in separate comparisons (Valentine and Thompson 2013). Decisions should be made for each of the different types of non-randomized studies under consideration. Review authors might anticipate increased heterogeneity when non-randomized studies are synthesized, and adoption of a meta-analysis model that encompasses heterogeneity is wise (Valentine and Thompson 2013) (such as a random effects model, see Chapter 10, Section 10.10.4 ). For further discussion of non-randomized studies, see Chapter 24 .
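As a sketch of synthesizing design groups in separate comparisons, the code below pools each group with a random-effects model, using the DerSimonian-Laird estimator of the between-study variance tau². This estimator is one common choice, not necessarily the one a given review should use, and the `pool_by_design` helper and data (log odds ratios with their variances) are hypothetical.

```python
import math
from collections import defaultdict


def dersimonian_laird(estimates, variances):
    """Random-effects pooling using the DerSimonian-Laird estimate of tau^2.

    Returns (pooled estimate, standard error, tau^2).
    """
    k = len(estimates)
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0
    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau2


def pool_by_design(studies):
    """Pool each design group separately rather than in one synthesis."""
    groups = defaultdict(lambda: ([], []))
    for design, y, v in studies:
        groups[design][0].append(y)
        groups[design][1].append(v)
    return {design: dersimonian_laird(ys, vs) for design, (ys, vs) in groups.items()}


studies = [
    ("randomized", -0.20, 0.04),
    ("randomized", -0.35, 0.05),
    ("randomized", -0.10, 0.06),
    ("non-randomized", -0.60, 0.08),
    ("non-randomized", -0.05, 0.10),
]
for design, (est, se, tau2) in pool_by_design(studies).items():
    print(f"{design}: {est:.2f} (SE {se:.2f}), tau^2 = {tau2:.3f}")
```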

3.4 Eligibility based on publication status and language

Chapter 4 contains detailed guidance on how to identify studies from a range of sources including, but not limited to, those in peer-reviewed journals. In general, a strategy to include studies reported in all types of publication will reduce bias ( Chapter 7 ). There would need to be a compelling argument for the exclusion of studies on the basis of their publication status ( MECIR Box 3.4.a ), including unpublished studies, partially published studies, and studies published in ‘grey’ literature sources. Given the additional challenge in obtaining unpublished studies, it is possible that any unpublished studies identified in a given review may be an unrepresentative subset of all the unpublished studies in existence. However, the bias this introduces is of less concern than the bias introduced by excluding all unpublished studies, given what is known about the impact of reporting biases (see Chapter 13 on bias due to missing studies, and Chapter 4, Section 4.3 for a more detailed discussion of searching for unpublished and grey literature).

Likewise, while searching for, and analysing, studies in any language can be extremely resource-intensive, review authors should consider carefully the implications for bias (and equity, see Chapter 16 ) if they restrict eligible studies to those published in one specific language (usually English). See Chapter 4, Section 4.4.5 , for further discussion of language and other restrictions while searching.

MECIR Box 3.4.a Relevant expectations for conduct of intervention reviews

Excluding studies based on publication status ( )

Obtaining and including data from unpublished studies (including grey literature) can reduce the effects of publication bias. However, the unpublished studies that can be located may be an unrepresentative sample of all unpublished studies.

3.5 Chapter information

Authors: Joanne E McKenzie, Sue E Brennan, Rebecca E Ryan, Hilary J Thomson, Renea V Johnston, James Thomas

Acknowledgements: This chapter builds on earlier versions of the Handbook, in particular Version 5, Chapter 5, edited by Denise O’Connor, Sally Green and Julian Higgins.

Funding: JEM is supported by an Australian National Health and Medical Research Council (NHMRC) Career Development Fellowship (1143429). SEB and RER’s positions are supported by the NHMRC Cochrane Collaboration Funding Program. HJT is funded by the UK Medical Research Council (MC_UU_12017-13 and MC_UU_12017-15) and Scottish Government Chief Scientist Office (SPHSU13 and SPHSU15). RVJ’s position is supported by the NHMRC Cochrane Collaboration Funding Program and Cabrini Institute. JT is supported by the National Institute for Health Research (NIHR) Collaboration for Leadership in Applied Health Research and Care North Thames at Barts Health NHS Trust. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

3.6 References

Bailey JV, Murray E, Rait G, Mercer CH, Morris RW, Peacock R, Cassell J, Nazareth I. Interactive computer-based interventions for sexual health promotion. Cochrane Database of Systematic Reviews 2010; 9 : CD006483.

Bender R, Bunce C, Clarke M, Gates S, Lange S, Pace NL, Thorlund K. Attention should be given to multiplicity issues in systematic reviews. Journal of Clinical Epidemiology 2008; 61 : 857–865.

Buchbinder R, Johnston RV, Rischin KJ, Homik J, Jones CA, Golmohammadi K, Kallmes DF. Percutaneous vertebroplasty for osteoporotic vertebral compression fracture. Cochrane Database of Systematic Reviews 2018; 4 : CD006349.

Caldwell DM, Welton NJ. Approaches for synthesising complex mental health interventions in meta-analysis. Evidence-Based Mental Health 2016; 19 : 16–21.

Chamberlain C, O’Mara-Eves A, Porter J, Coleman T, Perlen S, Thomas J, McKenzie J. Psychosocial interventions for supporting women to stop smoking in pregnancy. Cochrane Database of Systematic Reviews 2017; 2 : CD001055.

Ciciriello S, Johnston RV, Osborne RH, Wicks I, deKroo T, Clerehan R, O’Neill C, Buchbinder R. Multimedia educational interventions for consumers about prescribed and over-the-counter medications. Cochrane Database of Systematic Reviews 2013; 4 : CD008416.

Cochrane Consumers & Communication Group. Outcomes of Interest to the Cochrane Consumers & Communication Group: taxonomy. http://cccrg.cochrane.org/ .

COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative. COSMIN database of systematic reviews of outcome measurement instruments. https://database.cosmin.nl/ .

Coulter A, Entwistle VA, Eccles A, Ryan S, Shepperd S, Perera R. Personalised care planning for adults with chronic or long-term health conditions. Cochrane Database of Systematic Reviews 2015; 3 : CD010523.

Davey J, Turner RM, Clarke MJ, Higgins JPT. Characteristics of meta-analyses and their component studies in the Cochrane Database of Systematic Reviews: a cross-sectional, descriptive analysis. BMC Medical Research Methodology 2011; 11 : 160.

Desroches S, Lapointe A, Ratte S, Gravel K, Legare F, Turcotte S. Interventions to enhance adherence to dietary advice for preventing and managing chronic diseases in adults. Cochrane Database of Systematic Reviews 2013; 2 : CD008722.

Deyo RA, Dworkin SF, Amtmann D, Andersson G, Borenstein D, Carragee E, Carrino J, Chou R, Cook K, DeLitto A, Goertz C, Khalsa P, Loeser J, Mackey S, Panagis J, Rainville J, Tosteson T, Turk D, Von Korff M, Weiner DK. Report of the NIH Task Force on research standards for chronic low back pain. Journal of Pain 2014; 15 : 569–585.

Dodd S, Clarke M, Becker L, Mavergames C, Fish R, Williamson PR. A taxonomy has been developed for outcomes in medical research to help improve knowledge discovery. Journal of Clinical Epidemiology 2018; 96 : 84–92.

Fisher DJ, Carpenter JR, Morris TP, Freeman SC, Tierney JF. Meta-analytical methods to identify who benefits most from treatments: daft, deluded, or deft approach? BMJ 2017; 356 : j573.

Fransen M, McConnell S, Harmer AR, Van der Esch M, Simic M, Bennell KL. Exercise for osteoarthritis of the knee. Cochrane Database of Systematic Reviews 2015; 1 : CD004376.

Guise JM, Chang C, Viswanathan M, Glick S, Treadwell J, Umscheid CA. Systematic reviews of complex multicomponent health care interventions. Report No. 14-EHC003-EF . Rockville, MD: Agency for Healthcare Research and Quality; 2014.

Harrison JK, Noel-Storr AH, Demeyere N, Reynish EL, Quinn TJ. Outcomes measures in a decade of dementia and mild cognitive impairment trials. Alzheimer’s Research and Therapy 2016; 8 : 48.

Hedges LV, Tipton E, Johnson MC. Robust variance estimation in meta-regression with dependent effect size estimates. Research Synthesis Methods 2010; 1 : 39–65.

Hetrick SE, McKenzie JE, Cox GR, Simmons MB, Merry SN. Newer generation antidepressants for depressive disorders in children and adolescents. Cochrane Database of Systematic Reviews 2012; 11 : CD004851.

Higgins JPT, López-López JA, Becker BJ, Davies SR, Dawson S, Grimshaw JM, McGuinness LA, Moore THM, Rehfuess E, Thomas J, Caldwell DM. Synthesizing quantitative evidence in systematic reviews of complex health interventions. BMJ Global Health 2019; 4 : e000858.

Hoffmann T, Glasziou P, Barbour V, Macdonald H. Better reporting of interventions: template for intervention description and replication (TIDieR) checklist and guide. BMJ 2014; 348 : g1687.

Hollands GJ, Shemilt I, Marteau TM, Jebb SA, Lewis HB, Wei Y, Higgins JPT, Ogilvie D. Portion, package or tableware size for changing selection and consumption of food, alcohol and tobacco. Cochrane Database of Systematic Reviews 2015; 9 : CD011045.

Howe TE, Shea B, Dawson LJ, Downie F, Murray A, Ross C, Harbour RT, Caldwell LM, Creed G. Exercise for preventing and treating osteoporosis in postmenopausal women. Cochrane Database of Systematic Reviews 2011; 7 : CD000333.

ICHOM. The International Consortium for Health Outcomes Measurement 2018. http://www.ichom.org/ .

IPDAS. International Patient Decision Aid Standards Collaboration (IPDAS) standards. www.ipdas.ohri.ca .

Ivers N, Jamtvedt G, Flottorp S, Young JM, Odgaard-Jensen J, French SD, O’Brien MA, Johansen M, Grimshaw J, Oxman AD. Audit and feedback: effects on professional practice and healthcare outcomes. Cochrane Database of Systematic Reviews 2012; 6 : CD000259.

Janmaat VT, Steyerberg EW, van der Gaast A, Mathijssen RH, Bruno MJ, Peppelenbosch MP, Kuipers EJ, Spaander MC. Palliative chemotherapy and targeted therapies for esophageal and gastroesophageal junction cancer. Cochrane Database of Systematic Reviews 2017; 11 : CD004063.

Kendrick D, Kumar A, Carpenter H, Zijlstra GAR, Skelton DA, Cook JR, Stevens Z, Belcher CM, Haworth D, Gawler SJ, Gage H, Masud T, Bowling A, Pearl M, Morris RW, Iliffe S, Delbaere K. Exercise for reducing fear of falling in older people living in the community. Cochrane Database of Systematic Reviews 2014; 11 : CD009848.

Kirkham JJ, Gargon E, Clarke M, Williamson PR. Can a core outcome set improve the quality of systematic reviews? A survey of the Co-ordinating Editors of Cochrane Review Groups. Trials 2013; 14 : 21.

Konstantopoulos S. Fixed effects and variance components estimation in three-level meta-analysis. Research Synthesis Methods 2011; 2 : 61–76.

Lamb SE, Becker C, Gillespie LD, Smith JL, Finnegan S, Potter R, Pfeiffer K. Reporting of complex interventions in clinical trials: development of a taxonomy to classify and describe fall-prevention interventions. Trials 2011; 12 : 125.

Lewin S, Hendry M, Chandler J, Oxman AD, Michie S, Shepperd S, Reeves BC, Tugwell P, Hannes K, Rehfuess EA, Welch V, Mckenzie JE, Burford B, Petkovic J, Anderson LM, Harris J, Noyes J. Assessing the complexity of interventions within systematic reviews: development, content and use of a new tool (iCAT_SR). BMC Medical Research Methodology 2017; 17 : 76.

López-López JA, Page MJ, Lipsey MW, Higgins JPT. Dealing with multiplicity of effect sizes in systematic reviews and meta-analyses. Research Synthesis Methods 2018; 9 : 336–351.

Mavridis D, Salanti G. A practical introduction to multivariate meta-analysis. Statistical Methods in Medical Research 2013; 22 : 133–158.

Michie S, van Stralen M, West R. The Behaviour Change Wheel: a new method for characterising and designing behaviour change interventions. Implementation Science 2011; 6 : 42.

Michie S, Richardson M, Johnston M, Abraham C, Francis J, Hardeman W, Eccles MP, Cane J, Wood CE. The behavior change technique taxonomy (v1) of 93 hierarchically clustered techniques: building an international consensus for the reporting of behavior change interventions. Annals of Behavioral Medicine 2013; 46 : 81–95.

Moraes VY, Lenza M, Tamaoki MJ, Faloppa F, Belloti JC. Platelet-rich therapies for musculoskeletal soft tissue injuries. Cochrane Database of Systematic Reviews 2014; 4 : CD010071.

O'Neill J, Tabish H, Welch V, Petticrew M, Pottie K, Clarke M, Evans T, Pardo Pardo J, Waters E, White H, Tugwell P. Applying an equity lens to interventions: using PROGRESS ensures consideration of socially stratifying factors to illuminate inequities in health. Journal of Clinical Epidemiology 2014; 67 : 56–64.

Pompoli A, Furukawa TA, Imai H, Tajika A, Efthimiou O, Salanti G. Psychological therapies for panic disorder with or without agoraphobia in adults: a network meta-analysis. Cochrane Database of Systematic Reviews 2016; 4 : CD011004.

Pompoli A, Furukawa TA, Efthimiou O, Imai H, Tajika A, Salanti G. Dismantling cognitive-behaviour therapy for panic disorder: a systematic review and component network meta-analysis. Psychological Medicine 2018; 48 : 1–9.

Reeves BC, Wells GA, Waddington H. Quasi-experimental study designs series-paper 5: a checklist for classifying studies evaluating the effects on health interventions – a taxonomy without labels. Journal of Clinical Epidemiology 2017; 89 : 30–42.

Regnaux J-P, Lefevre-Colau M-M, Trinquart L, Nguyen C, Boutron I, Brosseau L, Ravaud P. High-intensity versus low-intensity physical activity or exercise in people with hip or knee osteoarthritis. Cochrane Database of Systematic Reviews 2015; 10 : CD010203.

Richards SH, Anderson L, Jenkinson CE, Whalley B, Rees K, Davies P, Bennett P, Liu Z, West R, Thompson DR, Taylor RS. Psychological interventions for coronary heart disease. Cochrane Database of Systematic Reviews 2017; 4 : CD002902.

Safi S, Korang SK, Nielsen EE, Sethi NJ, Feinberg J, Gluud C, Jakobsen JC. Beta-blockers for heart failure. Cochrane Database of Systematic Reviews 2017; 12 : CD012897.

Santesso N, Carrasco-Labra A, Brignardello-Petersen R. Hip protectors for preventing hip fractures in older people. Cochrane Database of Systematic Reviews 2014; 3 : CD001255.

Shepherd E, Gomersall JC, Tieu J, Han S, Crowther CA, Middleton P. Combined diet and exercise interventions for preventing gestational diabetes mellitus. Cochrane Database of Systematic Reviews 2017; 11 : CD010443.

Squires J, Valentine J, Grimshaw J. Systematic reviews of complex interventions: framing the review question. Journal of Clinical Epidemiology 2013; 66 : 1215–1222.

Stacey D, Légaré F, Lewis K, Barry MJ, Bennett CL, Eden KB, Holmes-Rovner M, Llewellyn-Thomas H, Lyddiatt A, Thomson R, Trevena L. Decision aids for people facing health treatment or screening decisions. Cochrane Database of Systematic Reviews 2017; 4 : CD001431.

Stroke Unit Trialists Collaboration. Organised inpatient (stroke unit) care for stroke. Cochrane Database of Systematic Reviews 2013; 9 : CD000197.

Taylor AJ, Jones LJ, Osborn DA. Zinc supplementation of parenteral nutrition in newborn infants. Cochrane Database of Systematic Reviews 2017; 2 : CD012561.

Valentine JC, Thompson SG. Issues relating to confounding and meta-analysis when including non-randomized studies in systematic reviews on the effects of interventions. Research Synthesis Methods 2013; 4 : 26–35.

Vaona A, Banzi R, Kwag KH, Rigon G, Cereda D, Pecoraro V, Tramacere I, Moja L. E-learning for health professionals. Cochrane Database of Systematic Reviews 2018; 1 : CD011736.

Verheyden GSAF, Weerdesteyn V, Pickering RM, Kunkel D, Lennon S, Geurts ACH, Ashburn A. Interventions for preventing falls in people after stroke. Cochrane Database of Systematic Reviews 2013; 5 : CD008728.

Weisz JR, Kuppens S, Ng MY, Eckshtain D, Ugueto AM, Vaughn-Coaxum R, Jensen-Doss A, Hawley KM, Krumholz Marchette LS, Chu BC, Weersing VR, Fordwood SR. What five decades of research tells us about the effects of youth psychological therapy: a multilevel meta-analysis and implications for science and practice. American Psychologist 2017; 72 : 79–117.

Welch V, Petkovic J, Simeon R, Presseau J, Gagnon D, Hossain A, Pardo Pardo J, Pottie K, Rader T, Sokolovski A, Yoganathan M, Tugwell P, DesMeules M. Interactive social media interventions for health behaviour change, health outcomes, and health equity in the adult population. Cochrane Database of Systematic Reviews 2018; 2 : CD012932.

Welton NJ, Caldwell DM, Adamopoulos E, Vedhara K. Mixed treatment comparison meta-analysis of complex interventions: psychological interventions in coronary heart disease. American Journal of Epidemiology 2009; 169 : 1158–1165.

Williamson PR, Altman DG, Bagley H, Barnes KL, Blazeby JM, Brookes ST, Clarke M, Gargon E, Gorst S, Harman N, Kirkham JJ, McNair A, Prinsen CAC, Schmitt J, Terwee CB, Young B. The COMET Handbook: version 1.0. Trials 2017; 18 : 280.




Literature searching and finding evidence.


Establish your Inclusion and Exclusion criteria


These criteria help you decide which pieces of evidence (for example, which primary research studies) will/will not be included in your work. Using specific criteria will help make sure your final review is as unbiased, transparent and ethical as possible.

How to establish your Inclusion and Exclusion criteria

To establish your criteria, you need to define each aspect of your question to clarify what you are focusing on, and consider whether there are any variations you also wish to explore. This is where frameworks like PICO help:

Example: Alternatives to drugs for controlling headaches in children.

Using the PICO structure you clarify what aspects you are most interested in. Here are some examples to consider:

    Children

A specific age group? Teenagers and adolescents?

    Alternatives to drugs

What alternatives are there? Complementary therapies? Alternative medicines? Changes in lifestyle? All three?

If you decide to focus on 'complementary therapies' do you want to examine all therapies or a specific therapy like holistic therapy?

    Drugs

All drugs that treat headaches, or a group of drugs, or a specific drug?

    Headaches

All types of headaches, or a specific type such as tension headaches or migraines?

The aspects of the topic you decide to focus on are the inclusion criteria.

The aspects you don't wish to include are the exclusion criteria.

  • Last Updated: Aug 23, 2024 3:26 PM
  • URL: https://libguides.city.ac.uk/SHS-Litsearchguide


Systematic Reviews for Social Sciences


Define Inclusion/Exclusion Criteria


One of the features that distinguishes a systematic review from a narrative review is the pre-specification of criteria for including and excluding studies in the review (eligibility criteria). Explicit criteria, based on the review’s scope and question(s), are used to include and exclude studies. 

A large number of references (study titles and abstracts) will have been found at the searching stage of the review, and a proportion of these will look as though they are relevant to the review's research questions. Having explicit criteria against which to assess studies therefore makes screening more time-efficient.

More importantly, clear, consistent rules about which studies are being used to answer the review's specific research questions help to avoid hidden bias.

Each study needs to be compared against the same criteria. To be included in the review, a study needs to meet all inclusion criteria and not meet any exclusion criteria. Inclusion (eligibility) criteria usually cover participants, interventions and comparisons, and often study design. Outcomes are usually not part of the criteria, though some reviews do legitimately restrict eligibility to specific outcomes.
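The "meet all inclusion criteria, meet no exclusion criteria" rule can be expressed as a small screening function. This is a minimal Python sketch, not part of the original guide: the `Study` fields, the example criteria and the 2020 cut-off are hypothetical placeholders chosen to echo the examples on this page.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical study record; these fields are illustrative, not a standard schema.
@dataclass
class Study:
    title: str
    design: str          # e.g. "RCT", "cohort", "qualitative"
    year: int
    language: str
    peer_reviewed: bool


Criterion = Callable[[Study], bool]


def is_eligible(study: Study, include: list[Criterion], exclude: list[Criterion]) -> bool:
    """Eligible only if the study meets every inclusion criterion
    and meets none of the exclusion criteria."""
    return all(c(study) for c in include) and not any(c(study) for c in exclude)


# Hypothetical criteria echoing the examples in this guide.
include = [
    lambda s: s.design == "RCT",
    lambda s: s.year >= 2020,
]
exclude = [
    lambda s: s.language != "English",
    lambda s: not s.peer_reviewed,
]

studies = [
    Study("Trial A", "RCT", 2022, "English", True),
    Study("Cohort B", "cohort", 2023, "English", True),
    Study("Trial C", "RCT", 2021, "German", True),
]

eligible = [s.title for s in studies if is_eligible(s, include, exclude)]
print(eligible)  # ['Trial A']
```

Keeping each criterion as a separate named check, rather than one large condition, makes it easier to record which criterion excluded a study when documenting the selection process.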

For example, the inclusion criteria for a systematic review may be determined using the ECLIPSE framework:

  • Expectation - identify best practices for information literacy instruction
  • Client Group - Higher education students
  • Location - United States
  • Impact - Best practices for information literacy instruction
  • Professionals - Librarians
  • SErvice - Information literacy instruction

Exclusion criteria may include non-peer-reviewed articles, articles not in English, articles before a specified date, and in this case, articles about theory rather than actual practice.

  • Last Updated: Jun 11, 2024 8:21 PM
  • URL: https://guides.lib.usf.edu/systematicreviews


Literature Review Basics and Searching Skills


Selection Criteria

You may want to think about the criteria that will be used to select articles for your literature review based on your research question. These are commonly known as inclusion criteria and exclusion criteria. Be aware that you may introduce bias into the final review if these are not used thoughtfully.

Inclusion Criteria

Inclusion criteria are the elements of an article that must be present in order for it to be eligible for inclusion in a literature review.  Some examples are:

  • Included studies must have compared certain treatments
  • Included studies must be experimental
  • Included studies must have been published in the last 5 years

Exclusion Criteria

Exclusion criteria are the elements of an article that disqualify the study from inclusion in a literature review.  Some examples are:

  • Study used an observational design
  • Study used a qualitative methodology
  • Study was published more than 5 years ago
  • Study was published in a language other than English
  • Last Updated: Mar 20, 2024 11:38 AM
  • URL: https://hslib.jabsom.hawaii.edu/lit_review

Health Sciences Library, John A. Burns School of Medicine, University of Hawai‘i at Mānoa, 651 Ilalo Street, MEB 101, Honolulu, HI 96813 - Phone: 808-692-0810, Fax: 808-692-1244

Copyright © 2004-2024. All rights reserved.


Reviewing the literature

Volume 19, Issue 1

  • Joanna Smith¹, Helen Noble²
  • ¹ School of Healthcare, University of Leeds, Leeds, UK
  • ² School of Nursing and Midwifery, Queen's University Belfast, Belfast, UK
  • Correspondence to Dr Joanna Smith, School of Healthcare, University of Leeds, Leeds LS2 9JT, UK; j.e.smith1{at}leeds.ac.uk

https://doi.org/10.1136/eb-2015-102252


Implementing evidence into practice requires nurses to identify, critically appraise and synthesise research. This may require a comprehensive literature review: this article aims to outline the approaches and stages required and provides a working example of a published review.

Are there different approaches to undertaking a literature review?

What stages are required to undertake a literature review?

The rationale for the review should be established: consider why the review is important and relevant to patient care/safety or service delivery. For example, Noble et al's⁴ review sought to understand and make recommendations for practice and research in relation to dialysis refusal and withdrawal in patients with end-stage renal disease, an area of care previously poorly described. If appropriate, highlight relevant policies and theoretical perspectives that might guide the review. Once the key issues related to the topic, including the challenges encountered in clinical practice, have been identified, formulate a clear question and/or develop an aim and specific objectives. The type of review undertaken is influenced by the purpose of the review and the resources available. However, the stages or methods used to undertake a review are similar across approaches and include:

Formulating clear inclusion and exclusion criteria, for example, patient groups, ages, conditions/treatments, sources of evidence/research designs;

Justifying the databases and years searched, and whether strategies including hand searching of journals, conference proceedings and research not indexed in databases (grey literature) will be undertaken;

Developing search terms: the PICO (P: patient, problem or population; I: intervention; C: comparison; O: outcome) framework is a useful guide when developing search terms;

Developing search skills (eg, understanding Boolean operators, in particular the use of AND/OR) and knowledge of how databases index topics (eg, MeSH headings). Working with a librarian experienced in undertaking health searches is invaluable when developing a search.
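The AND/OR logic mentioned above can be illustrated by assembling a search string from groups of synonyms: OR widens each concept, AND requires every concept to appear. This is a minimal Python sketch, not a tool from the article: `build_query` is a hypothetical helper, the terms reuse the headaches-in-children example from earlier on this page, and the `*` truncation symbol and quoting rules are assumptions that vary between database interfaces.

```python
def build_query(concepts: list[list[str]]) -> str:
    """Join synonyms with OR inside each concept, then join concepts with AND."""
    groups = []
    for terms in concepts:
        # Quote multi-word phrases so they are searched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    # AND means a record must match at least one term from every group.
    return " AND ".join(groups)


# Hypothetical concept groups; truncation syntax differs between databases.
query = build_query([
    ["child*", "adolescen*", "teenager*"],
    ["complementary therapies", "acupuncture", "relaxation"],
    ["headache*", "migraine*"],
])
print(query)
# (child* OR adolescen* OR teenager*) AND ("complementary therapies" OR acupuncture OR relaxation) AND (headache* OR migraine*)
```

A librarian can then adapt the string to each database, for example by swapping free-text terms for the relevant subject headings.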

Once studies are selected, the quality of the research/evidence requires evaluation. Using a quality appraisal tool, such as the Critical Appraisal Skills Programme (CASP) tools,⁵ provides a structured approach to assessing the rigour of the studies being reviewed.³ Approaches to data synthesis for quantitative studies may include a meta-analysis (statistical analysis of data from multiple studies of similar designs that have addressed the same question), or findings can be reported descriptively.⁶ Methods applicable for synthesising qualitative studies include meta-ethnography (themes and concepts from different studies are explored and brought together using approaches similar to qualitative data analysis methods), narrative summary, thematic analysis and content analysis.⁷ Table 1 outlines the stages undertaken for a published review that summarised research about parents’ experiences of living with a child with a long-term condition.⁸


Table 1: An example of a rapid evidence assessment review

In summary, the type of literature review depends on the review purpose. For the novice reviewer, undertaking a review can be a daunting and complex process; by following the stages outlined and being systematic, a robust review is achievable. The importance of literature reviews should not be underestimated: they help summarise and make sense of an increasingly vast body of research, promoting best evidence-based practice.

  • Centre for Reviews and Dissemination. Guidance for undertaking reviews in health care. 3rd edn. York: CRD, York University, 2009.
  • Canadian Best Practices Portal. http://cbpp-pcpe.phac-aspc.gc.ca/interventions/selected-systematic-review-sites/ (accessed 7.8.2015).
  • Bridges J, et al
  • Critical Appraisal Skills Programme (CASP). http://www.casp-uk.net/ (accessed 7.8.2015).
  • Dixon-Woods M,
  • Shaw R, et al
  • Agarwal S,
  • Jones D, et al
  • Cheater F,


Competing interests None declared.



NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Lau F, Kuziemsky C, editors. Handbook of eHealth Evaluation: An Evidence-based Approach [Internet]. Victoria (BC): University of Victoria; 2017 Feb 27.


Handbook of eHealth Evaluation: An Evidence-based Approach [Internet].

Chapter 9: Methods for Literature Reviews

Guy Paré and Spyros Kitsiou.

9.1. Introduction

Literature reviews play a critical role in scholarship because science remains, first and foremost, a cumulative endeavour (vom Brocke et al., 2009). As in any academic discipline, rigorous knowledge syntheses are becoming indispensable in keeping up with an exponentially growing eHealth literature, assisting practitioners, academics, and graduate students in finding, evaluating, and synthesizing the contents of many empirical and conceptual papers. Among other methods, literature reviews are essential for: (a) identifying what has been written on a subject or topic; (b) determining the extent to which a specific research area reveals any interpretable trends or patterns; (c) aggregating empirical findings related to a narrow research question to support evidence-based practice; (d) generating new frameworks and theories; and (e) identifying topics or questions requiring more investigation (Paré, Trudel, Jaana, & Kitsiou, 2015).

Literature reviews can take two major forms. The most prevalent one is the “literature review” or “background” section within a journal paper or a chapter in a graduate thesis. This section synthesizes the extant literature and usually identifies the gaps in knowledge that the empirical study addresses (Sylvester, Tate, & Johnstone, 2013). It may also provide a theoretical foundation for the proposed study, substantiate the presence of the research problem, justify the research as one that contributes something new to the cumulated knowledge, or validate the methods and approaches for the proposed study (Hart, 1998; Levy & Ellis, 2006).

The second form of literature review, which is the focus of this chapter, constitutes an original and valuable work of research in and of itself (Paré et al., 2015). Rather than providing a base for a researcher’s own work, it creates a solid starting point for all members of the community interested in a particular area or topic (Mulrow, 1987). The so-called “review article” is a journal-length paper which has an overarching purpose to synthesize the literature in a field, without collecting or analyzing any primary data (Green, Johnson, & Adams, 2006).

When appropriately conducted, review articles represent powerful information sources for practitioners looking for state-of-the-art evidence to guide their decision-making and work practices (Paré et al., 2015). Further, high-quality reviews become frequently cited pieces of work which researchers seek out as a first clear outline of the literature when undertaking empirical studies (Cooper, 1988; Rowe, 2014). Scholars who track and gauge the impact of articles have found that review papers are cited and downloaded more often than any other type of published article (Cronin, Ryan, & Coughlan, 2008; Montori, Wilczynski, Morgan, Haynes, & Hedges, 2003; Patsopoulos, Analatos, & Ioannidis, 2005). Their popularity may stem from the fact that reading a review gives one an overview, if not a detailed knowledge, of the area in question, as well as references to the most useful primary sources (Cronin et al., 2008). Although they are not easy to conduct, the commitment to complete a review article provides a tremendous service to one’s academic community (Paré et al., 2015; Petticrew & Roberts, 2006). Most, if not all, peer-reviewed journals in the field of medical informatics publish review articles of some type.

The main objectives of this chapter are fourfold: (a) to provide an overview of the major steps and activities involved in conducting a stand-alone literature review; (b) to describe and contrast the different types of review articles that can contribute to the eHealth knowledge base; (c) to illustrate each review type with one or two examples from the eHealth literature; and (d) to provide a series of recommendations for prospective authors of review articles in this domain.

9.2. Overview of the Literature Review Process and Steps

As explained in Templier and Paré (2015), there are six generic steps involved in conducting a review article:

  • formulating the research question(s) and objective(s),
  • searching the extant literature,
  • screening for inclusion,
  • assessing the quality of primary studies,
  • extracting data, and
  • analyzing data.

Although these steps are presented here in sequential order, one must keep in mind that the review process can be iterative and that many activities can be initiated during the planning stage and later refined during subsequent phases (Finfgeld-Connett & Johnson, 2013; Kitchenham & Charters, 2007).

Formulating the research question(s) and objective(s): As a first step, members of the review team must appropriately justify the need for the review itself (Petticrew & Roberts, 2006), identify the review’s main objective(s) (Okoli & Schabram, 2010), and define the concepts or variables at the heart of their synthesis (Cooper & Hedges, 2009; Webster & Watson, 2002). Importantly, they also need to articulate the research question(s) they propose to investigate (Kitchenham & Charters, 2007). In this regard, we concur with Jesson, Matheson, and Lacey (2011) that clearly articulated research questions are key ingredients that guide the entire review methodology; they underscore the type of information that is needed, inform the search for and selection of relevant literature, and guide or orient the subsequent analysis.

Searching the extant literature: The next step consists of searching the literature and making decisions about the suitability of material to be considered in the review (Cooper, 1988). There exist three main coverage strategies. First, exhaustive coverage means an effort is made to be as comprehensive as possible in order to ensure that all relevant studies, published and unpublished, are included in the review and, thus, conclusions are based on this all-inclusive knowledge base. The second type of coverage consists of presenting materials that are representative of most other works in a given field or area. Often authors who adopt this strategy will search for relevant articles in a small number of top-tier journals in a field (Paré et al., 2015). In the third strategy, the review team concentrates on prior works that have been central or pivotal to a particular topic. This may include empirical studies or conceptual papers that initiated a line of investigation, changed how problems or questions were framed, introduced new methods or concepts, or engendered important debate (Cooper, 1988).

Screening for inclusion: The following step consists of evaluating the applicability of the material identified in the preceding step (Levy & Ellis, 2006; vom Brocke et al., 2009). Once a group of potential studies has been identified, members of the review team must screen them to determine their relevance (Petticrew & Roberts, 2006). A set of predetermined rules provides a basis for including or excluding certain studies. This exercise requires a significant investment on the part of researchers, who must ensure enhanced objectivity and avoid biases or mistakes. As discussed later in this chapter, for certain types of reviews there must be at least two independent reviewers involved in the screening process and a procedure to resolve disagreements must also be in place (Liberati et al., 2009; Shea et al., 2009).

Assessing the quality of primary studies: In addition to screening material for inclusion, members of the review team may need to assess the scientific quality of the selected studies, that is, appraise the rigour of the research design and methods. Such formal assessment, which is usually conducted independently by at least two coders, helps members of the review team refine which studies to include in the final sample, determine whether or not the differences in quality may affect their conclusions, or guide how they analyze the data and interpret the findings (Petticrew & Roberts, 2006). Ascribing quality scores to each primary study or considering through domain-based evaluations which study components have or have not been designed and executed appropriately makes it possible to reflect on the extent to which the selected study addresses possible biases and maximizes validity (Shea et al., 2009).

Extracting data: The following step involves gathering or extracting applicable information from each primary study included in the sample and deciding what is relevant to the problem of interest (Cooper & Hedges, 2009). Indeed, the type of data that should be recorded mainly depends on the initial research questions (Okoli & Schabram, 2010). However, important information may also be gathered about how, when, where and by whom the primary study was conducted, the research design and methods, or qualitative/quantitative results (Cooper & Hedges, 2009).

Analyzing and synthesizing data: As a final step, members of the review team must collate, summarize, aggregate, organize, and compare the evidence extracted from the included studies. The extracted data must be presented in a meaningful way that suggests a new contribution to the extant literature (Jesson et al., 2011). Webster and Watson (2002) warn researchers that literature reviews should be much more than lists of papers and should provide a coherent lens to make sense of extant knowledge on a given topic. There exist several methods and techniques for synthesizing quantitative (e.g., frequency analysis, meta-analysis) and qualitative (e.g., grounded theory, narrative analysis, meta-ethnography) evidence (Dixon-Woods, Agarwal, Jones, Young, & Sutton, 2005; Thomas & Harden, 2008).
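The screening step described in this chapter calls for at least two independent reviewers and a procedure for resolving disagreements. The Python sketch below only illustrates the bookkeeping side of that: it separates records both reviewers agree on from records that need resolution. The record IDs and decisions are hypothetical, and sending conflicts to discussion or a third reviewer is one common option assumed here, not a procedure prescribed by the chapter.

```python
# Hypothetical include (True) / exclude (False) decisions recorded by two
# reviewers who screened the same records independently.
reviewer_a = {"rec1": True, "rec2": False, "rec3": True, "rec4": False}
reviewer_b = {"rec1": True, "rec2": True, "rec3": True, "rec4": False}


def reconcile(a: dict[str, bool], b: dict[str, bool]):
    """Split records into agreed includes, agreed excludes and conflicts."""
    if a.keys() != b.keys():
        raise ValueError("both reviewers must screen the same records")
    included, excluded, conflicts = [], [], []
    for rec in sorted(a):
        if a[rec] != b[rec]:
            conflicts.append(rec)  # assumed: resolve by discussion or a third reviewer
        elif a[rec]:
            included.append(rec)
        else:
            excluded.append(rec)
    return included, excluded, conflicts


included, excluded, conflicts = reconcile(reviewer_a, reviewer_b)
print("include:", included)   # include: ['rec1', 'rec3']
print("exclude:", excluded)   # exclude: ['rec4']
print("resolve:", conflicts)  # resolve: ['rec2']
```

Keeping both reviewers' raw decisions, rather than only the reconciled result, preserves an audit trail of how each disagreement was settled.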

9.3. Types of Review Articles and Brief Illustrations

eHealth researchers have at their disposal a number of approaches and methods for making sense of existing literature, all with the purpose of casting current research findings into historical contexts or explaining contradictions that might exist among a set of primary research studies conducted on a particular topic. Our classification scheme is largely inspired by Paré and colleagues’ (2015) typology. Below we present and illustrate those review types that we feel are central to the growth and development of the eHealth domain.

9.3.1. Narrative Reviews

The narrative review is the “traditional” way of reviewing the extant literature and is skewed towards a qualitative interpretation of prior knowledge (Sylvester et al., 2013). Put simply, a narrative review attempts to summarize or synthesize what has been written on a particular topic but does not seek generalization or cumulative knowledge from what is reviewed (Davies, 2000; Green et al., 2006). Instead, the review team often undertakes the task of accumulating and synthesizing the literature to demonstrate the value of a particular point of view (Baumeister & Leary, 1997). As such, reviewers may selectively ignore or limit the attention paid to certain studies in order to make a point. In this rather unsystematic approach, the selection of information from primary articles is subjective, lacks explicit criteria for inclusion and can lead to biased interpretations or inferences (Green et al., 2006). As in all fields, the eHealth domain contains several narrative reviews that follow such an unstructured approach (Silva et al., 2015; Paul et al., 2015).

Despite these criticisms, this type of review can be very useful in gathering together a volume of literature in a specific subject area and synthesizing it. As mentioned above, its primary purpose is to provide the reader with a comprehensive background for understanding current knowledge and highlighting the significance of new research (Cronin et al., 2008). Faculty like to use narrative reviews in the classroom because they are often more up to date than textbooks, provide a single source for students to reference, and expose students to peer-reviewed literature (Green et al., 2006). For researchers, narrative reviews can inspire research ideas by identifying gaps or inconsistencies in a body of knowledge, thus helping researchers to determine research questions or formulate hypotheses. Importantly, narrative reviews can also be used as educational articles to bring practitioners up to date with certain topics or issues (Green et al., 2006).

Recently, there have been several efforts to introduce more rigour in narrative reviews that will elucidate common pitfalls and bring changes into their publication standards. Information systems researchers, among others, have contributed to advancing knowledge on how to structure a “traditional” review. For instance, Levy and Ellis (2006) proposed a generic framework for conducting such reviews. Their model follows the systematic data processing approach consisting of three steps, namely: (a) literature search and screening; (b) data extraction and analysis; and (c) writing the literature review. They provide detailed and very helpful instructions on how to conduct each step of the review process. As another methodological contribution, vom Brocke et al. (2009) offered a series of guidelines for conducting literature reviews, with a particular focus on how to search and extract the relevant body of knowledge. Last, Bandara, Miskon, and Fielt (2011) proposed a structured, predefined and tool-supported method to identify primary studies within a feasible scope, extract relevant content from identified articles, synthesize and analyze the findings, and effectively write and present the results of the literature review. We highly recommend that prospective authors of narrative reviews consult these useful sources before embarking on their work.

Darlow and Wen (2015) provide a good example of a highly structured narrative review in the eHealth field. These authors synthesized published articles that describe the development process of mobile health (m-health) interventions for patients’ cancer care self-management. As in most narrative reviews, the scope of the research questions being investigated is broad: (a) how the development of these systems is carried out; (b) which methods are used to investigate these systems; and (c) what conclusions can be drawn as a result of the development of these systems. To answer these questions, the authors searched six electronic databases and Google Scholar, using several search terms and free-text words in combination. Four inclusion and three exclusion criteria were applied during the screening process. Both authors independently reviewed each of the identified articles to determine eligibility and extract study information. A flow diagram shows the number of studies identified, screened, and included or excluded at each stage of study selection. In terms of contributions, this review provides a series of practical recommendations for m-health intervention development.
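
To make the screening stage concrete, the sketch below expresses inclusion and exclusion criteria as named tests applied to each record, and counts how many records remain after each stage; those counts are the kind of numbers a study-selection flow diagram reports. It is a minimal Python illustration, not the procedure Darlow and Wen used: the `Record` fields, the two criteria, and the sample records are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Record:
    """A hypothetical, simplified bibliographic record."""
    record_id: str
    year: int
    design: str
    is_duplicate: bool = False

# Hypothetical criteria written as named predicates over a record.
INCLUSION = {
    "published 2010 or later": lambda r: r.year >= 2010,
}
EXCLUSION = {
    "observational design": lambda r: r.design == "observational",
}

def screen(records):
    """Remove duplicates, then apply inclusion and exclusion criteria.

    Returns the included records and the number of records remaining
    after each stage, as a flow diagram would report them.
    """
    counts = {"identified": len(records)}
    unique = [r for r in records if not r.is_duplicate]
    counts["after duplicates removed"] = len(unique)
    # A record must satisfy every inclusion criterion...
    eligible = [r for r in unique if all(test(r) for test in INCLUSION.values())]
    counts["met inclusion criteria"] = len(eligible)
    # ...and must not trigger any exclusion criterion.
    included = [r for r in eligible if not any(test(r) for test in EXCLUSION.values())]
    counts["included"] = len(included)
    return included, counts

# Illustrative (made-up) records, not data from any real review.
records = [
    Record("A", 2012, "RCT"),
    Record("B", 2008, "RCT"),
    Record("C", 2015, "observational"),
    Record("D", 2016, "RCT", is_duplicate=True),
    Record("E", 2014, "RCT"),
]
included, counts = screen(records)
print(counts)
print([r.record_id for r in included])
```

Keeping each criterion as a named test makes it easy to document exactly which rules were applied, which supports the transparency expected of structured reviews.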

9.3.2. Descriptive or Mapping Reviews

The primary goal of a descriptive review is to determine the extent to which a body of knowledge in a particular research topic reveals any interpretable pattern or trend with respect to pre-existing propositions, theories, methodologies or findings (King & He, 2005; Paré et al., 2015). In contrast with narrative reviews, descriptive reviews follow a systematic and transparent procedure, including searching, screening and classifying studies (Petersen, Vakkalanka, & Kuzniarz, 2015). Indeed, structured search methods are used to form a representative sample of a larger group of published works (Paré et al., 2015). Further, authors of descriptive reviews extract from each study certain characteristics of interest, such as publication year, research methods, data collection techniques, and direction or strength of research outcomes (e.g., positive, negative, or non-significant) in the form of frequency analysis to produce quantitative results (Sylvester et al., 2013). In essence, each study included in a descriptive review is treated as the unit of analysis, and the published literature as a whole provides a database from which the authors attempt to identify any interpretable trends or draw overall conclusions about the merits of existing conceptualizations, propositions, methods or findings (Paré et al., 2015). In doing so, a descriptive review may claim that its findings represent the state of the art in a particular domain (King & He, 2005).
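
The frequency analysis described above can be illustrated with a small Python sketch. It assumes a hypothetical extraction sheet in which each included study is one row of coded characteristics (publication year, research method, direction of outcome); the field names and values are invented for illustration.

```python
from collections import Counter

# Hypothetical extraction sheet: one row per included study,
# since each study is the unit of analysis in a descriptive review.
studies = [
    {"year": 2011, "method": "survey", "outcome": "positive"},
    {"year": 2012, "method": "case study", "outcome": "non-significant"},
    {"year": 2012, "method": "survey", "outcome": "positive"},
    {"year": 2013, "method": "experiment", "outcome": "negative"},
]

def frequencies(rows, field):
    """Count how many studies fall into each category of one coded field."""
    return Counter(row[field] for row in rows)

def percentages(counts):
    """Express category counts as a percentage of all studies, to one decimal."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}

for field in ("year", "method", "outcome"):
    print(field, percentages(frequencies(studies, field)))
```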

In the fields of health sciences and medical informatics, reviews that focus on examining the range, nature and evolution of a topic area are described by Anderson, Allen, Peckham, and Goodwin (2008) as mapping reviews. As in descriptive reviews, the research questions are generic and usually relate to publication patterns and trends. There is no preconceived plan to systematically review all of the literature, although this can be done. Instead, researchers often present studies that are representative of most works published in a particular area, and they consider a specific time frame to be mapped.

An example of this approach in the eHealth domain is offered by DeShazo, Lavallie, and Wolf (2009). The purpose of this descriptive or mapping review was to characterize publication trends in the medical informatics literature over a 20-year period (1987 to 2006). To achieve this ambitious objective, the authors performed a bibliometric analysis of medical informatics citations indexed in MEDLINE, examining publication trends, journal frequencies, impact factors, Medical Subject Headings (MeSH) term frequencies, and characteristics of citations. Findings revealed that over 77,000 medical informatics articles were published in numerous journals during the covered period and that the average annual growth rate was 12%. The MeSH term analysis also suggested a strong interdisciplinary trend. Finally, average impact scores increased over time, with two notable growth periods. Overall, the patterns in research output, which seem to characterize the historic trends and current components of the field, suggest that medical informatics may be a maturing discipline (DeShazo et al., 2009).
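
An "average annual growth rate" can be defined in more than one way, and the text does not say which formula DeShazo et al. used. The following Python sketch, using hypothetical article counts, contrasts two common definitions: the arithmetic mean of year-over-year rates and the compound annual growth rate.

```python
def year_over_year(counts):
    """Growth between consecutive years, e.g. 0.10 means +10%."""
    return [(later - earlier) / earlier for earlier, later in zip(counts, counts[1:])]

def mean_annual_growth(counts):
    """Arithmetic mean of the year-over-year growth rates."""
    rates = year_over_year(counts)
    return sum(rates) / len(rates)

def compound_annual_growth(counts):
    """Constant yearly rate that would turn the first count into the last."""
    periods = len(counts) - 1
    return (counts[-1] / counts[0]) ** (1 / periods) - 1

# Hypothetical article counts for three consecutive years.
articles_per_year = [100, 150, 120]
print(f"mean of yearly rates: {mean_annual_growth(articles_per_year):.1%}")
print(f"compound annual rate: {compound_annual_growth(articles_per_year):.1%}")
```

With these made-up counts the first definition gives 15.0% and the second about 9.5%, so it helps to know which definition a bibliometric review reports.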

9.3.3. Scoping Reviews

Scoping reviews attempt to provide an initial indication of the potential size and nature of the extant literature on an emergent topic (Arksey & O’Malley, 2005; Daudt, van Mossel, & Scott, 2013; Levac, Colquhoun, & O’Brien, 2010). A scoping review may be conducted to examine the extent, range and nature of research activities in a particular area, determine the value of undertaking a full systematic review (discussed next), or identify research gaps in the extant literature (Paré et al., 2015). In line with their main objective, scoping reviews usually conclude with the presentation of a detailed research agenda for future works along with potential implications for both practice and research.

Unlike narrative and descriptive reviews, scoping reviews are meant to cover the field as comprehensively as possible, including grey literature (Arksey & O’Malley, 2005). Inclusion and exclusion criteria must be established to help researchers eliminate studies that are not aligned with the research questions. It is also recommended that at least two independent coders review the abstracts yielded by the search strategy and then the full articles for study selection (Daudt et al., 2013). The synthesized evidence from content or thematic analysis is relatively easy to present in tabular form (Arksey & O’Malley, 2005; Thomas & Harden, 2008).

One of the most highly cited scoping reviews in the eHealth domain was published by Archer, Fevrier-Thomas, Lokker, McKibbon, and Straus (2011). These authors reviewed the existing literature on personal health record (PHR) systems, including design, functionality, implementation, applications, outcomes, and benefits. Seven databases were searched from 1985 to March 2010, using several search terms relating to PHRs. Two authors independently screened titles and abstracts to determine inclusion status. A second screen of full-text articles, again by two independent members of the research team, ensured that the studies described PHRs. All in all, 130 articles met the criteria, and their data were extracted manually into a database. The authors concluded that although there is a large amount of survey, observational, cohort/panel, and anecdotal evidence of PHR benefits and satisfaction for patients, more research is needed to evaluate the results of PHR implementations. Their in-depth analysis of the literature signalled that there is little solid evidence on the use of PHRs from randomized controlled trials or other studies. Hence, they suggested that more research is needed to address the current lack of understanding of the optimal functionality and usability of these systems, and of how they can play a beneficial role in supporting patient self-management (Archer et al., 2011).

9.3.4. Forms of Aggregative Reviews

Healthcare providers, practitioners, and policy-makers are nowadays overwhelmed with large volumes of information, including research-based evidence from numerous clinical trials and evaluation studies assessing the effectiveness of health information technologies and interventions (Ammenwerth & de Keizer, 2004; DeShazo et al., 2009). It is unrealistic to expect that all these disparate actors will have the time, skills, and necessary resources to identify the available evidence in the area of their expertise and consider it when making decisions. Systematic reviews that involve the rigorous application of scientific strategies aimed at limiting subjectivity and bias (i.e., systematic and random errors) can respond to this challenge.

Systematic reviews attempt to aggregate, appraise, and synthesize in a single source all empirical evidence that meets a set of previously specified eligibility criteria in order to answer a clearly formulated and often narrow research question on a particular topic of interest to support evidence-based practice (Liberati et al., 2009). They adhere closely to explicit scientific principles (Liberati et al., 2009) and rigorous methodological guidelines (Higgins & Green, 2008) aimed at reducing random and systematic errors that can lead to deviations from the truth in results or inferences. The use of explicit methods allows systematic reviews to aggregate a large body of research evidence, assess whether effects or relationships are in the same direction and of the same general magnitude, explain possible inconsistencies between study results, and determine the strength of the overall evidence for every outcome of interest based on the quality of included studies and the general consistency among them (Cook, Mulrow, & Haynes, 1997). The main procedures of a systematic review involve:

  • Formulating a review question and developing a search strategy based on explicit inclusion criteria for the identification of eligible studies (usually described in the context of a detailed review protocol).
  • Searching for eligible studies using multiple databases and information sources, including grey literature sources, without any language restrictions.
  • Selecting studies, extracting data, and assessing risk of bias in a duplicate manner using two independent reviewers to avoid random or systematic errors in the process.
  • Analyzing data using quantitative or qualitative methods.
  • Presenting results in summary of findings tables.
  • Interpreting results and drawing conclusions.
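
The duplicate screening step in the list above can be sketched as a comparison of two reviewers' independent decisions. This minimal Python example assumes each reviewer records a simple include/exclude decision per study ID; the function name `reconcile` and the sample data are hypothetical, and how disagreements are then resolved (for example, by discussion or a third reviewer) is left to the review protocol.

```python
def reconcile(reviewer_a, reviewer_b):
    """Compare two reviewers' independent include/exclude decisions.

    Each argument maps a study ID to True (include) or False (exclude).
    Returns the agreed decisions, the IDs the reviewers disagree on,
    and the simple percentage agreement.
    """
    if reviewer_a.keys() != reviewer_b.keys():
        raise ValueError("both reviewers must screen the same studies")
    agreed, conflicts = {}, []
    for study_id in sorted(reviewer_a):
        if reviewer_a[study_id] == reviewer_b[study_id]:
            agreed[study_id] = reviewer_a[study_id]
        else:
            # Flag the disagreement for resolution rather than picking a side.
            conflicts.append(study_id)
    agreement = 100 * len(agreed) / len(reviewer_a)
    return agreed, conflicts, agreement

# Made-up screening decisions for four studies.
reviewer_a = {"S1": True, "S2": False, "S3": True, "S4": False}
reviewer_b = {"S1": True, "S2": True, "S3": True, "S4": False}
agreed, conflicts, agreement = reconcile(reviewer_a, reviewer_b)
print("to resolve:", conflicts, f"({agreement:.0f}% agreement)")
```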

Many systematic reviews, but not all, use statistical methods to combine the results of independent studies into a single quantitative estimate or summary effect size. Known as meta-analyses, these reviews use specific data extraction and statistical techniques (e.g., network, frequentist, or Bayesian meta-analyses) to calculate, for each study and outcome of interest, an effect size along with a confidence interval that reflects the degree of uncertainty behind the point estimate of effect (Borenstein, Hedges, Higgins, & Rothstein, 2009; Deeks, Higgins, & Altman, 2008). Subsequently, they use fixed-effect or random-effects models to combine the results of the included studies, assess statistical heterogeneity, and calculate a weighted average of the effect estimates from the different studies, taking into account their sample sizes (in the common inverse-variance approach, each study is weighted by the inverse of its variance, so larger and more precise studies carry more weight). The summary effect size is a value that reflects the average magnitude of the intervention effect for a particular outcome of interest or, more generally, the strength of a relationship between two variables across all studies included in the systematic review. By statistically combining data from multiple studies, meta-analyses can produce more precise and reliable estimates of intervention effects than those derived from individual studies examined independently as discrete sources of information.
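
The pooling steps described above can be sketched with the inverse-variance method. This Python example is an illustration under stated assumptions, not a substitute for established meta-analysis software: it assumes each study reports an effect size on an additive scale (here a hypothetical log risk ratio) with its standard error, uses a normal approximation for the 95% confidence interval, and estimates the between-study variance for the random-effects model with the DerSimonian-Laird method. The trial data are made up.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval

def pool(effects, std_errors, tau2=0.0):
    """Inverse-variance weighted average of study effect sizes.

    Each study gets weight 1 / (se^2 + tau2). With tau2 = 0 this is the
    fixed-effect model; a positive tau2 gives a random-effects model.
    Returns the pooled estimate and its 95% confidence interval.
    """
    weights = [1 / (se ** 2 + tau2) for se in std_errors]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return estimate, (estimate - Z_95 * se_pooled, estimate + Z_95 * se_pooled)

def heterogeneity(effects, std_errors):
    """Cochran's Q, I^2 (in %), and the DerSimonian-Laird estimate of tau^2."""
    if len(effects) < 2:
        raise ValueError("heterogeneity needs at least two studies")
    w = [1 / se ** 2 for se in std_errors]
    fixed_estimate, _ = pool(effects, std_errors)
    q = sum(wi * (y - fixed_estimate) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # truncated at zero
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2, tau2

# Hypothetical log risk ratios and standard errors from three trials.
log_rr = [0.10, 0.25, 0.40]
std_errs = [0.08, 0.10, 0.12]

q, i2, tau2 = heterogeneity(log_rr, std_errs)
print(f"Q = {q:.2f}, I^2 = {i2:.0f}%, tau^2 = {tau2:.4f}")
for label, t2 in (("fixed-effect", 0.0), ("random-effects", tau2)):
    est, (lo, hi) = pool(log_rr, std_errs, tau2=t2)
    # Back-transform from the log scale to a risk ratio for reporting.
    print(f"{label}: RR = {math.exp(est):.2f} "
          f"(95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f})")
```

Setting `tau2=0` reproduces the fixed-effect model. When studies disagree more than sampling error alone would explain (a positive tau² estimate), the random-effects weights become more equal across studies and the confidence interval widens.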

The review by Gurol-Urganci, de Jongh, Vodopivec-Jamsek, Atun, and Car (2013) on the effects of mobile phone messaging reminders for attendance at healthcare appointments is an illustrative example of a high-quality systematic review with meta-analysis. Missed appointments are a major cause of inefficiency in healthcare delivery, with substantial monetary costs to health systems. These authors sought to assess whether mobile phone-based appointment reminders delivered through Short Message Service (SMS) or Multimedia Messaging Service (MMS) are effective in improving rates of patient attendance and reducing overall costs. To this end, they conducted a comprehensive search of multiple databases using highly sensitive search strategies without language or publication-type restrictions to identify all RCTs eligible for inclusion. To minimize the risk of omitting eligible studies not captured by the original search, they supplemented all electronic searches with manual screening of trial registers and of the references of the included studies. Study selection, data extraction, and risk of bias assessments were performed independently by two coders using standardized methods to ensure consistency and to eliminate potential errors. Findings from eight RCTs involving 6,615 participants were pooled into meta-analyses to calculate the magnitude of the effects that mobile text message reminders have on the rate of attendance at healthcare appointments, compared with no reminders and with phone call reminders.

Meta-analyses are regarded as powerful tools for deriving meaningful conclusions. However, there are situations in which it is neither reasonable nor appropriate to pool studies together using meta-analytic methods, for example when there is extensive clinical heterogeneity between the included studies or variation in measurement tools, comparisons, or outcomes of interest. In these cases, systematic reviews can use qualitative synthesis methods such as vote counting, content analysis, classification schemes, and tabulations as an alternative approach to narratively synthesize the results of the independent studies included in the review. This form of review is known as a qualitative systematic review.
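
Vote counting, one of the qualitative synthesis methods mentioned above, amounts to tallying the included studies by the direction of their reported effect. The minimal Python sketch below assumes each study has already been coded with a direction label; the labels and study data are hypothetical.

```python
from collections import Counter

def vote_count(directions):
    """Tally studies by the direction of their reported effect.

    `directions` maps a study ID to a label such as "favours intervention",
    "favours control" or "no clear difference". Returns the full tally and
    the label(s) with the most votes (several labels if there is a tie).
    """
    tally = Counter(directions.values())
    top = max(tally.values())
    leaders = sorted(label for label, n in tally.items() if n == top)
    return tally, leaders

# Hypothetical directions of effect for four included studies.
directions = {
    "Study 1": "favours intervention",
    "Study 2": "favours intervention",
    "Study 3": "no clear difference",
    "Study 4": "favours control",
}
tally, leaders = vote_count(directions)
print(dict(tally), leaders)
```

Because each study counts as one vote regardless of its size or precision, this tally says nothing about the magnitude of an effect, which is one reason vote counting is used when pooling is not possible rather than in place of it.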

A rigorous example of one such review in the eHealth domain is presented by Mickan, Atherton, Roberts, Heneghan, and Tilson (2014) on the use of handheld computers by healthcare professionals and their impact on access to information and clinical decision-making. In line with the methodological guidelines for systematic reviews, these authors: (a) developed and registered with PROSPERO (www.crd.york.ac.uk/prospero/) an a priori review protocol; (b) conducted comprehensive searches for eligible studies using multiple databases and other supplementary strategies (e.g., forward searches); and (c) subsequently carried out study selection, data extraction, and risk of bias assessments in a duplicate manner to eliminate potential errors in the review process. Heterogeneity between the included studies in terms of reported outcomes and measures precluded the use of meta-analytic methods. Instead, the authors used narrative analysis and synthesis to describe the effectiveness of handheld computers for accessing information for clinical knowledge, adherence to safety and clinical quality guidelines, and diagnostic decision-making.

In recent years, the number of systematic reviews in the field of health informatics has increased considerably. Systematic reviews with discordant findings can cause great confusion and make it difficult for decision-makers to interpret the review-level evidence (Moher, 2013). Therefore, there is a growing need for appraisal and synthesis of prior systematic reviews to ensure that decision-making is constantly informed by the best available accumulated evidence. Umbrella reviews, also known as overviews of systematic reviews, are tertiary types of evidence synthesis that aim to accomplish this; that is, they aim to compare and contrast findings from multiple systematic reviews and meta-analyses (Becker & Oxman, 2008). Umbrella reviews generally adhere to the same principles and rigorous methodological guidelines used in systematic reviews. However, the unit of analysis in umbrella reviews is the systematic review rather than the primary study (Becker & Oxman, 2008). Unlike systematic reviews, which have a narrow focus of inquiry, umbrella reviews focus on broader research topics for which there are several potential interventions (Smith, Devane, Begley, & Clarke, 2011). A recent umbrella review on the effects of home telemonitoring interventions for patients with heart failure critically appraised, compared, and synthesized evidence from 15 systematic reviews to investigate which types of home telemonitoring technologies and forms of interventions are more effective in reducing mortality and hospital admissions (Kitsiou, Paré, & Jaana, 2015).

9.3.5. Realist Reviews

Realist reviews are theory-driven interpretative reviews developed to inform, enhance, or supplement conventional systematic reviews by making sense of heterogeneous evidence about complex interventions applied in diverse contexts in a way that informs policy decision-making (Greenhalgh, Wong, Westhorp, & Pawson, 2011). They originated from criticisms of positivist systematic reviews, criticisms that centre on those reviews’ “simplistic” underlying assumptions (Oates, 2011). As explained above, systematic reviews seek to identify causation. Such logic is appropriate for fields like medicine and education, where findings of randomized controlled trials can be aggregated to see whether a new treatment or intervention does improve outcomes. However, many argue that it is not possible to establish such direct causal links between interventions and outcomes in fields such as social policy, management, and information systems, where for any intervention there is unlikely to be a regular or consistent outcome (Oates, 2011; Pawson, 2006; Rousseau, Manning, & Denyer, 2008).

To circumvent these limitations, Pawson, Greenhalgh, Harvey, and Walshe (2005) proposed a new approach for synthesizing knowledge that seeks to unpack the mechanism of how “complex interventions” work in particular contexts. The basic research question usually associated with systematic reviews, “what works?”, changes to: what is it about this intervention that works, for whom, in what circumstances, in what respects, and why? Realist reviews have no particular preference for either quantitative or qualitative evidence. As a theory-building approach, a realist review usually starts by articulating likely underlying mechanisms and then scrutinizes available evidence to find out whether and where these mechanisms are applicable (Shepperd et al., 2009). Primary studies found in the extant literature are viewed as case studies, which can test and modify the initial theories (Rousseau et al., 2008).

The main objective of the realist review conducted by Otte-Trojel, de Bont, Rundall, and van de Klundert (2014) was to examine how patient portals contribute to health service delivery and patient outcomes. The specific goals were to investigate how outcomes are produced and, most importantly, how variations in outcomes can be explained. The research team started with an exploratory review of background documents and research studies to identify ways in which patient portals may contribute to health service delivery and patient outcomes. The authors identified six main ways, which represent “educated guesses” to be tested against the data in the evaluation studies. These studies were identified through a formal and systematic search of four databases covering 2003 to 2013. Two members of the research team selected the articles using a pre-established list of inclusion and exclusion criteria, following a two-step procedure. The authors then extracted data from the selected articles and created several tables, one for each outcome category. They organized the information to bring forward the mechanisms through which patient portals contribute to outcomes and the variation in outcomes across different contexts.

9.3.6. Critical Reviews

Lastly, critical reviews aim to provide a critical evaluation and interpretive analysis of existing literature on a particular topic of interest to reveal strengths, weaknesses, contradictions, controversies, inconsistencies, and/or other important issues with respect to theories, hypotheses, research methods or results (Baumeister & Leary, 1997; Kirkevold, 1997). Unlike other review types, critical reviews attempt to take a reflective account of the research that has been done in a particular area of interest, and assess its credibility by using appraisal instruments or critical interpretive methods. In this way, critical reviews attempt to constructively inform other scholars about the weaknesses of prior research and strengthen knowledge development by giving focus and direction to studies for further improvement (Kirkevold, 1997).

Kitsiou, Paré, and Jaana (2013) provide an example of a critical review that assessed the methodological quality of prior systematic reviews of home telemonitoring studies for chronic patients. The authors conducted a comprehensive search of multiple databases to identify eligible reviews and subsequently used a validated instrument to conduct an in-depth quality appraisal. Results indicated that the majority of systematic reviews in this particular area suffer from important methodological flaws and biases that impair their internal validity and limit their usefulness for clinical and decision-making purposes. Accordingly, the authors provide a number of recommendations to strengthen knowledge development by improving the design and execution of future reviews on home telemonitoring.

9.4. Summary

Table 9.1 outlines the main types of literature reviews that were described in the previous sub-sections and summarizes the main characteristics that distinguish one review type from another. It also includes key references to methodological guidelines and useful sources that can be used by eHealth scholars and researchers for planning and developing reviews.

Table 9.1. Typology of Literature Reviews (adapted from Paré et al., 2015).

As shown in Table 9.1, each review type addresses different kinds of research questions or objectives, which subsequently define and dictate the methods and approaches that need to be used to achieve the overarching goal(s) of the review. For example, in the case of narrative reviews, there is greater flexibility in searching and synthesizing articles (Green et al., 2006). Researchers are often relatively free to use a diversity of approaches to search, identify, and select relevant scientific articles, describe their operational characteristics, present how the individual studies fit together, and formulate conclusions. On the other hand, systematic reviews are characterized by their high level of systematicity, rigour, and use of explicit methods, based on an “a priori” review plan that aims to minimize bias in the analysis and synthesis process (Higgins & Green, 2008). Some reviews are exploratory in nature (e.g., scoping/mapping reviews), whereas others may be conducted to discover patterns (e.g., descriptive reviews) or involve a synthesis approach that may include the critical analysis of prior research (Paré et al., 2015). Hence, in order to select the most appropriate type of review, it is critical to know, before embarking on a review project, why the research synthesis is being conducted and which methods are best aligned with the pursued goals.

9.5. Concluding Remarks

In light of the increased use of evidence-based practice and research generating stronger evidence (Grady et al., 2011; Lyden et al., 2013), review articles have become essential tools for summarizing, synthesizing, integrating or critically appraising prior knowledge in the eHealth field. As mentioned earlier, when rigorously conducted, review articles represent powerful information sources for eHealth scholars and practitioners looking for state-of-the-art evidence. The typology of literature reviews we used herein will allow eHealth researchers, graduate students and practitioners to gain a better understanding of the similarities and differences between review types.

We must stress that this classification scheme does not privilege any specific type of review as being of higher quality than another (Paré et al., 2015). As explained above, each type of review has its own strengths and limitations. Having said that, we realize that the methodological rigour of any review (be it qualitative, quantitative or mixed) is a critical aspect that should be considered seriously by prospective authors. In the present context, the notion of rigour refers to the reliability and validity of the review process described in section 9.2. For one thing, reliability is related to the reproducibility of the review process and steps, which is facilitated by comprehensive documentation of the literature search process, extraction, coding and analysis performed in the review. Whether or not the search is comprehensive, and whether or not it involves a methodical approach for data extraction and synthesis, it is important that the review document, in an explicit and transparent manner, the steps and approach used in its development. Next, validity characterizes the degree to which the review process was conducted appropriately. It goes beyond documentation and reflects decisions related to the selection of the sources, the search terms used, the period of time covered, the articles selected in the search, and the application of backward and forward searches (vom Brocke et al., 2009). In short, the rigour of any review article is reflected by the explicitness of its methods (i.e., transparency) and the soundness of the approach used. We refer those interested in the concepts of rigour and quality to the work of Templier and Paré (2015), which offers a detailed set of methodological guidelines for conducting and evaluating various types of review articles.

To conclude, our main objective in this chapter was to demystify the various types of literature reviews that are central to the continuous development of the eHealth field. It is our hope that our descriptive account will serve as a valuable source for those conducting, evaluating or using reviews in this important and growing domain.

  • Ammenwerth E., de Keizer N. An inventory of evaluation studies of information technology in health care. Trends in evaluation research, 1982-2002. International Journal of Medical Informatics. 2004;44(1):44–56. PMID: 15778794.
  • Anderson S., Allen P., Peckham S., Goodwin N. Asking the right questions: scoping studies in the commissioning of research on the organisation and delivery of health services. Health Research Policy and Systems. 2008;6(7):1–12. PMCID: PMC2500008. PMID: 18613961.
  • Archer N., Fevrier-Thomas U., Lokker C., McKibbon K. A., Straus S. E. Personal health records: a scoping review. Journal of the American Medical Informatics Association. 2011;18(4):515–522. PMCID: PMC3128401. PMID: 21672914.
  • Arksey H., O’Malley L. Scoping studies: towards a methodological framework. International Journal of Social Research Methodology. 2005;8(1):19–32.
  • Bandara W., Miskon S., Fielt E. A systematic, tool-supported method for conducting literature reviews in information systems. Paper presented at the Proceedings of the 19th European Conference on Information Systems (ECIS 2011); June 9 to 11; Helsinki, Finland. 2011.
  • Baumeister R. F., Leary M. R. Writing narrative literature reviews. Review of General Psychology. 1997;1(3):311–320.
  • Becker L. A., Oxman A. D. In: Cochrane handbook for systematic reviews of interventions. Higgins J. P. T., Green S., editors. Hoboken, NJ: John Wiley & Sons, Ltd; 2008. Overviews of reviews; pp. 607–631.
  • Borenstein M., Hedges L., Higgins J., Rothstein H. Introduction to meta-analysis. Hoboken, NJ: John Wiley & Sons Inc; 2009.
  • Cook D. J., Mulrow C. D., Haynes B. Systematic reviews: Synthesis of best evidence for clinical decisions. Annals of Internal Medicine. 1997;126(5):376–380. PMID: 9054282.
  • Cooper H., Hedges L. V. In: The handbook of research synthesis and meta-analysis. 2nd ed. Cooper H., Hedges L. V., Valentine J. C., editors. New York: Russell Sage Foundation; 2009. Research synthesis as a scientific process; pp. 3–17.
  • Cooper H. M. Organizing knowledge syntheses: A taxonomy of literature reviews. Knowledge in Society. 1988;1(1):104–126.
  • Cronin P., Ryan F., Coughlan M. Undertaking a literature review: a step-by-step approach. British Journal of Nursing. 2008;17(1):38–43. PMID: 18399395.
  • Darlow S., Wen K. Y. Development testing of mobile health interventions for cancer patient self-management: A review. Health Informatics Journal. 2015 (online before print). PMID: 25916831.
  • Daudt H. M., van Mossel C., Scott S. J. Enhancing the scoping study methodology: a large, inter-professional team’s experience with Arksey and O’Malley’s framework. BMC Medical Research Methodology. 2013;13:48. PMCID: PMC3614526. PMID: 23522333.
  • Davies P. The relevance of systematic reviews to educational policy and practice. Oxford Review of Education. 2000;26(3-4):365–378.
  • Deeks J. J., Higgins J. P. T., Altman D. G. In: Cochrane handbook for systematic reviews of interventions. Higgins J. P. T., Green S., editors. Hoboken, NJ: John Wiley & Sons, Ltd; 2008. Analysing data and undertaking meta-analyses; pp. 243–296.
  • DeShazo J. P., Lavallie D. L., Wolf F. M. Publication trends in the medical informatics literature: 20 years of “Medical Informatics” in MeSH. BMC Medical Informatics and Decision Making. 2009;9:7. PMCID: PMC2652453. PMID: 19159472.
  • Dixon-Woods M., Agarwal S., Jones D., Young B., Sutton A. Synthesising qualitative and quantitative evidence: a review of possible methods. Journal of Health Services Research and Policy. 2005;10(1):45–53. PMID: 15667704.
  • Finfgeld-Connett D., Johnson E. D. Literature search strategies for conducting knowledge-building and theory-generating qualitative systematic reviews. Journal of Advanced Nursing. 2013;69(1):194–204. PMCID: PMC3424349. PMID: 22591030.
  • Grady B., Myers K. M., Nelson E. L., Belz N., Bennett L., Carnahan L. … Guidelines Working Group. Evidence-based practice for telemental health. Telemedicine Journal and E Health. 2011;17(2):131–148. PMID: 21385026.
  • Green B. N., Johnson C. D., Adams A. Writing narrative literature reviews for peer-reviewed journals: secrets of the trade. Journal of Chiropractic Medicine. 2006;5(3):101–117. PMCID: PMC2647067. PMID: 19674681.
  • Greenhalgh T., Wong G., Westhorp G., Pawson R. Protocol–realist and meta-narrative evidence synthesis: evolving standards (RAMESES). BMC Medical Research Methodology. 2011;11:115. PMCID: PMC3173389. PMID: 21843376.
  • Gurol-Urganci I., de Jongh T., Vodopivec-Jamsek V., Atun R., Car J. Mobile phone messaging reminders for attendance at healthcare appointments. Cochrane Database of Systematic Reviews. 2013;(12):CD007458. PMCID: PMC6485985. PMID: 24310741.
  • Hart C. Doing a literature review: Releasing the social science research imagination. London: SAGE Publications; 1998.
  • Higgins J. P. T., Green S., editors. Cochrane handbook for systematic reviews of interventions: Cochrane book series. Hoboken, nj : Wiley-Blackwell; 2008.
  • Jesson J., Matheson L., Lacey F.M. Doing your literature review: traditional and systematic techniques. Los Angeles & London: SAGE Publications; 2011.
  • King W. R., He J. Understanding the role and methods of meta-analysis in IS research. Communications of the Association for Information Systems. 2005; 16 :1.
  • Kirkevold M. Integrative nursing research — an important strategy to further the development of nursing science and nursing practice. Journal of Advanced Nursing. 1997; 25 (5):977–984. [ PubMed : 9147203 ]
  • Kitchenham B., Charters S. ebse Technical Report Version 2.3. Keele & Durham. uk : Keele University & University of Durham; 2007. Guidelines for performing systematic literature reviews in software engineering.
  • Kitsiou S., Paré G., Jaana M. Systematic reviews and meta-analyses of home telemonitoring interventions for patients with chronic diseases: a critical assessment of their methodological quality. Journal of Medical Internet Research. 2013; 15 (7):e150. [ PMC free article : PMC3785977 ] [ PubMed : 23880072 ]
  • Kitsiou S., Paré G., Jaana M. Effects of home telemonitoring interventions on patients with chronic heart failure: an overview of systematic reviews. Journal of Medical Internet Research. 2015; 17 (3):e63. [ PMC free article : PMC4376138 ] [ PubMed : 25768664 ]
  • Levac D., Colquhoun H., O’Brien K. K. Scoping studies: advancing the methodology. Implementation Science. 2010; 5 (1):69. [ PMC free article : PMC2954944 ] [ PubMed : 20854677 ]
  • Levy Y., Ellis T.J. A systems approach to conduct an effective literature review in support of information systems research. Informing Science. 2006; 9 :181–211.
  • Liberati A., Altman D. G., Tetzlaff J., Mulrow C., Gøtzsche P. C., Ioannidis J. P. A. et al. Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: Explanation and elaboration. Annals of Internal Medicine. 2009; 151 (4):W-65. [ PubMed : 19622512 ]
  • Lyden J. R., Zickmund S. L., Bhargava T. D., Bryce C. L., Conroy M. B., Fischer G. S. et al. McTigue K. M. Implementing health information technology in a patient-centered manner: Patient experiences with an online evidence-based lifestyle intervention. Journal for Healthcare Quality. 2013; 35 (5):47–57. [ PubMed : 24004039 ]
  • Mickan S., Atherton H., Roberts N. W., Heneghan C., Tilson J.K. Use of handheld computers in clinical practice: a systematic review. BMC Medical Informatics and Decision Making. 2014; 14 :56. [ PMC free article : PMC4099138 ] [ PubMed : 24998515 ]
  • Moher D. The problem of duplicate systematic reviews. British Medical Journal. 2013; 347 (5040) [ PubMed : 23945367 ] [ CrossRef ]
  • Montori V. M., Wilczynski N. L., Morgan D., Haynes R. B., Hedges T. Systematic reviews: a cross-sectional study of location and citation counts. BMC Medicine. 2003; 1 :2. [ PMC free article : PMC281591 ] [ PubMed : 14633274 ]
  • Mulrow C. D. The medical review article: state of the science. Annals of Internal Medicine. 1987; 106 (3):485–488. [ PubMed : 3813259 ] [ CrossRef ]
  • Evidence-based information systems: A decade later. Proceedings of the European Conference on Information Systems ; 2011. Retrieved from http://aisel.aisnet.org/cgi/viewcontent.cgi?article=1221&context=ecis2011 .
  • Okoli C., Schabram K. A guide to conducting a systematic literature review of information systems research. SSRN Electronic Journal. 2010
  • Otte-Trojel T., de Bont A., Rundall T. G., van de Klundert J. How outcomes are achieved through patient portals: a realist review. Journal of American Medical Informatics Association. 2014; 21 (4):751–757. [ PMC free article : PMC4078283 ] [ PubMed : 24503882 ]
  • Paré G., Trudel M.-C., Jaana M., Kitsiou S. Synthesizing information systems knowledge: A typology of literature reviews. Information & Management. 2015; 52 (2):183–199.
  • Patsopoulos N. A., Analatos A. A., Ioannidis J.P. A. Relative citation impact of various study designs in the health sciences. Journal of the American Medical Association. 2005; 293 (19):2362–2366. [ PubMed : 15900006 ]
  • Paul M. M., Greene C. M., Newton-Dame R., Thorpe L. E., Perlman S. E., McVeigh K. H., Gourevitch M.N. The state of population health surveillance using electronic health records: A narrative review. Population Health Management. 2015; 18 (3):209–216. [ PubMed : 25608033 ]
  • Pawson R. Evidence-based policy: a realist perspective. London: SAGE Publications; 2006.
  • Pawson R., Greenhalgh T., Harvey G., Walshe K. Realist review—a new method of systematic review designed for complex policy interventions. Journal of Health Services Research & Policy. 2005; 10 (Suppl 1):21–34. [ PubMed : 16053581 ]
  • Petersen K., Vakkalanka S., Kuzniarz L. Guidelines for conducting systematic mapping studies in software engineering: An update. Information and Software Technology. 2015; 64 :1–18.
  • Petticrew M., Roberts H. Systematic reviews in the social sciences: A practical guide. Malden, MA: Blackwell Publishing Co; 2006.
  • Rousseau D. M., Manning J., Denyer D. Evidence in management and organizational science: Assembling the field’s full weight of scientific knowledge through syntheses. The Academy of Management Annals. 2008; 2 (1):475–515.
  • Rowe F. What literature review is not: diversity, boundaries and recommendations. European Journal of Information Systems. 2014; 23 (3):241–255.
  • Shea B. J., Hamel C., Wells G. A., Bouter L. M., Kristjansson E., Grimshaw J. et al. Boers M. AMSTAR is a reliable and valid measurement tool to assess the methodological quality of systematic reviews. Journal of Clinical Epidemiology. 2009; 62 (10):1013–1020. [ PubMed : 19230606 ]
  • Shepperd S., Lewin S., Straus S., Clarke M., Eccles M. P., Fitzpatrick R. et al. Sheikh A. Can we systematically review studies that evaluate complex interventions? PLoS Medicine. 2009; 6 (8):e1000086. [ PMC free article : PMC2717209 ] [ PubMed : 19668360 ]
  • Silva B. M., Rodrigues J. J., de la Torre Díez I., López-Coronado M., Saleem K. Mobile-health: A review of current state in 2015. Journal of Biomedical Informatics. 2015; 56 :265–272. [ PubMed : 26071682 ]
  • Smith V., Devane D., Begley C., Clarke M. Methodology in conducting a systematic review of systematic reviews of healthcare interventions. BMC Medical Research Methodology. 2011; 11 (1):15. [ PMC free article : PMC3039637 ] [ PubMed : 21291558 ]
  • Sylvester A., Tate M., Johnstone D. Beyond synthesis: re-presenting heterogeneous research literature. Behaviour & Information Technology. 2013; 32 (12):1199–1215.
  • Templier M., Paré G. A framework for guiding and evaluating literature reviews. Communications of the Association for Information Systems. 2015; 37 (6):112–137.
  • Thomas J., Harden A. Methods for the thematic synthesis of qualitative research in systematic reviews. BMC Medical Research Methodology. 2008; 8 (1):45. [ PMC free article : PMC2478656 ] [ PubMed : 18616818 ]
  • Reconstructing the giant: on the importance of rigour in documenting the literature search process. Paper presented at the Proceedings of the 17th European Conference on Information Systems (ECIS 2009); Verona, Italy. 2009.
  • Webster J., Watson R.T. Analyzing the past to prepare for the future: Writing a literature review. Management Information Systems Quarterly. 2002; 26 (2):11.
  • Whitlock E. P., Lin J. S., Chou R., Shekelle P., Robinson K.A. Using existing systematic reviews in complex systematic reviews. Annals of Internal Medicine. 2008; 148 (10):776–782. [ PubMed : 18490690 ]

This publication is licensed under a Creative Commons License, Attribution-Noncommercial 4.0 International License (CC BY-NC 4.0): see https://creativecommons.org/licenses/by-nc/4.0/

Cite this page: Paré G, Kitsiou S. Chapter 9 Methods for Literature Reviews. In: Lau F, Kuziemsky C, editors. Handbook of eHealth Evaluation: An Evidence-based Approach [Internet]. Victoria (BC): University of Victoria; 2017 Feb 27.

The characteristics of teacher training with social impact to overcome school violence: a literature review.


  • 1. Introduction
  • 2.1. Search Strategy
  • 2.2. Inclusion and Exclusion Criteria
  • 2.3. Screening
  • 2.4. Data Extraction
  • 3.1. Type of Teacher Training
  • 3.2. Content of Teacher Training
  • 3.3. Social Impact Achieved
  • 4. Discussion and Conclusions
  • 5. Limitations
  • Author Contributions
  • Institutional Review Board Statement
  • Informed Consent Statement
  • Data Availability Statement
  • Conflicts of Interest

  • United Nations. Transforming Our World: The 2030 Agenda for Sustainable Development. United Nations. 2015. Available online: https://sdgs.un.org/2030agenda (accessed on 16 July 2024).
  • UNESCO. Behind the Numbers: Ending School Violence and Bullying. UNESCO. 2019. Available online: https://unesdoc.unesco.org/ark:/48223/pf0000366483 (accessed on 21 July 2024).
  • World Health Organization. School-Based Violence Prevention: A Practical Handbook; World Health Organization: Geneva, Switzerland, 2019; Available online: https://apps.who.int/iris/handle/10665/324930 (accessed on 24 July 2024).
  • Hillis, S.; Mercy, J.A.; Amobi, A.; Kress, H. Global prevalence of past-year violence against children: A systematic review and minimum estimates. Pediatrics 2016, 137, e20154079.
  • UNICEF. An Everyday Lesson. #ENDviolence in Schools. UNICEF. 2018. Available online: https://www.unicef.org/documents/everyday-lesson-endviolence-schools (accessed on 15 March 2024).
  • Nansel, T.R.; Overpeck, M.D.; Pilla, R.S.; Ruan, W.J.; Simons-Morton, B.; Scheidt, P. Bullying behaviors among US youth: Prevalence and association with psychosocial adjustment. JAMA 2001, 285, 2094–2100.
  • UNICEF. Hidden in Plain Sight: A Statistical Analysis of Violence against Children. UNICEF. 2014. Available online: https://data.unicef.org/resources/hidden-in-plain-sight-a-statistical-analysis-of-violence-against-children/ (accessed on 10 March 2024).
  • UNESCO. School Violence and Bullying: Global Status Report; UNESCO: London, UK, 2017; Volume 9.
  • Wang, J.; Iannotti, R.J.; Nansel, T.R. School Bullying among Adolescents in the United States: Physical, Verbal, Relational, and Cyber. J. Adolesc. Health 2009, 45, 368–375.
  • Wodon, Q.; Fevre, C.; Male, C.; Nayihouba, A.; Nguyen, H. Ending Violence in Schools: An Investment Case; The World Bank and the Global Partnership to End Violence Against Children: Washington, DC, USA, 2021; Available online: https://documents1.worldbank.org/curated/en/470341626799342515/pdf/Ending-Violence-in-Schools-An-Investment-Case.pdf (accessed on 25 July 2024).
  • Gershoff, E.T. School Corporal Punishment in Global Perspective: Prevalence, Outcomes, and Efforts at Intervention. Psychol. Health Med. 2017, 22, 224–239.
  • Ogando Portela, M.J.; Pells, K. Corporal Punishment in Schools: Longitudinal Evidence from Ethiopia, India, Peru, and Viet Nam (Innocenti Discussion Paper No. 2015-02); UNICEF Office of Research: Florence, Italy, 2015; Available online: https://www.unicef-irc.org/publications/series/22/ (accessed on 25 June 2024).
  • Busch, V.; Loyen, A.; Lodder, M.; Schrijvers, A.J.P.; van Yperen, T.A.; de Leeuw, J.R.J. The Effects of Adolescent Health-Related Behavior on Academic Performance: A Systematic Review of the Longitudinal Evidence. Rev. Educ. Res. 2014, 84, 245–274.
  • Wang, W.; Vaillancourt, T.; Brittain, H.L.; McDougall, P.; Krygsman, A.; Smith, D.; Cunningham, C.E.; Haltigan, J.D.; Hymel, S. School climate, peer victimization, and academic achievement: Results from a multi-informant study. Sch. Psychol. Q. 2014, 29, 360–377.
  • UNESCO. Más Allá de los Números: Poner fin a la Violencia y el Acoso en el Ámbito Escolar [Beyond the Numbers: Ending Violence and Bullying in the School Setting]. UNESCO. 2021. Available online: https://unesdoc.unesco.org/ark:/48223/pf0000378398 (accessed on 20 August 2024).
  • Dalla Pozza, V.; Di Pietro, A.; Morel, S.; Psaila, E. Cyberbullying among Young People; Directorate General for Internal Policies, Policy Department, Citizens’ Rights and Constitutional Affairs: Brussels, Belgium, 2016.
  • Moore, S.E.; Norman, R.E.; Suetani, S.; Thomas, H.J.; Sly, P.D.; Scott, J.G. Consequences of bullying victimization in childhood and adolescence: A systematic review and meta-analysis. World J. Psychiatry 2017, 7, 60.
  • Seedat, S.; Stein, M.B.; Kennedy, C.M.; Hauger, R.L. Plasma cortisol and neuropeptide Y in female victims of intimate partner violence. Psychoneuroendocrinology 2003, 28, 796–808.
  • Shonkoff, J.P.; Garner, A.S.; Committee on Psychosocial Aspects of Child and Family Health; Committee on Early Childhood, Adoption, and Dependent Care; Section on Developmental and Behavioral Pediatrics. The lifelong effects of early childhood adversity and toxic stress. Pediatrics 2012, 129, e232–e246.
  • Zhang, T.-Y.; Meaney, M.J. Epigenetics and the environmental regulation of the genome and its function. Annu. Rev. Psychol. 2010, 61, 439–466.
  • Flecha, R.; Puigvert, L.; Racionero-Plaza, S. Achieving Student Well-Being for All: Educational Contexts Free of Violence. NESET Report. 2023. Available online: https://nesetweb.eu/wp-content/uploads/2023/01/NESER_AR1_full_report-KC-2.pdf (accessed on 25 July 2024).
  • Pivik, J.; McComas, J.; Laflamme, M. Barriers and Facilitators to Inclusive Education. Except. Child. 2002, 69, 97–107.
  • Jansen, D.E.; Veenstra, R.; Ormel, J.; Verhulst, F.C.; Reijneveld, S.A. Early Risk Factors for Being a Bully, Victim, or Bully/Victim in Late Elementary and Early Secondary Education. The Longitudinal TRAILS Study. BMC Public Health 2011, 11, 440.
  • Caravita, S.C.S.; Stefanelli, S.; Mazzone, A.; Cadei, L.; Thornberg, R.; Ambrosini, B. When the Bullied Peer Is Native-Born vs. Immigrant: A Mixed-Method Study with a Sample of Native-Born and Immigrant Adolescents. Scand. J. Psychol. 2019, 61, 97–107.
  • Rose, C.A.; Espelage, D.L.; Monda-Amaya, L.E. Bullying and Victimisation Rates among Students in General and Special Education: A Comparative Analysis. Educ. Psychol. 2009, 29, 761–776.
  • Symes, W.; Humphrey, N. Peer-Group Indicators of Social Inclusion among Pupils with Autistic Spectrum Disorders (ASD) in Mainstream Secondary Schools: A Comparative Study. Sch. Psychol. Int. 2010, 31, 478–494.
  • Earnshaw, V.A.; Reisner, S.L.; Juvonen, J.; Hatzenbuehler, M.L.; Perrotti, J.; Schuster, M.A. LGBTQ Bullying: Translating Research to Action in Pediatrics. Pediatrics 2017, 140, e20170432.
  • Hirschstein, M.K.; van Schoiack Edstrom, L.; Frey, K.S.; Snell, J.L.; MacKenzie, E.P. Walking the Talk in Bullying Prevention: Teacher Implementation Variables Related to Initial Impact of the Steps to Respect Program. Sch. Psychol. Rev. 2007, 36, 3–21.
  • Lodge, J.; Frydenberg, E. The Role of Peer Bystanders in School Bullying: Positive Steps toward Promoting Peaceful Schools. In Peace Education; Routledge: London, UK, 2013; pp. 329–336.
  • Mahon, J.; Packman, J.; Liles, E. Preservice Teachers’ Knowledge about Bullying: Implications for Teacher Education. Int. J. Qual. Stud. Educ. 2020, 93, 1–13.
  • Barnes, A.; Cross, D.; Lester, L.; Hearn, L.; Epstein, M.; Monks, H. The Invisibility of Covert Bullying among Students: Challenges for School Intervention. J. Psychol. Couns. Sch. 2012, 22, 206–226.
  • Craig, W.M.; Pepler, D.; Atlas, R. Observations of Bullying in the Playground and in the Classroom. Sch. Psychol. Int. 2000, 21, 22–36.
  • Bradshaw, C.P.; Sawyer, A.L.; O’Brennan, L.M. Bullying and Peer Victimization at School: Perceptual Differences Between Students and School Staff. Sch. Psychol. Rev. 2007, 36, 361–382.
  • Griffin, R.S.; Gross, A.M. Childhood Bullying: Current Empirical Findings and Future Directions for Research. Aggress. Violent Behav. 2004, 9, 379–400.
  • Yoon, J.S.; Kerber, K. Bullying: Elementary Teachers’ Attitudes and Intervention Strategies. Res. Educ. 2003, 69, 27–35.
  • Kochenderfer-Ladd, B.; Pelletier, M.E. Teachers’ Views and Beliefs about Bullying: Influences on Classroom Management Strategies and Students’ Coping with Peer Victimization. J. Sch. Psychol. 2008, 46, 431–453.
  • Pšunder, M. The Identification of Teasing Among Students as an Indispensable Step Towards Reducing Verbal Aggression in Schools. Educ. Stud. 2010, 36, 217–228.
  • Blain-Arcaro, C.; Smith, J.D.; Cunningham, C.E.; Vaillancourt, T.; Rimas, H. Contextual Attributes of Indirect Bullying Situations That Influence Teachers’ Decisions to Intervene. J. Sch. Violence 2012, 11, 226–245.
  • Boulton, M.J.; Hardcastle, K.; Down, J.; Fowles, J.; Simmonds, J.A. A Comparison of Preservice Teachers’ Responses to Cyber versus Traditional Bullying Scenarios: Similarities and Differences and Implications for Practice. J. Teach. Educ. 2014, 65, 145–155.
  • Nicolaides, S.; Toda, Y.; Smith, P.K. Knowledge and Attitudes about School Bullying in Trainee Teachers. Br. J. Educ. Psychol. 2002, 72, 105–118.
  • Veenstra, R.; Lindenberg, S.; Munniksma, A.; Dijkstra, J.K. The Complex Relation Between Bullying, Victimization, Acceptance, and Rejection: Giving Special Attention to Status, Affection, and Sex Differences. Child Dev. 2010, 81, 480–486.
  • Bauman, S. Do we need more measures of bullying? J. Adolesc. Health 2016, 59, 487–488.
  • Espelage, D.L.; Polanin, J.R.; Low, S.K. Teacher and Staff Perceptions of School Environment as Predictors of Student Aggression, Victimization, and Willingness to Intervene in Bullying Situations. Sch. Psychol. Q. 2014, 29, 287–305.
  • Adi, Y.; Killoran, A.; Janmohamed, K.; Stewart-Brown, S. Systematic Review of the Effectiveness of Interventions to Promote Mental Wellbeing in Primary Schools: Universal Approaches Which Do Not Focus on Violence or Bullying. National Institute for Clinical Excellence. 2007. Available online: https://www.ncbi.nlm.nih.gov/books/NBK73674/ (accessed on 26 July 2024).
  • Berkowitz, M.W.; Bier, M.C. What Works in Character Education: A Research-Driven Guide for Educators. Available online: https://www.researchgate.net/profile/Marvin-Berkowitz-2/publication/251977043_What_Works_In_Character_Education/links/53fb5ea60cf22f21c2f31c28/What-Works-In-Character-Education.pdf (accessed on 24 July 2024).
  • Diekstra, R.F.; Gravesteijn, C. Effectiveness of School-Based Social and Emotional Education Programmes Worldwide. In Social and Emotional Education: An International Analysis; SCIRP: Glendale, CA, USA, 2008; pp. 255–312. Available online: https://www.researchgate.net/profile/Rene-Diekstra-2/publication/255620397_Efectiveness_of_School-Based_Social_and_Emotional_Education_Programmes_Worldwide/links/555e0c9c08ae8c0cab2c5e7e/Efectiveness-of-School-Based-Social-and-Emotional-Education-Programmes-Worldwide.pdf (accessed on 24 July 2024).
  • Baumgarten, E.; Simmonds, M.; Mason-Jones, A.J. School-Based Interventions to Reduce Teacher Violence against Children: A Systematic Review. Child Abus. Rev. 2022, 32, e2803.
  • Nye, E.; Melendez-Torres, G.J.; Gardner, F. Mixed Methods Systematic Review on Effectiveness and Experiences of the Incredible Years Teacher Classroom Management Programme. Rev. Educ. Res. 2019, 7, 631–669.
  • Harris, D.N.; Sass, T.R. Teacher Training, Teacher Quality and Student Achievement. J. Public Econ. 2011, 95, 798–812.
  • Flecha, R.; Radauer, A.; van den Besselaar, P. Monitoring the Impact of EU Framework Programmes. European Commission. 2018. Available online: https://op.europa.eu/en/publication-detail/-/publication/cbb7ce39-d66d-11e8-9424-01aa75ed71a1 (accessed on 24 July 2024).
  • Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. Int. J. Surg. 2021, 88, 105906.
  • Baker-Henningham, H.; Bowers, M.; Francis, T.; Vera-Hernandez, M.; Walker, S.P. The Irie Classroom Toolbox, a Universal Violence-Prevention Teacher-Training Programme, in Jamaican Preschools: A Single-Blind, Cluster-Randomised Controlled Trial. Lancet Glob. Health 2021, 9, E456–E468.
  • Baker-Henningham, H.; Scott, Y.; Bowers, M.; Francis, T. Evaluation of a Violence-Prevention Programme with Jamaican Primary School Teachers: A Cluster Randomised Trial. Int. J. Environ. Res. Public Health 2019, 16, 2797.
  • Letendre, J.; Ostrander, J.A.; Mickens, A. Teacher and Staff Voices: Implementation of a Positive Behavior Bullying Prevention Program in an Urban School. Child. Sch. 2016, 38, 237–246.
  • Baker-Henningham, H.; Bowers, M.; Francis, T. The Process of Scaling Early Childhood Violence Prevention Programs in Jamaica. Pediatrics 2023, 151, e2023-060221M.
  • Bowers, M.; Francis, T.; Baker-Henningham, H. The Irie Classroom Toolbox: Mixed Method Assessment to Inform Future Implementation and Scale-up of an Early Childhood, Teacher-Training, Violence-Prevention Programme. Front. Public Health 2022, 10, 1040952.
  • Roca-Campos, E.; Duque, E.; Rios, O.; Ramis-Salas, M. The Zero Violence Brave Club: A Successful Intervention to Prevent and Address Bullying in Schools. Front. Psychiatry 2021, 12, 601424.
  • Rodriguez-Oramas, A.; Zubiri, H.; Arostegui, I.; Serradell, O.; Sanvicen-Torne, P. Dialogue with Educators to Assess the Impact of Dialogic Teacher Training for a Zero-Violence Climate in a Nursery School. Qual. Inq. 2020, 26, 1019–1025.
  • Costantino, C.; Casuccio, A.; Marotta, C.; Bono, S.E.; Ventura, G.; Mazzucco, W.; BIAS Study Working Group. Effects of an Intervention to Prevent the Bullying in First-Grade Secondary Schools of Palermo, Italy: The BIAS Study. Ital. J. Pediatr. 2019, 45, 64.
  • Madrid, B.J.; Lopez, G.D.; Dans, L.F.; Fry, D.A.; Duka-Pante, F.G.H.; Muyot, A.T. Safe Schools for Teens: Preventing Sexual Abuse of Urban Poor Teens, Proof-of-Concept Study—Improving Teachers’ and Students’ Knowledge, Skills and Attitudes. Heliyon 2020, 6, e04080.
  • Bowes, L.; Aryani, F.; Ohan, F.; Haryanti, R.H.; Winarna, S.; Arsianto, Y.; Budiyawati, H.; Widowati, E.; Saraswati, R.; Kristianto, Y.; et al. The Development and Pilot Testing of an Adolescent Bullying Intervention in Indonesia—The ROOTS Indonesia Program. Glob. Health Action 2019, 12, 1656905.
  • Flecha, R. Sharing Words: Theory and Practice of Dialogic Learning; Rowman & Littlefield: Lanham, MD, USA, 2000.
  • Diamond, A.; Lee, C.; Senften, P.; Lam, A.; Abbott, D. Randomized Control Trial of Tools of the Mind: Marked Benefits to Kindergarten Children and Their Teachers. PLoS ONE 2019, 14, e0222447.
  • Kitts, H. ‘It’s like Freire is haunting me’: The value of study groups for critical teacher professional development. Prof. Dev. Educ. 2024, 1–14.
  • Schiff, D.; Herzog, L.; Farley-Ripple, E.; Thum Iannuccilli, L. Teacher Networks in Philadelphia: Landscape, Engagement, and Value. Urban Educ. 2015, 12. Available online: https://eric.ed.gov/?id=EJ1056676 (accessed on 21 August 2024).
  • Flecha, R. The Dialogic Society. The Sociology Scientists and Citizens Like and Use; Hipatia Press: Barcelona, Spain, 2022.
  • Thapa, A.; Cohen, J.; Guffey, S.; Higgins-D’Alessandro, A. A Review of School Climate Research. Rev. Educ. Res. 2013, 83, 357–385.
  • Puigvert, L.; Gelsthorpe, L.; Soler-Gallart, M.; Flecha, R. Girls’ Perceptions of Boys with Violent Attitudes and Behaviours, and of Sexual Attraction. Palgrave Commun. 2019, 5, 56.
  • WHO. Violence against Children. 2022. Available online: https://www.who.int/news-room/fact-sheets/detail/violence-against-children (accessed on 21 August 2024).


Search Terms
“Bullying” AND “teach* train*” OR “Bullying” AND “Teach* education” OR “School harassment” AND “teach* train*” OR “School harassment” AND “Teach* education” OR “School peer victimization” AND “teach* train*” OR “School peer victimization” AND “Teach* education” OR “Zero violence” AND “Teach* train*” OR “Zero violence” AND “Teach* education” OR “Violence against children” AND “teach* train*” OR “Violence against children” AND “teach* education”
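The search string above pairs each of five school-violence terms with each of two teacher-training terms and joins the resulting 5 × 2 = 10 pairs with OR. Below is a minimal sketch, assuming Python 3.9+, of how such a string could be generated from two term lists. The function names are illustrative only and are not part of the authors' method. Bibliographic databases do not all apply the same precedence between AND and OR, so the sketch parenthesises each pair explicitly instead of relying on the database's default.

```python
from itertools import product

# Term lists transcribed from the search string above, lower-cased
# (most bibliographic databases match terms case-insensitively).
TOPIC_TERMS = [
    "bullying",
    "school harassment",
    "school peer victimization",
    "zero violence",
    "violence against children",
]
TRAINING_TERMS = ["teach* train*", "teach* education"]


def quote(term: str) -> str:
    """Wrap a term in double quotes so it is searched as a phrase."""
    return f'"{term}"'


def build_query(topics: list[str], trainings: list[str]) -> str:
    """OR together one parenthesised (topic AND training) pair per combination."""
    pairs = [
        f"({quote(topic)} AND {quote(training)})"
        for topic, training in product(topics, trainings)
    ]
    return " OR ".join(pairs)


def build_factored_query(topics: list[str], trainings: list[str]) -> str:
    """Equivalent shorter form: (topic1 OR topic2 ...) AND (training1 OR ...)."""
    left = " OR ".join(quote(t) for t in topics)
    right = " OR ".join(quote(t) for t in trainings)
    return f"({left}) AND ({right})"


if __name__ == "__main__":
    print(build_query(TOPIC_TERMS, TRAINING_TERMS))
    print(build_factored_query(TOPIC_TERMS, TRAINING_TERMS))
```

The factored form is logically equivalent because AND distributes over OR. It is usually easier to audit when terms are added or removed. Before relying on either form, check in the target database's documentation how it handles truncation (`*`) inside quoted phrases.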
| Title | Type of Study | Country/Region | Population | Duration of the Teacher Training |
|---|---|---|---|---|
| School-based interventions to reduce teacher violence against children: a systematic review | Systematic Review | Uganda, Tanzania, and Jamaica | 8 to 42 schools with between 55 and 591 teachers and 220–4789 students aged 7 to 15 years | Study 1: 18 months; Study 2: 5.5 days, 8 h per day; Study 3: 12 h spread over 8 months; Study 4: 5 days, 9 h per day |
| Mixed methods systematic review on effectiveness and experiences of the Incredible Years Teacher Classroom Management programme | Systematic Review | England, Ireland, Jamaica, the United States, and Wales | More than 147 schools, over 336 teachers and teaching assistants, and 5759 children | Between 5 and 8 sessions received |
| The Irie Classroom Toolbox, a universal violence-prevention teacher-training programme, in Jamaican preschools: a single-blind, cluster-randomised controlled trial | Quantitative | Kingston and St Andrew, Jamaica | 76 preschools, 3–6-year-old children, 224 female teachers and 5 male (229 in total), 865 students aged 4 years | 5 full-day (6 h) workshops over one school year and 8 one-hour sessions of in-class support (once a month for 8 months) |
| Evaluation of a Violence-Prevention Programme with Jamaican Primary School Teachers: A Cluster Randomised Trial | Quantitative | Kingston, Jamaica | 14 primary schools, 55 teachers, and 220 students | 11.5 h of training over 8 months |
| Effects of an intervention to prevent the bullying in first-grade secondary schools of Palermo, Italy: the BIAS study | Quantitative | Palermo, Italy | Palermo, Italy | Over two school years, a pre–post intervention with 4 meetings, each lasting 5 h (20 h) |
| Randomized control trial of Tools of the Mind: Marked benefits to kindergarten children and their teachers | Quantitative | Canada | 351 kindergarten children (mean age 5.2 years at entry, 51% female) in 18 public schools | Three-day workshop before the school year began; four one-day workshops during the school year |
| Teacher and Staff Voices: Implementation of a Positive Behavior Bullying Prevention Program in an Urban School | Qualitative | Hartford, Connecticut, USA | 21 people (teachers, support staff, and administrators) | No data specified |
| The Zero Violence Brave Club: A Successful Intervention to Prevent and Address Bullying in Schools | Qualitative | Valencia, Spain | 7 schools, 10 teachers (4 men and 6 women) | No data specified |
| Dialogue With Educators to Assess the Impact of Dialogic Teacher Training for a Zero-Violence Climate in a Nursery School | Qualitative | Lleida, Spain | 6 educators from the nursery staff | No data specified |
| The Process of Scaling Early Childhood Violence Prevention Programs in Jamaica | Mixed methods | Jamaica | Technical staff from the Ministry of Education, 16 middle managers of the ECC, 42 ECC field officers, 18 teachers from 9 preschools (2 teachers each). All government teachers of grades 1, 2, and 3, approximately 5000 teachers. A total of 840 preschool teachers and 557 parents | Twenty 90 min training modules over 1 school year |
| The Irie Classroom Toolbox: Mixed method assessment to inform future implementation and scale-up of an early childhood, teacher-training, violence-prevention programme | Mixed methods | Kingston and St Andrew, Jamaica | 76 preschools, 3–6-year-old children, and 38 schools, with 108 teachers evaluated at pre-test and 91 teachers from 37 preschools evaluated at post-test | Four full-day (6 h each) workshops, 8 one-hour sessions of in-class support (once a month for 8 months) |
| The Development and Pilot Testing of an Adolescent Bullying Intervention in Indonesia—the ROOTS Indonesia Program | Mixed methods | Indonesia (South Sulawesi and Central Java) | 7592 students across two pilot studies (2075 in the first and 5517 in the second) | Two-day training and follow-up coaching |
| Safe schools for teens: preventing sexual abuse of urban poor teens, proof-of-concept study—Improving teachers’ and students’ knowledge, skills and attitudes | Mixed methods | Manila, Philippines | 237 teachers (33 male, 186 female) and 1458 Grade 7 students | Two-day training |

Olabarria, A.; Zubiri-Esnaola, H.; Carbonell, S.; Canal-Barbany, J.M. The Characteristics of Teacher Training with Social Impact to Overcome School Violence: A Literature Review. Future 2024, 2, 135-148. https://doi.org/10.3390/future2030011

  • Review Article
  • Open access
  • Published: 03 September 2024

Financial fraud detection through the application of machine learning techniques: a literature review

  • Ludivia Hernandez Aros (ORCID: orcid.org/0000-0002-1571-3439)
  • Luisa Ximena Bustamante Molano (ORCID: orcid.org/0009-0001-2038-8730)
  • Fernando Gutierrez-Portela (ORCID: orcid.org/0000-0003-3722-3809)
  • John Johver Moreno Hernandez (ORCID: orcid.org/0000-0002-8742-7781)
  • Mario Samuel Rodríguez Barrero (ORCID: orcid.org/0000-0001-9356-6764)

Humanities and Social Sciences Communications, volume 11, Article number 1130 (2024)

  • Business and management

Financial fraud negatively affects organizational administrative processes, particularly harming owners and investors seeking to maximize their profits. To address this issue, this study presents a literature review on financial fraud detection through machine learning techniques. The PRISMA and Kitchenham methods were applied, and 104 articles published between 2012 and 2023 were examined. These articles were selected according to predefined inclusion and exclusion criteria from databases such as Scopus, IEEE Xplore, Taylor & Francis, SAGE, and ScienceDirect. The selected articles were analyzed in terms of their authors, sources, countries, trends, the datasets used in their experiments, and the types of financial fraud they address. The machine learning models applied and the metrics used to assess their performance were also examined. The analysis indicated a trend toward using real datasets. Notably, credit card fraud detection models are the most widely used for detecting credit card loan fraud. The data used by different authors were obtained from the stock exchanges of China, Canada, the United States, Taiwan, and Tehran, among others. Furthermore, the use of synthetic data has been low (less than 7% of the employed datasets). Among the leading contributors to the studies, China, India, Saudi Arabia, and Canada remain prominent, whereas Latin American countries have few related publications.

Introduction

Financial fraud is a highly significant problem, with grave consequences across business sectors and for people’s daily lives (Singh et al., 2022). It reduces confidence in the economy, leading to destabilization and direct economic repercussions for stakeholders (Reurink, 2018). Abdallah et al. (2016) define fraud as a criminal act aimed at obtaining money unlawfully. There are diverse types of fraud, such as asset misappropriation, expense reimbursement fraud, and financial statement manipulation. Scholars have classified fraud into three categories: banking, corporate, and insurance (Ali et al., 2022; Nicholls et al., 2021; West and Bhattacharya, 2016).

The scale of the problem is evident in the 2022 PricewaterhouseCoopers survey report, which revealed that 56% of companies globally have fallen victim to some form of fraud. In Latin America, 32% of companies have experienced fraud (PricewaterhouseCoopers, 2022). These alarming statistics align with the findings of Klynveld Peat Marwick Goerdeler (KPMG). In that survey, 83% of the executives reported having been targeted by cyber-attacks in the past 12 months, and 71% had encountered some type of internal or external fraud (KPMG, 2022). These survey results reveal the high risks of financial fraud faced by companies in Latin America, the United States, and Canada. In this context, traditional approaches and techniques, including manual methods, have lost relevance and effectiveness because they cannot cope with the complexity and scale of the information involved in detecting financial fraud.

Despite organizations’ interest in detecting financial fraud using machine learning (ML), current knowledge in this field remains limited. An initial review of the specialized literature shows that most researchers have focused on credit card fraud using a supervised approach (Femila Roseline et al., 2022; Madhurya et al., 2022; Plakandaras et al., 2022; Saragih et al., 2019). In the studies of Ali et al. (2022), Hilal et al. (2022), and Ramírez-Alpízar et al. (2020), supervised ML techniques were the most widely used methods for detecting financial fraud. They were used more often than unsupervised, deep learning, reinforcement, and semi-supervised approaches, among others. Moreover, scholars such as Whiting et al. (2012) have compared data mining models for detecting fraudulent financial statements. They used quarterly and annual financial indexes of public companies from the COMPUSTAT database.

Reurink (2018) analyzed financial fraud in the financial market resulting from false financial reports, scams, and misleading financial sales. Like Wadhwa et al. (2020), he presented a wide variety of data mining methods, approaches, and techniques used in fraud detection. Other research has addressed online banking fraud (Zhou et al., 2018; Moreira et al., 2022; Srokosz et al., 2023) and financial statement fraud (S. Chen, 2016; Ramírez-Alpízar et al., 2020). These works show that the accuracy of ML-based models for detecting financial fraud has increased (Al-Hashedi and Magalingam, 2021).

Organizations need to respond to the negative impact of financial fraud. Effective detection and prevention depend on selecting ML techniques that identify new threats and minimize false fraud alarms (Ahmed et al., 2016). ML techniques have made it possible to identify patterns and anomalies in large financial datasets. However, several obstacles continue to hinder the accurate and timely detection of financial fraud (Dantas et al., 2022; Mongwe and Malan, 2020; Nicholls et al., 2021; West and Bhattacharya, 2016). These include challenges in developing detection tools, inaccurate classification, limitations of detection methods, privacy concerns, computational performance, and disproportionate misclassification costs.

Several recent studies have reviewed financial statement fraud detection methods in data mining and ML (Gupta and Mehta, 2021; Shahana et al., 2023). These authors established the types of financial fraud and the data mining techniques and approaches used to detect financial statement fraud. The present study differs from these works. It explains the trends in the use of ML approaches and techniques to detect financial fraud, and it presents the datasets most frequently used in the literature for conducting experiments.

Fraud detection mechanisms based on machine learning techniques help detect unusual transactions and prevent cybercrime (Polak et al., 2020). Although these approaches use different experimental methods, a systematic literature review (SLR) shows how each algorithm is evaluated. Performance metrics measure how accurately the algorithm predicts whether a financial transaction is fraudulent. Such metrics include Accuracy, Precision, Recall (also called Sensitivity), and F1 Score, among others.
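The Python sketch below shows how these metrics relate to one another. It computes them from the four cells of a binary confusion matrix, treating fraud as the positive class. The class and function names and the example counts are illustrative assumptions, not values from any reviewed study. Recall and Sensitivity are the same quantity, so only one is computed.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary fraud classifier; fraud is the positive class."""
    tp: int  # fraudulent transactions correctly flagged
    fp: int  # legitimate transactions wrongly flagged (false alarms)
    fn: int  # fraudulent transactions missed
    tn: int  # legitimate transactions correctly passed


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def fraud_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Accuracy, Precision, Recall (Sensitivity), and F1 Score."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    precision = safe_div(cm.tp, cm.tp + cm.fp)
    recall = safe_div(cm.tp, cm.tp + cm.fn)  # also called sensitivity
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": safe_div(cm.tp + cm.tn, total),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical imbalanced test set: 100 frauds among 10,000 transactions.
    cm = ConfusionMatrix(tp=80, fp=40, fn=20, tn=9860)
    for name, value in fraud_metrics(cm).items():
        print(f"{name}: {value:.3f}")
```

With these hypothetical counts, accuracy is 0.994 even though one fraud in five is missed (recall 0.8). This is why accuracy alone can be misleading on highly imbalanced fraud data.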

This research uses a rigorous, well-structured methodology to expand current knowledge on financial fraud detection using ML techniques. The systematic literature review follows adaptations of the PRISMA guidelines and Kitchenham’s methodology, which ensures a carefully planned and transparent review process. The sources consulted are research articles published in reputable academic databases such as Scopus, IEEE Xplore, Taylor & Francis, SAGE, and ScienceDirect. This ensures that the review covers the most relevant, high-quality scientific literature on financial fraud and machine learning. Moreover, the study includes a bibliometric analysis using VOSviewer software, which identifies trends and patterns within the literature both quantitatively and visually. Based on the 104 articles reviewed, covering 2012–2023, we describe the types of fraud, the models applied, the ML techniques used, the datasets employed, and the performance metrics reported. These results help fill existing gaps in the literature. They provide a comprehensive, up-to-date synthesis of the evidence on machine learning for financial fraud detection and lay the groundwork for future research and practical applications in this field.

Our answers to the research questions constitute the four main contributions that justify this research. First, this study examines the relationship between current research on financial fraud detection and ML in terms of scholars, articles, countries, journals, and trends in the area. Fraud is classified as internal or external, with a focus on investigations of credit card loan fraud and insurance fraud. The different ML techniques and the models applied in experiments are grouped. The datasets most widely used in financial fraud detection with ML are analyzed for the 86 articles that contained experiments, showing that most of them involve real data. Finally, the paper is useful for researchers because it presents the metrics used in supervised and unsupervised learning experiments, providing a clear view of how they are applied across the different models.

This study is therefore relevant because it consolidates and updates the findings of experiments on the use of ML. These findings can help address financial fraud when it occurs.

The research work is organized as follows: the section “Methods” comprehensively describes the research method and the questions addressed in the study. Section “Results of the data synthesis” presents the findings encompassing authors, articles, sources, countries, trends, financial fraud types, and datasets with their characteristics to which the detection models using ML techniques were applied, with the results of their metrics. Finally, the section “Discussion and conclusion” highlights the conclusions, including future lines of research in the field.

Methods

The study is an SLR, which provides a comprehensive view of the major developments in financial fraud detection. To this end, the review followed the PRISMA and Kitchenham guidelines in adapted form (Ashtiani and Raahemi, 2022; Kitchenham and Brereton, 2013; Kitchenham and Stuart, 2007; Kumbure et al., 2022; Moher et al., 2009; Roehrs et al., 2017; Saputra et al., 2023; Wohlin, 2014).

The method used in the SLR was developed with carefully planned and executed activities: (a) planning of the review, (b) definition of research questions, (c) description of the search strategy, (d) consultation concerning the search strategy, (e) selection of the inclusion/exclusion criteria and data selection, (f) description of the quality assessment, (g) investigation of the study topics, (h) description of data extraction, and (i) synthesis of the data.

Each of the activities conducted in this study is explained below.

Planning of the review

The research purpose was established in accordance with the indicated research goals and questions. The analysis focused on research articles published between 2012 and 2023, particularly those using ML methods for financial fraud detection. Accordingly, the SLR procedure presented by Kitchenham and Stuart (2007) and Moher et al. (2009) was implemented following a series of steps adapted and modified by Ashtiani and Raahemi (2022) and Kumbure et al. (2022), as depicted in Fig. 1. Thus, it was possible to ensure a rigorous and objective analysis of the available literature in our field of interest.

Figure 1: Description of the general process used to review the literature in the study area. Authors’ own elaboration.

The procedures implemented in this review process are discussed in the following subsections.

Definition of research questions

In an SLR, research questions are key and decisive for the success of the study (Kitchenham and Stuart, 2007). Therefore, analyzing the existing literature on financial fraud detection through ML techniques and its characteristics, problems, challenges, solutions, and research trends is crucial. Table 1 describes the research questions to provide a structured framework for the study.

Within the proposed systematic review, the questions were fine-tuned, achieving a better classification and thematic analysis. The research questions were categorized into two groups: general questions (GQ) and specific questions (SQ). GQs provide an overview of the current state of the art, that is, a general framework for future research. Meanwhile, SQs focus on specific matters emerging from the application areas of the topic, thereby improving the filtering process of the study.

Description of the search strategy

The search strategy was designed to identify a set of studies addressing the research questions and was implemented in two stages. In the first stage, a manual search was conducted by selecting a set of test documents from a defined database. A snowball search was then conducted following the strategy proposed by Wohlin (2014). This approach starts from a set of initial references (e.g., relevant articles or books on the subject) and uses them to find new related references relevant to the study.

In the second stage, an automated search was performed using the technique described by Kitchenham and Brereton (2013). This technique included preparing a list of the main search terms to be applied in the queries for each database, as indicated in subsection “Search queries”.

Manual search

In the study’s initial stage, nine journal articles were selected from the test set of papers (Ahmed et al., 2016; Ali et al., 2022; Bakumenko and Elragal, 2022; Gupta and Mehta, 2021; Hilal et al., 2022; Nicholls et al., 2021; Nonnenmacher and Marx Gómez, 2021; Ramírez-Alpízar et al., 2020; West and Bhattacharya, 2016). The manual literature search helped identify articles related to financial fraud detection through ML techniques. These articles were used as an initial set and were part of the final analysis. In the subsequent stage, a backward and forward snowball search was conducted, using the initial set to select further relevant articles.

The backward snowball search consisted of reviewing the titles of the articles cited by the initial set and including those that met the inclusion and exclusion criteria. In the forward snowball search, the Scopus database was used to identify studies citing one or more of the articles in the initial set. This filtering helped identify studies meeting the inclusion and exclusion criteria and eliminate duplicates of the previous set. Articles answering the research questions were retained in the final study set.
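The backward and forward steps can be sketched as an iterative traversal of a citation graph, in the spirit of Wohlin (2014). In this Python sketch, three names are hypothetical stand-ins:

- `references` stands in for the reference lists of each article.
- `cited_by` stands in for Scopus "cited by" results.
- `meets_criteria` stands in for the manual screening decision.

The text does not state how many snowballing iterations the authors ran, so the loop simply continues until no new articles appear.

```python
from typing import Callable


def snowball(
    seed: set[str],
    references: dict[str, list[str]],
    cited_by: dict[str, list[str]],
    meets_criteria: Callable[[str], bool],
) -> set[str]:
    """Alternate backward and forward snowballing until no new articles are found.

    references: article -> articles it cites (used for the backward step)
    cited_by:   article -> articles that cite it (used for the forward step)
    """
    included = set(seed)
    frontier = set(seed)
    while frontier:
        candidates: set[str] = set()
        for article in frontier:
            candidates.update(references.get(article, []))  # backward step
            candidates.update(cited_by.get(article, []))    # forward step
        # Skip articles already in the set (duplicates), then screen the rest.
        new = {c for c in candidates - included if meets_criteria(c)}
        included |= new
        frontier = new
    return included


if __name__ == "__main__":
    refs = {"A": ["B", "C"], "B": ["D"]}
    cites = {"A": ["E"], "E": ["F"]}
    # Hypothetical screening decision: article C fails the criteria.
    print(sorted(snowball({"A"}, refs, cites, lambda a: a != "C")))
```

Running the example prints `['A', 'B', 'D', 'E', 'F']`. Article C is reached through A's reference list but is screened out, and D and F are only reached in the second iteration.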

Automated search

The main aim was to obtain a reliable set of relevant studies in order to minimize bias and increase the validity of the results. To this end, a manual search was conducted for articles meeting the inclusion and exclusion criteria by assessing their abstracts and other sections. To complement it, we implemented an automated search using five databases known for their impartial representation of research works: Scopus, IEEE Xplore, Taylor & Francis, SAGE, and ScienceDirect. The predefined inclusion and exclusion criteria were applied to these searches. In total, 104 articles meeting the established criteria were identified for the final set.

Search queries

Studies from 2012 onward were retrieved using keywords such as “financial fraud” and “machine learning” to identify model-based approaches and associated techniques. Table 2 presents a summary of the queries used in each data source.
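Table 2 is not reproduced here, so the following only illustrates how keyword groups can be combined into a Boolean query. This Python sketch assumes Scopus advanced-search field codes (`TITLE-ABS-KEY`, `PUBYEAR`). The helper name and the synonym lists are hypothetical. The other databases use different syntax, and a month-level cut-off such as "until March 2023" would have to be applied separately.

```python
def build_query(groups: list[list[str]], first_year: int, last_year: int) -> str:
    """OR the terms within each concept group, AND the groups, then add a year range.

    Assumes Scopus-style field codes; adapt the syntax for other databases.
    """
    concepts = []
    for terms in groups:
        quoted = " OR ".join(f'"{t}"' for t in terms)
        concepts.append(f"({quoted})")
    topic = " AND ".join(concepts)
    # PUBYEAR uses strict comparisons, so widen the bounds by one year.
    return (
        f"TITLE-ABS-KEY({topic}) "
        f"AND PUBYEAR > {first_year - 1} AND PUBYEAR < {last_year + 1}"
    )


if __name__ == "__main__":
    print(build_query([["financial fraud"], ["machine learning"]], 2012, 2023))
```

Keeping synonyms inside one OR group and joining concepts with AND mirrors the usual structure of SLR search strings. Each group corresponds to one concept of the research question.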

Inclusion and exclusion criteria and study selection

The study established inclusion and exclusion criteria, a key step in selecting the most relevant articles. For documents published between 2012 and March 2023, the exclusion criteria removed document types such as conference reviews, book chapters, editorials, and reviews. The availability of each article’s full text was also considered. Articles published before 2012 were excluded for the following reasons:

(i) They were over 11 years old.
(ii) Relevant publications prior to 2012 were scarce.
(iii) A sufficient number of articles was available for 2012–2023.

To apply the inclusion and exclusion criteria, the filtering tools of each data source were used during the search stage. This enabled the automated selection of the studies most relevant to the research goal.
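The criteria above can be expressed as a screening function that also records the reason for each exclusion, which is convenient for a PRISMA-style flow diagram. This is a minimal Python sketch under stated assumptions: the `Record` fields and document-type labels are hypothetical. It is not the authors' procedure, which relied on each database's own filtering tools.

```python
from dataclasses import dataclass

# Hypothetical labels for the document types the review excluded.
EXCLUDED_TYPES = {"conference review", "book chapter", "editorial", "review"}
START = (2012, 1)  # inclusion window start: January 2012
END = (2023, 3)    # inclusion window end: March 2023


@dataclass
class Record:
    title: str
    year: int
    month: int
    doc_type: str
    full_text_available: bool


def screen(record: Record) -> tuple[bool, str]:
    """Return (included?, reason) for one search result."""
    if not START <= (record.year, record.month) <= END:
        return False, "outside 2012 to March 2023"
    if record.doc_type.lower() in EXCLUDED_TYPES:
        return False, f"excluded document type: {record.doc_type}"
    if not record.full_text_available:
        return False, "full text not available"
    return True, "meets all criteria"


if __name__ == "__main__":
    candidates = [
        Record("Study A", 2019, 6, "Article", True),
        Record("Study B", 2010, 2, "Article", True),
        Record("Study C", 2021, 1, "Editorial", True),
        Record("Study D", 2023, 5, "Article", True),
    ]
    for rec in candidates:
        included, reason = screen(rec)
        print(rec.title, "->", "include" if included else "exclude", f"({reason})")
```

Comparing `(year, month)` tuples keeps the March 2023 cut-off exact without needing day-level dates.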

Data processing strategies

In the data processing strategy, databases were selected following strict inclusion and exclusion criteria to ensure the quality and relevance of the information collected (Table 3). The initial searches identified the following numbers of relevant articles: Scopus (28), Taylor & Francis (80), SAGE (71), ScienceDirect (663), and IEEE Xplore (5132). This initial step provides a broad overview of the available literature on financial fraud detection using ML models.

Next, a removal phase was carried out to ensure data integrity and avoid redundancy. The following numbers of articles (in parentheses) were removed from each database: Scopus (0), Taylor & Francis (63), SAGE (57), ScienceDirect (636), and IEEE Xplore (5114).

The final step consolidated the number of articles included after selection and removal of duplicates: Scopus (28), Taylor & Francis (17), SAGE (14), ScienceDirect (27), and IEEE Xplore (18), for a total of 104 articles. This strategy ensured that the retained articles were relevant for a complete analysis of financial fraud detection using ML models.
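The three sets of counts can be cross-checked with simple arithmetic. For each source, the number included should equal the number identified minus the number removed. The Python sketch below uses only the figures stated in the text; the function name is illustrative.

```python
# Figures as reported in the text: (identified, removed, included) per source.
FLOW = {
    "Scopus": (28, 0, 28),
    "Taylor & Francis": (80, 63, 17),
    "SAGE": (71, 57, 14),
    "ScienceDirect": (663, 636, 27),
    "IEEE Xplore": (5132, 5114, 18),
}


def check_flow(flow: dict[str, tuple[int, int, int]]) -> int:
    """Check identified - removed == included per source; return total included."""
    for source, (identified, removed, included) in flow.items():
        if identified - removed != included:
            raise ValueError(f"{source}: {identified} - {removed} != {included}")
    return sum(included for _, _, included in flow.values())


if __name__ == "__main__":
    print("identified:", sum(i for i, _, _ in FLOW.values()))
    print("removed:", sum(r for _, r, _ in FLOW.values()))
    print("included:", check_flow(FLOW))
```

Running it prints 5974 identified, 5870 removed, and 104 included. The per-source figures are therefore consistent with the 104 articles in the final set.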

Quality assessment

Once the inclusion and exclusion criteria had been applied, the remaining articles were assessed for quality. The evaluation criteria covered the research purpose, contextualization, literature review and related work, methods, results, and conclusions. To minimize the empirical obstacles associated with full-text filtering, a set of questions proposed by Roehrs et al. (2017) (see Table 4) was used to validate whether the selected articles met the previously established quality criteria.
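Table 4 is not reproduced here, so the scoring scale below is an assumption. A common scheme in SLRs scores each quality question as yes (1), partly (0.5), or no (0) and keeps articles whose total reaches a threshold. The Python sketch illustrates that pattern only. The criterion labels, the scale, and the threshold of 3.0 are hypothetical rather than the values used by Roehrs et al. (2017) or the present authors.

```python
from enum import Enum


class Answer(Enum):
    """Hypothetical three-level scale; the review's actual questions are in Table 4."""
    YES = 1.0
    PARTLY = 0.5
    NO = 0.0


def quality_score(answers: list[Answer]) -> float:
    """Sum the scores of the answers to the quality questions."""
    return sum(a.value for a in answers)


def passes(answers: list[Answer], threshold: float) -> bool:
    """Keep an article only if its score reaches the (assumed) threshold."""
    return quality_score(answers) >= threshold


if __name__ == "__main__":
    # Hypothetical answers for one article, one per criterion named in the text.
    article = {
        "research purpose": Answer.YES,
        "contextualization": Answer.YES,
        "literature review and related work": Answer.PARTLY,
        "methods": Answer.YES,
        "results and conclusions": Answer.NO,
    }
    answers = list(article.values())
    print(quality_score(answers), passes(answers, threshold=3.0))
```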

Research topics

To establish the current state of published research on the topic, the literature review focused on data-related aspects, including preprocessing techniques, ML models, and their metrics. Accordingly, four research topics were defined based on the research goals; they are presented in Table 5.

Data extraction

For data extraction, the necessary attributes were first defined and the information pertaining to the study goals was summarized. Next, the relevant information was identified and obtained through a detailed reading of the full text of each article. The information was then stored in a Microsoft Excel spreadsheet. Data were collected on the attributes specified in Table 6. In Table 6, the “Study” column corresponds to the identifiers of the research topics in Quality Assessment, and the “Subject” column refers to the category to which the different attributes belong. The names of the attributes and a brief description are presented in the last two columns of the table, including additional columns with relevant information.

Data synthesis

Data synthesis consisted of analyzing and summarizing the information in the selected articles to address the research questions. The synthesis followed the guidelines proposed by Moher et al. (2009) and was based on qualitative data. A descriptive analysis was also performed to answer the research questions. Consequently, a qualitative approach to the evidence was followed.

Results of the data synthesis

In this section, the 104 finally selected articles have been considered. The data were synthesized to address the five research questions mentioned.

General questions (GQ)

GQ1: Which were the most relevant authors, articles, sources, countries, and trends in the literature review on financial fraud detection based on the application of machine learning (ML) models?

The literature on financial fraud detection applying ML models has been studied by a large number of authors. However, some authors stood out in terms of the number of published papers and number of citations. Specifically, the most significant authors with two publications are Ahmed M. (318 citations), Ileberi E. (82 citations), Ali A. (20 citations), Chen S. (84 citations), and Domashova J. and Kripak E. (6 citations each). Other relevant authors with one publication who have been cited several times are Abdallah A. (333 citations), Abbasimehr H. (18 citations), Abd Razak S. (13 citations), Achakzai M. A. K. (5 citations), and Abosaq H. (2 citations). These authors have contributed significantly to the development of research in financial fraud detection using ML models (Fig. 2).

Figure 2: Connections between authors based on co-authorship of publications. Produced with VOSviewer.

Collectively, these researchers have built a solid knowledge base and laid the foundation for future research on financial fraud detection using ML models. Other researchers, such as Khan S. and Mishra B. (7 citations each), have also contributed to the field, although some authors have been more prominent in terms of the number of papers published. Their collective work has enriched the field and promoted a greater understanding of the challenges and opportunities in this area.

As depicted in Fig. 3, clusters 2 (green) and 4 (yellow) present the most relevant research articles on financial fraud detection using ML models. Cluster 2 comprises 9 articles with 357 citations and 32 links. It stands out because of the significant impact of the articles by Sahin, Huang, and Kim. These articles have the highest number of citations and are useful starting points for those intending to enter this research field. Cluster 4, with 6 articles, 158 citations, and 27 links, includes the works of Dutta and Kim, which have also been cited considerably.

Figure 3: Connections between articles based on their bibliographic references. Produced with VOSviewer.

Articles in clusters 1 (red) and 3 (dark blue) could also be valuable sources of information. Examples include those by Nian K. (62 citations and 4 links) and Olszewski (92 citations and 4 links). These articles have fewer citations and links than those in clusters 2 and 4. Nevertheless, some articles in these clusters have received a substantial number of citations.

In Cluster 10 (pink), the article by Reurink A. is prominent, with 38 citations. This is followed by the article by Ashtiani M.N. with 10 citations. In Cluster 11 (light green), the article by Hájek P. has 129 citations. In Cluster 12 (grayish blue), the articles by Blaszczynski J. and Elshaar S. have the greatest number of citations, indicating their influence in the field of financial fraud detection.

In Cluster 13 (light brown), the article by Pourhabibi T. has the greatest number of citations (102), suggesting that this work has been relevant to research on financial fraud detection. Finally, in Cluster 14 (purple), the articles by Seera M. have 63 citations and 2 links, and the article by Ileberi E. has 11 citations and 1 link. Both have relatively few citations, indicating a lower influence on the topic.

In conclusion, clusters 2, 4, and 11 are the most relevant in this literature review. The articles by Sahin, Huang, Kim, Dutta, and Pumsirirat are the most influential ones in the research on financial fraud detection through the application of ML models.

Fig. 4 presents the result of a clustering analysis of the journals in which the reviewed articles on financial fraud detection with ML models were published. In total, 48 items were identified and grouped into 12 clusters, with 100 links between them and a total link strength of 123.

Figure 4: Relationships between scientific journals based on bibliographic links. Produced with VOSviewer.

The following is a description of each cluster with its number of items, links, and total link strength. The total link strength sums the strengths of the links, where a link’s strength indicates how strongly two items are connected:

Cluster 1 (6 articles, red): This cluster includes journals such as Computers and Security, Journal of Network and Computer Applications, and Journal of Advances in Information Technology. The total number of links is 27, and the total link strength is 32.

Cluster 2 (6 articles, dark green): This cluster includes articles from Technological Forecasting and Social Change, Journal of Open Innovation: Technology, Market, and Complexity, and Global Business Review. The total number of links is 18, and the total link strength is 19.

Cluster 3 (5 articles, dark blue): This cluster includes articles from the International Journal of Advanced Computer Science and Applications, Decision Support Systems, and Sustainability. The total number of links is 19, and the total link strength is 20.

Cluster 4 (4 articles, dark yellow): This cluster includes articles from Expert Systems with Applications and Applied Artificial Intelligence. The total number of links is 26, and the total link strength is 45.

Cluster 5 (4 articles, purple): This cluster includes articles from Future Generation Computer Systems and the International Journal of Accounting Information Systems. The total number of links is 15, and the total link strength is 16.

Cluster 6 (4 articles, dark blue): This cluster includes articles from IEEE Access and Applied Intelligence. The total number of links is 18, and the total link strength is 26.

Cluster 7 (4 articles, orange): This cluster includes articles from Knowledge-Based Systems and Mathematics. The total number of links is 23, and the total link strength is 29.

Cluster 8 (4 articles, brown): This cluster includes articles from the Journal of King Saud University – Computer and Information Sciences and the Journal of Finance and Data Science. The total number of links is 13, and the total link strength is 13.

Cluster 9 (4 articles, light purple): This cluster includes articles from the International Journal of Digital Accounting Research and Information Processing and Management. The total number of links is 2, and the total link strength is 2.

The clusters represent groups of related articles published in different academic journals. Each cluster has a specific number of articles, links, and total link strength. These findings provide an overview of the distribution and connectedness of articles in the literature on financial fraud detection using ML models. Further, clustering helps identify patterns and common thematic areas in the research, which may be useful for future researchers seeking to explore this field.

Clusters 1, 4, and 7 have the most links and the greatest total link strength. These clusters encompass articles from Computers and Security, Expert Systems with Applications, and Knowledge-Based Systems, which are important sources for the SLR on financial fraud detection through the implementation of ML models.
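The link figures reported for the clusters follow the VOSviewer convention. A link connects two items and carries a strength; for bibliographic coupling, that strength is the number of cited references the two items share. An item's total link strength is the sum of the strengths of its links. This minimal Python sketch computes per-item link counts and total link strengths from a hypothetical list of weighted journal pairs. It also computes network-level totals like those reported for Fig. 4 (100 links, total link strength 123). The text does not state how the cluster-level totals were aggregated, so the sketch does not attempt to reproduce them.

```python
from collections import defaultdict


def link_stats(
    edges: list[tuple[str, str, int]],
) -> tuple[dict[str, int], dict[str, int], dict[str, int]]:
    """Per-item link counts and total link strengths for an undirected network.

    Each edge is (item_a, item_b, strength). Repeated pairs are merged by adding
    their strengths, and self-links are ignored.
    """
    weights: dict[frozenset, int] = defaultdict(int)
    for a, b, w in edges:
        if a != b:
            weights[frozenset((a, b))] += w

    links: dict[str, int] = defaultdict(int)
    strength: dict[str, int] = defaultdict(int)
    for pair, w in weights.items():
        for item in pair:
            links[item] += 1     # one more distinct neighbour
            strength[item] += w  # add this link's strength
    network = {"links": len(weights), "total_link_strength": sum(weights.values())}
    return dict(links), dict(strength), network


if __name__ == "__main__":
    # Hypothetical journals J1 to J3 with shared-reference counts as strengths.
    pairs = [("J1", "J2", 3), ("J1", "J3", 1), ("J2", "J3", 2), ("J1", "J2", 1)]
    links, strength, network = link_stats(pairs)
    print(links)
    print(strength)
    print(network)
```

Two items with many shared references thus add one link but a large strength. This is why a cluster can have fewer links than another yet a greater total link strength, as with clusters 1 and 4.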

This analysis shows the number of research documents from each country or territory, for a list of 50 countries and territories. China leads with 18 papers, followed by India with 13 and Saudi Arabia and Canada with 9 each. Malaysia, Pakistan, South Africa, the United Kingdom, France, Germany, and Russia have similar research outputs, with 4–9 papers. Sweden and Romania have 1 or 2 papers each, indicating limited research output.

The presence of less frequently represented countries, such as Armenia, Costa Rica, and Slovenia, suggests that research is also under way in places less common in the academic literature. Beyond these, the number of papers per country gradually decreases.

Paper production is geographically distributed across countries on different continents and in different regions. However, more of the research on the subject comes from countries with developed and transition economies, which have a greater capacity to conduct research and produce papers.

Figure 5, sourced from Scopus’s “Analyze search results” option, depicts countries with their respective number of published papers on the topic of financial fraud detection through ML models.

Figure 5: Number of scientific publications in the study area, by country. Produced with VOSviewer.

This distribution shows the diversity of countries involved in the research: China leads with 18 papers, followed by India with 13 and Saudi Arabia and Canada with 9 each. The other countries show little production, with fewer than 7 publications each. This suggests an emerging topic, important for the survival of companies that must prevent and detect different types of financial fraud using ML techniques.

The most relevant keywords in the literature on financial fraud detection using ML models are grouped into the following clusters:

In Cluster 1, the most relevant keywords are “decision trees” (13 occurrences), “support vector machine (SVM)” (11), “machine-learning” (10), and “credit card fraud detection” (9). This cluster focuses on machine learning (ML) and, specifically, on supervised learning algorithms such as decision trees and support vector machines applied to credit card fraud detection.

In Cluster 2, the most relevant keywords are “crime” (46 occurrences), “fraud detection” (43), and “learning systems” (13). These terms reflect a broader focus on financial fraud detection, covering crime in general, fraud detection, and the learning systems used for this purpose.

In Cluster 3, the most relevant keywords are “Finance” (19 occurrences), “Data Mining” (18), and “Financial Fraud” (12). These keywords indicate a focus on the financial industry, where data mining is used to reveal patterns and trends related to financial fraud.

In Cluster 4, the most relevant keywords are “Machine Learning” (45 occurrences), “Anomaly Detection” (16), and “Deep Learning” (11). They reflect an emphasis on traditional ML and deep learning techniques for anomaly detection and financial fraud detection.

Overall, each cluster groups a specific set of keywords that reflects the main trends and approaches in this field of research (Fig. 6).

Figure 6. Relationships between keywords based on their co-occurrence in the reviewed literature. Produced with VOSviewer.
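As a rough illustration of what the co-occurrence links in a keyword map are built from, the sketch below counts, for every pair of keywords, the number of documents in which both appear. It reproduces only this raw counting step, not VOSviewer's normalization, layout, or clustering; the function name `keyword_cooccurrence` and the keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(documents):
    """Count, for each unordered keyword pair, the documents in which both appear."""
    counts = Counter()
    for keywords in documents:
        # Normalize case, deduplicate, and sort so each pair has one canonical key
        unique = sorted({k.lower() for k in keywords})
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Hypothetical author-keyword lists for three papers (illustrative only)
docs = [
    ["Machine Learning", "Fraud Detection", "Anomaly Detection"],
    ["machine learning", "fraud detection", "Deep Learning"],
    ["Fraud Detection", "Data Mining", "Machine Learning"],
]
links = keyword_cooccurrence(docs)
print(links.most_common(1))  # [(('fraud detection', 'machine learning'), 3)]
```

In a tool such as VOSviewer, pairs with higher counts become thicker links, and keywords that frequently co-occur tend to be placed in the same cluster.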

GQ2: What types of financial fraud have been identified in ML studies?

Financial fraud arises from weaknesses in companies’ control mechanisms, which are analyzed in terms of the factors that allow fraud to materialize: opportunity, motivation, self-fulfillment, capacity, and pressure. Some of these factors were analyzed in depth by Donald Cressey in his fraud triangle theory. The lack of modern controls has led organizations to adopt ML in response to this major problem. According to the Global Economic Crime and Fraud Survey 2022–2023, which gathered insights from 1,028 respondents across 36 countries, fraud within the surveyed companies caused financial losses of approximately 10 million dollars (PricewaterhouseCoopers, 2022).

Fraud, as defined in international studies (Estupiñán Gaitán, 2015; Márquez Arcila, 2019; Montes Salazar, 2019) and in the guidelines of the American Institute of Certified Public Accountants, is an illegal, intentional act involving a victim (someone who loses a financial resource) and a perpetrator (someone who obtains a financial resource from the victim). The proposed classification therefore covers corporate fraud and fraud in organizations, whose purpose is to misappropriate the capital resources of an entity or individual: cash, bank accounts, loans, bonds, stocks, real estate, and precious metals, among others.

In this SLR, we considered the fraud classifications used by the authors of the 86 experimental articles and excluded the 18 SLR articles from this analysis. The types presented in Table 7 follow the authors’ holistic view of financial fraud, distinguishing between internal and external fraud.

Table 7 lists the types of fraud and the research works that address each one. In this classification, external fraud is committed by stakeholders outside the company. Of the analyzed articles, 54% investigate external fraud, most notably credit card loan fraud, followed by insurance fraud, and use supervised and unsupervised ML techniques to detect it.

Research on credit card fraud (Kumar et al., 2022) stresses the importance of prevention through behavioral analysis of customers who take out bank loans and of identifying bad-loan applicants with ML models. The datasets used in these studies cover transactions performed by credit card holders (Alarfaj et al., 2022; Baker et al., 2022; Hamza et al., 2023; Madhurya et al., 2022; Ounacer et al., 2018; Sahin et al., 2013), master credit card money transactions in different countries (Wu et al., 2023), and fraudulent transactions gathered from 2014 to 2016 by the international auditing firm Mazars (Smith and Valverde, 2021).

The second major type of external fraud is insurance fraud. It includes health insurance fraud, involving practices such as document forgery, fraudulent billing, and false medical prescriptions (Sathya and Balakumar, 2022; Van Capelleveen et al., 2016), and automobile insurance fraud, in which policyholders and repair shops collude for mutual benefit (Aslam et al., 2022; Nian et al., 2016; Subudhi and Panigrahi, 2020). In response to these problems, insurance companies have developed robust ML-based detection models.

Internal fraud, committed by individuals within the company, is analyzed in 46% of the studies, with financial statement fraud, money laundering, and tax fraud standing out. These investigations rely on information reported by the US Securities and Exchange Commission (SEC) and by the stock exchanges of China, Canada, Tehran, and Taiwan, among others. Most of the data come from the real sector; very few studies use synthetic data to apply the different learning models.

The financial information that researchers used to apply AI models and techniques can be summarized as follows:

Stock market financial reports: fraud in the Canadian securities industry (Lokanan and Sharma, 2022), companies listed on the Chinese stock exchanges (Achakzai and Juan, 2022; Y. Chen and Wu, 2022; Xiuguo and Shengyong, 2022), companies registered with the SEC (Hajek and Henriques, 2017; Papík and Papíková, 2022), companies listed on the Tehran Stock Exchange (Kootanaee et al., 2021), companies in the Taiwan Economic Journal (TEJ) Data Bank (S. Chen, 2016; S. Chen et al., 2014), and analysis of SEC accounting and auditing publications (Whiting et al., 2012)

Misstated financial reports used to manipulate stock prices (Chullamonthon and Tangamchit, 2023; Khan et al., 2022; Zhao and Bai, 2022)

Financial data of the 2,318 companies with the highest number of financial frauds, in the mechanical equipment, medical biology, media, and chemical industries (Shou et al., 2023), and fraudulent financial restatements (Dutta et al., 2017)

Data from 950 companies in the Middle East and North Africa region (Ali et al., 2023), outliers in sampling risk and inefficiencies in general ledger auditing (Bakumenko and Elragal, 2022), intentional misstatements by top management of public companies (Y. J. Kim et al., 2016), and general ledger journal entries from an enterprise resource planning system (Zupan et al., 2020)

Synthetic financial dataset for fraud detection (Alwadain et al., 2023).

Several studies analyze fraudulent financial statements. In these cases, the fraud has already occurred and has produced financial reports containing outliers that may reflect either fraudulent intent or errors in the figures, which raises reasonable doubt about whether unrealistic figures were reported intentionally. However, because identifiable parties, such as owners, managers, administrators, accountants, and auditors, are responsible for the financial information presented to stakeholders, such misstatements are unlikely to be unintentional errors. In this context, transparency and explainability are essential to ensure fair decisions and to avoid bias and discrimination based on prejudiced data (Rakowski et al., 2021).

Because of its significance, the information reported in financial statements is vital to these investigations. Studies have drawn substantial amounts of data from financial reports held by regulatory bodies, such as stock exchanges, and by auditing firms. These entities use the data to establish whether fraud exists, and of what type, through predictive models based on ML techniques. The models therefore require accounting-record data such as dates, the affected third party, the user, debit or credit amounts, and the document type. This information helps to estimate the impact of the fraud in terms of lower profits and to identify the perpetrator or perpetrators, so that sufficient evidence can be gathered to file criminal proceedings for the financial damage caused.

Investigations of money laundering, the second most investigated internal fraud type, use reports on natural and legal persons exposed by the Financial Action Task Force in countries such as the Kingdom of Saudi Arabia (Alsuwailem et al., 2022), transactions from April to September 2018 from Taiwan’s “T” bank together with the account watch list of the National Police Agency of the Ministry of the Interior (Ti et al., 2022), money laundering cases in Middle Eastern banks (Lokanan, 2022), transactions of financial institutions in Mexico from January 2020 (Rocha-Salazar et al., 2021), and synthetic data from simulated banking transactions (Usman et al., 2023).

Concern about laundered proceeds entering organizations stems from the financial damage they cause to the country. At the macroeconomic level, these activities undermine financial stability and distort the prices of goods and services. They also disrupt markets, making efficient financial decisions difficult. At the microeconomic level, legitimate businesses face unfair competition from companies using illegal money, which may lead to higher unemployment. Money laundering also has a social impact because it affects the security and welfare of society.

Some studies (Alsuwailem et al., 2022) therefore stress the need to implement ML models to support anti-money laundering measures. In Saudi Arabia, for instance, money from illicit drug trafficking, corruption, counterfeiting, and product piracy has entered the country. The countermeasures are organized according to the three stages of money laundering, namely placement, layering (also known as concealment), and integration, and include new anti-money laundering regulations, staff training, customer identification and validation, reporting of suspicious activities, and documentation and storage of relevant data (Bolgorian et al., 2023).

Tax fraud resulting from tax evasion, the internal fraud category with a 7.5% incidence, has been analyzed using income and profit tax returns of legal persons and individuals from the Serbian tax administration for 2016–2017 (Savić et al., 2022), periodic value-added tax (VAT) returns together with the anonymized client lists for tax year 2014 from the Belgian tax administration (Vanhoeyveld et al., 2020), and data on income tax and VAT taxpayers registered with the State Revenue Committee of the Republic of Armenia in 2018 (Baghdasaryan et al., 2022). These studies are highly relevant for tax administrations pursuing different strategies to minimize the impact of tax evasion, which reduces the government’s ability to collect revenue, directly affects public finances, and causes budget deficits that increase public debt.

GQ3: Which ML models were implemented to detect financial fraud in the datasets?

Because ML is a key tool for extracting meaningful information and making informed decisions, this study analyzes the ML techniques most widely used in financial fraud detection. Based on the 86 experimental articles (excluding the 18 SLR articles), it identifies the most common trends and approaches in applying ML techniques to financial fraud detection.

For this analysis, we counted how often each ML model was used. Several models stand out for their popularity in financial fraud detection (Fig. 7). The most widely used are RF (34 mentions), LR (32), DT (29), SVM (29), NB (19), artificial neural networks (ANN; 17), k-nearest neighbors (KNN; 14), XGBoost (13), autoencoders (10), and long short-term memory (LSTM) networks (7).

Figure 7. Most common machine learning models in financial fraud detection. Authors’ own elaboration.

LSTM is a recurrent neural network architecture for sequence processing, used especially in natural language processing tasks (Chullamonthon and Tangamchit, 2023; Esenogho et al., 2022; Femila Roseline et al., 2022). Autoencoders are neural networks that learn to compress input data into a lower-dimensional representation and then reconstruct it, which makes them useful for dimensionality reduction (Misra et al., 2020; Srokosz et al., 2023). XGBoost is a gradient-boosting library that combines many weak DT models into a scalable and efficient solution for classification and regression tasks (Dalal et al., 2022; Udeze et al., 2022).

KNN and ANN are widely used in many ML applications: KNN classifies an observation according to its closest neighbors, whereas ANN is inspired by the functioning of the human brain. NB is a probabilistic algorithm commonly used in text classification and data mining (Ashtiani and Raahemi, 2022; Lei et al., 2022; Shahana et al., 2023).
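As a minimal illustration of the neighbor-closeness idea behind KNN, the sketch below classifies a transaction by majority vote among its k nearest labeled transactions, using Euclidean distance. The function name `knn_predict`, the two features, and the toy data are hypothetical and are not taken from any reviewed study.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to x with its label, nearest first
    ranked = sorted((math.dist(p, x), label) for p, label in zip(train_X, train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical standardized features: (transaction amount, transactions per hour)
train_X = [(-0.9, -0.8), (-1.1, -1.0), (-0.8, -1.2), (1.2, 1.0), (0.9, 1.1), (1.1, 0.8)]
train_y = [0, 0, 0, 1, 1, 1]  # 1 = fraud, 0 = legitimate

print(knn_predict(train_X, train_y, (1.0, 0.9)))    # 1
print(knn_predict(train_X, train_y, (-1.0, -0.9)))  # 0
```

Because KNN relies on raw distances, features on very different scales (for example, amounts versus counts) would normally be standardized before this step, as assumed in the toy data.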

SVM, DT, LR, and RF, the most frequently mentioned models, are used in a wide range of classification and regression applications. They stand out for their effectiveness and their applicability to different scenarios, such as credit card loan fraud (external fraud) and financial statement fraud (internal fraud).
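Since LR is one of the most frequently mentioned models, the following sketch shows how a binary logistic regression can be fitted by batch gradient descent on the mean log-loss. It is a from-scratch illustration under simplifying assumptions (standardized toy features, a fixed learning rate, no regularization); the reviewed studies would typically rely on library implementations, and all function names and data here are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # (p - y) is the derivative of the log-loss with respect to the logit
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that x belongs to the positive (fraud) class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical standardized features; 1 = fraud, 0 = legitimate
X = [(-1.0, -0.5), (-0.8, -1.0), (-1.2, -0.8), (1.0, 0.9), (0.9, 1.2), (1.3, 0.7)]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(X, y)
print(round(predict_proba(w, b, (1.1, 1.0)), 3))
```

The predicted probability would then be compared with a decision threshold (commonly 0.5) to label a transaction as fraudulent or legitimate.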

The most frequently used ML approaches are supervised learning (56.73%), unsupervised learning (18.27%), a combination of supervised and unsupervised learning (15.38%), a combination of supervised and deep learning (2.88%), and a combination of a mathematical approach with supervised and semi-supervised learning (0.96%). Figure 8 presents the ML techniques in the reviewed literature and the number of times each type is applied. Some articles apply several ML methods, so the algorithms are classified mainly by learning method into four main types: supervised, semi-supervised, unsupervised, and deep learning.

Figure 8. Experimental approaches used in the reviewed studies. Authors’ own elaboration.

Supervised learning is the most widely used approach, accounting for 56.73% of mentions in financial fraud studies. It uses labeled training data, in which the expected outputs are known, to build a model that predicts the outputs for new, unlabeled data. Common supervised models include LR, SVM, DT, RF, KNN, NB, and ANN.

Unsupervised learning accounts for 18.27% of mentions. It discovers patterns in the data without requiring labeled training data; examples include DBSCAN, autoencoders, and isolation forest (IF).
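To illustrate how an unsupervised method can flag suspicious records without labels, the sketch below implements a basic DBSCAN: points in dense regions form clusters, and points that belong to no dense region are labeled as noise (-1), which can be treated as candidate anomalies. The `eps` and `min_pts` values and the toy transactions are assumptions chosen for illustration; as in common implementations, a point counts itself among its own neighbors.

```python
import math


def dbscan(points, eps, min_pts):
    """Return a cluster id for each point; -1 marks noise (candidate anomalies)."""
    labels = [None] * len(points)

    def region(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


# Hypothetical (amount, hour-of-day) pairs; the last transaction is unusual
transactions = [(10, 9), (12, 10), (11, 9), (13, 11), (500, 3)]
print(dbscan(transactions, eps=4, min_pts=3))  # [0, 0, 0, 0, -1]
```

In practice the features would be scaled first, since a single `eps` applies to all dimensions at once.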

The combination of supervised, unsupervised, and semi-supervised learning accounts for 1.92% of mentions. Semi-supervised learning combines elements of supervised and unsupervised learning by training models on both labeled and unlabeled data. It is useful when labeled data are scarce or expensive to obtain, because it exploits the unlabeled data to improve model performance.
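One common semi-supervised strategy is self-training: a model fitted on the labeled data assigns pseudo-labels to the unlabeled samples it is most confident about, and these are added to the training set. The sketch below uses a nearest-centroid classifier, with distance to the closest centroid as a simple confidence proxy; the base classifier, the function names, and the data are illustrative assumptions, not the approach of any specific reviewed study.

```python
import math


def centroids(X, y):
    """Mean point of each class among the labeled samples."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def self_train(X_lab, y_lab, X_unlab, rounds=10):
    """Pseudo-label one unlabeled point per round, most confident first."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        if not pool:
            break
        cents = centroids(X_lab, y_lab)  # refit the base model on current labels
        # Confidence proxy: distance to the nearest centroid (smaller = more confident)
        best = min(pool, key=lambda x: min(math.dist(x, c) for c in cents.values()))
        label = min(cents, key=lambda k: math.dist(best, cents[k]))
        X_lab.append(best)
        y_lab.append(label)
        pool.remove(best)
    return X_lab, y_lab


# Two labeled transactions (0 = legitimate, 1 = fraud) and three unlabeled ones
X_lab, y_lab = self_train([(0.0, 0.0), (10.0, 10.0)], [0, 1],
                          [(1.0, 0.5), (9.0, 9.5), (2.0, 1.0)])
print(dict(zip(X_lab, y_lab)))
```

Adding one pseudo-label per round and refitting keeps early mistakes from propagating quickly; real systems usually also stop when confidence falls below a threshold.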

Finally, the combination of supervised and deep learning accounts for 2.88% of mentions. Deep learning relies on neural networks with many neurons and multiple hidden layers that learn complex data representations, and it has achieved remarkable results in areas such as image processing, speech recognition, and machine translation.

Specific questions (SQ)

SQ1: What datasets were used by implementing ML models for financial fraud detection?

Data structure and fraud types vary across datasets, and the performance of fraud detection models may be affected by the number of instances and the attributes selected. It is therefore relevant to examine the datasets and their characteristics, which differ in data type (numeric, text) and in source (synthetic or real), as shown in Fig. 9.

Figure 9. Datasets used in research on financial fraud detection. Authors’ own elaboration.

Credit card fraud detection

The dataset was created by the Machine Learning Group at the Université Libre de Bruxelles. It contains anonymized credit card transactions, labeled as fraudulent or genuine, made by European cardholders over two days in September 2013. With only 492 frauds out of 284,807 transactions, it is highly unbalanced: the positive class (fraud) represents only 0.172% of all transactions (Machine Learning Group, 2018).
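This imbalance has a practical consequence for evaluation: a trivial model that labels every transaction as genuine achieves very high accuracy while detecting no fraud at all. The following arithmetic check uses only the counts reported for this dataset (492 frauds out of 284,807 transactions); the all-genuine baseline is a hypothetical reference model.

```python
frauds, total = 492, 284_807  # counts reported for the ULB credit card dataset

fraud_rate = frauds / total

# Hypothetical baseline that labels every transaction as genuine (class 0)
tp, fn = 0, frauds          # every fraud is missed
tn, fp = total - frauds, 0  # every genuine transaction is classified correctly

accuracy = (tp + tn) / total
recall = tp / (tp + fn)

print(f"fraud share: {fraud_rate:.2%}")  # fraud share: 0.17%
print(f"accuracy: {accuracy:.2%}, recall: {recall:.0%}")  # accuracy: 99.83%, recall: 0%
```

An accuracy of about 99.83% with zero recall is why metrics that focus on the minority class, such as recall, precision, and AUC, matter for this dataset.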

All features are numerical. For confidentiality, the original features are not disclosed; features V1, V2, …, V28 are the principal components obtained through a principal component analysis (PCA) transformation. The only features not transformed with PCA are “Time,” the seconds elapsed between each transaction and the first transaction in the dataset, and “Amount,” the transaction amount. The “Class” feature is the response variable, taking the value 1 in case of fraud and 0 otherwise.

This dataset has been used by 15 authors, who have applied different financial fraud detection techniques (e.g., Alarfaj et al., 2022; Baker et al., 2022; Fanai and Abbasimehr, 2023; Fang et al., 2019; Femila Roseline et al., 2022; Hwang and Kim, 2020; Ileberi et al., 2021, 2022; Khan et al., 2022; Misra et al., 2020; Ounacer et al., 2022).

Statlog (German credit data)

The dataset was contributed by Professor Hofmann to the UC Irvine ML Repository on November 16, 1994, to support credit rating (Hofmann, 1994). Its main purpose is to determine whether a person presents a favorable or unfavorable credit risk (binary rating). The dataset is multivariate, containing many attributes used in credit rating, such as the status of the existing current account, credit duration, credit history, and credit purpose and amount. In total, it has 20 attributes describing characteristics of individuals and 1,000 instances, and it has been widely used in credit rating research (Esenogho et al., 2022; Fanai and Abbasimehr, 2023; Lee et al., 2018; Pumsirirat and Yan, 2018; Seera et al., 2021).

Statlog (Australian credit approval)

The dataset belongs to the UC Irvine ML Repository and was created by Ross Quinlan in 1997. It concerns credit card applications in the financial field (Quinlan, 1997). It has 690 instances and 14 attributes, of which 6 are numeric (integer/real) and 8 are categorical, so it is multivariate, that is, it contains multiple variables. Several studies have used this dataset (Lee et al., 2018; Pumsirirat and Yan, 2018; Seera et al., 2021; Singh et al., 2022).

China Stock Market and Accounting Research

The China Stock Market and Accounting Research (CSMAR) database contains financial reports and records of regulatory violations. It provides information on China’s stock markets and on the financial statements of listed companies, with data collected between 1998 and 2016 from publicly funded companies (CSMAR, 2022). It covers both non-fraudulent companies and companies that committed several types of fraud, such as overstating profits or earnings, recording fictitious assets, keeping false records, and other financial reporting irregularities.

The dataset comprises 35,574 samples, including 337 annual fraud samples from companies in the Chinese stock market. Three studies selected it as the source of financial statement information on listed companies (Achakzai and Juan, 2022; Y. Chen and Wu, 2022; Shou et al., 2023).

Synthetic financial datasets for fraud detection

This dataset was generated by the PaySim mobile money simulator using aggregated data from a private dataset of one month of financial records from a mobile money service in an African country (López-Rojas, 2017). The original records were provided by a multinational company offering mobile financial services in more than 14 countries. The dataset has been used in several studies (Alwadain et al., 2023; Hwang and Kim, 2020; Moreira et al., 2022).

The synthetic dataset available on Kaggle is a scaled-down version, one quarter the size of the original. It contains 6,362,620 samples: 8,213 fraudulent and 6,354,407 non-fraudulent transactions. Its attributes describe mobile money transactions: transaction type (cash-in, cash-out, debit, payment, and transfer), transaction amount in local currency, customer information (the customer initiating the transaction and the recipient), balances before and after the transaction, and fraud indicators (isFraud and isFlaggedFraud). The isFraud label makes this a binary classification problem.

Default of credit card clients

Created by I-Cheng Yeh and introduced on January 25, 2016, this dataset is available in the UC Irvine ML Repository (Yeh, 2016). It is used for classification tasks and concerns default payments of credit card customers in Taiwan. It is multivariate, with 30,000 instances and 24 attributes, including the amount of credit granted, payment history, and bill statements from April to September 2005. It has been used in studies such as those by Esenogho et al. (2022), Pumsirirat and Yan (2018), and Seera et al. (2021).

Synthetic data from a financial payment system

Edgar Lopez Rojas created this dataset in 2017 with the BankSim payment simulator, based on a sample of transactional data provided by a bank in Spain (López-Rojas, 2017). Its attributes are step, customer ID, age, gender, zip code, merchant ID, merchant zip code, purchase category, purchase amount, and fraud status. It comprises 594,643 transactions, of which about 1.2% (7,200) are labeled as fraud and the rest (587,443) as genuine, and it is treated as a binary classification problem. The dataset has been used in several investigations (Esenogho et al., 2022; Pumsirirat and Yan, 2018; Seera et al., 2021).

COMPUSTAT

This dataset comes from a financial and economic information and research database (Compustat, 2022) that collects and stores detailed information on listed companies in the United States and Canada. It contains characteristics related to various aspects of companies, such as asset quality, revenues, selling and administrative expenses, and sales growth. The set includes 61 characteristics for 228 companies, half of which showed fraud in their information and half of which did not (binary classification), and it has been used by Dutta et al. (2017) and Whiting et al. (2012).

Insurance Company Benchmark (COIL 2000)

This dataset was used in the CoIL 2000 challenge and is available in the UC Irvine Machine Learning Repository; it was created by Peter Van Der Putten. It consists of 9,822 instances and 86 attributes describing customers of an insurance company, including product usage and sociodemographic data (Putten, 2000). It is multivariate and has been used for regression and classification tasks (Huang et al., 2018; Sathya and Balakumar, 2022).

Bitcoin network transactional metadata

This dataset contains Bitcoin transaction metadata from 2011 to 2013. It was created by Omer Shafiq (Kaggle handle: OmerShafiq) and published on Kaggle in 2019. It comprises 11 attributes and 30,000 instances describing Bitcoin transactions, bitcoin flows, connections between transactions, average ratings, and malicious transactions (Omershafiq, 2019). It is suitable for investigating anomalies and detecting fraud in Bitcoin transactions (Ashfaq et al., 2022).

SQ2: What were the metrics used to assess the performance of ML models to detect financial fraud?

Following previous studies (Nicholls et al., 2021; Shahana et al., 2023), evaluating model performance with metrics is the last step in determining whether the results address the problem at hand. Metrics quantify how well a model performs a specific task, such as classification, regression, or clustering, and therefore allow models to be compared.

Many evaluation metrics have been used in previous studies, such as precision, sensitivity (recall), accuracy, and area under the curve. These metrics can be calculated from the confusion matrix, which, following Torrano et al. (2018), cross-tabulates the true values against the predicted ones (Fig. 10).

Figure 10. Confusion matrix generated during the evaluation of the financial fraud detection models. Authors’ own elaboration.

Following previous studies (Shahana et al., 2023; Zhao and Bai, 2022), a true positive (TP) is a positive prediction (fraud) whose true value is also positive; a true negative (TN) is a correctly predicted negative (no fraud); a false positive (FP) is a positive prediction whose true value is negative (no fraud); and a false negative (FN) is a negative prediction whose true value is positive (fraud). FP and FN represent the misclassification cost, also known as the prediction error of the classification model.
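The four outcomes can be tallied directly from paired true and predicted labels. The sketch below is a minimal illustration assuming labels coded as 1 (fraud) and 0 (no fraud); the function name `confusion_counts` and the example labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary task where `positive` marks fraud."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p != positive and t != positive:
            tn += 1
        elif p == positive:  # predicted fraud, actually legitimate
            fp += 1
        else:                # predicted legitimate, actually fraud
            fn += 1
    return tp, tn, fp, fn


y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
print(confusion_counts(y_true, y_pred))  # (2, 4, 1, 1)
```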

The metrics used to evaluate supervised ML techniques are as follows. Accuracy is the most commonly used metric (Ramírez-Alpízar et al., 2020). It is the proportion of correct predictions over the total number of records analyzed and evaluates how well a binary classification model distinguishes between the two classes. It is calculated with Eq. (1):

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1) \]

Sensitivity, also known as recall or the true positive rate (TPR), is the ratio of correctly identified fraudulent samples to the total number of fraudulent samples. Equation (2) calculates it:

\[ \text{Recall} = \frac{TP}{TP + FN} \qquad (2) \]

Specificity, or the true negative rate (TNR), is the proportion of non-fraudulent samples correctly classified as non-fraudulent, as given in Eq. (3):

\[ \text{Specificity} = \frac{TN}{TN + FP} \qquad (3) \]

Precision is the ratio of correctly classified fraudulent predictions to the total number of fraudulent predictions. Equation (4) calculates it:

\[ \text{Precision} = \frac{TP}{TP + FP} \qquad (4) \]

The F1-score combines precision and recall through their harmonic mean (Bakumenko and Elragal, 2022), as presented in Eq. (5):

\[ F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (5) \]

The Type I error rate, or false positive rate (FPR), is the number of legitimate samples mistakenly labeled as fraudulent as a proportion of all legitimate samples. It is defined in Eq. (6):

\[ \text{FPR} = \frac{FP}{FP + TN} \qquad (6) \]

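The Type II error rate, or false negative rate (FNR), is the proportion of fraudulent samples incorrectly classified as non-fraudulent. Together, Type I and Type II errors make up the overall classification error. The FNR is defined in Eq. (7):

\[ \text{FNR} = \frac{FN}{FN + TP} \qquad (7) \]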
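Given the four confusion-matrix counts, the accuracy, recall, specificity, precision, F1, Type I, and Type II error metrics reduce to a few ratios. The following sketch computes them in one place; the function name, the zero-division guard (returning 0.0 when a denominator is empty), and the example counts are illustrative assumptions rather than choices made in the reviewed studies.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the standard binary classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # guard against an empty class

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # sensitivity / TPR
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "recall": recall,
        "specificity": ratio(tn, tn + fp),  # TNR
        "precision": precision,
        "f1": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, fp + tn),  # Type I error rate
        "fnr": ratio(fn, fn + tp),  # Type II error rate
    }


# Hypothetical counts from a fraud classifier evaluated on 1,015 transactions
m = classification_metrics(tp=80, tn=900, fp=15, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

The example also makes two identities visible: specificity and FPR sum to 1, as do recall and FNR. Libraries such as scikit-learn provide equivalents (for example, `precision_score` and `recall_score` in `sklearn.metrics`).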

The receiver operating characteristic (ROC) curve plots the TPR against the FPR across classification thresholds, and the area under the curve (AUC) summarizes it in a single value (Y. Chen and Wu, 2022). AUC values range from 0 to 1; the better a model separates the two classes, the higher its AUC, with 0.5 corresponding to random guessing. The AUC therefore represents the model’s ability to differentiate between the two classes.
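The AUC has an equivalent probabilistic reading: it equals the probability that a randomly chosen fraudulent case receives a higher score than a randomly chosen legitimate one (the normalized Mann–Whitney U statistic), with ties counted as one half. The minimal sketch below computes it this way by comparing all positive–negative pairs, which is adequate for small examples; the scores are hypothetical model outputs. For large datasets, a sorting-based method or a library routine such as scikit-learn's `roc_auc_score` would be preferable.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random fraud outscores a random legitimate case."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]  # hypothetical fraud scores
print(roc_auc(y_true, scores))  # 5/6, about 0.833
```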

Following previous studies (Amrutha et al., 2023; García-Ordás et al., 2023; Palacio, 2019), the metrics used to evaluate unsupervised ML techniques are defined below.

The silhouette coefficient helps identify the most appropriate number of clusters; a higher coefficient indicates better clustering quality for that number of clusters. Equation (8) calculates it for an observation j:

\[ s(j) = \frac{y - x}{\max(x, y)} \qquad (8) \]

where x denotes the average distance from observation j to the other observations in the cluster to which j belongs, and y denotes the smallest average distance from j to the observations of any other cluster. The silhouette score ranges from −1 to 1. Following Viera et al. (2023), a value of 1 indicates that j is assigned to an appropriate cluster, 0 indicates that j lies between two distinct clusters, and −1 indicates that j has been assigned to the wrong cluster.
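The following is a direct, minimal transcription of the silhouette definition using Euclidean distance, averaged over all observations. It assumes at least two clusters and, following a common convention, assigns s(j) = 0 to an observation that is alone in its cluster; the function name and toy points are hypothetical. scikit-learn's `silhouette_score` offers an optimized equivalent.

```python
import math


def silhouette_score(points, labels):
    """Mean of s(j) = (y - x) / max(x, y) over all observations (needs >= 2 clusters)."""
    total = 0.0
    for j, (pj, lj) in enumerate(zip(points, labels)):
        # Distances from j to the other members of each cluster
        dists = {}
        for i, (pi, li) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(li, []).append(math.dist(pj, pi))
        if lj not in dists:  # j is alone in its cluster: s(j) = 0 by convention
            continue
        x = sum(dists[lj]) / len(dists[lj])  # cohesion: mean distance within j's cluster
        y = min(sum(d) / len(d) for c, d in dists.items() if c != lj)  # separation
        total += (y - x) / max(x, y)
    return total / len(points)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(silhouette_score(pts, [0, 0, 1, 1]), 3))  # well-separated clusters
print(round(silhouette_score(pts, [0, 1, 0, 1]), 3))  # poor assignment, negative
```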

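The Rand index measures the similarity between two clusterings (for example, the predicted clusters and the true classes) by considering all pairs of observations and counting the pairs that are placed either in the same cluster in both partitions or in different clusters in both. Equation (9) calculates the index:

\[ RI = \frac{a + b}{\binom{n}{2}} \qquad (9) \]

where a is the number of pairs assigned to the same cluster in both partitions, b is the number of pairs assigned to different clusters in both, and n is the number of observations.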

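The Davies–Bouldin index is a score used to evaluate clustering algorithms. It is the average, over all clusters, of the similarity between each cluster and its most similar cluster; lower values indicate more compact, better-separated clusters. It is given in Eq. (10):

\[ DB = \frac{1}{k}\sum_{i=1}^{k}\max_{j \neq i}\frac{\alpha_i + \alpha_j}{d(c_i, c_j)} \qquad (10) \]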

where k denotes the number of clusters; \(c_i\) and \(c_j\) are the centroids of clusters i and j, respectively; \(d(c_i, c_j)\) is the distance between them; and \(\alpha_i\) and \(\alpha_j\) are the average distances of all elements in clusters i and j to their respective centroids \(c_i\) and \(c_j\) (Viera et al., 2023).
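A minimal sketch of the Davies–Bouldin index, assuming Euclidean distance and taking each cluster's centroid as the mean of its members; the function name and toy points are hypothetical. scikit-learn's `davies_bouldin_score` computes the same index.

```python
import math


def davies_bouldin(points, labels):
    """DB = (1/k) * sum_i max_{j != i} (alpha_i + alpha_j) / d(c_i, c_j); lower is better."""
    ids = sorted(set(labels))
    cents, alphas = {}, {}
    for c in ids:
        members = [p for p, l in zip(points, labels) if l == c]
        dim = len(members[0])
        cents[c] = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # alpha_c: mean distance of the cluster's members to its centroid
        alphas[c] = sum(math.dist(p, cents[c]) for p in members) / len(members)
    total = 0.0
    for i in ids:
        # Similarity of cluster i to its most similar other cluster
        total += max((alphas[i] + alphas[j]) / math.dist(cents[i], cents[j])
                     for j in ids if j != i)
    return total / len(ids)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(davies_bouldin(pts, [0, 0, 1, 1]))  # compact, distant clusters: low score
print(davies_bouldin(pts, [0, 1, 0, 1]))  # overlapping clusters: high score
```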

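The Fowlkes–Mallows index is the geometric mean of the pairwise precision and recall, as represented in Eq. (11):

\[ FMI = \frac{TP}{\sqrt{(TP + FP)(TP + FN)}} \qquad (11) \]

where TP is the number of pairs of observations placed in the same cluster in both the true and the predicted partitions, FP is the number of pairs placed together only in the predicted partition, and FN is the number of pairs placed together only in the true partition.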
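Both the Rand index and the Fowlkes–Mallows index are computed by classifying every pair of observations according to whether the true and predicted partitions place the two observations together. The sketch below, a minimal illustration with hypothetical labels, counts these pairs once and derives both indices; both are invariant to how cluster ids are numbered. scikit-learn exposes them as `rand_score` and `fowlkes_mallows_score`.

```python
import math
from itertools import combinations


def pair_counts(true_labels, pred_labels):
    """Classify every pair of observations by whether the two partitions agree."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1  # together in both partitions
        elif same_pred:
            fp += 1  # together only in the prediction
        elif same_true:
            fn += 1  # together only in the ground truth
        else:
            tn += 1  # apart in both partitions
    return tp, fp, fn, tn


def rand_index(true_labels, pred_labels):
    tp, fp, fn, tn = pair_counts(true_labels, pred_labels)
    return (tp + tn) / (tp + fp + fn + tn)


def fowlkes_mallows(true_labels, pred_labels):
    tp, fp, fn, _ = pair_counts(true_labels, pred_labels)
    # Geometric mean of pairwise precision TP/(TP+FP) and recall TP/(TP+FN)
    return tp / math.sqrt((tp + fp) * (tp + fn)) if tp else 0.0


true_labels = [0, 0, 1, 1]
pred_labels = [0, 0, 1, 2]  # the predicted clustering splits the second group
print(rand_index(true_labels, pred_labels))       # 5/6
print(fowlkes_mallows(true_labels, pred_labels))  # 1/sqrt(2)
```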

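The cophenetic correlation coefficient measures how faithfully a dendrogram (tree diagram) produced by hierarchical clustering preserves the original pairwise distances between observations. Equation (12) gives the metric:

\[ c = \frac{\sum_{i<j}\left(x(i,j)-\bar{x}\right)\left(t(i,j)-\bar{t}\right)}{\sqrt{\left[\sum_{i<j}\left(x(i,j)-\bar{x}\right)^{2}\right]\left[\sum_{i<j}\left(t(i,j)-\bar{t}\right)^{2}\right]}} \qquad (12) \]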

where \(x(i,j)=|{x}_{i}-{x}_{j}|\) is the Euclidean distance between the i th and j th observations of \(x\), \(t(i,j)\) is the height of the dendrogram node at which points i and j are first joined, and \(\bar{x}\) and \(\bar{t}\) are the mean values of \(x(i,j)\) and \(t(i,j)\), respectively.
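To make the cophenetic correlation concrete, the sketch below builds a dendrogram by naive single-linkage agglomeration, records the height t(i, j) at which each pair is first merged, and computes the Pearson correlation of these heights with the original distances x(i, j). Single linkage is an assumption made for illustration (the reviewed studies may use other linkage criteria), the function names and points are hypothetical, and the quadratic-time merging is only suitable for tiny examples; SciPy's `scipy.cluster.hierarchy.cophenet` computes the coefficient for any linkage matrix.

```python
import math
from itertools import combinations


def single_linkage_cophenetic(points):
    """Agglomerate with single linkage; return merge height t(i, j) for every pair."""
    clusters = [{i} for i in range(len(points))]
    coph = {}
    while len(clusters) > 1:
        # Find the two clusters whose closest members are nearest (single linkage)
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(math.dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Every cross pair first meets in the dendrogram at height d
        for i in clusters[a]:
            for j in clusters[b]:
                coph[(min(i, j), max(i, j))] = d
        clusters[a] |= clusters[b]
        del clusters[b]  # safe because a < b
    return coph


def cophenetic_correlation(points):
    """Pearson correlation between original distances x(i,j) and merge heights t(i,j)."""
    coph = single_linkage_cophenetic(points)
    pairs = sorted(coph)
    x = [math.dist(points[i], points[j]) for i, j in pairs]
    t = [coph[p] for p in pairs]
    mx, mt = sum(x) / len(x), sum(t) / len(t)
    num = sum((a - mx) * (b - mt) for a, b in zip(x, t))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - mt) ** 2 for b in t))
    return num / den


pts = [(0.0,), (1.0,), (10.0,), (11.0,)]  # two tight, well-separated groups
print(round(cophenetic_correlation(pts), 3))
```

A value close to 1, as in this example, indicates that the dendrogram preserves the original distance structure well.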

Discussion and conclusion

Research on detecting financial fraud with ML techniques is a significant topic: fraud directly affects the business world, and detecting it early involves great challenges, which has led to the design of AI-based tools such as ML techniques. This study is an SLR that adapts the PRISMA and Kitchenham methods to critically analyze and synthesize study results. Research articles published in Scopus, IEEE Xplore, Taylor & Francis, SAGE, and ScienceDirect were explored. The results were presented in two parts: a bibliometric study performed with the software VOSviewer, followed by a discussion of the SLR results.

The bibliometric analysis presented the most important authors, articles, sources, countries, and trends in the literature on financial fraud detection with ML, and the SLR analyzed fraud types, ML models, and datasets. The 104 articles, published from 2012 to 2023, describe several types of fraudulent activity, both external (e.g., credit card and insurance fraud) and internal (e.g., financial statement fraud and money laundering), and a brief report on fraud in general is provided. Supervised and unsupervised ML techniques were also extracted: among the 10 most used models, RF is the most frequent supervised technique and the autoencoder the most frequent unsupervised technique.

The literature review on financial fraud detection using ML models showed that several authors have made significant contributions, some of whom stand out in terms of publications and citations. Among the most notable are Ahmed M. (318 citations), Chen S. (84), and Ileberi E. (82), who have made important advances in the field. Others, such as Abdallah A., have had a considerable impact with a single publication (333 citations). Although researchers such as Khan S. and Mishra B. have fewer citations, the combined work of all these authors has established a robust knowledge base and a deeper understanding of the challenges and opportunities in financial fraud detection through ML techniques.

Consistent with the analysis of article clusters, clusters 2, 4, and 11 emerge as the most influential in this field. They cover topics of interdisciplinary interest (artificial intelligence/machine learning, accounting, and finance) to both academics and auditing firms. The SLR shows that authors in these domains often co-publish, and the studies by Huang et al. (2018), J. Kim et al. (2019), Sahin et al. (2013), and Dutta et al. (2017) are among the most highly cited articles.

Similarly, China leads the research area with the largest number of published articles, followed by India and Saudi Arabia. Article production was found to be concentrated in countries with developing and transitional economies, which indicates a greater capacity for producing papers and research in these countries. By comparison, Ashtiani and Raahemi (2022) found the United States leading with the largest number of papers (18), followed by China (8) and Greece (7). Al-Hashedi and Magalingam (2021), in contrast, report India as the top producer with 24 articles, followed by China (14) and the United States (9).

The journals that published these studies belong mainly to the accounting and computer science domains. Much of the literature on financial fraud detection through ML models appears in Computers & Security, Expert Systems with Applications, and Knowledge-Based Systems, as also reported by Al-Hashedi and Magalingam (2021) and Ali et al. (2022). The most prominent keywords include crime, fraud detection, and ML. These terms indicate a central focus on the financial industry, where learning and data mining systems help discover patterns or anomalies in financial data. They also point to emerging trends and approaches in the research field.

The literature includes articles investigating fraud types, particularly credit card loan fraud and insurance fraud, which are of great interest to the scientific community (Al-Hashedi and Magalingam, 2021; Ali et al., 2022; West and Bhattacharya, 2016). This study classified the different types of fraud as internal or external and derived sub-classifications for each. For both types, ML techniques have been used to detect financial fraud: supervised (59 articles), unsupervised (19 articles), combined supervised and unsupervised (16 articles), and deep learning (3 articles), among others. Most of the studies analyzed developed binary classification models, that is, fraud versus non-fraud. Supervised learning techniques require labeled data, and the most frequently used models are LR, RF, and SVM, among others. The experiments predominantly report accuracy, precision, sensitivity, and F1-score. In unsupervised learning, the data are unlabeled, and the goal is to discover new patterns with algorithms such as DBSCAN, autoencoders, and IF, among others. However, these studies did not report evaluation with internal metrics in detail. Only a few studies use semi-supervised and deep learning techniques, as these approaches are still relatively novel.
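The supervised evaluation metrics listed above can be computed directly from a binary confusion matrix, with fraud as the positive class. The minimal Python sketch below uses hypothetical counts, not results from any reviewed study. It also illustrates why accuracy alone can mislead on imbalanced fraud data: a model that never flags fraud still reaches high accuracy while its sensitivity is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, sensitivity (recall) and F1 for the fraud class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0    # defined as 0 when nothing is flagged
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # share of frauds that were caught
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "f1": f1}

# Hypothetical test set: 1,000 transactions, 10 of them fraudulent
always_legit = classification_metrics(tp=0, fp=0, fn=10, tn=990)  # never predicts fraud
model = classification_metrics(tp=8, fp=12, fn=2, tn=978)
print(always_legit)  # accuracy 0.99, yet sensitivity and F1 are 0.0
print(model)         # accuracy 0.986, precision 0.4, sensitivity 0.8, F1 about 0.533
```

Setting precision to 0 when no transaction is flagged is a convention chosen for this sketch. Libraries such as scikit-learn expose a `zero_division` parameter for the same situation.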

The keyword trends further show that research works address ML, learning algorithms, deep learning, SVM, fraudulent transactions, and anomaly detection. Nevertheless, there is little research on unsupervised learning and deep learning. The scarce use of these techniques may be due to the complexity of the models and their high consumption of computational resources. Of the 86 experimental articles analyzed, few used unsupervised techniques. Moreover, most of the datasets used are labeled, so further experimentation with models on unlabeled real-world datasets is needed (Ounacer et al., 2018; Pumsirirat and Yan, 2018; Rubio et al., 2020; Van Capelleveen et al., 2016; Vanini et al., 2023). Labeled data, in turn, are costly because experts are required to build them. Thus, more attention has been given to data origin, preprocessing, and feature extraction before training an ML model in order to increase detection accuracy. It should also be emphasized that deep learning models require more thorough design and tuning than conventional ML models, as they are quite sensitive to the choice of architecture and hyperparameters. Furthermore, they require relatively large amounts of high-quality data, which should be considered at the design stage.
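Unsupervised detection on unlabeled data can be illustrated with a minimal from-scratch Python sketch of DBSCAN, one of the algorithms mentioned in the review, in which points labeled as noise are treated as candidate anomalies. The two-dimensional feature vectors and the `eps` and `min_samples` values are hypothetical, and the quadratic neighbor search is only suitable for small examples. scikit-learn's `sklearn.cluster.DBSCAN`, which also labels noise as -1, is the usual choice for real data.

```python
import math

NOISE = -1

def dbscan(points, eps, min_samples):
    """Minimal O(n^2) DBSCAN. Returns one label per point: a cluster id (0, 1, ...)
    or NOISE (-1) for points that are not density-reachable from any core point."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def region(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster_id
            neighbors = region(j)
            if len(neighbors) >= min_samples:
                queue.extend(neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels

# Hypothetical 2-D features (e.g., scaled amount and hour) for nine transactions
transactions = [(0, 0), (0, 1), (1, 0), (1, 1),
                (10, 10), (10, 11), (11, 10), (11, 11),
                (50, 50)]
labels = dbscan(transactions, eps=1.5, min_samples=3)
anomalies = [p for p, lab in zip(transactions, labels) if lab == NOISE]
print(labels)     # [0, 0, 0, 0, 1, 1, 1, 1, -1]
print(anomalies)  # [(50, 50)]
```

No labels are needed to run the algorithm. Whether a noise point is actually fraudulent still has to be confirmed by an expert, which is the labeling cost the paragraph describes.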

The studies show that the datasets used in the experiments were drawn from the stock exchanges of China, Canada, the United States, Taiwan, and Tehran, among others. For credit card loan fraud, the “Credit Card Fraud Detection” dataset stands out, being mentioned 15 times. The performance of ML models can also be affected by the selected dataset, in particular by its number of attributes and instances. The analysis showed that most articles use real datasets obtained from existing databases, historical records, or other collection methods. Only a few studies (four articles) use synthetic datasets, which are generated by modeling or simulation techniques to mimic a real dataset.

Nevertheless, integrating real and synthetic datasets enables a comprehensive approach to the problem, providing a basis and complementary information for drawing conclusions and comparing ML model performance with other studies. In particular, studies published between 2012 and 2023 raise concerns about obsolete data, some dating from approximately 1994. Because of their age, such data cannot yield effective and accurate results in the current context. New fraud modalities emerge continually, and their characteristics and behavior patterns have evolved significantly over time.

The literature review and bibliometric analyses of financial fraud detection using ML techniques between 2012 and 2023 show a remarkable evolution in this field. Authors including Ahmed M., Ileberi E., and Chen S. have made important contributions with high citation counts. There has been fundamental interdisciplinary collaboration among areas such as artificial intelligence, accounting, finance, and information security, with widely cited studies such as Huang et al. (2018), J. Kim et al. (2019), Sahin et al. (2013), and Dutta et al. (2017). China, India, and Saudi Arabia lead in publications, reflecting the global research effort of emerging economies. Supervised techniques such as Random Forest and unsupervised ones such as the autoencoder are the most widely used. Considerable effort and enthusiasm around deep learning are also evident, despite its complexity and high computational resource requirements.

Research mainly uses real datasets, such as those from the Chinese, Canadian, US, Taiwanese, and Tehran stock exchanges, with the “Credit Card Fraud Detection” dataset being the most prominent. The journals that publish these studies belong to both accounting and computer science, with an extensive literature in Computers & Security, Expert Systems with Applications, and Knowledge-Based Systems. Although the accuracy of fraud detection depends on data quality and on preprocessing with various algorithms, the need for robust and up-to-date approaches to face new fraud modalities is particularly highlighted.

Limitations and scope for future research

The study had limitations that affected the scope and interpretation of the results. Although a systematic review was performed, the collected data lack quantitative support. Of the 104 articles identified in the SLR, 18 are systematic reviews, which limits the number of studies with specific details or experiments. This affected the depth of the analysis and the comprehensiveness of the results.

The literature reviewed places predominant emphasis on the banking sector, especially on credit card fraud and insurance fraud. This narrow focus leads to a lack of diversity in the types of fraud studied. It leaves out internal fraud types such as embezzlement, racketeering, smurfing, defalcation, collusion, signature forgery, and manipulation of accounting documents, among others. The underrepresentation of these fraud types limits the generalizability of the findings and the applicability of ML models to contexts beyond the banking sector.

The datasets analyzed show a significant deficiency in the representation of fraud types. Most of them originate from the main stock exchanges, and the information used to carry out the experiments is old, which means that the analyses may include fraud types that are no longer current. In addition, the limited information available on the performance metrics of unsupervised learning models made it difficult to tally the evaluation metrics used to predict financial fraud.

The field of financial fraud detection using ML models offers promising prospects for future research. One area for improvement is experimentation with advanced techniques, such as reinforcement learning or deep neural network architectures, to improve the accuracy and efficiency of models, including unsupervised ones. This approach could enable more sophisticated systems capable of identifying complex fraud patterns and adapting dynamically to the changing strategies of criminals, who constantly devise new fraud methods.

Moreover, it is suggested that the applicability of fraud detection systems in contexts other than banking be analyzed by adopting an anomaly detection approach. This would help advance real-time fraud detection and minimize risks in organizations. It is also proposed to create a freely accessible dataset that contains real-world contextual information and includes new fraud methods, so as to provide the scientific community with up-to-date data.
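To make the idea of real-time anomaly detection concrete, the minimal Python sketch below scores each incoming value against running statistics that are updated online with Welford's algorithm. The review does not prescribe any specific algorithm. This univariate z-score detector is a deliberately simple stand-in chosen for illustration. The threshold of 3 standard deviations, the warm-up length, and the transaction amounts are all hypothetical.

```python
import math

class StreamingZScore:
    """Flags values lying more than `threshold` standard deviations from the
    running mean; mean and variance are updated online (Welford's algorithm)."""

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold
        self.warmup = max(2, warmup)  # observations required before any flagging
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Score x against the history seen so far, then add x to the history."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            is_anomaly = std > 0 and abs(x - self.mean) / std > self.threshold
        # Welford update: O(1) memory, no need to store past transactions
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly

# Hypothetical stream of transaction amounts for one account
detector = StreamingZScore(threshold=3.0, warmup=10)
amounts = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 5000, 100]
flags = [detector.update(a) for a in amounts]
print(flags)  # ten False values, then True (for 5000), then False
```

In this sketch, flagged values are still absorbed into the running statistics, which inflates the variance afterward. A production system might instead exclude confirmed frauds or use decaying statistics so that the detector keeps adapting to new behavior.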

Data availability

The datasets generated and/or analyzed in this study are available in the Harvard Dataverse repository at https://doi.org/10.7910/DVN/CM8NVY.

Abdallah A, Maarof MA, Zainal A (2016) Fraud detection system: a survey. J Netw Comput Appl 68:90–113. https://doi.org/10.1016/j.jnca.2016.04.007

Achakzai MAK, Juan P (2022) Using machine learning meta-classifiers to detect financial frauds. Financ Res Lett 48:102915. https://doi.org/10.1016/j.frl.2022.102915

Ahmed M, Mahmood AN, Islam MdR (2016) A survey of anomaly detection techniques in financial domain. Future Gener Comput Syst 55:278–288. https://doi.org/10.1016/j.future.2015.01.001

Al Ali A, Khedr AM, El-Bannany M, Kanakkayil S (2023) A powerful predicting model for financial statement fraud based on optimized XGBoost ensemble learning technique. Appl Sci 13(4):2272. https://doi.org/10.3390/app13042272

Alarfaj FK, Malik I, Khan HU, Almusallam N, Ramzan M, Ahmed M (2022) Credit card fraud detection using state-of-the-art machine learning and deep learning algorithms. IEEE Access 10:39700–39715. https://doi.org/10.1109/ACCESS.2022.3166891

Al-Hashedi KG, Magalingam P (2021) Financial fraud detection applying data mining techniques: a comprehensive review from 2009 to 2019. Comput Sci Rev 40:100402. https://doi.org/10.1016/j.cosrev.2021.100402

Ali A, Abd Razak S, Othman SH, Eisa TAE, Al-Dhaqm A, Nasser Tusneem ME, Elshafie H, Saif A (2022) Financial fraud detection based on machine learning: a systematic literature review. Appl Sci (Switz). https://doi.org/10.3390/app12199637

Alsuwailem AAS, Salem E, Saudagar AKJ (2022) Performance of different machine learning algorithms in detecting financial fraud. Comput Econ. https://doi.org/10.1007/s10614-022-10314-x

Alwadain A, Ali RF, Muneer A (2023) Estimating financial fraud through transaction-level features and machine learning. Mathematics 11(5):1184. https://doi.org/10.3390/math11051184

Amrutha E, Arivazhagan S, Jebarani WSL (2023) Deep clustering network for steganographer detection using latent features extracted from a novel convolutional autoencoder. Neural Process Lett 55(3):2953–2964. https://doi.org/10.1007/s11063-022-10992-6

Arévalo F, Barucca P, Téllez-León I-E, Rodríguez W, Gage G, Morales R (2022) Identifying clusters of anomalous payments in the salvadorian payment system. Lat Am J Cent Bank 3(1):100050. https://doi.org/10.1016/j.latcb.2022.100050

Ashfaq T, Khalid R, Yahaya A, Aslam S, Alsafari S, Hameed I (2022) A machine learning and blockchain based efficient fraud detection mechanism. Sensors 22(19):7162. https://doi.org/10.3390/s22197162

Ashtiani MN, Raahemi B (2022) Intelligent fraud detection in financial statements using machine learning and data mining: a systematic literature review. IEEE Access 10:72504–72525. https://doi.org/10.1109/ACCESS.2021.3096799

Aslam F, Hunjra A, Ftiti Z, Louhichi W, Shams T (2022) Insurance fraud detection: evidence from artificial intelligence and machine learning. Res Int Bus Financ. https://doi.org/10.1016/j.ribaf.2022.101744

Baghdasaryan V, Davtyan H, Sarikyan A, Navasardyan Z (2022) Improving tax audit efficiency using machine learning: the role of taxpayer’s network data in fraud detection. Appl Artif Intell 36(1). https://doi.org/10.1080/08839514.2021.2012002

Baker MR, Mahmood ZN, Shaker EH (2022) Ensemble learning with supervised machine learning models to predict credit card fraud transactions. Rev Intell Artif. https://doi.org/10.18280/ria.360401

Bakumenko A, Elragal A (2022) Detecting anomalies in financial data using machine learning algorithms. Systems. https://doi.org/10.3390/systems10050130

Bekirev AS, Klimov VV, Kuzin MV, Shchukin BA (2015) Payment card fraud detection using neural network committee and clustering. Opt Mem Neural Netw 24(3):193–200. https://doi.org/10.3103/S1060992X15030030

Benchaji I, Douzi S, El Ouahidi B (2021) Credit card fraud detection model based on LSTM recurrent neural networks. J Adv Inf Technol 12(2):113–118. https://doi.org/10.12720/jait.12.2.113-118

Błaszczyński J, de Almeida Filho AT, Matuszyk A, Szeląg M, Słowiński R (2021) Auto loan fraud detection using dominance-based rough set approach versus machine learning methods. Expert Syst Appl 163:113740. https://doi.org/10.1016/j.eswa.2020.113740

Bolgorian M, Mayeli A, Ronizi NG (2023) CEO compensation and money laundering risk. J Econ Criminol 1:100007. https://doi.org/10.1016/j.jeconc.2023.100007

Chen S (2016) Detection of fraudulent financial statements using the hybrid data mining approach. SpringerPlus 5(1):89. https://doi.org/10.1186/s40064-016-1707-6

Chen S, Goo Y-JJ, Shen Z-D (2014) A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements. Sci World J 2014:1–9. https://doi.org/10.1155/2014/968712

Chen Y, Wu Z (2022) Financial fraud detection of listed companies in China: a machine learning approach. Sustainability 15(1):105. https://doi.org/10.3390/su15010105

Chullamonthon P, Tangamchit P (2023) Ensemble of supervised and unsupervised deep neural networks for stock price manipulation detection. Expert Syst Appl 220:119698. https://doi.org/10.1016/j.eswa.2023.119698

Compustat (2022) Compustat. S&P Global Market Intelligence. https://www.marketplace.spglobal.com/en/datasets

CSMAR (2022) China Stock Market & Accounting Research (CSMAR). Wharton University of Pennsylvania. https://wrds-www.wharton.upenn.edu/pages/about/data-vendors/china-stock-market-accounting-research-csmar/

Dalal S, Seth B, Radulescu M, Secara C, Tolea C (2022) Predicting fraud in financial payment services through optimized hyper-parameter-tuned XGBoost model. Mathematics 10(24):4679. https://doi.org/10.3390/math10244679

Dantas RM, Firdaus R, Jaleel F, Neves Mata P, Mata MN, Li G (2022) Systemic acquired critique of credit card deception exposure through machine learning. J Open Innov: Technol Mark Complex 8(4):192. https://doi.org/10.3390/joitmc8040192

Domashova J, Kripak E (2021) Identification of non-typical international transactions on bank cards of individuals using machine learning methods. Procedia Comput Sci 190:178–183. https://doi.org/10.1016/j.procs.2021.06.023

Domashova J, Kripak E (2022) Development of a generalized algorithm for identifying atypical bank transactions using machine learning methods. Procedia Comput Sci 213:101–109. https://doi.org/10.1016/j.procs.2022.11.044

Dutta I, Dutta S, Raahemi B (2017) Detecting financial restatements using data mining techniques. Expert Syst Appl 90:374–393. https://doi.org/10.1016/j.eswa.2017.08.030

Elshaar S, Sadaoui S (2020) Semi-supervised classification of fraud data in commercial auctions. Appl Artif Intell 34(1):47–63. https://doi.org/10.1080/08839514.2019.1691341

Esenogho E, Mienye ID, Swart TG, Aruleba K, Obaido G (2022) A neural network ensemble with feature engineering for improved credit card fraud detection. IEEE Access 10:16400–16407. https://doi.org/10.1109/ACCESS.2022.3148298

Eshghi A, Kargari M (2019) Introducing a new method for the fusion of fraud evidence in banking transactions with regards to uncertainty. Expert Syst Appl 121:382–392. https://doi.org/10.1016/j.eswa.2018.11.039

Estupiñán Gaitán R (2015) Control interno y fraudes: análisis de informe COSO I, II y III con base en los ciclos transaccionales [Internal control and fraud: analysis of the COSO I, II and III reports based on transaction cycles], 3rd edn (Niebel BW (ed)). Ecoe Ediciones

Fanai H, Abbasimehr H (2023) A novel combined approach based on deep autoencoder and deep classifiers for credit card fraud detection. Expert Syst Appl 217:119562. https://doi.org/10.1016/j.eswa.2023.119562

Fang Y, Zhang Y, Huang C (2019) Credit card fraud detection based on machine learning. Comput Mater Contin 61(1):185–195. https://doi.org/10.32604/cmc.2019.06144

Femila Roseline J, Naidu G, Samuthira Pandi V, Alamelu alias Rajasree S, Mageswari N (2022) Autonomous credit card fraud detection using machine learning approach. Comput Electr Eng 102:108132. https://doi.org/10.1016/j.compeleceng.2022.108132

García-Ordás MT, Alaiz-Moretón H, Casteleiro-Roca J-L, Jove E, Benítez-Andrades JA, García-Rodríguez I, Quintián H, Calvo-Rolle JL (2023) Clustering techniques selection for a hybrid regression model: a case study based on a solar thermal system. Cybern Syst 54(3):286–305. https://doi.org/10.1080/01969722.2022.2030006

Gupta S, Mehta SK (2021) Data mining-based financial statement fraud detection: systematic literature review and meta-analysis to estimate data sample mapping of fraudulent companies against non-fraudulent companies. Global Bus Rev. https://doi.org/10.1177/0972150920984857

Hajek P, Henriques R (2017) Mining corporate annual reports for intelligent detection of financial statement fraud—a comparative study of machine learning methods. Knowl-Based Syst 128:139–152. https://doi.org/10.1016/j.knosys.2017.05.001

Hamza C, Lylia A, Nadine C, Nicolas C (2023) Semi-supervised method to detect fraudulent transactions and identify fraud types while minimizing mounting costs. Int J Adv Comput Sci Appl 14(2). https://doi.org/10.14569/IJACSA.2023.0140298

Hilal W, Gadsden SA, Yawney J (2022) Financial fraud: a review of anomaly detection techniques and recent advances. Expert Syst Appl 193:116429. https://doi.org/10.1016/j.eswa.2021.116429

Hofmann H (1994) Statlog (German credit data). UCI Machine Learning Repository. https://doi.org/10.24432/C5NC77

Huang D, Mu D, Yang L, Cai X (2018) CoDetect: financial fraud detection with anomaly feature detection. IEEE Access 6:19161–19174. https://doi.org/10.1109/ACCESS.2018.2816564

Hwang J, Kim K (2020) An efficient domain-adaptation method using GAN for fraud detection. Int J Adv Comput Sci Appl 11(11). https://doi.org/10.14569/IJACSA.2020.0111113

Ileberi E, Sun Y, Wang Z (2021) Performance evaluation of machine learning methods for credit card fraud detection using SMOTE and AdaBoost. IEEE Access 9:165286–165294. https://doi.org/10.1109/ACCESS.2021.3134330

Ileberi E, Sun Y, Wang Z (2022) A machine learning based credit card fraud detection using the GA algorithm for feature selection. J Big Data 9(1):24. https://doi.org/10.1186/s40537-022-00573-8

Khan S, Alourani A, Mishra B, Ali A, Kamal M (2022) Developing a credit card fraud detection model using machine learning approaches. Int J Adv Comput Sci Appl 13(3). https://doi.org/10.14569/IJACSA.2022.0130350

Kim J, Kim H-J, Kim H (2019) Fraud detection for job placement using hierarchical clusters-based deep neural networks. Appl Intell 49(8):2842–2861. https://doi.org/10.1007/s10489-019-01419-2

Kim YJ, Baik B, Cho S (2016) Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning. Expert Syst Appl 62:32–43. https://doi.org/10.1016/j.eswa.2016.06.016

Kitchenham B, Brereton P (2013) A systematic review of systematic review process research in software engineering. Inf Softw Technol 55(12):2049–2075. https://doi.org/10.1016/j.infsof.2013.07.010

Kitchenham B, Charters S (2007) Guidelines for performing systematic literature reviews in software engineering. https://www.researchgate.net/publication/302924724_Guidelines_for_performing_Systematic_Literature_Reviews_in_Software_Engineering

Kootanaee AJ, Aghajan AAP, Shirvani MH (2021) A hybrid model based on machine learning and genetic algorithm for detecting fraud in financial statements. J Optim Ind Eng 14(2):183–201. https://doi.org/10.22094/JOIE.2020.1877455.1685

KPMG (2022) Una triple amenaza en las Américas [A triple threat in the Americas]. KPMG. https://kpmg.com/co/es/home/insights/2022/01/kpmg-fraud-outlook-survey.html

Kumar S, Ahmed R, Bharany S, Shuaib M, Ahmad T, Tag Eldin E, Rehman AU, Shafiq M (2022) Exploitation of machine learning algorithms for detecting financial crimes based on customers’ behavior. Sustainability 14(21):13875. https://doi.org/10.3390/su142113875

Kumbure MM, Lohrmann C, Luukka P, Porras J (2022) Machine learning techniques and data for stock market forecasting: a literature review. Expert Syst Appl 197:116659. https://doi.org/10.1016/j.eswa.2022.116659

Lee H, Choi E, Kim I, Choi D, Go W, Lee K, Yim H, Lee T (2018) Feature selection practice for unsupervised learning of credit card fraud detection. J Theor Appl Inf Technol 96(2):408–417

Lei X, Mohamad UH, Sarlan A, Shutaywi M, Daradkeh YI, Mohammed HO (2022) Development of an intelligent information system for financial analysis depend on supervised machine learning algorithms. Inf Process Manag 59(5):103036. https://doi.org/10.1016/j.ipm.2022.103036

Lokanan M, Tran V, Vuong NH (2019) Detecting anomalies in financial statements using machine learning algorithm. Asian J Account Res 4(2):181–201. https://doi.org/10.1108/AJAR-09-2018-0032

Lokanan ME, Sharma K (2022) Fraud prediction using machine learning: the case of investment advisors in Canada. Mach Learn Appl 8:100269. https://doi.org/10.1016/j.mlwa.2022.100269

Lokanan ME (2022) Predicting money laundering using machine learning and artificial neural networks algorithms in banks. J Appl Secur Res 1–25. https://doi.org/10.1080/19361610.2022.2114744

López-Rojas E (2017) Synthetic financial datasets for fraud detection. Kaggle. https://www.kaggle.com/datasets/ealaxi/paysim1

Machine Learning Group (2018) Credit card fraud detection. Kaggle. https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud

Madhurya MJ, Gururaj HL, Soundarya BC, Vidyashree KP, Rajendra AB (2022) Exploratory analysis of credit card fraud detection using machine learning techniques. Glob Transit Proc 3(1):31–37. https://doi.org/10.1016/j.gltp.2022.04.006

Malik EF, Khaw KW, Belaton B, Wong WP, Chew X (2022) Credit card fraud detection using a new hybrid machine learning architecture. Mathematics 10(9):1480. https://doi.org/10.3390/math10091480

Márquez Arcila RH (2019) Auditoría forense [Forensic auditing]. Ecoe Ediciones

Misra S, Thakur S, Ghosh M, Saha SK (2020) An autoencoder based model for detecting fraudulent credit card transaction. Procedia Comput Sci 167:254–262. https://doi.org/10.1016/j.procs.2020.03.219

Moher D, Liberati A, Tetzlaff J, Altman DG (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 6(7):e1000097. https://doi.org/10.1371/journal.pmed.1000097

Mongwe W, Malan K (2020) A survey of automated financial statement fraud detection with relevance to the South African context. S Afr Comput J 32(1). https://doi.org/10.18489/sacj.v32i1.777

Montes Salazar CA (2019) Riesgos de fraude en una auditoría de estados financieros [Fraud risks in a financial statement audit], 1st edn. Alfaomega. ISBN: 9789587782639. https://www.alfaomegacloud.com/reader/riesgos-de-fraude-en-una-auditoria-de-estados-financieros?location=3

Moreira MÂL, Junior C, de SR, Silva DF, de L, de Castro Junior MAP, Costa IP, de A, Gomes CFS, dos Santos M (2022) Exploratory analysis and implementation of machine learning techniques for predictive assessment of fraud in banking systems. Procedia Comput Sci 214:117–124. https://doi.org/10.1016/j.procs.2022.11.156

Narsimha B, Raghavendran CV, Rajyalakshmi P, Reddy GK, Bhargavi M, Naresh P (2022) Cyber defense in the age of artificial intelligence and machine learning for financial fraud detection application. Int J Electr Electron Res 10(2):87–92. https://doi.org/10.37391/ijeer.100206

Nian K, Zhang H, Tayal A, Coleman T, Li Y (2016) Auto insurance fraud detection using unsupervised spectral ranking for anomaly. J Financ Data Sci 2(1):58–75. https://doi.org/10.1016/j.jfds.2016.03.001

Nicholls J, Kuppa A, Le-Khac N-A (2021) Financial cybercrime: a comprehensive survey of deep learning approaches to tackle the evolving financial crime landscape. IEEE Access 9:163965–163986. https://doi.org/10.1109/ACCESS.2021.3134076

Nonnenmacher J, Marx Gómez J (2021) Unsupervised anomaly detection for internal auditing: literature review and research agenda. Int J Digit Account Res 1–22. https://doi.org/10.4192/1577-8517-v21_1

Olszewski D (2014) Fraud detection using self-organizing map visualizing the user profiles. Knowl Based Syst 70:324–334. https://doi.org/10.1016/j.knosys.2014.07.008

Omershafiq (2019) Bitcoin network transactional metadata. Kaggle. https://www.kaggle.com/datasets/omershafiq/bitcoin-network-transactional-metadata

Ounacer S, Ait El Bour H, Oubrahim Y, Ghoumari MY, Azzouazi M (2018) Using isolation forest in anomaly detection: the case of credit card transactions. Period Eng Nat Sci 6(2):394. https://doi.org/10.21533/pen.v6i2.533

Palacio SM (2019) Abnormal pattern prediction: detecting fraudulent insurance property claims with semi-supervised machine-learning. Data Sci J 18(1):35. https://doi.org/10.5334/dsj-2019-035

Papík M, Papíková L (2022) Detecting accounting fraud in companies reporting under US GAAP through data mining. Int J Account Inf Syst 45:100559. https://doi.org/10.1016/j.accinf.2022.100559

Plakandaras V, Gogas P, Papadimitriou T, Tsamardinos I (2022) Credit card fraud detection with automated machine learning systems. Appl Artif Intell 36(1). https://doi.org/10.1080/08839514.2022.2086354

Polak P, Nelischer C, Guo H, Robertson DC (2020) “Intelligent” finance and treasury management: what we can expect. AI Soc 35(3):715–726. https://doi.org/10.1007/s00146-019-00919-6

PricewaterhouseCoopers (2022) Encuesta Global de Crimen y Fraude Económico de PwC Colombia 2022–2023 [PwC Colombia Global Economic Crime and Fraud Survey 2022–2023]. https://www.pwc.com/co/es/publicaciones/encuesta-crimen-fraude-economico.html

Pumsirirat A, Yan L (2018) Credit card fraud detection using deep learning based on auto-encoder and restricted Boltzmann machine. Int J Adv Comput Sci Appl 9(1). https://doi.org/10.14569/IJACSA.2018.090103

Putten P (2000) Insurance Company Benchmark (COIL 2000). UCI Machine Learning Repository. https://doi.org/10.24432/C5630S

Quinlan R (1997) Statlog (Australian credit approval). UCI Machine Learning Repository. https://doi.org/10.24432/C59012

Rakowski R, Polak P, Kowalikova P (2021) Ethical aspects of the impact of AI: the status of humans in the era of artificial intelligence. Society 58(3):196–203. https://doi.org/10.1007/s12115-021-00586-8

Ramírez-Alpízar A, Jenkins M, Martínez A, Quesada-López C (2020a) Use of data mining and machine learning techniques for fraud detection in financial statements: a systematic mapping study. Rev Ibér Sist Tecnol Inf Lousada No. E28:97–109

Reurink A (2018) Financial fraud: a literature review. J Econ Surv 32(5):1292–1325. https://doi.org/10.1111/joes.12294

Rocha-Salazar J-J, Segovia-Vargas M-J, Camacho-Miñano M-M (2021) Money laundering and terrorism financing detection using neural networks and an abnormality indicator. Expert Syst Appl 169:114470. https://doi.org/10.1016/j.eswa.2020.114470

Roehrs A, da Costa CA, da Rosa Righi R, de Oliveira KSF (2017) Personal health records: a systematic literature review. J Med Internet Res 19(1):e13. https://doi.org/10.2196/jmir.5876

Rubio J, Barucca P, Gage G, Arroyo J, Morales-Resendiz R (2020) Classifying payment patterns with artificial neural networks: an autoencoder approach. Lat Am J Cent Bank 1(1–4):100013. https://doi.org/10.1016/j.latcb.2020.100013

Sahin Y, Bulkan S, Duman E (2013) A cost-sensitive decision tree approach for fraud detection. Expert Syst Appl 40(15):5916–5923. https://doi.org/10.1016/j.eswa.2013.05.021

Saputra M, Santosa PI, Permanasari AE (2023) Consumer behaviour and acceptance in fintech adoption: a systematic literature review. Acta Inform Pragensia 12(2):468–489. https://doi.org/10.18267/j.aip.222

Saragih MG, Chin J, Setyawasih R, Nguyen PT, Shankar K (2019) Machine learning methods for analysis fraud credit card transaction. Int J Eng Adv Technol 8(6S):870–874. https://doi.org/10.35940/ijeat.F1164.0886S19

Sathya M, Balakumar B (2022) Insurance fraud detection using novel machine learning technique. Int J Intell Syst Appl Eng 10(3):374–381

Savić M, Atanasijević J, Jakovetić D, Krejić N (2022) Tax evasion risk management using a hybrid unsupervised outlier detection method. Expert Syst Appl 193:116409. https://doi.org/10.1016/j.eswa.2021.116409

Seera M, Lim CP, Kumar A, Dhamotharan L, Tan KH (2021) An intelligent payment card fraud detection system. Ann Oper Res. https://doi.org/10.1007/s10479-021-04149-2

Shahana T, Lavanya V, Bhat AR (2023) State of the art in financial statement fraud detection: a systematic review. Technol Forecast Soc Change 192:122527. https://doi.org/10.1016/j.techfore.2023.122527

Shou M, Bao X, Yu J (2023) An optimal weighted machine learning model for detecting financial fraud. Appl Econ Lett 30(4):410–415. https://doi.org/10.1080/13504851.2021.1989367

Singh A, Jain A, Biable SE (2022) Financial fraud detection approach based on firefly optimization algorithm and support vector machine. Appl Comput Intell Soft Comput 2022:1–10. https://doi.org/10.1155/2022/1468015

Smith Q-J, Valverde R (2021) A perceptron based neural network data analytics architecture for the detection of fraud in credit card transactions in financial legacy systems. WSEAS Trans Syst Control 16:358–374. https://doi.org/10.37394/23203.2021.16.31

Sofy MA, Khafagy MH, Badry RM (2023) An intelligent Arabic model for recruitment fraud detection using machine learning. J Adv Informat Technol. https://doi.org/10.12720/jait.14.1.102-111

Srokosz M, Bobyk A, Ksiezopolski B, Wydra M (2023) Machine-learning-based scoring system for antifraud CISIRTs in banking environment. Electronics 12(1):251. https://doi.org/10.3390/electronics12010251

Subudhi S, Panigrahi S (2020) Use of optimized fuzzy C-Means clustering and supervised classifiers for automobile insurance fraud detection. J King Saud Univ Comput Inf Sci 32(5):568–575. https://doi.org/10.1016/j.jksuci.2017.09.010

Ti Y-W, Hsin Y-Y, Dai T-S, Huang M-C, Liu L-C (2022) Feature generation and contribution comparison for electronic fraud detection. Sci Rep 12(1):18042. https://doi.org/10.1038/s41598-022-22130-2

Tingfei H, Guangquan C, Kuihua H (2020) Using variational auto encoding in credit card fraud detection. IEEE Access 8:149841–149853. https://doi.org/10.1109/ACCESS.2020.3015600

Torrano C, Recuero P, Ramirez F, Hernández S, Torres J (2018) Machine learning aplicado a la ciberseguridad: técnicas y ejemplos en detección de amenazas. Zeroxword Computing

Udeze CL, Eteng IE, Ibor AE (2022) Application of machine learning and resampling techniques to credit card fraud detection. J Niger Soc Phys Sci 769. https://doi.org/10.46481/jnsps.2022.769

Usman A, Naveed N, Munawar S (2023) Intelligent anti-money laundering fraud control using graph-based machine learning model for the financial domain. J Cases Inf Technol 25(1):1–20. https://doi.org/10.4018/JCIT.316665

Van Capelleveen G, Poel M, Mueller RM, Thornton D, Van Hillegersberg J (2016) Outlier detection in healthcare fraud: a case study in the Medicaid dental domain. Int J Account Inf Syst 21:18–31. https://doi.org/10.1016/j.accinf.2016.04.001

Vanhoeyveld J, Martens D, Peeters B (2020) Value-added tax fraud detection with scalable anomaly detection techniques. Appl Soft Comput 86:105895. https://doi.org/10.1016/j.asoc.2019.105895

Vanini P, Rossi S, Zvizdic E, Domenig T (2023) Online payment fraud: from anomaly detection to risk management. Financ Innov 9(1):66. https://doi.org/10.1186/s40854-023-00470-w

Vanneschi L, Horn DM, Castelli M, Popovič A (2018) An artificial intelligence system for predicting customer default in e-commerce. Expert Syst Appl 104:1–21. https://doi.org/10.1016/j.eswa.2018.03.025

Viera J, Aguilar J, Rodríguez-Moreno M, Quintero-Gull C (2023) Analysis of the behavior pattern of energy consumption through online clustering techniques. Energies 16(4):1649. https://doi.org/10.3390/en16041649

Wadhwa VK, Saini AK, Kumar SS (2020) Financial fraud prediction models: a review of research evidence. Int J Sci Technol Res 9(1):677–680

West J, Bhattacharya M (2016) Intelligent financial fraud detection: a comprehensive review. Comput Secur 57:47–66. https://doi.org/10.1016/j.cose.2015.09.005

Whiting DG, Hansen JV, McDonald JB, Albrecht C, Albrecht WS (2012) Machine learning methods for detecting patterns of management fraud. Comput Intell 28(4):505–527. https://doi.org/10.1111/j.1467-8640.2012.00425.x

Wohlin C (2014) Guidelines for snowballing in systematic literature studies and a replication in software engineering. In: Proceedings of the 18th international conference on evaluation and assessment in software engineering. pp. 1–10

Wu B, Lv X, Alghamdi A, Abosaq H, Alrizq M (2023) Advancement of management information system for discovering fraud in master card based intelligent supervised machine learning and deep learning during SARS-CoV2. Inf Process Manag 60(2):103231. https://doi.org/10.1016/j.ipm.2022.103231

Xiong T, Ma Z, Li Z, Dai J (2022) The analysis of influence mechanism for internet financial fraud identification and user behavior based on machine learning approaches. Int J Syst Assur Eng Manag 13(S3):996–1007. https://doi.org/10.1007/s13198-021-01181-0

Xiuguo W, Shengyong D (2022) An analysis on financial statement fraud detection for Chinese listed companies using deep learning. IEEE Access 10:22516–22532. https://doi.org/10.1109/ACCESS.2022.3153478

Yeh I-C (2016) Default of credit card clients. UCI Machine Learning Repository. https://doi.org/10.24432/C55S3H

Zhang Z, Zhou X, Zhang X, Wang L, Wang P (2018) A model based on convolutional neural network for online transaction fraud detection. Secur Commun Netw 2018:1–9. https://doi.org/10.1155/2018/5680264

Zhao Z, Bai T (2022) Financial fraud detection and prediction in listed companies using SMOTE and machine learning algorithms. Entropy 24(8):1157. https://doi.org/10.3390/e24081157

Zhou H, Chai H, Qiu M (2018) Fraud detection within bankcard enrollment on mobile device based payment using machine learning. Front Inf Technol Electron Eng 19(12):1537–1545. https://doi.org/10.1631/FITEE.1800580

Zupan M, Budimir V, Letinic S (2020) Journal entry anomaly detection model. Intell Syst Account Financ Manag 27(4):197–209. https://doi.org/10.1002/isaf.1485

Acknowledgements

We would like to express our gratitude to the Universidad Cooperativa de Colombia, Ibagué campus, Espinal. This research work was supported by Universidad Cooperativa de Colombia and derived from research project INV3456 entitled “Detection of anomalies in financial data in social economy organizations through machine learning techniques” associated with the PLANAUDI, AQUA and SINERGIA UCC group, from the Research Center of the Public Accounting and Systems Engineering program of the UCC Ibagué campus.

Author information

Authors and affiliations

School of Public Accounting, Universidad Cooperativa de Colombia, 730001, Ibagué-Espinal campus, Ibagué, Colombia

Ludivia Hernandez Aros & John Johver Moreno Hernandez

School of Systems Engineering, Universidad Cooperativa de Colombia, 730001, Ibagué-Espinal campus, Ibagué, Colombia

Luisa Ximena Bustamante Molano & Fernando Gutierrez-Portela

School of Business Administration, Universidad Cooperativa de Colombia, 730001, Ibagué-Espinal campus, Ibagué, Colombia

Mario Samuel Rodríguez Barrero

Contributions

All authors contributed to the creation and design of the study.

Corresponding author

Correspondence to Ludivia Hernandez Aros.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval and consent to participate

This article does not involve human participants, human data, or human tissue.

Consent to publish

The article contains no data from any individual person in any form.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Cite this article

Hernandez Aros, L., Bustamante Molano, L.X., Gutierrez-Portela, F. et al. Financial fraud detection through the application of machine learning techniques: a literature review. Humanit Soc Sci Commun 11, 1130 (2024). https://doi.org/10.1057/s41599-024-03606-0

Received: 15 November 2023

Accepted: 13 August 2024

Published: 03 September 2024

DOI: https://doi.org/10.1057/s41599-024-03606-0

Perioperative outcomes of robot-assisted versus laparoscopic distal gastrectomy for gastric cancer: a systematic review and meta-analysis of propensity score matching studies

Published: 04 September 2024

Volume 18, article number 333 (2024)

Wei Li & Shou-Jiang Wei

This meta-analysis compared the efficacy of robotic distal gastrectomy (RDG) and laparoscopic distal gastrectomy (LDG) for gastric cancer, including only studies that used propensity score matching (PSM). A systematic literature search of several major databases, including PubMed, Embase, and Google Scholar, was conducted up to June 2024. Articles were screened against predefined inclusion and exclusion criteria. Baseline data and primary and secondary outcome measures (e.g., operative time, estimated blood loss, lymph-node yield, length of hospital stay, and time to first flatus) were extracted. The quality of the PSM studies was assessed with ROBINS-I, and data were analyzed using Review Manager 5.4.1. Twelve propensity score-matched studies involving 3688 patients were included. Compared with laparoscopic surgery, robot-assisted surgery resulted in a longer operative time (WMD 30.64 min, 95% CI 15.63 to 45.66; p < 0.0001), less estimated blood loss (WMD −29.54 mL, 95% CI −47.14 to −11.94; p = 0.001), a higher lymph-node yield (WMD 5.14, 95% CI 2.39 to 7.88; p = 0.0002), and a shorter hospital stay (WMD −0.36, 95% CI −0.60 to −0.12; p = 0.004). There were no significant differences between the two approaches in time to first flatus, overall complications, or major complications. Robotic distal gastrectomy for gastric cancer reduces intraoperative blood loss, increases lymph-node yield, and shortens hospital stay compared with laparoscopic surgery, despite a longer operative time; time to first flatus and complication rates do not differ between the two groups.
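Each pooled weighted mean difference (WMD) above is reported with a 95% confidence interval. Assuming the interval was built symmetrically on the mean-difference scale as WMD ± 1.96 × SE (the abstract does not say whether a fixed- or random-effects model was used), the point estimate should sit at the interval's midpoint, and the standard error and z statistic can be backed out from the interval width. The Python sketch below does this for the four significant outcomes; `check_wmd` is a hypothetical helper, not part of Review Manager. The midpoint check also confirms the sign of the blood-loss estimate: the midpoint of −47.14 and −11.94 is −29.54.

```python
Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def check_wmd(wmd: float, lower: float, upper: float, z_crit: float = Z_95):
    """Return (CI midpoint, implied SE, z statistic) for an estimate reported
    with a symmetric 95% CI, assuming CI = wmd +/- z_crit * SE."""
    midpoint = (lower + upper) / 2
    se = (upper - lower) / (2 * z_crit)
    z = wmd / se
    return midpoint, se, z


# Pooled estimates and 95% CIs as reported in the abstract
# (hospital-stay units are not stated there, so none are assumed).
outcomes = {
    "operative time (min)": (30.64, 15.63, 45.66),
    "estimated blood loss (mL)": (-29.54, -47.14, -11.94),
    "lymph-node yield": (5.14, 2.39, 7.88),
    "hospital stay": (-0.36, -0.60, -0.12),
}

for name, (wmd, lo, hi) in outcomes.items():
    mid, se, z = check_wmd(wmd, lo, hi)
    print(f"{name}: midpoint={mid:.3f}, SE={se:.3f}, z={z:.2f}")
```

Every |z| comes out above 1.96, consistent with all four reported p-values being below 0.05. Because the interval bounds are rounded to two decimals, recomputed values are approximate and need not reproduce the published p-values exactly.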

Data availability

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.

Abbreviations

  • Gastric cancer
  • Laparoscopic gastrectomy
  • Robotic gastrectomy
  • Robotic-assisted distal gastrectomy
  • Laparoscopic-assisted group
  • Randomized controlled trials
  • Newcastle–Ottawa Scale
  • Confidence intervals
  • Odds ratios
  • Weighted mean difference
  • Colorectal cancer
  • Standard deviation
  • Body mass index
  • Robotic distal gastrectomy
  • Propensity score matching
  • Laparoscopic distal gastrectomy
  • Estimated blood loss
  • Not available

Bray F, Ferlay J, Soerjomataram I et al (2018) Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 68(6):394–424. https://doi.org/10.3322/caac.21492

Article   PubMed   Google Scholar  

Chen W, Zheng R, Baade PD et al (2016) Cancer statistics in China, 2015. CA Cancer J Clin 66(2):115–132. https://doi.org/10.3322/caac.21338

Suda K, Man-I M, Ishida Y et al (2015) Potential advantages of robotic radical gastrectomy for gastric adenocarcinoma in comparison with conventional laparoscopic approach: a single institutional retrospective comparative cohort study. Surg Endosc 29(3):673–685. https://doi.org/10.1007/s00464-014-3718-0

Uyama I, Suda K, Satoh S (2013) Laparoscopic surgery for advanced gastric cancer: current status and future perspectives. J Gastric Cancer 13(1):19–25. https://doi.org/10.5230/jgc.2013.13.1.19

Article   PubMed   PubMed Central   Google Scholar  

Kitano S, Iso Y, Moriyama M et al (1994) Laparoscopy-assisted Billroth I gastrectomy. Surg Laparosc Endosc 4(2):146–148

CAS   PubMed   Google Scholar  

Furukawa T, Wakabayashi G, Ozawa S et al (2000) Surgery using master-slave manipulators and telementoring. Nihon Geka Gakkai Zasshi 101(3):293–298

Liu F, Huang C, Xu Z et al (2020) Morbidity and mortality of laparoscopic vs open total gastrectomy for clinical stage I gastric cancer: the CLASS02 multicenter randomized clinical trial. JAMA Oncol 6(10):1590–1597. https://doi.org/10.1001/jamaoncol.2020.3152

Hyung WJ, Yang HK, Park YK et al (2020) Long-term outcomes of laparoscopic distal gastrectomy for locally advanced gastric cancer: The KLASS-02-RCT randomized clinical trial. J Clin Oncol 38(28):3304–3313. https://doi.org/10.1200/JCO.20.01210

Hu Y, Huang C, Sun Y et al (2016) Morbidity and mortality of laparoscopic versus open d2 distal gastrectomy for advanced gastric cancer: a randomized controlled trial. J Clin Oncol 34(12):1350–1357. https://doi.org/10.1200/JCO.2015.63.7215

Hashizume M, Sugimachi K (2003) Robot-assisted gastric surgery. Surg Clin North Am 83(6):1429–1444. https://doi.org/10.1016/S0039-6109(03)00158-0

Lee J, Kim YM, Woo Y et al (2015) Robotic distal subtotal gastrectomy with D2 lymphadenectomy for gastric cancer patients with high body mass index: comparison with conventional laparoscopic distal subtotal gastrectomy with D2 lymphadenectomy. Surg Endosc 29(11):3251–3260. https://doi.org/10.1007/s00464-015-4069-1

Song JH, Son T, Lee S et al (2020) D2 lymph node dissections during reduced-port robotic distal subtotal gastrectomy and conventional laparoscopic surgery performed by a single surgeon in a high-volume center: a propensity score-matched analysis. J Gastric Cancer 20(4):431–441. https://doi.org/10.5230/jgc.2020.20.e36

Li ZY, Zhou YB, Li TY et al (2023) Robotic Gastrectomy Versus Laparoscopic Gastrectomy for Gastric Cancer: a multicenter cohort study of 5402 patients in china. Ann Surg 277(1):e87–e95. https://doi.org/10.1097/SLA.0000000000005046

Ye SP, Shi J, Liu DN et al (2019) Robotic-assisted versus conventional laparoscopic-assisted total gastrectomy with D2 lymphadenectomy for advanced gastric cancer: short-term outcomes at a mono-institution. BMC Surg 19(1):86. https://doi.org/10.1186/s12893-019-0549-x

Wang WJ, Li R, Guo CA et al (2019) Systematic assessment of complications after robotic-assisted total versus distal gastrectomy for advanced gastric cancer: a retrospective propensity score-matched study using Clavien-Dindo classification. Int J Surg 71:140–148. https://doi.org/10.1016/j.ijsu.2019.09.029

Page MJ, McKenzie JE, Bossuyt PM et al (2021) The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 372:n71. https://doi.org/10.1136/bmj.n71

Guyatt GH, Oxman AD, Vist GE et al (2008) GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 336(7650):924–926. https://doi.org/10.1136/bmj.39489.470347.AD

Ebihara Y, Kurashima Y, Murakami S et al (2022) Short-term outcomes of robotic distal gastrectomy with the “preemptive retropancreatic approach”: a propensity score matching analysis. J Robot Surg 16(4):825–831. https://doi.org/10.1007/s11701-021-01306-4

Li Z, Li J, Li B et al (2018) Robotic versus laparoscopic gastrectomy with D2 lymph node dissection for advanced gastric cancer: a propensity score-matched analysis. Cancer Manag Res 10:705–714. https://doi.org/10.2147/CMAR.S161007

Zhengyan L, Yong-liang Z, Feng Q et al (2021) Morbidity and short-term surgical outcomes of robotic versus laparoscopic distal gastrectomy for gastric cancer: a large cohort study. Surg Endosc 35(7):3572–3583. https://doi.org/10.1007/s00464-020-07820-0

Kitazono M, Fujita M, Uchiyama S et al (2024) Robotic vs. laparoscopic distal gastrectomy for gastric cancer: a propensity score-matched retrospective comparative study at a single institution. Asian J Surg. https://doi.org/10.1016/j.asjsur.2024.03.086

Gao G, Liao H, Jiang Q et al (2022) Surgical and oncological outcomes of robotic- versus laparoscopic-assisted distal gastrectomy with D2 lymphadenectomy for advanced gastric cancer: a propensity score-matched analysis of 1164 patients. World J Surg Oncol 20(1):315. https://doi.org/10.1186/s12957-022-02778-w

Hong SS, Son SY, Shin HJ et al (2016) Can robotic gastrectomy surpass laparoscopic gastrectomy by acquiring long-term experience? a propensity score analysis of a 7-year experience at a single institution. J Gastric Cancer 16(4):240–246. https://doi.org/10.5230/jgc.2016.16.4.240

Huang W, Liu S, Chen J (2022) Surgical and short-term outcomes in robotic and laparoscopic distal gastrectomy for gastric cancer with enhanced recovery after surgery protocol: a propensity score matching analysis. Front Surg 9:944395. https://doi.org/10.3389/fsurg.2022.944395

Ye SP, Shi J, Liu DN et al (2020) Robotic- versus laparoscopic-assisted distal gastrectomy with D2 lymphadenectomy for advanced gastric cancer based on propensity score matching: short-term outcomes at a high-capacity center. Sci Rep 10(1):6502. https://doi.org/10.1038/s41598-020-63616-1

Article   CAS   PubMed   PubMed Central   Google Scholar  

Tian Y, Guo H, Hu Y et al (2023) Safety and efficacy of robotic-assisted versus laparoscopic distal gastrectomy after neoadjuvant chemotherapy for advanced gastric cancer. Surg Endosc 37(9):6761–6770. https://doi.org/10.1007/s00464-023-10122-w

Roh CK, Choi S, Seo WJ et al (2020) Comparison of surgical outcomes between integrated robotic and conventional laparoscopic surgery for distal gastrectomy: a propensity score matching analysis. Sci Rep 10(1):485. https://doi.org/10.1038/s41598-020-57413-z

Isobe T, Murakami N, Minami T et al (2021) Robotic versus laparoscopic distal gastrectomy in patients with gastric cancer: a propensity score-matched analysis. BMC Surg 21(1):203. https://doi.org/10.1186/s12893-021-01212-4

Chen K, Mou YP, Xu XW et al (2015) Comparison of short-term surgical outcomes between totally laparoscopic and laparoscopic-assisted distal gastrectomy for gastric cancer: a 10-y single-center experience with meta-analysis. J Surg Res 194(2):367–374. https://doi.org/10.1016/j.jss.2014.10.020

Kim MS, Kim WJ, Hyung WJ et al (2021) Comprehensive learning curve of robotic surgery: discovery from a multicenter prospective trial of robotic gastrectomy. Ann Surg 273(5):949–956. https://doi.org/10.1097/SLA.0000000000003583

Woo Y, Hyung WJ, Pak KH et al (2011) Robotic gastrectomy as an oncologically sound alternative to laparoscopic resections for the treatment of early-stage gastric cancers. Arch Surg 146(9):1086–1092. https://doi.org/10.1001/archsurg.2011.114

Shibasaki S, Suda K, Obama K et al (2020) Should robotic gastrectomy become a standard surgical treatment option for gastric cancer? Surg Today 50(9):955–965. https://doi.org/10.1007/s00595-019-01875-w

Xu Y, Li Z, Pan G et al (2021) Anatomical findings and short-term efficacy of fascial anatomy-guided infrapyloric lymphadenectomy in laparoscopic radical gastrectomy for gastric cancer. Surg Laparosc Endosc Percutan Tech 31(4):434–438. https://doi.org/10.1097/SLE.0000000000000886

Kim YW, Reim D, Park JY et al (2016) Role of robot-assisted distal gastrectomy compared to laparoscopy-assisted distal gastrectomy in suprapancreatic nodal dissection for gastric cancer. Surg Endosc 30(4):1547–1552. https://doi.org/10.1007/s00464-015-4372-x

Coburn NG (2009) Lymph nodes and gastric cancer. J Surg Oncol 99(4):199–206. https://doi.org/10.1002/jso.21224

Yu J, Huang C, Sun Y et al (2019) Effect of Laparoscopic vs open distal gastrectomy on 3-year disease-free survival in patients with locally advanced gastric cancer: the CLASS-01 randomized clinical trial. JAMA 321(20):1983–1992. https://doi.org/10.1001/jama.2019.5359

Obama K, Kim YM, Kang DR et al (2018) Long-term oncologic outcomes of robotic gastrectomy for gastric cancer compared with laparoscopic gastrectomy. Gastric Cancer 21(2):285–295. https://doi.org/10.1007/s10120-017-0740-7

Ma J, Li X, Zhao S et al (2020) Robotic versus laparoscopic gastrectomy for gastric cancer: a systematic review and meta-analysis. World J Surg Oncol 18(1):306. https://doi.org/10.1186/s12957-020-02080-7

Bobo Z, Xin W, Jiang L et al (2019) Robotic gastrectomy versus laparoscopic gastrectomy for gastric cancer: meta-analysis and trial sequential analysis of prospective observational studies. Surg Endosc 33(4):1033–1048. https://doi.org/10.1007/s00464-018-06648-z

Uyama I, Suda K, Nakauchi M et al (2019) Clinical advantages of robotic gastrectomy for clinical stage I/II gastric cancer: a multi-institutional prospective single-arm study. Gastric Cancer 22(2):377–385. https://doi.org/10.1007/s10120-018-00906-8

Sun T, Wang Y, Liu Y et al (2022) Perioperative outcomes of robotic versus laparoscopic distal gastrectomy for gastric cancer: a meta-analysis of propensity score-matched studies and randomized controlled trials. BMC Surg 22(1):427. https://doi.org/10.1186/s12893-022-01881-9

Hiki N, Shimizu N, Yamaguchi H et al (2006) Manipulation of the small intestine as a cause of the increased inflammatory response after open compared with laparoscopic surgery. Br J Surg 93(2):195–204. https://doi.org/10.1002/bjs.5224

Article   CAS   PubMed   Google Scholar  

Kim MC, Heo GU, Jung GJ (2010) Robotic gastrectomy for gastric cancer: surgical techniques and clinical merits. Surg Endosc 24(3):610–615. https://doi.org/10.1007/s00464-009-0618-9

Download references

Acknowledgements

Author information

Authors and affiliations

Department of Gastrointestinal Surgery, Affiliated Hospital of North Sichuan Medical College, Nanchong, China

Wei Li & Shou-Jiang Wei

Contributions

All authors contributed to the conception and design of the study. LW and WSJ collected and analyzed the data. LW wrote the first draft, and WSJ critically revised it for important intellectual content. All authors commented on earlier drafts and approved the final manuscript.

Corresponding author

Correspondence to Shou-Jiang Wei.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval and consent to participate

Not applicable.

Consent for publication

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 11 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Li, W., Wei, SJ. Perioperative outcomes of robot-assisted versus laparoscopic distal gastrectomy for gastric cancer: a systematic review and meta-analysis of propensity score matching studies. J Robotic Surg 18, 333 (2024). https://doi.org/10.1007/s11701-024-02038-x

Received: 06 June 2024

Accepted: 30 June 2024

Published: 04 September 2024

DOI: https://doi.org/10.1007/s11701-024-02038-x

  • Distal gastrectomy
  • Robot-assisted
  • Laparoscopic
  • Meta-analysis

Published: 03 September 2024

Duration of antibiotic therapy for multidrug resistant Pseudomonas aeruginosa pneumonia: is shorter truly better?

Clover N. Truong¹,² (ORCID: orcid.org/0000-0002-8827-3016), Nafeesa Chin-Beckford¹, Ana Vega¹, Kailynn DeRonde¹, Julio Simon¹, Lilian M. Abbo³,⁴, Rossana Rosa⁴ & Christine A. Vu¹

BMC Infectious Diseases volume 24, Article number: 911 (2024)

The 2016 IDSA guideline recommends a treatment duration of at least 7 days for hospital-acquired pneumonia (HAP) and ventilator-associated pneumonia (VAP). Limited literature has shown higher recurrence rates with short-course therapy for non-glucose-fermenting gram-negative bacilli, raising concern about the optimal treatment duration for these pathogens. We therefore compared outcomes in patients receiving shorter (≤ 8 days) versus longer (> 8 days) therapy for multidrug resistant (MDR) Pseudomonas pneumonia.

A single-center, retrospective cohort study evaluated adult patients with MDR Pseudomonas aeruginosa in respiratory culture between 2017 and 2020 who received an antimicrobial regimen active against the isolate for a minimum of 6 consecutive days. Patients were excluded if they were inmates or had polymicrobial pneumonia, community-acquired pneumonia, or infections requiring prolonged antibiotic therapy.

Of 427 patients with MDR P. aeruginosa respiratory isolates, 85 were included. Baseline characteristics were similar between groups, with a median age of 65.5 years and a median APACHE II score of 20. Roughly 75% had ventilator-associated pneumonia. Clinical success did not differ significantly between patients treated for ≤ 8 days and those treated for more than 8 days (80% vs. 65.5%, p = 0.16). In-hospital mortality at 30 and 90 days, 30-day relapse, and other secondary outcomes also did not differ significantly between treatment groups.

Conclusions

Prolonging treatment duration beyond 8 days did not improve patient outcomes for MDR P. aeruginosa HAP/VAP.

Nosocomial pneumonia is one of the most common hospital-acquired infections [1]. Non-fermenting gram-negative bacterial pathogens, particularly Pseudomonas aeruginosa, are common causes of nosocomial pneumonia. P. aeruginosa harbors several mechanisms of resistance to standard β-lactams, so infections caused by this pathogen are often difficult to treat.

The Centers for Disease Control and Prevention recognizes multidrug resistant (MDR) P. aeruginosa as a serious public health threat because of its high burden of mortality and health-care expenditure [2]. The optimal treatment duration for P. aeruginosa HAP/VAP remains a matter of debate. The 2016 Infectious Diseases Society of America (IDSA) HAP/VAP guidelines recommend treating nosocomial pneumonia for a minimum of 7 days based on clinical improvement [3], but note that recurrence may be more frequent when VAP caused by non-glucose-fermenting gram-negative bacilli is treated with a short course of antibiotics (7–8 days) [4, 5]. The studies supporting the IDSA recommendation did not specifically examine outcomes in patients with pneumonia caused by MDR P. aeruginosa and often excluded immunocompromised patients, in whom relapsed pneumonia might lead to higher mortality beyond 30 days. For these difficult-to-treat infections in vulnerable populations, clinicians may be inclined to treat for longer; however, this may have negative consequences, because each additional day of exposure to any anti-pseudomonal antibiotic is associated with an increased risk of developing new resistance [6, 7]. The purpose of this study was therefore to compare outcomes in patients receiving shorter (≤ 8 days) versus longer (> 8 days) regimens for MDR P. aeruginosa hospital-acquired/ventilator-associated pneumonia, including patients at higher risk of death and/or with immunocompromising conditions.

Materials and methods

Study design and patient population

This single-center retrospective cohort study was conducted at Jackson Memorial Hospital in Miami, Florida. The microbiology database was queried to identify all patients with respiratory cultures yielding MDR P. aeruginosa from January 2017 through December 2020. MDR P. aeruginosa was defined as an isolate non-susceptible (intermediate or resistant) to one or more drugs in at least three of the following categories: extended-spectrum (ES) cephalosporins (i.e., cefepime, ceftazidime), ES penicillin with beta-lactamase inhibitor (i.e., piperacillin/tazobactam), fluoroquinolones, aminoglycosides, and carbapenems. All adult patients started on a systemic antimicrobial regimen active (susceptible in vitro) against MDR P. aeruginosa for a minimum of 6 days were eligible for inclusion. Patients were excluded if they were inmates; had community-acquired pneumonia, polymicrobial pneumonia, empyema, lung abscess, or other pulmonary complications secondary to pneumonia; had a non-P. aeruginosa infection; or required prolonged antibiotic therapy (> 21 days) for a different indication. This study was approved by the Institutional Review Board of University of Miami-Jackson Health (IRB# 20210568), and a waiver of informed consent was granted.
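The MDR definition and the eligibility rules in this paragraph amount to a set of boolean checks, and writing them that way is one route to explicit, reproducible inclusion and exclusion criteria. The Python sketch below encodes them under stated assumptions: susceptibility results arrive as a mapping from antimicrobial category to per-drug calls ("S", "I", "R"); "adult" is taken to mean age 18 or older; and community-acquired pneumonia, pulmonary complications, and the other exclusions are pre-coded flags. The `Patient` fields, category keys, and example isolate (including drugs not named in the study) are hypothetical, not the study's data dictionary.

```python
from dataclasses import dataclass

# Five antimicrobial categories from the study's MDR definition (keys are hypothetical labels).
MDR_CATEGORIES = (
    "es_cephalosporins",
    "es_penicillin_bli",
    "fluoroquinolones",
    "aminoglycosides",
    "carbapenems",
)


def is_mdr(susceptibility: dict[str, dict[str, str]]) -> bool:
    """True if the isolate is non-susceptible (I or R) to at least one drug
    in at least three of the five categories."""
    nonsusceptible = sum(
        1
        for category in MDR_CATEGORIES
        if any(call in ("I", "R") for call in susceptibility.get(category, {}).values())
    )
    return nonsusceptible >= 3


@dataclass
class Patient:
    age: int
    active_therapy_days: int  # days on a regimen active in vitro against the isolate
    inmate: bool = False
    community_acquired: bool = False
    polymicrobial: bool = False
    pulmonary_complication: bool = False  # empyema, lung abscess, etc.
    non_pa_infection: bool = False
    other_indication_therapy_days: int = 0  # antibiotics required for another indication


def is_eligible(p: Patient) -> bool:
    """Apply the inclusion criteria, then the exclusion criteria."""
    included = p.age >= 18 and p.active_therapy_days >= 6  # "adult" assumed to mean >= 18
    excluded = (
        p.inmate
        or p.community_acquired
        or p.polymicrobial
        or p.pulmonary_complication
        or p.non_pa_infection
        or p.other_indication_therapy_days > 21
    )
    return included and not excluded


# Hypothetical isolate: non-susceptible in 3 of 5 categories, so it meets the MDR definition.
isolate = {
    "es_cephalosporins": {"cefepime": "R", "ceftazidime": "I"},
    "es_penicillin_bli": {"piperacillin/tazobactam": "R"},
    "fluoroquinolones": {"levofloxacin": "S"},
    "aminoglycosides": {"tobramycin": "R"},
    "carbapenems": {"meropenem": "S"},
}
print(is_mdr(isolate))  # True
print(is_eligible(Patient(age=65, active_therapy_days=9)))  # True
print(is_eligible(Patient(age=70, active_therapy_days=10, polymicrobial=True)))  # False
```

Keeping `included` and `excluded` separate mirrors how the paragraph states the criteria; a real screening log would also record which exclusion applied, as the Results do when they tally excluded patients by reason.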

Inpatient encounter notes and laboratory records of all eligible patients were reviewed retrospectively. Extracted information included patient demographics, pre-existing comorbidities, Charlson comorbidity index, quick Sepsis-related Organ Failure Assessment (qSOFA) score, Acute Physiology and Chronic Health Evaluation II (APACHE II) score, duration of mechanical ventilation before and after the index culture, ICU and hospital length of stay after the index culture, vasopressor use within 24 h of the index culture, and pneumonia clinical course. The patient-specific antibiotic regimen (drug, dose, route of administration, total duration) was also recorded.

Study outcomes and definitions

The primary outcome was clinical success at end of therapy, defined as resolution of signs and symptoms of infection with no requirement for additional antibacterial treatment for the same indication. Secondary outcomes included (1) all-cause in-hospital mortality within 30 and 90 days; (2) incidence of relapsed pneumonia within 30 days of the index culture, defined as reappearance of signs and symptoms of pneumonia with re-isolation of P. aeruginosa in respiratory culture necessitating antibiotic treatment; (3) ICU and hospital length of stay after the index culture; (4) mechanical ventilation-free days after the index culture; and (5) 30-day readmission rate for pneumonia due to Pseudomonas aeruginosa. All outcomes were assessed until discharge, death, or loss to follow-up.

Statistical methods

Descriptive statistics were reported as means and standard deviations for normally distributed continuous data, medians and interquartile ranges for non-normally distributed continuous or ordinal data, and percentages for event rates and nominal data. Student’s t-test was used for parametric continuous variables; the chi-square test or Fisher's exact test was used for categorical variables, and the Mann-Whitney U test for non-parametric continuous data, as appropriate. Odds ratios for clinical success, 30-day mortality, and 90-day mortality were estimated using logistic regression and adjusted for pre-defined relevant exposure variables (age, intensive care unit stay, vasopressors, mechanical ventilation, qSOFA score, and receipt of combination therapy). Statistical analyses were performed with Stata version 14 (StataCorp, College Station, TX).
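For a single binary predictor, the maximum-likelihood odds ratio from logistic regression equals the cross-product ratio ad/bc of the 2×2 table, and its Wald interval uses the Woolf standard error of log(OR). The sketch below computes that unadjusted quantity from counts; it does not reproduce the covariate-adjusted estimates, which require a multivariable model such as the one the authors fitted in Stata. The function name is hypothetical, and the 30-day mortality OR it prints is derived here from the reported counts for illustration, not reported by the authors.

```python
import math


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.959964):
    """Unadjusted odds ratio and Wald CI for the 2x2 table [[a, b], [c, d]].

    Row 1 (a, b): events and non-events in the comparison group.
    Row 2 (c, d): events and non-events in the reference group.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the Wald interval undefined")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# 30-day mortality from the results: > 8 days 17/55 died, <= 8 days 8/30 died.
or_, lo, hi = odds_ratio_wald(17, 55 - 17, 8, 30 - 8)
print(f"unadjusted OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The unadjusted OR here is about 1.23, with an interval that spans 1, consistent with the non-significant mortality difference in the results.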

Patient characteristics

From January 2017 through December 2020, 427 MDR P. aeruginosa respiratory isolates were identified at Jackson Memorial Hospital. Of these, 85 unique patients met the inclusion criteria and 342 were excluded because of no active treatment (n = 148), polymicrobial pneumonia (n = 132), empyema or concomitant infections requiring prolonged antibiotics (n = 19), or respiratory cultures obtained within 48 h of admission (n = 43). Patients were predominantly male, with a median age of 63–65 years. Thirty patients (35.3%) received ≤ 8 days of antibiotics and 55 patients (64.7%) received more than 8 days. Baseline characteristics were similar between groups and are summarized in Table 1. The most common indication in both groups was ventilator-associated pneumonia. Median duration of mechanical ventilation before the index culture was numerically longer in the ≤ 8-day group (24.6 days) than in the > 8-day group (13.6 days). Approximately 15% of patients had a history of solid organ transplant. A higher proportion of patients in the > 8-day group were in the ICU (80% vs. 63.3% in the ≤ 8-day group). The median APACHE II score was approximately 20 and did not differ between treatment groups.
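The patient-flow figures above can be checked with a short tally. This sketch uses only the counts reported in the text; the dictionary labels are descriptive placeholders, and it treats each excluded isolate as one patient, as the text's arithmetic (427 − 342 = 85) implies.

```python
# Counts as reported in the text; the dictionary keys are descriptive labels only.
identified_isolates = 427
exclusions = {
    "no active treatment": 148,
    "polymicrobial pneumonia": 132,
    "empyema or concomitant infection needing prolonged antibiotics": 19,
    "respiratory culture within 48 h of admission": 43,
}

excluded = sum(exclusions.values())
included = identified_isolates - excluded

# Treatment-duration groups among included patients.
short_course, long_course = 30, 55
assert short_course + long_course == included  # groups partition the cohort

for label, n in (("<= 8 days", short_course), ("> 8 days", long_course)):
    print(f"{label}: {n}/{included} = {100 * n / included:.1f}%")
```

The exclusions sum to 342, leaving 85 included patients, and the group percentages reproduce the reported 35.3% and 64.7%.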

Primary and secondary outcomes

At the end of treatment, there was no significant difference in clinical success between patients treated for more than 8 days and those treated for ≤ 8 days (OR 0.47; 95% CI 0.16–1.36; p = 0.16). The result remained non-significant after adjustment for age, ICU admission, mechanical ventilation, vasopressors, qSOFA score, and receipt of combination therapy (p = 0.34). Similarly, 30-day in-hospital mortality did not differ between treatment groups (≤ 8 days: 8/30 [26.7%] vs. more than 8 days: 17/55 [30.9%]; p = 0.68). Patients given a shorter course (≤ 8 days) did not have a higher relapse rate within 30 days of the index culture than those treated for longer (10% vs. 18.2%; p = 0.32). Ninety-day in-hospital mortality also did not differ between treatment groups (p = 0.89). As reported in Table 2, none of the other secondary outcomes (mechanical ventilation-free days, length of ICU stay, and 30-day readmission for P. aeruginosa pneumonia) differed significantly between patients treated for ≤ 8 days and those treated for more than 8 days.
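The 30-day mortality and relapse comparisons can be checked with an uncorrected Pearson chi-square test on 2×2 tables. In this sketch the relapse counts (3/30 and 10/55) are back-calculated from the reported percentages, which is an assumption. Agreement with the reported p-values suggests, but does not confirm, that this is the test the authors used; because one expected count in the relapse table is below 5, Fisher's exact test would also have been defensible there.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int):
    """Uncorrected Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p). With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: <= 8-day group, > 8-day group; columns: event, no event.
_, mortality_p = pearson_chi2_2x2(8, 22, 17, 38)  # 30-day in-hospital mortality
_, relapse_p = pearson_chi2_2x2(3, 27, 10, 45)    # 30-day relapse (counts assumed)
print(f"30-day mortality p = {mortality_p:.2f}")  # prints 0.68
print(f"30-day relapse   p = {relapse_p:.2f}")    # prints 0.32
```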

Antibiotic regimen

Because antibiotic regimens were chosen at the discretion of the treating physicians based on susceptibility reports, various antibiotics were used during the study period (Table 3).

Discussion

Multidrug-resistant P. aeruginosa continues to pose a threat to hospitalized patients, especially those who are immunocompromised. To our knowledge, this is the first study to assess outcomes of shorter versus longer antibiotic courses specifically in patients with HAP/VAP caused by MDR Pseudomonas aeruginosa. We found that longer antibiotic courses (> 8 days) did not result in significantly different outcomes compared with shorter therapy. This contrasts with a recent randomized controlled trial by Bouglé et al., in which non-inferiority of a short course (8 days) compared with a long course (15 days) for P. aeruginosa VAP was not demonstrated because of a significantly higher recurrence rate in the short-course group [8]; however, that trial was limited by a lack of statistical power. Clinical success at end of therapy in our patients treated for ≤ 8 days (80%) was very similar to that in the REPROVE trial, in which 79.5% of patients with P. aeruginosa VAP were successfully treated with ceftazidime-avibactam for 7–14 days [9]. Furthermore, 30-day all-cause mortality in our ≤ 8-day group (26.7%) was similar to that of patients treated with 14 days of imipenem-relebactam for P. aeruginosa HAP/VAP in the RESTORE-IMI 2 trial (33.3%) [10]. Lastly, in the ASPECT-NP trial, in which 50 of 511 patients (9.78%) had MDR Pseudomonas aeruginosa, the median treatment duration was 12 days (range 0–14), clinical cure at the test-of-cure visit (7–14 days after end of therapy) was 60%, and 28-day all-cause mortality was 16.7% with ceftolozane-tazobactam vs. 26.3% with meropenem [11]. A major strength of our study is that the population consisted mostly of patients at increased risk of adverse treatment outcomes and death: the majority of our cohort were in the ICU at the time of infection (> 60%), the median APACHE II score was 20, and a subset of immunocompromised patients was included.
Despite the high acuity of our patient population and the multidrug-resistant nature of the infections, we did not find worse outcomes with a shorter antibiotic course. This result favors treating MDR P. aeruginosa HAP/VAP for ≤ 8 days rather than for a longer duration, to help prevent the development of resistance and adverse drug events.

Many clinicians consider combination therapy standard practice for severe pseudomonal infections because of in vitro antibiotic synergy and the potential to prevent emergence of resistance during therapy. In our cohort, the choice of antibiotics for MDR P. aeruginosa HAP/VAP was dictated by susceptibility patterns, and most patients received monotherapy. We did not see any trend favoring combination therapy over monotherapy. This is consistent with the meta-analysis of 7 randomized trials conducted by the IDSA expert panel, which found that combination therapy offered no mortality benefit over monotherapy (RR 0.94; 95% CI 0.76–1.16) [3]. In a retrospective cohort study of 183 episodes of VAP caused by Pseudomonas aeruginosa, Garnacho-Montero et al. found that inappropriate empiric therapy was associated with increased mortality. After exclusion of patients who received an inappropriate empiric regimen, mortality did not differ between monotherapy and combination therapy (23.1% vs. 33.2%; adjusted HR 0.9; 95% CI 0.5–1.63) [12]. Together, this observational study and the IDSA panel's meta-analysis suggest that once antibiotic susceptibilities are known, combination therapy is not necessary for P. aeruginosa HAP/VAP.

Limitations of the present study should be noted. Our sample size was small, so the study may have been underpowered to detect differences in outcomes between treatment groups. However, outcomes including clinical success at end of therapy and in-hospital mortality were similar to those reported in previous trials, with no trend favoring a longer course of treatment. Additionally, because this was a retrospective study, the clinical diagnosis of pneumonia depended largely on provider documentation, which did not always detail the rationale for the diagnosis; it was therefore difficult to distinguish pneumonia from possible colonization retrospectively. Finally, we were unable to capture deaths or readmissions outside our hospital, so the incidence of these events may be underestimated.

Conclusions

MDR P. aeruginosa remains a significant pathogen in nosocomial pneumonia and is associated with high mortality. In this study, treating MDR P. aeruginosa HAP/VAP beyond 8 days did not result in better clinical success, lower mortality, or a lower incidence of relapse. Therefore, a shorter antibiotic course (≤ 8 days) can be considered for MDR P. aeruginosa HAP/VAP, including in immunocompromised patients. Further studies are needed to validate these initial findings.

Data availability

The datasets used in this study are available from the corresponding author on reasonable request.

Abbreviations

APACHE II: Acute Physiology and Chronic Health Evaluation II (severity-of-disease classification system using initial values of 12 physiologic measurements, age, and previous health status)

CCI: Charlson Comorbidity Index (a weighted score accounting for the number and severity of comorbid conditions)

HAP: Hospital-acquired pneumonia

IDSA: Infectious Diseases Society of America

MDR: Multidrug resistant

qSOFA: Quick Sepsis-related Organ Failure Assessment (bedside clinical tool to identify high-risk patients with suspected infection)

VAP: Ventilator-associated pneumonia

References

1. Magill SS, Edwards JR, Bamberg W, et al. Emerging Infections Program Healthcare-Associated Infections and Antimicrobial Use Prevalence Survey Team. Multistate point-prevalence survey of health care-associated infections. N Engl J Med. 2014;370(13):1198–208.

2. CDC. Threat Report – Pathogen Page: Multi-drug Resistant Pseudomonas. 2019. Accessed 23 May 2021.

3. Kalil AC, Metersky ML, Klompas M, et al. Management of adults with hospital-acquired and ventilator-associated pneumonia: 2016 clinical practice guidelines by the Infectious Diseases Society of America and the American Thoracic Society. Clin Infect Dis. 2016;63(5):e61–111.

4. Chastre J, Wolff M, Fagon JY, et al. Comparison of 8 vs 15 days of antibiotic therapy for ventilator-associated pneumonia in adults: a randomized trial. JAMA. 2003;290:2588–98.

5. Pugh R, Grant C, Cooke RPD, et al. Short-course versus prolonged-course antibiotic therapy for hospital-acquired pneumonia in critically ill adults. Cochrane Database Syst Rev. 2015;(8).

6. Hart DE, Gallagher JC, Puzniak LA, et al. A multicenter evaluation of ceftolozane/tazobactam treatment outcomes in immunocompromised patients with multidrug-resistant Pseudomonas aeruginosa infections. Open Forum Infect Dis. 2021;8(3):ofab089.

7. Teshome BF, Vouri SM, Hampton N, Kollef MH, Micek ST. Duration of exposure to antipseudomonal β-lactam antibiotics in the critically ill and development of new resistance. Pharmacotherapy. 2019;39(3):261–70.

8. Bouglé A, Tuffet S, Federici L, Leone M, Monsel A, Dessalle T, Amour J, Dahyot-Fizelier C, Barbier F, Luyt CE, Langeron O, Cholley B, Pottecher J, Hissem T, Lefrant JY, Veber B, Legrand M, Demoule A, Kalfon P, Constantin JM, Rousseau A, Simon T, Foucrier A; iDIAPASON Trial Investigators. Comparison of 8 versus 15 days of antibiotic therapy for Pseudomonas aeruginosa ventilator-associated pneumonia in adults: a randomized, controlled, open-label trial. Intensive Care Med. 2022;48(7):841–9. doi: 10.1007/s00134-022-06690-5. Epub 2022 May 13. Erratum in: Intensive Care Med. 2022. PMID: 35552788.

9. Torres A, Zhong N, Pachl J, et al. Ceftazidime-avibactam versus meropenem in nosocomial pneumonia, including ventilator-associated pneumonia (REPROVE): a randomised, double-blind, phase 3 non-inferiority trial. Lancet Infect Dis. 2018;18(3):285–95.

10. Titov I, Wunderink RG, Roquilly A, et al. A randomized, double-blind, multicenter trial comparing efficacy and safety of imipenem/cilastatin/relebactam versus piperacillin/tazobactam in adults with hospital-acquired or ventilator-associated bacterial pneumonia (RESTORE-IMI 2 study). Clin Infect Dis. 2021;73(11):e4539–48.

11. Kollef MH, Nováček M, Kivistik Ü, et al. Ceftolozane-tazobactam versus meropenem for treatment of nosocomial pneumonia (ASPECT-NP): a randomised, controlled, double-blind, phase 3, non-inferiority trial. Lancet Infect Dis. 2019;19(12):1299–311.

12. Garnacho-Montero J, Sa-Borges M, Sole-Violan J, et al. Optimal management therapy for P. aeruginosa ventilator-associated pneumonia: an observational, multicenter study comparing monotherapy with combination antibiotic therapy. Crit Care Med. 2007;35(8):1888–95.

Acknowledgements

This study did not receive any funding.

Author information

Authors and affiliations

Department of Pharmacy Services, Jackson Memorial Hospital, Miami, FL, USA

Clover N. Truong, Nafeesa Chin-Beckford, Ana Vega, Kailynn DeRonde, Julio Simon & Christine A. Vu

Norton Infectious Diseases Institute, Norton Healthcare, 4950 Norton Healthcare Blvd, Suite 303, Louisville, KY, 40241, USA

Clover N. Truong

Department of Medicine, Division of Infectious Disease, University of Miami Miller School of Medicine, Miami, FL, USA

Lilian M. Abbo

Department of Infection Prevention and Control, Jackson Health System, Miami, FL, USA

Lilian M. Abbo & Rossana Rosa

Contributions

All authors contributed to the study design. C.T. and J.S. collected the data. R.R. analyzed the findings. C.T. analyzed and interpreted the findings. C.T. prepared the manuscript with support from N.B., A.V., K.D., J.S., L.A., R.R., and C.V. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Clover N. Truong.

Ethics declarations

Ethics approval and consent to participate

The University of Miami Institutional Review Board (IRB) and Jackson Health System Clinical Research Review Committee (CRRC) approved the study and granted a waiver of informed consent. All methods were carried out in accordance with relevant guidelines and regulations. Collected data were kept confidential at all times.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Truong, C.N., Chin-Beckford, N., Vega, A. et al. Duration of antibiotic therapy for multidrug resistant Pseudomonas aeruginosa pneumonia: is shorter truly better? BMC Infect Dis 24, 911 (2024). https://doi.org/10.1186/s12879-024-09600-w

Received: 14 May 2023

Accepted: 08 July 2024

Published: 03 September 2024

DOI: https://doi.org/10.1186/s12879-024-09600-w

  • Treatment duration
  • Pseudomonas

BMC Infectious Diseases

ISSN: 1471-2334
