systematic vs targeted literature review

The difference between a systematic review and a literature review

Covidence takes a look at the difference between the two

Most of us are familiar with the terms systematic review and literature review. Both review types synthesise evidence and provide summary information. So what are the differences? What does systematic mean? And which approach is best 🤔 ?

‘Systematic’ describes the review’s methods. It means that they are transparent, reproducible and defined before the search gets underway. That’s important because it helps to minimise the bias that would result from cherry-picking studies in a non-systematic way.

This brings us to literature reviews. Literature reviews don’t usually apply the same rigour in their methods. That’s because, unlike systematic reviews, they don’t aim to produce an answer to a clinical question. Literature reviews can provide context or background information for a new piece of research. They can also stand alone as a general guide to what is already known about a particular topic.

Interest in systematic reviews has grown in recent years and the frequency of ‘systematic reviews’ in Google books has overtaken ‘literature reviews’ (with all the usual Ngram Viewer warnings – it searches around 6% of all books, no journals).

Let’s take a look at the two review types in more detail to highlight some key similarities and differences 👀.

🙋🏾‍♂️ What is a systematic review?

Systematic reviews ask a specific question about the effectiveness of a treatment and answer it by summarising evidence that meets a set of pre-specified criteria.

The process starts with a research question and a protocol or research plan. A review team searches for studies to answer the question using a highly sensitive search strategy. The retrieved studies are then screened for eligibility using the inclusion and exclusion criteria (this is done by at least two people working independently). Next, the reviewers extract the relevant data and assess the quality of the included studies. Finally, the review team synthesises the extracted study data and presents the results. The process is shown in figure 2.

The results of a systematic review can be presented in many ways and the choice will depend on factors such as the type of data. Some reviews use meta-analysis to produce a statistical summary of effect estimates. Other reviews use narrative synthesis to present a textual summary.
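To make the idea of a statistical summary of effect estimates a little more concrete, here is a minimal sketch of one common approach, fixed-effect inverse-variance pooling. This is an illustration only, not a description of how Covidence or any particular review performs meta-analysis. The function name `pool_fixed_effect` and the three study values are hypothetical, and the sketch assumes each study reports an effect estimate (for example a log odds ratio) with its standard error.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Pool study effect estimates with inverse-variance (fixed-effect) weights."""
    if not estimates or len(estimates) != len(standard_errors):
        raise ValueError("need one standard error per effect estimate")
    # Each study is weighted by the inverse of its variance, so more
    # precise studies contribute more to the pooled estimate.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Approximate 95% confidence interval under a normal assumption.
    interval = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, interval


# Hypothetical log odds ratios and standard errors from three studies.
log_odds_ratios = [-0.40, -0.25, -0.10]
standard_errors = [0.20, 0.15, 0.30]
pooled, pooled_se, (low, high) = pool_fixed_effect(log_odds_ratios, standard_errors)
print(f"pooled log OR = {pooled:.3f} (95% CI {low:.3f} to {high:.3f})")
```

A fixed-effect model assumes the studies estimate one common effect. When the studies differ substantially, reviewers often prefer a random-effects model or a narrative synthesis instead.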

Covidence accelerates the screening, data extraction, and quality assessment stages of your systematic review. It provides simple workflows and easy collaboration with colleagues around the world.

When is it appropriate to do a systematic review?

If you have a clinical question about the effectiveness of a particular treatment or treatments, you could answer it by conducting a systematic review. Systematic reviews in clinical medicine often follow the PICO framework, which stands for:

👦 Population (or patients)

💊 Intervention

💊 Comparison

📈 Outcome

Here’s a typical example of a systematic review title that uses the PICO framework: Alarms [intervention] versus drug treatments [comparison] for the prevention of nocturnal enuresis [outcome] in children [population]
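If you like to think in code, the PICO framework can be sketched as a small data structure. This is just an illustration: the class name `PicoQuestion` and the `as_title` helper are hypothetical, and the title pattern is only one of many ways a review title could be phrased. The values come from the example title above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PicoQuestion:
    """The four PICO elements of a focused clinical question."""
    population: str
    intervention: str
    comparison: str
    outcome: str

    def as_title(self) -> str:
        # One possible title pattern: intervention versus comparison
        # for outcome in population.
        return (f"{self.intervention} versus {self.comparison} "
                f"for {self.outcome} in {self.population}")


question = PicoQuestion(
    population="children",
    intervention="Alarms",
    comparison="drug treatments",
    outcome="the prevention of nocturnal enuresis",
)
print(question.as_title())
```

Writing the four elements down separately before drafting the title makes it easier to spot a question that is missing a comparison or an outcome.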

Key attributes

Systematic reviews follow prespecified methods
The methods are explicit and replicable
The review team assesses the quality of the evidence and attempts to minimise bias
Results and conclusions are based on the evidence

🙋🏻‍♀️ What is a literature review?

Literature reviews provide an overview of what is known about a particular topic. They evaluate the material, rather than simply restating it, but the methods used to do this are not usually prespecified and they are not described in detail in the review. The search might be comprehensive but it does not aim to be exhaustive. Literature reviews are also referred to as narrative reviews.

Literature reviews use a topical approach and often take the form of a discussion. Precision and replicability are not the focus, rather the author seeks to demonstrate their understanding and perhaps also present their work in the context of what has come before. Often, this sort of synthesis does not attempt to control for the author’s own bias. The results or conclusion of a literature review is likely to be presented using words rather than statistical methods.

When is it appropriate to do a literature review?

We’ve all written some form of literature review: they are a central part of academic research ✍🏾. Literature reviews often form the introduction to a piece of writing, to provide the context. They can also be used to identify gaps in the literature and the need to fill them with new research 📚.

Literature reviews take a thematic approach
They do not specify inclusion or exclusion criteria
They do not answer a clinical question
The conclusions might be influenced by the author’s own views

🙋🏽 Ok, but what is a systematic literature review?

A quick internet search retrieves a cool 200 million hits for ‘systematic literature review’. What strange hybrid is this 🤯🤯 ?

Systematic review methodology has its roots in evidence-based medicine but it quickly gained traction in other areas – the social sciences for example – where researchers recognise the value of being methodical and minimising bias. Systematic review methods are increasingly applied to the more traditional types of review, including literature reviews, hence the proliferation of terms like ‘systematic literature review’ and many more.

Beware of the labels 🚨. The terminology used to describe review types can vary by discipline and changes over time. To really understand how any review was done you will need to examine the methods critically and make your own assessment of the quality and reliability of each synthesis 🤓.

Review methods are evolving constantly as researchers find new ways to meet the challenge of synthesising the evidence. Systematic review methods have influenced many other review types, including the traditional literature review.

Covidence is a web-based tool that saves you time at the screening, selection, data extraction and quality assessment stages of your systematic review. It supports easy collaboration across teams and provides a clear overview of task status.

Laura Mellor. Portsmouth, UK


Systematic Literature Review or Literature Review?

As a researcher, you may be required to conduct a literature review. But what kind of review do you need to complete? Is it a systematic literature review or a standard literature review? In this article, we’ll outline the purpose of a systematic literature review, the difference between literature review and systematic review, and other important aspects of systematic literature reviews.

What is a Systematic Literature Review?

The purpose of systematic literature reviews is simple. Essentially, it is to provide a high-level overview of a particular research question. This question, in and of itself, is highly focused to match the review of the literature related to the topic at hand: for example, a focused question related to medical or clinical outcomes.

The components of a systematic literature review are quite different from those of the standard literature review that most of us are used to (more on this below). And because of the specificity of the research question, a systematic literature review typically involves more than one primary author. There’s more work related to a systematic literature review, so it makes sense to divide the work among two or three (or even more) researchers.

Your systematic literature review will follow very clear and defined protocols that are decided on prior to any review. This involves extensive planning, and a deliberately designed search strategy that is in tune with the specific research question. Every aspect of a systematic literature review, including the research protocols, which databases are used, and dates of each search, must be transparent so that other researchers can be assured that the systematic literature review is comprehensive and focused.

Most systematic literature reviews originated in the world of medical science. Now, they also address any evidence-based research question. In addition to the focus and transparency of these types of reviews, additional aspects of a quality systematic literature review include:

Clear and concise review and summary
Comprehensive coverage of the topic
Accessibility and equality of the research reviewed

Systematic Review vs Literature Review

The difference between literature review and systematic review comes back to the initial research question. Whereas the systematic review is very specific and focused, the standard literature review is much more general. The components of a literature review, for example, are similar to any other research paper. That is, it includes an introduction, description of the methods used, a discussion and conclusion, as well as a reference list or bibliography.

A systematic review, however, includes entirely different components that reflect the specificity of its research question, and the requirement for transparency and inclusion. For instance, the systematic review will include:

Eligibility criteria for included research
A description of the systematic research search strategy
An assessment of the validity of reviewed research
Interpretations of the results of research included in the review

As you can see, unlike a general overview or summary of a topic, a systematic literature review includes much more detail and takes more work to compile than a standard literature review. Indeed, it can take years to conduct and write a systematic literature review. But the information that practitioners and other researchers can glean from a systematic literature review is, by its very nature, exceptionally valuable.

This is not to diminish the value of the standard literature review. The importance of literature reviews in research writing is discussed in this article. It’s just that the two types of research reviews answer different questions, and, therefore, have different purposes and roles in the world of research and evidence-based writing.

Systematic Literature Review vs Meta Analysis

It would be understandable to think that a systematic literature review is similar to a meta-analysis. But whereas a systematic review can include several research studies to answer a specific question, a meta-analysis typically compares different studies to suss out any inconsistencies or discrepancies. For more about this topic, check out the Systematic Review VS Meta-Analysis article.


Systematic Reviews

Types of Literature Reviews

What Makes a Systematic Review Different from Other Types of Reviews?

Reproduced from Grant, M. J. and Booth, A. (2009), A typology of reviews: an analysis of 14 review types and associated methodologies. Health Information & Libraries Journal, 26: 91–108. doi:10.1111/j.1471-1842.2009.00848.x


Label	Description	Search	Appraisal	Synthesis	Analysis
Critical review	Aims to demonstrate writer has extensively researched literature and critically evaluated its quality. Goes beyond mere description to include degree of analysis and conceptual innovation. Typically results in hypothesis or model	Seeks to identify most significant items in the field	No formal quality assessment. Attempts to evaluate according to contribution	Typically narrative, perhaps conceptual or chronological	Significant component: seeks to identify conceptual contribution to embody existing or derive new theory
Literature review	Generic term: published materials that provide examination of recent or current literature. Can cover wide range of subjects at various levels of completeness and comprehensiveness. May include research findings	May or may not include comprehensive searching	May or may not include quality assessment	Typically narrative	Analysis may be chronological, conceptual, thematic, etc.
Mapping review/systematic map	Map out and categorize existing literature from which to commission further reviews and/or primary research by identifying gaps in research literature	Completeness of searching determined by time/scope constraints	No formal quality assessment	May be graphical and tabular	Characterizes quantity and quality of literature, perhaps by study design and other key features. May identify need for primary or secondary research
Meta-analysis	Technique that statistically combines the results of quantitative studies to provide a more precise effect of the results	Aims for exhaustive, comprehensive searching. May use funnel plot to assess completeness	Quality assessment may determine inclusion/exclusion and/or sensitivity analyses	Graphical and tabular with narrative commentary	Numerical analysis of measures of effect assuming absence of heterogeneity
Mixed studies review/mixed methods review	Refers to any combination of methods where one significant component is a literature review (usually systematic). Within a review context it refers to a combination of review approaches for example combining quantitative with qualitative research or outcome with process studies	Requires either very sensitive search to retrieve all studies or separately conceived quantitative and qualitative strategies	Requires either a generic appraisal instrument or separate appraisal processes with corresponding checklists	Typically both components will be presented as narrative and in tables. May also employ graphical means of integrating quantitative and qualitative studies	Analysis may characterise both literatures and look for correlations between characteristics or use gap analysis to identify aspects absent in one literature but missing in the other
Overview	Generic term: summary of the [medical] literature that attempts to survey the literature and describe its characteristics	May or may not include comprehensive searching (depends whether systematic overview or not)	May or may not include quality assessment (depends whether systematic overview or not)	Synthesis depends on whether systematic or not. Typically narrative but may include tabular features	Analysis may be chronological, conceptual, thematic, etc.
Qualitative systematic review/qualitative evidence synthesis	Method for integrating or comparing the findings from qualitative studies. It looks for ‘themes’ or ‘constructs’ that lie in or across individual qualitative studies	May employ selective or purposive sampling	Quality assessment typically used to mediate messages not for inclusion/exclusion	Qualitative, narrative synthesis	Thematic analysis, may include conceptual models
Rapid review	Assessment of what is already known about a policy or practice issue, by using systematic review methods to search and critically appraise existing research	Completeness of searching determined by time constraints	Time-limited formal quality assessment	Typically narrative and tabular	Quantities of literature and overall quality/direction of effect of literature
Scoping review	Preliminary assessment of potential size and scope of available research literature. Aims to identify nature and extent of research evidence (usually including ongoing research)	Completeness of searching determined by time/scope constraints. May include research in progress	No formal quality assessment	Typically tabular with some narrative commentary	Characterizes quantity and quality of literature, perhaps by study design and other key features. Attempts to specify a viable review
State-of-the-art review	Tend to address more current matters in contrast to other combined retrospective and current approaches. May offer new perspectives	Aims for comprehensive searching of current literature	No formal quality assessment	Typically narrative, may have tabular accompaniment	Current state of knowledge and priorities for future investigation and research
Systematic review	Seeks to systematically search for, appraise and synthesize research evidence, often adhering to guidelines on the conduct of a review	Aims for exhaustive, comprehensive searching	Quality assessment may determine inclusion/exclusion	Typically narrative with tabular accompaniment	What is known; recommendations for practice. What remains unknown; uncertainty around findings, recommendations for future research
Systematic search and review	Combines strengths of critical review with a comprehensive search process. Typically addresses broad questions to produce ‘best evidence synthesis’	Aims for exhaustive, comprehensive searching	May or may not include quality assessment	Minimal narrative, tabular summary of studies	What is known; recommendations for practice. Limitations
Systematized review	Attempt to include elements of systematic review process while stopping short of systematic review. Typically conducted as postgraduate student assignment	May or may not include comprehensive searching	May or may not include quality assessment	Typically narrative with tabular accompaniment	What is known; uncertainty around findings; limitations of methodology
Umbrella review	Specifically refers to review compiling evidence from multiple reviews into one accessible and usable document. Focuses on broad condition or problem for which there are competing interventions and highlights reviews that address these interventions and their results	Identification of component reviews, but no search for primary studies	Quality assessment of studies within component reviews and/or of reviews themselves	Graphical and tabular with narrative commentary	What is known; recommendations for practice. What remains unknown; recommendations for future research

Last Updated: Jul 23, 2024 3:40 PM
URL: https://guides.library.ucla.edu/systematicreviews

MSK Library Blog

Systematic Review vs. Literature Review…What’s Best for Your Needs?

We at the MSK Library are often called upon to help our researchers with searches. Whether it’s a literature review or a systematic review depends on the needs of the patron, but what is the difference between these two and when are they needed? Both systematic and literature (or comprehensive) reviews are a gathering of available information on a certain subject. The difference comes in the depth of the research and the reporting of the conclusions. Let’s take a look.

A literature or comprehensive review brings together information on a topic in order to provide an overview of the available literature on a certain subject. Research materials are gathered through searching one or more databases and qualitatively brought together in the review. Literature reviews can be the first step in pursuing a topic for further study, giving an idea of the current state of the science, but they can also be their own publication. Complete our Literature Search form if you would like us to find information for a review or other project you are working on.

Systematic reviews look at a topic more in depth using a scientific method. By looking at not only the available literature, but also theses/dissertations, abstracts/conference proceedings, and other grey literature sources, systematic reviews seek to be all-encompassing in showing results on a topic. To complete a systematic review, a team of researchers selects a clinical question to be answered and specifies eligibility criteria for their resources before synthesizing the information to answer their question. Multiple databases are searched in order to find every possible article on the topic. Not only are the results of the searches presented, but the search strategy, assessments and interpretations of research are also included in this form of review. Here at MSK, we use the PRISMA Statement to provide a helpful structure when working on systematic reviews. Take a look at our Systematic Review LibGuide to learn more about this investigation into the literature.
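Searching multiple databases usually returns the same article more than once, so the combined results need de-duplicating before screening. The sketch below shows one simple way to do that. It is an assumption-laden illustration rather than a description of how MSK or any reference manager does it: the record fields (`source`, `doi`, `title`), the helper names and the sample records are all hypothetical.

```python
import re


def normalise_title(title):
    """Lower-case a title and collapse punctuation and spacing."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first copy of each record, matching on DOI or normalised title."""
    seen_dois, seen_titles = set(), set()
    unique = []
    for record in records:
        doi = (record.get("doi") or "").strip().lower()
        title = normalise_title(record.get("title") or "")
        if (doi and doi in seen_dois) or (title and title in seen_titles):
            continue  # already seen in another database export
        if doi:
            seen_dois.add(doi)
        if title:
            seen_titles.add(title)
        unique.append(record)
    return unique


# Hypothetical exports from three databases (placeholder DOIs).
records = [
    {"source": "database_a", "doi": "10.1000/example.1",
     "title": "Alarm interventions for nocturnal enuresis in children"},
    {"source": "database_b", "doi": "10.1000/EXAMPLE.1",
     "title": "Alarm interventions for nocturnal enuresis in children."},
    {"source": "database_c", "doi": None,
     "title": "Alarm Interventions for Nocturnal Enuresis in Children"},
    {"source": "database_b", "doi": "10.1000/example.2",
     "title": "CPAP for sleep apnea"},
]
unique_records = deduplicate(records)
print(len(records), "records retrieved,", len(unique_records), "after de-duplication")
```

Title matching can wrongly merge two different articles that share a title, so in practice the removed records are worth a manual check, and the counts before and after de-duplication are worth recording for reporting.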

Penn State University Libraries

Know the Difference! Systematic Review vs. Literature Review

It is common to confuse systematic and literature reviews because both are used to provide a summary of the existing literature or research on a specific topic. Despite this common ground, the two types differ significantly. Please review the following chart (and its corresponding poster linked below) for a detailed explanation of each as well as the differences between them.

Systematic vs. Literature Review
	Systematic Review	Literature Review
Definition	High-level overview of primary research on a focused question that identifies, selects, synthesizes, and appraises all high quality research evidence relevant to that question	Qualitatively summarizes evidence on a topic using informal or subjective methods to collect and interpret studies
Goals	Answers a focused clinical question; eliminates bias	Provides summary or overview of topic
Question	Clearly defined and answerable clinical question; recommend using PICO as a guide	Can be a general topic or a specific question
Components	Pre-specified eligibility criteria; systematic search strategy; assessment of the validity of findings; interpretation and presentation of results; reference list	Introduction; methods; discussion; conclusion; reference list
Number of Authors	Three or more	One or more
Timeline	Months to years; average eighteen months	Weeks to months
Requirement	Thorough knowledge of topic; perform searches of all relevant databases; statistical analysis resources (for meta-analysis)	Understanding of topic; perform searches of one or more databases
Value	Connects practicing clinicians to high quality evidence; supports evidence-based practice	Provides summary of literature on the topic

What's in a name? The difference between a Systematic Review and a Literature Review, and why it matters by Lynn Kysh, MLIS, University of Southern California - Norris Medical Library
Last Updated: Mar 1, 2024 11:54 AM
URL: https://guides.libraries.psu.edu/nursing

Systematic vs literature reviews

Systematic review

A systematic review is a summary of the medical literature that uses explicit methods to perform a comprehensive literature search and critical appraisal of individual studies and that uses appropriate statistical techniques to combine these valid studies.

Visit the for more information on how to conduct this type of review.

Question	Focused on a single question
Protocol	A protocol is usually registered or published prior to commencing the review
Background	Provides a summary of the available literature on the topic
Objectives	Clear objectives are identified
Inclusion and exclusion criteria	Criteria stated before the review is conducted
Search strategy	Comprehensive search conducted in a systematic way
Process of selecting articles	Transparent to minimize bias and human error, detailed in the protocol
Process of evaluating articles	Comprehensive evaluation of study quality
Process of extracting relevant information	Usually clear and specific
Results and data synthesis	Clear summaries of studies based on high quality evidence
Discussion	Written by expert/s with a detailed knowledge of the issues
Number of reviewers	At least three to independently evaluate studies and adjudicate any differences

Literature review

A literature review is a critical and in-depth evaluation of previous research. It is a summary and synopsis of a particular area of research, allowing anybody reading the paper to establish why you are pursuing this particular research program. A good literature review expands upon the reasons behind selecting a particular research question.

Question	Not necessarily focused on a single question, but may describe an overview
Protocol	No protocol is included
Background	Provides summary of the available literature on a topic
Objectives	Objective may or may not be identified
Inclusion and exclusion criteria	Criteria may not be specified
Search strategy	Strategy may not be explicitly stated
Process of selecting articles	Not described in a literature review
Process of evaluating articles	Evaluation of study quality may or may not be included
Process of extracting relevant information	Not clear or explicit
Results and data synthesis	Summary based on studies where the quality of the articles may not be specified. May also be influenced by the reviewer's theories, needs and beliefs
Discussion	Written by expert/s with a detailed knowledge of the issues
Number of reviewers	Can be conducted by one reviewer

Systematic literature review

A systematic literature review is designed to review relevant literature in your field through a highly rigorous and 'systematic' process.

Undertaking a systematic literature review means documenting not only the content found in the literature but also the methods used to find it: which search strategies you used, how and where you searched, and what was included in or excluded from your research. It also aims to determine whether any gaps can be found in existing research.

Question	Focused on a single question
Protocol	A protocol may be created
Background	Provides a summary of the available literature on the topic
Objectives	Clear objectives are identified
Inclusion and exclusion criteria	Criteria stated before the review is conducted
Search strategy	Comprehensive search conducted in a systematic way
Process of selecting articles	Usually clear and explicit
Process of evaluating articles	Comprehensive evaluation of study quality
Process of extracting relevant information	Usually clear and specific
Results and data synthesis	Clear summaries of studies based on high quality evidence
Discussion	Written by expert/s with a detailed knowledge of the issues
Number of reviewers	Can be conducted by one reviewer

Last Updated: Aug 14, 2024 10:16 AM
URL: https://libguides.newcastle.edu.au/rstoolkit

Literature Review

Systematic vs Literature

Systematic reviews and literature reviews are commonly confused. The main difference between the two is that systematic reviews answer a focused question whereas literature reviews contextualize a topic.

Kysh, Lynn (2013): Difference between a systematic review and a literature review. Available at: https://figshare.com/articles/Difference_between_a_systematic_review_and_a_literature_review/766364

Another Writing Tip!

Review not just what scholars are saying, but how they are saying it. Some questions to ask:

How are they organizing their ideas?
What methods have they used to study the problem?
What theories have been used to explain, predict, or understand their research problem?
What sources have they cited to support their conclusions?
How have they used non-textual elements [e.g., charts, graphs, figures, etc.] to illustrate key points?

When you begin to write your literature review section, you'll be glad you dug deeper into how the research was designed and constructed because it establishes a means for developing more substantial analysis and interpretation of the research problem.

Hart, Chris. Doing a Literature Review: Releasing the Social Science Research Imagination . Thousand Oaks, CA: Sage Publications, 1998.

Last Updated: Nov 21, 2023 12:49 PM
URL: https://libguides.lamission.edu/c.php?g=1190903

5 differences between a systematic review and other types of literature review

September 26, 2017.

There are many types of reviews of the medical and public health evidence, each with its own benefits and challenges. In this blog post, we detail five key differences between a systematic review and other types of reviews, including narrative and comprehensive reviews.

First, we must define some terms. “Literature review” is a general term that describes a summary of the evidence on a certain topic. Literature reviews can be very simple or highly complex, and they can use a variety of methods for finding, assessing, and presenting evidence. A “systematic review” is a specific type of review that uses rigorous and transparent methods in an effort to summarize all of the available evidence with little to no bias. A good systematic review adheres to the international standards set forth in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 27-item checklist [1]. Reviews that are less rigorous are often called “narrative,” “comprehensive,” or simply “literature reviews.”

So, what are the 5 key differences between a systematic review and other types of review?

1. The goal of the review

The goal of a literature review can be broad and descriptive (example: “Describe the available treatments for sleep apnea”) or it can be to answer a specific question (example: “What is the efficacy of CPAP for people with sleep apnea?”). The goal of a systematic review is to answer a specific and focused question (example: “Which treatment for sleep apnea reduces the apnea-hypopnea index more: CPAP or mandibular advancement device?”). People seeking to make evidence-based decisions look to systematic reviews due to their completeness and reduced risk of bias.

2. Searching for evidence

Where and how one searches for evidence is an important difference. While a literature review may draw on only one database or source, systematic reviews require more comprehensive efforts to locate evidence. Multiple databases are searched, each with a specifically tailored search strategy (usually designed and implemented by a specialist librarian). In addition, systematic reviews often include attempts to find data beyond typical databases. Systematic reviewers might search conference abstracts or the web sites of professional associations or pharmaceutical companies, and they may contact study authors to obtain additional or unpublished data. All of these extra steps reflect an attempt to minimize bias in the summary of the evidence.

3. Assessing search results

In a systematic review, the parameters for inclusion are established at the start of the project and applied consistently to search results. Usually, such parameters take the form of PICOs (population, intervention, comparison, outcomes). Reviewers hold search results against strict criteria based on the PICOs to determine appropriateness for inclusion. Another key component of a systematic review is dual independent review of search results; each search result is reviewed by at least two people independently. In many other literature reviews, there is only a single reviewer. This can result in bias (even if it is unintentional) and missed studies.
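To make dual independent review concrete, here is a minimal Python sketch of how two reviewers' screening decisions might be compared. The record IDs and decisions are invented for illustration, and Cohen's kappa is an agreement statistic assumed here for the example; the text above does not prescribe any particular measure.

```python
from collections import Counter


def screening_conflicts(reviewer_a, reviewer_b):
    """Return the record IDs on which the two reviewers disagree."""
    return sorted(rid for rid in reviewer_a if reviewer_a[rid] != reviewer_b[rid])


def cohens_kappa(reviewer_a, reviewer_b):
    """Chance-corrected agreement between two reviewers (Cohen's kappa)."""
    ids = sorted(reviewer_a)
    n = len(ids)
    # Proportion of records where both reviewers made the same decision.
    observed = sum(reviewer_a[i] == reviewer_b[i] for i in ids) / n
    count_a = Counter(reviewer_a[i] for i in ids)
    count_b = Counter(reviewer_b[i] for i in ids)
    # Agreement expected by chance, from each reviewer's marginal rates.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both reviewers used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical decisions for four search results.
reviewer_a = {"r1": "include", "r2": "exclude", "r3": "include", "r4": "exclude"}
reviewer_b = {"r1": "include", "r2": "exclude", "r3": "exclude", "r4": "exclude"}

conflicts = screening_conflicts(reviewer_a, reviewer_b)
kappa = cohens_kappa(reviewer_a, reviewer_b)
print(conflicts, round(kappa, 2))
```

Records listed as conflicts would then go to discussion or to a third reviewer; how disagreements are resolved is a choice each review team makes in its protocol.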

4. Summary of findings

In a systematic review, an effort is usually made to assess the quality of the evidence, often using risk of bias assessment, at the study level and often across studies. Other literature reviews rarely conduct or report a formal quality assessment of individual studies. Risk of bias assessment is important to a thorough summary of the evidence, since conclusions based on biased results can be incorrect (and dangerous, at worst). Results from a systematic review can sometimes be pooled quantitatively (e.g., in a meta-analysis) to provide numeric estimates of treatment effects, for example.
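As an illustration of quantitative pooling, the sketch below combines study-level effect estimates with a fixed-effect, inverse-variance weighted average. This is one common approach, assumed here for the example; the effect sizes and standard errors are invented, and real meta-analyses usually rely on dedicated statistical software.

```python
import math


def pool_fixed_effect(effects, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate with a 95% interval."""
    # Each study is weighted by the inverse of its variance (1 / SE^2),
    # so more precise studies contribute more to the pooled estimate.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Hypothetical mean differences from two trials (negative = improvement).
pooled, pooled_se, ci = pool_fixed_effect([-0.5, -0.3], [0.1, 0.1])
print(round(pooled, 3), round(pooled_se, 3), tuple(round(x, 3) for x in ci))
```

Because both hypothetical trials have the same standard error, they receive equal weight and the pooled estimate is simply their average, with a narrower interval than either trial alone.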

5. Utility of results

Due to the rigor and transparency applied to a systematic review, it is not surprising that the results are usually of higher quality and at lower risk of bias than results from other types of literature review. Literature reviews can be useful to inform background sections of papers and reports and to give the reader an overview of a topic. Systematic reviews are used by professional associations and government agencies to issue guidelines and recommendations; such important activities are rarely based on a non-systematic review. Clinicians may also rely on high quality systematic reviews to make evidence-based decisions about patient care.

Each type of review has a place in the scientific literature. For narrow, specific research questions, a systematic review can provide a thorough summary and assessment of all of the available evidence. For broader research questions, other types of literature review can summarize the best available evidence using targeted search strategies. Ultimately, the choice of methodology depends on the research question and the goal of the review.

[1] Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group (2009). Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med 6(7): e1000097. doi:10.1371/journal.pmed.1000097.

Literature Review Research

Literature review vs. systematic review.


Resources for Systematic Reviews

NIH Systematic Review Protocols and Protocol Registries Systematic review services and information from the National Institutes of Health.
Purdue University Systematic Reviews LibGuide Purdue University has created this helpful online research guide on systematic reviews. Most content is available publicly but please note that some links are accessible only to Purdue students.

It is common to confuse literature and systematic reviews because both are used to provide a summary of the existing literature or research on a specific topic. Despite this commonality, these two reviews vary significantly. The table below highlights the differences.


	Literature Review	Systematic Review
Definition	Qualitatively summarizes evidence on a topic using informal or subjective methods to collect and interpret studies	High-level overview of primary research on a focused question that identifies, selects, synthesizes, and appraises all high quality research evidence relevant to that question
Goals	Provide summary or overview of topic	Answer a focused clinical question; eliminate bias
Question	Can be a general topic or specific question	Clearly defined and answerable clinical question
Components	Introduction; methods; discussion; conclusion; reference list	Pre-specified eligibility criteria; systematic search strategy; assessment of the validity of findings; interpretation and presentation of results; reference list
Number of authors	One or more	Three or more
Timeline	Weeks to months	Months to years (average 18 months)
Requirements	Understanding of topic; perform searches of one or more databases	Thorough knowledge of topic; perform searches of all relevant databases; statistical analysis resources (for meta-analysis)
Value	Provides summary of literature on a topic	Connects practicing clinicians to high-quality evidence; supports evidence-based practice

Kysh, Lynn (2013). Difference between a systematic review and a literature review. figshare. Poster. https://doi.org/10.6084/m9.figshare.766364.v1

Last Updated: May 6, 2024 4:11 PM
URL: https://tcsedsystem.libguides.com/literature_review

Social Work: Research Overview


Scoping Reviews

Systematic Review

Scoping Review: Explained!
PRISMA 2020 Example
Process and Tools

Joko Gunawan, PhD YouTube Channel: He also has videos on other types of review, which can be useful.

How to Create an Effective PRISMA Flow Diagram | AJE

Scoping Reviews: Tools. A guide from the University of Nebraska-Lincoln with links to guides and tools for scoping reviews.

Systematic Reviews

Systematic Review: Explained!
About Systematic Reviews
General Steps
Time Commitment

Levels of Research Evidence

Systematic reviews are considered the highest form of evidence as they are an accumulation of research on one topic. Cochrane Systematic Reviews are considered the most rigorous systematic reviews being done.

Feature	Narrative Literature Review	Systematic Literature Review
Question	Broad	Narrow
Sources and search	Not specified, potentially biased	Comprehensive sources and search approach explicitly specified
Selection	Not usually specified, potentially biased	Uniformly applied preselected inclusion/exclusion criteria
Evaluation	Variable	Rigorous critical evaluation
Synthesis	Often qualitative, quantitative through meta-analysis*	Often qualitative, quantitative through meta-analysis*

* Meta-analysis is a method of statistically combining the results of multiple studies in order to arrive at a quantitative conclusion about a body of literature and is most often used to assess the clinical effectiveness of healthcare interventions ("Meta-analysis", 2008).

Steps for a Systematic Review

Develop an answerable question
Check for recent systematic reviews
Agree on specific inclusion and exclusion criteria
Develop a system to organize data and notes
Devise reproducible search methods
Launch and track exhaustive search
Organize search results
Reproduce search results
Abstract data into a standardized format
Synthesize data using statistical methods (meta-analysis)
Write about what you found

To learn more, see this presentation.

Timeline for a Cochrane Review

Table reproduced from Cochrane systematic reviews handbook.

Recommended Guidelines

The Cochrane Handbook for Systematic Reviews of Interventions is the official document that describes in detail the process of preparing and maintaining Cochrane systematic reviews on the effects of healthcare interventions.

Welcome to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) website! PRISMA is an evidence-based minimum set of items for reporting in systematic reviews and meta-analyses. PRISMA focuses on the reporting of reviews evaluating randomized trials, but can also be used as a basis for reporting systematic reviews of other types of research, particularly evaluations of interventions.

The JBI Reviewers’ Manual is designed to provide authors with a comprehensive guide to conducting JBI systematic reviews. It describes in detail the process of planning, undertaking and writing up a systematic review of qualitative, quantitative, economic, text and opinion based evidence. It also outlines JBI support mechanisms for those doing review work and opportunities for publication and training. The JBI Reviewers Manual should be used in conjunction with the JBI SUMARI User Guide.

These standards are for systematic reviews of comparative effectiveness research of therapeutic medical or surgical interventions

Green, S., & Higgins, J. P. T. (editors). (2011). Chapter 2: Preparing a Cochrane review. In J. P. T. Higgins, & S. Green (Eds.). Cochrane Handbook for Systematic Reviews of Interventions (Version 5.1.0). Available from http://handbook.cochrane.org

Meta-Analysis. (2008). In W. A. Darity, Jr. (Ed.), International Encyclopedia of the Social Sciences (2nd ed., Vol. 5, pp. 104-105). Detroit: Macmillan Reference USA.

Last Updated: Aug 16, 2024 4:29 PM
URL: https://libguides.uta.edu/c.php?g=1388834

University of Texas Arlington Libraries 702 Planetarium Place · Arlington, TX 76019 · 817-272-3000



Research Skills Blog

What is the difference between a systematic review and a systematic literature review?

By Carol Hollier on 07-Jan-2020 14:23:00


For those not immersed in systematic reviews, understanding the difference between a systematic review and a systematic literature review can be confusing. It helps to realise that a “systematic review” is a clearly defined thing, but ambiguity creeps in around the phrase “systematic literature review” because people can and do use it in a variety of ways.

A systematic review is a research study of research studies. To qualify as a systematic review, a review needs to adhere to standards of transparency and reproducibility. It will use explicit methods to identify, select, appraise, and synthesise empirical results from different but similar studies. The study will be done in stages:

In stage one, the question, which must be answerable, is framed
Stage two is a comprehensive literature search to identify relevant studies
In stage three the identified literature’s quality is scrutinised and decisions made on whether or not to include each article in the review
In stage four the evidence is summarised and, if the review includes a meta-analysis, the data extracted
In the final stage, findings are interpreted [1]

Some reviews also state what degree of confidence can be placed on that answer, using the GRADE scale. By going through these steps, a systematic review provides a broad evidence base on which to make decisions about medical interventions, regulatory policy, safety, or whatever question is analysed. By documenting each step explicitly, the review is not only reproducible, but can be updated as more evidence on the question is generated.

Sometimes when people talk about a “systematic literature review”, they are using the phrase interchangeably with “systematic review”. However, people can also use the phrase systematic literature review to refer to a literature review that is done in a fairly systematic way, but without the full rigor of a systematic review.

For instance, for a systematic review, reviewers would strive to locate relevant unpublished studies in grey literature and possibly by contacting researchers directly. Doing this is important for combatting publication bias, which is the tendency for studies with positive results to be published at a higher rate than studies with null results. It is easy to understand how this well-documented tendency can skew a review’s findings, but someone conducting a systematic literature review in the loose sense of the phrase might, for lack of resource or capacity, forgo that step.

Another difference might be in who is doing the research for the review. A systematic review is generally conducted by a team including an information professional for searches and a statistician for meta-analysis, along with subject experts. Team members independently evaluate the studies being considered for inclusion in the review and compare results, adjudicating any differences of opinion. In contrast, a systematic literature review might be conducted by one person.

Overall, while a systematic review must comply with set standards, you would expect any review called a systematic literature review to strive to be quite comprehensive. A systematic literature review would contrast with what is sometimes called a narrative or journalistic literature review, where the reviewer’s search strategy is not made explicit, and evidence may be cherry-picked to support an argument.

FSTA is a key tool for systematic reviews and systematic literature reviews in the sciences of food and health.

The patents indexed help find results of research not otherwise publicly available because it has been done for commercial purposes.

The FSTA thesaurus will surface results that would be missed with keyword searching alone. Since the thesaurus is designed for the sciences of food and health, it is the most comprehensive for the field.

All indexing and abstracting in FSTA is in English, so you can do your searching in English yet pick up non-English language results, and get those results translated if they meet the criteria for inclusion in a systematic review.

FSTA includes grey literature (conference proceedings) which can be difficult to find, but is important to include in comprehensive searches.

FSTA content has a deep archive. It goes back to 1969 for farm to fork research, and back to the late 1990s for food-related human nutrition literature—systematic reviews (and any literature review) should include not just the latest research but all relevant research on a question.

You can also use FSTA to find literature reviews.

FSTA allows you to easily search for review articles (both narrative and systematic reviews) by using the subject heading or thesaurus term "REVIEWS" and an appropriate free-text keyword.

On the Web of Science or EBSCO platform, an FSTA search for reviews about cassava would look like this: DE "REVIEWS" AND cassava.

On the Ovid platform using the multi-field search option, the search would look like this: reviews.sh. AND cassava.af.

In 2011 FSTA introduced the descriptor META-ANALYSIS, making it easy to search specifically for systematic reviews that include a meta-analysis published from that year onwards.

On the EBSCO or Web of Science platform, an FSTA search for systematic reviews with meta-analyses about staphylococcus aureus would look like this: DE "META-ANALYSIS" AND staphylococcus aureus.

On the Ovid platform using the multi-field search option, the search would look like this: meta-analysis.sh. AND staphylococcus aureus.af.

Systematic reviews with meta-analyses published before 2011 are included in the REVIEWS controlled vocabulary term in the thesaurus.

An easy way to locate pre-2011 systematic reviews with meta-analyses is to search the subject heading or thesaurus term "REVIEWS" AND meta-analysis as a free-text keyword AND another appropriate free-text keyword.

On the Web of Science or EBSCO platform, the FSTA search would look like this: DE "REVIEWS" AND meta-analysis AND carbohydrate*

On the Ovid platform using the multi-field search option, the search would look like this: reviews.sh. AND meta-analysis.af. AND carbohydrate*.af.
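The search strings above follow a regular pattern, which the Python sketch below reproduces. The helper names ebsco_query and ovid_query are hypothetical and only format text in the style quoted above; they do not call any FSTA, EBSCO, Web of Science or Ovid interface.

```python
def ebsco_query(subject_heading, *keywords):
    """Format a descriptor search in the EBSCO / Web of Science style shown above."""
    parts = [f'DE "{subject_heading.upper()}"'] + list(keywords)
    return " AND ".join(parts)


def ovid_query(subject_heading, *keywords):
    """Format a multi-field search in the Ovid style shown above."""
    # .sh. marks the subject heading; .af. searches the keyword in all fields.
    parts = [f"{subject_heading.lower()}.sh."] + [f"{kw}.af." for kw in keywords]
    return " AND ".join(parts)


print(ebsco_query("reviews", "cassava"))
print(ovid_query("reviews", "cassava"))
print(ebsco_query("reviews", "meta-analysis", "carbohydrate*"))
print(ovid_query("reviews", "meta-analysis", "carbohydrate*"))
```

Keeping search strings in a script like this is one way to record exactly what was run on each platform, which supports the reproducibility that systematic searching calls for.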

Related resources:

Literature Searching Best Practise Guide
Predatory publishing: Investigating researchers’ knowledge & attitudes
The IFIS Expert Guide to Journal Publishing

Library image by Paul Schafer , microscope image by Matthew Waring , via Unsplash.


Systematic Reviews: Types of reviews

Systematic literature reviews.

Using a systematic approach in conducting a literature review

A literature review may be undertaken in a systematic way using a rigorous and structured search strategy in order to be comprehensive, without necessarily attempting to include all available research on a particular topic, as in a systematic review.

Why be systematic? This approach can:

Provide a robust overview of the available literature on your topic
Ensure relevant literature is identified and key publications are not overlooked
Reduce irrelevant search results through search planning
Help you to create a reproducible search strategy.

In addition, applying a systematic approach will allow you to work more efficiently. Not every review is a systematic review. Be sure to select the review type that matches the purpose and scope of your project. All reviews should be methodical and done in a careful and deliberate manner with a defined protocol.

Questions to ask yourself:

What is the purpose of this review?
What is the research question?
How long do I have to complete it?
Am I doing it alone or part of a team?
How much of the literature do I need to capture?
Does my literature search have to be transparent and replicable?
Are there standard methods that need to be followed?

Types of reviews

Systematic review
Rapid review
Umbrella review
Scoping review
Narrative review

A systematic review attempts to identify, appraise and synthesize all the empirical evidence that meets pre-specified eligibility criteria to answer a given research question. Researchers conducting systematic reviews use explicit methods aimed at minimizing bias in order to produce more reliable findings that can be used to inform decision making.

An essential step in the early development of a systematic review is the development of a review protocol. A protocol pre-defines the objectives and methods of the systematic review which allows transparency of the process. It must be done prior to conducting the systematic review as it is important in restricting the presence of reporting bias. The protocol is a completely separate document to the systematic review report.

Adapted from: JBI Manual for Evidence Synthesis

In summary, a systematic review:

Addresses a specific question
Uses specified methodology
Assesses quality of the literature
Requires a team and long term commitment

What is a rapid review?

The Cochrane Rapid Reviews Methods Group has proposed the following definition: “A form of knowledge synthesis that accelerates the process of conducting a traditional systematic review through streamlining or omitting specific methods to produce evidence for stakeholders in a resource-efficient manner.”

Rapid reviews are usually undertaken when decision makers have urgent and emerging needs which require evidence produced on a short time frame. Typically, to compensate for the short time frame of a rapid review, methodological rigour may be sacrificed. For example, the grey literature may not be sought and preference may be given to the more readily available research published and written in English.

A rapid review follows most of the principal steps of a systematic review, using systematic and transparent methods to identify, select, critically appraise and analyze data from relevant research. However, to provide timely evidence, some of the components of a systematic review process are either simplified or omitted. There are various approaches for simplifying the review components, such as reducing the number of databases, assigning a single reviewer in each step while another reviewer verifies the results, excluding or limiting the use of grey literature, or narrowing the scope of the review. In general, a rapid review takes about four months or less.

Adapted from: Health Evaluation and Applied Research Development (HEARD). (June 25th, 2018). Rapid reviews versus systematic reviews. https://www.heardproject.org/news/rapid-review-vs-systematic-review-what-are-the-differences/

Umbrella reviews are sometimes referred to as a "review of reviews". They are an attempt to identify and appraise, extract and summarise all the evidence from research syntheses related to a topic or question.

Umbrella reviews may:

Include analyses of different interventions for the same problem or condition.
Analyse the same intervention and condition, but different outcomes.
Analyse the same intervention but different conditions, problems or populations.

Umbrella reviews offer the possibility to address a broad scope of issues related to the topic of interest.

In summary, an umbrella review:

Is a systematic review of systematic reviews
Synthesizes systematic reviews of the same topic
Assesses scope and quality of individual systematic reviews

"Scoping reviews, a type of knowledge synthesis, follow a systematic approach to map evidence on a topic and identify main concepts, theories, sources, and knowledge gaps" (Tricco, et al., 2018).

"Scoping reviews conducted as precursors to systematic reviews may enable authors to identify the nature of a broad field of evidence so that ensuing reviews can be assured of locating adequate numbers of relevant studies for inclusion" (Munn, Z., Peters, M., Stern, C., Tufanaru, C., McArthur, A., & Aromataris, E., 2018).

A scoping review may be undertaken as a preliminary exercise prior to the conduct of a systematic review, or as a stand alone review.

A scoping review may be used:

As a precursor to a systematic review.
To identify the types of available evidence in a given field.
To identify and analyse knowledge gaps.
To clarify key concepts/ definitions in the literature.
To examine how research is conducted on a certain topic or field.
To identify key characteristics or factors related to a concept.

Adapted from: JBI Manual for Evidence Synthesis, chapter 11 Scoping reviews. https://doi.org/10.46658/JBIMES-20-01

Getting started: Cochrane: Scoping reviews: what they are and how you can do them

Reporting: The PRISMA extension for scoping reviews was published in 2018. The checklist contains 20 essential reporting items and 2 optional items to include when completing a scoping review. Scoping reviews serve to synthesize evidence and assess the scope of literature on a topic. Among other objectives, scoping reviews help determine whether a systematic review of the literature is warranted.

A traditional literature review or narrative review examines and evaluates the scholarly literature on a topic. Literature reviews often do not answer one specific question; rather, they usually bring together a summary of the literature in a qualitative manner.

A literature review may be undertaken in a systematic way in order to be comprehensive, without being a systematic review. It is important to recognise the differences between the two and determine which type of review is best suited to your needs - or whether one of the other reviews detailed here is more applicable.

Narrative reviews:

provide a (generally qualitative) summary of the relevant literature, as determined by the author.
do not necessarily provide an analysis of the literature or its quality.
usually do not include a description of the methodology of the search process.
refer to key journal literature without going into the grey literature.
don't always answer a specific research question.
are not protocol driven.

Barnard, M. (2015). Research essentials: How to undertake a literature review. Nursing Children and Young People, 27(10), 12-12. doi:10.7748/ncyp.27.10.12.s15

Bettany-Saltikov, J. (2010). Learning how to undertake a systematic review: Part 1. Nursing Standard, 24(40), 47-55.

Grant, M.J., & Booth, A. (2009). A typology of reviews: An analysis of 14 review types and associated methodologies. Health Information and Libraries Journal, 26(2), 91-108. doi:10.1111/j.1471-1842.2009.00848.x

Kowalczyk, N., & Truluck, C. (2013). Literature reviews and systematic reviews: What is the difference? Radiologic Technology, 85(2), 219-222.

Munn, Z., Peters, M., Stern, C., Tufanaru, C., McArthur, A., & Aromataris, E. (2018). Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach. BMC Medical Research Methodology, 18(1), 1-7. doi:10.1186/s12874-018-0611-x

Munn, Z., Stern, C., Aromataris, E., Lockwood, C., & Jordan, Z. (2018). What kind of systematic review should I conduct? A proposed typology and guidance for systematic reviewers in the medical and health sciences. BMC Medical Research Methodology, 18(1), 5. https://doi.org/10.1186/s12874-017-0468-4

Pawson, R., Greenhalgh, T., Harvey, G., & Walshe, K. (2005). Realist review: A new method of systematic review designed for complex policy interventions. Journal of Health Services Research and Policy, 10(3), 21-34. https://doi.org/10.1258/1355819054308530

Robinson, P., & Lowe, J. (2015). Literature reviews vs systematic reviews. Australian and New Zealand Journal of Public Health, 39(2), 103-103. doi:10.1111/1753-6405.12393

Tricco, A., Lillie, E., Zarin, W., O'Brien, K., Colquhoun, H., Levac, D., . . . Straus, S. (2018). PRISMA extension for scoping reviews (PRISMA-ScR): Checklist and explanation. Annals of Internal Medicine, 169(7), 467-467.

Last Updated: May 15, 2024 11:15 AM
URL: https://ecu.au.libguides.com/systematic-reviews

Edith Cowan University acknowledges and respects the Noongar people, who are the traditional custodians of the land upon which its campuses stand and its programs operate. In particular ECU pays its respects to the Elders, past and present, of the Noongar people, and embrace their culture, wisdom and knowledge.

About Systematic Reviews

Understanding the Differences Between a Systematic Review vs Literature Review


Let’s look at these differences in further detail.

Goal of the Review

The objective of a literature review is to provide context or background information about a topic of interest. Hence the methodology is less comprehensive and not exhaustive. The aim is to provide an overview of a subject as an introduction to a paper or report. This overview is obtained firstly through evaluation of existing research, theories, and evidence, and secondly through individual critical evaluation and discussion of this content.

A systematic review attempts to answer specific clinical questions (for example, the effectiveness of a drug in treating an illness). Answering such questions comes with a responsibility to be comprehensive and accurate. Failure to do so could have life-threatening consequences. The need to be precise then calls for a systematic approach. The aim of a systematic review is to establish authoritative findings from an account of existing evidence using objective, thorough, reliable, and reproducible research approaches, and frameworks.

Level of Planning Required

The methodology involved in a literature review is less complicated and requires a lower degree of planning. For a systematic review, the planning is extensive and requires defining robust pre-specified protocols. It starts with formulating the research question and the scope of the research. The PICO approach (population, intervention, comparison, and outcomes) is used in designing the research question. Planning also involves establishing strict eligibility criteria for inclusion and exclusion of the primary resources to be included in the study. Every stage of the systematic review methodology is pre-specified in detail before the review process starts. It is recommended to register the protocol of your systematic review to avoid duplication. Journal publishers now look for registration in order to ensure the reviews meet predefined criteria for conducting a systematic review [1].

Search Strategy for Sourcing Primary Resources


Quality Assessment of the Collected Resources

A rigorous appraisal of collected resources for the quality and relevance of the data they provide is a crucial part of the systematic review methodology. A systematic review usually employs a dual independent review process, which involves two reviewers evaluating the collected resources based on pre-defined inclusion and exclusion criteria. The idea is to limit bias in selecting the primary studies. Such a strict review system is generally not a part of a literature review.

Presentation of Results

Most literature reviews present their findings in narrative or discussion form. These are textual summaries of the results used to critique or analyze a body of literature about a topic, often serving as an introduction. For this reason, literature reviews are sometimes also called narrative reviews.

A systematic review requires a higher level of rigor, transparency, and often peer review. The results of a systematic review can be presented as numeric effect estimates using statistical methods or as a textual summary of all the evidence collected. Meta-analysis is employed to provide the necessary statistical support for evidence outcomes. Meta-analyses are usually conducted to examine the evidence on a condition and its treatment. The aims of a meta-analysis are to determine whether an effect exists, whether the effect is positive or negative, and to establish a conclusive estimate of the effect [2].
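The three aims listed above can be illustrated with a short Python sketch. It uses a DerSimonian-Laird random-effects model, which is an assumption made for this example since the text does not name a model, and the study effects and variances are invented. The sign of the pooled estimate gives the direction, and a z statistic beyond 1.96 in absolute value is used here as a conventional two-sided 5% threshold for whether an effect exists.

```python
import math


def random_effects_dl(effects, variances):
    """DerSimonian-Laird random-effects pooled estimate (illustrative sketch)."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures how much the studies disagree beyond chance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sw_star
    se = math.sqrt(1.0 / sw_star)
    return {"pooled": pooled, "se": se, "tau2": tau2, "z": pooled / se}


# Hypothetical effect sizes and variances from three studies.
result = random_effects_dl([0.2, 0.4, 0.6], [0.01, 0.01, 0.01])
effect_exists = abs(result["z"]) > 1.96
direction = "positive" if result["pooled"] > 0 else "negative"
print(round(result["pooled"], 3), round(result["tau2"], 3), effect_exists, direction)
```

Whether a random-effects or fixed-effect model is appropriate depends on how similar the included studies are, a decision normally stated in the review protocol.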

Using statistical methods in generating the review results increases confidence in the review. Results of a systematic review are then used by clinicians to prescribe treatment or for pharmacovigilance purposes. The results of the review can also be presented as a qualitative assessment when the end goal is issuing recommendations or guidelines.

Risk of Bias

Literature reviews are mostly used by authors to provide background information before introducing their own research. Because the search for primary resources is also less exhaustive, literature reviews are more prone to bias.

One of the main objectives for conducting a systematic review is to reduce bias in the evidence outcome. Extensive planning, strict eligibility criteria for inclusion and exclusion, and a statistical approach for computing the result reduce the risk of bias.

Intervention studies consider risk of bias as the “likelihood of inaccuracy in the estimate of causal effect in that study.” In systematic reviews, assessing the risk of bias is critical in providing accurate assessments of overall intervention effect [3].

With numerous review methods available for analyzing, synthesizing, and presenting existing scientific evidence, it is important for researchers to understand the differences between the review methods. Choosing the right method for a review is crucial in achieving the objectives of the research.

[1] “Systematic Review Protocols and Protocol Registries | NIH Library,” www.nihlibrary.nih.gov. https://www.nihlibrary.nih.gov/services/systematic-review-service/systematic-review-protocols-and-protocol-registries

[2] A. B. Haidich, “Meta-analysis in medical research,” Hippokratia, vol. 14, no. Suppl 1, pp. 29–37, Dec. 2010, [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3049418/#:~:text=Meta%2Danalyses%20are%20conducted%20to

A systematic literature review of modalities, trends, and limitations in emotion recognition, affective computing, and sentiment analysis.

1. Introduction

2. Methodology

2.1. Research Questions

2.2. Search Process

2.2.1. Search Terms

2.2.2. Inclusion and Exclusion Criteria

2.2.3. Quality Assessment

2.2.4. Data Extraction

3.1. Overview

3.2. Unimodal Data Approaches

3.2.1. Unimodal Physical Approaches

3.2.2. Unimodal Speech Data Approaches

Several articles mention the use of transfer learning for speech emotion recognition. This technique involves training models on one dataset and applying them to another. This can improve the efficiency of emotion recognition across different datasets.
Some articles discuss multitask learning models, which are designed to simultaneously learn multiple related tasks. In the context of speech emotion recognition, this approach may help capture commonalities and differences across different datasets or emotions.
Data augmentation techniques, which generate additional training data from existing data, are mentioned in multiple articles as a way to improve model performance and generalization.
Attention mechanisms are a common trend for improving emotion recognition. Attention models allow the model to focus on specific features or segments of the input data that are most relevant for recognizing emotions, such as in multi-level attention-based approaches.
Many articles discuss the use of deep learning models, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and some variants like “Two-Stage Fuzzy Fusion Based-Convolution Neural Network”, “Deep Convolutional LSTM”, and “Attention-Oriented Parallel CNN Encoders”.
While deep learning is prevalent, some articles explore novel feature engineering methods, such as modulation spectral features and wavelet packet information gain entropy, to enhance emotion recognition.
From the list of articles on unimodal emotion recognition through speech, 7.14% address the challenge of recognizing emotions across different datasets or corpora. This is an important trend for making emotion recognition models more versatile.
A few articles focus on making emotion recognition models more interpretable and explainable, which is crucial for real-world applications and understanding how the model makes its predictions.
Ensemble methods, which combine multiple models to make predictions, are mentioned in several articles as a way to improve the performance of emotion recognition systems.
Some articles discuss emotion recognition in specific contexts, such as call/contact centers, school violence detection, depression detection, analysis of podcast recordings, noisy environment analysis, in-the-wild sentiment analysis, and speech emotion segmentation of vowel-like and non-vowel-like regions. This indicates a trend toward applying emotion recognition in diverse applications.
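The data augmentation trend in the list above can be sketched in a few lines. The example below is an illustration under stated assumptions, not a method taken from the reviewed articles: it assumes a speech clip is available as a list of samples and applies two simple transformations, a random circular time shift and additive Gaussian noise. All function names, parameter values, and the toy tone are hypothetical.

```python
import math
import random


def add_noise(signal, noise_std, rng):
    """Add zero-mean Gaussian noise to every sample."""
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def time_shift(signal, shift):
    """Circularly shift the samples by `shift` positions (positive = later)."""
    if not signal:
        return []
    k = shift % len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def augment(signal, n_copies, noise_std=0.01, max_shift=100, seed=0):
    """Create `n_copies` perturbed variants of one clip."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    copies = []
    for _ in range(n_copies):
        shifted = time_shift(signal, rng.randint(-max_shift, max_shift))
        copies.append(add_noise(shifted, noise_std, rng))
    return copies


# Toy 440 Hz tone sampled at 8 kHz, standing in for a speech clip.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
augmented = augment(tone, n_copies=3)
print(len(augmented), len(augmented[0]))
```

Each augmented copy keeps the original label, so a small corpus yields more training examples without new recordings.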

3.2.3. Unimodal Text Data Approaches

3.2.4. Unimodal Physiological Data Approaches

Attention and self-attention mechanisms: These suggest that researchers are paying attention to the relevance of different parts of EEG signals for emotion recognition.
Generative adversarial networks (GANs): Used for generating synthetic EEG data in order to improve the robustness and generalization of the models.
Semi-supervised learning and domain transfer: Allow emotion recognition with limited datasets or datasets that are applicable to different domains, suggesting a concern for scalability and generalization of models.
Interpretability and explainability: There is a growing interest in models that are interpretable and explainable, suggesting a concern for understanding how models make decisions and facilitating user trust in them.
Utilization of transformers and capsule networks: Newer neural network architectures such as transformers and capsule networks are being explored for emotion recognition, indicating an interest in enhancing the modeling and representation capabilities of EEG signals.
Although studies with a unimodal physiological approach using signals other than EEG, such as ECG, EDA, HR, and PPG, are still scarce, these signals can provide information about the cardiovascular system and the body’s autonomic response to emotions. Their limitation is that they may not be as specific or sensitive in detecting subtle or changing emotions. In practical situations, noise and artifacts, such as motion, can affect the quality of these signals, and the signals can also be influenced by non-emotional factors, such as physical exercise and fatigue. Various studies explore the utilization of ECG and PPG signals for emotion recognition and stress classification. Techniques such as CNNs, LSTMs, attention mechanisms, self-supervised learning, and data augmentation are employed to analyze these signals and extract meaningful features for emotion recognition tasks. Bayesian deep learning frameworks are utilized for probabilistic modeling and uncertainty estimation in emotion prediction from HB data. These approaches aim to enhance human–computer interaction, improve mental health monitoring, and develop personalized systems for emotion recognition based on individual user characteristics.

3.3. Multi-Physical Data Approaches

Most studies employ CNNs and RNNs, while others utilize variations of general neural networks, such as spiking neural networks (SNN) and tree-based neural networks. SNNs represent and transmit information through discrete bursts of neuronal activity, known as “spikes” or “pulses”, unlike conventional neural networks, which process information in continuous values. Additionally, several studies leverage advanced analysis models such as the stacked ensemble model and multimodal fusion models, which focus on integrating diverse sources of information to enhance decision-making. Transfer learning models and hybrid attention networks aim to capitalize on knowledge from related tasks or domains to improve performance in a target task. Attention-based neural networks prioritize capturing relevant information and patterns within the data. Semi-supervised and contrastive learning models offer alternative learning paradigms by incorporating both labeled and unlabeled data.
The studies address diverse applications, including sarcasm, sentiment, and emotion recognition in conversations, financial distress prediction, performance evaluation in job interviews, emotion-based location recommendation systems, user experience (UX) analysis, and emotion detection in video games and in educational settings. This suggests that emotion recognition through multi-physical data analysis has a wide spectrum of applications in everyday life.
Various audio and video signal processing techniques are employed, including pitch analysis, facial feature detection, cross-attention, and representational learning.
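The contrast drawn in this section between discrete spikes and continuous values can be illustrated with a single leaky integrate-and-fire neuron. This is a minimal sketch of one widely used spiking neuron model; the reviewed studies may use other neuron models, and the function name and the threshold, leak, and input values are hypothetical.

```python
def lif_spike_train(input_current, threshold=1.0, leak=0.9, reset=0.0):
    """Simulate one leaky integrate-and-fire neuron; return 0/1 per time step."""
    potential = reset
    spikes = []
    for current in input_current:
        # The membrane potential decays, then integrates the new input.
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)   # emit a discrete spike
            potential = reset  # and reset the membrane potential
        else:
            spikes.append(0)
    return spikes


# A constant input produces a regular spike pattern.
spikes = lif_spike_train([0.5] * 10)
print(spikes)
```

The neuron's output is a sequence of binary events rather than a continuous activation, so information is carried by when and how often spikes occur.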

3.4. Multi-Physiological Data Approaches

The fusion of physiological signals, such as EEG, ECG, PPG, GSR, EMG, BVP, EOG, respiration, temperature, and movement signals, is a predominant trend in these studies. The combination of multiple physiological signals allows for a richer representation of emotions.
Most studies apply deep learning models, such as CNNs, RNNs, and autoencoder neural networks (AE), for the processing and analysis of these signals. Supervised and unsupervised learning approaches are also used.
These studies focus on a variety of applications, such as emotion recognition in healthcare environments, brain–computer interfaces for music, emotion detection in interactive virtual environments, stress assessment in mobility environments for visually impaired people, among others. This indicates that emotion recognition based on physiological signals has applications in healthcare, technology, and beyond.
Some studies focus on personalized emotion recognition, suggesting tailoring of models for each individual. This may be relevant for personalized health and wellness applications. Others focus on interactive applications and virtual environments useful for entertainment and virtual therapy.
It is important to mention that the studies within this classification are quite limited in comparison to the previously described modalities. Although it appears that they are using similar physiological signals, the databases differ in terms of their approaches and generation methods. Therefore, there is an opportunity to establish a protocol for generating these databases, allowing for meaningful comparisons among studies.
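One simple way to combine several physiological signals is feature-level (early) fusion: the features of each signal are scaled and then concatenated into a single vector per sample. The sketch below assumes features have already been extracted; it shows only one of several possible fusion strategies, and the function names and the EEG and GSR values are hypothetical.

```python
import statistics


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def early_fusion(*modalities):
    """Scale each modality, then concatenate its features sample by sample."""
    n = len(modalities[0])
    if any(len(m) != n for m in modalities):
        raise ValueError("every modality needs one feature row per sample")
    scaled = [zscore_columns(m) for m in modalities]
    return [sum((mod[i] for mod in scaled), []) for i in range(n)]


# Hypothetical features for three samples: two EEG band powers and one GSR statistic.
eeg = [[4.1, 9.8], [3.9, 10.4], [4.6, 8.7]]
gsr = [[0.21], [0.35], [0.28]]
fused = early_fusion(eeg, gsr)
print(len(fused), len(fused[0]))
```

Scaling before concatenation keeps a signal with large raw values from dominating the fused vector, which also matters when databases differ in how they were generated.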

3.5. Multi-Physical–Physiological Data Approaches

Studies tend to combine multiple types of signals, such as EEG, facial expressions, voice signals, GSR, and other physiological data. Combining signals aims to take advantage of the complementarity of different modalities to improve accuracy in emotion detection.
Machine learning models, in particular CNNs, are widely used in signal fusion for emotion recognition. CNN models can effectively process data from multiple modalities.
Applications are also being explored in the health and wellness domain, such as emotion detection for emotional health analysis of people in smart environments.
The use of standardized and widely accepted databases is important for comparing results between different studies; however, these are still limited.
The trend towards non-intrusive sensors and wireless technology enables data collection in more natural and less intrusive environments, which facilitates the practical application of these systems in everyday environments.

4. Discussion

Facial expression analysis approaches are currently being applied across various domains, including naturalistic settings (“in the wild”), on-road driver monitoring, virtual reality environments, smart homes, IoT and edge devices, and assistive robots. There is also a focus on mental health assessment, including autism, depression, and schizophrenia, and distinguishing between genuine and unfelt facial expressions of emotion. Efforts are being made to improve performance in processing faces acquired at a distance despite the challenges posed by low-quality images. Furthermore, there is an emerging interest in utilizing facial expression analysis in human–computer interaction (HCI), learning environments, and multicultural contexts.
The recognition of emotions through speech and text has experienced tremendous growth, largely due to the abundance of information facilitated by advancements in technology and social media. This has enabled individuals to express their opinions and sentiments through various media, including podcast recordings, live videos, and readily available data sources such as social media platforms like Twitter, Facebook, Instagram, and blogs. Additionally, researchers have utilized unconventional sources like stock market data and tourism-related reviews. The variety and richness of these data sources indicate a wide range of segments where such emotion recognition analyses can be applied effectively.
EEG signals continue to be a prominent modality for emotion recognition due to their highly accurate insights into emotional states. Between 2022 and 2023, studies in this field experienced exponential growth. The identified trends include utilizing EEG for enhancing human–computer interaction and recognizing emotions in various contexts such as patients with consciousness disorders, movie viewing, virtual environments, and driving scenarios. EEG is also being used for detecting and monitoring mental health issues. There is a growing focus on personalization, leading towards more individualized and user-specific emotion recognition systems. Other physiological signals, such as ECG, EDA, and HR, are also gaining attention, albeit at a slower pace.
In the realm of multi-physical, multi-physiological, and multi-physical–physiological approaches, it is the first that appears to be laying the groundwork, as evidenced by the abundance of studies in this area. The latter two approaches, incorporating fusions with physiological signals, are still relatively scarce but seem to be paving the way for future researchers to contribute to their growth. Multimodal approaches, which integrate both physical and physiological signals, are finding diverse applications in emotion recognition. These range from healthcare systems, individual and group mood research, personality recognition, pain intensity recognition, anxiety detection, work stress detection, stress classification and security monitoring in public spaces, to vehicle security monitoring, movie audience emotion recognition, applications for autism spectrum disorder detection, music interfacing, and virtual environments.
Bidirectional encoder representations from transformers (BERT): Used in sentiment analysis and emotion recognition from text, BERT models can understand the context of words in sentences by pre-training on a large text corpus and then fine-tuning for specific tasks like sentiment analysis.
CNNs: These are commonly applied in facial emotion recognition, emotion recognition from physiological signals, and even in speech emotion recognition by analyzing spectrograms.
RNNs and variants (LSTM, GRU): These models are suited for sequential data like speech and text. LSTMs and GRUs are particularly effective in speech emotion recognition and sentiment analysis of time-series data.
Graph convolutional networks (GCNs): Applied in emotion recognition from EEG signals and conversation-based emotion recognition, these can model relational data and capture the complex dependencies in graph-structured data, like brain connectivity patterns or conversational contexts.
Attention mechanisms and transformers: Enhancing the ability of models to focus on relevant parts of the data, attention mechanisms are integral to models like transformers for tasks that require understanding the context, such as sentiment analysis in long documents or emotion recognition in conversations.
Ensemble models: Combining predictions from multiple models to improve accuracy, ensemble methods are used in multimodal emotion recognition, where inputs from different modalities (e.g., audio, text, and video) are integrated to make more accurate predictions.
Autoencoders and generative adversarial networks (GANs): For tasks like data augmentation in emotion recognition from EEG or for generating synthetic data to improve model robustness, these unsupervised learning models can learn compact representations of data or generate new data samples, respectively.
Multimodal fusion models: In applications requiring the integration of multiple data types (e.g., speech, text, and video for emotion recognition), fusion models combine features from different modalities to capture more comprehensive information for prediction tasks.
Transfer learning: Utilizing pre-trained models on large datasets and fine-tuning them for specific affective computing tasks, transfer learning is particularly useful in scenarios with limited labeled data, such as sentiment analysis in niche domains.
Spatio-temporal models: For tasks that involve data with both spatial and temporal dimensions (like video-based emotion recognition or physiological signal analysis), models that capture spatio-temporal dynamics are employed, combining approaches like CNNs for spatial features and RNNs/LSTMs for temporal features.
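Attention mechanisms recur throughout the list above, so a small sketch of scaled dot-product attention, the core operation in transformers, may help. It is written in plain Python for readability rather than speed, and it omits the learned projection matrices a real model would use. The function names and the three toy feature vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(key dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        # The output is the attention-weighted average of the value vectors.
        outputs.append([sum(wj * v[d] for wj, v in zip(w, values)) for d in range(dim)])
    return outputs, weights


# Three hypothetical time steps with 2-dimensional features (self-attention: Q = K = V).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(frames, frames, frames)
print([round(w, 3) for w in weights[0]])
```

Each row of `weights` sums to 1 and shows how strongly one time step attends to the others, which is what lets a model focus on the segments most relevant to an emotion.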

5. Conclusions



Database	Resulted Studies with Key Terms	After Years Filter	After Article Type	Relevant Order
IEEE	2112	1152	536	200
Springer	4121	1808	1694	200
Science Direct	1041	582	480	200
MDPI	686	643	635	200

Database	Quantity
IEEE	148
Springer	112
Science Direct	166
MDPI	183

Modality	2018	2019	2020	2021	2022	2023	Total
Multi-physical	8	6		8	22	27	71
Multi-physical–physiological	2			3	6	7	18
Multi-physiological	2		6	3	6	4	21
Unimodal	37	26	29	37	176	194	499
Total	49	32	35	51	210	232	609
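The modality-by-year counts above can be cross-checked against the reported totals and against the per-database quantities. The following is a minimal sketch in Python, assuming that blank cells in the table stand for zero studies; the variable and function names are illustrative and do not come from the review itself.

```python
# Counts transcribed from the modality-by-year table (2018 to 2023).
# Assumption: blank cells in the table are read as 0.
COUNTS = {
    "Multi-physical": [8, 6, 0, 8, 22, 27],
    "Multi-physical-physiological": [2, 0, 0, 3, 6, 7],
    "Multi-physiological": [2, 0, 6, 3, 6, 4],
    "Unimodal": [37, 26, 29, 37, 176, 194],
}

# Totals as printed in the table.
REPORTED_ROW_TOTALS = {
    "Multi-physical": 71,
    "Multi-physical-physiological": 18,
    "Multi-physiological": 21,
    "Unimodal": 499,
}
REPORTED_YEAR_TOTALS = [49, 32, 35, 51, 210, 232]
REPORTED_GRAND_TOTAL = 609

# Included studies per database, from the "Database / Quantity" table.
PER_DATABASE = {"IEEE": 148, "Springer": 112, "Science Direct": 166, "MDPI": 183}


def row_totals(counts):
    """Sum each modality across all years."""
    return {modality: sum(values) for modality, values in counts.items()}


def year_totals(counts):
    """Sum each year across all modalities."""
    return [sum(column) for column in zip(*counts.values())]


def is_consistent():
    """Return True when every recomputed total matches the printed one."""
    return (
        row_totals(COUNTS) == REPORTED_ROW_TOTALS
        and year_totals(COUNTS) == REPORTED_YEAR_TOTALS
        and sum(REPORTED_ROW_TOTALS.values()) == REPORTED_GRAND_TOTAL
        and sum(PER_DATABASE.values()) == REPORTED_GRAND_TOTAL
    )


if __name__ == "__main__":
    print("Tables are internally consistent:", is_consistent())
```

A check of this kind is cheap to run whenever a table is transcribed by hand, and it flags any cell that was mistyped or dropped.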

Article Title	Databases Used	Ref.
AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild.	AffectNet	[ ]
Video-Based Depression Level Analysis by Encoding Deep Spatiotemporal Features.	AVEC2013, AVEC2014	[ ]
Exploiting Multi-CNN Features in CNN-RNN Based Dimensional Emotion Recognition on the OMG in-the-Wild Dataset.	Aff-Wild, Aff-Wild2, OMG	[ ]
A Deeper Look at Facial Expression Dataset Bias.	CK+, JAFFE, MMI, Oulu-CASIA, AffectNet, FER2013, RAF-DB 2.0, SFEW 2.0	[ ]
Automatic Recognition of Facial Displays of Unfelt Emotions.	CK+, OULU-CASIA, BP4D	[ ]
Spatio-Temporal Encoder-Decoder Fully Convolutional Network for Video-Based Dimensional Emotion Recognition.	OMG, RECOLA, SEWA	[ ]
Efficient Net-XGBoost: An Implementation for Facial Emotion Recognition Using Transfer Learning.	CK+, FER2013, JAFFE, KDEF	[ ]
Masked Face Emotion Recognition Based on Facial Landmarks and Deep Learning Approaches for Visually Impaired People.	AffectNet	[ ]
Facial Feature Extraction Using a Symmetric Inline Matrix-LBP Variant for Emotion Recognition.	JAFFE	[ ]
Manta Ray Foraging Optimization with Transfer Learning Driven Facial Emotion Recognition.	CK+, FER-2013	[ ]
Emotion recognition at a distance: The robustness of machine learning based on hand-crafted facial features vs deep learning models.	CK+	[ ]
Deep learning-based dimensional emotion recognition combining the attention mechanism and global second-order feature representations.	AffectNet	[ ]
On-road driver facial expression emotion recognition with parallel multi-verse optimizer (PMVO) and optical flow reconstruction for partial occlusion in internet of things (IoT).	CK+, KMU-FED	[ ]
Emotion recognition by web-shaped model.	CK+, KDEF	[ ]
Edge-enhanced bi-dimensional empirical mode decomposition-based emotion recognition using fusion of feature set	eNTERFACE, CK, JAFFE	[ ]
A novel driver emotion recognition system based on deep ensemble classification	AffectNet, CK+, DFER, FER-2013, JAFFE, and a custom dataset	[ ]

1. Facial emotion recognition for mental health assessment (depression, schizophrenia)
2. Emotion analysis in human-computer interaction
3. Emotion recognition in the context of autism
4. Driver emotion recognition for intelligent vehicles
5. Assessment of emotional engagement in learning environments
6. Facial emotion recognition for apparent personality trait analysis
7. Facial emotion recognition for gender, age, and ethnicity estimation
8. Emotion recognition in virtual reality and smart homes
9. Emotion recognition in healthcare and clinical settings
10. Emotion recognition in real-world and COVID-19 masked scenarios
11. Personalized and group-based emotion recognition
12. Music-enhanced emotion recognition
13. Cross-dataset emotion recognition
14. Emotion recognition performance assessment from faces acquired at a distance
15. Facial emotion recognition for IoT and edge devices
16. Idiosyncratic bias in emotion recognition
17. Emotion recognition in socially assistive robots
18. In the wild facial emotion recognition
19. Video-based emotion recognition
20. Spatio-temporal emotion recognition in videos
21. Spontaneous emotion recognition
22. Emotion recognition using facial components
23. Comparing emotion recognition from genuine and unfelt facial expressions

Database Name	Description	Advantages	Limitation
MELD (Multimodal Emotion Lines Dataset) [ ]	Focuses on emotion recognition in movie dialogues. It contains transcriptions of dialogues and their corresponding audio and video tracks. Emotions are labeled at the sentence and speaker levels.	Large amount of data, multimodal (text, audio, video).	Emotions induced by movies. Manually labeled.
IEMOCAP (Interactive Emotional Dyadic Motion Capture), 2005 [ ]	Focuses on emotional interactions between two individuals during acting sessions. It contains video and audio recordings of actors performing emotional scenes.	Realistic data, emotional interactions, a wide range of emotions.	Not real induced emotions (acting).
CMU-MOSI (Multimodal Corpus of Sentiment Intensity), 2014, 2017 [ ]	Focuses on sentiment intensity in speeches and interviews. It includes transcriptions of audio and video, along with sentiment annotations. Updated in 2017 as CMU-MOSEI.	Emotions are derived from real speeches and interviews.	Relatively small size.
AVEC (Affective Behavior in the Context of E-Learning with Social Signals), 2007–2016 [ ]	AVEC is a series of competitions focused on the detection of emotions and behaviors in the context of online learning. It includes video and audio data of students participating in e-learning activities.	Emotions are naturally induced during online learning activities.	Context-specific data, enables emotion assessment in e-learning settings.
RAVDESS (The Ryerson Audio-Visual Database of Emotional Speech and Song) 2016 [ ]	Audio and video database that focuses on emotion recognition in speech and song. It includes performances by actors expressing various emotions.	Diverse data in terms of emotions, modalities, and contexts.	Does not contain natural dialogues.
SAVEE (Surrey Audio–Visual Expressed Emotion) 2010 [ ]	Focuses on emotion recognition in speech. It contains recordings of speakers expressing emotions through phrases and words.	Clean audio data.
SAMM (Spontaneous Micro-expression Dataset) [ ]	Focuses on spontaneous micro-expressions that last only a fraction of a second. It contains videos of people expressing emotions in real emotional situations.	Real spontaneous micro-expressions.
CASME (Chinese Academy of Sciences Micro-Expression) [ ]	Focuses on the detection of micro-expressions in response to emotional stimuli. It contains videos of micro-expressions.	Induced by emotional stimuli.	Not multicultural.

Database Name	Description	Advantages	Limitation
WESAD (Wearable Stress and Affect Detection) [ ]	It focuses on stress and affect recognition from physiological signals like ECG, EMG, and EDA, as well as motion signals from accelerometers. Data were collected while participants performed tasks and experienced emotions in a controlled laboratory setting, wearing wearable sensors.	Facilitates the development of wearable emotion recognition systems.	The dataset is relatively small, and participant diversity may be limited.
AMIGOS [ ]	It is a multimodal dataset for personality traits and mood. Emotions are induced by emotional videos in two social contexts: one with individual viewers and one with groups of viewers. Participants’ EEG, ECG, and GSR signals were recorded using wearable sensors. Frontal HD videos and full-body videos in RGB and depth were also recorded.	Participants’ emotions were scored by self-assessment of valence, arousal, control, familiarity, liking, and basic emotions felt during the videos, as well as external assessments of valence and arousal.	Reduced number of participants.
DREAMER [ ]	Records physiological ECG, EMG, and EDA signals and self-reported emotional responses. Collected during the presentation of emotional video clips.	Enables the study of emotional responses in a controlled environment and their comparison with self-reported emotions.	Emotions may be biased towards those induced by video clips, and the dataset size is limited.
ASCERTAIN [ ]	Focuses on linking personality traits and emotional states through physiological responses like EEG, ECG, GSR, and facial activity data while participants watched emotionally charged movie clips.	Suitable for studying emotions in stressful situations and their impact on human activity.	The variety of emotions induced is limited.
DEAP (Database for Emotion Analysis using Physiological Signals) [ , ]	Includes physiological signals like EEG, ECG, EMG, and EDA, as well as audiovisual data. Data were collected by exposing participants to audiovisual stimuli designed to elicit various emotions.	Provides a diverse range of emotions and physiological data for emotion analysis.	The size of the database is small.
MAHNOB-HCI (Multimodal Human Computer Interaction Database for Affect Analysis and Recognition) [ , ]	Includes multimodal data, such as audio, video, physiological, ECG, EDA, and kinematic data. Data were collected while participants engaged in various human–computer interaction scenarios.	Offers a rich dataset for studying emotional responses during interactions with technology.


García-Hernández, R.A.; Luna-García, H.; Celaya-Padilla, J.M.; García-Hernández, A.; Reveles-Gómez, L.C.; Flores-Chaires, L.A.; Delgado-Contreras, J.R.; Rondon, D.; Villalba-Condori, K.O. A Systematic Literature Review of Modalities, Trends, and Limitations in Emotion Recognition, Affective Computing, and Sentiment Analysis. Appl. Sci. 2024 , 14 , 7165. https://doi.org/10.3390/app14167165

PMC10248995

Guidance to best tools and practices for systematic reviews

Kat Kolaski

1 Departments of Orthopaedic Surgery, Pediatrics, and Neurology, Wake Forest School of Medicine, Winston-Salem, NC USA

Lynne Romeiser Logan

2 Department of Physical Medicine and Rehabilitation, SUNY Upstate Medical University, Syracuse, NY USA

John P. A. Ioannidis

3 Departments of Medicine, of Epidemiology and Population Health, of Biomedical Data Science, and of Statistics, and Meta-Research Innovation Center at Stanford (METRICS), Stanford University School of Medicine, Stanford, CA USA


Data continue to accumulate indicating that many systematic reviews are methodologically flawed, biased, redundant, or uninformative. Some improvements have occurred in recent years based on empirical methods research and standardization of appraisal tools; however, many authors do not routinely or consistently apply these updated methods. In addition, guideline developers, peer reviewers, and journal editors often disregard current methodological standards. Although extensively acknowledged and explored in the methodological literature, most clinicians seem unaware of these issues and may automatically accept evidence syntheses (and clinical practice guidelines based on their conclusions) as trustworthy.

A plethora of methods and tools are recommended for the development and evaluation of evidence syntheses. It is important to understand what these are intended to do (and cannot do) and how they can be utilized. Our objective is to distill this sprawling information into a format that is understandable and readily accessible to authors, peer reviewers, and editors. In doing so, we aim to promote appreciation and understanding of the demanding science of evidence synthesis among stakeholders. We focus on well-documented deficiencies in key components of evidence syntheses to elucidate the rationale for current standards. The constructs underlying the tools developed to assess reporting, risk of bias, and methodological quality of evidence syntheses are distinguished from those involved in determining overall certainty of a body of evidence. Another important distinction is made between those tools used by authors to develop their syntheses as opposed to those used to ultimately judge their work.

Exemplar methods and research practices are described, complemented by novel pragmatic strategies to improve evidence syntheses. The latter include preferred terminology and a scheme to characterize types of research evidence. We organize best practice resources in a Concise Guide that can be widely adopted and adapted for routine implementation by authors and journals. Appropriate, informed use of these is encouraged, but we caution against their superficial application and emphasize their endorsement does not substitute for in-depth methodological training. By highlighting best practices with their rationale, we hope this guidance will inspire further evolution of methods and tools that can advance the field.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13643-023-02255-9.

Part 1. The state of evidence synthesis

Evidence syntheses are commonly regarded as the foundation of evidence-based medicine (EBM). They are widely accredited for providing reliable evidence and, as such, they have significantly influenced medical research and clinical practice. Despite their uptake throughout health care and ubiquity in contemporary medical literature, some important aspects of evidence syntheses are generally overlooked or not well recognized. Evidence syntheses are mostly retrospective exercises, they often depend on weak or irreparably flawed data, and they may use tools that have acknowledged or yet unrecognized limitations. They are complicated and time-consuming undertakings prone to bias and errors. Production of a good evidence synthesis requires careful preparation and high levels of organization in order to limit potential pitfalls [ 1 ]. Many authors do not recognize the complexity of such an endeavor and the many methodological challenges they may encounter. Failure to do so is likely to result in research and resource waste.

Given their potential impact on people’s lives, it is crucial for evidence syntheses to correctly report on the current knowledge base. In order to be perceived as trustworthy, reliable demonstration of the accuracy of evidence syntheses is equally imperative [ 2 ]. Concerns about the trustworthiness of evidence syntheses are not recent developments. From the early years when EBM first began to gain traction until recent times when thousands of systematic reviews are published monthly [ 3 ], the rigor of evidence syntheses has always varied. Many systematic reviews and meta-analyses had obvious deficiencies because original methods and processes had gaps, lacked precision, and/or were not widely known. The situation has improved with empirical research concerning which methods to use and standardization of appraisal tools. However, given the geometrical increase in the number of evidence syntheses being published, a relatively larger pool of unreliable evidence syntheses is being published today.

Publication of methodological studies that critically appraise the methods used in evidence syntheses is increasing at a fast pace. This reflects the availability of tools specifically developed for this purpose [ 4 – 6 ]. Yet many clinical specialties report that alarming numbers of evidence syntheses fail on these assessments. The syntheses identified report on a broad range of common conditions including, but not limited to, cancer, [ 7 ] chronic obstructive pulmonary disease, [ 8 ] osteoporosis, [ 9 ] stroke, [ 10 ] cerebral palsy, [ 11 ] chronic low back pain, [ 12 ] refractive error, [ 13 ] major depression, [ 14 ] pain, [ 15 ] and obesity [ 16 , 17 ]. The situation is even more concerning with regard to evidence syntheses included in clinical practice guidelines (CPGs) [ 18 – 20 ]. Astonishingly, in a sample of CPGs published in 2017–18, more than half did not apply even basic systematic methods in the evidence syntheses used to inform their recommendations [ 21 ].

These reports, while not widely acknowledged, suggest there are pervasive problems not limited to evidence syntheses that evaluate specific kinds of interventions or include primary research of a particular study design (eg, randomized versus non-randomized) [ 22 ]. Similar concerns about the reliability of evidence syntheses have been expressed by proponents of EBM in highly circulated medical journals [ 23 – 26 ]. These publications have also raised awareness about redundancy, inadequate input of statistical expertise, and deficient reporting. These issues plague primary research as well; however, there is heightened concern for the impact of these deficiencies given the critical role of evidence syntheses in policy and clinical decision-making.

Methods and guidance to produce a reliable evidence synthesis

Several international consortiums of EBM experts and national health care organizations currently provide detailed guidance (Table 1). They draw criteria from the reporting and methodological standards of currently recommended appraisal tools, and regularly review and update their methods to reflect new information and changing needs. In addition, they endorse the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for rating the overall quality of a body of evidence [ 27 ]. These groups typically certify or commission systematic reviews that are published in exclusive databases (eg, Cochrane, JBI) or are used to develop government or agency sponsored guidelines or health technology assessments (eg, National Institute for Health and Care Excellence [NICE], Scottish Intercollegiate Guidelines Network [SIGN], Agency for Healthcare Research and Quality [AHRQ]). They offer developers of evidence syntheses various levels of methodological advice, technical and administrative support, and editorial assistance. Use of specific protocols and checklists is required for development teams within these groups, but their online methodological resources are accessible to any potential author.

Guidance for development of evidence syntheses


Cochrane (formerly Cochrane Collaboration)
JBI (formerly Joanna Briggs Institute)

National Institute for Health and Care Excellence (NICE)—United Kingdom
Scottish Intercollegiate Guidelines Network (SIGN) —Scotland
Agency for Healthcare Research and Quality (AHRQ)—United States

Notably, Cochrane is the largest single producer of evidence syntheses in biomedical research; however, these only account for 15% of the total [ 28 ]. The World Health Organization requires Cochrane standards be used to develop evidence syntheses that inform their CPGs [ 29 ]. Authors investigating questions of intervention effectiveness in syntheses developed for Cochrane follow the Methodological Expectations of Cochrane Intervention Reviews [ 30 ] and undergo multi-tiered peer review [ 31 , 32 ]. Several empirical evaluations have shown that Cochrane systematic reviews are of higher methodological quality compared with non-Cochrane reviews [ 4 , 7 , 9 , 11 , 14 , 32 – 35 ]. However, some of these assessments have biases: they may be conducted by Cochrane-affiliated authors, and they sometimes use scales and tools developed and used in the Cochrane environment and by its partners. In addition, evidence syntheses published in the Cochrane database are not subject to space or word restrictions, while non-Cochrane syntheses are often limited. As a result, information that may be relevant to the critical appraisal of non-Cochrane reviews is often removed or is relegated to online-only supplements that may not be readily or fully accessible [ 28 ].

Influences on the state of evidence synthesis

Many authors are familiar with the evidence syntheses produced by the leading EBM organizations but can be intimidated by the time and effort necessary to apply their standards. Instead of following their guidance, authors may employ methods that are discouraged or outdated [ 28 ]. Suboptimal methods described in the literature may then be taken up by others. For example, the Newcastle–Ottawa Scale (NOS) is a commonly used tool for appraising non-randomized studies [ 36 ]. Many authors justify their selection of this tool with reference to a publication that describes the unreliability of the NOS and recommends against its use [ 37 ]. Obviously, the authors who cite this report for that purpose have not read it. Authors and peer reviewers have a responsibility to use reliable and accurate methods and not copycat previous citations or substandard work [ 38 , 39 ]. Similar cautions may potentially extend to automation tools. These have concentrated on evidence searching [ 40 ] and selection given how demanding it is for humans to maintain truly up-to-date evidence [ 2 , 41 ]. Cochrane has deployed machine learning to identify randomized controlled trials (RCTs) and studies related to COVID-19, [ 2 , 42 ] but such tools are not yet commonly used [ 43 ]. The routine integration of automation tools in the development of future evidence syntheses should not displace the interpretive part of the process.

Editorials about unreliable or misleading systematic reviews highlight several of the intertwining factors that may contribute to continued publication of unreliable evidence syntheses: shortcomings and inconsistencies of the peer review process, lack of endorsement of current standards on the part of journal editors, the incentive structure of academia, industry influences, publication bias, and the lure of “predatory” journals [ 44 – 48 ]. At this juncture, clarification of the extent to which each of these factors contribute remains speculative, but their impact is likely to be synergistic.

Over time, the generalized acceptance of the conclusions of systematic reviews as incontrovertible has affected trends in the dissemination and uptake of evidence. Reporting of the results of evidence syntheses and recommendations of CPGs has shifted beyond medical journals to press releases and news headlines and, more recently, to the realm of social media and influencers. The lay public and policy makers may depend on these outlets for interpreting evidence syntheses and CPGs. Unfortunately, communication to the general public often reflects intentional or non-intentional misrepresentation or “spin” of the research findings [ 49 – 52 ]. News and social media outlets also tend to reduce conclusions on a body of evidence and recommendations for treatment to binary choices (eg, “do it” versus “don’t do it”) that may be assigned an actionable symbol (eg, red/green traffic lights, smiley/frowning face emoji).

Strategies for improvement

Many authors and peer reviewers are volunteer health care professionals or trainees who lack formal training in evidence synthesis [ 46 , 53 ]. Informing them about research methodology could increase the likelihood they will apply rigorous methods [ 25 , 33 , 45 ]. We tackle this challenge, from both a theoretical and a practical perspective, by offering guidance applicable to any specialty. It is based on recent methodological research that is extensively referenced to promote self-study. However, the information presented is not intended to be a substitute for committed training in evidence synthesis methodology; instead, we hope to inspire our target audience to seek such training. We also hope to inform a broader audience of clinicians and guideline developers influenced by evidence syntheses. Notably, these communities often include the same members who serve in different capacities.

In the following sections, we highlight methodological concepts and practices that may be unfamiliar, problematic, confusing, or controversial. In Part 2, we consider various types of evidence syntheses and the types of research evidence summarized by them. In Part 3, we examine some widely used (and misused) tools for the critical appraisal of systematic reviews and reporting guidelines for evidence syntheses. In Part 4, we discuss how to meet methodological conduct standards applicable to key components of systematic reviews. In Part 5, we describe the merits and caveats of rating the overall certainty of a body of evidence. Finally, in Part 6, we summarize suggested terminology, methods, and tools for development and evaluation of evidence syntheses that reflect current best practices.

Part 2. Types of syntheses and research evidence

A good foundation for the development of evidence syntheses requires an appreciation of their various methodologies and the ability to correctly identify the types of research potentially available for inclusion in the synthesis.

Types of evidence syntheses

Systematic reviews have historically focused on the benefits and harms of interventions; over time, various types of systematic reviews have emerged to address the diverse information needs of clinicians, patients, and policy makers [ 54 ]. Systematic reviews with traditional components have become defined by the different topics they assess (Table 2.1 ). In addition, other distinctive types of evidence syntheses have evolved, including overviews or umbrella reviews, scoping reviews, rapid reviews, and living reviews. The popularity of these has been increasing in recent years [ 55 – 58 ]. A summary of the development, methods, available guidance, and indications for these unique types of evidence syntheses is available in Additional File 2 A.

Types of traditional systematic reviews

Review type	Topic assessed	Elements of research question (mnemonic)
Intervention [ , ]	Benefits and harms of interventions used in healthcare.	Population, Intervention, Comparator, Outcome (PICO)
Diagnostic test accuracy [ ]	How well a diagnostic test performs in diagnosing and detecting a particular disease.	Population, Index test(s), and Target condition (PIT)
Qualitative
Cochrane [ ]	Questions are designed to improve understanding of intervention complexity, contextual variations, implementation, and stakeholder preferences and experiences.	Setting, Perspective, Intervention or Phenomenon of Interest, Comparison, Evaluation (SPICE); Sample, Phenomenon of Interest, Design, Evaluation, Research type (SPIDER); Perspective, Setting, Phenomena of interest/Problem, Environment, Comparison (optional), Time/timing, Findings (PerSPEcTiF)
JBI [ ]	Questions inform meaningfulness and appropriateness of care and the impact of illness through documentation of stakeholder experiences, preferences, and priorities.	Population, the Phenomena of Interest, and the Context (PICo)
Prognostic [ ]	Probable course or future outcome(s) of people with a health problem.	Population, Intervention (model), Comparator, Outcomes, Timing, Setting (PICOTS)
Etiology and risk [ ]	The relationship (association) between certain factors (e.g., genetic, environmental) and the development of a disease or condition or other health outcome.	Population or groups at risk, Exposure(s), associated Outcome(s) (disease, symptom, or health condition of interest), the context/location or the time period and the length of time when relevant (PEO)
Measurement properties [ , ]	What is the most suitable instrument to measure a construct of interest in a specific study population?	Population, Instrument, Construct, Outcomes (PICO)
Prevalence and incidence [ ]	The frequency, distribution and determinants of specific factors, health states or conditions in a defined population: eg, how common is a particular disease or condition in a specific group of individuals?	Factor, disease, symptom or health Condition of interest, the epidemiological indicator used to measure its frequency (prevalence, incidence), the Population or groups at risk as well as the Context/location and time period where relevant (CoCoPop)
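The question frameworks in the table above lend themselves to a small data structure that forces each element to be stated before a search begins. The following is a minimal sketch in Python for the intervention framework (Population, Intervention, Comparator, Outcome), assuming a plain text field per element; the class name, method names, and the example question are hypothetical and serve only as illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PicoQuestion:
    """One intervention review question, with one field per PICO element."""

    population: str
    intervention: str
    comparator: str
    outcome: str

    def missing_elements(self) -> list[str]:
        """Return the names of elements that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def as_sentence(self) -> str:
        """Render the elements as a single answerable question."""
        return (
            f"In {self.population}, does {self.intervention} compared with "
            f"{self.comparator} affect {self.outcome}?"
        )


# Hypothetical example question, not taken from the article.
example = PicoQuestion(
    population="adults with chronic low back pain",
    intervention="supervised exercise therapy",
    comparator="usual care",
    outcome="pain intensity at 12 weeks",
)

if __name__ == "__main__":
    print(example.as_sentence())
    print("Missing elements:", example.missing_elements())
```

The other frameworks in the table could be modelled the same way by changing the field names, for example replacing `intervention` and `comparator` with an index test and a target condition for diagnostic test accuracy reviews.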

Both Cochrane [ 30 , 59 ] and JBI [ 60 ] provide methodologies for many types of evidence syntheses; they describe these with different terminology, but there is obvious overlap (Table 2.2 ). The majority of evidence syntheses published by Cochrane (96%) and JBI (62%) are categorized as intervention reviews. This reflects the earlier development and dissemination of their intervention review methodologies; these remain well-established [ 30 , 59 , 61 ] as both organizations continue to focus on topics related to treatment efficacy and harms. In contrast, intervention reviews represent only about half of the total published in the general medical literature, and several non-intervention review types contribute to a significant proportion of the other half.

Evidence syntheses published by Cochrane and JBI

Cochrane (a)			JBI (b)
Review type	n	%	Review type	n	%
Intervention	8572	96.3	Effectiveness	435	61.5
Diagnostic	176	1.9	Diagnostic Test Accuracy	9	1.3
Overview	64	0.7	Umbrella	4	0.6
Methodology	41	0.45	Mixed Methods	2	0.3
Qualitative	17	0.19	Qualitative	159	22.5
Prognostic	11	0.12	Prevalence and Incidence	6	0.8
Rapid	11	0.12	Etiology and Risk	7	1.0
Prototype	8	0.08	Measurement Properties	3	0.4
			Economic	6	0.6
			Text and Opinion	1	0.14
			Scoping	43	6.0
			Comprehensive	32	4.5
Total	8900		Total	707

a Data from https://www.cochranelibrary.com/cdsr/reviews . Accessed 17 Sep 2022

b Data obtained via personal email communication on 18 Sep 2022 with Emilie Francis, editorial assistant, JBI Evidence Synthesis

c Includes the following categories: prevalence, scoping, mixed methods, and realist reviews

d This methodology is not supported in the current version of the JBI Manual for Evidence Synthesis
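The percentages in the table follow directly from the counts and column totals. The short sketch below reproduces two of them; the function name `percent_share` is hypothetical and used only for illustration.

```python
def percent_share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 1)


# Counts and totals are taken from the table above.
cochrane_intervention = percent_share(8572, 8900)
jbi_effectiveness = percent_share(435, 707)
print(cochrane_intervention, jbi_effectiveness)
```

Small differences from the printed table are possible for other rows, since the rounding convention used by the table's authors is not stated.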

Types of research evidence

There is consensus on the importance of using multiple study designs in evidence syntheses; at the same time, there is a lack of agreement on methods to identify included study designs. Authors of evidence syntheses may use various taxonomies and associated algorithms to guide selection and/or classification of study designs. These tools differentiate categories of research and apply labels to individual study designs (eg, RCT, cross-sectional). A familiar example is the Design Tree endorsed by the Centre for Evidence-Based Medicine [ 70 ]. Such tools may not be helpful to authors of evidence syntheses for multiple reasons.

Suboptimal levels of agreement and accuracy even among trained methodologists reflect challenges with the application of such tools [ 71 , 72 ]. Problematic distinctions or decision points (eg, experimental or observational, controlled or uncontrolled, prospective or retrospective) and design labels (eg, cohort, case control, uncontrolled trial) have been reported [ 71 ]. The variable application of ambiguous study design labels to non-randomized studies is common, making them especially prone to misclassification [ 73 ]. In addition, study labels do not denote the unique design features that make different types of non-randomized studies susceptible to different biases, including those related to how the data are obtained (eg, clinical trials, disease registries, wearable devices). Given this limitation, it is important to be aware that design labels preclude the accurate assignment of non-randomized studies to a “level of evidence” in traditional hierarchies [ 74 ].

These concerns suggest that available tools and nomenclature used to distinguish types of research evidence may not uniformly apply to biomedical research and non-health fields that utilize evidence syntheses (eg, education, economics) [ 75 , 76 ]. Moreover, primary research reports often do not describe study design or do so incompletely or inaccurately; thus, indexing in PubMed and other databases does not address the potential for misclassification [ 77 ]. Yet proper identification of research evidence has implications for several key components of evidence syntheses. For example, search strategies limited by index terms using design labels or study selection based on labels applied by the authors of primary studies may cause inconsistent or unjustified study inclusions and/or exclusions [ 77 ]. In addition, because risk of bias (RoB) tools consider attributes specific to certain types of studies and study design features, results of these assessments may be invalidated if an inappropriate tool is used. Appropriate classification of studies is also relevant for the selection of a suitable method of synthesis and interpretation of those results.

An alternative to these tools and nomenclature involves application of a few fundamental distinctions that encompass a wide range of research designs and contexts. While these distinctions are not novel, we integrate them into a practical scheme (see Fig. 1) designed to guide authors of evidence syntheses in the basic identification of research evidence. The initial distinction is between primary and secondary studies. Primary studies are then further distinguished by: 1) the type of data reported (qualitative or quantitative); and 2) two defining design features (group or single-case and randomized or non-randomized). The different types of studies and study designs represented in the scheme are described in detail in Additional File 2 B. It is important to conceptualize their methods as complementary as opposed to contrasting or hierarchical [ 78 ]; each offers advantages and disadvantages that determine their appropriateness for answering different kinds of research questions in an evidence synthesis.


Distinguishing types of research evidence
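As a minimal sketch, the distinctions in this scheme can be recorded as separate design features during data extraction rather than as a single design label. The class and field names below are hypothetical, and the features are treated as independent fields, which is an assumption; the published figure may nest them differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StudyFeatures:
    """Design features recorded for one study (hypothetical field names)."""
    is_primary: bool                   # primary vs secondary study
    data_type: Optional[str] = None    # "qualitative" or "quantitative"
    unit: Optional[str] = None         # "group" or "single-case"
    randomized: Optional[bool] = None  # None when not applicable or unknown


def describe(study: StudyFeatures) -> str:
    """Build a plain description from features instead of a design label."""
    if not study.is_primary:
        return "secondary study"
    parts = ["primary study"]
    if study.data_type is not None:
        parts.append(study.data_type)
    if study.unit is not None:
        parts.append(study.unit)
    if study.randomized is not None:
        parts.append("randomized" if study.randomized else "non-randomized")
    return ", ".join(parts)


registry_study = StudyFeatures(True, "quantitative", "group", False)
print(describe(registry_study))
```

Recording features this way keeps eligibility decisions tied to what a study actually did, which is the point of the scheme, rather than to whatever label its authors applied.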

Application of these basic distinctions may avoid some of the potential difficulties associated with study design labels and taxonomies. Nevertheless, debatable methodological issues are raised when certain types of research identified in this scheme are included in an evidence synthesis. We briefly highlight those associated with inclusion of non-randomized studies, case reports and series, and a combination of primary and secondary studies.

Non-randomized studies

When investigating an intervention’s effectiveness, it is important for authors to recognize the uncertainty of observed effects reported by studies with high RoB. Results of statistical analyses that include such studies need to be interpreted with caution in order to avoid misleading conclusions [ 74 ]. Review authors may consider excluding randomized studies with high RoB from meta-analyses. Non-randomized studies of intervention (NRSI) are affected by a greater potential range of biases and thus vary more than RCTs in their ability to estimate a causal effect [ 79 ]. If data from NRSI are synthesized in meta-analyses, it is helpful to separately report their summary estimates [ 6 , 74 ].

Nonetheless, certain design features of NRSI (eg, which parts of the study were prospectively designed) may help to distinguish stronger from weaker ones. Cochrane recommends that authors of a review including NRSI focus on relevant study design features when determining eligibility criteria instead of relying on non-informative study design labels [ 79 , 80 ]. This process is facilitated by a study design feature checklist; guidance on using the checklist is included with developers’ description of the tool [ 73 , 74 ]. Authors collect information about these design features during data extraction and then consider it when making final study selection decisions and when performing RoB assessments of the included NRSI.

Case reports and case series

Correctly identified case reports and case series can contribute evidence not well captured by other designs [ 81 ]; in addition, some topics may be limited to a body of evidence that consists primarily of uncontrolled clinical observations. Murad and colleagues offer a framework for how to include case reports and series in an evidence synthesis [ 82 ]. Distinguishing between cohort studies and case series in these syntheses is important, especially for those that rely on evidence from NRSI. Additional data obtained from studies misclassified as case series can potentially increase the confidence in effect estimates. Mathes and Pieper provide authors of evidence syntheses with specific guidance on distinguishing between cohort studies and case series, but emphasize the increased workload involved [ 77 ].

Primary and secondary studies

Synthesis of combined evidence from primary and secondary studies may provide a broad perspective on the entirety of available literature on a topic. This is, in fact, the recommended strategy for scoping reviews that may include a variety of sources of evidence (eg, CPGs, popular media). However, except for scoping reviews, the synthesis of data from primary and secondary studies is discouraged unless there are strong reasons to justify doing so.

Combining primary and secondary sources of evidence is challenging for authors of other types of evidence syntheses for several reasons [ 83 ]. Assessments of RoB for primary and secondary studies are derived from conceptually different tools, thus obfuscating the ability to make an overall RoB assessment of a combination of these study types. In addition, authors who include primary and secondary studies must devise non-standardized methods for synthesis. Note this contrasts with well-established methods available for updating existing evidence syntheses with additional data from new primary studies [ 84 – 86 ]. However, a new review that synthesizes data from primary and secondary studies raises questions of validity and may unintentionally support a biased conclusion because no existing methodological guidance is currently available [ 87 ].

Recommendations

We suggest that journal editors require authors to identify which type of evidence synthesis they are submitting and reference the specific methodology used for its development. This will clarify the research question and methods for peer reviewers and potentially simplify the editorial process. Editors should announce this practice and include it in the instructions to authors. To decrease bias and apply correct methods, authors must also accurately identify the types of research evidence included in their syntheses.

Part 3. Conduct and reporting

The need to develop criteria to assess the rigor of systematic reviews was recognized soon after the EBM movement began to gain international traction [ 88 , 89 ]. Systematic reviews rapidly became popular, but many were very poorly conceived, conducted, and reported. These problems remain highly prevalent [ 23 ] despite development of guidelines and tools to standardize and improve the performance and reporting of evidence syntheses [ 22 , 28 ]. Table 3.1 provides some historical perspective on the evolution of tools developed specifically for the evaluation of systematic reviews, with or without meta-analysis.

Tools specifying standards for systematic reviews with and without meta-analysis

Tool	Reference
Reporting
Quality of Reporting of Meta-analyses (QUOROM) Statement	Moher 1999 [ ]
Meta-analyses Of Observational Studies in Epidemiology (MOOSE)	Stroup 2000 [ ]
Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)	Moher 2009 [ ]
PRISMA 2020	Page 2021 [ ]
Methodological quality
Overview Quality Assessment Questionnaire (OQAQ)	Oxman and Guyatt 1991 [ ]
Systematic Review Critical Appraisal Sheet	Centre for Evidence-based Medicine 2005 [ ]
A Measurement Tool to Assess Systematic Reviews (AMSTAR)	Shea 2007 [ ]
AMSTAR-2	Shea 2017 [ ]
Risk of bias
Risk of Bias in Systematic Reviews (ROBIS)	Whiting 2016 [ ]

a Currently recommended

b Validated tool for systematic reviews of interventions developed for use by authors of overviews or umbrella reviews

These tools are often interchangeably invoked when referring to the “quality” of an evidence synthesis. However, quality is a vague term that is frequently misused and misunderstood; more precisely, these tools specify different standards for evidence syntheses. Methodological standards address how well a systematic review was designed and performed [ 5 ]. RoB assessments refer to systematic flaws or limitations in the design, conduct, or analysis of research that distort the findings of the review [ 4 ]. Reporting standards help systematic review authors describe the methodology they used and the results of their synthesis in sufficient detail [ 92 ]. It is essential to distinguish between these evaluations: a systematic review may be biased, it may fail to report sufficient information on essential features, or it may exhibit both problems; a thoroughly reported systematic review may still be biased and flawed, while an otherwise unbiased one may suffer from deficient documentation.

We direct attention to the currently recommended tools listed in Table 3.1 but concentrate on AMSTAR-2 (update of AMSTAR [A Measurement Tool to Assess Systematic Reviews]) and ROBIS (Risk of Bias in Systematic Reviews), which evaluate methodological quality and RoB, respectively. For comparison and completeness, we include PRISMA 2020 (update of the 2009 Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement), which offers guidance on reporting standards. The exclusive focus on these three tools is by design; it addresses concerns related to the considerable variability in tools used for the evaluation of systematic reviews [ 28 , 88 , 96 , 97 ]. We highlight the underlying constructs these tools were designed to assess, then describe their components and applications. Their known (or potential) uptake and impact and limitations are also discussed.

Evaluation of conduct

Development

AMSTAR [ 5 ] was in use for a decade prior to the 2017 publication of AMSTAR-2; both provide a broad evaluation of methodological quality of intervention systematic reviews, including flaws arising through poor conduct of the review [ 6 ]. ROBIS, published in 2016, was developed to specifically assess RoB introduced by the conduct of the review; it is applicable to systematic reviews of interventions and several other types of reviews [ 4 ]. Both tools reflect a shift to a domain-based approach as opposed to generic quality checklists. There are a few items unique to each tool; however, similarities between items have been demonstrated [ 98 , 99 ]. AMSTAR-2 and ROBIS are recommended for use by: 1) authors of overviews or umbrella reviews and CPGs to evaluate systematic reviews considered as evidence; 2) authors of methodological research studies to appraise included systematic reviews; and 3) peer reviewers for appraisal of submitted systematic review manuscripts. For authors, these tools may function as teaching aids and inform conduct of their review during its development.

Description

Systematic reviews that include randomized and/or non-randomized studies as evidence can be appraised with AMSTAR-2 and ROBIS. Other characteristics of AMSTAR-2 and ROBIS are summarized in Table 3.2 . Both tools define categories for an overall rating; however, neither tool is intended to generate a total score by simply calculating the number of responses satisfying criteria for individual items [ 4 , 6 ]. AMSTAR-2 focuses on the rigor of a review’s methods irrespective of the specific subject matter. ROBIS places emphasis on a review’s results section; this suggests it may be optimally applied by appraisers with some knowledge of the review’s topic as they may be better equipped to determine if certain procedures (or lack thereof) would impact the validity of a review’s findings [ 98 , 100 ]. Reliability studies show AMSTAR-2 overall confidence ratings strongly correlate with the overall RoB ratings in ROBIS [ 100 , 101 ].

Comparison of AMSTAR-2 and ROBIS

Characteristic	AMSTAR-2	ROBIS
	Extensive	Extensive
Review types	Intervention	Intervention, diagnostic, etiology, prognostic
Domains	7 critical, 9 non-critical	4
Items
Total number	16	29
Response options	Items # 1, 3, 5, 6, 10, 13, 14, 16: rated or Items # 2, 4, 7, 8, 9 : rated or Items # 11 , 12, 15: rated or	24 assessment items: rated 5 items regarding level of concern: rated
Overall rating
Construct	Confidence based on weaknesses in critical domains	Level of concern for risk of bias
Categories	High, moderate, low, critically low	Low, high, unclear

a ROBIS includes an optional first phase to assess the applicability of the review to the research question of interest. The tool may be applicable to other review types in addition to the four specified, although modification of this initial phase will be needed (Personal Communication via email, Penny Whiting, 28 Jan 2022)

b AMSTAR-2 item #9 and #11 require separate responses for RCTs and NRSI
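To illustrate why an overall AMSTAR-2 rating is not a simple item count, the sketch below derives a confidence category from which items were judged unmet. The set of critical item numbers and the thresholds are assumptions drawn from the rating scheme described in the AMSTAR-2 developers' guidance and should be checked against that document; the function name `overall_confidence` is hypothetical.

```python
# Assumed critical items (7 of the 16), per the AMSTAR-2 developers' guidance.
CRITICAL_ITEMS = frozenset({2, 4, 7, 9, 11, 13, 15})
ALL_ITEMS = frozenset(range(1, 17))


def overall_confidence(unmet_items) -> str:
    """Map the set of unmet item numbers to an overall confidence category."""
    unmet = set(unmet_items)
    unknown = unmet - ALL_ITEMS
    if unknown:
        raise ValueError(f"not AMSTAR-2 item numbers: {sorted(unknown)}")
    critical_flaws = len(unmet & CRITICAL_ITEMS)
    noncritical_weaknesses = len(unmet - CRITICAL_ITEMS)
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"


# Two reviews with the same number of unmet items receive different ratings.
print(overall_confidence([1, 3]))  # two non-critical weaknesses
print(overall_confidence([2, 4]))  # two critical flaws
```

Both example reviews would receive the same total score under a naive count, yet they fall into different categories because only one of them has flaws in critical domains. Appraisers still need the guidance document to judge each item; the sketch covers only the final aggregation step.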

Interrater reliability has been shown to be acceptable for AMSTAR-2 [ 6 , 11 , 102 ] and ROBIS [ 4 , 98 , 103 ] but neither tool has been shown to be superior in this regard [ 100 , 101 , 104 , 105 ]. Overall, variability in reliability for both tools has been reported across items, between pairs of raters, and between centers [ 6 , 100 , 101 , 104 ]. The effects of appraiser experience on the results of AMSTAR-2 and ROBIS require further evaluation [ 101 , 105 ]. Updates to both tools should address items shown to be prone to individual appraisers’ subjective biases and opinions [ 11 , 100 ]; this may involve modifications of the current domains and signaling questions as well as incorporation of methods to make an appraiser’s judgments more explicit. Future revisions of these tools may also consider the addition of standards for aspects of systematic review development currently lacking (eg, rating overall certainty of evidence, [ 99 ] methods for synthesis without meta-analysis [ 105 ]) and removal of items that assess aspects of reporting that are thoroughly evaluated by PRISMA 2020.

Application

A good understanding of what is required to satisfy the standards of AMSTAR-2 and ROBIS involves study of the accompanying guidance documents written by the tools’ developers; these contain detailed descriptions of each item’s standards. In addition, accurate appraisal of a systematic review with either tool requires training. Most experts recommend independent assessment by at least two appraisers with a process for resolving discrepancies as well as procedures to establish interrater reliability, such as pilot testing, a calibration phase or exercise, and development of predefined decision rules [ 35 , 99 – 101 , 103 , 104 , 106 ]. These methods may, to some extent, address the challenges associated with the diversity in methodological training, subject matter expertise, and experience using the tools that are likely to exist among appraisers.
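As one way to check agreement during a pilot or calibration exercise, two appraisers' overall ratings for the same set of reviews can be compared with a chance-corrected statistic. The text above does not name a statistic; Cohen's kappa is assumed here as a common choice, and the function name and example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b) -> float:
    """Cohen's kappa for two raters who rated the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Illustrative ratings only, not data from any study.
appraiser_1 = ["high", "low", "low", "high"]
appraiser_2 = ["high", "low", "high", "high"]
print(cohens_kappa(appraiser_1, appraiser_2))
```

Items on which the two lists differ are the ones to take to the discrepancy-resolution process, and a low value during piloting suggests the predefined decision rules need refinement before full appraisal begins.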

The standards of AMSTAR, AMSTAR-2, and ROBIS have been used in many methodological studies and epidemiological investigations. However, the increased publication of overviews or umbrella reviews and CPGs has likely been a greater influence on the widening acceptance of these tools. Critical appraisal of the secondary studies considered evidence is essential to the trustworthiness of both the recommendations of CPGs and the conclusions of overviews. Currently both Cochrane [ 55 ] and JBI [ 107 ] recommend AMSTAR-2 and ROBIS in their guidance for authors of overviews or umbrella reviews. However, ROBIS and AMSTAR-2 were released in 2016 and 2017, respectively; thus, to date, limited data have been reported about the uptake of these tools or which of the two may be preferred [ 21 , 106 ]. Currently, in relation to CPGs, AMSTAR-2 appears to be overwhelmingly popular compared to ROBIS. A Google Scholar search of this topic (search terms “AMSTAR 2 AND clinical practice guidelines,” “ROBIS AND clinical practice guidelines” 13 May 2022) found 12,700 hits for AMSTAR-2 and 1,280 for ROBIS. The apparent greater appeal of AMSTAR-2 may relate to its longer track record given the original version of the tool was in use for 10 years prior to its update in 2017.

Barriers to the uptake of AMSTAR-2 and ROBIS include the real or perceived time and resources necessary to complete the items they include and appraisers’ confidence in their own ratings [ 104 ]. Reports from comparative studies available to date indicate that appraisers find AMSTAR-2 questions, responses, and guidance to be clearer and simpler compared with ROBIS [ 11 , 101 , 104 , 105 ]. This suggests that for appraisal of intervention systematic reviews, AMSTAR-2 may be a more practical tool than ROBIS, especially for novice appraisers [ 101 , 103 – 105 ]. The unique characteristics of each tool, as well as their potential advantages and disadvantages, should be taken into consideration when deciding which tool should be used for an appraisal of a systematic review. In addition, the choice of one or the other may depend on how the results of an appraisal will be used; for example, a peer reviewer’s appraisal of a single manuscript versus an appraisal of multiple systematic reviews in an overview or umbrella review, CPG, or systematic methodological study.

Authors of overviews and CPGs report results of AMSTAR-2 and ROBIS appraisals for each of the systematic reviews they include as evidence. Ideally, an independent judgment of their appraisals can be made by the end users of overviews and CPGs; however, most stakeholders, including clinicians, are unlikely to have a sophisticated understanding of these tools. Nevertheless, they should at least be aware that AMSTAR-2 and ROBIS ratings reported in overviews and CPGs may be inaccurate because the tools are not applied as intended by their developers. This can result from inadequate training of the overview or CPG authors who perform the appraisals, or from modifications of the appraisal tools imposed by them. The potential variability in overall confidence and RoB ratings highlights why appraisers applying these tools need to support their judgments with explicit documentation; this allows readers to judge for themselves whether they agree with the criteria used by appraisers [ 4 , 108 ]. When these judgments are explicit, the underlying rationale used when applying these tools can be assessed [ 109 ].

Theoretically, we would expect an association of AMSTAR-2 with improved methodological rigor and an association of ROBIS with lower RoB in recent systematic reviews compared to those published before 2017. To our knowledge, this has not yet been demonstrated; however, like reports about the actual uptake of these tools, time will tell. Additional data on user experience are also needed to further elucidate the practical challenges and methodological nuances encountered with the application of these tools. This information could potentially inform the creation of unifying criteria to guide and standardize the appraisal of evidence syntheses [ 109 ].

Evaluation of reporting

Complete reporting is essential for users to establish the trustworthiness and applicability of a systematic review’s findings. Efforts to standardize and improve the reporting of systematic reviews resulted in the 2009 publication of the PRISMA statement [ 92 ] with its accompanying explanation and elaboration document [ 110 ]. This guideline was designed to help authors prepare a complete and transparent report of their systematic review. In addition, adherence to PRISMA is often used to evaluate the thoroughness of reporting of published systematic reviews [ 111 ]. The updated version, PRISMA 2020 [ 93 ], and its guidance document [ 112 ] were published in 2021. Items on the original and updated versions of PRISMA are organized by the six basic review components they address (title, abstract, introduction, methods, results, discussion). The PRISMA 2020 update is a considerably expanded version of the original; it includes standards and examples for the 27 original and 13 additional reporting items that capture methodological advances and may enhance the replicability of reviews [ 113 ].

The original PRISMA statement fostered the development of various PRISMA extensions (Table 3.3 ). These include reporting guidance for scoping reviews and reviews of diagnostic test accuracy and for intervention reviews that report on the following: harms outcomes, equity issues, the effects of acupuncture, the results of network meta-analyses and analyses of individual participant data. Detailed reporting guidance for specific systematic review components (abstracts, protocols, literature searches) is also available.

PRISMA extensions


PRISMA for systematic reviews with a focus on health equity [ ]	PRISMA-E	2012
Reporting systematic reviews in journal and conference abstracts [ ]	PRISMA for Abstracts	2015; 2020
PRISMA for systematic review protocols [ ]	PRISMA-P	2015
PRISMA for Network Meta-Analyses [ ]	PRISMA-NMA	2015
PRISMA for Individual Participant Data [ ]	PRISMA-IPD	2015
PRISMA for reviews including harms outcomes [ ]	PRISMA-Harms	2016
PRISMA for diagnostic test accuracy [ ]	PRISMA-DTA	2018
PRISMA for scoping reviews [ ]	PRISMA-ScR	2018
PRISMA for acupuncture [ ]	PRISMA-A	2019
PRISMA for reporting literature searches [ ]	PRISMA-S	2021

PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses

a Note the abstract reporting checklist is now incorporated into PRISMA 2020 [ 93 ]

Uptake and impact

The 2009 PRISMA standards [ 92 ] for reporting have been widely endorsed by authors, journals, and EBM-related organizations. We anticipate the same for PRISMA 2020 [ 93 ] given its co-publication in multiple high-impact journals. However, to date, there is a lack of strong evidence for an association between improved systematic review reporting and endorsement of PRISMA 2009 standards [ 43 , 111 ]. Most journals require that a PRISMA checklist accompany submissions of systematic review manuscripts. However, the accuracy of information presented on these self-reported checklists is not necessarily verified. It remains unclear which strategies (eg, authors’ self-report of checklists, peer reviewer checks) might improve adherence to the PRISMA reporting standards; in addition, the feasibility of any potentially effective strategies must be taken into consideration given the structure and limitations of current research and publication practices [ 124 ].

Pitfalls and limitations of PRISMA, AMSTAR-2, and ROBIS

Misunderstanding of the roles of these tools and their misapplication may be widespread problems. PRISMA 2020 is a reporting guideline that is most beneficial if consulted when developing a review as opposed to merely completing a checklist when submitting to a journal; at that point, the review is finished, with good or bad methodological choices. Moreover, PRISMA checklists evaluate how completely an element of review conduct was reported but do not evaluate the caliber of conduct or performance of a review. Thus, review authors and readers should not think that a rigorous systematic review can be produced by simply following the PRISMA 2020 guidelines. Similarly, it is important to recognize that AMSTAR-2 and ROBIS are tools to evaluate the conduct of a review but do not substitute for conceptual methodological guidance. In addition, they are not intended to be simple checklists. In fact, they have the potential for misuse or abuse if applied as such; for example, by calculating a total score to make a judgment about a review’s overall confidence or RoB. Proper selection of a response for the individual items on AMSTAR-2 and ROBIS requires training or at least reference to their accompanying guidance documents.

Not surprisingly, it has been shown that compliance with the PRISMA checklist is not necessarily associated with satisfying the standards of ROBIS [ 125 ]. AMSTAR-2 and ROBIS were not available when PRISMA 2009 was developed; however, they were considered in the development of PRISMA 2020 [ 113 ]. Therefore, future studies may show a positive relationship between fulfillment of PRISMA 2020 standards for reporting and meeting the standards of tools evaluating methodological quality and RoB.

Choice of an appropriate tool for the evaluation of a systematic review first involves identification of the underlying construct to be assessed. For systematic reviews of interventions, recommended tools include AMSTAR-2 and ROBIS for appraisal of conduct and PRISMA 2020 for completeness of reporting. All three tools were developed rigorously and provide easily accessible and detailed user guidance, which is necessary for their proper application and interpretation. When considering a manuscript for publication, training in these tools can sensitize peer reviewers and editors to major issues that may affect the review’s trustworthiness and completeness of reporting. Judgment of the overall certainty of a body of evidence and formulation of recommendations rely, in part, on AMSTAR-2 or ROBIS appraisals of systematic reviews. Therefore, training on the application of these tools is essential for authors of overviews and developers of CPGs. Peer reviewers and editors considering an overview or CPG for publication must hold their authors to a high standard of transparency regarding both the conduct and reporting of these appraisals.

Part 4. Meeting conduct standards

Many authors, peer reviewers, and editors erroneously equate fulfillment of the items on the PRISMA checklist with superior methodological rigor. For direction on methodology, we refer them to available resources that provide comprehensive conceptual guidance [ 59 , 60 ] as well as primers with basic step-by-step instructions [ 1 , 126 , 127 ]. This section is intended to complement study of such resources by facilitating use of AMSTAR-2 and ROBIS, tools specifically developed to evaluate methodological rigor of systematic reviews. These tools are widely accepted by methodologists; however, in the general medical literature, they are not uniformly selected for the critical appraisal of systematic reviews [ 88 , 96 ].

To enable their uptake, Table 4.1 links review components to the corresponding appraisal tool items. Expectations of AMSTAR-2 and ROBIS are concisely stated, and reasoning provided.

Systematic review components linked to appraisal with AMSTAR-2 and ROBIS a

Systematic review component	AMSTAR-2 item b	ROBIS item	Expectations	Reasoning



Methods for study selection	#5	#2.5	All three components must be done in duplicate, and methods fully described.	Helps to mitigate CoI and bias; also may improve accuracy.
Methods for data extraction	#6	#3.1
Methods for RoB assessment	NA	#3.5

Study description	#8	#3.2	Research design features, components of research question (eg, PICO), setting, funding sources.	Allows readers to understand the individual studies in detail.


Sources of funding	#10	NA	Identified for all included studies.	Can reveal CoI or bias.

Publication bias	#15*	#4.5	Explored, diagrammed, and discussed.	Publication and other selective reporting biases are major threats to the validity of systematic reviews.
Author CoI	#16	NA	Disclosed, with management strategies described.	If CoI is identified, management strategies must be described to ensure confidence in the review.

CoI conflict of interest, MA meta-analysis, NA not addressed, PICO participant, intervention, comparison, outcome, PRISMA-P Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols, RoB risk of bias

a Components shown in bold are chosen for elaboration in Part 4 for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors; and/or 2) the component is evaluated by standards of an AMSTAR-2 “critical” domain

b Critical domains of AMSTAR-2 are indicated by *

Issues involved in meeting the standards for seven review components (identified in bold in Table 4.1 ) are addressed in detail. These were chosen for elaboration for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors based on consistent reports of their frequent AMSTAR-2 or ROBIS deficiencies [ 9 , 11 , 15 , 88 , 128 , 129 ]; and/or 2) the review component is judged by standards of an AMSTAR-2 “critical” domain. These have the greatest implications for how a systematic review will be appraised: a weakness in even one critical domain lowers the overall rating to “low confidence,” and weaknesses in more than one lower it to “critically low confidence.”

Research question

Specific and unambiguous research questions may have more value for reviews that deal with hypothesis testing. Mnemonics for the various elements of research questions are suggested by JBI and Cochrane (Table 2.1 ). These prompt authors to consider the specialized methods involved for developing different types of systematic reviews; however, while inclusion of the suggested elements makes a review compliant with a particular review’s methods, it does not necessarily make a research question appropriate. Table 4.2 lists acronyms that may aid in developing the research question. They include overlapping concepts of importance in this time of proliferating reviews of uncertain value [ 130 ]. If these issues are not prospectively contemplated, systematic review authors may establish an overly broad scope, or develop runaway scope allowing them to stray from predefined choices relating to key comparisons and outcomes.

Research question development

Acronym	Meaning
FINER a	feasible, interesting, novel, ethical, and relevant
SMART b	specific, measurable, attainable, relevant, timely
TOPICS + M c	time, outcomes, population, intervention, context, study design, plus (effect) moderators

a Cummings SR, Browner WS, Hulley SB. Conceiving the research question and developing the study plan. In: Hulley SB, Cummings SR, Browner WS, editors. Designing clinical research: an epidemiological approach; 4th edn. Lippincott Williams & Wilkins; 2007. p. 14–22

b Doran, GT. There’s a S.M.A.R.T. way to write management’s goals and objectives. Manage Rev. 1981;70:35-6.

c Johnson BT, Hennessy EA. Systematic reviews and meta-analyses in the health sciences: best practice methods for research syntheses. Soc Sci Med. 2019;233:237–51

Once a research question is established, searching on registry sites and databases for existing systematic reviews addressing the same or a similar topic is necessary in order to avoid contributing to research waste [ 131 ]. Repeating an existing systematic review must be justified, for example, if previous reviews are out of date or methodologically flawed. A full discussion on replication of intervention systematic reviews, including a consensus checklist, can be found in the work of Tugwell and colleagues [ 84 ].

Protocol development is considered a core component of systematic reviews [ 125 , 126 , 132 ]. Review protocols may allow researchers to plan and anticipate potential issues, assess validity of methods, prevent arbitrary decision-making, and minimize bias that can be introduced by the conduct of the review. Registration of a protocol that allows public access promotes transparency of the systematic review’s methods and processes and reduces the potential for duplication [ 132 ]. Thinking early and carefully about all the steps of a systematic review is pragmatic and logical and may mitigate the influence of the authors’ prior knowledge of the evidence [ 133 ]. In addition, the protocol stage is when the scope of the review can be carefully considered by authors, reviewers, and editors; this may help to avoid production of overly ambitious reviews that include excessive numbers of comparisons and outcomes or are undisciplined in their study selection.

An association with attainment of AMSTAR standards in systematic reviews with published prospective protocols has been reported [ 134 ]. However, completeness of reporting does not seem to be different in reviews with a protocol compared to those without one [ 135 ]. PRISMA-P [ 116 ] and its accompanying elaboration and explanation document [ 136 ] can be used to guide and assess the reporting of protocols. A final version of the review should fully describe any protocol deviations. Peer reviewers may compare the submitted manuscript with any available pre-registered protocol; this is required if AMSTAR-2 or ROBIS are used for critical appraisal.

There are multiple options for the recording of protocols (Table 4.3 ). Some journals will peer review and publish protocols. In addition, many online sites offer date-stamped and publicly accessible protocol registration. Some of these are exclusively for protocols of evidence syntheses; others are less restrictive and offer researchers the capacity for data storage, sharing, and other workflow features. These sites document protocol details to varying extents and have different requirements [ 137 ]. The most popular site for systematic reviews, the International Prospective Register of Systematic Reviews (PROSPERO), for example, only registers reviews that report on an outcome with direct relevance to human health. The PROSPERO record documents protocols for all types of reviews except literature and scoping reviews. Of note, PROSPERO requires authors register their review protocols prior to any data extraction [ 133 , 138 ]. The electronic records of most of these registry sites allow authors to update their protocols and facilitate transparent tracking of protocol changes, which are not unexpected during the progress of the review [ 139 ].

Options for protocol registration of evidence syntheses


BMJ Open
BioMed Central
JMIR Research Protocols
World Journal of Meta-analysis

Cochrane
JBI
PROSPERO
Research Registry- Registry of Systematic Reviews/Meta-Analyses
International Platform of Registered Systematic Review and Meta-analysis Protocols (INPLASY)

Center for Open Science
Protocols.io

Figshare
Open Science Framework
Zenodo

a Authors are advised to contact their target journal regarding submission of systematic review protocols

b Registration is restricted to approved review projects

c The JBI registry lists review projects currently underway by JBI-affiliated entities. These records include a review’s title, primary author, research question, and PICO elements. JBI recommends that authors register eligible protocols with PROSPERO

d See Pieper and Rombey [ 137 ] for detailed characteristics of these five registries

e See Pieper and Rombey [ 137 ] for other systematic review data repository options

Study design inclusion

For most systematic reviews, broad inclusion of study designs is recommended [ 126 ]. This may allow comparison of results between contrasting study design types [ 126 ]. Certain study designs may be considered preferable depending on the type of review and nature of the research question. However, prevailing stereotypes about what each study design does best may not be accurate. For example, in systematic reviews of interventions, randomized designs are typically thought to answer highly specific questions while non-randomized designs often are expected to reveal greater information about harms or real-world evidence [ 126 , 140 , 141 ]. This may be a false distinction; randomized trials may be pragmatic [ 142 ], they may offer important (and less biased) information on harms [ 143 ], and data from non-randomized trials may not necessarily be more real-world-oriented [ 144 ].

Moreover, there may not be any available evidence reported by RCTs for certain research questions; in some cases, there may not be any RCTs or NRSI. When the available evidence is limited to case reports and case series, it is not possible to test hypotheses or to provide descriptive estimates or associations; however, a systematic review of these studies can still offer important insights [ 81 , 145 ]. When authors anticipate that limited evidence of any kind may be available to inform their research questions, a scoping review can be considered. Alternatively, decisions regarding inclusion of indirect as opposed to direct evidence can be addressed during protocol development [ 146 ]. Including indirect evidence at an early stage of intervention systematic review development allows authors to decide if such studies offer any additional and/or different understanding of treatment effects for their population or comparison of interest. Issues of indirectness of included studies are accounted for later in the process, during determination of the overall certainty of evidence (see Part 5 for details).

Evidence search

Both AMSTAR-2 and ROBIS require systematic and comprehensive searches for evidence. This is essential for any systematic review. Both tools discourage search restrictions based on language and publication source. Given increasing globalism in health care, the practice of including English-only literature should be avoided [ 126 ]. There are many examples in which language bias (different results in studies published in different languages) has been documented [ 147 , 148 ]. This does not mean that all literature, in all languages, is equally trustworthy [ 148 ]; however, the only way to formally probe for the potential of such biases is to consider all languages in the initial search. The gray literature and a search of trials may also reveal important details about topics that would otherwise be missed [ 149 – 151 ]. Again, inclusiveness will allow review authors to investigate whether results differ in gray literature and trials [ 41 , 151 – 153 ].

Authors should make every attempt to complete their review within one year, as that is the likely viable life of a search. If that is not possible, the search should be updated close to the time of completion [ 154 ]. Some research topics may warrant less of a delay; for example, in rapidly changing fields (as in the case of the COVID-19 pandemic), even one month may radically change the available evidence.
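The one-year rule of thumb can be checked mechanically when a review is being finalized. The following minimal Python sketch is illustrative only; the function name, the 365-day default, and the dates are assumptions chosen for the example, and the shorter 30-day threshold simply mirrors the point about rapidly changing fields.

```python
from datetime import date

def search_needs_update(last_search: date, planned_completion: date,
                        max_age_days: int = 365) -> bool:
    """Return True when the search will be older than the allowed age at completion."""
    return (planned_completion - last_search).days > max_age_days

# Hypothetical dates for illustration
print(search_needs_update(date(2022, 3, 1), date(2023, 6, 15)))       # default one-year window
print(search_needs_update(date(2021, 1, 10), date(2021, 3, 1), 30))   # rapidly changing field
```

The appropriate threshold remains a judgment for the review team; the sketch only makes that judgment explicit and repeatable.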

Excluded studies

AMSTAR-2 requires authors to provide references for any studies excluded at the full text phase of study selection along with reasons for exclusion; this allows readers to feel confident that all relevant literature has been considered for inclusion and that exclusions are defensible.

Risk of bias assessment of included studies

The design of the studies included in a systematic review (eg, RCT, cohort, case series) should not be equated with appraisal of their RoB. To meet AMSTAR-2 and ROBIS standards, systematic review authors must examine RoB issues specific to the design of each primary study they include as evidence. It is unlikely that a single RoB appraisal tool will be suitable for all research designs. In addition to tools for randomized and non-randomized studies, specific tools are available for evaluation of RoB in case reports and case series [ 82 ] and single-case experimental designs [ 155 , 156 ]. Note that the RoB tools selected must meet the standards of the appraisal tool used to judge the conduct of the review. For example, AMSTAR-2 identifies four sources of bias specific to RCTs and NRSI that must be addressed by the RoB tool(s) chosen by the review authors. The Cochrane RoB-2 tool [ 157 ] for RCTs and ROBINS-I [ 158 ] for NRSI meet the AMSTAR-2 standards. Appraisers on the review team should not modify any RoB tool without complete transparency and acknowledgment that they have invalidated the interpretation of the tool as intended by its developers [ 159 ]. Conduct of RoB assessments is not addressed by AMSTAR-2; to meet ROBIS standards, two independent reviewers should complete RoB assessments of included primary studies.

Implications of the RoB assessments must be explicitly discussed and considered in the conclusions of the review. Discussion of the overall RoB of included studies may consider the weight of the studies at high RoB, the importance of the sources of bias in the studies being summarized, and if their importance differs in relationship to the outcomes reported. If a meta-analysis is performed, serious concerns for RoB of individual studies should be accounted for in these results as well. If the results of the meta-analysis for a specific outcome change when studies at high RoB are excluded, readers will have a more accurate understanding of this body of evidence. However, while investigating the potential impact of specific biases is a useful exercise, it is important to avoid over-interpretation, especially when there are sparse data.
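One way to examine whether a pooled result depends on studies at high RoB is to repeat the synthesis with those studies removed and compare the two estimates. The sketch below assumes a simple inverse-variance (fixed-effect) pooling of log odds ratios; the study data, RoB labels, and function names are hypothetical and serve only to illustrate the comparison.

```python
import math

def pool_fixed(effects, variances):
    """Inverse-variance weighted average and its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se

# Hypothetical studies: (label, log odds ratio, variance, RoB judgment)
studies = [
    ("Study A", -0.40, 0.04, "low"),
    ("Study B", -0.35, 0.05, "low"),
    ("Study C", -0.90, 0.10, "high"),
    ("Study D", -0.20, 0.08, "some concerns"),
]

def sensitivity_by_rob(studies):
    """Pool all studies, then pool again without those judged at high RoB."""
    everything = pool_fixed([s[1] for s in studies], [s[2] for s in studies])
    kept = [s for s in studies if s[3] != "high"]
    restricted = pool_fixed([s[1] for s in kept], [s[2] for s in kept])
    return everything, restricted

(all_pooled, all_se), (restr_pooled, restr_se) = sensitivity_by_rob(studies)
print(f"All studies:        {all_pooled:.3f} (SE {all_se:.3f})")
print(f"Excluding high RoB: {restr_pooled:.3f} (SE {restr_se:.3f})")
```

With only a handful of studies, a shift of this kind should be read cautiously, in keeping with the warning above about over-interpretation when data are sparse.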

Synthesis methods for quantitative data

Syntheses of quantitative data reported by primary studies are broadly categorized as one of two types: meta-analysis, and synthesis without meta-analysis (Table 4.4 ). Before deciding on one of these methods, authors should seek methodological advice about whether reported data can be transformed or used in other ways to provide a consistent effect measure across studies [ 160 , 161 ].

Common methods for quantitative synthesis



Aggregate data; individual participant data	Weighted average of effect estimates	Pairwise comparisons of effect estimates, CI; overall effect estimate, CI, P value; evaluation of heterogeneity	Forest plot with summary statistic for average effect estimate
Network	Variable	The interventions, which are compared directly and indirectly	Network diagram or graph, tabular presentations
		Comparisons of relative effects between any pair of interventions	Effect estimates for intervention pairings
		Summary relative effects for pair-wise comparisons with evaluations of inconsistency and heterogeneity	Forest plot, other methods
		Treatment rankings (ie, probability that an intervention is among the best options)	Rankogram plot
	Summarizing effect estimates from separate studies (without combination that would provide an average effect estimate)	Range and distribution of observed effects such as median, interquartile range, range	Box-and-whisker plot, bubble plot; forest plot (without summary effect estimate)
	Combining P values	Combined P value, number of studies	Albatross plot (study sample size against P values per outcome)
	Vote counting by direction of effect (eg, favors intervention over the comparator)	Proportion of studies with an effect in the direction of interest, CI, P value	Harvest plot, effect direction plot

CI confidence interval (or credible interval, if analysis is done in Bayesian framework)

a See text for descriptions of the types of data combined in each of these approaches

b See Additional File 4 for guidance on the structure and presentation of forest plots

c General approach is similar to aggregate data meta-analysis but there are substantial differences relating to data collection and checking and analysis [ 162 ]. This approach to syntheses is applicable to intervention, diagnostic, and prognostic systematic reviews [ 163 ]

d Examples include meta-regression, hierarchical and multivariate approaches [ 164 ]

e In-depth guidance and illustrations of these methods are provided in Chapter 12 of the Cochrane Handbook [ 160 ]

Meta-analysis

Systematic reviews that employ meta-analysis should not be referred to simply as “meta-analyses.” The term meta-analysis strictly refers to a specific statistical technique used when study effect estimates and their variances are available, yielding a quantitative summary of results. In general, methods for meta-analysis involve use of a weighted average of effect estimates from two or more studies. If considered carefully, meta-analysis increases the precision of the estimated magnitude of effect and can offer useful insights about heterogeneity and estimates of effects. We refer to standard references for a thorough introduction and formal training [ 165 – 167 ].
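As a sketch of the weighted average described above, the most common (inverse-variance, fixed-effect) form can be written as follows. The notation is ours rather than the source's: $y_i$ is the effect estimate from study $i$, $v_i$ is its variance, and $k$ is the number of studies.

```latex
\hat{\theta} = \frac{\sum_{i=1}^{k} w_i\, y_i}{\sum_{i=1}^{k} w_i},
\qquad w_i = \frac{1}{v_i},
\qquad \operatorname{SE}(\hat{\theta}) = \sqrt{\frac{1}{\sum_{i=1}^{k} w_i}},
\qquad 95\%\ \text{CI} = \hat{\theta} \pm 1.96\,\operatorname{SE}(\hat{\theta})
```

Studies with smaller variances (typically larger studies) receive more weight, and the pooled standard error is smaller than that of any single study, which is the sense in which precision increases. Random-effects models use the same form with an estimate of between-study variance added to each $v_i$.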

There are three common approaches to meta-analysis in current health care–related systematic reviews (Table 4.4 ). Aggregate data meta-analysis is the most familiar to authors of evidence syntheses and their end users. This standard meta-analysis combines data on effect estimates reported by studies that investigate similar research questions involving direct comparisons of an intervention and comparator. Results of these analyses provide a single summary intervention effect estimate. If the included studies in a systematic review measure an outcome differently, their reported results may be transformed to make them comparable [ 161 ]. Forest plots visually present essential information about the individual studies and the overall pooled analysis (see Additional File 4 for details).
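One commonly used transformation for continuous outcomes measured on different scales is the standardized mean difference, which expresses each study's result in units of its pooled standard deviation. The following sketch is illustrative; the two studies and their summary statistics are hypothetical, and the choice of effect measure in a real review should follow the methodological guidance cited above.

```python
import math

def standardized_mean_difference(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (m_t - m_c) / math.sqrt(pooled_var)

# Hypothetical study 1 reports pain on a 0-10 scale
d1 = standardized_mean_difference(4.0, 2.0, 50, 5.0, 2.0, 50)
# Hypothetical study 2 reports pain on a 0-100 scale
d2 = standardized_mean_difference(40.0, 20.0, 60, 50.0, 20.0, 60)

print(round(d1, 3), round(d2, 3))
```

Although the raw mean differences differ by a factor of ten, both hypothetical studies yield the same standardized effect, so they can be combined on a common scale.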

Less familiar and more challenging meta-analytical approaches used in secondary research include individual participant data (IPD) and network meta-analyses (NMA); PRISMA extensions provide reporting guidelines for both [ 117 , 118 ]. In IPD, the raw data on each participant from each eligible study are re-analyzed as opposed to the study-level data analyzed in aggregate data meta-analyses [ 168 ]. This may offer advantages, including the potential for limiting concerns about bias and allowing more robust analyses [ 163 ]. As suggested by the description in Table 4.4 , NMA is a complex statistical approach. It combines aggregate data [ 169 ] or IPD [ 170 ] for effect estimates from direct and indirect comparisons reported in two or more studies of three or more interventions. This makes it a potentially powerful statistical tool; while multiple interventions are typically available to treat a condition, few have been evaluated in head-to-head trials [ 171 ]. Both IPD and NMA facilitate a broader scope, and potentially provide more reliable and/or detailed results; however, compared with standard aggregate data meta-analyses, their methods are more complicated, time-consuming, and resource-intensive, and they have their own biases, so one needs sufficient funding, technical expertise, and preparation to employ them successfully [ 41 , 172 , 173 ].
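The idea of an indirect comparison can be illustrated with its simplest building block: when interventions A and C have each been compared with a common comparator B but not with each other, an A versus C estimate can be derived from the two direct estimates. The sketch below assumes effects on an additive scale (eg, log odds ratios) and independent sources of evidence; the numbers and the function name are hypothetical, and a full NMA involves considerably more than this single step.

```python
import math

def indirect_comparison(d_ab, se_ab, d_cb, se_cb):
    """Indirect estimate of A vs C through a common comparator B."""
    d_ac = d_ab - d_cb                        # effects subtract on the additive scale
    se_ac = math.sqrt(se_ab**2 + se_cb**2)    # variances of independent estimates add
    return d_ac, se_ac

# Hypothetical log odds ratios: A vs B and C vs B
d_ac, se_ac = indirect_comparison(-0.50, 0.20, -0.20, 0.15)
print(f"A vs C (indirect): {d_ac:.2f} (SE {se_ac:.2f})")
```

The indirect estimate is less precise than either direct comparison because the variances add; an NMA combines such indirect evidence with any available direct evidence and evaluates their consistency.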

Several items in AMSTAR-2 and ROBIS address meta-analysis; thus, understanding the strengths, weaknesses, assumptions, and limitations of methods for meta-analyses is important. According to the standards of both tools, plans for a meta-analysis must be addressed in the review protocol, including reasoning, description of the type of quantitative data to be synthesized, and the methods planned for combining the data. This should not consist of stock statements describing conventional meta-analysis techniques; rather, authors are expected to anticipate issues specific to their research questions. Concern for the lack of training in meta-analysis methods among systematic review authors cannot be overstated. For those with training, the use of popular software (eg, RevMan [ 174 ], MetaXL [ 175 ], JBI SUMARI [ 176 ]) may facilitate exploration of these methods; however, such programs cannot substitute for the accurate interpretation of the results of meta-analyses, especially for more complex meta-analytical approaches.

Synthesis without meta-analysis

There are varied reasons a meta-analysis may not be appropriate or desirable [ 160 , 161 ]. Syntheses that informally use statistical methods other than meta-analysis are variably referred to as descriptive, narrative, or qualitative syntheses or summaries; these terms are also applied to syntheses that make no attempt to statistically combine data from individual studies. However, use of such imprecise terminology is discouraged; in order to fully explore the results of any type of synthesis, some narration or description is needed to supplement the data visually presented in tabular or graphic forms [ 63 , 177 ]. In addition, the term “qualitative synthesis” is easily confused with a synthesis of qualitative data in a qualitative or mixed methods review. “Synthesis without meta-analysis” is currently the preferred description of other ways to combine quantitative data from two or more studies. Use of this specific terminology when referring to these types of syntheses also implies the application of formal methods (Table 4.4 ).

Methods for syntheses without meta-analysis involve structured presentations of the data in any tables and plots. In comparison to narrative descriptions of each study, these are designed to more effectively and transparently show patterns and convey detailed information about the data; they also allow informal exploration of heterogeneity [ 178 ]. In addition, acceptable quantitative statistical methods (Table 4.4 ) are formally applied; however, it is important to recognize these methods have significant limitations for the interpretation of the effectiveness of an intervention [ 160 ]. Nevertheless, when meta-analysis is not possible, the application of these methods is less prone to bias compared with an unstructured narrative description of included studies [ 178 , 179 ].
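As an illustration of a structured summary that does not pool results, the following sketch describes the distribution of observed effects using the median, interquartile range, and range. The effect estimates are hypothetical, and the function name is our own; the quartiles use the default method of Python's statistics module, which is only one of several conventions.

```python
import statistics

# Hypothetical effect estimates (eg, mean differences) from included studies
effects = [-0.90, -0.40, -0.35, -0.20, 0.10]

def summarize_effects(values):
    """Describe the distribution of observed effects without pooling them."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "n_studies": len(values),
        "median": statistics.median(values),
        "iqr": (q1, q3),
        "range": (min(values), max(values)),
    }

summary = summarize_effects(effects)
print(summary)
```

A summary of this kind conveys the spread of effects but, as noted above, says little about the precision or certainty of any overall effect.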

Vote counting is commonly used in systematic reviews and involves a tally of studies reporting results that meet some threshold of importance applied by review authors. Until recently, it has not typically been identified as a method for synthesis without meta-analysis. Guidance on an acceptable vote counting method based on direction of effect is currently available [ 160 ] and should be used instead of narrative descriptions of such results (eg, “more than half the studies showed improvement”; “only a few studies reported adverse effects”; “7 out of 10 studies favored the intervention”). Unacceptable methods include vote counting by statistical significance or magnitude of effect or some subjective rule applied by the authors.
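A minimal sketch of vote counting by direction of effect follows. It classifies each study only by the sign of its effect estimate, ignoring statistical significance and magnitude, and tests the resulting proportion against 0.5 with an exact two-sided binomial (sign) test. The data, the function name, and the decision to drop effects of exactly zero are assumptions made for illustration; a confidence interval for the proportion is omitted for brevity.

```python
from math import comb

def vote_count_direction(effects, favours_negative=True):
    """Count studies by direction of effect and run an exact sign test."""
    nonzero = [e for e in effects if e != 0]   # assumption: null effects are dropped
    n = len(nonzero)
    if n == 0:
        raise ValueError("No studies with a non-zero effect")
    k = sum(1 for e in nonzero if (e < 0) == favours_negative)
    # Two-sided exact binomial test against a proportion of 0.5
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2**n
    p_value = min(1.0, 2 * tail)
    return k, n, k / n, p_value

# Hypothetical effect estimates; negative values favor the intervention
effects = [-0.9, -0.4, -0.35, -0.2, -0.1, -0.6, -0.05, -0.3, 0.1, 0.25]
k, n, proportion, p_value = vote_count_direction(effects)
print(f"{k}/{n} studies favor the intervention (P = {p_value:.3f})")
```

Even when applied correctly, this method provides no information on the size of the effect, which is one of the limitations noted above.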

AMSTAR-2 and ROBIS standards do not explicitly address conduct of syntheses without meta-analysis, although AMSTAR-2 items 13 and 14 might be considered relevant. Guidance for the complete reporting of syntheses without meta-analysis for systematic reviews of interventions is available in the Synthesis without Meta-analysis (SWiM) guideline [ 180 ] and methodological guidance is available in the Cochrane Handbook [ 160 , 181 ].

Familiarity with AMSTAR-2 and ROBIS makes sense for authors of systematic reviews as these appraisal tools will be used to judge their work; however, training is necessary for authors to truly appreciate and apply methodological rigor. Moreover, judgment of the potential contribution of a systematic review to the current knowledge base goes beyond meeting the standards of AMSTAR-2 and ROBIS. These tools do not explicitly address some crucial concepts involved in the development of a systematic review; this further emphasizes the need for author training.

We recommend that systematic review authors incorporate specific practices or exercises when formulating a research question at the protocol stage. These should be designed to raise the review team’s awareness of how to prevent research and resource waste [ 84 , 130 ] and to stimulate careful contemplation of the scope of the review [ 30 ]. Authors’ training should also focus on justifiably choosing a formal method for the synthesis of quantitative and/or qualitative data from primary research; both types of data require specific expertise. For typical reviews that involve syntheses of quantitative data, statistical expertise is necessary, initially for decisions about appropriate methods [ 160 , 161 ] and then to inform any meta-analyses [ 167 ] or other statistical methods applied [ 160 ].

Part 5. Rating overall certainty of evidence

Report of an overall certainty of evidence assessment in a systematic review is an important new reporting standard of the updated PRISMA 2020 guidelines [ 93 ]. Systematic review authors are well acquainted with assessing RoB in individual primary studies, but much less familiar with assessment of overall certainty across an entire body of evidence. Yet a reliable way to evaluate this broader concept is now recognized as a vital part of interpreting the evidence.

Historical systems for rating evidence are based on study design and usually involve hierarchical levels or classes of evidence that use numbers and/or letters to designate the level/class. These systems were endorsed by various EBM-related organizations. Professional societies and regulatory groups then widely adopted them, often with modifications for application to the available primary research base in specific clinical areas. In 2002, a report issued by the AHRQ identified 40 systems to rate quality of a body of evidence [ 182 ]. A critical appraisal of systems used by prominent health care organizations published in 2004 revealed limitations in sensibility, reproducibility, applicability to different questions, and usability to different end users [ 183 ]. Persistent use of hierarchical rating schemes to describe overall quality continues to complicate the interpretation of evidence. This is indicated by recent reports of poor interpretability of systematic review results by readers [ 184 – 186 ] and misleading interpretations of the evidence related to the “spin” systematic review authors may put on their conclusions [ 50 , 187 ].

Recognition of the shortcomings of hierarchical rating systems raised concerns that misleading clinical recommendations could result even if based on a rigorous systematic review. In addition, the number and variability of these systems were considered obstacles to quick and accurate interpretations of the evidence by clinicians, patients, and policymakers [ 183 ]. These issues contributed to the development of the GRADE approach. An international working group, which continues to actively evaluate and refine it, first introduced GRADE in 2004 [ 188 ]. Currently, more than 110 organizations from 19 countries around the world have endorsed or are using GRADE [ 189 ].

GRADE approach to rating overall certainty

GRADE offers a consistent and sensible approach for two separate processes: rating the overall certainty of a body of evidence and the strength of recommendations. The former is the expected conclusion of a systematic review, while the latter is pertinent to the development of CPGs. As such, GRADE provides a mechanism to bridge the gap from evidence synthesis to application of the evidence for informed clinical decision-making [ 27 , 190 ]. We briefly examine the GRADE approach but only as it applies to rating overall certainty of evidence in systematic reviews.

In GRADE, use of “certainty” of a body of evidence is preferred over the term “quality” [ 191 ]. Certainty refers to the level of confidence systematic review authors have that, for each outcome, an effect estimate represents the true effect. The GRADE approach to rating confidence in estimates begins with identifying the study type (RCT or NRSI) and then systematically considers criteria to rate the certainty of evidence up or down (Table 5.1 ).

GRADE criteria for rating certainty of evidence

Criteria for rating down	Criteria for rating up [ ]
Risk of bias [ ]	Large magnitude of effect
Imprecision [ ]	Dose–response gradient
Inconsistency [ ]	All residual confounding would decrease magnitude of effect (in situations with an effect)
Indirectness [ ]	
Publication bias [ ]	

a Applies to randomized studies

b Applies to non-randomized studies

This process results in assignment of one of the four GRADE certainty ratings to each outcome; these are clearly conveyed with the use of basic interpretation symbols (Table 5.2 ) [ 192 ]. Notably, when multiple outcomes are reported in a systematic review, each outcome is assigned a unique certainty rating; thus different levels of certainty may exist in the body of evidence being examined.

GRADE certainty ratings and their interpretation symbols a

⊕ ⊕ ⊕ ⊕ High: We are very confident that the true effect lies close to that of the estimate of the effect

⊕ ⊕ ⊕ Moderate: We are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different

⊕ ⊕ Low: Our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect

⊕ Very low: We have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect

a From the GRADE Handbook [ 192 ]

GRADE’s developers acknowledge some subjectivity is involved in this process [ 193 ]. In addition, they emphasize that both the criteria for rating evidence up and down (Table 5.1 ) as well as the four overall certainty ratings (Table 5.2 ) reflect a continuum as opposed to discrete categories [ 194 ]. Consequently, deciding whether a study falls above or below the threshold for rating up or down may not be straightforward, and preliminary overall certainty ratings may be intermediate (eg, between low and moderate). Thus, the proper application of GRADE requires systematic review authors to take an overall view of the body of evidence and explicitly describe the rationale for their final ratings.
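The bookkeeping behind a GRADE certainty rating can be sketched as a tally of rate-down and rate-up steps from a starting level. In the sketch below, the starting levels (high for RCTs, low for NRSI) are the conventional GRADE starting points and are an assumption here, as they are not stated in the text; the function and dictionary names are hypothetical. Because the ratings reflect a continuum and require judgment, such a tally can only record decisions and cannot replace the explicit rationale authors must provide.

```python
LEVELS = ["very low", "low", "moderate", "high"]
START = {"RCT": "high", "NRSI": "low"}  # assumed conventional starting points

RATE_DOWN = ("risk_of_bias", "imprecision", "inconsistency",
             "indirectness", "publication_bias")
RATE_UP = ("large_effect", "dose_response", "residual_confounding")

def grade_certainty(study_type, down=None, up=None):
    """Tally rate-down and rate-up steps for one outcome and return a GRADE level."""
    down = down or {}
    up = up or {}
    unknown = (set(down) - set(RATE_DOWN)) | (set(up) - set(RATE_UP))
    if unknown:
        raise ValueError(f"Unknown criteria: {sorted(unknown)}")
    index = LEVELS.index(START[study_type])
    index += sum(up.values()) - sum(down.values())
    # Ratings cannot go below "very low" or above "high"
    return LEVELS[max(0, min(index, len(LEVELS) - 1))]

# Hypothetical judgments for two outcomes
print(grade_certainty("RCT", down={"risk_of_bias": 1, "imprecision": 1}))
print(grade_certainty("NRSI", up={"large_effect": 1}))
```

Each outcome in a review would be rated separately in this way, so different outcomes from the same body of evidence can end at different levels.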

Advantages of GRADE

Outcomes important to the individuals who experience the problem of interest maintain a prominent role throughout the GRADE process [ 191 ]. These outcomes must inform the research questions (eg, PICO [population, intervention, comparator, outcome]) that are specified a priori in a systematic review protocol. Evidence for these outcomes is then investigated and each critical or important outcome is ultimately assigned a certainty of evidence as the end point of the review. Notably, limitations of the included studies have an impact at the outcome level. Ultimately, the certainty ratings for each outcome reported in a systematic review are considered by guideline panels. They use a different process to formulate recommendations that involves assessment of the evidence across outcomes [ 201 ]. It is beyond our scope to describe the GRADE process for formulating recommendations; however, it is critical to understand how these two outcome-centric concepts of certainty of evidence in the GRADE framework are related and distinguished. An in-depth illustration using examples from recently published evidence syntheses and CPGs is provided in Additional File 5 A (Table AF5A-1).

The GRADE approach is applicable irrespective of whether the certainty of the primary research evidence is high or very low; in some circumstances, indirect evidence of higher certainty may be considered if direct evidence is unavailable or of low certainty [ 27 ]. In fact, most interventions and outcomes in medicine have low or very low certainty of evidence based on GRADE and there seems to be no major improvement over time [ 202 , 203 ]. This is still a very important (even if sobering) realization for calibrating our understanding of medical evidence. A major appeal of the GRADE approach is that it offers a common framework that enables authors of evidence syntheses to make complex judgments about evidence certainty and to convey these with unambiguous terminology. This prevents some common mistakes made by review authors, including overstating results (or under-reporting harms) [ 187 ] and making recommendations for treatment. This is illustrated in Table AF5A-2 (Additional File 5 A), which compares the concluding statements made about overall certainty in a systematic review with and without application of the GRADE approach.

Theoretically, application of GRADE should improve consistency of judgments about certainty of evidence, both between authors and across systematic reviews. In one empirical evaluation conducted by the GRADE Working Group, interrater reliability of two individual raters assessing certainty of the evidence for a specific outcome increased from ~ 0.3 without using GRADE to ~ 0.7 by using GRADE [ 204 ]. However, others report variable agreement among those experienced in GRADE assessments of evidence certainty [ 190 ]. Like any other tool, GRADE requires training in order to be properly applied. The intricacies of the GRADE approach and the necessary subjectivity involved suggest that improving agreement may require strict rules for its application; alternatively, use of general guidance and consensus among review authors may result in less consistency but provide important information for the end user [ 190 ].
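Agreement between two raters is often quantified with a chance-corrected statistic. The sketch below computes unweighted Cohen's kappa for two raters assigning GRADE levels to the same outcomes; this statistic is our choice for illustration and is not necessarily the measure used in the cited evaluation. The ratings are hypothetical, and a weighted kappa may be more suitable for ordered categories.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters rating the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must rate the same non-empty set of items")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical certainty ratings for ten outcomes
rater_1 = ["high", "moderate", "low", "low", "very low",
           "moderate", "high", "low", "moderate", "low"]
rater_2 = ["high", "moderate", "low", "moderate", "very low",
           "moderate", "moderate", "low", "moderate", "low"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

Raw agreement in this example is 8 of 10 outcomes, but kappa is lower because some of that agreement would be expected by chance alone.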

GRADE caveats

Simply invoking “the GRADE approach” does not automatically ensure GRADE methods were employed by authors of a systematic review (or developers of a CPG). Table 5.3 lists the criteria the GRADE working group has established for this purpose. These criteria highlight the specific terminology and methods that apply to rating the certainty of evidence for outcomes reported in a systematic review [ 191 ], which is different from rating overall certainty across outcomes considered in the formulation of recommendations [ 205 ]. Modifications of standard GRADE methods and terminology are discouraged as these may detract from GRADE’s objectives to minimize conceptual confusion and maximize clear communication [ 206 ].

Criteria for using GRADE in a systematic review a

1. The certainty in the evidence (also known as quality of evidence or confidence in the estimates) should be defined consistently with the definitions used by the GRADE Working Group.

2. Explicit consideration should be given to each of the GRADE domains for assessing the certainty in the evidence (although different terminology may be used).

3. The overall certainty in the evidence should be assessed for each important outcome using four or three categories (such as high, moderate, low and/or very low) and definitions for each category that are consistent with the definitions used by the GRADE Working Group.

4. Evidence summaries … should be used as the basis for judgments about the certainty in the evidence.

a Adapted from the GRADE working group [ 206 ]; this list does not contain the additional criteria that apply to the development of a clinical practice guideline

Nevertheless, GRADE is prone to misapplications [ 207 , 208 ], which can distort a systematic review’s conclusions about the certainty of evidence. Systematic review authors without proper GRADE training are likely to misinterpret the terms “quality” and “grade” and to misunderstand the constructs assessed by GRADE versus other appraisal tools. For example, review authors may reference the standard GRADE certainty ratings (Table 5.2 ) to describe evidence for their outcome(s) of interest. However, these ratings are invalidated if authors omit or inadequately perform RoB evaluations of each included primary study. Such deficiencies in RoB assessments are unacceptable but not uncommon, as reported in methodological studies of systematic reviews and overviews [ 104 , 186 , 209 , 210 ]. GRADE ratings are also invalidated if review authors do not formally address and report on the other criteria (Table 5.1 ) necessary for a GRADE certainty rating.

Other caveats pertain to application of a GRADE certainty of evidence rating in various types of evidence syntheses. Current adaptations of GRADE are described in Additional File 5 B and included in Table 6.3, which is introduced in the next section.

Table 6.3. Concise Guide to best practices for evidence syntheses, version 1.0 a

Type of evidence synthesis	Intervention	Diagnostic test accuracy	Prognostic	Qualitative	Prevalence and incidence	Etiology and risk	Measurement properties	Overview (umbrella review)	Scoping
Methods guidance	Cochrane, JBI	Cochrane, JBI	Cochrane	Cochrane, JBI	JBI	JBI	JBI	Cochrane, JBI	JBI
Protocol	PRISMA-P [ ]	PRISMA-P	PRISMA-P	PRISMA-P	PRISMA-P	PRISMA-P	PRISMA-P	PRISMA-P	PRISMA-P
Systematic review	PRISMA 2020 [ ]	PRISMA-DTA [ ]	PRISMA 2020	eMERGe [ ]; ENTREQ [ ]	PRISMA 2020	PRISMA 2020	PRISMA 2020	PRIOR [ ]	PRISMA-ScR [ ]
Synthesis without MA	SWiM [ ]	PRISMA-DTA [ ]	SWiM	eMERGe [ ]; ENTREQ [ ]	SWiM	SWiM	SWiM	PRIOR [ ]	PRISMA-ScR [ ]
Risk of bias or methodological quality	For RCTs: Cochrane RoB2 [ ]; for NRSI: ROBINS-I [ ]; other primary research	QUADAS-2 [ ]	Factor review: QUIPS [ ]; model review: PROBAST [ ]	CASP qualitative checklist [ ]; JBI Critical Appraisal Checklist [ ]	JBI checklist for studies reporting prevalence data [ ]	For NRSI: ROBINS-I [ ]; other primary research	COSMIN RoB Checklist [ ]	AMSTAR-2 [ ] or ROBIS [ ]	Not required
Certainty of evidence	GRADE [ ]	GRADE adaptation	GRADE adaptation	CERQual [ ]; ConQual [ ]	GRADE adaptation	Risk factors	GRADE adaptation	GRADE (for intervention reviews); risk factors	Not applicable

AMSTAR A MeaSurement Tool to Assess Systematic Reviews, CASP Critical Appraisal Skills Programme, CERQual Confidence in the Evidence from Reviews of Qualitative research, ConQual Establishing Confidence in the output of Qualitative research synthesis, COSMIN COnsensus-based Standards for the selection of health Measurement Instruments, DTA diagnostic test accuracy, eMERGe meta-ethnography reporting guidance, ENTREQ enhancing transparency in reporting the synthesis of qualitative research, GRADE Grading of Recommendations Assessment, Development and Evaluation, MA meta-analysis, NRSI non-randomized studies of interventions, P protocol, PRIOR Preferred Reporting Items for Overviews of Reviews, PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses, PROBAST Prediction model Risk Of Bias ASsessment Tool, QUADAS quality assessment of studies of diagnostic accuracy included in systematic reviews, QUIPS Quality In Prognosis Studies, RCT randomized controlled trial, RoB risk of bias, ROBINS-I Risk Of Bias In Non-randomised Studies of Interventions, ROBIS Risk of Bias in Systematic Reviews, ScR scoping review, SWiM systematic review without meta-analysis

a Superscript numbers represent citations provided in the main reference list. Additional File 6 lists links to available online resources for the methods and tools included in the Concise Guide

b The MECIR manual [ 30 ] provides Cochrane’s specific standards for both reporting and conduct of intervention systematic reviews and protocols

c Editorial and peer reviewers can evaluate completeness of reporting in submitted manuscripts using these tools. Authors may be required to submit a self-reported checklist for the applicable tools

d The decision flowchart described by Flemming and colleagues [ 223 ] is recommended for guidance on how to choose the best approach to reporting for qualitative reviews

e SWiM was developed for intervention studies reporting quantitative data. However, if there is not a more directly relevant reporting guideline, SWiM may prompt reviewers to consider the important details to report. (Personal Communication via email, Mhairi Campbell, 14 Dec 2022)

f JBI recommends their own tools for the critical appraisal of various quantitative primary study designs included in systematic reviews of intervention effectiveness, prevalence and incidence, and etiology and risk as well as for the critical appraisal of systematic reviews included in umbrella reviews. However, except for the JBI Checklists for studies reporting prevalence data and qualitative research, the development, validity, and reliability of these tools are not well documented

g Studies that are not RCTs or NRSI require tools developed specifically to evaluate their design features. Examples include single case experimental design [ 155 , 156 ] and case reports and series [ 82 ]

h The evaluation of methodological quality of studies included in a synthesis of qualitative research is debatable [ 224 ]. Authors may select a tool appropriate for the type of qualitative synthesis methodology employed. The CASP Qualitative Checklist [ 218 ] is an example of a published, commonly used tool that focuses on assessment of the methodological strengths and limitations of qualitative studies. The JBI Critical Appraisal Checklist for Qualitative Research [ 219 ] is recommended for reviews using a meta-aggregative approach

i Consider including risk of bias assessment of included studies if this information is relevant to the research question; however, scoping reviews do not include an assessment of the overall certainty of a body of evidence

j Guidance available from the GRADE working group [ 225 , 226 ]; also recommend consultation with the Cochrane diagnostic methods group

k Guidance available from the GRADE working group [ 227 ]; also recommend consultation with Cochrane prognostic methods group

l Used for syntheses in reviews with a meta-aggregative approach [ 224 ]

m Chapter 5 in the JBI Manual offers guidance on how to adapt GRADE to prevalence and incidence reviews [ 69 ]

n Janiaud and colleagues suggest criteria for evaluating evidence certainty for meta-analyses of non-randomized studies evaluating risk factors [ 228 ]

o The COSMIN user manual provides details on how to apply GRADE in systematic reviews of measurement properties [ 229 ]

The expected culmination of a systematic review should be a rating of overall certainty of a body of evidence for each outcome reported. The GRADE approach is recommended for making these judgments for outcomes reported in systematic reviews of interventions and can be adapted for other types of reviews. This represents the initial step in the process of making recommendations based on evidence syntheses. Peer reviewers should ensure authors meet the minimal criteria for supporting the GRADE approach when reviewing any evidence synthesis that reports certainty ratings derived using GRADE. Authors and peer reviewers of evidence syntheses unfamiliar with GRADE are encouraged to seek formal training and take advantage of the resources available on the GRADE website [ 211 , 212 ].

Part 6. Concise Guide to best practices

Accumulating data in recent years suggest that many evidence syntheses (with or without meta-analysis) are not reliable. This relates in part to the fact that their authors, who are often clinicians, can be overwhelmed by the plethora of ways to evaluate evidence. They tend to resort to familiar but often inadequate, inappropriate, or obsolete methods and tools and, as a result, produce unreliable reviews. These manuscripts may not be recognized as such by peer reviewers and journal editors who may disregard current standards. When such a systematic review is published or included in a CPG, clinicians and stakeholders tend to believe that it is trustworthy. A vicious cycle in which inadequate methodology is rewarded and potentially misleading conclusions are accepted is thus supported. There is no quick or easy way to break this cycle; however, increasing awareness of best practices among all these stakeholder groups, who often have minimal (if any) training in methodology, may begin to mitigate it. This is the rationale for inclusion of Parts 2 through 5 in this guidance document. These sections present core concepts and important methodological developments that inform current standards and recommendations. We conclude by taking a direct and practical approach.

Inconsistent and imprecise terminology used in the context of development and evaluation of evidence syntheses is problematic for authors, peer reviewers and editors, and may lead to the application of inappropriate methods and tools. In response, we endorse use of the basic terms (Table 6.1) defined in the PRISMA 2020 statement [ 93 ]. In addition, we have identified several problematic expressions and nomenclature. In Table 6.2, we compile suggestions for preferred terms less likely to be misinterpreted.

Table 6.1. Terms relevant to the reporting of health care–related evidence syntheses a

Term	Definition
Systematic review	A review that uses explicit, systematic methods to collate and synthesize findings of studies that address a clearly formulated question.
Statistical synthesis	The combination of quantitative results of two or more studies. This encompasses meta-analysis of effect estimates and other methods, such as combining values, calculating the range and distribution of observed effects, and vote counting based on the direction of effect.
Meta-analysis of effect estimates	A statistical technique used to synthesize results when study effect estimates and their variances are available, yielding a quantitative summary of results.
Outcome	An event or measurement collected for participants in a study (such as quality of life, mortality).
Result	The combination of a point estimate (such as a mean difference, risk ratio or proportion) and a measure of its precision (such as a confidence/credible interval) for a particular outcome.
Report	A document (paper or electronic) supplying information about a particular study. It could be a journal article, preprint, conference abstract, study register entry, clinical study report, dissertation, unpublished manuscript, government report, or any other document providing relevant information.
Record	The title or abstract (or both) of a report indexed in a database or website (such as a title or abstract for an article indexed in Medline). Records that refer to the same report (such as the same journal article) are “duplicates”; however, records that refer to reports that are merely similar (such as a similar abstract submitted to two different conferences) should be considered unique.
Study	An investigation, such as a clinical trial, that includes a defined group of participants and one or more interventions and outcomes. A “study” might have multiple reports. For example, reports could include the protocol, statistical analysis plan, baseline characteristics, results for the primary outcome, results for harms, results for secondary outcomes, and results for additional mediator and moderator analyses.

a Reproduced from Page and colleagues [ 93 ]

Table 6.2. Terminology suggestions for health care–related evidence syntheses

Preferred	Potentially problematic
Evidence synthesis with meta-analysis; Systematic review with meta-analysis	Meta-analysis
Overview or umbrella review	Systematic review of systematic reviews; Review of reviews; Meta-review
Randomized	Experimental
Non-randomized	Observational
Single case experimental design	Single-subject research; N-of-1 design
Case report or case series	Descriptive study
Methodological quality	Quality
Certainty of evidence	Quality of evidence; Grade of evidence; Level of evidence; Strength of evidence
Qualitative systematic review	Qualitative synthesis
Synthesis of qualitative data	Qualitative synthesis
Synthesis without meta-analysis	Narrative synthesis, narrative summary; Qualitative synthesis; Descriptive synthesis, descriptive summary

a For example, meta-aggregation, meta-ethnography, critical interpretative synthesis, realist synthesis

b This term may best apply to the synthesis in a mixed methods systematic review in which data from different types of evidence (eg, qualitative, quantitative, economic) are summarized [ 64 ]

We also propose a Concise Guide (Table 6.3) that summarizes the methods and tools recommended for the development and evaluation of nine types of evidence syntheses. Suggestions for specific tools are based on the rigor of their development as well as the availability of detailed guidance from their developers to ensure their proper application. The formatting of the Concise Guide addresses a well-known source of confusion by clearly distinguishing the underlying methodological constructs that these tools were designed to assess. Important clarifications and explanations follow in the guide’s footnotes; associated websites, if available, are listed in Additional File 6.
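As a rough illustration of how the Concise Guide could be consulted programmatically, the sketch below encodes a few of its columns as a lookup table that keeps the three constructs (reporting, risk of bias appraisal, certainty of evidence) in separate fields. It is a hypothetical sketch, not part of the guide: the names `GuideEntry`, `CONCISE_GUIDE` and `tools_for` are invented here, the synthesis-type labels are inferred from the tools listed in each column, and only five of the nine types are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideEntry:
    """Tools for one synthesis type, one field per methodological construct."""
    review_reporting: tuple  # reporting guideline(s) for the completed review
    risk_of_bias: tuple      # appraisal tools for included studies or reviews
    certainty: tuple         # approach(es) for rating certainty of evidence


# Partial, illustrative transcription of the Concise Guide (Table 6.3).
# Empty tuples stand for "Not required" or "Not applicable" in the table.
CONCISE_GUIDE = {
    "intervention": GuideEntry(
        ("PRISMA 2020",),
        ("Cochrane RoB2 (RCTs)", "ROBINS-I (NRSI)"),
        ("GRADE",),
    ),
    "diagnostic test accuracy": GuideEntry(
        ("PRISMA-DTA",), ("QUADAS-2",), ("GRADE adaptation",)
    ),
    "qualitative": GuideEntry(
        ("eMERGe", "ENTREQ"),
        ("CASP qualitative checklist", "JBI Critical Appraisal Checklist"),
        ("CERQual", "ConQual"),
    ),
    "overview": GuideEntry(
        ("PRIOR",), ("AMSTAR-2", "ROBIS"), ("GRADE (for intervention reviews)",)
    ),
    "scoping": GuideEntry(("PRISMA-ScR",), (), ()),
}


def tools_for(synthesis_type):
    """Return the guide entry for a synthesis type (case-insensitive lookup)."""
    key = synthesis_type.strip().lower()
    if key not in CONCISE_GUIDE:
        known = ", ".join(sorted(CONCISE_GUIDE))
        raise ValueError(f"unknown synthesis type {synthesis_type!r}; known: {known}")
    return CONCISE_GUIDE[key]


entry = tools_for("Qualitative")
print("Reporting:", ", ".join(entry.review_reporting))
print("Certainty:", ", ".join(entry.certainty))
```

Keeping the constructs in separate fields mirrors the point made above: a reporting guideline, a risk of bias tool and a certainty rating answer different questions and should not be substituted for one another.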

To encourage uptake of best practices, journal editors may consider adopting or adapting the Concise Guide in their instructions to authors and peer reviewers of evidence syntheses. Given the evolving nature of evidence synthesis methodology, the suggested methods and tools are likely to require regular updates. Authors of evidence syntheses should monitor the literature to ensure they are employing current methods and tools. Some types of evidence syntheses (eg, rapid, economic, methodological) are not included in the Concise Guide; for these, authors are advised to obtain recommendations for acceptable methods by consulting with their target journal.

We encourage the appropriate and informed use of the methods and tools discussed throughout this commentary and summarized in the Concise Guide (Table 6.3). However, we caution against their application in a perfunctory or superficial fashion. This is a common pitfall among authors of evidence syntheses, especially as the standards of such tools become associated with acceptance of a manuscript by a journal. Consequently, published evidence syntheses may show improved adherence to the requirements of these tools without necessarily making genuine improvements in their performance.

In line with our main objective, the suggested tools in the Concise Guide address the reliability of evidence syntheses; however, we recognize that the utility of systematic reviews is an equally important concern. An unbiased and thoroughly reported evidence synthesis may still not be highly informative if the evidence itself that is summarized is sparse, weak and/or biased [ 24 ]. Many intervention systematic reviews, including those developed by Cochrane [ 203 ] and those applying GRADE [ 202 ], ultimately find no evidence, or find the evidence to be inconclusive (eg, “weak,” “mixed,” or of “low certainty”). This often reflects the primary research base; however, it is important to know what is known (or not known) about a topic when considering an intervention for patients and discussing treatment options with them.

Alternatively, the frequency of “empty” and inconclusive reviews published in the medical literature may relate to limitations of conventional methods that focus on hypothesis testing; these have emphasized the importance of statistical significance in primary research and effect sizes from aggregate meta-analyses [ 183 ]. It is becoming increasingly apparent that this approach may not be appropriate for all topics [ 130 ]. Development of the GRADE approach has facilitated a better understanding of significant factors (beyond effect size) that contribute to the overall certainty of evidence. Other notable responses include the development of integrative synthesis methods for the evaluation of complex interventions [ 230 , 231 ], the incorporation of crowdsourcing and machine learning into systematic review workflows (eg, the Cochrane Evidence Pipeline) [ 2 ], the shift in paradigm to living systematic review and NMA platforms [ 232 , 233 ] and the proposal of a new evidence ecosystem that fosters bidirectional collaborations and interactions among a global network of evidence synthesis stakeholders [ 234 ]. These evolutions in data sources and methods may ultimately make evidence syntheses more streamlined, less duplicative, and, more importantly, more useful for timely policy and clinical decision-making; however, that will only be the case if they are rigorously conducted and reported.

We look forward to others’ ideas and proposals for the advancement of methods for evidence syntheses. For now, we encourage dissemination and uptake of the currently accepted best tools and practices for their development and evaluation; at the same time, we stress that uptake of appraisal tools, checklists, and software programs cannot substitute for proper education in the methodology of evidence syntheses and meta-analysis. Authors, peer reviewers, and editors must strive to make accurate and reliable contributions to the present evidence knowledge base; online alerts, upcoming technology, and accessible education may make this more feasible than ever before. Our intention is to improve the trustworthiness of evidence syntheses across disciplines, topics, and types of evidence syntheses. All of us must continue to study, teach, and act cooperatively for that to happen.

Acknowledgements

Michelle Oakman Hayes for her assistance with the graphics, Mike Clarke for his willingness to answer our seemingly arbitrary questions, and Bernard Dan for his encouragement of this project.

Authors’ contributions

All authors participated in the development of the ideas, writing, and review of this manuscript. The author(s) read and approved the final manuscript.

The work of John Ioannidis has been supported by an unrestricted gift from Sue and Bob O’Donnell to Stanford University.

Declarations

The authors declare no competing interests.

This article has been published simultaneously in BMC Systematic Reviews, Acta Anaesthesiologica Scandinavica, BMC Infectious Diseases, British Journal of Pharmacology, JBI Evidence Synthesis, the Journal of Bone and Joint Surgery Reviews, and the Journal of Pediatric Rehabilitation Medicine.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
