medRxiv

Data Quality in health research: a systematic literature review

Filipe Andrade Bernardi (corresponding author), Nathalia Yukie Crepaldi, Diego Bettiol Yamada, Vinícius Costa Lima, Rui Pedro Charters Lopes Rijo

Decision-making and strategies to improve service delivery need to be supported by reliable health data that generate consistent evidence on health status, so the data quality management process must ensure the reliability of the data collected. Through an integrative literature review, the main objective of this work is to identify and evaluate digital health technology interventions designed to support health research based on data quality. After analyzing the records and extracting the results of interest, 33 articles were included in the review. This transdisciplinary field may be reaching a threshold of significant growth, which forces a shift in the area from a focus on the measurement and evaluation of data quality, currently centered on content, towards a focus on use and context.

In general, the main barriers reported in research on health data quality concern circumstances related to a) use, b) systems and c) health services. The resources presented can help guide medical decisions that involve not only medical professionals, and can indirectly help to avoid decisions based on low-quality information that could put patients' lives at risk.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was supported by the São Paulo Research Foundation (FAPESP) - grant no. 2020/01975-9 – as part of the project “Digital health for the End TB strategy: from linked data integration to a better evidence-based decision making”, coordinated by author D.A.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Not Applicable

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Data Availability

The list of all findings included in our review is available in Supplementary file 1 (Individual studies) and in the spreadsheet at https://docs.google.com/spreadsheets/d/1l-1do1xn1jGq4uXrfHQrdnA_Y2LC_ku5/edit?usp=sharing&ouid=110857452492520611792&rtpof=true&sd=true


Subject Area

  • Health Informatics

Review Article | Open access | Published: 03 August 2024

The METRIC-framework for assessing data quality for trustworthy AI in medicine: a systematic review

Daniel Schwabe (ORCID: 0000-0003-2825-3352), Katinka Becker (ORCID: 0009-0007-9663-6722), Martin Seyferth (ORCID: 0009-0000-5930-287X), Andreas Klaß (ORCID: 0009-0007-3244-3729) & Tobias Schaeffter (ORCID: 0000-0003-1310-2631)

npj Digital Medicine, volume 7, Article number: 203 (2024)



The adoption of machine learning (ML) and, more specifically, deep learning (DL) applications into all major areas of our lives is underway. The development of trustworthy AI is especially important in medicine due to its large implications for patients' lives. While trustworthiness concerns various aspects, including ethical, transparency and safety requirements, we focus on the importance of data quality (training/test) in DL. Since data quality dictates the behaviour of ML products, evaluating data quality will play a key part in the regulatory approval of medical ML products. We perform a systematic review following PRISMA guidelines using the databases Web of Science, PubMed and ACM Digital Library. We identify 5408 studies, of which 120 records fulfil our eligibility criteria. From this literature, we synthesise the existing knowledge on data quality frameworks and combine it with the perspective of ML applications in medicine. As a result, we propose the METRIC-framework, a specialised data quality framework for medical training data comprising 15 awareness dimensions along which developers of medical ML applications should investigate the content of a dataset. This knowledge helps to reduce biases as a major source of unfairness, increase robustness and facilitate interpretability, and thus lays the foundation for trustworthy AI in medicine. The METRIC-framework may serve as a base for systematically assessing training datasets, establishing reference datasets and designing test datasets, which has the potential to accelerate the approval of medical ML products.


Introduction

During the last decade, the field of artificial intelligence (AI) and in particular machine learning (ML) has experienced unprecedented advances, largely due to breakthroughs in deep learning (DL) 1 , 2 , 3 , 4 , 5 and increased computational power. Recently, the introduction of easy-to-use yet still extremely capable models such as GPT-4 6 and Stable Diffusion 7 has further expanded the technology to an even broader audience. The large-scale handling and implementation of AI 8 into fields such as manufacturing, agriculture and food, automated driving, smart cities and healthcare has since shifted the topic into the centre of attention of not just scholars and companies but the general public.

The introduction of novel and disruptive technologies is typically accompanied by an oscillating struggle between exploiting technological opportunities and mitigating risks. ML is proving to have great potential to improve many aspects of our lives 9 , 10 , 11 . However, the race for implementation and utilisation is currently outpacing comprehension of the technology. The complex, black-box character of AI applications has therefore largely steered the public conversation towards safety, security and privacy concerns 12 , 13 . A lack of public confidence in the transparency of AI prevents its utilisation for society and economic growth. It can slow the adoption of innovations in crucial areas and discourage innovators from unlocking the technology's full potential. Hence, the demand for regulation (e.g., EU AI Act 14 , US FDA considerations 15 ) as well as the need for an improved understanding of AI is ever increasing. This is of particular importance in healthcare due to its large impact on people's lives. The number of ML solutions in medicine (research tools and commercial products) is steadily rising, in particular in the fields of radiology and cardiology 16 , 17 . Despite breakthroughs up to human-level performance 9 , 18 , 19 , 20 , ML-backed medical products are mainly used as diagnosis assistance systems 17 , leaving the final decision to human medical professionals. In particular, medical ML solutions are successfully solving the task of image segmentation 21 , 22 , 23 . Due to the unknown consequences of using AI for medical decision-making, more stringent regulatory requirements are of high importance to accelerate the approval of new AI products into medical practice. Decision-making needs to be supported by reliable health data to generate consistent evidence. One of the drivers for evidence-based medicine approaches was the introduction of scientific standards in clinical practice 24 . Since then, data integrity (defined by the ALCOA principles or ALCOA+ 25 ) has become an essential requirement of several guidelines, such as good clinical practice 26 , good laboratory practice 27 and good manufacturing practice 28 . In the pharmaceutical industry, data integrity plays a similarly important role as a requirement for drug trials. While data integrity focuses on maintaining the accuracy and consistency of a dataset over its entire life cycle, data quality is concerned with the fitness of data for use.
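To make the distinction between data integrity and data quality concrete, here is a minimal sketch under our own assumptions, not a procedure prescribed by ALCOA+ or any of the cited guidelines. Integrity is checked with a SHA-256 fingerprint recorded when a dataset is frozen, while completeness stands in as one example of a fitness-for-use indicator. The function names and the toy records are hypothetical.

```python
import hashlib

def dataset_fingerprint(rows):
    """SHA-256 digest over a list of records, each a tuple of strings."""
    h = hashlib.sha256()
    for row in rows:
        # Separate fields (\x1f) and records (\x1e) so that field boundaries
        # are part of the hashed content.
        h.update("\x1f".join(row).encode("utf-8"))
        h.update(b"\x1e")
    return h.hexdigest()

def integrity_ok(rows, expected_digest):
    """Integrity check: content unchanged since the digest was recorded."""
    return dataset_fingerprint(rows) == expected_digest

def completeness(rows):
    """One fitness-for-use indicator: share of non-empty fields."""
    cells = [field for row in rows for field in row]
    if not cells:
        return 0.0
    return sum(1 for field in cells if field != "") / len(cells)

# Hypothetical frozen dataset: (patient id, heart rate, sex); one value missing.
frozen = [("p1", "72", "M"), ("p2", "", "F")]
digest = dataset_fingerprint(frozen)  # recorded when the dataset is frozen

print(integrity_ok(frozen, digest))    # True: nothing has been altered
print(round(completeness(frozen), 3))  # 0.833: yet one field is empty
```

The example dataset passes the integrity check although it is incomplete, which illustrates why the two properties have to be assessed separately.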

To improve confidence in AI utilisation in general, the focus is put on the development of so-called trustworthy AI, which aims at overcoming the black-box character of AI and developing a better understanding of it. Several approaches and definitions for trustworthy AI have been discussed and published over the past years by researchers 29 , 30 , 31 , 32 , 33 , public entities 34 , 35 , corporations 36 , and organisations 37 , 38 . Depending on the area of interest, trustworthiness may include (but is far from limited to) topics such as ethics; societal and environmental well-being; security, safety, and privacy; robustness, interpretability and explainability; providing appropriate documentation for transparency and accountability 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 . In particular, achieving transparency through documentation has gained much attention in the form of reporting guidelines and best practices. While some initiatives cover the entire ML system and development pipeline (e.g., MINIMAR 39 , FactSheets 40 ), others are concerned with documentation surrounding the model (e.g., Model Cards 41 ), and still others concentrate on the documentation of datasets (e.g., Datasheets 42 , STANDING Together 43 , 44 , Dataset Nutrition Label 45 , Data Cards 46 , Healthsheet 47 , Data Statements for NLP 48 ). These standardisation efforts are a crucial first step for developing a better understanding of ML systems as a whole and of the interdependence of their components (e.g., data and algorithm). However, these approaches capture only limited information on the content of datasets and their suitability for use in ML. Additionally, we note that reporting guidelines and best practices concerning the documentation of datasets are mostly written from the perspective of providers and creators of datasets 42 , 45 , with some explicitly trying to reduce the information asymmetry between supplier and consumer 40 .

One of the most critical parts of an AI is the quality of its training data, since it has a fundamental impact on the resulting system. It lays the foundation and inherently provides limitations for the AI application. If the data used for training a model is bad, the resulting AI will be bad as well (‘garbage in, garbage out’ 49 ). Neural networks are prone to learning biases from training data and amplifying them at test time 50 , giving rise to a much-discussed aspect of AI behaviour: fairness 51 . Many remedies have been put forward to tackle discriminatory and unfair algorithm behaviour 52 , 53 , 54 . Yet, one of the main causes of undesirable learned patterns lies in biased training data 55 , 56 . Thus, data quality plays a decisive role in the creation of trustworthy AI, and assessing the quality of a dataset is of utmost importance to AI developers, as well as to regulators and notified bodies.

The scientific investigation of data quality was initiated roughly 30 years ago. The term data quality was famously broken down into so-called data quality dimensions by Wang and Strong in 1996 57 . These dimensions represent different characteristics of a dataset which together constitute the quality of the data. Throughout the years, general data quality frameworks have taken advantage of this approach and have produced refined lists of data quality dimensions for various fields of application and types of data. Naturally, this has produced different definitions and understandings. Within this systematic review, we transfer the existing research and knowledge about data quality to the topic of AI in medicine. In particular, we investigate the research question: Along which characteristics should data quality be evaluated when employing a dataset for trustworthy AI in medicine? The systematic comparison of previous studies on data quality combined with the perspective on modern ML enables us to develop a specialised data quality framework for medical training data: the METRIC-framework. It is intended for assessing the suitability of a fixed training dataset for a specific ML application, meaning that the model to be trained as well as the intended use case should drive the data quality evaluation. The METRIC-framework provides a comprehensive list of 15 awareness dimensions which developers of AI medical devices should be mindful of. Knowledge about the composition of medical training data with respect to the dimensions of the METRIC-framework should drastically improve comprehension of the behaviour of ML applications and lead to more trustworthy AI in medicine.

We note that data quality itself is a term used in different settings, with different meanings and varying scopes. For the purpose of this review, we focus on the actual content of a dataset instead of the surrounding technical infrastructure. We do so since the content is the part of a dataset which ML applications use to learn patterns and develop their characteristics. We thus exclude research on data quality considerations and frameworks within the topic of data governance and data management 58 , 59 . This concerns aspects such as data integration 60 , information quality management 61 , ETL processes in data warehouses 62 , or tools for data warehouses 63 , 64 which do not affect the behavioural characteristics of AI systems. We also omit records discussing case studies of survey data quality 65 , 66 , as well as training strategies to cope with bad data 67 , 68 , 69 , 70 , 71 , 72 .

We further point out that the use of the term AI in current discussions is scientifically imprecise since discussions within the healthcare sector almost exclusively revolve around the implementation of ML approaches, in particular of DL approaches. Technically, the term AI spans a much wider range of technologies than just DL as part of the field of ML. Due to the complexity of DL applications and their proficiency in solving tasks deemed to require human intelligence, the terms are currently often used interchangeably in literature. We follow the same vocabulary here (e.g., ‘trustworthy AI’, ‘AI in medicine’) but stress the limitation of our results to ML approaches.

In order to answer the research question ‘Along which characteristics should data quality be evaluated when employing a dataset for trustworthy AI in medicine?’, we conducted an unregistered systematic review following the PRISMA guidelines 73 . Our predetermined search string contains variations of the following terms: (i) data quality, (ii) framework or dimensions and (iii) machine learning (see Methods for more details and the full search string). The initial search of the databases Web of Science, PubMed and ACM Digital Library was performed on the 12th of April 2024 and yielded 4633 unique results. After title and abstract screening, adding references of the remaining records (‘snowballing’) and full text assessment, we find 120 records that match our eligibility criteria (see Methods). This represents the literature corpus that serves as a foundation for answering the research question. The full workflow is illustrated in Fig. 1 .

Figure 1

The flow diagram shows the number of records identified, included and excluded at the different stages of the systematic review. The eligibility criteria for inclusion and exclusion are presented in the bottom right hand side. From a total of 5408 identified studies (4633 from database search, 775 from snowballing), the resulting literature corpus on data quality for trustworthy AI in medicine includes 120 studies.
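The totals in the caption can be cross-checked with a few lines of arithmetic. This sketch uses only the counts stated in the caption; the stage-by-stage exclusion numbers of the flow diagram are not reproduced in this text, so the exclusions are shown as one combined figure.

```python
# Record counts reported in the caption of Fig. 1.
from_database_search = 4633
from_snowballing = 775
included = 120

identified = from_database_search + from_snowballing
excluded = identified - included  # all stages combined
inclusion_rate = included / identified

print(identified)               # 5408
print(excluded)                 # 5288
print(f"{inclusion_rate:.1%}")  # 2.2%
```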

In Fig. 2 , the papers from our literature corpus are displayed according to their publication year 57 , 74 , 75 , 76 , 77 , 78 , 79 , 80 , 81 , 82 , 83 , 84 , 85 , 86 , 87 , 88 , 89 , 90 , 91 , 92 , 93 , 94 , 95 , 96 , 97 , 98 , 99 , 100 , 101 , 102 , 103 , 104 , 105 , 106 , 107 , 108 , 109 , 110 , 111 , 112 , 113 , 114 , 115 , 116 , 117 , 118 , 119 , 120 , 121 , 122 , 123 , 124 , 125 , 126 , 127 , 128 , 129 , 130 , 131 , 132 , 133 , 134 , 135 , 136 , 137 , 138 , 139 , 140 , 141 , 142 , 143 , 144 , 145 , 146 , 147 , 148 , 149 , 150 , 151 , 152 , 153 , 154 , 155 , 156 , 157 , 158 , 159 , 160 , 161 , 162 , 163 , 164 , 165 , 166 , 167 , 168 , 169 , 170 , 171 , 172 , 173 , 174 , 175 , 176 , 177 , 178 , 179 , 180 , 181 , 182 , 183 , 184 , 185 , 186 , 187 , 188 , 189 , 190 , 191 , 192 . The overarching topics contained in the corpus naturally divide the papers into three categories: general data (35 entries), big data (8 entries) and ML data (77 entries). This reflects the historic development of the research field of data quality during the last 30 years.

Figure 2

The 120 studies are divided into the three categories general data (35), big data (8) and ML data (77), which represent major changes in the perception of data quality. The studies' affiliation to either non-life science (76) or life science (44) related topics is indicated as well.
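Both partitions in the caption cover the same 120 studies. A short sketch, using only the caption's counts, confirms this and expresses the topic categories as shares of the corpus.

```python
def shares(counts):
    """Return each category's fraction of the total."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

# Counts from the caption of Fig. 2.
by_topic = {"general data": 35, "big data": 8, "ML data": 77}
by_field = {"non-life science": 76, "life science": 44}

# Both partitions must cover the same 120 studies.
assert sum(by_topic.values()) == sum(by_field.values()) == 120

for name, share in shares(by_topic).items():
    print(f"{name}: {share:.0%}")
# general data: 29%
# big data: 7%
# ML data: 64%
```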

General data quality

The field first shifted into focus with digital and automatically mass-generated data during the 1980s and 1990s, causing a need for quality evaluation and control on a broad scale. While during the first 10 years landmark papers 57 , 74 built the foundation for the field, the last 20 years have seen general data quality frameworks published more frequently 75 , 76 , 77 , 78 , 79 , 80 , 81 , 82 , 83 , 84 . The literature corpus additionally contains general data quality frameworks with high specificity to medical applications 85 , 86 , 87 , 88 , 89 , 90 , 91 , 92 , 93 , 94 , 95 , 96 , 97 , 98 , 99 , 100 , 101 , 102 , 103 , 104 , 105 , while frameworks with high specificity to non-medical topics 193 , 194 were excluded.

The early data quality research in the 1980s and 1990s uncovered the lack of objective measures to assess data quality, which led to the introduction of task-dependent dimensions and the establishment of a data quality framework from the perspective of the data consumer 57 . Another fundamental challenge in the data quality field is storing data efficiently while maintaining its quality. This was first investigated with the introduction of a data quality framework from the perspective of the data handler 74 . Both approaches to data quality proved to be useful and were unified in one framework 75 . In the following years, the frameworks were further extended 76 , 77 , equipped with measures 78 , 79 and refined 80 , 81 . Moreover, it became clear that specialised fields such as the medical domain require adapted frameworks.

With the overarching question of how to improve patient care and the rise of electronic health records (EHR) in the 1990s, the need for high data quality in the medical sector increased. Accordingly, one of the first data quality frameworks in healthcare was implemented by the Canadian Institute for Health Information 85 . The first comprehensive data quality framework specifically for EHR data in the literature corpus was established by conducting a survey of quality challenges in EHR 86 . It considers, among other characteristics, accuracy, completeness and particularly timeliness. However, accuracy is hard to quantify in the medical context, as even the diagnoses of experienced practitioners sometimes do not coincide. Accordingly, the notion of concordance of differing data sources was introduced 87 . Yet, the data quality frameworks for EHR could only be transferred to other types of medical data to a certain extent. Thus, data quality frameworks for particular data types such as immunisation data, public health data, multi-centre healthcare data or similar were put forward 88 , 89 , 90 , 91 , 92 , 93 , 94 , 95 . The various frameworks still suffered from inconsistent terminology, and attempts were made to harmonise the definitions and assessment 96 , 97 , 98 , 99 , 100 , 101 , 102 , 103 . Notably, Kahn et al. 97 proposed a framework with exact definitions and, recently, Declerck et al. 103 published a ‘review of reviews’ portraying the different terminologies and attempting to map them to a reference. While these developments have advanced the understanding of data quality in the context of medical applications, frameworks for EHR frequently focus on the data quality of individual patients 86 , 87 , neglecting data quality aspects for the overall population. In particular, representativeness is often not a factor 86 , 87 , while it is a crucial property for secondary use of data in clinical studies 88 or when reusing medical data as training data for ML applications.
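Concordance and representativeness can both be expressed as simple statistics. The sketch below is our own illustration, not a method taken from the cited frameworks. It uses percent agreement and Cohen's kappa as one possible concordance measure between two sources coding the same patients, and the total variation distance between category shares in a dataset and assumed reference population shares as one possible representativeness indicator. The diagnosis codes and population shares are hypothetical.

```python
from collections import Counter

def percent_agreement(a, b):
    """Fraction of aligned entries on which two sources agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sources must be non-empty and aligned")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    p_expected = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if p_expected == 1.0:
        # Both sources use one identical label throughout; kappa is undefined,
        # and we return 1.0 by convention.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)

def total_variation(sample, reference):
    """Distance between category shares in `sample` and `reference` shares
    (assumed to sum to 1): 0 means identical, 1 means disjoint."""
    counts = Counter(sample)
    n = len(sample)
    keys = counts.keys() | reference.keys()
    return 0.5 * sum(abs(counts[k] / n - reference.get(k, 0.0)) for k in keys)

# Hypothetical diagnosis codes for the same four patients from two sources.
ehr = ["dx_a", "dx_a", "dx_b", "dx_c"]
registry = ["dx_a", "dx_b", "dx_b", "dx_c"]
print(percent_agreement(ehr, registry))       # 0.75
print(round(cohens_kappa(ehr, registry), 3))  # 0.636

# Hypothetical sex distribution of a training set vs. an assumed 50/50 population.
print(total_variation(["F", "M", "M", "M"], {"F": 0.5, "M": 0.5}))  # 0.25
```

Kappa is lower than raw agreement because part of the agreement would be expected by chance given each source's label frequencies.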

Big data quality

As the amount of data from varying sources grew, conventional databases reached their capacity and the field of big data emerged. Big data is generally concerned with handling huge unstructured data streams that need to be processed at a rapid pace, emphasising the need for extended data quality frameworks. This development is reflected by a small wave of papers published between 2015 and 2020 106 , 107 , 108 , 109 , 110 , 111 , 112 , 113 . For example, the weaker structure of the data encouraged the use of data quality frameworks that include the data schema as a data quality dimension 106 , 107 . Further, the increasing amount of data requires the computational efficiency of the surrounding database infrastructure to be a part of big data quality frameworks 108 , 109 , 110 . Computational efficiency is also a limiting factor when ML methods are applied to big data. While it is generally assumed that more data leads to better results, this has to be balanced with computational capabilities. Hence, a data quality framework was developed that bridges the gap between ML and big data 111 . We note that the ‘4 V’s’ (volume, velocity, veracity and variety) of big data 195 implicitly suggest a framework for big data quality. However, the ‘4 V’s’ are in fact big data traits which can have an effect on data quality but are not considered data quality dimensions 196 . They therefore do not contribute to answering our research question and are not further discussed. This might change in the future when data from wearables or remote patient monitoring sensors become available for health management.
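When the data schema is treated as a data quality dimension, one simple way to quantify it is the share of records that conform to an expected schema. The following is a minimal sketch under that assumption; the schema, field names and records are hypothetical and not drawn from the cited frameworks.

```python
# Hypothetical expected schema: field name -> required Python type.
EXPECTED_SCHEMA = {"patient_id": str, "heart_rate": int, "timestamp": str}

def schema_violations(record, schema):
    """List the ways a record deviates from the schema (empty list = conforms)."""
    problems = []
    for field, ftype in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], ftype):
            problems.append(f"wrong type for {field}: {type(record[field]).__name__}")
    # Sort so the report order is deterministic.
    for field in sorted(record.keys() - schema.keys()):
        problems.append(f"unexpected field: {field}")
    return problems

def schema_conformance(records, schema):
    """Share of records without any schema violation."""
    if not records:
        return 0.0
    ok = sum(1 for r in records if not schema_violations(r, schema))
    return ok / len(records)

records = [
    {"patient_id": "p1", "heart_rate": 72, "timestamp": "2024-04-12T10:00"},
    {"patient_id": "p2", "heart_rate": "72"},  # string value, no timestamp
]
print(schema_conformance(records, EXPECTED_SCHEMA))  # 0.5
print(schema_violations(records[1], EXPECTED_SCHEMA))
```

Note that `isinstance(True, int)` is true in Python, so a stricter check would be needed if booleans must be rejected for integer fields.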

ML data quality

The performance and behaviour of DL applications heavily depends on the quality of the data used during training as this is the foundation from which patterns are learned. The records of the literature corpus which discuss or empirically evaluate the effect of data quality on DL deal with a wide variety of data types and models. Many records investigate tabular data while utilising both simpler and more advanced architectures 114 , 115 , 116 , 117 , 118 , 119 , 120 , 121 , 122 , 123 , 124 , 125 , 126 , 127 , 128 , 129 , 130 , 131 , 132 . Recently, studies increasingly look at data quality in the context of sequential data (often time series) 133 , 134 , 135 , 136 , 137 , 138 , 139 , images 119 , 139 , 140 , 141 , 142 , 143 , 144 , 145 , 146 , 147 , 148 , 149 , 150 , 151 , 152 , 153 , 154 , 155 , 156 , 157 , 158 , 159 , 160 , 161 , 162 , 163 , 164 , 165 , 166 , 167 , 168 , 169 , 170 , 171 , 172 , 173 , 174 , 175 , 176 , 177 , 178 , 179 , 180 , 181 , 182 , natural language 120 , 121 , 145 , 182 , 183 , 184 , 185 , 186 or other complex types of data 122 , 151 , 187 , 188 , 189 , 190 , 191 , 192 . Some papers try to estimate data quality effects on ML models by using synthetic data 122 , 123 , 189 .

Contrary to the big data and general data quality literature from our corpus, the DL papers focus on the evaluation of one or very few specific data quality dimensions without (yet) considering broader theoretical data quality frameworks. The dimensions that are predominantly investigated are those which can easily be manipulated and are applicable to a wide range of datasets irrespective of specific tasks. The most prominent dimension is amount of data 123 , 124 , 125 , 126 , 129 , 146 , 147 , 148 , 149 , 150 , 151 , 152 , 153 , 154 , 155 , 156 , 157 , 158 , 159 , 183 , 184 , 185 , 186 , 187 , 188 , 189 , which is empirically shown to benefit performance, albeit in a saturating manner. Another dominant topic is completeness, which the ML community almost exclusively refers to as missing data 119 , 125 , 126 , 127 , 128 , 133 , 134 , 135 , 182 . The effect that data errors have on the DL application is also frequently investigated. Specifically, this is done by separately looking at perturbed features (inputs of a neural network, NN) 128 , 129 , 130 , 131 , 132 , 133 , 134 , 136 , 159 , 160 , 161 , 162 , 163 , 164 , 165 , 166 , 167 , 168 , 182 and noisy targets (predictions to be generated by a NN) 131 , 132 , 133 , 154 , 155 , 156 , 157 , 158 , 159 , 168 , 169 , 170 , 171 , 172 , 173 , 174 , 175 , 176 , 177 , 178 , 179 , 180 , 181 , 182 , 183 . Many ML settings are classification tasks, which is reflected in the corpus often addressing label noise 157 , 158 , 159 , 169 , 182 , 183 . One record highlights the hefty weight that physicians’ annotations carry in medicine 158 . In order to evaluate the effect of data quality (features or targets) on ML applications, the training data is commonly manipulated. On the feature (input) side, for example, images are distorted by adjusting contrast, whereas time series are disturbed by swapping elements. On the target side, for example, correct labels are randomly replaced by false ones.
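The manipulations described above can be sketched in plain Python. This is a minimal illustration under our own assumptions, not code from any of the cited studies: label noise replaces a fixed fraction of labels with a different class, the time-series perturbation swaps randomly chosen neighbouring elements (one possible variant of "swapping elements"), contrast is scaled around the mean grey value of an 8-bit image, and completeness is reported as a missing-data rate. All function names are hypothetical.

```python
import random

def inject_label_noise(labels, classes, rate, rng):
    """Replace round(rate * n) randomly chosen labels with a different class.
    Requires at least two classes."""
    noisy = list(labels)
    n_flip = round(rate * len(noisy))
    for i in rng.sample(range(len(noisy)), n_flip):
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy

def swap_adjacent(sequence, n_swaps, rng):
    """Perturb a time series by swapping n_swaps random neighbouring pairs."""
    seq = list(sequence)
    for _ in range(n_swaps):
        i = rng.randrange(len(seq) - 1)
        seq[i], seq[i + 1] = seq[i + 1], seq[i]
    return seq

def adjust_contrast(image, factor):
    """Scale 0-255 grey values around the image mean and clip to the valid range."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[min(255, max(0, round(mean + factor * (p - mean)))) for p in row]
            for row in image]

def missing_rate(column):
    """Completeness viewed as missing data: share of None entries."""
    return sum(v is None for v in column) / len(column)

rng = random.Random(42)
labels = ["benign"] * 8 + ["malignant"] * 2
noisy = inject_label_noise(labels, ["benign", "malignant"], 0.2, rng)
print(sum(a != b for a, b in zip(labels, noisy)))  # 2 labels changed
print(adjust_contrast([[100, 200]], 0.5))          # [[125, 175]]
print(missing_rate([72, None, 80, None]))          # 0.5
```

Passing an explicit `random.Random` instance keeps each manipulation reproducible, which matters when comparing model behaviour across noise levels.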

When it comes to the concrete behaviour change of the DL algorithm, most of the DL papers in the literature corpus investigate the robustness of a model, i.e., the stable behaviour of a model when facing erroneous inputs or a limited amount of input data. Only a few records investigate generalisability 119 , 144 , 145 or distribution shift 139 , 192 , that is, a model's capability of coping with new, unseen data. Another noteworthy exception is Ovadia et al. 145 , who additionally study predictive uncertainty.
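Robustness in this sense is commonly reported as the change in a performance metric between clean and perturbed test inputs. The sketch below assumes a toy one-dimensional nearest-centroid classifier in place of a neural network and Gaussian input noise as the perturbation. Both choices are ours and serve only to show the bookkeeping; the function names are hypothetical.

```python
import random

def fit_centroids(xs, ys):
    """Nearest-centroid classifier on 1-D features: one mean per class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def accuracy(centroids, xs, ys):
    return sum(predict(centroids, x) == y for x, y in zip(xs, ys)) / len(xs)

def robustness_gap(centroids, xs, ys, noise_sd, rng):
    """Drop in accuracy when Gaussian noise with std noise_sd is added to the inputs."""
    clean = accuracy(centroids, xs, ys)
    perturbed = [x + rng.gauss(0.0, noise_sd) for x in xs]
    return clean - accuracy(centroids, perturbed, ys)

xs = [0.0, 0.2, 0.4, 2.0, 2.2, 2.4]
ys = [0, 0, 0, 1, 1, 1]
model = fit_centroids(xs, ys)
print(accuracy(model, xs, ys))  # 1.0
print(robustness_gap(model, xs, ys, 0.5, random.Random(0)))
```

Repeating the gap computation over several noise levels and random seeds would give a fuller picture than the single draw shown here.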

Overall, theoretical data quality frameworks receive little attention from the ML community due to the novelty of the ML research field. Papers often focus on a few specific data quality dimensions and tasks. Each task comes with its specific data type, necessitating different approaches to manipulating the data and measuring the effects. The research dealing with the impact of manipulated data is heavily skewed towards robust behaviour in the sense of predictive performance. Other possibly affected aspects such as explainability or fairness are underrepresented and to some degree neglected, which is a potential shortcoming for safety-critical applications such as medical diagnosis predictions.

METRIC-framework for medical training data

The literature corpus has shown that while similar ideas exist for the assessment of data quality across fields and applications, the idiosyncrasies of each field or application can only be captured by specialised frameworks rather than by a one-model-fits-all framework. The evaluation of data quality plays a particularly important role in the field of ML because the behaviour of an ML model depends not only on the choice of algorithm but also strongly on its training data. At the same time, ML is implemented in various fields, each processing and requiring different types and qualities of data. We therefore propose a specialised data quality framework for evaluating the quality of medical training data: the METRIC-framework (Fig. 3 ), which is based on our literature corpus 57 , 74 , 75 , 76 , 77 , 78 , 79 , 80 , 81 , 82 , 83 , 84 , 85 , 86 , 87 , 88 , 89 , 90 , 91 , 92 , 93 , 94 , 95 , 96 , 97 , 98 , 99 , 100 , 101 , 102 , 103 , 104 , 105 , 106 , 107 , 108 , 109 , 110 , 111 , 112 , 113 , 114 , 115 , 116 , 117 , 118 , 119 , 120 , 121 , 122 , 123 , 124 , 125 , 126 , 127 , 128 , 129 , 130 , 131 , 132 , 133 , 134 , 135 , 136 , 137 , 138 , 139 , 140 , 141 , 142 , 143 , 144 , 145 , 146 , 147 , 148 , 149 , 150 , 151 , 152 , 153 , 154 , 155 , 156 , 157 , 158 , 159 , 160 , 161 , 162 , 163 , 164 , 165 , 166 , 167 , 168 , 169 , 170 , 171 , 172 , 173 , 174 , 175 , 176 , 177 , 178 , 179 , 180 , 181 , 182 , 183 , 184 , 185 , 186 , 187 , 188 , 189 , 190 , 191 , 192 . We note that the METRIC-framework is specifically not designed to assess the data quality of a dataset in a vacuum. Rather, it was conceived for the situation where the purpose of the desired medical AI is known. Thus, the intention of the METRIC-framework is to assess the appropriateness of a dataset with respect to a specific use case. From now on, we refer only to data quality for training (or test) data of medical ML applications. 
We point out that our framework does not yet include a guideline on the assessment or measurement of data quality; rather, it presents a set of awareness dimensions that play a central role in the evaluation of data quality.

figure 3

This specialised framework for evaluating the data quality of the content of medical training data includes a comprehensive set of awareness dimensions. The inner circle divides data quality into five clusters. These clusters contain a total of 15 data quality dimensions, which are shown on the outer circle. The subdimensions, shown in grey on the border of the figure, contribute to their superordinate dimension. Due to the shape of the graphic, we refer to it as the wheel of data quality.

While examining the literature corpus, we found that terms describing data quality appear under varying definitions, or often with no definition at all. While standardisation efforts exist for the terminology used in evaluating data quality 83, 197, 198, they are often not employed, or did not yet exist when older papers were written, which makes comparisons difficult. Therefore, as a first step, we extracted all data quality dimensions mentioned in the literature corpus together with their definitions (if present) and added them to a list. This yielded 461 different terms with 991 mentions across all papers. Second, we hierarchically grouped the terms into clusters, dimensions and subdimensions according to their intended meaning and their dependencies (see Methods for more details on data extraction). We thus obtained 38 relevant dimensions and subdimensions, which are displayed on the outer circle of Fig. 3. In Tables 1–6, we provide a complete list of definitions for all 38 relevant dimensions and subdimensions, as well as their hierarchy, practical examples and references with respect to the literature corpus. We adopted definitions from a recent data quality glossary 197 if they existed there and matched our understanding of the dimension in the given context of medical training data. If necessary, we included definitions given by Wang et al. 57 in a second iteration. If neither of these two sources provided an appropriate definition, we captured the meaning of the desired term on the basis of the literature corpus and thus determined its definition in the context of medical training data.

The METRIC-framework encompasses three levels of detail: clusters, which pool similar dimensions; dimensions, which are individual characteristics of data quality; and subdimensions, which split larger dimensions into more detailed attributes (compare Fig. 3 from inside to outside). Besides the terms contained in the METRIC-framework, we found several frequently mentioned dataset properties that, for our purpose, we separate from the METRIC-framework. We summarise these additional properties in a separate cluster called data management (Fig. 4). The attributes included in this cluster ensure that a dataset is well documented and can be used legally and effectively. In particular, it includes the properties documentation, security and privacy, as well as the well-established FAIR-Principles 199. Appropriate documentation of datasets is the topic of multiple initiatives 42, 43, 44, 45, 46, 47 that give guidance to data creators and handlers. The METRIC-framework, on the other hand, is targeted towards AI developers. It evaluates the suitability of the content of the data for a specific ML task, which is greatly facilitated by appropriate documentation but does not depend on it. Similarly, the FAIR-principles 199, which require data to be findable, accessible, interoperable and reusable, are vital for evaluating datasets for general purposes but are not included in the METRIC-framework, since the question of fitness for a specific purpose can only be asked once a dataset has been successfully obtained. Security is another important aspect of data management: Who can access and edit the data? Can it be manipulated? Again, such questions concern the handling of the data, not the evaluation of its content. Finally, privacy (data privacy and patient privacy) is a delicate and heavily discussed topic in healthcare. However, we separate these issues from the METRIC-framework since they concern data collection, creation and handling.
We note that aspects such as anonymisation or pseudonymisation may affect the quality of the content of a dataset by, e.g., removing information 167. However, the METRIC-framework is designed to evaluate the resulting dataset with respect to its usefulness for a specific task, not the quality of the modifications. Hence, while these properties play a central role in the creation, handling, management and acquisition of data, the METRIC-framework targets the content of a dataset, since that is the part the ML algorithm learns from. We therefore see the data management cluster as a prerequisite for data quality assessment with the METRIC-framework, which itself divides the concept of data quality for the content of a dataset into five clusters: measurement process, timeliness, representativeness, informativeness and consistency. A summary of the characteristics and key aspects of all five clusters is given in Table 7.
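To make the three levels concrete, the hierarchy can be represented as a nested mapping from clusters to dimensions to subdimensions. The sketch below is illustrative only: it lists the 15 dimensions together with only those subdimensions that are named in the prose of this section, so an empty list means "none named here", not "none exist"; the authoritative list of all 38 dimensions and subdimensions is given in Tables 1–6. The names `METRIC_PARTIAL`, `dimensions` and `locate` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# cluster -> dimension -> subdimensions named in this section's prose
# (partial by design; see Tables 1-6 for the complete hierarchy and definitions)
METRIC_PARTIAL: Dict[str, Dict[str, List[str]]] = {
    "measurement process": {
        "device error": ["accuracy", "precision"],
        "human-induced error": ["carelessness", "outliers", "noisy labels"],
        "completeness": [],
        "source credibility": ["expertise", "traceability", "data poisoning"],
    },
    "timeliness": {
        "timeliness": ["age", "currency"],
    },
    "representativeness": {
        "variety": ["variety in demographics", "variety of data sources"],
        "depth of data": ["dataset size", "granularity", "coverage"],
        "target class balance": [],
    },
    "informativeness": {
        "understandability": [],
        "redundancy": ["conciseness", "uniqueness"],
        "informative missingness": [],
        "feature importance": [],
    },
    "consistency": {
        "rule-based consistency": ["syntactic consistency", "compliance"],
        "logical consistency": [],
        "distribution consistency": ["homogeneity", "distribution drift"],
    },
}


def dimensions(hierarchy: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Flatten the hierarchy to the list of dimensions (the middle level)."""
    return [dim for dims in hierarchy.values() for dim in dims]


def locate(term: str, hierarchy=METRIC_PARTIAL) -> Optional[Tuple[str, ...]]:
    """Return the parents of a dimension or subdimension, or None if the term is unknown."""
    for cluster, dims in hierarchy.items():
        for dim, subdims in dims.items():
            if term == dim:
                return (cluster,)
            if term in subdims:
                return (cluster, dim)
    return None
```

For example, `len(dimensions(METRIC_PARTIAL))` gives 15, and `locate("noisy labels")` returns `("measurement process", "human-induced error")`.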

figure 4

The cluster data management is concerned with the effective usage of the dataset. It includes basic requirements for the dataset but does not address data quality issues regarding its content. Therefore, it can be seen as a prerequisite for assessment using the METRIC-framework. Figuratively speaking, the data management cluster serves as a stable foundation for the wheel of data quality.

Measurement process

The cluster measurement process captures factors that influence uncertainty during data acquisition. Two of the dimensions within this cluster differentiate between technical errors originating from devices during measurement (see device error) and errors induced by humans during, e.g., data handling, feature selection or data labelling (see human-induced error). For the dimension device error, we distinguish between the subdimension accuracy, the systematic deviation from the ground truth (also called bias), and the subdimension precision, the variance of the data around a mean value (also called noise). In practice, a ground truth for medical data is most often not attainable, making accuracy evaluation impossible. In that case, the level and structure of noise in the training data should be compared to the noise expected in the data after the AI is deployed. If the training data contains only low noise but the AI is used in clinical practice on data with much higher noise levels, the performance of the AI application might not be sufficient, since the model did not encounter suitable error characteristics during training. Therefore, lower-noise data is not necessarily better, and adding noise to the training data might in some instances even improve performance 200, 201, 202. The errors belonging to the dimension human-induced error are of a fundamentally different nature and need to be treated accordingly. This type of error includes human carelessness and outliers in the dataset due to (unintentional) human mistakes. The final subdimension, noisy labels, is one of the most relevant topics in current ML research 157, 159, 169, 182. Since supervised learning paradigms are prevalent in the medical domain, proper feature selection and reliable labelling are indispensable.
However, human decision making can be highly irrational and subjective, especially in the medical context 203, 204, 205, which represents one of various sources of labelling noise 206. Among expert annotators, there is often considerable variability 206, 207. Even the most common (non-medical) ML datasets (e.g., MNIST 208, CIFAR-100 209, Fashion-MNIST 210) contain a significant percentage of wrong labels 211, 212. In contrast to the precision of instruments, noise in human judgements is difficult to assess and calls for so-called noise audits, which identify different factors, such as pattern noise and occasion noise, in the medical decision process 213. Such intra- and inter-observer variability has long been a highly important topic in many medical disciplines, e.g., in radiology, where guidelines, training and consensus reading approaches are used to reduce noise 214.
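Inter-observer variability between two annotators is commonly summarised with Cohen's kappa, which corrects the raw agreement for the agreement expected by chance. The sketch below shows one standard way to quantify label noise between two raters; it is not a method reported in this review, and the labels in the usage example are made up for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # observed agreement: fraction of items that received identical labels
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # expected agreement if both raters labelled independently with their own label frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if p_expected == 1.0:
        return float("nan")  # undefined: both raters used one and the same single label
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels from two radiologists for the same four images
a = ["benign", "benign", "malignant", "malignant"]
b = ["benign", "malignant", "malignant", "malignant"]
print(cohens_kappa(a, b))  # 0.5
```

Values near 1 indicate near-perfect agreement and values near 0 indicate agreement no better than chance; for more than two raters, extensions such as Fleiss' kappa exist.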

Another issue that frequently occurs in the data acquisition process and plays an important role in ML is the absence of data values for unknown reasons. We follow the ML vocabulary by capturing this quality issue with the dimension completeness, while noting that outside of ML, this term is commonly used to describe representativeness, coverage or variety. Most prominently, Wang et al. 57 define completeness as ‘breadth, depth, and scope of information’. This definition has been adopted by other researchers as well 100, 106, 126. In ML, however, completeness is usually measured by the ratio of missing to total values. Apart from the mostly quantitative dimensions within the cluster, the dimension source credibility is concerned with mostly qualitative characteristics. On the one hand, it includes the question of whether the measured data can be trusted based on the expertise of the people involved in data measurement, processing and handling. On the other hand, the subdimension traceability evaluates whether changes from the original data to its current state are documented. Awareness of modifications such as the exclusion of outliers, automated image processing in medical imaging or data normalisation, and of the algorithms used for them, is necessary for understanding the composition of the data. Finally, the subdimension data poisoning considers whether the data was intentionally corrupted (e.g., by adversarial attacks) to cause distorted outcomes. The entire cluster measurement process is crucial for data quality evaluation in the medical field, since errors may propagate through the ML model and lead to incorrect diagnosis or treatment of patients.
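In the ML sense used here, completeness can be computed directly from the missingness mask. The minimal sketch below assumes a numeric table in which missing values are encoded as NaN (an assumption; real datasets may use sentinel values or empty strings) and reports the fraction of present values overall and per column, i.e. one minus the ratio of missing to total values.

```python
import numpy as np


def completeness(table):
    """Return (overall, per_column) fractions of non-missing entries; NaN marks a missing value."""
    values = np.asarray(table, dtype=float)
    missing = np.isnan(values)
    overall = 1.0 - missing.mean()           # 1 - (missing values / total values)
    per_column = 1.0 - missing.mean(axis=0)  # the same ratio, feature by feature
    return float(overall), per_column


nan = float("nan")
overall, per_column = completeness([[1.0, nan], [2.0, nan], [3.0, 4.0], [5.0, 6.0]])
# overall == 0.75, per_column == [1.0, 0.5]
```

Per-column values help distinguish a few sparsely recorded features from missingness spread across the whole table.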

We note that special consideration has to be given to the field of medical imaging within the measurement process cluster, because many imaging devices are not classical measurement devices. For instance, in current radiological practice, decisions are still based mainly on visual inspection of images, and diseases and therapy effects are often rated in qualitative terms such as ‘enlarged’, ‘smaller’ or ‘enhanced’. This places considerable weight on the qualitative dimension source credibility and its subdimension expertise in quality assessment for such use cases. However, over the last two decades, significant efforts have been made to establish quantitative imaging biomarkers that turn scanners into measurement devices quantifying biophysical parameters such as flow, perfusion, diffusion or elasticity. Such quantitative imaging approaches reduce operator dependency and enable a more quantitative evaluation of the dimension device error. Worldwide alliances such as the Quantitative Imaging Biomarkers Alliance (QIBA), launched in 2007 by the Radiological Society of North America 215 and now replaced by the Quantitative Imaging Committee (QUIC), the Quantitative Imaging Network (QIN) of the National Cancer Institute in the US 216 and the European Imaging Biomarkers Alliance (EIBALL) of the European Society of Radiology 217 are committed to this transformation.
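When a reference value is available, for instance from repeated scans of a calibration object with known properties (an assumption made for this illustration), the two subdimensions of device error can be estimated separately: accuracy as the mean deviation from the reference (bias) and precision as the spread of repeated measurements (noise). The second function sketches the idea, discussed above for the measurement process, of matching training noise to the noise expected after deployment; it assumes independent, zero-mean Gaussian noise, whose variances add, and is a hypothetical helper rather than a recommended augmentation recipe.

```python
import numpy as np


def accuracy_and_precision(measurements, reference):
    """Bias (accuracy) and sample standard deviation (precision) of repeated measurements."""
    m = np.asarray(measurements, dtype=float)
    bias = m.mean() - reference   # systematic deviation from the known reference value
    noise_sd = m.std(ddof=1)      # spread of the measurements around their mean
    return float(bias), float(noise_sd)


def match_noise_level(x, current_sd, target_sd, seed=None):
    """Add Gaussian noise so that the total noise sd approximates target_sd.

    Assumes the existing noise (sd = current_sd) is independent of the added noise,
    so variances add: target_sd**2 = current_sd**2 + extra_sd**2.
    """
    x = np.asarray(x, dtype=float)
    if target_sd <= current_sd:
        return x.copy()  # adding noise cannot lower the noise level; leave data unchanged
    extra_sd = np.sqrt(target_sd**2 - current_sd**2)
    rng = np.random.default_rng(seed)
    return x + rng.normal(0.0, extra_sd, size=x.shape)


bias, sd = accuracy_and_precision([9.0, 11.0, 10.0, 12.0], reference=10.0)
# bias == 0.5, sd == sqrt(5/3), roughly 1.29
```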

Timeliness

Since medical knowledge and understanding are subject to constant development, it is important to investigate the cluster timeliness, which indicates whether the time at which a dataset is used, relative to when it was created and last updated, is appropriate for the task at hand. Indications for diagnoses based on medical data may have changed since a dataset was created and labelled, and changes in coding systems (such as the transition from ICD-9 to ICD-10 or from ICD-9-CM to ICD-10-CM) may affect mortality and injury statistics 218, 219. The age of the data dictates whether such investigations are necessary. In such cases, the labels or standards used would have to be updated appropriately to satisfy the subdimension currency. Furthermore, knowledge about the subdimension age might provide information about the precision and accuracy of the measurements, as it gives insight into the technology used during data acquisition.
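A first, purely descriptive check of the subdimensions age and currency is to summarise how old the records are and how many of them were coded under a standard other than the one the application expects. The sketch assumes a hypothetical record layout of (recording date, coding system) pairs; whether outdated codes can be mapped to the newer standard, and whether the age is acceptable, remain use-case decisions that the code does not make.

```python
from datetime import date
from statistics import median
from typing import Dict, List, Tuple


def timeliness_summary(
    records: List[Tuple[date, str]], required_coding: str, reference_date: date
) -> Dict[str, float]:
    """Summarise data age (in years) and the share of records not coded in required_coding."""
    if not records:
        raise ValueError("no records")
    ages = [(reference_date - recorded_on).days / 365.25 for recorded_on, _ in records]
    outdated = sum(1 for _, coding in records if coding != required_coding)
    return {
        "max_age_years": max(ages),
        "median_age_years": median(ages),
        "fraction_outdated_coding": outdated / len(records),
    }


summary = timeliness_summary(
    [(date(2010, 1, 1), "ICD-9-CM"), (date(2020, 1, 1), "ICD-10-CM")],
    required_coding="ICD-10-CM",
    reference_date=date(2024, 1, 1),
)
# fraction_outdated_coding == 0.5; max_age_years is roughly 14
```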

Representativeness

Another central cluster, especially for medical applications, is representativeness. Its dimensions are concerned with the extent to which the dataset represents the targeted population (such as patients) for which the application is intended. Whether the population of the dataset covers a sufficient range in terms of age, sex, race or other background information is the topic of the subdimension variety in demographics, contained within the dimension variety. This dimension also contains the subdimension variety of data sources, which is concerned with questions such as: Does the data originate from a single site? Were the measurements made with devices from the same or different manufacturers? Appropriately investigating such questions can provide a strong indication of the applicability and generalisability of the ML application in different environments 220, 221, 222, 223. The dimension depth of data is one of the main topics of the ML papers in our literature corpus. Apart from the subdimension dataset size, already discussed in the previous section, this dimension also includes the subdimension granularity, which considers whether the level of detail (e.g., the resolution of image data) is sufficient for the application, as well as the subdimension coverage, which investigates whether sub-populations (e.g., specific age groups) are still diverse in themselves (e.g., still contain all possible diagnoses in the case of classification applications). Finally, the widely discussed dimension target class balance reflects the technical requirements of ML 140, 141, 144, 150, 159. An algorithm must learn patterns for specific classes from the training data. However, strong class imbalances can be caused by, e.g., rare diseases. To still learn the corresponding patterns properly, it may be helpful to deliberately overrepresent rare classes in the dataset instead of matching their real-world distribution 224, 225.
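Class balance can be summarised by the class proportions and the ratio between the largest and the smallest class. One simple way to deliberately overrepresent rare classes is random oversampling, i.e. duplicating randomly chosen examples of under-represented classes; this is a swapped-in illustration, and refs. 224, 225 may use other strategies. The `min_count` threshold is a hypothetical parameter, and oversampling should be applied to the training split only, so that duplicated examples do not leak into the test data.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def class_balance(labels: Sequence[Hashable]) -> Tuple[Dict[Hashable, float], float]:
    """Class proportions and imbalance ratio (largest class count / smallest class count)."""
    counts = Counter(labels)
    n = len(labels)
    proportions = {label: count / n for label, count in counts.items()}
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return proportions, imbalance_ratio


def random_oversample(
    samples: Sequence, labels: Sequence[Hashable], min_count: int, seed: int = 0
) -> Tuple[List, List]:
    """Duplicate random examples of each class until it has at least min_count examples."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List] = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    out_samples, out_labels = list(samples), list(labels)
    for label, members in by_class.items():
        for _ in range(max(0, min_count - len(members))):
            out_samples.append(rng.choice(members))
            out_labels.append(label)
    return out_samples, out_labels


labels = ["healthy"] * 8 + ["rare disease"] * 2
proportions, ratio = class_balance(labels)  # ratio == 4.0
x_bal, y_bal = random_oversample(list(range(10)), labels, min_count=5)
```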

Informativeness

The cluster informativeness considers the connection between the data and the information it provides, and whether the data provides it in a clear, compact and beneficial way. First, the dimension understandability considers whether the information in the data is easily comprehended. Second, the dimension redundancy investigates whether such information is concisely communicated (see subdimension conciseness) or whether redundant information, such as duplicate records, is present (see subdimension uniqueness). The dimension informative missingness addresses whether the patterns of missing values provide additional information. Che et al. 135 find an informative pattern in the MIMIC-III critical care dataset 226, which displays a correlation between the missing rates of variables and ICD-9 diagnosis labels. The literature categorises missingness patterns as not missing at random (NMAR), missing at random (MAR) or missing completely at random (MCAR) 227, 228. Finally, feature importance is concerned with the overall relevance of the features for the task at hand and, moreover, with the value each feature provides for the performance of an ML application, since the quantity of data has to be balanced against computational capability. Valuable features might in many cases be as important as dataset size 229, which is a frequently discussed topic in the data-centric AI community 230.
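A simple screening step for informative missingness, in the spirit of the correlation between missing rates and diagnosis labels described above, is to compare each feature's missing rate between label groups. The sketch assumes a numeric feature matrix with NaN for missing values and a binary label; it is not the analysis performed in ref. 135. A large difference suggests that values are not missing completely at random (MCAR), but this comparison alone cannot distinguish MAR from NMAR.

```python
import numpy as np


def missing_rate_by_label(features, labels):
    """Per-feature missing rates among positive and negative cases, and their difference."""
    x = np.asarray(features, dtype=float)
    y = np.asarray(labels).astype(bool)
    missing = np.isnan(x)
    rate_pos = missing[y].mean(axis=0)    # share of missing values where the label is True
    rate_neg = missing[~y].mean(axis=0)   # share of missing values where the label is False
    return rate_pos, rate_neg, rate_pos - rate_neg


nan = float("nan")
features = [[nan, 1.0], [nan, 2.0], [1.0, 3.0], [2.0, nan]]
labels = [1, 1, 0, 0]
rate_pos, rate_neg, diff = missing_rate_by_label(features, labels)
# diff == [1.0, -0.5]: feature 0 is missing only for positive cases
```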

Consistency

The dimensions belonging to the cluster consistency illuminate consistent data presentation from three perspectives. Rule-based consistency summarises subdimensions concerned with format (syntactic consistency), which includes the fundamental and well-discussed topic of data schema 106, and with conformity to standards and laws (compliance). These subdimensions ensure that the dataset is easily processable on the one hand and comparable and legally correct on the other. Logical consistency evaluates whether the content of the dataset is free of contradictions, both within the dataset (e.g., a patient without kidneys who is diagnosed with kidney stones) and in relation to real-world knowledge (e.g., a 200-year-old patient). The last dimension of the cluster, distribution consistency, concerns the distributions of relevant subsets of the total dataset and their statistical properties. While the subdimension homogeneity evaluates whether subsets have similar or different statistical properties at the same point in time (e.g., can data from different hospitals be identified by their statistics?), the subdimension distribution drift deals with varying distributions at different time points. This subdimension can be neglected if the dataset does not change over time, but distribution drift is sometimes overlooked unknowingly due to a lack of model surveillance. It is therefore a prominent research topic 145, and the fact that it is easily overlooked further underlines the importance of distribution drift for medical applications 93.
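Both kinds of consistency check lend themselves to simple code. The first function encodes the two logical-consistency examples above as explicit rules over a hypothetical record layout (`age_years`, `diagnoses`, `has_kidneys`); the upper age bound of 120 years is an assumption chosen for illustration. The second computes the two-sample Kolmogorov–Smirnov statistic, one common way (among several) to quantify how far the distribution of a feature differs between two subsets, e.g. two hospitals (homogeneity) or two time periods (distribution drift).

```python
import numpy as np


def logical_violations(record: dict) -> list:
    """Names of the rules that a single patient record violates (hypothetical schema)."""
    violations = []
    age = record.get("age_years")
    if age is not None and not (0 <= age <= 120):  # 120 years: illustrative plausibility bound
        violations.append("implausible_age")
    if "kidney stones" in record.get("diagnoses", []) and record.get("has_kidneys") is False:
        violations.append("diagnosis_contradicts_anatomy")
    return violations


def ks_statistic(sample_a, sample_b) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    grid = np.concatenate([a, b])  # the largest ECDF gap occurs at one of the observed values
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


print(logical_violations({"age_years": 200, "diagnoses": ["kidney stones"], "has_kidneys": False}))
# ['implausible_age', 'diagnosis_contradicts_anatomy']
print(ks_statistic([1, 2, 3], [4, 5, 6]))  # 1.0
```

If SciPy is available, `scipy.stats.ks_2samp` computes the same statistic together with a p-value.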

The METRIC-framework (Fig. 3) represents a comprehensive system of data quality dimensions for evaluating the content of medical training data with respect to an intended ML task. We stress again that these dimensions should for now be regarded as awareness dimensions. They provide a guideline along which developers should familiarise themselves with their data. Knowledge of these characteristics helps in recognising the reasons for the behaviour of an AI system. Understanding this connection enables developers to improve data acquisition and selection, which may help reduce biases, increase robustness and facilitate interpretability, and thus has the potential to drastically improve the AI’s trustworthiness.

With training data being the basis for almost all medical AI applications, the assessment of its quality is receiving increasing attention. However, we note that dividing the term data quality into data quality dimensions is only the first step towards an overall data quality assessment. The next step will be to equip each data quality dimension with quantitative or qualitative measures that describe its state. The result of such a measure then has to be evaluated with respect to the question: Is the state of the dimension appropriate for the desired AI algorithm and its application? These three steps (choosing a measure, obtaining a result, evaluating its appropriateness for the desired task) can be applied to each dimension and subdimension. Appropriately combining the individual outcomes could serve as a basis for a measure of overall data quality in future work.
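The three steps can be expressed as a small data structure per dimension: a measure (step 1), its result on a given dataset (step 2) and a task-specific appropriateness criterion (step 3). The sketch below is hypothetical throughout: the two example measures, their thresholds and the all-must-pass aggregation are placeholders chosen for illustration, since the combination of individual outcomes is left to future work.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class DimensionCheck:
    dimension: str
    measure: Callable[[Any], float]      # step 1: the chosen measure
    acceptable: Callable[[float], bool]  # step 3: appropriateness for the intended task


def assess(dataset: Any, checks: List[DimensionCheck]) -> Dict[str, Tuple[float, bool]]:
    """Apply every check to the dataset and record (result, acceptable?) per dimension."""
    report = {}
    for check in checks:
        value = check.measure(dataset)  # step 2: obtain a result
        report[check.dimension] = (value, check.acceptable(value))
    return report


def all_acceptable(report: Dict[str, Tuple[float, bool]]) -> bool:
    """Placeholder aggregation: every assessed dimension must be acceptable."""
    return all(ok for _, ok in report.values())


# Hypothetical toy dataset and thresholds
toy = {"values": [1.0, None, 3.0, 4.0], "labels": ["a", "a", "a", "b"]}
checks = [
    DimensionCheck(
        "completeness",
        measure=lambda d: sum(v is not None for v in d["values"]) / len(d["values"]),
        acceptable=lambda v: v >= 0.9,
    ),
    DimensionCheck(
        "target class balance",
        measure=lambda d: min(Counter(d["labels"]).values()) / max(Counter(d["labels"]).values()),
        acceptable=lambda v: v >= 0.2,
    ),
]
report = assess(toy, checks)
# completeness 0.75 is not acceptable here; a class ratio of 1/3 is
print(all_acceptable(report))  # False
```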

So far the dimensions in the METRIC-framework are not ranked in any way. However, it is clear that some of them are more important than others. Therefore, some dimensions deserve more attention in the assessment process or might even be a criterion for exclusion of a dataset for a certain task. These dimensions should be among the first to be assessed in practice. On the other hand, some dimensions are much more difficult to measure and evaluate than others. This can be due to their qualitative nature, the complexity of the statistical measure, the degree of use-case dependence or the expert knowledge that is needed for the assessment, to name a few. These considerations are of central interest for the development of a complete data quality assessment and examination process.

In Fig. 5, we provide insights that should be taken into consideration when assessing data quality in practice. We classify each of the 15 awareness dimensions along two properties. On the one hand, we estimate whether a dimension requires mostly quantitative or mostly qualitative measures. We observe that about half of the dimensions require mostly quantitative measures, while a fifth require more manual inspection through qualitative measures (see left-hand side of Fig. 5). Being able to choose quantitative measures typically implies more objectivity and enables automation, two desirable properties for quality assessment. Dimensions categorised as mostly qualitatively measurable, or as requiring a mixture of quantitative and qualitative input, will typically require specific domain knowledge from the medical field. Such domain knowledge can be difficult and expensive to obtain.

figure 5

Categorisation of dimensions along the properties quantitative vs. qualitative measure (left) and use case dependence for evaluating data quality (right). The affiliation to a category is colour-coded. The colour scale is presented in the inner circle.

On the other hand, we consider whether the state of a dimension, or the evaluation of its appropriateness, is use case dependent (see right-hand side of Fig. 5). This is of interest to developers, as use case dependent dimensions require additional knowledge, work and time not only during quality assessment but also during quality improvement of the data. Categorising all 15 dimensions suggests a division of the wheel of data quality. The clusters representativeness and timeliness, as well as the dimensions device error and feature importance, belong to the group of use case dependent dimensions. Whether a dataset is representative of the targeted population can only be evaluated with knowledge of the use case. Similarly, the importance of features changes between applications. Whether the age and currency of the data (see dimension timeliness) are appropriate can also differ depending on the task. For instance, the coding standard the data should conform to depends on the application. The newest standards are not necessarily the best if these standards are not implemented in practice (see section on Timeliness). Similarly, lower noise levels in the data are not necessarily better for all applications; rather, the appropriate level depends on the noise expected in the application (see section on Measurement process for more detail).

For an overall assessment of the quality of a dataset, we estimate that, on average, the dimensions of the representativeness cluster together with the dimensions feature importance, distribution consistency and human-induced error are crucial factors. Neglecting any single one of these dimensions potentially has a larger effect on the AI application than neglecting one of the other dimensions. This might also depend on the type of ML problem. Actual quantification of the effect of data quality dimensions on ML applications is part of ongoing and future research. Nevertheless, for now we recommend prioritising these six dimensions whenever time can be dedicated to evaluating or improving a dataset. With the exception of the dimension feature importance, all of the crucial dimensions are also measured mostly quantitatively, which makes them primary candidates for software tools designed to improve the quality of datasets.

The importance of data quality for medical ML products is undisputed and is gaining increasing attention amid ongoing discussions about fairness and trustworthiness. Future regulation and certification guidelines will not only address ML algorithms but will likely also require evaluating the quality of the datasets used for their training and testing. Such inclusion of data quality in regulation requires systematic assessment of medical datasets. The METRIC-framework may serve as a basis for such a systematic assessment of training datasets, for establishing reference datasets, and for designing test datasets during the approval of medical ML products. This has the potential to accelerate the process of bringing new ML products into medical practice.

Literature review

In order to answer the research question ‘Along which characteristics should data quality be evaluated when employing a dataset for trustworthy AI in medicine?’, we conducted a systematic review following the PRISMA guidelines 73. The goal of such a review is to objectively collect the knowledge of a chosen research area by summarising, condensing and expanding the ideas to further its progress. PRISMA reviews commonly follow four main steps: (i) searching suitable databases with carefully formulated search strings and extracting matching papers; (ii) screening titles and abstracts to include or exclude papers based on predetermined criteria; (iii) extending the literature list by screening the titles and abstracts of all papers referenced by the included papers (called ‘snowballing’); (iv) screening the full text of all remaining papers with respect to the eligibility criteria to build the final literature corpus.

Search strategy

Our research question aims at combining knowledge from the field of general data quality frameworks with insights about the effects that the quality of training data has on ML applications in medicine. This should ultimately lead to a novel framework for data quality in the context of medical training data. Therefore, we built a search string that, on the one hand, targeted papers about data quality frameworks by combining variations of ‘data quality’ with variations of the terms ‘framework’ and ‘dimensions’. On the other hand, we attempted to collect papers about the connection between the quality of training data and the behaviour of a DL application by again combining variations of the term ‘data quality’, this time with variations of ‘machine learning’, including ‘artificial intelligence’ and ‘deep learning’ (see Search query). We then performed the database search on one general and two thematically suitable online databases: Web of Science, PubMed and ACM Digital Library. We are aware that the choice of databases skews all interpretations to some degree; snowballing partly mitigates this. All retrieved results were concatenated and duplicates removed, yielding 4633 records.
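Concatenating exports from several databases typically yields the same paper several times with small differences in formatting. How duplicates were identified is not specified above; a minimal sketch, assuming a hypothetical record layout with an optional `doi` and a `title`, treats two records as duplicates if they share a DOI (compared case-insensitively) or a normalised title.

```python
import re
from typing import Dict, List


def normalise_title(title: str) -> str:
    """Lower-case and collapse punctuation and whitespace so formatting differences vanish."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first occurrence of each paper, matching on DOI or normalised title."""
    seen = set()
    unique = []
    for record in records:
        keys = {("title", normalise_title(record["title"]))}
        doi = (record.get("doi") or "").strip().lower()
        if doi:
            keys.add(("doi", doi))
        if keys & seen:
            continue  # shares a DOI or a title with an earlier record
        seen |= keys
        unique.append(record)
    return unique


records = [
    {"doi": "10.1000/ABC", "title": "Data Quality: A Review"},
    {"doi": "10.1000/abc", "title": "Data quality - a review (preprint)"},
    {"title": "data quality - a review"},
    {"title": "Another paper"},
]
print(len(deduplicate(records)))  # 2
```

Title matching can merge genuinely different papers that happen to share a title, so matches found only by title may warrant a manual check.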

Search query

The following search string in pseudo-code (visualised in Fig. 6) was executed on the 12th of April 2024 on Web of Science, PubMed and ACM Digital Library:

figure 6

Visualisation of the keywords and logical connections that formed the search string. Each box can be translated into parentheses in the search string. Keywords inside each box are connected with each other by a logical OR.

(("data quality" OR "data-quality" OR "data qualities" OR "quality of data" OR "quality of the data" OR "qualities of data" OR "qualities of the data" OR "quality of training data" OR "quality of the training data" OR "quality of ML data" OR "data bias" OR "data biases" OR "bias in the data" OR "biases in the data" OR "data problem" OR "data problems" OR "problem in the data" OR "problem with the data" OR "problems with the data" OR "data error" OR "data errors" OR "error in the data")
AND ("dimension" OR "dimensions" OR "AI" OR "artificial intelligence" OR "ML" OR "machine learning" OR "deep learning" OR "neural network" OR "neural networks"))
OR ("data quality framework" OR "data quality frameworks" OR "framework of data quality" OR "framework for data quality")

The chosen databases supported exact (instead of fuzzy) searches, expressed by quotation marks around keywords. The search was applied to the title and abstract fields of all records of the databases.
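Since each box in Fig. 6 corresponds to a parenthesised OR-group, the search string can be assembled from keyword lists rather than typed by hand, which reduces the risk of unbalanced parentheses or missing quotation marks. The helper names below are hypothetical; the keyword lists are copied from the query above. Database-specific field tags for restricting the search to titles and abstracts differ between Web of Science, PubMed and ACM Digital Library and are not shown.

```python
from typing import List

QUALITY_TERMS = [
    "data quality", "data-quality", "data qualities", "quality of data",
    "quality of the data", "qualities of data", "qualities of the data",
    "quality of training data", "quality of the training data", "quality of ML data",
    "data bias", "data biases", "bias in the data", "biases in the data",
    "data problem", "data problems", "problem in the data", "problem with the data",
    "problems with the data", "data error", "data errors", "error in the data",
]
CONTEXT_TERMS = [
    "dimension", "dimensions", "AI", "artificial intelligence", "ML",
    "machine learning", "deep learning", "neural network", "neural networks",
]
FRAMEWORK_TERMS = [
    "data quality framework", "data quality frameworks",
    "framework of data quality", "framework for data quality",
]


def or_group(terms: List[str]) -> str:
    """One box of Fig. 6: exact-match phrases joined by OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(quality: List[str], context: List[str], framework: List[str]) -> str:
    """((quality) AND (context)) OR (framework), mirroring the structure of the search string."""
    return f"({or_group(quality)} AND {or_group(context)}) OR {or_group(framework)}"


query = build_query(QUALITY_TERMS, CONTEXT_TERMS, FRAMEWORK_TERMS)
```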

Eligibility criteria

Table 8 lists the eligibility criteria that were applied in the various screening steps. Papers were included if they either provided broad-scale data quality frameworks, of general purpose or specific to a medical application, or if they discussed or quantified the effects of at least one training data quality dimension on DL behaviour. In contrast, papers were excluded if they (i) discussed frameworks specific to non-medical fields, (ii) only considered single or few data quality dimensions without reference to ML, or (iii) focused on the quality of data management and surveys. No limits were imposed with respect to publication date or publisher source (i.e., peer-reviewed or not), while non-English and inaccessible records were omitted.

We note that, in order to be as precise and logical as possible during the practical screening and eligibility checks, we implemented the eligibility criteria as follows: (I1) Inclusion: No exclusion criteria apply; (I2) Inclusion: Study measures effect of data on DL; (E1) Exclusion: Focus of study is not data quality; (E2) Exclusion: Focus of study is not on general theoretical data quality framework; (E3) Exclusion: Study has high specificity to non-medical field; (E4) Exclusion: Focus of study is quality of data management or surveys. The logic we applied during screening and eligibility checks is: if any exclusion criterion applies, the study is excluded, unless an inclusion criterion applies at the same time.
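The decision rule can be written down unambiguously, which also makes it easy to document the criterion behind each decision. The sketch below, with a hypothetical `screen` function, encodes the criteria as listed; note that I1 ('no exclusion criteria apply') cannot hold at the same time as any exclusion criterion, so in practice the 'unless' clause can only be triggered by I2.

```python
from typing import Iterable, Tuple

EXCLUSION_CRITERIA = {
    "E1": "Focus of study is not data quality",
    "E2": "Focus of study is not on general theoretical data quality framework",
    "E3": "Study has high specificity to non-medical field",
    "E4": "Focus of study is quality of data management or surveys",
}


def screen(exclusions: Iterable[str], measures_effect_on_dl: bool) -> Tuple[str, str]:
    """Return (decision, criteria) for one record under the stated screening logic."""
    exclusions = set(exclusions)
    unknown = exclusions - set(EXCLUSION_CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    if not exclusions:
        return "include", "I1"  # no exclusion criterion applies
    if measures_effect_on_dl:
        return "include", "I2"  # the inclusion criterion overrides the exclusion(s)
    return "exclude", ",".join(sorted(exclusions))


print(screen({"E3"}, measures_effect_on_dl=True))         # ('include', 'I2')
print(screen({"E1", "E4"}, measures_effect_on_dl=False))  # ('exclude', 'E1,E4')
```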

Literature review process

Titles and abstracts of the records from the database search were screened with respect to the eligibility criteria. This was done independently by two authors to mitigate bias. In case of disagreement, consensus was reached through discussion. If necessary, a third author was consulted to arrive at the final decision. This step reduced the number of records to 165. The snowballing step expands the scope of the literature corpus to make it less dependent on the initially chosen databases and search string, which is important to reduce bias. For snowballing, we considered all references from the 165 papers included so far, which added 775 records to the literature list. Title and abstract screening was performed analogously on these new entries with the same criteria and workflow as before, leaving 135 additional papers from snowballing. As a final step, all 300 remaining papers were evaluated on the full text with respect to the eligibility criteria. In the end, 120 entries passed all screening steps. For each retrieved record, the decision to include or exclude was documented along with the corresponding eligibility criterion. Each record that passed the screening was eligible for the extraction of data quality terms.
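The record counts reported above are internally consistent, which can be checked with a few lines of arithmetic. The function name is hypothetical, the inputs are the numbers stated in the text, and the 775 snowballing records are counted separately from the 4633 database records, as reported.

```python
def prisma_flow(db_records, kept_after_db_screening, snowball_records,
                kept_after_snowball_screening, included):
    """Derive the remaining PRISMA flow counts from the reported screening numbers."""
    full_text = kept_after_db_screening + kept_after_snowball_screening
    return {
        "titles_abstracts_screened": db_records + snowball_records,
        "full_text_assessed": full_text,
        "excluded_at_full_text": full_text - included,
        "included": included,
    }


flow = prisma_flow(4633, 165, 775, 135, 120)
# {'titles_abstracts_screened': 5408, 'full_text_assessed': 300,
#  'excluded_at_full_text': 180, 'included': 120}
```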

Data extraction strategy

In order to introduce a comprehensive data quality framework, each of the 120 selected records was read by two authors, and all terms deemed relevant for describing data quality were extracted. See Table 9 for details on the vocabulary extracted from each record. We discarded terms if (i) their scope was limited to a specialised data source and not transferable to a general framework, (ii) the term referred to the quality of database infrastructure or (iii) no definition was given and it was impossible to grasp the intended meaning from the context. The accepted terms were copied into an Excel sheet, which served as a starting template for the METRIC-framework. We clustered related concepts into groups according to the terms’ definitions or intended meanings. From these small, detailed groups we formed the so-called subdimensions, ensuring that each subdimension is mentioned by at least three references in the literature corpus; otherwise, the level of detail was deemed too fine, which led to further grouping.
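The extraction sheet can be thought of as a table of (paper, term, concept group) rows. The sketch below, with a hypothetical layout and made-up example rows, counts distinct terms and mentions (the quantities reported earlier as 461 terms and 991 mentions) and applies the at-least-three-references rule to decide which concept groups can stand as subdimensions and which need further grouping.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def summarise_extraction(rows: Iterable[Tuple[str, str, str]], min_refs: int = 3) -> Dict[str, object]:
    """rows: (paper_id, extracted_term, concept_group) triples from the extraction sheet."""
    rows = list(rows)
    papers_per_group: Dict[str, Set[str]] = defaultdict(set)
    for paper_id, _, group in rows:
        papers_per_group[group].add(paper_id)
    return {
        "distinct_terms": len({term for _, term, _ in rows}),
        "mentions": len(rows),
        # groups supported by enough distinct references can stand as subdimensions
        "subdimensions": sorted(g for g, p in papers_per_group.items() if len(p) >= min_refs),
        # the remaining groups are too detailed and are merged into broader groups
        "to_regroup": sorted(g for g, p in papers_per_group.items() if len(p) < min_refs),
    }


rows = [
    ("p1", "noisy labels", "noisy labels"),
    ("p2", "label noise", "noisy labels"),
    ("p3", "annotation errors", "noisy labels"),
    ("p1", "data freshness", "freshness"),
    ("p2", "freshness", "freshness"),
]
print(summarise_extraction(rows))
# {'distinct_terms': 5, 'mentions': 5, 'subdimensions': ['noisy labels'], 'to_regroup': ['freshness']}
```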

With 461 extracted terms, we appear to be beyond the saturation point for finding new data quality dimensions: beyond a certain point, additional synonyms no longer uncover new concepts. From a bias assessment point of view, it is possible that the literature investigating the effects of data quality on ML is skewed towards investigating and reporting dimensions with larger effects. The risk of missing vocabulary for this reason is mitigated by the inclusion of broad theoretical frameworks in our literature corpus.
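Saturation can be made visible by tracking how many distinct concepts have been encountered after each additional paper: once the curve flattens, new papers contribute synonyms rather than new concepts. No such curve is reported here; the sketch below is a hypothetical illustration with made-up concept sets, and because the curve depends on reading order, repeating it over several shuffled orders gives a more robust picture.

```python
from typing import Iterable, List, Set


def saturation_curve(concepts_per_paper: Iterable[Set[str]]) -> List[int]:
    """Cumulative number of distinct concepts after each paper, in reading order."""
    seen: Set[str] = set()
    curve = []
    for concepts in concepts_per_paper:
        seen |= concepts
        curve.append(len(seen))
    return curve


def new_concepts_in_last(curve: List[int], k: int) -> int:
    """How many new concepts the last k papers added (0 suggests saturation)."""
    if k >= len(curve):
        return curve[-1]
    return curve[-1] - curve[-1 - k]


curve = saturation_curve([{"accuracy", "noise"}, {"noise", "bias"}, {"accuracy"}, {"bias", "noise"}])
print(curve, new_concepts_in_last(curve, 2))  # [2, 3, 3, 3] 0
```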

Thorough discussion among all authors about the underlying concepts and definitions of the subdimensions resulted in hierarchically grouping the subdimensions into dimensions and the dimensions into clusters. In parallel with this grouping, all authors reached consensus on definitions for the dimensions and subdimensions of the METRIC-framework. Definitions were adopted from a recent data quality glossary 197 if they existed there and matched our understanding of the vocabulary in the context of medical training data. If necessary, we included definitions given by Wang and Strong 57 in a second iteration. If neither source offered an appropriate definition, we captured the meaning of the term on the basis of the literature corpus and defined it in the context of medical training data (see Tables 1–6).
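The precedence rule for choosing definitions (glossary first, then Wang and Strong, then a definition derived from the literature corpus) amounts to an ordered fallback lookup. The sketch below is illustrative only: the three dictionaries and the `fits_context` predicate are hypothetical placeholders for the authors' consensus judgement of whether a definition suits medical training data.

```python
from typing import Callable, Dict, Optional, Tuple


def choose_definition(
    term: str,
    glossary: Dict[str, str],        # recent data quality glossary (ref. 197)
    wang_strong: Dict[str, str],     # Wang and Strong (ref. 57)
    fits_context: Callable[[str], bool],
    corpus_derived: Dict[str, str],  # definition written from the literature corpus
) -> Tuple[str, str]:
    """Return (definition, source) following the stated order of precedence."""
    for source, table in (("glossary", glossary), ("Wang and Strong", wang_strong)):
        definition: Optional[str] = table.get(term)
        # Accept a definition only if it exists and suits the medical ML context.
        if definition is not None and fits_context(definition):
            return definition, source
    # Neither source applies: fall back to the corpus-based definition.
    return corpus_derived[term], "literature corpus"
```

Passing `fits_context` as a function keeps the lookup order separate from the judgement of fit, which in the review was a matter of author consensus rather than an automatic check.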

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

All data utilised in this study are available. The literature database that serves as the basis for this systematic review is provided in Supplementary Data 1. The data quality vocabulary extracted from the literature database, which serves as the basis for the METRIC-framework, is provided in Supplementary Data 2.

Code availability

All code utilised for this study is available at https://github.com/danielschw188/ReviewPaper_DataQualityForMLinMedicine.

Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017).

Deng, L. Artificial intelligence in the rising wave of deep learning: the historical path and future outlook. IEEE Signal Process. Mag. 35, 177–180 (2018).

Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550 , 354–359 (2017).

He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition , 770–778 (2016).

Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. You only look once: unified, real-time object detection. In Proc. IEEE Conference on Computer Vision and Pattern Recognition , 779–788 (2016).

OpenAI. GPT-4 technical report. Preprint at https://doi.org/10.48550/arXiv.2303.08774 (2023).

Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , 10684–10695 (2022).

Chui, M., Yee, L., Hall, B. & Singla, A. The state of AI in 2023: Generative AI’s breakout year. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-in-2023-generative-ais-breakout-year (2023).

Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542 , 115–118 (2017).

Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596 , 583–589 (2021).

Teoh, E. R. & Kidd, D. G. Rage against the machine? Google’s self-driving cars versus human drivers. J. Saf. Res. 63 , 57–60 (2017).

von Eschenbach, W. J. Transparency and the black box problem: why we do not trust AI. Philos. Technol. 34, 1607–1622 (2021).

UK Government. Chair’s Summary of the AI Safety Summit 2023. https://www.gov.uk/government/publications/ai-safety-summit-2023-chairs-statement-2-november (2023).

Council of the European Union and European Parliament. Proposal for a regulation of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain Union legislative acts. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52021PC0206 (2021).

Food and Drug Administration. Proposed regulatory framework for modifications to artificial intelligence/machine learning (AI/ML)-based software as a medical device (SaMD). https://www.fda.gov/files/medical%20devices/published/US-FDA-Artificial-Intelligence-and-Machine-Learning-Discussion-Paper.pdf (2019).

Muehlematter, U. J., Daniore, P. & Vokinger, K. N. Approval of artificial intelligence and machine learning-based medical devices in the USA and Europe (2015–20): a comparative analysis. Lancet Digit. Health 3 , e195–e203 (2021).

Zhu, S., Gilbert, M., Chetty, I. & Siddiqui, F. The 2021 landscape of FDA-approved artificial intelligence/machine learning-enabled medical devices: an analysis of the characteristics and intended use. Int. J. Med. Inform. 165 , 104828 (2022).

Gulshan, V. et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316 , 2402–2410 (2016).

Ardila, D. et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat. Med. 25 , 954–961 (2019).

Liu, X. et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit. Health 1 , e271–e297 (2019).

Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, 234–241 (Springer International Publishing, Cham, 2015).

Chen, J. et al. TransUNet: Transformers make strong encoders for medical image segmentation. Preprint at https://doi.org/10.48550/arXiv.2102.04306 (2021).

Hatamizadeh, A. et al. Swin UNETR: Swin transformers for semantic segmentation of brain tumors in MRI images. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries , 272–284 (Springer International Publishing, 2022).

Feinstein, A. R. Scientific standards in epidemiologic studies of the menace of daily life. Science 242 , 1257–1263 (1988).

WHO Technical Report Series, no. 1033. Annex 4—guideline on data integrity. https://www.gmp-navigator.com/files/guidemgr/trs1033-annex4-guideline-on-data-integrity.pdf (2021).

International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH). Integrated addendum to ICH E6(R1): guideline for good clinical practice. https://www.slideshare.net/ICRInstituteForClini/integrated-addendum-to-ich-e6r1-guideline-for-good-clinical-practice-e6r2 (2016).

Directive 2004/9/EC of the European Parliament and of the Council of 11 February 2004 on the inspection and verification of good laboratory practice (GLP). https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A02004L0009-20190726 (2004).

EudraLex - Volume 4 - Good Manufacturing Practice (GMP) guidelines. https://health.ec.europa.eu/medicinal-products/eudralex/eudralex-volume-4_en .

Adadi, A. & Berrada, M. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6 , 52138–52160 (2018).

Liu, H. et al. Trustworthy AI: a computational perspective. ACM Trans. Intell. Syst. Technol. 14 , 1–59 (2022).

Li, B. et al. Trustworthy AI: from principles to practices. ACM Comput. Surv. 55 , 1–46 (2023).

Kale, A. et al. Provenance documentation to enable explainable and trustworthy AI: a literature review. Data Intell. 5 , 139–162 (2023).

Alzubaidi, L. et al. Towards risk-free trustworthy artificial intelligence: significance and requirements. Int. J. Intell. Syst. 2023 , 41 (2023).

High-Level Expert Group on Artificial Intelligence (AI HLEG). Policy and investment recommendations for trustworthy artificial intelligence. https://digital-strategy.ec.europa.eu/en/library/policy-and-investment-recommendations-trustworthy-artificial-intelligence (2019).

European Commission, Directorate-General for Communications Networks, Content and Technology. The assessment list for trustworthy artificial intelligence (ALTAI). https://digital-strategy.ec.europa.eu/en/library/assessment-list-trustworthy-artificial-intelligence-altai-self-assessment (2020).

Deloitte GmbH Wirtschaftsprüfungsgesellschaft. Trustworthy AI. https://www2.deloitte.com/de/de/pages/innovation/contents/trustworthy-ai.html .

VDE Verband der Elektrotechnik Elektronik Informationstechnik e.V. VCIO-based description of systems for AI trustworthiness characterisation. VDE SPEC 90012 v1.0 (en). https://www.vde.com/resource/blob/2242194/a24b13db01773747e6b7bba4ce20ea60/vcio-based-description-of-systems-for-ai-trustworthiness-characterisationvde-spec-90012-v1-0--en--data.pdf (2022).

Interessengemeinschaft der Benannten Stellen für Medizinprodukte in Deutschland - IG-NB. Questionnaire Artificial Intelligence (AI) in medical devices. https://www.ig-nb.de/?tx_epxelo_file%5Bid%5D=884878&cHash=53e7128f5a6d5760e2e6fe8e3d4bb02a (2022).

Hernandez-Boussard, T., Bozkurt, S., Ioannidis, J. P. A. & Shah, N. H. MINIMAR (MINimum Information for Medical AI Reporting): Developing reporting standards for artificial intelligence in health care. J. Am. Med. Inform. Assoc. 27 , 2011–2015 (2020).

Arnold, M. et al. Factsheets: increasing trust in AI services through supplier’s declarations of conformity. IBM J. Res. Dev. 63 , 6:1–6:13 (2019).

Mitchell, M. et al. Model cards for model reporting. In Proc. Conference on Fairness, Accountability, and Transparency , 220–229 (Association for Computing Machinery, New York, NY, USA, 2019).

Gebru, T. et al. Datasheets for datasets. Commun. ACM 64 , 86–92 (2021).

The STANDING Together Collaboration. Recommendations for diversity, inclusivity, and generalisability in artificial intelligence health technologies and health datasets. https://doi.org/10.5281/zenodo.10048356 (2023).

Arora, A. et al. The value of standards for health datasets in artificial intelligence-based applications. Nat. Med. 29 , 2929–2938 (2023).

Holland, S., Hosny, A., Newman, S., Joseph, J. & Chmielinski, K. The Dataset Nutrition Label: A Framework to Drive Higher Data Quality Standards, 1–26 (Hart Publishing, Oxford, 2020).

Pushkarna, M., Zaldivar, A. & Kjartansson, O. Data cards: Purposeful and transparent dataset documentation for responsible AI. In Proc. ACM Conference on Fairness, Accountability, and Transparency (ACM, Seoul, South Korea, 2022).

Rostamzadeh, N. et al. Healthsheet: Development of a transparency artifact for health datasets. In Proc. ACM Conference on Fairness, Accountability, and Transparency (ACM, Seoul, South Korea, 2022).

Bender, E. M. & Friedman, B. Data statements for natural language processing: Toward mitigating system bias and enabling better science. Trans. Assoc. Comput. Linguist. 6 , 587–604 (2018).

Geiger, R. S. et al. Garbage in, garbage out? Do machine learning application papers in social computing report where human-labeled training data comes from? In Proc. Conference on Fairness, Accountability, and Transparency , 325–336 (2020).

Zhao, J., Wang, T., Yatskar, M., Ordonez, V. & Chang, K.-W. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. In Proc. Conference on Empirical Methods in Natural Language Processing , 2979–2989 (Association for Computational Linguistics, Copenhagen, Denmark, 2017).

Whittlestone, J., Nyrup, R., Alexandrova, A., Dihal, K. & Cave, S. Ethical and societal implications of algorithms, data, and artificial intelligence: a roadmap for research (The Nuffield Foundation, London, 2019).

Zemel, R., Wu, Y., Swersky, K., Pitassi, T. & Dwork, C. Learning fair representations. In Proc. 30th International Conference on Machine Learning , vol. 28, 325–333 (PMLR, Atlanta, Georgia, USA, 2013).

Kim, B., Kim, H., Kim, K., Kim, S. & Kim, J. Learning not to learn: training deep neural networks with biased data (2019).

Wang, Z. et al. Towards fairness in visual recognition: effective strategies for bias mitigation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition , 8919–8928 (2020).

Suresh, H. & Guttag, J. A framework for understanding sources of harm throughout the machine learning life cycle. In Proc. Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO ’21), https://doi.org/10.1145/3465416.3483305 (ACM, New York, NY, USA, 2021).

Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K. & Galstyan, A. A survey on bias and fairness in machine learning. ACM Comput. Surv. 54 , 1–35 (2021).

Wang, R. Y. & Strong, D. M. Beyond accuracy: what data quality means to data consumers. J. Manag. Inf. Syst. 12 , 5–33 (1996).

Khatri, V. & Brown, C. V. Designing data governance. Commun. ACM 53 , 148–152 (2010).

Liaw, S.-T., Pearce, C., Liyanage, H., Cheah-Liaw, G. S. & De Lusignan, S. An integrated organisation-wide data quality management and information governance framework: theoretical underpinnings. J. Innov. Health Inform. 21 , 199–206 (2014).

Mo, L. & Zheng, H. A method for measuring data quality in data integration. In Proc. International Seminar on Future Information Technology and Management Engineering , 525–527 (2008).

Lindquist, M. Data quality management in pharmacovigilance. Drug Saf. 27 , 857–870 (2004).

Souibgui, M., Atigui, F., Zammali, S., Cherfi, S. & Yahia, S. B. Data quality in ETL process: a preliminary study. Proced. Comput. Sci. 159 , 676–687 (2019).

Gebhardt, M., Jarke, M., Jeusfeld, M. A., Quix, C. & Sklorz, S. Tools for data warehouse quality. In Proc. Tenth International Conference on Scientific and Statistical Database Management (Cat. No. 98TB100243), 229–232 (1998).

Ballou, D. P. & Tayi, G. K. Enhancing data quality in data warehouse environments. Commun. ACM 42 , 73–78 (1999).

Jenkinson, C., Fitzpatrick, R., Norquist, J., Findley, L. & Hughes, K. Cross-cultural evaluation of the Parkinson’s disease questionnaire: tests of data quality, score reliability, response rate, and scaling assumptions in the United States, Canada, Japan, Italy, and Spain. J. Clin. Epidemiol. 56 , 843–847 (2003).

Lim, L. L., Seubsman, S.-a & Sleigh, A. Thai SF-36 health survey: tests of data quality, scaling assumptions, reliability and validity in healthy men and women. Health Qual. Life Outcomes 6, 1–9 (2008).

Candemir, S., Nguyen, X. V., Folio, L. R. & Prevedello, L. M. Training strategies for radiology deep learning models in data-limited scenarios. Radiol. Artif. Intell. 3 , e210014 (2021).

Shorten, C. & Khoshgoftaar, T. M. A survey on image data augmentation for deep learning. J. Big Data 6 , 1–48 (2019).

Feng, S. Y. et al. A survey of data augmentation approaches for NLP. Preprint at https://doi.org/10.48550/arXiv.2105.03075 (2021).

Larochelle, H., Bengio, Y., Louradour, J. & Lamblin, P. Exploring strategies for training deep neural networks. J. Mach. Learn. Res. 10 , 1–40 (2009).

Vincent, P., Larochelle, H., Bengio, Y. & Manzagol, P.-A. Extracting and composing robust features with denoising autoencoders. In Proc. 25th International Conference on Machine Learning , 1096–1103 (2008).

Wang, R. & Tao, D. Non-local auto-encoder with collaborative stabilization for image restoration. IEEE Trans. Image Process. 25 , 2117–2129 (2016).

Page, M. J. et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Int. J. Surg. 88 , 105906 (2021).

Redman, T. C. Data Quality for the Information Age (Artech House, Inc., 1997).

Loshin, D. Dimensions of data quality (2011).

Yoon, V. Y., Aiken, P. & Guimaraes, T. Managing organizational data resources: quality dimensions. Inf. Resour. Manag. J. 13 , 5–13 (2000).

Sidi, F. et al. Data quality: A survey of data quality dimensions. In Proc. International Conference on Information Retrieval & Knowledge Management , 300–304 (2012).

Pipino, L. L., Lee, Y. W. & Wang, R. Y. Data quality assessment. Commun. ACM 45 , 211–218 (2002).

Sebastian-Coleman, L. Measuring Data Quality for Ongoing Improvement: a Data Quality Assessment Framework (Newnes, 2012).

Stvilia, B., Gasser, L., Twidale, M. B. & Smith, L. C. A framework for information quality assessment. J. Am. Soc. Inf. Sci. Technol. 58 , 1720–1733 (2007).

Kim, W., Choi, B.-J., Hong, E.-K., Kim, S.-K. & Lee, D. A taxonomy of dirty data. Data Min. Knowl. Discov. 7 , 81–99 (2003).

DAMA UK Working Group on Quality Dimensions. The six primary dimensions for data quality assessment. Technical Report, DAMA UK - The premier organisation for data professionals in the UK (DAMA UK, 2013).

International Organization for Standardization and International Electrotechnical Commission. ISO/IEC 25012. https://iso25000.com/index.php/en/iso-25000-standards/iso-25012?start=15 (2008).

Corrales, D., Ledezma, A. & Corrales, J. From theory to practice: a data quality framework for classification tasks. Symmetry 10 , 248 (2018).

Long, J., Richards, J. & Seko, C. The Canadian Institute for Health Information Data Quality Framework, version 1: a meta-evaluation and future directions. In Proc. Sixth International Conference on Information Quality , 370–383 (2001).

Chan, K. S., Fowles, J. B. & Weiner, J. P. Electronic health records and the reliability and validity of quality measures: a review of the literature. Med. Care Res. Rev. 67 , 503–527 (2010).

Weiskopf, N. G. & Weng, C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J. Am. Med. Inform. Assoc. 20 , 144–151 (2013).

Nahm, M. Data quality in clinical research. In: Clinical Research Information 175–201 (Springer, 2012).

Almutiry, O., Wills, G., Alwabel, A., Crowder, R. & Walters, R. Toward a framework for data quality in cloud-based health information system. In Proc. International Conference on Information Society (i-Society 2013) , 153–157 (IEEE, 2013).

Chen, H., Hailey, D., Wang, N. & Yu, P. A review of data quality assessment methods for public health information systems. Int. J. Environ. Res. Public Health 11 , 5170–5207 (2014).

Bloland, P. & MacNeil, A. Defining & assessing the quality, usability, and utilization of immunization data. BMC Public Health 19 , 1–8 (2019).

Vanbrabant, L., Martin, N., Ramaekers, K. & Braekers, K. Quality of input data in emergency department simulations: Framework and assessment techniques. Simul. Model. Pract. Theory 91 , 83–101 (2019).

Bian, J. et al. Assessing the practice of data quality evaluation in a national clinical data research network through a systematic scoping review in the era of real-world data. J. Am. Med. Inform. Assoc. 27 , 1999–2010 (2020).

Kim, K.-H. et al. Multi-center healthcare data quality measurement model and assessment using OMOP CDM. Appl. Sci. 11, 9188 (2021).

Tahar, K. et al. Rare diseases in hospital information systems—an interoperable methodology for distributed data quality assessments. Methods Inf. Med. 62 , 71–89 (2023).

Johnson, S. G., Speedie, S., Simon, G., Kumar, V. & Westra, B. L. A data quality ontology for the secondary use of EHR data. In AMIA Annu. Symp. Proc. (2015).

Kahn, M. G. et al. A harmonized data quality assessment terminology and framework for the secondary use of electronic health record data. eGEMs 4 (2016).

Schmidt, C. O. et al. Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R. BMC Med. Res. Methodol. 21 (2021).

Lewis, A. E. et al. Electronic health record data quality assessment and tools: a systematic review. J. Am. Med. Inform. Assoc. 30 , 1730–1740 (2023).

Liu, C., Talaei-Khoei, A., Storey, V. C. & Peng, G. A review of the state of the art of data quality in healthcare. J. Glob. Inf. Manag. 31 , 1–18 (2023).

Mashoufi, M., Ayatollahi, H., Khorasani-Zavareh, D. & Talebi Azad Boni, T. Data quality in health care: main concepts and assessment methodologies. Methods Inf. Med. 62, 5–18 (2023).

Syed, R. et al. Digital health data quality issues: systematic review. J. Med. Internet Res. 25 , e42615 (2023).

Declerck, J., Kalra, D., Vander Stichele, R. & Coorevits, P. Frameworks, dimensions, definitions of aspects, and assessment methods for the appraisal of quality of health data for secondary use: comprehensive overview of reviews. JMIR Med. Inform. 12 , e51560 (2024).

Alipour, J. Dimensions and assessment methods of data quality in health information systems. Acta Med. Mediter. 313–320 (2017).

European Medicines Agency. Data quality framework for EU medicines regulation. https://www.ema.europa.eu/system/files/documents/regulatory-procedural-guideline/data-quality-framework-eu-medicines-regulation_en_1.pdf (2022).

Batini, C., Rula, A., Scannapieco, M. & Viscusi, G. From data quality to big data quality. J. Database Manag. 26 , 60–82 (2015).

Eder, J. & Shekhovtsov, V. A. Data quality for medical data lakelands (2020).

Cai, L. & Zhu, Y. The challenges of data quality and data quality assessment in the big data era. Data Sci. J. 14 , 2 (2015).

Gao, J., Xie, C. & Tao, C. Big data validation and quality assurance: issues, challenges, and needs. In Proc. IEEE Symposium on Service-Oriented System Engineering (SOSE), 433–441 (IEEE, Oxford, UK, 2016).

Ramasamy, A. & Chowdhury, S. Big data quality dimensions: a systematic literature review. J. Inf. Syst. Technol. Manag. 17, https://doi.org/10.4301/S1807-1775202017003 (2020).

Gudivada, V., Apon, A. & Ding, J. Data quality considerations for big data and machine learning: Going beyond data cleaning and transformations. Int. J. Adv. Softw. 10 , 1–20 (2017).

Juddoo, S., George, C., Duquenoy, P. & Windridge, D. Data governance in the health industry: investigating data quality dimensions within a big data context. Appl. Syst. Innov. 1 , 43 (2018).

Ijab, M. T., Mat Surin, E. S. & Mat Nayan, N. Conceptualizing big data quality framework from a systematic literature review perspective. Malays. J. Comput. Sci. 25–37 (2019).

Cao, W., Hu, L., Gao, J., Wang, X. & Ming, Z. A study on the relationship between the rank of input data and the performance of random weight neural network. Neural Comput. Appl. 32 , 12685–12696 (2020).

Johnson, J. M. & Khoshgoftaar, T. M. The effects of data sampling with deep learning and highly imbalanced big data. Inf. Syst. Front. 22 , 1113–1131 (2020).

Sahu, A., Mao, Z., Davis, K. & Goulart, A. E. Data processing and model selection for machine learning-based network intrusion detection. In Proc. IEEE International Workshop Technical Committee on Communications Quality and Reliability (CQR) (2020).

Qi, Z.-X., Wang, H.-Z. & Wang, A.-J. Impacts of dirty data on classification and clustering models: an experimental evaluation. J. Comput. Sci. Technol. 36, 806–821 (2021).

Hu, J. & Wang, J. Influence of data quality on the performance of supervised classification models for predicting gravelly soil liquefaction. Eng. Geol. 324 , 107254 (2023).

Jouseau, R., Salva, S. & Samir, C. On studying the effect of data quality on classification performances. In Intelligent Data Engineering and Automated Learning – IDEAL, 82–93 (Springer, Cham, 2022).

Tran, N., Chen, H., Bhuyan, J. & Ding, J. Data curation and quality evaluation for machine learning-based cyber intrusion detection. IEEE Access 10 , 121900–121923 (2022).

Sha, L., Gašević, D. & Chen, G. Lessons from debiasing data for fair and accurate predictive modeling in education. Expert Syst. Appl. 228 , 120323 (2023).

Lake, S. & Tsai, C.-W. An exploration of how training set composition bias in machine learning affects identifying rare objects. Astron. Comput. 40 , 100617 (2022).

Bailly, A. et al. Effects of dataset size and interactions on the prediction performance of logistic regression and deep learning models. Comput. Methods Programs Biomed. 213, 106504 (2022).

Althnian, A. et al. Impact of dataset size on classification performance: an empirical evaluation in the medical domain. Appl. Sci. 11 , 796 (2021).

Michel, E., Zernikow, B. & Wichert, S. A. Use of an artificial neural network (ANN) for classifying nursing care needed, using incomplete input data. Med. Inform. Internet Med. 25 , 147–158 (2000).

Barakat, M. S. et al. The effect of imputing missing clinical attribute values on training lung cancer survival prediction model performance. Health Inf. Sci. Syst. 5 , 16 (2017).

Radliński, Ł. The impact of data quality on software testing effort prediction. Electronics 12 , 1656 (2023).

Ghotra, B., McIntosh, S. & Hassan, A. E. Revisiting the impact of classification techniques on the performance of defect prediction models. In Proc. IEEE/ACM 37th IEEE International Conference on Software Engineering (2015).

Zhou, Y. & Wu, Y. Analyses on Influence Of Training Data Set To Neural Network Supervised Learning Performance, 19–25 (Springer, Berlin Heidelberg, 2011).

Bansal, A., Kauffman, R. J. & Weitz, R. R. Comparing the modeling performance of regression and neural networks as data quality varies: A business value approach. J. Manag. Inf. Syst. 10 , 11–32 (1993).

Twala, B. Impact of noise on credit risk prediction: does data quality really matter? Intell. Data Anal. 17 , 1115–1134 (2013).

Deshsorn, K., Lawtrakul, L. & Iamprasertkun, P. How false data affects machine learning models in electrochemistry? J. Power Sources 597 , 234127 (2024).

Blake, R. & Mangiameli, P. The effects and interactions of data quality and problem complexity on classification. J. Data Inf. Qual. 2 , 1–28 (2011).

Benedick, P.-L., Robert, J. & Traon, Y. L. A systematic approach for evaluating artificial intelligence models in industrial settings. Sensors 21 , 6195 (2021).

Che, Z., Purushotham, S., Cho, K., Sontag, D. & Liu, Y. Recurrent neural networks for multivariate time series with missing values. Sci. Rep. 8 , 6085 (2018).

Ismail Fawaz, H., Forestier, G., Weber, J., Idoumghar, L. & Muller, P.-A. Adversarial attacks on deep neural networks for time series classification. In Proc. International Joint Conference on Neural Networks (IJCNN) (IEEE, Budapest, Hungary, 2019).

Habib, A., Karmakar, C. & Yearwood, J. Impact of ECG dataset diversity on generalization of CNN model for detecting QRS complex. IEEE Access 7, 93275–93285 (2019).

Ito, A., Saito, K., Ueno, R. & Homma, N. Imbalanced data problems in deep learning-based side-channel attacks: analysis and solution. IEEE Trans. Inf. Forensics Secur. 16 , 3790–3802 (2021).

Zhang, H., Singh, H., Ghassemi, M. & Joshi, S. ‘Why did the model fail?’ Attributing model performance changes to distribution shifts. In Proc. 40th International Conference on Machine Learning , Vol. 202, 41550–41578 (2023).

Masko, D. & Hensman, P. The impact of imbalanced training data for convolutional neural networks. https://www.kth.se/social/files/588617ebf2765401cfcc478c/PHensmanDMasko_dkand15.pdf (2015).

Buda, M., Maki, A. & Mazurowski, M. A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 106 , 249–259 (2018).

Johnson, J. M. & Khoshgoftaar, T. M. Survey on deep learning with class imbalance. J. Big Data 6 , 1–54 (2019).

Bai, M. et al. The uncovered biases and errors in clinical determination of bone age by using deep learning models. Eur. Radiol. 33 , 3544–3556 (2022).

Pan, Y., Xie, F. & Zhao, H. Understanding the challenges when 3D semantic segmentation faces class imbalanced and OOD data. IEEE Trans. Intell. Transp. Syst. 24 , 6955–6970 (2023).

Ovadia, Y. et al. Can you trust your model’s uncertainty? Evaluating predictive uncertainty under dataset shift. Adv. Neural Inf. Process. Syst. 32 (2019).

Sun, C., Shrivastava, A., Singh, S. & Gupta, A. Revisiting unreasonable effectiveness of data in deep learning era. In Proc. IEEE International Conference on Computer Vision , 843–852 (2017).

Nuha, F. U. Training dataset reduction on generative adversarial network. Proced. Comput. Sci. 144 , 133–139 (2018).

Hong, S. & Shen, J. Impact of training size on deep learning performance in in vivo 1H MRS. In Proc. ISMRM & SMRT Annual Meeting & Exhibition (2021).

Li, Y. & Chao, X. Toward sustainability: trade-off between data quality and quantity in crop pest recognition. Front. Plant Sci. 12 , 811241 (2021).

Li, Y., Yang, J. & Wen, J. Entropy-based redundancy analysis and information screening. Digit. Commun. Netw. 9 , 1061–1069 (2021).

Fan, F. J. & Shi, Y. Effects of data quality and quantity on deep learning for protein-ligand binding affinity prediction. Bioorg. Med. Chem. 72 , 117003 (2022).

Ranjan, R., Sharrer, K., Tsukuda, S. & Good, C. Effects of image data quality on a convolutional neural network trained in-tank fish detection model for recirculating aquaculture systems. Comput. Electron. Agric. 205 , 107644 (2023).

Vilaça, L., Viana, P., Carvalho, P. & Andrade, M. T. Improving efficiency in facial recognition tasks through a dataset optimization approach. IEEE Access 12 , 32532–32544 (2024).

Barragán-Montero, A. M. et al. Deep learning dose prediction for IMRT of esophageal cancer: the effect of data quality and quantity on model performance. Phys. Med. 83 , 52–63 (2021).

Motamedi, M., Sakharnykh, N. & Kaldewey, T. A data-centric approach for training deep neural networks with less data. Preprint at https://doi.org/10.48550/arXiv.2110.03613 (2021).

Xu, G., Yue, Q., Liu, X. & Chen, H. Investigation on the effect of data quality and quantity of concrete cracks on the performance of deep learning-based image segmentation. Expert Syst. Appl. 237 , 121686 (2024).

Sukhbaatar, S., Bruna, J., Paluri, M., Bourdev, L. & Fergus, R. Training convolutional networks with noisy labels. Preprint at https://doi.org/10.48550/arXiv.1406.2080 (2014).

Wesemeyer, T., Jauer, M.-L. & Deserno, T. M. Annotation quality vs. quantity for deep-learned medical image segmentation. Medical Imaging 2021: Imaging Informatics for Healthcare, Research, and Applications (2021).

He, T., Yu, S., Wang, Z., Li, J. & Chen, Z. From data quality to model quality: An exploratory study on deep learning. In Proc. 11th Asia-Pacific Symposium on Internetware , 1–6 (2019).

Dodge, S. & Karam, L. Understanding how image quality affects deep neural networks. In Proc. Eighth International Conference on Quality of Multimedia Experience (QoMEX) , 1–6 (2016).

Karahan, S. et al. How image degradations affect deep CNN-based face recognition? In Proc. International Conference of the Biometrics Special Interest Group , 1–5 (2016).

Pei, Y., Huang, Y., Zou, Q., Zhang, X. & Wang, S. Effects of image degradation and degradation removal to CNN-based image classification. IEEE Trans. Pattern Anal. Mach. Intell. 43, 1239–1253 (2019).

Schnabel, L., Matzka, S., Stellmacher, M., Patzold, M. & Matthes, E. Impact of anonymization on vehicle detector performance. In Proc. Second International Conference on Artificial Intelligence for Industries (AI4I) (2019).

Zhong, X. et al. A study of real-world micrograph data quality and machine learning model robustness. npj Comput. Mater. 7 , 161 (2021).

Hukkelås, H. & Lindseth, F. Does image anonymization impact computer vision training? In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition , 140–150 (2023).

Jaspers, T. J. M. et al. Investigating the Impact of Image Quality on Endoscopic AI Model Performance, 32–41 (Springer, Cham, 2023).

Lee, J. H. & You, S. J. Balancing privacy and accuracy: Exploring the impact of data anonymization on deep learning models in computer vision. IEEE Access 12 , 8346–8358 (2024).

Güneş, A. M. et al. Impact of imperfection in medical imaging data on deep learning-based segmentation performance: an experimental study using synthesized data. Med. Phys. 50 , 6421–6432 (2023).

Rolnick, D., Veit, A., Belongie, S. & Shavit, N. Deep learning is robust to massive label noise. Preprint at https://doi.org/10.48550/arXiv.1705.10694 (2017).

Wang, F. et al. The devil of face recognition is in the noise. In Proc. European Conference on Computer Vision (ECCV), 765–780 (2018).

Peterson, J. C., Battleday, R. M., Griffiths, T. L. & Russakovsky, O. Human uncertainty makes classification more robust. In Proc. IEEE/CVF International Conference on Computer Vision (ICCV), 9616–9625 (IEEE Computer Society, Los Alamitos, CA, USA, 2019).

Karimi, D., Dou, H., Warfield, S. K. & Gholipour, A. Deep learning with noisy labels: exploring techniques and remedies in medical image analysis. Med. Image Anal. 65, 101759 (2020).

Taran, V., Gordienko, Y., Rokovyi, A., Alienin, O. & Stirenko, S. Impact of ground truth annotation quality on performance of semantic image segmentation of traffic conditions. Advances in Computer Science for Engineering and Education II, 183–193 (Springer, Cham, 2020).

Volkmann, N. et al. Learn to train: improving training data for a neural network to detect pecking injuries in turkeys. Animals 11, 2655 (2021).

Wei, J. et al. Learning with noisy labels revisited: a study using real-world human annotations. Preprint at https://doi.org/10.48550/arXiv.2110.12088 (2021).

Ma, J., Ushiku, Y. & Sagara, M. The effect of improving annotation quality on object detection datasets: a preliminary study. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4850–4859 (2022).

Schmarje, L. et al. Is one annotation enough? A data-centric image classification benchmark for noisy and ambiguous label estimation (2022).

Agnew, C. et al. Quantifying the effects of ground truth annotation quality on object detection and instance segmentation performance. IEEE Access 11, 25174–25188 (2023).

Costa, D., Silva, C., Costa, J. & Ribeiro, B. Enhancing pest detection models through improved annotations. In Proc. EPIA Conference on Artificial Intelligence, 364–375 (Springer, Cham, 2023).

Cui, J. et al. Impact of annotation quality on model performance of welding defect detection using deep learning. Weld. World 68, 855–865 (2024).

Wang, S., Gao, J., Li, B. & Hu, W. Narrowing the gap: Improved detector training with noisy location annotations. IEEE Trans. Image Process. 31, 6369–6380 (2022).

Whang, S. E., Roh, Y., Song, H. & Lee, J.-G. Data collection and quality challenges in deep learning: a data-centric AI perspective. VLDB J. 32, 791–813 (2023).

Xu, S. et al. Data quality matters: A case study of obsolete comment detection (2023).

Li, Y., Zhao, C. & Caragea, C. Improving stance detection with multi-dataset learning and knowledge distillation. In Proc. Conference on Empirical Methods in Natural Language Processing, 6332–6345 (2021).

Shimizu, A. & Wakabayashi, K. Examining effect of label redundancy for machine learning using crowdsourcing. J. Data Intell. 3, 301–315 (2022).

Zengin, M. S., Yenisey, B. U. & Kutlu, M. Exploring the impact of training datasets on Turkish stance detection. Turk. J. Electr. Eng. Comput. Sci. 31, 1206–1222 (2023).

Derry, A., Carpenter, K. A. & Altman, R. B. Training data composition affects performance of protein structure analysis algorithms. Pac. Symp. Biocomput. 27, 10–21 (2022).


Nikolados, E.-M., Wongprommoon, A., Aodha, O. M., Cambray, G. & Oyarzún, D. A. Accuracy and data efficiency in deep learning models of protein expression. Nat. Commun. 13 (2022).

Wang, L. & Jackson, D. A. Effects of sample size, data quality, and species response in environmental space on modeling species distributions. Landsc. Ecol. 38, 4009–4031 (2023).

Snodgrass, S., Summerville, A. & Ontañón, S. Studying the effects of training data on machine learning-based procedural content generation. Vol. 13 of Proc. AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 122–128 (2017).

Eid, F.-E. et al. Systematic auditing is essential to debiasing machine learning in biology. Commun. Biol. 4, 183 (2021).

Guo, L. L. et al. Evaluation of domain generalization and adaptation on improving model robustness to temporal dataset shift in clinical medicine. Sci. Rep. 12, 2726 (2022).

Xu, H., Horn Nord, J., Brown, N. & Daryl Nord, G. Data quality issues in implementing an ERP. Ind. Manag. Data Syst. 102, 47–58 (2002).

Verma, R. M., Zeng, V. & Faridi, H. Data quality for security challenges: case studies of phishing, malware and intrusion detection datasets. In Proc. ACM SIGSAC Conference on Computer and Communications Security, 2605–2607 (2019).

Laney, D. 3D data management: controlling data volume, velocity and variety. https://www.scirp.org/reference/ReferencesPapers?ReferenceID=1611280 (2001).

Wook, M. et al. Exploring big data traits and data quality dimensions for big data analytics application using partial least squares structural equation modelling. J. Big Data 8, 1–15 (2021).

Black, A. & van Nederpelt, P. Dimensions of data quality (DDQ). https://www.dama-nl.org/wp-content/uploads/2020/09/DDQ-Dimensions-of-Data-Quality-Research-Paper-version-1.2-d.d.-3-Sept-2020.pdf (2020).

IEEE standard glossary of software engineering terminology. IEEE Std 610.12-1990 610, 1–84 (1990).

Wilkinson, M. D. et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).

Bishop, C. M. Training with noise is equivalent to Tikhonov regularization. Neural Comput. 7, 108–116 (1995).

Grandvalet, Y., Canu, S. & Boucheron, S. Noise injection: theoretical prospects. Neural Comput. 9, 1093–1108 (1997).

Smilkov, D., Thorat, N., Kim, B., Viégas, F. & Wattenberg, M. SmoothGrad: removing noise by adding noise. Preprint at https://doi.org/10.48550/arXiv.1706.03825 (2017).

Thaler, R. H. & Sunstein, C. R. Nudge: Improving Decisions About Health, Wealth, and Happiness (Yale University Press, 2009).

Kahneman, D. Thinking, Fast and Slow (Farrar, Straus and Giroux, New York, 2011).

Malossini, A., Blanzieri, E. & Ng, R. T. Detecting potential labeling errors in microarrays by data perturbation. Bioinformatics 22, 2114 (2006).

Frénay, B. & Verleysen, M. Classification in the presence of label noise: a survey. IEEE Trans. Neural Netw. Learn. Syst. 25, 845–869 (2013).

Menze, B. H. et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 34, 1993–2024 (2014).

Deng, L. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Process. Mag. 29, 141–142 (2012).

Krizhevsky, A. Learning multiple layers of features from tiny images. https://www.cs.utoronto.ca/~kriz/learning-features-2009-TR.pdf (2019).

Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. Preprint at https://doi.org/10.48550/arXiv.1708.07747 (2017).

Müller, N. M. & Markert, K. Identifying mislabeled instances in classification datasets. In Proc. International Joint Conference on Neural Networks (IJCNN), 1–8 (2019).

Northcutt, C., Jiang, L. & Chuang, I. Confident learning: estimating uncertainty in dataset labels. J. Artif. Intell. Res. 70, 1373–1411 (2021).

Kahneman, D., Sibony, O. & Sunstein, C. R. Noise: A Flaw in Human Judgment (Hachette UK, New York, 2021).

Jaramillo, D. Radiologists and their noise: variability in human judgment, fallibility, and strategies to improve accuracy. Radiology 302, 511–512 (2022).

Radiological Society of North America. https://www.rsna.org.

National Cancer Institute, US. QIN - Quantitative Imaging Network. https://imaging.cancer.gov/programs_resources/specialized_initiatives/qin/about/default.htm.

European Society of Radiology. EIBALL - European Imaging Biomarkers Alliance. https://www.myesr.org/research/eiball/.

Anderson, R. N., Miniño, A. M., Hoyert, D. L. & Rosenberg, H. M. Comparability of cause of death between ICD-9 and ICD-10: preliminary estimates. vol. 49 of National Vital Statistics Reports (2001).

Sebastião, Y. V., Metzger, G. A., Chisolm, D. J., Xiang, H. & Cooper, J. N. Impact of ICD-9-cm to ICD-10-cm coding transition on trauma hospitalization trends among young adults in 12 states. Injury Epidemiol. 8, 4 (2021).

Remedios, S. W. et al. Distributed deep learning across multisite datasets for generalized CT hemorrhage segmentation. Med. Phys. 47, 89–98 (2020).

Onofrey, J. A. et al. Generalizable multi-site training and testing of deep neural networks using image normalization. In Proc. IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), 348–351 (2019).

Pooch, E. H., Ballester, P. & Barros, R. C. Can we trust deep learning-based diagnosis? The impact of domain shift in chest radiograph classification. In Proc. Thoracic Image Analysis: Second International Workshop, TIA 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 8, 2020, 74–83 (2020).

Glocker, B., Robinson, R., Castro, D. C., Dou, Q. & Konukoglu, E. Machine learning with multi-site imaging data: an empirical study on the impact of scanner effects. Preprint at https://doi.org/10.48550/arXiv.1910.04597 (2019).

Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002).

He, H., Bai, Y., Garcia, E. A. & Li, S. ADASYN: adaptive synthetic sampling approach for imbalanced learning. In Proc. IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence) (IEEE, Hong Kong, 2008).

Johnson, A. E. et al. MIMIC-III, a freely accessible critical care database. Sci. Data 3, 1–9 (2016).

Rubin, D. B. Inference and missing data. Biometrika 63, 581–592 (1976).

Schafer, J. L. & Graham, J. W. Missing data: our view of the state of the art. Psychol. Methods 7, 147 (2002).

Mazumder, M. et al. DataPerf: benchmarks for data-centric AI development. Adv. Neural Inf. Process. Syst. 36 (2024).

Zha, D. et al. Data-centric artificial intelligence: a survey. Preprint at https://doi.org/10.48550/arXiv.2303.10158 (2023).


Acknowledgements

The authors acknowledge funding by the EU project TEF-Health. The project TEF-Health has received funding from the European Union’s Digital Europe programme under grant agreement no. 101100700. We would like to thank Stefan Haufe for valuable input on the manuscript. We further thank the project partners of the TEF-Health project for feedback on our study.

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and affiliations

Division Medical Physics and Metrological Information Technology, Physikalisch-Technische Bundesanstalt, Berlin, Germany

Daniel Schwabe, Katinka Becker, Martin Seyferth, Andreas Klaß & Tobias Schaeffter

Department of Medical Engineering, Technical University Berlin, Berlin, Germany

Tobias Schaeffter

Einstein Centre for Digital Future, Berlin, Germany


Contributions

D.S. and T.S. designed and supervised the study. D.S., K.B., M.S., and A.K. carried out the theoretical methods and analysed the data. M.S. and A.K. extracted the data. D.S., K.B., M.S., A.K., and T.S. wrote the manuscript.

Corresponding author

Correspondence to Daniel Schwabe.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Reporting summary; Supplementary Data 1; Supplementary Data 2; PRISMA checklist

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Schwabe, D., Becker, K., Seyferth, M. et al. The METRIC-framework for assessing data quality for trustworthy AI in medicine: a systematic review. npj Digit. Med. 7, 203 (2024). https://doi.org/10.1038/s41746-024-01196-4


Received: 21 February 2024

Accepted: 12 July 2024

Published: 03 August 2024

DOI: https://doi.org/10.1038/s41746-024-01196-4


Understanding the differences across data quality classifications: a literature review and guidelines for future research

Industrial Management & Data Systems

ISSN: 0263-5577

Article publication date: 24 August 2021

Issue publication date: 10 November 2021

Purpose

Numerous data quality (DQ) definitions, in the form of sets of DQ dimensions, are found in the literature. The great differences across such DQ classifications (DQCs) imply a lack of clarity about what DQ is. To provide an improved foundation for future research, this paper aims to clarify the ways in which DQCs differ and to provide guidelines for dealing with this variance.

Design/methodology/approach

A literature review identifies DQCs in conference and journal articles, which are analyzed to reveal the types of differences across these. On this basis, guidelines for future research are developed.

Findings

The literature review found 110 unique DQCs in journal and conference articles. The analysis of these articles identified seven distinct types of differences across DQCs, which gave rise to seven guidelines for future DQ research.

Research limitations/implications

By identifying differences across DQCs and providing a set of guidelines, this paper may encourage future research to converge, to a greater extent, around common understandings of DQ.

Practical implications

Awareness of the identified types of differences across DQCs may support managers when planning and conducting DQ improvement projects.

Originality/value

The literature review did not identify any articles that, based on systematic searches, identify and analyze existing DQCs. Thus, this paper provides new knowledge on the variance across DQCs, as well as guidelines for addressing it.

  • Data quality
  • Information quality
  • Data management
  • Information management
  • Data quality dimensions
  • Information quality dimensions

Haug, A. (2021), "Understanding the differences across data quality classifications: a literature review and guidelines for future research", Industrial Management & Data Systems , Vol. 121 No. 12, pp. 2651-2671. https://doi.org/10.1108/IMDS-12-2020-0756

Emerald Publishing Limited

Copyright © 2021, Emerald Publishing Limited


Data Science Journal


Proceedings Papers

The challenges of data quality and data quality assessment in the big data era

  • Yangyong Zhu

High-quality data are the precondition for analyzing and using big data and for guaranteeing the value of the data. Currently, comprehensive analysis and research on quality standards and quality assessment methods for big data are lacking. First, this paper reviews existing data quality research. Second, it analyzes the characteristics of data in the big data environment, presents the quality challenges that big data faces, and formulates a hierarchical data quality framework from the perspective of data users. This framework consists of big data quality dimensions, quality characteristics, and quality indexes. Finally, on the basis of this framework, the paper constructs a dynamic assessment process for data quality. This process is extensible and adaptable and can meet the needs of big data quality assessment. The results enrich the theoretical scope of big data and lay a solid foundation for future work on establishing an assessment model and studying evaluation algorithms.

1 Introduction

Many significant technological changes have occurred in the information technology industry since the beginning of the 21st century, such as cloud computing, the Internet of Things, and social networking. These technologies have made the amount of data increase continuously and accumulate at an unprecedented speed, heralding the arrival of big data (Meng & Ci, 2013). The amount of global data is currently growing exponentially. The relevant units are no longer GB and TB but PB (1 PB = 2^10 TB), EB (1 EB = 2^10 PB), and ZB (1 ZB = 2^10 EB). According to IDC’s “Digital Universe” forecasts (Gantz & Reinsel, 2012), 40 ZB of data will be generated by 2020.
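The unit ladder above uses binary multiples, where each step is a factor of 2^10 = 1,024. The following Python sketch illustrates that convention; the names `BINARY_UNITS`, `to_bytes`, and `convert` are hypothetical and not from the paper. Other sources may use decimal (SI) multiples instead (1 ZB = 10^21 bytes), and the two conventions differ by roughly 18% at the zettabyte scale.

```python
# Binary unit ladder from the text: each unit is 2**10 = 1,024 times the previous one.
# BINARY_UNITS, to_bytes and convert are illustrative names, not from the paper.
BINARY_UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(value: float, unit: str) -> float:
    """Express `value` in `unit` as a number of bytes (binary multiples)."""
    exponent = BINARY_UNITS.index(unit)  # raises ValueError for an unknown unit
    return value * 2 ** (10 * exponent)


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a quantity between two units of the binary ladder."""
    return to_bytes(value, from_unit) / to_bytes(1, to_unit)


if __name__ == "__main__":
    print(convert(1, "PB", "TB"))   # 1024.0
    print(convert(1, "ZB", "EB"))   # 1024.0
    print(convert(40, "ZB", "TB"))  # the 40 ZB forecast expressed in TB
    # A binary zettabyte is about 18% larger than a decimal (10**21-byte) one.
    print(to_bytes(1, "ZB") / 10**21)
```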

The emergence of the big data era has attracted the attention of industry, academia, and government. For example, in 2012, the US government invested $200 million to launch the “Big Data Research and Development Initiative” (Li & Chen, 2012). Nature launched a special issue on big data (Nature, 2008). Science also published a special issue, “Dealing with Data” (Science, 2011), which illustrated the importance of big data for scientific research. In addition, the development and use of big data have spread widely across medicine, retail, finance, manufacturing, logistics, telecommunications, and other industries, generating great social value and industrial potential (Feng, Z. Y., Guo, X. H., Zeng, D. J., et al., 2013).

By rapidly acquiring and analyzing big data from various sources and for various uses, researchers and decision-makers have gradually realized that this massive amount of information helps them understand customer needs, improve service quality, and predict and prevent risks. However, the use and analysis of big data must be based on accurate, high-quality data, which is a necessary condition for generating value from big data. We therefore analyze the data quality challenges that big data poses and propose a quality assessment framework and assessment process for it.

2 Literature Review on Data Quality

Researchers began to study quality issues in the 1950s, focusing especially on product quality, and published a series of definitions. For example, quality was defined as “the degree to which a set of inherent characteristics fulfill the requirements” (General Administration of Quality Supervision, 2008), “fitness for use” (Wang & Strong, 1996), and “conformance to requirements” (Crosby, 1988). Later, with the rapid development of information technology, research turned to data quality.

Outside China, research on data quality began in the 1990s, and many scholars proposed different definitions of data quality and different ways of dividing it into quality dimensions. The Total Data Quality Management group at MIT, led by Professor Richard Y. Wang, has carried out in-depth research in this area. They defined “data quality” as “fitness for use” (Wang & Strong, 1996) and proposed that judgments of data quality depend on data consumers. They also defined a “data quality dimension” as a set of data quality attributes that represent a single aspect or construct of data quality. Using a two-stage survey, they identified fifteen data quality dimensions grouped into four categories.

Some studies have treated web data as their object of research and proposed specific data quality standards and quality measures. Alexander and Tate (1999) described six evaluation criteria for web data: authority, accuracy, objectivity, currency, coverage/intended audience, and interaction/transaction features. Katerattanakul and Siau (1999) developed four categories for the information quality of an individual website, together with a questionnaire to test the importance of each category and how web users determine the information quality of individual sites. For information retrieval, Gauch (2000) proposed six quality metrics to investigate: currency, availability, information-to-noise ratio, authority, popularity, and cohesiveness.

From a social and cultural perspective, Shanks and Corbitt (1999) studied data quality and set up a semiotic-based data quality framework with 4 levels and a total of 11 quality dimensions. Knight and Burn (2005) summarized the most common dimensions and how frequently they appear in different data quality/information quality frameworks. They then presented the IQIP (Identify, Quantify, Implement, and Perfect) model as an approach to managing the choice and implementation of quality-related algorithms in an internet crawling search engine.

According to the U.S. National Institute of Statistical Sciences (NISS) (2001), the principles of data quality are: 1. data are a product, with customers, to whom they have both cost and value; 2. as a product, data have quality, resulting from the process by which data are generated; 3. data quality depends on multiple factors, including (at least) the purpose for which the data are used, the user, the time, etc.

Research on data quality in China began later than elsewhere. The 63rd Research Institute of the PLA General Staff Headquarters created a data quality research group in 2008, which discussed basic problems of data quality such as its definition, sources of error, and approaches to improvement (Cao, Diao, Wang, et al., 2010). In 2011, Xi’an Jiaotong University set up an information quality research group that analyzed the challenges and importance of assuring the quality of big data, as well as responses in terms of process, technology, and management (Zong & Wu, 2013). The Computer Network Information Center of the Chinese Academy of Sciences proposed a data quality assessment method and index system (Data Application Environment Construction and Service of the Chinese Academy of Sciences, 2009) in which data quality is divided into three categories: external form quality, content quality, and utility quality. Each category is subdivided into quality characteristics and an evaluation index.

In summary, existing studies focus on two areas: web data quality, and data quality in specific domains such as biology, medicine, geophysics, telecommunications, and scientific data. Big data, as an emerging technology, is attracting increasing attention, yet research on establishing big data quality standards and assessment methods for multi-source, multimodal environments is still lacking (Song & Qin, 2007).

3 The Challenges of Data Quality in the Big Data Era

3.1 Features of big data

Because big data has new features, its data quality also faces many challenges. The characteristics of big data are commonly summarized as the 4Vs: Volume, Velocity, Variety, and Value (Katal, Wazid, & Goudar, 2013). Volume refers to the tremendous amount of data, typically measured in TB or larger units. Velocity means that data are generated at an unprecedented speed and must be processed in a timely manner. Variety indicates that big data includes many data types, which can be divided into structured and unstructured data; such heterogeneous data demand greater processing capability. Finally, Value refers to low value density: value density is inversely proportional to total data size, so the larger the data set, the lower the proportion of valuable data in it.

3.2 The challenges of data quality

Because of these 4V characteristics, extracting high-quality, genuine data from massive, variable, and complicated data sets has become an urgent issue for enterprises that use and process big data. At present, big data quality faces the following challenges:

  • The diversity of data sources brings abundant data types and complex data structures and increases the difficulty of data integration. In the past, enterprises used only the data generated by their own business systems, such as sales and inventory data. Now, the data that enterprises collect and analyze go far beyond this scope. Big data sources are very broad, including: 1) data sets from the internet and mobile internet (Li & Liu, 2013); 2) data from the Internet of Things; 3) data collected by various industries; and 4) scientific experimental and observational data (Demchenko, Grosso & Laat, 2013), such as high-energy physics experimental data, biological data, and space observation data. These sources produce rich data types. The first type is unstructured data, for example, documents, video, and audio. The second type is semi-structured data, including software packages/modules, spreadsheets, and financial reports. The third is structured data. Unstructured data account for more than 80% of all data in existence. For enterprises, obtaining big data with complex structures from different sources and integrating them effectively is a daunting task (McGilvray, 2008). Data from different sources often conflict or are inconsistent or contradictory. When data volumes are small, such problems can be checked by manual searching or programming, or even by ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) processes. However, these methods become impractical for PB-level or even EB-level data volumes.
  • Data volume is tremendous, and it is difficult to judge data quality within a reasonable amount of time. After the industrial revolution, the amount of information, dominated by text, doubled every ten years. After 1970, it doubled every three years. Today, the global amount of information can double every two years. In 2011, the amount of data created and copied globally reached 1.8 ZB. It is difficult to collect, clean, integrate, and finally obtain the necessary high-quality data within a reasonable time frame. Because unstructured data make up a very high proportion of big data, transforming them into structured types and processing them further takes a great deal of time. This poses a great challenge to existing data quality processing techniques.
  • Data change very fast and the “timeliness” of data is very short, which places higher demands on processing technology. Because big data change rapidly, some data remain valid only for a short time. If companies cannot collect the required data in real time, or take too long to process them, they may obtain outdated and invalid information. Processing and analysis based on such data will produce useless or misleading conclusions, eventually leading to decision-making mistakes by governments or enterprises. At present, real-time processing and analysis software for big data is still being developed or improved, and truly effective commercial products are few.
  • No unified, approved data quality standards have yet been formed in China or abroad, and research on the quality of big data has only just begun. To guarantee product quality and improve benefits to enterprises, the International Organization for Standardization (ISO) published the ISO 9000 standards in 1987. Today, more than 100 countries and regions around the world actively apply these standards, which promotes mutual understanding among enterprises in domestic and international trade and helps eliminate trade barriers. By contrast, the study of data quality standards began in the 1990s, but ISO did not publish the ISO 8000 data quality standards until 2011 (Wang, Li, & Wang, 2010). At present, more than 20 countries have participated in this standard, but it remains the subject of many disputes and still needs to mature. At the same time, research on big data quality in China and abroad has just begun and there are, as yet, few results.
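The growth figures in the list above can be turned into a rough consistency check. Assuming, as the text states, 1.8 ZB in 2011 and a constant doubling period of two years, the volume in year t is V(t) = 1.8 · 2^((t − 2011)/2) ZB. The Python sketch below evaluates this model; the function names are hypothetical. For 2020 it gives about 40.7 ZB, in line with the IDC forecast of 40 ZB quoted in the introduction. This is a back-of-the-envelope calculation under a constant-doubling assumption, not a forecasting model.

```python
import math

# Constant-doubling growth model: V(t) = V0 * 2 ** ((t - t0) / T_double).
# Baseline (1.8 ZB in 2011) and doubling period (2 years) come from the text;
# the function names are illustrative.


def projected_volume(year: float, base_volume: float = 1.8,
                     base_year: float = 2011, doubling_years: float = 2.0) -> float:
    """Projected global data volume in ZB under constant doubling."""
    return base_volume * 2 ** ((year - base_year) / doubling_years)


def years_to_reach(target: float, base_volume: float = 1.8,
                   doubling_years: float = 2.0) -> float:
    """Years after the base year until the volume reaches `target` ZB."""
    return doubling_years * math.log2(target / base_volume)


if __name__ == "__main__":
    print(f"2020: {projected_volume(2020):.1f} ZB")  # 2020: 40.7 ZB
    print(f"40 ZB reached after {years_to_reach(40):.1f} years")
```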

4 Quality Criteria of Big Data

Big data is a new concept, and academia has not yet agreed on a uniform definition of its data quality or quality criteria. Definitions of data quality differ across the literature, but one thing is certain: data quality depends not only on the data's own features but also on the business environment in which the data are used, including business processes and business users. Only data that conform to their intended uses and meet requirements can be considered qualified (good-quality) data. Data quality standards are usually developed from the perspective of data producers. In the past, data consumers were either direct or indirect data producers, which helped ensure data quality. In the age of big data, however, the diversity of data sources means that data users are not necessarily data producers, which makes data quality very difficult to measure. We therefore propose a hierarchical data quality standard from the perspective of users, as shown in Figure 1.

Figure 1. Data quality framework.

We chose commonly accepted and widely used data quality dimensions as big data quality standards and redefined their basic concepts based on actual business needs. Each dimension was then divided into typical elements associated with it, and each element has its own corresponding quality indicators. In this way, hierarchical quality standards for big data were used for evaluation. Figure 2 shows a universal two-layer data quality standard. Some detailed data quality indicators are given in Table 1.

Figure 2. A universal, two-layer big data quality standard for assessment.

Table 1. The hierarchical big data quality assessment framework (partial content).

| Dimension | Element | Indicator |
|---|---|---|
| 1) Availability | 1) Accessibility | Whether a data access interface is provided |
| | | Data can be easily made public or easy to purchase |
| | 2) Timeliness | Within a given time, whether the data arrive on time |
| | | Whether data are regularly updated |
| | | Whether the time interval from data collection and processing to release meets requirements |
| 2) Usability | 1) Credibility | Data come from specialized organizations of a country, field, or industry |
| | | Experts or specialists regularly audit and check the correctness of the data content |
| | | Data exist in the range of known or acceptable values |
| 3) Reliability | 1) Accuracy | Data provided are accurate |
| | | Data representation (or value) well reflects the true state of the source information |
| | | Information (data) representation will not cause ambiguity |
| | 2) Consistency | After data have been processed, their concepts, value domains, and formats still match as before processing |
| | | During a certain time, data remain consistent and verifiable |
| | | Data and the data from other data sources are consistent or verifiable |
| | 3) Integrity | Data format is clear and meets the criteria |
| | | Data are consistent with structural integrity |
| | | Data are consistent with content integrity |
| | 4) Completeness | Whether the deficiency of a component will impact use of the data for data with multi-components |
| | | Whether the deficiency of a component will impact data accuracy and integrity |
| 4) Relevance | 1) Fitness | The data collected do not completely match the theme, but they expound one aspect |
| | | Most datasets retrieved are within the retrieval theme users need |
| | | Information theme provides matches with users’ retrieval theme |
| 5) Presentation Quality | 1) Readability | Data (content, format, etc.) are clear and understandable |
| | | It is easy to judge that the data provided meet needs |
| | | Data description, classification, and coding content satisfy specification and are easy to understand |

In Figure 2, the data quality standard is composed of five dimensions of data quality: availability, usability, reliability, relevance, and presentation quality. For each dimension, we identified 1–5 elements with good practices. The first four dimensions are regarded as indispensable, inherent features of data quality, while the final dimension is an additional property that improves customer satisfaction. Availability is defined as the degree of convenience for users to obtain data and related information; it is divided into three elements: accessibility, authorization, and timeliness. Usability refers to whether the data are useful and meet users’ needs; its elements are data definition/documentation, credibility, and metadata. Reliability refers to whether we can trust the data; it consists of the accuracy, consistency, integrity, completeness, and auditability elements. Relevance describes the degree of correlation between data content and users’ expectations or demands; fitness is its quality element (Cappiello, Francalanci, & Pernici, 2004). Presentation quality refers to a valid description method for the data, which allows users to fully understand the data; its elements are readability and structure. Descriptions of the data quality elements are given below.

  • Accessibility Accessibility refers to how difficult it is for users to obtain data. Accessibility is closely linked with data openness: the more open the data, the more data types can be obtained, and the higher the degree of accessibility.
  • Timeliness Timeliness is defined as the time delay from data generation and acquisition to utilization (McGilvray, 2010). Data should be available within this delay to allow for meaningful analysis. In the age of big data, data content changes quickly, so timeliness is very important.
  • Authorization Authorization refers to whether an individual or organization has the right to use the data.
  • Credibility Credibility is used to evaluate non-numeric data. It refers to the objective and subjective components of the believability of a source or message. Credibility of data has three key factors: reliability of data sources, data normalization, and the time when the data are produced.
  • Definition/Documentation Definition/documentation consists of the data specification, which includes the data name, definition, ranges of valid values, standard formats, business rules, etc. Normative data definitions improve the degree of data usage.
  • Metadata As the number of data sources and data types grows, data consumers may distort the meaning of common terminology and data concepts, so using the data may bring risks. Therefore, data producers need to provide metadata describing different aspects of the datasets to reduce the problems caused by misunderstanding or inconsistencies.
  • Accuracy To ascertain the accuracy of a given data value, it is compared to a known reference value. In some situations, accuracy can be easily measured, such as gender, which has only two definite values: male and female. But in other cases, there is no known reference value, making it difficult to measure accuracy. Because accuracy is correlated with context to some extent, data accuracy should be decided by the application situation.
  • Consistency Data consistency refers to whether the logical relationship between correlated data is correct and complete. In the field of databases (Silberschatz, Korth, & Sudarshan, 2006), it usually means that the same data located in different storage areas should be considered equivalent. Equivalency means that the data have equal value and the same meaning or are essentially the same. Data synchronization is the process of making data equal.
  • Integrity The term data integrity is broad in scope and may have widely different meanings depending on the specific context. In a database, data with “integrity” are said to have a complete structure. Data values are standardized according to a data model and/or data type. All characteristics of the data must be correct – including business rules, relations, dates, definitions, etc. In information security, data integrity means maintaining and assuring the accuracy and consistency of data over its entire life-cycle. This means that data cannot be modified in an unauthorized or undetected manner.
  • Completeness If a datum has multiple components, we can describe its quality with completeness. Completeness means that the values of all components of a single datum are valid. For example, RGB values describe the red, green, and blue components of an image color, and together they represent all parts of the color data. If the value of any component is missing, the image cannot show the real color and its completeness is destroyed (Wang & Storey, 1995).
  • Auditability From the perspective of audit application, the data life cycle includes three phases: data generation, data collection, and data use (Wang & Zhu, 2007). Here, however, auditability means that auditors can fairly evaluate data accuracy and integrity within reasonable time and manpower limits during the data use phase.
  • Fitness Fitness has two-level requirements: 1) the amount of accessed data used by users and 2) the degree to which the data produced matches users’ needs in the aspects of indicator definition, elements, classification, etc.
  • Readability Readability is defined as the ability of data content to be correctly explained according to known or well defined terms, attributes, units, codes, abbreviations, or other information.
  • Structure More than 80% of all data is unstructured; structure therefore refers to the level of difficulty in transforming semi-structured or unstructured data into structured data through technology.
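
Several of these elements lend themselves to simple ratio checks. The Python sketch below is illustrative only and is not the measurement method of this framework: the function names are hypothetical, timeliness takes (generated, used) timestamp pairs, accuracy assumes a table of known reference values exists, consistency compares two dictionaries standing in for copies of the same data in different storage areas, and completeness assumes an RGB channel is valid when it is an integer from 0 to 255.

```python
from datetime import datetime, timedelta


def timeliness_ratio(records, max_delay):
    """Share of (generated_at, used_at) pairs whose delay is within max_delay."""
    if not records:
        return 0.0
    on_time = sum(1 for generated_at, used_at in records
                  if used_at - generated_at <= max_delay)
    return on_time / len(records)


def accuracy_ratio(values, reference):
    """Share of values that equal their known reference value.

    Only keys with a reference value can be checked; None is returned when
    no key has one, because accuracy cannot then be measured this way.
    """
    checkable = [key for key in values if key in reference]
    if not checkable:
        return None
    matches = sum(values[key] == reference[key] for key in checkable)
    return matches / len(checkable)


def consistency_report(store_a, store_b):
    """Compare two copies of the same data kept in different storage areas.

    Returns (keys whose values differ, keys present in only one store).
    """
    conflicts = sorted(key for key in store_a.keys() & store_b.keys()
                       if store_a[key] != store_b[key])
    unsynchronized = sorted(store_a.keys() ^ store_b.keys())
    return conflicts, unsynchronized


def completeness_ratio(items, components=("r", "g", "b")):
    """Share of multi-component items in which every component is valid.

    A component is assumed valid when it is an int from 0 to 255 (RGB channel).
    """
    if not items:
        return 0.0

    def is_complete(item):
        return all(isinstance(item.get(c), int) and 0 <= item[c] <= 255
                   for c in components)

    return sum(is_complete(item) for item in items) / len(items)


# Usage with small, made-up data.
t0 = datetime(2024, 1, 1, 8, 0)
print(timeliness_ratio([(t0, t0 + timedelta(hours=2)),
                        (t0, t0 + timedelta(days=3))], timedelta(hours=24)))
print(accuracy_ratio({"p1": 1980, "p2": 1975, "p3": 1990},
                     {"p1": 1980, "p2": 1976}))
print(consistency_report({"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 5, "d": 4}))
print(completeness_ratio([{"r": 255, "g": 128, "b": 0},
                          {"r": 10, "g": None, "b": 40}]))
```

Each function returns a share between 0 and 1 (or None when accuracy cannot be measured), so its result could serve as one indicator score for the corresponding element.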

We present a big data quality assessment framework in Table 1, which lists the common quality elements and their associated indicators. Generally, each quality element has multiple indicators of its own.
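
The two-layer structure (dimensions containing elements, and elements carrying several indicators) maps naturally onto nested records. The following sketch, which needs Python 3.9 or later, uses hypothetical class names and encodes only an excerpt of Table 1 with shortened indicator wording.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    indicators: list[str] = field(default_factory=list)


@dataclass
class Dimension:
    name: str
    elements: list[Element] = field(default_factory=list)


# Excerpt of Table 1; indicator wording is shortened.
framework = [
    Dimension("Availability", [
        Element("Accessibility", [
            "A data access interface is provided",
            "Data can easily be made public or purchased",
        ]),
        Element("Timeliness", [
            "Data arrive on time",
            "Data are regularly updated",
            "Collection-to-release interval meets requirements",
        ]),
    ]),
    Dimension("Relevance", [
        Element("Fitness", [
            "Data partly match the theme but expound one aspect",
            "Most retrieved datasets are within the users' theme",
            "Information theme matches the users' retrieval theme",
        ]),
    ]),
]

# Walk the hierarchy: dimension, then element, then its number of indicators.
for dimension in framework:
    for element in dimension.elements:
        print(f"{dimension.name} / {element.name}: "
              f"{len(element.indicators)} indicators")
```

Keeping indicators attached to their element makes it straightforward to add the remaining rows of Table 1 or to attach a score to each indicator later.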

5 QUALITY ASSESSMENT PROCESS FOR BIG DATA

An appropriate quality assessment method for big data is necessary to draw valid conclusions. In this paper, we propose an effective data quality assessment process with a dynamic feedback mechanism based on big data’s own characteristics, shown in Figure 3 .


Figure 3. Quality assessment process for big data.

Determining the goals of data collection is the first step of the whole assessment process. Big data users rationally choose the data to be used according to their strategic objectives or business requirements, such as operations, decision making, and planning. The data sources, types, volume, quality requirements, assessment criteria, and specifications as well as the expected goals need to be determined in advance.

In different business environments, the selection of data quality elements will differ. For example, for social media data, timeliness and accuracy are two important quality features. However, because it is difficult to judge accuracy directly (Shankaranarayanan, Ziad, & Wang, 2012), some additional information is needed to judge the raw data, and other data sources serve as supplements or evidence. Therefore, credibility has become an important quality dimension. Social media data are also usually unstructured, so their consistency and integrity are not suitable for evaluation. The field of biology is another important source of big data. However, due to the lack of uniform standards, data storage software and data formats vary widely. Thus, it is difficult to regard consistency as a quality dimension, and there is little need to treat timeliness and completeness as data quality dimensions.

To carry the quality assessment further, we need to choose specific assessment indicators for every dimension. These indicators require the data to comply with specific conditions or features, and their formulation also depends on the actual business environment.

Each quality dimension needs different measurement tools, techniques, and processes, which leads to differences in assessment time, cost, and human resources. With a clear understanding of the work required to assess each dimension, choosing the dimensions that meet the needs helps define a project’s scope. The preliminary assessment results for the data quality dimensions determine the baseline, while the remaining assessments, carried out as part of the business process, are used for continuous detection and information improvement.
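
As a rough illustration of this preparation step, the sketch below fixes the scope (the elements chosen for one business environment) and takes the preliminary assessment results as the baseline. The function name, the 0 to 1 score scale, and the scores themselves are made-up assumptions; the element choice follows the social media example above, where integrity is left out because the data are unstructured.

```python
def build_plan(chosen_elements, preliminary_scores):
    """Fix the assessment scope and baseline from a preliminary assessment.

    chosen_elements: elements selected for this business environment.
    preliminary_scores: element name -> score in [0, 1] from the first run.
    The baseline is simply each chosen element's preliminary score; scores
    for elements outside the scope are ignored.
    """
    missing = [e for e in chosen_elements if e not in preliminary_scores]
    if missing:
        raise ValueError(f"no preliminary score for: {missing}")
    return {e: preliminary_scores[e] for e in chosen_elements}


# Social media example: integrity is scored but falls outside the scope.
social_media_plan = build_plan(
    ["timeliness", "accuracy", "credibility"],
    {"timeliness": 0.92, "accuracy": 0.81, "credibility": 0.75, "integrity": 0.40},
)
print(social_media_plan)
```

Raising an error for an unscored element keeps the baseline from silently omitting part of the declared scope.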

After the quality assessment preparation is completed, the process enters the data acquisition phase. There are many ways to collect data (Zhu & Xiong, 2009), including data integration, search-download, web crawlers, agent methods, carrier monitors, etc. In the age of big data, data acquisition is relatively easy, but the data collected are not always of good quality. We need to improve data quality as far as possible under these conditions without a large increase in acquisition cost.

Big data sources are very diverse, and data structures are complex. The data received may have quality problems, such as data errors, missing information, inconsistencies, noise, etc. The purpose of data cleaning (data scrubbing) is to detect and remove errors and inconsistencies from data in order to improve their quality. Data cleaning can be divided into four patterns based on implementation methods and scopes (Wang, Zhang, & Zhang, 2007): manual implementation, writing special application programs, data cleaning unrelated to specific application fields, and solving the problems of a specific type of application domain. Of these four approaches, the third has good practical value and can be applied successfully.
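
The third pattern, cleaning that is not tied to a specific application field, can be sketched as a generic pass over records. The example below is hypothetical: it assumes records are dictionaries with string keys and hashable values, and it applies only domain-independent rules (whitespace trimming, required-field checks, and exact-duplicate removal).

```python
def clean_records(records, required):
    """Generic cleaning pass that uses no domain-specific rules.

    - strips surrounding whitespace from string values
    - drops records in which any required field is missing or empty
    - drops exact duplicates, keeping the first occurrence
    Values are assumed to be hashable (strings, numbers, None).
    """
    seen = set()
    cleaned = []
    for record in records:
        record = {key: value.strip() if isinstance(value, str) else value
                  for key, value in record.items()}
        if any(record.get(name) in (None, "") for name in required):
            continue
        fingerprint = tuple(sorted(record.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(record)
    return cleaned


raw = [
    {"id": "1", "city": " Porto "},
    {"id": "1", "city": "Porto"},   # duplicate once whitespace is stripped
    {"id": "2", "city": ""},        # missing a required value
    {"id": "3", "city": "Lisbon"},
]
print(clean_records(raw, required=["id", "city"]))
```

Domain-specific rules, such as valid ranges for a particular measurement, would belong to the fourth pattern and are deliberately left out.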

Then, the process enters the data quality assessment and monitoring phases. The core of data quality assessment is how to evaluate each dimension. Current methods fall into two categories: qualitative and quantitative. The qualitative method describes and assesses data resources through qualitative analysis, based on evaluation criteria and requirements that reflect the assessment purposes and user demands. Qualitative analysis should be performed by subject experts or professionals. The quantitative method is a formal, objective, and systematic process in which numerical data are used to obtain information. Objectivity, generalizability, and numbers are therefore features often associated with this method, whose evaluation results are more intuitive and concrete.
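
For the quantitative method, one common way to turn indicator results into a single number is a weighted mean. The sketch below assumes each indicator has already been scored on a 0 to 1 scale; the function name and the example scores are hypothetical, and because the weight coefficient for each indicator is left to future research in this work, the weights shown are placeholders.

```python
def weighted_score(indicator_scores, weights=None):
    """Weighted mean of indicator scores, each assumed to lie in [0, 1].

    With no weights, every indicator counts equally. The weights are
    placeholders; real weight coefficients are not specified here.
    """
    if not indicator_scores:
        raise ValueError("at least one indicator score is required")
    if weights is None:
        weights = [1.0] * len(indicator_scores)
    if len(weights) != len(indicator_scores):
        raise ValueError("exactly one weight per indicator is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(s * w for s, w in zip(indicator_scores, weights)) / total


# Made-up scores for two indicators of one element, equal and unequal weights.
print(weighted_score([0.9, 0.7]))
print(weighted_score([0.9, 0.6], weights=[3, 1]))
```

The same function can be applied twice: once to combine indicators into an element score, and again to combine element scores into a dimension score.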

After assessment, the results are compared with the data quality baseline established above. If the data quality meets the baseline standard, the process moves on to the follow-up data analysis phase, and a data quality report is generated. Otherwise, if the data quality fails to satisfy the baseline standard, new data must be acquired.
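
The comparison with the baseline is a simple gate. In this hypothetical sketch, scores and baseline values are dictionaries keyed by element name on a 0 to 1 scale, and an element with no score counts as failing.

```python
def compare_with_baseline(scores, baseline):
    """Return (passed, shortfalls) for assessed scores against the baseline.

    An element falls short when its score is missing or below its baseline
    value; shortfalls maps each such element to its score (or None).
    """
    shortfalls = {name: scores.get(name) for name, minimum in baseline.items()
                  if scores.get(name) is None or scores[name] < minimum}
    return not shortfalls, shortfalls


def next_step(scores, baseline):
    """Decide where the process goes after assessment."""
    passed, shortfalls = compare_with_baseline(scores, baseline)
    if passed:
        return "analyze data and generate quality report"
    return "acquire new data (below baseline: " + ", ".join(sorted(shortfalls)) + ")"


baseline = {"timeliness": 0.90, "accuracy": 0.80}
print(next_step({"timeliness": 0.95, "accuracy": 0.85}, baseline))
print(next_step({"timeliness": 0.95, "accuracy": 0.70}, baseline))
```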

Strictly speaking, data analysis and data mining do not belong to the scope of big data quality assessment, but they play an important role in the dynamic adjustment and feedback of data quality assessment. We can use these two methods to discover whether valuable information or knowledge exists in big data and whether that knowledge can support policy proposals, business decisions, scientific discoveries, disease treatments, etc. If the analysis results meet the goal, they are output and fed back to the quality assessment system to provide better support for the next round of assessment. If the results do not reach the goal, the data quality assessment baseline may not be reasonable, and it needs to be adjusted promptly in order to obtain results in line with our goals.
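
The feedback step can be sketched as a function that updates the baseline after each round of analysis. This is an assumption-laden illustration: the text says only that an unreasonable baseline should be adjusted, so the choice to tighten every threshold by a fixed step (capped at 1.0) is hypothetical.

```python
def adjust_baseline(baseline, goal_met, step=0.05):
    """One feedback round on the assessment baseline.

    If the analysis met its goal, the baseline is kept (as a copy).
    Otherwise every threshold is raised by `step`, capped at 1.0. Raising
    rather than lowering thresholds is an illustrative policy only.
    """
    if goal_met:
        return dict(baseline)
    return {name: min(1.0, round(value + step, 4))
            for name, value in baseline.items()}


baseline = {"timeliness": 0.90, "accuracy": 0.98}
print(adjust_baseline(baseline, goal_met=True))
print(adjust_baseline(baseline, goal_met=False))
```

Rounding guards against floating-point drift accumulating over many feedback rounds; a real system might instead learn the adjustment from how far the analysis fell short of its goal.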

6 Conclusion

The arrival of the big data era has brought explosive growth of data in various industries and fields. How to ensure big data quality, and how to analyze and mine the information and knowledge hidden behind the data, have become major issues for industry and academia. Poor data quality leads to low data utilization efficiency and can even cause serious decision-making mistakes. We analyzed the challenges faced by big data quality and proposed a hierarchical data quality framework. We then formulated a dynamic big data quality assessment process with a feedback mechanism, which lays a foundation for further study of the assessment model. The next stage of research will involve constructing a big data quality assessment model and determining a weight coefficient for each assessment indicator. At the same time, the research team will develop an algorithm to make practical assessments of big data quality in a specific field.

7 Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61170096, the Major Program of National Natural Science Foundation of China under Grant No. 71331005, the Shanghai Science and Technology Development Funds under Grant No. 13dz2260200, 13511504300, and the Department of Science and Technology of Yunnan Province under Grant No. 2012FD004.

References

Alan, F. K., Sanil, A. P., Sacks, J., et al. (2001) Workshop Report: Affiliates Workshop on Data Quality, North Carolina: NISS.

Alexander, J. E., & Tate, M. A. Web wisdom: How to evaluate and create information on the web, Mahwah, NJ: Erlbaum.

Cao, J. J., Diao, X. C., Wang, T., et al. (2010) Research on Some Basic Problems in Data Quality Control. Microcomputer Information 09, pp 12–14.

Cappiello, C., Francalanci, C., & Pernici, B. (2004) Data quality assessment from user’s perspective. Proceedings of the 2004 International Workshop on Information Quality in Information Systems, New York: ACM, pp 78–73.

Crosby, P. B. (1988) Quality is Free: The Art of Making Quality Certain, New York: McGraw-Hill.

Data Application Environment Construction and Service of Chinese Academy of Sciences (2009) Data Quality Evaluation Method and Index System. Retrieved October 30, 2013 from the World Wide Web: http://www.csdb.cn/upload/101205/1012052021536150.pdf

Demchenko, Y., Grosso, P., de Laat, C., et al. (2013) Addressing Big Data Issues in Scientific Data Infrastructure. Proceedings of the 2013 International Conference on Collaboration Technologies and Systems, California: ACM, pp 48–55.

Feng, Z. Y., Guo, X. H., Zeng, D. J., et al. (2013) On the research frontiers of business management in the context of Big Data. Journal of Management Sciences in China 16 (01), pp 1–9.

Gantz, J., & Reinsel, D. (2012) THE DIGITAL UNIVERSE IN 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East. Retrieved February, 2013 from the World Wide Web: http://www.emc.com/collateral/analyst-reports/idc-digital-universe-western-europe.pdf

General Administration of Quality Supervision, Inspection and Quarantine of the People’s Republic of China (2008) Quality management systems-Fundamentals and vocabulary (GB/T 19000-2008/ISO 9000:2005), Beijing.

Katal, A., Wazid, M., & Goudar, R. (2013) Big Data: Issues, Challenges, Tools and Good Practices. Proceedings of the 2013 Sixth International Conference on Contemporary Computing, Noida: IEEE, pp 404–409.

Katerattanakul, P., & Siau, K. (1999) Measuring information quality of web sites: Development of an instrument. Proceedings of the 20th International Conference on Information Systems, North Carolina: ACM, pp 279–285.

Knight, S., & Burn, J. (2005) Developing a Framework for Assessing Information Quality on the World Wide Web. Information Science Journal 18, pp 159–171.

Li, G. J., & Chen, X. Q. (2012) Research Status and Scientific Thinking of Big Data. Bulletin of Chinese Academy of Sciences 27 (06), pp 648–657.

Li, J. Z., & Liu, X. M. (2013) An Important Aspect of Big Data: Data Usability. Journal of Computer Research and Development 50 (6), pp 1147–1162.

McGilvray, D. (2008) Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information, California: Morgan Kaufmann.

McGilvray, D. (2010) Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information, Beijing: Publishing House of Electronics Industry.

Meng, X. F., & Ci, X. (2013) Big Data Management: Concepts, Techniques and Challenges. Journal of Computer Research and Development 50 (1), pp 146–169.

Nature (2008) Big Data. Retrieved November 5, 2013 from the World Wide Web: http://www.nature.com/news/specials/bigdata/index.html

Science (2011) Special online collection: Dealing with data. Retrieved November 5, 2013 from the World Wide Web: http://www.sciencemag.org/site/special/data/

Shankaranarayanan, G., Ziad, M., & Wang, R. Y. (2012) Preliminary Study on Data Quality Assessment for Socialized Media. China Science and Technology Resources 44 (2), pp 72–79.

Shanks, G., & Corbitt, B. (1999) Understanding data quality: Social and cultural aspects. Proceedings of the 10th Australasian Conference on Information Systems, Wellington: MCB University Press Ltd., pp 785–797.

Silberschatz, A., Korth, H., & Sudarshan, S. (2006) Database System Concepts, Beijing: Higher Education Press.

Song, M., & Qin, Z. (2007) Reviews of Foreign Studies on Data Quality Management. Journal of Information 2, pp 7–9.

Wang, H., & Zhu, W. M. (2007) Quality of Audit Data: A Perspective of Evidence. Journal of Nanjing University (Natural Sciences) 43 (1), pp 29–34.

Wang, J. L., Li, H., & Wang, Q. (2010) Research on ISO 8000 Series Standards for Data Quality. Standard Science 12, pp 44–46.

Wang, R., & Storey, V. (1995) Framework for Analysis of Quality Research. IEEE Transactions on Knowledge and Data Engineering 1 (4), pp 623–637.

Wang, R. Y., & Strong, D. M. (1996) Beyond Accuracy: What Data Quality Means to Data Consumers. Journal of Management Information Systems 12 (4), pp 5–33.

Wang, Y. F., Zhang, C. Z., Zhang, B. B., et al. (2007) A Survey of Data Cleaning. New Technology of Library and Information Service 12, pp 50–56.

Zhu, X., & Gauch, S. (2000) Incorporating quality metrics in centralized/distributed information retrieval on the World Wide Web. Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Athens: ACM, pp 288–295.

Zhu, Y. Y., & Xiong, Y. (2009) Datology and Data Science, Shanghai: Fudan University Press.

Zong, W., & Wu, F. (2013) The Challenge of Data Quality in the Big Data Age. Journal of Xi’an Jiaotong University (Social Sciences) 33 (5), pp 38–43.


How to Write a Literature Review | Guide, Examples, & Templates

Published on January 2, 2023 by Shona McCombes. Revised on September 11, 2023.

What is a literature review? A literature review is a survey of scholarly sources on a specific topic. It provides an overview of current knowledge, allowing you to identify relevant theories, methods, and gaps in the existing research that you can later apply to your paper, thesis, or dissertation topic.

There are five key steps to writing a literature review:

  • Search for relevant literature
  • Evaluate sources
  • Identify themes, debates, and gaps
  • Outline the structure
  • Write your literature review

A good literature review doesn’t just summarize sources; it analyzes, synthesizes, and critically evaluates to give a clear picture of the state of knowledge on the subject.


Table of contents

  • What is the purpose of a literature review?
  • Examples of literature reviews
  • Step 1 – Search for relevant literature
  • Step 2 – Evaluate and select sources
  • Step 3 – Identify themes, debates, and gaps
  • Step 4 – Outline your literature review’s structure
  • Step 5 – Write your literature review
  • Free lecture slides
  • Other interesting articles
  • Frequently asked questions

What is the purpose of a literature review?

When you write a thesis, dissertation, or research paper, you will likely have to conduct a literature review to situate your research within existing knowledge. The literature review gives you a chance to:

  • Demonstrate your familiarity with the topic and its scholarly context
  • Develop a theoretical framework and methodology for your research
  • Position your work in relation to other researchers and theorists
  • Show how your research addresses a gap or contributes to a debate
  • Evaluate the current state of research and demonstrate your knowledge of the scholarly debates around your topic.

Writing literature reviews is a particularly important skill if you want to apply for graduate school or pursue a career in research. We’ve written a step-by-step guide that you can follow below.


Examples of literature reviews

Writing literature reviews can be quite challenging! A good starting point could be to look at some examples, depending on what kind of literature review you’d like to write.

  • Example literature review #1: “Why Do People Migrate? A Review of the Theoretical Literature” (Theoretical literature review about the development of economic migration theory from the 1950s to today.)
  • Example literature review #2: “Literature review as a research methodology: An overview and guidelines” (Methodological literature review about interdisciplinary knowledge acquisition and production.)
  • Example literature review #3: “The Use of Technology in English Language Learning: A Literature Review” (Thematic literature review about the effects of technology on language acquisition.)
  • Example literature review #4: “Learners’ Listening Comprehension Difficulties in English Language Learning: A Literature Review” (Chronological literature review about how the concept of listening skills has changed over time.)

You can also check out our templates with literature review examples and sample outlines.

Step 1 – Search for relevant literature

Before you begin searching for literature, you need a clearly defined topic.

If you are writing the literature review section of a dissertation or research paper, you will search for literature related to your research problem and questions.

Make a list of keywords

Start by creating a list of keywords related to your research question. Include each of the key concepts or variables you’re interested in, and list any synonyms and related terms. You can add to this list as you discover new keywords in the process of your literature search. For example, a review of research on social media and body image among Generation Z might start with these keyword groups:

  • Social media, Facebook, Instagram, Twitter, Snapchat, TikTok
  • Body image, self-perception, self-esteem, mental health
  • Generation Z, teenagers, adolescents, youth

Search for relevant sources

Use your keywords to begin searching for sources. Some useful databases to search for journals and articles include:

  • Your university’s library catalogue
  • Google Scholar
  • Project Muse (humanities and social sciences)
  • Medline (life sciences and biomedicine)
  • EconLit (economics)
  • Inspec (physics, engineering and computer science)

You can also use Boolean operators to help narrow down your search.

Make sure to read the abstract to find out whether an article is relevant to your question. When you find a useful book or article, you can check the bibliography to find other relevant sources.

Step 2 – Evaluate and select sources

You likely won’t be able to read absolutely everything that has been written on your topic, so it will be necessary to evaluate which sources are most relevant to your research question.

For each publication, ask yourself:

  • What question or problem is the author addressing?
  • What are the key concepts and how are they defined?
  • What are the key theories, models, and methods?
  • Does the research use established frameworks or take an innovative approach?
  • What are the results and conclusions of the study?
  • How does the publication relate to other literature in the field? Does it confirm, add to, or challenge established knowledge?
  • What are the strengths and weaknesses of the research?

Make sure the sources you use are credible, and make sure you read any landmark studies and major theories in your field of research.

You can use our template to summarize and evaluate sources you’re thinking about using.

Take notes and cite your sources

As you read, you should also begin the writing process. Take notes that you can later incorporate into the text of your literature review.

It is important to keep track of your sources with citations to avoid plagiarism. It can be helpful to make an annotated bibliography, where you compile full citation information and write a paragraph of summary and analysis for each source. This helps you remember what you read and saves time later in the process.


Step 3 – Identify themes, debates, and gaps

To begin organizing your literature review’s argument and structure, be sure you understand the connections and relationships between the sources you’ve read. Based on your reading and notes, you can look for:

  • Trends and patterns (in theory, method or results): do certain approaches become more or less popular over time?
  • Themes: what questions or concepts recur across the literature?
  • Debates, conflicts and contradictions: where do sources disagree?
  • Pivotal publications: are there any influential theories or studies that changed the direction of the field?
  • Gaps: what is missing from the literature? Are there weaknesses that need to be addressed?

This step will help you work out the structure of your literature review and (if applicable) show how your own research will contribute to existing knowledge.

For example, in reviewing the literature on social media and body image, you might note that:

  • Most research has focused on young women.
  • There is an increasing interest in the visual aspects of social media.
  • But there is still a lack of robust research on highly visual platforms like Instagram and Snapchat—this is a gap that you could address in your own research.

Step 4 – Outline your literature review’s structure

There are various approaches to organizing the body of a literature review. Depending on the length of your literature review, you can combine several of these strategies (for example, your overall structure might be thematic, but each theme is discussed chronologically).

Chronological

The simplest approach is to trace the development of the topic over time. However, if you choose this strategy, be careful to avoid simply listing and summarizing sources in order.

Try to analyze patterns, turning points and key debates that have shaped the direction of the field. Give your interpretation of how and why certain developments occurred.

Thematic

If you have found some recurring central themes, you can organize your literature review into subsections that address different aspects of the topic.

For example, if you are reviewing literature about inequalities in migrant health outcomes, key themes might include healthcare policy, language barriers, cultural attitudes, legal status, and economic access.

Methodological

If you draw your sources from different disciplines or fields that use a variety of research methods, you might want to compare the results and conclusions that emerge from different approaches. For example:

  • Look at what results have emerged in qualitative versus quantitative research
  • Discuss how the topic has been approached by empirical versus theoretical scholarship
  • Divide the literature into sociological, historical, and cultural sources

Theoretical

A literature review is often the foundation for a theoretical framework. You can use it to discuss various theories, models, and definitions of key concepts.

You might argue for the relevance of a specific theoretical approach, or combine various theoretical concepts to create a framework for your research.

Step 5 – Write your literature review

Like any other academic text, your literature review should have an introduction, a main body, and a conclusion. What you include in each depends on the objective of your literature review.

The introduction should clearly establish the focus and purpose of the literature review.

Depending on the length of your literature review, you might want to divide the body into subsections. You can use a subheading for each theme, time period, or methodological approach.

As you write, you can follow these tips:

  • Summarize and synthesize: give an overview of the main points of each source and combine them into a coherent whole
  • Analyze and interpret: don’t just paraphrase other researchers — add your own interpretations where possible, discussing the significance of findings in relation to the literature as a whole
  • Critically evaluate: mention the strengths and weaknesses of your sources
  • Write in well-structured paragraphs: use transition words and topic sentences to draw connections, comparisons and contrasts

In the conclusion, you should summarize the key findings you have taken from the literature and emphasize their significance.

When you’ve finished writing and revising your literature review, don’t forget to proofread thoroughly before submitting.

Free lecture slides

This article has been adapted into lecture slides that you can use to teach your students about writing a literature review.

Scribbr slides are free to use, customize, and distribute for educational purposes.


Other interesting articles

If you want to know more about the research process, methodology, research bias, or statistics, make sure to check out some of our other articles with explanations and examples.

Methodology

  • Sampling methods
  • Simple random sampling
  • Stratified sampling
  • Cluster sampling
  • Likert scales
  • Reproducibility

Statistics

  • Null hypothesis
  • Statistical power
  • Probability distribution
  • Effect size
  • Poisson distribution

Research bias

  • Optimism bias
  • Cognitive bias
  • Implicit bias
  • Hawthorne effect
  • Anchoring bias
  • Explicit bias

Frequently asked questions

A literature review is a survey of scholarly sources (such as books, journal articles, and theses) related to a specific topic or research question.

It is often written as part of a thesis, dissertation, or research paper, in order to situate your work in relation to existing knowledge.

There are several reasons to conduct a literature review at the beginning of a research project:

  • To familiarize yourself with the current state of knowledge on your topic
  • To ensure that you’re not just repeating what others have already done
  • To identify gaps in knowledge and unresolved problems that your research can address
  • To develop your theoretical framework and methodology
  • To provide an overview of the key findings and debates on the topic

Writing the literature review shows your reader how your work relates to existing research and what new insights it will contribute.

The literature review usually comes near the beginning of your thesis or dissertation. After the introduction, it grounds your research in a scholarly field and leads directly to your theoretical framework or methodology.

A literature review is a survey of credible sources on a topic, often used in dissertations , theses, and research papers . Literature reviews give an overview of knowledge on a subject, helping you identify relevant theories and methods, as well as gaps in existing research. Literature reviews are set up similarly to other  academic texts , with an introduction , a main body, and a conclusion .

An  annotated bibliography is a list of  source references that has a short description (called an annotation ) for each of the sources. It is often assigned as part of the research process for a  paper .  

Cite this Scribbr article

McCombes, S. (2023, September 11). How to Write a Literature Review | Guide, Examples, & Templates. Scribbr. Retrieved August 12, 2024, from https://www.scribbr.com/dissertation/literature-review/

Shona McCombes

Data analytics in quality 4.0: literature review and future research directions

  • September 2022

Alexandros Bousdekis at National Technical University of Athens

Katerina Lepenioti at National Technical University of Athens

Gregoris Mentzas at National Technical University of Athens

Abstract and Figures

The methodology of the literature review.

Literature Review: The What, Why and How-to Guide — Introduction

What are Literature Reviews?

So, what is a literature review? "A literature review is an account of what has been published on a topic by accredited scholars and researchers. In writing the literature review, your purpose is to convey to your reader what knowledge and ideas have been established on a topic, and what their strengths and weaknesses are. As a piece of writing, the literature review must be defined by a guiding concept (e.g., your research objective, the problem or issue you are discussing, or your argumentative thesis). It is not just a descriptive list of the material available, or a set of summaries." Taylor, D. The literature review: A few tips on conducting it. University of Toronto Health Sciences Writing Centre.

Goals of Literature Reviews

What are the goals of creating a Literature Review? A literature review could be written to accomplish different aims:

  • To develop a theory or evaluate an existing theory
  • To summarize the historical or existing state of a research topic
  • To identify a problem in a field of research

Baumeister, R. F., & Leary, M. R. (1997). Writing narrative literature reviews. Review of General Psychology, 1(3), 311-320.

What kinds of sources require a Literature Review?

  • A research paper assigned in a course
  • A thesis or dissertation
  • A grant proposal
  • An article intended for publication in a journal

All these instances require you to collect what has been written about your research topic so that you can demonstrate how your own research sheds new light on the topic.

Types of Literature Reviews

What kinds of literature reviews are written?

Narrative review: The purpose of this type of review is to describe the current state of the research on a specific topic and to offer a critical analysis of the literature reviewed. Studies are grouped by research/theoretical categories, and themes and trends, strengths and weaknesses, and gaps are identified. The review ends with a conclusion section that summarizes the findings regarding the state of the research on the specific topic and the gaps identified and, if applicable, explains how the author's research will address the gaps identified in the review and expand the knowledge on the topic reviewed.

  • Example : Predictors and Outcomes of U.S. Quality Maternity Leave: A Review and Conceptual Framework:  10.1177/08948453211037398  

Systematic review: "The authors of a systematic review use a specific procedure to search the research literature, select the studies to include in their review, and critically evaluate the studies they find." (p. 139). Nelson, L. K. (2013). Research in Communication Sciences and Disorders. Plural Publishing.

  • Example : The effect of leave policies on increasing fertility: a systematic review:  10.1057/s41599-022-01270-w

Meta-analysis: "Meta-analysis is a method of reviewing research findings in a quantitative fashion by transforming the data from individual studies into what is called an effect size and then pooling and analyzing this information. The basic goal in meta-analysis is to explain why different outcomes have occurred in different studies." (p. 197). Roberts, M. C., & Ilardi, S. S. (2003). Handbook of Research Methods in Clinical Psychology. Blackwell Publishing.

  • Example : Employment Instability and Fertility in Europe: A Meta-Analysis:  10.1215/00703370-9164737
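The pooling step in the meta-analysis definition above can be illustrated with the inverse-variance fixed-effect method, which is one common choice among several. This is a minimal sketch, assuming each study already reports an effect size and its standard error; the function name `pool_fixed_effect` and the study values are hypothetical. It also computes Cochran's Q, a standard heterogeneity statistic related to the goal of explaining why outcomes differ between studies.

```python
import math


def pool_fixed_effect(effects, std_errors):
    """Pool study effect sizes with inverse-variance (fixed-effect) weights.

    Returns the pooled effect, its standard error, a normal-approximation
    95% confidence interval, and Cochran's Q heterogeneity statistic.
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect size")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # More precise studies (smaller standard error) receive more weight.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

    # Cochran's Q: weighted squared deviations of each study from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return pooled, pooled_se, ci, q


# Hypothetical standardized mean differences and standard errors from three studies.
pooled, se, ci, q = pool_fixed_effect([0.30, 0.45, 0.10], [0.10, 0.20, 0.15])
print(f"pooled={pooled:.3f}, se={se:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f}), Q={q:.2f}")
```

A large Q relative to its degrees of freedom (the number of studies minus one) suggests heterogeneity between studies; in that situation a random-effects model is often preferred over the fixed-effect model sketched here.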

Meta-synthesis: "Qualitative meta-synthesis is a type of qualitative study that uses as data the findings from other qualitative studies linked by the same or related topic." (p. 312). Zimmer, L. (2006). Qualitative meta-synthesis: A question of dialoguing with texts. Journal of Advanced Nursing, 53(3), 311-318.

  • Example : Women’s perspectives on career successes and barriers: A qualitative meta-synthesis:  10.1177/05390184221113735

Literature Reviews in the Health Sciences

  • UConn Health subject guide on systematic reviews Explanation of the different review types used in health sciences literature as well as tools to help you find the right review type
  • Last Updated: Sep 21, 2022 2:16 PM
  • URL: https://guides.lib.uconn.edu/literaturereview

Automated data analysis of unstructured grey literature in health research: A mapping review

Affiliations.

  • 1 National Institute for Health and Care Research Innovation Observatory, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK.
  • 2 Interdisciplinary Computing and Complex BioSystems (ICOS) Research Group, School of Computing, Newcastle University, Newcastle upon Tyne, UK.
  • PMID: 38115736
  • DOI: 10.1002/jrsm.1692

The amount of grey literature and 'softer' intelligence from social media or websites is vast. Given the long lead-times of producing high-quality peer-reviewed health information, this is causing a demand for new ways to provide prompt input for secondary research. To our knowledge, this is the first review of automated data extraction methods or tools for health-related grey literature and soft data, with a focus on (semi)automating horizon scans, health technology assessments (HTA), evidence maps, or other literature reviews. We searched six databases to cover both health- and computer-science literature. After deduplication, 10% of the search results were screened by two reviewers, the remainder was single-screened up to an estimated 95% sensitivity; screening was stopped early after screening an additional 1000 results with no new includes. All full texts were retrieved, screened, and extracted by a single reviewer and 10% were checked in duplicate. We included 84 papers covering automation for health-related social media, internet fora, news, patents, government agencies and charities, or trial registers. From each paper, we extracted data about important functionalities for users of the tool or method; information about the level of support and reliability; and about practical challenges and research gaps. Poor availability of code, data, and usable tools leads to low transparency regarding performance and duplication of work. Financial implications, scalability, integration into downstream workflows, and meaningful evaluations should be carefully planned before starting to develop a tool, given the vast amounts of data and opportunities those tools offer to expedite research.
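The early-stopping criterion described in this abstract (screening stopped once an additional 1000 screened results produced no new includes) can be sketched as a simple "consecutive non-includes" rule. This is a minimal sketch, not the authors' implementation: the function `screen_with_stopping_rule` and its `patience` parameter are hypothetical, the `is_include` callback stands in for a human screening decision, and the sketch does not reproduce how the 95% sensitivity was estimated.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Record = TypeVar("Record")


def screen_with_stopping_rule(
    records: Iterable[Record],
    is_include: Callable[[Record], bool],
    patience: int = 1000,
) -> Tuple[List[Record], int]:
    """Screen records in order, stopping after `patience` consecutive non-includes.

    Returns the included records and the number of records actually screened.
    """
    if patience < 1:
        raise ValueError("patience must be at least 1")
    includes: List[Record] = []
    screened = 0
    since_last_include = 0
    for record in records:
        screened += 1
        if is_include(record):
            includes.append(record)
            since_last_include = 0  # reset the counter on every new include
        else:
            since_last_include += 1
            if since_last_include >= patience:
                break  # no new includes in the last `patience` records
    return includes, screened


# Hypothetical example: records 0..4999 where only multiples of 10 below 300 are relevant.
included, n_screened = screen_with_stopping_rule(
    range(5000), lambda r: r < 300 and r % 10 == 0, patience=1000
)
print(len(included), n_screened)  # 30 1291
```

A rule like this is only sensible when records are screened in an order where relevant items tend to appear early (for example, ranked by a prioritization model); with a random order it can stop too soon and miss includes.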

Keywords: artificial intelligence; automation; grey literature; literature review; natural language processing.

© 2023 The Authors. Research Synthesis Methods published by John Wiley & Sons Ltd.

  • Systematic review
  • Open access
  • Published: 07 August 2024

Models and frameworks for assessing the implementation of clinical practice guidelines: a systematic review

  • Nicole Freitas de Mello, ORCID: orcid.org/0000-0002-5228-6691 (1, 2)
  • Sarah Nascimento Silva, ORCID: orcid.org/0000-0002-1087-9819 (3)
  • Dalila Fernandes Gomes, ORCID: orcid.org/0000-0002-2864-0806 (1, 2)
  • Juliana da Motta Girardi, ORCID: orcid.org/0000-0002-7547-7722 (4)
  • Jorge Otávio Maia Barreto, ORCID: orcid.org/0000-0002-7648-0472 (2, 4)

Implementation Science volume 19, Article number: 59 (2024)

The implementation of clinical practice guidelines (CPGs) is a cyclical process in which the evaluation stage can facilitate continuous improvement. Implementation science has utilized theoretical approaches, such as models and frameworks, to understand and address this process. This article aims to provide a comprehensive overview of the models and frameworks used to assess the implementation of CPGs.

A systematic review was conducted following the Cochrane methodology, with adaptations to the "selection process" due to the unique nature of this review. The findings were reported following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) reporting guidelines. Electronic databases were searched from their inception until May 15, 2023. A predetermined strategy and manual searches were conducted to identify relevant documents from health institutions worldwide. Eligible studies presented models and frameworks for assessing the implementation of CPGs. Information on the characteristics of the documents, the context in which the models were used (specific objectives, level of use, type of health service, target group), and the characteristics of each model or framework (name, domain evaluated, and model limitations) were extracted. The domains of the models were analyzed according to the key constructs: strategies, context, outcomes, fidelity, adaptation, sustainability, process, and intervention. A subgroup analysis was performed grouping models and frameworks according to their levels of use (clinical, organizational, and policy) and type of health service (community, ambulatorial, hospital, institutional). The JBI’s critical appraisal tools were utilized by two independent researchers to assess the trustworthiness, relevance, and results of the included studies.

Database searches yielded 14,395 studies, of which 80 full texts were reviewed. Eight studies were included in the data analysis and four methodological guidelines were additionally included from the manual search. The risk of bias in the studies was considered non-critical for the results of this systematic review. A total of ten models/frameworks for assessing the implementation of CPGs were found. The level of use was mainly policy, the most common type of health service was institutional, and the major target group was professionals directly involved in clinical practice. The evaluated domains differed between the models and there were also differences in their conceptualization. All the models addressed the domain "Context", especially at the micro level (8/12), followed by the multilevel (7/12). The domains "Outcome" (9/12), "Intervention" (8/12), "Strategies" (7/12), and "Process" (5/12) were frequently addressed, while "Sustainability" was found only in one study, and "Fidelity/Adaptation" was not observed.

Conclusions

The use of models and frameworks for assessing the implementation of CPGs is still incipient. This systematic review may help stakeholders choose or adapt the most appropriate model or framework to assess CPGs implementation based on their specific health context.

Trial registration

PROSPERO (International Prospective Register of Systematic Reviews) registration number: CRD42022335884. Registered on June 7, 2022.

Contributions to the literature

Although the number of theoretical approaches has grown in recent years, there are still important gaps to be explored in the use of models and frameworks to assess the implementation of clinical practice guidelines (CPGs). This systematic review aims to contribute knowledge to overcome these gaps.

Despite the great advances in implementation science, evaluating the implementation of CPGs remains a challenge, and models and frameworks could support improvements in this field.

This study demonstrates that the available models and frameworks do not cover all characteristics and domains necessary for a complete evaluation of CPGs implementation.

The presented findings contribute to the field of implementation science, encouraging debate on choices and adaptations of models and frameworks for implementation research and evaluation.

Substantial investments have been made in clinical research and development in recent decades, increasing the medical knowledge base and the availability of health technologies [ 1 ]. The use of clinical practice guidelines (CPGs) has increased worldwide to guide best health practices and to maximize healthcare investments. A CPG can be defined as "any formal statements systematically developed to assist practitioner and patient decisions about appropriate health care for specific clinical circumstances" [ 2 ] and has the potential to improve patient care by promoting interventions of proven benefit and discouraging ineffective interventions. Furthermore, CPGs can promote efficiency in resource allocation and support managers and health professionals in decision-making [ 3 , 4 ].

However, having a high-quality CPG does not guarantee that the expected health benefits will be obtained. In fact, putting guidelines into use still presents a challenge for most health services across distinct levels of government. In addition to being developed with high methodological rigor, recommendations need to reach their users, which involves the diffusion and dissemination stages, and they need to be used in clinical practice (implemented), which usually requires behavioral changes and appropriate resources and infrastructure. All these stages involve an iterative and complex process called implementation, which is defined as the process of putting new practices within a setting into use [ 5 , 6 ].

Implementation is a cyclical process, and evaluation is one of its key stages, allowing continuous improvement of CPG development and implementation strategies. It consists of verifying whether clinical practice is being performed as recommended (process or formative evaluation) and whether the expected results and impact are being reached (summative evaluation) [ 7 , 8 , 9 ]. Although the importance of the implementation evaluation stage has been recognized, research on how these guidelines are implemented is scarce [ 10 ]. This paper focuses on the process of assessing CPG implementation.

To understand and improve this complex process, implementation science provides a systematic set of principles and methods to integrate research findings and other evidence-based practices into routine practice and improve the quality and effectiveness of health services and care [ 11 ]. The field of implementation science uses theoretical approaches that have varying degrees of specificity based on the current state of knowledge and are structured based on theories, models, and frameworks [ 5 , 12 , 13 ]. A "model" is defined as "a simplified depiction of a more complex world with relatively precise assumptions about cause and effect", and a "framework" is defined as "a broad set of constructs that organize concepts and data descriptively without specifying causal relationships" [ 9 ]. Although these concepts are distinct, the terms are used interchangeably in this paper, as both typically function as checklists of factors relevant to various aspects of implementation.

There are a variety of theoretical approaches available in implementation science [ 5 , 14 ], which can make choosing the most appropriate one challenging [ 5 ]. Some models and frameworks have been categorized as "evaluation models" by providing a structure for evaluating implementation endeavors [ 15 ], even though theoretical approaches from other categories can also be applied for evaluation purposes because they specify concepts and constructs that may be operationalized and measured [ 13 ]. Two frameworks that can specify implementation aspects that should be evaluated as part of intervention studies are RE-AIM (Reach, Effectiveness, Adoption, Implementation, Maintenance) [ 16 ] and PRECEDE-PROCEED (Predisposing, Reinforcing and Enabling Constructs in Educational Diagnosis and Evaluation-Policy, Regulatory, and Organizational Constructs in Educational and Environmental Development) [ 17 ]. Although the number of theoretical approaches has grown in recent years, the use of models and frameworks to evaluate the implementation of guidelines still seems to be a challenge.

This article aims to provide a complete map of the models and frameworks applied to assess the implementation of CPGs. It also aims to support debate and choices on models and frameworks for research on and evaluation of CPG implementation processes, and thus to facilitate the continued development of the field of implementation as well as to contribute to healthcare policy and practice.

A systematic review was conducted following the Cochrane methodology [ 18 ], with adaptations to the "selection process" due to the unique nature of this review (details can be found in the respective section). The review protocol was registered in PROSPERO (registration number: CRD42022335884) on June 7, 2022. This report adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [ 19 ] and a completed checklist is provided in Additional File 1.

Eligibility criteria

The SDMO approach (Types of Studies, Types of Data, Types of Methods, Outcomes) [ 20 ] was utilized in this systematic review, outlined as follows:

Types of studies

All types of studies were considered for inclusion, as the assessment of CPG implementation can benefit from a diverse range of study designs, including randomized clinical trials/experimental studies, scale/tool development, systematic reviews, opinion pieces, qualitative studies, peer-reviewed articles, books, reports, and unpublished theses.

Studies were categorized based on their methodological designs, which guided the synthesis, risk of bias assessment, and presentation of results.

Study protocols and conference abstracts were excluded due to insufficient information for this review.

Types of data

Studies that evaluated the implementation of CPGs either independently or as part of a multifaceted intervention.

Guidelines for evaluating CPG implementation.

Inclusion of CPGs related to any context, clinical area, intervention, and patient characteristics.

No restrictions were placed on publication date or language.

Exclusion criteria

General guidelines were excluded, as this review focused on 'models for evaluating clinical practice guidelines implementation' rather than the guidelines themselves.

Studies that focused solely on implementation determinants as barriers and enablers were excluded, as this review aimed to explore comprehensive models/frameworks.

Studies evaluating programs and policies were excluded.

Studies that only assessed implementation strategies (isolated actions) rather than the implementation process itself were excluded.

Studies that focused solely on the impact or results of implementation (summative evaluation) were excluded.

Types of methods

Not applicable.

Outcomes

All potential models or frameworks for assessing the implementation of CPG (evaluation models/frameworks), as well as their characteristics: name; specific objectives; levels of use (clinical, organizational, and policy); health system (public, private, or both); type of health service (community, ambulatorial, hospital, institutional, homecare); domains or outcomes evaluated; type of recommendation evaluated; context; limitations of the model.

Model was defined as "a deliberated simplification of a phenomenon on a specific aspect" [ 21 ].

Framework was defined as "structure, overview outline, system, or plan consisting of various descriptive categories" [ 21 ].

Exclusion criteria

Models or frameworks used solely for the CPG development, dissemination, or implementation phase.

Models/frameworks used solely for assessment processes other than implementation, such as for the development or dissemination phase.

Data sources and literature search

The systematic search was conducted on July 31, 2022 (and updated on May 15, 2023) in the following electronic databases: MEDLINE/PubMed, Centre for Reviews and Dissemination (CRD), the Cochrane Library, Cumulative Index to Nursing and Allied Health Literature (CINAHL), EMBASE, Epistemonikos, Global Health, Health Systems Evidence, PDQ-Evidence, PsycINFO, Rx for Change (Canadian Agency for Drugs and Technologies in Health, CADTH), Scopus, Web of Science and Virtual Health Library (VHL). The Google Scholar database was used for the manual selection of studies (first 10 pages).

Additionally, hand searches were performed on the reference lists of the included systematic reviews and on citations of the included studies, as well as on the websites of institutions working on CPG development and implementation: Guidelines International Network (GIN), National Institute for Health and Care Excellence (NICE; United Kingdom), World Health Organization (WHO), Centers for Disease Control and Prevention (CDC; USA), Institute of Medicine (IOM; USA), Australian Department of Health and Aged Care (ADH), Healthcare Improvement Scotland (SIGN), National Health and Medical Research Council (NHMRC; Australia), Queensland Health, The Joanna Briggs Institute (JBI), Ministry of Health and Social Policy of Spain, Ministry of Health of Brazil and Capes Theses and Dissertations Catalog.

The search strategy combined terms related to "clinical practice guidelines" (practice guidelines, practice guidelines as topic, clinical protocols), "implementation", "assessment" (assessment, evaluation), and "models, framework". The free term "monitoring" was not used because it was regularly related to clinical monitoring and not to implementation monitoring. The search strategies adapted for the electronic databases are presented in an additional file (see Additional file 2).

Study selection process

The results of the literature search from scientific databases, excluding the CRD database, were imported into Mendeley Reference Management software to remove duplicates. They were then transferred to the Rayyan platform ( https://rayyan.qcri.org ) [ 22 ] for the screening process. Initially, studies related to the "assessment of implementation of the CPG" were selected. The titles were first screened independently by two pairs of reviewers (first selection: four reviewers, NM, JB, SS, and JG; update: a pair of reviewers, NM and DG). The title screening was broad, including all potentially relevant studies on CPG and the implementation process. Following that, the abstracts were independently screened by the same group of reviewers. The abstract screening was more focused, specifically selecting studies that addressed CPG and the evaluation of the implementation process. In the next step, full-text articles were reviewed independently by a pair of reviewers (NM, DG) to identify those that explicitly presented "models" or "frameworks" for assessing the implementation of the CPG. Disagreements regarding the eligibility of studies were resolved through discussion and consensus, and by a third reviewer (JB) when necessary. One reviewer (NM) conducted manual searches, and the inclusion of documents was discussed with the other reviewers.
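The selection workflow above (deduplication, then independent dual screening with disagreements escalated) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the review itself used Mendeley for deduplication and Rayyan for screening, and the DOI-or-normalized-title matching key, the hypothetical `Record` type, and the `arbiter` callback standing in for discussion or the third reviewer are simplifications, not a description of how those tools work.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Record:
    title: str
    doi: Optional[str] = None


def dedup_key(record: Record) -> str:
    """Match on DOI when present, otherwise on a normalized title."""
    if record.doi:
        return "doi:" + record.doi.strip().lower()
    # Lowercase, then collapse runs of punctuation/whitespace to single spaces.
    return "title:" + re.sub(r"[^a-z0-9]+", " ", record.title.lower()).strip()


def deduplicate(records: List[Record]) -> List[Record]:
    """Keep the first occurrence of each matching key, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def resolve(decision_a: bool, decision_b: bool, arbiter: Callable[[], bool]) -> bool:
    """Combine two independent include/exclude decisions.

    Agreement stands as is; a disagreement is passed to `arbiter`
    (consensus discussion or a third reviewer).
    """
    if decision_a == decision_b:
        return decision_a
    return arbiter()


records = [
    Record("Evaluating CPG implementation", "10.1000/ABC"),
    Record("Evaluating CPG Implementation.", "10.1000/abc"),
    Record("A framework: assessing guidelines"),
    Record("A Framework, Assessing Guidelines"),
]
unique = deduplicate(records)
print(len(unique))  # 2
print(resolve(True, False, arbiter=lambda: True))  # True
```

A record exported with a DOI from one database and without one from another would not be matched by this single key; a fuller implementation would compare several bibliographic fields.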

Risk of bias assessment of studies

The selected studies were independently classified and evaluated according to their methodological designs by two investigators (NM and JG). This review employed JBI’s critical appraisal tools to assess the trustworthiness, relevance and results of the included studies [ 23 ] and these tools are presented in additional files (see Additional file 3 and Additional file 4). Disagreements were resolved by consensus or consultation with the other reviewers. Methodological guidelines and noncomparative and before–after studies were not evaluated because JBI does not have specific tools for assessing these types of documents. Although the studies were assessed for quality, they were not excluded on this basis.

Data extraction

The data were independently extracted by two reviewers (NM, DG) using a Microsoft Excel spreadsheet. Discrepancies were discussed and resolved by consensus. The following information was extracted:

Document characteristics: author; year of publication; title; study design; instrument of evaluation; country; guideline context.

Usage context of the models: specific objectives; level of use (clinical, organizational, and policy); type of health service (community, ambulatorial, hospital, institutional); target group (guideline developers, clinicians, health professionals, health-policy decision-makers, health-care organizations, service managers).

Model and framework characteristics: name, domains evaluated, and model limitations.

The set of information to be extracted, shown in the systematic review protocol, was adjusted to improve the organization of the analysis.

The "level of use" refers to the scope of the model used. "Clinical" was considered when the evaluation focused on individual practices, "organizational" when practices were within a health service institution, and "policy" when the evaluation was more systemic and covered different health services or institutions.

The "type of health service" indicated the category of health service, related to the complexity of health care, where the model/framework was used (or could be used) to assess the implementation of the CPG. "Community" is related to primary health care; "ambulatorial" to secondary health care; "hospital" to tertiary health care; and "institutional" represents models/frameworks not specific to a particular type of health service.
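The coding rules for "level of use" and "type of health service" can be expressed as a small extraction schema. The sketch below is illustrative only: the enum names, the `code_level_of_use` helper, and its two inputs are hypothetical and not part of the review's protocol, in which these categories were assigned by reviewer judgment and recorded in a spreadsheet.

```python
from enum import Enum
from typing import Optional


class LevelOfUse(Enum):
    """Scope of a model's use, following the definitions in the text."""
    CLINICAL = "clinical"              # individual practices
    ORGANIZATIONAL = "organizational"  # practices within one health service institution
    POLICY = "policy"                  # systemic; spans services or institutions


class HealthService(Enum):
    COMMUNITY = "community"
    AMBULATORIAL = "ambulatorial"
    HOSPITAL = "hospital"
    INSTITUTIONAL = "institutional"


# Level of health care associated with each service type.
# None means the model is not specific to a particular type of health service.
CARE_LEVEL: dict[HealthService, Optional[str]] = {
    HealthService.COMMUNITY: "primary",
    HealthService.AMBULATORIAL: "secondary",
    HealthService.HOSPITAL: "tertiary",
    HealthService.INSTITUTIONAL: None,
}


def code_level_of_use(focus_on_individual_practice: bool, n_institutions: int) -> LevelOfUse:
    """Hypothetical decision rule approximating the reviewers' definitions."""
    if focus_on_individual_practice:
        return LevelOfUse.CLINICAL
    if n_institutions <= 1:
        return LevelOfUse.ORGANIZATIONAL
    # Evaluation covers different health services or institutions.
    return LevelOfUse.POLICY


print(code_level_of_use(False, 3).value)  # policy
```

In practice the two inputs would themselves be reviewer judgments; the point of the sketch is only that the three levels are mutually exclusive and ordered by scope.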

The "target group" included stakeholders related to the use of the model/framework for evaluating the implementation of the CPG, such as clinicians, health professionals, guideline developers, health policy-makers, health organizations, and service managers.

The category "health system" (public, private, or both) mentioned in the systematic review protocol was not found in the literature obtained and was removed as an extraction variable. Similarly, the variables "type of recommendation evaluated" and "context" were grouped because the same information was included in the "guideline context" section of the study.

Some selected documents presented models or frameworks recognized by the scientific field, some of which had been validated. However, some studies adapted these models to their own contexts. Therefore, the domain analysis covered all model or framework domains evaluated by (or suggested for evaluation by) each document analyzed.

Data analysis and synthesis

The results were tabulated using narrative synthesis with an aggregative approach, without meta-analysis, aiming to summarize the documents descriptively for the organization, description, interpretation and explanation of the study findings [ 24 , 25 ].

The model/framework domains evaluated in each document were studied according to Nilsen et al.’s constructs: "strategies", "context", "outcomes", "fidelity", "adaptation" and "sustainability". For this study, "strategies" were described as structured and planned initiatives used to enhance the implementation of clinical practice [ 26 ].

The definition of "context" varies in the literature. Despite that, this review considered it as the set of circumstances or factors surrounding a particular implementation effort, such as organizational support, financial resources, social relations and support, leadership, and organizational culture [ 26 , 27 ]. The domain "context" was subdivided according to the level of health care into "micro" (individual perspective), "meso" (organizational perspective), "macro" (systemic perspective), and "multiple" (when there is an issue involving more than one level of health care).
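The subdivision of the "context" domain amounts to a simple coding rule: an aspect that involves one level of health care is labeled with that level, and an aspect that involves more than one is labeled "multiple". A minimal sketch, assuming each contextual aspect is recorded as the set of levels it involves (this representation and the function name are hypothetical):

```python
from typing import Iterable

LEVELS = {"micro", "meso", "macro"}  # individual, organizational, systemic perspectives


def code_context_level(levels: Iterable[str]) -> str:
    """Label a contextual aspect as micro, meso, macro, or 'multiple'."""
    found = set(levels)
    unknown = found - LEVELS
    if unknown:
        raise ValueError(f"unknown level(s): {sorted(unknown)}")
    if not found:
        raise ValueError("an aspect must involve at least one level")
    # More than one level of health care -> 'multiple'.
    return "multiple" if len(found) > 1 else next(iter(found))


print(code_context_level(["micro"]))          # micro
print(code_context_level(["micro", "meso"]))  # multiple
```

Because the rule is applied per aspect rather than per document, one document can contribute aspects at several levels, so document counts per level need not add up to the total number of documents.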

The "outcomes" domain was related to the results of the implementation process (unlike clinical outcomes) and was stratified according to the following constructs: acceptability, appropriateness, feasibility, adoption, cost, and penetration. All these concepts align with the definitions of Proctor et al. (2011), although we decided to treat "fidelity" and "sustainability" as independent domains, following Nilsen [ 26 , 28 ].

"Fidelity" and "adaptation" were considered the same domain, as they are complementary pieces of the same issue. In this study, implementation fidelity refers to how closely guidelines are followed as intended by their developers or designers. On the other hand, adaptation involves making changes to the content or delivery of a guideline to better fit the needs of a specific context. The "sustainability" domain was defined as evaluations about the continuation or permanence over time of the CPG implementation.

Additionally, the domain "process" was utilized to address issues related to the implementation process itself, rather than focusing solely on the outcomes of the implementation process, as done by Wang et al. [ 14 ]. Furthermore, the "intervention" domain was introduced to distinguish aspects related to the CPG characteristics that can impact its implementation, such as the complexity of the recommendation.

A subgroup analysis was performed with models and frameworks categorized based on their levels of use (clinical, organizational, and policy) and the type of health service (community, ambulatorial, hospital, institutional) associated with the CPG. The goal is to assist stakeholders (politicians, clinicians, researchers, or others) in selecting the most suitable model for evaluating CPG implementation based on their specific health context.

Search results

Database searches yielded 26,011 studies, of which 107 full texts were reviewed. During the full-text review, 99 articles were excluded: 41 studies did not mention a model or framework for assessing the implementation of the CPG, 31 studies evaluated only implementation strategies (isolated actions) rather than the implementation process itself, and 27 articles were not related to the implementation assessment. Therefore, eight studies were included in the data analysis. The updated search did not reveal additional relevant studies. The main reason for study exclusion was that they did not use models or frameworks to assess CPG implementation. Additionally, four methodological guidelines were included from the manual search (Fig.  1 ).
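The counts in the screening flow can be checked with simple arithmetic. The sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
records_identified = 26_011
full_texts_reviewed = 107
exclusion_reasons = {
    "no model or framework for assessing CPG implementation": 41,
    "only implementation strategies (isolated actions) evaluated": 31,
    "not related to implementation assessment": 27,
}
manual_search_guidelines = 4  # methodological guidelines added from the manual search

excluded = sum(exclusion_reasons.values())                           # 99
included_from_databases = full_texts_reviewed - excluded             # 8
total_included = included_from_databases + manual_search_guidelines  # 12

# Most frequent exclusion reason at the full-text stage.
main_reason = max(exclusion_reasons, key=exclusion_reasons.get)

assert full_texts_reviewed <= records_identified
print(excluded, included_from_databases, total_included)  # 99 8 12
print(main_reason)
```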

Figure 1

PRISMA diagram. Acronyms: ADH—Australian Department of Health, CINAHL—Cumulative Index to Nursing and Allied Health Literature, CDC—Centers for Disease Control and Prevention, CRD—Centre for Reviews and Dissemination, GIN—Guidelines International Networks, HSE—Health Systems Evidence, IOM—Institute of Medicine, JBI—The Joanna Briggs Institute, MHB—Ministry of Health of Brazil, NICE—National Institute for Health and Care Excellence, NHMRC—National Health and Medical Research Council, MSPS – Ministerio de Sanidad Y Política Social (Spain), SIGN—Scottish Intercollegiate Guidelines Network, VHL – Virtual Health Library, WHO—World Health Organization. Legend: Reason A –The study evaluated only implementation strategies (isolated actions) rather than the implementation process itself. Reason B – The study did not mention a model or framework for assessing the implementation of the intervention. Reason C – The study was not related to the implementation assessment. Adapted from Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021;372:n71. https://doi.org/10.1136/bmj.n71 . For more information, visit:

Overall, the appraisal with JBI’s critical appraisal tools supported the inclusion of the studies in the systematic review.

The cross-sectional studies lacked clear information regarding "confounding factors" or "strategies to address confounding factors". This omission is understandable given the nature of these studies, which do not typically report such details, and the reviewers did not consider it critical; the studies were therefore included in the review. The results of this methodological quality assessment can be found in an additional file (see Additional file 5).

In the qualitative studies, there was some ambiguity regarding the questions: "Is there a statement locating the researcher culturally or theoretically?" and "Is the influence of the researcher on the research, and vice versa, addressed?". However, the reviewers decided to include the studies and deemed the methodological quality sufficient for the analysis in this article, based on the other information analyzed. The results of this methodological quality assessment can be found in an additional file (see Additional file 6).

Document characteristics (Table 1)

The documents were distributed across four continents: Australia/Oceania (4/12) [ 31 , 33 , 36 , 37 ], North America (4/12) [ 30 , 32 , 38 , 39 ], Europe (2/12) [ 29 , 35 ] and Asia (2/12) [ 34 , 40 ]. The documents were classified as cross-sectional studies (4/12) [ 29 , 32 , 34 , 38 ], methodological guidelines (4/12) [ 33 , 35 , 36 , 37 ], mixed methods studies (3/12) [ 30 , 31 , 39 ] or noncomparative studies (1/12) [ 40 ]. The most common instrument of evaluation was a survey/questionnaire (6/12) [ 29 , 30 , 31 , 32 , 34 , 38 ], while three documents (3/12) used qualitative instruments (interviews, group discussions) [ 30 , 31 , 39 ], one used a checklist [ 37 ], one used an audit [ 33 ], and three (3/12) did not define a specific measurement instrument [ 35 , 36 , 40 ].

Considering the clinical areas covered, most studies evaluated the implementation of nonspecific (general) clinical areas [ 29 , 33 , 35 , 36 , 37 , 40 ]. However, some studies focused on specific clinical contexts, such as mental health [ 32 , 38 ], oncology [ 39 ], fall prevention [ 31 ], spinal cord injury [ 30 ], and sexually transmitted infections [ 34 ].

Usage context of the models (Table 1)

Specific objectives

All the documents stated the purpose of guiding the evaluation of CPG implementation, whether they addressed generic CPGs or CPGs from different clinical areas.

Levels of use

The most common level of use of the models/frameworks identified to assess the implementation of CPGs was policy (6/12) [ 33 , 35 , 36 , 37 , 39 , 40 ]. At this level, the model is used systematically to evaluate all the processes involved in CPG implementation, and it is primarily associated with methodological guidelines. This was followed by the organizational level (5/12) [ 30 , 31 , 32 , 38 , 39 ], where the model is used to evaluate the implementation of CPGs in a specific institution, considering its particular environment. Finally, the clinical level (2/12) [ 29 , 34 ] focuses on individual practice and the factors that can influence the implementation of CPGs by professionals.

Type of health service

Institutional services were predominant (5/12) [ 33 , 35 , 36 , 37 , 40 ] and included methodological guidelines and a study of model development and validation. Hospitals were the second most common type of health service (4/12) [ 29 , 30 , 31 , 34 ], followed by ambulatorial (2/12) [ 32 , 34 ] and community health services (1/12) [ 32 ]. Two studies did not specify which type of health service the assessment addressed [ 38 , 39 ].

Target group

The most frequent target group was professionals directly involved in clinical practice (6/12) [ 29 , 31 , 32 , 34 , 38 , 40 ], namely health professionals and clinicians. Stakeholders less directly involved in clinical practice included guideline developers (2/12) [ 39 , 40 ], health policy decision-makers (1/12) [ 39 ], and healthcare organizations (1/12) [ 39 ]. The target group was not defined in the methodological guidelines, although all the stakeholders mentioned could be related to these documents.

Model and framework characteristics

Models and frameworks for assessing the implementation of CPGs

The Consolidated Framework for Implementation Research (CFIR) [ 31 , 38 ] and the Promoting Action on Research Implementation in Health Systems (PARiHS) framework [ 29 , 30 ] were the most commonly employed frameworks within the selected documents. The other models mentioned were: Goal commitment and implementation of practice guidelines framework [ 32 ]; Guideline to identify key indicators [ 35 ]; Guideline implementation checklist [ 37 ]; Guideline implementation evaluation tool [ 40 ]; JBI Implementation Framework [ 33 ]; Reach, effectiveness, adoption, implementation and maintenance (RE-AIM) framework [ 34 ]; The Guideline Implementability Framework [ 39 ] and an unnamed model [ 36 ].

Domains evaluated

The number of domains evaluated (or suggested for evaluation) by the documents varied between three and five, with most documents focusing on three domains. All the models addressed the "context" domain, with particular emphasis on the micro level of the health care context (8/12) [ 29 , 31 , 34 , 35 , 36 , 37 , 38 , 39 ], followed by the multilevel (7/12) [ 29 , 31 , 32 , 33 , 38 , 39 , 40 ], meso level (4/12) [ 30 , 35 , 39 , 40 ] and macro level (2/12) [ 37 , 39 ]. The "Outcome" domain was evaluated in nine models. Within this domain, the most frequently evaluated subdomain was "adoption" (6/12) [ 29 , 32 , 34 , 35 , 36 , 37 ], followed by "acceptability" (4/12) [ 30 , 32 , 35 , 39 ], "appropriateness" (3/12) [ 32 , 34 , 36 ], "feasibility" (3/12) [ 29 , 32 , 36 ], "cost" (1/12) [ 35 ] and "penetration" (1/12) [ 34 ]. Among the other domains, "Intervention" (8/12) [ 29 , 31 , 34 , 35 , 36 , 38 , 39 , 40 ], "Strategies" (7/12) [ 29 , 30 , 33 , 35 , 36 , 37 , 40 ] and "Process" (5/12) [ 29 , 31 , 32 , 33 , 38 ] were frequently addressed, while "Sustainability" (1/12) [ 34 ] appeared in only one model and "Fidelity/Adaptation" was not observed. The domains presented by the models and frameworks and evaluated in the documents are shown in Table 2.
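The frequencies reported for each domain follow directly from the reference numbers cited for it. The sketch below transcribes those citation lists into sets (the names `DOMAIN_REFS` and `frequency_table` are hypothetical) and recomputes each count as k/12 with a percentage.

```python
N_DOCUMENTS = 12

# Reference numbers of the documents cited for each (sub)domain in the text.
DOMAIN_REFS = {
    "context: micro": {29, 31, 34, 35, 36, 37, 38, 39},
    "context: multilevel": {29, 31, 32, 33, 38, 39, 40},
    "context: meso": {30, 35, 39, 40},
    "context: macro": {37, 39},
    "outcome: adoption": {29, 32, 34, 35, 36, 37},
    "outcome: acceptability": {30, 32, 35, 39},
    "outcome: appropriateness": {32, 34, 36},
    "outcome: feasibility": {29, 32, 36},
    "outcome: cost": {35},
    "outcome: penetration": {34},
    "intervention": {29, 31, 34, 35, 36, 38, 39, 40},
    "strategies": {29, 30, 33, 35, 36, 37, 40},
    "process": {29, 31, 32, 33, 38},
    "sustainability": {34},
    "fidelity/adaptation": set(),
}

# The 12 included documents are cited as references 29 to 40.
assert all(refs <= set(range(29, 41)) for refs in DOMAIN_REFS.values())


def frequency_table(domain_refs, n_documents):
    """Return (domain, count, 'k/n', percent) rows, most frequent first."""
    rows = []
    for domain, refs in domain_refs.items():
        k = len(refs)
        rows.append((domain, k, f"{k}/{n_documents}", round(100 * k / n_documents, 1)))
    # sorted() is stable even with reverse=True, so ties keep the listed order.
    return sorted(rows, key=lambda row: row[1], reverse=True)


for domain, k, frac, pct in frequency_table(DOMAIN_REFS, N_DOCUMENTS):
    print(f"{domain:<26} {frac:>5} ({pct}%)")
```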

Limitations of the models

Only two documents mentioned limitations in the use of the models or frameworks. Both reported limitations of CFIR: "is complex and cumbersome and requires tailoring of the key variables to the specific context", and "this framework should be supplemented with other important factors and local features to achieve a sound basis for the planning and realization of an ongoing project" [ 31 , 38 ]. No limitations were reported for the other models or frameworks.

Subgroup analysis

In the subgroup analysis (Table 3), five different models/frameworks were used at the policy level by institutional health services: the Guideline Implementation Evaluation Tool [ 40 ], the NHMRC tool (model name not defined) [ 36 ], the JBI Implementation Framework + GRiP [ 33 ], the Guideline to identify key indicators [ 35 ], and the Guideline implementation checklist [ 37 ]. Additionally, "The Guideline Implementability Framework" [ 39 ] was applied at the policy level without restriction to a type of health service. At the organizational level, the models used varied with the type of service. The "Goal commitment and implementation of practice guidelines framework" [ 32 ] was applied in community and ambulatory health services, while "PARiHS" [ 29 , 30 ] and "CFIR" [ 31 , 38 ] were used in hospitals. In contexts where the type of health service was not defined, "CFIR" [ 31 , 38 ] and "The Guideline Implementability Framework" [ 39 ] were employed. Lastly, at the clinical level, "RE-AIM" [ 34 ] was used in ambulatory and hospital services, and PARiHS [ 29 , 30 ] was used specifically in hospital services.
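The subgroup analysis is essentially a lookup from (level of use, type of health service) to the models reported for that combination, which matches how the review proposes stakeholders use it. A minimal sketch, with the triples transcribed from the paragraph above; "not specified" stands for documents that did not define the type of health service, and the function names are hypothetical.

```python
from collections import defaultdict

# (level of use, type of health service, model) as reported in the subgroup analysis.
SUBGROUPS = [
    ("policy", "institutional", "Guideline Implementation Evaluation Tool"),
    ("policy", "institutional", "NHMRC tool (model name not defined)"),
    ("policy", "institutional", "JBI Implementation Framework + GRiP"),
    ("policy", "institutional", "Guideline to identify key indicators"),
    ("policy", "institutional", "Guideline implementation checklist"),
    ("policy", "not specified", "The Guideline Implementability Framework"),
    ("organizational", "community", "Goal commitment and implementation of practice guidelines framework"),
    ("organizational", "ambulatorial", "Goal commitment and implementation of practice guidelines framework"),
    ("organizational", "hospital", "PARiHS"),
    ("organizational", "hospital", "CFIR"),
    ("organizational", "not specified", "CFIR"),
    ("organizational", "not specified", "The Guideline Implementability Framework"),
    ("clinical", "ambulatorial", "RE-AIM"),
    ("clinical", "hospital", "RE-AIM"),
    ("clinical", "hospital", "PARiHS"),
]


def build_index(triples):
    """Group model names by (level of use, type of health service)."""
    index = defaultdict(set)
    for level, service, model in triples:
        index[(level, service)].add(model)
    return index


def candidate_models(index, level, service):
    """Return the models reported for a subgroup, alphabetically (empty if none)."""
    # .get() does not insert missing keys into the defaultdict.
    return sorted(index.get((level, service), set()))


INDEX = build_index(SUBGROUPS)
print(candidate_models(INDEX, "organizational", "hospital"))  # ['CFIR', 'PARiHS']
```

An empty result only means that none of the included documents reported a model for that subgroup.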

Key findings

This systematic review identified 10 models/frameworks used to assess the implementation of CPGs in various health system contexts. These documents shared similar objectives in using models and frameworks for assessment. The primary level of use was policy, the most common type of health service was institutional, and the main target group of the documents was professionals directly involved in clinical practice. The models and frameworks presented varied analytical domains, sometimes with divergent concepts underlying them. This study is innovative in its emphasis on the evaluation stage of CPG implementation and in summarizing aspects and domains aimed at the practical application of these models.

The small number of documents contrasts with studies that present an extensive range of models and frameworks available in implementation science, suggesting that the use of models and frameworks to evaluate the implementation of CPGs is still in its early stages. Among the selected documents, cross-sectional studies and methodological guidelines predominated, which strongly influenced how the implementation evaluation was conducted: mainly through surveys/questionnaires and qualitative methods (interviews, group discussions), or without a specified measurement instrument. Regarding the subject areas evaluated, most studies focused on general clinical areas, while others addressed specific clinical areas, suggesting that the evaluation of CPG implementation has been carried out in various contexts.

The models were chosen regardless of the categories proposed for them in the literature; some, such as CFIR and PARiHS, are categorized for purposes other than implementation evaluation. This practice was described by Nilsen et al., who suggested that models and frameworks from other categories can also be applied for evaluation purposes because they specify concepts and constructs that may be operationalized and measured [ 14 , 15 , 42 , 43 ].

The results highlight the greater use of models and frameworks in evaluation processes at the policy level and in institutional environments, followed by the organizational level in hospital settings. This finding contrasts with a review that reported the policy level as less well studied [ 44 ]. The use of different models at the institutional level is also evident in the subgroup analysis. This may suggest that the greater the impact (social, financial/economic, and organizational) of implementing CPGs, the greater the interest in and need for well-defined and robust processes. In this context, the evaluation stage stands out as crucial, making the investment of resources and effort to structure it even more worthwhile [ 10 , 45 ]. Two studies (2/12, 16.7%) evaluated the implementation of CPGs at the individual (clinical) level. These studies stand out for their potential to analyze variations in clinical practice in greater depth.

In contrast to the level of use and type of health service most frequently indicated in the documents, which reflect systemic approaches, the most common target group was professionals directly involved in clinical practice. This suggests an emphasis on evaluating individual behaviors. The same emphasis appears in the analysis of the models, in which evaluation of the micro level of the health context and of the "adoption" subdomain predominates, in contrast with the underuse of domains and subdomains such as "cost" and "process". Cassetti et al. observed the same phenomenon in their review, in which studies evaluating the implementation of CPGs mainly adopted a behavioral change approach to tackle those issues, without considering the influence of wider social determinants of health [ 10 ]. However, the literature widely reiterates that multiple factors affect the implementation of CPGs and that different actions are required to make them effective [ 6 , 46 , 47 ]. There is therefore considerable potential for developing and adapting models and frameworks aimed at more systemic evaluation processes that consider institutional and organizational aspects.

In analyzing the model domains, most models focused on evaluating only some aspects of implementation (three domains). All models evaluated the "context", highlighting its significant influence on implementation [ 9 , 26 ]. Context is an essential effect modifier for providing research evidence to guide decisions on implementation strategies [ 48 ]. Contextualizing a guideline involves integrating research or other evidence into a specific circumstance [ 49 ]. The analysis of this domain was adjusted to include all possible contextual aspects, even if they were initially allocated to other domains. Some contextual aspects presented by the models vary in comprehensiveness, such as the assessment of the "timing and nature of stakeholder engagement" [ 39 ], which includes individual engagement by healthcare professionals and organizational involvement in CPG implementation. While the importance of context is universally recognized, its conceptualization and interpretation differ across studies and models. This divergence is also evident in other domains, consistent with existing literature [ 14 ]. Efforts to address this conceptual divergence in implementation science are ongoing, but further research and development are needed in this field [ 26 ].

The main subdomain evaluated was "adoption" within the outcome domain. This may be attributed to the ease of accessing information on the adoption of the CPG, whether through computerized system records, patient records, or self-reports from healthcare professionals or patients themselves. The "acceptability" subdomain pertains to the perception among implementation stakeholders that a particular CPG is agreeable, palatable or satisfactory. On the other hand, "appropriateness" encompasses the perceived fit, relevance or compatibility of the CPG for a specific practice setting, provider, or consumer, or its perceived fit to address a particular issue or problem [ 26 ]. Both subdomains are subjective and rely on stakeholders' interpretations and perceptions of the issue being analyzed, making them susceptible to reporting biases. Moreover, obtaining this information requires direct consultation with stakeholders, which can be challenging for some evaluation processes, particularly in institutional contexts.

The evaluation of the subdomains "feasibility" (the extent to which a CPG can be successfully used or carried out within a given agency or setting), "cost" (the cost impact of an implementation effort), and "penetration" (the extent to which an intervention or treatment is integrated within a service setting and its subsystems) [ 26 ] was rarely observed in the documents. This may be related to the greater complexity of obtaining information on these aspects, as they involve cross-cutting and multifactorial issues. In other words, it would be difficult to gather this information during evaluations with health practitioners as the target group. This highlights the need for evaluation processes of CPGs implementation involving multiple stakeholders, even if the evaluation is adjusted for each of these groups.

Although the models do not define an "intervention" domain, we considered it pertinent in this study to delimit the issues that are intrinsic to CPGs, such as methodological quality or clarity in establishing recommendations. These issues were quite common in the models evaluated but were considered under other domains (e.g., "context"). Studies have reported the importance of evaluating these issues intrinsic to CPGs [ 47 , 50 ] and their influence on the implementation process [ 51 ].

The models explicitly present the "strategies" domain, and its evaluation was usually included in the assessments. This is likely due to the expansion of scientific and practical studies in implementation science that involve theoretical approaches to developing and applying interventions to improve the implementation of evidence-based practices. However, these interventions are not guaranteed to be effective: a previous review found unclear evidence that the strategies contributed to successful implementation [ 52 ]. Furthermore, model domains do not capture all the complexity surrounding the strategies and their development and implementation process. For example, the ‘Guideline implementation evaluation tool’ evaluates whether guideline developers have designed and provided auxiliary tools to promote the implementation of guidelines [ 40 ], but this does not mean that these tools work as expected.

The "process" domain was identified in the CFIR [ 31 , 38 ], JBI/GRiP [ 33 ], and PARiHS [ 29 ] frameworks. While it may be included in other domains of analysis, its distinct separation is crucial for defining operational issues when assessing the implementation process, such as determining if and how the use of the mentioned CPG was evaluated [ 3 ]. Despite its presence in multiple models, there is still limited detail in the evaluation guidelines, which makes it difficult to operationalize the concept. Further research is needed to better define the "process" domain and its connections and boundaries with other domains.

The domain of "sustainability" was only observed in the RE-AIM framework, which is categorized as an evaluation framework [ 34 ]. In its acronym, the letter M stands for "maintenance" and corresponds to the assessment of whether the user maintains use, typically longer than 6 months. The presence of this domain highlights the need for continuous evaluation of CPGs implementation in the short, medium, and long term. Although the RE-AIM framework includes this domain, it was not used in the questionnaire developed in the study. One probable reason is that the evaluation of CPGs implementation is still conducted on a one-off basis and not as a continuous improvement process. Considering that changes in clinical practices are inherent over time, evaluating and monitoring changes throughout the duration of the CPG could be an important strategy for ensuring its implementation. This is an emerging field that requires additional investment and research.

The "Fidelity/Adaptation" domain was not observed in the models. These emerging concepts involve the extent to which a CPG is being conducted exactly as planned or whether it is undergoing adjustments and adaptations. Whether or not there is fidelity or adaptation in the implementation of CPGs does not presuppose greater or lesser effectiveness; after all, some adaptations may be necessary to implement general CPGs in specific contexts. The absence of this domain in all the models and frameworks may suggest that they are not relevant aspects for evaluating implementation or that there is a lack of knowledge of these complex concepts. This may suggest difficulty in expressing concepts in specific evaluative questions. However, further studies are warranted to determine the comprehensiveness of these concepts.

It is important to note the customization of the domains of analysis: some domains presented in the models were not evaluated in the studies, while others were added. For example, Jeong et al. [ 34 ] included the "intervention" domain in their evaluation with the RE-AIM framework, reinforcing that theoretical approaches are meant to guide the process rather than to determine norms. Despite this customization, few limitations were reported for the models, suggesting that these studies applied the models to defined contexts without a deep critical analysis of their domains.

Limitations

This review has several limitations. First, only a few studies and methodological guidelines that explicitly present models and frameworks for assessing the implementation of CPGs were found, so few alternative models could be analyzed and presented in this review. Second, this review adopted multiple analytical categories (e.g., level of use, health service, target group, and domains evaluated), whose terminology varied enormously across the studies and documents selected, especially for the "domains evaluated" category. This difficulty in harmonizing the taxonomy used in the area has already been reported [ 26 ] and can cause considerable confusion. For this reason, studies and initiatives are needed to align understandings of these concepts and, as far as possible, standardize them. Third, in some studies/documents, the information extracted did not clearly map onto an analytical category. This required an in-depth interpretative process, which was conducted in pairs to avoid inappropriate interpretations.

Implications

This study contributes to the literature and clinical practice management by describing models and frameworks specifically used to assess the implementation of CPGs based on their level of use, type of health service, target group related to the CPG, and the evaluated domains. While there are existing reviews on the theories, frameworks, and models used in implementation science, this review addresses aspects not previously covered in the literature. This valuable information can assist stakeholders (such as politicians, clinicians, researchers, etc.) in selecting or adapting the most appropriate model to assess CPG implementation based on their health context. Furthermore, this study is expected to guide future research on developing or adapting models to assess the implementation of CPGs in various contexts.

The use of models and frameworks to evaluate the implementation remains a challenge. Studies should clearly state the level of model use, the type of health service evaluated, and the target group. The domains evaluated in these models may need adaptation to specific contexts. Nevertheless, utilizing models to assess CPGs implementation is crucial as they can guide a more thorough and systematic evaluation process, aiding in the continuous improvement of CPGs implementation. The findings of this systematic review offer valuable insights for stakeholders in selecting or adjusting models and frameworks for CPGs evaluation, supporting future theoretical advancements and research.

Availability of data and materials

Abbreviations

Australian Department of Health and Aged Care

Canadian Agency for Drugs and Technologies in Health

Centers for Disease Control and Prevention

Consolidated Framework for Implementation Research

Cumulative Index to Nursing and Allied Health Literature

Clinical practice guideline

Centre for Reviews and Dissemination

Guidelines International Networks

Getting Research into Practice

Health Systems Evidence

Institute of Medicine

The Joanna Briggs Institute

Ministry of Health of Brazil

Ministerio de Sanidad y Política Social

National Health and Medical Research Council

National Institute for Health and Care Excellence

Promoting action on research implementation in health systems framework

Predisposing, Reinforcing and Enabling Constructs in Educational Diagnosis and Evaluation-Policy, Regulatory, and Organizational Constructs in Educational and Environmental Development

Preferred Reporting Items for Systematic Reviews and Meta-Analyses

International Prospective Register of Systematic Reviews

Reach, effectiveness, adoption, implementation, and maintenance framework

Healthcare Improvement Scotland

United States of America

Virtual Health Library

World Health Organization

Institute of Medicine. Crossing the Quality Chasm: A New Health System for the 21st Century. 2001. Available from: http://www.nap.edu/catalog/10027 . Cited 2022 Sep 29.

Field MJ, Lohr KN. Clinical Practice Guidelines: Directions for a New Program. Washington DC: National Academy Press. 1990. Available from: https://www.nap.edu/read/1626/chapter/8 Cited 2020 Sep 2.

Dawson A, Henriksen B, Cortvriend P. Guideline Implementation in Standardized Office Workflows and Exam Types. J Prim Care Community Health. 2019;10. Available from: https://pubmed.ncbi.nlm.nih.gov/30900500/ . Cited 2020 Jul 15.

Unverzagt S, Oemler M, Braun K, Klement A. Strategies for guideline implementation in primary care focusing on patients with cardiovascular disease: a systematic review. Fam Pract. 2014;31(3):247–66. Available from: https://academic.oup.com/fampra/article/31/3/247/608680 . Cited 2020 Nov 5.


Nilsen P. Making sense of implementation theories, models and frameworks. Implement Sci. 2015;10(1):1–13. Available from: https://implementationscience.biomedcentral.com/articles/10.1186/s13012-015-0242-0 . Cited 2022 May 1.


Mangana F, Massaquoi LD, Moudachirou R, Harrison R, Kaluangila T, Mucinya G, et al. Impact of the implementation of new guidelines on the management of patients with HIV infection at an advanced HIV clinic in Kinshasa, Democratic Republic of Congo (DRC). BMC Infect Dis. 2020;20(1). Available from: https://search.ebscohost.com/login.aspx?direct=true&db=c8h&AN=146325052 .

Browman GP, Levine MN, Mohide EA, Hayward RSA, Pritchard KI, Gafni A, et al. The practice guidelines development cycle: a conceptual tool for practice guidelines development and implementation. J Clin Oncol. 1995;13(2):502–12. https://doi.org/10.1200/JCO.1995.13.2.502 .

Killeen SL, Donnellan N, O’Reilly SL, Hanson MA, Rosser ML, Medina VP, et al. Using FIGO Nutrition Checklist counselling in pregnancy: A review to support healthcare professionals. Int J Gynecol Obstet. 2023;160(S1):10–21. Available from: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85146194829&doi=10.1002%2Fijgo.14539&partnerID=40&md5=d0f14e1f6d77d53e719986e6f434498f .

Bauer MS, Damschroder L, Hagedorn H, Smith J, Kilbourne AM. An introduction to implementation science for the non-specialist. BMC Psychol. 2015;3(1):1–12. Available from: https://bmcpsychology.biomedcentral.com/articles/10.1186/s40359-015-0089-9 . Cited 2020 Nov 5.

Cassetti V, M VLR, Pola-Garcia M, AM G, J JPC, L APDT, et al. An integrative review of the implementation of public health guidelines. Prev Med reports. 2022;29:101867. Available from: http://www.epistemonikos.org/documents/7ad499d8f0eecb964fc1e2c86b11450cbe792a39 .

Eccles MP, Mittman BS. Welcome to implementation science. Implement Sci. 2006;1:1. Available from: https://implementationscience.biomedcentral.com/articles/10.1186/1748-5908-1-1 .

Damschroder LJ. Clarity out of chaos: Use of theory in implementation research. Psychiatry Res. 2020;283:112461.

Handley MA, Gorukanti A, Cattamanchi A. Strategies for implementing implementation science: a methodological overview. Emerg Med J. 2016;33(9):660–4. Available from: https://pubmed.ncbi.nlm.nih.gov/26893401/ . Cited 2022 Mar 7.

Wang Y, Wong ELY, Nilsen P, Chung VC ho, Tian Y, Yeoh EK. A scoping review of implementation science theories, models, and frameworks — an appraisal of purpose, characteristics, usability, applicability, and testability. Implement Sci. 2023;18(1):1–15. Available from: https://implementationscience.biomedcentral.com/articles/10.1186/s13012-023-01296-x . Cited 2024 Jan 22.

Moullin JC, Dickson KS, Stadnick NA, Albers B, Nilsen P, Broder-Fingert S, et al. Ten recommendations for using implementation frameworks in research and practice. Implement Sci Commun. 2020;1(1):1–12. Available from: https://implementationsciencecomms.biomedcentral.com/articles/10.1186/s43058-020-00023-7 . Cited 2022 May 20.

Glasgow RE, Vogt TM, Boles SM. Evaluating the public health impact of health promotion interventions: the RE-AIM framework. Am J Public Health. 1999;89(9):1322. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1508772/ . Cited 2022 May 22.

Asada Y, Lin S, Siegel L, Kong A. Facilitators and Barriers to Implementation and Sustainability of Nutrition and Physical Activity Interventions in Early Childcare Settings: a Systematic Review. Prev Sci. 2023;24(1):64–83. Available from: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85139519721&doi=10.1007%2Fs11121-022-01436-7&partnerID=40&md5=b3c395fdd2b8235182eee518542ebf2b .

Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al., editors. Cochrane Handbook for Systematic Reviews of Interventions. version 6. Cochrane; 2022. Available from: https://training.cochrane.org/handbook. Cited 2022 May 23.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372. Available from: https://www.bmj.com/content/372/bmj.n71 . Cited 2021 Nov 18.

M C, AD O, E P, JP H, S G. Appendix A: Guide to the contents of a Cochrane Methodology protocol and review. Higgins JP, Green S, eds Cochrane Handb Syst Rev Interv. 2011;Version 5.

Kislov R, Pope C, Martin GP, Wilson PM. Harnessing the power of theorising in implementation science. Implement Sci. 2019;14(1):1–8. Available from: https://implementationscience.biomedcentral.com/articles/10.1186/s13012-019-0957-4 . Cited 2024 Jan 22.

Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan-a web and mobile app for systematic reviews. Syst Rev. 2016;5(1):1–10. Available from: https://systematicreviewsjournal.biomedcentral.com/articles/10.1186/s13643-016-0384-4 . Cited 2022 May 20.

JBI. JBI’s Tools Assess Trust, Relevance & Results of Published Papers: Enhancing Evidence Synthesis. Available from: https://jbi.global/critical-appraisal-tools . Cited 2023 Jun 13.

Drisko JW. Qualitative research synthesis: An appreciative and critical introduction. Qual Soc Work. 2020;19(4):736–53.

Pope C, Mays N, Popay J. Synthesising qualitative and quantitative health evidence: A guide to methods. 2007. Available from: https://books.google.com.br/books?hl=pt-PT&lr=&id=L3fbE6oio8kC&oi=fnd&pg=PR6&dq=synthesizing+qualitative+and+quantitative+health+evidence&ots=sfELNUoZGq&sig=bQt5wt7sPKkf7hwKUvxq2Ek-p2Q#v=onepage&q=synthesizing=qualitative=and=quantitative=health=evidence& . Cited 2022 May 22.

Nilsen P, Birken SA, Edward Elgar Publishing. Handbook on implementation science. 542. Available from: https://www.e-elgar.com/shop/gbp/handbook-on-implementation-science-9781788975988.html . Cited 2023 Apr 15.

Damschroder LJ, Aron DC, Keith RE, Kirsh SR, Alexander JA, Lowery JC. Fostering implementation of health services research findings into practice: A consolidated framework for advancing implementation science. Implement Sci. 2009;4(1):1–15. Available from: https://implementationscience.biomedcentral.com/articles/10.1186/1748-5908-4-50 . Cited 2023 Jun 13.

Proctor E, Silmere H, Raghavan R, Hovmand P, Aarons G, Bunger A, et al. Outcomes for implementation research: conceptual distinctions, measurement challenges, and research agenda. Adm Policy Ment Health. 2011;38(2):65–76. Available from: https://pubmed.ncbi.nlm.nih.gov/20957426/ . Cited 2023 Jun 11.

Bahtsevani C, Willman A, Khalaf A, Östman M. Developing an instrument for evaluating implementation of clinical practice guidelines: a test-retest study. J Eval Clin Pract. 2008;14(5):839–46. Available from: https://search.ebscohost.com/login.aspx?direct=true&db=c8h&AN=105569473 . Cited 2023 Jan 18.

Balbale SN, Hill JN, Guihan M, Hogan TP, Cameron KA, Goldstein B, et al. Evaluating implementation of methicillin-resistant Staphylococcus aureus (MRSA) prevention guidelines in spinal cord injury centers using the PARIHS framework: a mixed methods study. Implement Sci. 2015;10(1):130. Available from: https://pubmed.ncbi.nlm.nih.gov/26353798/ . Cited 2023 Apr 3.

Breimaier HE, Heckemann B, Halfens RJG, Lohrmann C. The Consolidated Framework for Implementation Research (CFIR): a useful theoretical framework for guiding and evaluating a guideline implementation process in a hospital-based nursing practice. BMC Nurs. 2015;14(1):43. Available from: https://search.ebscohost.com/login.aspx?direct=true&db=c8h&AN=109221169 . Cited 2023 Apr 3.

Chou AF, Vaughn TE, McCoy KD, Doebbeling BN. Implementation of evidence-based practices: Applying a goal commitment framework. Health Care Manage Rev. 2011;36(1):4–17. Available from: https://pubmed.ncbi.nlm.nih.gov/21157225/ . Cited 2023 Apr 30.

Porritt K, McArthur A, Lockwood C, Munn Z. JBI Manual for Evidence Implementation. JBI Handbook for Evidence Implementation. JBI; 2020. Available from: https://jbi-global-wiki.refined.site/space/JHEI . Cited 2023 Apr 3.

Jeong HJ, Jo HS, Oh MK, Oh HW. Applying the RE-AIM Framework to Evaluate the Dissemination and Implementation of Clinical Practice Guidelines for Sexually Transmitted Infections. J Korean Med Sci. 2015;30(7):847–52. Available from: https://pubmed.ncbi.nlm.nih.gov/26130944/ . Cited 2023 Apr 3.

Grupo de trabajo sobre implementación de GPC. Implementación de Guías de Práctica Clínica en el Sistema Nacional de Salud. Manual Metodológico. 2009. Available from: https://portal.guiasalud.es/wp-content/uploads/2019/01/manual_implementacion.pdf . Cited 2023 Apr 3.

Commonwealth of Australia. A guide to the development, implementation and evaluation of clinical practice guidelines. National Health and Medical Research Council; 1998. Available from: https://www.health.qld.gov.au/__data/assets/pdf_file/0029/143696/nhmrc_clinprgde.pdf .

Queensland Health. Guideline implementation checklist: Translating evidence into best clinical practice. 2022.

Quittner AL, Abbott J, Hussain S, Ong T, Uluer A, Hempstead S, et al. Integration of mental health screening and treatment into cystic fibrosis clinics: Evaluation of initial implementation in 84 programs across the United States. Pediatr Pulmonol. 2020;55(11):2995–3004. Available from: https://www.embase.com/search/results?subaction=viewrecord&id=L2005630887&from=export . Cited 2023 Apr 3.

Urquhart R, Woodside H, Kendell C, Porter GA. Examining the implementation of clinical practice guidelines for the management of adult cancers: A mixed methods study. J Eval Clin Pract. 2019;25(4):656–63. Available from: https://search.ebscohost.com/login.aspx?direct=true&db=c8h&AN=137375535 . Cited 2023 Apr 3.

Yinghui J, Zhihui Z, Canran H, Flute Y, Yunyun W, Siyu Y, et al. Development and validation for evaluation of an evaluation tool for guideline implementation. Chinese J Evidence-Based Med. 2022;22(1):111–9. Available from: https://www.embase.com/search/results?subaction=viewrecord&id=L2016924877&from=export .

Breimaier HE, Halfens RJG, Lohrmann C. Effectiveness of multifaceted and tailored strategies to implement a fall-prevention guideline into acute care nursing practice: a before-and-after, mixed-method study using a participatory action research approach. BMC Nurs. 2015;14(1):18. Available from: https://search.ebscohost.com/login.aspx?direct=true&db=c8h&AN=103220991 .

Lai J, Maher L, Li C, Zhou C, Alelayan H, Fu J, et al. Translation and cross-cultural adaptation of the National Health Service Sustainability Model to the Chinese healthcare context. BMC Nurs. 2023;22(1). Available from: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85153237164&doi=10.1186%2Fs12912-023-01293-x&partnerID=40&md5=0857c3163d25ce85e01363fc3a668654 .

Zhao J, Li X, Yan L, Yu Y, Hu J, Li SA, et al. The use of theories, frameworks, or models in knowledge translation studies in healthcare settings in China: a scoping review protocol. Syst Rev. 2021;10(1):13. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7792291 .

Tabak RG, Khoong EC, Chambers DA, Brownson RC. Bridging research and practice: models for dissemination and implementation research. Am J Prev Med. 2012;43(3):337–50. Available from: https://pubmed.ncbi.nlm.nih.gov/22898128/ . Cited 2023 Apr 4.

Phulkerd S, Lawrence M, Vandevijvere S, Sacks G, Worsley A, Tangcharoensathien V. A review of methods and tools to assess the implementation of government policies to create healthy food environments for preventing obesity and diet-related non-communicable diseases. Implement Sci. 2016;11(1):1–13. Available from: https://implementationscience.biomedcentral.com/articles/10.1186/s13012-016-0379-5 . Cited 2022 May 1.

Buss PM, Pellegrini FA. A Saúde e seus Determinantes Sociais. PHYSIS Rev Saúde Coletiva. 2007;17(1):77–93.

Pereira VC, Silva SN, Carvalho VKS, Zanghelini F, Barreto JOM. Strategies for the implementation of clinical practice guidelines in public health: an overview of systematic reviews. Heal Res Policy Syst. 2022;20(1):13. Available from: https://health-policy-systems.biomedcentral.com/articles/10.1186/s12961-022-00815-4 . Cited 2022 Feb 21.

Grimshaw J, Eccles M, Tetroe J. Implementing clinical guidelines: current evidence and future implications. J Contin Educ Health Prof. 2004;24 Suppl 1:S31-7. Available from: https://pubmed.ncbi.nlm.nih.gov/15712775/ . Cited 2021 Nov 9.

Lotfi T, Stevens A, Akl EA, Falavigna M, Kredo T, Mathew JL, et al. Getting trustworthy guidelines into the hands of decision-makers and supporting their consideration of contextual factors for implementation globally: recommendation mapping of COVID-19 guidelines. J Clin Epidemiol. 2021;135:182–6. Available from: https://pubmed.ncbi.nlm.nih.gov/33836255/ . Cited 2024 Jan 25.

Lenzer J. Why we can’t trust clinical guidelines. BMJ. 2013;346(7913). Available from: https://pubmed.ncbi.nlm.nih.gov/23771225/ . Cited 2024 Jan 25.

Molino C de GRC, Ribeiro E, Romano-Lieber NS, Stein AT, de Melo DO. Methodological quality and transparency of clinical practice guidelines for the pharmacological treatment of non-communicable diseases using the AGREE II instrument: A systematic review protocol. Syst Rev. 2017;6(1):1–6. Available from: https://systematicreviewsjournal.biomedcentral.com/articles/10.1186/s13643-017-0621-5 . Cited 2024 Jan 25.

Albers B, Mildon R, Lyon AR, Shlonsky A. Implementation frameworks in child, youth and family services – Results from a scoping review. Child Youth Serv Rev. 2017;81:101–16.

Acknowledgements

Not applicable

This study is supported by the Fundação de Apoio à Pesquisa do Distrito Federal (FAPDF). FAPDF Award Term (TOA) nº 44/2024—FAPDF/SUCTI/COOBE (SEI/GDF – Process 00193–00000404/2024–22). The content in this article is solely the responsibility of the authors and does not necessarily represent the official views of the FAPDF.

Author information

Authors and Affiliations

Department of Management and Incorporation of Health Technologies, Ministry of Health of Brazil, Brasília, Federal District, 70058-900, Brazil

Nicole Freitas de Mello & Dalila Fernandes Gomes

Postgraduate Program in Public Health, FS, University of Brasília (UnB), Brasília, Federal District, 70910-900, Brazil

Nicole Freitas de Mello, Dalila Fernandes Gomes & Jorge Otávio Maia Barreto

René Rachou Institute, Oswaldo Cruz Foundation, Belo Horizonte, Minas Gerais, 30190-002, Brazil

Sarah Nascimento Silva

Oswaldo Cruz Foundation - Brasília, Brasília, Federal District, 70904-130, Brazil

Juliana da Motta Girardi & Jorge Otávio Maia Barreto

Contributions

NFM and JOMB conceived the idea and the protocol for this study. NFM conducted the literature search. NFM, SNS, JMG and JOMB conducted the data collection with advice and consensus gathering from JOMB. NFM and JMG assessed the quality of the studies. NFM and DFG conducted the data extraction. NFM performed the analysis and synthesis of the results with advice and consensus gathering from JOMB. NFM drafted the manuscript. JOMB critically revised the first version of the manuscript. All the authors revised and approved the submitted version.

Corresponding author

Correspondence to Nicole Freitas de Mello.

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: PRISMA checklist. Description of data: Completed PRISMA checklist used for reporting the results of this systematic review.

Additional file 2: Literature search. Description of data: The search strategies adapted for the electronic databases.

Additional file 3: JBI’s critical appraisal tools for cross-sectional studies. Description of data: JBI’s critical appraisal tools to assess the trustworthiness, relevance, and results of the included studies. This is specific for cross-sectional studies.

Additional file 4: JBI’s critical appraisal tools for qualitative studies. Description of data: JBI’s critical appraisal tools to assess the trustworthiness, relevance, and results of the included studies. This is specific for qualitative studies.

Additional file 5: Methodological quality assessment results for cross-sectional studies. Description of data: Methodological quality assessment results for cross-sectional studies using JBI’s critical appraisal tools.

Additional file 6: Methodological quality assessment results for the qualitative studies. Description of data: Methodological quality assessment results for qualitative studies using JBI’s critical appraisal tools.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Cite this article

Freitas de Mello, N., Nascimento Silva, S., Gomes, D.F. et al. Models and frameworks for assessing the implementation of clinical practice guidelines: a systematic review. Implementation Sci 19, 59 (2024). https://doi.org/10.1186/s13012-024-01389-1

Received: 06 February 2024

Accepted: 01 August 2024

Published: 07 August 2024

DOI: https://doi.org/10.1186/s13012-024-01389-1

  • Implementation
  • Practice guideline
  • Evidence-Based Practice
  • Implementation science

Implementation Science

ISSN: 1748-5908

J Otolaryngol Head Neck Surg

Systematic and other reviews: criteria and complexities

Robert T. Sataloff

1 Editor-in-Chief, Journal of Voice, Philadelphia, USA

2 Editor Emeritus, Ear, Nose and Throat Journal, Philadelphia, USA

Matthew L. Bush

3 Assistant Editor, Otology & Neurotology, Lexington, USA

Rakesh Chandra

4 Editor-in-Chief, Ear, Nose and Throat Journal, Nashville, USA

Douglas Chepeha

5 Editor-in-Chief, Journal of Otolaryngology – Head & Neck Surgery, Toronto, Canada

Brian Rotenberg

6 Editor-in-Chief, Journal of Otolaryngology – Head & Neck Surgery, London, Canada

Edward W. Fisher

7 Senior Editor, Journal of Laryngology and Otology, Birmingham, UK

David Goldenberg

8 Editor-in-Chief, Operative Techniques in Otolaryngology – Head and Neck Surgery, Hershey, USA

Ehab Y. Hanna

9 Editor-in-Chief, Head & Neck, Houston, USA

Joseph E. Kerschner

10 Editor-in-Chief, International Journal of Pediatric Otorhinolaryngology, Milwaukee, USA

Dennis H. Kraus

11 Co-Editor-in-Chief, Journal of Neurological Surgery Part B: Skull Base, New York, USA

John H. Krouse

12 Editor-in-Chief, Otolaryngology – Head and Neck Surgery, Philadelphia, USA

13 Editor-in-Chief, OTO-Open, Philadelphia, USA

14 Editor-in-Chief, Journal for Oto-Rhino-Laryngology, Head and Neck Surgery, Philadelphia, USA

15 Editor-in-Chief, World Journal of Otorhinolaryngology – Head and Neck Surgery, Philadelphia, USA

Michael Link

16 Co-Editor-in-Chief, Journal of Neurological Surgery Part B: Skull Base, Rochester, USA

Lawrence R. Lustig

17 Editor-in-Chief, Otology & Neurotology, New York, USA

Samuel H. Selesnick

18 Editor-in-Chief, The Laryngoscope, New York, USA

Raj Sindwani

19 Editor-in-Chief, American Journal of Rhinology & Allergy, Cleveland, USA

Richard J. Smith

20 Editor-in-Chief, Annals of Otology, Rhinology & Laryngology, Iowa City, USA

James Tysome

21 Editor-in-Chief, Clinical Otolaryngology, Cambridge, UK

Peter C. Weber

22 Editor-in-Chief, American Journal of Otolaryngology, Boston, USA

D. Bradley Welling

23 Editor-in-Chief, Laryngoscope Investigative Otolaryngology, Boston, USA

Review articles can be extremely valuable. They synthesize information for readers and often provide clarity and valuable insights into a topic, and good review articles tend to be cited frequently. Review articles do not require Institutional Review Board (IRB) approval if the data reviewed are public (including private and government databases) and if the articles reviewed have received IRB approval previously. However, some institutions require IRB review and exemption for review articles, so authors should be familiar with their institution’s policy. In assessing and interpreting review articles, it is important to understand the article’s methodology, scholarly purpose and credibility. Many readers, and some journal reviewers, are not aware that there are different kinds of review articles with different definitions, criteria and academic impact [ 1 ]. To understand the importance and potential application of a review article, readers and reviewers need to be able to classify review articles correctly.

Systematic reviews

Authors often submit articles that include the term “systematic” in the title without realizing that the term requires strict adherence to specific criteria. A systematic review follows an explicit methodology to answer a well-defined research question: it searches the literature comprehensively, rigorously evaluates the quantity and quality of the research evidence, and analyzes that evidence to synthesize an answer to the research question. The evidence gathered in systematic reviews can be qualitative or quantitative. If adequate and comparable quantitative data are available, a meta-analysis can be performed to estimate the weighted summary effect size of the included studies. Depending on the research question and the data collected, systematic reviews may or may not include quantitative meta-analyses; however, meta-analyses should be performed in the setting of a systematic review to ensure that all of the appropriate data were accessed. The components of a systematic review are described in an important article by Moher et al., published in 2009, that defined requirements for systematic reviews and meta-analyses [ 2 ].

To optimize reporting of meta-analyses, an international group developed the Quality of Reporting of Meta-Analyses (QUOROM) statement at a meeting in 1996, which led to publication of the QUOROM statement in 1999 [ 3 ]. Moher et al. revised that document and renamed the guidelines the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). The PRISMA statement covers both meta-analyses and systematic reviews, and its authors incorporated definitions established by the Cochrane Collaboration [ 4 ]. The PRISMA statement established the current standard for systematic reviews. To qualify as a systematic review, the methods section should acknowledge use of the PRISMA guidelines, and all PRISMA components should be incorporated strictly in all facets of the paper, from the research question to the discussion. The PRISMA statement includes a checklist of 27 items that must be included when reporting a systematic review or meta-analysis [ 2 ]. A downloadable version of this checklist can be used by authors, reviewers, and journal editorial staff to ensure compliance with the recommended components [ 5 ]. Not all 27 items are listed in this brief editorial (authors and reviewers are encouraged to consult the article by Moher et al. and familiarize themselves with all of them), but a few are highlighted below.
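Because compliance is judged item by item, the checklist lends itself to simple bookkeeping. The following is a minimal sketch, not an official PRISMA tool: it assumes the author records, for each numbered item (1 to 27), where the draft addresses it, and it lists the items still missing. The function names and the location strings are hypothetical, and the wording of the items is deliberately not reproduced here; it must be taken from the published checklist.

```python
from typing import Dict, List

# Assumption: items are identified only by number (1-27); their wording
# must be taken from the published PRISMA checklist itself.
PRISMA_ITEM_NUMBERS = range(1, 28)


def missing_items(reported: Dict[int, str]) -> List[int]:
    """Return checklist item numbers with no manuscript location recorded.

    `reported` maps an item number to where the draft addresses it
    (e.g. "Methods, p. 4"); absent or blank entries count as missing.
    """
    return [n for n in PRISMA_ITEM_NUMBERS if not reported.get(n, "").strip()]


def checklist_complete(reported: Dict[int, str]) -> bool:
    """True only when every one of the 27 items has a recorded location."""
    return not missing_items(reported)


if __name__ == "__main__":
    # Hypothetical draft that records every item except 5 and 24.
    draft = {n: f"p. {n}" for n in PRISMA_ITEM_NUMBERS if n not in (5, 24)}
    print(missing_items(draft))       # [5, 24]
    print(checklist_complete(draft))  # False
```

A reviewer can run the same check against the completed checklist submitted with a manuscript; an empty result only shows that every item has a location, not that each item is reported adequately.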

The research question, as reflected in the title, should be a specific, hypothesis-based research inquiry. The introduction must describe the rationale for the review and state a specific goal or set of goals to be addressed. According to the Cochrane Collaboration, the type of systematic review depends on the research question being asked: it may assess diagnostic test accuracy, review evidence from prognostic studies, evaluate the effect of an intervention, scrutinize research methodology, or summarize qualitative evidence [ 6 ].

In the methods section, the participants, interventions, comparisons, outcomes and study design (PICOS) must be stated. In addition to mentioning compliance with PRISMA, the methods section should state whether a review protocol exists and, if so, where it can be accessed (including a registration number). Systematic reviews are eligible for registration in the International Prospective Register of Systematic Reviews (PROSPERO), established at the University of York (York, UK). When PROSPERO is used (it is available but not required for systematic reviews), registration should occur at the initial protocol stage of the review, and the final paper should direct readers to the information in the register. The methods section also must include specific study characteristics, including the databases used, years considered, languages of articles included, and specific inclusion and exclusion criteria for studies, together with the rationale for each criterion. The individuals who performed the searches should be identified. The following also should be reported: the electronic search strategy (with a full description of at least one electronic search strategy, sufficient to allow replication of the search); the process for article selection; the data variables sought, with any assumptions and simplifications; the methods for assessing the risk of bias of each individual study (such as selective reporting) and how this information is used in data synthesis; the principal summary measures (risk ratio, hazard ratio, difference in means, etc.); the methods of data management and of combining study results; outcome-level assessment; and other relevant information.
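Explicit eligibility criteria make article selection reproducible and yield the exclusion counts reported later. The following is a minimal sketch under stated assumptions: it applies three of the criteria named above (years considered, language, study design) in a fixed order and tallies the first reason each record fails. The `Record` and `EligibilityCriteria` types, their fields, and the reason labels are hypothetical illustrations, not part of PRISMA or any screening software.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal search result; real records carry many more fields."""
    title: str
    year: int
    language: str
    design: str  # e.g. "RCT", "cohort", "case report"


@dataclass(frozen=True)
class EligibilityCriteria:
    """Illustrative criteria of the kind the methods section must report."""
    first_year: int
    last_year: int
    languages: FrozenSet[str]
    designs: FrozenSet[str]


def exclusion_reason(rec: Record, c: EligibilityCriteria) -> Optional[str]:
    """Return the first criterion the record fails, or None if it is eligible."""
    if not (c.first_year <= rec.year <= c.last_year):
        return "outside years considered"
    if rec.language not in c.languages:
        return "language"
    if rec.design not in c.designs:
        return "study design"
    return None


def screen(records: Iterable[Record],
           c: EligibilityCriteria) -> Tuple[List[Record], Counter]:
    """Split records into included studies and a tally of exclusion reasons."""
    included: List[Record] = []
    excluded: Counter = Counter()
    for rec in records:
        reason = exclusion_reason(rec, c)
        if reason is None:
            included.append(rec)
        else:
            excluded[reason] += 1
    return included, excluded
```

For example, with `EligibilityCriteria(2000, 2020, frozenset({"English"}), frozenset({"RCT", "cohort"}))`, a 1995 RCT is tallied under "outside years considered" and an English-language case report under "study design". Because only the first failed criterion is counted, the order of the checks is itself a methodological choice that should be reported.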

The results section should include the number of studies identified, screened, and evaluated for eligibility (including the rationale for exclusion), and the number included in the final synthesis. A PRISMA flow diagram should be included to provide this information succinctly [ 7 ]. The results also should include the study characteristics, study results, risk of bias within and across studies, and a qualitative or quantitative synthesis of the results of the included studies. This level of rigor in acquiring and evaluating the evidence from each individual study is one of the criteria that distinguish systematic reviews from other categories. If the systematic review involves studies with paired samples and quantitative data, a summary of data should be provided for each intervention group, along with effect estimates and confidence intervals for all outcomes of each study. If a meta-analysis is performed, the synthesized effect size should be reported for each meta-analysis with confidence intervals and measures of consistency (i.e., statistical heterogeneity, such as I²), along with an assessment of the risk of bias across studies. A forest plot, which presents the meta-analysis results graphically, should be included.
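The synthesized effect size, its confidence interval, and I² are related by standard formulas. The following is a minimal sketch that assumes an inverse-variance fixed-effect model. The editorial does not prescribe a model, and a random-effects model (for example, DerSimonian and Laird) is often preferred when heterogeneity is substantial. Each study receives weight w = 1/SE². Cochran's Q sums the weighted squared deviations from the pooled estimate, and I² = max(0, (Q − df)/Q), with df equal to the number of studies minus one. The function name and the example numbers are hypothetical.

```python
import math
from typing import Dict, Sequence


def fixed_effect_summary(effects: Sequence[float], ses: Sequence[float],
                         z: float = 1.96) -> Dict[str, object]:
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2.

    effects: per-study estimates on an additive scale (e.g. log risk ratios
             or mean differences); ses: their standard errors.
    z=1.96 gives an approximate 95% confidence interval.
    """
    if len(effects) != len(ses) or len(effects) < 2:
        raise ValueError("need at least two studies, each with a standard error")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    se_pooled = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci": (pooled - z * se_pooled, pooled + z * se_pooled),
        "Q": q,
        "I2_percent": i2,
    }


if __name__ == "__main__":
    # Two hypothetical studies with equal precision but different effects.
    result = fixed_effect_summary([0.1, 0.3], [0.1, 0.1])
    print(round(result["pooled"], 3), round(result["I2_percent"], 1))  # 0.2 50.0
```

A forest plot draws each study's estimate and interval alongside this pooled estimate. The same inputs therefore feed both the table of results and the figure.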

The discussion section should summarize the main findings, commenting on the strength of evidence for each outcome and its relevance to healthcare providers, policymakers and other key stakeholders; describe the limitations of the study and outcomes; and present conclusions that interpret the results in the context of other research and outline implications for future research.

Without adherence to all of these criteria, and to the others listed in the PRISMA statement and checklist, a review does not qualify to be classified as “systematic”.

Meta-analyses

Meta-analyses, when feasible based on available and comparable quantitative data, supplement a systematic review by adding a secondary statistical analysis of the pooled, weighted outcomes of similar studies. This adds a level of objectivity to the synthesis of the review’s findings. Meta-analyses are appropriate when at least two individual studies contain paired samples (experimental group and control group) and provide quantitative outcome data and sample sizes. Studies that lack a control group may overestimate the effect size of the experimental intervention or condition being studied and are not ideal for meta-analyses [ 8 ]. It also should be remembered that the conclusions of a meta-analysis are only as valid as the data on which the analysis is based: if the included articles are flawed, the conclusions of the meta-analysis also may be flawed. Systematic reviews and meta-analyses are the most rigorous categories of review.
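The requirement for a control group, outcome data, and sample sizes follows from how each study's contribution is computed. The following is a minimal sketch that assumes binary outcomes summarized as a risk ratio, which is one common summary measure. For each study it computes the log risk ratio, the standard large-sample standard error √(1/a − 1/n₁ + 1/c − 1/n₂), a 95% confidence interval, and the study's relative inverse-variance weight in a pooled analysis. The `ControlledStudy` type and the function names are hypothetical. The sketch rejects zero-event arms instead of applying a continuity correction.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ControlledStudy:
    """Hypothetical two-arm study: event counts and sample sizes per group."""
    events_exp: int
    n_exp: int
    events_ctl: int
    n_ctl: int


def log_risk_ratio(s: ControlledStudy) -> Tuple[float, float]:
    """Return (log risk ratio, standard error) for one controlled study.

    Assumes at least one event in each arm; zero cells need a continuity
    correction or a different effect measure, which this sketch does not do.
    """
    if s.events_exp <= 0 or s.events_ctl <= 0:
        raise ValueError("zero events in an arm: continuity correction needed")
    if s.events_exp > s.n_exp or s.events_ctl > s.n_ctl:
        raise ValueError("event count exceeds sample size")
    log_rr = math.log((s.events_exp / s.n_exp) / (s.events_ctl / s.n_ctl))
    se = math.sqrt(1 / s.events_exp - 1 / s.n_exp + 1 / s.events_ctl - 1 / s.n_ctl)
    return log_rr, se


def risk_ratio_ci(s: ControlledStudy, z: float = 1.96) -> Tuple[float, float, float]:
    """Risk ratio with an approximate 95% confidence interval for one study."""
    log_rr, se = log_risk_ratio(s)
    return math.exp(log_rr), math.exp(log_rr - z * se), math.exp(log_rr + z * se)


def inverse_variance_weights(studies: Sequence[ControlledStudy]) -> List[float]:
    """Percentage weight of each study when pooling log risk ratios."""
    if len(studies) < 2:
        raise ValueError("pooling needs at least two controlled studies")
    inv_var = [1 / log_risk_ratio(s)[1] ** 2 for s in studies]
    total = sum(inv_var)
    return [100 * w / total for w in inv_var]
```

For example, `ControlledStudy(10, 100, 20, 100)` gives a risk ratio of 0.5. A second study with the same risks but twice the sample size in each arm receives twice the weight (about 66.7% versus 33.3%), because its standard error is smaller. A single-arm study provides no comparator risk for the denominator, so it cannot enter this calculation at all.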

Other types of reviews

Mixed methods reviews

Systematic reviews typically contain a single type of data, either qualitative or quantitative; mixed methods reviews, however, bring together a combination of data types or study types. This approach may be used when quantitative data from an intervention study provide only a narrow perspective on the efficacy or effectiveness of the intervention. Adding qualitative data or qualitative studies may provide a more complete picture of the knowledge, attitudes, and behaviors of clinicians, patients or researchers regarding that intervention. This type of review could involve collecting either the quantitative or the qualitative data using systematic review methodology, but the qualitative data often are gathered using convenience sampling. Many qualitative studies provide useful insights into clinical management and/or implementation of research interventions, and incorporating them into a mixed methods review may provide valuable perspective on a wide range of literature. Mixed methods reviews are not necessarily systematic in nature; however, authors conducting mixed methods reviews should follow systematic review methodology when possible.

Literature and narrative reviews

Literature reviews include peer-reviewed original research, systematic reviews, and meta-analyses, but also may include conference abstracts, books, graduate degree theses, and other non-peer-reviewed publications. The methods used to identify and evaluate studies should be specified, but they are less rigorous and comprehensive than those required for systematic reviews. Literature reviews can evaluate a broad topic, but they do not articulate a specific question, nor do they synthesize the results of included studies rigorously. Like mixed methods reviews, they provide an overview of published information on the topic, although they may be less comprehensive than integrative reviews; and, unlike systematic reviews, they do not need to support evidence-based clinical or research practices or to highlight high-quality evidence for the reader. Narrative reviews are similar to literature reviews and evaluate the same scope of literature. The terms sometimes are used interchangeably, and author bias in article selection and data interpretation is a potential concern in both literature and narrative reviews.

Umbrella reviews

An umbrella review integrates previously published, high-quality reviews such as systematic reviews and meta-analyses. Its purpose is to synthesize information in previously published systematic reviews and meta-analyses into one convenient paper.

Rapid review

A rapid review uses systematic review methodology to evaluate existing research. It provides a quick synthesis of evidence and is used most commonly to support urgent decision-making, such as determining whether COVID-19 vaccines should receive emergency approval.

Scoping, mapping, and systematized reviews

When the literature on a varied and complex subject has not been reviewed comprehensively, a mapping review (also called a scoping review) may be useful to organize an initial understanding of the topic and its available literature. Although mapping reviews may help crystallize research findings and may be published, they are particularly useful for determining whether a topic is amenable to systematic review and for organizing and directing the approach of a systematic review or other reviews of the subject. Systematized reviews are used most commonly by students. A systematized review provides an initial assessment of a topic that is potentially appropriate for a systematic review, but it does not meet the rigorous criteria of a systematic review and is of substantially more limited value. Additional types of reviews exist, including critical reviews, state-of-the-art reviews, and others.

Reviews can be invaluable, but they can also be misleading. Systematic reviews and meta-analyses give readers the greatest confidence that rigorous efforts have been made to eliminate bias and ensure validity, but even they are limited by the strengths and weaknesses of the literature they assess (and by the skill and objectivity with which the authors executed the review). Risks of bias, incomplete information, and misinformation increase as the rigor of review methodology decreases. While review articles may summarize research related to a topic, non-systematic reviews lack the rigor to adequately answer the hypothesis-driven research questions that can influence evidence-based practice. Journal authors, reviewers, editorial staff, and readers should be cognizant of the strengths and weaknesses of review methodology and should consider them carefully when they assess the value of published review articles, particularly when deciding whether the information presented should alter their patient care.

Authors’ contributions

The author(s) read and approved the final manuscript.

Declarations

The authors declare no competing interests.

This article is co-published in the following journals: Journal of Voice, Otology & Neurotology, Ear, Nose and Throat Journal, Journal of Laryngology and Otology, Operative Techniques in Otolaryngology – Head and Neck Surgery, Head & Neck, International Journal of Pediatric Otorhinolaryngology, Journal of Neurological Surgery Part B: Skull Base, Otolaryngology – Head and Neck Surgery, World Journal of Otorhinolaryngology – Head and Neck Surgery, The Laryngoscope, American Journal of Rhinology & Allergy, Annals of Otology, Rhinology & Laryngology, Clinical Otolaryngology, American Journal of Otolaryngology, Laryngoscope Investigative Otolaryngology.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Systematic literature review of gender equity and social inclusion in primary education for teachers in Tanzania: assessing status and future directions

  • Open access
  • Published: 13 August 2024
  • Volume 3, article number 122 (2024)


  • Henry Nkya¹
  • Isack Kibona²

Gender equity and social inclusion (GESI) are crucial for creating inclusive and equitable educational environments in primary schools. This systematic literature review aimed to interpret and synthesize the findings of previous studies on GESI interventions and programs in primary schools in Tanzania, identify gaps in the knowledge, and provide recommendations for policy and practice. The literature search identified 22 relevant studies that met the inclusion criteria. The studies were conducted between 2010 and 2021, and each had a sample size of more than 50 participants. More than 50% of the studies were conducted in rural areas and used a quasi-experimental design. The interventions evaluated included teacher training, community engagement, and curriculum reform. The review used statistical methods to measure effect sizes and traditional univariate meta-analysis to synthesize the results. A table summarizing the studies that met the inclusion criteria was created to ensure transparency and clarity in the data coding process. The review found a positive effect of GESI interventions on various outcomes, including improved academic performance, reduced gender-based violence, and increased social inclusion. However, effect sizes and study designs varied across the studies. Several gaps were identified, such as the lack of long-term follow-up and the need for more rigorous study designs. The implications of the findings for policy and practice in promoting GESI in primary schools in Tanzania were discussed, and recommendations for future research were provided. This systematic literature review highlighted the importance of addressing GESI in primary school education in Tanzania and underscored the critical role of teachers in promoting these values.
It calls for targeted interventions, policy enhancements, and further research to bridge the gaps identified in the literature.


1 Introduction

Gender equity and social inclusion (GESI) are critical components of education that ensure equitable access to education for all individuals, regardless of their gender, socioeconomic status, ethnicity, or other aspects of their background. In Tanzania, GESI has become a significant concern, particularly in primary schools, where gender and social inequalities often lead to disparities in educational outcomes.

Research has shown that girls are more likely than boys to face barriers in education, including poverty, early marriage, and cultural biases that prioritize boys’ education over girls’ [ 28 ]. Furthermore, children from marginalized groups, such as children with disabilities, children from ethnic minority groups, and children from low-income families, often experience unequal access to quality education [ 38 ].

Addressing GESI issues in primary schools is crucial for ensuring that all children have access to quality education, which is essential for their personal development and future success. GESI initiatives can promote equity and inclusion in schools and create an environment where all children feel valued and supported [ 13 ].

Addressing gender stereotypes in teacher education programs can play a vital role in promoting GESI in primary schools [ 32 ]. Similarly, Okkolin et al. [ 27 ] suggest that interventions that address GESI can improve educational outcomes for girls and marginalized groups.

Overall, promoting GESI in primary schools is essential for creating a more equitable and inclusive education system that benefits all children [ 2 ]. It requires a concerted effort from policymakers, educators, parents, and communities to work together to create a learning environment that is supportive, respectful, and inclusive for all children.

1.1 Theoretical framework

This study is guided by Social Justice Theory, which emphasizes the need for equitable treatment, opportunities, and outcomes for all individuals, particularly those from marginalized and disadvantaged backgrounds [ 10 ]. This framework is crucial for understanding the components of GESI and their impact on educational outcomes. Social Justice Theory aligns with the goals of GESI by promoting fairness and the elimination of disparities in education [ 1 ].

1.1.1 Components of GESI

The key components of GESI in this study include [ 24 ]:

Gender equity: ensuring that girls and boys have equal access to education and opportunities.

Social inclusion: creating an inclusive environment where all students, regardless of their backgrounds, can participate and succeed.

Teacher training: educating teachers on gender-sensitive and inclusive teaching practices.

Community engagement: involving communities in promoting GESI.

Curriculum reform: developing and implementing curricula that address GESI issues.

1.2 Justification for focusing on Tanzania

Tanzania provides a unique context for examining GESI due to its diverse population and the significant challenges it faces in achieving GESI in education [ 18 ]. Despite efforts to promote GESI, disparities persist, making it an important area of study to identify effective interventions and inform policy and practice.

1.3 Rationale for conducting a systematic literature review

A systematic literature review is an essential tool for synthesizing research findings from different studies and summarizing the overall effect size of an intervention or variable of interest [ 34 ]. Conducting a systematic literature review on GESI in primary school education is critical for providing an overview of the existing research and identifying gaps that need to be addressed in future research. It also helps establish the overall effect of interventions aimed at promoting GESI in primary schools in Tanzania [ 9 ]. The results of the systematic literature review can inform policies and practices aimed at promoting GESI in primary school education, thereby improving learning outcomes for all children, regardless of their gender, social, and economic backgrounds.

By addressing the GESI issues and synthesizing the existing literature, this systematic literature review aims to contribute to a more equitable and inclusive educational environment in Tanzania [ 22 ].

1.4 Research objectives

To identify the state of GESI in primary schools. This objective aims to provide a comprehensive understanding of how GESI issues manifest in primary schools, considering various social and educational contexts.

To enumerate the factors that contribute to gender and social inequities in primary schools. This will support informed decisions in efforts to address GESI issues.

To synthesize the findings of previous studies on GESI in primary schools. This objective focuses on aggregating and interpreting the results of existing research to offer a clear and cohesive picture of what is known about GESI interventions and their effectiveness.

To identify gaps in the knowledge of GESI in primary schools. By evaluating the existing literature, this objective seeks to highlight areas where further research is needed, identifying shortcomings in study designs, populations, or intervention strategies.

To provide recommendations for improving GESI in primary schools. Based on the synthesis of previous studies and identified gaps, this objective aims to propose actionable strategies and policies to enhance GESI in primary education.

2 Methodology

Having set the study objectives, we conducted a comprehensive search for literature on GESI in primary schools. The search used electronic databases such as Google Scholar, JSTOR, and EBSCOhost, with the search terms “gender equity,” “social inclusion,” “primary schools,” “Tanzania,” and “teachers.” In addition, the reference lists of identified studies were hand-searched to find relevant studies that may have been missed during the initial search.

Inclusion criteria:

The study must be conducted in primary schools.

The study must focus on gender equity and/or social inclusion in education.

The study must involve teachers as the primary participants or focus on the teacher’s role in promoting GESI.

The study must be published in English between 2010 and 2022.

Exclusion criteria:

Studies conducted outside Tanzania.

Studies not related to gender equity and/or social inclusion in education.

Studies not involving teachers or not focusing on the teacher’s role in promoting GESI.

Studies published before 2010 or after 2022.

The search process was conducted by two independent reviewers to ensure the accuracy and completeness of the search results. The reviewers screened the titles and abstracts of the identified studies for relevance and then reviewed the full text of potentially relevant studies. Any discrepancies between the reviewers were resolved through discussion and consensus. In total, 22 papers met the inclusion criteria and were included in the review.

2.1 Choice of the effect size measure and analytical methods

The effect size measure used in this study was computed with statistical tools, which makes it suitable for a systematic review that synthesizes findings across multiple studies. Because the included studies addressed similar research questions, the study employed traditional univariate meta-analysis, a method suited to synthesizing the results of multiple studies that investigate similar questions. Traditional univariate meta-analysis allows for the calculation of an overall effect size, providing a comprehensive summary of the impact of GESI interventions across different studies.
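The paper does not report which effect size metric or pooling model was used. As a minimal sketch of what "calculating an overall effect size" means in a univariate meta-analysis, the Python function below computes a fixed-effect, inverse-variance weighted mean with a 95% confidence interval. The function name `fixed_effect_pool` and the three study values are hypothetical, chosen only for illustration; they are not data from this review.

```python
import math


def fixed_effect_pool(effects, variances, z=1.96):
    """Fixed-effect (inverse-variance weighted) pooled effect size with a CI."""
    if not effects or len(effects) != len(variances):
        raise ValueError("effects and variances must be non-empty and equal in length")
    # Each study is weighted by the inverse of its sampling variance,
    # so more precise studies contribute more to the pooled estimate.
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    se = math.sqrt(1.0 / total_w)
    return pooled, se, (pooled - z * se, pooled + z * se)


# Hypothetical inputs: three invented studies, not data from this review.
effects = [0.30, 0.45, 0.20]
variances = [0.04, 0.09, 0.02]
pooled, se, ci = fixed_effect_pool(effects, variances)
print(f"pooled = {pooled:.3f}, SE = {se:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

A fixed-effect model assumes all studies estimate one common effect; when designs and settings differ, as the review notes they do, a random-effects model is usually preferred.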

2.2 Choice of software

We used R software, specifically the ‘metafor’ package, for our analysis. This software was selected due to its robustness and versatility in conducting analytical procedures. The ‘metafor’ package supports a wide range of meta-analytic models and methods, making it a comprehensive tool for this type of analysis.
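The `metafor` package is written in R, and its `rma()` function fits random-effects models, by default estimating between-study variance with REML. To illustrate the idea outside R, the Python sketch below substitutes the simpler closed-form DerSimonian-Laird estimator, which `metafor` also offers via `method = "DL"`. This is a swapped-in technique for illustration, not a reproduction of the authors' analysis; the function name `random_effects_dl` and the input values are hypothetical.

```python
import math


def random_effects_dl(effects, variances, z=1.96):
    """DerSimonian-Laird random-effects pooled estimate (illustrative sketch)."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to each study's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_ws = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_ws
    se = math.sqrt(1.0 / sum_ws)
    return pooled, se, tau2, (pooled - z * se, pooled + z * se)


# Hypothetical effect sizes and sampling variances (illustrative only).
pooled, se, tau2, ci = random_effects_dl([0.10, 0.60, 0.35, 0.80],
                                         [0.02, 0.03, 0.05, 0.04])
print(f"pooled = {pooled:.3f}, tau^2 = {tau2:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

When `tau2` is zero the result reduces to the fixed-effect estimate; a REML fit, as in `metafor`'s default, would generally yield a somewhat different `tau2`.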

2.3 Coding of effect sizes

Table 1 summarizes the included studies that met the inclusion criteria, recording information such as study design, sample size, effect sizes, and other relevant variables. This step ensures transparency and clarity in the data coding process.
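The paper does not state how effect sizes were coded for Table 1. One common choice for studies comparing an intervention group with a control group is the bias-corrected standardized mean difference (Hedges' g). The sketch below assumes each study reports group means, standard deviations, and sample sizes, and computes g together with its sampling variance. The function `hedges_g` and the example row are hypothetical.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Bias-corrected standardized mean difference (Hedges' g) and its variance."""
    df = n1 + n2 - 2
    # Pooled within-group standard deviation.
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled                 # Cohen's d
    j = 1 - 3 / (4 * df - 1)                  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


# Hypothetical coding of one study row (all values invented for illustration).
row = {"study": "Study A (hypothetical)", "design": "quasi-experimental",
       "n_treat": 60, "n_ctrl": 55}
row["g"], row["var_g"] = hedges_g(72.0, 10.0, row["n_treat"],
                                  68.0, 11.0, row["n_ctrl"])
print(row)
```

Recording both `g` and `var_g` for every row is what allows the coded table to feed directly into inverse-variance pooling.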

3 Results and analysis

The manuscript is organized so that headings and subheadings clearly demarcate each step of the systematic literature review process.

3.1 Status of GESI in primary schools in Tanzania

3.1.1 Persistent gender disparities

One of the major findings in this study was that gender disparities in primary education persist in Tanzania. This was evident in the lower enrollment and completion rates for girls in primary schools compared to boys [ 36 ]. Girls are less likely to attend school than boys, with enrollment rates lower for girls at both the primary and secondary levels. Additionally, girls are more likely to drop out of school due to various reasons, including early marriage, household responsibilities, and financial constraints [ 5 ]. These disparities highlight the ongoing challenges faced by girls in accessing and completing primary education.

3.1.2 Cultural and societal beliefs

Several studies have identified cultural and societal beliefs as a major factor contributing to gender disparities in primary education. In many Tanzanian communities, girls are expected to prioritize domestic responsibilities over their education, which can lead to low enrollment rates and high drop-out rates [ 39 ]. Furthermore, gender-based violence and sexual harassment are prevalent in schools, with girls facing discrimination and harassment from both male students and teachers [ 4 ]. These issues underscore the need for targeted interventions to create a safer and more supportive educational environment for girls.

In addition, Losioki and Mdee [ 12 ] found that gender stereotypes are perpetuated in teacher education programs in Tanzania, which can affect teachers’ ability to create a gender-equitable and socially inclusive classroom environment. Teachers may unconsciously reinforce gender stereotypes in the classroom, leading to further marginalization of girls and other vulnerable groups.

3.1.3 Underrepresented minorities

In addition, limited access to education for children with disabilities or those from low-income families and marginalized communities can perpetuate social inequalities in primary schools [ 30 ]. These students often face significant barriers, including inadequate school facilities, lack of appropriate learning materials, and insufficient support services, which hinder their educational progress.

3.2 Strategies addressing the challenge

Despite these challenges, the government has made efforts to improve GESI in primary schools. The government of Tanzania has committed to providing equal access to education for all children, regardless of gender, ethnicity, or socio-economic status, and has implemented policies such as free primary education and affirmative action programs to that end [ 15 , 26 ]. These initiatives aim to reduce financial barriers to education and to encourage the enrollment and retention of girls and children from marginalized groups. They include the Tanzania Education Sector Development Plan (ESDP) and the Primary Education Development Program (PEDP) [ 6 , 16 ], which aim to address systemic barriers in education and promote inclusive practices in schools. The government is also open to collaborating with external actors, such as international programs, community development agencies, and NGOs, to enhance GESI. For instance, projects that focus on community engagement and parental involvement have shown positive impacts in changing attitudes towards girls’ education and promoting inclusive practices [ 17 ].

3.2.1 International and community-based programs

In recent years, there has been an increase in programs and initiatives aimed at promoting GESI in primary education. For example, the “Let Girls Learn” program, launched by the US government in partnership with the Tanzanian government, aimed to increase access to education for girls and reduce gender disparities in education [ 7 ]. Similarly, the Tanzania Gender Networking Programme (TGNP) has been working to promote GESI in education through community mobilization, advocacy, and capacity building [ 14 ].

3.2.2 Interventions with recorded impact

Previous studies identified several approaches that have been successful in improving GESI in primary schools, two of which are discussed here. One such approach is the use of gender-responsive pedagogy, which involves incorporating gender-sensitive teaching practices and materials into the classroom [ 17 ]. This method helps create a more inclusive learning environment that acknowledges and addresses the different needs of boys and girls. Another effective intervention is the provision of sanitary pads and menstrual hygiene education to girls, which has been shown to improve school attendance and reduce drop-out rates [ 35 ]. By addressing menstrual hygiene needs, schools can help ensure that girls do not miss out on education due to a lack of resources or stigma associated with menstruation.

3.2.3 Intervention recommendations

GESI are essential components of a quality education system, and there is a need to address the persistent gender disparities in primary education. While cultural and societal beliefs continue to be major barriers, efforts to improve GESI through government policies and initiatives, as well as community-based programs, showed promise. The use of gender-responsive pedagogy and the provision of menstrual hygiene education and supplies were promising approaches that showed positive results [ 21 ]. However, more research and investment are needed to ensure that all children have access to primary education. Continued collaboration between the government, NGOs, and communities is essential to sustain and expand these efforts, ensuring that all students can benefit from a supportive and equitable educational environment [ 29 ].

Overall, there is still much work to be done to ensure GESI in primary schools [ 33 ]. It will require continued efforts and collaboration from the government, educators, and communities to address cultural and traditional beliefs, promote teacher education that challenges gender stereotypes, and provide equal access to education for all children. Policymakers must prioritize the allocation of resources to support GESI initiatives and ensure that schools are equipped to meet the diverse needs of all students [ 3 ].

By addressing these systemic issues, Tanzania can make significant strides towards achieving an inclusive and equitable education system that benefits all children, irrespective of their gender or socioeconomic background. Continued research and monitoring are essential to evaluate the effectiveness of existing interventions and identify new strategies to overcome persistent challenges in promoting GESI in primary education [ 31 ].

3.2.4 Gaps in the knowledge about GESI in primary schools

While the literature has provided valuable insights into the state of GESI in primary schools in Tanzania, several gaps in the knowledge still need to be addressed.

One major gap is the lack of research on the experiences of marginalized groups, including children with disabilities and those from low-income households. Studies have shown that these groups face significant barriers to accessing education and are often excluded from educational opportunities. For example, a study by Mwaijande [ 20 ] found that children with disabilities faced challenges such as lack of access to assistive devices and negative attitudes from teachers and other students. Similarly, research by Pak et al. [ 30 ] and Thomas and Rugambwa [ 36 ] revealed that children from poor families often struggle to pay school fees and may not have access to basic learning materials.

Another gap in the Tanzanian literature is the lack of research on the experiences of female teachers in primary schools. While studies have examined gender stereotypes and biases in teacher education programs, Thomas and Rugambwa [ 36 ] stressed that there is limited research on the experiences of female teachers in the classroom. Research on female teachers could shed light on the ways in which gender intersects with other forms of marginalization, such as age and socioeconomic status.

Furthermore, there is a need for more research on effective interventions and strategies for promoting GESI in primary schools. While some studies have evaluated the impact of interventions such as teacher training programs [ 19 , 25 ] , more rigorous evaluations of these interventions are needed to determine their effectiveness and sustainability.

Additionally, there is a lack of longitudinal studies that follow the long-term impact of GESI interventions. Many studies focus on short-term outcomes, but understanding the lasting effects of interventions is crucial for developing sustainable policies and practices.

In summary, while previous research has provided valuable insights into GESI in primary schools, several gaps in the knowledge need to be addressed. Future research should focus on the experiences of marginalized groups, including children with disabilities and those from low-income households, as well as female teachers. Additionally, the review showed a need for more rigorous evaluations of interventions and strategies aimed at promoting GESI in primary schools. Longitudinal studies that assess the long-term impact of these interventions would also be beneficial.

3.3 Patterns observed across the studies

Several patterns and trends emerged across the studies. Firstly, there was a consistent finding that gender disparities persist in primary schools, particularly in terms of access to education and academic achievement. Despite efforts to promote GESI, girls and marginalized groups continue to face significant barriers that hinder their educational progress.

Secondly, there was a growing recognition of the importance of addressing GESI in primary education, as evidenced by the increasing number of interventions and programs aimed at promoting these values. This trend indicates a positive shift towards acknowledging and addressing GESI issues within the education system.

Thirdly, the systematic literature review revealed that the role of teachers is critical in promoting GESI in primary schools. Teacher training and support are essential for equipping educators with the skills and knowledge needed to foster an inclusive and equitable learning environment. Studies consistently highlighted the need for gender-sensitive pedagogy and teacher professional development programs.

Finally, there were some gaps in the current knowledge base, particularly with regard to the long-term impact of interventions and the effectiveness of different approaches to promoting GESI in primary education. While some interventions showed promising results, more research was needed to determine their sustainability and broader applicability.

By addressing these gaps and building on the patterns observed across studies, future research could contribute to a more comprehensive understanding of GESI in primary schools and inform the development of policies and practices to promote equity and inclusion for all students.

To sum up, the analysis revealed that GESI interventions have a positive effect on various outcomes, such as improved academic performance, reduced gender-based violence, and increased social inclusion. However, effect sizes and study designs varied across the studies. The included studies used various designs, such as randomized controlled trials (RCTs) and quasi-experimental designs, which contributed to the diversity in effect sizes.
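Variation in effect sizes across studies is usually quantified before results are pooled. The paper does not report heterogeneity statistics, so the sketch below is illustrative only: it computes Cochran's Q and the I² statistic, which estimates the percentage of total variation in effect sizes attributable to between-study heterogeneity rather than sampling error. The function name `heterogeneity` and the inputs are hypothetical.

```python
def heterogeneity(effects, variances):
    """Cochran's Q, its degrees of freedom, and I^2 (as a percentage)."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Q sums the weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # I^2 = (Q - df) / Q, truncated at zero when Q does not exceed df.
    i2 = 0.0 if q <= 0 else max(0.0, (q - df) / q) * 100
    return q, df, i2


# Hypothetical effect sizes and variances (illustrative only).
q, df, i2 = heterogeneity([0.10, 0.60, 0.35, 0.80], [0.02, 0.03, 0.05, 0.04])
print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.1f}%")
```

A large I² signals that a single pooled estimate hides meaningful differences between studies, which is one reason to examine study design as a moderator.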

4 Discussion

GESI is a critical component of a high-quality education system. In Tanzania, primary education is the foundation for future academic and professional success [ 23 ], making it essential to ensure that all students, regardless of gender or social status, have access to an inclusive and equitable education. Previous studies explored the state of GESI in primary schools and identified areas for improvement.

The findings of the study highlight the state of GESI in primary schools. Analysis of approximately 10 of the included studies revealed that significant gender disparities in access to education and academic performance persist, with girls being more disadvantaged. Additionally, children from marginalized backgrounds, such as those from low-income families or those with disabilities, face substantial barriers to education.

The study suggests a holistic approach involving teachers, schools, communities, and policymakers, because a multifaceted approach is necessary to create a more inclusive and equitable education system. Recommendations include:

Providing comprehensive teacher training on gender-sensitive teaching methods.

Implementing community-based initiatives to address social and cultural barriers.

Developing policies and programs prioritizing marginalized students’ needs.

4.1 Implications of the study

Overall, the systematic literature review provided important insights into the state of GESI in primary schools. While progress has been made, significant challenges remain. Continued efforts and investments are necessary to promote a more equitable and inclusive education system. Future research should address the identified gaps and build on the promising interventions highlighted in this study. Based on the evidence synthesized, targeted interventions are necessary to address the barriers that girls and other marginalized groups face in accessing and completing primary education. The study makes the following recommendations for policy and practice and identifies areas for future research.

4.1.1 Addressing school issues related to GESI

Teacher training: policies should mandate comprehensive training for teachers on gender-sensitive teaching practices. Educators need to be equipped with the skills and knowledge to foster an inclusive classroom environment that supports both boys and girls. This includes understanding how to address and counteract gender stereotypes and biases.

Providing resources: schools should be equipped with resources to support girls’ education. This includes the provision of sanitary pads, access to clean and safe gender-segregated toilets, and gender-sensitive teaching materials. These resources are essential in reducing barriers to attendance and participation for girls.

Reviewing curricula: the school curriculum should be reviewed and revised to promote GESI. Curricula should reflect the diversity of Tanzanian society and challenge existing gender stereotypes. Including content that promotes GESI will help inculcate these values in students from a young age.

4.1.2 Addressing structural and socio-economic barriers

Financial support: there should be policies to provide financial support to families who cannot afford school fees. This can include scholarships, free school meals, and other financial incentives that alleviate the economic burden on families and keep girls in school.

Cultural norms and attitudes: interventions must focus on changing cultural norms and attitudes that limit girls’ access to education. Community engagement and awareness campaigns are crucial in shifting perceptions and promoting the value of girls’ education. Programs should aim to involve parents and community leaders in promoting gender equity.

Reducing gender-based violence: schools should implement strict policies against gender-based violence and harassment. Providing a safe and supportive environment is crucial for retaining girls in school. Support services for victims of violence and harassment should be readily available.

4.1.3 Promoting girls’ participation and leadership

Extracurricular activities: schools should create opportunities for girls to engage in extracurricular activities. Programs such as sports, arts, and clubs can enhance girls’ skills and confidence, providing a platform for them to express themselves and develop leadership qualities.

Leadership training: providing leadership training for girls to support their involvement in decision-making processes within schools and communities is essential. This training can empower girls to take active roles in their schools and communities, fostering a sense of agency and leadership.

4.1.4 Comprehensive and integrated approach

Involving multiple stakeholders: a comprehensive approach to promoting GESI should involve multiple stakeholders, including the government, civil society, and communities. Collaboration among these groups is essential for creating a supportive environment for GESI.

Evidence-based interventions: policies and practices should be guided by evidence-based interventions tailored to the specific needs and contexts of different regions and populations. Utilizing data and research to inform practices ensures that efforts are effective and impactful.

Monitoring and evaluation: continuous monitoring and evaluation of interventions are necessary to assess their effectiveness and make necessary adjustments. This helps in ensuring the sustainability and scalability of successful initiatives.

The study highlights the importance of a comprehensive and integrated approach to promoting GESI in primary schools. It underscores the need for targeted interventions, policy enhancements, and continued efforts to address the persistent barriers that girls and marginalized groups face. By implementing these recommendations, Tanzania can make significant strides towards achieving a more inclusive and equitable education system for all children.

4.2 Areas for future research

Future research and policy efforts should focus on sustaining and scaling successful interventions, ensuring that all children, regardless of gender or socio-economic background, have access to quality education. Future research should address these gaps:

Experiences of marginalized groups: more high-quality research is needed on the experiences of marginalized groups, including children with disabilities and those from low-income households.

Female teachers: investigate the experiences of female teachers in primary schools to understand how gender intersects with other forms of marginalization, such as age and socioeconomic status.

Effectiveness of interventions: conduct more rigorous evaluations of specific interventions and strategies for promoting GESI, including long-term impact studies.

Intersectionality: explore the intersectionality of factors such as gender, socioeconomic status, and ethnicity to provide a more comprehensive understanding.

5 Conclusion

GESI is crucial for improving access to education, ensuring equal opportunities, and promoting positive social outcomes. Teachers play a critical role in promoting these values and must receive appropriate training and support to create inclusive learning environments. Policymakers and education leaders must prioritize efforts to address GESI in primary schools, including investing in research to understand the factors contributing to gender and social equality and identifying effective strategies for promoting GESI.

The systematic literature review examined the state of GESI in primary schools and revealed significant challenges, particularly in terms of teacher training and the implementation of policies and programs. The review highlighted persistent gender disparities and the barriers faced by marginalized groups, such as children with disabilities and those from low-income families.

The findings suggest that targeted interventions are needed to address these barriers. Recommended interventions include:

Increasing access to education: efforts to increase access to education for marginalized groups, such as scholarships and school feeding programs.

Policy development: implementing policies that address gender-based violence and discrimination.

Community engagement: involving multiple stakeholders, including government, civil society, and communities, in promoting GESI.

Develop and implement teacher training programs: focus on GESI principles, awareness of gender biases, strategies for promoting inclusivity, and the use of gender-sensitive teaching materials.

Develop and implement gender-sensitive curricula: address gender biases and stereotypes across all subject areas.

Strengthen policies and regulations: enforce policies that promote GESI in school governance, teacher recruitment, and student enrollment.

Increase participation of girls: provide incentives for girls to attend school, such as scholarships and school feeding programs, and improve school infrastructure.

The study provides crucial insights into the state of GESI in primary schools and underscores the need for coordinated and sustained efforts to address these challenges. By implementing the recommended strategies and involving all stakeholders, Tanzania can ensure that all children have access to quality primary education that promotes GESI.

Data availability

No datasets were generated or analysed during the current study.

Adipat S, Chotikapanich R. Sustainable development goal 4: an education goal to achieve equitable quality education. Acad J Interdiscip Stud. 2022;11(6):174–83.

Cavicchioni V, Motivans A. Monitoring educational disparities in less developed countries. In: In pursuit of equity in education: using international indicators to compare equity policies. New York: Springer; 2001. p. 217–40.

Clancy J, Barnett A, Cecelski E, Pachauri S, Dutta S, Oparaocha S, Kooijman A. Gender in the transition to sustainable energy for all: from evidence to inclusive policies. 2019.

Colclough C, Rose P, Tembon M. Gender inequalities in primary schooling: the roles of poverty and adverse cultural practice. Int J Educ Dev. 2000;20(1):5–27.

Esteves M. Gender equality in education: a challenge for policy makers. Int J Soc Sci. 2018;4(2):893–905.

Fenech M, Skattebol J. Supporting the inclusion of low-income families in early childhood education: an exploration of approaches through a social justice lens. Int J Incl Educ. 2021;25(9):1042–60.

Frank A. Understanding the “success” of an all girls’ boarding school in rural Tanzania: perspectives of graduates, teachers, and administrators, PhD thesis. The Florida State University; 2019.

World Bank Group. Malawi systematic country diagnostic: breaking the cycle of low growth and slow poverty reduction. Washington, DC: World Bank; 2018.

Guthridge M, Kirkman M, Penovic T, Giummarra MJ. Promoting gender equality: a systematic review of interventions. Soc Justice Res. 2022;35(3):318–43.

Kaur B. Equity and social justice in teaching and teacher education. Teach Teach Educ. 2012;28(4):485–92.

Lokina RB, Nyoni J, Kahyarara G. Social policy, gender and labour in Tanzania. Dar es Salaam: Economic and Social Research Foundation (ESRF); 2016.

Losioki BE, Mdee HK. The contribution of the hidden curriculum to gender inequality in teaching and learning materials: experiences from Tanzania. Asian J Educ Train. 2023;9(2):54–8.

Lovell E. Gender equality, social inclusion and resilience in Malawi. In: Building resilience and adapting to climate change. 2021.

Makulilo AB, Bakari M. Building a transformative feminist movement for women empowerment in Tanzania: the role of the Tanzania gender networking programme (TGNP-Mtandao). Afr Rev. 2021;13(2):155–74.

Malelu AM. Institutional factors influencing career advancement of women faculty: a case of, PhD thesis. Kenyatta University; 2015.

Mashala YL. The impact of the implementation of free education policy on secondary education in Tanzania. Int J Acad Multidiscip Res. 2019;3(1):6–14.

Mhewa MM, Bhalalusesa EP, Kafanabo E. Secondary school teachers’ understanding of gender-responsive pedagogy in bridging inequalities of students’ learning in Tanzania. Papers in Education and Development. 2021;38(2).

Mohun R, Biswas S, Jacobson J, Sajjad F. Infrastructure: a game changer for women’s economic empowerment. Background paper. UN Secretary-General’s High-Level Panel on Women’s Economic Empowerment; 2016.

Mondal S, Joe W, Akhauri S, Sinha I, Thakur P, Kumar V, Kumar T, Pradhan N, Kumar A. Delivering PACE++ curriculum in community settings: impact of TARA intervention on gender attitudes and dietary practices among adolescent girls in Bihar, India. PLoS ONE. 2023;18(11): e0293941.

Mwaijande VT. Access to education and assistive devices for children with physical disabilities in Tanzania, Master’s thesis. Oslo and Akershus University College; 2014.

Mwakabenga RJ, Komba SC. Gender inequalities in pedagogical classroom practice: what influence do teachers make? J Educ Humanit Sci. 2021;10(3):66–82.

Nazneen S, Cole N. Literature review on socially inclusive budgeting. 2018.

Ndijuye LG, Mligo IR, Machumu MAM. Early childhood education in Tanzania: views and beliefs of stakeholders on its status and development. Global Educ Rev. 2020;7(3):22–39.

Nelly S. Gender equality and social inclusion (GESI) in village development. Legal Brief. 2021;10(2):245–52.

Nkya HE, Bimbiga I. Unlocking potential: the positive impact of in-service training on science and mathematics teachers teaching strategies. Res Humanit Soc Sci. 2023. https://doi.org/10.7176/RHSS/13-16-04.

Nyoni WP, He C, Yusuph ML. Sustainable interventions in enhancing gender parity in senior leadership positions in higher education in Tanzania. J Educ Pract. 2017;8(13):44–54.

Okkolin M-A, Lehtomäki E, Bhalalusesa E. The successful education sector development in Tanzania—comment on gender balance and inclusive education. Gend Educ. 2010;22(1):63–71.

Omari CK, Mbilinyi DA. Born to be less equal: the predicament of the girl child in Tanzania. In: Gender, family and work in Tanzania. London: Routledge; 2018. p. 292–314.

Opini B, Onditi H. Education for all and students with disabilities in Tanzanian primary schools: challenges and successes. Int J Educ Stud. 2016;3(2):65–76.

Pak K, Desimone LM, Parsons A. An integrative approach to professional development to support college-and career-readiness standards. Educ Policy Anal Arch. 2020;28(111): n111.

Palmary I. Back2School gender mainstreaming guidelines. 2024.

Prasetyo P, Azwardi A, Kistanti N. Gender equality and social inclusion (GESI) and institutions as key drivers of green entrepreneurship. Int J Data Netw Sci. 2023;7(1):391–8.

Shelley J. Identifying and overcoming barriers to gender equality in Tanzanian schools: educators’ reflections. Int J Pedagog Innov New Technol. 2019;6(1):9–27.

Siddaway AP, Wood AM, Hedges LV. How to do a systematic review: a best practice guide for conducting and reporting narrative reviews, meta-analyses, and meta-syntheses. Annu Rev Psychol. 2019;70(1):747–70.

Stoilova D, Cai R, Aguilar-Gomez S, Batzer NH, Nyanza EC, Benshaul-Tolonen A. Biological, material and socio-cultural constraints to effective menstrual hygiene management among secondary school students in Tanzania. PLOS Global Public Health. 2022;2(3): e0000110.

Thomas MA, Rugambwa A. Equity, power, and capabilities: constructions of gender in a Tanzanian secondary school. Fem Form. 2011;23(3):153–75.

Tieng’o EWB. Community perception on public primary schools: implications for sustainable fee free basic education in Rorya district, Tanzania. East Afr J Educ Soc Sci. 2019;1(1):32–47. https://eajess.ac.tz/2020/05/26/community-perception-on-public-primary-schools-implications-for-sustainable-fee-free-basic-education-in-rorya-district-tanzania/.

Wapling L. Inclusive education and children with disabilities: quality education for all in low and middle income countries. 2016.

Zacharia L. Factors causing gender inequality in education in Tanzania: a case of Korogwe district secondary schools, PhD thesis. The Open University of Tanzania; 2014.

Author information

Authors and affiliations.

Tanzania Institute of Accountancy, Mwanza, Tanzania

Mbeya University of Science and Technology, Mbeya, Tanzania

Isack Kibona

Contributions

H.E. collected the literature, reviewed it, and wrote the major parts of the manuscript; I.K. drafted the conclusion and analysis sections. Both authors worked closely together throughout.

Corresponding author

Correspondence to Henry Nkya .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Nkya, H., Kibona, I. Systematic literature review of gender equity and social inclusion in primary education for teachers in Tanzania: assessing status and future directions. Discov Educ 3, 122 (2024). https://doi.org/10.1007/s44217-024-00221-8

Received : 26 March 2024

Accepted : 02 August 2024

Published : 13 August 2024

DOI : https://doi.org/10.1007/s44217-024-00221-8

  • Equitable education
  • Inclusive education
  • Targeted intervention
  • Open access
  • Published: 13 August 2024

Challenges affecting migrant healthcare workers while adjusting to new healthcare environments: a scoping review

  • Asem Al-Btoush 1 , 2 &
  • Charbel El-Bcheraoui   ORCID: orcid.org/0000-0003-3117-9966 1  

Human Resources for Health volume 22, Article number: 56 (2024)

Introduction

Shifting demographics, an aging population, and increased healthcare needs contribute to the global healthcare worker shortage. Migrant healthcare workers (MHCWs) help reduce this shortage by moving from low- and middle-income countries (LMICs) to high-income countries (HICs) for better opportunities. Their migration is driven by economic factors and health workforce demand, but they also face challenges adapting to a new country and new working environments. Addressing these challenges effectively requires evidence-based policies; without them, MHCWs may leave their host countries, worsening the shortage of healthcare workers.

To review and synthesize the barriers experienced by MHCWs as they adjust to a new country and their new foreign working environments.

Methodology

We followed the PRISMA guidelines and searched the PubMed and Embase databases. We included cross-sectional studies published in English after 2000 that addressed MHCWs migrating from LMICs to high-income countries. We developed a data extraction tool and used the Appraisal tool for Cross-Sectional Studies (AXIS) to assess article quality against predetermined categories.

Through a targeted search, we identified fourteen articles covering 11,025 MHCWs from low- and middle-income countries, with host countries in Europe, the USA, Canada, Australia, New Zealand, and Israel. Response rates varied widely, from 12% to 90%. The studies encompassed various healthcare roles; participants were mainly 25–45 years old, and a considerable proportion were female. Participants had lived in their host countries for 3–10 years on average. Results are categorized according to the Riverside Acculturation Stress Inventory (RASI), expanded to include bureaucratic and employment barriers, gender differences, natives vs. non-natives, and orientation programs.

Conclusions

The findings emphasize the importance of cultural competence training and tailored support for MHCWs' integration and job satisfaction. Time spent in the new healthcare setting and the influence of orientation programs are key factors shaping their intentions to stay or leave. Despite their limitations, these studies provide valuable insights and underscore the ongoing need for holistic strategies to facilitate successful integration, ultimately benefiting healthcare systems and the well-being of all stakeholders.

Introduction and background

In 2020, the worldwide healthcare workforce comprised 29.1 million nurses, 12.7 million medical doctors, 3.7 million pharmacists, 2.5 million dentists, 2.2 million midwives, and 14.9 million other healthcare professionals, totaling 65.1 million. However, this workforce was far from evenly distributed, with a 6.5-fold difference in density between high-income and low-income countries [ 1 , 2 ]. Insufficiently regulated international migration of health workers can worsen these disparities, intensifying shortages in countries already grappling with a scarcity of healthcare professionals. This impact is particularly pronounced in low- and middle-income countries (LMICs), where the loss of skilled healthcare workers further strains fragile health systems and limits their populations' access to essential services. High-income countries (HICs) may also face challenges from increased demand for healthcare services, but they often have greater resources to attract and retain healthcare workers from domestic and international sources, which mitigates the impact to some extent [ 1 ].

According to a Global Burden of Disease Study conducted in 2022, a minimum of 20.7 doctors, 70.6 nurses and midwives, 8.2 dental professionals, and 9.4 pharmaceutical personnel per 10,000 population would be required. Shifting demographic trends, a progressively aging population, and heightened healthcare requirements have collectively contributed to a persistent worldwide shortage of healthcare workers [ 3 ]. The International Centre on Nurse Migration has estimated that around 10.6 million new nurses will be required within the next 10 years to address the current nursing deficit and to replace an anticipated 4.7 million retiring nurses [ 4 ].

International Medical Graduates (IMGs) are physicians who practice medicine in a country other than the one in which they obtained their primary medical qualification [ 5 ]; they are a subset of migrant healthcare workers (MHCWs). Approximately 40% of active physicians in the United Kingdom are IMGs [ 6 ]. This share exceeds 25% in the USA and Canada [ 8 ] and surpasses 40% in countries such as Australia, New Zealand, and Norway [ 7 ]. MHCWs, including doctors, nurses, therapists, and technicians, are a dynamic and crucial segment of the global healthcare workforce, relocating from LMICs to HICs for better opportunities and improved living conditions [ 8 ]. Their migration is driven by economic factors, career prospects, and the demand for skilled healthcare personnel in destination countries [ 9 ]. Although their presence addresses workforce shortages and enhances service delivery, MHCWs encounter various challenges and barriers when transitioning to new countries and working environments [ 7 ]. A two-phase literature review highlights the challenges encountered by migrant healthcare workers, such as language barriers, difficulties with slang and medical terminology, and perceived differences in cultural, social, and professional norms, which create uncertainty in their interactions with colleagues and patients [ 10 ].

MHCWs contribute significantly to the provision of medical services across borders [ 2 ]. However, insufficiently regulated international migration of health workers can deepen shortages in countries already grappling with a scarcity of healthcare professionals [ 11 ], an impact felt predominantly by LMICs rather than HICs [ 1 ]. This study aims to identify the challenges MHCWs face when integrating into new countries and healthcare environments, focusing on quantitative data from cross-sectional surveys, in contrast to previous reviews that relied on qualitative data from interviews and discussions. The findings can inform evidence-based policies to retain MHCWs; without such policies, the shortage may worsen.

In this scoping review, we adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to enumerate the barriers experienced by migrant and foreign healthcare workers during their transition to a new country and new working environments [ 12 ].

Data search

We began our search on June 20, 2023, in the PubMed database and extended it on June 25, 2023, to the Embase database, which incorporates Medline. To construct an effective search strategy, we conducted a preliminary literature review to identify relevant keywords and MeSH terms, focusing on articles that address the challenges encountered by migrant healthcare workers during their transition to a new healthcare environment. We identified three key concepts (barriers, adjustment, and foreign healthcare worker) and combined them into a search string designed to retrieve precise and pertinent literature on the subject. The final search string was as follows:

("Barrier*"[All Fields] OR "experience*"[All Fields] OR " perspectiv*"[All Fields] OR " percept*"[All Fields] OR "Obstacle*"[All Fields] OR "challenge*"[All Fields] OR "limitation*"[All Fields] OR "factor*"[All Fields] OR "strateg*"[All Fields] OR "Social Support"[MeSH Terms] OR "Communication Barriers"[MeSH Terms]) AND ("adjust*"[All Fields] OR "adapt*"[All Fields] OR "transition*"[All Fields] OR "integrat*"[All Fields] OR "wellbeing*"[All Fields] OR "Attitude of Health Personnel"[MeSH Terms] OR "Personnel Turnover"[MeSH Terms] OR "Occupational Health"[MeSH Terms] OR "adaptation, psychological"[MeSH Terms] OR "Personal Satisfaction"[Mesh] OR "Job Satisfaction"[MeSH Terms] OR "work/psychology"[MeSH Terms] OR "occupational stress/psychology"[MeSH Terms] OR "workplace/psychology"[MeSH Terms]) AND ("Migrant healthcare worker*"[All Fields] OR "migrant care worker*" [All Fields] OR "International medical graduate*"[All Fields] OR "Foreign medical graduate*"[All Fields] OR "Healthcare worker migration*"[All Fields] OR "care worker migration*"[All Fields] OR "migrant physician*" [All Fields] OR "migrant nurse*" [All Fields] OR "Foreign medical"[All Fields] OR "internationally educated healthcare professional*" [All Fields] OR "internationally educated physician*" [All Fields] OR "internationally educated nurse*" [All Fields] OR "overseas qualified*"[All Fields] OR "International trained*"[All Fields] OR "Foreign Medical Graduates/psychology"[Mesh] OR "Foreign Professional Personnel"[MeSH Terms] OR "Foreign Medical Graduates"[MeSH Terms]).
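To make the structure of this string explicit, the following Python sketch assembles a PubMed-style Boolean query from three concept blocks: terms within a concept are joined with OR, and the concepts are joined with AND. It is a minimal illustration under stated assumptions: the helpers `or_block` and `build_query` are hypothetical, the term lists are abbreviated to a few entries from each block above, and the script only builds a string; it does not query PubMed.

```python
# Hypothetical helpers (not part of any PubMed tool): build a Boolean
# query in which terms inside one concept are OR-ed and concepts are AND-ed.

def or_block(terms):
    """Join (term, field) pairs with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{term}"[{field}]' for term, field in terms) + ")"


def build_query(concepts):
    """AND together one OR-block per concept."""
    return " AND ".join(or_block(terms) for terms in concepts)


# Abbreviated term lists: a small subset of each concept block in the full string.
barriers = [("Barrier*", "All Fields"), ("challenge*", "All Fields"),
            ("Communication Barriers", "MeSH Terms")]
adjustment = [("adjust*", "All Fields"), ("adapt*", "All Fields"),
              ("Job Satisfaction", "MeSH Terms")]
workers = [("Migrant healthcare worker*", "All Fields"),
           ("International medical graduate*", "All Fields"),
           ("Foreign Medical Graduates", "MeSH Terms")]

query = build_query([barriers, adjustment, workers])
print(query)
```

Keeping each concept as a separate list makes it easy to add a synonym to one block without disturbing the AND structure; the full search would simply use the complete term lists.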

We restricted the results to original research articles, excluding preprints, conference abstracts, editorials, letters to the editor, commentaries, interviews, and correspondence. We first applied the search string to the PubMed database, then screened titles and abstracts, and finally conducted a full-text analysis of the filtered results against the eligibility criteria. We repeated the same process for the Embase and Medline databases. To identify additional relevant articles, we used snowballing and reference harvesting. We removed duplicate articles and managed title and abstract screening using EndNote X73. To stay up to date during writing, we set alerts for database updates and analyzed and filtered any new results using the same methodology (see Table 1 ).

Data extraction

We developed a comprehensive data extraction tool covering the following data points: PubMed or Embase (Medline) identification number, title, author, journal, aims, type of study, study design, year, number of participants contacted, number of participants who responded, response rate, country of origin of the migrant healthcare worker, destination country, participant inclusion criteria, survey content development, questionnaire topics, type of analysis, analysis measures, categorized results, conclusions and recommendations, and limitations. The detailed data extraction tool is available in Appendix S1.

Quality assessment

To assess the quality of the selected articles, we used the Appraisal tool for Cross-Sectional Studies (AXIS), which is designed specifically for cross-sectional studies. We used it to assess the reliability of the included studies, examining study design, sampling methods, reliability and validity measures, statistical techniques, and overall quality of reporting. The main goal was to assess each study's methodological rigor and how well it addressed potential biases [ 13 ]. We applied a 20-point scoring system across predefined categories and rated each article as excellent, good, fair, or, in some cases, poor quality. The detailed quality assessment tool is available in Appendix S2.

Following the PRISMA guidelines [ 12 ], our search initially identified 20 articles for potential inclusion in the review. Of these, 18 were retrieved from bibliographic databases: 13 from PubMed and five from Embase (including Medline). We removed duplicate entries in PubMed, combined the refined dataset with the Embase results, and identified an additional five articles by snowballing. After screening, we excluded nine articles: four that duplicated data, four conference abstracts, and one that focused on medical fellows after residency, which did not meet our eligibility criteria. This process resulted in a final selection of 14 articles (see Fig. 1 ).

Fig. 1

Preferred reporting items for systematic reviews and meta-analyses (PRISMA) [ 12 ] flow chart of article selection for the scoping review on challenges affecting migrant healthcare workers while adjusting to new healthcare environments

General characteristics

Of the fourteen reviewed studies, seven were deemed high quality, and the majority used cross-sectional survey designs. Collectively, these studies encompassed 11,025 MHCWs. Response rates in most studies were moderately high, ranging from 40% to 90%, except for one study in Ireland with a response rate of 12% [ 14 ]. The studies covered a range of scenarios, including IMGs in training, during the examination process, and in permanent posts. Respondents were notably diverse, originating from LMICs such as Nigeria, India, the Philippines, Nepal, China, Egypt, and Pakistan and relocating to high-income countries such as the USA, Canada, Australia, New Zealand, Finland, Sweden, Israel, and Ireland. Nine studies covered doctors (65%), four covered nurses (28%), and one covered migrant care workers in Australia (7%) [ 15 ]. Participants were mostly between 25 and 45 years old. Most studies reported gender rather than sex; gender ratios varied, but a considerable proportion of participants were female. Participants had lived in their host country for 3–10 years on average.

Main barriers

We present the results according to the Riverside Acculturation Stress Inventory (RASI), an acculturation scale developed by Benet-Martínez and Haritatos in 2005 [ 16 ]. The RASI comprises 15 items that address culture-related challenges in five life domains: (1) language skills, (2) work challenges, (3) intercultural relations, (4) discrimination, and (5) cultural isolation. In addition, we highlight bureaucratic and employment barriers.

Language problems

Language problems were identified as a significant challenge for MHCWs across multiple studies and contexts, including IMGs in the USA, foreign-born physicians in Finland, migrant nurses in Australia and the USA, and migrant care workers in Australian aged care facilities [ 15 , 17 , 18 , 19 ]. In Australian residential aged care facilities, language barriers had a detrimental influence on the experiences of migrant care workers, and Adebayo et al. identified ethnicity and self-reported English proficiency as significant factors contributing to acculturation stress [ 15 ]. Language and communication difficulties were also prominent challenges for MHCWs in the USA, with 7% of respondents expressing concerns in this area [ 17 ]. Language barriers were among the top barriers reported by non-English-speaking psychiatry IMGs who completed a web-based questionnaire in Canada (median score: 2.5 vs. 2; p  = 0.002) [ 20 ]. Linguistic barriers and communication issues affected MHCWs' interactions with colleagues and patients, making it difficult for them to provide the best possible care and to integrate effectively into their work environment.

Work challenges, new healthcare settings

MHCWs often encounter unique challenges when adapting to healthcare systems and working environments in countries such as the USA and Canada, where understanding healthcare team dynamics and roles, as well as the legal and ethical aspects specific to the new system, is crucial for integration [ 17 , 21 ]. Likewise, migrant nurses in Australia noted disparities in work practices and patient care approaches compared with their home countries [ 18 ]. In the USA, MHCWs faced barriers arising from differences in professional practice, such as the use of medical equipment, treatment approaches, or patient management strategies. These disparities can cause confusion, uncertainty, or inefficiency, potentially impeding learning and adaptation in the new healthcare environment [ 17 ]. Understanding the Canadian healthcare system (median score: 4 vs. 2; p = 0.020) was ranked second among the barriers reported by psychiatry IMGs who had been in Canada for less than 12 months [ 22 ]. In another Canadian study, the mean challenge score for knowledge of the Canadian healthcare system was 3.93 (SD 1.097) among IMGs and 3.55 (SD 0.852) among program directors [ 20 ]. Some MHCWs also reported insufficient workplace support, which affected their overall well-being and job satisfaction [ 19 ]. Workplace support, including assistance with language and cultural adaptation, professional development, social integration, psychosocial well-being, and recognition, therefore plays a vital role in helping MHCWs overcome these challenges and succeed in their new roles.

Discrimination

Discrimination poses a significant challenge for MHCWs worldwide, stemming from factors such as ethnicity, language, and cultural differences. It is linked to difficulties adapting to new healthcare systems and can lead to a loss of workforce talent [ 23 ]. Female MHCWs often face gender-related discrimination, which affects their integration and career intentions. In the US study by Symes [ 17 ], male respondents primarily expressed concerns about logistical challenges, such as family issues (80%), adjusting to American diets (72%), visa and immigration matters (64%), finding adequate housing (59%), and managing finances (57%). In contrast, female respondents focused more on personal issues, including mental health concerns (65%), duty hours (57%), self-sufficiency (54%), workplace discrimination (53%), and lack of support at work (52%). These differences indicate that male IMGs were mainly worried about bureaucratic hurdles, whereas female IMGs were more concerned about personal challenges such as discrimination and mental well-being [ 17 ]. Workplace discrimination is particularly detrimental, affecting job satisfaction and integration among MHCWs in the USA [ 17 ].

Beyond the USA, MHCWs in various countries confront discrimination. In Sweden, a significant portion of respondents (29%) reported perceived discrimination while seeking jobs, and the types of discriminatory experiences differed by gender. Barriers to employment included a sense that their competence was undervalued because of foreign ethnicity, religion, language proficiency, and limited work experience or references in Sweden. Notably, respondents who had grown up or lived in Sweden reported fewer instances of discrimination or undervalued competence; this group amounted to 6% of the sample (n = 16) [ 24 ]. In Ireland, citizenship and perceived career opportunities emerged as pivotal factors in respondents' decisions: those intending to remain perceived more abundant career prospects, whereas those planning to migrate onward did not share this perception [ 14 ]. Similarly, foreign-born physicians in Finland encounter discrimination linked to language difficulties and employment barriers, which affects their intentions to stay in the country: 59% of foreign-born public-sector GPs intended to leave their jobs, compared with 52% of Finnish public-sector GPs [ 25 , 26 ]. Overseas-qualified nurses in Australia experience discrimination due to language barriers and advocate for more cultural diversity education [ 18 ]. In Ireland, migrant doctors struggle with communication difficulties and discrimination based on dialects and accents [ 14 ]. Canadian MHCWs contend with acculturation stress due to limited communication training, language barriers, and discrimination tied to their cultural backgrounds [ 21 ]. In Australia, migrant care workers report discrimination related to ethnicity, which affects their mental health and well-being [ 15 ].

Intercultural relations and cultural isolation

The study by Symes [ 17 ] in the USA revealed significant challenges faced by MHCWs. Intercultural barriers, affecting both professional practices and individual experiences, were a major concern for 18% of respondents. The study also identified the US healthcare system itself as a substantial hurdle for MHCWs, along with the emotional strain of being far from their support network of family and friends (11%). Social isolation was among the top barriers reported by non-English-speaking psychiatry IMGs (median score 3 vs. 3.5; p  = 0.043) [ 22 ].

Meanwhile, in Finland, MHCWs encountered undervaluation of their competence based on factors such as foreign ethnicity, religion, and language skills [ 25 , 26 ]. In Israel and the USA, migrant nurses faced challenges concerning work practices and communication, underscoring the need for enhanced cultural education [ 19 ]. In Australian aged care facilities, weak associations were found between cultural isolation and DASS (Depression Anxiety Stress Scales) depression, anxiety, and stress scores, as well as between intercultural relations and the same scores [ 15 ]. Such cultural education can aid integration and help maximize the use of MHCWs' skills [ 27 ].

Bureaucratic barriers

The included articles shed light on the bureaucratic and employment barriers that MHCWs face in different countries and that affect their integration and well-being. These include work-related difficulties, relationships with colleagues, bureaucratic obstacles, visa issues, and financial constraints. Notably, foreign-born physicians in Sweden and Finland encountered integration challenges including discrimination, undervaluation of competence, and language difficulties [ 24 , 25 , 26 ].

Bureaucratic barriers were a significant issue, particularly in the USA. In the study by Symes [ 17 ], bureaucratic barriers affecting both professional practice and individual experience were a major concern for 9% of respondents, for whom recent travel restrictions on specific countries had delayed visa applications, causing stress and hindering their successful adjustment. Employment barriers, including visa-related challenges and a lack of orientation support, impeded integration and raised concerns about mental health, work–life balance, workplace discrimination, and support [ 17 ]. In Finland, the standardized beta weights reported for the study variables (P value of 0.085) reflected their impact on migrant healthcare workers' intentions to remain in the country; among these variables, employment barriers were associated with increased turnover intentions [ 25 , 26 ]. In Australia, length of stay was linked to job satisfaction among immigrant nurses, indicating the need to address bureaucratic and employment-related challenges to support MHCWs' successful integration and well-being [ 18 ].

The time spent in a new healthcare setting significantly shapes healthcare professionals' career choices and migration intentions. Across the fourteen included articles, the length of time spent in the new environment is consistently linked to these decisions. Longer stays in the host country are associated with stronger intentions to stay, as seen among MHCWs in the USA, who report higher job satisfaction and lower turnover intentions [ 17 ]. Similarly, migrant doctors in Sweden with longer average stays show greater career stability and advancement in the medical labor market [ 24 ].

Conversely, MHCWs who have spent shorter periods in the new healthcare setting often express stronger intentions to leave or migrate onward. MHCWs in Australia with shorter stays experience higher acculturation stress, which is associated with intentions to leave [ 15 ], and MHCWs in Finland on shorter contracts are more likely to express intentions to leave their positions [ 25 ].

Support and orientation programs

Support and orientation programs are integral to addressing the challenges faced by MHCWs across healthcare and professional settings. Among Asian foreign-educated nurses in their first year of US employment, an increase in the perceived quality of orientation reduced the odds of organizational-level turnover by 36% [ 28 ]. In Australian residential aged care facilities, support programs alleviate acculturation stress among migrant care workers [ 15 ].

In Canada, approximately 75% of all participants, including 93% of program directors and 63% of IMGs, expressed the need for an orientation program for IMGs. These findings reflect recognition in Canada of the importance of resources and orientation programs for integrating MHCWs into the Canadian healthcare system [ 20 ]. Moreover, communication skills training and cultural orientation are identified as essential components of support programs for IMGs, especially those dealing with language barriers and unfamiliar healthcare systems [ 21 ]. Such programs not only help individuals overcome cultural and language challenges but also give them the tools to navigate the complexities of their new professional environments.

This paper provides a nuanced understanding of the unique challenges faced by migrant healthcare workers (MHCWs) when transitioning to new countries and healthcare environments. Unlike previous reviews that predominantly utilized qualitative data, this scoping review focuses on quantitative data from cross-sectional surveys, offering a broader, data-driven perspective on the integration of MHCWs. The study highlights significant barriers, such as language difficulties, cultural differences, and acculturation stress [ 8 ], emphasizing their impact on communication, job satisfaction, and overall integration into the healthcare system. Notably, the review underscores the importance of the temporal dimension, revealing how the duration of stay in a new environment influences MHCWs' adaptation and retention.

Furthermore, this review extends the existing literature by providing concrete recommendations for healthcare systems to improve the integration of MHCWs. It suggests implementing cultural competence training, diversity and inclusion policies, and support networks to address cultural and language barriers. The study also highlights the pivotal role of effective orientation programs in enhancing MHCWs’ confidence, competence, and sense of belonging, ultimately leading to reduced turnover intentions. By addressing these challenges through tailored strategies, the paper aims to foster a more inclusive and supportive healthcare environment, enhancing both patient care and the well-being of migrant healthcare professionals [ 2 ].

Language difficulties affected interactions with patients, colleagues, and supervisors, leading to miscommunication, misunderstandings, and potential risks in patient care. MHCWs reported struggling with English, including comprehension of medical terminology, idioms, and nuances, which hindered effective communication and patient-centered history taking. For some, this also affected their ability to understand and adhere to local healthcare protocols, ethical standards, and legal requirements. Inadequate language proficiency can hinder patient–physician interactions, potentially compromising the quality of care provided [ 29 ]. Language difficulties also undermined MHCWs' confidence, job satisfaction, and overall integration into the healthcare system.

The literature on cultural aspects as barriers to integration and intercultural relations among MHCWs underscores the importance of cultural competence, effective communication, and a supportive work environment. Comparative studies have offered valuable insights into how cultural factors vary across different countries and healthcare systems, highlighting the need for tailored strategies to address these barriers and promote successful integration and positive intercultural relations within the healthcare profession [ 30 , 31 ].

Cultural differences have a significant impact on the experiences of healthcare professionals, as highlighted in the analyzed articles. These differences encompass communication styles, power distance, and healthcare practices. For instance, MHCWs often face language barriers, as noted above, that make it challenging to communicate effectively with colleagues and patients [ 29 , 32 ]. In addition, variations in cultural norms and values can influence how healthcare professionals perceive and respond to specific situations, potentially leading to misunderstandings or conflicts in clinical settings. These cultural disparities can also affect power dynamics within healthcare teams, with MHCWs sometimes feeling marginalized or undervalued [ 23 , 33 ].

The findings of this scoping review are consistent with existing literature highlighting the significance of the temporal dimension in healthcare professionals' career decisions and intentions [ 34 ]. A longer duration in the new healthcare environment gives professionals the opportunity to adapt, integrate, and establish themselves, making them more likely to stay. By contrast, those who are relatively new to the setting may grapple with acculturation stress, language barriers, and the challenges of adjusting to a new healthcare system, which may influence their decisions to leave or seek opportunities elsewhere.

Overall, the time frame serves as a crucial context for understanding the complexities of professionals' intentions to either stay, return home, or migrate onward in their healthcare careers. Recognizing the dynamic interplay between time spent in the new healthcare setting and career intentions is pivotal for designing effective interventions and support mechanisms that address the evolving needs of healthcare professionals at various stages of their migration journey. The scoping review reveals several implications for further research and identifies gaps in the existing literature. One important avenue for future investigation is the in-depth exploration of the specific factors influencing the time frame healthcare professionals spend in new healthcare settings and its connection to their intentions to stay or leave.

To address these challenges and create a more inclusive and supportive healthcare environment, healthcare systems should implement several strategies. First, cultural competence training should be a fundamental component of medical education and professional development programs [ 28 , 35 ]. This training equips MHCWs with the skills to navigate cultural differences effectively, resulting in better communication and collaboration [ 36 ]. Second, healthcare systems should establish diversity and inclusion policies that promote equality and respect for all staff, regardless of their cultural background. These policies can help create a more welcoming and accepting workplace culture [ 8 ].

Furthermore, healthcare organizations should offer support networks for MHCWs [ 8 ]. These networks can provide emotional support, guidance on cultural adaptation, and opportunities for social interaction. In addition, interprofessional education programs can enhance teamwork and collaboration among healthcare professionals from diverse backgrounds [ 37 ]. Language support services, such as interpreters or language courses, are crucial in overcoming language barriers [ 29 ]. Cultural liaisons within healthcare organizations can serve as valuable resources for IMGs, helping them navigate the healthcare system and address cultural challenges effectively.

Orientation programs are mentioned repeatedly across the studies and play a significant role in shaping the experiences, attitudes, and intentions of healthcare professionals [ 38 ]. These programs are designed to facilitate the integration of foreign-trained healthcare workers into their new healthcare settings, providing them with essential information, skills, and support [ 8 ]. By actively addressing cultural differences and implementing these measures, healthcare systems can create a more inclusive and culturally competent environment that enhances patient care, promotes job satisfaction, and supports the well-being and integration of healthcare professionals from diverse cultural backgrounds. Quality cultural orientation experiences are linked to reduced turnover intentions and increased job satisfaction [ 38 ].

Orientation programs also contribute to the acculturation and integration of healthcare professionals into the new healthcare system. Effective orientation gives newcomers a clear understanding of their roles, responsibilities, and expectations, helping them feel more confident and competent in their positions [ 39 ]. It also equips professionals with the skills needed for effective communication and cultural understanding, promoting a smoother transition and a sense of belonging in the new environment [ 28 ], and with the knowledge needed to navigate the complexities of the host country's healthcare system, communicate effectively with colleagues and patients, and understand cultural norms and practices [ 40 ].

Further research could focus on policies concerning cultural competence training, diversity and inclusion, support networks, interprofessional education, language support services, and orientation programs in healthcare systems. For example, some countries, such as the United States and Australia, have implemented policies on cultural competence training as part of medical education, established diversity and inclusion policies to promote equality and respect, created support networks for healthcare workers, and offered interprofessional education programs to enhance teamwork [ 15 , 28 ]. In addition, language support services and effective orientation programs have been introduced to help overcome language barriers and facilitate the integration of foreign-trained healthcare workers [ 29 ]. However, further research is needed to understand the impact of these policies on creating a more inclusive and supportive healthcare environment and on improving patient care, job satisfaction, and the integration of healthcare professionals from diverse cultural backgrounds.

Retention and turnover intentions are influenced by a complex interplay of factors. Positive experiences, such as effective orientation, supportive team climates, and ample career opportunities, are associated with reduced turnover intentions [ 23 , 30 ], while barriers and dissatisfaction contribute to higher intentions to leave or migrate. Personal demographics, nationality, career motivations, and the quality of professional experiences intersect to shape migration intentions. The reviewed studies add to existing literature and highlight the importance of addressing discrimination, providing support, and creating inclusive work environments to optimize the integration and well-being of migrant healthcare professionals [ 23 , 33 ]. Enhanced preparation, orientation programs, and communication skills training emerge as valuable strategies to facilitate successful transitions and mitigate challenges.

Limitations

While this scoping review provides a comprehensive exploration of the challenges and barriers faced by MHCWs in unfamiliar healthcare environments, certain limitations should be considered when interpreting its findings and implications. The diverse range of MHCWs, including doctors and nurses from various countries of origin and with different levels of experience, introduces sample heterogeneity. Some countries of origin, such as Saudi Arabia and Estonia, did not fit neatly into the LMIC classification. However, this diversity is also a strength, enriching insights and offering a holistic understanding of the phenomenon. Similarly, while the predominantly cross-sectional design precludes establishing causal relationships, the consistent use of cross-sectional surveys across the studies supports methodological rigor and yields valuable statistical insights.

In addition, the reliance on self-reported data within the studies raises concerns about potential bias. The review seeks to mitigate this by considering gender and ethnic differences in the analysis, which offers insight into the subtle ways in which MHCWs' experiences may vary with these factors, enriching the conclusions and enhancing their applicability across diverse contexts. Furthermore, variability in response rates across studies, which could introduce non-response bias, is balanced by the review's broad scope, which captures a wide range of perspectives from different healthcare settings, professional groups, and countries of origin.

One limitation of the scoping review conducted for this research pertains to the language barrier encountered during the literature search process. The review aimed to comprehensively explore existing studies on the experiences of MHCWs, encompassing a broad range of sources to ensure inclusivity. However, the search was primarily conducted in English, which may have inadvertently excluded relevant studies published in other languages. As a result, there is a possibility that valuable insights and perspectives from non-English language sources were not captured in the review. This limitation could potentially introduce bias into the findings, as it may overlook important research conducted in languages other than English. In addition, the reliance on English-language publications may limit the generalizability of the findings, particularly in contexts where English is not widely used or where research is predominantly published in other languages. Therefore, it is important to acknowledge the language limitation as a potential constraint in the scope and comprehensiveness of the scoping review findings. Future research endeavors may benefit from employing multilingual search strategies to mitigate this limitation and ensure a more comprehensive and inclusive exploration of the topic.

In summary, while the limitations should be acknowledged, they are counterbalanced by the scoping review's strengths. This review's inclusivity, methodological rigor, and synthesis of findings contribute to its credibility and effectiveness in shedding light on the multifaceted challenges and experiences of MHCWs navigating unfamiliar healthcare environments. Addressing these limitations in future research, through more focused samples, longitudinal designs, and consideration of additional contextual factors, would further refine our understanding of this complex phenomenon.

This paper sheds light on the multifaceted barriers MHCWs face while adjusting to a new country and healthcare system. While many of these barriers, such as language skills, discrimination, and work challenges, have been well documented in the existing literature, our review identified additional nuances and new barriers. For example, intercultural relations and cultural isolation were less frequently highlighted in previous studies but emerged as significant issues in our review; these findings can inform evidence-based policies to address such challenges. Without effective policies, MHCWs may face significant challenges that could lead to their departure from the host country, exacerbating healthcare worker shortages there. Although our review focuses on MHCWs in host countries, we recognize that their migration may also worsen healthcare shortages in their source countries. It is therefore crucial to implement and evaluate strategies that support the integration and well-being of MHCWs in host countries. It is equally important to address the reasons that lead MHCWs to leave their countries of origin, a topic beyond the scope of our review. Unlike prior reviews that mainly relied on qualitative data, this scoping review leverages quantitative data from cross-sectional surveys, providing a comprehensive, data-driven perspective on MHCWs' integration. It also underscores the importance of the temporal dimension, showing how the duration of stay in a new environment influences MHCWs' adaptation and retention, which adds a new layer of insight compared with prior research. The findings reveal a complex interplay between language barriers, cultural differences, discrimination, employment barriers, work environment, and personal well-being.
The findings underscore the significance of cultural competence training and support programs for enhancing the integration and job satisfaction of MHCWs. The time spent in the new healthcare setting emerges as a crucial factor in shaping intentions to stay or leave. Retention and turnover intentions among migrant healthcare professionals are influenced by a complex interplay of factors, with positive experiences and support reducing turnover intentions. Addressing discrimination, promoting inclusive work environments, and enhancing preparation programs are crucial for their successful integration and well-being. Further research should explore the impact of policies on cultural competence, diversity, support networks, interprofessional education, language services, and orientation programs. These measures, implemented in some countries, aim to create inclusive healthcare environments. Despite the limitations of this scoping review (sample heterogeneity, variability in response rates, and reliance on self-reported data), this study contributes valuable insights and emphasizes the ongoing need for comprehensive strategies to facilitate the successful integration of MHCWs in diverse contexts. Ultimately, addressing these dynamics can lead to improved healthcare systems and better well-being for healthcare providers and patients alike.

Availability of data and materials

Not applicable.

Abbreviations

  • MHCW: Migrant healthcare worker
  • IMG: International medical graduate
  • FEN: Foreign-educated nurse
  • HIC: High-income countries
  • LMIC: Low- and middle-income countries

References

1. Boniol M, Kunjumen T, Nair TS, Siyam A, Campbell J, Diallo K. The global health workforce stock and distribution in 2020 and 2030: a threat to equity and “universal” health coverage? BMJ Glob Health. 2022;7(6): e009316.

2. Norcini JJ, van Zanten M, Boulet JR. The contribution of international medical graduates to diversity in the US physician workforce: graduate medical education. J Health Care Poor Underserved. 2008;19(2):493–9.

3. Li H, Nie W, Li J. The benefits and caveats of international nurse migration. Int J Nurs Sci. 2014;1(3):314–7.

4. Anderson K. Nursing workforce crisis looms as expected six-million shortfall will be increased by more than four million nurses retiring by 2030. International Council of Nurses. 2020.

5. Siyam A, Dal Poz MR, Organization WHO. Migration of health workers: WHO code of practice and the global economic crisis. Geneva: World Health Organization; 2014.

6. Karas M, Sheen NJ, North RV, Ryan B, Bullock A. Continuing professional development requirements for UK health professionals: a scoping review. BMJ Open. 2020;10(3): e032781.

7. Health workforce migration (Edition 2018) [Internet]. 2018 [cited August 2023]. https://www.oecd-ilibrary.org/content/data/26513358-en.

8. Motala MI, Van Wyk JM. Experiences of foreign medical graduates (FMGs), international medical graduates (IMGs) and overseas trained graduates (OTGs) on entering developing or middle-income countries like South Africa: a scoping review. Hum Resour Health. 2019;17(1):7.

9. Serour GI. Healthcare workers and the brain drain. Int J Gynecol Obstet. 2009;106(2):175–8.

10. Claudia Leone NT. Lived Experience of Migrant Health Workers. 2018; 2nd Review of Relevance and Effectiveness of the WHO Global Code of Practice on the International Recruitment of Health Personnel.

11. Cometto G, Boniol M, Mahat A, Diallo K, Campbell J. Understanding the WHO health workforce support and safeguards list 2023. Bull World Health Organ. 2023;101(6):362.

12. Beller EM, Glasziou PP, Altman DG, Hopewell S, Bastian H, Chalmers I, et al. PRISMA for abstracts: reporting systematic reviews in journal and conference abstracts. PLoS Med. 2013;10(4): e1001419.

13. Downes MJ, Brennan ML, Williams HC, Dean RS. Development of a critical appraisal tool to assess the quality of cross-sectional studies (AXIS). BMJ Open. 2016;6(12): e011458.

14. Brugha R, McAleese S, Dicker P, Tyrrell E, Thomas S, Normand C, Humphries N. Passing through-reasons why migrant doctors in Ireland plan to stay, return home or migrate onwards to new destination countries. Hum Resour Health. 2016;14(Suppl 1):35.

15. Adebayo B, Nichols P, Albrecht MA, Brijnath B, Heslop K. Investigating the impacts of acculturation stress on migrant care workers in australian residential aged care facilities. J Transcult Nurs. 2021;32(4):389–98.

16. Miller MJ, Kim J, Benet-Martínez V. Validating the riverside acculturation stress inventory with Asian Americans. Psychol Assess. 2011;23(2):300.

17. Symes HA, Boulet J, Yaghmour NA, Wallowicz T, McKinley DW. International medical graduate resident wellness: examining qualitative data from J-1 visa physician recipients. Acad Med. 2022;97(3):420–5.

18. Timilsina Bhandari KK, Xiao LD, Belan I. Job satisfaction of overseas-qualified nurses working in Australian hospitals. Int Nurs Rev. 2015;62(1):64–74.

19. Itzhaki M, Ea E, Ehrenfeld M, Fitzpatrick JJ. Job satisfaction among immigrant nurses in Israel and the United States of America. Int Nurs Rev. 2013;60(1):122–8.

20. Zulla R, Baerlocher MO, Verma S. International medical graduates (IMGs) needs assessment study: comparison between current IMG trainees and program directors. BMC Med Educ. 2008;8:42.

21. Hall P, Keely E, Dojeiji S, Byszewski A, Marks M. Communication skills, cultural challenges and individual support: challenges of international medical graduates in a Canadian healthcare environment. Med Teach. 2004;26(2):120–5.

22. Sockalingam S, Hawa R, Al-Battran M, Abbey SE, Zaretsky A. Preparing international medical graduates for psychiatry residency: a multi-site needs assessment. Acad Psychiatry. 2012;36(4):277–81.

23. Tuttas CA. Perceived racial and ethnic prejudice and discrimination experiences of minority migrant nurses: a literature review. J Transcult Nurs. 2015;26(5):514–20.

24. Sturesson L, Öhlander M, Nilsson GH, Palmgren PJ, Stenfors T. Migrant physicians’ entrance and advancement in the Swedish medical labour market: a cross-sectional study. Hum Resour Health. 2019;17(1):71.

25. Heponiemi T, Hietapakka L, Kaihlanen A, Aalto AM. The turnover intentions and intentions to leave the country of foreign-born physicians in Finland: a cross-sectional questionnaire study. BMC Health Serv Res. 2019;19(1):624.

26. Kuusio H, Heponiemi T, Vänskä J, Aalto AM, Ruskoaho J, Elovainio M. Psychosocial stress factors and intention to leave job: differences between foreign-born and Finnish-born general practitioners. Scand J Public Health. 2013;41(4):405–11.

27. Mannes MM, Thornley DJ, Wilkinson TJ. The consequences of cultural difference: the international medical graduate journey in New Zealand. Int J Med Educ. 2023;14:43–54.

28. Geun HG, Redman RW, McCullagh MC. Predictors of turnover among Asian foreign-educated nurses in their 1st year of US employment. J Nurs Adm. 2018;48(10):519–25.

29. McGrath P, Henderson D, Holewa H. Language issues: an important professional practice dimension for Australian International medical graduates. Commun Med. 2013;10(3):191–200.

30. Ho KH, Chiang VC. A meta-ethnography of the acculturation and socialization experiences of migrant care workers. J Adv Nurs. 2015;71(2):237–54.

31. Liou SR, Cheng CY. Experiences of a Taiwanese nurse in the United States. Nurs Forum. 2011;46(2):102–9.

32. McGrath PD, Henderson D, Tamargo J, Holewa HA. ‘All these allied health professionals and you’re not really sure when you use them’: insights from Australian international medical graduates on working with allied health. Aust Health Rev. 2011;35(4):418–23.

33. Foulex A, Robino M, Grira M. New medical demography: challenges for international medical graduates. Rev Med Suisse. 2018;14(620):1710–3.

34. Palese A, Barba M, Borghi G, Mesaglio M, Brusaferro S. Competence of Romanian nurses after their first six months in Italy: a descriptive study. J Clin Nurs. 2007;16(12):2260–71.

35. Cummins T. Migrant nurses’ perceptions and attitudes of integration into the perioperative setting. J Adv Nurs. 2009;65(8):1611–6.

36. Smith JB, Herinek D, Woodward-Kron R, Ewers M. Nurse migration in Australia, Germany, and the UK: a rapid evidence assessment of empirical research involving migrant nurses. Policy Polit Nurs Pract. 2022;23(3):175–94.

37. Jalal M, Bardhan KD, Sanders D, Illing J. International: overseas doctors of the NHS: migration, transition, challenges and towards resolution. Future Healthc J. 2019;6(1):76–81.

38. Schumann M, Sepke M, Peters H. Doctors on the move 2: a qualitative study on the social integration of middle eastern physicians following their migration to Germany. Global Health. 2022;18(1):78.

39. Wolcott K, Llamado S, Mace D. Integration of internationally educated nurses into the U.S. workforce. J Nurses Prof Dev. 2013;29(5):263–8.

40. Sockalingam S, Khan A, Tan A, Hawa R, Abbey S, Jackson T, et al. A framework for understanding international medical graduate challenges during transition into fellowship programs. Teach Learn Med. 2014;26(4):401–8.

Acknowledgements

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and affiliations.

Evidence-Based Public Health Unit (ZIG2), Center for International Health Protection, Robert Koch Institute, Nordufer 20, 13353, Berlin, Germany

Asem Al-Btoush & Charbel El-Bcheraoui

Charité Center for Global Health, Institute of International Health, Charité–Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Augustenburger Platz 1, 13353, Berlin, Germany

Asem Al-Btoush

Correspondence to Charbel El-Bcheraoui.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Appendix S1. Data extraction sheet.

Additional file 2: Appendix S2. Quality assessment sheet.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Al-Btoush, A., El-Bcheraoui, C. Challenges affecting migrant healthcare workers while adjusting to new healthcare environments: a scoping review. Hum Resour Health 22 , 56 (2024). https://doi.org/10.1186/s12960-024-00941-w


Received : 30 December 2023

Accepted : 05 August 2024

Published : 13 August 2024

DOI : https://doi.org/10.1186/s12960-024-00941-w


  • Integration
  • New healthcare environment
  • Foreign healthcare worker

Human Resources for Health

ISSN: 1478-4491



SYSTEMATIC REVIEW article

Evaluating the global prevalence of insomnia during pregnancy through standardized questionnaires and diagnostic criteria: a systematic review and meta-analysis.

Chengcheng Yang

  • 1 Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, China
  • 2 Nanjing University of Chinese Medicine, Nanjing, China

Introduction: Insomnia during pregnancy presents significant medical care challenges and heightens the risk of adverse outcomes for both pregnant women and fetuses. This study undertook a meta-analysis to assess the global prevalence of insomnia during pregnancy, examining both the overall prevalence and regional variations.

Method: The aim of this study was to perform a meta-analysis of articles indexed in PubMed, Embase, and Web of Science from the inception of these databases up to February 29, 2024. The study systematically reviewed the global prevalence of gestational insomnia and explored potential moderating factors, encompassing research type, publication date, regional influences, maternal age, pregnancy status, depressive symptoms, and anxiety symptoms.

Result: Forty-four studies, encompassing a total of 47,399,513 participants, were included in the analysis. The overall prevalence of insomnia symptoms during pregnancy was 43.9%. Region and depression emerged as the main factors associated with insomnia: prevalence was highest in Europe (53.6%), followed by South America (50.6%), North America (41.0%), and Asia (40.7%). Studies with high rates of depression showed a higher prevalence of insomnia (56.2%) than studies with low rates of depression (39.8%). Study type and publication date had no significant effect on the prevalence of insomnia symptoms.

Conclusion: The meta-analysis indicated a high prevalence of insomnia symptoms during pregnancy, especially among pregnant women with high levels of depression or living in Europe.

Systematic review registration: PROSPERO, identifier CRD42018104460.

● Reassess the global prevalence of insomnia during pregnancy.

● Regional differences in the prevalence of insomnia were discussed.

● The possible reasons for the difference in prevalence were analyzed.

Introduction

Insomnia is a common clinical disorder characterized by difficulty falling asleep or maintaining sleep, often accompanied by symptoms such as irritability or fatigue during wakefulness. It occurs at least three times a week, persists for at least three months, and cannot be attributed to other diseases or substances ( 1 ).

Research indicates that the prevalence of insomnia in adults typically ranges from 6% to 10%, whereas among pregnant women it is notably higher at 38.2% ( 2 , 3 ). Insomnia during pregnancy can be attributed to a variety of factors, such as physical discomfort, hormonal fluctuations, fetal growth ( 4 ), and emotional distress. Research has shown that insomnia during pregnancy ( 5 ) not only reduces quality of life but is also a potential cause of premature delivery, cesarean section, prolonged labor, pregnancy-induced hypertension, gestational diabetes, and postpartum depression. Lu et al. conducted a systematic review and meta-analysis of sleep disorders and their association with adverse maternal and infant outcomes. The results revealed that sleep disorders, including insomnia, were associated with an increased risk of adverse pregnancy outcomes such as preeclampsia (OR=2.80, 95% CI 2.38-3.30), hypertension during pregnancy (OR=1.74, 95% CI 1.54-1.97), diabetes during pregnancy (OR=1.59, 95% CI 1.45-1.76), cesarean section (OR=1.47, 95% CI 1.31-1.64), and premature delivery (OR=1.38, 95% CI 1.26-1.51) ( 6 ). Additionally, another study demonstrated that insomnia was associated with an increased risk of perinatal suicide (OR=4.76, 95% CI 1.83-12.34) ( 7 ). Insomnia therefore poses a significant threat to maternal and fetal health throughout pregnancy.
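The odds ratios above are reported with 95% confidence intervals. When such values are compared or pooled, the usual first step is to recover the log odds ratio and its standard error from the interval. The following is a minimal sketch, assuming each interval was computed as exp(log OR ± 1.96 × SE); `log_or_se` is a hypothetical helper name, not something used by the cited studies.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def log_or_se(odds_ratio: float, lower: float, upper: float) -> tuple[float, float, float]:
    """Return (log OR, standard error, z statistic) recovered from a 95% CI.

    Assumes the interval is symmetric on the log scale, i.e. it was computed
    as exp(log OR +/- 1.96 * SE).
    """
    log_or = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return log_or, se, log_or / se


# Two of the odds ratios quoted in the text.
reported = {
    "preeclampsia": (2.80, 2.38, 3.30),
    "perinatal suicide": (4.76, 1.83, 12.34),
}
for label, (or_value, lo, hi) in reported.items():
    log_or, se, z = log_or_se(or_value, lo, hi)
    print(f"{label}: log OR = {log_or:.3f}, SE = {se:.3f}, z = {z:.1f}")
```

A wide interval, as for perinatal suicide, translates into a larger standard error and therefore a smaller weight in any inverse-variance pooling.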

In a 2020 review of the prevalence of insomnia during pregnancy, the authors conducted subgroup analyses by maternal age, gestational age, depressive symptoms, and gestational period ( 8 ). The review found that maternal age, gestational age, and anxiety significantly affected the prevalence of insomnia. Because of the limited number of included articles, the authors did not compare prevalence across regions. Insomnia during pregnancy is a sleep disorder influenced by many confounding factors, so regional studies are important for protecting women's health. In China, a cross-sectional study of Chinese pregnant women found that 24.3% had insomnia and that maternal age, education level, occupation, monthly household income, insurance coverage, relationship with the mother-in-law, gestational age, and anxiety symptoms were independent risk factors for insomnia ( 9 ). A Canadian study found that good social support and partnership can reduce the incidence of prenatal depression and thus the risk of insomnia during pregnancy ( 10 ). Similarly, several studies of insomnia in pregnancy in the United States have found that the degree of social support is related to differences in the risk of insomnia ( 11 , 12 ). Moreover, the prevalence of insomnia during pregnancy may differ considerably between countries and regions because of diet, social pressures, cultural background, and national policies on women.

Based on this, we conducted a comprehensive database search and compared the prevalence of insomnia during pregnancy across countries and regions. These results may provide reference evidence for formulating prevention and health policies in high-prevalence areas, thereby increasing social attention to women's health.

A systematic review and meta-analysis of articles related to insomnia symptoms during pregnancy was conducted and reported in accordance with the PRISMA 2020 statement ( 13 ). This review was registered in the PROSPERO database (registration number: CRD42018104460). The PRISMA checklist can be found in Supplementary Material 1 . The PICOS framework was used to formulate the research questions ( 14 ).

Search strategy

A full-text search was conducted in the PubMed, Embase, and Web of Science databases for papers published up to February 29, 2024, including articles that reported insomnia rates during pregnancy through self-report or questionnaire surveys. The search strategy was based on terms such as ‘pregnancy’ and ‘insomnia’. The specific search strategies are provided in Supplementary Material 2 .
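The search combined a pregnancy concept with an insomnia concept. The sketch below shows how such a Boolean query is commonly assembled: synonyms within a concept are joined with OR, and the concepts are joined with AND. The synonym lists and the `build_query` helper are hypothetical placeholders; the authors' actual strategies are in Supplementary Material 2, and database-specific field tags (such as MeSH or Emtree) are not modeled.

```python
def build_query(concepts: list[list[str]]) -> str:
    """OR the synonyms inside each concept, then AND the concepts together.

    Multi-word terms are wrapped in double quotes so they are searched as phrases.
    """
    blocks = []
    for terms in concepts:
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)


# Hypothetical synonym lists for the two concepts named in the text.
pregnancy_terms = ["pregnancy", "pregnant women", "gestation"]
insomnia_terms = ["insomnia", "sleep initiation and maintenance disorders"]

query = build_query([pregnancy_terms, insomnia_terms])
print(query)
```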

The titles and abstracts of the retrieved articles were reviewed. Each article was categorized as ‘yes’, ‘no’, or ‘possible’, and articles marked ‘no’ were excluded. The full texts of articles marked ‘yes’ or ‘possible’ were then reviewed to determine whether they met the inclusion criteria. Figure 1 presents a flowchart of the study selection process.
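The title/abstract triage rule described above (drop ‘no’, send ‘yes’ and ‘possible’ to full-text review) can be sketched as follows. The record identifiers and the `triage` helper are hypothetical and only illustrate the rule.

```python
from enum import Enum


class Label(Enum):
    YES = "yes"
    NO = "no"
    POSSIBLE = "possible"


def triage(screened: dict[str, Label]) -> tuple[list[str], list[str]]:
    """Split screening labels into full-text candidates and exclusions.

    Records labelled YES or POSSIBLE proceed to full-text review;
    records labelled NO are excluded at this stage.
    """
    full_text = [rid for rid, label in screened.items() if label is not Label.NO]
    excluded = [rid for rid, label in screened.items() if label is Label.NO]
    return full_text, excluded


# Hypothetical screening results.
labels = {"rec-001": Label.YES, "rec-002": Label.NO, "rec-003": Label.POSSIBLE}
to_review, dropped = triage(labels)
print(to_review, dropped)
```

Treating ‘possible’ like ‘yes’ keeps the screen deliberately sensitive: uncertain records are resolved at full-text review rather than discarded early.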


Figure 1. Flow chart.

Research selection

Studies were selected according to the PICOS framework: the population (P) was pregnant women; there was no intervention (I); the comparison (C) was between women with and without sleep problems; the outcome (O) was insomnia; and eligible study designs (S) were randomized controlled trials, cohort studies, cross-sectional studies, and case-control studies. Included studies reported the prevalence of insomnia among pregnant women or the number of individuals with insomnia, obtained through self-report, questionnaire surveys, or the measurement of epidemiological data.

Exclusion criteria: studies whose sample consisted solely of pregnant women with sleep disorders; studies that used non-standardized measurement methods, such as evaluating insomnia with a single question; and case reports, systematic reviews, or studies that used data from previous studies.
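The inclusion and exclusion rules above can be read as a single predicate applied to each candidate study. A minimal sketch, assuming a hypothetical `Study` record whose field names are illustrative rather than taken from the authors' extraction sheet:

```python
from dataclasses import dataclass

# Study designs accepted under the "S" element of PICOS.
ELIGIBLE_DESIGNS = {"randomized controlled trial", "cohort", "cross-sectional", "case-control"}


@dataclass
class Study:
    design: str                        # e.g. "cohort", "case report", "systematic review"
    reports_prevalence_or_cases: bool  # prevalence or number of women with insomnia
    only_sleep_disorder_sample: bool   # sample limited to women with sleep disorders
    standardized_measure: bool         # False for e.g. a single screening question
    reuses_previous_data: bool         # secondary analysis of earlier studies


def is_eligible(study: Study) -> bool:
    """Apply the inclusion criteria, then the exclusion criteria."""
    # Case reports and systematic reviews fail here because their
    # design is not in ELIGIBLE_DESIGNS.
    included = study.design in ELIGIBLE_DESIGNS and study.reports_prevalence_or_cases
    excluded = (
        study.only_sleep_disorder_sample
        or not study.standardized_measure
        or study.reuses_previous_data
    )
    return included and not excluded


# Hypothetical candidates.
cohort = Study("cohort", True, False, True, False)
single_question = Study("cross-sectional", True, False, False, False)
print(is_eligible(cohort), is_eligible(single_question))
```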

The first and second authors (FR and WH) independently screened these studies and reviewed the full texts against the inclusion and exclusion criteria to exclude ineligible studies. Any disagreements were resolved by the senior author (ZS).
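The dual-review procedure above, in which agreements stand and disagreements go to a senior reviewer, can be sketched as a small merge step. This is an illustration only: the study IDs are hypothetical, and the senior reviewer's judgement is stubbed with a function argument.

```python
from typing import Callable


def reconcile(first: dict[str, bool], second: dict[str, bool],
              senior: Callable[[str], bool]) -> tuple[dict[str, bool], list[str]]:
    """Merge two reviewers' include (True) / exclude (False) decisions.

    Agreements are kept as they are; each disagreement is referred to the
    senior reviewer. Assumes both reviewers assessed the same study IDs.
    """
    final: dict[str, bool] = {}
    disputed: list[str] = []
    for study_id, decision_a in first.items():
        decision_b = second[study_id]
        if decision_a == decision_b:
            final[study_id] = decision_a
        else:
            disputed.append(study_id)
            final[study_id] = senior(study_id)
    return final, disputed


# Hypothetical decisions; the senior reviewer is stubbed to include disputed studies.
first = {"S1": True, "S2": False, "S3": False}
second = {"S1": True, "S2": False, "S3": True}
final, disputed = reconcile(first, second, senior=lambda sid: True)
print(final, disputed)
```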

Data extraction

The first (FR) and second (WH) authors independently extracted the data, and the third author (ZS) verified the accuracy of the extracted data. The selected studies were recorded in standardized tables describing the key variables of each study. The prevalence of insomnia was estimated by extracting the number of cases, the total sample size, or the percentage of the sample identified as having insomnia symptoms, together with the study type, region, maternal age (years), gestational age (weeks), and percentage of participants with anxiety or depression (see Table 1 ).
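Each study therefore contributes either a case count or a percentage, together with its sample size. A minimal sketch of turning those extracted fields into a proportion and its standard error, assuming a simple binomial (Wald) standard error; the paper does not state which variance formula or transformation was used in Stata, and `prevalence` is a hypothetical helper.

```python
import math
from typing import Optional


def prevalence(n_total: int, n_cases: Optional[int] = None,
               percent: Optional[float] = None) -> tuple[float, float]:
    """Return (proportion, standard error) from a case count or a percentage."""
    if n_total <= 0:
        raise ValueError("sample size must be positive")
    if n_cases is None:
        if percent is None:
            raise ValueError("need either n_cases or percent")
        # Back-calculate the number of cases from a reported percentage.
        n_cases = round(percent / 100 * n_total)
    p = n_cases / n_total
    se = math.sqrt(p * (1 - p) / n_total)  # binomial (Wald) standard error
    return p, se


# Hypothetical extracted values: the same study reported two ways.
p_count, se_count = prevalence(1000, n_cases=243)
p_pct, se_pct = prevalence(1000, percent=24.3)
print(p_count, round(se_count, 4), p_pct)
```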


Table 1. Basic characteristics of the included studies.

The meta-analysis was conducted using Stata 16.0. A random-effects model was chosen when heterogeneity was high, typically defined as I² > 50%. The I² index was used to assess heterogeneity between point estimates; it gives the proportion of the total variation between point estimates that is attributable to heterogeneity rather than chance. Conventionally, I² values below 25% suggest low heterogeneity, values between 25% and 50% suggest moderate heterogeneity, and values above 50% suggest high heterogeneity. Subgroup analyses were then used to investigate the sources of heterogeneity.
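The quantities named above (Cochran's Q, I², and a random-effects pooled estimate) can be computed in a few lines. This is a minimal sketch, not the authors' Stata procedure. It assumes the DerSimonian-Laird estimator for the between-study variance τ², which the paper does not specify. It also pools proportions on the raw scale, a simplification, since prevalence meta-analyses often apply a logit or Freeman-Tukey transformation first. The input values are hypothetical.

```python
import math


def dersimonian_laird(estimates, variances, z=1.96):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator.

    Returns the pooled estimate, its 95% CI, Cochran's Q, I^2 (%) and tau^2.
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]                    # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0     # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, (pooled - z * se, pooled + z * se), q, i2, tau2


def heterogeneity_level(i2):
    """Bands quoted in the text: <25% low, 25-50% moderate, >50% high."""
    if i2 < 25:
        return "low"
    if i2 <= 50:
        return "moderate"
    return "high"


# Hypothetical study-level prevalences (as proportions) and their variances.
props = [0.20, 0.50, 0.80]
variances = [0.001, 0.001, 0.001]
pooled, ci, q, i2, tau2 = dersimonian_laird(props, variances)
print(f"pooled={pooled:.3f}, I2={i2:.1f}% ({heterogeneity_level(i2)}), tau2={tau2:.3f}")
```

When τ² is large relative to the within-study variances, the random-effects weights become nearly equal. For this reason a random-effects model gives small studies more influence than a fixed-effect model would.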

Quality evaluation

The Newcastle-Ottawa Scale (NOS) was used to evaluate the quality of the included observational studies; scores of 1-3 were classified as low quality, 4-6 as medium quality, and 7-9 as high quality. Randomized controlled trials were evaluated using the Cochrane risk-of-bias tool.
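The NOS banding above maps a star total to a quality category. A small sketch of that mapping, with a tally over hypothetical scores; treating a total of 0 as low quality is an assumption, since the text only names the 1-3 band.

```python
from collections import Counter


def nos_quality(score: int) -> str:
    """Map a Newcastle-Ottawa Scale star total (0-9) to the bands used in the text."""
    if not 0 <= score <= 9:
        raise ValueError("NOS totals range from 0 to 9")
    if score <= 3:  # 1-3 stars; a total of 0 is also treated as low (assumption)
        return "low"
    if score <= 6:
        return "medium"
    return "high"


# Hypothetical NOS totals for six observational studies.
scores = [8, 7, 5, 3, 6, 9]
bands = Counter(nos_quality(s) for s in scores)
print(bands)
```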

Figure 1 illustrates the search and selection process. The initial search yielded 18,074 records, of which 10,312 were screened by title and abstract after duplicates were removed. After full-text review of the remaining 200 studies, 152 were excluded because of unclear sleep outcomes. The meta-analysis included 44 studies ( 9 – 12 , 15 – 54 ) with a total of 47,399,513 participants: 26 cross-sectional studies ( 9 – 11 , 15 – 18 , 21 – 26 , 28 , 30 – 34 , 39 , 41 , 44 , 45 , 47 , 49 , 50 ), 13 cohort studies ( 12 , 19 , 20 , 27 , 29 , 35 , 36 , 40 , 42 , 46 , 51 – 53 ), 4 randomized controlled trials ( 37 , 38 , 43 , 48 ), and 1 case-control study ( 54 ).

Figures 2 and 3 summarize the prevalence of insomnia and present the subgroup forest plots. Across the 44 studies, prevalence estimates ranged from 1% to 77.1%. The overall pooled prevalence was 43.9% (33.5%-54.4%). Multiple subgroup analyses were then conducted.


Figure 2. Pooled overall prevalence of insomnia.


Figure 3. Overall prevalence and subgroup analysis of insomnia.

Subgroup analysis based on region and country

By region, the prevalence of insomnia was lowest in Asia, at 40.7% (34.1%-47.3%) ( 9 , 16 – 18 , 23 , 24 , 29 – 34 , 36 , 39 , 45 , 54 ). Pregnant women in Europe had a high insomnia prevalence of 53.6% (45.6%-61.6%) ( 10 , 15 , 20 – 22 , 26 , 37 , 40 , 41 , 51 – 53 ). The prevalence in North and South America was 41.0% and 50.6%, respectively ( 11 , 12 , 19 , 25 , 27 , 28 , 35 , 38 , 42 – 44 , 46 – 50 ). At the country level, Spain had the highest prevalence of insomnia, at 67.8% (52.6%-82.9%, I² = 92.9%) ( 37 , 52 ), and China had the lowest, at 35.4% (21.3%-49.6%, I² = 99.2%) ( 9 , 16 , 32 , 34 , 39 , 54 ).
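The text does not state which test was used to compare subgroups. A common approach, assumed here, is Cochran's Q across the subgroup pooled estimates, weighted by the inverse of their squared standard errors and referred to a chi-square distribution with (number of subgroups - 1) degrees of freedom. The sketch below uses only the standard library, so it includes a closed-form chi-square tail for integer degrees of freedom. The subgroup estimates are hypothetical and are not the regional values reported above.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2
    if df % 2 == 0:
        # Even df: finite Poisson-type series.
        return math.exp(-h) * sum(h ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: complementary error function plus a series in half-integer powers.
    tail = math.erfc(math.sqrt(h))
    tail += math.exp(-h) * sum(h ** (j - 0.5) / math.gamma(j + 0.5)
                               for j in range(1, (df - 1) // 2 + 1))
    return tail


def between_subgroup_q(estimates, std_errors):
    """Cochran's Q for differences among subgroup pooled estimates."""
    w = [1 / se ** 2 for se in std_errors]
    overall = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - overall) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    return q, df, chi2_sf(q, df)


# Hypothetical pooled prevalences and standard errors for two subgroups.
q_b, df_b, p_b = between_subgroup_q([0.40, 0.55], [0.03, 0.04])
print(f"Q_between = {q_b:.2f}, df = {df_b}, p = {p_b:.4f}")
```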

Subgroup analysis based on study design

By study design, the prevalence of insomnia was lowest in randomized controlled trials, at 37.7% (12%-63.5%) ( 37 , 38 , 43 , 48 ). Prevalence was similar in cross-sectional studies ( 9 – 11 , 15 – 18 , 21 – 26 , 28 , 30 – 34 , 39 , 41 , 44 , 45 , 47 , 49 , 50 ) and cohort studies ( 12 , 19 , 20 , 27 , 29 , 35 , 36 , 40 , 42 , 46 , 51 – 53 ), at 42.8% and 47.4%, respectively. Prevalence did not differ significantly by publication year.

Subgroup analysis based on depressive symptoms

In studies with a high prevalence of depression, the pooled prevalence of gestational insomnia was 56.2% (49.8%-62.6%), compared with 39.8% in studies with a low prevalence of depression. This suggests that depression contributes to elevated rates of insomnia.

The overall quality of the included literature was judged to be high. Among the observational studies assessed in Table 2 , 24 provided moderate-level evidence and 2 provided low-level evidence. Only one of the randomized controlled trials was rated as low quality (Figure 4). The high quality of the included papers supports the reliability of the results.


Table 2. Newcastle-Ottawa Scale (NOS) quality assessment of the included observational studies.


Figure 4. Quality assessment of the 4 included randomized controlled trials.

Sensitivity analysis

In the sensitivity analysis, each study was omitted in turn ( Figure 5 ). The overall estimate remained stable, indicating that no single study substantially affected the pooled prevalence.
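The leave-one-out procedure described above re-pools the data k times, each time omitting one study, and checks how far the pooled estimate moves. A minimal sketch, assuming DerSimonian-Laird random-effects pooling (the paper does not name the estimator) and hypothetical inputs:

```python
def random_effects_pool(estimates, variances):
    """DerSimonian-Laird random-effects pooled estimate (estimator assumed)."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    return sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)


def leave_one_out(estimates, variances):
    """Pooled estimate recomputed with study i omitted, for each i."""
    results = []
    for i in range(len(estimates)):
        rest_y = estimates[:i] + estimates[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        results.append(random_effects_pool(rest_y, rest_v))
    return results


# Hypothetical prevalences (as proportions) with equal variances.
estimates = [0.20, 0.50, 0.80, 0.50]
variances = [0.001] * 4
overall = random_effects_pool(estimates, variances)
loo = leave_one_out(estimates, variances)
max_shift = max(abs(x - overall) for x in loo)
print(f"overall={overall:.3f}, leave-one-out={[round(x, 3) for x in loo]}, max shift={max_shift:.3f}")
```

A result is usually described as stable when the largest shift is small relative to the width of the overall confidence interval.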


Figure 5. Sensitivity analysis.

An in-depth analysis was conducted on the various factors influencing insomnia during pregnancy, indicating that sleep quality can be significantly impacted by pregnancy. As the pregnancy advances, there is a noticeable decline in sleep quality, with late pregnancy being the most disruptive period ( 53 ). Studies have indicated a strong correlation between subjective sleep scores and the severity of depressive symptoms ( 55 ). Moreover, pregnancy represents a physiological state characterized by continuous hormonal, physical, and behavioral changes that may significantly alter both sleep quality and duration ( 56 ).

This study found that the overall prevalence of insomnia during pregnancy was 43.9%, considerably higher than the 6%-10% reported in the general adult population. This highlights insomnia as an important health issue during pregnancy. Subgroup analysis by region showed the highest prevalence in Europe and the lowest in Asia, and further analysis was then conducted at the country level.

Among European countries, Spain, Poland, and Norway showed a higher prevalence of insomnia than the overall pooled estimate in this analysis. This finding is consistent with previous European studies that reported a relatively high prevalence of insomnia in the European population ( 57 , 58 ), and pregnancy may further add to the burden of insomnia in women. A self-report survey of individuals with insomnia in Europe by David O’Regan identified lifestyle factors and high levels of life stress as the primary causes of insomnia ( 59 ). Numerous articles on insomnia have also highlighted the impact of lifestyle factors such as diet, exercise, smoking, and sleep habits on the development of insomnia ( 60 ). Additionally, a study on diet and insomnia reported a positive association between dietary glycemic load and insomnia risk (OR 1.10, 95% CI 1.01-1.20) ( 61 ). As developed countries, Spain, Poland, and Norway may impose greater life burdens and pressures associated with the pursuit of a high quality of life. Meanwhile, the preference for sugary foods among Europeans may contribute to the increased risk of insomnia ( 62 – 64 ).

In the Asian subgroup analysis, Japan, a developed country, showed a higher prevalence of insomnia during pregnancy, as expected. Apart from economic pressures, the degree of societal attention paid to women appears to play an important role in insomnia. Notably, China has specific support measures for pregnant women, including reduced working hours, dietary adjustments, and tailored psychological counseling, which effectively alleviate psychological stress ( 65 , 66 ). Moreover, within the Chinese cultural context, pregnant women receive extensive care from family members, which may contribute to a low insomnia rate ( 65 ). Conversely, the status of women in Japan is comparatively lower, which may hinder access to social support and contribute to the high prevalence of insomnia among pregnant women ( 67 ). These findings offer useful insights for designing women's health initiatives.

North America mainly comprises the United States and Canada. In this analysis, only the United States was considered, and its insomnia prevalence was lower than the overall pooled estimate. Over the past century, the United States has made sustained efforts to safeguard women's rights and status. In addition, as one of the most developed countries in the world, the United States has advanced economic and medical resources. These factors, including a robust support system and favorable economic and medical conditions, may help ensure a safe pregnancy and reduce the risk of insomnia in pregnant women ( 68 – 70 ).

A correlation between insomnia and depression has been observed. In recent years, the detection rate of pregnancy complications in clinical practice has been rising ( 71 ), which has been attributed to changes in the living environment, maternal neuroendocrine function, and abnormal fetal development. Domestic and international studies have reported that the prevalence of depressive symptoms during pregnancy among women with pregnancy complications ranges from 29.4% to 39%, significantly exceeding that among healthy pregnant women ( 72 , 73 ).

Data analysis showed that individuals with high levels of depression were at higher risk of insomnia and that depression was positively correlated with increased sleep latency. One study found differences in the consistency of local brain activity in the supplementary motor area and insula between patients with and without insomnia, and patients with insomnia and severe depression showed differences in the intensity of spontaneous activity in the middle frontal gyrus and paracentral lobule compared with those without insomnia. Previous studies have suggested that the potential neurobiological mechanisms underlying depression and insomnia symptoms may include: (1) abnormalities in monoamine neurotransmitters, especially changes in 5-HT concentration, which are closely related to sleep-wake regulation and depression, as reflected in the shortened REM latency of patients with depression; (2) overexpression of circadian clock genes and stress response genes; and (3) dysfunction of the hypothalamic-pituitary-adrenal (HPA) axis and abnormal cortisol release ( 74 ). Available data indicate a close relationship between insomnia during pregnancy and depression at both the symptom and disease levels. In some women, insomnia symptoms are relatively stable in early pregnancy but increase temporarily in late pregnancy. This is closely associated with the substantial physiological and psychological changes of pregnancy, which can be characterized as a period of heightened biological and situational stress ( 75 ) that may activate and magnify latent vulnerabilities. Hence, many pregnant women may experience worsening insomnia symptoms in the later stages of pregnancy. Moreover, around 70% of individuals with depression experience symptoms of insomnia, and the prevalence of depression among pregnant women with insomnia is 3-4 times higher than among those without insomnia ( 76 ). This bidirectional and cumulative relationship warrants greater clinical attention, as both gestational insomnia and depression are risk factors for adverse pregnancy outcomes.

Insomnia during pregnancy is not inherently harmful, but it can increase the risk of various complications, such as stillbirth, miscarriage, perinatal depression, and other adverse outcomes. It is therefore essential to focus on non-pharmacological methods for preventing and managing insomnia during pregnancy. Psychological and social factors play a significant role in the varying prevalence of insomnia during pregnancy, and psychological factors are often linked to social factors. Pan Chen and Eric S. Kim suggest that enhancing overall well-being can be an effective way to alleviate negative psychological symptoms ( 77 , 78 ). Greater attention to women during pregnancy is therefore crucial for safeguarding their health ( 79 ). Owing to its historical and cultural background, China has placed greater emphasis on women's health than other developed countries. Drawing on China's approach, strategies such as reducing the work intensity of pregnant women, promoting attention from family members and social groups, providing regular psychological counseling, and encouraging appropriate activities such as relaxation training and mindfulness can help alleviate psychological issues during pregnancy and improve sleep quality to some extent ( 80 ).

Limitations

In this meta-analysis, significant heterogeneity was found, which may be attributable to population characteristics, study design, the assessment of insomnia and measurement of outcomes, and the clinical stage of pregnancy. Second, insomnia data came mainly from subjective reports or questionnaire surveys, and differences in diagnosis may have affected the results. Third, because a pooled single-arm prevalence is a descriptive estimate rather than a comparison between groups, tests for publication bias have limited interpretive value. On the other hand, we strictly followed the inclusion and exclusion criteria when manually screening relevant articles, without restrictions on language or year of publication, which minimized the possibility of omitting relevant research; we also conducted stratified analyses by geographical location, publication year, study type, and degree of depression. Compared with smaller studies, our results may therefore have greater reference value and robustness. Finally, because data on insomnia across gestational periods were limited, a subgroup analysis by gestational period could not be conducted. Further studies are needed to verify these findings.

The prevalence of insomnia during pregnancy, reaching as high as 44%, has been displaying an upward trend year by year. Urgent attention must be directed toward women’s health issues. Insomnia during pregnancy not only elevates the risk of adverse pregnancy outcomes for women but also significantly affects fetal development and postpartum well-being. As widely acknowledged, insomnia during pregnancy stems from various complex factors, with regional disparities emerging as a central aspect warranting special attention. We aim to undertake further regional research in the future to enhance clinical evidence for developing regional policies aimed at safeguarding women’s health.

Data availability statement

The original contributions presented in the study are included in the article/ Supplementary Material . Further inquiries can be directed to the corresponding authors.

Author contributions

CY: Writing – review & editing, Writing – original draft. RF: Writing – review & editing, Writing – original draft. HW: Writing – review & editing, Writing – original draft. YJ: Writing – review & editing, Writing – original draft. SZ: Writing – review & editing, Writing – original draft. XJ: Writing – review & editing, Writing – original draft.

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. Administration of Traditional Chinese Medicine, National Famous Traditional Chinese Medicine Expert Inheritance Studio Construction Project.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyt.2024.1427255/full#supplementary-material

1. Louis JM, Koch MA, Reddy UM, Silver RM, Parker CB, Facco FL, et al. Predictors of sleep-disordered breathing in pregnancy. Am J Obstet Gynecol . (2018) 218:521.e1–.e12. doi: 10.1016/j.ajog.2018.01.031


2. Krystal AD, Ashbrook LH, Prather AA. What is insomnia? JAMA . (2021) 326:2444. doi: 10.1001/jama.2021.19283

3. Buysse DJ. Insomnia. JAMA . (2013) 309:706–16. doi: 10.1001/jama.2013.193

4. Chaudhry SK, Susser LC. Considerations in treating insomnia during pregnancy: A literature review. Psychosomatics . (2018) 59:341–8. doi: 10.1016/j.psym.2018.03.009

5. Okun ML, Mancuso RA, Hobel CJ, Schetter CD, Coussons-Read M. Poor sleep quality increases symptoms of depression and anxiety in postpartum women. J Behav Med . (2018) 41:703–10. doi: 10.1007/s10865-018-9950-7

6. Lu Q, Zhang X, Wang Y, Li J, Xu Y, Song X, et al. Sleep disturbances during pregnancy and adverse maternal and fetal outcomes: A systematic review and meta-analysis. Sleep Med Rev . (2021) 58:101436. doi: 10.1016/j.smrv.2021.101436

7. Palagini L, Cipriani E, Miniati M, Bramante A, Gemignani A, Geoffroy PA, et al. Insomnia, poor sleep quality and perinatal suicidal risk: A systematic review and meta-analysis. J Sleep Res . (2024) 33:e14000. doi: 10.1111/jsr.14000

8. Sedov ID, Anderson NJ, Dhillon AK, Tomfohr-Madsen LM. Insomnia symptoms during pregnancy: A meta-analysis. J Sleep Res . (2021) 30:e13207. doi: 10.1111/jsr.13207

9. Yang JP, Lin RJ, Sun K, Gao LL. Incidence and correlates of insomnia and its impact on health-related quality of life among Chinese pregnant women: a cross-sectional study. J Reprod Infant Psychol . (2023) 41:391–402. doi: 10.1080/02646838.2021.2020228

10. Kugbey N, Ayanore M, Doegah P, Chirwa M, Bartels SA, Davison CM, et al. Prevalence and correlates of prenatal depression, anxiety and suicidal behaviours in the Volta region of Ghana. Int J Environ Res Public Health . (2021) 18(11):5857. doi: 10.3390/ijerph18115857

11. Manber R, Steidtmann D, Chambers AS, Ganger W, Horwitz S, Connelly CD. Factors associated with clinically significant insomnia among pregnant low-income Latinas. J Women's Health (2002) . (2013) 22:694–701. doi: 10.1089/jwh.2012.4039


12. Okun ML, O'Brien LM. Concurrent insomnia and habitual snoring are associated with adverse pregnancy outcomes. Sleep Med . (2018) 46:12–9. doi: 10.1016/j.sleep.2018.03.004

13. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ (Clinical Res ed) . (2021) 372:n71. doi: 10.1136/bmj.n71

14. Morgan RL, Whaley P, Thayer KA, Schünemann HJ. Identifying the PECO: A framework for formulating good questions to explore the association of environmental and other exposures with health outcomes. Environ Int . (2018) 121:1027–31. doi: 10.1016/j.envint.2018.07.015

15. Wołyńczyk-Gmaj D, Różańska-Walędziak A, Ziemka S, Ufnal M, Brzezicka A, Gmaj B, et al. Insomnia in pregnancy is associated with depressive symptoms and eating at night. J Clin Sleep Med: JCSM: Off Publ Am Acad Sleep Med . (2017) 13:1171–6. doi: 10.5664/jcsm.6764

16. Wang WJ, Hou CL, Jiang YP, Han FZ, Wang XY, Wang SB, et al. Prevalence and associated risk factors of insomnia among pregnant women in China. Compr Psychiatry . (2020) 98:152168. doi: 10.1016/j.comppsych.2020.152168

17. Umeno S, Kato C, Nagaura Y, Kondo H, Eto H. Characteristics of sleep/wake problems and delivery outcomes among pregnant Japanese women without gestational complications. BMC Pregnancy Childbirth . (2020) 20:179. doi: 10.1186/s12884-020-02868-1

18. Altınayak S, Rüzgar Ş, Koç E. The relationship between sleep problems and sexual dysfunction among pregnant women in Turkey. Sleep Breathing Schlaf Atmung . (2024) 28:459–65. doi: 10.1007/s11325-023-02896-z

19. Román-Gálvez RM, Amezcua-Prieto C, Salcedo-Bellido I, Martínez-Galiano JM, Khan KS, Bueno-Cavanillas A. Factors associated with insomnia in pregnancy: A prospective Cohort Study. Eur J Obstet Gynecol Reprod Biol . (2018) 221:70–5. doi: 10.1016/j.ejogrb.2017.12.007

20. Osnes RS, Eberhard-Gran M, Follestad T, Kallestad H, Morken G, Roaldset JO. Mid-pregnancy insomnia is associated with concurrent and postpartum maternal anxiety and obsessive-compulsive symptoms: A prospective cohort study. J Affect Disord . (2020) 266:319–26. doi: 10.1016/j.jad.2020.01.140

21. Liset R, Grønli J, Henriksen RE, Henriksen TEG, Nilsen RM, Pallesen S. Sleep, evening light exposure and perceived stress in healthy nulliparous women in the third trimester of pregnancy. PloS One . (2021) 16:e0252285. doi: 10.1371/journal.pone.0252285

22. Palagini L, Cipollone G, Masci I, Novi M, Caruso D, Kalmbach DA, et al. Stress-related sleep reactivity is associated with insomnia, psychopathology and suicidality in pregnant women: preliminary results. Sleep Med . (2019) 56:145–50. doi: 10.1016/j.sleep.2019.01.009

23. Nacar G, Taşhan ST. Relationship between sleep characteristics and depressive symptoms in last trimester of pregnancy. Afr Health Sci . (2019) 19:2934–44. doi: 10.4314/ahs.v19i4.14

24. Mourady D, Richa S, Karam R, Papazian T, Hajj Moussa F, El Osta N, et al. Associations between quality of life, physical activity, worry, depression and insomnia: A cross-sectional designed study in healthy pregnant women. PloS One . (2017) 12:e0178181. doi: 10.1371/journal.pone.0178181

25. Mindell JA, Cook RA, Nikolovski J. Sleep patterns and sleep disturbances across pregnancy. Sleep Med . (2015) 16:483–8. doi: 10.1016/j.sleep.2014.12.006

26. Smyka M, Kosińska-Kaczyńska K, Sochacki-Wójcicka N, Zgliczyńska M, Wielgoś M. Sleep problems in pregnancy-A cross-sectional study in over 7000 pregnant women in Poland. Int J Environ Res Public Health . (2020) 17(15):5306. doi: 10.3390/ijerph17155306

27. Cohen MF, Corwin EJ, Johnson DA, Amore AD, Brown AL, Barbee NR, et al. Discrimination is associated with poor sleep quality in pregnant Black American women. Sleep Med . (2022) 100:39–48. doi: 10.1016/j.sleep.2022.07.015

28. Swanson LM, Pickett SM, Flynn H, Armitage R. Relationships among depression, anxiety, and insomnia symptoms in perinatal women seeking mental health treatment. J Women's Health (2002) . (2011) 20:553–8. doi: 10.1089/jwh.2010.2371

29. Ko H, Shin J, Kim MY, Kim YH, Lee J, Kil KC, et al. Sleep disturbances in Korean pregnant and postpartum women. J Psychosom Obstet Gynaecol . (2012) 33:85–90. doi: 10.3109/0167482X.2012.658465

30. Kızılırmak A, Timur S, Kartal B. Insomnia in pregnancy and factors related to insomnia. Sci World J . (2012) 2012:197093. doi: 10.1100/2012/197093

31. Murakami K, Ishikuro M, Obara T, Ueno F, Noda A, Onuma T, et al. Social isolation and insomnia among pregnant women in Japan: The Tohoku Medical Megabank Project Birth and Three-Generation Cohort Study. Sleep Health . (2022) 8:714–20. doi: 10.1016/j.sleh.2022.08.007

32. Wang J, Huang Y, Li Y, Wu L, Cao D, Cao F. Sleep-related attentional bias: Development and validation of a Chinese version of the brief sleep-associated monitoring index in pregnant women. J Psychosom Res . (2022) 163:111052. doi: 10.1016/j.jpsychores.2022.111052

33. Bušková J, Miletínová E, Králová R, Dvořáková T, Tefr Faridová A, Heřman H, et al. Parasomnias in pregnancy. Brain Sci . (2023) 13(2):357. doi: 10.3390/brainsci13020357

34. Wang J, Zhou Y, Qian W, Zhou Y, Han R, Liu Z. Maternal insomnia during the COVID-19 pandemic: associations with depression and anxiety. Soc Psychiatry Psychiatr Epidemiol . (2021) 56:1477–85. doi: 10.1007/s00127-021-02072-2

35. Sedov ID, Tomfohr-Madsen LM. Trajectories of insomnia symptoms and associations with mood and anxiety from early pregnancy to the postpartum. Behav Sleep Med . (2021) 19:395–406. doi: 10.1080/15402002.2020.1771339

36. Peltonen H, Paavonen EJ, Saarenpää-Heikkilä O, Vahlberg T, Paunio T, Polo-Kantola P. Sleep disturbances and depressive and anxiety symptoms during pregnancy: associations with delivery and newborn health. Arch Gynecol Obstet . (2023) 307:715–28. doi: 10.1007/s00404-022-06560-x

37. Amezcua-Prieto C, Naveiro-Fuentes M, Arco-Jiménez N, Olmedo-Requena R, Barrios-Rodríguez R, Vico-Zúñiga I, et al. Walking in pregnancy and prevention of insomnia in third trimester using pedometers: study protocol of Walking_Preg project (WPP). A randomized Controlled trial. BMC Pregnancy Childbirth . (2020) 20:521. doi: 10.1186/s12884-020-03225-y

38. Benito-Villena R, Guerrero-Martínez I, Naveiro-Fuentes M, Cano-Ibánez N, Femia-Marzo P, Gallo-Vallejo JL, et al. Walking promotion in pregnancy and its effects on insomnia: results of walking_Preg project (WPP) clinical trial. Int J Environ Res Public Health . (2022) 19(16):10012. doi: 10.3390/ijerph191610012

39. Chen X, Liu Y, Liu M, Min F, Tong J, Wei W, et al. Prevalence and associated factors of insomnia symptoms among pregnant women in the third trimester in a moderately developing region of China. BMC Public Health . (2023) 23:2319. doi: 10.1186/s12889-023-17269-0

40. Dolatian M, Mehraban Z, Sadeghniat K. The effect of impaired sleep on preterm labour. West Indian Med J . (2014) 63:62–7. doi: 10.7727/wimj.2012.305

41. Dørheim SK, Bjorvatn B, Eberhard-Gran M. Insomnia and depressive symptoms in late pregnancy: a population-based study. Behav Sleep Med . (2012) 10:152–66. doi: 10.1080/15402002.2012.660588

42. Facco FL, Kramer J, Ho KH, Zee PC, Grobman WA. Sleep disturbances in pregnancy. Obstet Gynecol . (2010) 115:77–83. doi: 10.1097/AOG.0b013e3181c4f8ec

43. Felder JN, Epel ES, Neuhaus J, Krystal AD, Prather AA. Efficacy of digital cognitive behavioral therapy for the treatment of insomnia symptoms among pregnant women: A randomized clinical trial. JAMA Psychiatry . (2020) 77:484–92. doi: 10.1001/jamapsychiatry.2019.4491

44. Felder JN, Hartman AR, Epel ES, Prather AA. Pregnant patient perceptions of provider detection and treatment of insomnia. Behav Sleep Med . (2020) 18:787–96. doi: 10.1080/15402002.2019.1688153

45. Fernández-Alonso AM, Trabalón-Pastor M, Chedraui P, Pérez-López FR. Factors related to insomnia and sleepiness in the late third trimester of pregnancy. Arch Gynecol Obstet . (2012) 286:55–61. doi: 10.1007/s00404-012-2248-z

46. Kalmbach DA, Ahmedani BK, Gelaye B, Cheng P, Drake CL. Nocturnal cognitive hyperarousal, perinatal-focused rumination, and insomnia are associated with suicidal ideation in perinatal women with mild to moderate depression. Sleep Med . (2021) 81:439–42. doi: 10.1016/j.sleep.2021.03.004

47. Kalmbach DA, Cheng P, Ong JC, Ciesla JA, Kingsberg SA, Sangha R, et al. Depression and suicidal ideation in pregnancy: exploring relationships with insomnia, short sleep, and nocturnal rumination. Sleep Med . (2020) 65:62–73. doi: 10.1016/j.sleep.2019.07.010

48. Kalmbach DA, Cheng P, Reffi AN, Seymour GM, Ruprich MK, Bazan LF, et al. Racial disparities in treatment engagement and outcomes in digital cognitive behavioral therapy for insomnia among pregnant women. Sleep Health . (2023) 9:18–25. doi: 10.1016/j.sleh.2022.10.010

49. Kalmbach DA, Roth T, Cheng P, Ong JC, Rosenbaum E, Drake CL. Mindfulness and nocturnal rumination are independently associated with symptoms of insomnia and depression during pregnancy. Sleep Health . (2020) 6:185–91. doi: 10.1016/j.sleh.2019.11.011

50. Kendle AM, Salemi JL, Jackson CL, Buysse DJ, Louis JM. Insomnia during pregnancy and severe maternal morbidity in the United States: nationally representative data from 2006 to 2017. Sleep . (2022) 45(10):zsac175. doi: 10.1093/sleep/zsac175

51. Kiviruusu O, Pietikäinen JT, Kylliäinen A, Pölkki P, Saarenpää-Heikkilä O, Marttunen M, et al. Trajectories of mothers' and fathers' depressive symptoms from pregnancy to 24 months postpartum. J Affect Disord . (2020) 260:629–37. doi: 10.1016/j.jad.2019.09.038

52. Liebana-Presa C, Martínez-Fernández MC, García-Fernández R, Martín-Vázquez C, Fernández-Martínez E, Hidalgo-Lopezosa P. Self perceived health and stress in the pregnancy during the COVID-19 pandemic. Front Psychiatry . (2023) 14:1166882. doi: 10.3389/fpsyt.2023.1166882

53. Osnes RS, Eberhard-Gran M, Follestad T, Kallestad H, Morken G, Roaldset JO. Mid-pregnancy insomnia and its association with perinatal depressive symptoms: A prospective cohort study. Behav Sleep Med . (2021) 19:285–302. doi: 10.1080/15402002.2020.1743705

54. Zhou X, Hong X, Huang K, Ding X, Yu H, Zhao J, et al. Poor sleep quality in early pregnancy increases the risk of developing gestational diabetes mellitus: a propensity score matching analysis. Sleep Breathing Schlaf Atmung . (2023) 27:1557–65. doi: 10.1007/s11325-022-02748-2

55. Bjornsdottir E, Lindberg E, Benediktsdottir B, Gislason T, Garcia Larsen V, Franklin K, et al. Are symptoms of insomnia related to respiratory symptoms? Cross-sectional results from 10 European countries and Australia. BMJ Open . (2020) 10:e032511. doi: 10.1136/bmjopen-2019-032511

56. Schmidhuber J, Traill WB. The changing structure of diets in the European Union in relation to healthy eating guidelines. Public Health Nutr . (2006) 9:584–95. doi: 10.1079/PHN2005844

57. Palagini L, Manni R, Liguori C, De Gennaro L, Gemignani A, Fanfulla F, et al. Evaluation and management of insomnia in the clinical practice in Italy: a 2023 update from the Insomnia Expert Consensus Group. J Neurol . (2024) 271:1668–79. doi: 10.1007/s00415-023-12112-3

58. Bjornsdottir E, Thorarinsdottir EH, Lindberg E, Benediktsdottir B, Franklin K, Jarvis D, et al. Association between physical activity over a 10-year period and current insomnia symptoms, sleep duration and daytime sleepiness: a European population-based study. BMJ Open . (2024) 14:e067197. doi: 10.1136/bmjopen-2022-067197

59. O'Regan D, Garcia-Borreguero D, Gloggner F, Wild I, Leontiou C, Ferini-Strambi L. Mapping the insomnia patient journey in Europe and Canada. Front Public Health . (2023) 11:1233201. doi: 10.3389/fpubh.2023.1233201

60. Höglund P, Hakelind C, Nordin M, Nordin S. Risk factors for insomnia and burnout: A longitudinal population-based cohort study. Stress Health: J Int Soc Invest Stress . (2023) 39:798–812. doi: 10.1002/smi.3218

61. Arab A, Karimi E, Garaulet M, Scheer F. Dietary patterns and insomnia symptoms: A systematic review and meta-analysis. Sleep Med Rev . (2024) 75:101936. doi: 10.1016/j.smrv.2024.101936

62. Poličnik R, Hristov H, Lavriša Ž, Farkaš J, Smole Možina S, Koroušić Seljak B, et al. Dietary intake of adolescents and alignment with recommendations for healthy and sustainable diets: results of the SI.Menu study. Nutrients . (2024) 16(12):1912. doi: 10.3390/nu16121912

63. Sonestedt E, Lukic M. Beverages - a scoping review for Nordic Nutrition Recommendations 2023. Food Nutr Res . (2024) 68. doi: 10.29219/fnr.v68.10458

64. Yusta-Boyo MJ, González EG, García-Solano M, Rollán Gordo A, Peña-Rey I, Rodríguez-Artalejo F. Reduction of sugar, salt and fat content in foods over the period 2016-2021 in Spain: the National Food Reformulation Plan. Eur J Clin Nutr . (2024) 78:149–54. doi: 10.1038/s41430-023-01357-w

65. Zheng G, Lyu X, Pan L, Chen A. The role conflict-burnout-depression link among Chinese female health care and social service providers: the moderating effect of marriage and motherhood. BMC Public Health . (2022) 22:230. doi: 10.1186/s12889-022-12641-y

66. Yan HWM. Hotspots and Prospects of fertility Support Policy Research in China: A Visual econometric Analysis based on CNKI Database Literature (2013-2023). Jianghan Acad . (2024) 2024:46–57. doi: 10.16388/j.cnki.cn42-1843/c.2024.04.005

67. Sakurai K, Kawakami N, Yamaoka K, Ishikawa H, Hashimoto H. The impact of subjective and objective social status on psychological distress among men and women in Japan. Soc Sci Med (1982) . (2010) 70:1832–9. doi: 10.1016/j.socscimed.2010.01.019

68. Davis KC, Fortino BR, O'Shea NG. Potential consequences of the Dobbs v. Jackson Women's Health Organization decision. Psychol Addictive Behav: J Soc Psychol Addictive Behav . (2024) 38:161–6. doi: 10.1037/adb0000986

69. Ickovics JR, Lewis JB, Cunningham SD, Thomas J, Magriples U. Transforming prenatal care: Multidisciplinary team science improves a broad range of maternal-child outcomes. Am Psychol . (2019) 74:343–55. doi: 10.1037/amp0000435

70. Pirkle JRA. Protecting and advancing women's health and rights post-roe era through policy. Health Educ Behav: Off Publ Soc Public Health Educ . (2023) 50:538–42. doi: 10.1177/10901981231164578

71. Wu D, Chen S, Zhong X, Zhang J, Zhao G, Jiang L. Prevalence and factors associated with antenatal depressive symptoms across trimesters: a study of 110,584 pregnant women covered by a mobile app-based screening programme in Shenzhen, China. BMC Pregnancy Childbirth . (2024) 24:480. doi: 10.1186/s12884-024-06680-z

72. Vargas I, Perlis ML. Insomnia and depression: clinical associations and possible mechanistic links. Curr Opin Psychol . (2020) 34:95–9. doi: 10.1016/j.copsyc.2019.11.004

73. Zhang Jihui LY, Jiyang P. Research progress and existing problems on the relationship between insomnia and depression from 2008 to 2013. Chin J Ment Health . (2015) 2015:29.

74. Grigoriadis S, Graves L, Peer M, Mamisashvili L, Tomlinson G, Vigod SN, et al. Maternal anxiety during pregnancy and the association with adverse perinatal outcomes: systematic review and meta-analysis. J Clin Psychiatry . (2018) 79(5):17r12011. doi: 10.4088/JCP.17r12011

75. Wan EY, Moyer CA, Harlow SD, Fan Z, Jie Y, Yang H. Postpartum depression and traditional postpartum care in China: role of zuoyuezi. Int J Gynaecol Obstet: Off Organ Int Fed Gynaecol Obstet . (2009) 104:209–13. doi: 10.1016/j.ijgo.2008.10.016

76. Dunkel Schetter C, Rahal D, Ponting C, Julian M, Ramos I, Hobel CJ, et al. Anxiety in pregnancy and length of gestation: Findings from the healthy babies before birth study. Health Psychol: Off J Division Health Psychol Am psychol Assoc . (2022) 41:894–903. doi: 10.1037/hea0001210

77. Chen P, Sun HL, Zhang L, Feng Y, Sha S, Su Z, et al. Inter-relationships of depression and insomnia symptoms with life satisfaction in stroke and stroke-free older adults: Findings from the Health and Retirement Study based on network analysis and propensity score matching. J Affect Disord . (2024) 356:568–76. doi: 10.1016/j.jad.2024.04.036

78. Kim ES, Wilkinson R, Case BW, Cowden RG, Okuzono SS, VanderWeele TJ. Connected communities: Perceived neighborhood social cohesion during adolescence and subsequent health and well-being in young adulthood-An outcome-wide longitudinal approach. J Community Psychol . (2024) 52:774–91. doi: 10.1002/jcop.23130

79. Li Q, Kanduma E, Ramiro I, Xu DR, Cuco RMM, Chaquisse E, et al. Spatial access to continuous maternal and perinatal health care services in low-resource settings: cross-sectional study. JMIR Public Health Surveil . (2024) 10:e49367. doi: 10.2196/49367

80. Riedel A, Benz F, Deibert P, Barsch F, Frase L, Johann AF, et al. The effect of physical exercise interventions on insomnia: A systematic review and meta-analysis. Sleep Med Rev . (2024) 76:101948. doi: 10.1016/j.smrv.2024.101948

Keywords: insomnia, depression, geographic location, prevalence during pregnancy, global

Citation: Yang C, Fu R, Wang H, Jiang Y, Zhang S and Ji X (2024) Evaluating the global prevalence of insomnia during pregnancy through standardized questionnaires and diagnostic criteria: a systematic review and meta-analysis. Front. Psychiatry 15:1427255. doi: 10.3389/fpsyt.2024.1427255

Received: 03 May 2024; Accepted: 25 July 2024; Published: 13 August 2024.

Copyright © 2024 Yang, Fu, Wang, Jiang, Zhang and Ji. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Shipeng Zhang; Xiaoli Ji

†These authors share first authorship

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

  • Open access
  • Published: 07 August 2024

Relationship between patient safety culture and patient experience in hospital settings: a scoping review

  • Adel Alabdaly, ORCID: orcid.org/0000-0003-0914-5225
  • Reece Hinchcliff, ORCID: orcid.org/0000-0001-9920-4211
  • Deborah Debono, ORCID: orcid.org/0000-0003-2095-156X
  • Su-Yin Hor, ORCID: orcid.org/0000-0002-6498-9722

BMC Health Services Research, volume 24, Article number 906 (2024)

Measures of patient safety culture and patient experience are both commonly utilised to evaluate the quality of healthcare services, including hospitals, but the relationship between these two domains remains uncertain. In this study, we aimed to explore and synthesise published literature regarding the relationships between these topics in hospital settings.

This study was performed using the five stages of Arksey and O’Malley’s Framework, refined by the Joanna Briggs Institute. Searches were conducted in the CINAHL, Cochrane Library, ProQuest, MEDLINE, PsycINFO, SciELO and Scopus databases. Further online searches were conducted on the websites of pertinent organisations in Australia and globally. Data were extracted against predetermined criteria.

A total of 4512 studies were initially identified, of which 15 met the inclusion criteria. Several positive statistical relationships between patient safety culture and patient experience domains were identified. Communication and teamwork were the most influential factors in the relationship between patient safety culture and patient experience. When both managers and clinicians held positive views of safety, these views were positively related to patient experience; this was not the case when managers alone held such views. Qualitative methods offered further insights into patient safety culture from patients’ and families’ perspectives.

The findings indicate that patients can recognise safety-related issues that the hospital team may miss. However, studies mostly measured staff perspectives on patient safety culture and did not always include patients’ experiences of patient safety culture. In addition, the relationship between patient safety culture and patient experience has generally been identified as a statistical relationship, using quantitative methods. Further research assessing patient safety culture alongside patient experience is essential for providing a more comprehensive picture of safety, and will help to uncover issues and other factors that may indirectly affect patient safety culture and patient experience.

Introduction

Patient safety is a pressing challenge for health systems globally. The importance of promoting and sustaining a robust safety culture is widely recognised [ 1 ], as, increasingly, is the importance of the patient’s role in supporting patient safety [ 2 ]. Despite the prominence of the concepts of patient safety culture and patient experience in academia and industry, the relationship between them remains underexplored and poorly delineated.

The concept of patient safety culture has been defined as the collective beliefs, attitudes, values and norms that influence behaviours concerning patient safety [ 3 ]. Patient perspectives are often neglected when measuring safety culture [ 4 ]. Patient experience has been defined as patients’ perspectives of services, recognising that patients are the most valuable sources of information about their own experiences [ 5 ].

It is essential to put the patient at the centre of healthcare services [ 6 ], and doing so requires nurturing caring cultures in which health professionals feel esteemed, involved and supported [ 7 ]. Patients pay attention to staff performance and other issues, and can identify safety problems that hospital staff may miss, such as problems entering and exiting the healthcare system, cumulative systemic (multiple and distributed) problems, and errors of omission, especially the failure to attend to patients’ concerns [ 2 , 8 , 9 , 10 ]. A cultural shift from the conventional view of patients as recipients of care to seeing them as partners in their care is essential to provide patient-centred care that is informed by patient experience.

Although considerable knowledge about patient safety has been gained, it remains a worldwide challenge in healthcare [ 11 ], with serious incidents and iatrogenic harm continuing to occur across healthcare settings, including hospitals. Efforts to reduce iatrogenic harm have focused on enhancing safety culture in hospitals.

Understanding patient safety from the staff perspective alone is not enough. It is also essential to understand what factors might link safety culture and patient experience, concepts that are often measured separately but are both important indicators of safety and quality. By examining this link, we hope to better understand which facets of care might contribute both to safety culture, as experienced by staff, and to the safety and quality of care, as experienced by patients. The aim of this review is to explore and synthesise existing research literature to establish what is known about the relationship between patient safety culture and patient experience (of safety and quality) in hospital settings. We sought to achieve this aim through the following objectives: (a) to identify how these concepts have been defined or described in the literature; (b) to identify how these concepts are measured; and (c) to identify the links between the concepts.

This study followed a published protocol [ 12 ]. The methodology of this scoping review was developed using Arksey and O’Malley’s 2005 framework for scoping reviews [ 13 ], refined by the Joanna Briggs Institute [ 14 ]. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) [ 15 ] guidelines were followed. The review does not critically appraise the quality or risk of bias of the included papers: the aim of this scoping review is not to evaluate the quality of the evidence found, but rather to explore what research has been done in this field and what approaches were taken.

The processes of searching, applying inclusion and exclusion criteria, screening, data extraction, and reporting of the findings followed a published protocol for this study [ 12 ]. The search terms and strategies appear in the protocol, and searches were completed on 18 June 2022.

The inclusion and exclusion criteria

This review followed the Population, Concept and Context (PCC) framework for the inclusion criteria recommended by the Joanna Briggs Institute for scoping reviews [ 14 ]. In addition to the PCC criteria noted in Table 1, included studies had to have been conducted in a hospital context and reported in English or Arabic.

We searched seven electronic databases relevant to the scope of the study (CINAHL, Cochrane Library, ProQuest, MEDLINE, PsycINFO, SciELO and Scopus); the web search engine Google Scholar (first 30 results); and the websites of four organisations in Australia and globally: the Agency for Healthcare Research and Quality (AHRQ), the Australian Commission on Safety and Quality in Health Care (ACSQHC), the Agency for Clinical Innovation (ACI), and the National Institutes of Health (NIH). We supplemented these searches by hand-searching the reference lists of the final included papers for additional relevant studies.

Study selection

As indicated in the protocol for this study [ 12 ], retrieved papers were screened and selected in two phases. In the first phase, one reviewer (AA) evaluated all titles and abstracts to determine whether each paper met the eligibility criteria, including categorising screened studies as ‘included’, ‘excluded’ or ‘not sure’. All papers screened as ‘included’ and ‘not sure’ in the first phase were considered for full-text review by the reviewer (AA). In the second phase, three reviewers (RH, DD, SH) screened ten per cent of titles and abstracts of studies screened as ‘included’, ‘excluded’ or ‘not sure’ against selection criteria. All authors (AA, RH, DD, SH) independently reviewed the full text of the included studies. The authors discussed the included papers in a meeting and reached a consensus on the included papers, with no disagreement between the authors.
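The review does not report how its ten per cent verification sample was chosen. As a minimal sketch, assuming a per-category simple random sample drawn with a fixed seed (an illustrative choice, not the authors' stated method), the draw could look like this. The function name, the seed value and the record IDs are all hypothetical.

```python
import random


def stratified_verification_sample(decisions, fraction=0.10, seed=2022):
    """Draw `fraction` of records from each first-phase screening category.

    `decisions` maps a record ID to 'included', 'excluded' or 'not sure'.
    Assumption: per-category simple random sampling with a fixed seed is an
    illustrative choice; the review does not state how its sample was drawn.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    by_category = {}
    for record_id, decision in decisions.items():
        by_category.setdefault(decision, []).append(record_id)

    rng = random.Random(seed)  # fixed seed lets the same sample be re-drawn for audit
    sample = {}
    for decision in sorted(by_category):
        ids = sorted(by_category[decision])
        k = max(1, round(len(ids) * fraction))  # at least one record per category
        sample[decision] = sorted(rng.sample(ids, k))
    return sample


# Hypothetical screening decisions for 130 records.
decisions = {i: "excluded" for i in range(1, 101)}
decisions.update({i: "included" for i in range(101, 121)})
decisions.update({i: "not sure" for i in range(121, 131)})
sample = stratified_verification_sample(decisions)
print({category: len(ids) for category, ids in sample.items()})
# {'excluded': 10, 'included': 2, 'not sure': 1}
```

Sampling within each category, rather than across the pooled records, guarantees that the small 'included' and 'not sure' groups are always represented in the second reviewers' check.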

Charting the data

One reviewer (AA) extracted relevant data from the included studies to address the scoping review question using the template provided in the published protocol [ 12 ]. Three reviewers (RH, DD and SH) verified the accuracy of the data extraction exercise. The data extracted included the following:

  • Aims/objective(s).
  • Methodology/methods.
  • Inclusion/exclusion criteria (e.g., PCC).
  • Types of intervention (if applicable).
  • Measurement of outcomes (if applicable).
  • Key results that relate to the review question.

Reporting the findings

Other concepts related to patient safety culture and patient experience, such as safety climate and patient satisfaction, were used in the literature that measured safety culture or patient experience. The nuances of these terms were described in the published protocol. We decided to incorporate findings about safety climate alongside those about patient safety culture, and findings about patient satisfaction alongside those about patient experience. We noticed that the terms ‘patient experience’ and ‘patient satisfaction’ are often used interchangeably. For example, Mazurenko et al. [ 16 ] used the term ‘patient satisfaction’ in the paper title but measured it using the HCAHPS tool, which is a well-known tool for measuring ‘patient experience’. In fact, as operationalised in the instruments, the terms overlap more than they should.

According to Bull [ 17 ], ‘patient satisfaction’ involves an evaluation and hence is subjective, implying that ‘patient experience’ is the more objective measure. However, several questions in the HCAHPS tool (commonly used for measuring ‘patient experience’, as mentioned above) involve an element of subjectivity and evaluation from the patient’s perspective, for instance “During this hospital stay, how often did nurses treat you with courtesy and respect?” and “How often did you get help in getting to the bathroom or in using a bedpan as soon as you wanted?” The point made by Bull [ 17 ] reflects a tension between the recognised importance of finding out what care is like from patients’ perspectives (which is subjective and evaluative), and the desire for objective measurements of care delivery for the purposes of comparing and evaluating health services [ 18 ]. Because these concepts are so intertwined in how they are understood and measured, and because we did not want to limit the understanding of patient experience to objective measures devoid of patients’ subjective judgements, papers on patient satisfaction that met the inclusion criteria were included in the review.

The study sought to review a wide range of literature in relation to the study aim and inclusion criteria. Rather than being a systematic review or meta-analysis, it aims to offer the reader an overview of the research carried out on the relationship between safety culture and patient experience. The characteristics and findings of the included papers were initially analysed by one author (AA), who performed a content analysis using a framework of categories aligned with the research questions. Within these categories, study features and findings were discussed among all the authors (AA, RH, DD, SH) and summarised descriptively. All authors agreed upon the findings and categories. This descriptive content analysis was considered sufficient to address the study objectives; thus, deviating from the published protocol [ 12 ], no further thematic analysis was conducted. The results are presented according to the following categories:

  • Conceptualisations of patient safety culture and patient experience.
  • Measurement of patient safety culture and patient experience.
  • Relationship between patient safety culture and patient experience.

As depicted in Fig.  1 , the initial search yielded 4512 articles. After removing duplicates, 3833 articles remained, and 3793 were excluded at the first stage of screening (title and abstract). Following full-text screening, 15 articles remained that met the inclusion criteria. The included studies were conducted in different countries, including Australia (one study) [ 19 ], Canada (two studies) [ 8 , 20 ], Germany (one study) [ 4 ], Indonesia (one study) [ 21 ], Iran (one study) [ 22 ], Israel (two studies) [ 10 , 23 ], Nigeria (one study) [ 24 ], United Kingdom (one study) [ 2 ] and United States (five studies) [ 16 , 25 , 26 , 27 , 28 ]. A summary of the characteristics of the included studies is presented in Table  2 .
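The counts reported in this paragraph imply several intermediate figures that the text does not state. The sketch below derives them, assuming a single linear flow in which no records were added (for example, by hand-searching) or lost between stages; Fig. 1 remains the authoritative source if it shows otherwise. The helper `prisma_flow` is hypothetical.

```python
def prisma_flow(identified, after_duplicates, excluded_at_screening, included):
    """Derive intermediate counts implied by a simple, linear PRISMA flow.

    Assumes no records enter or leave the flow between the reported stages.
    """
    full_text = after_duplicates - excluded_at_screening
    # Each stage must be no larger than the one before it.
    if not 0 <= included <= full_text <= after_duplicates <= identified:
        raise ValueError("counts are inconsistent with a linear flow")
    return {
        "duplicates_removed": identified - after_duplicates,
        "full_text_assessed": full_text,
        "excluded_at_full_text": full_text - included,
    }


# Figures reported in the text.
flow = prisma_flow(identified=4512, after_duplicates=3833,
                   excluded_at_screening=3793, included=15)
print(flow)
# {'duplicates_removed': 679, 'full_text_assessed': 40, 'excluded_at_full_text': 25}

# Check that the per-country breakdown accounts for every included study.
studies_by_country = {
    "Australia": 1, "Canada": 2, "Germany": 1, "Indonesia": 1, "Iran": 1,
    "Israel": 2, "Nigeria": 1, "United Kingdom": 1, "United States": 5,
}
assert sum(studies_by_country.values()) == 15
```

Under these assumptions, 40 articles reached full-text review and 25 of them were excluded at that stage; the country tally sums to the 15 included studies.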

Fig. 1 PRISMA flowchart of search process and results

Conceptualisations of patient safety culture and patient experience

Patient safety culture.

In the studies reviewed, patient safety culture was commonly conceptualised in terms of the attitudes, beliefs, perceptions, norms and values that workers share about safety [ 8 , 10 , 24 , 27 ]. These shared characteristics shape healthcare professionals’ understanding of what is essential in a healthcare institution, how they should act, what attitudes or actions are acceptable, and which approaches are rewarded or punished concerning patient safety [ 8 , 10 , 27 ]. The included studies identified patient safety culture as central to individual behaviour, influencing staff proficiency, attitudes and behaviours concerning safety performance [ 8 , 10 , 27 ].

The reviewed literature also identified patient safety culture as one element of a broader organisational culture, related to preventing and detecting shortfalls in patient safety, and managing patient safety in healthcare settings [ 16 , 20 , 21 ]. The concept of ‘safety climate’ was also prevalent in the literature, and was often used in studies that also described ‘safety culture’ [ 10 , 16 , 19 , 26 , 27 ] without distinguishing between the two concepts.

Patient experience.

Across the reviewed studies, the concept of patient satisfaction was used more commonly than patient experience. Patient satisfaction was defined as a subjective assessment of how those receiving healthcare react to relevant elements of their treatment, including its process, environment and outcomes, and was quantified as the degree to which patients believe that their requirements and aspirations were fulfilled by their experiences [ 24 , 26 ]. Although the studies that examined patient experience did not offer specific definitions of the concept, they conceptualised patient experience as a resource for understanding patients’ perceptions, which helps to promote the quality and safety of healthcare services [ 2 , 8 , 25 , 27 , 28 ].

The reviewed research frequently referred to the concept of patient satisfaction and ways of measuring it, treating patient satisfaction as an indicator of the effectiveness of organisational performance with regard to patient safety [ 2 , 8 , 25 , 26 , 27 ]. The included studies also identified a related concept, customer satisfaction, defined as how individuals feel when they compare what they expected with what they perceive they received; this is regarded as a high-performance target in the delivery of public services [ 21 ]. This variation in concepts was mirrored by variation in the measurement tools currently used.

Measuring patient safety culture and patient experience

In the research reviewed, patient safety culture was most commonly measured using questionnaires, and included studies also presented assessments of the validity of the instruments used. The most common patient safety culture tool used in the reviewed studies was the Hospital Survey on Patient Safety Culture (HSOPS) [ 2 , 16 , 20 , 22 , 24 , 25 , 27 , 28 ]. The next most common was the Safety Attitudes Questionnaire (SAQ) [ 19 , 26 ]. The SAQ was also combined with the Leadership Effectiveness Survey (LES) to construct a new tool, the Safety Culture and Leadership Questionnaire, to assess clinician perceptions of safety, teamwork and leadership [ 19 ].

The HSOPS tool, developed by the Agency for Healthcare Research and Quality, was employed in the included studies to assess clinician and staff perceptions of the culture of safety at the hospital-wide (macro) level [ 16 , 22 , 27 , 28 ]. HSOPS was also used within individual hospital departments [ 2 , 20 , 24 , 25 ] and is regarded as a reliable and valid tool. The SAQ is another reliable and valid tool employed to evaluate patient safety culture [ 26 ]. The safety culture domains in the HSOPS and SAQ tools are different but overlapping (Table 3).

The use of the HSOPS and SAQ tools reflected the overlapping use of the concepts of safety culture and safety climate. Although HSOPS includes more dimensions of patient safety culture than the SAQ, both tools were employed to measure ‘patient safety culture’ [ 2 , 16 , 20 , 21 , 24 , 25 , 26 , 27 , 28 ], and HSOPS was also employed to measure ‘safety climate’ [ 16 ]. In addition, the SAQ includes two dimensions referring to climate: teamwork climate and safety climate [ 29 ]. Importantly, however, both the HSOPS and SAQ offer a quantitative measure of patient safety culture from the point of view of staff alone [ 2 , 16 , 20 , 24 , 25 , 26 , 27 , 28 ].
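Both HSOPS and the SAQ yield staff-reported Likert-scale data. The sketch below shows one common way such data are summarised, a "percent positive" score, modelled on the convention used in AHRQ's survey reporting. It assumes a 1 to 5 response scale in which 4 to 5 counts as positive for positively worded items and 1 to 2 counts as positive for negatively (reverse) worded items. The function names and example responses are hypothetical, and each instrument's official user guide should be consulted for its exact scoring rules.

```python
import math


def percent_positive(responses, reverse_worded=False):
    """Percentage of non-missing 1-5 responses that count as positive.

    Assumed convention: 4-5 is positive for positively worded items and
    1-2 is positive for reverse-worded items; None marks a missing answer.
    """
    valid = [r for r in responses if r is not None]
    if not valid:
        return math.nan
    positive = sum(1 for r in valid if (r <= 2 if reverse_worded else r >= 4))
    return 100.0 * positive / len(valid)


def composite_percent_positive(items):
    """Unweighted mean of item-level percent-positive scores.

    `items` is a list of (responses, reverse_worded) pairs for one composite.
    A NaN item score (no valid responses) propagates to the composite.
    """
    if not items:
        raise ValueError("a composite needs at least one item")
    scores = [percent_positive(resp, rev) for resp, rev in items]
    return sum(scores) / len(scores)


# Hypothetical two-item composite from five staff respondents.
items = [
    ([4, 5, 3, 4, 2], False),    # positively worded: 3 of 5 positive
    ([1, 2, 4, None, 2], True),  # reverse-worded: 3 of 4 valid positive
]
print(composite_percent_positive(items))  # 67.5
```

Averaging item-level percentages keeps each item's weight equal even when items have different numbers of missing answers. Note that this captures the staff perspective only, which is the limitation highlighted above.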

Patient-reported measures of safety were few and appeared mainly in the more recent literature. The Patient Measure of Safety (PMOS), the Patients’ Perceptions of Safety Culture (PaPSC) scale and patient narratives were used in the reviewed research to identify safety concerns from the patient’s perspective and to provide data on safety matters, including patient safety culture [ 2 , 4 , 8 , 19 ]. Lawton et al. [ 2 ] noted that the PMOS has undergone considerable testing and is generally recognised as both valid and reliable. It is also popular with patients, and by collecting patient feedback about the factors contributing to safety incidents, it allows researchers to assess how patients perceive the ways in which organisational elements influence patient safety within a hospital [ 2 ].

For measuring patient experience, the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) was the most frequently used tool in the reviewed studies. It is regarded as a valid and reliable instrument for measuring how patients perceive their interactions with the hospital, and it can be used by governments as a tool to inform hospital funding [ 16 , 25 , 26 , 28 ]. HCAHPS (also referred to as Hospital CAHPS) asks patients to report on their recent experiences of inpatient care [ 16 , 25 , 26 , 28 ]. It measures the following domains: nurse communication, doctor communication, pain management, staff responsiveness, hospital environment, communication about medicines, discharge information, and overall patient perception [ 16 , 25 , 26 , 28 ]. Mirroring the conceptual overlap seen with the safety culture surveys, the HCAHPS has been employed to measure both patient satisfaction [ 16 , 26 ] and patient experience [ 25 , 28 ]. Other feedback tools, such as the Patient Satisfaction Questionnaire Short Form (PSQ) [ 24 ], the Friends and Family Test (FFT) [ 2 ] and the Family Satisfaction in the Intensive Care Unit questionnaire (FS-ICU-24) [ 20 ], were used in the reviewed studies to measure patient feedback and perceptions of care.

Finally, only one study in our review used a qualitative method to examine patient experience, drawing on pre-recorded video narratives published on the Canadian Patient Safety Institute website [ 8 ].

Relationship between patient safety culture and patient experience

In the reviewed research, the relationship between patient safety culture and patient experience was generally identified and presented as a statistical correlation [ 2 , 16 , 24 , 25 , 26 , 27 , 28 ]. Positive correlations were found between some domains of patient safety culture and patient experience (Table 4) [ 2 , 8 , 20 , 21 , 23 , 25 , 28 ], and the teamwork and communication domains appear central to these positive correlations [ 8 , 16 , 25 , 26 , 27 ]. Other reviewed studies found no correlation between the overall scores for patient safety culture and patient experience [ 2 , 24 , 26 ].
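The correlational approach described above can be illustrated with a minimal sketch. The Python below computes Pearson's r and Spearman's rank correlation between a safety culture domain score and a patient experience score across hospital units. The function names and the unit-level scores are hypothetical and invented purely for illustration; they are not data from any reviewed study. A coefficient near zero corresponds to a "no correlation" finding, and, as with the reviewed studies, a correlation on its own says nothing about causality.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r for two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("samples must be paired and contain at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of the data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical unit-level scores (illustration only, not data from the review):
# % positive responses on a teamwork domain vs. % of patients rating care highly.
teamwork = [62.0, 70.5, 75.0, 81.0, 88.5]
experience = [68.0, 71.0, 70.0, 79.0, 84.0]

print(f"Pearson r    = {pearson(teamwork, experience):.2f}")
print(f"Spearman rho = {spearman(teamwork, experience):.2f}")
```

Because Spearman's rho depends only on rank order, it can be a safer choice for small samples of ordinal survey scores. Library routines such as `scipy.stats.spearmanr` also report a p-value, which this sketch omits.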

Staff responsibilities, including direct contact with patients, may affect the relationship between patient safety culture and patient experience. For instance, no significant correlation was found between patient satisfaction and safety climate when only management held a highly positive view of the safety climate [ 16 ]; when both management and clinicians held a positive view, however, the correlation was positive. Patient experience as measured by the FFT was correlated with how patients perceived safety issues, but not with staff safety culture or with publicly available safety data [ 2 ]. The sole qualitative study found that structuring safety and quality around teamwork among healthcare professionals, patients and family members is more effective than relying on individual healthcare practitioners alone [ 8 ]. It also found that involving patients and families is essential for building trusting relationships, which foster an inviting environment that encourages open communication and coordination among staff and patients [ 8 ]. Finally, conversation between staff, patients and families is crucial for capturing different views of safety and understanding safety culture better, particularly from the patient’s perspective.

The reviewed research also frequently examined how patient safety culture and patient experience, individually or in combination, were related to other quality measures such as hospital performance; however, this is outside the scope of our review.

Patient safety culture and patient experience overlapped with other concepts

The concepts “safety culture” and “safety climate” were used interchangeably in the reviewed literature, reflecting their overlap in the broader literature, although the two are sometimes differentiated. Patient safety culture tends to refer more broadly to the complex set of shared perceptions about safety that form in an organisation over time, whereas safety climate is considered ‘a snapshot’ of these shared perceptions, which can be measured at a specific time point using surveys [ 29 , 30 ].

In the reviewed studies, the terms patient experience and patient satisfaction also overlapped considerably. Both are recognised indicators for assessing healthcare quality, and although the concepts are related, they have also been differentiated [ 31 ]. The reviewed studies did not offer specific definitions, but patient experience has been described elsewhere as patient “perceptions of phenomena for which they are the best or only sources of information, such as personal comfort or effectiveness of discharge planning” [ 5 , p. 1]. Patient experience is viewed as the sum of all interactions that influence patient perceptions over the entire experience [ 32 ], whereas, as noted earlier, patient satisfaction concerns whether patients’ expectations are met [ 33 ]. In this sense, patient satisfaction can be viewed as the patient’s evaluation of their experience of health services. Patients’ perceptions of what they actually experienced in healthcare organisations (patient experience) therefore strongly influence how they evaluate healthcare services (patient satisfaction).

Measuring the relationship between patient safety culture and patient experience

The relationship between patient safety culture and patient experience identified in the reviewed literature has mostly been measured with quantitative approaches such as surveys, so little is known about causality or about the underlying reasons (mechanisms) for any relationship identified between these concepts. The availability, validity and reliability of surveys such as the HSOPS and HCAHPS may facilitate and encourage the use of questionnaires in busy working environments such as hospitals. However, the considerable variation in the methodologies and tools employed to measure safety culture and patient experience (including the dimensions the instruments capture) makes it difficult to compare studies and contributes to variation in the findings.

Patient involvement in the measurement of patient safety culture

Our review findings support research arguing that patients can provide useful feedback on safety [ 34 ]. Patients’ voices are increasingly included in other aspects of patient safety, but they need to be included more in the measurement of safety culture. Indeed, some measures of patient experience already attend to safety, for instance in terms of physical comfort and a safe environment, which are also domains of patient safety culture. The included studies recognised that instruments for assessing patient perceptions could be adapted to incorporate questions about patient safety, as in the PMOS and PaPSC. This would enable patients’ perceptions and experiences of safety to be assessed and the findings used to improve safety culture.

The PMOS and PaPSC scales were developed specifically to capture patients’ feedback on the safety of their care. The PMOS is based on the Yorkshire Contributory Factors Framework (YCFF) and captures patient feedback on the factors contributing to patient safety incidents [ 35 ]. However, the YCFF was developed from the input of healthcare professionals alone [ 36 ], and the PaPSC scale was likewise initially developed from staff perceptions. Although these scales are administered to patients, they may therefore not fully reflect patients’ perceptions of safety culture if patients would identify other aspects of safety. In addition, the PMOS data were collected from a single hospital in northern England, so the survey results cannot be assumed to reflect the perceptions of the general global population.

Another approach to capturing patients’ perceptions of safety culture is to treat the pre-recorded narratives of patients and families as a qualitative assessment method [ 8 ]. This approach was limited by the inability to ask participants questions or follow up with them, and the analysis was based on revised or edited accounts that could carry certain biases. Nevertheless, the study demonstrated the value of patient narratives and interviews for understanding the interrelationships between different aspects of patient safety culture. In contrast to surveys, qualitative interviews aim to explore participants’ attitudes, behaviours, experiences and perceptions. Qualitative research methods are common in healthcare research, but are largely missing from research into the association between safety culture and patients’ perceptions of safety culture.

No consensus exists on the best method for measuring the concepts in question. Different measures have been used for each concept and for various purposes, resulting in variation in data sources and in results. Consequently, to generate useful and usable data, studies examining the relationship between patient safety culture and patient experience need to adopt measurement methods that are reliable, valid and comparable, such as the HSOPS and HCAHPS. Qualitative investigation is also worth considering when exploring the relationship between these concepts.

Several relationships between patient experience and safety culture subdomains were identified in the included studies (Table 4). This suggests that staff and patient views on aspects of safety, for example communication between staff and patients and coordination within and across hospital departments, can usefully be incorporated and examined together. According to Doyle, Lennox and Bell [ 37 ], smooth coordination (integration) of care is a key and valued aspect of the patient experience.

In this review, we found that the conceptual relationship between patient safety culture and patient experience was not clearly described. The differences and overlaps between concepts, results and measurement tools make it difficult to understand the relationship between patient safety culture (among health professionals and managers) and patient experience. Future investigations may benefit from the development of a conceptual framework that allows researchers to test and develop their understanding of how patients’ experiences intersect with safety culture. Patient experience and safety culture are both recognised as valuable quality indicators, and a better understanding of how they are associated would enable healthcare staff to understand patient needs and create effective strategies for enhancing patient safety culture that align with those needs.

This scoping review has offered an overview of existing research on the association between patient experience and patient safety culture within the hospital context, and identified potential associations between the two concepts. However, the included studies were conducted in a limited number of countries and generally assessed the relationship between these two concepts using quantitative methods. The relationship may differ in other countries or cultures, as differences in ethnicity and national culture could play an important role in patient experience. For instance, the reviewed literature recognised that Arab patients reported lower patient satisfaction than other ethnic groups in the same setting [ 10 ]. It is therefore important to consider other elements that may indirectly affect patient safety culture and patient experience, particularly in ethnic or national cultures where this relationship has not yet been investigated. Likewise, organisational factors could affect the relationship between the concepts; for example, a facility’s accreditation status has been shown to have a significant positive relationship with patient satisfaction [ 21 ].

This review has shown that the terms “safety culture” and “safety climate”, as well as “patient experience” and “patient satisfaction”, are not always applied consistently across research. The concepts are often not clearly defined, their relationship lacks a theoretical basis, it has not been widely investigated with qualitative methodologies, and the tools and methodologies employed vary considerably. The outcomes of this review suggest that the association between patient safety culture and patient experience should be investigated using a suitable theoretical framework in combination with validated methods, supported by qualitative inquiry, in order to understand this relationship more comprehensively, particularly in contexts where such investigations have not yet taken place.

Limitations

Although the literature search was conducted in major electronic databases without restrictions on publication date or country of origin, relevant sources published in languages other than English or Arabic are likely to have been missed. This may introduce language bias and limit the capture of perspectives from diverse communities needed for a comprehensive understanding of the phenomena studied, affecting the generalisability of the findings. Further, in accordance with the scoping review methodology of Arksey and O’Malley, no quality assessment was conducted, which makes the validity of the reported findings difficult to determine. These limitations are common in scoping reviews.

Data availability

Not applicable.

Abbreviations

PRISMA-ScR: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews

PCC: Population, Concept and Context

AHRQ: The Agency for Healthcare Research and Quality

ACSQHC: The Australian Commission on Safety and Quality in Health Care

ACI: The Agency for Clinical Innovation

NIH: National Institutes of Health

HSOPS: The Hospital Survey on Patient Safety Culture

SAQ: The Safety Attitudes Questionnaire

PMOS: The Patient Measure of Safety

HCAHPS: The Hospital Consumer Assessment of Healthcare Providers and Systems

PSQ: The Patient Satisfaction Questionnaire Short Form

FFT: The Friends and Family Test

FS-ICU-24: Family Satisfaction in the Intensive Care Unit questionnaire

YCFF: The Yorkshire Contributory Factors Framework

References

Donaldson MS, Corrigan JM, Kohn LT. To err is human: building a safer health system. National Academies; 2000.

Lawton R, O’Hara JK, Sheard L, Reynolds C, Cocks K, Armitage G, et al. Can staff and patient perspectives on hospital safety predict harm-free care? An analysis of staff and patient survey data and routinely collected outcomes. BMJ Qual Saf. 2015;24:369–76. https://doi.org/10.1136/bmjqs-2014-003691 .


Zohar D, Livne Y, Tenne-Gazit O, Admi H, Donchin Y. Healthcare climate: a framework for measuring and improving patient safety. Crit Care Med. 2007;35(5):1312–7.


Monaca C, Bestmann B, Kattein M, Langner D, Müller H, Manser T. Assessing patients’ perceptions of safety culture in the hospital setting: development and initial evaluation of the patients’ perceptions of safety culture scale. J Patient Saf. 2020;16:90–7. https://doi.org/10.1097/pts.0000000000000436 .

Hagerty TA, Samuels W, Norcini-Pala A, Gigliotti E. Peplau’s theory of interpersonal relations: an alternate factor structure for patient experience data? Nurs Sci Q. 2017;30(2):160–7.


Dixon-Woods M, Baker R, Charles K, Dawson J, Jerzembek G, Martin G, McCarthy I, McKee L, Minion J, Ozieranski P, et al. Culture and behaviour in the English National Health Service: overview of lessons from a large multimethod study. BMJ Qual Saf. 2013;23:106–15.

Zimlichman E, Rozenblum R, Millenson ML. The road to patient experience of care measurement: lessons from the United States. Isr J Health Policy Res. 2013;2:35. https://doi.org/10.1186/2045-4015-2-35 .

Bishop AC, Cregan BR. Patient safety culture: finding meaning in patient experiences. Int J Health Care Qual Assur. 2015;28:595–610. https://doi.org/10.1108/IJHCQA-03-2014-0029 .

Gillespie A, Reader TW. Patient-centered insights: using health care complaints to reveal hot spots and blind spots in quality and safety. Milbank Q. 2018;96:530–67. https://doi.org/10.1111/1468-0009.12338 .

Kagan I, Porat N, Barnoy S. The quality and safety culture in general hospitals: patients’, physicians’ and nurses’ evaluation of its effect on patient satisfaction. Int J Qual Health Care. 2019;31:261–8. https://doi.org/10.1093/intqhc/mzy138 .

Ayorinde MO, Alabi PI. Perception and contributing factors to medication administration errors among nurses in Nigeria. Int J Afr Nurs Sci. 2019;11:100153–60.


Alabdaly A, Debono D, Hinchcliff R, Hor SY. Relationship between patient safety culture and patient experience in hospital settings: a scoping review protocol. BMJ Open. 2021;11:e049873. https://doi.org/10.1136/bmjopen-2021-049873c .

Arksey H, O’Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol. 2005;8:19–32. https://doi.org/10.1080/1364557032000119616 .

Joanna Briggs Institute. Joanna Briggs Institute reviewers’ manual. 2015. https://nursing.lsuhsc.edu/JBI/docs/ReviewersManuals/Scoping-.pdf . Accessed 18 Jun 2022.

Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA Extension for scoping reviews (PRISMA-ScR): Checklist and Explanation. Ann Intern Med. 2018;169(7):467–73. https://doi.org/10.7326/M18-0850 .

Mazurenko O, Richter J, Kazley AS, Ford E. Examination of the relationship between management and clinician perception of patient safety climate and patient satisfaction. Health Care Manage Rev. 2019;44(1):79–89. https://doi.org/10.1097/HMR.0000000000000156 .

Bull C. Patient satisfaction and patient experience are not interchangeable concepts. Int J Qual Health Care. 2021;33(1). https://doi.org/10.1093/intqhc/mzab023 .

Lord L, Gale N. Subjective experience or objective process: understanding the gap between values and practice for involving patients in designing patient-centred care. J Health Organ Manag. 2014;28(6):714–30. https://doi.org/10.1108/JHOM-08-2013-0160 .

Do VQ, Mitchell R, Clay-Williams R, Taylor N, Ting HP, Arnolda G, Braithwaite J. Safety climate, leadership and patient views associated with hip fracture care quality and clinician perceptions of hip fracture care performance. Int J Qual Health Care. 2021;33:mzab152. https://doi.org/10.1093/intqhc/mzab152 .

Dodek PM, Wong H, Heyland DK, Cook DJ, Rocker GM, Kutsogiannis DJ, et al. The relationship between organizational culture and family satisfaction in critical care. Crit Care Med. 2012;40(5):1506–12. https://doi.org/10.1097/CCM.0b013e318241e368 .

Sembodo T, Hadi C, Purnomo W. Service quality model with cultural perspective in effect opatient satisfaction in hospitals with different accreditation status. Medico-Legal Update. 2019;19:204–9. https://doi.org/10.5958/0974-1283.2019.00041.0 .

Afshar PJ, Karbasi BJ, Moghadam MN. The relationship between patient safety culture with patient satisfaction and hospital performance in Shafa Hospital of Kerman in 2020. J Educ Health Promotion. 2021;10:455.

Burlakov N, Rozani V, Bluvstein I, Kagan I. The Association between quality and safety climate of a hospital ward, family members’ empowerment, and satisfaction with provided care. J Nurs Scholarsh. 2021;53:727–36. https://doi.org/10.1111/jnu.12682 .

Okafor CH, Ugwu AC, Okon IE. Effects of patient safety culture on patient satisfaction with radiological services in Nigerian radiodiagnostic practice. J Patient Experience. 2018;5:267–71. https://doi.org/10.1177/2374373518755500 .

Abrahamson K, Hass Z, Morgan K, Fulton B, Ramanujam R. The relationship between nurse-reported safety culture and the patient experience. J Nurs Adm. 2016;46:662–8. https://doi.org/10.1097/NNA.0000000000000423 .

Lyu H, Wick EC, Housman M, Freischlag JA, Makary MA. Patient satisfaction as a possible indicator of quality surgical care. JAMA Surg. 2013;148:362–7. https://doi.org/10.1001/2013.jamasurg.270 .

Smith SA, Yount N, Sorra J. Exploring relationships between hospital patient safety culture and consumer reports safety scores. BMC Health Serv Res. 2017;17:143. https://doi.org/10.1186/s12913-017-2078-6 .

Sorra J, Khanna K, Dyer N, Mardon R, Famolaro T. Exploring relationships between patient safety culture and patients’ assessments of hospital care. J Patient Saf. 2012;8:131–9. https://doi.org/10.1097/PTS.0b013e318258ca46 .

Sexton JB, Helmreich RL, Neilands TB, Rowan K, Vella K, Boyden J, Roberts PR, Thomas EJ. The safety attitudes questionnaire: psychometric properties, benchmarking data, and emerging research. BMC Health Serv Res. 2006;6:44.

Weaver SJ, Lubomski LH, Wilson RF, Pfoh ER, Martinez KA, Dy SM. Promoting a culture of safety as a patient safety strategy. Ann Intern Med. 2013;158(5Part2):369–74. https://doi.org/10.7326/0003-4819-158-5-201303051-00002 .

Kumah E. Patient experience and satisfaction with a healthcare system: connecting the dots. Int J Healthc Manag. 2019;12:173–9. https://doi.org/10.1080/20479700.2017.1353776 .

The Beryl Institute. Defining patient experience. 2018. https://www.theberylinstitute.org/page/DefiningPatientExp . Accessed 18 Jun 2022.

The Agency for Healthcare Research and Quality. What Is Patient Experience?. 2016. https://www.ahrq.gov/cahps/about-cahps/patient-experience/index.html . Accessed 18 Jun 2022.

Hor SY, Godbold N, Collier A, Iedema R. Finding the patient in patient safety. Health. 2013;17:567–83. https://doi.org/10.1177/1363459312472082 .

Giles SJ, Lawton RJ, Din I, McEachan RR. Developing a patient measure of safety (PMOS). BMJ Qual Saf. 2013;22:554–62.

Lawton R, McEachan RR, Giles SJ, Sirriyeh R, Watt IS, Wright J. Development of an evidence-based framework of factors contributing to patient safety incidents in hospital settings: a systematic review. BMJ Qual Saf. 2012;21:369–80.

Doyle C, Lennox L, Bell D. A systematic review of evidence on the links between patient experience and clinical safety and effectiveness. BMJ Open. 2013;3:1–18. https://doi.org/10.1136/bmjopen-2012-001570 .


Acknowledgements

The authors wish to acknowledge the librarians at the University of Technology Sydney for providing support in developing the search strategy for this study. The authors acknowledge the Gadigal of the Eora Nation, the traditional custodians of the land on which this study was conducted, and pay our respects to the Elders both past and present.

The first author is funded by a PhD scholarship from Imam Abdulrahman Bin Faisal University, Saudi Arabia.

Author information

Authors and affiliations.

Faculty of Health, University of Technology Sydney, Sydney, NSW, Australia

Adel Alabdaly

College of Nursing, Imam Abdulrahman Bin Faisal University, Dammam, Eastern Province, Saudi Arabia

School of Applied Psychology, Griffith Health Group, Griffith University, Brisbane, QLD, Australia

Reece Hinchcliff

School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, Brisbane, QLD, Australia

School of Public Health, University of Technology Sydney, Sydney, NSW, Australia

Deborah Debono & Su-Yin Hor


Contributions

A.A conceived and wrote the original manuscript. R.H, D.D and S.H reviewed and edited the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Adel Alabdaly.

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Alabdaly, A., Hinchcliff, R., Debono, D. et al. Relationship between patient safety culture and patient experience in hospital settings: a scoping review. BMC Health Serv Res 24, 906 (2024). https://doi.org/10.1186/s12913-024-11329-w


Received : 24 August 2023

Accepted : 18 July 2024

Published : 07 August 2024

DOI : https://doi.org/10.1186/s12913-024-11329-w


  • Safety culture
  • Safety climate
  • Patient satisfaction
  • Customer satisfaction
  • Healthcare quality
  • Health services
  • Quality indicators
  • Patient safety

BMC Health Services Research

ISSN: 1472-6963

literature review quality of data

IMAGES

  1. (PDF) A practical guide to data analysis in general literature reviews

    literature review quality of data

  2. (PDF) Systematic Literature Review of Data Quality Within OpenStreetMap

    literature review quality of data

  3. How to conduct a Systematic Literature Review

    literature review quality of data

  4. (PDF) A systematic literature review on data quality assessment

    literature review quality of data

  5. Systematic Literature Review Methodology

    literature review quality of data

  6. systematic literature review quality management

    literature review quality of data

COMMENTS

  1. Data Quality in Health Research: Integrative Literature Review

    Background. Decision-making and strategies to improve service delivery must be supported by reliable health data to generate consistent evidence on health status. The data quality management process must ensure the reliability of collected data. Consequently, various methodologies to improve the quality of services are applied in the health field.

  2. A practical guide to data analysis in general literature reviews

    This article is a practical guide to conducting data analysis in general literature reviews. The general literature review is a synthesis and analysis of published research on a relevant clinical issue, and is a common format for academic theses at the bachelor's and master's levels in nursing, physiotherapy, occupational therapy, public health and other related fields.

  3. A systematic literature review on data quality assessment

    In light of this objective, the study proposes a systematic literature review (SLR) as a suitable approach to examine the landscape of data quality and investigate available research specifically ...

  4. Assessing the practice of data quality evaluation in a national

    A systematic scoping review of data quality assessment literature. We started with 3 widely cited core references on EHR DQ assessment, including 2 review articles from Chan et al (2010) 16 and Weiskopf et al (2013a), 19 and 1 DQ framework from Kahn et al (2016). 25 First, we summarized and mapped the DQ dimensions in these 3 core references ...

  5. Digital Health Data Quality Issues: Systematic Review

    Poor data quality (DQ) can be detrimental to continuity of care , patient safety ... The multidisciplinary systematic literature review conducted in this study resulted in the development of a consolidated digital health DQ framework comprising 6 DQ dimensions, the interrelationships among these dimensions, 6 DQ outcomes, and the relationships ...

  6. Data Quality in Health Research: Integrative Literature Review

    Objective: Through an integrative literature review, the aim of this work was to identify and evaluate digital health technology interventions designed to support the conducting of health research based on data quality. Methods: A search was conducted in 6 electronic scientific databases in January 2022: PubMed, SCOPUS, Web of Science ...

  7. Literature review as a research methodology: An overview and guidelines

    This is why the literature review as a research method is more relevant than ever. Traditional literature reviews often lack thoroughness and rigor and are conducted ad hoc, rather than following a specific methodology. Therefore, questions can be raised about the quality and trustworthiness of these types of reviews.

  8. Overview of Data Quality: Examining the Dimensions ...

    To better understand data quality, we review the literature on data quality studies in information systems. We identify the data quality dimensions, antecedents, and their impacts. In this study, the notion of "Data Analytics Competency" is developed and validated as a five-dimensional formative measure (i.e., data quality, the bigness of ...

  9. Writing a literature review

    A literature review differs from a systematic review, which addresses a specific clinical question by combining the results of multiple clinical trials (an article on this topic will follow as part of this series of publications). ... data processing and statistical analysis can give an indication of data quality, reliability and reproducibility.

  10. Data Quality in health research: a systematic literature review

    Decision-making and strategies to improve service delivery need to be supported by reliable health data to generate consistent evidence on health status, so the data quality management process must ensure the reliability of the data collected. Thus, through an integrative literature review, the main objective of this work is to identify and evaluate digital health technology interventions ...

  11. (PDF) A review of data quality research in achieving high data quality

    The aim of this review is to highlight issues in data quality research and to discuss potential research opportunity to achieve high data quality within an organization. The review adopted ...

  12. Guidance on Conducting a Systematic Literature Review

    Literature reviews establish the foundation of academic inquiries. However, in the planning field, we lack rigorous systematic reviews. In this article, through a systematic search on the methodology of literature review, we categorize a typology of literature reviews, discuss steps in conducting a systematic literature review, and provide suggestions on how to enhance rigor in literature ...

  13. Measuring Data Quality: A Review of the Literature between ...

    A literature review was done within a revision of a guideline concerned with data quality management in registries and cohort studies. The review focused on quality indicators, feedback, and source data verification. Thirty-nine relevant articles were selected in a stepwise selection process. The majority of the papers dealt with indicators.
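
The snippet names quality indicators and source data verification without defining them. As a minimal sketch, assuming a registry and its source documents are both available as dictionaries keyed by a hypothetical patient identifier, one common indicator (completeness of a field) and a simple source-data-verification agreement rate could be computed as below. The record layout, field names, and function names are illustrative assumptions, not taken from the reviewed guideline.

```python
from typing import Dict, List, Optional

# Hypothetical record layout: field name -> value (None or "" means missing).
Record = Dict[str, Optional[str]]


def completeness(records: List[Record], field: str) -> float:
    """Share of records in which `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def sdv_agreement(registry: Dict[str, Record],
                  source: Dict[str, Record],
                  fields: List[str]) -> float:
    """Source data verification: fraction of checked values where the
    registry entry equals the value found in the source document."""
    checked = matched = 0
    for patient_id, src in source.items():
        # A missing registry record is treated as empty, so its fields
        # count as mismatches wherever the source has a value.
        reg = registry.get(patient_id, {})
        for field in fields:
            checked += 1
            if reg.get(field) == src.get(field):
                matched += 1
    return matched / checked if checked else 0.0


registry = {"p1": {"diagnosis": "TB", "age": "34"},
            "p2": {"diagnosis": "TB", "age": None}}
source = {"p1": {"diagnosis": "TB", "age": "34"},
          "p2": {"diagnosis": "TB", "age": "41"}}

print(completeness(list(registry.values()), "age"))           # 0.5
print(sdv_agreement(registry, source, ["diagnosis", "age"]))  # 0.75
```

Which indicators and thresholds a registry actually uses would come from the guideline itself; the sketch only shows the shape of such computations.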

  14. The METRIC-framework for assessing data quality for ...

    Contrary to the big data and general data quality literature from our corpus, the DL papers focus on the evaluation of one or very few specific data quality dimensions without (yet) considering ...

  15. Understanding the differences across data quality classifications: a

    Understanding the differences across data quality classifications: a literature review and guidelines for future research - Author: Anders Haug ... On this basis, guidelines for future research are developed. The literature review found 110 unique DQCs in journals and conference articles. The analysis of these articles identified seven distinct ...

  16. A systematic literature review on data quality assessment

    Defining and evaluating data quality can be a complex task as it varies depending on the specific purpose for which the data is intended. To effectively assess data quality, it is essential to take into account the intended use of the data and the specific requirements of the data users.

  17. The Challenges of Data Quality and Data Quality Assessment in the Big

    Section 2, Literature Review on Data Quality: In the 1950s, researchers began to study quality issues, especially the quality of products, and a series of definitions emerged, for example, quality is "the degree to which a set of inherent characteristics fulfill the requirements" ...

  18. How to Write a Literature Review

    Examples of literature reviews. Step 1: Search for relevant literature. Step 2: Evaluate and select sources. Step 3: Identify themes, debates, and gaps. Step 4: Outline your literature review's structure. Step 5: Write your literature review.

  19. Data analytics in quality 4.0: literature review and future

    Gregoris Mentzas (2022): Data analytics in quality 4.0: literature review and future research directions, International Journal of Computer Integrated Manufacturing, DOI: 10.1080/0951192X.2022. ...

  20. Systematic Literature Review of Data Quality Within OpenStreetMap

    This paper conducts a systematic literature review that aims to identify current research and directions regarding the quality of OpenStreetMap data. OpenStreetMap is a valuable source of geographical data. Worldwide, many people with different mapping experience and skills contribute data to the free geo-database using different mapping devices. This makes the OpenStreetMap data more ...

  21. Literature Review: The What, Why and How-to Guide

    Example: Predictors and Outcomes of U.S. Quality Maternity Leave: A Review and Conceptual Framework (DOI: 10.1177/08948453211037398); Systematic review: "The authors of a systematic review use a specific procedure to search the research literature, select the studies to include in their review, and critically evaluate the studies they find." (p. 139).

  22. Automated data analysis of unstructured grey literature in health

    Given the long lead-times of producing high-quality peer-reviewed health information, this is causing a demand for new ways to provide prompt input for secondary research. ... this is the first review of automated data extraction methods or tools for health-related grey literature and soft data, with a focus on (semi)automating horizon scans ...

  23. Models and frameworks for assessing the implementation of clinical

    Search results. Database searches yielded 26,011 studies, of which 107 full texts were reviewed. During the full-text review, 99 articles were excluded: 41 studies did not mention a model or framework for assessing the implementation of the CPG, 31 studies evaluated only implementation strategies (isolated actions) rather than the implementation process itself, and 27 articles were not related ...
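
The screening counts in this snippet can be checked with simple arithmetic: 41 + 31 + 27 = 99 exclusions, which leaves 107 - 99 = 8 full texts. The truncated snippet does not say whether further studies were added from other sources, so 8 is only what remains at the full-text stage. The helper below is a hypothetical sketch (its name and the shortened reason labels are assumptions) that encodes this consistency check for a PRISMA-style flow.

```python
from typing import Mapping


def remaining_after_exclusion(full_texts: int, exclusions: Mapping[str, int]) -> int:
    """Full texts left after subtracting every exclusion reason."""
    excluded = sum(exclusions.values())
    if excluded > full_texts:
        raise ValueError("more exclusions than full texts reviewed")
    return full_texts - excluded


# Counts from the snippet; the reason labels are shortened paraphrases.
exclusions = {
    "no model or framework for assessing CPG implementation": 41,
    "evaluated only implementation strategies": 31,
    "not related": 27,  # this reason is truncated in the snippet
}

print(sum(exclusions.values()))                    # 99
print(remaining_after_exclusion(107, exclusions))  # 8
```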

  24. Frontiers

    The process consisted of (1) the formulation of research questions, (2) the selection of databases and search terms or keywords for the preliminary mapping of articles published on the topic in question as part of the search process, (3) the selection of quality criteria for the inclusion and exclusion of articles to be reviewed, (4) the ...

  25. Systematic and other reviews: criteria and complexities

    A systematic review follows explicit methodology to answer a well-defined research question by searching the literature comprehensively, evaluating the quantity and quality of research evidence rigorously, and analyzing the evidence to synthesize an answer to the research question. The evidence gathered in systematic reviews can be qualitative ...

  26. Frontiers

    The study included a risk-of-bias evaluation, performed using the risk-of-bias assessment tool recommended by the Cochrane Systematic Review Manual 5.3.2. Two researchers independently screened the literature and extracted data, while also assessing the quality of the extracted data and cross-checking it.
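
Independent double extraction with cross-checking, as described in this snippet, amounts to comparing two extraction tables field by field and flagging disagreements for adjudication. A minimal sketch, assuming each reviewer's extraction is a dictionary mapping a study identifier to field values (the layout, function name, and sample data are hypothetical):

```python
from typing import Dict, List, Tuple

# Hypothetical layout: study id -> extracted field -> value.
Extraction = Dict[str, Dict[str, str]]


def cross_check(a: Extraction, b: Extraction) -> List[Tuple[str, str, str, str]]:
    """Return (study, field, value_a, value_b) for every disagreement.

    A study or field recorded by only one reviewer is compared against an
    empty string, so omissions are flagged as well.
    """
    disagreements = []
    for study in sorted(set(a) | set(b)):
        fields_a, fields_b = a.get(study, {}), b.get(study, {})
        for field in sorted(set(fields_a) | set(fields_b)):
            value_a, value_b = fields_a.get(field, ""), fields_b.get(field, "")
            if value_a != value_b:
                disagreements.append((study, field, value_a, value_b))
    return disagreements


reviewer_1 = {"S1": {"n": "120", "design": "RCT"}, "S2": {"n": "45"}}
reviewer_2 = {"S1": {"n": "120", "design": "cohort"}, "S2": {"n": "45"}}

for row in cross_check(reviewer_1, reviewer_2):
    print(row)  # ('S1', 'design', 'RCT', 'cohort')
```

Sorting the study and field keys keeps the disagreement list in a stable order, which makes it easier for a third reviewer to work through.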

  27. Systematic literature review of gender equity and social inclusion in

    The systematic literature review used statistical methods to measure effect sizes and a traditional univariate systematic literature review to synthesize the results. A table summarizing the literature that met the inclusion criteria was created to ensure transparency and clarity in the data coding process.

  28. Challenges affecting migrant healthcare workers while adjusting to new

    Our data search was initiated on June 20th, 2023, utilizing the PubMed database, and subsequently extended on June 25th, 2023, to include the Embase database, which incorporates Medline. To construct an effective search strategy, we conducted a preliminary literature review to identify relevant keywords and MeSH terms.

  29. Frontiers

    Meta-analysis was conducted using the statistical software Stata 16.0. A random-effects model was chosen for analysis based on the level of heterogeneity, typically when I² > 50%. The I² index was used to assess heterogeneity between point estimates, indicating the proportion of variation between point estimates attributed to heterogeneity. Traditionally, I² values below 25% suggest low ...
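
The I² statistic in this snippet is usually derived from Cochran's Q: with inverse-variance weights w_i = 1/v_i and pooled estimate y_w = sum(w_i * y_i) / sum(w_i), Q = sum(w_i * (y_i - y_w)^2), and I² = max(0, (Q - df) / Q) x 100% with df = k - 1 for k studies. The sketch below computes it in plain Python and applies the I² > 50% rule of thumb from the snippet to pick a model. It is an illustration under those assumptions, not a substitute for Stata's meta-analysis routines; the function names and the sample effects are hypothetical.

```python
from typing import Sequence


def cochran_q(effects: Sequence[float], variances: Sequence[float]) -> float:
    """Cochran's Q around the inverse-variance (fixed-effect) pooled estimate."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with one variance each")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(effects: Sequence[float], variances: Sequence[float]) -> float:
    """I^2 in percent: share of total variation attributed to heterogeneity."""
    q = cochran_q(effects, variances)
    df = len(effects) - 1
    if q <= 0:
        return 0.0  # no observed dispersion at all
    return max(0.0, (q - df) / q) * 100.0


def choose_model(i2: float, threshold: float = 50.0) -> str:
    """Rule of thumb from the snippet: random effects when I^2 exceeds 50%."""
    return "random-effects" if i2 > threshold else "fixed-effect"


# Hypothetical study effects (e.g. log odds ratios) and their variances.
i2 = i_squared([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(round(i2, 1), choose_model(i2))  # 75.0 random-effects
```

Here Q = 8 with df = 2, so I² = (8 - 2) / 8 = 75%. Clamping at zero reflects the convention that I² is never reported as negative when Q falls below its degrees of freedom.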

  30. Relationship between patient safety culture and patient experience in

    Background: Measures of patient safety culture and patient experience are both commonly utilised to evaluate the quality of healthcare services, including hospitals, but the relationship between these two domains remains uncertain. In this study, we aimed to explore and synthesise published literature regarding the relationships between these topics in hospital settings. Methods: This study was ...