“The only truly modern academic research engine”

Oa.mg is a search engine for academic papers, specialising in open access. We have over 250 million papers in our index.

  • Advanced search
  • Peer review

Discover relevant research today

Advance your research field in the open

Reach new audiences and maximize your readership

ScienceOpen puts your research in the context of

Publications

For Publishers

ScienceOpen offers content hosting, context building and marketing services for publishers. See our tailored offerings

  • For academic publishers to promote journals and interdisciplinary collections
  • For open access journals to host journal content in an interactive environment
  • For university library publishing to develop new open access paradigms for their scholars
  • For scholarly societies to promote content with interactive features

For Institutions

ScienceOpen offers state-of-the-art technology and a range of solutions and services

  • For faculties and research groups to promote and share your work
  • For research institutes to build up your own branding for OA publications
  • For funders to develop new open access publishing paradigms
  • For university libraries to create an independent OA publishing environment

For Researchers

Make an impact and build your research profile in the open with ScienceOpen

  • Search and discover relevant research in over 95 million Open Access articles and article records
  • Share your expertise and get credit by publicly reviewing any article
  • Publish your poster or preprint and track usage and impact with article- and author-level metrics
  • Create a topical Collection to advance your research field

Create a Journal powered by ScienceOpen

Launching a new open access journal or an open access press? ScienceOpen now provides full end-to-end open access publishing solutions – embedded within our smart interactive discovery environment. A modular approach allows open access publishers to pick and choose among a range of services and design the platform that fits their goals and budget.

What can a Researcher do on ScienceOpen?

ScienceOpen provides researchers with a wide range of tools to support their research – all for free. Here is a short checklist to make sure you are getting the most out of the technological infrastructure and content that we have to offer.

ScienceOpen on the Road

Upcoming Events

  • 15 June – Scheduled Server Maintenance, 13:00 – 01:00 CEST

Past Events

  • 20 – 22 February – ResearcherToReader Conference
  • 09 November – Webinar for the Discoverability of African Research
  • 26 – 27 October – Attending the Workshop on Open Citations and Open Scholarly Metadata
  • 18 – 22 October – ScienceOpen at Frankfurt Book Fair
  • 27 – 29 September – Attending OA Tage, Berlin
  • 25 – 27 September – ScienceOpen at Open Science Fair
  • 19 – 21 September – OASPA 2023 Annual Conference
  • 22 – 24 May – ScienceOpen sponsoring Pint of Science, Berlin
  • 16 – 17 May – ScienceOpen at 3rd AEUP Conference
  • 20 – 21 April – ScienceOpen attending Scaling Small: Community-Owned Futures for Open Access Books

What is ScienceOpen?

  • Smart search and discovery within an interactive interface
  • Researcher promotion and ORCID integration
  • Open evaluation with article reviews and Collections
  • Business model based on providing services to publishers

Some of our partners:

EDP

The Directory of Open Access Journals

Find open access journals & articles.

DOAJ in numbers

80 languages

134 countries represented

13,678 journals without APCs

20,748 journals

10,391,206 article records
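As a quick worked check on these figures, the Python snippet below computes the share of listed journals that charge no APCs and the average number of article records per journal. It uses only the counts shown above; the per-journal average is a crude figure, since the page does not say how many journals actually contribute article-level records.

```python
# Counts copied from "DOAJ in numbers" above.
journals_total = 20_748
journals_without_apcs = 13_678
article_records = 10_391_206

# Fraction of indexed journals that charge no article processing charges.
share_without_apcs = journals_without_apcs / journals_total

# Crude mean: not every journal necessarily supplies article records.
records_per_journal = article_records / journals_total

print(f"{share_without_apcs:.1%} of journals have no APCs")       # 65.9% of journals have no APCs
print(f"~{records_per_journal:.0f} article records per journal")  # ~501 article records per journal
```

In other words, roughly two in three DOAJ journals are free for authors as well as readers.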

About the directory

DOAJ is a unique and extensive index of diverse open access journals from around the world, driven by a growing community, and is committed to ensuring quality content is freely available online for everyone.

DOAJ is committed to keeping its services free of charge, including being indexed, and its data freely available.

DOAJ is twenty years old in 2023.

Fund our 20th anniversary campaign

DOAJ is independent. All support is via donations.

82% from academic organisations

18% from contributors

Support DOAJ

Publishers don't need to donate to be part of DOAJ.

News Service

  • Meet the DOAJ team: Head of Editorial and Deputy Head of Editorial (Quality)
  • Vacancy: Operations Manager
  • Press release: PubScholar joins the movement to support the Directory of Open Access Journals
  • New major version of the API to be released

We would not be able to work without our volunteers, such as these top-performing editors and associate editors.

  • Librarianship, Scholarly Publishing, Data Management – Brisbane, Australia (Chinese, English)
  • Adana, Türkiye (Turkish, English)
  • Natalia Pamuła – Humanities, Social Sciences – Toruń, Poland (Polish, English)
  • Pablo Hernandez – Medical Sciences, Nutrition – Caracas, Venezuela (Spanish, English)
  • Paola Galimberti – Research Evaluation – Milan, Italy (Italian, German, English)
  • Dawam M. Rohmatulloh – Social Sciences, Humanities – Ponorogo, Indonesia (Bahasa Indonesia, English, Dutch)
  • Kadri Kıran – Systematic Entomology – Edirne, Türkiye (English, Turkish, German)
  • Nataliia Kaliuzhna – Library and Information Science – Kyiv, Ukraine (Ukrainian, Russian, English, Polish)

🇺🇦 make metadata, not war

A comprehensive bibliographic database of the world’s scholarly literature

The world’s largest collection of open access research papers

Machine access to our vast, unique full-text corpus

CORE features

Indexing the world’s repositories

We serve the global network of repositories and journals

Comprehensive data coverage

We provide both metadata and full text access to our comprehensive collection through our APIs and Datasets

Powerful services

We create powerful services for researchers, universities, and industry

Cutting-edge solutions

We research and develop innovative data-driven and AI solutions

Committed to the Principles of Open Scholarly Infrastructure (POSI)

Cost-free PIDs for your repository

OAI identifiers are unique identifiers minted cost-free by repositories. Ensure that your repository is correctly configured, enabling the CORE OAI Resolver to redirect your identifiers to your repository landing pages.

OAI IDs provide a cost-free option for assigning Persistent Identifiers (PIDs) to your repository records.
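OAI identifiers follow the OAI-PMH identifier format: the literal scheme "oai", a repository (namespace) identifier shaped like a domain name, and a local record identifier, separated by colons, as in "oai:arXiv.org:cs/0112017". The Python sketch below splits and sanity-checks such an identifier. The class and function names are hypothetical, the namespace check is a simplified assumption rather than the full grammar, and nothing here reflects how the CORE OAI Resolver itself is implemented.

```python
import re
from typing import NamedTuple

# Simplified (assumed) check for the repository identifier: dot-separated
# labels that each start with a letter, e.g. "arXiv.org". Not the full grammar.
_NAMESPACE_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*(\.[A-Za-z][A-Za-z0-9-]*)+")


class OaiIdentifier(NamedTuple):
    namespace: str  # repository identifier, e.g. "arXiv.org"
    local_id: str   # record identifier assigned by that repository


def parse_oai_identifier(value: str) -> OaiIdentifier:
    """Split a hypothetical input of the form oai:<namespace>:<local-id>."""
    # maxsplit=2 leaves any colons inside the local identifier untouched
    parts = value.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "oai":
        raise ValueError(f"not an OAI identifier: {value!r}")
    _, namespace, local_id = parts
    if not _NAMESPACE_RE.fullmatch(namespace):
        raise ValueError(f"invalid repository identifier: {namespace!r}")
    if not local_id:
        raise ValueError("local identifier must not be empty")
    return OaiIdentifier(namespace, local_id)


print(parse_oai_identifier("oai:arXiv.org:cs/0112017"))
# OaiIdentifier(namespace='arXiv.org', local_id='cs/0112017')
```

Splitting at most twice is a deliberate choice: it keeps the local part intact in case a repository uses colons in its own record identifiers.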

Who do we serve?

Enabling others to create new tools and innovate using a global comprehensive collection of research papers.

Companies

“Our partnership with CORE will provide Turnitin with vast amounts of metadata and full texts that we can utilise in our plagiarism detection software.”

Gareth Malcolm, Content Partner Manager at Turnitin

Academic institutions

Making research more discoverable, improving metadata quality, helping to meet and monitor open access compliance.

“CORE’s role in providing a unified search of repository content is a great tool for the researcher and ex...”

Nicola Dowson, Library Services Manager at Open University

Researchers & general public

Tools to find, discover and explore the wealth of open access research. Free for everyone, forever.

“With millions of research papers available across thousands of different systems, CORE provides an invalu...”

Jon Tennant, Rogue Paleontologist and Founder of the Open Science MOOC

Helping funders to analyse, audit and monitor open research and accelerate towards open science.

Funders

“Aggregation plays an increasingly essential role in maximising the long-term benefits of open access, helping to turn the promise of a 'research commons' into a reality.”

Ben Johnson, Research Policy Adviser at Research England

Our services

Access to raw data

Create new and innovative solutions.

Content discovery

Find relevant research and make your research more visible.

Managing content

Manage how your research content is exposed to the world.

Companies using CORE

Gareth Malcolm

Content Partner Manager at Turnitin

Our partnership with CORE will provide Turnitin with vast amounts of metadata and full texts that we can utilise in our plagiarism detection software.

Academic institution using CORE

Kathleen Shearer

Executive Director of the Confederation of Open Access Repositories (COAR)

CORE has significantly assisted the academic institutions participating in our global network with their key mission, which is their scientific content exposure. In addition, CORE has helped our content administrators to showcase the real benefits of repositories via its added value services.

Partner projects

Ben Johnson

Research Policy Adviser

Aggregation plays an increasingly essential role in maximising the long-term benefits of open access, helping to turn the promise of a 'research commons' into a reality. The aggregation services that CORE provides therefore make a very valuable contribution to the evolving open access environment in the UK.

Open access

Advancing open access to knowledge.

Open access is a key part of our mission to help researchers advance science for societal progress.

Open access at Elsevier

Open access is vital to a collaborative, inclusive and transparent world of research where quality knowledge can be shared and built upon. Every day, we work to bring more insight into closer reach for the research community and the public. We offer a wide choice and flexibility for every researcher and institution around the world that wants to publish open access, without ever compromising on research quality, integrity and value.

Enabling a transition to open access

As one of the largest open access publishers in the world, we are enabling a transition to open access at scale. Nearly all of our 2,900 journals enable open access publishing, and more than 800 of these are fully open access. In 2023, we published more than 190,000 open access articles.

Our world-leading research platforms make 3.3 million validated open access articles available, and we support more than 2,000 institutions with open access agreements.

Delivering high quality research

Each year, we receive around 3 million research papers from authors. Whether published open access or via subscription model, they are all rigorously reviewed by our in-house editorial teams in collaboration with 33,000 editors and 1.5 million expert reviewers around the world.

As a result, in 2023 more than 630,000 articles were enhanced, indexed, certified, published and promoted following peer review. These processes, and the assistance provided to authors along the way, ensure the integrity and reliability of research and of the scientific record. Articles in Elsevier journals account for over 17% of global research output and 28% of global citations, reinforcing our focus on quality.

Supporting every researcher and institution

We offer a broad range of choices to support every researcher and institution in accessing and publishing research. In 2023 we supported more than half a million researchers in 190 countries and territories to publish open access.

Alongside our commitment to pricing article publishing charges below market average relative to comparable quality, we have initiatives to support researchers in low- and middle-income countries. In 2023, we waived or discounted costs for nearly 80% of authors from the Global South and introduced the industry-first Geographical Pricing for Open Access initiative. This considers local economic circumstances to help researchers publish research open access.

Building open access sustainability with transformative agreements

In a series of three case studies, library leaders share their insights into the transformative agreement process. Librarians guide readers through setting goals and communicating to stakeholders, working with publishers, and implementing the agreement across their institutions.

Learn more about transformative agreements that drive cross-campus collaboration, support researchers, and sustainably expand open access.

How we are advancing open access

Researcher perspectives

Prof Charles Spence, PhD, of the University of Oxford investigates how our senses interact and how they impact our daily lives.

Multisensory researcher on how open access connects academia to the wider world

Image of Dr. Heyddy Calderon

Open access publishing is indispensable, says award-winning hydrology researcher

Quote by Christopher Parsonson

Advancing data center networking through open access

Image of Prof. Gawsia Wahidunessa Chowdhury, PhD

“Open access is like a window of knowledge”

Open science

Open access is just one element of the way we partner with you to drive open science. Together we can create a more inclusive, collaborative and transparent world of research.

Unlocking the potential of data

We're working to help researchers and institutions store, share, discover and effectively reuse data. Effective data sharing can improve the impact, validity, reproducibility, efficiency and transparency of scientific research.

Promoting research integrity

We are committed to promoting the integrity of research through a range of activities and initiatives, from free author training on publication ethics to providing transparency in author contributor roles.

Free access initiatives

From researchers and students using content published in our books and journals on a daily basis to a patient who needs critical information about their treatment, Elsevier has a range of access options to ensure that everyone can access the important information they need.

Find our access options:

Access for public and media

Access for developing countries

Access for researchers and students

Access for healthcare and patients

Responding to public health emergencies

Frequently asked questions

How many of your journals offer a gold open access option?

Elsevier is one of the fastest-growing open access publishers in the world. Nearly all of Elsevier's 2,900 journals now enable open access publishing, including 800 fully open access journals.

What is your position on Green Open Access?

All Elsevier journals allow authors to use Green Open Access, usually after an embargo period. Green Open Access is when authors share a public version of their article, for example in their institution or funder’s repository, which would otherwise only be available to paying subscribers. 

Do you support access to subscription articles in any other ways?

Elsevier makes subscription articles completely free to access in specific situations: 

We offer free access to relevant research for health emergencies, as we did during the Covid-19 pandemic.

Patients and caregivers are provided with papers related to medicine and healthcare upon request to help them better understand the latest research on their conditions. 

Through Research4Life, institutions in 120 low- and middle-income countries receive affordable access to nearly 100,400 peer-reviewed resources. As a founding member, Elsevier provides over a quarter of that content, as well as access to the abstract and citation database Scopus and training for librarians.

How do you formulate your prices for publishing and subscriptions?

We strive to offer researchers value for money, and we are committed to pricing our journals competitively with an underlying principle of pricing lower than the market for like-for-like quality.

Open access content and subscription content are priced separately. Open access publishing is supported by the pay-to-publish model, where authors (or others on their behalf) pay an Article Publishing Charge (APC) to enable the article to be made publicly available immediately on publication. 

We set APC prices based on the following criteria:

  • Journal quality
  • The journal’s editorial and technical processes
  • Competitive considerations
  • Market conditions
  • Other revenue streams associated with the journal, such as advertising

Elsevier’s APC prices are set on a per-journal basis. Fees range from approximately US$150 to US$10,100, excluding tax, and prices are clearly displayed on our APC price list and on journal homepages.

Where articles are not supported by the pay-to-publish model, they are typically supported by subscription fees paid for by readers. 

We set journal subscription list prices based on the following criteria:

  • Number of subscription articles
  • Other revenue streams, such as commercial contributions from advertising, reprints and supplements

Can you be more transparent in what you charge?

We are constantly striving to be more transparent in all aspects of what Elsevier does, including pricing. We try to support requests for information within the bounds allowed by financial reporting requirements and competition rules. 

For authors:

  • We provide the price of publishing gold open access on each journal homepage and in a central list
  • We automatically notify authors who are entitled to free or discounted gold open access, for example where there is an agreement with their institution or funder
  • We automatically notify authors who are entitled to free or discounted gold open access because they are based in a low- or middle-income country; our APC waiver policy explains this process

For librarians:

  • We provide a range of information about our pricing competitiveness, how our pricing corresponds to quality, and publishing model uptake across subscription and open access
  • We publicly announce significant agreements, including our open access pilots
  • We provide a list of our journal subscription prices
  • We describe the process we follow to calculate list prices
  • We describe the process we follow to ensure we do not double-dip. We also show, on each journal homepage, the number of articles published gold open access and the number financed through subscriptions, so that librarians can validate this

Do you double dip (i.e., charge for the same article twice)?

We do not double-dip. We can be reimbursed for an article in two ways, through an Article Publishing Charge (APC) or a subscription, but we never charge for the same article twice. We have a strict no double-dipping policy.

How do you help authors who cannot afford to pay to be published, and why can't you offer that support more widely?

As part of our commitment to inclusion and diversity in science, we believe no researcher should be prevented from publishing in their journal of choice because of financial barriers. We support researchers from low- and middle-income countries to publish gold open access if they wish to do so. When publishing in fully open access journals, we fully waive all open access charges for authors from 69 countries (Group A) and give a 50% discount to authors from 57 countries (Group B).

For other authors, we offer a choice of journals with open access publishing charges ranging from $150 to $10,100. We will also consider requests for accommodations on a case-by-case basis for authors who are required to publish open access but do not have the financial means to do so. 

Finally, we provide high quality subscription publishing options, so authors should never face a cost barrier to publishing in their journal of choice.
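The group-based waiver described in this answer reduces to simple arithmetic. The Python sketch below applies it to a list-price APC: the group labels mirror Group A (full waiver) and Group B (50% discount) as stated, while the function name is hypothetical, the rule is assumed to apply only to fully open access journals, and the case-by-case accommodations mentioned above are not modelled.

```python
from decimal import Decimal
from typing import Optional

# Waiver fraction per country group, as described for fully OA journals.
# The dictionary and its keys are illustrative, not an Elsevier data format.
WAIVER_BY_GROUP = {"A": Decimal("1.0"), "B": Decimal("0.5")}


def payable_apc(list_apc: Decimal, country_group: Optional[str] = None) -> Decimal:
    """Return the APC left to pay after a group-based waiver (hypothetical helper).

    country_group is "A" (full waiver), "B" (50% discount) or None (no waiver).
    """
    if country_group is None:
        waiver = Decimal("0")
    elif country_group in WAIVER_BY_GROUP:
        waiver = WAIVER_BY_GROUP[country_group]
    else:
        raise ValueError(f"unknown country group: {country_group!r}")
    # Round to whole cents for a currency amount.
    return (list_apc * (1 - waiver)).quantize(Decimal("0.01"))
```

Using Decimal rather than float keeps currency amounts exact: a hypothetical US$3,000 list price becomes US$1,500.00 for a Group B author and US$0.00 for a Group A author.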

If more authors are publishing Gold Open Access, why don't you reduce your subscription fees?

We strive to offer researchers real value, and we are continuing our commitment to pricing our journals competitively with an underlying principle of pricing lower than the market for like-for-like quality.

We see growth in the number of articles published through both the gold open access and subscription models. Subscription volumes rose by over 7% in 2020 compared to the previous year, for instance. However, we still price competitively: Elsevier’s average price change has been the lowest amongst major competitors in the last 13 years due to moderate historical price changes and this strong volume growth. At the same time, we maintain high-quality content. 

Our prices for subscription articles and APCs are set completely separately. Subscription fees are based on a range of factors, as noted above.

Does Elsevier have any Transformative Journals?

Elsevier is piloting transformative journal status for more than 60 journals across our portfolio. You can see the full list of transformative journals and targets, and visit the relevant individual journal home pages for more information.

Explore more

Open access journals

Open access books

Architecture / Design

Biomedicine, business and management, computer science, earth sciences, engineering, environment, life sciences, materials science, mathematics, medicine & public health, science, humanities and social sciences, multidisciplinary, social sciences.

Browse article collections by subject.

  • Built Heritage
  • Cellular and Molecular Neurobiology
  • Future Business Journal
  • International Journal of Corporate Social Responsibility
  • Journal of Innovation and Entrepreneurship
  • Journal of Shipping and Trade
  • Schmalenbach Journal of Business Research
  • Applied Biological Chemistry
  • Bioresources and Bioprocessing
  • Fashion and Textiles
  • Journal of Analytical Science and Technology
  • Journal of Umm Al-Qura University for Applied Sciences
  • Applied Network Science
  • Brain Informatics
  • Cybersecurity
  • Energy Informatics
  • EPJ Data Science
  • International Journal of Educational Technology in Higher Education
  • Journal of Big Data
  • Journal of Cloud Computing
  • Visual Computing for Industry, Biomedicine, and Art
  • International Journal of Implant Dentistry
  • Maxillofacial Plastic and Reconstructive Surgery
  • Progress in Orthodontics
  • Earth, Planets and Space
  • Geoscience Letters
  • Geothermal Energy
  • Progress in Earth and Planetary Science
  • Swiss Journal of Geosciences
  • Swiss Journal of Palaeontology
  • Agricultural and Food Economics
  • Financial Innovation
  • Journal for Labour Market Research
  • Journal of Economic Structures
  • Marine Development
  • Swiss Journal of Economics and Statistics
  • Asian-Pacific Journal of Second and Foreign Language Education
  • Disciplinary and Interdisciplinary Science Education Research
  • Empirical Research in Vocational Education and Training
  • International Journal of Child Care and Education Policy
  • International Journal of STEM Education
  • Language Testing in Asia
  • Large-scale Assessments in Education
  • Smart Learning Environments
  • Sustainable Energy Research
  • Advanced Modeling and Simulation in Engineering Sciences
  • Advances in Aerodynamics
  • Advances in Bridge Engineering
  • AI Perspectives & Advances
  • Chinese Journal of Mechanical Engineering
  • EURASIP Journal on Advances in Signal Processing
  • EURASIP Journal on Audio, Speech, and Music Processing
  • EURASIP Journal on Image and Video Processing
  • EURASIP Journal on Information Security
  • EURASIP Journal on Wireless Communications and Networking
  • European Transport Research Review
  • International Journal of Concrete Structures and Materials
  • Journal of Electrical Systems and Information Technology
  • Journal of Engineering and Applied Science
  • Journal of Infrastructure Preservation and Resilience
  • Journal of Materials Science: Materials in Engineering
  • Journal of Umm Al-Qura University for Engineering and Architecture
  • Micro and Nano Systems Letters
  • Moore and More
  • ROBOMECH Journal
  • Satellite Navigation
  • Ecological Processes
  • Environmental Sciences Europe
  • Environmental Systems Research
  • Geoenvironmental Disasters
  • City, Territory and Architecture
  • European Journal of Futures Research
  • Journal of International Humanitarian Action
  • AMB Express
  • Botanical Studies
  • Cell Regeneration
  • Chemical and Biological Technologies in Agriculture
  • Egyptian Journal of Biological Pest Control
  • Fire Ecology
  • Horticulture Advances
  • Journal of Wood Science
  • Natural Products and Bioprospecting
  • The Journal of Basic and Applied Zoology
  • Applied Microscopy
  • Collagen and Leather
  • Functional Composite Materials
  • Heritage Science
  • Journal of Materials Science: Materials Theory
  • Microplastics and Nanoplastics
  • Nano Convergence
  • Advances in Continuous and Discrete Models
  • Boundary Value Problems
  • Fixed Point Theory and Algorithms for Sciences and Engineering
  • Journal of Inequalities and Applications
  • Journal of Mathematics in Industry
  • African Journal of Urology
  • Annals of Intensive Care
  • Beni-Suef University Journal of Basic and Applied Sciences
  • Blood Research
  • Bulletin of Faculty of Physical Therapy
  • Clinical Phytoscience
  • CVIR Endovascular (open peer review)
  • Egyptian Journal of Forensic Sciences
  • Egyptian Journal of Medical Human Genetics
  • Egyptian Journal of Neurosurgery
  • Egyptian Journal of Radiology and Nuclear Medicine
  • Egyptian Liver Journal
  • Egyptian Pediatric Association Gazette
  • Egyptian Rheumatology and Rehabilitation
  • EJNMMI Physics
  • EJNMMI Radiopharmacy and Chemistry
  • EJNMMI Reports
  • EJNMMI Research
  • European Radiology Experimental
  • Future Journal of Pharmaceutical Sciences
  • Insights into Imaging
  • Intensive Care Medicine Experimental
  • International Journal of Bipolar Disorders
  • JA Clinical Reports
  • Journal of Ophthalmic Inflammation and Infection
  • Journal of Orthopaedics and Traumatology
  • Journal of Patient-Reported Outcomes
  • Journal of the Egyptian National Cancer Institute
  • Journal of the Egyptian Public Health Association
  • Middle East Current Psychiatry
  • Middle East Fertility Society Journal
  • Molecular and Cellular Pediatrics
  • Sports Medicine - Open
  • Surgical Case Reports
  • The Cardiothoracic Surgeon
  • The Egyptian Heart Journal
  • The Egyptian Journal of Bronchology
  • The Egyptian Journal of Internal Medicine
  • The Egyptian Journal of Neurology, Psychiatry and Neurosurgery
  • The Egyptian Journal of Otolaryngology
  • The Ultrasound Journal
  • eLight (transparent peer review)
  • EPJ Quantum Technology
  • EPJ Techniques and Instrumentation
  • Surface Science and Technology
  • Cognitive Research: Principles and Implications
  • Psicologia: Reflexão e Crítica
  • Bulletin of the National Research Centre
  • Comparative Migration Studies
  • International Journal of Anthropology and Ethnology
  • The Journal of Chinese Sociology

Open access journals

We have published over 124,000 open access articles via gold open access across disciplines, from the life sciences to the humanities, representing 33% of all Springer Nature articles in 2020. Authors can also publish their article under an open access licence in more than 2,200 of our hybrid journals.

Our portfolio focuses on robust and insightful research, supporting the development of new areas of knowledge and making ideas and information accessible around the globe.

Across our publishing imprints there are leading multidisciplinary and community-focused journals that offer rigorous, high-impact open access. Many of our titles are also published in partnership with academic societies, enabling them to achieve their own open research ambitions.

OA articles published via Gold OA 

Hybrid OA journals

Open access books

Fully open access journals

Download a list of our fully open access journals, including APC and licence information.

This list indicates the standard article processing charge (APC) for each journal. APCs are payable for articles upon acceptance. While we make every effort to keep this list updated, please note that APCs are subject to change and may vary from the price listed. For further information on the licences and other currencies available, self-archiving embargoes, manuscript deposition, and abstracting & indexing, visit the individual journal’s website. VAT or local taxes will be added where applicable.

Questions about paying for open access?

View our frequently asked questions about article processing charges (APCs).

Visit our imprint sites

BMC

Hybrid journals

Download a list of our hybrid journals, including Springer Open Choice titles. We publish more than 2,200 journals that offer open access at the article level, allowing optional open access in the majority of Springer Nature's subscription-based journals.

Find out more by imprint

Springer Open Choice

Springer Nature hybrid journals on nature.com

Palgrave Macmillan hybrid journals

Stay up to date

Here to foster information exchange with the library community

Connect with us on LinkedIn and stay up to date with news and development.

We are a world-leading research, educational and professional publisher. Visit our main website for more information.

  • © 2024 Springer Nature


Open access · Published: 07 June 2023

CORE: A Global Aggregation Service for Open Access Papers

  • Petr Knoth (ORCID: orcid.org/0000-0003-1161-7359)
  • Drahomira Herrmannova (ORCID: orcid.org/0000-0002-2730-1546)
  • Matteo Cancellieri
  • Lucas Anastasiou
  • Nancy Pontika
  • Samuel Pearce
  • Bikash Gyawali
  • David Pride

Scientific Data, volume 10, Article number: 366 (2023)

  • Research data

This paper introduces CORE, a widely used scholarly service, which provides access to the world’s largest collection of open access research publications, acquired from a global network of repositories and journals. CORE was created with the goal of enabling text and data mining of scientific literature and thus supporting scientific discovery, but it is now used in a wide range of use cases within higher education, industry, and not-for-profit organisations, as well as by the general public. Through the provided services, CORE powers innovative use cases, such as plagiarism detection, in market-leading third-party organisations. CORE has played a pivotal role in the global move towards universal open access by making scientific knowledge more easily and freely discoverable. In this paper, we describe CORE’s continuously growing dataset and the motivation behind its creation, present the challenges associated with systematically gathering research papers from thousands of data providers worldwide at scale, and introduce the novel solutions that were developed to overcome these challenges. The paper then provides an in-depth discussion of the services and tools built on top of the aggregated data and finally examines several use cases that have leveraged the CORE dataset and services.


Introduction

Scientific literature contains some of the most important information we have assembled as a species, such as how to treat diseases, how to solve difficult engineering problems, and how to address many of the challenges the world faces today. The body of scientific literature is growing at an enormous rate, with more than 5 million articles added each year (almost 7.2 million papers were published in 2022 according to Crossref, the largest Digital Object Identifier (DOI) registration agency). Furthermore, it has been estimated that the volume of research published grows by about 10% per year 1. At the same time, an ever-growing amount of research literature, estimated at well over 1 million publications per year in 2015 2, is being published as open access (OA) and can therefore be read and processed with few or no copyright restrictions. As reading all of this knowledge is now beyond the capacity of any human being, text mining offers the potential not only to improve the way we access and analyse it 3 but also to lead to new scientific insights 4.

However, systematically gathering scientific literature so that automated methods can process it at scale is a significant problem. Scientific literature is spread across thousands of publishers, repositories, journals, and databases, which often lack common data exchange protocols and other support for interoperability. Even when protocols are in place, the lack of infrastructure for collecting and processing this data, restrictive copyright, and the fact that OA is not yet the default publishing route in most parts of the world further complicate the machine processing of scientific knowledge.

To alleviate these issues and support text and data mining of scientific literature, we have developed CORE (https://core.ac.uk/). CORE aggregates open access research papers from thousands of data providers all over the world, including institutional and subject repositories, and open access and hybrid journals. CORE is the largest collection of OA literature: at the time of writing, it provides a single point of access to scientific literature collected from over ten thousand data providers worldwide, and it is constantly growing. It offers several ways of accessing its data for both users and machines, including a free API and a complete dump of its data.

As of January 2023, there are 4,700 registered API users and 2,880 registered dataset users, and more than 70 institutions have registered to use the CORE Recommender in their repository systems.

The main contributions of this work are CORE’s continuously growing dataset and the tools and services built on top of this corpus. In this paper, we describe the motivation behind the dataset’s creation and the challenges and methods of assembling it and keeping it continuously up to date. Creating a collection of research papers at this scale required innovative solutions for harvesting and resource management. Our key innovations in this area, which have improved the process of aggregating research literature, include:

Devising methods to extend the functionality of existing, widely adopted metadata exchange protocols, which were not designed for content harvesting, to enable efficient harvesting of research papers’ full texts.

Developing a novel harvesting approach (referred to here as CHARS) which allows us to continuously utilise the available compute resources while providing improved horizontal scalability, recoverability, and reliability.

Designing an efficient algorithm for scheduling updates of harvested resources which optimises the recency of our data while effectively utilising the compute resources available to us.

This paper is organised as follows. First, in the remainder of this section, we present several use cases requiring large scale text and data mining of scientific literature, and explain the challenges in obtaining data for these tasks. Next, we present the data offered by CORE and our approach for systematically gathering full text open access articles from thousands of repositories and key scientific publishers.

Terminology

In digital libraries, the term record is typically used to denote a digital object such as a text, image, or video. In this paper, when referring to data in CORE, we use the term metadata record to refer to the metadata of a research publication (such as its title, authors, abstract, and project funding details), and the term full text record to describe a metadata record which has an associated full text.

We use the term data provider to refer to any database or dataset from which we harvest records. Data providers harvested by CORE include disciplinary and institutional repositories, publishers, and other databases.
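To make the three terms concrete, the sketch below models them as Python dataclasses. The class and field names are illustrative assumptions of ours and do not reflect CORE’s internal schema; the only rule encoded is the one stated above, namely that a full text record is a metadata record with an associated full text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataProvider:
    """Any database or dataset from which records are harvested."""
    name: str
    kind: str  # illustrative values: "institutional repository", "publisher", ...


@dataclass
class MetadataRecord:
    """Metadata of one research publication (title, authors, abstract, ...)."""
    title: str
    authors: List[str]
    provider: DataProvider
    abstract: Optional[str] = None
    full_text: Optional[str] = None  # set only if a full text was obtained

    @property
    def is_full_text_record(self) -> bool:
        # A full text record is simply a metadata record with an associated full text.
        return self.full_text is not None


repo = DataProvider("Example University Repository", "institutional repository")
metadata_only = MetadataRecord("Paper A", ["Doe, J."], repo)
with_text = MetadataRecord("Paper B", ["Roe, R."], repo, full_text="Introduction ...")
print(metadata_only.is_full_text_record, with_text.is_full_text_record)  # False True
```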

When talking about open access (OA) to scientific literature, we refer to the Budapest Open Access Initiative (BOAI) definition, which defines OA as “free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose” (https://www.budapestopenaccessinitiative.org/read). There are two routes to open access: 1) OA repositories and 2) OA journals. The first is achieved by self-archiving (depositing) publications in repositories (green OA), and the second by publishing articles directly in OA journals (gold OA).

Text and Data Mining of Scientific Literature

Text and data mining (TDM) is the discovery by computer of new, previously unknown information by automatically extracting information from different written resources (http://bit.ly/jisc-textm). The broad goal of TDM of scientific literature is to build tools that can retrieve useful information from digital documents, improve access to these documents, or use these documents to support scientific discovery. OA and TDM of scientific literature have one thing in common: they both aim to improve people’s access to scientific knowledge. While OA aims to widen the availability of research, TDM aims to improve our ability to discover, understand, and interpret scientific knowledge.

TDM of scientific literature is being used in a growing number of applications, many of which were until recently not viable due to the difficulty of accessing data from across many publishers and other data providers. Because many TDM use cases can only realise their full potential when executed on as large a corpus of research papers as possible, these data access difficulties have made many of the use cases described below very difficult to achieve. For example, reliably detecting plagiarism in newly submitted publications requires access to an always up-to-date dataset of published literature spanning all disciplines. Based on their data needs, scientific literature TDM use cases can be broadly divided into the following two categories, shown in Fig. 1:

A priori defined sample use cases: Use cases which require access to a subset of scientific publications that can be specified prior to the execution of the use case. Gathering the list of all trialled treatments for a particular disease in the period 2000–2010 is a typical example.

Undefined sample use cases: Use cases which cannot be completed using data samples defined a priori. Their execution might require access to data not known before execution, or to all available data. Plagiarism detection is a typical example of such a use case.
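The difference between the two categories can be illustrated with a small filter. In an a priori defined sample, the selection criteria (here a keyword and an inclusive year range, echoing the 2000–2010 treatment example) are fixed before execution, so only matching papers are needed. The `Paper` type and the toy corpus are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Paper:
    title: str
    year: int
    text: str


def a_priori_sample(papers: Iterable[Paper], keyword: str,
                    start: int, end: int) -> List[Paper]:
    """Select papers using a sample definition fixed before execution:
    a keyword plus an inclusive publication-year range."""
    kw = keyword.lower()
    return [p for p in papers if start <= p.year <= end and kw in p.text.lower()]


corpus = [
    Paper("Trial A", 2004, "A randomised trial of drug X for malaria."),
    Paper("Trial B", 2015, "A randomised trial of drug Y for malaria."),
    Paper("Survey C", 2008, "A survey of tuberculosis treatments."),
]
sample = a_priori_sample(corpus, "malaria", 2000, 2010)
print([p.title for p in sample])  # ['Trial A']
```

An undefined sample use case such as plagiarism detection has no comparable predicate: a new submission may need to be compared against every document in the collection, which is why such use cases depend on access to the full, up-to-date corpus.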

Figure 1

Example use cases of text and data mining of scientific literature. Depending on data needs, TDM use cases can be categorised into a) a priori defined sample use cases and b) undefined sample use cases. Furthermore, TDM use cases can broadly be categorised into 1) indirect applications, which aim to improve access to and organisation of literature, and 2) direct applications, which focus on answering specific questions or gaining insights.

However, a number of factors significantly complicate access to data for these applications. The data needed is often spread across many publishers, repositories, and other databases that frequently lack interoperability (these factors are discussed further in the next section). Consequently, researchers and developers working in these areas typically invest a considerable amount of time in corpus collection, up to 90% of the total investigation time 5. For many, this task can even prove impossible due to technical restrictions and limitations of publisher platforms, some of which are discussed in the next section. There is therefore a need for a global, continuously updated, and downloadable dataset of full text publications to enable such analysis.

Challenges in machine access to scientific literature

Probably the largest obstacle to the effective and timely retrieval of relevant research literature is that it may be stored in a wide variety of locations with little to no interoperability: repositories of individual institutions, publisher databases, conference and journal websites, preprint databases, and other locations, each of which typically offers different means of accessing its data. While repositories often implement a standard protocol for metadata harvesting, the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), publishers typically allow access to their data through custom-made APIs, which are not standardised and are subject to change 6. Other data sources may provide static data dumps in a variety of formats or offer no programmatic access to their data at all.
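For readers unfamiliar with OAI-PMH, the sketch below shows a minimal metadata harvest using the protocol’s `ListRecords` verb and resumption tokens, based on the OAI-PMH 2.0 specification. It is not CORE’s harvester; the `fetch` parameter is a hypothetical hook that lets the paging logic be exercised without network access.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, token: Optional[str] = None,
                     metadata_prefix: str = "oai_dc") -> str:
    # resumptionToken is an exclusive argument: it is sent alone with the verb.
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def http_get(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def harvest(base_url: str,
            fetch: Callable[[str], bytes] = http_get) -> Iterator[ET.Element]:
    """Yield <record> elements page by page, following resumption tokens."""
    token = None
    while True:
        root = ET.fromstring(fetch(list_records_url(base_url, token)))
        yield from root.iterfind(".//oai:record", OAI_NS)
        tok = root.find(".//oai:resumptionToken", OAI_NS)
        # An absent or empty token marks the final page of the list.
        if tok is None or not (tok.text or "").strip():
            return
        token = tok.text.strip()
```

Each request depends on the token returned by the previous response, so the pages of one repository must be fetched strictly in sequence.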

However, even when publication metadata can be obtained, other steps involved in the data collection process complicate the creation of a final dataset suitable for TDM applications. For example, the identification of scientific publications within all downloaded documents, matching these publications correctly to the original publication metadata, and their conversion from formats used in publishing, such as the PDF format, into a textual representation suitable for text and data mining, are just some of the additional difficulties involved in this process. The typical minimum steps involved in this process are illustrated in Fig.  2 . As there are no widely adopted solutions providing interoperability across different platforms, custom harvesting solutions need to be created for each.

Figure 2

Example illustration of the data collection process. The figure depicts the typical minimum steps which are necessary to produce a dataset for TDM of scientific literature. Depending on the use case, tens or hundreds of different data sources may need to be accessed, each potentially requiring a different process, for example a different set of API methods or a different procedure for downloading publication full texts. Furthermore, depending on the use case, additional steps may be needed, such as extraction of references, identification of duplicate items, or detection of the publication’s language. In the context of CORE, we provide the details of this process in the Methods section.
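The stages in Fig. 2 can be thought of as a chain of filters and transformations applied to each downloaded file. The sketch below is a hypothetical illustration, not CORE’s pipeline: it keeps files with a PDF signature (a crude stand-in for identifying publications), matches each file to the metadata record that linked to it, and drops byte-identical duplicates as an example of an optional extra step. PDF-to-text conversion is omitted because it requires a third-party parser.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Optional

Doc = Dict[str, object]
Stage = Callable[[Doc], Optional[Doc]]  # a stage returns None to drop a document


def run_pipeline(docs: Iterable[Doc], stages: List[Stage]) -> List[Doc]:
    """Pass each downloaded document through the stages in order."""
    kept = []
    for doc in docs:
        current: Optional[Doc] = doc
        for stage in stages:
            current = stage(current)
            if current is None:
                break
        if current is not None:
            kept.append(current)
    return kept


def keep_pdfs(doc: Doc) -> Optional[Doc]:
    # Crude stand-in for publication identification: keep files with a PDF signature.
    content = doc.get("content")
    return doc if isinstance(content, bytes) and content.startswith(b"%PDF") else None


def make_metadata_matcher(metadata_by_url: Dict[str, Doc]) -> Stage:
    # Link a downloaded file back to the metadata record whose link pointed to it.
    def match(doc: Doc) -> Optional[Doc]:
        meta = metadata_by_url.get(str(doc["url"]))
        return {**doc, "metadata": meta} if meta else None
    return match


def make_deduplicator() -> Stage:
    # Optional extra step: drop files whose bytes were already seen.
    seen = set()

    def dedup(doc: Doc) -> Optional[Doc]:
        digest = hashlib.sha256(doc["content"]).hexdigest()
        if digest in seen:
            return None
        seen.add(digest)
        return doc
    return dedup


meta = {"http://repo.example/1.pdf": {"title": "Paper 1"}}
downloads = [
    {"url": "http://repo.example/1.pdf", "content": b"%PDF-1.7 ..."},
    {"url": "http://repo.example/1.pdf", "content": b"%PDF-1.7 ..."},  # duplicate
    {"url": "http://repo.example/page.html", "content": b"<html>"},
]
result = run_pipeline(downloads, [keep_pdfs, make_metadata_matcher(meta), make_deduplicator()])
print(len(result))  # 1
```

In practice, each data source may need its own download stage, which is why custom solutions have to be built per platform.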

Challenges in systematically gathering open access research literature

Open access journals and repositories are increasingly becoming the central providers of open access content, in part thanks to the introduction of funder and institutional open access policies 7. Open access repositories include institutional repositories such as the University of Cambridge Repository (https://www.repository.cam.ac.uk/) and subject repositories such as arXiv (https://arxiv.org/). As of February 2023, there are 6,015 open access repositories indexed in the Directory of Open Access Repositories (OpenDOAR, http://v2.sherpa.ac.uk/opendoar/), as well as 18,935 open access journals indexed in the Directory of Open Access Journals (DOAJ, https://doaj.org/). However, open access research literature can be stored in a wide variety of other locations, including publisher and conference websites, individual researchers’ websites, and elsewhere. Consequently, a system for harvesting open access content needs to be able to harvest effectively from thousands of data providers. Furthermore, a large number of open access repositories (69.4% of repositories indexed in OpenDOAR as of January 2018) expose their data through the OAI-PMH protocol, often without providing any alternative. An open access harvesting system therefore also needs to be able to use OAI-PMH effectively for content harvesting. However, these two requirements, harvesting from thousands of data providers and using OAI-PMH for content harvesting, pose a number of significant scalability challenges.

Challenges related to harvesting from thousands of data providers

Open access data providers vary greatly in size, with some hosting millions of documents while others host a significantly lower number. New documents are added and old documents are often updated by data providers daily.

Different geographic locations and internet connection speeds may result in vastly different times needed to harvest from different providers, even when they hold the same number of publications. As illustrated in Table 1, the OAI-PMH implementations of commonly used repository platforms also differ considerably in harvesting performance. To construct this table, we analysed the OAI-PMH metadata harvesting performance of 1,439 repositories in CORE, covering eight different repository platforms. It should be noted that the OAI-PMH protocol only requires metadata to be expressed in the Dublin Core (DC) format, but it can be extended to express metadata in other formats. Because the Dublin Core standard is constrained to just 15 elements, it is not uncommon for OAI-PMH repositories to also use an extended metadata format such as Rioxx (https://rioxx.net) or the OpenAIRE Guidelines (https://www.openaire.eu/openaire-guidelines-for-literature-institutional-and-thematic-repositories).
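The sketch below parses an unqualified Dublin Core (`oai_dc`) record into its 15 elements, which are listed as defined in the Dublin Core Metadata Element Set, version 1.1. The sample record is invented. It illustrates why richer formats are attractive: DC has no dedicated full text field, so a link to a PDF typically ends up in a generic element such as `identifier`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

DC = "http://purl.org/dc/elements/1.1/"
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = ("title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights")


def parse_oai_dc(xml_text: str) -> Dict[str, List[str]]:
    """Collect the values of each DC element; every element is repeatable."""
    root = ET.fromstring(xml_text)
    return {name: [el.text.strip() for el in root.iter(f"{{{DC}}}{name}") if el.text]
            for name in DC_ELEMENTS}


record = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example paper</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:identifier>https://repo.example.org/id/eprint/42/</dc:identifier>
  <dc:identifier>https://repo.example.org/id/eprint/42/1/paper.pdf</dc:identifier>
</oai_dc:dc>"""
fields = parse_oai_dc(record)
print(fields["identifier"])
# ['https://repo.example.org/id/eprint/42/', 'https://repo.example.org/id/eprint/42/1/paper.pdf']
```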

Additionally, harvesting is limited not only by factors related to the data providers but also by the compute resources (hardware) available to the aggregator. As many use cases listed in the Introduction, such as plagiarism detection or systematic review automation, require access to very recent data, keeping the harvested data recent and using the compute resources efficiently both pose significant challenges.

To overcome these challenges, we designed the CORE Harvesting System (CHARS), which relies on two key principles. The first is the application of microservices software principles to open access content harvesting 8. The second is a strategy we call pro-active harvesting, in which providers are scheduled automatically according to current need. This strategy is implemented in the harvesting Scheduler (see the CHARS architecture section), which uses a formula we designed for prioritising data providers.

The combination of the Scheduler with the CHARS microservices architecture enables us to schedule harvesting according to current compute resource utilisation, greatly increasing our harvesting efficiency. Since switching from a fixed-schedule approach to pro-active harvesting, we have greatly improved the recency of our data and tripled the size of the collection within three years.
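CORE’s prioritisation formula is not given in this section, so the sketch below uses a hypothetical score purely to illustrate the shape of pro-active scheduling: rank providers by how much they are likely to need a harvest relative to its cost, then fill only the workers that are currently free. The `Provider` fields and the `priority` formula are our assumptions, not CORE’s.

```python
import heapq
from dataclasses import dataclass
from typing import List


@dataclass
class Provider:
    name: str
    days_since_harvest: float
    new_records_per_day: float   # hypothetical estimate from previous harvests
    est_harvest_hours: float     # hypothetical estimate of one harvest's cost


def priority(p: Provider) -> float:
    # HYPOTHETICAL score, not CORE's formula: expected number of records
    # missed so far per hour of compute needed to catch up.
    return p.days_since_harvest * p.new_records_per_day / max(p.est_harvest_hours, 0.1)


def schedule(providers: List[Provider], free_workers: int) -> List[str]:
    """Pick the highest-priority providers, one per currently free worker."""
    return [p.name for p in heapq.nlargest(free_workers, providers, key=priority)]


providers = [
    Provider("big-repo", days_since_harvest=7, new_records_per_day=500, est_harvest_hours=20),
    Provider("small-repo", days_since_harvest=30, new_records_per_day=5, est_harvest_hours=0.5),
    Provider("journal", days_since_harvest=2, new_records_per_day=50, est_harvest_hours=1),
]
print(schedule(providers, free_workers=2))  # ['small-repo', 'big-repo']
```

Because the number of workers is an input, the same ranking adapts to current resource utilisation rather than to a fixed calendar.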

Challenges related to the use of OAI-PMH protocol for content harvesting

As explained above, OAI-PMH is currently the standard method for exchanging data across repositories. Although the OAI-PMH protocol was originally designed for metadata harvesting only, its wide adoption and the lack of alternatives have led to its use as an entry point for full text harvesting. Full text harvesting is achieved by extracting URLs from the metadata records collected through OAI-PMH; the extracted URLs are then used to discover the location of the actual resource 9. However, a number of limitations make the OAI-PMH protocol unsuitable for large-scale content harvesting:

It directly supports only metadata harvesting, meaning additional functionality has to be implemented in order to use it for content harvesting.

The location of full text links in OAI-PMH metadata is not standardised, and OAI-PMH metadata records typically contain multiple links. From the metadata it is not clear which of these links points to the described representation of the resource, and in many cases none of them does so directly. Therefore, all possible links to the resource have to be extracted from the metadata and tested to identify the correct resource. Furthermore, OAI-PMH provides no means of validating that the discovered resource is truly the described resource. To overcome these issues, the adoption of the RIOXX metadata format (https://rioxx.net/) or the OpenAIRE guidelines (https://guidelines.openaire.eu/) has been promoted. However, the issue of unambiguously connecting metadata records and the described resource remains.

The architecture of the OAI-PMH protocol is inherently sequential, which makes it ill-suited for harvesting from very large repositories. This is because the processing of large repositories cannot be parallelised and it is not possible to recover the harvesting in case of failures.

Scalability differs dramatically across OAI-PMH implementations. Our analysis (Table 1) shows that performance can differ significantly even when only a single repository software is considered 10.

Other limitations include difficulties in incremental harvesting, reliability issues, metadata interoperability issues, and scalability issues 11 .
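The second limitation can be illustrated with a sketch of the extract-and-test approach: collect every URL found in a record’s metadata values, try the most promising ones first, and accept the first one whose content actually starts with a PDF signature. The ordering heuristic and the injectable `fetch` function are assumptions for illustration; CORE’s own workarounds are described in the Methods section.

```python
import re
from typing import Callable, Iterable, List, Optional

URL_RE = re.compile(r"https?://\S+")


def candidate_links(values: Iterable[str]) -> List[str]:
    """Extract every URL from metadata values, most promising first."""
    urls: List[str] = []
    for value in values:
        for url in URL_RE.findall(value):
            if url not in urls:
                urls.append(url)
    # Heuristic ordering (an assumption): links that look like PDFs come first.
    # sorted() is stable, so the original order is kept within each group.
    return sorted(urls, key=lambda u: not u.lower().endswith(".pdf"))


def find_full_text(values: Iterable[str],
                   fetch: Callable[[str], bytes]) -> Optional[str]:
    """Return the first candidate whose content looks like a PDF."""
    for url in candidate_links(values):
        try:
            head = fetch(url)[:5]
        except OSError:
            continue  # unreachable link: try the next candidate
        if head.startswith(b"%PDF"):
            return url
    return None
```

Note that this only checks that a file is a PDF, not that it is the publication the record describes; confirming that (for example, by matching the title against the extracted text) would be a further step, which reflects the validation gap noted above.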

We have designed solutions to a number of these issues, which have enabled us to use OAI-PMH efficiently and effectively to harvest open access content from repositories. We present these solutions in the section Using OAI-PMH for content harvesting. While we currently rely on a variety of solutions and workarounds to enable content harvesting through OAI-PMH, most of the limitations listed in this section could also be addressed by adopting more sophisticated data exchange protocols in the systems of the data providers we support, such as ResourceSync (http://www.openarchives.org/rs/1.1/resourcesync), which was designed with content harvesting in mind 10.
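As a point of contrast with OAI-PMH, a ResourceSync resource list is a Sitemap-based XML document that enumerates resources directly, each with its URL and last-modification time. The sketch below, an illustration rather than CORE’s FastSync code, selects the resources changed since a previous sync. Because every entry is self-contained, the resulting downloads can be distributed across workers instead of being paged through sequentially.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import List

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def changed_since(resource_list_xml: str, last_sync: datetime) -> List[str]:
    """Return URLs of resources whose lastmod is later than the previous sync."""
    root = ET.fromstring(resource_list_xml)
    urls = []
    for url_el in root.findall("sm:url", NS):
        loc = url_el.findtext("sm:loc", namespaces=NS)
        lastmod = url_el.findtext("sm:lastmod", namespaces=NS)
        if loc and lastmod:
            # fromisoformat() does not accept a trailing "Z" before Python 3.11.
            modified = datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
            if modified > last_sync:
                urls.append(loc)
    return urls


resource_list = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" at="2023-02-01T00:00:00Z"/>
  <url><loc>https://repo.example.org/1.pdf</loc><lastmod>2022-12-01T00:00:00Z</lastmod>
    <rs:md type="application/pdf"/></url>
  <url><loc>https://repo.example.org/2.pdf</loc><lastmod>2023-01-15T00:00:00Z</lastmod>
    <rs:md type="application/pdf"/></url>
</urlset>"""
print(changed_since(resource_list, datetime(2023, 1, 1, tzinfo=timezone.utc)))
# ['https://repo.example.org/2.pdf']
```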

Our solution

In the above sections, we have highlighted a critical need among many researchers and organisations worldwide for large-scale, always up-to-date, seamless machine access to the full texts of scientific literature originating from thousands of data providers. Providing this seamless access has become both a defining goal and a feature of CORE, and it has enabled other researchers to design and test innovative methods on CORE data, often powered by artificial intelligence. To put together this vast, continuously updated dataset, we had to overcome a number of research challenges, such as those related to the lack of interoperability, scalability, regular content synchronisation, and content redundancy and inconsistency. Our key innovation in this area is the improvement of the process of aggregating research literature, as specified in the Introduction.

This underpinning research has allowed CORE to become a leading provider of open access papers. The amount of data made available by CORE has been growing since 2011 12 and is continuously kept up to date. As of February 2023, CORE provides access to over 291 million metadata records and 32.8 million full text open access articles, making it the world’s largest archive of open access research papers, significantly larger than PubMed, arXiv and JSTOR datasets.

Whilst there are other publication databases that could initially be viewed as similar to CORE, such as BASE or Unpaywall, we will demonstrate the significant differences that set CORE apart and show how it provides access to a unique, harmonised corpus of Open Access literature. A major difference between CORE and these existing services is that CORE is completely free to use for the end user, hosts full text content, and offers several methods for accessing its data for machine processing. Because CORE provides plain text access to the full texts via its raw data services, text and data miners no longer need to harvest and pre-process full texts themselves or work with PDF files. A detailed comparison with other publication databases is provided in the Discussion. In addition, CORE enables building powerful services on top of the collected full texts, supporting all the categories of use cases outlined in the Use cases section.

As of today, CORE provides three services for accessing its raw data: API, dataset, and a FastSync service. The CORE API provides real-time machine access to both metadata and full texts of research papers. It is intended for building applications that need reliable access to a fraction of CORE data at any time. CORE provides a RESTful API. Users can register for an API key to access the service. Full documentation and Python notebooks containing code examples can be found on the CORE documentation pages online ( https://api.core.ac.uk/docs/v3 ). The CORE Dataset can be used to download CORE data in bulk. Finally, CORE FastSync enables third party systems to keep an always up to date copy of all CORE data within their infrastructure. Content can be transferred as soon as it becomes available in CORE using a data synchronisation service on top of the ResourceSync protocol 13 optimised by us for improved synchronisation scalability with an on-demand resource dumps capability. CORE FastSync provides fast, incremental and enterprise data synchronisation.
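A minimal sketch of preparing a search request against the CORE API v3 with Python’s standard library is shown below. The base URL, the `search/works` path, the `q` and `limit` parameters, and the Bearer-token header reflect our reading of the v3 documentation linked above and should be verified there; `YOUR_API_KEY` is a placeholder for a registered key.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.core.ac.uk/v3"  # verify against https://api.core.ac.uk/docs/v3


def build_search_request(api_key: str, query: str, limit: int = 10) -> urllib.request.Request:
    """Prepare (but do not send) a search request for works matching `query`."""
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    return urllib.request.Request(
        f"{API_BASE}/search/works?{params}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def search(api_key: str, query: str, limit: int = 10) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_search_request(api_key, query, limit), timeout=30) as resp:
        return json.load(resp)


# Example (requires network access and a registered key):
# results = search("YOUR_API_KEY", "text mining", limit=5)
```

The structure of the returned JSON, including the field that holds the result list, is described in the API documentation; for bulk access, the CORE Dataset or FastSync are the intended routes rather than repeated API calls.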

CORE is the largest up-to-date full text open access dataset as well as one of the most widely used services worldwide supporting access to freely available research literature. CORE regularly releases data dumps licensed under ODC-By, making the data freely available for both commercial and non-commercial purposes. Access to CORE data via the API is provided free of charge to individuals working in their own personal capacity and to public research organisations for unfunded research purposes. CORE offers licenses to commercial organisations that want a convenient way of accessing CORE data with a guaranteed level of service support. CORE is operated as a not-for-profit entity by The Open University, and this business model allows CORE to remain free for more than 99.99% of its users.

A large number of commercial organisations have benefited from these licenses in areas as diverse as plagiarism detection in research, building specialised scholarly publication search engines, developing scientific assistants and machine translation systems, and supporting education (https://core.ac.uk/about/endorsements/partner-projects). The CORE data services, the CORE API and Dataset, have been used by over 7,000 experts to analyse data, develop text-mining applications, and embed CORE into existing production systems.

Additionally, more than 70 repository systems have registered to use the CORE Recommender, and the service is used by prestigious institutions, including the University of Cambridge, and by popular preprint services such as arXiv.org. Other CORE services include CORE Discovery and the CORE Repository Dashboard. The former was released in July 2019 and at the time of writing has more than 5,000 users. The latter is a tool designed specifically for repository managers, providing access to a range of tools for managing the content within their repositories. The CORE Repository Dashboard is currently used by 499 users from 36 countries.

In the rest of this paper we describe the CORE dataset and the methods of assembling it and keeping it continuously up-to-date. We also present the services and tools built on top of the aggregated corpus and provide several examples of how the CORE dataset has been used to create real-world applications addressing specific use-cases.

As highlighted in the Introduction, CORE is a continuously growing dataset of scientific publications for both human and machine processing. As we will show in this section, it is a global dataset spanning all disciplines and containing publications aggregated from more than ten thousand data providers including disciplinary and institutional repositories, publishers, and other databases. To improve access to the collected publications, CORE performs a number of data enrichment steps. These include metadata and full text extraction, language and DOI detection, and linking with other databases. Furthermore, CORE provides a number of services which are built on top of the data: a publications recommender ( https://core.ac.uk/services/recommender/ ), CORE Discovery service ( https://core.ac.uk/services/discovery/ ) (a tool for discovering OA versions of scientific publications), and a dashboard for repository managers ( https://core.ac.uk/services/repository-dashboard/ ).

Dataset size

As of February 2023, CORE is the world’s largest dataset of open access papers (a comparison with other systems is provided in the Discussion). CORE hosts over 291 million metadata records, including over 34 million articles with full text, written in 82 languages and aggregated from over ten thousand data providers located in 150 countries. Full details of the CORE Dataset size are presented in Table 2. In the table, “Metadata records” represents all valid (not retracted, deleted, or otherwise withdrawn) records in CORE. About 13% of records in CORE contain full text; this figure covers records for which a manuscript was successfully downloaded and converted to plain text. However, a much higher proportion of records contain links to additional freely available full text articles hosted by third-party providers. Based on an analysis of a subset of our data, we estimate that about 48% of metadata records in CORE fall into this category, indicating that CORE is likely to contain links to open access full texts for 139 million articles.

Due to the nature of academic publishing, multiple versions of the same paper are sometimes deposited in different repositories. For example, an author may deposit an early version of an article on a preprint server such as arXiv or bioRxiv and later upload a revised version to an institutional repository. Identifying and matching these versions is a significant undertaking. CORE has developed techniques based on locality-sensitive hashing for duplicate identification 8 and integrated them into its ingestion pipeline to link versions of papers from across the network of OA repositories and group them under a single works entity.

The large number of records translates directly into the size of the dataset in bytes: the uncompressed version of the dataset including PDFs is about 100 TB, while the version with plain texts only amounts to 393 GB compressed and 3.5 TB uncompressed.
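The exact technique CORE uses is not detailed here, so the sketch below substitutes MinHash with banding, a standard locality-sensitive hashing scheme for near-duplicate text detection. Each document is reduced to a short signature, and documents whose signatures agree on at least one whole band become candidate duplicates. The shingle size, signature length, and number of bands are arbitrary choices for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

NUM_HASHES = 64           # signature length (arbitrary choice)
BANDS = 16                # 16 bands x 4 rows; more bands -> more candidate pairs
ROWS = NUM_HASHES // BANDS


def shingles(text: str, k: int = 3) -> Set[str]:
    """Overlapping k-word shingles of a lower-cased text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def seeded_hash(seed: int, value: str) -> int:
    # Deterministic 64-bit hash; Python's built-in hash() is salted per process.
    digest = hashlib.blake2b(f"{seed}:{value}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(text: str) -> List[int]:
    """One minimum per hash function; the fraction of equal positions
    between two signatures estimates the Jaccard similarity of the shingle sets."""
    sh = shingles(text)
    return [min(seeded_hash(seed, s) for s in sh) for seed in range(NUM_HASHES)]


def candidate_pairs(docs: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Pairs of documents whose signatures agree on at least one whole band."""
    buckets: Dict[Tuple[int, Tuple[int, ...]], List[str]] = defaultdict(list)
    for doc_id, text in docs.items():
        signature = minhash(text)
        for band in range(BANDS):
            rows = tuple(signature[band * ROWS:(band + 1) * ROWS])
            buckets[(band, rows)].append(doc_id)
    pairs: Set[Tuple[str, str]] = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


base = " ".join(f"w{i}" for i in range(40))  # stand-in for a paper's text
docs = {
    "preprint": base,
    "repository": base.replace("w39", "w39b"),  # near-identical later version
    "unrelated": " ".join(f"z{i}" for i in range(40)),
}
print(candidate_pairs(docs))
```

Bucketing avoids comparing every pair of documents, which matters at the scale of hundreds of millions of records. Candidate pairs would normally be verified with a more exact comparison before versions are grouped under one work.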

Recent studies have estimated that around 24%–28% of all articles are free to read 2, 14. There are a number of reasons why the proportion of full text content in CORE is lower than these estimates. The main reason is likely that a significant proportion of free-to-read articles are hosted on platforms that heavily restrict machine access; for example, some repositories severely restrict or fully prohibit content harvesting 9.

The growth of CORE has been made possible by the introduction of a novel harvesting system and an efficient harvesting scheduler, both of which are described in the Methods section. The growth of metadata and full text records in CORE is shown in Fig. 3, and Fig. 4 shows the age of publications in CORE.

Figure 3

Growth of records in CORE per month since February 2012. “Full text growth” represents growth of records containing full text, while “Metadata growth” represents growth of records without full text, i.e. the two numbers do not overlap. The two area plots are stacked on top of each other, their sum therefore represents the total number of records in CORE.

Figure 4

Age of publications in CORE. As in Fig. 3, the “Metadata” and “Full text” bars are stacked on top of each other.

Data sources and languages

As of February 2023, CORE was aggregating content from 10,744 data sources. These include institutional repositories (for example, the USC Digital Library or the University of Michigan Library Repository), academic publishers (Elsevier, Springer), open access journals (PLOS), subject repositories, including those hosting eprints (arXiv, bioRxiv, ZENODO, PubMed Central), and aggregators (e.g. DOAJ). The ten largest data sources in CORE are shown in Table 3. When calculating the total number of data providers in CORE, we count each aggregator and publisher as one data source, even though each aggregates data from multiple sources. A full list of all data providers can be found on the CORE website (https://core.ac.uk/data-providers).

The data providers aggregated by CORE are located in 150 different countries. Figure 5 shows the top ten countries by number of data providers aggregated by CORE, alongside the top ten languages. The geographic spread of repositories largely reflects the size of the research economy in those countries: the US, Japan, Germany, Brazil and the UK are all in the top six. One result that may at first appear surprising is the large number of repositories in Indonesia, enough to place it at the top of the list. An article in Nature in 2019 (https://www.nature.com/articles/d41586-019-01536-5) showed that Indonesia may be the world’s OA leader, finding that 81% of 20,000 journal articles published in 2017 with an Indonesia-affiliated author were available to read for free somewhere online. Additionally, a large number of Indonesian open access journals are registered with Crossref, which in turn leads to a much higher number of individual repositories in this country.

figure 5

Top ten languages and top ten provider locations in CORE.

As part of the enrichment process, CORE performs language detection. Language is extracted from the attached metadata where available and is otherwise identified automatically from the full text. More than 80% of all documents with language information are in English. Overall, CORE contains publications in a variety of languages, the top 10 of which are shown in Fig. 5.

Document types

The CORE dataset comprises documents gathered from various sources, many of which contain articles of different types. Consequently, aside from research articles from journals and conferences, it includes other types of research outputs such as research theses, presentations, and technical reports. To distinguish between these types, CORE has implemented a method of automatically classifying documents into one of the following four categories 15 : (1) research article, (2) thesis, (3) presentation, (4) unknown (for articles not belonging to any of the previous three categories). This method is based on a supervised machine learning model trained on article full texts. Figure 6 shows the distribution of articles in CORE across these four categories. It can be seen that the collection aggregated by CORE consists predominantly of research articles. We have observed in the data collected from repositories that the vast majority of research theses deposited in repositories have full text associated with the metadata. As this is not always the case for research articles, and as Fig. 6 is based on articles with full text only, we expect the proportion of research articles relative to research theses to be even higher across the entire collection.

figure 6

Distribution of document types.
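To make the idea of a supervised document-type classifier concrete, a minimal sketch follows. The text does not specify the model CORE uses, so this swaps in a toy multinomial Naive Bayes with add-one smoothing; the class name DocumentTypeClassifier and the training snippets are hypothetical, and the sketch has no "unknown" fallback (a real system might train that class or threshold the posterior).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list:
    """Lower-case alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class DocumentTypeClassifier:
    """Toy multinomial Naive Bayes with add-one smoothing (a stand-in, not CORE's model)."""

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)            # documents per class (priors)
        self.word_counts = defaultdict(Counter)      # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.class_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_docs.values())
        tokens = [t for t in tokenize(text) if t in self.vocab]  # ignore unseen words

        def log_posterior(label):
            denom = self.class_words[label] + len(self.vocab)
            return math.log(self.class_docs[label] / n_docs) + sum(
                math.log((self.word_counts[label][t] + 1) / denom) for t in tokens
            )

        return max(self.class_docs, key=log_posterior)
```

In practice such a model would be trained on many thousands of labelled full texts; the point here is only the shape of the approach: learn per-class word statistics, then pick the most probable class for a new document.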

Research disciplines

To analyse the distribution of disciplines in CORE we have leveraged a third-party service. Figure  7 shows a subject distribution of a sample of 20,758,666 publications in CORE. For publications with multiple subjects we count the publication towards each discipline.

figure 7

Subject distribution of a sample of 20,758,666 CORE publications.

The subject for each article was obtained using Microsoft Academic ( https://academic.microsoft.com/home ) prior to its retirement in November 2021. Our results are consistent with other studies, which have reported Biology, Medicine, and Physics to be the largest disciplines in terms of number of publications 16 , 17 , suggesting that the distribution of articles in CORE is representative of research publications in general.

Additional CORE Tools and Services

CORE has built several additional tools for a range of stakeholders, including institutions, repository managers and researchers from across all scientific domains. Details of the usage of these services are covered in the Uptake of CORE section.

The Dashboard provides a suite of tools for repository management, content enrichment, metadata quality assessment and open access compliance checking. Further, it can provide statistics regarding content downloads and suggestions for improving the efficiency of harvesting and the quality of metadata.

CORE Discovery helps users find freely accessible copies of research papers. It can be used in several ways: first, as a repository plugin that enriches metadata-only pages with links to open access copies of the full text; second, as a browser extension for researchers and anyone else interested in reading scientific documents; and finally, as an API service for developers.

Recommender

The recommender is a plugin for repositories, journal systems and web interfaces that suggests articles relevant to the one currently displayed. Its purpose is to support users in discovering articles of interest from across the network of open access repositories. It is used by institutions including the University of Cambridge and by popular preprint services such as arXiv.org.

OAI Resolver

An OAI (Open Archives Initiative) identifier is a unique identifier of a metadata record. OAI identifiers are used in the context of repositories that implement the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). OAI identifiers are viable persistent identifiers for repositories: unlike DOIs, they can be minted in a distributed fashion and at no cost, and they can resolve directly to the repository rather than to the publisher. The CORE OAI Resolver can resolve any OAI identifier either to the metadata page of the record in CORE or directly to the relevant repository page. This approach has the potential to increase the importance of repositories in the process of disseminating knowledge.
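The OAI identifier format guidelines define the syntax oai:<namespace-identifier>:<local-identifier>, where the namespace identifier is a domain-name-like string and the local identifier may itself contain colons. A minimal sketch of how a resolver might parse such an identifier and route it is shown below; the routing table REPOSITORY_PAGES and the base URL AGGREGATOR_BASE are hypothetical placeholders, not the CORE OAI Resolver's actual configuration.

```python
import re
from typing import Tuple
from urllib.parse import quote

# oai:<namespace-identifier>:<local-identifier>; each namespace label starts with a letter.
_OAI_ID = re.compile(r"^oai:([A-Za-z][A-Za-z0-9-]*(?:\.[A-Za-z][A-Za-z0-9-]*)+):(.+)$")

# Hypothetical routing table: namespace identifier -> template of the repository's record page.
REPOSITORY_PAGES = {
    "arXiv.org": "https://arxiv.org/abs/{local}",
}

# Hypothetical base URL standing in for a metadata page hosted by the aggregator.
AGGREGATOR_BASE = "https://aggregator.example/oai/"


def parse_oai_identifier(oai_id: str) -> Tuple[str, str]:
    """Split an OAI identifier into (namespace identifier, local identifier)."""
    match = _OAI_ID.match(oai_id.strip())
    if match is None:
        raise ValueError(f"not an OAI identifier: {oai_id!r}")
    return match.group(1), match.group(2)


def resolve(oai_id: str, prefer_repository: bool = True) -> str:
    """Route to the repository page when its URL pattern is known, else to the aggregator page."""
    namespace, local = parse_oai_identifier(oai_id)
    template = REPOSITORY_PAGES.get(namespace)
    if prefer_repository and template is not None:
        return template.format(local=local)
    # Keep ':' readable, percent-encode everything else that is unsafe in a path segment.
    return AGGREGATOR_BASE + quote(oai_id, safe=":")
```

For example, resolve("oai:arXiv.org:hep-th/9901001") would route to the arXiv abstract page, while an identifier from an unknown namespace falls back to the aggregator's metadata page.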

Uptake of CORE

As of February 2023, CORE averages over 40 million monthly active users and ranks among the top 10 websites in the Science and Education category according to SimilarWeb (https://www.similarweb.com/). There are currently 4,700 registered API users and 2,880 registered dataset users. The CORE Dashboard is currently used by 499 institutional repositories to manage their open access content, monitor content download statistics, manage metadata issues within the repository and ensure compliance with OA funder policies, notably REF in the U.K. The CORE Discovery plugin has been integrated into 434 repositories, and the browser extension has been downloaded by more than 5,000 users via the Google Chrome Web Store (https://chrome.google.com/webstore/category/extensions). The CORE Recommender has been embedded in 70 repository systems, including those of the University of Cambridge and arXiv.

In this section we discuss differences between CORE and other open access aggregation services and present several real-world use cases where CORE was used to develop services that support science. We also present our future plans.

Existing open access aggregation services

Currently there are a number of open access aggregation services available (Table 4), including BASE (https://base-search.net/), OpenAIRE (https://www.openaire.eu/), Unpaywall (http://unpaywall.org/) and Paperity (https://paperity.org/). BASE (Bielefeld Academic Search Engine) is a global metadata harvesting service. It harvests repositories and journals via OAI-PMH and exposes the harvested content through an API and a dataset. OpenAIRE is a network of open access data providers who support open access policies. Although the project previously focused on European repositories, it has recently expanded to include institutional and subject repositories from outside Europe. A key focus of OpenAIRE is to assist the European Commission in monitoring compliance with its open access policies. OpenAIRE data is exposed via an API. Paperity is a service which harvests publications from open access journals; it harvests both metadata and full text but does not host full texts. SHARE (Shared Access Research Ecosystem) is a harvester of open access content from US repositories. Its aim is to assist with compliance with the open access policies of the White House Office of Science and Technology Policy (OSTP). Although SHARE harvests both metadata and full text, it does not host the latter. Unpaywall is not primarily a harvester; rather, it starts from records in Crossref and identifies those for which a free-to-read version can be retrieved. It processes both metadata and full text but does not host them, and it exposes the discovered links to documents through an API.

CORE differs from these services in a number of ways. CORE is currently the largest database of full text OA documents. In addition, via its API, CORE offers a rich metadata record for each item in its collection, including additional enrichments, unlike, for example, Unpaywall’s API, which focuses only on telling the user whether a free-to-read version is available. CORE also provides the largest number of links to OA content. To simplify access for end users, it provides a number of ways of accessing its collection. All of the above services are free to use for research purposes; however, both CORE and Unpaywall also offer services to commercial partners on a paid-for basis.

Existing publication databases

Apart from OA aggregation services, a number of other services exist for searching and downloading scientific literature (Table 5). One of the main publication databases is Crossref (https://www.crossref.org/), an authoritative index of DOI identifiers. Its primary function is to maintain the metadata associated with each DOI. The metadata stored by Crossref includes both OA and non-OA records. Crossref does not store publication full texts, but for many publications it provides full text links. As of February 2023, 5.9 million records in Crossref were associated with an explicit Creative Commons license (we used the Crossref API to determine this number). Although Crossref provides an API, it does not offer its data for bulk download or provide a data sync service.

The remaining services from Table 5 can be roughly grouped into two categories: 1) citation indices and 2) academic search engines and scholarly graphs. The two major citation indices are Elsevier’s Scopus (https://www.elsevier.com/solutions/scopus) and Clarivate’s Web of Science (https://clarivate.com/webofsciencegroup/solutions/web-of-science/), both of which are premium subscription services. Google Scholar, the best-known academic search engine, does not provide an API for accessing its data and does not permit crawling its website. Semantic Scholar (https://www.semanticscholar.org/) is a relatively new academic search service which aims to create an “intelligent academic search engine” 18 . Dimensions (https://www.dimensions.ai/) is a service focused on data analysis that integrates publications, grants, policy documents, and metrics. 1findr (https://1findr.1science.com/home) is a curated abstract indexing service. It provides links to full text, but neither an API nor a dataset for download.

The added value of CORE

There are other services that claim to provide access to a large dataset of open access papers. In particular, Unpaywall 2 claims to provide access to 46.4 million free-to-read articles, and BASE states that it provides access to the full texts of about 60% of its 300 million metadata records. However, these statistics are not directly comparable to the numbers we report, as they reflect the different focus of these two projects. Both BASE and Unpaywall define “providing access to” in terms of having a list of URLs from which a human user can navigate to the full text of the resource. This means that neither Unpaywall nor BASE collects these full text resources, which is also why they do not face many of the challenges we described in the Introduction. Using this approach, we could say that the CORE Dataset provides access to approximately 139 million full texts, i.e. about 48% of our 291 million metadata records point to a URL from which a human can navigate to the full text. However, for people concerned with text and data mining of scientific literature, it makes little sense to count URLs pointing to many different domains on the Web as the number of full texts made available.

Our 32.8 million figure therefore refers to the number of OA documents that we have identified, downloaded, extracted text from and validated against their metadata records, and whose full texts we host on the CORE servers and make available to others. In contrast, BASE and Unpaywall do not aggregate the full texts of the resources they provide access to and consequently do not offer the means to interact with these full texts or to bulk download them for text analytics over scholarly literature.

We have also integrated CORE data with the OpenMinTeD infrastructure, a European Commission funded project which aimed to provide a platform for text mining of scholarly literature in the cloud 6 .

A number of academic and industry partners have utilised CORE in their services. In this section we present three existing uses of CORE that demonstrate how CORE can support text and data mining use cases.

Since 2017, CORE has been collaborating with a range of scholarly search and discovery systems. These include Naver (https://naver.com/), Lean Library (https://www.leanlibrary.com/) and Ontochem (https://ontochem.com/). As part of this work, CORE provides full text copies of research papers for existing records in these systems (Lean Library) or even supplies both metadata and full texts for indexing (Ontochem, NAVER). This collaboration also benefits CORE’s data providers, as it increases the reach and visibility of their content.

In 2019, CORE entered into a collaboration with Turnitin, a global leader in plagiarism detection software. By using the CORE FastSync service, Turnitin’s proprietary web crawler searches through CORE’s global database of open access content and metadata to check for text similarity. This partnership enables Turnitin to significantly enlarge its content database in a fast and efficient manner. In turn, it also helps protect open access content from misuse, thus protecting authors and institutions.

As of February 2023, CORE Recommender 19 is actively running in over 70 repositories, including the University of Cambridge institutional repository and arXiv.org. The purpose of the recommender is to improve the discoverability of research outputs by suggesting similar research papers both within the collection of the hosting repository and in the CORE collection. Repository managers can install the recommender to improve the accessibility of other scientific papers and reach other scientific communities, since the CORE Recommender acts as a gateway to millions of open access research papers. The recommender is integrated with the CORE search functionality and is offered as a plugin for all repository software (for example EPrints and DSpace), as well as for open access journals and any other web page. Because CORE harvests open repositories, the recommender only displays research articles whose full text is available as open access, i.e. for immediate use, without access barriers or rights restrictions. Through the recommender, CORE promotes the widest possible discoverability and distribution of open access scientific papers.

Future work

An ongoing goal of CORE is to keep growing the collection to become a single point of access to all of the world’s open access research. However, there are a number of other ways in which we plan to improve both the size of the collection and the ease of accessing it. The CORE Harvesting System was designed to enable adding new harvesting steps and enrichment tasks, and there remains scope for adding more such enrichments. Some of these are powered by machine learning, such as the classification of scientific citations 20 . Further, CORE is currently developing new methodologies to identify and link different versions of the same article. The proposed system, titled CORE Works, will leverage CORE’s central position in the OA infrastructure landscape and will link different versions of the same paper using a unique identifier. We will also continue linking the CORE collection to scholarly entities from other services, thereby making CORE data part of a global scholarly knowledge graph.

In the Introduction section we focused on a number of challenges researchers face when collecting research literature for text and data mining. In this section, we instead take the perspective of a research literature aggregator, i.e. a system whose goal is to continuously provide seamless access to research literature aggregated from thousands of data providers worldwide, in a way that enables the resulting collection to be used by others in production applications. We describe the challenges we had to overcome to build this collection and keep it continuously up to date, and present the key technical innovations which allowed us to greatly increase the size of the CORE collection and become a leading provider of open access literature, as illustrated by our content growth statistics.

CORE Harvesting system (CHARS)

CORE Harvesting System (CHARS) is the backbone of our harvesting process. CHARS uses the Harvesting Scheduler (see the CHARS architecture section) to select the data providers to be processed next. It manages all running processes (tasks) and ensures that the available compute resources are well utilised.

Prior to implementing CHARS, CORE was organised around data providers rather than around the individual tasks needed to harvest and process them (e.g. metadata download and parsing, full text download). Consequently, even though the system could be scaled up and kept running, the infrastructure was not horizontally scalable and the architecture suffered from tight coupling of services. This was inconsistent with CORE’s high availability requirements and regularly made maintenance complex. In response to these challenges, we designed CHARS using a microservices architecture, i.e. small, manageable, autonomous components that work together as part of a larger infrastructure 21 . One of the key benefits of a microservices-oriented architecture is that development can focus on individual components, which can be improved and redeployed as frequently as needed and independently of the rest of the infrastructure. As the process of open access content harvesting can naturally be split into individual consecutive tasks, a microservices-oriented architecture is a natural fit for aggregation systems like CHARS.

Tasks involved in open access content harvesting

The harvesting process can be described as a pipeline in which each task performs a certain action and the output of each task feeds into the next task. The input to this pipeline is a set of data providers, and the final output is a system populated with records of the research papers available from them. The key tasks currently performed by CORE’s harvesting system are (Fig. 8):

Metadata download: The metadata exposed by a data provider via OAI-PMH are downloaded and stored in the file system (typically as XML). The download process is sequential: a repository typically provides between 100 and 1,000 metadata records per request, together with a resumption token, which is then used to request the next batch. As a result, a full harvest can take a significant amount of time (hours to days) for large data providers. This process has therefore been implemented to be resilient to a range of communication failures.

Metadata extraction : Metadata extraction parses, cleans, and harmonises the downloaded metadata and stores them into the CORE internal data structure (database). The harmonisation and cleaning process addresses the fact that different data providers/repository platforms describe the same information in different ways (syntactic heterogeneity) as well as having different interpretations for the same information (semantic heterogeneity).

Full text download : Using links extracted from the metadata CORE attempts to download and store publication manuscripts. This process is non-trivial and is further described in the Using OAI-PMH for content harvesting section.

Information extraction: Plain text is extracted from the downloaded manuscripts and processed to create a semi-structured representation. This process includes a range of information extraction tasks, such as reference extraction.

Enrichment: The enrichment task augments both the metadata and the full text harvested from the data providers with additional data from multiple sources. Some of the enrichments, such as language detection and document type detection, are performed directly by specific tasks in the pipeline. The remaining enrichments, which involve external datasets, are performed externally and independently of the CHARS pipeline and are ingested into the dataset as described in the Enrichments section.

Indexing : The final step in the harvesting pipeline is indexing the harvested data. The resulting index powers CORE’s services, including search, API and FastSync.
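The metadata download step can be sketched as a loop over OAI-PMH ListRecords requests that follows resumption tokens until a missing or empty token marks the last batch. This is a minimal sketch under stated assumptions: fetch is a hypothetical caller-supplied function (in practice it would wrap an HTTP client), resilience is reduced to retries with exponential backoff, and the on_token callback only illustrates how the last token could be checkpointed so that an interrupted harvest resumes rather than restarts. It is not CORE's implementation.

```python
from __future__ import annotations

import time
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest_records(
    base_url: str,
    fetch: Callable[[str], bytes],              # hypothetical: URL -> response body
    metadata_prefix: str = "oai_dc",
    resume_token: Optional[str] = None,         # last checkpointed token, if resuming
    on_token: Optional[Callable[[str], None]] = None,
    max_retries: int = 3,
    backoff_s: float = 1.0,
) -> Iterator[ET.Element]:
    """Yield OAI-PMH <record> elements batch by batch, following resumption tokens."""
    if resume_token:
        # resumptionToken is an exclusive argument: only the verb may accompany it
        params = {"verb": "ListRecords", "resumptionToken": resume_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = f"{base_url}?{urlencode(params)}"
        for attempt in range(max_retries + 1):
            try:
                root = ET.fromstring(fetch(url))
                break
            except (OSError, ET.ParseError):
                if attempt == max_retries:
                    raise  # give up; a checkpointed token lets a later run resume
                time.sleep(backoff_s * 2 ** attempt)  # exponential backoff
        yield from root.iterfind("oai:ListRecords/oai:record", OAI_NS)
        token = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
        if token is None or not (token.text or "").strip():
            return  # no token, or an empty token, marks the final batch
        if on_token is not None:
            on_token(token.text.strip())  # checkpoint after the batch was consumed
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Because the function is a generator, each batch is handed to the caller before the next request is made, so records can be stored as they arrive instead of being held in memory for a harvest that may take days.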

figure 8

CORE Harvesting Pipeline. Each task’s output forms the input for the following task. In some cases the input is considered as a whole, for example all the content harvested from a data provider, while in other cases the output is split into multiple small tasks performed at the record level.
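The chaining of tasks in Fig. 8, combined with the "fail fast" requirement listed among the scalable infrastructure requirements, can be sketched as a list of stages, each pairing a task with a validator that runs immediately after it. This is an illustration only: the Record fields and the two task functions are hypothetical placeholders, far simpler than real download, parsing and enrichment steps.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Record:
    oai_id: str
    raw_text: str = ""                      # stand-in for a downloaded manuscript
    full_text: Optional[str] = None
    enrichments: dict = field(default_factory=dict)


# A stage is (name, task, validator); each task's output is the next task's input.
Stage = Tuple[str, Callable[[Record], Record], Callable[[Record], bool]]


def run_pipeline(records: List[Record], stages: List[Stage]):
    """Run every record through the stages, validating right after each one (fail fast)."""
    completed, failures = [], []
    for record in records:
        for name, task, is_valid in stages:
            record = task(record)
            if not is_valid(record):
                failures.append((record.oai_id, name))  # stop at the failing stage
                break
        else:
            completed.append(record)
    return completed, failures


# Hypothetical placeholder tasks.
def extract_information(rec: Record) -> Record:
    rec.full_text = rec.raw_text.strip() or None
    return rec


def detect_document_type(rec: Record) -> Record:
    rec.enrichments["type"] = "thesis" if "thesis" in rec.full_text.lower() else "research article"
    return rec


STAGES: List[Stage] = [
    ("information extraction", extract_information, lambda r: r.full_text is not None),
    ("enrichment", detect_document_type, lambda r: "type" in r.enrichments),
]
```

Recording the name of the failing stage is what makes early validation useful: an empty extraction is reported at the extraction step rather than surfacing later as a confusing enrichment or indexing error.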

Scalable infrastructure requirements

Based on the experience obtained while developing and maintaining our harvesting system, and taking into consideration the features of the CiteSeerX 22 architecture, we have defined a set of requirements for a scalable harvesting infrastructure 8 . These requirements are generic and apply to any aggregation or digital library scenario. They informed, and are reflected in, the architecture design of CHARS (see the CHARS architecture section):

Easy to maintain: The system should be easy to manage, maintain, fix, and improve.

High levels of automation: The system should be completely autonomous while allowing manual interaction.

Fail fast: Items in the harvesting pipeline should be validated immediately after a task is performed, instead of having only one and final validation at the end of the pipeline. This has the benefit of recognising issues and enabling fixes earlier in the process.

Easy to troubleshoot: Code bugs should be easy to identify and locate.

Distributed and scalable: Adding more compute resources should increase the system’s capacity, and doing so should be transparent and replicable.

No single point of failure: A single crash should not affect the whole harvesting pipeline, individual tasks should work independently.

Decoupled from user-facing systems: Any failure in the ingestion processing services should not have an immediate impact on user-facing services.

Recoverable: When a harvesting task stops, either manually or due to a failure, the system should be able to recover and resume the task without manual intervention.

Performance observable: The system’s progress must be properly logged at all times, and overlay monitoring services should provide a transparent overview of the services’ progress, to allow early detection of scalability problems and identification of potential bottlenecks.

CHARS architecture

An overview of CHARS is shown in Fig.  9 . The system consists of the following main software components:

Scheduler: it becomes active when a task finishes. It monitors resource utilisation and selects and submits data providers to be harvested.

Queue (Q_n): a messaging system that assists with communication between parts of the harvesting pipeline. Every individual task, such as metadata download, metadata parsing, full text download, and language detection, has its own message queue.

Worker (W_i): an independent, standalone application capable of executing a specific task. Every individual task has its own set of workers.

figure 9

CORE Harvesting System.

A complete harvest of a data provider can be described as follows. When an existing task finishes, the scheduler is activated and informed of the result. It then uses the formula described in Appendix A to assign a score to each data provider. Depending on current resource utilisation, i.e. whether there are any idle workers, and on the number of data providers already scheduled for harvesting, the data provider with the highest score is then placed in the first queue, Q_1, which contains data providers scheduled for metadata download. Once one of the metadata download workers W_i to W_j becomes available, a data provider is taken out of the queue and a new download of its metadata starts. Upon completion, the worker notifies the scheduler and, if the task completed successfully, places the data provider in the next queue. This process continues until the data provider has passed through the entire pipeline.

Some tasks in the pipeline, specifically metadata download and parsing, need to be performed at the granularity of data providers, whereas others, such as full text extraction and language detection, can be performed at the granularity of individual records. Although these tasks are initially scheduled at the data provider level, only those individual records of the selected data provider that require processing are subsequently placed, independently of each other, in the appropriate queue. The workers assigned to these tasks then process the individual records in the queue, and each record moves on through the pipeline once completed.
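The interplay of per-task queues and independent workers, including the switch from provider-level to record-level granularity, can be sketched with Python's standard queue and threading modules. This is an illustration under simplifying assumptions: the provider data and the three task functions are hypothetical, the scheduler is reduced to putting every provider into the first queue, and failures are collected in a list rather than reported back to a scheduler.

```python
from __future__ import annotations

import queue
import threading
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def start_workers(handle: Callable[[object], Iterable[object]], in_q: queue.Queue,
                  out_q: Optional[queue.Queue], n_workers: int, errors: list) -> None:
    """Start independent workers that consume in_q and push each output item to out_q."""
    def work() -> None:
        while True:
            item = in_q.get()
            try:
                for out in handle(item):
                    if out_q is not None:
                        out_q.put(out)
            except Exception as exc:  # one failing item must not stop the worker
                errors.append((item, repr(exc)))
            finally:
                in_q.task_done()

    for _ in range(n_workers):
        threading.Thread(target=work, daemon=True).start()


def harvest(providers: Dict[str, List[Tuple[str, bool]]]):
    """providers maps a hypothetical provider name to (record id, has_full_text) pairs."""
    downloaded: List[str] = []
    errors: list = []

    def download_metadata(provider):       # provider-level task (placeholder)
        yield provider

    def parse_metadata(provider):          # provider-level task that fans out to records
        for record_id, has_full_text in providers[provider]:
            if not has_full_text:          # only records that still need processing move on
                yield record_id

    def download_full_text(record_id):     # record-level task (placeholder)
        downloaded.append(record_id)
        return ()

    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
    start_workers(download_metadata, q1, q2, n_workers=2, errors=errors)
    start_workers(parse_metadata, q2, q3, n_workers=2, errors=errors)
    start_workers(download_full_text, q3, None, n_workers=4, errors=errors)

    for provider in providers:             # in CHARS the scheduler decides what enters Q_1
        q1.put(provider)
    for q in (q1, q2, q3):                 # outputs are forwarded before task_done()
        q.join()
    return sorted(downloaded), errors
```

Because each worker forwards its outputs before marking its input as done, joining the queues in pipeline order is enough to wait for the whole harvest to drain. Scaling a slow step then amounts to starting more workers on its queue, which mirrors the per-task worker sets described above.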

A more detailed description of CHARS, which includes technologies used to implement it, as well as other details can be found in 8 .

The harvesting scheduler is the component responsible for identifying the data providers that need to be harvested next and placing them in the harvesting queue. In the original design of CORE, our harvesting schedule was created manually, assigning the same harvesting frequency to every data provider. However, we found this approach inefficient: it does not scale, owing to the varying sizes of data providers, differences in the update frequency of their databases and the maximum data delivery speeds of their repository platforms. To address these limitations, we designed the CHARS scheduler according to our new concept of “pro-active harvesting.” This means that the scheduler is event driven: it is triggered whenever the underlying hardware infrastructure has resources available, to determine which data provider should be harvested next. The underlying idea is to maximise the number of ingested documents per unit of time. The pseudocode and the formula we use to determine which repository to harvest next are described in Algorithm 1.

The size of the metadata download queue, i.e. the queue which represents an entry into the harvesting pipeline, is kept limited in order to keep the system responsive to the prioritisation of data providers. A long queue makes prioritising data providers harder, as it is not known beforehand how long the processing of a particular data provider will take. An appropriate size of the queue ensures a good balance between the reactivity and utilisation of the available resources.
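The event-driven selection with a deliberately short entry queue can be sketched as follows. The scoring function is a hypothetical stand-in (expected new documents per hour), since the actual formula is given in Appendix A and Algorithm 1 and is not reproduced in this section; the Provider fields and the queue_limit value are likewise assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Provider:
    name: str
    expected_new_records: float   # hypothetical estimate of records added since the last harvest
    expected_hours: float         # hypothetical estimate of how long a harvest would take


def score(p: Provider) -> float:
    """Hypothetical stand-in for the Appendix A formula: expected new documents per hour."""
    return p.expected_new_records / max(p.expected_hours, 1e-9)


class Scheduler:
    def __init__(self, providers: Iterable[Provider], queue_limit: int) -> None:
        self.providers = {p.name: p for p in providers}
        self.queue_limit = queue_limit         # a short Q_1 keeps prioritisation responsive
        self.metadata_queue: List[str] = []    # Q_1, the entry point of the pipeline
        self.in_pipeline: Set[str] = set()

    def on_task_finished(self, provider_done: Optional[str] = None) -> List[str]:
        """Event handler: called whenever a task finishes and resources may be free."""
        if provider_done is not None:
            self.in_pipeline.discard(provider_done)  # provider has left the pipeline
        waiting = [p for name, p in self.providers.items() if name not in self.in_pipeline]
        for p in sorted(waiting, key=score, reverse=True):
            if len(self.metadata_queue) >= self.queue_limit:
                break
            self.metadata_queue.append(p.name)
            self.in_pipeline.add(p.name)
        return list(self.metadata_queue)

    def take(self) -> str:
        """A free metadata download worker takes the next provider from Q_1."""
        return self.metadata_queue.pop(0)
```

Keeping queue_limit small means a provider whose score rises (for example, after a large deposit) can be placed at the front of the pipeline within a few events, rather than waiting behind a long backlog whose processing time is unknown.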

Using OAI-PMH for content harvesting

We now describe the third key technical innovation which enables us to harvest full text content (as opposed to just metadata) from data providers using the OAI-PMH protocol. This process represents one step in the harvesting pipeline (Fig.  9 ), specifically, the third step which is activated after data provider metadata have been downloaded and parsed.

The OAI-PMH protocol was originally designed for metadata harvesting only, but due to its wide adoption and the lack of alternatives it has been used as an entry point for full text harvesting from repositories. Full text harvesting is achieved by using URLs found in the metadata records to discover the location of the actual resource and subsequently downloading it 9 . We summarised the key challenges of this approach in the Challenges related to the use of OAI-PMH protocol for content harvesting section.

The procedure works as follows. First, all metadata records from a selected data provider that have no full text are collected. Records for which a full text download was attempted within the retry period (RP, usually six months) are filtered out. This avoids repeatedly downloading URLs that do not lead to the sought-after documents. The downside of this approach is that if a data provider updates a link in the metadata, it might take up to the duration of the retry period to acquire the full text.
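The retry-period filter amounts to a simple date comparison. In this minimal sketch the RecordState fields are hypothetical, and "six months" is approximated as 182 days.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

RETRY_PERIOD = timedelta(days=182)  # roughly six months; the exact value is an assumption


@dataclass
class RecordState:
    oai_id: str
    has_full_text: bool
    last_attempt: Optional[datetime] = None  # time of the last full text download attempt


def records_to_attempt(records: List[RecordState], now: datetime,
                       retry_period: timedelta = RETRY_PERIOD) -> List[RecordState]:
    """Keep records without full text that were never tried, or last tried at least RP ago."""
    return [
        r for r in records
        if not r.has_full_text
        and (r.last_attempt is None or now - r.last_attempt >= retry_period)
    ]
```

The trade-off described above is visible in the condition: a record tried one month ago is skipped even if its link has since been fixed, and will only be retried once the period has elapsed.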

Algorithm 1


Next, the records are further filtered using a set of rules and heuristics we developed to a) increase the chances of quickly identifying the URL leading to the described document and b) ensure that we identify the correct document. These filtering rules include:

Accepted file extensions: URLs are filtered according to a list of accepted file extensions. URLs ending with extensions such as .pptx that clearly indicate that the URL does not link to the required resource are removed from the list.

Same domain policy: URLs in the OAI-PMH metadata can link to any resources and domains. For example, a common practice is to provide a link to an associated presentation, dataset, or other related resource. As these are often stored in external databases, filtering out all URLs that lead to an external domain, i.e. a domain different from that of the data provider, is a simple way to avoid downloading resources which very likely do not represent the target document. Exceptions include the dx.doi.org and hdl.handle.net domains, whose purpose is to provide persistent identifiers pointing to the document. The same domain policy is disabled for data providers which are aggregators and link to many different domains by design.

Provider-specific crawling heuristics: Many data providers follow a specific pattern when composing URLs. For example, a link to a full text document may be composed of the following parts: data provider URL  +  record handle  +  .pdf . For data providers utilising such patterns, URLs may be composed automatically where the relevant information (record handle) is known to us from the metadata. These generated URLs are then added to the list of URLs obtained from the metadata.

Prioritising certain URLs: As a PDF URL is more likely than an HTML URL to lead to the target document, the final step is to sort URLs according to file and URL type. The highest priority is assigned to URLs that use repository-software-specific patterns to identify full text, document, and PDF file types, while the lowest priority is assigned to hdl.handle.net URLs.
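The four rules above can be combined into a single candidate-URL function. This is a minimal sketch under stated assumptions: the accepted extension list, the full text path hints and the URL template format are hypothetical examples, and the same domain policy is simplified to an exact host comparison (ignoring a leading "www.").

```python
from __future__ import annotations

import posixpath
from typing import Iterable, List, Optional
from urllib.parse import urlparse

# Hypothetical accepted extensions ("" covers extension-less paths such as landing pages).
ACCEPTED_EXTENSIONS = {"", ".pdf", ".htm", ".html", ".php", ".jsp", ".aspx"}
# Persistent identifier resolvers exempted from the same domain policy.
PID_HOSTS = {"dx.doi.org", "hdl.handle.net"}
# Hypothetical path fragments hinting at a full text file (e.g. DSpace "/bitstream/").
FULL_TEXT_HINTS = ("/bitstream/", "/download/", "/fulltext")


def _host(url: str) -> str:
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def _priority(url: str) -> int:
    """Lower is tried first: PDF or full text patterns, then other URLs, handles last."""
    path = urlparse(url).path.lower()
    if _host(url) == "hdl.handle.net":
        return 2
    if path.endswith(".pdf") or any(hint in path for hint in FULL_TEXT_HINTS):
        return 0
    return 1


def candidate_urls(metadata_urls: Iterable[str], provider_url: str,
                   record_handle: Optional[str] = None, url_template: Optional[str] = None,
                   is_aggregator: bool = False) -> List[str]:
    urls = list(metadata_urls)
    if url_template and record_handle:  # provider-specific pattern, e.g. "{base}/{handle}.pdf"
        urls.append(url_template.format(base=provider_url.rstrip("/"), handle=record_handle))
    provider_host = _host(provider_url)
    kept = []
    for url in dict.fromkeys(urls):  # de-duplicate while keeping order
        host = _host(url)
        if host not in PID_HOSTS:
            extension = posixpath.splitext(urlparse(url).path)[1].lower()
            if extension not in ACCEPTED_EXTENSIONS:
                continue  # e.g. ".pptx": clearly not the target document
            if not is_aggregator and host != provider_host:
                continue  # same domain policy
        kept.append(url)
    return sorted(kept, key=_priority)  # stable sort keeps metadata order within a priority
```

Using a stable sort means the ordering given by the data provider is preserved among URLs of equal priority, so the heuristics only reorder where they have a reason to.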

The system then attempts to request and download the document at each URL. After each download, checks are performed to determine whether the downloaded document represents the target record. Currently, the downloaded document has to be a valid PDF with a title matching the original metadata record. If the target record is identified, the downloaded document is stored and the download process for that record ends. If the downloaded document is an HTML page, URLs are extracted from this page and filtered using the same method described above. This is because in some of the most widely used repository systems, such as DSpace, documents are commonly not referenced directly from within the metadata records; instead, the metadata records typically link to an HTML overview page of the document. To deal with this problem, we use the concept of harvesting levels. The maximum harvesting level corresponds to the maximum search depth for the referenced document. The algorithm finishes either as soon as the first matching document is found or after all the available URLs up to the maximum harvesting level have been exhausted. Algorithm 2 describes our approach for collecting full texts using the OAI-PMH protocol; it follows a depth-first search strategy with prioritisation.
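The depth-first search with harvesting levels can be sketched as a recursive function that tries prioritised URLs in order, follows links found on HTML pages one level deeper, and stops at the first document that passes validation. The fetch, is_target and prioritise callables are hypothetical hooks (a real is_target would check that the file is a valid PDF whose title matches the metadata record); link extraction uses the standard library HTMLParser.

```python
from __future__ import annotations

from html.parser import HTMLParser
from typing import Callable, Iterable, List, Optional, Set, Tuple
from urllib.parse import urljoin


class _LinkParser(HTMLParser):
    """Collect the href values of <a> tags on an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(page_url: str, html: str) -> List[str]:
    parser = _LinkParser()
    parser.feed(html)
    return [urljoin(page_url, href) for href in parser.links]  # make links absolute


def find_full_text(
    urls: Iterable[str],
    fetch: Callable[[str], Tuple[str, bytes]],    # hypothetical: URL -> (content type, body)
    is_target: Callable[[bytes], bool],            # hypothetical document validation
    prioritise: Callable[[List[str]], List[str]],  # filtering and ordering rules
    max_level: int = 2,
) -> Optional[str]:
    """Depth-first search that stops at the first matching document or at max_level."""
    visited: Set[str] = set()

    def search(candidates: List[str], level: int) -> Optional[str]:
        for url in prioritise(candidates):
            if url in visited:
                continue
            visited.add(url)
            try:
                content_type, body = fetch(url)
            except OSError:
                continue  # unreachable URL: move on to the next candidate
            if content_type == "application/pdf" and is_target(body):
                return url
            if content_type == "text/html" and level < max_level:
                # an overview page: follow its links one harvesting level deeper
                found = search(extract_links(url, body.decode("utf-8", "replace")), level + 1)
                if found is not None:
                    return found
        return None

    return search(list(urls), 1)
```

With max_level set to 2, a metadata URL pointing to a DSpace-style overview page is enough to reach a PDF linked from that page, while deeper crawling is ruled out.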

Algorithm 2

CHARS limitations

Despite overcoming the key obstacles to scalable harvesting of content from repositories, a number of important challenges remain. One of them is the difficulty of estimating the optimal number of workers needed for the system to run efficiently. While worker allocation is still largely determined empirically, we are investigating more sophisticated approaches based on formal models of distributed computation, such as Petri nets. This will allow us to explore new ways of dynamically allocating and launching workers to optimise the use of our resources.

Enrichments

Conceptually, two types of enrichment processes are used within CORE: 1) an online enrichment process, which enriches a single record at the time it is processed by the CHARS pipeline, and 2) a periodic offline enrichment process, which enriches a record based on information in external datasets (Fig. 10).

Figure 10. CORE Offline Enrichments.

Online enrichments

Online enrichments are fully integrated into the CHARS pipeline described earlier in this section. These enrichments generally involve applying machine learning models and rule-based tools to gather additional insights about the record, such as its language or document type. Unlike offline enrichments, online enrichments are performed only once for a given record. The following online enrichments are currently performed:

Article type detection: A machine learning algorithm assigns each publication one of four types: presentation, thesis, research paper or other. Further types may be added in the future.

Language identification: This task uses third-party libraries to identify the language from the full text of a document. The detected language is then compared with the one provided in the metadata record, and heuristics are applied to disambiguate and harmonise the two.
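The harmonisation step could look roughly like the sketch below. The alias table, the confidence threshold and the rule of preferring a confident detector over the metadata are all assumptions made for illustration. The text does not specify CORE's heuristics or which detection library it uses, so the detector's output is simply passed in.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical alias table; a production system would load a full ISO 639 table.
LANGUAGE_ALIASES = {
    "en": "en", "eng": "en", "english": "en",
    "de": "de", "deu": "de", "ger": "de", "german": "de", "deutsch": "de",
    "fr": "fr", "fra": "fr", "fre": "fr", "french": "fr", "français": "fr",
}


def to_iso639_1(value: Optional[str]) -> Optional[str]:
    """Map values such as 'English', 'eng' or 'en_GB' to a two-letter code."""
    if not value:
        return None
    key = value.strip().lower().replace("_", "-").split("-")[0]
    return LANGUAGE_ALIASES.get(key)


def harmonise_language(metadata_value: Optional[str], detected: Optional[str],
                       confidence: float, min_confidence: float = 0.9) -> Optional[str]:
    """Reconcile the metadata language with the language detected from the full text."""
    meta = to_iso639_1(metadata_value)
    found = to_iso639_1(detected)
    if found is None:
        return meta
    if meta is None or meta == found:
        return found
    # The two disagree: trust the full-text detector only when it is confident.
    return found if confidence >= min_confidence else meta
```

Normalising both values to one code system first means that a metadata value of "English" and a detector output of "en" count as agreement rather than conflict.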

Offline enrichments

Offline enrichments are carried out by gathering a range of information from large third-party scholarly datasets (research graphs). This information includes metadata that typically do not change, such as a DOI, as well as metadata that evolve over time, such as the number of citations. Largely because of the latter, CORE performs offline enrichments periodically, i.e. all records in CORE go through this process repeatedly at set intervals (currently once per month).

The process is depicted in Fig. 10. The initial mapping of a record is carried out using its DOI, if available. However, as the majority of records from repositories do not come with a DOI, we carry out a matching process against the Crossref database using a subset of metadata fields, including title, authors and year. Once the mapping is performed, we can harmonise fields and gather a wide range of additional useful data from relevant external databases, thereby enriching the CORE record. Such data include ORCID identifiers, citation information, additional links to freely available full texts, field of study information and PubMed identifiers. Our solution is implemented as a set of map-reduce tasks that enrich the dataset, running on a Cloudera Enterprise Data Hub (https://www.cloudera.com/products/enterprise-data-hub.html) 23,24,25,26.
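A rough sketch of matching a DOI-less record against candidate records by title, authors and year follows. The `Record` type, the one-year tolerance, the surname-overlap rule and the 0.9 similarity threshold are assumptions, and candidates are taken as already retrieved. This illustrates the idea of metadata matching; it is not CORE's actual matching procedure or the Crossref API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Record:
    title: str
    author_surnames: list[str] = field(default_factory=list)
    year: Optional[int] = None
    doi: Optional[str] = None


def _norm(text: str) -> str:
    # Lowercase, turn punctuation into spaces and collapse whitespace.
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def match_score(a: Record, b: Record) -> float:
    """Title similarity in [0, 1], or 0.0 when year or authors rule the pair out."""
    # Allow a one-year gap (e.g. online-first vs issue date); an assumption.
    if a.year is not None and b.year is not None and abs(a.year - b.year) > 1:
        return 0.0
    names_a = {_norm(s) for s in a.author_surnames}
    names_b = {_norm(s) for s in b.author_surnames}
    if names_a and names_b and not names_a & names_b:
        return 0.0
    return SequenceMatcher(None, _norm(a.title), _norm(b.title)).ratio()


def find_doi(record: Record, candidates: list[Record], threshold: float = 0.9) -> Optional[str]:
    """Return the DOI of the best-scoring candidate if it clears the threshold."""
    best = max(candidates, key=lambda c: match_score(record, c), default=None)
    if best is not None and best.doi and match_score(record, best) >= threshold:
        return best.doi
    return None
```

The cheap filters on year and authors run before the string comparison, so obviously different works are rejected without scoring their titles.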

Data availability

CORE provides several large data dumps of the processed and aggregated data under the ODC-BY licence (https://core.ac.uk/documentation/dataset). The only condition for both commercial and non-commercial reuse of these datasets is that users acknowledge the use of CORE in their outputs. Additionally, CORE makes its API and most recent data dump freely available to registered individual users and researchers. Note that CORE claims no rights in the aggregated content itself, which is open access and therefore freely available to everyone. All CORE data rights correspond to the sui generis database rights in the aggregated and processed collection.

Licences for CORE services, such as the API and FastSync, are available for commercial users who want convenient access to CORE data with a guaranteed level of customer support. The organisation running CORE, The Open University, is a charitable organisation fully committed to the Open Research mission. CORE is a signatory of the Principles of Open Scholarly Infrastructure (POSI) (https://openscholarlyinfrastructure.org/posse). CORE does not generate profit. Instead, its income from licences to commercial parties is used solely to make CORE sustainable by reducing its reliance on unstable project grants, thereby offsetting and reducing the cost of CORE to the taxpayer. This is done in full compliance with the principles and best practices of sustainable open science infrastructure.

Code availability

CORE consists of multiple services. Most of our source code is open source and available in our public repository on GitHub (https://github.com/oacore/). Unfortunately, we are not yet able to provide the source code of our data ingestion module. However, as we want to be as transparent as possible with our community, we have used pseudocode in this paper to document the key algorithms and processes we apply.

Bornmann, L. & Mutz, R. Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references. JASIST 66 (11), 2215–2222 (2015).

Piwowar, H. et al. The State of OA: A large-scale analysis of the prevalence and impact of Open Access articles. PeerJ 6, e4375 (2018).

Saggion, H. & Ronzano, F. Scholarly data mining: making sense of scientific literature. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL): 1–2 (2017).

Kim, E. et al. Materials synthesis insights from scientific literature via text extraction and machine learning. Chemistry of Materials 29 (21), 9436–9444 (2017).

Jacobs, N. & Ferguson, N. Bringing the UK’s open access research outputs together: Barriers on the Berlin road to open access. Jisc Repository (2014).

Knoth, P. & Pontika, N. Aggregating Research Papers from Publishers’ Systems to Support Text and Data Mining: Deliberate Lack of Interoperability or Not? In: INTEROP2016 (2016).

Herrmannova, D., Pontika, N. & Knoth, P. Do Authors Deposit on Time? Tracking Open Access Policy Compliance. Proceedings of the 2019 ACM/IEEE Joint Conference on Digital Libraries, Urbana-Champaign, IL (2019).

Cancellieri, M., Pontika, N., Pearce, S., Anastasiou, L. & Knoth, P. Building Scalable Digital Library Ingestion Pipelines Using Microservices. Proceedings of the 11th International Conference on Metadata and Semantics Research (MTSR 2017): 275–285. Springer (2017).

Knoth, P. From open access metadata to open access content: two principles for increased visibility of open access content. Proceedings of the 2013 Open Repositories Conference, Charlottetown, Prince Edward Island, Canada (2013).

Knoth, P., Cancellieri, M. & Klein, M. Comparing the Performance of OAI-PMH with ResourceSync. Proceedings of the 2019 Open Repositories Conference, Hamburg, Germany (2019).

Kapidakis, S. Metadata Synthesis and Updates on Collections Harvested Using the Open Archive Initiative Protocol for Metadata Harvesting. Digital Libraries for Open Knowledge. TPDL 2018. Lecture Notes in Computer Science 11057, 16–31 (2018).

Knoth, P. & Zdrahal, Z. CORE: three access levels to underpin open access. D-Lib Magazine 18 (11/12) (2012).

Haslhofer, B. et al. ResourceSync: leveraging sitemaps for resource synchronization. Proceedings of the 22nd International Conference on World Wide Web: 11–14 (2013).

Khabsa, M. & Giles, C. L. The number of scholarly documents on the public web. PLOS One 9 (5), e93949 (2014).

Charalampous, A. & Knoth, P. Classifying document types to enhance search and recommendations in digital libraries. Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science 10450, 181–192 (2017).

Rosvall, M. & Bergstrom, C. T. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences 105 (4), 1118–1123 (2008).

D’Angelo, C. A. & Abramo, G. Publication rates in 192 research fields of the hard sciences. Proceedings of the 15th ISSI Conference: 915–925 (2015).

Ammar, W. et al. Construction of the Literature Graph in Semantic Scholar. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers): 84–91 (2018).

Knoth, P. et al. Towards effective research recommender systems for repositories. Open Repositories, Bozeman, USA (2017).

Pride, D. & Knoth, P. An Authoritative Approach to Citation Classification. Proceedings of the 2020 ACM/IEEE Joint Conference on Digital Libraries (JCDL 2020), Virtual–China (2020).

Newman, S. Building microservices: designing fine-grained systems. O’Reilly Media, Inc. (2015).

Li, H. et al. CiteSeerχ: a scalable autonomous scientific digital library. Proceedings of the 1st International Conference on Scalable Information Systems, ACM (2006).

Bastian, H., Glasziou, P. & Chalmers, I. Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Medicine 7 (9), e1000326 (2010).

Shojania, K. G. et al. How quickly do systematic reviews go out of date? A survival analysis. Annals of Internal Medicine 147 (4), 224–233 (2007).

Tsafnat, G. et al. Systematic review automation technologies. Systematic Reviews 3 (1), 74 (2014).

Harzing, A.-W. & Alakangas, S. Microsoft Academic is one year old: The Phoenix is ready to leave the nest. Scientometrics 112 (3), 1887–1894 (2017).

Acknowledgements

We would like to acknowledge the generous support of Jisc, under a number of grants and service contracts with The Open University. These included projects CORE, ServiceCORE, UK Aggregation (1 and 2) and DiggiCORE, which was co-funded by Jisc with NWO. Since 2015, CORE has been supported in three iterations under the Jisc Digital Services–CORE (JDSCORE) service contract with The Open University. Within Jisc, we would like to thank primarily the CORE project managers, Andy McGregor, Alastair Dunning, Neil Jacobs and Balviar Notay. We would also like to thank the European Commission for funding that contributed to CORE, namely OpenMinTeD (739563) and EOSC Pilot (654021). We would like to show our gratitude to all current CORE Team members who contributed to CORE but are not authors of the manuscript, namely Valeriy Budko, Ekaterine Chkhaidze, Viktoriia Pavlenko, Halyna Torchylo, Andrew Vasilyev and Anton Zhuk. We would also like to show our gratitude to all past CORE Team members who have contributed to CORE over the years, namely Lucas Anastasiou, Giorgio Basile, Aristotelis Charalampous, Josef Harag, Drahomira Herrmannova, Alexander Huba, Bikash Gyawali, Tomas Korec, Dominika Koroncziova, Magdalena Krygielova, Catherine Kuliavets, Sergei Misak, Jakub Novotny, Gabriela Pavel, Vojtech Robotka, Svetlana Rumyanceva, Maria Tarasiuk, Ian Tindle, Bethany Walker, Viktor Yakubiv, Zdenek Zdrahal and Anna Zelinska.

Author information

Drahomira Herrmannova

Present address: Oak Ridge National Laboratory, Oak Ridge, TN, USA

Authors and Affiliations

Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes, UK

Petr Knoth, Drahomira Herrmannova, Matteo Cancellieri, Lucas Anastasiou, Nancy Pontika, Samuel Pearce, Bikash Gyawali & David Pride

Contributions

P.K. is the Founder and Head of CORE. He conceived the idea and has been the project lead since the start in 2011. He researched and created the first version of CORE, acquired funding, built the team, and has been managing and leading all research and development. M.C., L.A., S.P. and P.K. designed, worked out all technical details, and implemented significant parts of the system including CHARS, the harvesting scheduler, and the OAI-PMH content harvesting method. All authors contributed to the maintenance, operation and improvements of the system. D.H. drafted the initial version of the manuscript based on consultations with P.K. D.P. and P.K. wrote the final manuscript with additional input from L.A. and N.P. D.H., M.C. and L.A. performed the data analysis for the paper and D.H. produced the figures. D.H., D.P., B.G. and L.A. participated in research activities and tasks related to CORE under the instruction and direct supervision of P.K.

Corresponding author

Correspondence to Petr Knoth.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Knoth, P., Herrmannova, D., Cancellieri, M. et al. CORE: A Global Aggregation Service for Open Access Papers. Sci Data 10, 366 (2023). https://doi.org/10.1038/s41597-023-02208-w

Received: 18 May 2021

Accepted: 03 May 2023

Published: 07 June 2023

DOI: https://doi.org/10.1038/s41597-023-02208-w

MIT Open Access Articles

The MIT Open Access Articles collection consists of scholarly articles written by MIT-affiliated authors that are made available through DSpace@MIT under the MIT Faculty Open Access Policy, or under related publisher agreements. Articles in this collection generally reflect changes made during peer-review.

Version details are supplied for each paper in the collection:

  • Original manuscript: author's manuscript prior to formal peer review
  • Author's final manuscript: final author's manuscript post peer review, without publisher's formatting or copy editing
  • Final published version: final published article, as it appeared in a journal, conference proceedings, or other formally published context (this version appears here only if allowable under publisher's policy)

Some peer-reviewed scholarly articles are available through other DSpace@MIT collections, such as those for departments, labs, and centers.

If you are an MIT community member who wants to deposit an article into this collection, you will need to log in to do so. If you don't have an account, please contact us.

More information:

  • Working with MIT's open access policy
  • Submitting a paper under the policy
  • FAQ about the policy

Recent Submissions

Examples out of Thin Air: AI-Generated Dynamic Context to Assist Program Comprehension by Example 

Evaluating Prediction Mechanisms: A Profitability Test 

Supermind Ideator: How Scaffolding Human-AI Collaboration Can Increase Creativity 

  • Educational Equipment and Technology
  • Educational Strategies and Policy
  • Higher and Further Education
  • Organization and Management of Education
  • Philosophy and Theory of Education
  • Schools Studies
  • Secondary Education
  • Teaching of a Specific Subject
  • Teaching of Specific Groups and Special Educational Needs
  • Teaching Skills and Techniques
  • Browse content in Environment
  • Applied Ecology (Social Science)
  • Climate Change
  • Conservation of the Environment (Social Science)
  • Environmentalist Thought and Ideology (Social Science)
  • Management of Land and Natural Resources (Social Science)
  • Natural Disasters (Environment)
  • Pollution and Threats to the Environment (Social Science)
  • Social Impact of Environmental Issues (Social Science)
  • Sustainability
  • Browse content in Human Geography
  • Cultural Geography
  • Economic Geography
  • Political Geography
  • Browse content in Interdisciplinary Studies
  • Communication Studies
  • Museums, Libraries, and Information Sciences
  • Browse content in Politics
  • African Politics
  • Asian Politics
  • Chinese Politics
  • Comparative Politics
  • Conflict Politics
  • Elections and Electoral Studies
  • Environmental Politics
  • Ethnic Politics
  • European Union
  • Foreign Policy
  • Gender and Politics
  • Human Rights and Politics
  • Indian Politics
  • International Relations
  • International Organization (Politics)
  • Irish Politics
  • Latin American Politics
  • Middle Eastern Politics
  • Political Behaviour
  • Political Economy
  • Political Institutions
  • Political Theory
  • Political Methodology
  • Political Communication
  • Political Philosophy
  • Political Sociology
  • Politics and Law
  • Politics of Development
  • Public Policy
  • Public Administration
  • Qualitative Political Methodology
  • Quantitative Political Methodology
  • Regional Political Studies
  • Russian Politics
  • Security Studies
  • State and Local Government
  • UK Politics
  • US Politics
  • Browse content in Regional and Area Studies
  • African Studies
  • Asian Studies
  • East Asian Studies
  • Japanese Studies
  • Latin American Studies
  • Middle Eastern Studies
  • Native American Studies
  • Scottish Studies
  • Browse content in Research and Information
  • Research Methods
  • Browse content in Social Work
  • Addictions and Substance Misuse
  • Adoption and Fostering
  • Care of the Elderly
  • Child and Adolescent Social Work
  • Couple and Family Social Work
  • Direct Practice and Clinical Social Work
  • Emergency Services
  • Human Behaviour and the Social Environment
  • International and Global Issues in Social Work
  • Mental and Behavioural Health
  • Social Justice and Human Rights
  • Social Policy and Advocacy
  • Social Work and Crime and Justice
  • Social Work Macro Practice
  • Social Work Practice Settings
  • Social Work Research and Evidence-based Practice
  • Welfare and Benefit Systems
  • Browse content in Sociology
  • Childhood Studies
  • Community Development
  • Comparative and Historical Sociology
  • Disability Studies
  • Economic Sociology
  • Gender and Sexuality
  • Gerontology and Ageing
  • Health, Illness, and Medicine
  • Marriage and the Family
  • Migration Studies
  • Occupations, Professions, and Work
  • Organizations
  • Population and Demography
  • Race and Ethnicity
  • Social Theory
  • Social Movements and Social Change
  • Social Research and Statistics
  • Social Stratification, Inequality, and Mobility
  • Sociology of Religion
  • Sociology of Education
  • Sport and Leisure
  • Urban and Rural Studies
  • Browse content in Warfare and Defence
  • Defence Strategy, Planning, and Research
  • Land Forces and Warfare
  • Military Administration
  • Military Life and Institutions
  • Naval Forces and Warfare
  • Other Warfare and Defence Issues
  • Peace Studies and Conflict Resolution
  • Weapons and Equipment

research papers open access

  • Open access
Our open access publishing is key to delivering on our mission

Open access (OA) is a key part of how Oxford University Press (OUP) supports our mission to achieve the widest possible dissemination of high-quality research.  We publish rigorously peer-reviewed, world-leading, trusted open access research, upholding the highest standards of publication ethics and integrity.

We work closely with our publishing partners to ensure that we offer open access in a sustainable way, supporting publications for their communities and offering researchers publishing options for making their research available to all and compliant with funder mandates.

Our open access publishing in numbers

Our open access articles have the highest number of policy and patent document mentions, relative to volume of output, compared to other major academic publishers*

Our open access articles have the 2nd highest mean lifetime citation rate compared to other major academic publishers**

12 of our journals are diamond OA, meaning authors publish for free and readers access for free

We publish over 142 fully open access journals

More than 350 of the books we have published are open access

Over 450 of our journals have adopted a research data policy

Our Read & Publish agreements cover almost 1000 institutions at which authors can use funds to publish their article open access in an OUP journal

More than 27,000 of the journal articles we published in 2023 are open access

Open access for Journals

OUP’s options for publishing open access in journals include:

Fully open access

Articles published in fully OA journals are available to all; no subscription is required. OUP’s fully OA journals use Creative Commons licences, and there is usually an Article Processing Charge (APC) for OA publication.

Hybrid open access

Hybrid journals include a mix of open access articles and articles available to those with a journal subscription.

Hybrid journals offer authors the option of gold open access publishing. With gold open access, authors usually pay an APC to make their research articles available immediately upon publication, under a Creative Commons licence with re-use rights for readers.

For articles published under a Creative Commons licence, readers can re-use the work under the terms of the applicable licence.

‘Read and Publish’ transformative agreements

OUP has agreements with many institutions to provide access to OUP journals for faculty and students and provide funding for open access publishing for affiliated researchers. Find out which institutions are participating, and how to take advantage of available funding for publishing in an OUP journal.

Green open access and self-archiving

OUP has self-archiving policies that permit authors to take advantage of green open access by depositing their accepted manuscript (i.e. the post-acceptance version, before copyediting) into a non-commercial repository. In non-commercial repositories, articles can become freely available after the prescribed embargo period. Find out more about OUP green OA for journals.

Inclusive publishing

OUP believes that the move to open access and open research needs to be equitable and inclusive for all. We want to ensure that authors can publish in their journal of choice. As part of our Developing Countries Initiative, corresponding authors based in qualifying countries publishing in any of OUP’s fully open access journals are eligible for a full waiver of their open access charge.

Open access for Books

OUP has supported OA for books since 2012 as part of our mission to publish high-quality academic and research publications and ensure they are accessible and discoverable.

Publishing your book on an OA basis makes your work freely available online, with no barriers to access. OUP applies the same peer review and editorial development processes to all books whether published open access or under a customer sales model.

If you are considering publishing a book on an OA basis with OUP, please discuss the idea with your Editor. In most instances, the open access fee for books is met by a research funder under their funding and open access policy. All prospective authors are encouraged to provide information on any funding which directly supports the research for a proposed book so that we can plan the publishing route accordingly. You can also consult our information on funders and funder policies.

When a book is published OA it is:

  • available to read on the Oxford Academic platform, both in a browser and as a downloadable PDF
  • available on Google Books as a full preview
  • indexed in, and available from, the OAPEN online library and the Directory of Open Access Books (DOAB) as a PDF
  • sold in print and as an eBook

As well as publishing new books on an open access basis, we can also convert backlist titles to OA. If you are the author of a published work and a funder has made funds available to help accelerate OA by converting existing published works, please contact your Editor.

See the full list of our open access books.

Find out more about licences, charges and self-archiving for your open access book.

*Data source: Altmetric. Comparing number of policy and patent document mentions, relative to number of articles published, to Cambridge University Press, Elsevier, Frontiers, Hindawi, Institute of Physics Publishing, MDPI, PLOS, Sage, Springer Nature, Taylor & Francis, and Wiley.

**Data source: Dimensions. Comparing the mean lifetime citation rate of open access articles to those published by Cambridge University Press, Elsevier, Frontiers, Hindawi, Institute of Physics Publishing, MDPI, PLOS, Sage, Springer Nature, Taylor & Francis, and Wiley.

Related information

  • Complying with funder policies on open access
  • Charges, licences, and self-archiving
  • Read and publish agreements

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide.

  • Copyright © 2024 Oxford University Press

Open Access to Research Papers

Making scholarly research outputs openly available is easy, legal, and has demonstrable benefits to authors, which makes it a good first step for a researcher just beginning to explore the open world. Navigating the Open Access landscape requires some knowledge of copyright, article status, repositories, and economics. This module will introduce key concepts and tools that can help a researcher make their work openly available and maximize the benefits to themselves and others.

Related items

  • Region: Global
  • Type of Resources: E-course or annotated syllabus, learning module
  • Category: Open access
  • Target audience: Researchers
  • Target audience: STEM students
  • Target audience: Library & information specialists
  • Target audience: Early career professionals

Freely Available and Open Access Resources

Open access directories, open access databases, open access e-books & digital collections.

  • Directory of Open Access Research Papers: The world’s largest collection of open access research papers.
  • Directory of Open Access Dissertations: Browse millions of electronic theses and dissertations.
  • Directory of Open Access Journals: DOAJ's mission is to increase the visibility, accessibility, reputation, usage and impact of quality, peer-reviewed, open access scholarly research journals globally, regardless of discipline, geography or language.
  • Directory of Open Access Books: Contains several hundred academic books that are published online and are open access.
  • Directory of Open Access Repositories: OpenDOAR is the quality-assured global directory of academic open access repositories. It enables the identification, browsing and search for repositories, based on a range of features, such as location, software or type of material held.
  • Open Access JSTOR & Artstor Resources: Browse open and free content made available through JSTOR and Artstor.
  • Academic Journals: Academic Journals' mission is to accelerate the dissemination of knowledge through the publication of high-quality research articles using the open access model.
  • Internet Archive: The Internet Archive is a non-profit library of millions of free books, movies, software, music, websites, and more.
  • Paperity: Paperity is a multidisciplinary aggregator of Open Access journals and papers.
  • Google Scholar: Uses the Google search engine to search for scholarly materials such as peer-reviewed papers, theses, books, preprints, abstracts and technical reports from broad areas of research. It includes a variety of academic publishers, professional societies, preprint repositories and universities, as well as scholarly articles available across the web. Includes full text and citations.
  • Project Gutenberg: Project Gutenberg is a library of over 70,000 free eBooks, with a focus on older works for which U.S. copyright has expired.
  • Library of Congress - Open Access Books: A growing collection of contemporary open access e-books. The books in this collection cover a wide range of subjects, including history, music, poetry, technology, and works of fiction. Most of the books in this collection were published in English, but there are some titles in other languages.
  • OpenStax Open Access Textbooks: OpenStax is an educational initiative headed by Rice University dedicated to publishing high-quality, peer-reviewed, openly licensed college textbooks that are free online and low cost in print.
  • Google Books: Google's free books are made available to read through careful consideration of and respect for copyright law globally: they are public-domain works, made free on request of the copyright owner, or copyright-free, e.g. US government documents. Search Google's digital database for available books.
  • HathiTrust Digital Library: A not-for-profit collaborative of academic and research libraries now preserving 18+ million digitized items in the HathiTrust Digital Library. It offers reading access to the fullest extent allowable by U.S. and international copyright law, text and data mining tools for the entire corpus, and other emerging services based on the combined collection.
  • UCLA Library Digital Collections: Rare and unique digital materials developed by the UCLA Library to support education, research, service, and creative expression, including the AIDS Poster Collection, the Los Angeles Times Photographic Archive and more. A legacy site with additional content is also available.
  • Library of Congress Digital Collections: Still and moving images, prints, maps, and other documents that chronicle historical events, people, places, and ideas that continue to shape America and more. (From the collections of the Library of Congress.)
  • Digital Public Library of America: Launched in 2013, this site provides access to millions of digitized primary sources from archives, museums, and libraries across the United States.
  • Last Updated: Jul 26, 2024 2:08 PM
  • URL: https://guides.library.ucla.edu/free-resources


CORE: the world's largest collection of open access research papers


© Stanford University, Stanford, California 94305.

Cambridge University Press & Assessment


More than 50 percent of Cambridge research papers now open access

Over half of Cambridge University Press research articles are now published open access and so freely available to read. 


Having passed the 50 percent threshold last year - approximately 10,000 papers - Cambridge University Press is aiming for the vast majority of its research papers to be published fully open access (OA) each year by 2025. 

OA research has significantly higher readership and impact. OA articles published by Cambridge alone, freely available online via Cambridge Core, receive about 3.5 times more full-text views and on average 1.6 times more citations. Under traditional subscription models, readers, or their institutions, pay to access research.

Transformative agreements 

Cambridge, the world's oldest academic press, has signed transformative agreements covering over 2,000 institutions worldwide: enabling researchers at those universities and research institutes to publish open research at no additional cost. 

Hitting this OA milestone is especially remarkable, as about 60 percent of our research publications are in the humanities and social sciences - fields where research funding constraints have historically held back open research adoption, relative to science, technology and medicine. 

Cambridge OA papers meet the highest standards, undergoing thorough peer review before becoming permanently available for anyone to read. Open access research is published under a Creative Commons license, allowing people to freely view, download and distribute content.

Over 400 Cambridge journals offer OA options to authors: 66 are fully open access and 340 are hybrid.

Mandy Hill, Managing Director, Academic, at Cambridge University Press & Assessment, said:

"Two years ago, we set a bold ambition to transition our research publishing to open access by 2025, and this is a major milestone towards that goal. As an academic publisher, we are committed to maximising the dissemination and impact of high-quality research, and gold open access is a sustainable route to support that mission.

"That's why we will not stop at half of our journal papers being open access; we are working to make the vast majority of such research fully open by 2025.

"Transformative agreements have provided an important route towards open access for all authors, irrespective of their funding. We are building on this momentum to explore a range of business models to take us beyond the Transformative Agreement and establish innovations to ensure the world's academics, students and citizens can enjoy open access in a sustainable manner."

Sustainable and open 

Cambridge has adopted, and in some cases pioneered, financial models to make open research sustainable. In the United States alone, Cambridge has signed more than 300 such agreements within the last two years, including with the University of California, the world's leading public research university system. 

We have developed our own open research platform, Cambridge Open Engage, to accelerate dissemination and collaboration.

This has led to partnerships with ChemRxiv and with major conferences such as Climate Exp0, and has extended our existing relationship with the American Political Science Association (APSA).

The open approach is bolstered by Cambridge Prisms, a series of fully OA journals that map out and build connections in cross-disciplinary subject areas to address real-world challenges. Coastal Futures, Precision Medicine, Global Mental Health and Extinction launched in January 2023, with Plastics and Water set to launch in early summer.

Beyond journal papers, Cambridge is experimenting with new models to publish open research monographs, such as Flip it Open, which makes the most popular books freely available online once they have met a revenue threshold.


© 2024 Cambridge University Press & Assessment


Open Access Repositories and Self-Archiving

A repository is an online database used by institutions and organizations to capture, preserve, and provide access to the intellectual output of a scholarly community. Self-archiving ensures the long-term digital preservation of a work, as well as increasing its visibility on search engines such as Google, and boosting its potential impact.

With the University of Colorado Boulder's open access policy (effective April 22, 2015) and its repository, CU Scholar, all faculty grant The Regents of the University of Colorado a non-exclusive, irrevocable, worldwide license to exercise any and all rights under copyright relating to their scholarly journal articles and conference proceedings. View the full policy.

The University of Colorado Boulder Open Access Policy helps faculty retain the rights often signed over to a publisher when signing an agreement. An optional author addendum may be added to a publication submission to further reserve rights to self-archive.

Many U.S. federal funders have established public access policies for the research they fund. Make sure to review your obligations for sharing your federally funded research.

Visit CU Scholar 

Additional Information and Resources on Depositing your Work in a Repository

General information and advantages of depositing at CU Boulder:

  • Terms of Use for CU Scholar

Resources regarding copyright and self-deposit:

  • SHERPA-RoMEO – directory of publisher policies on self-archiving
  • SPARC Author Addendum – how to retain rights in your author agreement

Federal Funder Sharing Policies:

  • NIH Public Access Policy
  • SPARC Article Sharing Requirements by Federal Agency

Subject Repositories

Subject Repository List – List of open access repositories that accept research based on discipline rather than institution

Selected Subject Repositories:

  • arXiv.org – Contains over 500,000 e-prints in the fields of physics, mathematics, computer science, quantitative biology, statistics and non-linear science. Maintained by Cornell University.
  • E-Print Network (U.S. Department of Energy, Office of Scientific and Technical Information) – Provides access to numerous repositories in life sciences, engineering, environmental sciences, mathematics, physics and other disciplines relevant to the Department of Energy.
  • PubMed Central – The National Institutes of Health’s publicly accessible repository of biomedical and life sciences journal literature.
  • Humanities Commons – A platform for humanists to share research and teaching materials and cultivate interdisciplinary connections. Maintained by MLA.


Researchers on open access practices 2024.

August 01, 2024


Researchers on Open Practices

The growth of open access continues to accelerate within the research community, transforming how knowledge is disseminated and accessed worldwide. To understand what researchers truly think about open research, we conducted an extensive survey in which over 600 authors from 125 countries participated. Our findings demonstrate researchers' evolving attitudes towards open access practices.

Open access


72% of respondents had published their work open access in the past three years. 


87% agree that publishing an article open access increases the impact of their work, compared with 77% in 2021.

 Transformational agreements, institutions and funding

78% of researchers would submit their research open access if the article publication charge (APC) was paid by their institution or funder.

Are transformational agreements (TAs) the solution? 78% of researchers (a 3% increase compared to 2023) consider transformational agreements the right solution to making research findings more openly available.

68% of researchers cite lack of funds for publishing fees as an obstacle to open access publishing, a significant decrease of 13% compared to 2023.

Each year, a significantly lower percentage of TA-affiliated researchers report problems accessing articles, as more articles transition to fully open access.

Check if your open access fees could be covered via a transformational agreement here.

 Impact of open practices

Percentage of authors engaging in open practices compared to 2023.

Many open research behaviours, including open access, open data and open peer review, have high participation rates. 


86% of researchers felt it’s important or very important to have the final Version of Record published in a peer-reviewed journal.

‘Benefits-based’ reasons are more important motivators than ‘requirements’ in driving open research behaviours. The most commonly cited reasons researchers choose to publish open access are the impact and visibility it provides, public benefit, and transparency and reuse of their work.

Motivators for those publishing open access: (graph showing 65%)

 Supporting the Transition to Open Research


As of July 2024, we have over 100 transformational agreements in place.

These agreements give researchers from 2,800+ institutions the opportunity to publish open access and benefit from their research being made publicly available to all. That’s over 82,000 articles that will be open access thanks to transformational agreements.

Research4Life

We partner with Research4Life to make the benefits of open access publishing available to authors in low- and lower-middle-income countries. The article publication charges (APCs) for publishing in our fully open access portfolio are:

  • Waived for 86 countries
  • Discounted for 41 countries

In 2023, Wiley waived the APCs for 3,250 articles, 38% more than in 2022.

Our survey findings show that authors in Research4Life countries engage in open practices at about the same rate as authors globally.   

 Find out more about open access here.


Your Guide to Publishing Open Access with Wiley


Navigating Creative Commons: A Guide to CC BY, CC BY-NC, and CC BY-NC-ND Licenses


Introducing the Forward Series journals


Advancing Open Access: A broader portfolio of journals offers greater discoverability and impact


Demonstrating the Advantage of Publishing Open Access with Wiley


Why Choose Open Access Infographic


There are Clear Advantages When you Choose to Publish Open Access with Wiley


Publishing Gold Open Access With Wiley

Related articles

What is open access and how does it work? Read Wiley's guide to publishing open access to learn everything there is to know about OA.

Looking to learn about which CC BY license is right for you?

Learn about our new Forward Series journals and how they keep author experience and research integrity at their core.

Discover our newly integrated portfolio of journals, providing varied publishing options and ensuring the discovery of high-quality research.

Discover the ever-evolving advantage of publishing Open Access with Wiley.

Publishing your work open access means it’s available for anyone to read, use and build upon – enabling faster innovation, and removing barriers to readership.

You can be confident that your work has the best chance to be read, cited, and shared. Here's the data to prove it.

Learn the steps to publishing Gold Open Access with Wiley


Thriving, not just surviving: Women’s health research today

Is women's health research getting the attention it deserves?


Self-Archiving with Wiley

Publishing your work open access means it’s available for anyone to read


Copyright © 2000-2024 by John Wiley & Sons, Inc., or related companies. All rights reserved, including rights for text and data mining and training of artificial technologies or similar technologies.


Title: Evaluating SAM2's Role in Camouflaged Object Detection: From SAM to SAM2

Abstract: The Segment Anything Model (SAM), introduced by Meta AI Research as a generic object segmentation model, quickly garnered widespread attention and significantly influenced the academic community. To extend its application to video, Meta further developed Segment Anything Model 2 (SAM2), a unified model capable of both video and image segmentation. SAM2 shows notable improvements over its predecessor in terms of applicable domains, promptable segmentation accuracy, and running speed. However, this report reveals a decline in SAM2's ability to perceive different objects in images without prompts in its auto mode, compared with SAM. Specifically, we employ the challenging task of camouflaged object detection to assess this performance decrease, hoping to inspire further exploration of the SAM model family by researchers. The results of this paper are provided in this https URL.
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: [cs.CV]
  (or [cs.CV] for this version)
  arXiv-issued DOI via DataCite


Knowledge and Information Systems

An International Journal

  • Examines theoretical foundations, infrastructure, and enabling technologies of knowledge and information systems.
  • Each individual issue focuses on a small number of theme areas.
  • Publishes original research, critical reviews, and experience and vision papers.
  • Encourages submissions of high-quality papers from relevant conferences.
  • Aims for a target turnaround time of 3 months for reviews.
  • Diane Cook,
  • Dacheng Tao


Latest issue

Volume 66, Issue 8

Latest articles

Crop health assessment through hierarchical fuzzy rule-based status maps.

  • Danilo Cavaliere
  • Sabrina Senatore
  • Vincenzo Loia


An efficient approach for incremental erasable utility pattern mining from non-binary data

  • Yoonji Baek


An empirical study of a novel multimodal dataset for low-resource machine translation

  • Loitongbam Sanayai Meetei
  • Thoudam Doren Singh
  • Sivaji Bandyopadhyay


Wave Hedges distance-based feature fusion and hybrid optimization-enabled deep learning for cyber credit card fraud detection

  • Venkata Ratnam Ganji
  • Aparna Chaparala


An adaptive and late multifusion framework in contextual representation based on evidential deep learning and Dempster–Shafer theory

  • Doaa Mohey El-Din
  • Aboul Ella Hassanein
  • Ehab E. Hassanien


Journal updates

Call for survey papers.

Knowledge and Information Systems seeks to publish survey papers on areas covered by the journal.

Journal information

  • ACM Digital Library
  • Australian Business Deans Council (ABDC) Journal Quality List
  • Current Contents/Engineering, Computing and Technology
  • EI Compendex
  • Google Scholar
  • INIS Atomindex
  • Japanese Science and Technology Agency (JST)
  • OCLC WorldCat Discovery Service
  • Science Citation Index Expanded (SCIE)
  • TD Net Discovery Service
  • UGC-CARE List (India)


© Springer-Verlag London Ltd., part of Springer Nature


A and B, left panels: Observed monthly mortality rate at 3 universities in China, and BI (monthly mean) in Beijing and Heilongjiang province, January 2016 to January 2023. The dotted vertical line represents the removal of the zero COVID policy in December 2022. The figure displays the BI trends, represented by light blue lines. Dark blue lines represent mortality rates at Tsinghua University (THU) (A, left panel) and Harbin Institute of Technology (HIT) (B, left panel), and the light gray line represents mortality rates at Peking University (PKU) (A, left panel). A and B, right panels: Association between weekly death counts and BI (weekly mean), November 2022 to January 2023. Weekly deaths from PKU and HIT were used. Weekly death counts from THU were not obtainable. The orange, brown, and gray squares represent weeks in November 2022, December 2022, and January 2023, respectively.

HIT indicates Harbin Institute of Technology; PKU, Peking University; THU, Tsinghua University.

eFigure 1. Daily BI by term and region (September 1, 2022-January 31, 2023)

eFigure 2. Trends of COVID-19 related and non-COVID-19 related BI, December 2021-February 2023

Data Sharing Statement


Xiao H , Wang Z , Liu F , Unger JM. Excess All-Cause Mortality in China After Ending the Zero COVID Policy. JAMA Netw Open. 2023;6(8):e2330877. doi:10.1001/jamanetworkopen.2023.30877


Excess All-Cause Mortality in China After Ending the Zero COVID Policy

  • 1 Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
  • 2 Independent researcher

Question: Was the sudden end of China’s zero COVID policy associated with an increase in population all-cause mortality?

Findings: In this cohort study across all regions in mainland China, an estimated 1.87 million excess deaths occurred among individuals 30 years and older during the first 2 months after the end of China’s zero COVID policy. Excess deaths predominantly occurred among older individuals and were observed across all provinces in mainland China, with the exception of Tibet.

Meaning: These findings suggest that the sudden lifting of the zero COVID policy in China was associated with significant increases in all-cause mortality.

Importance: In China, the implementation of stringent mitigation measures kept COVID-19 incidence and excess mortality low during the first years of the pandemic. However, China’s decision to end its dynamic zero COVID policy (a proactive strategy that deploys mass testing and strict quarantine measures to stamp out any outbreak before it can spread) in December 2022 resulted in a surge in COVID-19 incidence and hospitalizations. Despite worldwide attention given to this event, the actual impact of this sudden shift in policy on population mortality has not been empirically estimated.

Objective: To assess the association of the sudden shift in China’s dynamic zero COVID policy with mortality using empirical and syndromic surveillance data.

Design, Setting, and Participants: This cohort study analyzed published obituary data from 3 universities in China (2 in Beijing and 1 in Heilongjiang) and search engine data from the Baidu index (BI; weighted frequency of unique searches for a given keyword relative to the total search volume on the Baidu search engine) in each region of China from January 1, 2016, to January 31, 2023. Using an interrupted time-series design, analyses estimated the relative change in mortality among individuals 30 years and older in the universities and the change in BI for mortality-related terms in each region of China from December 2022 to January 2023. Analysis revealed a strong correlation between Baidu searches for mortality-related keywords and actual mortality burden. Using this correlation, the relative increase in mortality in Beijing and Heilongjiang was extrapolated to the rest of China, and region-specific excess mortality was calculated by multiplying the proportional increase in mortality by the number of expected deaths. Data analysis was performed from February 10, 2023, to March 5, 2023.

Exposure: The end of the dynamic zero COVID policy in December 2022 in China.

Main Outcomes and Measures: Monthly all-cause mortality by region.

Results: An estimated 1.87 million (95% CI, 0.71 million-4.43 million; 1.33 per 1000 population) excess deaths occurred among individuals 30 years and older in China during the first 2 months after the end of the zero COVID policy. Excess deaths predominantly occurred among older individuals and were observed across all provinces in mainland China except Tibet.

Conclusions and Relevance: In this cohort study of the population in China, the sudden lifting of the zero COVID policy was associated with significant increases in all-cause mortality. These findings provide valuable insights for policy makers and public health experts and are important for understanding how the sudden propagation of COVID-19 across a population may be associated with population mortality.

During the first 3 years of the pandemic, China experienced low COVID-19–related excess mortality due to the implementation of stringent mitigation measures. 1 However, after China ended its dynamic zero COVID policy in December 2022, COVID-19 incidence and hospitalizations surged. 2 The Chinese government reported that approximately 60 000 COVID-19–related deaths occurred in health facilities in China from early December 2022 to January 12, 2023. 2 Prior forecasts had anticipated a notably higher number of excess deaths, ranging from 0.97 million to 2.10 million, if the zero COVID policy were abandoned during the Omicron surge. 3-7 However, those model-based forecasts of excess deaths lacked an empirical basis.

Mortality information was derived from published obituary data for Peking University (PKU) and Tsinghua University (THU) in Beijing and Harbin Institute of Technology (HIT) in Harbin (Heilongjiang province) from January 1, 2016, to January 31, 2023. As of 2022, the numbers of employees, including current and retired staff, at the 3 universities were 19 992, 19 898, and 7293, respectively. Each university published the obituary of every deceased official employee on its website, with an average delay of about 3 days after the date of death. Importantly, this process was applied consistently both before and during the COVID-19 pandemic. Obituaries were published for all deceased official employees, regardless of age, sex, position (eg, professor, researcher, technician, librarian, and administrative staff), and employment status (ie, currently employed or retired). Our analysis did not include employees from the affiliated hospitals of the 3 universities because their obituaries were not published on the universities’ websites.

Syndromic surveillance data were collected through search queries from Baidu, a Chinese internet search engine. The Baidu index (BI) is the weighted frequency of unique searches for a given keyword relative to the total search volume on Baidu. 8 The number of internet users in China exceeded 1 billion as of March 2023, and Baidu search’s penetration rate reached over 90% among internet search engine users. 9 , 10 The BI has been widely used as a data source for infodemiology and infoveillance studies, particularly during outbreaks of infectious disease. 11 - 14 Daily BI values for mortality-related Chinese terms for “funeral parlour (殡仪馆[Bin Yi Guan]),” “cremation (火葬[Huo Zang]),” “crematorium (火葬场[Huo Zang Chang]),” and “burial (土葬[Tu Zang])” in each region (22 provinces, 4 municipalities, and 5 autonomous regions) of mainland China from January 1, 2016, to January 31, 2023, were obtained ( https://index.baidu.com/v2/index.html#/ ). This study was exempt from institutional review approval owing to the use of published literature and publicly available data.

We estimated the relative change in mortality among individuals 30 years and older in Beijing and Harbin from December 2022 to January 2023 using an interrupted time-series design, a quasi-experimental design widely used to assess the causal impact of shocks or interventions introduced at a distinct point in time. 15 We used a segmented negative binomial regression model, separating the time series into 3 periods: a pre–COVID-19 period (January 2016-December 2019), a period with stringent mitigation measures (January 2020-November 2022), and a post–zero COVID policy period (December 2022-January 2023). We included a linear time term to capture the long-term secular trend in the mortality rate. The negative binomial model equation (Equation 1) estimating monthly counts of deaths was specified as:

$$\ln E(\text{Death}) = \beta_0 + \beta_1\,\text{Month} + \beta_2\,\text{COVID} + \beta_3\,\text{ZeroCovid} + \operatorname{offset}(\ln P)$$

Here, Death represents the monthly count of deaths; Month is the time (in months) since the start of the study period; COVID is an indicator variable for whether a month falls after the start of the COVID-19 pandemic (coded as 1 for months after December 2019, and 0 otherwise); ZeroCovid is an indicator variable for whether a month falls after the end of the zero COVID policy (coded as 1 for months after November 2022, and 0 otherwise); and P represents the catchment population (number of employees) in month t. Newey-West standard errors with autocorrelation of 1 lag were used.
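As an illustration of the coding described above, the following Python sketch builds one row of Equation 1's design matrix per month from January 2016 through January 2023 (85 months). It is a hypothetical helper rather than the authors' code: it assumes Month is counted from 0 at January 2016, and the example holds the PKU + THU employee count constant for the ln P offset because the article reports employee numbers only as of 2022.

```python
import math


def monthly_design(n_months=85, population=None):
    """Hypothetical helper: one row [1, Month, COVID, ZeroCovid] per month of
    Equation 1, starting at January 2016, plus ln(P) offsets if populations are given."""
    rows, offsets = [], []
    for t in range(n_months):
        year, month = 2016 + t // 12, t % 12 + 1                  # calendar month for index t
        covid = 1.0 if (year, month) > (2019, 12) else 0.0       # months after December 2019
        zero_covid = 1.0 if (year, month) > (2022, 11) else 0.0  # months after November 2022
        rows.append([1.0, float(t), covid, zero_covid])          # leading 1.0 is the intercept
        if population is not None:
            offsets.append(math.log(population[t]))              # offset(ln P)
    return rows, offsets


# Assumption: PKU + THU employees (as of 2022) held constant over the whole period.
rows, offsets = monthly_design(population=[19992 + 19898] * 85)
print(rows[47], rows[83])  # December 2019 and December 2022
# → [1.0, 47.0, 0.0, 0.0] [1.0, 83.0, 1.0, 1.0]
```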

Similarly, we estimated the relative change in the BI associated with the lifting of the zero COVID policy in each region in China. The negative binomial model equation (Equation 2) estimating the daily BI was specified as:

$$\ln E(\text{BI}_i) = \beta_0 + \beta_1\,\text{Day} + \beta_2\,\text{COVID} + \beta_3\,\text{ZeroCovid}$$

Here, $\text{BI}_i$ represents the daily BI in region i; Day is the time (in days) since the start of the study period; COVID is an indicator variable for whether a day falls after the start of the COVID-19 pandemic (coded as 1 for days after December 2019, and 0 otherwise); and ZeroCovid is an indicator variable for whether a day falls after the end of the zero COVID policy (coded as 1 for days after November 2022, and 0 otherwise). Newey-West standard errors with autocorrelation of 1 lag were used.
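Equations 1 and 2 are negative binomial regressions with a log link. A minimal pure-Python sketch of how such a model can be fitted by iteratively reweighted least squares (IRLS) follows. It assumes an NB2 variance, mu + alpha·mu², with the dispersion alpha fixed at a hypothetical value rather than estimated, and it returns point estimates only; the Newey-West standard errors used in the study are not reproduced. The function names `solve` and `fit_nb_log_link` are illustrative, not from the article, whose analyses were run in R.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_nb_log_link(X, y, offset=None, alpha=0.1, iters=100, tol=1e-10):
    """IRLS for ln E(y) = X @ beta + offset under an NB2 variance mu + alpha*mu**2.

    alpha is held fixed (an assumed value), so only beta is estimated.
    X must include a leading column of 1s for the intercept.
    """
    n, p = len(X), len(X[0])
    offset = offset if offset is not None else [0.0] * n
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / n) - sum(offset) / n  # start near the mean log rate
    for _ in range(iters):
        eta = [sum(X[i][j] * beta[j] for j in range(p)) + offset[i] for i in range(n)]
        mu = [math.exp(e) for e in eta]
        w = [m / (1.0 + alpha * m) for m in mu]  # (dmu/deta)^2 / Var(y) for the log link
        z = [eta[i] - offset[i] + (y[i] - mu[i]) / mu[i] for i in range(n)]  # working response
        xtwx = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        xtwz = [sum(w[i] * X[i][j] * z[i] for i in range(n)) for j in range(p)]
        new_beta = solve(xtwx, xtwz)  # weighted least-squares update
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta
```

For Equation 1, pass the log employee counts as `offset`; for Equation 2, leave `offset` unset. Because of the log link, the ZeroCovid coefficient β3 corresponds to a relative change of exp(β3) − 1.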

We observed a strong positive correlation (Beijing: r = 0.95, P < .001; Heilongjiang: r = 0.97, P < .001) between the change in BI for mortality-related terms and the change in mortality after the zero COVID policy was relaxed (Figure 1). Additionally, similar patterns in the change of BI for mortality-related terms were observed in all regions of China (eFigure 1 in Supplement 1). Therefore, the relative increase in mortality in the reference region (Beijing and Heilongjiang) was extrapolated to the rest of China, assuming the same proportional association (Equation 3):

$$\frac{\text{BI}[\text{RR}_{\text{ref}}] - 1}{\text{MR}[\text{RR}_{\text{ref}}] - 1} = \frac{\text{BI}[\text{RR}_i] - 1}{\text{MR}[\text{RR}_i] - 1}$$

Here, $\text{BI}[\text{RR}_{\text{ref}}] - 1$ and $\text{BI}[\text{RR}_i] - 1$ represent the estimated relative change in BI using Equation 2 for the reference region and region i, respectively. The estimated relative change in mortality rate for the reference region is represented by $\text{MR}[\text{RR}_{\text{ref}}] - 1$, calculated using Equation 1, while $\text{MR}[\text{RR}_i] - 1$ denotes the projected relative change in mortality rate in region i.
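Solving Equation 3 for the unknown term gives MR[RR_i] − 1 = (BI[RR_i] − 1) × (MR[RR_ref] − 1) / (BI[RR_ref] − 1). The sketch below implements that rearrangement. It assumes each relative change is taken from the corresponding model's ZeroCovid coefficient as exp(β3) − 1, which is the usual interpretation of a log-link coefficient but is not spelled out in the article; the function names and inputs are hypothetical.

```python
import math


def relative_change(beta3):
    """Relative change implied by a log-link ZeroCovid coefficient: exp(beta3) - 1."""
    return math.exp(beta3) - 1.0


def extrapolate_mortality_change(bi_change_ref, mr_change_ref, bi_change_region):
    """Solve Equation 3 for MR[RR_i] - 1, holding the BI-to-mortality ratio fixed."""
    if bi_change_ref == 0:
        raise ValueError("BI[RR_ref] - 1 must be nonzero")
    return bi_change_region * mr_change_ref / bi_change_ref


# Hypothetical inputs: reference BI up 200%, reference mortality up 400%, region BI up 100%.
print(extrapolate_mortality_change(2.0, 4.0, 1.0))  # 2.0, i.e. a projected 200% rise
```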

Region-specific excess mortality was calculated by multiplying the proportional increase in mortality by the number of expected deaths (Equation 4):

$$\text{EM}_i = \left(\text{MR}[\text{RR}_i] - 1\right) \times \text{Death}_i$$

Here, $\text{EM}_i$ is the excess mortality for region i, and $\text{Death}_i$ is the expected number of deaths for region i in December and January. The expected number of deaths by region and month was derived from the 2020 census data 16 and China National Disease Surveillance Points. 17
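Equation 4 is a per-region multiplication. The following sketch applies it to a dictionary of regions and sums the results, assuming (as the national total in the Results suggests) that overall excess mortality is the sum of the regional values. Region names and numbers are made up for illustration.

```python
def excess_deaths(mr_change, expected_deaths):
    """Equation 4 per region, EM_i = (MR[RR_i] - 1) * Death_i, plus the sum over regions."""
    per_region = {r: mr_change[r] * expected_deaths[r] for r in mr_change}
    return per_region, sum(per_region.values())


# Hypothetical regions "A" and "B" with made-up inputs (not values from the study).
per_region, total = excess_deaths({"A": 1.0, "B": 0.5}, {"A": 1000.0, "B": 2000.0})
print(per_region, total)  # {'A': 1000.0, 'B': 1000.0} 2000.0
```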

Additionally, sensitivity analyses were conducted using the leave-one-out method; that is, in 3 additional analyses, we iteratively excluded 1 of the 3 universities. In both the primary and sensitivity analyses, parameter uncertainty was incorporated by randomly drawing 10 000 samples from each parameter distribution and propagating this uncertainty forward through each step of the analysis. A 2-sided P < .05 indicated statistical significance. Analyses were conducted in R, version 4.2.1 (R Foundation for Statistical Computing). This study is reported per the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines for cohort studies.
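The article says only that 10 000 samples were drawn from each parameter distribution and carried through every step. The following sketch shows one common way to do that for a single region, under stated assumptions: each ZeroCovid coefficient is treated as normal with its standard error, draws pass through Equations 3 and 4, and the 2.5th and 97.5th percentiles form the interval. It does not reproduce the leave-one-out analyses, and all inputs are hypothetical.

```python
import math
import random


def simulate_excess(mr_ref, bi_ref, bi_region, expected_deaths, n_draws=10_000, seed=1):
    """Propagate coefficient uncertainty through Equations 3 and 4 for one region.

    Each of mr_ref, bi_ref, bi_region is a (beta3, standard_error) pair, assumed normal.
    Returns (median, 2.5th percentile, 97.5th percentile) of excess deaths.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        mr_ref_change = math.exp(rng.gauss(*mr_ref)) - 1.0   # MR[RR_ref] - 1 (Equation 1)
        bi_ref_change = math.exp(rng.gauss(*bi_ref)) - 1.0   # BI[RR_ref] - 1 (Equation 2)
        bi_i_change = math.exp(rng.gauss(*bi_region)) - 1.0  # BI[RR_i] - 1 (Equation 2)
        # Equation 3; assumes draws keep BI[RR_ref] - 1 well away from zero.
        mr_i_change = bi_i_change * mr_ref_change / bi_ref_change
        draws.append(mr_i_change * expected_deaths)           # Equation 4
    draws.sort()
    return (draws[n_draws // 2],
            draws[int(0.025 * n_draws)],
            draws[int(0.975 * n_draws)])


# Hypothetical log-scale coefficients and standard errors, not study estimates.
print(simulate_excess((math.log(5), 0.1), (math.log(3), 0.1), (math.log(2), 0.1), 1000.0))
```

A fixed seed keeps the result reproducible. The ratio in Equation 3 becomes unstable if draws of BI[RR_ref] − 1 approach zero, which is why the example uses a reference coefficient well away from 0.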

A total of 130 and 42 deaths occurred among employees of the 2 universities (PKU and THU) in Beijing in December 2022 and January 2023, respectively. A total of 12 and 19 deaths were reported among employees of HIT in Heilongjiang in December 2022 and January 2023, respectively. Among those deaths in Beijing, 76% (95% CI, 65%-84%) were in male individuals, and 80% (95% CI, 70%-87%) were in individuals 85 years and older, which was higher (P < .001) than the proportion of deaths in the 85-years-and-older age group in the prepandemic period and the first 3 years of the pandemic (Table 1). The age and sex distributions among the deaths were similar in HIT (Table 1).

In both cities, death counts peaked in the fourth week of December 2022, concurrent with the highest BI in most provinces on December 25 (Figure 1; eFigure 1 in Supplement 1). The number of deaths in universities in Beijing showed a substantial increase compared with expected deaths, with a rise of 403% (95% CI, 351%-461%) and 56% (95% CI, 41%-73%) during December 2022 and January 2023, respectively (Table 1, Figure 1). Similarly, observed deaths in HIT were statistically significantly higher than the expected deaths both in December 2022 (12 vs 3; P < .001) and January 2023 (19 vs 3; P < .001).
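The article does not state which test produced the P values for HIT. Purely as an illustrative assumption, one simple check is the upper tail of a Poisson distribution whose mean is the expected count: how probable are 12 or more deaths if 3 were expected? The sketch below computes that tail and the corresponding relative increase; it is not necessarily the authors' method, and the expected count of 3 is the rounded value quoted in the text.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), summed in log space for stability."""
    upper = observed + 10 * int(expected) + 100  # terms beyond this are negligible here
    return sum(math.exp(-expected + k * math.log(expected) - math.lgamma(k + 1))
               for k in range(observed, upper))


def relative_increase(observed, expected):
    """Observed / expected - 1, e.g. 3.0 means a 300% rise."""
    return observed / expected - 1.0


# HIT, December 2022: 12 observed vs 3 expected (expected count as rounded in the text).
print(relative_increase(12, 3), poisson_upper_tail(12, 3.0) < 0.001)  # 3.0 True
```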

We assessed the validity of our model by examining whether placebo search terms (that is, search terms that are not expected to be related to the lifting of the zero COVID policy) also increased concurrently with mortality-related search terms. We found no evidence that they did, suggesting that mortality-related search terms in the BI served as a valid surrogate for increased mortality (eFigure 2 in Supplement 1).

Overall, an estimated 1.87 million (95% CI, 0.71 million-4.43 million; 1.33 per 1000 population) excess deaths among individuals 30 years and older occurred in China from December 2022 to January 2023. Statistically significant increases in mortality were observed in all provinces except Tibet, ranging from 77% (95% CI, 24%-197%) in Guangxi to 279% (95% CI, 109%-658%) in Ningxia (Table 2, Figure 2). Estimates for excess deaths were generally consistent in the specified leave-one-out analyses (Figure 3).

We estimated 1.87 million excess deaths in China during the first 2 months after the end of China’s zero COVID policy. Excess deaths predominantly occurred among older individuals and were observed in all provinces in China except Tibet. The number of excess deaths far exceeded the official Chinese government figure of 60 000, although the pattern of excess deaths was consistent with Chinese government reports that COVID-19–related hospitalizations and deaths in hospitals peaked at the end of December 2022. 2

Our estimate is consistent with prior model-based forecasts of excess deaths, though toward the higher end of their range. 5,6,18 Airfinity’s model projected between 1.3 million and 2.1 million deaths (0.93-1.50 per 1000 population) in China if the government abruptly ended its zero COVID policy. 5,7 Using a stochastic dynamic model of SARS-CoV-2 transmission, Cai et al 6 projected that the Omicron wave in mainland China could cause 112.2 million symptomatic cases (79.58 per 1000 population) and 1.6 million deaths (1.10 per 1000 population) should the zero COVID policy be lifted. Drawing on the experience of Hong Kong in 2022 as a prototype, Ioannidis et al 18 projected 0.99 million COVID-19 deaths (0.71 per 1000 population) if the entire population of China were infected after the zero COVID policy was abandoned. That our estimate is higher than some forecasts may reflect a greater effect of the SARS-CoV-2 virus than anticipated on a population with limited immunity. 3,18

Our study is among the first to provide rigorously derived, empirical estimates of excess deaths in China after the lifting of the zero COVID policy. Given the absence of comprehensive, publicly available data from China, our novel strategy for estimating excess deaths is timely and important for this topic of public health concern, both in China and internationally. It also demonstrates how the strategic combination of data sources can provide insights into seemingly opaque public health research questions. However, our study has limitations. The reliance on obituary data for employees from 3 universities in Beijing and Heilongjiang could result in an overestimation of excess mortality because university employees were older than the general population or, alternatively, an underestimation because the employees had higher socioeconomic status. 19 Such biases may be especially pronounced if the representation of these characteristics among those with obituary data changed over time. Also, increases in BI searches may not have fully reflected mortality increases outside the reference region, leading to underestimation of excess mortality in other regions. Further validation of our estimate will be crucial once alternative data sources (eg, population-based mortality data at the national or subnational level) become available. In particular, data delineated by age, sex, and socioeconomic status would allow covariate adjustment for these important demographic variables.

In this cohort study of the population in China, we found that the sudden lifting of the zero COVID policy was associated with significant increases in all-cause mortality. Our study of excess deaths related to the lifting of the zero COVID policy in China sets an empirically derived benchmark estimate. These findings are important for understanding how the sudden propagation of COVID-19 across a population may impact population mortality.

Accepted for Publication: July 19, 2023.

Published: August 24, 2023. doi:10.1001/jamanetworkopen.2023.30877

Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2023 Xiao H et al. JAMA Network Open.

Corresponding Authors: Hong Xiao, PhD ([email protected]), and Joseph M. Unger, PhD, MS ([email protected]), Public Health Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Seattle, WA 98109-1024.

Author Contributions: Dr Xiao had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Drs Xiao and Wang contributed equally.

Concept and design: All authors.

Acquisition, analysis, or interpretation of data: All authors.

Drafting of the manuscript: Xiao, Unger.

Critical review of the manuscript for important intellectual content: All authors.

Statistical analysis: All authors.

Obtained funding: Unger.

Administrative, technical, or material support: Xiao.

Supervision: Unger.

Conflict of Interest Disclosures: None reported.

Funding/Support: Research reported in this publication was supported by the Public Health Sciences Division of the Fred Hutchinson Cancer Center.

Role of the Funder/Sponsor: The Public Health Sciences Division of the Fred Hutchinson Cancer Center had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Data Sharing Statement: See Supplement 2.




COMMENTS

  1. OA.mg

    Free access to millions of research papers for everyone. OA.mg is a search engine for academic papers. Whether you are looking for a specific paper, or for research from a field, or all of an author's works - OA.mg is the place to find it. Universities and researchers funded by the public publish their research in papers, but where do we ...

  2. ScienceOpen

    Make an impact and build your research profile in the open with ScienceOpen. Search and discover relevant research in over 95 million Open Access articles and article records; Share your expertise and get credit by publicly reviewing any article; Publish your poster or preprint and track usage and impact with article- and author-level metrics; Create a topical Collection to advance your ...

  3. Directory of Open Access Journals

    About the directory. DOAJ is a unique and extensive index of diverse open access journals from around the world, driven by a growing community, and is committed to ensuring quality content is freely available online for everyone. DOAJ is committed to keeping its services free of charge, including being indexed, and its data freely available.

  4. CORE

    Research Policy Adviser. Aggregation plays an increasingly essential role in maximising the long-term benefits of open access, helping to turn the promise of a 'research commons' into a reality. The aggregation services that CORE provides therefore make a very valuable contribution to the evolving open access environment in the UK.

  5. arXiv.org e-Print archive

    arXiv is a free distribution service and an open-access archive for nearly 2.4 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. Materials on this site are not peer-reviewed by arXiv.

  6. Open Research Library

    The Open Research Library (ORL) is planned to include all Open Access book content worldwide on one platform for user-friendly discovery, offering a seamless experience navigating more than 20,000 Open Access books.

  7. SpringerOpen

    The SpringerOpen portfolio has grown tremendously since its launch in 2010, so that we now offer researchers from all areas of science, technology, medicine, the humanities and social sciences a place to publish open access in journals. Publishing with SpringerOpen makes your work freely available online for everyone, immediately upon ...

  8. Open access

    Open access is an integral part of Elsevier's commitment to a collaborative. ... Each year, we receive around 3 million research papers from authors. Whether published open access or via subscription model, they are all rigorously reviewed by our in-house editorial teams in collaboration with 33,000 editors and 1.5 million expert reviewers ...

  9. Journals

    Asian-Pacific Journal of Second and Foreign Language Education. Disciplinary and Interdisciplinary Science Education Research. Empirical Research in Vocational Education and Training. International Journal of Child Care and Education Policy. International Journal of STEM Education. Language Testing in Asia.

  10. The fundamentals of open access and open research

    Open access (OA) refers to the free, immediate, online availability of research outputs such as journal articles or books, combined with the rights to use these outputs fully in the digital environment. OA content is open to all, with no access fees. Open research goes beyond the boundaries of publications to consider all research outputs ...

  11. Open access journals

    We publish the world's most significant open access portfolio. In 2016, we helped over 78,000 authors from all over the world make their research freely available. We publish over 600 fully open access journals in all disciplines, from the life sciences to the humanities. Authors also have the option to publish their article under an open access licence in more than 1,700 of our subscription ...

  12. Open access at the Nature Portfolio

    Nature and the Nature research journals - now with immediate gold open access options for all primary research. ... publishing papers that significantly advance the natural sciences. Since ...

  13. CORE: A Global Aggregation Service for Open Access Papers

    Abstract. This paper introduces CORE, a widely used scholarly service, which provides access to the world's largest collection of open access research publications, acquired from a global ...

  14. MIT Open Access Articles

    MIT Open Access Articles. The MIT Open Access Articles collection consists of scholarly articles written by MIT-affiliated authors that are made available through DSpace@MIT under the MIT Faculty Open Access Policy, or under related publisher agreements. Articles in this collection generally reflect changes made during peer-review.

  15. Open access

    Open access (OA) is a key part of how Oxford University Press (OUP) supports our mission to achieve the widest possible dissemination of high-quality research. We publish rigorously peer-reviewed, world-leading, trusted open access research, upholding the highest standards of publication ethics and integrity. We work closely with our publishing ...

  16. Open Access at AAAS

    We support Open Access (OA) options that are informed by the scientific community, contribute to the accurate record of published scientific content, and protect the overall integrity of that content. ... We also encourage authors to use preprint servers to make earlier versions of research papers and the underlying data available.

  17. Publishing Open Access research journals & papers

    Hindawi journals have joined Wiley's open access journal portfolio. Journal content is available to openly view, download, and share on Wiley Online Library. With a 200-year tradition of publishing excellence, Wiley is committed to expanding routes to open access publishing and ensuring the maximum reach and impact of high-quality, trusted research for the benefit of humankind.

  18. Routledge Open Research

    The new publishing platform will enable them to fully embrace Open Science that meets their publishing needs, and to openly share, use and find linked publications and data. Routledge Open Research is an innovative open access publishing platform offering rapid publication and open peer review, whilst supporting data deposition and sharing.

  19. Open Access to Research Papers

    Making scholarly research outputs openly available is easy and legal, and has demonstrable benefits for authors, making it a good first step for a researcher just beginning to explore the open world. Navigating the Open Access landscape requires some knowledge of copyright, article status, repositories, and economics.

  20. Research Guides: Freely Available and Open Access Resources: General

    Paperity is a multidisciplinary aggregator of Open Access journals and papers. Google Scholar uses the Google search engine to search for scholarly materials such as peer-reviewed papers, theses, books, preprints, abstracts and technical reports from broad areas of research.

  21. List of open-access journals

    This is a list of open-access journals by field. The list contains notable journals which have a policy of full open access. It does not include delayed open access journals, hybrid open access journals, or related collections or indexing services. True open-access journals can be split into two categories:

  22. CORE : the world's largest collection of open access research papers

    Summary. CORE's mission is to aggregate all open access research worldwide and deliver unrestricted access for all. CORE harvests research papers from sources such as institutional and subject repositories, and open access and hybrid journals. CORE currently contains 207,255,818 open access articles collected from 10,286 data providers around the world.

  23. More than 50 percent of Cambridge research papers now open access

    Having passed the 50 percent threshold last year (approximately 10,000 papers), Cambridge University Press is aiming for the vast majority of its research papers to be published fully open access (OA) each year by 2025. OA research has significantly higher readership and impact.

  24. Open Access Repositories and Self-Archiving

    Subject Repository List: a list of open access repositories that accept research based on discipline rather than institution. Selected subject repositories: arXiv.org, which contains over 500,000 e-prints in the fields of physics, mathematics, computer science, quantitative biology, statistics and non-linear science.

  25. Researchers on open access practices (2024)

    86% of researchers felt it's important or very important to have the final Version of Record published in a peer-reviewed journal. 'Benefits-based' reasons are more important motivators than 'requirements' in driving open research behaviours. The most commonly cited reasons researchers choose to publish open access are the impact and visibility it provides, for public benefit, and ...

  26. Evaluating SAM2's Role in Camouflaged Object Detection: From SAM to SAM2

    The Segment Anything Model (SAM), introduced by Meta AI Research as a generic object segmentation model, quickly garnered widespread attention and significantly influenced the academic community. To extend its application to video, Meta further develops Segment Anything Model 2 (SAM2), a unified model capable of both video and image segmentation. SAM2 shows notable improvements over its ...

  27. Biosynthesized Metallic and Bimetallic Nanoparticles as Effective

    Overall, biosynthesized metallic and bimetallic NPs hold promise as effective biocides for plant protection, but further research is needed to fully understand their mechanisms of action, optimize their efficacy, and ensure their safe and sustainable use in agriculture.

  28. Home

    Knowledge and Information Systems is an international forum publishing state-of-the-art research on emerging topics in knowledge and advanced information systems. It examines the theoretical foundations, infrastructure, and enabling technologies of knowledge and information systems.

  29. Excess All-Cause Mortality in China After Ending the Zero COVID Policy

    The paper has estimated COVID-19 mortality in mainland China using data on deaths among university employees. For comparison, another study found 9-fold higher COVID-19 rates in residential facilities for dependent elderly persons compared to other residences for elderly persons in France[1]. ... Open Access: This is an open ... Research reported ...

  30. The environmental impact of AI in the lab: a double-edged sword?

    The use of open-access models presents undeniable benefits: AlphaFold reportedly has 2.4 million users a year, and one model trained and used by millions is arguably better than having millions of smaller models trained and ...