A comprehensive bibliographic database of the world’s scholarly literature

The world’s largest collection of open access research papers

Machine access to our vast unique full text corpus

Core features

Indexing the world’s repositories

We serve the global network of repositories and journals

Comprehensive data coverage

We provide both metadata and full text access to our comprehensive collection through our APIs and Datasets

Powerful services

We create powerful services for researchers, universities, and industry

Cutting-edge solutions

We research and develop innovative data-driven and AI solutions

Committed to the POSI

Cost-free PIDs for your repository

OAI identifiers are unique identifiers minted cost-free by repositories. Ensure that your repository is correctly configured, enabling the CORE OAI Resolver to redirect your identifiers to your repository landing pages.

OAI IDs provide a cost-free option for assigning Persistent Identifiers (PIDs) to your repository records.
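As a rough illustration of what a resolver has to work with: OAI identifiers conventionally follow the `oai:<namespace>:<local-id>` scheme described in the OAI-PMH identifier guidelines. The sketch below only parses that scheme. The function name, the `OaiIdentifier` type and the `example.org` identifier are assumptions made for illustration and are not part of CORE's documented API.

```python
from typing import NamedTuple


class OaiIdentifier(NamedTuple):
    namespace: str  # repository domain, e.g. "example.org" (hypothetical)
    local_id: str   # record id within that repository


def parse_oai_identifier(value: str) -> OaiIdentifier:
    """Split 'oai:<namespace>:<local-id>' into its two parts."""
    scheme, sep, rest = value.partition(":")
    if scheme != "oai" or not sep:
        raise ValueError(f"not an OAI identifier: {value!r}")
    # Only the first colon after the namespace is a delimiter;
    # any later colons stay inside the local id.
    namespace, sep, local_id = rest.partition(":")
    if not namespace or not sep or not local_id:
        raise ValueError(f"malformed OAI identifier: {value!r}")
    return OaiIdentifier(namespace, local_id)


parsed = parse_oai_identifier("oai:example.org:12345")
print(parsed.namespace, parsed.local_id)
```

A repository could run a check like this over its exported identifiers before relying on any resolver, since an identifier that does not parse cannot be mapped back to a landing page.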

Who do we serve?

Enabling others to create new tools and innovate using a global comprehensive collection of research papers.

Companies

“Our partnership with CORE will provide Turnitin with vast amounts of metadata and full texts that we can ...”

Gareth Malcolm, Content Partner Manager at Turnitin

Making research more discoverable, improving metadata quality, helping to meet and monitor open access compliance.

Academic institutions

“CORE’s role in providing a unified search of repository content is a great tool for the researcher and ex...”

Nicola Dowson, Library Services Manager at Open University

Tools to find, discover and explore the wealth of open access research. Free for everyone, forever.

Researchers & general public

“With millions of research papers available across thousands of different systems, CORE provides an invalu...”

Jon Tennant, Rogue Paleontologist and Founder of the Open Science MOOC

Helping funders to analyse, audit and monitor open research and accelerate towards open science.

Funders

“Aggregation plays an increasingly essential role in maximising the long-term benefits of open access, hel...”

Ben Johnson, Research Policy Adviser at Research England

Our services

Access to raw data

Create new and innovative solutions.

Content discovery

Find relevant research and make your research more visible.

Managing content

Manage how your research content is exposed to the world.

Companies using CORE

Gareth Malcolm

Content Partner Manager at Turnitin

Our partnership with CORE will provide Turnitin with vast amounts of metadata and full texts that we can utilise in our plagiarism detection software.

Academic institution using CORE

Kathleen Shearer

Executive Director of the Confederation of Open Access Repositories (COAR)

CORE has significantly assisted the academic institutions participating in our global network with their key mission, which is their scientific content exposure. In addition, CORE has helped our content administrators to showcase the real benefits of repositories via its added value services.

Partner projects

Ben Johnson

Research Policy Adviser

Aggregation plays an increasingly essential role in maximising the long-term benefits of open access, helping to turn the promise of a 'research commons' into a reality. The aggregation services that CORE provides therefore make a very valuable contribution to the evolving open access environment in the UK.

The Directory of Open Access Journals

Find open access journals & articles.

DOAJ in numbers

80 languages

135 countries represented

13,753 journals without APCs

20,943 journals

10,494,725 article records

About the directory

DOAJ is a unique and extensive index of diverse open access journals from around the world, driven by a growing community, and is committed to ensuring quality content is freely available online for everyone.

DOAJ is committed to keeping its services free of charge, including being indexed, and its data freely available.

→ About DOAJ

→ How to apply

DOAJ is twenty years old in 2023.

Fund our 20th anniversary campaign

DOAJ is independent. All support is via donations.

82% from academic organisations

18% from contributors

Support DOAJ

Publishers don't need to donate to be part of DOAJ.

News Service

  • Meet the DOAJ team: Head of Editorial and Deputy Head of Editorial (Quality)
  • Vacancy: Operations Manager
  • Press release: PubScholar joins the movement to support the Directory of Open Access Journals
  • New major version of the API to be released

→ All blog posts

We would not be able to work without our volunteers, such as these top-performing editors and associate editors.

→ Meet our volunteers

Librarianship, Scholarly Publishing, Data Management

Brisbane, Australia (Chinese, English)

Adana, Türkiye (Turkish, English)

Humanities, Social Sciences

Natalia Pamuła

Toruń, Poland (Polish, English)

Medical Sciences, Nutrition

Pablo Hernandez

Caracas, Venezuela (Spanish, English)

Research Evaluation

Paola Galimberti

Milan, Italy (Italian, German, English)

Social Sciences, Humanities

Dawam M. Rohmatulloh

Ponorogo, Indonesia (Bahasa Indonesia, English, Dutch)

Systematic Entomology

Kadri Kıran

Edirne, Türkiye (English, Turkish, German)

Library and Information Science

Nataliia Kaliuzhna

Kyiv, Ukraine (Ukrainian, Russian, English, Polish)

Recently-added journals

DOAJ’s team of managing editors, editors, and volunteers work with publishers to index new journals. As soon as they’re accepted, these journals are displayed on our website, freely accessible to everyone.

→ See Atom feed

→ A log of journals added (and withdrawn)

→ Download all journals as CSV

  • Nota al Margen
  • RGUHS National Journal of Public Health
  • Magistra Andalusia
  • Revista Trágica
  • Shiyou huagong gaodeng xuexiao xuebao
  • JMIR XR and Spatial Computing
  • Revista de Investigación Educativa Intervención Pedagógica y Docencia
  • Revista Científica SENAI-SP
  • Revista Educação e Emancipação
  • Jurnal Gizi dan Pangan Soedirman
  • Germanica Wratislaviensia
  • Latin American Law Review
  • Geografia (Londrina)
  • Yıldız Sosyal Bilimler Enstitüsü Dergisi
  • Journal of the Pediatric Orthopaedic Society of North America

“The only truly modern academic research engine”

OA.mg is a search engine for academic papers, specialising in open access. We have over 250 million papers in our index.

  • Advanced search
  • Peer review

Discover relevant research today

Advance your research field in the open

Reach new audiences and maximize your readership

ScienceOpen puts your research in the context of

Publications

For Publishers

ScienceOpen offers content hosting, context building and marketing services for publishers. See our tailored offerings

  • For academic publishers: to promote journals and interdisciplinary collections
  • For open access journals: to host journal content in an interactive environment
  • For university library publishing: to develop new open access paradigms for their scholars
  • For scholarly societies: to promote content with interactive features

For Institutions

ScienceOpen offers state-of-the-art technology and a range of solutions and services

  • For faculties and research groups: to promote and share your work
  • For research institutes: to build up your own branding for OA publications
  • For funders: to develop new open access publishing paradigms
  • For university libraries: to create an independent OA publishing environment

For Researchers

Make an impact and build your research profile in the open with ScienceOpen

  • Search and discover relevant research in over 96 million Open Access articles and article records
  • Share your expertise and get credit by publicly reviewing any article
  • Publish your poster or preprint and track usage and impact with article- and author-level metrics
  • Create a topical Collection to advance your research field

Create a Journal powered by ScienceOpen

Launching a new open access journal or an open access press? ScienceOpen now provides full end-to-end open access publishing solutions – embedded within our smart interactive discovery environment. A modular approach allows open access publishers to pick and choose among a range of services and design the platform that fits their goals and budget.

What can a Researcher do on ScienceOpen?

ScienceOpen provides researchers with a wide range of tools to support their research – all for free. Here is a short checklist to make sure you are getting the most out of the technological infrastructure and content that we have to offer.

ScienceOpen on the Road

Upcoming Events

  • 15 June – Scheduled Server Maintenance, 13:00 – 01:00 CEST

Past Events

  • 20 – 22 February – ResearcherToReader Conference
  • 09 November – Webinar for the Discoverability of African Research
  • 26 – 27 October – Attending the Workshop on Open Citations and Open Scholarly Metadata
  • 18 – 22 October – ScienceOpen at Frankfurt Book Fair
  • 27 – 29 September – Attending OA Tage, Berlin
  • 25 – 27 September – ScienceOpen at Open Science Fair
  • 19 – 21 September – OASPA 2023 Annual Conference
  • 22 – 24 May – ScienceOpen sponsoring Pint of Science, Berlin
  • 16 – 17 May – ScienceOpen at 3rd AEUP Conference
  • 20 – 21 April – ScienceOpen attending Scaling Small: Community-Owned Futures for Open Access Books

What is ScienceOpen?

  • Smart search and discovery within an interactive interface
  • Researcher promotion and ORCID integration
  • Open evaluation with article reviews and Collections
  • Business model based on providing services to publishers

Some of our partners:

Akadémiai Kiadó

Freely Available and Open Access Resources

Open access directories, open access databases, open access e-books & digital collections.

  • Directory of Open Access Research Papers: The world’s largest collection of open access research papers.
  • Directory of Open Access Dissertations: Browse millions of electronic theses and dissertations.
  • Directory of Open Access Journals: DOAJ's mission is to increase the visibility, accessibility, reputation, usage and impact of quality, peer-reviewed, open access scholarly research journals globally, regardless of discipline, geography or language.
  • Directory of Open Access Books: Contains several hundred academic books that are published online and are open access.
  • Directory of Open Access Repositories: OpenDOAR is the quality-assured global directory of academic open access repositories. It enables the identification, browsing and search for repositories, based on a range of features, such as location, software or type of material held.
  • Open Access JSTOR & Artstor Resources: Browse open and free content made available through JSTOR and Artstor.
  • Academic Journals: Academic Journals' mission is to accelerate the dissemination of knowledge through the publication of high quality research articles using the open access model.
  • Internet Archive: The Internet Archive is a non-profit library of millions of free books, movies, software, music, websites, and more.
  • Paperity: Paperity is a multidisciplinary aggregator of Open Access journals and papers.
  • Google Scholar: Uses the Google search engine to search for scholarly materials such as peer-reviewed papers, theses, books, preprints, abstracts and technical reports from broad areas of research. It includes a variety of academic publishers, professional societies, preprint repositories and universities, as well as scholarly articles available across the web. Includes full text and citations.
  • Project Gutenberg: Project Gutenberg is a library of over 70,000 free eBooks, with focus on older works for which U.S. copyright has expired.
  • Library of Congress - Open Access Books: A growing collection of contemporary open access e-books. The books in this collection cover a wide range of subjects, including history, music, poetry, technology, and works of fiction. Most of the books in this collection were published in English, but there are some titles in other languages.
  • OpenStax Open Access Textbooks: OpenStax is an educational initiative headed by Rice University dedicated to publishing high-quality, peer-reviewed, openly licensed college textbooks that are absolutely free online and low cost in print.
  • Google Books: Google's free books are made available to read through careful consideration of and respect for copyright law globally: they are public-domain works, made free on request of the copyright owner, or copyright-free, e.g. US government documents. Use this link to search for available books in Google's digital database.
  • HathiTrust Digital Library: Not-for-profit collaborative of academic and research libraries now preserving 18+ million digitized items in the HathiTrust Digital Library. We offer reading access to the fullest extent allowable by U.S. and international copyright law, text and data mining tools for the entire corpus, and other emerging services based on the combined collection.
  • UCLA Library Digital Collections: Rare and unique digital materials developed by the UCLA Library to support education, research, service, and creative expression, including the AIDS Poster Collection, the Los Angeles Times Photographic Archive and more. Legacy site with additional content also available.
  • Library of Congress Digital Collections: Still and moving images, prints, maps, and other documents that chronicle historical events, people, places, and ideas that continue to shape America and more. (From the collections of the Library of Congress.)
  • Digital Public Library of America: Launched in 2013, this site provides access to millions of digitized primary sources from archives, museums, and libraries across the United States.
  • Last Updated: Sep 23, 2024 3:48 PM
  • URL: https://guides.library.ucla.edu/free-resources

Open access journals

We have published over 124,000 open access articles via gold open access across disciplines, from the life sciences to the humanities, representing 33% of all Springer Nature articles in 2020. Authors can also publish their article under an open access licence in more than 2,200 of our hybrid journals.

Our portfolio focuses on robust and insightful research, supporting the development of new areas of knowledge and making ideas and information accessible around the globe.

Across our publishing imprints there are leading multidisciplinary and community-focused journals that offer rigorous, high-impact open access. Many of our titles are also published in partnership with academic societies, enabling them to achieve their own open research ambitions.

Fully open access journals

Download a list of our fully open access journals, including APC and licence information.

This list indicates the standard article processing charge (APC) for each journal. APCs are payable for articles upon acceptance. While we make every effort to keep this list updated, please note that APCs are subject to change and may vary from the price listed. For further information on the licences and other currencies available, self-archiving embargoes, manuscript deposition, and abstracting & indexing, visit the individual journal’s website. VAT or local taxes will be added where applicable.

Questions about paying for open access?

View our frequently asked questions about article processing charges (APCs).

Visit our imprint sites

BMC

Hybrid journals

Download a list of our hybrid journals, including Springer Open Choice titles. We publish more than 2,200 journals that offer open access at the article level, allowing optional open access in the majority of Springer Nature's subscription-based journals.

Find out more by imprint

  • Springer Open Choice
  • Springer Nature hybrid journals on nature.com
  • Palgrave Macmillan hybrid journals

Stay up to date

Here to foster information exchange with the library community

Connect with us on LinkedIn and stay up to date with news and development.

We are a world leading research, educational and professional publisher. Visit our main website for more information.

  • © 2024 Springer Nature

All articles published by MDPI are made immediately available worldwide under an open access license. No special permission is required to reuse all or part of the article published by MDPI, including figures and tables. For articles published under an open access Creative Commons CC BY license, any part of the article may be reused without permission provided that the original article is clearly cited. For more information, please refer to https://www.mdpi.com/openaccess.

Feature papers represent the most advanced research with significant potential for high impact in the field. A Feature Paper should be a substantial original Article that involves several techniques or approaches, provides an outlook for future research directions and describes possible research applications.

Feature papers are submitted upon individual invitation or recommendation by the scientific editors and must receive positive feedback from the reviewers.

Editor’s Choice articles are based on recommendations by the scientific editors of MDPI journals from around the world. Editors select a small number of articles recently published in the journal that they believe will be particularly interesting to readers, or important in the respective research area. The aim is to provide a snapshot of some of the most exciting work published in the various research areas of the journal.

Active Safety Surveillance of Spikevax and Nuvaxovid Vaccines

Open Access Journals

  • Web of Science
  • Ei Compendex
  • CAPlus / SciFinder
  • Biology & Life Sciences
  • Business & Economics
  • Chemistry & Materials Science
  • Computer Science & Mathematics
  • Engineering
  • Environmental & Earth Sciences
  • Medicine & Pharmacology
  • Physical Sciences
  • Public Health & Healthcare
  • Social Sciences, Arts and Humanities

Highly Accessed Articles

Latest books

Selected special issues

Selected collections

  • Feedback: We are keen to hear what you think about MDPI. Feedback, suggestions, questions?
  • About MDPI: MDPI.com is a platform for peer-reviewed, scientific open-access journals operated by MDPI.

MIT Open Access Articles

The MIT Open Access Articles collection consists of scholarly articles written by MIT-affiliated authors that are made available through DSpace@MIT under the MIT Faculty Open Access Policy, or under related publisher agreements. Articles in this collection generally reflect changes made during peer-review.

Version details are supplied for each paper in the collection:

  • Original manuscript: author's manuscript prior to formal peer review
  • Author's final manuscript: final author's manuscript post peer review, without publisher's formatting or copy editing
  • Final published version: final published article, as it appeared in a journal, conference proceedings, or other formally published context (this version appears here only if allowable under publisher's policy)

Some peer-reviewed scholarly articles are available through other DSpace@MIT collections, such as those for departments, labs, and centers.

If you are an MIT community member who wants to deposit an article into this collection, you will need to log in to do so. If you don't have an account, please contact us.

More information:

  • Working with MIT's open access policy
  • Submitting a paper under the policy
  • FAQ about the policy

Recent Submissions

  • Additive manufacturing of interlocking glass masonry units
  • Metabolic modulation to improve MSC expansion and therapeutic potential for articular cartilage repair
  • Learning design for short-duration e-textile workshops: outcomes on knowledge and skills

Architecture / Design

Biomedicine, business and management, computer science, earth sciences, engineering, environment, life sciences, materials science, mathematics, medicine & public health, science, humanities and social sciences, multidisciplinary, social sciences.

Browse article collections by subject.

  • Built Heritage
  • Cellular and Molecular Neurobiology
  • Future Business Journal
  • International Journal of Corporate Social Responsibility
  • Journal of Innovation and Entrepreneurship
  • Journal of Shipping and Trade
  • Schmalenbach Journal of Business Research
  • Applied Biological Chemistry
  • Bioresources and Bioprocessing
  • Fashion and Textiles
  • Journal of Analytical Science and Technology
  • Journal of Umm Al-Qura University for Applied Sciences
  • Applied Network Science
  • Brain Informatics
  • Cybersecurity
  • Energy Informatics
  • EPJ Data Science
  • International Journal of Educational Technology in Higher Education
  • Journal of Big Data
  • Journal of Cloud Computing
  • Visual Computing for Industry, Biomedicine, and Art
  • International Journal of Implant Dentistry
  • Maxillofacial Plastic and Reconstructive Surgery
  • Progress in Orthodontics
  • Earth, Planets and Space
  • Geoscience Letters
  • Geothermal Energy
  • Progress in Earth and Planetary Science
  • Swiss Journal of Geosciences
  • Swiss Journal of Palaeontology
  • Agricultural and Food Economics
  • Financial Innovation
  • Journal for Labour Market Research
  • Journal of Economic Structures
  • Marine Development
  • Swiss Journal of Economics and Statistics
  • Asian-Pacific Journal of Second and Foreign Language Education
  • Disciplinary and Interdisciplinary Science Education Research
  • Empirical Research in Vocational Education and Training
  • International Journal of Child Care and Education Policy
  • International Journal of STEM Education
  • Language Testing in Asia
  • Large-scale Assessments in Education
  • Smart Learning Environments
  • Sustainable Energy Research
  • Advanced Modeling and Simulation in Engineering Sciences
  • Advances in Aerodynamics
  • Advances in Bridge Engineering
  • AI Perspectives & Advances
  • Chinese Journal of Mechanical Engineering
  • EURASIP Journal on Advances in Signal Processing
  • EURASIP Journal on Audio, Speech, and Music Processing
  • EURASIP Journal on Image and Video Processing
  • EURASIP Journal on Information Security
  • EURASIP Journal on Wireless Communications and Networking
  • European Transport Research Review
  • International Journal of Concrete Structures and Materials
  • Journal of Electrical Systems and Information Technology
  • Journal of Engineering and Applied Science
  • Journal of Infrastructure Preservation and Resilience
  • Journal of Materials Science: Materials in Engineering
  • Journal of Umm Al-Qura University for Engineering and Architecture
  • Micro and Nano Systems Letters
  • Moore and More
  • ROBOMECH Journal
  • Satellite Navigation
  • Ecological Processes
  • Environmental Sciences Europe
  • Environmental Systems Research
  • Geoenvironmental Disasters
  • City, Territory and Architecture
  • European Journal of Futures Research
  • Journal of International Humanitarian Action
  • AMB Express
  • Botanical Studies
  • Cell Regeneration
  • Chemical and Biological Technologies in Agriculture
  • Egyptian Journal of Biological Pest Control
  • Fire Ecology
  • Horticulture Advances
  • Journal of Wood Science
  • Natural Products and Bioprospecting
  • The Journal of Basic and Applied Zoology
  • Applied Microscopy
  • Collagen and Leather
  • Functional Composite Materials
  • Heritage Science
  • Journal of Materials Science: Materials Theory
  • Microplastics and Nanoplastics
  • Nano Convergence
  • Advances in Continuous and Discrete Models
  • Boundary Value Problems
  • Fixed Point Theory and Algorithms for Sciences and Engineering
  • Journal of Inequalities and Applications
  • Journal of Mathematics in Industry
  • African Journal of Urology
  • Annals of Intensive Care
  • Beni-Suef University Journal of Basic and Applied Sciences
  • Blood Research
  • Bulletin of Faculty of Physical Therapy
  • Clinical Phytoscience
  • CVIR Endovascular (open peer review)
  • Egyptian Journal of Forensic Sciences
  • Egyptian Journal of Medical Human Genetics
  • Egyptian Journal of Neurosurgery
  • Egyptian Journal of Radiology and Nuclear Medicine
  • Egyptian Liver Journal
  • Egyptian Pediatric Association Gazette
  • Egyptian Rheumatology and Rehabilitation
  • EJNMMI Physics
  • EJNMMI Radiopharmacy and Chemistry
  • EJNMMI Reports
  • EJNMMI Research
  • European Radiology Experimental
  • Future Journal of Pharmaceutical Sciences
  • Insights into Imaging
  • Intensive Care Medicine Experimental
  • International Journal of Bipolar Disorders
  • JA Clinical Reports
  • Journal of Ophthalmic Inflammation and Infection
  • Journal of Orthopaedics and Traumatology
  • Journal of Patient-Reported Outcomes
  • Journal of the Egyptian National Cancer Institute
  • Journal of the Egyptian Public Health Association
  • Middle East Current Psychiatry
  • Middle East Fertility Society Journal
  • Molecular and Cellular Pediatrics
  • Sports Medicine - Open
  • Surgical Case Reports
  • The Cardiothoracic Surgeon
  • The Egyptian Heart Journal
  • The Egyptian Journal of Bronchology
  • The Egyptian Journal of Internal Medicine
  • The Egyptian Journal of Neurology, Psychiatry and Neurosurgery
  • The Egyptian Journal of Otolaryngology
  • The Ultrasound Journal
  • eLight (transparent peer review)
  • EPJ Quantum Technology
  • EPJ Techniques and Instrumentation
  • Surface Science and Technology
  • Cognitive Research: Principles and Implications
  • Psicologia: Reflexão e Crítica
  • Bulletin of the National Research Centre
  • Comparative Migration Studies
  • International Journal of Anthropology and Ethnology
  • The Journal of Chinese Sociology

Understanding Open Access

In this guide

  • What is Open Access?
  • Open Access Policies
  • Open Access at Lane Library
  • Frequently Asked Questions

What is OA?

Open access (OA) is a set of principles and practices through which research outputs like journal articles are distributed online, free of cost or other access barriers.

In "traditional" scholarly publishing, the publisher owns the rights to the articles in their journals. Individuals looking to read these articles may encounter a paywall, requiring them to pay a fee for access. Institutions and libraries (including Lane Library) help provide access to such paywalled research by negotiating with the publishers and paying costly subscription fees. In contrast, open access ensures that the outputs of the research process can be read and built upon by everyone.

Open access to publications is a component of Open Science, which encompasses a variety of efforts focused on making scientific research more transparent and accessible. Though the term is frequently used to refer to efforts aimed at ensuring access to the products of the research process (journal articles, datasets, code, and other materials), open science also encompasses efforts to ensure that the scientific enterprise is inclusive and equitable.

This guide is intended to help you understand open access-related policies, the various routes you may use to make work "open", and the OA-related resources available to you through Lane Library. 

If you have specific questions about open access, please do not hesitate to contact your liaison librarian. If you are interested in engaging in a broader discussion about open access and other open science-related issues, consider attending a meeting of the Open Science Reading Group.

Methods of Making Work Open

There are a variety of ways to make work "open". Below we have highlighted some of the most common and provided detail about how they differ during the writing and submission process, during the evaluation (peer review) process, during the production and publishing process, and how readers are able to access and read articles.

Please note that open access is best conceived of as a continuum of practice. As shown by visualizations like How Open Is It?, individual journals may exhibit greater or lesser degrees of "openness".

Open Access Publishing

Open Access publishing (also sometimes called Gold OA) is a form of open access in which a publisher makes all articles and related content associated with a certain journal available for free immediately on the journal's website. In this model, authors are often asked to bear the cost of publication, typically through an article processing charge (APC). Examples of this form of open access are journals like eLife and those published by PLOS .

The authors write an article and submit it to an open access journal.

The article goes through the peer review process. Once accepted, the authors may pay an article processing charge.

The article goes through the production process, where it is formatted, typeset, etc. The publisher then makes it available free of charge and other access barriers online.

Readers can read the article free of charge.

Self Archiving

Self-archiving (also sometimes called Green OA) is a form of open access in which, independently of publication by a journal publisher, an author posts their work to a website where it can be accessed and read by others. The NIH Public Access Policy can be considered an example of this type of open access. Stanford University's proposed open access policy includes self archiving.

There are a variety of ways to find and read articles that have been self-archived in this manner. We recommend the Unpaywall browser extension.

The authors write an article and submit to the journal of their choice.

The article goes through the peer review process. Once accepted, the authors submit a copy of the peer reviewed manuscript to a repository or post it on a website.

The article goes through the production process, where it is formatted, typeset, etc. The publisher then makes it available, potentially behind a paywall.

Readers can read the self-archived copy of the article free of charge but may have to pay to read the version published by the journal publisher.

Preprints are a special case of self-archiving where authors submit a copy of an article that has not yet gone through peer review to a preprint repository so it can be accessed and read by others. Preprint servers for biomedical and health sciences-related work include bioRxiv and medRxiv. Europe PMC can be used to search for preprints, and there are a limited number of COVID-19 related preprints in PubMed Central.

The authors write an article and post it on a preprint server. They may also submit it to a journal.

Once posted, the preprint may be commented upon and reviewed by readers. Authors may revise their preprints in light of these comments. If submitted to a journal, the article will also go through the peer review process.

If submitted to a journal, the article will go through the peer review process. Once published, the article may include a link to the original preprint.

Readers will be able to read the preprint free of charge but, if it was also submitted to a journal, may have to pay to read the version published by the journal publisher.

Other Forms of OA

In addition to the forms of open access discussed above, there may be cases where a "traditional" journal temporarily removes paywalls for specific articles or instances where paywalls are removed for articles after a certain period of time following publication. There are also cases where a "traditional" journal will make an individual article free to read if the authors pay a fee. In some cases, the journal may maintain copyright of the articles under these models.

More Information

The video below, created by  Jorge Cham and featuring Nick Shockey and Jonathan Eisen , provides a quick introduction to the motivations behind and principles of open access.

For even more information about open access, see the list of resources below:

  • SPARC SPARC (the Scholarly Publishing and Academic Resources Coalition) works to enable the open sharing of research outputs and educational materials in order to democratize access to knowledge, accelerate discovery, and increase the return on our investment in research and education.
  • Open Access (The Book) Peter Suber's excellent book provides an introduction to Open Access. It is freely available through a variety of sources.
  • Open Science Reading Group The Open Science Reading Group is intended to bring together members of the Stanford Medicine community to learn about open science and discuss the application of open science practices in a biomedical context.
  • Last Updated: Jul 17, 2024 3:27 PM
  • URL: https://laneguides.stanford.edu/openaccess


CORE : the world's largest collection of open access research papers


Open access

Advancing open access to knowledge.

Open access is a key part of our mission to help researchers advance science for societal progress.

Open access at Elsevier

Open access is vital to a collaborative, inclusive and transparent world of research where quality knowledge can be shared and built upon. Every day, we work to bring more insight into closer reach for the research community and the public. We offer a wide choice and flexibility for every researcher and institution around the world that wants to publish open access, without ever compromising on research quality, integrity and value.

Advancing open access

Elsevier’s Laura Hassink and Stuart Whayman talk about the growth of open access and what the future holds for researchers, librarians and publishers

Enabling a transition to open access

As one of the largest open access publishers in the world we are enabling a transition to open access at scale. Nearly all our 2,900 journals enable open access publishing and more than 800 of these are fully open access. In 2023 we published more than 190,000 open access articles.

Our world-leading research platforms make available 3.3 million validated open access articles and we support more than 2,000 institutions with open access agreements.

Delivering high quality research

Each year, we receive around 3 million research papers from authors. Whether published open access or via subscription model, they are all rigorously reviewed by our in-house editorial teams in collaboration with 33,000 editors and 1.5 million expert reviewers around the world.

The result is over 630,000 articles in 2023 enhanced, indexed, certified, published and promoted following peer review. These processes and the assistance provided to authors along the way ensure the integrity and reliability of research and of the scientific record. Articles in Elsevier journals account for over 17% of the global research output and 28% of global citations, reinforcing our focus on quality.

Supporting every researcher and institution

We offer a broad range of choices to support every researcher and institution in accessing and publishing research. In 2023 we supported more than half a million researchers in 190 countries and territories to publish open access.

Alongside our commitment to pricing article publishing charges below market average relative to comparable quality, we have initiatives to support researchers in low- and middle-income countries. In 2023, we waived or discounted costs for nearly 80% of authors from the Global South and introduced the industry-first Geographical Pricing for Open Access initiative. This considers local economic circumstances to help researchers publish research open access.

Building open access sustainability with transformative agreements

In a series of three case studies, library leaders share their insights into the transformative agreement process. Librarians guide readers through setting goals and communicating to stakeholders, working with publishers, and implementing the agreement across their institutions.

Learn more about transformative agreements that drive cross-campus collaboration, support researchers, and sustainably expand open access.

How we are advancing open access

Researcher perspectives

Prof Charles Spence, PhD, of the University of Oxford investigates how our senses interact and how they impact our daily lives.

Multisensory researcher on how open access connects academia to the wider world

Image of Dr. Heyddy Calderon

Open access publishing is indispensable, says award-winning hydrology researcher

Quote by Christopher Parsonson

Advancing data center networking through open access

Image of Prof. Gawsia Wahidunessa Chowdhury, PhD

“Open access is like a window of knowledge”

Open science .

Open access is just one element of the way we partner with you to drive open science. Together we can create a more inclusive, collaborative and transparent world of research.

Unlocking the potential of data

We're working to help researchers and institutions store, share, discover and effectively reuse data. Effective data sharing can improve the impact, validity, reproducibility, efficiency and transparency of scientific research.

Promoting research integrity

We are committed to promoting the integrity of research through a range of activities and initiatives, from free author training on publication ethics to providing transparency in author contributor roles.

Free access initiatives

From researchers and students using content published in our books and journals on a daily basis to a patient who needs critical information about their treatment, Elsevier has a range of access options to ensure that everyone can access the important information they need.

Find our access options:

  • Access for public and media
  • Access for developing countries
  • Access for researchers and students
  • Access for healthcare and patients
  • Responding to public health emergencies

Frequently asked questions

How many of your journals offer a gold open access option?

Elsevier is one of the fastest-growing open access publishers in the world. Nearly all of Elsevier's 2,900 journals now enable open access publishing, including 800 journals which are fully open access journals. 

What is your position on Green Open Access?

All Elsevier journals allow authors to use Green Open Access, usually after an embargo period. Green Open Access is when authors share a public version of their article, for example in their institution or funder’s repository, which would otherwise only be available to paying subscribers. 

Do you support access to subscription articles in any other ways?

Elsevier makes subscription articles completely free to access in specific situations: 

We offer free access to relevant research for health emergencies,  as we did during the Covid-19 pandemic . 

Patients and caregivers are provided with papers related to medicine and healthcare upon request to help them better understand the latest research on their conditions. 

Through Research4Life, institutions in 120 low- and middle-income countries receive affordable access to nearly 100,400 peer reviewed resources. As a founding member, Elsevier provides over a quarter of that content, as well as access to the abstract and citation database Scopus, and training for librarians.

How do you formulate your prices for publishing and subscriptions?

We strive to offer researchers value for money, and we are committed to pricing our journals competitively with an underlying principle of pricing lower than the market for like-for-like quality.

Open access content and subscription content are priced separately. Open access publishing is supported by the pay-to-publish model, where authors (or others on their behalf) pay an Article Publishing Charge (APC) to enable the article to be made publicly available immediately on publication. 

We set APC prices based on the following criteria: 

Journal quality 

The journal’s editorial and technical processes 

Competitive considerations 

Market conditions 

Other revenue streams associated with the journal such as advertising 

Elsevier’s APC prices are set on a per journal basis. Fees range between approximately $150 and $10,100 (US dollars), excluding tax, with prices clearly displayed on our APC price list and on journal homepages.

Where articles are not supported by the pay-to-publish model, they are typically supported by subscription fees paid for by readers. 

We set journal subscription list prices based on the following criteria: 

Number of subscription articles 

And other revenue streams such as commercial contributions from advertising, reprints and supplements 

Can you be more transparent in what you charge?

We are constantly striving to be more transparent in all aspects of what Elsevier does, including pricing. We try to support requests for information within the bounds allowed by financial reporting requirements and competition rules. 

For authors: 

We provide the price of publishing gold open access on each journal homepage and in a central list

We automatically notify authors who are entitled to free or discounted gold open access, for example where there is an agreement with their institution or funder

We automatically notify authors who are entitled to free or discounted gold open access because they are based in a low- or middle-income country; our APC waiver policy explains this process

For librarians: 

We provide a range of information about our pricing competitiveness; how our pricing corresponds to quality; and publishing model uptake across subscription and open access

We publicly announce significant agreements, including our open access pilots 

We provide a list of our journal subscription prices

We describe the process we follow to calculate list prices

We describe the process to ensure we do not double dip — we also show the number of articles that are published gold open access, and the number which are financed through subscriptions, on each journal homepage, to allow librarians to validate this

Do you double dip (i.e., charge for the same article twice)?

We do not double-dip. We can be reimbursed for an article in two ways — through an Article Publishing Charge (APC) or a subscription — but we never charge for the same article twice. We have a strict no double-dipping policy .

How do you help authors who cannot afford to pay to be published, and why can't you offer that support more widely?

As part of our commitment to inclusion and diversity in science we believe no researcher should be prevented from publishing in their journal of choice because of financial barriers. We support researchers from low- and middle-income countries to publish gold open access if they wish to do so. When publishing in fully open access journals, we fully waive all open access charges for authors from 69 countries (Group A) and give a 50% discount for authors from 57 countries (Group B).

For other authors, we offer a choice of journals with open access publishing charges ranging from $150 to $10,100. We will also consider requests for accommodations on a case-by-case basis for authors who are required to publish open access but do not have the financial means to do so. 

Finally, we provide high quality subscription publishing options, so authors should never face a cost barrier to publishing in their journal of choice.

If more authors are publishing Gold Open Access, why don't you reduce your subscription fees?

We strive to offer researchers real value, and we are continuing our commitment to pricing our journals competitively with an underlying principle of pricing lower than the market for like-for-like quality.

We see growth in the number of articles published through both the gold open access and subscription models. Subscription volumes rose by over 7% in 2020 compared to the previous year, for instance. However, we still price competitively: Elsevier’s average price change has been the lowest amongst major competitors in the last 13 years due to moderate historical price changes and this strong volume growth. At the same time, we maintain high-quality content. 

Our prices for subscription articles and APCs are set completely separately. Subscription fees are based on a range of factors, as noted above.

Does Elsevier have any Transformative Journals?

Elsevier is piloting transformative journal status for more than 60 journals across our portfolio. You can see the full list of transformative journals and targets and visit the relevant individual journal home pages for more information.

  • Open access
  • Published: 07 June 2023

CORE: A Global Aggregation Service for Open Access Papers

  • Petr Knoth (ORCID: 0000-0003-1161-7359)
  • Drahomira Herrmannova (ORCID: 0000-0002-2730-1546)
  • Matteo Cancellieri
  • Lucas Anastasiou
  • Nancy Pontika
  • Samuel Pearce
  • Bikash Gyawali
  • David Pride

Scientific Data volume 10, Article number: 366 (2023)

This paper introduces CORE, a widely used scholarly service, which provides access to the world’s largest collection of open access research publications, acquired from a global network of repositories and journals. CORE was created with the goal of enabling text and data mining of scientific literature and thus supporting scientific discovery, but it is now used in a wide range of use cases within higher education, industry, not-for-profit organisations, as well as by the general public. Through the provided services, CORE powers innovative use cases, such as plagiarism detection, in market-leading third-party organisations. CORE has played a pivotal role in the global move towards universal open access by making scientific knowledge more easily and freely discoverable. In this paper, we describe CORE’s continuously growing dataset and the motivation behind its creation, present the challenges associated with systematically gathering research papers from thousands of data providers worldwide at scale, and introduce the novel solutions that were developed to overcome these challenges. The paper then provides an in-depth discussion of the services and tools built on top of the aggregated data and finally examines several use cases that have leveraged the CORE dataset and services.

Introduction

Scientific literature contains some of the most important information we have assembled as a species, such as how to treat diseases, solve difficult engineering problems, and answer many of the world’s challenges we are facing today. The entire body of scientific literature is growing at an enormous rate with an annual increase of more than 5 million articles (almost 7.2 million papers were published in 2022 according to Crossref, the largest Digital Object Identifier (DOI) registration agency). Furthermore, it was estimated that the amount of research published each year increases by about 10% annually 1 . At the same time, an ever growing amount of research literature, which has been estimated to be well over 1 million publications per year in 2015 2 , is being published as open access (OA), and can therefore be read and processed with limited or no copyright restrictions. As reading this knowledge is now beyond the capacities of any human being, text mining offers the potential to not only improve the way we access and analyse this knowledge 3 , but can also lead to new scientific insights 4 .

However, systematically gathering scientific literature to enable automated methods to process it at scale is a significant problem. Scientific literature is spread across thousands of publishers, repositories, journals, and databases, which often lack common data exchange protocols and other support for inter-operability. Even when protocols are in place, the lack of infrastructure for collecting and processing this data, as well as restrictive copyrights and the fact that OA is not yet the default publishing route in most parts of the world further complicate the machine processing of scientific knowledge.

To alleviate these issues and support text and data mining of scientific literature we have developed CORE ( https://core.ac.uk/ ). CORE aggregates open access research papers from thousands of data providers from all over the world including institutional and subject repositories, open access and hybrid journals. CORE is the largest collection of OA literature–at the time of writing this article, it provides a single point of access to scientific literature collected from over ten thousand data providers worldwide and it is constantly growing. It provides a number of ways for accessing its data for both users and machines, including a free API and a complete dump of its data.

As of January 2023, there are 4,700 registered API users and 2,880 registered dataset users, and more than 70 institutions have registered to use the CORE Recommender in their repository systems.

The main contributions of this work are the development of CORE’s continuously growing dataset and the tools and services built on top of this corpus. In this paper, we describe the motivation behind the dataset’s creation and the challenges and methods of assembling it and keeping it continuously up-to-date. Overcoming the challenges posed by creating a collection of research papers of this scale required devising innovative solutions to harvesting and resource management. Our key innovations in this area which have contributed to the improvement of the process of aggregating research literature include:

Devising methods to extend the functionality of existing widely-adopted metadata exchange protocols which were not designed for content harvesting, to enable efficient harvesting of research papers’ full texts.

Developing a novel harvesting approach (referred to here as CHARS) which allows us to continuously utilise the available compute resources while providing improved horizontal scalability, recoverability, and reliability.

Designing an efficient algorithm for scheduling updates of harvested resources which optimises the recency of our data while effectively utilising the compute resources available to us.

This paper is organised as follows. First, in the remainder of this section, we present several use cases requiring large scale text and data mining of scientific literature, and explain the challenges in obtaining data for these tasks. Next, we present the data offered by CORE and our approach for systematically gathering full text open access articles from thousands of repositories and key scientific publishers.

Terminology

In digital libraries the term record is typically used to denote a digital object such as text, image, or video. In this paper and when referring to data in CORE, we use the term metadata record to refer to the metadata of a research publication, i.e. the title, authors, abstract, project funding details, etc., and the term full text record to describe a metadata record which has an associated full text.

We use the term data provider to refer to any database or a dataset from which we harvest records. Data providers harvested by CORE include disciplinary and institutional repositories, publishers and other databases.

When talking about open access (OA) to scientific literature, we refer to the Budapest Open Access Initiative (BOAI) definition which defines OA as “free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose” ( https://www.budapestopenaccessinitiative.org/read ). There are two routes to open access, 1) OA repositories and 2) OA journals. The first can be achieved by self-archiving (depositing) publications in repositories (green OA), and the latter by directly publishing articles in OA journals (gold OA).

Text and Data Mining of Scientific Literature

Text and data mining (TDM) is the discovery by a computer of new, previously unknown information, by automatically extracting information from different written resources ( http://bit.ly/jisc-textm ). The broad goal of TDM of scientific literature is to build tools that can retrieve useful information from digital documents, improve access to these documents, or use these documents to support scientific discovery. OA and TDM of scientific literature have one thing in common–they both aim to improve access to scientific knowledge for people. While OA aims to widen the availability of openly available research, TDM aims to improve our ability to discover, understand and interpret scientific knowledge.

TDM of scientific literature is being used in a growing number of applications, many of which were until recently not viable due to the difficulties associated with accessing the data from across many publishers and other data providers. Because many use cases involving text and data mining can only realise their full potential when they are executed on as large a corpus of research papers as possible, these data access difficulties have rendered many of the use cases described below very difficult to achieve. For example, to reliably detect plagiarism in newly submitted publications it is necessary to have access to an always up-to-date dataset of published literature spanning all disciplines. Based on data needs, scientific literature TDM use cases can be broadly categorised into the following two categories, which are shown in Fig. 1:

A priori defined sample use cases: Use cases which require access to a subset of scientific publications that can be specified prior to the execution of the use case. For example, gathering the list of all trialled treatments for a particular disease in the period 2000–2010 is a typical example of such a use case.

Undefined sample use cases: Use cases which cannot be completed using data samples that are defined a priori. The execution of such use cases might require access to data not known prior to the execution or may require access to all data available. Plagiarism detection is a typical example of such a use case.
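The difference between the two categories can be illustrated with a small sketch. It assumes a toy in-memory corpus and a deliberately naive word-overlap (Jaccard) similarity; the function names and the corpus are hypothetical and are not taken from CORE or from any real plagiarism detection system.

```python
# Toy corpus: in practice this would be millions of full text records.
CORPUS = [
    {"id": 1, "year": 2004, "text": "Trial of drug A for malaria treatment"},
    {"id": 2, "year": 2015, "text": "Trial of drug B for malaria treatment"},
    {"id": 3, "year": 2008, "text": "Graph algorithms for network routing"},
]


def a_priori_sample(corpus, topic, start_year, end_year):
    """A priori defined sample: the subset is fixed by criteria known up front."""
    return [
        doc
        for doc in corpus
        if topic in doc["text"].lower() and start_year <= doc["year"] <= end_year
    ]


def jaccard(a, b):
    """Naive word-overlap similarity, used here only as a stand-in."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def most_similar(corpus, new_text):
    """Undefined sample: the new text must be compared against every document."""
    return max(corpus, key=lambda doc: jaccard(doc["text"], new_text))


malaria_trials = a_priori_sample(CORPUS, "malaria", 2000, 2010)
closest = most_similar(CORPUS, "graph algorithms for routing in networks")
```

The first function only ever touches documents matching a filter that can be written down in advance, whereas the second cannot know which documents matter until it has seen the new text, which is why it needs access to the whole, up-to-date corpus.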

Fig. 1: Example use cases of text and data mining of scientific literature. Depending on data needs, TDM uses can be categorised into a) a priori defined sample use cases, and b) undefined sample use cases. Furthermore, TDM use cases can broadly be categorised into 1) indirect applications which aim to improve access to and organisation of literature and 2) direct applications which focus on answering specific questions or gaining insights.

However, there are a number of factors that significantly complicate access to data for these applications. The needed data is often spread across many publishers, repositories, and other databases, often lacking interoperability (these factors will be further discussed in the next section). Consequently, researchers and developers working in these areas typically invest a considerable amount of time in corpus collection, which could be up to 90% of the total investigation time 5 . For many, this task can even prove impossible due to technical restrictions and limitations of publisher platforms, some of which will be discussed in the next section. Consequently, there is a need for a global, continuously updated, and downloadable dataset of full text publications to enable such analysis.

Challenges in machine access to scientific literature

Probably the largest obstacle to the effective and timely retrieval of relevant research literature is that it may be stored in a wide variety of locations with little to no interoperability: repositories of individual institutions, publisher databases, conference and journal websites, pre-print databases, and other locations, each of which typically offers different means for accessing their data. While repositories often implement a standard protocol for metadata harvesting, the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), publishers typically allow access to their data through custom made APIs, which are not standardised and are subject to changes 6 . Other data sources may provide static data dumps in a variety of formats or not offer programmatic access to their data at all.
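To make the OAI-PMH side of this concrete, the following is a minimal sketch of how a harvester might build `ListRecords` requests and page through results using the protocol's `resumptionToken`. The endpoint URL and sample response are made up for illustration, the helper names are hypothetical, and no network request is made; this is not CORE's harvester.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response (illustrative, not from a real repository).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def build_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # resumptionToken is an exclusive argument in OAI-PMH:
        # it is sent on its own, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    has_token = token_el is not None and token_el.text and token_el.text.strip()
    return identifiers, (token_el.text.strip() if has_token else None)


first_url = build_list_records_url("https://repo.example.org/oai")
ids, token = parse_list_records(SAMPLE_RESPONSE)
next_url = build_list_records_url("https://repo.example.org/oai", resumption_token=token)
```

A real harvester would loop, fetching `next_url` until no token is returned. Publisher APIs have no such shared convention, which is why each typically needs its own client code.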

However, even when publication metadata can be obtained, other steps involved in the data collection process complicate the creation of a final dataset suitable for TDM applications. For example, the identification of scientific publications within all downloaded documents, matching these publications correctly to the original publication metadata, and their conversion from formats used in publishing, such as the PDF format, into a textual representation suitable for text and data mining, are just some of the additional difficulties involved in this process. The typical minimum steps involved in this process are illustrated in Fig.  2 . As there are no widely adopted solutions providing interoperability across different platforms, custom harvesting solutions need to be created for each.

Fig. 2: Example illustration of the data collection process. The figure depicts the typical minimum steps which are necessary to produce a dataset for TDM of scientific literature. Depending on the use case, tens or hundreds of different data sources may need to be accessed, each potentially requiring a different process–for example accessing a different set of API methods or a different process for downloading publication full text. Furthermore, depending on the use case, additional steps may be needed, such as extraction of references, identification of duplicate items or detection of the publication’s language. In the context of CORE, we provide the details of this process in Section Methods.
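The minimum steps described above (identifying publications among downloaded documents, matching them to their metadata, and attaching the extracted text) could be sketched roughly as follows. Everything here is a hypothetical simplification: the `Record` type, the keyword heuristic, and the title-matching rule are placeholders for the much more robust components a real aggregator would need, and PDF-to-text conversion is assumed to have already happened.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """A metadata record; full_text is set once a matching document is found."""
    identifier: str
    title: str
    full_text: Optional[str] = None


def looks_like_publication(raw_text):
    # Placeholder heuristic (an assumption): a real system would use a
    # trained classifier rather than two keywords.
    lowered = raw_text.lower()
    return "abstract" in lowered and "references" in lowered


def matches_metadata(record, raw_text):
    # Placeholder matching rule (an assumption): the title appears in the text.
    return record.title.lower() in raw_text.lower()


def build_dataset(records, downloaded):
    """Attach text to records; `downloaded` maps identifier -> extracted text."""
    dataset = []
    for record in records:
        raw_text = downloaded.get(record.identifier)
        if (
            raw_text
            and looks_like_publication(raw_text)
            and matches_metadata(record, raw_text)
        ):
            record.full_text = raw_text
        dataset.append(record)
    return dataset
```

Records whose download is missing, is not a publication, or does not match the metadata are kept as metadata-only records, mirroring the distinction between metadata records and full text records made in the Terminology section.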

Challenges in systematically gathering open access research literature

Open access journals and repositories are increasingly becoming the central providers of open access content, in part thanks to the introduction of funder and institutional open access policies 7 . Open access repositories include institutional repositories such as the University of Cambridge Repository https://www.repository.cam.ac.uk/ , and subject repositories such as arXiv https://arxiv.org/ . As of February 2023, there are 6,015 open access repositories indexed in the Directory of Open Access Repositories http://v2.sherpa.ac.uk/opendoar/ (OpenDOAR), as well as 18,935 open access journals indexed in the Directory of Open Access Journals https://doaj.org/ (DOAJ). However, open access research literature can be stored in a wide variety of other locations, including publisher and conference websites, individual researcher websites, and elsewhere. Consequently, a system for harvesting open access content needs to be able to harvest effectively from thousands of data providers. Furthermore, a large number of open access repositories (69.4% of repositories indexed in OpenDOAR as of January 2018) expose their data through the OAI-PMH protocol while often not providing any alternatives. An open access harvesting system therefore also needs to be able to effectively utilise OAI-PMH for open access content harvesting. However, these two requirements–harvesting from thousands of data providers and utilising OAI-PMH for content harvesting–pose a number of significant scalability challenges.

Challenges related to harvesting from thousands of data providers

Open access data providers vary greatly in size, with some hosting millions of documents while others host a significantly lower number. New documents are added and old documents are often updated by data providers daily.

Different geographic locations and internet connection speeds may result in vastly differing times needed to harvest information from different providers, even when their size in terms of publication numbers is the same. As illustrated in Table 1 , there are also a variety of OAI-PMH implementations across commonly used repository platforms providing significantly different harvesting performance. To construct this table, we analysed the OAI-PMH metadata harvesting performance of 1,439 repositories in CORE, covering eight different repository platforms. It should be noted that the OAI-PMH protocol only requires metadata to be expressed in the Dublin Core (DC) format. However, it can also be extended to express the metadata in other formats. Because the Dublin Core standard is constrained to just 15 elements, it is not uncommon for OAI-PMH repositories to also use an extended metadata format such as Rioxx ( https://rioxx.net ) or the OpenAIRE Guidelines ( https://www.openaire.eu/openaire-guidelines-for-literature-institutional-and-thematic-repositories ).

Additionally, harvesting is limited not only by factors related to the data providers, but also by the compute resources (hardware) available to the aggregator. As many use cases listed in the Introduction, such as in plagiarism detection or systematic review automation, require access to very recent data, ensuring that the harvested data stays recent and that the compute resources are utilised efficiently both pose significant challenges.

To overcome these challenges, we designed the CORE Harvesting System (CHARS), which relies on two key principles. The first is the application of microservices software principles to open access content harvesting 8 . The second is a strategy we call pro-active harvesting , which means that providers are scheduled automatically according to current need. This strategy is implemented in the harvesting Scheduler (Section CHARS_architecture). The Scheduler uses a formula we designed for prioritising data providers.

The combination of the Scheduler with the CHARS microservices architecture enables us to schedule harvesting according to current compute resource utilisation, thus greatly increasing our harvesting efficiency. Since switching from the fixed-schedule approach described above to pro-active harvesting, we have greatly improved the data recency of our collection and increased the size of the collection threefold within three years.
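To make the idea of scheduling "according to current need" more concrete, the following is a minimal sketch of a priority-based scheduler. The prioritisation formula used by the CORE Scheduler is not reproduced in this section, so the `score` function, its two inputs and the provider names below are purely hypothetical stand-ins chosen for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    # heapq is a min-heap, so the negated score puts the most urgent job first.
    neg_score: float
    name: str = field(compare=False)


def score(days_since_harvest: float, expected_new_records: int) -> float:
    """Hypothetical urgency score; NOT the formula used by the CORE Scheduler."""
    return days_since_harvest * (1 + expected_new_records)


def schedule(providers: dict[str, tuple[float, int]], free_workers: int) -> list[str]:
    """Return up to `free_workers` provider names, most urgent first."""
    heap = [Job(-score(days, new), name) for name, (days, new) in providers.items()]
    heapq.heapify(heap)
    picked = []
    while heap and len(picked) < free_workers:
        picked.append(heapq.heappop(heap).name)
    return picked


# Hypothetical providers: (days since last harvest, expected new records).
providers = {
    "repo_a": (30.0, 10),   # stale, moderately active
    "repo_b": (1.0, 500),   # fresh but very active
    "repo_c": (2.0, 0),     # fresh and quiet
}
print(schedule(providers, free_workers=2))
```

The point of the sketch is the contrast with a fixed schedule: the number of jobs released depends on how many workers are currently free, and the order depends on a score recomputed from the providers' current state.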

Challenges related to the use of OAI-PMH protocol for content harvesting

As explained above, OAI-PMH is currently the standard method for exchanging data across repositories. While the OAI-PMH protocol was originally designed for metadata harvesting only, it has, due to its wide adoption and the lack of alternatives, been used as an entry point for full text harvesting. Full text harvesting is achieved by extracting URLs from the metadata records collected through OAI-PMH; the extracted URLs are then used to discover the location of the actual resource 9 . However, there are a number of limitations of the OAI-PMH protocol which make it unsuitable for large-scale content harvesting:

It directly supports only metadata harvesting, meaning additional functionality has to be implemented in order to use it for content harvesting.

The location of full text links in the OAI-PMH metadata is not standardised and the OAI-PMH metadata records typically contain multiple links. From the metadata it is not clear which of these links points to the described representation of the resource and in many cases none of them does so directly. Therefore, all possible links to the resource itself have to be extracted from the metadata and tested to identify the correct resource. Furthermore, OAI-PMH does not facilitate any validation for ensuring the discovered resource is truly the described resource. In order to overcome these issues, the adoption of the RIOXX https://rioxx.net/ metadata format or the OpenAIRE guidelines https://guidelines.openaire.eu/ has been promoted. However, the issue of unambiguously connecting metadata records and the described resource is still present.

The architecture of the OAI-PMH protocol is inherently sequential, which makes it ill-suited for harvesting from very large repositories. This is because the processing of large repositories cannot be parallelised and it is not possible to recover the harvesting in case of failures.

Scalability across different implementations of OAI-PMH differs dramatically. Our analysis (Table  1 ) shows that performance can differ significantly also when only a single repository software is considered 10 .

Other limitations include difficulties in incremental harvesting, reliability issues, metadata interoperability issues, and scalability issues 11 .
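Two of the limitations listed above, the ambiguity of full text links and the sequential nature of the protocol, can be illustrated with a short sketch. It is not the CORE implementation: `fetch` is a caller-supplied function (an assumption made here so the example runs without network access), the two XML pages are hand-written stand-ins for a repository's `ListRecords` responses, and only `dc:identifier` and `dc:relation` are inspected, whereas real records may carry links elsewhere.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def candidate_urls(record: ET.Element) -> list[str]:
    """Collect every dc:identifier / dc:relation value that looks like a URL."""
    urls = []
    for tag in ("identifier", "relation"):
        for el in record.iter(DC + tag):
            text = (el.text or "").strip()
            if text.startswith(("http://", "https://")):
                urls.append(text)
    return urls


def harvest(fetch: Callable[[Optional[str]], str]) -> Iterator[tuple[str, list[str]]]:
    """Yield (oai_identifier, candidate_urls) for every record, page by page.

    `fetch(token)` returns the XML of one ListRecords response; the token is
    None for the first request and the previous resumptionToken afterwards.
    """
    token = None
    while True:
        root = ET.fromstring(fetch(token))
        for record in root.iter(OAI + "record"):
            oai_id = record.findtext(f"{OAI}header/{OAI}identifier", default="")
            yield oai_id, candidate_urls(record)
        # The next page cannot be requested before this one has been parsed.
        token = (root.findtext(f".//{OAI}resumptionToken") or "").strip()
        if not token:
            break


# Two tiny hand-written responses standing in for a real repository.
_DC_NS = ('xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
          'xmlns:dc="http://purl.org/dc/elements/1.1/"')
PAGES = {
    None: f"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>
<record><header><identifier>oai:example.org:1</identifier></header>
<metadata><oai_dc:dc {_DC_NS}>
<dc:identifier>https://example.org/record/1</dc:identifier>
<dc:identifier>doi:10.0000/example</dc:identifier>
<dc:relation>https://example.org/files/1.pdf</dc:relation>
</oai_dc:dc></metadata></record>
<resumptionToken>page2</resumptionToken>
</ListRecords></OAI-PMH>""",
    "page2": f"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>
<record><header><identifier>oai:example.org:2</identifier></header>
<metadata><oai_dc:dc {_DC_NS}>
<dc:identifier>https://example.org/record/2</dc:identifier>
</oai_dc:dc></metadata></record>
<resumptionToken/>
</ListRecords></OAI-PMH>""",
}

for oai_id, urls in harvest(lambda token: PAGES[token]):
    print(oai_id, urls)
```

The first record yields two candidate URLs, a landing page and a PDF, and nothing in the metadata says which one is the described resource; each candidate would still have to be fetched and checked. The loop also shows why harvesting cannot be parallelised: every request depends on the token returned by the previous one.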

We have designed solutions to overcome a number of these issues, which have enabled us to efficiently and effectively utilise OAI-PMH to harvest open access content from repositories. We present these solutions in Section Using OAI-PMH for content harvesting. While we currently rely on a variety of solutions and workarounds to enable content harvesting through OAI-PMH, most of the limitations listed in this section could also be addressed by adopting more sophisticated data exchange protocols, such as the ResourceSync ( http://www.openarchives.org/rs/1.1/resourcesync ) protocol, which was designed with content harvesting in mind 10 and whose adoption we support in the systems of data providers.

Our solution

In the above sections we have highlighted a critical need among many researchers and organisations globally for large-scale, always up-to-date, seamless machine access to scientific literature originating from thousands of data providers at full text level. Providing this seamless access has become both a defining goal and a feature of CORE and has enabled other researchers to design and test innovative methods on CORE data, often powered by artificial intelligence processes. In order to put together this vast continuously updated dataset, we had to overcome a number of research challenges, such as those related to the lack of interoperability, scalability, regular content synchronisation, content redundancy and inconsistency. Our key innovation in this area is the improvement of the process of aggregating research literature , as specified in the Introduction section.

This underpinning research has allowed CORE to become a leading provider of open access papers. The amount of data made available by CORE has been growing since 2011 12 and is continuously kept up to date. As of February 2023, CORE provides access to over 291 million metadata records and 32.8 million full text open access articles, making it the world’s largest archive of open access research papers, significantly larger than PubMed, arXiv and JSTOR datasets.

Whilst there are other publication databases that could be initially viewed as similar to CORE, such as BASE or Unpaywall, we will demonstrate the significant differences that set CORE apart and show how it provides access to a unique, harmonised corpus of Open Access literature. A major difference from these existing services is that CORE is completely free to use for the end user, hosts full text content, and offers several methods for accessing its data for machine processing. Consequently, it removes the need to harvest and pre-process full text for text mining, since CORE provides plain text access to the full texts via its raw data services, eliminating the need for text and data miners to work on PDF formats. A detailed comparison with other publication databases is provided in the Discussion. In addition, CORE enables building powerful services on top of the collected full texts, supporting all the categories of use cases outlined in the Use cases section.

As of today, CORE provides three services for accessing its raw data: API, dataset, and a FastSync service. The CORE API provides real-time machine access to both metadata and full texts of research papers. It is intended for building applications that need reliable access to a fraction of CORE data at any time. CORE provides a RESTful API. Users can register for an API key to access the service. Full documentation and Python notebooks containing code examples can be found on the CORE documentation pages online ( https://api.core.ac.uk/docs/v3 ). The CORE Dataset can be used to download CORE data in bulk. Finally, CORE FastSync enables third party systems to keep an always up to date copy of all CORE data within their infrastructure. Content can be transferred as soon as it becomes available in CORE using a data synchronisation service on top of the ResourceSync protocol 13 optimised by us for improved synchronisation scalability with an on-demand resource dumps capability. CORE FastSync provides fast, incremental and enterprise data synchronisation.
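As a rough illustration of what machine access through the API looks like, the sketch below builds and sends a search request using only the Python standard library. The base URL, the `search/works` path, the `q` and `limit` parameters and the bearer-token header reflect our reading of the v3 documentation linked above and should be treated as assumptions to verify there; `YOUR_API_KEY` is a placeholder.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.core.ac.uk/v3"  # assumed base URL; confirm in the API docs


def build_search_request(query: str, api_key: str, limit: int = 10) -> urllib.request.Request:
    """Build (but do not send) a GET request for a works search."""
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    return urllib.request.Request(
        f"{API_BASE}/search/works?{params}",        # assumed endpoint path
        headers={"Authorization": f"Bearer {api_key}"},
    )


def search(query: str, api_key: str, limit: int = 10) -> dict:
    """Send the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(build_search_request(query, api_key, limit)) as resp:
        return json.load(resp)


# Inspect the request without touching the network.
req = build_search_request("open access", "YOUR_API_KEY", limit=5)
print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the example checkable offline; for bulk or continuously synchronised access, the Dataset and FastSync services described above are the intended routes rather than repeated API calls.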

CORE is the largest up-to-date full text open access dataset as well as one of the most widely used services worldwide supporting access to freely available research literature. CORE regularly releases data dumps licensed as ODC-By, making the data freely available for both commercial and non-commercial purposes. Access to CORE data via the API is provided freely to individuals conducting work in their own personal capacity and to public research organisations for unfunded research purposes. CORE offers licenses to commercial organisations wanting to use CORE services to obtain a convenient way of accessing CORE data with a guaranteed level of service support. CORE is operated as a not-for-profit entity by The Open University and this business model makes it possible for CORE to remain free for more than 99.99% of its users.

A large number of commercial organisations have benefited from these licenses in areas as diverse as plagiarism detection in research, building specialised scholarly publication search engines, developing scientific assistants and machine translation systems, and supporting education ( https://core.ac.uk/about/endorsements/partner-projects ). The CORE data services, the CORE API and Dataset, have been used by over 7,000 experts to analyse data, develop text-mining applications and embed CORE into existing production systems.

Additionally, more than 70 repository systems have registered to use the CORE Recommender, and the service is notably used by prestigious institutions, including the University of Cambridge, and by popular pre-print services such as arXiv.org. Other CORE services are CORE Discovery and the CORE Repository Dashboard. The former was released in July 2019 and at the time of writing has more than 5,000 users. The latter is a tool designed specifically for repository managers which provides access to a range of tools for managing the content within their repositories. The CORE Repository Dashboard is currently used by 499 users from 36 countries.

In the rest of this paper we describe the CORE dataset and the methods of assembling it and keeping it continuously up-to-date. We also present the services and tools built on top of the aggregated corpus and provide several examples of how the CORE dataset has been used to create real-world applications addressing specific use-cases.

As highlighted in the Introduction, CORE is a continuously growing dataset of scientific publications for both human and machine processing. As we will show in this section, it is a global dataset spanning all disciplines and containing publications aggregated from more than ten thousand data providers including disciplinary and institutional repositories, publishers, and other databases. To improve access to the collected publications, CORE performs a number of data enrichment steps. These include metadata and full text extraction, language and DOI detection, and linking with other databases. Furthermore, CORE provides a number of services which are built on top of the data: a publications recommender ( https://core.ac.uk/services/recommender/ ), CORE Discovery service ( https://core.ac.uk/services/discovery/ ) (a tool for discovering OA versions of scientific publications), and a dashboard for repository managers ( https://core.ac.uk/services/repository-dashboard/ ).

Dataset size

As of February 2023, CORE is the world’s largest dataset of open access papers (comparison with other systems is provided in the Discussion). CORE hosts over 291 million metadata records including over 34 million articles with full text written in 82 languages and aggregated from over ten thousand data providers located in 150 countries. Full details of CORE Dataset size are presented in Table 2 . In the table, “Metadata records” represent all valid (not retracted, deleted, or for some other reason withdrawn) records in CORE. It can be seen that about 13% of records in CORE contain full text. This number represents records for which a manuscript was successfully downloaded and converted to plain text. However, a much higher proportion of records contains links to additional freely available full text articles hosted by third-party providers. Based on analysing a subset of our data, we estimate that about 48% of metadata records in CORE fall into this category, indicating that CORE is likely to contain links to open access full texts for 139 million articles. Due to the nature of academic publishing, there will be instances where multiple versions of the same paper are deposited in different repositories. For example, an early version of an article can be deposited by an author to a pre-print server such as arXiv or bioRxiv and a later version then uploaded to an institutional repository. Identifying and matching these different versions is a significant undertaking. CORE has carried out research to develop techniques based on locality sensitive hashing for duplicate identification 8 and integrated these into its ingestion pipeline to link versions of papers from across the network of OA repositories and group them under a single works entity. The large number of records in CORE translates directly into the size of the dataset in bytes: the uncompressed version of the dataset including PDFs is about 100 TB. The compressed version of the CORE dataset with plain texts only amounts to 393 GB, and the uncompressed version to 3.5 TB.
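For readers unfamiliar with locality sensitive hashing, the sketch below shows one common variant, MinHash with banding, applied to finding candidate duplicate texts. It is a generic textbook construction and not the method integrated into the CORE pipeline; the shingle size, the number of hash functions, the band layout and the sample documents are all arbitrary choices made for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[str]:
    """Lower-cased word k-grams of the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _h(seed: int, value: str) -> int:
    """One member of a family of hash functions, selected by `seed`."""
    digest = hashlib.blake2b(
        value.encode(), digest_size=8, salt=seed.to_bytes(8, "big")
    ).digest()
    return int.from_bytes(digest, "big")


def minhash(text: str, num_hashes: int = 32) -> list[int]:
    """MinHash signature: the minimum hash of any shingle, per hash function."""
    sh = shingles(text)
    return [min(_h(seed, s) for s in sh) for seed in range(num_hashes)]


def candidate_pairs(docs: dict[str, str], bands: int = 8, rows: int = 4) -> set[tuple[str, str]]:
    """Documents sharing at least one identical signature band become candidates."""
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash(text, bands * rows)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical records: a preprint, a repository copy differing only in case,
# and an unrelated paper.
docs = {
    "preprint": "Scalable harvesting of open access research papers from repositories",
    "repository": "scalable harvesting of open access research papers from repositories",
    "other": "Measuring glacier retreat with satellite imagery over four decades",
}
print(candidate_pairs(docs))
```

Because only documents that land in the same bucket are compared, the number of pairwise comparisons stays far below the quadratic cost of comparing every record with every other, which is what makes this family of techniques attractive at the scale of hundreds of millions of records. Near-duplicates that differ in a few words are matched probabilistically, with the band and row counts controlling the trade-off between missed and spurious candidates.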

Recent studies have estimated that around 24%–28% of all articles are available free to read 2 , 14 . There are a number of reasons why the proportion of full text content in CORE is lower than these estimates. The main reason is likely that a significant proportion of the free to read articles represents content hosted on platforms with many restrictions on machine accessibility, i.e. some repositories severely restrict or fully prohibit content harvesting 9 .

The growth of CORE has been made possible thanks to the introduction of a novel harvesting system and the creation of an efficient harvesting scheduler, both of which are described in the Methods section. The growth of metadata and full text records in CORE is shown in Fig. 3 . Finally, Fig. 4 shows the age of publications in CORE.

figure 3

Growth of records in CORE per month since February 2012. “Full text growth” represents growth of records containing full text, while “Metadata growth” represents growth of records without full text, i.e. the two numbers do not overlap. The two area plots are stacked on top of each other, their sum therefore represents the total number of records in CORE.

figure 4

Age of publications in CORE. As in Fig. 3 , the “Metadata” and “Full text” record bars are stacked on top of each other.

Data sources and languages

As of February 2023, CORE was aggregating content from 10,744 data sources. These data sources include institutional repositories (for example the USC Digital Library or the University of Michigan Library Repository), academic publishers (Elsevier, Springer), open access journals (PLOS), subject repositories, including those hosting eprints (arXiv, bioRxiv, Zenodo, PubMed Central) and aggregators (e.g. DOAJ). The ten largest data sources in CORE are shown in Table 3 . To calculate the total number of data providers in CORE, we count each aggregator and publisher as one data source even though each aggregates data from multiple sources. A full list of all data providers can be found on the CORE website ( https://core.ac.uk/data-providers ).

The data providers aggregated by CORE are located in 150 different countries. Figure 5 shows the top ten countries in terms of number of data providers aggregated by CORE from each country alongside the top ten languages. The geographic spread of repositories is largely reflective of the size of the research economy in those countries. We see the US, Japan, Germany, Brazil and the UK all in the top six. One result that at first may appear surprising is the significant number of repositories in Indonesia, enough to place it at the top of the list. An article in Nature in 2019 showed that Indonesia may be the world’s OA leader, finding that 81% of 20,000 journal articles published in 2017 with an Indonesia-affiliated author are available to read for free somewhere online ( https://www.nature.com/articles/d41586-019-01536-5 ). Additionally, there are a large number of Indonesian open-access journals registered with Crossref. This in turn leads to a much higher number of individual repositories in this country.

figure 5

Top ten languages and top ten provider locations in CORE.

As part of the enrichment process, CORE performs language detection. Language is either extracted from the attached metadata where available or identified automatically from full text in case it is not available in metadata. More than 80% of all documents with language information are in English. Overall, CORE contains publications in a variety of languages, the top 10 of which are shown in Fig.  5 .

Document types

The CORE dataset comprises a collection of documents gathered from various sources, many of which contain articles of different types. Consequently, aside from research articles from journals and conferences, it includes other types of research outputs such as research theses, presentations, and technical reports. To distinguish different types of articles, CORE has implemented a method of automatically classifying documents into one of the following four categories 15 : (1) research article, (2) thesis, (3) presentation, (4) unknown (for articles not belonging to any of the previous three categories). This method is based on a supervised machine learning model trained on article full texts. Figure 6 shows the distribution of articles in CORE into these four categories. It can be seen that the collection aggregated by CORE consists predominantly of research articles. We have observed in the data collected from repositories that the vast majority of research theses deposited in repositories have full text associated with the metadata. As this is not always the case for research articles, and as Fig. 6 is produced on articles with full text only, we expect that the proportion of research articles compared to research theses in CORE is actually higher across the entire collection.

figure 6

Distribution of document types.

Research disciplines

To analyse the distribution of disciplines in CORE we have leveraged a third-party service. Figure  7 shows a subject distribution of a sample of 20,758,666 publications in CORE. For publications with multiple subjects we count the publication towards each discipline.

figure 7

Subject distribution of a sample of 20,758,666 CORE publications.

The subject for each article was obtained using Microsoft Academic ( https://academic.microsoft.com/home ) prior to its retirement in November 2021. Our results are consistent with other studies, which have reported Biology, Medicine, and Physics to be the largest disciplines in terms of number of publications 16 , 17 , suggesting that the distribution of articles in CORE is representative of research publications in general.

Additional CORE Tools and Services

CORE has built several additional tools for a range of stakeholders including institutions, repository managers and researchers from across all scientific domains. Details of the usage of these services are covered in the Uptake of CORE section.

The Dashboard provides a suite of tools for repository management, content enrichment, metadata quality assessment and open access compliance checking. Further, it can provide statistics regarding content downloads and suggestions for improving the efficiency of harvesting and the quality of metadata.

CORE Discovery helps users to discover freely accessible copies of research papers. There are several methods for interacting with the Discovery tool. First, it is available as a plugin for repositories, enriching metadata-only pages in repositories with links to open access copies of full text documents. Second, it is available as a browser extension for researchers and anyone interested in reading scientific documents. Finally, it is offered as an API service for developers.

Recommender

The recommender is a plugin for repositories, journal systems and web interfaces that suggests articles relevant to the one currently displayed. Its purpose is to support users in discovering articles of interest from across the network of open access repositories. It is notably used by prestigious institutions, including the University of Cambridge, and by popular pre-print services such as arXiv.org.

OAI Resolver

An OAI (Open Archives Initiative) identifier is a unique identifier of a metadata record. OAI identifiers are used in the context of repositories using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). OAI identifiers are viable persistent identifiers for repositories that, as opposed to DOIs, can be minted in a distributed fashion and cost-free, and which can be resolved directly to the repository rather than to the publisher. The CORE OAI Resolver can resolve any OAI identifier to either a metadata page of the record in CORE or route it directly to the relevant repository page. This approach has the potential to increase the importance of repositories in the process of disseminating knowledge.
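The structure of an OAI identifier can be made concrete with a small sketch. It assumes the widely used `oai:<namespace>:<local-id>` layout from the OAI identifier guidelines; the address of the CORE OAI Resolver is not given in this section, so `resolver_base` is a caller-supplied parameter and the URL used in the example is a placeholder, not the real service.

```python
from typing import NamedTuple
from urllib.parse import quote


class OaiId(NamedTuple):
    namespace: str   # usually the repository's domain name
    local_id: str    # record identifier, unique within that repository


def parse_oai_identifier(identifier: str) -> OaiId:
    """Split 'oai:<namespace>:<local-id>'; the local part may itself contain ':'."""
    scheme, _, rest = identifier.partition(":")
    namespace, _, local_id = rest.partition(":")
    if scheme != "oai" or not namespace or not local_id:
        raise ValueError(f"not an OAI identifier: {identifier!r}")
    return OaiId(namespace, local_id)


def resolver_url(identifier: str, resolver_base: str) -> str:
    """Append a validated identifier to a resolver base URL (placeholder service)."""
    parse_oai_identifier(identifier)  # raises ValueError if malformed
    return resolver_base.rstrip("/") + "/" + quote(identifier, safe=":/")


print(parse_oai_identifier("oai:arXiv.org:0704.0001"))
print(resolver_url("oai:example.org:123", "https://resolver.example/"))
```

Because the namespace part is tied to the repository that minted the identifier, a resolver only needs a mapping from namespaces to repository landing-page patterns to redirect a request, which is what allows these identifiers to be issued without a central registration fee.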

Uptake of CORE

As of February 2023, CORE averages over 40 million monthly active users and ranks 10th among websites in the Science and Education category according to SimilarWeb ( https://www.similarweb.com/ ). There are currently 4,700 registered API users and 2,880 registered dataset users. The CORE Dashboard is currently used by 499 institutional repositories to manage their open access content, monitor content download statistics, manage issues with metadata within the repository and ensure compliance with OA funder policies, notably REF in the U.K. The CORE Discovery plugin has been integrated into 434 repositories and the browser extension has been downloaded by more than 5,000 users via the Google Chrome Web Store ( https://chrome.google.com/webstore/category/extensions ). The CORE Recommender has been embedded in 70 repository systems including the University of Cambridge and arXiv.

In this section we discuss differences between CORE and other open access aggregation services and present several real-world use cases where CORE was used to develop services to support science. We also present our future plans.

Existing open access aggregation services

Currently there are a number of open access aggregation services available (Table 4 ), with some examples being BASE ( https://base-search.net/ ), OpenAIRE ( https://www.openaire.eu/ ), Unpaywall ( http://unpaywall.org/ ), and Paperity ( https://paperity.org/ ). BASE (Bielefeld Academic Search Engine) is a global metadata harvesting service. It harvests repositories and journals via OAI-PMH and exposes the harvested content through an API and a dataset. OpenAIRE is a network of open access data providers who support open access policies. Even though in the past the project focused on European repositories, it has recently expanded by including institutional and subject repositories from outside Europe. A key focus of OpenAIRE is to assist the European Council to monitor compliance of its open access policies. OpenAIRE data is exposed via an API. Paperity is a service which harvests publications from open access journals. Paperity harvests both metadata and full text but does not host full texts. SHARE (Shared Access Research Ecosystem) is a harvester of open access content from US repositories. Its aim is to assist with compliance with the White House Office of Science and Technology Policy (OSTP) open access policies. Even though SHARE harvests both metadata and full text, it does not host the latter. Unpaywall is not primarily a harvester, but rather collects content from Crossref whenever a free to read version can be retrieved. It processes both metadata and full text but does not host them. It exposes the discovered links to documents through an API.

CORE differs from these services in a number of ways. CORE is currently the largest database of full text OA documents. In addition, CORE offers via its API a rich metadata record for each item in its collection which includes additional enrichments, in contrast, for example, to Unpaywall’s API, which focuses only on informing the user whether a free to read version is available. CORE also provides the largest number of links to OA content. To simplify access to data for end users, it provides a number of ways of accessing its collection. All of the above services are free to use for research purposes; however, both CORE and Unpaywall also offer services to commercial partners on a paid-for basis.

Existing publication databases

Apart from OA aggregation services, a number of other services exist for searching and downloading scientific literature (Table 5 ). One of the main publication databases is Crossref ( https://www.crossref.org/ ), an authoritative index of DOI identifiers. Its primary function is to maintain metadata information associated with each DOI. The metadata stored by Crossref includes both OA and non-OA records. Crossref does not store publication full text, but for many publications provides full text links. As of February 2023, 5.9 million records in Crossref were associated with an explicit Creative Commons license (we have used the Crossref API to determine this number). Although Crossref provides an API, it does not offer its data for download in bulk, or provide a data sync service.

The remaining services from Table 5 can be roughly grouped into the following two categories: 1) citation indices, 2) academic search engines and scholarly graphs. The two major citation indices are Elsevier’s Scopus ( https://www.elsevier.com/solutions/scopus ) and Clarivate’s Web of Science ( https://clarivate.com/webofsciencegroup/solutions/web-of-science/ ), both of which are premium subscription services. Google Scholar, the best known academic search engine, does not provide an API for accessing its data and does not permit crawling its website. Semantic Scholar ( https://www.semanticscholar.org/ ) is a relatively new academic search service which aims to create an “intelligent academic search engine” 18 . Dimensions ( https://www.dimensions.ai/ ) is a service focused on data analysis. It integrates publications, grants, policy documents, and metrics. 1findr ( https://1findr.1science.com/home ) is a curated abstract indexing service. It provides links to full text, but no API or dataset for download.

The added value of CORE

There are other services that claim to provide access to a large dataset of open access papers. In particular, Unpaywall 2 claims to provide access to 46.4 million free to read articles, and BASE states that it provides access to the full texts of about 60% of its 300 million metadata records. However, these statistics are not directly comparable to the numbers we report and are a product of the different focus of these two projects. This is because both BASE and Unpaywall define “providing access to” in terms of having a list of URLs from which a human user can navigate to the full text of the resource. This means that neither Unpaywall nor BASE collects these full text resources, which is also why they do not face many of the challenges we described in the Introduction. Using this approach, we could say that the CORE Dataset provides access to approximately 139 million full texts, i.e. about 48% of our 291 million metadata records point to a URL from which a human can navigate to the full text. However, to people concerned with text and data mining of scientific literature, it makes little sense to count URLs pointing to many different domains on the Web as the number of full texts made available.

As a result, our 32.8 million statistic refers to the number of OA documents that we identified, downloaded, extracted text from and validated against their metadata records, and whose full texts we host on the CORE servers and make available to others. In contrast, BASE and Unpaywall do not aggregate the full texts of the resources they provide access to and consequently do not offer the means to interact with the full texts of these resources or offer bulk download capability of these resources for text analytics over scholarly literature.

We have also integrated CORE data with the OpenMinTeD infrastructure, a European Commission funded project which aimed to provide a platform for text mining of scholarly literature in the cloud 6 .

A number of academia and industry partners have utilised CORE in their services. In this section we present three existing uses of CORE demonstrating how CORE can be utilised to support text and data mining use cases.

Since 2017, CORE has been collaborating with a range of scholarly search and discovery systems. These include Naver ( https://naver.com/ ), Lean Library ( https://www.leanlibrary.com/ ) and Ontochem ( https://ontochem.com/ ). As part of this work, CORE serves as a provider of full text copies of research papers for existing records in these systems (Lean Library) or supplies both metadata and full texts for indexing (Ontochem, Naver). This collaboration also benefits CORE’s data providers as it increases the visibility of their content.

In 2019, CORE entered into a collaboration with Turnitin, a global leader in plagiarism detection software. By using the CORE FastSync service, Turnitin’s proprietary web crawler searches through CORE’s global database of open access content and metadata to check for text similarity. This partnership enables Turnitin to significantly enlarge its content database in a fast and efficient manner. In turn, it also helps protect open access content from misuse, thus protecting authors and institutions.

As of February 2023, the CORE Recommender 19 is actively running in over 70 repositories, including the University of Cambridge institutional repository and arXiv.org. The purpose of the recommender is to improve the discoverability of research outputs by suggesting similar research papers both within the collection of the hosting repository and within the CORE collection. Repository managers can install the recommender to make other scientific papers more accessible and to reach other scientific communities, since the CORE Recommender acts as a gateway to millions of open access research papers. The recommender is integrated with the CORE search functionality and is also offered as a plugin for all repository software, for example EPrints and DSpace, as well as for open access journals and any other webpage. Because CORE harvests open repositories, the recommender only displays research articles whose full text is available as open access, i.e. for immediate use, without access barriers or rights restrictions. Through the recommender, CORE promotes the widest discoverability and distribution of open access scientific papers.

Future work

An ongoing goal of CORE is to keep growing the collection to become a single point of access to all of the world’s open access research. However, there are a number of other ways we are planning to improve both the size of the collection and the ease of access to it. The CORE Harvesting System was designed to enable adding new harvesting steps and enrichment tasks, and there remains scope for adding more such enrichments. Some of these are machine learning powered, such as classification of scientific citations 20. Further, CORE is currently developing new methodologies to identify and link different versions of the same article. The proposed system, titled CORE Works, will leverage CORE’s central position in the OA infrastructure landscape and will link different versions of the same paper using a unique identifier. We will continue linking the CORE collection to scholarly entities from other services, thereby making CORE data part of a global scholarly knowledge graph.

In the Introduction section we focused on a number of challenges researchers face when collecting research literature for text and data mining. In this section, we instead focus on the perspective of a research literature aggregator, i.e. a system whose goal is to continuously provide seamless access to research literature aggregated from thousands of data providers worldwide in a way that enables the resulting research publication collection to be used by others in production applications. We describe the challenges we had to overcome to build this collection and to keep it continuously up to date. We also present the key technical innovations which allowed us to greatly increase the size of the CORE collection and become a leading provider of open access literature, which we illustrate using our content growth statistics.

CORE Harvesting system (CHARS)

The CORE Harvesting System (CHARS) is the backbone of our harvesting process. CHARS uses the Harvesting Scheduler (Section CHARS architecture) to select the data providers to be processed next. It manages all the running processes (tasks) and ensures the available compute resources are well utilised.

Prior to implementing CHARS, CORE was centralised around data providers rather than around the individual tasks needed to harvest and process these data providers (e.g. metadata download and parsing, full text download, etc.). Consequently, even though it was possible to scale up and keep operating this system, the infrastructure was not horizontally scalable and the architecture suffered from tight coupling of services. This was not consistent with CORE’s high availability requirements and regularly made maintenance more complex. In response to these challenges, we designed CHARS using a microservices architecture, i.e. using small, manageable, autonomous components that work together as part of a larger infrastructure 21. One of the key benefits of a microservices-oriented architecture is that the implementation focus can be put on the individual components, which can be improved and redeployed as frequently as needed and independently of the rest of the infrastructure. As the process of open access content harvesting can be inherently split into individual consecutive tasks, a microservices-oriented architecture presents a natural fit for aggregation systems like CHARS.

Tasks involved in open access content harvesting

The harvesting process can be described as a pipeline where each task performs a certain action and where the output of each task feeds into the next task. The input to this pipeline is a set of data providers and the final output is a system populated with records of research papers available from them. The key tasks currently performed as part of CORE’s harvesting system are (Fig. 8):

Metadata download: The metadata exposed by a data provider via OAI-PMH are downloaded and stored in the file system (typically as XML). The downloading process is sequential, i.e. a repository typically provides between 100 and 1,000 metadata records per request together with a resumption token. This token is then used to request the next batch. As a result, full harvesting can take a significant amount of time (hours to days) for large data providers. Therefore, this process has been implemented to be resilient to a range of communication failures.

Metadata extraction: Metadata extraction parses, cleans, and harmonises the downloaded metadata and stores them in the CORE internal data structure (database). The harmonisation and cleaning process addresses the fact that different data providers and repository platforms describe the same information in different ways (syntactic heterogeneity) and interpret the same information differently (semantic heterogeneity).

Full text download: Using links extracted from the metadata, CORE attempts to download and store publication manuscripts. This process is non-trivial and is further described in the Using OAI-PMH for content harvesting section.

Information extraction: Plain text is extracted from the downloaded manuscripts and processed to create a semi-structured representation. This process includes a range of information extraction tasks, such as reference extraction.

Enrichment: The enrichment task augments both the metadata and the full text harvested from the data providers with additional data from multiple sources. Some of the enrichments, such as language detection and document type detection, are performed directly by specific tasks in the pipeline. The remaining enrichments, which involve external datasets, are performed externally and independently of the CHARS pipeline and ingested into the dataset as described in the Enrichments section.

Indexing: The final step in the harvesting pipeline is indexing the harvested data. The resulting index powers CORE’s services, including search, API and FastSync.

Fig. 8: CORE Harvesting Pipeline. Each task’s output provides the input for the following task. In some cases the input is considered as a whole, for example all the content harvested from a data provider, while in other cases the output is split into multiple small tasks performed at the record level.
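The metadata download step described above can be sketched as a loop over resumption tokens. This is a minimal illustration under stated assumptions, not CORE's implementation: `fetch` is a hypothetical callable that performs the HTTP request for the given OAI-PMH query parameters and returns the response body, and the retry count and back-off are assumed values. The `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments and the XML namespace are taken from the OAI-PMH 2.0 specification.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest_records(
    fetch: Callable[[Dict[str, str]], str],  # hypothetical HTTP helper
    metadata_prefix: str = "oai_dc",
    max_retries: int = 3,                    # assumed value
    backoff_seconds: float = 0.0,            # assumed value
) -> Iterator[ET.Element]:
    """Yield every <record> element, following resumption tokens batch by batch."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        # Retry the current batch so a transient failure does not abort the harvest.
        for attempt in range(1, max_retries + 1):
            try:
                root = ET.fromstring(fetch(params))
                break
            except (OSError, ET.ParseError):
                if attempt == max_retries:
                    raise
                time.sleep(backoff_seconds * attempt)
        yield from root.iterfind(".//oai:record", OAI_NS)
        token = root.find(".//oai:resumptionToken", OAI_NS)
        if token is None or not (token.text or "").strip():
            return  # no token, or an empty one, marks the final batch
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Because the batches must be requested one after another, the total harvest time grows with the number of batches, which is why resilience to individual request failures matters for large data providers.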

Scalable infrastructure requirements

Based on the experience obtained while developing and maintaining our harvesting system, and taking into consideration the features of the CiteSeerX 22 architecture, we have defined a set of requirements for a scalable harvesting infrastructure 8. These requirements are generic and apply to any aggregation or digital library scenario. They informed, and are reflected in, the architecture design of CHARS (Section CHARS architecture):

Easy to maintain: The system should be easy to manage, maintain, fix, and improve.

High levels of automation: The system should be completely autonomous while allowing manual interaction.

Fail fast: Items in the harvesting pipeline should be validated immediately after a task is performed, instead of having only one and final validation at the end of the pipeline. This has the benefit of recognising issues and enabling fixes earlier in the process.

Easy to troubleshoot: Possible code bugs should be easily discerned.

Distributed and scalable: Adding more compute resources should scale the system, and doing so should be transparent and replicable.

No single point of failure: A single crash should not affect the whole harvesting pipeline; individual tasks should work independently.

Decoupled from user-facing systems: Any failure in the ingestion processing services should not have an immediate impact on user-facing services.

Recoverable: When a harvesting task stops, either manually or due to a failure, the system should be able to recover and resume the task without manual intervention.

Performance observable: The system’s progress must be properly logged at all times and overlay monitoring services should be set up to provide a transparent overview of the services’ progress at all times, to allow early detection of scalability problems and identification of potential bottlenecks.
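The fail fast requirement above can be illustrated with a small sketch in which every task is paired with a check that runs immediately after it. The step tuple layout, the `ValidationError` class and the function name are hypothetical and chosen only for this example.

```python
from typing import Any, Callable, List, Tuple

# (task name, task function, validator for the task's output)
Step = Tuple[str, Callable[[Any], Any], Callable[[Any], bool]]


class ValidationError(Exception):
    """Raised as soon as a task produces output that fails its check."""


def run_with_validation(item: Any, steps: List[Step]) -> Any:
    """Run the steps in order, validating after each one instead of only at the end."""
    for name, task, is_valid in steps:
        item = task(item)
        if not is_valid(item):
            # Stop here rather than letting bad data reach later tasks.
            raise ValidationError(f"output of '{name}' failed validation")
    return item
```

The error names the task that produced the invalid output, which also supports the easy to troubleshoot requirement.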

CHARS architecture

An overview of CHARS is shown in Fig. 9. The system consists of the following main software components:

Scheduler: it becomes active when a task finishes. It monitors resource utilisation and selects and submits data providers to be harvested.

Queue (Q_n): a messaging system that assists with communication between parts of the harvesting pipeline. Every individual task, such as metadata download, metadata parsing, full text download, and language detection, has its own message queue.

Worker (W_i): an independent and standalone application capable of executing a specific task. Every individual task has its own set of workers.

Fig. 9: CORE Harvesting System.

A complete harvest of a data provider can be described as follows. When an existing task finishes, the scheduler is activated and informed of the result. It then uses the formula described in Appendix A to assign a score to each data provider. Depending on current resource utilisation, i.e. whether there are any idle workers, and on the number of data providers already scheduled for harvesting, the data provider with the highest score is then placed in the first queue Q_1, which contains data providers scheduled for metadata download. Once one of the metadata download workers W_i to W_j becomes available, a data provider is taken out of the queue and a new download of its metadata starts. Upon completion, the worker notifies the scheduler and, if the task completed successfully, places the data provider in the next queue. This process continues until the data provider passes through the entire pipeline.

While some of the tasks in the pipeline need to be performed at the granularity of data providers, specifically metadata download and parsing, other tasks, such as full text extraction and language detection, can be performed at the granularity of individual records. While these tasks are originally scheduled at the granularity of data providers, only the individual records of a selected data provider which require processing are subsequently independently placed in the appropriate queue. Workers assigned to these tasks then process the individual records in the queue and they move through the pipeline once completed.
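The flow of work through the queues, including the switch from provider-level to record-level granularity, can be sketched as follows. This is a deliberately simplified single-process illustration: the stage names follow the text, each stage only logs what it processes, and in-memory deques stand in for the messaging system and the independent workers of the real system.

```python
from collections import deque
from typing import Deque, Dict, List

# Stages processed per data provider, then stages processed per record.
PROVIDER_STAGES = ["metadata_download", "metadata_extraction"]
RECORD_STAGES = ["full_text_download", "information_extraction", "indexing"]


def run_pipeline(providers: Dict[str, List[str]]) -> List[str]:
    """providers maps a provider name to the record ids that still need processing."""
    stages = PROVIDER_STAGES + RECORD_STAGES
    queues: Dict[str, Deque[str]] = {stage: deque() for stage in stages}
    log: List[str] = []
    queues[stages[0]].extend(providers)  # the scheduler fills the first queue
    for i, stage in enumerate(stages):
        while queues[stage]:
            item = queues[stage].popleft()  # a worker takes one item
            log.append(f"{stage}:{item}")   # placeholder for the real task
            if i + 1 == len(stages):
                continue  # last stage: nothing to hand over
            next_queue = queues[stages[i + 1]]
            if stage == PROVIDER_STAGES[-1]:
                # Fan out: from here on, individual records are queued.
                next_queue.extend(providers[item])
            else:
                next_queue.append(item)
    return log
```

For example, `run_pipeline({"repoA": ["r1", "r2"]})` logs two provider-level steps for `repoA` followed by three record-level steps for each of `r1` and `r2`.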

A more detailed description of CHARS, including the technologies used to implement it, can be found in ref. 8.

The harvesting scheduler is a component responsible for identifying data providers which need to be harvested next and placing these data providers in the harvesting queue. In the original design of CORE, our harvesting schedule was created manually, assigning the same harvesting frequency to every data provider. However, we found that this approach does not scale, due to the varying size of data providers, differences in the update frequency of their databases and the maximum data delivery speeds of their repository platforms. To address these limitations, we designed the CHARS scheduler according to our new concept of “pro-active harvesting”. This means that the scheduler is event driven: it is triggered whenever the underlying hardware infrastructure has resources available, and it then determines which data provider should be harvested next. The underlying idea is to maximise the number of ingested documents per unit of time. The pseudocode and the formula we use to determine which repository to harvest next are described in Algorithm 1.

The size of the metadata download queue, i.e. the queue which represents an entry into the harvesting pipeline, is kept limited in order to keep the system responsive to the prioritisation of data providers. A long queue makes prioritising data providers harder, as it is not known beforehand how long the processing of a particular data provider will take. An appropriate size of the queue ensures a good balance between the reactivity and utilisation of the available resources.
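The event-driven behaviour and the bounded entry queue can be illustrated with the sketch below. The `score` function is a placeholder that simply divides an estimated number of new documents by an estimated harvest time, in the spirit of maximising ingested documents per unit of time. It is an assumption made for this example and is not the formula the scheduler actually uses. The `Provider` fields and the `on_task_finished` name are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Provider:
    name: str
    expected_new_docs: float  # hypothetical estimate
    expected_hours: float     # hypothetical estimate


def score(provider: Provider) -> float:
    """Placeholder score: expected documents ingested per hour (an assumption)."""
    return provider.expected_new_docs / max(provider.expected_hours, 1e-9)


def on_task_finished(
    candidates: List[Provider],
    queue: List[Provider],
    idle_workers: int,
    max_queue_size: int,
) -> Optional[Provider]:
    """Called when a task finishes; enqueue the best provider if there is capacity."""
    if idle_workers <= 0 or len(queue) >= max_queue_size or not candidates:
        return None  # keep the entry queue short so priorities stay current
    best = max(candidates, key=score)
    candidates.remove(best)
    queue.append(best)
    return best
```

Keeping `max_queue_size` small means that scores are recomputed shortly before a provider is actually harvested, at the cost of workers occasionally waiting for the next scheduling event.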

Using OAI-PMH for content harvesting

We now describe the third key technical innovation, which enables us to harvest full text content (as opposed to just metadata) from data providers using the OAI-PMH protocol. This process represents one step in the harvesting pipeline (Fig. 9), specifically the third step, which is activated after the metadata of a data provider have been downloaded and parsed.

The OAI-PMH protocol was originally designed for metadata harvesting only, but due to its wide adoption and the lack of alternatives it has been used as an entry point for full text harvesting from repositories. Full text harvesting is achieved by using URLs found in the metadata records to discover the location of the actual resource and subsequently downloading it 9. We summarised the key challenges of this approach in the Challenges related to the use of OAI-PMH protocol for content harvesting section.

The procedure works in the following way. First, all metadata records from a selected data provider with no full text are collected. Those records for which a full text download was attempted within the retry period (RP), usually six months, are filtered out. This avoids repeatedly downloading URLs that do not lead to the sought-after documents. The downside of this approach is that if a data provider updates a link in the metadata, it might take up to the duration of the retry period to acquire the full text.
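The record selection step can be sketched as a simple filter. The record layout (`has_full_text`, `last_attempt`) is hypothetical, and the retry period is approximated here as 182 days to stand in for "usually six months".

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List

RETRY_PERIOD = timedelta(days=182)  # roughly six months (assumed value)


def select_for_download(
    records: Iterable[Dict],
    now: datetime,
    retry_period: timedelta = RETRY_PERIOD,
) -> List[Dict]:
    """Keep records without full text whose last attempt is older than the retry period."""
    selected = []
    for record in records:
        if record["has_full_text"]:
            continue  # nothing to download
        last_attempt = record["last_attempt"]
        if last_attempt is not None and now - last_attempt < retry_period:
            continue  # tried recently; wait until the retry period has passed
        selected.append(record)
    return selected
```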

Algorithm 1


Next, the records are further filtered using a set of rules and heuristics we developed to a) increase the chances of identifying the URL leading to the described document quickly and b) to ensure that we identify the correct document. These filtering rules include:

Accepted file extensions: URLs are filtered according to a list of accepted file extensions. URLs ending with extensions such as .pptx that clearly indicate that the URL does not link to the required resource are removed from the list.

Same domain policy: URLs in the OAI-PMH metadata can link to any resources and domains. For example, a common practice is to provide a link to the associated presentation, dataset, or another related resource. As these are often stored in external databases, filtering out all URLs that lead to an external domain, i.e. domain different than the domain of the data provider, presents a simple method of avoiding the download of resources which with very high likelihood do not represent the target document. Exceptions include dx.doi.org and hdl.handle.net domains whose purpose is to provide a persistent identifier pointing to the document. The same domain policy is disabled for data providers which are aggregators and link to many different domains by design.

Provider-specific crawling heuristics: Many data providers follow a specific pattern when composing URLs. For example, a link to a full text document may be composed of the following parts: data provider URL + record handle + .pdf. For data providers utilising such patterns, URLs may be composed automatically where the relevant information (record handle) is known to us from the metadata. These generated URLs are then added to the list of URLs obtained from the metadata.

Prioritising certain URLs: As a PDF URL is more likely to contain the target record than an HTML URL, the final step is to sort URLs according to file and URL type. The highest priority is assigned to URLs that use repository software specific patterns to identify full text, document, and PDF file types, while the lowest priority is assigned to hdl.handle.net URLs.
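The four filtering rules above can be combined into a single candidate-selection function, sketched below under several assumptions. The extension rule is modelled for brevity as a small blocklist in which only `.pptx` comes from the text. The URL composition follows the "data provider URL + record handle + .pdf" example. The three-tier `priority` function is a simplified stand-in for the repository-specific ranking. All function and parameter names are hypothetical.

```python
from typing import Iterable, List, Optional
from urllib.parse import urlparse

# ".pptx" is the example from the text; the other entries are assumptions.
REJECTED_EXTENSIONS = (".pptx", ".ppt", ".zip")
# Persistent-identifier resolvers exempt from the same domain policy.
PID_HOSTS = {"dx.doi.org", "hdl.handle.net"}


def priority(url: str) -> int:
    """Lower value means tried earlier (simplified ranking)."""
    parsed = urlparse(url)
    if parsed.hostname == "hdl.handle.net":
        return 2  # lowest priority, as stated in the text
    if parsed.path.lower().endswith(".pdf"):
        return 0  # PDF links are the most likely to hold the target document
    return 1


def candidate_urls(
    metadata_urls: Iterable[str],
    provider_url: str,
    record_handle: Optional[str] = None,
    is_aggregator: bool = False,
) -> List[str]:
    provider_host = urlparse(provider_url).hostname
    kept: List[str] = []
    for url in metadata_urls:
        parsed = urlparse(url)
        if parsed.path.lower().endswith(REJECTED_EXTENSIONS):
            continue  # extension shows this is not the described document
        external = parsed.hostname != provider_host and parsed.hostname not in PID_HOSTS
        if external and not is_aggregator:
            continue  # same domain policy (disabled for aggregators)
        kept.append(url)
    if record_handle is not None:
        # Provider-specific pattern: provider URL + record handle + ".pdf"
        generated = f"{provider_url.rstrip('/')}/{record_handle.strip('/')}.pdf"
        if generated not in kept:
            kept.append(generated)
    # sorted() is stable, so metadata order is preserved within each tier.
    return sorted(kept, key=priority)
```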

The system then attempts to request the document at each URL and download it. After each download, checks are performed to determine whether the downloaded document represents the target record. Currently, the downloaded document has to be a valid PDF with a title matching the original metadata record. If the target record is identified, the downloaded document is stored and the download process for that record ends. If the downloaded document is an HTML page, URLs are extracted from this page and filtered using the same method as described above. This is because in some of the most widely used repository systems, such as DSpace, it is common for documents not to be directly referenced from within the metadata records. Instead, the metadata records typically link to an HTML overview page of the document. To deal with this problem, we use the concept of harvesting levels. The maximum harvesting level corresponds to the maximum search depth for the referenced document. The algorithm finishes either as soon as the first matching document is found or after all the available URLs up to the maximum harvesting level have been exhausted. Algorithm 2 describes our approach for collecting the full texts using the OAI-PMH protocol. It follows a depth-first search strategy with prioritisation.
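The search over harvesting levels described above can be sketched as a recursive depth-first traversal. This is an illustration under stated assumptions rather than a transcription of Algorithm 2: `fetch` is a hypothetical callable that downloads a URL and returns a `Fetched` object (or `None` on failure), the title check is reduced to a case- and whitespace-insensitive comparison, and the URL lists are assumed to be already filtered and sorted by priority.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Fetched:
    kind: str                                       # "pdf" or "html" (hypothetical labels)
    title: str = ""                                 # title read from a valid PDF
    links: List[str] = field(default_factory=list)  # URLs found in an HTML page


def _norm(title: str) -> str:
    return " ".join(title.lower().split())


def find_full_text(
    urls: List[str],
    target_title: str,
    fetch: Callable[[str], Optional[Fetched]],
    max_level: int = 1,
    level: int = 0,
    seen: Optional[Set[str]] = None,
) -> Optional[str]:
    """Return the URL of the first PDF whose title matches, or None."""
    seen = set() if seen is None else seen
    if level > max_level:
        return None  # maximum harvesting level reached
    for url in urls:  # assumed to be already filtered and prioritised
        if url in seen:
            continue
        seen.add(url)
        doc = fetch(url)
        if doc is None:
            continue  # download failed
        if doc.kind == "pdf" and _norm(doc.title) == _norm(target_title):
            return url  # first matching document ends the search
        if doc.kind == "html":
            # Descend into the page's links before trying sibling URLs.
            found = find_full_text(doc.links, target_title, fetch, max_level, level + 1, seen)
            if found is not None:
                return found
    return None
```

With `max_level=0` only the URLs taken directly from the metadata are tried, so a record that links only to an HTML landing page would not be resolved.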

Algorithm 2


CHARS limitations

Although we have overcome the key obstacles to scalable harvesting of content from repositories, a number of important challenges remain. The first relates to the difficulty of estimating the optimal number of workers needed for our system to run efficiently. While the worker allocation is still largely established empirically, we are investigating more sophisticated approaches based on formal models of distributed computation, such as Petri Nets. This will allow us to investigate new approaches to dynamically allocating and launching workers to optimise the usage of our resources.

Enrichments

Conceptually, two types of enrichment processes are used within CORE: 1) an online enrichment process enriching a single record at the time of it being processed by the CHARS pipeline and 2) a periodic offline enrichment process which enriches a record based on information in external datasets (Fig. 10).

Fig. 10: CORE Offline Enrichments.

Online enrichments

Online enrichments are fully integrated into the CHARS pipeline described earlier in this section. These enrichments generally involve the application of machine learning models and rule-based tools to gather additional insights about the record, such as language detection and document type detection. As opposed to offline enrichments, online enrichments are always performed just once for a given record. The following is a list of the current enrichments performed online:

Article type detection: A machine learning algorithm assigns each publication one of the following four types: presentation, thesis, research paper, other. In the future we may include other types.

Language identification: This task uses third-party libraries to identify the language based on the full text of a document. The resulting language is then compared to the one provided by the metadata record. Some heuristics are applied to disambiguate and harmonise languages.

Offline enrichments

Offline enrichments are carried out by means of gathering a range of information from large third-party scholarly datasets (research graphs). Such information includes metadata that do not necessarily change, such as a DOI identifier, as well as metadata that evolve, such as the number of citations. Especially due to the latter, CORE performs offline enrichments periodically, i.e. all records in CORE go through this process repeatedly at specified time intervals (currently once per month).

The process is depicted in Fig. 10. The initial mapping of a record is carried out using a DOI, if available. However, as the majority of records from repositories do not come with a DOI, we carry out a matching process against the Crossref database using a subset of metadata fields including title, authors and year. Once the mapping is performed, we can harmonise fields as well as gather a wide range of additional useful data from relevant external databases, thereby enriching the CORE record. Such data include ORCID identifiers, citation information, additional links to freely available full texts, field of study information and PubMed identifiers. Our solution is based on a set of map-reduce tasks to enrich the dataset and is implemented on a Cloudera Enterprise Data Hub (https://www.cloudera.com/products/enterprise-data-hub.html) 23, 24, 25, 26.
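The mapping step can be illustrated with a small sketch that prefers an existing DOI and otherwise falls back to a key built from title, first author and year. The exact-key lookup, the "Surname, Given" author format and all function names are assumptions made for this example. The matching against Crossref described above may well be fuzzier than this.

```python
import re
import unicodedata
from typing import Dict, Iterable, Optional, Tuple

Key = Tuple[str, str, int]


def normalise(text: str) -> str:
    """Lower-case, strip accents and collapse punctuation to single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", without_accents.lower()).strip()


def match_key(record: Dict) -> Key:
    # Assumes authors are written as "Surname, Given".
    first_author = record["authors"][0] if record["authors"] else ""
    surname = normalise(first_author.split(",")[0])
    return (normalise(record["title"]), surname, record["year"])


def build_index(references: Iterable[Dict]) -> Dict[Key, str]:
    """Index reference records (for example, from an external dataset) by match key."""
    return {match_key(ref): ref["doi"] for ref in references}


def find_doi(record: Dict, index: Dict[Key, str]) -> Optional[str]:
    if record.get("doi"):
        return record["doi"]  # a DOI in the record is used directly
    return index.get(match_key(record))
```

Once a DOI is known, it can serve as the join key for the further enrichments listed above.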

Data availability

CORE provides several large data dumps of the processed and aggregated data under the ODC-BY licence ( https://core.ac.uk/documentation/dataset ). The only condition for both commercial and non-commercial reuse of these datasets is to acknowledge the use of CORE in their outputs. Additionally, CORE makes its API and most recent data dump freely available to registered individual users and researchers. Please note that CORE claims no rights in the aggregated content itself which is open access and therefore freely available to everyone. All CORE data rights correspond to the sui generis database rights of the aggregated and processed collection.

Licences for CORE services, such as the API and FastSync, are available for commercial users wishing to benefit from convenient access to CORE data with a guaranteed level of customer support. The organisation running CORE, i.e. The Open University, is a charitable organisation fully committed to the Open Research mission. CORE is a signatory of the Principles of Open Scholarly Infrastructure (POSI) ( https://openscholarlyinfrastructure.org/posse ). CORE does not generate profit. Instead, CORE’s income from licences to commercial parties is used solely to provide sustainability by enabling CORE to become less reliant on unstable project grants, thus offsetting and reducing the cost of CORE to the taxpayer. This is done in full compliance with the principles and best practices of sustainable open science infrastructure.

Code availability

CORE consists of multiple services. Most of our source code is open source and available in our public repository on GitHub ( https://github.com/oacore/ ). As of today, we are unfortunately not yet able to provide the source code to our data ingestion module. However, as we want to be as transparent as possible with our community, we have documented in this paper the key algorithms and processes which we apply using pseudocode.

References

1. Bornmann, L. & Mutz, R. Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references. JASIST 66(11), 2215–2222 (2015).
2. Piwowar, H. et al. The State of OA: A large-scale analysis of the prevalence and impact of Open Access articles. PeerJ 6, e4375 (2018).
3. Saggion, H. & Ronzano, F. Scholarly data mining: making sense of scientific literature. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL): 1–2 (2017).
4. Kim, E. et al. Materials synthesis insights from scientific literature via text extraction and machine learning. Chemistry of Materials 29(21), 9436–9444 (2017).
5. Jacobs, N. & Ferguson, N. Bringing the UK’s open access research outputs together: Barriers on the Berlin road to open access. Jisc Repository (2014).
6. Knoth, P. & Pontika, N. Aggregating Research Papers from Publishers’ Systems to Support Text and Data Mining: Deliberate Lack of Interoperability or Not? In: INTEROP2016 (2016).
7. Herrmannova, D., Pontika, N. & Knoth, P. Do Authors Deposit on Time? Tracking Open Access Policy Compliance. Proceedings of the 2019 ACM/IEEE Joint Conference on Digital Libraries, Urbana-Champaign, IL (2019).
8. Cancellieri, M., Pontika, N., Pearce, S., Anastasiou, L. & Knoth, P. Building Scalable Digital Library Ingestion Pipelines Using Microservices. Proceedings of the 11th International Conference on Metadata and Semantics Research (MTSR 2017): 275–285. Springer (2017).
9. Knoth, P. From open access metadata to open access content: two principles for increased visibility of open access content. Proceedings of the 2013 Open Repositories Conference, Charlottetown, Prince Edward Island, Canada (2013).
10. Knoth, P., Cancellieri, M. & Klein, M. Comparing the Performance of OAI-PMH with ResourceSync. Proceedings of the 2019 Open Repositories Conference, Hamburg, Germany (2019).
11. Kapidakis, S. Metadata Synthesis and Updates on Collections Harvested Using the Open Archive Initiative Protocol for Metadata Harvesting. Digital Libraries for Open Knowledge. TPDL 2018. Lecture Notes in Computer Science 11057, 16–31 (2018).
12. Knoth, P. & Zdrahal, Z. CORE: three access levels to underpin open access. D-Lib Magazine 18(11/12) (2012).
13. Haslhofer, B. et al. ResourceSync: leveraging sitemaps for resource synchronization. Proceedings of the 22nd International Conference on World Wide Web: 11–14 (2013).
14. Khabsa, M. & Giles, C. L. The number of scholarly documents on the public web. PLOS One 9(5), e93949 (2014).
15. Charalampous, A. & Knoth, P. Classifying document types to enhance search and recommendations in digital libraries. Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science 10450, 181–192 (2017).
16. Rosvall, M. & Bergstrom, C. T. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences 105(4), 1118–1123 (2008).
17. D’Angelo, C. A. & Abramo, G. Publication rates in 192 research fields of the hard sciences. Proceedings of the 15th ISSI Conference: 915–925 (2015).
18. Ammar, W. et al. Construction of the Literature Graph in Semantic Scholar. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers): 84–91 (2018).
19. Knoth, P. et al. Towards effective research recommender systems for repositories. Open Repositories, Bozeman, USA (2017).
20. Pride, D. & Knoth, P. An Authoritative Approach to Citation Classification. Proceedings of the 2020 ACM/IEEE Joint Conference on Digital Libraries (JCDL 2020), Virtual–China (2020).
21. Newman, S. Building microservices: designing fine-grained systems. O’Reilly Media, Inc. (2015).
22. Li, H. et al. CiteSeerX: a scalable autonomous scientific digital library. Proceedings of the 1st International Conference on Scalable Information Systems, ACM (2006).
23. Bastian, H., Glasziou, P. & Chalmers, I. Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS medicine 7(9), e1000326 (2010).
24. Shojania, K. G. et al. How quickly do systematic reviews go out of date? A survival analysis. Annals of internal medicine 147(4), 224–233 (2007).
25. Tsafnat, G. et al. Systematic review automation technologies. Systematic reviews 3(1), 74 (2014).
26. Harzing, A.-W. & Alakangas, S. Microsoft Academic is one year old: The Phoenix is ready to leave the nest. Scientometrics 112(3), 1887–1894 (2017).

Acknowledgements

We would like to acknowledge the generous support of Jisc, under a number of grants and service contracts with The Open University. These included projects CORE, ServiceCORE, UK Aggregation (1 and 2) and DiggiCORE, which was co-funded by Jisc with NWO. Since 2015, CORE has been supported in three iterations under the Jisc Digital Services–CORE (JDSCORE) service contract with The Open University. Within Jisc, we would like to thank primarily the CORE project managers, Andy McGregor, Alastair Dunning, Neil Jacobs and Balviar Notay. We would also like to thank the European Commission for funding that contributed to CORE, namely OpenMinTeD (739563) and EOSC Pilot (654021). We would like to show our gratitude to all current CORE Team members who contributed to CORE but are not authors of the manuscript, namely Valeriy Budko, Ekaterine Chkhaidze, Viktoriia Pavlenko, Halyna Torchylo, Andrew Vasilyev and Anton Zhuk. We would like to show our gratitude to all past CORE Team members who have contributed to CORE over the years, namely Lucas Anastasiou, Giorgio Basile, Aristotelis Charalampous, Josef Harag, Drahomira Herrmannova, Alexander Huba, Bikash Gyawali, Tomas Korec, Dominika Koroncziova, Magdalena Krygielova, Catherine Kuliavets, Sergei Misak, Jakub Novotny, Gabriela Pavel, Vojtech Robotka, Svetlana Rumyanceva, Maria Tarasiuk, Ian Tindle, Bethany Walker, Viktor Yakubiv, Zdenek Zdrahal and Anna Zelinska.

Author information

Drahomira Herrmannova

Present address: Oak Ridge National Laboratory, Oak Ridge, TN, USA

Authors and Affiliations

Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes, UK

Petr Knoth, Drahomira Herrmannova, Matteo Cancellieri, Lucas Anastasiou, Nancy Pontika, Samuel Pearce, Bikash Gyawali & David Pride


Contributions

P.K. is the Founder and Head of CORE. He conceived the idea and has been the project lead since the start in 2011. He researched and created the first version of CORE, acquired funding, built the team, and has been managing and leading all research and development. M.C., L.A., S.P. and P.K. designed, worked out all technical details, and implemented significant parts of the system including CHARS, the harvesting scheduler, and the OAI-PMH content harvesting method. All authors contributed to the maintenance, operation and improvements of the system. D.H. drafted the initial version of the manuscript based on consultations with P.K. D.P. and P.K. wrote the final manuscript with additional input from L.A. and N.P. D.H., M.C. and L.A. performed the data analysis for the paper and D.H. produced the figures. D.H., D.P., B.G. and L.A. participated in research activities and tasks related to CORE following the instructions and directly supervised by P.K.

Corresponding author

Correspondence to Petr Knoth.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Knoth, P., Herrmannova, D., Cancellieri, M. et al. CORE: A Global Aggregation Service for Open Access Papers. Sci Data 10, 366 (2023). https://doi.org/10.1038/s41597-023-02208-w

Received: 18 May 2021

Accepted: 03 May 2023

Published: 07 June 2023

DOI: https://doi.org/10.1038/s41597-023-02208-w

Open Access to Research Papers

Making scholarly research outputs openly available is easy, legal, and has demonstrable benefits to authors, making it a good first step for a researcher just beginning to explore the open world. Navigating the Open Access landscape requires some knowledge of copyright, article status, repositories, and economics. This module introduces key concepts and tools that can help a researcher make their work openly available and maximize the benefits to themselves and others.

Related items

  • Type of Resources: E-course or annotated syllabus, learning module
  • Category: Open access
  • Target audience: Researchers
  • Target audience: STEM students
  • Target audience: Library & information specialists
  • Target audience: Early career professionals
  • Natural Sciences

  • PMC4837983.1; 2016 Apr 11
  • PMC4837983.2; 2016 Jun 9
  • PMC4837983.3; 2016 Sep 21

The academic, economic and societal impacts of Open Access: an evidence-based review

Jonathan P. Tennant

1 Department of Earth Science and Engineering, Imperial College London, London, UK

François Waldner

2 Earth and Life Institute, Université catholique de Louvain, Louvain-la-Neuve, Belgium

Damien C. Jacques

Paola Masuzzo

3 Medical Biotechnology Center, VIB, Ghent, Belgium

4 Department of Biochemistry, Ghent University, Ghent, Belgium

Lauren B. Collister

5 University Library System, University of Pittsburgh, Pittsburgh, PA, USA

Chris H. J. Hartgerink

6 Department of Methodology and Statistics, Tilburg University, Tilburg, Netherlands

All authors contributed equally to the writing of this manuscript using the Overleaf collaborative writing platform.

Version Changes

Revised. Amendments from Version 2

The final version of this manuscript includes minor edits reflected in the last review by Peter Suber, as well as stylistic edits noted by other commenters. We hope that this paper will continue to be shared and discussed, and that it helps with future developments in Open Access.

Peer Review Summary

Review date | Reviewer name(s) | Version reviewed | Review status
 | Peter Suber | | Approved with Reservations
 | Gwilym Lockwood | | Approved
 | Peter Suber | | Approved with Reservations
 | Chris Chambers | | Approved
 | Chris Chambers | | Approved
 | Anne Tierney | | Approved
 | Paige Brown Jarreau | | Approved with Reservations
 | Peter Suber | | Approved
 | Gwilym Lockwood | | Approved

Ongoing debates surrounding Open Access to the scholarly literature are multifaceted and complicated by disparate and often polarised viewpoints from engaged stakeholders. At the current stage, Open Access has become such a global issue that it is critical for all involved in scholarly publishing, including policymakers, publishers, research funders, governments, learned societies, librarians, and academic communities, to be well-informed on the history, benefits, and pitfalls of Open Access. In spite of this, there is a general lack of consensus regarding the potential pros and cons of Open Access at multiple levels. This review aims to be a resource for current knowledge on the impacts of Open Access by synthesizing important research in three major areas: academic, economic and societal. While there is clearly much scope for additional research, several key trends are identified, including a broad citation advantage for researchers who publish openly, as well as additional benefits to the non-academic dissemination of their work. The economic impact of Open Access is less well-understood, although it is clear that access to the research literature is key for innovative enterprises, and a range of governmental and non-governmental services. Furthermore, Open Access has the potential to save both publishers and research funders considerable amounts of financial resources, and can provide some economic benefits to traditionally subscription-based journals. The societal impact of Open Access is strong, in particular for advancing citizen science initiatives, and leveling the playing field for researchers in developing countries. Open Access supersedes all potential alternative modes of access to the scholarly literature through enabling unrestricted re-use, and long-term stability independent of financial constraints of traditional publishers that impede knowledge sharing. 
However, Open Access has the potential to become unsustainable for research communities if high-cost options are allowed to continue to prevail in a widely unregulated scholarly publishing market. Open Access remains only one of the multiple challenges that the scholarly publishing system is currently facing. Yet, it provides one foundation for increasing engagement with researchers regarding ethical standards of publishing and the broader implications of 'Open Research'.

Introduction

Open Access (OA) refers to the removal of major obstacles to accessing, sharing and re-using the outputs of scholarly research. The rationale is that the research process is facilitated by ensuring rapid and widespread access to research findings such that all communities have the opportunity to build upon them and participate in scholarly conversations. As such, the major drivers behind OA relate to within- and between-community equality (Veletsianos & Kimmons, 2012), as well as bridging the global North-South research divide (Adcock & Fottrell, 2008). Reflecting this ambition, there are currently over 700 OA policies and mandates recorded worldwide from a range of research institutes and funding bodies (roarmap.eprints.org). OA pertains to documents made available via two main pathways: the Gold route and the Green route (Harnad et al., 2008). The Gold route refers to research articles that are freely accessible at the point of publication. This route is often, although not always, accompanied by article processing charges (APCs). The Green route refers to author self-archiving, in which peer-reviewed articles and/or non-peer-reviewed pre-prints are posted online to an institutional and/or subject repository, or to a personal website. This route is often dependent on journal or publisher policies on self-archiving (sherpa.ac.uk/romeo). Some publishers require an embargo period before deposition in public repositories is allowed. These embargoes are applied in order to avoid putative reductions in subscription income due to such self-archiving, although there is little evidence that such reductions occur (Berners-Lee et al., 2005; Bernius et al., 2013; Henneken et al., 2006; Houghton & Oppenheim, 2010; Swan & Brown, 2005). The Green route is also enabled through author rights retention, in which authors pre-emptively grant non-exclusive rights to their institutions before publishing any works. The institution then has the ability to make articles by these authors OA without seeking permission from the publishers (e.g., this is the case of the Dutch Taverne amendment that has declared self-archival of research after ‘a reasonable period of time’ a legal right (Open Access NL, 2015)). Through these dual pathways, almost 25% of all scholarly documents archived on the Web are now obtainable via OA somewhere on the Internet (Khabsa & Giles, 2014).

A core issue remains: universal or even marginal access to approximately 75% of articles is not directly possible unless one either is in the privileged position of working at an institute that has subscription access to these articles, or has enough money to pay on a per-article basis (provided that the journal offers this option; some do not). Subscriptions to all peer-reviewed journals are not affordable for any single individual, research institute or university (Odlyzko, 2006; Suber, 2012). Consequently, the potential impact of research articles is never fully realized, impeding scientific progress by a lack of use, while simultaneously negatively affecting the recognition of individual researchers (Hitchcock, 2013) and the funders who support their work.

Because of these issues, free and unrestricted access to primary research literature has become a global goal of the OA movement. The steady increase in OA over the past two decades has required careful negotiations between a range of stakeholders (e.g., librarians, funders, academics). Much of the driving force behind this global change has been through a combination of direct, grassroots advocacy initiatives and policy reforms from universities, funders and governments. The debates regarding the benefits of OA over subscription-based access often hinge on the increased value to academics. However, increased access has broader benefits to research through enhanced visibility, facilitating innovation by businesses and decreasing financial pressure on academic/research libraries (known more broadly as the ‘serials crisis’ (McGuigan & Russel, 2008)). Additionally, increased access to scholarly outputs might help foster a culture of greater scientific education and literacy, which in turn could have a direct impact on public policy (European Commission, 2012; Zuccala, 2010), particularly in domains such as climate change and global health, as well as increasing public engagement in scientific research (Stodden, 2010). OA also includes a moral aspect, where access to scientific knowledge and information is regarded as a fundamental feature of global human equality. For example, Article 27 of the United Nations Universal Declaration of Human Rights states that "Everyone has the right freely to participate in the cultural life of the community, to enjoy the arts and to share in scientific advancement and its benefits." (United Nations, 1948).

This review aims to provide information on the various impacts of OA on scholarly research. We consider the impact of OA from the academic, economic, and societal perspectives. In addition, we briefly consider the broader implications of OA for Open Data, a closely related aspect united under a general theme of Open Research or Open Science. By aggregating evidence from a range of primary sources, this review should be useful to those broadly interested in the impact of open scholarly research, as well as policymakers and others involved in implementing OA policies and strategies. We refrain from making predictions about the future of OA publishing or policy recommendations, as these are both beyond the scope of this work.

A brief history of Open Access

The OA movement is intrinsically tied to the development of the Internet and how it redefined communication and publishing (Laakso et al., 2011). With increased availability of Internet bandwidth, print articles have become virtually redundant, and sharing of information has never been cheaper. As a consequence, the cost per research article should have decreased, since material resources no longer need to be invested in printing and distributing publications. Widespread dissatisfaction with the expensive traditional publishing model has therefore increased, resulting in the OA movement and concomitant innovations in scholarly publishing. A comprehensive timeline of the OA movement is provided as part of the Open Access Directory (oad.simmons.edu/oadwiki/Timeline).

Interest in using the Internet for facilitating access to scientific research coalesced throughout the 1990s, culminating with the 2001 conference on "Free Online Scholarship" by the Open Society Institute in Budapest. The result of this conference was the release of the Budapest Open Access Initiative (BOAI), which is recognized as one of the defining points of the OA movement. The BOAI was the first initiative to use the term "Open Access" and articulated the following definition:

  • By "open access" to [peer-reviewed research literature], we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.

This definition is broadly equivalent to the Creative Commons Attribution license (CC-BY), which is widely considered to be a standard for OA ( creativecommons.org/licenses/ ). One result of the growing OA movement is the rise of OA-only publishers, who publish exclusively digital content and have demonstrated that such a business model is financially feasible (but does not necessarily sustain the current journal ecosystem). Some of these publishers are for-profit and some are non-profit. For example, pioneer OA publishers BioMed Central (for-profit) and the Public Library of Science (PLOS) (non-profit) were founded in the early 2000s and remain successful OA publishing businesses to date. More recently, OA publishing has gained increasing momentum among researchers, funders, and governments. This has led to a proliferation of innovative approaches to publishing (e.g., PeerJ , peerj.com ; F1000Research , f1000research.com ; Open Library of Humanities , openlibhums.org ) and a range of different policies from research funders and institutes mandating OA. All of these different policies and new business models, combined with traditional publishers launching their own OA titles and programs, have made the overall OA ecosystem quite complex.

Even with this growing prevalence of publishers that facilitate OA to the scholarly literature, OA is still hardly ubiquitous. Björk et al. (2009) estimated that the total number of published articles in 2006 was approximately 1,350,000. Of these, 4.6% became immediately accessible and an additional 3.5% became accessible after an embargo period of typically one year. Furthermore, usable copies of 11.3% could be found in repositories or on the authors’ home pages. Since the U.S. National Institutes of Health (NIH) mandated archival of articles in the public PubMed Central repository in 2008, the cumulative number of OA articles in PMC has increased more than the number of non-OA articles (see Figure 1). In 2013, the total percentage of OA articles available was estimated at 24% of English-language scholarly documents accessible on the Web (Khabsa & Giles, 2014).
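To make the 2006 estimates easier to compare, the percentages quoted above can be converted into approximate article counts. The following is a minimal sketch rather than part of the original analysis: the variable and function names are illustrative, and adding the three shares together assumes that the three categories do not overlap.

```python
# Figures quoted in the text for articles published in 2006.
TOTAL_ARTICLES_2006 = 1_350_000

# Share of articles (in percent) by access route, as quoted in the text.
SHARES_PERCENT = {
    "immediate": 4.6,
    "after_embargo": 3.5,
    "repository_or_home_page": 11.3,
}


def implied_counts(total, shares_percent):
    """Convert percentage shares into approximate article counts."""
    return {route: round(total * pct / 100) for route, pct in shares_percent.items()}


counts = implied_counts(TOTAL_ARTICLES_2006, SHARES_PERCENT)

# Assumption: the three categories are disjoint, so their shares can be added.
combined_share = round(sum(SHARES_PERCENT.values()), 1)
combined_count = sum(counts.values())

for route, count in counts.items():
    print(f"{route}: about {count:,} articles")
print(f"combined: {combined_share}% (about {combined_count:,} articles)")
```

Under that assumption, roughly one in five of the articles published in 2006 was openly accessible by one route or another.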

Figure 1. Since 2004, the growth rate of OA articles is significantly higher than that of non-OA articles.

Although these estimates show OA is on the rise, the full potential of OA is far from achieved. Björk et al. (2014) pointed out that 62% of journals (from the top 100 journal publishers indexed by SCOPUS) endorse immediate Green OA self-archiving by their authors, 4% impose a 6-month embargo, and 13% impose a 12-month embargo. As such, 79% of articles published in any recent year could already be OA within 12 months after publication via Green OA, 62% of them immediately if authors were actually self-archiving properly ( Gargouri et al. , 2010 ; Gargouri et al. , 2012 ). The disconnect between practice and what is allowed has three potential explanations: (i) researchers are unsure whether they have the legal right to self-archive, (ii) they fear that it might put their article’s acceptance for publication at risk, and (iii) they believe that self-archiving may be a lot of work ( Harnad, 2006 ). Research funders and institutions worldwide are now beginning to realize that they need to alter their conditions to make OA mandatory ( Vincent-Lamarre et al. , 2016 ) in order to counteract these misconceptions of self-archiving ( Carr et al. , 2007 ; Swan & Brown, 2005 ). Swan & Brown (2005) have indicated that the vast majority of researchers (81%) would comply with mandatory OA if it were a condition of funding. On the other hand, it is worth mentioning that ensuring compliance with OA policies set by research institutions is rather difficult. Some tools, such as the Open Access Monitor ( http://symplectic.co.uk/elements-updates/introducing-open-access-monitor ), help institutions to track compliance with their OA policy.
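The 79% figure above is the sum of the three embargo categories. A minimal sketch of that arithmetic follows; the names are illustrative, and it assumes, as the text implicitly does, that the share of journals permitting self-archiving is a reasonable proxy for the share of articles.

```python
# (embargo length in months, percentage of journals), as quoted in the text.
EMBARGO_SHARES = [
    (0, 62),   # immediate Green OA self-archiving endorsed
    (6, 4),    # 6-month embargo
    (12, 13),  # 12-month embargo
]


def share_available_within(months, embargo_shares):
    """Percentage that could be self-archived within `months` of publication."""
    return sum(share for embargo, share in embargo_shares if embargo <= months)


for horizon in (0, 6, 12):
    pct = share_available_within(horizon, EMBARGO_SHARES)
    print(f"within {horizon} months: {pct}%")
```

The remaining 21% corresponds to journals that, on these figures, do not permit self-archiving within 12 months.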

Table 1 shows a non-exhaustive summary of the developments in the advancement of scholarly publishing and the OA movement. Included are the founding of major institutions in the movement as well as policy and legal developments. Several controversial moments are included, because they have spurred action or generated awareness for the movement. One of them is the suicide of Aaron Swartz, who was arrested for downloading JSTOR articles on the grounds that he allegedly intended to make these publicly available. Another ongoing controversy is scholarly piracy; this includes the Sci-Hub and LibGen projects, which have created an online repository of pirated scholarly papers (around 50 million at the time of writing). Both projects gained increased attention after becoming the target of a lawsuit by the publisher Elsevier. Responses to these kinds of activities have been mixed and polarised: some hold that illegal acts set back or weaken the case for OA, while others hail the development as the ‘Napster moment’ (i.e., a change-inducing disruption; Rosenwald, 2016) for the OA movement, which will force the established industry to change. Regardless of its legality, Sci-Hub is used by a large number of people from all over the world to access research articles (Bohannon, 2016; Elbakyan & Bohannon, 2016).

Year | Milestone
1454 | Invention of […]
1665 | January 5: First issue of The […] (later spelled […]), the earliest academic journal published in Europe and established by Denis de Sallo.
1807 | 25-year-old […] opens a small printing shop at 6 Reade Street in lower Manhattan.
1842 | May 10: […] founded what is now Springer Science+Business Media in Berlin.
1848 | […] (son of Charles Wiley) gradually started shifting his focus away from literature toward scientific, technical, medical, and other types of nonfiction publishing.
1880 | Foundation of […].
1936 | First scientific book published by […].
1990 | First […].
1991 | An online repository of electronic preprints, known as e-prints, of scientific papers is founded in Los Alamos by the American physicist Paul Ginsparg. It was renamed to […] in 1999. The total number of submissions by May 11th, 2016 (after 24.8 years) is 1,143,129 ([…]).
1993 | Creation of the Open Society Institute (renamed to the Open Society Foundations [OSF] since 2001) by the progressive liberal business magnate George Soros. The OSF financially supports civil society groups around the world, with a stated aim of advancing justice, education, public health and independent media.
1997 | Launch of SciELO in Brazil. There are currently 14 countries in the SciELO network and its journal collections: Argentina, Bolivia, Brazil, Chile, Colombia, Costa Rica, Cuba, Mexico, Peru, Portugal, South Africa, Spain, Uruguay, and Venezuela.
1998 | […] (PKP) is founded by John Willinsky in the Faculty of Education at UBC, with Pacific Press Professorship endowment, dedicated to improving the scholarly and public quality of research. PKP has created the […] (2000), […] (2001), […] (2002) and the […] (2013).
2000 | BioMed Central, the self-described first and largest OA science publisher, and PubMed Central, a free digital repository for biomedical and life sciences journals, are founded. In 2008, Springer announces the acquisition of BioMed Central, making it, in effect, the world’s largest open access publisher.
2001 | An online petition calling for all scientists to pledge that from September 2001 they would discontinue submission of papers to journals which did not make the full-text of their papers available to all, free and unfettered, either immediately or after a delay of several months is released. The petition collected 34,000 signatures but publishers made no strong response to the demands. Shortly thereafter, the Public Library of Science (PLOS) was founded as an alternative to traditional publishing. […] is currently the world’s largest journal by number of papers published (about 30,000 a year in 2015).
2001 | December 1–2: […] by the Open Society Institute to promote open access – at the time also known as Free Online Scholarship. This is where the Budapest Open Access Initiative (BOAI) was born.
2002 | February 14th: Release of the Budapest Open Access Initiative (BOAI), a public statement of principles relating to OA to the research literature. This small gathering of individuals is recognised as one of the major defining events of the OA movement. On the occasion of the 10th anniversary of the initiative, it was reaffirmed in 2012 and supplemented with a set of concrete recommendations for achieving "the new goal that within the next ten years, Open Access will become the default method for distributing new peer-reviewed research in every field and country."
2002 | Start of the Research in Health - […] programme of the World Health Organization and major publishers to enable developing countries to access collections of biomedical and health literature online at reduced subscription costs. Together with Research in Agriculture - […], Research in the Environment - […] and Research for Development and Innovation - […] programmes, it currently forms […] that provides developing countries with free or low cost access to academic and professional peer-reviewed content online.
2008 | The National Institutes of Health (NIH) Public Access Policy, an OA mandate requiring that research papers resulting from NIH funding must be freely and publicly available through PubMed Central within 12 months of publication, is officially recorded.
2009 | The […] (Bill H.R 801 IH, also known as the "Conyers Bill") is submitted as a direct response to the National Institutes of Health (NIH) Public Access Policy; intending to reverse it. The bill’s alternate name relates it to U.S Representative John Conyers (D-MI), who introduced it at the 111th United States Congress on February 3, 2009.
2011 | Arrest of Aaron Swartz after he systematically downloaded articles from JSTOR, for alleged copyright infringement.
2011 | In reaction to the high cost of research papers behind paywalls, Sci-Hub, the first known website to provide automatic and free, but illegal, access to paywalled academic papers on a massive scale, is founded by Alexandra Elbakyan from Kazakhstan.
2012 | Start of the […], a trend wherein academics and researchers began to oppose restrictive copyright in traditional academic journals and to promote free online access to scholarly articles.
2012 | Start of the […] campaign which specifically targeted Elsevier. It was initiated by a group of prominent mathematicians who each made a commitment to not participate in publishing in Elsevier’s journals, and currently has over 15,933 co-signatories.
2012 | Start of the United States-based […] campaign in which open access advocates (Michael W. Carroll, Heather Joseph, Mike Rossner, and John Wilbanks) appealed to the United States government to require that taxpayer-funded research be made available to the public under open licensing. This campaign was widely successful, and the directive and FASTR (the Fair Access to Science and Technology Research Act) have become defining pieces in the progress of OA in the USA at the federal level.
2012 | Launch of […], an OA journal that charges publication fees through researcher memberships, not on a per-article basis, resulting in what has been called "a flat fee for ’all you can publish’". Note that as of October 2015 […] also have a flat rate APC of $695.
2013 | January: The suicide of Aaron Swartz draws new international attention for the Open Access movement.
2013 | November: […] for students and early career researchers, which brought together more than 70 participants from 35 countries to engage on Open Access to scientific and scholarly research.
2014 | First […] in Washington DC, an annual conference for students and early career researchers on Open Access, Open Data, and Open Educational resources.
2014 | Open Access is embedded in the European Commission’s Research and Innovation programme.
2015 | Academic publisher Elsevier makes a complaint in New York City for copyright infringement by Sci-Hub. Sci-Hub is found guilty and ordered to shut down. The website re-emerges under a different domain name as a consequence. A second hearing in March 2016 is delayed due to failure of the defendant to appear in court, and to gather more evidence for the prosecution.

The effect of Open Access upon academia

The two main ways in which OA affects academia are (i) through association with a higher documented impact of scholarly articles, as a result of availability and re-use, and (ii) through the possibility of non-restrictively allowing researchers to use automated tools to mine the scholarly literature. For the former, major arguments in favor of OA include the evidence that work that is openly available generates more academic citations, but also has more societal impact. In addition, appropriately-licensed OA works play a major role in academic education, including re-use in classes and for dissertations. The latter major argument involves non-restrictive access to the scholarly literature through appropriate licensing, making it possible to use automated tools to collect and analyze the entire body of scholarly literature in a legally sound framework and irrespective of copyright laws. The following sections cover these two effects of OA.

The potential impact advantage

Academic impact. Academic impact is frequently measured through citation counts, and these remain fundamental as the ‘currency units’ for researchers, research groups, institutes and universities. Lawrence (2001) was the first to propose that OA would have a citation advantage. The utility and consistency of the citation advantage across different research fields has been intensively debated because its magnitude substantially varies depending on the discipline ( Table 2 ). However, the general tendency identified by studies to date indicates that there is at least some association between OA publishing and increased citation counts across most disciplines ( Hajjem et al. , 2006 ; Antelman, 2004 ) ( Figure 2 and Table 2 ). A comprehensive and annotated bibliography of studies documenting potential citation impacts was created by Steve Hitchcock ( eprints.soton.ac.uk/354006/1/oacitation-biblio-snapshot0613.html ) and has been managed by SPARC Europe since 2013 ( sparceurope.org/oaca/ ).

Figure 2. The majority of studies concluded that there is a significant citation advantage for Open Access articles. Source: Data from The Open Access Citation Advantage Service, SPARC Europe, accessed March 2016.

Reference | Discipline | Citation advantage | Origin
 | Mathematics, Electrical Engineering, Political Science, Philosophy | +91%, +51%, +86%, +45% per discipline respectively | NA
 | Political Science | Statistically significant citation advantage | NA
 | Medicine, Biology, Agricultural Sciences, Chemistry and University Journals | +200% | NA
 | Mathematics | +35% | Quality advantage, no evidence of early advantage
(2008) | Physiology | -5% | NA
 | Sciences, Social Sciences, and Humanities | +1% but statistically indistinguishable | No evidence of an early advantage
 | All | +8% for newly published articles; +16% for citations from developing countries | NA
 | Natural Sciences | +210 up to +290% | NA
 | Biology, Mathematics, Pharmacy and Pharmacology | No clear tendency towards an increase in impact | NA
(2010) | Engineering, Biology, Biomedicine, Chemistry, Psychology, Mathematics, Clinical Medicine, Health, Physics, Social Science, Earth Sciences | +?% to ?% depending on the discipline | Quality advantage is confirmed; no evidence for selection bias
 | Biology | No evidence of citation advantage | NA
(2010) | High Energy Physics | +200% | Early advantage confirmed
(2006) | Biology, Psychology, Sociology, Health, Political Science, Economics, Education, Law, Business, Management | +36% to 172% | NA
 | Physics | +250% to 580% | NA
(2006) | Astronomy and Physics | +200% | NA
 | Agricultural Science | +621% but not to every journal | NA
(2005) | Astronomy | None | Selection bias and early advantage
 | Astronomy | +200% | Early advantage confirmed
 | Ophthalmology | No | NA
 | Computer Science | +157% up to +284% for top publication | NA
 | Ecology, Botany, Multidisciplinary Science and Biology | +8% | NA
 | Natural Sciences | 0 to +50% in 2003 depending on field, negative citation advantage in 2000 | NA
 | Astronomy | +200% | NA
 | Solar Physics | +170% and +260% depending on the online repository | No evidence for selection bias
 | Condensed Matter Physics | NA | Confirm early access advantage and selection bias but no OA effect
(2008) | Ecology, Applied Mathematics, Sociology and Economics | +157% | NA
(2005) | Medicine | +300% up to +450% | NA
 | Astronomy | +200% | Early advantage
 | Environmental Science | Not significant | NA
(2015) | All | +111% up to 152% | NA
 | Economics | +35% up to 64% depending on the database used | NA
(2011) | Humanities, Life Sciences, Mathematics & Physical Science, Medicine, Social Sciences | -49.24% to +87.73% | NA
 | Communication Studies | +200% | NA

Estimates for the open citation advantage range from +36% (Biology) to +600% (Agricultural Sciences) (Swan, 2010; Wagner, 2010). In a longitudinal study, Eysenbach (2006) compared the bibliometric impact of a cohort of articles from a multi-disciplinary journal (Proceedings of the National Academy of Sciences) that offers both OA and non-OA publishing options. After adjusting for potentially confounding variables, the results indicated that non-OA papers were twice as likely to remain uncited six months after publication when compared to OA articles. Additionally, the average number of citations for OA articles was more than double that of the non-OA articles. The study also differentiated the type of OA article, namely the self-archived (i.e., Green OA) and the publisher version of record (VOR) that is freely available (i.e., Gold OA). Gold OA was found to have a higher overall academic impact than Green OA.
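Percentage advantages of this kind can be easy to misread: an advantage of +200% means three times as many citations, not twice as many. The sketch below illustrates the conversion under the assumption that a citation advantage is expressed as the relative difference in mean citation counts between OA and non-OA articles; individual studies may define it differently, and the function names and example values are hypothetical.

```python
def citation_advantage_percent(oa_mean, non_oa_mean):
    """Relative difference in mean citations, in percent (assumed definition)."""
    if non_oa_mean <= 0:
        raise ValueError("non_oa_mean must be positive")
    return (oa_mean - non_oa_mean) / non_oa_mean * 100


def citation_ratio(advantage_percent):
    """Convert a percentage advantage into an OA / non-OA citation ratio."""
    return 1 + advantage_percent / 100


# Hypothetical means: 3.0 citations per OA article vs 1.0 per non-OA article.
print(citation_advantage_percent(3.0, 1.0))  # advantage in percent
print(citation_ratio(200))                   # ratio implied by +200%
print(citation_ratio(36))                    # ratio implied by +36%
```

On this reading, "more than double" the average number of citations corresponds to an advantage above +100%.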

Despite strong evidence for a citation advantage, the magnitude of this advantage remains variable. The substantial heterogeneity in observed citation advantages can be due to different academic cultures or could simply be spurious. For example, self-archiving prior to publication is a community standard in fields such as high energy physics or mathematics, but has yet to be widely adopted among the life sciences. Such ‘pre-prints’ have also been associated with an overall increase in the average number of citations, the total number of citations, and the speed of citation accumulation ( Aman, 2014 ; Gentil-Beccot et al. , 2010 ). Other studies could only replicate immense citation advantages (+600%) if relevant predictors were omitted ( McCabe & Snyder, 2014 ), which indicates a potential spurious effect. When taking into account these relevant predictors, the citation advantage became much smaller (+8%). When the citation advantage is low or non-existent, this could suggest that in those research fields there is a sufficient level of access to the literature such that OA confers no localised access advantage, or that adoption of OA has not yet reached a level where any such advantage has become statistically evident.

One alternative explanation for the existence of citation advantages could be that researchers choose to publish OA when a finding is more impactful, but empirical evidence contradicts this selection effect. Gargouri et al. (2010) compared citation counts for articles which were self-selected as OA or mandated as OA (e.g., by funders). The study concluded that both were cited significantly more than non-OA articles and showed no differences in citation rates. As such, these findings rule out a selection bias from authors as the cause for the citation advantage ( Gargouri et al. , 2010 ). However, research that is selected to merit funding by funding agencies may, in itself, be perceived to be more impactful than research that is not funded. Additionally, as no single OA mandate is ever 100% effective, it might be the simple case that authors are more likely to comply with a mandate for the research they perceive to be of higher impact. In a study of articles in the field of psychology, Anderson (2013a) found that publications with funding sources reported in the text were more highly cited and connected to other highly-cited publications (this type of publication is called "generative" in the study) than publications with no reported funding sources. Furthermore, research that was privately funded was found to be more generative than publicly funded research. In a similar study in the Library and Information Sciences field, Zhao (2010) found that the citation counts for grant-funded publications were "substantially higher" than those for publications without grant funding. Although these studies indicate that grant funding is correlated with increased citation rates, the openness of articles was not addressed in either study. Future research will be required to demarcate the potential causality and to determine the conditions under which we could see whether or not OA has an effect on citation counts. For example, this could be conducted through a randomised controlled trial in which research articles from a particular funder are randomly assigned to OA and non-OA routes, with the citation counts assessed after a certain time.

In sum, evidence indicates that OA is broadly related to increased academic impact in terms of citations ( Figure 2 ; see also McKiernan et al. (2016) ), but given the large variability in results, further research should aim to synthesize these findings in a meta-analysis and try to explain the cause of this variability.

Broader societal impact. Scholarly articles also have a societal impact, such as when they are covered in news media or are discussed in social media channels; alternative metrics, or altmetrics, can be used as a guide to measure this mode of impact ( Liang et al. , 2014 ). Information such as social media usage, Mendeley readership, and media attention ( Piwowar, 2013 ) can be tracked by various altmetrics providers (e.g. ImpactStory, Plum Analytics, and Altmetric.com). As such, when an article generates discussions outside of the academic literature, altmetrics are capable of tracking this. Despite limitations (such as academics discussing their own research on platforms like Twitter), altmetrics provide a general view of the wider societal impact of research articles. Considering the increased pressure on researchers and research institutes to communicate research findings to the public, altmetrics can provide additional insight into which research drives public interest. A working group established by NISO is investigating the future role of altmetrics in research communication and assessment ( www.niso.org/topics/tl/altmetrics_initiative/ ).

OA articles would be expected to have an altmetrics advantage compared to the non-OA literature; if an article has fewer restrictions for journalists, citizens, businesses, and policy-makers, it seems logical that this would enable the research to be publicly re-used. Furthermore, those parties may be more likely to promote articles which are publicly accessible into different communication channels. In other words, increased access removes barriers to widespread societal engagement, whereas a relative lack of article access discourages engagement.

There is research showing evidence for an altmetrics advantage for OA articles, but this does not reflect itself in the most impactful articles. Wang et al. (2015) found evidence that OA articles receive more attention through social media. The authors compared social media attention (Twitter and Facebook) between OA and non-OA articles at Nature Communications and found that OA articles get 1.2–1.48 times as much social media attention as compared to non-OA articles (see also Adie, 2014 ). Nonetheless, of the top 100 articles of 2015 as presented by Altmetric.com, only 42 articles were OA ( www.altmetric.com/top100/2015/ ). This 42% is larger than the overall proportion of OA articles in the literature, which indicates that OA contributes relatively more impact per paper. However, it also indicates that the open impact advantage can be overshadowed by the intrinsic nature of the research published or by the traditionally prestigious journals with a larger and dedicated media apparatus (e.g., Nature, Science; Brembs et al. , 2013 ).

Allen et al. (2013) found that a social media announcement of the release of a research article increases the number of users who view or download that article, but does not translate to increases in the citation count in the field of clinical pain research. Costas et al. (2015) found a relatively weak correlation between social media activity and citation counts for the articles in their sample (over 1.5 million article records), while Mohammadi et al. (2015) found that the number of Mendeley readers with a status of graduate student or faculty correlated with citation counts. When OA to the articles is factored into an analysis, there is a potential recursive relation between citation counts and altmetrics due to OA. Eysenbach (2011) indicated that there is a moderate correlation (0.42–0.72) between the tweets and citations of articles from an OA journal ( Journal of Medical Internet Research ). Highly tweeted articles were eleven times more likely to be highly cited than less-tweeted articles, or vice versa (75% of highly tweeted articles were highly cited; 7% of less-tweeted articles were highly cited). However, it is difficult to assess causality in these cases: do research papers that have more academic impact make their way more frequently into societal discussions, or does online discourse increase their potential citation rates? Overall, this evidence implies that there is a general media advantage with OA (see also McKiernan et al. (2016) ), which can be used as a proxy or pathway to indicate greater societal impact.
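
The "eleven times" figure follows directly from the two proportions quoted for Eysenbach (2011). A minimal sketch of that arithmetic is given here; the variable and function names are chosen for illustration and do not come from the original study.

```python
# Proportions quoted in the text for Eysenbach (2011).
p_highly_tweeted = 0.75  # share of highly tweeted articles that were highly cited
p_less_tweeted = 0.07    # share of less-tweeted articles that were highly cited


def relative_likelihood(p_exposed: float, p_unexposed: float) -> float:
    """Ratio of two proportions (a simple risk ratio)."""
    if p_unexposed <= 0:
        raise ValueError("p_unexposed must be positive")
    return p_exposed / p_unexposed


ratio = relative_likelihood(p_highly_tweeted, p_less_tweeted)
print(f"Highly tweeted articles were about {ratio:.1f} times as likely to be highly cited")
```

The ratio of the rounded percentages is slightly below eleven, so the published figure is consistent with these proportions once rounding is taken into account. As the surrounding text stresses, a ratio of this kind describes association only and says nothing about the direction of causality.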

Altmetrics themselves should not be conflated with citations when it comes to assessing impact, even though some providers such as Altmetric.com supply a single score that can be used to rank an article in a similar way to a journal’s Impact Factor. Each measure of altmetrics tells a different story about the impact of research, and a careful understanding of the altmetrics landscape in conjunction with citation-based metrics can lead to a clearer picture of societal impact of scientific research.

Open Access and text- and data-mining

Traditionally, in order to publish a paper, researchers hand over their copyright via a Copyright Transfer Agreement. Copyright transfer as the default has far-reaching consequences on the ability of both the authors and others to re-use that published research, and many authors are not aware of the impact of these transfers on their ownership of the work. Academics frequently give the copyright to the publishers in exchange for the perceived prestige of publishing in one of their venues (e.g., Müller-Langer & Watt, 2010 ). In some cases, institutes adopt rights-retention OA policies under which authors grant non-exclusive rights to their institutes before signing copyright agreements with publishers, which enables articles to be made OA without requiring permission from publishers ( cyber.law.harvard.edu/hoap/Good_practices_for_university_open-access_policies ). Essentially, copyright is a pre-digital tool wielded by traditional publishers to maintain revenues rather than to foster creativity and innovation or to protect authors ( Okerson, 1991 ; Willinsky, 2002 ). For example, the Authors Guild sued Google Books for copyright infringement because they provided freely available digital copies; the court rejected this suit in 2016, stating that Google Books served the public interest and that copyright’s "primary intended beneficiary is the public" ( EFF, 2015 ). In the digital age, copying is essential to perform necessary research tasks. These activities range from viewing the article (i.e., downloading requires copying) to re-using figures from an article in a book. The interaction of OA and copyright is complex and deserves extended research in itself (e.g., Scheufen, 2015 ). We will highlight how OA views copyright and relate this to its effects on text- and data-mining (TDM).

The majority of ‘born OA’ journals and publishers do not request or receive copyright from authors. Instead, publishers are granted non-exclusive rights to publish, and copyright is retained by authors through a Creative Commons license (typically CC-BY). Importantly, this represents a power shift from publisher-owned to author-owned rights to research. This model of author-retained copyright appears to be favoured by the majority (71%) of the research community ( Hoorn & van der Graaf, 2006 ). Shifting copyright to stay with the author, combined with appropriate open licensing, allows for wider re-use, including TDM, and forms the basis for a robust scholarly ecosystem.

As such, copyright in OA publications is non-restrictive and also allows machines to freely access it. In traditional publishing, human reading and computer reading are seen as two separate things which require different agreements, whereas OA publishing views them both in the same, non-restrictive manner. In other words, in order to mine OA journals, one only needs the technical skills to do so. In order to mine closed access journals, one needs to sign or negotiate access conditions, even if legitimate access to the articles has already been bought ( Bloudoff-Indelicato, 2015 ).

Automated extraction of information from scholarly research via TDM is a methodology that can be applied to investigate the scholarly literature at an enormous scale, creating new knowledge by combining individual findings. This has already proven to be useful for a large variety of applications (e.g., Glenisson et al. , 2005 ; Martone et al. , 2016 ; Swanson, 1987 ). Moreover, OA publishers facilitate TDM on a massive scale by allowing multiple options for collecting the literature needed. For example, PLOS is non-restrictive and allows users to scrape articles directly from the website or using its API. As a result, scraping tools can be used, such as rplos , an R package developed to search and download full-text scholarly papers ( Chamberlain et al. , 2016 ).
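
To illustrate what collecting OA literature through an API can look like in practice, the following is a minimal Python sketch rather than a description of rplos itself. It assumes that the PLOS Search API is reachable at `api.plos.org/search` and accepts Solr-style parameters (`q`, `fl`, `rows`, `wt`) with `id` and `title` fields; these details, and all function names, are assumptions that should be checked against the current PLOS API documentation.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the PLOS Search API (Solr-style query interface).
PLOS_SEARCH_URL = "https://api.plos.org/search"


def build_query_url(term: str, rows: int = 10) -> str:
    """Build a search URL asking for the DOI and title of matching articles."""
    params = {"q": f'title:"{term}"', "fl": "id,title", "rows": rows, "wt": "json"}
    return f"{PLOS_SEARCH_URL}?{urlencode(params)}"


def extract_dois(payload: dict) -> List[str]:
    """Pull article identifiers out of a Solr-style JSON response."""
    docs = payload.get("response", {}).get("docs", [])
    return [doc["id"] for doc in docs if "id" in doc]


def search_plos(term: str, rows: int = 10) -> List[str]:
    """Run the search over the network and return the matching identifiers."""
    with urlopen(build_query_url(term, rows), timeout=30) as response:
        return extract_dois(json.load(response))
```

Calling `search_plos("text mining", rows=5)` would issue a single request and return up to five identifiers. Any real harvesting job should also respect the rate limits and terms of use published by the provider.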

TDM is not only a knowledge-generation tool; it also allows for automated screening for errors and automated literature searches that renew scientific discovery. With TDM it becomes possible to easily compare one’s results with those of the published literature, identify convergence of evidence and enable knowledge discovery ( Natarajan et al. , 2006 ) or discover frequent tentative hypotheses that can be used for new research ( Malhotra et al. , 2013 ). It has already been used to make major advances in fields such as biomedicine ( Gonzalez et al. , 2016 ). TDM also allows for computer applications that can download all scholarly literature given certain search terms (e.g., ContentMine’s getpapers tool ; github.com/ContentMine/getpapers ), simplifying and shortening the tedious literature search. TDM can also serve a screening purpose similar to plagiarism scanners, helping to detect statistical errors in the scholarly literature (e.g., Nuijten et al. (2015) ). TDM can be used in various innovative ways and is an emerging and rapidly advancing field; non-restrictive licensing through OA certainly promotes its wider application.
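
The error-screening use case can be sketched in a few lines. The example below is a deliberately simplified illustration in the spirit of automated statistical checking, not the method of Nuijten et al. (2015): it handles only z-tests, assumes results are reported in the form "z = 2.20, p = .03", and uses an arbitrary tolerance. The pattern, function names and sample sentence are all hypothetical.

```python
import re
from statistics import NormalDist

# Matches reports such as "z = 2.20, p = .03" (a simplified, assumed format).
Z_REPORT = re.compile(r"z\s*=\s*(-?\d*\.?\d+)\s*,\s*p\s*=\s*(\d*\.\d+)", re.IGNORECASE)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z statistic under the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def check_reported_p_values(text: str, tolerance: float = 0.005):
    """Return a list of (z, reported_p, recomputed_p, consistent) tuples."""
    results = []
    for match in Z_REPORT.finditer(text):
        z, reported = float(match.group(1)), float(match.group(2))
        recomputed = two_sided_p(z)
        consistent = abs(recomputed - reported) <= tolerance
        results.append((z, reported, recomputed, consistent))
    return results


# Hypothetical sentence containing one consistent and one inconsistent report.
sample = "Group A outperformed group B, z = 2.20, p = .03; the follow-up showed z = 1.10, p = .01."
for z, reported, recomputed, ok in check_reported_p_values(sample):
    print(f"z={z:.2f} reported p={reported:.3f} recomputed p={recomputed:.3f} consistent={ok}")
```

Run over a full-text corpus, a check of this kind flags candidate inconsistencies for human follow-up. It depends on the full text being machine-accessible, which is where permissive OA licensing helps.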

Given the exponential increase in the number of scholarly publications, (semi-)automated methods to synthesize results have become increasingly important. TDM decreases the time dedicated to the search for relevant information in scholarly literature by categorizing information ( Leitner & Valencia, 2008 ), highlighting and annotating relevant results to specific users ( Shatkay et al. , 2008 ), and profiling research ( Porter et al. , 2002 ). Furthermore, TDM also prevents researchers and readers from wasting time on reinventing the wheel, simply because one can no longer keep up with the huge amount of published literature available ( Harmston et al. , 2010 ).

Because of traditional copyright transfers, TDM has often been stymied by traditional, closed access publishers who frequently see it as a copyright infringement. Researchers using software that harvests data from online publications have been cut off, or threatened with being cut off, from accessing the articles. These researchers found themselves trapped in negotiations to resume their research, even though their universities had paid subscription fees for access (e.g., Bloudoff-Indelicato, 2015 ; Van Noorden, 2012 ). Standard subscriptions do not permit systematic downloads because publishers fear that their content might be stolen and their revenue therefore lost ( Van Noorden, 2012 ). In 2014, Elsevier opened its papers for TDM via a proprietary API ( Van Noorden, 2014 ), but placed restrictions on the researchers using it; however, researchers are not legally required to comply with these restrictions in some countries (e.g., U.K., U.S.A.; Handke et al. , 2015 ).

Making the enormous corpus of closed access papers retrospectively available to the public might be possible through legal action at an institutional or governmental level. The Dutch Government, for example, has recognized OA as a right, with Dutch citizens able to make their scientific publications free to access after a ‘reasonable period of time’ ( Open Access NL, 2015 ). Such steps are further supported by Shavell (2010) and Eger & Scheufen (2012) , who ascertained that transition towards an OA model could not be smooth without first undertaking the necessary legislative steps. The position of institutes regarding copyright transfer remains generally unclear. While academics themselves may have little power in broader debates regarding copyright, institutes could claim ownership of the work by invoking their rights under the work-made-for-hire doctrine ( Denicola, 2006 ). However, OA policies at universities generally use a system of non-exclusive rights, presupposing that faculty are the owners of their work and can grant non-exclusive rights to the university for use (for examples of approaches and language used when drafting OA policies, see Shieber & Suber, 2016 ). Importantly, this means that OA through the ‘Green’ route does not always depend on permission from publishers, and is increasingly becoming dependent on rights retention by authors, through carefully-drafted and widely-supported university policies. While these are positive steps towards making research available for TDM, in light of the potential copyright problems for closed access articles and the fact that not all research is available through institutional Open Access policies, TDM will be easier and legally safer for OA journals. As a consequence, TDM is likely to be more readily applied to OA literature when compared to closed access literature.

The economic impact of Open Access

The effect on publishers.

Any publisher has to cover operating costs, which are primarily made up of (i) article processing charges (APCs), (ii) management and investment costs, and (iii) other costs. Article processing includes editing, proofreading and typesetting, among other things. Management and investment are instead the marginal costs needed to establish and keep the journal running. Other costs include promoting the journal, hosting and infrastructural services, sponsoring conferences, and other services that are extrinsic to research articles themselves. The average production cost for a single research article is estimated to be around $3500–$4000 ( Van Noorden, 2013 ), but these costs depend heavily on the publisher. For example, Philip Campbell (Editor-in-Chief of Nature ) stated that his journal’s internal costs were $20,000–$30,000 per paper ( Van Noorden, 2013 ), due in part to the high selectivity and rejection rate at Nature (i.e., this is an average cost per published paper, and not the production costs associated with publishing a single accepted paper). However, these are at the high end of the cost spectrum; other journals, such as the Journal of Machine Learning Research (JMLR), cost between $6.50 and $10 per article ( blogs.harvard.edu/pamphlet/2012/03/06/anefficient-journal/ ). Other publishers are completely transparent about their direct and indirect production costs, such as Ubiquity Press, which levies an APC of $500 ( ubiquitypress.com/site/publish/ ). One possible reason for such variation between journals and publishers is that it is generally unclear whether proposed costs relate to those directly involved in article processing or those required in order for a publisher to ‘break even’ if they receive zero subscription income for an article made OA.

In order to cover those costs and make a profit, closed access publishers charge for access via subscriptions, whereas many OA publishers or journals charge to publish. Due to increased subscription costs, closed access publishing is becoming an increasingly unsustainable business model ( Odlyzko, 2013 ), with prices estimated to have outpaced inflation by 250% over the past thirty years ( www.eff.org/issues/open-access ). This will slowly but surely diminish the scope of access to the scholarly literature as fewer organisations are able to pay such high costs. Only recently has any transparency into the detailed costs of subscriptions been gained by using Freedom of Information Requests to bypass non-disclosure agreements between libraries and publishers ( Bergstrom et al. , 2014 ; Lawson & Meghreblian, 2015 ). These requests provide the basis for understanding the economics of scholarly communication. For example, Bergstrom et al. (2014) found that commercial publishers, including Emerald, Sage, and Taylor and Francis, charge PhD-granting institutions prices per citation that are ten times those of non-profit publishers. Two potential ways to avoid retaining an unsustainable model are to decrease subscription prices, thereby lowering publishers’ profit margins and the financial burden on subscribers, or to switch to new OA-oriented business models and create new value. Either way, price transparency will be essential for future bargaining efforts between academic libraries and publishers, and will be of interest to those involved in public policy and scholarly publishing. The concept of transitioning from a subscription-based model to one driven by APCs will be financially appealing to journals that operate with minimal profits or at a loss, and can be a pathway to achieve financial security and long-term journal sustainability. As such, increasing revenues is a strong incentive for OA ( osc.hul.harvard.edu/programs/journal-flipping/public-consultation/4/6/ , accessed 26/04/2016).

OA publishing has become associated with a ‘pay-to-publish’ model, even though around 70% of peer-reviewed OA journals do not charge APCs, according to data from the Directory of Open Access Journals (DOAJ) (see blogs.harvard.edu/pamphlet/2009/05/29/what-percentage-of-open-access-journals-charge-publication-fees/ and citesandinsights.info/civ16i4.pdf ). However, approximately 50% of all articles published in peer-reviewed OA journals are published in APC-based venues ( Crawford, 2015 ; Laakso & Björk, 2012 ; Walters & Linvill, 2011 ). Authors paying to publish can be viewed as a fundamental conflict of interest for researchers. Nonetheless, this payment model has proven itself to function properly when editorial decisions are separated from the business side of the publisher (i.e., editorial independence), removing the problem of ‘publication-bribery’. Additionally, many journals have always levied charges to cover the costs of publishing regardless of OA; for example, PNAS charges $1225 per regular research article (with an additional $1350 for OA; pnas.org/site/authors/fees.xhtml ), and Cell charges $1000 for the first colour figure and $275 for each subsequent one ( cell.com/cell/authors ; as of April 2016). Therefore, equating OA with ‘pay-to-publish’ is actually a bit of a misnomer, as several closed journals charge to publish and many open journals do not. Furthermore, many publishers (e.g., PLOS , PeerJ ), as well as many learned societies, operate fee waiver schemes for researchers unable to obtain funds to cover publication fees.

Among OA publishers implementing a pay-to-publish model, around 68.8% offer fee waivers to low- and middle-income countries ( Lawson, 2015 ), while other journals offer fee discounts, often in lieu of total fee waivers. Solomon & Björk (2012) investigated the sources of funding used by authors for APCs, indicating that these are highly variable across academic disciplines. For example, while 45.5% of authors in Health Sciences, Biology and Life Sciences use grant or contract funding as a source for APCs, only 10.4% use this in Business and Economics, with 45.8% coming from personal funds. Other sources include national funding bodies, and discretionary funds administered by institutions, as well as institutional funds specifically in place to support OA policies (see also Dallmeier-Tiessen et al. , 2011 ). Sources for APCs are also highly variable depending on the per capita GNP of the authors’ country, as well as the size of the APC ( Solomon & Björk, 2012 ). According to MacKie-Mason (2016) , one potential outcome of authors seeing the price of APCs and securing funding for them is that authors may begin to take the price of APCs into account (in addition to other factors such as prestige and topic) when selecting a journal for their research output, which may drive market competition and could as a consequence lower the price of APCs. However, a potential negative consequence of an increasingly APC-driven model of OA is that some researchers may struggle to procure funds in order to publish and conform to mandates at different levels. This might impact early-career researchers and those working in fields where research grants and publishing fees are more difficult to obtain.

Subscription-based publishers still frequently produce print versions of their journals, which increases production costs, potentially to justify charging for readership or to satisfy a small demographic who prefers this mode of reading. After all, subscriptions to print journals make sense and, if large-scale printing is still in place, simply transferring this idea to the digital versions creates continuity. Print versions are accompanied by logistical costs to print and ship each issue, but these are partially offset with reprint orders, additional charges for colour figures, and print-based advertising. For some of the largest subscription-oriented publishers the annual net profit on investment reaches up to 40 percent, which makes academic journal publishing highly lucrative for investors ( Satyanarayana, 2013 ), further increases investment to sustain this type of publishing model, and allows maintenance of an oligopoly ( Larivière et al. , 2015 ).

OA publishers only publish digitally and have opened up avenues for innovation. For example, PeerJ has introduced a wholly different OA business model, where readers pay nothing to access articles, but authors pay a membership fee once to publish for a lifetime. The Open Library of Humanities (OLH) is another innovative business model in which libraries pay a small fee to support OLH and scholars are able to publish for free (subscription for publishing rather than subscription for access); this support also enables the OLH to help journals transition from a subscription model to OA (for example, the recent case of Lingua ; timeshighereducation.com/research-intelligence/open-library-humanities-aims-flip-journals-open-access ). Library publishing has also developed in response to the OA movement; in this model, academic libraries begin publishing operations in the interest of providing added value to their patrons and contributing to the growth of knowledge ( librarypublishing.org ). In terms of innovating in the publishing platform itself, eLife have introduced the Lens as a novel way of viewing research articles online ( lens.elifesciences.org/about/ ), and F1000Research has introduced so-called ‘living figures’ to enable researchers to interact with data underlying research findings (e.g., Colomb & Brembs, 2015 ).

With this innovation comes massive scope for reducing the costs associated with publishing through implementing more efficient procedures. In this case, costs are reduced by eliminating the need for type-setting and copy-editing, with web-hosting costing only $15/year, and a total operating cost of between $6.50 and $10.50 per article. Other platforms such as ARPHA offer an end-to-end XML-based publishing service, utilised by publishers including Pensoft, with a more efficient and integrated publishing workflow, which should highlight and reduce the real costs of publishing. In addition, OA has the potential to increase the speed of publication, as seen in journals like eLife and PeerJ ( blog.dhimmel.com/plos-and-publishing-delays/ ), which, combined with ‘pre-print’ servers like bioRxiv and platforms that offer post-publication peer review like Research Ideas and Outcomes ( riojournal.com/ ), F1000 Research , and ScienceOpen ( www.scienceopen.com/ ), can greatly accelerate the speed of research communication. Such innovations add value to the research communication process (contrary to services such as paying to print colour figures) and represent just a few cases of innovation across the publishing ecosystem. One can imagine that publishing costs in OA journals become dependent on the value added on a per-article basis, which can help reshape and improve scholarly communication. As such, making publication costs dependent on the value added aligns the interests of publishers with those of scholars, where improving the quality of the process of scholarly communication is the end goal. The motivation behind this could come from the currently available data, which suggest that hybrid publishing options offered by traditional publishers, while being of higher cost due to supposed prestige, provide a much lower overall quality publishing process ( blog.wellcome.ac.uk/2016/03/23/wellcome-trust-and-coaf-open-access-spend-2014-15/ ). It is noteworthy that in spite of the higher costs of hybrid publishing compared to ‘pure’ or ‘born’ OA publishing, some reports, such as the highly influential and somewhat controversial Finch Report in the UK ( www.researchinfonet.org/publish/finch/ ), favoured the former model and high-priced Gold OA over a Green model.

The effect on non-publishers

The implementation of OA models has implications beyond the publishing industry in terms of economics. Research funding comes from multiple sources, including national funding agencies and industries, as well as private funders. Much primary research actually takes place outside of academia, inside R&D departments; if R&D in the private sector can access more research findings, this will ultimately benefit the public interest as well. A report from 2004 by Arzberger and colleagues into the scientific, social and economic development of access to research results concluded that access should be promoted to the largest extent possible. According to this report, access to research results can only be responsibly restricted in the case of national security, privacy, or those involving IP rights of the authors ( Arzberger et al. , 2004 ). A major principle underlying this is the ownership of research results: publicly funded research and data are public goods and because they have been produced in the public interest they should be considered and maintained as such. Indeed, such a principle has become one of the focal rallying points of the global OA movement. Appropriate licensing and accessibility can influence re-use through commercialization, and can empower citizens and industry to recognize great economic benefits. This apparently resonates with many organisations, as indicated by the increased numbers of OA policies on a global basis (see Figure 3 ).


Figures are given at the beginning of each year. Source: ROARMAP, accessed March 2016.

With access to scholarly articles, entrepreneurs and small businesses can accelerate innovation and discovery, which is advantageous for advancing the ‘entrepreneurial state’ ( Mazzucato, 2011 ). Access to research results has clear advantages for a range of industries and can help stimulate regional and global economies. Increased access to research results has been associated with considerable increases of return on financial investment ( Beagrie & Houghton, 2014 ). Furthermore, OA facilitates collaborations between publishers and industrial partners to leverage the potential of structured information networks for advanced data mining projects, such as that recently announced between IBM Watson and PLOS ( Denker, 2016 ). One of the major driving forces behind the development of OA in the UK on a national level, the ‘Finch Report’, also concluded that OA was an essential source for information and innovation to the civil service, commercial sectors, small- and medium-sized enterprises (SMEs), and the general public ( www.researchinfonet.org/publish/finch/ ).

Taking UK cancer research as one high impact case study, there is substantial evidence for the economic benefit of OA. In 2011–12 prices, the total expenditure on research relating to cancer in the period of 1970–2009 was £15 billion ( Glover et al. , 2014 ). A total of 5.9 million quality-adjusted life years were gained from the prioritized interventions in 1991–2010, of which the net-monetary benefit was an estimated £124 billion (i.e., an eight-fold return on investment). However, only 17% of the annual net-monetary benefit was estimated to be attributable to research performed in the UK ( Glover et al. , 2014 ), suggesting that 83% of the economic return on cancer research is drawn from research from non-UK sources. Another example is from the area of environmental impact assessments, where Vickery (2011) has shown that OA to R&D results could result in recurring gains of around €6 billion per year. As such, opening up research for global access rather than localized and restricted use has the potential to increase the economic return, as demonstrated by the cases of cancer research and environmental impact assessments.
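
The headline ratios in this case study can be reproduced from the figures quoted above. The sketch below simply restates that arithmetic; the variable names are illustrative, and the calculation compares totals over the stated periods without any discounting or time-lag adjustment.

```python
# Figures quoted in the text from Glover et al. (2014), in 2011-12 prices.
total_spend_gbp_bn = 15.0     # UK cancer research expenditure, 1970-2009
net_benefit_gbp_bn = 124.0    # net-monetary benefit of interventions, 1991-2010
uk_attributable_share = 0.17  # share of benefit attributed to UK research

# Simple ratio of benefit to expenditure (no discounting applied).
return_multiple = net_benefit_gbp_bn / total_spend_gbp_bn

# Remaining share of the benefit, drawn from research performed elsewhere.
non_uk_share = 1 - uk_attributable_share

print(f"Return multiple: {return_multiple:.1f}x")
print(f"Share drawn from non-UK research: {non_uk_share:.0%}")
```

The ratio comes out slightly above eight, matching the "eight-fold" description, and the complement of the UK share gives the 83% attributed to research from other countries.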

The price of Open Access

The question of the current publication cost is difficult and confounded by estimates of the total global publishing costs and revenue. Data provided by Outsell, a consultant in Burlingame, California, suggest that the science publishing industry generated $9.4 billion in revenue in 2011 and published around 1.8 million English-language articles. This equates to an approximate average revenue per article of $5,000. A white paper produced by the Max Planck Society estimated costs at €3,800–€5,000 per paper through subscription spending, based on a total global spending of €7.6 billion across 1.5-2 million articles per year in total ( Schimmer et al. , 2015 ). Other estimates suggest that the total spending on publishing, distribution and access to research is around £25 billion per year, with an additional £34 billion spent on reading those outputs, a sum which equates to around one third of the total annual global spending on research (£175 billion; Research Information Network, 2008 ).
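The per-article figures quoted above follow from dividing total revenue or spending by the number of articles. A minimal sketch of that calculation is given below, assuming only the totals and article counts stated in the text; the function and variable names are illustrative.

```python
def per_article(total: float, article_count: float) -> float:
    """Average revenue (or spending) per article: total divided by article count."""
    if article_count <= 0:
        raise ValueError("article_count must be positive")
    return total / article_count


# Outsell figures for 2011: $9.4 billion revenue, about 1.8 million articles.
outsell_usd = per_article(9.4e9, 1.8e6)

# Max Planck white paper: EUR 7.6 billion across 1.5-2 million articles per year.
mpg_low_eur = per_article(7.6e9, 2.0e6)   # more articles, lower cost per article
mpg_high_eur = per_article(7.6e9, 1.5e6)  # fewer articles, higher cost per article

print(f"Outsell: about ${outsell_usd:,.0f} per article")
print(f"Max Planck: about EUR {mpg_low_eur:,.0f} to EUR {mpg_high_eur:,.0f} per article")
```

The results round to the approximate values given in the text: around $5,000 per article from the Outsell data, and roughly €3,800–€5,000 per article from the Max Planck estimate.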

Such high costs are at odds with alternative estimates of the cost of OA publishing. For example, the Scientific Electronic Library Online ( SciELO ) is a pan-Latin American bibliographic database, digital library, and co-operative electronic publishing model of OA journals. It is estimated that their costs are between $70 and $600 per OA article depending on the services provided ( Brembs, 2015 ). OA now dominates the Latin American publishing landscape, with the full text of an estimated 72–85% of articles now publicly available OA ( www.sparc.arl.org/news/open-access-latin-america-embraced-key-visibility-research-outputs ). Furthermore, in countries such as Brazil, higher quality journals are more likely to be published OA ( Neto et al. , 2016 ), implying that low cost, high quality, and OA can all co-exist. Even more extreme estimates of the cost of OA come from Standard Analytics, who suggested the absolute minimum per-article costs of publishing could fall to between $1.36 and $1.61 with sufficient cloud-based infrastructure ( Bogich et al. , 2016 ). However, it is likely that this estimate under-emphasizes marginal costs that are beyond a per-article cost basis. Nevertheless, what is clear from these analyses is that OA has the opportunity to become a cost-reducing mechanism for scholarly publishing. Open Journal Systems (OJS), open source software available for anyone to download and use without charge, is another example of this. Additionally, researcher-led initiatives such as the recently launched Discrete Analysis have costs that average around $30 per article, with no cost to authors or readers, and utilise the infrastructure offered by the arXiv to keep costs low ( discreteanalysisjournal.com ).

In her article, Sutton (2011) argued that current scholarly journals are digital products and that as such they are driven by very different economic principles and social forces than their print ancestors. Based on Anderson (2013b) , the author made the case that changes in both the delivery of scientific content and in publishers’ business models were inevitable when journals moved online. Sutton (2011) considered that scientific literature is no different from other digital products with respect to distribution costs and as such it is no exception to the ‘zero is inevitable’ rule of pricing.

The societal impact of Open Access

OA to the scholarly literature does not just benefit academics, but also has wider impacts on other domains in society. It makes research available to anyone with an Internet connection who has the ability to search and read the material. Therefore, it transcends academic affiliation and supports sustainable lifelong learning. Examples of groups who might benefit most from OA include citizen scientists, medical patients and their supporting networks, health advocates, NGOs, and those who benefit from translation and transformation (e.g., sight-impaired people). In theory, OA affects anyone who uses information, and opens up possibilities for knowledge to be used in unexpected, creative and innovative ways, far beyond the mainstream professional research.

Access to knowledge has been called a human rights issue, considering it is included in Article 27 of the United Nations Universal Declaration of Human Rights. Willinsky (2006) has argued that " Access to knowledge is a human right that is closely associated with the ability to defend, as well as to advocate for, other rights. " This is not only true for access to knowledge from research that could save human lives, but also, as argued by Jacques Derrida, for the right of access to philosophy and the humanities disciplines that stem from it. Derrida writes about the field of Philosophy, " No one can forbid access to it. The moment one has the desire or will for it, one has the right to it. The right is inscribed in philosophy itself " ( Derrida, 2002 ).

Society’s ability to make research publicly accessible supports the long-term interest and investment in research. Citizens support research through taxes and therefore one could argue that efforts to support public access should be a fundamental part of the research process. While OA is not a solution to all aspects of research accessibility (e.g., filtering and language barriers, connectivity barriers and disability access remain continuing issues to be addressed; cyber.law.harvard.edu/hoap/Open_Access_(the_book) ), it most certainly increases accessibility greatly and at the same time allows innovations to remove other barriers (e.g., OA articles can be freely translated to address language barriers and can be changed to different formats to accommodate screen readers). Anecdotal evidence suggests that public access to research is required from a range of public spheres ( whoneedsaccess.org/ ). Nonetheless, the fact that access to knowledge continues to be prohibited in fields like public health should be of major concern to all stakeholders engaged in academic publishing.

In addition to professional research by, for example, academics, there is the dimension of citizen science. In citizen science, the broader public participates in the research process itself and will have an increased interest in accessing previous research. Numerous projects such as Galaxy Zoo, Zooniverse, Old Weather, Fold It, Whale FM, Bat Detective, and Project Discovery, are all different initiatives in which citizens publicly and openly engage with research. These initiatives introduce new ways of knowledge creation and these groups also require thorough access to actually be able to do non-redundant research. Citizen science forms part of the societal case for OA, because it clearly indicates that anyone can be actively engaged with research, and not only professional scientists.

Some traditional publishers and some academics have argued that public access to research is not required because research papers cannot be understood by non-specialists ( cyber.law.harvard.edu/hoap/Open_Access_(the_book) - see Section 5.5.1). However, citizen science initiatives already indicate the general public is interested in and understands the research. While this understanding and engagement is highly variable, and strongly dependent on a range of extrinsic and intrinsic factors, the fact that a high level of public interest in science already exists is of relevance. These publishers and academics argue that specialization is a sufficient reason for confining access to professional research bodies through subscriptions. Such statements conflate a lack of desire or need for access with the denial of opportunity to access research, and make false presumptions about the demand for access to the literature (i.e., unmet and unknown demand). Importantly, OA provides access to everyone who potentially needs or wants it, without making explicit and patronising statements or guesswork about who needs or deserves it. As Peter Suber says in his 2012 book: "The idea [of OA] is to stop thinking of knowledge as a commodity to meter out to deserving customers, and to start thinking of it as a public good, especially when it is given away by its authors, funded with public money, or both" (page 116). Isolated incidents such as the crashing of the servers of Physical Review Letters upon the ‘Gravitational Waves’ announcement and OA publication (Feb, 2016; Abbott et al. , 2016 ) indicate that there are cases of extreme public interest in science that closed access would only impede. Moreover, one out of four people seeking medical information have hit a paywall at least once ( pewinternet.org/2013/01/15/information-triage/ ).
Claims that only experts can and should read research articles do little to break down the ‘ivory tower’ perception that still pervades academia, and undermine the enormous amounts of resources invested in science communication and public engagement activities. Such perceptions run counter to the idea of access to knowledge as a right, retaining it as a privilege based on financial or academic status.

Open Access in developing countries

The arguments outlined above form the basis for democratic and equal access to research, which come to light even more strongly in the developing world. For low- and middle-income countries (LMIC), OA publishing breaks traditional financial barriers and allows unrestricted, equal access to scholarly information to people all over the globe. Due to the high prices of journal subscriptions, developing countries struggle with access just as developed countries do, but to a greater extent and consequently with greater negative repercussions. For example, a research paper from 1982 that indicated why Liberia should be included in the Ebola endemic zone was unknown to Liberian officials in the 2014 Ebola outbreak ( Knobloch et al. , 1982 ); the paper was published behind a paywall, drastically reducing its discoverability. Even though the result is available in the abstract of the paywalled article, assessing the truth of the result certainly requires access to the full research article. In general, lack of access can have major deleterious consequences for students and researchers, in that they do not have sufficient material to conduct their own primary research or education.

OA provides a mechanism to level the playing field between developed and developing countries. This increases fair competition and the scientific potential of the developing world ( Chan et al. , 2005 ). This aspect is linked to the wider issue of open licensing, which is essential for effective marketing of medicines and medical research in developing countries ( Flynn et al. , 2009 ), and justifies the necessity of OA in the wider context of social welfare. Developing countries clearly acknowledge the need for access and as such have launched many repositories to increase access with self-archiving of research articles. In 2014, over 100 institutions in Africa launched a network of over 25 fully-operational OA repositories in Kenya, Tanzania and Uganda ( www.ubuntunet.net/april2014#researchrelevant ). Such developments suggest that African nations are leaning more towards a Green model of OA adoption.

The shift from a ‘reader pays’ to a pre-publication fee model (often conflated with ‘author pays’; see subsection ‘The effect on publishers’) with OA potentially limits its adoption in developing countries. The pay-to-publish system is a potentially greater burden for authors in developing countries, considering that they are not used to paying publication costs, and funding systems for OA are not as well-established as those in the Western world. Publication fees present an even greater relative burden ( Matheka et al. , 2014 ) given that they can often exceed a monthly salary for researchers. This has been at least partially mitigated with fee waivers for authors from developing countries and additional provisions in research grants, and around 70% of peer reviewed OA journals are fee-free. In November 2015, Research4Life ( research4life.org ) and DOAJ announced a working partnership that will help to ensure that Research4Life users will have access to the largest possible array of OA journals from publishers with a certain quality standard. While Research4Life does not directly cover OA publication costs, many publishers offer full or partial waivers to authors based in countries eligible for Research4Life. However, determining which countries qualify for access to scientific journals through these programs, and which journals they are provided access to, is a fairly closed process. These programs are also not entirely stable, as publishers can opt out of the initiative, or be selective about which countries they choose to serve. In 2011, publishers withdrew free access to 2500 health and biomedical journals for Bangladesh ( Kmietowicz, 2011 ) through the HINARI programme. While access was subsequently reinstated, this demonstrates that such initiatives are not an adequate replacement for full OA ( Chatterjee et al. , 2013 ).
Despite these programs purporting to provide essential articles to researchers in poor nations, they exclude some developing countries (e.g., India) and limit access to researchers who work in registered institutions.

Initiatives such as the Journals Online Project developed by INASP (International Network for the Availability of Scientific Publications; inasp.info/en/ ) have helped to develop a number of online OA platforms in the Global South. These were launched in 1998 with the African Journals Online (AJOL) platform, a project currently managed in South Africa. More recently, INASP have set up Latin American Journals Online (LAMJOL) which hosts journals in El Salvador, Honduras, and Nicaragua. In Asia, Bangladesh Journals Online (BanglaJOL), Nepal Journals Online (NepJOL), and Sri Lankan Journals Online (SLJOL), all facilitated through INASP, continue to develop and now around 95% of their articles are full-text Open Access. As mentioned previously, improved access should not be limited to professional researchers only, considering that there is also global interest from the broader public, including health professionals.

Deceptive publishing practices

One negative effect of OA comes from entities that attempt to profit by exploiting the pay-to-publish system used by many OA publishers. These publishers operate a sub-category of OA journals known as vanity presses, predatory publishers ( Beall, 2012 ) or pseudo-journals ( McGlynn, 2013 ). These journals, referred to in this work as ‘deceptive publishers’, seem to be in the scholarly publishing business primarily to collect publication fees (i.e., APCs) in exchange for rapid publication without formal peer-review. Beall (2015) has defined a list of criteria for identifying deceptive publishers and an index of publishers and individual journals that meet these criteria is continuously updated ( scholarlyoa.com ).

While not all scholars and advocates agree with the criteria proposed by Jeffrey Beall (who controversially describes the OA movement as "an anti-corporatist movement that wants to deny the freedom of the press to companies it disagrees with" ( Beall, 2013 )), there are several factors that many agree on to identify a deceptive publisher, but these factors are not clear-cut indicators of deceptive publishing. One such indicator is that deceptive publishers tend to charge low publication fees ( Xia, 2015 ), with most charging below $100 and few charging more than $200. However, while this is a trait of almost all deceptive publishers, the reverse is not necessarily the case. For example, a single-authored paper with PeerJ would cost $99, but this is not a deceptive publisher. By contrast, the average publication fee of journals indexed in the Directory of Open Access Journals (DOAJ) is around $900–$1,000 ( Solomon & Björk, 2012 ) and leading universities in the UK and Germany pay on average $1,200–$1,300 per article ( Schimmer et al. , 2015 ). The editorial and peer-review aspects of deceptive publishers are either non-existent or suspect; they also falsely claim to have ratings such as a Journal Impact Factor and to be indexed in major databases such as Scopus ( Djuric, 2015 ). Editors from these journals solicit articles that have no relation to the topic of their journal and do not send the manuscripts out to be properly peer-reviewed ( Bowman, 2014 ).

The problem of deceptive publishers in OA seems to particularly affect countries where the academic evaluation strongly favors international publication without further quality checks ( Shen & Björk, 2015 ). Xia et al. (2015) collected and analyzed the publication record, citation count, and geographic location of authors from the various groups of journals. Statistical analyses verified that deceptive and non-deceptive journals have distinct author populations: authors who publish in deceptive journals tend to be early-career researchers from developing countries who still have little publishing experience. The spatial distribution of both the deceptive publishers and those authors who submit to pseudo-journals is highly skewed: Asia and Africa contribute three quarters of authors ( Xia et al. , 2015 ) and Indian journals form the overwhelming proportion of deceptive publishers ( Xia, 2015 ). An interesting finding is the very low involvement of South America, both among deceptive publishers (0.5%) and corresponding authors in deceptive journals (2.2%). The OA infrastructure in Latin America is different compared to other developing countries, which suggests a possible reason for this asymmetric situation. Latin American journals and universities are engaged in OA publication models to a higher degree than other regions ( Alperin et al. , 2011 ). As a result, scholars from this region are not only more aware of OA issues, but they have more options for publishing OA than those from other regions ( Alperin et al. , 2011 ). Moreover, SciELO ( Packer, 2009 ) and the creation of Latin American databases ( Alonso-Gamboa & Russell, 2012 ) have played a tremendous part in this process by bringing recognition and a good reputation to publishing outlets in Latin America.

Considerable attention is given to the subject of deceptive publishers, who have become conflated with the OA movement in general to the detriment of genuine OA publishers. For example, a ‘sting’ operation that outed bad peer-review instead got misinterpreted as bad peer-review in OA journals ( Bohannon, 2013 ), but was probably more indicative of issues to do with the traditional closed and over-burdened system of peer review ( scilogs.com/communication_breakdown/jon-tennant-oa/ ). Overall, the deceptive publisher phenomenon is one major negative aspect that spawns many misconceptions and misgivings about publishing OA. Recently launched industry-led initiatives such as "Think, Check, Submit" ( thinkchecksubmit.org ) provide a checklist to help researchers identify trustworthy journals, and will likely be a pivotal tool in combating deceptive publishers.

Open Access and Open Science

OA exists in a constantly evolving scholarly research ecosystem and the proliferation of "open" as a description of scientific activities has caused some confusion about what the term "open" means (for a more comprehensive discussion, see Pomerantz & Peek (2016) ). As such, it is important to note how it is interconnected to other facets of the scholarly communication system. Here, we discuss the implications that the transition to OA has on developments in the broader context of Open Science (or Open Research).

Open Access and Open Data

The overall movement of OA has become conjoined with the drive for Open Data. Data sharing is fundamental to scientific progress, because data lead to the knowledge generated in research articles. Furthermore, data sharing has recently become a common requirement, together with OA, for both research funding and publication. The data sharing policy from PLOS illustrates the high degree of overlap between OA and Open Data; authors of articles published in PLOS are required to share the data except if they have valid reasons not to (i.e., an opt-out system; journals.plos.org/plosone/s/data-availability ). Many publishers, NGOs, and research funders have recently come together to commit to free research sharing in times of public health emergency, catalysed by the current Zika health threat ( http://www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Public-health-emergencies/index.htm ). It is noteworthy that some of the largest publishers, including Wiley, Taylor and Francis, and Elsevier (with the exception of the journal The Lancet ) did not commit to research sharing during ongoing or future public health crises (as of May, 2016).

The benefits of Open Data are diverse, including a citation advantage. Combined with the citation advantage for OA articles, providing data alongside publications can increase citations on average by 30% ( Piwowar & Vision, 2013 ) and up to 69% ( Piwowar et al. , 2007 ), but this evidence is entirely field-dependent (e.g., Dorch et al. , 2015 ). Below we cover seven additional benefits of Open Data.

First, data sharing enhances reproducibility, a crucial aspect in a time where some scientific domains appear to have problems with reproducibility (e.g., Open Science Collaboration, 2015 ). Several factors could form the basis for this ‘crisis’, such as an overemphasis on novelty instead of rigour, selective reporting of results, an overemphasis on statistical significance, and insufficient documentation of the research methods. Publicly sharing data, code, and materials can certainly alleviate issues with reproducibility. This is especially pertinent in the modern sciences, where a substantial proportion of published results draw on quantitative experiments and computer simulations. As such, it is largely impossible to reproduce these experiments as they become more complex and the associated datasets increase in complexity. When full access to the data, metadata, and the code used to produce the ultimate results is provided alongside publication, this greatly improves reproducibility.

Second, publicly available data can be used to stimulate innovations, such as new analytical methods. An excellent example of this is provided by the neuroimaging OpenfMRI project, where shared data have been used to examine the effects of different processing pipelines on analysis outcomes ( Carp, 2012 ) and test new methods to characterize different cognitive tasks ( Turner & Laird, 2012 ). Another good example is the Protein Data Bank (PDB) ( Berman et al. , 2000 ), a project which has enabled the re-use of the primary structural data and opened up new avenues of research, despite the latter not being expected.

Third, data sharing enables new research questions that can only be answered by combining datasets which currently remain separate. Analyzing vast volumes of data can yield novel and perhaps surprising findings. This allows for integrated research hypotheses on the underlying processes behind the original data and observations. Exploratory approaches to large datasets can be seen as hypothesis generating tools, which later drive experimental testing to confirm or disprove these hypotheses ( Wagenmakers et al. , 2012 ).

Fourth, the realization that data will ultimately be shared and visible to the community provides a strong incentive for researchers to ensure they engage in better data documentation and, therefore, research methods. For example, the willingness to publicly share data has been associated with fewer statistical errors in the final research article ( Wicherts et al. , 2011 ).

Fifth, public data sharing provides a digital backup for datasets, protecting valuable scientific resources. Moreover, a considerable amount of the data produced every day does not ultimately lead to publication and often remains hidden. Such data might remain in a hidden file-drawer despite being valid, creating a systematic bias in the information available. Public data sharing opens this file-drawer and, consequently, allows independent assessments of whether the data are valid or not.

Sixth, sharing data can certainly reduce the cost of performing research. A file-drawer has been indicated to greatly reduce the efficiency of research in detecting effects ( van Assen et al. , 2014 ). Open Data, as such, discourages redundant data collection (i.e., data that have been already collected but never made publicly accessible) and simultaneously allows researchers to better approximate what is happening in their fields. This will have a large effect on research costs, resulting in savings that can then be used for more productive research goals.

Finally, and tightly connected with the sixth point, Open Data potentially has great economic value. For example, Open Data creates jobs for analysis and re-use of these data ( Capgemini, 2015 ), contributes to the additional value of products and services in major sectors ( Manyika et al. , 2013 ), and benefits users of these data-rich services ( Stott, 2014 ).

Beyond OA and Open Data lies a more integrated approach to research, referred to more broadly as Open Science (i.e., Science 2.0, Open Scholarship). According to the European Commission’s Horizon 2020 programme, Open Science is defined as " The transformation, opening up and democratisation of science and research through ICT, with the objectives of making science more efficient, transparent and interdisciplinary, of changing the interaction between science and society, and of enabling broader societal impact and innovation ". Consequently, we see OA as only one of the multiple challenges currently facing the ‘open transformation’ of the scholarly publishing system ( Watson, 2015 ), and OA should therefore be considered in the wider contexts and complementary domains of research transparency and open source.

As Kriegeskorte et al. (2012) pointed out, OA is now widely accepted as desirable and becoming a reality in many academic spheres. However, the second essential complementary element to research, evaluation, has received less attention despite the large amount of research that has been done to document its current limitations ( Benos et al. , 2007 ; Birukou et al. , 2011 ; Ioannidis, 2005 ; Ioannidis, 2012a ; Ioannidis, 2012b ; John et al. , 2012 ; Nosek & Bar-Anan, 2012 ; Simmons et al. , 2011 ).

Open evaluation, an ongoing post-publication process of transparent peer review and rating of papers, promises to address the problems of the current assessment systems ( Kriegeskorte et al. , 2012 ), as well as to increase the overall quality of the peer review process. As such, ongoing assessments of the development of OA must also consider the broader impact and concurrent changes to the peer review system ( van Rooyen et al. , 1999 ; Wicherts, 2016 ; Leek et al. , 2011 ). Some assessment methods, such as the Research Excellence Framework (REF) in England, administered by HEFCE, have already made OA a core feature of evaluation in that all research papers submitted to the REF must be archived in an institutional or subject repository ( www.hefce.ac.uk/pubs/year/2014/201407/ ). While it is too early to evaluate the impact of this policy, by tying OA compliance with research evaluation we might expect to see a national shift towards large-scale OA adoption. At the very least, such a combination is generating increasing interest and awareness about OA among researchers, increasing usage of institutional repositories, and increasing demand for funding for APCs ( Tate, 2015 ).

Future research regarding better ways to improve scholarly communication will be instrumental in providing evidence to support the transformation of the publishing system and to design new alternatives ( Buttliere, 2014 ; Ghosh et al. , 2012 ; Kriegeskorte et al. , 2012 ; Pöschl, 2012 ), which will draw heavily upon an open publishing framework driven by developments and newly emerging models in OA. Finally, consideration of Open Science and OA will be important inclusions in evolving research standards such as the Transparency and Openness Promotion (TOP) guidelines ( https://cos.io/top/ ).

Conclusions

This article provides an evidence-based review of the impact of OA on academy, economy and society. Overall, the evidence points to a favorable impact of OA on the scholarly literature through increased dissemination and re-use. OA has the potential to be a sustainable business venture for new and established publishers, and can provide substantial benefits to research- and development-intensive businesses, including health organisations, volunteer sectors, and technology. OA is a global issue, highlighted by inequalities between developing and developed nations, and largely fueled by financial disparity. Current levels of access in the developing world are insufficient and unstable, and OA has the potential to foster the development of stable research ecosystems. While deceptive publishing remains an ongoing issue, particularly in the developing world, increasing public engagement, development of OA policies, and discussion of sustainable and ethical publishing practices can remove this potential threat.

For libraries, universities, governments, and research institutions, one major benefit of lowering the cost of knowledge is the availability of extra budget that can be reallocated for other purposes. For researchers themselves, OA can increase their audience and impact by delivering wider and easier access for readers. For publishers, promoting OA is an answer to the desires and the needs of their research communities. Furthermore, subscription-based publishers have (partly) answered the call of the increasing global demand for OA, by giving their green light to author self-archiving ( Harnad et al. , 2008 ), as well as by establishing numerous ‘hybrid’ OA options. In an author survey, Swan & Brown (2004) reported that the vast majority of their sample indicated that they would self-archive willingly if their employer (or funding body) required them to do so. Similarly, in a study by Swan & Brown (2005) the vast majority of researchers (81%) indicated that they would comply with mandates that made OA a condition of funding or employment. There is evidence that many funders and research organisations are moving in this direction: since 2005, the number of policies supporting OA publishing increased steadily, and there is similar growth in the number of institutional rights-retention policies. Consequently, it is now the responsibility of researchers to ensure OA to their publications either by choosing the Green or the Gold road, and for public research funders to employ policies that are in the best interests of the wider public while considering the financial sustainability of the scholarly publishing ecosystem.

The fact that OA impacts upon such a diverse range of stakeholders, often with highly polarised and emotional viewpoints, highlights the ongoing need for evidence-informed discussion and engagement at all levels. This is especially the case for research communities, who have exceptionally diverse perspectives about OA and in particular how it interacts with ‘quality’ and ‘prestige’ in publishing ( Schroter & Tite, 2006 ; Schroter et al. , 2005 ). As Peter Suber, a leading voice in the OA movement, stated ( dash.harvard.edu/handle/1/4391169 ):

  • " TA [toll-access] publishers are not the enemy. They are only unpersuaded. Even when they are opposed, and not merely unpersuaded, they are only enemies if they have the power to stop OA. No publisher has this power, or at least not by virtue of publishing under a TA business model. If we have enemies, they are those who can obstruct progress to OA. The only people who fit this description are friends of OA who are distracted from providing OA by other work or other priorities. "

Therefore, OA supporters should focus their efforts on working for new models and systems rather than trying to undermine or punish the existing ones. OA remains only one of the multiple challenges that the scholarly publishing system is currently facing. As highlighted in this review, the empirical evidence for OA is overwhelmingly positive, but further research is certainly required to move from investigating the effects of OA to researching the broader effects of Open Science. In particular, OA must in future be considered more broadly, with regard both to the adverse effects of a system of journal-based research assessment (Brembs et al., 2013) and to the development of scholarly communication systems that are sustainable for, and in the best interests of, the commons.

Acknowledgments

We would like to collectively acknowledge the OpenCon community for inspiring this paper, and for providing continuous discussion about the various aspects of Open Access. In particular, we are grateful to Brett Buttliere, Audrey Risser, Sarah Barkla, and April Clyburne-Sherin for contributing resources to the development of this paper, and Tracey Depellegrin Connelly, Matt Menzenski, and Joseph McArthur for helpful comments on an earlier draft. We also thank Neil Saunders, who provided the base code to extract data from PubMed Central. We would also like to thank Andy Nobes for drawing our attention to the work of INASP. PM would like to thank Lennart Martens for insightful discussions on Open Science. For comments on the first published version of this manuscript, we would like to thank Philip Young, Ross Mounce, Anna Sharman, and David Wojick. Reviews from Gwilym Lockwood, Peter Suber, Paige Brown Jarreau, Anne Tierney and Chris Chambers were supportive, constructive, and greatly improved the content and balance of this article.

[version 3; referees: 3 approved

Funding Statement

This research was partly funded by the Belgian National Fund for Scientific Research through a FRIA grant. PM acknowledges support from the European Commission Horizon 2020 Programme under Grant Agreement 634107 (PHC32-2014) ‘MULTIMOT’.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

  • Abbott BP, Abbott R, Abbott TD, et al.: Observation of Gravitational Waves from a Binary Black Hole Merger. Phys Rev Lett. 2016;116(6):061102. 10.1103/PhysRevLett.116.061102
  • Adcock J, Fottrell E: The North-South information highway: case studies of publication access among health researchers in resource-poor countries. Glob Health Action. 2008;1. 10.3402/gha.v1i0.1865
  • Adie E: Attention! a study of open access vs non-open access articles. Figshare. 2014. 10.6084/m9.figshare.1213690
  • Allen HG, Stanton TR, Di Pietro F, et al.: Social media release increases dissemination of original articles in the clinical pain sciences. PLoS One. 2013;8(7):e68914. 10.1371/journal.pone.0068914
  • Alonso-Gamboa JO, Russell JM: Latin American scholarly journal databases: a look back to the way forward. In Aslib Proceedings. Emerald Group Publishing Limited, 2012;64(1):32–45. 10.1108/00012531211196693
  • Alperin JP, Fischman GE, Willinsky J: Scholarly communication strategies in Latin America’s research-intensive universities. Educación superior y sociedad. 2011;16(2).
  • Aman V: Is there any measurable benefit in publishing preprints in the arxiv section quantitative biology? CoRR. 2014; abs/1411.1955.
  • Anderson B: Funding sources of impactful and transformative research. Master’s thesis, San Jose State University, 2013a.
  • Anderson C: Free: How today’s smartest businesses profit by giving something for nothing. Random House, 2013b.
  • Antelman K: Do open-access articles have a greater research impact? Coll Res Libr. 2004;65(5):372–382. 10.5860/crl.65.5.372
  • Arzberger P, Schroeder P, Beaulieu A, et al.: Promoting access to public research data for scientific, economic, and social development. Data Sci J. 2004;3:135–152. 10.2481/dsj.3.135
  • Atchison A, Bull J: Will open access get me cited? an analysis of the efficacy of open access publishing in political science. PS Polit Sci Polit. 2015;48(01):129–137. 10.1017/S1049096514001668
  • Beagrie N, Houghton JW: The value and impact of data sharing and curation: A synthesis of three recent studies of UK research data centres. 2014.
  • Beall J: Criteria for determining predatory open-access publishers. Scholarly Open Access. 2015. (accessed 2015-02-14).
  • Beall J: Predatory publishers are corrupting open access. Nature. 2012;489(7415):179. 10.1038/489179a
  • Beall J: The open-access movement is not really about open access. tripleC: Communication, Capitalism and Critique. Open Access Journal for a Global Sustainable Information Society. 2013;11(2):589–597.
  • Benos DJ, Bashari E, Chaves JM, et al.: The ups and downs of peer review. Adv Physiol Educ. 2007;31(2):145–152. 10.1152/advan.00104.2006
  • Bergstrom TC, Courant PN, McAfee RP, et al.: Evaluating big deal journal bundles. Proc Natl Acad Sci U S A. 2014;111(26):9425–9430. 10.1073/pnas.1403006111
  • Berman HM, Westbrook J, Feng Z, et al.: The Protein Data Bank. Nucleic Acids Res. 2000;28(1):235–242. 10.1093/nar/28.1.235
  • Berners-Lee T, De Roure D, Harnad S, et al.: Journal publishing and author self-archiving: Peaceful co-existence and fruitful collaboration. Commentary On: http://www.alpsp.org/news/rcuk/default.htm. 2005.
  • Bernius S, Hanauske M, Dugall B, et al.: Exploring the effects of a transition to open access: Insights from a simulation study. J Am Soc Inf Sci Technol. 2013;64(4):701–726. 10.1002/asi.22772
  • Birukou A, Wakeling JR, Bartolini C, et al.: Alternatives to peer review: novel approaches for research evaluation. Front Comput Neurosci. 2011;5:56. 10.3389/fncom.2011.00056
  • Björk BC, Laakso M, Welling P, et al.: Anatomy of green open access. J Assoc Inf Sci Technol. 2014;65(2):237–250. 10.1002/asi.22963
  • Bjork BC, Roos A, Lauri M: Scientific journal publishing: yearly volume and open access availability. Inform Res. 2009;14(1).
  • Bloudoff-Indelicato M: Text-mining block prompts online response. Nature News. 2015;527(7579):413. 10.1038/527413f
  • Bogich T, Ballesteros S, Berjon R, et al.: On the marginal cost of scholarly communication. 2016. Accessed: 2016-3-24.
  • Bohannon J: Who’s afraid of peer review? Science. 2013;342(6154):60–5. 10.1126/science.342.6154.60
  • Bohannon J: Who’s downloading pirated papers? everyone. 2016. Accessed: 2016-5-4.
  • Bowman JD: Predatory publishing, questionable peer review, and fraudulent conferences. Am J Pharm Educ. 2014;78(10):176. 10.5688/ajpe7810176
  • Brembs B, Button K, Munafò M: Deep impact: unintended consequences of journal rank. Front Hum Neurosci. 2013;7:291. 10.3389/fnhum.2013.00291
  • Brembs B: What goes into making a scientific manuscript public? 2015. Accessed: 2016-3-24.
  • Buttliere BT: Using science and psychology to improve the dissemination and evaluation of scientific work. Front Comput Neurosci. 2014;8:82. 10.3389/fncom.2014.00082
  • Capgemini: Creating value through open data: Study on the impact of re-use of public data resources. Digital Agenda for Europe. 2015.
  • Carp J: On the plurality of (methodological) worlds: estimating the analytic flexibility of fMRI experiments. Front Neurosci. 2012;6:149. 10.3389/fnins.2012.00149
  • Carr L, Harnad S, Swan A: A longitudinal study of the practice of self-archiving. University of Southampton Working Paper. 2007.
  • Chamberlain S, Boettiger C, Ram K: rplos: Interface to the Search ‘API’ for ‘PLoS’ Journals. 2016.
  • Chan L, Kirsop B, Arunachalam S: Open access archiving: the fast track to building research capacity in developing countries. 2005.
  • Chatterjee P, Biswas T, Mishra V: Open access: the changing face of scientific publishing. J Family Med Prim Care. 2013;2(2):128–30. 10.4103/2249-4863.117400
  • Cheng W, Ren S: Evolution of open access publishing in Chinese scientific journals. Learn Publ. 2008;21(2):140–152. 10.1087/095315108X288884
  • Colomb J, Brembs B: Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [version 2; referees: 3 approved]. F1000Res. 2015;3:176. 10.12688/f1000research.4263.2
  • Costas R, Zahedi Z, Wouters P: Do “altmetrics” correlate with citations? extensive comparison of altmetric indicators with citations from a multidisciplinary perspective. J Assoc Inf Sci Technol. 2015;66:2003–2019. 10.1002/asi.23309
  • Crawford W: Open Access Journals 2014, DOAJ subset. 2015. 10.6084/m9.figshare.1299451.v4
  • Dallmeier-Tiessen S, Darby R, Goerner B, et al.: Highlights from the soap project survey. What scientists think about open access publishing. arXiv preprint arXiv:1101.5260. 2011.
  • Davis P, Fromerth M: Does the arXiv lead to higher citations and reduced publisher downloads for mathematics articles? Scientometrics. 2007;71(2):203–215. 10.1007/s11192-007-1661-8
  • Davis PM, Lewenstein BV, Simon DH, et al.: Open access publishing, article downloads, and citations: randomised controlled trial. BMJ. 2008;337:a568. 10.1136/bmj.a568
  • Davis PM: Open access, readership, citations: a randomized controlled trial of scientific journal publishing. FASEB J. 2011;25(7):2129–2134. 10.1096/fj.11-183988
  • Denicola R: Copyright and open access: reconsidering university ownership of faculty research. Nebraska Law Review. 2006;85(2).
  • Denker SP: Collaboration with IBM watson supports the value add of open access. 2016.
  • Derrida J: Who’s afraid of philosophy?: Right to philosophy 1. Stanford University Press, 2002;1.
  • Djuric D: Penetrating the omerta of predatory publishing: the Romanian connection. Sci Eng Ethics. 2015;21(1):183–202. 10.1007/s11948-014-9521-4
  • Dorch SBF, Drachen TM, Ellegaard O: The data sharing advantage in Astrophysics. arXiv, 2015.
  • EFF: Big win for fair use in google books lawsuit. 2015. Accessed: 2016-4-26.
  • Eger T, Scheufen M: The past and the future of copyright law: Technological change and beyond. Liber Amicorum Boudewijn Bouckaert, forthcoming. 2012;37–64.
  • Elbakyan A, Bohannon J: Data from: Who’s downloading pirated papers? everyone. Dryad Digital Repository. 2016. 10.5061/dryad.q447c
  • European Commission: Communication from the commission to the european parliament, the council, the european economic and social committee, and the committee of the regions. Towards better access to scientific information: Boosting the benefits of public investments in research. 2012.
  • Evans JA, Reimer J: Open access and global participation in science. Science. 2009;323(5917):1025. 10.1126/science.1154562
  • Eysenbach G: Can tweets predict citations? Metrics of social impact based on Twitter and correlation with traditional metrics of scientific impact. J Med Internet Res. 2011;13(4):e123. 10.2196/jmir.2012
  • Eysenbach G: Citation advantage of open access articles. PLoS Biol. 2006;4(5):e157. 10.1371/journal.pbio.0040157
  • Flynn S, Hollis A, Palmedo M: An economic justification for open access to essential medicine patents in developing countries. J Law Med Ethics. 2009;37(2):184–208. 10.1111/j.1748-720X.2009.00365.x
  • Frandsen TF: The integration of open access journals in the scholarly communication system: Three science fields. Inf Process Manag. 2009;45(1):131–141. 10.1016/j.ipm.2008.06.001
  • Gargouri Y, Hajjem C, Larivière V, et al.: Self-selected or mandated, open access increases citation impact for higher quality research. PLoS One. 2010;5(10):e13636. 10.1371/journal.pone.0013636
  • Gargouri Y, Larivière V, Gingras Y, et al.: Green and gold open access percentages and growth, by discipline. arXiv preprint arXiv:1206.3664. 2012.
  • Gaule P, Maystre N: Getting cited: does open access help? Res Policy. 2011;40(10):1332–1338. 10.1016/j.respol.2011.05.025
  • Gentil-Beccot A, Mele S, Brooks T: Citing and reading behaviours in high-energy physics. Scientometrics. 2010;84(2):345–355. 10.1007/s11192-009-0111-1
  • Ghosh SS, Klein A, Avants B, et al.: Learning from open source software projects to improve scientific review. Front Comput Neurosci. 2012;6:18. 10.3389/fncom.2012.00018
  • Glenisson P, Glänzel W, Janssens F, et al.: Combining full text and bibliometric information in mapping scientific disciplines. Inf Process Manag. 2005;41(6):1548–1572. 10.1016/j.ipm.2005.03.021
  • Glover M, Buxton M, Guthrie S, et al.: Estimating the returns to UK publicly funded cancer-related research in terms of the net value of improved health outcomes. BMC Med. 2014;12(1):99. 10.1186/1741-7015-12-99
  • Gonzalez GH, Tahsin T, Goodale BC, et al.: Recent Advances and Emerging Applications in Text and Data Mining for Biomedical Discovery. Brief Bioinform. 2016;17(1):33–42. 10.1093/bib/bbv087
  • Hajjem C, Harnad S, Gingras Y: Ten-year cross-disciplinary comparison of the growth of open access and how it increases research citation impact. 2006; arXiv preprint cs/0606079.
  • Handke C, Guibault L, Vallbé JJ: Is Europe falling behind in data mining? Copyright’s impact on data mining in academic research. 2015. 10.2139/ssrn.2608513
  • Harmston N, Filsell W, Stumpf MP: What the papers say: text mining for genomics and systems biology. Hum Genomics. 2010;5(1):17–29. 10.1186/1479-7364-5-1-17
  • Harnad S, Brody T, Vallieres F, et al.: The access/impact problem and the green and gold roads to open access: An update. Serials Rev. 2008;34(1):36–40. 10.1016/j.serrev.2007.12.005
  • Harnad S, Brody T: Comparing the Impact of Open Access (OA) vs. Non-OA Articles in the Same Journals. D-lib Magazine. 2004;10(6). 10.1045/june2004-harnad
  • Harnad S: Opening access by overcoming zeno’s paralysis. 2006.
  • Henneken EA, Kurtz MJ, Eichhorn G, et al.: Effect of E-Printing on Citation Rates in Astronomy and Physics. arXiv. 2006.
  • Hitchcock S: The effect of open access and downloads (‘hits’) on citation impact: a bibliography of studies. 2013.
  • Hoorn E, van der Graaf M: Copyright issues in open access research journals: The authors perspective. D-Lib Magazine. 2006;12(2):6. 10.1045/february2006-vandergraaf
  • Houghton JW, Oppenheim C: The economic implications of alternative publishing models. Prometheus. 2010;28(1):41–54. 10.1080/08109021003676359
  • Ioannidis JP: Scientific communication is down at the moment, please check again later. Psychol Inq. 2012b;23(3):267–270. 10.1080/1047840X.2012.699427
  • Ioannidis JP: Why most published research findings are false. PLoS Med. 2005;2(8):e124. 10.1371/journal.pmed.0020124
  • Ioannidis JP: Why Science Is Not Necessarily Self-Correcting. Perspect Psychol Sci. 2012a;7(6):645–654. 10.1177/1745691612464056
  • John LK, Loewenstein G, Prelec D: Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol Sci. 2012;23(5):524–32. 10.1177/0956797611430953
  • Khabsa M, Giles CL: The number of scholarly documents on the public web. PLoS One. 2014;9(5):e93949. 10.1371/journal.pone.0093949
  • Kmietowicz Z: Publishers withdraw 2500 journals from free access scheme in Bangladesh. BMJ. 2011;342:d196.
  • Knobloch J, Albiez EJ, Schmitz H: A serological survey on viral haemorrhagic fevers in Liberia. In Annales de l’Institut Pasteur/Virologie. Elsevier. 1982;133:125–128. 10.1016/S0769-2617(82)80028-2
  • Kousha K, Abdoli M: The citation impact of Open Access agricultural research: A comparison between OA and non-OA publications. Online Inform Rev. 2010;34(5):772–785. 10.1108/14684521011084618
  • Kriegeskorte N, Walther A, Deca D: An emerging consensus for open evaluation: 18 visions for the future of scientific publishing. Front Comput Neurosci. 2012;6:94. 10.3389/fncom.2012.00094
  • Kurtz MJ, Eichhorn G, Accomazzi A, et al.: The effect of use and access on citations. Inf Process Manag. 2005;41(6):1395–1402. 10.1016/j.ipm.2005.03.010
  • Kurtz MJ, Henneken EA: Open Access does not increase citations for research articles from The Astrophysical Journal. arXiv. 2007.
  • Laakso M, Björk BC: Anatomy of open access publishing: a study of longitudinal development and internal structure. BMC Med. 2012;10(1):124. ISSN 1741-7015. 10.1186/1741-7015-10-124
  • Laakso M, Welling P, Bukvova H, et al.: The development of open access journal publishing from 1993 to 2009. PLoS One. 2011;6(6):e20961. 10.1371/journal.pone.0020961
  • Lansingh VC, Carter MJ: Does open access in ophthalmology affect how articles are subsequently cited in research? Ophthalmology. 2009;116(8):1425–1431. 10.1016/j.ophtha.2008.12.052
  • Larivière V, Haustein S, Mongeon P: The Oligopoly of Academic Publishers in the Digital Era. PLoS One. 2015;10(6):e0127502. 10.1371/journal.pone.0127502
  • Lawrence S: Online or invisible? Nature. 2001;411(6837):521.
  • Lawson S, Meghreblian B: Journal subscription expenditure of UK higher education institutions [version 3; referees: 4 approved]. F1000Res. 2015;3(274). 10.12688/f1000research.5706.3
  • Lawson S: Fee waivers for open access journals. Publications. 2015;3(3):155–167. 10.3390/publications3030155
  • Leek JT, Taub MA, Pineda FJ: Cooperation between referees and authors increases peer review accuracy. PLoS One. 2011;6(11):e26895. 10.1371/journal.pone.0026895
  • Leitner F, Valencia A: A text-mining perspective on the requirements for electronically annotated abstracts. FEBS Lett. 2008;582(8):1178–81. 10.1016/j.febslet.2008.02.072
  • Liang X, Su LY, Yeo SK, et al.: Building buzz (scientists) communicating science in new media environments. Journal Mass Commun Q. 2014;91(4):772–791. 10.1177/1077699014550092
  • MacKie-Mason J: Economic thoughts about “gold” open access. 2016. Accessed: 2016-4-27.
  • Malhotra A, Younesi E, Gurulingappa H, et al.: ‘HypothesisFinder:’ a strategy for the detection of speculative statements in scientific text. PLoS Comput Biol. 2013;9(7):e1003117. 10.1371/journal.pcbi.1003117
  • Manyika J, Chui M, Groves P, et al.: Open data: Unlocking innovation and performance with liquid information. McKinsey global institute. McKinsey Center for Government and McKinsey Business Technology Office, 2013.
  • Martone M, Murray-Rust P, Molloy J, et al.: Contentmine/hypothes.is proposal. Research Ideas and Outcomes. 2016;2:e8424. 10.3897/rio.2.e8424
  • Matheka DM, Nderitu J, Mutonga D, et al.: Open access: academic publishing and its implications for knowledge equity in kenya. Global Health. 2014;10(1):26. 10.1186/1744-8603-10-26
  • Mazzucato M: The entrepreneurial state. Soundings. 2011;49(49):131–142. 10.3898/136266211798411183
  • McCabe M, Snyder CM: Identifying the effect of open access on citations using a panel of science journals. Econ Inq. 2014;52(4):1284–1300. 10.1111/ecin.12064
  • McGlynn T: The evolution of pseudojournals. Small Pond Science Dominguez Hills, CA, 2013.
  • McGuigan GS, Russel RD: The business of academic publishing: A strategic analysis of the academic journal publishing industry and its impact on the future of scholarly publishing. Electron Journal of Academic and Special Librarianship. 2008;9(3).
  • McKiernan E, Bourne PE, Brown CT, et al.: The open research value proposition: How sharing can help researchers succeed. Figshare. 2016;1. 10.6084/m9.figshare.1619902.v2
  • McVeigh ME: Open access journals in the ISI citation databases: analysis of impact factors and citation patterns: a citation study from Thomson Scientific. Thomson Scientific. 2004.
  • Metcalfe TS: The citation impact of digital preprint archives for solar physics papers. Sol Phys. 2006;239(1–2):549–553. 10.1007/s11207-006-0262-7
  • Metcalfe TS: The rise and citation impact of astroph in major journals. arXiv preprint astro-ph/0503519. 2005.
  • Moed H: The effect of “open access” upon citation impact: an analysis of arxiv’s condensed matter section. 2006.
  • Mohammadi E, Thelwall M, Haustein S, et al.: Who reads research articles? an altmetrics analysis of Mendeley user categories. J Assoc Inf Sci Technol. 2015;66(9):1832–1846. 10.1002/asi.23286
  • Müller-Langer F, Watt R: Copyright and open access for academic works. Review of Economic Research on Copyright Issues. 2010;7(1):45–65.
  • Natarajan J, Berrar D, Dubitzky W, et al.: Text mining of full-text journal articles combined with gene expression analysis reveals a relationship between sphingosine-1-phosphate and invasiveness of a glioblastoma cell line. BMC Bioinformatics. 2006;7(1):373. 10.1186/1471-2105-7-373
  • Neto SC, Willinsky J, Alperin J: Measuring, rating, supporting, and strengthening open access scholarly publishing in brazil. Educ Policy Anal Arch. 2016;24(0):54. ISSN 1068-2341. 10.14507/epaa.24.2391
  • Norris M, Oppenheim C, Rowland F: Open access citation rates and developing countries. In ELPUB. 2008;335–342.
  • Nosek BA, Bar-Anan Y: Scientific utopia: I. opening scientific communication. Psychol Inq. 2012; 23 ( 3 ):217–243. 10.1080/1047840X.2012.692215 [ CrossRef ] [ Google Scholar ]
  • Nuijten MB, Hartgerink CH, van Assen MA, et al.: The prevalence of statistical reporting errors in psychology (1985–2013). Behav Res Methods. 2015;1–22. ISSN 1554-351X, 1554-3528. 10.3758/s13428-015-0664-2 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Odlyzko A: Economic costs of toll access. Open Access: Key Strategic, Technical and Economic Aspects. 2006; 4 :39–43. 10.1016/B978-1-84334-203-8.50004-2 [ CrossRef ] [ Google Scholar ]
  • Odlyzko AM: Open Access, library and publisher competition, and the evolution of general commerce. CoRR. abs/1302.1105,2013. Reference Source [ PubMed ] [ Google Scholar ]
  • Okerson A: With feathers: Effects of copyright and ownership on scholarly publishing. Coll Res Libr. 1991; 52 ( 5 ):425–38. 10.5860/crl_52_05_425 [ CrossRef ] [ Google Scholar ]
  • Open Access NL: Amendment to copyright act .2015; Accessed: 2016-4-27. Reference Source [ Google Scholar ]
  • Open Science Collaboration: PSYCHOLOGY. Estimating the reproducibility of psychological science. Science. 2015; 349 ( 6251 ):aac4716. 10.1126/science.aac4716 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Packer AL: The SciELO Open Access: A Gold Way from the South. Can J High Educ. 2009; 39 ( 3 ):111–126. Reference Source [ Google Scholar ]
  • Piwowar H: Altmetrics: Value all research products. Nature. 2013; 493 ( 7431 ):159. 10.1038/493159a [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Piwowar HA, Day RS, Fridsma DB: Sharing detailed research data is associated with increased citation rate. PLoS One. 2007; 2 ( 3 ):e308. 10.1371/journal.pone.0000308 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Piwowar HA, Vision TJ: Data reuse and the open data citation advantage. PeerJ. 2013; 1 :e175. ISSN 2167-8359. 10.7717/peerj.175 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Pomerantz J, Peek R: Fifty shades of open. First Monday. 2016; 21 ( 5 ): ISSN 13960466. 10.5210/fm.v21i5.6360 [ CrossRef ] [ Google Scholar ]
  • Porter AL, Kongthon A, Lu JC: Research profiling: Improving the literature review. Scientometrics. 2002; 53 ( 3 ):351–370. 10.1023/A:1014873029258 [ CrossRef ] [ Google Scholar ]
  • Pöschl U: Multi-stage open peer review: scientific evaluation integrating the strengths of traditional peer review with the virtues of transparency and self-regulation. Front Comput Neurosci. 2012; 6 :33. 10.3389/fncom.2012.00033 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Research Information Network: Activities, costs and funding flows in the scholarly communications system in the UK. Technical report,2008. Reference Source [ Google Scholar ]
  • Rosenwald MS: This student put 50 million stolen research articles online. And they’re free .2016; Accessed: 2016-4-26. Reference Source [ Google Scholar ]
  • Sahu DK, Gogtay NJ, Bavdekar SB: Effect of open access on citation rates for a small biomedical journal . Fifth International Congress on Peer Review and Biomedical Publication, Chicago, September 16-18 ,2005. Reference Source [ Google Scholar ]
  • Satyanarayana K: Journal publishing: the changing landscape. Indian J Med Res. 2013; 138 ( 1 ):4–7. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Scheufen M: Copyright Versus Open Access: On the Organisation and International Political Economy of Access to Scientific Knowledge. Springer,2015. 10.1007/978-3-319-12739-2 [ CrossRef ] [ Google Scholar ]
  • Schimmer R, Geschuhn KK, Vogler A: Disrupting the subscription journals’ business model for the necessary large-scale transformation to open access .2015. 10.17617/1.3 [ CrossRef ] [ Google Scholar ]
  • Schroter S, Tite L, Smith R: Perceptions of open access publishing: interviews with journal authors. BMJ. 2005;330(7494):756. doi: 10.1136/bmj.38359.695220.82
  • Schroter S, Tite L: Open access publishing and author-pays business models: a survey of authors' knowledge and perceptions. J R Soc Med. 2006;99(3):141–148. doi: 10.1258/jrsm.99.3.141
  • Schwarz GJ, Kennicutt RC, Jr: Demographic and citation trends in Astrophysical Journal papers and preprints. arXiv preprint astro-ph/0411275, 2004.
  • Shatkay H, Pan F, Rzhetsky A, et al.: Multi-dimensional classification of biomedical text: toward automated, practical provision of high-utility text to diverse users. Bioinformatics. 2008;24(18):2086–2093. doi: 10.1093/bioinformatics/btn381
  • Shavell S: Should copyright of academic works be abolished? J Legal Analysis. 2010;2(1):301–358. doi: 10.1093/jla/2.1.301
  • Shen C, Björk BC: ‘Predatory’ open access: a longitudinal study of article volumes and market characteristics. BMC Med. 2015;13(1):230. doi: 10.1186/s12916-015-0469-2
  • Shieber S, Suber P: Good practices for university open-access policies. 2016; Accessed: 2016-9-12.
  • Simmons JP, Nelson LD, Simonsohn U: False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol Sci. 2011;22(11):1359–66. doi: 10.1177/0956797611417632
  • Solomon DJ, Björk BC: A study of open access journals using article processing charges. J Am Soc Inf Sci Technol. 2012;63(8):1485–1495. doi: 10.1002/asi.22673
  • Solomon DJ, Björk BC: Publication fees in open access publishing: Sources of funding and factors influencing choice of journal. J Am Soc Inf Sci Technol. 2012;63(1):98–107. doi: 10.1002/asi.21660
  • Stodden V: Open science: policy implications for the evolving phenomenon of user-led scientific innovation. Journal of Science Communication. 2010;9(1):A05.
  • Stott A: Open data for economic growth. Washington DC: World Bank, 2014.
  • Suber P: Open Access. MIT Press, Cambridge, Mass, 2012.
  • Sutton C: Is free inevitable in scholarly communication? the economics of open access. College & Research Libraries News. 2011;72(11):642–645.
  • Swan A, Brown S: Authors and open access publishing. Learn Publ. 2004;17(3):219–224. doi: 10.1087/095315104323159649
  • Swan A, Brown S: Open access self-archiving: An author study. 2005.
  • Swan A: The open access citation advantage: Studies and results to date. 2010.
  • Swanson DR: Two medical literatures that are logically but not bibliographically connected. J Am Soc Inf Sci. 1987;38(4):228–233. doi: 10.1002/(SICI)1097-4571(198707)38:4<228::AID-ASI2>3.0.CO;2-G
  • Tate D: Open access and research assessment: Dealing with UK open access requirements in practice. 2015;58–62. doi: 10.3233/978-1-61499-562-3-58
  • Turner JA, Laird AR: The cognitive paradigm ontology: design and application. Neuroinformatics. 2012;10(1):57–66. doi: 10.1007/s12021-011-9126-x
  • United Nations: Universal declaration of human rights. 1948.
  • van Assen MA, van Aert RC, Nuijten MB, et al.: Why publishing everything is more effective than selective publishing of statistically significant results. PLoS One. 2014;9(1):e84896. doi: 10.1371/journal.pone.0084896
  • Van Noorden R: Elsevier opens its papers to text-mining. Nature. 2014;506(7486):17. doi: 10.1038/506017a
  • Van Noorden R: Open access: The true cost of science publishing. Nature. 2013;495(7442):426–429. doi: 10.1038/495426a
  • Van Noorden R: Trouble at the text mine. Nature. 2012;483(7388):134–135. doi: 10.1038/483134a
  • van Rooyen S, Godlee F, Evans S, et al.: Effect of open peer review on quality of reviews and on reviewers' recommendations: a randomised trial. BMJ. 1999;318(7175):23–27. doi: 10.1136/bmj.318.7175.23
  • Vanclay JK: Factors affecting citation rates in environmental science. J Informetr. 2013;7(2):265–271. doi: 10.1016/j.joi.2012.11.009
  • Veletsianos G, Kimmons R: Assumptions and challenges of open scholarship. The International Review of Research in Open and Distributed Learning. 2012;13(4):166–189.
  • Vickery G: Review of recent studies on PSI re-use and related market developments. Information Economics. 2011.
  • Vincent-Lamarre P, Boivin J, Gargouri Y, et al.: Estimating open access mandate effectiveness: The MELIBEA score. J Assoc Inf Sci Technol. 2016. doi: 10.1002/asi.23601
  • Wagenmakers EJ, Wetzels R, Borsboom D, et al.: An agenda for purely confirmatory research. Perspect Psychol Sci. 2012;7(6):632–638. doi: 10.1177/1745691612463078
  • Wagner B: Open access citation advantage: An annotated bibliography. Issues Sci Technol Librarianship. 2010;(60):2. doi: 10.5062/F4Q81B0W
  • Walters WH, Linvill A: Characteristics of open access journals in six subject areas. Coll Res Libr. 2011;72(4):372–392. doi: 10.5860/crl-132
  • Wang X, Liu C, Mao W, et al.: The open access advantage considering citation, article usage and social media attention. Scientometrics. 2015;103(2):555–564. doi: 10.1007/s11192-015-1547-0
  • Watson M: When will ‘open science’ become simply ‘science’? Genome Biol. 2015;16(1):101, ISSN 1465-6906. doi: 10.1186/s13059-015-0669-2
  • Wicherts JM, Bakker M, Molenaar D: Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLoS One. 2011;6(11):e26828. doi: 10.1371/journal.pone.0026828
  • Wicherts JM: Peer Review Quality and Transparency of the Peer-Review Process in Open Access and Subscription Journals. PLoS One. 2016;11(1):e0147913. doi: 10.1371/journal.pone.0147913
  • Willinsky J: Copyright contradictions in scholarly publishing. First Monday. 2002;7(11). doi: 10.5210/fm.v7i11.1006
  • Willinsky J: The Access Principle: The Case for Open Access to Research and Scholarship. MIT Press, Cambridge, Mass, 2006.
  • Wohlrabe K, Birkmeier D: Do open access articles in economics have a citation advantage? Munich Personal RePEc Archive. 2014.
  • Xia J, Harmon JL, Connolly KG, et al.: Who publishes in “predatory” journals? J Assoc Inf Sci Technol. 2015;66(7):1406–1417. doi: 10.1002/asi.23265
  • Xia J: Predatory journals and their article publishing charges. Learn Publ. 2015;28(1):69–74. doi: 10.1087/20150111
  • Xu L, Liu J, Fang Q: Analysis on open access citation advantage: an empirical study based on Oxford Open journals. In Proceedings of the 2011 iConference. ACM, 2011;426–432. doi: 10.1145/1940761.1940819
  • Zhang Y: The effect of open access on citation impact: a comparison study based on web citation analysis. Libri. 2006;56(3):145–156. doi: 10.1515/LIBR.2006.145
  • Zhao D: Characteristics and impact of grant-funded research: a case study of the library and information science field. Scientometrics. 2010;84(2):293–306. doi: 10.1007/s11192-010-0191-y
  • Zuccala A: Open access and civic scientific information literacy. Information Research: An International Electronic Journal. 2010;15(1).

Referee response for version 3

Peter Suber

1 Berkman Center for Internet & Society, Harvard University, Cambridge, MA, USA

I'm commenting on this sentence from the first paragraph of Version 3 of the article:

The Green route is also enabled through author rights retention, in which authors pre-emptively grant non-exclusive rights to their institutions before publishing any works. The institution then has the ability to make articles by these authors OA without seeking permission from the publishers (e.g., this is the case of the Dutch Taverne amendment that has declared self-archival of research after ‘a reasonable period of time’ a legal right (Open Access NL, 2015)).

The authors added the bulk of this at my request in an earlier comment. I'm glad they did, but it's still only partially true. It's true that universities can adopt rights-retention OA policies that make it unnecessary to seek permission from publishers. But it's not true that the Dutch Taverne amendment is an example. It's not a university policy, but legislation. It's legislation that gives authors permission for green OA regardless of the contracts they signed, and regardless of the rights they might have retained through a university policy. It's a very good idea and I recommend it everywhere. (There is already a similar law in Germany, and one is now emerging in France.) But what the authors need here is an example of a university rights-retention OA policy, or a thorough explanation of this kind of OA policy.

I'm not seeking a citation to my own university's activity, though it adopted the first OA policies of this kind. Nor am I seeking a citation to the good-practices guide that I maintain with Stuart Shieber, though it's the standard reference on this kind of policy. In fact, I was reluctant to follow up the authors' response to my prior suggestion because I didn't want to appear to seek additional citations. 

But since I've gone this far, I'll mention these two sources anyway:

The Harvard open-access policies

https://osc.hul.harvard.edu/policies/

Good practices for university open-access policies

http://bit.ly/goodoa  

Of the two, I'd recommend the second in this situation. But even if the authors include no citation on this point, at least they should stop citing the Dutch law as an example, and treat it separately as another path to the same goal.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Referee response for version 2

Gwilym Lockwood

1 Neurobiology of Language Department, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands

I have no further comments; my points or concerns have been addressed, and other issues are highlighted in further depth by other reviewers.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

In the first version I reviewed, you said, "This [green] route is dependent on journal or publisher policies on self-archiving (sherpa.ac.uk/romeo)."

That was untrue or incomplete and I offered this comment: It overlooks rights retention. Some individual authors retain enough rights to authorize green OA on their own. While this may be fairly rare, rights-retention OA policies at universities are increasingly common. More than 80 institutions in North America, Europe, Africa, and Asia have now adopted rights-retention OA policies. Under these policies, the faculty grant non-exclusive rights to their institution before they sign future publishing contracts. The institution then has permission to make those future articles OA without having to seek permission from publishers. (The institutions also grant the same non-exclusive rights back to authors.) For more detail on rights-retention OA policies, see Stuart Shieber and Peter Suber, "Good Practices for University OA Policies."

http://bit.ly/goodoa

You revised the text in a way that missed my point and misstated my position: "While academics themselves may have little power in debates regarding copyright, institutes could claim ownership of the work they likely already own by invoking their rights under the work made-for-hire doctrine (Denicola, 2006). However, it is difficult to imagine researchers favoring university-held rather than journal-held copyright, and a system of non-exclusive rights is preferred, as is reflected in OA policies and OA journals (Suber, 2012)."

Here's the main point: More than 80 universities around the world have adopted rights-retention OA policies. These policies are adopted by faculty votes, not administrative edicts. At these institutions the rights needed to authorize OA are not seized from faculty by the institution, or claimed by the institution through work-for-hire. These policies presuppose that the rights initially belong to faculty, not the institution. If the institution is to exercise them, faculty must voluntarily grant them. (There are details we needn't go into here, for example, that we're only talking about non-exclusive rights, and that these policies generally include waiver options when faculty don't want the institution to have rights to a given work.) At Harvard, which pioneered this type of policy, four of the school-level votes were unanimous. In short, it's not at all "difficult to imagine researchers favoring university-held...copyright." On the contrary, it's easy to imagine and widely attested. Green OA does not always depend on permission from publishers. Increasingly it depends on rights retention by authors, through carefully drafted and widely supported university policies. That's a fact. My opinion is that that's a good thing. 

You needn't share my opinion, and needn't mention the fact. But please don't misrepresent the fact or reverse my opinion.

Chris Chambers

1 School of Psychology, Cardiff University, Cardiff, UK

All my suggestions have been addressed. I'm very happy to approve this interesting and useful addition to the literature.

Referee response for version 1

Tennant et al. offer a timely and insightful review of the various effects of open access publishing on science and society. The paper is well structured and enjoyable to read. Although I am not an expert on open access publishing, I also found the discussion of the literature quite balanced and evidence-based.

I have just three recommendations for revisions:

  • In the discussion of the OA citation advantage (which is excellent), the authors are very careful to avoid asserting a causal link between the OA status of a paper and the number of citations it generates. However, in my view, the conclusion of the Gargouri (2010) paper should be caveated. While the results of Gargouri are consistent with the absence of a self-selection bias in producing the OA advantage, they cannot rule it out. It might be the case that funders or institutions that mandate OA are also those that are more likely to support/host higher impact research. Furthermore, given that OA mandates are never 100% effective, perhaps authors are more likely to comply with a mandate for work they perceive to be of high impact. A useful addition to this section would be to specify the conditions under which we could determine whether or not OA causally influences citations. This would require a randomised controlled trial in which articles are randomly assigned to OA and non-OA routes. To my knowledge, no such trial has yet been undertaken, although the authors will be in a more informed position to know whether this is the case.
  • There is not much discussion in the paper of the distinction between full OA and hybrid OA. Given that the APCs for hybrid OA articles tend to be substantially higher than those for full OA articles, this may warrant more prominent coverage in the economic case for OA. It is also relevant to the brief mention of the Finch Report, which (controversially) favours gold OA (including the hybrid route) over green OA.
  • I was glad to see the link formed between OA and open science more generally, as many researchers and advocates draw a distinct (and somewhat arbitrary) line between these initiatives. In this context, it would perhaps be relevant to mention the TOP guidelines ( https://cos.io/top/ ). TOP is very much an evolving entity so it would be interesting to consider the inclusion of an OA standard in future revisions of TOP.

p13 This sentence is difficult to parse: "Whereas this is hyper-variable, and strongly dependent on a range of factors, it is the fact that any public interest in science that is of importance."

Anne Tierney

1 Department of Learning and Teaching Enhancement, Edinburgh Napier University, Edinburgh, UK

This paper is a comprehensive review of the complexities of OA. I have come late to the discussion on this paper, and I find that the previous reviewers have been meticulous in their critique of the paper, to the point that I have very little to add. However, there are a couple of points for consideration. What is the effect (if any) of the UK Research Excellence Framework on Open Access? To what extent is disciplinarity a factor in Open Access? I ask this question because of the high impact of the sciences and biomedical research, but wonder about Arts and Humanities (and other areas) as a comparison. As one of the other reviewers said, "this doesn't consider the fact that the prestige of some journals is advertising in and of itself." While this is true of scientific journals, the same can't be said, for example, of education journals, so there is a lack of parity between disciplines. There was also an assumption of the willingness of reviewers to continue to offer their services freely. This aspect of OA (and subscription-based) publishing is hardly ever critiqued, but is assumed to be part of the process.

All in all, this paper gives a lot of food for thought. I don't expect a rewrite of the paper, based on my comments, but I would welcome further discussion on where the authors (and readers) see OA going in the future.

Paige Brown Jarreau

1 Manship School of Mass Communication, Louisiana State University, Baton Rouge, LA, USA

This is an interesting and timely review of the issue of open access to scientific literature.

The two other reviewers have highlighted specific issues that should be addressed in the revision of this article, and I agree with these issues. I've added other notes below. I think this review article would benefit from a re-write to correct potentially biased language in support of open access and to round out the review with further evidence of open access impacts on citation rates, altmetrics, scientific literacy / public engagement and research quality. 

The authors cite “fostering a culture of greater scientific literacy” as a benefit of open access. While this is theoretically a benefit, has more or less open access in particular scientific fields been tied to greater or lesser scientific literacy in those areas? Is this potential benefit supported by research literature? The authors should be clear on what the evidence-based benefits of open access are, and also what the potential drawbacks are. References to related research should be provided on this topic.

The structure the authors use for laying out their evidence and the language they use (e.g. “[the] case for Open Access”) appear to lean more toward the positive impacts / benefits of open access from the outset. The authors should be very careful to review the evidence first before making value-based statements or arguments about open access, even if the evidence-based benefits outweigh any potential drawbacks, or lack of significant benefits, in the end.

Correct the typo in the following sentence: “[In] A longitudinal study Eysenbach (2006) compared…”

The authors write: “One alternative explanation for the citation advantage could be that researchers choose to publish OA when a finding is more impactful, but empirical evidence contradicts this selection effect. Gargouri et al. (2010) compared citation counts within a cohort of OA articles that had either been self-selected as OA or mandated as OA (e.g., by funders). The study concluded that both were cited significantly more than non-OA articles. As such, these findings rule out a selection bias from authors as the cause for the citation advantage (Gargouri et al., 2010).” However, couldn’t funded research also have a tendency to be considered “more impactful,” because it was chosen in the first place to be funded and mandated as OA? The authors should discuss this, and whether there is any research that experimentally investigates whether open access provides a citation advantage. This could perhaps be suggested as future research. The authors should also discuss how/why some studies have found no citation advantage for OA papers.

Related to social media mentions of research papers and citation counts, the authors might also consider citing Liang, X., Su, L. Y. F., Yeo, S. K., Scheufele, D. A., Brossard, D., Xenos, M., ... & Corley, E. A. (2014). Building Buzz (Scientists) Communicating Science in New Media Environments. Journalism & Mass Communication Quarterly, 1077699014550092.

In discussing the economics of OA, the authors should also discuss any evidence of potential drawbacks for various stakeholders, such as where funds for pay-to-publish fees will come from and how these fees may affect individual researchers. Pay-to-publish models of OA may also burden early career researchers and researchers working in fields where research grants are more difficult to obtain. 

The authors do not discuss the potential impact of OA on research quality or reproducibility (reproducibility is only mentioned in the context of open data). As this has been a controversial issue in the past (e.g. the mentioned 'sting' operations) the authors should discuss any research that has investigated the impact of open access on the rigor of peer reviews, research quality, presence of replication studies / reproducibility, etc. There has also been some discussion of whether open peer review (like that used by F1000Research) affects the quality of reviews, e.g. (Rooyen et al. 1999). The authors should mention this and/or subsequent literature when addressing open peer review.

There have also been studies on scientists' / journal article authors' perceptions and attitudes toward open access, e.g. Schroter and Tite (2005; 2006). The authors might consider summarizing some of this research, as it gives context to some of the existing barriers to open access and perceived drawbacks among researchers.

In summary, the topic of this review is important and timely. However, this paper falls short of what I would expect from a systematic review in terms of systematically summarizing previous research findings related to the impact of open access on scientific publishing, public engagement with science, science literacy and altmetrics. The authors should be careful to hold back value judgements / arguments related to the case for open access until having systematically reviewed the evidence-based benefits, drawbacks, and/or lack of significant benefits. The authors should also provide some discussion of how we might objectively weigh any evidence-based benefits with potential drawbacks for various stakeholders including researchers and especially early career researchers. The authors should avoid summarizing mostly the findings of previous studies that find positive impacts of open access on the various domains of potential impact they consider in their paper. The evidence already presented in the paper is rigorous and detailed. However, I would recommend a revision that rounds this review out with more systematic evidence.

The article is very well-done, unusually thorough and detailed. Here are a few ways to improve it.

When I refer to page numbers, I mean the page numbers in the PDF of v1, April 11, 2016.

http://f1000research.com/articles/5-632/v1

"You" refers to the authors.

Apologies in advance if I sometimes cite my own work in these comments. 

p. 1. In the abstract you say, "The economic case for Open Access is less well-understood, although it is clear that access to the research literature is key for innovative enterprises, and a range of governmental and non-governmental services." 

This understates the economic case. For example, some subscription journals convert to OA precisely for economic benefits.

See the preliminary version of David Solomon, Bo-Christer Björk, and Mikael Laakso, "Converting Scholarly Journals to Open Access: A Review of Approaches and Experiences" now open for public comment. (The final version will be published this summer.)

https://osc.hul.harvard.edu/programs/journal-flipping/public-consultation/

See especially section 4.6, "Increased revenue and financial viability."

https://osc.hul.harvard.edu/programs/journal-flipping/public-consultation/4/6/

p. 3. You say, "The Green route refers to author self-archiving, in which a version of the peer-reviewed article is posted online to a repository or website."

Green OA also applies to preprints, which are not peer-reviewed.

p. 3. You say, "This [green] route is dependent on journal or publisher policies on self-archiving (sherpa.ac.uk/romeo)."

This is importantly incomplete. It overlooks rights retention. Some individual authors retain enough rights to authorize green OA on their own. While this may be fairly rare, rights-retention OA policies at universities are increasingly common. More than 80 institutions in North America, Europe, Africa, and Asia have now adopted rights-retention OA policies. Under these policies, the faculty grant non-exclusive rights to their institution before they sign future publishing contracts. The institution then has permission to make those future articles OA without having to seek permission from publishers. (The institutions also grant the same non-exclusive rights back to authors.)

For more detail on rights-retention OA policies, see Stuart Shieber and Peter Suber, "Good Practices for University OA Policies." 

p. 3. You say, "A subscription to all peer-reviewed journals is not affordable for any single individual, research institute or university (Odlyzko, 2006)." 

This is true and important, but it's a pity you don't cite more recent evidence than 2006. 

An important kind of evidence for this proposition is that not even Harvard University can afford all the journals needed by its faculty and students, and must cancel journals every year for budgetary reasons alone. I've collected seven public statements from Harvard to this effect (2008-2012) in the supplements to p. 30 of my 2012 book (Open Access, MIT Press, 2012).

http://bit.ly/oa-book#p30.2

p. 3. You say, "Much of the driving force behind this global change has been through a combination of direct, grassroots advocacy initiatives in conjunction with policy changes from funders and governments."

Please add *university policies* to this list. They're on a par with funder policies in importance, and they're far more numerous. ROARMAP shows that 7+ times more universities have OA policies than funders. 

p. 3. You say, "The Open Access movement is intrinsically tied to the development of the Internet and how it redefined communication and publishing (Laakso et al., 2011)."

For more documentation on how the OA movement arose as soon as the internet arose, see my Timeline of the Open Access Movement.

http://legacy.earlham.edu/~peters/fos/timeline.htm

In 2009, I moved the timeline to the Open Access Directory wiki, and you should probably cite that version:

Timeline of the open access movement...

http://oad.simmons.edu/oadwiki/Timeline

...especially subsection on developments before 2000.

http://oad.simmons.edu/oadwiki/Timeline_before_2000

(You might cite this timeline again on p. 5, when you introduce your own timeline.)

p. 3. You say, "One result of the growing OA movement is the rise of OA-only publishers...."

Somewhere in this paragraph, I'd mention that some OA publishers are for-profit (e.g. BMC) and some are non-profit (e.g. PLoS).

p. 4. Your section on the impact advantage is very well-done. Most treatments are much briefer, less careful, and less detailed than yours.

I have just these suggestions. You cite authors of individual studies, and Alma Swan's 2010 literature review. But you don't cite the mother lode of literature on this topic: Steve Hitchcock's annotated bibliography, "The effect of open access and downloads ('hits') on citation impact: a bibliography of studies."

http://eprints.soton.ac.uk/354006/1/oacitation-biblio-snapshot0613.html

Or more precisely, you cite it once, 10 paragraphs before the section on the impact advantage begins. You should cite it again within the section on the impact advantage. You should mention that it's comprehensive and annotated.

Hitchcock stopped updating it in 2013. But you should mention that SPARC Europe has committed to update it through its Open Access Citation Advantage Service.

http://sparceurope.org/oaca/

Finally, in the same place where you cite Swan's literature review, you should cite Ben Wagner's literature review, "Open Access Citation Advantage: An Annotated Bibliography," Issues in Science and Technology Librarianship, Winter 2010.

http://www.istl.org/10-winter/article2.html

p. 5. In the timeline entry for 2002, the BOAI was released on February 14, not January 14.

p. 6. In the timeline entry for 2013, I'd say that the suicide of Aaron Swartz "increases" (not "gains") international attention for the OA movement, or "draws new attention" to the OA movement. The current language suggests that the OA movement didn't have international attention before that, which is very far from the truth.

p. 9. You say, "Shifting copyright to stay with the author allows for wider re-use, including TDM, and forms the basis for a robust and developing public domain." 

You shouldn't use "public domain" here. In copyright law, the term has a specific meaning which you don't mean here.

p. 10. You say, "Only recently has any transparency into the detailed costs of subscriptions been gained by using Freedom of Information Requests to bypass non-disclosure agreements between libraries and publishers (Lawson & Meghreblian, 2015)."

Here you overlook the earlier Big Deal Contract Project in the US, from Ted Bergstrom, Paul Courant, and Preston McAfee. It too used public records laws and Freedom of Information requests. I'm not sure when it launched, but it was before 2009.

http://www.econ.ucsb.edu/~tedb/Journals/BundleContracts.html

p. 10. You say, "The average production cost for one paper is estimated to be around $3500–$4000 (Van Noorden, 2013)."

I've seen dozens of widely varying estimates of this cost, most of them much lower than Van Noorden's. Unfortunately I don't have time to hunt them down. I hope you can introduce at least a few more, if only to show that estimates differ widely here.

p. 10. You say, "Philip Campbell (Editor-in-Chief of Nature) stated that his journal’s internal costs were at $20,000–$30,000 per paper...."

To clarify, I think he meant that this was the cost per published paper. If Nature rejects x articles for every one it publishes, then this includes the cost of peer reviewing x rejected articles. Since Nature is very selective, x is high. But this "cost per published paper" should not be compared to costs for peer-reviewing a single paper or the production costs of publishing an accepted paper.
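
The distinction can be made concrete with a small calculation. The sketch below is only an illustration in Python: the function name, its parameters and every figure in it are hypothetical, not Nature's actual costs, and it assumes that each rejected submission incurs the same review cost as an accepted one.

```python
def cost_per_published_paper(review_cost, production_cost, rejected_per_accepted):
    """Cost attributed to one published paper when the journal also bears
    the review cost of the submissions it rejects.

    review_cost           -- hypothetical cost of reviewing one submission
    production_cost       -- hypothetical cost of producing one accepted paper
    rejected_per_accepted -- the "x" above: rejections per published paper
    """
    submissions_reviewed = 1 + rejected_per_accepted
    return review_cost * submissions_reviewed + production_cost


# Made-up inputs: same per-paper costs, different selectivity.
selective = cost_per_published_paper(1000, 2000, rejected_per_accepted=10)
unselective = cost_per_published_paper(1000, 2000, rejected_per_accepted=0)
print(selective, unselective)
```

With these made-up inputs, the journal that rejects ten submissions per accepted paper carries 13000 per published paper, against 3000 for the journal that rejects none, even though the cost of handling any single paper is identical in both.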

p. 10. You say, "OA publishing is most prevalent in the form of ‘pay-to-publish’...."

This is either false or misleading. About 70% of peer-reviewed OA journals charge no APCs at all. In that sense, the fee-based model is not the most prevalent. It's a minority model. On the other hand, about 50% of the articles published in peer-reviewed OA journals are published in the fee-based variety. 
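
Both figures can hold at once because fee-based journals may publish more articles per journal. The following minimal Python sketch shows this with a toy set of ten journals. The `Journal` type, the helper names and the article counts are all assumptions chosen to reproduce the two percentages; they are not DOAJ data.

```python
from dataclasses import dataclass


@dataclass
class Journal:
    charges_apc: bool        # True for a fee-based journal
    articles_per_year: int   # hypothetical output


def no_fee_journal_share(journals):
    """Fraction of journals that charge no APC."""
    return sum(not j.charges_apc for j in journals) / len(journals)


def fee_based_article_share(journals):
    """Fraction of all articles that appear in fee-based journals."""
    total = sum(j.articles_per_year for j in journals)
    fee_based = sum(j.articles_per_year for j in journals if j.charges_apc)
    return fee_based / total


# Toy data: seven small no-fee journals, three larger fee-based ones.
journals = [Journal(False, 30)] * 7 + [Journal(True, 70)] * 3
print(no_fee_journal_share(journals))     # share of journals with no fee
print(fee_based_article_share(journals))  # share of articles in fee-based journals
```

In this toy set, 70% of the journals are no-fee while 50% of the articles still appear in fee-based journals, so "most prevalent" depends on whether one counts journals or articles.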

On my claim that most OA journals charge no APCs: 

See my article, "Good facts, bad predictions," SPARC Open Access Newsletter, June 2006.

https://dash.harvard.edu/handle/1/4391309

And my article, "No-fee open-access journals," SPARC Open Access Newsletter, November 2, 2006. https://dash.harvard.edu/handle/1/4552050

The DOAJ used to make it easy to see what percentage of listed journals were fee-based and what percentage were no-fee. But it has temporarily made that difficult by combining the categories of "no-fee journals" and "journals for which we don't have enough information to say."

On my claim that about half the articles published in peer-reviewed OA journals are published in the fee-based variety, see the updates to p. 170 of my 2012 book. There I cite three studies and quote the relevant excerpts.

http://bit.ly/oa-book#p170

p. 10. When you describe ways in which fee-based OA journals mitigate some problems arising from the model, you mention the firewall between the editorial and business side of the journal (good), and you mention fee waivers (good).

You should also mention fee discounts, which many journals give in lieu of fee waivers. 

You should also mention that most fees charged by fee-based journals are paid by funders (59%), or the author's employer (24%), and only 12% are paid by authors themselves. These numbers are from Suenje Dallmeier-Tiessen et al., "Highlights from the SOAP project survey. What Scientists Think about Open Access Publishing," arXiv, January 29, 2011, Table 4.

http://arxiv.org/abs/1101.5260

p. 12. You say, "In his article Sutton (2011)..."

Please change "his" to "her". The article is by Caroline Sutton.

p. 12. You say, "While OA is not a solution to all aspects of research accessibility (e.g., language barriers and disability access remain continuing issues to be addressed)...."

See my 2012 book (Open Access, MIT Press, 2012, http://bit.ly/oa-book ), at pp. 26-27, where I make much the same point. "OA isn't universal access" and by itself doesn't overcome "filtering and censorship barriers", "language barriers", "handicap access barriers", or "connectivity barriers".

p. 13. You refer to "the fact that access to knowledge is actively prohibited in fields like public health...."

I don't know what you mean here by "actively prohibited".

p. 13. You say, "Some traditional publishers, and some academics, have argued that public access to research is not required because research papers cannot be understood by non-specialists...."

Here you might want to cite Section 5.5.1 ("OA for Lay Readers," pp. 115-119) of my 2012 book.

p. 13. You say, "The shift from a ‘reader pays’ to an ‘author pays’ mode...."

I recommend avoiding the term "author pays" for the reasons I gave in my fourth comment to p. 10 above. Most OA journals don't charge author-side fees, and among those that do, most fees are not paid by authors.

p. 13. You say, "This has been at least partially mitigated with waiver fees for authors from developing countries and additional provisions in research grants...."

Yes. But again, don't forget that the majority of peer-reviewed OA journals are no-fee journals. See my fourth comment to p. 10 above.

pp. 15-15. You say, "Fortunately, it seems that funders and research organisations are moving in that direction. Since 2005, the number policies supporting OA publishing increased steadily. Consequently, it is now the responsibility of researchers to ensure OA to their publications either by choosing the green or the gold road."

Since you're recapitulating some grounds for optimism here, I'd also reiterate the growth of rights-retention OA policies. See my second comment on p. 3 above. 

p. 16. You say, "As Peter Suber, a leading voice in the OA movement, stated: 'As long as they do not have the power to stop Open Access, the toll-access publishers are not the enemy'."

Thanks for quoting me. I wanted to give you the source to cite. Unfortunately, I don't think you're using an exact quote. Here's the closest one I can find:

"TA [toll-access] publishers are not the enemy. They are only unpersuaded. Even when they are opposed, and not merely unpersuaded, they are only enemies if they have the power to stop OA. No publisher has this power, or at least not by virtue of publishing under a TA business model. If we have enemies, they are those who can obstruct progress to OA. The only people who fit this description are friends of OA who are distracted from providing OA by other work or other priorities."

It's from "Two distractions," SPARC Open Access Newsletter, May 3, 2004.

https://dash.harvard.edu/handle/1/4391169

Jon Tennant

Imperial College London, UK

Hi Alexander,

I've been looking into this, and the most up-to-date statistics for this based on the DOAJ come from this source:  http://citesandinsights.info/civ16i4.pdf . Figures here seem to suggest that 71% of journals in the DOAJ do not levy an APC. This is likely to change slightly with the updated 'crackdown' from the DOAJ ( http://www.nature.com/news/open-access-index-delists-thousands-of-journals-1.19871 ), but I'll add a reference to this in.

Harvard University, USA

Here are the latest stats from the DOAJ (May 24, 2016).

https://goo.gl/LejTAw

Dear Alexander,

I'm just posting the link that Peter was kind enough to send us yesterday, with updated statistics on this matter:  https://plus.google.com/+PeterSuber/posts/HjrRDcrZS8p

Important points:

Here are the numbers as of May 24, 2016:

Total number of journals listed in DOAJ = 8,858

Yes (fee-based) = 1,424 = 16%

No (no-fee) = 2,601 = 29%

No info = 4,833 = 55%
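
The percentages above can be re-derived from the raw counts. The following is a minimal sketch, not part of the original post: the variable and function names are illustrative, and the final figure rests on the added assumption that journals with no fee information are simply set aside.

```python
# Counts as quoted above for the DOAJ on May 24, 2016.
counts = {"fee-based": 1424, "no-fee": 2601, "no info": 4833}
total = sum(counts.values())


def share(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Share of each category among all listed journals.
overall = {label: share(n, total) for label, n in counts.items()}

# Assumption for illustration only: ignore the "no info" journals and
# look at the split among journals whose fee status is known.
known = counts["fee-based"] + counts["no-fee"]
no_fee_among_known = share(counts["no-fee"], known)

print(total)
print(overall)
print(no_fee_among_known)
```

Under that assumption, the no-fee share among journals with known status is well above the 29% headline figure, which is why combining "no-fee" with "no info" makes the comparison hard to read.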

Further references and information are provided in that post, and will be integrated into the next version of this manuscript.

This is the first time I've written an open peer review, although I always sign private peer reviews. Normally I'd make comments directed to the authors and the authors alone, but since this is open, I've also included a section for other readers of this paper. This may sound a bit like an Amazon or Airbnb review or something.

Short summary for readers

This is an excellent paper about the academic, economic, and societal benefits and impacts of Open Access. It's a good introductory text for people who don't know much about OA and would like to know more. It's also a good persuasive text for stakeholders in policy, universities, publishing, funding, etc. positions who may be interested in including OA in their decision making.

In addition to its attention to detail, its main strengths are its focus, its brevity, and its relative impartiality.

One of the difficulties with writing about OA is that there are so many overlapping issues; this paper is very good at giving a brief overview or description of the other issues, pointing the reader in the direction of somewhere with more information, and then getting back onto the topic.

Another thing about OA is that its advocates are very passionate about it. As with any cause, that's a good thing for its supporters, but overwhelmingly pro-OA resources can seem potentially off-putting to neutrals. This paper does an excellent job of presenting an evidence-based pro-OA viewpoint in a measured tone and without coming across as ideological.

One possible caveat is that the paper presents extensive evidence of what OA does, but it doesn't tackle the meatier issue of how to implement it successfully. However, I feel that's a separate issue which is beyond the scope and purpose of this paper.

Suggestions and comments for authors

First of all, great article! Well done and thank you for pulling together what is a disparate collection of links and literature into a one-stop shop which is both useful and coherent. I like this article a lot... but my role here is to criticise and make it better, so the rest of this review will focus on that.

This article is well-written and well-structured. That's made it much easier as a reviewer to simply go through the article and highlight my issues with it paragraph by paragraph, rather than having to make it coherent first and then sort out the smaller things.

The vast majority of the issues I have with this paper are minor ones, so it didn't make sense to have separate major/minor sections; rather, I'll just go through them in order in the text.

(I printed this out to underline/comment on, so for me, tables 1 and 2 came during the academic case for OA section. Online, they're supplementary materials, and I think it's best that way, but this is why I'm commenting on the tables during that section)

"We recommend that OA supporters focus their efforts on working to establish viable new models and systems of scholarly communication, rather than trying to undermine the existing ones..."

In general, I agree with this sentiment. However, I feel that its inclusion in the abstract is a bit jarring, as the text of the article doesn't really cover recommendations to OA supporters at all, other than in the very last paragraph. I think that's good, as I feel this paper is best suited as a relatively neutral source of information rather than as preaching to the converted or as an ideological discussion. So, I think this part can be left out of the abstract; it doesn't refer to any particular "recommendations to OA supporters" section in the text, and it potentially undermines one of the paper's strengths, its relative impartiality.

A brief history of OA

"BioMed Central and ... PLOS were founded in the early 2000s and remain successful businesses to date." (p3, col2)

Technically, PLOS is a non-profit. I suggest changing successful businesses to successful business models. This both highlights the financial sustainability of OA (increased APCs at PLOS notwithstanding) and also sets it apart from traditional publishers, which are definitely successful businesses.

The academic case for OA

figure 1 (p4, top)

I have difficulty interpreting the y-axis on figure 1. It's labelled as cumulative number of PubMed articles relative to 2000, but I'm not sure how to read it. Reading off 2014, non-OA is c.22 on the y-axis, and OA is c.33 on the y-axis. Based on the figure 1 caption about the ratio, I'm interpreting this as meaning that, in 2014, the ratio of cumulative PubMed articles was approx 33:22 OA to non-OA, or in other words, 60% of PubMed articles in 2000-2014 were OA. However, I'm not sure if this is how it's meant to be interpreted. I think that it's well visualised, and really makes it clear how OA has taken off, but exactly what the numbers represent on the y-axis is unclear to me: number of articles? number of times more articles? It could use some relabelling.
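
The tentative reading above can be checked with a line of arithmetic. This is a minimal sketch that assumes the approximate 2014 values read off the y-axis (33 for OA, 22 for non-OA) and assumes both are measured on the same scale; it only reproduces the interpretation described, it does not confirm what the axis actually represents. The variable names are illustrative.

```python
# Approximate values read off the figure for 2014 (assumed, not exact).
oa, non_oa = 33, 22

# Share of OA articles if the two values are directly comparable counts.
oa_share = oa / (oa + non_oa)

print(f"{oa_share:.0%}")
```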

"Napster moment" (p4, col1)

I like the comparison, but it could use a citation (even just the Napster Wikipedia article) and/or a little more explanation to clarify what that means.

"1991 ... by the American physician Paul Ginsparg"

He's a physicist, not a physician.

Xu et al (2011)

I don't think this reference was very well cited. Firstly, Oxford Open Journals are listed as a discipline, when they're the source of papers across disciplines. The actual disciplines were Medicine, Social Sciences, Mathematics & Physical Sciences, Life Sciences, and Humanities. Secondly, you list the citation advantage as 138.87%. However, one of the main findings of this paper was the disparity in citation advantages; it ranged from 163.16% for OA articles in Mathematics & Physical Sciences to an actual citation disadvantage of -49.24% for OA articles in Humanities. Given the pro-OA nature of the paper, I feel like you have an extra responsibility to report the few anti-OA pieces of evidence.

Gargouri et al (2010) (page 8, col1)

This paragraph is about a possible confound for the OA citation advantage, where it could be that researchers choose to publish OA for extra cool findings, and you use the Gargouri et al. study to counter this... which is totally correct. You write:

"Gargouri et al. (2010) compared citation counts [for articles which were] self-selected as OA or mandated as OA. The study concluded that both were cited significantly more than non-OA articles. As such, these findings rule out a selection bias"

This is true that both OA types were cited more than non-OA. However, it's also missing the crucial point that there was no difference in citation between self-selected OA articles and mandatory OA articles. Including this would strengthen your point to show that it's OA itself which leads to the citation advantage.

The whole section about altmetrics (subhead societal impact of the academic case for OA, p8, col2) could use some attention. It's not clear until much later what the difference is between alternative metrics (i.e. altmetrics), i.e. the various types of metrics which are alternative to journal impact factors, and Altmetric, i.e. the company which is often confusingly referred to as Altmetrics (not in this paper, to be fair, but elsewhere). A quick disambiguating sentence or two would be really useful here.

In the following paragraph (page 8, col2), you write about the OA altmetrics advantage, and say that there's a logical assumption that OA articles should have one. However, this doesn't consider the fact that the prestige of some journals is advertising in and of itself. You can, and do, get a lot of closed-access papers which generate high altmetrics (social media attention, Mendeley readership) from academics who do have access. And sure enough, in the next paragraph (pages 8 and 9), the Wang et al. 2015 article finds that the OA altmetric advantage doesn't extend to the most impactful articles. I think this section can be made more nuanced and informative by quickly discussing the role of journal prestige. Nothing in depth, just as something that exists and needs to change (for example, you could point people to Brembs et al. and the Deep Impact paper in Frontiers).

"Essentially, copyright is a tool wielded by traditional publishers for financial gain rather than fostering creativity..."

I don't disagree with this. However, I feel it comes on too strong. I think it's fair to say that most people's immediate opinion of copyright is "well, I'd like my stuff to be copyrighted, as that means people can't steal it and pass it off as their own". I think that you need a little more detail here, even just two or three sentences to explain how and why copyright is used for financial gain rather than author protection. Otherwise, it just sounds political/ideological, and counterintuitive for people who haven't read much about copyright.

Glenisson et al. (2005) citation (page 9, col2).

You write that TDM has "proven to be useful for a large variety of applications", and use the Glenisson citation to back this up. I have to say here that I know very little about TDM; however, following through to the Glenisson paper, I don't see how it supports that conclusion. I read it and it seems to show a proof-of-concept kind of study: that TDM can group a set of papers into themes in the same way that an expert can. This is really cool and everything, but I don't think that that substantiates your point that TDM is useful for a large variety of applications. Rather, I'd like to see a couple of specific examples, which you then describe more fully in the next paragraph. One good one is Swanson 1987 (I think - taken from here: http://people.ischool.berkeley.edu/~hearst/papers/acl99/acl99-tdm.html ), who used TDM to make the link between migraines and magnesium deficiency.

"...simply because one can no longer keep up with the published literature".

Small point, but I think it's worth stressing that this is due to the amount of literature that there is.

The economic case for OA

the pay-to-publish part (p10, col2)

I feel this glosses over problems with pay-to-publish. You come back to predatory OA later, but this isn't quite the same: I think it could use a couple of extra sentences describing what the conflict of interest for researchers is, and also stress that pay-to-publish makes it potentially in a journal's interest to accept more papers than they necessarily should. One of the most common anti-OA arguments I see in non-scientific media is that OA is pay-to-publish, which is often misrepresented as "pay-to-publish is publication bribery". I think this section needs a little more substance to it to acknowledge/address this.

"making publication costs dependent on the value added..." (page 11, col1)

When talking about the value added by journals, this paragraph ignores the elephant in the room: journal prestige. Again, I know that this isn't the purpose of this article, but I think it could really be strengthened by mentioning it before moving on.

"Much primary research actually takes place outside of academia inside research and development departments" (page 11, col2)

The part following this sentence is muddy. First, you talk about R&D outside academia (i.e. presumably private research), and then you talk about access to research results because they're publicly financed public goods. So, what does that mean, that R&D from private businesses who've invested their own capital in it should be made available to all? (maybe I agree with that, in some cases, but a lot of people sure won't)

I think this paragraph could be honed a bit; otherwise, it's straying into the ideological territory of saying that all private research should be made public for the public good. That transcends OA in scholarly publishing, and makes OA in scholarly publishing too easy to dismiss.

The cancer research paragraph (page 12, col1) is also unclear. It took me a while to figure out it's talking about UK expenditure - my first assumption of "total expenditure" meant worldwide. It's also not totally clear what the point is - the geographical origin of research is unrelated to its open status. I think that it's quite a leap to write (apologies for paraphrasing) "83% of UK economic benefit from cancer research comes from research outside UK, therefore open access is good", because I think it conflates two different things.

Also, small point, "17% of the annual net-monetary was estimated" is missing the word benefit after net-monetary.

The societal case for OA

Small point: as somebody who wears a linguistics hat quite often, it rankles to read on page 11 "Examples of [non-academic] groups who might benefit include... those who work in linguistics and translation". Translation, for sure, but linguistics is an academic field - you even mention the Lingua to Glossa movement organised by academic linguists later in the manuscript! To me, this is like writing "...those who work in biology and vets", lumping the academic field and a practical use of that field together. Just referring to translation is fine.

Citizen engagement (page 13, col 1)

I agree that these are great examples of citizen engagement with science, but at the risk of sounding like an Elsevier representative, interest in projects like Galaxy Zoo does not entail desire to download and read papers. In fact, you could even make the (spurious) argument that those projects come into existence precisely because citizens aren't interested in downloading and reading papers. I don't actually agree with that, I agree with your general point... but I think that citizen science project interest and citizen science paper interest are two different things. Obviously I think it is in the public interest to have science journals OA, but this isn't the right argument (and I think the sentence "Such statements conflate a lack of desire or need for access with the denial of opportunity to access research" is perfect). I think a stronger argument would be to look at existing OA journals, such as PLOS and Frontiers, and see how many views and downloads come from people who aren't academics. If you can point to, say, some of the most viewed/downloaded PLOS papers and say "look, 30% (or whatever, that's a random number) of these readers aren't academics, they're real people who are interested in it", that would make for a stronger argument.

Quibble about the "yes, we were warned about Ebola" example: the finding from that paper (that Liberians have Ebola antibodies in their blood, suggesting the endemic presence of Ebola) is actually written on the first page preview of the paper ( http://www.sciencedirect.com/science/article/pii/S0769261782800282/part/first-page-pdf , accessed from my laptop outside my institution). It could be argued that anybody could see this finding anywhere in the world, meaning that it's not a problem of OA, it's a problem about searching and indexing. A good counterargument to that is obviously that this paper would have been unsearchable with TDM at the start of the outbreak when people were combing through all West African Ebola literature.

" 'green' model of OA adoption" (page 13, col2)

You generally refer to Green and Gold routes, with the colours capitalised. Just a small terminology thing to keep consistent.

A much more important thing is also on page 13, col2:

"The pay-to-publish system is a potentially greater burden for authors in developed countries, considering that they are not used to paying publication costs, and funding systems for OA are not as well-established as those in the Western world."

--> developing countries, not developed countries!

Predatory publishers (page 14, col1)

I agree with Ross Mounce's comment on the paper: you give Beall too much importance. I think it can be a useful list and should be mentioned, but definitely include some caveats like the ones Ross writes, or the fact that he added Frontiers to the list because of a couple of editorial mistakes.

Peter Suber (page 16, col1)

You describe him as "a leading voice in the OA movement", but I think you should write what his positions are (see http://cyber.law.harvard.edu/~psuber/wiki/Peter_Suber ) in order to justify his importance.

Other general things

There are no proposed solutions in this paper, which is totally fine, because it's beyond the scope of the paper. I feel it could benefit by putting in a couple of sentences here and there about who is needed for driving this change: academics, funders, governments, etc.

I was disappointed not to see anything about the Dutch government and university library organisations' collective drive towards OA. They've changed the national law on copyright, they've reached agreements with most major publishing groups, they may well introduce mandatory OA publishing in the Netherlands in 2016, and they've made it one of the main priorities of their EU presidency this year. It's like the best example of how a whole country can take the lead and sort it out. I think including a quick reference to the Netherlands as an example of excellent OA policy (in the same way that you mention sciELO in Latin America) would go a long way towards convincing the people who are reading this thinking, "ah, yes, I guess OA makes sense in the developing world, but we're doing fine here in the West and it would be too difficult to change things". A good summary of that is here: http://openaccess.nl/en/in-the-netherlands/current-situation

Final remarks

That's the end of my 2800-odd word review. I really enjoyed reading this paper, going through it, and trying to find ways to improve it. Thanks to the authors for writing an excellent paper.

Title: Can pre-trained language models generate titles for research papers?

Abstract: The title of a research paper communicates in a succinct style the main theme and, sometimes, the findings of the paper. Coming up with the right title is often an arduous task, and therefore, it would be beneficial to authors if title generation can be automated. In this paper, we fine-tune pre-trained and large language models to generate titles of papers from their abstracts. We also use ChatGPT in a zero-shot setting to generate paper titles. The performance of the models is measured with ROUGE, METEOR, MoverScore, BERTScore and SciBERTScore metrics.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
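
For readers unfamiliar with the metrics named in the abstract, ROUGE-1 scores a generated title by its unigram overlap with a reference title. The following is a minimal sketch, assuming lowercasing and whitespace tokenization; the paper's actual evaluation setup is not described here, and the function name rouge1_f1 and the candidate title are hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# The reference is the paper's own title; the candidate is made up.
reference = "can pre-trained language models generate titles for research papers"
candidate = "generating titles for research papers with language models"
score = rouge1_f1(candidate, reference)
print(round(score, 3))
```

Note that "generate" and "generating" do not match under this simple tokenization; published ROUGE implementations often apply stemming, which this sketch omits.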
COMMENTS

  1. CORE

    Research Policy Adviser. Aggregation plays an increasingly essential role in maximising the long-term benefits of open access, helping to turn the promise of a 'research commons' into a reality. The aggregation services that CORE provides therefore make a very valuable contribution to the evolving open access environment in the UK.

  2. Directory of Open Access Journals

    DOAJ is a unique and extensive index of diverse open access journals from around the world, driven by a growing community, committed to ensuring quality content is freely available online for everyone.

  3. OA.mg · Open Access for Everyone · Download and read over 240 million

    Free access to millions of research papers for everyone. OA.mg is a search engine for academic papers. Whether you are looking for a specific paper, or for research from a field, or all of an author's works - OA.mg is the place to find it. Universities and researchers funded by the public publish their research in papers, but where do we ...

  4. SpringerOpen

    The SpringerOpen portfolio has grown tremendously since its launch in 2010, so that we now offer researchers from all areas of science, technology, medicine, the humanities and social sciences a place to publish open access in journals. Publishing with SpringerOpen makes your work freely available online for everyone, immediately upon publication, and our high-level peer-review and production ...

  5. ScienceOpen

    ScienceOpen offers content hosting, context building and marketing services for publishers. See our tailored offerings For academic publishers to promote journals and interdisciplinary collections For open access journals to host journal content in an interactive environment For university library publishing to develop new open access paradigms for their scholars For scholarly societies to ...

  6. Research Guides: Freely Available and Open Access Resources: General

    DOAJ's mission is to increase the visibility, accessibility, reputation, usage and impact of quality, peer-reviewed, open access scholarly research journals globally, regardless of discipline, geography or language.

  7. The fundamentals of open access and open research

    What is open access and open research? Open access (OA) refers to the free, immediate, online availability of research outputs such as journal articles or books, combined with the rights to use these outputs fully in the digital environment. OA content is open to all, with no access fees. Open research goes beyond the boundaries of publications ...

  8. Open access journals

    We have published over 124,000 open access articles via Gold open access across disciplines -from the life sciences to the humanities, representing 33% of all Springer Nature articles in 2020. Authors can also publish their article under an open access licence in more than 2,200 of our hybrid journals. Our portfolio focuses on robust and insightful research, supporting the development of new ...

  9. Home

    PLOS is a nonprofit, Open Access publisher empowering researchers to accelerate progress in science and medicine by leading a transformation in research communication.

  10. MDPI

    MDPI is a publisher of peer-reviewed, open access journals in various fields of science and technology since 1996.

  11. MIT Open Access Articles

    The MIT Open Access Articles collection consists of scholarly articles written by MIT-affiliated authors that are made available through DSpace@MIT under the MIT Faculty Open Access Policy, or under related publisher agreements. Articles in this collection generally reflect changes made during peer-review. Version details are supplied for each paper in the collection: Original manuscript ...

  12. Journals

    SpringerOpen Journals offers a wide range of open access publications in various fields of study.

  13. Open access at the Nature Portfolio

    Nature and the Nature research journals - now with immediate gold open access options for all primary research. We are fully committed to open research and the benefits this brings for ...

  14. Open Research Library

    The Open Research Library (ORL) is planned to include all Open Access book content worldwide on one platform for user-friendly discovery, offering a seamless experience navigating more than 20,000 Open Access books.

  15. Hindawi journals have joined Wiley's open access journal portfolio

    Hindawi journals have joined Wiley's open access journal portfolio Journal content is available to openly view, download, and share on Wiley Online Library. With a 200 year tradition of publishing excellence, Wiley is committed to expanding routes to open access publishing and ensuring the maximum reach and impact of high-quality, trusted research for the benefit of humankind.

  16. Open Access at AAAS

    Open Access at AAAS. AAAS and the Science family of journals believe in empowering authors with choice. We support Open Access (OA) options that are informed by the scientific community, contribute to the accurate record of published scientific content, and protect the overall integrity of that content. In 2015, AAAS began offering Gold Open ...

  17. What is Open Access?

    In contrast, open access ensures that the outputs of the research process can be read and built upon by everyone. Open access to publications is a component of Open Science, which encompasses a variety of efforts focused on making scientific research more transparent and accessible.

  18. CORE : the world's largest collection of open access research papers

    CORE harvests research papers from sources such as institutional and subject repositories, and open access and hybrid journals. CORE currently contains 207,255,818 open access articles collected from 10,286 data providers around the world.

  19. Advancing open access to knowledge

    Open access is vital to a collaborative, inclusive and transparent world of research where quality knowledge can be shared and built upon. Every day, we work to bring more insight into closer reach for the research community and the public. We offer a wide choice and flexibility for every researcher and institution around the world that wants ...

  20. CORE: A Global Aggregation Service for Open Access Papers

    Abstract. This paper introduces CORE, a widely used scholarly service, which provides access to the world's largest collection of open access research publications, acquired from a global ...

  21. Open access for science that inspires

    Cell Press open access and hybrid research journals support open access publication for groups of authors from Research4Life (R4L) countries. For papers where all of the authors are from a Group A and/or Group B R4L country we will grant a waiver or discount of the standard publishing fee, as appropriate.

  22. Open Access to Research Papers

    Making scholarly research outputs openly available is easy, legal, and has demonstrable benefits to authors, making it a good beginning step for a researcher just beginning to explore the open world. There is a set of knowledge required to navigate the Open Access landscape, involving copyright, article status, repositories, and economics.

  23. The academic, economic and societal impacts of Open Access: an evidence

    In spite of this, there is a general lack of consensus regarding the potential pros and cons of Open Access at multiple levels. This review aims to be a resource for current knowledge on the impacts of Open Access by synthesizing important research in three major areas: academic, economic and societal.

  24. Can pre-trained language models generate titles for research papers?

    The title of a research paper communicates in a succinct style the main theme and, sometimes, the findings of the paper. Coming up with the right title is often an arduous task, and therefore, it would be beneficial to authors if title generation can be automated. In this paper, we fine-tune pre-trained and large language models to generate titles of papers from their abstracts. We also use ...

  25. Perceived risk, health protective behavior, and trust: a case study of

    ABSTRACT. The Flint Water Crisis exemplifies one of the most significant environmental disasters in the recent U.S. history. While hazard exposure and effects pertaining to the crisis have been addressed, and technical experts have pronounced Flint's water safe for consumption, media reports and stories from community members suggest that significant distrust remains among residents ...

  26. Learning to Reason with LLMs

    Let's break this down step by step based on the example: 1. Example given: • Input: oyfjdnisdr rtqwainr acxz mynzbhhx • Output: Think step by step By examining the words: • The pattern involves selecting specific letters or transforming them. 2. Now, let's decode the new phrase: • Input: oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz