litstudy: A Python package for literature reviews

  • December 2022
  • SoftwareX 20(7825):101207

Ben van Werkhoven at Leiden University


Abstract and Figures

Figure: Software architecture of litstudy.



Litstudy: A Python Package for Literature Reviews

14 Pages, Posted: 9 Apr 2022

Stijn Heldens

Affiliation not provided to SSRN

Alessio Sclocco

Netherlands eScience Center

Henk Dreuning

VU University Amsterdam

Ben van Werkhoven

Pieter Hijma, Jason Maassen, Rob V. van Nieuwpoort

Researchers are often faced with exploring a new research domain, which can be challenging due to the overwhelming number of relevant publications. Broad questions, such as what are the novel research avenues in this domain, are difficult to answer. Therefore, we present litstudy, a Python package that allows answering such questions using simple scripts or Jupyter notebooks. The package enables selecting scientific publications and studying their metadata using visualizations, network analysis, and natural language processing. The software was previously used in a publication about the landscape of Exascale computing, and we envision great potential for reuse.
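To give a feel for the workflow, here is a minimal sketch in the spirit of the examples in the litstudy documentation. The helper names used below (load_bibtex, plot_year_histogram, plot_author_histogram, build_corpus, train_nmf_model, plot_topic_clouds) are drawn from that documentation but should be verified against the current API:

import litstudy

# Load publication metadata exported from a reference manager
# (litstudy also supports sources such as Scopus and Semantic Scholar).
docs = litstudy.load_bibtex("references.bib")

# Bibliometric visualizations of the selected publications
litstudy.plot_year_histogram(docs)
litstudy.plot_author_histogram(docs)

# Topic discovery over titles and abstracts via NMF topic modelling
corpus = litstudy.build_corpus(docs)
model = litstudy.train_nmf_model(corpus, num_topics=10)
litstudy.plot_topic_clouds(model)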

Keywords: Literature review, Python, Jupyter, Bibliometrics

Suggested Citation

Stijn Heldens (Contact Author)

Affiliation not provided to SSRN (email)

No Address Available

Netherlands eScience Center (email)

Netherlands

VU University Amsterdam (email)

Ben van Werkhoven

De Boelelaan 1105, Amsterdam, 1081 HV, Netherlands

Rob V. van Nieuwpoort


  • DOI: 10.2139/ssrn.4079400
  • Corpus ID: 248161752

litstudy: A Python package for literature reviews

  • Stijn Heldens, A. Sclocco, +4 authors, R. V. van Nieuwpoort
  • Published in SoftwareX, 1 December 2022
  • Computer Science


12 Citations

  • Coconut Libtool: Bridging Textual Analysis Gaps for Non-Programmers
  • pyBibX: A Python Library for Bibliometric and Scientometric Analysis Powered with Artificial Intelligence Tools
  • An Exploratory Study of Helping Undergraduate Students Solve Literature Review Problems Using litstudy and NLP (Highly Influenced)
  • LAxplore: An NLP-Based Tool for Distilling Learning Analytics and Learning Design Instruments out of Scientific Publications
  • COVID-19 Fake News: A Systematic Literature Review Using "SmartLitReview"
  • Big Data and Machine Learning Driven Bioprocessing: Recent Trends and Critical Analysis
  • ToSR: Create the Tree of Science from WoS and Scopus
  • A Benchmark of In-House Homologous Recombination Repair Deficiency Testing Solutions for High-Grade Serous Ovarian Cancer Diagnosis
  • A Systematic Literature Mapping of Path Planning and Collision Avoidance Approaches for Unmanned Fixed-Wings
  • Can Renewable Energy Prosumerism Cater for Sufficiency and Inclusion?

31 References

  • PubTrends: A Scientific Literature Explorer
  • Software Framework for Topic Modelling with Large Corpora
  • pybliometrics: Scriptable Bibliometrics Using a Python Interface to Scopus
  • On Bibliographic Networks
  • Jupyter Notebooks: A Publishing Format for Reproducible Computational Workflows
  • Tools to Support Systematic Literature Reviews in Software Engineering: A Mapping Study
  • Mining Text Data
  • Co-citation in the Scientific Literature: A New Measure of the Relationship Between Two Documents
  • UTOPIAN: User-Driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization
  • A Systematic Review of Systematic Review Process Research in Software Engineering

researchpal 1.0.1

pip install researchpal

Released: Sep 27, 2023

Python library for generating literature reviews

Maintainers: ResearchPal

  • License: MIT License (MIT)
  • Author: Veracious.ai
  • Tags: researchpal, literature review, generate literature review, python literature

Classifiers

  • 5 - Production/Stable
  • Science/Research
  • OSI Approved :: MIT License
  • MacOS :: MacOS X
  • Microsoft :: Windows
  • Python :: 3.8

Project description


researchpal: A Python Library for Automated Literature Review Generation

What is it?

researchpal is a Python library that automates the process of generating academic literature reviews based on a research question. It utilizes external data sources to fetch research papers, synthesizes the findings, and generates a concise literature review. This library is particularly useful for researchers and students looking to streamline the literature review process.

  • Fetches research papers from Springer and arXiv.
  • Synthesizes research findings into a coherent literature review.
  • Extracts citations and generates a references list.
  • Supports both short (cites around 5 research papers) and long (cites around 10 research papers) literature reviews.

Prerequisites:

Before using researchpal, ensure you have the following prerequisites:

  • Python 3.8 installed on your system.
  • An API key for OpenAI (required for certain functionalities).

Installation:

To install researchpal, you can use pip:

pip install researchpal

The generate_literature_review function takes the following arguments:

  • research_question: Your research question.
  • openai_key: Your OpenAI API key.
  • length: The length of the literature review ("short" or "long"; default is "short").

Here's a basic example of how to use researchpal in your Python script:

from researchpal import generate_literature_review

research_question = "your_query"
openai_key = "your_openai_api_key"
length = "short"  # can be "short" or "long"; default is "short"

generate_literature_review(research_question, openai_key, length)

This project is licensed under the MIT License.

Support and Feedback:

For support or feedback, please contact us at [email protected]

Acknowledgments:

This library makes use of research papers from Springer API and Arxiv API, and it requires an OpenAI API key for certain functionality.

Project details

Release history

Sep 27, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

Uploaded Sep 27, 2023 Source

Built Distribution

Uploaded Sep 27, 2023 Python 3


The Best Python Books

Table of Contents

  • Python Crash Course
  • Head-First Python, 2nd Edition
  • Invent Your Own Computer Games with Python, 4th Edition
  • Think Python: How to Think Like a Computer Scientist, 2nd Edition
  • Effective Computation in Physics: Field Guide to Research with Python
  • Learn Python 3 the Hard Way
  • Real Python Course, Part 1
  • Python for Kids: A Playful Introduction to Programming
  • Teach Your Kids to Code: A Parent-Friendly Guide to Python Programming
  • Python Tricks: A Buffet of Awesome Python Features
  • Fluent Python: Clear, Concise, and Effective Programming
  • Effective Python: 59 Ways to Write Better Python
  • Python Cookbook
  • Get Coding!

In this article, we highlight the best books for learning Python through a collection of book reviews. Each review gives you a taste of the book, the topics covered, and the context used to illustrate those topics. Different books will resonate with different people, depending on the style and presentation of the books, the readers’ backgrounds, as well as other factors.

Python is an amazing programming language. It can be applied to almost any programming task, allows for rapid development and debugging, and brings the support of what is arguably the most welcoming user community.

Getting started with Python is like learning any new skill: it’s important to find a resource you connect with to guide your learning. Luckily, there’s no shortage of excellent books that can help you learn both the basic concepts of programming and the specifics of programming in Python. With the abundance of resources, it can be difficult to identify which book would be best for your situation.

If you are new to Python, any of the introductory books will give you a solid foundation in the basics.

Perhaps you want to learn Python with your kid, or maybe teach Python to a group of kids. Check out the Best Python Books for Kids for resources aimed at a younger audience.

As you progress in your Python journey, you will want to dig deeper to maximize the efficiency of your code. The best intermediate and advanced Python books provide insight to help you level up your Python skills, enabling you to become an expert Pythonista.

After reading these reviews, if you still are not sure which book to choose, publishers often provide a sample chapter or section to give you an example of what the book offers. Reading a sample of the book should give you the most representative picture of the author’s pace, style, and expectations.

Regardless of which book most stands out, consider this anecdote from one of our book reviewers, Steven C. Howell:

“A favorite professor once told me, ‘It doesn’t matter which book you read first. It’s always the second one that makes the most sense.’ I can’t say this has always been the case for me, but I’ve definitely found that a second reference can make all the difference when the first left me puzzled or frustrated. When learning Python classes, I had difficulty relating to the examples used in the first two books I picked up. It wasn’t until the third book I referred to that the concepts started to click. The important lesson is that if you get stuck or frustrated, and the resources you have are not helping, then don’t give up. Look at another book, search the web, ask on a forum, or just take a break.”

Note: This article contains affiliate links to retailers like Amazon, so you can support Real Python by clicking through and making a purchase on some of the links. Purchasing from one of these links adds no extra cost to you. Affiliate links never influence our editorial decisions in any way.

Best Books for Learning Python

If you are new to Python, you are likely in one of the following two situations:

  • You are new to programming and want to start by learning Python.
  • You have a reasonable amount of programming experience in another language and now want to learn Python.

This section focuses on the first of these two scenarios, with reviews of the books we consider to be the best Python programming books for readers who are new to both programming and Python. Accordingly, these books require no previous programming experience. They start from the absolute basics and teach both general programming concepts as well as how they apply to Python.

Note: If you’re looking for the best Python books for experienced programmers, consider the following selection of books with full reviews in the intro and advanced sections:

  • Think Python: The most basic of this list, Think Python provides a comprehensive Python reference.
  • Fluent Python: While Python’s simplicity lets you quickly start coding, this book teaches you how to write idiomatic Python code, while going into several deep topics of the language.
  • Effective Python: 59 Ways to Write Better Python: This relatively short book is a collection of 59 articles that, similarly to Fluent Python, focus on teaching you how to write truly Pythonic code.
  • Python Cookbook: As a cookbook, this will be a good reference on how to use Python to complete tasks you have done in another language.

Alternatively, you may even prefer to go directly to the official Python Tutorial, a well-written and thorough resource.

Eric Matthes (No Starch Press, 2016)

"Python Crash Course" Book Cover

It does what it says on the tin, and it does it really well. The book starts out with a walkthrough of the basic Python elements and data structures, working through variables, strings, numbers, lists, and tuples, outlining how you work with each of them.

Next, if statements and logical tests are covered, followed by a dive into dictionaries.

After that, the book covers user input, while loops , functions, classes, and file handling, as well as code testing and debugging.

That’s just the first half of the book! In the second half, you work on three major projects, creating some clever, fun applications.

The first project is an Alien Invasion game, essentially Space Invaders, developed using the pygame package. You design a ship (using classes), then program how to pilot it and make it fire bullets. Then, you design several classes of aliens, make the alien fleet move, and make it possible to shoot them down. Finally, you add a scoreboard and a list of high scores to complete the game.

After that, the next project covers data visualization with matplotlib , random walks, rolling dice, and a little bit of statistical analysis, creating graphs and charts with the pygal package. You learn how to download data in a variety of formats, import it into Python, and visualize the results, as well as how to interact with web APIs, retrieving and visualizing data from GitHub and HackerNews.

The third project walks you through the creation of a complete web application using Django to set up a Learning Log to track what users have been studying. It covers how to install Django, set up a project, design your models, create an admin interface, set up user accounts, manage access controls on a per-user basis, style your entire app with Bootstrap, and then finally deploy it to Heroku.

This book is well written and nicely organized. It presents a large number of useful exercises as well as three challenging and entertaining projects that make up the second half of the book. (Reviewed by David Schlesinger.)

  • View On Amazon »
  • View On Publisher Website »

Paul Barry (O’Reilly, 2016)

"Head-First Python" Book Cover

I really like the Head-First series of books, although they’re admittedly lighter weight in overall content than many of the other recommendations in this section. The trade-off is that this approach makes the book more user-friendly.

If you’re the kind of person who likes to learn things one small, fairly self-contained chunk at a time, and you want to have lots of concrete examples and illustrations of the concepts involved, then the Head-First series is for you. The publisher’s website has the following to say about their approach:

“Based on the latest research in cognitive science and learning theory, Head-First Python uses a visually rich format to engage your mind, rather than a text-heavy approach that puts you to sleep. Why waste your time struggling with new concepts? This multi-sensory learning experience is designed for the way your brain really works.” (Source)

Chock full of illustrations, examples, asides, and other tidbits, Head-First Python is consistently engaging and easy to read. This book starts its tour of Python by diving into lists and explaining how to use and manipulate them. It then goes into modules, errors, and file handling. Each topic is organized around a unifying project: building a dynamic website for a school athletic coach using Python through a Common Gateway Interface (CGI).

After that, the book spends time teaching you how to use an Android application to interact with the website you created. You learn to handle user input, wrangle data, and look into what’s involved in deploying and scaling a Python application on the web.

While this book isn’t as comprehensive as some of the others, it covers a good range of Python tasks in a way that’s arguably more accessible, painless, and effective. This is especially true if you find the subject of writing programs somewhat intimidating at first.

This book is designed to guide you through any challenge. While the content is more focused, this book has plenty of material to keep you busy and learning. You will not be bored. If you find most programming books to be too dry, this could be an excellent book for you to get started in Python. (Reviewed by David Schlesinger and Steven C. Howell.)

Al Sweigart (No Starch, 2017)

"Invent Your Own Computer Games with Python" Book Cover

If games are your thing, or you even have a game idea of your own, this would be the perfect book to learn Python. In this book, you learn the fundamentals of programming and Python with the application exercises focused on building classic games.

Starting with an introduction to the Python shell and the REPL loop, followed by a basic “Hello, World!” script, you dive right into making a basic number-guessing game, covering random numbers, flow control, type conversion, and Boolean data. After that, a small joke-telling script is written to illustrate the use of print statements, escape characters, and basic string operations.
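For a sense of what that first project amounts to, a minimal number-guessing game along those lines looks like this (an illustrative sketch, not the book's listing):

import random

secret = random.randint(1, 20)   # random numbers
guess = None
while guess != secret:           # flow control
    guess = int(input("Guess a number between 1 and 20: "))  # type conversion
    if guess < secret:
        print("Too low!")
    elif guess > secret:
        print("Too high!")
print("You got it!")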

The next project is a text-based cave exploration game, Dragon’s Realm, which introduces you to flowcharts and functions, guides you through how to define your own arguments and parameters, and explains Boolean operators, global and local scope, and the sleep() function.

After a brief detour into how to debug your Python code, you next implement the game of Hangman, using ASCII artwork, while learning about lists, the in operator, methods, elif statements, the random module, and a handful of string methods.

You then extend the Hangman game with new features, like word lists and difficulty levels, while learning about dictionaries, key-value pairs, and assignment to multiple variables.

Your next project is a Tic-Tac-Toe game, which introduces some high-level artificial intelligence concepts, shows you how to short-circuit evaluation in conditionals, and explains the None value as well as some different ways of accessing lists.

Your journey through the rest of the book proceeds in a similar vein. You’ll learn nested loops while building a Mastermind-style number guessing game, Cartesian coordinates for a Sonar Hunt game, cryptography to write a Caesar cipher, and artificial intelligence when implementing Reversi (also known as Othello), in which the computer can play against itself.

After all of this, there’s a dive into using graphics for your games with PyGame: you’ll cover how to animate the graphics, manage collision detection, as well as use sounds, images, and sprites. To bring all these concepts together, the book guides you through making a graphical obstacle-dodging game.

This book is well done, and the fact that each project is a self-contained unit makes it appealing and accessible. If you’re someone who likes to learn by doing, then you’ll enjoy this book.

The fact that this book introduces concepts only as needed can be a possible disadvantage. While it’s organized more as a guide than a reference, the broad range of contents taught in the context of familiar games makes this one of the best books for learning Python. (Reviewed by David Schlesinger.)

  • View On Author Website »

Allen B. Downey (O’Reilly, 2015)

"Think Python: How to Think Like a Computer Scientist" Book Cover

If learning Python by creating video games is too frivolous for you, consider Allen Downey’s book Think Python , which takes a much more serious approach.

As the title says, the goal of this book is to teach you how coders think about coding, and it does a good job of it. Compared to the other books, it’s drier and organized in a more linear way. The book focuses on everything you need to know about basic Python programming, in a very straightforward, clear, and comprehensive way.

Compared to other similar books, it doesn’t go quite as deep into some of the more advanced areas, instead covering a wider range of material, including topics the other books don’t go anywhere near. Examples of such topics include operator overloading, polymorphism, analysis of algorithms, and mutability versus immutability.

Previous versions were a little light on exercises, but the latest edition has largely corrected this shortcoming. The book contains four reasonably deep projects, presented as case studies, but overall, it has fewer directed application exercises compared to many other books.

If you like a step-by-step presentation of just the facts, and you want to get a little additional insight into how professional coders look at problems, this book is a great choice. (Reviewed by David Schlesinger and Steven C. Howell.)

Anthony Scopatz, Kathryn D. Huff (O’Reilly, 2015)

Effective Computation in Physics

This is the book I wish I had when I was first learning Python.

Despite its name, this book is an excellent choice for people who don’t have experience with physics, research, or computational problems.

It really is a field guide for using Python. On top of actually teaching you Python, it also covers the related topics, like the command-line and version control, as well as the testing and deploying of software.

In addition to being a great learning resource, this book will also serve as an excellent Python reference, as the topics are well organized with plenty of interspersed examples and exercises.

The book is divided into four aptly named sections: Getting Started, Getting it Done, Getting it Right, and Getting it Out There.

The Getting Started section contains everything you need to hit the ground running. It begins with a chapter on the fundamentals of the bash command-line. (Yes, you can even install bash for Windows.) The book then proceeds to explain the foundations of Python, hitting on all the expected topics: operators, strings, variables, containers, logic, and flow control. Additionally, there is an entire chapter dedicated to all the different types of functions, and another for classes and object-oriented programming.

Building on this foundation, the Getting it Done section moves into the more data-centric area of Python. Note that this section, which takes up approximately a third of the book, will be most applicable to scientists, engineers, and data scientists. If that is you, enjoy. If not, feel free to skip ahead, picking out any pertinent sections. But be sure to catch the last chapter of the section because it will teach you how to deploy software using pip, conda, virtual machines, and Docker containers.

For those of you who are interested in working with data, the section begins with a quick overview of the essential libraries for data analysis and visualization. You then have a separate chapter dedicated to teaching you the topics of regular expressions, NumPy, data storage (including performing out-of-core operations), specialized data structures (hash tables, data frames, D-trees, and k-d trees), and parallel computation.

The Getting it Right section teaches you how to avoid and overcome many of the common pitfalls associated with working in Python. It begins by extending the discussion on deploying software by teaching you how to build software pipelines using make . You then learn how to use Git and GitHub to track, store, and organize your code edits over time, a process known as version control. The section concludes by teaching you how to debug and test your code, two incredibly valuable skills.

The final section, Getting it Out There, focuses on effectively communicating with the consumers of your code, yourself included. It covers the topics of documentation, markup languages (primarily LaTeX), code collaboration, and software licenses. The section, and book, concludes with a long list of scientific Python projects organized by topic.

This book stands out because, in addition to teaching all the fundamentals of Python, it also teaches you many of the technologies used by Pythonistas. This is truly one of the best books for learning Python.

It also serves as a great reference, with a full glossary, bibliography, and index. The book definitely has a scientific Python spin, but don’t worry if you do not come from a scientific background. There are no mathematical equations, and you may even impress your coworkers when they see you are reading up on Computational Physics! (Reviewed by Steven C Howell.)

Zed A. Shaw (Addison-Wesley, 2016)

"Learn Python 3 The Hard Way" Book Cover

Learn Python the Hard Way is a classic. I’m a big fan of the book’s approach. When you learn “the hard way,” you have to:

  • Type in all the code yourself
  • Do all the exercises
  • Find your own solutions to problems you run into

The great thing about this book is how well the content is presented. Each chapter is clearly structured. The code examples are all concise, well constructed, and to the point. The exercises are instructive, and any problems you run into will not be at all insurmountable. Your biggest risk is typographical errors. Make it through this book, and you’ll definitely no longer be a beginner at Python.

Don’t let the title put you off. The “hard way” turns out to be the easy way if you take the long view. Nobody loves typing a lot of stuff in, but that’s what programming actually involves, so it’s good to get used to it from the start. One nice thing about this book is that it has been refined through several editions now, so any rough edges have been made nice and smooth by now.

The book is constructed as a series of over fifty exercises, each building on the previous, and each teaching you some new feature of the language. Starting from Exercise 0, getting Python set up on your computer, you begin writing simple programs. You learn about variables, data types, functions, logic, loops, lists, debugging, dictionaries, object-oriented programming, inheritance, and packaging. You even create a simple game using a game engine.

The next sections cover concepts like automated testing, lexical scanning on user input to parse sentences, and the lpthw.web package, to put your game up on the web.

Zed is an engaging, patient writer who doesn’t gloss over the details. If you work through this book the right way—the “hard way,” by following up on the study suggestions provided throughout the text as well as the programming exercises—you’ll be well beyond the beginner programmer stage when you’ve finished. (Reviewed by David Schlesinger.)

Note: Of all the books included in this article, this is the only one with somewhat mixed reviews. The Stack Overflow (SO) community has compiled a list of 22 complaints prefaced with the following statement:

“We noticed a general trend that users using [ Learn Python the Hard Way ] post questions that don’t make a lot of sense both on SO and in chat. This is due to the structure and techniques used in the book.” (Source)

They provide their own list of recommended tutorials, which includes the following:

  • The official Python 3 tutorial
  • Dive into Python 3
  • The Invent with Python series, which includes Invent Your Own Computer Games with Python
  • Think Python

Despite the negative criticism toward Learn Python the Hard Way, David Schlesinger and Amazon reviewers agree that the book is worthwhile, though you probably want to supplement your library with another Python book that could serve more as a reference. Also, be sure to do your due diligence before posting questions to Stack Overflow, as that community can be somewhat abrasive at times.

Real Python Team (Real Python, 2017)

Real Python Logo

This eBook is the first of three (so far) in the Real Python course series. It was written with the goal of getting you up and running, and it does a great job at achieving this goal. The book is a mix of explanatory prose, example code, and review exercises. The interspersed review exercises solidify your learning by letting you immediately apply what you’ve learned.

As with the previous books, clear instructions are provided up front for getting Python installed and running on your computer. After the setup section, rather than giving a dry overview of data types, Real Python simply starts with strings and is actually quite thorough: you learn string slicing before you hit page 30.

Then the book gives you a good sense of the flavor of Python by showing you how to play with some of the class methods that can be applied. Next, you learn to write functions and loops, use conditional logic, work with lists and dictionaries, and read and write files.

Then things get really fun! Once you’ve learned to install packages with pip (and from source), Real Python covers interacting with and manipulating PDF files, using SQL from within Python, scraping data from web pages, using numpy and matplotlib to do scientific computing, and finally, creating graphical user interfaces with EasyGUI and tkinter.

What I like best about Real Python is that, in addition to covering the basics in a thorough and friendly way, the book explores some more advanced uses of Python that none of the other books hit on, like web-scraping. There are also two additional volumes, which go into more advanced Python development. (Reviewed by David Schlesinger.)

  • View On Real Python »

Disclaimer: I first started using the Real Python books several years ago, when they were still in beta. I thought then—and still think now—that they’re one of the best resources available to learn the Python language and several ways it can be used. My gig writing articles on the Real Python web site is a much more recent development, and my review is completely independent. — David

Best Python Books for Kids

The following books are aimed at adults interested in teaching kids to code, while possibly learning it themselves along the way. Both of these books are recommended for kids as young as 9 or 10, but they are great for older kids as well.

It’s important to note that these books are not meant to be just handed to a kid, depending on their age. They would be ideal for a parent who wanted to learn Python alongside their child.

Jason R. Briggs (No Starch, 2013)

"Python for Kids: A Playful Introduction to Programming" Book Cover

“Playful” is right! This is a fun book for all ages, despite its title. It provides a clear, easy-to-follow introduction to Python programming. It’s profusely illustrated, the examples are straightforward and clearly presented, and it’s a solid guide for someone who wants to get a good grounding in the basics, plus a little more.

The book begins with an excellent, detailed guide to getting Python installed on your system, whether that’s Windows, OS X, or Ubuntu Linux. It then proceeds to introduce the Python shell and how it can be used as a simple calculator. This serves to introduce some basic concepts like variables and arithmetic operation.

Next, iterables are tackled, and the chapter works its way progressively through strings, lists, tuples, and dictionaries.

Once that’s accomplished, the Python turtle library is used to begin working with turtle graphics, a popular framework for teaching children to code. From there, the book progresses through conditional statements, loops, functions, and modules.

Classes and objects are covered, followed by a truly excellent section on Python’s built-in functions, and then a section on a number of useful Python libraries and modules. Turtle graphics are revisited in greater detail, after which the book introduces tkinter for creating user interfaces, better graphics, and even animations.

This concludes part 1 of the book, “Learning to Program,” with the remainder focused on building two fun application projects. The first project is to build a single-player version of Pong, called Bounce! This integrates the programming concepts of functions, classes, and control flow, together with the tasks of creating an interface using tkinter, drawing to the canvas, performing geometric calculations, and using event bindings to create interactivity.

In the second project, you build a side-scrolling video game, Mr. Stickman Races for the Exit. This game applies many of the same concepts and tasks as Bounce! but with more depth and increased complexity. Along the way, you also get introduced to the open source image manipulation program GIMP , used to create your game’s assets. The book gets an amazing amount of mileage out of these two games, and getting them working is both instructive and a lot of fun.

I really like this book. Whether you are young, or just young at heart, you will enjoy this book if you are looking for a fun, approachable, introduction to Python and programming. (Reviewed by David Schlesinger and Steven C. Howell.)

Bryson Payne (No Starch, 2015)

"Teach Your Kids to Code: A Parent-Friendly Guide to Python Programming" Book Cover

This book is similar to Python for Kids but intended more for an adult working with a child (or children) to learn to code, as the title suggests. One thing that sets this book apart from most introductory books is the use of color and illustrations on almost every page. The book is well written and presents learning to code as a way to teach children problem-solving skills.

As is commonly the case, this book begins with a Python installation guide. Compared to Python for Kids , the guide in this book is more cursory but completely adequate.

The first activity is, again, turtle graphics. A number of basic variations on drawing a rotated square are presented—without a lot of underlying explanation, initially—just to introduce the general concepts, but by the end of the section, you’ll have been provided with a pretty good understanding of the basics.
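The rotated-square exercise boils down to a loop like the following (an illustrative sketch, not the book's listing):

import turtle

t = turtle.Turtle()
for _ in range(12):        # twelve rotated copies
    for _ in range(4):     # draw one square
        t.forward(100)
        t.right(90)
    t.right(30)            # rotate before drawing the next square
turtle.done()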

Next, calculations, variables, and mathematics in Python are explained. Once strings have been covered, the book brings all of that back into turtle graphics to enhance and explore the work that was done earlier. By this point, the code explanations are extremely clear, with explicit line-by-line details. You’d have a hard time misunderstanding any of the code presented.

Lists are explored next, as is the eval() function. Loops are introduced and then used to create increasingly complex graphics with the turtle. Conditional expressions come next, along with Boolean logic and operators.

The random library is introduced with a guessing game and randomly placed spirals made with turtle graphics. You explore randomness further by implementing rolling dice and picking cards, which leads up to you creating the games Yahtzee and War.

Functions, more advanced graphics, and user interaction are investigated next.

The book then branches off to cover using PyGame to create even more advanced graphics and animations, and then user interaction to create a very simple drawing program.

At this point, you have all the tools to create some real games. Development of both a full-featured version of Pong and a bubble-popping game are presented. Both provide enough depth to pose some challenges and maintain interest.

What I like best about this book is its large number of programming challenges, as well as the excellent summaries at the end of each chapter reminding you what was covered. If you and your child are interested in programming, this book should take both of you a good distance, and you’ll have a lot of fun. As the author, Dr. Bryson Payne, said in his recent TEDx talk, “Step out of your comfort zone, and become literate in the language of technology.” (Reviewed by David Schlesinger and Steven C. Howell.)

Best Intermediate and Advanced Python Books

Knowing Python is one thing. Knowing what’s Pythonic takes practice. Sometimes Python’s low barrier to entry gives people the mistaken idea that the language is less capable than other languages, that style does not matter, or that best practices are only a matter of preference. Have you ever seen Python code that looked like C or Fortran?

Learning how to use Python effectively requires some understanding of what Python is doing under the hood. Pythonic programming takes advantage of how the Python language is implemented to maximize the efficiency of your code.

Fortunately, there are some excellent books, packed with expert guidance, aimed to help you take what you’ve learned and level up your skills. Any of the books in this section will give you a deeper understanding of Python programming concepts and teach you how to write developer-style Python code. Note that these are by no means introductory books. They do not include the basics of getting started. These books will be helpful if you are already coding in Python and want to further hone your skills on your path to becoming a serious Pythonista.

Dan Bader (dbader.org, 2017)

"Python Tricks" Book Cover

This book illustrates valuable lesser-known Python features and best practices, written to help you gain a deeper understanding of Python. Each of the 43 subsections presents a different concept, referred to as a Python Trick, with discussion and easy-to-digest code examples illustrating how you can take advantage of that concept.

The book’s content is broken into the following sections:

  • Patterns for Cleaner Python
  • Effective Functions
  • Classes & OOP
  • Common Data Structures in Python
  • Looping & Iteration
  • Dictionary Tricks
  • Pythonic Productivity Techniques

As it says on the cover, the content is organized as “A Buffet,” with each subsection being a self-contained topic, with a brief introduction, examples, discussion, and list of Key Takeaways. As such, you should feel free to jump around to whichever sections are the most appealing.
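For example, one classic trick of this kind (shown here as a generic illustration, not a verbatim excerpt from the book) is merging two dictionaries with ** unpacking:

x = {"a": 1, "b": 2}
y = {"b": 3, "c": 4}

z = {**x, **y}  # later values win for duplicate keys
print(z)        # {'a': 1, 'b': 3, 'c': 4}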

In addition to the book, I particularly enjoyed the 12 Bonus Videos that are available when you purchase this as an eBook. They have an average length of 11 minutes, perfect for watching during lunch. Each video illustrates a different concept using clear and concise code examples that are simple to reproduce. While some of the videos covered familiar concepts, they still provided interesting insight without dragging on. (Reviewed by Steven C. Howell.)

Disclaimer: Though this book is officially distributed through Real Python, I recommend it independently of my connection with Real Python. I purchased this book when it was first released, before I had the opportunity to write for Real Python. For further evidence of the value of this book, check out the Amazon reviews: 148, averaging 4.8 out of 5 stars, at the time of this review. — Steve

Luciano Ramalho (O’Reilly, 2014)

"Fluent Python" Book Cover

This book was written for experienced Python 2 programmers who want to become proficient in Python 3. Consequently, this book is perfect for someone with a solid foundation in the basics of Python, 2 or 3, who wants to take their skills to the next level. Additionally, this book also works well as a reference for an experienced programmer from another language who wants to look up “How do I do <x> in Python?”

The book is organized by topic so that each section can be read independently. While many of the topics covered in this book are found in introductory books, Fluent Python provides much more detail, illuminating many of the more nuanced and overlooked features of the Python language.

The chapters are broken into the following six sections:

  • Prologue: introduces Python’s object-oriented nature and the special methods that keep Python libraries consistent
  • Data Structures: covers sequences, mappings, sets, and the difference between str and bytes
  • Functions as Objects: explains the consequences of functions being first-class objects in the Python language
  • Object-Oriented Idioms: includes references, mutability, instances, multiple inheritance, and operator overloading
  • Control Flow: extends beyond the basic conditionals and covers the concept of generators, context managers, coroutines, yield from syntax, and concurrency using asyncio
  • Metaprogramming: explores the lesser-known aspects of classes, discussing dynamic attributes and properties, attribute descriptors, class decorators, and metaclasses

With code examples on almost every page, and numbered call-outs linking lines of code to helpful descriptions, this book is extremely approachable. Additionally, the code examples are geared toward the interactive Python console, a practical approach to exploring and learning the concepts presented.
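To illustrate the special-methods theme (an independent sketch, not an excerpt from the book): implementing just __len__ and __getitem__ makes a user-defined class work with len(), indexing, slicing, and iteration.

class Deck:
    ranks = ["A", "K", "Q"]
    suits = ["spades", "hearts"]

    def __init__(self):
        self.cards = [f"{r} of {s}" for s in self.suits for r in self.ranks]

    def __len__(self):
        return len(self.cards)

    def __getitem__(self, position):
        return self.cards[position]

deck = Deck()
print(len(deck))   # 6
print(deck[0])     # A of spades
print(deck[-2:])   # the last two cards, via slicing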

I find myself turning to this book when I have a Python question and want an explanation that is more thorough than the one I would likely get on Stack Overflow. I also enjoy reading this book when I have a bit of down-time and just want to learn something new. On more than one occasion, I have found that a concept I recently learned from this book unexpectedly turned out to be the perfect solution to a problem I had to solve. (Reviewed by Steven C. Howell.)

Brett Slatkin (Addison-Wesley, 2015)

"Effective Python: 59 Ways to Write Better Python" Book Cover

This book is a collection of 59 independent articles that build on a basic understanding of Python to teach Pythonic best practices, lesser known functionality, and built-in tools. The topics range in complexity, beginning with the simple concept of being aware of which Python version you’re using, and ending with the more complicated, and typically ignored, concept of identifying memory leaks.
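To give a sense of scale, the simplest of those concepts amounts to a check like this (a generic illustration, not the book's exact listing):

import sys

# Know which interpreter you are running before relying on version-specific behavior.
print(sys.version)       # human-readable version string
print(sys.version_info)  # structured, e.g. sys.version_info(major=3, minor=11, ...)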

Each article is a combination of example code, discussion, and a list of things to remember.

As each article is independent, this is a great book to jump around in, allowing you to focus on the topics that are most applicable or interesting. This also makes it perfect for reading one article at a time. With each article being around two to four pages in length, you could make time to read one article per day, finishing the book in two to three months (depending on whether you read on weekends).

The articles are grouped into the following 8 chapters:

  • Pythonic Thinking: introduces the best ways to perform common tasks, while taking advantage of how Python is implemented
  • Functions: clarifies nuanced differences of Python functions and outlines how to use functions to clarify intention, promote reuse, and reduce bugs
  • Classes and Inheritance: outlines the best practices when working with Python classes
  • Metaclasses and Attributes: illuminates the somewhat mysterious topic of metaclasses, teaching you how to use them to create intuitive functionality
  • Concurrency and Parallelism: explains what you need to know to write multi-threaded applications in Python
  • Built-in Modules: introduces a few of Python’s lesser-known built-in libraries to make your code more useful and reliable
  • Collaboration: discusses proper documentation, packaging, dependency management, and virtual environments
  • Production: covers the topics of debugging, optimization, testing, and memory management

If you have a solid foundation in Python and want to fill in holes, deepen your understanding, and learn some of the less obvious features of Python, this would be a great book for you. (Reviewed by Steven C. Howell.)

David Beazley & Brian K. Jones (O’Reilly, 3rd edition, 2013)

Python Cookbook, 3rd Edition

What makes this book stand out is its level of detail. Code cookbooks are typically designed as short and sweet manuals to illustrate slick ways of doing everyday tasks. In this case, each recipe in Python Cookbook has an extended code solution as well as an author’s discussion of some particular elements of the solution.

Each recipe starts out with a clear problem statement, such as, “You want to write a decorator that adds an extra argument to the calling signature of the wrapped function.” It then jumps into a solution that uses modern, idiomatic Python 3 code, patterns, and data structures, often spending four to five pages discussing the solution.
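As an independent sketch of the problem statement quoted above (not the solution printed in the book), such a decorator might look like this:

import functools
import inspect

def optional_debug(func):
    # Add a keyword-only `debug` argument to the wrapped function.
    @functools.wraps(func)
    def wrapper(*args, debug=False, **kwargs):
        if debug:
            print("Calling", func.__name__)
        return func(*args, **kwargs)

    # Advertise the extra argument in the visible signature.
    sig = inspect.signature(func)
    params = list(sig.parameters.values())
    params.append(inspect.Parameter("debug", inspect.Parameter.KEYWORD_ONLY, default=False))
    wrapper.__signature__ = sig.replace(parameters=params)
    return wrapper

@optional_debug
def add(x, y):
    return x + y

print(add(2, 3))              # 5
print(add(2, 3, debug=True))  # prints "Calling add", then 5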

Based on its more involved and sophisticated examples, and the authors’ own recommendation in the preface, this is probably the most advanced Python book on our list. Despite that, don’t be scared away if you consider yourself an intermediate Python programmer. Who’s judging, anyway? There’s an old saying that goes something like this:

“The best way to become a better basketball player is to lose to the best players you can find, rather than beating the worst.”

You may see some code blocks you don’t fully understand—come back to them in a few months. Re-read those sections after you’ve picked up a few additional concepts, and suddenly, it will click. Most of the chapters start out fairly straightforward, and then gradually become more intense.

The latter half of the book illustrates designs like decorator patterns, closures, accessor functions, and callback functions.

It’s always nice to read from a trustworthy source, and this book’s authors certainly fit that bill. David Beazley is a frequent keynote speaker at events such as PyCon and also the author of Python Essential Reference. Similarly, Brian K. Jones is a CTO, the creator of a Python magazine, and founder of the Python User Group in Princeton (PUG-IP).

This particular edition is written and tested with Python 3.3. (Reviewed by Brad Solomon.)

One of the awesome things about Python is it has a relatively low barrier to entry, compared to many other languages. Despite this, learning Python is a never-ending process. The language is relevant for such a wide variety of tasks, and evolves so much that there will always be something new to discover and learn. While you can pick up enough Python to do some fun things in a week or two, people who’ve been using Python for twenty years will tell you they’re still learning new things they can do with this flexible and evolving language.

To ultimately be successful as a Python programmer, you need to begin with a solid foundation, then gain a deeper understanding of how the language works, and how to best put it to use. To gain a solid foundation, you really can’t go wrong with any of the best books to learn Python . If you want to learn Python with a child, or maybe teach a group of kids, check out the list of best Python books for kids . After you’ve got your feet wet, check out some of the best intermediate and advanced Python books to dig in deeper to less obvious concepts that will improve the efficiency of your code.

All of these books will teach you what you need to know to legitimately call yourself a Python coder. The only ingredient missing is you .


  • Open access
  • Published: 01 February 2021

An open source machine learning framework for efficient and transparent systematic reviews

  • Rens van de Schoot   ORCID: orcid.org/0000-0001-7736-2091 1 ,
  • Jonathan de Bruin   ORCID: orcid.org/0000-0002-4297-0502 2 ,
  • Raoul Schram 2 ,
  • Parisa Zahedi   ORCID: orcid.org/0000-0002-1610-3149 2 ,
  • Jan de Boer   ORCID: orcid.org/0000-0002-0531-3888 3 ,
  • Felix Weijdema   ORCID: orcid.org/0000-0001-5150-1102 3 ,
  • Bianca Kramer   ORCID: orcid.org/0000-0002-5965-6560 3 ,
  • Martijn Huijts   ORCID: orcid.org/0000-0002-8353-0853 4 ,
  • Maarten Hoogerwerf   ORCID: orcid.org/0000-0003-1498-2052 2 ,
  • Gerbrich Ferdinands   ORCID: orcid.org/0000-0002-4998-3293 1 ,
  • Albert Harkema   ORCID: orcid.org/0000-0002-7091-1147 1 ,
  • Joukje Willemsen   ORCID: orcid.org/0000-0002-7260-0828 1 ,
  • Yongchao Ma   ORCID: orcid.org/0000-0003-4100-5468 1 ,
  • Qixiang Fang   ORCID: orcid.org/0000-0003-2689-6653 1 ,
  • Sybren Hindriks 1 ,
  • Lars Tummers   ORCID: orcid.org/0000-0001-9940-9874 5 &
  • Daniel L. Oberski   ORCID: orcid.org/0000-0001-7467-2297 1 , 6  

Nature Machine Intelligence volume 3, pages 125–133 (2021)

78k Accesses | 276 Citations | 162 Altmetric

Subjects:

  • Computational biology and bioinformatics
  • Computer science
  • Medical research

A preprint version of the article is available at arXiv.

To help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible, we designed a tool to accelerate the step of screening titles and abstracts. For many tasks—including but not limited to systematic reviews and meta-analyses—the scientific literature needs to be checked systematically. Scholars and practitioners currently screen thousands of studies by hand to determine which studies to include in their review or meta-analysis. This is error prone and inefficient because of extremely imbalanced data: only a fraction of the screened studies is relevant. The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We therefore developed an open source machine learning-aided pipeline applying active learning: ASReview. We demonstrate by means of simulation studies that active learning can yield far more efficient reviewing than manual reviewing while providing high quality. Furthermore, we describe the options of the free and open source research software and present the results from user experience tests. We invite the community to contribute to open source projects such as our own that provide measurable and reproducible improvements over current practice.


With the emergence of online publishing, the number of scientific manuscripts on many topics is skyrocketing 1 . All of these textual data present opportunities to scholars and practitioners while simultaneously confronting them with new challenges. Scholars often conduct systematic reviews and meta-analyses to develop comprehensive overviews of the relevant topics 2 . The process entails several explicit and, ideally, reproducible steps, including identifying all likely relevant publications in a standardized way, extracting data from eligible studies and synthesizing the results. Systematic reviews differ from traditional literature reviews in that they are more replicable and transparent 3 , 4 . Such systematic overviews of literature on a specific topic are pivotal not only for scholars, but also for clinicians, policy-makers, journalists and, ultimately, the general public 5 , 6 , 7 .

Given that screening the entire research literature on a given topic is too labour intensive, scholars often develop quite narrow searches. Developing a search strategy for a systematic review is an iterative process aimed at balancing recall and precision 8 , 9 ; that is, including as many potentially relevant studies as possible while simultaneously limiting the total number of studies retrieved. The vast number of publications in the field of study often leads to a relatively precise search, with the risk of missing relevant studies. The process of systematic reviewing is error prone and extremely time intensive 10 . In fact, if the literature of a field is growing faster than the amount of time available for systematic reviews, adequate manual review of this field then becomes impossible 11 .

The rapidly evolving field of machine learning has aided researchers by allowing the development of software tools that assist in developing systematic reviews 11 , 12 , 13 , 14 . Machine learning offers approaches to overcome the manual and time-consuming screening of large numbers of studies by prioritizing relevant studies via active learning 15 . Active learning is a type of machine learning in which a model can choose the data points (for example, records obtained from a systematic search) it would like to learn from and thereby drastically reduce the total number of records that require manual screening 16 , 17 , 18 . In most so-called human-in-the-loop 19 machine-learning applications, the interaction between the machine-learning algorithm and the human is used to train a model with a minimum number of labelling tasks. Unique to systematic reviewing is that not only do all relevant records (that is, titles and abstracts) need to be seen by a researcher, but an extremely diverse range of concepts also needs to be learned, thereby requiring flexibility in the modelling approach as well as careful error evaluation 11 . In the case of systematic reviewing, the algorithm(s) are interactively optimized for finding the most relevant records, instead of finding the most accurate model. The term researcher-in-the-loop was introduced 20 as a special case of human-in-the-loop with three unique components: (1) the primary output of the process is a selection of the records, not a trained machine learning model; (2) all records in the relevant selection are seen by a human at the end of the process 21 ; (3) the use-case requires a reproducible workflow and complete transparency 22 .

Existing tools that implement such an active learning cycle for systematic reviewing are described in Table 1 ; see the Supplementary Information for an overview of all of the software that we considered (note that this list was based on a review of software tools 12 ). However, existing tools have two main drawbacks. First, many are closed source applications with black box algorithms, which is problematic as transparency and data ownership are essential in the era of open science 22 . Second, to our knowledge, existing tools lack the necessary flexibility to deal with the large range of possible concepts to be learned by a screening machine. For example, in systematic reviews, the optimal type of classifier will depend on variable parameters, such as the proportion of relevant publications in the initial search and the complexity of the inclusion criteria used by the researcher 23 . For this reason, any successful system must allow for a wide range of classifier types. Benchmark testing is crucial to understand the real-world performance of any machine learning-aided system, but such benchmark options are currently mostly lacking.

In this paper we present an open source machine learning-aided pipeline with active learning for systematic reviews called ASReview. The goal of ASReview is to help scholars and practitioners get an overview of the most relevant records for their work as efficiently as possible, while being transparent in the process. The open, free and ready-to-use software ASReview addresses all concerns mentioned above: it is open source, uses active learning and allows multiple machine learning models. It also has a benchmark mode, which is especially useful for comparing and designing algorithms. Furthermore, it is intended to be easily extensible, allowing third parties to add modules that enhance the pipeline. Although we focus this paper on systematic reviews, ASReview can handle any text source.

In what follows, we first present the pipeline for manual versus machine learning-aided systematic reviews. We then show how ASReview has been set up and how ASReview can be used in different workflows by presenting several real-world use cases. We subsequently demonstrate the results of simulations that benchmark performance and present the results of a series of user-experience tests. Finally, we discuss future directions.

Pipeline for manual and machine learning-aided systematic reviews

The pipeline of a systematic review without active learning traditionally starts with researchers doing a comprehensive search in multiple databases 24 , using free-text words as well as controlled vocabulary to retrieve potentially relevant references. The researcher then typically verifies that the key papers they expect to find are indeed included in the search results. Next, the researcher downloads a file of records containing the text to be screened into a reference manager; in the case of systematic reviewing, each record contains the title and abstract (and potentially other metadata, such as the authors’ names, journal name and DOI) of a potentially relevant reference. Ideally, two or more researchers then screen the records’ titles and abstracts on the basis of the eligibility criteria established beforehand 4 . After all records have been screened, the full texts of the potentially relevant records are read to determine which of them will ultimately be included in the review. Most records are excluded in the title and abstract phase. Typically, only a small fraction of the records belong to the relevant class, making title and abstract screening an important bottleneck in the systematic reviewing process 25 . For instance, a recent study analysed 10,115 records and excluded 9,847 after title and abstract screening, a drop of more than 95% 26 . ASReview therefore focuses on this labour-intensive step.

The research pipeline of ASReview is depicted in Fig. 1 . The researcher starts with a search exactly as described above and subsequently uploads a file containing the records (that is, metadata containing the text of the titles and abstracts) into the software. Prior knowledge is then selected, which is used to train the first model and to present the first record to the researcher. As screening is a binary classification problem, the reviewer must select at least one key record to include and at least one to exclude on the basis of background knowledge. More prior knowledge may result in improved efficiency of the active learning process.

Figure 1: the research pipeline of ASReview. The symbols indicate whether the action is taken by a human, a computer, or whether both options are available.

A machine learning classifier is trained to predict study relevance (labels) from a representation of the record-containing text (feature space) on the basis of prior knowledge. We have purposefully chosen not to include an author name or citation network representation in the feature space to prevent authority bias in the inclusions. In the active learning cycle, the software presents one new record to be screened and labelled by the user. The user’s binary label (1 for relevant versus 0 for irrelevant) is subsequently used to train a new model, after which a new record is presented to the user. This cycle continues until a user-specified stopping criterion has been reached. The user then has a file with (1) records labelled as either relevant or irrelevant and (2) unlabelled records ordered from most to least probably relevant, as predicted by the current model. This set-up helps to move through a large database much more quickly than in the manual process, while the decision process simultaneously remains transparent.
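To make the cycle concrete, the sketch below implements the same loop with generic scikit-learn components. It is an illustration of the researcher-in-the-loop idea (naive Bayes on TF–IDF features with certainty-based sampling, as in the defaults described later), not ASReview's internal API; the `ask_human` function is a hypothetical stand-in for the screening interface.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def ask_human(text):
    # Hypothetical stand-in for the screening interface: 1 = relevant, 0 = irrelevant.
    return int(input(text[:200] + "\nRelevant? [1/0] > "))

def screen(texts, prior_labels, budget=50):
    """texts: all records; prior_labels: {index: 0/1} with at least one 1 and one 0."""
    X = TfidfVectorizer(lowercase=True).fit_transform(texts)  # feature matrix, built once
    labels = dict(prior_labels)
    for _ in range(budget):  # 'budget' plays the role of the user-specified stopping criterion
        pool = [i for i in range(len(texts)) if i not in labels]
        if not pool:
            break
        clf = MultinomialNB().fit(X[list(labels)], list(labels.values()))
        proba = clf.predict_proba(X[pool])[:, 1]  # P(relevant) for every unlabelled record
        pick = pool[int(np.argmax(proba))]        # certainty-based ('max') query strategy
        labels[pick] = ask_human(texts[pick])     # the new label feeds the next training round
    pool = [i for i in range(len(texts)) if i not in labels]
    order = np.argsort(-clf.predict_proba(X[pool])[:, 1])  # most to least probably relevant
    return labels, [pool[i] for i in order]
```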

Software implementation for ASReview

The source code 27 of ASReview is available open source under an Apache 2.0 license, including documentation 28 . Compiled and packaged versions of the software are available on the Python Package Index 29 or Docker Hub 30 . The free and ready-to-use software ASReview implements oracle, simulation and exploration modes. The oracle mode is used to perform a systematic review with interaction by the user, the simulation mode is used for simulation of the ASReview performance on existing datasets, and the exploration mode can be used for teaching purposes and includes several preloaded labelled datasets.

The oracle mode presents records to the researcher, who classifies them. Multiple file formats are supported: (1) RIS files, as used by digital libraries such as IEEE Xplore, Scopus and ScienceDirect (the citation managers Mendeley, RefWorks, Zotero and EndNote support the RIS format too); and (2) tabular datasets with the .csv, .xlsx and .xls file extensions. CSV files should be comma separated and UTF-8 encoded; for CSV files, the software accepts a set of predetermined labels in line with those used in RIS files. Each record in the dataset should hold the metadata on, for example, a scientific publication. The mandatory metadata are texts, for example the titles or abstracts of scientific papers. If available, both are used to train the model, but at least one is needed. An advanced option is available that splits the titles and abstracts in the feature-extraction step and weights the two feature matrices independently (for TF–IDF only). Other metadata such as author, date, DOI and keywords are optional but not used for training the models. When using ASReview in the simulation or exploration mode, an additional binary variable is required to indicate historical labelling decisions. This column, which is automatically detected, can also be used in the oracle mode as background knowledge, marking a previous selection of relevant papers before entering the active learning cycle. If it is unavailable, the user has to select at least one relevant record, which can be identified by searching the pool of records. At least one irrelevant record should also be identified; the software allows the user to search for specific records, or it presents random records, which are most likely to be irrelevant given the extremely imbalanced data.
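For illustration, a minimal tabular dataset could look like the snippet below. The column names here are an assumption for the sake of the example (check the documentation for the exact names ASReview detects automatically); the last column holds historical labelling decisions and is left empty for unscreened records:

```
title,abstract,included
"Virus metagenomics in farm animals","We review studies that sequenced ...",1
"A survey of sorting algorithms","This paper compares ...",0
"A candidate record, not yet screened","Background: ...",
```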

The software has a simple yet extensible default model: a naive Bayes classifier, TF–IDF feature extraction, a dynamic resampling balance strategy 31 and certainty-based sampling 17 , 32 for the query strategy. These defaults were chosen on the basis of their consistently high performance in benchmark experiments across several datasets 31 . Moreover, the low computation time of these default settings makes them attractive in applications, given that the software should be able to run locally. Users can change the settings, shown in Table 2 , and technical details are described in our documentation 28 . Users can also add their own classifiers, feature extraction techniques, query strategies and balance strategies.

ASReview has a number of implemented features (see Table 2 ). First, there are several classifiers available: (1) naive Bayes; (2) support vector machines; (3) logistic regression; (4) neural networks; (5) random forests; (6) LSTM-base, which consists of an embedding layer, an LSTM layer with one output, a dense layer and a single sigmoid output node; and (7) LSTM-pool, which consists of an embedding layer, an LSTM layer with many outputs, a max pooling layer and a single sigmoid output node. The feature extraction techniques available are Doc2Vec 33 , embedding LSTM, embedding with IDF or TF–IDF 34 (the default is unigram, with the option to run n -grams while other parameters are set to the defaults of Scikit-learn 35 ) and sBERT 36 . The available query strategies for the active learning part are (1) random selection, ignoring model-assigned probabilities; (2) uncertainty-based sampling, which chooses the most uncertain record according to the model (that is, closest to 0.5 probability); (3) certainty-based sampling (max in ASReview), which chooses the record most likely to be included according to the model; and (4) mixed sampling, which uses a combination of random and certainty-based sampling.
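The differences between these query strategies are easy to state in code. The sketch below is a generic illustration of the four strategies operating on model-assigned probabilities, not ASReview's implementation; the mixing weight in `mixed` is an arbitrary choice for the example.

```python
import numpy as np

def query(proba, strategy="max", rng=np.random.default_rng(), p_max=0.95):
    """Pick the index of the next record to screen from an unlabelled pool.

    proba: array of model-assigned probabilities of relevance, one per record.
    """
    if strategy == "random":       # (1) ignore the model entirely
        return int(rng.integers(len(proba)))
    if strategy == "uncertainty":  # (2) the record the model is least sure about
        return int(np.argmin(np.abs(proba - 0.5)))
    if strategy == "max":          # (3) certainty-based: most likely to be relevant
        return int(np.argmax(proba))
    if strategy == "mixed":        # (4) mostly 'max', occasionally random
        pick_max = rng.random() < p_max
        return query(proba, "max" if pick_max else "random", rng)
    raise ValueError(f"unknown strategy: {strategy}")
```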

There are several balance strategies that rebalance and reorder the training data. This is necessary because the data are typically extremely imbalanced; we have therefore implemented the following balance strategies: (1) full sampling, which uses all of the labelled records; (2) undersampling the irrelevant records so that the included and excluded records are in some particular ratio (closer to one); and (3) dynamic resampling, a novel method similar to undersampling in that it decreases the imbalance of the training data 31 . However, in dynamic resampling, the number of irrelevant records is decreased, whereas the number of relevant records is increased by duplication such that the total number of records in the training data remains the same. The ratio between relevant and irrelevant records is not fixed over iterations, but is dynamically updated depending on the number of labelled records, the total number of records and the ratio between relevant and irrelevant records. Details on all of the described algorithms can be found in the code and documentation referred to above.
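A minimal sketch of the dynamic resampling idea follows: irrelevant records are undersampled while relevant records are duplicated, keeping the training-set size constant. In ASReview the target ratio is updated each iteration; here it is a fixed argument for simplicity.

```python
import numpy as np

def dynamic_resample(train_idx, labels, ratio=1.0, rng=np.random.default_rng(0)):
    """Return a rebalanced list of training indices of the same length as train_idx.

    ratio: target relevant-to-irrelevant ratio (assumed fixed here; ASReview
    updates it dynamically from the numbers of labelled and total records).
    """
    rel = [i for i in train_idx if labels[i] == 1]
    irr = [i for i in train_idx if labels[i] == 0]
    n_rel = int(round(len(train_idx) * ratio / (1.0 + ratio)))
    n_rel = max(1, min(n_rel, len(train_idx) - 1))                 # keep both classes present
    n_irr = len(train_idx) - n_rel
    rel_s = rng.choice(rel, size=n_rel, replace=True)              # oversample by duplication
    irr_s = rng.choice(irr, size=n_irr, replace=n_irr > len(irr))  # undersample
    out = np.concatenate([rel_s, irr_s])
    rng.shuffle(out)
    return out.tolist()
```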

By default, ASReview converts the records’ texts into a document-term matrix; terms are converted to lowercase and no stop words are removed (although both settings can be changed). As the document-term matrix is identical in each iteration of the active learning cycle, it is generated in advance of model training and stored in the (active learning) state file. Each row of the document-term matrix can easily be requested from the state file. Records are internally identified by their row number in the input dataset. In oracle mode, the record that is selected to be classified is retrieved from the state file, and the record text and other metadata (such as title and abstract) are retrieved from the original dataset (from the file or the computer’s memory). ASReview can run on your local computer, or on a (self-hosted) local or remote server. Data (all records and their labels) remain on the user’s computer. Data ownership and confidentiality are crucial, and no data are processed or used in any way by third parties. This is unique in comparison with some of the existing systems, as shown in the last column of Table 1 .
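The document-term matrix itself can be reproduced with scikit-learn, on which ASReview builds; the snippet below uses the defaults just described (lowercasing on, stop-word removal off, unigram features):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

records = [
    "Virus metagenomics in farm animals. We review studies that sequenced ...",
    "A survey of sorting algorithms. This paper compares ...",
]

vectorizer = TfidfVectorizer(lowercase=True, stop_words=None, ngram_range=(1, 1))
dtm = vectorizer.fit_transform(records)  # computed once, reused in every iteration

print(dtm.shape)  # (number of records, number of unique terms)
```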

Real-world use cases and high-level function descriptions

Below we highlight a number of real-world use cases and high-level function descriptions for using the pipeline of ASReview.

ASReview can be integrated in classic systematic reviews or meta-analyses. Such reviews or meta-analyses entail several explicit and reproducible steps, as outlined in the PRISMA guidelines 4 . Scholars identify all likely relevant publications in a standardized way, screen retrieved publications to select eligible studies on the basis of defined eligibility criteria, extract data from eligible studies and synthesize the results. ASReview fits into this process, particularly in the abstract screening phase. ASReview does not replace the initial step of collecting all potentially relevant studies. As such, results from ASReview depend on the quality of the initial search process, including the selection of databases 24 and the construction of comprehensive searches using keywords and controlled vocabulary. However, ASReview can be used to broaden the scope of the search (by keyword expansion or by omitting limitations in the search query), resulting in a higher number of initial papers and limiting the risk of missing relevant papers during the search (that is, more focus on recall than on precision).

Furthermore, many reviewers nowadays move towards meta-reviews, that is, systematic reviews of systematic reviews 37 , when analysing very large literature streams. This can be problematic, as the various reviews included could use different eligibility criteria and are therefore not always directly comparable. Due to the efficiency of ASReview, scholars using the tool could instead conduct the study by analysing the papers directly rather than relying on the systematic reviews. ASReview also supports the rapid update of a systematic review: the included papers from the initial review are used to train the machine learning model before screening of the updated set of papers starts. This allows the researcher to quickly screen the updated set of papers on the basis of decisions made in the initial run.

As an example case, let us look at the current literature on COVID-19 and the coronavirus. An enormous number of papers are being published on COVID-19, and it is very time consuming to manually find relevant papers (for example, to develop treatment guidelines). This is especially problematic as urgent overviews are required. Medical guidelines rely on comprehensive systematic reviews, but the medical literature is growing at breakneck pace and the quality of the research is not universally adequate for summarization into policy 38 . Such reviews must entail adequate protocols with explicit and reproducible steps, including identifying all potentially relevant papers, extracting data from eligible studies, assessing potential for bias and synthesizing the results into medical guidelines. Researchers need to screen (tens of) thousands of COVID-19-related studies by hand to find relevant papers to include in their overview. Using ASReview, this can be done far more efficiently: by selecting key papers that match their (COVID-19) research question in the first step, researchers start the active learning cycle, which then presents the most relevant COVID-19 papers for their research question first. A plug-in was therefore developed for ASReview 39 , containing three datasets that are updated automatically whenever a new version is released by the owners of the data: (1) the CORD-19 dataset, developed by the Allen Institute for AI, covering publications on COVID-19 and other coronavirus research (for example, SARS and MERS) from PubMed Central, the WHO COVID-19 database of publications, the preprint servers bioRxiv and medRxiv, and papers contributed by specific publishers 40 . The CORD-19 dataset is updated daily by the Allen Institute for AI, and the plug-in ingests these updates daily. (2) In addition to the full dataset, we automatically construct a daily subset of the database with studies published after December 1st, 2019, to search for relevant papers published during the COVID-19 crisis. (3) A separate dataset of COVID-19-related preprints, containing metadata of preprints from over 15 preprint servers across disciplines, published since January 1st, 2020 41 . The preprint dataset is updated weekly by its maintainers and then automatically updated in ASReview as well. As this dataset is not readily available to researchers through regular search engines (for example, PubMed), its inclusion in ASReview provides added value for researchers interested in COVID-19 research, especially if they want a quick way to screen preprints specifically.

Simulation study

To evaluate the performance of ASReview on a labelled dataset, users can employ the simulation mode. As an example, we ran simulations based on four labelled datasets with version 0.7.2 of ASReview. All scripts to reproduce the results in this paper can be found on Zenodo ( https://doi.org/10.5281/zenodo.4024122 ) 42 , and the results are available at OSF ( https://doi.org/10.17605/OSF.IO/2JKD6 ) 43 .
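A simulation is started from the command line with the `simulate` subcommand. The flag names below are assumptions for illustration, as they have changed between ASReview versions; check `asreview simulate --help` for the options of your installed version:

```
asreview simulate labelled_dataset.csv --state_file results.h5 -m nb -e tfidf -q max
```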

First, we analysed the performance for a systematic review of studies that performed viral metagenomic next-generation sequencing in common livestock such as cattle, small ruminants, poultry and pigs 44 . Studies were retrieved from Embase (n = 1,806), Medline (n = 1,384), Cochrane Central (n = 1), Web of Science (n = 977) and Google Scholar (n = 200, the top relevant references). After deduplication, the initial search yielded 2,481 studies, of which 120 were inclusions (4.84%).

A second simulation study was performed on the results of a systematic review of studies on fault prediction in software engineering 45 . Studies were obtained from the ACM Digital Library, IEEE Xplore and the ISI Web of Science. Furthermore, a snowballing strategy and a manual search were conducted, yielding a total of 8,911 publications, of which 104 were included in the systematic review (1.2%).

A third simulation study was performed on a review of longitudinal studies that applied unsupervised machine learning techniques to longitudinal data of self-reported symptoms of post-traumatic stress assessed after trauma exposure 46 , 47 ; 5,782 studies were obtained by searching PubMed, Embase, PsycINFO and Scopus, and through a snowballing strategy in which both the references and the citations of the included papers were screened. Thirty-eight studies were included in the review (0.66%).

A fourth simulation study was performed on the results for a systematic review on the efficacy of angiotensin-converting enzyme inhibitors, from a study collecting various systematic review datasets from the medical sciences 15 . The collection is a subset of 2,544 publications from the TREC 2004 Genomics Track document corpus 48 . This is a static subset from all MEDLINE records from 1994 through 2003, which allows for replicability of results. Forty-one publications were included in the review (1.6%).

Performance metrics

We evaluated the four datasets using three performance metrics. We first assess the work saved over sampling (WSS), which is the percentage reduction in the number of records that need to be screened, achieved by using active learning instead of screening records at random. WSS is measured at a given level of recall of relevant records, for example 95%, indicating the work reduction in screening effort at the cost of failing to detect 5% of the relevant records (WSS@95%). For some researchers it is essential that all relevant literature on the topic is retrieved; this requires a recall of 100% (that is, WSS@100%). We also propose the proportion of relevant references found after having screened the first 10% of the records (RRF@10%). This is a useful metric for getting a quick overview of the relevant literature.
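Both metrics follow directly from the order in which the records were screened. A minimal sketch, assuming `labels_in_order` lists the 1/0 relevance labels of all records in screening order:

```python
import numpy as np

def wss(labels_in_order, recall=0.95):
    """Work saved over sampling at a given recall level (for example WSS@95%)."""
    labels = np.asarray(labels_in_order)
    target = int(np.ceil(recall * labels.sum()))  # inclusions needed to reach this recall
    n_screened = int(np.argmax(np.cumsum(labels) >= target)) + 1
    # Screening at random needs a fraction `recall` of all records on average,
    # so the saving is that fraction minus the fraction actually screened.
    return recall - n_screened / len(labels)

def rrf(labels_in_order, fraction=0.10):
    """Proportion of relevant records found after screening the first `fraction`."""
    labels = np.asarray(labels_in_order)
    k = int(np.ceil(fraction * len(labels)))
    return labels[:k].sum() / labels.sum()
```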

For every dataset, 15 runs were performed with one random inclusion and one random exclusion (see Fig. 2 ). The classical review performance with randomly found inclusions is shown by the dashed line. The average work saved over sampling at 95% recall for ASReview is 83%, ranging from 67% to 92%. Hence, 95% of the eligible studies will be found after screening only 8% to 33% of the studies. Furthermore, the number of relevant abstracts found after reading 10% of the abstracts ranges from 70% to 100%. In short, our software would have saved many hours of work.

Figure 2: results of the simulation studies. a, Systematic review of studies performing viral metagenomic next-generation sequencing in common livestock. b, Systematic review of studies on fault prediction in software engineering. c, Review of longitudinal studies applying unsupervised machine learning to longitudinal data of self-reported symptoms of post-traumatic stress assessed after trauma exposure. d, Systematic review on the efficacy of angiotensin-converting enzyme inhibitors. Fifteen runs (shown as separate lines) were performed for every dataset, each with only one random inclusion and one random exclusion. The classical review performances with randomly found inclusions are shown by the dashed lines.

Usability testing (user experience testing)

We conducted a series of user experience tests to learn from end users how they experience the software and implement it in their workflow. The study was approved by the Ethics Committee of the Faculty of Social and Behavioral Sciences of Utrecht University (ID 20-104).

Unstructured interviews

The first user experience (UX) test, carried out in December 2019, was conducted with an academic research team in a substantive research field (public administration and organizational science) that has conducted various systematic reviews and meta-analyses. The team was composed of three university professors (ranging from assistant to full) and three PhD candidates. In one 3.5 h session, the participants used the software and provided feedback via unstructured interviews and group discussions. The goal was to gather feedback on installing the software and to test its performance on the team's own data. After these sessions, we prioritized the feedback in a meeting with the ASReview team, which resulted in the release of v.0.4 and v.0.6. An overview of all releases can be found on GitHub 27 .

A second UX test was conducted with four experienced researchers developing medical guidelines based on classical systematic reviews, and two experienced reviewers working at a pharmaceutical non-profit organization who work on updating reviews with new data. In four sessions, held in February to March 2020, these users tested the software following our testing protocol. After each session we implemented the feedback provided by the experts and asked them to review the software again. The main feedback was about how to upload datasets and select prior papers. Their feedback resulted in the release of v.0.7 and v.0.9.

Systematic UX test

In May 2020 we conducted a systematic UX test. Two groups of users were distinguished: inexperienced users and experienced users who had already used ASReview. Due to the COVID-19 lockdown, the usability tests were conducted via video calling, with one person giving instructions to the participant and one person observing, a set-up called human-moderated remote testing 49 . During the tests, one person (S.H.) asked the questions and helped the participant with the tasks, while the other (M.H.), a user experience professional at the IT department of Utrecht University, observed and took notes.

To analyse the notes, thematic analysis was used, a method that analyses data by dividing the information into themes, each with a distinct meaning 50 , using the NVivo 12 software 51 . When something went wrong, the text was coded as 'showstopper'; when something did not go smoothly, it was coded as 'doubtful'; and when something went well, it was coded as 'superb'. The features the participants requested for future versions of the ASReview tool were discussed with the lead engineer of the ASReview team and were submitted to GitHub as issues or feature requests.

The answers to the quantitative questions can be found at the Open Science Framework 52 . The participants ( N  = 11) rated the tool with a grade of 7.9 (s.d. = 0.9) on a scale from one to ten (Table 2 ). The inexperienced users rated the tool on average with an 8.0 (s.d. = 1.1, N  = 6); the experienced users rated it on average with a 7.8 (s.d. = 0.9, N  = 5). The participants described the usability test with words such as helpful, accessible, fun, clear and obvious.

The UX tests resulted in the releases v0.10 and v0.10.1 and the major release v0.11, a substantial revision of the graphical user interface. The documentation has been upgraded to make installing and launching ASReview more straightforward. We made setting up a project, selecting a dataset and finding past knowledge more intuitive and flexible. We also added a project dashboard with information on progress, as well as advanced settings.

Continuous input via the open source community

Finally, the ASReview development team receives continuous feedback from the open science community about, among other things, the user experience. In every new release we implement features listed by our users. Recurring UX tests are performed to keep up with the needs of users and improve the value of the tool.

We designed a system to accelerate the step of screening titles and abstracts to help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible. Our system uses active learning to train a machine learning model that predicts relevance from texts using a limited number of labelled examples. The classifier, feature extraction technique, balance strategy and active learning query strategy are flexible. We provide an open source software implementation, ASReview, and evaluated it across a wide range of real-world systematic reviewing applications. Based on our experiments, ASReview provides defaults for its parameters, which exhibited good performance on average across the applications we examined. However, we stress that in practical applications these defaults should be carefully examined; for this purpose, the software provides a simulation mode to users. We encourage users and developers to perform further evaluation of the proposed approach in their application, and to take advantage of the open source nature of the project by contributing further developments.

Drawbacks of machine learning-based screening systems, including our own, remain. First, although the active learning step greatly reduces the number of manuscripts that must be screened, it also prevents a straightforward evaluation of the system’s error rates without further onerous labelling. Providing users with an accurate estimate of the system’s error rate in the application at hand is therefore a pressing open problem. Second, although, as argued above, the use of such systems is not limited in principle to reviewing, no empirical benchmarks of actual performance in these other situations yet exist to our knowledge. Third, machine learning-based screening systems automate the screening step only; although the screening step is time-consuming and a good target for automation, it is just one part of a much larger process, including the initial search, data extraction, coding for risk of bias, summarizing results and so on. Although some other works, similar to our own, have looked at (semi-)automating some of these steps in isolation 53 , 54 , to our knowledge the field is still far removed from an integrated system that would truly automate the review process while guaranteeing the quality of the produced evidence synthesis. Integrating the various tools that are currently under development to aid the systematic reviewing pipeline is therefore a worthwhile topic for future development.

Possible future research could also focus on the performance of identifying full text articles with different document length and domain-specific terminologies or even other types of text, such as newspaper articles and court cases. When the selection of past knowledge is not possible based on expert knowledge, alternative methods could be explored. For example, unsupervised learning or pseudolabelling algorithms could be used to improve training 55 , 56 . In addition, as the NLP community pushes forward the state of the art in feature extraction methods, these are easily added to our system as well. In all cases, performance benefits should be carefully evaluated using benchmarks for the task at hand. To this end, common benchmark challenges should be constructed that allow for an even comparison of the various tools now available. To facilitate such a benchmark, we have constructed a repository of publicly available systematic reviewing datasets 57 .

The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We invite the community to contribute to open source projects such as our own, as well as to common benchmark challenges, so that we can provide measurable and reproducible improvement over current practice.

Data availability

The results described in this paper are available at the Open Science Framework ( https://doi.org/10.17605/OSF.IO/2JKD6 ) 43 . The answers to the quantitative questions of the UX test can be found at the Open Science Framework (OSF.IO/7PQNM) 52 .

Code availability

All code to reproduce the results described in this paper can be found on Zenodo ( https://doi.org/10.5281/zenodo.4024122 ) 42 . All code for the software ASReview is available under an Apache 2.0 license ( https://doi.org/10.5281/zenodo.3345592 ) 27 , is maintained on GitHub 63 and includes documentation ( https://doi.org/10.5281/zenodo.4287120 ) 28 .

References

1. Bornmann, L. & Mutz, R. Growth rates of modern science: a bibliometric analysis based on the number of publications and cited references. J. Assoc. Inf. Sci. Technol. 66, 2215–2222 (2015).
2. Gough, D., Oliver, S. & Thomas, J. An Introduction to Systematic Reviews (Sage, 2017).
3. Cooper, H. Research Synthesis and Meta-analysis: A Step-by-Step Approach (SAGE Publications, 2015).
4. Liberati, A. et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J. Clin. Epidemiol. 62, e1–e34 (2009).
5. Boaz, A. et al. Systematic Reviews: What Have They Got to Offer Evidence Based Policy and Practice? (ESRC UK Centre for Evidence Based Policy and Practice, London, 2002).
6. Oliver, S., Dickson, K. & Bangpan, M. Systematic Reviews: Making Them Policy Relevant. A Briefing for Policy Makers and Systematic Reviewers (UCL Institute of Education, 2015).
7. Petticrew, M. Systematic reviews from astronomy to zoology: myths and misconceptions. Brit. Med. J. 322, 98–101 (2001).
8. Lefebvre, C., Manheimer, E. & Glanville, J. in Cochrane Handbook for Systematic Reviews of Interventions (eds Higgins, J. P. & Green, S.) 95–150 (John Wiley & Sons, 2008); https://doi.org/10.1002/9780470712184.ch6
9. Sampson, M., Tetzlaff, J. & Urquhart, C. Precision of healthcare systematic review searches in a cross-sectional sample. Res. Synth. Methods 2, 119–125 (2011).
10. Wang, Z., Nayfeh, T., Tetzlaff, J., O’Blenis, P. & Murad, M. H. Error rates of human reviewers during abstract screening in systematic reviews. PLoS ONE 15, e0227742 (2020).
11. Marshall, I. J. & Wallace, B. C. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Syst. Rev. 8, 163 (2019).
12. Harrison, H., Griffin, S. J., Kuhn, I. & Usher-Smith, J. A. Software tools to support title and abstract screening for systematic reviews in healthcare: an evaluation. BMC Med. Res. Methodol. 20, 7 (2020).
13. O’Mara-Eves, A., Thomas, J., McNaught, J., Miwa, M. & Ananiadou, S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst. Rev. 4, 5 (2015).
14. Wallace, B. C., Trikalinos, T. A., Lau, J., Brodley, C. & Schmid, C. H. Semi-automated screening of biomedical citations for systematic reviews. BMC Bioinf. 11, 55 (2010).
15. Cohen, A. M., Hersh, W. R., Peterson, K. & Yen, P.-Y. Reducing workload in systematic review preparation using automated citation classification. J. Am. Med. Inform. Assoc. 13, 206–219 (2006).
16. Kremer, J., Steenstrup Pedersen, K. & Igel, C. Active learning with support vector machines. WIREs Data Min. Knowl. Discov. 4, 313–326 (2014).
17. Miwa, M., Thomas, J., O’Mara-Eves, A. & Ananiadou, S. Reducing systematic review workload through certainty-based screening. J. Biomed. Inform. 51, 242–253 (2014).
18. Settles, B. Active Learning Literature Survey (Minds@UW, 2009); https://minds.wisconsin.edu/handle/1793/60660
19. Holzinger, A. Interactive machine learning for health informatics: when do we need the human-in-the-loop? Brain Inform. 3, 119–131 (2016).
20. Van de Schoot, R. & De Bruin, J. Researcher-in-the-loop for Systematic Reviewing of Text Databases (Zenodo, 2020); https://doi.org/10.5281/zenodo.4013207
21. Kim, D., Seo, D., Cho, S. & Kang, P. Multi-co-training for document classification using various document representations: TF–IDF, LDA, and Doc2Vec. Inf. Sci. 477, 15–29 (2019).
22. Nosek, B. A. et al. Promoting an open research culture. Science 348, 1422–1425 (2015).
23. Kilicoglu, H., Demner-Fushman, D., Rindflesch, T. C., Wilczynski, N. L. & Haynes, R. B. Towards automatic recognition of scientifically rigorous clinical research evidence. J. Am. Med. Inform. Assoc. 16, 25–31 (2009).
24. Gusenbauer, M. & Haddaway, N. R. Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res. Synth. Methods 11, 181–217 (2020).
25. Borah, R., Brown, A. W., Capers, P. L. & Kaiser, K. A. Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry. BMJ Open 7, e012545 (2017).
26. de Vries, H., Bekkers, V. & Tummers, L. Innovation in the public sector: a systematic review and future research agenda. Public Adm. 94, 146–166 (2016).
27. Van de Schoot, R. et al. ASReview: Active Learning for Systematic Reviews (Zenodo, 2020); https://doi.org/10.5281/zenodo.3345592
28. De Bruin, J. et al. ASReview Software Documentation 0.14 (Zenodo, 2020); https://doi.org/10.5281/zenodo.4287120
29. ASReview PyPI Package (ASReview Core Development Team, 2020); https://pypi.org/project/asreview/
30. Docker Container for ASReview (ASReview Core Development Team, 2020); https://hub.docker.com/r/asreview/asreview
31. Ferdinands, G. et al. Active Learning for Screening Prioritization in Systematic Reviews—A Simulation Study (OSF Preprints, 2020); https://doi.org/10.31219/osf.io/w6qbg
32. Fu, J. H. & Lee, S. L. Certainty-enhanced active learning for improving imbalanced data classification. In 2011 IEEE 11th International Conference on Data Mining Workshops 405–412 (IEEE, 2011).
33. Le, Q. V. & Mikolov, T. Distributed representations of sentences and documents. Preprint at https://arxiv.org/abs/1405.4053 (2014).
34. Ramos, J. Using TF–IDF to determine word relevance in document queries. In Proc. 1st Instructional Conference on Machine Learning Vol. 242, 133–142 (ICML, 2003).
35. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
36. Reimers, N. & Gurevych, I. Sentence-BERT: sentence embeddings using siamese BERT-networks. Preprint at https://arxiv.org/abs/1908.10084 (2019).
37. Smith, V., Devane, D., Begley, C. M. & Clarke, M. Methodology in conducting a systematic review of systematic reviews of healthcare interventions. BMC Med. Res. Methodol. 11, 15 (2011).
38. Wynants, L. et al. Prediction models for diagnosis and prognosis of COVID-19: systematic review and critical appraisal. Brit. Med. J. 369, 1328 (2020).
39. Van de Schoot, R. et al. Extension for COVID-19 Related Datasets in ASReview (Zenodo, 2020); https://doi.org/10.5281/zenodo.3891420
40. Lu Wang, L. et al. CORD-19: the COVID-19 open research dataset. Preprint at https://arxiv.org/abs/2004.10706 (2020).
41. Fraser, N. & Kramer, B. Covid19_preprints (FigShare, 2020); https://doi.org/10.6084/m9.figshare.12033672.v18
42. Ferdinands, G., Schram, R., Van de Schoot, R. & De Bruin, J. Scripts for ‘ASReview: Open Source Software for Efficient and Transparent Active Learning for Systematic Reviews’ (Zenodo, 2020); https://doi.org/10.5281/zenodo.4024122
43. Ferdinands, G., Schram, R., van de Schoot, R. & de Bruin, J. Results for ‘ASReview: Open Source Software for Efficient and Transparent Active Learning for Systematic Reviews’ (OSF, 2020); https://doi.org/10.17605/OSF.IO/2JKD6
44. Kwok, K. T. T., Nieuwenhuijse, D. F., Phan, M. V. T. & Koopmans, M. P. G. Virus metagenomics in farm animals: a systematic review. Viruses 12, 107 (2020).
45. Hall, T., Beecham, S., Bowes, D., Gray, D. & Counsell, S. A systematic literature review on fault prediction performance in software engineering. IEEE Trans. Softw. Eng. 38, 1276–1304 (2012).
46. van de Schoot, R., Sijbrandij, M., Winter, S. D., Depaoli, S. & Vermunt, J. K. The GRoLTS-Checklist: guidelines for reporting on latent trajectory studies. Struct. Equ. Model. Multidiscip. J. 24, 451–467 (2017).
47. van de Schoot, R. et al. Bayesian PTSD-trajectory analysis with informed priors based on a systematic literature search and expert elicitation. Multivar. Behav. Res. 53, 267–291 (2018).
48. Cohen, A. M., Bhupatiraju, R. T. & Hersh, W. R. Feature generation, feature selection, classifiers, and conceptual drift for biomedical document triage. In Proc. 13th Text Retrieval Conference (TREC, 2004).
49. Vasalou, A., Ng, B. D., Wiemer-Hastings, P. & Oshlyansky, L. Human-moderated remote user testing: protocols and applications. In 8th ERCIM Workshop, User Interfaces for All Vol. 19 (ERCIM, 2004).
50. Joffe, H. in Qualitative Research Methods in Mental Health and Psychotherapy: A Guide for Students and Practitioners (eds Harper, D. & Thompson, A. R.) Ch. 15 (Wiley, 2012).
51. NVivo v. 12 (QSR International Pty, 2019).
52. Hindriks, S., Huijts, M. & van de Schoot, R. Data for UX-test ASReview, June 2020 (OSF, 2020); https://doi.org/10.17605/OSF.IO/7PQNM
53. Marshall, I. J., Kuiper, J. & Wallace, B. C. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. J. Am. Med. Inform. Assoc. 23, 193–201 (2016).
54. Nallapati, R., Zhou, B., dos Santos, C. N., Gulcehre, Ç. & Xiang, B. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In Proc. 20th SIGNLL Conference on Computational Natural Language Learning 280–290 (Association for Computational Linguistics, 2016).
55. Xie, Q., Dai, Z., Hovy, E., Luong, M.-T. & Le, Q. V. Unsupervised data augmentation for consistency training. Preprint at https://arxiv.org/abs/1904.12848 (2019).
56. Ratner, A. et al. Snorkel: rapid training data creation with weak supervision. VLDB J. 29, 709–730 (2020).
57. Systematic Review Datasets (ASReview Core Development Team, 2020); https://github.com/asreview/systematic-review-datasets
58. Wallace, B. C., Small, K., Brodley, C. E., Lau, J. & Trikalinos, T. A. Deploying an interactive machine learning system in an evidence-based practice center: Abstrackr. In Proc. 2nd ACM SIGHIT International Health Informatics Symposium 819–824 (Association for Computing Machinery, 2012).
59. Cheng, S. H. et al. Using machine learning to advance synthesis and use of conservation and environmental evidence. Conserv. Biol. 32, 762–764 (2018).
60. Yu, Z., Kraft, N. & Menzies, T. Finding better active learners for faster literature reviews. Empir. Softw. Eng. 23, 3161–3186 (2018).
61. Ouzzani, M., Hammady, H., Fedorowicz, Z. & Elmagarmid, A. Rayyan—a web and mobile app for systematic reviews. Syst. Rev. 5, 210 (2016).
62. Przybyła, P. et al. Prioritising references for systematic reviews with RobotAnalyst: a user study. Res. Synth. Methods 9, 470–488 (2018).
63. ASReview: Active Learning for Systematic Reviews (ASReview Core Development Team, 2020); https://github.com/asreview/asreview


Acknowledgements

We would like to thank the Utrecht University Library, focus area Applied Data Science, and departments of Information and Technology Services, Test and Quality Services, and Methodology and Statistics, for their support. We also want to thank all researchers who shared data, participated in our user experience tests or who gave us feedback on ASReview in other ways. Furthermore, we would like to thank the editors and reviewers for providing constructive feedback. This project was funded by the Innovation Fund for IT in Research Projects, Utrecht University, the Netherlands.

Author information

Authors and affiliations.

Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, the Netherlands

Rens van de Schoot, Gerbrich Ferdinands, Albert Harkema, Joukje Willemsen, Yongchao Ma, Qixiang Fang, Sybren Hindriks & Daniel L. Oberski

Department of Research and Data Management Services, Information Technology Services, Utrecht University, Utrecht, the Netherlands

Jonathan de Bruin, Raoul Schram, Parisa Zahedi & Maarten Hoogerwerf

Utrecht University Library, Utrecht University, Utrecht, the Netherlands

Jan de Boer, Felix Weijdema & Bianca Kramer

Department of Test and Quality Services, Information Technology Services, Utrecht University, Utrecht, the Netherlands

Martijn Huijts

School of Governance, Faculty of Law, Economics and Governance, Utrecht University, Utrecht, the Netherlands

Lars Tummers

Department of Biostatistics, Data management and Data Science, Julius Center, University Medical Center Utrecht, Utrecht, the Netherlands

Daniel L. Oberski


Contributions

R.v.d.S. and D.O. originally designed the project, with later input from L.T. J.d.Br. is the lead engineer and software architect and supervises the code base on GitHub. R.S. coded the algorithms and simulation studies. P.Z. coded the very first version of the software. J.d.Bo., F.W. and B.K. developed the systematic review pipeline. M.Huijts led the UX tests, supported by S.H. M.Hoogerwerf developed the architecture of the produced (meta)data. G.F. conducted the simulation study together with R.S. A.H. performed the literature search comparing the different tools together with G.F. J.W. designed all the artwork and helped with formatting the manuscript. Y.M. and Q.F. are responsible for the preprocessing of the metadata under the supervision of J.d.Br. R.v.d.S., D.O. and L.T. wrote the paper with input from all authors. Each co-author has written parts of the manuscript.

Corresponding author

Correspondence to Rens van de Schoot.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Peer review information Nature Machine Intelligence thanks Jian Wu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information


Overview of software tools supporting systematic reviews.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

van de Schoot, R., de Bruin, J., Schram, R. et al. An open source machine learning framework for efficient and transparent systematic reviews. Nat Mach Intell 3 , 125–133 (2021). https://doi.org/10.1038/s42256-020-00287-7


Received : 04 June 2020

Accepted : 17 December 2020

Published : 01 February 2021

Issue Date : February 2021

DOI : https://doi.org/10.1038/s42256-020-00287-7


This article is cited by

A systematic review, meta-analysis, and meta-regression of the prevalence of self-reported disordered eating and associated factors among athletes worldwide.

  • Hadeel A. Ghazzawi
  • Lana S. Nimer
  • Haitham Jahrami

Journal of Eating Disorders (2024)

Systematic review using a spiral approach with machine learning

  • Amirhossein Saeidmehr
  • Piers David Gareth Steel
  • Faramarz F. Samavati

Systematic Reviews (2024)

Determinants of and interventions for Proton Pump Inhibitor prescription behavior: A systematic scoping review

  • L. C. van Gestel
  • M. A. Adriaanse
  • G. van den Brink

BMC Primary Care (2024)

The spatial patterning of emergency demand for police services: a scoping review

  • Samuel Langton
  • Stijn Ruiter
  • Linda Schoonmade

Crime Science (2024)

The SAFE procedure: a practical stopping heuristic for active learning-based screening in systematic reviews and meta-analyses

  • Josien Boetje
  • Rens van de Schoot



The LitStudy documentation 


LitStudy is a Python package that enables analysis of scientific literature from the comfort of a Jupyter notebook. It provides the ability to select scientific publications and study their metadata through the use of visualizations, network analysis, and natural language processing.

In essence, this package offers five main features:

Extract metadata from scientific documents sourced from various locations. The data is presented in a standardized interface, allowing for the combination of data from different sources.

Filter, select, deduplicate, and annotate collections of documents.

Compute and plot general statistics for document sets, such as statistics on authors, venues, and publication years.

Generate and plot various bibliographic networks as interactive visualizations.

Discover popular topics automatically using natural language processing (NLP).
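A typical notebook session chains these features together, along the lines of the sketch below. The function names follow the litstudy API but should be treated as assumptions and checked against the API reference for your version:

```python
import litstudy

# 1. Load metadata, for example from a RIS export (other loaders exist for
#    Scopus, CrossRef, arXiv, and so on).
docs = litstudy.load_ris_file("export.ris")

# 2. General statistics on the document set.
litstudy.plot_year_histogram(docs)
litstudy.plot_author_histogram(docs)

# 3. A bibliographic network.
litstudy.plot_cocitation_network(docs)

# 4. Topic discovery with NLP.
corpus = litstudy.build_corpus(docs)
model = litstudy.train_nmf_model(corpus, num_topics=10)
litstudy.plot_topic_clouds(model)
```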

Frequently Asked Questions 

If you have any questions or run into an error, see the Frequently Asked Questions section of the documentation. If your question or error is not on the list, please check the GitHub issue tracker for a similar issue or create a new issue.

Supported Sources 

LitStudy supports several data sources, including Scopus, Semantic Scholar, CrossRef, and arXiv. The documentation includes a table that lists which metadata is fully (✓) or partially (*) provided by each source.

An example notebook is available in notebooks/example.ipynb in the repository.

Installation Guide 

LitStudy is available on PyPI! A full installation guide is available in the documentation.

Or install the latest development version directly from GitHub:
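The commands would typically be the following (assuming the PyPI name litstudy and the NLeSC/litstudy repository listed under the literature-review topic below):

```
pip install litstudy
pip install git+https://github.com/NLeSC/litstudy.git
```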

Documentation 

Full documentation is available online.

Requirements 

The package has been tested with Python 3.7. Required packages are listed in requirements.txt.

litstudy supports several data sources. Some of these sources (such as Semantic Scholar, CrossRef, and arXiv) are openly available. However, to access the Scopus API, you (or your institution) need a Scopus subscription, and you need to request an Elsevier Developer API key (see Elsevier Developers). For more information, see the guide by pybliometrics.

License 

Apache 2.0. See LICENSE.

Change log 

See CHANGELOG.md .

Contributing 

See CONTRIBUTING.md .

Citation 

If you use LitStudy in your work, please cite the following publication:

S. Heldens, A. Sclocco, H. Dreuning, B. van Werkhoven, P. Hijma, J. Maassen & R. V. van Nieuwpoort (2022), "litstudy: A Python package for literature reviews", SoftwareX 20, 101207.

Related work 

Don't forget to check out these other amazing software packages!

ScientoPy : Open-source Python based scientometric analysis tool.

pybliometrics : API-Wrapper to access Scopus.

ASReview : Active learning for systematic reviews.

metaknowledge : Python library for doing bibliometric and network analysis in science.

tethne : Python module for bibliographic network analysis.

VOSviewer : Software tool for constructing and visualizing bibliometric networks.



Quantitative literature reviews with python

One of my first tasks as a new post-doc was to undertake a systematic quantitative literature review. We wanted to get a feel for the international & NZ literature on functional biodiversity in agroecosystems, and this was a bit daunting for me as my background is in invasions, not native biodiversity or agriculture! Luckily, the review method we chose relies on data, not expert knowledge: the method developed by Griffith University ( https://www.griffith.edu.au/griffith-sciences/school-environment-science/research/systematic-quantitative-literature-review ).

It's quite an exhaustive process but the method does a really good job of catching easily overlooked papers, and provides a reproducible and transparent method for conducting reviews and meta-analyses. I'm a fan! 

The first few steps involve defining your keywords and databases for undertaking the searches, and designing a way of storing your papers and extracting the data. Once you've completed the first 10% of your search, though, you'll need to do a stock-take of the papers that you're picking up and make sure you haven't missed any key words in your search terms. 

It was at this point that I realized two things: a) automating this would save me a lot of time, and b) there were no existing programs that could do what I wanted. So, I wrote up some python code to read in all papers, extract the keywords and write them to a new text file. I then used R to rank them by how commonly they occurred and graph the results, and added any commonly-occurring keywords to my database search strings. 
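My code is linked below, but a stripped-down version of what the script does might look like this (it assumes each converted text file has a line starting with "Keywords:", which varies by journal, so treat the parsing as a sketch):

```python
from collections import Counter
from pathlib import Path

counts = Counter()
for txt in Path("papers_txt").glob("*.txt"):        # output of pdftotext, one file per paper
    for line in txt.read_text(errors="ignore").splitlines():
        if line.lower().startswith("keywords") and ":" in line:
            body = line.split(":", 1)[1]            # e.g. "Keywords: corridor; habitat"
            counts.update(
                kw.strip().lower()
                for kw in body.replace(";", ",").split(",")
                if kw.strip()
            )

# Rank keywords by how often they occur across papers (I did the graphing step in R).
for kw, n in counts.most_common():
    if n > 1:
        print(f"{n}\t{kw}")
```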


It's not the prettiest graph I've ever made! These were all keywords that occurred more than once in my database. We added "corridor" and "habitat" to our searches based on this.

You can download a copy of my code at my GitHub https://github.com/pannellj/systematicreviews (but disclaimer! I am a total python newbie and it's a bit rough. Suggestions and commits are very welcome). To run the code, you'll first need to convert the PDFs of your papers to text files. I did this using the command-line utility pdftotext. The bash script is also available on GitHub.

My next post will be about the next step in the literature review, which was a bit trickier - cross-checking the papers in the database using python. Let me know in the comments if you've found this useful! 

Write a comment

ISABEL CRUZ (Friday, 08 January 2021 07:28)

Hi! Thanks for sharing your experience. I am learning Python to use it for Nursing Evidenced Based studies. I will try your code. regards!

literature-review

Here are 166 public repositories matching this topic.

ahmetbersoz / chatgpt-prompts-for-academic-writing

This list of writing prompts covers a range of topics and tasks, including brainstorming research ideas, improving language and style, conducting literature reviews, and developing research plans.

  • Updated Jan 25, 2024

patrick-llgc / Learning-Deep-Learning

Paper reading notes on Deep Learning and Machine Learning

  • Updated Jun 24, 2024
  • Jupyter Notebook

safe-graph / graph-adversarial-learning-literature

A curated list of adversarial attacks and defenses papers on graph-structured data.

  • Updated Dec 15, 2023

gkiril / oie-resources

A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.

  • Updated Oct 25, 2022

ChandlerBang / awesome-graph-attack-papers

Adversarial attacks and defenses on Graph Neural Networks.

  • Updated Feb 22, 2024

anoopkcn / obsidian-reference-map

Reference and citation map for literature review and discovery

  • Updated Jan 1, 2024

NPCai / Open-IE-Papers

Open Information Extraction (OpenIE) and Open Relation Extraction (ORE) papers and data.

  • Updated Feb 13, 2020

shaunabanana / intrigue

Organize literature into ideas, fast.

  • Updated Jan 15, 2024

sisinflab / adversarial-recommender-systems-survey

The goal of this survey is two-fold: (i) to present recent advances on adversarial machine learning (AML) for the security of RS (i.e., attacking and defense recommendation models), (ii) to show another successful application of AML in generative adversarial networks (GANs) for generative applications, thanks to their ability for learning (high-…

  • Updated Mar 3, 2021

NLeSC / litstudy

LitStudy: Using the power of Python to automate scientific literature analysis from the comfort of a Jupyter notebook

  • Updated Jul 31, 2024

nealhaddaway / citationchaser

Perform forward and backward citation chasing as part of an evidence synthesis project

  • Updated Apr 12, 2024

LocalCitationNetwork / LocalCitationNetwork.github.io

This web app aims to help scientists with their literature review using metadata from OpenAlex (OA), Semantic Scholar (S2) and Crossref (CR) in local citation networks.

  • Updated Jun 9, 2024

lisc-tools / lisc

Literature Scanner: Automated collection & analyses of the scientific literature.

  • Updated Sep 1, 2024

sparks-baird / auto-paper

The aim of auto-paper is to give you tips, tricks, and tools to accelerate your publication rate and improve publication quality.

  • Updated Feb 4, 2023
  • Mathematica

JoaoFelipe / snowballing

Provides tools for literature snowballing

  • Updated Oct 27, 2022

jejjohnson / gp_model_zoo

Literature and light wrappers for gaussian process models.

  • Updated Jun 26, 2021

drshahizan / ai-tools

AI-powered literature review tools leverage machine learning to expedite and enhance the scholarly process of identifying, analyzing, and synthesizing relevant research.

  • Updated Sep 3, 2024

dvklopfenstein / pmidcite

Turbocharge a PubMed literature rather than clicking and clicking and clicking on Google Scholar

  • Updated Jun 20, 2024

yassinekdi / naimai

Package to help with scientific literature research

  • Updated Dec 8, 2022

luiscruz / awesome-mobile-app-energy-papers

A curated list of awesome papers that study energy efficiency for mobile applications.

  • Updated Apr 11, 2023


COMMENTS

  1. litstudy: A Python package for literature reviews

    In this work, we present litstudy [1]: a Python package that assists in exploring scientific literature. The package can be used from simple Python scripts or Jupyter notebooks [2], allowing researchers to quickly and interactively experiment with different ideas and methods.

  2. litstudy

    LitStudy is a Python package that enables analysis of scientific literature from the comfort of a Jupyter notebook. It provides the ability to select scientific publications and study their metadata through the use of visualizations, network analysis, and natural language processing. In essence, this package offers five main features: Extract ...

  3. GitHub

    LitStudy is a Python package that enables analysis of scientific literature from the comfort of a Jupyter notebook. It provides the ability to select scientific publications and study their metadata through the use of visualizations, network analysis, and natural language processing.

  4. litreviewer: A Python Package for Review of Literature (RoL)

    The package litreviewer is a collection of Python modules useful for performing text mining. The methods defined in these modules have worked effectively for web crawling and scraping. The package also supports statistical analysis and other utility functions related to text concatenation ...

  5. litstudy: A Python package for literature reviews

    litstudy: A Python package for literature reviews. Stijn Heldens, Alessio Sclocco, Henk Dreuning, Ben van Werkhoven, Pieter Hijma, Jason Maassen, Rob V. van Nieuwpoort.

  6. Litstudy: A Python Package for Literature Reviews

    Therefore, we present litstudy, a Python package that allows answering such questions using simple scripts or Jupyter notebooks. The package enables selecting scientific publications and studying their metadata using visualizations, network analysis, and natural language processing. ... Keywords: Literature review, Python, Jupyter, Bibliometrics.

  7. literature-review · GitHub Topics · GitHub

    Written in Python, for checking reference lists in systematic reviews and literature reviews. It helps with reference-list searching both backward and forward by extracting references and creating search queries, ranks articles by relevance to improve screening efficiency, and downloads full-text PDFs of research articles in batch.

  8. litstudy: A Python package for literature reviews

    Researchers are often faced with exploring new research domains. Broad questions about the research domain, such as who are the influential authors or what are important topics, are difficult to answer due to the overwhelming number of relevant publications. Therefore, we present litstudy: a Python package that enables answering such questions using simple scripts or Jupyter notebooks. The ...

  9. systematic-reviewpy · PyPI

    The main objective of the Python framework is to automate systematic reviews to save reviewers time without creating constraints that might affect the review quality. The other objective is to create an open-source and highly customisable framework with options to use or improve any parts of the framework. The Python framework supports each step in ...

  10. researchpal

    researchpal is a Python library that automates the process of generating academic literature reviews based on a research question. It utilizes external data sources to fetch research papers, synthesizes the findings, and generates a concise literature review. This library is particularly useful for researchers and students looking to streamline ...

  11. litreviewer is a Python package (a collection of Python modules) that ...

    `litreviewer`: A Python Package for Review of Literature (RoL). Background: secondary research. ... A literature review is a systematic and comprehensive analysis of books, scholarly articles, and other sources relevant to a specific topic, providing a base of knowledge on that topic. A literature review is an overview of the previously published ...

  12. The Best Python Books

    Note: If you're looking for the best Python books for experienced programmers, consider the following selection of books with full reviews in the intro and advanced sections. Think Python: the most basic of this list, it provides a comprehensive Python reference. Fluent Python: while Python's simplicity lets you quickly start coding, this book teaches you how to write idiomatic ...

  13. An open source machine learning framework for efficient and ...

    Compiled and packaged versions of the software are available on the Python Package ... S., Bowes, D., Gray, D. & Counsell, S. A systematic literature review on fault prediction performance in ...

  14. The LitStudy documentation

    LitStudy is a Python package that enables analysis of scientific literature from the comfort of a Jupyter notebook. It provides the ability to select scientific publications and study their metadata through the use of visualizations, network analysis, and natural language processing. In essence, this package offers five main features: Extract ...

  15. pyResearchInsights—An open-source Python package for scientific text

    1 INTRODUCTION. Keeping track of conceptual and methodological developments in any scientific discipline is imperative to advance research. An exponential growth in published scientific literature has made it extremely difficult to keep track of scientific advancements (Roll et al., 2018). Within the field of ecology, we have observed a twofold increase in published literature over the last ...

  16. Performing Literature Review using Python Script (litreviewer)

    "litreviewer' is Python Script useful for performing Review of Literature (RoL) and also find gaps in existing body of literature. Visit https://github.com/K...

  17. TechMiner: Analysis of bibliographic datasets using Python

    In research, the systematic literature review methodology is commonly used ... In Fig. 3, the column explorer is used to search for articles containing the term python in the cleaned author keywords field in the dataset. This app allows the user to view the basic bibliographic information of the selected document and search for documents that ... (A pandas sketch of this kind of keyword filtering appears after this list.)

  18. litstudy: A Python package for literature reviews

    Therefore, we present litstudy: a Python package that enables answering such questions using simple scripts or Jupyter notebooks. The package enables selecting scientific publications and studying their metadata using visualizations, bibliographic network analysis, and natural language processing.

  19. Quantitative literature reviews with python

    Quantitative literature reviews with python. One of my first tasks as a new post-doc was to undertake a systematic quantitative literature review. We wanted to get a feel for the international & NZ literature on functional biodiversity in agroecosystems, and this was a bit daunting for me as my background is in invasions, not native ...

  20. systematic-literature-reviews · GitHub Topics · GitHub

    This is an application that generates systematic reviews powered by GPT. It searches Google Scholar with a given query and generates a systematic review using the search results. Tags: python, gpt, systematic-literature-reviews, systematic-review, llm. Updated on May 12, 2023.

  21. GitHub

    This project contains Python code in Jupyter Notebooks used to conduct a systematic literature review using Google Scholar via a SERP API service. It simplifies three aspects of literature reviews: it creates a spreadsheet with papers, snippets, and links to the PDFs ready for fast dual coding ... (A sketch of this SERP-API approach appears after this list.)

  22. literature-review · GitHub Topics · GitHub

    This list of writing prompts covers a range of topics and tasks, including brainstorming research ideas, improving language and style, conducting literature reviews, and developing research plans. Tags: writing, prompt, academic-writing, literature-review, gpt3, ai-writing, gpt4, chatgpt, chatgpt-prompts, gpt35, customgpt. Updated on Jan 25.
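
Items 1, 2, and 14 above quote litstudy's feature list: select publications, then study their metadata with visualizations, network analysis, and natural language processing. Below is a minimal sketch of that workflow, assuming the function names given in the litstudy documentation (load_bibtex, plot_year_histogram, build_cocitation_network, build_corpus, train_nmf_model, plot_topic_clouds); double-check them against the installed version.

```python
# Minimal litstudy workflow sketch, assembled from the feature list quoted above.
import litstudy

# 1. Select publications: load metadata exported from a reference manager.
docs = litstudy.load_bibtex("references.bib")

# 2. Visualize metadata, e.g. publications per year.
litstudy.plot_year_histogram(docs)

# 3. Bibliographic network analysis, e.g. a co-citation network.
network = litstudy.build_cocitation_network(docs)

# 4. Natural language processing: topic modeling on titles/abstracts.
corpus = litstudy.build_corpus(docs)
model = litstudy.train_nmf_model(corpus, num_topics=5)
litstudy.plot_topic_clouds(model)
```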
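Item 17 above shows TechMiner's column explorer filtering a dataset for the term python in the cleaned author-keywords field. The same kind of keyword filtering can be sketched with plain pandas; this is not TechMiner's API, and the file name and the "Author Keywords" column are assumptions about the export format.

```python
# Keyword filtering over a bibliographic dataset, sketched with plain pandas.
import pandas as pd

df = pd.read_csv("scopus_export.csv")  # assumed file name

# Case-insensitive match for 'python' in the author-keywords field.
mask = df["Author Keywords"].fillna("").str.contains("python", case=False)
hits = df[mask]

print(hits[["Title", "Year", "Author Keywords"]].head())
```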
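Item 21 above drives Google Scholar through a SERP API service and writes the hits to a spreadsheet for dual coding. Below is a sketch of that pipeline using the serpapi client from the google-search-results package; the parameter and result-field names follow that client's documentation as recalled here and should be verified, and the API key is a placeholder.

```python
# Google Scholar search via a SERP API, written to a spreadsheet.
import pandas as pd
from serpapi import GoogleSearch

search = GoogleSearch({
    "engine": "google_scholar",
    "q": "literature review python",
    "api_key": "YOUR_API_KEY",  # placeholder
})
results = search.get_dict().get("organic_results", [])

rows = [
    {"title": r.get("title"), "link": r.get("link"), "snippet": r.get("snippet")}
    for r in results
]
pd.DataFrame(rows).to_csv("papers.csv", index=False)  # ready for dual coding
```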