
The Pros & Cons of Virtual Labs Based on 1,614 Instructors & Students 



“Science is hard! I do better to learn at my own pace and re-read things that I didn’t understand before going to the next topic, whereas in class with lecture, the professor moves at a very fast pace.” 

That’s just what one student said in our survey of 1,293 students who took online science labs in 2022-23. That sentiment echoes through many of the student responses and underscores the importance of delivering online lab courses in ways that support students’ needs and help them engage more deeply with course material. To do that well, lab format, specifically the choice between hands-on and virtual labs (a.k.a. online science lab simulations), plays a powerful role, and the 321 instructors we surveyed on the topic said the same.

In this article, we explore some of the reasons why virtual labs have gained popularity, and share insights from our Annual Lab Report that shed light on how to use virtual labs most effectively and when they fall short.   

The Drawbacks of Virtual Science Labs

While it’s clear virtual labs have an important role in your course, there are also drawbacks—especially if you’re relying solely on this format for labs. Studies have shown that courses using virtual-only simulations end up being less effective in providing students with the experiences they need to acquire new skills, achieve course competencies, and apply knowledge. 

Lacking Real-World Application: Virtual labs can bridge theory and practice by providing a multimedia connection between abstract concepts and practical execution. But when you think about how to teach organic chemistry online, for example, consider the years of training it takes to become skilled in research laboratory techniques. Dealing with the frustrations of getting equipment to work and developing the muscle memory of performing work hands-on cannot be replicated on virtual platforms. For students looking to pursue advanced courses or secure jobs in any science-related field, real-world experience is an essential part of becoming skilled.

According to our recent research report, 74% of students who only participated in virtual labs said they would have felt more confident applying what they learned in real-world situations if their labs had a hands-on component.

Less Effective Student Learning: When used as the only form of learning, virtual labs don’t provide a holistic and compelling learning experience. The experience of working with physical solutions and lab equipment is considered part of the learning process, and a student who wants to be a chemist, for instance, must handle chemicals in a physical lab at some point. So despite their increasing levels of realism and fidelity to the physical world, virtual labs aren’t comparable to the experience of conducting physical labs, which has important implications when it comes to student performance and achievement.

Prone to Cheating: Another drawback of virtual labs is the ease with which students might be tempted to cheat. With virtual labs, it’s easy to click through simulations quickly and find answers online. As Lori Frear, Biology instructor at Richmond Community College, explains, “With simulated labs, students will appear to do better — meaning, they get higher grades than those doing hands-on online labs. That’s because digital labs are often based on effort, whereas with hands-on labs, if students don’t show evidence of having completed the lab, it will impact their grade. You can’t wing it in hands-on online labs and expect to learn it and get a good grade. The hands-on labs allow students to process information and think through concepts in ways that the clicking of digital-only labs don’t allow.”

Using hands-on labs mimics a traditional lab setting, making it more difficult to cheat. So just be sure that if you’re using virtual labs, you’re also using technology that can curb academic misconduct. For example, lab management platforms should have built-in tools that prevent students from accessing content during evaluations.

Accessibility Concerns: Because virtual science labs rely on a variety of technologies, accessibility is particularly important. Before selecting a virtual lab, ensure that all students will be able to use the program easily. Some factors to consider include:

  • Do all videos and audio have captions and transcripts? 
  • Will a screen reader correctly read all aspects of the program? 
  • Can the entire program be navigated with just a keyboard? 
  • Will the program work with low-quality or unreliable internet access? 
  • Can the lab be completed on a phone or tablet?

In general, virtual science labs can be an important element in a program’s efforts to expand access to lab-based courses, but they are often not a good substitute for sophisticated, highly complex lab activities, which points to the more effective approach we return to below.

The Benefits of Virtual Labs

Virtual science labs are digital presentations that provide students with a way to experience the steps of a scientific experiment on a screen. They use graphics and animations to present specific topics. Some are like videos — students simply click play and watch the simulation take place — whereas others are interactive, allowing students to manipulate variables to discover different outcomes. Some examples include representations that demonstrate a law of physics, the effects of gravity, or a frog dissection typical of many biology lab simulations.

There are several benefits to using virtual simulations in your online lab course. As one student puts it: “The online labs gave us a chance to do experiments over and over again, until they were clearly understood. That was a great advantage.”

In fact, 81% of students said these online labs made their course more engaging. In addition to repeating these labs in a low-stakes environment, as many times as needed, to master content, students could interact more thoughtfully with course material, complete labs at their own pace, and get immediate feedback tailored to their performance. And there are several instances where it makes sense to choose virtual labs.

Observing the Unobservable: An important benefit of virtual labs is that students can ‘conduct’ experiments on phenomena that would normally be unobservable or unsafe to observe (de Jong et al., 2013; Faulconer & Gruss, 2018). Virtual labs can portray abstract objects, such as light rays, that students cannot see in a physical lab, which helps students understand the concept better.

Some chemical reactions might also be unsafe to conduct in person but could be easily reproduced using a chemistry lab simulator. For example, take experiments in qualitative analysis that use toxic metals or the nitrogen dioxide/dinitrogen tetroxide dynamic equilibrium. The chemicals are too toxic and the glassware too expensive to provide students with a hands-on experience, but through a simulation, students can conduct the tests and see the reactions in the context of a real-world scenario. Virtual labs like these have been shown to increase students’ conceptual understanding of certain topics (Kollöffel & de Jong, 2013).

Particular Learning Outcomes: Virtual labs can often be appropriate for certain learning outcomes and topics. For example, learning about the scientific method or preparing graphs doesn’t require a hands-on lab, and these topics are often best approached using a virtual lab.

Price Consciousness: With virtual labs, students don’t have to pay for access to large scientific apparatus or expensive equipment, as they would for on-campus labs. In fact, some lab equipment is so expensive that it isn’t feasible to have it on campus. Incorporating a mix of virtual and hands-on labs can therefore be an effective strategy for balancing costs.

Safety: In a virtual lab, students can’t injure themselves or others, or break equipment, as they could in a campus lab or a hands-on home setting. Still, the best virtual labs should instruct students on lab safety protocols and show them how to use equipment and materials properly—setting them up for safety and success should they ever find themselves in a more traditional lab setting.

Repeatability & Preparation: Finally, with a virtual lab, a student can run a digital experiment as many times as needed to get the desired result. With each experiment, students can manipulate variables, run the experiment, and immediately see the results. With each run, students engage more deeply with the concept, which is helpful for struggling students or those who need to repeat activities for a deeper understanding. Similarly, using virtual pre-labs in preparation for on-campus labs can also be beneficial, allowing students to practice repeatedly before entering a physical lab.

The Right Mix of Hands-on & Virtual Labs

We already know what you’re probably thinking! Hands-on labs increase cost. They’re hard to implement. And does anyone know just how effective they are?

Hands-on Improves Student Learning

Well, it turns out adopting hands-on labs is easier than you think and significantly improves learning outcomes. 85% of instructors using hands-on labs were confident students could apply what they learned to the real world. Unlike with virtual simulations, students gain the hands-on experience of using equipment and carrying out experiments, building technical and psychomotor skills as well as the problem-solving and critical thinking skills they would develop in a traditional lab.

Hands-on Labs Replicate the In-Person Lab Experience

Comparative studies suggest that students learn just as much from these labs and in ways similar to those who attend in-person labs, while virtual simulations when used alone aren’t as effective (Casanova et al., 2006; Reuter, 2009). In fact, 71% of surveyed instructors said using hands-on labs made their labs feel comparable to an in-person experience, whereas the majority of instructors using virtual simulations said the lab format was NOT comparable to an in-person lab experience.

Hands-on Costs, Implementation & Safety Concerns Eased

Further, when it comes to cost, survey data showed that 22% of instructors cited cost as the reason they didn’t adopt hands-on labs. However, the instructors who did use hands-on labs thought the cost was commensurate with the value of the experience. In fact, both instructors and students agree that students learn more when they have the opportunity to do hands-on labs at home and that the cost of those hands-on labs is worth the value received. Nearly three-quarters of students said the cost of the hands-on labs was aligned with the value received.

In fact, Donna Uguccioni, Anatomy & Physiology Professor at Cape Fear Community College, says, “When I started teaching Anatomy & Physiology online, we used kits but the costs became prohibitive for the students and the goals of the college. After that, we used a hard copy lab manual, which matched what we use in hybrid and seated classes. It worked fine but not the best option for a fully online course. Then we found Science Interactive. The labs tie into what we cover in class and offer students a hands-on lab experience in their own home. The kit is completely customizable to match our learning objectives, and the price point makes it a perfect addition and replacement.”

Similarly, nearly a third of instructors say they don’t use hands-on labs because of implementation concerns. While implementing hands-on labs might seem daunting at first glance, it can be a smooth process when you find partners who are experienced in supporting instructors through implementation and who can provide ready-made lab kits to meet your needs.

Finally, hands-on labs can be designed to ensure student safety. With a focus on training, workstation preparation, clear guidance, instruction on how to handle lab equipment, and the use of microscaling, hands-on labs can be done safely at home without compromising quality.

The Most Effective Approach to Online Labs

“For someone like me who is a bit older, it may have benefitted me to have had a mixture of hands-on as well as virtual. I say that because I had so much fear of failure and that I wasn’t doing the experiments right.”

This student, along with others, emphasizes the importance of using a blend of hands-on and virtual labs, as well as the benefits of using virtual simulations to supplement in-person courses.

In the end, the most successful courses will use an intentional blend of both virtual and hands-on science labs to create the most effective and authentic student learning experience possible. 

Identifying lessons that can be done virtually and others that can provide hands-on experience will keep course costs reasonable, ensure student safety and content accessibility, and prepare students with the skills and knowledge needed to apply concepts to the real world. 

If you’re interested in delivering a more authentic and engaging online lab course, we can help.



The Surprising Power of Online Experiments

  • Ron Kohavi
  • Stefan Thomke


In the fast-moving digital world, even experts have a hard time assessing new ideas. Case in point: At Bing, a small headline change proposed by an employee was deemed a low priority and shelved for months until one engineer decided to do a quick online controlled experiment—an A/B test—to try it out. The test showed that the change increased revenue by an astonishing 12%. It ended up being the best revenue-generating idea Bing ever had, worth $100 million.

That experience illustrates why it’s critical to adopt an “experiment with everything” approach, say Kohavi, the head of the Analysis & Experimentation team at Microsoft, and Thomke, an HBS professor. In this article they describe how to properly design and execute A/B and other controlled tests, ensure their integrity, interpret results, and avoid pitfalls. They argue that if a company sets up the right infrastructure and software, it will be able to evaluate ideas not only for improving websites but also for new business models, products, strategies, and marketing campaigns—all relatively inexpensively. This will help it find the right path forward, especially when answers aren’t obvious or people have conflicting opinions.

Getting the most out of A/B and other controlled tests

The Problem

When building websites and applications, too many companies make decisions—on everything from new product features, to look and feel, to marketing campaigns—using subjective opinions rather than hard data.

The Solution

Companies should conduct online controlled experiments to evaluate their ideas. Potential improvements should be rigorously tested, because large investments can fail to deliver, and some tiny changes can be surprisingly detrimental while others have big payoffs.

Implementation

Leaders should understand how to properly design and execute A/B tests and other controlled experiments, ensure their integrity, interpret their results, and avoid pitfalls.

In 2012 a Microsoft employee working on Bing had an idea about changing the way the search engine displayed ad headlines. Developing it wouldn’t require much effort—just a few days of an engineer’s time—but it was one of hundreds of ideas proposed, and the program managers deemed it a low priority. So it languished for more than six months, until an engineer, who saw that the cost of writing the code for it would be small, launched a simple online controlled experiment—an A/B test—to assess its impact. Within hours the new headline variation was producing abnormally high revenue, triggering a “too good to be true” alert. Usually, such alerts signal a bug, but not in this case. An analysis showed that the change had increased revenue by an astonishing 12%—which on an annual basis would come to more than $100 million in the United States alone—without hurting key user-experience metrics. It was the best revenue-generating idea in Bing’s history, but until the test its value was underappreciated.
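For readers who want to see the mechanics, here is a minimal sketch of how the outcome of a simple two-variant test on revenue per user might be analyzed. The data, sample sizes, and effect size are made-up placeholders; this is not Bing’s actual analysis pipeline.

```python
# Minimal sketch of analyzing a two-variant (A/B) test on revenue per user.
# Data, sample sizes, and effect size are made-up placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical per-user revenue for control (A) and treatment (B).
revenue_a = rng.gamma(shape=2.0, scale=1.00, size=100_000)
revenue_b = rng.gamma(shape=2.0, scale=1.02, size=100_000)

lift = revenue_b.mean() / revenue_a.mean() - 1.0
t_stat, p_value = stats.ttest_ind(revenue_b, revenue_a, equal_var=False)  # Welch's t-test

print(f"relative lift in revenue per user: {lift:+.2%}")
print(f"p-value: {p_value:.3g}")
# Ship only if the lift is positive, practically meaningful, and the
# p-value clears the team's pre-agreed threshold.
```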

  • Ron Kohavi is a distinguished engineer and the general manager of the Analysis & Experimentation team at Microsoft. Previously, he was the director of data mining and personalization at Amazon, where he was responsible for Weblab, Amazon’s experimentation system.
  • Stefan Thomke is the William Barclay Harding Professor of Business Administration at Harvard Business School. He is a leading authority on the management of business experimentation and innovation and has worked with many global companies on product, process, and technology development. He is the author of Experimentation Works: The Surprising Power of Business Experiments (HBR Press, 2020).



Methodology | Open access | Published: 07 February 2020

Online randomized controlled experiments at scale: lessons and extensions to medicine

  • Ron Kohavi 1,2,
  • Diane Tang 3,
  • Ya Xu 4,
  • Lars G. Hemkens 5 &
  • John P. A. Ioannidis 6,7,8,9,10

Trials, volume 21, Article number: 150 (2020)

Abstract

Many technology companies, including Airbnb, Amazon, Booking.com , eBay, Facebook, Google, LinkedIn, Lyft, Microsoft, Netflix, Twitter, Uber, and Yahoo!/Oath, run online randomized controlled experiments at scale, namely hundreds of concurrent controlled experiments on millions of users each, commonly referred to as A/B tests. Originally derived from the same statistical roots, randomized controlled trials (RCTs) in medicine are now criticized for being expensive and difficult, while in technology, the marginal cost of such experiments is approaching zero and the value for data-driven decision-making is broadly recognized.

Methods and results

This is an overview of key scaling lessons learned in the technology field. They include (1) a focus on metrics, an overall evaluation criterion and thousands of metrics for insights and debugging, automatically computed for every experiment; (2) quick release cycles with automated ramp-up and shut-down that afford agile and safe experimentation, leading to consistent incremental progress over time; and (3) a culture of ‘test everything’ because most ideas fail and tiny changes sometimes show surprising outcomes worth millions of dollars annually.

Technological advances, online interactions, and the availability of large-scale data allowed technology companies to take the science of RCTs and use them as online randomized controlled experiments at large scale with hundreds of such concurrent experiments running on any given day on a wide range of software products, be they web sites, mobile applications, or desktop applications. Rather than hindering innovation, these experiments enabled accelerated innovation with clear improvements to key metrics, including user experience and revenue. As healthcare increases interactions with patients utilizing these modern channels of web sites and digital health applications, many of the lessons apply. The most innovative technological field has recognized that systematic series of randomized trials with numerous failures of the most promising ideas leads to sustainable improvement.

While there are many differences between technology and medicine, it is worth considering whether and how similar designs can be applied via simple RCTs that focus on healthcare decision-making or service delivery. Changes – small and large – should undergo continuous and repeated evaluations in randomized trials and learning from their results will enable accelerated healthcare improvements.

Background

Every major technology company runs online controlled experiments, often called A/B tests, to gather trustworthy data and make data-driven decisions about how to improve their products. All these controlled experiments are randomized. Companies that make widespread use of this approach include Microsoft [ 1 , 2 , 3 ], Google [ 4 , 5 ], LinkedIn [ 6 , 7 , 8 ], Facebook [ 9 ], Amazon [ 10 ] and Intuit [ 11 ]. Much of the methodology used in these online controlled experiments derives from the same family of experimental methods developed in the earlier part of the twentieth century that led to randomized controlled trials (RCT) in medicine [ 12 ]. The scale of online controlled experiments has grown dramatically in the last decade, as marginal costs approach zero. In this paper, we share some insights about the evolution and use of A/B tests and derive some key lessons that may be useful for medicine.

It may be possible to translate some of the advantages of online controlled experiments to medicine and invigorate the traditional RCT designs and their applications. In particular, RCTs in medicine are often criticized for being expensive, requiring longer follow-up to obtain reliable answers, and difficult to do. This criticism draws mostly on the paradigm of licensing trials for new medications and biologics, typically done in strictly controlled settings under very specific circumstances. However, a very large number of questions in medicine, health, and healthcare could potentially be answered with simple RCTs at significantly lower cost. Such trials are conducted in a pragmatic fashion and directly address issues of decision-making, such as whether to do or not to do some procedure, test, intervention, information offering, quality improvement, service delivery [ 13 ], or management or policy change. They aim to directly compare the effects of choosing option A or option B and outcomes can be collected routinely, for example, obtained from interactions with web sites, mobile applications, and desktop applications, wearable devices or electronic health records, or from reimbursement claims or financial datasets. There are ongoing initiatives aiming to improve the design and affordability of trials or the use of routinely collected data for RCTs [ 14 , 15 , 16 ]. Some outcomes may be possible to meaningfully collect very quickly, for example, rehospitalization rates, which is increasingly possible using routinely collected data from electronic health records, administrative data, or registries [ 13 , 16 ]. In this regard, it would be very useful to learn from the A/B testing experience in technology and allow the medical and healthcare research community to consider whether and how similar designs can be applied in a focused fashion or at massive scale in biomedicine as well.

The test everything with controlled experiments theme

In the digital world, data is generated and collected at an explosive rate. More than 4 billion of the world’s 7.6 billion people are connected to the internet. The volume and frequency of data production are enormous. For example, Google receives billions of queries every day [ 17 ], and along with these queries, terabytes of telemetry data are logged to improve the service. Over the years, technology has also been developed not only to handle the volume and frequency of the data flowing around but also to improve the transfer speed, reliability, and security of data. Digital collection of data has become much cheaper and more reliable.

At Google, LinkedIn, and Microsoft, where three of the co-authors work, the value of online controlled experiments became clear – tiny changes had surprisingly large impact on key metrics, while big expensive projects often failed. About two-thirds of experiments show that promising ideas that we implemented in products failed to improve the metrics they were designed to change, and this was worse in well-optimized domains such as the search engines [ 2 ], where failures were in the range of 80–90%. The humbling results led to a theme of ‘test everything with controlled experiments’ coupled with the idea of testing Minimum Viable Products popularized by Eric Ries in the Lean Startup [ 18 ] – the sooner we can get ideas into controlled experiments and thus get objective data, the sooner we can learn and adjust. A motivating example is described in Table 1 .

Figure 1 shows how the different organizations scaled experimentation over the years, with year 1 being the year in which experimentation scaled to over one experiment per day (over 365/year). The graph shows an order of magnitude growth over the next 4 years for Bing, Google, and LinkedIn. In the early years, growth was slowed by the capabilities of the experimentation platform itself. In the case of Microsoft Office, which just started to use controlled experiments as a safe deployment mechanism for feature rollouts at scale in 2017, the platform was not a limiting factor because of its prior use in Bing, and feature rollouts, run as controlled experiments, grew by over 600% in 2018. Growth slows down when the organization reaches a culture of ‘test everything’ and the limiting factor becomes its ability to convert ideas into code that can be deployed in controlled experiments.

Fig. 1: Experimentation growth over the years since experimentation operated at a scale of over one new experiment per day

Today, Google, LinkedIn, and Microsoft are at a run rate of over 20,000 controlled experiments/year, although counting methodologies differ (e.g., ramping up the exposure from 1% of users to 5% to 10% can be counted as one or three experiments; an experiment consisting of a control plus two treatments can count as either one or two experiments).

Phases of technical and cultural change

Software development organizations that start to use controlled experiments typically go through phases of technical and cultural changes as they scale experimentation. Here are the key axes along which this evolution happened at Google, LinkedIn, and Microsoft.

Scale and statistical power

Firstly, to scale experimentation, the experimentation platform must support the capability of exposing a single user to multiple experiments. Whether the experimentation surface (web site, mobile app, desktop app) has 10,000 monthly active users or 100 million (as Bing, Google, and LinkedIn have), there are never enough users if each user is exposed to just a single experiment. Web sites (like Bing and Google) with multibillion-dollar annual revenues that depend on a single key web page (e.g., the search engine results page, or SERP) imply that we must be able to detect small effects – not detecting a true 0.5% relative degradation to revenue will cost tens of millions of dollars. In the medical literature, looking for such effects would be equivalent to looking for risk ratios of 1.005 or less, which is one order of magnitude lower than the threshold of what are considered ‘tiny effects’ (relative risks < 1.05) [ 21 ]. However, this may be very different on a public health level. Here, on a large scale, the impact of tiny effects can be substantial. For example, the effect of fruits and vegetables may be tiny per serving on reducing cancer risk individually (with a HR of 0.999) but substantial at a population level [ 21 ].

High statistical power is required, and the way to achieve this is to expose each user to multiple experiments. Because the relationship between the detectable effect and the number of users needed is quadratic [ 22 ], the ability to detect an effect twice as small, e.g., 0.25%, requires quadrupling the number of users. For Bing, Google, and LinkedIn, it is common for each experiment to be exposed to over a million users.
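To make the quadratic relationship concrete, here is a small sketch using the standard two-sample power formula; the baseline mean and standard deviation are illustrative placeholders, not any company’s actual metric values.

```python
# Sketch: users per variant needed to detect a relative change in a metric,
# from the standard two-sample formula n ~ 2 * (z_{1-a/2} + z_power)^2 * s^2 / d^2.
# Baseline mean/std are illustrative placeholders.
from scipy.stats import norm

def users_per_variant(mean, std, relative_effect, alpha=0.05, power=0.8):
    delta = relative_effect * mean                 # absolute effect to detect
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)  # z_{1-alpha/2} + z_{power}
    return 2 * (z * std / delta) ** 2

for effect in (0.005, 0.0025):                     # 0.5% vs. 0.25% relative change
    n = users_per_variant(mean=1.0, std=5.0, relative_effect=effect)
    print(f"detect a {effect:.2%} change: ~{n:,.0f} users per variant")

# Halving the detectable effect (0.5% -> 0.25%) quadruples the required users.
```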

If the results are surprising, such as a much larger effect being seen than expected, then the experiment will typically be rerun with tens of millions of users to gain confidence in the results. Both the act of replication and the increased power are important factors in increased trust in the results.

All three companies started with a simple system running experiments on disjoint users, and all switched to concurrent, or overlapping, experiments [ 2 , 4 , 7 ]. A user visiting Bing, Google, or LinkedIn today is exposed to tens of experiments, which may change the user interface, personalization, ranking algorithms, and infrastructure (e.g., improving site speed).
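A common way to support overlapping experiments is to make each assignment a deterministic function of the user ID and the experiment (or layer) ID, so a user’s variant is stable per experiment but effectively independent across experiments. The sketch below illustrates the idea with hash-based bucketing; the bucket count and function names are assumptions, not a description of any specific company’s platform.

```python
# Sketch of deterministic per-experiment randomization via hashing, so one user
# can be in many concurrent experiments with independent assignments.
# Bucket count and percentages are illustrative only.
import hashlib

N_BUCKETS = 1000

def bucket(user_id: str, experiment_id: str) -> int:
    """Stable bucket in [0, N_BUCKETS) for this user/experiment pair."""
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % N_BUCKETS

def assign(user_id: str, experiment_id: str, treatment_fraction: float) -> str:
    """Same user always gets the same variant for a given experiment."""
    in_treatment = bucket(user_id, experiment_id) < treatment_fraction * N_BUCKETS
    return "treatment" if in_treatment else "control"

print(assign("user-123", "new-ad-headline", treatment_fraction=0.5))
print(assign("user-123", "ranking-tweak-7", treatment_fraction=0.1))
```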

To ensure robustness given the high level of concurrency, mechanisms were developed to prevent interactions (e.g., by declaring constraints or parameters being modified, the system will guarantee disjoint users for those experiments), and nightly tests are sometimes run that test all pairs of experiments for interactions. A classic example of an interaction involves two different experiment treatments, each of which adds a line that pushes the buy button at a retail site further down the page. A user in both treatments experiences a buy button pushed below the ‘fold’ (bottom of screen), and thus add-to-carts drop. In our experience, unexpected interactions in technology are rare, and they are addressed by serializing the experiments or, more commonly, by identifying and fixing software issues that show up when users are exposed to multiple experiments.

Incremental costs

Secondly, the cost (developer time, data scientist time, hardware resources) of setting up and analyzing experiments is initially high but comes down with scale. As the experimentation platform matures, running and analyzing experiments becomes self-service. For instance, at Google, LinkedIn, and Microsoft, developers, data scientists and product/program managers set up experiments using a browser interface; over 1000 metrics are then computed for each experiment, ranging from various engagement metrics (e.g., pageviews and clicks) to monetization (e.g., revenue and subscription rates) to service metrics (e.g., queries-per-second, latency, and crash rates). It is common that after an experiment is activated, one can get the first read on the experiment impact in minutes for critical metrics. Such near-real-time data pipelines are used to abort egregiously bad experiments or for supporting an experiment to be ramped up from a small percentage of users to a larger one.

Data scientists with statistics and coding background (able to manipulate large amounts of data) are involved in only a small percentage of experiments (e.g., under 5%), where special experiment designs are needed or a deep-dive analysis is required (e.g., two metrics that are normally highly correlated move in opposite directions). As another example of a surprisingly hard problem, some clicks are caused by bots – automated programs that scrape the web site – and should be removed from the analysis as they introduce non-human signals that could skew results or reduce statistical power. At Bing, over 50% of US web traffic is due to bots and the proportion is about 90% in China and Russia; fairly sophisticated mechanisms have been developed to detect bots and remove them.
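As a purely illustrative sketch, a naive first-pass bot filter might exclude traffic whose user agent matches known crawlers or whose activity rates are implausible for a human; the field names and thresholds below are assumptions, and production bot detection relies on far more sophisticated signals.

```python
# Deliberately naive bot-filtering sketch; real detection combines many signals.
# Field names and thresholds are assumptions for illustration only.
def is_probable_bot(session: dict) -> bool:
    crawler_tokens = ("bot", "spider", "crawler")
    user_agent = session.get("user_agent", "").lower()
    clicks = session.get("clicks_last_hour", 0)
    queries = session.get("queries_last_hour", 0)
    return (
        any(token in user_agent for token in crawler_tokens)
        or clicks > 600        # implausibly high sustained click rate for a human
        or queries > 1000      # implausibly high query volume for a human
    )

sessions = [
    {"user_agent": "Mozilla/5.0", "clicks_last_hour": 14, "queries_last_hour": 9},
    {"user_agent": "ExampleBot/2.1", "clicks_last_hour": 5000, "queries_last_hour": 4800},
]
human_sessions = [s for s in sessions if not is_probable_bot(s)]
print(f"{len(human_sessions)} of {len(sessions)} sessions kept for analysis")
```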

Culture change

Thirdly, when the experimentation platform is no longer limiting the number of experiments (neither technically nor due to costs), the culture changes to the abovementioned ‘test everything with controlled experiments’ mentality. The limiting factor to innovation now becomes the ability to generate ideas and develop the code for them. Software development cycles shrink to enable quick iterations and feedback loops based on the idea of the Minimum Viable Product [ 18 ], which means that you build just enough of an idea so that it can be tested in a controlled experiment and then get feedback and iterate. The key observation is that long development cycles based on the traditional waterfall model often fail to meet their goals due to optimistic assumptions and changing requirements; to paraphrase Helmuth von Moltke, ideas rarely survive contact with customers. Instead, we want to test an idea quickly with real users in a controlled experiment and learn from the results and feedback (mostly implicit, but sometimes explicit through feedback links and surveys). Several changes typically happen, as follows:

Release frequency (tempo) improves. Increasing the frequency of software deployments with controlled experiments improves the stability and reliability of software because small changes that are evaluated in isolation allow quick corrections before major maldevelopments have big consequences (e.g., rollbacks) [ 23 , 24 ]. Release cycles went from 6 months to monthly to weekly to daily, and now at Bing, Google, and LinkedIn, releases are made multiple times a day to services and web sites. Experimentation on client software, like Microsoft Office, is still limited because, unlike a website, it requires users to update the software on their machines (e.g., PCs or phones). That said, even for client software, release cycles have shrunk from years to weeks, with each release containing hundreds of new features evaluated using controlled experiments.

Agreement on the Overall Evaluation Criterion (OEC) becomes critically important. An experiment scorecard shows hundreds to thousands of metrics. It is usually easy to find something that improves (or degrades), but the challenge is to come up with a small set of key metrics, ideally a single OEC, to help make tradeoffs. A good OEC captures the organizational long-term objectives but must be based on metrics that are measurable in short-term experiments. Since the OEC is used to determine success (e.g., shipping a change) and consists of one or a few metrics, there is less concern about multiple hypothesis testing. One example of a key component of the OEC is the sessions per user metric [ 25 ]; if users are coming more often, it is usually a strong sign that the treatment is useful. The rest of the metrics are used for debugging and understanding why something happened, and these are marked as interesting when the p value is low, e.g., < 0.001.
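A scorecard of this kind can be thought of as a loop over metrics that computes the treatment-versus-control delta and a p value for each, flagging only the very small p values for debugging while the OEC is judged separately. The sketch below uses hypothetical metric names and simulated data.

```python
# Sketch of a per-metric scorecard: compute delta and p-value for every metric
# and flag very low p-values (e.g., < 0.001) as interesting for debugging.
# Metric names and data are hypothetical, simulated here for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50_000
metrics = {
    "sessions_per_user": (rng.poisson(3.00, n), rng.poisson(3.02, n)),
    "clicks_per_user":   (rng.poisson(8.00, n), rng.poisson(8.00, n)),
    "page_load_ms":      (rng.normal(900, 200, n), rng.normal(905, 200, n)),
}

for name, (control, treatment) in metrics.items():
    delta = treatment.mean() / control.mean() - 1.0
    _, p = stats.ttest_ind(treatment, control, equal_var=False)
    flag = "INTERESTING" if p < 0.001 else ""
    print(f"{name:18s} delta={delta:+.3%}  p={p:.3g}  {flag}")
```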

The reason we can look at so many metrics is that key metrics are broken down by areas. For example, we might be interested in the click-through rate of the page (single metric); to understand the change in this metric, we will show the click-through rate of 20 subareas of the page. In many cases we find that users often have a fixed amount of attention, so there is a conservation of clicks: if one sub-area gets more clicks, it is likely cannibalizing clicks from other sub-areas. In medicine, the issue of competing risks, concurring events, and their capture in combined endpoints integrating the competing components might be the closer analogy to cannibalization of outcomes [ 26 , 27 ]. Selecting a useful primary outcome(s) is key but not straightforward. Core outcome sets are increasingly developed with input from patients and clinicians to reflect outcomes that cover the long-term objectives of treatment such as the prevention of death, disability, or loss of quality of life [ 28 ]. Combined endpoints may integrate several components that may occasionally be competing risks. With a plethora of outcomes, concerns arise about multiplicity [ 29 ].

Humbling reality sets in on the value of ideas. Goals change from ‘ship feature X by date Y’ to ‘improve the OEC by x% over the next year’. Success becomes harder and a humbling reality sets in – most ideas are not as good as we believe [ 19 ]. High attrition is similarly common in the development pipeline of medical interventions [ 30 ]. Moreover, while many of the more successfully licensed interventions originally have expectations of major benefits, e.g., in survival, most often they settle for improvements in less serious outcomes, e.g., disease progression, without affecting death rates [ 31 ].

Evaluation encourages more exploration – breakthrough ideas are discovered. The safety net afforded by controlled experiments encourages more exploration of ideas that may not be highly prioritized a priori but are easy to code and evaluate. Our experience is that there is no strong correlation between the effort to code an idea and its value. For example, a simple change to ad titles at Bing, which was rated low and took days to code, was worth over $100 M annually [ 3 ]. Tweaks to Google’s color scheme, which were shunned by Google’s visual design lead at the time, because he had “ grown tired of debating such minuscule design decisions ” [ 32 ] were worth over $200 M annually [ 33 ]. In the same way, some medical treatments may have tremendous health effects and are incredibly cheap (e.g., simple diagnostics such as measurement of blood pressure, body temperature or listening to the patient and interventions such as beta-blockers for antihypertensive treatment or antibiotics in sepsis), while high tech interventions that are extremely costly often provide relatively little health gain (e.g., modern oncology treatments [ 31 , 34 ]).

Incremental progress on long-term goals. Many long-term improvements are the result of thousands of candidate ideas that are evaluated over multiple iterations. Winners are shipped, losers are modified (given new data and insights from the experiment) or abandoned. It is impressive to see how key metrics have improved over time. This would be the ultimate goal of a learning healthcare system in medicine, where A/B testing might play a crucial role in the continuous evaluation of innovative changes of care [ 20 ].

Evolution of organizational processes: experimentation maturity on multiple axes

As experimentation matures in an organization [ 35 ], the organizational needs evolve, including:

Early indicators and holdout. While there are metrics that take longer to materialize, such as the retention rate of a paid customer, the desire to iterate quickly usually pushes one to look for early indicators that are then combined with a holdout experiment to see if the long-term metrics differ. Therefore, time to measure is usually a week or a few weeks. For example, a site may give customers a free subscription service trial, and they have 30 days to decide whether they want to subscribe. The customer’s usage and satisfaction of the service during the first few days can be very indicative of whether they will end up paying. In the medical field, such early indicators would be metrics like duration of hospital stay, hospital mortality, complications or 30-day re-admission rates, for example, in clinical trials evaluating different types of surgery.

Near-real-time analysis. Whereas the initial experimentation system usually produces a scorecard after a day, as reliance on experimentation grows, so does the need for faster scorecards. If there is a bug, a day is too long – too many users are hurt and the development team needs faster feedback. Today, initial scorecards are produced in near-real-time (e.g., every 15 min). While they do not have statistical power to detect the effect we are hoping for, they are sufficient for detecting egregious issues, allowing the platform to abort experiments. Note that, given the large number of scorecards generated, multiple hypothesis issues have to be addressed [ 2 ]. The final treatment effect is determined by the final scorecard, usually based on 1–2 weeks of data.

Automated ramp-up. With near-real-time analysis, it is possible to trade off risk against statistical power. An experiment starts at a small percentage of users in a single data center, similar to pilot studies in medicine. As discussed above, scorecards are generated in near-real-time and, if certain metrics degrade beyond acceptable limits, the experiment is auto-aborted without the need for human intervention. If after several hours no key metric degrades, the experiment auto-ramps to a higher percentage of users and at multiple data centers.
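The ramp-up logic can be sketched as a simple control loop: set a small exposure, wait for near-real-time scorecards, abort if a guardrail metric degrades beyond its limit, otherwise step up. The platform functions and thresholds below are hypothetical stand-ins, not an actual experimentation API.

```python
# Sketch of automated ramp-up with guardrail checks. `set_exposure` and
# `read_guardrail_deltas` are hypothetical stand-ins for a platform API.
import time

RAMP_STEPS = [0.01, 0.05, 0.10, 0.50]                          # fraction of users exposed
GUARDRAIL_LIMITS = {"error_rate": 0.02, "page_load_ms": 0.05}  # max relative degradation

def ramp_up(experiment_id, set_exposure, read_guardrail_deltas, dwell_seconds=4 * 3600):
    for exposure in RAMP_STEPS:
        set_exposure(experiment_id, exposure)
        time.sleep(dwell_seconds)                      # let near-real-time scorecards accumulate
        deltas = read_guardrail_deltas(experiment_id)  # e.g. {"error_rate": 0.001, ...}
        for metric, limit in GUARDRAIL_LIMITS.items():
            if deltas.get(metric, 0.0) > limit:
                set_exposure(experiment_id, 0.0)       # auto-abort the experiment
                return f"aborted at {exposure:.0%}: {metric} degraded by {deltas[metric]:.1%}"
    return "fully ramped; wait for the final scorecard"
```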

Heterogeneous treatment effects are provided in scorecards. Rather than focus just on the average treatment effect, the scorecard also highlights interesting segments, where the treatment effect is different than the average. For example, a browser version (say Internet Explorer 8) may behave differently, leading to a discovery that JavaScript code failed in that setting; in other cases, low performance in a country or market may be due to poorly localized text. The key is that hypotheses develop and experiments start to target segments of users. In contrast to typically underpowered subgroup analyses in medical clinical trials, these experiments are highly powered with enough users that the segments are big enough for reliable statistical analyses.

Trustworthiness. With so many experiments running, there is an obvious concern for lack of trustworthiness and false positive results. We exercise multiple tests to identify scenarios that would indicate a problem [ 36 ] such as, for instance, skewed assignments. For example, suppose the experiment design calls for equal assignment to control and treatment, and the actual number of control users is 821,588 while the number of treatment users is 815,482, so the ratio is 50.2% instead of 50%. The system would flag this as a sample-ratio-mismatch and declare the experiment result invalid, as the p value for such a split is 1.8 × 10⁻⁶. For dealing with multiple hypothesis testing problems, we replicate experiments. In areas like search relevance, teams are measured on the sum of treatment effects of a single key metric and, because many experiments are run, once a positive result is found, it is rerun, and the replication run determines the actual credit the team gets. The replication effect is unbiased, while the first run may have found an exaggerated effect [ 37 ].
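The sample-ratio-mismatch check itself is a goodness-of-fit test against the designed split; the sketch below reproduces the quoted example and yields the stated p value of about 1.8 × 10⁻⁶.

```python
# Sample-ratio-mismatch (SRM) check: chi-squared goodness-of-fit against the
# designed 50/50 split, using the counts quoted in the example above.
from scipy.stats import chisquare

control_users, treatment_users = 821_588, 815_482
stat, p_value = chisquare([control_users, treatment_users])  # expected: equal counts

print(f"chi2 = {stat:.1f}, p = {p_value:.2g}")  # p is about 1.8e-06
if p_value < 0.001:
    print("Sample ratio mismatch: treat the experiment results as invalid")
```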

Institutional memory. With tens of thousands of experiments run every year, it is important to highlight surprising results (both failures and successes). Some are published in conferences [ 19 ] or websites [ 38 ], but internal presentations and documents are important for cross-pollination.

A summary of the lessons for medicine learned in the technology field is given in Table 2 .

Similarities and dissimilarities with medical RCTs

Given their large sample sizes and scale, large-scale A/B tests in technology make it possible to address some additional design and implementation issues that would have been difficult to address in traditional medical RCTs, which rarely have very large sample sizes to date. Some interesting topics are covered in Table 3. Several of the features of A/B experiments discussed above can be adopted in RCTs in medicine and do not necessarily require a very large scale; the principles described here are already used in healthcare, although rarely. For example, Horwitz et al. describe a “rapid-cycle randomized testing” system that has been established at NYU Langone Health in the US and allowed the completion of 10 randomized A/B tests, involving several hundred to several thousand patients, within 1 year, with annual costs of $350,000 [ 20 ]. By testing various interventions that are introduced in routine care every day in many places in the world, and typically without randomized evaluation, they were able to determine what really works and systematically improved healthcare in their hospital: “We now know with confidence that changing the text of a provider-targeted prompt to give tobacco cessation counseling in an office produces a significant increase in rates of medication prescriptions and that changing just a few sentences in telephone outreach scripts can both shorten telephone calls and increase rates of appointments for annual examinations. We have also learned that our postdischarge telephone calls have made no difference in rates of readmission or patient-experience ratings, that our appointment-reminder letters were completely ineffective, and that our community health worker program was inadvertently targeting patients who were unlikely to benefit” [ 20 ].

The most desirable features of A/B experiments are their large scale and low cost, which are commensurate with the tradition of large simple trials [ 42 ] and the emerging interest in pragmatic trials [ 43 , 44 ]. Lower costs would allow more and different interventions to be tested and provide better evidence on thus far understudied healthcare questions [ 13 , 16 ]. Online administration is also commensurate with the emerging efforts to perform point-of-care randomization [ 45 ]. The principle of ongoing, routine data collection for outcomes has parallels with the concept of using routinely collected data, e.g., from electronic health records, to fuel RCT datasets with proper outcomes [ 46 ].

There is less emphasis in medical RCTs on performing multiple RCTs at the same time and engaging the same participants in multiple concurrent RCTs. However, besides the traditional factorial designs [ 47 ], there is some literature, especially on lifestyle, about performing multiple concurrent parallel randomizations [ 48 ].

A major difference between A/B testing in technology and medical RCTs is their time horizon. Many RCTs in biomedicine would require longer follow-up, often much longer than that afforded by technology A/B trials. However, if a data collection system is in place (e.g., electronic health records), such data collection may be automated and real-time assembly of data would be feasible. Moreover, in acute medical treatment settings, there are many patient-relevant and economically important outcomes that can be collected in the short time frame, such as duration of hospital stay, admission to intensive care or re-admission rates.

Ethical implications are different between the technology field and medicine. There is a push towards having more trials that are simple and which compare usual care modifications that are already implemented somewhere or would be implemented anyway without ethical approval [ 49 ]. The evaluation of minor usual care modifications may be seen more as quality improvement than research [ 50 ] and using randomization alone may not necessarily define an evaluation as research [ 20 ].

Finally, the A/B concept may be particularly attractive for healthcare services, management, and improvement interventions, where most of the current research pertains to non-randomized before–after studies and interrupted time series. Essentially, each digital interaction, use of diagnostic software or algorithm, or electronic decision aid could and maybe should be evaluated and optimized in a randomized experiment.

Summary and discussion

Randomization is recognized as a powerful tool that technology companies successfully use at extremely large scale to improve their products and increase revenue. Not only are the origins of the methods similar in the technology world and the medical field, but there are also many parallels in possible applications. However, the consistent and systematic implementation and integration into the entire development and application cycles have no such parallel in the biomedical world. The development and ongoing evaluation of new interventions, as well as the many interfaces between users and providers of healthcare, are far from optimal. There is substantial potential to improve health if these can be optimized.

Recently, criticism of randomized trials in medicine seems to be growing. Technological advances and the availability of large-scale data make it tempting to abandon randomization, yet randomization is precisely what has turned out to be so useful for the most successful technology companies. The technology world has demonstrated, on several occasions, that promising ideas in the vast majority of cases do not prove useful once they have been tested in online controlled experiments. While this has repeatedly been shown for various cases in the medical world as well, and various estimates of the extent of the problem exist, technology companies can objectively measure the failure rate and directly assess the true value of randomization. When most of the promising, plausible changes of practice turned out to be wrong, and even tiny changes of usual practice had substantial impact on key outcomes, a philosophy of ‘test everything with controlled experiments’ was established. Rather than hindering innovation, it fostered improvements to products and revenue.

Perhaps this is the most important lesson to be learned by the medical world. The most innovative technological field has recognized that systematic series of randomized experiments with numerous failures leads to sustainable improvement of the products. Even tiny changes should ideally undergo continuous and repeated evaluations in randomized experiments and learning from their results may be indispensable also for healthcare improvement.

Availability of data and materials

Not applicable.

References

Kohavi R, Crook T, Longbotham R. Online experimentation at Microsoft. Third workshop on data mining case studies and practice prize; 2009. https://exp-platform.com/Documents/ExP_DMCaseStudies.pdf . Accessed 3 Feb 2020.

Kohavi R, Deng A, Frasca B, Walker T, Xu Y, Pohlmann N. Online controlled experiments at large scale. KDD ‘13: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM; 2013. p. 1168–76.


Kohavi R, Thomke S. The surprising power of online experiments. Harv Bus Rev. 2017. https://hbr.org/2017/09/the-surprising-power-of-online-experiments . Accessed 3 Feb 2020.

Tang D, Agarwal A, O’Brien D, Meyer M. Overlapping experiment infrastructure: more, better, faster experimentation. Washington, DC: Proceedings 16th Conference on Knowledge Discovery and Data Mining; 2010.


Hohnhold H, O’Brien D, Tang D. Focus on the long-term: it’s better for users and business. Proceedings 21st Conference on Knowledge Discovery and Data Mining (KDD 2015). Sydney: ACM; 2015.

Posse C. Key lessons learned building linkedin online experimentation platform. Slideshare; 2013. https://www.slideshare.net/HiveData/googlecontrolled-experimentationpanelthe-hive . Accessed 20 Mar 2019.

Xu Y, Chen N, Fernandez A, Sinno O, Bhasin A. From infrastructure to culture: A/B testing challenges in large scale social networks. KDD ‘15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Sydney: ACM; 2015. p. 2227–36.

Xu Y, Chen N. Evaluating mobile apps with A/B and quasi A/B tests. KDD ‘16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: 2016. San Francisco: ACM; 2016. p. 313–22.

Bakshy E, Eckles D, Bernstein M. Designing and Deploying online field experiments. WWW '14: Proceedings of the 23rd international conference on World Wide Web: 2014: Seoul: ACM; 2014. p. 283–92. https://doi.org/10.1145/2566486.2567967 . Accessed 3 Feb 2020.

Kohavi R, Round M. Front Line Internet Analytics at http://ai.stanford.edu/~ronnyk/emetricsAmazon.pdf . Accessed 3 Feb 2020.

Moran M: Multivariate testing in action: quicken loan’s regis hadiaris on multivariate testing. In: Biznology Blog by Mike Moran. 2008. https://biznology.com/2008/12/multivariate_testing_in_action/ . Accessed 3 Feb 2020.

Kohavi R, Tang D, Xu Y. Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge: Cambridge University Press; 2020.

Mc Cord KA, Ewald H, Ladanie A, Briel M, Speich B, Bucher HC, Hemkens LG, RCD for RCTs initiative and the Making Randomized Trials More Affordable Group. Current use and costs of electronic health records for clinical trial research: a descriptive study. CMAJ Open. 2019;7(1):E23–32.


TrialForge. www.trialforge.org . Accessed 3 Feb 2020.

Treweek S, Altman DG, Bower P, Campbell M, Chalmers I, Cotton S, Craig P, Crosby D, Davidson P, Devane D, et al. Making randomised trials more efficient: report of the first meeting to discuss the Trial Forge platform. Trials. 2015;16:261.

Mc Cord KA, Al-Shahi Salman R, Treweek S, Gardner H, Strech D, Whiteley W, Ioannidis JPA, Hemkens LG. Routinely collected data for randomized trials: promises, barriers, and implications. Trials. 2018;19(1):29.

Google Search Statistics. Internet live stats. https://www.internetlivestats.com/google-search-statistics/ . Accessed 3 February 2020.

Ries E. The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses. New York: Crown Business; 2011.

Kohavi R, Deng A, Longbotham R, Xu Y. Seven rules of thumb for web site experimenters. Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ‘14). 2014. p. 1857–1866. https://doi.org/10.1145/2623330.2623341 .

Horwitz LI, Kuznetsova M, Jones SA. Creating a learning health system through rapid-cycle, randomized testing. N Engl J Med. 2019;381(12):1175–9.

Siontis GC, Ioannidis JP. Risk factors and interventions with statistically significant tiny effects. Int J Epidemiol. 2011;40(5):1292–307.

van Belle G. Statistical rules of thumb. Hoboken: Wiley-Interscience; 2002.

Why most redesigns fail. https://www.freecodecamp.org/news/why-most-redesigns-fail-6ecaaf1b584e/ . Accessed 3 Feb 2020.

Forsgen N, Humble J, Kim G. Accelerate: the science of lean software and DevOps: building and scaling high performing technology organizations. Hoboken: IT Revolution Press; 2018.

Kohavi R, Deng A, Frasca B, Longbotham R, Walker T, Xu Y. Trustworthy online controlled experiments: Five puzzling outcomes explained. Proceedings of the 18th Conference on Knowledge Discovery and Data Mining. 2012. p. 786–794. https://doi.org/10.1145/2339530.2339653 .

Austin PC, Lee DS, Fine JP. Introduction to the analysis of survival data in the presence of competing risks. Circulation. 2016;133(6):601–9.

Hemkens LG, Contopoulos-Ioannidis DG, Ioannidis JP. Concordance of effects of medical interventions on hospital admission and readmission rates with effects on mortality. CMAJ. 2013;185(18):E827–37.

Williamson PR, Altman DG, Bagley H, Barnes KL, Blazeby JM, Brookes ST, Clarke M, Gargon E, Gorst S, Harman N, et al. The COMET Handbook: version 1.0. Trials. 2017;18(Suppl 3):280.

Vickerstaff V, Ambler G, King M, Nazareth I, Omar RZ. Are multiple primary outcomes analysed appropriately in randomised controlled trials? A review. Contemp Clin Trials. 2015;45(Pt A):8–12.

Hay M, Thomas DW, Craighead JL, Economides C, Rosenthal J. Clinical development success rates for investigational drugs. Nat Biotechnol. 2014;32(1):40–51.


Davis C, Naci H, Gurpinar E, Poplavska E, Pinto A, Aggarwal A. Availability of evidence of benefits on overall survival and quality of life of cancer drugs approved by European Medicines Agency: retrospective cohort study of drug approvals 2009-13. BMJ. 2017;359:j4530.

Bowman D. Goodbye, Google. 2009. https://stopdesign.com/archive/2009/03/20/goodbye-google.html . Accessed 3 Feb 2020.

Hern A. Why Google has 200m reasons to put engineers over designers. Kings Place: The Guardian; 2014. https://www.theguardian.com/technology/2014/feb/05/why-google-engineers-designers . Accessed 3 Feb 2020.

Prasad V. Do cancer drugs improve survival or quality of life? BMJ. 2017;359:j4528.

Fabijan A, Dmitriev P, Holmström H, Bosch J. The evolution of continuous experimentation in software product development. Buenos Aires: ICSE ‘17; 2017. p. 770–80. https://doi.org/10.1109/ICSE.2017.76 .

Fabijan A, Gupchup J, Gupta S, Omhover J, Qin W, Vermeer L, Dmitriev P: Diagnosing sample ratio mismatch in online controlled experiments: a taxonomy and rules of thumb for practitioners. Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’19), August 4–8 , 2019 , Anchorage , Alaska.

Gelman A, Carlin J. Beyond power calculations: assessing type S (sign) and type M (magnitude) errors. Perspect Psychol Sci. 2014;9(6):641–51.

Linowski J. Good UI: learn from what we try and test; 2018. https://goodui.org/ . Accessed 3 Feb 2020.

Kohavi R. Twyman’s law and controlled experiments. ExP Experimentation Platform. 2017. bit.ly/twymanLaw . Accessed 3 Feb 2020.

Deng A, Xu Y, Kohavi R, Walker T. Improving the sensitivity of online controlled experiments by utilizing pre-experiment data. WSDM 2013: Sixth ACM International Conference on Web Search and Data Mining: 2013. Rome: ACM; 2013. p. 123–32.

Xie H, Aurisset J. Improving the sensitivity of online controlled experiments: case studies at Netflix. KDD ‘16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: 2016. New York: ACM; 2016. p. 645–54.

Yusuf S, Collins R, Peto R. Why do we need some large, simple randomized trials? Stat Med. 1984;3(4):409–22.

Dal-Re R, Janiaud P, Ioannidis JP. Real-world evidence: HOW pragmatic are randomized controlled trials labeled as pragmatic? BMC Med. 2018;16(1):49.

Lipman PD, Loudon K, Dluzak L, Moloney R, Messner D, Stoney CM. Framing the conversation: use of PRECIS-2 ratings to advance understanding of pragmatic trial design domains. Trials. 2017;18(1):532.

Shih MC, Turakhia M, Lai TL. Innovative designs of point-of-care comparative effectiveness trials. Contemp Clin Trials. 2015;45:61–8.

Mc Cord KA, Al-Shahi Salman R, Treweek S, Gardner H, Strech D, Whiteley W, Ioannidis JP, Hemkens LG. Routinely collected data for randomized trials: promises, barriers, and implications. Trials. 2018;19(1):29.

Montgomery AA, Astin MP, Peters TJ. Reporting of factorial trials of complex interventions in community settings: a systematic review. Trials. 2011;12:179.

Ioannidis JP, Adami HO. Nested randomized trials in large cohorts and biobanks: studying the health effects of lifestyle factors. Epidemiology. 2008;19(1):75–82.

Dal-Re R, Avendano-Sola C, de Boer A, James SK, Rosendaal FR, Stephens R, Ioannidis JPA. A limited number of medicines pragmatic trials had potential for waived informed consent following the 2016 CIOMS ethical guidelines. J Clin Epidemiol. 2019;114:60–71.

Finkelstein JA, Brickman AL, Capron A, Ford DE, Gombosev A, Greene SM, Iafrate RP, Kolaczkowski L, Pallin SC, Pletcher MJ, et al. Oversight on the borderline: quality improvement and pragmatic research. Clin Trials. 2015;12(5):457–66.


Acknowledgements

We wish to thank members of Microsoft’s Analysis & Experimentation team and LinkedIn’s experimentation team for their involvement in many of the experiments discussed here.

The Meta-Research Innovation Center at Stanford is funded by a grant by the Laura and John Arnold Foundation. The Basel Institute of Clinical Epidemiology and Biostatistics is supported by Stiftung Institut für Klinische Epidemiologie. None of the funders/sponsors had a role in the design and conduct of the project and preparation, review, approval of the manuscript, or decision to submit the manuscript for publication.

Author information

Authors and affiliations

  • R. Kohavi: Analysis & Experimentation, Microsoft, One Microsoft Way, Redmond, WA 98052, USA; currently Airbnb, 888 Brannan St, San Francisco, CA 94103, USA
  • D. Tang: Google, 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA
  • Y. Xu: LinkedIn, 950 W Maude Ave, Sunnyvale, CA 94085, USA
  • Lars G. Hemkens: Basel Institute for Clinical Epidemiology and Biostatistics, Department of Clinical Research, University Hospital Basel, University of Basel, 4031 Basel, Switzerland
  • John P. A. Ioannidis: Stanford Prevention Research Center, Department of Medicine, Stanford University School of Medicine, Medical School Office Building, Room X306, 1265 Welch Rd, Stanford, CA 94305, USA; Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Palo Alto, CA 94305, USA; Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, CA 94305, USA

Contributions

RK, DT, and YX wrote the first draft with input by LGH and JPAI, and all authors made critical revisions to the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to John P. A. Ioannidis.

Ethics declarations

Ethics approval and consent to participate, consent for publication

Competing interests

RK currently works at Airbnb; this article was written while he worked at Microsoft, where many of the experiments described here ran. Microsoft may indirectly benefit from the manuscript, which establishes it as a thought leader in the Controlled Experiments space.

As disclosed in the authors section, DT works at Google. Google may indirectly benefit from the manuscript, which establishes it as a thought leader in the Controlled Experiments space. Google includes experimentation as part of existing products such as Google Analytics.

As disclosed in the authors section, YX works at LinkedIn. LinkedIn may indirectly benefit from the manuscript, which establishes it as a thought leader in the Controlled Experiments space.

LGH and JPAI support the RCD for RCT initiative, which aims to explore the use of routinely collected data for clinical trials. They have no other relationships or activities that could appear to have influenced the submitted work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article

Kohavi, R., Tang, D., Xu, Y. et al. Online randomized controlled experiments at scale: lessons and extensions to medicine. Trials 21, 150 (2020). https://doi.org/10.1186/s13063-020-4084-y

Received: 02 April 2019

Accepted: 18 January 2020

Published: 07 February 2020


  • Randomization
  • Healthcare decision-making
  • Online experiments


Online Experimentation: Benefits, Operational and Methodological Challenges, and Scaling Guide

  • Author: Iavor I. Bojinov
  • Published in: Harvard Data Science Review
  • Format: Electronic


Swiss Psychology Open: the official journal of the Swiss Psychological Society

How to Run Behavioural Experiments Online: Best Practice Suggestions for Cognitive Psychology and Neuroscience

  • Nathan Gagné
  • Léon Franzen

The combination of a replication crisis, the global COVID-19 pandemic in 2020, and recent technological advances has accelerated the ongoing transition of research in cognitive psychology and neuroscience to the online realm. When participants cannot be tested in person, data of acceptable quality can still be collected online. While online research offers many advantages, numerous pitfalls may hinder researchers in addressing their questions appropriately, potentially resulting in unusable data and misleading conclusions. Here, we present an overview of the costs and benefits of conducting online studies in cognitive psychology and neuroscience, coupled with detailed best practice suggestions that span the range from initial study design to the final interpretation of data. These suggestions offer a critical look at issues regarding recruitment of typical and (sub)clinical samples, their comparison, and the importance of context-dependency in each part of a study. We illustrate our suggestions by means of a fictional online study, applicable to traditional paradigms such as research on working memory with a control and treatment group.

  • online research
  • best practice
  • neuroscience

Introduction

In the midst of a global pandemic, experimental research has moved beyond physical space through online experimentation. This change started years ago, and a large-scale transition was long overdue. The growing shift to online experiments in cognitive psychology and neuroscience was enabled by technological innovation, which has drawn attention to both the costs and benefits of online experiments. The 2020 COVID-19 pandemic then forced most researchers to transition online almost overnight, yet many of them have limited experience with this delivery method, and several challenges await them.

With careful implementation, the benefits of online research have the potential to address some of the current issues in cognitive psychology and neuroscience. In recent years, researchers in these fields have found themselves in a dire replication crisis (Baker & Penny, 2016; Ioannidis, 2005; Ioannidis et al., 2014; Makel et al., 2012). The failure to replicate findings of previous work has been a growing trend (Sharpe & Poets, 2020), as reproducibility rates of published findings in psychology are estimated to range only between 35% and 75% (Artner et al., 2020; Camerer et al., 2018; Etz & Vandekerckhove, 2016). In fact, more than half of researchers have tried and failed to reproduce their own studies (Baker & Penny, 2016). These numbers do not build trust in reported findings, either within academia or in the public eye (Sauter et al., 2022). Low replication rates have been attributed to the complex nature of reproducing experimental methodology, including problems with statistical power, selective and small convenience sampling, questionable research practices, publication bias, and the high costs of running a study repeatedly with a sufficient sample size (Button et al., 2013; Holcombe, 2020; Ioannidis et al., 2014; John et al., 2012; Munafò et al., 2017; Nosek et al., 2012; Rosenthal, 1979; Simmons et al., 2011). However, the replication crisis is only one example of a general problem that online research can help address. Other benefits include broader access to participants for recruitment and quicker, cheaper training of budding researchers. Nonetheless, online designs are not a blanket solution and must be thoughtfully implemented.

Until the start of the COVID-19 pandemic, data validity and reliability had only been investigated for selected leading platforms, such as Amazon’s Mechanical Turk (MTurk; e.g., Berinsky et al., 2012; Chandler et al., 2014; Crump et al., 2013), and groups of experiments (e.g., psychophysics: de Leeuw & Motz, 2016; perception: Woods et al., 2015). However, the sudden shift encouraged publications addressing the generic implementation of online experiments in line with current technological possibilities (Grootswagers, 2020; Sauter et al., 2020). These constitute a great starting point, but a more nuanced take on many practical and experimental aspects of online research remains absent from the literature. Recent work found no statistical difference between online and in-person testing (Sauter et al., 2022), whereas other work reveals a small yet acceptable loss of data quality when comparing online testing to in-person testing (Uittenhove et al., 2022). Notably, the latter authors place a large emphasis on sampling, suggesting that “who” is tested is more important than “how” they are tested. Specifically, participants from the online platform Prolific (Palan & Schitter, 2018) were harder to distinguish from student samples than MTurk participants were. In fact, Prolific consistently provided higher data quality than MTurk (Peer et al., 2021). These findings underline the importance of deliberately choosing the platform and participants for successful studies.

The present work provides an overview of the costs and benefits of online experimentation, paired with some best practice suggestions spanning the entire study cycle from the design to interpreting the results—with a focus on the sampling of specific populations. To this end, we use a fictional example of a study recruiting a specific sub-clinical dyslexia sample, as a practical illustration that can be generalized to most difficult-to-recruit populations. The recruitment of these populations has a long history of proving difficult and being biased ( Blythe et al., 2018 ).

Overview of costs and benefits of online research

Online research provides a wealth of opportunities to improve psychological studies and to address some of the postulated pitfalls (henceforth, costs) that have previously deterred researchers from using this delivery method (see costs listed in Table 1). Methodological problems with online research have often appeared as major obstacles in the eyes of many researchers. Specifically, costs of online research may include reduced control over the testing environment and limited possibility for intervention, especially in a mass online testing setting. The physical absence of researchers in the online realm prevents certain types of direct intervention if issues arise during the experiment and precludes the mitigation of extraneous distractors during testing. However, this limitation only applies if participants are tested in a mass online setting; intervention on an individual level, such as organising an online call during testing, remains possible, particularly for research with special populations. Extraneous distracting variables can confound results when a study is conducted outside of the laboratory, and distractors and unforeseen issues may also add noise to the data. For example, participants may start answering at random part-way through the experiment due to a lack of motivation or attention, or they may even start cheating on the task by taking screen grabs.

Table 1. List of costs and benefits of online studies. Costs and benefits are framed as the trade-offs an experimenter engages in when conducting research online, to facilitate evaluation of their importance and consequences.


The physical presence of the researcher in an in-person setting not only adds social pressure to perform well on the experiment, but the social interaction itself may also offer added motivation to participate. As such, it is conceivable that participants who are looking at a screen for the entirety of an online experiment, in the comfort of their own home, may feel less motivated and get more easily distracted, as the experience may feel less personal and purposeful than completing a study in person. Participant dropout rates are higher online compared to offline (Yetano & Royo, 2017). In fact, a dropout rate of 20% is not uncommon online (Peer et al., 2021), potentially due to the lack of social pressure to complete a study and differences in motivation (Jun et al., 2017).

Differences in hardware may also compensate for, or mask, cognitive and perceptual differences to a certain extent. For example, without appropriate screen calibration checks in place, a larger screen may allow for faster and more accurate detection of a stimulus in a reaction time task. Additionally, participants’ computers vary in processing capacity, which can affect millisecond timing accuracy in brief stimulus presentation tasks.

The increased anonymity of online experiments also gives participants a certain level of cover for fraudulent behaviour, such as claiming they meet the eligibility criteria when in reality they do not. The temptation to pay participants inadequately may also be greater in the online realm, given the lack of physical and social contact with them. Additionally, if a bug is present, large amounts of data may need to be discarded, unlike in offline research, where participants are tested individually and a bug can be fixed before it affects hundreds of datasets within minutes or hours. Online studies also require both participants and researchers to have access to a computer and a stable Internet connection, making it more difficult to reach poorer populations; because not everyone has reliable access to the Internet, to suitable devices, or to platforms such as MTurk or Prolific, not everyone can be reached. Lastly, when sampling from common online platforms, there may be a higher likelihood of sampling non-naïve participants. Individuals registered on these platforms may already be well-versed in research, as the platforms offer ample opportunity to become familiar with common study designs. This concern is heavily task-dependent and more impactful for social and learning tasks. The amount of time spent on a platform and whether it is a participant’s main source of income also factor into naivety and data quality concerns (Peer et al., 2021).

Recruitment strategies and sampling methods also remain important components of increasing an experiment’s ecological validity. Many sampling biases in experimental research are a product of convenience sampling, which involves drawing a sample from a population that is close and easy to reach (Saunders et al., 2019). This type of sampling is often useful for pilot studies, but psychological research has become increasingly reliant on convenience samples of undergraduate students from western universities, resulting in a misrepresentation of the true population (Masuda et al., 2020). Henrich et al. (2010) captured these issues and the growing reliance on such samples in psychological research with the descriptive acronym WEIRD (Western, Educated, Industrialized, Rich, and Democratic). These individuals tend to be undergraduate students who are taking courses in psychology or neuroscience and have previously been exposed to psychological research. WEIRD samples in university participant pools allow researchers to conduct their experiments at a low cost and with limited recruitment effort. Students are already familiar with the experimental process and only receive course credit as compensation, which results in an easy and low-cost form of recruitment (Masuda et al., 2020) but places an inherent limitation on the generalisability of results.

The evident convenience of WEIRD samples has often left researchers reluctant to explore new ways to expand their sampling efforts. However, online research provides an alternative delivery method with the potential to counteract this reluctance by allowing greater data collection for a similar budget. To broaden sampling and recruit a wider variety of participants without much additional effort, one may use one of the many existing platform solutions. If an appropriate sampling method is paired with a complementary delivery system (online vs offline), there is potential for capturing a more heterogeneous sample. While online platforms and large-scale initiatives simplify recruitment of larger samples, it is crucial to investigate who the typical users of platforms such as Amazon’s Mechanical Turk (MTurk; www.mturk.com), Prolific Academic (www.prolific.co; Palan & Schitter, 2018), OpenLab (www.open-lab.online), and university participant pools are before recruiting through them (Chandler et al., 2014; Rodd, 2019; for details on implementation and platforms, see Grootswagers, 2020; Sauter et al., 2020). Here, we do not aim to describe the populations on these platforms in much detail, as this has been done elsewhere (e.g., Berinsky et al., 2012; Levay et al., 2016; Walters et al., 2018; Woods et al., 2015). Specifically for MTurk, the interested reader can draw on several analyses of participants and task performance in the context of online tasks (Berinsky et al., 2012; Casler et al., 2013; Chandler et al., 2014; Crump et al., 2013; Goodman et al., 2013; Hauser & Schwarz, 2016; Levay et al., 2016; Mason & Suri, 2012; Paolacci et al., 2010; Shapiro et al., 2013; Sprouse, 2011; Walters et al., 2018). The prevalence of self-reported clinical conditions in these samples matches that seen in the general population and can even surpass that observed in laboratory studies, making crowdsourcing a viable way to conduct research on clinical populations (Gillan & Daw, 2016; Shapiro et al., 2013; van Stolk-Cooke et al., 2018). An investigation in the context of political science shows that MTurk users are more representative of the US population than in-person convenience samples (Berinsky et al., 2012). While MTurk can provide a representative US sample, other platforms such as Prolific allow researchers to collect data from representative samples (US, UK, Germany, etc.) based on census data (stratified by age, sex, and ethnicity). Reports also show that results obtained from more diverse MTurk samples are almost indistinguishable from those collected in laboratory settings, as many well-known laboratory effects replicate (Casler et al., 2013; Crump et al., 2013; Goodman et al., 2013; Sprouse, 2011). This also holds true for Prolific (Sauter et al., 2022). Nonetheless, researchers must be wary of undesired effects. These include a lack of seriousness that can occur when a specific type of user (i.e., younger men) is dominant on a platform (Downs et al., 2010). Additionally, socioeconomic differences may prevent reaching everyone equally, even in industrialized countries. Although this list of potential costs of online experimentation appears lengthy, many of them can be mitigated with best practice adaptations, which we illustrate in a fictional study on dyslexia.

Some of the adaptations are solutions to issues associated with the replication crisis and can be implemented with relative ease. Increased recruitment potential is the most frequently mentioned benefit of online studies (see benefits listed in Table 1). Online experimentation attracts a wider audience that would otherwise be difficult to reach in person, which makes it possible to collect large amounts of data efficiently, particularly in industrialized countries with widespread access to technological devices and reliable internet. Socioeconomic differences may still prevent an equal reach, though. Online studies therefore generally provide greater access to more representative samples. This helps reduce the use of selective and small convenience samples and increases statistical power, which has positive implications for the replication crisis.

Another benefit of online studies stems from the recent availability of all-in-one platforms offering experiment building, presentation, recruitment, and distribution capabilities. Platforms that offer integrated features for building, hosting, and recruiting include Inquisit Web, Labvanced, and Testable (Sauter et al., 2020). These integrated online environments can reduce the money and time researchers invest in data collection, although the same or more time may need to be dedicated to building and setting up an experiment. A good workflow is therefore important to maximize efficiency, as we discuss later on. Online platforms also allow for rapid data collection that requires less lab equipment and space; for example, no experimental consumables need to be purchased or costly lab space booked (for discussions, see Grootswagers, 2020; Mathôt & March, 2021; Sauter et al., 2020). The combination of these benefits also allows trainees to run their own online experiments earlier in their careers, when funds are often limited.

The absence of the researcher also reduces any social pressures and feelings of obligation to finish a study that may be present in offline research, and increases the potential for data anonymity. Participants may also experience greater autonomy with regards to the time and location of their participation, as there is no dependence on the experimenter and the laboratory’s availability. Hence, online research could become advantageous for both single studies and entire research fields.

Although researchers may accept some of the costs to achieve increased sample sizes and, in turn, increased statistical power, online research is not a one-size-fits-all solution to all problems. Only if study objectives and online methods are aligned can an online experiment quickly become a useful and trustworthy tool that replicates (Crump et al., 2013; Sauter et al., 2022), extends, and generalises findings from laboratory experiments. Hence, it is indispensable to tailor suggestions for successful online studies to the specific context of a study and its research questions. Otherwise, well-intended generic blueprints can be counterproductive and lead the avid trainee astray. For example, the generic recommendation to provide participants with feedback in each trial (Crump et al., 2013) may help to avoid missed trials but risks adding an unintended, confounding learning component to a perceptual decision-making experiment, and could thereby fail to address the research question altogether. To facilitate the transition to online experiments, this paper provides suggestions for future online studies, focusing on context-dependent leveraging of the opportunities of online studies in cognitive psychology and neuroscience research.

Online Research Suggestions

In this section, we introduce best practice suggestions generally applicable to traditional paradigms in the field of cognitive psychology and demonstrate their application using a fictional online study (for an overview, see Figure 2 ). These suggestions equally apply to behavioural data forming a part of neuroscience studies, which may be complemented or followed up with neuroscientific methods in an offline setting. While the outlined suggestions cover many aspects involved in online studies and their organisational structure, they are not intended to be all-encompassing.

The proposed fictional (i.e., simulation) study investigates whether adult dyslexia is associated with visuo-spatial working memory deficits (i.e., accuracy and reaction time) in an adapted version of a Sternberg task ( Sternberg, 1966 ) using stimuli from consumer psychology ( Henderson & Cote, 1998 ). Such a research question can be answered with behavioural data collected online and fuses a cognitive research question with the investigation of a hard-to-recruit sample. This fictional example fits the current zeitgeist as its experimental paradigm could also be run in the laboratory, but some aspects, such as recruitment from a specific population, pandemic-related limitations of in-person interaction, and cost efficiency, would benefit crucially from the online delivery method.

When designing a study, its specific objectives and research questions should be considered from the outset to determine whether running the study online would address the research question(s) appropriately. For example, if physiological measurements, such as those obtained from electroencephalography (EEG), functional magnetic resonance imaging (fMRI), or galvanic skin response devices, are a key component of the study, answering questions about those data by means of an online study alone would be impossible. If the study also asks a question about behavioural performance, however, the experimenters could decide to run this part online. This online option could work as a pilot for a subsequent lab experiment including physiological and neuroscientific measures. Vice versa, if experimenters can run the study with a smaller sample in the lab first, those pilot data collected under controlled circumstances can be used to establish data screening thresholds for the future analysis of the online data. However, it is not recommended to estimate power based on effect sizes from small pilot studies, as this can lead to biased estimates (Albers & Lakens, 2018). Depending on the task, information from meta-analyses, previously published research, or data simulations may be used as appropriate for power computations. In this way, online and offline studies can be complementary and benefit each other, but carefully considering the research question(s) that can be addressed with each delivery method is key. Such complementary methodology also has the advantage of avoiding a reductionist bias in neuroscience (Krakauer et al., 2017).

Workflow and organisation

An efficient online workflow is important for success and is discussed at length elsewhere (Grootswagers, 2020; Mathôt & March, 2021; Sauter et al., 2020). The hypothetical workflow of the presented simulated experiment would include creating the experiment using the Builder or Python-based Coder of PsychoPy3 (Peirce et al., 2019) before exporting a preliminary form of the experiment to the experimental platform Pavlovia.org with the click of a button. This transfer translates the Python-based PsychoPy code into JavaScript code and creates a repository on GitLab to host the translated code. The experiment is then linked to this new repository and Pavlovia at the same time. It can also be updated directly from within a PsychoPy file, provided the JavaScript code on GitLab has not yet been modified manually. Should one want to implement custom or advanced aspects, the easily accessible JavaScript code upon which the experiment is based can be modified directly. Other experiment builders are also available, such as Gorilla (Anwyl-Irvine et al., 2020), LabVanced (www.Labvanced.com), InquisitWeb (www.millisecond.com), PsyToolkit (Stoet, 2010, 2017), OpenSesame/OSWeb (Mathôt et al., 2012), Testable (www.testable.org; Sauter et al., 2020), and FindingFive (www.findingfive.com). Here, we do not intend to give specific recommendations. Instead, we encourage researchers to examine the technical capabilities and licensing costs of each platform closely before committing.
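As a concrete illustration, the sketch below shows the kind of minimal PsychoPy Coder script that could serve as the locally tested starting point before exporting to Pavlovia. The placeholder stimuli, keys, and timings are our own assumptions for illustration and are not the fictional study’s actual materials.

```python
# Minimal PsychoPy sketch of one spatial working memory trial (illustrative only).
# Stimulus content, response keys, and timings are assumptions, not the study's materials.
from psychopy import visual, core, event

win = visual.Window(fullscr=True, color="grey", units="height")
clock = core.Clock()

# Placeholder text stands in for the actual consumer-psychology stimuli
encoding = visual.TextStim(win, text="ENCODING DISPLAY (3 or 6 items)", height=0.05)
cue = visual.TextStim(win, text="?", height=0.1)       # spatial retrieval cue
probe = visual.TextStim(win, text="Which item was here?  F / J / K", height=0.05)

encoding.draw(); win.flip(); core.wait(2.0)            # encoding period
cue.draw(); win.flip(); core.wait(0.5)                 # retrieval cue
probe.draw(); win.flip(); clock.reset()                # decision screen

# Speeded decision with a 3-second deadline, logged via the trial clock
keys = event.waitKeys(maxWait=3.0, keyList=["f", "j", "k"], timeStamped=clock)
response = keys[0] if keys else ("timeout", 3.0)
print(response)

win.close()
core.quit()
```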

Participants may be recruited through a university participant pool (e.g., SONA) or through advertisements sharing a link to a questionnaire hosted on an online survey platform, such as Qualtrics (www.qualtrics.com), SurveyMonkey (www.surveymonkey.com), or SoSci Survey (www.soscisurvey.de). Written consent, demographics, and other relevant information can be collected online using a questionnaire on Qualtrics. Some software, including SONA, can automatically generate ID codes and pass them from one platform to the next; SONA, for instance, generates a random ID code that can be automatically forwarded to Qualtrics and subsequently to Pavlovia. It is important to choose the format of this code carefully so that it works across platforms and coding languages. The Pavlovia experiment then opens in the participant’s default browser and automatically downloads the experimental stimuli so that the study runs locally in the browser. Upon completion of the task, a post-study questionnaire and a thank-you message can also be displayed.
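For the ID codes handed between platforms, something along the lines of the following sketch could be used. The "participant" query parameter name and the study URL are placeholder assumptions; the restricted alphabet simply avoids characters that are easy to mistranscribe or that might be mangled in URLs.

```python
# Illustrative sketch: generate URL-safe participant IDs and append them as a query
# parameter when forwarding participants from a survey to the experiment.
import secrets
from urllib.parse import urlencode

SAFE_ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"  # no 0/O or 1/I to avoid transcription errors

def make_id(length=8):
    """Return a random, URL-safe participant ID."""
    return "".join(secrets.choice(SAFE_ALPHABET) for _ in range(length))

def forwarding_url(base_url, participant_id):
    """Build the next platform's URL with the ID attached as a query parameter."""
    return f"{base_url}?{urlencode({'participant': participant_id})}"

pid = make_id()
print(pid)
print(forwarding_url("https://run.pavlovia.org/lab/example-task", pid))  # hypothetical study URL
```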

Importantly, to avoid participants spending a long time performing the task incorrectly, and researchers having to award credit or pay for unusable data, checks should run in the background; problems can be surfaced as error messages before the end of the experiment and may even trigger automatic abortion of the session. It is crucial to pilot the workflow extensively on multiple machines, with multiple browsers, and with multiple participants before starting any data collection.
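A background check of this kind could look roughly like the sketch below. The threshold and the response format are assumptions, and the abort decision would still need to be wired to an error screen in the experiment software.

```python
# Illustrative background data-quality check: abort the session early if a participant
# misses too many responses in a row, rather than collecting a full session of unusable data.
MAX_CONSECUTIVE_MISSES = 5   # assumed threshold; tune per paradigm

def check_data_quality(responses):
    """responses: list of dicts like {"key": "f" or None, "rt": float or None}."""
    streak = 0
    for r in responses:
        streak = streak + 1 if r["key"] is None else 0
        if streak >= MAX_CONSECUTIVE_MISSES:
            return "abort: too many consecutive missed responses"
    return "ok"

# Example: call after every block; on "abort", show an error screen and end the study.
print(check_data_quality([{"key": "f", "rt": 0.81}, {"key": None, "rt": None}]))
```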

Lastly, good lab organisation is an often-underrated factor in successful online studies, and in-person experiments alike. Recruiting research assistants, ensuring access to all software and platforms, implementing a system for onboarding and training on these platforms and all other aspects of the study, and keeping track of administrative tasks such as emails, subject IDs, and credits should not depend on any single individual, so as to mitigate the impact of one crucial person being absent or researchers moving to different labs. This is especially important given that many younger trainees (e.g., undergraduate thesis students) get the opportunity to run an online study but move to a different programme shortly after its completion. Figure 2 summarises our suggestions for best practices in online research.

Experimental design

To examine the research question in our fictional study, we used a mixed (within-between-subjects) experimental design; that is, we compare two memory conditions within subjects and two experimental groups between subjects. The initial step is a power analysis. Since we generate data for our fictional example, we take a simulation approach, which evaluates the power to detect a potential effect by means of repeated linear mixed-effects modelling (tutorial available: DeBruine & Barr, 2021). The flexibility of this approach allows simulated power and sensitivity analyses to be run for various experimental designs and their specific factors. Our assumptions were based on data from a recent working memory dyslexia study (for details, see the R markdown script on the OSF; Franzen et al., 2022). For comparison, an a priori power analysis for a traditional repeated-measures ANOVA with a within- and between-subject interaction resulted in a required total sample size of 68 participants (two predictors, Cohen’s f = 0.225, alpha = 0.05, power = 95%; Faul et al., 2007). We used this number as a stopping criterion for the data simulation process. Online research gains strength from collecting as much data as possible, and effects are often small. However, how much data can be collected depends on a trade-off between statistical, resource, and time considerations. In real-life studies, the results of a power analysis are important guiding principles to avoid introducing data collection biases, which have contributed to the replication crisis.
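For readers who prefer Python over the R-based tutorial cited above, a stripped-down simulation-based power analysis might look like the following sketch. All effect sizes, noise terms, and trial counts are invented for illustration and differ from the fictional study’s actual assumptions; the logic (simulate, fit a mixed model, count significant group effects) is what matters.

```python
# Sketch of a simulation-based power analysis for the group effect in a
# 2 (group) x 2 (memory load) design with random participant intercepts.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2023)

def simulate_dataset(n_per_group=34, n_trials=20, group_effect_ms=40.0):
    """Simulate reaction times (ms) with made-up effect sizes and noise terms."""
    rows = []
    for group in ("control", "dyslexia"):
        for p in range(n_per_group):
            pid = f"{group}_{p}"
            intercept = 700 + rng.normal(0, 80)            # between-participant variability
            for load in (3, 6):
                for _ in range(n_trials):
                    rt = (intercept
                          + 60 * (load == 6)                         # assumed load effect
                          + group_effect_ms * (group == "dyslexia")  # assumed group effect
                          + rng.normal(0, 150))                      # trial-level noise
                    rows.append((pid, group, load, rt))
    return pd.DataFrame(rows, columns=["pid", "group", "load", "rt"])

def estimate_power(n_sims=100, alpha=0.05, **kwargs):
    """Proportion of simulated datasets in which the group effect is significant."""
    hits = 0
    for _ in range(n_sims):
        data = simulate_dataset(**kwargs)
        model = smf.mixedlm("rt ~ group * C(load)", data, groups=data["pid"]).fit(reml=False)
        hits += model.pvalues["group[T.dyslexia]"] < alpha
    return hits / n_sims

# Power at the simulated sample size (slow: refits many mixed models)
print(estimate_power(n_sims=100, n_per_group=34))
```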

Incorporating a within-subjects comparison of conditions has been recommended as a safeguard against both participant and technical noise, because it makes the data less dependent on differences between participant set-ups and thereby largely excludes technical noise (Anwyl-Irvine et al., 2021; Bridges et al., 2020; Pronk et al., 2020; Rodd, 2019). Our design included a three-item working memory condition that can serve as a familiar reference condition in the analysis by calculating the difference between the target and reference condition. A between-participants component can then be used to investigate potential differences between two independent groups on the relevant conditions.

Specific to our fictional study design is a sample comprising two participant groups, captured by a between-subjects factor. As in an ideal scenario, the simulated groups were of equal size. When recruiting from hard-to-find populations, online studies can be particularly beneficial because they provide access to potential participants regardless of location. We would expect to collect a much larger sample by conducting the study online rather than in the laboratory, especially under the circumstances surrounding data collection during a pandemic.

However, participants on online platforms may be non-naïve, resulting in a form of dependency between data points, as some participants are more likely to take part in similar tasks (Chandler et al., 2014; Chandler & Paolacci, 2017; Meyers et al., 2020; for an empirical evaluation, see Woike, 2019). To circumvent issues arising from certain task types being frequently offered on a given platform (e.g., economic game paradigms), researchers might opt for less traditional paradigms. An alternative is to switch the stimulus type of a traditional paradigm if this fits the proposed research question and objectives; for example, an n-back task could be presented using letters or symbols rather than numbers. The issue of naivety may be less prominent for speeded reaction time tasks, as prior familiarity with the instructions may not improve performance if the stimuli are different. This point is especially relevant for implicit learning tasks, where the absence of experience is critical to the paradigm. Lastly, avoiding a predominantly western sample may still be difficult, as about ⅔ of recently active users on Prolific Academic were born in the UK, US, Germany, or Canada (accessed 28/07/2021), but efforts in this direction should be made regardless.

All inclusion/exclusion criteria should be established prior to data analysis to avoid introducing biases (Chandler et al., 2014; Munafò et al., 2017; Nosek et al., 2018). Useful strategies include checking the amount of time participants spend reading the instructions, using instructional manipulation/comprehension checks (Oppenheimer et al., 2009), integrating timed attention/catch trials to weed out inattentive participants, and analysing data flagged for exclusion separately or including the flag as a moderator variable in the analysis (Chandler et al., 2014). These suggestions can be implemented for experimental research and questionnaires alike. Ideally, a study would perform participant screening and exclusion before collecting the actual experimental data by using one of the many tools built into the aforementioned recruitment platforms; catch trials would then represent a complementary check of the data. Attention checks are meant to determine whether participants are paying attention and completing the study in accordance with the instructions, and can mitigate the lack of motivation and attention in the online realm. In fact, probing participants’ attention using catch trials is generally a useful strategy, as inattention may be even greater in the lab: one study found that 61% of participants failed attention checks in the lab compared to only 5% of MTurk participants (Hauser & Schwarz, 2016). We also suggest testing participants in small batches to avoid having to discard large amounts of data at once if a bug is present in the code. Between batches, only a pre-registered script that assesses data quality and checks for bugs should be run.
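A minimal sketch of such exclusion flagging, assuming hypothetical column names and thresholds, could look like this:

```python
# Illustrative data-quality flags based on catch-trial accuracy and instruction reading time.
# Column names and thresholds are assumptions for the sketch, not prescriptions.
import pandas as pd

trials = pd.DataFrame({
    "pid":        ["p1"] * 4 + ["p2"] * 4,
    "is_catch":   [False, True, False, True] * 2,
    "correct":    [1, 1, 1, 1, 0, 0, 1, 0],
    "instr_secs": [35] * 4 + [4] * 4,          # time spent on the instruction screen
})

def flag_participants(df, min_catch_acc=0.75, min_instr_secs=10):
    """Return one row per participant with an exclusion flag based on pre-set criteria."""
    per_pid = df.groupby("pid").agg(
        catch_acc=("correct", lambda s: s[df.loc[s.index, "is_catch"]].mean()),
        instr_secs=("instr_secs", "first"),
    )
    per_pid["flag_exclude"] = (per_pid["catch_acc"] < min_catch_acc) | \
                              (per_pid["instr_secs"] < min_instr_secs)
    return per_pid

print(flag_participants(trials))
```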

An ever-present risk in an online environment is participant fraud, particularly when recruiting rare populations (Chandler & Paolacci, 2017). While fraudulent behaviour cannot be entirely excluded, it is important to verify as far as possible that participants are who they say they are and to prevent them from participating repeatedly (Rodd, 2019). Recruiting through a trustworthy official platform is likely to mitigate potential fraud thanks to built-in screening and monitoring tools. Besides demographics, system information, and ID numbers, one may also include checks throughout the study by asking about similar information several times using different wording. For example, to verify that a participant truly is an individual with dyslexia, they may be asked “Have you been officially diagnosed with dyslexia?”; later in the study they could also be asked “Have you experienced much difficulty with reading or spelling at some point during your life?”. The responses to these questions help corroborate the participant’s identity as an individual with dyslexia.

As in previous studies, we suggest safeguarding against including participants in the control group who might belong to a special population. For instance, participants suspected of encountering dyslexia-related issues can be filtered out by means of the Adult Dyslexia Checklist (Franzen, 2018; Franzen et al., 2021; Smythe & Everatt, 2001; Stark et al., 2022). This checklist is a self-report questionnaire that can be administered and scored online with little effort on the experimenters’ side; once implemented in the software of choice, it can easily be reused in multiple studies. It adds another layer of safety when recruiting participants who must fit specific group in-/exclusion criteria online. We used a conservative score of 40 as the upper cut-off and exclusion criterion for the control group (Figure 1b). Online studies could also be run in two parts: the first part for screening participants, and the second only for those whose valid information meets the eligibility criteria.
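In code, this screening step could be reduced to something like the following sketch. The item ratings and weights are placeholders rather than the actual checklist scoring scheme; only the cut-off of 40 is taken from the text above.

```python
# Sketch of scoring a self-report checklist and applying the control-group cut-off.
def checklist_score(responses, weights):
    """responses: per-item ratings; weights: per-item weights (both placeholders)."""
    return sum(r * w for r, w in zip(responses, weights))

CUTOFF = 40  # upper cut-off for inclusion in the control group (from the text above)

def assign_group(score, has_diagnosis):
    """Assign group membership from diagnosis status and checklist score."""
    if has_diagnosis:
        return "dyslexia"
    return "control" if score <= CUTOFF else "excluded"

print(assign_group(checklist_score([1, 2, 0, 3], [3, 4, 2, 5]), has_diagnosis=False))
```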

Figure 1. Experimental design, dyslexia scores, and data screening visualisations. a) Schematic of the fictional paradigm and trial sequence. Participants first saw an encoding period in which they encoded the locations of several stimuli of the same type presented in different locations. A total of eight fixed locations were available on each trial, and either three or six of them were filled with items. A spatial retrieval cue was then followed by a decision screen presenting three different stimuli of the same type. Participants were instructed to respond as quickly as possible using their physical keyboard. Further experimental details are available from the study’s Open Science Framework repository (Franzen et al., 2022). b) Raincloud plots (Allen et al., 2019) of the simulated dyslexia checklist scores that served as a screening tool after the removal of excluded participants. Dyslexia data are shown in blue; control data in yellow. Overlaid boxplots show the median and the upper and lower quartiles. A maximum score of 40 was used to delineate participants included in the control group from others without an official dyslexia diagnosis, who were excluded from further analyses and from this plot. c) Scatterplots of accuracy as a function of reaction time across all conditions (top: measures of central tendency; bottom: standard deviations). One mean value per participant was computed across mean accuracy or median reaction times of both working memory conditions. Colours indicate groups: blue dots depict single-participant values of the dyslexia group, yellow dots those of the control group. Dashed lines indicate the lower and upper bounds of the 95% confidence interval for all participants included in the analyses.

Figure 2. Overview of suggestions for online research by study stage. Flowchart outlining best practice suggestions, following the workflow of experimental studies in cognitive psychology and neuroscience from design to discussion.

We simulated accuracy and reaction times as dependent performance measures from individuals with and without dyslexia. The use of a speeded task in an online design is recommended, as fast reaction times may prevent cheating and serve as a supplementary screening measure by recording hints of a lack of attention (Rodd, 2019). We implemented sensible but challenging time limits, particularly for the memory decision (i.e., 3 seconds; Figure 1a). These are fast enough to keep participants attentive and to discourage “screen grabbing” on memory tasks, while not being so fast that the task becomes too difficult or participants drop out. Researchers may also review the relevant literature and pilot ahead of time to determine an appropriate response deadline, depending on the testing population. For example, while A/B perceptual decision-making frequently uses a 1.5-second response deadline, the deadline for the same task has been extended to 3 seconds for patients with psychosis. Participants were required to use a physical keyboard and button responses so that cursor movement times or touch-screen inaccuracies would not confound our results. Nevertheless, with this set-up one has to keep in mind the standard keyboard polling rate of 125 Hz (i.e., one sample every 8 ms) when interpreting the precision of logged reaction times (Anwyl-Irvine et al., 2020).

Equally, the monitor’s frame rate constrains the rate at which the display can change and, in turn, the possible stimulus presentation durations. We suggest recording each participant’s frame rate to gauge potential timing differences in stimulus presentation. Sixty frames per second (also termed Hz) is the standard for most laptop screens and laboratory monitors, which means a new stimulus can be presented roughly every 16.7 ms, and presentation durations are limited to multiples of that frame interval. Most platforms exhibit a positive delay, presenting a stimulus for longer than intended (Anwyl-Irvine et al., 2021). Exact presentation of short stimulus durations of 50 ms or less is therefore difficult or impossible to guarantee online and should be avoided (Grootswagers, 2020). However, timing concerns for longer presentation durations on modern browsers, with an optimal operating system/browser pairing, have been alleviated in recent years (Anwyl-Irvine et al., 2021; Bridges et al., 2020; Gallant & Libben, 2019; Mathôt & March, 2021; Pronk et al., 2020). Timing accuracy of visual displays and reaction times tends to be best when Chromium-based browsers (e.g., Google Chrome) are used, independent of the operating system (Anwyl-Irvine et al., 2021). But substantial differences between browser/presentation software/operating system combinations exist and need to be considered in the study design (Bridges et al., 2020).
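The frame-rate constraint can be made explicit with a small helper like the one below, which converts a desired duration into whole frames and reports the duration actually achievable on a given display; the refresh rates in the example loop are simply values one might encounter online.

```python
# Sketch: convert a desired presentation duration into whole frames for a given
# refresh rate, and report the duration that display can actually achieve.
def frames_for_duration(duration_ms, refresh_hz=60):
    frame_ms = 1000.0 / refresh_hz
    n_frames = max(1, round(duration_ms / frame_ms))
    return n_frames, n_frames * frame_ms

for hz in (60, 75, 144):                      # example refresh rates seen in online samples
    n, achieved = frames_for_duration(100, hz)
    print(f"{hz} Hz: {n} frames, approximately {achieved:.1f} ms")
```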

Generally, timing accuracy in online studies is affected by the variety of set-ups and varies slightly more than in the lab across all measurements (for more details, see Bridges et al., 2020). For example, the accuracy of reaction times may suffer if certain operating system/browser pairings are used (for details, see Anwyl-Irvine et al., 2021; Bridges et al., 2020). Laptops offer more recording precision and are preferable to touch-screen devices. In contrast, studies that rely less on speeded responses, or that wish to use push notifications, may opt for smartphones and tablets instead. Thus, we suggest recording all available timing information, regardless of the allowed testing devices, to check whether participants stayed attentive and on task.

Counterbalancing should be applied as usual; standard experimental design principles become even more important online, where larger sample sizes can make even small order effects relevant (DePuy & Berger, 2014). Optimally, a Latin square can be used to determine the counterbalancing order of conditions. Counterbalancing of block or trial order can also be implemented using two or more separate versions of the experiment, to avoid unequal dropout rates unbalancing groups during automatic, participant-ID-dependent assignment. Some platforms offer built-in quota management or randomisation modes. An alternative to structured counterbalancing is full randomisation of stimuli, in which the order of blocks and of trials within blocks is randomised to mitigate order effects altogether. Counterbalancing and randomisation are standard practice in experimental psychology, online and in the laboratory (Kingdom & Prins, 2016), particularly when participants can be randomly assigned to two groups independent of their characteristics in between-subjects designs (Altman & Bland, 1999).

To equalise the appearance of visual stimuli across a variety of screens, we suggest scaling stimuli based on a box that the participant manually adjusts to the size of a common reference object (e.g., a credit card) at the start of the experiment. This method has been implemented for several platforms, including Psychtoolbox (Psychtoolbox Team, 2021), PsychoPy/PsychoJS/Pavlovia (Carter, 2021), Gorilla (Gorilla Team, 2021), and Labvanced (Labvanced Team, 2022). Additionally, we did not allow participants to use their phones or tablets to complete the study; most online delivery platforms allow researchers to select the specific device types permitted for a study before publishing it, which adds further assurance of consistent stimulus presentation. A combination of screen size, refresh rate, and resolution may serve as a quick a posteriori proxy for stimulus size effects and help rule out these hardware parameters as potential confounds of performance. Screen size and resolution are easily accessible measures that most platforms provide, but other options could also be used. Researchers can correlate these proxies with behavioural performance to get an indication. The combination of all these proxies (common reference objects, screen size, screen refresh rate, and screen resolution) is the most informative.
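The logic behind this credit-card calibration can be sketched as follows; the adjusted box width is an example value, and the card width is the standard ID-1 card dimension.

```python
# Sketch of the credit-card scaling logic: the participant resizes an on-screen box until
# it matches a physical card, which yields a pixels-per-cm estimate used to draw stimuli
# at approximately known physical sizes.
CREDIT_CARD_WIDTH_CM = 8.56   # ISO/IEC 7810 ID-1 card width

def pixels_per_cm(adjusted_box_width_px):
    return adjusted_box_width_px / CREDIT_CARD_WIDTH_CM

def stim_size_px(desired_cm, adjusted_box_width_px):
    return desired_cm * pixels_per_cm(adjusted_box_width_px)

# e.g., the participant stretched the box to 420 px to match their card
print(round(stim_size_px(desired_cm=3.0, adjusted_box_width_px=420)))  # roughly 147 px
```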

To ensure full attention on the task and reduce the likelihood of distractions, we suggest forcing the browser into full-screen mode. Researchers may consider pausing the experiment as soon as the software detects that full-screen mode has been exited. This mode reduces, but does not eliminate, the possibility of distractions, since notifications often need to be disabled manually and extended or second monitors are not affected by full-screen mode.

Because the experimenter cannot intervene in a mass online testing setting, study designs need to make experiments as close to “dummy proof” as possible. This places particular emphasis on clear and concise instructions and substantial practice. Instructions may include a step-by-step guide, and a minimum presentation time should be enforced to increase the likelihood that participants actually read them rather than simply moving on. Visual instructions may be accompanied by a read-out audio file for increased comprehension and accessibility; however, this supplementary audio requires participants to turn up the volume on their device, so the experimenter needs to implement a technical check. Headphone checks could also be implemented to ensure participants are wearing headphones when the task contains auditory material. These additions cost more implementation time but may be worth the trade-off depending on the tested population. For example, individuals with dyslexia are likely to benefit from an audio-visual presentation format, while fast readers may be distracted by slower audio. Researchers can also consider instructional manipulation checks, which follow a format similar to other questions or instructions in the study but ask participants to click on a non-intuitive part of the display (Oppenheimer et al., 2009). If time allows, participants may be quizzed about the nature of the study after reading the instructions (Woods et al., 2015), and a post-study questionnaire can be used as well.

Subjecting participants to practice trials with intuitive feedback at the beginning of the experiment, to ensure proper understanding of the task requirements, is already standard in behavioural studies; however, its exact implementation becomes crucial online. The practice should consist of at least three trials per experimental condition (a first encounter and two practice trials for performance evaluation). Depending on the study, the practice could also include many more trials and a staircase procedure for reaching a first performance plateau. Participants should receive feedback on whether their response was correct, incorrect, or too slow (in a speeded task). Intuitive colouring of the feedback, such as green for correct, red for incorrect, and blue for too slow, may facilitate learning the task. Experimenters should consider repeating the practice if a valid response (correct or incorrect) was given in fewer than 51% of the performance-evaluation trials. This general threshold can be applied to any paradigm measuring accuracy: 51% distinguishes responding from mere chance level, yet is not so high as to eliminate participants who have understood the instructions but are low performing (51–70%). These thresholds are subject to change based on a study’s specific paradigm, its experimental conditions, and prior evidence in the literature; other researchers set their accuracy threshold at 85%, for instance (Sauter et al., 2022). Participants could also be allowed to repeat the practice if they want or feel they need to, but a limit on repetitions (3–5 times) should be imposed to reduce the possibility of confounding practice effects, which can occur quickly (Bartels et al., 2010; Benedict & Zgaljardic, 1998; Calamia et al., 2012). If participants may repeat the practice as often as they wish, repetition could become tied to personality type or motivation, as rather unconfident or highly perfectionistic individuals may opt to repeat the practice more often than others even though they understand the task perfectly well. If a participant systematically fails the practice (i.e., more than 3–5 times) and has failed both attempts at the comprehension checks prior to starting the practice, they should be prevented from continuing the study.
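The practice gate described above might be implemented along these lines; run_practice_block() is a hypothetical stand-in for the real practice routine, and the response rate inside it is arbitrary.

```python
# Sketch of the practice-gate logic: repeat the practice block when performance on the
# evaluation trials falls below the 51% threshold, up to a maximum number of repeats.
import random

def run_practice_block(n_eval_trials=10):
    # Stand-in: returns the proportion of valid responses; replace with real trial code.
    return sum(random.random() < 0.6 for _ in range(n_eval_trials)) / n_eval_trials

def practice_gate(threshold=0.51, max_repeats=3):
    for attempt in range(1, max_repeats + 1):
        if run_practice_block() >= threshold:
            return f"passed on attempt {attempt}"
    return "failed practice: end session and thank the participant"

print(practice_gate())
```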

In our fictional study, we designated an overall study duration of 30 minutes as ideal and 45 minutes as the maximum, to reduce dropout rates, based on the assumption that participants are at higher risk of distraction and have shorter attention spans in an online setting. Even so, 45 minutes can be considered rather long, as 70% of respondents in a recent survey indicated that they prefer studies lasting less than 30 minutes (Sauter et al., 2020). This duration refers to a single session; if the experiment consists of multiple sessions separated by hours or days, each session should ideally last a maximum of 45 minutes, or much less, and sessions should be of roughly equal length to limit fatigue effects within any particular session. As the duration of an experiment increases, so does the dropout rate, because participants become less attentive, less motivated, and occasionally discouraged. Raising pay to slightly above minimum wage only partly mitigates this issue (Crump et al., 2013). Researchers could also consider implementing a game-style reward structure to mitigate participant dropout, for example block-by-block feedback that motivates participants to continue and lets them track their performance along the way. Nevertheless, shorter studies are increasingly appealing to participants (Sauter et al., 2020) and should in turn benefit researchers.

When it comes to keeping a study short, researchers always face a context-dependent trade-off between statistical power per condition, often represented by the trial count, and the length and complexity of the experiment. For instance, modelling reaction times with linear mixed-effects models requires enough sampling units (i.e., participants and trials) to increase the likelihood that these models converge and yield credible results. Although devising rules of thumb is difficult, 40 participants and 40 different stimuli are considered a good starting point in cognitive psychology (Meteyard & Davies, 2020). Importantly, the required number of sampling units depends mainly on the research design (e.g., within- or between-subject/item factors) and the planned analyses. We emphasise that this is highly specific to each study, and a simulation approach is recommended to determine the power and number of sampling units for a specific design and all its factors.

In terms of procedural design recommendations for online research, behaviours leading to irreproducible research ( Button et al., 2013 ; Ioannidis, 2005 , 2008 ; Ioannidis et al., 2014 ; Munafò et al., 2017 ; Simmons et al., 2011 ) can be partly avoided by pre-registering an unmodifiable analysis plan consisting of research questions, hypotheses, exclusion criteria, a priori power calculations, and target sample size, for instance on the Open Science Framework (OSF; www.osf.io ; Foster & Deardorff, 2017 ; Nosek, 2015 ; Nosek et al., 2018 ). This does not mean that the door to exploratory or unplanned analyses is closed; these could be added as an “unplanned protocol deviations” section in the final report ( Camerer et al., 2018 ). Pre-registration templates exist ( van den Akker et al., 2021 ), and data, analyses, and materials can also be made easily accessible and reproducible by sharing them in hybrid formats, such as markdown, which merge code, results, and text ( Artner et al., 2020 ). These can be stored via cloud-based solutions including the OSF, GitHub, or Databrary ( Gilmore et al., 2018 ; for a guide on transparent research steps, see Klein et al., 2018 ). Owing to its fictional nature and lack of hypotheses, the presented example simulation study was not pre-registered. In a truly experimental setting, however, all aspects of the fictional study’s procedures would have been pre-registered, and authors may even consider submitting their work as a registered report.

Overall, asking participants to find a quiet environment, exit all non-essential programmes, close all other browser tabs and applications, and empty their browser’s cache can all help the study run smoothly and provide the best possible timing accuracy. Instructions may err on the side of being extensive, and their visual form can be complemented by an auditory read-out. Attention and comprehension checks are recommended, and participants could be guided through a sample trial step by step. We also suggest complementing experimental testing with questionnaire items (after the experiment) to collect self-report data on experienced noise and distractions in the participant’s environment and on what may have gone wrong, and to give participants a chance to provide the kind of feedback that is often collected informally in the lab.

Data screening and analysis

Once the aforementioned experimental design considerations have been implemented and data collected, researchers face the challenging task of evaluating whether their data can give rise to valid empirical results. Therefore, the implementation of robust data screening measures is of utmost importance. Since control over a participant’s setting, their technical equipment, and their understanding of the task is more limited online than in the laboratory, increased noise in a dataset should be expected. This makes the screening of data collected online even more important than for data collected in the laboratory. The data screening procedure has to be able to 1) identify and quantify this noise and 2) lead to clear decisions on whether a dataset should be kept or excluded from all analyses.

When dealing with empirical data, the first data screening step consists of identifying the number of participants who did not fully complete the experiment. The number of completed experimental trials serves as a good proxy for evaluating whether sufficient trials have been recorded for analysis and whether sheer experimental duration might have been the reason for participants dropping out. This threshold should be based on the requirements of the statistical analyses, such as power and sensitivity, and set a priori to avoid issues that have contributed to the replication crisis ( Button et al., 2013 ; Ioannidis, 2005 , 2008 ; Ioannidis et al., 2014 ; Munafò et al., 2017 ; Simmons et al., 2011 ). In our fictional study, we simulated power for a linear mixed-effects model as a function of both the number of participants and the number of stimuli. This simulation allows for determining the critical number of trials (i.e., stimuli) necessary for a reliable minimum of 80% power, given a sample size and assumptions about the effect. This threshold should be determined for every study. In our simulation, it resulted in a sample of 70 participants for detecting a group difference (between-subjects factor) in response times when using 40 or more stimuli. Repeated participation attempts can be prevented using the settings in most experiment manager software. For example, the platform Prolific automatically tracks participants’ completed, rejected, and aborted attempts, providing a full report that includes participant ID numbers and session durations. We suggest excluding data from quickly aborted and repeated sessions from all analyses, unless this number is part of the research question.
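As a minimal illustration of this first screening step, assuming a long-format data frame `trials` with one row per recorded trial and a `participant` column, and taking the a-priori minimum trial count from the power simulation:

```r
# Flag participants who recorded fewer trials than the a-priori minimum
min_trials <- 40                                   # from the a-priori power simulation
trials_per_participant <- table(trials$participant)
dropped_out <- names(trials_per_participant)[trials_per_participant < min_trials]
trials_screened <- subset(trials, !(participant %in% dropped_out))
length(dropped_out)  # number of participants excluded at this step
```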

In a second step, the double-checking of crucial experimental variables at independent time points is essential and should be considered good practice. Participants should only be excluded in accordance with hard exclusion criteria and/or repeated failing of attention and/or comprehension checks obtained at different time points throughout the experiment. The platform Prolific provides useful recommendations on how to integrate attention and comprehension checks within its platform, which can also be regarded as general advice ( Prolific Team, 2022 ). Specifically, Prolific allows two different types of attention checks. The first type is an Instructional Manipulation Check (IMC), where participants are asked to answer a question in a very specific way, so that researchers can determine from the answer whether participants were paying attention. The second is to integrate a nonsensical item into a survey, for which only one answer is objectively logical. These checks may take the form of a multiple-choice question. To ensure participants have understood critical information about the experiment, Prolific also recommends implementing valid comprehension checks right at the beginning of the study alongside the instructions (i.e., before the training). As per Prolific’s guidelines, participants should be given at least two attempts at the comprehension checks, and if they fail both attempts, they should be asked to return their submission (for more information, see Prolific’s Attention and Comprehension Check Policy; Prolific Team, 2022 ).

Third, one should remove all trials without a valid response. For example, in perceptual decision-making, all trials without a decision before the response deadline are commonly discarded ( Franzen et al., 2020 ), because a decision is simply absent and should not be evaluated as incorrect. However, criteria for the validity of trials may vary by research field and question. In a design where missed responses are of interest, such as inhibited responses in the go/no-go task, these trials may be retained for analysis, as they present valuable information for answering the research question.

Additional data screening measures that are specific to the collected behavioural data (e.g., accuracy and reaction times) can then be applied. First, as an indicator of the ability to understand and perform the task, we suggest screening the simulated data for mean accuracy levels that fall below 50% on the relatively easy three-digit (low-WM-load) condition. As mentioned previously, a threshold of 51% is a soft recommendation that should allow for the differentiation between mere chance and low performance. Specific to the fictional study, a three-digit working memory load should not be a problem for anyone who is not affected by a more general cognitive impairment ( Vogel et al., 2001 ), and can thus be used as a baseline. To finish this step, report the number of participants removed due to accuracy concerns.
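A minimal sketch of this accuracy screen, continuing with the `trials_screened` data frame from the earlier sketch; the column names (`load`, `correct`) are assumptions for illustration.

```r
# Exclude participants whose mean accuracy on the easy low-load condition
# falls below 50% (column names `load` and `correct` are assumed)
low_load <- subset(trials_screened, load == "three")
acc_by_participant <- tapply(low_load$correct,
                             droplevels(factor(low_load$participant)),
                             mean, na.rm = TRUE)
below_chance <- names(acc_by_participant)[acc_by_participant < 0.50]
trials_screened <- subset(trials_screened, !(participant %in% below_chance))
length(below_chance)  # report the number of participants removed for low accuracy
```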

Next, we suggest screening reaction times for trials that are too fast for any valid decision-making, memory, or other relevant process to have been completed and indicated via button press(es). For instance, given that the lower bound of object processing for a single object was found to be around 100 ms ( Bieniek et al., 2016 ) and participants saw three choice options, we would remove all trials with reaction times faster than 200 ms. This threshold is highly study specific.
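Continuing with the screened data frame from the earlier sketches, the following minimal snippet implements both the valid-response cleaning described above and the 200 ms lower bound; the column names (`response`, `rt` in milliseconds) are assumptions for illustration.

```r
# Remove trials without a valid response (e.g., missed the response deadline)
# and trials faster than the study-specific lower bound of 200 ms
valid_trials <- subset(trials_screened,
                       !is.na(response) &  # a response was registered in time
                       rt >= 200)          # plausibly reflects task processing
n_removed <- nrow(trials_screened) - nrow(valid_trials)
n_removed  # report the number of discarded trials
```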

In general, it is good practice to base data screening decisions, reliability estimates, and the analysis of results on robust statistical evidence ( Field & Wilcox, 2017 ; Parsons et al., 2019 ). When computing averages of continuous variables, robust measures of central tendency, such as the median or trimmed or winsorized means, should be considered ( Rousselet et al., 2017 ; Wilcox & Rousselet, 2018 ). We also suggest excluding participants whose data represent outliers based on robust measures of variation in a univariate or multivariate distribution, including the median absolute deviation (MAD) or the minimum covariance determinant (MCD; Leys et al., 2019 ). Similarly, the combination of frequent fast responses without much variation and poor performance suggests mere guessing. This would be indicated by a pattern of fast median reaction times, a small median absolute deviation of reaction times, and low mean accuracy (< 5th percentile of the group distributions). Researchers can first explore visually whether such a pattern might exist by plotting mean/SD accuracy against mean/SD reaction times ( Figure 1c ).
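The sketch below shows one way to operationalise these robust checks on the cleaned trials from the earlier snippets: MAD-based flagging of unusual median reaction times and a simple guessing flag. The 3-MAD cut-off is a common convention (cf. Leys et al., 2019), and all column names are assumptions for illustration.

```r
# Robust per-participant summaries on the cleaned trials: median RT,
# MAD of RT, and mean accuracy (column names are assumed)
by_participant <- split(valid_trials, droplevels(factor(valid_trials$participant)))
summaries <- data.frame(
  participant = names(by_participant),
  rt_median   = sapply(by_participant, function(d) median(d$rt)),
  rt_mad      = sapply(by_participant, function(d) mad(d$rt)),
  accuracy    = sapply(by_participant, function(d) mean(d$correct))
)

# Univariate outliers: median RT more than 3 MADs from the group median
rt_centre <- median(summaries$rt_median)
rt_spread <- mad(summaries$rt_median)
summaries$rt_outlier <- abs(summaries$rt_median - rt_centre) > 3 * rt_spread

# Guessing-like pattern: fast, invariant responding combined with poor accuracy
summaries$guessing <- summaries$rt_median < quantile(summaries$rt_median, 0.05) &
  summaries$rt_mad < quantile(summaries$rt_mad, 0.05) &
  summaries$accuracy < quantile(summaries$accuracy, 0.05)
```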

Since many studies in cognitive psychology aim to generalise results to a population of participants and/or stimuli, we suggest applying multilevel analyses that include random effects, as is done in linear mixed-effects modelling (for a tutorial, see DeBruine & Barr, 2021 ). Alternatively, one may use Bayesian mixed modelling, which allows for quantifying evidence for both the alternative and the null hypothesis ( Kruschke, 2014 ; Kruschke & Liddell, 2018 ). Modern computing power equally allows for extrapolation from one’s data by means of bootstrapping or permutation approaches (i.e., sampling at random with or without replacement, respectively; Efron, 1979 ; Wilcox & Rousselet, 2018 ). Researchers should ensure the reproducibility of their analyses and avoid inconsistencies by implementing all data screening procedures in custom scripts written in open-source languages available to everyone, such as R (e.g., via RStudio; RStudio Team, 2021 ).
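As a small, self-contained illustration of combining robust estimators with resampling, the sketch below computes a percentile bootstrap confidence interval for the group difference in participants' median reaction times. It is a simplified stand-in rather than the full mixed-effects or Bayesian analysis described above, and it continues with the assumed `valid_trials` data frame from the earlier sketches.

```r
# Percentile bootstrap CI for the group difference in participants' median RTs
# (simplified illustration; the main analysis would use mixed-effects models)
set.seed(1)
participant_medians <- aggregate(rt ~ participant + group,
                                 data = valid_trials, FUN = median)
rt_dyslexia <- participant_medians$rt[participant_medians$group == "dyslexia"]
rt_control  <- participant_medians$rt[participant_medians$group == "control"]

boot_diff <- replicate(2000,
                       median(sample(rt_dyslexia, replace = TRUE)) -
                         median(sample(rt_control, replace = TRUE)))
quantile(boot_diff, probs = c(0.025, 0.975))  # 95% percentile bootstrap interval
```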

Taken together, in our fictional example, applying these data screening measures resulted in the exclusion of six participants. The remaining data shows that all average values are within reasonable boundaries, with only a few participants from both groups exhibiting more extreme performance ( Figure 1c ).

The opportunities for future online research are manifold and exciting, but the devil is in the detail. A combination of specifically tailored research questions and experimental designs, paired with online attention checks and rigorous data screening, is required for success.

With increased cost efficiency and a greater feeling of anonymity, online experiments could help in the recruitment of larger and locally independent samples, or of specific populations and demographics that are otherwise hard to reach. Lower costs would allow undergraduate students to run scientifically meaningful independent research projects, demonstrating the type of access the online delivery method provides to motivated trainees, even in times of in-person lab closures. Particularly during such times, the lack of required physical presence in labs, a benefit of online research, complemented the COVID-19 social distancing measures and facilitated cost-efficient access to data, thereby allowing trainees to fulfil their degree requirements.

Increased anonymity may be a particularly important advantage for individuals with learning disabilities who may not feel comfortable disclosing their diagnosis while standing in front of an experimenter whom they may even know. Equally, the hurdle to quitting an experiment is much lower when one can simply close a browser tab instead of having to walk out of a laboratory. This is of particular relevance for studies conducted physically in the university context, where students participate in their professors’ experiments for course credit and may feel obligated to finish a study due to the implicit social pressure associated with this relationship. Hence, especially the recruitment of participants with specific diagnoses (e.g., dyslexia; for psychiatric conditions, see Gillan & Daw, 2016 ) may benefit widely from shifting experimental research in cognitive psychology and neuroscience online, if done thoughtfully.

However, increased anonymity comes at the cost of lower experimenter control online. Some level of control could be regained in a proctored setting with online intervention and supervision, such as organising a call during which both participant and experimenter are online during testing. The decision to add researcher supervision online also depends on the type of study design: generic paradigms that are built and ready to run independently stand in contrast to specific protocol administrations that require additional supervision (e.g., neuropsychological test administration). Additionally, the level of intervention could be based on the type of population, with older adults, for example, requiring more supervision and assistance in the online realm.

Recruiting populations with dyslexia and other specific diagnoses or traits has long proven difficult, especially in a university setting. The nature of dyslexia and the predominant focus of the education system on reading and writing mean that individuals with medium to severe cases of dyslexia may not be captured when recruiting at the university level, as many have most likely not progressed to this level ( Warnke, 1999 ). Therefore, generalising lab-based results to the general population of individuals with dyslexia can become problematic, particularly if they are based on a WEIRD sample. Here, extended sampling of individuals from a variety of socioeconomic backgrounds would be a step towards greater generalisability. Collecting a sample that is diverse in socioeconomic background, age, country of origin, and so on is not automatic and requires keeping in mind the distribution of access to the internet and technology, among other variables ( Lourenco & Tasimi, 2020 ).

This issue equally applies to individuals with psychiatric disorders. These individuals would usually not be part of university subject pools and require even higher data protection standards. For example, patients with psychotic disorders may be outpatients at a university clinic or affiliated hospital, which facilitates recruitment through collaborations but requires pro-active, in-person recruitment due to high levels of distrust. If researchers are not able to establish or benefit from such in-person collaborations and would like to recruit online, extending the research question to other (model) populations or aspects would be a desirable option. In the case of schizophrenia, one such population is individuals on the schizotypy personality spectrum. According to the continuum hypothesis of psychosis, schizotypy can be regarded as a subclinical model of schizophrenia in the normal population ( Ettinger et al., 2014 ; Giakoumaki, 2012 ), featuring subclinical symptoms of psychosis that need not lead to a clinically diagnosable state ( Kwapil & Barrantes-Vidal, 2015 ; Nelson et al., 2013 ; Raine et al., 1994 ; Siddi et al., 2017 ). Recruiting from the normal (i.e., non-clinical) population comes with the advantages of being able to recruit through standard platforms, to assess relevant traits with standardised questionnaires instead of requiring an official diagnosis, and often to avoid medication-related confounds. Further, investigating a dimensionally expressed trait in the normal population using a continuum (correlational) approach avoids the need for a well-matched control group.

To recruit hard-to-reach populations, other recruitment attempts could include the use of mailing lists, listservs, online forums, or (sponsored) Facebook posts, as has been done for recruiting infant participants ( Brouillard & Byers-Heinlein, 2019b , 2019a ). Having one’s study featured in relevant newspaper or popular-science articles on the topic presents another opportunity. However, as with different platforms, differences between samples may be expected based solely on the recruitment source and technique. When using these methods, it is important to recruit outside of the researchers’ own social media networks, since such recruitment may not increase the diversity of the sample and can instead lead to homophily ( Aiello et al., 2012 ; Sheskin et al., 2020 ). Some of the popular platforms for managing data collection, such as MTurk or Prolific, provide access to quite large populations from around the world. Most importantly, the characteristics, intentions, and motivations of these samples must align with the study’s objectives.

The need for an appropriately matched control group is a crucial aspect of research with specific populations that intends to compare groups. Its appropriateness is highly context-dependent: a control group with limited years of education may be matched for age but is likely a mismatch on a cognitive task (e.g., reading age) when compared to a group of individuals with dyslexia taking part in higher education. In these cases, a WEIRD sample ( Henrich et al., 2010 ) may also be appropriate, particularly if sampling a population taking part in higher education is important for the research question. Another reason might be that the budget for advertising the study is constrained, as is often the case in student research. In such cases, matched WEIRD samples are important for the scientific validity of the comparison, but care needs to be exercised regarding the generalisability of results. In the case of our fictional study, comparing a diverse dyslexia sample from all walks of life to undergraduate students on a challenging cognitive task would have rendered the group comparison meaningless. It would have been more beneficial to compare groups similar in certain demographic characteristics, while raising awareness of the limitations and potential skew in the interpretation of the results. Hence, WEIRD samples need to be well-justified but should not be categorically condemned.

One suggestion for increasing ecological validity, however, is to collect similar, but additional, samples in independent locations; in other words, to run the identical study at several universities in different cities, provinces, or even countries. Thereby, cross-cultural questions could also be addressed ( Knoeferle et al., 2010 ; Woods et al., 2013 ). Selecting specifically matched control groups is equally key for studies run at multiple sites. Networks facilitating such collaboration exist, such as the Psychological Science Accelerator ( www.psysciacc.org ; Forscher et al., 2021 ), and have their roots in the promising open science movement that seeks to counteract the replication crisis by promoting transparency, openness, and replication ( Klein et al., 2018 ; Munafò et al., 2017 ; Nosek, 2015 ; Nosek et al., 2012 , 2018 ; Simmons et al., 2011 ). However, these initiatives can come with a separate set of challenges regarding their organisation and logistics.

Another aspect under the researcher’s control that can help safeguard data quality, by minimising the risk of participant fraud and ethical concerns, is a study’s reward structure. The reward structure of an experiment must be in line with payment amounts offered in laboratory settings to be ethical, while not being so large that participants provide false information and/or random data simply to gain access to the monetary reward. Arbitrarily increasing monetary incentives far above the local minimum wage is unlikely to have a positive effect: compared to payment below minimum wage, higher pay has been shown to affect only the speed at which a sample and its data are collected and to reduce dropout rates, but, importantly, it did not affect experimental results ( Crump et al., 2013 ). In accordance with Woods and colleagues ( 2015 ), we suggest an ethical and reasonable rate based on the local minimum wage (e.g., 20 cents EUR per minute). Variable bonus incentives could also be introduced for completing certain aspects of the task, or the entire task, successfully ( Chandler et al., 2014 ). Lastly, researchers also need to consider the likelihood that potential participants on a platform use participation as their main source of income, as this may affect data quality ( Peer et al., 2021 ).
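As a quick worked example of the suggested rate, assuming the 30-minute ideal duration from the fictional study (values are illustrative only):

```r
# Reward at the suggested 0.20 EUR per minute for a 30-minute session
rate_per_minute   <- 0.20                              # EUR per minute
ideal_duration    <- 30                                # minutes
payment           <- rate_per_minute * ideal_duration  # 6 EUR per participant
hourly_equivalent <- rate_per_minute * 60              # 12 EUR/hour, to compare with the local minimum wage
```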

Besides participant recruitment and sample characteristics, the technological capabilities relevant to the experimental design are another main aspect underlying successful online studies. These become much more important in the online realm, as their variability increases with every participant bringing their own set-up to the study. To minimise unwanted and negative effects, researchers are well-advised to include hardware requirements in their experiment description and recruitment filters. Checking whether these requirements have been fulfilled when starting the experiment, and enforcing compliance, is a must, as it allows for better standardisation of procedures and results. In this respect, the necessity of extensive piloting of an experiment cannot be emphasised enough. Piloting the workflow should be carried out using multiple machines, browsers, and participants, and should include an evaluation of the instructions and of the accuracy of the technical and other screening checks upon which the experiment should be aborted if a mismatch is detected. Researchers need to keep in mind that extensive piloting takes time, potentially even more than in the lab.

Recent technological innovations have given rise to the possibility of webcam-based eye-tracking, following the introduction of increasingly portable eye-trackers to the market since 2015. The possibility of collecting eye-tracking data in online studies using a laptop’s built-in or a USB webcam is a promising prospect that could benefit cognitive research substantially, as eye movements provide a mechanism to examine cognitive and physiological processes simultaneously. Current sampling rates are often restricted to 30 Hz (one data point every ~33 ms), which is low compared to high-end lab-based eye-tracking systems that often achieve 1000 Hz (one data point per millisecond). This maximum rate depends on the specifications of the webcam, the processing power of the computer, the current computational load (e.g., number of open browser tabs), and the eye-tracking algorithm itself. It is important to keep in mind that the maximum rate may not always be achieved by all systems and at all times, as the algorithm is often run on the participant’s local machine and some samples may be skipped or not accurately collected. Webcam-based eye-tracking is therefore likely sufficient only for experiments expecting longer and steadier fixations of at least 150 ms, as is often the case in consumer psychology or sustained attention paradigms. Hence, the sampling rate remains a caveat that requires careful consideration of its usefulness for a given study.
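To put these sampling rates into perspective, a brief illustrative calculation (not data from the article):

```r
# How many samples fall within a 150 ms fixation at each sampling rate?
sampling_rate_hz     <- c(webcam = 30, lab = 1000)
ms_between_samples   <- 1000 / sampling_rate_hz   # ~33.3 ms vs. 1 ms between samples
samples_per_fixation <- 150 / ms_between_samples  # ~4-5 vs. 150 samples per 150 ms fixation
```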

The accuracy of some webcam-based solutions is estimated by their developers to be around 1 degree of visual angle ( GazeRecorder, 2021 ), which is comparable to lab-based systems. Occasional recalibration throughout an experiment can be useful, since accuracy may decrease over time ( Pronk, 2021 ). Factors in the participant’s environment, such as the general lighting conditions, can affect tracking performance as well. As accuracy often decreases towards the edges of a display, focusing the stimulus presentation around the centre of the display is good practice. At the time of writing, webcam-based eye-tracking is available on platforms including PsychoPy/Pavlovia, Labvanced, and Gorilla. Most use the WebGazer.js library ( Papoutsaki et al., 2016 ) and require a mouse/cursor to perform the calibration. Taken together, if used purposefully, webcam-based eye-tracking and automated video analysis have great potential for adult and developmental research ( Chouinard et al., 2019 ), as some of the first physiological measures to be reliably collected online.

As the transition to online research is difficult to achieve from a simple blueprint, the presented suggestions aim to provide a starting point. They are intended to guide critical thinking about experimental design considerations without claiming to be all-encompassing. Crucial questions when beginning to design any online study are: What is the goal of my study? Is the online delivery method appropriate and sufficient? And can all the measures needed to answer my research question be collected accurately online? It is also important to consider worst-case scenarios with regard to the experimental design, participants, and technology, and to think of ways to mitigate these issues beforehand. Our fictional study illustrates these suggestions in practice. Often the benefits outweigh the costs: the future of research is heading towards technological innovation, and the COVID-19 pandemic offered many researchers a first opportunity to leverage the benefits of online research. As increased environmental changes and biological hazards may result in an uncertain future with regard to global pandemics ( Beyer et al., 2021 ), making the transition to online experimentation sooner rather than later could prove advantageous for research teams across many settings and research fields in the long run. Whether more researchers adopt this method should thus simply be a matter of time; the key factor is how it is done.

Transparency statement

Regarding the presented fictional example study, we reported how we determined the sample size and stopping criterion. We reported all experimental conditions and variables. We reported all data exclusion and outlier criteria and whether these were determined before or during the data analysis.

Data accessibility statement

Data and code are accessible from the experimental study’s Open Science Framework repository: https://osf.io/dyn5t/ (doi: 10.17605/OSF.IO/DYN5T ).

Funding Statement

Funded by: Fonds de Recherche du Québec – Société et Culture.

Acknowledgements

We would like to thank Bianca Grohmann, Malte Wöstmann, Aaron P. Johnson, and Jonas Obleser for their feedback on earlier drafts of this manuscript. Further, we wish to acknowledge all the suggestions and feedback that were collected in response to a tweet by LF on the topic.

Funding information

This research was supported by a Fonds de Recherche du Québec – Société et Culture team grant (196369), and a Horizon Postdoctoral Fellowship awarded to LF by Concordia University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests

The authors have no competing interests to declare.

Author contributions

Both authors contributed equally to most stages of this research, while Léon Franzen also acted as senior, supervising author. Specifically, authors contributed in the following way: Conceptualization: Nathan Gagné and Léon Franzen. Data curation: Nathan Gagné and Léon Franzen. Formal analysis: Nathan Gagné and Léon Franzen. Funding acquisition: Léon Franzen. Investigation: Nathan Gagné and Léon Franzen. Methodology: Nathan Gagné and Léon Franzen. Project administration: Léon Franzen. Resources: Léon Franzen. Software: Nathan Gagné and Léon Franzen. Supervision: Léon Franzen. Validation: Nathan Gagné and Léon Franzen. Visualization: Nathan Gagné and Léon Franzen. Writing – original draft: Nathan Gagné and Léon Franzen. Writing – review & editing: Nathan Gagné and Léon Franzen.

Aiello, L. M., Barrat, A., Schifanella, R., Cattuto, C., Markines, B., & Menczer, F. (2012). Friendship prediction and homophily in social media. ACM Transactions on the Web , 6(2). DOI: https://doi.org/10.1145/2180861.2180866  

Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of Experimental Social Psychology , 74, 187–195. DOI: https://doi.org/10.1016/j.jesp.2017.09.004  

Allen, M., Poggiali, D., Whitaker, K., Marshall, T. R., & Kievit, R. (2019). Raincloud plots: a multi-platform tool for robust data visualization [version 1; peer review: 2 approved]. Wellcome Open Res , 4(63). DOI: https://doi.org/10.12688/wellcomeopenres.15191.1  

Altman, D. G., & Bland, J. M. (1999). Treatment allocation in controlled trials: why randomise? BMJ , 318(7192), 1209–1209. DOI: https://doi.org/10.1136/bmj.318.7192.1209  

Anwyl-Irvine, A. L., Dalmaijer, E. S., Hodges, N., & Evershed, J. K. (2021). Realistic precision and accuracy of online experiment platforms, web browsers, and devices. Behavior Research Methods , 53(4), 1407–1425. DOI: https://doi.org/10.3758/s13428-020-01501-5  

Anwyl-Irvine, A. L., Massonnié, J., Flitton, A., Kirkham, N., & Evershed, J. K. (2020). Gorilla in our midst: An online behavioral experiment builder. Behavior Research Methods , 52(1), 388–407. DOI: https://doi.org/10.3758/s13428-019-01237-x  

Artner, R., Verliefde, T., Steegen, S., Gomes, S., Traets, F., Tuerlinckx, F., & Vanpaemel, W. (2020). The reproducibility of statistical results in psychological research: An investigation using unpublished raw data. Psychological Methods , 26(5), 527–546. DOI: https://doi.org/10.1037/met0000365  

Baker, M., & Penny, D. (2016). Is there a reproducibility crisis in science? Nature , 452–454. DOI: https://doi.org/10.1038/d41586-019-00067-3  

Bartels, C., Wegrzyn, M., Wiedl, A., Ackermann, V., & Ehrenreich, H. (2010). Practice effects in healthy adults: A longitudinal study on frequent repetitive cognitive testing. BMC Neuroscience , 11. DOI: https://doi.org/10.1186/1471-2202-11-118  

Benedict, R. H. B., & Zgaljardic, D. J. (1998). Practice effects during repeated administrations of memory tests with and without alternate forms. Journal of Clinical and Experimental Neuropsychology , 20(3), 339–352. DOI: https://doi.org/10.1076/jcen.20.3.339.822  

Berinsky, A. J., Huber, G. A., & Lenz, G. S. (2012). Evaluating online labor markets for experimental research: Amazon.com’s mechanical turk. Political Analysis , 20(3), 351–368. DOI: https://doi.org/10.1093/pan/mpr057  

Beyer, R. M., Manica, A., & Mora, C. (2021). Shifts in global bat diversity suggest a possible role of climate change in the emergence of SARS-CoV-1 and SARS-CoV-2. Science of the Total Environment , 767, 145413. DOI: https://doi.org/10.1016/j.scitotenv.2021.145413  

Bieniek, M. M., Bennett, P. J., Sekuler, A. B., & Rousselet, G. A. (2016). A robust and representative lower bound on object processing speed in humans. European Journal of Neuroscience , 44(2), 1804–1814. DOI: https://doi.org/10.1111/ejn.13100  

Blythe, H. I., Kirkby, J. A., & Liversedge, S. P. (2018). Comments on: “What is developmental dyslexia?” Brain Sci. 2018, 8, 26. The relationship between eye movements and reading difficulties. Brain Sciences , 8(6). DOI: https://doi.org/10.3390/brainsci8060100  

Bridges, D., Pitiot, A., MacAskill, M. R., & Peirce, J. W. (2020). The timing mega-study: Comparing a range of experiment generators, both lab-based and online. PeerJ , 8(e9414). DOI: https://doi.org/10.7717/peerj.9414  

Brouillard, M., & Byers-Heinlein, K. (2019a). Recruiting hard-to-find participants using Facebook sponsored posts . DOI: https://doi.org/10.17605/OSF.IO/9BCKN  

Brouillard, M., & Byers-Heinlein, K. (2019b). Recruiting infant participants using Facebook sponsored posts . DOI: https://doi.org/10.17605/OSF.IO/9BCKN  

Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience , 14(5), 365–376. DOI: https://doi.org/10.1038/nrn3475  

Calamia, M., Markon, K., & Tranel, D. (2012). Scoring higher the second time around: Meta-analyses of practice effects in neuropsychological assessment. Clinical Neuropsychologist , 26(4), 543–570. DOI: https://doi.org/10.1080/13854046.2012.680913  

Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T. H., Huber, J., Johannesson, M., Kirchler, M., Nave, G., Nosek, B. A., Pfeiffer, T., Altmejd, A., Buttrick, N., Chan, T., Chen, Y., Forsell, E., Gampa, A., Heikensten, E., Hummer, L., Imai, T., … Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour , 2(9), 637–644. DOI: https://doi.org/10.1038/s41562-018-0399-z  

Carter, W. L. (2021). ScreenScale . DOI: https://doi.org/10.17605/OSF.IO/8FHQK  

Casler, K., Bickel, L., & Hackett, E. (2013). Separate but equal? A comparison of participants and data gathered via Amazon’s MTurk, social media, and face-to-face behavioral testing. Computers in Human Behavior , 29(6), 2156–2160. DOI: https://doi.org/10.1016/j.chb.2013.05.009  

Chandler, J. J., Mueller, P., & Paolacci, G. (2014). Nonnaïveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers. Behavior Research Methods , 46(1), 112–130. DOI: https://doi.org/10.3758/s13428-013-0365-7  

Chandler, J. J., & Paolacci, G. (2017). Lie for a Dime: When Most Prescreening Responses Are Honest but Most Study Participants Are Impostors. Social Psychological and Personality Science , 8(5), 500–508. DOI: https://doi.org/10.1177/1948550617698203  

Chouinard, B., Scott, K., & Cusack, R. (2019). Using automatic face analysis to score infant behaviour from video collected online. Infant Behavior and Development , 54, 1–12. DOI: https://doi.org/10.1016/j.infbeh.2018.11.004  

Crump, M. J. C., McDonnell, J. V., & Gureckis, T. M. (2013). Evaluating Amazon’s Mechanical Turk as a Tool for Experimental Behavioral Research. PLoS ONE , 8(3). DOI: https://doi.org/10.1371/journal.pone.0057410  

de Leeuw, J. R., & Motz, B. A. (2016). Psychophysics in a Web browser? Comparing response times collected with JavaScript and Psychophysics Toolbox in a visual search task. Behavior Research Methods , 48(1), 1–12. DOI: https://doi.org/10.3758/s13428-015-0567-2  

DeBruine, L. M., & Barr, D. J. (2021). Understanding Mixed-Effects Models Through Data Simulation. Advances in Methods and Practices in Psychological Science , 4(1). DOI: https://doi.org/10.1177/2515245920965119  

DePuy, V., & Berger, V. W. (2014). Counterbalancing. Wiley StatRef: Statistics Reference Online . DOI: https://doi.org/10.1002/9781118445112.stat06195  

Downs, J. S., Holbrook, M. B., Sheng, S., & Cranor, L. F. (2010). Are your participants gaming the system? Screening mechanical Turk workers. Conference on Human Factors in Computing Systems – Proceedings, 4, 2399–2402. DOI: https://doi.org/10.1145/1753326.1753688  

Efron, B. (1979). Bootstrap methods: another look at the jackknife. Annals of Statistics , 7(1), 1–26. DOI: https://doi.org/10.1214/aos/1176344552  

Ettinger, U., Meyhöfer, I., Steffens, M., Wagner, M., & Koutsouleris, N. (2014). Genetics, Cognition, and Neurobiology of Schizotypal Personality: A Review of the Overlap with Schizophrenia. Frontiers in Psychiatry , 5, 1–16. DOI: https://doi.org/10.3389/fpsyt.2014.00018  

Etz, A., & Vandekerckhove, J. (2016). A Bayesian perspective on the reproducibility project: Psychology. PLoS ONE , 11(2), 1–12. DOI: https://doi.org/10.1371/journal.pone.0149794  

Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods , 39(2), 175–191. DOI: https://doi.org/10.3758/BF03193146  

Field, A. P., & Wilcox, R. R. (2017). Robust statistical methods: A primer for clinical psychology and experimental psychopathology researchers. Behaviour Research and Therapy , 98, 19–38. DOI: https://doi.org/10.1016/j.brat.2017.05.013  

Forscher, P., Wagenmakers, E.-J., Coles, N. A., Silan, M. A., Dutra, N. B., Basnight-Brown, D., & IJzerman, H. (2021). A Manifesto for Big Team Science .  

Foster, E. D., & Deardorff, A. (2017). Open Science Framework (OSF). Journal of the Medical Library Association , 105(2). DOI: https://doi.org/10.5195/jmla.2017.88  

Franzen, L. (2018). Neural and visual correlates of perceptual decision making in adult dyslexia (Issue December) [University of Glasgow]. https://theses.gla.ac.uk/71950/  

Franzen, L., Delis, I., Sousa, G. De, Kayser, C., & Philiastides, M. G. (2020). Auditory information enhances post-sensory visual evidence during rapid multisensory decision-making. Nature Communications , 11, 5440. DOI: https://doi.org/10.1038/s41467-020-19306-7  

Franzen, L., Gagné, N., Johnson, A. P., & Grohmann, B. (2022). Behavioral markers of visuo-spatial working memory load in adult dyslexia . Open Science Framework. DOI: https://doi.org/10.17605/OSF.IO/DYN5T  

Franzen, L., Stark, Z., & Johnson, A. P. (2021). Individuals with dyslexia use a different visual sampling strategy to read text. Scientific Reports , 11, 6449. DOI: https://doi.org/10.1038/s41598-021-84945-9  

Gallant, J., & Libben, G. (2019). No lab, no problem: Designing lexical comprehension and production experiments using PsychoPy3. The Mental Lexicon , 14(1), 152–168. DOI: https://doi.org/10.1075/ml.00002.gal  

GazeRecorder. (2021). Gaze flow . https://gazerecorder.com/gazeflow/  

Giakoumaki, S. G. (2012). Cognitive and prepulse inhibition deficits in psychometrically high schizotypal subjects in the general population: Relevance to schizophrenia research. Journal of the International Neuropsychological Society , 18(4), 643–656. DOI: https://doi.org/10.1017/S135561771200029X  

Gillan, C. M., & Daw, N. D. (2016). Taking Psychiatry Research Online. Neuron , 91(1), 19–23. DOI: https://doi.org/10.1016/j.neuron.2016.06.002  

Gilmore, R. O., Kennedy, J. L., & Adolph, K. E. (2018). Practical Solutions for Sharing Data and Materials From Psychological Research. Advances in Methods and Practices in Psychological Science , 1(1), 121–130. DOI: https://doi.org/10.1177/2515245917746500  

Goodman, J. K., Cryder, C. E., & Cheema, A. (2013). Data Collection in a Flat World: The Strengths and Weaknesses of Mechanical Turk Samples. Journal of Behavioral Decision Making , 26(3), 213–224. DOI: https://doi.org/10.1002/bdm.1753  

Gorilla Team. (2021). Gorilla Screen Calibration . https://support.gorilla.sc/support/reference/task-builder-zones#eyetracking  

Grootswagers, T. (2020). A primer on running human behavioural experiments online. Behavior Research Methods , 52(6), 2283–2286. DOI: https://doi.org/10.3758/s13428-020-01395-3  

Hauser, D. J., & Schwarz, N. (2016). Attentive Turkers: MTurk participants perform better on online attention checks than do subject pool participants. Behavior Research Methods , 48(1), 400–407. DOI: https://doi.org/10.3758/s13428-015-0578-z  

Henderson, P. W., & Cote, J. A. (1998). Guidelines for selecting or modifying logos. Journal of Marketing , 62(2), 14–30. DOI: https://doi.org/10.1177/002224299806200202  

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences , 33(2–3), 61–83. DOI: https://doi.org/10.1017/S0140525X0999152X  

Holcombe, A. (2020). The reproducibility crisis . https://osf.io/r4wpt  

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine , 2(8), e124. DOI: https://doi.org/10.1371/journal.pmed.0020124  

Ioannidis, J. P. A. (2008). Why most discovered true associations are inflated. Epidemiology , 19(5), 640–648. DOI: https://doi.org/10.1097/EDE.0b013e31818131e7  

Ioannidis, J. P. A., Munafò, M. R., Fusar-Poli, P., Nosek, B. A., & David, S. P. (2014). Publication and other reporting biases in cognitive sciences: Detection, prevalence, and prevention. Trends in Cognitive Sciences , 18(5), 235–241. DOI: https://doi.org/10.1016/j.tics.2014.02.010  

John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling. Psychological Science , 23(5), 524–532. DOI: https://doi.org/10.1177/0956797611430953  

Jun, E., Hsieh, G., & Reinecke, K. (2017). Types of motivation affect study selection, attention, and dropouts in online experiments. Proceedings of the ACM on Human-Computer Interaction , 1(CSCW), 1–15. DOI: https://doi.org/10.1145/3134691  

Kingdom, F., & Prins, N. (2016). Psychophysics (2nd ed.). Academic Press.  

Klein, O., Hardwicke, T. E., Aust, F., Breuer, J., Danielsson, H., Mohr, A. H., Jzerman, H. I., Nilsonne, G., Vanpaemel, W., & Frank, M. C. (2018). A practical guide for transparency in psychological science. Collabra: Psychology , 4(1), 1–15. DOI: https://doi.org/10.1525/collabra.158  

Knoeferle, K. M., Woods, A., Käppler, F., & Spence, C. (2010). That Sounds Sweet: Using Cross-Modal Correspondences to Communicate Gustatory Attributes. Psychology & Marketing , 30(6), 461–469. DOI: https://doi.org/10.1002/mar  

Krakauer, J. W., Ghazanfar, A. A., Gomez-Marin, A., MacIver, M. A., & Poeppel, D. (2017). Neuroscience Needs Behavior: Correcting a Reductionist Bias. Neuron , 93(3), 480–490. DOI: https://doi.org/10.1016/j.neuron.2016.12.041  

Kruschke, J. K. (2014). Doing Bayesian Data Analysis . Academic Press.  

Kruschke, J. K., & Liddell, T. M. (2018). The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin and Review , 25(1), 178–206. DOI: https://doi.org/10.3758/s13423-016-1221-4  

Kwapil, T. R., & Barrantes-Vidal, N. (2015). Schizotypy: Looking back and moving forward. Schizophrenia Bulletin , 41(2), S366–S373. DOI: https://doi.org/10.1093/schbul/sbu186  

Labvanced Team. (2022). Labvanced Eye-tracking Guide . Scicovery GmbH. https://www.labvanced.com/docs/guide/eyetracking/  

Levay, K. E., Freese, J., & Druckman, J. N. (2016). The Demographic and Political Composition of Mechanical Turk Samples. SAGE Open , 6(1). DOI: https://doi.org/10.1177/2158244016636433  

Leys, C., Delacre, M., Mora, Y. L., Lakens, D., & Ley, C. (2019). How to classify, detect, and manage univariate and multivariate outliers, with emphasis on pre-registration. International Review of Social Psychology , 32(1), 1–10. DOI: https://doi.org/10.5334/irsp.289  

Lourenco, S. F., & Tasimi, A. (2020). No Participant Left Behind: Conducting Science During COVID-19. Trends in Cognitive Sciences , 24(8), 583–584. DOI: https://doi.org/10.1016/j.tics.2020.05.003  

Makel, M. C., Plucker, J. A., & Hegarty, B. (2012). Replications in Psychology Research: How Often Do They Really Occur? Perspectives on Psychological Science , 7(6), 537–542. DOI: https://doi.org/10.1177/1745691612460688  

Mason, W., & Suri, S. (2012). Conducting behavioral research on Amazon’s Mechanical Turk. Behavior Research Methods , 44(1), 1–23. DOI: https://doi.org/10.3758/s13428-011-0124-6  

Masuda, T., Batdorj, B., & Senzaki, S. (2020). Culture and Attention: Future Directions to Expand Research Beyond the Geographical Regions of WEIRD Cultures. Frontiers in Psychology , 11, 1394. DOI: https://doi.org/10.3389/fpsyg.2020.01394  

Mathôt, S., & March, J. (2021). Conducting linguistic experiments online with OpenSesame and OSWeb. PsyArXiv . DOI: https://doi.org/10.31234/osf.io/wnryc  

Mathôt, S., Schreij, D., & Theeuwes, J. (2012). OpenSesame: An open-source, graphical experiment builder for the social sciences. Behavior Research Methods , 44(2), 314–324. DOI: https://doi.org/10.3758/s13428-011-0168-7  

Meteyard, L., & Davies, R. A. I. (2020). Best practice guidance for linear mixed-effects models in psychological science. Journal of Memory and Language , 112. DOI: https://doi.org/10.1016/j.jml.2020.104092  

Meyers, E. A., Walker, A. C., Fugelsang, J. A., & Koehler, D. J. (2020). Reducing the number of non-naïve participants in Mechanical Turk samples. Methods in Psychology , 3, 100032. DOI: https://doi.org/10.1016/j.metip.2020.100032  

Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie Du Sert, N., Simonsohn, U., Wagenmakers, E. J., Ware, J. J., & Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour , 1, 0021. DOI: https://doi.org/10.1038/s41562-016-0021  

Nelson, M. T., Seal, M. L., Pantelis, C., & Phillips, L. J. (2013). Evidence of a dimensional relationship between schizotypy and schizophrenia: A systematic review. Neuroscience and Biobehavioral Reviews , 37(3), 317–327. DOI: https://doi.org/10.1016/j.neubiorev.2013.01.004  

Nosek, B. A. (2015). Promoting an open research culture: The TOP guidelines. Science , 348(6242), 1422–1425. DOI: https://doi.org/10.1126/science.aab2374  

Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences of the United States of America , 115(11), 2600–2606. DOI: https://doi.org/10.1073/pnas.1708274114  

Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth Over Publishability. Perspectives on Psychological Science , 7(6), 615–631. DOI: https://doi.org/10.1177/1745691612459058  

Oppenheimer, D. M., Meyvis, T., & Davidenko, N. (2009). Instructional manipulation checks: Detecting satisficing to increase statistical power. Journal of Experimental Social Psychology , 45(4), 867–872. DOI: https://doi.org/10.1016/j.jesp.2009.03.009  

Palan, S., & Schitter, C. (2018). Prolific.ac—A subject pool for online experiments. Journal of Behavioral and Experimental Finance , 17, 22–27. DOI: https://doi.org/10.1016/j.jbef.2017.12.004  

Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on Amazon mechanical turk. Judgment and Decision Making , 5(5), 411–419.  

Papoutsaki, A., Daskalova, N., Sangkloy, P., Huang, J., Laskey, J., & Hays, J. (2016). WebGazer: Scalable webcam eye tracking using user interactions. IJCAI International Joint Conference on Artificial Intelligence, 3839–3845. DOI: https://doi.org/10.1145/2702613.2702627  

Parsons, S., Kruijt, A.-W., & Fox, E. (2019). Psychological Science Needs a Standard Practice of Reporting the Reliability of Cognitive-Behavioral Measurements. Advances in Methods and Practices in Psychological Science , 2(4), 378–395. DOI: https://doi.org/10.1177/2515245919879695  

Peer, E., Rothschild, D., Gordon, A., Evernden, Z., & Damer, E. (2021). Data quality of platforms and panels for online behavioral research. Behavior Research Methods , 54(4), 1643–1662. DOI: https://doi.org/10.3758/s13428-021-01694-3  

Peirce, J., Gray, J. R., Simpson, S., MacAskill, M., Höchenberger, R., Sogo, H., Kastman, E., & Lindeløv, J. K. (2019). PsychoPy2: Experiments in behavior made easy. Behavior Research Methods , 51(1), 195–203. DOI: https://doi.org/10.3758/s13428-018-01193-y  

Prolific Team. (2022). Prolific’s Attention and Comprehension Check Policy . https://researcher-help.prolific.co/hc/en-gb/articles/360009223553-Prolific-s-Attention-and-Comprehension-Check-Policy  

Pronk, T. (2021). Demo Eye Tracking 2. https://gitlab.pavlovia.org/tpronk/demo_eye_tracking2  

Pronk, T., Wiers, R. W., Molenkamp, B., & Murre, J. (2020). Mental chronometry in the pocket? Timing accuracy of web applications on touchscreen and keyboard devices. Behavior Research Methods , 52(3), 1371–1382. DOI: https://doi.org/10.3758/s13428-019-01321-2  

Psychtoolbox Team. (2021). Psychtoolbox MeasureDpi . http://psychtoolbox.org/docs/MeasureDpi  

Raine, A., Lencz, T., Scerbo, A., & Kim, D. (1994). Disorganized Features of Schizotypal Personality. Schizophrenia Bulletin , 20(1), 191–201. DOI: https://doi.org/10.1093/schbul/20.1.191  

Rodd, J. (2019). How to maintain data quality when you can’t see your participants . Psychological Science. https://www.psychologicalscience.org/observer/how-to-maintain-data-quality-when-you-cant-see-your-participants  

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin , 86(3), 638–641. DOI: https://doi.org/10.1037/0033-2909.86.3.638  

Rousselet, G. A., Pernet, C. R., & Wilcox, R. R. (2017). Beyond differences in means: Robust graphical methods to compare two groups in neuroscience. European Journal of Neuroscience , 46(2), 1738–1748. DOI: https://doi.org/10.1111/ejn.13610  

RStudio Team. (2021). RStudio: Integrated development for R .  

Saunders, M. N. K., Lewis, P., & Thornhill, A. (2019). “Research Methods for Business Students” Chapter 4: Understanding research philosophy and approaches to theory development. In Research Methods for Business Students (8th ed., pp. 128–171). Pearson Education. www.pearson.com/uk  

Sauter, M., Draschkow, D., & Mack, W. (2020). Building, hosting and recruiting: A brief introduction to running behavioral experiments online. Brain Sciences , 10(4), 1–11. DOI: https://doi.org/10.3390/brainsci10040251  

Sauter, M., Stefani, M., & Mack, W. (2022). Equal Quality for Online and Lab Data: A Direct Comparison from Two Dual-Task Paradigms. Open Psychology , 4(1), 47–59. DOI: https://doi.org/10.1515/psych-2022-0003  

Shapiro, D. N., Chandler, J., & Mueller, P. A. (2013). Using mechanical turk to study clinical populations. Clinical Psychological Science , 1(2), 213–220. DOI: https://doi.org/10.1177/2167702612469015  

Sharpe, D., & Poets, S. (2020). Meta-analysis as a response to the replication crisis. Canadian Psychology , 61(4), 377–387. DOI: https://doi.org/10.1037/cap0000215  

Sheskin, M., Scott, K., Mills, C. M., Bergelson, E., Bonawitz, E., Spelke, E. S., Fei-Fei, L., Keil, F. C., Gweon, H., Tenenbaum, J. B., Jara-Ettinger, J., Adolph, K. E., Rhodes, M., Frank, M. C., Mehr, S. A., & Schulz, L. (2020). Online Developmental Science to Foster Innovation, Access, and Impact. Trends in Cognitive Sciences , 24(9), 675–678. DOI: https://doi.org/10.1016/j.tics.2020.06.004  

Siddi, S., Petretto, D. R., & Preti, A. (2017). Neuropsychological correlates of schizotypy: a systematic review and meta-analysis of cross-sectional studies. Cognitive Neuropsychiatry , 22(3), 186–212. DOI: https://doi.org/10.1080/13546805.2017.1299702  

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science , 22(11), 1359–1366. DOI: https://doi.org/10.1177/0956797611417632  

Smythe, I., & Everatt, J. (2001). Adult Dyslexia Checklist . http://www.itcarlow.ie/public/userfiles/files/Adult-Checklist.pdf  

Sprouse, J. (2011). A validation of Amazon Mechanical Turk for the collection of acceptability judgments in linguistic theory. Behavior Research Methods , 43(1), 155–167. DOI: https://doi.org/10.3758/s13428-010-0039-7  

Stark, Z., Franzen, L., & Johnson, A. P. (2022). Insights from a dyslexia simulation font: Can we simulate reading struggles of individuals with dyslexia? Dyslexia , 28(2), 228–243. DOI: https://doi.org/10.1002/dys.1704  

Sternberg, S. (1966). High-speed scanning in human memory. Science , 153(3736), 652–654. DOI: https://doi.org/10.1126/science.153.3736.652  

Stoet, G. (2010). PsyToolkit: A software package for programming psychological experiments using Linux. Behavior Research Methods , 42(4), 1096–1104. DOI: https://doi.org/10.3758/BRM.42.4.1096  

Stoet, G. (2017). PsyToolkit: A Novel Web-Based Method for Running Online Questionnaires and Reaction-Time Experiments. Teaching of Psychology , 44(1), 24–31. DOI: https://doi.org/10.1177/0098628316677643  

Uittenhove, K., Jeanneret, S., & Vergauwe, E. (2022). From lab-based to web-based behavioural research: Who you test is more important than how you test. PsyArXiv , February 16. DOI: https://doi.org/10.31234/osf.io/uy4kb  

van den Akker, O. R., Weston, S., Campbell, L., Chopik, B., Damian, R., Davis-Kean, P., Hall, A., Kosie, J., Kruse, E., Olsen, J., Ritchie, S., Valentine, K., Van ’t Veer, A., & Bakker, M. (2021). Preregistration of secondary data analysis: A template and tutorial. Meta-Psychology , 5. DOI: https://doi.org/10.15626/MP.2020.2625  

van Stolk-Cooke, K., Brown, A., Maheux, A., Parent, J., Forehand, R., & Price, M. (2018). Crowdsourcing Trauma: Psychopathology in a Trauma-Exposed Sample Recruited via Mechanical Turk. Journal of Traumatic Stress , 31(4), 549–557. DOI: https://doi.org/10.1002/jts.22303  

Vogel, E. K., Woodman, G. F., & Luck, S. J. (2001). Storage of features, conjunctions, and objects in visual working memory. Journal of Experimental Psychology: Human Perception and Performance , 27(1), 92–114. DOI: https://doi.org/10.1037/0096-1523.27.1.92  

Walters, K., Christakis, D. A., & Wright, D. R. (2018). Are Mechanical Turk worker samples representative of health status and health behaviors in the U.S.? PLoS ONE , 13(6), e0198835. DOI: https://doi.org/10.1371/journal.pone.0198835  

Warnke, A. (1999). Reading and spelling disorders: Clinical features and causes. European Child & Adolescent Psychiatry , 8(S3), S002–S012. DOI: https://doi.org/10.1007/PL00010689  

Wilcox, R. R., & Rousselet, G. A. (2018). A Guide to Robust Statistical Methods in Neuroscience. Current Protocols in Neuroscience , 82(1), 8–42. DOI: https://doi.org/10.1002/cpns.41  

Woike, J. K. (2019). Upon Repeated Reflection: Consequences of Frequent Exposure to the Cognitive Reflection Test for Mechanical Turk Participants. Frontiers in Psychology , 10, 2646. DOI: https://doi.org/10.3389/fpsyg.2019.02646  

Woods, A. T., Spence, C., Butcher, N., & Deroy, O. (2013). Fast lemons and sour boulders: Testing crossmodal correspondences using an internet-based testing methodology. I-Perception , 4(6), 365–379. DOI: https://doi.org/10.1068/i0586  

Woods, A. T., Velasco, C., Levitan, C. A., Wan, X., & Spence, C. (2015). Conducting perception research over the internet: a tutorial review. PeerJ , 3, e1058. DOI: https://doi.org/10.7717/peerj.1058  

Yetano, A., & Royo, S. (2017). Keeping Citizens Engaged: A Comparison Between Online and Offline Participants. Administration and Society , 49(3), 394–422. DOI: https://doi.org/10.1177/0095399715581625  

Peer Review Comments

Swiss Psychology Open has blind peer review, which is unblinded upon article acceptance. The editorial history of this article can be downloaded here:

Peer Review History. DOI: https://doi.org/10.5334/spo.34.pr1

  • DOI: 10.17705/1jais.00787
  • Corpus ID: 253378670

Why and How Online Experiments Can Benefit Information Systems Research

  • Published in Journal of the AIS 2022
  • Computer Science

9 Citations

Trust in public and private providers of health apps and usage intentions, gamified monetary reward designs: offering certain versus chance‐based rewards, sustainable energy consumption behaviour with smart meters: the role of relative performance and evaluative standards, augmenting frontline service employee onboarding via hybrid intelligence: examining the effects of different degrees of human-genai interaction, pelatihan & penerapan issn pada jurnal di sma negeri 3 semarang, the critical challenge of using large-scale digital experiment platforms for scientific discovery, let me decide: increasing user autonomy increases recommendation acceptance, the effect of the anthropomorphic design of chatbots on customer switching intention when the chatbot service fails: an expectation perspective, cross-sectional research: a critical perspective, use cases, and recommendations for is research, related papers.

Showing 1 through 3 of 0 Related Papers

Online versus In-lab: Pros and Cons of an Online Prospective Memory Experiment

  • January 2015
  • In book: Advances in Psychology Research Volume 113 (pp.135-161)
  • Chapter: Online versus In-lab: Pros and Cons of an Online Prospective Memory Experiment
  • Publisher: NOVA

Anna J Finley at University of Wisconsin–Madison

  • University of Wisconsin–Madison

Suzanna L. Penningroth

Abstract and Figures

List of tasks performed in the PM experiment, for both the online version and the inlab version. PM = Prospective Memory.


Comparing online and lab methods in a problem-solving experiment

  • Published: May 2008
  • Volume 40, pages 428–434 (2008)


  • Frédéric Dandurand, Thomas R. Shultz & Kristine H. Onishi


Online experiments have recently become very popular, and—in comparison with traditional lab experiments—they may have several advantages, such as reduced demand characteristics, automation, and generalizability of results to wider populations (Birnbaum, 2004; Reips, 2000, 2002a, 2002b). We replicated Dandurand, Bowen, and Shultz’s (2004) lab-based problem-solving experiment as an Internet experiment. Consistent with previous results, we found that participants who watched demonstrations of successful problem-solving sessions or who read instructions outperformed those who were told only that they solved problems correctly or not. Online participants were less accurate than lab participants, but there was no interaction with learning condition. Thus, we conclude that online and lab results are consistent. Disadvantages included a high dropout rate for online participants; however, combining the online experiment with the department subject pool worked well.
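
The key inferential step here is a two-way comparison: does accuracy differ by setting (online vs. lab) and by learning condition, and do the two factors interact? Below is a minimal sketch of how such a check could be run; it is not the authors' actual analysis code, and the file name and column names are hypothetical.

```python
# Hypothetical sketch: two-way ANOVA on accuracy with setting (online vs. lab)
# and learning condition as factors, including their interaction term.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.read_csv("problem_solving_scores.csv")  # columns: accuracy, setting, condition (hypothetical)

model = smf.ols("accuracy ~ C(setting) * C(condition)", data=df).fit()
print(anova_lm(model, typ=2))  # a non-significant setting:condition row is the "no interaction" result
```

A main effect of setting with no interaction is the pattern described in the abstract: online participants score lower overall, but the ranking of the learning conditions is preserved.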

References

Birnbaum, M. H. (2004). Human research and data collection via the Internet. Annual Review of Psychology, 55, 803–832.

Bosnjak, M., & Tuten, T. L. (2003). Prepaid and promised incentives in Web surveys: An experiment. Social Science Computer Review, 21, 208–217.

Buchanan, T. (2002). Online assessment: Desirable or dangerous? Professional Psychology: Research & Practice, 33, 148–154.

Dandurand, F., Bowen, M., & Shultz, T. R. (2004). Learning by imitation, reinforcement and verbal rules in problem-solving tasks. In J. Triesch & T. Jebara (Eds.), Proceedings of the Third International Conference on Development and Learning: Developing social brains (pp. 88–95). La Jolla: University of California, San Diego, Institute for Neural Computation.

Eaton, J., & Struthers, C. W. (2002). Using the Internet for organizational research: A study of cynicism in the workplace. CyberPsychology & Behavior, 5, 305–313.

Eichstaedt, J. (2001). An inaccurate-timing filter for reaction time measurement by JAVA applets implementing Internet-based experiments. Behavior Research Methods, Instruments, & Computers, 33, 179–186.

Gosling, S. D., Vazire, S., Srivastava, S., & John, O. P. (2004). Should we trust Web-based studies? A comparative analysis of six preconceptions about Internet questionnaires. American Psychologist, 59, 93–104.

Halbeisen, L., & Hungerbühler, N. (1995). The general counterfeit coin problem. Discrete Mathematics, 147, 139–150.

Hogg, R. V., & Craig, A. T. (1995). Introduction to mathematical statistics. Upper Saddle River, NJ: Prentice-Hall.

Konstan, J. A., Rosser, B. R. S., Ross, M. W., Stanton, J., & Edwards, W. M. (2005). The story of subject naught: A cautionary but optimistic tale of Internet survey research. Journal of Computer-Mediated Communication, 10, Article 11. Retrieved 2006 from jcmc.indiana.edu/vol10/issue2/konstan.html.

Krantz, J. H., & Dalal, R. (2000). Validity of Web-based psychological research. In M. H. Birnbaum (Ed.), Psychological experiments on the Internet (pp. 35–60). San Diego: Academic Press.

Meyerson, P., & Tryon, W. W. (2003). Validating Internet research: A test of the psychometric equivalence of Internet and in-person samples. Behavior Research Methods, Instruments, & Computers, 35, 614–620.

Michalak, E. E., & Szabo, A. (1998). Guidelines for Internet research: An update. European Psychologist, 3, 70–75.

Musch, J., & Klauer, K. C. (2002). Psychological experimenting on the World Wide Web: Investigating context effects in syllogistic reasoning. In B. Batinic, U.-D. Reips, & M. Bosnjak (Eds.), Online social sciences (pp. 181–212). Seattle: Hogrefe & Huber.

O’Neil, K. M., & Penrod, S. D. (2001). Methodological variables in Web-based research that may affect results: Sample type, monetary incentives, and personal information. Behavior Research Methods, Instruments, & Computers, 33, 226–233.

O’Neil, K. M., Penrod, S. D., & Bornstein, B. H. (2003). Web-based research: Methodological variables’ effects on dropout and sample characteristics. Behavior Research Methods, Instruments, & Computers, 35, 217–226.

Preckel, F., & Thiemann, H. (2003). Online versus paper-pencil version of a high potential intelligence test. Swiss Journal of Psychology, 62, 131–138.

Reips, U.-D. (2000). The Web experiment method: Advantages, disadvantages, and solutions. In M. H. Birnbaum (Ed.), Psychological experiments on the Internet (pp. 89–114). San Diego: Academic Press.

Reips, U.-D. (2002a). Standards for Internet-based experimenting. Experimental Psychology, 49, 243–256.

Reips, U.-D. (2002b). Theory and techniques of conducting Web experiments. In B. Batinic, U.-D. Reips, & M. Bosnjak (Eds.), Online social sciences (pp. 229–250). Seattle: Hogrefe & Huber.

Riva, G., Teruzzi, T., & Anolli, L. (2003). The use of the Internet in psychological research: Comparison of online and offline questionnaires. CyberPsychology & Behavior, 6, 73–80.

Salgado, J. F., & Moscoso, S. (2003). Internet-based personality testing: Equivalence of measures and assessees’ perceptions and reactions. International Journal of Selection & Assessment, 11, 194–205.

Simmel, M. L. (1953). The coin problem: A study in thinking. American Journal of Psychology, 66, 229–241.

Virzi, R. A. (1992). Refining the test phase of usability evaluation: How many subjects is enough? Human Factors, 34, 457–468.


Author information

Authors and Affiliations

Department of Psychology, McGill University, 1205 Dr. Penfield Avenue, H3A 1B1, Montreal, PQ, Canada

Frédéric Dandurand, Thomas R. Shultz & Kristine H. Onishi


Corresponding author

Correspondence to Thomas R. Shultz.

Additional information

This work began as a project completed for a graduate seminar in Human Factors and Ergonomics, taught by D. C. Donderi in the McGill University Department of Psychology.


About this article

Dandurand, F., Shultz, T. R., & Onishi, K. H. Comparing online and lab methods in a problem-solving experiment. Behavior Research Methods, 40, 428–434 (2008). https://doi.org/10.3758/BRM.40.2.428


Received: 05 August 2007

Accepted: 28 September 2007

Issue Date: May 2008

DOI: https://doi.org/10.3758/BRM.40.2.428


Keywords: Behavior Research Method; Subject Pool; Online Experiment; Explicit Learning; Online Participant

Pros (and Cons) of Virtual Labs

Teaching Strategies

School labs are places where students traditionally get “hands-on” with materials, doing scientific experiments themselves. Educators have argued for years that these experiences in the lab are invaluable. When students can observe the reaction of the chemicals or see the actual inside of a frog, they understand concepts better than if they simply read about them.

Unfortunately, schools must deal with budget cuts and limited resources every day, and science labs are expensive to maintain. If students can learn just as much doing virtual labs with the computers they already have, many schools would be happy to switch and lower costs. But what is lost and what is gained from switching to virtual labs?

Going beyond beakers and dissections

While experience using materials in science labs can be helpful for students, not everything students need to learn can be done in a school laboratory in the first place. Students cannot experiment with nuclear fission, the Big Bang, or DNA in school labs, and there isn’t enough time in a semester to watch natural selection or a food chain play out. Consequently, the investigation of those important ideas must be done through lectures, textbooks, videos—or online simulations like ExploreLearning Gizmos.

Online simulations allow students to experiment, test, and really understand concepts. They can try various “what-if” scenarios, running the same experiment again and again while changing just one variable each time. And all this exploration can be done in the class time allowed.
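
To make the "change one variable at a time" idea concrete, here is an illustrative sketch: a toy projectile model written for this article (not code from any Gizmo) that sweeps a single input while holding everything else fixed.

```python
# Illustrative toy simulation: vary only the launch angle and see how far an
# ideal, drag-free projectile travels. Everything else stays constant.
import math

def projectile_range(speed_m_s: float, angle_deg: float, g: float = 9.81) -> float:
    """Range of a projectile launched from ground level, ignoring air resistance."""
    return speed_m_s ** 2 * math.sin(2 * math.radians(angle_deg)) / g

for angle in range(15, 90, 15):  # the single "what-if" variable
    print(f"angle {angle:2d} deg -> range {projectile_range(20.0, angle):5.1f} m")
```

Running the sweep shows the range peaking near 45 degrees, the kind of pattern students can discover for themselves by repeating a simulated experiment with one variable changed each time.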

With over 550 math and science simulations in the ExploreLearning Gizmos library, students can knock a castle wall down with a trebuchet, dive below the sea to investigate the dangers to coral reefs, explore the universe, or use moths to investigate natural selection. All Gizmos come with a teacher guide and lesson materials that help educators and students go deeper.

No lab coats (or costs) required

In a virtual lab, no one gets their hands dirty. There’s nothing to clean up and there’s no need to set aside a separate room with sinks and equipment. A subscription to a virtual lab program costs money, but so does stocking a lab with materials. A virtual lab also easily transforms from a science lab to a computer lab that can be used for other subjects. This flexibility can be especially helpful for schools that don’t have the resources to build a lab.

Computer-based assessments

Computer-based assessments are moving beyond traditional multiple choice questions and becoming more interactive. The new assessments require students to find their way to the correct answer by moving data points, dragging objects, or adjusting aspects of an image. Students must do more than “provide an answer” or select vocabulary word definitions; they have to actually understand what the questions themselves require, and provide the evidence and reasoning needed to support their answers.

The test questions look a lot like science simulations, so using Gizmos in the classroom helps students become more comfortable and gives them a real leg up on standardized tests. Gizmos help students go deeper as they explore, analyze data and apply new concepts, and really understand the material.

Why choose?

A combination of hands-on labs and virtual simulations is definitely preferable for students and schools. Online simulations can help prepare students for lab experiments, leading them more readily to an “ah-ha” moment. Students can go deeper in simulations, making mistakes and thinking through problems to find a solution the way that scientists do.

Sign up for a free Gizmos account to give simulations a try in your virtual lab.


AIS Electronic Library (AISeL)


Journal of the Association for Information Systems

Why and How Online Experiments Can Benefit Information Systems Research

Lior Fink, Ben-Gurion University of the Negev

Online experiments have become an important methodology in the study of human behavior. While social scientists have been quick to capitalize on the benefits of online experiments, information systems (IS) researchers seem to be among the laggards in taking advantage of this emerging paradigm, despite having the research motivations and technological capabilities to be among the leaders. A major reason for this gap is probably the secondary role traditionally attributed in IS research to experimental methods, as repeatedly demonstrated in methodological reviews of work published in major IS publication outlets. The purpose of this editorial is to encourage IS researchers interested in online behavior to adopt online experiments as a primary methodology, which may substitute for traditional lab experiments and complement nonexperimental methods. This purpose is pursued by analyzing why IS research has lagged behind neighboring disciplines in adopting experimental methods, what IS research can benefit from utilizing online experiments, and how IS research can reap these benefits. The prescriptive analysis is structured around key considerations that should be taken into account in using online experiments to study online behavior.

Recommended Citation

Fink, Lior (2022). "Why and How Online Experiments Can Benefit Information Systems Research," Journal of the Association for Information Systems, 23(6), 1333-1346. DOI: 10.17705/1jais.00787. Available at: https://aisel.aisnet.org/jais/vol23/iss6/11


The Daily Wildcat

Is there a disadvantage to taking online labs?

With some classes at the University of Arizona being online for the fall semester, the Daily Wildcat has some tips to help you succeed.

Laboratories are offered fully online, live-online and flex in-person; no labs are allowed to be fully in person. At most, six students and one instructor are allowed in the lab at the same time, so most classes split into four sections and rotate, with each student attending the lab in person only every other week.

As the University of Arizona has changed many of its lab courses to be online, it is clear there are mixed feelings among students. Some students may view online labs as easier because they do not have to physically attend class, while others may find them harder because the work must be done more independently and without physically experiencing the concepts.

The first aspect to consider is whether online versions of labs help students absorb concepts as effectively as in-person labs. Brian Zacher teaches four three-hour lab courses a week, and in his opinion, the online format is not as effective as the in-person option.

When teaching his courses of qualitative analysis and physical chemistry, there is a lot of hands-on learning. Zacher specifically stated that “chemistry is a hands-on science/practice, which requires competence in a number of methods and techniques — which can only be developed through in-person practice.”

Another professor, Hilary Lease, teaches a number of physiology courses. One of her courses — physiology 202 — has a required lab component. Since her other two classes do not, she is able to compare just how effective the lab portion of a class can be and how it impacts a student’s understanding of the course material. Lease said she believes online labs can be just as effective as in-person labs if thought out precisely. 

Laura Van Dorn is a chemistry professor who teaches chemistry 101, which has a laboratory component — chemistry 102. 

“In-person and online labs focus on the same concepts but may deliver outcomes with slightly different skill sets,” Van Dorn said. “The key advantage of online instruction is flexibility within the curriculum.”

Each professor has their own opinion of the impact in-person labs can have, yet they acknowledge there are advantages to each style of lab, ranging from the student’s personal preference to the rigor that each option provides.


Van Dorn said she believes the “primary advantage of in-person labs for students is the ability to practice hands-on techniques with physical instrumentation and real chemicals.” On the other hand, she also said there is much more flexibility with online courses. 

As for Lease, she said in-person labs and online courses are different, yet they complement each other. An in-person version can emphasize more hands-on activities such as dissections, experiments and models. 

“Online labs emphasize comprehension, but also include at-home dissections and at-home experiments,” Lease said. “I think all lab modalities are really well developed this semester for our course.”

As Zacher’s classes total up to 12 hours in the lab per week, he understands how crucial a deep understanding of mechanisms and reactions is. He also commented that online courses can be easier in workload, yet students are not exposed to crucial aspects of science. Zacher provided a thorough explanation, as he works predominantly with chemistry labs.

Lease claims that the online version of physiology 202 is still effective in teaching students materials, and it’s still a challenging course.

“I wouldn’t say that there is an advantage to being online,” she said. “Students are expected to know the same material.”

Lastly, there is the question of whether students should wait to take labs in person during a different semester. It may be a student’s preference to take labs online, but a professor’s opinion may help students make the best decision for themselves.

Van Dorn and Lease both agreed that it is probably not necessary to wait to enroll in an in-person course. There are a lot of resources available to students, and Lease even commented she believes each student in her physiology 202 course was able to get into their preferred laboratory modality.

Keep in mind that each laboratory course is set up in a different way. There are simply some experiments that cannot be replicated or simulated precisely online. Thus, some courses have transitioned to an online version more easily than others because the format fits their curriculum better.

Follow Briana Aguilar on Twitter



Arts, Humanities, & Social Sciences

Studying the benefits of virtual art engagement

James Pawelski and Katherine Cotter talk to Penn Today about their research into digital art galleries.

Vincent Van Gogh's The Postman, full-scale and zoomed-in.

The everyday visitor to an art museum may not know how many seconds or minutes they spent looking at a given painting or whether they spent more time with purple art or green art. But by placing participants in a virtual art gallery and using an open-source tool, researchers from the Humanities and Human Flourishing Project at the Positive Psychology Center have been able to track this kind of data and match behavior with questionnaire responses.
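
The matching step itself is straightforward: behavioral logs keyed by participant are joined to questionnaire responses. Here is a minimal, hypothetical sketch; the file names and columns are invented for illustration, and OGAR's real export format may differ.

```python
# Hypothetical sketch: combine per-artwork viewing logs with questionnaire
# responses so viewing behavior and self-reports can be analyzed together.
import pandas as pd

views = pd.read_csv("gallery_view_events.csv")        # participant_id, artwork_id, seconds_viewed
surveys = pd.read_csv("questionnaire_responses.csv")  # participant_id, immersion, positive_affect

# Total time each participant spent with the art across the visit.
dwell = (
    views.groupby("participant_id")["seconds_viewed"]
    .sum()
    .rename("total_seconds_viewed")
    .reset_index()
)

merged = surveys.merge(dwell, on="participant_id")
print(merged[["immersion", "total_seconds_viewed"]].corr())
```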

Studying virtual art galleries and their wellbeing benefits is a relatively new line of inquiry for the Project, a National Endowment for the Arts Research Lab. Some digital art experiences take the form of an online picture catalog of artwork while others “are almost like Google Maps Street View, where you can click through,” says Katherine Cotter , associate director of research.

Cotter and James Pawelski , principal investigator and founding director of the Humanities and Human Flourishing Project, talked with Penn Today about their research into digital art engagement.


How did you begin studying virtual art galleries?

Pawelski: This is a case of necessity being the mother of invention. When we brought Katherine on as a part of the Humanities and Human Flourishing Project, we had great plans and support to conduct research in the Philadelphia Museum of Art to see how visits there affected the visitor’s wellbeing. Unfortunately, the pandemic had other ideas. 

A lot of art museums pivoted very quickly to making their exhibitions available online. Katherine had a colleague who had a very creative and dedicated partner who offered to create a virtual platform for study during the pandemic. He followed through and OGAR—the Open Gallery for Arts Research—was born. We then helped to co-develop it to add more functions. 

Because of these various projects that museums had undergone—putting their art online, making it accessible virtually—these opportunities were not going to go away when the pandemic was over. Instead, they realized this is a very powerful way of engaging audiences who can’t come, or can’t come today, to the art museum. We’ve been in conversations with a variety of art museums, including the Metropolitan Museum of Art and their digital folks there, and thinking, “How can we continue to use this platform to study the wellbeing effects of engaging with art?”

For people interested in exploring art from the comfort of their homes, what are some of your favorite virtual art experiences?

Cotter: One that I think is also really cool is Google Arts & Culture because they have such high-resolution images. You can scroll in so close to see the brushstrokes. James and I also taught a course for the Barnes Foundation in Philadelphia around this topic of visual art and flourishing on their online platform, which also has some of these really nice zoom-in features. They’re doing a lot of robust online teaching and programming on a variety of topics.

How does this work fit into the larger Humanities and Human Flourishing Project, and what does human flourishing mean in the context of engaging with art?

Pawelski: The Humanities and Human Flourishing Project is interested in looking at connections between arts and culture and various positive outcomes. These can range from physiological outcomes—Does it have an effect on your cortisol levels? Does it have an effect on your heart rate?—to neuroscientific effects. What happens in our brains as we walk into an art museum or as we go onto OGAR? 

We have a very broad notion of what we mean by these flourishing outcomes, and we have five different key pathways that we’ve identified. The first one is immersion; it’s hard to be changed by an experience in the arts and humanities if you’re not paying attention to it. The others include being able to express your feelings and your thoughts, acquiring long-term skills that you can put to use elsewhere, connecting with others, and reflecting on what the experience means to you.

You recently published a study on the wellbeing benefits of virtual art galleries. Can you talk about the design of the study and the digital art gallery that you used?

Cotter: This was a gallery put together with OGAR where we partnered with the Philadelphia Museum of Art to utilize their collection and create a set of galleries featuring 30 artworks. Part of what we were interested in was what happens in the virtual gallery but also what happens when people have repeated engagement. We had people complete a series of four gallery visits, and each gallery was different, so they were seeing new art each time. 

We see that people—across time—are having changes in their positive emotions, their negative emotions, and what we call aesthetic emotions, so feeling moved or in awe, or getting goosebumps or chills. But what seems to be particularly important is their immersion levels. People who are more immersed in these experiences overall have greater positive emotion, lower negative emotion, and more of these aesthetic feelings. 

All five personality traits—openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism—were uniquely linked to immersion, so people who are higher in these traits are reporting greater immersion. An interesting lack of effect was that none of these things was associated with people’s interest in art coming in, so it didn’t mean you had to be highly engaged with art or highly interested in art to see these benefits. 
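
"Uniquely linked" here suggests that each trait predicts immersion even with the other four controlled, which is what a multiple regression estimates. A hypothetical sketch follows; it is not the study's code, and the data file and column names are invented.

```python
# Hypothetical sketch: regress immersion on all five traits at once, so each
# coefficient reflects that trait's unique contribution.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("gallery_participants.csv")  # hypothetical trait and immersion scores
model = smf.ols(
    "immersion ~ openness + conscientiousness + extraversion + agreeableness + neuroticism",
    data=df,
).fit()
print(model.summary())
```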

Pawelski: Another aspect of our work that I think is really valuable is that if you go to an art museum and you ask these questions, that’s great, but you have to keep in mind that you have a biased sample. These are people who have decided, for whatever reason, that this is the way they’re going to spend their day. What is it about those people versus the other people driving by who have not decided to do that? Can you really generalize from that self-selected population to everyone? But with the work that we’re doing with OGAR, these are people who are representatively selected.

Are there particular benefits to a virtual art engagement compared to an in-person experience?

Cotter: There are unique affordances to the digital. I can go to three different international museums in the same day if I want to and not have to spend all that money on airfare to get to them. 

I got a not-infrequent number of responses from people doing this study saying they haven’t been to a museum in a long time because of physical or geographic accessibility, and they’re like, “It was so nice to view art again because I couldn’t.” I think there’s also some of these broader accessibility factors that come into play. If you want to go to a museum, they’re open at certain times; the internet’s always open. There’s mobility considerations as well; there’s not always a lot of spots to sit in the galleries, or they’re often taken.


Pawelski: I think in some ways, our attitudes about art need to catch up with our attitudes about music. I don’t think anybody would say, “You’re listening to Spotify? Why would you do that? That’s kind of nuts. You’re not actually with the musician? You’re not actually at the concert?” 

We have incredible richness available to us in music. Why not take advantage of a similar kind of richness that we have available to us in art?


I Got a Massage Every Week for a Month — Here's What I Learned


I'm really bad at massages, and I don't mean giving them. I envy those who are able to fully unwind and let themselves relax while getting a massage since I typically use the 50 minutes of what should be a calming experience to create to-do lists and allow my mind to wander (this is probably why I also find it difficult to meditate or practice yoga). That being said, I'm well aware of the benefits of getting a massage and what it can do for one's overall mood, health, and wellbeing. So when I discovered Squeeze , a massage business founded by the people behind Drybar, I decided to challenge myself and see if I was able to fully lean into the massage experience by the end of four weeks. As someone who leads a busy lifestyle with an unpredictable schedule and backaches galore at the hands of my laptop, the idea of weekly massages sounds dreamy. On the flip side, it also seems like it could become more of a chore than a pampering session if done every week.

With several locations across the country, Squeeze is a no-frills massage studio that allows you to book, modify, pay, and review your treatment all in one app. I like that there are no phone calls needed to make an appointment, no awkward checkout processes, and that the entire experience from start to finish is completely customizable. Upon booking, I'm prompted to select a massage therapist based on a short bio that describes their expertise — this is great for those preferring someone with specific specialties like deep tissue or Swedish. Within the app, I was also able to choose my desired pressure, areas I'd like to avoid, and areas I'd like to focus on, which allowed me to tailor my preferences to how my body was feeling that week. Finally, before each treatment began, I was able to select from four complimentary aromatherapy oils like grapefruit, bergamot, rosemary, and lavender, as well as six musical options ranging from white noise to instrumental.

Experts Featured in This Article

Clinton Kyles, CMT, is a certified massage therapist.

Benefits of Weekly Massages

Unsurprisingly, there are a whole host of benefits to getting a weekly massage. "Regular massage therapy increases serotonin and dopamine levels, which helps to regulate mood and reduce stress, significantly lowering anxiety and depression over time," says Clinton Kyles, CMT, certified massage therapist. "Weekly massages improve circulation, support cardiovascular health, boost immune response by increasing white blood cells, and lower blood pressure and heart rates." For those who suffer from chronic pain conditions, a weekly massage can also help reduce discomfort. Workout enthusiasts may also find themselves feeling less sore with a quicker recovery period as a result of weekly massages.

My Experience With Weekly Massages

Massages are something I indulge in occasionally, but making them a part of my weekly routine came with a few unexpected side effects. Since July turned out to be the busiest work month I've had in a while, my stress levels at the beginning of the month were at an all-time high. Thankfully, my first time at Squeeze automatically put me at ease. The muted cyan shade throughout the facility was a thoughtful choice to ensure my visits felt peaceful, calm, and serene. I also appreciated the subtle reminders to slow down and relax, like the entire wall of stress balls in the waiting room.


For my first week, I decided to go the traditional route of a relaxing massage in anticipation of the demanding weeks ahead. I chose light to medium pressure with soothing music, and made a concerted effort to practice breathwork throughout the session to encourage my brain to not overthink. The 50 minutes flew by, and although I had a late night the evening before, I felt surprisingly energized and ready to tackle the day after the treatment.

My second week involved spending hours a day hunched over my laptop, so the massage focused on my upper back. My therapist let me know that I held a lot of tension on my right side, and she spent some time massaging knots in this area. My third week saw a pretty heavy workout routine (Pilates three times and a bootcamp-type class twice), so my muscles were in need of a little more love. I opted for firm pressure with a focus on my shoulders, arms, and glutes. This massage was slightly uncomfortable compared to my first two, but my body definitely felt more relaxed when I walked out. By the time my fourth massage rolled around, I was so used to my new routine that I dozed off midway through the session, something I've never been able to do during a massage.

While I noticed a change in my stress levels, digestion, and sleep quality over the course of the four weeks, I was most surprised by how the experience affected my time management skills. While I previously always felt strapped for time, this month I was better equipped to manage my schedule. Specifically, my mind felt clearer, which meant that I was able to be more efficient while tackling work, personal, and daily tasks. I even noticed a change in my makeup routine. Whereas my typical makeup routine would be rushed and thrown together, I actually craved more "me time" and a lengthier getting-ready process. I also felt inspired to experiment with new products and techniques rather than reach for the same formulas and tools. Finally, because my stress levels leave me in a perpetual state of fight-or-flight, I typically engage in shallow breathing. This month, however, I found that I started to take long, deep breaths a few times each day, which consequently calmed my nervous system.

All in all, I began to look forward to the weekly massages as they quite literally forced me to take a much-needed break from the hustle and bustle of my daily life. Just like staying committed to healthy eating and a consistent workout regimen, I found myself feeling more connected to my body and mind over the course of the month. Massages may not be something I'll be shelling out the money for on a weekly basis, but I certainly plan on trying to make them a more consistent part of my self-care routine.

Michelle Rostamian is a Los Angeles-based beauty and wellness contributor with over 10 years of experience in the industry. She began her career as a publicist, content writer, and social media manager, representing beauty brands and industry professionals. Currently, she is a writer and editor on all things makeup, beauty, skincare, and lifestyle.



Image credit: University of Oxford Images / John Cairns Photography

New therapies developed by Oxford experts offer online support for anxiety and post-traumatic stress disorders

Four internet-based therapies developed by experts at the University of Oxford’s Department of Experimental Psychology and Department of Psychiatry are proving helpful for patients with social anxiety disorder and post-traumatic stress disorders and for children with anxiety disorders.

Urgent treatment solutions are needed for children, adolescents and adults with mental health conditions. Despite the government committing to spending 8.9% of all NHS funding on mental health treatment last year, the pipeline to build new facilities and train new staff will take years and, on its own, is insufficient to meet demand.

A suite of online therapies, developed and clinically validated by expert teams at the University of Oxford’s Experimental Psychology and Psychiatry Departments, is now available to help close this gap in care, and tackle anxiety disorders and mental health conditions across all age groups from children through to adolescents and adults. Patients work through a series of online modules with the brief support of a therapist through short phone or video calls and messages.

Randomised clinical trials by the University of Oxford team have demonstrated the impact of all four of the online platforms. Excellent results led to a new commercial licence partnership negotiated between Oxford University Innovation and Koa Health, a company well placed to leverage this cutting-edge technology and research. Koa Health looks forward to making the programmes available to patients across many NHS services, beginning in West Sussex, Oxfordshire, Buckinghamshire, Leicestershire, Bradford, North Tyneside, and London. Dr. Simon Warner, Head of Licensing & Ventures, Oxford University Innovation, said, “These four mental health digital therapies are a fantastic example of the world class expertise within the University of Oxford which has enabled us to launch cutting edge therapies with our industry partner Koa Health. The therapies are tried and tested and now readily available to help change the lives of people suffering from mental health conditions.”

The National Institute for Health and Care Excellence (NICE) early value assessment recommended 9 online therapies for use across the NHS. The therapies developed by the University of Oxford team, with funding from Wellcome and the National Institute for Health and Care Research (NIHR), represent 4 of the 9 selected therapies and will now be made widely available across NHS Trusts, mental health facilities, schools and colleges.

One in five children and young people in England aged eight to 25 have a probable mental disorder and one in four adults in England experiences at least one diagnosable mental health problem in any given year.

Professor Cathy Creswell, a psychologist at the University of Oxford, whose team developed the childhood anxiety programme, explains: 'Recent surveys suggest ongoing increases in the number of children and young people that are experiencing anxiety problems. Our online platforms, which were developed with support from the National Institute for Health and Care Research (NIHR) Oxford Health Biomedical Research Centre (OH BRC), provide practical tools with guidance and support to help tackle issues from home.'

Professor David Clark, University of Oxford, whose team developed the social anxiety disorder programme, adds: 'Social anxiety disorder starts in childhood and is remarkably persistent in the absence of treatment. Internet programmes that deliver optimal treatment for both adolescents and adults have the potential to transform lives and enable people to realise their true potential at school, in the workplace and in society.'

Professor Anke Ehlers, a psychologist at the University of Oxford and OH BRC Co Theme Lead for Psychological Treatments, who led the work on post-traumatic stress disorder (PTSD), says: 'We’ve tested the digital therapy with patients who have PTSD from a broad range of traumas. Recovery rates and improvements in quality of life are excellent. Our clients value being able to work on the treatment from home at a time convenient to them.'

The team at the University of Oxford, Koa Health and Oxford University Innovation will work together to maximise the adoption of all four therapies across NHS Trusts and schools over the coming year.

Oliver Harrison, CEO at Koa Health said: 'Koa Health is committed to delivering scalable, evidence-based interventions for mental health. The programmes developed by the Oxford teams can lower the barriers to care, deliver excellent outcomes, and reduce the cost to health services. In short, this means that our NHS is able to treat more people and improve mental health across the population. With an impeccable evidence base and approval by NICE, we see great potential to expand these programmes worldwide, helping children and adults.'

Dr John Pimm, Clinical and Professional Lead for Buckinghamshire Talking Therapies, said: 'People using our Talking Therapies services had been successfully using internet-based cognitive therapy for social anxiety disorders and post-traumatic stress disorder as part of the research trial and we are now pleased that our therapists will be able to offer this innovative treatment to more people using the Koa platform.'

Dr. Jon Wheatley, Clinical Lead, City and Hackney, NHS North East London, said: 'City and Hackney Talking Therapies are looking forward to embracing digital technology in response to increasing patient demand. We are proud to be working with Koa Health as an early adopter of these innovative solutions that enable therapists to deliver gold standard evidence-based treatments through internet programmes that are engaging and empowering for patients.'

Professor Miranda Wolpert, Director of Mental Health at Wellcome, said: 'These important online therapies have arisen from more than three decades of thorough science. Digital therapies have the potential to transform millions of people’s lives around the world. We look forward to supporting more digital innovation in the years to come.'


Global historic experiments gathered at the Congress of the International Union of Soil Sciences


Established in 1924, the International Union of Soil Sciences (IUSS) is the global union of soil scientists contributing to both nature and human well-being. On May 19, the IUSS held a three-day centennial celebration in Florence, Italy, focused on the past achievements and future challenges of scientists and specialists from different disciplines.

At this IUSS event, one session was dedicated to soil science lessons from 100-plus-year-old experiments.

Bijesh Maharjan, the supervisor of the Knorr-Holden Plot in Nebraska — one such historic experiment — chaired the session. Maharjan, an associate professor at the University of Nebraska-Lincoln (UNL) and extension specialist in soil and nutrient management at the Panhandle Research, Extension, and Education Center (PREEC), intended to bring generations of soil scientists’ commitment to agriculture and research to the global stage.

Explaining the primary reason behind the session, Maharjan said, “We want to follow the footsteps of our visionary past scientists who initiated and maintained these historical research sites and be strategic and forward-looking by bringing all these unique and rare historical research together to enhance our understanding of agricultural sustainability, the prime need of the hour.”


In chairing the session, Maharjan likewise aimed to lift Nebraska’s agriculture history and research contribution to the level of other well-known historical experiments. He presented long-term research from the historic Knorr-Holden continuous corn production system at the IUSS event. Established in 1910 and entered on the National Register of Historic Places in 1992, the Knorr-Holden plot near Scottsbluff, Nebraska is potentially the world’s oldest irrigated continuous corn research plot.

During the session, scientists shared findings from their 100-plus-year-old experiments, emphasizing the importance of understanding agriculture’s impact on the environment at the national and international levels. The session brought together scientists representing most of these exceptionally established research experiments from across the globe. Not only were there scientists representing institutions from the United States — UNL, Auburn University, Louisiana State University, and the University of Missouri — but scientists also represented institutions from the United Kingdom, Poland, Germany, Sweden, and Indonesia (Figure 1).

Andy Gregory, head of the Rothamsted long-term experiments (LTEs) in the United Kingdom, was the keynote speaker at Maharjan’s session.

“I decided to frame my presentation in terms of ‘gradients of soil health’, because that is what the treatments imposed for well over a century at Rothamsted have created,” Gregory said. “We may demonstrate the link between management in agricultural systems and various important functions we ask of our soils, such as support for crop growth, regulation of water and air, and climate change mitigation, all of which underpin soil health.”

Collectively, attendees in Maharjan’s session emphasized the substantial benefits of historical research plots to education, crop productivity, and promoting agricultural sustainability while bridging the past with the future.

“The ‘Old Rotation’ (in Alabama) is a valuable educational resource for students, farmers, and agricultural scientists to learn more about the long-term effects of different farming practices on soil health, crop yields, and environmental impacts,” said Audrey Gamble, from Auburn University. “Their ability to connect the past with the present helps to support the educational, research and service missions of universities.”

Hans-Jörg Vogel from Germany noted that LTEs provide essential data that capture the complexity and long-term trends of soil processes. As such, LTE data are indispensable for developing, validating and refining systemic soil models, leading to more accurate predictions of future developments.

“Long-term experiments function both as a field laboratory, where long-term management has created a great variation of factors between plots, and as a reference material for the long-term impact of various management practices on soil and crop properties,” said Sabina Braun, a researcher from the Swedish University of Agriculture Sciences. “The LTEs play an invaluable role in Swedish agriculture and environmental research, policy making, and teaching.”

Across the globe, the handful of remaining LTEs not only continue to function as research entities but have begun to enlighten scientists and others to the lasting effects of management practices and climate trends on agriculture sustainability.

Lukasz Uzarowicz, from Poland, emphasized that observations of long-term trends from LTEs may be the basis for creating forecasts of changes in crop yields and the rate of sequestration or loss of organic carbon in the soil in the context of climate change.

“These forecasts can be used by decision-makers at the national and international levels to develop agricultural development strategies and introduce good practices to reduce agriculture's impact on the environment, including the climate,” Uzarowicz added.


Though LTEs have substantial benefits, maintaining their continuity in the competitive grant funding system can be a long-term challenge. Tim Reinbott, director of the University of Missouri’s Sanborn Field, noted that most grant awards run for only three to five years, and the gaps between awards are a dangerous period for long-term research sites unless stakeholders make financial commitments. Too often, long-term research sites are terminated 10-50 years after establishment due to lack of funds.

“Consistent soil and crop sampling and archiving of these samples is critical as that enhances the value of the experiment for future generations,” Reinbott said. “The biggest challenge for many long-term experiments is consistent funding to ensure that there are not breaks in data collection.”

Inspired by their harmonious efforts as stewards of LTEs, Maharjan’s IUSS session members intend to stay connected and continue the new collaboration. Efforts are being coordinated to share research strategies to enlighten and educate the LTE stewards. An example of their continued connection will be the organization of a spring 2026 conference at the Institute of Agriculture, Warsaw University of Life Sciences, related to celebrating centennial LTEs.

“I am planning to organize and welcome all to the conference devoted to LTEs and their role in science and society, connected with the celebration of centennial of LTEs belonging to the Warsaw University of Life Sciences,” Uzarowicz said. “We are also planning to organize a visit at our Experimental Station in Skierniewice, central Poland, where we have our oldest LTEs.”

Gregory, with Rothamsted Research, further emphasized the importance of LTEs and the new collaboration on the global stage.

“Although Rothamsted has the oldest set of agricultural LTEs, I genuinely feel that we are but one member of a global family of LTEs,” he said. “The future lies in closer links between these invaluable global resources so that we may maximize their potential to support new research that addresses the key societal challenges of sustainable agriculture, food security and climate change.”

This centennial IUSS event brought together scientists dedicated to preserving long-term science and strengthened an international link that will enhance our understanding of agricultural sustainability.


What Is Inside Sales? A Complete Guide


To succeed in this role, you'll need to hone your prospecting, communication, negotiation, and active listening skills.


Belal Batrawy


There’s more than one way to make it in sales. Gone are the days when you have to meet with prospects in person to seal a deal. Modern technology enables today’s sales pros to work remotely, ditching the demand to travel. Inside sales reps build customer relationships virtually.

If you’re interested in pursuing a career in inside sales, you’re in the right place to learn. Let’s get started.

What you’ll learn:

  • What is inside sales?
  • Benefits of inside sales
  • Roles and responsibilities of inside sales
  • Key inside sales skills you need to know
  • Top features of inside sales tools


Inside sales is the process of selling an organization’s products or services remotely using digital communication tools such as video conferencing, phone calls, emails, chat, social media, or other online channels. Inside sales may also be referred to as “virtual sales” or “remote sales.”

It’s called inside sales because the sales rep is indoors at a company’s location or a home office. Companies in the business to business (B2B), technology, and software as a service (SaaS) industries typically rely on inside sales teams, since products and services can be presented and demonstrated on a screen.

Inside sales is not telemarketing. Unlike an inside sales rep, a telemarketer contacts prospects at random, reads from a script, and has limited knowledge of the product being sold.

Inside sales vs. outside sales

Outside sales refers to selling a product or service in person. The outside sales rep travels to the customer or prospect and meets them at a company’s office, a restaurant, an industry event, or another location. An outside sales approach is commonly used in industries that require the customer to physically hold or interact with the product being sold. For example, having a customer try a piece of machinery or feel the texture of fabrics.

Examples of inside sales:

  • Calling a prospect on the phone
  • Having a video conference meeting with a customer
  • Sending an email to a customer with a product demo video

Examples of outside sales:

  • Meeting a prospect at an industry networking get-together
  • Flying to a customer’s company headquarters for an in-person demo
  • Taking prospects out to dinner and a show


The main advantage of having an inside sales team versus an outside sales team is that they do not travel to meet with customers. This means big savings in cost and time. An inside seller needs minimal equipment to do their job. It’s usually a computer, internet connection, and a phone. No expensive travel and transportation costs, such as flights, hotels, and entertainment. Companies that have a work-from-home sales structure may save on office space.

The sales cycle is the collection of sequential stages sales reps follow when converting a prospect into a customer. In my experience, the inside sales cycle tends to be shorter because the product or service being sold is usually less complex.

For example, an outside sales team selling solar panels must navigate the intricacies of permits and installation, leading to a longer sales cycle. In contrast, a company selling solar panel replacement parts depends on an inside sales team for the quick sale of the parts needed.


An inside sales rep oversees the entire sales cycle, from finding leads to closing deals. The role involves doing inbound and outbound sales . Inbound sales is when a potential customer hears about your product or service and reaches out to you to learn more. For outbound sales, a sales rep reaches out to a potential customer.

Some of the responsibilities of an inside sales rep may include:

  • Lead qualification: Responding to inbound leads that may come to the company via an existing customer’s referral or an online search. The rep also generates their own outbound leads by researching companies and creating a target prospect list. Potential prospects are contacted by phone, email, or social media to gauge interest in your product or service and to schedule a virtual sales meeting (a simple tracking sketch follows this list).
  • Social selling: Inside sales reps favor social selling because it lets them interact with prospects on social media platforms to generate leads and build relationships.
  • Cold calling and emailing: Cold prospecting means contacting a potential customer who has not previously expressed interest in the company’s product or service.
  • Virtual product demos: An inside sales rep presents a demo of the product or service to a prospective customer to show how it works and what its benefits are. Product demos may be pre-recorded videos or conducted live on a video conference.
  • Deal closure: Sales contracts and closing documents are signed with the customer digitally, usually via e-signature.
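
To make the lead-qualification step above concrete, here is a minimal sketch in Python. It is illustrative only: the Lead class, the status values, and the interest-score threshold are assumptions made for this example, not part of any particular CRM.

  # Hypothetical lead-tracking sketch: capture a lead, record the outcome of an
  # outreach touch, and move it toward a scheduled meeting. Illustrative only.
  from dataclasses import dataclass
  from enum import Enum

  class LeadStatus(Enum):
      NEW = "new"                    # just captured (referral, web search, outbound list)
      CONTACTED = "contacted"        # reached by phone, email, or social media
      MEETING_SCHEDULED = "meeting"  # virtual sales meeting booked
      DISQUALIFIED = "disqualified"  # not a fit right now

  @dataclass
  class Lead:
      company: str
      source: str                    # e.g., "referral", "web_search", "outbound"
      interest_level: int = 0        # 0-10, gauged during outreach
      status: LeadStatus = LeadStatus.NEW

  def qualify(lead: Lead, interest_level: int) -> Lead:
      """Record an outreach outcome and advance the lead accordingly."""
      lead.interest_level = interest_level
      if interest_level >= 6:
          lead.status = LeadStatus.MEETING_SCHEDULED
      elif interest_level > 0:
          lead.status = LeadStatus.CONTACTED
      else:
          lead.status = LeadStatus.DISQUALIFIED
      return lead

  # Example: an inbound referral that showed strong interest on a call
  lead = qualify(Lead(company="Acme Solar", source="referral"), interest_level=8)
  print(lead.status)  # LeadStatus.MEETING_SCHEDULED

In a real workflow, the statuses and thresholds would come from your CRM’s pipeline configuration; the point is simply that each touch updates one record the whole team can see.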

One of the biggest challenges for an inside sales rep is building relationships with customers or prospects without face-to-face interaction. Conversations lack nonverbal cues and body language, such as a handshake or a head nod, that tell a rep how the customer is reacting to the sales pitch. To develop rapport, sellers should work on active listening skills, such as paying attention to tone of voice, and ask open-ended questions.

Here are some other skills you need for inside sales:

  • Prospecting: Be able to research potential customers and reach out to them on their preferred channels.
  • Communication skills: Since you are talking with customers on the phone or on video calls, you must be able to communicate effectively to get your points across.
  • Negotiation skills: You need to be able to handle the potential objections that are part of any deal.

Modernizing tools and technologies is a top tactic that sales leaders are using to drive growth, according to the Salesforce State of Sales report. Inside sales tools can help you automate and organize your sales processes so you can be more efficient and productive.

Here are a few tools to consider:

  • Customer relationship management (CRM) system: A CRM manages all your interactions with customers and prospects. The system unifies your customer data in one place and tracks progress across the sales cycle, so you can easily see when you last called a prospect or review a customer’s buying history. Inside sales reps often work alone in a home office, and a CRM is a useful way to share information with your sales team while maintaining visibility into other reps’ activities. Consider a CRM with a built-in Configure, Price, Quote (CPQ) capability, which automatically generates accurate quotes for orders using a customizable template. You can set it up for discounts, product bundles, or other pricing needs, and create a proposal document from the CPQ process (a rough illustrative sketch follows this list).
  • Video conferencing platform: It’s worth investing in a quality camera and microphone, since you’ll be on a lot of video conferences and want to look and sound professional. Test your equipment with a friend and check your camera angle, lighting, background, and voice clarity. Select a platform that syncs your voice and video calls with your CRM and automatically transcribes the calls. Then play your calls back to review how you sound, listening for volume, tone, and clarity, and work on areas for improvement.
  • Collaboration tools: Bring conversations together in the flow of work. A collaboration tool helps you work across teams to speed up processes and get results. For example, it allows you to share information such as proposal files or news of a closed deal in real time and improves productivity by eliminating a lot of back and forth.
  • Generative AI tool: Automatically generate sales emails, prospect proposals, or social media posts with generative AI. You’ll save time on writing individual drafts when you have AI baked into your CRM . It lets you personalize communications every time in just a few clicks.
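
As a rough illustration of what the built-in CPQ capability mentioned above automates, here is a minimal Python sketch of quote generation with a small price book, a product bundle, and a volume discount. The product names, prices, and discount rule are made-up assumptions for the example; they are not Salesforce CPQ’s actual data model or API.

  # Hypothetical CPQ-style quote sketch: price the configured line items and
  # apply a simple volume discount. Illustrative only, not a real CPQ API.
  PRICE_BOOK = {
      "solar_panel": 250.00,
      "inverter": 900.00,
      "mounting_kit": 75.00,
  }

  def quote(items: dict[str, int], volume_discount: float = 0.10) -> float:
      """Return the total quote price; orders of 20+ units get the discount."""
      subtotal = sum(PRICE_BOOK[name] * qty for name, qty in items.items())
      if sum(items.values()) >= 20:
          subtotal *= 1 - volume_discount
      return round(subtotal, 2)

  # Example: a small order priced at list, and a bundle that earns the discount
  print(quote({"solar_panel": 4, "inverter": 1}))        # 1900.0
  print(quote({"solar_panel": 24, "mounting_kit": 24}))  # 7020.0

A real CPQ tool layers approval rules, templates, and proposal documents on top of this kind of calculation, but the core job is the same: turn a configured order into an accurate, consistently discounted quote.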

Close inside sales deals anywhere and everywhere

Innovative tools and technologies have given inside sales reps the ability to build the kind of customer relationships that were once only possible in person. By learning the right skills to sell virtually and having the essential hardware and software to support your sales efforts, you can excel at inside sales without ever leaving your desk.

Belal Batrawy is the founder of learntosell.io and the well-known #deathtofluff hashtag on LinkedIn. He helps sellers learn how to outbound using social psychology tactics not taught in traditional sales training. Belal has been an early sales hire at 7 different startups, including 2 IPOs and 3 acquisitions. He’s been recognized for his thought leadership by Salesforce, Business Insider, LinkedIn, Gartner, Salesloft, and many more. Outside of work, Belal is a father of 3 kids and an avid mountain biker.
