The 25 Most Influential Psychological Experiments in History

While each year thousands and thousands of studies are completed in the many specialty areas of psychology, there are a handful that, over the years, have had a lasting impact on the psychological community as a whole. Some of these were dutifully conducted, keeping within the confines of ethical and practical guidelines. Others pushed the boundaries of human behavior during their psychological experiments and created controversies that still linger to this day. And still others were not designed to be true psychological experiments, but ended up as beacons to the psychological community in proving or disproving theories.

This is a list of the 25 most influential psychological experiments still being taught to psychology students of today.

1. A Class Divided

Study Conducted By: Jane Elliott

Study Conducted in 1968 in an Iowa classroom


Experiment Details: Jane Elliott’s famous experiment was inspired by the assassination of Dr. Martin Luther King Jr. and the inspirational life that he led. The third grade teacher developed an exercise, or better yet, a psychological experiment, to help her Caucasian students understand the effects of racism and prejudice.

Elliott divided her class into two separate groups: blue-eyed students and brown-eyed students. On the first day, she labeled the blue-eyed group as the superior group and from that point forward they had extra privileges, leaving the brown-eyed children to represent the minority group. She discouraged the groups from interacting and singled out individual students to stress the negative characteristics of the children in the minority group. What this exercise showed was that the children’s behavior changed almost instantaneously. The group of blue-eyed students performed better academically and even began bullying their brown-eyed classmates. The brown-eyed group experienced lower self-confidence and worse academic performance. The next day, she reversed the roles of the two groups and the blue-eyed students became the minority group.

At the end of the experiment, the children were so relieved that they were reported to have embraced one another and agreed that people should not be judged based on outward appearances. This exercise has since been repeated many times with similar outcomes.


2. Asch Conformity Study

Study Conducted By: Dr. Solomon Asch

Study Conducted in 1951 at Swarthmore College


Experiment Details: Dr. Solomon Asch conducted a groundbreaking study that was designed to evaluate a person’s likelihood to conform to a standard when there is pressure to do so.

A group of participants was shown pictures with lines of various lengths and then asked a simple question: which line is longest? The tricky part of this study was that in each group only one person was a true participant. The others were actors with a script, and most of the actors were instructed to give the wrong answer. Strikingly, the one true participant frequently went along with the majority, even when they knew they were giving the wrong answer.

The results of this study are important when we study social interactions among individuals in groups. This study is a famous example of the temptation many of us experience to conform to a standard during group situations and it showed that people often care more about being the same as others than they do about being right. It is still recognized as one of the most influential psychological experiments for understanding human behavior.

3. Bobo Doll Experiment

Study Conducted By: Dr. Albert Bandura

Study Conducted Between 1961 and 1963 at Stanford University


Experiment Details: In his groundbreaking study, Dr. Albert Bandura separated participants into three groups:

  • one was exposed to a video of an adult showing aggressive behavior towards a Bobo doll
  • another was exposed to video of a passive adult playing with the Bobo doll
  • the third formed a control group

Children watched their assigned video and then were sent to a room with the same doll they had seen in the video (with the exception of those in the control group). The researchers found that children exposed to the aggressive model were more likely to exhibit aggressive behavior toward the doll themselves, while the other groups showed little imitative aggressive behavior. Among the children exposed to the aggressive model, the average number of imitative physical aggressions was 38.2 for the boys and 12.7 for the girls.

The study also showed that boys exhibited more aggression when exposed to aggressive male models than boys exposed to aggressive female models. When exposed to aggressive male models, the number of aggressive instances exhibited by boys averaged 104. This is compared to 48.4 aggressive instances exhibited by boys who were exposed to aggressive female models.

While the results for the girls showed similar findings, they were less drastic. When exposed to aggressive female models, the number of aggressive instances exhibited by girls averaged 57.7. This is compared to 36.3 aggressive instances exhibited by girls who were exposed to aggressive male models. The results concerning gender differences strongly supported Bandura’s secondary prediction that children will be more strongly influenced by same-sex models. The Bobo Doll Experiment showed a groundbreaking way to study human behavior and its influences.

4. Car Crash Experiment

Study Conducted By: Elizabeth Loftus and John Palmer

Study Conducted in 1974 at the University of California, Irvine


Experiment Details: The participants watched film clips of a car accident and were asked to describe what had happened as if they were eyewitnesses to the scene. The participants were split into groups, and each group was questioned using different wording, such as “How fast was the car driving at the time of impact?” versus “How fast was the car going when it smashed into the other car?” The experimenters found that the use of different verbs affected the participants’ memories of the accident, showing that memory can be easily distorted.

This research suggests that memory can be easily manipulated by questioning technique. This means that information gathered after the event can merge with original memory causing incorrect recall or reconstructive memory. The addition of false details to a memory of an event is now referred to as confabulation. This concept has very important implications for the questions used in police interviews of eyewitnesses.

5. Cognitive Dissonance Experiment

Study Conducted By: Leon Festinger and James Carlsmith

Study Conducted in 1957 at Stanford University

Experiment Details: The concept of cognitive dissonance refers to a situation involving conflicting attitudes, beliefs, or behaviors. This conflict produces an inherent feeling of discomfort, leading to a change in one of the attitudes, beliefs, or behaviors to minimize or eliminate the discomfort and restore balance.

Cognitive dissonance was first investigated by Leon Festinger after an observational study of a cult that believed the earth was going to be destroyed by a flood. Out of this study was born an intriguing experiment conducted by Festinger and Carlsmith in which participants were asked to perform a series of dull tasks (such as turning pegs in a pegboard for an hour). Participants’ initial attitudes toward this task were highly negative.

They were then paid either $1 or $20 to tell a participant waiting in the lobby that the tasks were really interesting. Almost all of the participants agreed to walk into the waiting room and persuade the next participant that the boring experiment would be fun. When the participants were later asked to evaluate the experiment, the participants who were paid only $1 rated the tedious task as more fun and enjoyable than the participants who were paid $20 to lie.

Being paid only $1 is not sufficient incentive for lying and so those who were paid $1 experienced dissonance. They could only overcome that cognitive dissonance by coming to believe that the tasks really were interesting and enjoyable. Being paid $20 provides a reason for turning pegs and there is therefore no dissonance.

6. Fantz’s Looking Chamber

Study Conducted By: Robert L. Fantz

Study Conducted in 1961 at the University of Illinois

Experiment Details: The study conducted by Robert L. Fantz is among the simplest, yet most important, in the field of infant development and vision. In 1961, when this experiment was conducted, there were very few ways to study what was going on in the mind of an infant. Fantz realized that the best way was to simply watch the actions and reactions of infants. He understood the fundamental fact that if there is something of interest near humans, they generally look at it.

To test this concept, Fantz set up a display board with two pictures attached. On one was a bulls-eye. On the other was the sketch of a human face. The board was hung in a chamber where a baby could lie safely underneath and see both images. Then, from behind the board, invisible to the baby, he peeked through a hole to watch what the baby looked at. The study showed that a two-month-old baby looked twice as much at the human face as it did at the bulls-eye. This suggests that human babies have some powers of pattern and form selection. Before this experiment, it was thought that babies looked out onto a chaotic world of which they could make little sense.

7. Hawthorne Effect

Study Conducted By: Henry A. Landsberger

Study Conducted in 1955 at Hawthorne Works in Chicago, Illinois


Experiment Details: Landsberger performed the study by analyzing data from experiments conducted between 1924 and 1932 by Elton Mayo at the Hawthorne Works near Chicago. The company had commissioned studies to evaluate whether the level of light in a building changed the productivity of the workers. What Mayo found was that the level of light made no difference in productivity: the workers increased their output whenever the amount of light was switched from a low level to a high level, or vice versa.

The researchers noticed a tendency that the workers’ level of efficiency increased when any variable was manipulated. The study showed that the output changed simply because the workers were aware that they were under observation. The conclusion was that the workers felt important because they were pleased to be singled out. They increased productivity as a result. Being singled out was the factor dictating increased productivity, not the changing lighting levels, or any of the other factors that they experimented upon.

The Hawthorne Effect has become one of the hardest inbuilt biases to eliminate or factor into the design of any experiment in psychology and beyond.

8. Kitty Genovese Case

Study Conducted By: New York Police Force

Study Conducted in 1964 in New York City

Experiment Details: The murder case of Kitty Genovese was never intended to be a psychological experiment; however, it ended up having serious implications for the field.

According to a New York Times article, almost 40 neighbors witnessed Kitty Genovese being savagely attacked and murdered in Queens, New York in 1964. Not one neighbor called the police for help. Some reports state that the attacker briefly left the scene and later returned to “finish off” his victim. It was later uncovered that many of these facts were exaggerated. (There were more likely only a dozen witnesses and records show that some calls to police were made).

What this case later became famous for is the “Bystander Effect,” which states that the more bystanders present in a social situation, the less likely it is that anyone will step in and help. This effect has led to changes in medicine, psychology, and many other areas. One famous example is the way CPR is taught to new learners. All students in CPR courses learn that they must assign one bystander the job of alerting authorities, which minimizes the chances of no one calling for assistance.

9. Learned Helplessness Experiment

Study Conducted By: Martin Seligman

Study Conducted in 1967 at the University of Pennsylvania


Experiment Details: Seligman’s experiment involved ringing a bell and then administering a light shock to a dog. After a number of pairings, the dog reacted to the bell alone: as soon as the dog heard it, he reacted as though he had already been shocked.

During the course of this study, something unexpected happened. Each dog was placed in a large crate that was divided down the middle by a low fence. The dog could see over the fence and easily jump to the other side. The floor on one side of the fence was electrified, but not on the other side. Seligman placed each dog on the electrified side and administered a light shock. He expected the dog to jump to the non-shocking side of the fence. In an unexpected turn, the dogs simply lay down.

The hypothesis was that as the dogs learned from the first part of the experiment that there was nothing they could do to avoid the shocks, they gave up in the second part of the experiment. To prove this hypothesis the experimenters brought in a new set of animals and found that dogs with no history in the experiment would jump over the fence.

This condition was described as learned helplessness. A human or animal does not attempt to get out of a negative situation because the past has taught them that they are helpless.

10. Little Albert Experiment

Study Conducted By: John B. Watson and Rosalie Rayner

Study Conducted in 1920 at Johns Hopkins University


Experiment Details: The experiment began by placing a white rat in front of the infant, who initially had no fear of the animal. Watson then produced a loud sound by striking a steel bar with a hammer every time little Albert was presented with the rat. After several pairings of the noise and the white rat, the boy began to cry and exhibit signs of fear every time the rat appeared in the room. Watson also created similar conditioned reflexes with other common animals and objects (rabbits, a Santa Claus beard, etc.) until Albert feared them all.

This study demonstrated that classical conditioning works on humans. One of its most important implications is that adult fears are often connected to early childhood experiences.

11. Magical Number Seven

Study Conducted By: George A. Miller

Study Conducted in 1956 at Princeton University

Experiment Details: Frequently referred to as “Miller’s Law,” the Magical Number Seven experiment purports that the number of objects an average human can hold in working memory is 7 ± 2. This means that the human memory capacity typically includes strings of words or concepts ranging from 5 to 9. This information on the limits to the capacity for processing information became one of the most highly cited papers in psychology.

The Magical Number Seven experiment was published in 1956 by cognitive psychologist George A. Miller of Princeton University’s Department of Psychology in Psychological Review. In the article, Miller discussed a concurrence between the limits of one-dimensional absolute judgment and the limits of short-term memory.

In a one-dimensional absolute-judgment task, a person is presented with a number of stimuli that vary on one dimension (such as 10 different tones varying only in pitch). The person responds to each stimulus with a corresponding response (learned before).

Performance is almost perfect up to five or six different stimuli but declines as the number of different stimuli increases. This means that a human’s maximum performance on one-dimensional absolute judgment can be described as an information store with a maximum capacity of approximately 2 to 3 bits of information, which corresponds to the ability to distinguish between four and eight alternatives.

12. Pavlov’s Dog Experiment

Study Conducted By: Ivan Pavlov

Study Conducted in the 1890s at the Military Medical Academy in St. Petersburg, Russia


Experiment Details: Pavlov began with the simple idea that there are some things a dog does not need to learn. He observed that dogs do not learn to salivate when they see food; this reflex is “hard-wired” into the dog. This is an unconditioned response (a stimulus-response connection that requires no learning).

Pavlov outlined that there are unconditioned responses in the animal by presenting a dog with a bowl of food and then measuring its salivary secretions. In the experiment, Pavlov used a bell as his neutral stimulus. Whenever he gave food to his dogs, he also rang a bell. After a number of repeats of this procedure, he tried the bell on its own. What he found was that the bell on its own now caused an increase in salivation. The dog had learned to associate the bell and the food. This learning created a new behavior. The dog salivated when he heard the bell. Because this response was learned (or conditioned), it is called a conditioned response. The neutral stimulus has become a conditioned stimulus.

This theory came to be known as classical conditioning.

13. Robbers Cave Experiment

Study Conducted By: Muzafer and Carolyn Sherif

Study Conducted in 1954 at the University of Oklahoma

Experiment Details: This experiment, which studied group conflict, is considered by most to be outside the lines of what is considered ethically sound.

In 1954, researchers at the University of Oklahoma assigned 22 eleven- and twelve-year-old boys from similar backgrounds into two groups. The two groups were taken to separate areas of a summer camp facility where they were able to bond as social units. The groups were housed in separate cabins, and neither group knew of the other’s existence for an entire week. The boys bonded with their cabin mates during that time.

Once the two groups were allowed to have contact, they showed definite signs of prejudice and hostility toward each other, even though they had only been given a very short time to develop their social group. To increase the conflict between the groups, the experimenters had them compete against each other in a series of activities. This created even more hostility, and eventually the groups refused to eat in the same room.

The final phase of the experiment involved turning the rival groups into friends. The fun activities the experimenters had planned, like shooting firecrackers and watching movies, did not initially work, so they created teamwork exercises in which the two groups were forced to collaborate. At the end of the experiment, the boys decided to ride the same bus home, demonstrating that conflict can be resolved and prejudice overcome through cooperation.

Many critics have compared this study to Golding’s Lord of the Flies novel as a classic example of prejudice and conflict resolution.

14. Ross’ False Consensus Effect Study

Study Conducted By: Lee Ross

Study Conducted in 1977 at Stanford University

Experiment Details: In 1977, a social psychology professor at Stanford University named Lee Ross conducted an experiment that, in lay terms, focuses on how people can incorrectly conclude that others think the same way they do, or form a “false consensus” about the beliefs and preferences of others. Ross conducted the study in order to outline how the “false consensus effect” functions in humans.


In the first part of the study, participants were asked to read about situations in which a conflict occurred and then were told two alternative ways of responding to the situation. They were asked to do three things:

  • Guess which option other people would choose
  • Say which option they themselves would choose
  • Describe the attributes of the person who would likely choose each of the two options

What the study showed was that most of the subjects believed that other people would do the same as them, regardless of which of the two responses they actually chose themselves. This phenomenon is referred to as the false consensus effect, where an individual thinks that other people think the same way they do when they may not. The second observation coming from this important study is that when participants were asked to describe the attributes of the people who will likely make the choice opposite of their own, they made bold and sometimes negative predictions about the personalities of those who did not share their choice.

15. The Schachter and Singer Experiment on Emotion

Study Conducted By: Stanley Schachter and Jerome E. Singer

Study Conducted in 1962 at Columbia University

Experiment Details: In 1962, Schachter and Singer conducted a groundbreaking experiment to test their theory of emotion.

In the study, a group of 184 male participants were injected with epinephrine, a hormone that induces arousal, including increased heartbeat, trembling, and rapid breathing. The research participants were told that they were being injected with a new medication to test their eyesight. The first group of participants was informed of the possible side effects that the injection might cause, while the second group was not. The participants were then placed in a room with someone they thought was another participant but who was actually a confederate in the experiment. The confederate acted in one of two ways: euphoric or angry. Participants who had not been informed about the effects of the injection were more likely to feel either happier or angrier than those who had been informed.

What Schachter and Singer were trying to understand was the way in which cognition, or thoughts, influences human emotion. Their study illustrates the importance of how people interpret their physiological states, which form an important component of their emotions. Though their cognitive theory of emotional arousal dominated the field for two decades, it has been criticized for two main reasons: the size of the effect seen in the experiment was not that significant, and other researchers had difficulty repeating the experiment.

16. Selective Attention / Invisible Gorilla Experiment

Study Conducted By: Daniel Simons and Christopher Chabris

Study Conducted in 1999 at Harvard University

Experiment Details: In 1999 Simons and Chabris conducted their famous awareness test at Harvard University.

Participants in the study were asked to watch a video and count how many passes occurred between basketball players on the white team. The video moved at a moderate pace, and keeping track of the passes was a relatively easy task. What most people failed to notice amidst their counting was that in the middle of the video, a man in a gorilla suit walked onto the court and stood in the center before walking off-screen.

The study found that the majority of the subjects did not notice the gorilla at all, suggesting that humans often overestimate their ability to multi-task effectively. What the study set out to show is that when people are asked to attend to one task, they focus so strongly on that element that they may miss other important details.

17. Stanford Prison Study

Study Conducted By: Philip Zimbardo

Study Conducted in 1971 at Stanford University


Experiment Details: The Stanford Prison Experiment was designed to study the behavior of “normal” individuals when assigned the role of prisoner or guard. College students were recruited to participate and were assigned the role of “guard” or “inmate.” Zimbardo played the role of the warden. The basement of the psychology building served as the prison, and great care was taken to make it look and feel as realistic as possible.

The prison guards were told to run the prison for two weeks and not to physically harm any of the inmates during the study. After a few days, the guards became verbally abusive toward the inmates, and many of the prisoners became submissive to those in authority roles. The experiment had to be cut short because some of the participants displayed troubling signs of breaking down mentally.

Although the experiment was conducted very unethically, many psychologists believe that the findings showed how much human behavior is situational. People will conform to certain roles if the conditions are right. The Stanford Prison Experiment remains one of the most famous psychology experiments of all time.

18. Stanley Milgram Experiment

Study Conducted By: Stanley Milgram

Study Conducted in 1961 at Yale University

Experiment Details: This 1961 study was conducted by Yale University psychologist Stanley Milgram. It was designed to measure people’s willingness to obey authority figures when instructed to perform acts that conflicted with their morals. The study was based on the premise that humans will inherently take direction from authority figures from very early in life.

Participants were told they were participating in a study on memory. They were asked to watch another person (an actor) do a memory test. They were instructed to press a button that gave an electric shock each time the person got a wrong answer. (The actor did not actually receive the shocks, but pretended they did).

Participants were told to play the role of “teacher” and administer electric shocks to “the learner” every time they answered a question incorrectly. The experimenters asked the participants to keep increasing the shocks, and most of them obeyed even though the individual completing the memory test appeared to be in great pain and protested. Despite these protests, many participants continued the experiment when the authority figure urged them to, increasing the voltage after each wrong answer until some eventually administered what would have been lethal electric shocks.

This experiment showed that humans are conditioned to obey authority and will usually do so even if it goes against their natural morals or common sense.

19. Surrogate Mother Experiment

Study Conducted By: Harry Harlow

Study Conducted from 1957-1963 at the University of Wisconsin

Experiment Details: In a series of controversial experiments during the late 1950s and early 1960s, Harry Harlow studied the importance of a mother’s love for healthy childhood development.

In order to do this, he separated infant rhesus monkeys from their mothers a few hours after birth and left them to be raised by two “surrogate mothers.” One of the surrogates was made of wire with an attached bottle for food. The other was made of soft terrycloth but lacked food. The researcher found that the baby monkeys spent much more time with the cloth mother than the wire mother, suggesting that affection plays a greater role than sustenance when it comes to childhood development. They also found that the monkeys that spent more time cuddling the soft mother grew up to be healthier.

This experiment showed that love, as demonstrated by physical body contact, is a more important aspect of the parent-child bond than the provision of basic needs. These findings also had implications in the attachment between fathers and their infants when the mother is the source of nourishment.

20. The Good Samaritan Experiment

Study Conducted By: John Darley and Daniel Batson

Study Conducted in 1973 at The Princeton Theological Seminary (Researchers were from Princeton University)

Experiment Details: In 1973, an experiment was created by John Darley and Daniel Batson, to investigate the potential causes that underlie altruistic behavior. The researchers set out three hypotheses they wanted to test:

  • People thinking about religion and higher principles would be no more inclined to show helping behavior than laymen.
  • People in a rush would be much less likely to show helping behavior.
  • People who are religious for personal gain would be less likely to help than people who are religious because they want to gain some spiritual and personal insights into the meaning of life.

Student participants were given some religious teaching and instruction. They were then told to travel from one building to the next. Between the two buildings was a man lying injured and appearing to be in dire need of assistance. The first variable being tested was the degree of urgency impressed upon the subjects, with some being told not to rush and others being informed that speed was of the essence.

The results of the experiment were intriguing, with the haste of the subject proving to be the overriding factor. When the subject was in no hurry, nearly two-thirds of people stopped to lend assistance. When the subject was in a rush, this dropped to one in ten.

People who were on the way to deliver a speech about helping others were nearly twice as likely to help as those delivering other sermons. This showed that the thoughts of the individual were a factor in determining helping behavior. Religious beliefs did not appear to make much difference in the results; being religious for personal gain, or as part of a spiritual quest, did not appear to make much of an impact on the amount of helping behavior shown.

21. The Halo Effect Experiment

Study Conducted By: Richard E. Nisbett and Timothy DeCamp Wilson

Study Conducted in 1977 at the University of Michigan

Experiment Details: The Halo Effect states that people generally assume that people who are physically attractive are more likely to:

  • be intelligent
  • be friendly
  • display good judgment

To test this, Nisbett and DeCamp Wilson designed a study to demonstrate that people have little awareness of the nature of the Halo Effect. They’re not aware that it influences:

  • their personal judgments
  • the production of a more complex social behavior

In the experiment, college students were the research participants. They were asked to evaluate a psychology instructor as they viewed him in a videotaped interview. The students were randomly assigned to one of two groups, and each group was shown one of two different interviews with the same instructor, a native French-speaking Belgian who spoke English with a noticeable accent. In the first video, the instructor presented himself as someone:

  • respectful of his students’ intelligence and motives
  • flexible in his approach to teaching
  • enthusiastic about his subject matter

In the second interview, he presented himself as much more unlikable. He was cold and distrustful toward the students and was quite rigid in his teaching style.

After watching the videos, the subjects were asked to rate the lecturer on his physical appearance, mannerisms, and accent, all of which were kept the same in both versions of the video. The subjects rated the professor on an 8-point scale ranging from “like extremely” to “dislike extremely.” Subjects were also told that the researchers were interested in knowing “how much their liking for the teacher influenced the ratings they just made.” Other subjects were asked to identify how much the characteristics they had just rated influenced their liking of the teacher.

After responding to the questionnaire, the respondents were puzzled about their reactions to the videotapes and to the questionnaire items. The students had no idea why they gave one lecturer higher ratings. Most said that how much they liked the lecturer had not affected their evaluation of his individual characteristics at all.

The interesting thing about this study is that people can understand the phenomenon, yet remain unaware when it is occurring. Without realizing it, humans let an overall impression color their judgments of specific traits, and even when this is pointed out, they may still deny that it is a product of the halo effect.

22. The Marshmallow Test

Study Conducted By: Walter Mischel

Study Conducted in 1972 at Stanford University


Experiment Details: In his 1972 Marshmallow Experiment, children ages four to six were taken into a room where a marshmallow was placed on a table in front of them. Before leaving each child alone in the room, the experimenter explained that the child would receive a second marshmallow if the first one was still on the table when the experimenter returned 15 minutes later. The examiner recorded how long each child resisted eating the marshmallow and later examined whether this correlated with the child’s success in adulthood. A small number of the 600 children ate the marshmallow immediately, and one-third delayed gratification long enough to receive the second marshmallow.

In follow-up studies, Mischel found that those who deferred gratification were significantly more competent and received higher SAT scores than their peers. This characteristic likely remains with a person for life. While this study seems simplistic, the findings outline some of the foundational differences in individual traits that can predict success.

23. The Monster Study

Study Conducted By: Wendell Johnson

Study Conducted in 1939 at the University of Iowa

Experiment Details: The Monster Study received this negative title due to the unethical methods that were used to determine the effects of positive and negative speech therapy on children.

Wendell Johnson of the University of Iowa selected 22 orphaned children, some with stutters and some without. The children were divided into two groups. The group of children with stutters was placed in positive speech therapy, where they were praised for their fluency. The non-stutterers were placed in negative speech therapy, where they were disparaged for every mistake in grammar that they made.

As a result of the experiment, some of the children who received negative speech therapy suffered psychological effects and retained speech problems for the rest of their lives. The study became an example of the significance of positive reinforcement in education.

The initial goal of the study was to investigate positive and negative speech therapy. However, its implications reached much further, into teaching methods for young children.

24. Violinist at the Metro Experiment

Study Conducted By: Staff at The Washington Post

Study Conducted in 2007 at a Washington D.C. Metro Train Station


Experiment Details: During the study, pedestrians rushed by without realizing that the musician playing at the entrance to the metro stop was Grammy-winning violinist Joshua Bell. Two days before playing in the subway, he had sold out a theater in Boston where seats averaged $100. He played one of the most intricate pieces ever written on a violin worth 3.5 million dollars. In the 45 minutes the musician played, only 6 people stopped and stayed for a while. Around 20 gave him money but continued to walk at their normal pace. He collected $32.

The study and the subsequent article organized by The Washington Post were part of a social experiment looking at the priorities of people and whether they notice beauty in a commonplace setting.

Gene Weingarten wrote about the social experiment: “In a banal setting at an inconvenient time, would beauty transcend?” Later he won a Pulitzer Prize for his story. Some of the questions the article addresses are:

  • Do we perceive beauty?
  • Do we stop to appreciate it?
  • Do we recognize talent in an unexpected context?

As it turns out, many of us are not nearly as perceptive to our environment as we might like to think.

25. Visual Cliff Experiment

Study Conducted By: Eleanor Gibson and Richard Walk

Study Conducted in 1959 at Cornell University

Experiment Details: In 1959, psychologists Eleanor Gibson and Richard Walk set out to study depth perception in infants. They wanted to know if depth perception is a learned behavior or if it is something that we are born with. To study this, Gibson and Walk conducted the visual cliff experiment.

They studied 36 infants between the ages of six and 14 months, all of whom could crawl. The infants were placed one at a time on a visual cliff. A visual cliff was created using a large glass table that was raised about a foot off the floor. Half of the glass table had a checker pattern underneath in order to create the appearance of a ‘shallow side.’

In order to create a ‘deep side,’ a checker pattern was created on the floor; this side is the visual cliff. The placement of the checker pattern on the floor creates the illusion of a sudden drop-off. Researchers placed a foot-wide centerboard between the shallow side and the deep side. Gibson and Walk found the following:

  • Nine of the infants did not move off the centerboard.
  • All of the 27 infants who did move crossed into the shallow side when their mothers called them from the shallow side.
  • Three of the infants crawled off the visual cliff toward their mother when called from the deep side.
  • When called from the deep side, the remaining 24 children either crawled to the shallow side or cried because they could not cross the visual cliff and make it to their mother.

What this study helped demonstrate is that depth perception is likely an inborn trait in humans.

Among these experiments and psychological tests, we see boundaries pushed and theories taking on a life of their own. It is through the endless stream of psychological experimentation that we can see simple hypotheses become guiding theories for those in this field. The greater field of psychology became a formal field of experimental study in 1879, when Wilhelm Wundt established the first laboratory dedicated solely to psychological research in Leipzig, Germany. Wundt was the first person to refer to himself as a psychologist. Since 1879, psychology has grown into a massive collection of methods of practice, and it has also become a specialty area in the field of healthcare. None of this would have been possible without these and many other important psychological experiments that have stood the test of time.


About the Author

After earning a Bachelor of Arts in Psychology from Rutgers University and then a Master of Science in Clinical and Forensic Psychology from Drexel University, Kristen began a career as a therapist at two prisons in Philadelphia. At the same time she volunteered as a rape crisis counselor, also in Philadelphia. After a few years in the field she accepted a teaching position at a local college where she currently teaches online psychology courses. Kristen began writing in college and still enjoys her work as a writer, editor, professor and mother.



An Introduction to t Tests | Definitions, Formula and Examples

Published on January 31, 2020 by Rebecca Bevans. Revised on June 22, 2023.

A t test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another.

  • The null hypothesis (H0) is that the true difference between these group means is zero.
  • The alternate hypothesis (Ha) is that the true difference is different from zero.

Table of contents

  • When to use a t test
  • What type of t test should I use?
  • Performing a t test
  • Interpreting test results
  • Presenting the results of a t test
  • Frequently asked questions about t tests

A t test can only be used when comparing the means of two groups (a.k.a. pairwise comparison). If you want to compare more than two groups, or if you want to do multiple pairwise comparisons, use an ANOVA test or a post-hoc test.

The t test is a parametric test of difference, meaning that it makes the same assumptions about your data as other parametric tests. The t test assumes your data:

  • are independent
  • are (approximately) normally distributed
  • have a similar amount of variance within each group being compared (a.k.a. homogeneity of variance)

If your data do not fit these assumptions, you can try a nonparametric alternative to the t test, such as the Wilcoxon Signed-Rank test for data with unequal variances .
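As a rough, hypothetical sketch of how such a nonparametric comparison might be run in R (the measurements below are invented placeholders, not data from this article):

    # Hypothetical measurements for two independent groups
    group_a <- c(4.1, 3.8, 5.2, 4.7, 4.4)
    group_b <- c(5.9, 6.1, 5.4, 6.8, 6.0)

    # wilcox.test() runs the rank-sum (Mann-Whitney) comparison for two
    # independent groups; with paired = TRUE it runs the signed-rank test
    # for matched pairs instead.
    wilcox.test(group_a, group_b)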


When choosing a t test, you will need to consider two things: whether the groups being compared come from a single population or two different populations, and whether you want to test the difference in a specific direction.


One-sample, two-sample, or paired t test?

  • If the groups come from a single population (e.g., measuring before and after an experimental treatment), perform a paired t test . This is a within-subjects design .
  • If the groups come from two different populations (e.g., two different species, or people from two separate cities), perform a two-sample t test (a.k.a. independent t test ). This is a between-subjects design .
  • If there is one group being compared against a standard value (e.g., comparing the acidity of a liquid to a neutral pH of 7), perform a one-sample t test .

One-tailed or two-tailed t test?

  • If you only care whether the two populations are different from one another, perform a two-tailed t test .
  • If you want to know whether one population mean is greater than or less than the other, perform a one-tailed t test.
For example, in the comparison of flower petal lengths used later in this article, the observations come from two separate populations (separate species), so you would perform a two-sample t test; and because you only care whether there is a difference, not its direction, you would choose a two-tailed test.
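As a rough illustration of how these choices map onto R’s t.test() function (the data below are invented placeholders, not from any real study):

    # Hypothetical data for illustration only
    before <- c(98, 102, 95, 110, 104)
    after  <- c(101, 107, 99, 113, 108)
    value  <- c(4.1, 3.8, 5.2, 4.7, 4.4, 5.9, 6.1, 5.4, 6.8, 6.0)
    group  <- factor(rep(c("A", "B"), each = 5))
    ph     <- c(6.8, 7.2, 6.9, 7.1, 6.7)

    t.test(after, before, paired = TRUE)        # paired: same subjects measured twice
    t.test(value ~ group)                       # two-sample: two independent groups
    t.test(ph, mu = 7)                          # one-sample: compare against a standard value
    t.test(value ~ group, alternative = "less") # one-tailed: is group A's mean less than group B's?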

The t test estimates the true difference between two group means using the ratio of the difference in group means over the pooled standard error of both groups. You can calculate it manually using a formula, or use statistical analysis software.

T test formula

The formula for the two-sample t test (a.k.a. the Student’s t-test) is shown below.

\begin{equation*}t=\dfrac{\bar{x}_{1}-\bar{x}_{2}}{\sqrt{s^{2}\left(\dfrac{1}{n_{1}}+\dfrac{1}{n_{2}}\right)}}\end{equation*}

In this formula, t is the t value, x̄₁ and x̄₂ are the means of the two groups being compared, s² is the pooled standard error of the two groups, and n₁ and n₂ are the number of observations in each group.

A larger t value shows that the difference between group means is greater than the pooled standard error, indicating a more significant difference between the groups.

You can compare your calculated t value against the values in a critical value chart (e.g., Student’s t table) to determine whether your t value is greater than what would be expected by chance. If so, you can reject the null hypothesis and conclude that the two groups are in fact different.
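As a sketch of that calculation in R, using made-up observations (not the flower data used later in this article):

    # Two hypothetical groups of observations
    x1 <- c(1.4, 1.5, 1.3, 1.6, 1.4)
    x2 <- c(4.2, 4.5, 4.3, 4.6, 4.4)
    n1 <- length(x1); n2 <- length(x2)

    # Pooled variance, then the t statistic from the formula above
    s2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
    t_value <- (mean(x1) - mean(x2)) / sqrt(s2 * (1 / n1 + 1 / n2))

    # Critical value for a two-tailed test at alpha = 0.05
    t_crit <- qt(0.975, df = n1 + n2 - 2)
    abs(t_value) > t_crit   # TRUE means the difference is larger than expected by chance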

T test function in statistical software

Most statistical software (R, SPSS, etc.) includes a t test function. This built-in function will take your raw data and calculate the t value. It will then compare it to the critical value, and calculate a p -value . This way you can quickly see whether your groups are statistically different.

In your comparison of flower petal lengths, you decide to perform your t test using R.
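A minimal sketch of what such a call typically looks like, assuming the flower.data data frame (with Species and Petal.Length columns) that appears in the summary code further below:

    # Compare mean petal length between the two species in flower.data;
    # by default, t.test() runs Welch's two-sample t test.
    t.test(Petal.Length ~ Species, data = flower.data)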


If you perform the t test for your flower hypothesis in R, you will receive the following output:

T-test output in R

The output provides:

  • An explanation of what is being compared, called data in the output table.
  • The t value: -33.719. Note that it’s negative; this is fine! In most cases, we only care about the absolute value of the difference, or the distance from 0. It doesn’t matter which direction.
  • The degrees of freedom: 30.196. Degrees of freedom is related to your sample size, and shows how many ‘free’ data points are available in your test for making comparisons. The greater the degrees of freedom, the better your statistical test will work.
  • The p value: 2.2e-16 (that is, 2.2 × 10⁻¹⁶, an extremely small number). This describes the probability that you would see a t value as large as this one by chance.
  • A statement of the alternative hypothesis (Ha). In this test, the Ha is that the difference is not 0.
  • The 95% confidence interval. This is the range of numbers within which the true difference in means will be 95% of the time. This can be changed from 95% if you want a larger or smaller interval, but 95% is very commonly used.
  • The mean petal length for each group.


When reporting your t test results, the most important values to include are the t value, the p value, and the degrees of freedom for the test. These will communicate to your audience whether the difference between the two groups is statistically significant (a.k.a. that it is unlikely to have happened by chance).

You can also include the summary statistics for the groups being compared, namely the mean and standard deviation. In R, the code for calculating the mean and the standard deviation from the data looks like this:

flower.data %>% group_by(Species) %>% summarize(mean_length = mean(Petal.Length), sd_length = sd(Petal.Length))

In our example, you would report the t value, the degrees of freedom, and the p value from the output above, along with the means and standard deviations for the two groups.

Frequently asked questions about t tests

A t-test is a statistical test that compares the means of two samples . It is used in hypothesis testing , with a null hypothesis that the difference in group means is zero and an alternate hypothesis that the difference in group means is different from zero.

A t-test measures the difference in group means divided by the pooled standard error of the two group means.

In this way, it calculates a number (the t-value) illustrating the magnitude of the difference between the two group means being compared, and estimates the likelihood that this difference exists purely by chance (p-value).

Your choice of t-test depends on whether you are studying one group or two groups, and whether you care about the direction of the difference in group means.

If you are studying one group, use a paired t-test to compare the group mean over time or after an intervention, or use a one-sample t-test to compare the group mean to a standard value. If you are studying two groups, use a two-sample t-test .

If you want to know only whether a difference exists, use a two-tailed test . If you want to know if one group mean is greater or less than the other, use a left-tailed or right-tailed one-tailed test .

A one-sample t-test is used to compare a single population to a standard value (for example, to determine whether the average lifespan of a specific town is different from the country average).

A paired t-test is used to compare a single population before and after some experimental intervention or at two different points in time (for example, measuring student performance on a test before and after being taught the material).

A t-test should not be used to measure differences among more than two groups, because the error structure for a t-test will underestimate the actual error when many groups are being compared.

If you want to compare the means of several groups at once, it’s best to use another statistical test such as ANOVA or a post-hoc test.
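For instance, a quick sketch in R of comparing three groups at once with ANOVA rather than repeated t tests (the data are invented placeholders):

    # Hypothetical heights of plants grown under three fertilizers
    growth <- data.frame(
      height     = c(12, 14, 13, 18, 19, 17, 22, 24, 23),
      fertilizer = factor(rep(c("A", "B", "C"), each = 3))
    )

    fit <- aov(height ~ fertilizer, data = growth)
    summary(fit)    # overall test: do any group means differ?
    TukeyHSD(fit)   # post-hoc test: which pairs of groups differ?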



Statistics By Jim

Making statistics intuitive

T Test Overview: How to Use & Examples

By Jim Frost

What is a T Test?

A t test is a statistical hypothesis test that assesses sample means to draw conclusions about population means. Frequently, analysts use a t test to determine whether the population means for two groups are different. For example, it can determine whether the difference between the treatment and control group means is statistically significant.


The following are the standard t tests:

  • One-sample: Compares a sample mean to a reference value.
  • Two-sample: Compares two sample means.
  • Paired: Compares the means of matched pairs, such as before and after scores.

In this post, you’ll learn about the different types of t tests, when you should use each one, and their assumptions. Additionally, I interpret an example of each type.

Which T Test Should I Use?

To choose the correct t test, you must know whether you are assessing one or two group means. If you’re working with two group means, do the groups have the same or different items/people? Use the table below to choose the proper analysis.

Number of group means | Group composition | Appropriate test
One | (compared to a reference value) | One-sample t test
Two | Different items in each group | Two-sample t test
Two | Same items in both groups | Paired t test

Now, let’s review each t test to see what it can do!

Imagine we’ve developed a drug that supposedly boosts your IQ score. In the following sections, we’ll address the same research question, and I’ll show you how the various t tests can help you answer it.

One Sample T Test

Use a one-sample t test to compare a sample mean to a reference value. It allows you to determine whether the population mean differs from the reference value. The reference value is usually highly relevant to the subject area.

For example, a coffee shop claims their large cup contains 16 ounces. A skeptical customer takes a random sample of 10 large cups of coffee and measures their contents to determine if the mean volume differs from the claimed 16 ounces using a one-sample t test.

One-Sample T Test Hypotheses

  • Null hypothesis (H0): The population mean equals the reference value (µ = µ0).
  • Alternative hypothesis (HA): The population mean DOES NOT equal the reference value (µ ≠ µ0).

Reject the null when the p-value is less than the significance level (e.g., 0.05). This condition indicates the difference between the sample mean and the reference value is statistically significant. Your sample data support the idea that the population mean does not equal the reference value.

Learn more about the One-Sample T-Test .

The above hypotheses are two-sided analyses. Alternatively, you can use one-sided hypotheses to find effects in only one direction. Learn more in my article, One- and Two-Tailed Hypothesis Tests Explained .

Related posts : Null Hypothesis: Definition, Rejecting & Examples and Understanding Significance Levels

We want to evaluate our IQ boosting drug using a one-sample t test. First, we draw a single random sample of 15 participants and administer the medicine to all of them. Then we measure all their IQs and calculate a sample average IQ of 109.

In the general population, the average IQ is defined as 100 . So, we’ll use 100 as our reference value. Is the difference between our sample mean of 109 and the reference value of 100 statistically significant? The t test output is below.

Statistical output for a one-sample t test.

In the output, we see that our sample mean is 109. The procedure compares the sample mean to the reference value of 100 and produces a p-value of 0.036. Consequently, we can reject the null hypothesis and conclude that the population mean for those who take the IQ drug is higher than 100.
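For readers who want to try the same kind of comparison themselves, a rough sketch in R (the scores below are simulated placeholders, not the data behind the output above):

    # Simulate 15 hypothetical IQ scores for participants who took the drug
    set.seed(3)
    iq <- round(rnorm(15, mean = 109, sd = 15))

    # Compare the sample mean against the reference value of 100
    t.test(iq, mu = 100)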

Two-Sample T Test

Use a two-sample t test to compare the sample means for two groups. It allows you to determine whether the population means for these two groups are different. For the two-sample procedure, the groups must contain different sets of items or people.

For example, you might compare averages between males and females or treatment and controls.

Two-Sample T Test Hypotheses

  • Null hypothesis (H0): Two population means are equal (µ1 = µ2).
  • Alternative hypothesis (HA): Two population means are not equal (µ1 ≠ µ2).

Again, when the p-value is less than or equal to your significance level, reject the null hypothesis. The difference between the two means is statistically significant. Your sample data support the theory that the two population means are different. Learn more about the Null Hypothesis: Definition, Rejecting & Examples .

Learn more about the two-sample t test .

Related posts : How to Interpret P Values and Statistical Significance

For our IQ drug, we collect two random samples, a control group and a treatment group. Each group has 15 subjects. We give the treatment group the medication and a placebo to the control group.

We’ll use a two-sample t test to evaluate if the difference between the two group means is statistically significant. The t test output is below.

Statistical output for a two-sample t test.

In the output, you can see that the treatment group (Sample 1) has a mean of 109 while the control group’s (Sample 2) average is 100. The p-value for the difference between the groups is 0.112. We fail to reject the null hypothesis. There is insufficient evidence to conclude that the IQ drug has an effect .
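A comparable sketch of the two-sample setup in R, again with simulated placeholder scores rather than the data behind the output above:

    # Simulate hypothetical IQ scores for the two groups
    set.seed(1)
    treatment <- rnorm(15, mean = 109, sd = 15)  # treatment-group IQs
    control   <- rnorm(15, mean = 100, sd = 15)  # control-group IQs

    # Welch two-sample t test (the default in R)
    t.test(treatment, control)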

Paired Sample T Test

Use a paired t-test when you measure each subject twice, such as before and after test scores. This procedure determines if the mean difference between paired scores differs from zero, where zero represents no effect. Because researchers measure each item in both conditions, the subjects serve as their own controls.

For example, a pharmaceutical company develops a new drug to reduce blood pressure. They measure the blood pressure of 20 patients before and after administering the medication for one month. Analysts use a paired t-test to assess whether there is a statistically significant difference in pressure measurements before and after taking the drug.

Paired T Test Hypotheses

  • Null hypothesis: The mean difference between pairs equals zero in the population (µD = 0).
  • Alternative hypothesis: The mean difference between pairs does not equal zero in the population (µD ≠ 0).

Reject the null when the p-value is less than or equal to your significance level (e.g., 0.05). Your sample provides sufficiently strong evidence to conclude that the mean difference between pairs does not equal zero in the population.

Learn more about the paired t test.

Back to our IQ boosting drug. This time, we’ll draw one random sample of 15 participants. We’ll measure their IQ before taking the medicine and then again afterward. The before and after groups contain the same people. The procedure subtracts each participant's Before score from their After score to calculate the individual differences. Then it calculates the average difference.

If the drug increases IQs effectively, we should see a positive difference value. Conversely, a value near zero indicates that the IQ scores didn’t improve between the Before and After scores. The paired t test will determine whether the difference between the pre-test and post-test is statistically significant.

The t test output is below.

Statistical output for a paired t test.

The mean difference between the pre-test and post-test scores is 9 IQ points. In other words, the average IQ increased by 9 points between the before and after measurements. The p-value of 0.000 causes us to reject the null. We conclude that the difference between the pre-test and post-test population means does not equal zero. The drug appears to increase IQs by an average of 9 IQ points in the population.
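And a sketch of the paired version in R, with simulated before and after scores standing in for the real data:

```r
set.seed(2)
before <- rnorm(15, mean = 100, sd = 15)        # simulated pre-test IQs
after  <- before + rnorm(15, mean = 9, sd = 5)  # simulated post-test IQs, roughly 9 points higher

# Paired t test on the individual After - Before differences
t.test(after, before, paired = TRUE)
```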

T Test Assumptions

For your t test to produce reliable results, your data should meet the following assumptions:

You have a random sample

Drawing a random sample from your target population helps ensure it represents the population. Representative samples are crucial for accurately inferring population properties. The t test results are invalid if your data do not reflect the population.

Related posts : Random Sampling and Representative Samples

Continuous data

A t test requires continuous data . Continuous variables can take on all numeric values, and the scale can be divided meaningfully into smaller increments, such as fractional and decimal values. For example, weight, height, and temperature are continuous.

Other analyses can assess additional data types. For more information, read Comparing Hypothesis Tests for Continuous, Binary, and Count Data .

Your sample data follow a normal distribution, or you have a large sample size

A t test assumes your data follow a normal distribution . However, due to the central limit theorem, you can waive this assumption when your sample is large enough.

The following sample size guidelines specify when normality becomes less of a restriction:

  • One-Sample and Paired : 20 or more observations.
  • Two-Sample : At least 15 in each group.

Related posts : Central Limit Theorem and Skewed Distributions

Population standard deviation is unknown

A t test assumes you have a sample estimate of the standard deviation. In other words, you don’t know the precise value of the population standard deviation. This assumption is almost always true. If you do know the population standard deviation, use the Z test instead, although when n > 30 the difference between the t and Z tests becomes trivial.

Learn more about the Z test .

Related post : Standard Deviations


Reader Interactions


April 16, 2024 at 5:00 pm

Hello Jim, and thank you on behalf of the thousands you have helped.

Question about which t test to use:

20 members of a committee are asked to interview and rate two candidates for a position – one candidate on Monday, the other candidate on Tuesday. So, one group of 20 committee members interviews 2 separate candidates, one day after the other, on the same variables. Would this scenario use a paired or independent application? Thank you, js


April 16, 2024 at 8:37 pm

This would be a case where you’d potentially use a paired t-test . You’re determining whether there’s a significant difference between the two candidates as given by the same 20 committee members. The two observations are paired because it’s the same 20 members giving the two ratings.

The only wrinkle in that, which is why I say “potentially use,” is that ratings are often ordinal. If you have ordinal rankings, you might need to use a nonparametric test.


April 11, 2024 at 11:25 pm

Question about determining tails: when determining the p-values, this is what I am told: “You draw a t curve and plot the t value on the horizontal axis, then you check the sign in Ha; if it is >, as in our case, you shade the right-hand side (if Ha has a < sign, then shade the left-hand side). II) Determine if the shaded side is a tail or not (the smaller side is called a tail); if it is, P = sig/2; if it is not a tail, then P = 1 - (sig/2).” When emailing the instructor, this is all I was told: for the p of the t test, if the shaded area according to your Ha is small, it is a tail (which is half of the two tails); if it is large, then it is 1 minus a tail.

So, when determining the p of the t test, how do I know whether to compute 1 - (p/2) or just p/2?

We use the software SPSS so P=sig in the instructions.

April 12, 2024 at 12:04 am

From your description, I can’t tell what you’re saying.

Tails are just the thin, extreme parts of the distribution. In this hypothesis testing context, shaded areas are called critical regions or rejection regions. You need to determine whether your t-value (or other test statistic) falls within a critical region. If it does, your results are significant and you reject the null. However, that process doesn’t tell you the p-value. I think you’re mixing two different things. Here are a couple of posts I’ve written that will clarify the issues you asked about.

  • Finding the P-value
  • One and Two Tailed Hypothesis Tests Explained


January 10, 2024 at 3:08 pm

Happy New Year!

I have a few questions I was hoping you’d be able to help me with please?

In the case of a t-test, I know one assumption is that the DV should be the scale variable and the IV should be the categorical variable. I wondered if it mattered whether it was the other way around – so the scale variable was the IV and the categorical variable the DV. Would it make much difference? When I’ve done a t-test like this before, it doesn’t seem to, but I may be missing something.

Would it be better to recode the scale variable to a categorical variable and do a chi-square test?

Or does it just depend on what I am aiming to do. So whether I want to examine relationships or compare means?

Any advice would be appreciated.

January 10, 2024 at 5:34 pm

Hi Charlotte

Yes, you can do that in the opposite direction but you’ll need to use a different analysis.

If you have two groups based on a categorical variable and a continuous variable, you have a couple of choices:

You can use the 2-sample t-test as you suggest to determine whether the group means are different.

Or, you can use something like binary logistic regression to use the continuous variable to predict the outcome of the binary variable.

Typically, you’ll choose the one that makes the most sense for your subject area. If you think group assignment affects the mean outcome, use the t-test. However, if you think the continuous value of a variable predicts the outcome of the binary variable, use binary logistic regression.

I hope that helps!


October 11, 2023 at 5:40 am

Jim, When the input variable is continuous (such as speed) and the output variable is categorical (pass/ fail) I know that logistic regression should be done. However can a standard 2-sample t-test be done to determine if the mean input level is independent of result (pass or fail)? Can a standard deviations test also be done to determine if the spread on values for the input variable is independent of result?


October 6, 2023 at 5:23 am

This was really helpful. After reading it, conducting a T test analysis is almost like a walk in the park. Thanks!

October 6, 2023 at 6:41 pm

Thanks so much, Mark!


September 8, 2023 at 2:14 am

Thank you for your awesome work.


September 7, 2023 at 2:03 am

Your explanation is comprehensive even to non-statisticians

September 7, 2023 at 6:57 pm

Thanks so much, Daniel. So glad my blog post could help!


Fundamentals of Quantitative Analysis

Experiments where you compare results from two conditions or two groups are very common within Psychology, as often we want to know if there is an effect of a given variable. One of the really confusing things about research design, however, is that there are many names for the same type of design. To clarify:

  • One-sample designs are used to study one group of people against a known norm or criterion - for example, comparing the mean IQ of a sample against a known population norm such as an IQ of 100.
  • Independent-samples and between-subjects designs mean the same thing - different participants in different conditions.
  • In contrast, within-subjects, dependent-samples, paired-samples, and repeated-measures designs all tend to mean the same participants in all conditions.
  • Matched-pairs designs have different people in different conditions, but you have matched participants across the conditions so that they are effectively the same person (e.g. on age, IQ, socioeconomic status, etc.).
  • Mixed designs combine within-subjects and between-subjects factors in the one experiment. For example, say you are looking at attractiveness and dominance of male and female faces. Everyone might see both male and female faces (within), but half of the participants rate attractiveness and half rate dominance (between).

To get a better understanding of how some of these tests run we will look at running an example of a between-subjects t-test and a within-subjects t-test through a series of activities. Remember that the solutions are at the bottom of the page if you are stuck, and please do ask questions on the forums.

10.1 Between-Subjects t-tests (two-sample)

We will begin by looking at the between-subjects t-test, which is used for comparing the outcome in two groups of different people. Here we will be using data from Schroeder and Epley (2015) on the perception of people from their job applications. You can take a look at the Psychological Science article, Schroeder, J. and Epley, N. (2015). The sound of intellect: Speech reveals a thoughtful mind, increasing a job candidate's appeal. Psychological Science, 26, 877-891, if you like, but it is not essential for completing the activities. The abstract from this article explains more about the different experiments conducted, and we will be specifically looking at the data set from Experiment 4, based on information from the Open Stats Lab. The abstract reads:

A person's mental capacities, such as intellect, cannot be observed directly and so are instead inferred from indirect cues. We predicted that a person's intellect would be conveyed most strongly through a cue closely tied to actual thinking: his or her voice. Hypothetical employers (Experiments 1-3b) and professional recruiters (Experiment 4) watched, listened to, or read job candidates' pitches about why they should be hired. These evaluators (the employers) rated a candidate as more competent, thoughtful, and intelligent when they heard a pitch rather than read it and, as a result, had a more favourable impression of the candidate and were more interested in hiring the candidate. Adding voice to written pitches, by having trained actors (Experiment 3a) or untrained adults (Experiment 3b) read them, produced the same results. Adding visual cues to audio pitches did not alter evaluations of the candidates. For conveying one's intellect, it is important that one's voice, quite literally, be heard.

To summarise, 39 professional recruiters from Fortune 500 companies evaluated job pitches of M.B.A. candidates from the University of Chicago Booth School of Business. The methods and results appear on pages 887-889 of the article if you want to look at them specifically for more details and the original data, in wide format, can be found at the Open Stats Lab website for later self-directed learning. Today however, we will be working with a modified version in "tidy" format which can be downloaded below and what we plan to do is reproduce the results from the article on Pg 887.

10.1.1 Data and Descriptives

As always, the first activity is about getting ourselves ready to analyse the data so try out the steps and if you need help, consult the earlier chapters.

10.1.1.1 Activity 1: Set-up

  • If you're using the Rserver, avoid a number of issues by restarting the session - click Session - Restart R
  • Open a new R Markdown document and save it in your working directory. Call the file "ttests".
  • If you prefer you can download the data in a zip folder by clicking here
  • Remember not to change the file names at all and that data.csv is not the same as data (1).csv .
  • Load the packages in this order, Hmisc , broom , car , effectsize , report , and tidyverse
  • again we have not used some of these packages so you will likely need to install some of them using install.packages() . Remember though that you should only do this on your own machine and only in the console window. If you are using the RServer you will not need to install them.
  • Finally, load the data held in evaluators.csv as a tibble into an object named evaluators using read_csv() .
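If you are unsure what that set-up chunk should contain, a sketch is below; it assumes evaluators.csv sits in your working directory.

```r
# Load the packages in the stated order
library(Hmisc)
library(broom)
library(car)
library(effectsize)
library(report)
library(tidyverse)

# Read the evaluator data into a tibble
evaluators <- read_csv("evaluators.csv")
```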

Remember to have a look at your data to help you understand the structure and the layout of the data. You can do this in whatever way you prefer.

Now that we have our data, and have explored it, there are a few things we can do to make working with it a bit easier. If you look at the data, and in particular the sex column, you will see it is actually coded as numeric, but we will want to treat it as categorical. Secondly, it can be tricky to work with 1s and 2s when you mean people, so we can "recode" the variables into labels that are easier to work with. That is what we will do here using a combination of mutate(), which we already know, the recode() function from the dplyr package that is loaded in as part of the tidyverse, and the as.factor() function from base. Converting categorical data to factors will make it easier to work with in visualisations and analysis.

10.1.1.2 Activity 2: Explore the dataset

In a new code chunk, copy the code below and see if you can follow it.

  • Be careful using recode() as there are multiple functions in different packages called with the same name so it is better to use the package::function() approach and specify dplyr::recode() to get the right one.
  • Then we use mutate() and as.factor() to overwrite sex_labels and condition as factors.
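A sketch of that chunk is below. Note that which number codes which sex is an assumption here (we have guessed 1 = male and 2 = female); check the data documentation for the real coding.

```r
evaluators <- evaluators %>%
  mutate(
    # recode the numeric sex codes into labels (1 = male, 2 = female is a guess)
    sex_labels = dplyr::recode(sex, `1` = "male", `2` = "female"),
    # overwrite sex_labels and condition as factors
    sex_labels = as.factor(sex_labels),
    condition  = as.factor(condition)
  )
```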

Now see if you can create a count of the different sex labels to answer the following question. One approach would be group_by() %>% count() but what would you group by? Maybe store this tibble in an object called eval_counts .

  • How many participants were noted as being female:
  • How many participants were noted as being male:
  • How many data points are missing for sex ?

10.1.1.3 Activity 3: Ratings

Excellent work. Our evaluator data is ready to work with, and we are now going to calculate what is called an "overall intellect rating" given by each evaluator, calculated by averaging the ratings of competent, thoughtful and intelligent from each evaluator, held within ratings.csv. This overall rating will measure how intellectual the evaluators thought candidates were, depending on whether the evaluators read or listened to the candidates' resume pitches. Note, however, we are not looking at ratings of individual candidates; we are looking at overall ratings for each evaluator. This is a bit confusing but makes sense if you stop to think about it a little. What we are interested in is how the medium through which they received the resume pitch impacted their rating of the candidate. Once we have done that, we will then combine the overall intellect rating with the overall impression ratings and overall hire ratings for each evaluator, with the end goal of having a tibble called ratings2 - which has the following structure:

eval_id Category Rating condition sex_labels
1 hire 6.000 listened female
1 impression 7.000 listened female
1 intellect 6.000 listened female
2 hire 4.000 listened female
2 impression 4.667 listened female
2 intellect 5.667 listened female

The following steps describe how to create the above tibble, and it would be good practice to try this out yourself. Look at the table and think: what do I need? The trick when doing data analysis and data wrangling is to first think about what you want to achieve - the end goal - and then think about what functions you need to use to get there. The solution is hidden just below the steps, of course, if you want to look at it. Let's look at the steps. Steps 1, 2 and 3 calculate the new overall intellect rating. Steps 4 and 5 combine this rating with all the other information.

Load the data found in ratings.csv as a tibble into an object called ratings . (e.g. read the csv)

filter() only the relevant variables ( thoughtful , competent , intelligent ) into a new tibble stored in an object called something useful (we will call ours iratings ), and then calculate a mean Rating for each evaluator (e.g. group_by & summarise).

Add on a new column called Category where every entry is the word intellect . This tells us that every number in this tibble is an intellect rating. (e.g. mutate)

Now create a new tibble called ratings2 and filter into it just the "impression" and "hire" ratings from the original ratings tibble.

Next, bind this tibble with the tibble you created in step 3 to bring together the intellect, impression, and hire ratings, in ratings2 . (e.g. bind_rows(object1, object2) )

Join ratings2 with the evaluator tibble that we created in Task 1 (e.g. inner_join() ). Keep only the necessary columns as shown above (e.g. select() ) and arrange by Evaluator and Category (e.g. arrange() ).

  • Finally, calculate the n, mean and SD for each condition and category to help with reporting the descriptive statistics.
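One possible version of those steps is sketched below. The column names (eval_id, Category, Rating) are inferred from the target tibble shown above, and the join key eval_id is an assumption; adjust them to match your own files.

```r
# Step 1: read in the ratings data
ratings <- read_csv("ratings.csv")

# Steps 2 and 3: mean intellect rating per evaluator, labelled as "intellect"
iratings <- ratings %>%
  filter(Category %in% c("competent", "thoughtful", "intelligent")) %>%
  group_by(eval_id) %>%
  summarise(Rating = mean(Rating)) %>%
  mutate(Category = "intellect")

# Steps 4 and 5: keep the impression and hire ratings, bind on the intellect ratings,
# then join the evaluator information and tidy up the columns
ratings2 <- ratings %>%
  filter(Category %in% c("impression", "hire")) %>%
  bind_rows(iratings) %>%
  inner_join(evaluators, by = "eval_id") %>%                    # join on the evaluator id (assumed key)
  select(eval_id, Category, Rating, condition, sex_labels) %>%  # keep only the columns shown above
  arrange(eval_id, Category)

# Descriptives: n, mean and SD for each condition and category
group_means <- ratings2 %>%
  group_by(condition, Category) %>%
  summarise(n = n(), m = mean(Rating), sd = sd(Rating))
```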

10.1.2 Visualising two groups

Brilliant! Now that we have our data in a workable fashion, we are going to start looking at some visualisations and making figures. You should always visualise your data before you run a statistical analysis. Visualisations serve as part of the descriptive measures and they help you interpret the results of the test but they also give you an understanding of the spread of your data as part of the test assumptions. For data with a categorical IV, we are going to look at using the violin-boxplots that we saw in the introduction to visualisation chapter. In the past people would have tended to use barplots but as Newman and Scholl (2012) point out, barplots are misleading to viewers about how the underlying data actually looks. You can read that paper if you like, for more info, but hopefully by the end of this section you will see why violin-boxplots are more informative.

10.1.2.1 Activity 4: Visualisation

We will visualise the intellect ratings for the listened and the read conditions. The code we will use to create our figure is as follows with the explanation below. Put this code in a new code chunk and run it.

The first part of the code uses a pipe to filter the data to just the intellect rating:

  • ratings %>% filter(Category == "intellect") is the same as filter(ratings, Category == "intellect")
  • this code also reflects nicely the difference between pipes ( %>% ) used in wrangling and the + used in the visualisations with ggplot. Notice that we switch from pipes to plus when we start adding layers to our visualisation.

The main parts of the code to create the violin-boxplot above are:

  • ggplot() which creates our base layer and sets our data and our x and y axes.
  • geom_violin() which creates the density plot. The reason it is called a violin plot is because if your data are normally distributed it should look something like a violin.
  • geom_boxplot() which creates the boxplot, showing the median and inter-quartile range (see here if you would like more information). The boxplot can also give you a good idea of whether the data are skewed - the median line should be in the middle of the box. The more the median is moved towards one of the extremities of the box, the more your data are likely to be skewed.
  • geom_jitter() can be used to show individual data points in your dataset and you can change the width and height of the jitter. Note that this uses a randomised method to display the points so you will get a different output each time you run it.
  • And finally, we will use stat_summary() for displaying the mean and confidence intervals. Within this function, fun.data specifies a summary function that gives us the summary of the data we want to plot, in this case mean_cl_normal, which will calculate the mean plus the upper and lower confidence interval limits. You could also specify mean_se here if you wanted standard error. Finally, geom specifies what shape or plot we want to use to display the summary, in this case a pointrange (literally a point (the mean) with a range (the CI)).
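Putting those layers together, the plotting code might look something like the sketch below. We plot from the joined ratings2 tibble created in Activity 3 because it contains the condition column, and mean_cl_normal needs the Hmisc package loaded in the set-up.

```r
ratings2 %>%
  filter(Category == "intellect") %>%
  ggplot(aes(x = condition, y = Rating)) +
  geom_violin(trim = FALSE, alpha = .4) +                        # the density ("violin") layer
  geom_boxplot(width = .2, alpha = .7) +                         # median and inter-quartile range
  geom_jitter(width = .1, height = 0, alpha = .5) +              # individual evaluators
  stat_summary(fun.data = mean_cl_normal, geom = "pointrange")   # mean and 95% CI
```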

The figure will look like this:

Figure 10.1: Violin-boxplot of the evaluator data

An alternative version would be this shown below. Perhaps compare the two codes and see if you can see what makes the differences:

Figure 10.2: Violin-boxplot of the evaluator data

Try to answer the following question:

  • In which condition did the evaluators give the higher ratings overall? listened read
  • Would the descriptives (means, sds, figure) be inline with the hypothesis that evaluators favour resumes they have listened to more than resumes they have read? yes no

Nice and informative figure, huh? It gives a good representation of the data in the two conditions, clearly showing the spread and the centre points. If you compare this to Figure 7 in the original paper you see the difference; we actually get much more information with our approach. We even get a sense that the data might be somewhat skewed, but more on that below.

The code is really useful as well so you know it is here if you want to use it again. But maybe have a play with the code to try out things to see what happens. For instance:

  • Try setting trim = TRUE , show.legend = FALSE , and/or altering the value of width to see what these arguments do.
  • change the Category == "intellect" to Category == "hire" or Category == "impression" to create visualisations of the other conditions.

10.1.3 Assumptions

Great. We have visualised our data as well and we have been able to make some descriptive analysis about what is going on. Now we want to get ready to run the actual analysis. But one final thing we are going to decide is which t-test? But hang on you say, didn't we decide that? We are going to run a between-subjects t-test! Right? Yes! But, and you know what we are about to say, there is more than one between-subjects t-test you can run. The two common ones are:

  • Student's between-subjects t-test
  • Welch's between-subjects t-test

We are going to recommend that, at least when doing the analysis by code, you should use Welch's between-subjects t-test, for the reasons explained in this paper by Delacre et al. (2017). You don't have to read that paper, but effectively, Welch's between-subjects t-test is better at maintaining the false positive rate of your test (\(\alpha\), usually set at \(\alpha\) = .05) at the requested level. So we will show you how to run a Welch's t-test here.

The assumptions for a Welch's between-subjects t-test are:

  • The data are continuous, i.e. interval/ratio
  • The data are independent
  • The residuals are normally distributed for each group

We know that 1 and 2 are true from the design of the experiment, the measures used, and by looking at the data. To test assumption 3, we can create Q-Q plots of the residuals. For a between-subjects t-test, the residuals are the difference between the mean of each group and each data point. E.g., if the mean of group A is 10 and a participant in group A scores 12, the residual for that participant is 2.

  • Thinking back to your lectures, if you ran a Student's t-test instead of a Welch t-test, what would the 4th assumption be? Homogeneity of variance Homoscedasticity Nominal data

10.1.3.1 Activity 5: Assumptions

  • Run the below code to calculate then plot the residuals for the "listened" condition on "intellect" ratings.
  • Run the below code to calculate then plot the residuals for the "read" condition on "intellect" ratings.
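The object names below are our own choices, but the filter-then-mutate pattern is one way to get the residuals, and qqPlot() from the car package (loaded in the set-up) draws the Q-Q plots.

```r
# Residuals for the "listened" condition on "intellect" ratings
listened_resid <- ratings2 %>%
  filter(condition == "listened", Category == "intellect") %>%
  mutate(group_resid = Rating - mean(Rating))   # residual = score minus the group mean

# Residuals for the "read" condition on "intellect" ratings
read_resid <- ratings2 %>%
  filter(condition == "read", Category == "intellect") %>%
  mutate(group_resid = Rating - mean(Rating))

# Q-Q plots of the residuals
qqPlot(listened_resid$group_resid)
qqPlot(read_resid$group_resid)
```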

If we then look at our plots we get something that looks like this for the listened condition:

Figure 10.3: Residual plot of the listened condition. Each circle represents an individual rater. If the data are normally distributed, the points should fall close to or on the diagonal line.

And something like this for the read condition.

Figure 10.4: Residual plot of the read intellect condition. Each circle represents an individual rater. If the data are normally distributed, the points should fall close to or on the diagonal line.

What you are looking for is for the data to fall close to the diagonal line. Looking at the plots, maybe we could suggest that the "listened" condition is not so great, as some data points move away from the line at the far ends. The "read" condition seems a bit better, at least subjectively! There will always be some deviation from the diagonal line, but most of the data in both plots fall relatively close to their respective diagonal lines.

But in addition to the Q-Q plots we can also run a test on the residuals known as the Shapiro-Wilk test. The Shapiro-Wilk test has the alternative hypothesis that the data are significantly different from normal. As such, if you find a significant result, the interpretation is that your data are not normal. If you find a non-significant result, the interpretation is that your data are not significantly different from normal. One technical point is that the test doesn't actually say your data are normal, just that they are not significantly different from normal. Again, remember that assumptions have a degree of subjectivity to them. We use the shapiro.test() function from base R to run the Shapiro-Wilk test.

  • In a new code chunk, run both lines of code below and look at their output.
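Using the residual objects from the sketch above, the two lines are simply:

```r
shapiro.test(listened_resid$group_resid)  # Shapiro-Wilk test, listened condition residuals
shapiro.test(read_resid$group_resid)      # Shapiro-Wilk test, read condition residuals
```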

Try to answer the following questions:

  • According to the Shapiro-Wilk's test, is the data normally distributed for the listened condition? Yes No
  • According to the Shapiro-Wilk's test, is the data normally distributed for the read condition? Yes No

So as you can see, the p-value for the listened condition is p = .174, and the p-value for the read condition is p = .445. So here we are in an interesting position that often happens. The figure for "listened" is a bit unclear, but the figure for "read" looks ok, and both tests show a non-significant difference from normality. What do we do? Well, we combine our knowledge of our data to make a reasoned decision. In this situation, the majority of our information points to the data being normal. However, there are known issues with the Shapiro-Wilk test when sample sizes are small, so we must always treat results like this with some caution. It is never a good idea to run a study with a sample as small as this, and in reality we might want to design a study that has larger groups. All that said, here it would not be unreasonable to treat the assumption of normality as having been met.

For info though, here are some options if you are convinced your data are not normal.

  • Transform your data to try and normalise the distribution. We won't cover this but if you'd like to know more, this page is a good start. Not usually recommended these days but some still use it.
  • Use a non-parametric test. The non-parametric equivalent of the independent t-test is the Mann-Whitney and the equivalent of the paired-samples t-test is the Wilcoxon signed-ranks test. Though more modern permutation tests are better. Again we won't cover these here but useful to know if you read them in a paper.
  • Do nothing. Delacre, Lakens & Leys, 2017 argue that with a large enough sample (>30), the Welch test is robust to deviations from assumptions. With very large samples normality is even less of an issue, so design studies with large samples.

10.1.4 Inferential analysis

Now that we have checked our assumptions and our data seem to fit our Welch's t-test, we can go ahead and run the test. We are going to conduct t-tests for the Intellect, Hire and Impression ratings separately, each time comparing evaluators' overall ratings for the listened group versus overall ratings for the read group to see if there was a significant difference between the two conditions: i.e. did the evaluators who listened to pitches give significantly higher or lower ratings than evaluators who read pitches?

10.1.4.1 Activity 6: Running the t-test

  • First, create separate objects for the intellect, hire, and impression data using filter() . We have completed intellect object for you so you should replace the NULLs in the below code to create one for hire and impression .

And we are finally ready to run the t-test. It is funny, right? As you may have realised by now, most of the work in an analysis involves the set-up and getting the data ready; running the tests is generally just one more function. To conduct the t-test we will use the t.test() function from base R, which takes the following format, called the formula syntax:

  • ~ is called a tilde. It can be read as 'by' as in "analyse the DV by the IV".
  • The variable on the left of the tilde is the dependent or outcome variable, DV_column_name .
  • The variable(s) on the right of the tilde is the independent or predictor variable, IV_column_name .
  • and paired = FALSE indicates that we do not want to run a paired-samples test and that our data is from a between-subjects design.

So let's run our first test:

  • In a new code chunk, type and run the below code, and then view the output by typing intellect_t in the console.
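A sketch of what that chunk might look like, using the column names from the ratings2 tibble (Rating as the DV, condition as the IV):

```r
# Filter the intellect ratings (hire and impression follow the same pattern later)
intellect <- filter(ratings2, Category == "intellect")

# Welch two-sample t-test: analyse Rating by condition;
# paired = FALSE because this is a between-subjects design
intellect_t <- t.test(Rating ~ condition,
                      data = intellect,
                      paired = FALSE)
```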

Similar to when we used cor.test() for correlations, the output of t.test() is a list type object which can make it harder to work with. This time, we are going to show you how to use the function tidy() from the broom package to convert the output to a tidyverse format.

  • Run the below code. You can read it as "take what is in the object intellect_t and try to tidy it into a tibble".
  • View the object by clicking on results_intellect in the environment.
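The tidying step itself is a single line:

```r
# Convert the t.test() output into a one-row tibble (tidy() comes from broom)
results_intellect <- intellect_t %>% tidy()
```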

As you will see, results_intellect is now in a nice tibble format that makes it easy to extract individual values. It is worth looking at the values with the below explanations:

  • estimate is the difference between the two means (alphabetically entered as mean 1 minus mean 2)
  • estimate1 is the mean of group 1
  • estimate2 is the mean of group 2
  • statistic is the t-statistic
  • p.value is the p-value
  • parameter is the degrees of freedom
  • conf.low and conf.high are the confidence interval of the estimate
  • method is the type of test, Welch's, Student's, paired, or one-sample
  • alternative is whether the test was one or two-tailed

And now that we know how to run the test and tidy it, try the below:

  • Complete the code below in a new code chunk by replacing the NULLs to run the t-tests for the hire and impression ratings, don't tidy them yet.
  • And now tidy the data into the respective objects - hire_t into results_hire , etc.

Be sure to look at each of your tests and see what the outcome of each was. To make that easier, we are going to join all the results of the t-tests together using bind_rows() - which we can do because all the tibbles have the same column names after we passed them through tidy() .

  • Copy and run the below code. First, it specifies all of the individual tibbles you want to join and gives them a label (hire, impression, intellect), and then you specify what the ID column should be named (test).
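A sketch of the remaining tests and the join is below, assuming hire and impression are filtered in the same way as intellect; the object names are our own.

```r
hire       <- filter(ratings2, Category == "hire")
impression <- filter(ratings2, Category == "impression")

hire_t       <- t.test(Rating ~ condition, data = hire, paired = FALSE)
impression_t <- t.test(Rating ~ condition, data = impression, paired = FALSE)

results_hire       <- tidy(hire_t)
results_impression <- tidy(impression_t)

# Join the three tidied results; .id = "test" creates the labelling column shown below
results <- bind_rows(hire = results_hire,
                     impression = results_impression,
                     intellect = results_intellect,
                     .id = "test")
```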

Which produces the below:

test estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high method alternative
hire 1.825397 4.714286 2.888889 2.639949 0.0120842 36.85591 0.4241979 3.226596 Welch Two Sample t-test two.sided
impression 1.894333 5.968333 4.074000 2.817175 0.0080329 33.80061 0.5275086 3.261158 Welch Two Sample t-test two.sided
intellect 1.986722 5.635000 3.648278 3.478555 0.0014210 33.43481 0.8253146 3.148130 Welch Two Sample t-test two.sided

And looking along the line at the p-values we might have some significant differences. However, we have to remember to consider multiple comparisons.

10.1.4.2 Activity 7: Correcting for multiple comparisons

Because we have run three t-tests, we are actually increasing our false positive rate due to what is called familywise error - essentially, instead of a false positive rate of .05, we would have a false positive rate of 1-(1-.05)^3 = 0.142625, where the "3" in the formula is the number of tests we ran. To correct for this we can apply a multiple comparison correction, just like we did when we ran a lot of correlations. So, we're going to add a column to our results tibble that shows the adjusted p-values, using p.adjust() and mutate() .

  • inside the p.adjust() , p.value says what column the p-values are in, and bonferroni says what adjustment to use.
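The adjustment is one mutate() call; the new column name p.adjusted is our own choice.

```r
results <- results %>%
  mutate(p.adjusted = p.adjust(p.value, method = "bonferroni"))
```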

Looking at the adjusted p-values, try to answer the following questions:

  • Listened is significantly more preferred in the hire condition after adjusting for multiple comparisons? TRUE FALSE
  • Listened is significantly more preferred in the impression condition after adjusting for multiple comparisons? TRUE FALSE
  • Listened is significantly more preferred in the intellect condition after adjusting for multiple comparisons? TRUE FALSE

10.1.5 Effect Size

As you can see, even after correcting for multiple comparisons, our effects are still significant and we have maintained our false positive rate. But one more thing we can add is the effect size. Remember that some effects are significant and large, some are significant and medium, and some are significant and small. The effect size tells us the magnitude of the effect in a way we can compare across studies - it is said to be standardised - and the common effect size for a t-test is called Cohen's d.

10.1.5.1 Activity 8: Effect size

Whilst Cohen's d is relatively straightforward to calculate by hand, here we will use the function cohens_d() from the effectsize package. The code is similar to the syntax for t.test() .

  • The first argument should specify the formula, using the same syntax as t.test() , that is dv ~ iv .
  • pooled_sd should be FALSE if you ran a Welch test where the variances are not assumed to be equal and TRUE if you ran a regular Student's t-test.
  • Run and complete the code below by replacing the NULLs to calculate the effect sizes for hire and impression
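A sketch of those calls, mirroring the t-tests above (pooled_sd = FALSE because we ran Welch tests):

```r
intellect_d  <- cohens_d(Rating ~ condition, data = intellect,  pooled_sd = FALSE)
hire_d       <- cohens_d(Rating ~ condition, data = hire,       pooled_sd = FALSE)
impression_d <- cohens_d(Rating ~ condition, data = impression, pooled_sd = FALSE)
```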

10.1.6 Interpretation

Great Work! But let's take a second to recap on our understanding of the data.

10.1.6.1 Activity 9: Interpreting the results

Were your results for hire significant? Enter the mean estimates and t-test results (means and t-value to 2 decimal places, p-value to 3 decimal places). Use the adjusted p-values:

Mean estimate1 (listened condition) =

Mean estimate2 (read condition) =

t( ) = , p =

Were your results for impression significant? Enter the mean estimates and t-test results (means and t-value to 2 decimal places, p-value to 3 decimal places):

According to Cohen's (1988) guidelines, the effect sizes for all three tests are Small Medium Large

10.1.7 Write-Up

And then finally on the between-subjects t-test, we should look at the write up.

10.1.7.1 Activity 10: Write-up

If you refer back to the original paper on pg 887, you can see, for example, that the authors wrote:

In particular, the recruiters believed that the job candidates had greater intellect—were more competent, thoughtful, and intelligent—when they listened to pitches (M = 5.63, SD = 1.61) than when they read pitches (M = 3.65, SD = 1.91), t(37) = 3.53, p < .01, 95% CI of the difference = [0.85, 3.13], d = 1.16.

If we were to compare our findings, we would have something like the below:

A Bonferroni-corrected Welch t-test found that recruiters rated job candidates as more intellectual when they listened to resumes (M = 5.64, SD = 1.61) than when they read resumes (M = 3.65, SD = 1.91), t(33.43) = 3.48, p = 0.004, 95% CI of the difference = [0.83, 3.15], d = 1.12.

You can create this same paragraph, using code, by copying and pasting the below exactly into white space in your R Markdown document and then knitting the file.

Note that we haven't replicated the analysis exactly - the authors of this paper conducted Student's t-test whilst we have conducted Welch tests and we've also applied a multiple comparison correction. But you can look at the two examples and see the difference. It would also be worthwhile trying your own write-up of the two remaining conditions before moving on to within-subjects t-tests.

10.2 Within-subjects (paired-samples)

For the final activity we will run a paired-samples t-test for a within-subjects design, but we will go through this one more quickly and just point out the differences from the above. For this example we will again draw from the Open Stats Lab and look at data from Mehr, S. A., Song, L. A., & Spelke, E. S. (2016). For 5-month-old infants, melodies are social. Psychological Science, 27, 486-501.

The premise of the paper is that parents often sing to their children and, even as infants, children listen to and look at their parents while they are sung to. The authors sought to explore the psychological function that music has for parents and infants, by examining the research question that particular melodies may convey important social information to infants. More specifically, that common knowledge of songs and melodies conveys information about social affiliation. The authors argue that melodies are shared within social groups. Whereas children growing up in one culture may be exposed to certain songs as infants (e.g., “Rock-a-bye Baby”), children growing up in other cultures (or even other groups within a culture) may be exposed to different songs. Thus, when a novel person (someone who the infant has never seen before) sings a familiar song, it may signal to the infant that this new person is a member of their social group.

To test this the researchers recruited 32 infants and their parents to take part in the following experiment. During their first visit to the lab, the parents were taught a new lullaby (one that neither they nor their infants had heard before). The experimenters asked the parents to sing the new lullaby to their child every day for the next 1-2 weeks. Following this 1-2 week exposure period, the parents and their infant returned to the lab to complete the experimental portion of the study. Infants were first shown a screen with side-by-side videos of two unfamiliar people, each of whom were silently smiling and looking at the infant. The researchers recorded the looking behaviour (or gaze) of the infants during this ‘baseline’ phase. Next, one by one, the two unfamiliar people on the screen sang either the lullaby that the parents learned or a different lullaby (that had the same lyrics and rhythm, but a different melody). Finally, the infants saw the same silent video used at baseline, and the researchers again recorded the looking behaviour of the infants during this ‘test’ phase. For more details on the experiment’s methods, please refer to Mehr et al. (2016) Experiment 1.

10.2.1 The Data

10.2.1.1 Activity 11: Getting the data ready

  • Again, if easier, you can download the data as a zip file by clicking this link.
  • The code chunk for this activity reads in the data and then filters it so we just have the first experiment from the paper.
  • It then selects the id and the preferential looking time of babies at the baseline stage and at the test stage.
  • Finally, it renames the two preferential looking time columns to have names that are easier to work with, using the rename() function.
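A sketch of that wrangling chunk is below. The file name, the filtering variable and the original column names are all illustrative guesses; swap in the real names from the file you downloaded.

```r
gaze <- read_csv("melody_data.csv") %>%           # hypothetical file name
  filter(experiment == 1) %>%                     # keep only Experiment 1 (column name is a guess)
  select(id, baseline_looking, test_looking) %>%  # id plus the two looking-time columns
  rename(baseline = baseline_looking,             # shorter names that are easier to work with
         test = test_looking)
```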

10.2.2 Assumptions

So now that we have our data ready to work with, and be sure to look at it to get an understanding of the data, we want to consider the assumptions of the within-subjects t-test.

The assumptions for this t-test are a little different from (although very similar to) the between-subjects t-test above. They are:

  • The data is continuous, i.e. interval/ratio
  • All participants should appear in both conditions/groups.
  • The residuals are normally distributed.

Aside from the data being paired rather than independent, i.e. it is the same participants in two conditions rather than two groups of people in different conditions, the key difference is that for the within-subjects test, the data analysed are actually the differences between the scores in the two conditions for each participant. For example, say participant one scores 10 in condition 1 and 7 in condition 2; their data point is actually 3, and you do that for all participants. So it isn't looking at what they scored in either condition by itself, but at the difference between conditions. And it is those difference scores that must be continuous and whose residuals must be normally distributed.

10.2.2.1 Activity 12: Assumptions

  • Type and run the below code to first calculate the difference scores ( diff ) and then the residuals ( group_resid ).
  • Next, it plots the Q-Q plot of the residuals before carrying out a Shapiro-Wilk test on the residuals.
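Assuming the wide-format tibble gaze from Activity 11, with columns named baseline and test, that chunk might look like this:

```r
gaze <- gaze %>%
  mutate(diff = test - baseline,             # difference score for each infant
         group_resid = diff - mean(diff))    # residual = difference minus the mean difference

qqPlot(gaze$group_resid)        # Q-Q plot of the residuals (car package)
shapiro.test(gaze$group_resid)  # Shapiro-Wilk test on the residuals
```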

And if we look at the plot we see:

Q-Q plot of the residuals of the difference scores.

and the Shapiro-Wilk test returns a non-significant result.

Now, as we saw above, with the Q-Q plot we want the data to fall approximately on the diagonal line, and with the Shapiro-Wilk test we are looking for a non-significant finding. Based on those two checks, we can therefore say that our data meet the assumption of normality and so we can proceed.

10.2.3 Descriptives

Now we are going to look at some descriptives. It made sense to keep the data in wide-form until this point to make it easy to calculate a column for the difference score, but now we will transform it to tidy data so that we can easily create descriptives and plot the data using tidyverse tools.

10.2.3.1 Activity 13: Descriptives and visualisations

  • Type and run the below code to gather the data using pivot_longer().
  • Next create a violin-boxplot of the data using your knowledge (and code) from Activity 4 above.
  • If you prefer, you could actually work on the difference scores instead of the two different conditions. Whilst we analyse the difference, people plot either the difference or the two conditions as descriptives.
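A sketch of the reshaping and descriptives, using column names that match the table shown further below:

```r
# Gather baseline and test into long ("tidy") format
gaze_tidy <- gaze %>%
  pivot_longer(cols = c(baseline, test),
               names_to = "time",
               values_to = "looking_time")

# Descriptive statistics per time point
gaze_descriptives <- gaze_tidy %>%
  group_by(time) %>%
  summarise(n = n(),
            mean_looking = mean(looking_time),
            sd_looking = sd(looking_time))
```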

If you have done this step correctly, you should see a plot that looks like this:

Figure 7.7: Preferential looking time for infants at baseline stage (left) and test stage (right).

And the descriptives:

time n mean_looking sd_looking
baseline 32 0.5210967 0.1769651
test 32 0.5934912 0.1786884

Again you could look at the differences and if you know how you could plot the confidence interval of the difference, but it is not essential here. But looking at what you have done it would be worth spending a few minutes to try and predict the outcome of the t-test if the null hypothesis is that there is no difference in preferential looking time in babies between the baseline and test conditions.

10.2.4 Inferential Analysis

Which brings us on to running the t-test and the effect size. The code is almost identical to the independent code with two differences:

  • In t.test() you should specify paired = TRUE rather than FALSE
  • In cohens_d() you should specify method = paired rather than pooled_sd

10.2.4.1 Activity 14: Paired-samples t-test

  • Run the paired-samples t-test on the baseline and test scores and store the tidied output in an object called gaze_test , i.e. pipe the output of the t-test into tidy() in the one line of code.
  • calculate the Cohen's D for the t-test and store it in gaze_d
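A sketch of both steps is below. For the effect size we take one possible route: because the paired test is equivalent to a one-sample test on the difference scores, running cohens_d() on diff gives the paired d; your course materials may specify a different argument.

```r
# Paired t-test (baseline vs test, matching the estimate shown below), tidied into a tibble
gaze_test <- t.test(gaze$baseline, gaze$test, paired = TRUE) %>% tidy()

# Cohen's d computed on the difference scores
# (mean difference divided by the SD of the differences)
gaze_d <- cohens_d(gaze$diff)
```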

And if you have done that correctly, you should see in gaze_test something like this:

estimate statistic p.value parameter conf.low conf.high method alternative
-0.0723946 -2.41643 0.0217529 31 -0.133497 -0.0112922 Paired t-test two.sided

10.2.5 Write-Up and Interpretation

Looking at the output of the test, it is actually very similar to the between-subjects t-test, with one exception. Rather than providing the means of both conditions, there is a single estimate. This is the mean difference score between the two conditions, and if you had calculated the descriptives on the diff we created above you would get the same answer.

Enter the mean estimates and t-test results (means and t-value to 2 decimal places, p-value to 3 decimal places):

Mean estimate =

10.2.5.1 Activity 15: Write-up

Now have a go at summarising this finding in a sentence using the standard APA formatting. We have hidden our version just below for you to look at when you have had a go.

At test stage (M = .59, SD = .18), infants showed a significantly longer preferential looking time to the singer of the familiar melody than they had shown the same singer at baseline (M = .52, SD = .18), t(31) = 2.42, p = .022, d = .41.

Alternatively:

At test stage, infants showed a significantly longer preferential looking time to the singer of the familiar melody than they had shown the same singer at baseline (Mean Difference = 0.07, SD = 0.17), t(31) = 2.42, p = .022, d = .41.

10.3 Finished!

That was a long chapter but hopefully you will see that it really is true that the hardest part is the set-up and the data wrangling. As we've said before, you don't need to memorise lines of code - you just need to remember where to find examples and to understand which bits of them you need to change. Play around with the examples we have given you and see what changing the values does. There is no specific Test Yourself section for this chapter but make sure you check your understanding of the different sections before moving on.

10.4 Activity solutions

Below you will find the solutions to the above questions. Only look at them after giving the questions a good try and trying to find help on Google or Teams about any issues.

10.4.0.1 Activity 1

10.4.0.2 Activity 2

This was our code:

and you could summarise as below to give an output:

10.4.0.3 Activity 6

10.4.0.4 Activity 8

10.4.0.5 Activity 13

For the plot:

For the descriptives:

10.4.0.6 Activity 14

For the t-test:

For the Cohen's D:

10.5 Words from this Chapter

Below you will find a list of words that were used in this chapter that might be new to you in case it helps to have somewhere to refer back to what they mean. The links in this table take you to the entry for the words in the PsyTeachR Glossary . Note that the Glossary is written by numerous members of the team and as such may use slightly different terminology from that shown in the chapter.

term definition
between-subjects Not varying within unit of observation, such that each has only one value
categorical Data that can only take certain values, such as types of pet.
mixed design An experimental design that has both within-subject and between-subject factors.
double A data type representing a real decimal number or integer.
one-sample A study to compare the mean and spread of one group against a known value or population norm.
within-subjects Varying such that each unit of observation has more than one value

That is end of this chapter. Be sure to look again at anything you were unsure about and make some notes to help develop your own knowledge and skills. It would be good to write yourself some questions about what you are unsure of and see if you can answer them later or speak to someone about them. Good work today!


7. Independent Means t-test

In this chapter, I will introduce to you one last t-test variation – the independent means t-test . This one is intended for the classic experimental design, in which two independent samples are compared.


In a classic experimental design, we are comparing two samples. The independent means t-test is typically used to compare the data from an experimental group to those from a control group. The experimental group is the one that receives the manipulation, or the independent variable, and the control group is the one that receives either no manipulation, or an alternative one that represents the status quo – like a placebo. What makes this different from the repeated measures type of design, is that the scores from the two groups are independent. They are obtained from different participants who are randomly assigned to one group or the other.

For example, if I want to see whether memory span is affected by the colour in which items are presented, I might test one group of people with black-and-white items and test another group of people with red items.


I will compare one group average with the other group average. There is no relationship or dependency of one group of scores with the other group, so it will be appropriate to analyze the data with an independent t-test .

With all statistical tests, we know each sample comes from a population. The question is: are they different populations? To answer this question using statistical tests, we need to make some assumptions about the data. We have been making the normal curve assumption all along, and we saw how the central limit theorem can be used to justify this assumption when our samples are large enough. With this new kind of t-test, we are also going to make the homoscedasticity assumption : that the two populations we are comparing have the same variance. In an introductory course like this, we will not go into the technicalities of verifying this assumption, but it is possible to do so before we conduct the analysis.

In the independent means t-test , just like with the dependent t-test, we have no direct information about population 1 or 2. We calculate sample means from our research and comparison samples. To find the standard deviation for the comparison population, we will take sample based estimates using all the scores we have to hand, by pooling together the variance of each sample. This makes sense if we are assuming the two populations have equal variance.

In step 2, we will once again set the comparison population mean to zero, because the comparison reflects the distribution of means under the null hypothesis, in which there is no difference between the populations.


The standard deviation will be calculated through a workflow previous students have told me looks like an hourglass shape.


Starting from the top, we first calculate the variance of sample 1 and sample 2 separately.

 \[S^{2}=\frac{\sum(X-M)^{2}}{N-1}\]

We then pool the two variances together using a weighted average formula. This formula just allows us to count one variance more than the other if the sample size is bigger. With two independent samples, it is not uncommon to have an unequal N, or number of scores, in each group.

 \[S^{2}_{Pooled}=\frac{df_{1}}{df_{Total}}(S^2_{1})+\frac{df_{2}}{df_{Total}}(S^2_{2})\]

Once we have the pooled variance calculated, we use it to find the variance of the distribution of means for each sample, and then add those together to get the variance of the distribution of differences between means, which is the comparison distribution for this test.

 \[S^{2}_{M_{1}}=\frac{S^{2}_{Pooled}}{N_{1}} \qquad S^{2}_{M_{2}}=\frac{S^{2}_{Pooled}}{N_{2}}\]

 \[S^{2}_{Difference}=S^{2}_{M_{1}}+S^{2}_{M_{2}}\]

We then square root to get the estimated standard deviation for the comparison distribution.

 \[S_{Difference}=\sqrt{S^{2}_{Difference}} \]

Now for step 3: the one new thing here is the degrees of freedom we will use to look up the cutoff sample score in the t tables . Because we have two samples, we will use the pooled, or total, degrees of freedom for lookup. That is the main advantage of the independent means t-test . Because we have two samples of scores, we get the benefit of more degrees of freedom.

For step 4, we have a new t-test formula. We subtract the sample mean of the control group from the sample mean of the experimental group, so that our directionality makes sense when we mark the test score on the comparison distribution and determine whether it falls in the shaded tail.

 \[t=\frac{M_{1}-M_{2}}{S_{Difference}}\]
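To make the hourglass concrete, here is a small worked sketch in R with made-up scores; it follows the formulas above step by step and then checks the answer against the built-in pooled-variance t-test.

```r
# Hypothetical scores, for illustration only
exp_grp <- c(4, 6, 5, 7, 3)    # experimental group
ctl_grp <- c(6, 8, 7, 9, 6)    # control group

n1 <- length(exp_grp); n2 <- length(ctl_grp)
df1 <- n1 - 1; df2 <- n2 - 1; df_total <- df1 + df2

s2_1 <- var(exp_grp)    # variance of sample 1
s2_2 <- var(ctl_grp)    # variance of sample 2

# Pooled variance: weighted average of the two sample variances
s2_pooled <- (df1 / df_total) * s2_1 + (df2 / df_total) * s2_2

# Variance of the distribution of means for each sample, then of the difference
s2_m1  <- s2_pooled / n1
s2_m2  <- s2_pooled / n2
s_diff <- sqrt(s2_m1 + s2_m2)

# t score: difference between the sample means divided by S_Difference
t_stat <- (mean(exp_grp) - mean(ctl_grp)) / s_diff
t_stat

# Check against the built-in Student's (equal-variance) t-test
t.test(exp_grp, ctl_grp, var.equal = TRUE)$statistic
```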

One thing never changes: In step 5, if the t-test score falls in the shaded tail we reject the null hypothesis.


As we go through the course, we repeat many concepts and procedures often enough that I start to move quickly through those elements. If some aspect of the hypothesis test is still not making sense, that’s totally okay, and it’s completely normal. But you need to come back to those bits and grapple with them, perhaps by heading back to earlier chapters where those concepts or procedures were first introduced. Do not give up on a concept if it is still fuzzy. By now things should be starting to gel. Are there any aspects that you are still doing by rote rather than through conceptual understanding? I recommend that you persist. It will make sense if you get enough examples and explanations. For most of us it takes quite a bit of repetition and a few different approaches. Check out another textbook for an alternative look at the same piece.

If we were writing up the hypothesis test outcome from the example illustrated above, we might interpret it this way in the results section: “We found that people who consumed chocolate had significantly lower mood scores than the control group (p = 0.0145).”

The p-value represents the probability of the test score, or any score that is more extreme than that, occurring under the comparison distribution. To get that, we find the area under the curve beyond the test score, either in one tail or in both tails, depending on the directionality of the test.

We have now completed the decision tree for this section of the course. If we have two samples to compare, and there is no relationship between the individuals in the two samples, we use the independent t-test .


Chapter Summary

In this chapter, we introduced the use of the independent means t-test in the context of hypothesis tests of the difference between two sample means. This test is appropriate for research designs in which two samples are formed through random assignment to groups, for example, an experimental group and a control group. Scores from both samples are used to estimate the comparison population distribution, and both contribute to the degrees of freedom.

Independent means t-test: a statistical test used in hypothesis tests comparing the means of two independent samples, created by random assignment of individuals to experimental and control groups.

Normal curve assumption: parametric tests like the t-test and Z-test require the assumption that the distribution of means for any given population is normally distributed.

Homoscedasticity assumption: independent means t-tests require the assumption that the two populations we are comparing have the same variance.

Beginner Statistics for Psychology Copyright © 2021 by Nicole Vittoz is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License , except where otherwise noted.


12 Chapter 12: Repeated Measures t-test

So far, we have dealt with data measured on a single variable at a single point in time, allowing us to gain an understanding of the logic and process behind statistics and hypothesis testing. Now, we will look at a slightly different type of data that has new information we couldn’t get at before: change. Specifically, we will look at how the value of a variable, within people , changes across two timepoints. This is a very powerful thing to do, and, as we will see shortly, it involves only a very slight addition to our existing process and does not change the mechanics of hypothesis testing or formulas at all!

Change and Differences

Researchers are often interested in change over time. Sometimes we want to see if change occurs naturally, and other times we are hoping for change in response to some manipulation. In each of these cases, we measure a single variable at different times, and what we are looking for is whether or not we get the same score at time 2 as we did at time 1. This is a repeated-measures research design , where a single group of individuals is obtained and each individual is measured in two treatment conditions that are then compared. Data consist of two scores for each individual. This means that all subjects participate in each treatment condition. Think about it like a pretest/posttest.

When we analyze data from a repeated measures research design, we calculate the difference between the two scores in each pair and then take the average of those differences. The absolute value of our measurements does not matter; all that matters is the change. If the average difference between scores in our sample is large compared to the differences we would expect if both scores had been drawn from the same population, then we will conclude that the scores come from different populations, that is, that a real change occurred.

Let’s look at an example:

Before   After   Improvement
6        9       3
7        7       0
4        10      6
1        3       2
8        10      2

Table 1. Raw and difference scores before and after training.
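
As a quick check on Table 1, a few lines of Python (a sketch, assuming the before and after scores above) reproduce the Improvement column and its average:

```python
before = [6, 7, 4, 1, 8]
after = [9, 7, 10, 3, 10]

# Difference (improvement) score for each person: after minus before
improvements = [a - b for a, b in zip(after, before)]
print(improvements)                      # [3, 0, 6, 2, 2]

# The mean difference is what the repeated measures t-test works with
mean_diff = sum(improvements) / len(improvements)
print(mean_diff)                         # 2.6
```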


In both of these types of data, what we have are multiple scores on a single variable. That is, a single observation or data point is comprised of two measurements that are put together into one difference score. This is what makes the analysis of change unique – our ability to link these measurements in a meaningful way. This type of analysis would not work if we had two separate samples of people that weren’t related at the individual level, such as samples of people from different states that we gathered independently. Such datasets and analyses are the subject of the following chapter.

A rose by any other name…

It is important to point out that this form of t -test has been called many different things by many different people over the years: “matched pairs”, “paired samples”, “repeated measures”, “dependent measures”, “dependent samples”, and many others. What all of these names have in common is that they describe the analysis of two scores that are related in a systematic way within people or within pairs, which is what each of the datasets usable in this analysis have in common. As such, all of these names are equally appropriate, and the choice of which one to use comes down to preference. In this text, we will refer to paired samples , though the appearance of any of the other names throughout this chapter should not be taken to refer to a different analysis: they are all the same thing.


2 cups of tea for me: in a repeated measures design, the same individuals are in both conditions of the t-test.

Now that we have an understanding of what difference scores are and know how to calculate them, we can use them to test hypotheses. As we will see, this works exactly the same way as testing hypotheses about one sample mean with a t-statistic. The only difference is in the format of the null and alternative hypotheses, where we focus on the difference score.

Hypotheses of Change and Differences for step 1

When we work with difference scores, our research questions have to do with change. Did scores improve? Did symptoms get better? Did prevalence go up or down? Our hypotheses will reflect this. Remember that the null hypothesis is the idea that there is nothing interesting, notable, or impactful represented in our dataset. In a paired samples t-test, that takes the form of ‘no change’. There is no improvement in scores or decrease in symptoms.

Just as before, your choice of which alternative hypothesis to use should be specified before you collect data, based on your research question and any evidence you might have that would indicate a specific directional (or non-directional) change. Additionally, note that a non-directional (two-tailed) research/alternative hypothesis is the more conservative approach, even when you have an expected direction for the change.

Choosing 1-tail vs 2-tail test

How do you choose whether to use a one-tailed versus a two-tailed test? The two-tailed test is always going to be more conservative, so it’s always a good bet to use that one, unless you had a very strong prior reason for using a one-tailed test. In that case, you should have written down the hypothesis before you ever looked at the data. In Chapter 19, we will discuss the idea of pre-registration of hypotheses, which formalizes the idea of writing down your hypotheses before you ever see the actual data. You should never make a decision about how to perform a hypothesis test once you have looked at the data, as this can introduce serious bias into the results.

We do have to make one main assumption when we use the randomization test, which we refer to as exchangeability . This means that all of the observations are distributed in the same way, such that we can interchange them without changing the overall distribution. The main place where this can break down is when there are related observations in the data; for example, if we had data from individuals in 4 different families, then we couldn’t assume that individuals were exchangeable, because siblings would be closer to each other than they are to individuals from other families. In general, if the data were obtained by random sampling, then the assumption of exchangeability should hold.

Critical Values and Decision Criteria for step 2

As with before, once we have our hypotheses laid out, we need to find our critical values that will serve as our decision criteria. This step has not changed at all from the last chapter. Our critical values are based on our level of significance (still usually α = 0.05), the directionality of our test (one-tailed or two-tailed), and the degrees of freedom, which are still calculated as df = n – 1. Because this is a t -test like the last chapter, we will find our critical values on the same t -table using the same process of identifying the correct column based on our significance level and directionality and the correct row based on our degrees of freedom or the next lowest value if our exact degrees of freedom are not presented. After we calculate our test statistic, our decision criteria are the same as well: p < α or t obt > t crit *.
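
If a t-table is not handy, software gives the same critical values. A minimal sketch, assuming Python with scipy is available (the sample size here is illustrative):

```python
from scipy import stats

alpha = 0.05
n = 10
df = n - 1

# Two-tailed critical value: put alpha/2 in each tail
t_crit_two = stats.t.ppf(1 - alpha / 2, df)   # about 2.262 for df = 9

# One-tailed critical value: all of alpha in one tail
t_crit_one = stats.t.ppf(1 - alpha, df)       # about 1.833 for df = 9

print(t_crit_two, t_crit_one)
```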

Test Statistic for step 3

Our test statistic for our change scores follows exactly the same format as it did for our 1-sample t -test. In fact, the only difference is in the data that we use. For our change test, we first calculate a difference score as shown above. Then, we use those scores as the raw data in the same mean calculation, standard error formula, and t -statistic. Let’s look at each of these.

D̄ = ΣD / n

Here we are using the subscript D to keep track of the fact that these are difference scores instead of raw scores; it has no actual effect on our calculation.

Using this, we calculate the standard deviation of the difference scores the same way as well:

s_D = √(SS/df), where SS = Σ(D – D̄)² and df = n – 1

We will find the numerator, the Sum of Squares, using the same table format that we learned in chapter 3. Once we have our standard deviation, we can find the standard error:

s_D̄ = s_D / √n

Finally, our test statistic t has the same structure as well:

t = (D̄ – μ_D) / s_D̄, which reduces to t = D̄ / s_D̄ because the null hypothesis states that μ_D = 0

As we can see, once we calculate our difference scores from our raw measurements, everything else is exactly the same. Let’s see an example.
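
Before the worked example, it may help to see the whole chain of calculations in one place. The sketch below (assuming Python with scipy is available) runs the steps just described on the Table 1 training data; the scipy call at the end is only a cross-check, not part of the chapter's procedure:

```python
import math
from scipy import stats

before = [6, 7, 4, 1, 8]
after = [9, 7, 10, 3, 10]

# Step 1: difference scores
d = [a - b for a, b in zip(after, before)]      # [3, 0, 6, 2, 2]
n = len(d)

# Step 2: mean of the difference scores
d_bar = sum(d) / n                               # 2.6

# Step 3: standard deviation of the difference scores, sqrt(SS / df)
ss = sum((x - d_bar) ** 2 for x in d)            # 19.2
s_d = math.sqrt(ss / (n - 1))                    # about 2.19

# Step 4: standard error and t statistic (null hypothesis: mu_D = 0)
se = s_d / math.sqrt(n)                          # about 0.98
t_obt = d_bar / se                               # about 2.65

# Cross-check with scipy's paired samples t-test
t_check, p_two_tailed = stats.ttest_rel(after, before)
print(t_obt, t_check, p_two_tailed)
```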

Example: Increasing Satisfaction at Work


Hopefully the above example made it clear that running a dependent samples t -test to look for differences before and after some treatment works exactly the same way as a regular 1-sample t -test does from chapter 11 (which was just a small change in how z -tests were performed in chapter 10). At this point, this process should feel familiar, and we will continue to make small adjustments to this familiar process as we encounter new types of data to test new types of research questions.


Example with Confidence Interval Hypothesis Testing: Bad Press

Let’s say that a bank wants to make sure that their new commercial will make them look good to the public, so they recruit 7 people to view the commercial as a focus group. The focus group members fill out a short questionnaire about how they view the company, then watch the commercial and fill out the same questionnaire a second time. The bank really wants to find significant results, so they test for a change at α = 0.05. However, they use a 2-tailed test since they know that past commercials have not gone over well with the public, and they want to make sure the new one does not backfire. They decide to test their hypothesis using a confidence interval to see just how spread out the opinions are. As we will see, confidence intervals work the same way as they did before, just like with the test statistic.

Step 1: State the Hypotheses

As always, we start with hypotheses. When testing a hypothesis with a confidence interval, we must use a two-tailed test.

H0: There is no change in how people view the bank (μD = 0)

HA: There is a change in how people view the bank (μD ≠ 0)

Step 2: Find the Critical Values

Just like with our regular hypothesis testing procedure, we will need critical values from the appropriate level of significance and degrees of freedom in order to form our confidence interval. Because we have 7 participants, our degrees of freedom are df = 6. From our t -table, we find that the critical value corresponding to this df at this level of significance is t * = 2.447.

Step 3: Calculate the Confidence Interval

The data collected before (time 1) and after (time 2) the participants viewed the commercial are presented in Table 1. In order to build our confidence interval, we will first have to calculate the mean and standard deviation of the difference scores, which are also in Table 1. As a reminder, each difference score D is calculated as Time 2 – Time 1, and the mean difference is written D̄ (or MD).

Time 1   Time 2   D (Time 2 – Time 1)
3        2        -1
3        6        3
5        3        -2
8        4        -4
3        9        6
1        2        1
4        5        1

Table 1. Opinions of the bank

The sum of the difference scores is ΣD = 4, so the mean of the difference scores is: D̄ = 4/7 = 0.57

The standard deviation is found by first computing the Sum of Squares, using the same table format as before:

D       D – D̄     (D – D̄)²
-1      -1.57     2.46
3       2.43      5.90
-2      -2.57     6.60
-4      -4.57     20.88
6       5.43      29.48
1       0.43      0.18
1       0.43      0.18

Σ = 4   Σ = 0     Σ = 65.68 (our SS)

s = √(SS/df), where SS = 65.68 and df = n – 1 = 7 – 1 = 6
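
For readers who want the remaining arithmetic spelled out in general form, here is a minimal sketch, assuming Python with scipy is available, of how a confidence interval for the mean difference is assembled from the standard deviation and standard error; the function is generic rather than a reproduction of this example's numbers:

```python
import math
from scipy import stats

def paired_confidence_interval(diffs, confidence=0.95):
    """Confidence interval for the mean of a set of difference scores."""
    n = len(diffs)
    df = n - 1
    d_bar = sum(diffs) / n
    ss = sum((x - d_bar) ** 2 for x in diffs)
    s_d = math.sqrt(ss / df)              # standard deviation of the differences
    se = s_d / math.sqrt(n)               # standard error of the mean difference
    t_star = stats.t.ppf(1 - (1 - confidence) / 2, df)
    return d_bar - t_star * se, d_bar + t_star * se

# Example with the earlier training data: roughly (-0.12, 5.32)
print(paired_confidence_interval([3, 0, 6, 2, 2]))
```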

Step 4: Make the Decision

Remember that the confidence interval represents a range of values that seem plausible or reasonable based on our observed data. The interval spans -1.86 to 3.00, which includes 0, our null hypothesis value. Because the null hypothesis value is in the interval, it is considered a reasonable value, and because it is a reasonable value, we have no evidence against it. We fail to reject the null hypothesis.

Assumptions are conditions that must be met in order for our hypothesis testing conclusion to be valid. [Important: If the assumptions are not met then our hypothesis testing conclusion is not likely to be valid. Testing errors can still occur even if the assumptions for the test are met.]

Recall that inferential statistics allow us to make inferences (decisions, estimates, predictions) about a population based on data collected from a sample. Recall also that an inference about a population is true only if the sample studied is representative of the population. A statement about a population based on a biased sample is not likely to be true.

Assumption 1 : Individuals in the sample were selected randomly and independently, so the sample is highly likely to be representative of the larger population.

•        Random sampling ensures that each member of the population is equally likely to be selected.

•        An independent sample is one in which the selection of one member has no effect on the selection of any other.

Assumption 2: The distribution of sample differences (DSD) is normal, because we drew the sample from a population that is normally distributed.

  • This assumption is very important because we are estimating probabilities using the t-table, which provides accurate estimates of probabilities only for normally distributed events.

Assumption 3: Sampled populations have equal variances or have homogeneity of variance.

Advantages & Disadvantages of using a repeated measures design

Advantages. Repeated measures designs reduce the probability of Type I errors compared with independent samples designs. Because the same individuals appear in every condition, a statistically significant difference is less likely to be due to an extraneous variable that happened to differ between groups by chance (some factor other than the one in which we are interested).

Repeated measure designs are also more powerful (sensitive) than independent sample designs because two scores from each person are compared so each person serves as his or her own control group (we analyze the difference between scores). A special type of repeated measures design is known as the matched pairs design. If we are designing a study and suspect that there are important factors that could differ between our groups even if we randomly select and assign subjects, then we may use this type of design.

Because the members of a matched pair are similar to each other, there is a greater likelihood that our statistical test will find an effect when one is truly present (i.e., greater power) in a matched or repeated measures design than in a design with two independent samples (in which subjects for the two groups are picked randomly and independently, not matched on any traits).

Disadvantages. Repeated measure t-tests are very sensitive to outside influences and treatment influences. Outside Influences refers to factors outside of the experiment that may interfere with testing an individual across treatment/trials. Examples include mood or health or motivation of the individual participants. Think about it, if a participant tries really hard during the pretest but does not try very hard during the posttest, these differences can create problems later when analyzing the data.

Treatment Influences refers to events within the testing experience itself that interfere with how the data are collected. Three of the most common treatment influences are: 1. Practice effects, 2. Fatigue effects, and 3. Order effects.

A practice effect is present when participants perform a task better in later conditions because they have had a chance to practice it. A fatigue effect is present when participants perform a task worse in later conditions because they become tired or bored. Order effects refer to differences in research participants’ responses that result from the order (e.g., first, second, third) in which the experimental materials are presented to them.

Imagine, for example, that participants judge the guilt of an attractive defendant and then judge the guilt of an unattractive defendant. If they judge the unattractive defendant more harshly, this might be because of his unattractiveness. But it could be instead that they judge him more harshly because they are becoming bored or tired. In other words, the order of the conditions is a confounding variable. The attractive condition is always the first condition and the unattractive condition the second. Thus any difference between the conditions in terms of the dependent variable could be caused by the order of the conditions and not the independent variable itself.

There is a solution to the problem of order effects, however, that can be used in many situations. It is counterbalancing, which means testing different participants in different orders. For example, some participants would be tested in the attractive defendant condition followed by the unattractive defendant condition, and others would be tested in the unattractive condition followed by the attractive condition. With three conditions, there would be six different orders (ABC, ACB, BAC, BCA, CAB, and CBA), so some participants would be tested in each of the six orders. With counterbalancing, participants are assigned to orders randomly, using the techniques we have already discussed. Thus random assignment plays an important role in within-subjects designs just as in between-subjects designs. Here, instead of randomly assigning to conditions, they are randomly assigned to different orders of conditions. In fact, it can safely be said that if a study does not involve random assignment in one form or another, it is not an experiment.
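
The bookkeeping for counterbalancing is easy to automate. Here is a small sketch in plain Python; the condition labels and the number of participants are hypothetical:

```python
import random
from itertools import permutations

conditions = ["A", "B", "C"]                     # hypothetical condition labels
orders = list(permutations(conditions))          # ABC, ACB, BAC, BCA, CAB, CBA

participants = [f"P{i}" for i in range(1, 13)]   # 12 hypothetical participants

# Give each of the six orders an equal share of participants,
# then randomly decide which participant receives which order.
slots = orders * (len(participants) // len(orders))
random.shuffle(slots)

for person, order in zip(participants, slots):
    print(person, "->", " then ".join(order))
```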

Because the repeated-measures design requires that each individual participate in more than one treatment, there is always the risk that exposure to the first treatment will cause a change in the participants that influences their scores in the second treatment that have nothing to do with the intervention.  For example, if students are given the same test before and after the intervention the change in the posttest might be because the student got practice taking the test, not because the intervention was successful.

Learning Objectives

Having read this chapter, a student should be able to:

  • identify when it is appropriate to calculate a paired or dependent t-test
  • perform a hypothesis test using the paired or dependent t-test
  • compute and interpret effect size for dependent or paired t-test
  • list the assumptions for running a paired or dependent t-test
  • list the advantages and disadvantages for a repeated measures design

Exercises – Ch. 12

  1. What is the difference between a 1-sample t-test and a dependent-samples t-test? How are they alike?
  2. Name 3 research questions that could be addressed using a dependent-samples t-test.
  3. What are difference scores and why do we calculate them?
  4. Why is the null hypothesis for a dependent-samples t-test always μD = 0?
  5. A researcher is interested in testing whether explaining the processes of statistics helps increase trust in computer algorithms. He wants to test for a difference at the α = 0.05 level and knows that some people may trust the algorithms less after the training, so he uses a two-tailed test. He gathers pre-post data from 35 people and finds that the average difference score is 12.10 with a standard deviation of s = 17.39. Conduct a hypothesis test to answer the research question.
  6. For each of the following, decide whether to reject the null hypothesis:
       • D̄ = 3.50, s = 1.10, n = 12, α = 0.05, two-tailed test
       • 95% CI = (0.20, 1.85)
       • t = 2.98, t* = -2.36, one-tailed test to the left
       • 90% CI = (-1.12, 4.36)
  7. Calculate difference scores for the following data:

Time 1   Time 2   X or D
61       83
75       89
91       98
83       92
74       80
82       88
98       98
82       77
69       88
76       79
91       91
70       80

8. You want to know if an employee’s opinion about an organization is the same as the opinion of that employee’s boss. You collect data from 18 employee-supervisor pairs and code the difference scores so that positive scores indicate that the employee has a higher opinion and negative scores indicate that the boss has a higher opinion (meaning that difference scores of 0 indicate no difference and complete agreement). You find that the mean difference score is D̄ = -3.15 with a standard deviation of sD = 1.97. Test this hypothesis at the α = 0.01 level.

9. Construct confidence intervals from a mean = 1.25, standard error of 0.45, and df = 10 at the 90%, 95%, and 99% confidence level. Describe what happens as confidence changes and whether to reject H0.

10. A professor wants to see how much students learn over the course of a semester. A pre-test is given before the class begins to see what students know ahead of time, and the same test is given at the end of the semester to see what students know at the end. The data are below. Test for an improvement at the α = 0.05 level. Did scores increase? How much did scores increase?

Pretest   Posttest   X
90        8
60        66
95        99
93        91
95        100
67        64
89        91
90        95
94        95
83        89
75        82
87        92
82        83
82        85
88        93
66        69
90        90
93        100
86        95
91        96

Answers to Odd-Numbered Exercises – Ch. 12

1. A 1-sample t-test uses raw scores to compare an average to a specific value. A dependent samples t-test uses two raw scores from each person to calculate difference scores and test the null hypothesis that the average difference score is equal to zero. The calculations, steps, and interpretation are exactly the same for each.

7. See table last column.

Time 1   Time 2   D or X
61       83       22
75       89       14
91       98       7
83       92       9
74       80       6
82       88       6
98       98       0
82       77       -5
69       88       19
76       79       3
91       91       0
70       80       10

9. At the 90% confidence level, t * = 1.812 and CI = (0.43, 2.07) so we reject H 0 . At the 95% confidence level, t * = 2.228 and CI = (0.25, 2.25) so we reject H 0 . At the 99% confidence level, t * = 3.169 and CI = (-0.18, 2.68) so we fail to reject H 0 . As the confidence level goes up, our interval gets wider (which is why we have higher confidence), and eventually we do not reject the null hypothesis because the interval is so wide that it contains 0.
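
The arithmetic behind this answer can be checked in a few lines, assuming Python with scipy is available; the mean, standard error, and degrees of freedom are taken from the exercise:

```python
from scipy import stats

mean, se, df = 1.25, 0.45, 10

for confidence in (0.90, 0.95, 0.99):
    t_star = stats.t.ppf(1 - (1 - confidence) / 2, df)
    lower, upper = mean - t_star * se, mean + t_star * se
    # 90%: about (0.43, 2.07); 95%: about (0.25, 2.25); 99%: about (-0.18, 2.68)
    print(confidence, round(t_star, 3), round(lower, 2), round(upper, 2))
```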

Introduction to Statistics for Psychology Copyright © 2021 by Alisa Beyer is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Stanley Milgram Shock Experiment

Saul McLeod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul McLeod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


Stanley Milgram, a psychologist at Yale University, carried out one of the most famous studies of obedience in psychology.

He conducted an experiment focusing on the conflict between obedience to authority and personal conscience.

Milgram (1963) examined the justifications for acts of genocide offered by those accused at the Nuremberg war crimes trials after World War II. Their defense was often based on obedience: that they were just following orders from their superiors.

The experiments began in July 1961, a year after the trial of Adolf Eichmann in Jerusalem. Milgram devised the experiment to answer the question:

“Could it be that Eichmann and his million accomplices in the Holocaust were just following orders? Could we call them all accomplices?” (Milgram, 1974).

Milgram (1963) wanted to investigate whether Germans were particularly obedient to authority figures, as this was a common explanation for the Nazi killings in World War II.

Milgram selected participants for his experiment by newspaper advertising for male participants to take part in a study of learning at Yale University.

The procedure was that the participant was paired with another person and they drew lots to find out who would be the ‘learner’ and who would be the ‘teacher.’  The draw was fixed so that the participant was always the teacher, and the learner was one of Milgram’s confederates (pretending to be a real participant).


The learner (a confederate called Mr. Wallace) was taken into a room and had electrodes attached to his arms, and the teacher and researcher went into a room next door that contained an electric shock generator and a row of switches marked from 15 volts (Slight Shock) to 375 volts (Danger: Severe Shock) to 450 volts (XXX).

The shocks in Stanley Milgram’s obedience experiments were not real. The “learners” were actors who were part of the experiment and did not actually receive any shocks.

However, the “teachers” (the real participants of the study) believed the shocks were real, which was crucial for the experiment to measure obedience to authority figures even when it involved causing harm to others.

Milgram’s Experiment (1963)

Milgram (1963) was interested in researching how far people would go in obeying an instruction if it involved harming another person.

Stanley Milgram was interested in how easily ordinary people could be influenced into committing atrocities, for example, Germans in WWII.

Volunteers were recruited for what was presented to them as a controlled experiment investigating “learning” (a deception that raises the ethical issues discussed later).

Participants were 40 males, aged between 20 and 50, whose jobs ranged from unskilled to professional, from the New Haven area. They were paid $4.50 for just turning up.


At the beginning of the experiment, they were introduced to another participant, a confederate of the experimenter (Milgram).

They drew straws to determine their roles – learner or teacher – although this was fixed, and the confederate was always the learner. There was also an “experimenter” dressed in a gray lab coat, played by an actor (not Milgram).

Two rooms in the Yale Interaction Laboratory were used – one for the learner (with an electric chair) and another for the teacher and experimenter with an electric shock generator.


The “learner” (Mr. Wallace) was strapped to a chair with electrodes.

After the learner has studied a list of word pairs, the “teacher” tests him by naming a word and asking the learner to recall its partner from a list of four possible choices.

The teacher is told to administer an electric shock every time the learner makes a mistake, increasing the level of shock each time. There were 30 switches on the shock generator marked from 15 volts (slight shock) to 450 (danger – severe shock).


The learner gave mainly wrong answers (on purpose), and for each of these, the teacher gave him an electric shock. When the teacher refused to administer a shock, the experimenter was to give a series of orders/prods to ensure they continued.

There were four prods, and if one was not obeyed, then the experimenter (Mr. Williams) read out the next prod, and so on.

Prod 1: Please continue.
Prod 2: The experiment requires you to continue.
Prod 3: It is absolutely essential that you continue.
Prod 4: You have no other choice but to continue.

These prods were to be used in order, and begun afresh for each new attempt at defiance (Milgram, 1974, p. 21). The experimenter also had two special prods available. These could be used as required by the situation:

  • ‘Although the shocks may be painful, there is no permanent tissue damage, so please go on’ (ibid.).
  • ‘Whether the learner likes it or not, you must go on until he has learned all the word pairs correctly. So please go on’ (ibid., p. 22).

65% (two-thirds) of participants (i.e., teachers) continued to the highest level of 450 volts. All the participants continued to 300 volts.

Milgram did more than one experiment – he carried out 18 variations of his study.  All he did was alter the situation (IV) to see how this affected obedience (DV).

Conclusion 

The individual explanation for the behavior of the participants would be that it was something about them as people that caused them to obey, but a more realistic explanation is that the situation they were in influenced them and caused them to behave in the way that they did.

Some aspects of the situation that may have influenced their behavior include the formality of the location, the behavior of the experimenter, and the fact that it was an experiment for which they had volunteered and been paid.

Ordinary people are likely to follow orders given by an authority figure, even to the extent of killing an innocent human being.  Obedience to authority is ingrained in us all from the way we are brought up.

People tend to obey orders from other people if they recognize their authority as morally right and/or legally based. This response to legitimate authority is learned in a variety of situations, for example in the family, school, and workplace.

Milgram summed up in the article “The Perils of Obedience” (Milgram 1974), writing:

“The legal and philosophic aspects of obedience are of enormous import, but they say very little about how most people behave in concrete situations. I set up a simple experiment at Yale University to test how much pain an ordinary citizen would inflict on another person simply because he was ordered to by an experimental scientist. Stark authority was pitted against the subjects’ [participants’] strongest moral imperatives against hurting others, and, with the subjects’ [participants’] ears ringing with the screams of the victims, authority won more often than not. The extreme willingness of adults to go to almost any lengths on the command of an authority constitutes the chief finding of the study and the fact most urgently demanding explanation.”

Milgram’s Agency Theory

Milgram (1974) explained the behavior of his participants by suggesting that people have two states of behavior when they are in a social situation:

  • The autonomous state – people direct their own actions, and they take responsibility for the results of those actions.
  • The agentic state – people allow others to direct their actions and then pass off the responsibility for the consequences to the person giving the orders. In other words, they act as agents for another person’s will.

Milgram suggested that two things must be in place for a person to enter the agentic state:

  • The person giving the orders is perceived as being qualified to direct other people’s behavior. That is, they are seen as legitimate.
  • The person being ordered about is able to believe that the authority will accept responsibility for what happens.
According to Milgram, when in this agentic state, the participant in the obedience studies “defines himself in a social situation in a manner that renders him open to regulation by a person of higher status. In this condition the individual no longer views himself as responsible for his own actions but defines himself as an instrument for carrying out the wishes of others” (Milgram, 1974, p. 134).

Agency theory says that people will obey an authority when they believe that the authority will take responsibility for the consequences of their actions. This is supported by some aspects of Milgram’s evidence.

For example, when participants were reminded that they had responsibility for their own actions, almost none of them were prepared to obey.

In contrast, many participants who had been refusing to go on continued when the experimenter said that he would take responsibility.

According to Milgram (1974, p. 188):

“The behavior revealed in the experiments reported here is normal human behavior but revealed under conditions that show with particular clarity the danger to human survival inherent in our make-up.

And what is it we have seen? Not aggression, for there is no anger, vindictiveness, or hatred in those who shocked the victim….

Something far more dangerous is revealed: the capacity for man to abandon his humanity, indeed, the inevitability that he does so, as he merges his unique personality into larger institutional structures.”

Milgram Experiment Variations

The Milgram experiment was carried out many times whereby Milgram (1965) varied the basic procedure (changed the IV).  By doing this Milgram could identify which factors affected obedience (the DV).

Obedience was measured by how many participants shocked to the maximum 450 volts (65% in the original study). Stanley Milgram conducted a total of 23 variations (also called conditions or experiments) of his original obedience study:

In total, 636 participants were tested in 18 variation studies conducted between 1961 and 1962 at Yale University.

In the original baseline study, the experimenter wore a gray lab coat to symbolize his authority (a kind of uniform).

The lab coat worn by the experimenter in the original study served as a crucial symbol of scientific authority that increased obedience. The lab coat conveyed expertise and legitimacy, making participants see the experimenter as more credible and trustworthy.

Milgram carried out a variation in which the experimenter was called away because of a phone call right at the start of the procedure.

The role of the experimenter was then taken over by an ‘ordinary member of the public’ (a confederate) in everyday clothes rather than a lab coat. The obedience level dropped to 20%.

Change of Location:  The Mountain View Facility Study (1963, unpublished)

Milgram conducted this variation in a set of offices in a rundown building, claiming it was associated with “Research Associates of Bridgeport” rather than Yale.

The lab’s ordinary appearance was designed to test whether Yale’s prestige encouraged obedience. Participants were led to believe that a private research firm was running the experiment.

In this non-university setting, obedience rates dropped to 47.5% compared to 65% in the original Yale experiments. This suggests that the status of location affects obedience.

Private research firms are viewed as less prestigious than certain universities, which affects behavior. It is easier under these conditions to abandon the belief in the experimenter’s essential decency.

The impressive university setting reinforced the experimenter’s authority and conveyed an implicit approval of the research.

Milgram filmed this variation for his documentary Obedience , but did not publish the results in his academic papers. The study only came to wider light when archival materials, including his notes, films, and data, were studied by later researchers like Perry (2013) in the decades after Milgram’s death.

Two Teacher Condition

When participants could instruct an assistant (confederate) to press the switches, 92.5% shocked to the maximum of 450 volts.

Allowing the participant to instruct an assistant to press the shock switches diffused personal responsibility and likely reduced perceptions of causing direct harm.

By attributing the actions to the assistant rather than themselves, participants could more easily justify shocking to the maximum 450 volts, reflected in the 92.5% obedience rate.

When there is less personal responsibility, obedience increases. This relates to Milgram’s Agency Theory.

Touch Proximity Condition

The teacher had to force the learner’s hand down onto a shock plate when the learner refused to participate after 150 volts. Obedience fell to 30%.

Forcing the learner’s hand onto the shock plate after 150 volts physically connected the teacher to the consequences of their actions. This direct tactile feedback increased the teacher’s personal responsibility.

No longer shielded from the learner’s reactions, the proximity enabled participants to more clearly perceive the harm they were causing, reducing obedience to 30%. Physical distance and indirect actions in the original setup made it easier to rationalize obeying the experimenter.

The participant is no longer buffered/protected from seeing the consequences of their actions.

Social Support Condition

When the two confederates set an example of defiance by refusing to continue the shocks, especially early on at 150 volts, it permitted the real participant also to resist authority.

Two other participants (confederates) were also teachers but refused to obey. Confederate 1 stopped at 150 volts, and Confederate 2 stopped at 210 volts.

Their disobedience provided social proof that it was acceptable to disobey. This modeling of defiance lowered obedience to only 10% compared to 65% without such social support. It demonstrated that social modeling can validate challenging authority.

The presence of others who are seen to disobey the authority figure reduces the level of obedience to 10%.

Absent Experimenter Condition 

It is easier to resist the orders from an authority figure if they are not close by. When the experimenter instructed and prompted the teacher by telephone from another room, obedience fell to 20.5%.

Many participants cheated by skipping shocks or giving a lower voltage than the experimenter ordered. The proximity of authority figures affects obedience.

The physical absence of the authority figure enabled participants to act more freely on their own moral inclinations rather than the experimenter’s commands. This highlighted the role of an authority’s direct presence in influencing behavior.


Recent scholarship using archival records has complicated this picture, however. Analyzing audiotapes of the sessions, Gibson (2013) found considerable variation from the published prod protocol: the prods differed across trials. The point is not that Milgram did poor science, but that the archival materials reveal the limitations of the textbook account of his “standardized” procedure.

The qualitative data like participant feedback, Milgram’s notes, and researchers’ actions provide a fuller, messier picture than the obedience studies’ “official” story. For psychology students, this shows how scientific reporting can polish findings in a way that strays from the less tidy reality.

Critical Evaluation

Inaccurate description of the prod methodology:

A key reason the obedience studies fascinate people is Milgram (1974) presented them as a scientific experiment, contrasting himself as an “empirically grounded scientist” compared to philosophers. He claimed he systematically varied factors to alter obedience rates.

However, recent scholarship using archival records shows Milgram’s account of standardizing the procedure was misleading. For example, he published a list of standardized prods the experimenter used when participants questioned continuing. Milgram said these were delivered uniformly in a firm but polite tone (Gibson, 2013; Perry, 2013; Russell, 2010).

Perry’s (2013) archival research revealed another discrepancy between Milgram’s published account and the actual events. Milgram claimed standardized prods were used when participants resisted, but Perry’s audiotape analysis showed the experimenter often improvised more coercive prods beyond the supposed script.

This off-script prodding varied between experiments and participants, and was especially prevalent with female participants where no gender obedience difference was found – suggesting the improvisation influenced results. Gibson (2013) and Russell (2009) corroborated the experimenter’s departures from the supposed fixed prods. 

Prods were often combined or modified rather than used verbatim as published.

Russell speculated the improvisation aimed to achieve outcomes the experimenter believed Milgram wanted. Milgram seemed to tacitly approve of the deviations by not correcting them when observing.

This raises significant issues around experimenter bias influencing results, lack of standardization compromising validity, and ethical problems with Milgram misrepresenting procedures.

Milgram’s experiment lacked external validity:

The Milgram studies were conducted in laboratory-type conditions, and we must ask if this tells us much about real-life situations.

We obey in a variety of real-life situations that are far more subtle than instructions to give people electric shocks, and it would be interesting to see what factors operate in everyday obedience. The sort of situation Milgram investigated would be more suited to a military context.

Orne and Holland (1968) accused Milgram’s study of lacking ‘experimental realism,’ i.e., participants might not have believed the experimental set-up they found themselves in and knew the learner wasn’t receiving electric shocks.

“It’s more truthful to say that only half of the people who undertook the experiment fully believed it was real, and of those, two-thirds disobeyed the experimenter,” observes Perry (2013, p. 139).

Milgram’s sample was biased:

  • The participants in Milgram’s study were all male. Do the findings transfer to females?
  • Milgram’s study cannot be seen as representative of the American population as his sample was self-selected. This is because they became participants only by electing to respond to a newspaper advertisement (selecting themselves).
  • They may also have a typical “volunteer personality” – not all the newspaper readers responded so perhaps it takes this personality type to do so.

Yet a total of 636 participants were tested in 18 separate experiments across the New Haven area, which was seen as being reasonably representative of a typical American town.

Milgram’s findings have been replicated in a variety of cultures and most lead to the same conclusions as Milgram’s original study and in some cases see higher obedience rates.

However, Smith and Bond (1998) point out that with the exception of Jordan (Shanab & Yahya, 1978), the majority of these studies have been conducted in industrialized Western cultures, and we should be cautious before we conclude that a universal trait of social behavior has been identified.

Selective reporting of experimental findings:

Perry (2013) found Milgram omitted findings from some obedience experiments he conducted, reporting only results supporting his conclusions. A key omission was the Relationship condition (conducted in 1962 but unpublished), where participant pairs were relatives or close acquaintances.

When the learner protested being shocked, most teachers disobeyed, contradicting Milgram’s emphasis on obedience to authority.

Perry argued Milgram likely did not publish this 85% disobedience rate because it undermined his narrative and would be difficult to defend ethically since the teacher and learner knew each other closely.

Milgram’s selective reporting biased interpretations of his findings. His failure to publish all his experiments raises issues around researchers’ ethical obligation to completely and responsibly report their results, not just those fitting their expectations.

Unreported analysis of participants’ skepticism and its impact on their behavior:

Perry (2013) found archival evidence that many participants expressed doubt about the experiment’s setup, impacting their behavior. This supports Orne and Holland’s (1968) criticism that Milgram overlooked participants’ perceptions.

Incongruities like apparent danger, but an unconcerned experimenter likely cued participants that no real harm would occur. Trust in Yale’s ethics reinforced this. Yet Milgram did not publish his assistant’s analysis showing participant skepticism correlated with disobedience rates and varied by condition.

Obedient participants were more skeptical that the learner was harmed. This selective reporting biased interpretations. Additional unreported findings further challenge Milgram’s conclusions.

This highlights issues around thoroughly and responsibly reporting all results, not just those fitting expectations. It shows how archival evidence makes Milgram’s study a contentious classic with questionable methods and conclusions.

Ethical Issues

What are the potential ethical concerns associated with Milgram’s research on obedience?

While not a “contribution to psychology” in the traditional sense, Milgram’s obedience experiments sparked significant debate about the ethics of psychological research.

Baumrind (1964) criticized the ethics of Milgram’s research as participants were prevented from giving their informed consent to take part in the study. 

Participants assumed the experiment was benign and expected to be treated with dignity.

As a result of studies like Milgram’s, the APA and BPS now require researchers to give participants more information before they agree to take part in a study.

The participants actually believed they were shocking a real person and were unaware the learner was a confederate of Milgram’s.

However, Milgram argued that “illusion is used when necessary in order to set the stage for the revelation of certain difficult-to-get-at-truths.”

Milgram also interviewed participants afterward to find out the effect of the deception. Apparently, 83.7% said that they were “glad to be in the experiment,” and 1.3% said that they wished they had not been involved.

Protection of participants 

Participants were exposed to extremely stressful situations that may have the potential to cause psychological harm. Many of the participants were visibly distressed (Baumrind, 1964).

Signs of tension included trembling, sweating, stuttering, laughing nervously, biting lips and digging fingernails into palms of hands. Three participants had uncontrollable seizures, and many pleaded to be allowed to stop the experiment.

Milgram described a businessman reduced to a “twitching stuttering wreck” (1963, p. 377).

In his defense, Milgram argued that these effects were only short-term. Once the participants were debriefed (and could see the confederate was OK), their stress levels decreased.

“At no point,” Milgram (1964) stated, “were subjects exposed to danger and at no point did they run the risk of injurious effects resulting from participation” (p. 849).

To defend himself against criticisms about the ethics of his obedience research, Milgram cited follow-up survey data showing that 84% of participants said they were glad they had taken part in the study.

Milgram used this to claim that the study caused no serious or lasting harm, since most participants retrospectively did not regret their involvement.

Yet archival accounts show many participants endured lasting distress, even trauma, refuting Milgram’s insistence the study caused only fleeting “excitement.” By not debriefing all, Milgram misled participants about the true risks involved (Perry, 2013).

However, Milgram did debrief the participants fully after the experiment and also followed up after a period of time to ensure that they came to no harm.

Milgram debriefed all his participants straight after the experiment and disclosed the true nature of the experiment.

Participants were assured that their behavior was common, and Milgram also followed the sample up a year later and found no signs of any long-term psychological harm.

The majority of the participants (83.7%) said that they were pleased that they had participated, and 74% had learned something of personal importance.

Perry’s (2013) archival research found Milgram misrepresented debriefing – around 600 participants were not properly debriefed soon after the study, contrary to his claims. Many only learned no real shocks occurred when reading a mailed study report months later, which some may have not received.

Milgram likely misreported debriefing details to protect his credibility and enable future obedience research. This raises issues around properly informing and debriefing participants that connect to APA ethics codes developed partly in response to Milgram’s study.

Right to Withdrawal 

The BPS states that researchers should make it plain to participants that they are free to withdraw at any time (regardless of payment).

When participants expressed doubts, the experimenter assured them all was well. Trusting Yale scientists, many took the experimenter at his word that “no permanent tissue damage” would occur, and continued administering shocks despite reservations.

Did Milgram give participants an opportunity to withdraw? The experimenter gave four verbal prods which mostly discouraged withdrawal from the experiment:

  • Please continue.
  • The experiment requires that you continue.
  • It is absolutely essential that you continue.
  • You have no other choice, you must go on.

Milgram argued that they were justified as the study was about obedience, so orders were necessary.

Milgram pointed out that although the right to withdraw was made partially difficult, it was possible as 35% of participants had chosen to withdraw.

Replications

Direct replications have not been possible due to current ethical standards . However, several researchers have conducted partial replications and variations that aim to reproduce some aspects of Milgram’s methods ethically.

One important replication was conducted by Jerry Burger in 2009. Burger’s partial replication included several safeguards to protect participant welfare, such as screening out high-risk individuals, repeatedly reminding participants they could withdraw, and stopping at the 150-volt shock level. This was the point where Milgram’s participants first heard the learner’s protests.

As 79% of Milgram’s participants who went past 150 volts continued to the maximum 450 volts, Burger (2009) argued that 150 volts provided a reasonable estimate for obedience levels. He found 70% of participants continued to 150 volts, compared to 82.5% in Milgram’s comparable condition.

Another replication by Thomas Blass (1999) examined whether obedience rates had declined over time due to greater public awareness of the experiments. Blass correlated obedience rates from replication studies between 1963 and 1985 and found no relationship between year and obedience level. He concluded that obedience rates have not systematically changed, providing evidence against the idea of “enlightenment effects”.

Some variations have explored the role of gender. Milgram found equal rates of obedience for male and female participants. Reviews have found most replications also show no gender difference, with a couple of exceptions (Blass, 1999). For example, Kilham and Mann (1974) found lower obedience in female participants.

Partial replications have also examined situational factors. Having another person model defiance reduced obedience compared to a solo participant in one study, but did not eliminate it (Burger, 2009). The authority figure’s perceived expertise seems to be an influential factor (Blass, 1999). Replications have supported Milgram’s observation that stepwise increases in demands promote obedience.

Personality factors have been studied as well. Traits like high empathy and desire for control correlate with some minor early hesitation, but do not greatly impact eventual obedience levels (Burger, 2009). Authoritarian tendencies may contribute to obedience (Elms, 2009).

In sum, the partial replications confirm Milgram’s degree of obedience. Though ethical constraints prevent full reproductions, the key elements of his procedure seem to consistently elicit high levels of compliance across studies, samples, and eras. The replications continue to highlight the power of situational pressures to yield obedience.


Why was the Milgram experiment so controversial?

The Milgram experiment was controversial because it revealed people’s willingness to obey authority figures even when causing harm to others, raising ethical concerns about the psychological distress inflicted upon participants and the deception involved in the study.

Would Milgram’s experiment be allowed today?

Milgram’s experiment would likely not be allowed today in its original form, as it violates modern ethical guidelines for research involving human participants, particularly regarding informed consent, deception, and protection from psychological harm.

Did anyone refuse the Milgram experiment?

Yes, in the Milgram experiment, some participants refused to continue administering shocks, demonstrating individual variation in obedience to authority figures. In the original Milgram experiment, approximately 35% of participants refused to administer the highest shock level of 450 volts, while 65% obeyed and delivered the 450-volt shock.

How can Milgram’s study be applied to real life?

Milgram’s study can be applied to real life by demonstrating the potential for ordinary individuals to obey authority figures even when it involves causing harm, emphasizing the importance of questioning authority, ethical decision-making, and fostering critical thinking in societal contexts.

Were all participants in Milgram’s experiments male?

Yes, in the original Milgram experiment conducted in 1961, all participants were male, limiting the generalizability of the findings to women and diverse populations.

Why was the Milgram experiment unethical?

The Milgram experiment was considered unethical because participants were deceived about the true nature of the study and subjected to severe emotional distress. They believed they were causing harm to another person under the instruction of authority.

Additionally, participants were not given the right to withdraw freely and were subjected to intense pressure to continue. The psychological harm and lack of informed consent violates modern ethical guidelines for research.

Baumrind, D. (1964). Some thoughts on ethics of research: After reading Milgram’s “Behavioral Study of Obedience.” American Psychologist, 19(6), 421.

Blass, T. (1999). The Milgram paradigm after 35 years: Some things we now know about obedience to authority. Journal of Applied Social Psychology, 29(5), 955-978.

Brannigan, A., Nicholson, I., & Cherry, F. (2015). Introduction to the special issue: Unplugging the Milgram machine.  Theory & Psychology ,  25 (5), 551-563.

Burger, J. M. (2009). Replicating Milgram: Would people still obey today? American Psychologist, 64 , 1–11.

Elms, A. C. (2009). Obedience lite. American Psychologist, 64 (1), 32–36.

Gibson, S. (2013). Milgram’s obedience experiments: A rhetorical analysis. British Journal of Social Psychology, 52, 290–309.

Gibson, S. (2017). Developing psychology’s archival sensibilities: Revisiting Milgram’s ‘obedience’ experiments. Qualitative Psychology, 4(1), 73.

Griggs, R. A., Blyler, J., & Jackson, S. L. (2020). Using research ethics as a springboard for teaching Milgram’s obedience study as a contentious classic.  Scholarship of Teaching and Learning in Psychology ,  6 (4), 350.

Haslam, S. A., & Reicher, S. D. (2018). A truth that does not always speak its name: How Hollander and Turowetz’s findings confirm and extend the engaged followership analysis of harm-doing in the Milgram paradigm. British Journal of Social Psychology, 57, 292–300.

Haslam, S. A., Reicher, S. D., & Birney, M. E. (2016). Questioning authority: New perspectives on Milgram’s ‘obedience’ research and its implications for intergroup relations. Current Opinion in Psychology, 11 , 6–9.

Haslam, S. A., Reicher, S. D., Birney, M. E., Millard, K., & McDonald, R. (2015). ‘Happy to have been of service’: The Yale archive as a window into the engaged followership of participants in Milgram’s ‘obedience’ experiment. British Journal of Social Psychology, 54 , 55–83.

Kaplan, D. E. (1996). The Stanley Milgram papers: A case study on appraisal of and access to confidential data files. American Archivist, 59 , 288–297.

Kaposi, D. (2022). The second wave of critical engagement with Stanley Milgram’s ‘obedience to authority’ experiments: What did we learn? Social and Personality Psychology Compass, 16(6), e12667.

Kilham, W., & Mann, L. (1974). Level of destructive obedience as a function of transmitter and executant roles in the Milgram obedience paradigm. Journal of Personality and Social Psychology, 29 (5), 696–702.

Milgram, S. (1963). Behavioral study of obedience . Journal of Abnormal and Social Psychology , 67, 371-378.

Milgram, S. (1964). Issues in the study of obedience: A reply to Baumrind. American Psychologist, 19 , 848–852.

Milgram, S. (1965). Some conditions of obedience and disobedience to authority . Human Relations, 18(1) , 57-76.

Milgram, S. (1974). Obedience to authority: An experimental view . Harpercollins.

Miller, A. G. (2009). Reflections on “Replicating Milgram” (Burger, 2009). American Psychologist, 64(1), 20-27.

Nicholson, I. (2011). “Torture at Yale”: Experimental subjects, laboratory torment and the “rehabilitation” of Milgram’s “obedience to authority”. Theory & Psychology, 21 , 737–761.

Nicholson, I. (2015). The normalization of torment: Producing and managing anguish in Milgram’s “obedience” laboratory. Theory & Psychology, 25 , 639–656.

Orne, M. T., & Holland, C. H. (1968). On the ecological validity of laboratory deceptions. International Journal of Psychiatry, 6 (4), 282-293.


Perry, G. (2013). Behind the shock machine: The untold story of the notorious Milgram psychology experiments. The New Press.

Reicher, S., Haslam, A., & Miller, A. (Eds.). (2014). Milgram at 50: Exploring the enduring relevance of psychology’s most famous studies [Special issue]. Journal of Social Issues, 70(3), 393–602.

Russell, N. (2014). Stanley Milgram’s obedience to authority “relationship condition”: Some methodological and theoretical implications. Social Sciences, 3, 194–214.

Shanab, M. E., & Yahya, K. A. (1978). A cross-cultural study of obedience. Bulletin of the Psychonomic Society.

Smith, P. B., & Bond, M. H. (1998). Social psychology across cultures (2nd ed.). Prentice Hall.

Further Reading

  • The power of the situation: The impact of Milgram’s obedience studies on personality and social psychology
  • Seeing is believing: The role of the film Obedience in shaping perceptions of Milgram’s Obedience to Authority Experiments
  • Replicating Milgram: Would people still obey today?

Learning Check

Which is true regarding the Milgram obedience study?
  • The aim was to see how obedient people would be in a situation where following orders would mean causing harm to another person.
  • Participants were under the impression they were part of a learning and memory experiment.
  • The “learners” in the study were actual participants who volunteered to be shocked as part of the experiment.
  • The “learner” was an actor who was in on the experiment and never actually received any real shocks.
  • Although the participant could not see the “learner”, he was able to hear him clearly through the wall.
  • The study was directly influenced by Milgram’s observations of obedience patterns in post-war Europe.
  • The experiment was designed to understand the psychological mechanisms behind war crimes committed during World War II.
  • The Milgram study was universally accepted in the psychological community, and no ethical concerns were raised about its methodology.
  • When Milgram’s experiment was repeated in a rundown office building in Bridgeport, the percentage of the participants who fully complied with the commands of the experimenter remained unchanged.
  • The experimenter (authority figure) delivered verbal prods to encourage the teacher to continue, such as ‘Please continue’ or ‘Please go on’.
  • Over 80% of participants went on to deliver the maximum level of shock.
  • Milgram sent participants questionnaires after the study to assess the effects and found that most felt no remorse or guilt, so it was ethical.
  • The aftermath of the study led to stricter ethical guidelines in psychological research.
  • The study emphasized the role of situational factors over personality traits in determining obedience.

Answers: Items 3, 8, 9, and 11 are the false statements.

Short Answer Questions
  • Briefly explain the results of the original Milgram experiments. What did these results prove?
  • List one scenario in which an authority figure could abuse obedience principles.
  • List one scenario in which an individual could use these principles to defend their peers.
  • In a hospital, you are very likely to obey a nurse. However, if you meet her outside the hospital, for example in a shop, you are much less likely to obey. Using your knowledge of how people resist pressure to obey, explain why you are less likely to obey the nurse outside the hospital.
  • Describe the shock instructions the participant (teacher) was told to follow when the victim (learner) gave an incorrect answer.
  • State the lowest voltage shock that was labeled on the shock generator.
  • What would likely happen if Milgram’s experiment included a condition in which the participant (teacher) had to give a high-level electric shock for the first wrong answer?
Group Activity

Gather in groups of three or four to discuss answers to the short answer questions above.

For questions 2 and 3, review the different scenarios you each came up with. Then brainstorm how these situations could be flipped.

For question 2, discuss how an authority figure could instead empower those below them in the examples your groupmates provide.

For question 3, discuss how a peer could do harm by using the obedience principles in the scenarios your groupmates provide.

Essay Topic
  • What’s the most important lesson of Milgram’s Obedience Experiments? Fully explain and defend your answer.
  • Milgram selectively edited his film of the obedience experiments to emphasize obedient behavior and minimize footage of disobedience. What are the ethical implications of a researcher selectively presenting findings in a way that fits their expected conclusions?


19+ Experimental Design Examples (Methods + Types)


Ever wondered how scientists discover new medicines, psychologists learn about behavior, or even how marketers figure out what kind of ads you like? Well, they all have something in common: they use a special plan or recipe called an "experimental design."

Imagine you're baking cookies. You can't just throw random amounts of flour, sugar, and chocolate chips into a bowl and hope for the best. You follow a recipe, right? Scientists and researchers do something similar. They follow a "recipe" called an experimental design to make sure their experiments are set up in a way that the answers they find are meaningful and reliable.

Experimental design is the roadmap researchers use to answer questions. It's a set of rules and steps that researchers follow to collect information, or "data," in a way that is fair, accurate, and makes sense.


Long ago, people didn't have detailed game plans for experiments. They often just tried things out and saw what happened. But over time, people got smarter about this. They started creating structured plans—what we now call experimental designs—to get clearer, more trustworthy answers to their questions.

In this article, we'll take you on a journey through the world of experimental designs. We'll talk about the different types, or "flavors," of experimental designs, where they're used, and even give you a peek into how they came to be.

What Is Experimental Design?

Alright, before we dive into the different types of experimental designs, let's get crystal clear on what experimental design actually is.

Imagine you're a detective trying to solve a mystery. You need clues, right? Well, in the world of research, experimental design is like the roadmap that helps you find those clues. It's like the game plan in sports or the blueprint when you're building a house. Just like you wouldn't start building without a good blueprint, researchers won't start their studies without a strong experimental design.

So, why do we need experimental design? Think about baking a cake. If you toss ingredients into a bowl without measuring, you'll end up with a mess instead of a tasty dessert.

Similarly, in research, if you don't have a solid plan, you might get confusing or incorrect results. A good experimental design helps you ask the right questions (think critically), decide what to measure (come up with an idea), and figure out how to measure it (test it). It also helps you consider things that might mess up your results, like outside influences you hadn't thought of.

For example, let's say you want to find out if listening to music helps people focus better. Your experimental design would help you decide things like: Who are you going to test? What kind of music will you use? How will you measure focus? And, importantly, how will you make sure that it's really the music affecting focus and not something else, like the time of day or whether someone had a good breakfast?

In short, experimental design is the master plan that guides researchers through the process of collecting data, so they can answer questions in the most reliable way possible. It's like the GPS for the journey of discovery!

History of Experimental Design

Around 350 BCE, people like Aristotle were trying to figure out how the world works, but they mostly just thought really hard about things. They didn't test their ideas much. So while they were super smart, their methods weren't always the best for finding out the truth.

Fast forward to the Renaissance (14th to 17th centuries), a time of big changes and lots of curiosity. People like Galileo started to experiment by actually doing tests, like rolling balls down inclined planes to study motion. Galileo's work was cool because he combined thinking with doing. He'd have an idea, test it, look at the results, and then think some more. This approach was a lot more reliable than just sitting around and thinking.

Now, let's zoom ahead to the 18th and 19th centuries. This is when people like Francis Galton, an English polymath, started to get really systematic about experimentation. Galton was obsessed with measuring things. Seriously, he even tried to measure how good-looking people were! His work helped create the foundations for a more organized approach to experiments.

Next stop: the early 20th century. Enter Ronald A. Fisher, a brilliant British statistician. Fisher was a game-changer. He came up with ideas that are like the bread and butter of modern experimental design.

Fisher formalized the idea of the "control group"—that's a group of people or things that don't get the treatment you're testing, so you can compare them to those who do. He also stressed the importance of "randomization," which means assigning people or things to different groups by chance, like drawing names out of a hat. This makes sure the experiment is fair and the results are trustworthy.

Around the same time, American psychologists like John B. Watson and B.F. Skinner were developing "behaviorism." They focused on studying things that they could directly observe and measure, like actions and reactions.

Skinner even built boxes—called Skinner boxes—to test how animals like pigeons and rats learn. Their work helped shape how psychologists design experiments today. Watson performed a very controversial experiment, called the Little Albert experiment, that helped describe behavior through conditioning—in other words, how people learn to behave the way they do.

In the later part of the 20th century and into our time, computers have totally shaken things up. Researchers now use super powerful software to help design their experiments and crunch the numbers.

With computers, they can simulate complex experiments before they even start, which helps them predict what might happen. This is especially helpful in fields like medicine, where getting things right can be a matter of life and death.

Also, did you know that experimental designs aren't just for scientists in labs? They're used by people in all sorts of jobs, like marketing, education, and even video game design! Yes, someone probably ran an experiment to figure out what makes a game super fun to play.

So there you have it—a quick tour through the history of experimental design, from Aristotle's deep thoughts to Fisher's groundbreaking ideas, and all the way to today's computer-powered research. These designs are the recipes that help people from all walks of life find answers to their big questions.

Key Terms in Experimental Design

Before we dig into the different types of experimental designs, let's get comfy with some key terms. Understanding these terms will make it easier for us to explore the various types of experimental designs that researchers use to answer their big questions.

Independent Variable: This is what you change or control in your experiment to see what effect it has. Think of it as the "cause" in a cause-and-effect relationship. For example, if you're studying whether different types of music help people focus, the kind of music is the independent variable.

Dependent Variable: This is what you're measuring to see the effect of your independent variable. In our music and focus experiment, how well people focus is the dependent variable—it's what "depends" on the kind of music played.

Control Group: This is a group of people who don't get the special treatment or change you're testing. They help you see what happens when the independent variable is not applied. If you're testing whether a new medicine works, the control group would take a fake pill, called a placebo, instead of the real medicine.

Experimental Group: This is the group that gets the special treatment or change you're interested in. Going back to our medicine example, this group would get the actual medicine to see if it has any effect.

Randomization: This is like shaking things up in a fair way. You randomly put people into the control or experimental group so that each group is a good mix of different kinds of people. This helps make the results more reliable.

Sample: This is the group of people you're studying. They're a "sample" of a larger group that you're interested in. For instance, if you want to know how teenagers feel about a new video game, you might study a sample of 100 teenagers.

Bias: This is anything that might tilt your experiment one way or another without you realizing it. Like if you're testing a new kind of dog food and you only test it on poodles, that could create a bias because maybe poodles just really like that food and other breeds don't.

Data: This is the information you collect during the experiment. It's like the treasure you find on your journey of discovery!

Replication: This means doing the experiment more than once to make sure your findings hold up. It's like double-checking your answers on a test.

Hypothesis: This is your educated guess about what will happen in the experiment. It's like predicting the end of a movie based on the first half.
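
To make these terms concrete, here is a tiny sketch in Python (with invented names and numbers, not real data) showing how they might map onto the music-and-focus example:

```python
import random

# Sample: a made-up group of eight participants
sample = ["Ana", "Ben", "Cleo", "Dev", "Ema", "Finn", "Gia", "Hugo"]

# Hypothesis (our educated guess): "People focus better with instrumental music."

# Randomization: shuffle the sample, then split it in half
random.shuffle(sample)
experimental_group = sample[:4]  # hears instrumental music (independent variable applied)
control_group = sample[4:]       # works in silence (independent variable not applied)

# Dependent variable: a focus score each person earns on the same puzzle task.
# Here the scores are just invented numbers standing in for real measurements.
focus_scores = {name: random.randint(50, 100) for name in sample}

avg_experimental = sum(focus_scores[p] for p in experimental_group) / len(experimental_group)
avg_control = sum(focus_scores[p] for p in control_group) / len(control_group)

print("Experimental group average focus:", avg_experimental)
print("Control group average focus:", avg_control)
```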

Steps of Experimental Design

Alright, let's say you're all fired up and ready to run your own experiment. Cool! But where do you start? Well, designing an experiment is a bit like planning a road trip. There are some key steps you've got to take to make sure you reach your destination. Let's break it down:

  • Ask a Question: Before you hit the road, you've got to know where you're going. Same with experiments. You start with a question you want to answer, like "Does eating breakfast really make you do better in school?"
  • Do Some Homework: Before you pack your bags, you look up the best places to visit, right? In science, this means reading up on what other people have already discovered about your topic.
  • Form a Hypothesis: This is your educated guess about what you think will happen. It's like saying, "I bet this route will get us there faster."
  • Plan the Details: Now you decide what kind of car you're driving (your experimental design), who's coming with you (your sample), and what snacks to bring (your variables).
  • Randomization: Remember, this is like shuffling a deck of cards. You want to mix up who goes into your control and experimental groups to make sure it's a fair test.
  • Run the Experiment: Finally, the rubber hits the road! You carry out your plan, making sure to collect your data carefully.
  • Analyze the Data: Once the trip's over, you look at your photos and decide which ones are keepers. In science, this means looking at your data to see what it tells you.
  • Draw Conclusions: Based on your data, did you find an answer to your question? This is like saying, "Yep, that route was faster," or "Nope, we hit a ton of traffic."
  • Share Your Findings: After a great trip, you want to tell everyone about it, right? Scientists do the same by publishing their results so others can learn from them.
  • Do It Again?: Sometimes one road trip just isn't enough. In the same way, scientists often repeat their experiments to make sure their findings are solid.

So there you have it! Those are the basic steps you need to follow when you're designing an experiment. Each step helps make sure that you're setting up a fair and reliable way to find answers to your big questions.

Let's get into examples of experimental designs.

1) True Experimental Design


In the world of experiments, the True Experimental Design is like the superstar quarterback everyone talks about. Born out of the early 20th-century work of statisticians like Ronald A. Fisher, this design is all about control, precision, and reliability.

Researchers carefully pick an independent variable to manipulate (remember, that's the thing they're changing on purpose) and measure the dependent variable (the effect they're studying). Then comes the magic trick—randomization. By randomly putting participants into either the control or experimental group, scientists make sure their experiment is as fair as possible.

No sneaky biases here!

True Experimental Design Pros

The pros of True Experimental Design are like the perks of a VIP ticket at a concert: you get the best and most trustworthy results. Because everything is controlled and randomized, you can feel pretty confident that the results aren't just a fluke.

True Experimental Design Cons

However, there's a catch. Sometimes, it's really tough to set up these experiments in a real-world situation. Imagine trying to control every single detail of your day, from the food you eat to the air you breathe. Not so easy, right?

True Experimental Design Uses

The fields that get the most out of True Experimental Designs are those that need super reliable results, like medical research.

When scientists were developing COVID-19 vaccines, they used this design to run clinical trials. They had control groups that received a placebo (a harmless substance with no effect) and experimental groups that got the actual vaccine. Then they measured how many people in each group got sick. By comparing the two, they could say, "Yep, this vaccine works!"
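
As a rough illustration of that comparison, here is a toy simulation in Python. The group sizes and infection risks are invented for the sketch and are not real trial numbers:

```python
import random

random.seed(1)  # fixed seed so the toy example is repeatable

n_per_group = 1000
placebo_risk = 0.05   # invented chance of getting sick without the vaccine
vaccine_risk = 0.01   # invented chance of getting sick with the vaccine

placebo_cases = sum(random.random() < placebo_risk for _ in range(n_per_group))
vaccine_cases = sum(random.random() < vaccine_risk for _ in range(n_per_group))

print(f"Placebo group: {placebo_cases} / {n_per_group} got sick")
print(f"Vaccine group: {vaccine_cases} / {n_per_group} got sick")
print("Estimated risk reduction:",
      round(1 - (vaccine_cases / n_per_group) / (placebo_cases / n_per_group), 2))
```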

So next time you read about a groundbreaking discovery in medicine or technology, chances are a True Experimental Design was the VIP behind the scenes, making sure everything was on point. It's been the go-to for rigorous scientific inquiry for nearly a century, and it's not stepping off the stage anytime soon.

2) Quasi-Experimental Design

So, let's talk about the Quasi-Experimental Design. Think of this one as the cool cousin of True Experimental Design. It wants to be just like its famous relative, but it's a bit more laid-back and flexible. You'll find quasi-experimental designs when it's tricky to set up a full-blown True Experimental Design with all the bells and whistles.

Quasi-experiments still play with an independent variable, just like their stricter cousins. The big difference? They don't use randomization. It's like wanting to divide a bag of jelly beans equally between your friends, but you can't quite do it perfectly.

In real life, it's often not possible or ethical to randomly assign people to different groups, especially when dealing with sensitive topics like education or social issues. And that's where quasi-experiments come in.

Quasi-Experimental Design Pros

Even though they lack full randomization, quasi-experimental designs are like the Swiss Army knives of research: versatile and practical. They're easier to set up and often cheaper than true experiments, and they're especially popular in fields like education, sociology, and public policy.

For instance, when researchers wanted to figure out if the Head Start program, aimed at giving young kids a "head start" in school, was effective, they used a quasi-experimental design. They couldn't randomly assign kids to go or not go to preschool, but they could compare kids who did with kids who didn't.

Quasi-Experimental Design Cons

Of course, quasi-experiments come with their own trade-offs. Because the groups aren't randomly assigned, the conclusions aren't as rock-solid, and there's always that little voice saying, "Hey, are we missing something here?"

Quasi-Experimental Design Uses

Quasi-Experimental Design gained traction in the mid-20th century. Researchers were grappling with real-world problems that didn't fit neatly into a laboratory setting. Plus, as society became more aware of ethical considerations, the need for flexible designs increased. So, the quasi-experimental approach was like a breath of fresh air for scientists wanting to study complex issues without a laundry list of restrictions.

In short, if True Experimental Design is the superstar quarterback, Quasi-Experimental Design is the versatile player who can adapt and still make significant contributions to the game.

3) Pre-Experimental Design

Now, let's talk about the Pre-Experimental Design. Imagine it as the beginner's skateboard you get before you try out for all the cool tricks. It has wheels, it rolls, but it's not built for the professional skatepark.

Similarly, pre-experimental designs give researchers a starting point. They let you dip your toes in the water of scientific research without diving in head-first.

So, what's the deal with pre-experimental designs?

Pre-Experimental Designs are the basic, no-frills versions of experiments. Researchers still mess around with an independent variable and measure a dependent variable, but they skip over the whole randomization thing and often don't even have a control group.

It's like baking a cake but forgetting the frosting and sprinkles; you'll get some results, but they might not be as complete or reliable as you'd like.

Pre-Experimental Design Pros

Why use such a simple setup? Because sometimes, you just need to get the ball rolling. Pre-experimental designs are great for quick-and-dirty research when you're short on time or resources. They give you a rough idea of what's happening, which you can use to plan more detailed studies later.

A good example of this is early studies on the effects of screen time on kids. Researchers couldn't control every aspect of a child's life, but they could easily ask parents to track how much time their kids spent in front of screens and then look for trends in behavior or school performance.

Pre-Experimental Design Cons

But here's the catch: pre-experimental designs are like that first draft of an essay. It helps you get your ideas down, but you wouldn't want to turn it in for a grade. Because these designs lack the rigorous structure of true or quasi-experimental setups, they can't give you rock-solid conclusions. They're more like clues or signposts pointing you in a certain direction.

Pre-Experimental Design Uses

This type of design became popular in the early stages of various scientific fields. Researchers used them to scratch the surface of a topic, generate some initial data, and then decide if it's worth exploring further. In other words, pre-experimental designs were the stepping stones that led to more complex, thorough investigations.

So, while Pre-Experimental Design may not be the star player on the team, it's like the practice squad that helps everyone get better. It's the starting point that can lead to bigger and better things.

4) Factorial Design

Now, buckle up, because we're moving into the world of Factorial Design, the multi-tasker of the experimental universe.

Imagine juggling not just one, but multiple balls in the air—that's what researchers do in a factorial design.

In Factorial Design, researchers are not satisfied with just studying one independent variable. Nope, they want to study two or more at the same time to see how they interact.

It's like cooking with several spices to see how they blend together to create unique flavors.

Factorial Design became the talk of the town with the rise of computers. Why? Because this design produces a lot of data, and computers are the number crunchers that help make sense of it all. So, thanks to our silicon friends, researchers can study complicated questions like, "How do diet AND exercise together affect weight loss?" instead of looking at just one of those factors.
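
As a rough sketch of that diet-and-exercise question, here is a toy 2x2 factorial example in Python. The weight-change numbers are invented purely to show what an interaction looks like:

```python
# Toy 2x2 factorial data: weight change (kg) for each combination of diet and exercise
results = {
    ("diet", "exercise"):       [-4.0, -4.5, -5.0],
    ("diet", "no exercise"):    [-1.0, -1.5, -1.2],
    ("no diet", "exercise"):    [-1.1, -0.9, -1.3],
    ("no diet", "no exercise"): [0.2, 0.0, -0.1],
}

cell_means = {cell: sum(vals) / len(vals) for cell, vals in results.items()}
for cell, mean in cell_means.items():
    print(cell, round(mean, 2))

# Interaction check: is the effect of exercise bigger when dieting than when not?
effect_with_diet = cell_means[("diet", "exercise")] - cell_means[("diet", "no exercise")]
effect_without_diet = cell_means[("no diet", "exercise")] - cell_means[("no diet", "no exercise")]
print("Exercise effect while dieting:    ", round(effect_with_diet, 2))
print("Exercise effect while not dieting:", round(effect_without_diet, 2))
```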

Factorial Design Pros

This design's main selling point is its ability to explore interactions between variables. For instance, maybe a new study drug works really well for young people but not so great for older adults. A factorial design could reveal that age is a crucial factor, something you might miss if you only studied the drug's effectiveness in general. It's like being a detective who looks for clues not just in one room but throughout the entire house.

Factorial Design Cons

However, factorial designs have their own bag of challenges. First off, they can be pretty complicated to set up and run. Imagine coordinating a four-way intersection with lots of cars coming from all directions—you've got to make sure everything runs smoothly, or you'll end up with a traffic jam. Similarly, researchers need to carefully plan how they'll measure and analyze all the different variables.

Factorial Design Uses

Factorial designs are widely used in psychology to untangle the web of factors that influence human behavior. They're also popular in fields like marketing, where companies want to understand how different aspects like price, packaging, and advertising influence a product's success.

And speaking of success, the factorial design has been a hit since statisticians like Ronald A. Fisher (yep, him again!) expanded on it in the early-to-mid 20th century. It offered a more nuanced way of understanding the world, proving that sometimes, to get the full picture, you've got to juggle more than one ball at a time.

So, if True Experimental Design is the quarterback and Quasi-Experimental Design is the versatile player, Factorial Design is the strategist who sees the entire game board and makes moves accordingly.

5) Longitudinal Design


Alright, let's take a step into the world of Longitudinal Design. Picture it as the grand storyteller, the kind who doesn't just tell you about a single event but spins an epic tale that stretches over years or even decades. This design isn't about quick snapshots; it's about capturing the whole movie of someone's life or a long-running process.

You know how you might take a photo every year on your birthday to see how you've changed? Longitudinal Design is kind of like that, but for scientific research.

With Longitudinal Design, instead of measuring something just once, researchers come back again and again, sometimes over many years, to see how things are going. This helps them understand not just what's happening, but why it's happening and how it changes over time.

This design really started to shine in the latter half of the 20th century, when researchers began to realize that some questions can't be answered in a hurry. Think about studies that look at how kids grow up, or research on how a certain medicine affects you over a long period. These aren't things you can rush.

The famous Framingham Heart Study, started in 1948, is a prime example. It's been studying heart health in a small town in Massachusetts for decades, and the findings have shaped what we know about heart disease.

Longitudinal Design Pros

So, what's to love about Longitudinal Design? First off, it's the go-to for studying change over time, whether that's how people age or how a forest recovers from a fire.

Longitudinal Design Cons

But it's not all sunshine and rainbows. Longitudinal studies take a lot of patience and resources. Plus, keeping track of participants over many years can be like herding cats—difficult and full of surprises.

Longitudinal Design Uses

Despite these challenges, longitudinal studies have been key in fields like psychology, sociology, and medicine. They provide the kind of deep, long-term insights that other designs just can't match.

So, if the True Experimental Design is the superstar quarterback, and the Quasi-Experimental Design is the flexible athlete, then the Factorial Design is the strategist, and the Longitudinal Design is the wise elder who has seen it all and has stories to tell.

6) Cross-Sectional Design

Now, let's flip the script and talk about Cross-Sectional Design, the polar opposite of the Longitudinal Design. If Longitudinal is the grand storyteller, think of Cross-Sectional as the snapshot photographer. It captures a single moment in time, like a selfie that you take to remember a fun day. Researchers using this design collect all their data at one point, providing a kind of "snapshot" of whatever they're studying.

In a Cross-Sectional Design, researchers look at multiple groups all at the same time to see how they're different or similar.

This design rose to popularity in the mid-20th century, mainly because it's so quick and efficient. Imagine wanting to know how people of different ages feel about a new video game. Instead of waiting for years to see how opinions change, you could just ask people of all ages what they think right now. That's Cross-Sectional Design for you—fast and straightforward.

You'll find this type of research everywhere from marketing studies to healthcare. For instance, you might have heard about surveys asking people what they think about a new product or political issue. Those are usually cross-sectional studies, aimed at getting a quick read on public opinion.

Cross-Sectional Design Pros

So, what's the big deal with Cross-Sectional Design? Well, it's the go-to when you need answers fast and don't have the time or resources for a more complicated setup.

Cross-Sectional Design Cons

Remember, speed comes with trade-offs. While you get your results quickly, those results are stuck in time. They can't tell you how things change or why they're changing, just what's happening right now.

Cross-Sectional Design Uses

Also, because they're so quick and simple, cross-sectional studies often serve as the first step in research. They give scientists an idea of what's going on so they can decide if it's worth digging deeper. In that way, they're a bit like a movie trailer, giving you a taste of the action to see if you're interested in seeing the whole film.

So, in our lineup of experimental designs, if True Experimental Design is the superstar quarterback and Longitudinal Design is the wise elder, then Cross-Sectional Design is like the speedy running back—fast, agile, but not designed for long, drawn-out plays.

7) Correlational Design

Next on our roster is the Correlational Design, the keen observer of the experimental world. Imagine this design as the person at a party who loves people-watching. They don't interfere or get involved; they just observe and take mental notes about what's going on.

In a correlational study, researchers don't change or control anything; they simply observe and measure how two variables relate to each other.

The correlational design has roots in the early days of psychology and sociology. Pioneers like Sir Francis Galton used it to study how qualities like intelligence or height could be related within families.

This design is all about asking, "Hey, when this thing happens, does that other thing usually happen too?" For example, researchers might study whether students who have more study time get better grades or whether people who exercise more have lower stress levels.
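
To show what "measuring a relationship" can look like in practice, here is a small Python sketch that computes a correlation coefficient from invented study-time and grade data:

```python
# Invented data: weekly study hours and exam scores for ten students
study_hours = [2, 4, 5, 3, 8, 7, 1, 6, 9, 5]
exam_scores = [55, 62, 70, 58, 88, 80, 50, 75, 92, 68]

def pearson_r(x, y):
    """Pearson correlation: +1 is a perfect positive link, -1 a perfect negative one."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

print("Correlation between study time and grades:",
      round(pearson_r(study_hours, exam_scores), 2))
```

Remember, even a strong correlation like this one only says the two things move together; it does not say that one causes the other.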

One of the most famous correlational studies you might have heard of is the link between smoking and lung cancer. Back in the mid-20th century, researchers started noticing that people who smoked a lot also seemed to get lung cancer more often. They couldn't say smoking caused cancer—that would require a true experiment—but the strong correlation was a red flag that led to more research and eventually, health warnings.

Correlational Design Pros

This design is great at showing that two (or more) things are related. Correlational designs can signal that more detailed research is needed on a topic. They can help us spot patterns or possible causes for things that we otherwise might not have noticed.

Correlational Design Cons

But here's where you need to be careful: correlational designs can be tricky. Just because two things are related doesn't mean one causes the other. That's like saying, "Every time I wear my lucky socks, my team wins." Well, it's a fun thought, but those socks aren't really controlling the game.

Correlational Design Uses

Despite this limitation, correlational designs are popular in psychology, economics, and epidemiology, to name a few fields. They're often the first step in exploring a possible relationship between variables. Once a strong correlation is found, researchers may decide to conduct more rigorous experimental studies to examine cause and effect.

So, if the True Experimental Design is the superstar quarterback and the Longitudinal Design is the wise elder, the Factorial Design is the strategist, and the Cross-Sectional Design is the speedster, then the Correlational Design is the clever scout, identifying interesting patterns but leaving the heavy lifting of proving cause and effect to the other types of designs.

8) Meta-Analysis

Last but not least, let's talk about Meta-Analysis, the librarian of experimental designs.

If other designs are all about creating new research, Meta-Analysis is about gathering up everyone else's research, sorting it, and figuring out what it all means when you put it together.

Imagine a jigsaw puzzle where each piece is a different study. Meta-Analysis is the process of fitting all those pieces together to see the big picture.

The concept of Meta-Analysis started to take shape in the late 20th century, when computers became powerful enough to handle massive amounts of data. It was like someone handed researchers a super-powered magnifying glass, letting them examine multiple studies at the same time to find common trends or results.

You might have heard of the Cochrane Reviews in healthcare. These are big collections of meta-analyses that help doctors and policymakers figure out what treatments work best based on all the research that's been done.

For example, if ten different studies show that a certain medicine helps lower blood pressure, a meta-analysis would pull all that information together to give a more accurate answer.
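
One common way to pull studies together is inverse-variance weighting, where more precise studies get more say in the final answer. Here is a toy Python sketch with invented effect sizes (drops in blood pressure) and variances:

```python
# Each entry is (effect size in mmHg, variance) from one made-up study
studies = [(-8.0, 4.0), (-6.5, 2.5), (-9.0, 6.0), (-7.2, 3.0), (-5.8, 5.0)]

# More precise studies (smaller variance) get larger weights
weights = [1 / var for _, var in studies]
pooled_effect = sum(w * eff for (eff, _), w in zip(studies, weights)) / sum(weights)
pooled_se = (1 / sum(weights)) ** 0.5

print("Pooled change in blood pressure:", round(pooled_effect, 2), "mmHg")
print("Standard error of the pooled estimate:", round(pooled_se, 2))
```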

Meta-Analysis Pros

The beauty of Meta-Analysis is that it can provide really strong evidence. Instead of relying on one study, you're looking at the whole landscape of research on a topic.

Meta-Analysis Cons

However, it does have some downsides. For one, Meta-Analysis is only as good as the studies it includes. If those studies are flawed, the meta-analysis will be too. It's like baking a cake: if you use bad ingredients, it doesn't matter how good your recipe is—the cake won't turn out well.

Meta-Analysis Uses

Despite these challenges, meta-analyses are highly respected and widely used in many fields like medicine, psychology, and education. They help us make sense of a world that's bursting with information by showing us the big picture drawn from many smaller snapshots.

So, in our all-star lineup, if True Experimental Design is the quarterback and Longitudinal Design is the wise elder, the Factorial Design is the strategist, the Cross-Sectional Design is the speedster, and the Correlational Design is the scout, then the Meta-Analysis is like the coach, using insights from everyone else's plays to come up with the best game plan.

9) Non-Experimental Design

Now, let's talk about a player who's a bit of an outsider on this team of experimental designs—the Non-Experimental Design. Think of this design as the commentator or the journalist who covers the game but doesn't actually play.

In a Non-Experimental Design, researchers are like reporters gathering facts, but they don't interfere or change anything. They're simply there to describe and analyze.

Non-Experimental Design Pros

So, what's the deal with Non-Experimental Design? Its strength is in description and exploration. It's really good for studying things as they are in the real world, without changing any conditions.

Non-Experimental Design Cons

Because a non-experimental design doesn't manipulate variables, it can't prove cause and effect. It's like a weather reporter: they can tell you it's raining, but they can't tell you why it's raining.

The downside? Since researchers aren't controlling variables, it's hard to rule out other explanations for what they observe. It's like hearing one side of a story—you get an idea of what happened, but it might not be the complete picture.

Non-Experimental Design Uses

Non-Experimental Design has always been a part of research, especially in fields like anthropology, sociology, and some areas of psychology.

For instance, if you've ever heard of studies that describe how people behave in different cultures or what teens like to do in their free time, that's often Non-Experimental Design at work. These studies aim to capture the essence of a situation, like painting a portrait instead of taking a snapshot.

One well-known example you might have heard about is the Kinsey Reports from the 1940s and 1950s, which described sexual behavior in men and women. Researchers interviewed thousands of people but didn't manipulate any variables like you would in a true experiment. They simply collected data to create a comprehensive picture of the subject matter.

So, in our metaphorical team of research designs, if True Experimental Design is the quarterback and Longitudinal Design is the wise elder, Factorial Design is the strategist, Cross-Sectional Design is the speedster, Correlational Design is the scout, and Meta-Analysis is the coach, then Non-Experimental Design is the sports journalist—always present, capturing the game, but not part of the action itself.

10) Repeated Measures Design


Time to meet the Repeated Measures Design, the time traveler of our research team. If this design were a player in a sports game, it would be the one who keeps revisiting past plays to figure out how to improve the next one.

Repeated Measures Design is all about studying the same people or subjects multiple times to see how they change or react under different conditions.

The idea behind Repeated Measures Design isn't new; it's been around since the early days of psychology and medicine. You could say it's a cousin to the Longitudinal Design, but instead of looking at how things naturally change over time, it focuses on how the same group reacts to different things.

Imagine a study looking at how a new energy drink affects people's running speed. Instead of comparing one group that drank the energy drink to another group that didn't, a Repeated Measures Design would have the same group of people run multiple times—once with the energy drink, and once without. This way, you're really zeroing in on the effect of that energy drink, making the results more reliable.
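
Here is a tiny Python sketch of that logic. Because the same runners appear in both conditions, we look at each person's own change rather than comparing two separate groups (the times are invented):

```python
runners = ["A", "B", "C", "D", "E"]
time_without = [26.0, 31.5, 28.2, 24.9, 33.0]  # 5K times in minutes, no energy drink
time_with = [25.1, 30.9, 27.8, 24.8, 31.9]     # same runners, with the energy drink

differences = [w - d for w, d in zip(time_without, time_with)]
mean_improvement = sum(differences) / len(differences)

for runner, diff in zip(runners, differences):
    print(f"Runner {runner}: {diff:+.1f} minutes faster with the drink")
print("Average improvement:", round(mean_improvement, 2), "minutes")
```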

Repeated Measures Design Pros

The strong point of Repeated Measures Design is that it's super focused. Because it uses the same subjects, you don't have to worry about differences between groups messing up your results.

Repeated Measures Design Cons

But the downside? Well, people can get tired or bored if they're tested too many times, which might affect how they respond.

Repeated Measures Design Uses

A famous example of this design is the "Little Albert" experiment, conducted by John B. Watson and Rosalie Rayner in 1920. In this study, a young boy was exposed to a white rat and other stimuli several times to see how his emotional responses changed. Though the ethical standards of this experiment are often criticized today, it was groundbreaking in understanding conditioned emotional responses.

In our metaphorical lineup of research designs, if True Experimental Design is the quarterback and Longitudinal Design is the wise elder, Factorial Design is the strategist, Cross-Sectional Design is the speedster, Correlational Design is the scout, Meta-Analysis is the coach, and Non-Experimental Design is the journalist, then Repeated Measures Design is the time traveler—always looping back to fine-tune the game plan.

11) Crossover Design

Next up is Crossover Design, the switch-hitter of the research world. If you're familiar with baseball, you'll know a switch-hitter is someone who can bat both right-handed and left-handed.

In a similar way, Crossover Design allows subjects to experience multiple conditions, flipping them around so that everyone gets a turn in each role.

This design is like the utility player on our team—versatile, flexible, and really good at adapting.

The Crossover Design has its roots in medical research and has been popular since the mid-20th century. It's often used in clinical trials to test the effectiveness of different treatments.

Crossover Design Pros

The neat thing about this design is that it allows each participant to serve as their own control group, which cuts down on the "noise" that comes from individual differences. Since each person experiences all conditions, it's easier to see real effects. Imagine you're testing two new kinds of headache medicine. Instead of giving one type to one group and another type to a different group, you'd give both kinds to the same people but at different times.

Crossover Design Cons

However, there's a catch. This design assumes that there's no lasting effect from the first condition when you switch to the second one. That might not always be true. If the first treatment has a long-lasting effect, it could mess up the results when you switch to the second treatment.

Crossover Design Uses

A well-known example of Crossover Design is in studies that look at the effects of different types of diets—like low-carb vs. low-fat diets. Researchers might have participants follow a low-carb diet for a few weeks, then switch them to a low-fat diet. By doing this, they can more accurately measure how each diet affects the same group of people.

In our team of experimental designs, if True Experimental Design is the quarterback and Longitudinal Design is the wise elder, Factorial Design is the strategist, Cross-Sectional Design is the speedster, Correlational Design is the scout, Meta-Analysis is the coach, Non-Experimental Design is the journalist, and Repeated Measures Design is the time traveler, then Crossover Design is the versatile utility player—always ready to adapt and play multiple roles to get the most accurate results.

12) Cluster Randomized Design

Meet the Cluster Randomized Design, the team captain of group-focused research. In our imaginary lineup of experimental designs, if other designs focus on individual players, then Cluster Randomized Design is looking at how the entire team functions.

This approach is especially common in educational and community-based research, and it's been gaining traction since the late 20th century.

Here's how Cluster Randomized Design works: Instead of assigning individual people to different conditions, researchers assign entire groups, or "clusters." These could be schools, neighborhoods, or even entire towns. This helps you see how the new method works in a real-world setting.

Imagine you want to see if a new anti-bullying program really works. Instead of selecting individual students, you'd introduce the program to a whole school or maybe even several schools, and then compare the results to schools without the program.
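
The key move is that the coin flip happens at the level of the school, not the student. Here is a bare-bones Python sketch with invented school names:

```python
import random

random.seed(7)  # fixed seed so the toy example is repeatable

# The clusters: whole schools, not individual students
schools = ["North HS", "South HS", "East HS", "West HS", "Central HS", "Lakeside HS"]

random.shuffle(schools)
program_schools = schools[:len(schools) // 2]
comparison_schools = schools[len(schools) // 2:]

print("Get the anti-bullying program:", program_schools)
print("Keep doing what they were doing:", comparison_schools)
```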

Cluster Randomized Design Pros

Why use Cluster Randomized Design? Well, sometimes it's just not practical to assign conditions at the individual level. For example, you can't really have half a school following a new reading program while the other half sticks with the old one; that would be way too confusing! Cluster Randomization helps get around this problem by treating each "cluster" as its own mini-experiment.

Cluster Randomized Design Cons

There's a downside, too. Because entire groups are assigned to each condition, there's a risk that the groups might be different in some important way that the researchers didn't account for. That's like having one sports team that's full of veterans playing against a team of rookies; the match wouldn't be fair.

Cluster Randomized Design Uses

A famous example is the research conducted to test the effectiveness of different public health interventions, like vaccination programs. Researchers might roll out a vaccination program in one community but not in another, then compare the rates of disease in both.

In our metaphorical research team, if True Experimental Design is the quarterback, Longitudinal Design is the wise elder, Factorial Design is the strategist, Cross-Sectional Design is the speedster, Correlational Design is the scout, Meta-Analysis is the coach, Non-Experimental Design is the journalist, Repeated Measures Design is the time traveler, and Crossover Design is the utility player, then Cluster Randomized Design is the team captain—always looking out for the group as a whole.

13) Mixed-Methods Design

Say hello to Mixed-Methods Design, the all-rounder or the "Renaissance player" of our research team.

Mixed-Methods Design uses a blend of both qualitative and quantitative methods to get a more complete picture, just like a Renaissance person who's good at lots of different things. It's like being good at both offense and defense in a sport; you've got all your bases covered!

Mixed-Methods Design is a fairly new kid on the block, becoming more popular in the late 20th and early 21st centuries as researchers began to see the value in using multiple approaches to tackle complex questions. It's the Swiss Army knife in our research toolkit, combining the best parts of other designs to be more versatile.

Here's how it could work: Imagine you're studying the effects of a new educational app on students' math skills. You might use quantitative methods like tests and grades to measure how much the students improve—that's the 'numbers part.'

But you also want to know how the students feel about math now, or why they think they got better or worse. For that, you could conduct interviews or have students fill out journals—that's the 'story part.'

Mixed-Methods Design Pros

So, what's the scoop on Mixed-Methods Design? The strength is its versatility and depth; you're not just getting numbers or stories, you're getting both, which gives a fuller picture.

Mixed-Methods Design Cons

But, it's also more challenging. Imagine trying to play two sports at the same time! You have to be skilled in different research methods and know how to combine them effectively.

Mixed-Methods Design Uses

A high-profile example of Mixed-Methods Design is research on climate change. Scientists use numbers and data to show temperature changes (quantitative), but they also interview people to understand how these changes are affecting communities (qualitative).

In our team of experimental designs, if True Experimental Design is the quarterback, Longitudinal Design is the wise elder, Factorial Design is the strategist, Cross-Sectional Design is the speedster, Correlational Design is the scout, Meta-Analysis is the coach, Non-Experimental Design is the journalist, Repeated Measures Design is the time traveler, Crossover Design is the utility player, and Cluster Randomized Design is the team captain, then Mixed-Methods Design is the Renaissance player—skilled in multiple areas and able to bring them all together for a winning strategy.

14) Multivariate Design

Now, let's turn our attention to Multivariate Design, the multitasker of the research world.

If our lineup of research designs were like players on a basketball court, Multivariate Design would be the player dribbling, passing, and shooting all at once. This design doesn't just look at one or two things; it looks at several variables simultaneously to see how they interact and affect each other.

Multivariate Design is like baking a cake with many ingredients. Instead of just looking at how flour affects the cake, you also consider sugar, eggs, and milk all at once. This way, you understand how everything works together to make the cake taste good or bad.

Multivariate Design has been a go-to method in psychology, economics, and social sciences since the latter half of the 20th century. With the advent of computers and advanced statistical software, analyzing multiple variables at once became a lot easier, and Multivariate Design soared in popularity.

Multivariate Design Pros

So, what's the benefit of using Multivariate Design? Its power lies in its complexity. By studying multiple variables at the same time, you can get a really rich, detailed understanding of what's going on.

Multivariate Design Cons

But that complexity can also be a drawback. With so many variables, it can be tough to tell which ones are really making a difference and which ones are just along for the ride.

Multivariate Design Uses

Imagine you're a coach trying to figure out the best strategy to win games. You wouldn't just look at how many points your star player scores; you'd also consider assists, rebounds, turnovers, and maybe even how loud the crowd is. A Multivariate Design would help you understand how all these factors work together to determine whether you win or lose.

A well-known example of Multivariate Design is in market research. Companies often use this approach to figure out how different factors—like price, packaging, and advertising—affect sales. By studying multiple variables at once, they can find the best combination to boost profits.

In our metaphorical research team, if True Experimental Design is the quarterback, Longitudinal Design is the wise elder, Factorial Design is the strategist, Cross-Sectional Design is the speedster, Correlational Design is the scout, Meta-Analysis is the coach, Non-Experimental Design is the journalist, Repeated Measures Design is the time traveler, Crossover Design is the utility player, Cluster Randomized Design is the team captain, and Mixed-Methods Design is the Renaissance player, then Multivariate Design is the multitasker—juggling many variables at once to get a fuller picture of what's happening.

15) Pretest-Posttest Design

Let's introduce Pretest-Posttest Design, the "Before and After" superstar of our research team. You've probably seen those before-and-after pictures in ads for weight loss programs or home renovations, right?

Well, this design is like that, but for science! Pretest-Posttest Design checks out what things are like before the experiment starts and then compares that to what things are like after the experiment ends.

This design is one of the classics, a staple in research for decades across various fields like psychology, education, and healthcare. It's so simple and straightforward that it has stayed popular for a long time.

In Pretest-Posttest Design, you measure your subject's behavior or condition before you introduce any changes—that's your "before" or "pretest." Then you do your experiment, and after it's done, you measure the same thing again—that's your "after" or "posttest."

Pretest-Posttest Design Pros

What makes Pretest-Posttest Design special? It's pretty easy to understand and doesn't require fancy statistics.

Pretest-Posttest Design Cons

But there are some pitfalls. For example, what if the kids in the math example below get better at multiplication just because they're older or because they've taken the test before? That would make it hard to tell if the program is really effective or not.

Pretest-Posttest Design Uses

Let's say you're a teacher and you want to know if a new math program helps kids get better at multiplication. First, you'd give all the kids a multiplication test—that's your pretest. Then you'd teach them using the new math program. At the end, you'd give them the same test again—that's your posttest. If the kids do better on the second test, you might conclude that the program works.
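
Here is a small Python sketch of that before-and-after comparison, using invented quiz scores out of 20:

```python
pretest = {"Ava": 11, "Ben": 9, "Cam": 14, "Dee": 7, "Eli": 12, "Fay": 10}
posttest = {"Ava": 15, "Ben": 13, "Cam": 16, "Dee": 12, "Eli": 14, "Fay": 15}

# Gain score: how much each kid improved from pretest to posttest
gains = {kid: posttest[kid] - pretest[kid] for kid in pretest}
average_gain = sum(gains.values()) / len(gains)

for kid, gain in gains.items():
    print(f"{kid}: improved by {gain} points")
print("Average improvement:", round(average_gain, 2), "points")
```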

One famous use of Pretest-Posttest Design is in evaluating the effectiveness of driver's education courses. Researchers will measure people's driving skills before and after the course to see if they've improved.

16) Solomon Four-Group Design

Next up is the Solomon Four-Group Design, the "chess master" of our research team. This design is all about strategy and careful planning. Named after Richard L. Solomon who introduced it in the 1940s, this method tries to correct some of the weaknesses in simpler designs, like the Pretest-Posttest Design.

Here's how it rolls: The Solomon Four-Group Design uses four different groups to test a hypothesis. Two groups get a pretest, then one of them receives the treatment or intervention, and both get a posttest. The other two groups skip the pretest, and only one of them receives the treatment before they both get a posttest.

Sound complicated? It's like playing 4D chess; you're thinking several moves ahead!

Solomon Four-Group Design Pros

What's the big advantage of the Solomon Four-Group Design? It provides really robust results because it can separate the effect of the treatment from the effect of simply taking the pretest.

Solomon Four-Group Design Cons

The downside? It's a lot of work and requires a lot of participants, making it more time-consuming and costly.

Solomon Four-Group Design Uses

Let's say you want to figure out if a new way of teaching history helps students remember facts better. Two classes take a history quiz (pretest), then one class uses the new teaching method while the other sticks with the old way. Both classes take another quiz afterward (posttest).

Meanwhile, two more classes skip the initial quiz, and then one uses the new method before both take the final quiz. Comparing all four groups will give you a much clearer picture of whether the new teaching method works and whether the pretest itself affects the outcome.
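
To keep the four groups straight, here is a tiny Python sketch that simply lays out what each group goes through (the structure is the design itself; the quiz names are just placeholders):

```python
groups = [
    {"group": 1, "pretest": True,  "treatment": True},
    {"group": 2, "pretest": True,  "treatment": False},
    {"group": 3, "pretest": False, "treatment": True},
    {"group": 4, "pretest": False, "treatment": False},
]

for g in groups:
    steps = []
    if g["pretest"]:
        steps.append("pretest quiz")
    steps.append("new teaching method" if g["treatment"] else "old teaching method")
    steps.append("posttest quiz")
    print(f"Group {g['group']}: " + " -> ".join(steps))

# Comparing groups 1 and 2 shows the treatment effect among pretested classes;
# comparing groups 3 and 4 shows it without a pretest, which reveals whether
# taking the pretest itself changed the outcome.
```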

The Solomon Four-Group Design is less commonly used than simpler designs but is highly respected for its ability to control for more variables. It's a favorite in educational and psychological research where you really want to dig deep and figure out what's actually causing changes.

17) Adaptive Designs

Now, let's talk about Adaptive Designs, the chameleons of the experimental world.

Imagine you're a detective, and halfway through solving a case, you find a clue that changes everything. You wouldn't just stick to your old plan; you'd adapt and change your approach, right? That's exactly what Adaptive Designs allow researchers to do.

In an Adaptive Design, researchers can make changes to the study as it's happening, based on early results. In a traditional study, once you set your plan, you stick to it from start to finish.

Adaptive Design Pros

This method is particularly useful in fast-paced or high-stakes situations, like developing a new vaccine in the middle of a pandemic. The ability to adapt can save both time and resources, and more importantly, it can save lives by getting effective treatments out faster.

Adaptive Design Cons

But Adaptive Designs aren't without their drawbacks. They can be very complex to plan and carry out, and there's always a risk that the changes made during the study could introduce bias or errors.

Adaptive Design Uses

Adaptive Designs are most often seen in clinical trials, particularly in the medical and pharmaceutical fields.

For instance, if a new drug is showing really promising results, the study might be adjusted to give more participants the new treatment instead of a placebo. Or if one dose level is showing bad side effects, it might be dropped from the study.

The best part is, these changes are pre-planned. Researchers lay out in advance what changes might be made and under what conditions, which helps keep everything scientific and above board.
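
As a very rough illustration, here is a toy Python sketch of what one pre-planned interim rule might look like. The thresholds and counts are invented, not taken from any real trial:

```python
def interim_decision(treatment_successes, treatment_n, placebo_successes, placebo_n):
    """Toy rule written into the (hypothetical) protocol before the study starts."""
    treatment_rate = treatment_successes / treatment_n
    placebo_rate = placebo_successes / placebo_n
    if treatment_rate - placebo_rate >= 0.20:
        return "shift more new participants to the treatment arm"
    if treatment_rate - placebo_rate <= -0.10:
        return "stop the treatment arm early"
    return "continue enrolling as planned"

# Halfway through the made-up study:
print(interim_decision(treatment_successes=45, treatment_n=100,
                       placebo_successes=20, placebo_n=100))
```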

In terms of applications, besides their heavy usage in medical and pharmaceutical research, Adaptive Designs are also becoming increasingly popular in software testing and market research. In these fields, being able to quickly adjust to early results can give companies a significant advantage.

Adaptive Designs are like the agile startups of the research world—quick to pivot, keen to learn from ongoing results, and focused on rapid, efficient progress. However, they require a great deal of expertise and careful planning to ensure that the adaptability doesn't compromise the integrity of the research.

18) Bayesian Designs

Next, let's dive into Bayesian Designs, the data detectives of the research universe. Named after Thomas Bayes, an 18th-century statistician and minister, this design doesn't just look at what's happening now; it also takes into account what's happened before.

Imagine if you were a detective who not only looked at the evidence in front of you but also used your past cases to make better guesses about your current one. That's the essence of Bayesian Designs.

Bayesian Designs are like detective work in science. As you gather more clues (or data), you update your best guess on what's really happening. This way, your experiment gets smarter as it goes along.

In the world of research, Bayesian Designs are most notably used in areas where you have some prior knowledge that can inform your current study. For example, if earlier research shows that a certain type of medicine usually works well for a specific illness, a Bayesian Design would include that information when studying a new group of patients with the same illness.
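
Here is a minimal Python sketch of that kind of updating, using a simple Beta-Binomial model. The prior counts stand in for earlier (made-up) research, and the new counts are invented too:

```python
# Prior: earlier studies suggested roughly 30 successes in 50 tries with this treatment
prior_successes, prior_failures = 30, 20

# New study: 18 successes out of 25 new patients
new_successes, new_failures = 18, 7

# Beta-Binomial updating just adds the new counts to the prior counts
posterior_successes = prior_successes + new_successes
posterior_failures = prior_failures + new_failures

posterior_mean = posterior_successes / (posterior_successes + posterior_failures)
print("Updated best guess at the success rate:", round(posterior_mean, 3))
```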

Bayesian Design Pros

One of the major advantages of Bayesian Designs is their efficiency. Because they use existing data to inform the current experiment, often fewer resources are needed to reach a reliable conclusion.

Bayesian Design Cons

However, they can be quite complicated to set up and require a deep understanding of both statistics and the subject matter at hand.

Bayesian Design Uses

Bayesian Designs are highly valued in medical research, finance, environmental science, and even in Internet search algorithms. Their ability to continually update and refine hypotheses based on new evidence makes them particularly useful in fields where data is constantly evolving and where quick, informed decisions are crucial.

Here's a real-world example: In the development of personalized medicine, where treatments are tailored to individual patients, Bayesian Designs are invaluable. If a treatment has been effective for patients with similar genetics or symptoms in the past, a Bayesian approach can use that data to predict how well it might work for a new patient.

This type of design is also increasingly popular in machine learning and artificial intelligence. In these fields, Bayesian Designs help algorithms "learn" from past data to make better predictions or decisions in new situations. It's like teaching a computer to be a detective that gets better and better at solving puzzles the more puzzles it sees.

19) Covariate Adaptive Randomization

Now let's turn our attention to Covariate Adaptive Randomization, which you can think of as the "matchmaker" of experimental designs.

Picture a soccer coach trying to create the most balanced teams for a friendly match. They wouldn't just randomly assign players; they'd take into account each player's skills, experience, and other traits.

Covariate Adaptive Randomization is all about creating the most evenly matched groups possible for an experiment.

In traditional randomization, participants are allocated to different groups purely by chance. This is a pretty fair way to do things, but it can sometimes lead to unbalanced groups.

Imagine if all the professional-level players ended up on one soccer team and all the beginners on another; that wouldn't be a very informative match! Covariate Adaptive Randomization fixes this by using important traits or characteristics (called "covariates") to guide the randomization process.

Covariate Adaptive Randomization Pros

The benefits of this design are pretty clear: it aims for balance and fairness, making the final results more trustworthy.

Covariate Adaptive Randomization Cons

But it's not perfect. It can be complex to implement and requires a deep understanding of which characteristics are most important to balance.

Covariate Adaptive Randomization Uses

This design is particularly useful in medical trials. Let's say researchers are testing a new medication for high blood pressure. Participants might have different ages, weights, or pre-existing conditions that could affect the results.

Covariate Adaptive Randomization would make sure that each treatment group has a similar mix of these characteristics, making the results more reliable and easier to interpret.
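
As a rough illustration, here is a simplified, minimization-style sketch in Python with made-up covariates (age group and weight category). Real implementations, such as Pocock-Simon minimization, also add a random element to each assignment rather than always choosing the more balanced group, but the balancing logic looks broadly like this.

```python
from collections import defaultdict

# Running counts of each covariate level within each treatment group.
counts = {"treatment": defaultdict(int), "control": defaultdict(int)}

def assign(participant):
    """Assign a participant to whichever group keeps covariates most balanced.

    A group's imbalance score = how many existing members share this
    participant's covariate levels. The participant joins the group with the
    smaller score (ties go to 'treatment').
    """
    scores = {}
    for group, tally in counts.items():
        scores[group] = sum(tally[(cov, val)] for cov, val in participant.items())
    group = min(scores, key=scores.get)
    for cov, val in participant.items():
        counts[group][(cov, val)] += 1
    return group

# Hypothetical participants with covariates that could affect blood pressure.
participants = [
    {"age": "60+", "weight": "high"},
    {"age": "60+", "weight": "normal"},
    {"age": "under60", "weight": "high"},
    {"age": "60+", "weight": "high"},
]
for i, p in enumerate(participants, 1):
    print(f"Participant {i} ({p}) -> {assign(p)}")
```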

In practical terms, this design is often seen in clinical trials for new drugs or therapies, but its principles are also applicable in fields like psychology, education, and social sciences.

For instance, in educational research, it might be used to ensure that classrooms being compared have similar distributions of students in terms of academic ability, socioeconomic status, and other factors.

Covariate Adaptive Randomization is like the wise elder of the group, ensuring that everyone has an equal opportunity to show their true capabilities, thereby making the collective results as reliable as possible.

20) Stepped Wedge Design

Let's now focus on the Stepped Wedge Design, a thoughtful and cautious member of the experimental design family.

Imagine you're trying out a new gardening technique, but you're not sure how well it will work. You decide to apply it to one section of your garden first, watch how it performs, and then gradually extend the technique to other sections. This way, you get to see its effects over time and across different conditions. That's basically how Stepped Wedge Design works.

In a Stepped Wedge Design, all participants or clusters start off in the control group, and then, at different times, they 'step' over to the intervention or treatment group. This creates a wedge-like pattern over time where more and more participants receive the treatment as the study progresses. It's like rolling out a new policy in phases, monitoring its impact at each stage before extending it to more people.
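
A tiny Python sketch makes the wedge pattern easy to see. The numbers of clusters and time periods are arbitrary; 0 means a cluster is still in the control condition and 1 means it has crossed over to the intervention.

```python
def stepped_wedge_schedule(n_clusters, n_periods):
    """Build a simple stepped wedge rollout schedule.

    Every cluster starts in the control condition (0). One more cluster
    crosses over to the intervention (1) at each new period, so by the final
    period all clusters are receiving the treatment.
    """
    schedule = []
    for cluster in range(n_clusters):
        crossover_period = cluster + 1  # cluster 0 crosses over at period 1, and so on
        row = [1 if period >= crossover_period else 0 for period in range(n_periods)]
        schedule.append(row)
    return schedule

# Hypothetical example: 4 hospital departments rolled out over 5 periods.
for cluster, row in enumerate(stepped_wedge_schedule(4, 5)):
    print(f"Department {cluster + 1}: {row}")
```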

Stepped Wedge Design Pros

The Stepped Wedge Design offers several advantages. Firstly, it allows for the study of interventions that are expected to do more good than harm, which makes it ethically appealing.

Secondly, it's useful when resources are limited and it's not feasible to roll out a new treatment to everyone at once. Lastly, because everyone eventually receives the treatment, it can be easier to get buy-in from participants or organizations involved in the study.

Stepped Wedge Design Cons

However, this design can be complex to analyze because it has to account for both the time factor and the changing conditions in each 'step' of the wedge. And like any study where participants know they're receiving an intervention, there's the potential for the results to be influenced by the placebo effect or other biases.

Stepped Wedge Design Uses

This design is particularly useful in health and social care research. For instance, if a hospital wants to implement a new hygiene protocol, it might start in one department, assess its impact, and then roll it out to other departments over time. This allows the hospital to adjust and refine the new protocol based on real-world data before it's fully implemented.

In terms of applications, Stepped Wedge Designs are commonly used in public health initiatives, organizational changes in healthcare settings, and social policy trials. They are particularly useful in situations where an intervention is being rolled out gradually and it's important to understand its impacts at each stage.

21) Sequential Design

Next up is Sequential Design, the dynamic and flexible member of our experimental design family.

Imagine you're playing a video game where you can choose different paths. If you take one path and find a treasure chest, you might decide to continue in that direction. If you hit a dead end, you might backtrack and try a different route. Sequential Design operates in a similar fashion, allowing researchers to make decisions at different stages based on what they've learned so far.

In a Sequential Design, the experiment is broken down into smaller parts, or "sequences." After each sequence, researchers pause to look at the data they've collected. Based on those findings, they then decide whether to stop the experiment because they've got enough information, or to continue and perhaps even modify the next sequence.
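
Here is a deliberately simplified Python sketch of that stop-or-go logic, with hypothetical data and thresholds. Real sequential trials use formal group-sequential boundaries worked out in advance by statisticians, but the basic loop of analyze, decide, and continue looks like this.

```python
def run_sequential_trial(batches, stop_for_harm=0.30, stop_for_benefit=0.70):
    """Analyze a trial in pre-planned sequences and decide after each one.

    `batches` is a list of (successes, n) tuples, one per sequence. The
    thresholds are hypothetical: stop early if the cumulative success rate
    looks clearly too low or clearly convincing, otherwise keep going.
    """
    total_successes, total_n = 0, 0
    for i, (successes, n) in enumerate(batches, start=1):
        total_successes += successes
        total_n += n
        rate = total_successes / total_n
        if rate <= stop_for_harm:
            return f"Stop after sequence {i}: success rate {rate:.2f} looks too low."
        if rate >= stop_for_benefit:
            return f"Stop after sequence {i}: success rate {rate:.2f} looks convincing."
    return f"Completed all sequences: final success rate {rate:.2f}, evidence inconclusive."

# Hypothetical interim results from three sequences of 20 participants each.
print(run_sequential_trial([(12, 20), (15, 20), (16, 20)]))
```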

Sequential Design Pros

One of the great things about Sequential Design is its efficiency. Because you're making data-driven decisions along the way, you only continue with the experiment if the data suggests it's worth doing so, which means you can often reach conclusions more quickly and with fewer resources.

Sequential Design Cons

However, it requires careful planning and expertise to ensure that these "stop or go" decisions are made correctly and without bias.

Sequential Design Uses

This design is most often used in clinical trials involving new medications or treatments. For example, if early results show that a new drug has significant side effects, the trial can be stopped before more people are exposed to it.

On the flip side, if the drug is showing promising results, the trial might be expanded to include more participants or to extend the testing period.

Beyond healthcare and medicine, Sequential Design is also popular in quality control in manufacturing, environmental monitoring, and financial modeling. In these areas, being able to make quick decisions based on incoming data can be a big advantage.

Think of Sequential Design as the nimble athlete of experimental designs, capable of quick pivots and adjustments to reach the finish line in the most effective way possible. But just like an athlete needs a good coach, this design requires expert oversight to make sure it stays on the right track.

22) Field Experiments

Last but certainly not least, let's explore Field Experiments—the adventurers of the experimental design world.

Picture a scientist leaving the controlled environment of a lab to test a theory in the real world, like a biologist studying animals in their natural habitat or a social scientist observing people in a real community. These are Field Experiments, and they're all about getting out there and gathering data in real-world settings.

Field Experiments embrace the messiness of the real world, unlike laboratory experiments, where everything is controlled down to the smallest detail. This makes them both exciting and challenging.

Field Experiment Pros

On one hand, Field Experiments offer real-world relevance: the results often give us a better understanding of how things work outside the lab.

Field Experiment Cons

On the other hand, the lack of control can make it harder to tell exactly what's causing what, and there are ethical considerations around intervening in people's lives without their knowledge. Yet, despite these challenges, Field Experiments remain a valuable tool for researchers who want to understand how theories play out in the real world.

Field Experiment Uses

Let's say a school wants to improve student performance. In a Field Experiment, they might change the school's daily schedule for one semester and keep track of how students perform compared to another school where the schedule remained the same.

Because the study is happening in a real school with real students, the results could be very useful for understanding how the change might work in other schools. But since it's the real world, lots of other factors—like changes in teachers or even the weather—could affect the results.
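
One common way to handle that problem is to compare the change in the school that got the new schedule against the change in a similar school that didn't, a difference-in-differences comparison. Here is a minimal Python sketch of that calculation; all of the scores are made up.

```python
# Average test scores before and after the schedule change (hypothetical).
treated_school = {"before": 71.0, "after": 76.5}   # school with the new schedule
control_school = {"before": 70.0, "after": 72.0}   # school with the old schedule

change_treated = treated_school["after"] - treated_school["before"]
change_control = control_school["after"] - control_school["before"]

# The control school's change stands in for what would have happened anyway
# (new teachers, the weather, a harder test), so the difference between the
# two changes is the estimate of the schedule's effect.
estimated_effect = change_treated - change_control
print(f"Treated change: {change_treated:+.1f}, control change: {change_control:+.1f}")
print(f"Estimated effect of the new schedule: {estimated_effect:+.1f} points")
```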

Field Experiments are widely used in economics, psychology, education, and public policy. For example, you might have heard of the famous "Broken Windows" experiment in the 1980s that looked at how small signs of disorder, like broken windows or graffiti, could encourage more serious crime in neighborhoods. This experiment had a big impact on how cities think about crime prevention.

From the foundational concepts of control groups and independent variables to the sophisticated layouts like Covariate Adaptive Randomization and Sequential Design, it's clear that the realm of experimental design is as varied as it is fascinating.

We've seen that each design has its own special talents, ideal for specific situations. Some designs, like the Classic Controlled Experiment, are like reliable old friends you can always count on.

Others, like Sequential Design, are flexible and adaptable, making quick changes based on what they learn. And let's not forget the adventurous Field Experiments, which take us out of the lab and into the real world to discover things we might not see otherwise.

Choosing the right experimental design is like picking the right tool for the job. The method you choose can make a big difference in how reliable your results are and how much people will trust what you've discovered. And as we've learned, there's a design to suit just about every question, every problem, and every curiosity.

So the next time you read about a new discovery in medicine, psychology, or any other field, you'll have a better understanding of the thought and planning that went into figuring things out. Experimental design is more than just a set of rules; it's a structured way to explore the unknown and answer questions that can change the world.

15 Famous Experiments and Case Studies in Psychology

Chris Drew (PhD)

Dr. Chris Drew is the founder of the Helpful Professor. He holds a PhD in education and has published over 20 articles in scholarly journals. He is the former editor of the Journal of Learning Development in Higher Education.

Psychology has seen thousands upon thousands of research studies over the years. Most of these studies have helped shape our current understanding of human thoughts, behavior, and feelings.

The psychology case studies in this list are considered classic examples of psychological case studies and experiments, which are still being taught in introductory psychology courses up to this day.

Some studies, however, were so shocking and controversial that you’d probably wonder why they were conducted in the first place. Imagine participating in an experiment for a small reward or extra class credit, only to be left scarred for life. These kinds of studies, however, paved the way for a more ethical approach to studying psychology and implementation of research standards such as the use of debriefing in psychology research .

Case Study vs. Experiment

Before we dive into the list of the most famous studies in psychology, let us first review the difference between case studies and experiments.

  • A case study is an in-depth study and analysis of an individual, group, community, or phenomenon. The results of a case study cannot be applied to the whole population, but they can provide insights for further studies.
  • It often uses qualitative research methods such as observations, surveys, and interviews.
  • It is often conducted in real-life settings rather than in controlled environments.
  • An experiment is a type of study done on a sample or group of random participants, the results of which can be generalized to the whole population.
  • It often uses quantitative research methods that rely on numbers and statistics.
  • It is conducted in controlled environments, wherein some things or situations are manipulated.

See Also: Experimental vs Observational Studies

Famous Experiments in Psychology

1. The Marshmallow Experiment

Psychologist Walter Mischel conducted the marshmallow experiment at Stanford University in the 1960s to early 1970s. It was a simple test that aimed to define the connection between delayed gratification and success in life.

The instructions were fairly straightforward: children ages 4-6 were presented with a marshmallow on a table and told that they would receive a second one if they could wait 15 minutes without eating the first.

About one-third of the 600 participants succeeded in delaying gratification to receive the second marshmallow. Mischel and his team followed up on these participants in the 1990s, learning that those who had the willpower to wait for a larger reward experienced more success in life in terms of SAT scores and other metrics.

This case study also supported self-control theory , a theory in criminology that holds that people with greater self-control are less likely to end up in trouble with the law!

The classic marshmallow experiment, however, was debunked in a 2018 replication study done by Tyler Watts and colleagues.

This more recent experiment had a larger group of participants (900) and a better representation of the general population when it comes to race and ethnicity. In this study, the researchers found out that the ability to wait for a second marshmallow does not depend on willpower alone but more so on the economic background and social status of the participants.

2. The Bystander Effect

In 1964, Kitty Genovese was murdered in the neighborhood of Kew Gardens, New York. It was reported that there were up to 38 witnesses and onlookers in the vicinity of the crime scene, but nobody did anything to stop the murder or call for help.

Such tragedy was the catalyst that inspired social psychologists Bibb Latane and John Darley to formulate the phenomenon called bystander effect or bystander apathy .

Subsequent investigations showed that this story was exaggerated and inaccurate, as there were actually only about a dozen witnesses, at least two of whom called the police. But the case of Kitty Genovese led to various studies that aim to shed light on the bystander phenomenon.

Latane and Darley tested bystander intervention in an experimental study . Participants were asked to answer a questionnaire inside a room, and they would either be alone or with two other participants (who were actually actors or confederates in the study). Smoke would then come out from under the door. The reaction time of participants was tested — how long would it take them to report the smoke to the authorities or the experimenters?

The results showed that participants who were alone in the room reported the smoke faster than participants who were with two passive others. The study suggests that the more onlookers are present in an emergency situation, the less likely someone would step up to help, a social phenomenon now popularly called the bystander effect.

3. Asch Conformity Study

Have you ever made a decision against your better judgment just to fit in with your friends or family? The Asch Conformity Studies will help you understand this kind of situation better.

In this experiment, a group of participants were shown three numbered lines of different lengths and asked to identify the longest of them all. However, only one true participant was present in every group and the rest were actors, most of whom told the wrong answer.

Results showed that the participants went for the wrong answer, even though they knew which line was the longest one in the first place. When the participants were asked why they identified the wrong one, they said that they didn’t want to be branded as strange or peculiar.

This study goes to show that there are situations in life when people prefer fitting in than being right. It also tells that there is power in numbers — a group’s decision can overwhelm a person and make them doubt their judgment.

4. The Bobo Doll Experiment

The Bobo Doll Experiment was conducted by Dr. Albert Bandura, the proponent of social learning theory .

Back in the 1960s, the Nature vs. Nurture debate was a popular topic among psychologists. Bandura contributed to this discussion by proposing that human behavior is mostly influenced by environmental rather than genetic factors.

In the Bobo Doll Experiment, children were divided into three groups: one group was shown a video in which an adult acted aggressively toward the Bobo Doll, the second group was shown a video in which an adult played calmly with the Bobo Doll, and the third group served as the control group and was shown no video.

The children were then led to a room with different kinds of toys, including the Bobo Doll they had seen in the video. Results showed that the children tended to imitate the adults in the video: those who had watched the aggressive model acted aggressively toward the Bobo Doll, while those who had watched the passive model showed less aggression.

While the Bobo Doll Experiment can no longer be replicated because of ethical concerns, it has laid out the foundations of social learning theory and helped us understand the degree of influence adult behavior has on children.

5. Blue Eye / Brown Eye Experiment

Following the assassination of Martin Luther King Jr. in 1968, third-grade teacher Jane Elliott conducted an experiment in her class. Although not a formal experiment in controlled settings, A Class Divided is a good example of a social experiment to help children understand the concept of racism and discrimination.

The class was divided into two groups: blue-eyed children and brown-eyed children. For one day, Elliott gave preferential treatment to her blue-eyed students, giving them more attention and pampering them with rewards. The next day, it was the brown-eyed students’ turn to receive extra favors and privileges.

As a result, whichever group of students was given preferential treatment performed exceptionally well in class, had higher quiz scores, and recited more frequently; students who were discriminated against felt humiliated, answered poorly in tests, and became uncertain with their answers in class.

This study is now widely taught in sociocultural psychology classes.

6. Stanford Prison Experiment

One of the most controversial and widely-cited studies in psychology is the Stanford Prison Experiment , conducted by Philip Zimbardo at the basement of the Stanford psychology building in 1971. The hypothesis was that abusive behavior in prisons is influenced by the personality traits of the prisoners and prison guards.

The participants in the experiment were college students who were randomly assigned as either a prisoner or a prison guard. The prison guards were then told to run the simulated prison for two weeks. However, the experiment had to be stopped in just 6 days.

The prison guards abused their authority and harassed the prisoners through verbal and physical means. The prisoners, on the other hand, showed submissive behavior. Zimbardo decided to stop the experiment because the prisoners were showing signs of emotional and physical breakdown.

Although the experiment wasn’t completed, the results strongly showed that people can easily get into a social role when others expect them to, especially when it’s highly stereotyped .

7. The Halo Effect

Have you ever wondered why toothpastes and other dental products are endorsed in advertisements by celebrities more often than dentists? The Halo Effect is one of the reasons!

The Halo Effect shows how one favorable attribute of a person can gain them positive perceptions in other attributes. In the case of product advertisements, attractive celebrities are also perceived as intelligent and knowledgeable of a certain subject matter even though they’re not technically experts.

The Halo Effect originated in a classic study done by Edward Thorndike in the early 1900s. He asked military commanding officers to rate their subordinates based on different qualities, such as physical appearance, leadership, dependability, and intelligence.

The results showed that high ratings of a particular quality influences the ratings of other qualities, producing a halo effect of overall high ratings. The opposite also applied, which means that a negative rating in one quality also correlated to negative ratings in other qualities.

Experiments on the Halo Effect came in various formats as well, supporting Thorndike’s original theory. This phenomenon suggests that our perception of other people’s overall personality is hugely influenced by a quality that we focus on.

8. Cognitive Dissonance

There are experiences in our lives when our beliefs and behaviors do not align with each other and we try to justify them in our minds. This is cognitive dissonance , which was studied in an experiment by Leon Festinger and James Carlsmith back in 1959.

In this experiment, participants had to go through a series of boring and repetitive tasks, such as spending an hour turning pegs on a wooden board. After completing the tasks, they were then paid either $1 or $20 to tell the next participants that the tasks were extremely fun and enjoyable. Afterwards, participants were asked to rate the experiment. Those who were given $1 rated the experiment as more interesting and fun than those who received $20.

The results showed that those who received a smaller incentive to lie experienced cognitive dissonance — $1 wasn’t enough incentive for that one hour of painstakingly boring activity, so the participants had to justify that they had fun anyway.

Famous Case Studies in Psychology

9. Little Albert

In 1920, behaviourist theorists John Watson and Rosalie Rayner experimented on a 9-month-old baby to test the effects of classical conditioning in instilling fear in humans.

This was such a controversial study that it gained popularity in psychology textbooks and syllabi because it is a classic example of unethical research studies done in the name of science.

In one of the experiments, Little Albert was presented with a harmless stimulus or object, a white rat, which he wasn’t scared of at first. But every time Little Albert would see the white rat, the researchers would play a scary sound of hammer and steel. After about 6 pairings, Little Albert learned to fear the rat even without the scary sound.

Little Albert developed signs of fear to different objects presented to him through classical conditioning . He even generalized his fear to other stimuli not present in the course of the experiment.

10. Phineas Gage

Phineas Gage is such a celebrity in Psych 101 classes, even though the way he rose to popularity began with a tragic accident. He was a resident of Central Vermont and worked in the construction of a new railway line in the mid-1800s. One day, an explosive went off prematurely, sending a tamping iron straight into his face and through his brain.

Gage survived the accident, fortunately, something that is considered a medical feat even to this day. He managed to find work as a stagecoach driver after the accident. However, his family and friends reported that his personality changed so much that “he was no longer Gage” (Harlow, 1868).

New evidence on the case of Phineas Gage has since come to light, thanks to modern scientific studies and medical tests. However, there are still plenty of mysteries revolving around his brain damage and subsequent recovery.

11. Anna O.

Anna O., a social worker and feminist of German Jewish descent, was one of the first patients to receive psychoanalytic treatment.

Her real name was Bertha Pappenheim and she inspired much of Sigmund Freud’s works and books on psychoanalytic theory, although they hadn’t met in person. Their connection was through Joseph Breuer, Freud’s mentor when he was still starting his clinical practice.

Anna O. suffered from paralysis, personality changes, hallucinations, and rambling speech, but her doctors could not find the cause. Joseph Breuer was then called to her house for intervention and he performed psychoanalysis, also called the “talking cure”, on her.

Breuer would tell Anna O. to say anything that came to her mind, such as her thoughts, feelings, and childhood experiences. It was noted that her symptoms subsided by talking things out.

However, Breuer later referred Anna O. to the Bellevue Sanatorium, where she recovered and went on to become a renowned writer and an advocate for women and children.

12. Patient HM

H.M., or Henry Gustav Molaison, was a severe amnesiac who had been the subject of countless psychological and neurological studies.

Henry was 27 when he underwent brain surgery to treat the epilepsy he had been experiencing since childhood. In an unfortunate turn of events, the surgery left him with severe amnesia: he could no longer store new long-term memories and lost part of his existing memory as well.

He was then regarded as someone living solely in the present, forgetting an experience as soon as it happened and only remembering bits and pieces of his past. Over the years, his amnesia and the structure of his brain had helped neuropsychologists learn more about cognitive functions .

Suzanne Corkin, a researcher, writer, and good friend of H.M., recently published a book about his life. Entitled Permanent Present Tense , this book is both a memoir and a case study following the struggles and joys of Henry Gustav Molaison.

13. Chris Sizemore

Chris Sizemore gained celebrity status in the psychology community when she was diagnosed with multiple personality disorder, now known as dissociative identity disorder.

Sizemore had several alter egos, including Eve Black, Eve White, and Jane. Various papers about her stated that these alter egos formed as a coping mechanism against the traumatic experiences she underwent in her childhood.

Sizemore said that although she eventually succeeded in unifying her alter egos into one dominant personality, there were periods in her past that had been experienced by only one of them. For example, her husband had married her Eve White alter ego, not her dominant personality.

Her story inspired her psychiatrists to write a book about her, entitled The Three Faces of Eve , which was then turned into a 1957 movie of the same title.

14. David Reimer

When David was just 8 months old, he lost his penis because of a botched circumcision operation.

Psychologist John Money then advised Reimer’s parents to raise him as a girl instead, naming him Brenda. His gender reassignment was supported by subsequent surgery and hormonal therapy.

Money described Reimer’s gender reassignment as a success, but problems started to arise as Reimer was growing up. His boyishness was not completely subdued by the hormonal therapy. When he was 14 years old, he learned about the secrets of his past and he underwent gender reassignment to become male again.

Reimer became an advocate for children undergoing the same difficult situation he had been through. He took his own life at the age of 38.

15. Kim Peek

Kim Peek was the inspiration behind Rain Man , an Oscar-winning movie about an autistic savant character played by Dustin Hoffman.

The movie was released in 1988, a time when autism wasn’t widely known and acknowledged yet. So it was an eye-opener for many people who watched the film.

In reality, Kim Peek was a non-autistic savant. He was exceptionally intelligent despite the brain abnormalities he was born with. He was like a walking encyclopedia, knowledgeable about travel routes, US zip codes, historical facts, and classical music. He also read and memorized approximately 12,000 books in his lifetime.

This list of experiments and case studies in psychology is just the tip of the iceberg! There are still countless interesting psychology studies that you can explore if you want to learn more about human behavior and dynamics.

You can also conduct your own mini-experiment or participate in a study conducted in your school or neighborhood. Just remember that there are ethical standards to follow so as not to repeat the lasting physical and emotional harm done to Little Albert or the Stanford Prison Experiment participants.

Asch, S. E. (1956). Studies of independence and conformity: I. A minority of one against a unanimous majority. Psychological Monographs: General and Applied, 70 (9), 1–70. https://doi.org/10.1037/h0093718

Bandura, A., Ross, D., & Ross, S. A. (1961). Transmission of aggression through imitation of aggressive models. The Journal of Abnormal and Social Psychology, 63 (3), 575–582. https://doi.org/10.1037/h0045925

Elliott, J., Yale University., WGBH (Television station : Boston, Mass.), & PBS DVD (Firm). (2003). A class divided. New Haven, Conn.: Yale University Films.

Festinger, L., & Carlsmith, J. M. (1959). Cognitive consequences of forced compliance. The Journal of Abnormal and Social Psychology, 58 (2), 203–210. https://doi.org/10.1037/h0041593

Haney, C., Banks, W. C., & Zimbardo, P. G. (1973). A study of prisoners and guards in a simulated prison. Naval Research Review , 30 , 4-17.

Latane, B., & Darley, J. M. (1968). Group inhibition of bystander intervention in emergencies. Journal of Personality and Social Psychology, 10 (3), 215–221. https://doi.org/10.1037/h0026570

Mischel, W. (2014). The Marshmallow Test: Mastering self-control. Little, Brown and Co.

Thorndike, E. (1920) A Constant Error in Psychological Ratings. Journal of Applied Psychology , 4 , 25-29. http://dx.doi.org/10.1037/h0071663

Watson, J. B., & Rayner, R. (1920). Conditioned emotional reactions. Journal of experimental psychology , 3 (1), 1.

8 Classic Psychological Experiments

Psychological experiments can tell us a lot about the human mind and behavior. Some of the best-known experiments have given us insights into topics such as conformity, obedience, attachment, and learning.

There are many famous (and sometimes infamous) psychological experiments that have helped shape our understanding of the human mind and behavior. Such experiments offered insights into how people respond to social pressure and how they develop associations that lead to fear. 

While many of these psychological experiments are well known even outside of psychology, it is important to recognize that many of them could not be performed today.

In many instances, these experiments would never receive institutional review board approval due to ethical concerns and the potential harm to participants.

In this article, learn more about some of the most famous psychological experiments and discover why some of them are considered so controversial.

Pavlov’s Dog Experiments, 1897

While not set up as a psychological experiment, Ivan Pavlov’s research on the digestive systems of dogs had a tremendous impact on the field of psychology. During his research, he noticed that dogs would begin to salivate whenever they saw the lab assistant who provided them with food.

By pairing a previously neutral stimulus (a sound) with a naturally occurring stimulus that automatically produces a response (food), Pavlov discovered that he could condition the dogs to salivate when they heard the sound.

The discovery of the classical conditioning process played a pivotal role in the formation of the behavioral school of psychology and has continued to influence our understanding of how learning can occur through associations.

Little Albert Experiment, 1920

Anyone who has ever taken an introductory course in psychology is probably familiar with the Little Albert experiment. In the famous experiment conducted in the 1920s by behaviorists John B. Watson and Rosalie Rayner, an infant was exposed to a white rat to which he initially exhibited no fear. The researchers then presented the rat accompanied by a loud clanging noise. 

After repeated pairings, the child began to cry when the rat alone was presented. This fear was even generalized to objects that resembled the rat such as fluffy white toys.

Watson’s research played an important role in the development of the school of thought known as behaviorism . It also provided evidence of the power of classical conditioning , which involves learning by association. 

The findings also had implications for our understanding of how fears can form, including phobias and irrational fears that sometimes develop early in life or after a single frightening experience.

Asch Conformity Experiment, 1951

The Asch conformity experiments were a series of psychological experiments conducted by psychologist Solomon Asch during the 1950s. The purpose of the experiments was to determine how much a person’s opinions were influenced by the opinions of the rest of the group.

In the study, participants were told that they were taking a “vision test” along with several others. In reality, the other individuals in the room were actors who were following the experimenters’ instructions.

When shown several line segments, the participants were supposed to select the one that matched a sample line segment in length.

In some cases, those who were in on the study would pick the obvious match. In other cases, however, the study confederates would unanimously pick the wrong line segment. 

The results of Asch’s experiments found that people tended to conform when other people unanimously picked the wrong answer.

Across the 12 trials he conducted, Asch found that around 33% of the naive participants conformed to the group and picked the wrong answer. In a control group, for comparison, less than 1% of the participants ever chose the wrong answer. 

The experiments revealed how group pressure can cause people to change their own behavior in order to fit in with the rest of the group.

Robbers Cave Experiment, 1954

In the Robbers Cave psychological experiment , researcher Muzafer Sherif and his colleagues used a summer camp setting to look at how factors such as competition and prejudice influenced conflict between groups. 

In the experiment, boys attending a summer camp were randomly assigned to two groups. The groups were then placed in situations where they had to compete with one another, and this competition led to conflict and hostility between the two groups.

Later, the experimenters attempted to reconcile the groups and eliminate the tensions that the previous competitive tasks had created. Bonding activities had little impact, but the researchers found that situations requiring members of the two groups to work together to overcome a problem were effective at reducing tensions.

The study had implications for how different social groups create their own norms and hierarchies and then use those rules to exclude outsiders.

Harlow’s Rhesus Monkey Experiments, 1958

Psychologist Harry Harlow conducted a series of experiments during the 1950s and 1960s that demonstrated how important love and affection were in the course of child development. In his experiments, he placed infant monkeys in an environment where they had access to two different surrogate “mothers.”

One was a wire mother who held a bottle and provided food, while the other was a soft surrogate mother who was covered in a terry cloth fabric. 

While the cloth mother did not provide nourishment, the experiments demonstrated that the baby monkeys preferred the soft mother over the wire mother. When they were frightened and needed comfort, they would turn to the cloth mother for security.

Milgram Obedience Experiment, 1963

The Milgram experiment was one of the most famous and controversial psychological experiments ever performed. The experiments involved an experimenter ordering participants to deliver electrical shocks to other people.

While the people who were supposedly receiving the shocks were actors who pretended to be in pain, the participants fully believed that they were delivering painful, and even dangerous shocks. 

Milgram’s findings suggested that up to 65% of the participants were willing to deliver potentially fatal shocks to another person simply because an authority figure ordered them to do so. 

Based on these findings, Milgram proposed that people were willing to follow orders from an authority figure if they think that person will take responsibility for the results and is qualified to give orders. 

Bobo Doll Experiment, 1961-1963

In this experiment, Albert Bandura investigated the effects of observational learning by having young children witness acts of aggression and then observing them to see if they copied the behavior.

Children in the study observed adults act aggressively toward a Bobo doll, a large inflatable doll resembling a bowling pin. When hit or kicked, the doll tips sideways and then returns to an upright position.

Bandura found that children who watched an adult act aggressively were more likely to imitate those behaviors later when they were allowed to play in a room with the Bobo doll.

The study played an important role in our understanding of social learning theory and how kids learn by watching others. 

Stanford Prison Experiment, 1971

In this infamous social psychology experiment, Philip Zimbardo set up a mock prison in the basement of the Stanford University psychology department and randomly assigned a group of 24 college students to either be guards or prisoners. 

The study was originally supposed to last for two weeks but had to be stopped after six days because participants reportedly became so immersed in their roles that they began to experience upsetting psychological effects. The results were believed to demonstrate the power that social roles and expectations can exert over a person’s behavior. 

The experiment is widely described in psychology textbooks and even became the subject of a feature film in 2015. 

More recent analysis has suggested that the experiment had serious design flaws, among other problems. In addition to the already problematic ethics of the study, analysis of the study’s records suggests that the experimenters may have played a role in encouraging the abusive behavior displayed by the participants.

Impact of Psychological Experiments

The psychology experiments of the past have had an impact on our understanding of the human mind and behavior. While many of the experiments described here have problems in terms of their design and their ethics, they remain some of the most famous examples of research within the field of psychology.

Learning more about these classic experiments can help you better understand research that informed the development of psychology. It can also provide inspiration for your own psychology experiment ideas and provide information to explore in your psychology papers .

Bandura A, Ross D, Ross SA. Transmission of aggression through imitation of aggressive models . Journal of Abnormal and Social Psychology. 1961;63:575-82. doi:10.1037/h0045925

Gantt WH. Ivan Pavlov. Encyclopaedia Britannica. Updated February 23, 2020.

Gonzalez-franco M, Slater M, Birney ME, Swapp D, Haslam SA, Reicher SD. Participant concerns for the Learner in a Virtual Reality replication of the Milgram obedience study. PLoS ONE. 2018;13(12):e0209704. doi:10.1371/journal.pone.0209704

Jeon, HL. The environmental factor within the Solomon Asch Line Test . International Journal of Social Science and Humanity. 2014;4(4):264-268. doi:10.7763/IJSSH.2014.V4.360 

Le Texier T. Debunking the Stanford Prison Experiment . American Psychologist . 2019;74(7):823-839. doi:10.1037/amp0000401

Sherif M, Harvey OJ, White BJ, Hood WR, Sherif CW. Intergroup conflict and cooperation: The Robbers Cave experiment (Vol. 10) . Norman, OK: University Book Exchange; 1961.

Zimbardo P, Haney C, Banks WC, Jaffe D. The Stanford Prison Experiment: A simulation study of the psychology of imprisonment. Stanford University, Stanford Digital Repository, Stanford; 1971.

7 Famous Psychology Experiments

Many famous experiments studying human behavior have impacted our fundamental understanding of psychology. Though some could not be repeated today because they breached ethical boundaries, that does not diminish the significance of those psychological studies. Some of these important findings include a greater awareness of depression and its symptoms, how people learn behaviors through the process of association, and how individuals conform to a group.

Below, we take a look at seven famous psychological experiments that greatly influenced the field of psychology and our understanding of human behavior.

The Little Albert Experiment, 1920

A Johns Hopkins University professor, Dr. John B. Watson, and a graduate student wanted to test a learning process called classical conditioning. Classical conditioning involves learning involuntary or automatic behaviors by association, and Dr. Watson thought it formed the bedrock of human psychology.

A nine-month-old infant, dubbed “Albert B,” was volunteered for Dr. Watson and Rosalie Rayner’s experiment. Albert played with white furry objects, and at first the child displayed joy and affection. Over time, as he played with the objects, Dr. Watson would make a loud noise behind the child’s head to frighten him. After numerous trials, Albert was conditioned to be afraid when he saw white furry objects.

The study proved that humans could be conditioned to enjoy or fear something, which many psychologists believe could explain why people have irrational fears and how they may have developed early in life. It remains a classic example of an experimental study in psychology.

Stanford Prison Experiment, 1971

Stanford professor Philip Zimbardo wanted to learn how individuals conformed to societal roles. He wondered, for example, whether the tense relationship between prison guards and inmates in jails had more to do with the personalities of each or the environment.

During Zimbardo’s experiment, 24 male college students were assigned to be either a prisoner or a guard. The prisoners were held in a makeshift prison inside the basement of Stanford’s psychology department. They went through a standard booking process designed to take away their individuality and make them feel anonymous. Guards were given eight-hour shifts and tasked to treat the prisoners just like they would in real life.

Zimbardo found rather quickly that both the guards and prisoners fully adapted to their roles; in fact, he had to shut down the experiment after six days because it became too dangerous. Zimbardo even admitted he began thinking of himself as a police superintendent rather than a psychologist. The study confirmed that people will conform to the social roles they’re expected to play, especially overly stereotyped ones such as prison guards.

“We realized how ordinary people could be readily transformed from the good Dr. Jekyll to the evil Mr. Hyde,” Zimbardo wrote.

The Asch Conformity Study, 1951

Solomon Asch, a Polish-American social psychologist, was determined to see whether an individual would conform to a group’s decision, even if the individual knew it was incorrect. Conformity is defined by the American Psychological Association as the adjustment of a person’s opinions or thoughts so that they fall closer in line with those of other people or the normative standards of a social group or situation.

In his experiment, Asch selected 50 male college students to participate in a “vision test.” Individuals would have to determine which line on a card was longest. However, the individuals at the center of the experiment did not know that the other people taking the test were actors following scripts, and at times selected the wrong answer on purpose. Asch found that, on average over 12 trials, nearly one-third of the naive participants conformed with the incorrect majority, and only 25 percent never conformed to the incorrect majority. In the control group that featured only the participants and no actors, less than one percent of participants ever chose the wrong answer.

Asch’s experiment showed that people will conform to groups to fit in (normative influence) because of the belief that the group was better informed than the individual. This explains why some people change behaviors or beliefs when in a new group or social setting, even when it goes against past behaviors or beliefs.

The Bobo Doll Experiment, 1961, 1963

Stanford University professor Albert Bandura wanted to put the social learning theory into action. Social learning theory suggests that people can acquire new behaviors “through direct experience or by observing the behavior of others.” Using a Bobo doll , which is a blow-up toy in the shape of a life-size bowling pin, Bandura and his team tested whether children witnessing acts of aggression would copy them.

Bandura and two colleagues selected 36 boys and 36 girls between the ages of 3 and 6 from the Stanford University nursery and split them into three groups of 24. One group watched adults behaving aggressively toward the Bobo doll. In some cases, the adult subjects hit the doll with a hammer or threw it in the air. Another group was shown an adult playing with the Bobo doll in a non-aggressive manner, and the last group was not shown a model at all, just the Bobo doll.

After each session, children were taken to a room with toys and studied to see how their play patterns changed. In a room with aggressive toys (a mallet, dart guns, and a Bobo doll) and non-aggressive toys (a tea set, crayons, and plastic farm animals), Bandura and his colleagues observed that children who watched the aggressive adults were more likely to imitate the aggressive responses.

Unexpectedly, Bandura found that female children behaved more physically aggressively after watching a male model and more verbally aggressively after watching a female model. The results of the study highlight how children learn behaviors from observing others.

The Learned Helplessness Experiment, 1965

Martin Seligman wanted to research a different angle related to Dr. Watson’s study of classical conditioning. In studying conditioning with dogs, Seligman made an astute observation: the subjects, which had already been conditioned to expect a mild electric shock if they heard a bell, would sometimes give up after another negative outcome, rather than searching for the positive outcome.

Under normal circumstances, animals will always try to get away from negative outcomes. When Seligman tested his experiment on animals that hadn’t been previously conditioned, the animals attempted to find a positive outcome. By contrast, the dogs that had already been conditioned to expect a negative response assumed there would be another negative response waiting for them, even in a different situation.

The conditioned dogs’ behavior became known as learned helplessness, the idea that some subjects won’t try to get out of a negative situation because past experiences have forced them to believe they are helpless. The study’s findings shed light on depression and its symptoms in humans.

The Milgram Experiment, 1963

In the wake of the horrific atrocities carried out by Nazi Germany during World War II, Stanley Milgram wanted to test the levels of obedience to authority. The Yale University professor wanted to study if people would obey commands, even when it conflicted with the person’s conscience.

Participants of the condensed study, 40 males between the ages of 20 and 50, were split into learners and teachers. Though it seemed random, actors were always chosen as the learners, and unsuspecting participants were always the teachers. A learner was strapped to a chair with electrodes in one room while the experimenter (another actor) and a teacher went into another.

The teacher and learner went over a list of word pairs that the learner was told to memorize. When the learner incorrectly paired a set of words together, the teacher would shock the learner. The teacher believed the shocks ranged from mild all the way to life-threatening. In reality, the learner, who intentionally made mistakes, was not being shocked.

As the voltage of the shocks increased and the teachers became aware of the believed pain caused by them, some refused to continue the experiment. After prodding by the experimenter, 65 percent resumed. From the study, Milgram devised the agency theory , which suggests that people allow others to direct their actions because they believe the authority figure is qualified and will accept responsibility for the outcomes. Milgram’s findings help explain how people can make decisions against their own conscience, such as when participating in a war or genocide.

The Halo Effect Experiment, 1977

University of Michigan professors Richard Nisbett and Timothy Wilson were interested in following up a study from 50 years earlier on a concept known as the halo effect . In the 1920s, American psychologist Edward Thorndike researched a phenomenon in the U.S. military that showed cognitive bias. This is an error in how we think that affects how we perceive people and make judgements and decisions based on those perceptions.

In 1977, Nisbett and Wilson tested the halo effect using 118 college students (62 males, 56 females). Students were divided into two groups and were asked to evaluate a male Belgian teacher who spoke English with a heavy accent. Participants were shown one of two videotaped interviews with the teacher on a television monitor. The first interview showed the teacher interacting cordially with students, and the second interview showed the teacher behaving inhospitably. The subjects were then asked to rate the teacher’s physical appearance, mannerisms, and accent on an eight-point scale from appealing to irritating.

Nisbett and Wilson found that on physical appearance alone, 70 percent of the subjects rated the teacher as appealing when he was being respectful and irritating when he was cold. When the teacher was rude, 80 percent of the subjects rated his accent as irritating, as compared to nearly 50 percent when he was being kind.

The updated study on the halo effect shows that cognitive bias isn’t exclusive to a military environment. Cognitive bias can get in the way of making the correct decision, whether it’s during a job interview or deciding whether to buy a product that’s been endorsed by a celebrity we admire.

How Experiments Have Impacted Psychology Today

Contemporary psychologists have built on the findings of these studies to better understand human behaviors, mental illnesses, and the link between the mind and body. For their contributions to psychology, Watson, Bandura, Nisbett and Zimbardo were all awarded Gold Medals for Life Achievement from the American Psychological Foundation.

How to Test Conformity With Your Own Psychology Experiment

Plus, Questions to Spark Conformity Experiment Ideas



Conformity involves adopting certain attitudes or behaviors to fit in with a particular group of people. Conformity experiments can be interesting project ideas for a psychology class, in addition to just being fun to perform.

Here we share some conformity experiments that have sought to better understand how people conform. These can be used as inspiration when developing our own experiments. We also provide a few questions that can help us come up with even more conformity experiment ideas.

Famous Conformity Experiments

One of the most well-known series of conformity experiments was conducted by psychologist Solomon Asch in the 1950s. Known as the Asch conformity experiments, these studies demonstrated the impact of social pressure on individual behavior.

Participants in these studies were told that they were in a vision experiment and asked to look at three lines of different lengths to determine which was the longest. They were then placed with a group that they thought included others in the study. In reality, the others were actually in on the experiment.

After a few trials in which everyone stated the correct answer, those who were in on the experiment began choosing an incorrect response. When surrounded by people citing an incorrect answer, 75% of the true study subjects also gave an incorrect response to at least one of the line-length questions.

Other Conformity Experiments

Another popular conformity experiment was performed on the TV show Candid Camera. It involved a group of people on an elevator who all stood facing the rear of the elevator. Inevitably, everyone else who got on ended up also facing the rear so as not to stand out from the rest.

One young man even turned repeatedly to every side, along with the rest of the group, and took off his hat when the others did.

Other conformity experiments that have been performed include:

  • Having a group of people stare up at a building
  • Picketing with blank signs and pamphlets for no specific cause
  • When one student leaves a classroom, the teacher has everyone else stand up when the student returns and sits down

A Conformity Experiment Example

Imagine that a student is in a math class and the instructor asks a basic math question. What is 8 x 4? The student knows that the correct answer is 32. However, when the teacher begins asking other students in the room for the answer, each one says that it is 27. How does the student respond?

This is a classic example of a conformity experiment in action. When the teacher finally gets to the student, does the student trust their own math skills and provide the correct answer or do they go along with the rest of the group and say that the answer is 27—even when they know that this is an incorrect response?

For some, the desire to fit in and belong is so strong that they will provide an answer that they know is incorrect. This helps them avoid being considered an "outsider" to others in the group.

Conformity Experiment Ideas

One way to envision our own experiment is to consider some of the conformity experiments that have been performed in the past. It can also be helpful to consider a few questions we could answer in our own psychology experiment.

Here are some questions that may spark a few conformity experiment ideas:

  • How does group size impact conformity? Try the experiment with different numbers of helpers to see how many other people must be present before a person starts conforming to the group (a minimal analysis sketch follows this list).
  • What effect does age have on conformity? Try the experiment with participants in different age groups to see if the results differ.
  • What's the impact of gender on conformity? Experiment to see if a participant is more likely to conform if other participants are of the same gender. What are the results if no other participants share their gender?
  • How does the situation influence conformity? Are people more likely to conform in certain settings, such as a classroom, than they are in more natural, everyday settings? Run trials in various settings to see if there is a difference.
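
As a minimal sketch of how the group-size question above could be analysed once the data are collected (the counts below are invented purely for illustration, and the analysis uses SciPy's chi-square test of independence rather than any particular classroom tool):

from scipy.stats import chi2_contingency

# Hypothetical counts: rows are group-size conditions, columns are [conformed, did not conform].
observed = [
    [4, 16],   # 2 confederates
    [9, 11],   # 4 confederates
    [13, 7],   # 8 confederates
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # a small p-value would suggest group size matters
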

Additional Conformity Experiment Tips

Performing a psychology experiment for class can be a bit intimidating. Before beginning, we should always talk with the instructor about our experiment ideas to be sure that we have permission to carry out our project. In some cases, we may have to submit our idea for review beforehand to receive permission to experiment with human participants.

Conducting conformity experiments is a great way to learn more about the impacts that groups can have on individuals. Playing around with certain variables can widen our understanding of how far people will go to fit in, making them good conformity experiment ideas to try.

Kim D, Hommel B. Social cognition 2.0: Toward mechanistic theorizing. Front Psychol. 2019;10:2643. doi:10.3389/fpsyg.2019.02643

Sowden S, Koletsi S, Lymberopoulos E, Militaru E, Catmur C, Bird G. Quantifying compliance and acceptance through public and private social conformity. Conscious Cogn. 2018;65:359-367. doi:10.1016/j.concog.2018.08.009

Howard J. Bandwagon effect and authority bias. In: Cognitive Errors and Diagnostic Mistakes. 2018:21-56. doi:10.1007/978-3-319-93224-8_3

By Kendra Cherry, MSEd. Kendra Cherry is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."


Open Access | Peer-reviewed | Research Article

Evaluating the perceived affective qualities of urban soundscapes through audiovisual experiments

Maria Luiza de Ulhôa Carvalho, Margret Sibylle Engel, Bruno M. Fazenda, William J. Davies

Affiliations: Faculty of Visual Arts, Federal University of Goiás, Goiânia, Brazil; Chair of Acoustics and Haptics, Technische Universität Dresden, Dresden, Germany; Acoustics Research Centre, University of Salford, Manchester, United Kingdom; Environmental Research & Innovation Centre, University of Salford, Manchester, United Kingdom

  • Published: September 5, 2024
  • https://doi.org/10.1371/journal.pone.0306261

The study of perceived affective qualities (PAQs) in soundscape assessments has increased in recent years, with methods varying from in-situ to laboratory. Through technological advances, virtual reality (VR) has facilitated evaluations of multiple locations in the same experiment. In this paper, VR reproductions of different urban sites were presented in an online and a laboratory environment, testing three locations in Greater Manchester (‘Park’, ‘Plaza’, and pedestrian ‘Street’) in two population densities (empty and busy) using the ISO/TS 12913–2 (2018) soundscape PAQs. The studied areas had audio and video recordings prepared for 360° video and binaural audio VR reproductions. The aims were to observe population density effects within locations (Wilcoxon test) and variations between locations (Mann-Whitney U test) within methods. Population density and comparisons among locations demonstrated a significant effect on most PAQs. Results also suggested that big cities can present homogeneous sounds, composing a ‘blended’ urban soundscape, independently of functionality. These findings can support urban design in a low-cost approach, where urban planners can test different scenarios and interventions.

Citation: Carvalho MLdU, Engel MS, Fazenda BM, Davies WJ (2024) Evaluating the perceived affective qualities of urban soundscapes through audiovisual experiments. PLoS ONE 19(9): e0306261. https://doi.org/10.1371/journal.pone.0306261

Editor: Shazia Khalid, National University of Medical Sciences, PAKISTAN

Received: October 31, 2023; Accepted: June 13, 2024; Published: September 5, 2024

Copyright: © 2024 Carvalho et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data (scores for the perceived affective qualities - PAQs) are within the manuscript and its Supporting Information files.

Funding: The work was funded by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) - Finance Code 001 and the Universidade Federal de Goiás pole Goiânia, Brazil. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

Since the publication of ISO/TS 12913–2 [1], the characterisation of the affective attributes of the sonic environment has increased significantly over the years [2–7]. These affective attributes, or Perceived Affective Qualities (PAQs), originated from the research of Axelsson et al. [8]. They help to detect the sound qualities of an investigated area, resulting in tools for urban sound management, effective urban planning, and noise control [9]. Studies point out that understanding emotional responses to the soundscape supports design decisions [10], a better opportunity to achieve users’ satisfaction [11], and quality of life [12].

Regarding the emotional assessment of acoustic environments, the work of Axelsson et al. [8] has been the reference for soundscape research. Their model was based on Russell’s circumplex affective model for environments [13]. Axelsson et al. [8] synthesised the semantic scales into a two-dimensional space constructed by pleasantness and eventfulness, which was later adopted as the PAQs in method A of the standard ISO/TS 12913–2 [1]. Rotating these two axes by 45 degrees yields additional diagonal dimensions, each a mixture of the orthogonal pleasant and eventful axes. Thus, ISO/TS 12913–2 introduces and describes the resulting eight attributes, arranged in four pairs: ‘eventful-uneventful’, ‘pleasant-annoying’, ‘vibrant-monotonous’, and ‘calm-chaotic’. However, this model is still under investigation and validation in other languages through the Soundscape Attributes Translation Project [14]. For instance, soundscape investigators still lack consensus in identifying the origins and effects of emotional responses to sounds [4, 15, 16]. To assess these scales, researchers use self-reports, in which people rate the sounds they perceive, through methods ranging from in-situ experiments to laboratory experiments, including virtual reality (VR).
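
As a minimal sketch of that 45-degree rotation (using P for the pleasantness coordinate and E for the eventfulness coordinate, both centred on zero; the symbols are illustrative and not taken from the standard), the two diagonal axes can be written as

\[
\text{vibrant-monotonous} = \tfrac{1}{\sqrt{2}}\,(P + E), \qquad
\text{calm-chaotic} = \tfrac{1}{\sqrt{2}}\,(P - E),
\]

so a location rated as both pleasant and eventful projects strongly onto the vibrant end of its diagonal, which is consistent with the quadrant labels used for the study areas later in the paper.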

The main methods for subjective data collection in soundscape studies have been soundwalks, interviews, listening tests, and focus groups [17]. ISO/TS 12913–2 suggests the first two methods [1]. However, the systematic review by Engel et al. [17] demonstrated that most recent studies use listening tests on the main topic of ‘soundscape quality’, using semantic differential tools to evaluate stimuli from parks, squares, shopping areas, and traffic sounds, with students and academic staff as participants [17]. Controlled experiments take place in acoustically treated rooms with calibrated audio reproduction systems [18]. These studies allow the investigation of various aspects influencing auditory codification and perception [19], guarantee purity and control of factors [18], and enable analyses of complex interactions or distinct effects [20]. In the laboratory, there are several listening experiment modalities: with and without visual material [21], from simple (mono) [22] to complex audio reproduction (spatial audio) [23], and multimodal (different sensory stimuli), potentially implemented through virtual reality (VR) experiments.

Furthermore, VR technology can facilitate the evaluation of multiple locations in the same experiment under safe conditions [18] and in a more engaging way [24], allowing observations of effects on presence, realism, involvement, distraction level, and the auditory aspect [25]. Participants are immersed in realistic scenarios, giving them a ‘sense of presence’ [26], representing an experience similar to being in the real place. Audio, visual, tactile, and smell stimuli can enhance the multimodal experience. Regarding the virtual sonic environment, reproduction formats vary from mono to spatial audio [27]. Binaural audio played over headphones and ambisonic audio through loudspeakers are the main forms of audio reproduction in soundscape studies. In the studies of Sun et al. [28, 29], when spatial audio was tested through headphones and loudspeakers in a VR experiment, participants’ subjective responses demonstrated that the sense of immersion and realism was not affected by the type of audio reproduction.

Nevertheless, field and VR laboratory tests should sustain the experiment’s ‘ecological validity’. To guarantee this experimental condition, the laboratory reproduction of real-life audiovisual stimuli should create a sense of immersion and realism similar to the original scenery [30]. If similarities are maintained between real and VR reproductions, laboratory experiments can support research with controlled factors. However, laboratory settings may also amplify results and bias conclusions; thus, outcomes should be interpreted cautiously [6]. So far, most studies have confirmed similar soundscape perceptions between in-situ and laboratory VR listening experiments [6, 31–33], pointing to VR methods as a good strategy for soundscape research.

Another self-report data collection method is online experiments, which increased significantly during COVID-19. For example, purchases of the Lucid platform for online data collection in research tripled from 2019 to 2020 [34]. The drawbacks of online experiments are reduced attentiveness [34], the lack of control over the audio reproduction and system calibration used by the participants [32], the absence of assistants during the experiment, and unreliable responses given by different participants due to their context, among others [35]. The advantages of a web-based approach in soundscape studies include a higher number of participants, ease of sharing, and engagement of citizens in sound design and urban planning. Regarding urban sound design, ‘local experts’ (people who live in and use the studied location [36]), local authorities, planners, designers, and whoever else is related to the site should discuss their interests to indicate activities for the urban place [37]. Diversity in activities tends to create a more dynamic atmosphere in urban places. In these circumstances, acoustic zoning consists of separating activities by distance in space, in time, or both [37]. Bento Coelho describes in his soundscape design process that a sound catalogue or sound identity map should be developed, where sounds are correlated to functions, activities, other senses, and preferred sounds of the place [38]. Additionally, the appropriateness [7] and the expectations [39] of the sonic environment should converge towards a coherent soundscape. The guidelines mentioned above can delimit acoustic zones based on sound sources, avoiding ‘lo-fi’ soundscapes. The latter represent sounds that are not easily located within an obscure population of sounds [40], which may constitute a ‘blended’ sonic environment. Its opposite is the ‘hi-fi’ soundscape, with a clear distinction between foreground and background sounds [40], making it simple to identify the predominant sound source in the sonic environment.

The acoustically delimited zones can correlate to the characteristics and functions of the locations. Urban soundscape studies cover sites varying among natural places, public areas, squares, pedestrian streets, and shopping areas [17]. However, vibrant places are less studied. These are related to pleasant and eventful attributes linked to busy contexts with specific human activities [41]. Previous works confirm that the ‘presence of people’ in places leads to the ‘eventful’ dimension and may define a vibrant experience [3, 29]. Most soundscape studies investigate parks, where natural sounds indicate psychological restoration [42], places for human de-stress [5, 42], and improvement in the evaluation of the sonic environment [43]. These locations may represent pleasant places that can foster feelings of joy and help the public carry out self-selected activities.

Based on the factors presented above, this work adopts VR experiments: an online VR experiment, The Manchester Soundscape Experiment Online (MCR online), carried out in 2020, and a laboratory VR experiment, The Laboratory VR Soundscape Experiment (LAB VR), carried out in 2022, both using spatial audio and 360° video recordings. Participants will be exposed to three urban sites (Peel Park, an urban park; Market Street, a pedestrian street; and Piccadilly Gardens, a plaza) in two population densities (empty and busy), followed by a self-report of the soundscape PAQs. Four hypotheses, stated below, are investigated. The Wilcoxon signed-rank test will be applied for comparisons within the two experiments, between the empty and busy conditions for the same location. In this case, the null and alternative hypotheses are:

  • H 01 = The perceptual response (PAQs) will change when in different population densities in the same location and experiment; and
  • H a1 = The perceptual response (PAQs) will not change when in different population densities in the same location and experiment.

The Mann–Whitney U test will be applied to compare the different soundscape locations for each data collection method, with the hypotheses as follows:

  • H 02 = The perceptual response (PAQs) will change according to the different urban locations for each data collection method; and
  • H a2 = The perceptual response (PAQs) will not change according to the different urban locations for each data collection method.

The PAQs of ISO/TS 12913–2 [1] were selected as subjective responses given their international standardisation. The aim is to observe the PAQ results from the two perspectives above. The first concerns an evaluation within each experiment, where differences between the two population densities are analysed. The second investigates the variation between locations for each experimental method. The findings are expected to enhance comprehension of how people perceive the studied urban soundscape conditions through different VR methods, supporting urban sound design and future urban development appraisal [44].

2. Materials and methods

Fig 1 illustrates the investigated areas, defined according to a previous study by Carvalho et al. [45]. They were derived from a structured interview designed to identify locations within the four quadrants of the ISO/TS 12913–2 [1] PAQ space (‘vibrant’, ‘calm’, ‘monotonous’, and ‘chaotic’ attributes).

Fig 1. The top illustrates all locations on the Manchester map. The middle row shows the ‘Street’ map, pictures of empty and busy conditions, the ‘Plaza’ map, and pictures of empty and busy conditions. The bottom row illustrates the ‘Park’ map, pictures of empty and busy conditions, north, and the UK map with Manchester’s position. The yellow dots are the evaluated sites. The areas shaded in blue are the areas studied. Pictures by Carvalho, taken between 2019 and 2020. https://doi.org/10.1371/journal.pone.0306261.g001

2.1 Study areas

Piccadilly Gardens (a popular plaza in the city centre) represented the ‘vibrant’ attribute and is called ‘Plaza’ from now on in the paper. Peel Park (a park at the University of Salford) exemplified the ‘calm’ attribute and is referred to as ‘Park’ hereafter. A bus stop (a common bus stop in front of the University of Salford) corresponded to the ‘monotonous’ attribute, and Market Street (a pedestrian commercial street) was selected for the ‘chaotic’ attribute and is hereinafter referred to as ‘Street’. The bus stop was excluded because the LAB VR experiment did not use this condition.

Piccadilly Gardens is the largest public space in central Manchester, with 1.49 ha and various functions such as crossing, eating places, children’s play, and venues for small and large events [46]. A contemporary design changed the garden into a plaza in 2002 [46], adding a water fountain, a playground, a café, a barrier by Japanese architect Tadao Ando that also served as protection for the central plaza, grass areas, and trees where people sit on sunny days. The location is bounded by Piccadilly Street to the north, Mosley Street to the west, Parker Street to the south, and the One Piccadilly Gardens building to the east. The constant sound source in both population densities was the water fountain. In the empty condition, the fountain sound was predominant, but mechanical sounds were also present in the background. In the busy condition, the predominant sound was a rich presence of human sounds, such as chatter and children shouting, while traffic sounds from nearby trams and their brakes were audible in the background.

Peel Park covers 9.40 ha and is one of the oldest public parks in the world, dating from 1846 [47]. Today, it is integrated with the Peel Park Campus of the University of Salford and includes walking paths, tall and scattered trees, a playground structure, sculptures, a garden with flowerbeds, plenty of green area, and benches. The park is bounded by the student accommodation and the access to the David Lewis Sports Ground to the north; the River Irwell, with a bridge to The Meadow (a public green space) and a housing area, to the east; the Maxwell Building and the Salford Museum and Art Gallery to the south; and University House, the Clifford Library, and the Cockcroft Building to the west. The local population uses the location for ‘passive’ recreation, exercise, and as a crossing path to other sites. The constant sound source in both population densities was nature sounds, specifically bird calls. In the empty condition, four different bird calls were predominant and identified, namely ‘Pica pica’ (the Magpie), the ‘Eurasian Wren’, the ‘Redwing’, and the ‘Eurasian Treecreeper’. In the busy condition, the bird calls were not recognisable, given the masking effect of human sounds, which placed the nature sounds in the background, while the predominant foreground sounds were children talking, shouting, and playing football.

Market Street is approximately 370 meters long, with a 280-meter pedestrian zone occupying around 0.91 ha. It is delimited by Exchange Street on the west and High Street on the east. The pedestrian zone lies between High Street and Corporation Street, with primarily commercial activities such as clothes and shoe stores, banks, grocery stores, street food huts, gyms, bookstores, mobile phone stores, pharmacies, coffee shops, and three accesses to the Manchester Arndale shopping centre. Where the street carries vehicle traffic, commercial activities are more related to beauty products, confectionery, stationery, clothing and footwear, coffee shops, and access to the Royal Exchange Building. The constant sound source in both population densities was the ‘hoot’ of the nearby tram. In the empty condition, the predominant sounds were mechanical, such as snaps of machinery at different rhythms and frequency intervals. Traffic and chatter were also present in this condition. In the busy condition, snaps were still present, but the predominant sounds were human-made, such as babble and footsteps.

2.2 Audiovisual preparation

Two different sets of footage of the same studied areas were tested with two methods: an online VR questionnaire (MCR online) and a laboratory VR experiment (LAB VR). The audiovisual stimuli were different recordings in each experiment because participants of the MCR online complained about the video resolution; thus, new recordings were made with a higher-resolution camera for the LAB VR. Nevertheless, all recordings were done at the same positions. The study was conducted and approved by the Research, Innovation and Academic Engagement Ethical Approval Panel of the University of Salford (protocol code STR1819-31). Fig 2 illustrates the workflow for constructing the VR environments for the experiments.

Fig 2. Workflow for constructing the VR environments; each column represents a stage. https://doi.org/10.1371/journal.pone.0306261.g002

The SoundField ST250 microphone and a BSWA 308 sound pressure level meter were used in recordings with a sampling rate of 44.1 kHz. For the MCR online, the microphone was plugged into a ZOOM H6 Handy Recorder for the audio, and a Ricoh Theta S camera was used for the 360° videos. In the LAB VR, the microphone was plugged into an Edirol R-44 recorder, and an Insta360 Pro 2 camera was used for the 360° video recording.

Given ethical approval restrictions, a sign warning ‘Filming in progress’ was displayed with the equipment for public awareness before recordings. With a previously calibrated sound pressure level meter, a one-minute sample of the A-weighted equivalent continuous sound pressure level (LAeq,60) was registered to later adjust the laboratory reproductions to the field levels. After starting the microphone and camera, the researcher clapped in front of the equipment for later audiovisual alignment.
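
As a rough illustration of that level-matching step (a minimal sketch, assuming the recording is already A-weighted and calibrated to pascals; the function names are hypothetical and not from the authors' toolchain):

import numpy as np

P_REF = 20e-6  # reference sound pressure in pascals

def laeq_db(p_a):
    """Equivalent continuous level in dB of an A-weighted, calibrated pressure signal."""
    p_a = np.asarray(p_a, dtype=float)
    return 10.0 * np.log10(np.mean(p_a ** 2) / P_REF ** 2)

def gain_to_match(playback, target_laeq_db):
    """Linear gain that brings a playback signal to the LAeq measured in the field."""
    return 10.0 ** ((target_laeq_db - laeq_db(playback)) / 20.0)

# Hypothetical use: scale a lab reproduction towards the 70 dB(A) reported for the busy 'Plaza'.
# calibrated = playback * gain_to_match(playback, 70.0)
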

Recordings were made in the early hours (4 to 6 am) of a weekday for the empty condition, and in the afternoon (2 to 4 pm) at the weekend for the busy condition. On arrival, recording positions were set up so as not to interrupt circulation. The experimenter merged into the scenery, and the recordings lasted 10 to 12 minutes [29]. These procedures resembled those of the ‘Urban Soundscapes of the World’ project group [28, 29, 48].

Video files were transformed into equirectangular format (MCR online) or edited together (LAB VR). Audio and video stimuli were synchronised in time using the initial clap, then verified and corrected when necessary. For the MCR online, the selected audiovisual stimuli had a 30-second duration, following a previous study [49]. The stimulus duration was changed to 8 seconds in the LAB VR, using an fMRI soundscape experiment [50] as reference, because of a physiological test in another stage of the experiment.
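
One plausible way to recover that clap-based offset is a cross-correlation between the microphone and camera audio tracks (a minimal sketch, assuming both tracks share the sample rate fs; the variable names are illustrative and the result should still be checked against the clap transient):

import numpy as np
from scipy.signal import correlate, correlation_lags

def clap_offset_seconds(mic, cam, fs):
    """Seconds by which the clap appears later in `mic` than in `cam`.

    A positive value suggests delaying the camera track by that amount to
    line it up with the microphone track.
    """
    corr = correlate(np.asarray(mic, float), np.asarray(cam, float), mode="full")
    lags = correlation_lags(len(mic), len(cam), mode="full")
    return lags[np.argmax(np.abs(corr))] / float(fs)
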

Population density was calculated from the footage in order to select the audiovisual stimuli. The people-counting criteria followed a previous study that measured the number of individuals in a selected frame [51]. Surveys with ten participants were used to confirm the footage selected for the empty and busy conditions. When the criteria were not met, new stimuli were selected. A descriptive analysis of the sound events, foreground and background sounds, was carried out on the footage of the empty and busy conditions to select fragments rich in soundscape diversity [52], identity [53], character [54], and sound signals [40]. The LAB VR also had controlled sound signals, such as the water fountain at the ‘Plaza’, the tram hoot at the ‘Street’, and the bird calls at the ‘Park’, in both empty and busy conditions.

Audio files were calibrated to the field sound levels using a pre-calibrated high-frequency Head and Torso Simulator (HATS) connected to Brüel & Kjær PULSE software [6]. Audiovisual stimuli were aligned through audio rotation using the azimuth angle θ from the first-order ambisonics equations, that is, by rotating the X (front-back) component of the B-format (WXYZ) audio recordings [22]. The audio and video files were rendered into 3D head-tracked stimuli for VR reproduction. Stimuli reproductions were tested through the final experimental VR and headphone setup, recorded for calibration, verified at each step, and corrected when necessary.
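
A minimal sketch of that first-order rotation (assuming plain B-format channels W, X, Y, Z held as NumPy arrays; whether a positive angle rotates the scene or the listener depends on the convention in use, which is not specified here):

import numpy as np

def rotate_bformat_yaw(w, x, y, z, theta_rad):
    """Rotate a first-order B-format (WXYZ) signal about the vertical axis.

    W (omnidirectional) and Z (up-down) are unaffected by a yaw rotation;
    X (front-back) and Y (left-right) mix through a 2-D rotation by theta.
    """
    c, s = np.cos(theta_rad), np.sin(theta_rad)
    x_rot = c * x - s * y
    y_rot = s * x + c * y
    return w, x_rot, y_rot, z
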

2.3 Participants and experimental procedures

Participants in both experiments were recruited through the mailing list of the Acoustics Research Centre of Salford, representing people with connections to the University of Salford, and were over 18 years old. The MCR online also had respondents recruited by convenience sampling over the internet on social networks such as Facebook, Instagram, Twitter, and LinkedIn; they participated voluntarily from August 26 to November 30, 2020. LAB VR participants received £25 in Amazon vouchers as compensation. These subjects were recruited from June 27 to August 5, 2022.

Conditions were three locations (‘Park’, ‘Plaza’, and ‘Street’) in two population densities (empty and busy) responding to the eight PAQs questions. MCR online had 80 individuals rating the ‘Plaza’ and ‘Street’ (80 x 2-sites x 2-densities x 8-PAQs = 2560 results), and 75 assessing the ‘Park’ (75 x 2-densities x 8-PAQs = 1200 results). LAB VR had 36 participants (36 x 3-sites x 2-densities x 8-PAQs = 1728 results).

At the beginning of both experiments, participants signed a written consent form and received an information sheet describing the experiment and its procedure. Given that the MCR online also had Brazilian participants, the questionnaires were translated into Portuguese. Subjects were divided into two groups to reduce experimental time: ‘Plaza’ and ‘Street’, and ‘Park’ and the bus stop. Participants were recommended to use headphones and, when using mobile phones, to rotate the device to landscape orientation for better performance.

In the LAB VR, tests were done inside a semi-anechoic chamber at the Acoustics Research Centre of the University of Salford, Manchester, UK. Considering that cases of COVID were still occurring (July 2022), an email detailing the COVID protocol was sent before arrival. Participants sat in the centre of the semi-anechoic chamber, watched a short video explaining the research, answered the general information questions, and completed a training session. They watched the six audiovisual stimuli through the VIVE HMD with Beyerdynamic DT 1990 Pro headphones as many times as they wished and answered the subjective questions presented on a laptop.

Questionnaires were developed in an online platform. For the MCR online, the questionnaire began with a written consent form. General questions were asked about demographics (gender, age, nationality, and residency), auditory health (evidence of hearing loss, and tinnitus), and digital settings (what audio and video system they used during the experiment). Questions were responded to after watching each video. They were phrased: ‘Please, slide to the word that best describes the sounds you just heard. To the left (-) is NEGATIVE, and to the right (+) is POSITIVE.’ Paired PAQs presented with three synonyms each were ‘unpleasant-pleasant’, ‘uneventful-eventful’, ‘chaotic-calm’, and ‘monotonous-vibrant’ PAQs. Scores ranged from -10 to +10 for negative to positive semantic values of terms through a slider.

In the LAB VR, videos and questions were presented in random order. General questions covered demographics and auditory health (as in the MCR online), the number of languages spoken, education level, and acoustic or music background (none, a little, moderate, or expert level). The experimental questions were formulated as: ‘To what extent do you think the sound environment you just experienced was. . . 0 = Not at all, 50 = Neutral, and 100 = Extremely’. The PAQs were presented individually and rated through a slider. The soundscape attributes tested were the ‘pleasant’, ‘calm’, ‘uneventful’, ‘monotonous’, ‘annoying’, ‘chaotic’, ‘eventful’, and ‘vibrant’ PAQs, each rated separately. In both experiments, there was a final open question to collect feedback on the experiment.

2.4 Statistical analysis

Since the two data collections used different scales, the paired PAQs of the MCR online results were separated, and the -10 to +10 ratings were converted to scores from zero (0) to one hundred (100), while the LAB VR data were kept on their original scale. A summary of the collected data is presented in Table 1. Statistical analysis included the Wilcoxon signed-rank test for comparisons of the empty and busy conditions within the same location, and the Mann–Whitney U test for comparing the different locations at the same population density, both tests being applied within the same experiment. Given that comparisons were only between two conditions and data collection was on a continuous scale, a correction for multiple comparisons (Bonferroni) was considered unnecessary. Significant group differences were tested with the help of the statistical package IBM SPSS Statistics 29.0.1.0 ®.
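
As a minimal sketch of that pipeline (assuming a linear mapping from the -10..+10 slider onto the 0..100 scale and using SciPy rather than SPSS; the array names are hypothetical):

import numpy as np
from scipy.stats import wilcoxon, mannwhitneyu

def rescale_mcr(scores):
    """Map MCR online slider ratings (-10..+10) onto the LAB VR range (0..100)."""
    return (np.asarray(scores, dtype=float) + 10.0) * 5.0

# Paired comparison within one experiment: the same raters, empty vs. busy at one location.
# stat, p = wilcoxon(plaza_empty, plaza_busy)

# Independent comparison within one experiment: two locations at the same density.
# stat, p = mannwhitneyu(park_busy, street_busy, alternative="two-sided")
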

Table 1: https://doi.org/10.1371/journal.pone.0306261.t001

3.1 Descriptive analysis of participants

Table 2 presents the demographic information for the MCR online and LAB VR experiments. The MCR online experiment ran from August to November 2020. The 155 participants came from 63 countries: 52% from Brazil, 12% from the UK, and 14% from other parts of the world, including Europe, Africa, North and South America, Asia, and the Middle East. In Group 1, 80% used a computer screen and 20% a smartphone to watch the videos, while 76% used headphones and 24% external audio to reproduce the audio signals during the experiment; 89% declared no hearing loss and 11% some hearing loss; 77% reported no tinnitus and 23% signs of tinnitus [45]. In Group 2, 86% used a computer screen and 14% a smartphone, while 65% used headphones and 35% external audio; 90% declared no hearing loss and 10% some hearing loss; 81% reported no tinnitus and 19% signs of tinnitus [55].

Table 2: https://doi.org/10.1371/journal.pone.0306261.t002

For the LAB VR, participants originated from 11 countries, with 47% from the United Kingdom, 17% from India, and 36% from other parts of the world, including Europe, Africa, South America, and Asia. 97% declared no hearing loss and 3% mild hearing loss; 83% reported no tinnitus, and 17% reported infrequent or regular signs of tinnitus.

The MCR online counted 4.3 times more participants (N = 155) than the LAB VR (N = 36). In summary, over 50% of the MCR online participants were Brazilian, followed by 12% British, with a predominant age range of 26 to 35 years old (35%) and a balanced gender distribution.

3.2 Descriptive analysis of auditory stimuli

The acoustic and psychoacoustic characteristics of the auditory stimuli for each tested scenario are shown in Tables 3 and 4. For the MCR online, 17 visits were made from January to December 2019, on days with no precipitation, to Peel Park, Piccadilly Gardens, and Market Street in the empty and busy conditions to collect audio recordings for the online experiment. For the LAB VR, a total of nine visits for field recordings were made from December 2020 to July 2021, on days with no precipitation forecast, in the empty and busy conditions at Piccadilly Gardens (Plaza), Market Street (Street), and Peel Park (Park).

Table 3: https://doi.org/10.1371/journal.pone.0306261.t003

Table 4. Loudness (N), Sharpness (S), Roughness (R), Fluctuation Strength (FS), and Tonality (T). https://doi.org/10.1371/journal.pone.0306261.t004

As observed in Table 3, the highest 1-min LAeq value for the MCR online was for the ‘Plaza’ busy scenario, with 70 dB(A), while the lowest value was observed for the ‘Park’ empty scenario, with 46 dB(A). In the LAB VR, the highest value was for the ‘Plaza’ empty scenario, with 64.5 dB(A), and the lowest for the ‘Park’ empty scenario, with 47.1 dB(A).

Table 4 shows the psychoacoustic metrics of each scenario’s auditory stimuli used for the LAB VR. The greatest values are observed at the ‘Plaza’ busy for Loudness (N = 23.01 sone), Sharpness (S = 1.84 acum), and Tonality (T = 0.25 tu); at the ‘Park’ empty for Roughness (R = 0.03 asper); at the ‘Park’ busy for Roughness (R = 0.03 asper) and Tonality (T = 0.25 tu); and at the ‘Street’ busy for Roughness (R = 0.03 asper) and Fluctuation Strength (FS = 0.04 vacil). The smallest values are observed at the ‘Street’ empty for Loudness (N = 10.61 sone), Sharpness (S = 1.31 acum), Roughness (R = 0.02 asper), Fluctuation Strength (FS = 0.02 vacil), and Tonality (T = 0.02 tu). Small values were also observed for Sharpness (S = 1.31 acum) at the ‘Park’ busy, for Roughness (R = 0.02 asper) at the ‘Plaza’ busy, and for Roughness (R = 0.02 asper) and Fluctuation Strength (FS = 0.02 vacil) at the ‘Plaza’ empty.

3.3 Wilcoxon signed-ranks test results for busy versus empty conditions

The Wilcoxon signed-ranks test evaluated how the spaces were rated in the busy and empty conditions for each location and data collection method. Table 5 shows the Wilcoxon signed-ranks test results, which suit two related samples with a non-normal distribution. Significant p-values indicate that there are differences between samples. 85.4% (41 PAQs) of the results presented significant differences between the empty and busy conditions in the studied locations, and 14.6% (7 PAQs) of the results showed an unexpected similarity. Fig 3 shows a set of boxplots for each studied area and data collection method, where the results in the busy and empty conditions can be compared. It also represents the significance level of the Wilcoxon signed-rank test, using * for p-values below 0.05 and ** for p-values below 0.001. In the boxplots, there is a wider distribution in busy conditions for positive qualities such as ‘calm’, ‘eventful’, ‘pleasant’ and ‘vibrant’ in all samples (3a-3f), while in empty conditions ratings concentrated around the neutral answer. A smaller distribution of negative qualities such as ‘uneventful’ and ‘monotonous’ is also observed.

Fig 3. Columns for ‘Plaza’ (3a & 3d), ‘Park’ (3b & 3e), and ‘Street’ (3c & 3f); rows for MCR online (3a-3c) and LAB VR (3d-3f). * for significant p-value at < .05, and ** for significant p-value at < .001. https://doi.org/10.1371/journal.pone.0306261.g003

Table 5. * represents the p-value for 2-tailed significance. https://doi.org/10.1371/journal.pone.0306261.t005

As observed in Table 5 , the significant results for the MCR Online dataset between busy and empty presented in descending order were as follows: the ‘eventful’ PAQ in the ‘Street’ (Z = -7.16, p<0.001); the ‘vibrant’ PAQ in the ‘Plaza’ (Z = -6.888, p<0.001); the ‘uneventful’ PAQ in the ‘Street’ (Z = -6.647, p<0.001); the ‘calm’ in the ‘Park’ (Z = -6.645, p<0.001); the ‘monotonous’ PAQ in the ‘Street’ (Z = -6.629, p<0.001); the ‘pleasant’ PAQ in the ‘Park’ (Z = -5.791, p<0.001); the ‘chaotic’ PAQ in the ‘Street’ (Z = -4.626, p<0.001); and the ‘annoying’ PAQ in the ‘Plaza’ (Z = -3.685, p<0.001).

As observed in Table 5, the PAQ with non-significant values in both the MCR online and LAB VR is the ‘annoying’ quality, with a score around zero in all studied areas except the ‘Plaza’ in the MCR online. Non-significant ratings were also observed for the ‘pleasant’ quality, with a score around 50 at the ‘Plaza’, and for ‘vibrant’, with a neutral score at the ‘Street’. The non-significant p-values for the qualities mentioned above indicate no perceived acoustic differences between the empty and busy conditions.

For the LAB VR dataset, the largest differences between busy and empty were, in descending order, as follows: the ‘vibrant’ PAQ at the ‘Plaza’ (Z = -4.611, p<0.001); the ‘uneventful’ PAQ at the ‘Street’ (Z = -4.577, p<0.001); the ‘eventful’ PAQ at the ‘Park’ (Z = -4.263, p<0.001); the ‘monotonous’ PAQ at the ‘Street’ (Z = -4.229, p<0.001); the ‘calm’ PAQ at the ‘Park’ (Z = -4.227, p<0.001); the ‘chaotic’ PAQ at the ‘Street’ (Z = -3.99, p<0.001); and the ‘pleasant’ PAQ at the ‘Street’ (Z = -3.359, p<0.05).

3.4 Mann-Whitney U test results for comparison between locations

The Mann-Whitney U test compared the same population density condition among different locations within each data collection method. Table 6 shows the results of the Mann-Whitney U test, which suits two independent samples with a non-normal distribution. Significant p-values indicate that there are differences between locations. Some PAQs had no differences among locations, meaning no significance, with a p-value higher than 0.05. Figs 4 and 5 show the sets of boxplots for each studied-area comparison and data collection method, where the results in the busy and empty conditions can be compared. They also represent the significance level of the Mann-Whitney U tests, using * for p-values below 0.05 and ** for p-values below 0.001.

Fig 4. Columns for comparisons of ‘Plaza’ vs. ‘Street’ (4a & 4d), ‘Park’ vs. ‘Street’ (4b & 4e), and ‘Park’ vs. ‘Plaza’ (4c & 4f); rows for empty (4a-4c) and busy (4d-4f) conditions. * for significant p-value at < .05, and ** for significant p-value at < .001. https://doi.org/10.1371/journal.pone.0306261.g004

Fig 5. Columns for comparisons of ‘Plaza’ vs. ‘Street’ (5a & 5d), ‘Park’ vs. ‘Street’ (5b & 5e), and ‘Park’ vs. ‘Plaza’ (5c & 5f); rows for empty (5a-5c) and busy (5d-5f) conditions. * for significant p-value at < .05, and ** for significant p-value at < .001. https://doi.org/10.1371/journal.pone.0306261.g005

Table 6: https://doi.org/10.1371/journal.pone.0306261.t006

For the MCR online, 64.6% (31 PAQs) of results presented significant differences when comparing different locations, and 35.4% (17 PAQs) had similar results. Fig 4 shows the results from the MCR online. In the comparison of ‘Plaza’ vs. ‘Street’, a higher dispersion of results can be observed on the attribute ‘calm’ in the empty condition (Fig 4A), while in the busy condition the same dispersion occurs on ‘vibrant’, ‘eventful’, ‘annoying’, ‘chaotic’, and ‘pleasant’ (Fig 4D). For the ‘Park’ vs. ‘Street’ comparison, the dispersion of responses in the empty condition occurs on the ‘calm’, ‘monotonous’, and ‘uneventful’ attributes (Fig 4B), while for the busy condition it is on the ‘eventful’, ‘pleasant’, ‘vibrant’, ‘annoying’, and ‘chaotic’ attributes (Fig 4E). In the ‘Park’ vs. ‘Plaza’ comparison, the attributes with greater dispersion in the empty condition are ‘calm’, ‘monotonous’, and ‘uneventful’ (Fig 4C), and in the busy condition ‘eventful’, ‘vibrant’, ‘annoying’, and ‘chaotic’ (Fig 4F).

Derived from Table 6, the significant U values for each location comparison are presented in descending order. In the MCR online dataset, the greatest differences between locations were as follows: for ‘Street’ vs. ‘Park’ busy, the ‘uneventful’ PAQ (U = 2754.5, p<0.05); for ‘Plaza’ vs. ‘Park’ busy, the ‘chaotic’ PAQ (U = 2471.5, p<0.05); for the same locations in the empty condition, the ‘monotonous’ PAQ (U = 2424.0, p<0.05); for ‘Plaza’ vs. ‘Street’ busy, the ‘calm’ PAQ (U = 2405.0, p<0.05); and for ‘Street’ vs. ‘Park’ empty, the ‘eventful’ PAQ (U = 2374.0, p<0.05).

Regarding the non-significant results also presented in Fig 4 for the MCR online, ratings around zero were observed for different PAQs, as follows: ‘uneventful’ in the ‘Plaza’ vs. ‘Street’ (Fig 4D) and ‘Park’ vs. ‘Plaza’ (Fig 4F) comparisons, both for the busy condition; ‘eventful’ in the ‘Plaza’ vs. ‘Street’ (Fig 4A) and ‘Park’ vs. ‘Plaza’ (Fig 4C) comparisons, both for the empty condition; ‘annoying’ for the ‘Park’ vs. ‘Plaza’ comparison (Fig 4C and 4F) in both conditions; ‘calm’ in the ‘Park’ vs. ‘Plaza’ comparison (Fig 4C) for the empty condition; and ‘chaotic’ in the ‘Park’ vs. ‘Plaza’ comparison (Fig 4C) for the empty condition. Additionally, the ‘eventful’ scale had similar scores of around 50 for the ‘Plaza’ vs. ‘Street’ comparison (Fig 4D) in the busy condition. For the ‘uneventful’ scale, the comparisons of ‘Plaza’ vs. ‘Street’ (Fig 4A) and ‘Park’ vs. ‘Plaza’ (Fig 4C) in the empty condition had values around 20. The ‘pleasant’ PAQ scores were around 60 and 25 in the ‘Park’ vs. ‘Plaza’ comparison for the empty (Fig 4C) and busy (Fig 4F) conditions, respectively. The ‘calm’ scores were around 60 in the ‘Park’ vs. ‘Plaza’ comparison (Fig 4C) in the empty condition. For the busy condition, the ‘vibrant’ scores were around 25 in the ‘Park’ vs. ‘Plaza’ comparison (Fig 4F).

For the LAB VR, 62.5% (30 PAQs) of results presented significant differences when comparing different locations, and 37.5% (18 PAQs) had similar results. Fig 5 shows the results from the LAB VR. Regarding the ‘Plaza’ vs. ‘Street’ comparison, the dispersion occurs on the attributes ‘calm’, ‘monotonous’, and ‘uneventful’ for the empty condition (Fig 5A), and on ‘pleasant’, ‘eventful’, ‘vibrant’, ‘annoying’, and ‘chaotic’ for the busy condition (Fig 5D). In the ‘Park’ vs. ‘Street’ comparison, the dispersion of results occurs on the attributes ‘calm’, ‘monotonous’, and ‘uneventful’ in the empty condition (Fig 5B), and on ‘vibrant’, ‘chaotic’, and ‘annoying’ in the busy condition (Fig 5E). Finally, in the ‘Park’ vs. ‘Plaza’ comparison, the attributes with higher dispersion in the empty condition are ‘calm’, ‘pleasant’, ‘monotonous’, and ‘uneventful’ (Fig 5C). In the busy condition (Fig 5F), the dispersion was observed on the ‘eventful’, ‘vibrant’, ‘annoying’, and ‘chaotic’ scales.

Derived from Table 6, the significant U values for each location comparison are presented in descending order as follows: the ‘chaotic’ PAQ in the ‘Street’ vs. ‘Park’ empty (U = 563.0, p<0.05); the ‘annoying’ PAQ in the ‘Plaza’ vs. ‘Park’ busy (U = 506.5, p<0.05); the ‘uneventful’ PAQ in the ‘Plaza’ vs. ‘Park’ empty (U = 473.5, p<0.05); the ‘monotonous’ PAQ in the ‘Plaza’ vs. ‘Street’ empty (U = 457.5, p<0.05); the ‘monotonous’ PAQ in the ‘Street’ vs. ‘Park’ busy (U = 365.0, p<0.001); and the ‘calm’ PAQ in the ‘Plaza’ vs. ‘Street’ busy (U = 333.5, p<0.001).

Meanwhile, for the non-significant results also shown in Fig 5, ratings around zero were observed for different PAQs, as follows: ‘uneventful’ in the ‘Park’ vs. ‘Street’ (Fig 5E) and ‘Park’ vs. ‘Plaza’ (Fig 5F) comparisons, both in the busy condition; ‘monotonous’ in the ‘Park’ vs. ‘Plaza’ comparison for both conditions (Fig 5C and 5F); ‘chaotic’ in the ‘Plaza’ vs. ‘Street’ empty; and ‘eventful’ in the ‘Plaza’ vs. ‘Park’ empty. Four out of six location comparisons had scores around zero for the ‘annoying’ attribute: the ‘Street’ vs. ‘Park’ empty, the ‘Plaza’ vs. ‘Park’ empty, and the ‘Plaza’ vs. ‘Street’ in both conditions (Fig 5A and 5D). Two comparisons scored around 50 for the ‘pleasant’ and ‘eventful’ scales in the ‘Park’ vs. ‘Plaza’ busy (Fig 5F). Two comparisons scored around 40, for the ‘calm’ attribute in the ‘Plaza’ vs. ‘Street’ empty (Fig 5A) and the ‘pleasant’ scale in the ‘Plaza’ vs. ‘Street’ busy (Fig 5D). A score of around 30 appeared for ‘pleasant’ in the ‘Plaza’ vs. ‘Street’ empty (Fig 5A). Meanwhile, the ‘uneventful’ score in the ‘Park’ vs. ‘Street’ comparison for the empty condition (Fig 5B) was around -50, and the ‘vibrant’ scale was around 10 and 60 in the ‘Park’ vs. ‘Plaza’ comparison for the empty (Fig 5C) and busy (Fig 5F) conditions, respectively.

4. Discussion

When verifying the hypothesis (H01) regarding different population densities at the same site and experiment, the Wilcoxon signed-rank test demonstrated that 85% of comparisons were significantly different. The PAQs for ‘calm’, ‘eventful’, ‘pleasant’, ‘chaotic’, ‘monotonous’, and ‘uneventful’ corroborated the null hypothesis, that is, they changed with the number of people in the scenario (Fig 3A–3F). The ‘annoying’ PAQ in the ‘Plaza’ for the LAB VR (Fig 3A), the ‘vibrant’ PAQ at all locations in the MCR online (Fig 3A–3C), and the same attribute at the ‘Park’ in the LAB VR (Fig 3E) were also significantly different between population densities. Regarding the ‘Plaza’, the results corroborate the strategic urban plan carried out in 2016 to turn Piccadilly Gardens (‘Plaza’) into a vibrant location [56]. These similar results may indicate that both experimental methods were equivalent, given that recordings, methods, and locations were the same, only at different moments. That is, perceptions of calmness always changed with population density at the ‘Park’, as did perceptions of eventfulness, pleasantness, uneventfulness, chaos, and monotony at the pedestrian street (‘Street’). This observation points out that these attributes may be sound qualities to consider when studying similar locations.

In the ‘Plaza’, there was a constant water fountain sound. This sound could mask the background traffic noise, which can cause a positive sensation that could justify the unchanged pleasant rating. This masking effect was also observed in a study of environmental noise [57]. Similar results related to the ‘pleasant’ and ‘vibrant’ qualities of water features showed that three Naples waterfront sites had no differences between laboratory and online experiments [32]. This finding corroborates the concept of using water sound as a tool [58, 59] to support urban sound management and planning [9, 38].

When verifying the hypothesis (H02) regarding differences among urban locations at the same population density and experimental method, the Mann-Whitney test presented 63% and 58% significant differences for the MCR online and the LAB VR, respectively. The ‘calm’ PAQ was significantly different in four location comparisons for the MCR online (Fig 4A, 4B, 4D and 4E), while the LAB VR had five such comparisons (Fig 5B–5F), which corroborates the null hypothesis. This tendency indicates that the ‘calm’ soundscape quality may be easier to assess, since quiet areas are the opposite of noise pollution. However, there is a common misconception about the definition of ‘calm’, which is easily confused with the term ‘quiet’. The term ‘calm’ represents pleasant and harmonic sound sources, while the term ‘quiet’ refers to the absence of sound sources; calmness is more associated with silence, relaxation, and a tranquil area [60]. In addition, regarding the empty locations, resemblances among scores may be expected, given that early hours may evoke similar perceptions. The tendency towards similar results was unexpected for the comparison between the park and the plaza (Fig 4F), given that different space functionalities may indicate different soundscape ‘characters’, as observed by Bento Coelho [38] and Siebein [53].

In both experiments, neutral responses, considered here as values around zero, were observed in 56% of cases for the Wilcoxon signed-rank test, and in 54% and 44% of cases for the Mann-Whitney test in the MCR online and LAB VR, respectively (Figs 3–5). Such behaviour might be related to neutral emotions, which are also common in public opinion polls, because people avoid conflicting issues, especially when they are indifferent to, or not used to, the research topic or location [61, 62]. Furthermore, neutrality may stem from a lack of familiarity with the location due to the absence of retrieved sound memory [63]. Since semantic memory consists of facts, concepts, data, general information, and knowledge [64], individuals’ opinions must be grounded in these elements to interpret and rate the sonic environment [65]. For example, in the Wilcoxon signed-rank test for the busy condition, the ‘monotonous’ and ‘uneventful’ scales were around zero in the same compared locations in both methods (Fig 3). Meanwhile, in the Mann-Whitney test, unexpected similarities were observed in the MCR online within half of the compared locations for the ‘monotonous’ scale, with values over zero (Fig 4). Similar zero scores were observed in the location comparisons for the ‘chaotic’, ‘annoying’, and ‘eventful’ qualities in the ‘Plaza’ vs. ‘Park’ empty condition in both experimental methods (Figs 4 and 5).

Another possibility for the neutrality of responses may be due to the uniformity of soundscapes which gives an impression of ‘blended’ sounds. This sound could be denominated as a ‘blended urban soundscape’, common in big cities due to similar sound sources in different functioning landscapes, also identified by Schafer as a ‘lo-fi’ sound [ 40 ]. When the environment is excessively urbanised, where the population exceeds three million inhabitants, the sonic environment is somehow normalised, so that people do not identify differences among the diverse urban soundscapes. These urban sonic environments are dominant in traffic and human-made sounds, constantly present in the background, and natural sounds have become rare. These noises could cause neurological stress on the population, where they become anesthetised due to overwhelming urban sounds. As Le Van Quyen [ 66 ] recommended, urban citizens should practice a ‘mental detox’, which includes being in a quiet environment. Such a principle reinforces the importance of maintaining and preserving quiet areas. It is also important to notice that these ‘blended soundscapes’ should be avoided when designing urban sound zones, to give character [ 38 , 53 ] and create diversity [ 67 ] within each site.

Another factor may be socio-cultural differences since 50% of participants from the MCR online were Brazilian Portuguese speakers. Some PAQ English words may not represent a common term in the Brazilian Portuguese language, as observed in Antunes et al. [ 68 ]. These inconsistencies in translations were also encountered in participating countries of the SATP group [ 14 ], as observed in the Indonesian study [ 15 ]. Therefore, further investigations should continue to consolidate the English terminology [ 4 ] so that translations can improve. However, even though there was a neutrality of perceived responses, the psychoacoustic indicators for the ‘Plaza’ busy scene showed higher values in loudness, sharpness, and tonality due to the sound source characteristics of the location. The most common sound sources in this location were the water sound from the fountain, children playing and shouting (sharpness, loudness, and tonality), tram circulation and sounds of tram brakes (sharpness and tonality), and babble sounds (loudness) [ 17 , 69 ]. Most psychoacoustic indicators in the other locations and densities presented similar results, corroborating with the characteristics of the ‘blended’ soundscapes.

Limitations of this work include the uncontrolled audio levels and varied smartphone audio reproduction in the online experiment, as well as participants’ lack of familiarity with the study areas, ‘social desirability’, in which participants wish to please the researcher [70], and the ‘experimenter effect’, where individuals need to use their critical thinking in a way they never had to before [71]. It is recommended to adjust audio levels to the field sound levels at the beginning of an online experiment [72]. Where smartphones are used in online experiments, it is also recommended to ask participants to report the brand of the device so that the factory calibration of its loudspeakers can be verified.

5. Conclusions

This work aimed to observe the PAQ results regarding differences between the two population densities for each location, and comparisons among locations for each experimental method. The study highlighted that there were significant effects of population density and of the comparison among locations on the subjective responses. Still, the neutrality of many results did not contribute to characterising the soundscape diversity of a megalopolis. Meanwhile, for the second hypothesis, the differences among locations within each experimental method showed unexpectedly similar results. Such behaviour was discussed and could be related to the participants’ unfamiliarity with the locations and to the homogeneity of the urban sonic environment, characterised here as ‘blended urban soundscapes’.

The identified ‘blended soundscapes’ highlight the importance of managing and planning the sonic environment through a clear delimitation of acoustic zones in line with the functionality of each space. Furthermore, soundscape tools should be investigated to increase the diversity of sound sources, enhancing the sonic environment with elements such as masking, biophony, noise reduction, noise barriers, the selection of urban materials, and sound art installations, among others.

Future work includes evaluating cities with lower population densities to highlight the PAQs, avoid ‘blended’ soundscapes, and enrich the sonic environment for VR experiments. Further neurological evaluations should include more objective metrics for assessing cognitive responses to urban soundscapes and for understanding how socio-cultural differences are reflected in VR experiments. These VR findings can support urban design as a low-cost approach in which urban planners can test different scenarios and interventions.

Supporting information

https://doi.org/10.1371/journal.pone.0306261.s001

Acknowledgments

The authors thank participants and the Acoustic Research Centre staff from the University of Salford, UK for their contributions.

  • 1. International Organization for Standardization. ISO/TS 12913–2. Acoustics–Soundscape. Part 2: Methods and measurements in soundscape studies. Geneva, Switzerland. 2018.
  • 14. Aletta F, et al. Soundscape assessment: Towards a validated translation of perceptual attributes in different languages. In: Inter-noise and noise-con congress and conference proceedings 2020 Oct 12 (Vol. 261, No. 3, pp. 3137–3146). Institute of Noise Control Engineering.
  • 27. Rumsey F. Spatial audio. Routledge; 2012 Sep 10.
  • 28. Sun K, Botteldooren D, De Coensel B. Realism and immersion in the reproduction of audio-visual recordings for urban soundscape evaluation. In: INTER-NOISE and NOISE-CON Congress and Conference Proceedings 2018 Dec 18 (Vol. 258, No. 4, pp. 3432–3441). Institute of Noise Control Engineering.
  • 38. Coelho JB. Approaches to urban soundscape management, planning, and design. In: Kang J, Schulte-Fortkamp B, editors. Soundscape and the Built Environment. Boca Raton, USA: CRC Press; 2016. pp. 197–214. https://doi.org/10.1201/b19145-11
  • 40. Schafer RM. The soundscape: Our sonic environment and the tuning of the world. Simon and Schuster; 1993 Oct 1.
  • 44. Sanchez GM, Alves S, Botteldooren D. Urban sound planning: an essential component in urbanism and landscape architecture. In: Handbook of research on perception-driven approaches to urban assessment and design 2018 (pp. 1–22). IGI Global.
  • 48. De Coensel B, Sun K, Botteldooren D. Urban Soundscapes of the World: Selection and reproduction of urban acoustic environments with soundscape in mind. In: INTER-NOISE and NOISE-CON Congress and Conference Proceedings 2017 Dec 7 (Vol. 255, No. 2, pp. 5407–5413). Institute of Noise Control Engineering.
  • 53. Siebein GW. ‘Creating and Designing Soundscape’, in Kang J. et al. (eds) Soundscape of European Cities and Landscapes—COST. 2013, Oxford: Soundscape-COST, pp. 158–162.
  • 55. Carvalho ML, Davies WJ, Fazenda B. Manchester Soundscape Experiment Online 2020: an overview. In: INTER-NOISE and NOISE-CON Congress and Conference Proceedings 2023 Feb 1 (Vol. 265, No. 1, pp. 5993–6001). Institute of Noise Control Engineering.
  • 63. Engel MS, Carvalho ML, Davies WJ. The influence of memories on soundscape perception responses. In: DAGA 2022 Proceedings. 2022. DAGA Stuttgart, pp. 1–4.
  • 72. Sudarsono AS, Sarwono J. The Development of a Web-Based Urban Soundscape Evaluation System. In: IOP Conference Series: Earth and Environmental Science 2018 May (Vol. 158, No. 1, p. 012052). IOP Publishing. https://doi.org/10.1088/1755-1315/158/1/012052

AI Personas

AI team-building with the AI persona quiz

Slack’s Workforce Lab uncovers the 5 persona types that are defining the AI-driven workplace—and how leaders can make AI work better for every worker

By the team at Slack, September 4th, 2024

Quick take: 

Slack’s Workforce Index research shows that leader urgency to implement AI has increased 7x over the last year. Employees who are using AI are seeing a boost to productivity and overall workplace satisfaction. And yet the majority of desk workers — more than two-thirds — have still never tried AI at work.

Through in-depth interviews and a survey of 5,000 full-time desk workers, Slack’s Workforce Lab explored what motivates workers to use AI and how they feel about using AI at work, identifying five distinct “AI persona types.”

What this means for leaders: “Employees’ attitudes and opinions about AI drive their behavior and adoption patterns. With the widely varying perspectives among desk workers, this research makes clear that AI enablement can’t be one size fits all,” says Christina Janzer, head of the Workforce Lab. “But these perspectives are not fixed; it’s still early days of the AI-powered workforce. AI personas give leaders a temperature check on where their teams are today and can help define an AI roadmap for their teams moving forward.”

Meet the five AI persona types

🥰 The Maximalist – All in on AI and already unlocking its benefits.

🤫 The Underground – Uses AI discreetly, reluctant to broadcast to colleagues.

😎 The Rebel – Sees AI as a threat and rebels against the AI hype.

🤩 The Superfan – Excited about AI but cheering from the sidelines.

🧐 The Observer – Indifferent to AI, but watching its rise with interest and caution.

Read more about each AI persona type or take the AI persona quiz to find out where you fall in the range of AI personas.

Try this AI team-building exercise: identify your team’s AI personas 

Two in five desk workers (37%) say their company has no AI policy, and those workers are 6x less likely to have experimented with AI tools compared with employees at companies with established guidelines.

“One of the issues we saw come up repeatedly in our qualitative interviews with desk workers is that so much of AI use is currently invisible,” says Workforce Lab researcher Marcia Ash. “Desk workers shared that a big part of their hesitation and confusion is that they’re unsure of the accepted norms of how and when it’s okay to use AI.”

Where to draw the line (Slack blog post)

Similar to personality or strengths tests for team building, conducting an AI personas exercise with your team is a fun and inviting way to encourage discussion, clarify guidelines and expectations, and identify any blockers holding people back from trying AI. It also sets teams up to learn and share the most helpful and creative AI use cases for their roles and functions.

How to do it: 

  • Invite all team members to take the AI persona quiz.
  • Set aside time for team members to discuss results, either live in a meeting or asynchronously on Slack. Pro tip: leaders should “set the table” by assuring employees that all opinions and perceptions of AI are valid and that there’s no “right” way to feel about AI.
  • Identify common blockers – and set goals to address them. In our interviews with desk workers, commonly cited blockers were fear of being seen as lazy or incompetent for relying on AI, feelings of guilt for cutting corners, and uncertainty around what types of AI tools are permitted. Understanding the biggest issues preventing your people from optimizing their use of AI tools provides more precise focus for your enablement roadmap. Pro tip: consider plotting your own “where to draw the line” chart – you’ll likely be surprised by the varying opinions on your own team!
  • Invite team members to share use cases that they’re most excited about. Pro tip: document your team’s most tried-and-true use cases in your team-level agreements and encourage employees to add their personas to their Personal Operating Manuals.

Go further: three actions to encourage AI adoption

The PET plan: permission, education, training

Workforce Index research shows that clear permission and guidance are the essential first step to fostering AI adoption. Education and training are also critical; only 15% of desk workers say they have the education and training necessary to use AI effectively, while those who are trained to use AI are up to 19x as likely to report that AI is improving their productivity.

“AI training programs don’t have to be a heavy lift,” says Chrissie Arnold, director of future of work programs for Workforce Lab. “We’ve had pretty amazing results from just 10 minutes a day of AI microlearning.” 

AI in public

Seeing how others are using and benefiting from AI tools helps clarify AI norms. Make AI use among your team more visible by creating a Slack channel dedicated to sharing and troubleshooting AI use cases and discussing AI news or by starting each team meeting by inviting people to share their AI learning for the week. 

Leadership should lead the way by regularly sharing milestones in their own AI journeys, modeling that AI experimentation (from the successes to the trial and error) is celebrated and encouraged. 

Make it safe to experiment

To realize the full potential of AI, companies need to create a safe space to experiment. And that depends on trust. 

Teams with high degrees of interpersonal trust are the teams that feel the safest and most supported to try new technologies, including AI. Desk workers who feel trusted by their employers are 94% more likely to have tried AI for work-related tasks. 

For more tips on how to foster willingness to try new ways of working, read An inside look at Slack’s culture of experimentation.


