Statistics By Jim

Making statistics intuitive

One-Tailed and Two-Tailed Hypothesis Tests Explained

By Jim Frost 60 Comments

Choosing whether to perform a one-tailed or a two-tailed hypothesis test is one of the methodology decisions you might need to make for your statistical analysis. This choice can have critical implications for the types of effects it can detect, the statistical power of the test, and potential errors.

In this post, you’ll learn about the differences between one-tailed and two-tailed hypothesis tests and their advantages and disadvantages. I include examples of both types of statistical tests. In my next post, I cover the decision between one and two-tailed tests in more detail.

What Are Tails in a Hypothesis Test?

First, we need to cover some background material to understand the tails in a test. Typically, hypothesis tests take all of the sample data and convert it to a single value, which is known as a test statistic. You’re probably already familiar with some test statistics. For example, t-tests calculate t-values. F-tests, such as ANOVA, generate F-values. The chi-square test of independence and some distribution tests produce chi-square values. All of these values are test statistics. For more information, read my post about Test Statistics.

These test statistics follow a sampling distribution. Probability distribution plots display the probabilities of obtaining test statistic values when the null hypothesis is correct. On a probability distribution plot, the shaded area under the curve over a range of values represents the probability that a value will fall within that range.

The graph below displays a sampling distribution for t-values. The two shaded regions cover the two-tails of the distribution.

Plot that displays critical regions in the two tails of the distribution.

Keep in mind that this t-distribution assumes that the null hypothesis is correct for the population. Consequently, the peak (most likely value) of the distribution occurs at t=0, which represents the null hypothesis in a t-test. Typically, the null hypothesis states that there is no effect. As t-values move further away from zero, they represent larger effect sizes. When the null hypothesis is true for the population, obtaining samples that exhibit a large apparent effect becomes less likely, which is why the probabilities taper off for t-values further from zero.

Related posts: How t-Tests Work and Understanding Probability Distributions

Critical Regions in a Hypothesis Test

In hypothesis tests, critical regions are ranges of the distributions where the values represent statistically significant results. Analysts define the size and location of the critical regions by specifying both the significance level (alpha) and whether the test is one-tailed or two-tailed.

Consider the following two facts:

The significance level is the probability of rejecting a null hypothesis that is correct.
The sampling distribution for a test statistic assumes that the null hypothesis is correct.

Consequently, to represent the critical regions on the distribution for a test statistic, you merely shade the appropriate percentage of the distribution. For the common significance level of 0.05, you shade 5% of the distribution.
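As a rough sketch of what "shading 5% of the distribution" means numerically, the snippet below finds the cutoffs that bound the shaded region or regions. It uses the standard normal distribution as a stand-in for the test statistic's sampling distribution, which is an assumption made only because Python's standard library includes it; a t-distribution would give somewhat wider cutoffs. The function name `critical_values` is hypothetical.

```python
from statistics import NormalDist

def critical_values(alpha, tails):
    """Return the cutoff(s) that bound the critical region(s).

    Assumes a standard normal sampling distribution (illustration only).
    """
    z = NormalDist()  # mean 0, standard deviation 1
    if tails == "two":
        cut = z.inv_cdf(1 - alpha / 2)  # alpha is split between both tails
        return (-cut, cut)
    if tails == "right":
        return (z.inv_cdf(1 - alpha),)  # all of alpha in the right tail
    if tails == "left":
        return (z.inv_cdf(alpha),)      # all of alpha in the left tail
    raise ValueError("tails must be 'two', 'right', or 'left'")

two = critical_values(0.05, "two")
right = critical_values(0.05, "right")
left = critical_values(0.05, "left")
print("two-tailed cutoffs:", two)
print("right-tailed cutoff:", right)
print("left-tailed cutoff:", left)
```

In every case the total shaded area equals alpha. Only its location changes, which is what the following sections on one-tailed and two-tailed tests describe.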

Related posts: Significance Levels and P-values and T-Distribution Table of Critical Values

Two-Tailed Hypothesis Tests

Two-tailed hypothesis tests are also known as nondirectional and two-sided tests because you can test for effects in both directions. When you perform a two-tailed test, you split the significance level percentage between both tails of the distribution. In the example below, I use an alpha of 5% and the distribution has two shaded regions of 2.5% (2 * 2.5% = 5%).

When a test statistic falls in either critical region, your sample data are sufficiently incompatible with the null hypothesis that you can reject it for the population.

In a two-tailed test, the generic null and alternative hypotheses are the following:

Null : The effect equals zero.
Alternative : The effect does not equal zero.

The specifics of the hypotheses depend on the type of test you perform because you might be assessing means, proportions, or rates.

Example of a two-tailed 1-sample t-test

Suppose we perform a two-sided 1-sample t-test where we compare the mean strength (4.1) of parts from a supplier to a target value (5). We use a two-tailed test because we care whether the mean is greater than or less than the target value.

To interpret the results, simply compare the p-value to your significance level. If the p-value is less than the significance level, you know that the test statistic fell into one of the critical regions, but which one? Just look at the estimated effect. In the output below, the t-value is negative, so we know that the test statistic fell in the critical region in the left tail of the distribution, indicating the mean is less than the target value. Now we know this difference is statistically significant.

Statistical output from a two-tailed 1-sample t-test.

We can conclude that the population mean for part strength is less than the target value. However, the test had the capacity to detect a positive difference as well. You can also assess the confidence interval. With a two-tailed hypothesis test, you’ll obtain a two-sided confidence interval. The confidence interval tells us that the population mean is likely to fall between 3.372 and 4.828. This range excludes the target value (5), which is another indicator of significance.
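The link between the two-sided confidence interval and the two-tailed test can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in the example; it assumes the interval's confidence level is 1 minus the significance level (for instance, a 95% interval paired with an alpha of 0.05), which the output would need to confirm.

```python
# Values quoted in the example above.
ci_lower, ci_upper = 3.372, 4.828
target = 5.0

# A two-sided t interval is symmetric around the sample mean,
# so the midpoint should recover the reported mean of 4.1.
center = (ci_lower + ci_upper) / 2
margin = (ci_upper - ci_lower) / 2

# Assumption: CI level = 1 - alpha. Under that assumption, the target
# falling outside the interval matches a significant two-tailed test.
target_inside = ci_lower <= target <= ci_upper
significant = not target_inside

print(f"center = {center:.3f}, margin = {margin:.3f}")
print("target inside interval:", target_inside)
print("statistically significant:", significant)
```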

Advantages of two-tailed hypothesis tests

You can detect both positive and negative effects. Two-tailed tests are standard in scientific research where discovering any type of effect is usually of interest to researchers.

One-Tailed Hypothesis Tests

One-tailed hypothesis tests are also known as directional and one-sided tests because you can test for effects in only one direction. When you perform a one-tailed test, the entire significance level percentage goes into the extreme end of one tail of the distribution.

In the examples below, I use an alpha of 5%. Each distribution has one shaded region of 5%. When you perform a one-tailed test, you must determine whether the critical region is in the left tail or the right tail. The test can detect an effect only in the direction that has the critical region. It has absolutely no capacity to detect an effect in the other direction.

In a one-tailed test, you have two options for the null and alternative hypotheses, which correspond to where you place the critical region.

You can choose either of the following sets of generic hypotheses:

Null : The effect is less than or equal to zero.
Alternative : The effect is greater than zero.

Plot that displays a single critical region for a one-tailed test.

Null : The effect is greater than or equal to zero.
Alternative : The effect is less than zero.

Plot that displays a single critical region in the left tail for a one-tailed test.

Again, the specifics of the hypotheses depend on the type of test you perform.

Notice how for both possible null hypotheses the tests can’t distinguish between zero and an effect in a particular direction. For example, in the example directly above, the null combines “the effect is greater than or equal to zero” into a single category. That test can’t differentiate between zero and greater than zero.

Example of a one-tailed 1-sample t-test

Suppose we perform a one-tailed 1-sample t-test. We’ll use a similar scenario as before where we compare the mean strength of parts from a supplier (102) to a target value (100). Imagine that we are considering a new parts supplier. We will use them only if the mean strength of their parts is greater than our target value. There is no need for us to differentiate between whether their parts are equally strong or less strong than the target value—either way we’d just stick with our current supplier.

Consequently, we’ll choose the alternative hypothesis that states the mean difference is greater than zero (Population mean – Target value > 0). The null hypothesis states that the difference between the population mean and target value is less than or equal to zero.

Statistical output for a one-tailed 1-sample t-test.

To interpret the results, compare the p-value to your significance level. If the p-value is less than the significance level, you know that the test statistic fell into the critical region. For this study, the statistically significant result supports the notion that the population mean is greater than the target value of 100.

Confidence intervals for a one-tailed test are similarly one-sided. You’ll obtain either an upper bound or a lower bound. In this case, we get a lower bound, which indicates that the population mean is likely to be greater than or equal to 100.631. There is no upper limit to this range.

A lower bound matches our goal of determining whether the new parts are stronger than our target value. The fact that the lower bound (100.631) is higher than the target value (100) indicates that these results are statistically significant.

This test is unable to detect a negative difference even when the sample mean represents a very negative effect.
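The point above can be sketched in code. The sample below is hypothetical (it is not the data behind the output shown in this post), and `t_cdf` is a small numerical stand-in for the t-distribution routine that statistical software provides. The sketch runs a 1-sample t-test on parts that are clearly weaker than a target of 100 and reports the p-value for each choice of tails.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t-distribution."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] plus symmetry (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def one_sample_t(sample, target):
    """Return the t-value and the p-value for each choice of tails."""
    n = len(sample)
    t = (mean(sample) - target) / (stdev(sample) / math.sqrt(n))
    lower = t_cdf(t, n - 1)
    return {
        "t": t,
        "two_tailed": 2 * min(lower, 1 - lower),  # H1: mean != target
        "right_tailed": 1 - lower,                # H1: mean > target
        "left_tailed": lower,                     # H1: mean < target
    }

# Hypothetical strengths, well BELOW the target of 100.
weak_parts = [95.1, 96.4, 94.8, 97.0, 95.9, 96.2, 94.5, 96.8]
result = one_sample_t(weak_parts, 100)
for name, value in result.items():
    print(f"{name}: {value:.6f}")
```

With this sample, the right-tailed p-value sits close to 1 even though the mean is far below the target, while the two-tailed and left-tailed versions flag the difference. A related pitfall when computing by hand: comparing the absolute value of the t-value to a one-tailed critical value turns the test back into a two-tailed one (at twice the alpha), so keep the sign.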

Advantages and disadvantages of one-tailed hypothesis tests

One-tailed tests have more statistical power to detect an effect in one direction than a two-tailed test with the same design and significance level. One-tailed tests occur most frequently for studies where one of the following is true:

Effects can exist in only one direction.
Effects can exist in both directions but the researchers only care about an effect in one direction. There is no drawback to failing to detect an effect in the other direction. (Not recommended.)

The disadvantage of one-tailed tests is that they have no statistical power to detect an effect in the other direction.

As part of your pre-study planning process, determine whether you’ll use the one- or two-tailed version of a hypothesis test. To learn more about this planning process, read 5 Steps for Conducting Scientific Studies with Statistical Analyses.

This post explains the differences between one-tailed and two-tailed statistical hypothesis tests. How these forms of hypothesis tests function is clear and based on mathematics. However, there is some debate about when you can use one-tailed tests. My next post explores this decision in much more depth and explains the different schools of thought and my opinion on the matter: When Can I Use One-Tailed Hypothesis Tests.

Reader Interactions

June 26, 2022 at 12:14 pm

Hi, can you help me figure out the null and alternative hypotheses for the following statement? Some claimed that the real average expenditure on beverages by the general public is at least $10.

February 19, 2022 at 6:02 am

Thank you for the thorough explanation. I’m still struggling to wrap my mind around the t-table and the relation between the alpha values for one or two tail probability and the confidence levels on the bottom (I’m understanding it so wrongly that for me it should be the opposite, like one tail 0,05 should correspond to 95% CI and two tailed 0,025 should correspond to 95% because then you got the 2,5% on each side). In my mind if I picture the one tail diagram with an alpha of 0,05 I see the remaining 95% inside the diagram, but for a one tail I only see 90% CI paired with a 5% alpha… where did the other 5% go? I tried to understand when you said we should just double the alpha for a one tail probability in order to find the CI but I still can’t picture it. I have been trying to understand this. Like if you only have one tail and there is 0,05, shouldn’t the rest be on the other side? Why is it then 90%… I know I’m missing a point and I can’t figure it out and it’s so frustrating…

February 23, 2022 at 10:01 pm

The alpha is the total shaded area. So, if the alpha = 0.05, you know that 5% of the distribution is shaded. The number of tails tells you how to divide the shaded areas. Is it all in one region (1-tailed) or do you split the shaded regions in two (2-tailed)?

So, for a one-tailed test with an alpha of 0.05, the 5% shading is all in one tail. If alpha = 0.10, then it’s 10% on one side. If it’s two-tailed, then you need to split that 10% into two–5% in both tails. Hence, the 5% in a one-tailed test is the same as a two-tailed test with an alpha of 0.10 because that test has the same 5% on one side (but there’s another 5% in the other tail).

It’s similar for CIs. However, for CIs, you shade the middle rather than the extremities. I write about that in one of my articles about hypothesis testing and confidence intervals.

I’m not sure if I’m answering your question or not.

February 17, 2022 at 1:46 pm

I ran a post hoc Dunnett’s test alpha=0.05 after a significant Anova test in Proc Mixed using SAS. I want to determine if the means for treatment (t1, t2, t3) are significantly less than the means for control (p=pathogen). The code for the dunnett’s test is – LSmeans trt / diff=controll (‘P’) adjust=dunnett CL plot=control; I think the lower bound one tailed test is the correct test to run but I’m not 100% sure. I’m finding conflicting information online. In the output table for the dunnett’s test the mean difference between the control and the treatments is t1=9.8, t2=64.2, and t3=56.5. The control mean estimate is 90.5. The adjusted p-value by treatment is t1(p=0.5734), t2 (p=.0154) and t3(p=.0245). The adjusted lower bound confidence limit in order from t1-t3 is -38.8, 13.4, and 7.9. The adjusted upper bound for all tests is infinity. The graphical output for the dunnett’s test in SAS is difficult to understand for those of us who are beginner SAS users. All treatments appear as a vertical line below the horizontal line for control at 90.5 with t2 and t3 in the shaded area. For treatment 1 the shaded area is above the line for control. Looking at just the output table I would say that t2 and t3 are significantly lower than the control. I guess I would like to know if my interpretation of the outputs is correct that treatments 2 and 3 are statistically significantly lower than the control? Should I have used an upper bound one tailed test instead?

November 10, 2021 at 1:00 am

Thanks Jim. Please help me understand how a two tailed testing can be used to minimize errors in research

July 1, 2021 at 9:19 am

Hi Jim, Thanks for posting such a thorough and well-written explanation. It was extremely useful to clear up some doubts.

May 7, 2021 at 4:27 pm

Hi Jim, I followed your instructions for the Excel add-in. Thank you. I am very new to statistics and sort of enjoy it as I enter week number two in my class. I am to decide whether each of three scenarios calls for a one-tailed or two-tailed test and explain why. The problem is stated:

30% of mole biopsies are unnecessary. Last month at his clinic, 210 out of 634 had benign biopsy results. Is there enough evidence to reject the dermatologist’s claim?

Part two, the wording changes to “more than 30% of biopsies,” and part three, the wording changes to “less than 30% of biopsies…”

I am not asking for the problem to be solved for me, but I cannot seem to find the direction needed. I know the elements I am dealing with are =30%, greater than 30%, and less than 30%. 210 and 634. I just don’t know what to do with the information. I can’t seem to find an example of a similar problem to work with.

May 9, 2021 at 9:22 pm

As I detail in this post, a two-tailed test tells you whether an effect exists in either direction. Or, is it different from the null value in either direction. For the first example, the wording suggests you’d need a two-tailed test to determine whether the population proportion is ≠ 30%. Whenever you just need to know ≠, it suggests a two-tailed test because you’re covering both directions.

For part two, because it’s in one direction (greater than), you need a one-tailed test. Same for part three but it’s less than. Look in this blog post to see how you’d construct the null and alternative hypotheses for these cases. Note that you’re working with a proportion rather than the mean, but the principles are the same! Just plug your scenario and the concept of proportion into the wording I use for the hypotheses.

I hope that helps!
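To make the effect of the tail choice concrete, here is a rough sketch that runs a one-proportion z-test on the counts from the question (210 of 634, null value 30%). The normal approximation is an assumption; a class or software package might use an exact binomial test instead. The function name `one_proportion_z` is hypothetical.

```python
import math
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """One-proportion z-test (normal approximation) for each choice of tails."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under the null value
    z = (p_hat - p0) / se
    lower = NormalDist().cdf(z)        # area to the left of z
    return {
        "z": z,
        "two_tailed": 2 * min(lower, 1 - lower),  # H1: proportion != p0
        "right_tailed": 1 - lower,                # H1: proportion > p0
        "left_tailed": lower,                     # H1: proportion < p0
    }

res = one_proportion_z(210, 634, 0.30)
for name, value in res.items():
    print(f"{name}: {value:.4f}")
```

The same z-value leads to three different p-values depending on the alternative hypothesis, which is why the wording of each part of the problem matters.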

April 11, 2021 at 9:30 am

Hello Jim, great website! I am using a statistics program (SPSS) that does NOT compute one-tailed t-tests. I am trying to compare two independent groups and have justifiable reasons why I only care about one direction. Can I do the following? Use SPSS for two-tailed tests to calculate the t & p values. Then report the p-value as p/2 when it is in the predicted direction (e.g , SPSS says p = .04, so I report p = .02), and report the p-value as 1 – (p/2) when it is in the opposite direction (e.g., SPSS says p = .04, so I report p = .98)? If that is incorrect, what do you suggest (hopefully besides changing statistics programs)? Also, if I want to report confidence intervals, I realize that I would only have an upper or lower bound, but can I use the CI’s from SPSS to compute that? Thank you very much!

April 11, 2021 at 5:42 pm

Yes, for p-values, that’s absolutely correct for both cases.

For confidence intervals, if you take one endpoint of a two-sided CI, it becomes a one-sided bound with half the error rate. For example, either endpoint of a 90% two-sided CI is a 95% one-sided bound.

Consequently, to obtain a one-sided bound with your desired confidence level, you need to take your desired significance level (e.g., 0.05) and double it. Then subtract it from 1. So, if you’re using a significance level of 0.05, double that to 0.10 and then subtract from 1 (1 – 0.10 = 0.90). 90% is the confidence level you want to use for the two-sided CI. After obtaining the two-sided CI, use one of the endpoints depending on the direction of your hypothesis (i.e., upper or lower bound). That produces the one-sided bound with the confidence level that you want. For our example, we calculated a 95% one-sided bound.
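The two conversions discussed in this exchange can be written as a short sketch. Both function names are hypothetical, and the p-value conversion assumes a test statistic with a symmetric sampling distribution, such as t or z.

```python
def one_tailed_p(two_tailed_p, in_predicted_direction):
    """Convert a two-tailed p-value to a one-tailed p-value.

    Assumes a symmetric sampling distribution (e.g., t or z).
    """
    half = two_tailed_p / 2
    return half if in_predicted_direction else 1 - half

def two_sided_level_for_bound(alpha):
    """Confidence level to request for a two-sided CI so that one of
    its endpoints is a one-sided bound with confidence 1 - alpha."""
    return 1 - 2 * alpha

print(one_tailed_p(0.04, True))    # effect in the predicted direction
print(one_tailed_p(0.04, False))   # effect in the opposite direction
print(two_sided_level_for_bound(0.05))
```

For the numbers in the question, a two-tailed p-value of 0.04 becomes 0.02 in the predicted direction and 0.98 in the opposite direction, and a 95% one-sided bound comes from the matching endpoint of a 90% two-sided interval.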

March 3, 2021 at 8:27 am

Hi Jim. I used the one-tailed(right) statistical test to determine an anomaly in the below problem statement: On a daily basis, I calculate the (mapped_%) in a common field between two tables.

The way I used the t-test is: On any particular day, I calculate the sample_mean, S.D and sample_count (n=30) for the last 30 days including the current day. My null hypothesis, H0 (pop. mean)=95 and H1>95 (alternate hypothesis). So, I calculate the t-stat based on the sample_mean, pop.mean, sample S.D and n. I then choose the t-crit value for 0.05 from my t-distribution table for dof(n-1). On the current day if my abs.(t-stat)>t-crit, then I reject the null hypothesis and I say the mapped_pct on that day has passed the t-test.

I get some weird results here, where if my mapped_pct is as low as 6%-8% in all the past 30 days, the t-test still gets a “pass” result. Could you help on this? If my hypothesis needs to be changed.

I would basically look for the mapped_pct >95, if it worked on a static trigger. How can I use the t-test effectively in this problem statement?

December 18, 2020 at 8:23 pm

Hello Dr. Jim, I am wondering if there is evidence in one of your books or other source you could provide, which supports that it is OK not to divide alpha level by 2 in one-tailed hypotheses. I need the source for supporting evidence in a Portfolio exercise and couldn’t find one.

I am grateful for your reply and for your statistics knowledge sharing!

November 27, 2020 at 10:31 pm

If I did a one-directional F-test ANOVA (one tail) and wanted to calculate a confidence interval for each individual group’s mean (3 groups), would I use a one-tailed or two-tailed t within my confidence interval?

November 29, 2020 at 2:36 am

Hi Bashiru,

F-tests for ANOVA will always be one-tailed for the reasons I discuss in this post. To learn more, read my post about F-tests in ANOVA.

For the differences between my groups, I would not use t-tests because the family-wise error rate quickly grows out of hand. To learn more about how to compare group means while controlling the family-wise error rate, read my post about using post hoc tests with ANOVA. Typically, these are two-sided intervals but you’d be able to use one-sided.

November 26, 2020 at 10:51 am

Hi Jim, I had a question about the formulation of the hypotheses. When you want to test if a beta = 1 or a beta = 0, what will be the null hypothesis? I’m having trouble finding out. In most cases beta = 0 is the null hypothesis, but in this case you want to test if beta = 0. So I’m having my doubts: can it in this case be the alternative hypothesis, or is it still the null hypothesis?

Kind regards, Noa

November 27, 2020 at 1:21 am

Typically, the null hypothesis represents no effect or no relationship. As an analyst, you’re hoping that your data have enough evidence to reject the null and favor the alternative.

Assuming you’re referring to beta as in regression coefficients, zero represents no relationship. Consequently, beta = 0 is the null hypothesis.

You might hope that beta = 1, but you don’t usually include that in your alternative hypotheses. The alternative hypothesis usually states that it does not equal no effect. In other words, there is an effect but it doesn’t state what it is.

There are some exceptions to the above but I’m writing about the standard case.

November 22, 2020 at 8:46 am

Your articles are a help to intro to econometrics students. Keep up the good work! More power to you!

November 6, 2020 at 11:25 pm

Hello Jim. Can you help me with these please?

Write the null and alternative hypothesis using a 1-tailed and 2-tailed test for each problem. (In paragraph and symbols)

A teacher wants to know if there is a significant difference in the performance in MAT C313 between her morning and afternoon classes.

It is known that in our university canteen, the average waiting time for a customer to receive and pay for his/her order is 20 minutes. Additional personnel has been added and now the management wants to know if the average waiting time had been reduced.

November 8, 2020 at 12:29 am

I cover how to write the hypotheses for the different types of tests in this post. So, you just need to figure which type of test you need to use. In your case, you want to determine whether the mean waiting time is less than the target value of 20 minutes. That’s a 1-sample t-test because you’re comparing a mean to a target value (20 minutes). You specifically want to determine whether the mean is less than the target value. So, that’s a one-tailed test. And, you’re looking for a mean that is “less than” the target.

So, go to the one-tailed section in the post and look for the hypotheses for the effect being less than. That’s the one with the critical region on the left side of the curve.

Now, you need include your own information. In your case, you’re comparing the sample estimate to a population mean of 20. The 20 minutes is your null hypothesis value. Use the symbol mu μ to represent the population mean.

You put all that together and you get the following:

Null: μ ≥ 20
Alternative: μ < 20

You can use H0 to denote the null hypothesis and H1 or HA to denote the alternative hypothesis if that’s what you’ve been using in class.

October 17, 2020 at 12:11 pm

I was just wondering if you could please help with clarifying what the hypotheses would be for, say, income of gamblers and age of gamblers. I am struggling to find which means would be compared.

October 17, 2020 at 7:05 pm

Those are both continuous variables, so you’d use either correlation or regression for them. For both of those analyses, the hypotheses are the following:

Null : The correlation or regression coefficient equals zero (i.e., there is no relationship between the variables).
Alternative : The coefficient does not equal zero (i.e., there is a relationship between the variables).

When the p-value is less than your significance level, you reject the null and conclude that a relationship exists.

October 17, 2020 at 3:05 am

I was asked to choose between a one-tailed and a two-tailed test for dummy variables and to justify the choice. How do I do that, and what does it mean?

October 17, 2020 at 7:11 pm

I don’t have enough information to answer your question. A dummy variable is also known as an indicator variable, which is a binary variable that indicates the presence or absence of a condition or characteristic. If you’re using this variable in a hypothesis test, I’d presume that you’re using a proportions test, which is based on the binomial distribution for binary data.

Choosing between a one-tailed or two-tailed test depends on subject area issues and, possibly, your research objectives. Typically, use a two-tailed test unless you have a very good reason to use a one-tailed test. To understand when you might use a one-tailed test, read my post about when to use a one-tailed hypothesis test .

October 16, 2020 at 2:07 pm

In your one-tailed example, Minitab describes the hypotheses as “Test of mu = 100 vs > 100”. Any idea why Minitab says the null is “=” rather than “= or less than”? No ASCII character for it?

October 16, 2020 at 4:20 pm

I’m not entirely sure even though I used to work there! I know we had some discussions about how to represent that hypothesis but I don’t recall the exact reasoning. I suspect that it has to do with the conclusions that you can draw. Let’s focus on the failing to reject the null hypothesis. If the test statistic falls in that region (i.e., it is not significant), you fail to reject the null. In this case, all you know is that you have insufficient evidence to say it is different than 100. I’m pretty sure that’s why they use the equal sign because it might as well be one.

Mathematically, I think using ≤ is more accurate, which you can really see when you look at the distribution plots. That’s why I phrase the hypotheses using ≤ or ≥ as needed. However, in terms of the interpretation, the “less than” portion doesn’t really add anything of importance. You can conclude that it’s equal to 100 or greater than 100, but not less than 100.

October 15, 2020 at 5:46 am

Thank you so much for your timely feedback. It helps a lot

October 14, 2020 at 10:47 am

How can i use one tailed test at 5% alpha on this problem?

A manufacturer of cellular phone batteries claims that when fully charged, the mean life of his product lasts for 26 hours with a standard deviation of 5 hours. Mr X, a regular distributor, randomly picked and tested 35 of the batteries. His test showed that the average life of his sample is 25.5 hours. Is there a significant difference between the average life of all the manufacturer’s batteries and the average battery life of his sample?

October 14, 2020 at 8:22 pm

I don’t think you’d want to use a one-tailed test. The goal is to determine whether the sample is significantly different than the manufacturer’s population average. You’re not saying significantly greater than or less than, which would be a one-tailed test. As phrased, you want a two-tailed test because it can detect a difference in either direction.

It sounds like you need to use a 1-sample t-test to test the mean. During this test, enter 26 as the test mean. The procedure will tell you if the sample mean of 25.5 hours is significantly different from that test mean. Similarly, you’d need a one variance test to determine whether the sample standard deviation is significantly different from the test value of 5 hours.

For both of these tests, compare the p-value to your alpha of 0.05. If the p-value is less than this value, your results are statistically significant.

September 22, 2020 at 4:16 am

Hi Jim, I didn’t understand when to use a two-tailed test and when to use a one-tailed test. Will you please explain?

September 22, 2020 at 10:05 pm

I have a complete article dedicated to that: When Can I Use One-Tailed Tests .

Basically, start with the assumption that you’ll use a two-tailed test but then consider scenarios where a one-tailed test can be appropriate. I talk about all of that in the article.

If you have questions after reading that, please don’t hesitate to ask!

July 31, 2020 at 12:33 pm

Thank you so so much for this webpage.

I have two scenarios that I need some clarification. I will really appreciate it if you can take a look:

So I have several of materials that I know when they are tested after production. My hypothesis is that the earlier they are tested after production, the higher the mean value I should expect. At the same time, the later they are tested after production, the lower the mean value. Since this is more like a “greater or lesser” situation, I should use one tail. Is that the correct approach?

On the other hand, I have several mix of materials that I don’t know when they are tested after production. I only know the mean values of the test. And I only want to know whether one mean value is truly higher or lower than the other, I guess I want to know if they are only significantly different. Should I use two tail for this? If they are not significantly different, I can judge based on the mean values of test alone. And if they are significantly different, then I will need to do other type of analysis. Also, when I get my P-value for two tail, should I compare it to 0.025 or 0.05 if my confidence level is 0.05?

Thank you so much again.

July 31, 2020 at 11:19 pm

For your first, if you absolutely know that the mean must be lower the later the material is tested, that it cannot be higher, that would be a situation where you can use a one-tailed test. However, if that’s not a certainty, you’re just guessing, use a two-tail test. If you’re measuring different items at the different times, use the independent 2-sample t-test. However, if you’re measuring the same items at two time points, use the paired t-test. If it’s appropriate, using the paired t-test will give you more statistical power because it accounts for the variability between items. For more information, see my post about when it’s ok to use a one-tailed test .

For the mix of materials, use a two-tailed test because the effect truly can go either direction.

Always compare the p-value to your full significance level regardless of whether it’s a one or two-tailed test. Don’t divide the significance level in half.

June 17, 2020 at 2:56 pm

Is it possible to reach opposite conclusions if we use the critical value method and the p-value method? Secondly, if we perform a one-tailed test and use the p-value method to draw a conclusion about H0, do we need to convert the two-tailed significance value into a one-tailed one? That can be done just by dividing it by 2.

June 18, 2020 at 5:17 pm

The p-value method and critical value method will always agree as long as you’re not changing anything about the methodology.

If you’re using statistical software, you don’t need to make any adjustments. The software will do that for you.

However, if you’re calculating it by hand, you’ll need to take your significance level and then look in the table for your test statistic for a one-tailed test. For example, you’ll want to look up 5% for a one-tailed test rather than a two-tailed test. That’s not as simple as dividing by two. In this article, I show examples of one-tailed and two-tailed tests for the same degrees of freedom. The t critical value for the two-tailed test is +/- 2.086 while for the one-sided test it is 1.725. It is true that the probability associated with those critical values doubles for the one-tailed test (2.5% -> 5%), but the critical value itself is not half (2.086 -> 1.725). Study the first several graphs in this article to see why that is true.

For the p-value, you can take a two-tailed p-value and divide by 2 to determine the one-sided p-value. However, if you’re using statistical software, it does that for you.
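To make the numbers in this reply concrete, here is a minimal Python sketch that reproduces the two critical values and the p-value halving. It uses only the standard library, so the t-distribution CDF is approximated with Simpson's rule rather than taken from a statistics package. The helper names (t_pdf, t_cdf, t_critical) and the observed t-value of 1.9 are hypothetical, and the 20 degrees of freedom is an assumption inferred from the critical values quoted above.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] plus symmetry around zero."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_critical(upper_tail_prob, df):
    """Value c with P(T > c) = upper_tail_prob, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_tail_prob:
            lo = mid  # tail still too large, so move right
        else:
            hi = mid
    return (lo + hi) / 2

df = 20       # assumed: consistent with the 2.086 and 1.725 quoted above
alpha = 0.05

two_tailed_crit = t_critical(alpha / 2, df)  # 2.5% in each tail
one_tailed_crit = t_critical(alpha, df)      # 5% in one tail

# Halving a two-tailed p-value gives the one-tailed p-value, provided the
# observed effect lies in the hypothesized direction.
t_obs = 1.9   # hypothetical test statistic
p_two = 2 * (1 - t_cdf(abs(t_obs), df))
p_one = 1 - t_cdf(t_obs, df)

print(f"two-tailed critical value: +/-{two_tailed_crit:.3f}")
print(f"one-tailed critical value: {one_tailed_crit:.3f}")
print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```

The tail probability doubles when moving from the two-tailed to the one-tailed critical value, but the critical value itself shrinks by much less than half, because the density is not flat in the tails.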

June 11, 2020 at 3:46 pm

Hello Jim, if you have the time I’d be grateful if you could shed some clarity on this scenario:

“A researcher believes that aromatherapy can relieve stress but wants to determine whether it can also enhance focus. To test this, the researcher selected a random sample of students to take an exam in which the average score in the general population is 77. Prior to the exam, these students studied individually in a small library room where a lavender scent was present. If students in this group scored significantly above the average score in general population [is this one-tailed or two-tailed hypothesis?], then this was taken as evidence that the lavender scent enhanced focus.”

Thank you for your time if you do decide to respond.

June 11, 2020 at 4:00 pm

It’s unclear from the information provided whether the researchers used a one-tailed or two-tailed test. It could be either. A two-tailed test can detect effects in both directions, so it could definitely detect an average group score above the population score. However, you could also detect that effect using a one-tailed test if it was set up correctly. So, there’s not enough information in what you provided to know for sure. It could be either.

However, that’s irrelevant to answering the question. The tricky part, as I see it, is that you’re not entirely sure about why the scores are higher. Are they higher because the lavender scent increased concentration or are they higher because the subjects have lower stress from the lavender? Or, maybe it’s not even related to the scent but some other characteristic of the room or testing conditions in which they took the test. You just know the scores are higher but not necessarily why they’re higher.

I’d say that, no, it’s not necessarily evidence that the lavender scent enhanced focus. There are competing explanations for why the scores are higher. Also, it would be best to do this as an experiment with a control and treatment group where subjects are randomly assigned to either group. That process helps establish causality rather than just correlation and helps rule out competing explanations for why the scores are higher.

By the way, I spend a lot of time on these issues in my Introduction to Statistics ebook .

June 9, 2020 at 1:47 pm

If a left tail test has an alpha value of 0.05 how will you find the value in the table

April 19, 2020 at 10:35 am

Hi Jim, My question is in regards to the results in the table in your example of the one-sample t (two-tailed) test above. What about the p-value? The p-value listed is .018. I’m assuming that is compared to an alpha of 0.025, correct?

In regression analysis, when I get a test statistic for the predictive variable of -2.099 and a p-value of 0.039. Am I comparing the p-value to an alpha of 0.025 or 0.05? Now if I run a Bootstrap for coefficients analysis, the results say the sig (2-tail) is 0.098. What are the critical values and alpha in this case? I’m trying to reconcile what I am seeing in both tables.

Thanks for your help.

April 20, 2020 at 3:24 am

Hi Marvalisa,

For one-tailed tests, you don’t need to divide alpha in half. If you can tell your software to perform a one-tailed test, it’ll do all the calculations necessary so you don’t need to adjust anything. So, if you’re using an alpha of 0.05 for a one-tailed test and your p-value is 0.04, it is significant. The procedures adjust the p-values automatically and it all works out. So, whether you’re using a one-tailed or two-tailed test, you always compare the p-value to the alpha with no need to adjust anything. The procedure does that for you!

The exception would be if for some reason your software doesn’t allow you to specify that you want to use a one-tailed test instead of a two-tailed test. Then, you divide the p-value from a two-tailed test in half to get the p-value for a one tailed test. You’d still compare it to your original alpha.

For regression, the same thing applies. If you want to use a one-tailed test for a coefficient, just divide the p-value in half if you can’t tell the software that you want a one-tailed test. The default is two-tailed. If your software has the option for one-tailed tests for any procedure, including regression, it’ll adjust the p-value for you. So, in the normal course of things, you won’t need to adjust anything.

March 26, 2020 at 12:00 pm

Hey Jim, for a one-tailed hypothesis test with a .05 significance level, should I use a 95% confidence interval or a 90% confidence interval? Thanks

March 26, 2020 at 5:05 pm

You should use a one-sided 95% confidence interval. One-sided CIs have either an upper OR lower bound but remain unbounded on the other side.

March 16, 2020 at 4:30 pm

This is not applicable to the subject but… When performing tests of equivalence, we look at the confidence interval of the difference between two groups, and we perform two one-sided t-tests for equivalence..

March 15, 2020 at 7:51 am

Thanks for this illustrative blogpost. I had a question on one of your points though.

By definition of H1 and H0, a two-sided alternate hypothesis is that there is a difference in means between the test and control. Not that anything is ‘better’ or ‘worse’.

Just because we observed a negative result in your example, does not mean we can conclude it’s necessarily worse, but instead just ‘different’.

Therefore while it enables us to spot the fact that there may be differences between test and control, we cannot make claims about directional effects. So I struggle to see why they actually need to be used instead of one-sided tests.

What’s your take on this?

March 16, 2020 at 3:02 am

Hi Dominic,

If you’ll notice, I carefully avoid stating better or worse because in a general sense you’re right. However, given the context of a specific experiment, you can conclude whether a negative value is better or worse. As always in statistics, you have to use your subject-area knowledge to help interpret the results. In some cases, a negative value is a bad result. In other cases, it’s not. Use your subject-area knowledge!

I’m not sure why you think that you can’t make claims about directional effects? Of course you can!

As for why you shouldn’t use one-tailed tests for most cases, read my post When Can I Use One-Tailed Tests . That should answer your questions.

May 10, 2019 at 12:36 pm

Your website is absolutely amazing Jim, you seem like the nicest guy for doing this and I like how there’s no ulterior motive, (I wasn’t automatically signed up for emails or anything when leaving this comment). I study economics and found econometrics really difficult at first, but your website explains it so clearly its been a big asset to my studies, keep up the good work!

May 10, 2019 at 2:12 pm

Thank you so much, Jack. Your kind words mean a lot!

April 26, 2019 at 5:05 am

Hy Jim I really need your help now pls

One-tailed and two- tailed hypothesis, is it the same or twice, half or unrelated pls

April 26, 2019 at 11:41 am

Hi Anthony,

I describe how the hypotheses are different in this post. You’ll find your answers.

February 8, 2019 at 8:00 am

Thank you for your blog Jim, I have a Statistics exam soon and your articles let me understand a lot!

February 8, 2019 at 10:52 am

You’re very welcome! I’m happy to hear that it’s been helpful. Best of luck on your exam!

January 12, 2019 at 7:06 am

Hi Jim, when you say the target value is 5, do you mean that the population mean is 5 and we are trying to validate it with the help of the sample mean of 4.1 using hypothesis tests? If so, how can we take a population parameter to be 5 when it is almost impossible to measure a population parameter? Please clarify.

January 12, 2019 at 6:57 pm

When you set a target for a one-sample test, it’s based on a value that is important to you. It’s not a population parameter or anything like that. The example in this post uses a case where we need parts that are stronger on average than a value of 5. We derive the value of 5 by using our subject area knowledge about what is required for a situation. Given our product knowledge for the hypothetical example, we know it should be 5 or higher. So, we use that in the hypothesis test and determine whether the population mean is greater than that target value.

When you perform a one-sample test, a target value is optional. If you don’t supply a target value, you simply obtain a confidence interval for the range of values that the parameter is likely to fall within. But, sometimes there is meaningful number that you want to test for specifically.

I hope that clarifies the rationale behind the target value!

November 15, 2018 at 8:08 am

I understand that in Psychology a one tailed hypothesis is preferred. Is that so

November 15, 2018 at 11:30 am

No, there’s no overall preference for one-tailed hypothesis tests in statistics. That would be a study-by-study decision based on the types of possible effects. For more information about this decision, read my post: When Can I Use One-Tailed Tests?

November 6, 2018 at 1:14 am

I’m grateful to you for the explanations on One tail and Two tail hypothesis test. This opens my knowledge horizon beyond what an average statistics textbook can offer. Please include more examples in future posts. Thanks

November 5, 2018 at 10:20 am

Thank you. I will search it as well.

Stan Alekman

November 4, 2018 at 8:48 pm

Jim, what is the difference between the central and non-central t-distributions w/respect to hypothesis testing?

November 5, 2018 at 10:12 am

Hi Stan, this is something I will need to look into. I know central t-distribution is the common Student t-distribution, but I don’t have experience using non-central t-distributions. There might well be a blog post in that–after I learn more!

November 4, 2018 at 7:42 pm

this is awesome.


Statistics Made Easy

Two-Tailed Hypothesis Tests: 3 Example Problems

In statistics, we use hypothesis tests to determine whether some claim about a population parameter is true or not.

Whenever we perform a hypothesis test, we always write a null hypothesis and an alternative hypothesis, which take the following forms:

H 0 (Null Hypothesis): Population parameter =, ≤, ≥ some value

H A (Alternative Hypothesis): Population parameter <, >, ≠ some value

There are two types of hypothesis tests:

One-tailed test : Alternative hypothesis contains either < or > sign
Two-tailed test : Alternative hypothesis contains the ≠ sign

In a two-tailed test , the alternative hypothesis always contains the not equal ( ≠ ) sign.

This indicates that we’re testing whether or not some effect exists, regardless of whether it’s a positive or negative effect.

Check out the following example problems to gain a better understanding of two-tailed tests.

Example 1: Factory Widgets

Suppose it’s assumed that the average weight of a certain widget produced at a factory is 20 grams. However, one engineer believes that a new method produces widgets with an average weight different from 20 grams.

To test this, he can perform a two-tailed hypothesis test with the following null and alternative hypotheses:

H 0 (Null Hypothesis): μ = 20 grams
H A (Alternative Hypothesis): μ ≠ 20 grams

This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal “≠” sign. The engineer believes that the new method will influence widget weight, but doesn’t specify whether it will cause average weight to increase or decrease.

To test this, he uses the new method to produce 20 widgets and obtains the following information:

n = 20 widgets
x̄ = 19.8 grams
s = 3.1 grams

Plugging these values into the One Sample t-test Calculator , we obtain the following results:

t-test statistic: -0.288525
two-tailed p-value: 0.776

Since the p-value is not less than .05, the engineer fails to reject the null hypothesis.

He does not have sufficient evidence to say that the true mean weight of widgets produced by the new method is different than 20 grams.
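The calculator result for the widget example can be checked by hand. The following is a minimal sketch, not the calculator's actual implementation: it computes the one-sample t statistic from the summary values and approximates the two-tailed p-value with Simpson's rule, using only Python's standard library. The function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] plus symmetry around zero."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def one_sample_t_two_tailed(mean, mu0, sd, n):
    """Return (t statistic, two-tailed p-value) from summary statistics."""
    t_stat = (mean - mu0) / (sd / math.sqrt(n))
    # Two-tailed: count the area beyond |t| in both tails, with n - 1 df.
    p_value = 2 * (1 - t_cdf(abs(t_stat), n - 1))
    return t_stat, p_value

# Summary values from the widget example: n = 20, sample mean 19.8, s = 3.1
t_stat, p_value = one_sample_t_two_tailed(mean=19.8, mu0=20, sd=3.1, n=20)
print(f"t = {t_stat:.6f}, two-tailed p = {p_value:.3f}")
```

Because the p-value counts both tails, the sign of the t statistic does not affect it; only its distance from zero matters.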

Example 2: Plant Growth

Suppose a standard fertilizer has been shown to cause a species of plants to grow by an average of 10 inches. However, one botanist believes a new fertilizer causes this species of plants to grow by an average amount different than 10 inches.

To test this, she can perform a two-tailed hypothesis test with the following null and alternative hypotheses:

H 0 (Null Hypothesis): μ = 10 inches
H A (Alternative Hypothesis): μ ≠ 10 inches

This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal “≠” sign. The botanist believes that the new fertilizer will influence plant growth, but doesn’t specify whether it will cause average growth to increase or decrease.

To test this claim, she applies the new fertilizer to a simple random sample of 15 plants and obtains the following information:

n = 15 plants
x̄ = 11.4 inches
s = 2.5 inches
t-test statistic: 2.1689
two-tailed p-value: 0.0478

Since the p-value is less than .05, the botanist rejects the null hypothesis.

She has sufficient evidence to conclude that the new fertilizer causes an average growth that is different than 10 inches.
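The same decision can be reached by comparing the test statistic to a critical value instead of comparing the p-value to .05. Here is a short sketch under one stated assumption: the two-tailed critical value for α = 0.05 with 14 degrees of freedom is hard-coded as 2.145, taken from a standard t-table rather than computed.

```python
import math

# Summary values from the plant growth example
n, sample_mean, sd, mu0 = 15, 11.4, 2.5, 10

# Assumed table value: two-tailed critical t for alpha = 0.05, df = n - 1 = 14
t_crit = 2.145

t_stat = (sample_mean - mu0) / (sd / math.sqrt(n))

# Two-tailed rule: reject if the statistic is extreme in either direction.
reject = abs(t_stat) > t_crit

print(f"t = {t_stat:.4f}, critical values = +/-{t_crit}, reject H0: {reject}")
```

The statistic lies just beyond the upper critical value, which agrees with a p-value slightly below .05.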

Example 3: Studying Method

A professor believes that a certain studying technique will influence the mean score that her students receive on a certain exam, but she’s unsure if it will increase or decrease the mean score, which is currently 82.

To test this, she lets each student use the studying technique for one month leading up to the exam and then administers the same exam to each of the students.

She then performs a hypothesis test using the following hypotheses:

H 0 : μ = 82
H A : μ ≠ 82

This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal “≠” sign. The professor believes that the studying technique will influence the mean exam score, but doesn’t specify whether it will cause the mean score to increase or decrease.

To test this claim, the professor has 25 students use the new studying method and then take the exam. She collects the following data on the exam scores for this sample of students:

t-test statistic: 3.6586
two-tailed p-value: 0.0012

Since the p-value is less than .05, the professor rejects the null hypothesis.

She has sufficient evidence to conclude that the new studying method produces exam scores with an average score that is different than 82.

Additional Resources

The following tutorials provide additional information about hypothesis testing:

Introduction to Hypothesis Testing
What is a Directional Hypothesis?
When Do You Reject the Null Hypothesis?


Hey there. My name is Zach Bobbitt. I have a Masters of Science degree in Applied Statistics and I’ve worked on machine learning algorithms for professional businesses in both healthcare and retail. I’m passionate about statistics, machine learning, and data visualization and I created Statology to be a resource for both students and teachers alike. My goal with this site is to help you learn statistics through using simple terms, plenty of real-world examples, and helpful illustrations.

One Reply to “Two-Tailed Hypothesis Tests: 3 Example Problems”

i owe u my first born child


What Is a Two-Tailed Test? Definition and Example

Adam Hayes, Ph.D., CFA, is a financial writer with 15+ years Wall Street experience as a derivatives trader. Besides his extensive derivative trading expertise, Adam is an expert in economics and behavioral finance. Adam received his master's in economics from The New School for Social Research and his Ph.D. from the University of Wisconsin-Madison in sociology. He is a CFA charterholder as well as holding FINRA Series 7, 55 & 63 licenses. He currently researches and teaches economic sociology and the social studies of finance at the Hebrew University in Jerusalem.


A two-tailed test, in statistics, is a method in which the critical area of a distribution is two-sided and tests whether a sample is greater than or less than a certain range of values. It is used in null-hypothesis testing and testing for statistical significance . If the sample being tested falls into either of the critical areas, the alternative hypothesis is accepted instead of the null hypothesis.

Key Takeaways

In statistics, a two-tailed test is a method in which the critical area of a distribution is two-sided and tests whether a sample is greater or less than a range of values.
It is used in null-hypothesis testing and testing for statistical significance.
If the sample being tested falls into either of the critical areas, the alternative hypothesis is accepted instead of the null hypothesis.
By convention two-tailed tests are used to determine significance at the 5% level, meaning each side of the distribution is cut at 2.5%.

A basic concept of inferential statistics is hypothesis testing, which determines whether a claim about a population parameter is supported by sample data. A hypothesis test that is designed to show whether the mean of a sample is significantly greater than or significantly less than the mean of a population is referred to as a two-tailed test. The two-tailed test gets its name from testing the area under both tails of a normal distribution, although the test can also be used with non-normal distributions.

A two-tailed test is designed to examine both sides of a specified data range as designated by the probability distribution involved. The probability distribution should represent the likelihood of a specified outcome based on predetermined standards. This requires the setting of a limit designating the highest (or upper) and lowest (or lower) accepted variable values included within the range. Any data point that exists above the upper limit or below the lower limit is considered out of the acceptance range and in an area referred to as the rejection range.

There is no inherent standard about the number of data points that must exist within the acceptance range. In instances where precision is required, such as in the creation of pharmaceutical drugs, a rejection rate of 0.001% or less may be instituted. In instances where precision is less critical, such as the number of food items in a product bag, a rejection rate of 5% may be appropriate.

A two-tailed test can also be used practically during certain production activities in a firm, such as with the production and packaging of candy at a particular facility. If the production facility designates 50 candies per bag as its goal, with an acceptable distribution of 45 to 55 candies, any bag found with an amount below 45 or above 55 is considered within the rejection range.

To confirm the packaging mechanisms are properly calibrated to meet the expected output, random sampling may be taken to confirm accuracy. A simple random sample takes a small, random portion of the entire population to represent the entire data set, where each member has an equal probability of being chosen.

For the packaging mechanisms to be considered accurate, an average of 50 candies per bag with an appropriate distribution is desired. Additionally, the number of bags that fall within the rejection range needs to fall within the probability distribution limit considered acceptable as an error rate. Here, the null hypothesis would be that the mean is 50 while the alternate hypothesis would be that it is not 50.

If, after conducting the two-tailed test, the z-score falls in the rejection region, meaning that the deviation is too far from the desired mean, then adjustments to the facility or associated equipment may be required to correct the error. Regular use of two-tailed testing methods can help ensure production stays within limits over the long term.

Be careful to note if a statistical test is one- or two-tailed as this will greatly influence a model's interpretation.

When a hypothesis test is set up to show that the sample mean would be only higher than the population mean, this is referred to as a one-tailed test . A formulation of this hypothesis would be, for example, that "the returns on an investment fund would be at least x%." One-tailed tests could also be set up to show that the sample mean could be only less than the population mean. The key difference from a two-tailed test is that in a two-tailed test, the sample mean could be different from the population mean by being either higher or lower than it.

If the sample being tested falls into the one-sided critical area, the alternative hypothesis will be accepted instead of the null hypothesis. A one-tailed test is also known as a directional hypothesis or directional test.

A two-tailed test, on the other hand, is designed to examine both sides of a specified data range to test whether a sample is greater than or less than the range of values.

Example of a Two-Tailed Test

As a hypothetical example, imagine that a new stockbroker, named XYZ, claims that their brokerage fees are lower than those of your current stockbroker, ABC. Data available from an independent research firm indicate that the mean and standard deviation of the brokerage charges for all ABC broker clients are $18 and $6, respectively.

A sample of 100 clients of ABC is taken, and brokerage charges are calculated with the new rates of XYZ broker. If the mean of the sample is $18.75 and the sample standard deviation is $6, can any inference be made about the difference in the average brokerage bill between ABC and XYZ broker?

H0 (null hypothesis): mean = 18
H1 (alternative hypothesis): mean ≠ 18 (This is what we want to prove.)
Rejection region: Z ≤ -Z2.5 or Z ≥ Z2.5 (assuming a 5% significance level, split 2.5% on each side).
Z = (sample mean - mean) / (std-dev / sqrt(no. of samples)) = (18.75 - 18) / (6 / sqrt(100)) = 1.25

This calculated Z value falls between the two limits defined by -Z2.5 = -1.96 and Z2.5 = 1.96.

We therefore conclude that there is insufficient evidence to infer any difference between the rates of your existing broker and the new broker, so the null hypothesis cannot be rejected. Alternatively, the p-value = P(Z < -1.25) + P(Z > 1.25) = 2 * 0.1056 = 0.2112 = 21.12%, which is greater than 0.05 (5%) and leads to the same conclusion.
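The broker calculation can be reproduced in a few lines. This is an illustrative sketch using Python's standard-library NormalDist; the variable names are my own, and it treats the $6 standard deviation as known, as the example does.

```python
from math import sqrt
from statistics import NormalDist

# Values from the broker example
sample_mean, mu0, sd, n = 18.75, 18, 6, 100
alpha = 0.05

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

z = (sample_mean - mu0) / (sd / sqrt(n))
z_crit = std_normal.inv_cdf(1 - alpha / 2)     # upper critical value, about 1.96
p_value = 2 * (1 - std_normal.cdf(abs(z)))     # area in both tails beyond |z|

reject = abs(z) >= z_crit

print(f"z = {z:.2f}, critical values = +/-{z_crit:.2f}")
print(f"two-tailed p = {p_value:.4f}, reject H0: {reject}")
```

Without intermediate rounding the p-value comes out at about 0.2113; the 0.2112 above results from doubling the tail area after rounding it to 0.1056. Either way the conclusion is the same.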

How Is a Two-Tailed Test Designed?

A two-tailed test is designed to determine whether a claim is true or not given a population parameter. It examines both sides of a specified data range as designated by the probability distribution involved. As such, the probability distribution should represent the likelihood of a specified outcome based on predetermined standards.

What Is the Difference Between a Two-Tailed and One-Tailed Test?

A two-tailed hypothesis test is designed to show whether the sample mean is significantly greater than or significantly less than the mean of a population. The two-tailed test gets its name from testing the area under both tails (sides) of a normal distribution. A one-tailed hypothesis test, on the other hand, is set up to show only one test; that the sample mean would be higher than the population mean, or, in a separate test, that the sample mean would be lower than the population mean.

What Is a Z-score?

A Z-score numerically describes a value's relationship to the mean of a group of values and is measured in terms of the number of standard deviations from the mean. If a Z-score is 0, it indicates that the data point's score is identical to the mean score, whereas Z-scores of 1.0 and -1.0 would indicate values one standard deviation above or below the mean. For approximately normally distributed data, about 99.7% of values have a Z-score between -3 and 3, meaning they lie within three standard deviations of the mean.
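As a quick illustration, the sketch below computes a Z-score and checks how much of a normal distribution lies within one and three standard deviations of the mean. The z_score helper and the example value of $24 are hypothetical; the mean of 18 and standard deviation of 6 are borrowed from the broker example.

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """Number of standard deviations that value lies above or below the mean."""
    return (value - mean) / sd

# Hypothetical data point: a $24 charge when the mean is $18 and the SD is $6
z = z_score(24, 18, 6)

std_normal = NormalDist()
within_1 = std_normal.cdf(1) - std_normal.cdf(-1)  # share with -1 < Z < 1
within_3 = std_normal.cdf(3) - std_normal.cdf(-3)  # share with -3 < Z < 3

print(f"z = {z:.1f}")
print(f"within 1 SD: {within_1:.4f}, within 3 SD: {within_3:.4f}")
```

These shares hold for a normal distribution; data with heavier tails will have more values beyond three standard deviations.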


The Open University

4.2 Two-tailed tests

Hypotheses that have an equal (=) or not equal (≠) supposition (sign) in the statement are called non-directional hypotheses . In non-directional hypotheses, the researcher is interested in whether there is a statistically significant difference or relationship between two or more variables, but does not have any specific expectation about which group or variable will be higher or lower. For example, a non-directional hypothesis might be: ‘There is a difference in the preference for brand X between male and female consumers.’ In this hypothesis, the researcher is interested in whether there is a statistically significant difference in the preference for brand X between male and female consumers, but does not have a specific prediction about which gender will have a higher preference. The researcher may conduct a survey or experiment to collect data on the brand preference of male and female consumers and then use statistical analysis to determine whether there is a significant difference between the two groups.

Non-directional hypotheses are also known as two-tailed hypotheses. The term ‘two-tailed’ comes from the fact that the statistical test used to evaluate the hypothesis is based on the assumption that the difference or relationship could occur in either direction, resulting in two ‘tails’ in the probability distribution. Using the coffee foam example (from Activity 1), you have the following set of hypotheses:

H 0 : µ = 1cm foam

H a : µ ≠ 1cm foam

In this case, the researcher can reject the null hypothesis for the mean value that is either ‘much higher’ or ‘much lower’ than 1 cm foam. This is called a two-tailed test because the rejection region includes outcomes from both the upper and lower tails of the sample distribution when determining a decision rule. To give an illustration, if you set alpha level (α) equal to 0.05, that would give you a 95% confidence level. Then, you would reject the null hypothesis for obtained values of z < -1.96 or z > 1.96 (you will look at how to calculate z-scores later in the course).

This can be plotted on a graph as shown in Figure 7.

Figure 7: A two-tailed test shown in a symmetrical graph reminiscent of a bell

The x-axis is labelled ‘z-score’ and the y-axis is labelled ‘probability density’. The x-axis increases in increments of 1 from -2 to 2. The top of the bell-shaped curve is labelled ‘Foam height = 1cm’. The rejection regions of the null hypothesis are circled on both sides of the bell curve. Within these circles are two areas shaded orange: one beneath the curve from -2 downwards, labelled z < -1.96 and α = 0.025, and one beneath the curve from 2 upwards, labelled z > 1.96 and α = 0.025.

In a two-tailed hypothesis test, the null hypothesis assumes that there is no significant difference or relationship between the two groups or variables, and the alternative hypothesis suggests that there is a significant difference or relationship, but does not specify the direction of the difference or relationship.

When performing a two-tailed test, you need to determine the level of significance, which is denoted by alpha (α). The value of alpha, in this case, is 0.05. To perform a two-tailed test at a significance level of 0.05, you need to divide alpha by 2, giving a significance level of 0.025 for each distribution tail (0.05/2 = 0.025). This is done because the two-tailed test is looking for significance in either tail of the distribution. If the calculated test statistic falls in the rejection region of either tail of the distribution, then the null hypothesis is rejected and the alternative hypothesis is accepted. In this case, the researcher can conclude that there is a significant difference or relationship between the two groups or variables.

Assuming that the population follows a normal distribution, the tail located below the critical value of z = –1.96 (in a later section, you will discuss how this value was determined) and the tail above the critical value of z = +1.96 each represent a proportion of 0.025. These tails are referred to as the lower and upper tails, respectively, and they correspond to the extreme values of the distribution that are far from the central part of the bell curve. These critical values are used in a two-tailed hypothesis test to determine whether to reject or fail to reject the null hypothesis. The null hypothesis represents the default assumption that there is no significant difference between the observed data and what would be expected under a specific condition.

If the calculated test statistic falls within the critical values, then the null hypothesis cannot be rejected at the 0.05 level of significance. However, if the calculated test statistic falls outside the critical values (orange-coloured areas in Figure 7), then the null hypothesis can be rejected in favour of the alternative hypothesis, suggesting that there is evidence of a significant difference between the observed data and what would be expected under the specified condition.
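The split of alpha across the two tails can be verified numerically. The sketch below, which assumes a standard normal test statistic as in the text, derives the two critical values from α = 0.05 and confirms that each tail holds a proportion of 0.025. The reject_null helper and the z-values passed to it are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05

# Half of alpha goes in each tail of a two-tailed test.
lower_crit = std_normal.inv_cdf(alpha / 2)      # about -1.96
upper_crit = std_normal.inv_cdf(1 - alpha / 2)  # about +1.96

lower_tail = std_normal.cdf(lower_crit)         # area below the lower critical value
upper_tail = 1 - std_normal.cdf(upper_crit)     # area above the upper critical value

def reject_null(z):
    """True if z falls in either rejection region."""
    return z < lower_crit or z > upper_crit

print(f"critical values: {lower_crit:.2f} and {upper_crit:.2f}")
print(f"tail areas: {lower_tail:.3f} + {upper_tail:.3f} = {lower_tail + upper_tail:.3f}")
print(reject_null(2.3), reject_null(-0.5))
```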

Hypothesis Testing for Means & Proportions


Hypothesis Testing: Upper-, Lower, and Two Tailed Tests


The procedure for hypothesis testing is based on the ideas described above. Specifically, we set up competing hypotheses, select a random sample from the population of interest and compute summary statistics. We then determine whether the sample data supports the null or alternative hypotheses. The procedure can be broken down into the following five steps.

Step 1. Set up hypotheses and select the level of significance α.

H 0 : Null hypothesis (no change, no difference);

H 1 : Research hypothesis (investigator's belief); α =0.05

Upper-tailed, Lower-tailed, Two-tailed Tests

The research or alternative hypothesis can take one of three forms. An investigator might believe that the parameter has increased, decreased or changed. For example, an investigator might hypothesize:

H1: μ > μ0, where μ0 is the comparator or null value (e.g., μ0 = 191 in our example about weight in men in 2006) and an increase is hypothesized; this type of test is called an upper-tailed test;
H1: μ < μ0, where a decrease is hypothesized; this is called a lower-tailed test; or
H1: μ ≠ μ0, where a difference is hypothesized; this is called a two-tailed test.

The exact form of the research hypothesis depends on the investigator's belief about the parameter of interest and whether it has possibly increased, decreased or is different from the null value. The research hypothesis is set up by the investigator before any data are collected.

Step 2. Select the appropriate test statistic.

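The test statistic is a single number that summarizes the sample information. An example of a test statistic is the Z statistic computed as follows:

Z = (x̄ - μ0) / (s / √n)

where x̄ is the sample mean, μ0 is the null value, s is the sample standard deviation and n is the sample size.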

When the sample size is small, we will use t statistics (just as we did when constructing confidence intervals for small samples). As we present each scenario, alternative test statistics are provided along with conditions for their appropriate use.

Step 3. Set up decision rule.

The decision rule is a statement that tells under what circumstances to reject the null hypothesis. The decision rule is based on specific values of the test statistic (e.g., reject H 0 if Z > 1.645). The decision rule for a specific test depends on 3 factors: the research or alternative hypothesis, the test statistic and the level of significance. Each is discussed below.

The decision rule depends on whether an upper-tailed, lower-tailed, or two-tailed test is proposed. In an upper-tailed test the decision rule has investigators reject H 0 if the test statistic is larger than the critical value. In a lower-tailed test the decision rule has investigators reject H 0 if the test statistic is smaller than the critical value. In a two-tailed test the decision rule has investigators reject H 0 if the test statistic is extreme, either larger than an upper critical value or smaller than a lower critical value.
The exact form of the test statistic is also important in determining the decision rule. If the test statistic follows the standard normal distribution (Z), then the decision rule will be based on the standard normal distribution. If the test statistic follows the t distribution, then the decision rule will be based on the t distribution. The appropriate critical value will be selected from the t distribution again depending on the specific alternative hypothesis and the level of significance.
The third factor is the level of significance. The level of significance which is selected in Step 1 (e.g., α =0.05) dictates the critical value. For example, in an upper tailed Z test, if α =0.05 then the critical value is Z=1.645.
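The three factors above can be combined into a small helper. The following is a minimal sketch, assuming a Z test statistic; the function names `z_critical` and `reject_h0` and the tail labels "upper", "lower" and "two" are illustrative choices, not part of the original material. It uses only Python's standard library.

```python
from statistics import NormalDist


def z_critical(alpha, tail):
    """Return the critical value(s) of Z for a given alpha and tail.

    tail is "upper", "lower", or "two" (labels assumed for this sketch).
    """
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "upper":
        return (z.inv_cdf(1 - alpha),)
    if tail == "lower":
        return (z.inv_cdf(alpha),)
    if tail == "two":
        upper = z.inv_cdf(1 - alpha / 2)  # alpha is split between the tails
        return (-upper, upper)
    raise ValueError("tail must be 'upper', 'lower', or 'two'")


def reject_h0(z_stat, alpha, tail):
    """Apply the decision rule: True means reject the null hypothesis."""
    crit = z_critical(alpha, tail)
    if tail == "upper":
        return z_stat > crit[0]
    if tail == "lower":
        return z_stat < crit[0]
    lower, upper = crit
    return z_stat < lower or z_stat > upper


print(round(z_critical(0.05, "upper")[0], 3))  # 1.645
print(round(z_critical(0.05, "lower")[0], 3))  # -1.645
print(round(z_critical(0.05, "two")[1], 3))    # 1.96
```

Note that the same α gives a larger critical value in a two-tailed test, because only α/2 sits in each tail.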

The following figures illustrate the rejection regions defined by the decision rule for upper-, lower- and two-tailed Z tests with α=0.05. Notice that the rejection regions are in the upper, lower and both tails of the curves, respectively. The decision rules are written below each figure.

Rejection Region for Upper-Tailed Z Test (H 1 : μ > μ 0 ) with α =0.05

The decision rule is: Reject H 0 if Z > 1.645.


α	Z
0.10	1.282
0.05	1.645
0.025	1.960
0.010	2.326
0.005	2.576
0.001	3.090
0.0001	3.719

Standard normal distribution with lower tail at -1.645 and alpha=0.05

Rejection Region for Lower-Tailed Z Test (H 1 : μ < μ 0 ) with α =0.05

The decision rule is: Reject H 0 if Z < -1.645.


α	Z
0.10	-1.282
0.05	-1.645
0.025	-1.960
0.010	-2.326
0.005	-2.576
0.001	-3.090
0.0001	-3.719

Standard normal distribution with two tails

Rejection Region for Two-Tailed Z Test (H 1 : μ ≠ μ 0 ) with α =0.05

The decision rule is: Reject H 0 if Z < -1.960 or if Z > 1.960.



α	Z
0.20	1.282
0.10	1.645
0.05	1.960
0.010	2.576
0.001	3.291
0.0001	3.891

The complete table of critical values of Z for upper, lower and two-tailed tests can be found in the table of Z values to the right in "Other Resources."

Critical values of t for upper, lower and two-tailed tests can be found in the table of t values in "Other Resources."

Step 4. Compute the test statistic.

Here we compute the test statistic by substituting the observed sample data into the test statistic identified in Step 2.

Step 5. Conclusion.

The final conclusion is made by comparing the test statistic (which is a summary of the information observed in the sample) to the decision rule. The final conclusion will be either to reject the null hypothesis (because the sample data are very unlikely if the null hypothesis is true) or not to reject the null hypothesis (because the sample data are not very unlikely).

If the null hypothesis is rejected, then an exact significance level is computed to describe the likelihood of observing the sample data assuming that the null hypothesis is true. The exact level of significance is called the p-value and it will be less than the chosen level of significance if we reject H 0 .

Statistical computing packages provide exact p-values as part of their standard output for hypothesis tests. In fact, when using a statistical computing package, the steps outlined above can be abbreviated. The hypotheses (step 1) should always be set up in advance of any analysis and the significance criterion should also be determined (e.g., α =0.05). Statistical computing packages will produce the test statistic (usually reporting the test statistic as t) and a p-value. The investigator can then determine statistical significance using the following: If p < α then reject H 0 .
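To illustrate the rule "if p < α then reject H 0", here is a minimal sketch of how a p-value could be computed for a Z statistic under each form of the research hypothesis. The function name `z_p_value`, the tail labels and the example value Z = 2.0 are assumptions made for illustration; a real statistical package does this work for you.

```python
from statistics import NormalDist


def z_p_value(z_stat, tail):
    """p-value of a Z statistic for an upper-, lower- or two-tailed test."""
    cdf = NormalDist().cdf  # standard normal cumulative distribution
    if tail == "upper":
        return 1 - cdf(z_stat)             # area to the right of z_stat
    if tail == "lower":
        return cdf(z_stat)                 # area to the left of z_stat
    if tail == "two":
        return 2 * (1 - cdf(abs(z_stat)))  # both tails beyond |z_stat|
    raise ValueError("tail must be 'upper', 'lower', or 'two'")


alpha = 0.05
z_stat = 2.0  # hypothetical test statistic, not taken from the text

for tail in ("upper", "lower", "two"):
    p = z_p_value(z_stat, tail)
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"{tail}-tailed: p = {p:.4f} -> {decision}")
```

The same test statistic can lead to different decisions depending on the tail, which is why the hypotheses must be fixed before looking at the data.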

Step 1. Set up hypotheses and determine level of significance

H 0 : μ = 191 H 1 : μ > 191 α =0.05

The research hypothesis is that weights have increased, and therefore an upper tailed test is used.

Step 2. Select the appropriate test statistic.

Because the sample size is large (n > 30) the appropriate test statistic is the Z statistic:

Z = (x̄ − μ 0 ) / (s / √n)

Step 3. Set up decision rule.

In this example, we are performing an upper tailed test (H 1 : μ> 191), with a Z test statistic and selected α =0.05. Reject H 0 if Z > 1.645.

Step 4. Compute the test statistic.

We now substitute the sample data into the formula for the test statistic identified in Step 2.

Step 5. Conclusion.

The computed test statistic is Z = 2.38. We reject H 0 because 2.38 > 1.645. We have statistically significant evidence at α =0.05 to show that the mean weight in men in 2006 is more than 191 pounds.

Because we rejected the null hypothesis, we now approximate the p-value, which is the probability of observing sample data at least as extreme as ours if the null hypothesis is true. An alternative definition of the p-value is the smallest level of significance at which we can still reject H 0 . In this example, we observed Z=2.38 and for α=0.05, the critical value was 1.645. Because 2.38 exceeded 1.645 we rejected H 0 . In our conclusion we reported a statistically significant increase in mean weight at a 5% level of significance.

Using the table of critical values for upper tailed tests, we can approximate the p-value. If we select α=0.025, the critical value is 1.960, and we still reject H 0 because 2.38 > 1.960. If we select α=0.010 the critical value is 2.326, and we still reject H 0 because 2.38 > 2.326. However, if we select α=0.005, the critical value is 2.576, and we cannot reject H 0 because 2.38 < 2.576. Therefore, the smallest α in the table where we still reject H 0 is 0.010, and this approximates the p-value. A statistical computing package would produce a more precise p-value, which would be between 0.005 and 0.010. Here we are approximating the p-value and would report p < 0.010.
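The table lookup described here can be sketched in a few lines. The critical values below are copied from the upper-tailed table earlier in this section; the function name `bracket_p_value` is hypothetical, and the exact p-value is computed from the standard normal distribution as a cross-check.

```python
from statistics import NormalDist

# (alpha, critical Z) pairs from the upper-tailed table above
UPPER_TAIL_TABLE = [
    (0.10, 1.282), (0.05, 1.645), (0.025, 1.960), (0.010, 2.326),
    (0.005, 2.576), (0.001, 3.090), (0.0001, 3.719),
]


def bracket_p_value(z_stat, table):
    """Return (largest alpha that fails to reject, smallest alpha that rejects)."""
    rejecting = [alpha for alpha, crit in table if z_stat > crit]
    not_rejecting = [alpha for alpha, crit in table if z_stat <= crit]
    lower = max(not_rejecting) if not_rejecting else None
    upper = min(rejecting) if rejecting else None
    return lower, upper


low, high = bracket_p_value(2.38, UPPER_TAIL_TABLE)
exact = 1 - NormalDist().cdf(2.38)  # upper-tail area beyond Z = 2.38

print(low, high)        # 0.005 0.01
print(round(exact, 4))  # 0.0087
```

The exact value falls inside the bracket found from the table, consistent with reporting p < 0.010.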

In all tests of hypothesis, there are two types of errors that can be committed. The first is called a Type I error and refers to the situation where we incorrectly reject H 0 when in fact it is true. This is also called a false positive result (as we incorrectly conclude that the research hypothesis is true when in fact it is not). When we run a test of hypothesis and decide to reject H 0 (e.g., because the test statistic exceeds the critical value in an upper tailed test) then either we make a correct decision because the research hypothesis is true or we commit a Type I error. The different conclusions are summarized in the table below. Note that we will never know whether the null hypothesis is really true or false (i.e., we will never know which row of the following table reflects reality).

Table - Conclusions in Test of Hypothesis


	Do Not Reject H 0	Reject H 0
H 0 is True	Correct Decision	Type I Error
H 0 is False	Type II Error	Correct Decision

In the first step of the hypothesis test, we select a level of significance, α, and α= P(Type I error), that is, the probability of rejecting H 0 when it is in fact true. Because we purposely select a small value for α, we control the probability of committing a Type I error. For example, if we select α=0.05, then whenever H 0 is true there is only a 5% probability that the test leads us to reject it and commit a Type I error. Most investigators are very comfortable with this and are confident when rejecting H 0 that the research hypothesis is true (as it is the more likely scenario when we reject H 0 ).
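A small simulation can make the meaning of α concrete. The sketch below assumes an upper-tailed Z test with a known standard deviation of 1 and draws every sample from a population where the null hypothesis is true, so each rejection is a Type I error. The function name, sample size, number of trials and seed are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist, fmean


def type_i_error_rate(alpha=0.05, n=25, trials=5000, seed=0):
    """Fraction of simulated tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tailed critical value
    rejections = 0
    for _ in range(trials):
        # Sample from N(0, 1), so the null hypothesis (mean = 0) holds.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z_stat = fmean(sample) / (1.0 / n ** 0.5)  # known sigma = 1
        if z_stat > z_crit:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

The observed rate should land close to the chosen α of 0.05, with some simulation noise.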

When we run a test of hypothesis and decide not to reject H 0 (e.g., because the test statistic is below the critical value in an upper tailed test) then either we make a correct decision because the null hypothesis is true or we commit a Type II error. Beta (β) represents the probability of a Type II error and is defined as follows: β=P(Type II error) = P(Do not Reject H 0 | H 0 is false). Unfortunately, we cannot choose β to be small (e.g., 0.05) to control the probability of committing a Type II error because β depends on several factors including the sample size, α, and the research hypothesis. When we do not reject H 0 , it may be very likely that we are committing a Type II error (i.e., failing to reject H 0 when in fact it is false). Therefore, when tests are run and the null hypothesis is not rejected we often make a weak concluding statement allowing for the possibility that we might be committing a Type II error. If we do not reject H 0 , we conclude that we do not have significant evidence to show that H 1 is true. We do not conclude that H 0 is true.

The most common reason for a Type II error is a small sample size.
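The effect of sample size on β can be sketched for an upper-tailed Z test. The null value of 191 comes from the example above; the true mean of 196 and the standard deviation of 25 are hypothetical values chosen only to illustrate the pattern, and `type_ii_error` is an illustrative name.

```python
from statistics import NormalDist


def type_ii_error(mu0, mu1, sigma, n, alpha=0.05):
    """beta = P(do not reject H0 | true mean is mu1), upper-tailed Z test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)
    # Under the true mean, Z is normal with this mean and unit variance.
    shift = (mu1 - mu0) / (sigma / n ** 0.5)
    return z.cdf(z_crit - shift)


for n in (25, 100, 400):
    beta = type_ii_error(mu0=191, mu1=196, sigma=25, n=n)
    print(f"n = {n:>3}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

With the same true difference, β shrinks as n grows, which is why a non-significant result from a small sample supports only a weak conclusion.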


Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans . Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics . It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

State your research hypothesis as a null hypothesis (H 0 ) and alternate hypothesis (H a or H 1 ).
Collect data in a way designed to test the hypothesis.
Perform an appropriate statistical test .
Decide whether to reject or fail to reject your null hypothesis.
Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Step 1: State your null and alternate hypothesis
Step 2: Collect data
Step 3: Perform a statistical test
Step 4: Decide whether to reject or fail to reject your null hypothesis
Step 5: Present your findings
Other interesting articles
Frequently asked questions about hypothesis testing

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H 0 ) and alternate (H a ) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

H 0 : Men are, on average, not taller than women. H a : Men are, on average, taller than women.


For a statistical test to be valid , it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but many of the most common ones (such as t tests and ANOVA) are based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p -value . This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p -value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data .

For the height example, a statistical test comparing the two groups gives you:

an estimate of the difference in average height between the two groups.
a p -value showing how likely you are to see this difference if the null hypothesis of no difference is true.

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p -value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).


The results of hypothesis testing will be presented in the results and discussion sections of your research paper , dissertation or thesis .

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p -value). In the discussion , you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis . This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis . But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis .


Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.


Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved June 9, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/


Two Tailed Test: Definition, Examples


What is a Two Tailed Test?

In a two tailed test, your rejection region (the place where you would reject the null hypothesis ) is split between both tails of the distribution, and the area in the middle is where you fail to reject it.

For example, let’s say you were running a z test with an alpha level of 5% (0.05). In a one tailed test, the entire 5% would be in a single tail. But with a two tailed test, that 5% is split between the two tails, giving you 2.5% (0.025) in each tail.


Two Tailed T Test

You may want to compare a sample mean to a given value of x with a t test . Let’s say your null hypothesis is that the mean is equal to 10 (μ = 10). A two tailed t test will test:

Is the mean greater than 10?
Is the mean less than 10?

If you choose an alpha level of 5%, and the t statistic is in the top 2.5% or bottom 2.5% of its probability distribution, then the mean is significantly different from 10. That situation will also result in a p-value of less than 0.05. A small p-value gives you a reason to reject the null hypothesis .

Two tailed F test

An f test tells you if two population variances are equal. A two tailed f test is the standard type of f test, which tells you whether the variances are equal or not equal: it tests if one variance is either greater than or less than the other. This is in comparison to the one tailed f test , which is used when you only want to test if one variance is greater than the other or that one variance is less than the other (but not both).


One and Two Tailed Tests

Suppose we have a null hypothesis H 0 and an alternative hypothesis H 1 . We consider the distribution given by the null hypothesis and perform a test to determine whether or not the null hypothesis should be rejected in favour of the alternative hypothesis.

There are two different types of tests that can be performed. A one-tailed test looks for a change in one specified direction (an increase or a decrease in the parameter), whereas a two-tailed test looks for any change in the parameter, whether an increase or a decrease.

We can perform the test at any level (usually 1%, 5% or 10%). For example, performing the test at a 5% level means that, if H 0 is true, there is a 5% chance of wrongly rejecting it.

If we perform the test at the 5% level and decide to reject the null hypothesis, we say "there is significant evidence at the 5% level to suggest the null hypothesis is false".

One-Tailed Test

We choose a critical region. In a one-tailed test, the critical region will have just one part (the red area below). If our sample value lies in this region, we reject the null hypothesis in favour of the alternative.

Suppose we are looking for a definite decrease. Then the critical region will be to the left. Note, however, that in the one-tailed test the value of the parameter can be as high as you like.

Suppose we are given that X has a Poisson distribution and we want to carry out a hypothesis test on the mean, λ, based upon a sample observation of 3.

Suppose the hypotheses are: H 0 : λ = 9 H 1 : λ < 9

We want to test if it is "reasonable" for the observed value of 3 to have come from a Poisson distribution with parameter 9. So what is the probability that a value as low as 3 has come from a Po(9)?

P(X ≤ 3) = 0.0212 (this has come from a Poisson table)

The probability is less than 0.05, so if λ were 9 there would be less than a 5% chance of observing a value as low as 3. We therefore reject the null hypothesis in favour of the alternative at the 5% level.

However, the probability is greater than 0.01, so we would not reject the null hypothesis in favour of the alternative at the 1% level.
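The Poisson table value can be checked directly. This is a minimal sketch that sums the Poisson probabilities of observing 3 or fewer events when the mean is 9; the helper name `poisson_cdf` is an illustrative choice.

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson distribution with mean lam."""
    return sum(exp(-lam) * lam ** i / factorial(i) for i in range(k + 1))


p_value = poisson_cdf(3, 9)  # probability of a value as low as 3 under H0

print(round(p_value, 4))  # 0.0212
print(p_value < 0.05)     # True: reject H0 at the 5% level
print(p_value < 0.01)     # False: do not reject H0 at the 1% level
```

Because the alternative is a decrease, only the lower tail is used, and the whole significance level is compared against this single probability.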

Two-Tailed Test

In a two-tailed test, we are looking for either an increase or a decrease. So, for example, H 0 might be that the mean is equal to 9 (as before). This time, however, H 1 would be that the mean is not equal to 9. In this case, therefore, the critical region has two parts:

Let's test the parameter p of a Binomial distribution at the 10% level.

Suppose a coin is tossed 10 times and we get 7 heads. We want to test whether or not the coin is fair. If the coin is fair, p = 0.5 . Put this as the null hypothesis:

H 0 : p = 0.5 H 1 : p ≠ 0.5

Now, because the test is 2-tailed, the critical region has two parts. Half of the critical region is to the right and half is to the left. So the critical region contains both the top 5% of the distribution and the bottom 5% of the distribution (since we are testing at the 10% level).

If H 0 is true, X ~ Bin(10, 0.5).

If the null hypothesis is true, what is the probability that X is 7 or above? P(X ≥ 7) = 1 - P(X < 7) = 1 - P(X ≤ 6) = 1 - 0.8281 = 0.1719

Is this in the critical region? No- because the probability that X is at least 7 is not less than 0.05 (5%), which is what we need it to be.

So there is not significant evidence at the 10% level to reject the null hypothesis.
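The coin example can be reproduced with a short calculation. This sketch computes the probability of 7 or more heads in 10 tosses of a fair coin and compares it with half of the 10% significance level, as the two-tailed test requires; `binom_cdf` is an illustrative helper name.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


level = 0.10                           # overall significance level
upper_tail = 1 - binom_cdf(6, 10, 0.5)  # P(X >= 7) when the coin is fair

print(round(upper_tail, 4))    # 0.1719
print(upper_tail < level / 2)  # False: 7 heads is not in the critical region
```

Each tail of the critical region holds only 5% of the distribution, so the tail probability is compared with 0.05 rather than 0.10.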


Institute for Digital Research and Education

FAQ: What are the differences between one-tailed and two-tailed tests?

When you conduct a test of statistical significance, whether it is from a correlation, an ANOVA, a regression or some other kind of test, you are given a p-value somewhere in the output. If your test statistic is symmetrically distributed, you can select one of three alternative hypotheses. Two of these correspond to one-tailed tests and one corresponds to a two-tailed test. However, the p-value presented is (almost always) for a two-tailed test. But how do you choose which test? Is the p-value appropriate for your test? And, if it is not, how can you calculate the correct p-value for your test given the p-value in your output?

What is a two-tailed test?

First let’s start with the meaning of a two-tailed test. If you are using a significance level of 0.05, a two-tailed test allots half of your alpha to testing the statistical significance in one direction and half of your alpha to testing statistical significance in the other direction. This means that .025 is in each tail of the distribution of your test statistic. When using a two-tailed test, regardless of the direction of the relationship you hypothesize, you are testing for the possibility of the relationship in both directions. For example, we may wish to compare the mean of a sample to a given value x using a t-test. Our null hypothesis is that the mean is equal to x . A two-tailed test will test both if the mean is significantly greater than x and if the mean is significantly less than x . The mean is considered significantly different from x if the test statistic is in the top 2.5% or bottom 2.5% of its probability distribution, resulting in a p-value less than 0.05.

What is a one-tailed test?

Next, let’s discuss the meaning of a one-tailed test. If you are using a significance level of .05, a one-tailed test allots all of your alpha to testing the statistical significance in the one direction of interest. This means that .05 is in one tail of the distribution of your test statistic. When using a one-tailed test, you are testing for the possibility of the relationship in one direction and completely disregarding the possibility of a relationship in the other direction. Let’s return to our example comparing the mean of a sample to a given value x using a t-test. Our null hypothesis is that the mean is equal to x . A one-tailed test will test either if the mean is significantly greater than x or if the mean is significantly less than x , but not both. Then, depending on the chosen tail, the mean is significantly greater than or less than x if the test statistic is in the top 5% of its probability distribution or bottom 5% of its probability distribution, resulting in a p-value less than 0.05. The one-tailed test provides more power to detect an effect in one direction by not testing the effect in the other direction. A discussion of when this is an appropriate option follows.

When is a one-tailed test appropriate?

Because the one-tailed test provides more power to detect an effect, you may be tempted to use a one-tailed test whenever you have a hypothesis about the direction of an effect. Before doing so, consider the consequences of missing an effect in the other direction. Imagine you have developed a new drug that you believe is an improvement over an existing drug. You wish to maximize your ability to detect the improvement, so you opt for a one-tailed test. In doing so, you fail to test for the possibility that the new drug is less effective than the existing drug. The consequences in this example are extreme, but they illustrate a danger of inappropriate use of a one-tailed test.

So when is a one-tailed test appropriate? If you consider the consequences of missing an effect in the untested direction and conclude that they are negligible and in no way irresponsible or unethical, then you can proceed with a one-tailed test. For example, imagine again that you have developed a new drug. It is cheaper than the existing drug and, you believe, no less effective. In testing this drug, you are only interested in testing if it is less effective than the existing drug. You do not care if it is significantly more effective. You only wish to show that it is not less effective. In this scenario, a one-tailed test would be appropriate.

When is a one-tailed test NOT appropriate?

Choosing a one-tailed test for the sole purpose of attaining significance is not appropriate. Choosing a one-tailed test after running a two-tailed test that failed to reject the null hypothesis is not appropriate, no matter how "close" to significant the two-tailed test was. Using statistical tests inappropriately can lead to invalid results that are not replicable and highly questionable–a steep price to pay for a significance star in your results table!

Deriving a one-tailed test from two-tailed output

The default among statistical packages performing tests is to report two-tailed p-values. Because the most commonly used test statistic distributions (standard normal, Student’s t) are symmetric about zero, most one-tailed p-values can be derived from the two-tailed p-values.

Below, we have the output from a two-sample t-test in Stata. The test is comparing the mean male score to the mean female score. The null hypothesis is that the difference in means is zero. The two-sided alternative is that the difference in means is not zero. There are two one-sided alternatives that one could opt to test instead: that the male score is higher than the female score (diff > 0) or that the female score is higher than the male score (diff < 0). In this instance, Stata presents results for all three alternatives. Under the headings Ha: diff < 0 and Ha: diff > 0 are the results for the one-tailed tests. In the middle, under the heading Ha: diff != 0 (which means that the difference is not equal to 0), are the results for the two-tailed test.

Two-sample t test with equal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |      91    50.12088    1.080274    10.30516    47.97473    52.26703
  female |     109    54.99083    .7790686    8.133715    53.44658    56.53507
---------+--------------------------------------------------------------------
combined |     200      52.775    .6702372    9.478586    51.45332    54.09668
---------+--------------------------------------------------------------------
    diff |           -4.869947    1.304191               -7.441835   -2.298059
------------------------------------------------------------------------------
Degrees of freedom: 198

                  Ho: mean(male) - mean(female) = diff = 0

     Ha: diff < 0               Ha: diff != 0              Ha: diff > 0
       t =  -3.7341                t =  -3.7341              t =  -3.7341
   P < t =   0.0001          P > |t| =   0.0002          P > t =   0.9999

Note that the test statistic, -3.7341, is the same for all of these tests. The two-tailed p-value is P > |t|. This can be rewritten as P(>3.7341) + P(< -3.7341). Because the t-distribution is symmetric about zero, these two probabilities are equal: P > |t| = 2 * P(< -3.7341). Thus, we can see that the two-tailed p-value is twice the one-tailed p-value for the alternative hypothesis that (diff < 0). The other one-tailed alternative hypothesis has a p-value of P(> -3.7341) = 1 - P(< -3.7341) = 1 - 0.0001 = 0.9999. So, depending on the direction of the one-tailed hypothesis, its p-value is either 0.5*(two-tailed p-value) or 1 - 0.5*(two-tailed p-value), provided the test statistic is symmetrically distributed about zero.

In this example, the two-tailed p-value suggests rejecting the null hypothesis of no difference. Had we opted for the one-tailed test of (diff > 0), we would fail to reject the null because of our choice of tails.
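As a cross-check on the Stata output, the test statistic can be recomputed from the group summaries alone. The sketch below assumes the standard pooled-variance (equal variances) two-sample t formula, which matches the heading of the output; the function name `pooled_t` is illustrative.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic assuming equal variances.

    Returns (t statistic, standard error of the difference, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, se, df


# Means, standard deviations and group sizes from the output above
t_stat, se, df = pooled_t(50.12088, 10.30516, 91, 54.99083, 8.133715, 109)

print(df)                # 198
print(round(se, 4))      # 1.3042
print(round(t_stat, 3))  # -3.734
```

The single t value is shared by all three alternatives; only the tail area used for the p-value changes.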

The output below is from a regression analysis in Stata. Unlike the example above, only the two-sided p-values are presented in this output.

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   46.58
       Model |  7363.62077     2  3681.81039           Prob > F      =  0.0000
    Residual |  15572.5742   197  79.0486001           R-squared     =  0.3210
-------------+------------------------------           Adj R-squared =  0.3142
       Total |   22936.195   199  115.257261           Root MSE      =  8.8909

------------------------------------------------------------------------------
       socst |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     science |   .2191144   .0820323     2.67   0.008     .0573403    .3808885
        math |   .4778911   .0866945     5.51   0.000     .3069228    .6488594
       _cons |   15.88534   3.850786     4.13   0.000     8.291287    23.47939
------------------------------------------------------------------------------

For each regression coefficient, the tested null hypothesis is that the coefficient is equal to zero. Thus, the one-tailed alternatives are that the coefficient is greater than zero and that the coefficient is less than zero. To get the p-value for the one-tailed test of the variable science having a coefficient greater than zero, you would divide the .008 by 2, yielding .004 because the effect is going in the predicted direction. This is P(>2.67). If you had made your prediction in the other direction (the opposite direction of the model effect), the p-value would have been 1 – .004 = .996. This is P(<2.67). For all three p-values, the test statistic is 2.67.
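The conversion from a two-tailed p-value to a one-tailed one can be written as a small function. This is a sketch that assumes the test statistic is symmetrically distributed about zero, as stated above; the name `one_tailed_p` and the labels "greater" and "less" are illustrative choices.

```python
def one_tailed_p(two_tailed_p, statistic, alternative):
    """Derive a one-tailed p-value from a two-tailed one.

    Assumes a test statistic distributed symmetrically about zero.
    alternative is "greater" or "less" (labels assumed for this sketch).
    """
    half = two_tailed_p / 2
    if alternative == "greater":
        # The effect is in the predicted direction only if the statistic > 0.
        return half if statistic > 0 else 1 - half
    if alternative == "less":
        return half if statistic < 0 else 1 - half
    raise ValueError("alternative must be 'greater' or 'less'")


# t-test example: t = -3.7341, two-tailed p = 0.0002
print(round(one_tailed_p(0.0002, -3.7341, "less"), 4))     # 0.0001
print(round(one_tailed_p(0.0002, -3.7341, "greater"), 4))  # 0.9999

# Regression example: science coefficient, t = 2.67, two-tailed p = .008
print(round(one_tailed_p(0.008, 2.67, "greater"), 3))      # 0.004
print(round(one_tailed_p(0.008, 2.67, "less"), 3))         # 0.996
```

The direction must be chosen before seeing the results; picking whichever tail gives the smaller p-value after the fact is exactly the misuse warned about earlier.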


One- and Two-Tailed Tests
Quiz: Two-Sample z-test for Comparing Two Means
Two Sample t test for Comparing Two Means
Quiz: Two-Sample t-test for Comparing Two Means
Paired Difference t-test
Quiz: Paired Difference t-test
Test for a Single Population Proportion
Quiz: Test for a Single Population Proportion
Test for Comparing Two Proportions
Quiz: Test for Comparing Two Proportions
Quiz: Simple Linear Regression
Chi-Square (X2)
Quiz: Chi-Square (X2)
Correlation
Quiz: Correlation
Simple Linear Regression
Common Mistakes
Statistics Tables
Quiz: Cumulative Review A
Quiz: Cumulative Review B
Statistics Quizzes

In the previous example, you tested a research hypothesis that predicted not only that the sample mean would be different from the population mean but that it would be different in a specific direction—it would be lower. This test is called a directional or one‐tailed test because the region of rejection is entirely within one tail of the distribution.

Some hypotheses predict only that one value will be different from another, without additionally predicting which will be higher. The test of such a hypothesis is nondirectional or two‐tailed because an extreme test statistic in either tail of the distribution (positive or negative) will lead to the rejection of the null hypothesis of no difference.

Suppose that you suspect that a particular class's performance on a proficiency test is not representative of those people who have taken the test. The national mean score on the test is 74.

The research hypothesis is:

The mean score of the class on the test is not 74.

Or in notation: Hₐ: μ ≠ 74

The null hypothesis is:

The mean score of the class on the test is 74.

In notation: H₀: μ = 74

As in the last example, you decide to use a 5 percent probability level for the test. Both tests have a region of rejection, then, of 5 percent, or 0.05. In this example, however, the rejection region must be split between both tails of the distribution (0.025 in the upper tail and 0.025 in the lower tail) because your hypothesis specifies only a difference, not a direction, as shown in Figure 1(a). You will reject the null hypothesis of no difference if the class sample mean is either much higher or much lower than the population mean of 74. In the previous example, only a sample mean much lower than the population mean would have led to the rejection of the null hypothesis.

Figure 1. Comparison of (a) a two‐tailed test and (b) a one‐tailed test, at the same probability level (95 percent).

The decision of whether to use a one‐ or a two‐tailed test is important because a test statistic that falls in the region of rejection in a one‐tailed test may not do so in a two‐tailed test, even though both tests use the same probability level. Suppose the class sample mean in your example was 77, and its corresponding z‐score was computed to be 1.80. Table 2 in "Statistics Tables" shows the critical z‐scores for a probability of 0.025 in either tail to be –1.96 and 1.96. In order to reject the null hypothesis, the test statistic must be either smaller than –1.96 or greater than 1.96. It is not, so you cannot reject the null hypothesis. Refer to Figure 1(a).

Suppose, however, you had a reason to expect that the class would perform better on the proficiency test than the population, and you did a one‐tailed test instead. For this test, the rejection region of 0.05 would be entirely within the upper tail. The critical z‐value for a probability of 0.05 in the upper tail is 1.65. (Remember that Table 2 in "Statistics Tables" gives areas of the curve below z, so you look up the z‐value for a probability of 0.95.) Your computed test statistic of z = 1.80 exceeds the critical value and falls in the region of rejection, so you reject the null hypothesis and say that your suspicion that the class was better than the population was supported. See Figure 1(b).
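The two decisions can be reproduced without a printed table. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative assumptions. The unrounded one-tailed critical value is about 1.645, which tables commonly round to 1.65.

```python
from statistics import NormalDist

alpha = 0.05
z_observed = 1.80  # z-score of the class mean of 77 from the example

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Two-tailed test: split alpha across both tails, 0.025 in each.
two_tailed_critical = std_normal.inv_cdf(1 - alpha / 2)
reject_two_tailed = abs(z_observed) > two_tailed_critical

# One-tailed (upper) test: place the whole 0.05 in the upper tail.
one_tailed_critical = std_normal.inv_cdf(1 - alpha)
reject_one_tailed = z_observed > one_tailed_critical

print(f"two-tailed critical z = ±{two_tailed_critical:.3f}, reject: {reject_two_tailed}")
print(f"one-tailed critical z = {one_tailed_critical:.3f}, reject: {reject_one_tailed}")
```

With the same z of 1.80 and the same 0.05 level, the two-tailed test fails to reject while the one-tailed test rejects, which is the contrast the example describes.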

In practice, you should use a one‐tailed test only when you have good reason to expect that the difference will be in a particular direction. A two‐tailed test is more conservative than a one‐tailed test because a two‐tailed test takes a more extreme test statistic to reject the null hypothesis.


The Ultimate Guide to T Tests

Get all of your t test questions answered here


The t test is one of the simplest statistical techniques that is used to evaluate whether there is a statistical difference between the means from up to two different samples. The t test is especially useful when you have a small number of sample observations (under 30 or so), and you want to make conclusions about the larger population.

The characteristics of the data dictate the appropriate type of t test to run. All t tests are used as standalone analyses for very simple experiments and research questions as well as to perform individual tests within more complicated statistical models such as linear regression. In this guide, we’ll lay out everything you need to know about t tests, including providing a simple workflow to determine what t test is appropriate for your particular data or if you’d be better suited using a different model.

What is a t test?

A t test is a statistical technique used to quantify the difference between the mean (average value) of a variable from up to two samples (datasets). The variable must be numeric. Some examples are height, gross income, and amount of weight lost on a particular diet.

A t test tells you if the difference you observe is “surprising” based on the expected difference. It uses a t-distribution to evaluate the expected variability. When you have a reasonably sized sample (over 30 or so observations), the t test can still be used, but tests that use the normal distribution (such as the z test) can be used in its place.

Sometimes t tests are called “Student’s” t tests, which is simply a reference to their unusual history.

[Image: Barrels at the Guinness Brewery museum in Dublin, Ireland. Credit: sebastiangora (Adobe Stock)]

It got its name because a brewer from the Guinness Brewery, William Gosset, published about the method under the pseudonym "Student". He wanted to get information out of very small sample sizes (often 3-5) because it took so much effort to brew each keg for his samples.

When should I use a t test?

A t test is appropriate to use when you’ve collected a small, random sample from some statistical “population” and want to compare the mean from your sample to another value. The value for comparison could be a fixed value (e.g., 10) or the mean of a second sample.

For example, if your variable of interest is the average height of sixth graders in your region, then you might measure the height of 25 or 30 randomly-selected sixth graders. A t test could be used to answer questions such as, “Is the average height greater than four feet?”

How does a t test work?

T tests make enough assumptions about your experiment to calculate an expected variability, and then they use that to determine whether the observed data are statistically significant. To do this, t tests rely on an assumed “null hypothesis.” With the above example, the null hypothesis is that the average height is less than or equal to four feet.

Say that we measure the height of 5 randomly selected sixth graders and the average height is five feet. Does that mean that the “true” average height of all sixth graders is greater than four feet or did we randomly happen to measure taller than average students?

To evaluate this, we need a distribution that shows every possible average value resulting from a sample of five individuals in a population where the true mean is four. That may seem impossible to do, which is why there are particular assumptions that need to be made to perform a t test.

With those assumptions, then all that’s needed to determine the “sampling distribution of the mean” is the sample size (5 students in this case) and standard deviation of the data (let’s say it’s 1 foot).

That’s enough to create a graphic of the distribution of the mean, which is:

Notice the vertical line at x = 5, which was our sample mean. We (use software to) calculate the area to the right of the vertical line, which gives us the P value (0.09 in this case). Note that because our research question was asking if the average student is taller than four feet, the distribution is centered at four. Since we’re only interested in knowing if the average is greater than four feet, we use a one-tailed test in this case.

Using the standard significance level of 0.05 with this example, we don’t have evidence that the true average height of sixth graders is greater than 4 feet.

What are the assumptions for t tests?

One variable of interest: This is not correlation or regression, where you are interested in the relationship between multiple variables. With a t test, you can have different samples, but they are all measuring the same variable (e.g., height).
Numeric data: You are dealing with a list of measurements that can be averaged. This means you aren’t just counting occurrences in various categories (e.g., eye color or political affiliation).
Two groups or fewer: If you have more than two samples of data, a t test is the wrong technique. You most likely need to try ANOVA.
Random sample: You need a random sample from your statistical “population of interest” in order to draw valid conclusions about the larger population. If your population is so small that you can measure everything, then you have a “census” and don’t need statistics. This is because you don’t need to estimate the truth, since you have measured the truth without variability.
Normally distributed: The smaller your sample size, the more important it is that your data come from a normal (Gaussian, bell-shaped) distribution. If you have reason to believe that your data are not normally distributed, consider nonparametric t test alternatives. This matters less for larger samples (usually 25 or 30 or more, unless the data are heavily skewed). The reason is the Central Limit Theorem, which says that even if the distribution of your data is not normal, the distribution of the sample mean is approximately normal when the sample is large enough, so you can use a z-test rather than a t test.

How do I know which t test to use?

There are many types of t tests to choose from, but you don’t necessarily have to understand every detail behind each option.

You just need to be able to answer a few questions, which will lead you to pick the right t test. To that end, we put together this workflow for you to figure out which test is appropriate for your data.

Do you have one or two samples?

Are you comparing the means of two different samples, or comparing the mean from one sample to a fixed value? An example research question is, “Is the average height of my sample of sixth grade students greater than four feet?”

If you only have one sample of data, you need a one-sample t test. Otherwise, your next step is to ask:

Are observations in the two samples matched up or related in some way?

This could be before-and-after measurements of the same exact subjects, or perhaps your study split up “pairs” of subjects (who are technically different but share certain characteristics of interest) into the two samples. The same variable is measured in both cases.

If so, you are looking at some kind of paired samples t test. Within that family, the choice is between a difference test (most common) and a ratio test.

If you aren’t sure paired is right, ask yourself another question:

Are you comparing different observations in each of the two samples?

If the answer is yes, then you have an unpaired or independent samples t test. The two samples should measure the same variable (e.g., height), but are samples from two distinct groups (e.g., team A and team B).

The goal is to compare the means to see if the groups are significantly different. For example, “Is the average height of team A greater than team B?” Unlike paired, the only relationship between the groups in this case is that we measured the same variable for both. There are two versions of unpaired samples t tests (pooled and unpooled) depending on whether you assume the same variance for each sample.

Have you run the same experiment multiple times on the same subject/observational unit?

If so, then you have a nested t test (unless you have more than two sample groups). This is a trickier concept to understand. One example is if you are measuring how well Fertilizer A works against Fertilizer B. Let’s say you have 12 pots to grow plants in (6 pots for each fertilizer), and you grow 3 plants in each pot.

In this case you have 6 observational units for each fertilizer, with 3 subsamples from each pot. You would want to analyze this with a nested t test. The “nested” factor in this case is the pots. It’s important to note that we aren’t interested in estimating the variability within each pot, we just want to take it into account.

You might be tempted to run an unpaired samples t test here, but that assumes you have 6*3 = 18 replicates for each fertilizer. However, the three replicates within each pot are related, and an unpaired samples t test wouldn’t take that into account.

What if none of these sound like my experiment?

If you’re not seeing your research question above, note that t tests are very basic statistical tools. Many experiments require more sophisticated techniques to evaluate differences. If the variable of interest is a proportion (e.g., 10 of 100 manufactured products were defective), then you’d use z-tests. If you take before and after measurements and have more than one treatment (e.g., control vs a treatment diet), then you need ANOVA.

How do I perform a t test using software?

If you’re wondering how to do a t test, the easiest way is with statistical software such as Prism or an online t test calculator.

If you’re using software, then all you need to know is which t test is appropriate (use the workflow above) and how to interpret the output. To do that, you’ll also need to:

Determine whether your test is one or two-tailed
Choose the level of significance

Is my test one or two-tailed?

Whether or not you have a one- or two-tailed test depends on your research hypothesis. Choosing the appropriately tailed test is very important and requires integrity from the researcher. This is because you have more “power” with one-tailed tests, meaning that you can detect a statistically significant difference more easily. Unless you have written out your research hypothesis as one directional before you run your experiment, you should use a two-tailed test.

Two-tailed tests

Two-tailed tests are the most common, and they are applicable when your research question is simply asking, “is there a difference?”

One-tailed tests

Contrast that with one-tailed tests, where the research questions are directional, meaning that either the question is, “is it greater than” or the question is, “is it less than”. These tests can only detect a difference in one direction.

Choosing the level of significance

All t tests estimate whether a mean of a population is different than some other value, and with all estimates come some variability, or what statisticians call “error.” Before analyzing your data, you want to choose a level of significance, usually denoted by the Greek letter alpha, 𝛼. The scientific standard is setting alpha to be 0.05.

An alpha of 0.05 results in 95% confidence intervals, and determines the cutoff for when P values are considered statistically significant.

One sample t test

If you only have one sample of a list of numbers, you are doing a one-sample t test. All you are interested in doing is comparing the mean from this group with some known value to test whether there is evidence that it is significantly different from that standard. Use our free one-sample t test calculator for this.

A one sample t test example research question is, “Is the average fifth grader taller than four feet?”

It is the simplest version of a t test, and has all sorts of applications within hypothesis testing. Sometimes the “known value” is called the “null value”. While the null value in t tests is often 0, it could be any value. The name comes from being the value which exactly represents the null hypothesis, where no significant difference exists.

Any time you know the exact number you are trying to compare your sample of data against, this could work well. And of course: it can be either one or two-tailed.

One sample t test formula

Statistical software handles this for you, but if you want the details, the formula for a one sample t test is:

$$t = \frac{M - \mu}{s / \sqrt{n}}$$

M: Calculated mean of your sample
μ: Hypothetical mean you are testing against
s: The standard deviation of your sample
n: The number of observations in your sample.

In a one-sample t test, calculating degrees of freedom is simple: one less than the number of objects in your dataset (you’ll see it written as n-1 ).
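The calculation defined by M, μ, s, and n above can be sketched with Python's standard library. This is an illustrative sketch, not Prism's implementation; the function name `one_sample_t` and the data values are assumptions made up for the example.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu):
    """Return (t, degrees of freedom) for a one-sample t test against mu."""
    n = len(sample)
    m = mean(sample)    # M: calculated mean of the sample
    s = stdev(sample)   # s: sample standard deviation (n - 1 denominator)
    t = (m - mu) / (s / sqrt(n))
    return t, n - 1     # degrees of freedom are n - 1


# Hypothetical "% of control" values, made up for illustration only.
heights = [96, 104, 110, 99, 123, 108]
t_stat, df = one_sample_t(heights, mu=100)
print(f"t = {t_stat:.2f}, df = {df}")
```

The t value and degrees of freedom are then looked up against a t-distribution (by software or a table) to obtain a P value.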

Example of a one sample t test

For our example within Prism, we have a dataset of 12 values from an experiment labeled “% of control”. Perhaps these are heights of a sample of plants that have been treated with a new fertilizer. A value of 100 represents the industry-standard control height. Likewise, 123 represents a plant with a height 123% that of the control (that is, 23% larger).

We’ll perform a two-tailed, one-sample t test to see if plants are shorter or taller on average with the fertilizer. We will use a significance threshold of 0.05. Here is the output:

You can see in the output that the actual sample mean was 111. Is that different enough from the industry standard (100) to conclude that there is a statistical difference?

The quick answer is yes, there’s strong evidence that the height of the plants with the fertilizer is greater than the industry standard (p=0.015). The nice thing about using software is that it handles some of the trickier steps for you. In this case, it calculates your test statistic (t=2.88), determines the appropriate degrees of freedom (11), and outputs a P value.

More informative than the P value is the confidence interval of the difference, which is 2.49 to 18.7. The confidence interval tells us that, based on our data, we are 95% confident that the true difference between the population mean and the baseline value of 100 is somewhere between 2.49 and 18.7. As long as the difference is statistically significant, the interval will not contain zero.
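To see where a P value like the one in this example comes from, here is a rough sketch that integrates the t-distribution density numerically using only the standard library. It is not how Prism computes P values; the helper names `t_pdf` and `two_tailed_p` and the choice of Simpson's rule are assumptions made for illustration.

```python
from math import exp, lgamma, log, pi


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def two_tailed_p(t, df, steps=10_000):
    """Two-tailed P value: total area in both tails beyond |t|."""
    t = abs(t)
    if t == 0:
        return 1.0
    # Simpson's rule (steps must be even) for the area between 0 and |t|.
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # The density is symmetric around 0, so each tail holds 0.5 - area.
    return 2 * (0.5 - area)


# Test statistic and degrees of freedom reported in the example above.
print(round(two_tailed_p(2.88, 11), 3))
```

The result should land close to the reported p=0.015. For a one-tailed test with the effect in the predicted direction, the corresponding P value is half of this.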

You can follow these tips for interpreting your own one-sample test.

Graphing a one-sample t test

For some techniques (like regression), graphing the data is a very helpful part of the analysis. For t tests, making a chart of your data is still useful to spot any strange patterns or outliers, but the small sample size means you may already be familiar with any strange things in your data.

Here we have a simple plot of the data points, perhaps with a mark for the average. We’ve made this as an example, but the truth is that graphing is usually more visually telling for two-sample t tests than for just one sample.

Two sample t tests

There are several kinds of two sample t tests, with the two main categories being paired and unpaired (independent) samples.

Paired samples t test

In a paired samples t test, also called dependent samples t test, there are two samples of data, and each observation in one sample is “paired” with an observation in the second sample. The most common example is when measurements are taken on each subject before and after a treatment. A paired t test example research question is, “Is there a statistical difference between the average red blood cell counts before and after a treatment?”

Having two samples that are closely related simplifies the analysis. Statistical software, such as this paired t test calculator, will simply take a difference between the two values, and then compare that difference to 0.

In some (rare) situations, taking a difference between the pairs violates the assumptions of a t test, because the average difference changes based on the size of the before value (e.g., there’s a larger difference between before and after when there were more to start with). In this case, instead of using a difference test, use a ratio of the before and after values, which is referred to as ratio t tests.

Paired t test formula

The formula for paired samples t test is:

$$t = \frac{M_d - 0}{s_d / \sqrt{n}}$$

Md: Mean difference between the samples
sd: The standard deviation of the differences
n: The number of differences

Degrees of freedom are the same as before. If you’re studying for an exam, you can remember that the degrees of freedom are still n-1 (not n-2) because we are converting the data into a single column of differences rather than considering the two groups independently.

Also note that the null value here is simply 0. There is no real reason to include “minus 0” in an equation other than to illustrate that we are still doing a hypothesis test. After you take the mean of the paired differences, you are comparing that mean to 0.

For our example data, we have five test subjects and have taken two measurements from each: before (“control”) and after a treatment (“treated”). If we set alpha = 0.05 and perform a two-tailed test, we observe a statistically significant difference between the treated and control group (p=0.0160, t=4.01, df = 4). We are 95% confident that the true mean difference between the treated and control group is between 0.449 and 2.47.
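The paired calculation reduces to a one-sample test on the column of differences, which can be sketched as follows. The function name `paired_t` and the measurements are hypothetical; they are not the data behind the Prism output described above.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, df) for a paired t test on matched before/after values."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    md = mean(diffs)    # Md: mean difference between the samples
    sd = stdev(diffs)   # sd: standard deviation of the differences
    t = (md - 0) / (sd / sqrt(n))  # the null value is 0
    return t, n - 1     # n - 1, because we work with one column of differences


# Hypothetical measurements for five subjects, made up for illustration.
control = [4.2, 5.1, 3.8, 4.9, 4.4]
treated = [5.6, 6.0, 5.9, 6.1, 5.5]
t_stat, df = paired_t(control, treated)
print(f"t = {t_stat:.2f}, df = {df}")
```

With five subjects the degrees of freedom come out to 4, matching the df reported for the example.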

Graphing a paired t test

The significant result of the P value suggests evidence that the treatment had some effect, and we can also look at this graphically. The lines that connect the observations can help us spot a pattern, if it exists. In this case the lines show that all observations increased after treatment. While not all graphics are this straightforward, here it is very consistent with the outcome of the t test.

Prism’s estimation plot is even more helpful because it shows both the data (like above) and the confidence interval for the difference between means. You can easily see the evidence of significance since the confidence interval on the right does not contain zero.

Here are some more graphing tips for paired t tests.

Unpaired samples t test

Unpaired samples t test, also called independent samples t test, is appropriate when you have two sample groups that aren’t correlated with one another. A pharma example is testing a treatment group against a control group of different subjects. Compare that with a paired sample, which might be recording the same subjects before and after a treatment.

With unpaired t tests, in addition to choosing your level of significance and a one- or two-tailed test, you need to decide whether to assume that the variances of the two groups are the same. If you assume equal variances, then you can “pool” the calculation of the standard error between the two samples. Otherwise, the standard choice is Welch’s t test, which corrects for unequal variances. This choice affects the calculation of the test statistic and the power of the test, which is the test’s sensitivity to detect statistical significance.

It’s best to choose whether you’ll use a pooled or unpooled (Welch’s) standard error before running your experiment, because the standard statistical test for equal variances is notoriously problematic.

As long as you’re using statistical software, such as this two-sample t test calculator, it’s just as easy to calculate a test statistic whether or not you assume that the variances of your two samples are the same. If you’re doing it by hand, however, the calculations get more complicated with unequal variances.

Unpaired (independent) samples t test formula

The general two-sample t test formula is:

$$t = \frac{M_1 - M_2}{SE}$$

M1 and M2: Two means you are comparing, one from each dataset
SE : The combined standard error of the two samples (calculated using pooled or unpooled standard error)

The denominator (standard error) calculation can be complicated, as can the degrees of freedom. If the groups are not balanced (that is, they do not have the same number of observations), you will need to account for both sample sizes when determining n for the test as a whole.
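To make the pooled versus unpooled distinction concrete, here is a sketch of both standard error calculations using the textbook pooled-variance and Welch-Satterthwaite formulas. The function name `unpaired_t` and the data are assumptions for illustration, not the example analyzed in Prism.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(sample1, sample2, pooled=True):
    """Return (t, df) for an unpaired t test, pooled or Welch (unpooled)."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = variance(sample1), variance(sample2)  # sample variances
    if pooled:
        # Pooled: one shared variance estimate, weighted by degrees of freedom.
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = sqrt(sp2 * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Welch: separate variances, Welch-Satterthwaite degrees of freedom.
        a, b = v1 / n1, v2 / n2
        se = sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    t = (mean(sample1) - mean(sample2)) / se
    return t, df


# Hypothetical unbalanced groups (6 and 5 observations), made up for illustration.
group_a = [52, 61, 48, 70, 66, 59]
group_b = [45, 58, 50, 41, 63]
print(unpaired_t(group_a, group_b, pooled=True))
print(unpaired_t(group_a, group_b, pooled=False))
```

With 6 and 5 observations, the pooled version has 6 + 5 - 2 = 9 degrees of freedom, while the Welch version usually yields a non-integer value.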

As an example for this family, we conduct an unpaired samples t test assuming equal variances (pooled). Based on our research hypothesis, we’ll conduct a two-tailed test, and use alpha=0.05 for our level of significance. Our samples were unbalanced, with 6 and 5 observations respectively.

The P value (p=0.261, t = 1.20, df = 9) is higher than our threshold of 0.05. We have not found sufficient evidence to suggest a significant difference. You can see the confidence interval of the difference of the means is -9.58 to 31.2.

Note that the F-test result shows that the variances of the two groups are not significantly different from each other.

Graphing an unpaired samples t test

For an unpaired samples t test, graphing the data can quickly help you get a handle on the two groups and how similar or different they are. Like the paired example, this helps confirm the evidence (or lack thereof) that is found by doing the t test itself.

Below you can see that the observed mean for females is higher than that for males. But because of the variability in the data, we can’t tell if the means are actually different or if the difference is just by chance.

Nonparametric alternatives for t tests

If your data comes from a normal distribution (or something close enough to a normal distribution), then a t test is valid. If that assumption is violated, you can use nonparametric alternatives.

T tests evaluate whether the mean is different from another value, whereas nonparametric alternatives compare either the median or the rank. Medians are well-known to be much more robust to outliers than the mean.

The downside to nonparametric tests is that they don’t have as much statistical power, meaning a larger difference is required in order to determine that it’s statistically significant.

Wilcoxon signed-rank test

The Wilcoxon signed-rank test is the nonparametric cousin to the one-sample t test. This compares a sample median to a hypothetical median value. It is sometimes even erroneously called the Wilcoxon t test (even though it calculates a “W” statistic).

And if you have two related samples, you should use the Wilcoxon matched pairs test instead. The two versions of Wilcoxon are different, and the matched pairs version is specifically for comparing the median difference for paired samples.

Mann-Whitney and Kolmogorov-Smirnov tests

For unpaired (independent) samples, there are multiple options for nonparametric testing. Mann-Whitney is more popular and compares the mean ranks (the ordering of values from smallest to largest) of the two samples. Mann-Whitney is often misrepresented as a comparison of medians, but that’s not always the case. Kolmogorov-Smirnov tests if the overall distributions differ between the two samples.

More t test FAQs

What is the formula for a t test?

The exact formula depends on which type of t test you are running, although there is a basic structure that all t tests have in common. All t test statistics will have the form:

$$t = \frac{\text{Mean}_1 - \text{Mean}_2}{\text{Standard Error of the Mean}}$$

t: The t test statistic you calculate for your test
Mean1 and Mean2: Two means you are comparing, at least 1 from your own dataset
Standard Error of the Mean: The standard error of the mean, also called the standard deviation of the mean, which takes into account the variance and size of your dataset

The exact formula for any t test can be slightly different, particularly the calculation of the standard error. Not only does it matter whether one or two samples are being compared, the relationship between the samples can make a difference too.

What is a t-distribution?

A t-distribution is similar to a normal distribution. It’s a bell-shaped curve, but compared to a normal it has fatter tails, which means that it’s more common to observe extremes. T-distributions are identified by the number of degrees of freedom. The higher the number, the closer the t-distribution gets to a normal distribution. After about 30 degrees of freedom, a t and a standard normal are practically the same.

What are degrees of freedom?

Degrees of freedom are a measure of how large your dataset is. They aren’t exactly the number of observations, because they also take into account the number of parameters (e.g., mean, variance) that you have estimated.

What is the difference between paired vs unpaired t tests?

Both paired and unpaired t tests involve two sample groups of data. With a paired t test, the values in each group are related (usually they are before and after values measured on the same test subject). In contrast, with unpaired t tests, the observed values aren’t related between groups. An unpaired, or independent t test, example is comparing the average height of children at school A vs school B.

When do I use a z-test versus a t test?

Z-tests, which compare data using a normal distribution rather than a t-distribution, are primarily used for two situations. The first is when you’re evaluating proportions (number of failures on an assembly line). The second is when your sample size is large enough (usually around 30) that you can use a normal approximation to evaluate the means.

When should I use ANOVA instead of a t test?

Use ANOVA if you have more than two group means to compare.

What are the differences between t test vs chi square?

Chi square tests are used to evaluate contingency tables , which record a count of the number of subjects that fall into particular categories (e.g., truck, SUV, car). t tests compare the mean(s) of a variable of interest (e.g., height, weight).

What are P values?

P values are the probability that you would get data as or more extreme than the observed data given that the null hypothesis is true. It’s a mouthful, and there are a lot of issues to be aware of with P values.

What are t test critical values?

Critical values are a classical form (they aren’t used directly with modern computing) of determining if a statistical test is significant or not. Historically you could calculate your test statistic from your data, and then use a t-table to look up the cutoff value (critical value) that represented a “significant” result. You would then compare your observed statistic against the critical value.

How do I calculate degrees of freedom for my t test?

In most practical usage, degrees of freedom are the number of observations you have minus the number of parameters you are trying to estimate. The calculation isn’t always straightforward and is approximated for some t tests.

Statistical software calculates degrees of freedom automatically as part of the analysis, so understanding them in more detail isn’t needed beyond assuaging any curiosity.



S.3.1 Hypothesis Testing (Critical Value Approach)

The critical value approach involves determining "likely" or "unlikely" by determining whether or not the observed test statistic is more extreme than would be expected if the null hypothesis were true. That is, it entails comparing the observed test statistic to some cutoff value, called the "critical value." If the test statistic is more extreme than the critical value, then the null hypothesis is rejected in favor of the alternative hypothesis. If the test statistic is not as extreme as the critical value, then the null hypothesis is not rejected.

Specifically, the four steps involved in using the critical value approach to conducting any hypothesis test are:

Specify the null and alternative hypotheses.
Using the sample data and assuming the null hypothesis is true, calculate the value of the test statistic. To conduct the hypothesis test for the population mean $\mu$, we use the t-statistic $t^*=\frac{\bar{x}-\mu}{s/\sqrt{n}}$, which follows a t-distribution with n − 1 degrees of freedom.
Determine the critical value by finding the value of the known distribution of the test statistic such that the probability of making a Type I error, which is denoted $\alpha$ (Greek letter "alpha") and is called the "significance level of the test", is small (typically 0.01, 0.05, or 0.10).
Compare the test statistic to the critical value. If the test statistic is more extreme in the direction of the alternative than the critical value, reject the null hypothesis in favor of the alternative hypothesis. If the test statistic is less extreme than the critical value, do not reject the null hypothesis.

Example S.3.1.1

Mean GPA

In our example concerning the mean grade point average, suppose we take a random sample of n = 15 students majoring in mathematics. Since n = 15, our test statistic $t^*$ has n − 1 = 14 degrees of freedom. Also, suppose we set our significance level $\alpha$ at 0.05 so that we have only a 5% chance of making a Type I error.

Right-Tailed

The critical value for conducting the right-tailed test $H_0: \mu = 3$ versus $H_A: \mu > 3$ is the t-value, denoted $t_{\alpha, n-1}$, such that the probability to the right of it is $\alpha$. It can be shown using either statistical software or a t-table that the critical value $t_{0.05, 14}$ is 1.7613. That is, we would reject the null hypothesis $H_0: \mu = 3$ in favor of the alternative hypothesis $H_A: \mu > 3$ if the test statistic $t^*$ is greater than 1.7613. Visually, the rejection region is shaded red in the graph.

t distribution graph for a t value of 1.76131

Left-Tailed

The critical value for conducting the left-tailed test $H_0: \mu = 3$ versus $H_A: \mu < 3$ is the t-value, denoted $-t_{\alpha, n-1}$, such that the probability to the left of it is $\alpha$. It can be shown using either statistical software or a t-table that the critical value $-t_{0.05, 14}$ is -1.7613. That is, we would reject the null hypothesis $H_0: \mu = 3$ in favor of the alternative hypothesis $H_A: \mu < 3$ if the test statistic $t^*$ is less than -1.7613. Visually, the rejection region is shaded red in the graph.

Two-Tailed

There are two critical values for the two-tailed test $H_0: \mu = 3$ versus $H_A: \mu \neq 3$: one for the left tail, denoted $-t_{\alpha/2, n-1}$, and one for the right tail, denoted $t_{\alpha/2, n-1}$. The value $-t_{\alpha/2, n-1}$ is the t-value such that the probability to the left of it is $\alpha/2$, and the value $t_{\alpha/2, n-1}$ is the t-value such that the probability to the right of it is $\alpha/2$. It can be shown using either statistical software or a t-table that the critical value $-t_{0.025, 14}$ is -2.1448 and the critical value $t_{0.025, 14}$ is 2.1448. That is, we would reject the null hypothesis $H_0: \mu = 3$ in favor of the alternative hypothesis $H_A: \mu \neq 3$ if the test statistic $t^*$ is less than -2.1448 or greater than 2.1448. Visually, the rejection region is shaded red in the graph.

t distribution graph for a two tailed test of 0.05 level of significance
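
The three decision rules can be sketched in a few lines of Python. This is only an illustration, not part of the original example: the function name `reject_null` and the test statistic value 1.9 are hypothetical, while the critical values are the ones quoted above for $\alpha$ = 0.05 and 14 degrees of freedom.

```python
# Critical values quoted in the text for alpha = 0.05 and 14 degrees of freedom.
T_ONE_TAILED = 1.7613   # t_{0.05, 14}
T_TWO_TAILED = 2.1448   # t_{0.025, 14}


def reject_null(t_star: float, tail: str) -> bool:
    """Return True if t_star falls in the rejection region for the given tail."""
    if tail == "right":
        return t_star > T_ONE_TAILED
    if tail == "left":
        return t_star < -T_ONE_TAILED
    if tail == "two":
        return abs(t_star) > T_TWO_TAILED
    raise ValueError("tail must be 'right', 'left', or 'two'")


# Hypothetical test statistic, chosen only to show that the tests can disagree.
t_star = 1.9
for tail in ("right", "left", "two"):
    print(tail, reject_null(t_star, tail))
```

With this hypothetical value, the right-tailed test rejects the null hypothesis while the two-tailed test does not, because the two-tailed test splits $\alpha$ between both tails and therefore needs a more extreme statistic.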

t-test Calculator


Welcome to our t-test calculator! Here you can easily perform not only one-sample t-tests, but also two-sample t-tests and paired t-tests.

Do you prefer to find the p-value from t-test, or would you rather find the t-test critical values? Well, this t-test calculator can do both! 😊

What does a t-test tell you? Take a look at the text below, where we explain what actually gets tested when various types of t-tests are performed. Also, we explain when to use t-tests (in particular, whether to use the z-test vs. t-test) and what assumptions your data should satisfy for the results of a t-test to be valid. If you've ever wanted to know how to do a t-test by hand, we provide the necessary t-test formula, as well as tell you how to determine the number of degrees of freedom in a t-test.

When to use a t-test?

A t-test is one of the most popular statistical tests for location, i.e., it deals with the mean value(s) of one or two populations.

There are different types of t-tests that you can perform:

A one-sample t-test;
A two-sample t-test; and
A paired t-test.

In the next section, we explain when to use which. Remember that a t-test can only be used for one or two groups. If you need to compare three (or more) means, use the analysis of variance (ANOVA) method.

The t-test is a parametric test, meaning that your data has to fulfill some assumptions:

The data points are independent; AND
The data, at least approximately, follow a normal distribution.

If your sample doesn't fit these assumptions, you can resort to nonparametric alternatives. Visit our Mann–Whitney U test calculator or the Wilcoxon rank-sum test calculator to learn more. Other possibilities include the Wilcoxon signed-rank test or the sign test.

Which t-test?

Your choice of t-test depends on whether you are studying one group or two groups:

One sample t-test

Choose the one-sample t-test to check if the mean of a population is equal to some pre-set hypothesized value.

The average volume of a drink sold in 0.33 l cans — is it really equal to 330 ml?

The average weight of people from a specific city — is it different from the national average?

Two-sample t-test

Choose the two-sample t-test to check if the difference between the means of two populations is equal to some pre-determined value when the two samples have been chosen independently of each other.

In particular, you can use this test to check whether the two groups are different from one another.

The average difference in weight gain in two groups of people: one group was on a high-carb diet and the other on a high-fat diet.

The average difference in the results of a math test from students at two different universities.

This test is sometimes referred to as an independent samples t-test, or an unpaired samples t-test.

Paired t-test

A paired t-test is used to investigate the change in the mean of a population before and after some experimental intervention, based on a paired sample, i.e., when each subject has been measured twice: before and after treatment.

In particular, you can use this test to check whether, on average, the treatment has had any effect on the population.

The change in student test performance before and after taking a course.

The change in blood pressure in patients before and after administering some drug.

How to do a t-test?

So, you've decided which t-test to perform. These next steps will tell you how to calculate the p-value from t-test or its critical values, and then which decision to make about the null hypothesis.

Decide on the alternative hypothesis:

Use a two-tailed t-test if you only care whether the population's mean (or, in the case of two populations, the difference between the populations' means) agrees or disagrees with the pre-set value.

Use a one-tailed t-test if you want to test whether this mean (or difference in means) is greater/less than the pre-set value.

Compute your T-score value:

Formulas for the test statistic in t-tests include the sample size, as well as its mean and standard deviation. The exact formula depends on the t-test type; check the sections dedicated to each particular test for more details.

Determine the degrees of freedom for the t-test:

The degrees of freedom are the number of observations in a sample that are free to vary as we estimate statistical parameters. In the simplest case, the number of degrees of freedom equals your sample size minus the number of parameters you need to estimate. Again, the exact formula depends on the t-test you want to perform; check the sections below for details.

The degrees of freedom are essential, as they determine the distribution followed by your T-score (under the null hypothesis). If there are d degrees of freedom, then the distribution of the test statistic is the t-Student distribution with d degrees of freedom. This distribution has a shape similar to N(0,1) (bell-shaped and symmetric) but has heavier tails. If the number of degrees of freedom is large (>30), which typically happens for large samples, the t-Student distribution is practically indistinguishable from N(0,1).

💡 The t-Student distribution owes its name to William Sealy Gosset, who, in 1908, published his paper on the t-test under the pseudonym "Student". Gosset worked at the famous Guinness Brewery in Dublin, Ireland, and devised the t-test as an economical way to monitor the quality of beer. Cheers! 🍺🍺🍺

p-value from t-test

Recall that the p-value is the probability (calculated under the assumption that the null hypothesis is true) that the test statistic will produce values at least as extreme as the T-score produced for your sample. As probabilities correspond to areas under the density function, the p-value from a t-test can be pictured as the area of the corresponding tail (or tails) under the density curve of the t-distribution.

The following formulae say how to calculate the p-value from a t-test. By $\mathrm{cdf}_{t,d}$ we denote the cumulative distribution function of the t-Student distribution with d degrees of freedom:

p-value from left-tailed t-test:

$\text{p-value} = \mathrm{cdf}_{t,d}(t_{\mathrm{score}})$

p-value from right-tailed t-test:

$\text{p-value} = 1 - \mathrm{cdf}_{t,d}(t_{\mathrm{score}})$

p-value from two-tailed t-test:

$\text{p-value} = 2 \times \mathrm{cdf}_{t,d}(-|t_{\mathrm{score}}|)$

or, equivalently: $\text{p-value} = 2 - 2 \times \mathrm{cdf}_{t,d}(|t_{\mathrm{score}}|)$

However, the cdf of the t-distribution is given by a somewhat complicated formula. To find the p-value by hand, you would need to resort to statistical tables, where approximate cdf values are collected, or to specialized statistical software. Fortunately, our t-test calculator determines the p-value from t-test for you in the blink of an eye!
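
For readers who want to see the three p-value formulas in action without a table, here is a minimal sketch that uses only the Python standard library. It approximates the cdf by integrating the t-distribution's density with Simpson's rule; this numerical shortcut, the function names (`t_pdf`, `t_cdf`, `p_value`), and the step count are assumptions made for illustration, not how the calculator itself works.

```python
import math


def t_pdf(x: float, d: float) -> float:
    """Density of the Student t-distribution with d degrees of freedom."""
    log_norm = (math.lgamma((d + 1) / 2) - math.lgamma(d / 2)
                - 0.5 * math.log(d * math.pi))
    return math.exp(log_norm - (d + 1) / 2 * math.log1p(x * x / d))


def t_cdf(x: float, d: float, steps_per_unit: int = 400) -> float:
    """Approximate cdf: Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    n = 2 * max(50, int(a * steps_per_unit / 2))  # even number of intervals
    h = a / n
    total = t_pdf(0.0, d) + t_pdf(a, d)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, d)
    area = total * h / 3                          # P(0 < T < |x|)
    return 0.5 + area if x > 0 else 0.5 - area


def p_value(t_score: float, d: float, tail: str) -> float:
    """p-value for a left-, right-, or two-tailed t-test."""
    if tail == "left":
        return t_cdf(t_score, d)
    if tail == "right":
        return 1 - t_cdf(t_score, d)
    if tail == "two":
        return 2 * t_cdf(-abs(t_score), d)
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Illustrative inputs: 14 degrees of freedom and two T-scores.
print(round(p_value(1.7613, 14, "right"), 4))
print(round(p_value(2.1448, 14, "two"), 4))
```

Both printed values should come out very close to 0.05, since 1.7613 and 2.1448 are the tabulated one-tailed and two-tailed critical values for 14 degrees of freedom at that significance level.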

t-test critical values

Recall that in the critical values approach to hypothesis testing, you need to set a significance level, α, before computing the critical values, which in turn give rise to critical regions (a.k.a. rejection regions).

Formulas for critical values employ the quantile function of the t-distribution, i.e., the inverse of the cdf:

Critical value for left-tailed t-test: $\mathrm{cdf}_{t,d}^{-1}(\alpha)$

critical region:

$(-\infty, \mathrm{cdf}_{t,d}^{-1}(\alpha)]$

Critical value for right-tailed t-test: $\mathrm{cdf}_{t,d}^{-1}(1-\alpha)$

critical region:

$[\mathrm{cdf}_{t,d}^{-1}(1-\alpha), \infty)$

Critical values for two-tailed t-test: $\pm\mathrm{cdf}_{t,d}^{-1}(1-\alpha/2)$

critical region:

$(-\infty, -\mathrm{cdf}_{t,d}^{-1}(1-\alpha/2)] \cup [\mathrm{cdf}_{t,d}^{-1}(1-\alpha/2), \infty)$

To decide the fate of the null hypothesis, just check if your T-score lies within the critical region:

If your T-score belongs to the critical region, reject the null hypothesis and accept the alternative hypothesis.

If your T-score is outside the critical region, then you don't have enough evidence to reject the null hypothesis.
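
The quantile function has no simple closed form, but it can be approximated by inverting the cdf numerically. The sketch below is one possible approach using only the Python standard library: a Simpson's-rule cdf and a bisection search. The helper names (`t_cdf`, `t_quantile`, `in_critical_region`) and the tolerance are illustrative assumptions, and the search is meant for moderate quantiles rather than extreme tails.

```python
import math


def t_cdf(x: float, d: float) -> float:
    """Student t cdf, approximated with Simpson's rule on [0, |x|]."""
    norm = (math.exp(math.lgamma((d + 1) / 2) - math.lgamma(d / 2))
            / math.sqrt(d * math.pi))

    def pdf(u: float) -> float:
        return norm * (1 + u * u / d) ** (-(d + 1) / 2)

    a = abs(x)
    n = 2 * max(500, int(100 * a))          # even number of Simpson intervals
    h = a / n
    total = pdf(0.0) + pdf(a) + sum(
        (4 if i % 2 else 2) * pdf(i * h) for i in range(1, n))
    return 0.5 + math.copysign(total * h / 3, x)


def t_quantile(p: float, d: float, tol: float = 1e-8) -> float:
    """Inverse cdf by bisection; requires 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if p < 0.5:
        return -t_quantile(1 - p, d, tol)   # symmetry of the t-distribution
    lo, hi = 0.0, 1.0
    while t_cdf(hi, d) < p:                 # widen the bracket until it contains p
        lo, hi = hi, 2 * hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, d) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def in_critical_region(t_score: float, alpha: float, d: float, tail: str) -> bool:
    """Check whether t_score lies in the critical region defined above."""
    if tail == "left":
        return t_score <= t_quantile(alpha, d)
    if tail == "right":
        return t_score >= t_quantile(1 - alpha, d)
    if tail == "two":
        return abs(t_score) >= t_quantile(1 - alpha / 2, d)
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Illustrative values: alpha = 0.05 with 14 degrees of freedom.
print(round(t_quantile(0.95, 14), 4), round(t_quantile(0.975, 14), 4))
```

In practice a statistics library's quantile function would replace `t_quantile`; the bisection version is shown only to make the "inverse of the cdf" idea concrete.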

How to use our t-test calculator

Choose the type of t-test you wish to perform:

A one-sample t-test (to test the mean of a single group against a hypothesized mean);

A two-sample t-test (to compare the means for two groups); or

A paired t-test (to check how the mean from the same group changes after some intervention).

Decide on the alternative hypothesis:

Two-tailed;

Left-tailed; or

Right-tailed.

This t-test calculator allows you to use either the p-value approach or the critical regions approach to hypothesis testing!

Enter your T-score and the number of degrees of freedom. If you don't know them, provide some data about your sample(s): sample size, mean, and standard deviation, and our t-test calculator will compute the T-score and degrees of freedom for you.

Once all the parameters are present, the p-value, or critical region, will immediately appear underneath the t-test calculator, along with an interpretation!

One-sample t-test

The null hypothesis is that the population mean is equal to some value $\mu_0$.

The alternative hypothesis is that the population mean is:

different from $\mu_0$;
smaller than $\mu_0$; or
greater than $\mu_0$.

One-sample t-test formula:

$t = \frac{\bar{x} - \mu_0}{s} \cdot \sqrt{n}$

where:

$\mu_0$: Mean postulated in the null hypothesis;
$n$: Sample size;
$\bar{x}$: Sample mean; and
$s$: Sample standard deviation.

Number of degrees of freedom in t-test (one-sample) = $n - 1$.
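
As a rough illustration, the one-sample T-score can be computed from raw data using the quantities defined above. The function name `one_sample_t` and the can-volume figures are made up for this sketch; `statistics.stdev` is used because it returns the sample standard deviation (with the n − 1 denominator).

```python
import math
import statistics


def one_sample_t(sample: list[float], mu0: float) -> tuple[float, int]:
    """Return (T-score, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)            # sample standard deviation
    t_score = (x_bar - mu0) / s * math.sqrt(n)
    return t_score, n - 1


# Hypothetical can volumes in ml, tested against the nominal 330 ml.
volumes = [329.0, 331.0, 328.0, 330.0, 327.0]
t_score, df = one_sample_t(volumes, 330.0)
print(f"t = {t_score:.4f}, df = {df}")
```

The resulting T-score and degrees of freedom can then be fed into either the p-value approach or the critical value approach.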

Two-sample t-test

The null hypothesis is that the actual difference between the two groups' means, $\mu_1$ and $\mu_2$, is equal to some pre-set value, $\Delta$.

The alternative hypothesis is that the difference $\mu_1 - \mu_2$ is:

Different from $\Delta$;
Smaller than $\Delta$; or
Greater than $\Delta$.

In particular, if this pre-determined difference is zero ($\Delta = 0$):

The null hypothesis is that the population means are equal.

The alternative hypothesis is one of the following:

$\mu_1$ and $\mu_2$ are different from one another;
$\mu_1$ is smaller than $\mu_2$; or
$\mu_1$ is greater than $\mu_2$.

Formally, to perform a t-test, we should additionally assume that the variances of the two populations are equal (this assumption is called the homogeneity of variance).

There is a version of the t-test that can be applied without the assumption of homogeneity of variance: it is called Welch's t-test. For your convenience, we describe both versions.

Two-sample t-test if variances are equal

Use this test if you know that the two populations' variances are the same (or very similar).

Two-sample t-test formula (with equal variances):

$t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta}{s_p \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

where $s_p$ is the so-called pooled standard deviation, which we compute as:

$s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$

and:

$\Delta$: Mean difference postulated in the null hypothesis;
$n_1$: First sample size;
$\bar{x}_1$: Mean for the first sample;
$s_1$: Standard deviation in the first sample;
$n_2$: Second sample size;
$\bar{x}_2$: Mean for the second sample; and
$s_2$: Standard deviation in the second sample.

Number of degrees of freedom in t-test (two samples, equal variances) = $n_1 + n_2 - 2$.
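
A small sketch of the equal-variance calculation from summary statistics follows. It uses the standard pooled-variance form of the test, and the function name `pooled_t` and the example numbers are hypothetical.

```python
import math


def pooled_t(n1: int, mean1: float, s1: float,
             n2: int, mean2: float, s2: float,
             delta: float = 0.0) -> tuple[float, int]:
    """Return (T-score, degrees of freedom) for the equal-variance two-sample t-test."""
    df = n1 + n2 - 2
    # Pooled standard deviation: variances weighted by their degrees of freedom.
    s_p = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    t_score = (mean1 - mean2 - delta) / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return t_score, df


# Hypothetical summary statistics for two groups of 10 subjects each.
t_score, df = pooled_t(10, 5.0, 1.0, 10, 4.0, 1.0)
print(f"t = {t_score:.4f}, df = {df}")
```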

Two-sample t-test if variances are unequal (Welch's t-test)

Use this test if the variances of your populations are different.

Two-sample Welch's t-test formula if variances are unequal:

$t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

where:

$s_1$: Standard deviation in the first sample;
$s_2$: Standard deviation in the second sample.

The number of degrees of freedom in a Welch's t-test (two-sample t-test with unequal variances) is difficult to compute exactly. We can approximate it with the help of the following Satterthwaite formula:

$\frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$

Alternatively, you can take the smaller of $n_1 - 1$ and $n_2 - 1$ as a conservative estimate for the number of degrees of freedom.

🔎 The Satterthwaite formula for the degrees of freedom can be rewritten as a scaled weighted harmonic mean of the degrees of freedom of the respective samples, $n_1 - 1$ and $n_2 - 1$, where the weights are proportional to $(s_1^2/n_1)^2$ and $(s_2^2/n_2)^2$, i.e., they depend on the samples' standard deviations and sizes.
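
The Welch statistic and the Satterthwaite approximation can be sketched as follows. This uses the standard textbook form of both formulas; the function name `welch_t` and the summary statistics are hypothetical.

```python
import math


def welch_t(n1: int, mean1: float, s1: float,
            n2: int, mean2: float, s2: float,
            delta: float = 0.0) -> tuple[float, float]:
    """Return (T-score, approximate degrees of freedom) for Welch's t-test."""
    v1, v2 = s1**2 / n1, s2**2 / n2         # estimated variances of the two sample means
    t_score = (mean1 - mean2 - delta) / math.sqrt(v1 + v2)
    # Satterthwaite approximation; the result is generally not an integer.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_score, df


# Hypothetical groups with clearly different spreads and sizes.
t_score, df = welch_t(10, 5.0, 2.0, 20, 4.0, 1.0)
conservative_df = min(10 - 1, 20 - 1)
print(f"t = {t_score:.4f}, Satterthwaite df = {df:.2f}, conservative df = {conservative_df}")
```

The approximate degrees of freedom always fall between the conservative estimate (the smaller of $n_1 - 1$ and $n_2 - 1$) and the equal-variance value $n_1 + n_2 - 2$.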

Paired t-test

As we commonly perform a paired t-test when we have data about the same subjects measured twice (before and after some treatment), let us adopt the convention of referring to the samples as the pre-group and post-group.

The null hypothesis is that the true difference between the means of pre- and post-populations is equal to some pre-set value, $\Delta$.

The alternative hypothesis is that the actual difference between these means is:

Different from $\Delta$;
Smaller than $\Delta$; or
Greater than $\Delta$.

Typically, this pre-determined difference is zero. We can then reformulate the hypotheses as follows:

The null hypothesis is that the pre- and post-means are the same, i.e., the treatment has no impact on the population.

The alternative hypothesis:

The pre- and post-means are different from one another (treatment has some effect);
The pre-mean is smaller than the post-mean (treatment increases the result); or
The pre-mean is greater than the post-mean (treatment decreases the result).

Paired t-test formula

In fact, a paired t-test is technically the same as a one-sample t-test! Let us see why it is so. Let $x_1, \ldots, x_n$ be the pre observations and $y_1, \ldots, y_n$ the respective post observations. That is, $x_i, y_i$ are the before and after measurements of the i-th subject.

For each subject, compute the difference, $d_i := x_i - y_i$. All that happens next is just a one-sample t-test performed on the sample of differences $d_1, \ldots, d_n$. Take a look at the formula for the T-score:

$t = \frac{\bar{x} - \Delta}{s} \cdot \sqrt{n}$

where:

$\Delta$: Mean difference postulated in the null hypothesis;
$n$: Size of the sample of differences, i.e., the number of pairs;
$\bar{x}$: Mean of the sample of differences; and
$s$: Standard deviation of the sample of differences.

Number of degrees of freedom in t-test (paired): $n - 1$.
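
To make the reduction to a one-sample test concrete, here is a short sketch that forms the differences (pre minus post, as defined above) and computes the T-score from them. The function name `paired_t` and the blood-pressure readings are invented for illustration.

```python
import math
import statistics


def paired_t(pre: list[float], post: list[float],
             delta: float = 0.0) -> tuple[float, int]:
    """Return (T-score, degrees of freedom) for a paired t-test."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain the same number of observations")
    diffs = [x - y for x, y in zip(pre, post)]   # d_i = x_i - y_i
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s = statistics.stdev(diffs)                  # sample standard deviation of differences
    return (d_bar - delta) / s * math.sqrt(n), n - 1


# Hypothetical blood pressure readings before and after treatment.
pre = [140.0, 150.0, 135.0, 160.0, 145.0]
post = [138.0, 145.0, 136.0, 152.0, 144.0]
t_score, df = paired_t(pre, post)
print(f"t = {t_score:.4f}, df = {df}")
```

Because the differences are taken as pre minus post, a positive T-score here points toward the pre-mean being greater than the post-mean, i.e., the treatment decreasing the result.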

t-test vs Z-test

We use a Z-test when we want to test the population mean of a normally distributed dataset, which has a known population variance. If the number of degrees of freedom is large, then the t-Student distribution is very close to N(0,1).

Hence, if there are many data points (at least 30), you may swap a t-test for a Z-test, and the results will be almost identical. However, for small samples with unknown variance, remember to use the t-test because, in such cases, the t-Student distribution differs significantly from the N(0,1)!

🙋 Have you concluded you need to perform the z-test? Head straight to our z-test calculator!

What is a t-test?

A t-test is a widely used statistical test that analyzes the means of one or two groups of data. For instance, a t-test is performed on medical data to determine whether a new drug really helps.

What are different types of t-tests?

Different types of t-tests are:

One-sample t-test;
Two-sample t-test; and
Paired t-test.

How to find the t value in a one sample t-test?

To find the t-value:

Subtract the null hypothesis mean from the sample mean value.
Divide the difference by the standard deviation of the sample.
Multiply the resultant with the square root of the sample size.

.fontsize-ensurer.reset-size3.size1{font-size:0.71428571em;}.css-4okk7a .katex .sizing.reset-size3.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size2{font-size:0.85714286em;}.css-4okk7a .katex .sizing.reset-size3.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size3{font-size:1em;}.css-4okk7a .katex .sizing.reset-size3.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size4{font-size:1.14285714em;}.css-4okk7a .katex .sizing.reset-size3.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size5{font-size:1.28571429em;}.css-4okk7a .katex .sizing.reset-size3.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size6{font-size:1.42857143em;}.css-4okk7a .katex .sizing.reset-size3.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size7{font-size:1.71428571em;}.css-4okk7a .katex .sizing.reset-size3.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size8{font-size:2.05714286em;}.css-4okk7a .katex .sizing.reset-size3.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size9{font-size:2.46857143em;}.css-4okk7a .katex .sizing.reset-size3.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size10{font-size:2.96285714em;}.css-4okk7a .katex .sizing.reset-size3.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size11{font-size:3.55428571em;}.css-4okk7a .katex .sizing.reset-size4.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size1{font-size:0.625em;}.css-4okk7a .katex .sizing.reset-size4.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size2{font-size:0.75em;}.css-4okk7a .katex .sizing.reset-size4.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size3{font-size:0.875em;}.css-4okk7a .katex .sizing.reset-size4.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size4{font-size:1em;}.css-4okk7a .katex .sizing.reset-size4.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size5{font-size:1.125em;}.css-4okk7a .katex .sizing.reset-size4.size6,.css-4okk7a .katex 
.fontsize-ensurer.reset-size4.size6{font-size:1.25em;}.css-4okk7a .katex .sizing.reset-size4.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size7{font-size:1.5em;}.css-4okk7a .katex .sizing.reset-size4.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size8{font-size:1.8em;}.css-4okk7a .katex .sizing.reset-size4.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size9{font-size:2.16em;}.css-4okk7a .katex .sizing.reset-size4.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size10{font-size:2.5925em;}.css-4okk7a .katex .sizing.reset-size4.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size11{font-size:3.11em;}.css-4okk7a .katex .sizing.reset-size5.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size1{font-size:0.55555556em;}.css-4okk7a .katex .sizing.reset-size5.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size2{font-size:0.66666667em;}.css-4okk7a .katex .sizing.reset-size5.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size3{font-size:0.77777778em;}.css-4okk7a .katex .sizing.reset-size5.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size4{font-size:0.88888889em;}.css-4okk7a .katex .sizing.reset-size5.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size5{font-size:1em;}.css-4okk7a .katex .sizing.reset-size5.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size6{font-size:1.11111111em;}.css-4okk7a .katex .sizing.reset-size5.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size7{font-size:1.33333333em;}.css-4okk7a .katex .sizing.reset-size5.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size8{font-size:1.6em;}.css-4okk7a .katex .sizing.reset-size5.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size9{font-size:1.92em;}.css-4okk7a .katex .sizing.reset-size5.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size10{font-size:2.30444444em;}.css-4okk7a .katex .sizing.reset-size5.size11,.css-4okk7a .katex 
.fontsize-ensurer.reset-size5.size11{font-size:2.76444444em;}.css-4okk7a .katex .sizing.reset-size6.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size1{font-size:0.5em;}.css-4okk7a .katex .sizing.reset-size6.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size2{font-size:0.6em;}.css-4okk7a .katex .sizing.reset-size6.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size3{font-size:0.7em;}.css-4okk7a .katex .sizing.reset-size6.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size4{font-size:0.8em;}.css-4okk7a .katex .sizing.reset-size6.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size5{font-size:0.9em;}.css-4okk7a .katex .sizing.reset-size6.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size6{font-size:1em;}.css-4okk7a .katex .sizing.reset-size6.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size7{font-size:1.2em;}.css-4okk7a .katex .sizing.reset-size6.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size8{font-size:1.44em;}.css-4okk7a .katex .sizing.reset-size6.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size9{font-size:1.728em;}.css-4okk7a .katex .sizing.reset-size6.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size10{font-size:2.074em;}.css-4okk7a .katex .sizing.reset-size6.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size11{font-size:2.488em;}.css-4okk7a .katex .sizing.reset-size7.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size1{font-size:0.41666667em;}.css-4okk7a .katex .sizing.reset-size7.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size2{font-size:0.5em;}.css-4okk7a .katex .sizing.reset-size7.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size3{font-size:0.58333333em;}.css-4okk7a .katex .sizing.reset-size7.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size4{font-size:0.66666667em;}.css-4okk7a .katex .sizing.reset-size7.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size5{font-size:0.75em;}.css-4okk7a .katex 
.sizing.reset-size7.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size6{font-size:0.83333333em;}.css-4okk7a .katex .sizing.reset-size7.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size7{font-size:1em;}.css-4okk7a .katex .sizing.reset-size7.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size8{font-size:1.2em;}.css-4okk7a .katex .sizing.reset-size7.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size9{font-size:1.44em;}.css-4okk7a .katex .sizing.reset-size7.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size10{font-size:1.72833333em;}.css-4okk7a .katex .sizing.reset-size7.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size11{font-size:2.07333333em;}.css-4okk7a .katex .sizing.reset-size8.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size1{font-size:0.34722222em;}.css-4okk7a .katex .sizing.reset-size8.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size2{font-size:0.41666667em;}.css-4okk7a .katex .sizing.reset-size8.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size3{font-size:0.48611111em;}.css-4okk7a .katex .sizing.reset-size8.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size4{font-size:0.55555556em;}.css-4okk7a .katex .sizing.reset-size8.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size5{font-size:0.625em;}.css-4okk7a .katex .sizing.reset-size8.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size6{font-size:0.69444444em;}.css-4okk7a .katex .sizing.reset-size8.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size7{font-size:0.83333333em;}.css-4okk7a .katex .sizing.reset-size8.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size8{font-size:1em;}.css-4okk7a .katex .sizing.reset-size8.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size9{font-size:1.2em;}.css-4okk7a .katex .sizing.reset-size8.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size10{font-size:1.44027778em;}.css-4okk7a .katex .sizing.reset-size8.size11,.css-4okk7a .katex 
.fontsize-ensurer.reset-size8.size11{font-size:1.72777778em;}.css-4okk7a .katex .sizing.reset-size9.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size1{font-size:0.28935185em;}.css-4okk7a .katex .sizing.reset-size9.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size2{font-size:0.34722222em;}.css-4okk7a .katex .sizing.reset-size9.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size3{font-size:0.40509259em;}.css-4okk7a .katex .sizing.reset-size9.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size4{font-size:0.46296296em;}.css-4okk7a .katex .sizing.reset-size9.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size5{font-size:0.52083333em;}.css-4okk7a .katex .sizing.reset-size9.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size6{font-size:0.5787037em;}.css-4okk7a .katex .sizing.reset-size9.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size7{font-size:0.69444444em;}.css-4okk7a .katex .sizing.reset-size9.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size8{font-size:0.83333333em;}.css-4okk7a .katex .sizing.reset-size9.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size9{font-size:1em;}.css-4okk7a .katex .sizing.reset-size9.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size10{font-size:1.20023148em;}.css-4okk7a .katex .sizing.reset-size9.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size11{font-size:1.43981481em;}.css-4okk7a .katex .sizing.reset-size10.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size1{font-size:0.24108004em;}.css-4okk7a .katex .sizing.reset-size10.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size2{font-size:0.28929605em;}.css-4okk7a .katex .sizing.reset-size10.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size3{font-size:0.33751205em;}.css-4okk7a .katex .sizing.reset-size10.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size4{font-size:0.38572806em;}.css-4okk7a .katex .sizing.reset-size10.size5,.css-4okk7a .katex 
.fontsize-ensurer.reset-size10.size5{font-size:0.43394407em;}.css-4okk7a .katex .sizing.reset-size10.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size6{font-size:0.48216008em;}.css-4okk7a .katex .sizing.reset-size10.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size7{font-size:0.57859209em;}.css-4okk7a .katex .sizing.reset-size10.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size8{font-size:0.69431051em;}.css-4okk7a .katex .sizing.reset-size10.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size9{font-size:0.83317261em;}.css-4okk7a .katex .sizing.reset-size10.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size10{font-size:1em;}.css-4okk7a .katex .sizing.reset-size10.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size11{font-size:1.19961427em;}.css-4okk7a .katex .sizing.reset-size11.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size1{font-size:0.20096463em;}.css-4okk7a .katex .sizing.reset-size11.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size2{font-size:0.24115756em;}.css-4okk7a .katex .sizing.reset-size11.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size3{font-size:0.28135048em;}.css-4okk7a .katex .sizing.reset-size11.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size4{font-size:0.32154341em;}.css-4okk7a .katex .sizing.reset-size11.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size5{font-size:0.36173633em;}.css-4okk7a .katex .sizing.reset-size11.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size6{font-size:0.40192926em;}.css-4okk7a .katex .sizing.reset-size11.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size7{font-size:0.48231511em;}.css-4okk7a .katex .sizing.reset-size11.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size8{font-size:0.57877814em;}.css-4okk7a .katex .sizing.reset-size11.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size9{font-size:0.69453376em;}.css-4okk7a .katex 
.sizing.reset-size11.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size10{font-size:0.83360129em;}.css-4okk7a .katex .sizing.reset-size11.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size11{font-size:1em;}.css-4okk7a .katex .delimsizing.size1{font-family:KaTeX_Size1;}.css-4okk7a .katex .delimsizing.size2{font-family:KaTeX_Size2;}.css-4okk7a .katex .delimsizing.size3{font-family:KaTeX_Size3;}.css-4okk7a .katex .delimsizing.size4{font-family:KaTeX_Size4;}.css-4okk7a .katex .delimsizing.mult .delim-size1>span{font-family:KaTeX_Size1;}.css-4okk7a .katex .delimsizing.mult .delim-size4>span{font-family:KaTeX_Size4;}.css-4okk7a .katex .nulldelimiter{display:inline-block;width:0.12em;}.css-4okk7a .katex .delimcenter{position:relative;}.css-4okk7a .katex .op-symbol{position:relative;}.css-4okk7a .katex .op-symbol.small-op{font-family:KaTeX_Size1;}.css-4okk7a .katex .op-symbol.large-op{font-family:KaTeX_Size2;}.css-4okk7a .katex .op-limits>.vlist-t{text-align:center;}.css-4okk7a .katex .accent>.vlist-t{text-align:center;}.css-4okk7a .katex .accent .accent-body{position:relative;}.css-4okk7a .katex .accent .accent-body:not(.accent-full){width:0;}.css-4okk7a .katex .overlay{display:block;}.css-4okk7a .katex .mtable .vertical-separator{display:inline-block;min-width:1px;}.css-4okk7a .katex .mtable .arraycolsep{display:inline-block;}.css-4okk7a .katex .mtable .col-align-c>.vlist-t{text-align:center;}.css-4okk7a .katex .mtable .col-align-l>.vlist-t{text-align:left;}.css-4okk7a .katex .mtable .col-align-r>.vlist-t{text-align:right;}.css-4okk7a .katex .svg-align{text-align:left;}.css-4okk7a .katex svg{display:block;position:absolute;width:100%;height:inherit;fill:currentColor;stroke:currentColor;fill-rule:nonzero;fill-opacity:1;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;}.css-4okk7a .katex svg path{stroke:none;}.css-4okk7a .katex 
img{border-style:none;min-width:0;min-height:0;max-width:none;max-height:none;}.css-4okk7a .katex .stretchy{width:100%;display:block;position:relative;overflow:hidden;}.css-4okk7a .katex .stretchy::before,.css-4okk7a .katex .stretchy::after{content:'';}.css-4okk7a .katex .hide-tail{width:100%;position:relative;overflow:hidden;}.css-4okk7a .katex .halfarrow-left{position:absolute;left:0;width:50.2%;overflow:hidden;}.css-4okk7a .katex .halfarrow-right{position:absolute;right:0;width:50.2%;overflow:hidden;}.css-4okk7a .katex .brace-left{position:absolute;left:0;width:25.1%;overflow:hidden;}.css-4okk7a .katex .brace-center{position:absolute;left:25%;width:50%;overflow:hidden;}.css-4okk7a .katex .brace-right{position:absolute;right:0;width:25.1%;overflow:hidden;}.css-4okk7a .katex .x-arrow-pad{padding:0 0.5em;}.css-4okk7a .katex .cd-arrow-pad{padding:0 0.55556em 0 0.27778em;}.css-4okk7a .katex .x-arrow,.css-4okk7a .katex .mover,.css-4okk7a .katex .munder{text-align:center;}.css-4okk7a .katex .boxpad{padding:0 0.3em 0 0.3em;}.css-4okk7a .katex .fbox,.css-4okk7a .katex .fcolorbox{box-sizing:border-box;border:0.04em solid;}.css-4okk7a .katex .cancel-pad{padding:0 0.2em 0 0.2em;}.css-4okk7a .katex .cancel-lap{margin-left:-0.2em;margin-right:-0.2em;}.css-4okk7a .katex .sout{border-bottom-style:solid;border-bottom-width:0.08em;}.css-4okk7a .katex .angl{box-sizing:border-box;border-top:0.049em solid;border-right:0.049em solid;margin-right:0.03889em;}.css-4okk7a .katex .anglpad{padding:0 0.03889em 0 0.03889em;}.css-4okk7a .katex .eqn-num::before{counter-increment:katexEqnNo;content:'(' counter(katexEqnNo) ')';}.css-4okk7a .katex .mml-eqn-num::before{counter-increment:mmlEqnNo;content:'(' counter(mmlEqnNo) ')';}.css-4okk7a .katex .mtr-glue{width:50%;}.css-4okk7a .katex .cd-vert-arrow{display:inline-block;position:relative;}.css-4okk7a .katex .cd-label-left{display:inline-block;position:absolute;right:calc(50% + 0.3em);text-align:left;}.css-4okk7a .katex 
.cd-label-right{display:inline-block;position:absolute;left:calc(50% + 0.3em);text-align:right;}.css-4okk7a .katex-display{display:block;margin:1em 0;text-align:center;}.css-4okk7a .katex-display>.katex{display:block;white-space:nowrap;}.css-4okk7a .katex-display>.katex>.katex-html{display:block;position:relative;}.css-4okk7a .katex-display>.katex>.katex-html>.tag{position:absolute;right:0;}.css-4okk7a .katex-display.leqno>.katex>.katex-html>.tag{left:0;right:auto;}.css-4okk7a .katex-display.fleqn>.katex{text-align:left;padding-left:2em;}.css-4okk7a body{counter-reset:katexEqnNo mmlEqnNo;}.css-4okk7a table{width:-webkit-max-content;width:-moz-max-content;width:max-content;}.css-4okk7a .tableBlock{max-width:100%;margin-bottom:1rem;overflow-y:scroll;}.css-4okk7a .tableBlock thead,.css-4okk7a .tableBlock thead th{border-bottom:1px solid #333!important;}.css-4okk7a .tableBlock th,.css-4okk7a .tableBlock td{padding:10px;text-align:left;}.css-4okk7a .tableBlock th{font-weight:bold!important;}.css-4okk7a .tableBlock caption{caption-side:bottom;color:#555;font-size:12px;font-style:italic;text-align:center;}.css-4okk7a .tableBlock caption>p{margin:0;}.css-4okk7a .tableBlock th>p,.css-4okk7a .tableBlock td>p{margin:0;}.css-4okk7a .tableBlock [data-background-color='aliceblue']{background-color:#f0f8ff;color:#000;}.css-4okk7a .tableBlock [data-background-color='black']{background-color:#000;color:#fff;}.css-4okk7a .tableBlock [data-background-color='chocolate']{background-color:#d2691e;color:#fff;}.css-4okk7a .tableBlock [data-background-color='cornflowerblue']{background-color:#6495ed;color:#fff;}.css-4okk7a .tableBlock [data-background-color='crimson']{background-color:#dc143c;color:#fff;}.css-4okk7a .tableBlock [data-background-color='darkblue']{background-color:#00008b;color:#fff;}.css-4okk7a .tableBlock [data-background-color='darkseagreen']{background-color:#8fbc8f;color:#000;}.css-4okk7a .tableBlock 
[data-background-color='deepskyblue']{background-color:#00bfff;color:#000;}.css-4okk7a .tableBlock [data-background-color='gainsboro']{background-color:#dcdcdc;color:#000;}.css-4okk7a .tableBlock [data-background-color='grey']{background-color:#808080;color:#fff;}.css-4okk7a .tableBlock [data-background-color='lemonchiffon']{background-color:#fffacd;color:#000;}.css-4okk7a .tableBlock [data-background-color='lightpink']{background-color:#ffb6c1;color:#000;}.css-4okk7a .tableBlock [data-background-color='lightsalmon']{background-color:#ffa07a;color:#000;}.css-4okk7a .tableBlock [data-background-color='lightskyblue']{background-color:#87cefa;color:#000;}.css-4okk7a .tableBlock [data-background-color='mediumblue']{background-color:#0000cd;color:#fff;}.css-4okk7a .tableBlock [data-background-color='omnigrey']{background-color:#f0f0f0;color:#000;}.css-4okk7a .tableBlock [data-background-color='white']{background-color:#fff;color:#000;}.css-4okk7a .tableBlock [data-text-align='center']{text-align:center;}.css-4okk7a .tableBlock [data-text-align='left']{text-align:left;}.css-4okk7a .tableBlock [data-text-align='right']{text-align:right;}.css-4okk7a .tableBlock [data-vertical-align='bottom']{vertical-align:bottom;}.css-4okk7a .tableBlock [data-vertical-align='middle']{vertical-align:middle;}.css-4okk7a .tableBlock [data-vertical-align='top']{vertical-align:top;}.css-4okk7a .tableBlockfont-size--xxsmall{font-size:10px;}.css-4okk7a .tableBlockfont-size--xsmall{font-size:12px;}.css-4okk7a .tableBlockfont-size--small{font-size:14px;}.css-4okk7a .tableBlockfont-size--large{font-size:18px;}.css-4okk7a .tableBlockborder--some tbody tr:not(:last-child){border-bottom:1px solid #e2e5e7;}.css-4okk7a .tableBlockborder--bordered td,.css-4okk7a .tableBlockborder--bordered th{border:1px solid #e2e5e7;}.css-4okk7a .tableBlockborder--borderless tbody+tbody,.css-4okk7a .tableBlockborder--borderless td,.css-4okk7a .tableBlockborder--borderless th,.css-4okk7a .tableBlockborder--borderless 
tr,.css-4okk7a .tableBlockborder--borderless thead,.css-4okk7a .tableBlockborder--borderless thead th{border:0!important;}.css-4okk7a .tableBlock:not(.tableBlocktable-striped) tbody tr{background-color:unset!important;}.css-4okk7a .tableBlocktable-striped tbody tr:nth-of-type(odd){background-color:#f9fafc!important;}.css-4okk7a .tableBlocktable-compactl th,.css-4okk7a .tableBlocktable-compact td{padding:3px!important;}.css-4okk7a .tableBlockfull-size{width:100%;}.css-4okk7a .textBlock{margin-bottom:16px;}.css-4okk7a .textBlocktext-formatting--finePrint{font-size:12px;}.css-4okk7a .textBlocktext-infoBox{padding:0.75rem 1.25rem;margin-bottom:1rem;border:1px solid transparent;border-radius:0.25rem;}.css-4okk7a .textBlocktext-infoBox p{margin:0;}.css-4okk7a .textBlocktext-infoBox--primary{background-color:#cce5ff;border-color:#b8daff;color:#004085;}.css-4okk7a .textBlocktext-infoBox--secondary{background-color:#e2e3e5;border-color:#d6d8db;color:#383d41;}.css-4okk7a .textBlocktext-infoBox--success{background-color:#d4edda;border-color:#c3e6cb;color:#155724;}.css-4okk7a .textBlocktext-infoBox--danger{background-color:#f8d7da;border-color:#f5c6cb;color:#721c24;}.css-4okk7a .textBlocktext-infoBox--warning{background-color:#fff3cd;border-color:#ffeeba;color:#856404;}.css-4okk7a .textBlocktext-infoBox--info{background-color:#d1ecf1;border-color:#bee5eb;color:#0c5460;}.css-4okk7a .textBlocktext-infoBox--dark{background-color:#d6d8d9;border-color:#c6c8ca;color:#1b1e21;}.css-4okk7a .text-overline{-webkit-text-decoration:overline;text-decoration:overline;}.css-4okk7a.css-4okk7a{color:#2B3148;background-color:transparent;font-family:"Roboto","Helvetica","Arial",sans-serif;font-size:20px;line-height:24px;overflow:visible;padding-top:0px;position:relative;}.css-4okk7a.css-4okk7a:after{content:'';-webkit-transform:scale(0);-moz-transform:scale(0);-ms-transform:scale(0);transform:scale(0);position:absolute;border:2px solid #EA9430;border-radius:2px;inset:-8px;z-index:1;}.css-4okk7a 
.js-external-link-button.link-like,.css-4okk7a .js-external-link-anchor{color:inherit;border-radius:1px;-webkit-text-decoration:underline;text-decoration:underline;}.css-4okk7a .js-external-link-button.link-like:hover,.css-4okk7a .js-external-link-anchor:hover,.css-4okk7a .js-external-link-button.link-like:active,.css-4okk7a .js-external-link-anchor:active{text-decoration-thickness:2px;text-shadow:1px 0 0;}.css-4okk7a .js-external-link-button.link-like:focus-visible,.css-4okk7a .js-external-link-anchor:focus-visible{outline:transparent 2px dotted;box-shadow:0 0 0 2px #6314E6;}.css-4okk7a p,.css-4okk7a div{margin:0px;display:block;}.css-4okk7a pre{margin:0px;display:block;}.css-4okk7a pre code{display:block;width:-webkit-fit-content;width:-moz-fit-content;width:fit-content;}.css-4okk7a pre:not(:first-child){padding-top:8px;}.css-4okk7a ul,.css-4okk7a ol{display:block margin:0px;padding-left:20px;}.css-4okk7a ul li,.css-4okk7a ol li{padding-top:8px;}.css-4okk7a ul ul,.css-4okk7a ol ul,.css-4okk7a ul ol,.css-4okk7a ol ol{padding-top:0px;}.css-4okk7a ul:not(:first-child),.css-4okk7a ol:not(:first-child){padding-top:4px;} Test setup

Choose test type

t-test for the population mean, μ, based on one independent sample. Null hypothesis H0: μ = μ0

Alternative hypothesis H1

Test details

Significance level α

The probability that we reject a true H0 (type I error).

Degrees of freedom

Calculated as sample size minus one.
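
As a minimal sketch of the setup above, the t statistic and degrees of freedom for a one-sample t-test of H0: μ = μ0 could be computed as follows. The function name `one_sample_t` and the sample values are hypothetical, invented only for illustration.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mu = mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)           # sample standard deviation (n - 1 denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))  # distance from mu0 in standard-error units
    return t, n - 1                         # degrees of freedom: sample size minus one

# Hypothetical measurements, not taken from any real study.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]
t_stat, df = one_sample_t(sample, mu0=5.0)
print(f"t = {t_stat:.3f}, df = {df}")
```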

Test results

P-Value And Statistical Significance: What It Is & Why It Matters

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD, is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

The p-value in statistics quantifies the evidence against a null hypothesis. A low p-value suggests data is inconsistent with the null, potentially favoring an alternative hypothesis. Common significance thresholds are 0.05 or 0.01.

P-Value Explained in Normal Distribution

Hypothesis testing

When you perform a statistical test, a p-value helps you determine the significance of your results in relation to the null hypothesis.

The null hypothesis (H0) states that no relationship exists between the variables being studied (one variable does not affect the other). Under the null hypothesis, any pattern in the results is attributed to chance rather than to the effect being investigated. In other words, the null hypothesis assumes that the effect you are trying to demonstrate does not exist.

The alternative hypothesis (Ha or H1) is the one you would adopt if the null hypothesis is rejected.

The alternative hypothesis states that the independent variable affected the dependent variable, and the results are significant in supporting the theory being investigated (i.e., the results are not due to random chance).

What a p-value tells you

A p-value, or probability value, is a number describing how likely it is that you would observe data at least as extreme as yours if the null hypothesis were true. It is not the probability that the null hypothesis itself is true.

The level of statistical significance is often expressed as a p-value between 0 and 1.

The smaller the p-value, the less compatible your data are with the null hypothesis, and the stronger the evidence that you should reject it.

Remember, a p-value doesn’t tell you if the null hypothesis is true or false. It just tells you how likely you’d be to see the data you observed (or more extreme data) if the null hypothesis were true. It’s a piece of evidence, not definitive proof.

Example: Test Statistic and p-Value

Suppose you’re conducting a study to determine whether a new drug has an effect on pain relief compared to a placebo. If the new drug has no impact, your test statistic will tend to be close to the value predicted by the null hypothesis (no difference between the drug and placebo groups), and the resulting p-value will usually not be small. It will not necessarily be close to 1, because random sampling variation alone produces p-values spread across the whole range from 0 to 1. Conversely, if the new drug does reduce pain, your test statistic will tend to diverge further from what’s expected under the null hypothesis, and the p-value will decrease. In theory, the p-value never reaches exactly zero, because there is always a slim possibility that results this extreme arise by random chance alone.
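
To make the idea of "at least as extreme under the null hypothesis" concrete, here is a minimal sketch of an exact permutation test for a drug-versus-placebo comparison. It is a swapped-in technique chosen because it needs no distribution tables, not the method used in the example above. The scores and the function name `permutation_p_value` are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Under H0 the group labels are arbitrary, so every relabelling is equally likely.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical pain-reduction scores, invented for illustration.
drug = [4, 5, 6, 7, 8]
placebo = [1, 2, 3, 4, 5]
p = permutation_p_value(drug, placebo)
print(f"p = {p:.4f}")
```

With five scores per group there are only 252 relabellings, so the full enumeration is cheap. For larger samples, a random subset of relabellings is the usual substitute.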

P-value interpretation

The significance level (alpha) is a set probability threshold (often 0.05), while the p-value is the probability you calculate based on your study or analysis.

A p-value less than or equal to a predetermined significance level (often 0.05 or 0.01) is statistically significant, meaning the observed data provide evidence against the null hypothesis.

This suggests the effect under study likely represents a real relationship rather than just random chance.

For instance, if you set α = 0.05, you would reject the null hypothesis if your p-value ≤ 0.05.

Such a result indicates evidence against the null hypothesis, because data at least this extreme would occur no more than 5% of the time if the null hypothesis were true. It does not mean there is a 5% probability that the null hypothesis is correct.

Therefore, we reject the null hypothesis and accept the alternative hypothesis.

Example: Statistical Significance

Upon analyzing the pain relief effects of the new drug compared to the placebo, the computed p-value is less than 0.01, which falls well below the predetermined alpha value of 0.05. Consequently, you conclude that there is a statistically significant difference in pain relief between the new drug and the placebo.

What does a p-value of 0.001 mean?

A p-value of 0.001 is highly statistically significant beyond the commonly used 0.05 threshold. It indicates strong evidence of a real effect or difference, rather than just random variation.

Specifically, a p-value of 0.001 means there is only a 0.1% chance of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is correct.

Such a small p-value provides strong evidence against the null hypothesis, leading to rejecting the null in favor of the alternative hypothesis.

A p-value greater than the significance level (typically p > 0.05) is not statistically significant. It means the data do not provide sufficient evidence against the null hypothesis; it is not evidence for the null hypothesis.

This means we fail to reject the null hypothesis. Note that you cannot accept the null hypothesis; you can only reject it or fail to reject it.

Note: the p-value is not the probability that either hypothesis is true. A p-value below 0.05 does not mean that there is a 95% probability that the alternative hypothesis is true, and a p-value above 0.05 does not mean that the null hypothesis is probably true.

How do you calculate the p-value?

Most statistical software packages like R, SPSS, and others automatically calculate your p-value. This is the easiest and most common way.

Online resources and tables are available to estimate the p-value based on your test statistic and degrees of freedom.

These tables help you understand how often you would expect to see your test statistic under the null hypothesis.
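As a rough sketch of what such software or lookup tables do, the snippet below converts a t statistic into a p-value with SciPy. The values of `t_stat` and `df` are made up for illustration and are not taken from any study.

```python
from scipy import stats

t_stat = 2.5   # hypothetical test statistic (assumption for illustration)
df = 20        # hypothetical degrees of freedom (assumption for illustration)

# The survival function gives P(T > |t|) under the null hypothesis.
p_one_tailed = stats.t.sf(abs(t_stat), df)

# A two-tailed test counts equally extreme values in both tails.
p_two_tailed = 2 * p_one_tailed

print(f"one-tailed p = {p_one_tailed:.4f}")
print(f"two-tailed p = {p_two_tailed:.4f}")
```

With these inputs the two-tailed p-value falls between 0.01 and 0.05, so the result would be significant at α = 0.05 but not at α = 0.01.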

Understanding the Statistical Test:

Different statistical tests are designed to answer specific research questions or hypotheses. Each test has its own underlying assumptions and characteristics.

For example, you might use a t-test to compare means, a chi-squared test for categorical data, or a correlation test to measure the strength of a relationship between variables.

Be aware that the number of independent variables you include in your analysis can influence the magnitude of the test statistic needed to produce the same p-value.

This factor is particularly important to consider when comparing results across different analyses.

Example: Choosing a Statistical Test

If you’re comparing the effectiveness of just two different drugs in pain relief, a two-sample t-test is a suitable choice for comparing these two groups. However, when you’re examining the impact of three or more drugs, it’s more appropriate to employ an Analysis of Variance ( ANOVA) . Utilizing multiple pairwise comparisons in such cases can lead to artificially low p-values and an overestimation of the significance of differences between the drug groups.

How to report

A statistically significant result cannot prove that a research hypothesis is correct (which implies 100% certainty).

Instead, we may state our results “provide support for” or “give evidence for” our research hypothesis (as there is still a slight probability that the results occurred by chance and the null hypothesis was correct – e.g., less than 5%).

Example: Reporting the results

In our comparison of the pain relief effects of the new drug and the placebo, we observed that participants in the drug group experienced a significant reduction in pain ( M = 3.5; SD = 0.8) compared to those in the placebo group ( M = 5.2; SD = 0.7), resulting in an average difference of 1.7 points on the pain scale (t(98) = -9.36; p < 0.001).

The 6th edition of the APA style manual (American Psychological Association, 2010) states the following on the topic of reporting p-values:

“When reporting p values, report exact p values (e.g., p = .031) to two or three decimal places. However, report p values less than .001 as p < .001.

The tradition of reporting p values in the form p < .10, p < .05, p < .01, and so forth, was appropriate in a time when only limited tables of critical values were available.” (p. 114)

Do not use 0 before the decimal point for the statistical value p as it cannot exceed 1. In other words, write p = .001 instead of p = 0.001.
Please pay attention to issues of italics ( p is always italicized) and spacing (either side of the = sign).
p = .000 (as outputted by some statistical packages such as SPSS) is impossible and should be written as p < .001.
The opposite of significant is “nonsignificant,” not “insignificant.”
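The reporting rules above can be sketched as a small helper. The function name `format_p_apa` is hypothetical (it is not part of any library), and the choice of three decimal places is an assumption within the "two or three decimal places" the manual allows.

```python
def format_p_apa(p: float) -> str:
    """Format a p-value following the APA-style rules listed above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.001:
        # Never report p = .000; report an inequality instead.
        return "p < .001"
    # Three decimals, then drop the leading zero (0.031 -> .031).
    return "p = " + f"{p:.3f}".lstrip("0")


print(format_p_apa(0.031))
print(format_p_apa(0.0000004))
```

Italicizing p is a typesetting concern and is left to whatever tool renders the final report.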

Why is the p-value not enough?

A lower p-value is sometimes interpreted as meaning there is a stronger relationship between two variables.

However, statistical significance only means that data at least this extreme would be unlikely (occurring less than 5% of the time) if the null hypothesis were true. It says nothing about how large the effect is.

To understand the strength of the difference between the two groups (control vs. experimental) a researcher needs to calculate the effect size .
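One common effect size for a two-group comparison is Cohen's d. The sketch below applies it to the summary statistics from the reporting example (drug: M = 3.5, SD = 0.8; placebo: M = 5.2, SD = 0.7). It assumes equal group sizes, which the example does not state, so the pooled standard deviation here is an approximation under that assumption.

```python
import math

# Summary statistics from the reporting example
mean_drug, sd_drug = 3.5, 0.8
mean_placebo, sd_placebo = 5.2, 0.7

# Pooled standard deviation, ASSUMING equal group sizes
pooled_sd = math.sqrt((sd_drug**2 + sd_placebo**2) / 2)

# Cohen's d: the mean difference expressed in standard-deviation units
cohens_d = (mean_drug - mean_placebo) / pooled_sd

print(f"Cohen's d = {cohens_d:.2f}")
```

A |d| above roughly 0.8 is conventionally described as a large effect, so this difference would be large in practical terms as well as statistically significant.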

When do you reject the null hypothesis?

In statistical hypothesis testing, you reject the null hypothesis when the p-value is less than or equal to the significance level (α) you set before conducting your test. The significance level is the probability of rejecting the null hypothesis when it is true. Commonly used significance levels are 0.01, 0.05, and 0.10.

Remember, rejecting the null hypothesis doesn’t prove the alternative hypothesis; it just suggests that the alternative hypothesis may be plausible given the observed data.

The p-value is conditional upon the null hypothesis being true but is unrelated to the truth or falsity of the alternative hypothesis.

What does p-value of 0.05 mean?

If your p-value is less than or equal to 0.05 (the significance level), you would conclude that your result is statistically significant. This means the evidence is strong enough to reject the null hypothesis in favor of the alternative hypothesis.

Are all p-values below 0.05 considered statistically significant?

Not necessarily. The 0.05 threshold is commonly used, but it is only a convention: whether a p-value counts as statistically significant depends on the significance level chosen before the analysis. With α = 0.01, for example, a p-value of 0.03 is not statistically significant.

A p-value below 0.05 means there is evidence against the null hypothesis, suggesting a real effect. However, it’s essential to consider the context and other factors when interpreting results.

Researchers also look at effect size and confidence intervals to determine the practical significance and reliability of findings.

How does sample size affect the interpretation of p-values?

Sample size can impact the interpretation of p-values. A larger sample size provides more reliable and precise estimates of the population, leading to narrower confidence intervals.

With a larger sample, even small differences between groups or effects can become statistically significant, yielding lower p-values. In contrast, smaller sample sizes may not have enough statistical power to detect smaller effects, resulting in higher p-values.

Therefore, a larger sample size increases the chances of finding statistically significant results when there is a genuine effect, making the findings more trustworthy and robust.
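To make the role of sample size concrete, the sketch below holds a small effect fixed (a mean 0.1 standard deviations away from the hypothesized value) and changes only n. It uses a simple z-test with a known standard deviation; the helper name and all numbers are illustrative assumptions.

```python
import math
from scipy.stats import norm


def two_tailed_p_for_mean(diff: float, sd: float, n: int) -> float:
    """Two-tailed z-test p-value for a sample mean that differs from the
    hypothesized mean by `diff`, given a known standard deviation `sd`."""
    z = diff / (sd / math.sqrt(n))
    return 2 * norm.sf(abs(z))


p_small_sample = two_tailed_p_for_mean(diff=0.1, sd=1.0, n=25)
p_large_sample = two_tailed_p_for_mean(diff=0.1, sd=1.0, n=2500)

print(f"n = 25:   p = {p_small_sample:.4f}")
print(f"n = 2500: p = {p_large_sample:.2e}")
```

The same observed difference is far from significant with 25 observations and highly significant with 2,500, which is why effect sizes should be reported alongside p-values.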

Can a non-significant p-value indicate that there is no effect or difference in the data?

No, a non-significant p-value does not necessarily indicate that there is no effect or difference in the data. It means that the observed data do not provide strong enough evidence to reject the null hypothesis.

There could still be a real effect or difference, but it might be smaller or more variable than the study was able to detect.

Other factors like sample size, study design, and measurement precision can influence the p-value. It’s important to consider the entire body of evidence and not rely solely on p-values when interpreting research findings.

Can P values be exactly zero?

While a p-value can be extremely small, it cannot be exactly zero. When a p-value is reported as p = 0.000, the actual p-value is too small for the software to display. This is often interpreted as strong evidence against the null hypothesis. For p-values less than 0.001, report p < .001.

Further Information

P-values and significance tests (Khan Academy)
Hypothesis testing and p-values (Khan Academy)
Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond “p < 0.05”.
Criticism of using the “p < 0.05”.
Publication manual of the American Psychological Association
Statistics for Psychology Book Download

Bland, J. M., & Altman, D. G. (1994). One and two sided tests of significance: Authors’ reply. BMJ: British Medical Journal , 309 (6958), 874.

Goodman, S. N., & Royall, R. (1988). Evidence and scientific research. American Journal of Public Health , 78 (12), 1568-1574.

Goodman, S. (2008, July). A dirty dozen: twelve p-value misconceptions . In Seminars in hematology (Vol. 45, No. 3, pp. 135-140). WB Saunders.

Lang, J. M., Rothman, K. J., & Cann, C. I. (1998). That confounded P-value. Epidemiology (Cambridge, Mass.) , 9 (1), 7-8.

Statistics - Hypothesis Testing a Proportion (Two Tailed)

A population proportion is the share of a population that belongs to a particular category .

Hypothesis tests are used to check a claim about the size of that population proportion.

Hypothesis Testing a Proportion

The following steps are used for a hypothesis test:

Check the conditions
Define the claims
Decide the significance level
Calculate the test statistic
Make a conclusion

For example:

Population : Nobel Prize winners
Category : Women

And we want to check the claim:

"The share of Nobel Prize winners that are women is not 50%"

By taking a sample of 100 randomly selected Nobel Prize winners we could find that:

10 out of 100 Nobel Prize winners in the sample were women

The sample proportion is then: $\displaystyle \frac{10}{100} = 0.1$, or 10%.

From this sample data we check the claim with the steps below.

1. Checking the Conditions

The conditions for a hypothesis test of a proportion are:

The sample is randomly selected
There are only two options: being in the category, or not being in the category
The sample needs at least 5 members in the category and at least 5 members not in the category

In our example, 10 of the 100 randomly selected people were women.

The rest were not women, so there are 90 in the other category.

The conditions are fulfilled in this case.

Note: It is possible to do a hypothesis test without having 5 of each category. But special adjustments need to be made.

2. Defining the Claims

We need to define a null hypothesis ($H_{0}$) and an alternative hypothesis ($H_{1}$) based on the claim we are checking.

The claim was:

"The share of Nobel Prize winners that are women is not 50%"

In this case, the parameter is the proportion of Nobel Prize winners that are women ($p$).

The null and alternative hypotheses are then:

Null hypothesis : 50% of Nobel Prize winners were women.

Alternative hypothesis : The share of Nobel Prize winners that are women is not 50%

Which can be expressed with symbols as:

$H_{0}$: $p = 0.50 $

$H_{1}$: $p \neq 0.50 $

This is a 'two-tailed' test, because the alternative hypothesis claims that the proportion is different (larger or smaller) than in the null hypothesis.

If the data supports the alternative hypothesis, we reject the null hypothesis and accept the alternative hypothesis.

3. Deciding the Significance Level

The significance level ($\alpha$) is the uncertainty we accept when rejecting the null hypothesis in a hypothesis test.

The significance level is a percentage probability of accidentally making the wrong conclusion.

Typical significance levels are:

$\alpha = 0.1$ (10%)
$\alpha = 0.05$ (5%)
$\alpha = 0.01$ (1%)

A lower significance level means that the evidence in the data needs to be stronger to reject the null hypothesis.

There is no "correct" significance level - it only states the uncertainty of the conclusion.

Note: A 5% significance level means that when the null hypothesis is actually true:

We expect to reject it (wrongly) in 5 out of 100 tests.

4. Calculating the Test Statistic

The test statistic is used to decide the outcome of the hypothesis test.

The test statistic is a standardized value calculated from the sample.

The formula for the test statistic (TS) of a population proportion is:

$\displaystyle \frac{\hat{p} - p}{\sqrt{p(1-p)}} \cdot \sqrt{n} $

$\hat{p}-p$ is the difference between the sample proportion ($\hat{p}$) and the claimed population proportion ($p$).

$n$ is the sample size.

In our example:

The claimed ($H_{0}$) population proportion ($p$) was $ 0.50 $

The sample proportion ($\hat{p}$) was 10 out of 100, or $\displaystyle \frac{10}{100} = 0.10$

The sample size ($n$) was $100$

So the test statistic (TS) is then:

$\displaystyle \frac{0.1-0.5}{\sqrt{0.5(1-0.5)}} \cdot \sqrt{100} = \frac{-0.4}{\sqrt{0.5(0.5)}} \cdot \sqrt{100} = \frac{-0.4}{\sqrt{0.25}} \cdot \sqrt{100} = \frac{-0.4}{0.5} \cdot 10 = \underline{-8}$

You can also calculate the test statistic using programming language functions:

With Python use the scipy and math libraries to calculate the test statistic for a proportion.
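A minimal sketch of that calculation follows. For this step only the standard `math` module is needed; the variable names are illustrative.

```python
import math

sample_count = 10    # women in the sample
sample_size = 100    # n
claimed_p = 0.50     # population proportion under the null hypothesis

# Sample proportion (p-hat)
p_hat = sample_count / sample_size

# (p_hat - p) / sqrt(p * (1 - p)) * sqrt(n), matching the formula above
test_statistic = (
    (p_hat - claimed_p)
    / math.sqrt(claimed_p * (1 - claimed_p))
    * math.sqrt(sample_size)
)

print(test_statistic)
```

This reproduces the hand calculation of approximately -8, up to floating-point rounding.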

With R use the built-in math functions to calculate the test statistic for a proportion.

5. Concluding

There are two main approaches for making the conclusion of a hypothesis test:

The critical value approach compares the test statistic with the critical value of the significance level.
The P-value approach compares the P-value of the test statistic with the significance level.

Note: The two approaches are only different in how they present the conclusion.

The Critical Value Approach

For the critical value approach we need to find the critical value (CV) of the significance level ($\alpha$).

For a population proportion test, the critical value (CV) is a Z-value from a standard normal distribution .

This critical Z-value (CV) defines the rejection region for the test.

The rejection region is an area of probability in the tails of the standard normal distribution.

Because the claim is that the population proportion is different from 50%, the rejection region is split into both the left and right tail:

Choosing a significance level ($\alpha$) of 0.01, or 1%, we can find the critical Z-value from a Z-table , or with a programming language function:

Note: Because this is a two-tailed test the tail area ($\alpha$) needs to be split in half (divided by 2).

With Python use the Scipy Stats library norm.ppf() function to find the Z-value for an $\alpha$/2 = 0.005 in the left tail.
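A short sketch of this lookup, assuming SciPy is installed:

```python
from scipy.stats import norm

alpha = 0.01

# Two-tailed test: split alpha evenly between the two tails.
tail_area = alpha / 2

# ppf is the inverse CDF: the Z-value with `tail_area` probability to its left.
critical_left = norm.ppf(tail_area)

# By symmetry, the right-tail critical value has 1 - tail_area to its left.
critical_right = norm.ppf(1 - tail_area)

print(critical_left, critical_right)
```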

With R use the built-in qnorm() function to find the Z-value for an $\alpha$/2 = 0.005 in the left tail.

Using either method we can find that the critical Z-value in the left tail is $\approx \underline{-2.5758}$

Since a normal distribution is symmetric, we know that the critical Z-value in the right tail will be the same number, only positive: $\underline{2.5758}$

For a two-tailed test we need to check if the test statistic (TS) is smaller than the negative critical value (-CV), or bigger than the positive critical value (CV).

If the test statistic is smaller than the negative critical value, the test statistic is in the rejection region .

If the test statistic is bigger than the positive critical value, the test statistic is in the rejection region .

When the test statistic is in the rejection region, we reject the null hypothesis ($H_{0}$).

Here, the test statistic (TS) was $\approx \underline{-8}$ and the critical value was $\approx \underline{-2.5758}$

Here is an illustration of this test in a graph:

Since the test statistic was smaller than the negative critical value we reject the null hypothesis.

This means that the sample data supports the alternative hypothesis.

And we can summarize the conclusion stating:

The sample data supports the claim that "The share of Nobel Prize winners that are women is not 50%" at a 1% significance level .

The P-Value Approach

For the P-value approach we need to find the P-value of the test statistic (TS).

If the P-value is smaller than the significance level ($\alpha$), we reject the null hypothesis ($H_{0}$).

The test statistic was found to be $ \approx \underline{-8} $

For a population proportion test, the test statistic is a Z-Value from a standard normal distribution .

Because this is a two-tailed test, we need to find the P-value of a Z-value smaller than -8 and multiply it by 2 .

We can find the P-value using a Z-table , or with a programming language function:

With Python use the Scipy Stats library norm.cdf() function to find the P-value of a Z-value smaller than -8 for a two-tailed test:
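A minimal sketch, assuming SciPy is installed:

```python
from scipy.stats import norm

test_statistic = -8

# norm.cdf gives the left-tail area P(Z < -8).
left_tail = norm.cdf(test_statistic)

# Two-tailed test: double the single-tail area.
p_value = 2 * left_tail

print(p_value)
```

The result is on the order of 10^-15, far below any of the usual significance levels.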

With R use the built-in pnorm() function to find the P-value of a Z-value smaller than -8 for a two-tailed test:

Using either method we can find that the P-value is $\approx \underline{1.24 \cdot 10^{-15}}$ or $0.00000000000000124$

This tells us that the significance level ($\alpha$) would need to be bigger than 0.000000000000124% to reject the null hypothesis.

This P-value is smaller than any of the common significance levels (10%, 5%, 1%).

So the null hypothesis is rejected at all of these significance levels.

The sample data supports the claim that "The share of Nobel Prize winners that are women is not 50%" at a 10%, 5%, and 1% significance level .

Calculating a P-Value for a Hypothesis Test with Programming

Many programming languages can calculate the P-value to decide the outcome of a hypothesis test.

Using software and programming to calculate statistics is more common for bigger sets of data, as calculating manually becomes difficult.

The P-value calculated here will tell us the lowest possible significance level where the null-hypothesis can be rejected.

With Python use the scipy and math libraries to calculate the P-value for a two-tailed hypothesis test for a proportion.

Here, the sample size is 100, the occurrences are 10, and the test is for a proportion different from 0.50.
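The whole test can be sketched as one function. The name `two_tailed_proportion_test` is hypothetical, and the function uses the normal approximation described in this tutorial rather than an exact binomial test.

```python
import math
from scipy.stats import norm


def two_tailed_proportion_test(count: int, n: int, p0: float):
    """Return (z, p_value) for H0: p = p0 against H1: p != p0,
    using the normal approximation."""
    p_hat = count / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    # Area in both tails beyond |z|
    p_value = 2 * norm.cdf(-abs(z))
    return z, p_value


z, p_value = two_tailed_proportion_test(count=10, n=100, p0=0.50)

alpha = 0.01
reject_null = p_value < alpha

print(f"z = {z:.2f}, p-value = {p_value:.3g}, reject H0: {reject_null}")
```

Using `-abs(z)` means the same code works whether the sample proportion falls below or above the claimed proportion.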

With R use the built-in prop.test() function to find the P-value for a two-tailed hypothesis test for a proportion.

Here, the sample size is 100, the occurrences are 10, and the test is for a proportion different from 0.50.

Note: The conf.level in the R code is the reverse of the significance level.

Here, the significance level is 0.01, or 1%, so the conf.level is 1-0.01 = 0.99, or 99%.

Left-Tailed and Right-Tailed Tests

This was an example of a two-tailed test, where the alternative hypothesis claimed that the parameter is different from the null hypothesis claim.

You can check out an equivalent step-by-step guide for other types here:

Right-Tailed Test
Left-Tailed Test

How to Perform T-Tests in Python (One- and Two-Sample)

February 12, 2024 January 13, 2024

In this post, you’ll learn how to perform t-tests in Python using the popular SciPy library . T-tests are used to test for statistical significance and can be hugely advantageous when working with smaller sample sizes.

By the end of this tutorial, you’ll have learned the following:

What the different t-tests are and when they should be applied
How to perform a one-sample t-test and a two-sample t-test in Python
How to interpret the results from your statistical tests

Understanding the T-Test

The t-test, often referred to as Student’s t-test, dates back to the early 20th century. William Sealy Gosset, an English statistician working for the Guinness Brewery in Dublin, introduced the concept. Because the brewery worked with small sample sizes and required confidentiality, Gosset published his findings under the pseudonym “Student”. His seminal work, “The Probable Error of a Mean,” laid the groundwork for what we now know as Student’s t-test.

This leads us to one of the primary benefits of the t-test: the t-test is able to make reliable inferences about a population using a small sample size . Let’s explore how this works by discussing the theory behind the t-test in the following section.

Understanding the Student’s T-Test

Statistical tests are used to make inferences about population parameters. For example, a t-test lets us test whether or not the average test score for a given group of students is 70%. The t-test comes in two main forms:

The one-sample t-test allows us to test whether or not the population mean is equal to some value
The two-sample t-test allows us to test whether or not two population means are equal

Let’s explore these in a little more depth.

Understanding the One-Sample T-Test

The one-sample t-test is used to test the null hypothesis that the population mean, inferred from a sample, is equal to some given value. It can be written as:

$H_{0}$: $\mu = \mu_{0}$

There are actually three different alternative hypotheses:

Two-tailed : The population mean is not equal to some given value
Left-tailed : The population mean is less than some given value
Right-tailed : The population mean is greater than some given value

We can use the following formula to calculate our test statistic:

$t = \dfrac{\bar{x} - \mu_{0}}{s / \sqrt{n}}$

$\bar{x}$: the sample mean
$\mu_{0}$: a hypothesized population mean
$s$: the sample standard deviation
$n$: the sample size

We then need to calculate the p-value using degrees of freedom equal to n – 1. If the p-value is less than your chosen significance level, we can reject the null hypothesis and say that the means differ.

Understanding the Two-Sample T-Test

The two-sample t-test is used to test whether two population means are equal (or if they differ in a significant way). In this case, the null hypothesis assumes that the two population means are equal.

When we sample two different groups, we are almost guaranteed that their sample means will differ. But the t-test allows us to test whether or not this difference is different in a statistically significant way.

Similar to the one-sample t-test, there are three different alternative hypotheses:

Two-tailed : The two means are not equal
Left-tailed : Population mean #1 is less than population mean #2
Right-tailed : Population mean #1 is greater than population mean #2

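The formula for the two-sample t-test, assuming equal variances (the pooled form), can be written as:

$t = \dfrac{\bar{X}_{1} - \bar{X}_{2}}{\sqrt{s_{p}^{2} \left( \frac{1}{n_{1}} + \frac{1}{n_{2}} \right)}}$, with pooled variance $s_{p}^{2} = \dfrac{(n_{1} - 1) s_{1}^{2} + (n_{2} - 1) s_{2}^{2}}{n_{1} + n_{2} - 2}$

$\bar{X}_{1}$ and $\bar{X}_{2}$ are the sample means of the two groups.
$s_{1}^{2}$ and $s_{2}^{2}$ are the sample variances of the two groups.
$n_{1}$ and $n_{2}$ are the sample sizes of the two groups.

We then need to calculate the p-value using degrees of freedom equal to $n_{1} + n_{2} - 2$. If the p-value is less than your chosen significance level, we can reject the null hypothesis and say that the means differ.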
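As a sanity check on the degrees of freedom, the sketch below computes a pooled-variance t statistic by hand and compares it with `scipy.stats.ttest_ind`. The equal-variance (pooled) form is an assumption here, and the two samples are made up for illustration. For this form the degrees of freedom are n1 + n2 - 2.

```python
import math
import statistics
from scipy import stats

# Hypothetical samples (made-up values for illustration only)
group_1 = [72, 75, 78, 70, 74, 77, 73, 76, 71, 79]
group_2 = [82, 85, 88, 80, 84, 87, 83, 86, 81, 89]

n1, n2 = len(group_1), len(group_2)
mean_1, mean_2 = statistics.mean(group_1), statistics.mean(group_2)
var_1, var_2 = statistics.variance(group_1), statistics.variance(group_2)

# Pooled variance weights each sample variance by its degrees of freedom.
pooled_var = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / (n1 + n2 - 2)

t_manual = (mean_1 - mean_2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
p_manual = 2 * stats.t.sf(abs(t_manual), df)

# SciPy's default (equal_var=True) uses the same pooled form.
scipy_result = stats.ttest_ind(group_1, group_2)

print(t_manual, scipy_result.statistic)
print(p_manual, scipy_result.pvalue)
```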

Requirements for the Student T-Test

Both types of t-tests follow a key set of assumptions, including:

Observations should be independent of one another
The data should be relatively normally distributed
The samples should have approximately equal variances (this only applies to the two-sample t-test)
The samples were collected using random sampling

It’s easy to test for these assumptions using Python (and I have included links to tutorials covering how to do this). Let’s take a look at example walkthroughs of how to conduct both of these tests in Python.

Perform a One-Sample T-Test in Python

In this section, you’ll learn how to conduct a one-sample t-test in Python. Suppose you are a teacher and have just given a test. You know that the population mean for this test is 85% and you want to see whether the score of the class is significantly different from this population mean.

Let’s start by importing our required function, ttest_1samp() from SciPy and defining our data:
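The original listing is not reproduced here, so the following is a stand-in sketch. The scores are hypothetical values chosen for illustration; only the population mean of 85 comes from the scenario described.

```python
from scipy.stats import ttest_1samp

# Hypothetical class scores in percent (made up for illustration)
sample_scores = [88, 92, 79, 85, 90, 83, 81, 87, 94, 78, 86, 84]

# Known population mean for the test, from the scenario
population_mean = 85
```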

In the code block above, we first imported our required library. We then defined our sample as a list of values and defined our population mean as its own variable.

We can now pass these values into the function, as shown below:
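A self-contained sketch of the call follows, repeating the hypothetical scores so that it runs on its own.

```python
from scipy.stats import ttest_1samp

# Hypothetical class scores in percent (made up for illustration)
sample_scores = [88, 92, 79, 85, 90, 83, 81, 87, 94, 78, 86, 84]
population_mean = 85

# Two-sided by default: H0 is that the class mean equals population_mean.
result = ttest_1samp(sample_scores, population_mean)

print(f"t-statistic: {result.statistic:.4f}")
print(f"p-value: {result.pvalue:.4f}")
```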

The function returns a test statistic and the corresponding p-value. We can print these values out using f-strings to simplify the labeling , as shown above.

Finally, we can write a simple if-else statement to evaluate whether or not our sample mean is significantly different from the population mean:
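One way to write that check is sketched below, again with the hypothetical scores and an assumed significance level of 0.05.

```python
from scipy.stats import ttest_1samp

# Hypothetical class scores in percent (made up for illustration)
sample_scores = [88, 92, 79, 85, 90, 83, 81, 87, 94, 78, 86, 84]
population_mean = 85
alpha = 0.05  # assumed significance level

t_stat, p_value = ttest_1samp(sample_scores, population_mean)

if p_value < alpha:
    conclusion = "Reject the null hypothesis: the class mean differs significantly."
else:
    conclusion = "Fail to reject the null hypothesis: no significant difference."

print(conclusion)
```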

Running this if-else statement, we can see that the test indicates no significant difference in the exam scores.

In order to calculate the different one-sample t-test alternative hypotheses, we can use the alternative= parameter:

alternative='two-sided' is the default value, checking for a two-sided alternative hypothesis
alternative='less' tests whether the mean of the population underlying the sample is less than the given population mean
alternative='greater' tests whether the mean of the population underlying the sample is greater than the given population mean

Now that you have a strong understanding of how to perform a one-sample t-test, let’s dive into the exciting world of two-sample t-tests!

Perform a Two-Sample T-Test in Python

A two-sample t-test is used to test whether the means of two populations are equal, based on a sample from each. The test requires that both samples be normally distributed, have similar variances, and be independent of one another.

Imagine that we want to compare the test scores of two different classes. This is the perfect example of when to use a t-test. Let’s begin by running a two-tailed test, which only evaluates whether or not the two means are equal. It begins with the null hypothesis, which states that the two means are equal.

Let’s take a look at how we can run a two-tailed t-test in Python:
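The original listing is not reproduced here, so this is a stand-in sketch with made-up scores for the two classes.

```python
from scipy.stats import ttest_ind

# Hypothetical test scores for two classes (made up for illustration)
class_1 = [72, 75, 78, 70, 74, 77, 73, 76, 71, 79]
class_2 = [82, 85, 88, 80, 84, 87, 83, 86, 81, 89]

# Two-sided by default; equal_var=True (the default) assumes similar variances.
result = ttest_ind(class_1, class_2)

print(f"t-statistic: {result.statistic:.4f}")
print(f"p-value: {result.pvalue:.6f}")
```

The sign of the statistic follows the argument order: it is negative here because the first sample has the lower mean.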

We can see that the ttest_ind() function returns both a test statistic and a p-value. We can run a simple if-else statement to check whether or not we can reject or fail to reject the null hypothesis:
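A sketch of that check, using the same hypothetical class scores and an assumed significance level of 0.05:

```python
from scipy.stats import ttest_ind

# Hypothetical test scores for two classes (made up for illustration)
class_1 = [72, 75, 78, 70, 74, 77, 73, 76, 71, 79]
class_2 = [82, 85, 88, 80, 84, 87, 83, 86, 81, 89]
alpha = 0.05  # assumed significance level

t_stat, p_value = ttest_ind(class_1, class_2)

if p_value < alpha:
    decision = "Reject the null hypothesis: the two means differ significantly."
else:
    decision = "Fail to reject the null hypothesis."

print(decision)
```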

We can see that there is a significant difference between the two sets of scores. However, the two-tailed test doesn’t tell us in which direction.

In order to do this, we need to use a right- or left-tailed two-sample t-test. To do this in SciPy, we use the alternative= parameter. By default, this is set to 'two-sided' . However, we can modify this to either 'less' or 'greater' , if we want to evaluate whether or not the mean for one sample is less than or greater than another.

Let’s see how we can check if the mean of class 2 is significantly higher than that of class 1:
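A sketch of the one-tailed version follows, with the same hypothetical scores. With `alternative='greater'`, SciPy tests whether the mean of the first argument is greater than the mean of the second, so class 2 is passed first.

```python
from scipy.stats import ttest_ind

# Hypothetical test scores for two classes (made up for illustration)
class_1 = [72, 75, 78, 70, 74, 77, 73, 76, 71, 79]
class_2 = [82, 85, 88, 80, 84, 87, 83, 86, 81, 89]

# H1: mean(class_2) > mean(class_1)
result = ttest_ind(class_2, class_1, alternative="greater")

print(f"t-statistic: {result.statistic:.4f}")
print(f"one-tailed p-value: {result.pvalue:.6f}")
```

Equivalently, `ttest_ind(class_1, class_2, alternative="less")` tests the same hypothesis with the argument order reversed.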

Because our p-value is less than our defined value of 0.05, we can say that the mean of class 2 is higher with statistical significance.

In conclusion, this comprehensive guide has equipped you with the knowledge and practical skills to perform t-tests in Python using the SciPy library. T-tests are invaluable tools for assessing statistical significance, particularly when working with smaller sample sizes.

Throughout this tutorial, you’ve gained insights into:

The different types of t-tests and their applications.
How to conduct one-sample and two-sample t-tests in Python.
Interpretation of results obtained from statistical tests.

Remember that t-tests come with certain assumptions, and it’s crucial to validate them before applying these tests to your data. Python provides tools to check these assumptions, ensuring the robustness and reliability of your statistical analyses.

To learn more about these functions, check out the official documentation for the one-sample t-test and for the two-sample t-test in SciPy.

Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials. View Author posts

6.4: One- and Two-Tailed Tests

Rice University

Learning Objectives

Define Type I and Type II errors
Interpret significant and non-significant differences
Explain why the null hypothesis should not be accepted when the effect is not significant

In the James Bond case study, Mr. Bond was given $16$ trials on which he judged whether a martini had been shaken or stirred. He was correct on $13$ of the trials. From the binomial distribution, we know that the probability of being correct $13$ or more times out of $16$ if one is only guessing is $0.0106$. Figure 1 shows a graph of the binomial distribution. The red bars show the values greater than or equal to $13$. As you can see in the figure, the probabilities are calculated for the upper tail of the distribution. A probability calculated in only one tail of the distribution is called a "one-tailed probability."

A slightly different question can be asked of the data: "What is the probability of getting a result as extreme or more extreme than the one observed?" Since the chance expectation is $8/16$, a result of $3/16$ is equally as extreme as $13/16$. Thus, to calculate this probability, we would consider both tails of the distribution. Since the binomial distribution is symmetric when $\pi =0.5$, this probability is exactly double the one-tailed probability computed previously. Therefore, $p = 0.0213$ (doubling the rounded value $0.0106$ gives $0.0212$). A probability calculated in both tails of a distribution is called a "two-tailed probability" (see Figure 2).
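Both tail probabilities can be checked with a few lines of SciPy. This is a sketch with illustrative variable names, not the calculator used in the original text.

```python
from scipy.stats import binom

n_trials = 16
p_guess = 0.5  # probability of a correct call when only guessing

# Upper tail: P(X >= 13) equals P(X > 12), which is the survival function at 12.
p_upper = binom.sf(12, n_trials, p_guess)

# Lower tail: P(X <= 3), the equally extreme results in the other direction.
p_lower = binom.cdf(3, n_trials, p_guess)

p_two_tailed = p_upper + p_lower

print(f"one-tailed: {p_upper:.4f}")
print(f"two-tailed: {p_two_tailed:.4f}")
```

Because the distribution is symmetric at 0.5, the two tails are equal and the two-tailed probability is exactly twice the one-tailed one.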

Should the one-tailed or the two-tailed probability be used to assess Mr. Bond's performance? That depends on the way the question is posed. If we are asking whether Mr. Bond can tell the difference between shaken or stirred martinis, then we would conclude he could if he performed either much better than chance or much worse than chance. If he performed much worse than chance, we would conclude that he can tell the difference, but he does not know which is which. Therefore, since we are going to reject the null hypothesis if Mr. Bond does either very well or very poorly, we will use a two-tailed probability.

On the other hand, if our question is whether Mr. Bond is better than chance at determining whether a martini is shaken or stirred, we would use a one-tailed probability. What would the one-tailed probability be if Mr. Bond were correct on only $3$ of the $16$ trials? Since the one-tailed probability is the probability of the right-hand tail, it would be the probability of getting $3$ or more correct out of $16$. This is a very high probability and the null hypothesis would not be rejected.
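The "very high probability" mentioned above can be computed directly; a minimal sketch:

```python
from scipy.stats import binom

n_trials = 16
p_guess = 0.5

# Right-tail probability of 3 or more correct: P(X >= 3) = P(X > 2)
p_three_or_more = binom.sf(2, n_trials, p_guess)

print(f"P(X >= 3) = {p_three_or_more:.4f}")
```

A value this close to 1 gives no grounds for rejecting the one-tailed null hypothesis, even though 3 out of 16 is as far from chance as 13 out of 16.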

The null hypothesis for the two-tailed test is $\pi =0.5$. By contrast, the null hypothesis for the one-tailed test is $\pi \leq 0.5$. Accordingly, we reject the two-tailed hypothesis if the sample proportion deviates greatly from $0.5$ in either direction. The one-tailed hypothesis is rejected only if the sample proportion is much greater than $0.5$. The alternative hypothesis in the two-tailed test is $\pi \neq 0.5$. In the one-tailed test it is $\pi > 0.5$.

You should always decide whether you are going to use a one-tailed or a two-tailed probability before looking at the data. Statistical tests that compute one-tailed probabilities are called one-tailed tests; those that compute two-tailed probabilities are called two-tailed tests. Two-tailed tests are much more common than one-tailed tests in scientific research because an outcome signifying that something other than chance is operating is usually worth noting. One-tailed tests are appropriate when it is not important to distinguish between no effect and an effect in the unexpected direction. For example, consider an experiment designed to test the efficacy of a treatment for the common cold. The researcher would only be interested in whether the treatment was better than a placebo control. It would not be worth distinguishing between the case in which the treatment was worse than a placebo and the case in which it was the same because in both cases the drug would be worthless.

Some have argued that a one-tailed test is justified whenever the researcher predicts the direction of an effect. The problem with this argument is that if the effect comes out strongly in the non-predicted direction, the researcher is not justified in concluding that the effect is not zero. Since this is unrealistic, one-tailed tests are usually viewed skeptically if justified on this basis alone.

I made this "mental map" to help choose what hypothesis test to perform. Can you help me confirm if this is correct?

I came up with this guide. I'm just starting to learn about hypothesis testing, so that's why there are only Z-test and t-test options. I plan on simplifying it later on. "c" is for "constant" and "p" is for "proportion". On top of each of the six blocks are the conditions for that block; can you help me confirm if those conditions are correct? They use an inclusive "or" by the way. Help is very appreciated

Assumptions all tests have: ( the sample(s) is/are random ) ( the population must be approximately normally distributed (in two-sample tests, both must be) )

Assumptions two-sample tests have: ( the two samples are independent of each other )

( σ is unknown ) and ( sample size < 30 )
H0 claims µ = c, H1 claims µ ≠ c. This will lead to a double-tailed one-sample t-test.
H0 claims µ = c, H1 claims µ < c. This will lead to a left-tailed one-sample t-test.
H0 claims µ = c, H1 claims µ > c. This will lead to a right-tailed one-sample t-test.
H0 claims µ ≤ c, H1 claims µ > c. This will lead to a right-tailed one-sample t-test.
H0 claims µ ≥ c, H1 claims µ < c. This will lead to a left-tailed one-sample t-test.

( σ is known ) or ( sample size ≥ 30 )
H0 claims µ = c, H1 claims µ ≠ c. This will lead to a double-tailed one-sample Z-test.
H0 claims µ = c, H1 claims µ < c. This will lead to a left-tailed one-sample Z-test.
H0 claims µ = c, H1 claims µ > c. This will lead to a right-tailed one-sample Z-test.
H0 claims µ ≤ c, H1 claims µ > c. This will lead to a right-tailed one-sample Z-test.
H0 claims µ ≥ c, H1 claims µ < c. This will lead to a left-tailed one-sample Z-test.

( σ is known ) or ( sample size ≥ 30 )
H0 claims p = c, H1 claims p ≠ c. This will lead to a double-tailed one-sample Z-test.
H0 claims p = c, H1 claims p < c. This will lead to a left-tailed one-sample Z-test.
H0 claims p = c, H1 claims p > c. This will lead to a right-tailed one-sample Z-test.
H0 claims p ≤ c, H1 claims p > c. This will lead to a right-tailed one-sample Z-test.
H0 claims p ≥ c, H1 claims p < c. This will lead to a left-tailed one-sample Z-test.

( σ is unknown ) and (( sample A's size < 30 ) or ( sample B's size < 30 ))
H0 claims µ₁ = µ₂, H1 claims µ₁ ≠ µ₂. This will lead to a double-tailed two-sample t-test.
H0 claims µ₁ = µ₂, H1 claims µ₁ < µ₂. This will lead to a left-tailed two-sample t-test.
H0 claims µ₁ = µ₂, H1 claims µ₁ > µ₂. This will lead to a right-tailed two-sample t-test.
H0 claims µ₁ ≤ µ₂, H1 claims µ₁ > µ₂. This will lead to a right-tailed two-sample t-test.
H0 claims µ₁ ≥ µ₂, H1 claims µ₁ < µ₂. This will lead to a left-tailed two-sample t-test.

( σ is known ) or (( sample A's size ≥ 30 ) and ( sample B's size ≥ 30 ))
H0 claims µ₁ = µ₂, H1 claims µ₁ ≠ µ₂. This will lead to a double-tailed two-sample Z-test.
H0 claims µ₁ = µ₂, H1 claims µ₁ < µ₂. This will lead to a left-tailed two-sample Z-test.
H0 claims µ₁ = µ₂, H1 claims µ₁ > µ₂. This will lead to a right-tailed two-sample Z-test.
H0 claims µ₁ ≤ µ₂, H1 claims µ₁ > µ₂. This will lead to a right-tailed two-sample Z-test.
H0 claims µ₁ ≥ µ₂, H1 claims µ₁ < µ₂. This will lead to a left-tailed two-sample Z-test.

( σ is known ) or (( sample A's size ≥ 30 ) and ( sample B's size ≥ 30 ))
H0 claims p₁ = p₂, H1 claims p₁ ≠ p₂. This will lead to a double-tailed two-sample Z-test.
H0 claims p₁ = p₂, H1 claims p₁ < p₂. This will lead to a left-tailed two-sample Z-test.
H0 claims p₁ = p₂, H1 claims p₁ > p₂. This will lead to a right-tailed two-sample Z-test.
H0 claims p₁ ≤ p₂, H1 claims p₁ > p₂. This will lead to a right-tailed two-sample Z-test.
H0 claims p₁ ≥ p₂, H1 claims p₁ < p₂. This will lead to a left-tailed two-sample Z-test.
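For anyone who wants to try the map out, the one-sample mean blocks could be written down as a small function. This is only a sketch of the rules exactly as posted above, not a check that they are correct; the function name `choose_one_sample_mean_test` and its parameters are made up for this illustration.

```python
def choose_one_sample_mean_test(sigma_known: bool, n: int, h1: str) -> str:
    """Return the test named by the posted map for a one-sample mean.

    h1 is the relation claimed by the alternative hypothesis:
    '!=' (mu != c), '<' (mu < c) or '>' (mu > c).
    """
    tails = {"!=": "double-tailed", "<": "left-tailed", ">": "right-tailed"}
    if h1 not in tails:
        raise ValueError("h1 must be one of '!=', '<', '>'")
    # Posted rule: Z-test if sigma is known or n >= 30, otherwise t-test.
    statistic = "Z" if (sigma_known or n >= 30) else "t"
    return f"{tails[h1]} one-sample {statistic}-test"

print(choose_one_sample_mean_test(sigma_known=False, n=12, h1="<"))
print(choose_one_sample_mean_test(sigma_known=False, n=45, h1="!="))
```

Note that the tail depends only on H1, so the rows with H0 stated as "=" and as "≤" or "≥" collapse into the same three cases.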

P-value Calculator

Statistical significance calculator to easily calculate the p-value and determine whether the difference between two proportions or means (independent groups) is statistically significant. T-test calculator & z-test calculator to compute the Z-score or T-score for inference about absolute or relative difference (percentage change, percent effect). Suitable for analysis of simple A/B tests.

Using the p-value calculator

This statistical significance calculator allows you to perform a post-hoc statistical evaluation of a set of data when the outcome of interest is difference of two proportions (binomial data, e.g. conversion rate or event rate) or difference of two means (continuous data, e.g. height, weight, speed, time, revenue, etc.). You can use a Z-test (recommended) or a T-test to find the observed significance level (p-value statistic). The Student's T-test is recommended mostly for very small sample sizes, e.g. n < 30. In order to avoid type I error inflation which might occur with unequal variances the calculator automatically applies the Welch's T-test instead of Student's T-test if the sample sizes differ significantly or if one of them is less than 30 and the sampling ratio is different than one.

If entering proportions data, you need to know the sample sizes of the two groups as well as the number or rate of events. These can be entered as proportions (e.g. 0.10), percentages (e.g. 10%) or just raw numbers of events (e.g. 50).

If entering means data, simply copy/paste or type in the raw data, each observation separated by comma, space, new line or tab. Copy-pasting from a Google or Excel spreadsheet works fine.

The p-value calculator will output: p-value, significance level, T-score or Z-score (depending on the choice of statistical hypothesis test), degrees of freedom, and the observed difference. For means data it will also output the sample sizes, means, and pooled standard error of the mean. The p-value is for a one-sided hypothesis (one-tailed test), allowing you to infer the direction of the effect (more on one vs. two-tailed tests). However, the probability value for the two-sided hypothesis (two-tailed p-value) is also calculated and displayed, although it should see little to no practical applications.

Warning: You must have fixed the sample size / stopping time of your experiment in advance, otherwise you will be guilty of optional stopping (fishing for significance) which will inflate the type I error of the test rendering the statistical significance level unusable. Also, you should not use this significance calculator for comparisons of more than two means or proportions, or for comparisons of two groups based on more than one metric. If a test involves more than one treatment group or more than one outcome variable you need a more advanced tool which corrects for multiple comparisons and multiple testing. This statistical calculator might help.

What is "p-value" and "significance level"

The p-value is a heavily used test statistic that quantifies the uncertainty of a given measurement, usually as a part of an experiment, medical trial, as well as in observational studies. By definition, it is inseparable from inference through a Null-Hypothesis Statistical Test (NHST) . In it we pose a null hypothesis reflecting the currently established theory or a model of the world we don't want to dismiss without solid evidence (the tested hypothesis), and an alternative hypothesis: an alternative model of the world. For example, the statistical null hypothesis could be that exposure to ultraviolet light for prolonged periods of time has positive or neutral effects regarding developing skin cancer, while the alternative hypothesis can be that it has a negative effect on development of skin cancer.

In this framework a p-value is defined as the probability of observing the result which was observed, or a more extreme one, assuming the null hypothesis is true . In notation this is expressed as:

$p(x_0) = \Pr(d(X) > d(x_0); H_0)$

where $x_0$ is the observed data $(x_1, x_2, \ldots, x_n)$, $d$ is a special function (statistic, e.g. calculating a Z-score), and $X$ is a random sample $(X_1, X_2, \ldots, X_n)$ from the sampling distribution of the null hypothesis. This equation is used in this p-value calculator and can be visualized as such:

[Figure: p-value and statistical significance explained]

Therefore the p-value can be read as the probability of committing a type I error (rejecting the null hypothesis when it is in fact true) if a result as extreme as the observed one were taken as grounds for rejection. See below for a full proper interpretation of the p-value statistic.

Another way to think of the p-value is as a more user-friendly expression of how many standard deviations away from the value expected under the null hypothesis a given observation is. For example, in a one-tailed test of significance for a normally-distributed variable like the difference of two means, a result which is 1.6448 standard deviations away (1.6448σ) results in a p-value of 0.05.
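The correspondence between a significance threshold and a number of standard deviations can be reproduced with the standard normal quantile function. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `critical_z` is an assumption made for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def critical_z(alpha, two_tailed=False):
    """Z-score beyond which the tail area equals alpha (split in two if two_tailed)."""
    tail_area = alpha / 2 if two_tailed else alpha
    return std_normal.inv_cdf(1 - tail_area)

print(round(critical_z(0.05), 4))                   # 1.6449
print(round(critical_z(0.05, two_tailed=True), 4))  # 1.96
```

The one-tailed value agrees with the 1.6448σ quoted above up to rounding.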

The term "statistical significance" or "significance level" is often used in conjunction with the p-value, either to say that a result is "statistically significant", which has a specific meaning in statistical inference (see interpretation below), or to refer to the percentage representation of the level of significance: (1 - p-value), e.g. a p-value of 0.05 is equivalent to a significance level of 95% ((1 - 0.05) * 100). A significance level can also be expressed as a T-score or Z-score, e.g. a result would be considered significant only if the Z-score is in the critical region above 1.96 (equivalent to a one-tailed p-value of 0.025).

P-value formula

There are different ways to arrive at a p-value depending on the assumption about the underlying distribution. This tool supports two such distributions: the Student's T-distribution and the normal Z-distribution (Gaussian) resulting in a T test and a Z test, respectively.

In both cases, to find the p-value start by estimating the variance and standard deviation, then derive the standard error of the mean, after which a standard score is found using the formula [2]:

$Z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{x}}}$

$\bar{X}$ (read "X bar") is the arithmetic mean of the population baseline or the control, $\mu_0$ is the observed mean / treatment group mean, while $\sigma_{\bar{x}}$ is the standard error of the mean (SEM, or standard deviation of the error of the mean).

When calculating a p-value using the Z-distribution the formula is Φ(Z) or Φ(-Z) for lower and upper-tailed tests, respectively. Φ is the standard normal cumulative distribution function and a Z-score is computed. In this mode the tool functions as a Z score calculator.
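As a rough illustration of the Φ(Z) and Φ(-Z) rule, the following sketch maps a Z-score to a p-value for each tail. It is not the calculator's own code; the function name `z_test_p_value` and the `tail` labels are assumptions, and the two-sided branch is the usual doubling of the smaller tail, added here for completeness.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function

def z_test_p_value(z, tail):
    """p-value for a Z-score; tail is 'lower', 'upper' or 'two-sided'."""
    if tail == "lower":
        return PHI(z)             # area to the left of z
    if tail == "upper":
        return PHI(-z)            # area to the right of z, by symmetry
    if tail == "two-sided":
        return 2 * PHI(-abs(z))   # both tails beyond |z|
    raise ValueError("tail must be 'lower', 'upper' or 'two-sided'")

print(round(z_test_p_value(1.96, "upper"), 3))      # 0.025
print(round(z_test_p_value(1.96, "two-sided"), 3))  # 0.05
```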

When using the T-distribution the formula is $T_n(Z)$ or $T_n(-Z)$ for lower and upper-tailed tests, respectively. $T_n$ is the cumulative distribution function for a T-distribution with n degrees of freedom and so a T-score is computed. Selecting this mode makes the tool behave as a T test calculator.

The population standard deviation is often unknown and is thus estimated from the samples, usually from the pooled samples variance. Knowing or estimating the standard deviation is a prerequisite for using a significance calculator. Note that differences in means or proportions are approximately normally distributed for sufficiently large samples according to the Central Limit Theorem (CLT), hence a Z-score is the relevant statistic for such a test.

Why do we need a p-value?

If you are in the sciences, it is often a requirement by scientific journals. If you apply statistics in business experiments (e.g. A/B testing), it is reported alongside confidence intervals and other estimates. However, what is the utility of p-values and by extension that of significance levels?

First, let us define the problem the p-value is intended to solve. People need to share information about the evidential strength of data that can be easily understood and easily compared between experiments. The picture below represents, albeit imperfectly, the results of two simple experiments, each ending with the control group at a 10% event rate and the treatment group at a 12% event rate.

However, it is obvious that the evidential input of the data is not the same, demonstrating that communicating just the observed proportions or their difference (effect size) is not enough to estimate and communicate the evidential strength of the experiment. In order to fully describe the evidence and associated uncertainty, several statistics need to be communicated, for example, the sample size, sample proportions and the shape of the error distribution. Their interaction is not trivial to understand, so communicating them separately makes it very difficult for one to grasp what information is present in the data. What would you infer if told that the observed proportions are 0.1 and 0.12 (e.g. conversion rate of 10% and 12%), the sample sizes are 10,000 users each, and the error distribution is binomial?
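One way to answer that question is a pooled two-proportion Z-test. The sketch below is an illustration under stated assumptions, not the calculator's implementation: the function name `two_proportion_z_test` is made up, the normal approximation to the binomial is assumed, and the second scenario with 100 users per group is invented purely for contrast.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(p_control, n_control, p_treatment, n_treatment):
    """Return (z, one-sided p-value) for H1: treatment rate > control rate."""
    # Pooled event rate under the null hypothesis of no difference.
    pooled = (p_control * n_control + p_treatment * n_treatment) / (n_control + n_treatment)
    # Standard error of the difference between the two sample proportions.
    se = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    z = (p_treatment - p_control) / se
    p_value = NormalDist().cdf(-z)  # upper-tail area beyond z
    return z, p_value

# Same observed rates, very different sample sizes.
z_large, p_large = two_proportion_z_test(0.10, 10_000, 0.12, 10_000)
z_small, p_small = two_proportion_z_test(0.10, 100, 0.12, 100)  # hypothetical small study

print(f"n = 10,000 each: z = {z_large:.2f}, p = {p_large:.2g}")
print(f"n = 100 each:    z = {z_small:.2f}, p = {p_small:.2g}")
```

With identical observed rates, the large experiment yields a p-value far below common thresholds while the small one does not, which is the kind of difference in evidential strength that the proportions alone cannot convey.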

Instead of communicating several statistics, a single statistic was developed that communicates all the necessary information in one piece: the p-value. A p-value was first derived in the late 18th century by Pierre-Simon Laplace, when he observed data about a million births that showed an excess of boys, compared to girls. Using the calculation of significance he argued that the effect was real but unexplained at the time. We know this now to be true and there are several explanations for the phenomenon coming from evolutionary biology. Statistical significance calculations were formally introduced in the early 20th century by Pearson and popularized by Sir Ronald Fisher in his work, most notably "The Design of Experiments" (1935) [1] in which p-values were featured extensively. In business settings significance levels and p-values see widespread use in process control and various business experiments (such as online A/B tests, i.e. as part of conversion rate optimization, marketing optimization, etc.).

How to interpret a statistically significant result / low p-value

Saying that a result is statistically significant means that the p-value is below the evidential threshold (significance level) decided for the statistical test before it was conducted. For example, if observing something which would only happen 1 out of 20 times if the null hypothesis is true is considered sufficient evidence to reject the null hypothesis, the threshold will be 0.05. In such a case, observing a p-value of 0.025 would mean that the result is interpreted as statistically significant.

But what does that really mean? What inference can we make from seeing a result which was quite improbable if the null was true?

Observing any given low p-value can mean one of three things [3]:

1. There is a true effect from the tested treatment or intervention.
2. There is no true effect, but we happened to observe a rare outcome. The lower the p-value, the rarer (less likely, less probable) the outcome.
3. The statistical model is invalid (does not reflect reality).

Obviously, one can't simply jump to conclusion 1.) and claim it with one hundred percent certainty, as this would go against the whole idea of the p-value and statistical significance. In order to use p-values as a part of a decision process, external factors that are part of the experimental design process need to be considered, including the significance level (threshold), sample size and power (power analysis), and the expected effect size, among other things. If you are happy going forward with this much (or this little) uncertainty as is indicated by the p-value, then you have some quantifiable guarantees related to the effect and future performance of whatever you are testing, e.g. the efficacy of a vaccine or the conversion rate of an online shopping cart.

Note that it is incorrect to state that a Z-score or a p-value obtained from any statistical significance calculator tells how likely it is that the observation is "due to chance" or conversely - how unlikely it is to observe such an outcome due to "chance alone". P-values are calculated under specified statistical models hence 'chance' can be used only in reference to that specific data generating mechanism and has a technical meaning quite different from the colloquial one. For a deeper take on the p-value meaning and interpretation, including common misinterpretations, see: definition and interpretation of the p-value in statistics .

P-value and significance for relative difference in means or proportions

When comparing two independent groups and the variable of interest is the relative difference (a.k.a. relative change, percent change, percentage difference), as opposed to the absolute difference between the two means or proportions, the standard deviation of the variable is different, which compels a different way of calculating p-values [5]. The need for a different statistical test is due to the fact that calculating the relative difference involves an additional division by a random variable: the event rate of the control during the experiment. This adds more variance to the estimation, and the resulting p-value is usually higher (the result will be less statistically significant). What this means is that p-values from a statistical hypothesis test for absolute difference in means may nominally meet the significance level, but they will be inadequate for an inference about the relative difference.

In simulations I performed the p-values for relative difference were about 50% higher than nominal: a 0.05 p-value for absolute difference corresponded to a probability of about 0.075 of observing the relative difference corresponding to the observed absolute difference. Therefore, if you are using p-values calculated for absolute difference when making an inference about percentage difference, you are likely reporting error rates which are about two-thirds of the actual (0.05 versus 0.075), thus significantly overstating the statistical significance of your results and underestimating the uncertainty attached to them.

In short - switching from absolute to relative difference requires a different statistical hypothesis test. With this calculator you can avoid the mistake of using the wrong test simply by indicating the inference you want to make.

References

1 Fisher R.A. (1935) – "The Design of Experiments", Edinburgh: Oliver & Boyd

2 Mayo D.G., Spanos A. (2010) – "Error Statistics", in P. S. Bandyopadhyay & M. R. Forster (Eds.), Philosophy of Statistics, (7, 152–198). Handbook of the Philosophy of Science. The Netherlands: Elsevier.

3 Georgiev G.Z. (2017) "Statistical Significance in A/B Testing – a Complete Guide", [online] https://blog.analytics-toolkit.com/2017/statistical-significance-ab-testing-complete-guide/ (accessed Apr 27, 2018)

4 Mayo D.G., Spanos A. (2006) – "Severe Testing as a Basic Concept in a Neyman–Pearson Philosophy of Induction", British Society for the Philosophy of Science, 57:323-357

5 Georgiev G.Z. (2018) "Confidence Intervals & P-values for Percent Change / Relative Difference", [online] https://blog.analytics-toolkit.com/2018/confidence-intervals-p-values-percent-change-relative-difference/ (accessed May 20, 2018)

Cite this calculator & page

If you'd like to cite this online calculator resource and information as provided on the page, you can use the following citation: Georgiev G.Z., "P-value Calculator", [online] Available at: https://www.gigacalculator.com/calculators/p-value-significance-calculator.php [Accessed Date: 09 Jun, 2024].

