Logo for University of Washington Libraries

Want to create or adapt books like this? Learn more about how Pressbooks supports open publishing practices.

11 Hypothesis Testing with One Sample

Student learning outcomes.

By the end of this chapter, the student should be able to:

  • Be able to identify and develop the null and alternative hypothesis
  • Identify the consequences of Type I and Type II error.
  • Be able to perform an one-tailed and two-tailed hypothesis test using the critical value method
  • Be able to perform a hypothesis test using the p-value method
  • Be able to write conclusions based on hypothesis tests.

Introduction

Now we are down to the bread and butter work of the statistician: developing and testing hypotheses. It is important to   put this material in a broader context so that the method by which a hypothesis is formed is understood completely. Using textbook examples often clouds the real source of statistical hypotheses.

Statistical testing is part of a much larger process known as the scientific method. This method was developed more than two centuries ago as the accepted way that new knowledge could be created. Until then, and unfortunately even today, among some, “knowledge” could be created simply by some authority saying something was so, ipso dicta . Superstition and conspiracy theories were (are?) accepted uncritically.

The scientific method, briefly, states that only by following a careful and specific process can some assertion be included in the accepted body of knowledge. This process begins with a set of assumptions upon which a theory, sometimes called a model, is built. This theory, if it has any validity, will lead to predictions; what we call hypotheses.

As an example, in Microeconomics the theory of consumer choice begins with certain assumption concerning human behavior. From these assumptions a theory of how consumers make choices using indifference curves and the budget line. This theory gave rise to a very important prediction, namely, that there was an inverse relationship between price and quantity demanded. This relationship was known as the demand curve. The negative slope of the demand curve is really just a prediction, or a hypothesis, that can be tested with statistical tools.

Unless hundreds and hundreds of statistical tests of this hypothesis had not confirmed this relationship, the so-called Law of Demand would have been discarded years ago. This is the role of statistics, to test the hypotheses of various theories to determine if they should be admitted into the accepted body of knowledge; how we understand our world. Once admitted, however, they may be later discarded if new theories come along that make better predictions.

Not long ago two scientists claimed that they could get more energy out of a process than was put in. This caused a tremendous stir for obvious reasons. They were on the cover of Time and were offered extravagant sums to bring their research work to private industry and any number of universities. It was not long until their work was subjected to the rigorous tests of the scientific method and found to be a failure. No other lab could replicate their findings. Consequently they have sunk into obscurity and their theory discarded. It may surface again when someone can pass the tests of the hypotheses required by the scientific method, but until then it is just a curiosity. Many pure frauds have been attempted over time, but most have been found out by applying the process of the scientific method.

This discussion is meant to show just where in this process statistics falls. Statistics and statisticians are not necessarily in the business of developing theories, but in the business of testing others’ theories. Hypotheses come from these theories based upon an explicit set of assumptions and sound logic. The hypothesis comes first, before any data are gathered. Data do not create hypotheses; they are used to test them. If we bear this in mind as we study this section the process of forming and testing hypotheses will make more sense.

One job of a statistician is to make statistical inferences about populations based on samples taken from the population. Confidence intervals are one way to estimate a population parameter. Another way to make a statistical inference is to make a decision about the value of a specific parameter. For instance, a car dealer advertises that its new small truck gets 35 miles per gallon, on average. A tutoring service claims that its method of tutoring helps 90% of its students get an A or a B. A company says that women managers in their company earn an average of $60,000 per year.

A statistician will make a decision about these claims. This process is called ” hypothesis testing .” A hypothesis test involves collecting data from a sample and evaluating the data. Then, the statistician makes a decision as to whether or not there is sufficient evidence, based upon analyses of the data, to reject the null hypothesis.

In this chapter, you will conduct hypothesis tests on single means and single proportions. You will also learn about the errors associated with these tests.

Null and Alternative Hypotheses

The actual test begins by considering two hypotheses . They are called the null hypothesis and the alternative hypothesis . These hypotheses contain opposing viewpoints.

H_0

Since the null and alternative hypotheses are contradictory, you must examine evidence to decide if you have enough evidence to reject the null hypothesis or not. The evidence is in the form of sample data.

Table 1 presents the various hypotheses in the relevant pairs. For example, if the null hypothesis is equal to some value, the alternative has to be not equal to that value.

equal (=) not equal (≠)
greater than or equal to (≥) less than (<)
less than or equal to (≤) more than (>)

NOTE                                                                             

We want to test whether the mean GPA of students in American colleges is different from 2.0 (out of 4.0). The null and alternative hypotheses are:

\mu

We want to test if college students take less than five years to graduate from college, on the average. The null and alternative hypotheses are:

Outcomes and the Type I and Type II Errors

True False
Correct Outcome Type II error
Type I Error Correct Outcome

The four possible outcomes in the table are:

Each of the errors occurs with a particular probability. The Greek letters α and β represent the probabilities.

\alpha

By way of example, the American judicial system begins with the concept that a defendant is “presumed innocent”. This is the status quo and is the null hypothesis. The judge will tell the jury that they can not find the defendant guilty unless the evidence indicates guilt beyond a “reasonable doubt” which is usually defined in criminal cases as 95% certainty of guilt. If the jury cannot accept the null, innocent, then action will be taken, jail time. The burden of proof always lies with the alternative hypothesis. (In civil cases, the jury needs only to be more than 50% certain of wrongdoing to find culpability, called “a preponderance of the evidence”).

The example above was for a test of a mean, but the same logic applies to tests of hypotheses for all statistical parameters one may wish to test.

The following are examples of Type I and Type II errors.

Type I error : Frank thinks that his rock climbing equipment may not be safe when, in fact, it really is safe.

Type II error : Frank thinks that his rock climbing equipment may be safe when, in fact, it is not safe.

Notice that, in this case, the error with the greater consequence is the Type II error. (If Frank thinks his rock climbing equipment is safe, he will go ahead and use it.)

This is a situation described as “accepting a false null”.

Type I error : The emergency crew thinks that the victim is dead when, in fact, the victim is alive. Type II error : The emergency crew does not know if the victim is alive when, in fact, the victim is dead.

The error with the greater consequence is the Type I error. (If the emergency crew thinks the victim is dead, they will not treat him.)

Distribution Needed for Hypothesis Testing

Particular distributions are associated with hypothesis testing.We will perform hypotheses tests of a population mean using a normal distribution or a Student’s t -distribution. (Remember, use a Student’s t -distribution when the population standard deviation is unknown and the sample size is small, where small is considered to be less than 30 observations.) We perform tests of a population proportion using a normal distribution when we can assume that the distribution is normally distributed. We consider this to be true if the sample proportion, p ‘ , times the sample size is greater than 5 and 1- p ‘ times the sample size is also greater then 5. This is the same rule of thumb we used when developing the formula for the confidence interval for a population proportion.

Hypothesis Test for the Mean

Going back to the standardizing formula we can derive the test statistic for testing hypotheses concerning means.

Z_c=\frac{\bar{x}-\mu}{\frac{\sigma}{\sqrt{n}}}

This gives us the decision rule for testing a hypothesis for a two-tailed test:

If
If

P-Value Approach

hypothesis testing for one sample

Both decision rules will result in the same decision and it is a matter of preference which one is used.

One and Two-tailed Tests

\mu\neq100

The claim would be in the alternative hypothesis. The burden of proof in hypothesis testing is carried in the alternative. This is because failing to reject the null, the status quo, must be accomplished with 90 or 95 percent significance that it cannot be maintained. Said another way, we want to have only a 5 or 10 percent probability of making a Type I error, rejecting a good null; overthrowing the status quo.

Figure 5 shows the two possible cases and the form of the null and alternative hypothesis that give rise to them.

hypothesis testing for one sample

Effects of Sample Size on Test Statistic

\sigma

Table 3 summarizes test statistics for varying sample sizes and population standard deviation known and unknown.

< 30 (σ unknown)
< 30 (σ known)
> 30 (σ unknown)
> 30 (σ known)

A Systematic Approach for Testing A Hypothesis

A systematic approach to hypothesis testing follows the following steps and in this order. This template will work for all hypotheses that you will ever test.

  • Set up the null and alternative hypothesis. This is typically the hardest part of the process. Here the question being asked is reviewed. What parameter is being tested, a mean, a proportion, differences in means, etc. Is this a one-tailed test or two-tailed test? Remember, if someone is making a claim it will always be a one-tailed test.
  • Decide the level of significance required for this particular case and determine the critical value. These can be found in the appropriate statistical table. The levels of confidence typical for the social sciences are 90, 95 and 99. However, the level of significance is a policy decision and should be based upon the risk of making a Type I error, rejecting a good null. Consider the consequences of making a Type I error.
  • Take a sample(s) and calculate the relevant parameters: sample mean, standard deviation, or proportion. Using the formula for the test statistic from above in step 2, now calculate the test statistic for this particular case using the parameters you have just calculated.
  • Compare the calculated test statistic and the critical value. Marking these on the graph will give a good visual picture of the situation. There are now only two situations:

a.     The test statistic is in the tail: Cannot Accept the null, the probability that this sample mean (proportion) came from the hypothesized distribution is too small to believe that it is the real home of these sample data.

b.   The test statistic is not in the tail: Cannot Reject the null, the sample data are compatible with the hypothesized population parameter.

  • Reach a conclusion. It is best to articulate the conclusion two different ways. First a formal statistical conclusion such as “With a 95 % level of significance we cannot accept the null hypotheses that the population mean is equal to XX (units of measurement)”. The second statement of the conclusion is less formal and states the action, or lack of action, required. If the formal conclusion was that above, then the informal one might be, “The machine is broken and we need to shut it down and call for repairs”.

All hypotheses tested will go through this same process. The only changes are the relevant formulas and those are determined by the hypothesis required to answer the original question.

Full Hypothesis Test Examples

Tests on means.

Jeffrey, as an eight-year old, established a mean time of 16.43 seconds for swimming the 25-yard freestyle, with a standard deviation of 0.8 seconds . His dad, Frank, thought that Jeffrey could swim the 25-yard freestyle faster using goggles. Frank bought Jeffrey a new pair of expensive goggles and timed Jeffrey for 15 25-yard freestyle swims . For the 15 swims, Jeffrey’s mean time was 16 seconds. Frank thought that the goggles helped Jeffrey to swim faster than the 16.43 seconds. Conduct a hypothesis test using a preset α = 0.05.

Solution – Example 6

Set up the Hypothesis Test:

Since the problem is about a mean, this is a test of a single population mean . Set the null and alternative hypothesis:

In this case there is an implied challenge or claim. This is that the goggles will reduce the swimming time. The effect of this is to set the hypothesis as a one-tailed test. The claim will always be in the alternative hypothesis because the burden of proof always lies with the alternative. Remember that the status quo must be defeated with a high degree of confidence, in this case 95 % confidence. The null and alternative hypotheses are thus:

For Jeffrey to swim faster, his time will be less than 16.43 seconds. The “<” tells you this is left-tailed. Determine the distribution needed:

Distribution for the test statistic:

The sample size is less than 30 and we do not know the population standard deviation so this is a t-test and the proper formula is:

t_c=\frac{\bar{x}-{\mu_0}}{\frac{s}{\sqrt{n}}}

Our step 2, setting the level of significance, has already been determined by the problem, .05 for a 95 % significance level. It is worth thinking about the meaning of this choice. The Type I error is to conclude that Jeffrey swims the 25-yard freestyle, on average, in less than 16.43 seconds when, in fact, he actually swims the 25-yard freestyle, on average, in 16.43 seconds. (Reject the null hypothesis when the null hypothesis is true.) For this case the only concern with a Type I error would seem to be that Jeffery’s dad may fail to bet on his son’s victory because he does not have appropriate confidence in the effect of the goggles.

To find the critical value we need to select the appropriate test statistic. We have concluded that this is a t-test on the basis of the sample size and that we are interested in a population mean. We can now draw the graph of the t-distribution and mark the critical value (Figure 6). For this problem the degrees of freedom are n-1, or 14. Looking up 14 degrees of freedom at the 0.05 column of the t-table we find 1.761. This is the critical value and we can put this on our graph.

Step 3 is the calculation of the test statistic using the formula we have selected.

t_c=\frac{16-16.43}{\frac{0.8}{\sqrt{15}}}

We find that the calculated test statistic is 2.08, meaning that the sample mean is 2.08 standard deviations away from the hypothesized mean of 16.43.

hypothesis testing for one sample

Step 4 has us compare the test statistic and the critical value and mark these on the graph. We see that the test statistic is in the tail and thus we move to step 4 and reach a conclusion. The probability that an average time of 16 minutes could come from a distribution with a population mean of 16.43 minutes is too unlikely for us to accept the null hypothesis. We cannot accept the null.

Step 5 has us state our conclusions first formally and then less formally. A formal conclusion would be stated as: “With a 95% level of significance we cannot accept the null hypothesis that the swimming time with goggles comes from a distribution with a population mean time of 16.43 minutes.” Less formally, “With 95% significance we believe that the goggles improves swimming speed”

If we wished to use the p-value system of reaching a conclusion we would calculate the statistic and take the additional step to find the probability of being 2.08 standard deviations from the mean on a t-distribution. This value is .0187. Comparing this to the α-level of .05 we see that we cannot accept the null. The p-value has been put on the graph as the shaded area beyond -2.08 and it shows that it is smaller than the hatched area which is the alpha level of 0.05. Both methods reach the same conclusion that we cannot accept the null hypothesis.

Jane has just begun her new job as on the sales force of a very competitive company. In a sample of 16 sales calls it was found that she closed the contract for an average value of $108 with a standard deviation of 12 dollars. Test at 5% significance that the population mean is at least $100 against the alternative that it is less than 100 dollars. Company policy requires that new members of the sales force must exceed an average of $100 per contract during the trial employment period. Can we conclude that Jane has met this requirement at the significance level of 95%?

Solution – Example 7

STEP 1 : Set the Null and Alternative Hypothesis.

STEP 2 : Decide the level of significance and draw the graph (Figure 7) showing the critical value.

t_a = 1.753

STEP 3 : Calculate sample parameters and the test statistic.

t_c=\frac{108-100}{\frac{12}{\sqrt{16}}} = 2.67

STEP 4 : Compare test statistic and the critical values

STEP 5 : Reach a Conclusion

The test statistic is a Student’s t because the sample size is below 30; therefore, we cannot use the normal distribution. Comparing the calculated value of the test statistic and the critical value of t ( t a ) at a 5% significance level, we see that the calculated value is in the tail of the distribution. Thus, we conclude that 108 dollars per contract is significantly larger than the hypothesized value of 100 and thus we cannot accept the null hypothesis. There is evidence that supports Jane’s performance meets company standards.

s^2

Again we will follow the steps in our analysis of this problem.

Solution – Example 8

STEP 1 : Set the Null and Alternative Hypothesis. The random variable is the quantity of fluid placed in the bottles. This is a continuous random variable and the parameter we are interested in is the mean. Our hypothesis therefore is about the mean. In this case we are concerned that the machine is not filling properly. From what we are told it does not matter if the machine is over-filling or under-filling, both seem to be an equally bad error. This tells us that this is a two-tailed test: if the machine is malfunctioning it will be shutdown regardless if it is from over-filling or under-filling. The null and alternative hypotheses are thus:

STEP 2 : Decide the level of significance and draw the graph showing the critical value.

This problem has already set the level of significance at 99%. The decision seems an appropriate one and shows the thought process when setting the significance level. Management wants to be very certain, as certain as probability will allow, that they are not shutting down a machine that is not in need of repair. To draw the distribution and the critical value, we need to know which distribution to use. Because this is a continuous random variable and we are interested in the mean, and the sample size is greater than 30, the appropriate distribution is the normal distribution and the relevant critical value is 2.575 from the normal table or the t-table at 0.005 column and infinite degrees of freedom. We draw the graph and mark these points (Figure 8).

hypothesis testing for one sample

STEP 3 : Calculate sample parameters and the test statistic. The sample parameters are provided, the sample mean is 7.91 and the sample variance is .03 and the sample size is 35. We need to note that the sample variance was provided not the sample standard deviation, which is what we need for the formula. Remembering that the standard deviation is simply the square root of the variance, we therefore know the sample standard deviation, s, is 0.173. With this information we calculate the test statistic as -3.07, and mark it on the graph.

Z_c=\frac{\bar{x}-{\mu_0}}{\frac{s}{\sqrt{n}}} = Z_c=\frac{7.91-8}{\frac{.173}{\sqrt{35}}}=-3.07

STEP 4 : Compare test statistic and the critical values Now we compare the test statistic and the critical value by placing the test statistic on the graph. We see that the test statistic is in the tail, decidedly greater than the critical value of 2.575. We note that even the very small difference between the hypothesized value and the sample value is still a large number of standard deviations. The sample mean is only 0.08 ounces different from the required level of 8 ounces, but it is 3 plus standard deviations away and thus we cannot accept the null hypothesis.

Three standard deviations of a test statistic will guarantee that the test will fail. The probability that anything is within three standard deviations is almost zero. Actually it is 0.0026 on the normal distribution, which is certainly almost zero in a practical sense. Our formal conclusion would be “ At a 99% level of significance we cannot accept the hypothesis that the sample mean came from a distribution with a mean of 8 ounces” Or less formally, and getting to the point, “At a 99% level of significance we conclude that the machine is under filling the bottles and is in need of repair”.

Media Attributions

  • Type1Type2Error
  • HypTestFig2
  • HypTestFig3
  • HypTestPValue
  • OneTailTestFig5
  • HypTestExam7
  • HypTestExam8

Quantitative Analysis for Business Copyright © by Margo Bergman is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Share This Book

Have a language expert improve your writing

Run a free plagiarism check in 10 minutes, generate accurate citations for free.

  • Knowledge Base

Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans . Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics . It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis and alternate hypothesis (H o ) and (H a  or H 1 ).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test .
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

Step 1: state your null and alternate hypothesis, step 2: collect data, step 3: perform a statistical test, step 4: decide whether to reject or fail to reject your null hypothesis, step 5: present your findings, other interesting articles, frequently asked questions about hypothesis testing.

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H o ) and alternate (H a ) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H 0 : Men are, on average, not taller than women. H a : Men are, on average, taller than women.

Receive feedback on language, structure, and formatting

Professional editors proofread and edit your paper by focusing on:

  • Academic style
  • Vague sentences
  • Style consistency

See an example

hypothesis testing for one sample

For a statistical test to be valid , it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p -value . This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p -value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data .

  • an estimate of the difference in average height between the two groups.
  • a p -value showing how likely you are to see this difference if the null hypothesis of no difference is true.

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p -value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).

Here's why students love Scribbr's proofreading services

Discover proofreading & editing

The results of hypothesis testing will be presented in the results and discussion sections of your research paper , dissertation or thesis .

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p -value). In the discussion , you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis . This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis . But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis .

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved July 30, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/

Is this article helpful?

Rebecca Bevans

Rebecca Bevans

Other students also liked, choosing the right statistical test | types & examples, understanding p values | definition and examples, what is your plagiarism score.

  • Skip to secondary menu
  • Skip to main content
  • Skip to primary sidebar

Statistics By Jim

Making statistics intuitive

Statistical Hypothesis Testing Overview

By Jim Frost 59 Comments

In this blog post, I explain why you need to use statistical hypothesis testing and help you navigate the essential terminology. Hypothesis testing is a crucial procedure to perform when you want to make inferences about a population using a random sample. These inferences include estimating population properties such as the mean, differences between means, proportions, and the relationships between variables.

This post provides an overview of statistical hypothesis testing. If you need to perform hypothesis tests, consider getting my book, Hypothesis Testing: An Intuitive Guide .

Why You Should Perform Statistical Hypothesis Testing

Graph that displays mean drug scores by group. Use hypothesis testing to determine whether the difference between the means are statistically significant.

Hypothesis testing is a form of inferential statistics that allows us to draw conclusions about an entire population based on a representative sample. You gain tremendous benefits by working with a sample. In most cases, it is simply impossible to observe the entire population to understand its properties. The only alternative is to collect a random sample and then use statistics to analyze it.

While samples are much more practical and less expensive to work with, there are trade-offs. When you estimate the properties of a population from a sample, the sample statistics are unlikely to equal the actual population value exactly.  For instance, your sample mean is unlikely to equal the population mean. The difference between the sample statistic and the population value is the sample error.

Differences that researchers observe in samples might be due to sampling error rather than representing a true effect at the population level. If sampling error causes the observed difference, the next time someone performs the same experiment the results might be different. Hypothesis testing incorporates estimates of the sampling error to help you make the correct decision. Learn more about Sampling Error .

For example, if you are studying the proportion of defects produced by two manufacturing methods, any difference you observe between the two sample proportions might be sample error rather than a true difference. If the difference does not exist at the population level, you won’t obtain the benefits that you expect based on the sample statistics. That can be a costly mistake!

Let’s cover some basic hypothesis testing terms that you need to know.

Background information : Difference between Descriptive and Inferential Statistics and Populations, Parameters, and Samples in Inferential Statistics

Hypothesis Testing

Hypothesis testing is a statistical analysis that uses sample data to assess two mutually exclusive theories about the properties of a population. Statisticians call these theories the null hypothesis and the alternative hypothesis. A hypothesis test assesses your sample statistic and factors in an estimate of the sample error to determine which hypothesis the data support.

When you can reject the null hypothesis, the results are statistically significant, and your data support the theory that an effect exists at the population level.

The effect is the difference between the population value and the null hypothesis value. The effect is also known as population effect or the difference. For example, the mean difference between the health outcome for a treatment group and a control group is the effect.

Typically, you do not know the size of the actual effect. However, you can use a hypothesis test to help you determine whether an effect exists and to estimate its size. Hypothesis tests convert your sample effect into a test statistic, which it evaluates for statistical significance. Learn more about Test Statistics .

An effect can be statistically significant, but that doesn’t necessarily indicate that it is important in a real-world, practical sense. For more information, read my post about Statistical vs. Practical Significance .

Null Hypothesis

The null hypothesis is one of two mutually exclusive theories about the properties of the population in hypothesis testing. Typically, the null hypothesis states that there is no effect (i.e., the effect size equals zero). The null is often signified by H 0 .

In all hypothesis testing, the researchers are testing an effect of some sort. The effect can be the effectiveness of a new vaccination, the durability of a new product, the proportion of defect in a manufacturing process, and so on. There is some benefit or difference that the researchers hope to identify.

However, it’s possible that there is no effect or no difference between the experimental groups. In statistics, we call this lack of an effect the null hypothesis. Therefore, if you can reject the null, you can favor the alternative hypothesis, which states that the effect exists (doesn’t equal zero) at the population level.

You can think of the null as the default theory that requires sufficiently strong evidence against in order to reject it.

For example, in a 2-sample t-test, the null often states that the difference between the two means equals zero.

When you can reject the null hypothesis, your results are statistically significant. Learn more about Statistical Significance: Definition & Meaning .

Related post : Understanding the Null Hypothesis in More Detail

Alternative Hypothesis

The alternative hypothesis is the other theory about the properties of the population in hypothesis testing. Typically, the alternative hypothesis states that a population parameter does not equal the null hypothesis value. In other words, there is a non-zero effect. If your sample contains sufficient evidence, you can reject the null and favor the alternative hypothesis. The alternative is often identified with H 1 or H A .

For example, in a 2-sample t-test, the alternative often states that the difference between the two means does not equal zero.

You can specify either a one- or two-tailed alternative hypothesis:

If you perform a two-tailed hypothesis test, the alternative states that the population parameter does not equal the null value. For example, when the alternative hypothesis is H A : μ ≠ 0, the test can detect differences both greater than and less than the null value.

A one-tailed alternative has more power to detect an effect but it can test for a difference in only one direction. For example, H A : μ > 0 can only test for differences that are greater than zero.

Related posts : Understanding T-tests and One-Tailed and Two-Tailed Hypothesis Tests Explained

Image of a P for the p-value in hypothesis testing.

P-values are the probability that you would obtain the effect observed in your sample, or larger, if the null hypothesis is correct. In simpler terms, p-values tell you how strongly your sample data contradict the null. Lower p-values represent stronger evidence against the null. You use P-values in conjunction with the significance level to determine whether your data favor the null or alternative hypothesis.

Related post : Interpreting P-values Correctly

Significance Level (Alpha)

image of the alpha symbol for hypothesis testing.

For instance, a significance level of 0.05 signifies a 5% risk of deciding that an effect exists when it does not exist.

Use p-values and significance levels together to help you determine which hypothesis the data support. If the p-value is less than your significance level, you can reject the null and conclude that the effect is statistically significant. In other words, the evidence in your sample is strong enough to be able to reject the null hypothesis at the population level.

Related posts : Graphical Approach to Significance Levels and P-values and Conceptual Approach to Understanding Significance Levels

Types of Errors in Hypothesis Testing

Statistical hypothesis tests are not 100% accurate because they use a random sample to draw conclusions about entire populations. There are two types of errors related to drawing an incorrect conclusion.

  • False positives: You reject a null that is true. Statisticians call this a Type I error . The Type I error rate equals your significance level or alpha (α).
  • False negatives: You fail to reject a null that is false. Statisticians call this a Type II error. Generally, you do not know the Type II error rate. However, it is a larger risk when you have a small sample size , noisy data, or a small effect size. The type II error rate is also known as beta (β).

Statistical power is the probability that a hypothesis test correctly infers that a sample effect exists in the population. In other words, the test correctly rejects a false null hypothesis. Consequently, power is inversely related to a Type II error. Power = 1 – β. Learn more about Power in Statistics .

Related posts : Types of Errors in Hypothesis Testing and Estimating a Good Sample Size for Your Study Using Power Analysis

Which Type of Hypothesis Test is Right for You?

There are many different types of procedures you can use. The correct choice depends on your research goals and the data you collect. Do you need to understand the mean or the differences between means? Or, perhaps you need to assess proportions. You can even use hypothesis testing to determine whether the relationships between variables are statistically significant.

To choose the proper statistical procedure, you’ll need to assess your study objectives and collect the correct type of data . This background research is necessary before you begin a study.

Related Post : Hypothesis Tests for Continuous, Binary, and Count Data

Statistical tests are crucial when you want to use sample data to make conclusions about a population because these tests account for sample error. Using significance levels and p-values to determine when to reject the null hypothesis improves the probability that you will draw the correct conclusion.

To see an alternative approach to these traditional hypothesis testing methods, learn about bootstrapping in statistics !

If you want to see examples of hypothesis testing in action, I recommend the following posts that I have written:

  • How Effective Are Flu Shots? This example shows how you can use statistics to test proportions.
  • Fatality Rates in Star Trek . This example shows how to use hypothesis testing with categorical data.
  • Busting Myths About the Battle of the Sexes . A fun example based on a Mythbusters episode that assess continuous data using several different tests.
  • Are Yawns Contagious? Another fun example inspired by a Mythbusters episode.

Share this:

hypothesis testing for one sample

Reader Interactions

' src=

January 14, 2024 at 8:43 am

Hello professor Jim, how are you doing! Pls. What are the properties of a population and their examples? Thanks for your time and understanding.

' src=

January 14, 2024 at 12:57 pm

Please read my post about Populations vs. Samples for more information and examples.

Also, please note there is a search bar in the upper-right margin of my website. Use that to search for topics.

' src=

July 5, 2023 at 7:05 am

Hello, I have a question as I read your post. You say in p-values section

“P-values are the probability that you would obtain the effect observed in your sample, or larger, if the null hypothesis is correct. In simpler terms, p-values tell you how strongly your sample data contradict the null. Lower p-values represent stronger evidence against the null.”

But according to your definition of effect, the null states that an effect does not exist, correct? So what I assume you want to say is that “P-values are the probability that you would obtain the effect observed in your sample, or larger, if the null hypothesis is **incorrect**.”

July 6, 2023 at 5:18 am

Hi Shrinivas,

The correct definition of p-value is that it is a probability that exists in the context of a true null hypothesis. So, the quotation is correct in stating “if the null hypothesis is correct.”

Essentially, the p-value tells you the likelihood of your observed results (or more extreme) if the null hypothesis is true. It gives you an idea of whether your results are surprising or unusual if there is no effect.

Hence, with sufficiently low p-values, you reject the null hypothesis because it’s telling you that your sample results were unlikely to have occurred if there was no effect in the population.

I hope that helps make it more clear. If not, let me know I’ll attempt to clarify!

' src=

May 8, 2023 at 12:47 am

Thanks a lot Ny best regards

May 7, 2023 at 11:15 pm

Hi Jim Can you tell me something about size effect? Thanks

May 8, 2023 at 12:29 am

Here’s a post that I’ve written about Effect Sizes that will hopefully tell you what you need to know. Please read that. Then, if you have any more specific questions about effect sizes, please post them there. Thanks!

' src=

January 7, 2023 at 4:19 pm

Hi Jim, I have only read two pages so far but I am really amazed because in few paragraphs you made me clearly understand the concepts of months of courses I received in biostatistics! Thanks so much for this work you have done it helps a lot!

January 10, 2023 at 3:25 pm

Thanks so much!

' src=

June 17, 2021 at 1:45 pm

Can you help in the following question: Rocinante36 is priced at ₹7 lakh and has been designed to deliver a mileage of 22 km/litre and a top speed of 140 km/hr. Formulate the null and alternative hypotheses for mileage and top speed to check whether the new models are performing as per the desired design specifications.

' src=

April 19, 2021 at 1:51 pm

Its indeed great to read your work statistics.

I have a doubt regarding the one sample t-test. So as per your book on hypothesis testing with reference to page no 45, you have mentioned the difference between “the sample mean and the hypothesised mean is statistically significant”. So as per my understanding it should be quoted like “the difference between the population mean and the hypothesised mean is statistically significant”. The catch here is the hypothesised mean represents the sample mean.

Please help me understand this.

Regards Rajat

April 19, 2021 at 3:46 pm

Thanks for buying my book. I’m so glad it’s been helpful!

The test is performed on the sample but the results apply to the population. Hence, if the difference between the sample mean (observed in your study) and the hypothesized mean is statistically significant, that suggests that population does not equal the hypothesized mean.

For one sample tests, the hypothesized mean is not the sample mean. It is a mean that you want to use for the test value. It usually represents a value that is important to your research. In other words, it’s a value that you pick for some theoretical/practical reasons. You pick it because you want to determine whether the population mean is different from that particular value.

I hope that helps!

' src=

November 5, 2020 at 6:24 am

Jim, you are such a magnificent statistician/economist/econometrician/data scientist etc whatever profession. Your work inspires and simplifies the lives of so many researchers around the world. I truly admire you and your work. I will buy a copy of each book you have on statistics or econometrics. Keep doing the good work. Remain ever blessed

November 6, 2020 at 9:47 pm

Hi Renatus,

Thanks so much for you very kind comments. You made my day!! I’m so glad that my website has been helpful. And, thanks so much for supporting my books! 🙂

' src=

November 2, 2020 at 9:32 pm

Hi Jim, I hope you are aware of 2019 American Statistical Association’s official statement on Statistical Significance: https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913 In case you do not bother reading the full article, may I quote you the core message here: “We conclude, based on our review of the articles in this special issue and the broader literature, that it is time to stop using the term “statistically significant” entirely. Nor should variants such as “significantly different,” “p < 0.05,” and “nonsignificant” survive, whether expressed in words, by asterisks in a table, or in some other way."

With best wishes,

November 3, 2020 at 2:09 am

I’m definitely aware of the debate surrounding how to use p-values most effectively. However, I need to correct you on one point. The link you provide is NOT a statement by the American Statistical Association. It is an editorial by several authors.

There is considerable debate over this issue. There are problems with p-values. However, as the authors state themselves, much of the problem is over people’s mindsets about how to use p-values and their incorrect interpretations about what statistical significance does and does not mean.

If you were to read my website more thoroughly, you’d be aware that I share many of their concerns and I address them in multiple posts. One of the authors’ key points is the need to be thoughtful and conduct thoughtful research and analysis. I emphasize this aspect in multiple posts on this topic. I’ll ask you to read the following three because they all address some of the authors’ concerns and suggestions. But you might run across others to read as well.

Five Tips for Using P-values to Avoid Being Misled How to Interpret P-values Correctly P-values and the Reproducibility of Experimental Results

' src=

September 24, 2020 at 11:52 pm

HI Jim, i just want you to know that you made explanation for Statistics so simple! I should say lesser and fewer words that reduce the complexity. All the best! 🙂

September 25, 2020 at 1:03 am

Thanks, Rene! Your kind words mean a lot to me! I’m so glad it has been helpful!

' src=

September 23, 2020 at 2:21 am

Honestly, I never understood stats during my entire M.Ed course and was another nightmare for me. But how easily you have explained each concept, I have understood stats way beyond my imagination. Thank you so much for helping ignorant research scholars like us. Looking forward to get hardcopy of your book. Kindly tell is it available through flipkart?

September 24, 2020 at 11:14 pm

I’m so happy to hear that my website has been helpful!

I checked on flipkart and it appears like my books are not available there. I’m never exactly sure where they’re available due to the vagaries of different distribution channels. They are available on Amazon in India.

Introduction to Statistics: An Intuitive Guide (Amazon IN) Hypothesis Testing: An Intuitive Guide (Amazon IN)

' src=

July 26, 2020 at 11:57 am

Dear Jim I am a teacher from India . I don’t have any background in statistics, and still I should tell that in a single read I can follow your explanations . I take my entire biostatistics class for botany graduates with your explanations. Thanks a lot. May I know how I can avail your books in India

July 28, 2020 at 12:31 am

Right now my books are only available as ebooks from my website. However, soon I’ll have some exciting news about other ways to obtain it. Stay tuned! I’ll announce it on my email list. If you’re not already on it, you can sign up using the form that is in the right margin of my website.

' src=

June 22, 2020 at 2:02 pm

Also can you please let me if this book covers topics like EDA and principal component analysis?

June 22, 2020 at 2:07 pm

This book doesn’t cover principal components analysis. Although, I wouldn’t really classify that as a hypothesis test. In the future, I might write a multivariate analysis book that would cover this and others. But, that’s well down the road.

My Introduction to Statistics covers EDA. That’s the largely graphical look at your data that you often do prior to hypothesis testing. The Introduction book perfectly leads right into the Hypothesis Testing book.

June 22, 2020 at 1:45 pm

Thanks for the detailed explanation. It does clear my doubts. I saw that your book related to hypothesis testing has the topics that I am studying currently. I am looking forward to purchasing it.

Regards, Take Care

June 19, 2020 at 1:03 pm

For this particular article I did not understand a couple of statements and it would great if you could help: 1)”If sample error causes the observed difference, the next time someone performs the same experiment the results might be different.” 2)”If the difference does not exist at the population level, you won’t obtain the benefits that you expect based on the sample statistics.”

I discovered your articles by chance and now I keep coming back to read & understand statistical concepts. These articles are very informative & easy to digest. Thanks for the simplifying things.

June 20, 2020 at 9:53 pm

I’m so happy to hear that you’ve found my website to be helpful!

To answer your questions, keep in mind that a central tenant of inferential statistics is that the random sample that a study drew was only one of an infinite number of possible it could’ve drawn. Each random sample produces different results. Most results will cluster around the population value assuming they used good methodology. However, random sampling error always exists and makes it so that population estimates from a sample almost never exactly equal the correct population value.

So, imagine that we’re studying a medication and comparing the treatment and control groups. Suppose that the medicine is truly not effect and that the population difference between the treatment and control group is zero (i.e., no difference.) Despite the true difference being zero, most sample estimates will show some degree of either a positive or negative effect thanks to random sampling error. So, just because a study has an observed difference does not mean that a difference exists at the population level. So, on to your questions:

1. If the observed difference is just random error, then it makes sense that if you collected another random sample, the difference could change. It could change from negative to positive, positive to negative, more extreme, less extreme, etc. However, if the difference exists at the population level, most random samples drawn from the population will reflect that difference. If the medicine has an effect, most random samples will reflect that fact and not bounce around on both sides of zero as much.

2. This is closely related to the previous answer. If there is no difference at the population level, but say you approve the medicine because of the observed effects in a sample. Even though your random sample showed an effect (which was really random error), that effect doesn’t exist. So, when you start using it on a larger scale, people won’t benefit from the medicine. That’s why it’s important to separate out what is easily explained by random error versus what is not easily explained by it.

I think reading my post about how hypothesis tests work will help clarify this process. Also, in about 24 hours (as I write this), I’ll be releasing my new ebook about Hypothesis Testing!

' src=

May 29, 2020 at 5:23 am

Hi Jim, I really enjoy your blog. Can you please link me on your blog where you discuss about Subgroup analysis and how it is done? I need to use non parametric and parametric statistical methods for my work and also do subgroup analysis in order to identify potential groups of patients that may benefit more from using a treatment than other groups.

May 29, 2020 at 2:12 pm

Hi, I don’t have a specific article about subgroup analysis. However, subgroup analysis is just the dividing up of a larger sample into subgroups and then analyzing those subgroups separately. You can use the various analyses I write about on the subgroups.

Alternatively, you can include the subgroups in regression analysis as an indicator variable and include that variable as a main effect and an interaction effect to see how the relationships vary by subgroup without needing to subdivide your data. I write about that approach in my article about comparing regression lines . This approach is my preferred approach when possible.

' src=

April 19, 2020 at 7:58 am

sir is confidence interval is a part of estimation?

' src=

April 17, 2020 at 3:36 pm

Sir can u plz briefly explain alternatives of hypothesis testing? I m unable to find the answer

April 18, 2020 at 1:22 am

Assuming you want to draw conclusions about populations by using samples (i.e., inferential statistics ), you can use confidence intervals and bootstrap methods as alternatives to the traditional hypothesis testing methods.

' src=

March 9, 2020 at 10:01 pm

Hi JIm, could you please help with activities that can best teach concepts of hypothesis testing through simulation, Also, do you have any question set that would enhance students intuition why learning hypothesis testing as a topic in introductory statistics. Thanks.

' src=

March 5, 2020 at 3:48 pm

Hi Jim, I’m studying multiple hypothesis testing & was wondering if you had any material that would be relevant. I’m more trying to understand how testing multiple samples simultaneously affects your results & more on the Bonferroni Correction

March 5, 2020 at 4:05 pm

I write about multiple comparisons (aka post hoc tests) in the ANOVA context . I don’t talk about Bonferroni Corrections specifically but I cover related types of corrections. I’m not sure if that exactly addresses what you want to know but is probably the closest I have already written. I hope it helps!

' src=

January 14, 2020 at 9:03 pm

Thank you! Have a great day/evening.

January 13, 2020 at 7:10 pm

Any help would be greatly appreciated. What is the difference between The Hypothesis Test and The Statistical Test of Hypothesis?

January 14, 2020 at 11:02 am

They sound like the same thing to me. Unless this is specialized terminology for a particular field or the author was intending something specific, I’d guess they’re one and the same.

' src=

April 1, 2019 at 10:00 am

so these are the only two forms of Hypothesis used in statistical testing?

April 1, 2019 at 10:02 am

Are you referring to the null and alternative hypothesis? If so, yes, that’s those are the standard hypotheses in a statistical hypothesis test.

April 1, 2019 at 9:57 am

year very insightful post, thanks for the write up

' src=

October 27, 2018 at 11:09 pm

hi there, am upcoming statistician, out of all blogs that i have read, i have found this one more useful as long as my problem is concerned. thanks so much

October 27, 2018 at 11:14 pm

Hi Stano, you’re very welcome! Thanks for your kind words. They mean a lot! I’m happy to hear that my posts were able to help you. I’m sure you will be a fantastic statistician. Best of luck with your studies!

' src=

October 26, 2018 at 11:39 am

Dear Jim, thank you very much for your explanations! I have a question. Can I use t-test to compare two samples in case each of them have right bias?

October 26, 2018 at 12:00 pm

Hi Tetyana,

You’re very welcome!

The term “right bias” is not a standard term. Do you by chance mean right skewed distributions? In other words, if you plot the distribution for each group on a histogram they have longer right tails? These are not the symmetrical bell-shape curves of the normal distribution.

If that’s the case, yes you can as long as you exceed a specific sample size within each group. I include a table that contains these sample size requirements in my post about nonparametric vs parametric analyses .

Bias in statistics refers to cases where an estimate of a value is systematically higher or lower than the true value. If this is the case, you might be able to use t-tests, but you’d need to be sure to understand the nature of the bias so you would understand what the results are really indicating.

I hope this helps!

' src=

April 2, 2018 at 7:28 am

Simple and upto the point 👍 Thank you so much.

April 2, 2018 at 11:11 am

Hi Kalpana, thanks! And I’m glad it was helpful!

' src=

March 26, 2018 at 8:41 am

Am I correct if I say: Alpha – Probability of wrongly rejection of null hypothesis P-value – Probability of wrongly acceptance of null hypothesis

March 28, 2018 at 3:14 pm

You’re correct about alpha. Alpha is the probability of rejecting the null hypothesis when the null is true.

Unfortunately, your definition of the p-value is a bit off. The p-value has a fairly convoluted definition. It is the probability of obtaining the effect observed in a sample, or more extreme, if the null hypothesis is true. The p-value does NOT indicate the probability that either the null or alternative is true or false. Although, those are very common misinterpretations. To learn more, read my post about how to interpret p-values correctly .

' src=

March 2, 2018 at 6:10 pm

I recently started reading your blog and it is very helpful to understand each concept of statistical tests in easy way with some good examples. Also, I recommend to other people go through all these blogs which you posted. Specially for those people who have not statistical background and they are facing to many problems while studying statistical analysis.

Thank you for your such good blogs.

March 3, 2018 at 10:12 pm

Hi Amit, I’m so glad that my blog posts have been helpful for you! It means a lot to me that you took the time to write such a nice comment! Also, thanks for recommending by blog to others! I try really hard to write posts about statistics that are easy to understand.

' src=

January 17, 2018 at 7:03 am

I recently started reading your blog and I find it very interesting. I am learning statistics by my own, and I generally do many google search to understand the concepts. So this blog is quite helpful for me, as it have most of the content which I am looking for.

January 17, 2018 at 3:56 pm

Hi Shashank, thank you! And, I’m very glad to hear that my blog is helpful!

' src=

January 2, 2018 at 2:28 pm

thank u very much sir.

January 2, 2018 at 2:36 pm

You’re very welcome, Hiral!

' src=

November 21, 2017 at 12:43 pm

Thank u so much sir….your posts always helps me to be a #statistician

November 21, 2017 at 2:40 pm

Hi Sachin, you’re very welcome! I’m happy that you find my posts to be helpful!

' src=

November 19, 2017 at 8:22 pm

great post as usual, but it would be nice to see an example.

November 19, 2017 at 8:27 pm

Thank you! At the end of this post, I have links to four other posts that show examples of hypothesis tests in action. You’ll find what you’re looking for in those posts!

Comments and Questions Cancel reply

If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked.

To log in and use all the features of Khan Academy, please enable JavaScript in your browser.

Statistics and probability

Course: statistics and probability   >   unit 12, hypothesis testing and p-values.

  • One-tailed and two-tailed tests
  • Z-statistics vs. T-statistics
  • Small sample hypothesis test
  • Large sample proportion hypothesis testing

hypothesis testing for one sample

Want to join the conversation?

  • Upvote Button navigates to signup page
  • Downvote Button navigates to signup page
  • Flag Button navigates to signup page

Good Answer

Video transcript

User Preferences

Content preview.

Arcu felis bibendum ut tristique et egestas quis:

  • Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
  • Duis aute irure dolor in reprehenderit in voluptate
  • Excepteur sint occaecat cupidatat non proident

Keyboard Shortcuts

5.3 - hypothesis testing for one-sample mean.

In the previous section, we learned how to perform a hypothesis test for one proportion. The concepts of hypothesis testing remain constant for any hypothesis test. In these next few sections, we will present the hypothesis test for one mean. We start with our knowledge of the sampling distribution of the sample mean.

Hypothesis Test for One-Sample Mean Section  

Recall that under certain conditions, the sampling distribution of the sample mean, \(\bar{x} \), is approximately normal with mean, \(\mu \), standard error \(\dfrac{\sigma}{\sqrt{n}} \), and estimated standard error \(\dfrac{s}{\sqrt{n}} \).

\(H_0\colon \mu=\mu_0\)

Conditions:

  • The distribution of the population is Normal
  • The sample size is large \( n>30 \).

Test Statistic:

If at least one of conditions are satisfied, then...

\( t=\dfrac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}} \)

will follow a t-distribution with \(n-1 \) degrees of freedom.

Notice when working with continuous data we are going to use a t statistic as opposed to the z statistic. This is due to the fact that the sample size impacts the sampling distribution and needs to be taken into account. We do this by recognizing “degrees of freedom”. We will not go into too much detail about degrees of freedom in this course.

Let’s look at an example.

Example 5-1 Section  

This depends on the standard deviation of \(\bar{x} \) . 

\begin{align} t^*&=\dfrac{\bar{x}-\mu}{\frac{s}{\sqrt{n}}}\\&=\dfrac{8.3-8.5}{\frac{1.2}{\sqrt{61}}}\\&=-1.3 \end{align} 

Thus, we are asking if \(-1.3\) is very far away from zero, since that corresponds to the case when \(\bar{x}\) is equal to \(\mu_0 \). If it is far away, then it is unlikely that the null hypothesis is true and one rejects it. Otherwise, one cannot reject the null hypothesis. 

hypothesis testing for one sample

Snapsolve any problem by taking a picture. Try it in the Numerade app?

Back to blog home

Hypothesis testing explained in 4 parts, yuzheng sun, phd.

As data scientists, Hypothesis Testing is expected to be well understood, but often not in reality. It is mainly because our textbooks blend two schools of thought – p-value and significance testing vs. hypothesis testing – inconsistently.

For example, some questions are not obvious unless you have thought through them before:

Are power or beta dependent on the null hypothesis?

Can we accept the null hypothesis? Why?

How does MDE change with alpha holding beta constant?

Why do we use standard error in Hypothesis Testing but not the standard deviation?

Why can’t we be specific about the alternative hypothesis so we can properly model it?

Why is the fundamental tradeoff of the Hypothesis Testing about mistake vs. discovery, not about alpha vs. beta?

Addressing this problem is not easy. The topic of Hypothesis Testing is convoluted. In this article, there are 10 concepts that we will introduce incrementally, aid you with visualizations, and include intuitive explanations. After this article, you will have clear answers to the questions above that you truly understand on a first-principle level and explain these concepts well to your stakeholders.

We break this article into four parts.

Set up the question properly using core statistical concepts, and connect them to Hypothesis Testing, while striking a balance between technically correct and simplicity. Specifically, 

We emphasize a clear distinction between the standard deviation and the standard error, and why the latter is used in Hypothesis Testing

We explain fully when can you “accept” a hypothesis, when shall you say “failing to reject” instead of “accept”, and why

Introduce alpha, type I error, and the critical value with the null hypothesis

Introduce beta, type II error, and power with the alternative hypothesis

Introduce minimum detectable effects and the relationship between the factors with power calculations , with a high-level summary and practical recommendations

Part 1 - Hypothesis Testing, the central limit theorem, population, sample, standard deviation, and standard error

In Hypothesis Testing, we begin with a null hypothesis , which generally asserts that there is no effect between our treatment and control groups. Commonly, this is expressed as the difference in means between the treatment and control groups being zero.

The central limit theorem suggests an important property of this difference in means — given a sufficiently large sample size, the underlying distribution of this difference in means will approximate a normal distribution, regardless of the population's original distribution. There are two notes:

1. The distribution of the population for the treatment and control groups can vary, but the observed means (when you observe many samples and calculate many means) are always normally distributed with a large enough sample. Below is a chart, where the n=10 and n=30 correspond to the underlying distribution of the sample means.

Central Limit Theorem

2. Pay attention to “the underlying distribution”. Standard deviation vs. standard error is a potentially confusing concept. Let’s clarify.

Standard deviation vs. Standard error

Let’s declare our null hypothesis as having no treatment effect. Then, to simplify, let’s propose the following normal distribution with a mean of 0 and a standard deviation of 1 as the range of possible outcomes with probabilities associated with this null hypothesis.

Standard Deviation v Standard Error

The language around population, sample, group, and estimators can get confusing. Again, to simplify, let’s forget that the null hypothesis is about the mean estimator, and declare that we can either observe the mean hypothesis once or many times. When we observe it many times, it forms a sample*, and our goal is to make decisions based on this sample.

* For technical folks, the observation is actually about a single sample, many samples are a group, and the difference in groups is the distribution we are talking about as the mean hypothesis. The red curve represents the distribution of the estimator of this difference, and then we can have another sample consisting of many observations of this estimator. In my simplified language, the red curve is the distribution of the estimator, and the blue curve with sample size is the repeated observations of it. If you have a better way to express these concepts without causing confusiongs, please suggest.

This probability density function means if there is one realization from this distribution, the realitization can be anywhere on the x-axis, with the relative likelihood on the y-axis.

If we draw multiple observations , they form a sample . Each observation in this sample follows the property of this underlying distribution – more likely to be close to 0, and equally likely to be on either side, which makes the odds of positive and negative cancel each other out, so the mean of this sample is even more centered around 0.

We use the standard error to represent the error of our “sample mean” . 

The standard error = the standard deviation of the observed sample / sqrt (sample size). 

For a sample size of 30, the standard error is roughly 0.18. Compared with the underlying distribution, the distribution of the sample mean is much narrower.

Standard Deviation and Standard Error 2 Images

In Hypothesis Testing, we try to draw some conclusions – is there a treatment effect or not? – based on a sample. So when we talk about alpha and beta, which are the probabilities of type I and type II errors , we are talking about the probabilities based on the plot of sample means and standard error .

Part 2, The null hypothesis: alpha and the critical value

From Part 1, we stated that a null hypothesis is commonly expressed as the difference in means between the treatment and control groups being zero.

Without loss of generality*, let’s assume the underlying distribution of our null hypothesis is mean 0 and standard deviation 1

Then the sample mean of the null hypothesis is 0 and the standard error of 1/√ n, where n is the sample size.

When the sample size is 30, this distribution has a standard error of ≈0.18 looks like the below. 

Null Hypothesis YZ

*: A note for the technical readers: The null hypothesis is about the difference in means, but here, without complicating things, we made the subtle change to just draw the distribution of this “estimator of this difference in means”. Everything below speaks to this “estimator”.

The reason we have the null hypothesis is that we want to make judgments, particularly whether a  treatment effect exists. But in the world of probabilities, any observation, and any sample mean can happen, with different probabilities. So we need a decision rule to help us quantify our risk of making mistakes.

The decision rule is, let’s set a threshold. When the sample mean is above the threshold, we reject the null hypothesis; when the sample mean is below the threshold, we accept the null hypothesis.

Accepting a hypothesis vs. failing to reject a hypothesis

It’s worth noting that you may have heard of “we never accept a hypothesis, we just fail to reject a hypothesis” and be subconsciously confused by it. The deep reason is that modern textbooks do an inconsistent blend of Fisher’s significance testing and Neyman-Pearson’s Hypothesis Testing definitions and ignore important caveats ( ref ). To clarify:

First of all, we can never “prove” a particular hypothesis given any observations, because there are infinitely many true hypotheses (with different probabilities) given an observation. We will visualize it in Part 3.

Second, “accepting” a hypothesis does not mean that you believe in it, but only that you act as if it were true. So technically, there is no problem with “accepting” a hypothesis.

But, third, when we talk about p-values and confidence intervals, “accepting” the null hypothesis is at best confusing. The reason is that “the p-value above the threshold” just means we failed to reject the null hypothesis. In the strict Fisher’s p-value framework, there is no alternative hypothesis. While we have a clear criterion for rejecting the null hypothesis (p < alpha), we don't have a similar clear-cut criterion for "accepting" the null hypothesis based on beta.

So the dangers in calling “accepting a hypothesis” in the p-value setting are:

Many people misinterpret “accepting” the null hypothesis as “proving” the null hypothesis, which is wrong; 

“Accepting the null hypothesis” is not rigorously defined, and doesn’t speak to the purpose of the test, which is about whether or not we reject the null hypothesis. 

In this article, we will stay consistent within the Neyman-Pearson framework , where “accepting” a hypothesis is legal and necessary. Otherwise, we cannot draw any distributions without acting as if some hypothesis was true.

You don’t need to know the name Neyman-Pearson to understand anything, but pay attention to our language, as we choose our words very carefully to avoid mistakes and confusion.

So far, we have constructed a simple world of one hypothesis as the only truth, and a decision rule with two potential outcomes – one of the outcomes is “reject the null hypothesis when it is true” and the other outcome is “accept the null hypothesis when it is true”. The likelihoods of both outcomes come from the distribution where the null hypothesis is true.

Later, when we introduce the alternative hypothesis and MDE, we will gradually walk into the world of infinitely many alternative hypotheses and visualize why we cannot “prove” a hypothesis.

We save the distinction between the p-value/significance framework vs. Hypothesis Testing in another article where you will have the full picture.

Type I error, alpha, and the critical value

We’re able to construct a distribution of the sample mean for this null hypothesis using the standard error. Since we only have the null hypothesis as the truth of our universe, we can only make one type of mistake – falsely rejecting the null hypothesis when it is true. This is the type I error , and the probability is called alpha . Suppose we want alpha to be 5%. We can calculate the threshold required to make it happen. This threshold is called the critical value . Below is the chart we further constructed with our sample of 30.

Type I Error Alpha Critical Value

In this chart, alpha is the blue area under the curve. The critical value is 0.3. If our sample mean is above 0.3, we reject the null hypothesis. We have a 5% chance of making the type I error.

Type I error: Falsely rejecting the null hypothesis when the null hypothesis is true

Alpha: The probability of making a Type I error

Critical value: The threshold to determine whether the null hypothesis is to be rejected or not

Part 3, The alternative hypothesis: beta and power

You may have noticed in part 2 that we only spoke to Type I error – rejecting the null hypothesis when it is true. What about the Type II error – falsely accepting the null hypothesis when it is not true?

But it is weird to call “accepting” false unless we know the truth. So we need an alternative hypothesis which serves as the alternative truth. 

Alternative hypotheses are theoretical constructs

There is an important concept that most textbooks fail to emphasize – that is, you can have infinitely many alternative hypotheses for a given null hypothesis, we just choose one. None of them are more special or “real” than the others. 

Let’s visualize it with an example. Suppose we observed a sample mean of 0.51, what is the true alternative hypothesis?

Alternative hypotheses theoretical

With this visualization, you can see why we have “infinitely many alternative hypotheses” because, given the observation, there is an infinite number of alternative hypotheses (plus the null hypothesis) that can be true, each with different probabilities. Some are more likely than others, but all are possible.

Remember, alternative hypotheses are a theoretical construct. We choose one particular alternative hypothesis to calculate certain probabilities. By now, we should have more understanding of why we cannot “accept” the null hypothesis given an observation. We can’t prove that the null hypothesis is true, we just fail to accept it given the observation and our pre-determined decision rule. 

We will fully reconcile this idea of picking one alternative hypothesis out of the world of infinite possibilities when we talk about MDE. The idea of “accept” vs. “fail to reject” is deeper, and we won’t cover it fully in this article. We will do so when we have an article about the p-value and the confidence interval.

Type II error and Beta

For the sake of simplicity and easy comparison, let’s choose an alternative hypothesis with a mean of 0.5, and a standard deviation of

1. Again, with a sample size of 30, the standard error ≈0.18. There are now two potential “truths” in our simple universe.

Type II Error and Beta

Remember from the null hypothesis, we want alpha to be 5% so the corresponding critical value is 0.30. We modify our rule as follows:

If the observation is above 0.30, we reject the null hypothesis and accept the alternative hypothesis ; 

If the observation is below 0.30, we accept the null hypothesis and reject the alternative hypothesis .

Reject alternative and accept null

With the introduction of the alternative hypothesis, the alternative “(hypothesized) truth”, we can call “accepting the null hypothesis and rejecting the alternative hypothesis” a mistake – the Type II error. We can also calculate the probability of this mistake. This is called beta, which is illustrated by the red area below.

Null hypothesis alternative hypothesis

From the visualization, we can see that beta is conditional on the alternative hypothesis and the critical value. Let’s elaborate on these two relationships one by one, very explicitly, as both of them are important.

First, Let’s visualize how beta changes with the mean of the alternative hypothesis by setting another alternative hypothesis where mean = 1 instead of 0.5

Sample Size 30 for Null and Alternative Hypothesis

Beta change from 13.7% to 0.0%. Namely, beta is the probability of falsely rejecting a particular alternative hypothesis when we assume it is true. When we assume a different alternative hypothesis is true, we get a different beta. So strictly speaking, beta only speaks to the probability of falsely rejecting a particular alternative hypothesis when it is true . Nothing else. It’s only under other conditions, that “rejecting the alternative hypothesis” implies “accepting” the null hypothesis or “failing to accept the null hypothesis”. We will further elaborate when we talk about p-value and confidence interval in another article. But what we talked about so far is true and enough for understanding power.

Second, there is a relationship between alpha and beta. Namely, given the null hypothesis and the alternative hypothesis, alpha would determine the critical value, and the critical value determines beta. This speaks to the tradeoff between mistake and discovery. 

If we tolerate more alpha, we will have a smaller critical value, and for the same beta, we can detect a smaller alternative hypothesis

If we tolerate more beta, we can also detect a smaller alternative hypothesis. 

In short, if we tolerate more mistakes (either Type I or Type II), we can detect a smaller true effect. Mistake vs. discovery is the fundamental tradeoff of Hypothesis Testing.

So tolerating more mistakes leads to more chance of discovery. This is the concept of MDE that we will elaborate on in part 4.

Finally, we’re ready to define power. Power is an important and fundamental topic in statistical testing, and we’ll explain the concept in three different ways.

Three ways to understand power

First, the technical definition of power is 1−β. It represents that given an alternative hypothesis and given our null, sample size, and decision rule (alpha = 0.05), the probability is that we accept this particular hypothesis. We visualize the yellow area below.

Understand Power Hypothesis

Second, power is really intuitive in its definition. A real-world example is trying to determine the most popular car manufacturer in the world. If I observe one car and see one brand, my observation is not very powerful. But if I observe a million cars, my observation is very powerful. Powerful tests mean that I have a high chance of detecting a true effect.

Third, to illustrate the two concepts concisely, let’s run a visualization by just changing the sample size from 30 to 100 and see how power increases from 86.3% to almost 100%.

Same size from 30 to 100

As the graph shows, we can easily see that power increases with sample size . The reason is that the distribution of both the null hypothesis and the alternative hypothesis became narrower as their sample means got more accurate. We are less likely to make either a type I error (which reduces the critical value) or a type II error.  

Type II error: Failing to reject the null hypothesis when the alternative hypothesis is true

Beta: The probability of making a type II error

Power: The ability of the test to detect a true effect when it’s there

Part 4, Power calculation: MDE

The relationship between mde, alternative hypothesis, and power.

Now, we are ready to tackle the most nuanced definition of them all: Minimum detectable effect (MDE). First, let’s make the sample mean of the alternative hypothesis explicit on the graph with a red dotted line.

Relationship between MDE

What if we keep the same sample size, but want power to be 80%? This is when we recall the previous chapter that “alternative hypotheses are theoretical constructs”. We can have a different alternative that corresponds to 80% power. After some calculations, we discovered that when it’s the alternative hypothesis with mean = 0.45 (if we keep the standard deviation to be 1).

MDE Alternative Hypothesis pt 2

This is where we reconcile the concept of “infinitely many alternative hypotheses” with the concept of minimum detectable delta. Remember that in statistical testing, we want more power. The “ minimum ” in the “ minimum detectable effect”, is the minimum value of the mean of the alternative hypothesis that would give us 80% power. Any alternative hypothesis with a mean to the right of MDE gives us sufficient power.

In other words, there are indeed infinitely many alternative hypotheses to the right of this mean 0.45. The particular alternative hypothesis with a mean of 0.45 gives us the minimum value where power is sufficient. We call it the minimum detectable effect, or MDE.

Not enough power MDE

The complete definition of MDE from scratch

Let’s go through how we derived MDE from the beginning:

We fixed the distribution of sample means of the null hypothesis, and fixed sample size, so we can draw the blue distribution

For our decision rule, we require alpha to be 5%. We derived that the critical value shall be 0.30 to make 5% alpha happen

We fixed the alternative hypothesis to be normally distributed with a standard deviation of 1 so the standard error is 0.18, the mean can be anywhere as there are infinitely many alternative hypotheses

For our decision rule, we require beta to be 20% or less, so our power is 80% or more. 

We derived that the minimum value of the observed mean of the alternative hypothesis that we can detect with our decision rule is 0.45. Any value above 0.45 would give us sufficient power.

How MDE changes with sample size

Now, let’s tie everything together by increasing the sample size, holding alpha and beta constant, and see how MDE changes.

How MDE changes with sample size

Narrower distribution of the sample mean + holding alpha constant -> smaller critical value from 0.3 to 0.16

+ holding beta constant -> MDE decreases from 0.45 to 0.25

This is the other key takeaway:  The larger the sample size, the smaller of an effect we can detect, and the smaller the MDE.

This is a critical takeaway for statistical testing. It suggests that even for companies not with large sample sizes if their treatment effects are large, AB testing can reliably detect it.

Statistical Power Curve

Summary of Hypothesis Testing

Let’s review all the concepts together.

Assuming the null hypothesis is correct:

Alpha: When the null hypothesis is true, the probability of rejecting it

Critical value: The threshold to determine rejecting vs. accepting the null hypothesis

Assuming an alternative hypothesis is correct:

Beta: When the alternative hypothesis is true, the probability of rejecting it

Power: The chance that a real effect will produce significant results

Power calculation:

Minimum detectable effect (MDE): Given sample sizes and distributions, the minimum mean of alternative distribution that would give us the desired alpha and sufficient power (usually alpha = 0.05 and power >= 0.8)

Relationship among the factors, all else equal: Larger sample, more power; Larger sample, smaller MDE

Everything we talk about is under the Neyman-Pearson framework. There is no need to mention the p-value and significance under this framework. Blending the two frameworks is the inconsistency brought by our textbooks. Clarifying the inconsistency and correctly blending them are topics for another day.

Practical recommendations

That’s it. But it’s only the beginning. In practice, there are many crafts in using power well, for example:

Why peeking introduces a behavior bias, and how to use sequential testing to correct it

Why having multiple comparisons affects alpha, and how to use Bonferroni correction

The relationship between sample size, duration of the experiment, and allocation of the experiment?

Treat your allocation as a resource for experimentation, understand when interaction effects are okay, and when they are not okay, and how to use layers to manage

Practical considerations for setting an MDE

Also, in the above examples, we fixed the distribution, but in reality, the variance of the distribution plays an important role. There are different ways of calculating the variance and different ways to reduce variance, such as CUPED, or stratified sampling.

Related resources:

How to calculate power with an uneven split of sample size: https://blog.statsig.com/calculating-sample-sizes-for-a-b-tests-7854d56c2646

Real-life applications: https://blog.statsig.com/you-dont-need-large-sample-sizes-to-run-a-b-tests-6044823e9992

Create a free account

2m events per month, free forever..

Sign up for Statsig and launch your first experiment in minutes.

Build fast?

Try statsig today.

hypothesis testing for one sample

Recent Posts

Controlling your type i errors: bonferroni and benjamini-hochberg.

The Benjamini-Hochberg procedure on Statsig reduces false positives in experiments by adjusting significance levels for multiple comparisons, ensuring reliable results.

Top 8 common experimentation mistakes and how to fix them

I discussed 8 A/B testing mistakes with Allon Korem (Bell Statistics) and Tyler VanHaren (Statsig). Learn fixes to improve accuracy and drive better business outcomes.

Introducing Differential Impact Detection

Introducing Differential Impact Detection: Identify how different user groups respond to treatments and gain useful insights from varied experiment results.

Identifying and experimenting with Power Users using Statsig

Identify power users to drive growth and engagement. Learn to pinpoint and leverage these key players with targeted experiments for maximum impact.

How to Ingest Data Into Statsig

Simplify data pipelines with Statsig. Use SDKs, third-party integrations, and Data Warehouse Native Solution for effortless data ingestion at any stage.

A/B Testing performance wins on NestJS API servers

Learn how we use Statsig to enhance our NestJS API servers, reducing request processing time and CPU usage through performance experiments.

Z-test vs T-test: the differences and when to use each

What is hypothesis testing, what is a z-test, examples of a z-test, what is a t-test, examples of a t-test, how to know when to use z-test vs t-test, difference between z-test and t-test: a comparative table.

two light bulbs with inscriptions Z-test vs T-test

The EPAM Anywhere Editorial Team is an international collective of senior software engineers, managers and communications professionals who create, review and share their insights on technology, career, remote work, and the daily life here at Anywhere.

Testing is how you determine effectiveness. Whether you work as a data scientist , statistician, or software developer, to ensure quality, you must measure performance. Without tests, you could deploy flawed code, features, or data points.

With that in mind, the use cases of testing are endless. Machine learning models need statistical tests. Data analysis involves statistical tests to validate assumptions. Optimization of any kind requires evaluation. You even need to test the strength of your hypothesis before you begin an inquiry.

Let's explore two inferential statistics: the Z-test vs the T-test. That way you can understand their differences, their unique purposes, and when to use a Z-test vs T-test.

To start, imagine you have a good idea. At the moment of inception, you have no data to back up your idea. It is an unformed thought. But the idea is an excellent starting point that can launch a full investigation. We consider this starting point a hypothesis.

But what if your hypothesis is off-base? You don’t want to dive into a full-scale search if it is a pointless chase with no reward. That is a waste of resources. You need to determine if you have a workable hypothesis.

Enter hypothesis testing. It is a statistical act used to assess the viability of a hypothesis. The method discovers whether there is sufficient data to support your idea. If there is next to no significance, you do not have a very plausible hypothesis.

To confirm the validity of a hypothesis, you compare it against the status quo (also known as the null hypothesis). Your idea is something new, opposite from normal conditions (also known as the alternative hypothesis). It is zero sum: only one hypothesis between the null and alternate hypothesis can be true in a given set of parameters.

In such a comparison test, you can now determine validity. You can compare and contrast conditions to find meaningful conclusions. Whichever conditions become statistically apparent determines which hypothesis is plausible.

A Z-test is a test statistic. It works with two groups: the sample mean and the population mean. It will test whether these two groups are different.

With a Z-test, you know the population standard deviation. That is to ensure statistical accuracy as you compare one group (the sample mean) vs the second group (the population mean). In other words, you can minimize external confounding factors with a normal distribution. In addition, a defining characteristic of a Z-test is that it works with large sample sizes (typically more than 30, so we achieve normal distribution as defined by the central limit theorem). These are two crucial criteria for using a Z-test.

Within hypothesis testing, your null hypothesis states there is no difference between the two groups your Z-test will compare. Your alternative hypothesis will state there is a difference that your Z-test will expose.

How to perform a Z-test

A Z-test occurs in the following standard format:

  • Formulate your hypothesis: First, define the parameters of your alternative and null hypothesis.
  • Choose a critical value: Second, determine what you consider a viable difference between your two groups. This threshold determines when you can say the null hypothesis should be rejected. Common levels are 0.01 (1%) or 0.05 (5%) , values found to best balance Type I and Type II errors .
  • Collect samples: Obtain the needed data. The data must be large enough and random.
  • Calculate for a Z-score: Input your data into the standard Z-test statistics formula, shown below, where Z = standard score, x = observed value, mu = mean of the sample, sigma = standard deviation of the sample .
  • Compare: If the statistical test is greater than the critical value, you have achieved statistical significance. The sample mean is so different so you can reject the null hypothesis. Your alternative hypothesis (something other than the status quo) is at work, and that's worth investigating.

There are different variations of a Z-test. Let's explore examples of one-sample and two-sample Z-tests.

One-sample Z-test

A one-sample Z-test looks for either an increase or a decrease. There is one sample group involved, taken from a population. We want to see if there is a difference between those two means.

For example, consider a school principal who believes their students' IQ is higher than the state average. The state average is 100 points (population mean), give or take 20 (the population standard deviation). To prove this hypothesis, the principal takes 50 students (the sample size) and finds their IQ scores. To their delight, they earn an average of 110.

But does the difference offer any statistical value? The principal then plugs the numbers into a Z-test. Any Z-score greater than the critical value would state there is sufficient significance. The claim that the students have an above-average IQ is valid.

Two-sample Z-test

A two-sample test involves comparing the average of two sample groups against the population means. It is to determine a difference between two independent samples.

For example, our principal wants to compare their students' IQ scores to the school across the street. They believe their students' average IQ is higher. They don’t need to know the exact numerical increase or decrease. All they want is proof that their student's average scores are higher than the other group.

To confirm the validity of this hypothesis, the principal will search for statistical significance. They can take a 50-student sample size from their school and a 50-student sample size from the rival school. Now in possession of both sample group's average IQ (and the sample standard deviation), they hope to find a number value that is not equal. And they need them to be unequal by a significant amount.

If the test statistic comes in less than the critical value, the differences are negligible. There is not enough evidence to say the hypothesis is worth exploring, the null hypothesis is maintained. He would not have enough proof that the IQ levels between the two schools are different.

data_analyst_portfolio_preview.jpg

Read full story

A T-test performs the same crucial function as a Z-test: determine if there is a difference between the means of two groups. If there is a significant difference, you have achieved statistical validity for your hypothesis.

However, a T-test involves a different set of factors. Most importantly, a T-test applies when you do not know the sample variance of your values. You must generalize the normal distribution (or T-distribution). Plus, there is an expectation that you do not possess all the data in a given scenario.

These conditions better match reality, as it is often hard to collect data from entire populations or always obtain a standard normal distribution. That is why T-tests are more widely applicable than Z-tests, though they operate with less precision.

How to perform a T-test

A T-test occurs in the following standard format:

  • Formulate your hypothesis: First, define the parameters of your null and alternative hypothesis.
  • Choose a critical value: Like a Z-test, determine what you consider a viable difference between your two groups.
  • Collect data: Obtain the needed data. One of the key differences is degrees of freedom in the samples of a T-test, so try to define the typical values and range of values in each group.
  • Calculate your T-score: Input your data into the T-test formula you chose. Here is a one-sample formula:
  • Compare: If the statistical test is greater than the critical value, you have achieved statistical significance. The sample mean is so far from the population mean that you likely have a useful hypothesis.

There are several different kinds of T-tests as well. Let's go through the standard one-sample and two-sample T-tests.

One-sample T-test

A one-sample T-test looks for an increase or decrease compared to a population mean.

For example, your company just went through sales training. Now, the manager wants to know if the training helped improve sales.

Previous company sales data shows an average of $100 on each transaction from all workers. The null hypothesis would be no change. The alternative hypothesis (which you hope is significant), is that there is an improvement.

To test if there is significance, you take the sales average of 20 salesmen. That is the only available data, and you have no other data from nationwide stores. The average of that sample of salesmen in the past month is $130. We will also assume that the standard deviation is approximately equal .

With this set of factors, you can calculate your T-score with a T-test. You compare the sample result to the critical value. In addition, you assess it against the number of degrees of freedom. Since we know with smaller sample data sizes there is greater uncertainty, we allow more room for our data to vary.

After comparing, we may find a lot of significance. That means the data possesses enough strength to support our hypothesis that sales training likely impacted sales. Of course, this is an estimate, as we only assessed one factor with a small group. Sales could have risen for numerous other reasons. But with our set of assumptions, our hypothesis is valid.

Two-sample T-test

A two-sample T-test occurs the same as a two-sample Z-test and compares if two groups are equal when compared to a defined population parameter.

For example, consider English and non-native speakers. We want to see the effect of maternal language on test scores inside a country. To do that, we will offer both groups a reading test and compare those scores to the average.

Of course, finding the mean of an entire population of language speakers is impossible to procure. Still, we can make some assumptions and compare them with a smaller size. We take 15 English speakers and 15 non-native speakers and collect their results. We can decide on a critical score value on the reading test as well. If the average score on the test is not crucially different or outside the population standard deviation, our assumption failed. There is no significant difference between the groups, so the impact of maternal language is not worth investigating.

Both a Z-test and a T-test validate a hypothesis. Both are parametric tests that rely on assumptions. The key difference between Z-test and T-test is in their assumptions (e.g. population variance).

Key differences about the data used result in different applications. You want to use the appropriate tool, otherwise you won’t draw valid conclusions from your data.

So when should you use a Z-test vs a T-test? Here are some factors to consider:

  • Sample size: If the available sample size is small, opt for a T-test. Small sample sizes are more variable, so the greater spread of distribution and estimation of error involved with T-tests is ideal.
  • Knowledge of the population standard deviation: Z-tests are more precise and often simpler to execute. So if you know the standard deviation, use a Z-test.
  • Test purpose: If you are assessing the validity of a mean, a T-test is the best choice. If you are working with a hypothesized population proportion, go for a Z-test.
  • Assumption of normality: A Z-test assumes a normal distribution. This does not apply to all real-world scenarios. If you hope to validate a hypothesis that is not well-defined, opt for a T-test instead.
  • Type of data: You can only work within the constraints of the available data. The more information the better, but that is often not possible given testing and collecting conditions. If you have limited data describing means between groups, opt for a T-test. If you have large data sets comparing means between populations, you can use a Z-test.

Knowing the key differences with each statistical test makes selecting the right tool far easier. Here is a table that can help you compare:

Compare means of small samples (n < 30)Compare means of large samples (n ≥ 30)
Normally distributed data, approximate normalityNormally distributed data, known population standard deviation
UnknownKnown
Small (n < 30)Large (n ≥ 30)
T-distributionStandard normal distribution (Z-distribution)
n1 + n2 - 2Not applicable
Small sample analysis, comparing means between groupsLarge sample analysis, population mean comparisons
BothUsually two-sample
Raw dataRaw data
Relatively more complexRelatively simpler

Statistical testing lets you determine the validity of a hypothesis. You discover validity by determining if there is a significant difference between your hypothesis and the status quo. If there is, you have a possible idea worth exploring.

That process has numerous applications in the field of computer science and data analysis . You might want to determine the performance of an app with an A/B test. Or you might need to test if an application fits within the defined limits and compare performance metrics. Z-tests and T-tests can depict whether there is significant evidence in each of these scenarios. With that information, you can take the appropriate measures to fix bugs or optimize processes.

Z-test and T-test are helpful tools, especially for hypothesis testing. For data engineers of the future, knowledge of statistical testing will only help your work and overall career trajectory.

Are you a data scientist looking for a job? Check out our remote data scientist jobs available .

Explore our Editorial Policy to learn more about our standards for content creation.

step-by-step guide to creating, building, and showcasing your data analyst portfolio projects

step-by-step guide to creating, building, and showcasing your data analyst portfolio projects

C# introduction: basics, introduction, and beginner projects

C# introduction: basics, introduction, and beginner projects

SDLC models: approach development the way that works for you

SDLC models: approach development the way that works for you

how to write effective usability test scripts

how to write effective usability test scripts

what is .NET?

what is .NET?

how quantitative usability testing improves digital products

how quantitative usability testing improves digital products

product feedback: how to use it to improve digital products

product feedback: how to use it to improve digital products

top 14 C++ machine learning libraries

top 14 C++ machine learning libraries

how to become a senior software engineer: a guide

how to become a senior software engineer: a guide

creative Salesforce portfolio project ideas

creative Salesforce portfolio project ideas

rebase vs. merge: a comprehensive guide

rebase vs. merge: a comprehensive guide

the future of Agile: what to expect in 2024 and beyond

the future of Agile: what to expect in 2024 and beyond

best React projects for a portfolio: from ideas to standout examples

best React projects for a portfolio: from ideas to standout examples

Ruby on Rails interview questions

Ruby on Rails interview questions

data scientist interview questions and answers

data scientist interview questions and answers

24 unique JavaScript projects for portfolio

24 unique JavaScript projects for portfolio

  • Effective Strategies for Conducting Hypothesis Tests

Effective Strategies for Conducting Hypothesis Tests in Statistics

Dr. Jane Smith

Hypothesis testing is a fundamental aspect of statistics, crucial for drawing meaningful conclusions from data. Whether you’re a student tackling your statistics homework or a researcher analyzing experimental results, understanding the process of hypothesis testing can significantly enhance your analytical skills. This guide will provide you with effective strategies for conducting hypothesis tests, ensuring you approach your statistical problems with confidence and precision.

Understanding Hypothesis Testing

Before diving into strategies, let's briefly recap what hypothesis testing involves. Hypothesis testing is a statistical method used to make inferences about a population parameter based on a sample. The process typically involves the following steps:

  • Formulate the Hypotheses: Establish the null hypothesis ((H_0)) and the alternative hypothesis ((H_a)).
  • Select the Significance Level ((\alpha)): Choose the probability of rejecting the null hypothesis when it is true.
  • Choose the Appropriate Test: Depending on the data and the hypotheses, select the appropriate statistical test.
  • Calculate the Test Statistic: Use the sample data to calculate the test statistic.
  • Determine the p-value or Critical Value: Compare the test statistic to the critical value or use the p-value to decide whether to reject the null hypothesis.
  • Draw Conclusions: Based on the comparison, decide whether to reject or fail to reject the null hypothesis.

Effective Strategies for Conducting Hypothesis Tests in Statistics

Strategy 1: Clearly Define Your Hypotheses

The first step in hypothesis testing is to clearly define your null and alternative hypotheses. The null hypothesis ((H_0)) typically represents the status quo or a statement of no effect, while the alternative hypothesis ((H_a)) represents the effect or difference you aim to detect. For instance, if you are testing whether a new drug is more effective than the existing one, (H_0) could be "the new drug has no effect" and (H_a) could be "the new drug is more effective."

Strategy 2: Choose the Right Test for Your Data

Selecting the appropriate statistical test is crucial. The choice depends on several factors, including the type of data, sample size, and whether the population variance is known. Common tests include:

  • Z-test: Used when the population variance is known and the sample size is large.
  • T-test: Used when the population variance is unknown and the sample size is small.
  • Chi-square test: Used for categorical data to assess how likely it is that an observed distribution is due to chance.
  • ANOVA (Analysis of Variance): Used to compare the means of three or more samples.

Strategy 3: Ensure Proper Sample Size

A proper sample size is essential for the reliability of your hypothesis test. Too small a sample size may lead to inconclusive results, while too large a sample size may make even trivial differences appear significant. Use power analysis to determine the appropriate sample size for your study, ensuring that it is neither too small nor unnecessarily large.

Strategy 4: Check Assumptions

Every statistical test comes with its assumptions. For instance, the t-test assumes that the data are normally distributed and that the samples have equal variances. Always check these assumptions before proceeding with the test. If the assumptions are violated, consider using a different test or transforming your data.

Strategy 5: Interpret Results in Context

Interpreting the results of a hypothesis test goes beyond just looking at the p-value. Consider the practical significance of your findings. A statistically significant result does not always imply a practically important effect. Additionally, consider the confidence interval, which provides a range of values within which the true population parameter is likely to lie.

Strategy 6: Avoid Common Pitfalls

Be aware of common pitfalls in hypothesis testing:

  • P-hacking: Avoid manipulating your data or performing multiple tests just to achieve significant results.
  • Ignoring Effect Size: Always report the effect size to provide a sense of the magnitude of the observed effect.
  • Overreliance on P-values: A p-value is not the probability that the null hypothesis is true. It merely indicates the strength of the evidence against the null hypothesis.

Practical Example

Let’s consider a practical example to illustrate these strategies. Suppose we want to test whether a new teaching method improves student performance compared to the traditional method.

1. Formulate Hypotheses:

(H_0): The new teaching method has no effect on student performance.

(H_a): The new teaching method improves student performance.

  • Select Significance Level: Choose (\alpha = 0.05).
  • Choose the Test: If we have a small sample size and do not know the population variance, a t-test is appropriate.
  • Calculate the Test Statistic: Use sample data to calculate the t-value.
  • Determine p-value or Critical Value: Compare the calculated t-value to the critical value from the t-distribution table.
  • Draw Conclusions: If the p-value is less than 0.05, we reject (H_0), concluding that the new teaching method improves student performance.

Hypothesis testing is a powerful tool for making inferences about populations based on sample data. By clearly defining hypotheses, choosing the right test, ensuring proper sample size, checking assumptions, interpreting results in context, and avoiding common pitfalls, you can effectively conduct hypothesis tests and draw meaningful conclusions from your statistical analyses. Whether you’re working on homework assignments or conducting research, these strategies will enhance your ability to perform accurate and reliable hypothesis tests.

Post a comment...

Effective strategies for conducting hypothesis tests submit your homework, attached files.

File Actions

COMMENTS

  1. Hypothesis Testing with One Sample

    STEP 3: Calculate sample parameters and the test statistic. The sample parameters are provided, the sample mean is 7.91 and the sample variance is .03 and the sample size is 35. We need to note that the sample variance was provided not the sample standard deviation, which is what we need for the formula.

  2. One Sample T Test: Definition, Using & Example

    One Sample T Test Hypotheses. A one sample t test has the following hypotheses: Null hypothesis (H 0): The population mean equals the hypothesized value (µ = H 0).; Alternative hypothesis (H A): The population mean does not equal the hypothesized value (µ ≠ H 0).; If the p-value is less than your significance level (e.g., 0.05), you can reject the null hypothesis.

  3. 9: Hypothesis Testing with One Sample

    9.1: Prelude to Hypothesis Testing. A statistician will make a decision about claims via a process called "hypothesis testing." A hypothesis test involves collecting data from a sample and evaluating the data. Then, the statistician makes a decision as to whether or not there is sufficient evidence, based upon analysis of the data, to reject ...

  4. Significance tests (hypothesis testing)

    One-tailed and two-tailed tests (Opens a modal) Z-statistics vs. T-statistics (Opens a modal) Small sample hypothesis test (Opens a modal) Large sample proportion hypothesis testing (Opens a modal) Up next for you: Unit test. Level up on all the skills in this unit and collect up to 1,500 Mastery points!

  5. One Sample t-test: Definition, Formula, and Example

    Fortunately, a one sample t-test allows us to answer this question. One Sample t-test: Formula. A one-sample t-test always uses the following null hypothesis: H 0: μ = μ 0 (population mean is equal to some hypothesized value μ 0) The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed:

  6. Lesson 6b: Hypothesis Testing for One-Sample Mean

    Step 4: Compute the appropriate p-value based on our alternative hypothesis. If \(H_a \) is right-tailed, then the p-value is the probability the sample data produces a value equal to or greater than the observed test statistic.

  7. Hypothesis Testing

    Present the findings in your results and discussion section. Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps. Table of contents. Step 1: State your null and alternate hypothesis. Step 2: Collect data. Step 3: Perform a statistical test.

  8. Statistical Hypothesis Testing Overview

    Hypothesis testing is a crucial procedure to perform when you want to make inferences about a population using a random sample. These inferences include estimating population properties such as the mean, differences between means, proportions, and the relationships between variables. This post provides an overview of statistical hypothesis testing.

  9. 8: Hypothesis Testing with One Sample

    8.2: Hypothesis Test Examples for Means. The hypothesis test itself has an established process. This can be summarized as follows: Determine H0 and Ha. Remember, they are contradictory. Determine the random variable. Determine the distribution for the test. Draw a graph, calculate the test statistic, and use the test statistic to calculate the ...

  10. Hypothesis testing and p-values (video)

    In this video there was no critical value set for this experiment. In the last seconds of the video, Sal briefly mentions a p-value of 5% (0.05), which would have a critical of value of z = (+/-) 1.96. Since the experiment produced a z-score of 3, which is more extreme than 1.96, we reject the null hypothesis.

  11. 5.3

    5.3 - Hypothesis Testing for One-Sample Mean. In the previous section, we learned how to perform a hypothesis test for one proportion. The concepts of hypothesis testing remain constant for any hypothesis test. In these next few sections, we will present the hypothesis test for one mean. We start with our knowledge of the sampling distribution ...

  12. 10: Hypothesis Testing with One Sample

    10.1: Prelude to Hypothesis Testing. A statistician will make a decision about claims via a process called "hypothesis testing." A hypothesis test involves collecting data from a sample and evaluating the data. Then, the statistician makes a decision as to whether or not there is sufficient evidence, based upon analysis of the data, to reject ...

  13. PDF 8 Hypothesis*Tests*for* One*Sample

    Chapter 8 Stat 4570/5570. Material from Devore's book (Ed 8), and Cengage. A statistical hypothesis: a claim about the value of a parameter, population characteristic (could be a combination of parameters), or about the form of an entire probability distribution. Examples:

  14. Introduction to Hypothesis Testing

    A hypothesis test consists of five steps: 1. State the hypotheses. State the null and alternative hypotheses. These two hypotheses need to be mutually exclusive, so if one is true then the other must be false. 2. Determine a significance level to use for the hypothesis. Decide on a significance level.

  15. 9: Hypothesis Testing with One Sample

    9.1: Prelude to Hypothesis Testing. A statistician will make a decision about claims via a process called "hypothesis testing." A hypothesis test involves collecting data from a sample and evaluating the data. Then, the statistician makes a decision as to whether or not there is sufficient evidence, based upon analysis of the data, to reject ...

  16. 9.2: Hypothesis Testing

    Null and Alternative Hypotheses. The actual test begins by considering two hypotheses.They are called the null hypothesis and the alternative hypothesis.These hypotheses contain opposing viewpoints. \(H_0\): The null hypothesis: It is a statement of no difference between the variables—they are not related. This can often be considered the status quo and as a result if you cannot accept the ...

  17. 8.6: Hypothesis Test of a Single Population Mean with Examples

    Steps for performing Hypothesis Test of a Single Population Mean. Step 1: State your hypotheses about the population mean. Step 2: Summarize the data. State a significance level. State and check conditions required for the procedure. Find or identify the sample size, n, the sample mean, ˉx. x ¯.

  18. PDF Chapter (9) Fundamentals of Hypothesis Testing: One-Sample Tests

    distribution to test the null hypothesis about the mean. • When using the t distribution you must assume the population you are sampling from follows a normal distribution. Step (4): Selected the Critical value Types of test Z t The one - tailed test (Right) Z α (α , n−1) The one - tailed test (left) - Z (α - α,n−1)

  19. 10: Hypothesis Testing with One Sample

    A hypothesis test involves collecting data from a sample and evaluating the data. Then, the statistician makes a decision as to whether or not there is sufficient evidence, based upon analysis of the data, to reject the null hypothesis. 10.1: Null and Alternative Hypotheses. The actual test begins by considering two hypotheses.

  20. 9.E: Hypothesis Testing with One Sample (Exercises)

    An Introduction to Statistics class in Davies County, KY conducted a hypothesis test at the local high school (a medium sized-approximately 1,200 students-small city demographic) to determine if the local high school's percentage was lower. One hundred fifty students were chosen at random and surveyed.

  21. Learn about Hypothesis Testing with One Sample for ...

    Hypothesis testing with one sample is a fundamental concept in statistics that falls under the STEM category. It involves using a sample of data to test a hypothesis about a population parameter. The process involves setting up a null hypothesis, which assumes that there is no significant difference between the sample and population, and an ...

  22. Hypothesis Testing explained in 4 parts

    Hypothesis Testing often confuses data scientists due to mixed teachings. This guide clarifies 10 key concepts with visuals and simple explanations for better understanding. ... we have constructed a simple world of one hypothesis as the only truth, and a decision rule with two potential outcomes - one of the outcomes is "reject the null ...

  23. 8: Hypothesis Testing with One Sample

    8.1: Prelude to Hypothesis Testing. A statistician will make a decision about claims via a process called "hypothesis testing." A hypothesis test involves collecting data from a sample and evaluating the data. Then, the statistician makes a decision as to whether or not there is sufficient evidence, based upon analysis of the data, to reject ...

  24. Z-test vs T-test: the differences and when to use each

    Of course, this is an estimate, as we only assessed one factor with a small group. Sales could have risen for numerous other reasons. But with our set of assumptions, our hypothesis is valid. Two-sample T-test. A two-sample T-test occurs the same as a two-sample Z-test and compares if two groups are equal when compared to a defined population ...

  25. 9.2: Null and Alternative Hypotheses

    The actual test begins by considering two hypotheses. They are called the null hypothesis and the alternative hypothesis. ... Hypothesis Testing with One Sample 9.2: Null and Alternative Hypotheses Expand/collapse global location ... 9.1: Prelude to Hypothesis Testing; 9.2E: Null and Alternative Hypotheses (Exercises) Was this article helpful ...

  26. Effective Strategies for Conducting Hypothesis Tests

    For instance, if you are testing whether a new drug is more effective than the existing one, (H_0) could be "the new drug has no effect" and (H_a) could be "the new drug is more effective." ... A proper sample size is essential for the reliability of your hypothesis test. Too small a sample size may lead to inconclusive results, while too large ...

  27. 8.3: Hypothesis Testing of Single Mean

    Standardized Test Statistics for Small Sample Hypothesis Tests Concerning a Single Population Mean ... There are two formulas for the test statistic in testing hypotheses about a population mean with small samples. One test statistic follows the standard normal distribution, the other Student's \(t\)-distribution.