Geography Department Penn State


Hypothesis Testing: Randomization Distributions


Like bootstrap distributions, randomization distributions tell us about the spread of possible sample statistics. However, while bootstrap distributions are generated from the raw sample data, randomization distributions simulate the sample statistic values we should expect to see if the null hypothesis is true. This is the key to using randomization distributions for hypothesis testing: we can compare our actual sample statistic (from the raw data) to the range of values we would expect if the null hypothesis were true (i.e., the randomization distribution).

Randomization Procedure 

The procedure for generating a randomization distribution, and subsequently comparing to the actual sample statistic for hypothesis testing, is depicted in the figure below.

A conceptual flowchart depicting the generation of the randomization distribution from the null hypothesis.

The pseudo-code (general coding steps, not written in a specific coding language) for generating a randomization distribution is:

1. Obtain a sample of size n

2. For i in 1, 2, ..., N:

   2.1. Manipulate and randomize the sample so that the null hypothesis condition is met. It is important that this new sample has the same size as the original sample (n).

   2.2. Calculate the statistic of interest for the ith randomized sample

   2.3. Store this value as the ith randomized statistic

3. Combine all N randomized statistics into the randomization distribution

Here, we've set the number of randomization samples to N = 1000, which you can use as the default for this course. The validity of the randomization distribution depends on having a large enough number of samples, so it is not recommended to go below N = 1000. In the end, we have a vector or array of sample statistic values; that is our randomization distribution.
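As a concrete illustration, here is a minimal Python sketch of this procedure for a test of a single mean, using a "shift, then resample" strategy to impose the null condition. The function name, sample data, and null value are hypothetical, chosen only to make the steps above runnable:

```python
import random
import statistics

def randomization_distribution(sample, null_mean, n_reps=1000, seed=42):
    """Simulate sample means under H0: mu = null_mean.

    Shift the sample so its mean equals the null value, then resample
    with replacement. This centers the distribution on the null value
    while preserving the variability of the original data.
    """
    rng = random.Random(seed)
    shift = null_mean - statistics.mean(sample)
    shifted = [x + shift for x in sample]  # mean of shifted data is now null_mean
    rand_stats = []
    for _ in range(n_reps):                              # Step 2: repeat N times
        resample = rng.choices(shifted, k=len(sample))   # Step 2.1: same size n
        rand_stats.append(statistics.mean(resample))     # Steps 2.2-2.3
    return rand_stats                                    # Step 3: the distribution

# Example: test H0: mu = 10 with a small (made-up) sample
sample = [12.1, 9.8, 11.5, 10.2, 13.0, 9.4, 10.8, 11.9]
dist = randomization_distribution(sample, null_mean=10)
print(statistics.mean(dist))  # centered near the null value of 10
```

Note that the resulting collection of 1000 means is centered on the null value, not on the original sample mean; that is exactly the property the next paragraph describes.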

As we'll see in the subsequent pages, we'll use different strategies for simulating conditions under the assumption that the null hypothesis is true (Step 2.1 above). The choice of strategy depends on the type of test we're performing (e.g., single mean vs. single proportion vs. ...). Generally speaking, the goal of each strategy is to have the collection of sample statistics agree with the null hypothesis value on average, while maintaining the level of variability contained in the original sample data. This is really important, because our goal with hypothesis testing is to see what could occur by random chance alone, given that the null conditions are true, and then compare our data (representing what is actually happening in reality) to that range of possibilities.

  • The randomization distribution is centered on the value in the null hypothesis (the null value).
  • The spread of the randomization distribution describes what sample statistic values would occur with random chance and given the null hypothesis is true.

The figure below exemplifies these key features, where the histogram represents the randomization distribution and the vertical red dashed line is on the null value.

Hypothesis Testing Randomization Distribution Histogram

The p-value

Our goal with the randomization distribution is to see how likely our actual sample statistic is to occur, given that the null hypothesis is true. This measure of likelihood is quantified by the p-value: the probability of obtaining a sample statistic at least as extreme as the one observed, assuming the null hypothesis is true.

Let's elaborate on some important aspects of this definition and provide guidance on how to determine the p-value:

  • The p-value is a probability, so it has a value between 0 and 1.
  • This probability is measured as the proportion of samples in the randomization distribution that are at least as extreme as the observed sample statistic (from the original data).
      • If the alternative is < (a.k.a. "left-tailed test"), the p-value = the proportion of samples ≤ the sample statistic.
      • If the alternative is > (a.k.a. "right-tailed test"), the p-value = the proportion of samples ≥ the sample statistic.
      • If the alternative is ≠ (a.k.a. "two-tailed test"), the p-value = twice the smaller of: the proportion of samples ≤ the sample statistic, or the proportion of samples ≥ the sample statistic.
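These tail rules can be sketched in a few lines of Python (the helper name `p_value` is illustrative, not from any particular library):

```python
def p_value(rand_stats, observed, alternative):
    """Proportion of randomization statistics at least as extreme as the
    observed statistic. `alternative` is one of '<', '>', or '!='."""
    n = len(rand_stats)
    left = sum(s <= observed for s in rand_stats) / n    # left tail
    right = sum(s >= observed for s in rand_stats) / n   # right tail
    if alternative == '<':
        return left
    if alternative == '>':
        return right
    return 2 * min(left, right)  # two-tailed: double the smaller tail

# Toy randomization distribution of 10 statistics, centered near 0
rand_stats = [-2, -1, -1, 0, 0, 0, 1, 1, 2, 3]
print(p_value(rand_stats, observed=2, alternative='>'))   # 2 of 10 samples >= 2, so 0.2
```

In practice `rand_stats` would be the N = 1000 values generated by the randomization procedure, and `observed` would be the statistic computed from the original data.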

The default threshold for rejecting or not rejecting the null hypothesis is 0.05, referred to as the "significance level" (more on this later). Thus,

  • If the p-value < 0.05, we reject the null hypothesis (in favor of the alternative hypothesis).
  • If the p-value ≥ 0.05, we fail to reject the null hypothesis.
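The decision rule above amounts to a one-line comparison; a minimal sketch (the function name and default significance level follow this course's convention):

```python
def decide(p, alpha=0.05):
    """Classical decision rule at significance level alpha."""
    return "reject H0" if p < alpha else "fail to reject H0"

print(decide(0.03))  # reject H0
print(decide(0.12))  # fail to reject H0
```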

It should be noted, however, that some researchers are moving away from the classical paradigm and interpreting the p-value on more of a continuous scale, where smaller values provide stronger evidence against the null hypothesis.


  • Diversity and Inclusion
  • Faculty Positions
  • Location and Maps
  • Our Department
  • Graduate Programs
  • How to Apply
  • Online Programs
  • Student Resources
  • Grad Student Assoc.
  • Graduate Student Highlights
  • Frequently Asked Questions
  • Online Courses
  • Prospective Students
  • Undergraduate Programs
  • Integrated Degree
  • Research Opportunities
  • Statistics Club
  • Syllabus Archive
  • Career Development
  • Administrative Faculty
  • Graduate Students
  • M.A.S. Students
  • Affiliated Faculty
  • Lindsay Assistant Professors
  • Online Instructors
  • Post-docs / Visitors
  • Astrostatistics
  • Bayesian Statistics
  • Biostatistics & Bioinformatics
  • Business Analytics
  • Computational Statistics
  • Functional Data Analysis
  • High Dimensional Data
  • Imaging Science
  • Social Science
  • Spatial and Spatiotemporal Data

  • Statistical Network Science
  • Statistical and Machine Learning
  • Statistics Education
  • Giving to Statistics

Dept. of Department of Statistics

Runze Li

Runze Li is Eberly Family Chair Professor of Statistics at Penn State.

Li received his Ph.D. in Statistics from University of North Carolina at Chapel Hill in 2000.

Li's research interest includes variable selection and feature screening for high dimensional data, nonparametric modeling and semiparametric modeling and their application to social behavior science research. He is also interested in longitudinal data analysis and survival data analysis and their application to biomedical data analysis.

Li joined Penn State as an assistant professor of statistics in 2000, and became associate professor, full professor, distinguished professor and Verne M. Willaman Professor of Statistics in 2005, 2008, 2012 and 2014, respectively. Since 2018, he is the Eberly Family Chair Professor of Statistics. He received his NSF Career Award in 2004. He is a fellow of IMS, ASA and AAAS. He was co-editor of Annals of Statistics, and served as associate editor of Annals of Statistics and Statistica Sinica. He currently serves as associate editor of JASA and Journal of Multivariate Analysis.

Honors and Awards

  • The United Nations' World Meteorological Organization Gerbier-Mumm International Award for 2012
  • Editor of The Annals of Statistics (2013 - 2015)
  • Highly Cited Researcher in Mathematics (2014 - )
  • ICSA Distinguished Achievement Award, 2017
  • Faculty Research Recognition Awards for Outstanding Collaborative Research. College of Medicine, Penn State University, 2018
  • IMS Medallion Lecturer at Joint Statistical Meetings, August 5-10, 2023 in Toronto
  • Distinguished Mentoring Award, Eberly College of Science, Penn State University, 2023
  • Fellow, IMS, ASA and American Association for the Advancement of Science

Publications

  • Zhong, W., Qian, C., Liu, W., Zhu, L. and Li, R. (2023). Feature screening for interval-valued response with application to study association between posted salary and required skills. Journal of American Statistical Association. 118, 805 - 817.
  • Sheng, B., Li, C., Bao, L. and Li, R. (2023). Probabilistic HIV recency classification - a logistic regression without labelled individual level training data. Annals of Applied Statistics. 17, 108-129.
  • Guo, X, Li, R, Liu, J. and Zeng, M. (2023). Statistical inference for linear mediation models with high-dimensional mediators and application to studying stock reaction to COVID-19 pandemic. Journal of Econometrics. 235, 166-179.
  • Bao, L, Li, C., Li, R. and Yang, S. (2022). Causal structural learning on MPHIA individual dataset. Journal of American Statistical Association. 117, 1642-1655.
  • Li, C. and Li, R. (2022). Linear hypothesis testing in linear models with high dimensional responses. Journal of American Statistical Association. 117, 1738-1750.
  • Guo, X., Li, R., Liu, J. and Zeng, M. (2022). High-dimensional mediation analysis for selecting DNA methylation Loci mediating childhood trauma and cortisol stress reactivity. Journal of American Statistical Association. 117, 1110-1121.
  • Nandy, D., Chiaromonte, F. and Li, R. (2022). Covariate information number for feature screening in ultrahigh-dimensional supervised problems. Journal of American Statistical Association. 117, 1516 - 1529.
  • Ren, H., Zou, C., Chen, N. and Li, R. (2022). Large-scale data streams surveillance via pattern-oriented-sampling. Journal of American Statistical Association. 117, 794-808.
  • Liu, W., Ke, Y., Liu, J. and Li, R. (2022). Model-free feature screening and FDR control with knockoff features. Journal of American Statistical Association. 117(537), 428-443.
  • Zou, T, Lan, W, Li, R. and Tsai, C.-L. (2022). Inference on Covariance-Mean Regression. Journal of Econometrics. 230, 318 - 338.
  • Liu, W., Yu, X. and Li, R. (2022). Multiple-splitting project test for high dimensional mean vectors. Journal of Machine Learning and Research. 23(71), 1-27.
  • Cai, Z., Li, R. and Zhang, Y. (2022). A distribution free conditional independence test with applications to causal discovery. Journal of Machine Learning and Research. 23(85), 1-41.
  • Shi, C., Song, R., Lu, W. and Li, R. (2021). Statistical inference for high-dimensional models via recursive online-score estimation. Journal of American Statistical Association. 116, 1307 - 1318.
  • Li, Z., Wang, Q. and Li, R. (2021). Central limit theorem for linear spectral statistics of large dimensional Kendall's rank correlation matrices and its applications. Annals of Statistics. 49, 1569 -1593.
  • Xiao, D., Ke, Y. and Li, R. (2021). Homogeneity structure learning in large-scale panel data with heavy-tailed errors. Journal of Machine Learning Research. 22 22(13):1-42, 2021.
  • Wang, L., Peng, B., Bradic, J., Li, R. and Wu, Y. (2020). A tuning-free robust and efficient approach to high-dimensional regression (with discussions and rejoinder). Journal of American Statistical Association. 115, 1700 - 1729.
  • Fang, X. E., Ning, Y. and Li, R. (2020). Test of signi cance for high-dimensional longitudinal data. Annals of Statistics. 48, 2622 - 2645.
  • Zhou, T., Zhu, L., Xu, C. and Li, R. (2020). Model-free forward regression via cumulative divergence. Journal of American Statistical Association. 115, 1393 - 1405.
  • Zou, C., Wang, G. and Li, R. (2020). Consistent selection of the number of change-points via sample-splitting. Annals of Statistics. 48, 413-439.
  • Cui, X., Li, R., Yang, G. and Zhou, W. (2020). Empirical likelihood test for large dimensional mean vector. Biometrika. 107, 591 - 607.
  • Wang, L., Chen, Z., Wang, C. D. and Li, R. (2020). Ultrahigh dimensional precision matrix estimation via refitted cross validation. Journal of Econometrics. 215, 118-130.
  • Chu, W., Li, R., Liu, J. and Reimherr, M. (2020). Feature screening for generalized varying coefficient mixed effect models with application to obesity GWAS. Annals of Applied Statistics. 14, 276 - 298.
  • Cai, Z, Li, R. and Zhu, L. (2020). Online sufficient dimension reduction through sliced inverse regression. Journal of Machine Learning and Research. 21(10), 1-25.
  • Li, X., Li, R., Xia, Z. and Xu, C. (2020). Distributed feature screening via componentwise debiasing. Journal of Machine Learning and Research. 21 (24), 1-32
  • Zhong, P.-S., Li, R. and Santo, S. (2019). Homogeneity test of covariance matrices and change-points identification with high-Dimensional longitudinal data. Biometrika. 106, 619 - 634.
  • Zheng, S., Chen, Z., Cui, H. and Li, R. (2019). Hypothesis testing on linear structures of high dimensional covariance matrix. Annals of Statistics. 47, 3300 - 3334.
  • Shi, C., Song, R., Chen, Z. and Li, R. (2019). Linear hypothesis testing for high dimensional generalized linear models. Annals of Statistics. 47, 2671 - 2703.
  • Liu, H., Wang, X., Yao, T., Li, R. and Ye, Y. (2019). Sample average approximation with sparsity-inducing penalty for high-dimensional stochastic programming. Mathematical Programming, 78, 69-108.
  • Chu, W., Li, R. and Reimherr, M. (2016). Feature screening for time-varying coefficient models with ultrahigh dimensional longitudinal data. Annals of Applied Statistics, 10, 596 - 617.
  • Li, R., Zhong, W. and Zhu, L. (2012). Feature screening via distance correlation learning. Journal of American Statistical Association. 107, 1129 - 1139.
  • Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models (with discussion). Annals of Statistics, 36, 1509-1566.
  • Li, R. and Liang, H. (2008). Variable selection in semiparametric regression modeling. Annals of Statistics. 36, 261-286
  • Fan, J. and Li, R. (2006). Statistical Challenges with High Dimensionality: Feature Selection in Knowledge Discovery. Proceedings of the International Congress of Mathematicians (M. Sanz-Sole, J. Soria, J.L. Varona, J. Verdera, eds.), Vol. III, European Mathematical Society, Zurich, 595-622.
  • Li, R. and Sudjianto, A. (2005). Analysis of computer experiments using penalized likelihood in Gaussian kriging Models. Technometrics. 47, 111-120.
  • Fan, J. and Li, R. (2004). New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. Journal of American Statistical Association, 99, 710-723.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and it oracle properties, Journal of American Statistical Association. 96, 1348-1360.
  • Cai, Z., Fan, J. and Li, R. (2000). Efficient estimation and inferences for varying coefficient models. Journal of the American Statistical Association. 5, 888-902.

Stat 565 - Multivariate Analysis

Stat 597 - Statistical Foundations of Data Science

Stat 597 - Statistical Inference on High-dimensional Data

Penn State logo

  • All bulletins
  • Undergraduate
  • Penn State Law
  • Dickinson Law
  • College of Medicine
  • Departments

Statistics (STAT)

Descriptive statistics, hypothesis testing, power, estimation, confidence intervals, regression, one- and 2-way ANOVA, Chi-square tests, diagnostics.

Prerequisite: one undergraduate course in statistics

Analysis of research data through simple and multiple regression and correlation; polynomial models; indicator variables; step-wise, piece-wise, and logistic regression.

Prerequisite: STAT 500 or equivalent; matrix algebra

Analysis of variance and design concepts; factorial, nested, and unbalanced data; ANCOVA; blocked, Latin square, split-plot, repeated measures designs.

Prerequisite: STAT 462 or STAT 501

Design principles; optimality; confounding in split-plot, repeated measures, fractional factorial, response surface, and balanced/partially balanced incomplete block designs.

Prerequisite: STAT 462 or STAT 501 ; STAT 502

Models for frequency arrays; goodness-of-fit tests; two-, three-, and higher- way tables; latent and logistic models.

Prerequisite: STAT 460 or STAT 502 or STAT 516 ; matrix algebra

Analysis of multivariate data; T2-tests; particle correlation; discrimination; MANOVA; cluster analysis; regression; growth curves; factor analysis; principal components; canonical correlations.

Prerequisite: MATH 441 , STAT 501 , STAT 502

Theory and application of sampling from finite populations.

Prerequisite: calculus; 3 credits in statistics

Research and quantitative methods for analysis of epidemiologic observational studies. Non-randomized, intervention studies for human health, and disease treatment. STAT 507 Epidemiologic Research Methods (3) This 3-credit course develops research and quantitative methods related to the design and analysis of epidemiological (mostly observational) studies. Such studies assess the health and disease status of one or more human populations or identify factors associated with health and disease status. To a lesser degree, the course also covers non-randomized, intervention (experimental) studies that may be designed and analyzed with epidemiological methods. This course is a second-level course and complements Biostat Methods, STAT 509 , which is focused on clinical (experimental) trials. Together, these two courses provide students with a complete review of research methods for the design and analysis for common studies related to human health, disease, and treatment. Prerequisite are Intro Biostats ( STAT 250 or equivalent).

Prerequisite: STAT 250 or equivalent

With rapid advances in information technology, the field of Applied Statistics and Data Science has witnessed an explosive growth in the capabilities to generate and collect data. In the business world, very large databases on commercial transactions are generated by retailers. Huge amounts of scientific data are generated in various fields as well using a wide assortment of high throughput technologies. The internet provides another example of billions of web pages consisting of textual and multimedia information that is used by millions of people. Analyzing large complex bodies of data systematically and efficiently remains a challenging problem. This course addresses this problem by covering techniques and new software that automate the analysis and exploration of large complex data sets. Data Mining methods are introduced by using examples to demonstrate the power of the statistical methods for exploring structure in data sets, discovering patterns in data, making predictions, and reducing the dimensionality by Principal Component Analysis (PCA) and other tools for visualization of high dimensional data. Exploratory data analysis, classification methods, clustering methods, and other statistical and algorithmic tools are presented and applied to actual data. In particular, the course investigates classification methods (supervised learning), and clustering methods (unsupervised learning), and other statistical and algorithmic tools as they are applied to actual data. In addition, data mining and learning techniques developed in fields other than statistics, e.g., machine learning and signal processing, will also be reviewed. The Statistics graduate program also offers more in-depth courses on data mining, STAT 557 and STAT 558 . This course focuses on how to use software to investigate and analyze large data sets, whereas STAT 557 and STAT 558 focus more on writing data mining algorithms and the computational aspects of algorithm implementation.

Prerequisite: ( STAT 501 ; STAT 462 )

An introduction to the design and statistical analysis of randomized and observational studies in biomedical research. STAT 509 Design and Analysis of Clinical Trials (3) The objective of the course is to introduce students to the various design and statistical analysis issues in biomedical research. This is intended as a survey course covering a wide variety of topics in clinical trials, bioequivalence trials, toxicological experiments, and epidemiological studies. Many of these topics do not appear in other statistics courses, although a few topics are covered in greater depth in more advanced statistics courses. Computations are performed via the SAS statistical software package. Evaluation methods include four to five homework assignments, an in-class mid-semester examination and an in-class final examination.

Prerequisite: STAT 500

Identification of models for empirical data collected over time. Use of models in forecasting.

Prerequisite: STAT 462 or STAT 501 or STAT 511

Multiple regression methodology using matrix notation; linear, polynomial, and nonlinear models; indicator variables; AOV models; piece-wise regression, autocorrelation; residual analyses.

Prerequisite: STAT 500 or equivalent; matrix algebra; calculus

AOV, unbalanced, nested factors; CRD, RCBD, Latin squares, split-plot, and repeatd measures; incomplete block, fractional factorial, response surface designs; confounding.

Prerequisite: STAT 511

Probability models, random variables, expectation, generating functions, distribution theory, limit theorems, parametric families, exponential families, sampling distributions.

Prerequisite: MATH 230

Sufficiency, completeness, likelihood, estimation, testing, decision theory, Bayesian inference, sequential procedures, multivariate distributions and inference, nonparametric inference.

Prerequisite: STAT 513

Conditional probability and expectation, Markov chains, Poisson processes, Continuous-time Markov chains, Monte Carlo methods, Markov chain Monte Carlo. STAT 515 Stochastic Processes and Monte Carlo Methods (3) This course provides an introduction to stochastic processes and Monte Carlo methods. The course covers topics usually covered in a standard introductory course on stochastic processes, including Markov chains of various kinds. It also covers modern Monte Carlo and Markov chain Monte Carlo methods. Simulation and computing are emphasized throughout the course. The course is divided into two parts: the first part (roughly 8 weeks) provides an introduction to stochastic processes, while the latter (roughly 7 weeks) focuses on Monte Carlo methods, including Markov chain Monte Carlo. The first part of the course begins with a review of elementary conditional probability and expectation before covering basic discrete-time Markov chain theory and Poisson processes. The course then provides students with an overview of continuous-time Markov chains and birth-death processes. The second part of the course covers Monte Carlo methods. Starting with basic random variate generation, the course covers classical Monte Carlo methods such as accept-reject and importance sampling before discussing Markov chain Monte Carlo (MCMC) methods, which includes the Metropolis-Hastings and Gibbs sampling algorithms, and Markov chain theory for discrete-time continuous-space Markov chains.

Prerequisite: MATH 414 , STAT 414 , or STAT 513

Measure theoretic foundation of probability, distribution functions and laws, types of convergence, central limit problem, conditional probability, special topics.

Prerequisite: MATH 403

Cross-listed with: MATH 517

Prerequisite: STAT 517

Cross-listed with: MATH 518

Selected topics in stochastic processes, including Markov and Wiener processes; stochastic integrals, optimization, and control; optimal filtering.

Prerequisite: STAT 516 , STAT 517

Cross-listed with: MATH 519

Location estimation, 2- and K- sample problems, matched pairs, tests for association and covariance analysis when the data are censored.

Prerequisite: STAT 512 , STAT 514

Computational foundations of statistics; algorithms for linear and nonlinear models, discrete algorithms in statistics, graphics, missing data, Monte Carlo techniques.

Prerequisite: STAT 501 or STAT 511 ; STAT 415 ; matrix algebra

Two-way tables; generalized linear models; logistic and conditional logistic models; loglinear models; fitting strategies; model selection; residual analysis.

A coordinate-free treatment of the theory of univariate linear models, including multiple regression and analysis of variance models.

Prerequisite: MATH 415 or STAT 415 or STAT 514 ; STAT 512 ; MATH 436 or MATH 441

Treatment of other normal models, including generalized linear, repeated measures, random effects, mixed, correlation, and some multivariate models.

Prerequisite: STAT 551

A rigorous but non-measure-theoretic introduction to statistical large-sample theory for Ph.D. students. STAT 553 Asymptotic Tools (3) STAT 553 covers most standard statistical asymptotics theory but does not require any knowledge of measure theory (it does not define convergence with probability one, for example). It covers convergence of random variables in both the univariate and multivariate settings, Slutsky's theorem(s) and the delta method, the Lindeberg-Feller central limit theorem, power and sample size, likelihood-based estimation and testing, and U-statistics. Although there is no measure theory in the course, it is a mathematically rigorous course and major results are proved. Many common applications of the theory in mathematical statistics are discussed, and most assignments require the use of a computer.

Prerequisite: STAT 513 and STAT 514

Statistical Analysis of High Throughput Biology Experiments.

Cross-listed with: BIOL 555 , MCIBS 555

This course introduces data mining and statistical/machine learning, and their applications in information retrieval, database management, and image analysis. STAT 557 Data Mining I With rapid advances in information technology, we have witnessed an explosive growth in our capabilities to generate and collect data in the last decade. In the business world, very large databases on commercial transactions have been generated by retailers. Huge amount of scientific data have been generated in various fields as well. For instance, the human genome database project has collected gigabytes of data on the human genetic code. The World Wide Web provides another example with billions of web pages consisting of textual and multimedia information that are used by millions of people. How to analyze huge bodies of data so that they can be understood and used efficiently remains a challenging problem. Data mining addresses this problem by providing techniques and software to automate the analysis and exploration of large complex data sets. Research on data mining have been pursued by researchers in a wide variety of fields, including statistics, machine learning, database management and data visualization. This course on data mining will cover methodology, major software tools and applications in this field. By introducing principal ideas in statistical learning, the course will help students to understand conceptual underpinnings of methods in data mining. Considerable amount of effort will also be put on computational aspects of algorithm implementation. To make an algorithm efficient for handling very large scale data sets, issues such as algorithm scalability need to be carefully analyzed. Data mining and learning techniques developed in fields other than statistics, e.g., machine learning and signal processing, will also be introduced. 
Example topics include linear classification/regression, logistic regression, model regularization, dimension reduction, prototype methods, decision trees, mixture models, and hidden Markov models. Students will be required to work on projects to practice applying existing software and to a certain extent, developing their own algorithms. Classes will be provided in three forms: lecture, project discussion, and special topic survey/research applications. Project discussion will enable students to share and compare ideas with each other and to receive specific guidance from the instructors. Efforts will be made to help students formulate real-world problems into mathematical models so that suitable algorithms can be applied with consideration of computational constraints. By surveying special topics, students will be exposed to massive literature and become more aware of recent research. Students are strongly encouraged to survey or present their own applications of data mining and statistical learning in graduate research and carry out discussions on data collection and problem formulation.

Prerequisite: STAT 318 or STAT 416 and basic programming skills

Advanced data mining techniques: temporal pattern mining, network mining, boosting, discriminative models, generative models, data warehouse, and choosing mining algorithms. IST (STAT) 558 Data Mining II (3)This course is the second course in a two-course sequence on data mining. It emphasizes advanced concepts and techniques for data mining and their application to large-scale data warehouse. Building on the statistical foundations and underpinnings of data mining introduced in Data Mining I , this course covers advanced topics on data mining; mining association rules from large-scale data warehouse, hierarchical clustering, mining patterns from temporal data, semi-supervised learning, active learning and boosting. In addition, to computational aspects of algorithm implementation, the course will also cover architecture and implementation of data warehouse, data preprocessing (including data cleansing), and the choice of mining algorithms for applications. In addition to discriminative models such as CRF and SVM models, the course will also introduce generative models such as Bayesian Net and LDA. A term project will be developed by each student to apply an advanced data mining algorithm to a multi-dimensional data set. Classes will include lectures, paper discussions, and project presentations. Paper discussions will allow students to discuss state-of-the-art literature related to data mining. Project presentations will enable students to share and compare project ideas with each other and to receive feedback from the instructor.

Prerequisite: STAT 557 or IST 557

Cross-listed with: IST 558

Classical optimal hypothesis test and confidence regions, Bayesian inference, Bayesian computation, large sample relationship between Bayesian and classical procedures.

Prerequisite: STAT 514 ; Concurrent: STAT 517

Basic limit theorems; asymptotically efficient estimators and tests; local asymptotic analysis; estimating equations and generalized linear models.

Prerequisite: STAT 561

Theoretical treatment of methods for analyzing multivariate data, including Hotelling's T2, MANOVA, discrimination, principal components, and canonical analysis.

Prerequisite: STAT 505 , STAT 551

General principles of statistical consulting and statistical consulting experience. Preparation of reports, presentations, and communication aspects of consulting are discussed. Students will be working on client provided short on-call and long term projects.

Prerequisites: STAT 502 , STAT 505 ; STAT 508 ; STAT 557 , STAT 503 ; STAT 504 ; STAT 506 ; STAT 510

Statistical consulting experience including client meetings, development of recommendation reports, and discussion of consulting solutions. STAT 581 Statistical Consulting Practicum II (1 per semester/maximum of 2) This course serves as a continuation of STAT 580 , which provides actual practical experience as a statistical consultant. In STAT 581 , each student will hold a consulting session biweekly (by appointment) with a researcher to discuss the statistical design, analysis and computation aspects required for the client's project. Written reports are required for each project and reviewed for appropriateness and accuracy by a supervising faculty member. In addition, a weekly seminar is utilized to discuss selected projects and non-standard applications of statistical methodology. This course will be offered in the spring and summer, with an anticipated enrollment of 15-20 students per semester.

Prerequisite: STAT 580

Computational methods for modern machine learning models, including applications to big data and non-differentiable objective functions.

Cross-listed with: CSE 584

Continuing seminars which consist of a series of individual lectures by faculty, students, or outside speakers.

This course is designed to help students become better teachers and communicators of statistics. INTAF 592 Teaching Statistics (1) This course is designed to help students become better teachers and communicators of statistics, and specifically to prepare students to supervise undergraduate statistics students in labs or small group settings, or even to lead their own undergraduate courses. Students learn about and discuss pedagogy in statistics, gain experience with practice teaching, and improve via individual feedback.

Creative projects, including nonthesis research, which are supervised on an individual basis and which fall outside the scope of formal courses.

Formal courses given on a topical or special interest subject which may be offered infrequently; several different topics may be taught in one year or term.

No description.

Investigates methods for assessing data collected from experimental and/or observational studies in various research settings. STAT 800 Applied Research Methods (3) This course provides students with a broad exploration of the tools and methods in Applied Statistics. In particular, it investigates basic probability distributions and methods for assessing data collected from experimental and/or observational studies in social science and other research settings. Students learn methods of point and interval estimation, including sample size determinations required to achieve a prescribed margin of error. Additionally, students examine hypothesis testing and the determination of sample sizes to achieve a prescribed power of a given test. The distinction between observational studies and randomized experiments is clarified and the limitations of the conclusions are emphasized. Research articles that are relevant to students' fields of study are used to determine how these statistical methods are being applied. Students then identify and critique appropriate research methods. Students work with various data sets to establish fundamental practices that properly analyze data and interpret results via either Minitab or SPSS statistical software as they formulate and communicate conclusions based on a given research context.

This course is designed to build upon a student's undergraduate quantitative background by giving an overview of multivariate statistical techniques. Many applied fields often require the use of large, multivariate data sets, and students need to be aware of the wide range of statistical tools available to them. Major objectives of this course are to gain a working knowledge of probability theory, univariate and multivariate statistics, the use of copulas, Monte Carlo techniques, and multiple linear regression. Throughout the course, students will have the opportunity to apply these concepts to real-world data sets using modern statistical software packages.

This course is designed to build upon a student's background by giving an overview of the techniques of time series analysis often used in applied settings. Many areas of research and application often utilize long time series of data in an effort to model changes and volatility in data measured consistently over time. Major objectives in this course include an overview of linear time series; AR, MA, and ARIMA models; ARCH and GARCH models; nonlinear time series models; multivariate time series models; and models of high-frequency data. Throughout the course, students will have the opportunity to apply these concepts to real world data sets using modern statistical software packages.

Prerequisite: (MFE 801, STAT 805; STAT 505)


S.3.3 Hypothesis Testing Examples

  • Example: Right-Tailed Test
  • Example: Left-Tailed Test
  • Example: Two-Tailed Test

Brinell Hardness Scores

An engineer measured the Brinell hardness of 25 pieces of ductile iron that were subcritically annealed. The resulting data were:

The engineer hypothesized that the mean Brinell hardness of all such ductile iron pieces is greater than 170. Therefore, he was interested in testing the hypotheses:

\(H_{0}: \mu = 170\) versus \(H_{A}: \mu > 170\)

The engineer entered his data into Minitab and requested that the "one-sample t-test" be conducted for the above hypotheses. He obtained the following output:

Descriptive Statistics

N: 25    Mean: 172.52    StDev: 10.31    SE Mean: 2.06

$\mu$: mean of Brinell

Null hypothesis    H₀: $\mu$ = 170
Alternative hypothesis    H₁: $\mu$ > 170

T-Value: 1.22    P-Value: 0.117

The output tells us that the average Brinell hardness of the n = 25 pieces of ductile iron was 172.52 with a standard deviation of 10.31. (The standard error of the mean, "SE Mean", calculated by dividing the standard deviation 10.31 by the square root of n = 25, is 2.06.) The test statistic t* is 1.22, and the P-value is 0.117.

If the engineer set his significance level \(\alpha\) at 0.05 and used the critical value approach to conduct his hypothesis test, he would reject the null hypothesis if his test statistic t* were greater than 1.7109 (determined using statistical software or a t-table):

t distribution graph for df = 24 and a right tailed test of .05 significance level

Since the engineer's test statistic, t* = 1.22, is not greater than 1.7109, the engineer fails to reject the null hypothesis. That is, the test statistic does not fall in the "critical region." There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean Brinell hardness of all such ductile iron pieces is greater than 170.

If the engineer used the P-value approach to conduct his hypothesis test, he would determine the area under a \(t_{n-1} = t_{24}\) curve and to the right of the test statistic t* = 1.22:

t distribution graph of right tailed test showing the p-value of 0.117 for a t-value of 1.22

In the output above, Minitab reports that the P-value is 0.117. Since the P-value, 0.117, is greater than \(\alpha\) = 0.05, the engineer fails to reject the null hypothesis. There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean Brinell hardness of all such ductile iron pieces is greater than 170.

Note that the engineer obtains the same scientific conclusion regardless of the approach used. This will always be the case.
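The engineer's numbers can be checked from the summary statistics alone. The sketch below is an illustrative Python/scipy reproduction (not part of the original Minitab analysis); n, the sample mean, and the standard deviation are taken from the output quoted above:

```python
from math import sqrt

from scipy import stats

# Summary statistics quoted in the Minitab output above
n, xbar, s = 25, 172.52, 10.31
mu0, alpha = 170, 0.05

se = s / sqrt(n)                  # standard error of the mean ("SE Mean")
t_star = (xbar - mu0) / se        # one-sample t statistic

# Critical value approach: reject H0 if t* exceeds the upper 5% point of t(24)
t_crit = stats.t.ppf(1 - alpha, df=n - 1)

# P-value approach: area under the t(24) curve to the right of t*
p_value = stats.t.sf(t_star, df=n - 1)

print(f"t* = {t_star:.2f}, critical value = {t_crit:.4f}, P-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Both approaches agree with the engineer's conclusion: t* does not exceed the critical value, and the P-value exceeds 0.05, so the null hypothesis is not rejected.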

Height of Sunflowers

A biologist was interested in determining whether sunflower seedlings treated with an extract from Vinca minor roots resulted in a lower average height of sunflower seedlings than the standard height of 15.7 cm. The biologist treated a random sample of n = 33 seedlings with the extract and subsequently obtained the following heights:

The biologist's hypotheses are:

\(H_{0}: \mu = 15.7\) versus \(H_{A}: \mu < 15.7\)

The biologist entered her data into Minitab and requested that the "one-sample t-test" be conducted for the above hypotheses. She obtained the following output:

Descriptive Statistics

N: 33    Mean: 13.664    StDev: 2.544    SE Mean: 0.443

$\mu$: mean of Height

Null hypothesis    H₀: $\mu$ = 15.7
Alternative hypothesis    H₁: $\mu$ < 15.7

T-Value: -4.60    P-Value: 0.000

The output tells us that the average height of the n = 33 sunflower seedlings was 13.664 with a standard deviation of 2.544. (The standard error of the mean, "SE Mean", calculated by dividing the standard deviation 2.544 by the square root of n = 33, is 0.443.) The test statistic t* is -4.60, and the P-value is 0.000 (to three decimal places).

Minitab Note. Minitab will always report P-values to only 3 decimal places. If Minitab reports the P-value as 0.000, it really means that the P-value is 0.000...something. Throughout this course (and your future research!), when you see that Minitab reports the P-value as 0.000, you should report the P-value as being "< 0.001."

If the biologist set her significance level \(\alpha\) at 0.05 and used the critical value approach to conduct her hypothesis test, she would reject the null hypothesis if her test statistic t* were less than -1.6939 (determined using statistical software or a t-table):

t-distribution for left tailed test with significance level of 0.05 shown in left tail

Since the biologist's test statistic, t* = -4.60, is less than -1.6939, the biologist rejects the null hypothesis. That is, the test statistic falls in the "critical region." There is sufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean height of all such sunflower seedlings is less than 15.7 cm.

If the biologist used the P-value approach to conduct her hypothesis test, she would determine the area under a \(t_{n-1} = t_{32}\) curve and to the left of the test statistic t* = -4.60:

t-distribution graph for left tailed test with a t-value of -4.60 and left tail area of 0.000

In the output above, Minitab reports that the P-value is 0.000, which we take to mean < 0.001. Since the P-value is less than 0.001, it is clearly less than \(\alpha\) = 0.05, and the biologist rejects the null hypothesis. There is sufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean height of all such sunflower seedlings is less than 15.7 cm.

Note again that the biologist obtains the same scientific conclusion regardless of the approach used. This will always be the case.
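The same kind of check works for the left-tailed test. The sketch below is an illustrative Python/scipy computation (not part of the original Minitab analysis, with the summary numbers taken from the output above); compared with the right-tailed case, it flips the tail used for both the critical value and the P-value:

```python
from math import sqrt

from scipy import stats

# Summary statistics quoted in the Minitab output above
n, xbar, s = 33, 13.664, 2.544
mu0, alpha = 15.7, 0.05

t_star = (xbar - mu0) / (s / sqrt(n))       # one-sample t statistic

# Left-tailed test: critical value and P-value both come from the lower tail
t_crit = stats.t.ppf(alpha, df=n - 1)       # lower 5% point of t(32)
p_value = stats.t.cdf(t_star, df=n - 1)     # area to the left of t*

print(f"t* = {t_star:.2f}, critical value = {t_crit:.4f}")
# Minitab would show 0.000 here; report such a value as "< 0.001"
print(f"P-value = {p_value:.3f}" if p_value >= 0.001 else "P-value < 0.001")
```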

Gum Thickness

A manufacturer claims that the thickness of the spearmint gum it produces is 7.5 one-hundredths of an inch. A quality control specialist regularly checks this claim. On one production run, he took a random sample of n = 10 pieces of gum and measured their thickness. He obtained:

The quality control specialist's hypotheses are:

\(H_{0}: \mu = 7.5\) versus \(H_{A}: \mu \ne 7.5\)

The quality control specialist entered his data into Minitab and requested that the "one-sample t-test" be conducted for the above hypotheses. He obtained the following output:

Descriptive Statistics

N: 10    Mean: 7.55    StDev: 0.1027    SE Mean: 0.0325

$\mu$: mean of Thickness

Null hypothesis    H₀: $\mu$ = 7.5
Alternative hypothesis    H₁: $\mu \ne$ 7.5

T-Value: 1.54    P-Value: 0.158

The output tells us that the average thickness of the n = 10 pieces of gum was 7.55 one-hundredths of an inch with a standard deviation of 0.1027. (The standard error of the mean, "SE Mean", calculated by dividing the standard deviation 0.1027 by the square root of n = 10, is 0.0325.) The test statistic t* is 1.54, and the P-value is 0.158.

If the quality control specialist set his significance level \(\alpha\) at 0.05 and used the critical value approach to conduct his hypothesis test, he would reject the null hypothesis if his test statistic t* were less than -2.2616 or greater than 2.2616 (determined using statistical software or a t-table):

t-distribution graph of two tails with a significance level of .05 and t values of -2.2616 and 2.2616

Since the quality control specialist's test statistic, t* = 1.54, is neither less than -2.2616 nor greater than 2.2616, the quality control specialist fails to reject the null hypothesis. That is, the test statistic does not fall in the "critical region." There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean thickness of all of the manufacturer's spearmint gum differs from 7.5 one-hundredths of an inch.

If the quality control specialist used the P-value approach to conduct his hypothesis test, he would determine the area under a \(t_{n-1} = t_{9}\) curve, to the right of 1.54 and to the left of -1.54:

t-distribution graph for a two tailed test with t values of -1.54 and 1.54; the corresponding p-value is 0.0789732 in each tail

In the output above, Minitab reports that the P-value is 0.158. Since the P-value, 0.158, is greater than \(\alpha\) = 0.05, the quality control specialist fails to reject the null hypothesis. There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean thickness of all pieces of spearmint gum differs from 7.5 one-hundredths of an inch.

Note that the quality control specialist obtains the same scientific conclusion regardless of the approach used. This will always be the case.
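For the two-tailed test, both tails contribute to the P-value, and the critical value splits \(\alpha\) evenly between the tails. An illustrative Python/scipy sketch (not part of the original Minitab analysis), again using only the summary numbers from the output above:

```python
from math import sqrt

from scipy import stats

# Summary statistics quoted in the Minitab output above
n, xbar, s = 10, 7.55, 0.1027
mu0, alpha = 7.5, 0.05

t_star = (xbar - mu0) / (s / sqrt(n))            # one-sample t statistic

# Two-tailed test: alpha is split evenly between the two tails
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)    # reject if |t*| > t_crit
p_value = 2 * stats.t.sf(abs(t_star), df=n - 1)  # both tail areas combined

print(f"t* = {t_star:.2f}, critical values = ±{t_crit:.4f}, P-value = {p_value:.3f}")
```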

In our review of hypothesis tests, we have focused on just one particular hypothesis test, namely that concerning the population mean \(\mu\). The important thing to recognize is that the topics discussed here — the general idea of hypothesis tests, errors in hypothesis testing, the critical value approach, and the P -value approach — generally extend to all of the hypothesis tests you will encounter.


Linear hypothesis testing for high dimensional generalized linear models

  • Department of Public Health Sciences
  • Penn State Cancer Institute
  • Cancer Institute, Cancer Control

Research output: Contribution to journal › Article › peer-review

This paper is concerned with testing linear hypotheses in high dimensional generalized linear models. To deal with linear hypotheses, we first propose the constrained partial regularization method and study its statistical properties. We further introduce an algorithm for solving regularization problems with folded-concave penalty functions and linear constraints. To test linear hypotheses, we propose a partial penalized likelihood ratio test, a partial penalized score test and a partial penalized Wald test. We show that the limiting null distributions of these three test statistics are χ² distributions with the same degrees of freedom, and under local alternatives, they asymptotically follow noncentral χ² distributions with the same degrees of freedom and noncentrality parameter, provided the number of parameters involved in the test hypothesis grows to ∞ at a certain rate. Simulation studies are conducted to examine the finite sample performance of the proposed tests. Empirical analysis of a real data example is used to illustrate the proposed testing procedures.


Access to Document

  • DOI: 10.1214/18-AOS1761


Shi, C., Song, R., Chen, Z. and Li, R. (2019). Linear hypothesis testing for high dimensional generalized linear models. Annals of Statistics, 47.


Statistics LibreTexts

9.2: Outcomes and the Type I and Type II Errors



When you perform a hypothesis test, there are four possible outcomes depending on the actual truth (or falseness) of the null hypothesis \(H_{0}\) and the decision to reject or not. The outcomes are summarized in the following table:

Decision                   \(H_{0}\) is actually true    \(H_{0}\) is actually false
Do not reject \(H_{0}\)    Correct decision              Type II error
Reject \(H_{0}\)           Type I error                  Correct decision (power)

The four possible outcomes in the table are:

  • The decision is not to reject \(H_{0}\) when \(H_{0}\) is true (correct decision).
  • The decision is to reject \(H_{0}\) when \(H_{0}\) is true (incorrect decision known as a Type I error).
  • The decision is not to reject \(H_{0}\) when, in fact, \(H_{0}\) is false (incorrect decision known as a Type II error).
  • The decision is to reject \(H_{0}\) when \(H_{0}\) is false (correct decision, whose probability is called the Power of the Test).

Each of the errors occurs with a particular probability. The Greek letters \(\alpha\) and \(\beta\) represent the probabilities.

  • \(\alpha =\) probability of a Type I error \(= P(\text{Type I error}) =\) probability of rejecting the null hypothesis when the null hypothesis is true.
  • \(\beta =\) probability of a Type II error \(= P(\text{Type II error}) =\) probability of not rejecting the null hypothesis when the null hypothesis is false.

\(\alpha\) and \(\beta\) should be as small as possible because they are probabilities of errors. They are rarely zero.

The Power of the Test is \(1 - \beta\). Ideally, we want a high power that is as close to one as possible. Increasing the sample size can increase the Power of the Test. The following are examples of Type I and Type II errors.
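The relationship between sample size and power can be seen directly with a small simulation. The sketch below is illustrative only (the effect size, \(\sigma\), and sample sizes are assumed values, not from the text): it repeatedly draws samples from a population in which \(H_{0}\) is false and records how often a one-sample t-test rejects it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def estimated_power(n, mu_true=0.5, mu0=0.0, sigma=1.0, alpha=0.05, reps=5000):
    """Estimate power: the fraction of simulated samples in which the
    two-sided one-sample t-test of H0: mu = mu0 rejects, when the true
    mean is actually mu_true (so H0 is false)."""
    rejections = 0
    for _ in range(reps):
        sample = rng.normal(mu_true, sigma, size=n)
        _, p = stats.ttest_1samp(sample, popmean=mu0)
        if p < alpha:
            rejections += 1
    return rejections / reps

# Power (1 - beta) increases with the sample size
for n in (10, 30, 100):
    print(f"n = {n:3d}: estimated power ≈ {estimated_power(n):.2f}")
```

With these assumed values, the estimated rejection rate climbs toward 1 as n grows, illustrating why larger samples make a false null hypothesis easier to detect.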

Example \(\PageIndex{1}\): Type I vs. Type II errors

Suppose the null hypothesis, \(H_{0}\), is: Frank's rock climbing equipment is safe.

  • Type I error : Frank thinks that his rock climbing equipment may not be safe when, in fact, it really is safe.
  • Type II error : Frank thinks that his rock climbing equipment may be safe when, in fact, it is not safe.

\(\alpha =\) probability that Frank thinks his rock climbing equipment may not be safe when, in fact, it really is safe.

\(\beta =\) probability that Frank thinks his rock climbing equipment may be safe when, in fact, it is not safe.

Notice that, in this case, the error with the greater consequence is the Type II error. (If Frank thinks his rock climbing equipment is safe, he will go ahead and use it.)

Exercise \(\PageIndex{1}\)

Suppose the null hypothesis, \(H_{0}\), is: the blood cultures contain no traces of pathogen \(X\). State the Type I and Type II errors.

  • Type I error : The researcher thinks the blood cultures do contain traces of pathogen \(X\), when in fact, they do not.
  • Type II error : The researcher thinks the blood cultures do not contain traces of pathogen \(X\), when in fact, they do.

Example \(\PageIndex{2}\)

Suppose the null hypothesis, \(H_{0}\), is: The victim of an automobile accident is alive when he arrives at the emergency room of a hospital.

  • Type I error : The emergency crew thinks that the victim is dead when, in fact, the victim is alive.
  • Type II error : The emergency crew does not know if the victim is alive when, in fact, the victim is dead.

\(\alpha =\) probability that the emergency crew thinks the victim is dead when, in fact, he is really alive \(= P(\text{Type I error})\).

\(\beta =\) probability that the emergency crew does not know if the victim is alive when, in fact, the victim is dead \(= P(\text{Type II error})\).

The error with the greater consequence is the Type I error. (If the emergency crew thinks the victim is dead, they will not treat him.)

Exercise \(\PageIndex{2}\)

Suppose the null hypothesis, \(H_{0}\), is: a patient is not sick. Which type of error has the greater consequence, Type I or Type II?

The error with the greater consequence is the Type II error: the patient will be thought well when, in fact, he is sick, so he will not get treatment.

Example \(\PageIndex{3}\)

It’s a Boy Genetic Labs claim to be able to increase the likelihood that a pregnancy will result in a boy being born. Statisticians want to test the claim. Suppose that the null hypothesis, \(H_{0}\), is: It’s a Boy Genetic Labs has no effect on gender outcome.

  • Type I error : This results when a true null hypothesis is rejected. In the context of this scenario, we would state that we believe that It’s a Boy Genetic Labs influences the gender outcome, when in fact it has no effect. The probability of this error occurring is denoted by the Greek letter alpha, \(\alpha\).
  • Type II error : This results when we fail to reject a false null hypothesis. In context, we would state that It’s a Boy Genetic Labs does not influence the gender outcome of a pregnancy when, in fact, it does. The probability of this error occurring is denoted by the Greek letter beta, \(\beta\).

The error of greater consequence would be the Type I error since couples would use the It’s a Boy Genetic Labs product in hopes of increasing the chances of having a boy.

Exercise \(\PageIndex{3}\)

“Red tide” is a bloom of poison-producing algae–a few different species of a class of plankton called dinoflagellates. When the weather and water conditions cause these blooms, shellfish such as clams living in the area develop dangerous levels of a paralysis-inducing toxin. In Massachusetts, the Division of Marine Fisheries (DMF) monitors levels of the toxin in shellfish by regular sampling of shellfish along the coastline. If the mean level of toxin in clams exceeds 800 μg (micrograms) of toxin per kg of clam meat in any area, clam harvesting is banned there until the bloom is over and levels of toxin in clams subside. Describe both a Type I and a Type II error in this context, and state which error has the greater consequence.

In this scenario, an appropriate null hypothesis would be \(H_{0}\): the mean level of toxin is at most \(800 \mu\text{g}\), that is, \(H_{0}: \mu \leq 800 \mu\text{g}\).

Type I error : The DMF believes that toxin levels are still too high when, in fact, toxin levels are at most \(800 \mu\text{g}\). The DMF continues the harvesting ban.

Type II error : The DMF believes that toxin levels are within acceptable levels (at most \(800 \mu\text{g}\)) when, in fact, toxin levels are still too high (more than \(800 \mu\text{g}\)). The DMF lifts the harvesting ban. This error could be the most serious. If the ban is lifted and clams are still toxic, consumers could possibly eat tainted food.

In summary, the more dangerous error would be to commit a Type II error, because this error involves the availability of tainted clams for consumption.

Example \(\PageIndex{4}\)

A certain experimental drug claims a cure rate of at least 75% for males with prostate cancer. Describe both the Type I and Type II errors in context. Which error is the more serious?

  • Type I : A cancer patient believes the cure rate for the drug is less than 75% when it actually is at least 75%.
  • Type II : A cancer patient believes the experimental drug has at least a 75% cure rate when it has a cure rate that is less than 75%.

In this scenario, the Type II error contains the more severe consequence. If a patient believes the drug works at least 75% of the time, this most likely will influence the patient’s (and doctor’s) choice about whether to use the drug as a treatment option.

Exercise \(\PageIndex{4}\)

Determine both Type I and Type II errors for the following scenario:

Assume a null hypothesis, \(H_{0}\), that states the percentage of adults with jobs is at least 88%. Identify the Type I and Type II errors from these four statements.

  a. Not to reject the null hypothesis that the percentage of adults who have jobs is at least 88% when that percentage is actually less than 88%.
  b. Not to reject the null hypothesis that the percentage of adults who have jobs is at least 88% when the percentage is actually at least 88%.
  c. Reject the null hypothesis that the percentage of adults who have jobs is at least 88% when the percentage is actually at least 88%.
  d. Reject the null hypothesis that the percentage of adults who have jobs is at least 88% when that percentage is actually less than 88%.

Type I error: c (rejecting \(H_{0}\) when it is true)

Type II error: a (failing to reject \(H_{0}\) when it is false)

In every hypothesis test, the outcomes are dependent on a correct interpretation of the data. Incorrect calculations or misunderstood summary statistics can yield errors that affect the results. A Type I error occurs when a true null hypothesis is rejected. A Type II error occurs when a false null hypothesis is not rejected. The probabilities of these errors are denoted by the Greek letters \(\alpha\) and \(\beta\), for a Type I and a Type II error respectively. The power of the test, \(1 - \beta\), quantifies the likelihood that the test will correctly reject a false null hypothesis. A high power is desirable.


IMAGES

  1. hypothesis test formula statistics

    hypothesis testing statistics penn state

  2. Your Guide to Master Hypothesis Testing in Statistics

    hypothesis testing statistics penn state

  3. Hypothesis Testing- Meaning, Types & Steps

    hypothesis testing statistics penn state

  4. 5 steps of hypothesis testing in statistics

    hypothesis testing statistics penn state

  5. Hypothesis Testing Statistics Formula Sheet

    hypothesis testing statistics penn state

  6. Hypothesis Testing

    hypothesis testing statistics penn state

VIDEO

  1. Hypothesis Testing for Mean: p-value is more than the level of significance (Hat Size Example)

  2. Hypothesis Testing

  3. 8a. Introduction to Hypothesis Testing

  4. Statistics with Crayons: Hypothesis Testing with Hans & Hera

  5. Statistics for Hypothesis Testing

  6. Hypothesis Testing: p Value for a Left Tail Test With Standardized Test Statistics z=-2.23 invnorm

COMMENTS

  1. 5.1

    A test is considered to be statistically significant when the p-value is less than or equal to the level of significance, also known as the alpha ( α) level. For this class, unless otherwise specified, α = 0.05; this is the most frequently used alpha level in many fields. Sample statistics vary from the population parameter randomly.

  2. S.3 Hypothesis Testing

    S.3 Hypothesis Testing. In reviewing hypothesis tests, we start first with the general idea. Then, we keep returning to the basic procedures of hypothesis testing, each time adding a little more detail. The general idea of hypothesis testing involves: Making an initial assumption. Collecting evidence (data).

  3. Hypothesis Testing

    Example: Criminal Trial Analogy. First, state 2 hypotheses, the null hypothesis ("H 0 ") and the alternative hypothesis ("H A "). H 0: Defendant is not guilty.; H A: Defendant is guilty.; Usually the H 0 is a statement of "no effect", or "no change", or "chance only" about a population parameter.. While the H A, depending on the situation, is that there is a difference ...


Author: John Smith, Lecturer, MGIS program, The Pennsylvania State University.

This courseware module is offered as part of the Repository of Open and Affordable Materials at Penn State. Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
