Digital Commons @ University of South Florida


Mathematics and Statistics Theses and Dissertations

Theses/Dissertations from 2024

The Effect of Fixed Time Delays on the Synchronization Phase Transition , Shaizat Bakhytzhan

On the Subelliptic and Subparabolic Infinity Laplacian in Grushin-Type Spaces , Zachary Forrest

Utilizing Machine Learning Techniques for Accurate Diagnosis of Breast Cancer and Comprehensive Statistical Analysis of Clinical Data , Myat Ei Ei Phyo

Quandle Rings, Idempotents and Cocycle Invariants of Knots , Dipali Swain

Comparative Analysis of Time Series Models on U.S. Stock and Exchange Rates: Bayesian Estimation of Time Series Error Term Model Versus Machine Learning Approaches , Young Keun Yang

Theses/Dissertations from 2023

Classification of Finite Topological Quandles and Shelves via Posets , Hitakshi Lahrani

Applied Analysis for Learning Architectures , Himanshu Singh

Rational Functions of Degree Five That Permute the Projective Line Over a Finite Field , Christopher Sze

Theses/Dissertations from 2022

New Developments in Statistical Optimal Designs for Physical and Computer Experiments , Damola M. Akinlana

Advances and Applications of Optimal Polynomial Approximants , Raymond Centner

Data-Driven Analytical Predictive Modeling for Pancreatic Cancer, Financial & Social Systems , Aditya Chakraborty

On Simultaneous Similarity of d-tuples of Commuting Square Matrices , Corey Connelly

Symbolic Computation of Lump Solutions to a Combined (2+1)-dimensional Nonlinear Evolution Equation , Jingwei He

Boundary behavior of analytic functions and Approximation Theory , Spyros Pasias

Stability Analysis of Delay-Driven Coupled Cantilevers Using the Lambert W-Function , Daniel Siebel-Cortopassi

A Functional Optimization Approach to Stochastic Process Sampling , Ryan Matthew Thurman

Theses/Dissertations from 2021

Riemann-Hilbert Problems for Nonlocal Reverse-Time Nonlinear Second-order and Fourth-order AKNS Systems of Multiple Components and Exact Soliton Solutions , Alle Adjiri

Zeros of Harmonic Polynomials and Related Applications , Azizah Alrajhi

Combination of Time Series Analysis and Sentiment Analysis for Stock Market Forecasting , Hsiao-Chuan Chou

Uncertainty Quantification in Deep and Statistical Learning with applications in Bio-Medical Image Analysis , K. Ruwani M. Fernando

Data-Driven Analytical Modeling of Multiple Myeloma Cancer, U.S. Crop Production and Monitoring Process , Lohuwa Mamudu

Long-time Asymptotics for mKdV Type Reduced Equations of the AKNS Hierarchy in Weighted L^2 Sobolev Spaces , Fudong Wang

Online and Adjusted Human Activities Recognition with Statistical Learning , Yanjia Zhang

Theses/Dissertations from 2020

Bayesian Reliability Analysis of The Power Law Process and Statistical Modeling of Computer and Network Vulnerabilities with Cybersecurity Application , Freeh N. Alenezi

Discrete Models and Algorithms for Analyzing DNA Rearrangements , Jasper Braun

Bayesian Reliability Analysis for Optical Media Using Accelerated Degradation Test Data , Kun Bu

On the p(x)-Laplace equation in Carnot groups , Robert D. Freeman

Clustering methods for gene expression data of Oxytricha trifallax , Kyle Houfek

Gradient Boosting for Survival Analysis with Applications in Oncology , Nam Phuong Nguyen

Global and Stochastic Dynamics of Diffusive Hindmarsh-Rose Equations in Neurodynamics , Chi Phan

Restricted Isometric Projections for Differentiable Manifolds and Applications , Vasile Pop

On Some Problems on Polynomial Interpolation in Several Variables , Brian Jon Tuesink

Numerical Study of Gap Distributions in Determinantal Point Process on Low Dimensional Spheres: L-Ensemble of O(n) Model Type for n = 2 and n = 3 , Xiankui Yang

Non-Associative Algebraic Structures in Knot Theory , Emanuele Zappala

Theses/Dissertations from 2019

Field Quantization for Radiative Decay of Plasmons in Finite and Infinite Geometries , Maryam Bagherian

Probabilistic Modeling of Democracy, Corruption, Hemophilia A and Prediabetes Data , A. K. M. Raquibul Bashar

Generalized Derivations of Ternary Lie Algebras and n-BiHom-Lie Algebras , Amine Ben Abdeljelil

Fractional Random Weighted Bootstrapping for Classification on Imbalanced Data with Ensemble Decision Tree Methods , Sean Charles Carter

Hierarchical Self-Assembly and Substitution Rules , Daniel Alejandro Cruz

Statistical Learning of Biomedical Non-Stationary Signals and Quality of Life Modeling , Mahdi Goudarzi

Probabilistic and Statistical Prediction Models for Alzheimer’s Disease and Statistical Analysis of Global Warming , Maryam Ibrahim Habadi

Essays on Time Series and Machine Learning Techniques for Risk Management , Michael Kotarinos

The Systems of Post and Post Algebras: A Demonstration of an Obvious Fact , Daviel Leyva

Reconstruction of Radar Images by Using Spherical Mean and Regular Radon Transforms , Ozan Pirbudak

Analyses of Unorthodox Overlapping Gene Segments in Oxytricha Trifallax , Shannon Stich

An Optimal Medium-Strength Regularity Algorithm for 3-uniform Hypergraphs , John Theado

Power Graphs of Quasigroups , DayVon L. Walker

Theses/Dissertations from 2018

Groups Generated by Automata Arising from Transformations of the Boundaries of Rooted Trees , Elsayed Ahmed

Non-equilibrium Phase Transitions in Interacting Diffusions , Wael Al-Sawai

A Hybrid Dynamic Modeling of Time-to-event Processes and Applications , Emmanuel A. Appiah

Lump Solutions and Riemann-Hilbert Approach to Soliton Equations , Sumayah A. Batwa

Developing a Model to Predict Prevalence of Compulsive Behavior in Individuals with OCD , Lindsay D. Fields

Generalizations of Quandles and their cohomologies , Matthew J. Green

Hamiltonian structures and Riemann-Hilbert problems of integrable systems , Xiang Gu

Optimal Latin Hypercube Designs for Computer Experiments Based on Multiple Objectives , Ruizhe Hou

Human Activity Recognition Based on Transfer Learning , Jinyong Pang

Signal Detection of Adverse Drug Reaction using the Adverse Event Reporting System: Literature Review and Novel Methods , Minh H. Pham

Statistical Analysis and Modeling of Cyber Security and Health Sciences , Nawa Raj Pokhrel

Machine Learning Methods for Network Intrusion Detection and Intrusion Prevention Systems , Zheni Svetoslavova Stefanova

Orthogonal Polynomials With Respect to the Measure Supported Over the Whole Complex Plane , Meng Yang

Theses/Dissertations from 2017

Modeling in Finance and Insurance With Lévy-Itô Driven Dynamic Processes under Semi Markov-type Switching Regimes and Time Domains , Patrick Armand Assonken Tonfack

Prevalence of Typical Images in High School Geometry Textbooks , Megan N. Cannon

On Extending Hansel's Theorem to Hypergraphs , Gregory Sutton Churchill

Contributions to Quandle Theory: A Study of f-Quandles, Extensions, and Cohomology , Indu Rasika U. Churchill

Linear Extremal Problems in the Hardy Space H^p for 0 < p < 1 , Robert Christopher Connelly

Statistical Analysis and Modeling of Ovarian and Breast Cancer , Muditha V. Devamitta Perera

Statistical Analysis and Modeling of Stomach Cancer Data , Chao Gao

Structural Analysis of Poloidal and Toroidal Plasmons and Fields of Multilayer Nanorings , Kumar Vijay Garapati

Dynamics of Multicultural Social Networks , Kristina B. Hilton

Cybersecurity: Stochastic Analysis and Modelling of Vulnerabilities to Determine the Network Security and Attackers Behavior , Pubudu Kalpani Kaluarachchi

Generalized D-Kaup-Newell integrable systems and their integrable couplings and Darboux transformations , Morgan Ashley McAnally

Patterns in Words Related to DNA Rearrangements , Lukas Nabergall

Time Series Online Empirical Bayesian Kernel Density Segmentation: Applications in Real Time Activity Recognition Using Smartphone Accelerometer , Shuang Na

Schreier Graphs of Thompson's Group T , Allen Pennington

Cybersecurity: Probabilistic Behavior of Vulnerability and Life Cycle , Sasith Maduranga Rajasooriya

Bayesian Artificial Neural Networks in Health and Cybersecurity , Hansapani Sarasepa Rodrigo

Real-time Classification of Biomedical Signals, Parkinson’s Analytical Model , Abolfazl Saghafi

Lump, complexiton and algebro-geometric solutions to soliton equations , Yuan Zhou

Theses/Dissertations from 2016

A Statistical Analysis of Hurricanes in the Atlantic Basin and Sinkholes in Florida , Joy Marie D'andrea

Statistical Analysis of a Risk Factor in Finance and Environmental Models for Belize , Sherlene Enriquez-Savery

Putnam's Inequality and Analytic Content in the Bergman Space , Matthew Fleeman

On the Number of Colors in Quandle Knot Colorings , Jeremy William Kerr

Statistical Modeling of Carbon Dioxide and Cluster Analysis of Time Dependent Information: Lag Target Time Series Clustering, Multi-Factor Time Series Clustering, and Multi-Level Time Series Clustering , Doo Young Kim

Some Results Concerning Permutation Polynomials over Finite Fields , Stephen Lappano

Hamiltonian Formulations and Symmetry Constraints of Soliton Hierarchies of (1+1)-Dimensional Nonlinear Evolution Equations , Solomon Manukure

Modeling and Survival Analysis of Breast Cancer: A Statistical, Artificial Neural Network, and Decision Tree Approach , Venkateswara Rao Mudunuru

Generalized Phase Retrieval: Isometries in Vector Spaces , Josiah Park

Leonard Systems and their Friends , Jonathan Spiewak

Resonant Solutions to (3+1)-dimensional Bilinear Differential Equations , Yue Sun

Statistical Analysis and Modeling Health Data: A Longitudinal Study , Bhikhari Prasad Tharu

Global Attractors and Random Attractors of Reaction-Diffusion Systems , Junyi Tu

Time Dependent Kernel Density Estimation: A New Parameter Estimation Algorithm, Applications in Time Series Classification and Clustering , Xing Wang

On Spectral Properties of Single Layer Potentials , Seyed Zoalroshd

Theses/Dissertations from 2015

Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach , Wei Chen

Active Tile Self-assembly and Simulations of Computational Systems , Daria Karpenko

Nearest Neighbor Foreign Exchange Rate Forecasting with Mahalanobis Distance , Vindya Kumari Pathirana

Statistical Learning with Artificial Neural Network Applied to Health and Environmental Data , Taysseer Sharaf

Radial Versus Orthogonal and Minimal Projections onto Hyperplanes in l_4^3 , Richard Alan Warner

Ensemble Learning Method on Machine Maintenance Data , Xiaochuang Zhao

Theses/Dissertations from 2014

Properties of Graphs Used to Model DNA Recombination , Ryan Arredondo



The Beginner's Guide to Statistical Analysis | 5 Steps & Examples

Statistical analysis means investigating trends, patterns, and relationships using quantitative data . It is an important research tool used by scientists, governments, businesses, and other organizations.

To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process . You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.

After collecting data from your sample, you can organize and summarize the data using descriptive statistics . Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalize your findings.

This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.

Table of contents

  • Step 1: Write your hypotheses and plan your research design
  • Step 2: Collect data from a sample
  • Step 3: Summarize your data with descriptive statistics
  • Step 4: Test hypotheses or make estimates with inferential statistics
  • Step 5: Interpret your results
  • Other interesting articles

Step 1: Write your hypotheses and plan your research design

To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.

Writing statistical hypotheses

The goal of research is often to investigate a relationship between variables within a population . You start with a prediction, and use statistical analysis to test that prediction.

A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.

While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.

  • Null hypothesis: A 5-minute meditation exercise will have no effect on math test scores in teenagers.
  • Alternative hypothesis: A 5-minute meditation exercise will improve math test scores in teenagers.
  • Null hypothesis: Parental income and GPA have no relationship with each other in college students.
  • Alternative hypothesis: Parental income and GPA are positively correlated in college students.

Planning your research design

A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.

First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.

  • In an experimental design , you can assess a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.
  • In a correlational design , you can explore relationships between variables (e.g., parental income and GPA) without any assumption of causality using correlation coefficients and significance tests.
  • In a descriptive design , you can study the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.

Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.

  • In a between-subjects design , you compare the group-level outcomes of participants who have been exposed to different treatments (e.g., those who performed a meditation exercise vs those who didn’t).
  • In a within-subjects design , you compare repeated measures from participants who have participated in all treatments of a study (e.g., scores from before and after performing a meditation exercise).
  • In a mixed (factorial) design , one variable is altered between subjects and another is altered within subjects (e.g., pretest and posttest scores from participants who either did or didn’t do a meditation exercise).
Example: Experimental research design
First, you’ll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you’ll record participants’ scores from a second math test. In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention.

Example: Correlational research design
In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents’ incomes and their own GPA.

Measuring variables

When planning a research design, you should operationalize your variables and decide exactly how you will measure them.

For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain:

  • Categorical data represents groupings. These may be nominal (e.g., gender) or ordinal (e.g., level of language ability).
  • Quantitative data represents amounts. These may be on an interval scale (e.g., test score) or a ratio scale (e.g., age).

Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.

Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.

In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.

Variable Type of data
Age Quantitative (ratio)
Gender Categorical (nominal)
Race or ethnicity Categorical (nominal)
Baseline test scores Quantitative (interval)
Final test scores Quantitative (interval)
Parental income Quantitative (ratio)
GPA Quantitative (interval)


Step 2: Collect data from a sample

In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.

Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures . You should aim for a sample that is representative of the population.

Sampling for statistical analysis

There are two main approaches to selecting a sample.

  • Probability sampling: every member of the population has a chance of being selected for the study through random selection.
  • Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.

In theory, for highly generalizable findings, you should use a probability sampling method. Random selection reduces several types of research bias , like sampling bias , and ensures that data from your sample is actually typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.

But in practice, it’s rarely possible to gather the ideal sample. While non-probability samples are more likely to be at risk for biases like self-selection bias, they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.

If you want to use parametric tests for non-probability samples, you have to make the case that:

  • your sample is representative of the population you’re generalizing your findings to.
  • your sample lacks systematic bias.

Keep in mind that external validity means that you can only generalize your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialized, Rich and Democratic (WEIRD) samples (e.g., college students in the US) aren’t automatically applicable to all non-WEIRD populations.

If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalized in your discussion section .

Create an appropriate sampling procedure

Based on the resources available for your research, decide on how you’ll recruit participants.

  • Will you have resources to advertise your study widely, including outside of your university setting?
  • Will you have the means to recruit a diverse sample that represents a broad population?
  • Do you have time to contact and follow up with members of hard-to-reach groups?

Your participants are self-selected by their schools. Although you’re using a non-probability sample, you aim for a diverse and representative sample.

Example: Sampling (correlational study)
Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.

Calculate sufficient sample size

Before recruiting participants, decide on your sample size either by looking at other studies in your field or using statistics. A sample that’s too small may be unrepresentative of the population, while a sample that’s too large will be more costly than necessary.

There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study should be (e.g., in clinical research). As a rule of thumb, a minimum of 30 units per subgroup is usually necessary.

To use these calculators, you have to understand and input these key components:

  • Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
  • Statistical power : the probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
  • Expected effect size : a standardized indication of how large the expected result of your study will be, usually based on other similar studies.
  • Population standard deviation: an estimate of the population parameter based on a previous study or a pilot study of your own.
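If you’d rather compute a sample size directly, these same components are the inputs to a power analysis. Below is a minimal sketch using Python’s statsmodels library; the effect size, alpha, and power values are illustrative assumptions, not recommendations.

```python
# Power analysis for a two-group comparison (illustrative values).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,  # expected effect size (Cohen's d), e.g., from similar studies
    alpha=0.05,       # significance level: 5% risk of rejecting a true null
    power=0.8,        # 80% chance of detecting the effect if it exists
)
print(f"Required sample size per group: {n_per_group:.0f}")  # roughly 64
```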

Step 3: Summarize your data with descriptive statistics

Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarize them.

Inspect your data

There are various ways to inspect your data, including the following:

  • Organizing data from each variable in frequency distribution tables .
  • Displaying data from a key variable in a bar chart to view the distribution of responses.
  • Visualizing the relationship between two variables using a scatter plot .

By visualizing your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.

A normal distribution means that your data are symmetrically distributed around a center where most values lie, with the values tapering off at the tail ends.

[Figure: Mean, median, mode, and standard deviation in a normal distribution]

In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.

Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.
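One widely used systematic approach is the 1.5 × IQR rule, which flags values far outside the middle half of the data. A minimal sketch with hypothetical scores:

```python
# Flag extreme values using the 1.5 * IQR rule (hypothetical data).
import numpy as np

scores = np.array([62, 68, 70, 71, 73, 74, 75, 76, 78, 80, 99])
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = scores[(scores < lower) | (scores > upper)]
print(f"Bounds: [{lower:.2f}, {upper:.2f}], outliers: {outliers}")  # flags 99
```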

Calculate measures of central tendency

Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:

  • Mode : the most popular response or value in the data set.
  • Median : the value in the exact middle of the data set when ordered from low to high.
  • Mean : the sum of all values divided by the number of values.

However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.

Calculate measures of variability

Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:

  • Range : the highest value minus the lowest value of the data set.
  • Interquartile range : the range of the middle half of the data set.
  • Standard deviation : the average distance between each value in your data set and the mean.
  • Variance : the square of the standard deviation.

Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.
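As a quick illustration, all of these measures can be computed in a few lines with NumPy and SciPy. The test scores below are hypothetical:

```python
# Central tendency and variability for a small hypothetical data set.
import numpy as np
from scipy import stats

data = np.array([68, 72, 75, 75, 79, 81, 84, 90])

values, counts = np.unique(data, return_counts=True)
print("Mode:", values[counts.argmax()])   # most frequent value
print("Median:", np.median(data))         # middle value when ordered
print("Mean:", data.mean())               # sum divided by count
print("Range:", np.ptp(data))             # highest minus lowest
print("IQR:", stats.iqr(data))            # range of the middle half
print("SD:", data.std(ddof=1))            # ddof=1 for a sample, not a population
print("Variance:", data.var(ddof=1))      # square of the standard deviation
```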

Using your table, you should check whether the units of the descriptive statistics are comparable for pretest and posttest scores. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test.

Statistic Pretest scores Posttest scores
Mean 68.44 75.25
Standard deviation 9.43 9.88
Variance 88.96 97.96
Range 36.25 45.12
Sample size (N) 30

From this table, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population.

Example: Descriptive statistics (correlational study)
After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA.

It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.

Statistic Parental income (USD) GPA
Mean 62,100 3.12
Standard deviation 15,000 0.45
Variance 225,000,000 0.16
Range 8,000–378,000 2.64–4.00
Sample size (N) 653

Step 4: Test hypotheses or make estimates with inferential statistics

A number that describes a sample is called a statistic , while a number describing a population is called a parameter . Using inferential statistics , you can make conclusions about population parameters based on sample statistics.

Researchers often use two main methods (simultaneously) to make inferences in statistics.

  • Estimation: calculating population parameters based on sample statistics.
  • Hypothesis testing: a formal process for testing research predictions about the population using samples.

You can make two types of estimates of population parameters from sample statistics:

  • A point estimate : a value that represents your best guess of the exact parameter.
  • An interval estimate : a range of values that represent your best guess of where the parameter lies.

If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.

You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).

There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.

A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.
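As a rough illustration, here is a 95% confidence interval computed from the posttest statistics in the example table above (mean 75.25, standard deviation 9.88, n = 30), using the z score of 1.96 for 95% confidence:

```python
# 95% confidence interval around a sample mean (values from the example table).
import numpy as np

mean, sd, n = 75.25, 9.88, 30
z = 1.96                                  # z score for 95% confidence
se = sd / np.sqrt(n)                      # standard error of the mean
lower, upper = mean - z * se, mean + z * se
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # about (71.71, 78.79)
```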

Hypothesis testing

Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.

Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs:

  • A test statistic tells you how much your data differs from the null hypothesis of the test.
  • A p value tells you the likelihood of obtaining your results if the null hypothesis is actually true in the population.

Statistical tests come in three main varieties:

  • Comparison tests assess group differences in outcomes.
  • Regression tests assess cause-and-effect relationships between variables.
  • Correlation tests assess relationships between variables without assuming causation.

Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.

Parametric tests

Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.

A regression models the extent to which changes in a predictor variable result in changes in an outcome variable (or variables).

  • A simple linear regression includes one predictor variable and one outcome variable.
  • A multiple linear regression includes two or more predictor variables and one outcome variable.
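A minimal simple linear regression sketch using Python’s statsmodels, with synthetic data loosely modeled on the parental income and GPA example (all values here are fabricated for illustration):

```python
# Simple linear regression: one predictor, one outcome (synthetic data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
income = rng.normal(62_100, 15_000, size=200)            # predictor variable
gpa = 2.0 + 1.5e-5 * income + rng.normal(0, 0.3, 200)    # outcome variable

X = sm.add_constant(income)       # add the intercept term
model = sm.OLS(gpa, X).fit()
print(model.params)               # intercept and slope estimates
print(model.pvalues)              # significance of each coefficient
```

A multiple linear regression follows the same pattern: you stack two or more predictor columns before calling `sm.add_constant`.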

Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.

  • A t test is for exactly 1 or 2 groups when the sample is small (30 or fewer).
  • A z test is for exactly 1 or 2 groups when the sample is large.
  • An ANOVA is for 3 or more groups.

The z and t tests have subtypes based on the number and types of samples and the hypotheses:

  • If you have only one sample that you want to compare to a population mean, use a one-sample test .
  • If you have paired measurements (within-subjects design), use a dependent (paired) samples test .
  • If you have completely separate measurements from two unmatched groups (between-subjects design), use an independent (unpaired) samples test .
  • If you expect a difference between groups in a specific direction, use a one-tailed test .
  • If you don’t have any expectations for the direction of a difference between groups, use a two-tailed test .

The only parametric correlation test is Pearson’s r . The correlation coefficient ( r ) tells you the strength of a linear relationship between two quantitative variables.

However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.

You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you:

  • a t value (test statistic) of 3.00
  • a p value of 0.0028
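A dependent (paired) samples, one-tailed t test like this can be run with SciPy. The hypothetical scores below won’t reproduce the exact values above; they only show the mechanics:

```python
# Paired, one-tailed t test: did scores improve after the exercise?
from scipy import stats

pretest  = [65, 70, 72, 68, 74, 66, 71, 69]   # hypothetical pre-meditation scores
posttest = [70, 76, 75, 74, 80, 69, 78, 73]   # hypothetical post-meditation scores

# alternative="less" tests whether the pretest mean is lower than the posttest mean
result = stats.ttest_rel(pretest, posttest, alternative="less")
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```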

Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.

A t test can also determine how significantly a correlation coefficient differs from zero based on sample size. Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you:

  • a t value of 3.08
  • a p value of 0.001
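In practice, `scipy.stats.pearsonr` returns both the correlation coefficient and its p value in a single call. A sketch with synthetic income and GPA data (values are illustrative):

```python
# Pearson's r with a one-tailed significance test (synthetic data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
income = rng.normal(62_100, 15_000, size=200)
gpa = 2.5 + 1.0e-5 * income + rng.normal(0, 0.35, 200)

# alternative="greater" gives a one-tailed test (requires scipy >= 1.9)
r, p = stats.pearsonr(income, gpa, alternative="greater")
print(f"r = {r:.2f}, p = {p:.4f}")
```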

Step 5: Interpret your results

The final step of statistical analysis is interpreting your results.

Statistical significance

In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.

Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.

Example: Interpret your results (experimental study)
You compare your p value of 0.0028 to your significance threshold of 0.05. Since the p value is below this threshold, you can reject the null hypothesis. This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores.

Example: Interpret your results (correlational study)
You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.

Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.

Effect size

A statistically significant result doesn’t necessarily mean that there are important real life applications or clinical outcomes for a finding.

In contrast, the effect size indicates the practical significance of your results. It’s important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you’re writing an APA style paper .

Example: Effect size (experimental study)
With a Cohen’s d of 0.72, there’s medium to high practical significance to your finding that the meditation exercise improved test scores.

Example: Effect size (correlational study)
To determine the effect size of the correlation coefficient, you compare your Pearson’s r value to Cohen’s effect size criteria.
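Cohen’s d for two sets of scores is commonly computed as the difference in means divided by the pooled standard deviation. A minimal sketch of that formulation, with hypothetical data:

```python
# Cohen's d using the pooled standard deviation (hypothetical scores).
import numpy as np

def cohens_d(group1, group2):
    n1, n2 = len(group1), len(group2)
    v1, v2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (np.mean(group2) - np.mean(group1)) / pooled_sd

pretest = [65, 70, 72, 68, 74, 66, 71, 69]
posttest = [70, 76, 75, 74, 80, 69, 78, 73]
print(f"Cohen's d = {cohens_d(pretest, posttest):.2f}")
```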

Decision errors

Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.

You can aim to minimize the risk of these errors by selecting an optimal significance level and ensuring high power . However, there’s a trade-off between the two errors, so a fine balance is necessary.

Frequentist versus Bayesian statistics

Traditionally, frequentist statistics emphasizes null hypothesis significance testing and always starts with the assumption of a true null hypothesis.

However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.

A Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis rather than making a conclusion about rejecting the null hypothesis or not.

Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

Statistics

  • Student’s  t -distribution
  • Normal distribution
  • Null and Alternative Hypotheses
  • Chi square tests
  • Confidence interval

Methodology

  • Cluster sampling
  • Stratified sampling
  • Data cleansing
  • Reproducibility vs Replicability
  • Peer review
  • Likert scale

Research bias

  • Implicit bias
  • Framing effect
  • Cognitive bias
  • Placebo effect
  • Hawthorne effect
  • Hostile attribution bias
  • Affect heuristic


How To Write The Results/Findings Chapter

For quantitative studies (dissertations & theses).

By: Derek Jansen (MBA) | Expert Reviewed By: Kerryn Warren (PhD) | July 2021

So, you’ve completed your quantitative data analysis and it’s time to report on your findings. But where do you start? In this post, we’ll walk you through the results chapter (also called the findings or analysis chapter), step by step, so that you can craft this section of your dissertation or thesis with confidence. If you’re looking for information regarding the results chapter for qualitative studies, you can find that here .

Overview: Quantitative Results Chapter

  • What exactly the results chapter is
  • What you need to include in your chapter
  • How to structure the chapter
  • Tips and tricks for writing a top-notch chapter
  • Free results chapter template

What exactly is the results chapter?

The results chapter (also referred to as the findings or analysis chapter) is one of the most important chapters of your dissertation or thesis because it shows the reader what you’ve found in terms of the quantitative data you’ve collected. It presents the data using a clear text narrative, supported by tables, graphs and charts. In doing so, it also highlights any potential issues (such as outliers or unusual findings) you’ve come across.

But how’s that different from the discussion chapter?

Well, in the results chapter, you only present your statistical findings. Only the numbers, so to speak – no more, no less. Contrasted to this, in the discussion chapter , you interpret your findings and link them to prior research (i.e. your literature review), as well as your research objectives and research questions . In other words, the results chapter presents and describes the data, while the discussion chapter interprets the data.

Let’s look at an example.

In your results chapter, you may have a plot that shows how respondents to a survey  responded: the numbers of respondents per category, for instance. You may also state whether this supports a hypothesis by using a p-value from a statistical test. But it is only in the discussion chapter where you will say why this is relevant or how it compares with the literature or the broader picture. So, in your results chapter, make sure that you don’t present anything other than the hard facts – this is not the place for subjectivity.

It’s worth mentioning that some universities prefer you to combine the results and discussion chapters. Even so, it is good practice to separate the results and discussion elements within the chapter, as this ensures your findings are fully described. Typically, though, the results and discussion chapters are split up in quantitative studies. If you’re unsure, chat with your research supervisor or chair to find out what their preference is.


What should you include in the results chapter?

Following your analysis, it’s likely you’ll have far more data than are necessary to include in your chapter. In all likelihood, you’ll have a mountain of SPSS or R output data, and it’s your job to decide what’s most relevant. You’ll need to cut through the noise and focus on the data that matters.

This doesn’t mean that those analyses were a waste of time – on the contrary, those analyses ensure that you have a good understanding of your dataset and how to interpret it. However, that doesn’t mean your reader or examiner needs to see the 165 histograms you created! Relevance is key.

How do I decide what’s relevant?

At this point, it can be difficult to strike a balance between what is and isn’t important. But the most important thing is to ensure your results reflect and align with the purpose of your study .  So, you need to revisit your research aims, objectives and research questions and use these as a litmus test for relevance. Make sure that you refer back to these constantly when writing up your chapter so that you stay on track.


As a general guide, your results chapter will typically include the following:

  • Some demographic data about your sample
  • Reliability tests (if you used measurement scales)
  • Descriptive statistics
  • Inferential statistics (if your research objectives and questions require these)
  • Hypothesis tests (again, if your research objectives and questions require these)

We’ll discuss each of these points in more detail in the next section.

Importantly, your results chapter needs to lay the foundation for your discussion chapter . This means that, in your results chapter, you need to include all the data that you will use as the basis for your interpretation in the discussion chapter.

For example, if you plan to highlight the strong relationship between Variable X and Variable Y in your discussion chapter, you need to present the respective analysis in your results chapter – perhaps a correlation or regression analysis.


How do I write the results chapter?

There are multiple steps involved in writing up the results chapter for your quantitative research. The exact number of steps applicable to you will vary from study to study and will depend on the nature of the research aims, objectives and research questions . However, we’ll outline the generic steps below.

Step 1 – Revisit your research questions

The first step in writing your results chapter is to revisit your research objectives and research questions . These will be (or at least, should be!) the driving force behind your results and discussion chapters, so you need to review them and then ask yourself which statistical analyses and tests (from your mountain of data) would specifically help you address these . For each research objective and research question, list the specific piece (or pieces) of analysis that address it.

At this stage, it’s also useful to think about the key points that you want to raise in your discussion chapter and note these down so that you have a clear reminder of which data points and analyses you want to highlight in the results chapter. Again, list your points and then list the specific piece of analysis that addresses each point. 

Next, you should draw up a rough outline of how you plan to structure your chapter . Which analyses and statistical tests will you present and in what order? We’ll discuss the “standard structure” in more detail later, but it’s worth mentioning now that it’s always useful to draw up a rough outline before you start writing (this advice applies to any chapter).

Step 2 – Craft an overview introduction

As with all chapters in your dissertation or thesis, you should start your quantitative results chapter by providing a brief overview of what you’ll do in the chapter and why . For example, you’d explain that you will start by presenting demographic data to understand the representativeness of the sample, before moving onto X, Y and Z.

This section shouldn’t be lengthy – a paragraph or two maximum. Also, it’s a good idea to weave the research questions into this section so that there’s a golden thread that runs through the document.


Step 3 – Present the sample demographic data

The first set of data that you’ll present is an overview of the sample demographics – in other words, the demographics of your respondents.

For example:

  • What age range are they?
  • How is gender distributed?
  • How is ethnicity distributed?
  • What areas do the participants live in?

The purpose of this is to assess how representative the sample is of the broader population. This is important for the sake of the generalisability of the results. If your sample is not representative of the population, you will not be able to generalise your findings. This is not necessarily the end of the world, but it is a limitation you’ll need to acknowledge.

Of course, to make this representativeness assessment, you’ll need to have a clear view of the demographics of the population. So, make sure that you design your survey to capture the correct demographic information that you will compare your sample to.

But what if I’m not interested in generalisability?

Well, even if your purpose is not necessarily to extrapolate your findings to the broader population, understanding your sample will allow you to interpret your findings appropriately, considering who responded. In other words, it will help you contextualise your findings . For example, if 80% of your sample was aged over 65, this may be a significant contextual factor to consider when interpreting the data. Therefore, it’s important to understand and present the demographic data.

Step 4 – Review composite measures and the data “shape”

Before you undertake any statistical analysis, you’ll need to do some checks to ensure that your data are suitable for the analysis methods and techniques you plan to use. If you try to analyse data that doesn’t meet the assumptions of a specific statistical technique, your results will be largely meaningless. Therefore, you may need to show that the methods and techniques you’ll use are “allowed”.

Most commonly, there are two areas you need to pay attention to:

#1: Composite measures

The first is when you have multiple scale-based measures that combine to capture one construct – this is called a composite measure .  For example, you may have four Likert scale-based measures that (should) all measure the same thing, but in different ways. In other words, in a survey, these four scales should all receive similar ratings. This is called “ internal consistency ”.

Internal consistency is not guaranteed though (especially if you developed the measures yourself), so you need to assess the reliability of each composite measure using a test. Typically, Cronbach’s Alpha is a common test used to assess internal consistency – i.e., to show that the items you’re combining are more or less saying the same thing. A high alpha score means that your measure is internally consistent. A low alpha score means you may need to consider scrapping one or more of the measures.
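If your statistics package doesn’t report Cronbach’s Alpha, it’s straightforward to compute by hand. A small sketch, assuming a NumPy array of hypothetical Likert responses (rows are respondents, columns are the items in one composite measure):

```python
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score)
import numpy as np

def cronbachs_alpha(items):
    k = items.shape[1]                          # number of items in the composite
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

responses = np.array([   # four hypothetical Likert items, five respondents
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
])
print(f"alpha = {cronbachs_alpha(responses):.2f}")
```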

#2: Data shape

The second matter that you should address early on in your results chapter is data shape. In other words, you need to assess whether the data in your set are symmetrical (i.e. normally distributed) or not, as this will directly impact what type of analyses you can use. For many common inferential tests such as T-tests or ANOVAs (we’ll discuss these a bit later), your data needs to be normally distributed. If it’s not, you’ll need to adjust your strategy and use alternative tests.

To assess the shape of the data, you’ll usually assess a variety of descriptive statistics (such as the mean, median and skewness), which is what we’ll look at next.
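As a quick illustration before that, skewness, kurtosis and a formal normality test (Shapiro-Wilk) are all one-liners in SciPy. The data below are randomly generated, purely for demonstration:

```python
# Data-shape checks: skewness, kurtosis and a Shapiro-Wilk normality test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
scores = rng.normal(loc=70, scale=10, size=100)   # hypothetical variable

print("Skewness:", stats.skew(scores))            # near 0 for symmetric data
print("Kurtosis:", stats.kurtosis(scores))        # excess kurtosis; 0 for normal
w, p = stats.shapiro(scores)
print(f"Shapiro-Wilk: W = {w:.3f}, p = {p:.3f}")  # p > 0.05: no evidence of non-normality
```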


Step 5 – Present the descriptive statistics

Now that you’ve laid the foundation by discussing the representativeness of your sample, as well as the reliability of your measures and the shape of your data, you can get started with the actual statistical analysis. The first step is to present the descriptive statistics for your variables.

For scaled data, this usually includes statistics such as:

  • The mean – this is simply the mathematical average of a range of numbers.
  • The median – this is the midpoint in a range of numbers when the numbers are arranged in order.
  • The mode – this is the most commonly repeated number in the data set.
  • Standard deviation – this metric indicates how dispersed a range of numbers is. In other words, how close all the numbers are to the mean (the average).
  • Skewness – this indicates how symmetrical a range of numbers is. In other words, do they tend to cluster into a smooth bell curve shape in the middle of the graph (this is called a normal or parametric distribution), or do they lean to the left or right (this is called a non-normal or non-parametric distribution)?
  • Kurtosis – this metric indicates whether the data are heavily or lightly-tailed, relative to the normal distribution. In other words, how peaked or flat the distribution is.

A large table that indicates all the above for multiple variables can be a very effective way to present your data economically. You can also use colour coding to help make the data more easily digestible.
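One convenient way to build such a table is with pandas, aggregating several statistics across all variables at once. A sketch with fabricated variables:

```python
# A compact descriptive-statistics table for several variables (synthetic data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=100),
    "pretest": rng.normal(68, 9, size=100),
    "posttest": rng.normal(75, 10, size=100),
})

summary = df.agg(["mean", "median", "std", "skew", "kurtosis"]).round(2)
print(summary)   # one row per statistic, one column per variable
```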

For categorical data, where you show the percentage or number of people who chose or fit into a category, you can either describe the counts and percentages directly in the text or use graphs and charts (such as bar graphs and pie charts) to present your data in this section of the chapter.

When using figures, make sure that you label them simply and clearly , so that your reader can easily understand them. There’s nothing more frustrating than a graph that’s missing axis labels! Keep in mind that although you’ll be presenting charts and graphs, your text content needs to present a clear narrative that can stand on its own. In other words, don’t rely purely on your figures and tables to convey your key points: highlight the crucial trends and values in the text. Figures and tables should complement the writing, not carry it .

Depending on your research aims, objectives and research questions, you may stop your analysis at this point (i.e. descriptive statistics). However, if your study requires inferential statistics, then it’s time to deep dive into those .


Step 6 – Present the inferential statistics

Inferential statistics are used to make generalisations about a population , whereas descriptive statistics focus purely on the sample . Inferential statistical techniques, broadly speaking, can be broken down into two groups .

First, there are those that compare measurements between groups , such as t-tests (which measure differences between two groups) and ANOVAs (which measure differences between multiple groups). Second, there are techniques that assess the relationships between variables , such as correlation analysis and regression analysis. Within each of these, some tests can be used for normally distributed (parametric) data and some tests are designed specifically for use on non-parametric data.
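For instance, a one-way ANOVA across three groups takes a single call in SciPy. A minimal sketch with hypothetical group scores:

```python
# One-way ANOVA: do the means of three groups differ? (hypothetical data)
from scipy import stats

group_a = [70, 72, 68, 75, 71]
group_b = [78, 80, 76, 82, 79]
group_c = [65, 66, 70, 64, 68]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```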

There are a seemingly endless number of tests that you can use to crunch your data, so it’s easy to run down a rabbit hole and end up with piles of test data. Ultimately, the most important thing is to make sure that you adopt the tests and techniques that allow you to achieve your research objectives and answer your research questions .

In this section of the results chapter, you should try to make use of figures and visual components as effectively as possible. For example, if you present a correlation table, use colour coding to highlight the significance of the correlation values, or scatterplots to visually demonstrate what the trend is. The easier you make it for your reader to digest your findings, the more effectively you’ll be able to make your arguments in the next chapter.


Step 7 – Test your hypotheses

If your study requires it, the next stage is hypothesis testing. A hypothesis is a statement , often indicating a difference between groups or relationship between variables, that can be supported or rejected by a statistical test. However, not all studies will involve hypotheses (again, it depends on the research objectives), so don’t feel like you “must” present and test hypotheses just because you’re undertaking quantitative research.

The basic process for hypothesis testing is as follows:

  • Specify your null hypothesis (for example, “The chemical psilocybin has no effect on time perception.”)
  • Specify your alternative hypothesis (e.g., “The chemical psilocybin has an effect on time perception.”)
  • Set your significance level (this is usually 0.05)
  • Calculate your statistics and find your p-value (e.g., p=0.01)
  • Draw your conclusions (e.g., “The chemical psilocybin does have an effect on time perception”)

Finally, if the aim of your study is to develop and test a conceptual framework, this is the time to present it, following the testing of your hypotheses. While you don’t need to interpret or discuss these findings further in the results chapter, indicating whether the tests (and their p-values) support or reject the hypotheses is crucial.

Step 8 – Provide a chapter summary

To wrap up your results chapter and transition to the discussion chapter, you should provide a brief summary of the key findings . “Brief” is the keyword here – much like the chapter introduction, this shouldn’t be lengthy – a paragraph or two maximum. Highlight the findings most relevant to your research objectives and research questions, and wrap it up.

Some final thoughts, tips and tricks

Now that you’ve got the essentials down, here are a few tips and tricks to make your quantitative results chapter shine:

  • When writing your results chapter, report your findings in the past tense. You’re describing what you found in your data, not what you’re currently looking for or trying to find.
  • Structure your results chapter systematically and sequentially. If you ran two experiments where the findings of one fed into the other, report them in that order.
  • Make your own tables and graphs rather than copying and pasting them from statistical analysis programmes like SPSS. Check out the DataIsBeautiful subreddit for some inspiration.
  • Once you’re done writing, review your work to make sure that you’ve provided enough information to answer your research questions, but haven’t included superfluous information.

If you’ve got any questions about writing up the quantitative results chapter, please leave a comment below. If you’d like 1-on-1 assistance with your quantitative analysis and discussion, check out our hands-on coaching service, or book a free consultation with a friendly coach.


HCA Healthcare Journal of Medicine, v.1(2), 2020. PMC10324782

Introduction to Research Statistical Analysis: An Overview of the Basics

Christian Vandever

HCA Healthcare Graduate Medical Education

Description

This article covers many statistical ideas essential to research statistical analysis. Sample size is explained through the concepts of statistical significance level and power. Variable types and definitions are included to clarify necessities for how the analysis will be interpreted. Categorical and quantitative variable types are defined, as well as response and predictor variables. Statistical tests described include t-tests, ANOVA and chi-square tests. Multiple regression is also explored for both logistic and linear regression. Finally, the most common statistics produced by these methods are explored.

Introduction

Statistical analysis is necessary for any research project seeking to make quantitative conclusions. The following is a primer for research-based statistical analysis. It is intended to be a high-level overview of appropriate statistical testing, while not diving too deep into any specific methodology. Some of the information is more applicable to retrospective projects, where analysis is performed on data that has already been collected, but most of it will be suitable to any type of research. This primer is meant to help the reader understand and discuss research results in coordination with a statistician, not to perform the actual analysis. Analysis is commonly performed using statistical programming software such as R, SAS or SPSS, which allow an analysis to be replicated while minimizing the risk of error. Resources are listed later for those working on analysis without a statistician.

After coming up with a hypothesis for a study, including any variables to be used, one of the first steps is to think about the patient population to which the question applies. Results are only relevant to the population that the underlying data represents. Since it is impractical to include everyone with a certain condition, a subset of the population of interest should be taken. This subset should be large enough to have power, which means there is enough data to deliver significant results and accurately reflect the study’s population.

The first statistics of interest are related to significance level and power, alpha and beta. Alpha (α) is the significance level and probability of a type I error, the rejection of the null hypothesis when it is true. The null hypothesis is generally that there is no difference between the groups compared. A type I error is also known as a false positive. An example would be an analysis that finds one medication statistically better than another, when in reality there is no difference in efficacy between the two. Beta (β) is the probability of a type II error, the failure to reject the null hypothesis when it is actually false. A type II error is also known as a false negative. This occurs when the analysis finds there is no difference in two medications when in reality one works better than the other. Power is defined as 1-β and should be calculated prior to running any sort of statistical testing. Ideally, alpha should be as small as possible while power should be as large as possible. Power generally increases with a larger sample size, but so does cost and the effect of any bias in the study design. Additionally, as the sample size gets bigger, the chance for a statistically significant result goes up even though these results can be small differences that do not matter practically. Power calculators include the magnitude of the effect in order to combat the potential for exaggeration and only give significant results that have an actual impact. The calculators take inputs like the mean, effect size and desired power, and output the required minimum sample size for analysis. Effect size is calculated using statistical information on the variables of interest. If that information is not available, most tests have commonly used values for small, medium or large effect sizes.
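As a concrete illustration, a power calculation of the kind described above might look like the following sketch in Python; the effect size (a conventional “medium” Cohen’s d of 0.5), alpha and power values are illustrative assumptions, not values from the article:

```python
# A hedged sketch of an a priori power calculation using statsmodels.
# Effect size, alpha and power below are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,  # Cohen's d ("medium")
                                   alpha=0.05,       # significance level
                                   power=0.80,       # 1 - beta
                                   alternative="two-sided")
print(f"Required sample size per group: {n_per_group:.0f}")
```

With these inputs the calculator returns roughly 64 patients per group; assuming a smaller effect size pushes the required sample size up.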

When the desired patient population is decided, the next step is to define the variables previously chosen to be included. Variables come in different types that determine which statistical methods are appropriate and useful. One way variables can be split is into categorical and quantitative variables. ( Table 1 ) Categorical variables place patients into groups, such as gender, race and smoking status. Quantitative variables measure or count some quantity of interest. Common quantitative variables in research include age and weight. An important note is that there can often be a choice for whether to treat a variable as quantitative or categorical. For example, in a study looking at body mass index (BMI), BMI could be defined as a quantitative variable or as a categorical variable, with each patient’s BMI listed as a category (underweight, normal, overweight, and obese) rather than the discrete value. The decision whether a variable is quantitative or categorical will affect what conclusions can be made when interpreting results from statistical tests. Keep in mind that since quantitative variables are treated on a continuous scale it would be inappropriate to transform a variable like which medication was given into a quantitative variable with values 1, 2 and 3.

Table 1. Categorical vs. Quantitative Variables

Categorical variables:
  • Categorize patients into discrete groups
  • Patient categories are mutually exclusive
  • Examples: race, smoking status, demographic group

Quantitative variables:
  • Continuous values that measure a variable
  • For time-based studies, there would be a new variable for each measurement at each time
  • Examples: age, weight, heart rate, white blood cell count
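Returning to the BMI example above, the following small sketch (not from the article) shows the same measurement treated either way; the cut-offs used are the standard BMI category boundaries:

```python
# One measurement, two treatments: quantitative vs categorical.
# The five patient values are invented for illustration.
import pandas as pd

bmi = pd.Series([17.9, 22.4, 27.1, 31.6, 24.8], name="bmi")

# Quantitative: keep the continuous value
print(bmi.mean())

# Categorical: bin the same values into the standard BMI categories
bmi_cat = pd.cut(bmi,
                 bins=[0, 18.5, 25, 30, float("inf")],
                 labels=["underweight", "normal", "overweight", "obese"])
print(bmi_cat.value_counts())
```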

Both of these types of variables can also be split into response and predictor variables. ( Table 2 ) Predictor variables are explanatory, or independent, variables that help explain changes in a response variable. Conversely, response variables are outcome, or dependent, variables whose changes can be partially explained by the predictor variables.

Table 2. Response vs. Predictor Variables

Response variables:
  • Outcome variables
  • Should be the result of the predictor variables
  • One response variable per statistical test
  • Can be categorical or quantitative

Predictor variables:
  • Explanatory variables
  • Should help explain changes in the response variables
  • Can be multiple variables that may have an impact on the response variable
  • Can be categorical or quantitative

Choosing the correct statistical test depends on the types of variables defined and the question being answered. Some common statistical tests include t-tests, ANOVA and chi-square tests.

T-tests compare whether there are differences in a quantitative variable between two values of a categorical variable. For example, a t-test could be useful to compare the length of stay for knee replacement surgery patients between those that took apixaban and those that took rivaroxaban. A t-test could examine whether there is a statistically significant difference in the length of stay between the two groups. The t-test will output a p-value, a number between zero and one, which represents the probability that the two groups could be as different as they are in the data, if they were actually the same. A value closer to zero suggests that the difference, in this case for length of stay, is more statistically significant than a number closer to one. Prior to collecting the data, set a significance level, the previously defined alpha. Alpha is typically set at 0.05, but is commonly reduced in order to limit the chance of a type I error, or false positive. Going back to the example above, if alpha is set at 0.05 and the analysis gives a p-value of 0.039, then a statistically significant difference in length of stay is observed between apixaban and rivaroxaban patients. If the analysis gives a p-value of 0.91, then there was no statistical evidence of a difference in length of stay between the two medications. Other statistical summaries or methods examine how big of a difference that might be. These other summaries are known as post-hoc analysis since they are performed after the original test to provide additional context to the results.

Analysis of variance, or ANOVA, tests can observe mean differences in a quantitative variable between values of a categorical variable, typically with three or more values to distinguish from a t-test. ANOVA could add patients given dabigatran to the previous population and evaluate whether the length of stay was significantly different across the three medications. If the p-value is lower than the designated significance level then the hypothesis that length of stay was the same across the three medications is rejected. Summaries and post-hoc tests also could be performed to look at the differences between length of stay and which individual medications may have observed statistically significant differences in length of stay from the other medications. A chi-square test examines the association between two categorical variables. An example would be to consider whether the rate of having a post-operative bleed is the same across patients provided with apixaban, rivaroxaban and dabigatran. A chi-square test can compute a p-value determining whether the bleeding rates were significantly different or not. Post-hoc tests could then give the bleeding rate for each medication, as well as a breakdown as to which specific medications may have a significantly different bleeding rate from each other.
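A hedged sketch of both tests, with invented length-of-stay values and bleeding counts for the three medications discussed above:

```python
# One-way ANOVA and chi-square on invented anticoagulant data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
los_apixaban = rng.normal(3.0, 1.0, 50)     # length of stay in days
los_rivaroxaban = rng.normal(3.2, 1.0, 50)
los_dabigatran = rng.normal(3.6, 1.0, 50)

# ANOVA: is mean length of stay the same across the three medications?
f_stat, anova_p = stats.f_oneway(los_apixaban, los_rivaroxaban, los_dabigatran)
print(f"ANOVA: F={f_stat:.2f}, p={anova_p:.3f}")

# Chi-square: is the post-operative bleed rate associated with medication?
# Rows = medication, columns = (bleed, no bleed); counts are invented.
contingency = np.array([[ 5, 45],
                        [ 8, 42],
                        [12, 38]])
chi2, chi_p, dof, expected = stats.chi2_contingency(contingency)
print(f"Chi-square: chi2={chi2:.2f}, p={chi_p:.3f}")
```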

A slightly more advanced way of examining a question can come through multiple regression. Regression allows more predictor variables to be analyzed and can act as a control when looking at associations between variables. Common control variables are age, sex and any comorbidities likely to affect the outcome variable that are not closely related to the other explanatory variables. Control variables can be especially important in reducing the effect of bias in a retrospective population. Since retrospective data was not built with the research question in mind, it is important to eliminate threats to the validity of the analysis. Testing that controls for confounding variables, such as regression, is often more valuable with retrospective data because it can ease these concerns.

The two main types of regression are linear and logistic. Linear regression is used to predict differences in a quantitative, continuous response variable, such as length of stay. Logistic regression predicts differences in a dichotomous, categorical response variable, such as 90-day readmission. So whether the outcome variable is categorical or quantitative, regression can be appropriate. An example for each of these types could be found in two similar cases. For both examples, define the predictor variables as age, gender and anticoagulant usage. In the first, use the predictor variables in a linear regression to evaluate their individual effects on length of stay, a quantitative variable. For the second, use the same predictor variables in a logistic regression to evaluate their individual effects on whether the patient had a 90-day readmission, a dichotomous categorical variable. Analysis can compute a p-value for each included predictor variable to determine whether it is significantly associated with the response.

The statistical tests in this article generate an associated test statistic, which determines the probability the results could be acquired given that there is no association between the compared variables. These results often come with coefficients, which give the degree of the association and the degree to which one variable changes with another. Most tests, including all listed in this article, also have confidence intervals, which give a range for the correlation with a specified level of confidence. Even if these tests do not give statistically significant results, the results are still important. Not reporting statistically insignificant findings creates a bias in research. Ideas can be repeated enough times that eventually statistically significant results are reached, even though there is no true significance. In some cases with very large sample sizes, p-values will almost always be significant. In this case the effect size is critical, as even the smallest, meaningless differences can be found to be statistically significant.
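To ground the two regression examples, here is a sketch using statsmodels; the predictor names mirror the article’s example (age, gender, anticoagulant), but all data and effect sizes are invented:

```python
# Linear and logistic regression on invented data mirroring the
# article's example. Variable names and values are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
df = pd.DataFrame({
    "age": rng.integers(40, 90, n),
    "male": rng.integers(0, 2, n),           # gender coded 0/1
    "anticoagulant": rng.integers(0, 3, n),  # 0/1/2 for the three drugs
})
df["length_of_stay"] = 1 + 0.03 * df["age"] + rng.normal(0, 1, n)
df["readmit_90d"] = (rng.random(n) < 0.1 + 0.002 * df["age"]).astype(int)

# Dummy-code the medication rather than entering it as the numbers 0/1/2
X = pd.get_dummies(df["anticoagulant"], prefix="drug", drop_first=True)
X = pd.concat([df[["age", "male"]], X], axis=1).astype(float)
X = sm.add_constant(X)

# Linear regression: quantitative response (length of stay)
linear = sm.OLS(df["length_of_stay"], X).fit()
print(linear.summary())    # coefficients, p-values, confidence intervals

# Logistic regression: dichotomous response (90-day readmission)
logistic = sm.Logit(df["readmit_90d"], X).fit()
print(logistic.summary())
```

Note that the three-level medication variable is dummy-coded rather than entered as the numbers 0, 1 and 2, in line with the earlier warning about treating categorical variables as quantitative.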

These variables and tests are just some things to keep in mind before, during and after the analysis process in order to make sure that the statistical reports are supporting the questions being answered. The patient population, types of variables and statistical tests are all important things to consider in the process of statistical analysis. Any results are only as useful as the process used to obtain them. This primer can be used as a reference to help ensure appropriate statistical analysis.

Key terms and definitions:

Alpha (α): the significance level and probability of a type I error; the probability of a false positive.
Analysis of variance (ANOVA): a test observing mean differences in a quantitative variable between values of a categorical variable, typically with three or more values to distinguish it from a t-test.
Beta (β): the probability of a type II error; the probability of a false negative.
Categorical variable: a variable that places patients into groups, such as gender, race or smoking status.
Chi-square test: examines the association between two categorical variables.
Confidence interval: a range for the correlation with a specified level of confidence, 95% for example.
Control variables: variables likely to affect the outcome variable that are not closely related to the other explanatory variables.
Hypothesis: the idea being tested by statistical analysis.
Linear regression: regression used to predict differences in a quantitative, continuous response variable, such as length of stay.
Logistic regression: regression used to predict differences in a dichotomous, categorical response variable, such as 90-day readmission.
Multiple regression: regression utilizing more than one predictor variable.
Null hypothesis: the hypothesis that there are no significant differences for the variable(s) being tested.
Patient population: the population the data is collected to represent.
Post-hoc analysis: analysis performed after the original test to provide additional context to the results.
Power: 1 − β; the probability of avoiding a type II error (a false negative).
Predictor variable: an explanatory, or independent, variable that helps explain changes in a response variable.
p-value: a value between zero and one representing the probability of obtaining results at least as extreme as those observed if the null hypothesis were true; usually compared against a significance level to judge statistical significance.
Quantitative variable: a variable measuring or counting some quantity of interest.
Response variable: an outcome, or dependent, variable whose changes can be partially explained by the predictor variables.
Retrospective study: a study using previously existing data that was not originally collected for the purposes of the study.
Sample size: the number of patients or observations used for the study.
Significance level: alpha; the probability of a type I error, usually compared to a p-value to determine statistical significance.
Statistical analysis: analysis of data using statistical testing to examine a research hypothesis.
Statistical testing: testing used to examine the validity of a hypothesis using statistical calculations.
Statistical significance: the determination of whether to reject the null hypothesis, i.e., whether the p-value is below the threshold of the predetermined significance level.
T-test: a test comparing whether there are differences in a quantitative variable between two values of a categorical variable.

Funding Statement

This research was supported (in whole or in part) by HCA Healthcare and/or an HCA Healthcare affiliated entity.

Conflicts of Interest

The author declares he has no conflicts of interest.

Christian Vandever is an employee of HCA Healthcare Graduate Medical Education, an organization affiliated with the journal’s publisher.

This research was supported (in whole or in part) by HCA Healthcare and/or an HCA Healthcare affiliated entity. The views expressed in this publication represent those of the author(s) and do not necessarily represent the official views of HCA Healthcare or any of its affiliated entities.


Department of Statistics


What do senior theses in Statistics look like?

This is a brief overview of thesis writing; for more information, please see our website here. Senior theses in Statistics cover a wide range of topics, across the spectrum from applied to theoretical. Typically, senior theses are expected to have one of the following three flavors:

1. Novel statistical theory or methodology, supported by extensive mathematical and/or simulation results, along with a clear account of how the research extends or relates to previous related work.

2. An analysis of a complex data set that advances understanding in a related field, such as public health, economics, government, or genetics. Such a thesis may rely entirely on existing methods, but should give useful results and insights into an interesting applied problem.                                                                                 

3. An analysis of a complex data set in which new methods or modifications of published methods are required. While the thesis does not necessarily contain an extensive mathematical study of the new methods, it should contain strong plausibility arguments or simulations supporting the use of the new methods.

A good thesis is clear, readable, and well-motivated, justifying the applicability of the methods used rather than, for example, mechanically running regressions without discussing the assumptions (and whether they are plausible), performing diagnostics, and checking whether the conclusions make sense. 
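As one small illustration of the diagnostic checking described above, a thesis might verify regression assumptions rather than only reporting coefficients; the sketch below (hypothetical data, not a prescribed workflow) tests homoscedasticity and inspects the residual summary:

```python
# An illustrative fragment of regression diagnostics, on invented data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2 + 0.5 * x + rng.normal(0, 1, 100)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# Check the constant-variance (homoscedasticity) assumption
bp_stat, bp_p, _, _ = het_breuschpagan(fit.resid, X)
print(f"Breusch-Pagan p = {bp_p:.3f}")

# The summary also reports residual normality checks (e.g., Jarque-Bera)
print(fit.summary())
```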


Duke University Libraries

Statistical Science


Submit thesis to DukeSpace

If you are an undergraduate honors student interested in submitting your thesis to DukeSpace , Duke University's online repository for publications and other archival materials in digital format, please contact Joan Durso to get this process started.

DukeSpace Electronic Theses and Dissertations (ETD) Submission Tutorial

  • DukeSpace Electronic Theses and Dissertation Self-Submission Guide

Need help submitting your thesis? Contact [email protected].


Statistical Methods in Theses: Guidelines and Explanations

Signed August 2018: Naseem Al-Aidroos, PhD; Christopher Fiacconi, PhD; Deborah Powell, PhD; Harvey Marmurek, PhD; Ian Newby-Clark, PhD; Jeffrey Spence, PhD; David Stanley, PhD; Lana Trick, PhD

Version:  2.00

This document is an organizational aid and workbook for students. We encourage students to take this document to meetings with their advisor and committee. This guide should enhance a committee’s ability to assess key areas of a student’s work.

In recent years a number of well-known and apparently well-established findings have failed to replicate, resulting in what is commonly referred to as the replication crisis. The APA Publication Manual 6th Edition notes that “The essence of the scientific method involves observations that can be repeated and verified by others” (p. 12). However, a systematic investigation of the replicability of psychology findings published in Science revealed that over half of psychology findings do not replicate (see a related commentary in Nature). Even more disturbing, a Bayesian reanalysis of the reproducibility project showed that 64% of studies had sample sizes so small that strong evidence for or against the null or alternative hypotheses did not exist. Indeed, Morey and Lakens (2016) concluded that most of psychology is statistically unfalsifiable due to small sample sizes and correspondingly low power (see article). Our discipline’s reputation is suffering. News of the replication crisis has reached the popular press (e.g., The Atlantic, The Economist, Slate, Last Week Tonight).

An increasing number of psychologists have responded by promoting new research standards that involve open science and the elimination of Questionable Research Practices. The open science perspective is made manifest in the Transparency and Openness Promotion (TOP) guidelines for journal publications. These guidelines were adopted some time ago by the Association for Psychological Science. More recently, the guidelines were adopted by American Psychological Association journals (see details) and journals published by Elsevier (see details). It appears likely that, in the very near future, most journals in psychology will be using an open science approach. We strongly advise readers to take a moment to inspect the TOP Guidelines Summary Table.

A key aspect of open science and the TOP guidelines is the sharing of data associated with published research (with respect to medical research, see point #35 in the World Medical Association Declaration of Helsinki). This practice is viewed widely as highly important. Indeed, open science is recommended by all G7 science ministers. All Tri-Agency grants must include a data-management plan that includes plans for sharing: “research data resulting from agency funding should normally be preserved in a publicly accessible, secure and curated repository or other platform for discovery and reuse by others.” Moreover, a 2017 editorial published in the New England Journal of Medicine announced that the International Committee of Medical Journal Editors believes there is “an ethical obligation to responsibly share data.” As of this writing, 60% of highly ranked psychology journals require or encourage data sharing.

The increasing importance of demonstrating that findings are replicable is reflected in calls to make replication a requirement for the promotion of faculty (see details in Nature), and experts in open science are now refereeing applications for tenure and promotion (see details at the Center for Open Science and this article). Most dramatically, in one instance, a paper resulting from a dissertation was retracted due to misleading findings attributable to Questionable Research Practices. Subsequent to the retraction, the Ohio State University’s Board of Trustees unanimously revoked the PhD of the graduate student who wrote the dissertation (see details). Thus, the academic environment is changing, and it is important to work toward using new best practices in lieu of older practices, many of which are synonymous with Questionable Research Practices. Doing so should help you avoid later career regrets and subsequent public mea culpas. One way to achieve your research objectives in this new academic environment is to incorporate replications into your research. Replications are becoming more common, and there are even websites dedicated to helping students conduct replications (e.g., Psychology Science Accelerator) and indexing the success of replications (e.g., Curate Science). You might even consider conducting a replication for your thesis (subject to committee approval).

As an early-career researcher, you should be aware of the changing academic environment. Senior principal investigators may be reluctant to engage in open science (see this student perspective in a blog post and podcast), and research on resistance to data sharing indicates that one of the barriers is that researchers do not feel they have knowledge of how to share data online. This document is an educational aid and resource, providing students with introductory knowledge of how to participate in open science and online data sharing.

Guidelines and Explanations

In light of the changes in psychology, faculty members who teach statistics/methods have reviewed the literature and generated this guide for graduate students. The guide is intended to enhance the quality of student theses by facilitating their engagement in open and transparent research practices and by helping them avoid Questionable Research Practices, many of which are now deemed unethical and covered in the ethics section of textbooks.

This document is an informational tool.

How to Start

To follow best practices, a few first steps need to be taken. Here is a list of things to do:

  • Get an Open Science Framework account. Registration at osf.io is easy!
  • If conducting confirmatory hypothesis testing for your thesis, pre-register your hypotheses (see Section 1, Hypothesizing). The Open Science Framework website has helpful tutorials and guides to get you going.
  • Also, pre-register your data analysis plan. Pre-registration typically includes how and when you will stop collecting data, how you will deal with violations of statistical assumptions and points of influence (“outliers”), the specific measures you will use, and the analyses you will use to test each hypothesis, possibly including the analysis script. Again, there is a lot of help available for this.

Exploratory and Confirmatory Research Are Both of Value, But Do Not Confuse the Two

We note that this document largely concerns confirmatory research (i.e., testing hypotheses). We by no means intend to devalue exploratory research. Indeed, it is one of the primary ways that hypotheses are generated for (possible) confirmation. Instead, we emphasize that it is important that you clearly indicate which parts of your research are exploratory and which are confirmatory. Be clear in your writing and in your preregistration plan: you should explicitly indicate which of your analyses are exploratory and which are confirmatory. Please note also that if you are engaged in exploratory research, then Null Hypothesis Significance Testing (NHST) should probably be avoided (see the rationale in Gigerenzer (2004) and Wagenmakers et al. (2012)).

This document is structured around the stages of thesis work:  hypothesizing, design, data collection, analyses, and reporting – consistent with the headings used by Wicherts et al. (2016). We also list the Questionable Research Practices associated with each stage and provide suggestions for avoiding them. We strongly advise going through all of these sections during thesis/dissertation proposal meetings because a priori decisions need to be made prior to data collection (including analysis decisions). 

To help to ensure that the student has informed the committee about key decisions at each stage, there are check boxes at the end of each section.

How to Use This Document in a Proposal Meeting

  • Print off a copy of this document and take it to the proposal meeting.
  • During the meeting, use the document to seek assistance from faculty to address potential problems.
  • Revisit responses to issues raised by this document (especially the Analysis and Reporting Stages) when you are seeking approval to proceed to defense.

Consultation and Help Line

Note that the Center for Open Science now has a help line (for individual researchers and labs) you can call for help with open science issues. They also have training workshops. Please see their  website  for details.


PhD Dissertations

2024

Estimation and Inference of Optimal Policies
Statistical Learning and Modeling with Graphs and Networks

2023

Statistical Methods for the Analysis and Prediction of Hierarchical Time Series Data with Applications to Demography
Exponential Family Models for Rich Preference Ranking Data
Bayesian methods for variable selection
Statistical methods for genomic sequencing data
Estimating subnational health and demographic indicators using complex survey data
Inference and Estimation for Network Data
Mixture models to fit heavy-tailed, heterogeneous or sparse data
Addressing double dipping through selective inference and data thinning
Methods for the Statistical Analysis of Preferences, with Applications to Social Science Data
Interpretation and Validation for unsupervised learning

2022

Likelihood-based haplotype frequency modeling using variable-order Markov chains
Statistical Divergences for Learning and Inference: Limit Laws and Non-Asymptotic Bounds
Causal Structure Learning in High Dimensions
Missing Data Methods for Observational Health Dataset
Methods, Models, and Interpretations for Spatial-Temporal Public Health Applications
Statistical Methods for Clustering and High Dimensional Time Series Analysis
Geometric algorithms for interpretable manifold learning

2021

Improving Uncertainty Quantification and Visualization for Spatiotemporal Earthquake Rate Models for the Pacific Northwest
Statistical modeling of long memory and uncontrolled effects in neural recordings
Causality, Fairness, and Information in Peer Review
Subnational Estimation of Period Child Mortality in a Low and Middle Income Countries Context
Distribution-free consistent tests of independence via marginal and multivariate ranks
Progress in nonparametric minimax estimation and high dimensional hypothesis testing
Likelihood Analysis of Causal Models
Bayesian Models in Population Projections and Climate Change Forecast

2020

Statistical Methods for Adaptive Immune Receptor Repertoire Analysis and Comparison
Statistical Methods for Geospatial Modeling with Stratified Cluster Survey Data
Representation Learning for Partitioning Problems
Space-Time Contour Models for Sea Ice Forecasting
Non-Gaussian Graphical Models: Estimation with Score Matching and Causal Discovery under Zero-Inflation
Estimation and Inference in Changepoint Models
Scalable Learning in Latent State Sequence Models

2019

Latent Variable Models for Prediction & Inference with Proxy Network Measures
Bayesian Hierarchical Models and Moment Bounds for High-Dimensional Time Series
Inferring network structure from partially observed graphs
Fitting Stochastic Epidemic Models to Multiple Data Types
Realized genome sharing in random effects models for quantitative genetic traits
Estimation and testing under shape constraints
Large-Scale B Cell Receptor Sequence Analysis Using Phylogenetics and Machine Learning
Statistical Methods for Manifold Recovery and C^(1,1) Regression on Manifolds

2018

Topics in Statistics and Convex Geometry: Rounding, Sampling, and Interpolation
Discovering Interaction in Multivariate Time Series
Nonparametric inference on monotone functions, with applications to observational studies
Estimation and Testing Following Model Selection
Topics on Least Squares Estimation
Bayesian Methods for Graphical Models with Limited Data
Model-Based Penalized Regression
Parameter Identification and Assessment of Independence in Multivariate Statistical Modeling
Preferential sampling and model checking in phylodynamic inference
Linear Structural Equation Models with Non-Gaussian Errors: Estimation and Discovery
Coevolution Regression and Composite Likelihood Estimation for Social Networks

2017

Methods for Estimation and Inference for High-Dimensional Models
Scalable Methods for the Inference of Identity by Descent
Applications of Robust Statistical Methods in Quantitative Finance
Scalable Manifold Learning and Related Topics
Topics in Graph Clustering

2016

Bayesian Methods for Inferring Gene Regulatory Networks
Finite Sampling Exponential Bounds
Finite Population Inference for Causal Parameters
Projection and Estimation of International Migration
Statistical Hurdle Models for Single Cell Gene Expression: Differential Expression and Graphical Modeling
Space-Time Smoothing Models for Surveillance and Complex Survey Data
Testing Independence in High Dimensions & Identifiability of Graphical Models
Likelihood-Based Inference for Partially Observed Multi-Type Markov Branching Processes

2015

The Likelihood Pivot: Performing Inference with Confidence
Lord's Paradox and Targeted Interventions: The Case of Special Education
Bayesian Modeling of a High Resolution Housing Price Index
Phylogenetic Stochastic Mapping
Theory and Methods for Tensor Data
Discrete-Time Threshold Regression for Survival Data with Time-Dependent Covariates
Degeneracy, Duration, and Co-Evolution: Extending Exponential Random Graph Models (ERGM) for Social Network Analysis

2014

Bayesian Spatial and Temporal Methods for Public Health Data
Functional Quantitative Genetics and the Missing Heritability Problem
Predictive Modeling of Cholera Outbreaks in Bangladesh
Gravimetric Anomaly Detection Using Compressed Sensing
R-Squared Inference Under Non-Normal Error
Monte Carlo Estimation of Identity by Descent in Populations

2013

Bayesian Population Reconstruction: A Method for Estimating Age- and Sex-Specific Vital Rates and Population Counts with Uncertainty from Fragmentary Data
Bayesian Nonparametric Inference of Effective Population Size Trajectories from Genomic Data
Modeling Heterogeneity Within and Between Matrices and Arrays
Shape-Constrained Inference for Concave-Transformed Densities and their Modes
Statistical Inference Using Kronecker Structured Covariance
Learning and Manifolds: Leveraging the Intrinsic Geometry
An Algorithmic Framework for High Dimensional Regression with Dependent Variables

2012

Bayesian Modeling of Health Data in Space and Time
Coordinate-Free Exponential Families on Contingency Tables
Bayesian Modeling For Multivariate Mixed Outcomes With Applications To Cognitive Testing Data
Tests for Differences between Least Squares and Robust Regression Parameter Estimates and Related Topics

2011

Parametrizations of Discrete Graphical Models
A Bayesian Surveillance System for Detecting Clusters of Non-Infectious Diseases
Statistical Approaches to Analyze Mass Spectrometry Data
Seeing the trees through the forest; a competition model for growth and mortality
Bayesian Inference of Exponential-family Random Graph Models for Social Networks
Statistical Models for Estimating and Predicting HIV/AIDS Epidemics
Modeling the Game of Soccer Using Potential Functions

2010

Convex analysis methods in shape constrained estimation
Estimating social contact networks to improve epidemic simulation models
Multivariate Geostatistics and Geostatistical Model Averaging
Covariance estimation in the Presence of Diverse Types of Data
Portfolio Optimization with Tail Risk Measures and Non-Normal Returns

2009

Conditional tests for localizing trait genes
Combining and Evaluating Probabilistic Forecasts
Probabilistic weather forecasting using Bayesian model averaging
Statistical Analysis of Portfolio Risk and Performance Measures: the Influence Function Approach
Factor Model Monte Carlo Methods for General Fund-of-Funds Portfolio Management
Statistical Models for Social Network Data and Processes
Models for Heterogeneity in Heterosexual Partnership Networks
A comparison of alternative methodologies for estimation of HIV incidence
Bayesian Model Averaging and Multivariate Conditional Independence Structures

2008

Nonparametric estimation of multivariate monotone densities
Learning transcriptional regulatory networks from the integration of heterogeneous high-throughput data
Extensions of Latent Class Transition Models with Application to Chronic Disability Survey Data
Statistical Solutions to Some Problems in Medical Imaging
Statistical methods for peptide and protein identification using mass spectrometry
Inference from partially-observed network data
Models and Inference of Transmission of DNA Methylation Patterns in Mammalian Somatic Cells
Estimates and projections of the total fertility rate

2007

Probabilistic weather forecasting with spatial dependence
Wavelet variance analysis for time series and random fields
Bayesian hierarchical curve registration
"Up-and-Down" and the Percentile-Finding Problem
Statistical Methodology for Longitudinal Social Network Data

2006

Likelihood inference for population structure, using the coalescent
Exploring rates and patterns of variability in gene conversion and crossover in the human genome
Alleviating ecological bias in generalized linear models and optimal design with subsample data
Nonparametric estimation for current status data with competing risks
Goodness-of-fit statistics based on phi-divergences
An efficient and flexible model for patterns of population genetic variation
Learning in Spectral Clustering
Variable selection and other extensions of the mixture model clustering framework
Algorithms for Estimating the Cluster Tree of a Density

2005

Allele-sharing methods for linkage detection using extended pedigrees
Robust estimation of factor models in finance
Using the structure of d-connecting paths as a qualitative measure of the strength of dependence
Alternative estimators of wavelet variance
Bayesian robust analysis of gene expression microarray data
Alternative models for estimating genetic maps from pedigree data

2004

Maximum likelihood estimation in Gaussian AMP chain graph models and Gaussian ancestral graph models
Nonparametric estimation of a k-monotone density: A new asymptotic distribution theory

2003

Joint relationship inference from three or more individuals in the presence of genotyping error
Personal characteristics and covariate measurement error in disease risk estimation
Model based and hybrid clustering of large datasets
The genetic structure of related recombinant lines

2002

Applying graphical models to partially observed data-generating processes
Generalized linear mixed models: development and comparison of different estimation methods
Practical importance sampling methods for finite mixture models and multiple imputation

2001

Estimation with bivariate interval censored data
Latent models for cross-covariance
Bayesian inference for deterministic simulation models for environmental assessment
Modeling recessive lethals: An explanation for excess sharing in siblings

2000

Bayesian inference in hidden stochastic population processes
Logic regression and statistical issues related to the protein folding problem
Likelihood ratio inference in regular and non-regular problems
Estimating the association between airborne particulate matter and elderly mortality in Seattle, Washington using Bayesian Model Averaging
Nonhomogeneous hidden Markov models for downscaling synoptic atmospheric patterns to precipitation amounts
Detecting and extracting complex patterns from images and realizations of spatial point processes
A model selection approach to partially linear regression
Wavelet-based estimation for trend contaminated long memory processes
Global covariance modeling: A deformation approach to anisotropy
Likelihood inference for parametric models of dispersal

1999

Fast automatic unsupervised image segmentation and curve detection in spatial point processes
Semiparametric inference based on estimating equations in regression models for two phase outcome dependent sampling
Capture-recapture estimation of bowhead whale population size using photo-identification data
Lifetime and disease onset distributions from incomplete observations
Statistical approaches to distinct value estimation
Generalization of boosting algorithms and applications of Bayesian inference for massive datasets
Bayesian inference for noninvertible deterministic simulation models, with application to bowhead whale assessment
Monte Carlo likelihood calculation for identity by descent data

1998

Lattice conditional independence models for incomplete multivariate data and for seemingly unrelated regressions
Estimation for counting processes with incomplete data
Regularization techniques for linear regression with a large set of carriers
Large sample theory for pseudo maximum likelihood estimates in semiparametric models
Additive mixture models for multichannel image data
Application of ridge regression for improved estimation of parameters in compartmental models
Bayesian modeling of highly structured systems using Markov chain Monte Carlo
Assessing nonstationary time series using wavelets

1997

Statistical inference for partially observed Markov population processes
Tools for the advancement of undergraduate statistics education
A new learning procedure in acyclic directed graphs
Phylogenies via conditional independence modeling
Bayesian model averaging in censored survival models
Bayesian information retrieval

1996

Variability estimation in linear inverse problems
Inference in a discrete parameter space
Bootstrapping functional m-estimators

1995

Estimation of heterogeneous space-time covariance
Semiparametric estimation of major gene and random environmental effects for age of onset
Statistical analysis of biological monitoring data: State-space models for species compositions

1994

Spatial applications of Markov chain Monte Carlo for Bayesian inference
Accounting for model uncertainty in linear regression
Robust estimation in point processes
Multilevel modeling of discrete event history data using Markov chain Monte Carlo methods
Estimation in regression models with interval censoring

1993

Markov chain Monte Carlo estimates of probabilities on complex structures
A class of stochastic models for relating synoptic atmospheric patterns to local hydrologic phenomena
A Bayesian framework and importance sampling methods for synthesizing multiple sources of evidence and uncertainty linked by a complex mechanistic model
State-space modeling of salmon migration and Monte Carlo alternatives to the Kalman filter
The Poisson clumping heuristic and the survival of genome in small pedigrees

1992

Auxiliary and missing covariate problems in failure time regression analysis
A high order hidden Markov model
Bayesian methods for the analysis of misclassified or incomplete multivariate discrete data

1991

General-weights bootstrap of the empirical process
The weighted likelihood bootstrap and an algorithm for prepivoting

1990

Modeling and bootstrapping for non-Gaussian time series
Genetic restoration on complex pedigrees
Incorporating covariates into a beta-binomial model with applications to Medicare policy: A Bayes/empirical Bayes approach
Likelihood and exponential families
Modelling agricultural field trials in the presence of outliers and fertility jumps

1989

Estimation of mixing and mixed distributions
Classical inference in spatial statistics

1988

Diagnostics for time series models
Constrained cluster analysis and image understanding
Exploratory methods for censored data
Aspects of robust analysis in designed experiments

1987

The data viewer: A program for graphical data analysis
Additive principal components: A method for estimating additive constraints with small variance from multivariate data
Kullback-Leibler estimation of probability measures with an application to clustering
Time series models for continuous proportions

1986

Estimation for infinite variance autoregressive processes
A computer system for Monte Carlo experimentation

1985

Robust estimation for the errors-in-variables model
Robust statistics on compact metric spaces
Weak convergence and a law of the iterated logarithm for processes indexed by points in a metric space

1983

The statistics of long memory processes
