
Data Science: the impact of statistics

  • Regular Paper
  • Open access
  • Published: 16 February 2018
  • Volume 6, pages 189–194 (2018)


  • Claus Weihs 1 &
  • Katja Ickstadt 2  


In this paper, we substantiate our premise that statistics is one of the most important disciplines to provide tools and methods to find structure in and to give deeper insight into data, and the most important discipline to analyze and quantify uncertainty. We give an overview of different proposed structures of Data Science and address the impact of statistics on such steps as data acquisition and enrichment, data exploration, data analysis and modeling, validation, and representation and reporting. Finally, we point out fallacies that arise when statistical reasoning is neglected.


1 Introduction and premise

Data Science as a scientific discipline is influenced by informatics, computer science, mathematics, operations research, and statistics as well as the applied sciences.

In 1996, for the first time, the term Data Science was included in the title of a statistical conference (International Federation of Classification Societies (IFCS), “Data Science, classification, and related methods”) [37]. Even though the term was coined by statisticians, the public image of Data Science often stresses the importance of computer science and business applications much more, in particular in the era of Big Data.

As early as the 1970s, the ideas of John Tukey [43] shifted the viewpoint of statistics from a purely mathematical setting, e.g., statistical testing, to deriving hypotheses from data (exploratory setting), i.e., trying to understand the data before hypothesizing.

Another root of Data Science is Knowledge Discovery in Databases (KDD) [36] with its sub-topic Data Mining. KDD already brings together many different approaches to knowledge discovery, including inductive learning, (Bayesian) statistics, query optimization, expert systems, information theory, and fuzzy sets. Thus, KDD is a major building block for fostering interaction between different fields toward the overall goal of identifying knowledge in data.

Nowadays, these ideas are combined in the notion of Data Science, leading to different definitions. One of the most comprehensive definitions of Data Science was recently given by Cao as the formula [12]:

data science = (statistics + informatics + computing + communication + sociology + management) | (data + environment + thinking).

In this formula, sociology stands for the social aspects and | (data + environment + thinking) means that all the mentioned sciences act on the basis of data, the environment and the so-called data-to-knowledge-to-wisdom thinking.

A recent, comprehensive overview of Data Science provided by Donoho in 2015 [16] focuses on the evolution of Data Science from statistics. Indeed, as early as 1997, an even more radical view suggested renaming statistics to Data Science [50]. And in 2015, a number of ASA leaders [17] released a statement on the role of statistics in Data Science, saying that “statistics and machine learning play a central role in data science.”

In our view, statistical methods are crucial in most fundamental steps of Data Science. Hence, the premise of our contribution is:

Statistics is one of the most important disciplines to provide tools and methods to find structure in and to give deeper insight into data, and the most important discipline to analyze and quantify uncertainty.

This paper addresses the major impact of statistics on the most important steps in Data Science.

2 Steps in data science

One of the forerunners of Data Science from a structural perspective is the well-known CRISP-DM (Cross Industry Standard Process for Data Mining), which is organized in six main steps: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment [10]; see Table 1, left column. Ideas like CRISP-DM are now fundamental for applied statistics.

In our view, the main steps in Data Science have been inspired by CRISP-DM and have evolved, leading to, e.g., our definition of Data Science as a sequence of the following steps: Data Acquisition and Enrichment, Data Storage and Access, Data Exploration, Data Analysis and Modeling, Optimization of Algorithms, Model Validation and Selection, Representation and Reporting of Results, and Business Deployment of Results. In Table 1 (right column), the steps Data Storage and Access, Optimization of Algorithms, and Business Deployment of Results are set in small capitals to indicate that statistics is less involved in them.

Usually, these steps are not conducted just once but are iterated in a cyclic loop. In addition, it is common to alternate between two or more steps. This holds especially for the steps Data Acquisition and Enrichment, Data Exploration, and Statistical Data Analysis, as well as for Statistical Data Analysis and Modeling and Model Validation and Selection.

Table 1 compares different definitions of steps in Data Science. The relationship of terms is indicated by horizontal blocks. The missing step Data Acquisition and Enrichment in CRISP-DM indicates that this scheme deals with observational data only. Moreover, our proposal adds to CRISP-DM the steps Data Storage and Access and Optimization of Algorithms, in which statistics is less involved.

The list of steps for Data Science may even be extended; see, e.g., Cao [12], Figure 6, and cf. Table 1, middle column, for the following recent list: Domain-specific Data Applications and Problems, Data Storage and Management, Data Quality Enhancement, Data Modeling and Representation, Deep Analytics, Learning and Discovery, Simulation and Experiment Design, High-performance Processing and Analytics, Networking, Communication, Data-to-Decision and Actions.

In principle, Cao’s proposal and ours cover the same main steps. However, in places Cao’s formulation is more detailed; e.g., our step Data Analysis and Modeling corresponds to Data Modeling and Representation, Deep Analytics, Learning and Discovery. Also, the vocabularies differ slightly, depending on whether the respective background is computer science or statistics. In that respect, note that Experiment Design in Cao’s definition means the design of simulation experiments.

In what follows, we highlight the role of statistics by discussing, in Sects. 2.1–2.6, all the steps in which it is heavily involved. These coincide with all steps of our proposal in Table 1 except those in small capitals. The corresponding entries Data Storage and Access and Optimization of Algorithms are mainly covered by informatics and computer science, whereas Business Deployment of Results is covered by Business Management.

2.1 Data acquisition and enrichment

Design of experiments (DOE) is essential for the systematic generation of data when the effect of noisy factors has to be identified. Controlled experiments are fundamental for robust process engineering, i.e., for producing reliable products despite variation in the process variables. On the one hand, even controllable factors contain a certain amount of uncontrollable variation that affects the response. On the other hand, some factors, like environmental factors, cannot be controlled at all. Nevertheless, at least the effect of such noisy influencing factors should be controlled, e.g., by DOE.

DOE can be utilized, e.g.,

to systematically generate new data (data acquisition) [33],

for systematically reducing databases [41], and

for tuning (i.e., optimizing) parameters of algorithms [1], i.e., for improving the data analysis methods (see Sect. 2.3) themselves.
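To make the idea of a designed experiment concrete, the following Python sketch builds a two-level full factorial design and estimates each main effect as the mean response at the high level minus the mean response at the low level, the standard contrast for such designs (see, e.g., [33]). The simulated process, its coefficients, and the noise level are hypothetical; the noise term merely stands in for the uncontrollable variation discussed above.

```python
import itertools
import random


def full_factorial(k):
    """All 2**k runs of a two-level full factorial design, factors coded -1/+1."""
    return [list(run) for run in itertools.product((-1, 1), repeat=k)]


def main_effects(design, response):
    """Main effect of factor j: mean response at level +1 minus mean response at level -1."""
    effects = []
    for j in range(len(design[0])):
        high = [y for run, y in zip(design, response) if run[j] == 1]
        low = [y for run, y in zip(design, response) if run[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


# Hypothetical process: factor 0 has a strong effect, factor 1 a weak one, factor 2 none.
# The Gaussian term stands in for uncontrollable (noise) variation.
rng = random.Random(1)
design = full_factorial(3)
response = [10 + 3 * a + 1 * b + rng.gauss(0, 0.5) for a, b, c in design]
print([round(e, 2) for e in main_effects(design, response)])
```

With coded levels, a coefficient of 3 corresponds to a main effect of 6 (the change from -1 to +1), so the printed estimates should lie near 6, 2, and 0, with deviations caused by the simulated noise.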

Simulations [7] may also be used to generate new data. A tool for enriching databases by filling data gaps is the imputation of missing data [31].
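As a minimal illustration of imputation, the sketch below fills missing responses with predictions from a least-squares line fitted to the complete cases. This deterministic regression imputation is an assumption chosen for brevity, not a method taken from [31]; because it ignores residual variation, it understates uncertainty, which is why multiple imputation is generally preferred in practice.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def regression_impute(xs, ys):
    """Replace missing responses (None) by fitted values from the complete cases."""
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_line([x for x, _ in complete], [y for _, y in complete])
    return [a + b * x if y is None else y for x, y in zip(xs, ys)]


# Hypothetical data with one missing response
print(regression_impute([1, 2, 3, 4, 5], [2.1, 3.9, None, 8.2, 9.8]))
```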

Such statistical methods for data generation and enrichment need to be part of the backbone of Data Science. The exclusive use of observational data without any noise control distinctly diminishes the quality of data analysis results and may even lead to wrong interpretations. The hope expressed in “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete” [4] appears to be misplaced, precisely because of the noise in the data.

Thus, experimental design is crucial for the reliability, validity, and replicability of our results.

2.2 Data exploration

Exploratory statistics is essential for data preprocessing, i.e., for learning about the contents of a database. Exploration and visualization of observed data were, in a way, initiated by John Tukey [43]. Since then, the most laborious part of data analysis, namely data understanding and transformation, has become an important part of statistical science.

Data exploration or data mining is fundamental for the proper usage of analytical methods in Data Science. The most important contribution of statistics is the notion of distribution. It allows us to represent variability in the data as well as (a priori) knowledge of parameters, the concept underlying Bayesian statistics. Distributions also enable us to choose adequate subsequent analytic models and methods.
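In the spirit of Tukey's exploratory analysis [43], a first look at an empirical distribution can be as simple as a five-number summary plus his 1.5 IQR rule for flagging unusual values. The sketch below uses Python's `statistics.quantiles` with the `inclusive` method; Tukey's original hinges can differ slightly from these quartiles for small samples.

```python
import statistics


def five_number_summary(data):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


def tukey_outliers(data, k=1.5):
    """Values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(five_number_summary(sample))  # (1, 3.25, 5.5, 7.75, 100)
print(tukey_outliers(sample))       # [100]
```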

2.3 Statistical data analysis

Finding structure in data and making predictions are the most important steps in Data Science. Here, in particular, statistical methods are essential since they are able to handle many different analytical tasks. Important examples of statistical data analysis methods are the following.

Hypothesis testing is one of the pillars of statistical analysis. Questions arising in data-driven problems can often be translated into hypotheses. Also, hypotheses are the natural links between underlying theory and statistics. Since statistical hypotheses are related to statistical tests, questions and theory can be tested on the available data. Using the same data in several tests often makes it necessary to correct significance levels. In applied statistics, correct multiple testing is one of the most important problems, e.g., in pharmaceutical studies [15]. Ignoring such techniques would lead to many more significant results than justified.
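The inflation of significant results is easy to quantify: with m independent tests of true null hypotheses at level alpha, the probability of at least one false rejection is 1 - (1 - alpha)^m. The sketch below computes this and implements two standard corrections, Bonferroni and Holm's step-down procedure. They are common textbook examples, not necessarily the procedures used in the pharmaceutical settings of [15].

```python
def familywise_error(m, alpha=0.05):
    """P(at least one false rejection) for m independent, uncorrected tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni(p_values, alpha=0.05):
    """Reject H0_i if p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm's step-down procedure: compare the r-th smallest p-value (r = 0, 1, ...) with alpha / (m - r)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] > alpha / (m - rank):
            break  # all remaining (larger) p-values are retained as well
        reject[i] = True
    return reject


print(round(familywise_error(20), 3))  # 0.642
p = [0.01, 0.02, 0.04, 0.005]
print(bonferroni(p))  # [True, False, False, True]
print(holm(p))        # [True, True, True, True]
```

Both corrections control the familywise error rate; Holm's procedure is uniformly at least as powerful as Bonferroni, as the example shows.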

Classification methods are basic for finding and predicting subpopulations from data. In the so-called unsupervised case, such subpopulations are to be found from a data set without a-priori knowledge of any cases of such subpopulations. This is often called clustering.

In the so-called supervised case, classification rules should be found from a labeled data set for the prediction of unknown labels when only influential factors are available.

Nowadays, there is a plethora of methods for the unsupervised [22] as well as for the supervised case [2].
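The contrast between the two cases can be sketched in a few lines of Python: Lloyd's k-means algorithm, a classical clustering method, finds groups without labels, while a nearest-centroid rule, one of the simplest supervised classifiers, is learned from labeled points. Both are illustrative choices, not recommendations from [2, 22]; the toy data and starting centers are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(points, centers, n_iter=100):
    """Unsupervised case: Lloyd's algorithm started from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(n_iter):
        # assignment step: each point goes to its nearest center
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k])) for p in points]
        # update step: recompute centers (keep the old one if a cluster is empty)
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centers.append(centroid(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def nearest_centroid_fit(points, labels):
    """Supervised case: one centroid per known class label."""
    return {c: centroid([p for p, lab in zip(points, labels) if lab == c]) for c in set(labels)}


def nearest_centroid_predict(model, point):
    """Predict the label whose class centroid is closest."""
    return min(model, key=lambda c: math.dist(point, model[c]))


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans(data, centers=[(0, 0), (10, 10)])[0])  # [0, 0, 0, 1, 1, 1]
model = nearest_centroid_fit(data, ["a", "a", "a", "b", "b", "b"])
print(nearest_centroid_predict(model, (9, 9)))      # b
```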

In the age of Big Data, though, a new look at the classical methods appears to be necessary, since the computational effort of complex analysis methods usually grows faster than linearly with the number of observations n or the number of features p. In the case of Big Data, i.e., if n or p is large, this leads to prohibitive computation times and to numerical problems. This has resulted both in the comeback of simpler optimization algorithms with low time complexity [9] and in the re-examination of traditional methods in statistics and machine learning for Big Data [46].
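The appeal of such low-complexity optimizers can be seen in plain stochastic gradient descent, the central method discussed in [9]: each update uses a single observation and costs O(p), independent of n. The sketch below fits a least-squares linear model this way. The constant step size, the number of epochs, and the noise-free synthetic data are assumptions chosen for illustration; practical implementations add step-size schedules and stopping rules.

```python
import random


def sgd_least_squares(X, y, step=0.05, epochs=50, seed=0):
    """Stochastic gradient descent for the squared error loss of y ~ b + X w."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit observations in a new random order each epoch
        for i in order:
            residual = b + sum(wj * xj for wj, xj in zip(w, X[i])) - y[i]
            # gradient of 0.5 * residual**2 is residual * x_i for w and residual for b
            w = [wj - step * residual * xj for wj, xj in zip(w, X[i])]
            b -= step * residual
    return w, b


# Hypothetical noise-free data generated from y = 0.5 + 2 x1 - x2
gen = random.Random(42)
X = [[gen.uniform(-1, 1), gen.uniform(-1, 1)] for _ in range(200)]
y = [0.5 + 2 * x1 - x2 for x1, x2 in X]
w, b = sgd_least_squares(X, y)
print([round(v, 3) for v in w], round(b, 3))
```

Because the synthetic data contain no noise, the estimates should come out very close to 2, -1, and 0.5; with noisy data, a constant step size leaves the iterates fluctuating around the solution.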

Regression methods are the main tool for finding global and local relationships between features when the target variable is measured. Depending on the distributional assumption for the underlying data, different approaches may be applied. Under the normality assumption, linear regression is the most common method, while generalized linear regression is usually employed for other distributions from the exponential family [18]. More advanced methods comprise functional regression for functional data [38], quantile regression [25], and regression based on objective functions other than the plain squared error loss, e.g., the penalized criterion of Lasso regression [11, 21]. In the context of Big Data, the challenges are similar to those for classification methods, given large numbers of observations n (e.g., in data streams) and/or large numbers of features p. For the reduction of n, data reduction techniques like compressed sensing, random projection methods [20], or sampling-based procedures [28] enable faster computations. For decreasing the number p to the most influential features, variable selection or shrinkage approaches like the Lasso [21] can be employed, keeping the interpretability of the features. (Sparse) principal component analysis [21] may also be used.
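For the Lasso, one common way to compute the estimate is cyclic coordinate descent with soft-thresholding, as described, e.g., in [21]. The sketch below minimizes (1/(2n)) ||y - X beta||^2 + lam ||beta||_1 without an intercept, so it assumes centered data; the small orthogonal design is hypothetical and chosen so that the solution can be checked by hand. It shows how a large enough penalty sets a coefficient exactly to zero, which is what makes the Lasso a variable-selection device.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = [float(v) for v in y]  # current residual y - X beta
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            scale = sum(c * c for c in col) / n
            # correlation of feature j with the partial residual (its own contribution added back)
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / scale
            resid = [r - c * (new - beta[j]) for r, c in zip(resid, col)]
            beta[j] = new
    return beta


# Orthogonal toy design: here the solution equals soft_threshold(X^T y / n, lam) coordinate-wise
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3, 1, -1, -3]
for lam in (0.0, 0.5, 1.5):
    print(lam, lasso_coordinate_descent(X, y, lam))
```

For lam = 0 the least-squares solution [2.0, 1.0] is recovered; lam = 0.5 shrinks it to [1.5, 0.5], and lam = 1.5 removes the second feature entirely, giving [0.5, 0.0].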

Time series analysis aims at understanding and predicting temporal structure [42]. Time series are very common in studies of observational data, and prediction is the most important challenge for such data. Typical application areas are the behavioral sciences and economics as well as the natural sciences and engineering. As an example, let us have a look at signal analysis, e.g., speech or music data analysis. Here, statistical methods comprise the analysis of models in the time and frequency domains. The main aim is the prediction of future values of the time series itself or of its properties. For example, the vibrato of an audio time series might be modeled in order to realistically predict the tone in the future [24], and the fundamental frequency of a musical tone might be predicted by rules learned from elapsed time periods [29].
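The simplest time-domain model for prediction is the first-order autoregression x_t = c + phi x_{t-1} + e_t (see, e.g., [42]). The sketch below estimates c and phi by least squares and iterates the fitted recursion to obtain point forecasts. It is a deliberately minimal stand-in, not the vibrato or pitch models of [24, 29], and it omits forecast intervals; the simulated series is hypothetical.

```python
import random


def fit_ar1(series):
    """Least-squares estimates (c, phi) for x_t = c + phi * x_{t-1} + e_t."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    sxx = sum((a - mean_prev) ** 2 for a in prev)
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    phi = sxy / sxx
    return mean_curr - phi * mean_prev, phi


def forecast_ar1(c, phi, last, steps):
    """Point forecasts obtained by iterating the fitted recursion."""
    forecasts = []
    for _ in range(steps):
        last = c + phi * last
        forecasts.append(last)
    return forecasts


# Hypothetical AR(1) series with c = 1, phi = 0.5 and Gaussian innovations
rng = random.Random(7)
x = [0.0]
for _ in range(500):
    x.append(1 + 0.5 * x[-1] + rng.gauss(0, 0.1))
c_hat, phi_hat = fit_ar1(x)
print(round(c_hat, 2), round(phi_hat, 2))
print(forecast_ar1(c_hat, phi_hat, x[-1], steps=3))
```

For a stationary AR(1) with |phi| < 1, the forecasts converge to the long-run mean c / (1 - phi), here approximately 2.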

In econometrics, multiple time series and their co-integration are often analyzed [27]. In technical applications, process control is a common aim of time series analysis [34].
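Process control is commonly operationalized with a Shewhart-type control chart [34]. The sketch below is a simplified individuals chart: the center line and 3-sigma limits are estimated from an in-control reference phase using the sample mean and standard deviation (textbook charts often estimate sigma from moving ranges instead), and later observations outside the limits are flagged. The data are hypothetical.

```python
import statistics


def control_limits(reference, k=3.0):
    """(LCL, center, UCL) estimated from an in-control reference sample."""
    center = statistics.fmean(reference)
    sigma = statistics.stdev(reference)
    return center - k * sigma, center, center + k * sigma


def out_of_control(values, limits):
    """Indices of observations outside the control limits."""
    lcl, _, ucl = limits
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


limits = control_limits([9, 11, 9, 11])
print(out_of_control([10, 14, 7, 5], limits))  # [1, 3]
```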

2.4 Statistical modeling

Complex interactions between factors can be modeled by graphs or networks. Here, an interaction between two factors is modeled by a connection in the graph or network [26, 35]. The graphs can be undirected, as, e.g., in Gaussian graphical models, or directed, as, e.g., in Bayesian networks. The main goal in network analysis is deriving the network structure. Sometimes, it is necessary to separate (unmix) subpopulation-specific network topologies [49].
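In a Gaussian graphical model, a missing edge between two variables corresponds to a zero partial correlation, which can be read off the inverse covariance (precision) matrix Omega as -Omega_ij / sqrt(Omega_ii Omega_jj). The population-level sketch below uses a hypothetical three-variable chain X1 - X2 - X3: X1 and X3 are marginally correlated but conditionally independent given X2, so the graph has no X1-X3 edge. Estimating Omega from data (e.g., with a sparsity penalty) is a separate problem that this sketch does not address.

```python
import math


def invert(matrix):
    """Gauss-Jordan elimination with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def correlation(cov, i, j):
    """Marginal correlation from a covariance matrix."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])


def partial_correlation(cov, i, j):
    """Partial correlation of variables i and j given all others, via the precision matrix."""
    omega = invert(cov)
    return -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])


# Hypothetical chain X1 - X2 - X3, specified through a tridiagonal precision matrix
precision = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
cov = invert(precision)
print(correlation(cov, 0, 2), partial_correlation(cov, 0, 2))
```

The marginal correlation of X1 and X3 is 1/3, whereas their partial correlation is zero up to rounding error, so a correlation-based graph would draw a spurious X1-X3 edge that the graphical model correctly omits.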

Stochastic differential and difference equations can represent models from the natural and engineering sciences [3, 39]. Finding approximate statistical models that solve such equations can lead to valuable insights, e.g., for the statistical control of such processes in mechanical engineering [48]. Such methods can build a bridge between the applied sciences and Data Science.
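As a small example of a stochastic differential equation model, the sketch below simulates an Ornstein-Uhlenbeck process dX = theta (mu - X) dt + sigma dW with the Euler-Maruyama scheme, the simplest discretization for SDEs. The process and its parameters are hypothetical choices and are not the models of [3] or [48]; the run compares the sample variance of a long path with the known stationary variance sigma^2 / (2 theta).

```python
import math
import random
import statistics


def euler_maruyama_ou(x0, theta, mu, sigma, dt, n_steps, seed=None):
    """Simulate dX = theta * (mu - X) dt + sigma dW on a grid with step dt."""
    rng = random.Random(seed)
    path = [x0]
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment, N(0, dt)
        x = x + theta * (mu - x) * dt + sigma * dw
        path.append(x)
    return path


path = euler_maruyama_ou(x0=0.0, theta=1.0, mu=0.0, sigma=0.5, dt=0.01, n_steps=200_000, seed=3)
tail = path[1000:]  # discard the initial transient
print(statistics.pvariance(tail), 0.5 ** 2 / (2 * 1.0))
```

The two printed numbers should be close; small differences come from sampling variability and from the discretization bias of the scheme, which shrinks as dt decreases.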

Local models and globalization. Typically, statistical models are only valid in sub-regions of the domain of the involved variables. Then, local models can be used [8]. The analysis of structural breaks can be the basis for identifying the regions for local modeling in time series [5]. Also, the analysis of concept drift can be used to investigate model changes over time [30].
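A minimal way to locate a single structural break in the mean of a series is to try every split point and keep the one that minimizes the total residual sum of squares around the two segment means. The sketch below does exactly that. It is a least-squares illustration under the assumption of one mean shift, not one of the tests surveyed in [5], and it does not assess whether the detected break is statistically significant; the series is hypothetical.

```python
def segment_rss(segment):
    """Residual sum of squares around the segment mean."""
    mean = sum(segment) / len(segment)
    return sum((v - mean) ** 2 for v in segment)


def best_single_break(series, min_size=2):
    """Split index k (break between series[k-1] and series[k]) minimizing the two-segment RSS."""
    best_k, best_cost = None, float("inf")
    for k in range(min_size, len(series) - min_size + 1):
        cost = segment_rss(series[:k]) + segment_rss(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


series = [0.1, -0.1, 0.0, 0.2, -0.2, 5.1, 4.9, 5.0, 5.2, 4.8]
print(best_single_break(series)[0])  # 5
```

The two segments found this way are natural candidates for separate local models.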

In time series, there are often hierarchies of more and more global structures. For example, in music, a basic local structure is given by the notes and more and more global ones by bars, motifs, phrases, parts, etc. In order to find global properties of a time series, properties of the local models can be combined into more global characteristics [47].

Mixture models can also be used for the generalization of local to global models [19, 23]. Model combination is essential for the characterization of real relationships, since standard mathematical models are often much too simple to be valid for heterogeneous data or larger regions of interest.

2.5 Model validation and model selection

In cases where more than one model is proposed, e.g., for prediction, statistical tests for comparing models help to rank the models, e.g., with respect to their predictive power [45].

Predictive power is typically assessed by means of so-called resampling methods, in which the distribution of performance characteristics is studied by artificially varying the subsample used to learn the model. Characteristics of such distributions can be used for model selection [7].
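k-fold cross-validation is one of the resampling schemes used in this context (see, e.g., [7]): the data are split into k folds, each fold is predicted by a model trained on the remaining folds, and the k error values give an impression of the distribution of predictive performance. The sketch below compares a constant-mean model with a straight-line fit on hypothetical data; the model interface of two plain functions, `fit` and `predict`, is an assumption made for brevity.

```python
import random


def k_fold_cv(xs, ys, fit, predict, k=5, seed=0):
    """Mean squared prediction error on each of k held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(sum((predict(model, xs[i]) - ys[i]) ** 2 for i in fold) / len(fold))
    return errors


def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def predict_line(model, x):
    a, b = model
    return a + b * x


# Hypothetical data: a linear trend plus noise
rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [1 + 2 * x + rng.gauss(0, 0.3) for x in xs]
for name, fit, predict in [("mean", fit_mean, predict_mean), ("line", fit_line, predict_line)]:
    errs = k_fold_cv(xs, ys, fit, predict)
    print(name, round(sum(errs) / len(errs), 3), round(max(errs) - min(errs), 3))
```

The printed average error supports model selection, and the spread across folds is a first, rough indication of the variability of predictive power.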

Perturbation experiments offer another possibility to evaluate the performance of models. In this way, the stability of the different models against noise is assessed [32, 44].

Meta-analysis as well as model averaging are methods to evaluate combined models [13, 14].

Model selection has become increasingly important in recent years, since the number of classification and regression models proposed in the literature has grown ever faster.

2.6 Representation and reporting

Visualizing discovered structures and storing models in an easy-to-update form are very important tasks in statistical analyses, both to communicate the results and to safeguard the deployment of the data analysis. Deployment is decisive for obtaining interpretable results in Data Science. It is the last step in CRISP-DM [10] and underlies the data-to-decision and action step in Cao [12].

Besides visualization and adequate model storage, the main task for statistics is the reporting of uncertainties and review [6].

3 Fallacies

The statistical methods described in Sect.  2 are fundamental for finding structure in data and for obtaining deeper insight into data, and thus, for a successful data analysis. Ignoring modern statistical thinking or using simplistic data analytics/statistical methods may lead to avoidable fallacies. This holds, in particular, for the analysis of big and/or complex data.

As mentioned at the end of Sect. 2.2, the notion of distribution is the key contribution of statistics. Not taking distributions into account in data exploration and modeling restricts us to reporting values and parameter estimates without their corresponding variability. Only the notion of distributions enables us to make predictions with corresponding error bands.
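Reporting an estimate together with its variability need not require strong distributional assumptions. One simple option is the nonparametric bootstrap: resample the data with replacement, recompute the statistic, and use percentiles of the replicates as an interval. The sketch below implements this percentile interval; the method is chosen here for illustration only, the paper does not prescribe it, and the data are hypothetical.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat=statistics.fmean, n_boot=2000, level=0.95, seed=0):
    """Point estimate plus a nonparametric bootstrap percentile interval."""
    rng = random.Random(seed)
    replicates = sorted(stat(rng.choices(data, k=len(data))) for _ in range(n_boot))
    lower = replicates[round((1 - level) / 2 * n_boot)]
    upper = replicates[round((1 + level) / 2 * n_boot) - 1]
    return stat(data), (lower, upper)


# Hypothetical measurements
gen = random.Random(5)
sample = [gen.gauss(10, 2) for _ in range(100)]
estimate, (lower, upper) = bootstrap_percentile_ci(sample)
print(f"mean {estimate:.2f}, 95% interval [{lower:.2f}, {upper:.2f}]")
```

Passing a different `stat`, e.g. `statistics.median`, yields an interval for that statistic without any change to the resampling logic.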

Moreover, distributions are the key to model-based data analytics. For example, unsupervised learning can be employed to find clusters in data. If additional structure like dependency on space or time is present, it is often important to infer parameters like cluster radii and their spatio-temporal evolution. Such model-based analysis heavily depends on the notion of distributions (see [40] for an application to protein clusters).

If more than one parameter is of interest, it is advisable to compare univariate hypothesis testing approaches to multiple procedures, e.g., in multiple regression, and to choose the most adequate model by variable selection. Restricting oneself to univariate testing would ignore relationships between variables.

Deeper insight into data might require more complex models, like, e.g., mixture models for detecting heterogeneous groups in data. When the mixture is ignored, the result often represents a meaningless average, and learning the subgroups by unmixing the components might be needed. In a Bayesian framework, this is enabled by, e.g., latent allocation variables in a Dirichlet mixture model. For an application decomposing a mixture of different networks in a heterogeneous cell population in molecular biology, see [49].
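The "meaningless average" is easy to reproduce. The sketch below fits a two-component univariate Gaussian mixture with the EM algorithm, a standard frequentist alternative to the Bayesian latent-allocation approach mentioned above. The data, starting values, and iteration count are hypothetical, and the code has no safeguards against degenerate solutions such as a component collapsing onto a single point.

```python
import math
import random


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def em_two_gaussians(data, mu=(0.0, 1.0), sd=(1.0, 1.0), weight=0.5, n_iter=200):
    """EM for a two-component univariate Gaussian mixture; returns means, sds, weight of component 1."""
    (mu1, mu2), (sd1, sd2), w = mu, sd, weight
    n = len(data)
    for _ in range(n_iter):
        # E-step: posterior probability that each observation belongs to component 1
        resp = []
        for x in data:
            a = w * normal_pdf(x, mu1, sd1)
            b = (1 - w) * normal_pdf(x, mu2, sd2)
            resp.append(a / (a + b))
        # M-step: responsibility-weighted maximum-likelihood updates
        n1 = sum(resp)
        n2 = n - n1
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / n2
        sd1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / n1)
        sd2 = math.sqrt(sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / n2)
        w = n1 / n
    return (mu1, mu2), (sd1, sd2), w


# Hypothetical heterogeneous data: 300 draws from N(0, 1) and 100 draws from N(6, 1)
rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(300)] + [rng.gauss(6, 1) for _ in range(100)]
print("overall mean:", round(sum(data) / len(data), 2))
means, sds, w = em_two_gaussians(data, mu=(-1.0, 7.0))
print("component means:", [round(m, 2) for m in means], "weight:", round(w, 2))
```

The overall mean lands near 1.5, a value typical of neither subgroup, while the fitted component means recover the two groups near 0 and 6 with a weight of about 0.75 for the first.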

A mixture model might represent mixtures of components of very unequal sizes, with small components (outliers) being of particular importance. In the context of Big Data, naïve sampling procedures are often employed for model estimation. However, these run the risk of missing small mixture components. Hence, model validation or sampling according to a more suitable distribution, as well as resampling methods for predictive power, are important.
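The risk can be quantified with a short calculation: if a component makes up a fraction pi of the population and m observations are drawn independently and uniformly, the probability that the sample contains none of them is (1 - pi)^m, so at least ln(1 - c) / ln(1 - pi) draws are needed to see the component at least once with probability c. The sketch assumes independent draws (sampling with replacement or from a very large population).

```python
import math


def miss_probability(pi, m):
    """P(no draw from a component with population fraction pi) in m independent uniform draws."""
    return (1.0 - pi) ** m


def min_sample_size(pi, confidence=0.95):
    """Smallest m with P(at least one draw from the component) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - pi))


print(round(miss_probability(0.001, 1000), 3))  # 0.368
print(min_sample_size(0.001))                   # 2995
```

Even a subsample of 1,000 observations misses a component holding 0.1% of the data with probability of about 37%, and merely seeing it once is far from estimating it reliably.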

4 Conclusion

Following the above assessment of the capabilities and impacts of statistics our conclusion is:

The role of statistics in Data Science is underestimated, e.g., compared to computer science. This holds, in particular, for the areas of data acquisition and enrichment as well as for the advanced modeling needed for prediction.

Stimulated by this conclusion, statisticians are well advised to play their role more assertively in this modern and well-accepted field of Data Science.

Only complementing and/or combining mathematical methods and computational algorithms with statistical reasoning, particularly for Big Data, will lead to scientific results based on suitable approaches. Ultimately, only a balanced interplay of all sciences involved will lead to successful solutions in Data Science.

References

[1] Adenso-Diaz, B., Laguna, M.: Fine-tuning of algorithms using fractional experimental designs and local search. Oper. Res. 54(1), 99–114 (2006)

[2] Aggarwal, C.C. (ed.): Data Classification: Algorithms and Applications. CRC Press, Boca Raton (2014)

[3] Allen, E., Allen, L., Arciniega, A., Greenwood, P.: Construction of equivalent stochastic differential equation models. Stoch. Anal. Appl. 26, 274–297 (2008)

[4] Anderson, C.: The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired Magazine. https://www.wired.com/2008/06/pb-theory/ (2008)

[5] Aue, A., Horváth, L.: Structural breaks in time series. J. Time Ser. Anal. 34(1), 1–16 (2013)

[6] Berger, R.E.: A scientific approach to writing for engineers and scientists. IEEE PCS Professional Engineering Communication Series. IEEE Press, Wiley (2014)

[7] Bischl, B., Mersmann, O., Trautmann, H., Weihs, C.: Resampling methods for meta-model validation with recommendations for evolutionary computation. Evol. Comput. 20(2), 249–275 (2012)

[8] Bischl, B., Schiffner, J., Weihs, C.: Benchmarking local classification methods. Comput. Stat. 28(6), 2599–2619 (2013)

[9] Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. arXiv preprint arXiv:1606.04838 (2016)

[10] Brown, M.S.: Data Mining for Dummies. Wiley, London (2014)

[11] Bühlmann, P., Van De Geer, S.: Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Berlin (2011)

[12] Cao, L.: Data science: a comprehensive overview. ACM Comput. Surv. (2017). https://doi.org/10.1145/3076253

[13] Claeskens, G., Hjort, N.L.: Model Selection and Model Averaging. Cambridge University Press, Cambridge (2008)

[14] Cooper, H., Hedges, L.V., Valentine, J.C.: The Handbook of Research Synthesis and Meta-analysis. Russell Sage Foundation, New York City (2009)

[15] Dmitrienko, A., Tamhane, A.C., Bretz, F.: Multiple Testing Problems in Pharmaceutical Statistics. Chapman and Hall/CRC, London (2009)

[16] Donoho, D.: 50 Years of Data Science. http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf (2015)

[17] Dyk, D.V., Fuentes, M., Jordan, M.I., Newton, M., Ray, B.K., Lang, D.T., Wickham, H.: ASA Statement on the Role of Statistics in Data Science. http://magazine.amstat.org/blog/2015/10/01/asa-statement-on-the-role-of-statistics-in-data-science/ (2015)

[18] Fahrmeir, L., Kneib, T., Lang, S., Marx, B.: Regression: Models, Methods and Applications. Springer, Berlin (2013)

[19] Frühwirth-Schnatter, S.: Finite Mixture and Markov Switching Models. Springer, Berlin (2006)

[20] Geppert, L., Ickstadt, K., Munteanu, A., Quedenfeld, J., Sohler, C.: Random projections for Bayesian regression. Stat. Comput. 27(1), 79–101 (2017). https://doi.org/10.1007/s11222-015-9608-z

[21] Hastie, T., Tibshirani, R., Wainwright, M.: Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press, Boca Raton (2015)

[22] Hennig, C., Meila, M., Murtagh, F., Rocci, R.: Handbook of Cluster Analysis. Chapman & Hall, London (2015)

[23] Klein, H.U., Schäfer, M., Porse, B.T., Hasemann, M.S., Ickstadt, K., Dugas, M.: Integrative analysis of histone ChIP-seq and transcription data using Bayesian mixture models. Bioinformatics 30(8), 1154–1162 (2014)

[24] Knoche, S., Ebeling, M.: The musical signal: physically and psychologically, chap. 2. In: Weihs, C., Jannach, D., Vatolkin, I., Rudolph, G. (eds.) Music Data Analysis: Foundations and Applications, pp. 15–68. CRC Press, Boca Raton (2017)

[25] Koenker, R.: Quantile Regression. Econometric Society Monographs, vol. 38 (2010)

[26] Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques. MIT Press, Cambridge (2009)

[27] Lütkepohl, H.: New Introduction to Multiple Time Series Analysis. Springer, Berlin (2010)

[28] Ma, P., Mahoney, M.W., Yu, B.: A statistical perspective on algorithmic leveraging. In: Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21–26 June 2014, pp. 91–99. http://jmlr.org/proceedings/papers/v32/ma14.html (2014)

[29] Martin, R., Nagathil, A.: Digital filters and spectral analysis, chap. 4. In: Weihs, C., Jannach, D., Vatolkin, I., Rudolph, G. (eds.) Music Data Analysis: Foundations and Applications, pp. 111–143. CRC Press, Boca Raton (2017)

[30] Mejri, D., Limam, M., Weihs, C.: A new dynamic weighted majority control chart for data streams. Soft Comput. 22(2), 511–522. https://doi.org/10.1007/s00500-016-2351-3

[31] Molenberghs, G., Fitzmaurice, G., Kenward, M.G., Tsiatis, A., Verbeke, G.: Handbook of Missing Data Methodology. CRC Press, Boca Raton (2014)

[32] Molinelli, E.J., Korkut, A., Wang, W.Q., Miller, M.L., Gauthier, N.P., Jing, X., Kaushik, P., He, Q., Mills, G., Solit, D.B., Pratilas, C.A., Weigt, M., Braunstein, A., Pagnani, A., Zecchina, R., Sander, C.: Perturbation Biology: Inferring Signaling Networks in Cellular Systems. arXiv preprint arXiv:1308.5193 (2013)

[33] Montgomery, D.C.: Design and Analysis of Experiments, 8th edn. Wiley, London (2013)

[34] Oakland, J.: Statistical Process Control. Routledge, London (2007)

[35] Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, Los Altos (1988)

[36] Piatetsky-Shapiro, G., Frawley, W.: Knowledge Discovery in Databases. MIT Press, Cambridge (1991)

[37] Press, G.: A Very Short History of Data Science. https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/#5c515ed055cf (2013). [last visit: March 19, 2017]

[38] Ramsay, J., Silverman, B.W.: Functional Data Analysis. Springer, Berlin (2005)

[39] Särkkä, S.: Applied Stochastic Differential Equations. https://users.aalto.fi/~ssarkka/course_s2012/pdf/sde_course_booklet_2012.pdf (2012). [last visit: March 6, 2017]

[40] Schäfer, M., Radon, Y., Klein, T., Herrmann, S., Schwender, H., Verveer, P.J., Ickstadt, K.: A Bayesian mixture model to quantify parameters of spatial clustering. Comput. Stat. Data Anal. 92, 163–176 (2015). https://doi.org/10.1016/j.csda.2015.07.004

[41] Schiffner, J., Weihs, C.: D-optimal plans for variable selection in data bases. Technical Report 14/09, SFB 475 (2009)

[42] Shumway, R.H., Stoffer, D.S.: Time Series Analysis and Its Applications: With R Examples. Springer, Berlin (2010)

[43] Tukey, J.W.: Exploratory Data Analysis. Pearson, London (1977)

[44] Vatcheva, I., de Jong, H., Mars, N.: Selection of perturbation experiments for model discrimination. In: Horn, W. (ed.) Proceedings of the 14th European Conference on Artificial Intelligence, ECAI-2000, IOS Press, pp. 191–195 (2000)

[45] Vatolkin, I., Weihs, C.: Evaluation, chap. 13. In: Weihs, C., Jannach, D., Vatolkin, I., Rudolph, G. (eds.) Music Data Analysis: Foundations and Applications, pp. 329–363. CRC Press, Boca Raton (2017)

[46] Weihs, C.: Big data classification: aspects on many features. In: Michaelis, S., Piatkowski, N., Stolpe, M. (eds.) Solving Large Scale Learning Tasks: Challenges and Algorithms, Springer Lecture Notes in Artificial Intelligence, vol. 9580, pp. 139–147 (2016)

[47] Weihs, C., Ligges, U.: From local to global analysis of music time series. In: Morik, K., Siebes, A., Boulicaut, J.F. (eds.) Detecting Local Patterns, Springer Lecture Notes in Artificial Intelligence, vol. 3539, pp. 233–245 (2005)

[48] Weihs, C., Messaoud, A., Raabe, N.: Control charts based on models derived from differential equations. Qual. Reliab. Eng. Int. 26(8), 807–816 (2010)

[49] Wieczorek, J., Malik-Sheriff, R.S., Fermin, Y., Grecco, H.E., Zamir, E., Ickstadt, K.: Uncovering distinct protein-network topologies in heterogeneous cell populations. BMC Syst. Biol. 9(1), 24 (2015)

[50] Wu, J.: Statistics = data science? http://www2.isye.gatech.edu/~jeffwu/presentations/datascience.pdf (1997)

Acknowledgements

The authors would like to thank the editor, the guest editors and all reviewers for valuable comments on an earlier version of the manuscript. They also thank Leo Geppert for fruitful discussions.

Author information

Authors and affiliations.

Computational Statistics, TU Dortmund University, 44221, Dortmund, Germany

Claus Weihs

Mathematical Statistics and Biometric Applications, TU Dortmund University, 44221, Dortmund, Germany

Katja Ickstadt


Corresponding author

Correspondence to Claus Weihs.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Weihs, C., Ickstadt, K. Data Science: the impact of statistics. Int J Data Sci Anal 6, 189–194 (2018). https://doi.org/10.1007/s41060-018-0102-5


Received : 20 March 2017

Accepted : 25 January 2018

Published : 16 February 2018

Issue Date : November 2018

DOI : https://doi.org/10.1007/s41060-018-0102-5


Keywords: Structures of data science; Impact of statistics on data science; Fallacies in data science
  • Find a journal
  • Publish with us
  • Track your research

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • View all journals

Biostatistics articles from across Nature Portfolio

Biostatistics is the application of statistical methods in studies in biology, and encompasses the design of experiments, the collection of data from them, and the analysis and interpretation of data. The data come from a wide range of sources, including genomic studies, experiments with cells and organisms, and clinical trials.

Latest Research and Reviews

statistics related research paper

Azacitidine and gemtuzumab ozogamicin as post-transplant maintenance therapy for high-risk hematologic malignancies

  • Satoshi Kaito
  • Yuho Najima
  • Noriko Doki

statistics related research paper

Impact of COVID-19 on antibiotic usage in primary care: a retrospective analysis

  • Anna Romaszko-Wojtowicz
  • K. Tokarczyk-Malesa
  • K. Glińska-Lewczuk

statistics related research paper

A standardized metric to enhance clinical trial design and outcome interpretation in type 1 diabetes

The use of a standardized outcome metric enhances clinical trial interpretation and cross-trial comparison. Here, the authors show the implementation of such a metric using type 1 diabetes trial data, reassess and compare results from these trials, and extend its use to define response to therapy.

  • Alyssa Ylescupidez
  • Henry T. Bahnson
  • Carla J. Greenbaum

A novel approach to visualize clinical benefit of therapies for chronic graft versus host disease (cGvHD): the probability of being in response (PBR) applied to the REACH3 study

  • Norbert Hollaender
  • Ekkehard Glimm
  • Robert Zeiser

Reproducibility in pharmacometrics applied in a phase III trial of BCG-vaccination for COVID-19

  • Rob C. van Wijk
  • Laurynas Mockeliunas
  • Ulrika S. H. Simonsson

Addressing mechanism bias in model-based impact forecasts of new tuberculosis vaccines

The complex transmission chain of tuberculosis (TB) forces mathematical modelers to make mechanistic assumptions when modelling vaccine effects. Here, the authors posit a Bayesian formalism that unlocks mechanism-agnostic impact forecasts for TB vaccines.

News and Comment

Mitigating immortal-time bias: exploring osteonecrosis and survival in pediatric ALL – AALL0232 trial insights.

  • Shyam Srinivasan
  • Swaminathan Keerthivasagam

Response to Pfirrmann et al.’s comment on How should we interpret conclusions of TKI-stopping studies

  • Junren Chen
  • Robert Peter Gale

Cell-free DNA chromosome copy number variations predict outcomes in plasma cell myeloma

  • Wanting Qiang

The role of allogeneic haematopoietic cell transplantation as consolidation after anti-CD19 CAR-T cell therapy in adults with relapsed/refractory acute lymphoblastic leukaemia: a prospective cohort study

  • Lijuan Zhou

Clinical trials: design, endpoints and interpretation of outcomes

  • Megan Othus
  • Mei-Jie Zhang

A SAS macro for estimating direct adjusted survival functions for time-to-event data with or without left truncation

  • Zhen-Huan Hu
  • Hai-Lin Wang

Articles making an impact in Mathematics and Statistics

Browse specially curated selections of high-impact research from the mathematics and statistics journals published by Oxford University Press. The collections feature a mixture of:

  • The most read articles published in the first half of 2022.
  • Untapped research sections containing articles selected by Editors-in-Chief as worthy of more attention from the research community.
  • The most read, most cited, and most discussed articles published in 2020 and 2021.

All articles are freely available for you to read, download, and enjoy. 

The IMA Journal of Applied Mathematics is an interdisciplinary journal that publishes research on mathematics arising in the physical sciences and engineering, as well as suitable articles in the life sciences, social sciences, and finance.

IMA Journal of Management Mathematics publishes mathematical research that can be directly used, or has demonstrable potential to be used, in the management of profit, not-for-profit, and governmental/public organisations.

The IMA Journal of Mathematical Control and Information is dedicated to developing solutions to the unsolved problems in control and information theory.

The IMA Journal of Numerical Analysis publishes original contributions to all fields of numerical analysis.

Information and Inference: A Journal of the IMA publishes high-quality, mathematically oriented articles that further the understanding of the theory, methods of analysis, and algorithms for information and data.

Mathematical Medicine and Biology publishes original articles with significant mathematical content addressing topics in medicine and biology.

Teaching Mathematics and its Applications provides a forum for the exchange of ideas and experiences that contribute to the improvement of mathematics teaching and learning for students from upper secondary/high school level through to university first degree level.

Transactions of Mathematics and its Applications is a generalist applied mathematics journal covering fluid and solid mechanics; probability and stochastic analysis; applied analysis; dynamical, integrable and complex systems; mathematics of information; numerical analysis and more.

Journal of Complex Networks publishes original articles and reviews with a significant contribution to the analysis and understanding of complex networks and their applications in diverse fields.

Publishing new work in philosophy of mathematics, the application of mathematics, and computing, Philosophia Mathematica is the only journal in the world devoted specifically to philosophy of mathematics.

The Quarterly Journal of Mechanics and Applied Mathematics is an online-only journal publishing original research articles on the application of mathematics to the field of mechanics, interpreted in its widest sense.

The Journal of Logic and Computation is an online-only publication aiming to promote the growth of logic and computing in several areas.

Logic Journal of the IGPL publishes papers in all areas of pure and applied logic, including pure logical systems, proof theory, model theory, recursion theory, type theory, nonclassical logics, nonmonotonic logic, numerical and uncertainty reasoning, logic and AI, foundations of logic programming, logic and computation, logic and language, and logic engineering.

The Quarterly Journal of Mathematics publishes original contributions to pure mathematics. All major areas of pure mathematics are represented on the editorial board.

International Mathematics Research Notices publishes articles of high current interest across all areas of mathematics where the research contributes to advancing the state of the science of mathematics.

Biometrika is primarily a journal of statistics in which emphasis is placed on papers containing original theoretical contributions of direct or potential value in applications.

Biostatistics is an online-only journal publishing papers that develop innovative statistical methods with applications to the understanding of human health and disease, including basic biomedical sciences.

Law, Probability & Risk is an online-only, fully refereed journal that publishes papers dealing with topics at the interface of law and probabilistic reasoning.

Inferential Statistics | An Easy Introduction & Examples

Published on September 4, 2020 by Pritha Bhandari . Revised on June 22, 2023.

While descriptive statistics summarize the characteristics of a data set, inferential statistics help you come to conclusions and make predictions based on your data.

When you have collected data from a sample, you can use inferential statistics to understand the larger population from which the sample is taken.

Inferential statistics have two main uses:

  • making estimates about populations (for example, the mean SAT score of all 11th graders in the US).
  • testing hypotheses to draw conclusions about populations (for example, the relationship between SAT scores and family income).

Table of contents

  • Descriptive versus inferential statistics
  • Estimating population parameters from sample statistics
  • Hypothesis testing
  • Other interesting articles
  • Frequently asked questions about inferential statistics

Descriptive versus inferential statistics

Descriptive statistics allow you to describe a data set, while inferential statistics allow you to make inferences based on a data set.

Descriptive statistics

Using descriptive statistics, you can report characteristics of your data:

  • The distribution concerns the frequency of each value.
  • The central tendency concerns the averages of the values.
  • The variability concerns how spread out the values are.
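As a minimal sketch, Python's standard `statistics` and `collections` modules can report all three kinds of descriptive statistics listed above. The data and the helper name `describe` are hypothetical, chosen only for illustration.

```python
from collections import Counter
import statistics


def describe(values):
    """Return the three kinds of descriptive statistics for a data set."""
    return {
        # Distribution: how often each value occurs
        "frequencies": dict(sorted(Counter(values).items())),
        # Central tendency: typical or average values
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Variability: how spread out the values are
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1 denominator)
    }


scores = [3, 5, 5, 6, 7, 7, 7, 9]  # hypothetical data
summary = describe(scores)
for name, value in summary.items():
    print(name, value)
```

Because these numbers describe only the values in `scores`, there is no sampling uncertainty attached to them.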

In descriptive statistics, there is no uncertainty – the statistics precisely describe the data that you collected. If you collect data from an entire population, you can directly compare these descriptive statistics to those from other populations.

Inferential statistics

Most of the time, you can only acquire data from samples, because it is too difficult or expensive to collect data from the whole population that you’re interested in.

While descriptive statistics can only summarize a sample’s characteristics, inferential statistics use your sample to make reasonable guesses about the larger population.

With inferential statistics, it’s important to use random and unbiased sampling methods. If your sample isn’t representative of your population, then you can’t make valid statistical inferences or generalize.

Sampling error in inferential statistics

Since the size of a sample is always smaller than the size of the population, some of the population isn’t captured by sample data. This creates sampling error, which is the difference between the true population values (called parameters) and the measured sample values (called statistics).

Sampling error arises any time you use a sample, even if your sample is random and unbiased. For this reason, there is always some uncertainty in inferential statistics. However, probability sampling methods let you estimate this uncertainty, and larger samples reduce it.
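To illustrate that sampling error appears even with random sampling, the sketch below builds a synthetic population (an assumption: 100,000 normally distributed values with mean 50 and SD 10), draws repeated simple random samples, and averages the absolute gap between each sample mean (the statistic) and the population mean (the parameter). The helper name is hypothetical.

```python
import random
import statistics

random.seed(1)
# Hypothetical population whose true mean (the parameter) we can compute exactly
population = [random.gauss(50, 10) for _ in range(100_000)]
parameter = statistics.fmean(population)


def mean_abs_sampling_error(n, reps=200):
    """Average |sample mean - population mean| over repeated random samples of size n."""
    errors = []
    for _ in range(reps):
        sample = random.sample(population, n)  # simple random sample, no replacement
        errors.append(abs(statistics.fmean(sample) - parameter))
    return statistics.fmean(errors)


errors_by_n = {n: mean_abs_sampling_error(n) for n in (25, 100, 400)}
for n, err in errors_by_n.items():
    print(f"n = {n:>3}: average sampling error = {err:.3f}")
```

The error never reaches zero, but it tends to shrink roughly in proportion to 1/√n as the sample grows.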

Estimating population parameters from sample statistics

The characteristics of samples and populations are described by numbers called statistics and parameters:

  • A statistic is a measure that describes the sample (e.g., sample mean).
  • A parameter is a measure that describes the whole population (e.g., population mean).

Sampling error is the difference between a parameter and a corresponding statistic. Since in most cases you don’t know the real population parameter, you can use inferential statistics to estimate these parameters in a way that takes sampling error into account.

There are two important types of estimates you can make about the population: point estimates and interval estimates.

  • A point estimate is a single value estimate of a parameter. For instance, a sample mean is a point estimate of a population mean.
  • An interval estimate gives you a range of values where the parameter is expected to lie. A confidence interval is the most common type of interval estimate.
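Here is a minimal sketch of both kinds of estimate. It assumes made-up data and a normal-approximation 95% interval (sample mean ± 1.96 standard errors). For a sample this small, a t critical value would give a somewhat wider and more appropriate interval. The function name `mean_with_ci` is hypothetical.

```python
import math
import statistics


def mean_with_ci(sample, z=1.96):
    """Return a point estimate (sample mean) and an approximate 95% interval estimate.

    Uses the normal approximation mean ± z * s / sqrt(n).
    """
    n = len(sample)
    point = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    margin = z * standard_error
    return point, (point - margin, point + margin)


days = [12, 15, 14, 18, 16, 13, 17, 15]  # hypothetical sample values
estimate, (low, high) = mean_with_ci(days)
print(f"point estimate = {estimate:.2f}, 95% CI approx. ({low:.2f}, {high:.2f})")
```

The point estimate is a single number. The interval adds a margin that reflects sampling error.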

Both types of estimates are important for gathering a clear idea of where a parameter is likely to lie.

Confidence intervals

A confidence interval uses the variability around a statistic to come up with an interval estimate for a parameter. Confidence intervals are useful for estimating parameters because they take sampling error into account.

While a point estimate gives you a single value for the parameter you are interested in, a confidence interval tells you how uncertain that point estimate is. The two are best used together.

Each confidence interval is associated with a confidence level. The confidence level is the percentage of intervals, constructed the same way from repeated random samples, that you would expect to contain the true population parameter.

A 95% confidence interval means that if you repeated your study with a new sample in exactly the same way 100 times, you could expect about 95 of the 100 resulting intervals to contain the true population parameter.

This is a statement about the procedure, not about any single interval. A particular interval either contains the parameter or it does not, and you cannot say for sure which is true of your interval, because you can’t know the true value of the population parameter without collecting data from the full population.

However, with random sampling and a suitable sample size, you can reasonably expect the procedure to produce intervals that contain the parameter at the stated rate.
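You can check the repeated-sampling interpretation by simulation. This sketch assumes a hypothetical normal population with a known mean of 100 and an SD of 15. It builds a normal-approximation 95% interval from each of 2,000 samples of size 50, then counts how many intervals contain the true mean. The coverage should come out close to 0.95, usually a little below it, because the 1.96 multiplier ignores the extra uncertainty from estimating the SD.

```python
import math
import random
import statistics

random.seed(42)
TRUE_MEAN, TRUE_SD = 100.0, 15.0  # hypothetical population parameters
n, repetitions = 50, 2000


def ci95(sample):
    """Approximate 95% CI for the mean (normal approximation, z = 1.96)."""
    m = statistics.fmean(sample)
    half_width = 1.96 * statistics.stdev(sample) / math.sqrt(len(sample))
    return m - half_width, m + half_width


covered = 0
for _ in range(repetitions):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    low, high = ci95(sample)
    # Each individual interval either contains the true parameter or it does not
    covered += low <= TRUE_MEAN <= high

coverage = covered / repetitions
print(f"{coverage:.3f} of the intervals contained the true mean")
```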

For example, if the mean number of paid vacation days in your sample is 19, your point estimate of the population mean number of paid vacation days is 19.

Hypothesis testing

Hypothesis testing is a formal process of statistical analysis using inferential statistics. The goal of hypothesis testing is to compare populations or assess relationships between variables using samples.

Hypotheses, or predictions, are tested using statistical tests. Statistical tests also estimate sampling errors so that valid inferences can be made.

Statistical tests can be parametric or non-parametric. When their assumptions are met, parametric tests are considered more statistically powerful, meaning they are more likely to detect an effect if one exists.

Parametric tests make assumptions that include the following:

  • the population that the sample comes from follows a normal distribution of scores
  • the sample size is large enough to represent the population
  • the variances (a measure of variability) of each group being compared are similar

When your data violates any of these assumptions, non-parametric tests are more suitable. Non-parametric tests are called “distribution-free tests” because they make few assumptions about the distribution of the population data (in particular, they don’t assume normality).
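One example of a distribution-free approach is a permutation test. It is not among the tests named in this article. The sketch below uses one to compare two group means without assuming normality: it shuffles the group labels many times to build the null distribution of the mean difference. The data and the function name are hypothetical.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (no normality assumption)."""
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


treatment = [12, 14, 11, 15, 13, 16]  # hypothetical scores
control = [18, 20, 17, 21, 19, 22]
diff, p = permutation_test(treatment, control)
print(f"difference in means = {diff}, p approx. {p:.4f}")
```

A small p-value means that few random relabelings produce a difference at least as large as the observed one.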

Statistical tests come in three forms: tests of comparison, correlation or regression.

Comparison tests

Comparison tests assess whether there are differences in means, medians or rankings of scores of two or more groups.

To decide which test suits your aim, consider whether your data meets the conditions necessary for parametric tests, the number of samples, and the levels of measurement of your variables.

Means can only be found for interval or ratio data, while medians and rankings are more appropriate measures for ordinal data.

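| Comparison test | Parametric? | Comparison of | No. of samples |
|---|---|---|---|
| t test | Yes | Means | 2 samples |
| ANOVA | Yes | Means | 3+ samples |
| Mood’s median | No | Medians | 2+ samples |
| Wilcoxon signed-rank | No | Distributions | 2 paired samples |
| Wilcoxon rank-sum (Mann-Whitney U) | No | Sums of rankings | 2 samples |
| Kruskal-Wallis | No | Mean rankings | 3+ samples |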
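For the t test row, here is a sketch of Welch’s variant, which drops the equal-variances assumption. It computes only the t statistic and the Welch-Satterthwaite degrees of freedom. To get a p-value you need the t distribution, and SciPy provides one, for example through `scipy.stats.ttest_ind(a, b, equal_var=False)`. The data and the function name are hypothetical.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of each group mean (sample variance / n)
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


treatment = [12, 14, 11, 15, 13, 16]  # hypothetical scores
control = [18, 20, 17, 21, 19, 22]
t_stat, df = welch_t(treatment, control)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

These two groups happen to have equal sizes and equal variances. In that case the Welch degrees of freedom reduce to n_a + n_b - 2 = 10, the same as Student’s t test.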

Correlation tests

Correlation tests determine the extent to which two variables are associated.

Although Pearson’s r is the most statistically powerful test, Spearman’s r is appropriate for interval and ratio variables when the data doesn’t follow a normal distribution.

Of the tests listed here, the chi-square test of independence is the only one that can be used with nominal variables.

| Correlation test | Parametric? | Variables |
|---|---|---|
| Pearson’s r | Yes | Interval/ratio variables |
| Spearman’s r | No | Ordinal/interval/ratio variables |
| Chi-square test of independence | No | Nominal/ordinal variables |
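The difference between the first two tests can be sketched in a few lines. Spearman’s coefficient is Pearson’s r computed on the ranks of the data, with tied values given their average rank. The function names and data below are hypothetical. On data that is monotonic but not linear, Spearman’s coefficient reaches 1 while Pearson’s r stays below 1.

```python
import math


def pearson_r(x, y):
    """Pearson's r: strength of the linear association between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic but not linear
r, rho = pearson_r(x, y), spearman_rho(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```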

Regression tests

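Regression tests assess whether changes in predictor variables are associated with changes in an outcome variable. Showing that the predictors actually cause those changes requires a suitable study design, such as a randomized experiment. You can decide which regression test to use based on the number and types of variables you have as predictors and outcomes.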

Most of the commonly used regression tests are parametric. If your data is not normally distributed (for regression, the normality assumption usually concerns the model residuals), you can perform data transformations.

Data transformations apply a mathematical operation, such as taking the square root of each value, to bring your data closer to a normal distribution.
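Here is a minimal sketch of the square-root transformation on hypothetical right-skewed data. It uses the moment (population) skewness, the third central moment divided by the SD cubed, to show the distribution becoming more symmetric. The skewness drops but stays positive, so a transformation can reduce non-normality without necessarily removing it.

```python
import math


def skewness(values):
    """Moment skewness: third central moment divided by the SD cubed (population form)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


right_skewed = [1, 1, 2, 2, 3, 4, 6, 9, 15, 30]  # hypothetical data with a long right tail
transformed = [math.sqrt(v) for v in right_skewed]
print(f"skewness before: {skewness(right_skewed):.2f}")
print(f"skewness after sqrt: {skewness(transformed):.2f}")
```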

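| Regression test | Predictor | Outcome |
|---|---|---|
| Simple linear regression | 1 interval/ratio variable | 1 interval/ratio variable |
| Multiple linear regression | 2+ interval/ratio variable(s) | 1 interval/ratio variable |
| Logistic regression | 1+ any variable(s) | 1 binary variable |
| Nominal regression | 1+ any variable(s) | 1 nominal variable |
| Ordinal regression | 1+ any variable(s) | 1 ordinal variable |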
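For the simplest row, one interval/ratio predictor and one interval/ratio outcome, here is a sketch of an ordinary least squares fit. The data and the function name are hypothetical. Since Python 3.10 the standard library also offers `statistics.linear_regression`. The fitted slope describes an association. As noted above, it does not by itself establish causation.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation of x and y
    sxx = sum((a - mx) ** 2 for a in x)                   # variation of x
    slope = sxy / sxx
    intercept = my - slope * mx  # the fitted line passes through (mean x, mean y)
    return intercept, slope


hours = [1, 2, 3, 4, 5]        # hypothetical predictor
score = [52, 55, 61, 64, 68]   # hypothetical outcome
b0, b1 = simple_linear_regression(hours, score)
print(f"score approx. {b0:.1f} + {b1:.1f} * hours")
```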

Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

Statistics

  • Confidence interval
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

Frequently asked questions about inferential statistics

Descriptive statistics summarize the characteristics of a data set. Inferential statistics allow you to test a hypothesis or assess whether your data is generalizable to the broader population.

A statistic refers to measures about the sample, while a parameter refers to measures about the population.

A sampling error is the difference between a population parameter and a sample statistic.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

Cite this Scribbr article

Bhandari, P. (2023, June 22). Inferential Statistics | An Easy Introduction & Examples. Scribbr. Retrieved August 26, 2024, from https://www.scribbr.com/statistics/inferential-statistics/

Research Method

500+ Statistics Research Topics

Statistics is a branch of mathematics that deals with the collection, analysis, interpretation, presentation, and organization of data. It is a fundamental tool used in various fields such as business, social sciences, engineering, healthcare, and many more. As a research topic, statistics can be a fascinating subject to explore, as it allows researchers to investigate patterns, trends, and relationships within data. With the help of statistical methods, researchers can make informed decisions and draw valid conclusions based on empirical evidence. In this post, we will explore some interesting statistics research topics that can be pursued by researchers to further expand our understanding of this field.

Statistics Research Topics

Statistics research topics include the following:

  • Analysis of the effectiveness of different marketing strategies on consumer behavior.
  • An investigation into the relationship between economic growth and environmental sustainability.
  • A study of the effects of social media on mental health and well-being.
  • A comparative analysis of the educational outcomes of public and private schools.
  • The impact of climate change on agriculture and food security.
  • A survey of the prevalence and causes of workplace stress in different industries.
  • A statistical analysis of crime rates in urban and rural areas.
  • An evaluation of the effectiveness of alternative medicine treatments.
  • A study of the relationship between income inequality and health outcomes.
  • A comparative analysis of the effectiveness of different weight loss programs.
  • An investigation into the factors that affect job satisfaction among employees.
  • A statistical analysis of the relationship between poverty and crime.
  • A study of the factors that influence the success of small businesses.
  • A survey of the prevalence and causes of childhood obesity.
  • An evaluation of the effectiveness of drug addiction treatment programs.
  • A statistical analysis of the relationship between gender and leadership in organizations.
  • A study of the relationship between parental involvement and academic achievement.
  • An investigation into the causes and consequences of income inequality.
  • A comparative analysis of the effectiveness of different types of therapy for mental health conditions.
  • A survey of the prevalence and causes of substance abuse among teenagers.
  • An evaluation of the effectiveness of online education compared to traditional classroom learning.
  • A statistical analysis of the impact of globalization on different industries.
  • A study of the relationship between social media use and political polarization.
  • An investigation into the factors that influence customer loyalty in the retail industry.
  • A comparative analysis of the effectiveness of different types of advertising.
  • A survey of the prevalence and causes of workplace discrimination.
  • An evaluation of the effectiveness of different types of employee training programs.
  • A statistical analysis of the relationship between air pollution and health outcomes.
  • A study of the factors that affect employee turnover rates.
  • An investigation into the causes and consequences of income mobility.
  • A comparative analysis of the effectiveness of different types of leadership styles.
  • A survey of the prevalence and causes of mental health disorders among college students.
  • An evaluation of the effectiveness of different types of cancer treatments.
  • A statistical analysis of the impact of social media influencers on consumer behavior.
  • A study of the factors that influence the adoption of renewable energy sources.
  • An investigation into the relationship between alcohol consumption and health outcomes.
  • A comparative analysis of the effectiveness of different types of conflict resolution strategies.
  • A survey of the prevalence and causes of childhood poverty.
  • An evaluation of the effectiveness of different types of diversity training programs.
  • A statistical analysis of the relationship between immigration and economic growth.
  • A study of the factors that influence customer satisfaction in the service industry.
  • An investigation into the causes and consequences of urbanization.
  • A comparative analysis of the effectiveness of different types of economic policies.
  • A survey of the prevalence and causes of elder abuse.
  • An evaluation of the effectiveness of different types of rehabilitation programs for prisoners.
  • A statistical analysis of the impact of automation on different industries.
  • A study of the factors that influence employee productivity in the workplace.
  • An investigation into the causes and consequences of gentrification.
  • A comparative analysis of the effectiveness of different types of humanitarian aid.
  • A survey of the prevalence and causes of homelessness.
  • Exploring the relationship between socioeconomic status and access to healthcare services.
  • An analysis of the relationship between parental education level and children’s academic performance.
  • Exploring the effects of different statistical models on prediction accuracy in machine learning.
  • The Impact of Social Media on Consumer Behavior: A Statistical Analysis
  • Bayesian hierarchical modeling for network data analysis
  • Spatial statistics and modeling for environmental data
  • Nonparametric methods for time series analysis
  • Bayesian inference for high-dimensional data analysis
  • Multivariate analysis for genetic data
  • Machine learning methods for predicting financial markets
  • Causal inference in observational studies
  • Sampling design and estimation for complex surveys
  • Robust statistical methods for outlier detection
  • Statistical inference for large-scale simulations
  • Survival analysis and its applications in medical research
  • Mixture models for clustering and classification
  • Time-varying coefficient models for longitudinal data
  • Multilevel modeling for complex data structures
  • Graphical modeling and Bayesian networks
  • Experimental design for clinical trials
  • Inference for network data using stochastic block models
  • Nonlinear regression modeling for data with complex structures
  • Statistical learning for social network analysis
  • Time series forecasting using deep learning methods
  • Model selection and variable importance in high-dimensional data
  • Spatial point process modeling for environmental data
  • Bayesian spatial modeling for disease mapping
  • Functional data analysis for longitudinal studies
  • Bayesian network meta-analysis
  • Statistical methods for big data analysis
  • Mixed-effects models for longitudinal data
  • Clustering algorithms for text data
  • Bayesian modeling for spatiotemporal data
  • Multivariate analysis for ecological data
  • Statistical analysis of genomic data
  • Bayesian network inference for gene regulatory networks
  • Principal component analysis for high-dimensional data
  • Time series analysis of financial data
  • Multivariate survival analysis for complex outcomes
  • Nonparametric estimation of causal effects
  • Bayesian network analysis of complex systems
  • Statistical inference for multilevel network data
  • Generalized linear mixed models for non-normal data
  • Bayesian inference for dynamic systems
  • Latent variable modeling for categorical data
  • Statistical inference for social network data
  • Regression models for panel data
  • Bayesian spatiotemporal modeling for climate data
  • Predictive modeling for customer behavior analysis
  • Nonlinear time series analysis for ecological systems
  • Statistical modeling for image analysis
  • Bayesian hierarchical modeling for longitudinal data
  • Network-based clustering for high-dimensional data
  • Bayesian spatial modeling for ecological systems
  • Analysis of the Effect of Climate Change on Crop Yields: A Case Study
  • Examining the Relationship Between Physical Activity and Mental Health in Young Adults
  • A Comparative Study of Crime Rates in Urban and Rural Areas Using Statistical Methods
  • Investigating the Effect of Online Learning on Student Performance in Mathematics
  • A Statistical Analysis of the Relationship Between Economic Growth and Environmental Sustainability
  • Evaluating the Effectiveness of Different Marketing Strategies for E-commerce Businesses
  • Identifying the Key Factors Affecting Customer Loyalty in the Hospitality Industry
  • An Analysis of the Factors Influencing Student Dropout Rates in Higher Education
  • Examining the Impact of Gender on Salary Disparities in the Workplace Using Statistical Methods
  • Investigating the Relationship Between Physical Fitness and Academic Performance in High School Students
  • Analyzing the Effect of Social Support on Mental Health in Elderly Populations
  • A Comparative Study of Different Methods for Forecasting Stock Prices
  • Investigating the Effect of Online Reviews on Consumer Purchasing Decisions
  • Identifying the Key Factors Affecting Employee Turnover Rates in the Technology Industry
  • Analyzing the Effect of Advertising on Brand Awareness and Purchase Intentions
  • A Study of the Relationship Between Health Insurance Coverage and Healthcare Utilization
  • Examining the Effect of Parental Involvement on Student Achievement in Elementary School
  • Investigating the Impact of Social Media on Political Campaigns Using Statistical Methods
  • A Comparative Analysis of Different Methods for Detecting Fraud in Financial Transactions
  • Analyzing the Relationship Between Entrepreneurial Characteristics and Business Success
  • Investigating the Effect of Job Satisfaction on Employee Performance in the Service Industry
  • Identifying the Key Factors Affecting the Adoption of Renewable Energy Technologies
  • A Study of the Relationship Between Personality Traits and Academic Achievement
  • Examining the Impact of Social Media on Body Image and Self-Esteem in Adolescents
  • Investigating the Effect of Mobile Advertising on Consumer Behavior
  • Analyzing the Relationship Between Healthcare Expenditures and Health Outcomes Using Statistical Methods
  • A Comparative Study of Different Methods for Analyzing Customer Satisfaction Data
  • Investigating the Impact of Economic Factors on Voter Behavior Using Statistical Methods
  • Identifying the Key Factors Affecting Student Retention Rates in Community Colleges
  • Analyzing the Relationship Between Workplace Diversity and Organizational Performance
  • Investigating the Effect of Gamification on Learning and Motivation in Education
  • A Study of the Relationship Between Social Support and Depression in Cancer Patients
  • Examining the Impact of Technology on the Travel Industry Using Statistical Methods
  • Investigating the Effect of Customer Service Quality on Customer Loyalty in the Retail Industry
  • Analyzing the Relationship Between Internet Usage and Social Isolation in Older Adults
  • A Comparative Study of Different Methods for Predicting Customer Churn in Telecommunications
  • Investigating the Impact of Social Media on Consumer Attitudes Towards Brands Using Statistical Methods
  • Identifying the Key Factors Affecting Student Success in Online Learning Environments
  • Analyzing the Relationship Between Employee Engagement and Organizational Commitment
  • Investigating the Effect of Customer Reviews on Sales in E-commerce Businesses
  • A Study of the Relationship Between Political Ideology and Attitudes Towards Climate Change
  • Examining the Impact of Technological Innovations on the Manufacturing Industry Using Statistical Methods
  • Investigating the Effect of Social Support on Postpartum Depression in New Mothers
  • Analyzing the Relationship Between Cultural Intelligence and Cross-Cultural Adaptation
  • Investigating the relationship between socioeconomic status and health outcomes using statistical methods.
  • Analyzing trends in crime rates and identifying factors that contribute to them using statistical methods.
  • Examining the effectiveness of different advertising strategies using statistical analysis of consumer behavior.
  • Identifying factors that influence voting behavior and election outcomes using statistical methods.
  • Investigating the relationship between employee satisfaction and productivity in the workplace using statistical methods.
  • Developing new statistical models to better understand the spread of infectious diseases.
  • Analyzing the impact of climate change on global food production using statistical methods.
  • Identifying patterns and trends in social media data using statistical methods.
  • Investigating the relationship between social networks and mental health using statistical methods.
  • Developing new statistical models to predict financial market trends and identify investment opportunities.
  • Analyzing the effectiveness of different educational programs and interventions using statistical methods.
  • Investigating the impact of environmental factors on public health using statistical methods.
  • Developing new statistical models to analyze complex biological systems and identify new drug targets.
  • Analyzing trends in consumer spending and identifying factors that influence buying behavior using statistical methods.
  • Investigating the relationship between diet and health outcomes using statistical methods.
  • Developing new statistical models to analyze gene expression data and identify biomarkers for disease.
  • Analyzing patterns in crime data to predict future crime rates and improve law enforcement strategies.
  • Investigating the effectiveness of different medical treatments using statistical methods.
  • Developing new statistical models to analyze the impact of air pollution on public health.
  • Analyzing trends in global migration and identifying factors that influence migration patterns using statistical methods.
  • Investigating the impact of automation on the job market using statistical methods.
  • Developing new statistical models to analyze climate data and predict future climate trends.
  • Analyzing trends in online shopping behavior and identifying factors that influence consumer decisions using statistical methods.
  • Investigating the impact of social media on political discourse using statistical methods.
  • Developing new statistical models to analyze gene-environment interactions and identify new disease risk factors.
  • Analyzing trends in the stock market and identifying factors that influence investment decisions using statistical methods.
  • Investigating the impact of early childhood education on long-term academic and social outcomes using statistical methods.
  • Developing new statistical models to analyze the relationship between human behavior and the environment.
  • Analyzing trends in the use of renewable energy and identifying factors that influence adoption rates using statistical methods.
  • Investigating the impact of immigration on labor market outcomes using statistical methods.
  • Developing new statistical models to analyze the relationship between social determinants and health outcomes.
  • Analyzing patterns in customer churn to predict future customer behavior and improve business strategies.
  • Investigating the effectiveness of different marketing strategies using statistical methods.
  • Developing new statistical models to analyze the relationship between air pollution and climate change.
  • Analyzing trends in global tourism and identifying factors that influence travel behavior using statistical methods.
  • Investigating the impact of social media on mental health using statistical methods.
  • Developing new statistical models to analyze the impact of transportation on the environment.
  • Analyzing trends in global trade and identifying factors that influence trade patterns using statistical methods.
  • Investigating the impact of social networks on political participation using statistical methods.
  • Developing new statistical models to analyze the relationship between climate change and biodiversity loss.
  • Analyzing trends in the use of alternative medicine and identifying factors that influence adoption rates using statistical methods.
  • Investigating the impact of technological change on the labor market using statistical methods.
  • Developing new statistical models to analyze the impact of climate change on agriculture.
  • Investigating the impact of social media on mental health: A longitudinal study.
  • A comparison of the effectiveness of different types of teaching methods on student learning outcomes.
  • Examining the relationship between sleep duration and productivity among college students.
  • A study of the factors that influence employee job satisfaction in the tech industry.
  • Analyzing the relationship between income level and health outcomes among low-income populations.
  • Investigating the effectiveness of online learning platforms for high school students.
  • A study of the factors that contribute to success in online entrepreneurship.
  • Analyzing the impact of climate change on agricultural productivity in developing countries.
  • A comparison of different statistical models for predicting stock market trends.
  • Examining the impact of sports on mental health: A cross-sectional study.
  • A study of the factors that influence employee retention in the hospitality industry.
  • Analyzing the impact of cultural differences on international business negotiations.
  • Investigating the effectiveness of different weight loss interventions for obese individuals.
  • Examining the impact of technology on job displacement: A longitudinal study.
  • A comparison of the effectiveness of different types of advertising strategies on consumer behavior.
  • Analyzing the impact of environmental regulations on corporate profitability.
  • Investigating the effectiveness of different types of therapy for treating depression.
  • A study of the factors that contribute to success in e-commerce.
  • Examining the relationship between social support and mental health in the elderly population.
  • A comparison of different statistical methods for analyzing complex survey data.
  • Analyzing the impact of employee diversity on organizational performance.
  • Investigating the effectiveness of different types of exercise for improving cardiovascular health.
  • A study of the relationship between emotional intelligence and job performance.
  • Examining the impact of work-life balance on employee well-being.
  • A comparison of the effectiveness of different types of financial education programs for low-income populations.
  • Analyzing the impact of air pollution on respiratory health in urban areas.
  • Investigating the relationship between personality traits and leadership effectiveness.
  • A study of the factors that influence consumer behavior in the luxury goods market.
  • Examining the impact of social networks on political participation: A cross-sectional study.
  • A comparison of different statistical methods for analyzing survival data.
  • Analyzing the impact of government policies on income inequality.
  • Investigating the effectiveness of different types of counseling for substance abuse.
  • A study of the relationship between cultural values and consumer behavior.
  • Examining the impact of technology on privacy: A longitudinal study.
  • A comparison of the effectiveness of different types of online marketing strategies.
  • Analyzing the impact of the gig economy on job satisfaction: A cross-sectional study.
  • Investigating the effectiveness of different types of education interventions for improving financial literacy.
  • A study of the factors that contribute to success in social entrepreneurship.
  • Examining the impact of gender diversity on board performance in publicly traded companies.
  • A comparison of different statistical methods for analyzing panel data.
  • Analyzing the impact of employee involvement in decision-making on organizational performance.
  • Investigating the effectiveness of different types of treatment for anxiety disorders.
  • A study of the relationship between cultural values and entrepreneurial success.
  • Examining the impact of technology on the labor market: A longitudinal study.
  • A comparison of the effectiveness of different types of direct mail campaigns.
  • Analyzing the impact of telecommuting on employee productivity: A cross-sectional study.
  • Investigating the effectiveness of different types of retirement planning interventions for low-income individuals.
  • Analyzing the effectiveness of different educational interventions in improving student performance
  • Investigating the impact of climate change on food production and food security
  • Identifying factors that influence employee satisfaction and productivity in the workplace
  • Examining the prevalence and causes of mental health disorders in different populations
  • Evaluating the effectiveness of different marketing strategies in influencing consumer behavior
  • Analyzing the prevalence and consequences of substance abuse in different communities
  • Investigating the relationship between social media use and mental health outcomes
  • Examining the role of genetics in the development of different diseases
  • Identifying factors that contribute to the gender wage gap in different industries
  • Analyzing the effectiveness of different policing strategies in reducing crime rates
  • Investigating the impact of immigration on economic growth and development
  • Examining the prevalence and causes of domestic violence in different populations
  • Evaluating the effectiveness of different interventions for treating addiction
  • Analyzing the prevalence and impact of childhood obesity on health outcomes
  • Investigating the relationship between diet and chronic diseases such as diabetes and heart disease
  • Examining the effects of different types of exercise on physical and mental health outcomes
  • Identifying factors that influence voter behavior and political participation
  • Analyzing the prevalence and impact of sleep disorders on health outcomes
  • Investigating the effectiveness of different educational interventions in improving health outcomes
  • Examining the impact of environmental pollution on public health outcomes
  • Evaluating the effectiveness of different interventions for reducing opioid addiction and overdose rates
  • Analyzing the prevalence and causes of homelessness in different communities
  • Investigating the relationship between race and health outcomes
  • Examining the impact of social support networks on health outcomes
  • Identifying factors that contribute to income inequality in different regions
  • Analyzing the prevalence and impact of workplace stress on employee health outcomes
  • Investigating the relationship between education and income levels in different communities
  • Examining the effects of different types of technology on mental health outcomes
  • Evaluating the effectiveness of different interventions for reducing healthcare costs
  • Analyzing the prevalence and impact of chronic pain on health outcomes
  • Investigating the relationship between urbanization and public health outcomes
  • Examining the effects of different types of drugs on health outcomes
  • Identifying factors that contribute to educational attainment in different populations
  • Analyzing the prevalence and causes of food insecurity in different communities
  • Investigating the relationship between race and crime rates
  • Examining the impact of social media on political participation and engagement
  • Evaluating the effectiveness of different interventions for reducing poverty levels
  • Analyzing the prevalence and impact of stress on mental health outcomes
  • Investigating the relationship between religion and health outcomes
  • Examining the effects of different types of parenting styles on child development outcomes
  • Identifying factors that contribute to political polarization in different regions
  • Analyzing the prevalence and causes of teenage pregnancy in different communities
  • Investigating the impact of globalization on economic growth and development
  • Examining the prevalence and impact of social isolation on mental health outcomes
  • Evaluating the effectiveness of different interventions for reducing gun violence
  • Analyzing the prevalence and impact of bullying on mental health outcomes
  • Investigating the relationship between immigration and crime rates
  • Examining the effects of different types of diets on health outcomes
  • Identifying factors that contribute to social inequality in different regions
  • Bayesian inference for high-dimensional models
  • Analysis of longitudinal data with missing values
  • Nonparametric regression with functional predictors
  • Estimation and inference for copula models
  • Statistical methods for neuroimaging data analysis
  • Robust methods for high-dimensional data analysis
  • Analysis of spatially correlated data
  • Bayesian nonparametric modeling
  • Statistical methods for network data
  • Optimal experimental design for nonlinear models
  • Multivariate time series analysis
  • Inference for partially identified models
  • Statistical learning for personalized medicine
  • Statistical inference for rare events
  • High-dimensional mediation analysis
  • Analysis of multi-omics data
  • Nonparametric regression with mixed types of predictors
  • Estimation and inference for graphical models
  • Statistical inference for infectious disease dynamics
  • Robust methods for high-dimensional covariance matrix estimation
  • Analysis of spatio-temporal data
  • Bayesian modeling for ecological data
  • Multivariate spatial point pattern analysis
  • Statistical methods for functional magnetic resonance imaging (fMRI) data
  • Nonparametric estimation of conditional distributions
  • Statistical methods for spatial econometrics
  • Inference for stochastic processes
  • Bayesian spatiotemporal modeling
  • High-dimensional causal inference
  • Analysis of data from complex survey designs
  • Bayesian nonparametric survival analysis
  • Statistical methods for fMRI connectivity analysis
  • Spatial quantile regression
  • Statistical modeling for climate data
  • Estimation and inference for item response models
  • Bayesian model selection and averaging
  • High-dimensional principal component analysis
  • Analysis of data from clinical trials with noncompliance
  • Nonparametric regression with censored data
  • Statistical methods for functional data analysis
  • Inference for network models
  • Bayesian nonparametric clustering
  • High-dimensional classification
  • Analysis of ecological network data
  • Statistical modeling for time-to-event data with multiple events
  • Nonparametric density estimation and inference
  • Bayesian nonparametric regression with time-varying coefficients
  • Statistical methods for functional magnetic resonance spectroscopy (fMRS) data

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer

Data Science: Recently Published Documents

Assessing the effects of fuel energy consumption, foreign direct investment and GDP on CO2 emission: New data science evidence from Europe & Central Asia

Documentation Matters: Human-Centered AI System to Assist Data Science Code Documentation in Computational Notebooks

Computational notebooks allow data scientists to express their ideas through a combination of code and documentation. However, data scientists often pay attention only to the code, and neglect creating or updating their documentation during quick iterations. Inspired by human documentation practices learned from 80 highly-voted Kaggle notebooks, we design and implement Themisto, an automated documentation generation system to explore how human-centered AI systems can support human data scientists in the machine learning code documentation scenario. Themisto facilitates the creation of documentation via three approaches: a deep-learning-based approach to generate documentation for source code, a query-based approach to retrieve online API documentation for source code, and a user prompt approach to nudge users to write documentation. We evaluated Themisto in a within-subjects experiment with 24 data science practitioners, and found that automated documentation generation techniques reduced the time for writing documentation, reminded participants to document code they would have ignored, and improved participants’ satisfaction with their computational notebook.

Data science in the business environment: Insight management for an Executive MBA

Adventures in financial data science

GeCoAgent: a conversational agent for empowering genomic data extraction and analysis

With the availability of reliable and low-cost DNA sequencing, human genomics is relevant to a growing number of end-users, including biologists and clinicians. Typical interactions require applying comparative data analysis to huge repositories of genomic information for building new knowledge, taking advantage of the latest findings in applied genomics for healthcare. Powerful technology for data extraction and analysis is available, but broad use of the technology is hampered by the complexity of accessing such methods and tools. This work presents GeCoAgent, a big-data service for clinicians and biologists. GeCoAgent uses a dialogic interface, animated by a chatbot, for supporting the end-users’ interaction with computational tools accompanied by multi-modal support. While the dialogue progresses, the user is accompanied in extracting the relevant data from repositories and then performing data analysis, which often requires the use of statistical methods or machine learning. Results are returned using simple representations (spreadsheets and graphics), while at the end of a session the dialogue is summarized in textual format. The innovation presented in this article is concerned with not only the delivery of a new tool but also our novel approach to conversational technologies, potentially extensible to other healthcare domains or to general data science.

Differentially Private Medical Texts Generation Using Generative Neural Networks

Technological advancements in data science have offered us affordable storage and efficient algorithms to query a large volume of data. Our health records are a significant part of this data, which is pivotal for healthcare providers and can be used to support our well-being. The clinical note in electronic health records is one such category: it collects a patient’s complete medical information during different timesteps of patient care in the form of free text. Thus, these unstructured textual notes contain events from a patient’s admission to discharge, which can prove to be significant for future medical decisions. However, since these texts also contain sensitive information about the patient and the attending medical professionals, such notes cannot be shared publicly. This privacy issue has thwarted timely discoveries from this plethora of untapped information. Therefore, in this work, we intend to generate synthetic medical texts from a private or sanitized (de-identified) clinical text corpus and analyze their utility rigorously across different metrics and levels. Experimental results support the applicability of our generated data, as it achieves more than 80% accuracy in different pragmatic classification problems and matches (or outperforms) the original text data.

Impact on Stock Market across Covid-19 Outbreak

Abstract: This paper analyzes the impact of the pandemic on global stock exchanges. Stock listing values are determined by a variety of factors, including seasonal changes, catastrophic calamities, pandemics, fiscal year changes and many more. In particular, this paper analyzes the variation in listing prices during the worldwide outbreak of the novel coronavirus. The main reason for focusing on this outbreak was to provide insight into the underlying regulation of stock exchanges. Daily closing prices of stock indices from January 2017 to January 2022 were used for the analysis. The main aim of the research is to analyze whether a global economic downturn affects financial stock exchanges. Keywords: Stock Exchange, Matplotlib, Streamlit, Data Science, Web scraping.
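The abstract does not state its exact method or which date it treats as the start of the outbreak. As a minimal sketch of one common way to quantify "variation" around an event, assuming a list of (date, closing price) pairs and a hypothetical cutoff date, the annualized volatility of daily log returns can be compared before and after the cutoff. The 252-periods-per-year annualization factor is a market convention, not something taken from the paper, and the price series below is synthetic.

```python
import math
import statistics
from datetime import date, timedelta


def log_returns(closes):
    """Daily log returns ln(P_t / P_{t-1}), each dated by the later of the two closes."""
    return [(d1, math.log(p1 / p0)) for (_, p0), (d1, p1) in zip(closes, closes[1:])]


def volatility_before_after(closes, cutoff, periods_per_year=252):
    """Annualized volatility (sample std of log returns * sqrt(periods)) before/after cutoff."""
    returns = log_returns(closes)
    before = [r for d, r in returns if d < cutoff]
    after = [r for d, r in returns if d >= cutoff]
    scale = math.sqrt(periods_per_year)
    return statistics.stdev(before) * scale, statistics.stdev(after) * scale


# Usage with a synthetic index (hypothetical): log moves of +/-0.01, then +/-0.05 after the cutoff.
start = date(2020, 1, 1)
prices = [100.0]
for i in range(1, 41):
    step = 0.01 if i <= 20 else 0.05
    prices.append(prices[-1] * math.exp(step if i % 2 else -step))
closes = [(start + timedelta(days=i), p) for i, p in enumerate(prices)]
cutoff = start + timedelta(days=21)  # hypothetical "outbreak" date
vol_before, vol_after = volatility_before_after(closes, cutoff)
print(round(vol_after / vol_before, 2))  # 5.0
```

Because the synthetic series uses consecutive calendar days, the absolute volatility figures are only illustrative; the before/after ratio is what the comparison relies on.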

Information Resilience: the nexus of responsible and agile approaches to information use

The appetite for effective use of information assets has been steadily rising in both public and private sector organisations. However, whether the information is used for social good or commercial gain, there is a growing recognition of the complex socio-technical challenges associated with balancing the diverse demands of regulatory compliance and data privacy, social expectations and ethical use, business process agility and value creation, and scarcity of data science talent. In this vision paper, we present a series of case studies that highlight these interconnected challenges, across a range of application areas. We use the insights from the case studies to introduce Information Resilience, as a scaffold within which the competing requirements of responsible and agile approaches to information use can be positioned. The aim of this paper is to develop and present a manifesto for Information Resilience that can serve as a reference for future research and development in relevant areas of responsible data management.

qEEG Analysis in the Diagnosis of Alzheimer's Disease: a Comparison of Functional Connectivity and Spectral Analysis

Alzheimer's disease (AD) is a brain disorder mainly characterized by a progressive degeneration of neurons in the brain, causing a decline in cognitive abilities and difficulties in engaging in day-to-day activities. This study compares an FFT-based spectral analysis against a functional connectivity analysis based on phase synchronization for finding known differences between AD patients and healthy control (HC) subjects. Both quantitative analysis methods were applied to a dataset comprising bipolar EEG montage values from 20 diagnosed AD patients and 20 age-matched HC subjects. Additionally, an attempt was made to localize the identified AD-induced effects on brain activity in AD patients. The results showed the advantage of the functional connectivity analysis over a simple spectral analysis. Specifically, while the spectral analysis could not find any significant differences between the AD and HC groups, the functional connectivity analysis showed statistically higher synchronization levels in the AD group in the lower frequency bands (delta and theta), suggesting that the brains of AD patients are in a phase-locked state. Further comparison of functional connectivity between homotopic regions confirmed that the traits of AD were localized in the centro-parietal and centro-temporal areas in the theta frequency band (4-8 Hz). The contribution of this study is that it applies a neural metric for Alzheimer's detection from a data science perspective rather than a neuroscience one. The study shows that combining bipolar derivations with phase synchronization yields results similar to those of comparable studies employing alternative analysis methods.
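The contrast between the two approaches can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline: the band limits, the ideal frequency-domain band-pass filter, the choice of the phase locking value (PLV) as the phase-synchronization measure and the synthetic signals are all assumptions made for illustration. `band_power` sums one-sided spectral power within a band (the spectral approach), while `phase_locking_value` measures how constant the phase difference between two channels stays within a band (the connectivity approach). The naive DFT keeps the example self-contained; a real analysis would use an FFT library such as `numpy.fft`.

```python
import cmath
import math

# Assumed EEG band limits in Hz; exact cut-offs vary between studies.
BANDS = {"delta": (1.0, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stands in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def band_bins(n, fs, band):
    """Positive-frequency bins (excluding DC and Nyquist) whose frequency lies in [lo, hi)."""
    lo, hi = band
    return [k for k in range(1, (n + 1) // 2) if lo <= k * fs / n < hi]


def band_power(x, fs, band):
    """Spectral approach: one-sided power of x within a frequency band."""
    n = len(x)
    spectrum = dft(x)
    return sum(2 * abs(spectrum[k]) ** 2 / n ** 2 for k in band_bins(n, fs, band))


def band_phase(x, fs, band):
    """Instantaneous phase of x within a band.

    Keeping only the in-band positive-frequency bins (doubled) combines an ideal
    band-pass filter with the analytic-signal (Hilbert transform) construction.
    """
    n = len(x)
    spectrum = dft(x)
    kept = {k: 2 * spectrum[k] for k in band_bins(n, fs, band)}
    analytic = [
        sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in kept.items()) / n
        for t in range(n)
    ]
    return [cmath.phase(z) for z in analytic]


def phase_locking_value(x, y, fs, band):
    """Connectivity approach: PLV = |mean(exp(i*(phi_x - phi_y)))|, between 0 and 1."""
    phi_x = band_phase(x, fs, band)
    phi_y = band_phase(y, fs, band)
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phi_x, phi_y))
    return abs(total) / len(phi_x)


# Usage with synthetic signals (hypothetical): 2 s sampled at 128 Hz.
fs, n = 128, 256
times = [i / fs for i in range(n)]
ch1 = [math.sin(2 * math.pi * 6 * s) for s in times]        # 6 Hz theta rhythm
ch2 = [math.sin(2 * math.pi * 6 * s + 1.0) for s in times]  # same rhythm, constant lag
ch3 = [math.sin(2 * math.pi * 7 * s) for s in times]        # different theta frequency
print(round(phase_locking_value(ch1, ch2, fs, BANDS["theta"]), 3))  # 1.0: phase-locked
print(round(phase_locking_value(ch1, ch3, fs, BANDS["theta"]), 3))  # 0.0: no locking
```

Note that `ch1` and `ch3` have identical theta-band power, so a spectral comparison alone would not separate them, while the PLV does; this mirrors, in a toy setting, why a connectivity measure can reveal differences that band power misses.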

Big Data Analytics for Long-Term Meteorological Observations at Hanford Site

The growing number of physical objects with embedded sensors, which typically produce high-volume and frequently updated data sets, has accentuated the need to develop methodologies that extract useful information from big data to support decision making. This study applies a suite of data analytics and core principles of data science to characterize near real-time meteorological data with a focus on extreme weather events. To highlight the applicability of this work and make it more accessible from a risk management perspective, the foundation for a software platform with an intuitive Graphical User Interface (GUI) was developed to access and analyze data from a decommissioned nuclear production complex operated by the U.S. Department of Energy (DOE, Richland, USA). Exploratory data analysis (EDA), involving classical non-parametric statistics, and machine learning (ML) techniques were used to develop statistical summaries and learn characteristic features of key weather patterns and signatures. The new approach and GUI provide key insights into using big data and ML to support site operations related to safety management strategies for extreme weather events. Specifically, this work offers a practical guide to analyzing long-term meteorological data and highlights the integration of ML and classical statistics in applied risk and decision science.
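The abstract does not specify variables or thresholds, so the following is only a sketch of one distribution-free way to summarize a meteorological series and flag extremes: the median and interquartile range as robust summaries, and the empirical 99th percentile as an assumed "extreme" cut point. The wind-speed series, the injected spike, and the function names are hypothetical; nothing here reproduces the Hanford analysis.

```python
import statistics


def nonparametric_summary(values):
    """Distribution-free summary: median, interquartile range, 99th percentile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    p99 = statistics.quantiles(values, n=100)[98]  # the 99th of 99 cut points
    return {"median": statistics.median(values), "iqr": q3 - q1, "p99": p99}


def flag_extremes(timestamps, values, threshold):
    """Return (timestamp, value) pairs whose value exceeds the threshold."""
    return [(ts, v) for ts, v in zip(timestamps, values) if v > threshold]


if __name__ == "__main__":
    # Hypothetical hourly wind speeds in m/s (not data from the Hanford study).
    hours = list(range(100))
    wind = [5.0 + (h % 7) * 0.5 for h in hours]
    wind[42] = 25.0  # an injected storm-like spike
    summary = nonparametric_summary(wind)
    print(summary)
    print(flag_extremes(hours, wind, summary["p99"]))
```

With only 100 observations, the default "exclusive" method of `statistics.quantiles` places the 99th-percentile cut point close to the maximum, so long records such as multi-decade observations are what make tail estimates like this stable.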


StatAnalytica

Top 99+ Trending Statistics Research Topics for Students


As a statistics student, you may find it quite challenging to pick the best statistics research topics. But not anymore: find the best statistics research topics here!

Statistics is one of the tougher subjects because it involves lots of formulas, equations, and more, so students need to spend time understanding these concepts. And when it comes to finding the best topic for a statistics research project, students are always looking for someone to help them.

In this blog, we will share with you the most interesting and trending statistics research topics in 2023. These topics will not only help you stand out in your class but also help you explore more about the world.

If you face any problem with statistics, don't worry. You can get the best statistics assignment help from one of our experts.

As you know, it is always suggested that you work on interesting topics. That is why we have included the most interesting research topics for college and high school students. In this blog post, we will share with you a list of 99+ awesome statistics research topics.

Why Do We Need to Have Good Statistics Research Topics?


Having a good research topic will not only help you score good grades but also allow you to finish your project quickly, because whenever we work on something interesting, our productivity naturally increases. Thus, you can achieve the best results with minimal time and effort.

What Are Some Interesting Research Topics?

Interesting research topics in statistics vary from student to student, but here are some key topics that almost every student finds interesting:

  • Literacy rate in a city.
  • Abortion and pregnancy rates in the USA.
  • Eating disorders among citizens.
  • The role of parents in students' self-esteem and confidence.
  • Uses of AI, from daily life to business corporations.

Top 99+ Trending Statistics Research Topics For 2023

Here in this section, we will tell you more than 99 trending statistics research topics:

Sports Statistics Research Topics

  • Statistical analysis of leg and head injuries in football.
  • Statistical analysis of shoulder and knee injuries in MotoGP.
  • In-depth statistical evaluation of doping tests in sports over the past decade.
  • Statistical observation of the performance of athletes in the last Olympics.
  • The role and effect of sports in students' lives.

Psychology Research Topics for Statistics

  • In-depth statistical analysis of the effect of obesity on the mental health of high school and college students.
  • Statistical evaluation of the reasons for suicide among students and adults.
  • Statistical analysis of the effect of divorce on children in a country.
  • Psychological effects of the gender gap on women in specific areas of a country.
  • Statistical analysis of the causes of online bullying in students' lives.
  • PTSD and descriptive tendencies in psychology.
  • The role of researchers in statistical testing and probability.
  • Acceptable significance and probability thresholds in clinical psychology.
  • The use of hypotheses and the role of the p < 0.05 threshold for improved comprehension.
  • What types of statistical data are typically rejected in psychology?
  • The application of basic statistical principles and reasoning in psychological analysis.
  • The role of correlation when several psychological concepts are involved.
  • Using real case studies and modeling to generate statistical reports.
  • Naturalistic observation as a research sample in psychology.
  • How should descriptive statistics be used to represent behavioral data sets?

Applied Statistics Research Topics

  • Does education have a significant impact on an individual's financial success?
  • Is investment in digital technology producing a meaningful return for corporations?
  • The wealth gap between rich and poor in the USA.
  • A statistical approach to identifying the effects of high-frequency trading in financial markets.
  • Statistical analysis of the impact of multi-agent models in financial markets.

Personalized Medicine Statistics Research Topics

  • Statistical analysis of the effect of methamphetamine on substance abusers.
  • In-depth research on the impact of COVID-19 vaccines on the Omicron variant.
  • Comparing orthodox and alternative therapies to find the best cancer treatment approach.
  • Statistical analysis of the role of genes in a child's overall immunity.
  • What factors help patients survive coronavirus infection?

Experimental Design Statistics Research Topics

  • Public vs. private education: which is better for students and offers a better financial return?
  • Psychology vs. physiology: which keeps a person from quitting an addiction?
  • The effect of breast milk vs. packaged milk on an infant's overall development.
  • Which cause more accidents: male or female alcoholics?
  • Why do most students not reveal cyberbullying to their parents?

Easy Statistics Research Topics

  • Application of statistics in the world of data science
  • Statistics for finance: how statistics helps companies grow their finances
  • Advantages and disadvantages of radar charts
  • Marriages of minors in Southeast Asian and African countries.
  • Discussion of ANOVA and correlation.
  • What statistical methods are most effective for active sports?
  • Using a ranking statistical approach to measure the accuracy of college tests.
  • The important role of statistics in data mining operations.
  • The practical application of heat estimation in engineering fields.
  • The use of statistical analysis in speech recognition.
  • Estimating probiotics: how much time is necessary for an accurate statistical sample?
  • How will the United States population grow in the next twenty years?
  • How legislation and statistical reports deal with contentious issues.
  • The application of empirical entropy approaches to online grammar checking.
  • Transparency in statistical methodology and the reporting system of the United States Census Bureau.

Statistical Research Topics for High School

  • Uses of statistics in chemometrics
  • Statistics in business analytics and business intelligence
  • Importance of statistics in physics
  • An in-depth discussion of multivariate statistics
  • Uses of statistics in machine learning

Survey Topics for Statistics

  • Gather data on the most qualified professionals in a specific area.
  • Survey the time students waste watching TV or Netflix.
  • Survey fully vaccinated people in the USA.
  • Gather information on the effect of a government survey on citizens' lives.
  • Survey to identify English speakers around the world.

Statistics Research Paper Topics for Graduates

  • An in-depth discussion of Bayes' theorem
  • Discuss Bayesian hierarchical models
  • Analysis of the process of Japanese restaurants.
  • In-depth analysis of Lévy's continuity theorem
  • Analysis of the principle of maximum entropy

AP Statistics Topics

  • Discuss the importance of econometrics
  • Analyze the pros and cons of the probit model
  • Types of probability models and their uses
  • In-depth discussion of orthostochastic matrices
  • Find ways to obtain an adjacency matrix quickly

Good Statistics Research Topics 

  • National income and the regulation of cryptocurrency.
  • The benefits and drawbacks of regression analysis.
  • How can estimation methods be used to correct statistical differences?
  • Mathematical prediction models vs. observation tactics.
  • Bias in quantitative data analysis in sociology research.
  • Inferential analytical approaches vs. descriptive statistics.
  • How reliable are AI-based methods in statistical analysis?
  • Internet news reporting and its fluctuations: statistical reports.
  • The importance of estimation in modeled statistics and artificial sampling.

Business Statistics Topics

  • Role of statistics in business in 2023
  • Importance of business statistics and analytics
  • What is the role of central tendency and dispersion in statistics?
  • The best process for sampling business data.
  • Importance of statistics in big data.
  • The characteristics of business data sampling: benefits and drawbacks of software solutions.
  • How can two different business tasks be tackled concurrently using linear regression analysis?
  • The importance of index numbers, random probability, and accuracy in economic data relations.
  • The advantages of a dataset approach in statistical programming.
  • Commercial statistics: how should data be prepared for maximum accuracy?

Statistical Research Topics for College Students

  • Evaluate John Tukey's contributions to statistics.
  • The role of statistics in improving ADHD treatment.
  • The uses and timeline of probability in statistics.
  • In-depth analysis of Gertrude Cox's experimental design in statistics.
  • Discuss Florence Nightingale's role in statistics.
  • What sorts of music do college students prefer?
  • The main effect of different subjects on student performance.
  • The importance of analytics in statistics research.
  • The influence of a better student in class.
  • Do extracurricular activities help transform personalities?
  • Backbenchers' impact on class performance.
  • Medication's importance in class performance.
  • Are e-books better than traditional books?
  • Choosing aspects of a subject in college

How To Write Good Statistics Research Topics?

So, the main question is how you can come up with good statistics research topics. The trick is to understand the methodology used to collect and interpret statistical data. So, when you are trying to pick a topic for your statistics project, think it through before going any further.

Doing so will teach you about the types of data to be researched, so that the sample is chosen correctly. The basic outline of your statistics project is as follows:

  • Introduction of the problem.
  • Explanation and choice of methodology.
  • The statistical research itself (body).
  • Sample deviations and variables.
  • Statistical interpretation (conclusion).

Note: Always include the sources from which you obtained the statistical data.

Top 3 Tips to Choose Good Statistics Research Topics

Some students can easily pick a good statistics research topic without the help of an essay writer, but that is not the case for every student. That is why we will share some of the best tips to help you choose good statistics research topics for your next project. Whether you are in a hurry or have plenty of time to explore, these tips will help you.

1. Narrow down your research topic

We all start with many topics as we are not sure about our specific interests or niche. The initial step to picking up a good research topic for college or school students is to narrow down the research topic.

To do this, first categorize the subject matter, and then pick a specific category that interests you. After that, brainstorm the topic's content and how you can make your points catchy, focused, directional, clear, and specific.

2. Choose a topic that gives you curiosity

After categorizing the statistics research topics, it is time to pick one from a category. Don't pick the most common topic, because it will not help your grades or your knowledge. Instead, choose one you know little about or are eager to explore.

In a statistics research paper, you can always explore something beyond your studies. This will make you more energetic about the project, and you will be glad to gain information you wanted but never had the chance to get.

Your professor will also be pleased to see your work, which will ultimately have a positive effect on your grades.

3. Choose a manageable topic

Once you have decided on the topic, make sure that it is manageable. You will have limited time and resources to complete your project, so picking a deep statistics research topic with a massive amount of information is risky.

In that case, you will struggle at the last moment and will most probably not finish your project on time. Therefore, spend enough time exploring the topic and get a good idea of the time and resources the project will need.

There is a huge number of statistics research topics, because statistical methods can be applied to anything from our psychology to our fitness, so there are many more topics to explore. But if you are finding it challenging, you can take the help of our statistics experts. They will help you pick the most interesting and trending statistics research topics for your projects.

With this help, you can also save your precious time and invest it in something else. You can also come up with a plethora of topics of your choice, and we will help you pick the best one among them. Apart from that, if you are working on a project and are not sure whether the topic really excites you, we can also help you clear up all your doubts about the statistics research topic.

Frequently Asked Questions

Q1. What are some good topics for a statistics project?

Have a look at some good topics for statistics projects:

1. Research the average height and physique of basketball players.
2. Birth and death rates in a specific city or country.
3. Study of the obesity rate of children and adults in the USA.
4. The growth rate of China in the past few years.
5. Major causes of injury in football.

Q2. What are the topics in statistics?

Statistics has lots of topics, and it is hard to cover all of them in a short answer. But here are the major ones: conditional probability, variance, random variables, probability distributions, common discrete distributions, and many more.

Q3. What are the top 10 research topics?

Here are the top 10 research topics that you can try in 2023:

1. Plant Science
2. Mental health
3. Nutritional Immunology
4. Mood disorders
5. Aging brains
6. Infectious disease
7. Music therapy
8. Political misinformation
9. Canine Connection
10. Sustainable agriculture


Frontiers in Psychology

What Research Has Been Conducted on Procrastination? Evidence From a Systematical Bibliometric Analysis

Associated data.

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

Procrastination is generally perceived as a common behavioral tendency, and a growing body of literature discusses this complex phenomenon. To elucidate the overall landscape and keep abreast of emerging trends in procrastination research, this article presents a bibliometric analysis of the overview and intellectual structure of research on procrastination. Using the Web of Science database, we collected 1,635 articles published between 1990 and 2020 with a topic search on "procrastination" and created diverse research maps using CiteSpace and VOSviewer. Our bibliometric analysis consists of category distribution, keyword co-occurrence networks, main cluster analysis, betweenness centrality analysis, burst detection analysis, and structure variation analysis. We find that most research has focused on student samples and has discussed the definition, classification, antecedents, consequences, and interventions of procrastination, whereas procrastination in diverse contexts and groups remains to be investigated. Regarding antecedents and consequences, research has mainly examined the relationship between procrastination and personality differences, such as the five-factor model, temperament, character, emotional intelligence, and impulsivity, whereas the role of external factors such as task characteristics and environmental conditions has drawn scant attention. To identify the nature and characteristics of this behavior, empirical research usually adopts randomized controlled trials. However, the predominant reliance on self-reported data collected at a single point in time, rather than longitudinal designs, has limited the validation of some conclusions. Notably, burst detection analysis and structure variation analysis yield novel findings. Certain research themes have gained extraordinary attention in a short time period and have evolved progressively over 1990–2020; they involve the antecedents of procrastination in a temporal context, theoretical perspectives, research methods, and typical images of procrastinators. Emerging research themes include bedtime procrastination, failure of social media self-control, and clinical interventions. To our knowledge, this is one of the first systematic bibliometric analyses of procrastination, and its findings can provide an in-depth view of the patterns and trends in procrastination research.

Introduction

Procrastination is commonly conceptualized as an irrational tendency to delay required tasks or assignments despite the negative effects of this postponement on individuals and organizations (Lay, 1986; Steel, 2007; Klingsieck, 2013). Poets have even written figuratively about procrastination, with such phrases as "Procrastination is the Thief of Time" and "Procrastination is the Art of Keeping Up with Yesterday" (Ferrari et al., 1995). The literal meaning is retained today in terms of time management. Conceptualizations of procrastination imply inaction, or postponing, delaying, or putting off a decision, in keeping with the Latin origins of the term: "pro-," meaning "forward, forth, or in favor of," and "-crastinus," meaning "tomorrow" (Klein, 1971). Time delay is only the behavioral manifestation, whereas personality traits, cognitive and motivational processes, and contextual conditions are the deeper drivers of procrastination. Procrastination can be viewed as a purposive and irrational delay that leads to missed deadlines (Akerlof, 1991; Schraw et al., 2007).

Procrastination is believed to be a self-regulation failure associated with a variety of personal and situational determinants (Hen and Goroshit, 2018). Specifically, research suggests that task characteristics (e.g., unclear instructions, the timing of rewards and punishments, and task aversiveness), personality facets (e.g., the five-factor model, motivation, and cognition), and environmental factors (e.g., temptation, incentives, and accountability) are the main determinants of procrastination (Harris and Sutton, 1983; Johnson and Bloom, 1995; Green et al., 2000; Wypych et al., 2018). Procrastination can be an impediment to success, may affect an individual's mood, and may increase anxiety, depression, and low self-esteem (Ferrari, 1991; Duru and Balkis, 2017). Furthermore, people who procrastinate are prone to poor performance, lower exam scores, slower job promotions, and poorer health (Sirois, 2004; Legood et al., 2018; Bolden and Fillauer, 2020). Importantly, if policymakers postpone decisions beyond the appropriate time, such procrastination can have a significant negative impact on society as a whole, as in the management of the COVID-19 pandemic in some countries (Miraj, 2020).

In practice, procrastination is stable and complex across situations, ranging from students' academic procrastination, to employees' work procrastination, to individuals' bedtime procrastination, to administrative procrastination when government organizations face multiple tasks in national governance, and even to delayed leadership decision-making in crisis situations in global governance (Nevill, 2009; Hubner, 2012; Broadbent and Poon, 2015; Legood et al., 2018). In scientific research, procrastination has attracted increasing attention and has been studied extensively. In our view, two factors may explain this growing research focus. On the one hand, the high prevalence and obvious consequences of procrastination highlight the importance of exploring this complex phenomenon in depth, especially because the meteoric rise in the availability of information and communication technologies (ICTs) amplifies chronic procrastination through problematic social media use, smartphone addiction, and intrusive mobile-checking habits (Ferrari et al., 2007; Przepiorka et al., 2021; Aalbers et al., 2022). On the other hand, a growing body of foundational and milestone research has laid the groundwork for later researchers' exploration of procrastination. In particular, the efforts of productive authors in different periods have driven knowledge development on procrastination.

Procrastination research has experienced tremendous expansion and diversification, but a systematic overview is lacking. Several meta-analyses of procrastination have been published, but they focus on specific topics (Steel, 2007; Sirois et al., 2017; Malouff and Schutte, 2019). Furthermore, the number of newly published articles is increasing, making it difficult to fully track the relevant literature. To grasp knowledge development in this fast-moving and complex research field, bibliometric analysis is needed to construct diagram-based science maps that provide a comprehensive and intuitive reference for subsequent researchers. This article therefore addresses the following research questions: What is the intellectual base and structure of procrastination research? How are its emerging directions developing? Our bibliometric analysis covers the annual distribution of literature, the distribution of categories, keyword co-occurrence networks, main research clusters, high citation betweenness centrality, the strongest citation bursts, and recent publications with transformative potential, in order to look back on the early development of procrastination research and look forward to its future transformation. For both scholars and members of the public, this study can enhance understanding of procrastination and provide overall perspectives for future research.

Data and Methodology

Bibliometric analysis is a quantitative method for investigating the intellectual structure of a topical field. Based on the co-citation assumption that two articles frequently cited together are closely associated, bibliometric analysis can reflect the structure of scientific communication holistically (Garfield, 1979; Chen et al., 2012). Bibliometric tools such as CiteSpace, VOSviewer, and HistCite can generate science maps from large bodies of literature in a given domain. Through charting, mining, analyzing, sorting, and displaying knowledge, science mapping can extract pivotal information from a huge and complex literature and visually present the knowledge base and intellectual structure of a given field, so that researchers, and even general readers, can quickly grasp a subject's core structure, development process, research frontiers, and overall knowledge framework (Chen, 2017; Widziewicz-Rzonca and Tytla, 2020). Bibliometric analysis is commonly regarded as a complement to traditional structured literature reviews such as narrative analysis and meta-analysis (Fang et al., 2018; Jiang et al., 2019). Traditional literature analysis tends to be labor-intensive and subject to personal preferences, and it has difficulty handling large bodies of literature, whereas bibliometric analysis offers a more objective way to investigate the intellectual structure of a large literature through statistical analysis and interactive visual exploration.
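The co-citation assumption described above can be made concrete with a short sketch that counts how often each pair of references appears together in the reference lists of citing articles. The reference IDs are hypothetical placeholders, and real tools such as CiteSpace add time slicing, selection thresholds, and normalization that this sketch omits.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of references is cited together.

    reference_lists: one list of cited-reference IDs per citing article.
    Returns a Counter keyed by alphabetically ordered (ref_a, ref_b) tuples.
    """
    counts = Counter()
    for refs in reference_lists:
        unique = sorted(set(refs))  # a reference cited twice in one article counts once
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical citing articles with placeholder reference IDs.
    citing = [["A", "B", "C"], ["A", "B"], ["A", "D"]]
    for pair, n in cocitation_counts(citing).most_common():
        print(pair, n)
```

The resulting pair counts are the edge weights of a co-citation network; pairs with high counts are the "highly associated" articles the co-citation assumption refers to.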

To characterize procrastination research, the study used the bibliometric software CiteSpace and VOSviewer to analyze the literature on procrastination during 1990–2020. VOSviewer is designed for creating maps of authors, journals, and keyword co-occurrences based on network data (van Eck and Waltman, 2010), whereas CiteSpace is used to conduct co-citation analysis, including betweenness centrality analysis, burst detection, and the detection of emerging research trends (Chen, 2006, 2017). In our study, we used CiteSpace (5.7.R1) and VOSviewer (1.6.15) together: co-citation analysis mainly relies on CiteSpace, and co-occurrence analysis is conducted in VOSviewer (Markscheffel and Schroeter, 2021).
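Burst detection flags references whose citation rate rises sharply over a short period. CiteSpace's burst detection is generally described as building on Kleinberg's burst-detection algorithm; the sketch below is a simplified two-state, batched version of that idea written for illustration, not CiteSpace's implementation. The rate multiplier `s`, the transition penalty `gamma`, their default values, and the yearly counts in the usage example are assumptions.

```python
import math


def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Simplified two-state burst detection over batched yearly counts.

    relevant[t]: citations received by one reference in year t
    totals[t]:   all citations recorded in the corpus in year t
    Returns one state per year: 0 = baseline rate, 1 = burst.
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)   # baseline citation share
    if p0 == 0 or p0 >= 1:
        return [0] * n                 # nothing to distinguish in this model
    p1 = min(s * p0, 0.9999)           # elevated share in the burst state
    up_cost = gamma * math.log(n)      # penalty for moving from state 0 to 1

    def fit_cost(p, r, d):
        # Negative log-likelihood of a binomial draw, omitting the binomial
        # coefficient, which is identical for both states.
        return -(r * math.log(p) + (d - r) * math.log(1 - p))

    # Viterbi-style dynamic programming: best[state] = cheapest path cost so far.
    best = [fit_cost(p0, relevant[0], totals[0]),
            up_cost + fit_cost(p1, relevant[0], totals[0])]
    back = []  # back[i][state] = best previous state for year i + 1
    for r, d in zip(relevant[1:], totals[1:]):
        prev0 = 0 if best[0] <= best[1] else 1   # dropping to state 0 is free
        via0, via1 = best[0] + up_cost, best[1]  # rising to state 1 costs up_cost
        prev1 = 0 if via0 <= via1 else 1
        best = [best[prev0] + fit_cost(p0, r, d),
                min(via0, via1) + fit_cost(p1, r, d)]
        back.append((prev0, prev1))

    # Trace the cheapest path backwards from the final year.
    state = 0 if best[0] <= best[1] else 1
    path = [state]
    for prev0, prev1 in reversed(back):
        state = prev0 if state == 0 else prev1
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical yearly counts for one reference vs. all citations per year.
    print(detect_bursts([5, 5, 30, 30, 5, 5], [100] * 6))  # [0, 0, 1, 1, 0, 0]
```

Raising `gamma` makes entering a burst more expensive, so only sharper or longer surges get flagged; this is the kind of sensitivity trade-off a burst-detection setting controls.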

Although there is one similar bibliometric analysis of this topic (Tao et al., 2021), it focuses only on academic procrastination and mainly conducts co-occurrence analysis using VOSviewer; as a result, it lacks analysis of core co-citation structures, including high betweenness centrality articles, citation bursts, and structure variation. To offer insight into the intellectual structure of procrastination research, we further employ CiteSpace, a Java application developed by Chen that combines bibliometric analysis, data mining algorithms, and visualization methods, to visualize and elucidate vital trends and pivotal points in knowledge development.

To conduct our bibliometric analysis of procrastination research, we collected bibliographic records from the Web of Science Core Collection as of December 31, 2020. Web of Science is currently the most relevant scientific platform for systematic review needs, allowing a "Topic" query that searches the "title," "abstract," "author keywords," and "keywords plus" fields of the documents being reviewed (Yi et al., 2020). A topic search strategy is broad enough to be used in science mapping (Olmeda-Gomez et al., 2019). Given the aim of the study, records were downloaded if they contained the term "procrastination" in the "Topic" field. After restricting the publication type to "Article" and the years to 1900–2020, the search returned 2,105 papers on procrastination.
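The "Topic" field search and the type and year restrictions described above can be approximated with a small filter over local records. This is a hypothetical sketch: the `Record` structure and the case-insensitive substring matching are assumptions, and Web of Science's own matching rules are not reproduced.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical record holding the fields a "Topic" search covers."""
    title: str
    abstract: str
    year: int
    doc_type: str
    author_keywords: list = field(default_factory=list)
    keywords_plus: list = field(default_factory=list)


def matches_topic(record, term):
    """Case-insensitive substring match over title, abstract, and both keyword fields."""
    needle = term.lower()
    texts = [record.title, record.abstract, *record.author_keywords, *record.keywords_plus]
    return any(needle in text.lower() for text in texts)


def select_records(records, term, first_year, last_year, doc_type="Article"):
    """Keep records that match the topic, have the given document type,
    and were published within [first_year, last_year]."""
    return [
        r for r in records
        if matches_topic(r, term)
        and r.doc_type == doc_type
        and first_year <= r.year <= last_year
    ]


if __name__ == "__main__":
    toy = [
        Record("Academic procrastination in students", "", 2015, "Article"),
        Record("Sleep habits", "We study bedtime delay.", 2018, "Article", ["Procrastination"]),
        Record("Procrastination: a review", "", 2019, "Review"),
        Record("Time use", "No matching term here.", 2010, "Article"),
    ]
    for r in select_records(toy, "procrastination", 1900, 2020):
        print(r.year, r.title)
```

The second toy record shows why a topic search is broader than a title search: it is retrieved only through its author keywords.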

Figure 1 shows the yearly distribution of the 2,105 publications during 1900–2020, which can be divided into three phases. In phase I (1900–1989), the annual number of publications never exceeded 10. In phase II (1990–2010), the annual number gradually increased from 11 papers in 1991 to 48 in 2010; publication numbers had begun to grow in this period but remained below 50 papers per year. In phase III (2011–2020), however, procrastination research grew dramatically, reaching 255 publications in 2020. Although procrastination research appeared as early as the 1900s, its total volume remained stable until the 1990s, when it began to grow steadily, and that growth became extraordinary during the 2010s. Therefore, this research focuses on the 1,635 publications published during 1990–2020.

Figure 1. Distribution of publications on the topic of procrastination, 1900–2020.
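The three-phase breakdown and the 1990–2020 analysis window amount to a simple grouping step, sketched below with toy years. The phase boundaries are taken from the text; the function names and the toy data are illustrative. Applied to the reported totals, this window sets aside 2,105 − 1,635 = 470 records published before 1990.

```python
from collections import Counter

# Phase boundaries as stated in the text (inclusive year ranges).
PHASES = [("I", 1900, 1989), ("II", 1990, 2010), ("III", 2011, 2020)]


def phase_of(year):
    """Return the phase label for a publication year, or None if out of range."""
    for label, start, end in PHASES:
        if start <= year <= end:
            return label
    return None


def split_by_phase(years, window=(1990, 2020)):
    """Count publications per phase and keep the years inside the analysis window."""
    per_phase = Counter(phase_of(y) for y in years)
    kept = [y for y in years if window[0] <= y <= window[1]]
    return per_phase, kept


if __name__ == "__main__":
    toy_years = [1950, 1989, 1990, 2005, 2010, 2011, 2020]  # hypothetical years
    per_phase, kept = split_by_phase(toy_years)
    print(dict(per_phase), len(kept))
```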

Panoramic Overview of Procrastination Research

Category Distribution

Procrastination research has attracted increasing attention from scholars and has been integrated into various scientific fields. With the help of CiteSpace, we present in Figure 2 the timelines of the disciplines involved in procrastination research and the cumulative number of publications.

Figure 2. Distribution of categories involved in procrastination research.

As Figure 2 shows, the size of each node on the horizontal lines represents the number of publications. Node colors denote the range of years of occurrence, purple outlines indicate articles with prominent betweenness centrality, and red nodes indicate references with strong citation bursts (Chen, 2017). In addition, the uppermost line shows the timeline of the disciplines, and the numbered lines correspond to the distinct categories of procrastination research, which are arranged vertically in descending order of cluster size. Clusters are numbered from 0; i.e., Cluster #0 is the largest cluster and Cluster #1 is the second largest. Specifically, the earliest research on procrastination occurs in the Psychology and Social Science disciplines. Subsequently, research has expanded into Computer Science and Information Systems, Economics, the Neurosciences, the Environmental Sciences, Ethics, Surgery, and general Medicine. As the connecting arcs in Figure 2 show, categories #0 Psychology and Social Sciences, #1 Computer Science, and #2 Economics interact actively, but interdisciplinary research involving the remaining categories, such as #9 Medicine, #5 Ethics, and #4 Environmental Science, is not active.

Our analysis of the category distribution reveals two characteristics of procrastination research. First, related research is mostly rooted in Psychology and the Social Sciences, and interdisciplinary research needs to be strengthened. Second, the foundational literature dates back to the 1990s, and transformative exploration is now needed to develop research on procrastination further.

Keyword Co-occurrence Network: Core Contents

Analysis of co-occurring keywords is often used to identify the content of a research field. Using VOSviewer, we obtained a total of 5,203 keywords and created a co-occurrence network. As mentioned above, the size of a node represents the number of times a keyword occurs. Several keywords appear frequently, such as Procrastination, Performance, Academic Procrastination, Motivation, Personality, Self-regulation, Self-control, and Behavior. To create a readable map, the “minimum number of occurrences” was set to 20; the final network includes 90 high-frequency keywords in five clusters with 2,650 links, as shown in Figure 3.
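The thresholding step can be illustrated with a minimal sketch of full counting: each keyword is counted once per article, keywords below the minimum-occurrence cutoff are dropped, and a link's weight is the number of articles in which both retained keywords appear. This is an illustration under those assumptions, not VOSviewer's implementation; the function name `cooccurrence_network`, the input format, and the toy keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(records, min_occurrences=20):
    """Build a keyword co-occurrence network from per-article keyword lists.

    records: list of keyword lists, one per article (hypothetical input format).
    Returns (nodes, links): nodes maps each retained keyword to its occurrence
    count; links maps each sorted keyword pair to the number of articles in
    which both keywords appear.
    """
    # Count each keyword at most once per article (full counting)
    occurrences = Counter(kw for rec in records for kw in set(rec))
    # Apply the "minimum number of occurrences" threshold
    kept = {kw for kw, n in occurrences.items() if n >= min_occurrences}
    links = Counter()
    for rec in records:
        present = sorted(set(rec) & kept)
        for a, b in combinations(present, 2):
            links[(a, b)] += 1
    nodes = {kw: occurrences[kw] for kw in kept}
    return nodes, links


# Hypothetical keyword lists for four articles
records = [
    ["procrastination", "performance"],
    ["procrastination", "motivation"],
    ["procrastination", "performance", "self-control"],
    ["delay"],
]
nodes, links = cooccurrence_network(records, min_occurrences=2)
print(nodes)
print(dict(links))
```

With a cutoff of 2, only "procrastination" and "performance" survive, and they share a single link of weight 2; raising the cutoff is what keeps a 5,203-keyword vocabulary down to a readable map.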

Figure 3. Keyword co-occurrence network for procrastination research.

Among the five clusters in Figure 3, the blue cluster mainly relates to the definition of procrastination, with keywords such as Procrastination, Delay, Deadlines, Choice, Self-Control, and Implementation Intentions. Procrastination is a complex phenomenon, and previous research has examined its core features along various dimensions. Mainstream views define procrastination as the intentional delay of work resulting from self-regulation failure, inefficient time management, a preference for short-term benefits, or a gap between intention and action (Tice and Baumeister, 1997; Steel, 2007; Pychyl and Flett, 2012; Klingsieck, 2013), or as missing a deadline in a way that causes negative outcomes (Johnson and Bloom, 1995; Howell and Watson, 2007; Sirois, 2021).

The red cluster in Figure 3 concerns procrastination and performance across different life domains, with keywords including Academic Achievement, Life Satisfaction, Online Learning, and Technology Uses. Previous research has shown that procrastination is negatively correlated with performance, whereas intrinsic motivation, self-regulated learning, and time management have been shown to reduce procrastination (Wolters, 2003; Howell and Watson, 2007; Baker et al., 2019).

The green cluster highlights traits associated with procrastination. Research in this cluster mostly discusses the correlation between the five-factor model (neuroticism, extraversion, openness to experience, agreeableness, conscientiousness) and procrastination (Schouwenburg and Lay, 1995). Personality traits such as indecisiveness, indecision, and perfectionism have also been examined (Klingsieck, 2013; Tibbett and Ferrari, 2019). Furthermore, various scales have been developed to measure procrastination as a trait, such as the General Procrastination Scale, the Decisional Procrastination Questionnaire, the Procrastination at Work Scale, the Irrational Procrastination Scale, and the Adult Inventory of Procrastination Scale (Lay, 1986; Ferrari et al., 1995; Steel, 2010; Metin et al., 2016). The validity and reliability of these scales have also been investigated extensively.

The yellow cluster depicts studies that focus on academic procrastination, especially those that discuss antecedents of this prevalent behavior, such as Anxiety, Perfectionism, Self-efficacy, Depression, and Stress (Schraw et al., 2007; Goroshit, 2018). Because students are an accessible research sample, a large body of procrastination research has studied students in academic settings. Researchers have found that academic procrastination impedes academic performance, especially among very young students. Notably, female students may show lower levels of academic procrastination than male students.

The last cluster, in purple, relates to chronic procrastination and its involvement in health and addiction, among both adults and adolescents. Discussion of chronic procrastination is growing, and interventions can be effective in reducing this behavior.

From the analysis of co-occurring keywords, we infer that procrastination research has been developing steadily. Foundational discussion of the definition, individual differences, and antecedents of procrastination has become more thorough and persuasive, and discussion of how to reduce the behavior has begun.

Main Research Cluster: Core Theme and Hot Topics

Compared with keyword co-occurrence analysis, cluster analysis can help us grasp the primary themes in procrastination research. Co-citation clustering rests on the assumption that if two references are often cited together, they are likely to be related in some way (Chen et al., 2012; Pan et al., 2019). Related references thereby form co-citation networks. Clustering groups co-cited references so that references in the same cluster are tightly connected with each other but loosely connected with other clusters (Chen et al., 2010).
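A minimal sketch of co-citation counting, assuming each citing article is represented as a list of its cited references. Link strength here is the cosine coefficient (co-citations divided by the geometric mean of the two references' citation counts), one common normalization chosen for illustration; it is not claimed to be the exact setting used in this study. The function name and the toy citing articles are hypothetical, although the reference labels echo works cited in this review.

```python
import math
from collections import Counter
from itertools import combinations


def cocitation_strengths(citing_papers):
    """Return cosine-normalized co-citation strengths between cited references.

    citing_papers: list of reference lists, one per citing article.
    """
    # How many citing articles cite each reference
    cited = Counter(ref for refs in citing_papers for ref in set(refs))
    # How many citing articles cite each pair of references together
    together = Counter()
    for refs in citing_papers:
        for a, b in combinations(sorted(set(refs)), 2):
            together[(a, b)] += 1
    # Cosine coefficient: co-citations / sqrt(citations(a) * citations(b))
    return {
        pair: n / math.sqrt(cited[pair[0]] * cited[pair[1]])
        for pair, n in together.items()
    }


# Hypothetical citing articles
citing_papers = [
    ["Steel 2007", "Lay 1986", "Ferrari 1992"],
    ["Steel 2007", "Lay 1986"],
    ["Steel 2010"],
]
strengths = cocitation_strengths(citing_papers)
print(strengths[("Lay 1986", "Steel 2007")])  # 1.0
```

Normalizing by citation counts keeps two heavily cited references from appearing strongly linked merely because each is popular; the raw counts in `together` are what a clustering step would otherwise consume directly.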

Based on the references of the 50 most-cited articles in each year (if fewer than 50 articles were available in a given year, all of them were included), the final network contained 982 references, from which we developed the final cluster landscape. Two procedures can be used to label each cluster: (1) extracting keywords from the citing articles using the log-likelihood ratio, and (2) extracting terms from the cited articles using latent semantic indexing (Olmeda-Gomez et al., 2019). In our research, we adopted the log-likelihood ratio (LLR) method to label the clusters automatically. Articles in the co-citation network were assigned to clusters based on their structural and temporal properties, and the network was divided into 23 co-citation clusters.
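Log-likelihood ratio labeling is commonly based on Dunning's G² statistic, which compares how often a term appears in the citing articles of one cluster with how often it appears elsewhere. The sketch below assumes that formulation; CiteSpace's own term extraction and weighting are not reproduced, and the function name and counts are hypothetical.

```python
import math


def log_likelihood_ratio(k_in, n_in, k_out, n_out):
    """Dunning's G^2 statistic for a candidate label term.

    k_in:  occurrences of the term in the cluster's citing articles
    n_in:  total term occurrences (all terms) in the cluster's citing articles
    k_out: occurrences of the term in all other citing articles
    n_out: total term occurrences in all other citing articles
    """
    total = n_in + n_out
    term_total = k_in + k_out
    # Observed 2x2 table, (cluster, rest) x (this term, other terms),
    # stored as (observed, row total, column total)
    cells = [
        (k_in, n_in, term_total),
        (n_in - k_in, n_in, total - term_total),
        (k_out, n_out, term_total),
        (n_out - k_out, n_out, total - term_total),
    ]
    g2 = 0.0
    for observed, row_total, col_total in cells:
        if observed > 0:  # 0 * log(0) is taken as 0
            expected = row_total * col_total / total
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Hypothetical counts: term -> (k_in, n_in, k_out, n_out)
candidates = {
    "smoking cessation": (30, 1000, 5, 9000),
    "academic": (20, 1000, 180, 9000),
}
scores = {t: log_likelihood_ratio(*c) for t, c in candidates.items()}
label = max(scores, key=scores.get)
print(label)  # smoking cessation
```

A term used at the same rate inside and outside the cluster ("academic" here) scores zero, so it cannot serve as a distinguishing label. Because G² is also large for terms that are unusually rare in a cluster, a labeling step would typically keep only terms that are over-represented in it.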

In addition, two critical parameters, silhouette and modularity, are used to assess the quality of the clustering. Silhouette indicates the homogeneity of clusters, whereas modularity measures whether the network is reasonably divided into independent clusters. The silhouette value ranges from −1 to 1, and the modularity score ranges from 0 to 1. High values on both metrics indicate a well-constructed co-citation network (Chen et al., 2010; Widziewicz-Rzonca and Tytla, 2020). As shown in Figure 4, the mean silhouette score of 0.9223 suggests that the homogeneity of the clusters is acceptable, and the modularity score of 0.7822 indicates that the network is reasonably divided.
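Both metrics can be sketched from their generic definitions: Newman modularity compares the share of edges inside each cluster with the share expected by chance, and the silhouette of item i is s(i) = (b − a) / max(a, b), where a is its mean distance to its own cluster and b its mean distance to the nearest other cluster. The Euclidean distance, the toy network, and the coordinates below are assumptions for illustration; CiteSpace's own similarity measure is not reproduced.

```python
import math


def modularity(edges, community):
    """Newman modularity Q for an undirected, unweighted edge list.

    Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the
    number of edges, L_c the edges inside c, and d_c the total degree of c.
    """
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        for node in (u, v):
            c = community[node]
            degree_sum[c] = degree_sum.get(c, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            internal[c] = internal.get(c, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


def mean_silhouette(points, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b) using Euclidean distance."""
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, grouped by cluster
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)  # convention for singletons or a single cluster
            continue
        a = sum(own) / len(own)
        b = min(sum(ds) / len(ds) for ds in by_cluster.values())
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical network: two triangles joined by one bridge edge
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(round(modularity(edges, community), 4))  # 0.3571

# Hypothetical 2-D coordinates for four items in two well-separated groups
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(mean_silhouette(points, [0, 0, 1, 1]), 4))  # 0.9002
```

In the toy network each triangle holds 3 of the 7 edges and half of the total degree, so Q = 2 × (3/7 − 1/4) = 5/14. The silhouette approaches 1 because each point sits far closer to its own group than to the other.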

Figure 4. Landscape view of the co-citation network of procrastination research.

In our research, we summarized the nine largest clusters. As shown in Table 1, the silhouette value of every cluster is higher than 0.8, suggesting that the references in each cluster are highly homogeneous. The labels of these clusters are controlled trial, avoidant procrastination, conscientiousness procrastination, smoking cessation, explaining lack, academic achievement, procrastinatory media use, career indecision, and goal orientation.

Table 1. Summary of the nine largest clusters in procrastination research.

Cluster ID   Size   Silhouette   Label                               Mean year
0            182    0.855        Controlled trial                    2014
1            148    0.836        Avoidant procrastination            2005
2            144    0.938        Conscientiousness procrastination   1994
3            72     0.989        Smoking cessation                   2000
4            65     0.97         Explaining lack                     1988
5            58     0.903        Academic achievement                2009
6            33     0.988        Procrastinatory media use           2013
7            31     0.99         Career indecision                   2006
8            28     0.981        Goal orientation                    1995

In Table 1, the year in the far-right column indicates the average year of the cited references in each cluster. Ranking the clusters by this mean year, we can follow the development of research themes. During the 1990s, research focused on the antecedents of procrastination. For example, Lay (1988) argued that the self-regulation model cannot fully explain procrastination and that procrastination may be attributed to errors in estimating the time needed to complete a task. Procrastinators were thought to lack conscientiousness and goal orientation and to be motivated by neurotic avoidance (Ferrari et al., 1995; Elliot and Harackiewicz, 1996). Procrastination was also found to be prevalent across the lifespan, and empirical research on procrastination, including controlled trials, considered various settings, such as academic procrastination, smoking cessation, career indecision, and, in the most recent years, media use (Klassen et al., 2008; Germeijs and Verschueren, 2011; Du et al., 2019). Because procrastination is negatively associated with performance, life satisfaction, health, and well-being, research on preventing procrastination and on interventions, including strengths-based training and cognitive behavioral therapy, has attracted the most attention from scholars (van Eerde, 2003; Balkis and Duru, 2016; Visser et al., 2017).
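The chronological reading of Table 1 can be reproduced with a short snippet that re-enters the table's values and sorts the clusters by mean year. It uses only the figures reported in the table; the variable names are illustrative.

```python
# Rows of Table 1: (cluster ID, size, silhouette, label, mean year)
clusters = [
    (0, 182, 0.855, "Controlled trial", 2014),
    (1, 148, 0.836, "Avoidant procrastination", 2005),
    (2, 144, 0.938, "Conscientiousness procrastination", 1994),
    (3, 72, 0.989, "Smoking cessation", 2000),
    (4, 65, 0.97, "Explaining lack", 1988),
    (5, 58, 0.903, "Academic achievement", 2009),
    (6, 33, 0.988, "Procrastinatory media use", 2013),
    (7, 31, 0.99, "Career indecision", 2006),
    (8, 28, 0.981, "Goal orientation", 1995),
]

# Every cluster clears the 0.8 silhouette level mentioned in the text
assert all(sil > 0.8 for _, _, sil, _, _ in clusters)

# Order clusters chronologically by mean year to trace how themes developed
timeline = sorted(clusters, key=lambda row: row[4])
for cid, size, sil, label, year in timeline:
    print(f"{year}  #{cid} {label} (n={size}, S={sil})")
```

Sorted this way, the three oldest clusters (explaining lack, conscientiousness procrastination, goal orientation) are the antecedent-focused themes of the late 1980s and 1990s, and controlled trial is the most recent.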

Intellectual Structure of Procrastination Research

Co-citation analysis and clustering analysis form the cornerstone of bibliometric investigation (Olmeda-Gomez et al., 2019), especially for examining the fine-grained intellectual structure of a field through betweenness centrality, burst detection, and structural variation analysis (Pan et al., 2019). Based on the cited-reference network for 1990–2020, we generated a landscape visualization of the intellectual structure of procrastination research. This section consists of three parts: (1) Betweenness centrality analysis captures bridge nodes, which represent landmark and pivotal literature in a scientific field (Freeman, 1978). (2) Burst detection analysis detects sharp increases of interest in a research field (Kleinberg, 2003), which makes it easy to trace the development of research focuses and research fronts. (3) Structural variation analysis (SVA) is an additional measure for identifying whether newly published articles have the potential to transform the citation network in recent years. Newly published articles initially have few citations and may be overlooked; to overcome this limitation, structural variation analysis often employs zero-inflated negative binomial (ZINB) and negative binomial (NB) models to detect potentially transformative literature (Chen, 2013).

Betweenness Centrality Analysis

Literature with high betweenness centrality tends to represent groundbreaking and landmark research. From our co-citation network of procrastination research for 1990–2020, we selected the 10 articles with the highest betweenness centrality (see Supplementary Material for details). This research focuses mainly on three areas.
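Betweenness centrality (Freeman, 1978) counts, for each node, how many shortest paths between other pairs of nodes pass through it, which is why bridging references score highly. A minimal sketch using Brandes' algorithm for an undirected, unweighted graph follows; it returns raw (unnormalized) scores, whereas bibliometric tools usually report normalized values. The function name and the toy network are hypothetical.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes' algorithm).

    adj maps each node to a list of its neighbours.
    """
    bc = dict.fromkeys(adj, 0.0)
    for source in adj:
        # Breadth-first search counting shortest paths from `source`
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                bc[w] += delta[w]
    # Each unordered pair was visited once from each end
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical co-citation network: two tight groups joined through c and d
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
bc = betweenness_centrality(adj)
print(sorted(bc.items(), key=lambda kv: -kv[1])[:2])  # [('c', 6.0), ('d', 6.0)]
```

Only c and d lie on paths between the two groups, so they alone receive nonzero scores; in a co-citation network these are the references that connect otherwise separate lines of research.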

Definition and Classification of Procrastination

Procrastination is described as postponing the completion of a task or failing to meet deadlines, even though the individual will experience adverse outcomes and discomfort as a result (Johnson and Bloom, 1995). Drawing on established procrastination scales, Diaz-Morales et al. (2006) proposed a four-factor model of procrastination: dilatory behaviors, indecision, lack of punctuality, and lack of planning. Procrastination is commonly considered a form of self-regulation failure or self-defeating behavior (Tice and Baumeister, 1997; Sirois and Pychyl, 2013).

The most popular classification is the trinity of procrastination: decisional, arousal, and avoidant procrastination (Ferrari, 1992). Using the General Behavioral Procrastination Scale and the Adult Inventory of Procrastination Scale, Ferrari et al. (2007) measured the difference between arousal and avoidant procrastination and reported that these two patterns showed similarity and commonality across cultural values and norms. However, in a meta-analytic review with factor analyses, Steel (2010) found that the evidence supporting the tripartite model of procrastination may not be sufficient. Research has reached a consensus on the basic definition of procrastination, but how to classify procrastination needs further discussion.

Procrastination Behavior in a Temporal Context

Procrastination is closely tied to how people manage their time. Non-procrastinators and active procrastinators show better time control and more purposive use of time (Corkin et al., 2011), whereas procrastinators struggle with time management. Drawing on the temporal disjunction between present and future selves, Sirois and Pychyl (2013) pointed out that procrastinators tend to prioritize short-term mood repair in the present, even though their future selves pay for the inaction. Similarly, in a longitudinal study, Tice and Baumeister (1997) showed that participants' procrastination was shaped by how they weighed benefits against costs over time. When a deadline is far off, procrastination can bring short-term benefits, such as less stress and better health, but these early benefits are often outweighed by long-term costs, including poor performance, low self-esteem, and anxiety. These viewpoints confirm that procrastination is a form of self-regulation failure involving the regulation of mood and emotion as well as benefit-cost tradeoffs.

Causes of and Interventions for Procrastination

Procrastination shows marked stability within individuals across time and situations. Predictors of procrastination include personality traits, task characteristics, external environments, and demographics (Steel, 2007). However, empirical research has typically focused on the relationship between the five-factor model and procrastination. Johnson and Bloom (1995) systematically examined how the five personality factors account for variance in academic procrastination, and research has also found that facets of conscientiousness and neuroticism explain most of this variance. Consistent with these findings, Schouwenburg and Lay (1995) showed that procrastination was largely related to a lack of conscientiousness, which comprises six facets: competence, order, dutifulness, achievement-striving, self-discipline, and deliberation. Meanwhile, impulsiveness (a facet of neuroticism) is associated with procrastination, owing in part to shared genetic influences (Gustavson et al., 2014). These discussions have established a basis for research on personality traits and procrastination (Flett et al., 2012; Kim et al., 2017).

To reduce procrastination, time management (TM) strategies and clinical methods are applied in practice. Glick and Orsillo (2015) compared the effectiveness of these interventions and found that acceptance-based behavior therapies (ABBTs) were more effective for chronic procrastinators. Regarding academic procrastination, Balkis (2013) discussed the mediating role of rational beliefs in the relationships among procrastination, life satisfaction, and performance. However, there is no “gold standard” intervention for procrastination, and how to manage this complex behavior needs further investigation.

Burst Detection Analysis

A citation burst indicates that a reference has received extraordinary attention from the scientific community within a short period, and it can therefore help detect emergent research in a specialty (Kleinberg, 2003). A citation burst has two dimensions: burst strength and burst duration. Articles with high burst strength can be considered especially relevant to the research theme (Widziewicz-Rzonca and Tytla, 2020). Burst duration is marked by the red segments in Figure 5, which show the beginning and ending year of each burst within 1990–2020. Figure 5 lists the 20 references with the strongest citation bursts (see Supplementary Material for details), ordered from oldest to most recent.
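Kleinberg's (2003) approach models a reference's citation stream as an automaton that is either in a baseline state or in a burst state with an elevated citation rate, and finds the cheapest state sequence. The sketch below is a simplified two-state, batched version under stated assumptions: slice t contributes a binomial cost for r[t] citations of the reference among d[t] citations field-wide, entering the burst state costs γ·ln(n), and the burst rate is s times the baseline rate. The parameter defaults, the function names, and the yearly counts are hypothetical and are not the settings used in this study; the reference must have at least one citation.

```python
import math


def binomial_cost(p, r, d):
    """Negative log-likelihood of r relevant items among d under rate p."""
    log_choose = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
    return -(log_choose + r * math.log(p) + (d - r) * math.log(1 - p))


def two_state_bursts(r, d, s=2.0, gamma=1.0):
    """Viterbi decoding of a two-state burst automaton over time slices.

    r[t]: citations received by one reference in slice t
    d[t]: total citations in the field in slice t
    Returns one state per slice: 0 = baseline, 1 = burst.
    """
    n = len(r)
    p0 = sum(r) / sum(d)              # baseline citation rate
    p1 = min(s * p0, 0.9999)          # elevated rate in the burst state
    rates = (p0, p1)
    enter_cost = gamma * math.log(n)  # penalty for moving up into the burst state
    # The automaton starts in the baseline state
    cost = [binomial_cost(p0, r[0], d[0]),
            enter_cost + binomial_cost(p1, r[0], d[0])]
    back = []
    for t in range(1, n):
        new_cost, pointers = [], []
        for j in (0, 1):
            # Moving up costs enter_cost; staying or moving down is free
            options = [cost[i] + (enter_cost if j > i else 0.0) for i in (0, 1)]
            best = 0 if options[0] <= options[1] else 1
            new_cost.append(options[best] + binomial_cost(rates[j], r[t], d[t]))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)
    # Trace the cheapest state sequence backwards
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for pointers in reversed(back):
        state = pointers[state]
        states.append(state)
    return states[::-1]


# Hypothetical yearly counts: citations r out of d field-wide citations
r = [2, 3, 2, 2, 20, 25, 22, 2, 3, 2]
d = [100] * 10
print(two_state_bursts(r, d))  # [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
```

The burst's beginning and ending slices are the first and last 1s in the decoded sequence, which correspond to the red segments in a burst chart. Raising `gamma` makes bursts harder to enter, so only sharper or longer surges of attention are flagged.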

Figure 5. Top 20 references with the strongest citation bursts.

To investigate systematically the active areas of procrastination research in different periods, we divided the overall timespan into three periods. From 1990 through 1999, six references had strong citation bursts, two of them by Ferrari and a third by Ferrari, Johnson, and McCown. From 2000 through 2009, there were eight reference bursts, and the meta-analysis and theoretical review by Steel (2007) had the strongest citation burst among all 20 references. From 2010 through 2020, six references showed strong citation bursts.

Period I (1990–1999): Preliminary Understanding of Procrastination's Antecedents

How procrastination is defined matters for designing interventions. In the early period of procrastination research, scholars paid significant attention to defining procrastination and discussing its antecedents. Delay in completing tasks is the vital dimension that distinguishes procrastination behavior, and this distinction laid the foundation for later exploration of the behavior. Lay (1988) found that errors in time estimation led to procrastination and identified two types of procrastinators, optimistic and pessimistic, according to their judgments of time. In addition, the timeframe or constraints of a situation influence one's behavioral choices: procrastinators tend to weigh short-term benefits over long-term costs (Tice and Baumeister, 1997).

However, delay is only the behavioral manifestation, and personality traits may be deeper causes of procrastination (Ferrari, 1991; Ferrari et al., 1995). Schouwenburg and Lay (1995) empirically examined the relationship between the five-factor model and procrastination in a sample of students, and their findings were consistent with Ferrari (1991), who showed that low conscientiousness and neurotic avoidance were associated with procrastination. In addition, Ferrari (1992) evaluated two popular scales for measuring procrastination: the General Procrastination (GP) scale and the Adult Inventory of Procrastination (AIP) scale. A variety of scales have since been constructed to measure procrastination, further advancing procrastination research.

Period II (2000–2009): Investigation of Cognitive and Motivational Facets and Emergence of Various Research Methods

During Period II, procrastination research with strong citation bursts focused largely on two dimensions: behavioral antecedents and empirical methods. First, discussions of cognitive and motivational antecedents sprang up. A series of studies found that cognitive and motivational beliefs, including goal orientation, perceived self-efficacy, self-handicapping, and self-regulated learning strategies, are strongly related to procrastination (Wolters, 2003; Howell and Watson, 2007; Klassen et al., 2008). Specifically, Howell and Watson (2007) examined the achievement goal framework with two variables, achievement goal orientation and learning strategy use, in which four types of goal orientation can be derived from the performance vs. mastery dimension and the approach vs. avoidance dimension. They found that procrastination was associated with a mastery-avoidance orientation and negatively related to a mastery-approach orientation. Moreover, Chu and Choi (2005) distinguished active from passive procrastinators in terms of time use and perception, self-efficacy beliefs, motivational orientation, stress-coping strategies, and final outcomes. This classification has sparked considerable discussion in procrastination research (Zohar et al., 2019; Perdomo and Feliciano-Garcia, 2020). Cognitive and motivational antecedents complement personality traits, and together they help explain this complex phenomenon.

In addition, various research methods, such as meta-analysis and grounded theory, were applied. Steel's (2007) meta-analysis, which had the strongest citation burst in Period II, elaborated on temporal motivation theory (TMT). TMT provides an innovative foothold for understanding self-regulation failure, using four critical indicators: expectancy, value, sensitivity to delay, and delay itself. Similarly, van Eerde (2003) conducted a meta-analysis of the relationship between procrastination and personality traits and reported that procrastination was negatively related to conscientiousness and self-efficacy but positively associated with self-handicapping. Procrastinators commonly set deadlines, but research has found that external deadlines may be more effective than self-imposed ones (Ariely and Wertenbroch, 2002). Furthermore, Schraw et al. (2007) used grounded theory to construct a paradigm model of academic procrastination covering context and situational conditions, antecedents, phenomena, coping strategies, and consequences. These diverse research methods are building a more comprehensive and systematic understanding of procrastination.
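Temporal motivation theory is often summarized by a formula that combines its four indicators. The sketch below assumes the hyperbolic form, motivation = (expectancy × value) / (1 + impulsiveness × delay), where impulsiveness stands for sensitivity to delay. Presentations of TMT differ in notation, so this is an illustration rather than Steel's (2007) exact specification, and the function name and all numbers are hypothetical.

```python
def tmt_motivation(expectancy, value, impulsiveness, delay):
    """Temporal motivation theory, hyperbolic form (assumed here):
    motivation = (expectancy * value) / (1 + impulsiveness * delay)
    """
    return expectancy * value / (1.0 + impulsiveness * delay)


# Hypothetical numbers: a valuable task with a deadline versus a small immediate distraction
distraction = tmt_motivation(expectancy=1.0, value=2.0, impulsiveness=0.5, delay=0)
for days_left in (30, 10, 3, 1):
    task = tmt_motivation(expectancy=0.8, value=10.0, impulsiveness=0.5, delay=days_left)
    choice = "task" if task > distraction else "distraction"
    print(f"{days_left:>2} days left: task={task:.2f}, "
          f"distraction={distraction:.2f} -> {choice}")
```

With these made-up values, the task wins only once fewer than six days remain (8 / (1 + 0.5·D) > 2 exactly when D < 6), which mirrors the deadline-driven delay that TMT is meant to explain.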

Period III (2010–2020): Diverse Focuses on Procrastination Research

After nearly two decades of progress, procrastination research has entered a steady track, with current bursts on diverse topics such as type distinctions, theoretical perspectives, temporal context, and the typical profile of procrastinators. Steel (2010) revisited the trinity of procrastination (arousal, avoidant, and decisional procrastinators) and, using the Pure Procrastination Scale (PPS) and the Irrational Procrastination Scale (IPS), found no distinct difference among the three types. Regarding research settings, a body of literature has examined academic procrastination in depth, and this literature has experienced a significant citation burst (Kim and Seo, 2015; Steel and Klingsieck, 2016). For example, academic procrastination is more strongly associated with performance among secondary school students than among other age groups.

Notably, theoretical discussion and empirical research have been advancing together. Klingsieck (2013) investigated the systematic characteristics of procrastination research and reviewed the theoretical perspectives used to explain the phenomenon, whereas Steel and Ferrari (2013) portrayed the “typical procrastinator” using the variables of sex, age, marital status, education, community location, and nationality. Looking beyond time control or time perception as a way to define procrastination, Sirois and Pychyl (2013) compared the current self and the future self and proposed that procrastination results from short-term mood repair and emotion regulation, with the consequences borne by the future self. As noted in the introduction, research on procrastination has flourished over the last 10 years, and knowledge about this complex phenomenon has been expanding.

Structural Variation Analysis

Structural variation analysis (SVA) aims to identify newly published literature with potential transformative power. Proposed by Chen (2012), SVA includes three primary metrics, namely the modularity change rate, cluster linkage, and centrality divergence, to monitor and discern the potential of newly published articles in specific domains. The modularity change rate measures changes in the interconnectivity of the overall structure when newly published articles are introduced into the intellectual network. Cluster linkage measures the difference in between-cluster linkage before and after an article adds new links between clusters, whereas centrality divergence measures the change in the distribution of betweenness centrality that a newly published article causes (Chen, 2012; Hou et al., 2020). The higher these values, the more potential the newly published articles are expected to have to transform the intellectual base (Hou et al., 2020). Cluster linkage in particular is a direct measure of intellectual potential and structural change (Chen, 2012), so we adopted it as the indicator for recognizing and predicting valuable ideas in newly published procrastination research. The 20 articles published during 2016–2020 with the highest transformative potential are listed in the Supplementary Material. Their content falls into four main dimensions.
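Chen's (2012) cluster linkage metric is defined over the whole network. The sketch below is only a simplified proxy for the intuition behind it: it counts how many new co-citation links a newly published article would create between references in different clusters, assuming every cited reference already has a cluster assignment. It is not Chen's formula, and the function name, cluster assignments, and reference labels are hypothetical.

```python
from itertools import combinations


def new_between_cluster_links(cited_refs, cluster_of, existing_links):
    """Count co-citation links a new article would add between different clusters.

    A simplified proxy for cluster linkage, not Chen's (2012) exact formula.
    cited_refs: references cited by the new article
    cluster_of: maps each reference to its cluster ID
    existing_links: set of sorted (ref, ref) pairs already linked in the network
    """
    added = 0
    # Citing two references together creates a co-citation link between them
    for a, b in combinations(sorted(set(cited_refs)), 2):
        if cluster_of[a] != cluster_of[b] and (a, b) not in existing_links:
            added += 1
    return added


# Hypothetical network state and two hypothetical new articles
cluster_of = {"A": 0, "B": 0, "C": 1, "D": 2}
existing_links = {("A", "C")}
papers = {"paper X": ["A", "B"], "paper Y": ["A", "C", "D"]}

ranked = sorted(
    papers,
    key=lambda p: new_between_cluster_links(papers[p], cluster_of, existing_links),
    reverse=True,
)
print(ranked)  # ['paper Y', 'paper X']
```

Paper X only connects references within one cluster, while paper Y bridges three clusters with two previously absent links, so it ranks as more potentially transformative under this proxy.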

Further Investigations Into Academic Procrastination

Although procrastination research has drawn mostly on student samples, innovative content and methods have been emerging that deepen our understanding of academic procrastination. In the past five years, versions of scales in different languages have been adapted and validated (Garzon Umerenkova and Gil-Flores, 2017a, b; Svartdal, 2017; Guilera et al., 2018), and novel research questions have arisen, such as how gender differences influence academic procrastination, which means of intervention are effective, and how academic procrastination, person-environment fit, and academic achievement are associated (Balkis and Duru, 2016; Garzon Umerenkova and Gil-Flores, 2017a, b; Goroshit, 2018). Interestingly, research has found that female students procrastinate academically less often and achieve better academic results than male students (Balkis and Duru, 2017; Perdomo and Feliciano-Garcia, 2020).

In addition, academic procrastination is viewed as a fluid process. Considering the behavior holistically, researchers have discussed three aspects of task engagement: initiation, completion, and pursuit. Vangsness and Young (2020) proposed the metaphors of “turtles” (steady workers), “task ninjas” (precrastinators), and “time wasters” (procrastinators) to describe task-completion strategies when working toward deadlines. Individual differences and task characteristics can influence the choice of task-completion strategy. To capture this fluid and multifaceted phenomenon, longitudinal studies have begun to appear. For example, Wessel et al. (2019) tracked behavioral delay on an undergraduate assignment over two weeks to reveal how passive and active procrastination each affected assignment completion.

Relationships Between Procrastination and Diverse Personality Traits

In addition to the five-factor model, other personality traits, such as temperament, character, emotional intelligence, impulsivity, and motivation, have been investigated in connection with procrastination. Because the five-factor model does not distinguish earlier-developing temperamental tendencies from later-developing character traits, Zohar et al. (2019) examined how temperament and character influence active and passive procrastination and found that a dependable temperament profile and well-developed character predicted active procrastination.

Procrastination is commonly defined as a self-regulation failure involving both emotion and behavior. Emotional intelligence (EI) reflects the ability to monitor one's feelings, thinking, and actions, and its relationship with procrastination has recently attracted considerable discussion. Sheybani et al. (2017) examined how emotional intelligence, together with the five-factor traits, relates to decisional procrastination in a student sample. Complementing this work, Wypych et al. (2018) used path analysis to explore the roles of impulsivity, motivation, and emotion regulation in procrastination. Motivational and impulsivity-related factors, including lack of value, delay discounting, and lack of perseverance, predict procrastination, whereas emotion regulation, especially suppression, appeared to be significant only in student and other younger groups. How personality traits influence procrastination remains controversial, and further research is needed.

Procrastination in Different Life-Domains and Settings

Newly published research pays more attention to procrastination in different groups across the entire lifespan. Beyond student samples, studies of procrastination among groups such as teachers, educated adults, and workers have been emerging. With regard to life domains, among highly educated adults procrastination tends to be high in self-oriented domains, including health and leisure time, and low in parenting. Achievement-oriented domains (career, education, and finances) show moderate levels of procrastination, yet procrastination in these three domains, together with health, affects life the most (Hen and Goroshit, 2018). Similarly, Tibbett and Ferrari (2019) investigated the main domains of regret in cross-cultural samples to determine which factors increased the likelihood of identifying oneself as a procrastinator. They found that regrets tied to earning potential, such as education, finances, and career, led participants to label themselves as procrastinators more readily. Because procrastination can lead to regret, this research worked backward from regret to discuss the antecedents of procrastination.

Beyond academic procrastination, the behavior in diverse settings has begun to draw scholars' attention. Nauts et al. (2019) used a qualitative study to investigate why people delay their bedtime and identified three forms of bedtime procrastination: deliberate procrastination, mindless procrastination, and strategic delay. The researchers then proposed coached interventions involving time management, priority-setting skills, and reminders, tailored to these forms of bedtime procrastination. Interestingly, novel forms of procrastination have arisen in the attention-scarce environment of the internet age, such as social media self-control failure (SMSCF). Du et al. (2019) found that habitual checking, ubiquity, and notifications were determinants of self-control failure in social media use, a finding that offers insight into how to use ICTs better in a media-pervasive environment. Beyond these everyday settings, procrastination in the workplace has also been explored further; Hen (2018) emphasized professional role ambiguity as a factor underlying procrastination. Classifying procrastination contexts is important for effective intervention and gives us a better understanding of this multifaceted behavior.

Interventions for Procrastination

Overcoming procrastination is an essential topic. Procrastination is prevalent and stable across situations, and it is commonly detrimental to performance and general well-being. Various types of interventions are used, such as time management, self-management, and cognitive behavioral therapy. To examine their effectiveness, scholars have used longitudinal studies or field experiments. Rozental et al. (2017) examined the efficacy of internet-based cognitive behavior therapy (ICBT) for reducing procrastination in a clinical trial: through a one-year follow-up in a randomized controlled trial, they found that ICBT could help relieve severe, chronic procrastination. Taking the temporal context into consideration, Visser et al. (2017) discussed a strengths-based approach, one element of the cognitive behavioral approach, that was more useful for students at an early stage of their studies than at later stages. Overall, research on the effectiveness of interventions for procrastination remains relatively scarce.

Discussion and Conclusion

Discussion of Procrastination Research

This article provides a systematic bibliometric analysis of procrastination research over the past 30 years. With the help of CiteSpace and VOSviewer, the study identifies the category distribution, keyword co-occurrences, main research clusters, and intellectual structure of the field. As shown in Figure 6, research has focused primarily on the definition and classification of procrastination, the relationships between procrastination and personality traits, the consequences of procrastination, and how best to intervene in this complex phenomenon.
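Keyword co-occurrence, one of the analyses named above, rests on a simple count: two keywords co-occur when both are assigned to the same record. The following is a minimal Python sketch, assuming each bibliographic record is reduced to a list of author keywords; the `keyword_cooccurrence` helper and the sample records are hypothetical, and mapping tools such as CiteSpace and VOSviewer typically add their own normalization and thresholding on top of raw counts like these.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(records):
    """Count how often each unordered keyword pair appears in the same record.

    records: iterable of keyword lists (hypothetical input format).
    Returns a Counter keyed by alphabetically ordered (keyword, keyword) tuples.
    """
    pairs = Counter()
    for keywords in records:
        # Normalize case/whitespace and drop duplicates within one record.
        unique = sorted({k.strip().lower() for k in keywords})
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical records, not data from this study.
records = [
    ["Procrastination", "self-efficacy", "academic performance"],
    ["procrastination", "self-regulation", "academic performance"],
    ["procrastination", "Self-Efficacy"],
]
pair_counts = keyword_cooccurrence(records)
print(pair_counts.most_common(3))
```

Sorting each record's keywords before pairing means ("a", "b") and ("b", "a") are never counted separately, so each undirected link in the resulting keyword network has a single weight.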

Figure 6. Bibliometric analysis and science map of the literature on procrastination.

These themes form the basis of procrastination research, but understanding how this basis was constructed is important for the development of future research. Therefore, this article primarily discusses three aspects of the intellectual structure of procrastination research: betweenness centrality, burst detection, and structural variation analysis. The betweenness centrality analysis identifies three research themes, which can be summarized as the definition and classification of procrastination, procrastination behavior in a temporal context, and the causes of and interventions for procrastination.
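Betweenness centrality (Freeman, 1978) scores a node by the share of shortest paths between other node pairs that pass through it, so in a co-citation network a high score flags a reference that bridges otherwise separate clusters. The sketch below computes it with Brandes' algorithm for an unweighted, undirected graph; it is an illustration under those assumptions rather than CiteSpace's own implementation, and the six-node graph and function name are hypothetical.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    graph: dict mapping each node to a list of its neighbours (symmetric).
    Returns raw betweenness: for each node v, the sum over unordered pairs
    {s, t} with s != v != t of the share of s-t shortest paths through v.
    """
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []                          # nodes in non-decreasing distance
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:             # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies from the farthest nodes towards s.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is counted once from each endpoint.
    return {v: c / 2.0 for v, c in bc.items()}


# Hypothetical co-citation graph: two triangles joined by the edge C-D.
co_citation = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
scores = betweenness_centrality(co_citation)

# Normalize to [0, 1] by the number of node pairs that exclude the node itself.
n = len(co_citation)
normalized = {v: c / ((n - 1) * (n - 2) / 2) for v, c in scores.items()}
print(scores, normalized)
```

Here C and D each receive a raw score of 6 (0.6 after normalization) because every shortest path between the two triangles crosses the C-D edge, while the other four nodes score 0: they are the structural "bridges" a betweenness analysis is meant to surface.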

However, procrastination research themes evolved significantly from 1990 to 2020. The burst detection analysis shows that research attention concentrated on different themes at different times. In the initial stage, research mainly addressed the antecedents of procrastination from the perspectives of time management, self-regulation failure, and the five-factor model, paying more attention to the behavior itself, such as delays in time. Subsequently, discussion focused on how cognitive and motivational factors, such as goal orientation, perceived self-efficacy, self-handicapping, and self-regulated learning strategies, influence procrastination. In the most recent 10 years, research has expanded to diverse themes, such as theoretical perspectives, typical profiles of procrastinators, and procrastination behavior in diverse temporal contexts. Procrastination research has been gaining increasing attention from scholars and practitioners.
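Burst detection of this kind follows Kleinberg (2003), who models a stream as a hidden automaton that switches between a baseline state and an elevated "burst" state. The sketch below is a minimal two-state version in the spirit of Kleinberg's batched model, decoded with the Viterbi algorithm: each period's count is scored by its binomial negative log-likelihood under the baseline rate or a rate `s` times higher, entering the burst state costs `gamma * ln(n)`, and leaving it is free. The function name, the parameter values, and the yearly counts are illustrative assumptions, not the settings used in this study or in CiteSpace.

```python
import math


def kleinberg_bursts(r, d, s=2.0, gamma=1.0):
    """Two-state burst detection in the spirit of Kleinberg's batched model.

    r[t]: number of relevant records in period t (e.g. papers using a keyword)
    d[t]: total number of records in period t
    Returns one state per period: 1 = burst, 0 = baseline.
    """
    n = len(r)
    if n == 0 or sum(d) == 0:
        return [0] * n
    p0 = sum(r) / sum(d)                 # baseline rate over the whole stream
    if not 0 < p0 < 1:
        return [0] * n                   # no variation to detect
    p1 = min(s * p0, 0.9999)             # elevated rate, kept below 1
    up_cost = gamma * math.log(n)        # cost of entering the burst state

    def emit(p, rt, dt):
        # Negative log-likelihood of a binomial count; the binomial
        # coefficient is the same for both states at a given t, so it is omitted.
        return -(rt * math.log(p) + (dt - rt) * math.log(1 - p))

    # Viterbi decoding; the automaton starts in the baseline state.
    cost = [0.0, math.inf]
    back = []                            # back[t][j]: best predecessor of state j
    for rt, dt in zip(r, d):
        e0, e1 = emit(p0, rt, dt), emit(p1, rt, dt)
        # State 0 can be reached from either state (leaving a burst is free).
        prev0 = 0 if cost[0] <= cost[1] else 1
        c0 = cost[prev0] + e0
        # State 1: stay in the burst, or pay up_cost to enter it from state 0.
        if cost[1] <= cost[0] + up_cost:
            prev1, c1 = 1, cost[1] + e1
        else:
            prev1, c1 = 0, cost[0] + up_cost + e1
        back.append((prev0, prev1))
        cost = [c0, c1]

    # Trace the cheapest state sequence backwards.
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for prev0, prev1 in reversed(back):
        states.append(state)
        state = prev0 if state == 0 else prev1
    return states[::-1]


# Hypothetical yearly counts: papers mentioning a keyword out of 100 per year.
yearly_hits = [5, 4, 6, 5, 20, 25, 22, 5, 4, 6]
yearly_totals = [100] * 10
states = kleinberg_bursts(yearly_hits, yearly_totals)
print(states)
```

With a baseline rate of about 0.10, the three consecutive years with 20 or more matching papers are flagged as one burst. Raising `gamma` makes entering a burst more expensive, so only longer or stronger surges are reported.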

To explore newly published articles and their transformative potential, we conducted a structural variation analysis. Beyond traditional research on academic procrastination, emerging research themes span diverse life domains, such as bedtime procrastination, social media self-control failure, procrastination in the workplace, and comparisons of procrastination between self-oriented and achievement-oriented domains. Furthermore, as the antecedents of procrastination have been investigated further, novel clinically and cognitively oriented interventions have emerged, such as internet-based cognitive behavior therapy (ICBT) and the strengths-based approach.

Conclusions and Limitations

In summary, research on procrastination gained increasing attention from 1990 to 2020. As summarized in Figure 7, research themes have covered the definition, classification, antecedents, consequences, interventions, and diverse forms of procrastination across different life domains and contexts. Furthermore, empirical research has been conducted to understand this complex and multifaceted behavior, including how best to design controlled trials and how to collect and analyze the data.

Figure 7. Brief conclusions on procrastination research.

From the perspective of knowledge development, research on procrastination has expanded tremendously over the last 30 years. Three notable features describe this evolution.

First, the research focus is moving from broad topics to more specific issues. Prior research mostly explored the definition and antecedents of procrastination, as well as the relationship between personality traits and procrastination. In addition, earlier procrastination research drew almost exclusively on student samples. Building on this work, newer research has begun to shed light on procrastination in specific domains, such as work procrastination, bedtime procrastination, and the interaction between problematic new media use and procrastination (Hen, 2018; Nauts et al., 2019; Przepiorka et al., 2021). As research has extended to distinct contexts, more details and core aspects of procrastination have been elaborated. For example, procrastination in the workplace may be associated with professional role ambiguity, abusive supervision, workplace ostracism, and task characteristics (Hen, 2018; He et al., 2021; Levin and Lipshits-Braziler, 2021). In particular, information and communication technologies (ICTs) now offer ample temptations that distract our attention, and these distractions can exacerbate procrastination (Du et al., 2019; Hong et al., 2021). Therefore, how to identify these different forms of procrastination, and then how to reduce their adverse outcomes, will be important topics for discussion.

Second, the antecedents and consequences of procrastination have been explored further over time. On the one hand, how procrastination arises has prompted lively discussion across diverse dimensions, including, successively, time management, personality traits, contextual characteristics, and motivational and cognitive factors. Interestingly, investigations of neural evidence on procrastination have been emerging, such as the hippocampal-striatal and amygdala-insula mechanisms underlying procrastination (Zhang et al., 2021). These antecedents can be divided into internal and external factors. Internal factors, including character traits and cognitive maladjustments, have been studied thoroughly, but there has been scant discussion of how external factors, such as task characteristics, peers' situations, and environmental conditions, influence procrastination (Harris and Sutton, 1983; He et al., 2021). On the other hand, the high prevalence of procrastination makes it important to identify its negative consequences, both direct and indirect. Prior research paid more attention to direct consequences, such as low performance, poor productivity, stress, and illness, but the indirect consequences of procrastination remain unclear. For example, "second-hand" procrastination vividly describes the "spillover effect" of procrastination, exemplified by an employee who often works harder to compensate for the lost productivity of a procrastinating coworker (Pychyl and Flett, 2012). Although such phenomena are common, their adverse outcomes are less well investigated. Targeted discussions of the external antecedents and indirect consequences of procrastination, taking into account the contexts and groups involved, are needed.

Third, empirical research on procrastination places greater emphasis on validity. Previous research included relatively few longitudinal studies. However, procrastination is dynamic, so when most studies examine student samples over just one semester or several weeks, both the overall view of procrastination and the validity of the conclusions may be limited. As the field has developed, more and more longitudinal studies have appeared that examine the long-term effects of procrastination, for example through behavioral observation. In addition, research designs and data-collection methods have gradually evolved. Self-report was the dominant data-collection method in prior research, and procrastination was usually measured with various scales. However, self-reported data are often distorted by personal biases and may not reflect the actual situation, even overestimating the level of procrastination (Kim and Seo, 2015; Goroshit, 2018). Hence, innovative studies have begun to use field experimental designs, such as randomized controlled trials, to obtain observed data. Future research should investigate and refine ways to combine self-reported and observed data effectively.

This bibliometric analysis of procrastination is expected to provide an overall perspective for future research. However, certain limitations merit mention. Owing to page limits, it is difficult to discuss the related articles in detail, so the discussion tends to be heuristic. Furthermore, the data for this research come from the Web of Science database, and applying the same strategy to a different database might have yielded different results. In the future, we will conduct a systematic analysis using diverse databases to detect pivotal articles in procrastination research.

Data Availability Statement

Author Contributions

BY proposed the research question and designed the research. XZ analyzed the data and wrote the primary manuscript. On the basis of that work, the two authors discussed and revised the final manuscript together.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2022.809044/full#supplementary-material

  • Aalbers G., vanden Abeele M. M., Hendrickson A. T., de Marez L., Keijsers L. (2022). Caught in the moment: are there person-specific associations between momentary procrastination and passively measured smartphone use? Mobile Media Commun. 10, 115–135. 10.1177/2050157921993896
  • Akerlof G. A. (1991). Procrastination and obedience. Am. Econ. Rev. 81, 1–19.
  • Ariely D., Wertenbroch K. (2002). Procrastination, deadlines, and performance: self-control by precommitment. Psychol. Sci. 13, 219–224. 10.1111/1467-9280.00441
  • Baker R., Evans B., Li Q., Cung B. (2019). Does inducing students to schedule lecture watching in online classes improve their academic performance? An experimental analysis of a time management intervention. Res. Higher Educ. 60, 521–552. 10.1007/s11162-018-9521-3
  • Balkis M. (2013). Academic procrastination, academic life satisfaction and academic achievement: the mediation role of rational beliefs about studying. J. Cogn. Behav. Psychother. 13, 57–74.
  • Balkis M., Duru E. (2017). Gender differences in the relationship between academic procrastination, satisfaction with academic life and academic performance. Electr. J. Res. Educ. Psychol. 15, 105–125. 10.25115/ejrep.41.16042
  • Balkis M., Duru E. (2016). The analysis of relationships among person-environment fit, academic satisfaction, procrastination and academic achievement. Univ. J. Educ. 39, 119–129.
  • Bolden J., Fillauer J. P. (2020). “Tomorrow is the busiest day of the week”: executive functions mediate the relation between procrastination and attention problems. J. Am. College Health 68, 854–863. 10.1080/07448481.2019.1626399
  • Broadbent J., Poon W. L. (2015). Self-regulated learning strategies & academic achievement in online higher education learning environments: a systematic review. Inter Higher Educ. 27, 1–13. 10.1016/j.iheduc.2015.04.007
  • Chen C. (2006). CiteSpace II: detecting and visualizing emerging trends and transient patterns in scientific literature. J. Am. Soc. Inf. Sci. Technol. 57, 359–377. 10.1002/asi.20317
  • Chen C. (2012). Predictive effects of structural variation on citation counts. J. Am. Soc. Inf. Sci. Technol. 63, 431–449. 10.1002/asi.21694
  • Chen C. (2013). Hindsight, insight, and foresight: a multi-level structural variation approach to the study of a scientific field. Technol. Analy. Strat. Manage. 25, 619–640. 10.1080/09537325.2013.801949
  • Chen C. (2017). Science mapping: a systematic review of the literature. J. Data Inf. Sci. 2, 1–40. 10.1515/jdis-2017-0006
  • Chen C. M., Hu Z. G., Liu S. B., Tseng H. (2012). Emerging trends in regenerative medicine: a scientometric analysis in CiteSpace. Expert Opin. Biol. Therapy 12, 593–608. 10.1517/14712598.2012.674507
  • Chen C. M., Ibekwe-SanJuan F., Hou J. (2010). The structure and dynamics of co-citation clusters: a multiple-perspective co-citation analysis. J. Am. Soc. Inf. Sci. Technol. 61, 1386–1409. 10.1002/asi.21309
  • Chu A. H. C., Choi J. N. (2005). Rethinking procrastination: positive effects of “active” procrastination behavior on attitudes and performance. J. Soc. Psychol. 145, 245–264. 10.3200/SOCP.145.3.245-264
  • Corkin D. M., Yu S. L., Lindt S. F. (2011). Comparing active delay and procrastination from a self-regulated learning perspective. Learn. Indiv. Differ. 21, 602–606. 10.1016/j.lindif.2011.07.005
  • Diaz-Morales J. F., Ferrari J. R., Diaz K., Argumedo D. (2006). Factorial structure of three procrastination scales with a Spanish adult population. Eur. J. Psychol. Assess. 22, 132–137. 10.1027/1015-5759.22.2.132
  • Du J., Kerkhof P., van Koningsbruggen G. M. (2019). Predictors of social media self-control failure: immediate gratifications, habitual checking, ubiquity, and notifications. Cyber Psychol. Behav. Soc. Network. 22, 477–485. 10.1089/cyber.2018.0730
  • Duru E., Balkis M. (2017). Procrastination, self-esteem, academic performance, and well-being: a moderated mediation model. Int. J. Educ. Psychol. 6, 97–119. 10.17583/ijep.2017.2584
  • Elliot A. J., Harackiewicz J. M. (1996). Approach and avoidance achievement goals and intrinsic motivation: a mediational analysis. J. Personal. Soc. Psychol. 70, 461–475. 10.1037/0022-3514.70.3.461
  • Fang Y., Yin J., Wu B. (2018). Climate change and tourism: a scientometric analysis using CiteSpace. J. Sustain. Tour. 26, 108–126. 10.1080/09669582.2017.1329310
  • Ferrari J. R. (1991). Compulsive procrastination: some self-reported characteristics. Psychol. Reports 68, 455–458. 10.2466/pr0.1991.68.2.455
  • Ferrari J. R. (1992). Psychometric validation of two procrastination inventories for adults: arousal and avoidance measures. J. Psychopathol. Behav. Assess. 14, 97–110. 10.1007/BF00965170
  • Ferrari J. R., Diaz-Morales J. F., O'Callaghan J., Diaz K., Argumedo D. (2007). Frequent behavioral delay tendencies by adults: international prevalence rates of chronic procrastination. J. Cross-Cultural Psychol. 38, 458–464. 10.1177/0022022107302314
  • Ferrari J. R., Johnson J. L., McCown W. G. (1995). Procrastination and Task Avoidance: Theory, Research, and Treatment. Springer US. 10.1007/978-1-4899-0227-6
  • Flett G. L., Stainton M., Hewitt P. L., Sherry S. B., Lay C. (2012). Procrastination automatic thoughts as a personality construct: an analysis of the procrastinatory cognitions inventory. J. Rational-Emot. Cogn. Behav. Therapy 30, 223–236. 10.1007/s10942-012-0150-z
  • Freeman L. (1978). Centrality in social networks: conceptual clarification. Soc. Netw. 1, 215–239. 10.1016/0378-8733(78)90021-7
  • Garfield E. (1979). Is citation analysis a legitimate evaluation tool? Scientometrics 1, 359–375. 10.1007/BF02019306
  • Garzon Umerenkova A., Gil-Flores J. (2017a). Psychometric properties of the Spanish version of the test procrastination assessment scale-students (PASS). Rev. Iberoamericana De Diagnostico Y Evaluacion-E Avaliacao Psicol. 1, 149–163. 10.21865/RIDEP43_149
  • Garzon Umerenkova A., Gil-Flores J. (2017b). Academic procrastination in non-traditional college students. Electronic J. Res. Educ. Psychol. 15, 510–531. 10.14204/ejrep.43.16134
  • Germeijs V., Verschueren K. (2011). Indecisiveness and big five personality factors: relationship and specificity. Person. Indiv. Differ. 50, 1023–1028. 10.1016/j.paid.2011.01.017
  • Glick D. M., Orsillo S. M. (2015). An investigation of the efficacy of acceptance-based behavioral therapy for academic procrastination. J. Experim. Psychol. General 144, 400–409. 10.1037/xge0000050
  • Goroshit M. (2018). Academic procrastination and academic performance: an initial basis for intervention. J. Prevent. Intervent. Commun. 46, 131–142. 10.1080/10852352.2016.1198157
  • Green M. C., Visser P. S., Tetlock P. E. (2000). Coping with accountability cross-pressures: low-effort evasive tactics and high-effort quests for complex compromises. Personal. Soc. Psychol. Bull. 26, 1380–1391. 10.1177/0146167200263006
  • Guilera G., Barrios M., Penelo E., Morin C., Steel P., Gomez-Benito J. (2018). Validation of the Spanish version of the irrational procrastination scale (IPS). PLoS ONE 13, 1–11. 10.1371/journal.pone.0190806
  • Gustavson D. E., Miyake A., Hewitt J. K., Friedman N. P. (2014). Genetic relations among procrastination, impulsivity, and goal-management ability: implications for the evolutionary origin of procrastination. Psychol. Sci. 25, 1178–1188. 10.1177/0956797614526260
  • Harris N. N., Sutton R. I. (1983). Task procrastination in organizations: a framework for research. Human Relat. 36, 987–995. 10.1177/001872678303601102
  • He Q., Wu M., Wu W., Fu J. (2021). The effect of abusive supervision on employees' work procrastination behavior. Front. Psychol. 12, 596704. 10.3389/fpsyg.2021.596704
  • Hen M. (2018). Causes for procrastination in a unique educational workplace. J. Prevent. Intervent. Commun. 46, 215–227. 10.1080/10852352.2018.1470144
  • Hen M., Goroshit M. (2018). General and life-domain procrastination in highly educated adults in Israel. Front. Psychol. 9, 1173. 10.3389/fpsyg.2018.01173
  • Hong W., Liu R. D., Ding Y., Jiang S. Y., Yang X. T., Sheng X. T. (2021). Academic procrastination precedes problematic mobile phone use in Chinese adolescents: a longitudinal mediation model of distraction cognitions. Addic. Behav. 121, 106993. 10.1016/j.addbeh.2021.106993
  • Hou J. H., Yang X. C., Chen C. M. (2020). Measuring researchers' potential scholarly impact with structural variations: four types of researchers in information science (1979-2018). PLoS ONE 15, e0234347. 10.1371/journal.pone.0234347
  • Howell A. J., Watson D. C. (2007). Procrastination: associations with achievement goal orientation and learning strategies. Personal. Indiv. Differ. 43, 167–178. 10.1016/j.paid.2006.11.017
  • Hubner K. (2012). German crisis management and leadership – from ignorance to procrastination to action. Asia Eur. J. 9, 159–177. 10.1007/s10308-012-0313-7
  • Jiang Y., Ritchie B. W., Benckendorff P. (2019). Bibliometric visualization: an application in tourism crisis and disaster management research. Curr. Issues Tour. 22, 1925–1957. 10.1080/13683500.2017.1408574
  • Johnson J. L., Bloom A. M. (1995). An analysis of the contribution of the five factors of personality to variance in academic procrastination. Personal. Indiv. Differ. 18, 127–133. 10.1016/0191-8869(94)00109-6
  • Kim K. R., Seo E. H. (2015). The relationship between procrastination and academic performance: a meta-analysis. Personal. Indiv. Differ. 82, 26–33. 10.1016/j.paid.2015.02.038
  • Kim S., Fernandez S., Terrier L. (2017). Procrastination, personality traits, and academic performance: when active and passive procrastination tell a different story. Personal. Indiv. Differ. 108, 154–157. 10.1016/j.paid.2016.12.021
  • Klassen R. M., Krawchuk L. L., Rajani S. (2008). Academic procrastination of undergraduates: low self-efficacy to self-regulate predicts higher levels of procrastination. Contemp. Educ. Psychol. 33, 915–931. 10.1016/j.cedpsych.2007.07.001
  • Klein E. (1971). A comprehensive etymological dictionary of the Hebrew language for readers of English. Tyndale House Publishers.
  • Kleinberg J. (2003). Bursty and hierarchical structure in streams. Data Mining Knowl. Disc. 7, 373–397. 10.1023/A:1024940629314
  • Klingsieck K. B. (2013). Procrastination: when good things don't come to those who wait. Eur. Psychol. 18, 24–34. 10.1027/1016-9040/a000138
  • Lay C. (1986). At last, my research article on procrastination. J. Res. Personal. 20, 474–495. 10.1016/0092-6566(86)90127-3
  • Lay C. (1988). The relation of procrastination and optimism to judgments of time to complete an essay and anticipation of setbacks. J. Soc. Behav. Personal. 3, 201–214.
  • Legood A., Lee A., Schwarz G., Newman A. (2018). From self-defeating to other defeating: examining the effects of leader procrastination on follower work outcomes. J. Occup. Organiz. Psychol. 91, 430–439. 10.1111/joop.12205
  • Levin N., Lipshits-Braziler Y. (2021). Facets of adaptability in career decision-making. Int. J. Educ. Vocat. Guidance 6, 1–12. 10.1007/s10775-021-09489-w
  • Malouff J. M., Schutte N. S. (2019). The efficacy of interventions aimed at reducing procrastination: a meta-analysis of randomized controlled trials. J. Counsel. Develop. 97, 117–127. 10.1002/jcad.12243
  • Markscheffel B., Schroeter F. (2021). Comparison of two science mapping tools based on software technical evaluation and bibliometric case studies. Collnet J. Scientometr. Inf. Manage. 15, 365–396. 10.1080/09737766.2021.1960220
  • Metin U. B., Taris T. W., Peeters M. C. W. (2016). Measuring procrastination at work and its associated workplace aspects. Personal. Indiv. Differ. 101, 254–263. 10.1016/j.paid.2016.06.006
  • Miraj S. A. (2020). Coronavirus disease 2019: the public health challenge and our preparedness. Biosci. Biotechnol. Res. Commun. 13, 361–364. 10.21786/bbrc/13.2/1
  • Nauts S., Kamphorst B. A., Stut W., De Ridder D. T. D., Anderson J. H. (2019). The explanations people give for going to bed late: a qualitative study of the varieties of bedtime procrastination. Behav. Sleep Med. 17, 753–762. 10.1080/15402002.2018.1491850
  • Nevill C. J. (2009). Managing cumulative impacts: groundwater reform in the Murray-Darling basin, Australia. Water Res. Manage. 23, 2605–2631. 10.1007/s11269-009-9399-0
  • Olmeda-Gomez C., Roma-Mateo C., Ovalle-Perandones M. A. (2019). Overview of trends in global epigenetic research (2009-2017). Scientometrics 119, 1545–1574. 10.1007/s11192-019-03095-y
  • Pan W. W., Jian L. R., Liu T. (2019). Grey system theory trends from 1991 to 2018: a bibliometric analysis and visualization. Scientometrics 121, 1407–1434. 10.1007/s11192-019-03256-z
  • Perdomo A. S., Feliciano-Garcia L. (2020). The influence of active procrastination: a profile on educational sciences students' academic achievement. Bordon-Rev. De Pedagogia 72, 157–170. 10.13042/Bordon.2020.73642
  • Przepiorka A., Blachnio A., Cudo A. (2021). Procrastination and problematic new media use: the mediating role of future anxiety. Curr. Psychol. 5, 1–10. 10.1007/s12144-021-01773-w
  • Pychyl T. A., Flett G. L. (2012). Procrastination and self-regulatory failure: an introduction to the special issue. J. Rational-Emot. Cogn. Behav. Therapy 30, 203–212. 10.1007/s10942-012-0149-5
  • Rozental A., Forsell E., Svensson A., Andersson G., Carlbring P. (2017). Overcoming procrastination: one-year follow-up and predictors of change in a randomized controlled trial of Internet-based cognitive behavior therapy. Cogn. Behav. Therapy 46, 177–195. 10.1080/16506073.2016.1236287
  • Schouwenburg H. C., Lay C. H. (1995). Trait procrastination and the Big-five factors of personality. Personal. Indiv. Differ. 18, 481–490. 10.1016/0191-8869(94)00176-S
  • Schraw G., Wadkins T., Olafson L. (2007). Doing the things we do: a grounded theory of academic procrastination. J. Educ. Psychol. 99, 12–25. 10.1037/0022-0663.99.1.12
  • Sheybani F., Gharraee B., Bakhshizadeh M., Tamanaeefar S. (2017). Decisional procrastination: prevalence among students and relationship with emotional intelligence and big five-factor model of personality. Int. J. Life Sci. Pharma Res. 7, 26–32.
  • Sirois F., Pychyl T. (2013). Procrastination and the priority of short-term mood regulation: consequences for future self. Soc. Personal. Psychol. Compass 7, 115–127. 10.1111/spc3.12011
  • Sirois F. M. (2004). Procrastination and intentions to perform health behaviors: the role of self-efficacy and the consideration of future consequences. Personal. Indiv. Differ. 37, 115–128. 10.1016/j.paid.2003.08.005
  • Sirois F. M. (2021). Trait procrastination undermines outcome and efficacy expectancies for achieving health-related possible selves. Curr. Psychol. 40, 3840–3847. 10.1007/s12144-019-00338-2
  • Sirois F. M., Molnar D. S., Hirsch J. K. (2017). A meta-analytic and conceptual update on the associations between procrastination and multidimensional perfectionism. Eur. J. Personal. 31, 137–159. 10.1002/per.2098
  • Steel P. (2007). The nature of procrastination: a meta-analytic and theoretical review of quintessential self-regulatory failure. Psychol. Bull. 133, 65–94. 10.1037/0033-2909.133.1.65
  • Steel P. (2010). Arousal, avoidant and decisional procrastinators: do they exist? Personal. Indiv. Differ. 48, 926–934. 10.1016/j.paid.2010.02.025
  • Steel P., Ferrari J. (2013). Sex, education and procrastination: an epidemiological study of procrastinators' characteristics from a global sample. Eur. J. Personal. 27, 51–58. 10.1002/per.1851
  • Steel P., Klingsieck K. B. (2016). Academic procrastination: psychological antecedents revisited. Austral. Psychol. 51, 36–46. 10.1111/ap.12173
  • Svartdal F. (2017). Measuring procrastination: psychometric properties of the Norwegian versions of the irrational procrastination scale (IPS) and the pure procrastination scale (PPS). Scand. J. Educ. Res. 61, 18–30. 10.1080/00313831.2015.1066439
  • Tao X., Hanif H., Ahmed H. H., Ebrahim N. A. (2021). Bibliometric analysis and visualization of academic procrastination. Front. Psychol. 12, 722332. 10.3389/fpsyg.2021.722332
  • Tibbett T., Ferrari J. (2019). Return to the origin: what creates a procrastination identity? Curr. Issues Personal. Psychol. 7, 1–7. 10.5114/cipp.2018.75648
  • Tice D. M., Baumeister R. F. (1997). Longitudinal study of procrastination, performance, stress, and health: the costs and benefits of dawdling. Psychol. Sci. 8, 454–458. 10.1111/j.1467-9280.1997.tb00460.x
  • van Eck N. J., Waltman L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 84, 523–538. 10.1007/s11192-009-0146-3
  • van Eerde W. (2003). A meta-analytically derived nomological network of procrastination. Personal. Indiv. Differ. 35, 1401–1418. 10.1016/S0191-8869(02)00358-6
  • Vangsness L., Young M. E. (2020). Turtle, task ninja, or time waster? Who cares? Traditional task-completion strategies are overrated. Psychol. Sci. 31, 306–315. 10.1177/0956797619901267
  • Visser L., Schoonenboom J., Korthagen F. A. J. (2017). A field experimental design of a strengths-based training to overcome academic procrastination: short- and long-term effect. Front. Psychol. 8, 1949. 10.3389/fpsyg.2017.01949
  • Wessel J., Bradley G. L., Hood M. (2019). Comparing effects of active and passive procrastination: a field study of behavioral delay. Personal. Indiv. Differ. 139, 152–157. 10.1016/j.paid.2018.11.020
  • Widziewicz-Rzonca K., Tytla M. (2020). First systematic review on PM-bound water: exploring the existing knowledge domain using the CiteSpace software. Scientometrics 124, 1945–2008. 10.1007/s11192-020-03547-w
  • Wolters C. A. (2003). Understanding procrastination from a self-regulated learning perspective. J. Educ. Psychol. 95, 179–187. 10.1037/0022-0663.95.1.179
  • Wypych M., Matuszewski J., Dragan W. Ł. (2018). Roles of impulsivity, motivation, and emotion regulation in procrastination – path analysis and comparison between students and non-students. Front. Psychol. 9, 891. 10.3389/fpsyg.2018.00891
  • Yi Y. T., Luo J. S., Wubbenhorst M. (2020). Research on political instability, uncertainty and risk during 1953-2019: a scientometric review. Scientometrics 123, 1051–1076. 10.1007/s11192-020-03416-6
  • Zhang S. M., Verguts T., Zhang C. Y., Feng P., Chen Q., Feng T. Y. (2021). Outcome value and task aversiveness impact task procrastination through separate neural pathways. Cerebral Cortex 31, 3846–3855. 10.1093/cercor/bhab053
  • Zohar A. H., Shimone L. P., Hen M. (2019). Active and passive procrastination in terms of temperament and character. PeerJ 7, e6988. 10.7717/peerj.6988

Statistics Research Paper

More statistics research papers:

  • Time Series Research Paper
  • Crime Statistics Research Paper
  • Economic Statistics Research Paper
  • Education Statistics Research Paper
  • Health Statistics Research Paper
  • Labor Statistics Research Paper
  • History of Statistics Research Paper
  • Survey Sampling Research Paper
  • Multidimensional Scaling Research Paper
  • Sequential Statistical Methods Research Paper
  • Simultaneous Equation Estimation Research Paper
  • Statistical Clustering Research Paper
  • Statistical Sufficiency Research Paper
  • Censuses Of Population Research Paper
  • Stochastic Models Research Paper
  • Stock Market Predictability Research Paper
  • Structural Equation Modeling Research Paper
  • Survival Analysis Research Paper
  • Systems Modeling Research Paper
  • Nonprobability Sampling Research Paper

1. Introduction

Statistics is a body of quantitative methods associated with empirical observation. A primary goal of these methods is coping with uncertainty. Most formal statistical methods rely on probability theory to express this uncertainty and to provide a formal mathematical basis for data description and for analysis. The notion of variability associated with data, expressed through probability, plays a fundamental role in this theory. As a consequence, much statistical effort is focused on how to control and measure variability and/or how to assign it to its sources.

Almost all characterizations of statistics as a field include the following elements:

(a) Designing experiments, surveys, and other systematic forms of empirical study.

(b) Summarizing and extracting information from data.

(c) Drawing formal inferences from empirical data through the use of probability.

(d) Communicating the results of statistical investigations to others, including scientists, policy makers, and the public.

This research paper describes a number of these elements and the historical context out of which they grew. It provides a broad overview of the field, which can serve as a starting point for many of the other statistical entries in this encyclopedia.

2. The Origins Of The Field Of Statistics

The word ‘statistics’ is related to the word ‘state’ and the original activity that was labeled as statistics was social in nature and related to elements of society through the organization of economic, demographic, and political facts. Paralleling this work to some extent was the development of the probability calculus and the theory of errors, typically associated with the physical sciences. These traditions came together in the nineteenth century and led to the notion of statistics as a collection of methods for the analysis of scientific data and the drawing of inferences therefrom.

As Hacking (1990) has noted: ‘By the end of the century chance had attained the respectability of a Victorian valet, ready to be the logical servant of the natural, biological and social sciences’ (p. 2). At the beginning of the twentieth century, we see the emergence of statistics as a field under the leadership of Karl Pearson, George Udny Yule, Francis Y. Edgeworth, and others of the ‘English’ statistical school. As Stigler (1986) suggests:

Before 1900 we see many scientists of different fields developing and using techniques we now recognize as belonging to modern statistics. After 1900 we begin to see identifiable statisticians developing such techniques into a unified logic of empirical science that goes far beyond its component parts. There was no sharp moment of birth; but with Pearson and Yule and the growing number of students in Pearson’s laboratory, the infant discipline may be said to have arrived. (p. 361)

Pearson’s laboratory at University College, London quickly became the first statistics department in the world and it was to influence subsequent developments in a profound fashion for the next three decades. Pearson and his colleagues founded the first methodologically-oriented statistics journal, Biometrika, and they stimulated the development of new approaches to statistical methods. What remained before statistics could legitimately take on the mantle of a field of inquiry, separate from mathematics or the use of statistical approaches in other fields, was the development of the formal foundations of theories of inference from observations, rooted in an axiomatic theory of probability.

Beginning at least with the Rev. Thomas Bayes and Pierre Simon Laplace in the eighteenth century, most early efforts at statistical inference used what was known as the method of inverse probability to update a prior probability using the observed data in what we now refer to as Bayes’ Theorem. (For a discussion of who really invented Bayes’ Theorem, see Stigler 1999, Chap. 15.) Inverse probability came under challenge in the nineteenth century, but viable alternative approaches gained little currency. It was only with the work of R. A. Fisher on statistical models, estimation, and significance tests, and that of Jerzy Neyman and Egon Pearson on tests of hypotheses, in the 1920s and 1930s, that alternative approaches were fully articulated and given a formal foundation. Neyman’s advocacy of the role of probability in the structuring of a frequency-based approach to sample surveys in 1934 and his development of confidence intervals further consolidated this effort at the development of a foundation for inference (cf. Statistical Methods, History of: Post-1900 and the discussion of ‘The inference experts’ in Gigerenzer et al. 1989).

At about the same time Kolmogorov presented his famous axiomatic treatment of probability, and thus by the end of the 1930s, all of the requisite elements were finally in place for the identification of statistics as a field. Not coincidentally, the first statistical society devoted to the mathematical underpinnings of the field, The Institute of Mathematical Statistics, was created in the United States in the mid-1930s. It was during this same period that departments of statistics and statistical laboratories and groups were first formed in universities in the United States.

3. Emergence Of Statistics As A Field

3.1 The Role Of World War II

Perhaps the greatest catalysts to the emergence of statistics as a field were two major social events: the Great Depression of the 1930s and World War II. In the United States, one of the responses to the depression was the development of large-scale probability-based surveys to measure employment and unemployment. This was followed by the institutionalization of sampling as part of the 1940 US decennial census. But with World War II raging in Europe and in Asia, mathematicians and statisticians were drawn into the war effort, and as a consequence they turned their attention to a broad array of new problems. In particular, multiple statistical groups were established in both England and the US specifically to develop new methods and to provide consulting. (See Wallis 1980, on statistical groups in the US; Barnard and Plackett 1985, for related efforts in the United Kingdom; and Fienberg 1985). These groups not only created imaginative new techniques such as sequential analysis and statistical decision theory, but they also developed a shared research agenda. That agenda led to a blossoming of statistics after the war, and in the 1950s and 1960s to the creation of departments of statistics at universities—from coast to coast in the US, and to a lesser extent in England and elsewhere.

3.2 The Neo-Bayesian Revival

Although inverse probability came under challenge in the 1920s and 1930s, it was not totally abandoned. John Maynard Keynes (1921) wrote A Treatise on Probability, which was rooted in this tradition, and Frank Ramsey (1926) provided an early effort at justifying the subjective nature of prior distributions and suggested the importance of utility functions as an adjunct to statistical inference. Bruno de Finetti provided further development of these ideas in the 1930s, while Harold Jeffreys (1938) created a separate ‘objective’ development of these and other statistical ideas on inverse probability.

Yet as statistics flourished in the post-World War II era, it was largely based on the developments of Fisher, Neyman and Pearson, as well as the decision theory methods of Abraham Wald (1950). L. J. Savage revived interest in the inverse probability approach with The Foundations of Statistics (1954), in which he attempted to provide the axiomatic foundation from the subjective perspective. In an essentially independent effort, Raiffa and Schlaifer (1961) attempted to provide inverse probability counterparts to many of the then existing frequentist tools, referring to these alternatives as ‘Bayesian.’ By 1960, the term ‘Bayesian inference’ had become standard usage in the statistical literature, theoretical interest in the development of Bayesian approaches began to take hold, and the neo-Bayesian revival was underway. But the movement from Bayesian theory to statistical practice was slow, in large part because the computations associated with posterior distributions were an overwhelming stumbling block for those who were interested in the methods. Only in the 1980s and 1990s did new computational approaches revolutionize both Bayesian methods and the interest in them in a broad array of areas of application.

3.3 The Role Of Computation In Statistics

From the days of Pearson and Fisher, computation played a crucial role in the development and application of statistics. Pearson’s laboratory employed dozens of women who used mechanical devices to carry out the careful and painstaking calculations required to tabulate values from various probability distributions. This effort ultimately led to the creation of the Biometrika Tables for Statisticians, which were widely used by others applying tools such as chi-square tests. Similarly, Fisher developed his own set of statistical tables with Frank Yates when he worked at Rothamsted Experimental Station in the 1920s and 1930s. One of the most famous pictures of Fisher shows him seated at Whittingehame Lodge, working at his desk calculator (see Box 1978).

The development of the modern computer revolutionized statistical calculation and practice, beginning with the creation of the first statistical packages in the 1960s—such as the BMDP package for biological and medical applications, and Datatext for statistical work in the social sciences. Other packages soon followed—such as SAS and SPSS for both data management and production-like statistical analyses, and MINITAB for the teaching of statistics. In 2001, in the era of the desktop personal computer, almost everyone has easy access to interactive statistical programs that can implement complex statistical procedures and produce publication-quality graphics. And there is a new generation of statistical tools that rely upon statistical simulation such as the bootstrap and Markov Chain Monte Carlo methods. Complementing the traditional production-like packages for statistical analysis are more methodologically oriented languages such as S and S-PLUS, and symbolic and algebraic calculation packages. Statistical journals and those in various fields of application devote considerable space to descriptions of such tools.
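To make the idea of simulation-based tools concrete, here is a minimal percentile-bootstrap sketch using only the Python standard library. It is not the implementation in any particular package. It resamples a small hypothetical data set with replacement, recomputes the mean each time, and reads off the 2.5th and 97.5th percentiles. The data, the 2,000 replicates, and the name `bootstrap_percentile_ci` are illustrative assumptions.

```python
import random
import statistics

def bootstrap_percentile_ci(data, stat=statistics.mean, n_boot=2000, level=0.95, seed=1):
    """Simple percentile bootstrap interval for `stat` (no bias correction)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample the observed data with replacement and recompute the statistic.
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    k = int(round((1 - level) / 2 * n_boot))  # number of replicates cut from each tail
    return reps[k], reps[n_boot - 1 - k]

# Hypothetical measurements, used only for illustration.
sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.0, 4.9]
low, high = bootstrap_percentile_ci(sample)
print(f"mean = {statistics.mean(sample):.2f}, 95% bootstrap CI = ({low:.2f}, {high:.2f})")
```

The percentile interval is the simplest bootstrap interval. Refinements such as bias-corrected intervals exist but are beyond this sketch.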

4. Statistics At The End Of The Twentieth Century

It is widely recognized that any statistical analysis can only be as good as the underlying data. Consequently, statisticians take great care in the design of methods for data collection and in their actual implementation. Some of the most important modes of statistical data collection include censuses, experiments, observational studies, and sample surveys, all of which are discussed elsewhere in this encyclopedia. Statistical experiments gain their strength and validity both through the random assignment of treatments to units and through the control of nontreatment variables. Similarly, sample surveys gain their validity for generalization through the careful design of survey questionnaires and the probability methods used to select the sample units. Approaches to cope with the failure to fully implement randomization in experiments or random selection in sample surveys are discussed in Experimental Design: Compliance and Nonsampling Errors.

Data in some statistical studies are collected essentially at a single point in time (cross-sectional studies); in others they are collected repeatedly at several time points or even continuously (longitudinal studies); and in yet others observations are collected sequentially, until sufficient information is available for inferential purposes. Different entries discuss these options and their strengths and weaknesses.
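One classical rule for deciding when "sufficient information" has been collected is Wald's sequential probability ratio test (SPRT). The text does not prescribe this particular rule. The sketch below assumes 0/1 observations, two simple hypotheses p0 = 0.5 and p1 = 0.7, and nominal error rates α = β = 0.05, all chosen only for illustration. The function name `sprt_bernoulli` is hypothetical. Sampling stops as soon as the running log-likelihood ratio crosses one of Wald's approximate boundaries.

```python
import math

def sprt_bernoulli(observations, p0=0.5, p1=0.7, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: p = p0 vs H1: p = p1 on a stream of 0/1 outcomes.

    Returns (decision, number_of_observations_used); the decision is
    'accept H0', 'accept H1', or 'continue' if the data ran out first.
    """
    upper = math.log((1 - beta) / alpha)   # crossing this accepts H1
    lower = math.log(beta / (1 - alpha))   # crossing this accepts H0
    llr = 0.0
    for n, x in enumerate(observations, start=1):
        # Add this observation's log-likelihood ratio log f1(x) - log f0(x).
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue", len(observations)

# Hypothetical stream of outcomes.
print(sprt_bernoulli([1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1]))
```

The number of observations is itself random: clear-cut data stop the test early, while ambiguous data keep it running.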

After a century of formal development, statistics as a field has developed a number of different approaches that rely on probability theory as a mathematical basis for description, analysis, and statistical inference. We provide an overview of some of these in the remainder of this section and provide some links to other entries in this encyclopedia.

4.1 Data Analysis

The least formal approach to inference is often the first employed. The name ‘data analysis’ stems from a famous article by John Tukey (1962), but the approach is rooted in the more traditional forms of descriptive statistical methods used for centuries.

Today, data analysis relies heavily on graphical methods and there are different traditions, such as those associated with

(a) The ‘exploratory data analysis’ methods suggested by Tukey and others.

(b) The more stylized correspondence analysis techniques of Benzécri and the French school.

(c) The alphabet soup of computer-based multivariate methods that have emerged over the past decade such as ACE, MARS, CART, etc.

No matter which ‘school’ of data analysis someone adheres to, the spirit of the methods is typically to encourage the data to ‘speak for themselves.’ While no theory of data analysis has emerged, and perhaps none is to be expected, the flexibility of thought and method embodied in data-analytic ideas has influenced all of the other approaches.
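A small example in the exploratory spirit is Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum. The sketch below computes Tukey's hinges as the medians of the lower and upper halves of the sorted data, with the median included in both halves when n is odd. Other quartile conventions give slightly different values. The function name and the data are illustrative.

```python
import statistics

def five_number_summary(data):
    """Tukey's five-number summary, using his 'hinges' for the quartiles."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2  # with odd n, the median belongs to both halves
    lower_hinge = statistics.median(xs[:half])
    upper_hinge = statistics.median(xs[n - half:])
    return xs[0], lower_hinge, statistics.median(xs), upper_hinge, xs[-1]

# Hypothetical observations.
print(five_number_summary([12, 7, 3, 15, 9, 11, 4, 8, 21]))
```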

4.2 Frequentism

The name of this group of methods refers to a hypothetical infinite sequence of data sets generated as was the data set in question. Inferences are to be made with respect to this hypothetical infinite sequence. (For details, see Frequentist Inference).

One of the leading frequentist methods is significance testing, formalized initially by R. A. Fisher (1925) and subsequently elaborated upon and extended by Neyman and Pearson and others (see below). Here a null hypothesis is chosen, for example, that the mean, µ, of a normally distributed set of observations is 0. Fisher suggested the choice of a test statistic, e.g., based on the sample mean, x̄, and the calculation of the probability, computed under the null hypothesis, of observing an outcome at least as far from µ = 0 as x̄ is, a quantity usually labeled the p-value. When p is small (e.g., less than 5 percent), either a rare event has occurred or the null hypothesis is false. Within this theory, no probability can be given for which of these two conclusions is the case.
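As a sketch of the calculation just described, assume the observations are normal with a known standard deviation σ, so the test statistic is z = (x̄ − µ0)/(σ/√n). With σ unknown one would use Student's t instead. The two-sided p-value is P(|Z| ≥ |z|) for a standard normal Z, which equals erfc(|z|/√2). The function name and data are hypothetical.

```python
import math

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: mu = mu0, assuming normal data with known sigma."""
    n = len(sample)
    xbar = sum(sample) / n
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) under H0, via the complementary error function.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical data.
z, p = z_test_p_value([0.9, -0.2, 1.4, 0.3, 0.8, -0.5, 1.1, 0.6, 0.2, 0.7])
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```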

A related set of methods is testing hypotheses, as proposed by Neyman and Pearson (1928, 1932). In this approach, procedures are sought having the property that, for an infinite sequence of such data sets, the null hypothesis would be rejected in only (say) 5 percent of them if the null hypothesis were true. Often the infinite sequence is restricted to sets having the same sample size, but this is unnecessary. Here, in addition to the null hypothesis, an alternative hypothesis is specified. This permits the definition of a power curve, reflecting the frequency of rejecting the null hypothesis when the specified alternative is the case. But, as with the Fisherian approach, no probability can be given to either the null or the alternative hypotheses.
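The power curve can be computed in closed form for a simple case. The sketch assumes normal observations with known σ, the null hypothesis µ = 0 against the one-sided alternative µ > 0, and a level-α test that rejects when z exceeds the upper α quantile of the standard normal. The power at a given µ is then 1 − Φ(z₁₋α − µ√n/σ). The sample size n = 25 and the function name are illustrative assumptions.

```python
from statistics import NormalDist

def power_one_sided_z(mu, n, sigma=1.0, alpha=0.05, mu0=0.0):
    """Power of the level-alpha z-test that rejects H0: mu = mu0 when z > z_{1-alpha}."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    shift = (mu - mu0) * n ** 0.5 / sigma
    return 1 - std_normal.cdf(z_crit - shift)

for mu in (0.0, 0.25, 0.5, 0.75):
    print(f"mu = {mu:.2f}: power = {power_one_sided_z(mu, n=25):.3f}")
```

At µ = µ0 the power equals α, and it rises toward 1 as the true mean moves further into the alternative.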

The construction of confidence intervals, following the proposal of Neyman (1934), is intimately related to testing hypotheses; indeed, a 95 percent confidence interval may be regarded as the set of null hypotheses which, had they been tested at the 5 percent level of significance, would not have been rejected. A confidence interval is a random interval having the property that a specified proportion (say 95 percent) of the random intervals in the infinite sequence would cover the true value. For example, an interval that 95 percent of the time (by auxiliary randomization) is the whole real line, and 5 percent of the time is the empty set, is a valid, if useless, 95 percent confidence interval.
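The long-run meaning of "95 percent" can be checked by simulation. The sketch assumes normal data with known σ = 1, true mean 0, and n = 20 observations per data set, all hypothetical choices. It estimates the coverage of two intervals: the usual x̄ ± 1.96σ/√n, and the degenerate randomized interval described above. Both come out close to 0.95, which is exactly why coverage alone does not make an interval useful.

```python
import math
import random
import statistics
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96

def z_interval(data, rng, sigma=1.0):
    # Usual 95% interval for a normal mean with known sigma; rng is unused here.
    m = statistics.fmean(data)
    half = Z975 * sigma / math.sqrt(len(data))
    return m - half, m + half

def silly_interval(data, rng):
    # Ignores the data: the whole real line with probability 0.95, else empty.
    if rng.random() < 0.95:
        return -math.inf, math.inf
    return 1.0, 0.0  # lower end above upper end: an empty interval

def coverage(make_interval, true_mu=0.0, n=20, sigma=1.0, reps=10000, seed=0):
    """Fraction of simulated data sets whose interval contains true_mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        data = [rng.gauss(true_mu, sigma) for _ in range(n)]
        lo, hi = make_interval(data, rng)
        if lo <= true_mu <= hi:
            hits += 1
    return hits / reps

for name, rule in [("x-bar +/- 1.96 sigma/sqrt(n)", z_interval), ("degenerate", silly_interval)]:
    print(f"{name}: estimated coverage {coverage(rule):.3f}")
```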

Estimation of parameters, i.e., choosing a single value of the parameters that is in some sense best, is also an important frequentist method. Many methods have been proposed, both for particular models and as general approaches regardless of model, and their frequentist properties have been explored. These methods are usually extended to intervals of values through inversion of test statistics or via other related devices. The resulting confidence intervals share many of the frequentist theoretical properties of the corresponding test procedures.

Frequentist statisticians have explored a number of general properties thought to be desirable in a procedure, such as invariance, unbiasedness, sufficiency, conditioning on ancillary statistics, etc. While each of these properties has examples in which it appears to produce satisfactory recommendations, there are others in which it does not. Additionally, these properties can conflict with each other. No general frequentist theory has emerged that proposes a hierarchy of desirable properties, leaving a frequentist without guidance in facing a new problem.

4.3 Likelihood Methods

The likelihood function (first studied systematically by R. A. Fisher) is the probability density of the data, viewed as a function of the parameters. It occupies an interesting middle ground in the philosophical debate, as it is used both by frequentists (as in maximum likelihood estimation) and by Bayesians in the transition from prior distributions to posterior distributions. A small group of scholars (among them G. A. Barnard, A. W. F. Edwards, R. Royall, D. Sprott) have proposed the likelihood function as an independent basis for inference. The issue of nuisance parameters has perplexed this group, since maximization, as would be consistent with maximum likelihood estimation, leads to different results in general than does integration, which would be consistent with Bayesian ideas.
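A standard textbook case shows how the two treatments of a nuisance parameter disagree. Assume x_1, …, x_n are independent N(µ, σ²) observations, with σ² of interest and µ a nuisance parameter, and write S = Σ(x_i − x̄)². Maximizing over µ gives the profile likelihood. Integrating over µ with a flat weight gives the integrated likelihood.

```latex
L(\mu,\sigma^2) = (2\pi\sigma^2)^{-n/2}
  \exp\!\left\{-\frac{S + n(\bar{x}-\mu)^2}{2\sigma^2}\right\}

% Maximize over \mu (profile likelihood): \hat\mu = \bar{x}
L_P(\sigma^2) \propto (\sigma^2)^{-n/2}\exp\!\left\{-\frac{S}{2\sigma^2}\right\}
  \quad\Longrightarrow\quad \hat\sigma^2_P = \frac{S}{n}

% Integrate over \mu, using \int e^{-n(\bar{x}-\mu)^2/(2\sigma^2)}\,d\mu = \sqrt{2\pi\sigma^2/n}
L_I(\sigma^2) \propto (\sigma^2)^{-(n-1)/2}\exp\!\left\{-\frac{S}{2\sigma^2}\right\}
  \quad\Longrightarrow\quad \hat\sigma^2_I = \frac{S}{n-1}
```

Both maximizations follow from the fact that (σ²)^(−k/2) exp{−S/(2σ²)} is maximized at σ² = S/k. Maximization therefore yields the divisor n, while integration yields n − 1.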

4.4 Bayesian Methods

Both frequentists and Bayesians accept Bayes’ Theorem as correct, but Bayesians use it far more heavily. Bayesian analysis proceeds from the idea that probability is personal or subjective, reflecting the views of a particular person at a particular point in time. These views are summarized in the prior distribution over the parameter space. Together the prior distribution and the likelihood function define the joint distribution of the parameters and the data. This joint distribution can alternatively be factored as the product of the posterior distribution of the parameter given the data times the predictive distribution of the data.
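The factorization described above can be checked numerically in the conjugate beta-binomial case. Assume a Beta(a, b) prior for a success probability θ and k successes in n binomial trials; all the numbers below are hypothetical. The posterior is then Beta(a + k, b + n − k), and the predictive distribution of k is beta-binomial. The sketch confirms that log(prior × likelihood) equals log(posterior × predictive) at an arbitrary θ.

```python
import math

def log_beta_fn(a, b):
    # log B(a, b) via log-gamma for numerical stability.
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_beta_pdf(theta, a, b):
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - log_beta_fn(a, b)

def log_binom_pmf(k, n, theta):
    return math.log(math.comb(n, k)) + k * math.log(theta) + (n - k) * math.log(1 - theta)

def log_beta_binom_pmf(k, n, a, b):
    # Predictive probability of k successes, with theta integrated out.
    return math.log(math.comb(n, k)) + log_beta_fn(a + k, b + n - k) - log_beta_fn(a, b)

a, b = 2.0, 2.0   # hypothetical prior
n, k = 10, 7      # hypothetical data
theta = 0.6       # any point in (0, 1)

joint = log_beta_pdf(theta, a, b) + log_binom_pmf(k, n, theta)
factored = log_beta_pdf(theta, a + k, b + n - k) + log_beta_binom_pmf(k, n, a, b)
print(f"log joint = {joint:.10f}, log posterior + log predictive = {factored:.10f}")
print(f"posterior mean of theta = {(a + k) / (a + b + n):.3f}")
```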

In the past, Bayesian methods were deemed to be controversial because of the avowedly subjective nature of the prior distribution. But the controversy surrounding their use has lessened as recognition of the subjective nature of the likelihood has spread. Unlike frequentist methods, Bayesian methods are, in principle, free of the paradoxes and counterexamples that make classical statistics so perplexing. The development of hierarchical modeling and Markov Chain Monte Carlo (MCMC) methods has further added to the current popularity of the Bayesian approach, as they allow analyses of models that would otherwise be intractable.
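To illustrate how MCMC makes posteriors usable without closed-form integration, here is a random-walk Metropolis sampler, one of the simplest MCMC algorithms. It is applied to a posterior whose answer is known, so the result can be checked: Beta(9, 5), which corresponds to a hypothetical Beta(2, 2) prior and 7 successes in 10 trials. Only the unnormalized density is used. The step size, chain length, burn-in, and seed are arbitrary choices for this sketch.

```python
import math
import random

def log_target(theta, a=9.0, b=5.0):
    # Unnormalized log density of Beta(a, b); -inf outside (0, 1).
    if not 0.0 < theta < 1.0:
        return -math.inf
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)

def metropolis(n_iter=20000, step=0.15, start=0.5, burn_in=2000, seed=42):
    rng = random.Random(seed)
    theta, current = start, log_target(start)
    draws = []
    for i in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)  # symmetric random-walk proposal
        cand = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(theta)).
        if rng.random() < math.exp(min(0.0, cand - current)):
            theta, current = proposal, cand
        if i >= burn_in:
            draws.append(theta)
    return draws

draws = metropolis()
print(f"MCMC mean ~ {sum(draws) / len(draws):.3f}; exact Beta(9, 5) mean = {9 / 14:.3f}")
```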

Bayesian decision theory, which interacts closely with Bayesian statistical methods, is a useful way of modeling and addressing decision problems in experimental design, data analysis, and inference. It introduces the notion of utilities: the optimal decision combines probabilities of events with utilities by calculating the expected utility of each possible decision and choosing the one that maximizes it (e.g., see the discussion in Lindley 2000).
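The expected-utility rule reduces to a few lines once posterior probabilities and utilities are given. In the sketch below, the states, actions, probabilities, and utilities (a hypothetical adopt/don't-adopt choice for a treatment) are invented for illustration, as is the function name `bayes_decision`.

```python
def bayes_decision(posterior, utility):
    """Return the action with the highest posterior expected utility, plus all expected utilities.

    posterior: dict mapping state -> probability (should sum to 1)
    utility:   dict mapping action -> dict mapping state -> utility
    """
    expected = {
        action: sum(posterior[state] * u for state, u in payoffs.items())
        for action, payoffs in utility.items()
    }
    best = max(expected, key=expected.get)
    return best, expected

# Hypothetical example: whether to adopt a new treatment.
posterior = {"effective": 0.7, "ineffective": 0.3}
utility = {
    "adopt": {"effective": 10.0, "ineffective": -20.0},
    "dont_adopt": {"effective": 0.0, "ineffective": 0.0},
}
best, expected = bayes_decision(posterior, utility)
print(best, expected)
```

Changing either the posterior or the utilities can reverse the decision. This is the sense in which the decision combines both ingredients.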

Current research is attempting to use the Bayesian approach to hypothesis testing to provide tests and p-values with good frequentist properties (see Bayarri and Berger 2000).

4.5 Broad Models: Nonparametrics And Semiparametrics

These models include parameter spaces of infinite dimensions, whether addressed in a frequentist or Bayesian manner. In a sense, these models put more inferential weight on the assumption of conditional independence than does an ordinary parametric model.

4.6 Some Cross-Cutting Themes

Often different fields of application of statistics need to address similar issues. For example, dimensionality of the parameter space is often a problem. As more parameters are added, the model will in general fit better (at least no worse). Is the apparent gain in accuracy worth the reduction in parsimony? There are many different ways to address this question in the various applied areas of statistics.
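For nested least-squares models, the claim that an extra parameter never worsens the fit is easy to check. One common way, though not the only one, to weigh that gain against parsimony is Akaike's information criterion (AIC). The sketch compares a mean-only model with a straight-line fit. The data and helper names are hypothetical. The Gaussian AIC is computed only up to an additive constant shared by both models.

```python
import math

def rss_mean_only(y):
    m = sum(y) / len(y)
    return sum((yi - m) ** 2 for yi in y)

def rss_linear(x, y):
    # Ordinary least squares for y = b0 + b1 * x, in closed form.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

def gaussian_aic(rss, n, k):
    # AIC for a Gaussian model with k mean parameters, up to an additive constant.
    return n * math.log(rss / n) + 2 * k

# Hypothetical data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9, 7.2, 7.8]
r1, r2 = rss_mean_only(y), rss_linear(x, y)
print(f"RSS: mean-only {r1:.3f}, linear {r2:.3f}")
print(f"AIC: mean-only {gaussian_aic(r1, len(y), 1):.2f}, linear {gaussian_aic(r2, len(y), 2):.2f}")
```

Because the mean-only model is the straight line with slope zero, the linear fit's residual sum of squares can never be larger. AIC prefers the larger model only if the improvement outweighs the penalty of 2 per added parameter.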

Another common theme, in some sense the obverse of the previous one, is the question of model selection and goodness of fit. In what sense can one say that a set of observations is well-approximated by a particular distribution? (cf. Goodness of Fit: Overview). All statistical theory relies at some level on the use of formal models, and the appropriateness of those models and their detailed specification are of concern to users of statistical methods, no matter which school of statistical inference they choose to work within.
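One of the oldest answers to the goodness-of-fit question is Pearson's chi-square statistic, X² = Σ(O − E)²/E, computed over categories of observed counts O and expected counts E. The sketch assumes fully specified category probabilities, so that no parameters are estimated and the degrees of freedom are the number of categories minus one. The die-roll counts are hypothetical.

```python
def pearson_chi_square(observed, expected_probs):
    """Pearson's X^2 = sum (O - E)^2 / E for counts against hypothesized probabilities."""
    total = sum(observed)
    stat = 0.0
    for o, p in zip(observed, expected_probs):
        e = total * p
        stat += (o - e) ** 2 / e
    df = len(observed) - 1  # no parameters estimated from the data
    return stat, df

# Hypothetical counts from 60 rolls of a die, tested against a fair die.
counts = [8, 12, 9, 11, 6, 14]
stat, df = pearson_chi_square(counts, [1 / 6] * 6)
print(f"X^2 = {stat:.2f} on {df} degrees of freedom")
```

For comparison, the 5 percent critical value of the chi-square distribution with 5 degrees of freedom is about 11.07. A value of 4.2 therefore gives no evidence against a fair die.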

5. Statistics In The Twenty-first Century

5.1 Adapting And Generalizing Methodology

Statistics as a field provides scientists with the basis for dealing with uncertainty, and, among other things, for generalizing from a sample to a population. There is a parallel sense in which statistics provides a basis for generalization: when similar tools are developed within specific substantive fields, such as experimental design methodology in agriculture and medicine, and sample surveys in economics and sociology. Statisticians have long recognized the common elements of such methodologies and have sought to develop generalized tools and theories to deal with these separate approaches (see e.g., Fienberg and Tanur 1989).

One hallmark of modern statistical science is the development of general frameworks that unify methodology. Thus the tools of Generalized Linear Models draw together methods for linear regression and analysis of variance models with normal errors and those for log-linear and logistic models for categorical data, in a broader and richer framework. Similarly, graphical models developed in the 1970s and 1980s use concepts of independence to integrate work in covariance selection, decomposable log-linear models, and Markov random field models, and produce new methodology as a consequence. And the latent variable approaches from psychometrics and sociology have been tied with simultaneous equation and measurement error models from econometrics into a broader theory of covariance analysis and structural equations models.
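Part of what unifies generalized linear models computationally is a common fitting algorithm, iteratively reweighted least squares. For canonical links such as the logit, this algorithm coincides with Newton-Raphson on the log-likelihood. The sketch fits only a logistic regression with one predictor and an intercept, solving the 2×2 Newton system by hand. The data and the fixed iteration count are hypothetical choices, and the data must not be perfectly separable for the estimates to exist.

```python
import math

def fit_logistic(x, y, n_iter=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson (equivalently IRLS)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        # Score vector g and information matrix X^T W X, with weights w = p(1 - p).
        g0 = g1 = 0.0
        i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        # Newton step: solve [[i00, i01], [i01, i11]] @ delta = g.
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1

# Hypothetical binary outcomes that overlap in x (not separable).
x_obs = [0, 1, 2, 3, 4, 5, 6, 7]
y_obs = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x_obs, y_obs)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```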

Another hallmark of modern statistical science is the borrowing of methods in one field for application in another. One example is provided by Markov Chain Monte Carlo methods, now used widely in Bayesian statistics, which were first used in physics. Survival analysis, used in biostatistics to model the disease-free time or time-to-mortality of medical patients, and known as reliability analysis in quality control studies, is now used in econometrics to measure the time until an unemployed person gets a job. We anticipate that this trend of methodological borrowing will continue across fields of application.
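The same basic tool works whether the "event" is a death, a device failure, or leaving unemployment. One standard nonparametric estimator is the Kaplan-Meier (product-limit) estimator of the survival function, which the text does not name explicitly. The sketch assumes right-censored data: subjects whose event was not observed by the end of follow-up are counted as at risk up to their censoring time. The function name and example data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function S(t).

    times:  follow-up time for each subject
    events: 1 if the event (e.g., death, finding a job) was observed, 0 if censored
    Returns a list of (time, S(t)) at each distinct time with at least one event.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group every subject sharing this time; censored ones count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

# Hypothetical durations (e.g., months unemployed); 0 marks a censored spell.
print(kaplan_meier([3, 5, 5, 8, 10, 12, 15], [1, 1, 0, 1, 0, 1, 1]))
```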

5.2 Where Will New Statistical Developments Be Focused?

In the issues of its year 2000 volume, the Journal of the American Statistical Association explored both the state of the art of statistics in diverse areas of application, and that of theory and methods, through a series of vignettes or short articles. These essays provide an excellent supplement to the entries of this encyclopedia on a wide range of topics, not only presenting a snapshot of the current state of play in selected areas of the field but also offering some speculation on the next generation of developments. In an afterword to the last set of these vignettes, Casella (2000) summarizes five overarching themes that he observed in reading through the entire collection:

(a) Large datasets.

(b) High-dimensional/nonparametric models.

(c) Accessible computing.

(d) Bayes/frequentist/who cares?

(e) Theory/applied/why differentiate?

Not surprisingly, these themes fit well those that one can read into the statistical entries in this encyclopedia. The coming together of Bayesian and frequentist methods, for example, is illustrated by the movement of frequentists towards the use of hierarchical models and the regular consideration of frequentist properties of Bayesian procedures (e.g., Bayarri and Berger 2000). Similarly, MCMC methods are being widely used in non-Bayesian settings and, because they focus on long-run sequences of dependent draws from multivariate probability distributions, there are frequentist elements that are brought to bear in the study of the convergence of MCMC procedures. Thus the oft-made distinction between the different schools of statistical inference (suggested in the preceding section) is not always clear in the context of real applications.

5.3 The Growing Importance Of Statistics Across The Social And Behavioral Sciences

Statistics touches on an increasing number of fields of application, in the social sciences as in other areas of scholarship. Historically, the closest links have been with economics; together these fields share parentage of econometrics. There are now vigorous interactions with political science, law, sociology, psychology, anthropology, archeology, history, and many others.

In some fields, the development of statistical methods has not been universally welcomed. Using these methods well and knowledgeably requires an understanding both of the substantive field and of statistical methods. Sometimes this combination of skills has been difficult to develop.

Statistical methods are having increasing success in addressing questions throughout the social and behavioral sciences. Data are being collected and analyzed on an increasing variety of subjects, and the analyses are becoming increasingly sharply focused on the issues of interest.

We do not anticipate, nor would we find desirable, a future in which only statistical evidence was accepted in the social and behavioral sciences. There is room for, and need for, many different approaches. Nonetheless, we expect the excellent progress made in statistical methods in the social and behavioral sciences in recent decades to continue and intensify.

Bibliography:

  • Barnard G A, Plackett R L 1985 Statistics in the United Kingdom, 1939–1945. In: Atkinson A C, Fienberg S E (eds.) A Celebration of Statistics: The ISI Centennial Volume. Springer-Verlag, New York, pp. 31–55
  • Bayarri M J, Berger J O 2000 P values for composite null models (with discussion). Journal of the American Statistical Association 95: 1127–72
  • Box J 1978 R. A. Fisher, The Life of a Scientist. Wiley, New York
  • Casella G 2000 Afterword. Journal of the American Statistical Association 95: 1388
  • Fienberg S E 1985 Statistical developments in World War II: An international perspective. In: Atkinson A C, Fienberg S E (eds.) A Celebration of Statistics: The ISI Centennial Volume. Springer-Verlag, New York, pp. 25–30
  • Fienberg S E, Tanur J M 1989 Combining cognitive and statistical approaches to survey design. Science 243: 1017–22
  • Fisher R A 1925 Statistical Methods for Research Workers. Oliver and Boyd, London
  • Gigerenzer G, Swijtink Z, Porter T, Daston L, Beatty J, Kruger L 1989 The Empire of Chance. Cambridge University Press, Cambridge, UK
  • Hacking I 1990 The Taming of Chance. Cambridge University Press, Cambridge, UK
  • Jeffreys H 1938 Theory of Probability, 2nd edn. Clarendon Press, Oxford, UK
  • Keynes J 1921 A Treatise on Probability. Macmillan, London
  • Lindley D V 2000 The philosophy of statistics (with discussion). The Statistician 49: 293–337
  • Neyman J 1934 On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection (with discussion). Journal of the Royal Statistical Society 97: 558–625
  • Neyman J, Pearson E S 1928 On the use and interpretation of certain test criteria for purposes of statistical inference. Part I. Biometrika 20A: 175–240
  • Neyman J, Pearson E S 1932 On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society, Series A 231: 289–337
  • Raiffa H, Schlaifer R 1961 Applied Statistical Decision Theory. Harvard Business School, Boston
  • Ramsey F P 1926 Truth and probability. In: The Foundations of Mathematics and Other Logical Essays. Kegan Paul, London, pp.
  • Savage L J 1954 The Foundations of Statistics. Wiley, New York
  • Stigler S M 1986 The History of Statistics: The Measurement of Uncertainty Before 1900. Harvard University Press, Cambridge, MA
  • Stigler S M 1999 Statistics on the Table: The History of Statistical Concepts and Methods. Harvard University Press, Cambridge, MA
  • Tukey J W 1962 The future of data analysis. Annals of Mathematical Statistics 33: 1–67
  • Wald A 1950 Statistical Decision Functions. Wiley, New York
  • Wallis W 1980 The Statistical Research Group, 1942–1945 (with discussion). Journal of the American Statistical Association 75: 320–35


Dealing with Criticisms in Interdisciplinary Research Settings: Navigating Peer Review and Its Challenges

Information Matters, Vol. 4, No. 8, 2024

3 Pages Posted: 20 Aug 2024

Tongrui Zhang

Sun Yat-sen University

Date Written: July 30, 2024

This paper highlights strategies for navigating interdisciplinary research challenges and turning criticisms into opportunities for growth.

Keywords: interdisciplinary research, Peer Review, SI Professional Development



COMMENTS

  1. Statistics

    A paper in Physical Review X presents a method for numerically generating data sequences that are as likely to be observed under a power law as a given observed dataset. Zoe Budrikis Research ...

  2. Statistics

    A new possibilistic-based clustering method for probability density functions and its application to detecting abnormal elements. Hung Tran-Nam, Thao Nguyen-Trang & Ha Che-Ngoc. Article. 30 ...

  3. Research Papers / Publications

    Research Papers / Publications. Yan Sun, Pratik Chaudhari, Ian J. Barnett, Edgar Dobriban: A ...

  4. (PDF) Data Science: the impact of statistics

    In this paper, we substantiate our premise that statistics is one of the most important disciplines to provide tools and methods to find structure in and to give deeper insight into data, and ...

  5. Home

    Overview. Statistical Papers is a forum for presentation and critical assessment of statistical methods encouraging the discussion of methodological foundations and potential applications. The Journal stresses statistical methods that have broad applications, giving special attention to those relevant to the economic and social sciences.

  6. Introduction to Research Statistical Analysis: An Overview of the

    Introduction. Statistical analysis is necessary for any research project seeking to make quantitative conclusions. The following is a primer for research-based statistical analysis. It is intended to be a high-level overview of appropriate statistical testing, while not diving too deep into any specific methodology.

  7. Statistics

    Title: Fast and robust cross-validation-based scoring rule inference for spatial statistics. Authors: Helga Kristin Olafsdottir, Holger Rootzén, David Bolin. Subjects: Methodology (stat.ME). arXiv:2408.11977. Title: An ...

  8. Journal of Applied Statistics

    The Journal publishes original research papers, review articles, and short application notes. In general, original research papers should present one or two challenges in an area, include relevant data, provide a novel method to solve challenges, and demonstrate that the proposed method has answered questions that were not properly or optimally ...

  9. Journal of Probability and Statistics

    Online ISSN: 1687-9538. Print ISSN: 1687-952X. Journal of Probability and Statistics is an open access journal publishing papers on the theory and application of probability and statistics that consider new methods and approaches to their implementation, or report significant results for the field. As part of Wiley's Forward Series, this ...

  10. Data Science: the impact of statistics

    In this paper, we substantiate our premise that statistics is one of the most important disciplines to provide tools and methods to find structure in and to give deeper insight into data, and the most important discipline to analyze and quantify uncertainty. We give an overview over different proposed structures of Data Science and address the impact of statistics on such steps as data ...

  11. Biostatistics

    Biostatistics is the application of statistical methods in studies in biology, and encompasses the design of experiments, the collection of data from them, and the analysis and interpretation of ...

  12. Articles making an impact in Mathematics and Statistics

    Articles making an impact in Mathematics and Statistics. Browse specially curated selections of high-impact research from the mathematics and statistics journals published by Oxford University Press. The collections feature a mixture of: The most read articles published in the first half of 2022. Untapped research sections containing articles ...

  13. Basic statistical tools in research and data analysis

    Abstract. Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretations and reporting the research findings. Statistical analysis gives meaning to otherwise meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise ...

  14. (PDF) The most-cited statistical papers

    Only a few of the most influential papers in the field of statistics are included on our list ... Four of our most cited papers, Duncan (1955), Kramer (1956), and ...

  15. The Beginner's Guide to Statistical Analysis

    Table of contents. Step 1: Write your hypotheses and plan your research design. Step 2: Collect data from a sample. Step 3: Summarize your data with descriptive statistics. Step 4: Test hypotheses or make estimates with inferential statistics.

  16. Descriptive Statistics for Summarising Data

    Using the data from these three rows, we can draw the following descriptive picture. Mentabil scores spanned a range of 50 (from a minimum score of 85 to a maximum score of 135). Speed scores had a range of 16.05 s (from 1.05 s, the fastest quality decision, to 17.10 s, the slowest quality decision).

  17. Inferential Statistics

    Example: Inferential statistics. You randomly select a sample of 11th graders in your state and collect data on their SAT scores and other characteristics. You can use inferential statistics to make estimates and test hypotheses about the whole population of 11th graders in the state based on your sample data.

  18. 500+ Statistics Research Topics

    500+ Statistics Research Topics. March 25, 2024. by Muhammad Hassan. Statistics is a branch of mathematics that deals with the collection, analysis, interpretation, presentation, and organization of data. It is a fundamental tool used in various fields such as business, social sciences, engineering, healthcare, and many more.

  19. data science Latest Research Papers

    Assessing the effects of fuel energy consumption, foreign direct investment and GDP on CO2 emission: New data science evidence from Europe & Central Asia. Fuel, 2022, Vol. 314, p. 123098. doi:10.1016/j.fuel.2021.123098. Author(s): Muhammad Mohsin, Sobia Naseem.

  20. Top 99+ Trending Statistics Research Topics for Students

    Which research topics in statistics are interesting varies from student to student, but here are some key topics that interest almost every student: literacy rate in a city; abortion and pregnancy rates in the USA; eating disorders among citizens.

  21. What Research Has Been Conducted on Procrastination? Evidence From a

    Data and Methodology. Bibliometric analysis is a quantitative method for investigating the intellectual structure of a topical field. Based on the co-citation assumption (if two articles are frequently cited together, they are strongly associated), bibliometric analysis can reflect the scientific communication structure of a field holistically (Garfield, 1979; Chen et al., 2012).

  22. Statistics Research Paper

    View sample Statistics Research Paper. Browse other research paper examples and check the list of research paper topics for more inspiration. If you need a re ... The word 'statistics' is related to the word 'state' and the original activity that was labeled as statistics was social in nature and related to elements of society through ...

  23. Comparative evaluation of VAE-based monitoring statistics for real-time

    In this study, we conduct a comprehensive comparison and evaluation of various monitoring statistics based on VAE within the context of statistical process monitoring. Furthermore, we propose a new real-time monitoring method by integrating VAE-based monitoring statistics with the CUSUM chart for monitoring AIS data.

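Steps 3 and 4 of the beginner's guide (item 15) can be illustrated with a short Python sketch. This is not the guide's own code: the sample values are invented, the null hypothesis H0: μ = 100 is a hypothetical example, and only the one-sample t statistic is computed. Converting it into a p-value requires a t distribution (for example from SciPy), which is left out so the sketch stays dependency-free.

```python
import math
import statistics


def summarize(sample):
    """Step 3: descriptive statistics for a sample."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "sd": statistics.stdev(sample),  # sample standard deviation (n - 1 denominator)
        "min": min(sample),
        "max": max(sample),
    }


def one_sample_t(sample, mu0):
    """Step 4: t statistic for the hypothetical H0: population mean == mu0."""
    s = summarize(sample)
    standard_error = s["sd"] / math.sqrt(s["n"])
    return (s["mean"] - mu0) / standard_error


# Hypothetical sample, invented for illustration only.
scores = [102, 98, 110, 95, 105, 100, 108, 97]
print(summarize(scores))
print(round(one_sample_t(scores, mu0=100), 3))
```

A t statistic near 1 in absolute value is weak evidence against H0. The conclusion depends on the chosen significance level and the degrees of freedom (n - 1).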
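The range figures quoted in item 16 are simply maximum minus minimum. The snippet reports only the endpoints and not the full data, so this minimal sketch reproduces the arithmetic from those reported values. The function name `value_range` is our own.

```python
def value_range(values):
    """Range = maximum - minimum, the simplest measure of spread."""
    return max(values) - min(values)


# Endpoints as reported in the snippet (the full data are not given there).
mentabil = [85, 135]
speed_seconds = [1.05, 17.10]

print(value_range(mentabil))                   # 135 - 85
print(round(value_range(speed_seconds), 2))    # 17.10 - 1.05, rounded to hide float noise
```

Because the range depends only on the two most extreme observations, it is sensitive to outliers. Spread measures such as the interquartile range or standard deviation need the full data.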
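The inferential step described in item 17 (estimating a population quantity from a sample) can be sketched as a confidence interval for the mean SAT score. This is an illustrative assumption, not the source's method. The interval uses a large-sample normal approximation via `statistics.NormalDist`. The "population" (mean 1050, SD 200) and the sample of 200 students are simulated, hypothetical values.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Large-sample (normal-approximation) confidence interval for a population mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error


# Hypothetical random sample of 200 SAT scores, simulated for illustration only.
rng = random.Random(42)
sat_scores = [round(rng.gauss(1050, 200)) for _ in range(200)]

low, high = mean_confidence_interval(sat_scores)
print(f"95% CI for the state-wide mean: ({low:.1f}, {high:.1f})")
```

For small samples, a t critical value with n - 1 degrees of freedom should replace `z`. The normal approximation is used here only to avoid a dependency outside the standard library.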
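The co-citation assumption in item 21 treats two works as associated when they appear together in the same reference list. A minimal sketch of the basic counting step follows. The citing articles and the placeholder labels "A" to "D" are hypothetical, and real bibliometric tools add normalization and clustering on top of these raw counts.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of works is cited together by the same citing article."""
    counts = Counter()
    for refs in reference_lists:
        # Deduplicate and sort so each unordered pair is counted once per citing article.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical citing articles and the works each one cites (labels are placeholders).
citing = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C", "D"],
    ["A", "B", "D"],
]
print(cocitation_counts(citing).most_common(3))
```

Pairs with high counts (here A and B, cited together by three of the four articles) would form the strongest links in a co-citation network.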
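Item 23 combines VAE-based monitoring statistics with a CUSUM chart. The snippet does not say exactly how. The sketch below shows only the standard one-sided (upper) tabular CUSUM applied to a stream of monitoring statistics. It assumes, hypothetically, that each statistic is a per-observation reconstruction error with a known in-control mean. The allowance `k`, the decision limit `h`, and the data are invented for illustration.

```python
def upper_cusum(stats, target, k, h):
    """One-sided (upper) tabular CUSUM over a stream of monitoring statistics.

    target: in-control mean of the statistic; k: allowance (slack); h: decision limit.
    Returns the CUSUM path and the index of the first alarm (None if no alarm).
    """
    s = 0.0
    path = []
    first_alarm = None
    for t, x in enumerate(stats):
        # Accumulate only deviations above target + k; reset to zero otherwise.
        s = max(0.0, s + (x - target - k))
        path.append(s)
        if first_alarm is None and s > h:
            first_alarm = t
    return path, first_alarm


# Hypothetical monitoring statistics (e.g. reconstruction errors from a VAE);
# values are invented, with an upward shift starting at index 6.
errors = [1.0, 0.8, 1.2, 0.9, 1.1, 1.0, 2.0, 2.2, 1.9, 2.1]
path, alarm = upper_cusum(errors, target=1.0, k=0.25, h=2.0)
print(f"first alarm at index {alarm}")
```

Accumulating small deviations lets the CUSUM detect a sustained shift that a single-observation threshold might miss. Larger `h` trades slower detection for fewer false alarms.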