Big Data in Psychology: Introduction to Special Issue

Lisa L. Harlow

Department of Psychology, University of Rhode Island

Frederick L. Oswald

Department of Psychology, Rice University

The introduction to this special issue on psychological research involving big data summarizes the highlights of 10 articles that address a number of important and inspiring perspectives, issues, and applications. Four common themes emerge across the articles with respect to psychological research conducted with big data: (1) the benefits of collaboration across disciplines, such as the social sciences, applied statistics, and computer science, which grounds big data research in sound theory and practice and affords effective data retrieval and analysis; (2) the availability of large datasets from Facebook, Twitter, and other social media sites that provide a psychological window into the attitudes and behaviors of a broad spectrum of the population; (3) the need to identify, address, and remain sensitive to ethical considerations when analyzing large datasets gained from public or private sources; and (4) the necessity of validating predictive models in big data by applying a model developed on one dataset to a separate set of data or hold-out sample. Translational abstracts that summarize the articles in clear and understandable terms are included in Appendix A, and a glossary of terms relevant to big data research discussed in the articles is presented in Appendix B.

Big data involves the storing, retrieval, and analysis of large amounts of information and has been gaining interest in the scientific literature writ large since the 1990s. As a catch-all term, big data has also been referred to by a number of other related terms, such as data mining, knowledge discovery in databases, data or predictive analytics, or data science. The domain has traditionally been associated with computer science, statistics, and business, and now it is clearly, quickly, and usefully making inroads into psychological research and applied practice. There is a healthy and growing infrastructure for dealing with big data, some of it open source and free to use. For example, Hadoop (a name originally based on that of a child's toy elephant) is a widely used open-source distributed file system and processing framework. Alongside such frameworks, MySQL, an open-source database management system built on the structured query language (SQL), is used a great deal. It provides powerful capabilities to "Select" a specific group of entities, "From" a specific database or set of tables, "Where" one or more specific conditions hold. For example, an academic researcher could select and analyze data based on student identification numbers from class records in several majors, where the GPA is less than 2.0. In turn, this could allow for strategic data-driven interventions with these students to offer enrichment or tutoring that would bolster their grades and improve their chances of staying in school and succeeding. Once big data are queried and refined, they can be analyzed with a number of tools, increasingly with commonly known software such as R and Python.
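
The query logic described above can be sketched in a few lines; this is a minimal illustration using Python's built-in sqlite3 module and a hypothetical class_records table, not code from any of the special issue articles (a MySQL server would accept essentially the same SELECT/FROM/WHERE statement).

    # Minimal sketch: select students with GPA below 2.0 from a hypothetical table.
    import sqlite3

    conn = sqlite3.connect("university.db")  # hypothetical local database file
    query = """
        SELECT student_id, major, gpa
        FROM class_records
        WHERE gpa < 2.0;
    """
    low_gpa_students = conn.execute(query).fetchall()  # rows eligible for tutoring outreach
    conn.close()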

Who is using big data? Business industries in this area abound (e.g., insurance, manufacturing, retail, pharmaceuticals, transportation, utilities, law, gaming, eBay, telecommunications, hotels). Social media companies are also prominently involved (e.g., Google, Facebook, LinkedIn, Yahoo, Twitter). Various academic disciplines also have a visible presence (e.g., genomics, medicine, and the environmental sciences, the latter often using spatial geographic information systems, or GIS). There are several journals in this area, including the open-access and peer-reviewed journal Big Data, founded in 2013 and currently edited by Dhar. Its web page (http://www.liebertpub.com/overview/big-data/611/) boasts comprehensive coverage and a broad audience, yet it makes no mention of psychology or even the broader social sciences. At least two other journals were founded in 2014: the open-access Journal of Big Data, edited by Furht and Khoshgoftaar, and Big Data Research, edited by Wu and Palpanas. Likewise, these two journals do not appear to be directed to those in psychology or the larger social sciences. Similarly, a quick Google search in September 2016 for "big data book" revealed more than 48 million results, although it is noteworthy that none of the big data books listed on the first page are specifically directed to social science fields. Noting all of this is not to indict the current state of big data for neglecting psychology. Quite the opposite: psychology and the social sciences should be proactive and take advantage of a real opportunity in front of them. The timing is ripe, now that the big data movement has matured beyond many of its fads.

So, where does psychology fit into the field of big data or related areas such as computational social science? There are a number of areas in which psychology could weigh in, and has begun to do so, such as wellness, mental health, depression, substance use, behavioral health, behavior change, social media, workplace well-being and effectiveness, student learning and adjustment, and behavioral genetics. A number of recent books of interest to psychology researchers have been published (Alvarez, 2016; Cioffi-Revilla, 2014; Mayer-Schönberger & Cukier, 2013; McArdle & Ritschard, 2014, to name a few). Researchers are studying topics such as health and the human condition in big datasets comprising thousands of individuals, such as in the Kavli HUMAN Project (http://kavlihumanproject.org/; Azmak et al., 2015). In a similar vein, Fawcett (2016) discusses the analysis of what is called the quantified self, in which individuals collect data on themselves (e.g., number of steps, heart rate, sleep patterns) using personal trackers such as Fitbit, Jawbone, iPhone, and similar devices. Researchers envision studies that could link such personal data to health and productivity to reveal patterns or links between behavior and various outcomes of interest.

It is apparent that big data or data science is here to stay, with or without psychology. This broad-and-growing field offers a unique opportunity for interested psychological scientists to be involved in addressing the complex technical, substantive, and ethical challenges with regard to storing, retrieving, analyzing, and verifying large datasets. Big data science can be instrumental in collaboratively working to uncover and illuminate cogent and robust patterns in psychological data that directly or indirectly involve human behavior, cognition, and affect over time and within sociocultural systems. These psychological patterns, in turn, give meaning to non-psychological data (e.g., medical data involving health-related interventions; booms and busts tied to financial investing behavior). The big data community, and big data themselves, can together propel psychological science forward.

In this special issue, we offer 10 articles that focus on various aspects of big data and how they can be used by applied researchers in psychology and other social science fields. One of the common themes of these articles is also clearly evident in federal funding announcements for big data projects: Psychologists and psychology benefit from the collaboration and contributions of other disciplines—and vice-versa. For example, such collaborations can incorporate cutting-edge breakthroughs from computer science that can help access and analyze large amounts of data, as well as theory and behavioral science from across the social sciences that offer insight into the areas that are most in need of understanding, prediction, and intervention.

A second theme is that data are widely available in open forums such as Facebook, Twitter, and other social media sites, and can offer the opportunity to identify trends and patterns that are important to address. For example, tapping the content of Google search activity could indicate geographic areas where users are inquiring about flu or other symptoms, thus pointing to areas in which it may be important to focus health intervention efforts. The psychological nature of the query content might allow for early planning in targeting the intervention (e.g., judging the level of knowledge and concern about the health problem and its related symptoms and treatment). Note that when big data analyses incidentally detect a useful signal in the noise of social media data, one's discoveries and research efforts need not stop there; researchers can develop new construct-driven measures that help amplify signals that may have initially been discovered serendipitously.

A third general theme is that it is critically important to consider and carefully attend to the ethical issues of big data projects, including data acquisition and security, the protection of the identity of the users who often inadvertently provide extensive data, and decisions about how the information will be used and interpreted vis-à-vis the nature of the audience or stakeholders involved.

A fourth shared theme of these articles is that it is essential to develop theories and hypotheses on an initial training set of data and then verify those findings with other validation datasets, either from a hold-out sample of the original data or from separate, independent data. Because large datasets often were not formed under the guidance of an overriding theory or set of hypotheses, an initial analysis of big data is often at the exploratory or data mining level. At least one or more subsequent analyses of separate data may be needed to generalize past the initial data, particularly as there can be a large number of variables that are relevant to prediction but are not necessarily the best measures that one could obtain with additional foresight and planning. Given a large number of incidental variables, and given the flexible modeling afforded by big data analyses, it is perhaps more important than ever to avoid over-interpreting what might be considered a modern-day version of the classic "crud factor" (Meehl, 1990, p. 108): researchers may find apparent relationships between variables in a large dataset that are robustly upheld (e.g., through cross-validation), yet these relationships may change or dissipate over time as the relevant sample, population, and phenomenon under study change as well. Each of the articles in this special issue addresses one or more of these four themes in relatively easy-to-understand presentations of how big data can be used by researchers in psychology.

A summary of the highlights of the articles is presented below, followed by Appendix A, which provides translational abstracts (TAs) of the articles, briefly describing the essence of the papers in clearly understandable language. Appendix B includes a Glossary of some of the major terms used in the 10 articles, providing brief descriptions of each and an indication of which articles refer to these terms. To be clear, the Glossary is not intended to provide an exhaustive list of big data concepts; it is more a summary of some of the ideas and practices referred to in these special issue articles, so that readers have a reference for the terminology and can find out which special issue articles discuss them. To help identify which terms are included in the Glossary in Appendix B, these terms are italicized in this introductory article, although not necessarily in the separate articles themselves.

The first article, by Chen and Wojcik, offers an excellent guide to conducting behavioral science research on large datasets. In addition to describing some background and concepts, they provide three tutorials in the supplemental materials through which interested readers can work step by step. Their first tutorial clearly indicates how to acquire congressional speech data through application programming interfaces (APIs), which reflect the specific procedures needed to acquire data from a site. Their second tutorial demonstrates how these data are analyzed using procedures known as latent semantic analysis (LSA) and latent Dirichlet allocation (LDA) topic modeling, both of which can be used to assess the co-occurrence of words in a dataset based on underlying topics and relationships between documents. Other terms, common to the big data community and discussed in their main article and their third tutorial, include bag of words, stop words, support vector machines, machine learning, and supervised learning algorithms (see also our Glossary in Appendix B of this article). Chen and Wojcik also provide two appendices to help apply the material they discuss. Their Appendix A provides the Python code for acquiring the data from the Congressional Daily Digest that are discussed in the first and second tutorials, along with the use of MySQL. Their Appendix B offers a checklist for conducting research with big data.
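
As a rough illustration of bag-of-words LDA topic modeling in the spirit of these tutorials (this is not Chen and Wojcik's code; the three toy documents and the choice of two topics are assumptions made only for the example), a few lines of Python with scikit-learn might look like this:

    # Minimal sketch: bag-of-words counts followed by LDA topic modeling.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        "the committee debated the education budget",
        "senators discussed health care funding",
        "the budget bill passed after a long debate",
    ]

    vectorizer = CountVectorizer(stop_words="english")  # drops stop words, keeps word counts
    word_counts = vectorizer.fit_transform(docs)        # bag-of-words document-term matrix

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    doc_topics = lda.fit_transform(word_counts)         # per-document topic proportions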

In the second article, Landers et al. discuss web scraping, an automated process that can quickly extract data from websites behind the scenes. Behavioral scientists are increasingly involved in this type of research, within academia and in organizations, determining the pulse of social consciousness and norms on web sources such as Facebook, Twitter, Instagram, and Google. Along with delineating potential benefits of web scraping, Landers et al. also provide expert advice on the need to emphasize theory in such a project. In particular, they discuss what they call a theory of the data source, or data source theory, to help ensure the relevance and meaningfulness of data obtained from web scraping. Although there are not yet exact standards on the ethics of scraping the web for data, Landers et al. suggest that the APA Ethical Principles of Psychologists and Code of Conduct (2010), along with guidelines from the Data Science Association, can suggest policies and procedures for collecting data in a responsible manner that respects the participants and the research field in which conclusions will be shared. Assessing large datasets gleaned or scraped from the web using the theory-driven approach suggested by Landers et al. can help lessen the possibility that the findings are mere happenstances of a large collection of information.
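
A minimal web-scraping sketch, using the widely available Python libraries requests and BeautifulSoup, is shown below; the URL and tag choices are placeholders rather than anything drawn from Landers et al., and any real project should respect a site's terms of service and robots.txt.

    # Minimal sketch: download a page and extract its paragraph text.
    import requests
    from bs4 import BeautifulSoup

    url = "https://example.com/public-page"          # placeholder URL
    html = requests.get(url, timeout=10).text        # download the page
    soup = BeautifulSoup(html, "html.parser")

    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]  # visible paragraph text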

The third article, by Kosinski et al., discusses how to use large databases collected from the web to understand and predict a relevant outcome. Their paper is a tutorial that describes an example of using Facebook digital footprint data, stored in what is called a user-footprint matrix, to predict personality characteristics. The authors analyze input from over 100,000 Facebook users (see the myPersonality project, http://www.mypersonality.org/; Kosinski, Matz, Gosling, Popov, & Stillwell, 2015) using dimension-reduction procedures such as singular value decomposition (SVD), a computationally efficient method for conducting principal components analysis. The Kosinski et al. article also discusses a clustering procedure known as latent Dirichlet allocation (LDA) to help form dimensions with similar content from large datasets of text or counts of words or products. Findings from an LDA model can be visually depicted in a heatmap that shows darker colors when a trait or characteristic is more strongly correlated with one of the LDA clusters. Thus, one can see at a glance the patterns that characterize each cluster.
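
As a hedged sketch of this kind of dimension reduction (the simulated 0/1 matrix below merely stands in for a real user-footprint matrix; it is not the myPersonality data), SVD can be run in a few lines of Python:

    # Minimal sketch: reduce a users-by-footprints matrix to 20 dimensions via SVD.
    import numpy as np
    from sklearn.decomposition import TruncatedSVD

    rng = np.random.default_rng(0)
    footprints = rng.integers(0, 2, size=(1000, 500))    # users x liked-pages, 0/1 entries

    svd = TruncatedSVD(n_components=20, random_state=0)  # keep 20 dimensions
    user_scores = svd.fit_transform(footprints)          # low-dimensional user representation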

In the fourth article, Kern et al. discuss the analysis of big data found on social media, such as on Facebook and Twitter. The authors discuss several steps in acquiring, processing, and quantifying these kinds of data, so as to make them more manageable for statistical analyses. The authors describe the World Well-Being Project and use LDA or latent semantic analysis, which helps reduce large amounts of text-based information into a smaller set of relevant dimensions. They also discuss a procedure known as differential language analysis, encouraging the use of database management systems that pervade the world of business and increasingly are being implemented in psychological research. Cautioning that results could be specific to a particular dataset and need to be further tested with independent data, Kern et al. explain and implement the k-fold cross-validation method, which tests a prediction model across repeated subsets of a large dataset to support the robustness of the findings. The authors also discuss prediction methods such as the lasso (i.e., least absolute shrinkage and selection operator), a regression method for robust prediction based on screening a large set of predictors and weighting the selected predictors conservatively (i.e., with lower magnitudes than in traditional OLS regression). They also caution against ecological fallacies, whereby researchers derive erroneous conclusions about individuals and subgroups based on results from a larger group of data, and exception fallacies, in which a conclusion is drawn based on outliers (exceptions) in the data that may stand out but do not represent the group. Not everyone uses social media, and some use it far more often or idiosyncratically than others. Still, these authors are optimistic about the amount and richness of the data that can be gleaned from social media, and the insights that can be gained from such data.
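
A minimal sketch of k-fold cross-validated lasso regression, in the general spirit of the methods Kern et al. describe (the simulated features and outcome are placeholders, and this is not their code), could look like the following:

    # Minimal sketch: 10-fold cross-validated lasso regression on simulated data.
    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 200))                     # e.g., topic or word-use features
    y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=500)  # e.g., a well-being score

    lasso = Lasso(alpha=0.1)                            # L1 penalty shrinks and selects predictors
    scores = cross_val_score(lasso, X, y, cv=10)        # 10-fold cross-validated R-squared
    print(scores.mean())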

In the fifth article, Jones, Wojcik, Sweeting, and Silver examine the content of Twitter posts after three different traumatic events (violence on or near college campuses), applying linguistic analyses to the text to assess negative emotional responses. They discuss a procedure known as Linguistic Inquiry and Word Count (LIWC) and the R-based twitteR package for collecting and analyzing such data. Using an innovative approach, the authors identify pertinent Twitter users as people who follow relevant community accounts tied to the geographical area of each event, and they are careful to compare results with control groups not similarly geographically situated, to help ensure that results were event-driven rather than reflecting other contemporaneous events that were more geographically widespread. Overall, this work demonstrates how psychological themes can be reliably extracted and related to region- and time-dependent events, similar to prior related work in the health arena.
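
As a deliberately simplified illustration of dictionary-based word counting (LIWC itself is a commercial tool with far richer dictionaries, and this is not the authors' pipeline; the tweets and the tiny negative-emotion word list are invented for the example), the basic idea can be sketched in Python:

    # Minimal sketch: proportion of negative-emotion words per tweet.
    negative_words = {"sad", "afraid", "angry", "hurt", "scared"}

    tweets = [
        "so sad and scared after last night",
        "campus vigil tonight, stay strong everyone",
    ]

    for tweet in tweets:
        tokens = tweet.lower().split()
        share = sum(token in negative_words for token in tokens) / len(tokens)
        print(round(share, 3))  # fraction of negative-emotion words in this tweet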

In the sixth article, Stanley and Byrne contribute a theory-driven approach to big-data modeling of human memory (i.e., long-term knowledge storage and retrieval), testing two theoretical models that predict the tags that users apply to Twitter and Stack Overflow posts. Incorporating but going beyond the psychological tenet that “past behavior predicts future behavior,” the current models robustly predict how and to what extent this tenet applies given the nature, recency, and frequency of past behavior. This paper exemplifies an important general point, that big-data analyses benefit from being theory-driven, demonstrating how theories can develop in their usefulness as a joint function of empirical competition (i.e., deciding which model affords better prediction) and empirical cooperation (i.e., demonstrating how model ensembles might account for the data more robustly than models taken individually). The authors discuss the use of an ACT-R based Bayesian model and a random permutation model to understand and clarify predictions about links between processes and outcomes.

The seventh article, by Brandmaier et al., discusses ensemble methods that the authors developed, one of which, called structural equation model (SEM) trees, combines decision trees (also called recursive partitioning methods) and SEM to understand the nature of a large dataset. These authors propose an extended method called SEM forests that allows researchers to generate and test hypotheses, combining both data- and theory-based approaches. These and other methods, such as latent class analysis and multiple-sample SEM, help in assessing distinct clusters in the data. Several methods are described for gauging how effectively an SEM forest models the data, such as examining variable importance based on out-of-bag samples from the SEM trees, as well as case proximity and, conversely, an average dissimilarity metric, the latter indicating a case's novelty. Brandmaier et al. provide two examples to demonstrate the use of SEM forests. Interested researchers can conduct similar analyses using Brandmaier's (2015) semtree package, written in R, with the supplemental material providing the R code for the examples they present.

In the eighth article, Miller, Lubke, McArtor, and Bergeman detail a new method for detecting robust nonlinearities and interactions in large datasets based on decision trees. Called multivariate gradient boosted trees, this method extends a well-established machine-learning or statistical learning theory method. Whereas most predictive models in the big data arena seek to predict a single criterion, the present approach considers multiple criteria to be predicted (as does the Beaton et al. partial least squares correspondence analysis method). Such exploration is useful for informing and refining theories, measures, and models that take a more deductive approach. To do this, a boosted tree-based model is fit separately for each outcome, where the goal is to minimize cross-validated prediction error across all outcomes. An advantage of tree-based methods comes in detecting complex predictive relationships (interactions and nonlinearities) without having to specify their functional form beforehand. In the current approach, tree models can be compared across outcomes, and the explained covariance between pairs of outcomes can also be explored. The authors illustrate this approach using measures of psychological well-being as predictors of multiple psychological and physical health outcomes. Interested readers can apply this method to their own data with Miller's R-based mvtboost package.
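
A conceptual Python sketch of fitting one gradient-boosted tree model per outcome appears below; Miller et al.'s approach is implemented in the R package mvtboost, which adds joint tuning and an explained-covariance decomposition that this simplified sketch (with simulated placeholder data) omits.

    # Conceptual sketch: one boosted-tree model per outcome column.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 30))                      # e.g., well-being predictors
    Y = np.column_stack([                               # two simulated health outcomes
        X[:, 0] ** 2 + rng.normal(size=400),
        X[:, 1] * X[:, 2] + rng.normal(size=400),
    ])

    models = [
        GradientBoostingRegressor(n_estimators=200, learning_rate=0.05, max_depth=3).fit(X, Y[:, j])
        for j in range(Y.shape[1])
    ]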

In the ninth article, Chapman, Weiss, and Duberstein consider measure-development models that focus squarely on predictive validity, using a machine-learning approach that both challenges and complements traditional approaches to measure development involving psychometric reliability. The proposed approach seeks out additional model complexity so long as it is justified by increased prediction, and it incorporates k-fold cross-validation methods to avoid model overfitting. Almost two decades ago, McDonald's (1999) classic book, Test theory: A unified treatment, also suggested that measures of constructs judged to be similar should not only demonstrate psychometric reliability, but also show similar relationships with measures of other constructs in a larger nomological net. The current big data paper reflects one important step toward advancing this general idea, discussing procedures and terms such as the elastic net, expected prediction error, generalized cross-validation error, stochastic gradient boosting, and supervised principal components analysis, as well as the R-based packages glmnet and superpc.
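
As a rough illustration of elastic-net prediction with k-fold cross-validation (using Python's scikit-learn rather than the R packages discussed in the article, and with simulated item responses and a simulated outcome standing in for real data):

    # Minimal sketch: cross-validated elastic net for predictive scale development.
    import numpy as np
    from sklearn.linear_model import ElasticNetCV

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 50))                        # e.g., candidate scale items
    y = X[:, :5].sum(axis=1) + rng.normal(size=300)       # outcome driven by a few items

    enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=10)  # mixes lasso (L1) and ridge (L2) penalties
    enet.fit(X, y)
    selected = (enet.coef_ != 0).sum()                    # items retained after shrinkage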

For the final, 10th article, Beaton, Dunlop, and Abdi jointly analyze genetic, behavioral, and structural MRI data in a tutorial on a generalized version of partial least squares called partial least squares correspondence analysis (PLSCA). The method can handle disparate data types that are on widely different scales, as might become increasingly common in large and complex datasets. In particular, their methods can accommodate categorical data when analyzing relationships between two sets of multivariate data, where traditional analyses assume that the data for each variable are continuous (or, even more strictly, multivariate normal). These authors have developed a freely available R package, TExPosition, which allows readers to apply the PLSCA method to their own data.

In closing, we hope you find something of interest to you in one or more of the 10 articles we present in this special issue on the use of big data in psychology. We recognize that other articles may approach these topics differently, and likewise, many other big data topics will be discussed in the future. We look forward to continued tutorials and other research publications in Psychological Methods that share even more about how to apply innovative and informative big data methods to meaningful and relevant data of interest to researchers in psychology and related social science fields.

Acknowledgments

The co-editors (Harlow and Oswald) would like to thank the authors and reviewers who contributed to this special issue. We also would like to offer much appreciation and thanks to our manuscript coordinator, Meleah Ladd, who has played an integral part in helping to make every aspect of our work better and more enjoyable, especially so with this special issue. Lisa Harlow also gratefully acknowledges support from National Institutes of Health grant G20RR030883.

Appendix A: Translational Abstracts (TAs) for the 10 Special Issue Articles

The massive volume of data that now covers a wide variety of human behaviors offers researchers in psychology an unprecedented opportunity to conduct innovative theory- and data-driven field research. This article is a practical guide to conducting big data research, covering the practices of acquiring, managing, processing, and analyzing data. It is accompanied by three tutorials that walk through the acquisition of real text data, the analysis of that text data, and the use of an algorithm to classify data into different categories. Big data practitioners in academia, industry, and the community have built a comprehensive base of tools and knowledge that makes big data research accessible to researchers in a broad range of fields. However, big data research does require knowledge of software programming and a different analytical mindset. For those willing to acquire the requisite skills, innovative analyses of unexpected or previously untapped data sources can offer fresh ways to develop, test, and extend theories. When conducted with care and respect, big data research can become an essential complement to traditional research.

One of the biggest challenges for psychology researchers is finding high-quality sources of data to address research questions of interest. Often, researchers rely on simply giving surveys to undergraduate students, which can cause problems when trying to draw conclusions about human behavior in general. To work around these problems, researchers sometimes watch people in real life, or observe them via the web, taking notes on their behaviors to be analyzed later. But this process is time-consuming, difficult, and error-prone. In this paper, we provide a tutorial on a technique that can be used to create datasets summarizing actual human behavior on the internet in an automated way, partially solving both of these problems. This big data technique, called web scraping, takes advantage of a programming language called Python commonly used by data scientists. We also introduce a new related concept, called data source theories, as a way to address a common criticism of many big data approaches: specifically, that because the analytic techniques are "data-driven," they tend to take advantage of luck more so than psychology's typical approaches. As a result of this tendency, researchers sometimes draw conclusions that do not reflect reality beyond their dataset. In creating a data source theory, researchers account precisely for why the data they found exist and test the hypotheses implied by that theory with additional analyses. Thus, we combine the strengths of psychology (i.e., high-quality measurement and rich theory) with those of data science (i.e., flexibility and power in analysis).

Humans are increasingly migrating to the digital environment, producing large amounts of digital footprints of behaviors, communication, and social interactions. Analyzing big datasets of such footprints presents unique methodological challenges, but could greatly further our understanding of individuals, groups, and societies. This tutorial provides an accessible introduction to the crucial methods used in big data analysis. We start by listing potential data sources, and explain how to efficiently store and prepare data for analysis. We then show the reader how to reduce the dimensionality of big datasets and extract patterns from them. Finally, we demonstrate how to employ such data to build prediction models. The text is accompanied by examples of R code and a sample dataset, allowing the reader to put their new skills into practice.

Many people spend considerable time on social media sites such as Facebook and Twitter, expressing thoughts, emotions, behaviors, and more. The massive data that are available provide researchers with opportunities to study people within their real-world contexts, at a scale previously impossible for psychological research. However, typical psychological methods are inadequate for dealing with the size and messiness of such data. Modern computational linguistics strategies offer tools and techniques, and numerous resources are available, but there is little guidance for psychologists on where to even begin. We provide an introduction to help guide such research. We first consider how to acquire social media data and transform them from meaningless characters into words, phrases, and topics. Both top-down, theory-driven approaches and bottom-up, data-driven approaches can be used to describe characteristics of individuals, groups, and communities, and to predict other outcomes. We then provide several examples from our own work, looking at personality and well-being. However, the power and potential of social media language data also bring responsibility. We highlight challenges and issues that need to be considered, including how data are accessed, processed, analyzed, and interpreted, and ever-evolving ethical issues. Social media has become a valuable part of social life, and there is much we can learn by cautiously bringing together the tools of computer science with the theories and insights of psychology.

Capturing a snapshot of emotional responses of a community soon after a collective trauma (e.g., a school shooting) is difficult. However, because of its rapid distribution and widespread use, social media such as Twitter may provide an immediate window into a community's emotional response. Nonetheless, locating Twitter users living in communities that have experienced collective traumas is challenging. Prior researchers have either used the extremely small number of geo-tagged tweets (3–6%) to identify residents of affected communities or used hashtags to collect tweets without certainty of the users' location. We offer an alternative: identify a subset of local community Twitter accounts (e.g., city hall), identify followers of those accounts, and download their tweets for content analysis. Across three case studies of college campus killings (i.e., UC-Santa Barbara, Northern Arizona University, Umpqua Community College), we demonstrate the utility of this method for rapidly investigating negative emotion expression among likely community members. Using rigorous longitudinal quasi-experimental designs, we randomly selected Twitter users from each impacted community and matched control communities to compare patterns of negative emotion expression in users' tweets. Despite variation in the severity of violence across cases, similar patterns of increased negative emotion expression were visible in tweets posted by followers of Twitter accounts in affected communities after the killings compared to before the violence. Tweets from community-based Twitter followers in matched control communities showed no change in negative emotion expression over time. Using localized Twitter data offers promise in studying community-level response in the immediate aftermath of collective traumas.

The growth of social media and user-created content on online sites provides unique opportunities to study models of long-term memory. By framing the task of choosing a hashtag for a tweet or tagging a post on Stack Overflow as a long-term memory retrieval problem, two long-term memory models were tested on millions of posts and tweets and evaluated on how accurately they predict a user's chosen tags. An uncompressed model and a compressed model (in terms of how information is stored in long-term memory) were tested on the large datasets. The results show that a user's past tag use is a strong predictor of future tag use. Furthermore, past behavior was successfully incorporated into the compressed model, which previously used only context. Also, an attentional weight term in the uncompressed model was linked to a natural language processing method used to attenuate common words (e.g., articles and prepositions). Word order was not found to be a strong predictor of tag use, and the compressed model performed comparably to the uncompressed model without including word order. This shows that the strength of the compressed model lies not in its ability to represent word order, but rather in the way in which information is efficiently compressed. The results of the large-scale exploration show how the architecture of the two memory models can be modified to significantly improve accuracy, and they may suggest task-independent general modifications that can help improve model fit to human data in a much wider range of domains.

Building models fully informed by theory is impossible when datasets are large and their relations to theory are not yet specified. In such instances, researchers may start with a core model guided by theory, and then face the problem of which additional variables should be included and which may be omitted. Structural equation model (SEM) trees, a combination of SEM and decision trees, offer a principled solution to this selection problem. SEM trees hierarchically split empirical data into homogeneous groups sharing similar data patterns by recursively selecting optimal predictors of these differences from a potentially large set of candidates. SEM forests are an extension of SEM trees, consisting of ensembles of SEM trees, each built on a random sample of the original data. By aggregating the predictive information contained in a forest, researchers obtain a measure of variable importance that is more robust than corresponding measures from single trees. Variable importance indicates which variables may be missing from researchers' models and may guide revisions of the underlying theory. In summary, SEM trees and forests serve as a data-driven tool for the improvement of theory-guided latent variable models. By combining the flexibility of SEM as a generic modeling technique with the potential of trees and forests to account for diverse and interactive predictors, SEM trees and forests serve as a powerful tool for hypothesis generation and theory development.

Collecting data from smartphones, watches, or websites is a promising development for psychological research. However, exploring these datasets can be challenging because there are often extremely large numbers of possible variables that could be used to predict an outcome of interest. In addition, there is often not much established theory that could help make a selection. Using statistical models such as regression models for data exploration can be inconvenient because these standard methods are not designed to handle large data. In the worst case, using simple statistical models can be misleading. For example, simply testing the correlation between predictors and outcomes will likely miss predictors with effects that are not approximately linear. In this paper we suggest using a machine learning method called 'gradient boosted decision trees'. This approach can detect predictors with many different kinds of effects, but is easy to use compared to fitting many different statistical models. We extend this method to multivariate outcomes, and implement our approach in the R package mvtboost, which is freely available on CRAN. To illustrate the approach, we analyze predictors of psychological well-being and show how to estimate, tune, and interpret the results. The analysis showed, for example, that above-average control of internal states, in particular, is associated with increased personal growth. Experimental results from statistical simulations verify that our approach identifies predictors with nonlinear effects and achieves high prediction accuracy. It exceeds or matches the performance of other cutting-edge machine learning methods over a wide range of conditions.

Researchers are often faced with problems that involve predicting an important outcome, based on a large number of factors that may be plausibly related to that outcome. Traditional methods for null hypothesis significance tests of one or a small number of specific predictors are not optimal for such problems. Machine learning, reformulated in a statistics framework as Statistical Learning Theory (SLT), offers a powerful alternative. We review the fundamental tenets of SLT, which center around constructing models that maximize predictive accuracy. Importantly, these models prioritize predictive accuracy in new data, external to the sample used to build the models. We illustrate three common SLT algorithms exemplifying this principle, in the psychometric task of developing a personality scale to predict future mortality. We conclude by reviewing some of the diverse contexts in which SLT models might be useful. These contexts are unified by research problems that do not seek to test a single or small number of null hypotheses, but instead involve accurate prediction of an outcome based on a large amount of potentially relevant data.

For nearly a century, detecting the genetic contributions to cognitive and behavioral phenomena has been a core interest for psychological research, and that interest is even stronger now. Today, the collection of genetic data is both simple and inexpensive. As a consequence, a vast amount of genetic data is collected across disciplines as diverse as experimental and clinical psychology, the cognitive sciences, and the neurosciences. However, such an explosion in data collection can make data analyses very difficult. This difficulty is especially relevant when we wish to identify relationships within, and between, genetic data and, for example, cognitive and neuropsychological batteries. To alleviate such problems, we have developed a multivariate approach to make these types of analyses easier and to better identify the relationships between multiple genetic markers and multiple behavioral or cognitive phenomena. Our approach, called partial least squares correspondence analysis (PLSCA), generalizes partial least squares and identifies the information common to two different data tables measured on the same participants. PLSCA is specifically tailored for the analysis of complex data that may exist in a variety of measurement scales (e.g., categorical, ordinal, interval, or ratio scales). In our paper, we present, in a tutorial format, how PLSCA works, how to use it, and how to interpret its results. We illustrate PLSCA with genetic, behavioral, and neuroimaging data from the Alzheimer's Disease Neuroimaging Initiative. Finally, we make available R code and data examples so that those interested can easily learn and use the technique.

Appendix B: Glossary of Some of the Major Terms used in the10 Special Issue Articles

ACT-R based Bayesian models are based on the ACT-R theory of declarative memory that can be operationalized as a big data predictive model, reflecting how declarative memory processes (e.g., exposure, learning, recall, forgetting) affect behavioral outcomes. The predictive model incorporates a version of the Naïve Bayes method, such that any piece of knowledge is assigned a prior probability for being retrieved by the user, independent of all other pieces of available knowledge, which is then weighted by the information in the current context to yield a posterior distribution and prediction. See Stanley and Byrne.

APA Ethical Principles of Psychologists and Code of Conduct (2010), along with guidelines from the Data Science Association, suggest policies and procedures for collecting data in a responsible manner that respects the participants and the research field in which conclusions will be shared. See Landers et al.

Application Programming Interfaces ( APIs ) refer to sets of procedures that software programs use to request and access data in a systematic way from other software sources (APIs can be web-based or platform-specific). See Chen and Wojcik; Jones et al.; Kern et al.; and Stanley and Byrne.

Average dissimilarity is a general term indicating how different a case tends to be from the rest of the data. See Brandmaier et al.

Bag of words conveys word frequency in a relevant text (e.g., sentence, paragraph, entire document), without retaining the ordering or context of the words. See Chen and Wojcik.

Case proximity is a general term for the similarity between entities in a data set, identifying any clear outliers. See Brandmaier et al.

Crud factor (Meehl, 1990, p. 108) is a general term indicating that in any psychological domain, measures of constructs all tend to correlate with one another at some nonzero level. Traditional analyses have had to contend with this, and big data analyses will as well. See Harlow and Oswald.

Data source theory refers to a well-thought out theoretical rationale, developed on the basis of the available variables in a given set, to support the nature of the data and the findings derived from them. Researchers working with big data projects are encouraged to have a data source theory to guide exploration, analyses, and empirical results in large data sets. See Landers et al.

Database management system ( DBMS ) is a structure that can store, update, and retrieve large amounts of data that can be accrued in research studies. See Kern et al.

Data Science Association ( http://www.datascienceassn.org/ ) is an educational group that offers guidelines for researchers to follow regarding ethics and other matters relevant to organizations. See Landers et al.

Decision trees ( also called recursive partitioning methods ) are models that apply a series of cutoffs on predictor variables, such that at each stage of selecting a predictor and cutoff point, the two groups created by the cutoff are as separated (i.e., internally coherent and externally distinct) as possible on the outcome variable. Decision trees model complex interactions, because each split of the tree on a given predictor is dependent on all splits from the previous predictors. See Brandmaier et al., and Miller et al.

Differential language analysis ( DLA ) is an empirical method used to extract underlying dimensions of words or phrases without making a priori assumptions about the structure of the language, and then relating these dimensions to outcomes of interest. See Kern et al.

Digital footprint refers to data that can be obtained from various sources such as the web, the media, and other forums in which publicly available information is posted by or stored regarding individuals or events. These kinds of data can be stored in what is called a User-Footprint Matrix. See Kosinski et al.

Ecological fallacies are incorrect conclusions made about individual people or entities that are derived from information that summarizes a larger group. For example, if a census found that higher educational levels were associated with higher income, it would not necessarily be true that everyone with high income had a high level of education. Simpson’s paradox is an extreme example, where each within-group relationship may be different from or even the opposite of a between-group relationship. See Kern et al.

Elastic net refers to a regression model that linearly weights the penalty functions from two regression models: the lasso regression model (applying an L1 penalty that conducts variable selection and shrinkage of non-zero weights) and the ridge regression model (applying an L2 penalty that applies shrinkage, does not select variables, and will include correlated predictors, unlike lasso ). See Chapman et al.

Ensemble methods involve the use of predictions across several models. The idea is that combining predictions across models tends to be an improvement over the predictions taken from any single model in isolation. An example of an ensemble method is structural equation model (SEM) forests (see this term, below). See Brandmaier et al.

Exception fallacies involve mistaken conclusions about a group derived from a few unrepresentative instances in which an event, term, or characteristic occurs quite a lot. For example, if one or two participants in a dataset mention the word “sad” many times, it could falsely be surmised that the group of data as a whole experienced depression. See Kern et al.

Expected prediction error ( EPE ) is an index of accuracy for a predictive model, decomposed into: (a) squared bias (systematic model over- or under-prediction across data sets), (b) variance (fluctuation in the model parameter estimates across data sets), and (c) irreducible error variance (variance that cannot be explained by any model). Expected prediction error captures the bias-variance tradeoff : Models that are too simple will under-fit the data and show high bias yet low variance in the EPE formula; models that are too complex will over-fit the data and show low bias yet high variance in the EPE formula. See Chapman et al.
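
For squared-error loss at a fixed point x_0, this decomposition is commonly written in the standard textbook form below (shown here as a general illustration, not a formula reproduced from Chapman et al.):

    EPE(x_0) = \sigma_{\varepsilon}^{2} + \mathrm{Bias}^{2}\!\left[\hat{f}(x_0)\right] + \mathrm{Var}\!\left[\hat{f}(x_0)\right]

Here \sigma_{\varepsilon}^{2} is the irreducible error variance, and the bias and variance terms describe systematic mis-prediction and sampling fluctuation of the fitted model \hat{f}, respectively.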

Generalized cross-validation error indicates the target that is to be minimized (the loss function) in k-fold cross-validation: e.g., the sum of squared errors, the sum of absolute errors, or the Gini coefficient for dichotomous outcomes. See Chapman et al.

glmnet is a computer package written in R code by Friedman, Hastie, Simon and Tibshirani (2016) that fits lasso and elastic-net models, with the ability to graph model solutions across the entire path of relevant tuning parameters. See Chapman et al.

Heatmaps plot the relationships among variables and/or clusters, using colors or shading to indicate the strength of relationship among variables. See Kosinski et al.

k-fold cross-validation involves partitioning a large dataset into k subsets of equal size. First, a model is developed on (k − 1) partitions of the data (the "training" set); then predicted values from that model are obtained on the kth partition that was held out (the "test" set). This process is repeated k times so that each partition serves once as the test set, and all data therefore have predicted values from models in which they did not participate. See Chapman et al., and Kern et al.

Lasso ( least absolute shrinkage and selection operator ) is a regression method that helps screen out predictor variables that are not contributing much to a model relative to the others. See Kern et al.

Latent class analysis can help explain the heterogeneity in a set of data by clustering individuals into unobserved types, based on observed multivariate features. Features may be continuous or categorical in nature. See Brandmaier et al.

Latent Dirichlet allocation ( LDA ) is a method that models words within a corpus as being attributable to a smaller set of unobserved categories (topics) that are empirically derived. See Chen and Wojcik.

Latent semantic analysis ( LSA ) involves the examination of different texts, where it is assumed that the use of similar words can reveal common themes across different sources. See Chen and Wojcik, and Kern et al.

Linguistic inquiry and word count ( LIWC ) is a commercial analysis tool for matching target words (words within the corpus being analyzed) to dictionary words (words in the LIWC dictionary). Target words are then characterized by the coded features of their matching dictionary words, such as their tense and part of speech, psychological characteristics (e.g., affect, motivation, cognition), and type of concern (e.g., work, home, religion, money). See Jones et al.

Machine learning , which has also been called statistical learning theory , is a generic term that refers to computational procedures for identifying patterns and developing models that improve the prediction of an outcome of interest. See Chapman et al.; Chen and Wojcik; Harlow and Oswald; Kern et al; and Miller et al.

Multiple sample structural equation modeling ( SEM ) helps in testing differences across the different clusters that emerge, to identify the patterns of heterogeneity. See Brandmaier et al.

Multivariate gradient boosted trees involve a nonparametric regression method that applies the idea of stochastic gradient boosting to trees (see stochastic gradient boosting ). Trees are fitted iteratively to the residuals obtained from previous trees, while seeking to optimize cross-validated prediction across multiple outcomes (not just one). See Miller et al.

mvtboost is a package written in R code by Miller that implements multivariate gradient boosted trees, allowing the user to tune and explore the model. See Miller et al.

MyPersonality project ( http://www.mypersonality.org/ ; Kosinski, Matz, Gosling, Popov, and Stillwell, 2015) stores the scores from dozens of psychological questionnaires as well as Facebook profile data of over six million participants. See Kosinski, et al.

MySQL is an open-source database management system built on the structured query language (SQL) and often used in big data projects. See Harlow and Oswald, and Chen and Wojcik.

Novelty refers to how different a case is from the rest of the data, showing little proximity and more dissimilarity. See Brandmaier et al.

Out-of-bag samples are portions of a larger dataset that do not participate in the development of a predictive model and can therefore be used to generate predicted values (and estimates of prediction error). Out-of-bag samples are similar to the test sample data referred to previously under k-fold cross-validation. See Brandmaier et al.

Partial least squares correspondence analysis ( PLSCA ) is a generalization of partial least squares that can extract relationships from two separate sets of data measured on the same sample. In particular, PLSCA is useful for handling both categorical and continuous data types (e.g., genetic single-nucleotide polymorphisms that are categorical, and behavioral data that are roughly continuous). Permutation tests and bootstrapping are applied to conduct statistical inference for the overall fit of the model as well as inference on the stability of each obtained component. See Beaton et al.

Random permutation model is an approach for determining whether to preserve information about word order in text analytics, in case doing so provides additional predictive information. Permutations create uncorrelated vectors as a point of contrast with the actual ordering. See Stanley and Byrne.

semtree is a computer package developed by Brandmaier (2015; http://brandmaier.de/semtree/) and written in R. It can be used to fit SEM trees and forests to help explore and discern clusters or subgroups within a large dataset. See Brandmaier et al. and related references.

Singular value decomposition ( SVD ) is a procedure used to reduce a large set of variables or items to a smaller set of dimensions. It is one approach to conducting a principal components analysis . See Kosinski et al.

Stack Overflow is an online question-and-answer forum for programmers (using R, Python, and otherwise). See Stanley and Byrne.

Stochastic gradient boosting is a general term for an iterative method of regression, such that the predictor entered first has the highest functional relationship with the outcome; then residuals are created, and the same rule is applied (where the outcome now becomes the residuals). Also, at each iteration, only a subset of the data is used to help develop more robust models (where out-of-bag prediction errors can be obtained from the data outside of the model). The learning rate and number of iterations are, loosely, inversely related (low learning rate, or improvement in prediction at each step, generally means more iterations) and optimizing these can be explored and supported through cross-validation. See Chapman et al.
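
The core loop described above can be sketched conceptually as follows (a minimal Python illustration assuming squared-error loss and simulated data; production implementations such as the R gbm package or scikit-learn add many refinements):

    # Conceptual sketch: shallow trees fit to residuals on random subsamples.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)

    learning_rate, n_iterations = 0.1, 100
    prediction = np.zeros(len(y))
    trees = []
    for _ in range(n_iterations):
        residuals = y - prediction                       # current errors become the new outcome
        subsample = rng.choice(len(y), size=len(y) // 2, replace=False)  # stochastic part
        tree = DecisionTreeRegressor(max_depth=2).fit(X[subsample], residuals[subsample])
        prediction += learning_rate * tree.predict(X)    # small step toward the residuals
        trees.append(tree)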

Stop words are words that are not essential to a phrase or text and therefore can be omitted to help keep a file more concise. Examples of stop words include “an” and “the” or other similarly nondescript words that can be deleted from a large database (e.g., Twitter, Facebook) and do not need to be analyzed. See Chen and Wojcik; Kern et al.; and Stanley and Byrne.

Structural equation model (SEM) forests are classification procedures that combine SEM and decision-tree (SEM-tree) methods to understand the nature of subgroups that exist in a large dataset. SEM forests extend the method of SEM trees by resampling the data to form aggregates of SEM trees that should have less bias and more stability. See Brandmaier et al.

Structural equation model (SEM) trees combine the methods of decision trees and SEM to conduct theory-guided analysis of large datasets. SEM trees are useful in examining a theoretically based prediction model, but can be unstable when random variation in the data is inadvertently featured in a decision tree. See Brandmaier et al.

superpc is a computer package written in R code by Bair and Tibshirani (2010) that conducts the procedure known as supervised principal components analysis, a term that is defined below. See Chapman et al.

Supervised learning algorithms are procedures that are developed on a training dataset with known outcomes and then used to build models that predict an outcome from one or more variables in new data. See Chen and Wojcik.

Supervised principal components analysis ( SPCA ) is a generalization of principal components regression that first selects predictors with meaningful univariate relationships with the outcome and then performs principal components analysis. Cross-validation is used to determine the appropriate threshold for variable selection and the number of principal components to retain. See Chapman et al.

TExPosition is a computer package written in R code by Beaton and colleagues that implements partial least squares correspondence analysis (this latter term being defined previously). See Beaton et al.

Theory of the data source is the process whereby a larger conceptual framework is adopted when analyzing and interpreting findings from a large dataset, particularly one obtained for another purpose, such as with web scraping of generally available data. See Landers et al.

twitteR is a package written in R code by Jeff Gentry that accesses the Twitter API (see glossary entry on this term), which then allows one to extract subsets of Twitter data found online, search the data, and subject the data to text analyses. See Jones et al.

User-footprint matrix holds information obtained from sources such as the web or various records and lists. See Kosinski et al.

Variable importance is a term indicating how much the inclusion of a specific variable will reduce the degree of uncertainty there is in a model (or models) of interest. The uncertainty criterion and the model of course must be mathematically formalized. See Brandmaier et al.

Web scraping is a process that culls large amounts of data from web pages to be used in observational or archival data collection projects. See Landers et al.

World Well-Being Project ( WWBP , http://www.wwbp.org/ ) involves a collaboration with researchers from psychology and computer science. The project draws on language data from social media to study evidence for well-being that can be revealed through themes of interpersonal relationships, successful achievements, involvement with activities, and indication of meaning and purpose in life. See Kern et al.

A draft of a portion of this introduction was previously presented in Harlow, L. L., & Spahn, R. (2014, October). Big data science: Is there a role for psychology? Abstract for the Society of Multivariate Experimental Psychology, Nashville, TN.

Contributor Information

Lisa L. Harlow, Department of Psychology, University of Rhode Island.

Frederick L. Oswald, Department of Psychology, Rice University.

  • Alvarez RM. Computational social science: Discovery and prediction. Cambridge: Cambridge University Press; 2016. [ Google Scholar ]
  • APA. Ethical principles of psychologists and code of conduct. 2010 Retrieved September 28, 2016 from: http://www.apa.org/ethics/code/
  • Azmak O, Bayer H, Caplin A, Chun M, Glimcher P, Koonin S, Patrinos A. Using big data to understand the human condition: The Kavli HUMAN Project. Big Data. 2015; 3 :173–188. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Bair E, Tibshirani R. superpc: Supervised principal components. R package version 1.07. 2010 Retrieved from http://www-stat.stanford.edu/~tibs/superpc .
  • Brandmaier AM. semtree: Recursive partitioning of structural equation models in R [Computer software manual] 2015 Retrieved from http://www.brandmaier.de/semtree .
  • Cioffi-Revilla C. Introduction to computational social science: Principles and applications. London: Springer-Verlag; 2014. [ Google Scholar ]
  • Fawcett T. Mining the quantified self: Personal knowledge discovery as a challenge for data science. Big Data. 2016; 3 :249–266. [ PubMed ] [ Google Scholar ]
  • Friedman J, Hastie T, Simon N, Tibshirani R. glmnet: Lasso and elastic-net regularized generalized linear models. R package 2.0-6. 2016 Retrieved from https://cran.r-project.org/web/packages/glmnet/glmnet.pdf .
  • Gentry J. Package ‘twitteR’, version 1.1.9. 2016 Retrieved from https://cran.r-project.org/web/packages/twitteR/twitteR.pdf .
  • Mayer-Schönberger V, Cukier K. Big data: A revolution that will transform how we live, work, and think. Boston: Houghton Mifflin Harcourt; 2013. [ Google Scholar ]
  • McArdle JJ, Ritschard G, editors. Contemporary issues in exploratory data mining in the behavioral sciences. New York: Routledge; 2014. [ Google Scholar ]
  • McDonald RP. Test theory: A unified treatment. New York: Routledge; 1999. [ Google Scholar ]
  • Meehl PE. Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry. 1990; 1 :108–141. [ Google Scholar ]

Big data in social and psychological science: theoretical and methodological issues

  • Survey Article
  • Published: 05 December 2017
  • Volume 1 , pages 59–66, ( 2018 )

  • Lin Qiu 1 ,
  • Sarah Hian May Chan 1 &
  • David Chan 2  

Big data presents unprecedented opportunities to understand human behavior on a large scale. It has been increasingly used in social and psychological research to reveal individual differences and group dynamics. There are a few theoretical and methodological challenges in big data research that require attention. In this paper, we highlight four issues, namely data-driven versus theory-driven approaches, measurement validity, multi-level longitudinal analysis, and data integration. They represent common problems that social scientists often face in using big data. We present examples of these problems and propose possible solutions.

Chan, D. (1998). The conceptualization and analysis of change over time: An integrative approach incorporating longitudinal means and covariance structures analysis (LMACS) and multiple indicator latent growth modeling (MLGM). Organizational Research Methods, 1 (4), 421–483.

Chan, D. (1998). Functional relations among constructs in the same content domain at different levels of analysis: A typology of composition models. Journal of Applied Psychology, 83 (2), 234–246.

Chan, D. (2005). Current directions in personnel selection. Current Directions in Psychological Science, 14 (4), 220–223.

Chan, D. (2010). Advances in analytical strategies. In S. Zedeck (Ed.), APA handbook of industrial and organizational psychology (Vol. 1). Washington, DC: American Psychological Association.

Chan, D. (2013). Advances in modeling dimensionality and dynamics of job performance. In K. J. Ford, J. Hollenbeck, & A. M. Ryan (Eds.), The psychology of work . Washington, DC: American Psychological Association.

Chan, D. (2014). Time and methodological choices. In A. J. Shipp & Y. Fried (Eds.), Time and work: How time impacts groups, organizations, and methodological choices (Vol. 2, pp. 146–176). New York: Psychology Press.

Cheung, M. W. L., & Jak, S. (2016). Analyzing big data in psychology: A split/analyze/meta-analyze approach. Frontiers in Psychology, 7, 738. https://doi.org/10.3389/fpsyg.2016.00738 .

Doré, B., Ort, L., Braverman, O., & Ochsner, K. N. (2015). Sadness shifts to anxiety over time and distance from the national tragedy in Newtown, Connecticut. Psychological Science, 26 (4), 363–373. https://doi.org/10.1177/0956797614562218 .

Gao, W., Qiu, L., Chiu, C-y, & Yang, Y. (2015). Diffusion of opinions in a complex culture system: Implications for emergence of descriptive norms. Journal of Cross-Cultural Psychology, 46 (10), 1252–1259.

Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S., & Brilliant, L. (2009). Detecting influenza epidemics using search engine query data. Nature, 457 (7232), 1012–1014.

Golder, S. A., & Macy, M. W. (2011). Diurnal and seasonal mood vary with work, sleep, and daylength across diverse cultures. Science, 333 (6051), 1878–1881. https://doi.org/10.1126/science.1202775 .

Hodas, N. O., & Lerman, K. (2014). The simple rules of social contagion. Scientific Reports, 4, 4343. https://doi.org/10.1038/srep04343 .

IDC. (2010, May). The digital universe decade — Are you ready? Retrieved Nov 23, 2017, from https://www.emc.com/collateral/analyst-reports/idc-digital-universe-are-you-ready.pdf .

Jackson, J. C., Rand, D., Lewis, K., Norton, M. I., & Gray, K. (2017). Agent-based modeling: A guide for social psychologists. Social Psychological and Personality Science, 8 (4), 387–395. https://doi.org/10.1177/1948550617691100 .

Kern, M. L., Eichstaedt, J. C., Schwartz, H. A., Park, G., Ungar, L. H., Stillwell, D. J., et al. (2014). From “sooo excited!!!” to “so proud”: Using language to study development. Developmental Psychology, 50, 178–188.

Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110 (15), 5802–5805.

Kosinski, M., Matz, S. C., Gosling, S. D., Popov, V., & Stillwell, D. (2015). Facebook as a research tool for the social sciences: Opportunities, challenges, ethical considerations, and practical guidelines. American Psychologist, 70 (6), 543–556. https://doi.org/10.1037/a0039210 .

Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.-L., Brewer, D., et al. (2009). Computational social science. Science, 323 (5915), 721.

Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google flu: Traps in big data analysis. Science, 343 (6176), 1203.

Lin, H., Tov, W., & Qiu, L. (2014). Emotional disclosure on social networking sites: The role of network structure and psychological needs. Computers in Human Behavior, 41, 342–350.

Liu, P., Tov, W., Kosinski, M., Stillwell, D. J., & Qiu, L. (2015). Do Facebook status updates reflect subjective well-being? Cyberpsychology, Behavior, and Social Networking, 18 (7), 373–379.

Liu, L., Preotiuc-Pietro, D., Riahi Samani, Z., Moghaddam, M. E., & Ungar, L. (2016). Analyzing personality through social media profile picture choice. In Tenth international AAAI conference on web and social media .

Nai, J., Narayanan, J., Hernandez, I., & Savani, K. (in press). People in more racially diverse neighborhoods are more prosocial. Journal of Personality and Social Psychology .

Park, G., Schwartz, H. A., Sap, M., Kern, M. L., Weingarten, E., Eichstaedt, J. C., et al. (2017). Living in the past, present, and future: Measuring temporal orientation with language. Journal of Personality, 85 (2), 270–280. https://doi.org/10.1111/jopy.12239 .

Pennebaker, J. W., Chung, C. K., Ireland, M., Gonzales, A., & Booth, R. J. (2007). The development and psychometric properties of LIWC2007 . (LIWC.net, Austin, TX). Retrieved Nov 23, 2017, from www.liwc.net/LIWC2007LanguageManual.pdf .

Qiu, L., Lin, H., Leung, A. K.-Y., & Tov, W. (2012). Putting their best foot forward: Emotional disclosure on Facebook. Cyberpsychology, Behavior, and Social Networking, 15 (10), 569–572. https://doi.org/10.1089/cyber.2012.0200 .

Qiu, L., Lin, H., Ramsay, J., & Yang, F. (2012). You are what you Tweet: Personality expression and perception on Twitter. Journal of Research in Personality, 46 (6), 710–718.

Qiu, L., Lin, H., & Leung, A. K.-Y. (2013). Cultural differences and switching of in-group sharing behavior between an American (Facebook) and a Chinese (Renren) social networking site. Journal of Cross-Cultural Psychology, 44 (1), 106–121.

Qiu, L., Lu, J., Yang, S., Qu, W., & Zhu, T. (2015). What does your selfie say about you? Computers in Human Behavior, 52, 443–449.

Rentfrow, P. J., & Jokela, M. (2016). Geographical psychology: The spatial organization of psychological phenomena. Current Directions in Psychological Science, 25 (6), 393–398.

Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Ramones, S. M., Agrawal, M., et al. (2013). Personality, gender, and age in the language of social media: The open-vocabulary approach. PLoS ONE, 8 (9), e73791. https://doi.org/10.1371/journal.pone.0073791 .

Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29 (1), 24–54.

Tov, W., Ng, K. L., Lin, H., & Qiu, L. (2013). Detecting well-being via computerized content analysis of brief diary entries. Psychological Assessment, 25 (4), 1069–1078. https://doi.org/10.1037/a0033007 .

Wojcik, S. P., Hovasapian, A., Graham, J., Motyl, M., & Ditto, P. (2015). Conservatives report, but liberals display, greater happiness. Science, 347 (6227), 1243–1246.

Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12 (6), 1100–1122. https://doi.org/10.1177/1745691617693393 .

Youyou, W., Kosinski, M., & Stillwell, D. (2015). Computer-based personality judgments are more accurate than those made by humans. Proceedings of the National Academy of Sciences, 112 (4), 1036–1040.

Author information

Authors and affiliations.

School of Social Sciences, Nanyang Technological University, Singapore, Singapore

Lin Qiu & Sarah Hian May Chan

Behavioural Sciences Institute, Singapore Management University, Singapore, Singapore

David Chan

Corresponding author

Correspondence to Lin Qiu.

About this article

Qiu, L., Chan, S.H.M. & Chan, D. Big data in social and psychological science: theoretical and methodological issues. J Comput Soc Sc 1 , 59–66 (2018). https://doi.org/10.1007/s42001-017-0013-6

Received : 16 November 2017

Accepted : 27 November 2017

Published : 05 December 2017

Issue Date : January 2018

DOI : https://doi.org/10.1007/s42001-017-0013-6

  • Computational social science
  • Social science
  • Social media
  • Methodology
  • DOI: 10.1037/amp0000190

Big Data in Psychology: A Framework for Research Advancement

  • Idris Adjerid, Ken Kelley
  • Published in American Psychologist 22 February 2018


METHODS article

Analyzing big data in psychology: A split/analyze/meta-analyze approach

Mike W.-L. Cheung* and Suzanne Jak

  • 1 Department of Psychology, National University of Singapore, Singapore, Singapore
  • 2 Department of Methods and Statistics, Utrecht University, Utrecht, Netherlands

Big data is a field that has traditionally been dominated by disciplines such as computer science and business, where mainly data-driven analyses have been performed. Psychology, a discipline in which a strong emphasis is placed on behavioral theories and empirical research, has the potential to contribute greatly to the big data movement. However, one challenge to psychologists—and probably the most crucial one—is that most researchers may not have the necessary programming and computational skills to analyze big data. In this study we argue that psychologists can also conduct big data research and that, rather than trying to acquire new programming and computational skills, they should focus on their strengths, such as performing psychometric analyses and testing theories using multivariate analyses to explain phenomena. We propose a split/analyze/meta-analyze approach that allows psychologists to easily analyze big data. Two real datasets are used to demonstrate the proposed procedures in R. A new research agenda related to the analysis of big data in psychology is outlined at the end of the study.

The amount of data in the world is enormous. For example, the size of the data that Google owns is estimated to be 15 exabytes [that is 15,000 petabytes or 15 billion gigabytes (GB)], Facebook is estimated to have 150 petabytes of data, and eBay is estimated to have 90 petabytes of data ( Huss and Westerberg, 2014 ). According to IBM, the amount of data produced in the world each day is about 2.5 exabytes. Moreover, the total amount of data in the world is predicted to double every 2 years. Little wonder that big data is a big topic. This is especially the case in the world of business, where data means money. Of course, the value lies not in the data itself, but in the information that can be extracted from the data.

The largest collector of data is probably Google, which uses data from their search engine, for example, to present users with personalized advertisements. Online stores use big data to suggest items that customers might wish to purchase, based on the purchases of customers with similar profiles. Networking sites like LinkedIn and Facebook are excellent at suggesting potential connections for people. With the increasing availability of big datasets, big data has also become a big issue in many scientific disciplines.

Besides the big data available in business and industry, a great deal of large and big data is also freely available to the public (“Open data,” 2015). One of the most important open data initiatives is open government data, which makes many government datasets available over the Internet. For example, the U.S. Government (“Data.gov,” n.d.) 1 makes more than 195,000 datasets available for downloading. These datasets include topics such as climate, finance, education, and public safety. It is reasonable to expect that more and more big datasets will be freely available in the future. Now the question is whether psychologists know how to analyze these datasets to address important questions in their research domains.

Examples of Large and Big Data in Psychology

Most psychological datasets are relatively small, i.e., small enough to be analyzed using a standard desktop computer. Large datasets occasionally appear in the literature. Examples are the World Values Survey (“WVS Database,” n.d.) 2 , the International Social Survey Programme (ISSP; “ISSP–General information,” n.d.) 3 , the Longitudinal Study of American Youth (LSAY; “LSAY,” n.d.) 4 , the International PISA study ( OECD, 2012 ), and the GLOBE project ( House et al., 2004 ). Taking the World Values Survey as an example, the dataset contains data from 343,309 participants on 1377 variables spanning 100 regions and six waves, and more data are being collected in the coming years. Because many of these datasets are too large to analyze comfortably with conventional tools, most researchers simply select part of the data for their analyses. As a result, their analyses and interpretations may not be optimally comprehensive.

Big datasets in psychology may also be gathered through online applications, such as “math garden,” which is an online environment in which children can train and develop their mathematical skills (see Klinkenberg et al., 2011 ). The math garden project collects around 1 million item responses per day, and uses adaptive testing, adjusting the difficulty of the presented items to the estimated ability of the respondent.

Another example of the use of big data in research is an experimental study on visual search by Mitroff et al. (2015) . They developed a mobile game in which respondents had to detect illegal items in X-rays of bags, acting as if they were an airport security officer. One of the research goals was to investigate errors in the visual search of (ultra) rare items. The large number of trials available allowed the investigation of visual search regarding very rare events, with targets being presented in 1 out of 1000 trials.

Characteristics and Analysis of Big Data

Characteristics of big data.

There is no clear consensus on who coined the term “Big Data” or on how it should be defined ( Diebold, 2012 ). In general, one could say that big data refers to datasets that cannot be perceived, acquired, managed, and processed by traditional IT and software/hardware tools within a tolerable time ( Chen et al., 2014 ); we adopt this definition of big data. We define large data as datasets that are large in comparison to conventional datasets in psychological research. Researchers can still analyze large datasets with their standard computers, but it may take more time to process the data, such that efficient data analysis is desirable. It should be noted that these definitions are relative to the available computing facilities. A dataset of 10 GB, e.g., the Airlines data in the illustration, is considered big data on a typical computer with 8 GB of RAM; the same dataset is no longer big for a workstation with 128 GB of RAM.

One of the first to describe big data was probably Laney (2001) , who used three dimensions, namely Volume, Velocity , and Variety (the 3 Vs), to describe the challenges with big data. High volume data means that the size of the dataset may lead to problems with storage and analysis. High velocity data refers to data that come in at a high rate and/or have to be processed within as short an amount of time as possible (e.g., real-time processing). High variety data are data consisting of many types, often unstructured, such as mixtures of text, photographs, videos, and numbers.

A fourth V that is often mentioned is Veracity , indicating the importance of the quality (or truthfulness ) of the data ( Saha and Srivastava, 2014 ). Veracity is different in kind from the other three Vs, as veracity is not a characteristic of big data per se. That is, data quality is important for all datasets, not only big ones. However, due to the methods that are used to gather big data, the scale of the problems with respect to the veracity of data may be larger with big datasets than with small ones. Therefore, with big data it may be even more important to consider whether the conclusions based on the data are valid than with carefully obtained smaller datasets ( Lazer et al., 2014 ; Puts et al., 2015 ).

As big data analyses are mainly performed in the physical sciences and business settings, and not commonly in the social sciences, data quality is often considered not in terms of the reliability and validity of the constructs of interest, but in terms of screening for duplicate cases and faulty entries. By focusing on the reliability and validity of the data, psychologists can make the veracity of big data an area where psychology really contributes to the field of big data. In the illustrations, we demonstrate how reliability and validity can be evaluated in big and large datasets: Example 1 shows how the reliability and the construct validity of the measures can be studied, while Example 2 illustrates how various regression techniques that are often used to study predictive validity can be applied to big and large datasets.

In order to analyze large volumes of data properly using a typical computer, the size of the dataset cannot be larger than the amount of random-access memory (RAM), which will often be 4 or 8 GB on typical computers. The present study focuses exclusively on how to handle the large volume and the veracity of data in psychology so that psychologists may begin to analyze big data in their research.

Potential Contributions from Quantitative Psychology and Psychology

Psychologists are generally not part of the team in the big data movement (cf. Tonidandel et al., 2015 ). One of the reasons for their absence may be the high threshold required to take part, e.g., psychologists may have to master new programming skills, and may not have access to big data. In this paper we argue that psychologists are well trained in psychological and behavioral theories, psychometrics, and statistics, that are valuable in understanding big data. They are in a good position to start addressing theory-based research questions with big data.

Psychological theories provide fundamental models to explain behavior. Psychometrics gives us empirical information on the measurement properties of the data (the fourth V, Veracity). Advanced statistics, such as multilevel modeling, structural equation modeling, and meta-analysis, provide statistical methods to test the proposed theories. Psychologists can provide a new perspective on how the data are collected (if it is new), whether the measurements have good psychometric properties, and which statistical models can be used to analyze the data.

Because the systems to manage and query big data require strong computational skills, big data analysis in the social sciences calls for interdisciplinary teams of researchers. Specifically, with current big data techniques, psychologists may need to work together with researchers who have knowledge of those techniques. However, the technical skills for big data analysis are in high demand and in short supply, and it may not be straightforward for a researcher to find a data scientist to work with: it requires a network to find collaborators, and it requires more funding, possibly leading to inequity between well-funded and less well-funded research ( Rae and Singleton, 2015 ). Therefore, although we may all agree that working in an interdisciplinary team that includes substantive researchers with strong theories and data scientists with strong computational skills would be desirable, it may not be possible to form such teams. Thus, it may be wise for psychologists to learn how to analyze their big data themselves.

We outline a simple framework for psychologists to use in analyzing big data. In this framework, psychologists can analyze big data with their favorite statistical models such as regression models, path models, mixed-effects models, or even structural equation models. Therefore, this simple framework provides a stepping-stone for psychologists to analyze big data. Instead of handling all four Vs in big data, this framework focuses solely on the first V (Volume) and the fourth V (Veracity).

The remaining sections are organized as follows. The next section proposes a split/analyze/meta-analyze (SAM) approach to analyze big data. This approach breaks a big data problem into a problem with many smaller and independent pseudo “studies.” Then, meta-analysis is used to summarize the “findings.” We illustrate the proposed method using two empirical datasets. The last section addresses new challenges and future directions related to the proposed approach.

A SAM Approach to Analyze Big Data

In this section we first introduce several statistical platforms for analyzing big data and suggest why R ( R Development Core Team, 2016 ) is a good choice. We then review several naïve approaches and explain why they may not be optimal, followed by the approaches commonly used to handle big data. Finally, we introduce the proposed SAM approach to analyzing big data.

Statistical Platforms for Handling Big Data

There are several statistical platforms and computing languages for analyzing big data. Two popular choices are R and Python ( Rossum, 1995 ). In a survey conducted in the data mining community, R emerged as the second most widely used analytical tool, after a specific data mining tool called “RapidMiner” ( Piatetsky, 2014 ). R comes with many packages to perform statistical analyses that are often applied in psychological research, e.g., multilevel modeling, structural equation modeling, and meta-analysis. R is popular in statistics, while Python is dominant in computer science. The popularity of R is rapidly increasing across many fields (Robert Muenchen, n.d.). It seems legitimate to assume that future psychologists will be more comfortable with R ( Culpepper and Aguinis, 2011 ), especially if they are planning to handle large data. In this paper we will focus on analyses with R, but the general principles apply to Python or other statistical platforms as well.

Naïve Approaches to Handling Big Data

A naïve approach is to handle big data as a typical dataset. This approach, however, rarely works. Because of the large volume of data, most computer facilities cannot hold the data and perform the statistical analyses. A second approach is to analyze only a subset of data. This approach is used by Google for some applications ( Bollier, 2010 ). However, when doing scientific research it is preferable to use as much relevant information as possible. Moreover, conclusions based on a subset of data may be different from those based on the full data, especially when there are geographical clusters or hierarchies in the data.

Another possible approach is to aggregate the data based on some characteristics, e.g., company or geographic location. Instead of analyzing the raw data, researchers may analyze the aggregated means of the data. This approach has been popular in cross-cultural research: for example, the famous cultural dimension of individualism vs. collectivism was derived from a factor analysis of country means ( Hofstede, 1980 ). The main limitation of this approach is that results based on the raw scores can be totally different from those based on the aggregated scores. Researchers may commit an ecological fallacy by inferring findings from the aggregated scores to the raw scores ( Robinson, 1950 ).

Common Approaches to Handling Big Data

As big data are too big to be directly analyzed, data scientists usually break the data into smaller pieces for parallel analyses. After the analyses, the results are combined (e.g., Chen and Xie, 2012 ). Two popular programs for parallel computing are MapReduce ( Dean and Ghemawat, 2008 ) and Apache Hadoop ( White, 2012 ). A similar approach is the split-apply-combine approach ( Wickham, 2011 ), which is very popular in R. These approaches involve converting a big dataset into many manageable datasets.

Let us illustrate how the split-apply-combine approach works with a simple example. Suppose we have a dataset on the heights of participants and their countries. We are interested in calculating the mean heights of the participants in each country. The split step groups the heights according to their countries. The apply step calculates the mean height of each country. The combine step merges the mean heights and the countries. Although this example is trivial, more complicated analyses may be used in the apply step. The output of the split-apply-combine approach usually returns a list or a data frame conditioned on the grouping variables. Researchers may apply further calculations on the list of the summaries. In this study, we modify these approaches by using meta-analysis in the last step so that statistical inferences can be made in analyzing big data. The proposed approach is applicable to many statistical analyses.
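The toy example above can be written in a few lines of base R; the heights are simulated for illustration:

set.seed(3)
dat <- data.frame(country = rep(c("NL", "SG", "US"), each = 100),
                  height  = rnorm(300, mean = rep(c(181, 171, 176), each = 100), sd = 7))

pieces <- split(dat$height, dat$country)      # split: group heights by country
means  <- sapply(pieces, mean)                # apply: mean height per country
result <- data.frame(country = names(means),  # combine: merge means and countries
                     mean_height = as.numeric(means))
result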

The SAM Approach

Figure 1 shows the graphical representation of the SAM approach. In the first step we split the data into many independent datasets. We treat each dataset as a pseudo “study” and analyze it independently. The parameter estimates are considered as effect sizes in the studies. In the last step the effect sizes are combined with meta-analytic models. The following sections provide more details on the three stages involved.


Figure 1. The split/analyze/meta-analyze (SAM) model .

Splitting Data into Many Pseudo “Studies”

By definition, big data are too big to fit into the computer's RAM. Even if they can be read into the RAM, there may not be sufficient RAM left to perform the analysis. Therefore, we will need to break the dataset into smaller datasets. We propose two methods to split the data depending on the research questions and the data structure. If the data are already stored according to some characteristics, e.g., geographic locations or years, we may split the data based on these characteristics, which is termed stratified split here. If there are no special sample characteristics that we can use to split the data, we may apply an arbitrary (random) split on the data, which is termed random split here. These two choices have implications for how the results are to be combined in the final step. After splitting the data, each of the resulting datasets can be viewed as a separate pseudo “study.”
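As a minimal sketch of the two splitting strategies (simulated data, with a grouping variable standing in for a stratifying characteristic such as country):

set.seed(4)
dat <- data.frame(country = sample(c("A", "B", "C"), 10000, replace = TRUE),
                  x = rnorm(10000), y = rnorm(10000))

# Random split: assign each row to one of k arbitrary pseudo "studies".
k <- 20
dat$study_random <- sample(rep(1:k, length.out = nrow(dat)))
random_studies <- split(dat, dat$study_random)

# Stratified split: use an existing characteristic to define the "studies".
stratified_studies <- split(dat, dat$country)

length(random_studies)            # 20 pseudo studies of roughly equal size
sapply(stratified_studies, nrow)  # one pseudo study per country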

Analyzing Data as Separate Studies

We may apply common statistical analyses, such as regression analysis, reliability analysis, factor analysis, multilevel analysis, or structural equation modeling, to each pseudo “study.” After each analysis, the parameter estimates, e.g., regression coefficients or coefficient alpha, and their sampling covariance matrices are returned. These parameter estimates are treated as effect sizes in the next stage of the analysis. Generally speaking, most parametric techniques, that is, those that result in parameter estimates and a sampling covariance matrix, may be applied in this step. However, two additional points need to be noted. First, it remains unclear how to apply cluster analysis and classification techniques such as latent class analysis and mixture models. Although we may classify the data into many clusters in each study, future studies may need to address how these clusters are to be combined in the next step.
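Continuing the splitting sketch above, the analyzing step might look as follows: the same regression is fitted in every pseudo “study,” and the coefficient of interest plus its sampling variance are retained as the effect sizes (a simplified, univariate version of the procedure described here).

fit_one <- function(d) {
  fit <- lm(y ~ x, data = d)
  list(est = coef(fit)["x"],       # parameter estimate (effect size)
       var = vcov(fit)["x", "x"])  # its sampling variance
}
results <- lapply(stratified_studies, fit_one)

yi <- sapply(results, function(r) r$est)  # vector of effect sizes
vi <- sapply(results, function(r) r$var)  # vector of sampling variances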

Second, it is not easy to apply techniques involving model assessment in each pseudo “study.” For example, the illustration using the WVS dataset presented in the next section shows how to test a one-factor model in the data. The estimated factor loadings are misleading if the proposed model does not fit the data (see Cheung and Cheung, in press, for a discussion). In the illustration we address this issue by calculating the correlation matrix as the effect sizes for each study. In the step of combining the results, we apply meta-analytic structural equation modeling (MASEM; Cheung and Chan, 2005 ; Cheung, 2014 ) to synthesize the correlation matrices and to test the proposed factor model.

Combining Results with Meta-Analysis

After obtaining the summary statistics (effect sizes) from various pseudo studies, we may combine them together using meta-analytic models (e.g., Borenstein et al., 2009 ; Cheung, 2015 ). It has been found that meta-analysis on summary statistics is equivalent to an analysis of the raw data ( Olkin and Sampson, 1998 ). In fact, random- and mixed-effects meta-analyses are special cases of multilevel models with known sampling variances or covariance matrices ( Raudenbush and Bryk, 2002 ; Hox, 2010 ; Goldstein, 2011 ). The proposed approach allows us to study the phenomena at the individual level based on the effect sizes.

If we use a random split in the first stage, the population parameters in different pseudo “studies” are assumed to be equal. All differences in the observed effect sizes are due to sampling error. Therefore, fixed-effects meta-analytic models may be used to combine the parameter estimates. When the “studies” are split according to some characteristics (a stratified split), the population parameters are likely to be different across studies. Besides the differences due to sampling error, there are also true differences (population heterogeneity) across studies. Random-effects models account for the differences between studies, and are more suitable than fixed-effects models in this case (see Hedges and Vevea, 1998 for a discussion of the differences between fixed- and random-effects models).

Suppose that researchers have a big dataset on some purchasing behaviors stratified over products, years, and geographic locations. Researchers are rarely interested in finding one predictive model for the whole dataset. Instead, it is more valuable to see how the predictive model works across products, years, and geographic locations. Therefore, the stratified split is usually preferable. In a mixed-effects meta-analysis, the characteristics of the studies may be used as predictors to explain variability in the effect sizes. If there is only one effect size, we may use a univariate meta-analysis to summarize the findings; if there is more than one effect size, a multivariate meta-analysis or MASEM may be used to summarize the findings ( Cheung, 2015 ).
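As a minimal univariate sketch of the meta-analysis step with the metafor package, continuing from the yi and vi vectors computed above (multivariate versions would combine the study coefficients jointly, e.g., via the metaSEM package); the moderator here is a hypothetical stand-in for a study-level characteristic such as wave:

library(metafor)

fe <- rma(yi, vi, method = "FE")    # fixed-effects pooling (suits a random split)
re <- rma(yi, vi, method = "REML")  # random-effects pooling (suits a stratified split)

wave <- seq_along(yi)               # hypothetical study-level moderator
mix  <- rma(yi, vi, mods = ~ wave, method = "REML")  # mixed-effects model
summary(mix)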

Illustrations with Two Real Datasets

We used two datasets to demonstrate how to apply the SAM approach to real data. The first dataset was downloaded from the WVS-website (“WVS Database,” n.d.) 2 . This illustration is useful to show how to analyze large datasets such as ISSP, LSAY, PISA, the GLOBE project, and many Open Data projects. Psychologists may address new research questions based on many large datasets.

The second example was based on airlines data. This dataset was used in the 2009 Data Exposition organized by the American Statistical Association to illustrate how to analyze big data (“2009. Data expo. ASA Statistics Computing and Graphics,” n.d.) 5 . The airlines data are not psychological data, but qualify as big data. Therefore, Example 1 serves to show that we can perform the typical analyses often used in psychological studies with the SAM approach, and allows us to compare (part of) the results with the analysis of the raw data. Example 2 serves to show that the SAM approach also works with truly big data. It also demonstrates the potential contributions of quantitative psychology in analyzing big data. Since the sample sizes are by definition huge in big data, researchers should not solely rely on testing the significance of the parameter estimates. Researchers should focus on the effect sizes and their confidence intervals (CIs; Cumming, 2014 ). Although we only report the standard errors ( SE s) in the illustrations, researchers may easily convert them into the CIs. All the analyses are conducted in R. The supplementary documents include annotated R code for all analyses, including the output and figures.

Example 1: World Values Survey

The dataset contains the scores of 343,309 participants on 1377 variables spanning 100 regions and 6 waves (1981–1984, 1990–1994, 1995–1998, 1999–2004, 2005–2009, and 2010–2014). One useful tip for handling big data is that it is rarely necessary to analyze all variables. For example, there are 1377 variables in the WVS dataset, but we probably need only a handful of them in the analyses. It is crucial not to read irrelevant variables into the RAM so that we can spare more memory for the analyses. We may read subsets of the variables from a database or use some programs to filter the variables before reading the data. The following examples illustrate how to apply the SAM approach.

Illustration Using Random Split

As we have discussed in this paper, there are two possible methods of splitting large data: via a random split or stratified split based on some sample characteristics. We first illustrate the analysis using a random split.

The Splitting Step

There were a total of 343,309 participants in the data set. To demonstrate the effect of the number of studies on the results, we randomly split the data into k = 1, 5, 10, 50, 100, 500, and 1000 studies. When k = 1, it simply means that all data were analyzed simultaneously. The choice of k is usually arbitrary and depends on the size of the RAM. If k is too small, the RAM may not be sufficient for the analysis.

The Analyzing Step

Six variables were selected to illustrate a multiple regression analysis. The dependent variable was life satisfaction (A170; 1: dissatisfied to 10: satisfied ), while the predictors were: subjective state of health (A009; 1: very good to 4: very poor ), which was reverse coded in the analysis; freedom of choice and control (A173; 1: none at all to 10: a great deal ); financial satisfaction (C006; 1: none at all to 10: a great deal ); sex (X001; 1: male and 2: female ); and age (X003). The proposed regression model in the i th study is:

\[ \text{LifeSat}_{(i)} = \beta_{0(i)} + \beta_{1(i)}\,\text{Health}_{(i)} + \beta_{2(i)}\,\text{Freedom}_{(i)} + \beta_{3(i)}\,\text{FinSat}_{(i)} + \beta_{4(i)}\,\text{Sex}_{(i)} + \beta_{5(i)}\,\text{Age}_{(i)} + e_{(i)}. \tag{1} \]

The subscript i indicates that the regression coefficients may vary across studies.

The Meta-Analysis Step

After running the regression analysis, the estimated regression coefficients \( \hat{\beta}_{1(i)} \) to \( \hat{\beta}_{5(i)} \) are available in each study. Since the data were randomly split into k studies, the population parameters are assumed to be equal across studies. A multivariate fixed-effects meta-analysis (e.g., Cheung, 2013 ) is conducted using \( y_{1(i)} = \hat{\beta}_{1(i)} \) to \( y_{5(i)} = \hat{\beta}_{5(i)} \) as the effect sizes and their sampling covariance matrix \( V_{(i)} \) as the known sampling covariance matrix. We use \( y_{(i)} \) rather than \( \hat{\beta}_{(i)} \) to emphasize that the effect sizes are treated as inputs rather than outputs in this step of the analysis.

Table 1 shows the parameter estimates and their SEs for the splitting with different numbers of studies. The parameter estimates and their SEs are nearly identical. This demonstrates that the SAM approach can recover the relationship at the individual level. Interpretations of the parameter estimates of the SAM approach are identical to those using a conventional analysis at the individual level.


Table 1. Comparisons between analysis of raw data, and analysis based on a fixed-effects meta-analysis with random splits .

Illustration Using a Stratified Split

We split the data according to their regions and waves. After running the analyses, the effect sizes were combined by either a random- or a mixed-effects meta-analysis.

Although it is easy to implement a random split based on a fixed-effects model, this may not be realistic in applied settings. Data are usually nested in some hierarchies. For example, participants in WVS were nested within countries and waves. A better approach is to use a stratified split of the data according to countries and waves. Doing this, the number of studies was 239, with 240–6025 respondents per study.

We illustrate multiple regression analysis, mediation analysis, confirmatory factor analysis, and reliability analysis on different variables of the data. For each analysis, we collect the parameter estimates and associated sampling variances and covariances for each of the studies.

Since the studies are different in terms of countries and waves, it is reasonable to expect that each study has its own population parameters. Thus, a multivariate random- or mixed-effects meta-analysis is more appropriate than a fixed-effects analysis (e.g., Cheung, 2013 ). This means that besides the estimated average population effect sizes, we also estimate the variance component of the heterogeneity of the random effects. A study level variable, like “wave” in this example, can be used to explain some variability in effect sizes.

Results and Discussion

The following sections summarize the various statistical analyses using the stratified split.

Multiple Regression Analysis

The regression model in Equation (1) was fitted in each study. Since the estimated regression coefficients were used as effect sizes in the meta-analysis, we used \( y_{1(i)} = \hat{\beta}_{1(i)} \) to \( y_{5(i)} = \hat{\beta}_{5(i)} \) to represent the effect sizes. The model for the multivariate meta-analysis is:

\[ y_{j(i)} = \gamma_{j0} + u_{j(i)} + e_{j(i)}, \quad j = 1, \ldots, 5, \tag{2} \]

where \( T^2 = \mathrm{Var}\!\left([u_{1(i)}, \ldots, u_{5(i)}]^{\top}\right) \) is the variance component of the random effects, and \( V_i = \mathrm{Var}\!\left([e_{1(i)}, \ldots, e_{5(i)}]^{\top}\right) \) is the known sampling covariance matrix. We may calculate an \( I^2 \) ( Higgins and Thompson, 2002 ) to indicate the degree of between-study heterogeneity relative to the total variance. For example, the \( I^2_1 \) for the first effect size \( y_1 \) is

\[ I^2_1 = \frac{\hat{T}^2_{11}}{\hat{T}^2_{11} + \bar{V}_{11}}, \tag{3} \]

where \( \hat{T}^2_{11} \) and \( \bar{V}_{11} \) are the estimated heterogeneity and typical known sampling variance for the first effect size, respectively (e.g., Cheung, 2015 ).

We may test whether wave predicts the effect sizes by using a multivariate mixed-effects meta-analysis with wave as a moderator:

\[ y_{j(i)} = \gamma_{j0} + \gamma_{j1}\,\text{Wave}_{(i)} + u_{j(i)} + e_{j(i)}, \quad j = 1, \ldots, 5. \tag{4} \]

An \( R^2 \) type index ( Raudenbush, 2009 ) may be used to indicate the percentage of the heterogeneity variance explained by the moderator. For example, the \( R^2_1 \) for the first effect size \( y_1 \) is

\[ R^2_1 = 1 - \frac{\hat{T}^2_{11(1)}}{\hat{T}^2_{11(0)}}, \tag{5} \]

where \( \hat{T}^2_{11(1)} \) and \( \hat{T}^2_{11(0)} \) are the estimated heterogeneity with and without the moderator, respectively (e.g., Cheung, 2015 ).

To save space, we only present here the results of the mixed-effects meta-analysis using wave as the moderator. The estimated regression coefficients ( \( \hat{\gamma}_{11} \) to \( \hat{\gamma}_{51} \) ) and their SEs for predicting the regression slopes are: subjective state of health (A009) = 0.0325 (SE = 0.0078), freedom of choice and control (A173) = −0.0088 (SE = 0.0041), financial satisfaction (C006) = −0.0194 (SE = 0.0072), sex (X001) = −0.0012 (SE = 0.0072), and age (X003) = −0.0060 (SE = 0.0028). All the regression coefficients were statistically significant at α = 0.05 except the coefficient for sex (X001). The estimated \( R^2 \) values for wave predicting the heterogeneity variances of the slopes are 0.0881, 0.0215, 0.0312, 0.0000, and 0.0332. The meaning of the estimated intercepts depends on the scaling of the moderator, so we do not report them here. Readers may refer to the online Appendix for the full set of results.

Mediation Analysis

The following example serves to illustrate how the SAM approach can be used to fit models with mediated effects. A mediation model with life satisfaction (A170) as the dependent variable, freedom of choice and control (A173) as the mediator, and subjective state of health (A009) as the predictor, was hypothesized. The mediation model was fitted on each study by country and wave. The estimated indirect effect and the direct effect were considered as two multiple effect sizes for the multivariate meta-analysis ( Cheung and Cheung, in press ).

By running the random-effects meta-analysis, we obtained estimates of the average population indirect and direct effects of 0.1311 (SE = 0.0056) and 0.5636 (SE = 0.0131), respectively. The estimated heterogeneity variance matrix for the indirect and direct effects is \( \hat{T}^2 = \begin{bmatrix} 0.0064 & \\ 0.0011 & 0.0346 \end{bmatrix} \). Figure 2 displays the 95% confidence ellipses (see Cheung, 2013 ) for the estimated indirect and direct effects. We also conducted a mixed-effects meta-analysis using wave as a moderator. Wave was significant in predicting the direct effect, 0.0441 (SE = 0.0086), \( R^2 \) = 0.1206, but not in predicting the indirect effect, −0.0027 (SE = 0.0038), \( R^2 \) = 0.0058.


Figure 2. The 95% confidence ellipse on the indirect and direct effects on the WVS data .

Confirmatory Factor Analysis

More advanced multivariate analyses, such as confirmatory factor analysis (CFA), may also be performed with the SAM approach. As an illustration, four items (1: never justifiable to 10: always justifiable ) were used to measure a single factor called “fraud.” These items asked participants whether it was justifiable to (1) claim government benefits to which you are not entitled (F114); (2) avoid paying a fare on public transport (F115); (3) cheat on taxes (F116); and (4) accept a bribe from someone in the course of carrying out one's duties (F117). We used two-stage structural equation modeling (TSSEM; Cheung and Chan, 2005 ; Cheung, 2014 ) to fit the one factor model.

Correlation matrices were calculated by country and wave. These correlation matrices were treated as stemming from different studies and averaged together with a random-effects model in the stage 1 analysis. The proposed one-factor model was then fitted against the average correlation matrix, with its asymptotic covariance matrix as the weight matrix, using the weighted least squares estimation method in the stage 2 analysis. The proposed model fits the data reasonably well according to the RMSEA and SRMR, with \( \chi^2 \)(df = 2) = 333.92, p < 0.001, RMSEA = 0.0230, and SRMR = 0.0472. The estimated factor loadings with their 95% likelihood-based confidence intervals for items F114 to F117 were 0.5742 (0.5542, 0.5939), 0.7286 (0.7152, 0.7418), 0.7317 (0.7182, 0.7449), and 0.5852 (0.5659, 0.6041), respectively.

Reliability Generalization

One of the strong areas of quantitative psychology is the study of the measurement properties of scales. In the previous example, we conceptualized four items as measuring the concept of “fraud.” We may test the reliability of the scale consisting of these four items by using reliability generalization (e.g., Beretvas and Pastor, 2003 ; Botella et al., 2010 ). We first calculated the coefficient alpha and its sampling variance ( Bonett, 2010 ) in each study per country and wave. The estimated coefficient alphas and their sampling variances were then analyzed in a mixed-effects meta-analysis with wave as the moderator. The estimated slope was significant, 0.0216 (SE = 0.005); wave explains 9.92% of the variation in coefficient alpha across studies. The estimated residual heterogeneity variance was 0.0087, which is the between-study difference in the reliability coefficient that could not be explained by wave. The estimated coefficient alphas at wave 1 and wave 6 were 0.6342 and 0.7422, respectively.

Example 2: Airlines Data

The airlines dataset contains scores on 29 variables from more than 123 million flight records for almost all arrivals and departures at airports in the USA from 1987 to 2008. The sizes of the compressed files and the uncompressed files are 1.7 GB and 12 GB, respectively. Since most big datasets are stored in database format, we simulated this environment by converting the data sets into a SQLite database. This illustrates how the proposed model can be applied to handle other big data. The R code in the online supplement shows how the SQLite database, which is about 14.3 GB in size, was created.
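As a minimal sketch of this workflow (the file name and the table name "ontime" are assumptions about how the database was built; adjust them to your own setup), one pseudo "study" per year can be pulled into RAM, analyzed, and released before the next year is read:

library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), "airlines.sqlite")

get_year <- function(year) {
  dbGetQuery(con, paste(
    "SELECT ArrDelay, DepDelay, Distance FROM ontime WHERE Year =", year))
}

d1987 <- get_year(1987)   # one pseudo "study" (about 1.3 million flights)
fit   <- lm(ArrDelay ~ DepDelay + I(Distance / 1000), data = d1987)
coef(fit)

dbDisconnect(con)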

We selected a few variables for this illustration. We were interested in predicting the arrival delay time (in minutes; ArrDelay in the data) from the departure delay time (in minutes; DepDelay in the data) and the distance (in miles; Distance in the data) between the origin and destination airports. It was hypothesized that (1) departure delay time was positively related to arrival delay time, while (2) distance between the airports was negatively related to arrival delay time. The logic for the second hypothesis is that a longer travel distance allows the flight to adjust its schedule to compensate for the departure delay.

Before testing the above hypotheses, we report some descriptive statistics and figures, as we usually do in data analyses. Because there were too many data points to plot individually, we aggregated the means by year and month. Figure 3 shows the scatter plot of the aggregated arrival delay time, departure delay time , and distance . As expected, arrival delay time and departure delay time are positively correlated ( r = 0.96). However, the direction of the association with distance is different from the hypothesized direction ( r = 0.11). It should be noted that these correlation coefficients were calculated from the aggregated means; the results may or may not be the same as those based on an individual-level analysis. The SAM approach can correctly analyze the data at the individual level, and we applied it to test the above hypotheses.


Figure 3. Scatter plot on the means of the selected variables on the airlines data .

The data were split by years. There were a total of 22 pseudo “studies,” with sample sizes ranging from 1,311,826 flights in 1987 to 7,453,215 flights in 2007.

The Analyzing and Meta-Analysis Steps

Two different models—a regression model and a mixed-effects model—were considered. Since the scales for arrival delay and distance were very different, distance was divided by 1000 in the analyses.

Regression Analysis

The regression model assumes that the data within each study are independent. The regression model for the i th study was

\[ \text{ArrDelay}_{(i)} = \beta_{0(i)} + \beta_{\text{Dep}(i)}\,\text{DepDelay}_{(i)} + \beta_{\text{Dist}(i)}\,\text{Distance}_{(i)} + e_{(i)}. \tag{6} \]

There is a subscript i in the regression coefficients, indicating that they may vary across studies. A multivariate random-effects meta-analysis was used to combine the results. Supposing that the estimated regression coefficients \( \breve{y}_{\text{Dep}(i)} = \breve{\beta}_{\text{Dep}(i)} \) and \( \breve{y}_{\text{Dist}(i)} = \breve{\beta}_{\text{Dist}(i)} \) and their sampling covariance matrix \( \breve{V}_i \) for the i th study are obtained, the multivariate random-effects meta-analysis is:

\[ \begin{aligned} \breve{y}_{\text{Dep}(i)} &= \gamma_{10} + u_{\text{Dep}(i)} + e_{\text{Dep}(i)}, \\ \breve{y}_{\text{Dist}(i)} &= \gamma_{20} + u_{\text{Dist}(i)} + e_{\text{Dist}(i)}, \end{aligned} \tag{7} \]

where \( \gamma_{10} \) and \( \gamma_{20} \) are the average population effect sizes for \( \breve{y}_{\text{Dep}(i)} \) and \( \breve{y}_{\text{Dist}(i)} \), \( \breve{T}^2 = \mathrm{Var}\!\left([u_{\text{Dep}(i)}, u_{\text{Dist}(i)}]^{\top}\right) \) is the variance component of the random effects, and \( \breve{V}_i = \mathrm{Var}\!\left([e_{\text{Dep}(i)}, e_{\text{Dist}(i)}]^{\top}\right) \) is the known sampling covariance matrix from Equation (6).

We further fitted a multivariate mixed-effects meta-analysis using year as a moderator:

\[ \begin{aligned} \breve{y}_{\text{Dep}(i)} &= \gamma_{10} + \gamma_{11}\,\text{Year}_{(i)} + u_{\text{Dep}(i)} + e_{\text{Dep}(i)}, \\ \breve{y}_{\text{Dist}(i)} &= \gamma_{20} + \gamma_{21}\,\text{Year}_{(i)} + u_{\text{Dist}(i)} + e_{\text{Dist}(i)}, \end{aligned} \tag{8} \]

where \( \gamma_{11} \) and \( \gamma_{21} \) are the regression coefficients predicting the slopes for \( \breve{y}_{\text{Dep}(i)} \) and \( \breve{y}_{\text{Dist}(i)} \) by year, which is centered around its mean, and the other quantities are defined similarly to those in Equation (7).

Mixed-Effects Regression Analysis

One fundamental assumption underlying the multiple regression model is that the data are independent. This assumption may not be tenable for these data. There may be seasonal effects, as the arrival delay is nested within month, day of month, and day of week. Moreover, there may also be locational effects from the airports of origin and destination. With a slight abuse of notation, a mixed-effects model using month, day of month, day of week, origin, and destination as random effects was fitted in each study:

\[ \text{ArrDelay}_{(i)} = \beta_{0(i)} + \beta_{\text{Dep}(i)}\,\text{DepDelay}_{(i)} + \beta_{\text{Dist}(i)}\,\text{Distance}_{(i)} + u_1 + u_2 + u_3 + u_4 + u_5 + e_{(i)}, \]

where \( u_1, \ldots, u_5 \) are the random effects for month, day of month, day of week, origin, and destination, respectively.

After running the mixed-effects analysis, the estimated regression coefficients \( \tilde{y}_{\text{Dep}(i)} = \tilde{\beta}_{\text{Dep}(i)} \) and \( \tilde{y}_{\text{Dist}(i)} = \tilde{\beta}_{\text{Dist}(i)} \) and their sampling covariance matrix \( \tilde{V}_i \) are used in a multivariate meta-analysis similar to those in Equations (7) and (8).

Table 2 shows the results of the above analyses. We focus here on the mixed-effects regression; comparisons between the results of the regression analysis assuming independence of the observations and the mixed-effects regression analysis are discussed later. The average effects of departure delay ( \( \hat{\gamma}_{10} \) ) and distance ( \( \hat{\gamma}_{20} \) ) on arrival delay are 0.8961 and −1.2010, respectively. The \( I^2 \) for both effect sizes is almost 1, indicating a large degree of heterogeneity (see Figure 4 ). These results are consistent with the research hypotheses that departure delay has a positive effect on arrival delay while distance has a negative effect on arrival delay. It should be noted that these results are quite different from those of the analysis of the aggregated means, where the average effects were 1.2084 and −13.1609 for departure delay and distance, respectively.


Table 2. Parameter estimates from the regression model and mixed-effects regression model .


Figure 4. The 95% confidence ellipse on the regression coefficients on the airlines data .

When year was used as a moderator, the regression coefficients for departure delay ( \( \hat{\gamma}_{11} \) ) and distance ( \( \hat{\gamma}_{21} \) ) were 0.0108 (SE = 0.0017) and −0.0440 (SE = 0.0122), respectively. These findings suggest that the effects of departure delay and distance on arrival delay have become stronger over time.

One interesting finding is that the parameter estimates and their SEs based on the regression analysis assuming independence of the observations and the mixed-effects regression are comparable. In theory, mixed-effects regression is preferred as it takes the dependence of the data into account. The results seem to suggest that accounting for the dependency is not necessary. A possible reason for this is that the sample sizes are very large in the studies, so the sampling error is very small. Even though the sampling variances are underestimated in the regression analysis by assuming independence of the observations, this bias may not create serious problems in the meta-analysis. However, it should be noted that this is just one example. Further studies are required to clarify whether researchers may “safely” ignore the dependence of data in analyses of big data.

Issues, New Challenges, and Research Agenda

As we have argued in this paper, the key strength of psychologists is their knowledge of substantive theories, psychometrics, and advanced statistical skills to test research hypotheses. The proposed SAM approach enables psychologists to start analyzing big data. The two illustrations demonstrated how the SAM approach can be used to analyze large behavioral datasets and big data. Existing quantitative methods, such as reliability analysis, multiple regression, mediation analysis, mixed-effects modeling, confirmatory factor analysis, and even structural equation modeling, can be applied to big data. When we move away from laboratory or survey designs to big data, there are many new challenges and issues that need to be addressed. Some of these issues are highlighted in the following section.

How Far Can We Generalize the Research Findings in Big Data Analysis?

Sampling error is unlikely to be a major issue in big data analysis because of the large sample size, but sampling bias (selection bias) may be a serious concern. When data are obtained through the Internet or other media, researchers rarely have control over who is providing the data. With data gathered from mobile applications or apps, for example, few or no background characteristics of the respondents may be available to the researchers, making it unclear which population the sample is actually drawn from (Hargittai, 2015). Examples of concerns about Internet samples include the fear that Internet participants are less motivated to engage in the task, that they may be less inclined to cooperate, and that samples obtained through the Internet are less diverse than traditional samples.

In a study comparing the characteristics of an Internet sample used for an online questionnaire to those of traditional samples, Gosling et al. (2004) found that Internet samples are not that different from other samples used in psychological research. They concluded that “Internet samples are certainly not representative or even random samples of the general population, but neither are traditional samples in psychology” (p. 102). Because of the sheer sample size of big data, there is a concern that even a small selection bias may lead to a false rejection of the null hypothesis (Ioannidis, 2005).

Potential data duplication from the same source is another issue. When participants have several email accounts or mobile devices, the data that are collected are not independent. However, researchers may not have the information needed to link up such data. Ignoring the dependence of the data is a major concern in data analysis. Our airlines illustration shows that the results of the regression analysis ignoring the dependence and those of the mixed-effects model taking the dependence on month, day of month, day of week, and airports of origin and destination into account are comparable. The results seem to suggest that the effect of ignoring the dependence is minor. If this is the case, researchers can be more confident that potential data duplication may not seriously threaten the quality of big data. Future research should be conducted to verify whether our findings are applicable to other big datasets.

Which Approach, Large K (Number of Studies) or Large N (Sample Size), Should We Use?

When the data are clustered by some characteristics, e.g., geographic locations or time, it makes sense to split the data according to these characteristics. For example, we analyzed the airlines data by year, leading to 22 groups of data on which the regression models were fitted. We could also analyze the airlines data per year per city of departure, leading to 4461 smaller groups of data. Conducting this analysis led to estimation problems in the groups with small airports, where the variation in “Distance” was small; e.g., 25% of the airports served no more than 10 destinations. Therefore, when data are split based on specific characteristics, one should be careful not to create groups of data that are too small to obtain reliable results.

When we apply a random split to a dataset, at least two approaches can be used: large k with small N or small k with large N. In all cases with groups of equal size, N × k equals the total number of cases. The main difference is whether we put the computational burden on the primary analysis (the analyzing step) or on the meta-analysis. If N per study is too large, there may not be sufficient RAM to conduct the analysis. Our findings in Table 1 show that the results are nearly identical from k = 1 to k = 1000. The choice therefore depends mainly on the available RAM.

Another factor to consider is that meta-analytic techniques are used to combine the parameter estimates. In meta-analysis, reasonably large samples are required in each study so that the parameter estimates are approximately distributed as multivariate normal. Therefore, it is preferable to have at least N = 100 per study. However, this requirement will not be a problem with big data.

How Can We Address the Issue of Variety in Big Data?

We focused on quantitative data in this paper, but different types of data can be informative in psychological research (the V of Variety). One popular application is the analysis of text data from Twitter, which can also be performed in R (Gentry, 2015). These data can, for example, be used to analyze the tweeting behavior of people following shocking events such as terrorist attacks (Burnap et al., 2014) and riots (Procter et al., 2013). Another example of big data analysis is the use of Amazon product reviews to evaluate emotional components of attitudes (Rocklage and Fazio, 2015). See, for example, Russell (2013) and Munzert et al. (2014) for overviews of methods for analyzing data from social media sites. Whenever such qualitative data are ultimately quantified and statistical models are fitted to them, the SAM approach presented in the current article might be useful. These qualitative data provide rich information for psychologists to test theories in real-life scenarios rather than in laboratory settings.
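As a toy example of how such text might be quantified before model fitting, the R sketch below scores a few invented tweets against invented word lists; real applications would use validated lexicons and proper tokenization.

```r
# Toy sketch of quantifying free text before fitting statistical models.
# The tweets and word lists are invented for illustration only.
tweets <- c("so angry about the delays", "lovely calm evening",
            "furious, scared and upset", "feeling happy and safe today")

negative <- c("angry", "furious", "scared", "upset")
positive <- c("lovely", "calm", "happy", "safe")

count_hits <- function(text, words) {
  sum(sapply(words, function(w) grepl(w, text, fixed = TRUE)))
}

scores <- data.frame(
  neg = sapply(tweets, count_hits, words = negative),
  pos = sapply(tweets, count_hits, words = positive)
)
scores$valence <- scores$pos - scores$neg   # simple quantitative outcome
scores
```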

How Can We Address the Issue of Velocity in Big Data?

Another V defining big data is Velocity. In many big data projects, huge amounts of new data are added to the system in real time. It is fair to say that this type of data presents real challenges to psychologists, who often have neither the theories nor the computational techniques to handle it. This issue will become crucial if psychological researchers want to test theories with dynamic data. One possible approach is to regard the new data as if they were new studies in a meta-analysis. Cumulative or updating meta-analysis may then be used to combine the existing parameter estimates with those from the new studies (e.g., Lau et al., 1995; Schmidt and Raju, 2007).
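Under the simplifying assumption of a fixed-effect model, such updating can be as simple as maintaining running inverse-variance sums; the R sketch below uses invented batch estimates to show the idea.

```r
# Cumulative (updating) meta-analysis sketch with invented numbers:
# each new data batch contributes an estimate and its standard error,
# and the pooled fixed-effect estimate is updated as batches arrive.
update_pool <- function(pool, est, se) {
  w_new <- 1 / se^2
  num   <- pool$num + w_new * est        # running sum of weight * estimate
  den   <- pool$den + w_new              # running sum of weights
  list(num = num, den = den, est = num / den, se = sqrt(1 / den))
}

pool    <- list(num = 0, den = 0)
batches <- data.frame(est = c(0.90, 0.85, 0.95),   # invented batch estimates
                      se  = c(0.05, 0.04, 0.06))

for (i in seq_len(nrow(batches))) {
  pool <- update_pool(pool, batches$est[i], batches$se[i])
  cat(sprintf("After batch %d: pooled = %.3f (SE = %.3f)\n",
              i, pool$est, pool$se))
}
```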

On the other hand, these new data may create new opportunities for psychologists to propose and test new dynamic theories related to behavior. The question is how to combine the new data or pseudo “studies” with existing meta-analytic findings under the SAM approach. Future research may address how the SAM approach can be combined with the cumulative meta-analytic techniques.

Does the Current Quantitative Training Meet the Future Needs in the Big Data Movement?

Aiken et al. (2008) carried out a comprehensive survey on doctoral training in statistics, measurement, and methodology in psychology. They found that PhD students receive more training that supports laboratory research (e.g., ANOVA) than field research (e.g., SEM and multilevel modeling). They were also greatly concerned about the lack of training in measurement in doctoral programs. Generally speaking, quantitative training lags behind the advances in statistical methods. In a slightly different context, Putka and Oswald (2015) discussed how industrial and organizational psychologists should reshape training to meet the new challenges of big data in an organizational environment.

The proposed SAM approach allows researchers to use many of the existing quantitative techniques to analyze big data. It fits easily into the statistical training in psychology and can serve as a stepping stone for researchers learning how to analyze big data. However, the analysis of big data is still statistically and computationally demanding, because researchers are expected to have knowledge of various advanced statistical techniques and of R or some other statistical language suitable for analyzing big data. It is clear that the current quantitative training in psychology is insufficient to meet future demands in the handling of big data analyses, and that to be part of the big data movement, it will be essential to acquire some new skills (Oswald and Putka, 2015). Future studies are required to determine how graduate training programs should be reformed to meet these new needs.

Concluding Remarks

Big data opens up many new opportunities in the field of psychology. Researchers may test theories on huge datasets that are based on real human behavior. On the other hand, big data also presents challenges to current and future psychologists. With the SAM approach presented in this study, we aimed to lower the threshold for engaging in big data research.

Author Contributions

MC formulated the research questions and proposed the methodology. Both MC and SJ contributed to the data analysis and the drafting of the paper. Both authors agreed to submit the manuscript to Frontiers.

Funding

MC was supported by the Academic Research Fund Tier 1 (FY2013-FRC5-002) from the Ministry of Education, Singapore. SJ was supported by Rubicon grant 446-14-003 from the Netherlands Organization for Scientific Research (NWO).

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We would like to thank Ranjith Vijayakumar for providing useful comments on an earlier version of this manuscript.

Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2016.00738

1. ^ Retrieved April 12, 2016, from http://www.data.gov/

2. ^ Retrieved April 13, 2016, from http://www.worldvaluessurvey.org/wvs.jsp

3. ^ Retrieved May 3, 2015, from http://www.issp.org/

4. ^ Retrieved May 3, 2015, from http://lsay.org/

5. ^ Retrieved April 17, 2015, from http://stat-computing.org/dataexpo/2009/

Aiken, L. S., West, S. G., and Millsap, R. E. (2008). Doctoral training in statistics, measurement, and methodology in psychology: replication and extension of Aiken, West, Sechrest, and Reno's (1990) Survey of PhD programs in North America. Am. Psychologist 63, 32–50. doi: 10.1037/0003-066X.63.1.32


Beretvas, S. N., and Pastor, D. A. (2003). Using mixed-effects models in reliability generalization studies. Educ. Psychol. Meas. 63, 75–95. doi: 10.1177/0013164402239318


Bollier, D. (2010). The Promise and Peril of Big Data . The Aspen Institute. Available online at: http://bollier.org/promise-and-peril-big-data-2009

Bonett, D. G. (2010). Varying coefficient meta-analytic methods for alpha reliability. Psychol. Methods 15, 368–385. doi: 10.1037/a0020142

Borenstein, M., Hedges, L. V., Higgins, J. P. T., and Rothstein, H. R. (2009). Introduction to Meta-Analysis . Chichester; Hoboken, NJ: John Wiley & Sons.

Botella, J., Suero, M., and Gambara, H. (2010). Psychometric inferences from a meta-analysis of reliability and internal consistency coefficients. Psychol. Methods 15, 386–397. doi: 10.1037/a0019626

Burnap, P., Williams, M. L., Sloan, L., Rana, O., Housley, W., Edwards, A., et al. (2014). Tweeting the terror: modelling the social media reaction to the Woolwich terrorist attack. Soc. Network Analysis Mining 4, 1–14. doi: 10.1007/s13278-014-0206-4

Chen, M., Mao, S., Zhang, Y., and Leung, V. C. M. (2014). Big Data: Related Technologies, Challenges and Future Prospects . Springer International Publishing. Available online at: http://link.springer.com/chapter/10.1007/978-3-319-06245-7_1 . doi: 10.1007/978-3-319-06245-7


Chen, X., and Xie, M. (2012). A Split-and-Conquer Approach for Analysis of Extraordinarily Large Data (DIMACS Technical Report No. 2012-01) . Rutgers University.

Cheung, M. W.-L. (2013). Multivariate meta-analysis as structural equation models. Struct. Equation Model. Multidisciplinary J. 20, 429–454. doi: 10.1080/10705511.2013.797827

Cheung, M. W.-L. (2014). Fixed- and random-effects meta-analytic structural equation modeling: examples and analyses in R. Behav. Res. Methods 46, 29–40. doi: 10.3758/s13428-013-0361-y

Cheung, M. W.-L. (2015). Meta-Analysis: A Structural Equation Modeling Approach . Chichester: John Wiley & Sons.


Cheung, M. W.-L., and Chan, W. (2005). Meta-analytic structural equation modeling: a two-stage approach. Psychol. Methods 10, 40–64. doi: 10.1037/1082-989X.10.1.40

Cheung, M. W.-L., and Cheung, S. F. (in press). Random effects models for meta-analytic structural equation modeling: review, issues, illustrations. Res. Synthesis Methods .

Culpepper, S. A., and Aguinis, H. (2011). R is for revolution: a cutting-edge, free, open source statistical package. Organ. Res. Methods 14, 735–740. doi: 10.1177/1094428109355485

Cumming, G. (2014). The new statistics: why and how. Psychol. Sci. 25, 7–29. doi: 10.1177/0956797613504966

Dean, J., and Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters. Commun. ACM 51, 107–113. doi: 10.1145/1327452.1327492

Diebold, F. X. (2012). On the Origin(s) and Development of the Term “Big Data” (SSRN Scholarly Paper No. ID 2152421). Rochester, NY: Social Science Research Network. Available online at: http://papers.ssrn.com/abstract=2152421

Gentry, J. (2015). twitteR: R based Twitter client (Version 1.1.8) . Available online at: http://cran.r-project.org/web/packages/twitteR/index.html

Goldstein, H. (2011). Multilevel Statistical Models (4th Edn.) . Hoboken, NJ: Wiley.

Gosling, S. D., Vazire, S., Srivastava, S., and John, O. P. (2004). Should we trust web-based studies? A comparative analysis of six preconceptions about internet questionnaires. Am. Psychologist 59, 93–104. doi: 10.1037/0003-066X.59.2.93

Hargittai, E. (2015). Is bigger always better? Potential biases of big data derived from social network sites. ANNALS Am. Acad. Political Soc. Sci. 659, 63–76. doi: 10.1177/0002716215570866

Hedges, L. V., and Vevea, J. L. (1998). Fixed- and random-effects models in meta-analysis. Psychol. Methods 3, 486–504. doi: 10.1037/1082-989X.3.4.486

Higgins, J. P. T., and Thompson, S. G. (2002). Quantifying heterogeneity in a meta-analysis. Stat. Med. 21, 1539–1558. doi: 10.1002/sim.1186

Hofstede, G. H. (1980). Culture's Consequences: International Differences in Work-Related Values (1st Edn.) . Beverly Hills, CA: Sage Publications.

House, R. J., Hanges, P. J., Javidan, M., Dorfman, P. W., and Gupta, V. (eds.). (2004). Culture, Leadership, and Organizations: The GLOBE Study of 62 Societies . Thousand Oaks, CA: Sage Publications.

Hox, J. J. (2010). Multilevel Analysis: Techniques and Applications (2nd Edn.) . New York, NY: Routledge.

Huss, M., and Westerberg, J. (2014). Data Size Estimates . Available online at: https://followthedata.wordpress.com/2014/06/24/data-size-estimates/

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Med. 2:e124. doi: 10.1371/journal.pmed.0020124

Klinkenberg, S., Straatemeier, M., and van der Maas, H. L. J. (2011). Computer adaptive practice of Maths ability using a new item response model for on the fly ability and difficulty estimation. Comput. Educ. 57, 1813–1824. doi: 10.1016/j.compedu.2011.02.003

Laney, D. (2001). 3D Data Management: Controlling Data Volume, Velocity, and Variety . Available online at: http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf

Lau, J., Schmid, C. H., and Chalmers, T. C. (1995). Cumulative meta-analysis of clinical trials builds evidence for exemplary medical care. J. Clin. Epidemiol. 48, 45–57. doi: 10.1016/0895-4356(94)00106-Z

Lazer, D., Kennedy, R., King, G., and Vespignani, A. (2014). The parable of Google Flu: traps in big data analysis. Science 343, 1203–1205. doi: 10.1126/science.1248506

Mitroff, S. R., Biggs, A. T., Adamo, S. H., Dowd, E. W., Winkle, J., and Clark, K. (2015). What can 1 billion trials tell us about visual search? J. Exp. Psychol. Hum. Percept. Perform. 41, 1–5. doi: 10.1037/xhp0000012

Munzert, S., Rubba, C., Meißner, P., and Nyhuis, D. (2014). Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining . Hoboken, NJ; Chichester; West Sussex: John Wiley & Sons.

OECD (2012). PISA Products - OECD . Retrieved March 20, 2015, from http://www.oecd.org/pisa/pisaproducts/

Open data. (2015). In Wikipedia, the Free Encyclopedia . Available online at: http://en.wikipedia.org/w/index.php?title=Open_data&oldid=660727720

Olkin, I., and Sampson, A. (1998). Comparison of meta-analysis versus analysis of variance of individual patient data. Biometrics 54, 317–322. doi: 10.2307/2534018

Oswald, F. L., and Putka, D. J. (2015). Statistical methods for big data, in Big Data at Work: The Data Science Revolution and Organizational Psychology , eds S. Tonidandel, E. King, and J. M. Cortina (New York, NY: Routledge), 43–63.


Piatetsky, G. (2014). KDnuggets 15th Annual Analytics, Data Mining, Data Science Software Poll: RapidMiner Continues To Lead . Available online at: http://www.kdnuggets.com/2014/06/kdnuggets-annual-software-poll-rapidminer-continues-lead.html

Procter, R., Vis, F., and Voss, A. (2013). Reading the riots on Twitter: methodological innovation for the analysis of big data. Int. J. Soc. Res. Methodol. 16, 197–214. doi: 10.1080/13645579.2013.774172

Putka, D. J., and Oswald, F. L. (2015). Implications of the big data movement for the advancement of I-O science and practice, in Big Data at Work: The Data Science Revolution and Organizational Psychology , eds S. Tonidandel, E. King, and J. M. Cortina (New York, NY: Routledge), 181–212.

Puts, M., Daas, P., and de Waal, T. (2015). Finding errors in Big Data. Significance 12, 26–29. doi: 10.1111/j.1740-9713.2015.00826.x

Rae, A., and Singleton, A. (2015). Putting big data in its place: a regional studies and regional science perspective. Regional Stud. Regional Sci. 2, 1–5. doi: 10.1080/21681376.2014.990678

Raudenbush, S. W. (2009). Analyzing effect sizes: random effects models, in The Handbook of Research Synthesis and Meta-Analysis, 2nd Edn ., eds H. M. Cooper, L. V. Hedges, and J. C. Valentine (New York, NY: Russell Sage Foundation), 295–315.

Raudenbush, S. W., and Bryk, A. S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods . Thousand Oaks, CA: Sage Publications.

R Development Core Team (2016). R: A Language and Environment for Statistical Computing. Vienna, Austria. Available online at: http://www.R-project.org/

Muenchen, R. (n.d.). The Popularity of Data Analysis Software . Available online at: http://r4stats.com/articles/popularity/ (Accessed May 11, 2016).

Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. Am. Sociol. Rev. 15, 351–357. doi: 10.2307/2087176

Rocklage, M. D., and Fazio, R. H. (2015). The Evaluative Lexicon: adjective use as a means of assessing and distinguishing attitude valence, extremity, and emotionality. J. Exp. Soc. Psychol. 56, 214–227. doi: 10.1016/j.jesp.2014.10.005

Rossum, G. (1995). Python Reference Manual . Amsterdam: CWI (Centre for Mathematics and Computer Science).

Russell, M. A. (2013). Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More . Sebastopol, CA: O'Reilly Media, Inc.

Saha, B., and Srivastava, D. (2014). Data quality: the other face of Big Data, in 2014 IEEE 30th International Conference on Data Engineering (ICDE) . (Chicago, IL: IEEE), 1294–1297.

Schmidt, F. L., and Raju, N. S. (2007). Updating meta-analytic research findings: Bayesian approaches versus the medical model. J. Appl. Psychol. 92, 297–308. doi: 10.1037/0021-9010.92.2.297

Tonidandel, S., King, E., and Cortina, J. M. (eds.). (2015). Big Data at Work: The Data Science Revolution and Organizational Psychology . New York, NY: Routledge.

White, T. (2012). Hadoop: The Definitive Guide (3rd Edn.) . Farnham; Sebastopol, CA: O'Reilly.

Wickham, H. (2011). The split-apply-combine strategy for data analysis. J. Stat. Softw. 40, 1–29. doi: 10.18637/jss.v040.i01

Keywords: big data, multilevel model, structural equation modeling, meta-analysis, R platform

Citation: Cheung MW-L and Jak S (2016) Analyzing Big Data in Psychology: A Split/Analyze/Meta-Analyze Approach. Front. Psychol. 7:738. doi: 10.3389/fpsyg.2016.00738

Received: 19 December 2015; Accepted: 03 May 2016; Published: 23 May 2016.


Copyright © 2016 Cheung and Jak. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Mike W.-L. Cheung, [email protected]

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.


What Big Data Means For Psychological Science

  • APS 26th Annual Convention (2014)
  • Behavioral Science
  • Experimental Psychology
  • Language Development
  • Methodology

Major advances in computing technology, combined with the vast digital networks and the immense popularity of social media platforms, have given rise to unimaginably large troves of information about people. It’s estimated that the amount of digital data in existence today is in the thousands of exabytes (one exabyte is 10 to the 18th power of bytes).

This era of Big Data has enormous potential to change the way psychological scientists observe human behavior. But just as it creates new opportunities, access to huge chests of information also creates new challenges for research, said Michael N. Jones of Indiana University Bloomington, introducing a theme program on Big Data at the 2014 APS Annual Convention in San Francisco.


“Each little piece of data is a trace of human behavior and offers us a potential clue to understanding basic psychological principles,” said Jones. “But we have to be able to put all those pieces together correctly to better understand the basic psychological principles that generated them.”

The study of language development, one of Jones’s own research interests, is a great example of a line of research poised to benefit from Big Data. Collecting large data samples from infants in naturalistic settings is extremely time-consuming and typically results in small samples. Testing theories about the way children learn language takes a long time as a result.

Big Data can help expedite the process. As a proof of concept, Jones showed how more than 100,000 words from natural language could be fed into a computer model based on the theory of associative learning — the idea that children group words together based on how often they’re used near other words. Jones showed that, as its analysis progressed, the model indeed recognized that “computer” and “data” were more closely related as word categories than, say, “computer” and “aardvark.”
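For readers who want a concrete sense of the co-occurrence idea Jones described, the toy R sketch below counts within-sentence co-occurrences in an invented four-sentence corpus and compares the resulting profiles; it illustrates the general principle only and is not the actual model used in the talk.

```r
# Toy illustration of co-occurrence-based word association (invented corpus;
# not the model described in the talk). Words that appear in the same
# sentences end up with more similar co-occurrence profiles.
corpus <- c("the computer stores data", "data feed the computer model",
            "the aardvark eats ants", "ants avoid the aardvark")

toks  <- strsplit(tolower(corpus), "\\s+")
vocab <- sort(unique(unlist(toks)))
co    <- matrix(0, length(vocab), length(vocab), dimnames = list(vocab, vocab))

for (s in toks) {                       # count within-sentence co-occurrences
  for (w1 in s) for (w2 in s) if (w1 != w2) co[w1, w2] <- co[w1, w2] + 1
}

cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
cosine(co["computer", ], co["data", ])      # higher
cosine(co["computer", ], co["aardvark", ])  # lower
```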

Ultimately, said Jones, a similar analysis can be done to study associative learning theories in direct samples of child conversation. “These models are quite good at learning things from noise, as long as they have enough data to go on,” he said.


Big Data might help researchers get to a point where they can collect behavioral information without sampling human participants at all, Cornell University information scientist Tanzeem Choudhury said. Technology such as smartphones and wearable sensors can gather information on physical activity, social interactions, geographic location, and so on.

The upshot of this type of data collection is that it’s effectively invisible to users; it doesn’t require their time or energy, and it drastically reduces self-report errors.

“We can continuously get measurements of behavior without bugging people to fill out surveys,” said Choudhury. “We can potentially get continuous measurement without actually having to engage users all the time and rely on their self-input.”

Choudhury has been involved in a number of such projects already. StressSense tracks where people experience stress most frequently throughout the day to help them avoid anxious situations. MyBehavior uses physical activity patterns to suggest ways to stay in shape — walking to work more often along a route users seem to enjoy, for instance. MoodRhythm lets patients with bipolar disorder monitor sleep and social interactions to maintain balanced mood and energy levels, a major improvement over pen-and-paper tracking of daily behavior. (The programs remain in development as smartphone apps.)

The goal is to make it easier than ever for people to improve their lives, said Choudhury: “Just like sensing [technology] has become invisible, can we actually make behavioral change invisible?”


Big Data also enables researchers to reconsider past problems in fresh ways, said APS Fellow Brian M. D’Onofrio of Indiana University Bloomington. In particular, he said, researchers should consider repurposing data that might have been collected for other reasons. Repurposing large data samples can help researchers produce insights that traditional samples can’t, as well as achieve the statistical power many lab studies lack — a big challenge as psychology makes a push to improve its methodology and replication process.

“With Big Data, it gives you the opportunity to use several different types of quasi-experimental designs, to help rule out alternative explanations,” D’Onofrio said.

D’Onofrio and collaborators recently repurposed millions of personal records compiled in Sweden to challenge the conventional notion that smoking during pregnancy directly causes bad behavior outcomes, such as criminality, later in life. In one study, the researchers analyzed 50,000 siblings whose mothers smoked during one pregnancy but not the other. They determined that family background factors — as opposed to exposure to smoking during pregnancy — accounted for the association with criminal convictions. Such realizations can greatly improve interventions: In this case, getting women to quit smoking should be only part of the focus of a broader suite of social services.


Big Data is already producing positive change in the world of Web search. The billions of Internet searches that occur each day leave behind behavioral logs that analysts use to improve search engines over time, said Susan T. Dumais of Microsoft Research. Without that vast record, sites like Google and Bing would never be able to take the 2.4 words in an average Internet search and convert them into something useful.

“Behavioral logs allow us to characterize, with a richness and fidelity that we’ve never had before, what it is people are trying to do with the tools and systems they’re interacting with,” said Dumais.

By mining behavioral logs, analysts can create personalized algorithms that improve the search experience for users. If Dumais searches for “sigir,” for instance, she probably wants the homepage of the Special Interest Group on Information Retrieval (abbreviated SIGIR). If Stuart Bowen Jr. performs the same search, he probably wants the website for his position: Special Inspector General for Iraq Reconstruction (also abbreviated SIGIR).

In other words, systems can learn that words and acronyms in isolation aren’t always the best way to predict what a user wants from a search. Modeling searches in a way that takes into account the context in which the query is issued is important in improving Web search. Previous search activity matters, as does the location and time when the query occurs. A search for “US Open” performed in late spring likely refers to golf, for instance, while the same search in late summer likely refers to tennis.

“Before you were able to collect Big Data, the person who spoke loudest, or the highest-paid person’s opinion, would dominate,” said Dumais. “Now the data, especially when derived from carefully controlled Web-scale experiments, dominates.”


Big Data can even help psychological scientists study studies, said Tal Yarkoni of the University of Texas at Austin. Yarkoni and others recently developed Neurosynth, an online program that analyzes huge amounts of fMRI data to guide users toward a subject of interest. To date, said Yarkoni, Neurosynth has synthesized research from over 9,000 neuroimaging studies and about 300,000 brain activations.

One major goal of Neurosynth is to distinguish between brain activity that is consistently associated with a particular psychological process, but is nonspecific, and brain activity that implies a high probability that a specific psychological process is present. For example, painful physical stimulation might consistently produce a certain pattern of brain activity, and yet that pattern of activity need not imply the presence of pain; other mental states potentially produce a similar pattern. Inferring mental processes from observed brain activity — a process known as “reverse inference” — is very difficult to do in any individual neuroimaging study.
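The logic of reverse inference can be made concrete with Bayes' rule: given activation in a region, how probable is a particular mental state? The R sketch below uses invented study counts (it is not Neurosynth's actual implementation) to show why a pattern that reliably accompanies pain need not imply pain.

```r
# Toy Bayes-rule illustration of reverse inference with invented counts.
# Suppose that, across a hypothetical database of 10,000 studies:
n_total       <- 10000
n_pain        <- 800     # studies about pain
n_act_pain    <- 600     # pain studies reporting activation in region X
n_act_nonpain <- 1200    # non-pain studies reporting activation in region X

p_pain       <- n_pain / n_total                         # prior P(pain)
p_act_g_pain <- n_act_pain / n_pain                      # P(activation | pain)
p_act        <- (n_act_pain + n_act_nonpain) / n_total   # P(activation)

# Posterior P(pain | activation): the quantity reverse inference needs
p_pain_g_act <- p_act_g_pain * p_pain / p_act
p_pain_g_act   # about 0.33 here, even though P(activation | pain) is 0.75
```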

Neurosynth makes reverse inference possible by amassing loads of images and study data in one place. For example, the database helps researchers identify brain regions that are specifically related to pain instead of working memory or emotion, even if some of the active brain regions overlap in all three cases. Tests show that in many cases, Neurosynth performs as well as analyses done manually by sifting through the research literature — but with a time-savings of hundreds of hours of research versus simply pushing a button, said Yarkoni.

“That’s the long-term goal,” Yarkoni said. “To do this in a quantitative, automated way instead of a manual, qualitative way.”

Long-term goals were the theme of the program, since Big Data is still emerging as a scientific presence. Not everyone believes it will create a paradigm shift. (Behavioral scientist Dan Ariely of Duke University has compared Big Data to teenage sex, in that “everyone talks about it, nobody really knows how to do it.”) Even if Big Data does change statistical analysis, it can’t replace strong behavioral theories or experimentation, said Jones. But insofar as Big Data can refine those theories or sharpen those experiments, researchers can’t afford to ignore it.


Thank you for shining a light in this direction. A pill employs bio-chemistry to deliver a mechanism of action that results in a specific therapeutic result. Bio-chemistry poses human risk that justifies experimental design. An app, on the other hand, employs a user experience (UX) to deliver a mechanism of action that may result in lasting behavior modification. The UX poses no risk. And new data mobility, velocity, and storage capacity, married to software (aka apps), allow a virtually unlimited sample size and the ability to look at all of the data. The data will tell us what works, what does not, and for whom. Behavioral scientists should revisit the work of Donald T. Campbell and recognize that, in this new digital era, quasi-experimentation is no longer a weak alternative; it has come of age. It may be the gold standard for a new era of behavioral research. The experiment and the treatment, in many cases, may become one.


What about user privacy and consent? At least with many psychological studies, the onus is on us as researchers to gather informed consent from potential participants as part of our ethics processes, which were established for a reason. However, in the discussion of ‘repurposing data’ above, this principle appears to have been abandoned in the case of big data. And it is not as though there is no risk: the risks of compulsive use of social media and loss of privacy (i.e., loss of social trust) are well documented. You are co-opting, for research purposes, methods used for dubious ends by internet companies, and that makes you complicit in their questionable practices.




John Nosta

Artificial Intelligence

AI Makes the Big Picture Much Bigger

LLMs are reshaping our world to create a dynamic landscape of imagination.

Posted August 30, 2024 | Reviewed by Tyler Woods

  • The "big picture" has evolved into a vast, dynamic mosaic shaped by the connectivity of LLMs.
  • LLMs synthesize data across domains, transforming how we perceive and engage with complex networks.
  • Success in the Cognitive Age requires embracing complexity and navigating the evolving cognitive mosaic.

Art: DALL-E/OpenAI

It's not the Matrix, it's the Mosaic.

In the not-so-distant past, we were tasked with understanding and navigating what was often referred to as "the big picture." This concept, while broad and all-encompassing, was still relatively straightforward: grasp the mission of your life, family, or organization, understand your role within it, and align your efforts accordingly.

Today, in an era defined by the rise of large language models (LLMs), the big picture has morphed into something far more complex and intricate—a vast mosaic of interconnected ideas, insights, and opportunities.

From Big Picture to Vast Mosaic

The traditional big picture was a static representation, a snapshot of where things stood and where they were headed. It was a useful guide, but one limited by the scope of human cognition and the availability of data. Enter LLMs, and suddenly, the boundaries of this picture have expanded exponentially. What was once a simple frame now extends in all directions, filled with layers of information, connections, and patterns that we’re only just beginning to comprehend.

This shift isn't just a matter of scale—it's a transformation in how we perceive and interact with the world around us. The big picture is no longer a single, flat image but a vast mosaic, rich in detail and depth. Every piece of this mosaic contributes to a more comprehensive understanding, but it also challenges us to think in new ways, to embrace complexity and connectivity as fundamental aspects of our personal and professional lives.

The Extraordinary Connectivity of LLMs

At the heart of this transformation is the extraordinary connectivity that LLMs bring to the table. These models don't just process data—they synthesize information across domains, bridging gaps that once seemed insurmountable. They identify patterns and relationships that are not immediately obvious, offering insights that can redefine strategies, reshape industries, and even alter the trajectory of our lives.

This "cognitive connectivity" is what turns the big picture into a mosaic. It's the ability to see how one piece of information relates to another, how insights and decisions in one area can ripple across an entire structure. LLMs provide the tools to explore these connections, to dive into the details without losing sight of the larger context. They help us see the world not just as it is but as a dynamic, evolving network of possibilities.

Navigating the Mosaic: A New Mindset

In this new landscape, LLMs challenge us to adopt a mindset that goes beyond traditional big-picture thinking. The vast mosaic demands a blend of adaptability, curiosity, and a willingness to engage with complexity. It's no longer enough to understand your role—we now have the opportunity to understand how that role intersects with others, and how it contributes to a larger narrative that is constantly being rewritten.

This requires a shift from linear thinking to systems thinking. It's about recognizing that every action, every decision, is part of a broader web of influences and outcomes. LLMs, with their ability to process and analyze vast amounts of information, can be invaluable partners in this journey. They offer not just answers but new questions, new avenues to explore, and new ways to think about the challenges we face.

The Ongoing Evolution of the Mosaic

The vast mosaic isn't a finished product; it's a work in progress, continually reshaped by the data and insights that flow through it. As LLMs continue to evolve, so too will our understanding of this mosaic. New connections will emerge, new patterns will be discovered, and the picture will become ever more intricate.

For all of us, this means an ongoing commitment to learning and adaptation. The skills that served us in the past may not be sufficient in the future, but that's not a cause for concern—it's an opportunity. By embracing the vast mosaic, we open ourselves up to a world of possibilities, where innovation and creativity are limited only by our willingness to explore and engage.


Embracing the Vast Mosaic

The big picture is no longer what it used to be. It's now a vast mosaic, a dynamic and interconnected landscape that challenges us to think more broadly, act more decisively, and engage more deeply with the world around us. In this new reality, LLMs will be our guides, helping us to see the connections, understand the patterns, and ultimately, find our place within the mosaic.

In the emerging Cognitive Age, the key to success lies not in simplifying the complexity but in embracing it. The vast cognitive mosaic is here to stay, and those who can navigate its intricacies will be the ones who shape the future.

John Nosta

John Nosta is an innovation theorist and founder of NostaLab.



  • Open access
  • Published: 31 August 2024

Knowledge mapping and evolution of research on older adults’ technology acceptance: a bibliometric study from 2013 to 2023

  • Xianru Shang   ORCID: orcid.org/0009-0000-8906-3216 1 ,
  • Zijian Liu 1 ,
  • Chen Gong 1 ,
  • Zhigang Hu 1 ,
  • Yuexuan Wu 1 &
  • Chengliang Wang   ORCID: orcid.org/0000-0003-2208-3508 2  

Humanities and Social Sciences Communications, volume 11, Article number: 1115 (2024)


  • Science, technology and society

The rapid expansion of information technology and the intensification of population aging are two prominent features of contemporary societal development. Investigating older adults’ acceptance and use of technology is key to facilitating their integration into an information-driven society. Given this context, the technology acceptance of older adults has emerged as a prioritized research topic, attracting widespread attention in the academic community. However, existing research remains fragmented and lacks a systematic framework. To address this gap, we employed bibliometric methods, utilizing the Web of Science Core Collection to conduct a comprehensive review of literature on older adults’ technology acceptance from 2013 to 2023. Utilizing VOSviewer and CiteSpace for data assessment and visualization, we created knowledge mappings of research on older adults’ technology acceptance. Our study employed multidimensional methods such as co-occurrence analysis, clustering, and burst analysis to: (1) reveal research dynamics, key journals, and domains in this field; (2) identify leading countries, their collaborative networks, and core research institutions and authors; (3) recognize the foundational knowledge system centered on theoretical model deepening, emerging technology applications, and research methods and evaluation, uncovering seminal literature and observing a shift from early theoretical and influential factor analyses to empirical studies focusing on individual factors and emerging technologies; (4) moreover, current research hotspots are primarily in the areas of factors influencing technology adoption, human-robot interaction experiences, mobile health management, and aging-in-place technology, highlighting the evolutionary context and quality distribution of research themes. Finally, we recommend that future research should deeply explore improvements in theoretical models, long-term usage, and user experience evaluation. Overall, this study presents a clear framework of existing research in the field of older adults’ technology acceptance, providing an important reference for future theoretical exploration and innovative applications.


Introduction

In contemporary society, the rapid development of information technology has been intricately intertwined with the intensifying trend of population aging. According to the latest United Nations forecast, by 2050, the global population aged 65 and above is expected to reach 1.6 billion, representing about 16% of the total global population (UN 2023 ). Given the significant challenges of global aging, there is increasing evidence that emerging technologies have significant potential to maintain health and independence for older adults in their home and healthcare environments (Barnard et al. 2013 ; Soar 2010 ; Vancea and Solé-Casals 2016 ). This includes, but is not limited to, enhancing residential safety with smart home technologies (Touqeer et al. 2021 ; Wang et al. 2022 ), improving living independence through wearable technologies (Perez et al. 2023 ), and increasing medical accessibility via telehealth services (Kruse et al. 2020 ). Technological innovations are redefining the lifestyles of older adults, encouraging a shift from passive to active participation (González et al. 2012 ; Mostaghel 2016 ). Nevertheless, the effective application and dissemination of technology still depends on user acceptance and usage intentions (Naseri et al. 2023 ; Wang et al. 2023a ; Xia et al. 2024 ; Yu et al. 2023 ). Particularly, older adults face numerous challenges in accepting and using new technologies. These challenges include not only physical and cognitive limitations but also a lack of technological experience, along with the influences of social and economic factors (Valk et al. 2018 ; Wilson et al. 2021 ).

User acceptance of technology is a significant focus within information systems (IS) research (Dai et al. 2024 ), with several models developed to explain and predict user behavior towards technology usage, including the Technology Acceptance Model (TAM) (Davis 1989 ), TAM2, TAM3, and the Unified Theory of Acceptance and Use of Technology (UTAUT) (Venkatesh et al. 2003 ). Older adults, as a group with unique needs, exhibit different behavioral patterns during technology acceptance than other user groups, and these uniquenesses include changes in cognitive abilities, as well as motivations, attitudes, and perceptions of the use of new technologies (Chen and Chan 2011 ). The continual expansion of technology introduces considerable challenges for older adults, rendering the understanding of their technology acceptance a research priority. Thus, conducting in-depth research into older adults’ acceptance of technology is critically important for enhancing their integration into the information society and improving their quality of life through technological advancements.

Reviewing relevant literature to identify research gaps helps further solidify the theoretical foundation of the research topic. However, many existing literature reviews primarily focus on the factors influencing older adults’ acceptance or intentions to use technology. For instance, Ma et al. ( 2021 ) conducted a comprehensive analysis of the determinants of older adults’ behavioral intentions to use technology; Liu et al. ( 2022 ) categorized key variables in studies of older adults’ technology acceptance, noting a shift in focus towards social and emotional factors; Yap et al. ( 2022 ) identified seven categories of antecedents affecting older adults’ use of technology from an analysis of 26 articles, including technological, psychological, social, personal, cost, behavioral, and environmental factors; Schroeder et al. ( 2023 ) extracted 119 influencing factors from 59 articles and further categorized these into six themes covering demographics, health status, and emotional awareness. Additionally, some studies focus on the application of specific technologies, such as Ferguson et al. ( 2021 ), who explored barriers and facilitators to older adults using wearable devices for heart monitoring, and He et al. ( 2022 ) and Baer et al. ( 2022 ), who each conducted in-depth investigations into the acceptance of social assistive robots and mobile nutrition and fitness apps, respectively. In summary, current literature reviews on older adults’ technology acceptance exhibit certain limitations. Due to the interdisciplinary nature and complex knowledge structure of this field, traditional literature reviews often rely on qualitative analysis, based on literature analysis and periodic summaries, which lack sufficient objectivity and comprehensiveness. Additionally, systematic research is relatively limited, lacking a macroscopic description of the research trajectory from a holistic perspective. Over the past decade, research on older adults’ technology acceptance has experienced rapid growth, with a significant increase in literature, necessitating the adoption of new methods to review and examine the developmental trends in this field (Chen 2006 ; Van Eck and Waltman 2010 ). Bibliometric analysis, as an effective quantitative research method, analyzes published literature through visualization, offering a viable approach to extracting patterns and insights from a large volume of papers, and has been widely applied in numerous scientific research fields (Achuthan et al. 2023 ; Liu and Duffy 2023 ). Therefore, this study will employ bibliometric methods to systematically analyze research articles related to older adults’ technology acceptance published in the Web of Science Core Collection from 2013 to 2023, aiming to understand the core issues and evolutionary trends in the field, and to provide valuable references for future related research. Specifically, this study aims to explore and answer the following questions:

RQ1: What are the research dynamics in the field of older adults’ technology acceptance over the past decade? What are the main academic journals and fields that publish studies related to older adults’ technology acceptance?

RQ2: How is the productivity in older adults’ technology acceptance research distributed among countries, institutions, and authors?

RQ3: What are the knowledge base and seminal literature in older adults’ technology acceptance research? How has the research theme progressed?

RQ4: What are the current hot topics and their evolutionary trajectories in older adults’ technology acceptance research? How is the quality of research distributed?

Methodology and materials

Research method

In recent years, bibliometrics has become one of the crucial methods for analyzing literature reviews and is widely used in disciplinary and industrial intelligence analysis (Jing et al. 2023 ; Lin and Yu 2024a ; Wang et al. 2024a ; Xu et al. 2021 ). Bibliometric software facilitates the visualization analysis of extensive literature data, intuitively displaying the network relationships and evolutionary processes between knowledge units, and revealing the underlying knowledge structure and potential information (Chen et al. 2024 ; López-Robles et al. 2018 ; Wang et al. 2024c ). This method provides new insights into the current status and trends of specific research areas, along with quantitative evidence, thereby enhancing the objectivity and scientific validity of the research conclusions (Chen et al. 2023 ; Geng et al. 2024 ). VOSviewer and CiteSpace are two widely used bibliometric software tools in academia (Pan et al. 2018 ), recognized for their robust functionalities based on the JAVA platform. Although each has its unique features, combining these two software tools effectively constructs mapping relationships between literature knowledge units and clearly displays the macrostructure of the knowledge domains. Particularly, VOSviewer, with its excellent graphical representation capabilities, serves as an ideal tool for handling large datasets and precisely identifying the focal points and hotspots of research topics. Therefore, this study utilizes VOSviewer (version 1.6.19) and CiteSpace (version 6.1.R6), combined with in-depth literature analysis, to comprehensively examine and interpret the research theme of older adults’ technology acceptance through an integrated application of quantitative and qualitative methods.

Data source

Web of Science is a comprehensively recognized database in academia, featuring literature that has undergone rigorous peer review and editorial scrutiny (Lin and Yu 2024b ; Mongeon and Paul-Hus 2016 ; Pranckutė 2021 ). This study utilizes the Web of Science Core Collection as its data source, specifically including three major citation indices: Science Citation Index Expanded (SCIE), Social Sciences Citation Index (SSCI), and Arts & Humanities Citation Index (A&HCI). These indices encompass high-quality research literature in the fields of science, social sciences, and arts and humanities, ensuring the comprehensiveness and reliability of the data. We combined “older adults” with “technology acceptance” through thematic search, with the specific search strategy being: TS = (elder OR elderly OR aging OR ageing OR senile OR senior OR old people OR “older adult*”) AND TS = (“technology acceptance” OR “user acceptance” OR “consumer acceptance”). The time span of literature search is from 2013 to 2023, with the types limited to “Article” and “Review” and the language to “English”. Additionally, the search was completed by October 27, 2023, to avoid data discrepancies caused by database updates. The initial search yielded 764 journal articles. Given that searches often retrieve articles that are superficially relevant but actually non-compliant, manual screening post-search was essential to ensure the relevance of the literature (Chen et al. 2024 ). Through manual screening, articles significantly deviating from the research theme were eliminated and rigorously reviewed. Ultimately, this study obtained 500 valid sample articles from the Web of Science Core Collection. The complete PRISMA screening process is illustrated in Fig. 1 .

Figure 1. Presentation of the data culling process in detail.

Data standardization

Raw data exported from databases often contain multiple expressions of the same terminology (Nguyen and Hallinger 2020 ). To ensure the accuracy and consistency of data, it is necessary to standardize the raw data (Strotmann and Zhao 2012 ). This study follows the data standardization process proposed by Taskin and Al ( 2019 ), mainly executing the following operations:

(1) Standardization of author and institution names is conducted to address different name expressions for the same author. For instance, “Chan, Alan Hoi Shou” and “Chan, Alan H. S.” are considered the same author, and distinct authors with the same name are differentiated by adding identifiers. Diverse forms of institutional names are unified to address variations caused by name changes or abbreviations, such as standardizing “FRANKFURT UNIV APPL SCI” and “Frankfurt University of Applied Sciences,” as well as “Chinese University of Hong Kong” and “University of Hong Kong” to consistent names.

(2) Different expressions of journal names are unified. For example, “International Journal of Human-Computer Interaction” and “Int J Hum Comput Interact” are standardized to a single name. This ensures consistency in journal names and prevents misclassification of literature due to differing journal names. Additionally, it involves checking if the journals have undergone name changes in the past decade to prevent any impact on the analysis due to such changes.

(3) Keyword data are cleansed by removing words that do not directly pertain to specific research content (e.g., people, review), merging synonyms (e.g., "UX" and "User Experience," "aging-in-place" and "aging in place"), and standardizing plural forms (e.g., "assistive technologies" and "assistive technology," "social robots" and "social robot"). This reduces redundant information in the knowledge mapping.
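To make the keyword-cleansing step concrete, the following minimal Python sketch shows one possible way to apply a synonym table and simple stop-word filtering to raw author keywords. The synonym map and stop-word list are illustrative assumptions, not the exact mapping used in this study.

```python
# Minimal sketch of keyword standardization (illustrative, not the study's exact mapping).

STOPWORDS = {"people", "review"}  # assumed examples of non-substantive terms
SYNONYMS = {                      # assumed synonym table; extend as needed
    "ux": "user experience",
    "aging-in-place": "aging in place",
    "assistive technologies": "assistive technology",
    "social robots": "social robot",
}

def normalize_keyword(raw: str):
    """Lower-case, trim, map synonyms, and drop stop words; return None if removed."""
    kw = raw.strip().lower()
    if kw in STOPWORDS:
        return None
    return SYNONYMS.get(kw, kw)

def standardize(records):
    """Apply normalization to each article's author-keyword list, dropping duplicates."""
    cleaned = []
    for kw_list in records:
        seen = []
        for kw in kw_list:
            norm = normalize_keyword(kw)
            if norm and norm not in seen:
                seen.append(norm)
        cleaned.append(seen)
    return cleaned

if __name__ == "__main__":
    sample = [["UX", "Assistive Technologies", "people"], ["aging-in-place", "Social Robots"]]
    print(standardize(sample))
    # [['user experience', 'assistive technology'], ['aging in place', 'social robot']]
```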

Bibliometric results and analysis

Distribution power (RQ1)

Literature descriptive statistical analysis

Table 1 presents a detailed descriptive statistical overview of the literature in the field of older adults’ technology acceptance. After deduplication using the CiteSpace software, this study confirmed a valid sample size of 500 articles. Authored by 1839 researchers, the documents encompass 792 research institutions across 54 countries and are published in 217 different academic journals. As of the search cutoff date, these articles have accumulated 13,829 citations, with an annual average of 1156 citations, and an average of 27.66 citations per article. The h-index, a composite metric of quantity and quality of scientific output (Kamrani et al. 2021 ), reached 60 in this study.
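As a point of reference for how the h-index reported above is defined, a minimal sketch is given below; it assumes only a plain list of per-article citation counts exported from the database, and the toy numbers are illustrative.

```python
def h_index(citations):
    """Largest h such that at least h articles have at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

if __name__ == "__main__":
    # Toy example: five articles with these citation counts yield h = 4.
    print(h_index([10, 8, 5, 4, 3]))
```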

Trends in publications and disciplinary distribution

The numbers of publications and citations are important indicators of a research field's development, reflecting its continuity, attention, and impact (Ale Ebrahim et al. 2014). The annual publication and citation counts in the field of older adults' technology acceptance are presented chronologically in Fig. 2A. The figure shows a clear upward trend in the volume of literature. Between 2013 and 2017, the number of publications increased slowly, followed by a dip in 2018. In 2019, however, output rose rapidly to 52 articles and reached a peak of 108 in 2022, 6.75 times the 2013 figure. Citations also peaked in 2022 at 3466, reflecting the widespread recognition of research in this field. Moreover, the curve of annual publications is well described by a quadratic function, with a goodness of fit of R² = 0.9661, suggesting that the number of publications can be expected to grow even more rapidly in the coming years.
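A minimal sketch of how such a quadratic trend and its R² can be reproduced from the annual counts follows. Apart from the values reported in the text (16 publications in 2013, 52 in 2019, 108 in 2022), the year-by-year figures used here are placeholders and must be replaced with the actual counts.

```python
import numpy as np

def quadratic_trend(years, counts):
    """Fit counts ~ a*year^2 + b*year + c and return the coefficients plus R^2."""
    coeffs = np.polyfit(years, counts, deg=2)
    fitted = np.polyval(coeffs, years)
    ss_res = np.sum((np.asarray(counts) - fitted) ** 2)
    ss_tot = np.sum((np.asarray(counts) - np.mean(counts)) ** 2)
    return coeffs, 1 - ss_res / ss_tot

if __name__ == "__main__":
    years = list(range(2013, 2024))
    # Placeholder series: only 2013 (16), 2019 (52), and 2022 (108) mirror reported values;
    # replace the remaining entries with the actual annual publication counts.
    counts = [16, 18, 20, 24, 28, 26, 52, 70, 90, 108, 100]
    (a, b, c), r2 = quadratic_trend(years, counts)
    print(f"y = {a:.3f}x^2 + {b:.3f}x + {c:.3f}, R^2 = {r2:.4f}")
```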

figure 2

A Trends in annual publications and citations (2013–2023). B Overlay analysis of the distribution of discipline fields.

Figure 2B shows that research on older adults' technology acceptance draws on the integration of multidisciplinary knowledge. According to the Web of Science categories, the 500 articles are distributed across 85 disciplines. The top ten disciplines by publication volume (Table 2) include Medical Informatics (75 articles, 15.00%), Health Care Sciences & Services (71 articles, 14.20%), Gerontology (61 articles, 12.20%), Public Environmental & Occupational Health (57 articles, 11.40%), and Geriatrics & Gerontology (52 articles, 10.40%), among others. The high output in these disciplines reflects concentrated global academic interest in this comprehensive research topic. Interdisciplinary approaches also provide diverse perspectives and a solid theoretical foundation for studies on older adults' technology acceptance and pave the way for new research directions.

Knowledge flow analysis

A dual-map overlay is a CiteSpace map superimposed on a base map; it shows the interrelationships between journals in different domains, representing publication and citation activity in each domain (Chen and Leydesdorff 2014). The overlay reveals the links between citing domains (left side) and cited domains (right side), reflecting knowledge flow at the journal level (Leydesdorff and Rafols 2012). We used the software's built-in Z-score algorithm to cluster the graph, as shown in Fig. 3.

figure 3

The left side shows the citing journals, and the right side shows the cited journals.

Figure 3 shows the clusters of citing journals for older adults' technology acceptance on the left side and the main clusters of cited journals on the right. Two knowledge-flow citation trajectories were obtained; they are colored according to the cited regions, and their thickness is proportional to the Z-score-scaled citation frequency (Chen et al. 2014). Within the cited regions, the fields covering the most records are "HEALTH, NURSING, MEDICINE" and "PSYCHOLOGY, EDUCATION, SOCIAL", whose prominent elliptical aspect ratios highlight their significant influence on older adults' technology acceptance research. The major citation trajectories originate in these two areas and lead to the frontier research area of "PSYCHOLOGY, EDUCATION, HEALTH". Notably, the citation trajectory from "PSYCHOLOGY, EDUCATION, SOCIAL" has a high Z-value (z = 6.81), underscoring the significance and impact of this development path. In the future, "MATHEMATICS, SYSTEMS, MATHEMATICAL", "MOLECULAR, BIOLOGY, IMMUNOLOGY", "NEUROLOGY, SPORTS, OPHTHALMOLOGY", and "MEDICINE, MEDICAL, CLINICAL" may become emerging areas of cutting-edge research.

Main research journals analysis

Table 3 provides statistics for the top ten journals by publication volume in the field of older adults’ technology acceptance. Together, these journals have published 137 articles, accounting for 27.40% of the total publications, indicating that there is no highly concentrated core group of journals in this field, with publications being relatively dispersed. Notably, Computers in Human Behavior , Journal of Medical Internet Research , and International Journal of Human-Computer Interaction each lead with 15 publications. In terms of citation metrics, International Journal of Medical Informatics and Computers in Human Behavior stand out significantly, with the former accumulating a total of 1,904 citations, averaging 211.56 citations per article, and the latter totaling 1,449 citations, with an average of 96.60 citations per article. These figures emphasize the academic authority and widespread impact of these journals within the research field.

Research power (RQ2)

Countries and collaborations analysis

The analysis reveals the global pattern of country distribution and collaboration in this field (Chen et al. 2019). Figure 4A shows the network of national collaborations on older adults' technology acceptance research. The size of each bubble represents a country's publication output, while the thickness of the connecting lines expresses the closeness of collaboration between countries. Overall, the topic has received extensive international attention, with China and the USA publishing far more than any other country. China has established notable research collaborations with the USA, the UK, and Malaysia, whereas collaborations among other countries are comparatively sparse and loosely connected. Figure 4B shows the annual publication dynamics of the top ten countries by total output. Since 2017, China has steadily increased its annual publications, while the USA has remained relatively stable. In 2019, publication volumes rose markedly across countries, largely owing to the global outbreak of the COVID-19 pandemic, which increased older adults' reliance on information technology for medical consultations, online socialization, and health management (Sinha et al. 2021) and thereby spurred research on their technology acceptance in many countries. Table 4 shows that the top ten countries account for 93.20% of the cumulative publications, each having published more than 20 papers. All of these ten countries except China are developed countries, indicating that research on older adults' technology acceptance has attracted broad attention from the developed world. China and the USA are the leading countries, with 111 and 104 publications respectively, accounting for 22.20% and 20.80% of the total. The UK, Germany, Italy, and the Netherlands have also made significant contributions. The USA and China rank first and second in citation counts, while the Netherlands has the highest average citations, indicating the high impact and quality of its research. The UK stands out in international cooperation, and the USA, with the highest h-index, demonstrates significant academic influence in this field.

figure 4

A National collaboration network. B Annual volume of publications in the top 10 countries.

Institutions and authors analysis

Analyzing the number of publications and citations can reveal an institution’s or author’s research strength and influence in a particular research area (Kwiek 2021 ). Tables 5 and 6 show the statistics of the institutions and authors whose publication counts are in the top ten, respectively. As shown in Table 5 , higher education institutions hold the main position in this research field. Among the top ten institutions, City University of Hong Kong and The University of Hong Kong from China lead with 14 and 9 publications, respectively. City University of Hong Kong has the highest h-index, highlighting its significant influence in the field. It is worth noting that Tilburg University in the Netherlands is not among the top five in terms of publications, but the high average citation count (130.14) of its literature demonstrates the high quality of its research.

Following Price's Law (Redner 1998), the core-author threshold is m = 0.749 × √n_max, where n_max is the output of the most productive author; with n_max = 10 publications, this gives m ≈ 2.37, rounded up to a threshold of 3 publications for core authors in this research area. Quantitative screening on this basis identified 63 core authors. Table 6 shows that Chen from Zhejiang University, China, Ziefle from RWTH Aachen University, Germany, and Rogers from Macquarie University, Australia, are the top three authors by publication count, with 10, 9, and 8 articles respectively. In terms of average citation rate, Peek and Wouters, both scholars from the Netherlands, clearly exceed the other core authors, with 183.20 and 152.67 respectively, suggesting that their research is of high quality and widely recognized. Additionally, Chen and Rogers have high h-indices in this field.
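A minimal sketch of this core-author screening is shown below; it assumes a simple mapping from author names to publication counts, and the toy counts (other than the maximum of 10 reported in the text) are illustrative.

```python
import math

def core_author_threshold(max_output: int) -> int:
    """Price's Law: core authors publish at least 0.749 * sqrt(n_max) papers (rounded up)."""
    return math.ceil(0.749 * math.sqrt(max_output))

def core_authors(pub_counts):
    """Return authors whose output meets or exceeds the Price threshold."""
    threshold = core_author_threshold(max(pub_counts.values()))
    return sorted(a for a, n in pub_counts.items() if n >= threshold)

if __name__ == "__main__":
    # Toy counts for illustration; only the maximum (10) mirrors the value reported in the text.
    counts = {"Chen": 10, "Ziefle": 9, "Rogers": 8, "Peek": 4, "Smith": 2}
    print(core_author_threshold(10))   # -> 3
    print(core_authors(counts))        # -> ['Chen', 'Peek', 'Rogers', 'Ziefle']
```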

Knowledge base and theme progress (RQ3)

Research knowledge base

Co-citation relationships occur when two documents are cited together (Zhang and Zhu 2022). Co-citation mapping uses references as nodes to represent the knowledge base of a subject area (Min et al. 2021). Figure 5A illustrates the co-citation mapping of older adults' technology acceptance research, in which larger nodes signify higher co-citation frequencies. Co-citation cluster analysis can be used to explore knowledge structure and research boundaries (Hota et al. 2020; Shiau et al. 2023). The co-citation clustering of this literature (Fig. 5B) yields a modularity Q of 0.8129 (>0.3) and a mean silhouette value S of 0.9391 (>0.7), indicating a well-differentiated and credible cluster structure and, hence, clear boundaries within the research field. The figure features 18 cluster labels, each associated with thematic color blocks corresponding to different time slices. Highlighted emerging research themes include #2 Smart Home Technology, #7 Social Life, and #10 Customer Service. The cluster labels fall primarily into three categories: theoretical model deepening, emerging technology applications, and research methods and evaluation, as detailed in Table 7.
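For readers who want to see how such quality metrics are obtained, the sketch below computes a modularity Q for a community partition and a mean silhouette score on a toy co-citation graph using networkx and scikit-learn. It only approximates CiteSpace's internal computation (the silhouette here uses shortest-path distances as dissimilarities), and the edges are illustrative assumptions.

```python
import networkx as nx
import numpy as np
from sklearn.metrics import silhouette_score

# Toy undirected co-citation graph: nodes are references, edge weights are co-citation counts.
G = nx.Graph()
G.add_weighted_edges_from([
    ("ref1", "ref2", 5), ("ref2", "ref3", 4), ("ref1", "ref3", 3),   # cluster A
    ("ref4", "ref5", 6), ("ref5", "ref6", 2), ("ref4", "ref6", 3),   # cluster B
    ("ref3", "ref4", 1),                                             # weak bridge
])

# Detect communities and compute modularity Q (values above ~0.3 suggest significant structure).
communities = nx.community.greedy_modularity_communities(G, weight="weight")
q = nx.community.modularity(G, communities, weight="weight")

# Approximate a silhouette score S using shortest-path distances as dissimilarities.
nodes = list(G.nodes)
labels = [next(i for i, c in enumerate(communities) if n in c) for n in nodes]
dist = dict(nx.all_pairs_shortest_path_length(G))
D = np.array([[dist[a][b] for b in nodes] for a in nodes], dtype=float)
s = silhouette_score(D, labels, metric="precomputed")

print(f"Q = {q:.4f}, mean silhouette S = {s:.4f}")
```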

figure 5

A Co-citation analysis of references. B Clustering network analysis of references.

Seminal literature analysis

The top ten nodes in terms of co-citation frequency were selected for further analysis. Table 8 displays the corresponding node information. Studies were categorized into four main groups based on content analysis. (1) Research focusing on specific technology usage by older adults includes studies by Peek et al. ( 2014 ), Ma et al. ( 2016 ), Hoque and Sorwar ( 2017 ), and Li et al. ( 2019 ), who investigated the factors influencing the use of e-technology, smartphones, mHealth, and smart wearables, respectively. (2) Concerning the development of theoretical models of technology acceptance, Chen and Chan ( 2014 ) introduced the Senior Technology Acceptance Model (STAM), and Macedo ( 2017 ) analyzed the predictive power of UTAUT2 in explaining older adults’ intentional behaviors and information technology usage. (3) In exploring older adults’ information technology adoption and behavior, Lee and Coughlin ( 2015 ) emphasized that the adoption of technology by older adults is a multifactorial process that includes performance, price, value, usability, affordability, accessibility, technical support, social support, emotion, independence, experience, and confidence. Yusif et al. ( 2016 ) conducted a literature review examining the key barriers affecting older adults’ adoption of assistive technology, including factors such as privacy, trust, functionality/added value, cost, and stigma. (4) From the perspective of research into older adults’ technology acceptance, Mitzner et al. ( 2019 ) assessed the long-term usage of computer systems designed for the elderly, whereas Guner and Acarturk ( 2020 ) compared information technology usage and acceptance between older and younger adults. The breadth and prevalence of this literature make it a vital reference for researchers in the field, also providing new perspectives and inspiration for future research directions.

Research thematic progress

A burst citation is a reference whose citation frequency rises sharply within a short period, usually signaling a prominent development or major change in a field, with innovative and forward-looking qualities. Analyzing such burst literature makes it easier to understand the dynamics of a subject area and to map emerging thematic change (Chen et al. 2022). Figure 6 shows the burst citation mapping in the field of older adults' technology acceptance research, with burst citations represented by red nodes (Fig. 6A). The ten papers with the highest burst intensity (Fig. 6B) are analyzed further below in conjunction with a review of their content.

figure 6

A Burst detection of co-citation. B The top 10 references with the strongest citation bursts.

As shown in Fig. 6 , Mitzner et al. ( 2010 ) broke the stereotype that older adults are fearful of technology, found that they actually have positive attitudes toward technology, and emphasized the centrality of ease of use and usefulness in the process of technology acceptance. This finding provides an important foundation for subsequent research. During the same period, Wagner et al. ( 2010 ) conducted theory-deepening and applied research on technology acceptance among older adults. The research focused on older adults’ interactions with computers from the perspective of Social Cognitive Theory (SCT). This expanded the understanding of technology acceptance, particularly regarding the relationship between behavior, environment, and other SCT elements. In addition, Pan and Jordan-Marsh ( 2010 ) extended the TAM to examine the interactions among predictors of perceived usefulness, perceived ease of use, subjective norm, and convenience conditions when older adults use the Internet, taking into account the moderating roles of gender and age. Heerink et al. ( 2010 ) adapted and extended the UTAUT, constructed a technology acceptance model specifically designed for older users’ acceptance of assistive social agents, and validated it using controlled experiments and longitudinal data, explaining intention to use by combining functional assessment and social interaction variables.

Then the research theme shifted to an in-depth analysis of the factors influencing technology acceptance among older adults. Two papers with high burst strength emerged during this period: Peek et al. (2014) (strength = 12.04) and Chen and Chan (2014) (strength = 9.81). Through a systematic literature review and an empirical study respectively, these authors identified multidimensional factors that influence older adults' technology acceptance. Peek et al. (2014) analyzed the literature on older adults' acceptance of in-home care technology and identified six groups of influencing factors: concerns about technology, expected benefits, technology needs, technology alternatives, social influences, and older adult characteristics, with a focus on differences between pre- and post-implementation factors. Chen and Chan (2014) constructed the STAM by administering a questionnaire to 1012 older adults and adding eight important factors to the TAM, including technology anxiety, self-efficacy, cognitive ability, and physical function, thereby enriching the theoretical foundation of the field. In addition, Braun (2013) highlighted the roles of perceived usefulness, trust in social networks, and frequency of Internet use in older adults' use of social networks, whereas ease of use and social pressure were not significant influences. These findings contribute to the study of older adults' technology acceptance within specific technology application domains.

Recent research has focused on empirical studies of personal factors and emerging technologies. Ma et al. ( 2016 ) identified key personal factors affecting smartphone acceptance among older adults through structured questionnaires and face-to-face interviews with 120 participants. The study found that cost, self-satisfaction, and convenience were important factors influencing perceived usefulness and ease of use. This study offers empirical evidence to comprehend the main factors that drive smartphone acceptance among Chinese older adults. Additionally, Yusif et al. ( 2016 ) presented an overview of the obstacles that hinder older adults’ acceptance of assistive technologies, focusing on privacy, trust, and functionality.

In summary, research on older adults’ technology acceptance has shifted from early theoretical deepening and analysis of influencing factors to empirical studies in the areas of personal factors and emerging technologies, which have greatly enriched the theoretical basis of older adults’ technology acceptance and provided practical guidance for the design of emerging technology products.

Research hotspots, evolutionary trends, and quality distribution (RQ4)

Core keywords analysis

Keywords condense the main ideas and core content of the literature and provide a refined summary of the research (Huang et al. 2021). In CiteSpace, nodes with a centrality value greater than 0.1 are considered critical nodes. Analyzing keywords with high frequency and centrality helps to reveal the hot topics in a research field (Park et al. 2018). The merged keywords were imported into CiteSpace, and the top 10 keywords were ranked by frequency and by centrality, as shown in Table 9. The results show that the keyword "TAM" has the highest frequency (92), followed by "UTAUT" (24), reflecting that in-depth study of existing technology acceptance models and their theoretical extension occupies a central position in research on older adults' technology acceptance. Furthermore, "assistive technology" and "virtual reality" are both high-frequency and high-centrality terms (frequency = 17, centrality = 0.10), indicating that assistive technology and virtual reality for older adults are the current focus of academic attention.
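A minimal sketch of how keyword frequency and betweenness centrality can be derived from per-article keyword lists is shown below; it is a simplified stand-in for CiteSpace's computation, and the sample records are illustrative placeholders.

```python
from itertools import combinations
from collections import Counter
import networkx as nx

# Illustrative per-article author-keyword lists (placeholders, not the study's data).
records = [
    ["tam", "older adults", "assistive technology"],
    ["utaut", "older adults", "virtual reality"],
    ["tam", "virtual reality", "user experience"],
    ["assistive technology", "user experience", "older adults"],
]

# Keyword frequency across articles.
freq = Counter(kw for kws in records for kw in kws)

# Build a weighted co-occurrence network: keywords appearing in the same article share an edge.
G = nx.Graph()
for kws in records:
    for a, b in combinations(sorted(set(kws)), 2):
        w = G[a][b]["weight"] + 1 if G.has_edge(a, b) else 1
        G.add_edge(a, b, weight=w)

# Betweenness centrality; CiteSpace flags nodes above 0.1 as critical.
centrality = nx.betweenness_centrality(G, normalized=True)
for kw in sorted(G.nodes, key=lambda k: -freq[k]):
    flag = " (critical)" if centrality[kw] > 0.1 else ""
    print(f"{kw}: frequency={freq[kw]}, centrality={centrality[kw]:.2f}{flag}")
```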

Research hotspots analysis

Keyword co-occurrence analysis in VOSviewer organizes keywords into groups or clusters based on their intrinsic connections and frequencies, clearly highlighting the hot topics of the research field, while the connectivity among keywords reveals correlations between different topics. To ensure accuracy, only the authors' keywords were considered. The keywords were then filtered with a minimum frequency threshold of 5 to obtain the keyword clustering map for older adults' technology acceptance research (Fig. 7). Combining the keyword co-occurrence clustering network (Fig. 7A) with the corresponding density view (Fig. 7B), the following four clustered themes are analyzed in detail.

figure 7

A Co-occurrence clustering network. B Keyword density.

Cluster #1—Research on the factors influencing technology adoption among older adults is a prominent topic, covering age, gender, self-efficacy, attitude, and intention to use (Berkowsky et al. 2017; Wang et al. 2017), as well as older adults' attitudes towards and acceptance of digital health technologies (Ahmad and Mozelius 2022). Moreover, the COVID-19 pandemic, which significantly affected older adults' technology attitudes and usage, has underscored the importance and urgency of this line of research. It is therefore crucial to study in depth how older adults accept, adopt, and effectively use new technologies, in order to address their needs, help them overcome the digital divide, and promote digital inclusion, thereby improving their quality of life and healthcare experiences.

Cluster #2—Research focuses on how older adults interact with assistive technologies, especially assistive robots and health monitoring devices, emphasizing trust, usability, and user experience as crucial factors (Halim et al. 2022 ). Moreover, health monitoring technologies effectively track and manage health issues common in older adults, like dementia and mild cognitive impairment (Lussier et al. 2018 ; Piau et al. 2019 ). Interactive exercise games and virtual reality have been deployed to encourage more physical and cognitive engagement among older adults (Campo-Prieto et al. 2021 ). Personalized and innovative technology significantly enhances older adults’ participation, improving their health and well-being.

Cluster #3—Optimizing health management for older adults using mobile technology. With the development of mobile health (mHealth) and health information technology, mobile applications, smartphones, and smart wearable devices have become effective tools to help older users better manage chronic conditions, conduct real-time health monitoring, and even receive telehealth services (Dupuis and Tsotsos 2018 ; Olmedo-Aguirre et al. 2022 ; Kim et al. 2014 ). Additionally, these technologies can mitigate the problem of healthcare resource inequality, especially in developing countries. Older adults’ acceptance and use of these technologies are significantly influenced by their behavioral intentions, motivational factors, and self-management skills. These internal motivational factors, along with external factors, jointly affect older adults’ performance in health management and quality of life.

Cluster #4—Research on technology-assisted home care for older adults is gaining popularity. Ambient assisted living enhances older adults' independence and comfort at home, offering essential support and security, and thus plays a crucial role in promoting healthy aging (Friesen et al. 2016; Wahlroos et al. 2023). The smart home is a core application in this field, providing a range of solutions that facilitate independent living for the elderly in a highly integrated and user-friendly manner and meet different dimensions of living and health needs (Majumder et al. 2017). Moreover, eHealth offers accurate and personalized health management and healthcare services for older adults (Delmastro et al. 2018), ensuring their needs are met at home. Research in this field often employs qualitative methods and structural equation modeling to fully understand older adults' needs and experiences at home and to analyze the factors influencing technology adoption.

Evolutionary trends analysis

To gain a deeper understanding of the evolutionary trends in research hotspots within the field of older adults' technology acceptance, we statistically analyzed the average appearance times of keywords and used CiteSpace to generate the time-zone evolution mapping (Fig. 8) and burst keywords. The time-zone mapping visually displays the evolution of keywords over time, intuitively reflecting the frequency and first appearance of keywords in the research, and is commonly used to identify trends in research topics (Jing et al. 2024a; Kumar et al. 2021). Table 10 lists the top 15 keywords by burst strength, with the red sections indicating the specific years of high-frequency citation and the corresponding burst strength. These burst keywords reveal the focus and trends of research themes over different periods (Kleinberg 2002). Combining the insights from the time-zone mapping and the burst keywords provides more objective and accurate research insights (Wang et al. 2023b).
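A minimal sketch of the underlying computation, deriving each keyword's first and average appearance year from a table of article years and keywords, is shown below; the records are placeholders standing in for the cleaned bibliographic data.

```python
from collections import defaultdict
from statistics import mean

# Illustrative (year, keywords) records; replace with the cleaned bibliographic data.
records = [
    (2013, ["tam", "internet use"]),
    (2016, ["tam", "mhealth"]),
    (2020, ["virtual reality", "telehealth"]),
    (2022, ["virtual reality", "human-robot interaction"]),
]

years_by_keyword = defaultdict(list)
for year, keywords in records:
    for kw in keywords:
        years_by_keyword[kw].append(year)

# First appearance year (used to place keywords in time zones) and average appearance year.
for kw, years in sorted(years_by_keyword.items()):
    print(f"{kw}: first={min(years)}, average={mean(years):.1f}, frequency={len(years)}")
```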

figure 8

Time-zone mapping reflecting the frequency and time of first appearance of keywords in the study.

An integrated analysis of Fig. 8 and Table 10 shows that early research on older adults' technology acceptance primarily focused on factors such as perceived usefulness, ease of use, and attitudes towards information technology, including older adults' use of computers and the internet (Pan and Jordan-Marsh 2010), as well as differences in technology use between older adults and other age groups (Guner and Acarturk 2020). Subsequently, the research focus expanded to improving the quality of life for older adults, exploring how technology can optimize health management and enhance the possibility of independent living, and emphasizing the significant role of technology in improving quality of life in old age. With ongoing technological advancements, recent research has shifted towards areas such as "virtual reality," "telehealth," and "human-robot interaction," with a focus on the user experience of older adults (Halim et al. 2022). The appearance of keywords such as "physical activity" and "exercise" highlights the value of technology in promoting physical activity and health among older adults. This phase of research seeks to make cutting-edge technology genuinely serve the practical needs of older adults and to achieve its widespread application in daily life. Additionally, research has focused on expanding and quantifying theoretical models of older adults' technology acceptance, involving keywords such as "perceived risk," "validation," and "UTAUT".

In summary, from 2013 to 2023, the field of older adults’ technology acceptance has evolved from initial explorations of influencing factors, to comprehensive enhancements in quality of life and health management, and further to the application and deepening of theoretical models and cutting-edge technologies. This research not only reflects the diversity and complexity of the field but also demonstrates a comprehensive and in-depth understanding of older adults’ interactions with technology across various life scenarios and needs.

Research quality distribution

To reveal the distribution of research quality in the field of older adults' technology acceptance, a strategic diagram analysis was employed to calculate and illustrate the internal development of, and interrelationships among, the research themes (Xie et al. 2020). The strategic diagram uses centrality as the X-axis and density as the Y-axis to divide the thematic space into four quadrants: the X-axis represents the strength of a thematic cluster's connections with other themes, with higher values indicating a more central position in the research field, while the Y-axis indicates the level of development within a thematic cluster, with higher values denoting a more mature and widely recognized theme (Li and Zhou 2020).

Through cluster analysis and manual verification, this study categorized 61 core keywords (Frequency ≥5) into 11 thematic clusters. Subsequently, based on the keywords covered by each thematic cluster, the research themes and their directions for each cluster were summarized (Table 11 ), and the centrality and density coordinates for each cluster were precisely calculated (Table 12 ). Finally, a strategic diagram of the older adults’ technology acceptance research field was constructed (Fig. 9 ). Based on the distribution of thematic clusters across the quadrants in the strategic diagram, the structure and developmental trends of the field were interpreted.
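To illustrate how such a strategic diagram can be derived, the sketch below computes Callon-style centrality (mean link strength to keywords outside a cluster) and density (mean link strength within a cluster) from a keyword co-occurrence matrix and assigns each cluster to a quadrant. The co-occurrence values, cluster assignments, and the median split point are illustrative assumptions; the exact formulas and reference values used in this study may differ.

```python
import numpy as np

# Illustrative symmetric keyword co-occurrence matrix and cluster assignment (placeholders).
keywords = ["tam", "utaut", "virtual reality", "exergame", "smart home", "ehealth"]
cooc = np.array([
    [0, 6, 2, 1, 1, 2],
    [6, 0, 1, 1, 2, 1],
    [2, 1, 0, 5, 1, 0],
    [1, 1, 5, 0, 0, 1],
    [1, 2, 1, 0, 0, 4],
    [2, 1, 0, 1, 4, 0],
])
clusters = {"Theoretical Models": [0, 1], "Usage Experience": [2, 3], "Assisted Living": [4, 5]}

def centrality_density(members, matrix):
    """Callon centrality: mean external link strength; density: mean internal link strength."""
    members = set(members)
    outside = [j for j in range(len(matrix)) if j not in members]
    external = [matrix[i][j] for i in members for j in outside]
    internal = [matrix[i][j] for i in members for j in members if i < j]
    return float(np.mean(external)), float(np.mean(internal))

coords = {name: centrality_density(m, cooc) for name, m in clusters.items()}
# Split the plane at the medians (the study may use means or other reference values).
c_med = np.median([c for c, _ in coords.values()])
d_med = np.median([d for _, d in coords.values()])
for name, (c, d) in coords.items():
    quadrant = (1 if c >= c_med and d >= d_med else
                2 if c < c_med and d >= d_med else
                3 if c < c_med and d < d_med else 4)
    print(f"{name}: centrality={c:.2f}, density={d:.2f}, quadrant Q{quadrant}")
```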

figure 9

Classification and visualization of theme clusters based on density and centrality.

As illustrated in Fig. 9, (1) the theme clusters #3 Usage Experience and #4 Assisted Living Technology lie in the first quadrant, characterized by high centrality and high density. Their internal cohesion and close links with other themes indicate mature development: systematic research content and directions have formed, and they exert a significant influence on other themes. These themes play a central role in the field of older adults' technology acceptance and have promising prospects. (2) The clusters #6 Smart Devices, #9 Theoretical Models, and #10 Mobile Health Applications lie in the second quadrant, with higher density but lower centrality. They have strong internal connections but weaker external links, indicating that these three themes have received widespread attention and generated substantial research, but largely as self-contained strands. Future research should therefore explore deeper cooperation and cross-application with other themes. (3) The clusters #7 Human-Robot Interaction, #8 Characteristics of the Elderly, and #11 Research Methods lie in the third quadrant, with low centrality and low density. They are loosely connected internally and weakly linked with other themes, indicating developmental immaturity; compared with other topics they remain peripheral, niche themes that warrant further investigation. (4) The clusters #1 Digital Healthcare Technology, #2 Psychological Factors, and #5 Socio-Cultural Factors are located in the fourth quadrant, with high centrality but low density. Although closely associated with other research themes, their internal cohesion is relatively weak, suggesting that while they are well connected to other areas, their own development remains immature. Nevertheless, these themes are crucial within the research domain of older adults' technology acceptance and hold significant potential for future exploration.

Discussion on distribution power (RQ1)

Over the past decade, academic interest in, and the influence of, research on older adults' technology acceptance have increased significantly. This trend is evidenced by the quantitative analysis of publication and citation volumes, which rose substantially in 2019 and 2022 in particular. The rise is closely linked to the widespread adoption of emerging technologies such as smart homes, wearable devices, and telemedicine among older adults. While these technologies have enhanced their quality of life, they also pose numerous challenges, sparking extensive research into acceptance, usage behaviors, and influencing factors among older adults (Pirzada et al. 2022; Garcia Reyes et al. 2023). Furthermore, the COVID-19 pandemic led to a surge in technology demand among older adults, especially for medical consultation, online socialization, and health management, further highlighting both the importance of technology and its challenges. Health risks and social isolation compelled older adults to rely on technology for daily activities, accelerating its adoption and application within this demographic. This phenomenon has made technology acceptance a critical issue, driving societal and academic focus on its study among older adults.

The flow of knowledge at the level of high-output disciplines and journals, along with the primary publishing outlets, indicates the highly interdisciplinary nature of research into older adults’ technology acceptance. This reflects the complexity and breadth of issues related to older adults’ technology acceptance, necessitating the integration of multidisciplinary knowledge and approaches. Currently, research is primarily focused on medical health and human-computer interaction, demonstrating academic interest in improving health and quality of life for older adults and addressing the urgent needs related to their interactions with technology. In the field of medical health, research aims to provide advanced and innovative healthcare technologies and services to meet the challenges of an aging population while improving the quality of life for older adults (Abdi et al. 2020 ; Wilson et al. 2021 ). In the field of human-computer interaction, research is focused on developing smarter and more user-friendly interaction models to meet the needs of older adults in the digital age, enabling them to actively participate in social activities and enjoy a higher quality of life (Sayago, 2019 ). These studies are crucial for addressing the challenges faced by aging societies, providing increased support and opportunities for the health, welfare, and social participation of older adults.

Discussion on research power (RQ2)

This study analyzes leading countries and collaboration networks, core institutions and authors, revealing the global research landscape and distribution of research strength in the field of older adults’ technology acceptance, and presents quantitative data on global research trends. From the analysis of country distribution and collaborations, China and the USA hold dominant positions in this field, with developed countries like the UK, Germany, Italy, and the Netherlands also excelling in international cooperation and research influence. The significant investment in technological research and the focus on the technological needs of older adults by many developed countries reflect their rapidly aging societies, policy support, and resource allocation.

China is the only developing country that has become a major contributor in this field, indicating its growing research capabilities and the high priority it gives to aging societies and technological innovation. Additionally, China collaborates closely with countries such as the USA, the UK, and Malaysia, driven not only by technological research needs but also by shared challenges and complementarities in aging issues among these nations. For instance, the UK has extensive experience in social welfare and aging research, providing valuable theoretical guidance and practical experience. International collaborations aimed at addressing the challenges of aging integrate the strengths of various countries, advancing in-depth and widespread development of research on technology acceptance among older adults.

At the institutional and author level, City University of Hong Kong leads in publication volume, with research teams led by Chan and Chen demonstrating significant academic activity and contributions. Their research primarily focuses on older adults’ acceptance and usage behaviors of various technologies, including smartphones, smart wearables, and social robots (Chen et al. 2015 ; Li et al. 2019 ; Ma et al. 2016 ). These studies, targeting specific needs and product characteristics of older adults, have developed new models of technology acceptance based on existing frameworks, enhancing the integration of these technologies into their daily lives and laying a foundation for further advancements in the field. Although Tilburg University has a smaller publication output, it holds significant influence in the field of older adults’ technology acceptance. Particularly, the high citation rate of Peek’s studies highlights their excellence in research. Peek extensively explored older adults’ acceptance and usage of home care technologies, revealing the complexity and dynamics of their technology use behaviors. His research spans from identifying systemic influencing factors (Peek et al. 2014 ; Peek et al. 2016 ), emphasizing familial impacts (Luijkx et al. 2015 ), to constructing comprehensive models (Peek et al. 2017 ), and examining the dynamics of long-term usage (Peek et al. 2019 ), fully reflecting the evolving technology landscape and the changing needs of older adults. Additionally, the ongoing contributions of researchers like Ziefle, Rogers, and Wouters in the field of older adults’ technology acceptance demonstrate their research influence and leadership. These researchers have significantly enriched the knowledge base in this area with their diverse perspectives. For instance, Ziefle has uncovered the complex attitudes of older adults towards technology usage, especially the trade-offs between privacy and security, and how different types of activities affect their privacy needs (Maidhof et al. 2023 ; Mujirishvili et al. 2023 ; Schomakers and Ziefle 2023 ; Wilkowska et al. 2022 ), reflecting a deep exploration and ongoing innovation in the field of older adults’ technology acceptance.

Discussion on knowledge base and thematic progress (RQ3)

Through co-citation analysis and a systematic review of the seminal literature, this study reveals the knowledge foundation and thematic progress in the field of older adults' technology acceptance. Co-citation networks and cluster analyses illustrate the structural themes of the research, delineating the differentiation and boundaries within this field, while burst detection analysis offers a valuable perspective for understanding its thematic evolution. The development and innovation of theoretical models are foundational to this research. Researchers enhance the explanatory power of their models by deepening and expanding existing technology acceptance theories to address theoretical limitations. For instance, Heerink et al. (2010) modified and expanded the UTAUT by integrating functional assessment and social interaction variables to create the Almere model, which significantly enhances the ability to explain older users' intentions to use assistive social agents and improves the explanation of actual usage behaviors. Additionally, Chen and Chan (2014) extended the TAM to include age-related health and capability features of older adults, creating the STAM, which substantially improves predictions of older adults' technology usage behaviors. Personal attributes, health and capability features, and facilitating conditions have a direct impact on technology acceptance and predict older adults' technology usage behaviors more effectively than traditional attitudinal factors.

With the advancement of technology and the application of emerging technologies, new research topics have emerged, increasingly focusing on older adults’ acceptance and use of these technologies. Prior to this, the study by Mitzner et al. ( 2010 ) challenged the stereotype of older adults’ conservative attitudes towards technology, highlighting the central roles of usability and usefulness in the technology acceptance process. This discovery laid an important foundation for subsequent research. Research fields such as “smart home technology,” “social life,” and “customer service” are emerging, indicating a shift in focus towards the practical and social applications of technology in older adults’ lives. Research not only focuses on the technology itself but also on how these technologies integrate into older adults’ daily lives and how they can improve the quality of life through technology. For instance, studies such as those by Ma et al. ( 2016 ), Hoque and Sorwar ( 2017 ), and Li et al. ( 2019 ) have explored factors influencing older adults’ use of smartphones, mHealth, and smart wearable devices.

Furthermore, the diversification of research methodologies and innovation in evaluation techniques, such as the use of mixed methods, structural equation modeling (SEM), and neural network (NN) approaches, have enhanced the rigor and reliability of the findings, enabling more precise identification of the factors and mechanisms influencing technology acceptance. Talukder et al. ( 2020 ) employed an effective multimethodological strategy by integrating SEM and NN to leverage the complementary strengths of both approaches, thus overcoming their individual limitations and more accurately analyzing and predicting older adults’ acceptance of wearable health technologies (WHT). SEM is utilized to assess the determinants’ impact on the adoption of WHT, while neural network models validate SEM outcomes and predict the significance of key determinants. This combined approach not only boosts the models’ reliability and explanatory power but also provides a nuanced understanding of the motivations and barriers behind older adults’ acceptance of WHT, offering deep research insights.

Overall, co-citation analysis of the literature in the field of older adults’ technology acceptance has uncovered deeper theoretical modeling and empirical studies on emerging technologies, while emphasizing the importance of research methodological and evaluation innovations in understanding complex social science issues. These findings are crucial for guiding the design and marketing strategies of future technology products, especially in the rapidly growing market of older adults.

Discussion on research hotspots and evolutionary trends (RQ4)

By analyzing core keywords, we can gain deep insights into the hot topics, evolutionary trends, and quality distribution of research in the field of older adults’ technology acceptance. The frequent occurrence of the keywords “TAM” and “UTAUT” indicates that the applicability and theoretical extension of existing technology acceptance models among older adults remain a focal point in academia. This phenomenon underscores the enduring influence of the studies by Davis ( 1989 ) and Venkatesh et al. ( 2003 ), whose models provide a robust theoretical framework for explaining and predicting older adults’ acceptance and usage of emerging technologies. With the widespread application of artificial intelligence (AI) and big data technologies, these theoretical models have incorporated new variables such as perceived risk, trust, and privacy issues (Amin et al. 2024 ; Chen et al. 2024 ; Jing et al. 2024b ; Seibert et al. 2021 ; Wang et al. 2024b ), advancing the theoretical depth and empirical research in this field.

Keyword co-occurrence cluster analysis has revealed multiple research hotspots in the field, including factors influencing technology adoption, interactive experiences between older adults and assistive technologies, the application of mobile health technology in health management, and technology-assisted home care. These studies primarily focus on enhancing the quality of life and health management of older adults through emerging technologies, particularly in the areas of ambient assisted living, smart health monitoring, and intelligent medical care. In these domains, the role of AI technology is increasingly significant (Qian et al. 2021 ; Ho 2020 ). With the evolution of next-generation information technologies, AI is increasingly integrated into elder care systems, offering intelligent, efficient, and personalized service solutions by analyzing the lifestyles and health conditions of older adults. This integration aims to enhance older adults’ quality of life in aspects such as health monitoring and alerts, rehabilitation assistance, daily health management, and emotional support (Lee et al. 2023 ). A survey indicates that 83% of older adults prefer AI-driven solutions when selecting smart products, demonstrating the increasing acceptance of AI in elder care (Zhao and Li 2024 ). Integrating AI into elder care presents both opportunities and challenges, particularly in terms of user acceptance, trust, and long-term usage effects, which warrant further exploration (Mhlanga 2023 ). These studies will help better understand the profound impact of AI technology on the lifestyles of older adults and provide critical references for optimizing AI-driven elder care services.

The Time-zone evolution mapping and burst keyword analysis further reveal the evolutionary trends of research hotspots. Early studies focused on basic technology acceptance models and user perceptions, later expanding to include quality of life and health management. In recent years, research has increasingly focused on cutting-edge technologies such as virtual reality, telehealth, and human-robot interaction, with a concurrent emphasis on the user experience of older adults. This evolutionary process demonstrates a deepening shift from theoretical models to practical applications, underscoring the significant role of technology in enhancing the quality of life for older adults. Furthermore, the strategic coordinate mapping analysis clearly demonstrates the development and mutual influence of different research themes. High centrality and density in the themes of Usage Experience and Assisted Living Technology indicate their mature research status and significant impact on other themes. The themes of Smart Devices, Theoretical Models, and Mobile Health Applications demonstrate self-contained research trends. The themes of Human-Robot Interaction, Characteristics of the Elderly, and Research Methods are not yet mature, but they hold potential for development. Themes of Digital Healthcare Technology, Psychological Factors, and Socio-Cultural Factors are closely related to other themes, displaying core immaturity but significant potential.

In summary, the research hotspots in the field of older adults’ technology acceptance are diverse and dynamic, demonstrating the academic community’s profound understanding of how older adults interact with technology across various life contexts and needs. Under the influence of AI and big data, research should continue to focus on the application of emerging technologies among older adults, exploring in depth how they adapt to and effectively use these technologies. This not only enhances the quality of life and healthcare experiences for older adults but also drives ongoing innovation and development in this field.

Research agenda

Based on the above research findings, to further understand and promote technology acceptance and usage among older adults, we recommend future studies focus on refining theoretical models, exploring long-term usage, and assessing user experience in the following detailed aspects:

Refinement and validation of specific technology acceptance models for older adults: Future research should focus on developing and validating technology acceptance models based on individual characteristics, particularly considering variations in technology acceptance among older adults across different educational levels and cultural backgrounds. This includes factors such as age, gender, educational background, and cultural differences. Additionally, research should examine how well specific technologies, such as wearable devices and mobile health applications, meet the needs of older adults. Building on existing theoretical models, this research should integrate insights from multiple disciplines such as psychology, sociology, design, and engineering through interdisciplinary collaboration to create more accurate and comprehensive models, which should then be validated in relevant contexts.

Deepening the exploration of the relationship between long-term technology use and quality of life among older adults: The acceptance and use of technology by users is a complex and dynamic process (Seuwou et al. 2016). Existing research predominantly focuses on older adults' initial acceptance or short-term use of new technologies; however, the impact of long-term use on their quality of life and health is more significant. Future research should focus on how older adults' experiences and needs evolve during long-term technology use, and on the enduring effects of technology on their social interactions, mental health, and life satisfaction. Longitudinal studies and qualitative analysis can reveal the specific needs and challenges of older adults in long-term technology use, providing a basis for developing technologies and strategies that better meet their requirements. This understanding helps to comprehensively assess the impact of technology on older adults' quality of life and to guide the optimization and improvement of technological products.

Evaluating the importance of user experience in research on older adults' technology acceptance: Understanding the mechanisms of information technology acceptance and use is central to human-computer interaction research. Although technology acceptance models and user experience models differ in their objectives, they share many potential intersections. Technology acceptance research focuses on structured prediction and assessment, while user experience research concentrates on interpreting design impacts and developing new frameworks. Integrating user experience into assessments of older adults' acceptance of technology products and systems is therefore crucial (Codfrey et al. 2022; Wang et al. 2019); for older users in particular, product designs should emphasize practicality and usability (Fisk et al. 2020). Researchers need to explore innovative age-appropriate design methods to enhance older adults' usage experience, including studying their actual usage preferences and behaviors and optimizing user interfaces and interaction designs. Integrating feedback from older adults to tailor products to their needs can further promote their acceptance and continued use of technology products.

Conclusions

This study conducted a systematic review of the literature on older adults’ technology acceptance over the past decade through bibliometric analysis, focusing on the distribution power, research power, knowledge base and theme progress, research hotspots, evolutionary trends, and quality distribution. Using a combination of quantitative and qualitative methods, this study has reached the following conclusions:

Technology acceptance among older adults has become a hot topic in the international academic community, involving the integration of knowledge across multiple disciplines, including Medical Informatics, Health Care Sciences & Services, and Ergonomics. At the journal level, "PSYCHOLOGY, EDUCATION, HEALTH" represents a leading field, with key publications including Computers in Human Behavior, Journal of Medical Internet Research, and International Journal of Human-Computer Interaction. These journals possess significant academic authority and extensive influence in the field.

Research on technology acceptance among older adults is particularly active in developed countries, with China and the USA publishing significantly more than other nations. The Netherlands leads in average citations per article, indicating the depth and impact of its research, while the UK stands out in international collaboration. At the institutional level, City University of Hong Kong and The University of Hong Kong in China are in leading positions, and Tilburg University in the Netherlands demonstrates exceptional research quality through its high average citation count. At the author level, Chen from China has the highest number of publications, while Peek from the Netherlands has the highest average citation count.

Co-citation analysis of references indicates that the knowledge base in this field is divided into three main categories: theoretical model deepening, emerging technology applications, and research methods and evaluation. Seminal literature focuses on four areas: specific technology use by older adults, expansion of theoretical models of technology acceptance, information technology adoption behavior, and research perspectives. Research themes have evolved from initial theoretical deepening and analysis of influencing factors to empirical studies on individual factors and emerging technologies.

Keyword analysis indicates that TAM and UTAUT are the most frequently occurring terms, while "assistive technology" and "virtual reality" are focal points with both high frequency and high centrality. Keyword clustering analysis reveals that research hotspots are concentrated on the influencing factors of technology adoption, human-robot interaction experiences, mobile health management, and technology for aging in place. Time-zone evolution mapping and burst keyword analysis trace the evolution of research from the preliminary exploration of influencing factors, through enhancements in quality of life and health management, to advanced technology applications and the deepening of theoretical models. Furthermore, the analysis of research quality distribution indicates that Usage Experience and Assisted Living Technology have become core topics, while Smart Devices, Theoretical Models, and Mobile Health Applications point towards future research directions.

Through this study, we have systematically reviewed the dynamics, core issues, and evolutionary trends in the field of older adults' technology acceptance, constructing a comprehensive knowledge map of the domain and presenting a clear framework of existing research. This not only lays the foundation for subsequent theoretical discussion and innovative application in the field but also provides an important reference for relevant scholars.

Limitations

To our knowledge, this is the first bibliometric analysis concerning technology acceptance among older adults, and we adhered strictly to bibliometric standards throughout our research. However, this study relies on the Web of Science Core Collection, and while its authority and breadth are widely recognized, this choice may have missed relevant literature published in other significant databases such as PubMed, Scopus, and Google Scholar, potentially overlooking some critical academic contributions. Moreover, given that our analysis was confined to literature in English, it may not reflect studies published in other languages, somewhat limiting the global representativeness of our data sample.

It is noteworthy that with the rapid development of AI technology, its increasingly widespread application in elderly care services is significantly transforming traditional care models. AI is profoundly altering the lifestyles of the elderly, from health monitoring and smart diagnostics to intelligent home systems and personalized care, significantly enhancing their quality of life and health care standards. The potential for AI technology within the elderly population is immense, and research in this area is rapidly expanding. However, due to the restrictive nature of the search terms used in this study, it did not fully cover research in this critical area, particularly in addressing key issues such as trust, privacy, and ethics.

Consequently, future research should not only expand data sources, incorporating multilingual and multidatabase literature, but also particularly focus on exploring older adults’ acceptance of AI technology and its applications, in order to construct a more comprehensive academic landscape of older adults’ technology acceptance, thereby enriching and extending the knowledge system and academic trends in this field.

Data availability

The datasets analyzed during the current study are available in the Dataverse repository: https://doi.org/10.7910/DVN/6K0GJH .

Abdi S, de Witte L, Hawley M (2020) Emerging technologies with potential care and support applications for older people: review of gray literature. JMIR Aging 3(2):e17286. https://doi.org/10.2196/17286

Achuthan K, Nair VK, Kowalski R, Ramanathan S, Raman R (2023) Cyberbullying research—Alignment to sustainable development and impact of COVID-19: Bibliometrics and science mapping analysis. Comput Human Behav 140:107566. https://doi.org/10.1016/j.chb.2022.107566

Article   Google Scholar  

Ahmad A, Mozelius P (2022) Human-Computer Interaction for Older Adults: a Literature Review on Technology Acceptance of eHealth Systems. J Eng Res Sci 1(4):119–126. https://doi.org/10.55708/js0104014

Ale Ebrahim N, Salehi H, Embi MA, Habibi F, Gholizadeh H, Motahar SM (2014) Visibility and citation impact. Int Educ Stud 7(4):120–125. https://doi.org/10.5539/ies.v7n4p120

Amin MS, Johnson VL, Prybutok V, Koh CE (2024) An investigation into factors affecting the willingness to disclose personal health information when using AI-enabled caregiver robots. Ind Manag Data Syst 124(4):1677–1699. https://doi.org/10.1108/IMDS-09-2023-0608

Baer NR, Vietzke J, Schenk L (2022) Middle-aged and older adults’ acceptance of mobile nutrition and fitness apps: a systematic mixed studies review. PLoS One 17(12):e0278879. https://doi.org/10.1371/journal.pone.0278879

Barnard Y, Bradley MD, Hodgson F, Lloyd AD (2013) Learning to use new technologies by older adults: Perceived difficulties, experimentation behaviour and usability. Comput Human Behav 29(4):1715–1724. https://doi.org/10.1016/j.chb.2013.02.006

Berkowsky RW, Sharit J, Czaja SJ (2017) Factors predicting decisions about technology adoption among older adults. Innov Aging 3(1):igy002. https://doi.org/10.1093/geroni/igy002

Braun MT (2013) Obstacles to social networking website use among older adults. Comput Human Behav 29(3):673–680. https://doi.org/10.1016/j.chb.2012.12.004

Article   MathSciNet   Google Scholar  

Campo-Prieto P, Rodríguez-Fuentes G, Cancela-Carral JM (2021) Immersive virtual reality exergame promotes the practice of physical activity in older people: An opportunity during COVID-19. Multimodal Technol Interact 5(9):52. https://doi.org/10.3390/mti5090052

Chen C (2006) CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. J Am Soc Inf Sci Technol 57(3):359–377. https://doi.org/10.1002/asi.20317

Chen C, Dubin R, Kim MC (2014) Emerging trends and new developments in regenerative medicine: a scientometric update (2000–2014). Expert Opin Biol Ther 14(9):1295–1317. https://doi.org/10.1517/14712598.2014.920813

Article   PubMed   Google Scholar  

Chen C, Leydesdorff L (2014) Patterns of connections and movements in dual‐map overlays: A new method of publication portfolio analysis. J Assoc Inf Sci Technol 65(2):334–351. https://doi.org/10.1002/asi.22968

Chen J, Wang C, Tang Y (2022) Knowledge mapping of volunteer motivation: A bibliometric analysis and cross-cultural comparative study. Front Psychol 13:883150. https://doi.org/10.3389/fpsyg.2022.883150

Chen JY, Liu YD, Dai J, Wang CL (2023) Development and status of moral education research: Visual analysis based on knowledge graph. Front Psychol 13:1079955. https://doi.org/10.3389/fpsyg.2022.1079955

Chen K, Chan AH (2011) A review of technology acceptance by older adults. Gerontechnology 10(1):1–12. https://doi.org/10.4017/gt.2011.10.01.006.00

Chen K, Chan AH (2014) Gerontechnology acceptance by elderly Hong Kong Chinese: a senior technology acceptance model (STAM). Ergonomics 57(5):635–652. https://doi.org/10.1080/00140139.2014.895855

Chen K, Zhang Y, Fu X (2019) International research collaboration: An emerging domain of innovation studies? Res Policy 48(1):149–168. https://doi.org/10.1016/j.respol.2018.08.005

Chen X, Hu Z, Wang C (2024) Empowering education development through AIGC: A systematic literature review. Educ Inf Technol 1–53. https://doi.org/10.1007/s10639-024-12549-7

Chen Y, Chen CM, Liu ZY, Hu ZG, Wang XW (2015) The methodology function of CiteSpace mapping knowledge domains. Stud Sci Sci 33(2):242–253. https://doi.org/10.16192/j.cnki.1003-2053.2015.02.009

Codfrey GS, Baharum A, Zain NHM, Omar M, Deris FD (2022) User Experience in Product Design and Development: Perspectives and Strategies. Math Stat Eng Appl 71(2):257–262. https://doi.org/10.17762/msea.v71i2.83

Dai J, Zhang X, Wang CL (2024) A meta-analysis of learners’ continuance intention toward online education platforms. Educ Inf Technol 1–36. https://doi.org/10.1007/s10639-024-12654-7

Davis FD (1989) Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Q 13(3):319–340. https://doi.org/10.2307/249008

Delmastro F, Dolciotti C, Palumbo F, Magrini M, Di Martino F, La Rosa D, Barcaro U (2018) Long-term care: how to improve the quality of life with mobile and e-health services. In 2018 14th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), pp. 12–19. IEEE. https://doi.org/10.1109/WiMOB.2018.8589157

Dupuis K, Tsotsos LE (2018) Technology for remote health monitoring in an older population: a role for mobile devices. Multimodal Technol Interact 2(3):43. https://doi.org/10.3390/mti2030043

Ferguson C, Hickman LD, Turkmani S, Breen P, Gargiulo G, Inglis SC (2021) Wearables only work on patients that wear them”: Barriers and facilitators to the adoption of wearable cardiac monitoring technologies. Cardiovasc Digit Health J 2(2):137–147. https://doi.org/10.1016/j.cvdhj.2021.02.001

Fisk AD, Czaja SJ, Rogers WA, Charness N, Sharit J (2020) Designing for older adults: Principles and creative human factors approaches. CRC Press. https://doi.org/10.1201/9781420080681

Friesen S, Brémault-Phillips S, Rudrum L, Rogers LG (2016) Environmental design that supports healthy aging: Evaluating a new supportive living facility. J Hous Elderly 30(1):18–34. https://doi.org/10.1080/02763893.2015.1129380

Garcia Reyes EP, Kelly R, Buchanan G, Waycott J (2023) Understanding Older Adults’ Experiences With Technologies for Health Self-management: Interview Study. JMIR Aging 6:e43197. https://doi.org/10.2196/43197

Geng Z, Wang J, Liu J, Miao J (2024) Bibliometric analysis of the development, current status, and trends in adult degenerative scoliosis research: A systematic review from 1998 to 2023. J Pain Res 17:153–169. https://doi.org/10.2147/JPR.S437575

González A, Ramírez MP, Viadel V (2012) Attitudes of the elderly toward information and communications technologies. Educ Gerontol 38(9):585–594. https://doi.org/10.1080/03601277.2011.595314

Guner H, Acarturk C (2020) The use and acceptance of ICT by senior citizens: a comparison of technology acceptance model (TAM) for elderly and young adults. Univ Access Inf Soc 19(2):311–330. https://doi.org/10.1007/s10209-018-0642-4

Halim I, Saptari A, Perumal PA, Abdullah Z, Abdullah S, Muhammad MN (2022) A Review on Usability and User Experience of Assistive Social Robots for Older Persons. Int J Integr Eng 14(6):102–124. https://penerbit.uthm.edu.my/ojs/index.php/ijie/article/view/8566

He Y, He Q, Liu Q (2022) Technology acceptance in socially assistive robots: Scoping review of models, measurement, and influencing factors. J Healthc Eng 2022(1):6334732. https://doi.org/10.1155/2022/6334732

Heerink M, Kröse B, Evers V, Wielinga B (2010) Assessing acceptance of assistive social agent technology by older adults: the almere model. Int J Soc Robot 2:361–375. https://doi.org/10.1007/s12369-010-0068-5

Ho A (2020) Are we ready for artificial intelligence health monitoring in elder care? BMC Geriatr 20(1):358. https://doi.org/10.1186/s12877-020-01764-9

Hoque R, Sorwar G (2017) Understanding factors influencing the adoption of mHealth by the elderly: An extension of the UTAUT model. Int J Med Inform 101:75–84. https://doi.org/10.1016/j.ijmedinf.2017.02.002

Hota PK, Subramanian B, Narayanamurthy G (2020) Mapping the intellectual structure of social entrepreneurship research: A citation/co-citation analysis. J Bus Ethics 166(1):89–114. https://doi.org/10.1007/s10551-019-04129-4

Huang R, Yan P, Yang X (2021) Knowledge map visualization of technology hotspots and development trends in China’s textile manufacturing industry. IET Collab Intell Manuf 3(3):243–251. https://doi.org/10.1049/cim2.12024

Article   ADS   Google Scholar  

Jing Y, Wang C, Chen Y, Wang H, Yu T, Shadiev R (2023) Bibliometric mapping techniques in educational technology research: A systematic literature review. Educ Inf Technol 1–29. https://doi.org/10.1007/s10639-023-12178-6

Jing YH, Wang CL, Chen ZY, Shen SS, Shadiev R (2024a) A Bibliometric Analysis of Studies on Technology-Supported Learning Environments: Hotopics and Frontier Evolution. J Comput Assist Learn 1–16. https://doi.org/10.1111/jcal.12934

Jing YH, Wang HM, Chen XJ, Wang CL (2024b) What factors will affect the effectiveness of using ChatGPT to solve programming problems? A quasi-experimental study. Humanit Soc Sci Commun 11:319. https://doi.org/10.1057/s41599-024-02751-w

Kamrani P, Dorsch I, Stock WG (2021) Do researchers know what the h-index is? And how do they estimate its importance? Scientometrics 126(7):5489–5508. https://doi.org/10.1007/s11192-021-03968-1

Kim HS, Lee KH, Kim H, Kim JH (2014) Using mobile phones in healthcare management for the elderly. Maturitas 79(4):381–388. https://doi.org/10.1016/j.maturitas.2014.08.013

Article   MathSciNet   PubMed   Google Scholar  

Kleinberg J (2002) Bursty and hierarchical structure in streams. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 91–101. https://doi.org/10.1145/775047.775061

Kruse C, Fohn J, Wilson N, Patlan EN, Zipp S, Mileski M (2020) Utilization barriers and medical outcomes commensurate with the use of telehealth among older adults: systematic review. JMIR Med Inform 8(8):e20359. https://doi.org/10.2196/20359

Kumar S, Lim WM, Pandey N, Christopher Westland J (2021) 20 years of electronic commerce research. Electron Commer Res 21:1–40. https://doi.org/10.1007/s10660-021-09464-1

Kwiek M (2021) What large-scale publication and citation data tell us about international research collaboration in Europe: Changing national patterns in global contexts. Stud High Educ 46(12):2629–2649. https://doi.org/10.1080/03075079.2020.1749254

Lee C, Coughlin JF (2015) PERSPECTIVE: Older adults’ adoption of technology: an integrated approach to identifying determinants and barriers. J Prod Innov Manag 32(5):747–759. https://doi.org/10.1111/jpim.12176

Lee CH, Wang C, Fan X, Li F, Chen CH (2023) Artificial intelligence-enabled digital transformation in elderly healthcare field: scoping review. Adv Eng Inform 55:101874. https://doi.org/10.1016/j.aei.2023.101874

Leydesdorff L, Rafols I (2012) Interactive overlays: A new method for generating global journal maps from Web-of-Science data. J Informetr 6(2):318–332. https://doi.org/10.1016/j.joi.2011.11.003

Li J, Ma Q, Chan AH, Man S (2019) Health monitoring through wearable technologies for older adults: Smart wearables acceptance model. Appl Ergon 75:162–169. https://doi.org/10.1016/j.apergo.2018.10.006

Article   ADS   PubMed   Google Scholar  

Li X, Zhou D (2020) Product design requirement information visualization approach for intelligent manufacturing services. China Mech Eng 31(07):871, http://www.cmemo.org.cn/EN/Y2020/V31/I07/871

Google Scholar  

Lin Y, Yu Z (2024a) An integrated bibliometric analysis and systematic review modelling students’ technostress in higher education. Behav Inf Technol 1–25. https://doi.org/10.1080/0144929X.2024.2332458

Lin Y, Yu Z (2024b) A bibliometric analysis of artificial intelligence chatbots in educational contexts. Interact Technol Smart Educ 21(2):189–213. https://doi.org/10.1108/ITSE-12-2022-0165

Liu L, Duffy VG (2023) Exploring the future development of Artificial Intelligence (AI) applications in chatbots: a bibliometric analysis. Int J Soc Robot 15(5):703–716. https://doi.org/10.1007/s12369-022-00956-0

Liu R, Li X, Chu J (2022) Evolution of applied variables in the research on technology acceptance of the elderly. In: International Conference on Human-Computer Interaction, Cham: Springer International Publishing, pp 500–520. https://doi.org/10.1007/978-3-031-05581-23_5

Luijkx K, Peek S, Wouters E (2015) “Grandma, you should do it—It’s cool” Older Adults and the Role of Family Members in Their Acceptance of Technology. Int J Environ Res Public Health 12(12):15470–15485. https://doi.org/10.3390/ijerph121214999

Lussier M, Lavoie M, Giroux S, Consel C, Guay M, Macoir J, Bier N (2018) Early detection of mild cognitive impairment with in-home monitoring sensor technologies using functional measures: a systematic review. IEEE J Biomed Health Inform 23(2):838–847. https://doi.org/10.1109/JBHI.2018.2834317

López-Robles JR, Otegi-Olaso JR, Porto Gomez I, Gamboa-Rosales NK, Gamboa-Rosales H, Robles-Berumen H (2018) Bibliometric network analysis to identify the intellectual structure and evolution of the big data research field. In: International Conference on Intelligent Data Engineering and Automated Learning, Cham: Springer International Publishing, pp 113–120. https://doi.org/10.1007/978-3-030-03496-2_13

Ma Q, Chan AH, Chen K (2016) Personal and other factors affecting acceptance of smartphone technology by older Chinese adults. Appl Ergon 54:62–71. https://doi.org/10.1016/j.apergo.2015.11.015

Ma Q, Chan AHS, Teh PL (2021) Insights into Older Adults’ Technology Acceptance through Meta-Analysis. Int J Hum-Comput Interact 37(11):1049–1062. https://doi.org/10.1080/10447318.2020.1865005

Macedo IM (2017) Predicting the acceptance and use of information and communication technology by older adults: An empirical examination of the revised UTAUT2. Comput Human Behav 75:935–948. https://doi.org/10.1016/j.chb.2017.06.013

Maidhof C, Offermann J, Ziefle M (2023) Eyes on privacy: acceptance of video-based AAL impacted by activities being filmed. Front Public Health 11:1186944. https://doi.org/10.3389/fpubh.2023.1186944

Majumder S, Aghayi E, Noferesti M, Memarzadeh-Tehran H, Mondal T, Pang Z, Deen MJ (2017) Smart homes for elderly healthcare—Recent advances and research challenges. Sensors 17(11):2496. https://doi.org/10.3390/s17112496

Article   ADS   PubMed   PubMed Central   Google Scholar  

Mhlanga D (2023) Artificial Intelligence in elderly care: Navigating ethical and responsible AI adoption for seniors. Available at SSRN 4675564. 4675564 min) Identifying citation patterns of scientific breakthroughs: A perspective of dynamic citation process. Inf Process Manag 58(1):102428. https://doi.org/10.1016/j.ipm.2020.102428

Mitzner TL, Boron JB, Fausset CB, Adams AE, Charness N, Czaja SJ, Sharit J (2010) Older adults talk technology: Technology usage and attitudes. Comput Human Behav 26(6):1710–1721. https://doi.org/10.1016/j.chb.2010.06.020

Mitzner TL, Savla J, Boot WR, Sharit J, Charness N, Czaja SJ, Rogers WA (2019) Technology adoption by older adults: Findings from the PRISM trial. Gerontologist 59(1):34–44. https://doi.org/10.1093/geront/gny113

Mongeon P, Paul-Hus A (2016) The journal coverage of Web of Science and Scopus: a comparative analysis. Scientometrics 106:213–228. https://doi.org/10.1007/s11192-015-1765-5

Mostaghel R (2016) Innovation and technology for the elderly: Systematic literature review. J Bus Res 69(11):4896–4900. https://doi.org/10.1016/j.jbusres.2016.04.049

Mujirishvili T, Maidhof C, Florez-Revuelta F, Ziefle M, Richart-Martinez M, Cabrero-García J (2023) Acceptance and privacy perceptions toward video-based active and assisted living technologies: Scoping review. J Med Internet Res 25:e45297. https://doi.org/10.2196/45297

Naseri RNN, Azis SN, Abas N (2023) A Review of Technology Acceptance and Adoption Models in Consumer Study. FIRM J Manage Stud 8(2):188–199. https://doi.org/10.33021/firm.v8i2.4536

Nguyen UP, Hallinger P (2020) Assessing the distinctive contributions of Simulation & Gaming to the literature, 1970–2019: A bibliometric review. Simul Gaming 51(6):744–769. https://doi.org/10.1177/1046878120941569

Olmedo-Aguirre JO, Reyes-Campos J, Alor-Hernández G, Machorro-Cano I, Rodríguez-Mazahua L, Sánchez-Cervantes JL (2022) Remote healthcare for elderly people using wearables: A review. Biosensors 12(2):73. https://doi.org/10.3390/bios12020073

Pan S, Jordan-Marsh M (2010) Internet use intention and adoption among Chinese older adults: From the expanded technology acceptance model perspective. Comput Human Behav 26(5):1111–1119. https://doi.org/10.1016/j.chb.2010.03.015

Pan X, Yan E, Cui M, Hua W (2018) Examining the usage, citation, and diffusion patterns of bibliometric map software: A comparative study of three tools. J Informetr 12(2):481–493. https://doi.org/10.1016/j.joi.2018.03.005

Park JS, Kim NR, Han EJ (2018) Analysis of trends in science and technology using keyword network analysis. J Korea Ind Inf Syst Res 23(2):63–73. https://doi.org/10.9723/jksiis.2018.23.2.063

Peek ST, Luijkx KG, Rijnaard MD, Nieboer ME, Van Der Voort CS, Aarts S, Wouters EJ (2016) Older adults’ reasons for using technology while aging in place. Gerontology 62(2):226–237. https://doi.org/10.1159/000430949

Peek ST, Luijkx KG, Vrijhoef HJ, Nieboer ME, Aarts S, van der Voort CS, Wouters EJ (2017) Origins and consequences of technology acquirement by independent-living seniors: Towards an integrative model. BMC Geriatr 17:1–18. https://doi.org/10.1186/s12877-017-0582-5

Peek ST, Wouters EJ, Van Hoof J, Luijkx KG, Boeije HR, Vrijhoef HJ (2014) Factors influencing acceptance of technology for aging in place: a systematic review. Int J Med Inform 83(4):235–248. https://doi.org/10.1016/j.ijmedinf.2014.01.004

Peek STM, Luijkx KG, Vrijhoef HJM, Nieboer ME, Aarts S, Van Der Voort CS, Wouters EJM (2019) Understanding changes and stability in the long-term use of technologies by seniors who are aging in place: a dynamical framework. BMC Geriatr 19:1–13. https://doi.org/10.1186/s12877-019-1241-9

Perez AJ, Siddiqui F, Zeadally S, Lane D (2023) A review of IoT systems to enable independence for the elderly and disabled individuals. Internet Things 21:100653. https://doi.org/10.1016/j.iot.2022.100653

Piau A, Wild K, Mattek N, Kaye J (2019) Current state of digital biomarker technologies for real-life, home-based monitoring of cognitive function for mild cognitive impairment to mild Alzheimer disease and implications for clinical care: systematic review. J Med Internet Res 21(8):e12785. https://doi.org/10.2196/12785

Pirzada P, Wilde A, Doherty GH, Harris-Birtill D (2022) Ethics and acceptance of smart homes for older adults. Inform Health Soc Care 47(1):10–37. https://doi.org/10.1080/17538157.2021.1923500

Pranckutė R (2021) Web of Science (WoS) and Scopus: The titans of bibliographic information in today’s academic world. Publications 9(1):12. https://doi.org/10.3390/publications9010012

Qian K, Zhang Z, Yamamoto Y, Schuller BW (2021) Artificial intelligence internet of things for the elderly: From assisted living to health-care monitoring. IEEE Signal Process Mag 38(4):78–88. https://doi.org/10.1109/MSP.2021.3057298

Redner S (1998) How popular is your paper? An empirical study of the citation distribution. Eur Phys J B-Condens Matter Complex Syst 4(2):131–134. https://doi.org/10.1007/s100510050359

Sayago S (ed.) (2019) Perspectives on human-computer interaction research with older people. Switzerland: Springer International Publishing. https://doi.org/10.1007/978-3-030-06076-3

Schomakers EM, Ziefle M (2023) Privacy vs. security: trade-offs in the acceptance of smart technologies for aging-in-place. Int J Hum Comput Interact 39(5):1043–1058. https://doi.org/10.1080/10447318.2022.2078463

Schroeder T, Dodds L, Georgiou A, Gewald H, Siette J (2023) Older adults and new technology: Mapping review of the factors associated with older adults’ intention to adopt digital technologies. JMIR Aging 6(1):e44564. https://doi.org/10.2196/44564

Seibert K, Domhoff D, Bruch D, Schulte-Althoff M, Fürstenau D, Biessmann F, Wolf-Ostermann K (2021) Application scenarios for artificial intelligence in nursing care: rapid review. J Med Internet Res 23(11):e26522. https://doi.org/10.2196/26522

Seuwou P, Banissi E, Ubakanma G (2016) User acceptance of information technology: A critical review of technology acceptance models and the decision to invest in Information Security. In: Global Security, Safety and Sustainability-The Security Challenges of the Connected World: 11th International Conference, ICGS3 2017, London, UK, January 18-20, 2017, Proceedings 11:230-251. Springer International Publishing. https://doi.org/10.1007/978-3-319-51064-4_19

Shiau WL, Wang X, Zheng F (2023) What are the trend and core knowledge of information security? A citation and co-citation analysis. Inf Manag 60(3):103774. https://doi.org/10.1016/j.im.2023.103774

Sinha S, Verma A, Tiwari P (2021) Technology: Saving and enriching life during COVID-19. Front Psychol 12:647681. https://doi.org/10.3389/fpsyg.2021.647681

Soar J (2010) The potential of information and communication technologies to support ageing and independent living. Ann Telecommun 65:479–483. https://doi.org/10.1007/s12243-010-0167-1

Strotmann A, Zhao D (2012) Author name disambiguation: What difference does it make in author‐based citation analysis? J Am Soc Inf Sci Technol 63(9):1820–1833. https://doi.org/10.1002/asi.22695

Talukder MS, Sorwar G, Bao Y, Ahmed JU, Palash MAS (2020) Predicting antecedents of wearable healthcare technology acceptance by elderly: A combined SEM-Neural Network approach. Technol Forecast Soc Change 150:119793. https://doi.org/10.1016/j.techfore.2019.119793

Taskin Z, Al U (2019) Natural language processing applications in library and information science. Online Inf Rev 43(4):676–690. https://doi.org/10.1108/oir-07-2018-0217

Touqeer H, Zaman S, Amin R, Hussain M, Al-Turjman F, Bilal M (2021) Smart home security: challenges, issues and solutions at different IoT layers. J Supercomput 77(12):14053–14089. https://doi.org/10.1007/s11227-021-03825-1

United Nations Department of Economic and Social Affairs (2023) World population ageing 2023: Highlights. https://www.un.org/zh/193220

Valk CAL, Lu Y, Randriambelonoro M, Jessen J (2018) Designing for technology acceptance of wearable and mobile technologies for senior citizen users. In: 21st DMI: Academic Design Management Conference (ADMC 2018), Design Management Institute, pp 1361–1373. https://www.dmi.org/page/ADMC2018

Van Eck N, Waltman L (2010) Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 84(2):523–538. https://doi.org/10.1007/s11192-009-0146-3

Vancea M, Solé-Casals J (2016) Population aging in the European Information Societies: towards a comprehensive research agenda in eHealth innovations for elderly. Aging Dis 7(4):526. https://doi.org/10.14336/AD.2015.1214

Venkatesh V, Morris MG, Davis GB, Davis FD (2003) User acceptance of information technology: Toward a unified view. MIS Q 27(3):425–478. https://doi.org/10.2307/30036540

Wagner N, Hassanein K, Head M (2010) Computer use by older adults: A multi-disciplinary review. Comput Human Behav 26(5):870–882. https://doi.org/10.1016/j.chb.2010.03.029

Wahlroos N, Narsakka N, Stolt M, Suhonen R (2023) Physical environment maintaining independence and self-management of older people in long-term care settings—An integrative literature review. J Aging Environ 37(3):295–313. https://doi.org/10.1080/26892618.2022.2092927

Wang CL, Chen XJ, Yu T, Liu YD, Jing YH (2024a) Education reform and change driven by digital technology: a bibliometric study from a global perspective. Humanit Soc Sci Commun 11(1):1–17. https://doi.org/10.1057/s41599-024-02717-y

Wang CL, Dai J, Zhu KK, Yu T, Gu XQ (2023a) Understanding the Continuance Intention of College Students Toward New E-learning Spaces Based on an Integrated Model of the TAM and TTF. Int J Hum-comput Int 1–14. https://doi.org/10.1080/10447318.2023.2291609

Wang CL, Wang HM, Li YY, Dai J, Gu XQ, Yu T (2024b) Factors Influencing University Students’ Behavioral Intention to Use Generative Artificial Intelligence: Integrating the Theory of Planned Behavior and AI Literacy. Int J Hum-comput Int 1–23. https://doi.org/10.1080/10447318.2024.2383033

Wang J, Zhao W, Zhang Z, Liu X, Xie T, Wang L, Zhang Y (2024c) A journey of challenges and victories: a bibliometric worldview of nanomedicine since the 21st century. Adv Mater 36(15):2308915. https://doi.org/10.1002/adma.202308915

Wang J, Chen Y, Huo S, Mai L, Jia F (2023b) Research hotspots and trends of social robot interaction design: A bibliometric analysis. Sensors 23(23):9369. https://doi.org/10.3390/s23239369

Wang KH, Chen G, Chen HG (2017) A model of technology adoption by older adults. Soc Behav Personal 45(4):563–572. https://doi.org/10.2224/sbp.5778

Wang S, Bolling K, Mao W, Reichstadt J, Jeste D, Kim HC, Nebeker C (2019) Technology to Support Aging in Place: Older Adults’ Perspectives. Healthcare 7(2):60. https://doi.org/10.3390/healthcare7020060

Wang Z, Liu D, Sun Y, Pang X, Sun P, Lin F, Ren K (2022) A survey on IoT-enabled home automation systems: Attacks and defenses. IEEE Commun Surv Tutor 24(4):2292–2328. https://doi.org/10.1109/COMST.2022.3201557

Wilkowska W, Offermann J, Spinsante S, Poli A, Ziefle M (2022) Analyzing technology acceptance and perception of privacy in ambient assisted living for using sensor-based technologies. PloS One 17(7):e0269642. https://doi.org/10.1371/journal.pone.0269642

Wilson J, Heinsch M, Betts D, Booth D, Kay-Lambkin F (2021) Barriers and facilitators to the use of e-health by older adults: a scoping review. BMC Public Health 21:1–12. https://doi.org/10.1186/s12889-021-11623-w

Xia YQ, Deng YL, Tao XY, Zhang SN, Wang CL (2024) Digital art exhibitions and psychological well-being in Chinese Generation Z: An analysis based on the S-O-R framework. Humanit Soc Sci Commun 11:266. https://doi.org/10.1057/s41599-024-02718-x

Xie H, Zhang Y, Duan K (2020) Evolutionary overview of urban expansion based on bibliometric analysis in Web of Science from 1990 to 2019. Habitat Int 95:102100. https://doi.org/10.1016/j.habitatint.2019.10210

Xu Z, Ge Z, Wang X, Skare M (2021) Bibliometric analysis of technology adoption literature published from 1997 to 2020. Technol Forecast Soc Change 170:120896. https://doi.org/10.1016/j.techfore.2021.120896

Yap YY, Tan SH, Choon SW (2022) Elderly’s intention to use technologies: a systematic literature review. Heliyon 8(1). https://doi.org/10.1016/j.heliyon.2022.e08765

Yu T, Dai J, Wang CL (2023) Adoption of blended learning: Chinese university students’ perspectives. Humanit Soc Sci Commun 10:390. https://doi.org/10.1057/s41599-023-01904-7

Yusif S, Soar J, Hafeez-Baig A (2016) Older people, assistive technologies, and the barriers to adoption: A systematic review. Int J Med Inform 94:112–116. https://doi.org/10.1016/j.ijmedinf.2016.07.004

Zhang J, Zhu L (2022) Citation recommendation using semantic representation of cited papers’ relations and content. Expert Syst Appl 187:115826. https://doi.org/10.1016/j.eswa.2021.115826

Zhao Y, Li J (2024) Opportunities and challenges of integrating artificial intelligence in China’s elderly care services. Sci Rep 14(1):9254. https://doi.org/10.1038/s41598-024-60067-w

Article   ADS   MathSciNet   PubMed   PubMed Central   Google Scholar  


Acknowledgements

This research was supported by the Social Science Foundation of Shaanxi Province in China (Grant No. 2023J014).

Author information

Authors and Affiliations

School of Art and Design, Shaanxi University of Science and Technology, Xi’an, China

Xianru Shang, Zijian Liu, Chen Gong, Zhigang Hu & Yuexuan Wu

Department of Education Information Technology, Faculty of Education, East China Normal University, Shanghai, China

Chengliang Wang


Contributions

Conceptualization, XS, YW, CW; methodology, XS, ZL, CG, CW; software, XS, CG, YW; writing-original draft preparation, XS, CW; writing-review and editing, XS, CG, ZH, CW; supervision, ZL, ZH, CW; project administration, ZL, ZH, CW; funding acquisition, XS, CG. All authors read and approved the final manuscript. All authors have read and approved the re-submission of the manuscript.

Corresponding author

Correspondence to Chengliang Wang .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Ethical approval

Ethical approval was not required as the study did not involve human participants.

Informed consent

Informed consent was not required as the study did not involve human participants.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .


About this article

Cite this article

Shang, X., Liu, Z., Gong, C. et al. Knowledge mapping and evolution of research on older adults’ technology acceptance: a bibliometric study from 2013 to 2023. Humanit Soc Sci Commun 11 , 1115 (2024). https://doi.org/10.1057/s41599-024-03658-2


Received : 20 June 2024

Accepted : 21 August 2024

Published : 31 August 2024

DOI : https://doi.org/10.1057/s41599-024-03658-2




There are a lot of databases available, so it can be difficult to decide which one to use for your research. The ones on this page have been curated for psychology. See the bottom of the page for the link to the full list of databases.

You can also check out the Best Bets box on the Home page.

  • PsycARTICLES (APA): Full-text journals from the American Psychological Association (APA)

PsycARTICLES (APA) includes a catalog of more than 100 social and behavioral science journals dating back to 1894, covering core psychology topics such as Addiction, Developmental Psychology, and Health Psychology. Journal Snapshots provides details on authors, most-cited articles, and other key journal information, while Journal Browse lets you browse journals alongside their Impact Factors.

  • PsycINFO: Index to the psychological literature from 1887 to the present

PsycINFO is the premier abstracting and indexing database covering the behavioral and social sciences from the authority in psychology. It includes over 5,000,000 peer-reviewed records and spans 600 years of content.

  • JSTOR: Search journals, ebooks, images, and primary sources in the humanities, social sciences, economics, mathematics, and language and literature

JSTOR has a large curated collection of psychology resources, covering a wide range of perspectives and topics. You will find articles published from the 20th century to the present.

  • Project Muse: Over 400 full-text scholarly journals in the fields of the humanities, social sciences, and mathematics

Project Muse provides a curated database of scholarly journals and books for humanities research. The collection includes sources from many of the world's leading university presses and scholarly societies.

Looking for More?

Check out the full list of databases on the A-Z Database page.

  • A-Z Database List: Find the full list of databases that the Goddard Library provides access to.

Goddard Library Guidance!

To view other databases available specifically for psychology resources, you can search the A-Z Database list either by selecting Psychology from the Subjects list (on the left) or by searching for "Psychology" in the search box (on the right).

[Image: the A-Z Database page with Psychology selected in the left drop-down and "psychology" entered in the right search box]

  • Open access
  • Published: 31 August 2024

Mediating role of alexithymia in relationship between cyberbullying and psychotic experiences in adolescents

  • Niloofar Movahedi   ORCID: orcid.org/0009-0006-2355-4729 1 ,
  • Simin Hosseinian   ORCID: orcid.org/0000-0003-1852-3953 1 ,
  • Hamid Rezaeian   ORCID: orcid.org/0000-0002-7665-3273 1 &
  • Roghieh Nooripour   ORCID: orcid.org/0000-0002-5677-0894 1 , 2  

BMC Psychology volume 12, Article number: 465 (2024)


Today, addressing issues related to the use of virtual space is of paramount importance due to its significant impact on mental well-being. This is especially crucial when the study population consists of teenagers who are cyberbullies or their victims, groups with heightened vulnerability. The aim of the present study was to investigate the mediating role of alexithymia in the relationship between cyberbullying and psychotic experiences in adolescents.

The research method employed in this study was correlational, and the study population consisted of all male and female middle school students in Tehran during the 2022–2023 academic year. For data collection, the Cyber-Bullying/Victimization Experiences Questionnaire, the Community Assessment of Psychic Experiences, and the Toronto Alexithymia Scale were applied. A total of 602 participants were gathered using multi-stage cluster sampling in Tehran, Iran: regions of Tehran were selected randomly according to their geographical directions, and then schools and classes were chosen randomly. The sample was included in the analysis after data entry into SPSS and subsequent structural equation modeling using AMOS.

According to the findings, cyberbullying (β = 0.11, p < 0.05) and cyber victimization (β = 0.41, p < 0.001) were significant predictors of psychotic experiences. Alexithymia partially mediated the relationship between cyberbullying and psychotic experiences (mediation effect = 0.28) and the relationship between cyber victimization and psychotic experiences (mediation effect = 0.18).

Conclusions

These findings underscore the importance of identifying cyber victims and cyberbullies in order to prevent alexithymia and psychotic experiences, and thereby to prevent more serious problems such as psychosis in the future.

Trial registration

The goals and conditions of this research were investigated and approved by the Ethics Committee of Alzahra University in Tehran (code: ALZAHRA.REC.1402.055) on 13th September 2023.


Adolescence is the most sensitive period of human development and is associated with important biological, behavioral, and psychological changes [ 1 ]. Ever since Erik Erikson’s Childhood and Society (1950), identity formation has been understood to start in adolescence and to be achieved through harmony between a person’s biological, psychological, and social systems; failure to achieve this harmony leads to maltreatment or harmful relationships [ 2 ]. According to the conflict model, of which Hall is one of the pioneers, conflict and crisis are normal in adolescence, and this period is characterized by its own distress and psychological suffering; these features make the person susceptible to certain disorders. Psychotic experiences can emerge as a sign of the stress of this sensitive period, and if they are not investigated and followed up at the right time, they will have dangerous and long-term effects on adolescents’ mental health [ 2 ]. The average prevalence rate of psychotic experiences in adolescence is nearly one in four students [ 3 ]. At the beginning, these experiences are temporary, but they can become permanent, and affected individuals may suffer from problems such as mental disorders, poor performance, and high health care costs; they also have an increased risk of suicide in the future [ 4 ].

Psychotic experiences refer to psychotic-like experiences that are common in normal people, with symptoms such as hearing unreal voices, magical thinking, delusional symptoms, worry about being harassed by others, or cognitive dysregulation [ 5 ]; if these symptoms are weak in terms of duration, degree of helplessness, intensity, and need for treatment, they are called psychotic experiences [ 6 , 7 ]. On the other hand, cyberbullying and its victimization, with a prevalence rate of 73.5% among adolescents, is one of the most important problems that can also cause psychotic experiences in adolescents [ 8 ]. International statistics show that the prevalence of cyberbullying averages between 14.6% and 52% and the prevalence of cyber victimization between 6.3% and 32% [ 9 ]. According to the research conducted by Shariatpanahi et al. [ 10 ], about 29.82% of high school students had experienced being cyberbullied, 30.90% had made attempts at cyberbullying, and 40.62% had friends who had been cyberbullied [ 10 ]; these statistics show the challenging situation of Iranian teenagers and the importance of research on cyberbullying and its victimization. Moreover, no research has been conducted on psychotic experiences of adolescents in Iran or on the role of cyberbullying and cyber victimization in the prevalence of psychotic experiences.

Today, because of the increasing trend toward cyberspace and Internet addiction, cyberbullying has become more prevalent among teenagers. Adolescents have a great tendency to use information and communication technologies such as the internet and smartphones in order to meet their emotional and communication needs, which are crucial for them [ 11 , 12 ]. On the other hand, some teenagers try to overcome and control their negative feelings or bad environmental conditions by performing risky behaviors in this space [ 13 ], which can lead them to engage in cyberbullying [ 14 ]. In addition, teenagers who lack self-esteem and also spend much time on the internet may become victims of cyberbullying [ 15 ].

Otake and Luo [ 15 ] reported a significant relationship between being a perpetrator or victim of cyberbullying and having psychotic experiences [ 15 ]. Perpetrators of cyberbullying often have low empathy and high aggression, which predisposes them to psychotic experiences [ 16 , 17 ] and to alexithymia, owing to their inability to properly regulate their emotions and their use of avoidant coping strategies to deal with the negative feelings and stress of cyberbullying [ 18 ].

On the other hand, victims of cyberbullying are also affected by the bad memories of this victimization; it can become like a trauma in a person’s life and can cause psychological distress such as depression, stress, and anxiety, which are factors that can lead to psychotic experiences [ 8 , 19 ]. Experiencing trauma and not receiving proper support from others can increase adolescents’ stress, and this situation can cause alexithymia in victims [ 20 ]. Alexithymia is defined as a set of problems related to one’s feelings, including problems in identifying and distinguishing between feelings and bodily sensations, problems in describing and expressing feelings, weakness in imagination, and an externally oriented thinking style, meaning a lack of focus on feelings and thoughts [ 21 ]. Alexithymia can be an inciting factor for psychotic experiences [ 22 ].

The main purpose of this research was to investigate the mediating role of alexithymia in the relationship between cyberbullying and psychotic experiences in adolescents; this is the first time that the relationship among these three variables and their model has been examined. Another purpose was to measure, predict, and control the effect of committing cyberbullying, or being victimized by it, on adolescents’ psychotic experiences. Moreover, adolescents who have psychotic experiences are more likely to suffer from psychosis and cognitive dysregulation in the future; therefore, identifying and controlling these factors is essential [ 2 ]. This research is also important because it is the first study of psychotic experiences among Iranian adolescents. The research was conducted using structural equation modeling, with the following variables: independent variables (cyberbullying and cyber victimization), a dependent variable (psychotic experiences), and a mediator variable (alexithymia). The main assumption of this research is that alexithymia mediates the relationship of cyberbullying and cyber victimization with psychotic experiences.

Design and participants

To start this research, the goals and conditions of its implementation were investigated and approved by the Ethics Committee of Alzahra University. The necessary permissions were then obtained from the Tehran Department of Education and the relevant school administrators. The study population consisted of all male and female middle school and high school students in Tehran during the 2022–2023 academic year. Multi-stage cluster sampling was used: four regions were selected according to the geographical directions of north, south, west, and east in Tehran; four schools were then randomly chosen from each selected region, and four classes were chosen from each school. After the school headmasters had reviewed the questionnaires and questions, they obtained informed consent from the parents or legal guardians of the students in the chosen classes. The researcher then visited each school on the day specified by the headmaster, went to the selected classes, explained the entry conditions, which consisted of not having experienced parental divorce or the death of a family member in the last 6 months and having informed consent from parents or legal guardians, and distributed the questionnaires. After the questionnaires were collected, 36 of the 700 copies were unusable and were removed from the study. Among the 663 remaining questionnaires, outlier data were removed, and finally 602 samples (females = 48%, n = 289; males = 52%, n = 313) were deemed suitable for evaluation; the age range of the sample was 13–18 years (age less than 14 = 8%, age 14 = 10.3%, age 15 = 22.8%, age 16 = 33.1%, age 17 = 19.1%, and age 18 and older = 6.8%). Initial validations were performed using SPSS 26 and AMOS 24.
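As a rough, hypothetical illustration of the multi-stage cluster sampling described above (regions, then schools, then classes), the sketch below mimics the selection steps with placeholder region, school, and class labels; it is not the authors' sampling frame or code.

```python
import random

random.seed(42)  # reproducible illustration only

# Hypothetical sampling frame: placeholder labels, not real data.
regions = ["North", "South", "West", "East"]

sampled_classes = []
for region in regions:                                # stage 1: the four geographical regions
    schools = [f"{region}-school-{i}" for i in range(1, 11)]
    for school in random.sample(schools, 4):          # stage 2: four schools per region
        classes = [f"{school}-class-{i}" for i in range(1, 7)]
        sampled_classes += random.sample(classes, 4)  # stage 3: four classes per school

print(len(sampled_classes))  # 4 regions x 4 schools x 4 classes = 64 classes
```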

Instruments

Cyber-Bullying/Victimization Experiences Questionnaire (CBVEQ-G)

This scale was designed by Antoniadou et al. [ 23 ]. It consists of 24 items and 2 subscales, one detecting cyber victims and the other cyberbullying offenders, with 12 items each. A 5-point Likert scale (1 = never, 2 = once or twice, 3 = sometimes, 4 = most of the time, 5 = daily) is used for scoring. The points for the items are added together to obtain the score of each subscale. The score can range from 1 to 24. A high score on the cyber victimization subscale means that a person is exposed to cyberbullying a lot, and a high score on the cyberbullying subscale means a high level of perpetrating cyberbullying. The Cronbach’s alpha reported by Antoniadou et al. is 0.89 for cyberbullying and 0.80 for cyber victimization, with a good fit (CFI = 0.97, TLI = 0.97, RMSEA = 0.031). The Cronbach’s alpha reported for this questionnaire in Iran is 0.75 for cyberbullying and 0.78 for cyber victimization, and confirmatory factor analysis showed a good fit (CFI = 0.92, NFI = 0.91, RMSEA = 0.071), as reported by Basharpoor and Zardi [ 24 ]. In the present study, reliability calculated with Cronbach’s alpha coefficient was 0.86 for cyberbullying and 0.87 for cyber victimization, and fit was CFI = 0.93, NFI = 0.91, RMSEA = 0.07 for cyber victimization and CFI = 0.95, NFI = 0.93, RMSEA = 0.06 for cyberbullying.
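As a minimal sketch of the subscale scoring described above, the snippet below sums the 12 items of each subscale; the column names (cb_1 to cb_12 and cv_1 to cv_12) are hypothetical placeholders, not the questionnaire's actual item labels.

```python
import pandas as pd

# Hypothetical layout: cb_1..cb_12 are the cyberbullying items, cv_1..cv_12 the
# cyber-victimization items, each coded 1 (never) to 5 (daily).
def score_cbveq(responses: pd.DataFrame) -> pd.DataFrame:
    scores = pd.DataFrame(index=responses.index)
    scores["cyberbullying"] = responses[[f"cb_{i}" for i in range(1, 13)]].sum(axis=1)
    scores["cyber_victimization"] = responses[[f"cv_{i}" for i in range(1, 13)]].sum(axis=1)
    return scores

# Two illustrative respondents who answer "never" (1) and "sometimes" (3) to every item:
demo = pd.DataFrame({f"{p}_{i}": [1, 3] for p in ("cb", "cv") for i in range(1, 13)})
print(score_cbveq(demo))  # subscale sums: 12 and 36 on each subscale
```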

Community Assessment of Psychic Experiences (CAPE-P15)

This scale is a short form of the CAPE-42 and was developed by Capra et al. [ 25 ]; it has 15 items. It is used as a screening instrument for detecting individuals at ultra-high risk for psychosis. A 4-point Likert scale (1 = never, 2 = sometimes, 3 = often, 4 = almost always) is used for scoring. The subscales of this questionnaire are Persecutory Ideation (5 items), Bizarre Experiences (7 items), and Perceptual Abnormalities (3 items). To score the questionnaire, the points are added together based on the selected Likert responses. Scores can range from 15 to 60, and the weighted score is calculated by dividing the sum of the scores by the number of answered questions; weighted scores range from 1 to 4. A cut-off of 1.7 on the weighted score is defined for psychotic experiences in normal people; people who score above the cut-off have alarming psychotic experiences. The Cronbach’s alpha of this scale was reported as 0.88 by Capra et al., and good fit was achieved on all criteria (CFI = 0.94, NFI = 0.90, RMSEA = 0.048). The Cronbach’s alpha for the CAPE-42 in Iran was reported as 0.93, and confirmatory factor analysis demonstrated adequate fit (CFI = 0.95, NFI = 0.94, RMSEA = 0.077), as reported by Mirzaee et al. [ 26 ]. In the present study, the Cronbach’s alpha was calculated as 0.88, with fit CFI = 0.97, NFI = 0.94, RMSEA = 0.041.
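The weighted-score rule above (item sum divided by the number of answered items, flagged against the 1.7 cut-off) is simple arithmetic; the sketch below applies it with hypothetical column names (cape_1 to cape_15).

```python
import pandas as pd

CUTOFF = 1.7  # weighted-score cut-off for alarming psychotic experiences

# Hypothetical layout: cape_1..cape_15, coded 1 (never) to 4 (almost always);
# unanswered items are NaN so the weighted score uses only answered questions.
def cape_p15_weighted(responses: pd.DataFrame) -> pd.Series:
    items = responses[[f"cape_{i}" for i in range(1, 16)]]
    return items.sum(axis=1, skipna=True) / items.notna().sum(axis=1)

demo = pd.DataFrame({f"cape_{i}": [1, 2] for i in range(1, 16)})  # two illustrative respondents
weighted = cape_p15_weighted(demo)
print(weighted)           # 1.0 and 2.0
print(weighted > CUTOFF)  # False, True: only the second respondent exceeds the cut-off
```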

Toronto Alexithymia Scale (TAS-20)

This scale was proposed by Bagby et al. [ 27 ] and is the most frequently and widely used measure of alexithymia. It consists of three subscales: difficulty identifying feelings (7 items), difficulty describing feelings (5 items), and externally oriented thinking (8 items). Participants answer 20 questions on a 5-point Likert scale (1 = strongly disagree, 2 = disagree, 3 = neither agree nor disagree, 4 = agree, 5 = strongly agree). To score the questionnaire, the scores of items 4, 5, 10, 18, and 19 are first reversed (1 = 5, 2 = 4, 4 = 2, 5 = 1, 3 = 3), and then the points for all items are added together. According to the recommendation of the scale’s developers, a total score above 60 is considered the cut-off point for diagnosing alexithymia. Cronbach’s alpha for this questionnaire was reported by Bagby et al. as 0.85, and in Iran Besharat reported it as 0.85 [ 28 ], with good fit (CFI = 0.92, NNFI = 0.93, RMSEA = 0.05). In this study, Cronbach’s alpha was calculated as 0.92, with fit CFI = 0.92, NFI = 0.90, RMSEA = 0.069.
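A minimal sketch of this scoring rule, with hypothetical column names (tas_1 to tas_20), reverse-coding the five listed items before summing and comparing against the cut-off of 60:

```python
import pandas as pd

REVERSED = [4, 5, 10, 18, 19]  # items reverse-scored on the 1-5 Likert scale
CUTOFF = 60                    # total scores above 60 suggest alexithymia

def score_tas20(responses: pd.DataFrame) -> pd.Series:
    items = responses[[f"tas_{i}" for i in range(1, 21)]].copy()
    for i in REVERSED:
        items[f"tas_{i}"] = 6 - items[f"tas_{i}"]  # 1<->5, 2<->4, 3 stays 3
    return items.sum(axis=1)

demo = pd.DataFrame({f"tas_{i}": [5, 2] for i in range(1, 21)})  # two illustrative respondents
total = score_tas20(demo)
print(total)           # 80 (15*5 + 5*1) and 50 (15*2 + 5*4)
print(total > CUTOFF)  # True, False
```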

Analysis method

Descriptive analysis of this research was performed with SPSS 26. First, the missing data were examined, and 15 questionnaires with more than 5% missing data were excluded from the calculations. The remaining questionnaires, which had less than 5% missing data, were handled using the listwise deletion procedure [ 29 ]. Outlier data were identified with box plots in SPSS, so 47 questionnaires were removed and 602 suitable samples remained. To confirm the appropriateness of the outlier removal, the Mahalanobis distance was checked: the largest Mahalanobis distance reported in AMOS (231.248) was divided by the number of questions (59), giving a value of 3.9, which is below the cut-off point of 4 and indicates that there are no multivariate outliers. Normality of the data was checked using the skewness and kurtosis values of the subscales (Table 1); the mean, standard deviation, and Cronbach’s alpha of all subscales of the questionnaires used in this research are also available in Table 1. To evaluate multicollinearity among the variables, the variance inflation factor was checked; for cyber victimization and cyberbullying it was 2.397, which is below the cut-off point of 5 and shows that there is no multicollinearity between the independent variables.
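The authors ran these diagnostics in SPSS and AMOS. As a rough open-source analogue, the sketch below computes the same two quantities (the largest squared Mahalanobis distance divided by the number of variables, against a cut-off of 4, and variance inflation factors, against a cut-off of 5) for a generic DataFrame of observed scores; the column names and the random demo data are placeholders, not the study data.

```python
import numpy as np
import pandas as pd

def max_scaled_mahalanobis(df: pd.DataFrame) -> float:
    """Largest squared Mahalanobis distance divided by the number of variables."""
    X = df.to_numpy(dtype=float)
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)  # squared Mahalanobis distances
    return d2.max() / X.shape[1]                         # values below ~4 suggest no multivariate outliers

def vif(df: pd.DataFrame) -> pd.Series:
    """Variance inflation factor for each column, via R^2 of regressing it on the others."""
    out = {}
    for col in df.columns:
        y = df[col].to_numpy(dtype=float)
        X = np.column_stack([np.ones(len(df)), df.drop(columns=col).to_numpy(dtype=float)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        r2 = 1 - resid.var() / y.var()
        out[col] = 1 / (1 - r2)                          # VIF below 5 indicates low multicollinearity
    return pd.Series(out)

# Example with random placeholder data standing in for the study's total scores:
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 3)),
                    columns=["cyberbullying", "cyber_victimization", "alexithymia"])
print(round(max_scaled_mahalanobis(demo), 2))
print(vif(demo[["cyberbullying", "cyber_victimization"]]).round(2))
```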

Descriptive analysis

The bivariate correlations of all variables were calculated and are available in Table 2. The data show that cyberbullying was positively correlated with alexithymia ( r  = 0.37, p  < 0.01) and cyber victimization with alexithymia ( r  = 0.41, p  < 0.01). Alexithymia was also significantly and positively correlated with the dependent variable ( r  = 0.36, p  < 0.01). Structural equation modeling was then conducted in AMOS in order to examine the model fit and the mediation model of the variables.
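For readers reproducing this kind of table outside SPSS, the sketch below builds a bivariate Pearson correlation table with p-values; the DataFrame and its column names are hypothetical stand-ins for the study's total scores.

```python
import pandas as pd
from scipy import stats

def correlation_table(scores: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Pearson correlations (r and p) for every pair of score columns."""
    cols = list(scores.columns)
    rows = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r, p = stats.pearsonr(scores[a], scores[b])
            rows.append({"pair": f"{a} ~ {b}", "r": round(r, 2), "p": round(p, 4)})
    return pd.DataFrame(rows)

demo = pd.DataFrame({"cyberbullying": [12, 15, 20, 33, 40],
                     "alexithymia": [45, 50, 48, 70, 66]})  # placeholder values
print(correlation_table(demo))
```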

Confirmatory factor analysis

The factor loadings of all items of the questionnaires were checked in AMOS, and all of them were in the appropriate range (greater than 0.4); therefore, none of the items was deleted.

Measurement model fit

The values of the normalized chi-square (CMIN/DF), RMSEA, NFI, TLI, CFI, and GFI of the model and the scales were checked (Table 3). According to the literature, the value of the normalized chi-square should be less than 5, the root mean square error of approximation (RMSEA) should be between 0.03 and 0.08, and the comparative fit index (CFI), goodness-of-fit index (GFI), and normalized fit index (NFI) should all be higher than 0.9 [ 30 ]. All the scales used in this research had good validity and reliability and were suitable for structural equation modeling. The fit of the research model was CMIN/DF = 2.09, CFI = 0.9, TLI = 0.88, RMSEA = 0.04, and all standardized factor loadings were significant ( p  < 0.001).
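The model itself was estimated in AMOS; as a rough open-source stand-in, the hedged sketch below specifies a simplified version of the model in lavaan-style syntax with the semopy package and asks it for the same family of fit indices. The variable names in the model string are placeholders for subscale and total scores, not the authors' actual item-level model or syntax.

```python
import pandas as pd
import semopy  # pip install semopy

# Simplified, hypothetical specification: alexithymia and psychotic experiences as latent
# variables measured by their subscale scores; cyberbullying and cyber victimization as
# observed predictors (all column names are placeholders).
MODEL_DESC = """
alexithymia =~ dif_identify + dif_describe + ext_thinking
psychotic =~ persecutory + bizarre + perceptual
alexithymia ~ cyberbullying + cyber_victimization
psychotic ~ cyberbullying + cyber_victimization + alexithymia
"""

def fit_and_report(data: pd.DataFrame) -> pd.DataFrame:
    model = semopy.Model(MODEL_DESC)
    model.fit(data)                   # maximum-likelihood estimation by default
    return semopy.calc_stats(model)   # one-row table including chi2, CFI, TLI, NFI, GFI, RMSEA
```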

Structural model

The structural model was estimated in order to investigate the linear relationships and beta coefficients between exogenous and endogenous variables and to calculate the coefficient of determination (Fig. 1). Based on the results of the structural model, cyberbullying (β = 0.10, p  < 0.05) and alexithymia (β = 0.33, p  < 0.001) positively predicted the psychotic experiences of adolescents. There was also a positive relationship between cyber victimization (β = 0.41, p  < 0.001) and psychotic experiences. Together, these variables explain 50% of the variance of psychotic experiences among Iranian students.

Fig. 1 Structural model of psychotic experiences

Mediation model

To determine the type of mediation in the research model, the significance of the relationships was investigated and a comparison was performed between the full and indirect models. The results showed that the research model was partially mediated by alexithymia, meaning that part of the effect of the independent variables on the dependent variable depended on the presence of the mediator. There is a positive and significant direct relationship between cyber victimization and psychotic experiences (β = 0.64, C.R. = 9.81, p  < 0.001), and the standardized coefficient decreased in the full model (β = 0.49, C.R. = 8.43, p  < 0.001), which means that the mediator affected the direct relationship between cyber victimization and psychotic experiences. The mediation effect for cyber victimization in this model was equal to 0.18.

In the relationship between committing cyberbullying and psychotic experiences, the direct effect (β = 0.52, C.R. = 8.26, p  < 0.001) decreased to β = 0.38 (C.R. = 7.97, p  < 0.001) in the full model, which means that there is partial mediation between these two variables. The mediation effect for this path was 0.28.
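The mediation comparison above was carried out in AMOS. For readers without AMOS, the indirect (a*b) effect and a percentile-bootstrap confidence interval can be approximated with two ordinary least-squares regressions, as in the hedged sketch below; this is a regression-based approximation, not the authors' latent-variable model, and the column names are placeholders.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def indirect_effect(df: pd.DataFrame, x: str, m: str, y: str) -> float:
    """a*b indirect effect of x on y through m, from two OLS regressions."""
    a = sm.OLS(df[m], sm.add_constant(df[[x]])).fit().params[x]     # x -> m
    b = sm.OLS(df[y], sm.add_constant(df[[x, m]])).fit().params[m]  # m -> y, controlling for x
    return a * b

def bootstrap_ci(df: pd.DataFrame, x: str, m: str, y: str, n_boot: int = 2000, seed: int = 0):
    rng = np.random.default_rng(seed)
    n = len(df)
    est = []
    for _ in range(n_boot):
        boot = df.iloc[rng.integers(0, n, size=n)]  # resample rows with replacement
        est.append(indirect_effect(boot, x, m, y))
    return np.percentile(est, [2.5, 97.5])          # 95% percentile bootstrap CI

# Usage with placeholder column names for the study's total scores:
# print(indirect_effect(scores, "cyber_victimization", "alexithymia", "psychotic_experiences"))
# print(bootstrap_ci(scores, "cyber_victimization", "alexithymia", "psychotic_experiences"))
```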

The current research was conducted in Iran with middle and high school students to examine the effect of cyberbullying on psychotic experiences in adolescents, one of the most important and sensitive groups in any society. The research yielded several findings. Investigating the relationship of committing cyberbullying and being victimized by it with psychotic experiences was the first purpose of this study; the result is similar to that of Otake and Luo’s [ 15 ] research, which indicated that both committing cyberbullying and cyber victimization are related to psychotic experiences [ 15 ].

Cyber victimization had a positive relationship with psychotic experiences ( r  = 0.51, p  < 0.01) in this research, which is compatible with the results of Fekih-Romdhane et al. [ 8 ]; to explain this relationship, they noted that cyber victimization is often an unpleasant experience that can cause stress and trauma [ 8 ], while Turner et al. [ 31 ] found that trauma and stress are important factors that can cause psychotic experiences in adolescents [ 31 ]. Depending on the type of trauma, it can affect the person to different degrees; for example, it has been shown that traumas caused by interpersonal violence or by being ignored by parents are more impactful than other traumas and are more likely to cause psychotic experiences [ 32 ]. This relationship was delineated by Croft et al. [ 32 ] and Hartley et al. [ 33 ] as follows: cyber victims may suffer from distrust, pessimism, and rumination, symptoms that are related to psychotic experiences [ 33 , 34 ]; these symptoms can last for a long time and lead to social anxiety, illusions, or strange behaviors.

On the other hand, there is also a relationship between committing cyberbullying and psychotic experiences ( r  = 0.43, p  < 0.01), and this result is compatible with the research of García-Vázquez et al. [ 35 ], Aricak [ 36 ], and Connolly and Moore [ 37 ], which indicates that most perpetrators of cyberbullying suffer from a high level of aggression and psychosis [ 35 , 36 , 37 ]; Fekih-Romdhane et al. (2023) noted that these features make perpetrators more prone to psychotic experiences [ 16 ].

The second purpose of the research was investigating the relationship of cyberbullying and cyber victimization with alexithymia; these relationships were confirmed by the results of this research. Our findings support the previous interpretations of Aricak and Ozbay [ 38 ], Espinoza [ 39 ], and Wachs et al. [ 40 ] and are compatible with them. Wachs et al. [ 40 ] reported a positive relationship between cyber victimization and alexithymia. Eichhorn et al. [ 20 ] argued that victims of cyberbullying who have experienced trauma and do not receive adequate support try to overcome their stress and other negative emotions by avoiding and ignoring them, so they become more susceptible to alexithymia in the future [ 20 ]. Also, according to Fekih-Romdhane et al. [ 8 ], cyber victimization decreases self-esteem [ 8 ], which can cause problems such as low assertiveness and poor emotional expression, which in turn can lead to alexithymia [ 41 ]. On the other hand, Levantini et al. [ 42 ] believed that perpetrators of cyberbullying who have low psychological, emotional, and social adjustment use alexithymia as a defense mechanism to minimize their emotional conflicts by not empathizing and by ignoring their feelings, so they have more difficulty in identifying and describing their emotions; gradually, this lack of empathy may become a personality trait [ 42 ]. Our finding in this scope is congruent with the findings of Wachs and Wright [ 43 ], Aricak and Ozbay [ 38 ], and Wachs et al. [ 44 ].

According to this research, the relationship between alexithymia and psychotic experiences was confirmed (β = 0.36, p  < 0.001), which is compatible with the result of the research done by Poza [ 22 ]. This can be explained as follows: people need to identify and express their feelings in order to manage them better, but this is difficult for alexithymic persons because they cannot identify and describe their feelings in difficult situations and also have problems conveying their feelings to others in an appropriate way; for this reason, they endure psychological pressure and stress that can make them more prone to psychotic experiences [ 22 ]. This positive and significant relationship is aligned with the result of the van der Velde et al. (2015) research [ 45 ].

At the subscale level, all subscales of the related variables were also significantly and positively correlated with one another, with the exception of externally oriented thinking (a subscale of alexithymia), which had only a weak relationship with the bizarre experiences subscale of psychotic experiences and no relationship with the other subscales (Table 2); this subscale has been associated with deficits in cognitive processing, which may account for its link to bizarre experiences [46].
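A subscale-by-subscale pattern like the one summarized in Table 2 is usually inspected as a correlation matrix. The following sketch, again with hypothetical subscale column names, shows one way such a matrix could be produced; it is an illustration rather than the authors' code.

    import pandas as pd

    df = pd.read_csv("cyber_data.csv")  # hypothetical file
    subscales = ["difficulty_identifying_feelings", "difficulty_describing_feelings",
                 "externally_oriented_thinking", "bizarre_experiences"]  # assumed labels
    print(df[subscales].corr(method="pearson").round(2))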

A limitation of this research is its cross-sectional design; such studies can only describe relationships between variables, and longitudinal studies are needed to reach more accurate conclusions, to measure other influencing factors, and to understand their effects. In addition, the participants all lived in the city of Tehran, which may shape both their access to facilities and their cultural background, although we attempted to collect samples from all regions and all socioeconomic levels of Tehran. A further limitation is the age group of the participants: while focusing on adolescents yielded valuable information about this group, it also limits the generalizability of the results.

Teenagers who have been victims of cyberbullying, or who have perpetrated it, may experience more negative thoughts, unpleasant feelings, and inner anger; in trying to suppress these feelings, they can become susceptible to alexithymia. The combination of these painful memories and feelings with a poor understanding of one's emotions can give rise to strange and illusory experiences and to suspiciousness, which are contributing factors to psychotic experiences, especially in teenagers, one of the most vulnerable groups in society.

The findings of this study highlight the need for school-based intervention programs to prevent cyberbullying and cyber victimization, thereby protecting adolescents against alexithymia and psychotic experiences. Students need to learn, with the help of psychologists, coping strategies for the trauma caused by cyberbullying victimization in order to prevent alexithymia. Educational planning, or the preparation of practical training packages, can also help alexithymic individuals manage the condition by promoting their emotional intelligence through techniques such as mindfulness, which may in turn protect them against psychotic experiences.

Data availability

The data that support the findings of this study have been deposited at: https://figshare.com/articles/dataset/cyberData_sav/25399264 .

Abbreviations

CAPE: Community Assessment of Psychic Experiences
CBVEQ: Cyber-Bullying/Victimization Experiences Questionnaire
CFI: Comparative Fit Index
GFI: Goodness of Fit Index
NFI: Normalized Fit Index
RMSEA: Root Mean Square Error of Approximation
TAS: Toronto Alexithymia Scale

Nooripour R, Hosseinian S, Ghanbari N, Wisniewski P, Sikström S. Validity and reliability of the Persian version of the cyber-bullying/victimization experience questionnaire (CBVEQ) among Iranian adolescents. Int J Bullying Prev. 2024. https://doi.org/10.1007/s42380-024-00211-2

Chatlos JC. Adolescent identity formation versus spiritual transformation. Zygon. 2023;58(1):156–82. https://doi.org/10.1111/zygo.12862

Fekih-Romdhane F, Pandi-Perumal SR, Conus P, Krebs MO, Cheour M, Seeman MV, et al. Prevalence and risk factors of self-reported psychotic experiences among high school and college students: a systematic review, meta-analysis, and meta-regression. Acta Psychiatr Scand. 2022;146(6):492–514. https://doi.org/10.1111/acps.13494

Staines L, Healy C, Murphy F, Byrne J, Murphy J, Kelleher I et al. Incidence and persistence of psychotic experiences in the general population: systematic review and meta-analysis. Schizophr Bull. 2023;49(4):1007–21. https://doi.org/10.1093/schbul/sbad056

Yung AR, Lin A. Psychotic experiences and their significance. World Psychiatry. 2016;15:130–1.

Remberk B. [Clinical significance of psychotic-like experiences in children and adolescents]. Psychiatr Pol. 2017;51(2):271–82. (Article in Polish.)


Cohen AS, Mohr C, Ettinger U, Chan RCK, Park S. Schizotypy as an organizing framework for social and affective sciences. Schizophr Bull. 2015;4:S427–35. https://doi.org/10.1093/schbul/sbu195

Fekih-Romdhane F, Stambouli M, Malaeb D, Farah N, Cheour M, Obeid S. Insomnia and distress as mediators on the relationship from cyber-victimization to self-reported psychotic experiences. BMC Psychiatry. 2023;23(1):1–11. https://doi.org/10.1186/s12888-023-05019-w

Zhu C, Huang S, Evans R, Zhang W. Cyberbullying among adolescents and children: a comprehensive review of the global situation, risk factors, and preventive measures. Front Public Health. 2021;9:634909.


Shariatpanahi G, Tahouri K, Asadabadi M, Sayarifard A. Cyberbullying and its contributing factors among Iranian adolescents. 2021;10(3).

Lin S, Mastrokoukou S, Longobardi C. Social relationships and social media addiction among adolescents: variable-centered and person-centered approaches. Comput Hum Behav. 2023;147:107840.

Dolev-Cohen M, Barak A. Adolescents’ use of instant messaging as a means of emotional relief. Comput Hum Behav. 2013;29(1):58–63.

Lu H, Fu G. The Effect of Alexithymia on adolescent risk-taking behavior. Int J Humanit Soc Sci Educ. 2022;9(8):20–8.

Gámez-Guadix M, Borrajo E, Almendros C. Risky online behaviors among adolescents: longitudinal relations among problematic Internet use, cyberbullying perpetration, and meeting strangers online. J Behav Addict. 2016;5(1):100–7. https://akjournals.com/view/journals/2006/5/1/article-p100.xml

Otake Y, Luo X. Psychotic-like experiences associated with cyber and traditional bullying. Health Behav Policy Rev. 2019;6(2):192–8.

Fekih-Romdhane F, Malaeb D, Loch AA, Farah N, Obeid S, Hallit S. Insomnia mediates the relationship between aggression indicators and positive psychotic experiences in a large community-based adult sample. Int J Ment Health Addict. 2023;1–22. https://doi.org/10.1007/s11469-023-01044-8

Montag C, Brandt L, Lehmann A, De Millas W, Falkai P, Gaebel W, et al. Cognitive and emotional empathy in individuals at clinical high risk of psychosis. Acta Psychiatr Scand. 2020;142(1):40–51. https://doi.org/10.1111/acps.13178

Scale V, Scale C, Scale A. 04 Bünyamin ATEŞ-Alican KAYA. 2022;90(Bahar).

MacKie CJ, O’Leary-Barrett M, Al-Khudhairy N, Castellanos-Ryan N, Struve M, Topper L et al. Adolescent bullying, cannabis use and emerging psychotic experiences: a longitudinal general population study. Psychol Med. 2013;43(5):1033–44.

Eichhorn S, Brähler E, Franz M, Friedrich M, Glaesmer H. Traumatic experiences, alexithymia, and posttraumatic symptomatology: a cross-sectional population-based study in Germany. Eur J Psychotraumatol. 2014;5(1). https://doi.org/10.3402/ejpt.v5.23870

Preece DA, Gross JJ. Conceptualizing alexithymia. Pers Individ Dif. 2023;215:112375.

Pozza A. The role of aberrant salience and alexithymia in psychotic experiences of non-treatment-seeking adolescent immigrants compared with natives. Neuropsychiatr Dis Treat. 2019;15:2057–61.

Antoniadou N, Kokkinos CM, Markos A. Development, construct validation and measurement invariance of the Greek cyber-bullying/victimization experiences questionnaire (CBVEQ-G). Comput Hum Behav. 2016;65:380–90.

Basharpoor S, Zardi B. Psychometric Properties of Cyber-Bullying/Victimization Experiences Questionnaire (CBVEQ) in Students. J Sch Psychol [Internet]. 2019 May 22 [cited 2023 Oct 15];8(1):43–57. https://jsp.uma.ac.ir/article_795_en.html

Capra C, Kavanagh DJ, Hides L, Scott J. Brief screening for psychosis-like experiences. Schizophr Res. 2013;149(1–3):104–7.

Mirzaei Poueenak F, Ghanbari Pirkashani N, Nooripour R, Hosseini SR, Mazloomzadeh M, Shirkhani M. Psychometric validation of the Persian version of the community assessment of psychotic experiences-42 (CAPE-42) in Iranian college students. Psychosis. 2022;14(1):81–92. https://doi.org/10.1080/17522439.2020.1861075

Bagby RM, Parker JDA, Taylor GJ. The twenty-item Toronto Alexithymia scale—I. item selection and cross-validation of the factor structure. J Psychosom Res. 1994;38(1):23–32.

Besharat MA. Reliability and factorial validity of a Farsi version of the 20-item Toronto Alexithymia Scale with a sample of Iranian students. Psychol Rep. 2007;101(1):209–20. https://doi.org/10.2466/pr0.101.1.209-220

Graham JW. Missing data analysis: making it work in the real world. Annu Rev Psychol. 2008;60:549–76. https://doi.org/10.1146/annurev.psych.58.110405.085530

Byrne BM. Structural equation modeling with Mplus. Structural equation modeling with Mplus. Routledge; 2013.

Turner R, Louie K, Parvez A, Modaffar M, Rezaie R, Greene T, et al. The effects of developmental trauma on theory of mind and its relationship to psychotic experiences: a behavioural study. Psychiatry Res. 2022;312:114544.

Croft J, Heron J, Teufel C, Cannon M, Wolke D, Thompson A, et al. Association of trauma type, age of exposure, and frequency in childhood and adolescence with psychotic experiences in early adulthood. JAMA Psychiatry. 2019;76(1):79–86. https://jamanetwork.com/journals/jamapsychiatry/fullarticle/2714595

Hartley S, Haddock G, Vasconcelos E, Sa D, Emsley R, Barrowclough C. An experience sampling study of worry and rumination in psychosis. Psychol Med. 2014;44(8):1605–14. https://www.cambridge.org/core/journals/psychological-medicine/article/abs/an-experience-sampling-study-of-worry-and-rumination-in-psychosis/23512D77348B648DF93CFAD70AC884A9

Liu S, Wu W, Zou H, Chen Y, Xu L, Zhang W, et al. Cybervictimization and non-suicidal self-injury among Chinese adolescents: the effect of depression and school connectedness. Front Public Health. 2023;11:1091959.

García-Vázquez FI, Parra-Pérez LG, Valdés-Cuervo AA. The effects of forgiveness, gratitude, and self-control on reactive and proactive aggression in bullying. Int J Environ Res Public Health. 2020;17(16):5760. https://www.mdpi.com/1660-4601/17/16/5760/htm

Aricak OT. Psychiatric symptomatology as a predictor of cyberbullying among university students. Egit Arastirmalari - Eurasian J Educ Res. 2009;(34):167–84.

Connolly I, O’Moore M. Personality and family relations of children who bully. Pers Individ Dif. 2003;35(3):559–67.

Aricak OT, Ozbay A. Investigation of the relationship between cyberbullying, cybervictimization, alexithymia and anger expression styles among adolescents. Comput Hum Behav. 2016;55:278–85.

Espinoza G. Daily cybervictimization among latino adolescents: links with emotional, physical and school adjustment. J Appl Dev Psychol. 2015;38:39–48.


Wachs S, Vazsonyi AT, Wright MF, Ksinan Jiskrova G. Cross-national associations among cyberbullying victimization, self-esteem, and Internet addiction: direct and indirect effects of alexithymia. Front Psychol. 2020;11:529618.

Mousavi M, Alavinezhad R. Relationship of Alexithymia to adult attachment styles and self-esteem among college students. J Psychiatry Psychiatr Disord. 2023;1(1):6–14. http://www.fotunejournals.com/relationship-of-alexithymia-to-adult-attachment-styles-and-selfesteem-among-college-students.html

Levantini V, Camodeca M, Iannello NM. The contribution of bullying involvement and alexithymia to somatic complaints in preadolescents. Children (Basel). 2023;10(5):905. https://www.mdpi.com/2227-9067/10/5/905/htm

Wachs S, Wright MF. Bullying and alexithymia: are there differences between traditional, cyber, combined bullies, and nonbullies in reading their own emotions? Crim Behav Ment Health. 2018;28(5):409–13. https://doi.org/10.1002/cbm.2083

Wachs S, Bilz L, Fischer SM, Wright MF. Do emotional components of alexithymia mediate the interplay between cyberbullying victimization and perpetration? Int J Environ Res Public Health. 2017;14(12):1530. https://www.mdpi.com/1660-4601/14/12/1530/htm

Van Der Velde J, Swart M, Van Rijn S, Van Der Meer L, Wunderink L, Wiersma D, et al. Cognitive Alexithymia is associated with the degree of risk for psychosis. PLoS One. 2015;10(6):e0124803. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0124803

Kekkonen V, Kraav SL, Hintikka J, Kivimäki P, Kaarre O, Tolmunen T. Externally oriented thinking style increases primary health care use in adolescence. Eur J Public Health. 2023;33(3):418–23. https://doi.org/10.1093/eurpub/ckad041

Funding

The authors received no financial support for the research, authorship, and publication of this article.

Author information

Authors and Affiliations

Department of Counseling, Faculty of Education and Psychology, Alzahra University, Vanak Village Street, Tehran, 1993893973, Iran

Niloofar Movahedi, Simin Hosseinian, Hamid Rezaeian & Roghieh Nooripour

Department of Counseling, Qazvin Branch, Islamic Azad University, Qazvin, Iran

Roghieh Nooripour


Contributions

NM gathered, analyzed, and interpreted the data and was the major contributor to writing the manuscript. SH and HR supervised the text, background, and conclusion and assisted with the writing; SH is the corresponding author. RN supervised the project, helped analyze and interpret the data, and contributed to the writing.

Corresponding author

Correspondence to Simin Hosseinian.

Ethics declarations

Human ethics and consent to participate

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. The goals and conditions of this research were reviewed and approved by the Ethics Committee of Alzahra University in Tehran (code: ALZAHRA.REC.1402.055). In the schools, informed consent was first obtained from all participants' parents or legal guardians, and the confidentiality of the data was fully maintained by the researcher and her colleagues, as explained to the participants at the beginning; the necessary permissions were also obtained from all relevant organizations. No unnecessary confidential information was collected from the participants, and participation was voluntary: in accordance with the principle of respect for human rights, no one who was unwilling to participate was compelled to do so, and participation involved no cost to the participants.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .


About this article

Cite this article

Movahedi, N., Hosseinian, S., Rezaeian, H. et al. Mediating role of alexithymia in relationship between cyberbullying and psychotic experiences in adolescents. BMC Psychol 12, 465 (2024). https://doi.org/10.1186/s40359-024-01960-x


Received : 13 March 2024

Accepted : 21 August 2024

Published : 31 August 2024

DOI : https://doi.org/10.1186/s40359-024-01960-x


Keywords

  • Alexithymia
  • Cyberbullying
  • Psychotic experiences

BMC Psychology

ISSN: 2050-7283



Statistics in Psychological Research

  • Data Collection and Analysis

Psychological Research

August 2023


Unlock the power of data with this 10-hour, comprehensive course in data analysis. This course is perfect for anyone looking to deepen their knowledge and apply statistical methods effectively in psychology or related fields.

The course begins with consideration of how researchers define and categorize variables, including the nature of various scales of measurement and how these classifications impact data analysis and interpretation. This is followed by a thorough introduction to the measures of central tendency, variability, and correlation that researchers use to describe their findings, providing an understanding of such topics as which descriptive statistics are appropriate for given research designs, the meaning of a correlation coefficient, and how graphs are used to visualize data.
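As a quick illustration of the descriptive side of the course, the Python snippet below computes common measures of central tendency, variability, and correlation for a small made-up dataset; the variable names and values are hypothetical and are not drawn from the course materials.

    import numpy as np
    from scipy import stats

    # Hypothetical scores for two variables measured on the same participants
    hours_studied = np.array([2, 4, 5, 7, 8, 10])
    exam_score = np.array([58, 65, 70, 74, 80, 88])

    print("mean:", exam_score.mean(), "median:", np.median(exam_score))
    print("SD (sample):", exam_score.std(ddof=1))
    r, p = stats.pearsonr(hours_studied, exam_score)
    print(f"Pearson r = {r:.2f} (p = {p:.3f})")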

The course then moves on to a conceptual treatment of foundational inferential statistics that researchers use to make predictions or inferences about a population based on a sample. The focus is on understanding the logic of these statistics, rather than on making calculations. Specifically, the course explores the logic behind null hypothesis significance testing, long a cornerstone of statistical analysis. Learn how to formulate and test hypotheses and understand the significance of p-values in determining the validity of your results. The course reviews how to select the appropriate inferential test based on your study criteria. Whether it’s t-tests, ANOVA, chi-square tests, or regression analysis, you’ll know which test to apply and when.
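To make the logic of null hypothesis significance testing concrete, here is a minimal sketch (not part of the course) that runs an independent-samples t test on simulated data and compares the resulting p-value with a conventional alpha of .05.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    control = rng.normal(loc=50, scale=10, size=40)    # simulated control group
    treatment = rng.normal(loc=55, scale=10, size=40)  # simulated treatment group

    t, p = stats.ttest_ind(treatment, control)  # two-sided independent-samples t test
    alpha = 0.05
    print(f"t = {t:.2f}, p = {p:.4f}")
    print("Reject H0" if p < alpha else "Fail to reject H0")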

In keeping with growing concerns about some of the limitations of null hypothesis significance testing, such as its role in the so-called replication crisis, the course also delves into these concerns and possible ways to address them, including introductory consideration of statistical power and alternatives to hypothesis testing like estimation techniques and confidence intervals, meta-analysis, modeling, and Bayesian inference.
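These alternatives can also be illustrated in a few lines. The sketch below, offered only as an example (it assumes the statsmodels package is available), reports Cohen's d with a 95% confidence interval for the mean difference and estimates the per-group sample size needed to detect that effect with 80% power.

    import numpy as np
    from scipy import stats
    from statsmodels.stats.power import TTestIndPower

    rng = np.random.default_rng(1)
    a = rng.normal(55, 10, 40)  # simulated group A
    b = rng.normal(50, 10, 40)  # simulated group B

    # Cohen's d from the pooled standard deviation
    pooled_sd = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                        / (len(a) + len(b) - 2))
    d = (a.mean() - b.mean()) / pooled_sd

    # 95% confidence interval for the mean difference (pooled standard error)
    se = pooled_sd * np.sqrt(1 / len(a) + 1 / len(b))
    crit = stats.t.ppf(0.975, df=len(a) + len(b) - 2)
    diff = a.mean() - b.mean()

    # Per-group sample size for 80% power at alpha = .05, given effect size d
    n_needed = TTestIndPower().solve_power(effect_size=d, alpha=0.05, power=0.80)

    print(f"d = {d:.2f}, 95% CI for the difference = [{diff - crit * se:.2f}, {diff + crit * se:.2f}]")
    print(f"n per group for 80% power: {n_needed:.0f}")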

Learning objectives

  • Explain various ways to categorize variables.
  • Describe the logic of inferential statistics.
  • Explain the logic of null hypothesis significance testing.
  • Select the appropriate inferential test based on study criteria.
  • Compare and contrast the use of statistical significance, effect size, and confidence intervals.
  • Explain the importance of statistical power.
  • Describe how alternative procedures address the major objections to null hypothesis significance testing.
  • Explain various ways to describe data.
  • Describe how graphs are used to visualize data.
  • Explain the meaning of a correlation coefficient.

This program does not offer CE credit.

More in this series

Introduces the scientific research process and concepts such as the nature of variables for undergraduates, high school students, and professionals.

August 2023 On Demand Training

Introduces the importance of ethical practice in scientific research for undergraduates, high school students, and professionals.


What Hanumankind’s ‘Big Dawgs’ Teaches Us About Hidden Bias—By a Psychologist


Have you ever been taken aback by someone’s profession based on their appearance? Here’s how to expand your perspective and rethink unconscious biases.

If you’ve been on social media lately, you might have caught wind of Hanumankind’s viral new song “Big Dawgs.” Listeners all over the world love its fantastic beat, hard-hitting lyrics and visually stunning music video. However, many were surprised to learn that Hanumankind is Indian.

Having grown up in Houston before returning to India, his sound naturally reflects U.S. influences. However, the surprise at his origins could hint at unconscious biases lurking within people’s perceptions.

Unconscious or implicit bias refers to attitudes or stereotypes that unconsciously affect our understanding, actions and decisions. These biases are automatic and operate without our awareness, often in ways that contradict our conscious values.

Unconscious biases stem from our natural inclination to categorize information quickly and efficiently. This process involves the brain relying on mental shortcuts developed through experience and societal conditioning. These shortcuts help the brain make rapid decisions, but can lead to biased judgments.


For instance, unconscious bias can stop you from fully participating in and appreciating new experiences. You might avoid eating at certain restaurants because you believe the food will be bad due to the cuisine or appearance, or in the case of Hanumankind, you might preemptively dismiss his music because you assume it’s not your style, before giving it a real chance.

Here are two ways to avoid the trap of unconscious bias.

1. Expand Your Social Circle

Changing your perception of others is hard when everyone in your social circle mimics your thoughts and beliefs. To avoid living in an echo chamber , you must surround yourself with varying perspectives. Making an effort to interact with a diverse group of people can help broaden your worldview.

Research shows that engaging in “perspective-taking” or considering a situation or experience from another person’s point of view can reduce implicit bias. This practice requires empathy and an open mind, as it challenges individuals to see beyond their own experiences and assumptions. Meeting and listening to new people gives you the chance to do just that.

“I’m a white girl and a lot of my girl friends are black. It’s made me very aware of how real white privilege is,” one reddit user explains, emphasizing the importance of a diverse social circle.

2. Embrace Curiosity

Another way to counteract unconscious bias is to approach situations with curiosity instead of relying on your assumptions. When you encounter something unexpected, like discovering that an artist doesn’t fit your mental image of their background, lean into the opportunity to learn more.

Ask yourself: Why does this surprise me? What can I learn from this? By staying open and curious, you give yourself the chance to appreciate new experiences without the filter of bias.

A 2021 study published in the Journal of Graduate Medical Education found that self-reflection effectively increased participants’ awareness and knowledge of implicit biases. Researchers found that first impressions based on appearance, ethnicity and stereotypes often led to inaccurate assumptions.

“For a White male physician covered in tattoos, only 2% correctly identified him as a physician, and 60% felt he was untrustworthy. For a smiling Black female astronaut, only 13% correctly identified her as an astronaut. For a brooding White male serial killer, 50% found him trustworthy,” the researchers write, highlighting shocking disparities in stereotypes and reality.

It’s essential to consciously challenge stereotypes when they arise. If you catch yourself making a snap judgment about someone based on their appearance, accent or background, pause and remind yourself that people are complex and multidimensional.

Just because someone doesn’t fit the mold you’re used to doesn’t mean they aren’t incredibly talented or capable. Hanumankind’s success is a reminder that talent transcends borders and by being aware of our biases, we can appreciate the diversity that makes art—and life—so rich.

Wondering how tolerant you are of others’ differences? Take this test to find out: Warm Tolerance Scale

Mark Travers



A practical guide to big data research in psychology

Affiliations

  • 1 Department of Psychology and Social Behavior, University of California, Irvine.
  • 2 Department of Data and Analytics, Upworthy.
  • PMID: 27918178
  • DOI: 10.1037/met0000111

The massive volume of data that now covers a wide variety of human behaviors offers researchers in psychology an unprecedented opportunity to conduct innovative theory- and data-driven field research. This article is a practical guide to conducting big data research, covering data management, acquisition, processing, and analytics (including key supervised and unsupervised learning data mining methods). It is accompanied by walkthrough tutorials on data acquisition, text analysis with latent Dirichlet allocation topic modeling, and classification with support vector machines. Big data practitioners in academia, industry, and the community have built a comprehensive base of tools and knowledge that makes big data research accessible to researchers in a broad range of fields. However, big data research does require knowledge of software programming and a different analytical mindset. For those willing to acquire the requisite skills, innovative analyses of unexpected or previously untapped data sources can offer fresh ways to develop, test, and extend theories. When conducted with care and respect, big data research can become an essential complement to traditional research. (PsycINFO Database Record (c) 2016 APA, all rights reserved)
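The abstract mentions walkthrough tutorials on topic modeling with latent Dirichlet allocation and classification with support vector machines. The fragment below is a generic scikit-learn sketch of both steps on a tiny hypothetical text corpus; it is not the tutorial code that accompanies the article.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.svm import LinearSVC

    # Hypothetical documents and binary labels (e.g., mental-health related or not)
    docs = ["feeling stressed about exams", "big data tools for psychology",
            "machine learning predicts survey responses", "anxiety and sleep problems"]
    labels = [1, 0, 0, 1]

    # Bag-of-words counts, then a two-topic LDA model
    counts = CountVectorizer(stop_words="english").fit_transform(docs)
    topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)

    # Linear support vector classifier trained on the topic proportions as features
    clf = LinearSVC().fit(topics, labels)
    print(clf.predict(topics))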




COMMENTS

  1. Big Data in Psychological Research

    Future Research Agenda for Big Data Research in Psychology Frederick Oswald; Contributor bios. Sang Eun Woo, PhD, is an associate professor in the Department of Psychological Sciences at Purdue University. Her research focuses on industrial-organizational psychology, particularly personality and motivation, work attitudes, withdrawal behaviors ...

  2. Big Data in Psychology: Introduction to Special Issue

    The introduction to this special issue on psychological research involving big data summarizes the highlights of 10 articles that address a number of important and inspiring perspectives, issues, and applications. Four common themes that emerge in the articles with respect to psychological research conducted in the area of big data are ...

  3. PDF Big Data in Psychology: A Framework for Research Advancement

researchers contemplating big data research in psychology. First, we highlight that big data research efforts are much more within reach than many researchers realize. Specifically, we argue that big data research goes well beyond the number of participants (i.e., sample size), which has at times been considered to be the primary factor when considering

  4. An introductory guide for conducting psychological research with big data

    Big Data can bring enormous benefits to psychology. However, many psychological researchers show skepticism in undertaking Big Data research. Psychologists often do not take Big Data into consideration while developing their research projects because they have difficulties imagining how Big Data could help in their specific field of research, imagining themselves as "Big Data scientists," or ...

  5. PDF Big Data in Psychology: Introduction to the Special Issue

    Data that is edited by Furht and Khoshgoftaar, and Big Data Research that is edited by Wu and Palpanas. Likewise, these two journals also do not appear to be directed to those in psychology or the larger social sciences. Similarly, a quick Google search in September 2016 for "big data book" revealed more than 48 million results, although it ...

  6. Big data in psychology: A framework for research advancement

    Abstract. The potential for big data to provide value for psychology is significant. However, the pursuit of big data remains an uncertain and risky undertaking for the average psychological researcher. In this article, we address some of this uncertainty by discussing the potential impact of big data on the type of data available for ...

  7. Big data in psychological research.

    The rapid emergence of big data has been met with enthusiasm in many different fields—especially within applied settings. This book seeks to showcase the opportunities of big data and its related methodologies for psychologists to study human behavior and cognition. At the same time, the authors believe that the key to unlocking this possibility requires addressing many of these concerns and ...

  8. Big data in psychology: A framework for research advancement.

    The potential for big data to provide value for psychology is significant. However, the pursuit of big data remains an uncertain and risky undertaking for the average psychological researcher. In this article, we address some of this uncertainty by discussing the potential impact of big data on the type of data available for psychological research, addressing the benefits and most significant ...

  9. Big Data in Psychological Research on JSTOR

Big Data in Psychological Research. 978-1-4338-3233-8. Psychology. Technological advances have led to an abundance of widely available data on every aspect of life today. Psychologists today have more information than ever before...

  10. Big data ups its reach

    Psychologists are incorporating big data techniques into research and related business ventures. They're learning analytic techniques and applying tools such as artificial intelligence (the simulation of human intelligence processes by machines) and machine learning (computers' ability to learn from data without being explicitly programmed to do so).

  11. Big data in social and psychological science: theoretical and

    Big data presents unprecedented opportunities to understand human behavior on a large scale. It has been increasingly used in social and psychological research to reveal individual differences and group dynamics. There are a few theoretical and methodological challenges in big data research that require attention. In this paper, we highlight four issues, namely data-driven versus theory-driven ...

  12. Big Data in Psychology: A Framework for Research Advancement

    The potential impact of big data on the type of data available for psychological research, addressing the benefits and most significant challenges that emerge from these data, and organizing a variety of research opportunities for psychology are discussed. The potential for big data to provide value for psychology is significant. However, the pursuit of big data remains an uncertain and risky ...

  13. (PDF) Big Data in Psychological Research

    Abstract. Technological advances have led to an abundance of widely available data on every aspect of life today. Psychologists today have more information than ever before on human cognition ...

  14. Frontiers

    A new research agenda related to the analysis of big data in psychology is outlined at the end of the study. The amount of data in the world is enormous. ... Another example of the use of big data in research is an experimental study on visual search by Mitroff et al. (2015). They developed a mobile game in which respondents had to detect ...

  15. What Big Data Means For Psychological Science

    Big Data can even help psychological scientists study studies, said Tal Yarkoni of the University of Texas at Austin. Yarkoni and others recently developed Neurosynth, an online program that analyzes huge amounts of fMRI data to guide users toward a subject of interest. To date, said Yarkoni, Neurosynth has synthesized research from over 9,000 ...

  16. A Practical Guide to Big Data Research in Psychology

E-mail: [email protected]. A Practical Guide to Big Data Research. Abstract. The massive volume of data that now covers a wide variety of human behaviors offers researchers in psychology an ...

  17. PDF Big Data in Psychology

Psychological research in a time of Big Data (Big Data in Psychology, Trier, 2018). Big Data in the (behavioral) sciences: everything is measured ... [The slide excerpt also includes a table of principal-component loadings (PC1–PC3) for emotion-related items such as ANGRY, DEPRE, and SAD.]

  18. PDF Big Data in Psychological Research Sample Chapter

This is the era of big data for psychological research, which broadly refers to multiplying multiform data (e.g., structured, unstructured) and their supporting technological infrastructure (i.e., capture, storage, processing) and analytic techniques that can enhance psychological research (cf. Adjerid & Kelley, 2018; Harlow & Oswald, 2016).

  19. A practical guide to big data research in psychology.

    The massive volume of data that now covers a wide variety of human behaviors offers researchers in psychology an unprecedented opportunity to conduct innovative theory- and data-driven field research. This article is a practical guide to conducting big data research, covering data management, acquisition, processing, and analytics (including key supervised and unsupervised learning data mining ...

  20. Methodological improvements for studying face matching in border

    We reanalyzed open access data from the three experiments conducted by Weatherford et al. and found that, across all frequencies of passport mismatches (20%, 50%, 80% ... Undergraduate university students were recruited through a research participation system at a psychology department in the Greater Vancouver area. Participants received ...

  21. AI Makes the Big Picture Much Bigger

    Key points. The "big picture" has evolved into a vast, dynamic mosaic shaped by the connectivity of LLMs. LLMs synthesize data across domains, transforming how we perceive and engage with complex ...

  22. Big data in psychology: Introduction to the special issue

    The introduction to this special issue on psychological research involving big data summarizes the highlights of 10 articles that address a number of important and inspiring perspectives, issues, and applications. Four common themes that emerge in the articles with respect to psychological research conducted in the area of big data are ...

  23. Knowledge mapping and evolution of research on older adults ...

    Under the influence of AI and big data, research should continue to focus on the application of emerging technologies among older adults, exploring in depth how they adapt to and effectively use ...

  24. LibGuides: Psychology: Which databases should I use?

    There are a lot of databases available, so it can be difficult to decide which one to use for your research. The ones on this page have been curated for psychology. See the bottom of the page for the link to the full list of databases. You can also check out the Best Bets box on the Home page.

  25. Mediating role of alexithymia in relationship between cyberbullying and

Descriptive analysis. The bivariate correlations of all variables were calculated and they are available in Table 2. The data shows that cyberbullying was positively correlated with alexithymia (r = 0.37, p < 0.01); cyber victimization with alexithymia (r = 0.41, p < 0.01). Alexithymia was significantly and positively correlated with the dependent variable (r = 0.36, p < 0.01), too.

  26. Big data in psychology: Introduction to the special issue.

    The introduction to this special issue on psychological research involving big data summarizes the highlights of 10 articles that address a number of important and inspiring perspectives, issues, and applications. Four common themes that emerge in the articles with respect to psychological research conducted in the area of big data are mentioned, including: (a) The benefits of collaboration ...

  27. Statistics in psychological research

    Unlock the power of data with this 10-hour, comprehensive course in data analysis. This course is perfect for anyone looking to deepen their knowledge and apply statistical methods effectively in psychology or related fields. ... Methods for Quantitative Research in Psychology Introduces the scientific research process and concepts such as the ...

  28. What Hanumankind's 'Big Dawgs' Teaches Us About Hidden ...

    Research shows that engaging in "perspective-taking" or considering a situation or experience from another person's point of view can reduce implicit bias. This practice requires empathy and ...

  29. A practical guide to big data research in psychology

    The massive volume of data that now covers a wide variety of human behaviors offers researchers in psychology an unprecedented opportunity to conduct innovative theory- and data-driven field research. This article is a practical guide to conducting big data research, covering data management, acquisition, processing, and analytics (including ...

  30. 2025 Nursing Knowledge: Big Data Science Conference

2025 Nursing Knowledge: Big Data Science Conference. Wednesday, June 4 - Friday, June 6, 2025. ... Research ethics at the University of Minnesota. We are committed to protecting research participants, upholding ethical standards, and improving our ...