
Data Structures and Algorithms


New submissions for Friday, 2 August 2024 (showing 4 of 4 entries)

A Monge directed acyclic graph (DAG) $G$ on the nodes $1,2,\cdots,N$ has edges $\left( i,j\right) $ for $1\leq i<j\leq N$ carrying submodular edge-lengths. Finding a shortest $M$-link path from $1$ to $N$ in $G$ for any given $1<M<N-1$ has many applications. In this paper, we give a contract-and-conquer algorithm for this problem which runs in $O\left( \sqrt{NM\left( N-M\right) \log\left( N-M\right) }\right) $ time and $O\left( N\right) $ space. It is the first $o\left( NM\right) $-time algorithm with linear space complexity, and its time complexity decreases with $M$ when $M\geq N/2$. In contrast, all previous strongly polynomial algorithms have running time growing with $M$. For both $O\left( poly\left( \log N\right) \right) $ and $N-O\left( poly\left( \log N\right) \right) $ regimes of $M$, our algorithm has running time $O\left( N\cdot poly\left( \log N\right) \right) $, which partially answers an open question raised in \cite{AST94} affirmatively.
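
As background, the (concave) Monge condition on the edge lengths can be verified directly from its definition; a minimal Python sketch (the weight function `w` stands in for the submodular edge lengths; this checks only the definition and is not the contract-and-conquer algorithm):

```python
def is_monge(w, N):
    """Check the concave Monge (submodularity) condition on edge lengths:
    w(i, j) + w(i+1, j+1) <= w(i, j+1) + w(i+1, j) whenever all four
    pairs are valid edges (1 <= i < i+1 < j < j+1 <= N)."""
    for i in range(1, N - 2):
        for j in range(i + 2, N):  # j+1 <= N, and i+1 < j keeps (i+1, j) an edge
            if w(i, j) + w(i + 1, j + 1) > w(i, j + 1) + w(i + 1, j):
                return False
    return True

# Example: w(i, j) = (j - i)^2 is convex in the gap j - i, hence Monge.
assert is_monge(lambda i, j: (j - i) ** 2, 50)
```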

The net frequency (NF) of a string, of length $m$, in a text, of length $n$, is the number of occurrences of the string in the text with unique left and right extensions. Recently, Guo et al. [CPM 2024] showed that NF is combinatorially interesting and how two key problems can be solved efficiently in the offline setting. First, SINGLE-NF: reporting the NF of a query string in an input text. Second, ALL-NF: reporting an occurrence and the NF of each string of positive NF in an input text. For many applications, however, facilitating these computations in an online manner is highly desirable. We are the first to solve the above two problems in the online setting, and we do so in optimal time, assuming, as is common, a constant-size alphabet: SINGLE-NF in $O(m)$ time and ALL-NF in $O(n)$ time. Our results are achieved by first designing new and simpler offline algorithms using suffix trees, proving additional properties of NF, and exploiting Ukkonen's online suffix tree construction algorithm and results on implicit node maintenance in an implicit suffix tree by Breslauer and Italiano.

We consider two natural variants of the problem of minimum spanning tree (MST) of a graph in the parallel setting: MST verification (verifying if a given tree is an MST) and the sensitivity analysis of an MST (finding the lowest cost replacement edge for each edge of the MST). These two problems have been studied extensively for sequential algorithms and for parallel algorithms in the PRAM model of computation. In this paper, we extend the study to the standard model of Massive Parallel Computation (MPC). It is known that for graphs of diameter $D$, the connectivity problem can be solved in $O(\log D + \log\log n)$ rounds on an MPC with low local memory (each machine can store only $O(n^{\delta})$ words for an arbitrary constant $\delta > 0$) and with linear global memory, that is, with optimal utilization. However, for the related task of finding an MST, we need $\Omega(\log D_{\text{MST}})$ rounds, where $D_{\text{MST}}$ denotes the diameter of the minimum spanning tree. The state-of-the-art upper bound for MST is $O(\log n)$ rounds; the result follows by simulating existing PRAM algorithms. While this bound may be optimal for general graphs, the benchmark of connectivity and lower bound for MST suggest the target bound of $O(\log D_{\text{MST}})$ rounds, or possibly $O(\log D_{\text{MST}} + \log\log n)$ rounds. As of now, we do not know if this bound is achievable for the MST problem on an MPC with low local memory and linear global memory. In this paper, we show that two natural variants of the MST problem, MST verification and sensitivity analysis of an MST, can be completed in $O(\log D_T)$ rounds on an MPC with low local memory and with linear global memory; here $D_T$ is the diameter of the input ``candidate MST'' $T$. The algorithms asymptotically match our lower bound, conditioned on the 1-vs-2-cycle conjecture.

Online linear programming (OLP) has gained significant attention from both researchers and practitioners due to its extensive applications, such as online auction, network revenue management and advertising. Existing OLP algorithms fall into two categories: LP-based algorithms and LP-free algorithms. The former typically guarantees better performance, even offering a constant regret, but requires solving a large number of LPs, which could be computationally expensive. In contrast, LP-free algorithms require only first-order computations but deliver worse performance, lacking a constant regret bound. In this work, we bridge the gap between these two extremes by proposing an algorithm that achieves a constant regret while solving LPs only $O(\log\log T)$ times over the time horizon $T$. Moreover, when we are allowed to solve LPs only $M$ times, we propose an algorithm that can guarantee an $O\left(T^{(1/2+\epsilon)^{M-1}}\right)$ regret. Furthermore, when the arrival probabilities are known at the beginning, our algorithm can guarantee a constant regret by solving LPs $O(\log\log T)$ times, and an $O\left(T^{(1/2+\epsilon)^{M}}\right)$ regret by solving LPs only $M$ times. Numerical experiments are conducted to demonstrate the efficiency of the proposed algorithms.

Cross submissions for Friday, 2 August 2024 (showing 3 of 3 entries)

In quantum computing, the efficient optimization of Pauli string decompositions is a crucial aspect for the compilation of quantum circuits for many applications, such as chemistry simulations and quantum machine learning. In this paper, we propose a novel algorithm for the synthesis of Trotterized time-evolution operators that results in circuits with significantly fewer gates than previous solutions. Our synthesis procedure takes the qubit connectivity of a target quantum computer into account. As a result, the generated quantum circuit does not require routing, and no additional CNOT gates are needed to run the resulting circuit on a target device. We compare our algorithm against Paulihedral and TKET, and show a significant improvement for randomized circuits and different molecular ansatzes. We also investigate the Trotter error introduced by our ordering of the terms in the Hamiltonian versus default ordering and the ordering from the baseline methods, and conclude that our method on average does not increase the Trotter error.

A recent surprising result in the implementation of worst-case-optimal (wco) multijoins in graph databases (specifically, basic graph patterns) is that they can be supported on graph representations that take even less space than a plain representation, and orders of magnitude less space than classical indices, while offering comparable performance. In this paper we uncover a wide set of new wco space-time tradeoffs: we (1) introduce new compact indices that handle multijoins in wco time, and (2) combine them with new query resolution strategies that offer better times in practice. As a result, we improve the average query times of current compact representations by a factor of up to 13 to produce the first 1000 results, and using twice their space, reduce their total average query time by a factor of 2. Our experiments suggest that there is more room for improvement in terms of generating better query plans for multijoins.

Over the last years, there has been a significant amount of work studying the power of specific classes of computationally efficient estimators for multiple statistical parametric estimation tasks, including the estimator classes of low-degree polynomials, spectral methods, and others. Despite that, the important class of MCMC methods remains quite poorly understood. For instance, for many models of interest, the performance of even zero-temperature (greedy-like) MCMC methods that simply maximize the posterior remains elusive. In this work, we provide an easy to check condition under which the low-temperature Metropolis chain maximizes the posterior in polynomial-time with high probability. The result is generally applicable, and in this work, we use it to derive positive MCMC results for two classical sparse estimation tasks: the sparse tensor PCA model and sparse regression. Interestingly, in both cases, we also leverage the Overlap Gap Property framework for inference (Gamarnik, Zadik AoS '22) to prove that our results are tight: no low-temperature local MCMC method can achieve better performance. In particular, our work identifies the "low-temperature (local) MCMC threshold" for both sparse models. Interestingly, in the sparse tensor PCA model our results indicate that low-temperature local MCMC methods significantly underperform compared to other studied time-efficient methods, such as the class of low-degree polynomials.

Replacement submissions for Friday, 2 August 2024 (showing 4 of 4 entries)

We introduce determinantal sieving, a new, remarkably powerful tool in the toolbox of algebraic FPT algorithms. Given a polynomial $P(X)$ on a set of variables $X=\{x_1,\ldots,x_n\}$ and a linear matroid $M=(X,\mathcal{I})$ of rank $k$, both over a field $\mathbb{F}$ of characteristic 2, in $2^k$ evaluations we can sieve for those terms in the monomial expansion of $P$ which are multilinear and whose support is a basis for $M$. Alternatively, using $2^k$ evaluations of $P$ we can sieve for those monomials whose odd support spans $M$. Applying this framework, we improve on a range of algebraic FPT algorithms, such as:

1. Solving $q$-Matroid Intersection in time $O^*(2^{(q-2)k})$ and $q$-Matroid Parity in time $O^*(2^{qk})$, improving on $O^*(4^{qk})$ over general fields (Brand and Pratt, ICALP 2021).

2. $T$-Cycle, Colourful $(s,t)$-Path, Colourful $(S,T)$-Linkage in undirected graphs, and the more general Rank $k$ $(S,T)$-Linkage problem, all in $O^*(2^k)$ time, improving on $O^*(2^{k+|S|})$ and $O^*(2^{|S|+O(k^2 \log(k+|\mathbb{F}|))})$ respectively (Fomin et al., SODA 2023).

3. Many instances of the Diverse X paradigm, finding a collection of $r$ solutions to a problem with a minimum mutual distance of $d$ in time $O^*(2^{r(r-1)d/2})$, improving solutions for $k$-Distinct Branchings from time $2^{O(k \log k)}$ to $O^*(2^k)$ (Bang-Jensen et al., ESA 2021), and for Diverse Perfect Matchings from $O^*(2^{2^{O(rd)}})$ to $O^*(2^{r^2d/2})$ (Fomin et al., STACS 2021).

Here, all matroids are assumed to be represented over fields of characteristic 2. Over general fields, we achieve similar results at the cost of using exponential space by working over the exterior algebra. For a class of arithmetic circuits we call strongly monotone, this is even achieved without any loss of running time. However, the odd support sieving result appears to be specific to working over characteristic 2.

In the Stable Roommates problem, we seek a stable matching of the agents into pairs, in which no two agents have an incentive to deviate from their assignment. It is well known that a stable matching need not exist, but a stable partition always does and provides a succinct certificate for the unsolvability of an instance. Furthermore, apart from being a useful structural tool to study the problem, every stable partition corresponds to a stable half-matching, which has applications, for example, in sports scheduling and time-sharing. We establish new structural results for stable partitions and show how to enumerate all stable partitions and the cycles included in such structures efficiently. We also adapt optimality criteria from stable matchings to stable partitions and give complexity and approximability results for the problems of computing such "fair" and "optimal" stable partitions. Through this research, we contribute to a deeper understanding of stable partitions from a combinatorial point of view, as well as the computational complexity of computing "fair" or "optimal" stable half-matchings in practice, closing the gap between integral and fractional stable matchings and paving the way for further applications of stable partitions to unsolvable instances and computationally hard stable matching problems.

Fredman proposed in 1976 the following algorithmic problem: Given are a ground set $X$, some partial order $P$ over $X$, and some comparison oracle $O_L$ that specifies a linear order $L$ over $X$ that extends $P$. A query to $O_L$ has as input distinct $x, x' \in X$ and outputs whether $x <_L x'$ or vice versa. If we denote by $e(P)$ the number of linear extensions of $P$, then $\log e(P)$ is a worst-case lower bound on the number of queries needed to output the sorted order of $X$. Fredman did not specify in what form the partial order is given. Haeupler, Hladík, Iacono, Rozhon, Tarjan, and Tětek ('24) propose to assume as input a directed acyclic graph, $G$, with $m$ edges and $n=|X|$ vertices. Denote by $P_G$ the partial order induced by $G$. Algorithmic performance is measured in running time and the number of queries used; their algorithm uses $\Theta(m + n + \log e(P_G))$ time and $\Theta(\log e(P_G))$ queries to output $X$ in its sorted order. Their algorithm is worst-case optimal in terms of both running time and queries. Their algorithm combines topological sorting with heapsort, and uses sophisticated data structures (including a recent type of heap with a working-set bound). Their analysis relies upon sophisticated counting arguments using entropy, recursively defined sets defined over the run of their algorithm, and vertices in the graph that they identify as bottlenecks for sorting. In this paper, we do away with sophistication. We show that when the input is a directed acyclic graph then the problem admits a simple solution using $\Theta(m + n + \log e(P_G))$ time and $\Theta(\log e(P_G))$ queries. Especially our proofs are much simpler as we avoid the usage of advanced charging arguments and data structures, and instead rely upon two brief observations.
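
To make the setup concrete, here is a naive baseline sketch in Python (the helpers `dag_reaches` and `oracle` are hypothetical stand-ins for reachability in $G$ and for $O_L$; this illustrates only the query model and makes no attempt to meet the $\Theta(\log e(P_G))$ query bound of either paper):

```python
from functools import cmp_to_key

def sort_with_oracle(X, dag_reaches, oracle):
    """Sort X: use the partial order P_G where it decides a pair, and
    charge one oracle query only when the pair is incomparable in P_G.
    dag_reaches(x, y): True iff x reaches y in the DAG G (precomputable).
    oracle(x, y): True iff x <_L y in the hidden linear extension L."""
    queries = 0

    def cmp(x, y):
        nonlocal queries
        if dag_reaches(x, y):
            return -1          # x precedes y in P_G, hence in L
        if dag_reaches(y, x):
            return 1
        queries += 1           # incomparable in P_G: ask the oracle
        return -1 if oracle(x, y) else 1

    return sorted(X, key=cmp_to_key(cmp)), queries
```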

More than 40 years ago, Schroeppel and Shamir presented an algorithm that solves the Subset Sum problem for $n$ integers in time $O^*(2^{0.5n})$ and space $O^*(2^{0.25n})$. The time upper bound remains unbeaten, but the space upper bound has been improved to $O^*(2^{0.249999n})$ in a recent breakthrough paper by Nederlof and Węgrzycki (STOC 2021). Their algorithm is a clever combination of a number of previously known techniques with a new reduction and a new algorithm for the Orthogonal Vectors problem. In this paper, we improve the space bound by Nederlof and Węgrzycki to $O^*(2^{0.246n})$ and also simplify their algorithm and its analysis. We achieve this by using an idea, due to Howgrave-Graham and Joux, of using a random prime number to filter the family of subsets. We incorporate it into the algorithm by Schroeppel and Shamir and then use this amalgam inside the representation technique. This allows us to reduce an instance of Subset Sum to a larger number of instances of the weighted Orthogonal Vectors problem.
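
For readers unfamiliar with the baseline these works refine, a minimal sketch of classical meet-in-the-middle for Subset Sum (this is the textbook $O^*(2^{n/2})$-time variant; Schroeppel and Shamir reduce its space to $O^*(2^{n/4})$ with priority queues over four quarters, and the papers above go further):

```python
from bisect import bisect_left

def subset_sum_mitm(nums, target):
    """Classical meet-in-the-middle: split the input in half, enumerate
    all subset sums of each half, and match complements by binary search."""
    half = len(nums) // 2
    left, right = nums[:half], nums[half:]

    def subset_sums(arr):
        sums = [0]
        for x in arr:                       # double the list for each element
            sums += [s + x for s in sums]
        return sums

    right_sums = sorted(subset_sums(right))
    for s in subset_sums(left):
        i = bisect_left(right_sums, target - s)
        if i < len(right_sums) and right_sums[i] == target - s:
            return True
    return False

assert subset_sum_mitm([3, 34, 4, 12, 5, 2], 9)   # 4 + 5
```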


6.851: Advanced Data Structures (Spring'21)

Prof. Erik Demaine. TAs: Josh Brunner, Yevhenii Diomidov, Dylan Hendrickson.


Data structures play a central role in modern computer science. You interact with data structures even more often than with algorithms (think Google, your mail server, and even your network routers). In addition, data structures are essential building blocks in obtaining efficient algorithms. This course covers major results and current research directions in data structures:

TIME TRAVEL We can remember the past efficiently (a technique called persistence), but in general it's difficult to change the past and see the outcomes on the present (retroactivity). So alas, Back To The Future isn't really possible.
GEOMETRY When data has more than one dimension (e.g. maps, database tables).
DYNAMIC OPTIMALITY Is there one binary search tree that's as good as all others? We still don't know, but we're close.
MEMORY HIERARCHY Real computers have multiple levels of caches. We can optimize the number of cache misses, often without even knowing the size of the cache.
HASHING Hashing is the most used data structure in computer science. And it's still an active area of research.
INTEGERS Logarithmic time is too easy. By careful analysis of the information you're dealing with, you can often reduce the operation times substantially, sometimes even to constant. We will also cover lower bounds that illustrate when this is not possible.
DYNAMIC GRAPHS A network link went down, or you just added or deleted a friend in a social network. We can still maintain essential information about the connectivity as it changes.
STRINGS Searching for phrases in giant text (think Google or DNA).
SUCCINCT Most “linear size” data structures you know are much larger than they need to be, often by an order of magnitude. Some data structures require almost no space beyond the raw data but are still fast (think heaps, but much cooler).

Assuming there is interest, there will be an optional problem solving session, in which we will discuss and try to solve current open problems.

Prerequisites

The required prerequisite is 6.046, Design and Analysis of Algorithms, or an equivalently thorough undergraduate algorithms class from another school (e.g., covering much of CLRS). The recommended prerequisite is 6.854, Advanced Algorithms. This is the broad entry-level graduate course in Theory/Algorithms, and it normally makes sense to start there before jumping into deeper graduate courses. If you haven't taken 6.854, you must have a strong understanding of algorithms at the undergraduate level, such as receiving an A in 6.046, having had a relevant UROP, involvement in computer competitions, etc.

Requirements & Grading

  • Lectures: required (all), measured by thumbs-upping.
  • Recordings: required for the recorded portions of synchronous classes that you didn't attend.
  • Attendance: required during synchronous classes, at least one per week (email 6851-staff about exceptions).
  • Participation: required in class discussion, measured as coauthoring at least one nontrivial Coauthor post each week. (Example of a trivial post for a solved problem: "Use a tree." Example of a nontrivial post for a solved problem: describing the solution techniques you tried but failed. Deleted and unpublished posts do not count, but private posts do.)
  • Problem sets: posted weekly, following a “one-page in, one-page out” rule.
  • Choice of write-up and presentation based on in-class work:
  • Revise existing scribe notes for one lecture (or maybe two), according to your own inspiration for improvement and feedback from fellow students. The entire calendar for the course has been posted, so you can select a lecture that interests you. We will circulate a sign-up sheet during the second week. Scribe notes are generally due noon on Tuesday after the corresponding class (but extensions are possible).
  • Research-oriented final project (paper and presentation). We allow theoretical, experimental, survey, and Wikipedia final projects.

Past and Future

The class is offered once every two years or so. It was given in Spring 2003 and Spring 2005 as 6.897, and in Spring 2007, Spring 2010, Spring 2012, Spring 2014, and Fall 2017 as 6.851.


Machine Learning in Data Structures


A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section " Databases and Data Structures ".

Deadline for manuscript submissions: closed (31 May 2022) | Viewed by 5925


Dear Colleagues,

Machine learning is the study of computer algorithms that allow computer programs to improve automatically through experience. Machine learning algorithms build a model based on training data to make predictions or decisions without being explicitly programmed to do so. In recent years, a subfield of machine learning worth highlighting has developed, called deep learning (DL). DL addresses the use of different architectures of artificial neural networks that, through a hierarchy of layers with non-linear processing units, learn high-level abstractions for data.

Machine learning techniques and deep learning have been extensively used to explore the causal relations and correlations among big data. Moreover, in recent years, machine learning techniques have been used as a replacement for classical data structures. Specifically, the notion of smart data structures, or learned indexes, is appealing for what we want to achieve.

This Special Issue focuses on the latest developments in machine learning foundations on data structures, as well as on the synergy between data structures and machine learning. The aim of this Special Issue is to explore machine learning techniques and especially deep learning in order to recognize data schemas and data structures and make them interoperable.

This Special Issue is particularly interested in novel algorithms in the context of applying machine learning to effectively design data structures in various applications. Topics of interest include learned indices, multicriteria data structures, and data structure alchemy. Theoretically well-founded contributions and their real-world applications in laying new foundations for machine learning and data structures are welcome. In addition, demonstration manuscripts that discuss successful system developments, as well as experience/use-case articles and high-quality survey papers, are also welcome. Contributions may span a wide range of algorithms for modeling, practices for building, and techniques for evaluating operations and services.

Submissions should focus on the aspects of application of machine learning techniques to the design of data structures, including (but not limited to) the following areas:

  • Machine learning techniques on data structures
  • Analysis of algorithms and distributed data structures
  • Novel programming models
  • Learned indices for big data
  • Smart data structures
  • Multicriteria data structures
  • Dynamic data structures
  • Distributed data structures
  • Classic data structures
  • Database applications
  • Mining and analytics for scientific and business data, social networks, time series, streams, text, web, graphs, rules, patterns, logs, and spatio–temporal data
  • Cloud-based applications
  • Performance models
  • Network management and techniques
  • Schema management and design
  • Machine learning, AI and databases
  • Data management issues and support for machine learning and AI
  • Specialized and domain-specific data management
  • Learned index structures or information retrieval

Dr. Christos Makris and Dr. Andreas Kanavos, Guest Editors

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (1 paper)


Algorithms and Data Structures

18th International Symposium, WADS 2023, Montreal, QC, Canada, July 31 – August 2, 2023, Proceedings

  • Conference proceedings
  • © 2023
  • Pat Morin (ORCID: https://orcid.org/0000-0003-0471-4118), Carleton University, Ottawa, Canada
  • Subhash Suri (ORCID: https://orcid.org/0000-0002-5668-7521), University of California, Santa Barbara, USA

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14079)

Included in the following conference series:

  • WADS: Algorithms and Data Structures Symposium

Conference proceedings info: WADS 2023.

35k Accesses | 10 Citations | 3 Altmetric


About this book

This book constitutes the refereed proceedings of the 18th International Symposium on Algorithms and Data Structures, WADS 2023, held during July 31 – August 2, 2023. The 47 regular papers presented in this book were carefully reviewed and selected from a total of 92 submissions. They present original research on the theory, design and application of algorithms and data structures.

  • data structures
  • adaptive algorithms
  • artificial intelligence
  • computer networks
  • directed graphs
  • graph theory
  • graphic methods

Table of contents (47 papers)

Front Matter

Geometric Spanning Trees Minimizing the Wiener Index

  • A. Karim Abu-Affash, Paz Carmi, Ori Luwisch, Joseph S. B. Mitchell

The Mutual Visibility Problem for Fat Robots

  • Rusul J. Alsaedi, Joachim Gudmundsson, André van Renssen

Faster Algorithms for Cycle Hitting Problems on Disk Graphs

  • Shinwoo An, Kyungjin Cho, Eunjin Oh

Tight Analysis of the Lazy Algorithm for Open Online Dial-a-Ride

  • Júlia Baligács, Yann Disser, Farehe Soheil, David Weckbecker

Online TSP with Known Locations

  • Evripidis Bampis, Bruno Escoffier, Niklas Hahn, Michalis Xefteris

Socially Fair Matching: Exact and Approximation Algorithms

  • Sayan Bandyapadhyay, Fedor Fomin, Tanmay Inamdar, Fahad Panolan, Kirill Simonov

A Parameterized Approximation Scheme for Generalized Partial Vertex Cover

  • Sayan Bandyapadhyay, Zachary Friggstad, Ramin Mousavi

Dominator Coloring and CD Coloring in Almost Cluster Graphs

  • Aritra Banik, Prahlad Narasimhan Kasthurirangan, Venkatesh Raman

Tight Approximation Algorithms for Ordered Covering

  • Jatin Batra, Syamantak Das, Agastya Vibhuti Jha

Online Minimum Spanning Trees with Weight Predictions

  • Magnus Berg, Joan Boyar, Lene M. Favrholdt, Kim S. Larsen

Compact Distance Oracles with Large Sensitivity and Low Stretch

  • Davide Bilò, Keerti Choudhary, Sarel Cohen, Tobias Friedrich, Simon Krogmann, Martin Schirneck

Finding Diameter-Reducing Shortcuts in Trees

  • Davide Bilò, Luciano Gualà, Stefano Leucci, Luca Pepè Sciarria

Approximating the Smallest k-Enclosing Geodesic Disc in a Simple Polygon

  • Prosenjit Bose, Anthony D’Angelo, Stephane Durocher

Online Interval Scheduling with Predictions

  • Joan Boyar, Lene M. Favrholdt, Shahin Kamali, Kim S. Larsen

On Length-Sensitive Fréchet Similarity

  • Kevin Buchin, Brittany Terese Fasy, Erfan Hosseini Sereshgi, Carola Wenk

Hardness of Graph-Structured Algebraic and Symbolic Problems

  • Jingbang Chen, Yu Gao, Yufan Huang, Richard Peng, Runze Wang

Sublinear-Space Streaming Algorithms for Estimating Graph Parameters on Sparse Graphs

  • Xiuge Chen, Rajesh Chitnis, Patrick Eades, Anthony Wirth

Efficient k-Center Algorithms for Planar Points in Convex Position

  • Jongmin Choi, Jaegun Lee, Hee-Kap Ahn

Classification via Two-Way Comparisons (Extended Abstract)

  • Marek Chrobak, Neal E. Young


Bibliographic Information

Book Title: Algorithms and Data Structures

Book Subtitle: 18th International Symposium, WADS 2023, Montreal, QC, Canada, July 31 – August 2, 2023, Proceedings

Editors: Pat Morin, Subhash Suri

Series Title: Lecture Notes in Computer Science

DOI: https://doi.org/10.1007/978-3-031-38906-1

Publisher: Springer Cham

eBook Packages: Computer Science, Computer Science (R0)

Copyright Information: The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

Softcover ISBN: 978-3-031-38905-4 (published 28 July 2023)

eBook ISBN: 978-3-031-38906-1 (published 27 July 2023)

Series ISSN: 0302-9743

Series E-ISSN: 1611-3349

Edition Number: 1

Number of Pages: XII, 721

Number of Illustrations: 42 b/w illustrations, 121 illustrations in colour

Topics: Data Structures and Information Theory, Algorithm Analysis and Problem Complexity, Computer Systems Organization and Communication Networks, Symbolic and Algebraic Manipulation, Discrete Mathematics in Computer Science, Computer Graphics


Open access | Published: 09 April 2023

Hierarchical data structures for flowchart

Peng Zhang 1, Wenzhang Dou 1 & Huaping Liu 2

Scientific Reports, volume 13, Article number: 5800 (2023)

2647 Accesses | 1 Citation

Subjects: Computer science; Information technology

Flowcharts have broad applications in the fields of software development, engineering design, and scientific experimentation. Current flowchart data structures are mainly based on the adjacency list, cross-linked list, and adjacency matrix of the graph structure. Such designs originated from the fact that any two nodes could have a connection relationship. But flowcharts have clear regularities, and their nodes have a certain inflow or outflow relationship. When graph structures such as an adjacency table or an adjacency matrix are used to store a flowchart, there is large room for optimization in terms of traversal time, storage complexity, and usage convenience. In this paper we propose two hierarchical data structures for flowchart design. In the proposed structures, a flowchart is composed of levels, layers, and numbered nodes. The nodes between layers are connected according to a certain set of systematic design rules. Compared with the traditional graph data structures, the proposed schemes significantly reduce the storage space, improve the traversal efficiency, and resolve the problem of nesting between sub-charts. Experimental data based on flowchart examples used in this paper show that, compared with the adjacency list, the hierarchical table data structure reduces the traversal time by 50% while their storage spaces are similar; compared with the adjacency matrix, the hierarchical matrix data structure reduces the traversal time by nearly 70% and saves the storage space by about 50%. The proposed structures could have broad applications in flowchart-based software development, such as low-code engineering for smart industrial manufacturing.


Introduction

Graphs have a variety of use cases, including hyperlink-structure applications based on the world wide web, such as ranking in web search and application software based on web pages and HTML protocols. Graphs require storage, traversal, processing, and analysis, and some graphs contain millions or even billions of vertices and edges. Thus, processing large-scale graphs based on the world wide web presents a significant technical challenge in terms of processing power and storage needs. To solve such a problem, there exist distributed graph algorithms such as MapReduce and accelerated solutions such as PageRank 1 . There are also research efforts that try to exploit the structures of the adjacency list and adjacency matrix to optimize the interactive processes between them 2 . Another direction is to exploit the data structure itself to optimize the design without changing the basic framework of the adjacency list and adjacency matrix 3 . These schemes can shorten the running time of graph processing (especially for web page graphs), save storage space, and improve graph processing efficiency.

Arifuzzaman et al. 1 analyzed the technical challenges that the adjacency list data structure is facing in representing large-scale graphs (e.g., those that are composed of billions of nodes and edges). Zhu et al. 2 analyzed the graph data storage challenges from the perspectives of support transactions and edge scan efficiency. Lai et al. 3 discussed the possibility of determining whether there exist one or more paths between two vertices of a graph via the adjacency matrix. They concluded that the path itself cannot be identified through the adjacency matrix, and proposed a method to determine the matrix path accurately. Kallaugher et al. 4 studied the cycle-counting problem of the adjacency list data structure, and analyzed the space optimization issue of the adjacency list. Lin et al. 5 described a distributed solution for graph data. Andersen et al. 6 proposed an improved algorithm to compute approximate-PageRank-vector to improve the efficiency of graph data processing. Kang et al. 7 proposed a scalable and general graph management and mining system, GBASE. It increases the query speed and thus reduces the query time without changing the graph data structure. Kiourtis et al. 8 described an autoscaling big data analytics platform that provides analytics through graph data modelling to support technical and nontechnical stakeholders.

Although there exist many solutions for large-scale graph data processing, these solutions still have obvious shortcomings in terms of efficiency because they require extensive hardware and software resources. For example, the work in 1 tried to resolve this problem from the direction of distributed memory and parallel processing, and proposed a load balancing scheme and a parallel processing algorithm to improve space and time efficiencies. With the traditional adjacency list structure, the time and space complexities are \({{\mathcal {O}}}(\frac{e}{P}+n+P)\) and \({{\mathcal {O}}}(\frac{e}{P})\) , respectively, where n , e , and P denote, respectively, the number of nodes, edges and processors.

There are also solutions designed for particular types of graphs, but they are essentially based on the traditional graph data structures such as the adjacency list and adjacency matrix.

Flowchart is a branch of graph and has many important applications. For example, experimental results reported in 9 , 10 , 11 have shown that visual programming using flowcharts can be very powerful, and in smart manufacturing, flowcharts are used to effectively describe production and manufacturing processes 12 , 13 . Siemens 13 described Mendix, an application development platform for industrial manufacturing users. It includes app services that bind code to flowcharts, making development available to users with low-code or no programming skills. Taking automobile manufacturing as an example, the assembly process combines and connects a large number of parts according to the drawings and technical requirements. The sub-processes of the whole assembly process are not fixed, but vary based on specific needs. Each assembly process of automobile manufacturing can be abstracted as a node, and each assembly sub-process can be divided into several sub-nodes. Different assembly processes or different technological processes will generate different flowcharts. Thus one can associate nodes (assembly processes and technological processes) and edges (arrows connecting two nodes) with computer program code blocks, and make them available to users in the form of web-based software. The software can run on the cloud and is made accessible to users for smart manufacturing. Users can manipulate the flowchart required by the production processes via the web, and the computer in the factory can automatically complete the smart manufacturing process based on the code associated with the flowchart. The integrated development environment (IDE) is a software application that helps programmers develop software efficiently. Writing code manually and repeatedly is inefficient, so graphical symbols can be used instead of code blocks. Graphic symbols in a flowchart can be combined into several flowcharts, thereby drastically increasing software development efficiency for many scenarios. Intelligent industrial manufacturing has led to the accelerated application of IDEs in production management software development. However, it has unique field-specific characteristics and requires far more software development talent than is available in the field. Hence, making industrial manufacturing software development easier and more straightforward becomes critical. Flowchart is an efficient option to meet this need, since it can make very complex information and processes crystal clear.

However, creating a flowchart for complex processes can still be a very challenging task in terms of both time and storage complexities. The adjacency matrix path calculation method can also be applied to flowcharts, but it has no advantages in terms of space and time complexities. Brunn et al. 14 defined the adjacency matrix via the Collatz graph, and investigated various algorithms for generating the Collatz odd series. Their work showed that there is room for optimization in using the adjacency matrix to store general flowcharts, since its space and time efficiencies are very limited. Meesum et al. 15 proposed graph-modification algorithms based on the adjacency matrix. Aiming for lower space and time complexities, they did not use structured methods to define graphs, but algebraic methods. This method works well for graphs with unstructured features, but it is not suitable for flowcharts that have structured flows and nested sub-processes. Aji et al. 16 analyzed the characteristics of the anti-adjacency matrix (a transformed form of the adjacency matrix) and designed a characteristic polynomial to express the function of the number of vertices in a graph. They indirectly showed that the anti-adjacency matrix still has space and time efficiency problems when used to store flowcharts. Mehrpouya et al. 12 introduced additive manufacturing, which has provided significant freedom for creating complex components and highly customizable products. They also discussed the application of low-code development tools based on flowcharts in industrial manufacturing. Hooshyar et al. 17 developed a flowchart-based programming environment (FPE) which provides the basic programming algorithms prior to surface structure using an automatic text-to-flowchart conversion approach.

For flowcharts with structured flows, how to represent them efficiently remains an unsolved problem. As artificial intelligence (AI) applications continue to develop at an accelerated pace, the use of large-scale flowcharts is growing explosively. This mandates new data structures with higher efficiencies to improve decision-making and storage efficiency.

Existing research has shown that the logical data structures of flowcharts still adopt the traditional adjacency list, adjacency matrix, cross-linked list, and other conventional graph data structures. Due to the structured flow characteristics of flowcharts, applying these traditional graph data structures results in substantial redundancy in space and a high time complexity. Keeping the code platform as concise as possible is vital for intelligent manufacturing and for the seamless coordination and dispatch of autonomous vehicles.

Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. It has important applications in networking, bioinformatics, software engineering, database and web design, machine learning, and in visual interfaces for other technical domains. A “concise-code platform” is graph visualization software. It refers to the way experience and knowledge are presented to users through process abstraction and modularization. It improves the reuse efficiency of the back-end logic by simplifying the front-end, optimizes the iteration cycle, reduces cost, responds promptly to the outputs, improves the user experience, and makes applications more convenient. The idea of a concise-code platform will promote broad applications of flowcharts with large numbers of nodes. This will present a significant challenge toward a high processing efficiency of flowcharts.

The goal of this paper is to propose a new hierarchical data structure for large-scale flowcharts, leading to a much improved solution in terms of space (storage) and time efficiencies. For a single user (or a small number of users) requiring small-scale flowcharts (e.g., those with 100 or so nodes), if the client/server (C/S) architecture is adopted, it is unnecessary to distribute the flowchart processing tasks between the client and servers, since doing so often yields insignificant time and space efficiency improvements. For a large number of simultaneous users, however, each user's flowcharts will be processed in parallel on the server, resulting in a huge number of extremely complex flowchart processing tasks. Challenges that large-scale graph data applications are facing are also discussed in 1 . Via some representative flowchart examples, the efficiency of flowcharts that use adjacency lists and adjacency matrices is shown to be poor. We propose an optimization method and a unique logical structure, fully exploiting the characteristics of the levels and layers of a flowchart.

The rest of this paper is organized as follows. Section “Data logical structure preliminaries” reviews existing data logical structures, covering basic concepts of the adjacency matrix, adjacency list, and cross-linked list. The proposed new hierarchical data structures for flowchart design are discussed in detail in Section “New data structures for flowchart design”. In Section “Analysis of the proposed data structures” we conduct an analysis of the proposed data structures, including traversal time and storage space comparisons with existing schemes. Concluding remarks are given in the final section.

Data logical structure preliminaries

Data logical structures can be classified into two categories: linear structures and nonlinear structures 18 , 19 , 20 . A linear structure has one and only one start node and one end node, and each node has at most one predecessor and one successor. A typical linear structure example is a linear table. A node in a nonlinear structure can have multiple predecessors and successors. If each node has at most one predecessor but possibly multiple successors, then the structure is a tree; if there is no limit on the number of predecessors and successors of a node, that is, there may be an adjacency relationship between any two nodes, then the structure is called a graph.

A data logical structure describes data from the perspective of logical relationships. It is independent of data storage and the hardware platform. It represents the user's view and is a mathematical model for practical problems, reflecting the essence of how data are organized. Based on such a structure, data can be flexibly organized for each specific problem, and all data elements can be optimized based on the required logical structure. The computer memory consists of a limited number of storage units, each of which has a unique address. The addresses are coded consecutively, and each storage unit has only one successor unit. When a data node N of the logical structure is stored in the computer, it is necessary to establish a mapping relationship between the logical node and the storage structure, that is, to establish the mapping relationship between the logical data and the physical storage.

A flowchart is a common way to describe logical processes in engineering design and software development 9 , 10 , 11 . A node in the flowchart may have multiple inputs or multiple outputs; it is thus inappropriate to express a flowchart in a linear structure. Current flowchart logical structures are mainly represented by graphs, including adjacency lists, cross-linked lists, and adjacency matrices. However, nodes in flowcharts are connected in an orderly fashion rather than randomly; that is, the inflow and outflow of nodes have regularities. But the traditional adjacency list, cross-linked list, and adjacency matrix accommodate undirected graphs as well as arbitrarily connected directed graphs. This causes many problems. For example, consider a graph that contains 100 nodes and many edges connecting the nodes. Suppose the nodes can be arbitrarily connected by edges, and each connection type has a unique characteristic. This can be represented by the adjacency table data structure. However, if these nodes are arranged to follow certain connection rules, like the process in a pipeline or the flow of a river, which always flows downstream with possible branches further downstream, then the adjacency table data structure becomes very inefficient.

Adjacency matrix

The adjacency matrix representation of a graph includes a node table that stores the information of each node (vertex) and an adjacency matrix that describes the relationship among the nodes. Storing a flowchart G with n nodes and e edges in an adjacency matrix requires a sequence table with n nodes and an \(n \times n\) adjacency matrix, resulting in a storage capacity of \(n+n \times n=n^2+n\) . Thus, when a graph is represented by an adjacency matrix, the storage space is a function of the number of nodes in the graph, but not the number of edges. When it is used to store a sparse graph, there will be a large number of zero elements, and storing these zero elements will consume a lot of storage space. An example of an adjacency matrix structure is shown below
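
As a stand-in for the matrix (the paper's original figure is not reproduced here; this is a hypothetical 4-node chart with edges \(V_1 \rightarrow V_2\), \(V_1 \rightarrow V_3\), \(V_2 \rightarrow V_4\), \(V_3 \rightarrow V_4\)):

```latex
% Hypothetical adjacency matrix for a 4-node flowchart (illustration only)
\[
\begin{array}{c|cccc}
      & V_1 & V_2 & V_3 & V_4 \\ \hline
  V_1 & 0 & 1 & 1 & 0 \\
  V_2 & 0 & 0 & 0 & 1 \\
  V_3 & 0 & 0 & 0 & 1 \\
  V_4 & 0 & 0 & 0 & 0
\end{array}
\]
```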

where the numbers 0’s and 1’s represent the connection relationship between the two nodes: 0 means no connection and 1 otherwise.

The number of 0’s and 1’s in the adjacency matrix determines its traversal time complexity, which is the product of the number of rows and the number of columns, i.e., \({{\mathcal {O}}}(n^2)\) .
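
A minimal sketch of this representation (illustrative Python, not from the paper): storage is the n-entry node table plus the n-by-n matrix, and a full traversal scans all \(n^2\) entries.

```python
class MatrixFlowchart:
    """Flowchart stored as a node table plus an n-by-n adjacency matrix."""

    def __init__(self, nodes):
        self.nodes = list(nodes)                # node table: n storage units
        n = len(self.nodes)
        self.adj = [[0] * n for _ in range(n)]  # matrix: n*n units, edge-independent

    def add_edge(self, i, j):
        self.adj[i][j] = 1                      # directed edge i -> j

    def successors(self, i):
        # Scanning one row costs O(n) even if the node has few out-edges;
        # traversing every node therefore costs O(n^2).
        return [j for j, bit in enumerate(self.adj[i]) if bit]

    def storage_units(self):
        n = len(self.nodes)
        return n + n * n                        # n + n^2
```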

Adjacency list

The adjacency list is another common graph representation method. Its storage space is a function of both the number of nodes and the number of edges. An adjacency list is a chained storage structure for a graph. It consists of a sequentially stored node list and n singly-linked edge lists. The entries in the node table have a one-to-one mapping with the nodes in the graph, and each entry includes two fields, which are used to store the node data and the pointer to the edge table of the node, respectively. The entries in the edge table also have two fields to store the sequence number of another node associated with the edge and a pointer to the next entry in the edge table. The adjacency list structure is as follows.

vertex | firstArc  \(\rightarrow\)  no | next

(node-table entry on the left; edge-table entry on the right)

Flowcharts represented by the adjacency list data structure require a storage capacity of \(n+e\) . Compared with the adjacency matrix, the adjacency list significantly reduces storage space. The adjacency list time complexity is \({{\mathcal {O}}}(n+e)\) , determined by the sum of the number of nodes and edges.
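
A corresponding sketch (again illustrative Python, not the paper's code): storage is the node table plus one edge entry per edge, and traversal touches each node and each edge once.

```python
class ListFlowchart:
    """Flowchart stored as a node table with one singly linked edge list per node."""

    def __init__(self, nodes):
        self.nodes = list(nodes)                   # node table: n units
        self.first_arc = [None] * len(self.nodes)  # head pointer per node

    def add_edge(self, i, j):
        # Prepend an edge entry (no, next) to node i's list: one unit per edge.
        self.first_arc[i] = (j, self.first_arc[i])

    def successors(self, i):
        out, arc = [], self.first_arc[i]
        while arc is not None:                     # walk only i's actual out-edges
            no, nxt = arc
            out.append(no)
            arc = nxt
        return out

    def storage_units(self):
        n = len(self.nodes)
        return n + sum(len(self.successors(i)) for i in range(n))  # n + e
```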

Cross-linked list

The cross-linked list is similar to the adjacency list; the difference is that both the inflow and outflow relationships between nodes and edges are expressed in a table structure. This means that the adjacency list and the inverse adjacency list of the flowchart are combined to store the flowchart. The cross-linked list structure also consists of a node table and an edge table. Each entry in the node table consists of a node data field and an edge pointer field. The edge pointer field is divided into two: the first points to the first edge flowing out of this node; the second points to the first edge that ends at this node, that is, the first edge that flows into this node. The edge table consists of a mark field, node serial number fields ivex and jvex, and link pointer fields ilink and jlink. The mark field is used to mark whether the edge has been processed or searched for; ivex and jvex are the two node numbers associated with the edge. The link pointer ilink points to the next edge with the same starting point as the current edge; jlink points to the next edge whose end-point is the same as this edge. The cross-linked list structure is as follows.

vertex | firstIn | firstOut

mark | ivex | jvex | ilink | jlink

(node-table entry above; edge-table entry below)

The use of a cross-linked list to represent a flowchart is similar to the use of an adjacency list, with node and edge pointers as the only link relationship, and each node or edge in the node and edge lists has a separate pointer. In terms of storage capacity and time complexity, it is also similar to the adjacency list. Further graph data structures including the adjacency list and the cross-linked list structures can be found in 21 , 22 , 23 .
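
A sketch of this orthogonal-list idea (illustrative Python; the field names follow the description above):

```python
class Edge:
    """Edge entry of a cross-linked (orthogonal) list."""
    def __init__(self, ivex, jvex):
        self.mark = 0      # processed/visited flag (also usable for loop counts)
        self.ivex = ivex   # tail node number (edge flows out of ivex)
        self.jvex = jvex   # head node number (edge flows into jvex)
        self.ilink = None  # next edge with the same tail ivex
        self.jlink = None  # next edge with the same head jvex

class CrossLinkedFlowchart:
    def __init__(self, n):
        self.first_out = [None] * n  # first edge flowing out of each node
        self.first_in = [None] * n   # first edge flowing into each node

    def add_edge(self, i, j):
        e = Edge(i, j)
        e.ilink, self.first_out[i] = self.first_out[i], e
        e.jlink, self.first_in[j] = self.first_in[j], e
        return e

    def out_edges(self, i):          # follow ilink: all edges leaving node i
        e = self.first_out[i]
        while e is not None:
            yield e
            e = e.ilink

    def in_edges(self, j):           # follow jlink: all edges entering node j
        e = self.first_in[j]
        while e is not None:
            yield e
            e = e.jlink
```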

In the adjacency matrix representation, it is convenient to determine whether there is an edge between two nodes by the value of an element in the matrix. The problems of the adjacency matrix representation are storage space and traversal time. The adjacency list and cross-linked list require scanning the nodes in the edge table to determine the flow direction. They are more complex, but can reduce the storage space and time complexity to a certain extent. However, there is still room for optimization.

AI algorithms have seen an ever-increasing number of applications recently, but the required computing power and storage space are scaling up dramatically as well. Flowcharts are widely used in AI 8 , 17 , 24 . Thus the power requirement for the flowchart mandates its logical structure optimization for faster traversal speed and less storage space.

New data structures for flowchart design

Existing data structures for graphs

When the adjacency list, cross-linked list, and adjacency matrix logical structures are used to describe a flowchart, there exist null elements for non-connected items. When the data are converted into a storage structure, these null elements increase both the storage space and unnecessary running time overhead. A brief summary of existing common structures for flowchart design is given in Table 1 . Next we present three examples of the storage structures of the adjacency list, adjacency matrix, and cross-linked list for flowcharts.

Adjacency list example

An example of the adjacency list structure is shown in Fig. 1 .

Figure 1. A flowchart represented by the adjacency list structure (flowchart on the left; adjacency list on the right).

As shown on the left of Fig. 1 , the directional connecting lines represent the logical relationship between the nodes, that is, the adjacency relationship. Shown on the right of Fig. 1 is the adjacency list corresponding to the flowchart on the left: \(V_1\) to \(V_6\) correspond to the nodes represented by subscripts \(1-6\) , respectively. The two columns, ‘vertex’ and ‘firstArc’, represent, respectively, the node data and the pointer to the first edge. Each row of ‘vertex’ and ‘firstArc’ forms a node entry, each occupying one storage unit. The first ‘node’ and ‘next’ columns on the right of Fig. 1 represent, respectively, the next node to which the first edge-pointer points and the next edge pointer. Each entry of these two columns constitutes an edge entry; the second ‘node’ and ‘next’ columns respectively represent the next node to which the second edge-pointer points and the next edge pointer. In the same way, each row with multiple fields in the two columns of ‘node’ and ‘next’ forms an edge entry, which occupies one storage unit. The symbol \(^\wedge\) denotes a null value. In Fig. 1 , there are a total of 9 nodes and 11 edges, occupying a storage space of \(C_L=9+11=20\) .

Adjacency matrix example

Figure 2 shows an adjacency matrix representation of a flowchart. The entries of the first row and first column of the matrix are node labels (i.e., \(V_1, \ldots , V_9\)). The other entries represent the edges between nodes. For example, the (2, 3)-entry is the edge from node 1 to node 2, denoted \(\overrightarrow{V_1 V_2}\). The value '1' of an entry means that there is an edge; '0' means no edge. Whether there is an edge is determined by the direction of flow. In this example, there are \(n=9\) nodes, occupying 9 storage units, and a total of \(E=n^2\) 1's and 0's, each occupying one storage unit. Thus the total number of storage units is \(C_M=E+n=9^2+9=90\).

figure 2

A flowchart represented by the adjacency matrix (flowchart on the left; adjacency matrix on the right).
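
A corresponding sketch for the adjacency matrix shows where the count \(C_M=n^2+n\) comes from; the single edge set here is again hypothetical.

    import numpy as np

    n = 9                                # number of nodes
    A = np.zeros((n, n), dtype=int)      # A[i][j] = 1 iff there is an edge i -> j
    A[0, 1] = 1                          # hypothetical edge V1 -> V2 (0-based)

    C_M = n * n + n                      # n^2 matrix cells plus n node labels
    print(C_M)                           # 90 for n = 9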

Cross-linked list example

Figure 3 is the cross-linked list representation of a flowchart.

figure 3

A flowchart represented by the cross-linked list (flowchart on the left; cross-linked list on the right).

The storage space used by the cross-linked list is the same as that of the adjacency list, but the representation is more complex. An advantage of the cross-linked list representation is that it clearly expresses the edges flowing into and out of the same node within one graph. When the flowchart contains conditions and loops, the loop relationship and the number of loop iterations can be specified in the 'mark' field, and nodes with conditions can be identified through the connections between the node table and the entries in the edge table. The cross-linked list is therefore more efficient than the adjacency list for storing flowcharts with loops and conditions.

With the adjacency matrix, the adjacency list, or the cross-linked list, it is difficult to describe loops between nodes concisely and clearly. Additionally, complex flowcharts can have sub-flowcharts nested in a node, whose sub-nodes can in turn have sub-flowcharts nested in them, and so on. With such layer-by-layer nesting, the adjacency matrix, adjacency list, and cross-linked list structures become extremely complex, and the storage space and time complexity become excessive.

Hierarchical data structures for flowcharts

To resolve the above problems, we design two new data structures for flowcharts that can effectively and efficiently express both the sequential and nesting relationships among the nodes: a hierarchical matrix data structure and a hierarchical table data structure as illustrated in Fig. 4 .

figure 4

Hierarchical data structures for flowcharts.

Hierarchical matrix data structure

In the hierarchical matrix data structure, row \(HP_j\) is the hierarchy (the j-th layer), and row \(E_{j,j+1}\) records the connections, i.e., the edges, from the nodes of the \(HP_j\) layer to those of the \(HP_{j+1}\) layer.

Figure 5 shows an example of the hierarchical matrix table, where '1' indicates that two nodes in adjacent layers are connected while '0' means they are not. The maximum number of connections between two adjacent layers is the product of the numbers of nodes in the two layers. In the example, the \(HP_1\) layer has one node and the \(HP_2\) layer has two nodes, so the \(E_{1,2}\) row has \(1\times 2=2\) connection entries. Similarly, there are 2 nodes in the \(HP_3\) layer and 3 nodes in the \(HP_4\) layer, so the \(E_{3,4}\) row has \(2 \times 3=6\) connection entries. This maximum of 6 connections determines the number of columns of the matrix; the symbol \(^\wedge\) denotes a null value. As an example of reading the '1'/'0' entries of an \(E_{j,j+1}\) row from left to right, suppose the \(HP_j\) layer has 2 nodes and the \(HP_{j+1}\) layer (the \((j+1)\)-th layer) has 3 nodes: the first node in the \(HP_j\) layer is connected to the first, second, and third nodes in the \(HP_{j+1}\) layer, and the second node in the \(HP_j\) layer is likewise connected to the first, second, and third nodes in the \(HP_{j+1}\) layer.

figure 5

Hierarchical matrix structure.

In Fig. 5 , there are 9 rows and 6 columns of cells (memory cells), totaling 54. In order to more clearly represent this data structure with a matrix and further optimize the storage space, each \(E_{j,j+1}\) row is decomposed into multiple rows. The number of rows is the maximum number of node connections in \(E_{j,j+1}\) divided by the maximum number of nodes in each layer. In the example in Fig. 5 , the maximum number of connections is 6, and the maximum number of nodes in each layer is 3; thus the number of rows to be decomposed into is \(6 \div 3=2\) , as shown in Fig. 6 , a transformed hierarchical matrix structure. After this optimization, the matrix has 13 rows and 3 columns of cells, totaling \(13 \times 3=39\) storage cells.

figure 6

A transformed hierarchical matrix structure.
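
The cell-count arithmetic above can be reproduced with a short Python sketch; the layer sizes are those of the running example (one node in \(HP_1\), two in \(HP_2\), two in \(HP_3\), three in \(HP_4\), one in \(HP_5\)), and the script is only an illustration of the counting, not the authors' code.

    layer_sizes = [1, 2, 2, 3, 1]        # nodes in HP_1 .. HP_5 (running example)
    max_nodes = max(layer_sizes)         # 3
    max_conn = max(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))  # 6

    # Original matrix: one row per layer plus one row per E_{j,j+1} block,
    # with max_conn columns.
    rows = len(layer_sizes) + (len(layer_sizes) - 1)       # 9 rows
    print(rows * max_conn)                                 # 9 * 6 = 54 cells

    # Transformed matrix: each E_{j,j+1} row is split into
    # max_conn / max_nodes rows, and the columns shrink to max_nodes.
    split = max_conn // max_nodes                          # 6 // 3 = 2
    rows_t = len(layer_sizes) + (len(layer_sizes) - 1) * split   # 5 + 4*2 = 13
    print(rows_t * max_nodes)                              # 13 * 3 = 39 cells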

If the connection relationships between the upper and lower layers contain several null values, the structure can be further optimized by decomposition, as shown in Fig. 7.

figure 7

Decomposed form of the hierarchical matrix structure.

Figure 8 shows a hierarchical matrix structure for nodes that have loops. Here decision labels Y and N are included, where Y denotes that the condition is met and N that it is not. If the decision label is Y, the loop continues; otherwise the loop stops and the next node starts.

figure 8

Hierarchical matrix structure with loops.

Hierarchical table data structure

Figure 9 shows a hierarchical table structure for flowcharts.

figure 9

Hierarchical table structure.

Figure 10 is an example of a flowchart hierarchy that includes node groups. It is assumed that node 9 is a node group, that is, a sub-flowchart is nested in node 9. The bottom-left is the nested sub-flowchart of node 9.

figure 10

Hierarchical structure of flowcharts with node groups.

Hierarchical traversal processes

The hierarchical traversal method is shown in Fig. 11, where \(G_1, G_2, \ldots , G_N\) are level-node groups; \(G_2\) is nested in \(G_1\), and \(G_{i+1}\) is nested in \(G_i\). Each \(G_i\) can contain multiple nested level-node groups; for example, \(G_2\) can contain \(G_{21}\) and \(G_{22}\). Each level-node group is composed of multiple layer-node groups: \(H_1, H_2, \ldots , H_N\) are the layer-node groups of a level-node group, and \(v_1, v_2, \ldots , v_N\) are the nodes of a layer-node group.

The hierarchical method traverses from \(G_1\) to \(G_2\) and then on to \(G_i\). Traversal of \(G_1\) goes from \(H_1\) to \(H_2\) and then to \(H_j\); after all layer-node groups are traversed, the process jumps to the layer-node groups of the nested level-node group. \(G_2\) and \(G_i\) follow the same process. Traversal of \(H_1\) visits \(v_1, v_2, \ldots , v_n\) in node order. When all nodes in \(G_1\) have been traversed, the process jumps to the nodes of the nested level-node group and starts traversing \(G_2\); if there are multiple nested level-node groups, they are traversed one by one, for example first \(G_{21}\) and then \(G_{22}\). A minimal code sketch of this order follows Fig. 11.

figure 11

Hierarchical traversal method.
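
Below is a minimal Python sketch of this traversal order under our own simple container classes (LevelGroup for \(G_i\), LayerGroup for \(H_j\)); it is an illustration of the order described above, not the authors' implementation.

    class LayerGroup:                    # H_j: the nodes of one layer
        def __init__(self, nodes):
            self.nodes = nodes

    class LevelGroup:                    # G_i: a level, possibly with nested levels
        def __init__(self, layers, nested=None):
            self.layers = layers         # list of LayerGroup, in layer order
            self.nested = nested or []   # nested LevelGroups (e.g., G_21, G_22)

    def hierarchical_traverse(level, visit):
        # First visit every layer of this level, node by node ...
        for layer in level.layers:
            for v in layer.nodes:
                visit(v)
        # ... then jump into the nested level-node groups one by one.
        for sub in level.nested:
            hierarchical_traverse(sub, visit)

    # One level G_1 whose layers match Fig. 14: v1 | v2,v3 | v4,v5 | v6,v7,v8 | v9.
    g1 = LevelGroup([LayerGroup(['v1']), LayerGroup(['v2', 'v3']),
                     LayerGroup(['v4', 'v5']), LayerGroup(['v6', 'v7', 'v8']),
                     LayerGroup(['v9'])])
    hierarchical_traverse(g1, print)     # v1, v2, v3, v4, v5, v6, v7, v8, v9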

Comparison of breadth-first, depth-first, and hierarchical traversal methods

Here we use specific examples to compare the various traversal methods. An example of the vertex sequence of breadth-first traversal is shown in Fig. 12. First visit vertex \(v_1\); then visit, in turn, all unvisited vertices \(v_2\) and \(v_3\) adjacent to \(v_1\); then visit all unvisited vertices adjacent to \(v_2\) and \(v_3\), and so on. If unvisited vertices remain, choose one of them as a new starting vertex and repeat the process until all vertices in the graph have been visited. A code sketch follows Fig. 12.

figure 12

( a ) Flowchart example; ( b ) Node sequence of breadth first traversal.
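
A breadth-first traversal in this spirit can be sketched as follows; the adjacency list is hypothetical and merely mimics the flavor of Fig. 12.

    from collections import deque

    def bfs(adj, start):
        # Visit a vertex, then all its unvisited neighbors, then theirs, ...
        order, visited, queue = [], {start}, deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj.get(u, []):
                if v not in visited:
                    visited.add(v)
                    queue.append(v)
        return order

    adj = {'v1': ['v2', 'v3'], 'v2': ['v4', 'v5'], 'v3': ['v6']}  # hypothetical
    print(bfs(adj, 'v1'))                # ['v1', 'v2', 'v3', 'v4', 'v5', 'v6']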

An example of the vertex sequence of depth-first traversal is shown in Fig. 13. First visit vertex \(v_1\); then select an unvisited vertex \(v_2\) adjacent to \(v_1\); then, starting from \(v_2\), continue to an unvisited vertex \(v_4\) adjacent to \(v_2\), and so on. Upon reaching a vertex \(v_6\) whose adjacent vertices have all been visited, return to the last vertex in the visited sequence that still has an unvisited adjacent vertex (here \(v_1\), with unvisited neighbor \(v_3\)) and traverse \(v_3\) from there. This process is repeated until all vertices in the graph have been visited. A code sketch follows Fig. 13.

figure 13

Example of node sequence of depth-first traversal for flowchart.
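
Depth-first traversal admits an equally short sketch; the same hypothetical adjacency list is repeated here so the snippet stands alone.

    def dfs(adj, start, visited=None, order=None):
        # Follow unvisited neighbors as deep as possible before backtracking.
        if visited is None:
            visited, order = set(), []
        visited.add(start)
        order.append(start)
        for v in adj.get(start, []):
            if v not in visited:
                dfs(adj, v, visited, order)
        return order

    adj = {'v1': ['v2', 'v3'], 'v2': ['v4', 'v5'], 'v3': ['v6']}  # as above
    print(dfs(adj, 'v1'))                # ['v1', 'v2', 'v4', 'v5', 'v3', 'v6']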

For a graph with n vertices (nodes) and e edges stored in an adjacency list, depth-first or breadth-first traversal visits every vertex in the vertex table once and scans every entry in the edge table once; its time complexity is therefore \({{\mathcal {O}}}(n+e)\). If the adjacency matrix is used as the storage structure, the time required to find all the edges of each vertex is \({{\mathcal {O}}}(n)\), and the time required to traverse all the vertices of the graph (e.g., the example of Fig. 5) is \({{\mathcal {O}}}(n) \times {{\mathcal {O}}}(n) = {{\mathcal {O}}}(n^2)\).

An example of the vertex sequence of hierarchical traversal is shown in Fig. 14. This flowchart has only one level, \(G_1\). Traversal starts from the first layer of \(G_1\), whose only vertex is \(v_1\). After \(v_1\), it traverses the second layer \((v_2, v_3)\), then the third layer \((v_4, v_5)\), then the fourth layer \((v_6, v_7, v_8)\), and finally the fifth layer \((v_9)\), until all layers are traversed. The traversal process then looks for nodes in the layer-node groups that have nested level-node groups; since no layer-node group in this example has a nested level-node group, the traversal ends.

figure 14

Hierarchical node traversal sequence (each dashed box represents a layer).

Analysis of the proposed data structures

The space complexity is defined as the storage space, and the time complexity is defined as the time efficiency of traversing the flowchart. There are two commonly used graph traversal methods: depth-first traversal and breadth-first traversal [25, 26]. Depth-first traversal is similar to the pre-order traversal of a tree: it starts from the starting node, selects an adjacent unvisited node, and recursively searches for adjacent nodes from that node. When no adjacent unvisited node remains but unvisited nodes still exist, it selects one of them as a new starting node and continues until all nodes are traversed. Breadth-first traversal starts from the starting node, visits all unvisited nodes adjacent to it one by one, and then visits the unvisited adjacent nodes of the next node, until all reachable nodes have been visited. If unvisited nodes remain, another unvisited node is selected as a new starting point. These processes are repeated until all nodes in the graph are visited.

For the hierarchical table and hierarchical matrix structures, the traversal method can be either breadth-first or depth-first. To further optimize the time complexity and to handle flowcharts with node groups, this paper also designs a level- and layer-based traversal method, named "hierarchical traversal" here. The details are as follows.

First determine the nodes of each level: the top level is treated as a single node, a "level-node group". Then determine the nodes of each layer: all nodes in a layer are treated as a single node, a "layer-node group". The layers are ordered and numbered, and the connection relationships between the nodes of a layer and those of the adjacent next layer are determined. The highest time complexity of traversing the connections between two adjacent layers is \({{\mathcal {O}}}(m \times n)\), where m is the number of nodes in the upper layer and n is the number of nodes in the lower layer.

The following traversal criteria optimize the time complexity of traversing the nodes in two adjacent layers. The nodes in each layer are labelled numerically in increasing order. When one of the two adjacent layers has only one node, the connection is one-to-many by default; when both layers have the same (multiple) number of nodes, the default connection between the two layers is first-to-first, second-to-second, and so on. The traversal then checks whether the number of edges matches a pre-determined number; if not, it starts from the node with the largest label in the upper layer, determines its connection to the node with the second-largest label in the lower layer, and re-checks the edge condition. This process continues until the edge condition is met. If no case meets the edge condition, the process restarts from the node with the second-largest label in the upper layer. An identifier is added to condition-checking nodes. Also, condition-checking nodes (a class) whose outgoing edges flow back into the same class in the upper layer receive a backflow identifier; nodes with a backflow identifier do not participate in checking whether to traverse to the next layer.
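
The default-connection phase of these criteria can be sketched in Python as below; this covers only the one-to-many and pairwise defaults, and omits the subsequent edge-count correction search, so it is a partial illustration under our own assumptions rather than the full procedure.

    def default_connections(upper, lower):
        # Default edge guesses between two adjacent layers:
        # one-to-many when a layer has a single node; otherwise pair
        # first-to-first, second-to-second, and so on.
        if len(upper) == 1:
            return [(upper[0], v) for v in lower]
        if len(lower) == 1:
            return [(u, lower[0]) for u in upper]
        return list(zip(upper, lower))

    print(default_connections(['v1'], ['v2', 'v3']))        # one-to-many
    print(default_connections(['v2', 'v3'], ['v4', 'v5']))  # pairwise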

Hierarchical table data structure analysis

Take the flowchart in Fig. 5 as an example. With the hierarchical traversal method, the nodes are numbered as follows: \(H_1\): \(v_1\); \(H_2\): \(v_2, v_3\); \(H_3\): \(v_4, v_5\); \(H_4\): \(v_6, v_7, v_8\); \(H_5\): \(v_9\). The layers are numbered in order; the first layer has only one node and the second layer has two nodes, so by the design rules \(v_1\) flows to both \(v_2\) and \(v_3\) by default. The third layer has two nodes: by default the first node of the second layer, \(v_2\), flows to \(v_4\) of the third layer, and the second node, \(v_3\), flows to \(v_5\); \(v_4\) and \(v_5\) of the third layer are condition-checking nodes. The fourth layer has three nodes: by default the first node of the third layer flows to the first node of the fourth layer, and the second node of the third layer flows to the second node of the fourth layer. The number of edges is now two fewer than required, so check whether the second node of the third layer actually flows to the second node of the fourth layer. If so, and the number of traversed edges has reached three, continue traversing between the first node of the third layer and the third node of the fourth layer. At this point four edges have been traversed, which meets the condition, and the traversal of the fourth and fifth layers begins. Since \(v_6\) and \(v_7\) of the fourth layer carry backflow identifiers, they do not participate in the check of whether they flow into the next layer, leaving only \(v_8\) of the fourth layer and \(v_9\) of the fifth layer. By default \(v_8\) flows into \(v_9\), and \(v_9\) is in the bottom layer, which ends the traversal of the flowchart.

The traversal time complexity of the flowchart in Fig. 5 is mainly determined by the third and fourth layers. The number of additionally traversed edges n and the number of nodes m participating in the traversal indicate the time complexity between two layers; for the third and fourth layers it is \({{\mathcal {O}}}(n+m)={{\mathcal {O}}}(5)\), and the total traversal time complexity of the five layers is roughly estimated as

\({{\mathcal {O}}}(C_H + n + m) = {{\mathcal {O}}}(5 + 5) = {{\mathcal {O}}}(10), \qquad (1)\)

where \(C_H\) is the number of layers. If the adjacency list structure is used to store the flowchart in Fig. 5, all nodes in the node table need to be visited once and all entries in the edge table scanned once. Thus the traversal time complexity is

\({{\mathcal {O}}}(N + e) = {{\mathcal {O}}}(9 + 11) = {{\mathcal {O}}}(20), \qquad (2)\)

where N is the total number of nodes and e is the total number of edges.

These results show that, for the flowchart in Fig. 5 as an experimental example, the ratio of the time complexities of the hierarchical table and the adjacency list is \(R_{ta} = \frac{{{\mathcal {O}}}(C_H + n + m)}{{{\mathcal {O}}}(N + e)} = \frac{10}{20} = 0.5\). The gain in time efficiency is \(A_{da}=\left( 1-R_{ta}\right) \times 100\% = \left( 1-0.5\right) \times 100\%=50\%\). This analysis shows that the time complexity of hierarchical traversal is significantly lower than that of the traditional methods.

If the hierarchical table structure is used, the space complexity (i.e., storage space) of the example shown in Fig. 5 is \(TS_c=N+e=9+11=20\).

Hierarchical matrix data structure analysis

If we adopt the adjacency matrix storage structure for the example in Fig. 5, all edges of each node must be searched, and all cells in the adjacency matrix must be traversed. The traversal time complexity is

\({{\mathcal {O}}}(N) \times {{\mathcal {O}}}(N) = {{\mathcal {O}}}(N^2) = {{\mathcal {O}}}(81). \qquad (3)\)

This is very high. If the hierarchical traversal method is adopted, traversing the flowchart of Fig. 5 is equivalent to traversing the hierarchical matrix table shown in Fig. 8, and the product of the numbers of rows and columns of that table is the traversal time complexity:

\({{\mathcal {O}}}(U_r \times U_c) = {{\mathcal {O}}}(24). \qquad (4)\)

This is significantly lower than the complexity in (3). Again taking the flowchart in Fig. 5 as an example, the ratio of the time complexities of the hierarchical matrix and the adjacency matrix is \(R_{tb} = \frac{{{\mathcal {O}}}(U_r \times U_c)}{{{\mathcal {O}}}(N^2)} = \frac{24}{81} \approx 0.3\). The gain in time efficiency is \(A_{db}=\left( 1-R_{tb} \right) \times 100\% =\left( 1 -0.3 \right) \times 100\%=70\%\).

The space complexity with the hierarchical matrix used for the flowchart shown in Fig. 5 is

\(U_r \times U_c + N, \qquad (5)\)

where \(U_r\) is the number of layers, \(U_c\) is the number of nodes in each layer, and N is the total number of nodes.

Experimental data from the examples show that the hierarchical table data structure reduces the traversal time by 50% compared with the existing adjacency list, while their storage spaces are similar. In specific applications the improvement depends on the characteristics of the flowcharts; the examples above are representative of commonly used flowcharts. The data also show that the hierarchical matrix data structure reduces the traversal time by nearly 70% and saves about 50% of the storage space compared with the existing adjacency matrix, as shown in Sections "Hierarchical table data structure analysis" and "Hierarchical matrix data structure analysis". In summary, both the hierarchical table structure and the hierarchical matrix data structure can greatly improve the traversal efficiency without increasing the storage space.

As AI applications continue to develop at an accelerated pace, graph processing consumes an ever-increasing amount of power. Optimizing the storage space and traversal time of flowcharts helps close the gap between available computing power and engineering needs. Compared with traditional adjacency lists, cross-linked lists, and adjacency matrices, the hierarchical data structures proposed in this paper can efficiently handle the multi-level nesting of flowcharts. They also have clear advantages in time and space complexity over existing methods, as shown by the examples presented.

Our further work will focus on applications of flowcharts based on the hierarchical data structures in low-code engineering, for example, in realizing full autonomy of large fleets of collaborative engineering vehicles (e.g., loaders, excavators, trucks) in mines based on flowcharts. The goal is to drive innovation in the field of low-code applications via new technologies at the "micro-technologies" level, thereby drastically improving efficiency.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Arifuzzaman, S. & Khan, M. Fast parallel conversion of edge list to adjacency list for large-scale graphs. In Proceedings of Symposium on High Performance Computing, 17-24 (2015).

Zhu, X. et al. A transactional graph storage system with purely sequential adjacency list scans. Proc. VLDB Endow. 13 (7), 1020–1034 (2020).


Lai, M. & Wen, Z. The adjacency matrix calculation based on the acquisition method diagram. In Proceedings of 4th International Conference on Computer Science and Network Technology (ICCSNT) (2015).

Kallaugher, J., Mcgregor, A., Price, E. & Vorotnikova, S. The complexity of counting cycles in the adjacency list streaming model. In Proceedings of 38th ACM Symposium on SIGMOD-SIGACT-SIGAI Principles of Database Systems , 119–133 (2019)

Lin, J. & Schatz, M. Design patterns for efficient graph algorithms in MapReduce. In Proceedings of 8th Workshop on Mining and Learning with Graphs, DC, USA, 78-85 (2010).

Andersen, R., Chung, F. & Lang, K. Local graph partitioning using PageRank vectors. In Proceedings of 47th Annual IEEE Symposium on Foundations of Computer Science, Berkeley, CA, USA, 475-486 (2006).

Kang, U., Tong, H. & Sun, J. GBASE: A scalable and general graph management system. In Proceedings of KDD , San Diego, CA, USA, 1091–1099 (2011).

Kiourtis, A., Karamolegkos, P., Karabetian, A., Voulgaris, K., Poulakis, Y., Mavrogiorgou, A. & Kyriazis, D. An autoscaling platform supporting graph data modelling big data analytics. In Studies in Health Technology and Informatics , 376–379 (2022).

Charntaweekhun, K. & Wangsiripitak, S. Visual programming using flowchart. In Proceedings of 2006 IEEE International Symposium on Communications and Information Technologies (2006).

Yokoyama, T., Axelsen, H. B. & Glück, R. Fundamentals of reversible flowchart languages. Theor. Comput. Sci. 611 , 87–115 (2016).


Reinhart, G. & Werne, J. Robot based system for the automation of flow assembly lines. Prod. Eng. Res. Devel. 3 , 121–126 (2009).

Mehrpouya, M. et al. The potential of additive manufacturing in the smart factory industrial 4.0: A review. Appl. Sci. 9 , 3865 (2019).

https://www.sw.siemens.com/en-US/digital-transformation/cloud/application-platform/.

Bruun, R. & Ghosh, S. The Collatz graph as flow-diagram, the Copenhagen Graph and the different Algorithms for generating the Collatz odd series. arXiv:2105.11334v3 (2021).

Meesum, S. & Saurabh, S. Rank reduction of directed graphs by vertex and edge deletions. Algorithmica (2016).

Aji, B. H., Sugeng, K. A. & Aminah, S. Characteristic polynomial and eigenvalues of antiadjacency matrix of directed unicyclic flower vase graph. J. Phys. Conf. Ser. (2021).

Hooshyar, D., Ahmad, R., Shamshirband, S., Yousefi, M. & Horng, S.-J. A flowchart-based programming environment for improving problem solving skills of Cs minors in computer programming. Asia Life Sci. 24 (2), 629–646 (2015).


https://www.javatpoint.com/linear-vs-non-linear-data-structure.

Cherini, R., Rearte, L. & Blanco, J. A shape analysis for non-linear data structures. In Proceedings of International Static Analysis Symposium: Static Analysis 201–217 (2010).

Li, L. Data structures and programming, Ch. 7: Linear data structures, 229–269 (1998).

https://www.geeksforgeeks.org/introduction-to-graphs-data-structure-and-algorithm-tutorials/.

McMillan, M. Data Structures and Algorithms with JavaScript, Ch. 11 (2014).

Sahni, S. Data Structures, Algorithms, and Applications in C++, 2nd edn, Ch. 16 (2004).

Sun, L., Du, H. & Hou, T. FR-DETR: End-to-end flowchart recognition with precision and robustness. IEEE Access 10 , 64292–64301 (2022).

Cornell course notes, Breadth-first and depth-first traversal. https://www.cs.cornell.edu/courses/cs1114/2013sp/sections/S05_graph_traversal.pdf.

Cormen, T. H., Leiserson, C. E., Rivest, R. L. & Stein, C. Introduction to Algorithms, 2nd edn, Ch. VI (MIT Press and McGraw-Hill, 2001).


Author information

Authors and affiliations.

School of Software and Microelectronics, Peking University, Beijing, 100871, China

Peng Zhang & Wenzhang Dou

School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, 97331, USA

Huaping Liu


Contributions

P.Z.: Conceptulization, formal analysis, investigation, methodology, and writing; W.D.: Resources and supervision; H.L.: Supervision, investigation, methodology, writing review and editing.

Corresponding author

Correspondence to Huaping Liu .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Zhang, P., Dou, W. & Liu, H. Hierarchical data structures for flowchart. Sci Rep 13 , 5800 (2023). https://doi.org/10.1038/s41598-023-31968-z


Received : 16 September 2022

Accepted : 20 March 2023

Published : 09 April 2023

DOI : https://doi.org/10.1038/s41598-023-31968-z





Course Info

  • Prof. Erik Demaine

Departments

  • Electrical Engineering and Computer Science

As Taught In

  • Algorithms and Data Structures

Advanced Data Structures

Course Description

Data structures play a central role in modern computer science. You interact with data structures even more often than with algorithms (think Google, your mail server, and even your network routers). In addition, data structures are essential building blocks in obtaining efficient algorithms. This course covers major results and current directions of research in data structures.

Acknowledgments

Thanks to videographers Martin Demaine and Justin Zhang.


DATA STRUCTURES FOR MODERN APPLICATIONS

  • September 2023
  • Publisher: Xoffencer International Publication
  • ISBN: 9788119534548

Authors:

  • Soumya Ranjan Jena, NIMS University, Jaipur, Rajasthan, BHARAT
  • Ismail Keshta, AlMaarefa College for Science and Technology
  • Haewon Byeon, Inje University

  • Open access
  • Published: 16 January 2024

A review of graph neural networks: concepts, architectures, techniques, challenges, datasets, applications, and future directions

  • Bharti Khemani,
  • Shruti Patil,
  • Ketan Kotecha &
  • Sudeep Tanwar

Journal of Big Data, volume 11, Article number: 18 (2024)


Deep learning has seen significant growth recently and is now applied to a wide range of conventional use cases, including graphs. Graph data provides relational information between elements and is a standard data format for various machine learning and deep learning tasks. Models that can learn from such inputs are essential for working with graph data effectively. This paper identifies nodes and edges within specific applications, such as text, entities, and relations, to create graph structures. Different applications may require various graph neural network (GNN) models. GNNs facilitate the exchange of information between nodes in a graph, enabling them to understand dependencies within the nodes and edges. The paper delves into specific GNN models like graph convolution networks (GCNs), GraphSAGE, and graph attention networks (GATs), which are widely used in various applications today. It also discusses the message-passing mechanism employed by GNN models and examines the strengths and limitations of these models in different domains. Furthermore, the paper explores the diverse applications of GNNs, the datasets commonly used with them, and the Python libraries that support GNN models. It offers an extensive overview of the landscape of GNN research and its practical implementations.

Introduction

Graph Neural Networks (GNNs) have emerged as a transformative paradigm in machine learning and artificial intelligence. The ubiquitous presence of interconnected data in various domains, from social networks and biology to recommendation systems and cybersecurity, has fueled the rapid evolution of GNNs. These networks have displayed remarkable capabilities in modeling and understanding complex relationships, making them pivotal in solving real-world problems that traditional machine-learning models struggle to address. GNNs’ unique ability to capture intricate structural information inherent in graph-structured data is significant. This information often manifests as dependencies, connections, and contextual relationships essential for making informed predictions and decisions. Consequently, GNNs have been adopted and extended across various applications, redefining what is possible in machine learning.

In this comprehensive review, we embark on a journey through the multifaceted landscape of Graph Neural Networks, encompassing an array of critical aspects. Our study is motivated by the ever-increasing literature and diverse perspectives within the field. We aim to provide researchers, practitioners, and students with a holistic understanding of GNNs, serving as an invaluable resource to navigate the intricacies of this dynamic field. The scope of this review is extensive, covering fundamental concepts that underlie GNNs, various architectural designs, techniques for training and inference, prevalent challenges and limitations, the diversity of datasets utilized, and practical applications spanning a myriad of domains. Furthermore, we delve into the intriguing future directions that GNN research will likely explore, shedding light on the exciting possibilities.

In recent years, deep learning (DL) has been called the gold standard in machine learning (ML). It has also steadily evolved into the most widely used computational technique in ML, producing excellent results on various challenging cognitive tasks, sometimes even matching or outperforming human ability. One benefit of DL is its capacity to learn enormous amounts of data [ 1 ]. GNN variations such as graph convolutional networks (GCNs), graph attention networks (GATs), and GraphSAGE have shown groundbreaking performance on various deep learning tasks in recent years [ 2 ].

A graph is a data structure that consists of nodes (also called vertices) and edges. Mathematically, it is defined as G = (V, E), where V denotes the set of nodes and E the set of edges. Edges in a graph can be directed or undirected, depending on whether directional dependencies exist between nodes. A graph can represent various data structures, such as social networks, knowledge graphs, and protein-protein interaction networks. Graphs are non-Euclidean spaces, meaning that the distance between two nodes in a graph is not necessarily equal to the distance between their coordinates in a Euclidean space. This makes applying traditional neural networks to graph data difficult, as they are typically designed for Euclidean data.

Graph neural networks (GNNs) are a type of deep learning model that can be used to learn from graph data. GNNs use a message-passing mechanism to aggregate information from neighboring nodes, allowing them to capture the complex relationships in graphs. GNNs are effective for various tasks, including node classification, link prediction, and clustering.

Organization of paper

The paper is organized as follows:

The primary focus of this research is to comprehensively examine Concepts, Architectures, Techniques, Challenges, Datasets, Applications, and Future Directions within the realm of Graph Neural Networks.

The paper delves into the Evolution and Motivation behind the development of Graph Neural Networks, including an analysis of the growth of publication counts over the years.

It provides an in-depth exploration of the Message Passing Mechanism used in Graph Neural Networks.

The study presents a concise summary of GNN learning styles and GNN models, complemented by an extensive literature review.

The paper thoroughly analyzes the Advantages and Limitations of GNN models when applied to various domains.

It offers a comprehensive overview of GNN applications, the datasets commonly used with GNNs, and the array of Python libraries that support GNN models.

In addition, the research identifies and addresses specific research gaps, outlining potential future directions in the field.

" Introduction " section describes the Introduction to GNN. " Background study " section provides background details in terms of the Evolution of GNN. " Research motivation " section describes the research motivation behind GNN. Section IV describes the GNN message-passing mechanism and the detailed description of GNN with its Structure, Learning Styles, and Types of tasks. " GNN Models and Comparative Analysis of GNN Models " section describes the GNN models with their literature review details and comparative study of different GNN models. " Graph Neural Network Applications " section describes the application of GNN. And finally, future direction and conclusions are defined in " Future Directions of Graph Neural Network " and " Conclusions " sections, respectively. Figure  1 gives the overall structure of the paper.

figure 1

The overall structure of the paper

Background study

As shown in Fig. 2 below, the evolution of GNNs started in 2005. Over the past five years, research in this area has grown substantially in depth, and graph neural networks are now used by researchers in fields such as NLP, computer vision, and healthcare.

figure 2

Year-wise publication count of GNN (2005–2022)

Graph neural network research evolution

Graph neural networks (GNNs) were first proposed in 2005, but only recently have they begun to gain traction. GNNs were first introduced by Gori [2005] and Scarselli [2004, 2009]. A node is naturally defined by its attributes and its connected nodes in the graph. A GNN aims to learn a state embedding \(h_v \in \mathbb{R}^s\) that encapsulates each node's neighborhood information. The state embedding \(h_v\), an s-dimensional vector of node v, can be used to generate an output \(o_v\), such as the predicted distribution of the node label [30]. Thomas Kipf and Max Welling introduced the graph convolutional network (GCN) in 2017. A GCN layer defines a first-order approximation of a localized spectral filter on graphs. GCNs can be thought of as convolutional neural networks expanded to handle graph-structured data.

Graph neural network evolution

As shown in Fig. 3 below, research on graph neural networks (GNNs) began in 2005 and is still ongoing. GNNs can handle a broad class of graphs and can be used for node-focused tasks, edge-focused tasks, graph-focused tasks, and many other applications. In 2005, Marco Gori introduced the concept of GNNs, defined as an extension of recursive neural networks [4]. Franco Scarselli also explained ranking web pages with the help of GNNs in 2005 [5]. In 2006, Swapnil Gandhi and Anand Padmanabha Iyer of Microsoft Research introduced distributed deep graph learning at scale, which defines a deep graph neural network [6], and explained concepts such as GCN and GAT [1]. Pucci and Gori used GNN concepts in recommendation systems.

figure 3

Graph Neural Network Evolution

In 2007, Chun-Guang Li, Jun Guo, and Hong-Gang Zhang used a semi-supervised learning concept with GNNs [7]: they proposed a pruning method to enhance the basic GNN and resolve the problem of choosing the neighborhood scale parameter. In 2008, Ziwei Zhang introduced Eigen-GNN [8], which works well with several GNN models. In 2009, Abhijeet V introduced the GNN concept in fuzzy networks [9], proposing a granular reflex fuzzy min-max neural network for classification. In 2010, DK Chaturvedi explained the concept of GNNs for soft computing techniques [10], and GNNs came to be widely used in many applications; in the same year, Tanzima Hashem discussed privacy-preserving group nearest neighbor queries [11]. The first initiative to use GNNs for knowledge graph embedding was R-GCN, which suggests a relation-specific transformation in the message-passing phase to deal with various relations.

Similarly, from 2011 to 2017, authors continued to survey new GNN concepts, and the number of such surveys has increased steadily since 2018. Our paper shows that GNN models such as GCN, GAT, and RGCN are helpful [12].

Literature review

Table 1 describes the literature survey on graph neural networks, including the application area, the dataset used, the model applied, and the performance evaluation. The literature covers the years 2018 to 2023.

Research motivation

We employ grid data structures for the normalization of image inputs, typically using an n*n filter. The result is computed by applying an aggregation or maximum function. This process works effectively because of the inherent fixed structure of images: we position the grid over the image, move the filter across it, and derive the output vector, as depicted on the left side of Fig. 4. In contrast, this approach is unsuitable for graphs. Graphs lack a predefined structure for data storage, and there is no inherent knowledge of node-to-neighbor relationships, as illustrated on the right side of Fig. 4. To overcome this limitation, we focus on graph convolution.

figure 4

CNN In Euclidean Space (Left), GNN In Euclidean Space (Right)

In the context of GCNs, convolutional operations are adapted to handle graphs’ irregular and non-grid-like structures. These operations typically involve aggregating information from neighboring nodes to update the features of a central node. CNNs are primarily used for grid-like data structures, such as images. They are well-suited for tasks where spatial relationships between neighboring elements are crucial, as in image processing. CNNs use convolutional layers to scan small local receptive fields and learn hierarchical representations. GNNs are designed for graph-structured data, where edges connect entities (nodes). Graphs can represent various relationships, such as social networks, citation networks, or molecular structures. GNNs perform operations that aggregate information from neighboring nodes to update the features of a central node. CNNs excel in processing grid-like data with spatial dependencies; GNNs are designed to handle graph-structured data with complex relationships and dependencies between entities.

Limitation of CNN over GNN

Graph Neural Networks (GNNs) draw inspiration from Convolutional Neural Networks (CNNs). Before delving into the intricacies of GNNs, it is essential to understand why Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) may not suffice for effectively handling data structured as graphs. As illustrated in Fig.  5 , Convolutional Neural Networks (CNNs) are designed for data that exhibits a grid structure, such as images. Conversely, Recurrent Neural Networks (RNNs) are tailored to sequences, like text.

figure 5

Convolution can be performed if the input is an image using an n*n mask (Left). Convolution can't be achieved if the input is a graph using an n*n mask. (Right)

Typically, we use arrays for storage when working with text data. Likewise, for image data, matrices are the preferred choice. However, as depicted in Fig.  5 , arrays and matrices fall short when dealing with graph data. In the case of graphs, we require a specialized technique known as Graph Convolution. This approach enables deep neural networks to handle graph-structured data directly, leading to a graph neural network.

Fig. 5 illustrates that we can employ masking techniques and apply filtering operations to transform the data into vector form when we have images. Conversely, traditional masking methods are not applicable when dealing with graph data as input, as shown in the right image.

Graph neural network

Graph Neural Networks, or GNNs, are a class of neural networks tailored for handling data organized in graph structures. Graphs are mathematical representations of nodes connected by edges, making them ideal for modeling relationships and dependencies in complex systems. GNNs have the inherent ability to learn and reason about graph-structured data, enabling diverse applications. In this section, we first explain the message-passing mechanism of GNNs ("Message passing mechanism in graph neural network" section) and then describe the structure of graphs, graph types, and graph learning styles ("Description of GNN taxonomy" section).

Message passing mechanism in graph neural network

A GNN is an optimizable transformation on all graph properties (nodes, edges, and global context) that preserves graph symmetries (permutation invariance). Because a GNN does not alter the connectivity of the input graph, the output may be characterized using the same adjacency list and feature vector count as the input graph. However, the output graph has updated embeddings, because the GNN has modified each node, edge, and global-context representation.

In Fig. 6 , circles are nodes, and empty boxes show aggregation of neighbor/adjacent nodes. The model aggregates messages from A's local graph neighbors (i.e., B, C, and D). In turn, the messages coming from neighbors are based on information aggregated from their respective neighborhoods, and so on. This visualization shows a two-layer version of a message-passing model. Notice that the computation graph of the GNN forms a tree structure by unfolding the neighborhood around the target node [ 17 ]. Graph neural networks (GNNs) are neural models that capture the dependence of graphs via message passing between the nodes of graphs [ 30 ].

figure 6

How a single node aggregates messages from its adjacent neighbor nodes

The message-passing mechanism of graph neural networks is shown in Fig. 7. We take an input graph with a set of node features \(X \in \mathbb{R}^{d \times |V|}\) and use this information to produce node embeddings \(z_u\). We will also review how the GNN framework may embed subgraphs and whole graphs.

figure 7

Message passing mechanism in GNN

At each iteration, each node collects information from the neighborhood around it. As these iterations progress, each node embedding contains information from ever more distant reaches of the graph. After the first iteration (k = 1), each node embedding expressly retains information from its 1-hop neighborhood, i.e., the nodes reachable by a path of length 1 in the graph [31]. After the second iteration (k = 2), each node embedding contains information from its 2-hop neighborhood; in general, after k iterations each node embedding contains information from its k-hop neighborhood. The "information" carried by these messages has two main parts: structural information about the graph (e.g., node degrees) and feature-based information.

In the message-passing mechanism of a graph neural network, each node stores its message as a feature vector, and at each step the neighbors update this feature vector. This process aggregates the information: if the grey node is connected to the blue node, both features are aggregated, forming a new feature vector whose values are updated to include the new message.

Equations 4.1 and 4.2 describe one message-passing iteration, where h denotes the message (embedding), u the node, and k the iteration number:

\(m_{N(u)}^{(k)} = \text{AGGREGATE}^{(k)}\left(\left\{ h_v^{(k-1)}, \forall v \in N(u)\right\} \right) \qquad (4.1)\)

\(h_u^{(k)} = \text{UPDATE}^{(k)}\left(h_u^{(k-1)}, m_{N(u)}^{(k)}\right) \qquad (4.2)\)

Here AGGREGATE and UPDATE are arbitrary differentiable functions (i.e., neural networks), and \(m_{N(u)}\) is the "message" aggregated from u's graph neighborhood N(u); superscripts identify the embeddings and functions at the various message-passing iterations. At each iteration k of the GNN, the AGGREGATE function receives as input the set of embeddings of the nodes in u's graph neighborhood N(u) and generates a message \({m}_{N(u)}^{k}\) from this aggregated neighborhood information. The UPDATE function then combines the message \({m}_{N(u)}^{k}\) with the previous embedding \({h}_{u}^{(k-1)}\) of node u to generate the updated embedding \({h}_{u}^{k}\).
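
To make Eqs. 4.1 and 4.2 concrete, here is a minimal numpy sketch of k message-passing iterations, with sum-aggregation as AGGREGATE and a linear-plus-ReLU map as UPDATE; these particular choices, and all names and sizes, are our own illustrative assumptions, since the framework leaves both functions arbitrary.

    import numpy as np

    def message_passing(A, H, W_self, W_neigh, iterations):
        # A: (n, n) adjacency matrix; H: (n, d) node embedding matrix.
        for _ in range(iterations):
            m = A @ H                                      # Eq. 4.1: sum-AGGREGATE
            H = np.maximum(0, H @ W_self + m @ W_neigh)    # Eq. 4.2: linear + ReLU UPDATE
        return H

    rng = np.random.default_rng(0)
    A = np.array([[0, 1, 1, 0], [1, 0, 0, 1],
                  [1, 0, 0, 1], [0, 1, 1, 0]], dtype=float)
    H = rng.normal(size=(4, 3))
    W_self, W_neigh = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
    print(message_passing(A, H, W_self, W_neigh, iterations=2).shape)  # (4, 3)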

Description of GNN taxonomy

Figure 8 below shows our GNN taxonomy, divided into three parts [30].

figure 8

Graph Neural Network Taxonomy

1. Graph structures
2. Graph types
3. Graph learning tasks

Graph structure

Two scenarios, shown in Fig. 9, typically arise: structural and non-structural. In structural scenarios the graph structure is stated explicitly by the application, as in molecular and physical systems, knowledge graphs, and other such objects.

figure 9

Graph Structure

In non-structural scenarios, graphs are implicit, so we must first construct the graph from the task at hand, for example building a fully connected word graph for text or a scene graph for images.

Graph types

Complex graph types may carry more information about nodes and links. Graphs are typically divided into five categories, as shown in Fig. 10.

figure 10

Types of Graphs

Directed/undirected graphs

A directed graph is characterized by edges with a specific direction, indicating the flow from one node to another. Conversely, in an undirected graph, the edges lack a designated direction, allowing nodes to interact bidirectionally. As illustrated in Fig. 11 (left side), the directed graph exhibits directed edges, while in Fig. 11 (right side), the undirected graph conspicuously lacks directional edges. In undirected graphs, it's important to note that each edge can be considered to comprise two directed edges, allowing for mutual interaction between connected nodes.

figure 11

Directed/Undirected Graph

Static/dynamic graphs

The term “dynamic graph” pertains to a graph in which the properties or structure of the graph change with time. In dynamic graphs shown in Fig. 12 , it is essential to account for the temporal dimension appropriately. These dynamic graphs represent time-dependent events, such as the addition and removal of nodes and edges, typically presented as an ordered sequence or an asynchronous stream.

A noteworthy example of a dynamic graph can be observed in social networks like Twitter. In such networks, a new node is created each time a new user joins, and when a user follows another individual, a following edge is established. Furthermore, when users update their profiles, the respective nodes are also modified, reflecting the evolving nature of the graph. It's worth noting that different deep-learning libraries handle graph dynamics differently. TensorFlow, for instance, employs a static graph, while PyTorch utilizes a dynamic graph.

figure 12

Static/Dynamic Graph

Homogeneous/heterogeneous graphs

Homogeneous graphs have only one type of node and one type of edge, as shown in Fig. 13 (left). An example is an online social network whose nodes represent people and whose edges represent friendship; all nodes and edges are of the same types.

Heterogeneous graphs, shown in Fig. 13 (right), have two or more kinds of nodes or edges. An example is an online social network with several types of edges, such as 'friendship' and 'co-worker', between nodes of the 'person' type. Node and edge types play critical roles in heterogeneous networks and require further consideration.

figure 13

Homogeneous (Left), Heterogeneous (Right) Graph

Knowledge graphs

A Knowledge Graph (KG) is a network of entity nodes and relationship edges that can be represented as an array of triples of the form (h, r, t) or (s, r, o), where each triple records a head entity h, a relation r, and a tail entity t. From this perspective, a knowledge graph can be considered a heterogeneous graph. The knowledge graph visually depicts real-world objects and their relationships [32] and can be used for many applications, including information retrieval, knowledge-guided innovation, and question answering [30]. Entities are objects or things that exist in the real world, including individuals, organizations, places, music tracks, and movies. Each relation type similarly describes a particular relationship between elements. Figure 14 shows the knowledge graph for Mr. Sundar Pichai.

figure 14

Knowledge graph

Transductive/inductive graphs

In the transductive scenario shown in Fig. 15 (top), the entire graph is input, the labels of the validation data are hidden, and the labels for the validation data are then predicted. With an inductive graph, shown in Fig. 15 (bottom), we also input the entire graph (but sample it into batches), mask the validation data's labels, and forecast them. In the transductive setting, the model must predict the labels of the given unlabeled nodes; in the inductive setting, it can generalize to new unlabeled nodes from the same distribution.

figure 15

Transductive/Inductive Graphs

Transductive Graph:

In the transductive approach, the entire graph is provided as input.

This method involves concealing the labels of the validation data.

The primary objective is to predict the labels for the validation data.

Inductive Graph:

The inductive approach still uses the complete graph, but only a sample within a batch is considered.

A crucial step in this process is masking the labels of the validation data.

The key aim here is to make predictions for the labels of the validation data.

Graph learning tasks

We perform three tasks with graphs: node classification, link prediction, and graph classification, as shown in Fig. 16.

figure 16

Node Level Prediction (e.g., social network) (LEFT), Edge Level Prediction (e.g., Next YouTube Video?) (MIDDLE), Graph Level Prediction (e.g., molecule) (Right)

Node-level task

Node-level tasks are primarily concerned with determining the identity or function of each node within a graph. The core objective of a node-level task is to predict specific properties associated with individual nodes. For example, a node-level task in social networks could involve predicting which social group a new member is likely to join based on their connections and the characteristics of their friends' memberships. Node-level tasks are typically used when working with unlabeled data, such as identifying whether a particular individual is a smoker.

Edge-level task (link prediction)

Edge-level tasks revolve around analyzing relationships between pairs of nodes in a graph. An illustrative application of an edge-level task is assessing the compatibility or likelihood of a connection between two entities, as seen in matchmaking or dating apps. Another instance of an edge-level task is evident when using platforms like Netflix, where the task involves predicting the following video to be recommended based on viewing history and user preferences.

Graph-level

In graph-level tasks, the objective is to make predictions about a characteristic or property that encompasses the entire graph. For example, using a graph-based representation, one might aim to predict attributes like the olfactory quality of a molecule or its potential to bind with a disease-associated receptor. The essence of a graph-level task is to provide predictions that pertain to the graph as a whole. For instance, when assessing a newly synthesized chemical compound, a graph-level task might seek to determine whether the molecule has the potential to be an effective drug. The summary of all three learning tasks are shown in Fig. 17 .

figure 17

Graph Learning Tasks Summary

GNN models and comparative analysis of GNN models

Graph Neural Network (GNN) models are a category of neural networks crafted to process data organized in graph structures. They have garnered substantial acclaim across various domains, primarily due to their exceptional ability to capture intricate relationships and patterns within graph data. As illustrated in Fig. 18, we outline three distinct GNN models. A comprehensive description of these models, specifically Graph Convolutional Networks (GCN), Graph Attention Networks (GAT), and GraphSAGE, can be found in [33]. In "GNN models" section we delve into these models' intricacies, and in "Comparative study of GNN models" section we provide an in-depth analysis of their theoretical and practical aspects.

figure 18

Graph convolution neural network (GCN)

GCN is one of the basic graph neural network variants, developed by Thomas Kipf and Max Welling. The convolution layers in convolutional neural networks perform essentially the same operation as 'convolution' in GCNs: input neurons are multiplied by weights called filters or kernels. The filters act as a sliding window across the image, allowing the CNN to learn features from neighboring cells. Weight sharing applies the same filter throughout the image within one layer; when a CNN is used to distinguish photos of cats from non-cats, the same filter in the same layer is employed to detect the cat's nose and ears. Throughout the image, the same weight (or kernel or filter in CNNs) is applied [33]. GCNs were first introduced in "Spectral Networks and Deep Locally Connected Networks on Graphs" [34].

GCNs, which learn features by analyzing neighboring nodes, carry out similar behaviors. The primary difference between CNNs and GNNs is that CNNs are made to operate on regular (Euclidean) ordered data. GNNs, on the other hand, are a generalized version of CNNs with different numbers of node connections and unordered nodes (irregular on non-Euclidean structured data). GCNs have been applied to solve many problems, for example, image classification [ 35 ], traffic forecasting [ 36 ], recommendation systems [ 17 ], scene graph generation [ 37 ], and visual question answering [ 38 ].

GCNs are particularly well-suited for tasks that involve data represented as graphs, such as social networks, citation networks, recommendation systems, and more. These networks are an extension of traditional CNNs, widely used for tasks involving grid-like data, such as images. The key idea behind GCNs is to perform convolution operations on the graph data. This enables them to capture and propagate information through the nodes in a graph by considering both a node’s features and those of its neighboring nodes. GCNs typically consist of several layers, each performing convolution and aggregation steps to refine the node representations in the graph. By applying these layers iteratively, GCNs can capture complex patterns and dependencies within the graph data.

Working of graph convolutional network

A Graph Convolutional Network (GCN) is a neural network architecture designed for processing and analyzing graph-structured data. GCNs work by aggregating and propagating information through the nodes of a graph, following the steps shown in Fig. 19:

Figure 19: Working of GCN

Initialization:

Each node in the graph is associated with a feature vector. Depending on the application, these feature vectors can represent various attributes or characteristics of the nodes. For example, in a social network, each node might represent a user, and the features could include user profile information.

Convolution Operation:

The core of a GCN is the convolution operation, which is adapted from convolutional neural networks (CNNs). It aims to aggregate information from neighboring nodes. This is done by taking a weighted sum of the feature vectors of neighboring nodes. The graph's adjacency matrix determines the weights. The resulting aggregated information is a new feature vector for each node.

Weighted Aggregation:

The graph's adjacency matrix, typically after normalization, provides the weights for the aggregation process. For a given node, the features of its neighboring nodes are scaled by the corresponding values in the adjacency matrix, and the results are summed. A precise mathematical description of this aggregation step is given in the "Equation of GCN" section.

Activation function and learning weights:

The aggregated features are typically passed through an activation function (e.g., ReLU) to introduce non-linearity. The weight matrix W used in the aggregation step is learned during training. This learning process allows the GCN to adapt to the specific graph and task it is designed for.

Stacking Layers:

GCNs are often used in multiple layers. This allows the network to capture more complex relationships and higher-level features in the graph. The output of one GCN layer becomes the input for the next, and this process is repeated for a predefined number of layers.

Task-Specific Output:

The final output of the GCN can be used for various graph-based tasks, such as node classification, link prediction, or graph classification, depending on the specific application.

Equation of GCN

The Graph Convolutional Network (GCN) is based on a message-passing mechanism that can be described mathematically. For a graph with N nodes, let A' denote the normalized adjacency matrix, F the node feature matrix, W a trainable weight matrix, and b a trainable bias vector. The core equation of a simple, first-order GCN layer can then be expressed as

\(Z = \sigma\left(A'FW + b\right)\)    (5.1)

Equation 5.1 depicts a GCN layer's design. The normalized graph adjacency matrix A' and the node feature matrix F serve as the layer's inputs. The bias vector b and the weight matrix W are the layer's trainable parameters.

When multiplied with the feature matrix, the normalized adjacency matrix effectively smooths a node's feature vector based on the feature vectors of its close graph neighbors. This matrix captures the graph structure. A' is normalized so that each neighboring node's contribution is proportional to the graph's connectivity.

The layer definition is completed by applying an element-wise non-linear function, such as ReLU, to A'FW + b. The downstream node classification task requires deep neural architectures to learn a complicated hierarchy of node attributes; to achieve this, the layer's output matrix Z can be routed into another GCN layer or any other neural network layer.
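To make Eq. 5.1 concrete, here is a minimal NumPy sketch of a GCN layer. It uses the common renormalization A' = D^{-1/2}(A + I)D^{-1/2} with self-loops added, as in Kipf and Welling's formulation; the toy graph, dimensions, and variable names are illustrative only.

```python
import numpy as np

def normalize_adjacency(A):
    """A' = D^{-1/2} (A + I) D^{-1/2}: add self-loops, then scale each
    neighbor's contribution by the connectivity of both endpoints."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, F, W, b):
    """One GCN layer: Z = ReLU(A'FW + b)."""
    return np.maximum(A_norm @ F @ W + b, 0.0)

# Toy 4-node path graph: 0-1-2-3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_norm = normalize_adjacency(A)

F = np.random.randn(4, 8)              # 4 nodes, 8 input features each
W1, b1 = np.random.randn(8, 16), np.zeros(16)
W2, b2 = np.random.randn(16, 4), np.zeros(4)

Z1 = gcn_layer(A_norm, F, W1, b1)      # first layer: 8 -> 16 features
Z2 = gcn_layer(A_norm, Z1, W2, b2)     # stacked layer: 16 -> 4 features
print(Z2.shape)                        # (4, 4): one embedding per node
```

Stacking the layer twice, as in the last lines, is what lets each node's final embedding depend on its two-hop neighborhood.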

A summary of the graph convolutional network (GCN) is shown in Table 2.

Graph attention network (GAT)

The Graph Attention Network (GAT) is a neural network architecture for graph-structured data. It uses masked self-attentional layers to address the shortcomings of earlier methods that depended on graph convolutions or their approximations. By stacking such layers, nodes can (implicitly) assign different weights to different nodes in their neighborhood and attend to their neighborhoods' characteristics without an expensive matrix operation (like inversion) or prior knowledge of the graph's structure. GAT simultaneously addresses several significant limitations of spectral-based graph neural networks, making the model suitable for both inductive and transductive applications.

Working of GAT

The Graph Attention Network (GAT) is a neural network architecture designed for processing and analyzing graph-structured data, as shown in Fig. 20. GATs are a variation of Graph Convolutional Networks (GCNs) that incorporate attention mechanisms. GAT works through the following steps, shown in Fig. 21.

Figure 20: How attention coefficients update

As with other graph-based models, GAT starts with nodes in the graph, each associated with a feature vector. These features can represent various characteristics of the nodes.

Self-Attention Mechanism and Attention Computation:

GAT introduces an attention mechanism similar to those used in sequence-to-sequence models in natural language processing. The attention mechanism allows each node to focus on different neighbors when aggregating information, assigning different attention coefficients to the neighboring nodes and making the process more flexible. For each node in the graph, GAT computes attention scores for its neighboring nodes, based on the features of the central node and its neighbors; the precise computation is described under "How attention coefficients update" below.

The attention scores determine how much each neighbor’s feature contributes to the aggregation for the central node. This weighted aggregation is carried out for all neighboring nodes, resulting in a new feature vector for the central node.

Multiple Attention Heads and Output Combination:

GAT often employs multiple attention heads in parallel. Each attention head computes its attention scores and aggregation results. These multiple attention heads capture different aspects of the relationships in the graph. The outputs from the multiple attention heads are combined, typically by concatenation or averaging, to create a final feature vector for each node.

Learning Weights and Stacking Layers:

Similar to GCNs, GATs learn weight parameters during training. These weights are learned to optimize the attention mechanisms and adapt to the specific graph and task. GATs can be used in multiple layers to capture higher-level features and complex relationships in the graph. The output of one GAT layer becomes the input for the next layer.

The learning weights capture the importance of node relationships and contribute to information aggregation during the neighborhood aggregation process. The learning process in GNNs also relies on backpropagation and optimization algorithms. The stacking of GNN layers enables the model to capture higher-level abstractions and dependencies in the graph. Each layer refines the node representations based on information from the previous layer.

The final output of the GAT can be used for various graph-based tasks, such as node classification, link prediction, or graph classification, depending on the application.

Equation for GAT

GAT’s main distinctive feature is gathering information from the one-hop neighborhood [ 30 ]. Whereas a graph convolution operation in GCN produces the normalized sum of the node features of neighbors, a GAT layer is given by Equation 5.2:

\({h}_{i}^{(l+1)} = \sigma\left(\sum_{j \in N(i)} \alpha_{i,j} {W}^{(l)} {h}_{j}^{(l)}\right)\)    (5.2)

where \({h}_{i}^{(l+1)}\) is the current node output, \(\sigma\) denotes the non-linear ReLU function, \(j \in N(i)\) ranges over the one-hop neighbors, \(\alpha_{i,j}\) is the normalized attention coefficient, \({W}^{(l)}\) is the weight matrix, and \({h}_{j}^{(l)}\) denotes a neighbor's representation from the previous layer.

Why is GAT better than GCN?

We learned from the Graph Convolutional Network (GCN) that integrating local graph structure and node-level features results in good node classification performance. The way GCN aggregates messages, on the other hand, is structure-dependent, which may limit its use.

How attention coefficients update: the attention layer has four parts [ 47 ]:

A linear transformation: A shared linear transformation is applied to each node:

\({z}_{i}^{(l)} = {W}^{(l)} {h}_{i}^{(l)}\)

where \({h}_{i}^{(l)}\) is the node's feature vector, \({W}^{(l)}\) is the shared weight matrix, and \({z}_{i}^{(l)}\) is the transformed node embedding.

Attention coefficients: This step is crucial in the GAT paradigm. The pair-wise un-normalized attention score between two neighbors combines the 'z' embeddings of the two nodes: they are concatenated (denoted ||), passed through a dot product with a learnable weight vector \(a^{(l)}\), and fed to a LeakyReLU [ 1 ]:

\({e}_{ij}^{(l)} = \text{LeakyReLU}\left({a}^{(l)T}\left({z}_{i}^{(l)} \,\|\, {z}_{j}^{(l)}\right)\right)\)

Contrary to the dot-product attention used in the Transformer model, this form is called additive attention. In principle, every node could attend to every other node, discarding any structural information; the masking restricts attention to each node's neighborhood.

Softmax: The coefficients are normalized over all values of j with the softmax function, making them comparable across nodes:

\({\alpha }_{ij}^{(l)} = \frac{\exp\left({e}_{ij}^{(l)}\right)}{\sum_{k \in N(i)} \exp\left({e}_{ik}^{(l)}\right)}\)

Aggregation: This step is comparable to GCN. The neighborhood embeddings, scaled by the attention scores, are summed:

\({h}_{i}^{(l+1)} = \sigma\left(\sum_{j \in N(i)} {\alpha }_{ij}^{(l)} {z}_{j}^{(l)}\right)\)
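The four steps translate directly into code. Below is a minimal single-head sketch in NumPy; the toy graph, dimensions, and names are illustrative, and a practical implementation would vectorize the score computation and use multiple heads.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(A, H, W, a):
    """Single-head GAT layer. A: (N, N) adjacency, H: (N, F) features,
    W: (F, Fp) shared weights, a: (2*Fp,) attention vector."""
    N = A.shape[0]
    Z = H @ W                               # 1. shared linear transformation
    e = np.full((N, N), -np.inf)            # -inf masks non-neighbors
    for i in range(N):
        for j in range(N):
            if A[i, j] or i == j:           # masked attention: neighbors + self
                e[i, j] = leaky_relu(a @ np.concatenate([Z[i], Z[j]]))  # 2. scores
    alpha = np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)            # 3. softmax
    return np.maximum(alpha @ Z, 0.0)       # 4. attention-weighted sum + ReLU

# Toy triangle graph
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
H = np.random.randn(3, 4)
W = np.random.randn(4, 8)
a = np.random.randn(16)
print(gat_layer(A, H, W, a).shape)  # (3, 8)
```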

A summary of the graph attention network (GAT) is shown in Table 3.

GraphSAGE

GraphSAGE is a tangible realization of an inductive learning framework, shown in Fig. 22. During training it exclusively considers samples linked to the training set's edges. The process consists of two main steps, "sampling" and "aggregation." Subsequently, the node's own representation vector is paired with the aggregated neighborhood vector and passed through a fully connected layer with a non-linear activation function. Note that each network layer shares a common aggregator and weight matrix, so the consideration should be the number of layers (weight matrices) rather than the number of aggregators. Finally, a normalization step is applied to the layer's output.

Two major steps:

Sample: describes how to sample the neighbors of a node.

Aggregate: obtains the neighbors' embeddings and determines how to combine them to update the node's own embedding.

Figure 22: Working of the GraphSAGE method

Working of the GraphSAGE model:

First, initialize the feature vectors of all nodes in the input graph.

For each node, obtain its sampled neighbor nodes.

Use the aggregation function to aggregate the information of the neighbor nodes.

Combine the aggregated information with the node's own embedding and update the embedding through a non-linear transformation (a runnable sketch follows the equation below).

Types of aggregators

In the GraphSAGE method, four types of aggregators are used:

Simple neighborhood aggregator

Mean aggregator

LSTM aggregator: applies an LSTM to a random permutation of the neighbors.

Pooling aggregator: passes each neighbor's vector through a fully connected layer and applies a symmetric pooling function (e.g., an element-wise max).

Equation of GraphSAGE

The layer-wise update of GraphSAGE can be written as

\({h}_{v}^{k} = \sigma\left({W}_{k} \cdot \text{AGG}\left(\left\{{h}_{u}^{k-1}, \forall u \in N(v)\right\}\right) + {B}_{k} {h}_{v}^{k-1}\right)\)

where:

\({W}_{k}, {B}_{k}\): learnable weight matrices.

\({h}_{v}^{0} = {x}_{v}\): the initial (layer-0) embeddings are equal to the node features.

\(\text{AGG}\left(\left\{{h}_{u}^{k-1}\right\}\right)\): the generalized aggregation over the neighbors' previous-layer embeddings.

\({z}_{v} = {h}_{v}^{K}\): the embedding after K layers of neighborhood aggregation.

\(\sigma\): non-linearity (ReLU).
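The following is a minimal NumPy sketch of one GraphSAGE layer with a mean aggregator and fixed-size neighbor sampling; the toy graph, dimensions, and names are illustrative.

```python
import numpy as np

def graphsage_layer(adj_list, H, W_k, B_k, num_samples=3, seed=0):
    """h_v^k = ReLU(W_k . mean({h_u^{k-1}}) + B_k h_v^{k-1}), then L2-normalize."""
    rng = np.random.default_rng(seed)
    H_next = np.zeros((len(adj_list), W_k.shape[0]))
    for v, neighbors in enumerate(adj_list):
        # Sample: a fixed-size neighborhood instead of all neighbors
        sampled = rng.choice(neighbors,
                             size=min(num_samples, len(neighbors)),
                             replace=False)
        agg = H[sampled].mean(axis=0)      # Aggregate: mean of neighbor embeddings
        h = W_k @ agg + B_k @ H[v]         # Combine with the node's own embedding
        H_next[v] = np.maximum(h, 0.0)     # Non-linearity (ReLU)
    norms = np.linalg.norm(H_next, axis=1, keepdims=True)
    return H_next / np.maximum(norms, 1e-12)   # final normalization step

adj_list = [[1, 2], [0, 2, 3], [0, 1], [1]]    # toy graph as adjacency lists
H = np.random.randn(4, 5)                      # 4 nodes, 5 features each
W_k = np.random.randn(8, 5)                    # neighbor-aggregation weights
B_k = np.random.randn(8, 5)                    # self-embedding weights
print(graphsage_layer(adj_list, H, W_k, B_k).shape)  # (4, 8)
```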

A summary of GraphSAGE is shown in Table 4.

Comparative study of GNN models

Comparison based on practical implementation of GNN models

Table 5 describes the statistics of different datasets used in the literature for graph-structured input: CORA, Citeseer, and Pubmed. These statistics cover the kind of dataset, the number of nodes and edges, the number of classes, the number of features, and the label rate of each dataset. These details are essential for understanding the characteristics and scale of the datasets used in the context of citation networks. A comparison of the GNN models and their equations is shown in Fig. 23.

Figure 23: Equations of GNN models

Table 6 shows the performance of different Graph Neural Network (GNN) models on various datasets, reporting accuracy scores for each model and dataset. Additionally, the time taken by some models to compute results is indicated in seconds. This information is crucial for evaluating the performance of these models on specific datasets.

A comparison based on the theoretical concepts of the GNN models is given in Table 7.

Graph neural network applications

Graph construction

Graph Neural Networks (GNNs) have a wide range of applications spanning diverse domains, which encompass modern recommender systems, computer vision, natural language processing, program analysis, software mining, bioinformatics, anomaly detection, and urban intelligence, among others. The fundamental prerequisite for GNN utilization is the transformation or representation of input data into a graph-like structure. In the realm of graph representation learning, GNNs excel in acquiring essential node or graph embeddings that serve as a crucial foundation for subsequent tasks [ 61 ].

The construction of a graph involves a two-fold process:

Graph creation and

Learning about graph representations

Graph Creation: The generation of graphs is essential for depicting the intricate relationships embedded within diverse incoming data. With the varied nature of input data, various applications adopt techniques to create meaningful graphs. This process is indispensable for effectively communicating the structural nuances of the data, ensuring the nodes and edges convey their semantic significance, particularly tailored to the specific task at hand.

Learning about graph representations: The subsequent phase involves utilizing the graph expression acquired from the input data. In GNN-based Learning for graph representations, some studies employ well-established GNN models like GraphSAGE, GCN, GAT, and GGNN, which offer versatility for various application tasks. However, when faced with specific tasks, it may be necessary to customize the GNN architecture to address particular challenges more effectively.

Different applications in which the data is represented as a graph

Molecular Graphs: Atoms and electrons serve as the basic building blocks of matter and molecules, organized in three-dimensional structures. While all particles interact, we primarily acknowledge a covalent bond between two stable atoms when they sit at a stable distance from each other. Various atom-to-atom bond configurations exist, including single and double bonds. This three-dimensional arrangement is conveniently and commonly represented as a graph, with atoms as nodes and covalent bonds as edges [ 62 ], as in the small sketch below.
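As a tiny illustration of this representation, a water molecule can be encoded as a graph whose nodes are atoms and whose edges are covalent bonds:

```python
import numpy as np

# Water (H2O): the oxygen atom is bonded to two hydrogen atoms.
atoms = ["O", "H", "H"]        # node labels
bonds = [(0, 1), (0, 2)]       # undirected edges: O-H and O-H

# Build the adjacency matrix from the bond list
A = np.zeros((len(atoms), len(atoms)), dtype=int)
for i, j in bonds:
    A[i, j] = A[j, i] = 1
print(A)
# [[0 1 1]
#  [1 0 0]
#  [1 0 0]]
```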

Graphs of social networks: These networks are helpful research tools for identifying trends in the collective behavior of individuals, groups, and organizations. We may create a graph that represents groupings of people by visualizing individuals as nodes and their connections as edges [ 63 ].

Citation networks as graphs: When they publish papers, scientists regularly reference the work of other scientists. Each manuscript can be visualized as a node in a graph of these citation networks, with each directed edge denoting a citation from one publication to another. Additionally, we can include details about each document in each node, such as an abstract's word embedding [ 64 ].

Within computer vision: We may want to tag certain objects in visual scenes. We can then construct graphs by treating these objects as nodes and their relationships as edges.

GNNs are used to model data as graphs, allowing for the capture of complex relationships and dependencies that traditional machine learning models may struggle to represent. This makes GNNs a valuable tool for tasks where data has an inherent graph structure or where modeling relationships is crucial for accurate predictions and analysis.

Applications of graph neural networks (GNNs) in different fields

NLP (natural language processing)

Document Classification: GNNs can be used to model the relationships between words or sentences in documents, allowing for improved document classification and information retrieval.

Text Generation: GNNs can assist in generating coherent and contextually relevant text by capturing dependencies between words or phrases.

Question Answering: GNNs can help in question-answering tasks by representing the relationships between question words and candidate answers within a knowledge graph.

Sentiment Analysis: GNNs can capture contextual information and sentiment dependencies in text, improving sentiment analysis tasks.

Computer vision

Image Segmentation: GNNs can be employed for pixel-level image segmentation tasks by modeling relationships between adjacent pixels as a graph.

Object Detection: GNNs can assist in object detection by capturing contextual information and relationships between objects in images.

Scene Understanding: GNNs are used for understanding complex scenes and modeling spatial relationships between objects in an image.

Bioinformatics

Protein-Protein Interaction Prediction: GNNs can be applied to predict interactions between proteins in biological networks, aiding in drug discovery and understanding disease mechanisms.

Genomic Sequence Analysis: GNNs can model relationships between genes or genetic sequences, helping in gene expression prediction and sequence classification tasks.

Drug Discovery: GNNs can be used for drug-target interaction prediction and molecular property prediction, which is vital in pharmaceutical research.

Table 8 offers a concise overview of various research papers that utilize Graph Neural Networks (GNNs) in diverse domains, showcasing the applications and contributions of GNNs in each study.

Table 9 highlights various applications of GNNs in Natural Language Processing, Computer Vision, and Bioinformatics domains, showcasing how GNN models are adapted and used for specific tasks within each field.

Future directions of graph neural network

The main emphasis of this survey was the contribution of the existing literature to GNN principles, models, datasets, and applications. In this section, several potential future research directions are suggested. Significant challenges have been noted, including imbalanced datasets, the effectiveness of current methods, and text classification, and we have examined remedies to address these problems. We suggest future and advanced directions regarding domain adaptation, data augmentation, and improved classification. Table 10 displays these future directions.

Imbalanced Datasets—Limited labeled data, domain-dependent data, and imbalanced data are currently issues with available datasets. Transfer learning and domain adaptation are solutions to these issues.

Accuracy of Existing Systems/Models—Deep learning models such as GCN, GAT, and GraphSAGE can be utilized to increase the efficiency and precision of current systems. Additionally, training models on sizable, domain-specific datasets can enhance performance.

Enhancing Text Classification: Text classification poses another significant challenge, which is effectively addressed by leveraging advanced deep learning methodologies like graph neural networks, contributing to the improvement of text classification accuracy and performance.

Table 10 describes the research gaps and future directions presented in the literature above. These highlight the challenges and proposed solutions in the field of text classification and structural analysis.

Table 11 provides an overview of different research papers, their publication years, the applications they address, the graph structures they use, the graph types, the graph tasks, and the specific Graph Neural Network (GNN) models utilized in each study.

Conclusions

Graph Neural Networks (GNNs) have witnessed rapid advancements in addressing the unique challenges presented by data structured as graphs, a domain where conventional deep learning techniques, originally designed for images and text, often struggle to provide meaningful insights. GNNs offer a powerful and intuitive approach that finds broad utility in applications relying on graph structures. This comprehensive survey on GNNs offers an in-depth analysis covering critical aspects such as GNN fundamentals, the interplay with convolutional neural networks, GNN message-passing mechanisms, diverse GNN models, practical use cases, and a forward-looking perspective. Our central focus is on elucidating the foundational characteristics of GNNs, a field teeming with contemporary applications that continually enhance our comprehension and utilization of this technology.

The continuous evolution of GNN-based research has underscored the growing need to address issues related to graph analysis, which we aptly refer to as the frontiers of GNNs. In our exploration, we delve into several crucial recent research domains within the realm of GNNs, encompassing areas like link prediction, graph generation, and graph categorization, among others.

Availability of data and materials

Not applicable.

Abbreviations

GNN: Graph Neural Network

GCN: Graph Convolution Network

GAT: Graph Attention Networks

NLP: Natural Language Processing

CNN: Convolution Neural Networks

RNN: Recurrent Neural Networks

ML: Machine Learning

DL: Deep Learning

KG: Knowledge Graph

Pucci A, Gori M, Hagenbuchner M, Scarselli F, Tsoi AC. Investigation into the application of graph neural networks to large-scale recommender systems, infona.pl, no. 32, no 4, pp. 17–26, 2006.

Mahmud FB, Rayhan MM, Shuvo MH, Sadia I, Morol MK. A comparative analysis of Graph Neural Networks and commonly used machine learning algorithms on fake news detection, Proc. - 2022 7th Int. Conf. Data Sci. Mach. Learn. Appl. CDMA 2022, pp. 97–102, 2022.

Cui L, Seo H, Tabar M, Ma F, Wang S, Lee D, Deterrent: Knowledge Guided Graph Attention Network for Detecting Healthcare Misinformation, Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., pp. 492–502, 2020.

Gori M, Monfardini G, Scarselli F. A new model for learning in graph domains, Proc. Int. Jt. Conf. Neural Networks, vol. 2, pp. 729–734, 2005, https://doi.org/10.1109/IJCNN.2005.1555942 .

Scarselli F, Yong SL, Gori M, Hagenbuchner M, Tsoi AC, Maggini M. Graph neural networks for ranking web pages, Proc.—2005 IEEE/WIC/ACM Int. Web Intell. WI 2005, vol. 2005, no. January, pp. 666–672, 2005, doi: https://doi.org/10.1109/WI.2005.67 .

Gandhi S, Iyer AP. P3: Distributed deep graph learning at scale, Proc. 15th USENIX Symp. Oper. Syst. Des. Implementation, OSDI 2021, pp. 551–568, 2021.

Li C, Guo J, Zhang H. Pruning neighborhood graph for geodesic distance based semi-supervised classification, in 2007 International Conference on Computational Intelligence and Security (CIS 2007), 2007, pp. 428–432.

Zhang Z, Cui P, Pei J, Wang X, Zhu W, Eigen-gnn: A graph structure preserving plug-in for gnns, IEEE Trans. Knowl. Data Eng., 2021.

Nandedkar AV, Biswas PK. A granular reflex fuzzy min–max neural network for classification. IEEE Trans Neural Netw. 2009;20(7):1117–34.


Chaturvedi DK, Premdayal SA, Chandiok A. Short-term load forecasting using soft computing techniques. Int’l J Commun Netw Syst Sci. 2010;3(03):273.


Hashem T, Kulik L, Zhang R. Privacy preserving group nearest neighbor queries, in Proceedings of the 13th International Conference on Extending Database Technology, 2010, pp. 489–500.

Sun Z et al. Knowledge graph alignment network with gated multi-hop neighborhood aggregation, in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, vol. 34, no. 01, pp. 222–229.

Zhang M, Chen Y. Link prediction based on graph neural networks. Adv Neural Inf Process Syst. 31, 2018.

Stanimirović PS, Katsikis VN, Li S. Hybrid GNN-ZNN models for solving linear matrix equations. Neurocomputing. 2018;316:124–34.

Stanimirović PS, Petković MD. Gradient neural dynamics for solving matrix equations and their applications. Neurocomputing. 2018;306:200–12.

Zhang C, Song D, Huang C, Swami A, Chawla NV. Heterogeneous graph neural network, in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 793–803.

Fan W et al. Graph neural networks for social recommendation," in The world wide web conference, 2019, pp. 417–426.

Gui T et al. A lexicon-based graph neural network for Chinese NER," in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 1040–1050.

Qasim SR, Mahmood H, Shafait F. Rethinking table recognition using graph neural networks, in 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019, pp. 142–147

You J, Ying R, Leskovec J. Position-aware graph neural networks, in International conference on machine learning, 2019, pp. 7134–7143.

Cao D, et al. Spectral temporal graph neural network for multivariate time-series forecasting. Adv Neural Inf Process Syst. 2020;33:17766–78.

Xhonneux LP, Qu M, Tang J. Continuous graph neural networks. In International Conference on Machine Learning, 2020, pp. 10432–10441.

Zhou K, Huang X, Li Y, Zha D, Chen R, Hu X. Towards deeper graph neural networks with differentiable group normalization. Adv Neural Inf Process Syst. 2020;33:4917–28.

Gu F, Chang H, Zhu W, Sojoudi S, El Ghaoui L. Implicit graph neural networks. Adv Neural Inf Process Syst. 2020;33:11984–95.

Liu Y, Guan R, Giunchiglia F, Liang Y, Feng X. Deep attention diffusion graph neural networks for text classification. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 8142–8152.

Gasteiger J, Becker F, Günnemann S. Gemnet: universal directional graph neural networks for molecules. Adv Neural Inf Process Syst. 2021;34:6790–802.

Yao D et al. Deep hybrid: multi-graph neural network collaboration for hyperspectral image classification. Def. Technol. 2022.

Li Y, et al. Research on multi-port ship traffic prediction method based on spatiotemporal graph neural networks. J Mar Sci Eng. 2023;11(7):1379.

Djenouri Y, Belhadi A, Srivastava G, Lin JC-W. Hybrid graph convolution neural network and branch-and-bound optimization for traffic flow forecasting. Futur Gener Comput Syst. 2023;139:100–8.

Zhou J, et al. Graph neural networks: a review of methods and applications. AI Open. 2020;1(January):57–81. https://doi.org/10.1016/j.aiopen.2021.01.001 .

Rong Y, Huang W, Xu T, Huang J. Dropedge: Towards deep graph convolutional networks on node classification. arXiv preprint arXiv:1907.10903. 2019.

Abu-Salih B, Al-Qurishi M, Alweshah M, Al-Smadi M, Alfayez R, Saadeh H. Healthcare knowledge graph construction: a systematic review of the state-of-the-art, open issues, and opportunities. J Big Data. 2023;10(1):81.

Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. arXiv Prepr. arXiv1609.02907, 2016.

Berg RV, Kipf TN, Welling M. Graph Convolutional Matrix Completion. 2017, http://arxiv.org/abs/1706.02263

Monti F, Boscaini D, Masci J, Rodola E, Svoboda J, Bronstein MM. Geometric deep learning on graphs and manifolds using mixture model cnns. InProceedings of the IEEE conference on computer vision and pattern recognition 2017 (pp. 5115-5124).

Cui Z, Henrickson K, Ke R, Wang Y. Traffic graph convolutional recurrent neural network: a deep learning framework for network-scale traffic learning and forecasting. IEEE Trans Intell Transp Syst. 2020;21(11):4883–94. https://doi.org/10.1109/TITS.2019.2950416 .

Yang J, Lu J, Lee S, Batra D, Parikh D. Graph r-cnn for scene graph generation. InProceedings of the European conference on computer vision (ECCV) 2018 (pp. 670-685). https://doi.org/10.1007/978-3-030-01246-5_41 .

Teney D, Liu L, van Den Hengel A. Graph-structured representations for visual question answering. InProceedings of the IEEE conference on computer vision and pattern recognition 2017 (pp. 1-9). https://doi.org/10.1109/CVPR.2017.344 .

Yao L, Mao C, Luo Y. Graph convolutional networks for text classification. Proc AAAI Conf Artif Intell. 2019;33(01):7370–7.

De Cao N, Aziz W, Titov I. Question answering by reasoning across documents with graph convolutional networks. arXiv Prepr. arXiv1808.09920, 2018.

Gao H, Wang Z, Ji S. Large-scale learnable graph convolutional networks. in Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, 2018, pp. 1416–1424.

Hu F, Zhu Y, Wu S, Wang L, Tan T. Hierarchical graph convolutional networks for semi-supervised node classification. arXiv Prepr. arXiv1902.06667, 2019.

Lange O, Perez L. Traffic prediction with advanced graph neural networks. DeepMind Research Blog Post, https://deepmind.google/discover/blog/traffic-prediction-with-advanced-graph-neural-networks/ . 2020.

Duan C, Hu B, Liu W, Song J. Motion capture for sporting events based on graph convolutional neural networks and single target pose estimation algorithms. Appl Sci. 2023;13(13):7611.

Balcıoğlu YS, Sezen B, Çerasi CC, Huang SH. machine design automation model for metal production defect recognition with deep graph convolutional neural network. Electronics. 2023;12(4):825.

Baghbani A, Bouguila N, Patterson Z. Short-term passenger flow prediction using a bus network graph convolutional long short-term memory neural network model. Transp Res Rec. 2023;2677(2):1331–40.

Velickovic P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y. Graph attention networks. arXiv preprint arXiv:1710.10903. 2017.

Hamilton W, Ying Z, Leskovec J. Inductive representation learning on large graphs. Advances in neural information processing systems. 2017; 30.

Ye Y, Ji S. Sparse graph attention networks. IEEE Trans Knowl Data Eng. 2021;35(1):905–16.


Chen Z et al. Graph neural network-based fault diagnosis: a review. arXiv Prepr. arXiv2111.08185, 2021.

Brody S, Alon U, Yahav E. How attentive are graph attention networks? arXiv Prepr. arXiv2105.14491, 2021.

Huang J, Shen H, Hou L, Cheng X. Signed graph attention networks," in International Conference on Artificial Neural Networks. 2019, pp. 566–577.

Seraj E, Wang Z, Paleja R, Sklar M, Patel A, Gombolay M. Heterogeneous graph attention networks for learning diverse communication. arXiv preprint arXiv: 2108.09568. 2021.

Zhang Y, Wang X, Shi C, Jiang X, Ye Y. Hyperbolic graph attention network. IEEE Transactions on Big Data. 2021;8(6):1690–701.

Yang X, Ma H, Wang M. Research on rumor detection based on a graph attention network with temporal features. Int J Data Warehous Min. 2023;19(2):1–17.

Lan W, et al. KGANCDA: predicting circRNA-disease associations based on knowledge graph attention network. Brief Bioinform. 2022;23(1):bbab494.

Xiao L, Wu X, Wang G, 2019, December. Social network analysis based on graph SAGE. In 2019 12th international symposium on computational intelligence and design (ISCID) (Vol. 2, pp. 196–199). IEEE.

Chang L, Branco P. Graph-based solutions with residuals for intrusion detection: The modified e-graphsage and e-resgat algorithms. arXiv preprint arXiv:2111.13597. 2021.

Oh J, Cho K, Bruna J. Advancing graphsage with a data-driven node sampling. arXiv preprint arXiv:1904.12935. 2019.

Kapoor M, Patra S, Subudhi BN, Jakhetiya V, Bansal A. Underwater Moving Object Detection Using an End-to-End Encoder-Decoder Architecture and GraphSage With Aggregator and Refactoring. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023 (pp. 5635-5644).

Bhatti UA, Tang H, Wu G, Marjan S, Hussain A. Deep learning with graph convolutional networks: An overview and latest applications in computational intelligence. Int J Intell Syst. 2023;2023:1–28.

David L, Thakkar A, Mercado R, Engkvist O. Molecular representations in AI-driven drug discovery: a review and practical guide. J Cheminform. 2020;12(1):1–22.

Davies A, Ajmeri N. Realistic Synthetic Social Networks with Graph Neural Networks. arXiv preprint arXiv:2212.07843. 2022; 15.

Frank MR, Wang D, Cebrian M, Rahwan I. The evolution of citation graphs in artificial intelligence research. Nat Mach Intell. 2019;1(2):79–85.

Gao C, Wang X, He X, Li Y. Graph neural networks for recommender system. InProceedings of the Fifteenth ACM International Conference on Web Search and Data Mining 2022 (pp. 1623-1625).

Wu S, Sun F, Zhang W, Xie X, Cui B. Graph neural networks in recommender systems: a survey. ACM Comput Surv. 2022;55(5):1–37.

Wu L, Chen Y, Shen K, Guo X, Gao H, Li S, Pei J, Long B. Graph neural networks for natural language processing: a survey. Found Trends Mach Learn. 2023;16(2):119–328.

Wu L, Chen Y, Ji H, Liu B. Deep learning on graphs for natural language processing. InProceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021 (pp. 2651-2653).

Liu X, Su Y, Xu B. The application of graph neural network in natural language processing and computer vision. In2021 3rd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI) 2021 (pp. 708-714).

Harmon SHE, Faour DE, MacDonald NE. Mandatory immunization and vaccine injury support programs: a survey of 28 GNN countries. Vaccine. 2021;39(49):7153–7.

Yan W, Zhang Z, Zhang Q, Zhang G, Hua Q, Li Q. Deep data analysis-based agricultural products management for smart public healthcare. Front Public Health. 2022;10:847252.

Hamaguchi T, Oiwa H, Shimbo M, Matsumoto Y. Knowledge transfer for out-of-knowledge-base entities: a graph neural network approach. arXiv preprint arXiv:1706.05674. 2017.

Dai D, Zheng H, Luo F, Yang P, Chang B, Sui Z. Inductively representing out-of-knowledge-graph entities by optimal estimation under translational assumptions. arXiv preprint arXiv:2009.12765.

Pradhyumna P, Shreya GP. Graph neural network (GNN) in image and video understanding using deep learning for computer vision applications. In2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC) 2021 (pp. 1183-1189).

Shi W, Rajkumar R. Point-gnn: Graph neural network for 3d object detection in a point cloud. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition 2020 (pp. 1711-1719).

Wu Y, Dai HN, Tang H. Graph neural networks for anomaly detection in industrial internet of things. IEEE Int Things J. 2021;9(12):9214–31.

Pitsik EN, et al. The topology of fMRI-based networks defines the performance of a graph neural network for the classification of patients with major depressive disorder. Chaos Solitons Fractals. 2023;167: 113041.

Liao W, Zeng B, Liu J, Wei P, Cheng X, Zhang W. Multi-level graph neural network for text sentiment analysis. Comput Electr Eng. 2021;92: 107096.

Kumar VS, Alemran A, Karras DA, Gupta SK, Dixit CK, Haralayya B. Natural Language Processing using Graph Neural Network for Text Classification. In2022 International Conference on Knowledge Engineering and Communication Systems (ICKES) 2022; (pp. 1-5).

Dara S, Srinivasulu CH, Babu CM, Ravuri A, Paruchuri T, Kilak AS, Vidyarthi A. Context-Aware auto-encoded graph neural model for dynamic question generation using NLP. ACM transactions on asian and low-resource language information processing. 2023.

Wu L, Cui P, Pei J, Zhao L, Guo X. Graph neural networks: foundation, frontiers and applications. InProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2022; (pp. 4840-4841).

Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G. The graph neural network model. IEEE Trans neural networks. 2008;20(1):61–80.

Cao P, Zhu Z, Wang Z, Zhu Y, Niu Q. Applications of graph convolutional networks in computer vision. Neural Comput Appl. 2022;34(16):13387–405.

You R, Yao S, Mamitsuka H, Zhu S. DeepGraphGO: graph neural network for large-scale, multispecies protein function prediction. Bioinformatics. 2021;37(Supplement_1):i262-71.

Long Y, et al. Pre-training graph neural networks for link prediction in biomedical networks. Bioinformatics. 2022;38(8):2254–62.

Wu Y, Gao M, Zeng M, Zhang J, Li M. BridgeDPI: a novel graph neural network for predicting drug–protein interactions. Bioinformatics. 2022;38(9):2571–8.

Kang C, Zhang H, Liu Z, Huang S, Yin Y. LR-GNN: a graph neural network based on link representation for predicting molecular associations. Briefings Bioinf. 2022;23(1):bbab513.

Wei X, Huang H, Ma L, Yang Z, Xu L. Recurrent Graph Neural Networks for Text Classification. in 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS), 2020, pp. 91–97.

Schlichtkrull MS, De Cao N, Titov I. Interpreting graph neural networks for nlp with differentiable edge masking. arXiv Prepr. arXiv2010.00577, 2020.

Tu M, Huang J, He X, Zhou B. Graph sequential network for reasoning over sequences. arXiv Prepr. arXiv2004.02001, 2020.


Acknowledgements

I am grateful to all of those with whom I have had the pleasure to work during this research work. Each member has provided me extensive personal and professional guidance and taught me a great deal about scientific research and life in general.

This work was supported by the Research Support Fund (RSF) of Symbiosis International (Deemed University), Pune, India.

Author information

Authors and affiliations

Symbiosis Institute of Technology Pune Campus, Symbiosis International (Deemed University) (SIU), Lavale, Pune, 412115, India

Bharti Khemani

Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis Institute of Technology Pune Campus, Symbiosis International (Deemed University) (SIU), Lavale, Pune, 412115, India

Shruti Patil & Ketan Kotecha

IEEE, Department of Computer Science and Engineering, Institute of Technology, Nirma University, Ahmedabad, India

Sudeep Tanwar


Contributions

Conceptualization, BK and SP; methodology, BK and SP; software, BK; validation, BK, SP, KK; formal analysis, BK; investigation, BK; resources, BK; data curation, BK and SP; writing—original draft preparation, BK; writing—review and editing, SP, KK, and ST; visualization, BK; supervision, SP; project administration, SP, ST; funding acquisition, KK.

Corresponding author

Correspondence to Shruti Patil .

Ethics declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

See Tables  12 and 13

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Khemani, B., Patil, S., Kotecha, K. et al. A review of graph neural networks: concepts, architectures, techniques, challenges, datasets, applications, and future directions. J Big Data 11, 18 (2024). https://doi.org/10.1186/s40537-023-00876-4


Received : 28 June 2023

Accepted : 27 December 2023

Published : 16 January 2024

DOI : https://doi.org/10.1186/s40537-023-00876-4


Keywords

  • Graph Neural Network (GNN)
  • Graph Convolution Network (GCN)
  • Graph Attention Networks (GAT)
  • Message Passing Mechanism
  • Natural Language Processing (NLP)

Current research on data structures

Tree data structure: recently published documents


Granule-Based-Classifier (GbC): A Lattice Computing Scheme Applied on Tree Data Structures

Social robots keep proliferating. A critical challenge remains their sensible interaction with humans, especially in real world applications. Hence, computing with real world semantics is instrumental. Recently, the Lattice Computing (LC) paradigm has been proposed with a capacity to compute with semantics represented by partial order in a mathematical lattice data domain. In the aforementioned context, this work proposes a parametric LC classifier, namely a Granule-based-Classifier (GbC), applicable in a mathematical lattice (T,⊑) of tree data structures, each of which represents a human face. A tree data structure here emerges from 68 facial landmarks (points) computed in a data preprocessing step by the OpenFace software. The proposed (tree) representation retains human anonymity during data processing. Extensive computational experiments regarding three different pattern recognition problems, namely (1) head orientation, (2) facial expressions, and (3) human face recognition, demonstrate GbC capacities, including good classification results, and a common human face representation in different pattern recognition problems, as well as data induced granular rules in (T,⊑) that allow for (a) explainable decision-making, (b) tunable generalization enabled also by formal logic/reasoning techniques, and (c) an inherent capacity for modular data fusion extensions. The potential of the proposed techniques is discussed.

Secure and Practical Group Nearest Neighbor Query for Location-Based Services in Cloud Computing

Group nearest neighbor (GNN) query enables a group of location-based service (LBS) users to retrieve the point from a set of points of interest (POIs) with the minimum aggregate distance to them. Due to resource constraints and privacy concerns, the LBS provider outsources the encrypted POIs to a powerful cloud server. The encryption-and-outsourcing mechanism brings a challenge for data utilization. Since previous schemes based on the k-anonymity technique leak all contents of the POIs and return an answer set with redundant communication cost, the LBS system cannot work properly with those privacy-preserving schemes. In this paper, we illustrate a secure group nearest neighbor query scheme, referred to as SecGNN. It supports the GNN query with n (n ≥ 3) LBS users and assures both data privacy and query privacy. Since SecGNN only achieves linear search complexity, an efficiency-enhanced scheme (named SecGNN+) is introduced by taking advantage of the kd-tree data structure. Specifically, we convert the GNN problem to the nearest neighbor problem for the users' centroid, which can be computed by anonymous veto network and Burmester–Desmedt conference key agreement protocols. Furthermore, the SecGNN+ scheme is built from the kd-tree data structure and a designed tool that supports the computation of inner products over ciphertexts. Finally, we run experiments on a real database and a random database to evaluate the performance of our SecGNN and SecGNN+ schemes. The experimental results show the high efficiency of our proposed schemes.

A Polynomial Time Algorithm for Graph Isomorphism and Automorphism

This paper describes a polynomial time algorithm for solving graph isomorphism and automorphism. We introduce a new tree data structure called Walk Length Tree. We show that such tree can be both constructed and compared with another in polynomial time. We prove that graph isomorphism and automorphism can be solved in polynomial time using Walk Length Trees.

Correction to: HARE: A new Hash-based Authenticated Reliable and Efficient Modified Merkle Tree Data Structure to Ensure Integrity of Data in the Healthcare Systems

Efficient and scalable initialization of partitioned coupled simulations with preCICE

preCICE is an open-source library that provides comprehensive functionality to couple independent parallelized solver codes to establish a partitioned multi-physics multi-code simulation environment. For data communication between the respective executables at runtime, it implements a peer-to-peer concept, which renders the computational cost of the coupling per time step negligible compared to the typical run time of the coupled codes. To initialize the peer-to-peer coupling, the mesh partitions of the respective solvers need to be compared to determine the point-to-point communication channels between the processes of both codes. This initialization effort can become a limiting factor if we either reach memory limits or have to re-initialize communication relations in every time step. In this contribution, we remove two remaining bottlenecks: (i) we base the neighborhood search between mesh entities of two solvers on a tree data structure to avoid quadratic complexity, and (ii) we replace the sequential gather-scatter comparison of both mesh partitions by a two-level approach that first compares bounding boxes around mesh partitions in a sequential manner, subsequently establishes pairwise communication between processes of the two solvers, and finally compares mesh partitions between connected processes in parallel. We show that the two-level initialization method is five times faster than the old one-level scheme on 24,567 CPU cores using a mesh with 628,898 vertices. In addition, the two-level scheme is able to handle much larger computational meshes, since the central mesh communication of the one-level scheme is replaced with a fully point-to-point mesh communication scheme.
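A toy sketch of the two-level matching idea described above (my own illustration, not preCICE code): compare cheap axis-aligned bounding boxes first, and only compare actual vertex coordinates for pairs of partitions whose boxes overlap.

```python
import numpy as np

def bbox(points):
    return points.min(axis=0), points.max(axis=0)

def boxes_overlap(b1, b2):
    (lo1, hi1), (lo2, hi2) = b1, b2
    return np.all(lo1 <= hi2) and np.all(lo2 <= hi1)

def candidate_pairs(partitions_a, partitions_b):
    """Level 1: cheap O(P^2) box tests instead of comparing every vertex pair;
    only the returned pairs need the expensive vertex-level comparison."""
    boxes_a = [bbox(p) for p in partitions_a]
    boxes_b = [bbox(p) for p in partitions_b]
    return [(i, j)
            for i, ba in enumerate(boxes_a)
            for j, bb in enumerate(boxes_b)
            if boxes_overlap(ba, bb)]

# Two solvers, each with two 2D mesh partitions in disjoint regions
a = [np.random.rand(100, 2), np.random.rand(100, 2) + 5.0]
b = [np.random.rand(100, 2), np.random.rand(100, 2) + 5.0]
print(candidate_pairs(a, b))  # only overlapping partitions are paired
```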

An Effective Algorithm of Uneven Road Surface Modeling and Calculating Reaction Forces for a Vehicle Dynamics Simulation

Computer simulations of vehicle dynamics become more complex when the vehicle moves on an uneven road surface. In such a case two problems must be solved. The first concerns the way the road surface is modeled, and the second the way the point of interaction of the road's reaction forces on the vehicle wheels is precisely determined. In this paper, a triangular irregular network (TIN) surface was used for modeling surface unevenness, and the author's algorithm, based on the efficient kd-tree data structure, was developed for determining the point of application of the road surface reaction forces. The Pacejka Magic Formula tire model was used for calculating the reaction forces, including the rolling resistance force. The solution presented in the paper is computationally efficient and can therefore be used in real-time simulations, not only of vehicle dynamics but of any object moving over an uneven road surface.
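A minimal sketch of the kd-tree lookup underlying such contact queries, using SciPy's cKDTree (the paper uses its own kd-tree implementation; this only illustrates the data structure's role, with made-up coordinates):

```python
import numpy as np
from scipy.spatial import cKDTree

# TIN-like surface sample points in the plane (x, y)
surface_xy = np.random.rand(10_000, 2) * 100.0
tree = cKDTree(surface_xy)          # build once: O(n log n)

wheel_xy = np.array([12.3, 45.6])   # wheel contact point projected to (x, y)
dist, idx = tree.query(wheel_xy)    # nearest surface vertex, fast per query
print(idx, dist)
```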

Interactive Path Tracing and Reconstruction of Sparse Volumes

We combine state-of-the-art techniques into a system for high-quality, interactive rendering of participating media. We leverage unbiased volume path tracing with multiple scattering, temporally stable neural denoising and NanoVDB [Museth 2021], a fast, sparse voxel tree data structure for the GPU, to explore what performance and image quality can be obtained for rendering volumetric data. Additionally, we integrate neural adaptive sampling to significantly improve image quality at a fixed sample budget. Our system runs at interactive rates at 1920 × 1080 on a single GPU and produces high quality results for complex dynamic volumes.

HARE: A new Hash-based Authenticated Reliable and Efficient Modified Merkle Tree Data Structure to Ensure Integrity of Data in the Healthcare Systems

E-ART: a new encryption algorithm based on the reflection of a binary search tree

Data security has become crucial to most enterprise and government applications due to the increasing amount of data generated, collected, and analyzed. Many algorithms have been developed to secure data storage and transmission. However, most existing solutions require multi-round functions to prevent differential and linear attacks. This results in longer execution times and greater memory consumption, which are not suitable for large datasets or delay-sensitive systems. To address these issues, this work proposes a novel algorithm that uses, on one hand, the reflection property of a balanced binary search tree data structure to minimize the overhead, and on the other hand, a dynamic offset to achieve a high security level. The performance and security of the proposed algorithm were compared to Advanced Encryption Standard and Data Encryption Standard symmetric encryption algorithms. The proposed algorithm achieved the lowest running time with comparable memory usage and satisfied the avalanche effect criterion with 50.1%. Furthermore, the randomness of the dynamic offset passed a series of National Institute of Standards and Technology (NIST) statistical tests.

HOVA-FPPM: Flexible Periodic Pattern Mining in Time Series Databases Using Hashed Occurrence Vectors and Apriori Approach

Finding flexible periodic patterns in a time series database is nontrivial due to irregular occurrence of unimportant events, which makes it intractable or computationally intensive for large datasets. There exist various solutions based on Apriori, projection, tree, and other techniques to mine these patterns. However, the existence of constant size tree structure, i.e., suffix tree, with extra information in memory throughout the mining process, redundant and invalid pattern generation, limited types of mined flexible periodic patterns, and repeated traversal over tree data structure for pattern discovery, results in unacceptable space and time complexity. In order to overcome these issues, we introduce an efficient approach called HOVA-FPPM based on Apriori approach with hashed occurrence vectors to find all types of flexible periodic patterns. We do not rely on complex tree structure rather manage necessary information in a hash table for efficient lookup during the mining process. We measured the performance of our proposed approach and compared the results with the baseline approach, i.e., FPPM. The results show that our approach requires lesser time and space, regardless of the data size or period value.


Microsoft Research Blog

Improving LLM understanding of structured data and exploring advanced prompting methods

Published March 7, 2024

By Mengyu Zhou , Senior Researcher


This research paper was presented at the 17th ACM International Conference on Web Search and Data Mining (WSDM 2024), the premier conference on web-inspired research on search and data mining.


In today’s data-driven landscape, tables are indispensable for organizing and presenting information, particularly text. They streamline repetitive content, enhance data manageability, enable easier data analysis, and improve machine processing capabilities. Meanwhile, large language models (LLMs) are advancing in their ability to tackle challenges associated with natural language, but the degree to which they understand tables included in their prompts remains an open question. Our research aims to explore this question and improve how LLMs use and work with table-based data.

Our paper, “Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study,” presented at WSDM 2024, investigates what kinds of prompts most effectively enable LLMs to understand tables; how much LLMs inherently detect structured data; and how LLMs’ existing knowledge can be harnessed to improve this understanding. We also analyze the complex trade-off among multiple combinations of input designs and overall performance.


To address these questions, we propose a new benchmark called Structural Understanding Capabilities (SUC), shown in Figure 1(a), which focuses on specific tasks to assess LLMs’ ability to understand structured data in tables and to compare different types of prompts. We conducted a series of experiments using different prompt designs. Our findings, detailed in the paper, evaluate how each design enhances LLMs’ ability to work with tables.


Insights and findings using the SUC benchmark

Based on humans’ perception of tables, we developed tasks to evaluate how LLMs understand them. We conducted evaluations on GPT-3.5 and GPT-4 and discovered that the results depended on certain input factors, such as table format, content order, and partition marks. The findings, detailed in Tables 1 and 2, reveal some notable and unexpected findings:

  • Delimiter-separated formats (e.g., CSV, TSV) underperformed compared with HTML by 6.76 percent (see the serialization sketch after this list).
  • Using HTML and few-shot learning consistently improved performance. The effectiveness of other approaches, such as format explanation, role prompting, order change, and partition marks, varied depending on task difficulty and the required capacity.
  • Despite the simplicity of the benchmark tasks, the highest overall accuracy across seven tasks is only 65.43 percent. This underscores the need for LLMs to have better awareness of table structures and highlights areas for further improvement in table serialization.
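As a small illustration of the serialization choices compared above, the same toy table can be rendered as delimiter-separated text or as HTML before being placed in a prompt (the table content here is made up):

```python
rows = [("name", "score"), ("alice", "91"), ("bob", "87")]

# Delimiter-separated (CSV-style) serialization
csv_text = "\n".join(",".join(r) for r in rows)

# HTML serialization, which the benchmark found easier for GPT models to parse
header, *body = rows
html_text = "<table><tr>" + "".join(f"<th>{c}</th>" for c in header) + "</tr>"
for r in body:
    html_text += "<tr>" + "".join(f"<td>{c}</td>" for c in r) + "</tr>"
html_text += "</table>"

print(csv_text)
print(html_text)
```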

Our exploration suggests that:

  • LLMs have a basic understanding of table structures but are far from perfect, even in straightforward tasks like detecting the number of columns and rows.
  • Choosing the right combination of input designs can significantly enhance LLMs’ understanding of structured data.

Our findings revealed significant performance gaps in downstream tasks, attributed to the different combinations of serialization functions and input options. These gaps remained even with GPT-4, underscoring the effectiveness of our benchmark approach.

Table: accuracy (Acc) of GPT-4 versus previous models across tasks (Table Partition, Cell Lookup, Reverse Lookup, Column Retrieval, Row Retrieval, Size Detection, and Merged Cell Detection) and input formats (NL + Sep, Markdown, JSON, XML, and HTML). GPT-4 shows improved accuracy across nearly all tasks and formats, with notably high accuracy in the HTML format for the Table Partition and Merged Cell Detection tasks.

Improved performance with self-augmented prompting

Based on these benchmark evaluations, we investigated how LLMs’ existing knowledge could be used to enhance their understanding of structured data. To do this, we introduced self-augmentation, a model-agnostic technique that improves structural prompting—enabling LLMs to identify key values and ranges by tapping into their own internal knowledge. This technique simplifies and optimizes how LLMs utilize their existing knowledge base to improve their understanding of structured content, allowing them to generate intermediate structural insights. This process is shown in Figure 2, with the results detailed in Table 3.
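In outline, self-augmentation is a two-pass prompting scheme. The sketch below assumes a hypothetical call_llm helper standing in for any chat-completion API; the prompt wording is illustrative, not the paper's exact prompts:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stub: replace with a real chat-completion client."""
    raise NotImplementedError

def self_augmented_answer(table_text: str, question: str) -> str:
    # Pass 1: ask the model to surface intermediate structural insights
    # (e.g., key values and ranges) from the serialized table itself.
    insights = call_llm(
        "Here is a table:\n" + table_text +
        "\nIdentify the critical values and ranges in this table."
    )
    # Pass 2: fold the model's own insights back into the downstream task prompt.
    return call_llm(
        "Here is a table:\n" + table_text +
        "\nStructural insights: " + insights +
        "\nQuestion: " + question
    )
```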


Looking forward

Our study sets a key benchmark in expanding the capabilities of LLMs to better understand structured table data, moving beyond conventional natural language processing tasks. We suggest future research should prioritize the integration of structural information to improve performance with various structured data types. Additionally, we propose exploring LLMs’ ability to use external tools or agents for improved handling of structured data, opening new avenues for application.

Related publications

Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study



Real-life Applications of Data Structures and Algorithms (DSA)


You may have heard that DSA is used primarily in the field of computer science. Although DSA is most commonly used in computing, its applications are not restricted to it. Concepts from DSA can also be found in everyday life. Here we address common DSA concepts that we use in our day-to-day lives.

Applications of Data Structures and Algorithms

  • Application of Data Structures
  • Application of Arrays
  • Application of Strings
  • Application of Matrix
  • Application of Linked Lists
  • Application of Stack
  • Application of Queue
  • Application of Priority Queue
  • Application of Graph
  • Application of Tree
  • Application of Binary Search Tree
  • Application of Red-Black Tree
  • Application of AVL Tree
  • Application of Suffix Tree
  • Application of Trie
  • Application of Hash Tables
  • Application of Heap
  • Application of Algorithms
  • Application of Sorting Algorithms
  • Application of Greedy Algorithms
  • Application of Dijkstra's Algorithm
  • Application of Prim's and Kruskal's Algorithms
  • Application of Dynamic Programming Algorithms
  • Application of Backtracking Algorithms

A data structure is a particular way of organizing data in a computer so that it can be used effectively. The real-life applications of all the data structures are discussed below. 

Application of Arrays:

Arrays are the simplest data structures that store items of the same data type. A basic application of arrays is storing data in tabular format. For example, to store the contacts on our phone, the software simply places all our contacts in an array.

Some other applications of arrays, with a short Python sketch after the list, are:

  • The leaderboard of a game can be kept in an array of scores arranged in descending order, making each player's rank easy to read off.
  • A simple question paper is an array of numbered questions, each assigned some marks.
  • 2D arrays, commonly known as matrices, are used in image processing.
  • Arrays are also used in speech processing, where each speech signal is stored as an array.
  • Your viewing screen is also a multidimensional array of pixels.
  • Book titles in a library management system.
  • Online ticket booking.
  • Contacts on a cell phone.
  • CPU scheduling in computers.
  • Storing the possible moves of pieces on a chessboard.
  • Storing fixed-size images on a phone or laptop.
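
As a concrete illustration of the leaderboard item above, here is a minimal Python sketch (ours, not from the original article) that keeps scores in an array sorted in descending order so that rank is just position:

```python
# Minimal sketch: a game leaderboard as an array (Python list) of
# (player, score) pairs kept in descending score order.
scores = [("ada", 90), ("bob", 75), ("cy", 120)]
scores.sort(key=lambda pair: pair[1], reverse=True)

for rank, (player, score) in enumerate(scores, start=1):
    print(f"#{rank}: {player} ({score})")
```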

Application of Strings:

  • Spam email detection.
  • Plagiarism detection.
  • Search engines.
  • Digital forensics and information retrieval systems.
  • Spell checkers.
  • Validating user information in databases (a small keyword-filter sketch follows this list).
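
As a toy illustration of string applications such as spam detection above, here is a minimal keyword-based filter; real spam systems are statistical, so this is only a sketch:

```python
# Minimal sketch: flag a message if it contains known spam phrases.
SPAM_PHRASES = ["free money", "act now", "winner"]

def looks_like_spam(message: str) -> bool:
    text = message.lower()
    return any(phrase in text for phrase in SPAM_PHRASES)

print(looks_like_spam("You are a WINNER, claim your free money!"))  # True
```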

Application of Matrix:

A matrix is an ordered collection of elements arranged in rows and columns. The elements of a matrix are conventionally enclosed within brackets.

Some applications of a matrix, with a short sketch after the list, are:

  • In geology, matrices are used for making seismic surveys.
  • Used for plotting graphs and statistics, and for scientific studies and research in many different fields.
  • Matrices are also used to represent real-world data such as population figures and infant mortality rates.
  • They are a convenient representation for survey data.
  • Modeling refraction and reflection in optics.
  • Electronic circuits and quantum physics.
  • Media players.
  • Mailing lists.
  • Symbol table creation.
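
To make the image-processing use concrete, here is a minimal sketch that treats a grayscale image as a matrix (a list of rows) and brightens every pixel:

```python
# Minimal sketch: a grayscale image as a matrix; brighten each pixel,
# clamping at the 8-bit maximum of 255.
image = [
    [10, 20, 30],
    [40, 50, 60],
]

brightened = [[min(pixel + 25, 255) for pixel in row] for row in image]
print(brightened)  # [[35, 45, 55], [65, 75, 85]]
```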

Application of Linked Lists:

A linked list is a sequential data structure that connects elements, called nodes, through links.

Some other applications of the linked list, with a doubly linked list sketch after the list, are:

  • An image viewer uses a linked list to move to the previous and next images with the previous and next buttons.
  • Web pages can be accessed through previous and next URL links that are connected in a linked list.
  • Music players use the same technique to switch between tracks.
  • A circular linked list keeps track of turns in a multiplayer game.
  • MS Paint drawings and shapes are connected via a linked list on the canvas.
  • Social media content "feeds".
  • Symbol table management when designing a compiler.
  • Switching between applications and programs (Alt + Tab) in an operating system (implemented using a circular linked list).
  • Train coaches are connected to one another in a doubly linked list fashion.
  • Linked lists can be used to implement stacks, queues, graphs, and trees.
  • Performing undo operations.
  • The back button (LIFO behavior).
  • Syntax handling in code editors.
  • History of visited pages.
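
Here is a minimal sketch of the image-viewer item above: a doubly linked list whose prev/next pointers back the previous and next buttons:

```python
# Minimal sketch: a doubly linked list backing an image viewer's
# previous/next buttons.
class Node:
    def __init__(self, image):
        self.image = image
        self.prev = None
        self.next = None

def link(images):
    head = prev = None
    for img in images:
        node = Node(img)
        if prev:
            prev.next, node.prev = node, prev
        else:
            head = node
        prev = node
    return head

current = link(["a.png", "b.png", "c.png"])
current = current.next          # "next" button
print(current.image)            # b.png
current = current.prev          # "previous" button
print(current.image)            # a.png
```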

Application of Stack:

A stack is a data structure that uses last-in, first-out (LIFO) order.

Some applications of a stack are:

  • Converting infix to postfix expressions.
  • Undo/redo operations in word processors.
  • Parsing language syntax.
  • Used in many virtual machines, such as the JVM.
  • Forward and backward navigation in a browser.
  • History of visited websites.
  • Message logs: the messages you receive are arranged in a stack.
  • Call logs, emails, photo galleries, YouTube downloads, and notifications (the latest appears first).
  • Scratch cards earned after a Google Pay transaction.
  • Wearing and removing bangles, piles of dinner plates, stacked chairs.
  • Layering clothes on a cold evening: the first layer put on comes off last.
  • Last Hired, First Fired, as is typically applied when a company reduces its workforce in an economic recession.
  • Loading bullets into the magazine of a gun: the last one in is fired first.
  • The Java Virtual Machine.
  • Used in IDEs to check for proper parentheses matching (see the sketch after this list).
  • Media playlists, to play the previous and next songs.
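
Here is a minimal sketch of the parentheses-matching item above, using a stack the way an IDE's checker might:

```python
# Minimal sketch: use a stack to check balanced brackets.
PAIRS = {")": "(", "]": "[", "}": "{"}

def balanced(text: str) -> bool:
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)                      # remember the opener
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False                      # mismatched closer
    return not stack                              # leftovers mean unbalanced

print(balanced("f(a[0] + {b})"))  # True
print(balanced("f(a[0)]"))        # False
```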

Application of Queue:

A queue is a data structure that uses first-in, first-out (FIFO) order.

Some applications of a queue, with a short sketch after the list, are:

  • Operating systems use queues for job scheduling.
  • Queues can be used to handle congestion in networking.
  • Data packets in communication are arranged in queue format.
  • Outgoing e-mails are queued before being sent.
  • Servers queue incoming requests while responding to them.
  • When uploading or downloading photos, the first one queued completes first (unless there is threading).
  • Most internet requests and processes use queues.
  • Windows uses a circular queue to switch between multiple applications.
  • Escalators, printer spoolers, and car-wash lines are all queues.
  • A circular queue maintains the playing sequence of multiple players in a game.
  • A queue can be implemented as a linked-list-based, array-based, or stack-based queue.
  • Handling website traffic.
  • CPU scheduling.
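
Here is a minimal sketch of FIFO job scheduling with Python's collections.deque, mirroring how a printer spooler services jobs in arrival order:

```python
# Minimal sketch: FIFO job scheduling with a deque.
from collections import deque

jobs = deque()
jobs.append("print report.pdf")   # enqueue
jobs.append("print photo.png")

while jobs:
    job = jobs.popleft()          # dequeue: first in, first out
    print("processing:", job)
```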

Application of Priority Queue:

  • Process scheduling in the kernel (see the sketch after this list).
  • Priority queues are used in file-downloading operations in a browser.
  • Vehicles at a toll center.
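
Here is a minimal sketch of priority-based scheduling with Python's heapq module; the (priority, name) tuples are our own illustrative convention, with lower numbers served first:

```python
# Minimal sketch: kernel-style process scheduling with a priority queue.
import heapq

ready = []
heapq.heappush(ready, (2, "editor"))
heapq.heappush(ready, (0, "interrupt handler"))
heapq.heappush(ready, (1, "compiler"))

while ready:
    priority, process = heapq.heappop(ready)  # lowest priority value first
    print("run:", process)
```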

Application of Graph:

A graph is a data structure in which data is stored in a collection of interconnected vertices (nodes) and edges (paths).

Some applications of a graph are:  

  • Facebook's Graph API uses the structure of graphs.
  • Google's Knowledge Graph is also built on graphs.
  • Dijkstra's algorithm, the shortest-path-first algorithm, uses a graph to find the shortest path between nodes (see the sketch after this list).
  • GPS navigation systems also use shortest-path APIs.
  • Networking components rely heavily on graphs.
  • On Facebook, Instagram, and other social networking sites, every user is a node.
  • Data organization.
  • React's virtual DOM uses graph data structures.
  • MS Excel uses DAGs (directed acyclic graphs).
  • Path optimization algorithms, BFS, DFS.
  • Recommendation engines.
  • Scientific computations, flight networks, page ranking.
  • Google Maps, to find the nearest location.
  • Facebook, to suggest mutual friends.
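
Here is a minimal sketch of Dijkstra's algorithm on a small weighted adjacency-list graph, the core idea behind GPS-style routing mentioned above:

```python
# Minimal sketch: Dijkstra's shortest-path algorithm with a binary heap.
import heapq

graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 5)],
    "D": [],
}

def dijkstra(source):
    dist = {source: 0}
    frontier = [(0, source)]
    while frontier:
        d, u = heapq.heappop(frontier)
        if d > dist.get(u, float("inf")):
            continue                          # stale heap entry, skip it
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w               # found a shorter path to v
                heapq.heappush(frontier, (dist[v], v))
    return dist

print(dijkstra("A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```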

Application of Tree:

Trees are hierarchical structures having a single root node.  

Some applications of trees are:

  • XML parsers use tree algorithms.
  • Decision-tree algorithms in machine learning are built on trees.
  • Databases also use tree data structures for indexing.
  • The Domain Name System (DNS) also uses tree structures.
  • File explorers on mobile devices and computers (see the traversal sketch after this list).
  • BSTs are used in computer graphics.
  • On Q&A sites like Quora, comments are children of the question post.
  • Parsers (e.g., XML parsers).
  • Code compression (zip).
  • The DOM in HTML.
  • Evaluating an expression (i.e., parsing).
  • Integral to compilers and automata theory.
  • Storing the possible moves in a chess game.
  • Storing the genealogy information of biological species.
  • Used by the JVM (Java Virtual Machine) to store Java objects.
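
Here is a minimal sketch of the file-explorer item above: a tree of named nodes printed by recursive depth-first traversal:

```python
# Minimal sketch: a file-explorer-style tree, printed depth-first.
class TreeNode:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

def show(node, depth=0):
    print("  " * depth + node.name)   # indent by depth in the tree
    for child in node.children:
        show(child, depth + 1)

root = TreeNode("/", [
    TreeNode("home", [TreeNode("notes.txt")]),
    TreeNode("etc", [TreeNode("hosts")]),
])
show(root)
```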

Application of Binary Search Tree:

  • 3D game engines.
  • Computer graphics rendering.
  • Routing tables.
  • Used when there are frequent insertions/deletions and few searches.
  • K-means clustering using a red-black tree; databases; searching words inside dictionaries; searching on the web.
  • Process scheduling in Linux.
  • Workloads with more searches and fewer insertions/deletions.
  • Data analysis and data mining, and other applications that involve many searches.
  • Fast full-text search, used in most word processors.
  • Dictionary applications (see the sketch after this list).
  • Autocomplete in search.
  • Auto-completing text and spell checking.
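
Here is a minimal binary search tree sketch supporting insert and lookup, the operations behind the dictionary-style applications above:

```python
# Minimal sketch: a BST used as a dictionary with fast ordered lookup.
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def contains(root, key):
    while root:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

root = None
for word in ["map", "array", "tree", "heap"]:
    root = insert(root, word)
print(contains(root, "tree"))   # True
print(contains(root, "queue"))  # False
```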

Application of Hash Tables:

Hash tables store data in key-value pairs; they only store data that has a key associated with it. Insertion and search operations are fast in hash tables.

Some applications of hash tables, with a short sketch after the list, are:

  • Data stored in databases is generally in key-value format, which is handled through hash tables.
  • Every time we type something to search in Google Chrome or another browser, the desired output is generated on the principle of hashing.
  • Message digests, a cryptographic function, use hashing to create output such that recovering the original input from it is practically impossible.
  • On our computers, each file has two crucial pieces of information, the filename and the file path; hash tables are used to connect filenames to their corresponding file paths.
  • Social network "feeds".
  • Password hashing.
  • Fast data lookup: symbol tables for compilers, database indexing, caches, unique data representation.
  • Storing a set of fixed keywords that are referenced very frequently.
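
Here is a minimal sketch of a hash table (a Python dict) used as a compiler-style symbol table, one of the lookup applications above:

```python
# Minimal sketch: a dict as a symbol table with O(1) average lookup.
symbols = {}
symbols["count"] = {"type": "int", "offset": 0}
symbols["name"] = {"type": "str", "offset": 4}

if "count" in symbols:                 # constant-time membership test
    print(symbols["count"]["type"])    # int
```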

Application of Heap:

A heap is a special case of a binary tree in which parent nodes are compared to their children by value and arranged accordingly.

Some applications of heaps are:  

  • The heapsort algorithm sorts elements by building a heap, either a min-heap (each parent's key is less than or equal to its children's) or a max-heap (each parent's key is greater than or equal to its children's).
  • Heaps are used to implement priority queues, where priority is based on the heap order.
  • Security-conscious and embedded systems such as the Linux kernel use heapsort because of its guaranteed O(n log n) running time.
  • Heaps give an easy and fast way to find the k-th smallest (or largest) element (see the sketch after this list).
  • Used by the JVM (Java Virtual Machine) to store Java objects.
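
Here is a minimal sketch of the k-th smallest item above, using Python's heapq:

```python
# Minimal sketch: find the k-th smallest element with a min-heap.
import heapq

values = [9, 4, 7, 1, 8, 2]
k = 3

heapq.heapify(values)               # O(n) heap construction
kth = None
for _ in range(k):
    kth = heapq.heappop(values)     # pop the current minimum k times
print(kth)                          # 4 (the 3rd smallest)
```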

Application of Algorithms:

Algorithms are well-defined sets of instructions used to solve problems or perform tasks. In simpler terms, an algorithm is a set of operations performed step by step to execute a task. The real-life applications of algorithms are discussed below.

  • Ordering things by their value.
  • Backend databases (merge sort).
  • Playing cards with your friends (insertion sort).
  • C++'s sort() uses introsort (a hybrid of quicksort, heapsort, and insertion sort) and is faster than qsort().
  • The contact list on a phone.
  • Online shopping, to sort prices into different ranges (e.g., Flipkart and Amazon).
  • Dijkstra's algorithm.
  • Greedy choices when shopping on a tight budget but wanting to buy gifts for all family members.
  • Prim's and Kruskal's algorithms are used for finding minimum spanning trees.
  • Used in applications like Google Maps to find the shortest path in a graph.
  • In Google Maps, to find the shortest path between the source and a series of destinations (one by one) out of the various available paths.
  • In networking, to transfer data from a sender to various receivers sequentially.
  • Suppose we are coding a chess-playing algorithm and at a certain point the algorithm finds that a set of steps fails to win. It then backtracks to the last safe state and tries another possible set of steps (see the N-Queens sketch after this list).
  • Sudoku solvers.
  • Computer networking.
  • Solving the N-Queens problem.
  • Solving maze problems.
  • Finding a Hamiltonian path in a graph.
  • Solving the Knight's Tour problem.
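
Here is a minimal backtracking sketch for the N-Queens problem listed above: place queens row by row and undo any placement that leads to a dead end:

```python
# Minimal sketch: N-Queens by backtracking. Sets track occupied
# columns and diagonals so attack checks are O(1).
def solve(n):
    cols, diag1, diag2, placement = set(), set(), set(), []

    def backtrack(row):
        if row == n:
            return True                        # all queens placed
        for col in range(n):
            if col in cols or row + col in diag1 or row - col in diag2:
                continue                       # square is under attack
            cols.add(col); diag1.add(row + col); diag2.add(row - col)
            placement.append(col)
            if backtrack(row + 1):
                return True
            placement.pop()                    # backtrack: undo and retry
            cols.remove(col); diag1.remove(row + col); diag2.remove(row - col)
        return False

    return placement if backtrack(0) else None

print(solve(4))  # [1, 3, 0, 2]: the queen's column in each row
```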


Research: Resume Gaps Still Matter

By Boris Groysberg and Eric Lin


While attitudes about gaps are changing, data shows that they can still be damaging to job applicants.

Without knowing the details of a person's history, employers rely on signals of quality to make bets on who will be a quality employee with a strong organizational fit. Resume gaps used to be clear negative signals, but attitudes seem to be changing today. For example, LinkedIn recently adopted a new "Career Breaks" feature in which users can showcase skills acquired during a professional pause. While it is tempting to declare the present day a new age of tolerance and compassion, a deeper analysis suggests it might be wise to take a more guarded perspective, as the reality of the post-pandemic labor market is still unfolding. Drawing on both current studies and executive compensation data from the 2008 Great Recession period, the authors show that resume gaps hurt job seekers, both in their ability to get jobs and in their pay.

Attitudes toward resume gaps seem to be changing. While they were once considered a serious red flag for job candidates, today we're seeing more people talking openly and without reservation about taking a break from employment. A 2022 LinkedIn survey of 23,000 global workers found that nearly two-thirds of respondents had taken some sort of career break. LinkedIn released this survey data while introducing a new "Career Breaks" feature, allowing its users to showcase non-employment experiences and skills acquired during a professional pause.

  • Boris Groysberg is a professor of business administration in the Organizational Behavior unit at Harvard Business School and a faculty affiliate at the school's Race, Gender & Equity Initiative. He is the coauthor, with Colleen Ammerman, of Glass Half-Broken: Shattering the Barriers That Still Hold Women Back at Work (Harvard Business Review Press, 2021).
  • Eric Lin is an associate professor of business and co-chair of the business program at Oberlin College and Conservatory. His research focuses on human capital and talent management.



Chiu DT, Hamlat EJ, Zhang J, Epel ES, Laraia BA. Essential Nutrients, Added Sugar Intake, and Epigenetic Age in Midlife Black and White Women: NIMHD Social Epigenomics Program. JAMA Netw Open. 2024;7(7):e2422749. doi:10.1001/jamanetworkopen.2024.22749


Essential Nutrients, Added Sugar Intake, and Epigenetic Age in Midlife Black and White Women: NIMHD Social Epigenomics Program

  • 1 Community Health Sciences Division, School of Public Health, University of California, Berkeley
  • 2 Osher Center for Integrative Health, University of California, San Francisco
  • 3 Department of Psychiatry and Behavioral Sciences, University of California, San Francisco
  • 4 Department of Human Genetics, University of California, Los Angeles

Question   Are dietary patterns, including essential nutrients and added sugar intakes, and scores of nutrient indices associated with epigenetic aging?

Findings   In this cross-sectional study of 342 Black and White women at midlife, higher added sugar intake was associated with older epigenetic age, whereas higher essential, pro-epigenetic nutrient intake and higher Alternate Mediterranean Diet (aMED) and Alternate Healthy Eating Index (AHEI)–2010 scores (reflecting dietary alignment with Mediterranean diet and chronic disease prevention guidelines, respectively) were associated with younger epigenetic age.

Meaning   The findings of this study suggest a tandem importance in both optimizing nutrient intake and reducing added sugar intake for epigenetic health.

Importance   Nutritive compounds play critical roles in DNA replication, maintenance, and repair, and also serve as antioxidant and anti-inflammatory agents. Sufficient dietary intakes support genomic stability and preserve health.

Objective   To investigate the associations of dietary patterns, including intakes of essential nutrients and added sugar, and diet quality scores of established and new nutrient indices with epigenetic age in a diverse cohort of Black and White women at midlife.

Design, Setting, and Participants   This cross-sectional study included analyses (2021-2023) of women who had participated in the 1987-1997 National Heart, Lung, and Blood Institute Growth and Health Study (NGHS), which examined cardiovascular health in a community cohort of Black and White females aged between 9 and 19 years. Of the participants re-recruited between 2015 and 2019 from NGHS's California site, 342 had valid, completed diet and epigenetic assessments. The data were analyzed from October 2021 to November 2023.

Exposure   Diet quality scores of established nutrient indices (Alternate Mediterranean Diet [aMED], Alternate Healthy Eating Index [AHEI]–2010); scores for a novel, a priori–developed Epigenetic Nutrient Index [ENI]; and mean added sugar intake amounts were derived from 3-day food records.

Main Outcomes and Measures   GrimAge2, a second-generation epigenetic clock marker, was calculated from salivary DNA. Hypotheses were formulated after data collection. Healthier diet indicators were hypothesized to be associated with younger epigenetic age.

Results   A total of 342 women composed the analytic sample (mean [SD] age, 39.2 [1.1] years; 171 [50.0%] Black and 171 [50.0%] White participants). In fully adjusted models, aMED (β, −0.41; 95% CI, −0.69 to −0.13), AHEI-2010 (β, −0.05; 95% CI, −0.08 to −0.01), and ENI (β, −0.17; 95% CI, −0.29 to −0.06) scores, and added sugar intake (β, 0.02; 95% CI, 0.01-0.04) were each significantly associated with GrimAge2 in expected directions. In combined analyses, the aforementioned results with GrimAge2 were preserved with the association estimates for aMED and added sugar intake retaining their statistical significance.

Conclusions and Relevance   In this cross-sectional study, independent associations were observed for both healthy diet and added sugar intake with epigenetic age. To our knowledge, these are among the first findings to demonstrate associations between added sugar intake and epigenetic aging using second-generation epigenetic clocks and one of the first to extend analyses to a diverse population of Black and White women at midlife. Promoting diets aligned with chronic disease prevention recommendations and replete with antioxidant or anti-inflammatory and pro-epigenetic health nutrients while emphasizing low added sugar consumption may support slower cellular aging relative to chronological age, although longitudinal analyses are needed.

Epigenetic clocks powerfully predict biological age independent of chronological age. These clocks reflect altered gene and protein expression patterns, particularly those resulting from differential DNA methylation (DNAm) at CpG (5′-C-phosphate-G-3′) sites. DNAm that accumulates over time is a testament to the toll social, behavioral, and environmental forces can have on the body. 1 - 3 These alterations often result in pathogenic processes (eg, genomic instability, systemic inflammation, and oxidative stress) characteristic of aging and chronic disease. 1 , 4 , 5 As such, myriad clocks reflecting epigenetic age have been developed for a range of age- or disease-related targets. 4 , 6 The GrimAge series contains second-generation markers of epigenetic aging that account for clinical and functional biomarkers, and is most notable for its robust associations with human mortality and morbidity risk, including time to death and comorbidity counts. 6 , 7 The recently developed version 2 of the GrimAge clock (hereafter, GrimAge2) improved on the first’s predictive abilities and confirmed its applicability for people at midlife and of different racial and ethnic backgrounds. 1 , 6

Epigenetic changes are modifiable, and efforts to counter epigenetic alteration in humans have centered on lifestyle factors including diet, inspiring concepts of an "epigenetic diet" and "nutriepigenetics." 8 , 9 So far, 2 epidemiological studies have found higher diet quality to be associated with slower epigenetic aging, using clock measures related to mortality, including the first version of GrimAge. 7 , 10 In those studies, diet measures were reflective of healthy dietary patterns (eg, the Dietary Approaches to Stop Hypertension [DASH] diet, the Alternate Mediterranean Diet [aMED] score) emphasizing consumption of fruits, vegetables, whole grains, nuts and seeds, and legumes. 8 , 11 For example, the Mediterranean-style diet is largely plant-based with emphasis on extra virgin olive oil and seafood. This makes it replete with bioactive nutrients and phytotherapeutic compounds and low in highly processed, high-fat, and nutrient-poor foods, a mixture hypothesized to be protective against low-grade chronic inflammation ("inflammaging"), oxidative stress, intracellular and extracellular waste accumulation, and disrupted intracellular signaling and protein-protein interactions. Thus, such a pattern is likely effective in preventing and reversing the epigenetic changes and pathogenic processes associated with aging, disease, and decline. 4 , 8 , 12 - 14

Dietary Reference Intakes (DRIs) are an established set of nutrient-specific reference values determined by experts that guide population intakes for adequacy and toxic effects. 15 Recent thinking, however, suggests that diets may not always adequately supply nutrients and other bioactives, particularly relative to the amounts necessary to fully condition gene expression or counteract epigenetic alterations to ensure optimal physiological metabolism. 8 Macronutrients and micronutrients play crucial roles in DNA replication, damage prevention, and repair, whereas nutrient deficiencies (and excesses) can cause genomic damage to the same degree as physical or chemical exposures. 16 Given that (1) progenome effects of some micronutrients have been observed at different and higher levels than the established DRIs and (2) determination of DRIs does not solely consider genomic stability (ie, lesser susceptibility to genomic alterations), experts have called for refining the DRIs to be better aligned for genomic health maintenance. 14 , 16 - 18 Diet quality inventories, such as those for Mediterranean-style diets, have not generally incorporated DRIs, although such considerations could clarify how food-based indices compare against requirements for related nutrients (eg, those with epigenetic properties) and refine epidemiological and intervention efforts. Accordingly, for this study, a novel nutrient index theoretically associated with epigenetic health was created and its associations with epigenetic aging were tested alongside established diet quality indices.

To date, nutriepigenetic work has mostly involved older White populations and focused on healthy dietary aspects. It is therefore important to examine the associations between nutrition and epigenetic aging in more diverse samples and to better understand what specific dietary aspects could be underlying the observed associations. Nutrients with established epigenetic action should be examined, especially considering intakes relative to amounts set forth in the DRIs and nutritional recommendations. Similarly, sugar is an established pro-inflammatory and oxidative agent that has been implicated in cancer as well as cardiometabolic diseases. 19 - 21 However, in diet quality indices often studied in the epigenetic context (eg, the aMED), sugar is noticeably unaccounted for, and it has also yet to be examined alone. Given the high consumption of sugar globally and the demographic variations within, 22 - 24 elucidating this association could motivate future dietary interventions and guidelines as well as health disparities research. This study sought to examine associations of diet with GrimAge2 in a midlife cohort comprising Black and White US women. The central hypothesis was that indicators of a healthier diet may be associated with decelerated epigenetic aging, and added sugar intake with accelerated aging.

This cross-sectional study used data from the original National Heart, Lung, and Blood Institute (NHLBI) Growth and Health Study (NGHS) (1987-1999) and its follow-up (2015-2019), which studied a cohort of Black and White females aged from 9 or 10 years into midlife (age 36-43 years), examining cardiometabolic health and related determinants. The participants were recruited based on biological female sex at age 9 or 10 years. The follow-up study re-recruited women from the California site. 25 , 26 Participants (and/or their parent[s] or guardian[s]) provided demographic data and completed online or paper surveys and new assessments. Participants received remuneration and provided informed consent. The institutional review board of the University of California, Berkeley, approved all study protocols. This study followed the Strengthening the Reporting of Observational Studies in Epidemiology ( STROBE ) reporting guideline.

For inclusion in current analyses, the participants needed valid diet records and epigenetic data at midlife along with age and race and ethnicity information (participant self-reported); after excluding 5 women with epigenetic data quality issues, 342 individuals were included in the analytic sample. Complete case analyses were done. Among the 624 women who were followed up, the women composing the analytic sample were younger (39.2 years vs 39.9 years; P  < .001) and had greater body mass index (BMI, calculated as weight in kilograms divided by height in meters squared) compared with women without complete diet and epigenetic data (32.5 vs 30.7; P  = .02) ( Table 1 ). No differences were otherwise observed.

Participants provided saliva samples used for DNAm analyses performed by the University of California, Los Angeles Neuroscience Genomics Core (UNGC) of the Semel Institute for Neuroscience and Human Behavior using the Infinium HumanMethylation450 BeadChip platform (Illumina, Inc). DNAm profiles were generated by Horvath's online calculator, 27 which provided (1) estimates of epigenetic age based on GrimAge2 estimation methods; and (2) assessments of data quality (again, 5 observations did not pass quality checks). GrimAge2 uses Cox proportional hazards regression models that regress time to death (due to all-cause mortality) on DNAm-based surrogates of plasma proteins, a DNAm-based estimator of smoking pack-years, age, and female sex. It was updated from GrimAge, version 1, 6 by including 2 new DNAm-based estimators of plasma proteins, high-sensitivity C-reactive protein (logCRP) and hemoglobin A1c (logA1c), beyond the original 7. Linear transformation of results from these models allows GrimAge2 to be taken as an epigenetic age estimate (in years). Further information can be accessed from studies on DNA treatment and isolation and advanced analysis options for generating output files 28 or GrimAge2. 1

The participants were instructed by the NGHS study staff to self-complete a 3-day food record at follow-up for 3 nonconsecutive days. 29 Data were entered into and analyzed by the Nutrition Data System for Research (NDSR) software, version 2018 (University of Minnesota Nutrition Coordinating Center).

Mean nutrient and food intakes were calculated across valid food records for each woman based on the NDSR 2018 output. These values were used to calculate the scores of 2 overall diet quality nutrient indices (aMED and the Alternate Healthy Eating Index [AHEI]–2010) and a novel index (Epigenetic Nutrient Index [ENI]) score as described below. The aMED (Mediterranean-style diet) followed published scoring methodology 30 reflecting the degree of adherence to 9 components of an anti-inflammatory, antioxidant-rich diet. The AHEI-2010 was assessed following published scoring instructions 31 and reflects the degree of adherence to 11 dietary components associated with decreased risk for chronic disease.

This study developed a novel nutrient index (ENI) modeled after the Mediterranean-style diet, but via a nutrient-based approach rather than a food-based one. Nutrient selection was done a priori based on antioxidant and/or anti-inflammatory capacities as well as roles in DNA maintenance and repair documented in the literature. 16 , 32 , 33 Scores can range from 0 to 24, with higher scores reflecting higher DRI adherence (Table 2). 34 The internal consistency of the ENI was acceptable (Cronbach α = 0.79). The ENI also demonstrated convergent validity, with an ENI-aMED correlation of r = 0.51 as well as higher ENI scores in women from childhood households with higher annual incomes (13.9 vs 11.7, for ≥$40 000/y vs <$10 000/y, respectively) and higher parental educational attainment (14.7 vs 12.3, for ≥college graduate vs <high school graduate, respectively), corresponding to the literature. 36 Pearson correlations between the ENI and diet scores and added sugar intake were also calculated. The ENI score was moderately correlated with the AHEI-2010 score (r = 0.44) but not correlated with added sugar intake. The aMED and AHEI-2010 scores were highly correlated at r = 0.73. Added sugar intake had a moderate correlation with the AHEI-2010 score (r = −0.44) and a low correlation with the aMED score (r = −0.28).

Added sugar intake was calculated as the mean across valid food records using NDSR output. The NDSR defines added sugar intake as the total sugar added to foods (eg, as syrups and sugars) during food preparation and commercial food processing. Monosaccharides and disaccharides naturally occurring in foods are not included. 35

To maximize internal validity and minimize confounding, several covariates were included. Age and sample batch were controlled for, as well as naive CD8 and CD8pCD28nCD45RAn memory and effector T-cell counts, thus accounting for normal cell count variation. To control for baseline factors and their potential influence on diet and epigenetic age over time, the following parameters assessed at age 9 or 10 years (mostly parent or caregiver reported) were further adjusted for: annual household income, highest parental educational attainment, number of parents in the household, and number of siblings. Additionally, self-reported race (Black or White) as well as the current health and lifestyle factors of self-reported chronic conditions (yes to any of the following ever: cancer, diabetes [including gestational diabetes and prediabetes], hypertension, or hypercholesterolemia) or medication use (currently yes for any of the following conditions: diabetes, hypertension, hypercholesterolemia, or thyroid), BMI (measured), having ever smoked (yes or no), and mean daily total energy intake (as higher diet quality scores might result from higher energy intake) 37 were also included.

Descriptive analyses provided summary statistics. Linear regression models estimated unadjusted and adjusted cross-sectional associations between each of the 4 dietary exposures and GrimAge2. Per expert recommendations, unadjusted models controlled for women's current age, sample batch, and both naive CD8 and CD8pCD28nCD45RAn memory and effector T-cell counts. Adjusted models controlled for those variables in addition to the relevant sociodemographic and health behavior-related covariates already listed. To examine the association of the healthy diet measures together with added sugar intake with GrimAge2, the aMED, AHEI-2010, and ENI scores were each separately put into the same fully adjusted multivariable linear regression model as added sugar intake. The threshold for statistical significance was 2-tailed (α = .05), and all statistical analyses were conducted from October 2021 to November 2023 with Stata SE, version 15.1 (StataCorp LLC).
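
For illustration only, the following sketch shows the general form of such an adjusted linear regression in Python's statsmodels on synthetic data; the authors used Stata, and all column names and values here are hypothetical assumptions, not study data:

```python
# An illustrative sketch (not the authors' analysis code): regress an
# epigenetic-age outcome on one dietary exposure plus a covariate.
# All data below are synthetic and all names hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 342
df = pd.DataFrame({
    "grimage2": rng.normal(45, 5, n),   # outcome: epigenetic age (years)
    "amed": rng.integers(0, 10, n),     # exposure: aMED score (0-9)
    "age": rng.normal(39.2, 1.1, n),    # one covariate, for brevity
})

# One exposure per model, adjusted for covariates.
model = smf.ols("grimage2 ~ amed + age", data=df).fit()
print(model.params["amed"])  # analogous to the reported beta for aMED
```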

The analytic sample of this study comprised 342 women (mean [SD] age at follow-up, 39.2 [1.1] years; 171 [50.0%] Black and 171 [50.0%] White participants; mean [SD] BMI, 32.5 [10.0]; 150 [43.9%] ever smokers; 164 [48.0%] ever diagnosed with a chronic condition; and 58 [17.0%] currently taking medication) (Table 1). The participants were well distributed across socioeconomic status categories at baseline (9-10 years old). The participants presented with low to moderate levels of diet quality; the mean (SD) scores were 3.9 (1.9) (possible range, 0-9) on the anti-inflammatory, antioxidant Mediterranean-style pattern (aMED); 55.4 (14.7) (possible range, 0-110) on the AHEI-2010 for chronic disease risk; and 13.5 (5.0) (possible range, 0-24) on the ENI for intakes of epigenetics-relevant nutrients relative to DRIs. The participants also reported a mean (SD) daily added sugar intake of 61.5 (44.6) g, although the range of intakes was wide (2.7-316.5 g).

Table 3 provides the overall unadjusted and adjusted associations between each dietary exposure of interest and GrimAge2 resulting from multivariable linear regression models. In both unadjusted and adjusted models, all dietary exposures were statistically significantly associated with GrimAge2 in the hypothesized direction. In adjusted models, the associations observed for each dietary exposure were slightly attenuated. Each unit increase in the scores was associated with year changes in GrimAge2, as follows: aMED (β, −0.41; 95% CI, −0.69 to −0.13), AHEI-2010 (β, −0.05; 95% CI, −0.08 to −0.01), and ENI (β, −0.17; 95% CI, −0.29 to −0.06), indicating that healthier diets were associated with decelerated epigenetic aging. Each gram increase in added sugar intake was associated with a 0.02 (95% CI, 0.01 to 0.04) year increase in GrimAge2, reflecting accelerated epigenetic aging.

Table 4 illustrates the associations of healthy diet measures (aMED, AHEI-2010, and ENI scores) and added sugar intake with epigenetic aging and gives the adjusted results for each healthy diet measure and added sugar intake with GrimAge2 in the context of each other. In all instances, healthier diet measures and added sugar intake appeared to maintain their independent associations with GrimAge2 in the expected directions. Associations were statistically significant for added sugar intake in all models as well as for aMED scores; 95% CIs were more imprecise for AHEI-2010 and ENI scores.

The findings of this cross-sectional study are among the first, to our knowledge, to demonstrate the association of added sugar intake with an epigenetic clock. Further, to our knowledge, it is the first study to examine the associations of diet with GrimAge2 and extend the applicability of such results to a cohort of Black and White women at midlife. As hypothesized, measures of healthy dietary patterns (aMED, AHEI-2010 scores), and high intakes of nutrients theoretically related to epigenetics (ENI) were associated with younger epigenetic age, while a higher intake of added sugar was associated with older epigenetic age. Additionally, this study examined indicators of healthy and less healthy diets in the same model, allowing simultaneous evaluation of each in the presence of the other. Although the magnitudes of associations were diminished and some 95% CIs became wider, their statistical significance generally persisted, supporting the existence of independent epigenetic associations of both healthy and less healthy diet measures. This approach is informative, as dietary components are often examined singularly or in indices, which can lead to erroneous conclusions if key contextual dietary components are not accounted for or are obscured. From these findings, even in healthy dietary contexts, added sugar still has detrimental associations with epigenetic age. Similarly, despite higher added sugar intake, healthier dietary intakes appear to remain generally associated with younger epigenetic age.

The number of published nutriepigenetic studies, particularly on examining second-generation epigenetic clock markers, is still relatively small. However, the results of the present study are consistent with the literature. Two other studies 7 , 10 have examined GrimAge1-associated outcomes and found higher diet quality scores, including the DASH and aMED, were associated with slower epigenetic aging. However, those studies were limited to older (>50 years) and White populations, limiting their demographic generalizability. Analyses of epigenetic aging and added sugar intake are new, but findings are consistent with the larger body of epidemiological work that has drawn connections between added sugar intake and cardiometabolic disease, 19 , 20 perhaps suggesting a potential mechanism underlying such observations. Granted, point and 95% CI estimates for the added sugar–GrimAge2 associations were close to zero, suggesting a smaller role for added sugar compared with healthy dietary measures; however, more studies are needed. Nevertheless, their statistical significance was persistent.

Nutrient-based inventories can provide epidemiological contributions for genomic health studies. The idea of epigenetically critical nutrients is important for 2 reasons. First, it supports the notion that epigenetic nutrient intakes above DRI levels could boost epigenetic preservation and potentially motivate updates to nutritional guidelines, an outcome advocated for by nutriepigenetic experts. 16 - 18 In the novel ENI constructed for the present study, points were awarded based on comparisons of average daily intakes with (1) estimated average requirements, the requirement considered adequate for half of the healthy individuals in a population, and (2) recommended dietary allowances or adequate intakes, where 97% to 98% or essentially all of a population's healthy individuals' requirements for a nutrient are met. 15 Future iterations could test varying ENI scoring parameters relative to DRIs for epigenetic benefit. Second, taking a nutrient approach suggests that any dietary pattern rich in vitamins, minerals, and other bioactives could be useful for preserving epigenetic health. This is helpful because dietary patterns are socioculturally influenced, and a nutrient focus rather than a focus on foods could help bridge cultures, class, and geography. 9 The Okinawan diet, for example, is nutritionally similar to the Mediterranean-style diet but more aligned with Asian tastes. 38 In general, the sociodemographic determinants of diet should not be discounted. Across the US population, for instance, overall diet quality is known to be relatively low while added sugar intake is considerably high, as also observed in the sample of the present study. However, specific nutrient intakes will vary based on the particulars of dietary patterns. 22 , 36 As dietetics and medicine progress into the era of personalized nutrition and personalized medicine, the role of social factors including diet will be important to consider in epigenetic studies and could figure prominently in work on health disparities.

Strengths of this study include its inclusion of a diverse group of women and its use of robust measures of diet and DNAm. It was also possible to control for several potential sociodemographic confounders.

This study also has limitations. As a cross-sectional study, causality cannot be inferred without temporality, and longitudinal studies are therefore needed. Additionally, diet was self-reported via 3-day food records, which may lead to underestimates or overestimates of intakes depending on the nutrient; augmenting dietary assessment with food frequency questionnaires and/or biomarkers could therefore be helpful.39 Also, other nutrients with pro-epigenetic properties were not included in the current ENI. Still, the Cronbach α for this first ENI version was acceptable at 0.79, and the index demonstrated good convergent validity with customary socioeconomic and demographic characteristics. The tolerable upper intake levels of the DRIs were not considered in constructing the ENI; future work should assess the prevalence of intakes beyond upper limits to determine whether toxicity could be a concern.
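
For reference, the Cronbach α reported above measures the internal consistency of an index's items as α = k/(k−1) · (1 − Σσᵢ²/σ²_total), where k is the number of items. A minimal NumPy sketch of that computation, using simulated rather than study data, is:

```python
# Cronbach's alpha for an (n_subjects, k_items) score matrix; the data
# below are simulated for illustration, not drawn from the study.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
signal = rng.normal(size=(100, 1))                 # shared latent factor
scores = signal + 0.5 * rng.normal(size=(100, 8))  # 8 correlated items
print(round(cronbach_alpha(scores), 2))            # high alpha, ~0.97
```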

To our knowledge, this cross-sectional study is among the first to find associations of indicators of healthy diet, as well as added sugar intake, with second-generation epigenetic aging markers, and one of the first to include a cohort of Black women. Higher diet quality and higher consumption of antioxidant or anti-inflammatory nutrients were associated with younger epigenetic age, whereas higher consumption of added sugar was associated with older epigenetic age. Promotion of healthy diets aligned with chronic disease prevention, together with decreased added sugar consumption, may support slower cellular aging relative to chronological age, although longitudinal analyses are needed.

Accepted for Publication: April 29, 2024.

Published: July 29, 2024. doi:10.1001/jamanetworkopen.2024.22749

Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2024 Chiu DT et al. JAMA Network Open.

Corresponding Author: Dorothy T. Chiu, PhD, Osher Center for Integrative Health, University of California, San Francisco, 1545 Divisadero St, #301D, San Francisco, CA 94115 ([email protected]); Barbara A. Laraia, PhD, MPH, RD, Community Health Sciences Division, School of Public Health, University of California, Berkeley, 2121 Berkeley Way, Berkeley, CA 94720 ([email protected]).

Author Contributions: Drs Chiu and Laraia had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Drs Epel and Laraia share co–senior authorship on this article.

Concept and design: Chiu, Hamlat, Epel, Laraia.

Acquisition, analysis, or interpretation of data: All authors.

Drafting of the manuscript: Chiu, Hamlat, Laraia.

Critical review of the manuscript for important intellectual content: Hamlat, Zhang, Epel, Laraia.

Statistical analysis: Chiu, Hamlat, Zhang.

Obtained funding: Epel, Laraia.

Administrative, technical, or material support: Chiu, Laraia.

Supervision: Epel, Laraia.

Conflict of Interest Disclosures: Dr Chiu reported receiving support from grants from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD); the National Heart, Lung, and Blood Institute (NHLBI); the National Institute on Aging (NIA); and the National Center for Complementary and Integrative Health (NCCIH) during the conduct of the study. Dr Hamlat reported receiving grants from the National Institutes of Health (NIH) during the conduct of the study. Dr Laraia reported receiving grants from NIH NICHD during the conduct of the study. No other disclosures were reported.

Funding/Support: The research reported in this publication was supported by grant R01HD073568 from the Eunice Kennedy Shriver NICHD (Drs Laraia and Epel, principal investigators [PIs]); grant R56HL141878 from the NHLBI; and grants R56AG059677 and R01AG059677 from the NIA (both for Drs Epel and Laraia, PIs). The participation of Dr Chiu was supported by the University of California, San Francisco Osher Center research training fellowship program under grant T32AT003997 from NCCIH.

Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Data Sharing Statement: See the Supplement .

Additional Contributions: We recognize the past and present NHLBI Growth and Health Study (NGHS) staff for their talents and dedication, without which the study and these analyses would not have been possible. We also thank the Nutrition Policy Institute for providing consultation and support with historical study data. Additionally, we express immense gratitude to Ake T. Lu, PhD, and Steve Horvath, PhD, now of Altos Labs, for their epigenetic clock expertise and consultation. Neither was financially compensated for their contributions beyond their usual salary. Of note, we thank the NGHS participants for their time and efforts over the years.


Lam Research EPS Up, Revenue Surges


NASDAQ: LRCX


Lam Research's latest earnings surpassed expectations with strong growth in key areas, maintaining its leadership in the semiconductor equipment industry.

  • Revenue for the June 2024 quarter was $3.87 billion, exceeding the $3.8 billion midpoint of management's guidance range.
  • GAAP diluted EPS increased by 6% to $7.78, compared with $7.34 in the previous quarter.
  • Management provided optimistic guidance for next quarter, with revenue expected to reach $4.05 billion ± $300 million.

Lam Research (LRCX -9.87%) is a leading provider of wafer fabrication equipment and services to the semiconductor industry. In its earnings release dated July 31, 2024, for the quarter ending June 30, 2024, it reported revenue of $3.87 billion, surpassing the midpoint of management's guidance of $3.8 billion ± $300 million.

This represents a quarterly increase of 2.1% from $3.79 billion in the March 2024 quarter. Diluted earnings per share (EPS) according to generally accepted accounting principles (GAAP) rose by 6.0% to $7.78, compared with $7.34 in March 2024. Non-GAAP diluted EPS increased to $8.14 from $7.79, above the midpoint of management's guidance of $7.50 ± $0.75.

Overall, the report showed solid financial performance with improvements across several metrics.

Metric                    Q2 2024         Management's Guidance   Q1 2024         Percent Change
Revenue                   $3.87 billion   $3.8 ± 0.3 billion      $3.79 billion   +2.1%
Gross Margin              47.5%           46.7% ± 1%              47.5%           0.0 pp
Operating Income Margin   29.1%           28.3% ± 1%              27.9%           +1.2 pp
Diluted EPS (GAAP)        $7.78           $7.20 ± 0.75            $7.34           +6.0%

Data source: SEC filings, with expectations based on management's guidance. pp = percentage points.
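
As a quick sanity check on the table above, the quarter-over-quarter percent changes and the guidance-range comparisons can be reproduced with a few lines of Python. The figures come from the article; the helper names (pct_change, within_guidance) are our own, not from any Lam Research or Motley Fool source.

```python
# Reproduces the percent-change and guidance arithmetic in the table above.
# Figures are from the article; function names are illustrative only.

def pct_change(current: float, prior: float) -> float:
    """Quarter-over-quarter percentage change."""
    return (current - prior) / prior * 100

def within_guidance(actual: float, midpoint: float, spread: float) -> bool:
    """True if a result lands inside a 'midpoint ± spread' guidance range."""
    return midpoint - spread <= actual <= midpoint + spread

print(f"Revenue change:  {pct_change(3.87, 3.79):+.1f}%")   # +2.1%
print(f"GAAP EPS change: {pct_change(7.78, 7.34):+.1f}%")   # +6.0%
print(within_guidance(3.87, midpoint=3.8, spread=0.3))      # True: within range
```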

Understanding Lam Research

Lam Research provides essential equipment for semiconductor wafer fabrication, a key component in creating integrated circuits used in many electronic devices. Its primary product segments include etching, deposition, and cleaning tools used in semiconductor manufacturing.

Recently, the company has focused heavily on research and development (R&D) to maintain its competitive edge. Investments in advanced packaging, high-bandwidth memory, and technologies related to artificial intelligence (AI) have been pivotal. The company's strategic direction also emphasizes strong customer relationships and collaboration within the semiconductor ecosystem.

Quarterly Highlights

The quarter ending June 30, 2024, saw several notable achievements for Lam Research. Revenue increased by 2.1% quarter-over-quarter to $3.87 billion, driven by solid gains in customer support-related revenue amid a 9% drop in systems sales. Gross margin stood at 47.5%, consistent with the prior quarter, and operating income improved to 29.1% of revenue, up from 27.9% in the previous quarter.

From a financial perspective, GAAP diluted EPS increased to $7.78, a 6.0% increase from $7.34 in the March quarter. Non-GAAP diluted EPS also showed growth, rising 4.5% to $8.14, slightly above management's guidance of $7.50 ± $0.75.

Key growth areas included the customer support business, which saw an increase in revenue from $1.40 billion in the March quarter to $1.70 billion. Geographically, China continued to be a major revenue contributor, accounting for 39% of total sales. Other important regions included Korea (18%) and Taiwan (15%).

However, there were also challenges. Deferred revenue decreased significantly from $1.75 billion at the end of March to $1.55 billion at the end of June. This decrease could indicate potential future revenue timing issues. On the regulatory front, ongoing concerns about potential U.S. government-imposed restrictions on exports to China remain a risk.

Lam Research also continued to make strategic investments in R&D and operations to position itself favorably within the industry. The company is optimistic about ongoing strong demand, particularly for AI-related equipment and services. Furthermore, the company announced a 10-for-1 stock split set to take effect in October 2024, which could positively impact stock liquidity and affordability.

Looking Ahead

For the third quarter of 2024, ending September 29, management projects revenue of $4.05 billion ± $300 million, indicating an optimistic outlook with expected sequential growth. Gross margin is projected to be around 47%, and operating income is expected to be approximately 29.4% of revenue.

Investors should monitor the company's continued focus on innovation and technological leadership, especially in AI and memory technologies. Additionally, potential regulatory changes and their impact on Lam’s operations, particularly regarding exports to China, remain areas to watch closely.

JesterAI is a Foolish AI, based on a variety of Large Language Models (LLMs) and proprietary Motley Fool systems. All articles published by JesterAI are reviewed by our editorial team, and The Motley Fool takes ultimate responsibility for the content of this article. JesterAI cannot own stocks and so it has no positions in any stocks mentioned. The Motley Fool has positions in and recommends Lam Research. The Motley Fool has a disclosure policy.


