Presentation Domain Data Layering

26 August 2015

Martin Fowler

Tags: team organization · encapsulation · application architecture · web development

One of the most common ways to modularize an information-rich program is to separate it into three broad layers: presentation (UI), domain logic (aka business logic), and data access. So you often see web applications divided into a web layer that knows about handling HTTP requests and rendering HTML, a business logic layer that contains validations and calculations, and a data access layer that sorts out how to manage persistent data in a database or remote services.
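
To make the shape concrete, here is a minimal, self-contained Kotlin sketch of the three layers. All the names are invented for illustration; this is one way the separation might look, not a prescription:

```kotlin
// Data access layer: sorts out persistent data (an in-memory stand-in here).
interface OrderRepository {
    fun findById(id: Long): Order?
    fun save(order: Order)
}

class InMemoryOrderRepository : OrderRepository {
    private val store = mutableMapOf(1L to Order(id = 1L, total = 200))
    override fun findById(id: Long): Order? = store[id]
    override fun save(order: Order) { store[order.id] = order }
}

data class Order(val id: Long, val total: Int)

// Domain logic layer: validations and calculations; no HTML, no SQL.
class OrderService(private val orders: OrderRepository) {
    fun applyDiscount(id: Long, percent: Int): Order {
        require(percent in 1..50) { "discount must be between 1 and 50 percent" }
        val order = checkNotNull(orders.findById(id)) { "no order $id" }
        return order.copy(total = order.total * (100 - percent) / 100)
            .also(orders::save)
    }
}

// Presentation layer: turns request parameters into a call, and the result into HTML.
class OrderPage(private val service: OrderService) {
    fun handle(idParam: String, percentParam: String): String {
        val order = service.applyDiscount(idParam.toLong(), percentParam.toInt())
        return "<p>Order ${order.id}: total is now ${order.total}</p>"
    }
}

fun main() {
    println(OrderPage(OrderService(InMemoryOrderRepository())).handle("1", "10"))
}
```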

On the whole I've found this to be an effective form of modularization for many applications and one that I regularly use and encourage. Its biggest advantage (for me) is that it reduces the scope of my attention by allowing me to think about the three topics relatively independently. When I'm working on domain logic code I can mostly ignore the UI and treat any interaction with data sources as an abstract set of functions that give me the data I need and update it as I wish. When I'm working on the data access layer I focus on the details of wrangling the data into the form required by my interface. When I'm working on the presentation I can focus on the UI behavior, treating any data to display or update as magically appearing by function calls. By separating these elements I narrow the scope of my thinking in each piece, which makes it easier for me to follow what I need to do.

This narrowing of scope doesn't imply any sequence to programming them - I usually find I need to iterate between the layers. I might build the data and domain layers off my initial understanding of the UX, but when refining the UX I need to change the domain, which necessitates a change to the data layer. But even with that kind of cross-layer iteration, I find it easier to focus on one layer at a time as I make changes. It's similar to the switching of thinking modes you get with refactoring's two hats.

Another reason to modularize is to allow me to substitute different implementations of modules. This separation allows me to build multiple presentations on top of the same domain logic without duplicating it. Multiple presentations could be separate pages in a web app, a web app plus mobile native apps, an API for scripting purposes, or even an old-fashioned command line interface. Modularizing the data source allows me to cope gracefully with a change in database technology, or to support services for persistence that may change with little notice. However, I have to mention that while I often hear about data access substitution being a driver for separating the data source layer, I rarely hear of someone actually doing it.

Modularity also supports testability, which naturally appeals to me as a big fan of SelfTestingCode. Module boundaries expose seams that are a good affordance for testing. UI code is often tricky to test, so it's good to get as much logic as you can into a domain layer which is easily tested without having to do gymnastics to access the program through a UI [1]. Data access is often slow and awkward, so using TestDoubles around the data layer often makes domain logic testing much easier and more responsive.

[1]: A PageObject is also an important tool to help with testing around UIs.
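
Continuing the hypothetical sketch above, a test double standing in for the data layer lets the domain logic be exercised without a database or any UI gymnastics:

```kotlin
// A test double for the data access layer: fixed reads, recorded writes.
class FakeOrderRepository : OrderRepository {
    val saved = mutableListOf<Order>()
    override fun findById(id: Long): Order? = Order(id, total = 100)
    override fun save(order: Order) { saved += order }
}

fun `discount is calculated and persisted`() {
    val fake = FakeOrderRepository()
    val result = OrderService(fake).applyDiscount(id = 1, percent = 10)
    check(result.total == 90) { "expected 90, got ${result.total}" }
    check(fake.saved.single() == result)
}
```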

While substitutability and testability are certainly benefits of this layering, I must stress that even without either of these reasons I would still divide into layers like this. The reduced scope of attention reason is sufficient on its own.

When talking about this we can either look at it as one pattern (presentation-domain-data) or split it into two patterns (presentation-domain, and domain-data). Both points of view are useful - I think of presentation-domain-data as a composite of presentation-domain and domain-data.

I consider these layers to be a form of module, which is a generic word I use for how we clump our software into relatively independent pieces. Exactly how this corresponds to code depends on the programming environment we're in. Usually the lowest level is some form of subroutine or function. An object-oriented language will have a notion of class that collects functions and data structure. Most languages have some higher-level form, called packages or namespaces, which often can be formed into a hierarchy. Modules may correspond to separately deployable units (libraries or services), but they don't have to.

Layering can occur at any of these levels. A small program may just put separate functions for the layers into different files. A larger system may have layers corresponding to namespaces with many classes in each.

I've mentioned three layers here, but it's common to see architectures with more than three layers. A common variation is to put a service layer between the domain and presentation, or to split the presentation layer into separate layers with something like Presentation Model. I don't find that more layers break the essential pattern, since the core separations still remain.

The dependencies generally run from top to bottom through the layer stack: presentation depends on the domain, which then depends on the data source. A common variation is to arrange things so that the domain does not depend on its data sources by introducing a mapper between the domain and data source layers. This approach is often referred to as a Hexagonal Architecture.
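
A minimal sketch of that inversion, with invented names: the domain owns the interface it needs, the data source layer implements it, and the source-level dependency points toward the domain.

```kotlin
// Domain layer: owns the port and the business rule.
interface CustomerGateway {
    fun creditLimit(customerId: Long): Int
}

class CreditCheck(private val customers: CustomerGateway) {
    fun canPlaceOrder(customerId: Long, orderTotal: Int): Boolean =
        orderTotal <= customers.creditLimit(customerId)
}

// Data source layer: depends on the domain's interface, not the other way round.
class StubSqlCustomerGateway : CustomerGateway {
    override fun creditLimit(customerId: Long): Int {
        // A real mapper would run a query; hardcoded to keep the sketch self-contained.
        return 500
    }
}

fun main() {
    println(CreditCheck(StubSqlCustomerGateway()).canPlaceOrder(42, orderTotal = 300)) // true
}
```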

These layers are logical layers, not physical tiers. I can run all three layers on my laptop, I can run the presentation and domain model on a desktop with a database on a server, I can split the presentation with a rich client in the browser and a Backend For Frontend on the server. In that case I treat the BFF as a presentation layer as it's focused on supporting a particular presentation option.

Although presentation-domain-data separation is a common approach, it should only be applied at a relatively small granularity. As an application grows, each layer can get sufficiently complex on its own that you need to modularize further. When this happens it's usually not best to use presentation-domain-data as the higher level of modules. Often frameworks encourage you to have something like view-model-data as the top level namespaces; that's OK for smaller systems, but once any of these layers gets too big you should split your top level into domain oriented modules which are internally layered.

Developers don't have to be full-stack but teams should be.

One common way I've seen this layering lead organizations astray is the AntiPattern of separating development teams by these layers. This looks appealing because front-end and back-end development require different frameworks (or even languages), making it easy for developers to specialize in one or the other. Putting those people with common skills together supports skill sharing and allows the organization to treat the team as a provider of a single, well-delineated type of work. In the same way, putting all the database specialists together fits in with the common centralization of databases and schemas. But the rich interplay between these layers necessitates frequent swapping between them. This isn't too hard when you have specialists in the same team who can casually collaborate, but team boundaries add considerable friction, as well as reducing an individual's motivation to develop the important cross-layer understanding of a system. Worse, separating the layers into teams adds distance between developers and users. Developers don't have to be full-stack (although that is laudable) but teams should be.

Further Reading

I've written about this separation from a number of different angles elsewhere. This layering drives the structure of P of EAA, and chapter 1 of that book talks more about it. I didn't make this layering a pattern in its own right in that book, but have toyed with that territory with Separated Presentation and PresentationDomainSeparation.

For more on why presentation-domain-data shouldn't be the highest level modules in a larger system, take a look at the writing and speaking of Simon Brown. I also agree with him that software architecture should be embedded in code.

I had a fascinating discussion with my colleague Badri Janakiraman about the nature of hexagonal architectures. The context was mostly around applications using Ruby on Rails, but much of the thinking applies to other cases when you may be considering this approach.


Domain layer

The domain layer is an optional layer that sits between the UI layer and the data layer.

[Figure: presentation layer → domain layer → data layer]

The domain layer is responsible for encapsulating complex business logic, or simple business logic that is reused by multiple ViewModels. This layer is optional because not all apps will have these requirements. You should only use it when needed, for example to handle complexity or favor reusability.

A domain layer provides the following benefits:

  • It avoids code duplication.
  • It improves readability in classes that use domain layer classes.
  • It improves the testability of the app.
  • It avoids large classes by allowing you to split responsibilities.

To keep these classes simple and lightweight, each use case should only have responsibility over a single functionality, and they should not contain mutable data. You should instead handle mutable data in your UI or data layers.

Naming conventions in this guide

In this guide, use cases are named after the single action they're responsible for. The convention is as follows:

verb in present tense + noun/what (optional) + UseCase.

For example: FormatDateUseCase, LogOutUserUseCase, GetLatestNewsWithAuthorsUseCase, or MakeLoginRequestUseCase.

Dependencies

In a typical app architecture, use case classes fit between ViewModels from the UI layer and repositories from the data layer. This means that use case classes usually depend on repository classes, and they communicate with the UI layer the same way repositories do—using either callbacks (for Java) or coroutines (for Kotlin). To learn more about this, see the data layer page.

For example, in your app, you might have a use case class that fetches data from a news repository and an author repository, and combines them:
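
The page's code sample isn't reproduced in this extract, so here is a hedged Kotlin sketch of the idea; the repository interfaces and model classes below are assumptions for illustration, not the actual API:

```kotlin
import java.util.Date

// Assumed data-layer contracts and models (hypothetical):
interface NewsRepository { suspend fun fetchLatestNews(): List<Article> }
interface AuthorsRepository { suspend fun getAuthor(id: String): Author }
data class Article(val title: String, val date: Date, val authorId: String)
data class Author(val id: String, val name: String)
data class ArticleWithAuthor(val article: Article, val author: Author)

// Combines the latest news with the author of each article.
class GetLatestNewsWithAuthorsUseCase(
    private val newsRepository: NewsRepository,
    private val authorsRepository: AuthorsRepository
) {
    suspend operator fun invoke(): List<ArticleWithAuthor> =
        newsRepository.fetchLatestNews().map { article ->
            ArticleWithAuthor(article, authorsRepository.getAuthor(article.authorId))
        }
}
```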

Because use cases contain reusable logic, they can also be used by other use cases. It's normal to have multiple levels of use cases in the domain layer. For example, the use case defined in the example below can make use of the FormatDateUseCase use case if multiple classes from the UI layer rely on time zones to display the proper message on the screen:
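
A hedged sketch of that composition, reusing the hypothetical types above (FormatDateUseCase itself is sketched in the next section, and NewsUiItem is invented here):

```kotlin
data class NewsUiItem(val article: Article, val author: Author, val displayDate: String)

class GetLatestNewsWithAuthorsUseCase(
    private val newsRepository: NewsRepository,
    private val authorsRepository: AuthorsRepository,
    private val formatDateUseCase: FormatDateUseCase  // a use case used by a use case
) {
    suspend operator fun invoke(): List<NewsUiItem> =
        newsRepository.fetchLatestNews().map { article ->
            NewsUiItem(
                article = article,
                author = authorsRepository.getAuthor(article.authorId),
                displayDate = formatDateUseCase(article.date)
            )
        }
}
```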

[Figure: presentation layer → domain layer → data layer]

Call use cases in Kotlin

In Kotlin, you can make use case class instances callable as functions by defining the invoke() function with the operator modifier. See the following example:
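
A minimal sketch (the date pattern and the lack of dependencies are simplifications for illustration):

```kotlin
import java.text.SimpleDateFormat
import java.util.Date
import java.util.Locale

class FormatDateUseCase {
    private val formatter = SimpleDateFormat("EEE, MMM d, yyyy", Locale.getDefault())

    // The operator modifier makes instances callable: formatDateUseCase(date)
    operator fun invoke(date: Date): String = formatter.format(date)
}
```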

In this example, the invoke() method in FormatDateUseCase allows you to call instances of the class as if they were functions. The invoke() method is not restricted to any specific signature—it can take any number of parameters and return any type. You can also overload invoke() with different signatures in your class. You'd call the use case from the example above as follows:
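
Continuing the sketch:

```kotlin
import java.util.Date

fun main() {
    val formatDateUseCase = FormatDateUseCase()
    // Function-call syntax resolves to the invoke() operator defined above.
    println(formatDateUseCase(Date()))
}
```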

To learn more about the invoke() operator, see the Kotlin docs.

Use cases don't have their own lifecycle. Instead, they're scoped to the class that uses them. This means that you can call use cases from classes in the UI layer, from services, or from the Application class itself. Because use cases shouldn't contain mutable data, you should create a new instance of a use case class every time you pass it as a dependency.

Use cases from the domain layer must be main-safe; in other words, they must be safe to call from the main thread. If use case classes perform long-running blocking operations, they are responsible for moving that logic to the appropriate thread. However, before doing that, check if those blocking operations would be better placed in other layers of the hierarchy. Typically, complex computations happen in the data layer to encourage reusability or caching. For example, a resource-intensive operation on a big list is better placed in the data layer than in the domain layer if the result needs to be cached to reuse it on multiple screens of the app.

The following example shows a use case that performs its work on a background thread:
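
Reworking the earlier GetLatestNewsWithAuthorsUseCase sketch (the surrounding types are still assumptions), the use case can hop to a background dispatcher itself, so callers may treat it as main-safe:

```kotlin
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

class GetLatestNewsWithAuthorsUseCase(
    private val newsRepository: NewsRepository,
    private val authorsRepository: AuthorsRepository,
    private val defaultDispatcher: CoroutineDispatcher = Dispatchers.Default
) {
    suspend operator fun invoke(): List<ArticleWithAuthor> =
        withContext(defaultDispatcher) {
            // The potentially long-running mapping runs off the main thread.
            newsRepository.fetchLatestNews().map { article ->
                ArticleWithAuthor(article, authorsRepository.getAuthor(article.authorId))
            }
        }
}
```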

Common tasks

This section describes how to perform common domain layer tasks.

Reusable simple business logic

You should encapsulate repeatable business logic present in the UI layer in a use case class. This makes it easier to apply any changes everywhere the logic is used. It also allows you to test the logic in isolation.

Consider the FormatDateUseCase example described earlier. If your business requirements regarding date formatting change in the future, you only need to change code in one centralized place.

Combine repositories

In a news app, you might have NewsRepository and AuthorsRepository classes that handle news and author data operations respectively. The Article class that NewsRepository exposes only contains the name of the author, but you want to display more information about the author on the screen. Author information can be obtained from the AuthorsRepository.

[Figure: presentation layer → domain layer → data layer]

Because the logic involves multiple repositories and can become complex, you create a GetLatestNewsWithAuthorsUseCase class to abstract the logic out of the ViewModel and make it more readable. This also makes the logic easier to test in isolation, and reusable in different parts of the app.

The logic maps all items in the news list, so even though the data layer is main-safe, this work shouldn't block the main thread because you don't know how many items it'll process. That's why the use case moves the work to a background thread using the default dispatcher.

Other consumers

Apart from the UI layer, the domain layer can be reused by other classes such as services and the Application class. Furthermore, if other platforms such as TV or Wear share a codebase with the mobile app, their UI layer can also reuse use cases to get all the aforementioned benefits of the domain layer.

Data layer access restriction

One other consideration when implementing the domain layer is whether you should still allow direct access to the data layer from the UI layer, or force everything through the domain layer.

[Figure: the UI layer cannot access the data layer directly; it must go through the domain layer]

An advantage of making this restriction is that it stops your UI from bypassing domain layer logic, for example, if you are performing analytics logging on each access request to the data layer.

However, the potentially significant disadvantage is that it forces you to add use cases even when they are just simple function calls to the data layer, which can add complexity for little benefit.

A good approach is to add use cases only when required. If you find that your UI layer is accessing data through use cases almost exclusively, it may make sense to only access data this way.

Ultimately, the decision to restrict access to the data layer comes down to your individual codebase, and whether you prefer strict rules or a more flexible approach.

General testing guidance applies when testing the domain layer. For UI tests, developers typically use fake repositories, and it's good practice to use fake repositories when testing the domain layer as well.

The following Google samples demonstrate the use of the domain layer. Explore them to see this guidance in practice.


Flutter App Architecture: The Presentation Layer

Andrea Bizzotto

Updated Sep 21, 2023 · 11 min read

When writing Flutter apps, separating any business logic from the UI code is very important.

This makes our code more testable and easier to reason about, and is especially important as our apps become more complex.

To accomplish this, we can use design patterns to introduce a separation of concerns between different components in our app.

And for reference, we can adopt a layered app architecture that splits the code into presentation, application, domain, and data layers.

I have already covered some of these layers in other articles:

  • Flutter App Architecture with Riverpod: An Introduction
  • Flutter App Architecture: The Repository Pattern
  • Flutter App Architecture: The Domain Model
  • Flutter App Architecture: The Application Layer

And this time, we will focus on the presentation layer and learn how we can use controllers to:

  • hold business logic
  • manage the widget state
  • interact with repositories in the data layer

This kind of controller is the same as the view model that you would use in the MVVM pattern. If you've worked with flutter_bloc before, it has the same role as a cubit.

We will learn about the AsyncNotifier class, which is a replacement for the StateNotifier and the ValueNotifier / ChangeNotifier classes in the Flutter SDK.

And to make this more useful, we will implement a simple authentication flow as an example.

Ready? Let's go!

A simple authentication flow

Let's consider a very simple app that we can use to sign in anonymously and toggle between two screens.

And in this article, we'll focus on how to implement:

  • an auth repository that we can use to sign in and sign out
  • a sign-in widget screen that we show to the user
  • the corresponding controller class that mediates between the two

For this specific example, we'll use a simplified version of the reference architecture.

You can find the complete source code for this app on GitHub. For more info about how it is organized, read this: Flutter Project Structure: Feature-first or Layer-first?

The AuthRepository class

As a starting point, we can define a simple abstract class that contains three methods that we'll use to sign in, sign out, and check the authentication state.

In practice, we also need a concrete class that implements AuthRepository. This could be based on Firebase or any other backend. We can even implement it with a fake repository for now. For more details, see this article about the repository pattern.

For completeness, we can also define a simple AppUser model class, and, if we use Riverpod, a Provider that we can use to access our repository.

Next up, let's focus on the sign-in screen.

The SignInScreen widget

Suppose we have a simple SignInScreen widget: just a Scaffold with an ElevatedButton in the middle.

Note that since this class extends ConsumerWidget , in the build() method we have an extra ref object that we can use to access providers as needed.

Accessing the AuthRepository directly from our widget

As a next step, we can use the onPressed callback to sign in. The handler works by obtaining the AuthRepository with a call to ref.read(authRepositoryProvider) and calling the signInAnonymously() method on it.

This covers the happy path (sign-in successful). But we should also account for loading and error states by:

  • disabling the sign-in button and showing a loading indicator while sign-in is in progress
  • showing a SnackBar or alert if the call fails for any reason

The "StatefulWidget + setState" way

One simple way of addressing this is to:

  • convert our widget into a StatefulWidget (or rather, ConsumerStatefulWidget since we're using Riverpod)
  • add some local variables to keep track of state changes
  • set those variables inside calls to setState() to trigger a widget rebuild
  • use them to update the UI

Here's how the resulting code might look.

For a simple app like this, this is probably ok.

But this approach quickly gets out of hand when we have more complex widgets, as we are mixing business logic and UI code in the same widget class.

And if we want to handle loading and error states consistently across multiple widgets, copy-pasting and tweaking the code above is quite error-prone (and not much fun).

Instead, it would be best to move all these concerns into a separate controller class that can:

  • mediate between our SignInScreen and the AuthRepository
  • provide a way for the widget to observe state changes and rebuild itself as a result

So let's see how to implement it in practice.

A controller class based on AsyncNotifier

The first step is to create an AsyncNotifier subclass.

Or even better, we can use the new @riverpod syntax and let Riverpod Generator do the heavy lifting for us.

Either way, we need to implement a build method, which returns the initial value that should be used when the controller is first loaded.

If desired, we can use the build method to do some asynchronous initialization (such as loading some data from the network). But if the controller is "ready to go" as soon as it is created (just like in this case), we can leave the body empty and set the return type to Future<void>.

Implementing the method to sign in

Next up, let's add a method that we can use to sign in.

A few notes:

  • We obtain the authRepository by calling ref.read on the corresponding provider (ref is a property of the base AsyncNotifier class)
  • Inside signInAnonymously(), we set the state to AsyncLoading so that the widget can show a loading UI
  • Then, we call AsyncValue.guard and await the result (which will be either AsyncData or AsyncError)

AsyncValue.guard is a handy alternative to try/catch. For more info, read this: Use AsyncValue.guard rather than try/catch inside your AsyncNotifier subclasses

And as an extra tip, we can use a method tear-off to simplify our code even further.

This completes the implementation of our controller class, in just a few lines of code.

Note about the relationship between types

Note that there is a clear relationship between the return type of the build method and the type of the state property.

In fact, using AsyncValue<void> as the state allows us to represent three possible values:

  • default (not loading) as AsyncData (same as AsyncValue.data)
  • loading as AsyncLoading (same as AsyncValue.loading)
  • error as AsyncError (same as AsyncValue.error)

If you're not familiar with AsyncValue and its subclasses, read this: How to handle loading and error states with StateNotifier & AsyncValue in Flutter

Time to get back to our widget class and wire everything up!

Using our controller in the widget class

Here's an updated version of the SignInScreen that uses our new SignInScreenController class.

Note how in the build() method we watch our provider and rebuild the widget when the state changes.

In the onPressed callback, we read the provider's notifier and call signInAnonymously(). We can also use the isLoading property to conditionally disable the button while sign-in is in progress.

We're almost done, and there's only one thing left to do.

Listening to state changes

Right at the top of the build method, we can add a ref.listen call that invokes a listener callback whenever the state changes.

This is useful for showing an error alert or a SnackBar if an error occurs when signing in.

Bonus: An AsyncValue extension method

The listener code above is quite useful and we may want to reuse it in multiple widgets.

To do that, we can define an AsyncValue extension. Then, in our widget, we can just import the extension and call it.

By implementing a custom controller class based on AsyncNotifier, we've separated our business logic from the UI code.

As a result, our widget class is now completely stateless and is only concerned with:

  • watching state changes and rebuilding as a result (with ref.watch)
  • responding to user input by calling methods in the controller (with ref.read)
  • listening to state changes and showing errors if something goes wrong (with ref.listen)

Meanwhile, the job of our controller is to:

  • talk to the repository on behalf of the widget
  • emit state changes as needed

And since the controller doesn't depend on any UI code, it can be easily unit tested, and this makes it an ideal place to store any widget-specific business logic.

In summary, widgets and controllers belong to the presentation layer in our app architecture. But there are three additional layers: data, domain, and application, which are covered in the articles linked above.

Or if you want to dive deeper, check out my Flutter Foundations course. 👇


Clean Architecture - An Introduction


What is N-Tier Architecture?


For a long time, I have been using the classic "N-Tier" architecture (UI Layer -> Logic Layer -> Data Layer) in most of the applications I build. I rely heavily on interfaces, and learnt a long time ago that IoC (Inversion of Control) is your friend. This architecture enabled me to build loosely coupled, testable applications, and has served me well so far. However, like many professional software engineers, I'm always on the lookout for ways to improve my architecture when designing applications.

Recently, I came across Clean Architecture from a presentation by Jason Taylor at a Goto conference, and have become fascinated with this architecture / pattern. It validated some of the things I had already been doing, but improved in other areas that always felt a bit clunky to me (like integrating with 3rd party services, and where the heck does validation go?).

NOTE: Although this architecture is language and framework agnostic, I will be mentioning some .NET Framework terms to help illustrate concepts.

First off, let's examine the classic N-Tier architecture. This has been around for 20+ years, and it is still common in the industry today.

N-Tier most commonly has 3 layers:

[Diagram: N-Tier architecture — UI Layer → Logic Layer → Data Layer]

Each layer is only allowed to communicate with the next layer down (i.e. UI cannot communicate directly with Data). On the surface this limitation might seem like a good idea, but in implementation it means a different set of models for each layer, which results in way too much mapping code. This problem gets amplified the more layers you add.

UI Layer:

  • Controllers
  • Static Assets

Typically this would be an MVC or Web API project.

Logic Layer:

  • Business / Application Logic (usually implemented as Services)
  • Integration with 3rd party services
  • Calls directly into the Data layer

This is where the meat of an N-Tier application is.

Data Layer:

  • Data models
  • Database access

This architecture has had many names over the years: Onion Architecture, Hexagonal Architecture, Screaming Architecture, and others. The approach is not new, but it is also not nearly as common as it perhaps should be.

So, compared to N-Tier, what is Clean Architecture and how is it different?

Let's start with a picture:

[Diagram: Clean Architecture — concentric layers with all dependencies pointing inwards]

The first thing to notice here is the direction of the dependencies. All dependencies flow inwards. Outer layers can communicate with ANY inner layer (compare this to N-Tier where each layer can only communicate with the one below it). This follows the Dependency Inversion Principle which means that dependencies are injected instead of being explicitly created. Another name for this is the Hollywood Principle: Don't call us, we'll call you. 😊

Application and Domain are considered the 'core' of the application. Application depends on Domain , but Domain depends on nothing.

When the Application needs functionality from Infrastructure (e.g. database access), the application will define its own interfaces that infrastructure will implement. This decoupling is huge, and is one of the major benefits of this approach. Not only does it allow for easier unit testing, it also means it is persistence ignorant. When querying data, the underlying data store could be a database, web service, or even flat file. The application doesn't care and doesn't need to know. With the implementation details being outside the core, it allows us to focus on business logic and avoids pollution with less important details. It also provides flexibility in that today the data might come from one data source, but in the future it may need to come from a different data source. Due to the loose coupling, only the infrastructure layer will need to change to support this.
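
As a rough sketch of that decoupling (in Kotlin here, although the article's own examples are .NET, and every name below is invented): the core defines the interface, and Infrastructure implements it.

```kotlin
// Application core: defines the port it needs; knows nothing about persistence.
interface ContactRepository {
    fun byId(id: Int): Contact?
}

data class Contact(val id: Int, val email: String)

class GetContactHandler(private val contacts: ContactRepository) {
    fun handle(id: Int): Contact =
        contacts.byId(id) ?: throw NoSuchElementException("no contact $id")
}

// Infrastructure: implements the port. Swapping the database for a web service
// or a flat file means replacing only this class; the core never changes.
class InMemoryContactRepository : ContactRepository {
    private val rows = mapOf(1 to Contact(1, "ada@example.com"))
    override fun byId(id: Int): Contact? = rows[id]
}

fun main() {
    println(GetContactHandler(InMemoryContactRepository()).handle(1))
}
```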

Jeffrey Palermo defined Four Tenets of Clean Architecture:

  • The application is built around an independent object model
  • Inner layers define interfaces. Outer layers implement interfaces
  • The direction of coupling is toward the center
  • All application core code can be compiled and run separately from infrastructure

All of this strives to make it easy for developers to do the right things, and hard for them to do the wrong things. We are striving to "force developers into the pit of success".

The Domain Layer contains:

  • Value Objects
  • Aggregates (if doing DDD)
  • Enumerations

The Domain Layer is the heart of your application, and responsible for your core models. Models should be persistence ignorant, and encapsulate logic where possible. We want to avoid ending up with Anemic Models (i.e. models that are only collections of properties).

The Domain Layer could be included in the Application Layer, but if you are using an ORM like Entity Framework, the Infrastructure Layer will need to reference the domain models, in which case it's better to split them out into a separate layer.

Data annotations should be left out of domain models. These should be added in the Infrastructure Layer using fluent syntax. If you are used to using data annotations for your validation, I instead recommend using Fluent Validation in the Application Layer, which provides even more capability than annotations.

The Application Layer contains:

  • Application Interfaces
  • View Models / DTOs
  • Application Exceptions
  • Commands / Queries (if doing CQRS)

This is the application of your Domain, used to implement the use cases for your business. It provides the mapping from your domain models to one or more view models or DTOs.

Validation also goes into this layer. It could be argued that validation goes into the domain, but the risk there is that errors raised may reference fields not present in the DTO / View Model, which would cause confusion. IMO it's better to have potentially duplicated validation than to validate an object that has not been passed into the command/query.

My preference is to use CQRS Commands and Queries to handle all application requests. MediatR can be used to facilitate this and add additional behaviour like logging, caching, automatic validation, and performance monitoring to every request. If you choose not to use CQRS, you can swap this out for using services instead.

The Application Layer ONLY references the Domain Layer. It knows nothing of databases, web services, etc. It does, however, define interfaces (e.g. IContactRepository, IContactWebService, IMessageBus) that are implemented by the Infrastructure Layer.
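
Continuing the hypothetical sketch above, the Application Layer might map the domain model to a view model/DTO like this (again a sketch, not the article's code):

```kotlin
// Application layer: a DTO shaped for one screen, plus the mapping from the
// domain model. No database or web-service types appear in these signatures.
data class ContactDto(val id: Int, val displayEmail: String)

class GetContactQueryHandler(private val contacts: ContactRepository) {
    fun handle(id: Int): ContactDto {
        val contact = contacts.byId(id) ?: throw NoSuchElementException("no contact $id")
        return ContactDto(contact.id, contact.email.lowercase())
    }
}
```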

The Infrastructure Layer contains:

  • Web services
  • Message Bus
  • Configuration

The Infrastructure Layer will implement interfaces from the Application Layer to provide functionality to access external systems. These will be hooked up by the IoC container, usually in the Presentation Layer.

The Presentation Layer will usually have a reference to the Infrastructure Layer, but only to register the dependencies with the IoC container. This can be avoided with IoC containers like Autofac with the use of Registries and assembly scanning.

The Presentation Layer contains:

  • MVC Controllers
  • Web API Controllers
  • Swagger / NSwag
  • Authentication / Authorisation

The Presentation Layer is the entry point to the system from the user's point of view. Its primary concerns are routing requests to the Application Layer and registering all dependencies in the IoC container. Autofac is my favourite container, but use whatever makes you happy. If you are using ASP.NET, actions in controllers should be very thin, mostly just passing the request or command on to MediatR.

The Testing Layer is another entry point to the system. Primarily this should be aimed at the Application Layer, which is the core of the application. Because all infrastructure is abstracted by interfaces, mocking out these dependencies becomes trivial.

If you are interested in learning more about testing, I highly recommend Clean Testing.

The layers described so far make up the basic approach of Clean Architecture. You may need more layers depending on your application.

If you are not using an ORM you may be able to combine Domain and Application Layers for simplicity.

The outermost ring might also have more segments. For example, you may wish to split out infrastructure into other projects (e.g. Persistence).

This approach works well with Domain-Driven Design, but works equally well without it.

CQRS is the recommended approach for the entry point into the Application Layer . However, you could also use typical services if you're not comfortable with that.

I've also seen implementations where the application core is divided up into 4 internal layers. IMO this is overkill for most projects. We want to balance clarity with simplicity.

This all sounds great right? But how can I get started?

Fortunately for us all, Jason Taylor has created a .NET Core solution template that contains a fully functioning application with an Angular 9 frontend and associated unit and integration tests. This is a great starting point to see everything in action.

Clean Architecture Solution Template

You can find more about this here

Clean Architecture is by no means new, and is nothing groundbreaking. However, with a few tweaks on the typical N-Tier architecture the result is a completely testable, more maintainable solution that can adapt to change faster. Due to the loose coupling between outer and inner layers, modifications can be made easier, which can be the difference between an application lasting 2 years and 10 years.

I'm still working out the kinks in my own implementations, but really see the advantages with this approach and am excited to see the results over time.

Give it a try and let me know how you go.

  • Clean Architecture - Jason Taylor
  • Rules to Better Clean Architecture
  • Microsoft Docs - Clean Architecture
  • The Clean Architecture - Robert C. Martin
  • Peeling Back the Onion Architecture - Tony Sneed
  • Onion Architecture - Jeffrey Palermo

Presentation Layer in OSI model

Prerequisite: OSI Model

Introduction: The Presentation Layer is the 6th layer in the Open System Interconnection (OSI) model. This layer is also known as the Translation layer, as it serves as a data translator for the network. The data which this layer receives from the Application Layer is extracted and manipulated here as per the required format to transmit over the network. The main responsibility of this layer is to provide or define the data format and encryption. The presentation layer is also called the Syntax layer, since it is responsible for maintaining the proper syntax of the data which it either receives or transmits to other layers.

Functions of the Presentation Layer:

The presentation layer, being the 6th layer in the OSI model, performs several types of functions, which are described below:

  • The presentation layer formats and encrypts data to be sent across the network.
  • This layer takes care that the data is sent in such a way that the receiver will understand the information and will be able to use the data efficiently and effectively.
  • This layer manages abstract data structures and allows high-level data structures (for example, banking records) to be defined and exchanged.
  • This layer carries out encryption at the transmitter and decryption at the receiver.
  • This layer carries out data compression to reduce the bandwidth of the data to be transmitted (the primary goal of data compression is to reduce the number of bits to be transmitted).
  • This layer is responsible for interoperability (the ability of computers to exchange and make use of information) between encoding methods, as different computers use different encoding methods.
  • This layer basically deals with the presentation part of the data.
  • By carrying out data compression (reducing the number of bits in transmission), the presentation layer improves data throughput.
  • This layer also deals with issues of string representation.
  • The presentation layer is also responsible for integrating all formats into a standardized format for efficient and effective communication.
  • This layer encodes messages from the user-dependent format into the common format and vice versa for communication between dissimilar systems.
  • This layer deals with the syntax and semantics of messages.
  • This layer also ensures that messages presented to both the upper and the lower layers are standardized and in an accurate format.
  • The presentation layer is also responsible for translation, formatting, and delivery of information for processing or display.
  • This layer also performs serialization (the process of translating a data structure or an object into a format that can be stored or transmitted easily).

Features of the Presentation Layer in the OSI model: the presentation layer plays a vital role while communication is taking place between two devices in a network.

The features provided by the presentation layer are:

  • The presentation layer can apply sophisticated compression techniques, so fewer bytes of data are required to represent the information when it is sent over the network.
  • If two or more devices are communicating over an encrypted connection, the presentation layer is responsible for adding encryption on the sender's end as well as decoding the encryption on the receiver's end, so that it can present the application layer with unencrypted, readable data.
  • This layer formats and encrypts data to be sent over a network, providing freedom from compatibility problems.
  • The presentation layer also negotiates the Transfer Syntax.
  • The presentation layer is also responsible for compressing data it receives from the application layer before delivering it to the session layer (the 5th layer in the OSI model), improving the speed and efficiency of communication by minimizing the amount of data to be transferred.

Working of the Presentation Layer in the OSI model: The presentation layer, as a translator, converts the data sent by the application layer of the transmitting node into an acceptable and compatible data format, based on the applicable network protocol and architecture. Upon arrival at the receiving computer, the presentation layer translates the data into a format usable by the application layer. In other words, this layer takes care of any issues that occur when transmitted data must be viewed in a format different from the original format. As a functional part of the OSI model, the presentation layer performs a multitude of data conversion algorithms and character translation functions. Mainly, this layer is responsible for managing two network characteristics: protocol (set of rules) and architecture.
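
As a loose illustration of three of these responsibilities (encoding, compression, encryption), here is a small Kotlin sketch using JDK built-ins; it is conceptual only, not an implementation of OSI layer 6:

```kotlin
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream
import javax.crypto.Cipher
import javax.crypto.KeyGenerator

fun main() {
    val message = "banking record: account=42, balance=100.00"

    // Encoding: turn system-dependent text into a well-defined byte form (UTF-8).
    val encoded = message.toByteArray(Charsets.UTF_8)

    // Compression: reduce the bits to transmit (for a tiny input like this the
    // gzip header actually adds bytes; realistic payloads shrink).
    val buffer = ByteArrayOutputStream()
    GZIPOutputStream(buffer).use { it.write(encoded) }
    val compressed = buffer.toByteArray()

    // Encryption at the transmitter; the receiver mirrors this with decryption.
    val key = KeyGenerator.getInstance("AES").generateKey()
    val cipher = Cipher.getInstance("AES/GCM/NoPadding").apply { init(Cipher.ENCRYPT_MODE, key) }
    val ciphertext = cipher.doFinal(compressed)

    println("plain=${encoded.size}B compressed=${compressed.size}B encrypted=${ciphertext.size}B")
}
```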

Presentation Layer Protocols: The presentation layer performs several functions to make sure that data being transferred or received is accurate and clear to all the devices in a closed network. For performing translations and the other functions described above, it uses certain protocols, which are defined below:

  • Apple Filing Protocol (AFP): Apple Filing Protocol is a proprietary network protocol (communications protocol) that offers file services to macOS and the classic Mac OS. It is basically the network file control protocol specifically designed for Mac-based platforms.
  • Lightweight Presentation Protocol (LPP): Lightweight Presentation Protocol is used to provide ISO presentation services on top of TCP/IP-based protocol stacks.
  • NetWare Core Protocol (NCP): NetWare Core Protocol is a network protocol used to access file, print, directory, clock synchronization, messaging, remote command execution and other network service functions.
  • Network Data Representation (NDR): Network Data Representation is basically an implementation of the presentation layer in the OSI model, which provides or defines various primitive data types, constructed data types and also several types of data representations.
  • External Data Representation (XDR): External Data Representation (XDR) is a standard for the description and encoding of data. It is useful for transferring data between computer architectures and has been used to communicate data between very diverse machines. Converting from the local representation to XDR is called encoding, whereas converting XDR into the local representation is called decoding.
  • Secure Socket Layer (SSL): The Secure Socket Layer protocol provides security to the data that is being transferred between the web browser and the server. SSL encrypts the link between a web server and a browser, which ensures that all data passed between them remains private and free from attacks.



Application layer vs domain layer?

I am reading Domain-Driven Design by Evans and I am at the part discussing the layered architecture. I just realized that application and domain layers are different and should be separate. In the project I am working on, they are kind of blended and I can't tell the difference until I read the book (and I can't say it's very clear to me now), really.

My question: since both of them concern the logic of the application and are supposed to be clean of technical and presentation aspects, what are the advantages of drawing a boundary between these two?

  • architecture
  • domain-driven-design


7 Answers

I recently read DDD myself. When I got to this section I was pleasantly surprised to find out I had discovered the same 4-layer architecture that Evans did. As @lonelybug pointed out, the domain layer should be completely isolated from the rest of the system. However, something has to translate UI-specific values (query strings, POST data, session, etc.) into domain objects. This is where the application layer comes into play. Its job is to translate back and forth between the UI, the data layer and the domain, effectively hiding the domain from the rest of the system.

I see a lot of ASP.NET MVC applications now where almost all the logic is in the controllers. This is a failed attempt to implement the classic 3-layer architecture. Controllers are difficult to unit test because they have so many UI-specific concerns. In fact, writing a controller so that it isn't directly concerned with "Http Context" values is a serious challenge in and of itself. Ideally, the controller should just perform translation, coordinate work, and spit back the response.

It can even make sense to do basic validation in the application layer. It's okay for the domain to assume the values going into it make sense (is this a valid ID for this customer and does this string represent a date/time). However, validation involving business logic (can I reserve a plane ticket in the past?) should be reserved for the domain layer.

Martin Fowler actually comments on how flat most domain layers are these days. Even though most people don't even know what an application layer is, he finds that a lot of people make rather dumb domain objects and complex application layers that coordinate the work of the different domain objects. I'm guilty of this myself. The important thing isn't to build a layer because some book told you to. The idea is to identify responsibilities and separate out your code based on those responsibilities. In my case, the "application layer" kind of evolved naturally as I increased unit testing.


  • 24 I don't think what you state here is correct: " However, something has to translate UI-specific values (query strings, POST data, session, etc.) into domain objects. This is where the application layer comes into play". What you are referring is in DDD's terms the "Presentation" layer. The Application Layer is supposed to deal with plumbing, concurrency and cross-cutting concerns, being just a tiny wrapper over the Domain Layer. What you are describing would correspond to a (sub) layer in the Presentation Layer. –  devoured elysium Commented Dec 18, 2015 at 16:48
  • Agreeing with the other commentor. It sounds like App is like a Controller. This is false. Application layer does know nothing about the UI –  IceFire Commented Jun 10, 2022 at 9:13

The domain layer models the business of your application. It should be your clear interpretation of its rules and its component dynamics, and it contains its state at any given moment.

The application layer is concerned with defining the jobs needed to accomplish a certain application task. Mainly, it is responsible for mandating the necessary domain work and interacting with other (external or not) services.

For example, my financial software application has a user operation for changing the state of a model entity (entity as defined in DDD [89]):

  • "The Chief of operations can approve a financial proposal".

But, as an application process, besides all the model consequences of this operation, I have to send an internal communication to other users of the application. This kind of work is "orchestrated" in the application layer. I would not want my domain layer thinking about directing a messaging service (and certainly this is not a presentation layer responsibility). Either way, one thing is for sure: I need a new layer, as my domain layer is all about the core business and my presentation layer is all about interpreting user commands and presenting results.
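
In code, that split might look something like this (a hedged Kotlin sketch; the answer names no concrete types, so everything here is invented):

```kotlin
// Application layer: coordinates domain work and messaging; holds no business rules.
class ApproveProposalService(
    private val proposals: ProposalRepository,  // implemented in the data layer
    private val messages: MessageService        // implemented in infrastructure
) {
    fun approve(proposalId: Long) {
        val proposal = checkNotNull(proposals.byId(proposalId)) { "unknown proposal" }
        proposal.approve()                       // the business rule lives on the entity
        proposals.save(proposal)
        messages.notifyUsers("Proposal $proposalId was approved")  // orchestration
    }
}

// Domain layer: the entity enforces its own invariants.
class Proposal(val id: Long, private var approved: Boolean = false) {
    fun approve() {
        check(!approved) { "proposal $id is already approved" }
        approved = true
    }
}

interface ProposalRepository {
    fun byId(id: Long): Proposal?
    fun save(proposal: Proposal)
}

interface MessageService {
    fun notifyUsers(text: String)
}
```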

  • Business is one of those words that frequently lead to multiple interpretations of it's meaning but for sure you can find lots of examples and talk-about in DDD;
  • DDD stands for Domain-Driven Design book by Eric Evans and number inside square brackets for page number.


  • 2 This is probably the clearest distinction I found between Application and Domain layer over dozens of blog posts and SO topics. Thank you! –  Jean Claveau Commented Feb 21, 2023 at 10:02
  • Which layer shoud define API (DTOs) for quering data needed for validation, e.g. IsProductCodeUnique query/response, or queries needed for fetching data to datagrids (ofther reading from SQL views with cross aggregate joins) –  Liero Commented Apr 22 at 13:57

Taking from Martin Fowler's patterns of enterprise design, the most common layers are:

Presentation - these are the views and presentation templates which generate the interaction interface for your application (I say interaction because your application may be accessed by other systems through web services or RMI, so it may not be a user interface). This also includes controllers, which decide how actions will be executed.

Domain - this is where your business rules and logic reside, your domain models are defined, etc.

Data Source - this is the data mapping layer (ORM) and data source (database, file system etc)

How do you draw the boundaries between the three layers:

Do not put presentation-specific logic within your models or domain objects.

Do not put logic within your pages and controllers (i.e., logic to save objects to the database, create database connections, etc.), which would make your presentation layer brittle and difficult to test.

Use an ORM which enables you to decouple your data source access and actions from the model.

Follow the thin controller / fat model paradigm: controllers are for controlling the process of execution, not carrying it out. More at http://www.littlehart.net/atthekeyboard/2007/04/27/fat-models-skinny-controllers/ and http://weblog.jamisbuck.org/2006/10/18/skinny-controller-fat-model


  • The Application Layer and Domain Layer both come under the scope of implementation.
  • The Application Layer acts as an API.
  • The Domain Layer acts as the implementation of that API; it contains the business logic, so it is also called the Business Logic Layer.


  • 2 never though of it this way....I feel enlightened –  Nikos Commented Apr 22, 2019 at 18:40
  • 1 I like this. @Premraj could you provide reference? –  Christian H Commented Jul 15, 2021 at 23:05
  • it is not like that. What is difference between API in Application layer and Front Controller?? API is presentation layer at all. App layer consist of business use cases, wheras domain layer consists of business rules. –  zolty13 Commented Feb 11, 2022 at 15:33

The Domain Layer should be designed as an isolation layer, which means the business logic and rules should not be affected by code changes in the Application Layer, Presentation Layer, or Infrastructure Layer.

The Application Layer is supposed to be designed to provide functions describing what a system (application) interface can do (think of this like an API or RESTful interface). For example, users can log in to a system, and in this application action (login), application layer code will be the client code for the Domain Layer (or Infrastructure Layer), which retrieves the User domain object and applies this object's methods to implement the 'login' function.

The Application Layer should also be designed as an isolation layer, which means the application's behaviours should not be affected by code changes in the Presentation Layer, Domain Layer, or Infrastructure Layer.


  • 3 At least in literature such as Domain-Driven Design (Evans), it is acknowledged that the layers have a one-way dependency ... fact is, at some point your code depends on something . UI depends on Application, but not vice-versa. Application depends on Domain, but not vice-verse. Domain on Infrastructure, not vice-versa. –  user44798 Commented Jan 26, 2013 at 18:39
  • 1 Dependency is about how your programming, the isolation layer is about how you design you system layers. One way dependency does not broke the isolation concept here, because when you programming, the top layer code should dependent on the interface of lower layer rather than the implementation classes. –  stevesun21 Commented Sep 15, 2013 at 12:51
  • That's great and all on paper, but in practice, business requirements result in changes that can affect the interface of the application layer in such a way that changes bubble up through the presentation layer, and sometimes down to the storage layer. That is all I was getting at... –  user44798 Commented Sep 15, 2013 at 15:40
  • Isolation layer design does not mean no changes allowed in the future. Contrary, it makes the changes much more easier -- easier to test and easier to estimate the works. Yes, a new business requirement means you may need to change from the top to the bottom, isn't it the way how you implemented the existing function before? If you can design each layer based on SOLID principles, then you may found that you can just reuse existing functions from the bottom layer. –  stevesun21 Commented Jun 14, 2014 at 14:17

The main reason for these boundaries is separation of concerns . The code that accesses the data store should only have to worry about accessing the data store. It should not be responsible for enforcing rules upon the data. Additionally the UI should be responsible for updating controls in the UI, getting values from user input and translating them to something that the domain layer can use, and nothing more. It should call operations provided by the domain layer to perform any needed actions (e.g. save this file). A web service that is called should be responsible for converting from the transmission medium to something the domain layer can use, and then call the domain layer (most tools do a lot of this work for you).

This separation, when implemented properly, can afford you the capability to change parts of your code without affecting others. For example, maybe the sort order of a returned collection of objects needs to change. Since you know that the layer responsible for data manipulation (usually the business logic layer) handles this stuff, you can easily identify where the code needs to be changed, without having to modify how the data is retrieved from the data store, or any of the applications using the domain (the UI and web service from my example above).

The ultimate goal is to make your code as easy to maintain as possible.

As a side note, some things cannot be pigeon-holed into a specific layer of the domain (e.g. logging, validation, and authorization). These items are commonly referred to as cross-cutting concerns, and in some cases can be treated as a layer that stands by itself that all the other layers can see and use.

Personally I think the layered approach is outdated, and that the service approach is better. You still have the stark line drawn in the sand as to who does what, but it doesn't force you to be as hierarchical. For example, a purchase order service, a billing service, and a shipping service, from the application perspective all of these services represent your domain, and the deferment of responsibility I described above is still valid in this context, it has just been altered such that your domain exists in multiple places, further utilizing the separation of concerns concept.


  • I have been curious about placement of authorization logic, and from what I am trying to understand, it fits into the 'application layer'. Would you mind sharing some insight as to why it might not be best to contain it within that layer of logic? –  user44798 Commented Jan 26, 2013 at 18:42
  • 1 That is the perfect type of question for this site. You should post it, so that every has a chance to answer. –  Charles Lambert Commented Jan 28, 2013 at 13:19
  • @tuespetre Could you provide a link to that post? –  drizzie Commented Apr 7, 2015 at 17:04

The point of Domain Driven Modelling is to separate the essential domain model out and have it exist without any dependencies on other layers and other application concerns.

This allows you to focus on the domain itself without distractions (such as coordinating between the UI and the persistence services).
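As a hypothetical illustration of what "without any dependencies on other layers" looks like in practice, the domain model below imports nothing from UI, persistence, or framework code -- only the domain's own rules:

```python
# Hypothetical sketch: a domain entity with no imports from UI,
# persistence, or framework code -- only the domain's own rules.
class PurchaseOrder:
    def __init__(self, lines):
        self.lines = lines  # list of (quantity, unit_price) tuples

    def total(self):
        return sum(qty * price for qty, price in self.lines)

    def can_be_submitted(self):
        # A pure business rule: an order must have at least one line
        # and a positive total before it may be submitted.
        return bool(self.lines) and self.total() > 0

order = PurchaseOrder([(2, 9.99), (1, 40.00)])
print(order.total(), order.can_be_submitted())
```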


  • Then, is the data source (an ORM) inside the domain? –  Maykonn Commented Oct 26, 2013 at 13:22
  • @Maykonn - It could be. However, an ORM is not a source of data. It is a tool between your code and the actual source of data (a relational database). How you access the data shouldn't be a concern of the domain - builders and factories can deal with that (and an ORM if you have one). –  Oded Commented Oct 26, 2013 at 19:10
  • I agree. And I was wrong about datasource and ORM. Thanks! –  Maykonn Commented Oct 27, 2013 at 14:18
  • But the question was about the difference between the Application (not Presentation) and Domain layers. –  Liero Commented Apr 22 at 13:52


Layer 6: Presentation Layer

De/Encryption, Encoding, String representation

The presentation layer (data presentation layer, data provision level) converts the system-dependent representation of data (for example, ASCII, EBCDIC) into an independent form, enabling syntactically correct data exchange between different systems. Functions such as data compression and encryption also belong to layer 6: they ensure that data sent by the application layer of one system can be read by the application layer of another system. If necessary, the presentation layer acts as a translator between different data formats by using a data format understandable to both systems, such as ASN.1 (Abstract Syntax Notation One).

OSI Layer 6 - Presentation Layer

The presentation layer is responsible for the delivery and formatting of information to the application layer for further processing or display. It relieves the application layer of concern regarding syntactical differences in data representation within the end-user systems. An example of a presentation service would be the conversion of an EBCDIC-coded text computer file to an ASCII-coded file. The presentation layer is the lowest layer at which application programmers consider data structure and presentation, instead of simply sending data in the form of datagrams or packets between hosts.

This layer deals with issues of string representation - whether they use the Pascal method (an integer length field followed by the specified amount of bytes) or the C/C++ method (null-terminated strings, e.g. "thisisastring\0"). The idea is that the application layer should be able to point at the data to be moved, and the presentation layer will deal with the rest. Serialization of complex data structures into flat byte-strings (using mechanisms such as TLV or XML) can be thought of as the key functionality of the presentation layer.

Encryption is typically done at this level too, although it can be done on the application, session, transport, or network layers, each having its own advantages and disadvantages. Decryption is also handled at the presentation layer. For example, when logging on to bank account sites the presentation layer will decrypt the data as it is received.[1] Another example is representing structure, which is normally standardized at this level, often by using XML. As well as simple pieces of data, like strings, more complicated things are standardized in this layer. Two common examples are 'objects' in object-oriented programming, and the exact way that streaming video is transmitted.

In many widely used applications and protocols, no distinction is made between the presentation and application layers. For example, HyperText Transfer Protocol (HTTP), generally regarded as an application-layer protocol, has presentation-layer aspects such as the ability to identify character encoding for proper conversion, which is then done in the application layer. Within the service layering semantics of the OSI network architecture, the presentation layer responds to service requests from the application layer and issues service requests to the session layer. In the OSI model, the presentation layer ensures that the information the application layer of one system sends out is readable by the application layer of another system. For example, a PC program communicates with another computer, one using extended binary coded decimal interchange code (EBCDIC) and the other using ASCII to represent the same characters. If necessary, the presentation layer might be able to translate between multiple data formats by using a common format. (Wikipedia)
  • Data conversion
  • Character code translation
  • Compression
  • Encryption and Decryption
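To make the character-code translation concrete, here is a small Python sketch (illustrative only, not tied to any OSI implementation); it uses Python's built-in cp500 codec, one common EBCDIC variant:

```python
# Minimal sketch of presentation-layer character translation, using
# Python's built-in codecs; cp500 is one common EBCDIC variant.
text = "HELLO, WORLD"

ebcdic_bytes = text.encode("cp500")   # system A: EBCDIC representation
ascii_bytes = text.encode("ascii")    # system B: ASCII representation
print(ebcdic_bytes.hex())  # 'c8c5d3d3d66b40e6d6d9d3c4'
print(ascii_bytes.hex())   # '48454c4c4f2c20574f524c44'

# A "layer 6" translator: decode from the sender's encoding into an
# independent form (a Python str), then encode for the receiver.
received = ebcdic_bytes.decode("cp500").encode("ascii")
assert received == ascii_bytes
```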

The presentation OSI layer is usually composed of two sublayers:

CASE (common application service element):

  • ACSE - Association Control Service Element
  • ROSE - Remote Operation Service Element
  • CCR - Commitment, Concurrency and Recovery
  • RTSE - Reliable Transfer Service Element

SASE (specific application service element):

  • FTAM - File Transfer, Access and Management
  • VT - Virtual Terminal
  • MOTIS - Message Oriented Text Interchange Standard
  • CMIP - Common Management Information Protocol
  • JTM - Job Transfer and Manipulation
  • MMS - Manufacturing Messaging Service
  • RDA - Remote Database Access
  • DTP - Distributed Transaction Processing



Lower Salinas Valley Hydrologic Models: Discretization Data (ver. 1.2, August 2024)

The Lower Salinas Valley Hydrologic Models' discretization data includes a shapefile of the model domain and layers and a shapefile of the water balance subregions. The Lower Salinas Valley Hydrologic Models (LSVHM) include a historical model, the Salinas Valley Integrated Hydrologic Model (SVIHM), and a reservoir operations model, the Salinas Valley Operational Model (SVOM). While the Lower Salinas Valley Hydrologic Models have different purposes, they have the same model extent and many of the same input datasets, including the discretization data included in this data release.

The model domain for the Lower Salinas Valley Hydrologic Models includes the Salinas Valley groundwater basin and extends offshore to include offshore aquifers, in order to represent potential seawater intrusion. The aquifer system is bounded by faults and depositional or formational boundaries. Some of these faults cut across some of the aquifer layers; the faults in the interior of the model domain are simulated as potential hydrologic flow barriers.

The model grid is uniform, with each grid cell approximately 6.46 acres (530-by-530 ft). There are 976 rows, 272 columns, and 9 layers, with a varying number of active cells in each layer. The 9 model layers correspond to locally defined hydrostratigraphic units that represent aquifer systems separated by confining units (Feeney and Rosenberg, 2003; Kennedy/Jenks Consultants, 2004; Brown and Caldwell, 2015; Sweetkind, 2023). These include the saturated portions of the younger and older alluvium that represent the shallow aquifer, which is underlain by the Salinas Valley Aquitard. These units overlie the Pressure 180-Foot Aquifer and the Pressure 180/400-Foot Aquitard, which in turn overlie the Pressure 400-Foot Aquifer, the underlying deep aquitard, and the basement bedrock of the Monterey Formation. The geologic units that comprise the aquifers and confining units above the Monterey Formation include the recent Alluvium, Aromas Formation, Paso Robles Formation, and Purisima Formation (Sweetkind, 2023).

The top of the Lower Salinas Valley Hydrologic Models is represented by the altitude of the land surface, but because hydrostratigraphic units are discontinuous across the study area, the uppermost active layer is a composite of model layers 1, 3, 5, 7, and 9. The following are brief descriptions of the layers: (1) the uppermost shallow Quaternary Alluvial aquifer, (2) the Salinas Valley Aquitard, (3 & 5) the Pressure 180-Foot Aquifer, (4 & 6) the 180/400-Foot Aquitard, (7) the Paso Robles Formation, (8) the Purisima Formation, and (9) the basement bedrock.

Within the model domain, a mass balance is maintained for 31 water balance subregions (WBS). The delineation of the WBS is based on the management areas of the Monterey County Zone 2C jurisdictional region, offshore regions, and areas outside of the Zone 2C jurisdictional region that are inside the active model domain representing the groundwater basin (MCWRA, 2015; MCWRA, 2018).
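As a quick check of the grid arithmetic above: a 530-by-530 ft cell is 280,900 sq ft, which at 43,560 sq ft per acre is about 6.45 acres, in line with the approximately 6.46 acres stated in the description.

```python
# Quick check of the grid arithmetic described above.
FT2_PER_ACRE = 43_560            # square feet per acre

cell_area_ft2 = 530 * 530        # each cell is 530 by 530 ft
print(cell_area_ft2 / FT2_PER_ACRE)          # ~6.45 acres per cell

nrows, ncols, nlayers = 976, 272, 9
print(f"{nrows * ncols:,} cells per layer")  # 265,472
# Total before accounting for inactive cells; each layer has a varying
# number of active cells.
print(f"{nrows * ncols * nlayers:,} cells")  # 2,389,248
```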

Citations:

Brown and Caldwell, 2015, State of the Salinas River Groundwater Basin, Consultants report prepared for Monterey County Resources Management Agency, January 16, 2015, 240 p. https://www.co.monterey.ca.us/home/showpublisheddocument/19678/63623275… .

Feeney, M. and Rosenberg, L., 2003, Deep Aquifer Investigation – Hydrogeologic Data Inventory, Review, Interpretation and Implications, Technical Memorandum to WRIME, Inc., 40 p. https://www.co.monterey.ca.us/home/showdocument?id=61923 .

Kennedy/Jenks Consultants, 2004, Hydrostratigraphic Analysis of the Northern Salinas Valley, Consultants report prepared for Monterey County Water Resources Agency, 14 May 2004, 112 p., https://www.co.monterey.ca.us/home/showdocument?id=61922 .

Monterey County Water Resources Agency (MCWRA), 2015, Monterey County Water Resources Agency, https://digitalcommons.csumb.edu/hornbeck_cgb_6_a/21 .

Monterey County Water Resources Agency (MCWRA), 2016, Boundary of the Monterey County Water Resources Agency (MCWRA) Benefit Assessment Zone, https://montereycountyopendata-12017-01-13t232948815z-montereyco.openda… .

Sweetkind, D.S., 2023, Digital data for the Salinas Valley Geological Framework, California: U.S. Geological Survey data release https://doi.org/10.5066/P9IL8VBD .

Citation Information

Publication Year: 2022
Title: Lower Salinas Valley Hydrologic Models: Discretization Data (ver. 1.2, August 2024)
DOI:
Authors: Wesley Henson, Elizabeth R. Jachens
Product Type: Data Release
Record Source:
USGS Organization: Sacramento Projects Office (USGS California Water Science Center)


  • Open access
  • Published: 09 August 2024

Benchmarking clustering, alignment, and integration methods for spatial transcriptomics

  • Yunfei Hu 1,
  • Manfei Xie 2,
  • Yikang Li 2,
  • Mingxing Rao 1,
  • Wenjun Shen 3,
  • Can Luo 2,
  • Haoran Qin 1,
  • Jihoon Baek 1 &
  • Xin Maizie Zhou 1,2 (ORCID: orcid.org/0000-0003-4015-4787)

Genome Biology, volume 25, Article number: 212 (2024)

Spatial transcriptomics (ST) is advancing our understanding of complex tissues and organisms. However, building a robust clustering algorithm to define spatially coherent regions in a single tissue slice and aligning or integrating multiple tissue slices originating from diverse sources for essential downstream analyses remains challenging. Numerous clustering, alignment, and integration methods have been specifically designed for ST data by leveraging its spatial information. The absence of comprehensive benchmark studies complicates the selection of methods and future method development.

In this study, we systematically benchmark a variety of state-of-the-art algorithms with a wide range of real and simulated datasets of varying sizes, technologies, species, and complexity. We analyze the strengths and weaknesses of each method using diverse quantitative and qualitative metrics and analyses, including eight metrics for spatial clustering accuracy and contiguity, uniform manifold approximation and projection visualization, layer-wise and spot-to-spot alignment accuracy, and 3D reconstruction, which are designed to assess method performance as well as data quality. The code used for evaluation is available on our GitHub. Additionally, we provide online notebook tutorials and documentation to facilitate the reproduction of all benchmarking results and to support the study of new methods and new datasets.

Conclusions

Our analyses lead to comprehensive recommendations that cover multiple aspects, helping users to select optimal tools for their specific needs and guide future method development.

Spatial transcriptomics (ST) technology, emerging as a complementary approach to scRNA-seq, facilitates comprehensive gene expression profiling in tissue samples while preserving the spatial information of every cell or spot analyzed [ 1 , 2 ]. ST techniques have significantly enhanced our understanding of cellular heterogeneity and tissue organization, offering insights into developmental processes, disease mechanisms, and potential therapeutic strategies [ 3 , 4 , 5 , 6 ]. ST technologies are commonly categorized into two groups: imaging-based and sequencing-based methods [ 7 , 8 , 9 , 10 , 11 , 12 , 13 ]. Advancements in spatial resolution, capture capabilities, and computational methods are continuously enhancing their potential applications and capabilities.

An essential initial step in ST research is to cluster the spots and define spatially coherent regions in terms of expression data and location adjacency [ 14 , 15 ]. This process essentially entails classical unsupervised clustering of spots into groups according to the similarity of their gene expression profiles and spatial locations, subsequently assigning labels to each cluster. To date, existing clustering methods in ST can be broadly categorized into two groups: statistical methods and graph-based deep learning methods [ 16 ].
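As a schematic illustration of this task (not a reimplementation of any benchmarked method), the sketch below, assuming an AnnData object `adata` with expression in `adata.X` and spot coordinates in `adata.obsm["spatial"]`, simply concatenates scaled expression principal components with scaled locations before k-means:

```python
# Schematic sketch only -- not any of the benchmarked tools. Assumes an
# AnnData `adata` with expression in adata.X and coordinates in
# adata.obsm["spatial"].
import numpy as np
import scanpy as sc
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.pca(adata, n_comps=30)

expr = StandardScaler().fit_transform(adata.obsm["X_pca"])
coords = StandardScaler().fit_transform(adata.obsm["spatial"])

# The weight controls how strongly spatial proximity influences clusters.
features = np.hstack([expr, 0.5 * coords])
labels = KMeans(n_clusters=7, n_init=10).fit_predict(features)
adata.obs["domain"] = labels.astype(str)
```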

Representative methods for statistical models are BayesSpace [ 17 ], BASS [ 18 ], SpatialPCA [ 19 ], DR.SC [ 20 ], and BANKSY [ 21 ]. BayesSpace performs spatial clustering at the spot level, utilizing a t-distributed error model to identify clusters and employing Markov chain Monte Carlo (MCMC) to estimate model parameters. BASS detects spatial domains and clusters cell types within a tissue section simultaneously by utilizing a hierarchical Bayesian model framework. BASS can also be applied to perform multi-slice clustering. SpatialPCA is a dimension reduction method aimed at extracting a low-dimensional representation of ST data using spatial correlation information. DR.SC employs a two-layer hierarchical model that simultaneously performs dimension reduction and spatial clustering, optimizing the extraction of low-dimensional features as well as the identification of spatial clusters. The BANKSY algorithm clusters cells using an azimuthal Gabor filter (AGF)-inspired kernel to capture gene expression variations. It constructs a neighborhood graph, computes z-scaled average neighborhood expression and AGF matrices, and combines these with the original gene expression data. This is followed by dimension reduction and graph-based clustering to determine cell types and domains.

Recent trends indicate growing momentum toward graph-based deep learning backbones, attributed to their ability to model cell relations as graphs and capture representative features. Representative methods are SpaGCN [ 22 ], SEDR [ 23 ], CCST [ 24 ], STAGATE [ 3 ], conST [ 25 ], ConGI [ 26 ], SpaceFlow [ 27 ], GraphST [ 4 ], and ADEPT [ 28 ]. These methods predominantly employ graph neural network models to extract latent spot features prior to clustering, albeit with variations in network architectures and design strategies. SpaGCN has a unique design that builds the adjacency matrix while considering histology image pixel values. SEDR employs multiple variational autoencoders to handle data from different modalities. CCST is based on a graph convolutional network to improve cell clustering and discover novel cell types. STAGATE learns low-dimensional latent embeddings with both spatial information and gene expressions via a graph attention auto-encoder. conST, ConGI, and GraphST all rely on a contrastive learning strategy [ 29 ]. conST adopts a two-phase training strategy incorporating self-supervised contrastive learning at three levels: local-local, local-global, and local-context. ConGI utilizes three different contrastive learning losses to integrate information from both the histology images and the gene expression profiles. GraphST utilizes representations of both normal graphs and corrupted graphs to construct positive and negative spot pairs for contrastive training. SpaceFlow uses spatially regularized deep graph networks to create spatially consistent low-dimensional embeddings. This framework introduces a pseudo-spatiotemporal map to integrate pseudotime with spatial locations. ADEPT employs differentially expressed gene selection and imputation procedures to minimize variation in predictions.

In contrast to merely identifying spatial domains or cell types within a single slice, there is an increasing acknowledgment of the importance of integrative and comparative analyses of multiple ST slices [ 30 ]. Thus, ST analysis tools might integrate samples originating from diverse sources, encompassing various individual samples, biological conditions, technological platforms, and developmental stages. Nonetheless, ST slices may exhibit significant “batch effects” [ 15 ], which refer to technical biases such as uneven amplification during PCR [ 31 ], variations in cell lysis [ 32 ], or differences in reverse transcriptase enzyme efficiency during sequencing. These factors have the potential to obscure genuine biological signals, thereby complicating data interpretation and integration.

To analyze multiple ST slices by minimizing batch effects, different alignment and integration methods have been introduced. Alignment methods are designed to align or match spots or cells from different ST sections or datasets to a common spatial or anatomical reference. These methods are critical for correcting distortions or differences in tissue sections, ensuring consistency across samples. Integration methods primarily merge data from various sources or conditions to create a comprehensive dataset, enhancing data robustness and revealing broader patterns not apparent in individual datasets. These techniques excel at adjusting for batch effects and normalizing data. Some tools can perform both alignment and integration tasks. Representative alignment methods include PASTE [ 33 ], PASTE2 [ 34 ], SPACEL [ 35 ], STalign [ 36 ], and GPSA [ 37 ]. PASTE utilizes the Gromov-Wasserstein optimal transport (OT) algorithm [ 38 ] for aligning adjacent consecutive ST data. PASTE2, an extension of PASTE, allows partial alignment, accommodating partial overlap between aligned slices and/or slice-specific cell types. Both PASTE and PASTE2 output a mapping matrix for every pair of consecutive ST slices, facilitating the reconstruction of the tissue’s 3D architecture through multi-slice alignment. SPACEL combines a multi-layer perceptron and a probabilistic model for deconvolution. It subsequently employs a graph convolutional network with adversarial learning to identify spatial domains across multiple ST slices and finally constructs the 3D tissue architecture by transforming and stacking the spatial coordinate systems of consecutive slices. STalign aligns ST datasets across sections, samples, and technologies by using diffeomorphic metric mapping to account for partially matched tissue sections and local non-linear distortions. GPSA is a probabilistic model that employs a two-layer Gaussian process where the first layer maps observed spatial locations to a common coordinate system (CCS), and the second layer maps from the CCS to the observed phenotypic readouts, such as gene expression.

Several integration methods have also been introduced. Notable examples include STAligner [ 39 ], DeepST [ 40 ], PRECAST [ 41 ], and SPIRAL [ 42 ]. These tools do not directly align slices; instead, they learn shared latent spot embeddings after jointly training on multiple slices. STAligner, built on the STAGATE model, introduces triplet loss by utilizing mutual nearest neighbors between spots from consecutive slices to exploit the contrastive learning strategy for enhancing inter-slice connection. DeepST consists of a graph neural network autoencoder and a denoising autoencoder to generate a representation of the augmented ST data as well as domain adversarial neural networks to integrate ST data. DeepST is also applicable to individual slices for spatial clustering. PRECAST leverages a unified model including a hidden Markov random field model and a Gaussian mixture model to simultaneously tackle low-dimensional embedding estimation, spatial clustering, and alignment embedding across multiple ST datasets. SPIRAL employs a graph autoencoder backbone with an OT-based discriminator and a classifier to remove the batch effect, align coordinates, and enhance gene expression. BASS applies a hierarchical Bayesian model framework for multi-slice clustering and outputs clustering labels.

The dichotomization of alignment and integration methods is not absolute. PASTE also outputs an integrated center slice, so it can also be classified as an integration tool. STAligner and SPIRAL are also capable of aligning multiple adjacent slices to construct a 3D architecture. For simplicity, we classified each tool into either the alignment or integration category.

Although clustering, alignment, and integration methods have enhanced our understanding of ST data and their practical applications, the lack of comprehensive benchmarking constrains comparison and hampers further algorithm development. It is common for a method to demonstrate excellent performance on well-studied, commonly used datasets; however, its performance may vary significantly when applied to brand-new data. In this work, we systematically analyze and evaluate the performance of 16 state-of-the-art clustering methods, five alignment methods, and five integration methods on a multitude of simulated and real ST datasets. We design a comprehensive benchmark framework in Fig. 1  and evaluate the clustering performance, overall robustness, layer-wise and spot-to-spot alignment accuracy, integration performance, 3D reconstruction, and computing time of each method. We consolidate these findings into a comprehensive recommendation spanning multiple aspects for the users, while also spotlighting potential areas in need of further research.

figure 1

Benchmarking framework for clustering, alignment, and integration methods on different real and simulated datasets. Top, illustration of the set of methods benchmarked, which includes 16 clustering methods, five alignment methods, and five integration methods. Bottom, overview of the benchmarking analysis, in terms of different metrics (1–7). Different experimental metrics and analyses, Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), Adjusted Mutual Information (AMI), Homogeneity (HOM), Average Silhouette Width (ASW), CHAOS, Percentage of Abnormal Spots (PAS), Spatial Coherence Score (SCS), uniform manifold approximation and projection (UMAP) visualization, layer-wise and spot-to-spot alignment accuracy, 3D reconstruction, and runtime, are designed to quantitatively and qualitatively assess method performance as well as data quality. Additional details are provided in the “ Results ” section

ST datasets examined and data preprocessing

We collected 10 ST datasets with a total of 68 slices for benchmarking, each with corresponding manual annotations shown in Table 1 . These datasets were produced by several ST protocols, including 10x Visium, ST, Slide-seq v2, Stereo-seq, STARmap, and MERFISH. We broadly categorized them into two groups based on the methodology employed: sequencing-based or imaging-based. The datasets varied in size, with the number of spots ranging from approximately 200 to over 50,000 and the number of genes from 150 to approximately 36,000.

Specifically, (1) the DLPFC dataset, generated with 10x Visium, includes 12 human DLPFC sections with manual annotation, indicating cortical layers 1 to 6 and white matter (WM), taken from three individual samples [ 51 ]. Each sample contains four consecutive slices (for example, slice A, B, C, and D in order). In each sample, the initial pair of slices, AB, and the final pair, CD, are directly adjacent (10 µm apart), whereas the intermediate pair, BC, is situated 300 µm apart.

(2) The HBCA1 dataset, generated with 10x Visium, includes a single slice of human breast cancer, which is openly available from 10x Genomics [ 23 ].

(3) The MB2SA&P dataset, generated with 10x Visium, includes two slices of the anterior and posterior mouse brain. Only the anterior section includes annotation [ 12 , 26 ].

(4) The HER2BT dataset [ 46 ] by spatial transcriptomics contains HER2-positive tumors from eight individuals (patients A–H). Each slice contains between 177 and 692 spots and was examined and annotated by a pathologist based on morphology. Regions were labeled as one of the following: cancer in situ, invasive cancer, adipose tissue, immune infiltrate, breast glands, or connective tissue.

(5) The MHPC dataset [ 19 ] by Slide-seq v2 is the largest slice used in our study with over 40,000 spots and 23,000 genes. The Allen Mouse Brain Atlas [ 52 ] was used as ground truth to identify seven key anatomical regions of the hippocampus, namely CA1, CA2, CA3, dentate gyrus (DG), third ventricle (V3), medial habenula (MH), and lateral habenula (LH). The cell-type annotations were provided by Goeva and Macosko [ 53 ].

(6) The Embryo dataset by Stereo-seq has over 50 slices, and the slices at two time points, E11.5 and E12.5, were used in our experiments. These datasets are from a large Stereo-seq project called MOSTA [ 48 ]: Mouse Organogenesis Spatiotemporal Transcriptomic Atlas by BGI.

(7) The MVC dataset [ 9 ] by STARmap contains one slice and was generated from the mouse visual cortex. It extends from the hippocampus (HPC) to the corpus callosum (CC) and includes the six neocortical layers.

(8) The MPFC dataset [ 9 ] of the mouse prefrontal cortex, annotated by BASS [ 18 ], was sequenced with the STARmap protocol. This dataset includes expression values for 166 genes measured across 1049 to 1088 single cells, along with their centroid coordinates on the tissue. Spatial domains, such as cortical layers L1, L2/3, L5, and L6, have been assigned based on the spatial expression patterns of marker genes, including Bgn for L1, Cux2 for L2/3, Tcerg1l for L5, and Pcp4 for L6. Three slices in this dataset are not categorized as consecutive.

(9) The MHypo dataset by MERFISH contains five manually annotated consecutive slices [ 18 ] labeled Bregma -0.04 mm (5488 cells), Bregma -0.09 mm (5557 cells), Bregma -0.14 mm (5926 cells), Bregma -0.19 mm (5803 cells), and Bregma -0.24 mm (5543 cells). Expression measurements were taken for a common set of 155 genes. Each tissue slice includes a detailed cell annotation, identifying eight structures: third ventricle (V3), bed nuclei of the stria terminalis (BST), columns of the fornix (fx), medial preoptic area (MPA), medial preoptic nucleus (MPN), periventricular hypothalamic nucleus (PV), paraventricular hypothalamic nucleus (PVH), and paraventricular nucleus of the thalamus (PVT).

Finally, (10) the MB dataset [ 35 , 50 ] by MERFISH has 33 consecutive mouse primary motor cortex tissue slices with similar shapes, which can be used for 3D reconstruction. Region annotation includes the six layers (L1-L6) and white matter (WM). Further details about the ground truth for each dataset are outlined in Additional file 1: Table S1. All except the MB dataset were used for benchmarking clustering tools. Five datasets, DLPFC, MB2SA&P, Embryo, MHypo, and MB, were used for benchmarking alignment and integration tools. Utilizing the evaluation framework illustrated in Fig. 1 , we conducted benchmarking of various clustering, alignment, and integration methods across all ST datasets.

All methods employ customized and often inconsistent preprocessing strategies, which might significantly impact their performance. The preprocessing of ST data typically encompasses four essential steps: quality control, normalization, feature selection, and/or dimension reduction. Each method may employ one or more of these steps. The scanpy package is commonly used to eliminate low-quality cells that lack sufficient expressed transcripts or low-quality genes that are rarely observed across the data slice, thereby mitigating the impact of noise. Subsequently, the expression matrix is normalized within each cell and log-transformed to further suppress potential extreme values. Feature selection involves any form of expression profile dimension reduction or subsetting steps. Due to the variability in preprocessing steps across different methods, it is challenging to draw a simple conclusion. Therefore, we have summarized the parameter settings and descriptions used in the preprocessing steps when benchmarking each method in Additional file 1: Table S2. For instance, STAGATE selects only highly variable genes (HVGs), while CCST and conST calculate principal components (PCAs) to reduce the input feature dimensions. SpaceFlow and ADEPT utilize HVGs but also emphasize input feature quality control by removing noisy genes and samples. Regarding alignment and integration methods, for example, STAligner, SPIRAL, and GPSA incorporate preprocessing in their workflows. All three select HVGs, but only GPSA also controls data quality by removing low-quality genes and cells. We also provided the specific pipeline of data preprocessing for each method in our GitHub.
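For reference, a typical scanpy-based pipeline covering the four preprocessing steps above might look like the following sketch; the thresholds are illustrative only and do not correspond to any specific benchmarked method's settings in Table S2:

```python
# Illustrative scanpy preprocessing; thresholds are examples only.
import scanpy as sc

adata = sc.read_h5ad("slice.h5ad")  # hypothetical input file

# 1. Quality control: drop low-quality cells/spots and rarely seen genes.
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)

# 2. Normalization: per-cell depth normalization, then log transform.
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# 3. Feature selection: keep highly variable genes (HVGs).
sc.pp.highly_variable_genes(adata, n_top_genes=3000)
adata = adata[:, adata.var["highly_variable"]].copy()

# 4. Dimension reduction: principal components as model input.
sc.pp.pca(adata, n_comps=50)
```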

PCA is commonly used for dimensionality reduction in clustering methods. GLM-PCA [ 54 ] is believed to improve low-dimensional representation compared to PCA. As detailed in Additional file 2: Supplementary results and Fig. S1, we analyzed whether replacing principal components (PCs) with GLM-PCs enhances performance.

Performance comparison of 16 clustering methods

We first performed a comprehensive benchmarking analysis of 16 different clustering methods, aimed at assessing their performance in accurately identifying spatial domains. The two heatmaps of Fig. 2 a, b illustrated the average Adjusted Rand Index (ARI) for each method across 33 slices from eight ST datasets, along with the corresponding rank scores for each tool. We ranked the tools in descending order based on their average rank of ARI. Details for computing ARI values and rank scores are included in the “ Methods ” section. The ARI and rank results revealed that BASS, GraphST, ADEPT, BANKSY, and STAGATE emerged as top-tier tools, followed by SpatialPCA and CCST. Notably, BASS attained the highest average and sum rank, followed by GraphST, ADEPT, and BANKSY. BASS achieved a much higher ARI than other methods on the MHypo datasets. Most methods struggled to give reasonable predictions on the HER2BT datasets, since the ground-truth annotated regions were less coherent and the data noisier. This comprehensive evaluation shed light on the relative strengths of these methods in the context of spatial domain identification within each ST slice.

figure 2

Clustering performance over 16 methods on 33 ST slices of eight datasets. a ARI heatmap. Each average ARI value is based on 20 runs. Empty entries for specific tools indicate either that the tool is not optimized for those use cases or that technical issues prevent the tool from completing its execution. b Ranking heatmap. This ranking heatmap is created by normalizing all results within the same slice by dividing them by the maximum ARI value (representing the best performance) among all methods, thus standardizing all ARI values to 1. For each method, the best ranking for the sum result is 33, and the best ranking for the average result is 1. The two heatmaps in ( a ,  b ) share a color bar ranging from 0 to 1. c Line plots illustrating the overall robustness of all methods across eight datasets in terms of ARI. d – k Ground truth visualization plots and box plots depicting ARI values from 20 runs of all tools on selected data slices from each dataset. The box plots illustrate the variability in the ARI on individual slices for certain tools since they do not use a fixed seed. In the box plots, the center line, box limits, and whiskers denote the median, upper and lower quartiles, and 1.5× interquartile range, respectively. Certain tools were not applicable to specific datasets, so for the purpose of ordering, their ARI values in the box plots were assigned a value of 0
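A minimal sketch of the per-slice normalization and rank scoring described in (b), assuming a methods-by-slices array of average ARI values:

```python
# Sketch of the per-slice normalization described in Fig. 2b. `ari` is a
# (n_methods, n_slices) array of average ARI values; NaN marks tools that
# could not run on a slice.
import numpy as np

rng = np.random.default_rng(0)
ari = rng.random((16, 33))            # placeholder values

norm = ari / np.nanmax(ari, axis=0)   # best method on each slice -> 1.0
sum_score = np.nansum(norm, axis=1)   # best achievable sum is 33
avg_score = np.nanmean(norm, axis=1)  # best achievable average is 1
```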

In Fig. 2 c, we further present a holistic assessment of the overall robustness of each clustering method by aggregating the average ARI across slices within each of the eight datasets and depict the results in a line chart. Notably, lower variances were exhibited in the DLPFC, MB2SA (the anterior section of MB2SA&P), HER2BT, and Embryo datasets across all clustering methods, albeit for different reasons. BASS, in alignment with previous analyses, emerged as the best clustering tool for four datasets. Nevertheless, it exhibited comparatively poorer performance on the HBCA1 dataset. ADEPT and BANKSY consistently secured the second and third positions, respectively, across most datasets, while GraphST led in the DLPFC and MB2SA datasets. The two slices from the Embryo dataset, each containing approximately 30,000 and 50,000 cells, respectively, were used to investigate the scalability of various methods. GraphST, CCST, and DeepST were not applicable to either slice due to memory constraints. ADEPT, SpaGCN, SEDR, and conST were not applicable to one of the slices (Embryo E12.5) for the same reason. Among all the tools, STAGATE achieved the highest overall performance in terms of ARI across both Embryo slices.

Although we have highlighted top tools based on overall performance across all slices and datasets, certain tools may perform exceptionally well or experience performance degradation in datasets for specific ST protocols or tissue types. For instance, GraphST performed best in 10x Visium datasets but experienced a decline in performance with the STARmap and MERFISH datasets, which were not specialized data types for GraphST (Fig. 2 c, dark green line). STAGATE (Fig. 2 c, purple line) performed the best for the Stereo-seq Embryo dataset, but its accuracy ranking was not at the top for other protocol datasets. SpaceFlow ranked third for the MERFISH (imaging-based) dataset but did not perform well for other sequencing-based datasets (Fig. 2 c, olive line). ConGI achieved top accuracy in both tumor slice datasets (HBCA1 and HER2BT), but did not perform well in brain slice datasets (Fig. 2 c, orange line).

Random seed analysis

Since the mean ARI does not capture the variance of each method, we also plotted box plots and ground truth visualization plots on all slices from each dataset (Fig. 2 d–k and Additional file 2: Fig. S2-S3). All six statistical methods, namely BASS, BayesSpace, DR.SC, PRECAST, SpatialPCA, and BANKSY, exhibited no variance as they set fixed seed for the initialization of parameters inside their functions. The remaining methods primarily relied on graph-based deep learning techniques, leading to potential variations in their predictions owing to random seeds. However, GraphST, ConGI, SpaGCN, and SpaceFlow also fixed their seeds to be identical for each run. In contrast, some deep learning-based methods do not adhere to this practice. To investigate the impact of random seeds and the corresponding loss function or objective function values on the clustering accuracy of these methods, we selected deep learning-based methods (CCST, ADEPT, and STAGATE) and statistical methods (BayesSpace and BASS) for additional analysis. The plots of ARI versus loss value, ARI versus seed, and loss value versus seed for the three deep learning-based methods indicated that clustering performance, measured by ARI, was randomly associated with both the loss value and the selected seed for each deep learning method (Additional file 2: Fig. S4-S6), making it challenging to select a particular result. However, these findings suggested that all three tools exhibited variance in ARI across various individual DLPFC slices, consistent with previous box plots for all slices (Fig. 2 d–k and Additional file 2: Fig. S2). A similar analysis on random seed, objective function value, and ARI for the statistical methods BayesSpace and BASS yielded the same result: clustering performance, in terms of ARI, was randomly associated with both the objective function value and the selected seed (Additional file 2: Fig. S7-S8). For BASS, we did not use the objective function value since it does not have one, but only the random seed and performance.

Clustering performance comparison using NMI, AMI and HOM

We also utilized three additional metrics, Normalized Mutual Information (NMI), Adjusted Mutual Information (AMI), and Homogeneity (HOM), to further evaluate the clustering performance of all 16 methods. Similar to the ARI evaluation, we plotted two heatmaps for each of these metrics. Details for computing the NMI, AMI, and HOM values, as well as each rank score, are provided in the “ Methods ” section. The ranking order using these metrics was highly consistent with that obtained using ARI, with only a few exceptions (Fig. 3 a–f). BASS, GraphST, BANKSY, SpatialPCA, and ADEPT remained the top tools across the three metrics, followed by CCST and STAGATE, while SpaceFlow achieved the best HOM, indicating the highest cluster purity (Fig. 3 e, f).

figure 3

Clustering performance in terms of NMI, AMI, and HOM. a NMI heatmap. Each average NMI value is based on 20 runs. b Ranking heatmap. This ranking heatmap is created by normalizing all results within the same slice by dividing them by the maximum NMI value (representing the best performance) among all methods, thus standardizing all NMI values to 1. For each method, the best ranking for the sum result is 33, and the best ranking for the average result is 1. c ,  d Equivalent heatmaps as shown in ( a ,  b ) for AMI. e ,  f Equivalent heatmaps as shown in ( a ,  b ) for HOM. All heatmaps in ( a – f ) share a color bar ranging from 0 to 1.  g – i Line plots illustrating the overall robustness of all methods across eight datasets in terms of NMI ( g ), AMI ( h ), and HOM ( i )
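For reference, all four label-agreement metrics used in this benchmark (ARI, NMI, AMI, and HOM) are available in scikit-learn; a minimal sketch with placeholder labels:

```python
# Computing ARI, NMI, AMI, and HOM with scikit-learn, given ground-truth
# and predicted domain labels for each spot.
from sklearn.metrics import (
    adjusted_rand_score,
    normalized_mutual_info_score,
    adjusted_mutual_info_score,
    homogeneity_score,
)

truth = ["L1", "L1", "L2", "L2", "WM", "WM"]  # placeholder labels
pred = [0, 0, 1, 1, 1, 2]

print("ARI:", adjusted_rand_score(truth, pred))
print("NMI:", normalized_mutual_info_score(truth, pred))
print("AMI:", adjusted_mutual_info_score(truth, pred))
print("HOM:", homogeneity_score(truth, pred))
```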

To investigate the overall robustness of each method, we aggregated the average values of these three metrics across slices within each dataset (Fig. 3 g–i). The observed patterns were similar to those seen with ARI. BASS achieved the best performance in five out of eight datasets for these three metrics. STAGATE continued to perform well for the Stereo-seq Embryo dataset in terms of NMI and AMI. SpaceFlow and ConGI performed well for the MERFISH and tumor datasets, respectively.

Qualitative and quantitative benchmarking of a Slide-seq v2 dataset

So far, we have quantitatively evaluated all clustering methods by ARI and other metrics. For the MHPC data using the Slide-seq v2 protocol (Fig. 4 a), where the spots were labeled by cell types, visual comparison with the ground truth was more effective than calculating ARIs. Additionally, we employed the Allen Brain Atlas as ground truth for the anatomical regions (Fig. 4 b). The ground truth comprised four key anatomical regions, CA1, CA2, CA3, and the dentate gyrus (DG), which display curved shapes. For better visualization, we extracted clusters from each method to match these key anatomical regions. Our results demonstrated that all methods successfully recovered this feature; however, DR.SC and BASS failed to identify them as separate regions (Fig. 4 c). Moreover, ADEPT, GraphST, STAGATE, and BANKSY could further differentiate CA1 and CA3 (Fig. 4 c). Notably, no method delineated a separate CA2 region, merging it with CA3 instead. To quantitatively evaluate all methods for these regions, we conducted a manual region-based annotation of the “CA1_CA2_CA3” and DG regions based on existing cell type annotations. This manual annotation (shown in Fig. 4 d) served as the ground truth for calculating clustering performance, measured by ARI, NMI, AMI, and HOM. Our results indicated that PRECAST exhibited the highest overall performance across all four metrics, followed sequentially by GraphST, SpaceFlow, ADEPT, STAGATE, and BANKSY (Fig. 4 e).

figure 4

Clustering performance on the MHPC dataset. a Ground truth (GT) annotation for the MHPC dataset. b The Allen Brain Atlas. c Comparisons of the predicted clusters generated by different clustering methods. d Customized GT annotation only for CA1_CA2_CA3 and Dentate Gyrus for the MHPC dataset. e Box plots of ARI, NMI, AMI, and HOM for all tools based on customized GT annotation only for CA1_CA2_CA3 and Dentate Gyrus

We further investigated three other key anatomical regions: the third ventricle (V3), medial habenula (MH), and lateral habenula (LH). BASS, ADEPT, STAGATE, SpaceFlow, and BANKSY could successfully delineate these three regions. In conclusion, ADEPT, STAGATE, BANKSY, SpaceFlow, and GraphST were effective tools for delineating all seven key regions.

Spatial continuity analysis for clustering methods

Continuity is a key metric in spatial clustering, as it captures spatial coherence and well-defined interfaces between predicted spatial domains. To assess continuity by different methods, we utilized three widely recognized metrics: average silhouette width (ASW) [ 55 ], CHAOS [ 19 ], and percentage of abnormal spots (PAS) [ 19 ]. The methods are described in detail in the “ Methods ” section. Similar to the ARI evaluation, we plotted two heatmaps for each of these metrics. Details for computing the ASW, CHAOS, and PAS values, as well as each rank score, are provided in the “ Methods ” section. Unlike ASW, where a higher value indicates higher spatial continuity, lower CHAOS and PAS values indicate higher spatial continuity. Considering all three metrics together, we observed that SpaceFlow, BANKSY, and CCST achieved the best spatial continuity, followed by BASS and GraphST (Fig. 5 a–f). SpatialPCA and ADEPT had similar overall rankings, with SpatialPCA demonstrating better spatial continuity in terms of CHAOS and PAS.

figure 5

Clustering performance in terms of ASW, CHAOS, PAS, and SCS for spatial continuity. a ASW heatmap. Each average ASW value is based on 20 runs. b Ranking heatmap. This ranking heatmap is created by normalizing all results within the same slice by dividing them by the maximum ASW value (representing the best performance) among all methods, thus standardizing all ASW values to 1. A higher ASW value indicates greater spatial continuity. For each method, the best ranking for the sum result for ASW is 33, and the best ranking for the average result is 1. c ,  d Equivalent heatmaps as shown in a ,  b  for CHAOS. e ,  f Equivalent heatmaps as shown in ( a ,  b ) for PAS. The ranking heatmaps were created by normalizing all results within the same slice by dividing them by the maximum CHAOS/PAS value (representing the worst performance) among all methods, thus standardizing all CHAOS/PAS values to 1. Lower CHAOS and PAS values indicate greater spatial continuity. For each method, the worst ranking for the sum result for CHAOS and PAS is 33, and the worst ranking for the average result is 1. All heatmaps in ( a – f ) share a color bar ranging from 0 to 1. g – j Average ARI values across all methods as a function of data slice complexity quantified by ASW ( g ), CHAOS ( h ), PAS ( i ), and SCS ( j ). Two Embryo slices were excluded in ( j ) for better visualization. Pearson correlation coefficients and p -values are indicated within the plots
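As a hedged illustration of silhouette-style continuity scoring (the exact ASW, CHAOS, and PAS definitions are in the “ Methods ” section and may differ), scikit-learn's silhouette_score can score predicted domain labels against spot coordinates:

```python
# Hedged sketch: one way to compute a silhouette-style continuity score,
# scoring predicted domain labels against spatial coordinates. Exact ASW
# definitions vary between papers; see the Methods section.
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
coords = rng.random((500, 2))              # placeholder spot locations
labels = rng.integers(0, 7, size=500)      # placeholder domain labels

asw = silhouette_score(coords, labels)     # in [-1, 1]; higher means more
print(f"ASW = {asw:.3f}")                  # spatially coherent domains
```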

All clustering methods exhibited performance that varied considerably across datasets. To reveal the effect of data complexity on performance, we plotted the average ARI of all methods for each slice as a function of data complexity (Fig. 5 g–j). To quantify data complexity, we utilized ASW, CHAOS, and PAS as metrics to measure spatial continuity for each slice based on the ground truth labels. Additionally, we introduced another metric, the Spatial Coherence Score (SCS), to quantify data complexity. Details are described in the “ Methods ” section. The overall trend of the average ARI across all methods, represented by each regression line, indicated that clustering accuracy decreased as data complexity increased. All Pearson correlation values between ARI and each data complexity metric were significant ( p = 0.0036, p = 0.0002, p = 0.0027, and p = 0.0015 for ASW, CHAOS, PAS, and SCS, respectively). Since higher ASW and SCS values indicate higher spatial continuity and lower data complexity, their Pearson correlation coefficients were positive ( R = 0.49 for ASW and R = 0.55 for SCS) in Fig. 5 g and j. Conversely, higher CHAOS and PAS values indicate lower spatial continuity and higher data complexity, resulting in negative Pearson correlation coefficients (Fig. 5 h, i; R = − 0.61 for CHAOS and R = − 0.51 for PAS). However, an intriguing observation emerged: the average ARIs for well-studied datasets were mostly above the regression line, whereas for less-studied datasets, average ARIs were below the regression line. This outcome indicated that the designs of most current algorithms favor the commonly used datasets and are not generally effective for all datasets. Though this phenomenon is partly due to the scarcity of available ST datasets with high-quality ground truth, it does point to a potential issue of algorithm overfitting, which should be noted and prevented in future studies.
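The correlation analysis itself reduces to a Pearson test per complexity metric; a minimal sketch with placeholder per-slice values:

```python
# Correlating per-slice average ARI with a data-complexity metric.
# Placeholder values; in the study these are the average ARI over all
# methods and the ground-truth-based complexity of each slice.
import numpy as np
from scipy.stats import pearsonr

avg_ari = np.array([0.55, 0.48, 0.61, 0.30, 0.42, 0.25])
complexity = np.array([0.40, 0.35, 0.50, 0.10, 0.28, 0.05])  # e.g., ASW

r, p = pearsonr(avg_ari, complexity)
print(f"R = {r:.2f}, p = {p:.4f}")
```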

Runtime analysis for clustering methods

Finally, we benchmarked the runtime of each method on seven selected ST slices (Fig. 6 ). The MVC slice has the smallest number of spots (1207). The MB2SA, DLPFC 151673, HBCA1, and MHypo Bregma -0.19 slices have 2695, 3611, 3798, and 5803 spots, respectively. The two largest Embryo slices have 30,124 and 51,365 spots, respectively. We plotted the runtime by arranging the datasets in ascending order based on the number of spots and sorted the tools in ascending order based on their runtime on the first MVC dataset. Overall, for the first five data slices, four tools (SpaGCN, BANKSY, GraphST, and STAGATE) demonstrated advantages in terms of runtime, as they could analyze each slice within a minute. Six tools, including SpatialPCA, DR.SC, SEDR, conST, DeepST, and SpaceFlow, exhibited comparably slower speeds but still completed execution within 5 mins per slice. In contrast, six tools (PRECAST, CCST, BASS, ADEPT, BayesSpace, and ConGI) lacked scalability and were significantly impacted by both the number of spots and genes, with their runtime increasing drastically as the data size grew. Regarding the two largest Embryo slices, STAGATE, BANKSY, and DR.SC demonstrated good scalability, processing both slices within 2–12 mins. SpaGCN and SEDR processed the Embryo E11.5 slice within 7–15 mins but could not process the Embryo E12.5 slice due to memory constraints on our computation platform, as described in the “ Methods ” section. conST, ADEPT, BASS, BayesSpace, PRECAST, SpaceFlow, and SpatialPCA could handle one or both slices, but their processing times increased significantly, ranging from 18 mins to 3.5 h. GraphST, DeepST, and CCST could not process either slice due to memory constraints. ConGI was also not applicable to either slice due to the absence of a histology image. Overall, STAGATE achieved the best runtime and scalability across all slices, followed by BANKSY and DR.SC.

figure 6

Runtime comparison of clustering methods. Runtime analysis of all 16 clustering methods on seven ST slices. The runtime is plotted by arranging the datasets in ascending order based on the number of spots, and tools are sorted in ascending order based on their runtime on the first MVC dataset

Assessing the characteristics of joint spot embedding with pairwise two-slice joint analysis

In contrast to the conventional approach of ST focusing on spatial domain distribution in a single slice, there is a growing recognition of the value of integrative and comparative analyses of ST datasets. In our pairwise two-slice joint analysis, we started by using nine pairs of DLPFC slices to explore whether integration could improve joint spot embeddings by leveraging adjacent consecutive slices. Evaluation experiments were conducted by introducing layer-wise alignment accuracy. The fundamental idea behind this analysis is based on the hypothesis that aligned spots across consecutive slices are more likely to belong to the same spatial domain or cell type. The detailed method for defining layer-wise alignment accuracy is outlined in the “ Methods ” section.
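Based on the description above, the layer-shift tabulation might be sketched as follows (the paper's exact definition is in the “ Methods ” section; the layer indices here are hypothetical):

```python
# Hedged sketch of layer-wise alignment accuracy as described above:
# for each anchor spot on slice 1 and its aligned spot on slice 2,
# tabulate how many cortical layers apart their ground-truth labels are.
import numpy as np

def layer_shift_accuracy(anchor_layers, aligned_layers, max_shift=6):
    """anchor_layers / aligned_layers: integer layer index per pair."""
    shifts = np.abs(np.asarray(anchor_layers) - np.asarray(aligned_layers))
    return {k: float(np.mean(shifts == k)) for k in range(max_shift + 1)}

# Example: 4 of 5 aligned pairs fall in the same layer (shift = 0).
acc = layer_shift_accuracy([1, 2, 3, 4, 5], [1, 2, 3, 4, 6])
print(acc[0], acc[1])  # 0.8 0.2
```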

In Fig. 7 a, we compared the layer-wise alignment accuracy of all nine methods on nine DLPFC slice pairs. Given the unique layered structure of DLPFC data, we designed this evaluation metric to assess whether “anchor” spots from the first slice and “aligned” spots from the second slice belong to the same layer (layer shift = 0) or different layers (layer shift = 1 to 6). The expectation was that a good integration or alignment tool would show high accuracy for anchor and aligned spots belonging to the same layer (layer shift = 0), and this accuracy should decrease as the layer shift increases. We plotted the layer-wise alignment accuracy and sorted the tools in descending order based on the accuracy for a layer shift of 0. In seven out of nine DLPFC slice pairs, SPACEL demonstrated the highest layer-wise alignment accuracy, while PASTE and STalign led in the remaining two pairs (Fig. 7 a). A similar experiment was conducted on four pairs drawn from the MHypo dataset (Fig. 7 b), but layer-wise alignment accuracy was only plotted for a layer shift of 0 due to the nature of the data. SPACEL still exhibited the best performance, followed by PASTE and STalign in the second position. It was not surprising that the two alignment tools, SPACEL and PASTE, exhibited the highest accuracy in layer-wise alignment across most pairs, as their primary objective is the direct alignment of spots across slices, rather than relying on joint spot embeddings for integration analysis. Conversely, tools like STAligner, PRECAST, DeepST, and SPIRAL, which leverage joint spot embeddings for indirect alignment across slices, demonstrated slightly lower but still satisfactory layer-wise alignment accuracy. Among these tools, STAligner achieved the highest accuracy, followed by DeepST, while PRECAST and SPIRAL performed the least accurately. These results highlighted, to some extent, the inherent qualities of the joint spot embeddings produced by these integration tools. PASTE2, an extension of PASTE, exhibited poor performance in this scenario because it primarily addresses the partial overlap alignment problem, where only partial overlap occurs between two slices or slice-specific cell types. Notably, the other two alignment tools, STalign and GPSA, were less robust and accurate in alignment than SPACEL and PASTE.

figure 7

Bar plots for layer-wise alignment accuracy. a Bar plots depicting the layer-wise alignment accuracy for a layer shift from 0 to 6 for different methods on nine DLPFC slice pairs. b Bar plots depicting the layer-wise alignment accuracy for a layer shift of 0 for different methods on four MHypo slice pairs. GPSA could not be applied to the MHypo dataset. Tools are sorted in descending order based on the accuracy for layer shift of 0 in ( a ,  b )

While layer-wise alignment accuracy provides insight into spot-to-layer alignment, it is crucial to evaluate the spot-to-spot matching ratio to further assess joint spot embeddings. In Fig. 8 a, b, we marked “anchor” and “aligned” spots on both slices using three different colors, further classifying them into aligned (orange), misaligned (blue), and unaligned (green) spots based on ground truth layer labels, as described in the “ Methods ” section. Notably, for the DLPFC 151507-151508 pair, STAligner, GPSA, SPIRAL, DeepST, PASTE2, and PRECAST showed a notable proportion of unaligned spots on the second slice. This suggested a bias in these six tools, aligning multiple “anchor” spots from the first slice to the same “aligned” spot on the second slice, thereby leaving a significant number of spots unaligned on the second slice. The spot-to-spot mapping ratio further corroborated this observation, with PASTE demonstrating the lowest ratio (1.00), followed by STalign (1.01), SPACEL (1.24), PASTE2 (1.42), PRECAST (1.85), DeepST (2.13), SPIRAL (2.41), GPSA (2.59), and STAligner (2.78). Averaging this ratio across all nine pairs for each tool revealed a similar pattern (Fig. 8 c), except that GPSA achieved a better overall ratio, while PASTE2 had a worse overall ratio. Moreover, across all nine pairs, it was observed that misaligned spots (Fig. 8 a and Additional file 2: Fig. S9-S12) on the first slice tended to aggregate along the layer boundaries in PASTE, STalign, and SPACEL. In contrast, the remaining tools exhibited a dispersion of these misaligned spots within the layers. The high spot-to-spot mapping ratio and the dispersed pattern of misaligned spots in all integration tools suggested a shared trade-off, wherein the learned low-dimensional embeddings sacrifice certain local geometric information in the process of optimization and training. SPACEL, the alignment tool, exhibited coherent regions of unaligned spots (illustrated in green) outside the matched regions.

figure 8

Visualization plots for alignment-misalignment-unalignment and spot-to-spot mapping ratio. a ,  b Visualization plots showing aligned spots, misaligned spots, and unaligned spots when aligning the anchor spot from the first (top) slice to the aligned spots on the second (bottom) slice on DLPFC 151507-151508 ( a ) and MHypo Bregma -0.04 - -0.09 pair ( b ). Values below each plot represent the spot-to-spot matching ratio. c ,  d Bar plots representing the average spot-to-spot mapping ratio of each tool on two datasets: DLPFC ( c ) and MHypo ( d ). GPSA could not be applied to the MHypo dataset
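Given how the spot-to-spot mapping ratio is characterized above, it can be read as the number of anchor spots per distinct aligned spot; a small sketch under that assumption:

```python
# Hedged sketch of the spot-to-spot mapping ratio as characterized above:
# the number of anchor spots divided by the number of distinct spots they
# map to on the second slice. A ratio near 1 means a near one-to-one
# mapping; a high ratio means many anchors share the same aligned spot.
import numpy as np

def mapping_ratio(aligned_spot_ids):
    aligned_spot_ids = np.asarray(aligned_spot_ids)
    return len(aligned_spot_ids) / len(np.unique(aligned_spot_ids))

print(mapping_ratio([10, 11, 12, 13]))  # 1.0 -- one-to-one
print(mapping_ratio([10, 10, 10, 13]))  # 2.0 -- collapsed mapping
```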

We further performed this evaluation on four pairs of MHypo slices and observed a similar trend in the spot-to-spot mapping ratio and a similarly dispersed pattern of misaligned spots for all tools (Fig. 8b and Additional file 2: Fig. S13-S14). Specifically, SPIRAL had the worst average spot-to-spot mapping ratio, followed by STAligner, DeepST, and PRECAST (Fig. 8d). PASTE2 and PASTE achieved ratios of approximately 1. STalign and SPACEL demonstrated less favorable average ratios on the MHypo data (1.55 for STalign; 1.58 for SPACEL) than on the DLPFC data (1.09 for STalign; 1.30 for SPACEL).

Alignment accuracy on simulated datasets

While real datasets enabled us to assess alignment accuracy to some extent, they lack precise spot-to-spot alignment ground truth. To comprehensively investigate alignment accuracy, we simulated datasets with gold-standard spot-to-spot correspondences under different scenarios to assess the robustness of all alignment and integration methods.

We first used one DLPFC slice as the reference and simulated another slice with different overlapping ratios (20%, 40%, 60%, 80%, and 100%) relative to the reference slice (Fig. 9a). In this simulation scenario, the pseudocount (gene expression) perturbation was fixed at 1.0 for all simulated slices. The detailed simulation method is outlined in the “Methods” section. In Fig. 9b, c, the layer-wise alignment accuracy for a layer shift of 0 and the spot-to-spot alignment accuracy are shown as bar plots. We observed that all five alignment methods achieved superior layer-wise alignment accuracy for a layer shift of 0 in comparison to the four integration methods. Furthermore, for each tool, accuracy tended to decline as the overlapping ratio between two slices diminished. Nevertheless, in terms of spot-to-spot alignment accuracy, all four integration methods (STAligner, PRECAST, DeepST, and SPIRAL) failed to achieve even marginal accuracy, consistent with the earlier observation that these tools exhibit relatively high spot-to-spot mapping ratios. In contrast, three alignment tools (SPACEL, PASTE2, and PASTE) achieved relatively better spot-to-spot alignment accuracy. Among them, PASTE2 achieved near-perfect accuracy at the 100% overlapping ratio and consistently maintained approximately 60% accuracy at lower overlapping ratios. SPACEL exhibited slightly better accuracy than PASTE2 when the overlapping ratio was lower than 100%, but its accuracy decreased to approximately 40% at the 100% overlapping ratio. PASTE failed to achieve satisfactory accuracy when the overlapping ratio was lower than 100%. For the other two alignment tools, STalign and GPSA, the spot-to-spot alignment accuracy was unexpectedly low, comparable to that of the four integration tools.

Fig. 9 Alignment accuracy on simulated data. a The DLPFC 151673 slice, consisting of seven layers, along with its simulated consecutive slices featuring overlapping ratios of 20%, 40%, 60%, 80%, and 100% with respect to the DLPFC 151673 slice. b Layer-wise alignment accuracy for a layer shift of 0 across different tools, as a function of increasing overlapping ratio. Tools are sorted in descending order based on the layer-wise alignment accuracy for a layer shift of 0 on the leftmost dataset (100% overlapping ratio). c Spot-to-spot alignment accuracy across different tools as a function of increasing overlapping ratio, sorted in descending order based on the accuracy on the leftmost dataset (100% overlapping ratio). d Layer-wise alignment accuracy for a layer shift of 0 across different tools, as a function of increasing pseudocount perturbation, sorted in descending order based on the accuracy on the leftmost dataset (pseudocount perturbation = 0.0). e Spot-to-spot alignment accuracy across different tools as a function of increasing pseudocount perturbation, sorted in descending order based on the accuracy on the leftmost dataset (pseudocount perturbation = 0.0)

In the second simulation scenario, we simulated slices with different pseudocounts (0–3.0 with a step size of 0.5) to represent perturbation of gene expression while keeping the overlapping ratio fixed at 100%. In Fig. 9d, the bar plots show that the layer-wise alignment accuracy for a layer shift of 0 of the four integration tools (DeepST, SPIRAL, PRECAST, and STAligner) decreased as the pseudocount perturbation increased. This result suggested that all integration methods were sensitive to perturbation of the expression profiles, as they utilize gene expression profiles as spot (node) features when constructing a graph model for training. Conversely, the five alignment tools (PASTE, SPACEL, STalign, GPSA, and PASTE2) exhibited significantly greater resilience to perturbations in gene expression. This resilience stemmed from their objective functions for alignment, which allow a more pronounced emphasis on spatial coordinates when gene expression varies across slices. Regarding spot-to-spot alignment accuracy in Fig. 9e, three alignment tools (PASTE, PASTE2, and SPACEL) consistently maintained similar accuracy across the various pseudocount perturbations. PASTE2 demonstrated the highest accuracy when the pseudocount perturbation ranged from 0.5 to 3.0. The other two alignment tools (STalign and GPSA) still demonstrated low spot-to-spot alignment accuracy across all scenarios. Notably, when the pseudocount perturbation was set to 0, indicating identical gene expression levels for each spot across slices, all four integration tools achieved better accuracy.

Integration methods improve integration of consecutive slices with batch correction

Once joint spot embeddings for each integration method were generated, we further visually evaluated the “batch-corrected” joint embeddings for the integration of consecutive slices using two components from uniform manifold approximation and projection (UMAP). The alignment tools PASTE, PASTE2, and SPACEL were excluded from this analysis as they do not generate latent embeddings.

For the DLPFC 151507 and 151508 pair (Fig. 10a), the UMAP plots for PRECAST, STAligner, DeepST, and SPIRAL showed that spots from the two slices were evenly mixed to some extent (Fig. 10a, right panel), and their predicted domain clusters were well segregated (Fig. 10a, middle panel). Specifically, PRECAST tended to generate embeddings in a pattern of separated clusters, with some predicted clusters encompassing spots from different domains, a pattern that did not entirely align with the ground truth (Fig. 10a, left panel). STAligner, DeepST, and SPIRAL maintained the hierarchical connections of the seven layers in the latent embedding space to some degree. However, there were instances where predicted spatial domains included spots from nearby domains, or one spatial domain was predicted as two adjacent domains. STAligner achieved better UMAP visualization than DeepST and SPIRAL. Among all tools, PRECAST lost the most geometry information, since it prominently separated spatial domains in the latent space. We repeated this UMAP analysis for all remaining DLPFC pairs and plotted annotations by ground truth, method prediction, and slice index (Additional file 2: Fig. S15-S16). All remaining UMAP results exhibited consistent patterns and further affirmed that all four methods were capable of generating “batch-corrected” joint embeddings for the integration of consecutive slices. However, the integrated spatial domains were not highly concordant with the ground truth.

Fig. 10 UMAP plots of low-dimensional joint embedding distributions for batch correction. a–d UMAP plots depicting the 2D distribution of latent joint embeddings after integration with batch correction by different methods on the DLPFC 151507-151508 pair (a), the MHypo Bregma -0.04 - -0.09 pair (b), the DLPFC 151507-151510 four consecutive slices (c), and the MHypo Bregma -0.04 - -0.24 five consecutive slices (d). Each UMAP contains colored spots labeled by three different setups: ground truth (GT), method prediction, and slice index

We extended this analysis to four pairs of the MHypo data (Fig. 10b and Additional file 2: Fig. S17). The joint embeddings generated by PRECAST, STAligner, and DeepST somewhat facilitated integration across consecutive slices, although the effect was much inferior to the results on the DLPFC data. These three tools exhibited several connected small clusters or a single large cluster that was hard to differentiate based on the ground truth annotation. The remaining tool, SPIRAL, suffered from a significant batch effect: its joint embeddings were unevenly mixed across slices and showed substantial separation. This result was in agreement with the least favorable spot-to-spot mapping ratio (4.01) achieved by SPIRAL.

In addition to benchmarking the integration of slice pairs, we further evaluated the performance of each method on multi-slice (\(>2\)) integration. All UMAP plots for PRECAST, STAligner, DeepST, and SPIRAL indicated a relatively even mixture of spots from the four consecutive slices of each of three samples (DLPFC 151507-151510, 151669-151672, 151673-151676) (Fig. 10c and Additional file 2: Fig. S18). Consistent with observations in the paired setting, the embeddings generated by PRECAST continued to exhibit a pattern of separated clusters. On the other hand, STAligner, DeepST, and SPIRAL still maintained hierarchical connections across the seven layers in the latent embedding space. STAligner demonstrated slightly better UMAP visualization than DeepST and SPIRAL. As for the integration of the five slices of the MHypo dataset (Fig. 10d), all tools still displayed several small connected clusters or a single large cluster that was challenging to differentiate based on the ground truth annotation. However, SPIRAL mixed the spots across the five slices evenly and did not display any batch effect, indicating that, given adequate data, SPIRAL can remove the batch effect from its latent embeddings. In summary, there is still a need for an optimal and robust integration tool. While existing tools have shown efficacy to some extent on well-studied datasets, their performance has not consistently generalized to diverse datasets.

Integration methods enhance domain identification through joint embedding

Integrating data from multiple ST slices allows us to estimate joint embeddings of expression representing variations between cell or domain types across slices, which has the potential to improve the detection of spatial domains or cell types compared to single-slice analysis [ 33 ]. To quantitatively compare the effectiveness of these methods in capturing spatial domains via joint embeddings, we used the joint embeddings from each pair of slices in the MHypo and DLPFC datasets to perform joint clustering with mclust [ 56 ]. We then computed ARI as an evaluation metric to compare the clustering results of each tool with the ground truth in each slice, with higher ARI scores indicating better domain identification.
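As an illustration of this evaluation flow: mclust is an R package, so the sketch below substitutes a Gaussian mixture model purely as a stand-in clustering step on hypothetical joint embeddings, then scores each slice separately with ARI.

```python
# Hedged sketch of joint clustering on embeddings from a slice pair, followed
# by per-slice ARI. GaussianMixture stands in for mclust; all arrays are
# hypothetical placeholders, not real benchmark data.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
emb = rng.random((400, 30))            # joint embeddings for both slices
slice_id = np.repeat([0, 1], 200)      # which slice each spot came from
truth = rng.integers(0, 7, size=400)   # ground-truth domain labels

pred = GaussianMixture(n_components=7, random_state=0).fit_predict(emb)

for s in (0, 1):                       # score each slice separately
    mask = slice_id == s
    print(s, adjusted_rand_score(truth[mask], pred[mask]))
```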

In Fig. 11a, b, we plotted the average ARI results under two scenarios. BASS, PRECAST, and DeepST support both single-slice and multi-slice joint (integration) analyses. Accordingly, we used blue bars to depict the results before integration (single-slice mode) and orange bars to represent the results after integration. However, since STAligner and SPIRAL only have a multi-slice joint analysis mode, the blue bars for these methods were left unpopulated. It was difficult to conclude which tool had the overall best performance across all pairs after integration. In the nine pairs of DLPFC data (Fig. 11a), DeepST and STAligner exhibited the most variance across runs. SPIRAL demonstrated the best performance on the DLPFC 151509-151510 and 151669-151670 pairs. STAligner led the performance on the DLPFC 151673-151674, 151674-151675, and 151675-151676 pairs, albeit marginally. Notably, the DLPFC 151670-151671 pair, characterized by a large spatial distance along the z-axis (300 µm apart) within the tissue between the two slices, presented challenges for all methods. These tools either exhibited a significant performance discrepancy between the two slices or failed to perform well on both slices. A similar observation was made for the distant 151508-151509 pair. In the DLPFC 151671-151672 pair, SPIRAL and STAligner demonstrated better performance. Most methods performed similarly on the DLPFC 151507-151508 pair. Results were more clear-cut on the four pairs of the MHypo dataset (Fig. 11b). BASS demonstrated superior performance in all four pairs, followed by STAligner, whereas the remaining three methods failed to produce reasonable results. To compute an overall ranking based on ARI for each tool across all slice pairs from the DLPFC and MHypo datasets, we generated ARI value and rank heatmaps after integration. Our results demonstrated that BASS achieved the best average and sum rank after integration, followed by STAligner and SPIRAL (Additional file 2: Fig. S19).

Fig. 11 ARI plots before and after integration for domain identification in the DLPFC and MHypo datasets. a, b ARI bar plots for nine DLPFC pairs (a) and four MHypo pairs (b) using different methods. Blue bars represent the average ARI values over 20 runs before integration (single-slice mode), and orange bars represent the average ARI values over 20 runs after integration. Error bars represent standard deviations calculated from the 20 runs. Note that the blue bars for STAligner and SPIRAL remain unpopulated since they do not support single-slice clustering. c ARI plots for anchor slices as a function of increasing slice distance for different methods. Dashed lines indicate the ARI of anchor slices before integration (single-slice mode). d Paired ARI plots comparing values before and after integration for three methods. Solid lines indicate that the ARI after integration is higher than before integration; dashed lines indicate that it is lower. Statistical significance between the before- and after-integration values is assessed using a paired t-test and indicated as follows: \(^{ns}p \ge 0.05\) and \(^{***}p<0.001\). The average ARI for the before- and after-integration conditions is marked with a bar and the respective value

We investigated every adjacent consecutive slice pair before and after integration analysis. Distant DLPFC slice pairs such as 151670-151671 and 151508-151509 posed challenges for all methods to improve clustering accuracy after integration. To explore how the physical distance between slices affects integration, we analyzed the ARI of all tools at four distances from the Bregma in the MHypo dataset. Specifically, we examined distances of 0.05 mm, 0.1 mm, 0.15 mm, and 0.2 mm, using the slices at Bregma -0.04 and -0.24 as fixed anchor points. This analysis included comparisons across seven distinct pairings from Bregma -0.04 to -0.24, helping to discern the impact of slice distance on integration effectiveness. We plotted the ARI of the two anchor slices against the increasing distance between slices and observed two different outcomes (Fig. 11c): (1) for BASS and DeepST, integration led to an improvement in the ARI of both anchor slices (surpassing the dashed line that represents the ARI of a single anchor slice before integration) when the distance between the slices was small. However, the ARI of the anchor slices declined as the distance between the slices increased, indicating that integration can reduce the clustering accuracy of the anchor slice if the slice distance is sufficiently large (dropping below the corresponding dashed line). (2) For PRECAST, STAligner, and SPIRAL, integrating with slices that were either close or distant did not impact the clustering accuracy of the anchor slices. In conclusion, integration can enhance clustering for individual slices, but the effectiveness of this improvement depends on the distance between slices for each specific dataset.

Although no clear overall winner emerged after integration, integration analysis produced some improvement in clustering accuracy compared to single-slice analysis for certain tools. Specifically, both PRECAST and DeepST exhibited enhanced clustering accuracy after integration (Fig. 11d). Across a total of 26 before-and-after pair conditions for the two datasets, PRECAST’s average ARI increased from 0.363 before integration to 0.411 after, though this change was not statistically significant (p = 0.3). In contrast, DeepST exhibited a notable increase in clustering accuracy, with the average ARI improving from 0.285 before integration to 0.395 after, which was statistically significant (p = 0.0006). BASS did not show any significant improvement in clustering accuracy through integration, with its average ARI changing only slightly from 0.517 before to 0.532 after integration (p = 0.5).

Integration methods align samples across different anatomical regions and developmental stages

So far, our benchmarking has focused on evaluating the integration capabilities of methods across adjacent consecutive sample slices. In this section, we delved deeper into their efficacy for integrating non-consecutive slices. We employed a 10x Visium dataset representing mouse brain sagittal sections, divided into anterior and posterior sections. We used the Allen Brain Atlas as a reference (Fig. 12a) and visually compared the clustering results of all methods (Fig. 12b–f). Among all methods, PRECAST demonstrated the least effective performance and failed to detect and connect common spatial domains. In contrast, BASS, STAligner, DeepST, and SPIRAL were better able to identify and connect common spatial domains along the shared boundary. Specifically, only STAligner identified and aligned six distinct layers in the cerebral cortex (CTX) across the anterior and posterior sections, whereas BASS and SPIRAL only managed to identify four distinct layers in the CTX. Additionally, STAligner and SPIRAL performed well in distinguishing layers within the cerebellar cortex (CBX). However, none of the methods identified a coherent arc across the two sections for CA1, CA2, and CA3. In summary, STAligner showed the strongest capacity for integrating adjacent slices across different anatomical regions.

Fig. 12 Visualization plots for integration with batch correction on the MB2SA&P and mouse Embryo datasets. a The Allen Brain Atlas serving as the ground truth. b–f Domain identification by five methods on the MB2SA&P dataset. g Domain identification by the ground truth on the mouse Embryo dataset. h Domain identification by STAligner on the mouse Embryo dataset

Next, we investigated the ability of all methods to integrate two slices from different developmental stages, to study the spatiotemporal development of tissue structures during mouse organogenesis. Only STAligner scaled to this large benchmarking dataset (over 50k spots per slice), so the other tools were excluded from this analysis. In Fig. 12g, the two mouse embryo slices were acquired at two different time points (E11.5 and E12.5) with region-based manual annotations for different organs and tissues. We observed that STAligner successfully retrieved several shared structures, such as the dorsal root ganglion, brain, heart, and liver, in both slices (Fig. 12h). We also observed that at developmental stage E11.5, structures like the ovary and kidney were less developed compared to E12.5. These results facilitated the reconstruction of the developmental progression of each tissue structure throughout organogenesis.

Reconstruction of 3D architecture from consecutive 2D slices

Because 2D slices are originally sectioned from 3D tissue, alignment and integration tools designed for pairwise or all-to-all alignments across multiple adjacent consecutive slices can be used to reconstruct the 3D architecture. A 3D architecture allows users to explore the dynamics of transcript distributions from any direction, so reconstructing an effective 3D architecture of complex tissues or organs is essential. In Fig. 13, we provide 3D reconstruction visualization results for three different samples using four methods: SPACEL, PASTE, SPIRAL, and STAligner. The methods are described in detail in the “Methods” section. All four tools achieved consistent and satisfactory 3D visualization results on DLPFC sample 3, encompassing four adjacent consecutive slices numbered 151673-151674-151675-151676 (Fig. 13b). For the MHypo sample, which contains five consecutive slices, SPACEL and PASTE demonstrated comparable and effective 3D visualizations (Fig. 13c). In contrast, SPIRAL exhibited scattered misaligned spots beginning from the second slice, and these misalignments became more frequent as more slices were stacked. Starting from the third slice, STAligner exhibited rotational distortions in the slices, leading to a discordant 3D architecture. The underlying reason could be that SPIRAL performed all-to-all alignments, whereas SPACEL and PASTE performed pairwise alignments between each pair of adjacent consecutive slices sequentially. All-to-all alignments have the potential to introduce more false alignments, particularly when two slices are not closely positioned along the z-axis. GPSA can reconstruct the 3D architecture using the DLPFC slices; however, the original shape of the DLPFC slice is distorted after alignment (Additional file 2: Fig. S20).

Fig. 13 Reconstruction of the 3D architecture of three different datasets. a 3D architecture reconstructed from 33 slices of the MB data using SPACEL (with and without manual annotation labels) and PASTE. b 3D architecture reconstructed from four slices (DLPFC 151673-151676) of DLPFC sample 3 using SPACEL, PASTE, SPIRAL, and STAligner. c 3D architecture reconstructed from five slices of the MHypo data using SPACEL, PASTE, SPIRAL, and STAligner

In terms of the MB sample, which contains 33 adjacent consecutive mouse brain tissue slices (Fig. 13a), only SPACEL and PASTE proved suitable for reconstructing the 3D architecture from this substantial number of slices. We selected a similar orientation of the 3D architecture for comparison purposes. The final module of SPACEL, Scube, successfully generated an effective 3D visualization by incorporating manual annotation labels. However, both SPACEL (without manual annotation labels) and PASTE produced a discordant 3D architecture, particularly noticeable from the second half of the slices onward. Combining pairwise alignments from multiple adjacent slices into a stacked 3D alignment of the tissue led to the propagation of errors, resulting in two disjointed 3D architectures.

Runtime analysis for alignment and integration methods

Finally, we benchmarked the average runtime of each alignment and integration method on five selected datasets (Fig. 14). The DLPFC and MB2SA&P datasets were medium-sized, with approximately 3-4k spots and 30k genes. Although each slice of the MHypo dataset has approximately 5k spots, each spot contains only 155 genes. The Embryo dataset is the largest in terms of the number of spots and genes. Lastly, the MB dataset has 33 slices in total for alignment and 3D reconstruction. We plotted the runtime and sorted the tools in ascending order based on the runtime on the first DLPFC dataset. Figure 14a illustrates the average runtime when aligning or integrating two slices. Empty columns indicate scenarios where either the algorithm is not optimized for such use cases or the memory consumption is excessively high, preventing the tool from completing execution. Overall, methods such as STAligner, BASS, PRECAST, PASTE, and PASTE2 finished integration within 10 min and exhibited reasonable scalability; their time consumption was only marginally affected by increases in the number of spots and genes. In contrast, scalability issues were more pronounced for methods like GPSA, SPACEL, SPIRAL, DeepST, and STalign, where integration tasks could take hours or even days to complete. STAligner was the only tool able to complete the analysis of the Embryo dataset without encountering memory constraints.

Fig. 14 Comparison of runtime bar plots for different alignment and integration methods across five datasets. a Runtime for aligning or integrating two slices across four datasets. b Runtime for aligning or integrating multiple (> 2) slices across three datasets. Empty columns for specific tools indicate scenarios where either the tool is not optimized for such cases or the memory consumption is excessively high, preventing the tool from completing execution

In Fig. 14b, we further compared the runtime of each tool when aligning or integrating multiple (> 2) slices. STAligner, PRECAST, and PASTE continued to exhibit promising scalability under these conditions. GPSA, SPACEL, SPIRAL, and DeepST were significantly slower, typically 100x to 1000x slower than the aforementioned methods when integrating more than two slices. PASTE and SPACEL took 32 min and 5 h, respectively, to complete 3D alignment and reconstruction for the MB dataset.

Discussion

In this study, we conducted comprehensive benchmark analyses covering clustering, alignment, and integration tasks. We assessed 16 clustering methods, five alignment methods, and five integration methods across 68 slices from 10 publicly available ST datasets. We provide a recommendation table (Table 2) to help users choose an optimal tool for the corresponding analysis. For the majority of our recommendations, we based our conclusions on overall rankings derived from multiple metrics and various datasets. Our study revealed that BASS, GraphST, BANKSY, ADEPT, SpatialPCA, STAGATE, and CCST outperformed the other ten clustering methods in terms of overall clustering accuracy, robustness, and continuity, as evaluated by seven metrics: ARI, NMI, AMI, HOM, ASW, CHAOS, and PAS. Despite these findings, identifying a definitive best-performing tool was challenging. For example, while BASS achieved the best overall accuracy, it did not excel in clustering continuity. Additionally, certain other tools exhibited their peak performance within specific ST protocols or tissue types. Notably, the overall performance of all methods declined as data complexity increased. All methods potentially suffer from algorithm overfitting, as indicated by their performance exceeding expectations on well-studied datasets but underperforming on less-studied ones. In terms of runtime and scalability, STAGATE, BANKSY, DR.SC, SpatialPCA, SpaceFlow, and PRECAST demonstrated the best scalability across datasets of varying sizes.

Alignment vs. integration methods

While alignment and integration methods are capable of conducting multi-slice analysis, alignment methods such as PASTE, PASTE2, SPACEL, STalign, and GPSA typically produce spot-to-spot alignment matrices or transformed spot coordinates based on alignment. In contrast, integration methods using deep learning backbones often generate joint spot embeddings for subsequent integration analyses. Therefore, it was not surprising to see that SPACEL and PASTE exhibited higher accuracy in layer-wise alignment compared to all integration tools as the primary objective of alignment methods was the direct alignment of spots across slices, rather than relying on joint spot embeddings for integration analysis. Relying on the joint spot embeddings to align spots across slices, STAligner achieved the highest layer-wise alignment accuracy among all integration methods, followed by DeepST, while PRECAST performed the least accurately. These results highlighted, to some extent, the inherent qualities of their learned joint spot embeddings. Our additional visualization plots for alignment-misalignment-unalignment analysis and spot-to-spot mapping ratios revealed that integration tools such as STAligner, SPIRAL, DeepST, and PRECAST produced joint spot embeddings capable of capturing global features for coarse layer-wise alignment and integration. Nevertheless, they might not suffice for capturing the local geometry necessary for spot-to-spot alignment. Our simulation experiments provided further validation for this observation. Notably, among all tools, PASTE2 and SPACEL achieved better spot-to-spot alignment accuracy when slices partially overlapped. The performance of all integration methods was highly sensitive to perturbation on the expression profiles. Notably, PASTE2 exhibited the greatest robustness to these perturbations, followed by PASTE, SPACEL, and GPSA.

Most integration methods were initially designed to learn joint spot embeddings across multiple slices. UMAP plots, projecting embeddings into two components, can to some extent reflect integration performance. Among these methods, STAligner stood out with better UMAP visualization, demonstrating integration with batch correction. However, its performance degraded on the MHypo dataset compared to the DLPFC dataset. SPIRAL, on the other hand, suffered from a significant batch effect due to uneven mixing of joint embeddings across slices, leading to notable separation across slices for the MHypo dataset, consistent with its least favorable, exceptionally high spot-to-spot mapping ratio. PRECAST tended to lose substantial geometry information, resulting in more noticeable segregation of spatial domains in the latent space compared to the other tools. Although joint spot embeddings learned by multi-slice analysis have the potential to provide a way to better detect spatial domains or cell types compared to single-slice analysis, and certain tools demonstrated this potential improvement, it was difficult to conclude which tool had the overall best clustering performance across all pairs after integration. In summary, there is still a need for more robust integration tools. Integration methods could also align samples across different anatomical regions or developmental stages. We found that STAligner outperformed the other tools and had the scalability to process large datasets (over 50k spots).

As for the reconstruction of 3D architecture from multiple adjacent consecutive 2D slices, the alignment tools PASTE and SPACEL outperformed STAligner, SPIRAL, and GPSA. Specifically, when aligning a large number of adjacent consecutive slices, SPACEL with manual annotation labels outperformed both SPACEL without manual annotation labels and PASTE, because an erroneous alignment can trigger a cascade of errors in subsequent slices in SPACEL and PASTE. It is also worth noting that the 3D reconstruction by SPACEL is not deterministic and exhibits variance. Finally, in terms of runtime for alignment and integration, STAligner, PRECAST, and PASTE demonstrated good scalability for large datasets.

Comparison with existing benchmarks

To date, two other benchmarking studies [ 57 , 58 ] have been conducted for ST clustering methods. However, while the methods in these studies focused primarily on identifying spatial domains within a single slice, there is growing recognition of the importance of integrative and comparative analyses across multiple ST slices. Integration analysis with adjacent slices also has the potential to enhance the detection of spatial domains compared to single-slice analysis. Therefore, in terms of evaluation scope, our work provides a more comprehensive benchmarking study encompassing various types of methods, including clustering, alignment, and integration algorithms, evaluated on both real and simulated datasets. Our study includes the most extensive collection of clustering tools to date and also offers a pairwise evaluation of clustering performance both before and after integration, with a focus on tools such as BASS, PRECAST, and DeepST. For alignment and integration analyses, we designed several specific qualitative and quantitative metrics, including layer-wise and spot-to-spot alignment accuracy, visualization of alignment-misalignment-unalignment, and the spot-to-spot mapping ratio. These metrics are designed to enhance our understanding of the joint embeddings generated by integration methods and to highlight the significant performance differences between alignment and integration methods.

While it is challenging to identify a single best tool, we have summarized the results and offered comprehensive recommendations based on a broad range of metrics and scenarios, enabling users to select the most suitable tools for their needs. Notably, several important recommendations for clustering tools are shared between our work and previous benchmarks. For instance, BASS demonstrated the best clustering accuracy and generalizability across different datasets. While SpaceFlow and CCST did not achieve the highest overall clustering accuracy, they excelled in continuity. Certain tools, like GraphST, exhibited technology-biased performance: it performed well on 10x Visium datasets, but its performance declined on STARmap and MERFISH datasets, for which GraphST was not specialized. STAGATE had the best runtime and scalability for large datasets. In addition, we offer recommendations for tools such as ConGI, BANKSY, SpatialPCA, and ADEPT, which have not been benchmarked in other work. ConGI is the most effective tool for tumor datasets, although its performance declines on non-tumor datasets. BANKSY, ADEPT, and SpatialPCA are top tools across most recommendation scenarios.

As spatial transcriptomics data become more widely used in studying complex tissues, numerous methods for clustering, alignment, and integration are developed each year. In this benchmark study, we highlight several essential aspects to guide further method development. (1) Robust clustering methods: build robust clustering methods that excel in both clustering accuracy and continuity and can handle large-scale spatial omics datasets efficiently, thereby reducing analysis time and resources. (2) Avoid overfitting: minimize excessive parameter tuning on well-studied datasets to ensure that models generalize effectively across diverse datasets. (3) Joint embedding learning: develop methods that learn and utilize joint embeddings for integration and spatial domain identification while capturing the data geometry needed for accurate alignment. (4) 3D visualization: create tools for the 3D visualization of spatial omics data to better represent complex tissue architectures. (5) Incorporation of advanced spatial data types: many current methods focus primarily on transcriptomics data, often overlooking other spatial data types such as spatial proteomics and metabolomics, which could offer complementary insights. To address these limitations, future research should aim to incorporate spatial multi-omics data and design sophisticated computational methods, such as multi-modal deep learning networks or multi-modal statistical approaches, for heterogeneous data integration and joint learning.

Methods

Clustering methods overview

BANKSY [ 21 ] utilizes a spatial feature augmentation strategy to cluster spatial omics data. It enhances each cell’s features with the average features of its neighboring cells and gradients of features across neighborhoods. By integrating neighborhood details into clustering, BANKSY can detect spatial domains that share similar microenvironments.

ADEPT [ 28 ] relies on a graph autoencoder backbone and performs an iterative clustering on imputed, differentially expressed genes-based matrices to minimize the variance of clustering results. The learned representations are suitable for subsequent clustering analyses.

GraphST [ 4 ] enhances ST analysis in terms of spatial clustering, multisample integration, and cell-type deconvolution by combining graph neural networks with self-supervised contrastive learning. The learned spot representations are suitable for clustering analyses.

SpaceFlow [ 27 ] employs spatially regularized deep graph networks to combine gene expression similarities with spatial information. This process generates spatially-consistent low-dimensional embeddings that are suitable for subsequent clustering analyses.

conST [ 25 ] is a versatile SRT data analysis framework employing contrastive learning techniques. conST integrates multi-modal ST data (gene expression, spatial information, and morphology, if applicable) to learn low-dimensional embeddings. These embeddings are suitable for various downstream analyses.

ConGI [ 26 ] detects spatial domains by integrating gene expression and histopathological images, adapting gene expression to image information via contrastive learning. The learned representations are valuable for various downstream analyses.

SpatialPCA [ 19 ], a spatially aware dimension reduction method for ST data, extracts a low-dimensional representation of gene expression. It enhances the probabilistic version of PCA with localization information, employing a kernel matrix to model spatial correlation across tissue locations. The resulting components are termed spatial principal components (PCs).

DR.SC [ 20 ] employs a two-layer hierarchical model that simultaneously performs dimension reduction via a probabilistic PCA model and enhances spatial clustering using an HMRF based on empirical Bayes. DR.SC is characterized by automatic determination of the optimal number of clusters.

STAGATE [ 3 ] leverages a graph attention auto-encoder architecture for spatial clustering by integrating spatial information and gene expression profiles to derive low-dimensional embeddings. The learned embeddings are suitable for subsequent clustering analyses.

CCST [ 24 ] utilizes an extended Deep Graph Infomax (DGI) framework by incorporating a hybrid adjacency matrix for gene expression and spatial data. It encodes cell embeddings and then employs PCA for dimension reduction. k-means++ is then applied for clustering to identify novel cell groups or subpopulations.

SEDR [ 23 ] learns low-dimensional representations of gene expression data with spatial information. It uses deep autoencoder networks and variational graph encoders for spatial embeddings. SEDR is proficient in handling high-resolution ST data.

SpaGCN [ 22 ] utilizes a graph convolutional network to unify gene expression, spatial location, and histology data to identify spatial domains with coherent expression and histology. Subsequently, SpaGCN conducts domain-guided differential expression analysis to detect genes exhibiting enriched expression within identified domains across various ST studies.

BayesSpace [ 17 ], a fully Bayesian method, enhances resolution in ST data by integrating spatial neighborhood information for clustering analysis. It employs a t-distributed error model and Markov chain Monte Carlo (MCMC) for spot-level clustering, promoting neighboring cells to share clusters. It refines cell clustering by dividing spots into subspots with their neighbors.

Alignment and integration methods overview

STalign [ 36 ] utilizes diffeomorphic metric mapping to align ST datasets, accommodating partially matched tissue sections and local non-linear distortions. It effectively aligns ST datasets within and across technologies, as well as to a 3D common coordinate framework.

GPSA [ 37 ] employs a Bayesian model to align spatially-resolved samples to a common coordinate system (CCS) based on phenotypic readouts like gene expression. It involves a two-layer Gaussian process. The first layer maps the spatial locations of observed samples to the CCS, while the second layer maps from the CCS to the observed readouts.

SPIRAL [ 42 ] performs the integration task and the alignment task through two consecutive modules: SPIRAL-integration, focusing on data integration using graph domain adaptation, and SPIRAL-alignment, centered around alignment using cluster-aware optimal transport coordination.

STAligner [ 39 ] employs a graph attention auto-encoder neural network to extract spatially aware embeddings and constructs spot triplets based on these embeddings to guide the integration and alignment of different slices.

PRECAST [ 41 ], an integration method, takes normalized gene expression matrices from multiple tissue slides as input. It factorizes each matrix into latent factors shared within cell/domain clusters, while performing spatial dimension reduction and clustering. It also aligns and estimates joint embeddings for biological effects between cell/domain types across the slides.

SPACEL [ 35 ] includes three modules: Spoint deconvolutes the cell type composition per spot using a probabilistic multilayer perceptron on a single ST slice; Splane identifies coherent spatial domains across multiple slices via a graph convolutional network and adversarial learning; Scube constructs a 3D tissue architecture by transforming and stacking consecutive slices.

One important note for SPACEL in this benchmark work is that only the Scube module is utilized for alignment and 3D reconstruction for the MHypo and simulated datasets. This is achieved by incorporating manual annotation labels, as a single-cell reference is not available for the initial Spoint module to perform deconvolution.

PASTE [ 33 ] employs a fused Gromov-Wasserstein optimal transport formulation to compute pairwise alignments of slices, integrating both transcriptional similarity and physical distances between spots. Moreover, PASTE aggregates these pairwise alignments to create a stacked 3D alignment of a tissue.

PASTE2 [ 34 ] introduces a novel formulation of the partial fused Gromov-Wasserstein optimal transport problem to address partial alignment and 3D reconstruction of multi-slice ST datasets. It accommodates scenarios with partial overlap between aligned slices and/or slice-specific cell types.

BASS [ 18 ] detects spatial domains and clusters cell types simultaneously using a hierarchical Bayesian model. BASS performs well in identifying rare cell types and spatial patterns, showing robustness in handling multiple dominant cell types within spatial domains.

DeepST [ 40 ] uses neural networks, including a graph autoencoder and a denoising autoencoder, to jointly process the data and generate latent representations. Additionally, DeepST incorporates domain adversarial neural networks to integrate the ST data effectively.

Quantitative analysis for clustering

Benchmark metrics.

Adjusted Rand Index (ARI) [ 59 ]: ARI is a measure of the similarity between two data clusterings. It is a correction of the Rand Index, which evaluates the concordance between pairs of data points, determining whether they are grouped together or separated in two different clusterings. The ARI value is calculated using Eqs. 1 and 2. Here, a is the number of pairs of elements that are in the same cluster in both the ground-truth and predicted clusterings, b is the number of pairs that are in different clusters in both clusterings, c is the number of pairs that are in the same cluster in the ground-truth clustering but in different clusters in the predicted clustering, and d is the number of pairs that are in different clusters in the ground-truth clustering but in the same cluster in the predicted clustering. E(RI) is the expected value of the Rand Index under the assumption of independence between the true and predicted clusterings, and max(RI) is the maximum possible Rand Index. The ARI value ranges from −1 to 1, where 1 indicates perfect agreement between the clusterings, 0 indicates random clustering, and negative values indicate clustering that is worse than random.
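As a concrete illustration, ARI can be computed directly from label vectors; the sketch below uses scikit-learn's adjusted_rand_score, with hypothetical label vectors standing in for real annotations and predictions.

```python
# Minimal sketch: ARI between ground-truth and predicted labels.
# The label vectors are hypothetical placeholders, one entry per spot.
from sklearn.metrics import adjusted_rand_score

truth = [0, 0, 1, 1, 2, 2]   # ground-truth domain labels
pred  = [0, 0, 1, 2, 2, 2]   # labels from a clustering method

print(adjusted_rand_score(truth, pred))  # 1 = perfect, ~0 = random
```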

Normalized Mutual Information (NMI) [ 59 ]: NMI is another measure commonly used to evaluate the similarity between two clusterings. It normalizes the Mutual Information (MI) score, evaluating the agreement between ground truth and predicted clusterings while considering both intra-cluster homogeneity and inter-cluster completeness. It ranges from 0 to 1: 0 signifies no mutual information (random clustering), while 1 indicates perfect agreement. The NMI value is calculated using Eqs. 3 and 4 . H ( U ) and H ( V ) represent the entropy of the clustering U and V , respectively, while MI ( U ,  V ) denotes the MI between U and V .

Adjusted Mutual Information (AMI) [ 59 ]: Like NMI, AMI is a measure commonly used to evaluate the similarity between two clusterings. It adjusts for chance agreement by considering the expected mutual information under random clustering. The AMI value ranges from −1 to 1, where 1 signifies perfect agreement between the clusterings, 0 indicates agreement expected purely by chance, and negative values indicate worse-than-chance agreement. To calculate AMI, Eqs. 5 and 4 are used.

Homogeneity (HOM) [ 59 ]: HOM is a metric commonly used in clustering analysis to evaluate the quality of clusters produced by a clustering algorithm. Homogeneity score measures the purity of clusters (Eq. 6 ), indicating whether each cluster contains predominantly data points from a single group or if it contains a mixture of different groups. A high homogeneity score suggests that the clustering algorithm has successfully identified distinct and homogeneous clusters, while a low score indicates that the clusters are more heterogeneous and less well-defined.
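The three metrics above also have standard implementations; a minimal sketch with scikit-learn, again on hypothetical label vectors:

```python
# Sketch: NMI, AMI, and HOM between two clusterings via scikit-learn.
from sklearn.metrics import (normalized_mutual_info_score,
                             adjusted_mutual_info_score,
                             homogeneity_score)

truth = [0, 0, 1, 1, 2, 2]
pred  = [0, 0, 1, 2, 2, 2]

print(normalized_mutual_info_score(truth, pred))  # NMI in [0, 1]
print(adjusted_mutual_info_score(truth, pred))    # AMI, chance-adjusted
print(homogeneity_score(truth, pred))             # HOM, cluster purity
```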

Average Silhouette Width (ASW) [ 55 ]: The ASW score is utilized to evaluate the spatial coherence of predicted domains with respect to physical space in the ST field. ASW values range from −1 to 1 (rescaled to 0 to 1), with higher values indicating better performance. To compute ASW, the silhouette width (SW) must first be defined, and SWs are then averaged across all cells. The SW of a cell, described in Eq. 7, is calculated from the mean distance a to all other cells in the same spatial domain and the mean distance b to all cells in the next nearest cluster.
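A minimal sketch of this computation, assuming silhouettes are taken over physical coordinates with predicted domain labels (scikit-learn's silhouette_score); the coordinates and labels are random placeholders:

```python
# Sketch of ASW over physical coordinates, rescaled from [-1, 1] to [0, 1].
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
coords = rng.random((100, 2))           # spot x/y positions (placeholder)
labels = rng.integers(0, 3, size=100)   # predicted spatial domains

asw = silhouette_score(coords, labels)  # in [-1, 1]
print((asw + 1) / 2)                    # rescaled to [0, 1]
```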

CHAOS [ 19 ]: The CHAOS score measures the spatial continuity of the detected spatial domains in the ST field, as described in Eqs. 8 and 9. CHAOS values are non-negative with no fixed upper bound; a lower CHAOS value indicates higher spatial continuity and better performance.

Percentage of Abnormal Spots (PAS) [ 19 ]: The PAS score assesses the spatial homogeneity of spatial domain identification algorithms in the ST field. It is computed as the proportion of spots whose cluster label differs from at least six of their ten neighboring spots. A low PAS score suggests homogeneity of spots within spatial clusters. PAS values range from 0 to 1.
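A small sketch of this computation, assuming the ten neighboring spots are the ten nearest neighbors in physical space; the function name and inputs are hypothetical:

```python
# Sketch of PAS: fraction of spots whose label differs from at least six
# of their ten nearest neighbors.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def pas_score(coords, labels, k=10, threshold=6):
    # k + 1 neighbors, because each spot is its own nearest neighbor
    nn = NearestNeighbors(n_neighbors=k + 1).fit(coords)
    _, idx = nn.kneighbors(coords)
    neighbor_labels = labels[idx[:, 1:]]              # drop the spot itself
    mismatches = (neighbor_labels != labels[:, None]).sum(axis=1)
    return float(np.mean(mismatches >= threshold))

rng = np.random.default_rng(0)
coords = rng.random((200, 2))
labels = rng.integers(0, 4, size=200)
print(pas_score(coords, labels))   # low values indicate homogeneous domains
```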

Spatial Coherence Score (SCS): A spatial coherence score of the cluster labels is computed based on O’Neill’s spatial entropy. A high spatial coherence score (corresponding to more negative entropy) indicates that the cluster labels of adjacent spots are frequently identical, while a low score (less negative entropy) suggests that the cluster labels of adjacent spots are more chaotic and less coherent. This score serves as an indicator of data quality. Specifically, let \(G = (V, E)\) be a graph where \(V\) is the set of spots, and edges \((i, j) \in E\) connect every pair \((i, j)\) of adjacent spots. Let \(K = \{1, 2, \ldots , k\}\) be a set of \(k\) cluster labels, and let \(L = [l(i)]\) be a labeling of the spots, where \(l(i) \in K\) is the cluster label of spot \(i\). The spatial entropy \(H(G, L)\) is defined in Eq. 10, where \(P(\{a, b\} | E) = \frac{n_{a, b}}{|E|}\) and \(n_{a, b}\) is the number of edges \((i, j) \in E\) such that \(l(i) = a\) and \(l(j) = b\). The spatial coherence score is defined as a normalized form of spatial entropy, namely the Z score of the observed spatial entropy relative to random permutations of the spot labels in a slice [ 33 ].
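A hedged sketch of this score under the definitions above; the adjacency list, permutation count, and sign convention are illustrative assumptions rather than the exact published implementation:

```python
# Sketch: O'Neill spatial entropy over edge label pairs and its z-score
# relative to random label permutations; inputs are toy placeholders.
import numpy as np

def spatial_entropy(edges, labels):
    counts = {}                                   # edges per label pair {a, b}
    for i, j in edges:
        key = tuple(sorted((labels[i], labels[j])))
        counts[key] = counts.get(key, 0) + 1
    p = np.array(list(counts.values()), dtype=float) / len(edges)
    return -np.sum(p * np.log(p))

def coherence_score(edges, labels, n_perm=100, seed=0):
    rng = np.random.default_rng(seed)
    h = spatial_entropy(edges, labels)
    perms = [spatial_entropy(edges, rng.permutation(labels))
             for _ in range(n_perm)]
    # z-score of the observed entropy; more negative here means adjacent
    # labels agree more than expected by chance
    return (h - np.mean(perms)) / np.std(perms)

labels = np.array([0, 0, 0, 1, 1, 1])             # a coherent 1D chain
edges = [(i, i + 1) for i in range(5)]
print(coherence_score(edges, labels))
```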

Runtime: We collected the average runtimes from 20 iterations for each clustering method across all benchmarking datasets to assess their scalability.
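A sketch of such a timing harness, where run_method is a hypothetical callable wrapping one tool on one dataset:

```python
# Sketch: average wall-clock runtime over repeated runs of one method.
import time

def average_runtime(run_method, n_runs=20):
    elapsed = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_method()                              # one full analysis run
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / n_runs
```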

Domain identification performance across 33 ST slices

Given that spatial domain or cell type identification is the primary objective of clustering methods, we aimed to conduct a thorough performance comparison using ARI whenever manual annotation was available to serve as ground truth. Some deep learning-based methods and all statistical methods fix the random seed to produce deterministic output, while other deep learning-based methods do not fix the seed in practice. To account for this variance in performance, we computed the average ARI over 20 runs on each dataset and displayed these results using box plots and a heatmap to aid comparison and visualization. Additionally, since there are 33 ST slices across eight different datasets, it is challenging to rank the overall performance solely from the average ARI heatmap. Therefore, we also provide a heatmap of the overall ranking. This ranking heatmap was generated by normalizing all results within the same slice by the maximum ARI value (representing the best performance) among all methods, so that the best-performing method on each slice receives a value of 1. With 33 data slices in total, the best possible sum score for a method is 33, and the best possible average score is 1. To ensure fairness, the rank scores were averaged exclusively over feasible ST data, excluding instances with NaN values. We performed the same analysis based on the NMI, AMI, and HOM metrics.
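A sketch of this normalization and scoring step with pandas, using a hypothetical methods-by-slices matrix of average ARI values:

```python
# Sketch: per-slice normalization by the best method's ARI, then sum and
# average scores per method; the ARI matrix is a hypothetical placeholder.
import pandas as pd

ari = pd.DataFrame(
    {"slice1": [0.50, 0.40, 0.45], "slice2": [0.30, 0.60, 0.55]},
    index=["methodA", "methodB", "methodC"],
)

# divide each column by its maximum, so the best method per slice scores 1
normalized = ari / ari.max(axis=0)

print(normalized.sum(axis=1))    # sum score: best possible equals n_slices
print(normalized.mean(axis=1))   # average score: best possible equals 1
                                 # (mean() skips NaN entries by default)
```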

Overall robustness across seven ST datasets

To assess the robustness of methods on each dataset, the clustering results across different ST slices within the same dataset were averaged. A robust method is expected to demonstrate the highest overall ARI, NMI, AMI, or HOM value across all datasets, even if it may encounter challenges in predicting a few individual slices.

Data complexity effect on method performance

Data complexity is recognized to have an impact on method performance. Although different methods are often fine-tuned on different datasets to demonstrate superiority in specific contexts, our objective is to identify a general trend wherein methods exhibit diminished performance as data complexity increases. In this context, the Average Silhouette Width (ASW), CHAOS, Percentage of Abnormal Spots (PAS), and Spatial Coherence Score (SCS) are introduced as metrics for quantifying data complexity. The underlying assumption is that data with more coherent regions, indicated by a higher ASW/SCS (or lower CHAOS/PAS), are easier for domain identification.

Qualitative analysis for clustering

Clustering evaluation by visualization.

For the MHPC data without region-based annotation, the evaluation is constrained to comparing the clustering results with the cell type annotation through visualization, supplemented by reference to the Allen Mouse Brain Atlas.

Quantitative analysis for alignment and integration

Adjusted Rand Index: As described in the clustering metrics section.

Layer-wise alignment accuracy: This metric relies on the hypothesis that aligned spots from adjacent consecutive slices within a dataset are more likely to belong to the same spatial domain or cell type. Joint spot embeddings learned by each method are used to align (anchor) spots from the first slice to (aligned) spots on the second slice for each slice pair, with Euclidean distance in the embedding space determining which spots are aligned. The alignment accuracy is defined as the fraction of anchor spots on the first slice whose aligned spots belong to the same spatial domain or cell type. A good integration tool is expected to demonstrate high accuracy for anchor and aligned spots belonging to the same spatial domain or cell type. For the DLPFC data, which has a unique layered structure, this metric is extended to assess whether anchor and aligned spots belong to the same layer (layer shift = 0) or to different layers (layer shift = 1 to 6).

Spot-to-spot mapping ratio: This metric further evaluates whether the joint embeddings capture the local data geometry. The ratio is defined as the total number of anchor spots on the first slice divided by the number of distinct aligned spots on the second slice. For two adjacent consecutive slices, a ratio close to 1:1 is expected for an optimal tool.

Spot-to-spot alignment accuracy: This metric is used to evaluate joint embeddings on simulated datasets, since ground-truth spot-to-spot correspondences are available. This spot-wise alignment accuracy is defined as the percentage of anchor spots on the first slice that are matched to the correct aligned spots on the second slice.
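A combined sketch of the three metrics just defined, assuming a simple nearest-neighbor alignment in the joint embedding space; all inputs are hypothetical placeholders:

```python
# Sketch: align each anchor spot on slice 1 to its nearest neighbor on
# slice 2 in embedding space, then compute the three metrics above.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
emb1, emb2 = rng.random((300, 30)), rng.random((300, 30))  # joint embeddings
lab1 = rng.integers(0, 7, 300)                             # layer labels
lab2 = rng.integers(0, 7, 300)

nn = NearestNeighbors(n_neighbors=1).fit(emb2)
aligned = nn.kneighbors(emb1, return_distance=False).ravel()

# layer-wise alignment accuracy at layer shift = 0 (for a shift of k,
# the equality test would become a layer-distance-of-k test)
layer_acc = np.mean(lab1 == lab2[aligned])

# spot-to-spot mapping ratio: anchors divided by distinct aligned spots
mapping_ratio = len(aligned) / len(np.unique(aligned))

# spot-to-spot alignment accuracy (simulation: spot i should map to spot i)
spot_acc = np.mean(aligned == np.arange(len(emb1)))

print(layer_acc, mapping_ratio, spot_acc)
```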

Comparison of clustering performance before and after integration

A good practice connecting the integration and clustering tasks is multi-slice joint clustering. To determine whether incorporating information from adjacent consecutive slices enhances domain or cell type identification, we used batch-corrected joint embeddings to evaluate the clustering results on each single slice based on ARI values. We plotted ARI for clustering results before and after integration. However, some integration methods do not support single-slice clustering; for these methods, we only plotted ARI after integration.

Simulated data for alignment and integration

Given the scarcity of benchmark datasets available for integration tasks to evaluate spot-to-spot alignment accuracy, we modified the simulation method proposed in PASTE [ 33 ] and generated 11 simulated 10x Visium datasets for this evaluation. We first used one DLPFC slice (151673) as the reference and simulated additional slices with different overlapping ratios (20%, 40%, 60%, 80%, and 100%) relative to the reference slice. In this simulation scenario, the pseudocount perturbation was fixed at 1.0 for all simulated slices. Next, we simulated additional slices with different pseudocounts (0–3.0 with a step size of 0.5) to represent perturbation of gene expression while keeping the overlapping ratio fixed at 100%. Specifically, taking the DLPFC 151673 slice as the reference, we altered the spatial coordinates of the new slice by rotating the reference slice, perturbed the gene expression by adding pseudocounts, and adjusted the number of spots by removing spots that did not align with the grid coordinates after rotation. To maintain fidelity to real 10x Visium data, the spots within the tissue in our simulation are arranged in a hexagonal grid rather than a rectangular grid. Additionally, we used the minimal distance between adjacent spots on the DLPFC 151673 slice as the distance between any two adjacent simulated spots on the grid, rather than arbitrarily setting it to 1.

More detailed procedures to generate simulated datasets are described as follows.

1. Create a hexagonal grid G. Let \(g_{.i}\) and \(z_{.k}\) denote the 2D coordinates of spot i on grid G and spot k on the reference slice DLPFC 151673, respectively. Set \(d_{ij}=||g_{.i}-g_{.j}||=\min _{kl}||z_{.k}-z_{.l}||\) for any two adjacent simulated spots \(i, j \in G\).

2. Let R be a rotation matrix with angle \(\theta\). After spot k is rotated by \(\theta\), its rotated coordinates \(r_{.k}=Rz_{.k}\) are used to map spot k to the closest grid spot \(\hat{i}=\arg \min _{i}||g_{.i}-r_{.k}||\). The simulated coordinates of tissue spot k are then given by \(z'_{k}=g_{.\hat{i}}\). Spot k is dropped if the grid spot \(g_{.\hat{i}}\) was already taken by a previous tissue spot.

3. Let \(X=[x_{ij}] \in \mathbb {N}^{m\times n}\) represent the m genes by n spots expression matrix of DLPFC slice 151673, where \(x_{ij}\) is the read count of gene i in tissue spot j. We calculate the mean of the total transcript count over tissue spots, \(\mu = \frac{1}{n}\sum _{ij}x_{ij}\), and the variance of the total read count, \(\sigma ^2=\frac{1}{n}\sum _{j}(\mu -\sum _{i}x_{ij})^2\). The total read count of spot j, \(k_j\), is generated according to \(k_j \sim\) NegativeBinomial(r, p), with \(r = \frac{\mu ^2}{\sigma ^2-\mu }\) and \(p = \frac{\mu }{\sigma ^2}\) such that \(E(k_j)=\mu\) and \(\text {var}(k_j)=\sigma ^2\).

4. Generate the simulated read count of gene i in spot j according to \(x'_{ij} \sim\) Multinomial\(\left( k_j, \frac{x_{ij}+\delta }{\sum _i x_{ij}+\delta m}\right)\), where \(\delta \in \{0, 0.5, \dots , 3\}\) is the pseudocount.
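A sketch of the expression-simulation steps (3) and (4) above; the count matrix here is a random overdispersed placeholder rather than the real DLPFC 151673 data, and the rotation and grid-snapping steps are omitted:

```python
# Sketch: fit NB parameters to per-spot totals, draw a simulated total for
# each spot, then resample gene counts from a multinomial with pseudocount.
import numpy as np

rng = np.random.default_rng(0)
X = rng.negative_binomial(5, 0.5, size=(50, 100))  # placeholder genes x spots
m, n = X.shape
delta = 1.0                                        # pseudocount perturbation

totals = X.sum(axis=0)                             # total count per spot
mu, var = totals.mean(), totals.var()
r, p = mu**2 / (var - mu), mu / var                # NB params; needs var > mu

X_sim = np.empty_like(X)
for j in range(n):
    k_j = rng.negative_binomial(r, p)              # step 3: simulated total
    probs = (X[:, j] + delta) / (X[:, j].sum() + delta * m)
    X_sim[:, j] = rng.multinomial(k_j, probs)      # step 4: resample genes
```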

Qualitative analysis for alignment and integration

Visualization of aligned, misaligned, and unaligned spots from pairwise alignment

To assess the joint spot embeddings produced by integration tools and the alignment matrices produced by alignment tools, we quantified alignment accuracy based on aligned, misaligned, and unaligned spots across two consecutive slices. For integration tools such as STAligner, PRECAST, DeepST, and SPIRAL, we matched each spot (the "anchor" spot) on the first slice to a spot (the "aligned" spot) on the second slice by Euclidean distance in the joint latent embedding space. If the aligned spot belonged to the same spatial domain or cell type as the anchor spot according to the ground truth labels, we classified both spots as "aligned" (orange in Fig. 8a, b); otherwise, we classified both as "misaligned" (blue in Fig. 8a, b). Finally, spots on the second slice that were not matched to any spot on the first slice were classified as "unaligned" (green in Fig. 8a, b). For alignment tools such as PASTE, PASTE2, SPACEL, STalign, and GPSA, we directly used their alignment matrices or refined coordinates to perform this analysis.
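A rough sketch of this matching and labeling step (an assumed procedure, not the exact implementation; ties where several anchors match the same spot are simply overwritten here):

```python
import numpy as np
from scipy.spatial import cKDTree

def classify_alignment(emb1, emb2, labels1, labels2):
    """emb1/emb2: joint embeddings of slice 1 and slice 2; labels1/labels2:
    ground-truth domain or cell type labels. Returns the aligned/misaligned/
    unaligned status of each slice-2 spot (the paper labels the slice-1
    anchor spots symmetrically)."""
    tree = cKDTree(emb2)
    _, matched = tree.query(emb1)  # nearest slice-2 spot for each anchor
    status = np.full(len(emb2), "unaligned", dtype=object)
    for anchor, j in enumerate(matched):
        status[j] = ("aligned" if labels1[anchor] == labels2[j]
                     else "misaligned")
    return status
```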

Reconstruction of three-dimensional (3D) architecture of the tissue

Among the alignment and integration methods, tools such as PASTE, PASTE2, SPACEL, SPIRAL, STalign, and GPSA output transformed coordinate systems for all slices, while tools like STAligner use an embedded algorithm such as ICP to align slices based on an anchor cluster. Consequently, these tools can combine pairwise alignments from multiple adjacent consecutive slices into a stacked 3D alignment of a tissue. They were benchmarked on three datasets by comparing the 3D tissue architectures they reconstructed.
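A minimal sketch of the stacking idea (an assumed simplification: each slice's coordinates are taken as already mapped into a common 2D frame by the alignment tool, and slices receive evenly spaced z-offsets):

```python
import numpy as np

def stack_slices(aligned_coords_list, z_step=1.0):
    """aligned_coords_list: per-slice (n_i, 2) coordinates in a common 2D
    frame. Returns an (N, 3) point cloud for the stacked 3D reconstruction."""
    return np.vstack([
        np.column_stack([xy, np.full(len(xy), z * z_step)])
        for z, xy in enumerate(aligned_coords_list)
    ])
```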

One important note for SPACEL is that the 3D architecture for the MB dataset was reconstructed in two scenarios: (1) using the Scube module with manual annotation labels and (2) using both the Splane and Scube modules, incorporating the cell-type decomposition results provided by the authors.

Visualization of UMAP plot for joint embeddings

Most integration methods embed spots in a high-dimensional latent space, which is difficult to interpret directly. To make the latent-space distribution easier to inspect, we reduced the spot embeddings to two dimensions using UMAP. A good UMAP plot of latent embeddings should exhibit structure resembling that of the real data while also separating spatial domains or cell types.
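A minimal sketch of this visualization, assuming the umap-learn and matplotlib packages and hypothetical `embeddings` and `domains` arrays:

```python
import numpy as np
import matplotlib.pyplot as plt
import umap

def plot_joint_embedding(embeddings, domains):
    """Project high-dimensional joint spot embeddings to 2D with UMAP and
    color spots by ground-truth spatial domain / cell type."""
    xy = umap.UMAP(random_state=0).fit_transform(embeddings)
    domains = np.asarray(domains)
    for d in np.unique(domains):
        pts = xy[domains == d]
        plt.scatter(pts[:, 0], pts[:, 1], s=2, label=str(d))
    plt.legend(markerscale=4, fontsize=6)
    plt.show()
```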

Visualization of clustering results after integration

For the MB2SA&P dataset, we visually compared the domains identified after integration with the Allen Brain Atlas. Furthermore, we examined the consistency of regions across the fissure between the anterior and posterior sections. Higher similarity to the atlas, together with coherent regions across the fissure, indicates superior integration performance.

For the mouse embryo data, we compared the clustering result after integrating the two slices from developmental stages E11.5 and E12.5 with the manual annotation defined by different organs and tissues.

Computation platform

We conducted all benchmarking experiments on a server equipped with one Intel Xeon W-2195 CPU running at 2.3 GHz, with a total of 25 MB of L3 cache and 36 CPU cores. The server also had 256 GB of DDR4 memory operating at 2666 MHz.

For the GPU configuration, we used the same machine with four Quadro RTX A6000 cards, each having 48 GB of memory and a total of 4608 CUDA cores.

Availability of data and materials

All code, tutorials, and related data files are freely available on GitHub [60] at https://github.com/maiziezhoulab/BenchmarkST and on Zenodo [61] with DOI https://doi.org/10.5281/zenodo.13128213 under the MIT license. All data and the corresponding annotation can be downloaded from https://benchmarkst-reproducibility.readthedocs.io/en/latest/Data%20availability.html and are described in Table 1 with their sources. Dataset 1 consists of 12 human DLPFC sections, available at http://research.libd.org/spatialLIBD/ with manual annotation [43]. Dataset 2 [44] includes a single slice of human breast cancer, which is open-sourced from 10x Genomics at https://www.10xgenomics.com/ with annotation [23]. Dataset 3 [45] includes two slices of anterior and posterior mouse brain available at https://www.10xgenomics.com/ with annotation [26]. Dataset 4 contains HER2-positive tumors from eight individuals at https://github.com/almaan/her2st with annotation [46]. Dataset 5, covering anatomical regions of the mouse hippocampus, was acquired through the Broad Institute, available at https://singlecell.broadinstitute.org/ with annotation [47]. Dataset 6 is the embryo dataset sequenced by Stereo-seq from the MOSTA project at https://db.cngb.org/stomics/mosta/resource/ with annotation [48]. Dataset 7 contains one slice from the mouse visual cortex and is available at https://www.starmapresources.org/data [9]. Dataset 8 contains three slices of the mouse prefrontal cortex and is available at https://www.starmapresources.org/data [9]. Dataset 9 [49] includes five slices from the mouse hypothalamus available at https://datadryad.org/stash/dataset/doi:10.5061/dryad.8t8s248 with annotation [18]. Dataset 10 [50] contains 33 consecutive mouse cerebral cortex tissue slices with similar shapes at https://zenodo.org/records/8167488 with annotation [35]. The simulation data are deposited in Zenodo at https://zenodo.org/records/10800745 [62].

Marx V. Method of the Year: spatially resolved transcriptomics. Nat Methods. 2021;18(1):9–14.

Tian L, Chen F, Macosko EZ. The expanding vistas of spatial transcriptomics. Nat Biotechnol. 2023;41(6):773–82.

Dong K, Zhang S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat Commun. 2022;13(1):1–12.

Long Y, Ang KS, Li M, Chong KLK, Sethi R, Zhong C, et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat Commun. 2023;14(1):1155.

Ma C, Chitra U, Zhang S, Raphael BJ. Belayer: Modeling discrete and continuous spatial variation in gene expression from spatially resolved transcriptomics. Cell Syst. 2022;13(10):786–97.

Yang Y, Li G, Zhong Y, Xu Q, Chen BJ, Lin YT, et al. Gene knockout inference with variational graph autoencoder learning single-cell gene regulatory networks. Nucleic Acids Res. 2023;51(13):6578–92.

Asp M, Bergenstråhle J, Lundeberg J. Spatially resolved transcriptomes-next generation tools for tissue exploration. BioEssays. 2020;42(10):1900221.

Chen J, McSwiggen D, Ünal E. Single molecule fluorescence in situ hybridization (smFISH) analysis in budding yeast vegetative growth and meiosis. JoVE (J Visualized Exp). 2018;135:e57774.

Wang X, Allen WE, Wright MA, Sylwestrak EL, Samusik N, Vesuna S, et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science. 2018;361(6400):eaat5691.

Moffitt JR, et al. High-throughput single-cell gene-expression profiling with multiplexed error-robust fluorescence in situ hybridization. Proc Natl Acad Sci. 2016;113(39):11046–51.

Rodriques SG, Stickels RR, Goeva A, Martin CA, Murray E, Vanderburg CR, et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science. 2019;363(6434):1463–7.

Ståhl PL, Salmén F, Vickovic S, Lundmark A, Navarro JF, Magnusson J, et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 2016;353(6294):78–82.

Cheng M, Jiang Y, Xu J, Mentis AFA, Wang S, Zheng H, et al. Spatially resolved transcriptomics: a comprehensive review of their technological advances, applications, and challenges. J Genet Genomics. 2023;50(9):625–40.

Wang B, Luo J, Liu Y, Shi W, Xiong Z, Shen C, et al. Spatial-MGCN: a novel multi-view graph convolutional network for identifying spatial domains with attention mechanism. Brief Bioinforma. 2023;24(5):bbad262.

Fang S, Chen B, Zhang Y, Sun H, Liu L, Liu S, et al. Computational approaches and challenges in spatial transcriptomics. Genomics Proteomics Bioinforma. 2023;21(1):24–47.

Wang Y, Jin W, Derr T. Graph neural networks: self-supervised learning. Graph Neural Netw Found Front Appl. 2022. p. 391–420.

Zhao E, Stone MR, Ren X, Guenthoer J, Smythe KS, Pulliam T, et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat Biotechnol. 2021;39(11):1375–84.

Li Z, Zhou X. BASS: multi-scale and multi-sample analysis enables accurate cell type clustering and spatial domain detection in spatial transcriptomic studies. Genome Biol. 2022;23(1):168.

Shang L, Zhou X. Spatially aware dimension reduction for spatial transcriptomics. Nat Commun. 2022;13(1):7203.

Liu W, Liao X, Yang Y, Lin H, Yeong J, Zhou X, et al. Joint dimension reduction and clustering analysis of single-cell RNA-seq and spatial transcriptomics data. Nucleic Acids Res. 2022;50(12):e72.

Singhal V, Chou N, Lee J, Yue Y, Liu J, Chock WK, et al. BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nat Genet. 2024;56(3):431–41.

Hu J, Li X, Coleman K, Schroeder A, Ma N, Irwin DJ, et al. SpaGCN: integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat Methods. 2021;18(11):1342–51.

Xu H, Fu H, Long Y, Ang KS, Sethi R, Chong K, et al. Unsupervised spatially embedded deep representation of spatial transcriptomics. Genome Med. 2024;16(1):12.

Li J, Chen S, Pan X, Yuan Y, Shen HB. Cell clustering for spatial transcriptomics data with graph neural networks. Nat Comput Sci. 2022;2(6):399–408.

Zong Y, Yu T, Wang X, Wang Y, Hu Z, Li Y. conST: an interpretable multi-modal contrastive learning framework for spatial transcriptomics. bioRxiv. 2022. https://doi.org/10.1101/2022.01.14.476408 .

Zeng Y, Yin R, Luo M, Chen J, et al. Deciphering spatial domains by integrating histopathological image and transcriptomics via contrastive learning. bioRxiv. 2022:2022.09.30.510297. Available from: https://www.biorxiv.org/content/10.1101/2022.09.30.510297 .

Ren H, Walker BL, Cang Z, Nie Q. Identifying multicellular spatiotemporal organization of cells with SpaceFlow. Nat Commun. 2022;13(1):4076.

Hu Y, Zhao Y, Schunk CT, Ma Y, Derr T, Zhou XM. ADEPT: autoencoder with differentially expressed genes and imputation for robust spatial transcriptomics clustering. Iscience. 2023;26(6):106792.

Chen T, Kornblith S, Norouzi M, Hinton G. A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning. PMLR; 2020. p. 1597–1607.

Longo SK, Guo MG, Ji AL, Khavari PA. Integrating single-cell and spatial transcriptomics to elucidate intercellular tissue dynamics. Nat Rev Genet. 2021;22(10):627–44.

Leek JT, Scharpf RB, Bravo HC, Simcha D, Langmead B, Johnson WE, et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet. 2010;11(10):733–9.

Haghverdi L, Lun AT, Morgan MD, Marioni JC. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat Biotechnol. 2018;36(5):421–7.

Zeira R, Land M, Strzalkowski A, Raphael BJ. Alignment and integration of spatial transcriptomics data. Nat Methods. 2022;19(5):567–75.

Liu X, Zeira R, Raphael BJ. PASTE2: partial alignment of multi-slice spatially resolved transcriptomics data. bioRxiv. 2023:2023.01.08.523162. Available from: https://www.biorxiv.org/content/10.1101/2023.01.08.523162 .

Xu H, Wang S, Fang M, Luo S, Chen C, Wan S, et al. SPACEL: deep learning-based characterization of spatial transcriptome architectures. Nat Commun. 2023;14(1):7603.

Clifton K, Anant M, Aihara G, Atta L, Aimiuwu OK, Kebschull JM, et al. STalign: alignment of spatial transcriptomics data using diffeomorphic metric mapping. Nat Commun. 2023;14(1):8123.

Jones A, Townes FW, Li D, Engelhardt BE. Alignment of spatial genomics data using deep Gaussian processes. Nat Methods. 2023;20(9):1379–87.

Titouan V, Courty N, Tavenard R, Flamary R. Optimal transport for structured data with application on graphs. In: International Conference on Machine Learning. PMLR; 2019. p. 6275–6284.

Zhou X, Dong K, Zhang S. Integrating spatial transcriptomics data across different conditions, technologies and developmental stages. Nat Comput Sci. 2023;3(10):894–906.

Xu C, Jin X, Wei S, Wang P, Luo M, Xu Z, et al. DeepST: identifying spatial domains in spatial transcriptomics by deep learning. Nucleic Acids Res. 2022;50(22):e131.

Liu W, Liao X, Luo Z, Yang Y, Lau MC, Jiao Y, et al. Probabilistic embedding, clustering, and alignment for integrating spatial transcriptomics data with PRECAST. Nat Commun. 2023;14(1):296.

Guo T, Yuan Z, Pan Y, Wang J, Chen F, Zhang MQ, et al. SPIRAL: integrating and aligning spatially resolved transcriptomics data across different experiments, conditions, and technologies. Genome Biol. 2023;24(1):241.

Pardo B, Spangler A, Weber LM, Page SC, Hicks SC, Jaffe AE, et al. spatialLIBD: an R/Bioconductor package to visualize spatially-resolved transcriptomics data. Springer; 2022. http://research.libd.org/spatialLIBD/ . Accessed 15 Apr 2023.

10x Genomics. Human Breast Cancer (Block A Section 1). https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Breast_Cancer_Block_A_Section_1 . Accessed 15 Apr 2023.

10x Genomics. Mouse Brain Serial Section 2 (Sagittal-Anterior). https://www.10xgenomics.com/datasets/mouse-brain-serial-section-2-sagittal-anterior-1-standard . Accessed 15 Apr 2023.

Andersson A, Larsson L, Stenbeck L, Salmén F, Ehinger A, Wu S, et al. Spatial deconvolution of HER2-positive breast tumors reveals novel intercellular relationships. Cold Spring Harbor Laboratory; 2020. https://github.com/almaan/her2st . Accessed 15 Apr 2023.

Stickels RR, Murray E, Kumar P, Li J, Marshall JL, Di Bella DJ, et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nature Publishing Group US New York; 2021. https://singlecell.broadinstitute.org/single_cell/study/SCP815/sensitive-spatial-genome-wide-expression-profiling-at-cellular-resolution#study-summary . Accessed 15 Apr 2023.

Chen A, Liao S, Cheng M, Ma K, Wu L, Lai Y, et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Elsevier; 2022. https://db.cngb.org/stomics/mosta/resource/ . Accessed 31 July 2024.

Moffitt JR, Bambah-Mukku D, Eichhorn SW, Vaughn E, Shekhar K, Perez JD, et al. Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. American Association for the Advancement of Science; 2018. https://datadryad.org/stash/dataset/doi:10.5061/dryad.8t8s248 . Accessed 31 July 2024.

Zhang M, Eichhorn SW, Zingg B, Yao Z, Cotter K, Zeng H, et al. Spatially resolved cell atlas of the mouse primary motor cortex by MERFISH. Nature Publishing Group UK London; 2021. https://zenodo.org/records/8167488 . Accessed 15 Apr 2023.

Maynard KR, Collado-Torres L, Weber LM, Uytingco C, Barry BK, Williams SR, et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat Neurosci. 2021;24(3):425–36.

Lein ES, Hawrylycz MJ, Ao N, Ayres M, Bensinger A, Bernard A, et al. Genome-wide atlas of gene expression in the adult mouse brain. Nature. 2007;445(7124):168–76.

Palla G, Spitzer H, Klein M, Fischer D, Schaar AC, Kuemmerle LB, et al. Squidpy: a scalable framework for spatial omics analysis. Nat Methods. 2022;19(2):171–8.

Townes FW, Hicks SC, Aryee MJ, Irizarry RA. Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model. Genome Biol. 2019;20:1–16.

Zuo C, Zhang Y, Cao C, Feng J, Jiao M, Chen L. Elucidating tumor heterogeneity from spatially resolved transcriptomics data by multi-view graph collaborative learning. Nat Commun. 2022;13(1):5962.

Fraley C, Raftery AE, Murphy T, Scrucca L. mclust Version 4 for R: normal mixture modeling for model-based clustering, classification, and density estimation. Washington: University of Washington; 2012.

Yuan Z, Zhao F, Lin S, Zhao Y, Yao J, Cui Y, et al. Benchmarking spatial clustering methods with spatially resolved transcriptomics data. Nat Methods. 2024;21(4):712–22.

Cheng A, Hu G, Li WV. Benchmarking cell-type clustering methods for spatially resolved transcriptomics data. Brief Bioinforma. 2023;24(1):bbac475.

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.

Hu Y, Xie M, Li Y, Rao M, Shen W, Luo C, et al. Benchmarking clustering, alignment, and integration methods for spatial transcriptomics. GitHub. 2024. https://github.com/maiziezhoulab/BenchmarkST . Accessed 31 July 2024.

Hu Y, Xie M, Li Y, Rao M, Shen W, Luo C, et al. Benchmarking clustering, alignment, and integration methods for spatial transcriptomics. Zenodo. 2024. https://doi.org/10.5281/zenodo.13128213 .

Hu Y, Xie M, Li Y, Rao M, Shen W, Luo C, et al. DLPFC 151673 simulated data. Zenodo. 2024. https://doi.org/10.5281/zenodo.10800745 .

Review history

The review history is available as Additional file 3.

Peer review information

Kevin Pang and Veronique van den Berghe were the primary editors of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Funding

This work was supported by the NIGMS Maximizing Investigators’ Research Award (MIRA) R35 GM146960 to X.M.Z. and the Guangdong Basic and Applied Basic Research Foundation (2023A1515030154) to W.S.

Author information

Authors and affiliations

Department of Computer Science, Vanderbilt University, 37235, Nashville, USA

Yunfei Hu, Mingxing Rao, Haoran Qin, Jihoon Baek & Xin Maizie Zhou

Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, USA

Manfei Xie, Yikang Li, Can Luo & Xin Maizie Zhou

Department of Bioinformatics, Shantou University Medical College, 515041, Shantou, China

Wenjun Shen

Contributions

X.M.Z. conceived and led this work. Y.H. and X.M.Z. designed the framework. Y.H., M.X., Y.L., M.R., W.S., C.L., H.Q., and J.B. performed all benchmark analyses. Y.H. and X.M.Z. wrote the manuscript with input from all authors.

Corresponding author

Correspondence to Xin Maizie Zhou .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: Ground truth annotation and parameter settings for benchmark tools. Table S1 provides a comprehensive overview of the ground truth for each dataset, detailing the specific ground truth labels and the information utilized in deriving them. Table S2 outlines the parameter settings and descriptions for each tool benchmarked.

Additional file 2: Supplementary results and Figures S1-S20.

Additional file 3: Review history.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Hu, Y., Xie, M., Li, Y. et al. Benchmarking clustering, alignment, and integration methods for spatial transcriptomics. Genome Biol 25, 212 (2024). https://doi.org/10.1186/s13059-024-03361-0

Received: 12 March 2024

Accepted: 30 July 2024

Published: 09 August 2024

DOI: https://doi.org/10.1186/s13059-024-03361-0

Keywords

  • Spatial transcriptomics
  • Benchmarking
  • Integration
  • Batch correction
  • 3D reconstruction


In clean architecture, is the presentation layer allowed to communicate directly with the infrastructure layer?

I have an application using the classic clean architecture with all the dependencies set up as described (i.e. flowing inward). I have an external service I wish to use in my project, so I defined the functionality for interacting with that service (objects that interact with the service, some interfaces, etc.) in the Infrastructure layer.

Currently the Presentation layer is consuming the external service directly from the Infrastructure layer.

My question: is it acceptable for the presentation layer to communicate directly with the infrastructure layer, or must everything go via the application layer?

Ideally I would like the presentation layer to call the application layer so that I can reuse some of the functionality it provides (such as validation), but since the infrastructure layer does not know about the application layer, I would need to define the external service objects in the application layer.

I would rather avoid doing so as these are service specific rather than application specific. I would much rather have them defined in the infrastructure closest to where they are used.

So, is there a way to reuse the interfaces defined in the infrastructure layer for this external service or will I just have to suck it up and have some duplication (i.e. define interfaces in the application layer that my external service in the infrastructure layer implements)?

  • asp.net-core
  • asp.net-core-webapi
  • clean-architecture

In Clean Architecture, you generally want to define interfaces in the Application layer. This is because all your business logic should live in, and be dictated by, the application/domain layer. It should define the interfaces it needs to send requests to and receive responses from the outside world. This allows you to build out your application's unique functionality without concern for how you will get or persist data, call external APIs, etc.

The Infrastructure layer should be aware of the Application layer so that it can implement those interfaces. Then, the Presentation layer should invoke the Application layer. Depending on how dogmatic you want to be with CA principles, the Presentation layer should either NOT reference the Infrastructure layer at all (in this case, you would use a composition root project to build a DI container), or it should reference the Infrastructure layer ONLY as a means to register the Application's interfaces against the Infrastructure's implementations in a DI container.

You shouldn't define "external service objects" in the Application layer; rather, you define only the domain objects and interfaces your app needs to do its thing. The concerns of the external service are then addressed in the Infrastructure's implementation, which usually involves (see the sketch after this list):

  • Knowing how to call the external service
  • Having models for sending requests to and receiving responses from the external service
  • Mapping application/domain requests from the application/domain to the external service models
  • Mapping external responses back to some application/domain model
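As a rough illustration of that shape, here is a minimal sketch in Python (the question is about ASP.NET Core, but the structure is language-agnostic; `PaymentGateway`, `StripeGateway`, and `CheckoutUseCase` are hypothetical names, not anything from your project):

```python
from abc import ABC, abstractmethod

# --- Application layer: owns the port and the use case ---
class PaymentGateway(ABC):
    @abstractmethod
    def charge(self, amount_cents: int) -> bool: ...

class CheckoutUseCase:
    def __init__(self, gateway: PaymentGateway):
        self._gateway = gateway  # depends only on the abstraction

    def run(self, amount_cents: int) -> str:
        if amount_cents <= 0:  # app-level validation lives here
            raise ValueError("amount must be positive")
        return "ok" if self._gateway.charge(amount_cents) else "declined"

# --- Infrastructure layer: implements the port ---
class StripeGateway(PaymentGateway):
    def charge(self, amount_cents: int) -> bool:
        # A real adapter would map to/from the external service's own
        # request/response models here; those models never leak upward.
        return True

# --- Composition root (or Presentation wiring): the only place both meet ---
use_case = CheckoutUseCase(StripeGateway())
print(use_case.run(500))  # Presentation invokes the Application layer only
```

The Presentation layer never touches `StripeGateway` directly; it only wires the implementation into the container and then talks to the use case, which is the "reference Infrastructure ONLY for DI registration" option described above.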
