Skip to content

Saga Pattern

Introduction

The Saga Pattern is a distributed transaction management strategy that breaks a long-lived transaction into a sequence of smaller, local transactions, each with a corresponding compensating action to undo its effects if a later step fails. In microservices architectures where traditional two-phase commit (2PC) is impractical due to service autonomy and network unreliability, sagas provide a way to maintain data consistency across service boundaries without distributed locks. Understanding this pattern is essential for anyone designing resilient, eventually consistent distributed systems.

Core Concepts

The Problem with Distributed Transactions

In a monolithic application, a single database transaction can atomically update multiple tables. In a microservices architecture, each service owns its own database, making ACID transactions across services impossible without coordination protocols like 2PC — which introduce tight coupling, blocking, and a single point of failure.

The Saga Pattern solves this by replacing a single distributed transaction with a chain of local transactions. Each local transaction updates data within one service and publishes an event or sends a command to trigger the next step.

Saga Guarantees

Sagas provide ACD guarantees (no Isolation):

  • Atomicity: All steps complete or all compensations run
  • Consistency: The system moves from one valid state to another
  • Durability: Each local transaction is durable

The lack of Isolation means intermediate states are visible to other transactions. This requires careful handling through countermeasures like semantic locks, commutative updates, and read-your-writes patterns.

Two Coordination Strategies

There are two primary approaches to coordinating a saga:

  1. Choreography: Each service listens for events from other services and decides locally whether to act. There is no central controller.
  2. Orchestration: A central orchestrator (saga execution coordinator) tells each participant what to do and when, managing the overall flow.

Forward Recovery vs. Backward Recovery

When a step in a saga fails, the system has two options:

  • Backward Recovery: Execute compensating transactions for all previously completed steps (rollback semantics)
  • Forward Recovery: Retry the failed step until it succeeds (requires idempotent operations)

Step Classification

Saga steps fall into three categories:

  • Compensatable Transactions: Steps that can be undone by a compensating transaction (e.g., reserve inventory → cancel reservation)
  • Pivot Transaction: The go/no-go decision point. If it succeeds, the saga will run to completion. If it fails, compensation begins.
  • Retriable Transactions: Steps after the pivot that are guaranteed to eventually succeed (e.g., sending a confirmation email)

Implementation: Choreography-Based Saga

In the choreography approach, each service publishes domain events that other services subscribe to. Let's model an e-commerce order flow.

When payment fails, compensation flows backward:

Java Implementation — Choreography

java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Consumer;

// Simple in-memory event bus for demonstration
public class EventBus {
    private final Map<String, List<Consumer<Map<String, Object>>>> subscribers = new ConcurrentHashMap<>();

    public void subscribe(String eventType, Consumer<Map<String, Object>> handler) {
        subscribers.computeIfAbsent(eventType, k -> new CopyOnWriteArrayList<>()).add(handler);
    }

    public void publish(String eventType, Map<String, Object> payload) {
        System.out.printf("[EventBus] Publishing: %s -> %s%n", eventType, payload);
        List<Consumer<Map<String, Object>>> handlers = subscribers.getOrDefault(eventType, List.of());
        for (Consumer<Map<String, Object>> handler : handlers) {
            CompletableFuture.runAsync(() -> handler.accept(payload));
        }
    }
}
java
public class OrderService {
    private final EventBus eventBus;
    private final Map<String, String> orders = new ConcurrentHashMap<>();

    public OrderService(EventBus eventBus) {
        this.eventBus = eventBus;

        // Listen for saga completion/failure events
        eventBus.subscribe("OrderShipped", this::onOrderShipped);
        eventBus.subscribe("StockReleased", this::onStockReleased);
        eventBus.subscribe("StockReservationFailed", this::onReservationFailed);
    }

    public String createOrder(String customerId, String productId, int quantity) {
        String orderId = UUID.randomUUID().toString().substring(0, 8);
        orders.put(orderId, "PENDING");
        System.out.printf("[OrderService] Order %s created (PENDING)%n", orderId);

        Map<String, Object> event = Map.of(
            "orderId", orderId,
            "customerId", customerId,
            "productId", productId,
            "quantity", quantity
        );
        eventBus.publish("OrderCreated", event);
        return orderId;
    }

    private void onOrderShipped(Map<String, Object> event) {
        String orderId = (String) event.get("orderId");
        orders.put(orderId, "CONFIRMED");
        System.out.printf("[OrderService] Order %s CONFIRMED%n", orderId);
    }

    private void onStockReleased(Map<String, Object> event) {
        String orderId = (String) event.get("orderId");
        orders.put(orderId, "CANCELLED");
        System.out.printf("[OrderService] Order %s CANCELLED (compensation complete)%n", orderId);
    }

    private void onReservationFailed(Map<String, Object> event) {
        String orderId = (String) event.get("orderId");
        orders.put(orderId, "CANCELLED");
        System.out.printf("[OrderService] Order %s CANCELLED (no stock)%n", orderId);
    }

    public String getStatus(String orderId) {
        return orders.getOrDefault(orderId, "NOT_FOUND");
    }
}
java
public class InventoryService {
    private final EventBus eventBus;
    private final Map<String, Integer> stock = new ConcurrentHashMap<>();
    private final Set<String> reservations = ConcurrentHashMap.newKeySet();

    public InventoryService(EventBus eventBus) {
        this.eventBus = eventBus;
        stock.put("PROD-001", 10);

        eventBus.subscribe("OrderCreated", this::onOrderCreated);
        eventBus.subscribe("PaymentFailed", this::onPaymentFailed);
    }

    private void onOrderCreated(Map<String, Object> event) {
        String orderId = (String) event.get("orderId");
        String productId = (String) event.get("productId");
        int quantity = (int) event.get("quantity");

        int available = stock.getOrDefault(productId, 0);
        if (available >= quantity) {
            stock.put(productId, available - quantity);
            reservations.add(orderId);
            System.out.printf("[InventoryService] Stock reserved for order %s%n", orderId);
            eventBus.publish("StockReserved", Map.of(
                "orderId", orderId,
                "customerId", event.get("customerId"),
                "amount", quantity * 25.0
            ));
        } else {
            System.out.printf("[InventoryService] Insufficient stock for order %s%n", orderId);
            eventBus.publish("StockReservationFailed", Map.of("orderId", orderId));
        }
    }

    // Compensating transaction
    private void onPaymentFailed(Map<String, Object> event) {
        String orderId = (String) event.get("orderId");
        if (reservations.remove(orderId)) {
            String productId = (String) event.getOrDefault("productId", "PROD-001");
            stock.merge(productId, 1, Integer::sum); // simplified
            System.out.printf("[InventoryService] COMPENSATING: Stock released for order %s%n", orderId);
            eventBus.publish("StockReleased", Map.of("orderId", orderId));
        }
    }
}
java
public class PaymentService {
    private final EventBus eventBus;
    private final boolean simulateFailure;

    public PaymentService(EventBus eventBus, boolean simulateFailure) {
        this.eventBus = eventBus;
        this.simulateFailure = simulateFailure;
        eventBus.subscribe("StockReserved", this::onStockReserved);
    }

    private void onStockReserved(Map<String, Object> event) {
        String orderId = (String) event.get("orderId");
        double amount = (double) event.get("amount");

        if (simulateFailure) {
            System.out.printf("[PaymentService] Payment FAILED for order %s ($%.2f)%n", orderId, amount);
            eventBus.publish("PaymentFailed", Map.of("orderId", orderId));
        } else {
            System.out.printf("[PaymentService] Payment processed for order %s ($%.2f)%n", orderId, amount);
            eventBus.publish("PaymentProcessed", Map.of("orderId", orderId));
        }
    }
}
java
public class ChoreographySagaDemo {
    public static void main(String[] args) throws InterruptedException {
        System.out.println("=== Happy Path ===");
        EventBus bus1 = new EventBus();
        OrderService orderService1 = new OrderService(bus1);
        new InventoryService(bus1);
        new PaymentService(bus1, false);
        // Simplified shipping handler
        bus1.subscribe("PaymentProcessed", event -> {
            System.out.printf("[ShippingService] Shipment scheduled for order %s%n", event.get("orderId"));
            bus1.publish("OrderShipped", event);
        });

        orderService1.createOrder("CUST-1", "PROD-001", 2);
        Thread.sleep(1000);

        System.out.println("\n=== Failure Path (Payment Fails) ===");
        EventBus bus2 = new EventBus();
        OrderService orderService2 = new OrderService(bus2);
        new InventoryService(bus2);
        new PaymentService(bus2, true); // simulate failure

        orderService2.createOrder("CUST-2", "PROD-001", 1);
        Thread.sleep(1000);
    }
}

Implementation: Orchestration-Based Saga

The orchestrator approach centralizes saga logic in a single component that drives the workflow.

java
import java.util.*;
import java.util.function.Function;

public class SagaOrchestrator {

    public enum StepStatus { SUCCESS, FAILURE }

    public static class SagaStep {
        private final String name;
        private final Function<Map<String, Object>, StepStatus> action;
        private final Function<Map<String, Object>, StepStatus> compensation;

        public SagaStep(String name,
                        Function<Map<String, Object>, StepStatus> action,
                        Function<Map<String, Object>, StepStatus> compensation) {
            this.name = name;
            this.action = action;
            this.compensation = compensation;
        }
    }

    private final List<SagaStep> steps = new ArrayList<>();
    private final List<SagaStep> completedSteps = new ArrayList<>();

    public SagaOrchestrator addStep(String name,
                                     Function<Map<String, Object>, StepStatus> action,
                                     Function<Map<String, Object>, StepStatus> compensation) {
        steps.add(new SagaStep(name, action, compensation));
        return this;
    }

    public boolean execute(Map<String, Object> context) {
        System.out.println("\n[Orchestrator] Starting saga execution...");

        for (SagaStep step : steps) {
            System.out.printf("[Orchestrator] Executing step: %s%n", step.name);
            StepStatus result = step.action.apply(context);

            if (result == StepStatus.SUCCESS) {
                System.out.printf("[Orchestrator] Step '%s' succeeded%n", step.name);
                completedSteps.add(step);
            } else {
                System.out.printf("[Orchestrator] Step '%s' FAILED — starting compensation%n", step.name);
                compensate(context);
                return false;
            }
        }

        System.out.println("[Orchestrator] Saga completed successfully!");
        return true;
    }

    private void compensate(Map<String, Object> context) {
        System.out.println("[Orchestrator] Running compensating transactions...");
        // Compensate in reverse order
        ListIterator<SagaStep> it = completedSteps.listIterator(completedSteps.size());
        while (it.hasPrevious()) {
            SagaStep step = it.previous();
            System.out.printf("[Orchestrator] Compensating: %s%n", step.name);
            step.compensation.apply(context);
        }
        System.out.println("[Orchestrator] All compensations complete.");
    }
}
java
import java.util.*;

public class OrchestrationSagaDemo {

    static SagaOrchestrator.StepStatus createOrder(Map<String, Object> ctx) {
        String orderId = UUID.randomUUID().toString().substring(0, 8);
        ctx.put("orderId", orderId);
        ctx.put("orderStatus", "CREATED");
        System.out.printf("  [OrderService] Order %s created%n", orderId);
        return SagaOrchestrator.StepStatus.SUCCESS;
    }

    static SagaOrchestrator.StepStatus cancelOrder(Map<String, Object> ctx) {
        ctx.put("orderStatus", "CANCELLED");
        System.out.printf("  [OrderService] Order %s cancelled%n", ctx.get("orderId"));
        return SagaOrchestrator.StepStatus.SUCCESS;
    }

    static SagaOrchestrator.StepStatus reserveStock(Map<String, Object> ctx) {
        System.out.printf("  [InventoryService] Stock reserved for order %s%n", ctx.get("orderId"));
        ctx.put("stockReserved", true);
        return SagaOrchestrator.StepStatus.SUCCESS;
    }

    static SagaOrchestrator.StepStatus releaseStock(Map<String, Object> ctx) {
        System.out.printf("  [InventoryService] Stock released for order %s%n", ctx.get("orderId"));
        ctx.put("stockReserved", false);
        return SagaOrchestrator.StepStatus.SUCCESS;
    }

    static SagaOrchestrator.StepStatus processPayment(Map<String, Object> ctx) {
        boolean shouldFail = (boolean) ctx.getOrDefault("simulatePaymentFailure", false);
        if (shouldFail) {
            System.out.printf("  [PaymentService] Payment DECLINED for order %s%n", ctx.get("orderId"));
            return SagaOrchestrator.StepStatus.FAILURE;
        }
        System.out.printf("  [PaymentService] Payment charged for order %s%n", ctx.get("orderId"));
        ctx.put("paymentProcessed", true);
        return SagaOrchestrator.StepStatus.SUCCESS;
    }

    static SagaOrchestrator.StepStatus refundPayment(Map<String, Object> ctx) {
        System.out.printf("  [PaymentService] Payment refunded for order %s%n", ctx.get("orderId"));
        ctx.put("paymentProcessed", false);
        return SagaOrchestrator.StepStatus.SUCCESS;
    }

    static SagaOrchestrator.StepStatus scheduleShipment(Map<String, Object> ctx) {
        System.out.printf("  [ShippingService] Shipment scheduled for order %s%n", ctx.get("orderId"));
        return SagaOrchestrator.StepStatus.SUCCESS;
    }

    static SagaOrchestrator.StepStatus cancelShipment(Map<String, Object> ctx) {
        System.out.printf("  [ShippingService] Shipment cancelled for order %s%n", ctx.get("orderId"));
        return SagaOrchestrator.StepStatus.SUCCESS;
    }

    public static void main(String[] args) {
        // Happy path
        System.out.println("========== HAPPY PATH ==========");
        SagaOrchestrator saga1 = new SagaOrchestrator()
            .addStep("Create Order",        OrchestrationSagaDemo::createOrder,      OrchestrationSagaDemo::cancelOrder)
            .addStep("Reserve Stock",       OrchestrationSagaDemo::reserveStock,     OrchestrationSagaDemo::releaseStock)
            .addStep("Process Payment",     OrchestrationSagaDemo::processPayment,   OrchestrationSagaDemo::refundPayment)
            .addStep("Schedule Shipment",   OrchestrationSagaDemo::scheduleShipment, OrchestrationSagaDemo::cancelShipment);

        Map<String, Object> ctx1 = new HashMap<>();
        saga1.execute(ctx1);
        System.out.printf("Final order status: %s%n", ctx1.get("orderStatus"));

        // Failure path
        System.out.println("\n========== FAILURE PATH (Payment Fails) ==========");
        SagaOrchestrator saga2 = new SagaOrchestrator()
            .addStep("Create Order",        OrchestrationSagaDemo::createOrder,      OrchestrationSagaDemo::cancelOrder)
            .addStep("Reserve Stock",       OrchestrationSagaDemo::reserveStock,     OrchestrationSagaDemo::releaseStock)
            .addStep("Process Payment",     OrchestrationSagaDemo::processPayment,   OrchestrationSagaDemo::refundPayment)
            .addStep("Schedule Shipment",   OrchestrationSagaDemo::scheduleShipment, OrchestrationSagaDemo::cancelShipment);

        Map<String, Object> ctx2 = new HashMap<>();
        ctx2.put("simulatePaymentFailure", true);
        saga2.execute(ctx2);
        System.out.printf("Final order status: %s%n", ctx2.get("orderStatus"));
    }
}

Choreography vs. Orchestration

Handling the Isolation Problem

Since sagas lack isolation, concurrent sagas can interfere with each other. Several countermeasures exist:

Semantic Lock Example

java
public class OrderWithSemanticLock {
    public enum Status { 
        PENDING,            // Semantic lock — saga in progress
        APPROVAL_PENDING,   // Semantic lock — awaiting approval  
        CONFIRMED,          // Final state
        CANCELLED           // Final state
    }

    private final String orderId;
    private Status status;
    private int version;

    public OrderWithSemanticLock(String orderId) {
        this.orderId = orderId;
        this.status = Status.PENDING; // Lock acquired
        this.version = 0;
    }

    public synchronized boolean tryTransition(Status expected, Status next) {
        if (this.status != expected) {
            System.out.printf("  Transition rejected: expected %s but was %s%n", expected, this.status);
            return false;
        }
        this.status = next;
        this.version++;
        System.out.printf("  Order %s: %s -> %s (v%d)%n", orderId, expected, next, version);
        return true;
    }

    public boolean isLocked() {
        return status == Status.PENDING || status == Status.APPROVAL_PENDING;
    }

    public static void main(String[] args) {
        OrderWithSemanticLock order = new OrderWithSemanticLock("ORD-100");
        System.out.println("Order locked (in saga): " + order.isLocked());

        // Saga step completes
        order.tryTransition(Status.PENDING, Status.APPROVAL_PENDING);
        System.out.println("Still locked: " + order.isLocked());

        // Final saga step — lock released
        order.tryTransition(Status.APPROVAL_PENDING, Status.CONFIRMED);
        System.out.println("Locked: " + order.isLocked());
    }
}

Saga State Machine

A robust orchestrator tracks the saga's state persistently to survive crashes:

AWS Integration Example

AWS Step Functions is a natural fit for the orchestration-based saga pattern. Each state machine step invokes a Lambda function, with catch blocks defining compensating actions.

Best Practices

  1. Design compensating transactions carefully: Compensations must be semantically correct undo operations, not necessarily database rollbacks. A payment compensation is a refund, not deleting the payment record.

  2. Make all steps idempotent: Network retries and at-least-once delivery mean steps may execute multiple times. Use idempotency keys to prevent duplicate side effects.

  3. Persist saga state: Store the saga's current step and context in a durable store (database or Step Functions) so it can resume after orchestrator crashes.

  4. Use correlation IDs: Every event and command in a saga should carry a unique saga ID (correlation ID) to trace and debug the entire transaction flow.

  5. Prefer orchestration for complex sagas: When you have more than three to four steps, choreography becomes difficult to reason about. Orchestration provides a single place to understand the complete flow.

  6. Implement timeouts on every step: Without timeouts, a saga can hang indefinitely if a participant service becomes unresponsive. Use deadlines to trigger compensation.

  7. Handle partial failures in compensation: Compensating transactions themselves can fail. Implement retry logic with exponential backoff for compensations, and have a dead letter queue for manual intervention.

  8. Apply semantic locks for isolation: Flag records that are part of an in-progress saga so concurrent sagas can detect and handle conflicts appropriately.

  9. Order steps to minimize compensation cost: Place the most likely-to-fail or most expensive-to-compensate steps as early as possible (before the pivot transaction) to reduce wasted work.

  10. Test failure scenarios extensively: Happy path testing is insufficient. Test every possible failure point and verify that compensations restore the system to a consistent state.