Skip to content

Strategies for Microservices Interdependencies with Java

Introduction

Microservices architectures decompose monolithic systems into independently deployable services, but this decomposition introduces complex interdependencies that must be carefully managed. When Service A depends on Service B, which depends on Service C, a failure anywhere in the chain can cascade catastrophically. This article explores proven strategies — circuit breakers, bulkheads, sagas, event-driven choreography, API gateways, and service meshes — with complete Java implementations that demonstrate how to build resilient, loosely-coupled microservice ecosystems.

Core Concepts

The Dependency Problem

In a monolith, method calls between modules are in-process and nearly instantaneous. In microservices, every call crosses a network boundary, introducing latency, partial failures, and the need for serialization. Interdependencies fall into several categories:

  • Synchronous (request-response): Service A calls Service B and blocks until a response arrives. This is the tightest form of coupling.
  • Asynchronous (event-driven): Service A publishes an event; interested services consume it independently. This is the loosest form of coupling.
  • Orchestrated: A central coordinator directs the workflow across services.
  • Choreographed: Each service reacts to events and emits new events without a central coordinator.

Key Resilience Patterns

PatternPurposeCoupling Impact
Circuit BreakerPrevents cascading failures by short-circuiting calls to unhealthy servicesReduces temporal coupling
BulkheadIsolates resources so one failing dependency doesn't starve othersLimits blast radius
Retry with BackoffHandles transient failures gracefullyMaintains availability
SagaCoordinates distributed transactions without 2PCManages data consistency
Event-Driven ChoreographyDecouples services via asynchronous eventsEliminates synchronous coupling
API GatewayCentralizes cross-cutting concerns and aggregates callsSimplifies client coupling

Strategy 1: Circuit Breaker Pattern

The circuit breaker prevents a service from repeatedly calling a failing dependency. It tracks failure rates and "opens" the circuit when a threshold is exceeded, returning a fallback response immediately instead of waiting for timeouts.

Java Implementation with Resilience4j

java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.Supplier;

public class InventoryServiceClient {

    private final HttpClient httpClient;
    private final CircuitBreaker circuitBreaker;
    private final Retry retry;
    private final String baseUrl;

    public InventoryServiceClient(String baseUrl) {
        this.baseUrl = baseUrl;
        this.httpClient = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(3))
                .build();

        // Circuit breaker: opens after 50% failure rate in 10 calls
        CircuitBreakerConfig cbConfig = CircuitBreakerConfig.custom()
                .failureRateThreshold(50)
                .slidingWindowSize(10)
                .waitDurationInOpenState(Duration.ofSeconds(30))
                .permittedNumberOfCallsInHalfOpenState(3)
                .slowCallDurationThreshold(Duration.ofSeconds(2))
                .slowCallRateThreshold(80)
                .build();

        CircuitBreakerRegistry registry = CircuitBreakerRegistry.of(cbConfig);
        this.circuitBreaker = registry.circuitBreaker("inventoryService");

        // Retry: 3 attempts with exponential backoff
        RetryConfig retryConfig = RetryConfig.custom()
                .maxAttempts(3)
                .waitDuration(Duration.ofMillis(500))
                .retryExceptions(java.io.IOException.class)
                .build();
        this.retry = Retry.of("inventoryRetry", retryConfig);
    }

    public InventoryResponse checkInventory(String productId) {
        Supplier<InventoryResponse> decoratedSupplier = CircuitBreaker
                .decorateSupplier(circuitBreaker, () -> callInventoryService(productId));

        // Wrap with retry (retry wraps circuit breaker)
        decoratedSupplier = Retry.decorateSupplier(retry, decoratedSupplier);

        try {
            return decoratedSupplier.get();
        } catch (Exception e) {
            System.err.println("Circuit breaker state: " + circuitBreaker.getState());
            System.err.println("Falling back for product: " + productId);
            return fallbackResponse(productId);
        }
    }

    private InventoryResponse callInventoryService(String productId) {
        try {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(baseUrl + "/api/inventory/" + productId))
                    .timeout(Duration.ofSeconds(5))
                    .GET()
                    .build();

            HttpResponse<String> response = httpClient.send(
                    request, HttpResponse.BodyHandlers.ofString());

            if (response.statusCode() == 200) {
                return InventoryResponse.fromJson(response.body());
            }
            throw new RuntimeException("Inventory service returned: " + response.statusCode());
        } catch (Exception e) {
            throw new RuntimeException("Inventory service call failed", e);
        }
    }

    private InventoryResponse fallbackResponse(String productId) {
        // Return a degraded but safe response
        return new InventoryResponse(productId, -1, "UNKNOWN",
                "Inventory data temporarily unavailable");
    }

    public static void main(String[] args) {
        InventoryServiceClient client = new InventoryServiceClient("http://inventory-svc:8080");

        // Simulate multiple calls — circuit breaker will open after failures
        for (int i = 0; i < 15; i++) {
            InventoryResponse response = client.checkInventory("PROD-" + (i % 3));
            System.out.printf("Call %d: product=%s, qty=%d, status=%s%n",
                    i, response.productId(), response.quantity(), response.status());
        }
    }
}

record InventoryResponse(String productId, int quantity, String status, String message) {
    static InventoryResponse fromJson(String json) {
        // Simplified parsing — use Jackson in production
        return new InventoryResponse("parsed-id", 42, "IN_STOCK", "OK");
    }
}

Strategy 2: Bulkhead Pattern

The bulkhead pattern isolates different service dependencies into separate thread pools or semaphores, preventing a slow dependency from consuming all available threads.

Java Implementation

java
import io.github.resilience4j.bulkhead.Bulkhead;
import io.github.resilience4j.bulkhead.BulkheadConfig;
import io.github.resilience4j.bulkhead.BulkheadRegistry;

import java.time.Duration;
import java.util.concurrent.*;
import java.util.function.Supplier;

public class BulkheadOrderService {

    private final Bulkhead inventoryBulkhead;
    private final Bulkhead paymentBulkhead;
    private final Bulkhead shippingBulkhead;

    public BulkheadOrderService() {
        // Each dependency gets its own concurrency limit
        this.inventoryBulkhead = Bulkhead.of("inventory", BulkheadConfig.custom()
                .maxConcurrentCalls(10)
                .maxWaitDuration(Duration.ofMillis(500))
                .build());

        this.paymentBulkhead = Bulkhead.of("payment", BulkheadConfig.custom()
                .maxConcurrentCalls(5)
                .maxWaitDuration(Duration.ofMillis(1000))
                .build());

        this.shippingBulkhead = Bulkhead.of("shipping", BulkheadConfig.custom()
                .maxConcurrentCalls(8)
                .maxWaitDuration(Duration.ofMillis(300))
                .build());

        // Register event listeners for observability
        inventoryBulkhead.getEventPublisher()
                .onCallRejected(event ->
                    System.err.println("Inventory bulkhead rejected call: " + event));
    }

    public OrderResult processOrder(Order order) {
        // Each external call is isolated in its own bulkhead
        Supplier<Boolean> inventoryCheck = Bulkhead.decorateSupplier(
                inventoryBulkhead, () -> checkInventory(order));

        Supplier<String> paymentResult = Bulkhead.decorateSupplier(
                paymentBulkhead, () -> processPayment(order));

        Supplier<String> shippingResult = Bulkhead.decorateSupplier(
                shippingBulkhead, () -> scheduleShipping(order));

        try {
            boolean hasStock = inventoryCheck.get();
            if (!hasStock) {
                return new OrderResult(order.id(), "REJECTED", "Out of stock");
            }

            String paymentId = paymentResult.get();
            String trackingId = shippingResult.get();

            return new OrderResult(order.id(), "COMPLETED",
                    "Payment: " + paymentId + ", Tracking: " + trackingId);
        } catch (Exception e) {
            return new OrderResult(order.id(), "FAILED", e.getMessage());
        }
    }

    private boolean checkInventory(Order order) {
        simulateNetworkCall(100);
        return true;
    }

    private String processPayment(Order order) {
        simulateNetworkCall(200);
        return "PAY-" + System.nanoTime();
    }

    private String scheduleShipping(Order order) {
        simulateNetworkCall(150);
        return "TRACK-" + System.nanoTime();
    }

    private void simulateNetworkCall(long millis) {
        try { Thread.sleep(millis); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) throws Exception {
        BulkheadOrderService service = new BulkheadOrderService();
        ExecutorService executor = Executors.newFixedThreadPool(20);

        // Simulate concurrent orders
        for (int i = 0; i < 20; i++) {
            final int orderId = i;
            executor.submit(() -> {
                Order order = new Order("ORD-" + orderId, "PROD-1", 2, 29.99);
                OrderResult result = service.processOrder(order);
                System.out.printf("Order %s: %s — %s%n",
                        result.orderId(), result.status(), result.details());
            });
        }

        executor.shutdown();
        executor.awaitTermination(30, TimeUnit.SECONDS);
    }
}

record Order(String id, String productId, int quantity, double amount) {}
record OrderResult(String orderId, String status, String details) {}

Strategy 3: Saga Pattern (Orchestration)

Sagas manage distributed transactions by breaking them into local transactions with compensating actions. The orchestrator coordinates the sequence and handles rollbacks.

Java Implementation

java
import java.util.*;
import java.util.function.Function;

public class SagaOrchestrator {

    private final List<SagaStep> steps = new ArrayList<>();
    private final List<SagaStep> completedSteps = new ArrayList<>();

    public SagaOrchestrator addStep(String name,
                                     Function<SagaContext, StepResult> action,
                                     Function<SagaContext, StepResult> compensation) {
        steps.add(new SagaStep(name, action, compensation));
        return this;
    }

    public SagaResult execute(SagaContext context) {
        System.out.println("=== Starting Saga ===");

        for (SagaStep step : steps) {
            System.out.printf("Executing step: %s%n", step.name());
            try {
                StepResult result = step.action().apply(context);
                if (result.success()) {
                    completedSteps.add(step);
                    context.put(step.name() + "_result", result.data());
                    System.out.printf("  Step %s succeeded: %s%n", step.name(), result.data());
                } else {
                    System.out.printf("  Step %s failed: %s%n", step.name(), result.data());
                    return compensate(context, step.name() + " failed: " + result.data());
                }
            } catch (Exception e) {
                System.out.printf("  Step %s threw exception: %s%n", step.name(), e.getMessage());
                return compensate(context, step.name() + " exception: " + e.getMessage());
            }
        }

        System.out.println("=== Saga Completed Successfully ===");
        return new SagaResult(true, "All steps completed", context);
    }

    private SagaResult compensate(SagaContext context, String reason) {
        System.out.println("=== Starting Compensation ===");

        // Compensate in reverse order
        Collections.reverse(completedSteps);
        List<String> compensationErrors = new ArrayList<>();

        for (SagaStep step : completedSteps) {
            System.out.printf("Compensating step: %s%n", step.name());
            try {
                StepResult result = step.compensation().apply(context);
                if (result.success()) {
                    System.out.printf("  Compensation %s succeeded%n", step.name());
                } else {
                    compensationErrors.add(step.name() + ": " + result.data());
                    System.err.printf("  Compensation %s failed: %s%n", step.name(), result.data());
                }
            } catch (Exception e) {
                compensationErrors.add(step.name() + ": " + e.getMessage());
                System.err.printf("  Compensation %s threw: %s%n", step.name(), e.getMessage());
            }
        }

        String message = "Saga rolled back. Reason: " + reason;
        if (!compensationErrors.isEmpty()) {
            message += ". Compensation errors: " + compensationErrors;
        }
        return new SagaResult(false, message, context);
    }

    public static void main(String[] args) {
        SagaOrchestrator saga = new SagaOrchestrator()
                .addStep("reserveInventory",
                        ctx -> {
                            String reservationId = "RES-" + System.nanoTime();
                            ctx.put("reservationId", reservationId);
                            return new StepResult(true, reservationId);
                        },
                        ctx -> {
                            System.out.println("  Releasing reservation: " + ctx.get("reservationId"));
                            return new StepResult(true, "released");
                        })
                .addStep("chargePayment",
                        ctx -> {
                            String paymentId = "PAY-" + System.nanoTime();
                            ctx.put("paymentId", paymentId);
                            return new StepResult(true, paymentId);
                        },
                        ctx -> {
                            System.out.println("  Refunding payment: " + ctx.get("paymentId"));
                            return new StepResult(true, "refunded");
                        })
                .addStep("scheduleShipment",
                        ctx -> {
                            // Simulate failure
                            return new StepResult(false, "No carriers available");
                        },
                        ctx -> new StepResult(true, "shipment cancelled"));

        SagaContext context = new SagaContext();
        context.put("orderId", "ORD-12345");
        context.put("customerId", "CUST-789");

        SagaResult result = saga.execute(context);
        System.out.printf("%nFinal result: success=%b, message=%s%n", result.success(), result.message());
    }
}

record SagaStep(String name,
                Function<SagaContext, StepResult> action,
                Function<SagaContext, StepResult> compensation) {}

record StepResult(boolean success, String data) {}
record SagaResult(boolean success, String message, SagaContext context) {}

class SagaContext {
    private final Map<String, String> data = new HashMap<>();

    public void put(String key, String value) { data.put(key, value); }
    public String get(String key) { return data.get(key); }
}

Strategy 4: Event-Driven Choreography

Instead of a central orchestrator, each service listens for events and reacts independently. This dramatically reduces coupling but requires careful design to maintain observability.

Java Implementation with AWS SNS/SQS

java
import software.amazon.awssdk.services.sns.SnsClient;
import software.amazon.awssdk.services.sns.model.PublishRequest;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.*;

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

public class EventDrivenService {

    private final SnsClient snsClient;
    private final SqsClient sqsClient;
    private final String topicArn;
    private final String queueUrl;
    private final Map<String, Consumer<DomainEvent>> handlers = new ConcurrentHashMap<>();
    private volatile boolean running = true;

    public EventDrivenService(String topicArn, String queueUrl) {
        this.snsClient = SnsClient.create();
        this.sqsClient = SqsClient.create();
        this.topicArn = topicArn;
        this.queueUrl = queueUrl;
    }

    // Publish an event to the event bus
    public void publishEvent(DomainEvent event) {
        String messageBody = event.toJson();

        PublishRequest request = PublishRequest.builder()
                .topicArn(topicArn)
                .message(messageBody)
                .messageAttributes(Map.of(
                        "eventType", software.amazon.awssdk.services.sns.model
                                .MessageAttributeValue.builder()
                                .dataType("String")
                                .stringValue(event.type())
                                .build()
                ))
                .build();

        snsClient.publish(request);
        System.out.printf("Published event: %s [%s]%n", event.type(), event.aggregateId());
    }

    // Register an event handler for a specific event type
    public void on(String eventType, Consumer<DomainEvent> handler) {
        handlers.put(eventType, handler);
    }

    // Start polling for events
    public void startListening() {
        Thread pollerThread = new Thread(() -> {
            while (running) {
                ReceiveMessageRequest receiveRequest = ReceiveMessageRequest.builder()
                        .queueUrl(queueUrl)
                        .maxNumberOfMessages(10)
                        .waitTimeSeconds(20) // Long polling
                        .messageAttributeNames("All")
                        .build();

                ReceiveMessageResponse response = sqsClient.receiveMessage(receiveRequest);

                for (Message message : response.messages()) {
                    try {
                        DomainEvent event = DomainEvent.fromJson(message.body());
                        Consumer<DomainEvent> handler = handlers.get(event.type());

                        if (handler != null) {
                            handler.accept(event);
                            // Delete message after successful processing
                            sqsClient.deleteMessage(DeleteMessageRequest.builder()
                                    .queueUrl(queueUrl)
                                    .receiptHandle(message.receiptHandle())
                                    .build());
                        }
                    } catch (Exception e) {
                        System.err.println("Failed to process message: " + e.getMessage());
                        // Message remains visible after visibility timeout for retry
                    }
                }
            }
        }, "event-poller");
        pollerThread.setDaemon(true);
        pollerThread.start();
    }

    public void stop() { this.running = false; }

    public static void main(String[] args) {
        // Inventory service setup
        EventDrivenService inventoryService = new EventDrivenService(
                "arn:aws:sns:us-east-1:123456789:order-events",
                "https://sqs.us-east-1.amazonaws.com/123456789/inventory-queue"
        );

        inventoryService.on("OrderCreated", event -> {
            System.out.println("Reserving inventory for order: " + event.aggregateId());
            // Reserve stock in local database
            DomainEvent reserved = new DomainEvent(
                    "InventoryReserved", event.aggregateId(),
                    "{\"reservationId\": \"RES-001\"}");
            inventoryService.publishEvent(reserved);
        });

        inventoryService.on("PaymentFailed", event -> {
            System.out.println("Releasing inventory for order: " + event.aggregateId());
            // Compensating action: release reserved stock
        });

        inventoryService.startListening();
    }
}

record DomainEvent(String type, String aggregateId, String payload) {
    String toJson() {
        return String.format(
                "{\"type\":\"%s\",\"aggregateId\":\"%s\",\"payload\":%s}",
                type, aggregateId, payload);
    }

    static DomainEvent fromJson(String json) {
        // Simplified — use Jackson in production
        return new DomainEvent("OrderCreated", "ORD-123", "{}");
    }
}

Strategy 5: API Gateway Aggregation

An API gateway can aggregate calls to multiple downstream services, shielding clients from internal interdependencies and reducing client-side complexity.

Java Implementation with CompletableFuture

java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class ApiGatewayAggregator {

    private final HttpClient client;

    public ApiGatewayAggregator() {
        this.client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
    }

    public CompletableFuture<AggregatedDashboard> getDashboard(String userId) {
        // Fan out to three services in parallel
        CompletableFuture<String> profileFuture = callService(
                "http://user-svc:8080/api/users/" + userId)
                .exceptionally(ex -> "{\"name\": \"Unknown\", \"degraded\": true}");

        CompletableFuture<String> ordersFuture = callService(
                "http://order-svc:8080/api/orders?userId=" + userId)
                .exceptionally(ex -> "{\"orders\": [], \"degraded\": true}");

        CompletableFuture<String> recommendationsFuture = callService(
                "http://reco-svc:8080/api/recommendations/" + userId)
                .exceptionally(ex -> "{\"items\": [], \"degraded\": true}");

        // Combine all three results with a timeout
        return profileFuture.thenCombine(ordersFuture, (profile, orders) ->
                new PartialResult(profile, orders))
            .thenCombine(recommendationsFuture, (partial, recommendations) ->
                new AggregatedDashboard(
                        partial.profile(),
                        partial.orders(),
                        recommendations))
            .orTimeout(5, TimeUnit.SECONDS)
            .exceptionally(ex -> new AggregatedDashboard(
                    "{\"degraded\": true}",
                    "{\"degraded\": true}",
                    "{\"degraded\": true}"));
    }

    private CompletableFuture<String> callService(String url) {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .timeout(Duration.ofSeconds(3))
                .GET()
                .build();

        return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(HttpResponse::body);
    }

    public static void main(String[] args) throws Exception {
        ApiGatewayAggregator gateway = new ApiGatewayAggregator();

        AggregatedDashboard dashboard = gateway.getDashboard("user-42").get();
        System.out.println("Profile: " + dashboard.profile());
        System.out.println("Orders: " + dashboard.orders());
        System.out.println("Recommendations: " + dashboard.recommendations());
    }
}

record PartialResult(String profile, String orders) {}
record AggregatedDashboard(String profile, String orders, String recommendations) {}

Strategy 6: Service Discovery and Health Checks

Dynamic service discovery eliminates hard-coded URLs and enables automatic failover when service instances become unhealthy.

java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class ClientSideServiceDiscovery {

    private final Map<String, List<ServiceInstance>> registry = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> roundRobinCounters = new ConcurrentHashMap<>();
    private final HttpClient httpClient;
    private final ScheduledExecutorService healthChecker;

    public ClientSideServiceDiscovery() {
        this.httpClient = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        this.healthChecker = Executors.newScheduledThreadPool(1);
        this.healthChecker.scheduleAtFixedRate(this::performHealthChecks, 10, 10, TimeUnit.SECONDS);
    }

    public void register(String serviceName, String host, int port) {
        ServiceInstance instance = new ServiceInstance(host, port, true);
        registry.computeIfAbsent(serviceName, k -> new CopyOnWriteArrayList<>()).add(instance);
        roundRobinCounters.putIfAbsent(serviceName, new AtomicInteger(0));
        System.out.printf("Registered %s at %s:%d%n", serviceName, host, port);
    }

    public Optional<ServiceInstance> resolve(String serviceName) {
        List<ServiceInstance> instances = registry.getOrDefault(serviceName, List.of());
        List<ServiceInstance> healthy = instances.stream()
                .filter(ServiceInstance::healthy)
                .toList();

        if (healthy.isEmpty()) return Optional.empty();

        // Round-robin load balancing
        AtomicInteger counter = roundRobinCounters.get(serviceName);
        int index = Math.abs(counter.getAndIncrement() % healthy.size());
        return Optional.of(healthy.get(index));
    }

    private void performHealthChecks() {
        registry.forEach((serviceName, instances) -> {
            for (ServiceInstance instance : instances) {
                try {
                    HttpRequest req = HttpRequest.newBuilder()
                            .uri(URI.create("http://" + instance.host() + ":" + instance.port() + "/health"))
                            .timeout(Duration.ofSeconds(2))
                            .GET().build();
                    HttpResponse<Void> resp = httpClient.send(req, HttpResponse.BodyHandlers.discarding());
                    instance.setHealthy(resp.statusCode() == 200);
                } catch (Exception e) {
                    instance.setHealthy(false);
                }
            }
        });
    }

    public static void main(String[] args) {
        ClientSideServiceDiscovery discovery = new ClientSideServiceDiscovery();
        discovery.register("inventory-service", "10.0.1.10", 8080);
        discovery.register("inventory-service", "10.0.1.11", 8080);
        discovery.register("inventory-service", "10.0.1.12", 8080);

        for (int i = 0; i < 6; i++) {
            Optional<ServiceInstance> instance = discovery.resolve("inventory-service");
            instance.ifPresentOrElse(
                    inst -> System.out.printf("Resolved to: %s:%d%n", inst.host(), inst.port()),
                    () -> System.out.println("No healthy instances available")
            );
        }
    }
}

class ServiceInstance {
    private final String host;
    private final int port;
    private volatile boolean healthy;

    ServiceInstance(String host, int port, boolean healthy) {
        this.host = host;
        this.port = port;
        this.healthy = healthy;
    }

    String host() { return host; }
    int port() { return port; }
    boolean healthy() { return healthy; }
    void setHealthy(boolean healthy) { this.healthy = healthy; }
}

Combining Strategies: Full Architecture

Decision Matrix: Choosing the Right Strategy

Best Practices

  1. Prefer asynchronous communication: Use event-driven patterns as the default; fall back to synchronous calls only when real-time responses are strictly required.
  2. Always implement circuit breakers on synchronous calls: Every HTTP or gRPC call to another service must be protected by a circuit breaker to prevent cascading failures.
  3. Design compensating transactions explicitly: Every saga step must have a corresponding compensation action defined before deployment — never assume rollback is trivial.
  4. Use bulkheads to isolate critical dependencies: Allocate separate thread pools or semaphores for each downstream service to contain blast radius.
  5. Implement idempotency on all event handlers: Events can be delivered more than once; use idempotency keys to ensure safe reprocessing without side effects.
  6. Set timeouts at every layer: Connection timeout, read timeout, and overall operation timeout must all be explicitly configured — never rely on defaults.
  7. Centralize observability with correlation IDs: Propagate a unique correlation ID through all synchronous and asynchronous interactions to enable distributed tracing.
  8. Version your events and APIs: Use semantic versioning on event schemas and API contracts to enable independent deployment of services.
  9. Test failure scenarios in CI: Use chaos engineering techniques — inject latency, errors, and timeouts in integration tests to verify resilience patterns work correctly.
  10. Monitor circuit breaker and bulkhead metrics: Track open/closed state transitions, rejection rates, and retry counts — alert on anomalies before users notice.