Event-Driven Architecture
Synchronous REST works well when the caller needs an immediate answer. But when an order is placed, the system might need to notify 5 downstream services — email, inventory, fraud detection, shipping, analytics. Calling them all synchronously before returning to the user creates a latency cliff and a coupling explosion. Event-driven architecture solves this with a fundamentally different contract: publish an event, and let interested parties react asynchronously.
Why Event-Driven?
Two delivery models side by side: the synchronous chain forces the producer to wait for every downstream service before it can respond; the async model returns immediately after publishing.
The producer must wait for every downstream service to complete before returning a response to the client. Latency accumulates, and a single slow (or failing) dependency stalls the entire request.
sequenceDiagram
autonumber
participant Client
participant OrderService
participant EmailService
participant InventoryService
participant FraudService
Client->>+OrderService: POST /orders
OrderService->>+EmailService: sendConfirmation()
EmailService-->>-OrderService: ok
OrderService->>+InventoryService: reserveStock()
InventoryService-->>-OrderService: ok
OrderService->>+FraudService: checkFraud()
FraudService-->>-OrderService: ok
OrderService-->>-Client: 201 Created
The producer publishes a single event and immediately returns. Each downstream service consumes the event independently: in parallel, on its own schedule, without the producer knowing or caring.
sequenceDiagram
participant Client
participant OrderService
participant Kafka
participant EmailService
participant InventoryService
participant FraudService
Client->>+OrderService: POST /orders
OrderService->>Kafka: publish OrderCreated
OrderService-->>-Client: 202 Accepted
Kafka-->>EmailService: OrderCreated (async)
Kafka-->>InventoryService: OrderCreated (async)
Kafka-->>FraudService: OrderCreated (async)
| Concern | Synchronous (REST) | Asynchronous (Event) |
|---|---|---|
| Latency | Sum of all calls | Only producer latency |
| Coupling | Producer knows all consumers | Producer knows none |
| Resilience | One slow or failing dependency stalls the whole request | Consumer failures are independent |
| Throughput | Limited by slowest dependency | Consumers scale independently |
| Debugging | Simple request trace | Requires distributed tracing |
| Consistency | Strong (synchronous) | Eventual |
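
To make the latency row concrete, here is a small arithmetic sketch. The per-service latencies, the local processing time, and the publish cost are hypothetical numbers chosen for illustration, not measurements.

```python
# Hypothetical per-call latencies in milliseconds (illustrative only).
downstream_ms = {"email": 120, "inventory": 80, "fraud": 200}
order_service_ms = 30   # assumed local work inside OrderService
publish_ms = 5          # assumed cost of publishing one event

# Synchronous chain: the client waits for every call, one after another.
sync_latency_ms = order_service_ms + sum(downstream_ms.values())

# Asynchronous: the client waits only for local work plus the publish.
async_latency_ms = order_service_ms + publish_ms

print(sync_latency_ms, async_latency_ms)
```

Under these assumptions, adding a fourth downstream service raises the synchronous figure but leaves the asynchronous one unchanged.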
When the consumer cannot keep up with the producer, queue depth grows for as long as the imbalance lasts; once the consumer's rate exceeds the producer's, the backlog drains and depth falls back towards zero.
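The backlog behaviour can be sketched as a small discrete-time simulation. This is a toy model, not a Kafka client: `simulate_queue_depth` and the rates used are hypothetical, and each tick simply adds produced messages and removes as many as the consumer can handle.

```python
def simulate_queue_depth(produce_rates, consume_rate):
    """Return the queue depth after each tick.

    produce_rates: messages produced per tick (one entry per tick)
    consume_rate:  maximum messages the consumer can process per tick
    """
    depth = 0
    history = []
    for produced in produce_rates:
        depth += produced                   # new messages arrive
        depth -= min(depth, consume_rate)   # consumer drains what it can
        history.append(depth)
    return history

# Five ticks of overload (10 in, 6 out), then the producer slows to 2 per tick.
history = simulate_queue_depth([10] * 5 + [2] * 5, consume_rate=6)
print(history)
```

Depth climbs by 4 per tick during the overload and falls by 4 per tick once the producer slows, which is the lag curve an operator would watch on a consumer-lag dashboard.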
Core Concepts
| Concept | Definition | Example |
|---|---|---|
| Event | An immutable record of something that happened | OrderCreated { orderId, customerId, timestamp } |
| Message | Generic data container sent between services | Any payload in a queue |
| Command | An instruction to do something (can be rejected) | PlaceOrder { items, paymentToken } |
| Topic | A named, durable log of events, ordered within each partition | order-events |
| Partition | A shard of a topic for parallelism | Partition 0, 1, 2 of order-events |
| Offset | Sequential integer identifying a message position | Offset 42 in partition 1 |
| Consumer Group | A set of consumers sharing the work of a topic | notification-service group |
Events are facts
Unlike database records, events are never updated or deleted — the log is append-only. This immutability makes events safe to replay for rebuilding state, auditing history, or populating new projections without touching the original data.
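A minimal sketch of the "events are facts" idea, assuming a frozen dataclass as the event type. The field names follow the `OrderCreated` example in the table above; the in-memory `log` list is a stand-in for a real append-only store.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone

@dataclass(frozen=True)
class OrderCreated:
    order_id: str
    customer_id: str
    timestamp: datetime

log = []  # append-only by convention: the sketch only ever calls append()

event = OrderCreated("o-1", "c-42", datetime(2024, 1, 1, tzinfo=timezone.utc))
log.append(event)

try:
    event.order_id = "o-2"   # attempting to rewrite history
    mutated = True
except FrozenInstanceError:
    mutated = False          # frozen dataclasses reject attribute assignment
```

A correction is expressed by appending a new event (for example a hypothetical `OrderAmended`), never by editing an old one.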
Kafka Architecture
Kafka organises messages into topics, each split into ordered partitions. Multiple consumer groups can read the same topic independently — each group maintains its own offset pointer per partition.
flowchart LR
subgraph broker ["Kafka Broker"]
subgraph topic ["Topic: order-events"]
P0["Partition 0\n0, 1, 2, 3, →"]
P1["Partition 1\n0, 1, 2, →"]
P2["Partition 2\n0, 1, →"]
end
end
subgraph ng ["notification-group"]
N0[Consumer A]
N1[Consumer B]
N2[Consumer C]
end
subgraph ag ["analytics-group"]
A0[Consumer X]
end
P0 --> N0
P1 --> N1
P2 --> N2
P0 --> A0
P1 --> A0
P2 --> A0
Each consumer group tracks its own offsets, so notification-group and analytics-group progress through the log independently. Adding a new consumer group never affects existing groups.
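The relationship between partitions, offsets, and consumer groups can be sketched with a toy in-memory model. `Topic` and `ConsumerGroup` are hypothetical classes, not a Kafka client API, and the CRC32-based partitioning is an assumption made for the sketch (Kafka's own default partitioner uses a different hash).

```python
import zlib
from collections import defaultdict

class Topic:
    """Toy in-memory topic: a list of append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Same key -> same partition, so per-key order is preserved.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1   # (partition, offset)

class ConsumerGroup:
    """Tracks one offset per partition, independently of other groups."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)

    def poll(self, partition, max_records=10):
        start = self.offsets[partition]
        records = self.topic.partitions[partition][start:start + max_records]
        self.offsets[partition] = start + len(records)   # advance this group's pointer
        return records

topic = Topic(3)
p, first_offset = topic.publish("order-1", "OrderCreated#1")
topic.publish("order-1", "OrderPaid#1")

notification = ConsumerGroup(topic)
analytics = ConsumerGroup(topic)
first = notification.poll(p, max_records=1)   # notification reads one record
both = analytics.poll(p)                      # analytics reads everything
```

After these calls the two groups sit at different offsets in the same partition, and neither has affected the other.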
Retention policies control how long Kafka keeps messages:
| Policy | Config | Behavior |
|---|---|---|
| Time-based | retention.ms=604800000 | Delete messages older than 7 days |
| Size-based | retention.bytes=1073741824 | Delete the oldest segments once a partition exceeds 1 GiB (the limit applies per partition, not per topic) |
| Compact | cleanup.policy=compact | Keep only the latest message per key (useful for state snapshots) |
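
The effect of the compact policy can be illustrated with a short sketch. The `compact` function is hypothetical and shows only the end result; real compaction runs in the background on older log segments, so superseded values can remain visible for some time.

```python
def compact(log):
    """Keep only the latest (key, value) entry per key, preserving log order."""
    latest_index = {key: i for i, (key, _) in enumerate(log)}
    return [entry for i, entry in enumerate(log) if latest_index[entry[0]] == i]

log = [
    ("cust-1", "addr A"),
    ("cust-2", "addr X"),
    ("cust-1", "addr B"),   # supersedes the first cust-1 entry
]
compacted = compact(log)
print(compacted)
```

A consumer that reads the compacted log from the start ends up with one current value per key, which is why the policy suits state snapshots.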
Messaging Patterns
Pub/Sub (Fan-out)
One published event is delivered to every interested consumer group. Each group receives a full, independent copy of the message. Because groups track their own offsets, a slow analytics pipeline does not affect the notification pipeline.
Use cases: email notifications, audit logs, analytics pipelines, cache invalidation.
Competing Consumers
Multiple consumers within the same group divide the topic's partitions among themselves. Each partition is owned by exactly one consumer in the group at a time, so each message is processed by one consumer per group rather than by all of them. This pattern enables horizontal scaling of processing without duplicated work, up to one consumer per partition.
Use cases: order processing, payment handling, image resizing jobs.
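A simplified sketch of how partitions might be divided within a group. `assign_partitions` is a hypothetical round-robin helper; Kafka ships several assignment strategies and this is not a reproduction of any one of them.

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment: every partition gets exactly one owner."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment

three = assign_partitions([0, 1, 2], ["A", "B", "C"])
two = assign_partitions([0, 1, 2], ["A", "B"])            # C left: A takes over partition 2
four = assign_partitions([0, 1, 2], ["A", "B", "C", "D"])  # more consumers than partitions
print(three, two, four)
```

The last case shows the scaling limit: with three partitions, a fourth consumer in the same group receives nothing to do.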
Event Sourcing
Instead of storing only the current state, the system stores the full history of events. Current state is derived by replaying all events from the beginning (or from a snapshot).
sequenceDiagram
actor User
User->>+CommandHandler: PlaceOrder
CommandHandler->>+EventStore: append OrderCreated
EventStore-->>-CommandHandler: ack
CommandHandler-->>-User: 202 Accepted
EventStore-->>Projection: OrderCreated event
Projection->>ReadModel: update order view
This means the write model is the event log itself. Projections (read models) are rebuilt from events at any time, enabling time travel debugging and zero-downtime schema migration.
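Deriving state by replay can be sketched as a fold over the event list. The event types `ItemAdded` and `OrderShipped`, and the shape of the state dictionary, are hypothetical additions for illustration; only `OrderCreated` appears in the text above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    type: str
    data: dict

def apply(state, event):
    """Pure function: previous state + event -> next state."""
    if event.type == "OrderCreated":
        return {"status": "created", "items": list(event.data["items"])}
    if event.type == "ItemAdded":
        return {**state, "items": state["items"] + [event.data["item"]]}
    if event.type == "OrderShipped":
        return {**state, "status": "shipped"}
    return state  # unknown event types are ignored

def replay(events, snapshot=None):
    """Fold events over a starting state (empty, or a saved snapshot)."""
    state = snapshot or {}
    for event in events:
        state = apply(state, event)
    return state

events = [
    Event("OrderCreated", {"items": ["book"]}),
    Event("ItemAdded", {"item": "pen"}),
    Event("OrderShipped", {}),
]
current = replay(events)
before_shipping = replay(events[:2])   # "time travel": state as of the second event
```

Because `apply` never mutates its inputs, replaying a prefix of the log reconstructs any past state, and replaying from a snapshot gives the same result as replaying from the beginning.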
CQRS (Command Query Responsibility Segregation)
Separate the write side (commands, events, event store) from the read side (queries, projections, read model). The write side is optimised for consistency and auditability; the read side is optimised for query performance.
flowchart LR
client --> |"POST /orders (Command)"| cmd["Command Handler"]
cmd --> |"OrderCreated"| bus["Event Bus (Kafka)"]
bus --> |"project"| view["Read Model\n(optimised for queries)"]
client --> |"GET /orders (Query)"| view
cmd --> |"append"| store["Event Store"]
CQRS is commonly paired with Event Sourcing because the event log is the natural write model. However, CQRS can also be applied without event sourcing, using separate read/write databases kept in sync via change-data-capture.
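The write/read split can be sketched in a few lines. `OrderCommandHandler` and `OrderReadModel` are hypothetical classes, and the list of subscriber callbacks stands in for the event bus. In this sketch the projection runs synchronously inside `place_order`; with a real bus the read model would lag behind the write side.

```python
class OrderCommandHandler:
    """Write side: validates commands and appends events."""

    def __init__(self, event_store, subscribers):
        self.event_store = event_store
        self.subscribers = subscribers

    def place_order(self, order_id, items):
        if not items:
            # Commands, unlike events, can be rejected.
            raise ValueError("an order needs at least one item")
        event = {"type": "OrderCreated", "order_id": order_id, "items": items}
        self.event_store.append(event)
        for notify in self.subscribers:   # stand-in for publishing to the bus
            notify(event)

class OrderReadModel:
    """Read side: a denormalised view shaped for queries."""

    def __init__(self):
        self.orders = {}

    def project(self, event):
        if event["type"] == "OrderCreated":
            self.orders[event["order_id"]] = {"item_count": len(event["items"])}

    def get_order(self, order_id):
        return self.orders.get(order_id)

event_store = []
read_model = OrderReadModel()
handler = OrderCommandHandler(event_store, [read_model.project])
handler.place_order("o-1", ["book", "pen"])
print(read_model.get_order("o-1"))
```

Queries never touch the event store, so the read model can be reshaped or rebuilt without changing how commands are handled.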
Dead Letter Queue
When a consumer fails to process a message repeatedly, it must not block the rest of the stream. A Dead Letter Queue (DLQ) is a separate topic where messages are moved after exhausting retry attempts, allowing the main stream to continue while the failing messages are investigated and reprocessed manually.
stateDiagram-v2
[*] --> Pending
Pending --> Processing : consumer picks up
Processing --> Committed : success
Processing --> Retry : transient failure (attempt < max)
Retry --> Processing : after backoff
Processing --> DLQ : max retries exceeded
DLQ --> Processing : manual reprocessing
Spring Kafka
Spring Kafka's @RetryableTopic annotation handles retry scheduling and DLQ routing automatically. Configure attempts, backoff, and the DLQ topic name declaratively without manual error-handling boilerplate.
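The state machine above can be sketched in plain Python. This is not how Spring Kafka implements `@RetryableTopic`; `consume` and `flaky_handler` are hypothetical, the DLQ is an in-memory list, and the backoff delay is omitted so the example runs instantly.

```python
def consume(messages, handler, max_attempts=3):
    """Process each message; after max_attempts failures, route it to the DLQ."""
    committed, dlq = [], []
    for message in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(message)
                committed.append(message)
                break
            except Exception as error:
                if attempt == max_attempts:
                    dlq.append({"message": message, "error": str(error), "attempts": attempt})
                # A real consumer would wait here (for example with exponential
                # backoff) before retrying; the sketch retries immediately.
    return committed, dlq

calls = {}

def flaky_handler(message):
    calls[message] = calls.get(message, 0) + 1
    if message == "poison":
        raise ValueError("cannot parse payload")       # permanent failure
    if message == "flaky" and calls[message] < 2:
        raise TimeoutError("downstream timed out")     # transient failure

committed, dlq = consume(["ok", "flaky", "poison", "ok-2"], flaky_handler)
```

The poison message ends up in the DLQ with its error attached, while the messages behind it are still processed, which is the property the pattern exists to provide.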
When to Use (and Not Use) Event-Driven
| Scenario | Best fit | Reason |
|---|---|---|
| Login / auth token generation | REST | Caller needs the token synchronously |
| Order confirmation email | Event | User doesn't wait; email can lag |
| Real-time fraud check (must block) | REST | Need the answer before proceeding |
| Audit log for all order mutations | Event | Fire-and-forget, fan-out |
| Payment processing | Depends | If idempotent retry is guaranteed: event. If blocking confirmation needed: REST + saga |
| Analytics data pipeline | Event | High volume, consumers can be slow |
| Service health check | REST | Synchronous by nature |
Operational Complexity
Event-driven systems are harder to debug than REST. A request that spans 5 services and 3 Kafka topics requires distributed tracing (covered in the Observability class) to diagnose. Do not adopt event-driven architecture for its own sake — adopt it when the coupling and latency costs of synchronous calls become concrete problems.
- NARKHEDE, N.; SHAPIRA, G.; PALINO, T. Kafka: The Definitive Guide, 2nd ed. O'Reilly, 2021.
- RICHARDSON, C. Microservices Patterns: Event Sourcing, CQRS, Saga.
- FOWLER, M. Event Sourcing, 2005.
- YOUNG, G. CQRS Documents, 2010.