a. Overview
Goal
Add OpenTelemetry instrumentation to every Spring Boot service so that a single user request — traversing Gateway → OrderService → ProductService — generates a unified distributed trace visible in Jaeger, with trace IDs automatically injected into log lines.
You already have Prometheus scraping metrics and Grafana displaying dashboards. Metrics answer "how much" and "how often". They cannot tell you which service inside a multi-hop request chain is responsible for a 2-second tail latency. Distributed tracing answers that question.
The Problem This Solves
Consider a slow /orders response. Prometheus shows elevated http_server_requests_seconds on the gateway, but the gateway delegates to OrderService which in turn calls ProductService and the database. The metric cannot tell you where the time went.
Without tracing, the only option is to stare at logs across four separate services and guess.
sequenceDiagram
participant C as Client
participant G as Gateway
participant O as OrderService
participant P as ProductService
participant D as Database
C->>G: POST /orders (? ms)
G->>O: POST /orders (? ms)
O->>P: GET /products/{id} (? ms)
P->>D: SELECT ... (? ms)
D-->>P: row
P-->>O: ProductDTO
O-->>G: OrderDTO
G-->>C: 200 OK (total: 2 340 ms)
With tracing, every hop is measured as a span and all spans for one request share a trace ID. The Jaeger UI renders the full call graph as a flame chart so you can see the slow span immediately.
sequenceDiagram
participant C as Client
participant G as Gateway
participant O as OrderService
participant P as ProductService
participant D as Database
C->>G: POST /orders<br/>traceparent: 00-4bf9...36-01
note over G: span: gateway.POST /orders (45 ms)
G->>O: POST /orders<br/>traceparent: 00-4bf9...36-01
note over O: span: order.POST /orders (2 250 ms)
O->>P: GET /products/7<br/>traceparent: 00-4bf9...36-01
note over P: span: product.GET /products/7 (2 190 ms)
P->>D: SELECT ...
note over D: span: jdbc.query (2 180 ms)
D-->>P: row
P-->>O: ProductDTO
O-->>G: OrderDTO
G-->>C: 200 OK trace_id=4bf9...36
The bottleneck is immediately visible: the JDBC query inside ProductService accounts for almost all of the latency.
Architecture
This hands-on adds Jaeger to the observability layer and wires the OpenTelemetry Java agent into each service container:
flowchart LR
subgraph obs [Observability Layer]
prometheus@{ shape: cyl, label: "Prometheus\n:9090" }
grafana[Grafana\n:3000]
jaeger["Jaeger\n:16686 UI\n:4317 OTLP"]:::highlighted
prometheus --> grafana
end
subgraph api [Trusted Layer]
loadbalancer --> gateway
gateway --> order
gateway --> account
order --> product
order --> db@{ shape: cyl, label: "PostgreSQL" }
end
gateway & order & account & product e1@-->|"OTLP traces"| jaeger
prometheus e2@-.->|"scrape /actuator/prometheus"| gateway & order & account & product
internet e0@==>|":80"| loadbalancer
e0@{ animate: true }
e1@{ animate: true }
e2@{ animate: true }
classDef highlighted fill:#fcc
Adding Jaeger to Docker Compose
Jaeger ships as a single all-in-one container that includes the collector, query engine, and UI. For production you would run these as separate components, but for local development the all-in-one image is sufficient.
Add the following service to compose.yaml:
jaeger:
  image: jaegertracing/all-in-one:1.57
  container_name: jaeger
  environment:
    COLLECTOR_OTLP_ENABLED: "true"
  ports:
    - "16686:16686" # Jaeger UI
    - "4317:4317" # OTLP gRPC receiver
    - "4318:4318" # OTLP HTTP receiver
  networks:
    - store-net
Port 4317 vs 4318
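Port 4317 is the OTLP gRPC receiver and port 4318 is the OTLP HTTP/protobuf receiver; both carry the same OTLP data. The 1.x OTel Java agent defaulted to gRPC on 4317, but 2.x releases (including the v2.4.0 used here) default to http/protobuf on 4318. To export over gRPC on 4317, set OTEL_EXPORTER_OTLP_PROTOCOL=grpc together with the endpoint; otherwise point the agent at port 4318. Either transport works with Jaeger, so choose HTTP if a firewall blocks gRPC.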
OpenTelemetry Java Agent
The OpenTelemetry Java agent is a JVM instrumentation agent — a single JAR you attach with -javaagent. It uses Java's bytecode instrumentation API to intercept Spring MVC, Spring WebClient, RestTemplate, JDBC drivers, and dozens of other libraries without any code changes.
Download the agent
Add a download step to each service's Dockerfile:
FROM eclipse-temurin:21-jre-alpine
WORKDIR /app
ADD https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v2.4.0/opentelemetry-javaagent.jar \
/app/opentelemetry-javaagent.jar
COPY target/*.jar app.jar
ENTRYPOINT ["java", \
"-javaagent:/app/opentelemetry-javaagent.jar", \
"-jar", "/app/app.jar"]
Pin the agent version
The OTel agent version and the Spring Boot version must be compatible. v2.4.0 works with Spring Boot 3.x. Check the compatibility matrix before upgrading either dependency.
Configure each service in Docker Compose
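The agent is configured entirely through environment variables. Add an environment block like the following to each service in compose.yaml, changing OTEL_SERVICE_NAME per service. The block is shown for the gateway; the endpoint assumes the jaeger service defined above, and the protocol is set explicitly because 2.x agents do not default to gRPC:
environment:
  OTEL_SERVICE_NAME: gateway-service
  OTEL_TRACES_EXPORTER: otlp
  OTEL_EXPORTER_OTLP_ENDPOINT: http://jaeger:4317
  OTEL_EXPORTER_OTLP_PROTOCOL: grpc
  OTEL_METRICS_EXPORTER: none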
Why OTEL_METRICS_EXPORTER: none?
By default the OTel agent also tries to export metrics via OTLP. You are already collecting metrics with Prometheus and Micrometer — exporting them a second time to Jaeger would create duplicates and waste resources. Setting none disables the OTel metric pipeline while leaving trace export fully enabled.
Trace Propagation — How It Works
When the agent intercepts an outgoing HTTP call, it injects the W3C traceparent header into the request. The receiving service's agent reads that header and creates a child span under the same trace.
The traceparent format is:
00-{traceId}-{parentSpanId}-{flags}
│ │ │ │
│ │ │ └── 01 = sampled, 00 = not sampled
│ │ └── 16-char parent span ID (hex)
│ └── 32-char trace ID (hex) — identical on every hop
└── version byte (always 00)
Example value seen in HTTP headers:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
The trace ID 4bf92f3577b34da6a3ce929d0e0e4736 is present on every service in the chain. All spans carrying that ID are grouped into one trace in Jaeger.
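The header format and the propagation rule can be sketched in a few lines of plain Java. This is an illustrative model only, not the agent's implementation: the class name TraceParent and its methods are hypothetical, and the sketch checks only the version-00 layout (it does not reject all-zero IDs, which the W3C specification treats as invalid).

```java
import java.security.SecureRandom;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model of a W3C traceparent header (version 00 only). */
final class TraceParent {
    // 00-{32 hex trace ID}-{16 hex parent span ID}-{2 hex flags}
    private static final Pattern FORMAT =
            Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");
    private static final SecureRandom RANDOM = new SecureRandom();

    final String traceId;
    final String parentSpanId;
    final int flags;

    TraceParent(String traceId, String parentSpanId, int flags) {
        this.traceId = traceId;
        this.parentSpanId = parentSpanId;
        this.flags = flags;
    }

    /** Parses an incoming header value, as a receiving service would. */
    static TraceParent parse(String header) {
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a version-00 traceparent: " + header);
        }
        return new TraceParent(m.group(1), m.group(2), Integer.parseInt(m.group(3), 16));
    }

    /** The lowest flag bit marks the trace as sampled. */
    boolean isSampled() {
        return (flags & 0x01) != 0;
    }

    /** Header for the next hop: same trace ID and flags, the caller's own span ID as parent. */
    TraceParent childOf(String currentSpanId) {
        return new TraceParent(traceId, currentSpanId, flags);
    }

    /** Generates a random non-zero 16-character hex span ID. */
    static String newSpanId() {
        long id;
        do {
            id = RANDOM.nextLong();
        } while (id == 0L);
        return String.format("%016x", id);
    }

    @Override
    public String toString() {
        return String.format("00-%s-%s-%02x", traceId, parentSpanId, flags);
    }
}
```

A receiving service would call TraceParent.parse on the incoming header, start its own span with newSpanId(), and send childOf(thatSpanId).toString() on any outgoing call, which is why the trace ID stays identical on every hop while the parent span ID changes.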
sequenceDiagram
participant G as Gateway Agent
participant O as Order Agent
participant P as Product Agent
G->>G: start root span<br/>traceId=4bf9... spanId=aaa1
G->>O: POST /orders<br/>traceparent: 00-4bf9...-aaa1-01
O->>O: start child span<br/>traceId=4bf9... spanId=bbb2 parent=aaa1
O->>P: GET /products/7<br/>traceparent: 00-4bf9...-bbb2-01
P->>P: start child span<br/>traceId=4bf9... spanId=ccc3 parent=bbb2
P-->>O: 200
O-->>G: 200
G->>G: end root span → export to Jaeger
O->>O: end child span → export to Jaeger
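P->>P: end child span → export to Jaeger
Sampling
The 01 flags byte means the trace is sampled and will be exported. By default the agent samples 100% of traces. For production, switch to a ratio-based sampler: set OTEL_TRACES_SAMPLER=parentbased_traceidratio and OTEL_TRACES_SAMPLER_ARG=0.1 to sample roughly 10% of new traces. The parentbased prefix means downstream services follow the sampled flag they receive in traceparent, so a trace is either recorded on every hop or on none.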
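To see why a ratio sampler gives a consistent decision across services, here is a simplified model of a trace-ID-ratio check. It is a sketch of the general technique (treat the low 64 bits of the trace ID as a random number and compare it to a threshold), not the SDK's actual code; the class and method names are hypothetical.

```java
/** Hypothetical, simplified model of a trace-ID-ratio sampling decision. */
final class RatioSamplerSketch {

    /**
     * Returns true when the trace should be sampled at the given ratio (0.0 to 1.0).
     * The decision depends only on the trace ID, so it is repeatable for one trace.
     */
    static boolean isSampled(String traceIdHex, double ratio) {
        if (traceIdHex.length() != 32) {
            throw new IllegalArgumentException("trace ID must be 32 hex characters");
        }
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be between 0.0 and 1.0");
        }
        if (ratio == 0.0) {
            return false; // sample nothing
        }
        if (ratio == 1.0) {
            return true; // sample everything
        }
        // Use the last 16 hex characters (low 64 bits) as the pseudo-random part.
        long randomPart = Long.parseUnsignedLong(traceIdHex.substring(16), 16);
        long upperBound = (long) (ratio * Long.MAX_VALUE);
        return Math.abs(randomPart) < upperBound;
    }
}
```

Because the input is the trace ID rather than a fresh random number, calling isSampled twice with the same ID and ratio always returns the same answer.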
Correlating Traces with Logs
A trace ID in Jaeger is only useful if you can find the matching log lines. The OTel agent bridges both worlds: it injects the active trace ID and span ID into SLF4J's MDC, so every log statement emitted during a traced request automatically carries the trace context.
Add the logback bridge dependency
In each service's pom.xml:
<dependency>
<groupId>io.opentelemetry.instrumentation</groupId>
<artifactId>opentelemetry-logback-appender-1.0</artifactId>
<version>2.4.0-alpha</version>
<scope>runtime</scope>
</dependency>
Configure logback-spring.xml
Create src/main/resources/logback-spring.xml in each service:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
<encoder>
<pattern>
%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36}
[trace_id=%X{trace_id} span_id=%X{span_id}] - %msg%n
</pattern>
</encoder>
</appender>
<root level="INFO">
<appender-ref ref="CONSOLE"/>
</root>
</configuration>
MDC key names
The OTel agent writes trace context under the keys trace_id and span_id in SLF4J MDC. These match the OTLP/W3C naming convention and are directly searchable in Loki, CloudWatch Insights, Splunk, and most other log aggregators.
With this configuration, every log line during a traced request will look like:
14:23:01.442 [http-nio-8080-exec-3] INFO c.example.order.OrderService
[trace_id=4bf92f3577b34da6a3ce929d0e0e4736 span_id=00f067aa0ba902b7] - creating order for account 42
14:23:01.501 [http-nio-8080-exec-3] INFO c.example.order.OrderService
[trace_id=4bf92f3577b34da6a3ce929d0e0e4736 span_id=00f067aa0ba902b7] - calling ProductService for product id 7
Workflow: open the slow trace in Jaeger → click the problematic span → copy the trace_id value → paste it into your log aggregator's search bar → all log lines from that exact request appear together.
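If no log aggregator is at hand, the same lookup can be approximated on raw log output. The sketch below assumes each log entry is available as one string containing the trace_id=... field produced by the pattern above; the class name TraceLogFilter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that selects the log entries belonging to one trace. */
final class TraceLogFilter {
    // Matches the trace_id=<32 hex chars> field written by the logback pattern.
    private static final Pattern TRACE_ID = Pattern.compile("trace_id=([0-9a-f]{32})");

    /** Returns, in original order, the entries whose trace_id equals the given ID. */
    static List<String> linesForTrace(List<String> logLines, String traceId) {
        List<String> matches = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = TRACE_ID.matcher(line);
            if (m.find() && m.group(1).equals(traceId)) {
                matches.add(line);
            }
        }
        return matches;
    }
}
```

Entries logged outside a traced request have an empty trace_id field and are skipped, since the pattern requires a full 32-character ID.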
Viewing Traces in Jaeger UI
Rebuild and start the full stack:
docker compose up -d --build
Verify Jaeger started:
docker compose ps jaeger
Expected output:
NAME IMAGE STATUS PORTS
jaeger jaegertracing/all-in-one:1.57 running 0.0.0.0:4317->4317/tcp, 0.0.0.0:16686->16686/tcp
Generate some traffic so there are traces to explore:
curl -s -X POST http://localhost/orders \
-H "Content-Type: application/json" \
-d '{"accountId": 1, "productId": 7, "quantity": 2}' | jq
Open http://localhost:16686 in your browser.
Finding a trace
- In the Service dropdown, select gateway-service.
- Leave Operation as all and click Find Traces.
- The results list recent traces sorted by timestamp. Each row shows the root operation name, total duration, and span count.
- Click any trace to open the flame chart view.
Reading the flame chart
The flame chart renders each span as a horizontal bar. The root span (gateway) is at the top. Child spans are indented below their parent. Bar width is proportional to duration — a wide bar is a slow operation.
What to look for
- A span nearly as wide as the root span is the primary latency contributor.
- Spans labelled SELECT or jdbc.query with long duration indicate slow database queries.
- Spans labelled HTTP GET or HTTP POST are outbound HTTP calls captured by the agent.
- A span with a red ! icon recorded an exception; click it to see the full stack trace attached as a span event.
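Parent spans are wide simply because they contain their children, so the useful number is each span's self time: its duration minus the time spent in its direct children. The sketch below computes that for a small in-memory trace. The SpanRecord type and the sample durations are hypothetical, and the subtraction assumes child spans run one after another rather than in parallel.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that finds where the time in a trace is actually spent. */
final class SpanSelfTime {

    /** Minimal stand-in for an exported span; parentId is null for the root. */
    static final class SpanRecord {
        final String id;
        final String parentId;
        final String name;
        final long durationMs;

        SpanRecord(String id, String parentId, String name, long durationMs) {
            this.id = id;
            this.parentId = parentId;
            this.name = name;
            this.durationMs = durationMs;
        }
    }

    /** Self time per span ID: own duration minus the durations of direct children. */
    static Map<String, Long> selfTimes(List<SpanRecord> spans) {
        Map<String, Long> self = new LinkedHashMap<>();
        for (SpanRecord s : spans) {
            self.put(s.id, s.durationMs);
        }
        for (SpanRecord s : spans) {
            if (s.parentId != null && self.containsKey(s.parentId)) {
                self.merge(s.parentId, -s.durationMs, Long::sum);
            }
        }
        return self;
    }

    /** The span with the largest self time, i.e. the primary latency contributor. */
    static SpanRecord dominant(List<SpanRecord> spans) {
        Map<String, Long> self = selfTimes(spans);
        SpanRecord best = null;
        for (SpanRecord s : spans) {
            if (best == null || self.get(s.id) > self.get(best.id)) {
                best = s;
            }
        }
        return best;
    }
}
```

For a chain such as gateway (2 340 ms) → order (2 250 ms) → product (2 190 ms) → JDBC query (2 180 ms), the three service spans have self times of 90, 60, and 10 ms, and the query span is reported as dominant.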
Span attributes
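Clicking a span expands its tags. The agent records these automatically. The names below are the stable semantic-convention names that 2.x agents emit by default; 1.x agents report http.method, http.url, http.status_code, and net.peer.name instead:
| Tag | Example value |
|---|---|
| http.request.method | POST |
| url.full | http://order:8080/orders |
| http.response.status_code | 200 |
| db.system | postgresql |
| db.statement | SELECT * FROM products WHERE id = ? |
| server.address | db |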
Adding Custom Spans
The Java agent instruments framework code automatically, but it knows nothing about your business logic. When validation, pricing calculation, or any significant processing step needs to be traced individually, add a manual span.
Add the OpenTelemetry API dependency
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-api</artifactId>
</dependency>
Spring Boot's dependency management includes opentelemetry-bom, so no version is needed here.
Instrument business logic
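import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import org.springframework.stereotype.Service;

@Service
public class OrderService {
    // With only opentelemetry-api on the classpath there is no Tracer bean to autowire.
    // The Java agent registers the global OpenTelemetry instance, so obtain the tracer from it.
    // "order-service" is an assumed instrumentation scope name; use any stable identifier.
    private final Tracer tracer = GlobalOpenTelemetry.getTracer("order-service");

    public OrderOut create(OrderIn orderIn) {
        Span span = tracer.spanBuilder("validateOrder").startSpan();
        // makeCurrent() makes this span the parent of spans created inside the block
        try (Scope scope = span.makeCurrent()) {
            span.setAttribute("order.account_id", orderIn.getAccountId());
            span.setAttribute("order.product_id", orderIn.getProductId());
            validateAccount(orderIn.getAccountId());
            checkInventory(orderIn.getProductId(), orderIn.getQuantity());
        } catch (Exception e) {
            span.recordException(e);          // attach the stack trace as a span event
            span.setStatus(StatusCode.ERROR); // mark the span as failed in Jaeger
            throw e;
        } finally {
            span.end(); // without this the span is never exported
        }
        return persist(orderIn);
    }
}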
Always call span.end() in a finally block
If span.end() is never called, the span is never exported to Jaeger. The try-finally pattern guarantees cleanup even when an exception is thrown. The agent handles this automatically for framework code; manual spans are your responsibility.
When to add manual spans
Add manual spans when: (1) a method takes more than ~50 ms and you want to measure it independently, (2) you need to attach business-level attributes (order.id, account.tier) that are not captured by framework instrumentation, or (3) you need to record a caught exception as a span event. Do not add manual spans to trivial getters or field assignments — the noise obscures the useful data.
Checklist
Before moving on, verify:
- docker compose ps shows jaeger running with ports 16686 and 4317 exposed
- http://localhost:16686 loads the Jaeger UI without errors
- After sending a request, gateway-service appears in the Jaeger service dropdown
- A trace for POST /orders contains at least 3 spans: gateway, order-service, product-service
- Each HTTP span has http.request.method and http.response.status_code tags (http.method and http.status_code on 1.x agents) set by the agent automatically
- Log output includes trace_id= fields that match the trace ID shown in Jaeger