Monitoring & Observability
IOSetu provides built-in observability via Prometheus metrics and OpenTelemetry tracing.
Prometheus Metrics
Metrics are exposed at /metrics in the standard Prometheus text exposition format.
Key Metrics
| Metric | Type | Description |
|---|---|---|
| http_requests_total | Counter | Total requests by status/method/path. |
| http_request_duration_seconds | Histogram | Request latency distribution. |
| iosetu_parsing_duration_seconds | Histogram | Time spent parsing AL3 (excluding network/JSON overhead). |
| iosetu_validation_errors_total | Counter | Count of validation failures. |
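
To make the metrics above more concrete, here is a minimal client-side sketch in Python that reads text in the Prometheus exposition format and totals a counter across all of its label sets. The sample payload, its label values, and the helper name `sum_metric` are illustrative assumptions, not real IOSetu output. The regular expression is a simplification that does not handle label values containing `}`.

```python
import re

# Hypothetical sample of what a /metrics scrape might contain.
SAMPLE = """\
# HELP http_requests_total Total requests by status/method/path.
# TYPE http_requests_total counter
http_requests_total{status="200",method="POST",path="/v1/parse"} 120
http_requests_total{status="400",method="POST",path="/v1/parse"} 3
iosetu_validation_errors_total 3
"""

# Metric name, optional {labels}, then the sample value (timestamps are ignored).
_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def sum_metric(text: str, name: str) -> float:
    """Sum every sample of `name` across all label sets."""
    total = 0.0
    for line in text.splitlines():
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith("#"):
            continue
        m = _LINE.match(line)
        if m and m.group(1) == name:
            total += float(m.group(3))
    return total
```

Calling `sum_metric(SAMPLE, "http_requests_total")` adds the two labelled samples together. For production scraping, a maintained Prometheus client or parser library is a safer choice than a hand-written regular expression.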
Dashboard (Grafana)
Recommended panels:
1. Request Rate: rate(http_requests_total[1m])
2. Error Rate (4xx/5xx): rate(http_requests_total{status=~"4..|5.."}[1m])
3. P99 Latency: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
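
The P99 panel relies on `histogram_quantile`, which estimates a quantile from cumulative bucket counts rather than from raw observations. The Python sketch below illustrates the idea under simplifying assumptions: buckets are sorted, the last one is +Inf, and values are spread evenly inside each bucket. It is not Prometheus's actual implementation, and the function signature is hypothetical.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total  # how many observations lie at or below the quantile
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

One consequence of this scheme is that the estimate can never be finer than the bucket boundaries, so a P99 that sits at a bucket's upper bound may simply mean the buckets are too coarse for that latency range.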
OpenTelemetry Tracing
IOSetu exports traces via OTLP (gRPC).
Configuration
export IOSETU_TELEMETRY_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
export OTEL_SERVICE_NAME=iosetu-prod
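
Before starting the service, it can help to sanity-check these variables. The following Python sketch is a hypothetical preflight helper, not part of IOSetu. In particular, it assumes that only the literal value `true` enables telemetry, which the text above does not confirm.

```python
import os
from urllib.parse import urlparse


def check_otel_env(env=os.environ) -> list[str]:
    """Return human-readable problems; an empty list means the settings look sane."""
    problems = []
    # Assumption: only "true" (case-insensitive) turns telemetry on.
    if env.get("IOSETU_TELEMETRY_ENABLED", "").lower() != "true":
        problems.append("IOSETU_TELEMETRY_ENABLED is not 'true'; tracing may be disabled")
    # The endpoint should be a URL with a scheme and a host, e.g. http://host:4317.
    parsed = urlparse(env.get("OTEL_EXPORTER_OTLP_ENDPOINT", ""))
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        problems.append("OTEL_EXPORTER_OTLP_ENDPOINT should look like http://host:4317")
    if not env.get("OTEL_SERVICE_NAME"):
        problems.append("OTEL_SERVICE_NAME is unset; traces will be hard to attribute")
    return problems
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests or CI without touching the real environment.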
Trace Structure
- Root Span: POST /v1/parse
  - Child: al3.Parse (the core parsing logic)
  - Child: json.Marshal (response generation)
Use traces to see which stage is the bottleneck when a particular complex AL3 file is slow to process.
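
As a rough illustration of that analysis, the Python sketch below takes the duration of a root span and its children and reports which child consumed the largest share. The `Span` type and the durations used with it are hypothetical. Real spans would come from your tracing backend, not from this structure.

```python
from dataclasses import dataclass


@dataclass
class Span:
    name: str
    duration_ms: float


def child_shares(root: Span, children: list[Span]) -> dict[str, float]:
    """Map each child span name to its fraction of the root span's duration."""
    if root.duration_ms <= 0:
        raise ValueError("root span must have a positive duration")
    return {c.name: c.duration_ms / root.duration_ms for c in children}


def bottleneck(root: Span, children: list[Span]) -> str:
    """Name of the child span that consumed the most time."""
    shares = child_shares(root, children)
    return max(shares, key=shares.get)
```

If the shares of al3.Parse and json.Marshal add up to well under 1.0, the remaining time was spent outside both child spans, which is worth investigating separately.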
Health Checks
Orchestrators should monitor /health.
- 200 OK: Service is ready.
- Fail (any non-200 response or a timeout): Service is stuck or shutting down.
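
For environments without a built-in probe mechanism, the check can be approximated with a small client. The Python sketch below is one way to do it using only the standard library. The function name `is_ready` and the two-second default timeout are assumptions chosen for illustration.

```python
import urllib.error
import urllib.request


def is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True only if GET <base_url>/health answers 200 within `timeout` seconds."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: treat as not ready.
        return False
```

A caller would typically invoke `is_ready` on a fixed interval and only act after several consecutive failures, so that one slow response does not trigger a restart.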