Monitoring & Observability
Integra provides built-in observability via Prometheus metrics and OpenTelemetry tracing.
Prometheus Metrics
Metrics are exposed at /metrics in the standard Prometheus text format.
Key Metrics
| Metric | Type | Description |
|---|---|---|
| http_requests_total | Counter | Total requests by status/method/path. |
| http_request_duration_seconds | Histogram | Request latency distribution. |
| integra_parsing_duration_seconds | Histogram | Time spent parsing AL3 (excluding network/JSON overhead). |
| integra_validation_errors_total | Counter | Count of validation failures. |
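
To sanity-check what a scrape returns, the text format can be read with a few lines of Python. The sketch below is illustrative only: it assumes the label names `status`, `method`, and `path` suggested by the table above, and the sample payload is made up rather than captured from a running Integra instance.

```python
import re
from collections import defaultdict

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def sum_by_label(exposition: str, metric: str, label: str) -> dict:
    """Sum the samples of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE_LINE.match(line)
        if match is None or match.group(1) != metric:
            continue
        labels = dict(_LABEL_PAIR.findall(match.group(2) or ""))
        totals[labels.get(label, "")] += float(match.group(3))
    return dict(totals)


# Hypothetical scrape output; real label names and values may differ.
SAMPLE = '''\
# HELP http_requests_total Total requests.
# TYPE http_requests_total counter
http_requests_total{status="200",method="POST",path="/v1/parse"} 120
http_requests_total{status="400",method="POST",path="/v1/parse"} 7
http_requests_total{status="200",method="GET",path="/health"} 30
integra_validation_errors_total 7
'''

print(sum_by_label(SAMPLE, "http_requests_total", "status"))
# {'200': 150.0, '400': 7.0}
```

For anything beyond a quick check, a Prometheus server or an official client library is a better fit than hand-rolled parsing.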
Dashboard (Grafana)
Recommended panels:
1. Request Rate: rate(http_requests_total[1m])
2. Error Rate (4xx/5xx): rate(http_requests_total{status=~"4..|5.."}[1m])
3. P99 Latency: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
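The P99 panel is an estimate, not an exact measurement: `histogram_quantile` interpolates linearly inside the bucket that contains the target rank. The following is a simplified sketch of that interpolation, assuming cumulative `(upper_bound, count)` buckets ending in `+Inf`; the bucket boundaries and counts are invented for illustration and are not Integra's actual buckets.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate the q-quantile (0 < q <= 1) from cumulative buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs whose
    last entry has upper_bound == math.inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][1] == 0:
        return math.nan
    rank = q * buckets[-1][1]
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower = buckets[i - 1][0] if i > 0 else 0.0
    below = buckets[i - 1][1] if i > 0 else 0.0
    # Linear interpolation between the bucket's lower and upper bounds.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Hypothetical latency buckets in seconds.
BUCKETS = [(0.1, 900), (0.5, 980), (1.0, 995), (math.inf, 1000)]
print(round(histogram_quantile(0.99, BUCKETS), 3))  # 0.833
```

Because the result can only be as precise as the bucket layout, a P99 that sits at a bucket boundary for long periods usually means the buckets are too coarse around that latency.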
OpenTelemetry Tracing
Integra exports traces via OTLP (gRPC).
Configuration
export INTEGRA_TELEMETRY_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
export OTEL_SERVICE_NAME=integra-prod
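
A small pre-flight check can catch a mistyped endpoint before deployment. This is a sketch only: the function name `check_telemetry_env` is hypothetical, and treating `true` as the only enabling value for `INTEGRA_TELEMETRY_ENABLED` is an assumption, not documented Integra behavior.

```python
import os
from urllib.parse import urlparse


def check_telemetry_env(env=os.environ) -> list:
    """Return a list of problems; an empty list means the config looks sane."""
    problems = []
    # Assumption: only the literal value "true" enables telemetry.
    if env.get("INTEGRA_TELEMETRY_ENABLED", "").lower() != "true":
        problems.append("INTEGRA_TELEMETRY_ENABLED is not 'true'")

    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "")
    parsed = urlparse(endpoint)
    try:
        port = parsed.port
    except ValueError:  # non-numeric or out-of-range port
        port = None
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        problems.append(f"OTEL_EXPORTER_OTLP_ENDPOINT is not a valid URL: {endpoint!r}")
    elif port is None:
        # 4317 is the conventional OTLP/gRPC port.
        problems.append("OTEL_EXPORTER_OTLP_ENDPOINT has no usable port")

    if not env.get("OTEL_SERVICE_NAME"):
        problems.append("OTEL_SERVICE_NAME is unset")
    return problems
```

Calling `check_telemetry_env()` with no arguments inspects the current process environment; passing a plain dict makes the check easy to unit-test.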
Trace Structure
- Root Span: POST /v1/parse
  - Child: al3.Parse (The core parsing logic)
  - Child: json.Marshal (Response generation)
Use traces to see which span dominates latency when a specific, complex AL3 file is slow to process.
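As a rough illustration of reading such a trace, the sketch below computes each child span's share of the root span's duration. The `Span` class and the timings are hypothetical stand-ins for whatever a tracing backend exports, and the calculation assumes the children run sequentially rather than in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    duration_ms: float
    children: list = field(default_factory=list)


def breakdown(root: Span) -> dict:
    """Return each direct child's share of the root duration, plus the
    remainder not covered by any child (network, routing, and so on)."""
    shares = {c.name: c.duration_ms / root.duration_ms for c in root.children}
    shares["(other)"] = 1.0 - sum(shares.values())
    return shares


# Hypothetical timings for one slow request.
trace = Span("POST /v1/parse", 200.0, [
    Span("al3.Parse", 150.0),
    Span("json.Marshal", 30.0),
])

print({name: round(share, 2) for name, share in breakdown(trace).items()})
# {'al3.Parse': 0.75, 'json.Marshal': 0.15, '(other)': 0.1}
```

In this made-up trace, parsing accounts for three quarters of the request, so the AL3 input itself would be the first thing to inspect.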
Health Checks
Orchestrators should monitor /health.
- 200 OK: Service is ready.
- Any other response, or no response: Service is stuck or shutting down.
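Most orchestrators have a built-in HTTP probe, but the same check is easy to script. The sketch below treats anything other than a 200 within the timeout as unhealthy; the function name `is_healthy` and the two-second default timeout are illustrative choices, not values prescribed by Integra.

```python
import http.client
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True only if GET `url` answers 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Covers error statuses (HTTPError), refused connections,
        # timeouts, and malformed responses.
        return False
```

A call such as `is_healthy("http://localhost:8080/health")` would then return a boolean suitable for a cron job or a deploy script; the host and port here are placeholders.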