Monitoring & Observability

Integra provides built-in observability via Prometheus metrics and OpenTelemetry tracing.

Prometheus Metrics

Metrics are exposed at `/metrics` in the standard Prometheus text format.

Key Metrics

| Metric | Type | Description |
|---|---|---|
| `http_requests_total` | Counter | Total HTTP requests, labeled by status, method, and path. |
| `http_request_duration_seconds` | Histogram | Request latency distribution. |
| `integra_parsing_duration_seconds` | Histogram | Time spent parsing AL3 (excluding network and JSON overhead). |
| `integra_validation_errors_total` | Counter | Count of validation failures. |
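
The following minimal sketch shows what these counters look like in a scrape. It parses Prometheus text-format output and totals `http_requests_total` per `status` label. It handles only simple `name{labels} value` lines: no escaped quotes in label values, no timestamps, no exemplars. The sample scrape is a hypothetical illustration, not captured Integra output, and so are its `method` and `path` label names and HELP text.

```python
import re
from collections import defaultdict

# Matches simple sample lines: metric_name{label="value",...} 123.0
# Assumption: label values contain no escaped quotes, and lines carry no timestamp.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Yield (name, labels, value) per sample line, skipping # HELP / # TYPE comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # Syntax this minimal parser does not support.
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def requests_by_status(text):
    """Sum http_requests_total across all other labels, grouped by status."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == "http_requests_total":
            totals[labels.get("status", "")] += value
    return dict(totals)


# Hypothetical scrape; real label names, paths, and values may differ.
SAMPLE_SCRAPE = """\
# HELP http_requests_total Total requests by status/method/path.
# TYPE http_requests_total counter
http_requests_total{method="POST",path="/v1/parse",status="200"} 1027
http_requests_total{method="POST",path="/v1/parse",status="400"} 3
http_requests_total{method="GET",path="/health",status="200"} 512
"""

if __name__ == "__main__":
    print(requests_by_status(SAMPLE_SCRAPE))  # prints {'200': 1539.0, '400': 3.0}
```

Summing away `method` and `path` in this way matches what `sum by (status) (...)` does in PromQL.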

Dashboard (Grafana)

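Recommended panels:

1. Request Rate: `rate(http_requests_total[1m])`
2. Error Rate (4xx/5xx): `rate(http_requests_total{status=~"4..|5.."}[1m])`
3. P99 Latency: `histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))`

These queries return one series per label combination (for example, per method and path). For a single service-wide line, aggregate first. Use `sum(rate(http_requests_total[1m]))` for the request rate. For P99, use `histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))`; the quantile calculation needs the `le` label, so it must be kept.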
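
The P99 panel relies on `histogram_quantile`. That function estimates a quantile from cumulative `le` buckets by linear interpolation inside the bucket containing the target rank. The sketch below mirrors that interpolation for one series. It assumes buckets are sorted and end with `+Inf`. In the panel, the inputs would be the per-second rates from `rate(..._bucket[5m])`; scaling every bucket by the same factor does not change the result. The bucket bounds and counts in the example are invented for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Mirrors the interpolation PromQL's histogram_quantile applies to one
    series. Assumes buckets are sorted by upper bound and the last is +Inf.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # No observations in the window.

    rank = q * total
    # Find the first finite bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets[:-1]):
        if count >= rank:
            break
    else:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]

    if i == 0 and upper <= 0:
        return upper
    lower, below = (0.0, 0.0) if i == 0 else buckets[i - 1]
    in_bucket = count - below
    if in_bucket == 0:
        return lower
    # Assume observations are spread uniformly within the bucket.
    return lower + (upper - lower) * (rank - below) / in_bucket


if __name__ == "__main__":
    # Invented latency buckets in seconds: (le, cumulative count).
    buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
    print(histogram_quantile(0.99, buckets))  # prints 1.0
    print(histogram_quantile(0.5, buckets))   # prints 0.1
```

The estimate can be no finer than the bucket layout. If the highest finite bucket is 1.0 s, a P99 of 1.0 means "at least 1 second", not "exactly 1 second".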

OpenTelemetry Tracing

Integra exports traces via OTLP (gRPC).

Configuration

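```shell
# Enable trace export from Integra
export INTEGRA_TELEMETRY_ENABLED=true
# OTLP/gRPC collector endpoint (4317 is the standard OTLP gRPC port)
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
# Service name attached to every exported span
export OTEL_SERVICE_NAME=integra-prod
```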
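
Before enabling telemetry in a new environment, it can help to confirm that the collector in `OTEL_EXPORTER_OTLP_ENDPOINT` is reachable over TCP. The sketch below reads the same variables as the configuration above. Several details are assumptions:

- It treats `INTEGRA_TELEMETRY_ENABLED` as on only when it equals `true` (case-insensitive); this may not match Integra's actual parsing.
- It falls back to the OTLP gRPC default port 4317 when the URL omits a port.
- The helper names are hypothetical.

A successful connect shows only that something is listening, not that it speaks OTLP.

```python
import os
import socket
from urllib.parse import urlsplit

OTLP_GRPC_DEFAULT_PORT = 4317  # Default port for OTLP over gRPC.


def telemetry_enabled(env=os.environ):
    """Hypothetical helper. Assumption: only the string 'true' enables telemetry."""
    return env.get("INTEGRA_TELEMETRY_ENABLED", "").strip().lower() == "true"


def collector_reachable(env=os.environ, timeout=2.0):
    """Hypothetical preflight: True if a TCP connect to the OTLP endpoint succeeds.

    This confirms only that something is listening on host:port, not that
    the listener is a working OTLP collector.
    """
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if not endpoint:
        raise ValueError("OTEL_EXPORTER_OTLP_ENDPOINT is not set")
    parts = urlsplit(endpoint)
    if not parts.hostname:
        raise ValueError(f"cannot parse host from {endpoint!r}")
    port = parts.port or OTLP_GRPC_DEFAULT_PORT
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:  # Refused, DNS failure, or timeout.
        return False


if __name__ == "__main__":
    if telemetry_enabled():
        print("collector reachable:", collector_reachable())
    else:
        print("telemetry disabled")
```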

Trace Structure

- Root span: `POST /v1/parse`
  - Child: `al3.Parse` (the core parsing logic)
  - Child: `json.Marshal` (response generation)

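Use traces to pinpoint bottlenecks in particularly complex AL3 files. Comparing the durations of the `al3.Parse` and `json.Marshal` child spans shows whether a slow request spent its time parsing or serializing the response.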
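
The sketch below illustrates that comparison. It takes finished spans (name, parent, start, end) and computes each child's share of the root `POST /v1/parse` span. It also reports the root's self-time, meaning time not covered by any child. The `Span` record and the timings are hypothetical. In practice these values come from whichever trace backend receives Integra's OTLP export.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical, simplified span record; real OTLP spans carry many more fields.
    span_id: str
    name: str
    parent_id: Optional[str]
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


def breakdown(spans, root_name="POST /v1/parse"):
    """Return ({child name: share of root duration}, root self-time in ms).

    Assumes the children run sequentially (no overlap), as with parsing
    followed by response marshaling.
    """
    root = next(s for s in spans if s.name == root_name and s.parent_id is None)
    children = [s for s in spans if s.parent_id == root.span_id]
    shares = {c.name: c.duration_ms / root.duration_ms for c in children}
    self_time = root.duration_ms - sum(c.duration_ms for c in children)
    return shares, self_time


if __name__ == "__main__":
    # Invented timings for one slow request.
    trace = [
        Span("1", "POST /v1/parse", None, 0.0, 120.0),
        Span("2", "al3.Parse", "1", 5.0, 95.0),
        Span("3", "json.Marshal", "1", 95.0, 115.0),
    ]
    shares, self_time = breakdown(trace)
    slowest = max(shares, key=shares.get)
    print(slowest, f"{shares[slowest]:.0%}", f"self-time {self_time} ms")
    # prints: al3.Parse 75% self-time 10.0 ms
```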

Health Checks

Orchestrators should monitor `/health`:

- `200 OK`: the service is ready.
- Any other status, or no response within the probe timeout: the service is stuck or shutting down.
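
The sketch below shows the kind of probe an orchestrator performs. It sends an HTTP GET to `/health` and counts only a `200` as healthy. Any other status, a timeout, or a refused connection counts as a failure. The base URL, the 2-second timeout, and the name `is_healthy` are assumptions for illustration.

```python
import urllib.error
import urllib.request

# Bypass any HTTP(S)_PROXY settings: a health probe should hit the service directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_healthy(base_url, timeout=2.0):
    """Hypothetical probe: True only if GET {base_url}/health returns 200 in time."""
    url = base_url.rstrip("/") + "/health"
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False  # Error status such as 503 while shutting down.
    except (urllib.error.URLError, OSError):
        return False  # Refused, unreachable, or timed out: treat as stuck.


if __name__ == "__main__":
    # Hypothetical address; point this at a running Integra instance.
    print(is_healthy("http://localhost:8080"))
```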