Monitoring and Metrics
This guide shows you how to set up comprehensive monitoring and observability for Passage using OpenTelemetry, Prometheus, Grafana, and Sentry.
Overview
Passage provides built-in observability through:
- OpenTelemetry - Metrics and distributed tracing
- Sentry (optional) - Error tracking and reporting
- Structured Logging - JSON-formatted logs with context
OpenTelemetry Integration
Passage natively exports metrics and traces using OpenTelemetry (OTLP over HTTP).
Configuration
Configure OpenTelemetry in your `config.toml`:

```toml
[otel]
environment = "production"
traces_endpoint = "https://otlp-gateway.example.com/v1/traces"
traces_token = "base64_auth_token"
metrics_endpoint = "https://otlp-gateway.example.com/v1/metrics"
metrics_token = "base64_auth_token"
```

Grafana Cloud Setup
- Get Your Endpoints:
  - Navigate to Configuration → Data Sources → OpenTelemetry
  - Copy the OTLP endpoint URLs

- Generate Auth Tokens:

  ```sh
  # Format: instanceID:token
  echo -n "12345:glc_xxxxx" | base64
  ```

- Configure Passage:

  ```toml
  [otel]
  environment = "production"
  traces_endpoint = "https://otlp-gateway-prod-us-central-0.grafana.net/otlp/v1/traces"
  traces_token = "MTIzNDU6Z2xjX3h4eHh4"
  metrics_endpoint = "https://otlp-gateway-prod-us-central-0.grafana.net/otlp/v1/metrics"
  metrics_token = "MTIzNDU6Z2xjX3h4eHh4"
  ```
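If you want to double-check a token before deploying, the base64 encoding can be reproduced in a few lines of Python (the instance ID and token below are the placeholder values from the example above):

```python
import base64

def grafana_otlp_token(instance_id: str, api_token: str) -> str:
    """Build the base64 auth token in the instanceID:token format shown above."""
    raw = f"{instance_id}:{api_token}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")

# Placeholder values from the example above
print(grafana_otlp_token("12345", "glc_xxxxx"))  # MTIzNDU6Z2xjX3h4eHh4
```

If the output doesn't match the token in your config, the instance ID or token was copied incorrectly.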
Environment Variables
Override with environment variables:

```sh
export PASSAGE_OTEL_ENVIRONMENT=production
export PASSAGE_OTEL_TRACES_ENDPOINT=https://otlp.example.com/v1/traces
export PASSAGE_OTEL_TRACES_TOKEN=base64_token
export PASSAGE_OTEL_METRICS_ENDPOINT=https://otlp.example.com/v1/metrics
export PASSAGE_OTEL_METRICS_TOKEN=base64_token
```

Metrics
Passage exports the following metrics via OpenTelemetry:
Connection Metrics
| Metric | Type | Description |
|---|---|---|
| `passage_connections_total` | Counter | Total connection attempts |
| `passage_connections_active` | Gauge | Currently active connections |
| `passage_connections_failed` | Counter | Failed connection attempts |
| `passage_connections_rate_limited` | Counter | Connections blocked by rate limiter |
Labels:
- `client_ip` - Client IP address
- `server_address` - Server address connected to
- `protocol_version` - Minecraft protocol version
Request Metrics
| Metric | Type | Description |
|---|---|---|
| `passage_requests_total` | Counter | Total requests (status pings + logins) |
| `passage_status_requests_total` | Counter | Server list ping requests |
| `passage_login_requests_total` | Counter | Login/join requests |
| `passage_request_duration_seconds` | Histogram | Request processing time |
Labels:
- `request_type` - `status` or `login`
- `adapter_type` - Adapter used (`fixed`, `http`, `grpc`, etc.)
- `result` - `success` or `failure`
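With these labels in place, per-adapter failure ratios are straightforward to derive. A PromQL sketch (assuming the Prometheus-style metric names listed above):

```promql
sum by (adapter_type) (rate(passage_requests_total{result="failure"}[5m]))
/
sum by (adapter_type) (rate(passage_requests_total[5m]))
```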
Adapter Metrics
| Metric | Type | Description |
|---|---|---|
| `passage_adapter_requests_total` | Counter | Adapter invocations |
| `passage_adapter_errors_total` | Counter | Adapter errors |
| `passage_adapter_duration_seconds` | Histogram | Adapter response time |
Labels:
- `adapter_name` - `status`, `discovery`, or `strategy`
- `adapter_type` - Implementation type (`fixed`, `http`, `grpc`, etc.)
- `error_type` - Error category (if applicable)
Target Metrics
| Metric | Type | Description |
|---|---|---|
| `passage_targets_discovered` | Gauge | Number of discovered backend servers |
| `passage_target_selections_total` | Counter | Target selection operations |
| `passage_target_connections_total` | Counter | Successful target connections |
| `passage_target_connection_failures_total` | Counter | Failed target connections |
Labels:
- `target_identifier` - Target server ID
- `target_address` - Target server address
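Since `passage_target_connections_total` counts only successes, a per-target failure ratio is failures divided by successes plus failures. A PromQL sketch using the metric and label names above:

```promql
sum by (target_identifier) (rate(passage_target_connection_failures_total[5m]))
/
sum by (target_identifier) (
    rate(passage_target_connections_total[5m])
  + rate(passage_target_connection_failures_total[5m])
)
```

A sustained nonzero ratio for a single `target_identifier` usually points at one sick backend rather than a Passage-wide issue.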
System Metrics
| Metric | Type | Description |
|---|---|---|
| `passage_uptime_seconds` | Gauge | Uptime in seconds |
| `passage_version_info` | Gauge | Version information (value always 1) |
Labels:
- `version` - Passage version
Distributed Tracing
Passage exports distributed traces to help debug performance issues and track request flows.
Trace Spans
Each connection generates the following spans:

- `connection` - Overall connection lifecycle
  - Attributes: `client_ip`, `server_address`, `protocol_version`, `username`, `user_id`
- `status_request` - Server list ping (if applicable)
  - Attributes: `adapter_type`
- `discovery` - Target discovery
  - Attributes: `adapter_type`, `targets_found`
- `strategy` - Target selection
  - Attributes: `adapter_type`, `selected_target`
- `target_connection` - Backend server connection
  - Attributes: `target_identifier`, `target_address`
Viewing Traces
In Grafana, navigate to Explore → Traces and search by:

- Service name: `passage`
- Operation: `connection`, `status_request`, `discovery`, etc.
- Attributes: `username`, `client_ip`, `target_identifier`
Prometheus Setup
If you’re using Prometheus instead of Grafana Cloud, set up an OpenTelemetry Collector:
OpenTelemetry Collector Configuration
```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:4318"

processors:
  batch:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
  jaeger:
    endpoint: "jaeger:14250"
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger]
```

Docker Compose Setup
```yaml
version: '3.8'

services:
  passage:
    image: ghcr.io/scrayosnet/passage:latest
    ports:
      - "25565:25565"
    environment:
      - PASSAGE_OTEL_METRICS_ENDPOINT=http://otel-collector:4318/v1/metrics
      - PASSAGE_OTEL_TRACES_ENDPOINT=http://otel-collector:4318/v1/traces
    depends_on:
      - otel-collector

  otel-collector:
    image: otel/opentelemetry-collector:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4318:4318"
      - "8889:8889"

  prometheus:
    image: prom/prometheus:latest
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    depends_on:
      - prometheus
```

Prometheus Configuration
```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'passage'
    static_configs:
      - targets: ['otel-collector:8889']
```

Grafana Dashboards
Section titled “Grafana Dashboards”Creating a Dashboard
- Navigate to Grafana
- Create → Dashboard
- Add Panel
Example Queries
Connection Rate

```promql
rate(passage_connections_total[5m])
```

Active Connections

```promql
passage_connections_active
```

Request Latency (p95)

```promql
histogram_quantile(0.95, rate(passage_request_duration_seconds_bucket[5m]))
```

Error Rate

```promql
rate(passage_connections_failed[5m])
```

Adapter Performance

```promql
histogram_quantile(0.99, rate(passage_adapter_duration_seconds_bucket[5m]))
```

Target Distribution

```promql
sum(passage_target_selections_total) by (target_identifier)
```

Pre-Built Dashboard
A pre-built Grafana dashboard is available in the Passage repository:
```sh
# Import from file
curl -o passage-dashboard.json \
  https://raw.githubusercontent.com/scrayosnet/passage/main/docs/grafana-dashboard.json

# Import in Grafana:
# Dashboard → Import → Upload JSON file
```

Logging
Passage uses structured logging with configurable log levels.
Log Levels
Set via the `RUST_LOG` environment variable:
```sh
# Error only
RUST_LOG=error passage

# Info (default)
RUST_LOG=info passage

# Debug
RUST_LOG=debug passage

# Trace (very verbose)
RUST_LOG=trace passage

# Per-module levels
RUST_LOG=passage=debug,passage::adapter=trace passage
```

Log Format
Logs are output in JSON format:
```json
{
  "timestamp": "2024-02-05T10:30:45.123Z",
  "level": "INFO",
  "target": "passage::connection",
  "message": "Connection established",
  "client_ip": "192.168.1.100",
  "username": "Steve",
  "user_id": "069a79f4-44e9-4726-a5be-fca90e38aaf5",
  "target": "hub-1"
}
```

Centralized Logging
Section titled “Centralized Logging”Loki (Grafana)
Section titled “Loki (Grafana)”services: passage: image: ghcr.io/scrayosnet/passage:latest logging: driver: loki options: loki-url: "http://loki:3100/loki/api/v1/push" loki-batch-size: "400"
loki: image: grafana/loki:latest ports: - "3100:3100" volumes: - ./loki-config.yaml:/etc/loki/local-config.yamlELK Stack
Section titled “ELK Stack”services: passage: image: ghcr.io/scrayosnet/passage:latest logging: driver: fluentd options: fluentd-address: localhost:24224 tag: passage
fluentd: image: fluent/fluentd:latest ports: - "24224:24224" volumes: - ./fluent.conf:/fluentd/etc/fluent.confSentry Error Tracking
Sentry provides real-time error tracking and alerting.
Configuration
```toml
[sentry]
enabled = true
debug = false
environment = "production"
```

Environment Variables
```sh
export PASSAGE_SENTRY_ENABLED=true
export PASSAGE_SENTRY_ADDRESS=https://your-key@sentry.io/project-id
export PASSAGE_SENTRY_ENVIRONMENT=production
```

What Gets Reported
Sentry captures:
- Panic/crash events
- Adapter errors
- Connection failures
- Configuration errors
Each event includes:
- Stack traces
- Request context (username, IP, target)
- Environment information
- Custom tags and metadata
Viewing Errors
In Sentry:

- Navigate to Issues
- Filter by environment (`production`)
- View stack traces and context
- Set up alerts for new/recurring errors
Health Checks
Passage doesn’t expose a dedicated health check endpoint, but you can monitor health by:
Connection Test

```sh
# Test if Passage is accepting connections
nc -zv localhost 25565
```

Minecraft Status Check

```sh
# Using mcstatus tool
pip install mcstatus
mcstatus localhost:25565 status
```

Kubernetes Liveness Probe

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: passage
spec:
  containers:
    - name: passage
      image: ghcr.io/scrayosnet/passage:latest
      livenessProbe:
        tcpSocket:
          port: 25565
        initialDelaySeconds: 5
        periodSeconds: 10
      readinessProbe:
        tcpSocket:
          port: 25565
        initialDelaySeconds: 2
        periodSeconds: 5
```

Alerting
Section titled “Alerting”Grafana Alerts
Create alerts in Grafana for:
High Error Rate
```promql
rate(passage_connections_failed[5m]) > 0.1
```

High Latency

```promql
histogram_quantile(0.95, rate(passage_request_duration_seconds_bucket[5m])) > 0.5
```

No Active Connections (possible crash)

```promql
passage_connections_active == 0
```

Adapter Errors

```promql
rate(passage_adapter_errors_total[5m]) > 0.05
```

Prometheus Alertmanager
```yaml
groups:
  - name: passage
    interval: 30s
    rules:
      - alert: PassageHighErrorRate
        expr: rate(passage_connections_failed[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} errors/sec"

      - alert: PassageHighLatency
        expr: histogram_quantile(0.95, rate(passage_request_duration_seconds_bucket[5m])) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency detected"
          description: "P95 latency is {{ $value }}s"

      - alert: PassageDown
        expr: up{job="passage"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Passage is down"
          description: "Passage instance has been down for more than 1 minute"
```

Performance Tuning
Section titled “Performance Tuning”Identify Bottlenecks
Use metrics to identify performance issues:

- High `passage_adapter_duration_seconds` → Optimize adapters
- High `passage_request_duration_seconds` → Check adapter performance
- High `passage_target_connection_failures_total` → Backend server issues
- High `passage_connections_rate_limited` → Adjust rate limiter settings
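The share of connections hitting the rate limiter can be tracked directly before deciding to loosen the limits. A PromQL sketch from the connection counters above:

```promql
sum(rate(passage_connections_rate_limited[5m]))
/
sum(rate(passage_connections_total[5m]))
```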
Adapter Optimization
- Keep adapter response times under 50ms
- Use caching where appropriate
- Implement connection pooling
- Monitor adapter-specific metrics
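Passage doesn’t prescribe how adapters cache. Purely as an illustration, a minimal TTL cache like the following (a hypothetical helper, not part of Passage) keeps a slow lookup out of the hot path:

```python
import time

class TTLCache:
    """Tiny TTL cache: returns a cached value until it expires."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh, skip the slow lookup
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value

# Example: cache a (pretend) discovery lookup for 5 seconds
cache = TTLCache(ttl_seconds=5.0)
calls = []
targets = cache.get_or_compute("discovery", lambda: calls.append(1) or ["hub-1", "hub-2"])
targets = cache.get_or_compute("discovery", lambda: calls.append(1) or ["hub-1", "hub-2"])
print(len(calls))  # 1 - the second call was served from cache
```

The trade-off is staleness: a 5-second TTL means a drained backend can still be selected for up to 5 seconds, so pick a TTL shorter than your tolerance for stale targets.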
Best Practices
Section titled “Best Practices”Monitoring
- Set up dashboards for key metrics
- Configure alerts for critical issues
- Monitor both Passage and adapters
- Track long-term trends
Logging
- Use structured logging (JSON)
- Set appropriate log levels
- Aggregate logs centrally
- Include context (username, IP, target)
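As a small illustration of working with the JSON log format shown earlier, lines can be parsed and filtered with a few lines of Python (field names match the example log entry; the helper functions themselves are hypothetical):

```python
import json

def parse_log_line(line: str) -> dict:
    """Parse one JSON-formatted Passage log line into a dict."""
    return json.loads(line)

def is_error(entry: dict) -> bool:
    """True for WARN/ERROR entries, the ones usually worth alerting on."""
    return entry.get("level") in ("WARN", "ERROR")

# Field names taken from the example entry in "Log Format"
line = '{"timestamp": "2024-02-05T10:30:45.123Z", "level": "INFO", "message": "Connection established", "username": "Steve"}'
entry = parse_log_line(line)
print(entry["username"], is_error(entry))  # Steve False
```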
Observability
- Enable OpenTelemetry in production
- Use distributed tracing for debugging
- Monitor adapter performance
- Track error rates and latency
Security
- Protect metrics endpoints
- Secure OTLP credentials
- Monitor for unusual patterns
- Set up security alerts
Troubleshooting
Section titled “Troubleshooting”No Metrics Appearing
- Check OTLP endpoints:

  ```sh
  curl -v $PASSAGE_OTEL_METRICS_ENDPOINT
  ```

- Verify authentication:

  ```sh
  echo $PASSAGE_OTEL_METRICS_TOKEN | base64 -d
  ```

- Check logs:

  ```sh
  RUST_LOG=debug passage
  ```
High Latency
- Check adapter metrics: `passage_adapter_duration_seconds`
- View traces in Grafana
- Optimize slow adapters
- Consider caching
Missing Traces
Section titled “Missing Traces”- Ensure traces endpoint is configured
- Check sample rate (default: 100%)
- Verify network connectivity
- Check OpenTelemetry Collector logs