Monitoring & Observability
mik exposes three observability signals: Prometheus metrics, structured JSON logs, and OpenTelemetry traces.
Prometheus Metrics
mik exposes metrics at /metrics in Prometheus format.
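For a quick look at the endpoint without a Prometheus server, the text format can be parsed directly. The sketch below is a minimal illustration, not an official client: the sample payload, its label names (path, status), and the helper names parse_metrics and requests_by_status are assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical sample of what GET /metrics might return. The label names
# (path, status) and all values are assumptions for illustration only.
SAMPLE = """\
# HELP mik_http_requests_total Total HTTP requests by path and status
# TYPE mik_http_requests_total counter
mik_http_requests_total{path="/run/api/",status="200"} 980
mik_http_requests_total{path="/run/api/",status="500"} 20
mik_active_requests 3
"""

# Metric name, optional {labels}, then the sample value.
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, value) samples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def requests_by_status(samples):
    """Sum mik_http_requests_total across paths, grouped by status."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == "mik_http_requests_total":
            totals[labels.get("status", "")] += value
    return dict(totals)
```

Calling requests_by_status(parse_metrics(SAMPLE)) groups the two request counters by their status label. For anything beyond ad hoc inspection, a real Prometheus server or client library is the better choice.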
Key Metrics
| Metric | Type | Description |
|---|---|---|
| mik_http_requests_total | Counter | Total HTTP requests by path and status |
| mik_http_request_duration_seconds | Histogram | Request latency distribution |
| mik_wasm_execution_duration_seconds | Histogram | WASM handler execution time |
| mik_module_cache_hits_total | Counter | AOT cache hits |
| mik_module_cache_misses_total | Counter | AOT cache misses |
| mik_circuit_breaker_state | Gauge | Circuit breaker state (0=closed, 1=open, 2=half-open) |
| mik_active_requests | Gauge | Currently processing requests |
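The numeric encoding of mik_circuit_breaker_state is easy to misread in scripts and ad hoc checks. A small sketch of a decoder follows, assuming the scraped value arrives as a float. BreakerState and describe_breaker are illustrative names and not part of mik.

```python
from enum import IntEnum


class BreakerState(IntEnum):
    """Values documented for the mik_circuit_breaker_state gauge."""
    CLOSED = 0
    OPEN = 1
    HALF_OPEN = 2


def describe_breaker(gauge_value):
    """Translate a scraped gauge value into a readable state name."""
    try:
        state = BreakerState(int(gauge_value))
    except ValueError:
        # Covers values outside 0..2 as well as NaN samples.
        return "unknown"
    return state.name.lower().replace("_", "-")
```

For example, describe_breaker(1.0) yields "open", and any value outside the documented range falls back to "unknown".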
Daemon Metrics (port 9919)
| Metric | Type | Description |
|---|---|---|
| mik_instance_count | Gauge | Running/stopped/crashed instances |
| mik_instance_uptime_seconds | Gauge | Instance uptime |
| mik_kv_operations_total | Counter | KV operations by type |
| mik_sql_queries_total | Counter | SQL queries by type |
| mik_storage_operations_total | Counter | Storage operations by type |
| mik_cron_executions_total | Counter | Cron job executions |
| mik_cron_execution_duration_seconds | Histogram | Cron job duration |
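Histogram metrics such as mik_cron_execution_duration_seconds are exposed as cumulative buckets, so percentiles are estimated rather than measured exactly. The sketch below illustrates the usual linear-interpolation approach on cumulative buckets. It is a simplified approximation of what Prometheus does, and the bucket boundaries in the usage note are made up for the example.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    buckets must be sorted by upper bound and end with (math.inf, total).
    The lower bound of the first bucket is assumed to be 0.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must end with the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, nothing to estimate
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # Cannot interpolate into +Inf; report the last finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

With hypothetical buckets [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)], the median lands on the upper edge of the first bucket. The accuracy of any estimate depends on how well the bucket boundaries match the real latency distribution.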
Prometheus Scrape Config

    scrape_configs:
      - job_name: 'mik'
        static_configs:
          - targets: ['localhost:3000']
        metrics_path: /metrics
        scrape_interval: 15s

      - job_name: 'mik-daemon'
        static_configs:
          - targets: ['localhost:9919']
        metrics_path: /metrics
        scrape_interval: 15s

Grafana Dashboard
Importing the Dashboard
- Open Grafana
- Navigate to Dashboards > Import
- Import from examples/deploy/grafana/dashboard.json
Key Panels
Request Overview
- Request rate (requests/second)
- Error rate (4xx, 5xx responses)
- Latency percentiles (P50, P95, P99)
WASM Execution
- Execution time histogram
- Module-by-module breakdown
- Timeout occurrences
Cache Performance
- Cache hit ratio
- Cache size (entries and bytes)
- Eviction rate
Reliability
- Circuit breaker states per module
- Rate limiting rejections
- Active connections
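Several of these panels are ratios of counter rates. As a rough illustration of the arithmetic behind a panel such as the cache hit ratio, the sketch below derives per-second rates from two scrapes of a counter and combines them. The function names are hypothetical, and the reset handling is a simplification of what Prometheus's rate() does (it ignores extrapolation at the window edges).

```python
def counter_rate(previous, current, interval_seconds):
    """Per-second increase of a counter between two scrapes.

    A drop in value is treated as a counter reset (for example after a
    restart), so the current value is taken as the increase since then.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    increase = current - previous if current >= previous else current
    return increase / interval_seconds


def hit_ratio(hit_rate, miss_rate):
    """Cache hit ratio in [0, 1], or None when there was no traffic."""
    total = hit_rate + miss_rate
    return hit_rate / total if total > 0 else None
```

Returning None for an idle cache avoids a division by zero and keeps "no traffic" distinct from "0% hits", which matters when the ratio feeds an alert.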
Example Grafana Queries

    # Request rate by status
    sum by (status) (rate(mik_http_requests_total[5m]))

    # P99 latency
    histogram_quantile(0.99, rate(mik_http_request_duration_seconds_bucket[5m]))

    # Cache hit ratio
    sum(rate(mik_module_cache_hits_total[5m])) /
    (sum(rate(mik_module_cache_hits_total[5m])) + sum(rate(mik_module_cache_misses_total[5m])))

    # Circuit breaker open
    mik_circuit_breaker_state == 1

Alerting
Recommended Alerts
Create alert rules in Prometheus or Grafana:
High Error Rate

    - alert: MikHighErrorRate
      expr: |
        sum(rate(mik_http_requests_total{status=~"5.."}[5m]))
        / sum(rate(mik_http_requests_total[5m])) > 0.01
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "mik error rate above 1%"
        description: "Error rate is {{ $value | humanizePercentage }}"

High Latency

    - alert: MikHighLatency
      expr: |
        histogram_quantile(0.99, rate(mik_http_request_duration_seconds_bucket[5m])) > 1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "mik P99 latency above 1s"

Circuit Breaker Open

    - alert: MikCircuitBreakerOpen
      expr: mik_circuit_breaker_state == 1
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "Circuit breaker open for {{ $labels.module }}"

Low Cache Hit Ratio

    - alert: MikLowCacheHitRatio
      expr: |
        sum(rate(mik_module_cache_hits_total[5m]))
        / (sum(rate(mik_module_cache_hits_total[5m])) + sum(rate(mik_module_cache_misses_total[5m]))) < 0.8
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Cache hit ratio below 80%"

Structured Logging
mik uses structured JSON logging via the tracing crate.
Log Format

    {
      "timestamp": "2025-01-15T10:30:00.123456Z",
      "level": "INFO",
      "target": "mik::runtime",
      "message": "Module loaded",
      "module": "auth",
      "duration_ms": 45,
      "span": {
        "request_id": "abc-123",
        "trace_id": "def-456"
      }
    }

Log Levels
| Level | Use Case |
|---|---|
| ERROR | Failures requiring immediate attention |
| WARN | Potential issues (auth failures, timeouts, circuit breaker trips) |
| INFO | Normal operations (module loads, requests) |
| DEBUG | Detailed debugging (request details, cache operations) |
| TRACE | Very verbose (WASM execution details) |
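Because each log line is a JSON object with a level field, logs can be filtered by severity with a few lines of code when a full log pipeline is not at hand. The sketch below assumes one JSON object per line, as in the Log Format section, and treats the table order as the severity order. The helper name at_least is hypothetical.

```python
import json

# Severity order matching the table above, most to least severe.
LEVELS = ["ERROR", "WARN", "INFO", "DEBUG", "TRACE"]


def at_least(lines, min_level):
    """Yield parsed log records at min_level severity or higher."""
    threshold = LEVELS.index(min_level)
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(record, dict):
            continue
        level = record.get("level")
        if level in LEVELS and LEVELS.index(level) <= threshold:
            yield record
```

Passing an open log file as lines and "WARN" as min_level would yield only WARN and ERROR records. Malformed lines are skipped rather than raised so a single bad line does not abort the scan.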
Configuring Log Level

    # Set via environment variable
    RUST_LOG=info mik run

    # More granular control
    RUST_LOG=mik=debug,mik::runtime=trace mik run

    # Quiet mode (errors only)
    RUST_LOG=error mik run

Log Rotation
Configure in mik.toml:

    [server]
    log_max_size_mb = 50  # Rotate when file reaches 50MB
    log_max_files = 10    # Keep 10 rotated files

Shipping Logs
To Loki (via Promtail)

    server:
      http_listen_port: 9080

    positions:
      filename: /tmp/positions.yaml

    clients:
      - url: http://loki:3100/loki/api/v1/push

    scrape_configs:
      - job_name: mik
        static_configs:
          - targets:
              - localhost
            labels:
              job: mik
              __path__: /var/log/mik/*.log
        pipeline_stages:
          - json:
              expressions:
                level: level
                module: module
                trace_id: span.trace_id
          - labels:
              level:
              module:

To Elasticsearch (via Filebeat)

    filebeat.inputs:
      - type: log
        enabled: true
        paths:
          - /var/log/mik/*.log
        json.keys_under_root: true

    output.elasticsearch:
      hosts: ["elasticsearch:9200"]
      index: "mik-%{+yyyy.MM.dd}"

Distributed Tracing
mik supports OpenTelemetry tracing with W3C Trace Context propagation.
Configuration
Enable in mik.toml:

    [tracing]
    service_name = "my-api"
    otlp_endpoint = "http://localhost:4317"

Trace Structure

    [HTTP Request]
      |
      +-- [Route Matching]
      |
      +-- [WASM Execution]
      |     |
      |     +-- [Module Load (if cache miss)]
      |     |
      |     +-- [Handler Invocation]
      |
      +-- [Response Serialization]

Trace Context Propagation
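Incoming requests with a traceparent header are linked to the parent trace. The header value must follow the W3C format: a 2-hex-digit version, a 32-hex-digit trace ID, a 16-hex-digit parent ID, and 2-hex-digit flags.

    curl -H "traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01" http://localhost:3000/run/api/

Outbound HTTP calls from handlers automatically propagate trace context.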
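When testing propagation by hand, it helps to generate and validate traceparent values programmatically. The sketch below follows the W3C Trace Context layout (version, 32-hex trace ID, 16-hex parent ID, flags) and only accepts version 00. The function names are hypothetical, and this is a test helper rather than a description of how mik parses the header.

```python
import re
import secrets

# Version "00", 32-hex trace-id, 16-hex parent-id, 2-hex flags.
TRACEPARENT_RE = re.compile(
    r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def new_traceparent(sampled=True):
    """Build a traceparent header value with random IDs."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex chars
    parent_id = secrets.token_hex(8)   # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"


def parse_traceparent(header):
    """Return the header's fields as a dict, or None if it is malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are invalid in the W3C specification.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 1),
    }
```

A value from new_traceparent() can be passed to curl with -H to start a fresh trace, and parse_traceparent can confirm that a header captured downstream carries the same trace ID.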
Jaeger Setup

    services:
      jaeger:
        image: jaegertracing/all-in-one:latest
        ports:
          - "16686:16686"  # UI
          - "4317:4317"    # OTLP gRPC
        environment:
          - COLLECTOR_OTLP_ENABLED=true

      mik:
        image: ghcr.io/dufeutech/mik:latest
        environment:
          - RUST_LOG=info
        volumes:
          - ./mik.toml:/app/mik.toml

With mik.toml:

    [tracing]
    service_name = "my-api"
    otlp_endpoint = "http://jaeger:4317"

Grafana Tempo Setup

    services:
      tempo:
        image: grafana/tempo:latest
        command: ["-config.file=/etc/tempo.yaml"]
        volumes:
          - ./tempo.yaml:/etc/tempo.yaml

      grafana:
        image: grafana/grafana:latest
        ports:
          - "3001:3000"
        environment:
          - GF_AUTH_ANONYMOUS_ENABLED=true

Observability Stack
Complete observability setup with Docker Compose:

    services:
      mik:
        image: ghcr.io/dufeutech/mik:latest
        ports:
          - "3000:3000"
        volumes:
          - ./:/app
        environment:
          - RUST_LOG=info

      prometheus:
        image: prom/prometheus:latest
        ports:
          - "9090:9090"
        volumes:
          - ./prometheus.yml:/etc/prometheus/prometheus.yml

      grafana:
        image: grafana/grafana:latest
        ports:
          - "3001:3000"
        environment:
          - GF_AUTH_ANONYMOUS_ENABLED=true
        volumes:
          - ./grafana/dashboards:/var/lib/grafana/dashboards

      loki:
        image: grafana/loki:latest
        ports:
          - "3100:3100"

      tempo:
        image: grafana/tempo:latest
        ports:
          - "4317:4317"

Health Endpoints
| Endpoint | Purpose | Response |
|---|---|---|
| /health | Basic liveness | {"status": "ready", ...} |
| /metrics | Prometheus metrics | Text format |
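A liveness probe or deploy script typically needs to turn the /health response into a yes/no answer. The sketch below shows one way to interpret it, assuming a healthy instance answers with HTTP 200 (the status code is an assumption, not stated here) and a JSON body whose status field is "ready". The helper names and the cache_size and cache_capacity handling are illustrative.

```python
import json


def is_healthy(status_code, body):
    """True when /health answered 200 with a JSON status of "ready"."""
    if status_code != 200:  # assumed success code
        return False
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return False  # empty or non-JSON body
    return isinstance(payload, dict) and payload.get("status") == "ready"


def cache_utilization(payload):
    """Fraction of cache slots in use, or None if capacity is unknown."""
    capacity = payload.get("cache_capacity", 0)
    if not capacity:
        return None
    return payload.get("cache_size", 0) / capacity
```

The HTTP call itself is left out so the logic stays testable without a running instance. Any HTTP client can supply status_code and body.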
Health Response

    {
      "status": "ready",
      "timestamp": "2025-01-15T10:30:00Z",
      "cache_size": 5,
      "cache_capacity": 100,
      "cache_bytes": 1048576,
      "total_requests": 1000
    }

Next Steps
- Operations Runbook - Troubleshooting common issues
- Production Deployment - Full deployment guide
- Reliability Features - Circuit breaker, rate limiting