Monitoring stack with Grafana, Prometheus, and Loki.


Overview

The monitoring stack at dashboard.spacemusic.tv provides metrics, logs, and alerting for all SpaceMusic services.

| Component | Image | Purpose |
| --- | --- | --- |
| Grafana | grafana/grafana:latest | Visualization and dashboards |
| Prometheus | prom/prometheus:latest | Metrics collection (30-day retention) |
| Loki | grafana/loki:latest | Log aggregation |
| Promtail | grafana/promtail:latest | Log shipping (Docker + /var/log) |
| cAdvisor | gcr.io/cadvisor/cadvisor:latest | Container metrics |
| node-exporter | prom/node-exporter:latest | Host system metrics |

Access

URL: dashboard.spacemusic.tv

Authentication: Authentik OIDC (native Grafana OAuth integration). Role mapping by Authentik group:

| Authentik Group | Grafana Role |
| --- | --- |
| spacemusic-admins | Admin |
| spacemusic-studio | Editor |
| All others | Viewer |
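
The mapping above can be read as a first-match lookup. The sketch below only illustrates that logic in Python; Grafana applies the real mapping itself during OAuth login. The function name `grafana_role` is hypothetical, and the precedence for users who belong to both groups (Admin wins) is an assumption, since the table does not say.

```python
# Illustrative only: Grafana performs this mapping itself during OAuth login.
# Assumption: a user in both groups gets Admin (the table does not specify this).
ROLE_BY_GROUP = [
    ("spacemusic-admins", "Admin"),
    ("spacemusic-studio", "Editor"),
]
DEFAULT_ROLE = "Viewer"


def grafana_role(groups):
    """Return the Grafana role for a user's Authentik group names."""
    members = set(groups)
    for group, role in ROLE_BY_GROUP:  # first match wins
        if group in members:
            return role
    return DEFAULT_ROLE
```

For example, `grafana_role(["spacemusic-studio"])` returns `"Editor"`, and a user with no matching group falls through to `"Viewer"`.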

Provisioned Dashboards

All dashboards are provisioned from JSON files in the repository and auto-synced every 30 seconds. UI edits do not persist -- changes must be made in the JSON files and committed to git.

| Dashboard | UID | Description |
| --- | --- | --- |
| SpaceMusic Server (Home) | home | Portal with polystat service map, quick health metrics, recent logs |
| Server Overview | server-overview | CPU, RAM, disk usage, network I/O from node-exporter |
| Docker Fleet | docker-fleet | Per-container CPU, memory, network, disk I/O from cAdvisor |
| LiveKit Streaming | livekit-streaming | Active rooms, participants, track bandwidth, connection quality |
| Uptime & Alerts | uptime-alerts | Service state timeline, response times, SSL certificate expiry |
| Storage (MinIO) | storage | Bucket usage, API requests, disk utilization |
| Relay (Centrifugo) | relay | WebSocket connections, messages/sec, channel activity |
| Auth (Authentik) | auth | Request rate, latency, status codes, flow timing, DB queries, memory |
| API Gateway | api-gateway | Health status, container metrics, request logs |
| I/O Hub | io-hub | Placeholder (service not yet deployed) |

Datasources

| Datasource | UID | URL | Purpose |
| --- | --- | --- | --- |
| Prometheus | prometheus | http://prometheus:9090 | Metrics (default) |
| Loki | loki | http://loki:3100 | Logs |
| Infinity | infinity | -- | REST API queries (JSON/CSV/XML) |

Prometheus Scrape Targets

Prometheus collects metrics from all SpaceMusic services:

| Job | Target | Notes |
| --- | --- | --- |
| prometheus | localhost:9090 | Self-scrape |
| node_exporter | node-exporter:9100 | Host CPU, RAM, disk, network |
| cadvisor | cadvisor:8080 | Container metrics (filtered to key metrics) |
| livekit | 172.17.0.1:7889 | Host network, reached via Docker gateway IP |
| minio | spacemusic-minio:9000 | Bearer token auth |
| centrifugo | spacemusic-centrifugo:8000 | WebSocket relay metrics |
| authentik | authentik-server:9300 | SSO service metrics |
| kuvasz | spacemusic-kuvasz:8080 | Uptime monitoring metrics (bearer token auth) |

Adding a new scrape target

After adding a scrape target that references a container on an external Docker network, restart Prometheus with docker compose restart prometheus. A SIGHUP reload is not sufficient if DNS resolution for the new container name failed on the first attempt.
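
After the restart, it can help to confirm that the new target is actually being scraped. The following is a minimal sketch that reads Prometheus' `/api/v1/targets` endpoint and lists every active target whose health is not `up`. It assumes the script runs somewhere that can resolve `prometheus:9090` (for example, a container on the same Docker network); the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: Prometheus is reachable at the address listed under Datasources.
PROMETHEUS_URL = "http://prometheus:9090"


def unhealthy_targets(payload):
    """Return (job, instance, last_error) for each active target that is not 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (
            t.get("labels", {}).get("job", "?"),
            t.get("labels", {}).get("instance", "?"),
            t.get("lastError", ""),
        )
        for t in targets
        if t.get("health") != "up"
    ]


def fetch_targets(base_url=PROMETHEUS_URL, timeout=5):
    """Fetch the parsed JSON response from Prometheus' /api/v1/targets endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)
```

Calling `unhealthy_targets(fetch_targets())` returns an empty list when every target is healthy. A DNS failure for a new container name would typically show up in the `last_error` field of the affected job.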

Plugins

| Plugin | Purpose |
| --- | --- |
| grafana-polystat-panel | Hexagon service map on the home dashboard |
| grafana-clock-panel | Live server clock widget |
| yesoreyeram-infinity-datasource | Query REST APIs as datasources |

Logs

Promtail ships all Docker container logs and /var/log system logs to Loki. Query them in Grafana Explore with LogQL, for example:

{container_name="spacemusic-api"} | json | status >= 400
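
The same query can also be sent to Loki's HTTP API outside Grafana. The sketch below builds a request URL for the `/loki/api/v1/query_range` endpoint. It assumes Loki is reachable at `http://loki:3100` (the datasource URL, normally only resolvable inside the Docker network); the helper names are hypothetical and the code only constructs the URL without sending it.

```python
from urllib.parse import urlencode

# Assumption: Loki is reachable at the address listed under Datasources.
LOKI_URL = "http://loki:3100"


def error_query(container_name, min_status=400):
    """Build the LogQL query shown above for an arbitrary container name."""
    return f'{{container_name="{container_name}"}} | json | status >= {min_status}'


def loki_query_range_url(query, start_ns, end_ns, limit=100, base_url=LOKI_URL):
    """Build a URL for Loki's query_range endpoint.

    start_ns and end_ns are Unix timestamps in nanoseconds.
    """
    params = {
        "query": query,
        "start": str(start_ns),
        "end": str(end_ns),
        "limit": str(limit),
        "direction": "backward",  # newest entries first
    }
    return f"{base_url}/loki/api/v1/query_range?{urlencode(params)}"
```

`error_query("spacemusic-api")` reproduces the query above exactly, and `urlencode` takes care of escaping the braces, quotes, and pipes for the query string.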

The home dashboard includes a recent logs panel for quick troubleshooting.

Editing Dashboards

Dashboards are provisioned with disableDeletion: true and allowUiUpdates: false. To make changes:

  1. Edit the JSON file in services/dashboard/spacemusic-dashboard/grafana/dashboards/
  2. Commit and push to main
  3. GitHub Actions deploys the change; Grafana picks it up within 30 seconds
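
Because a broken JSON file only surfaces after deployment, a quick local check before step 2 may save a round trip. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and the choice of checks (valid JSON, non-empty `uid` and `title`, no UID shared between files) is an assumption about what is useful rather than Grafana's own validation.

```python
import json


def dashboard_problems(text):
    """Return a list of problems found in one dashboard JSON document."""
    try:
        dashboard = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(dashboard, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    for key in ("uid", "title"):
        if not dashboard.get(key):
            problems.append(f"missing or empty '{key}'")
    return problems


def duplicate_uids(dashboards):
    """Given {filename: parsed dashboard}, return UIDs used by more than one file."""
    seen = {}
    for name, dashboard in dashboards.items():
        seen.setdefault(dashboard.get("uid"), []).append(name)
    return {uid: names for uid, names in seen.items() if uid and len(names) > 1}
```

A file that passes returns an empty list from `dashboard_problems`; running `duplicate_uids` over all parsed files in the dashboards directory flags any UID that two files would both try to claim.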