Tailscale Monitoring Stack

Unified observability across your tailnet mesh
Research Briefing — February 26, 2026
3 hosts • Prometheus + Grafana + Loki • Grafana Alloy


The Challenge

3 hosts on a Tailscale mesh, 1 central monitoring stack

Mac Mini (Ocasia)

macOS • 100.x.x.4

Central monitoring stack via Colima Docker: Prometheus, Grafana, Loki, Alertmanager, cAdvisor, node-exporter

DigitalOcean Docker

Linux • 100.x.x.45

penny-backtest, penny-web containers. No monitoring agent yet.

Raspberry Pi 5

Ubuntu 24.04 • 100.x.x.8

Brand new, 8GB RAM, nothing installed yet. Future AI agent host.

Goals

The Answer: Grafana Alloy

Verdict: Deploy Grafana Alloy on every remote host. It replaces both node-exporter AND Promtail in a single binary. Promtail hits EOL on March 2, 2026.

Before (Old Stack)

node-exporter   → Prometheus scrapes :9100
promtail        → pushes logs to Loki
                   (2 processes, 2 configs)

After (Alloy)

grafana-alloy   → pushes metrics via remote_write
                → pushes logs via loki.write
                   (1 process, 1 config)

Why Alloy Wins

Promtail is Dead

Promtail End of Life: March 2, 2026 — No new features since early 2025. Grafana Labs has officially deprecated it in favor of Alloy.

Migration Path

# Convert existing promtail config to Alloy format automatically
alloy convert --source-format=promtail --output=/etc/alloy/config.alloy promtail.yaml

Log Shipper Comparison

Agent          | RAM          | Loki Support | Verdict
Grafana Alloy  | ~100-200 MiB | Native       | Recommended
Fluent Bit     | ~10-20 MiB   | Plugin       | Fallback
Vector         | ~100-200 MiB | Native       | Good, but not Grafana-native
Promtail       | ~50 MiB      | Native       | EOL Mar 2026

If the Pi ever gets RAM-constrained, Fluent Bit (~10 MiB) for logs + a bare node-exporter binary (~15 MiB) is the ultra-lean fallback.
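That fallback can be sketched as a minimal Fluent Bit config (hypothetical; the 100.x.x.4 Loki endpoint matches the rest of this deck, and the label names are illustrative):

```ini
# Lean-fallback sketch: tail the systemd journal and push to Loki
# on the central host. Pair with a bare node-exporter for metrics.
[SERVICE]
    Flush     5

[INPUT]
    Name      systemd
    Tag       journal

[OUTPUT]
    Name      loki
    Match     journal
    Host      100.x.x.4
    Port      3100
    Labels    job=journal, host=${HOSTNAME}
```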

Alloy Resource Footprint

Official Estimates (from Grafana docs)

Workload                                  | CPU             | RAM
Metrics (per 1M active series)            | ~0.4 cores      | ~11 GiB
Logs (per 1 MiB/s throughput)             | ~1 core         | ~120 MiB
Homelab (basic metrics + low-volume logs) | 0.05-0.15 cores | 80-200 MiB

Comparison on Raspberry Pi 5 (8GB)

Alloy (unified)

~100-200 MiB RAM as systemd service

~500 MiB if run in Docker (overhead)

Run as systemd, not Docker, on Pi

Separate (old way)

node-exporter: ~15 MiB

Promtail: ~50 MiB

Total: ~65 MiB (but 2 processes)

The tradeoff: ~3x more RAM for consolidation + future-proofing. On an 8GB Pi with 7.4 GiB free, 200 MiB is nothing.

Alloy Configuration

Complete remote host agent config

// /etc/alloy/config.alloy — drop on every remote host

// ═══ METRICS (replaces node-exporter) ═══
prometheus.exporter.unix "host" {
  set_collectors = ["cpu","meminfo","diskstats",
    "filesystem","netdev","loadavg","uname","processes"]
}

prometheus.scrape "host_metrics" {
  targets    = prometheus.exporter.unix.host.targets
  forward_to = [prometheus.relabel.add_host.receiver]
  scrape_interval = "30s"
}

prometheus.relabel "add_host" {
  forward_to = [prometheus.remote_write.central.receiver]
  rule {
    target_label = "host"
    replacement  = constants.hostname
  }
}

prometheus.remote_write "central" {
  endpoint {
    url = "http://100.x.x.4:9090/api/v1/write"
  }
}

// ═══ LOGS (replaces promtail) ═══
loki.source.journal "journal" {
  max_age    = "12h"
  forward_to = [loki.write.central.receiver]
  labels     = { host = constants.hostname, job = "journal" }
}

loki.write "central" {
  endpoint {
    url = "http://100.x.x.4:3100/loki/api/v1/push"
  }
  external_labels = { host = constants.hostname }
}

Tailscale Native Metrics

Built-in since Tailscale v1.78 — zero dependencies

Each Tailscale node exposes Prometheus metrics natively. Enable with:

tailscale set --webclient    # exposes :5252/metrics over the tailnet

Available Metrics

Metric                                   | Type    | What it tells you
tailscaled_inbound_bytes_total           | counter | Inbound bytes by path (direct_ipv4, derp, etc.)
tailscaled_outbound_bytes_total          | counter | Outbound bytes by path
tailscaled_inbound_dropped_packets_total | counter | Dropped packets with reason labels
tailscaled_home_derp_region_id           | gauge   | Which DERP relay the node uses
tailscaled_health_messages               | gauge   | Health warnings (type label)
tailscaled_advertised_routes             | gauge   | Subnet routes advertised
tailscaled_approved_routes               | gauge   | Subnet routes approved

Path labels on throughput: direct_ipv4, direct_ipv6, derp — tells you if traffic is direct or relayed.
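Because the Prometheus scrape job later in this deck attaches a host label, a query like the following (illustrative PromQL) shows what share of each host's outbound traffic is being relayed:

```promql
# Fraction of outbound bytes going through DERP relays, per host
# (0 = fully direct; sustained values near 1 mean no direct path)
sum by (host) (rate(tailscaled_outbound_bytes_total{path="derp"}[5m]))
  /
sum by (host) (rate(tailscaled_outbound_bytes_total[5m]))
```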

Tailscale Exporters

Fleet-level visibility via the Tailscale API

Recommendation: Run adinhodovic/tailscale-exporter on the Mac Mini for tailnet-wide device monitoring. Combine with native client metrics on each host.
Exporter                       | What it monitors                   | Auth needed  | Status
adinhodovic/tailscale-exporter | Fleet: devices, users, keys, DNS   | OAuth client | v0.3.0, Dec 2025
Native client metrics (:5252)  | Per-host: throughput, DERP, health | None         | Built-in, v1.78+
josh/tailscale_exporter        | Device status, auth expiry         | API key      | Active
cfunkhouser/tailscalesd        | Service discovery (not metrics)    | API key      | Niche

Pre-built Grafana Dashboards

Pull vs Push Architecture

Verdict: Hybrid — Alloy pushes metrics + logs from remotes (simplest config), central Prometheus pulls Tailscale client metrics on :5252, and tailscale-exporter runs centrally.

Push (Alloy remote_write)

  • No Prometheus scrape targets to manage
  • Works through any network topology
  • Alloy does metrics + logs in one config
  • Remote hosts are self-contained

Pull (Prometheus scrape)

  • Traditional, well-understood model
  • "up" metric works (host-down alerting)
  • Tailscale eliminates the firewall objection
  • Best for Tailscale native metrics (:5252)

Why Hybrid?

Push for host metrics + logs (Alloy handles both, zero Prometheus config per host). Pull for Tailscale metrics (native :5252 endpoint, already exposed). Best of both worlds.
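One consequence of the hybrid model: the "up" metric only covers pulled targets, so host-down alerting for pushed metrics has to key off staleness instead. A sketch of both alert styles (rule names and the 10m window are illustrative; node_uname_info comes from the uname collector enabled in the Alloy config):

```yaml
groups:
  - name: host-health
    rules:
      # Pulled targets (:5252): classic "up"-based host-down alert
      - alert: TailscaleClientDown
        expr: up{job="tailscale-clients"} == 0
        for: 5m
      # Pushed metrics never flip "up" to 0 — alert when a host's
      # remote_write stream goes stale instead
      - alert: AlloyPushStale
        expr: absent_over_time(node_uname_info{host="raspberry-pi"}[10m])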

Security Model

Tailscale provides the security layer

Best Practices

Practice                  | How
Bind to Tailscale IP only | --web.listen-address=100.x.x.x:9100
Tailscale ACLs            | Restrict :9100, :5252, :3100 to the monitoring host only
No public exposure        | Never bind exporters to 0.0.0.0 on public-facing hosts
Access Grafana via tunnel | ssh -L 3000:localhost:3000 dan@mini

Example Tailscale ACL

{
  "acls": [
    // Central host scrapes exporters + client metrics on every node
    { "action": "accept",
      "src": ["100.x.x.4"],                  // Mac Mini only
      "dst": ["*:9100", "*:5252"] },         // monitoring ports
    // Remote Alloy agents push metrics + logs to the central stack
    { "action": "accept",
      "src": ["100.x.x.45", "100.x.x.8"],    // DO + Pi
      "dst": ["100.x.x.4:9090", "100.x.x.4:3100"] }
  ]
}

Target Architecture

Mac Mini — 100.x.x.4 (Central)
├── Docker (Colima):
│   ├── prometheus :9090       ← scrapes Tailscale :5252 on all hosts
│   │                          ← receives remote_write from Alloy agents
│   ├── grafana :3000          ← dashboards: node, ollama, tailscale
│   ├── loki :3100             ← receives log push from Alloy agents
│   ├── alertmanager :9093
│   ├── tailscale-exporter     ← fleet API metrics (OAuth)
│   ├── cadvisor, node-exporter, promtail (existing)
│   └── ollama-exporter :9101  ← already deployed
└── Native:
    └── tailscale client metrics :5252

DigitalOcean — 100.x.x.45
├── Docker: Grafana Alloy
│   ├── prometheus.exporter.unix → remote_write → Mac Mini :9090
│   └── loki.source.journal → loki.write → Mac Mini :3100
└── tailscale client metrics :5252

Raspberry Pi 5 — 100.x.x.8
├── Systemd: Grafana Alloy
│   ├── prometheus.exporter.unix → remote_write → Mac Mini :9090
│   └── loki.source.journal → loki.write → Mac Mini :3100
└── tailscale client metrics :5252

Prometheus Config Updates

Add to existing ~/monitoring-stack/prometheus/prometheus.yml

# Enable remote write receiver (add to docker-compose command)
# command: '--web.enable-remote-write-receiver'

scrape_configs:
  # ... existing jobs ...

  # Tailscale client metrics from all hosts
  - job_name: 'tailscale-clients'
    static_configs:
      - targets: ['host.docker.internal:5252']
        labels: { host: 'mac-mini' }
      - targets: ['100.x.x.45:5252']
        labels: { host: 'digitalocean' }
      - targets: ['100.x.x.8:5252']
        labels: { host: 'raspberry-pi' }
    scrape_interval: 30s

  # Tailscale fleet exporter (runs locally)
  - job_name: 'tailscale-fleet'
    static_configs:
      - targets: ['tailscale-exporter:9090']

Alloy pushes host metrics via remote_write — no scrape targets needed for those.
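The receiver flag mentioned in the comment above belongs in the Prometheus service definition; a sketch of what that looks like in docker-compose (the image tag and existing flags are assumptions about the current stack):

```yaml
  prometheus:
    image: prom/prometheus:latest
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      # Accept remote_write pushes from the Alloy agents
      - '--web.enable-remote-write-receiver'
```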

Grafana Dashboards

Import these dashboard IDs

ID     | Name                 | What it shows
1860   | Node Exporter Full   | CPU, memory, disk, network per host
24177  | Tailscale Overview   | Fleet: all devices, online/offline, OS, users
24178  | Tailscale Machine    | Per-device: throughput, DERP, latency
Custom | Ollama Observability | Already deployed (uid: ollama-ocasia-001)

Import via Grafana UI or API

# Via CLI (from Mac Mini): fetch the dashboard JSON from grafana.com,
# then wrap it for Grafana's import API
ID=24177   # or 24178, or 1860
curl -s "https://grafana.com/api/dashboards/${ID}/revisions/latest/download" \
  -o "/tmp/dash-${ID}.json"
jq -n --slurpfile d "/tmp/dash-${ID}.json" \
  '{dashboard: $d[0], overwrite: true, folderId: 0,
    inputs: [{name: "DS_PROMETHEUS", type: "datasource",
              pluginId: "prometheus", value: "Prometheus"}]}' \
  | curl -s -u dan:<grafana-password> \
      http://localhost:3000/api/dashboards/import \
      -H 'Content-Type: application/json' -d @-

Pi 5 Deployment Steps

Full install sequence for the Raspberry Pi

# 1. Install Grafana Alloy (ARM64)
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key \
  | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] \
  https://apt.grafana.com stable main" \
  | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update && sudo apt install -y alloy

# 2. Deploy config (see Slide 6 for full config)
sudo tee /etc/alloy/config.alloy < alloy-config.alloy

# 3. Enable Tailscale client metrics
sudo tailscale set --webclient

# 4. Start Alloy
sudo systemctl enable --now alloy

# 5. Verify
curl http://localhost:12345       # Alloy debug UI
journalctl -u alloy -f           # Alloy logs

Note: Run Alloy as a systemd service on the Pi, not in Docker, to minimize RAM overhead (~100-200 MiB vs ~500 MiB in Docker).

DigitalOcean Deployment

Run Alloy in Docker alongside existing containers

# Add to existing docker-compose.yml or run standalone

docker run -d \
  --name alloy \
  --restart always \
  --network host \
  -v /var/log:/var/log:ro \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  -v /etc/machine-id:/etc/machine-id:ro \
  -v "$(pwd)/alloy-config.alloy":/etc/alloy/config.alloy \
  grafana/alloy:latest \
  run /etc/alloy/config.alloy

# Notes: bind mounts need absolute paths, hence $(pwd); the journal
# reader uses /etc/machine-id to find this host's journal; and with
# /proc and /sys remapped, set procfs_path = "/host/proc" and
# sysfs_path = "/host/sys" in prometheus.exporter.unix.

Enable Tailscale metrics

sudo tailscale set --webclient

Tailscale Exporter (fleet metrics, runs on Mac Mini)

# Add to ~/monitoring-stack/docker-compose.yml
  tailscale-exporter:
    image: ghcr.io/adinhodovic/tailscale-exporter:latest
    container_name: tailscale-exporter
    restart: always
    environment:
      - TS_CLIENT_ID=<your-oauth-client-id>
      - TS_CLIENT_SECRET=<your-oauth-secret>
      - [email protected]

Implementation Checklist

Phase 1: Enable Prometheus remote_write

  • Add --web.enable-remote-write-receiver to Prometheus in docker-compose
  • Add Tailscale client scrape jobs to prometheus.yml
  • Restart Prometheus container

Phase 2: Deploy Alloy on Pi 5

  • Install Alloy via apt
  • Deploy config pointing to Mac Mini Tailscale IP
  • Enable tailscale set --webclient
  • Verify metrics in Prometheus, logs in Loki

Phase 3: Deploy Alloy on DO host

  • Run Alloy Docker container
  • Enable tailscale set --webclient
  • Verify metrics + logs flowing

Phase 4: Fleet monitoring

  • Create Tailscale OAuth client
  • Deploy tailscale-exporter container
  • Import dashboards 24177, 24178, 1860
  • Configure Tailscale ACLs
End state: All 3 hosts shipping metrics + logs to Ocasia's Docker stack. Tailscale mesh fully monitored. 4 Grafana dashboards. One agent per remote host.