Lightweight OpenClaw alternatives, local LLMs, and the optimal headless OS
Research compiled February 26, 2026
4 parallel researchers | Claude, Perplexity, Gemini, General
Target: RPi 5 • 8GB RAM • ARM64 • Headless
What we researched
Running a full AI agent + local LLM on an 8GB ARM board
| Component | RAM |
|---|---|
| OS + system | 150-200 MB (Ubuntu) / 30-50 MB (DietPi) |
| Agent framework | 5-200 MB (depends on choice) |
| Local LLM (3B Q4) | ~3.2 GB |
| KV cache (1024 ctx) | ~200-400 MB |
| Headroom | ~4 GB free |
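The headroom row follows from simple arithmetic. A quick sanity check, taking the worst-case figure from each row above (Ubuntu base, the heaviest lightweight agent, a 3B Q4 model, a full KV cache):

```shell
# Worst-case RAM budget on an 8 GB (8192 MiB) Pi 5, figures from the table above
total=8192
os=200       # Ubuntu base system, upper bound
agent=200    # heaviest of the lightweight agent options
model=3277   # 3B model at Q4, ~3.2 GB
kv=400       # KV cache at 1024-token context, upper bound
echo "$(( total - os - agent - model - kv )) MiB headroom"
```

That lands at roughly 4 GiB free, consistent with the table.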
12 alternatives evaluated — from 5MB Rust binaries to full platforms
| Framework | Language | RAM | Telegram | MCP | Sub-agents | Tier |
|---|---|---|---|---|---|---|
| ZeroClaw | Rust | <5 MB | Yes | Yes | Yes | Pi-First |
| Moltis | Rust | ~60 MB | Yes | Yes | Yes | Pi-First |
| Neko | Rust/Go | <100 MB | Yes | Yes | No | Pi-First |
| NanoBot | Python | 191 MB | Yes | Yes | Yes | Pi-First |
| Picobot | Go | <256 MB | Yes | No | No | Pi-First |
| NanoClaw | TypeScript | ~200 MB | No | Yes | No | Lightweight |
| n8n | TypeScript | 300-500 MB | Yes | Community | Yes | Heavy |
| Open WebUI | Python/Svelte | 500 MB-1 GB | No | No | No | Heavy |
| Agno | Python | Light | DIY | No | Yes | Framework |
| LocalAI | Go | Light | No | No | No | Inference Only |
The direct OpenClaw replacement — 99% less RAM
ZeroClaw is a Rust-based OpenClaw runtime replacement: a single static binary with no Node.js, no npm, and no runtime dependencies.
It reads OpenClaw config files directly, so a migration path exists from a current setup.
| LLM gateway routing | 22+ providers |
| Tool/skill system | Rust trait plugins |
| Telegram | Yes |
| Discord/Slack | Yes |
| MCP support | Yes |
| Scheduled jobs | Yes |
| Sub-agents | Yes |
| Memory | Local vector + keyword |
Tradeoffs: newer project with a smaller community, and a compiled Rust binary means the core can't be hot-modified the way an interpreted codebase can.
Most feature-complete Rust option — OpenClaw parity in one binary
"A local-first AI gateway — a single Rust binary that sits between you and multiple LLM providers." Hit HN front page Feb 2026.
ghcr.io/moltis-org/moltis:latest (multi-arch)
Neko — Targets the Pi Zero 2W (512 MB). Markdown-file memory (like PAI). Telegram + MCP. Minimal but elegant.
NanoBot (HKUDS) — Python, 191MB on Pi, broadest messaging support (Telegram, Discord, WhatsApp, Feishu, QQ). Fully auditable.
Flowise — 4GB+ recommended. ARM64 stability issues.
Open WebUI — No Telegram/bot interface. Chat UI only.
Fedora IoT — Wrong tool entirely.
The sweet spot: 1-3B parameter models in Q4_K_M quantization
| Model | Params | GGUF Size | RAM | Speed | Best For |
|---|---|---|---|---|---|
| Qwen2.5 1.5B Instruct | 1.5B | ~1.0 GB | ~1.8 GB | 15-20 t/s | Best speed-to-quality ratio |
| Gemma3 1B | 1B | ~700 MB | ~1.2 GB | 10-15 t/s | Fast conversational agent |
| Llama 3.2 1B Instruct | 1B | ~700 MB | ~1.2 GB | 10-14 t/s | General assistant |
| Qwen2.5 3B Instruct | 3B | ~2.0 GB | ~3.2 GB | 5-8 t/s | Better reasoning |
| Llama 3.2 3B Instruct | 3B | ~2.0 GB | ~3.2 GB | 4-6 t/s | Solid all-around |
| Model | Params | RAM | Speed | Notes |
|---|---|---|---|---|
| Gemma2 2B | 2B | ~2.5 GB | 6-9 t/s | Strong benchmarks for size |
| BitNet b1.58 2B | 2B | ~800 MB | ~8 t/s | 1-bit quant, tiny footprint |
| Phi-3.5 Mini | 3.8B | ~3.8 GB | 3-4 t/s | Good at coding, hallucination-prone |
| DeepSeek-R1 (distilled) | 1.5B | ~1.0 GB | 8-12 t/s | Strong reasoning |
| Qwen2.5 0.5B | 0.5B | ~700 MB | 25-35 t/s | Ultra-fast, some quality loss |
7B models: 1-3 tok/sec. Mistral 7B, Llama 2 7B, Phi-4 — all too slow for conversational use.
Token/sec thresholds for real-world usability
| Feel | Models |
|---|---|
| Feels like typing | Qwen2.5 0.5B, Qwen2.5 1.5B |
| Responses feel live, slight pause | Gemma3 1B, Llama 3.2 1B, TinyLlama |
| Noticeable wait, OK for short exchanges | Qwen2.5 3B, Llama 3.2 3B |
| Only for batch/async tasks | All 7B models on Pi 5 |
| Batch processing only | 13B+ models |
Critical caveat: With 2048-token context, expect 6-50 seconds total response time. Keep context at 512-1024 tokens for interactive feel.
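The caveat follows from simple arithmetic: total latency is roughly prompt tokens divided by prefill speed plus reply tokens divided by decode speed. A sketch with illustrative numbers (the prefill and decode rates below are assumptions, not benchmarks; they vary by model and runtime):

```shell
# total latency ~= n_prompt/prefill_tps + n_reply/decode_tps (integer seconds)
n_prompt=1024   # tokens of context fed to the model
n_reply=128     # tokens generated in the reply
prefill_tps=30  # assumed prompt-eval speed for a ~3B model on Pi 5
decode_tps=8    # assumed generation speed
echo "$(( n_prompt / prefill_tps + n_reply / decode_tps )) s end-to-end (rough)"
```

Halving the context roughly halves the prefill term, which is why 512-1024 tokens feels interactive.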
How you run the model matters as much as which model you pick
llama.cpp — Gold standard for CPU inference. ARM64/NEON native. 10-20% faster than Ollama.
```shell
cmake -B build \
  -DGGML_BLAS=ON \
  -DGGML_BLAS_VENDOR=OpenBLAS \
  -DCMAKE_C_FLAGS="-march=armv8.2-a+dotprod+fp16"
cmake --build build --config Release -j4
```
Best for: Maximum performance, full control
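Once built, serving a model is a one-liner. A sketch, assuming a Q4_K_M GGUF is already downloaded (the filename is a placeholder):

```shell
# Serve an OpenAI-compatible endpoint on port 8080.
# -c 1024: small context for interactive latency
# --mlock: pin model pages in RAM (pairs with the memlock limits below)
# -t 4: one thread per Cortex-A76 core
./build/bin/llama-server -m ./models/qwen2.5-1.5b-instruct-q4_k_m.gguf \
  -c 1024 --mlock -t 4 --port 8080
```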
Ollama — One-command install. OpenAI-compatible API. Agent frameworks love it.
```shell
OLLAMA_CONTEXT_LENGTH=512
OLLAMA_NUM_PARALLEL=1
OLLAMA_KEEP_ALIVE=24h
```
Best for: Agent framework integration
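On a systemd-based OS those variables belong in a drop-in override so they survive reboots. A sketch, assuming the stock ollama.service installed by the official script:

```shell
# Persist Ollama tuning via a systemd drop-in, then restart the service
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf >/dev/null <<'EOF'
[Service]
Environment="OLLAMA_CONTEXT_LENGTH=512"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_KEEP_ALIVE=24h"
EOF
sudo systemctl daemon-reload && sudo systemctl restart ollama
```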
llamafile — Single-file executable. Up to 4x faster than Ollama with 30-40% lower power draw, figures confirmed by an academic paper (arxiv:2511.07425).
Best for: Max speed without API server
bitnet.cpp — Microsoft's 1-bit model runtime. 1.37x-5.07x speedup over llama.cpp, but it only works with BitNet b1.58 models.
Best for: Absolute minimum RAM footprint
Real Pi 5 8GB numbers from multiple independent sources
| Model | Quant | RAM | Ollama t/s | llama.cpp t/s | llamafile t/s |
|---|---|---|---|---|---|
| Qwen2.5 0.5B | Q4_K_M | ~700 MB | 25-35 | 30-40 | - |
| Qwen2.5 1.5B | Q4_K_M | ~1.8 GB | 15-20 | 18-22 | - |
| TinyLlama 1.1B | Q4_K_M | ~1.1 GB | 12-15 | 15-18 | ~23 |
| Gemma3 1B | Q4_K_M | ~1.2 GB | ~10 | ~15 | - |
| Llama 3.2 1B | Q4_K_M | ~1.2 GB | 10-12 | 12-15 | - |
| BitNet b1.58 2B | I2_S | ~800 MB | - | ~8 (bitnet.cpp) | - |
| Gemma2 2B | Q4_K_M | ~2.5 GB | 6-9 | 8-11 | - |
| Qwen2.5 3B | Q4_K_M | ~3.2 GB | 5-8 | 7-10 | - |
| Llama 3.2 3B | Q4_K_M | ~3.2 GB | 3.3-5.8 | 5-7 | - |
| Phi-3.5 Mini | Q4_K_M | ~3.8 GB | ~3.2 | ~4-5 | - |
| Llama 2 7B | Q4_K_M | ~5.5 GB | 1.5-2.5 | 2-3.5 | - |
| DeepSeek-R1 7B | Q4_K_M | ~7 GB | ~1.4 | ~2 | - |
Sources: Stratosphere Labs, arxiv:2511.07425, itsfoss.com, DFRobot, aidatatools, Raspberry Pi Forums
7 OS options evaluated for headless Pi 5 AI workloads
| OS | Idle RAM | Docker | ARM64 Opt | AI Ecosystem | Verdict |
|---|---|---|---|---|---|
| Ubuntu 24.04 LTS | 150-200 MB | Best | Very Good | Best | Recommended |
| DietPi | 30-50 MB | Good | Excellent | Moderate | RAM-Critical |
| Pi OS Lite 64-bit | 60-80 MB | Good | Best | Solid | Safe Default |
| Armbian | 80-120 MB | Good | Good | Moderate | No advantage |
| NixOS ARM64 | 150-200 MB | Good | Good | Growing | Expert only |
| Alpine Linux | 15-30 MB | musl issues | Good | Poor | Avoid |
| Fedora IoT | 200-250 MB | Podman | Good | Small | Avoid |
The two real contenders
Choose when: Docker is central to your stack, or you want maximum ecosystem compatibility.
Choose when: every MB matters (e.g. running a 7B model), or you're running the LLM natively rather than in Docker. Most software installs in one step via the dietpi-software tool.
Kernel parameters, swap strategy, and filesystem for LLM inference
/etc/sysctl.conf:

```shell
vm.swappiness=10
vm.overcommit_memory=1
vm.dirty_ratio=5
vm.dirty_background_ratio=2
```
```shell
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
```
```shell
# /etc/security/limits.conf
* hard memlock unlimited
* soft memlock unlimited
# enables llama.cpp's --mlock flag
```
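The sysctl lines persist on their own, but the governor and THP settings reset at boot. One way to persist them is a small oneshot systemd unit (a sketch; the unit name is arbitrary):

```shell
# Re-apply CPU governor and THP mode at every boot
sudo tee /etc/systemd/system/llm-tuning.service >/dev/null <<'EOF'
[Unit]
Description=CPU governor and THP settings for LLM inference

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor'
ExecStart=/bin/sh -c 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled'

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable --now llm-tuning.service
```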
1000x faster than physical storage. 2GB zram = ~4GB at 2:1 compression.
```shell
apt install zram-tools
echo "ALGO=lz4" >> /etc/default/zramswap
echo "PERCENT=25" >> /etc/default/zramswap
systemctl enable zramswap --now
```
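After enabling, it's worth confirming the device came up with the expected algorithm and size:

```shell
zramctl          # expect a /dev/zram0 row with the lz4 algorithm and ~2G disksize
swapon --show    # the zram device should be listed, typically at higher priority than disk swap
cat /proc/swaps  # same information, plus current usage
```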
| Level | RAM Savings | Quality |
|---|---|---|
| Q4_K_M | -50% | Small loss |
| Q5_K_M | -38% | Minimal loss |
| Q3_K_M | -63% | Moderate loss |
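If the quantization level you want isn't published for a model, llama.cpp can produce it from a higher-precision GGUF with its llama-quantize tool (the filenames here are placeholders):

```shell
# Re-quantize an f16 GGUF down to Q4_K_M (roughly half the size, small quality loss)
./build/bin/llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```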
NVMe, active cooling, and power — the non-negotiables
Model load: 45s (SD) vs 2-3s (NVMe)
Drive: Samsung 980/990 EVO 2230
Mandatory. Without it: 20-30% throughput loss from throttling.
Pi 5 needs 5V/5A (25W).
AI Kit / original AI HAT+ (Hailo-8 family): does NOT help with LLMs. Vision tasks only. Skip it.
AI HAT+ 2 (Hailo-10H): supports LLMs 1-7B. The NPU that actually matters. Jan 2026 release.
If Pi 5 8GB isn't enough
| Board | RAM | AI Accel | LLM Speed | Price | Best For |
|---|---|---|---|---|---|
| Pi 5 8GB | 8 GB | CPU only* | 5-20 t/s (1-3B) | ~$80 | Budget + ecosystem |
| Pi 5 16GB | 16 GB | CPU only* | Same speed | ~$120 | Larger models, same speed |
| Orange Pi 5 Max | 16 GB | 6 TOPS NPU | TinyLlama 17.7 t/s | ~$150 | Best value upgrade |
| Radxa Rock 5B+ | 32 GB | 6 TOPS NPU | Similar to OPi5 | ~$180 | Maximum RAM on ARM SBC |
| NVIDIA Jetson Orin Nano | 8 GB | 67 TOPS (CUDA) | Best in class | ~$250 | Serious AI, don't care about cost |
* Hailo-10H AI HAT+ 2 adds 40 TOPS NPU to Pi 5
Real-world Pi 5 AI agent builds
github.com/syxanash/maxheadbox
Fully local AI agent desk toy. Pi 5, USB mic, screen. Qwen3 1.7B (tool-calling) + Gemma3 1B (conversational). Wake-word triggered.
Covered by XDA Developers, Hackaday
github.com/brenpoly/be-more-agent
100% local conversational AI agent. Ollama + Whisper.cpp (STT). Full Pi local stack, no cloud.
github.com/m15-ai/TrooperAI
Low-latency local voice assistant for Pi 5 with LED and gesture control. Real-time STT + streaming LLM + TTS.
github.com/RightNow-AI/picolm
Runs 1B LLM on boards with 256MB RAM. JSON grammar mode for structured tool calling on tiny hardware.
raspberrypi.com/news + openclawpi.com
Official Raspberry Pi blog featured OpenClaw. Adafruit has full install guide. Dedicated community site.
home-assistant.io/integrations/ollama
Official Ollama integration. home-llm project with Pi-optimized 3B model for home control. Wyoming voice pipeline.
The final answer for Pi 5 8GB headless AI agent
| Layer | Choice | Why |
|---|---|---|
| OS | Ubuntu Server 24.04 LTS | Best Docker + AI ecosystem. DietPi if pushing 7B. |
| Storage | NVMe SSD via PCIe HAT | 2-3s model load vs 45s. Waveshare HAT+ ($10) |
| Agent | ZeroClaw or Moltis | ZeroClaw for min RAM (<5MB). Moltis for max features. |
| Inference | Ollama | OpenAI-compatible API. Easiest agent integration. |
| Model | Qwen2.5 1.5B Instruct Q4_K_M | Best speed-to-quality. 15-20 t/s, 1.8GB RAM. |
| Swap | 2GB zram (lz4) | 1000x faster than disk swap |
| Filesystem | ext4 + noatime | Battle-tested, no drama |
| Cooling | Official Active Cooler | Non-negotiable for sustained inference |
```shell
OLLAMA_CONTEXT_LENGTH=1024
OLLAMA_NUM_PARALLEL=1
OLLAMA_KEEP_ALIVE=24h
```

```shell
vm.swappiness=10
vm.overcommit_memory=1
# CPU governor: performance
# THP: madvise
```
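With the stack up, a quick end-to-end smoke test against Ollama's OpenAI-compatible endpoint (the model tag is an assumed Ollama name for the recommended pick; adjust to whatever `ollama list` shows):

```shell
# One round-trip through the full inference path; expect a JSON chat completion back
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5:1.5b-instruct",
       "messages": [{"role": "user", "content": "Say hi in five words."}]}'
```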
Total hardware cost: ~$110 (Pi 5 8GB + Active Cooler + NVMe HAT + SSD)
Key references from 4 parallel research agents
Benchmarks & Papers
• Stratosphere Labs — LLM Performance on Pi 5
• arxiv:2511.07425 — LLM Inference on SBCs
• itsfoss.com — 9 LLMs on Pi 5
• DFRobot Lab — SLM Performance Analysis
• aidatatools — llamafile vs llama.cpp comparison
Agent Frameworks
• ZeroClaw — github.com/zeroclaw-labs/zeroclaw
• Moltis — github.com/moltis-org/moltis
• Neko — github.com/superhq-ai/neko
• NanoBot — github.com/HKUDS/nanobot
• Picobot — github.com/louisho5/picobot
• NanoClaw — github.com/qwibitai/nanoclaw
Community Projects
• Max Headbox — blog.simone.computer
• OpenClaw on Pi — raspberrypi.com/news
• be-more-agent, TrooperAI, PicoLM (GitHub)
• Arm Learning Paths — Smart Home LLM guide
OS & System
• DietPi — dietpi.com/stats
• Jeff Geerling — NVMe boot, overclocking
• NixOS Wiki — Pi 5 ARM support
• eunomia.dev — OS-Level LLM Optimizations
Hardware
• Pi AI HAT+ 2 — raspberrypi.com announcement
• ezrknn-llm — RK3588 NPU toolkit
• ThinkRobotics — Jetson vs Pi 5 comparison
Research date: February 26, 2026 • PAI Research System • 4 parallel agents