AI Agent on Raspberry Pi 5

Lightweight OpenClaw alternatives, local LLMs, and the optimal headless OS

Research compiled February 26, 2026

4 parallel researchers | Claude, Perplexity, Gemini, General

Target: RPi 5 • 8GB RAM • ARM64 • Headless

Agenda

What we researched

01   The Challenge — What we need from a Pi 5
02   Agent Frameworks — Lightweight OpenClaw alternatives
03   Top Agent Picks — ZeroClaw, Moltis, Neko, NanoBot
04   Local LLMs — Models that fit in 4-5GB
05   Inference Engines — Ollama vs llama.cpp vs llamafile
06   Benchmark Data — Real tok/sec numbers
07   Operating System — Ubuntu vs DietPi vs Pi OS
08   System Tuning — Kernel, swap, filesystem
09   Hardware — NVMe, cooling, UPS, alternative SBCs
10   Community Projects — Real-world Pi AI builds
11   Recommended Stack — The final answer
12   Sources — All references

The Challenge

Running a full AI agent + local LLM on an 8GB ARM board

What OpenClaw Provides
  • LLM gateway routing (multi-provider)
  • Tool/skill system with MCP support
  • Multi-channel messaging (Telegram, Discord, etc.)
  • Scheduled jobs & cron
  • Sub-agent spawning
  • Persistent memory
Pi 5 Constraints
  • 8GB LPDDR4X — shared between OS, agent, and model
  • CPU-only inference — no GPU, Cortex-A76 quad-core
  • NVMe via PCIe — fast model loading
  • Thermal limits — throttles at 80°C
  • ARM64 native — good ecosystem support

RAM Budget

Component | RAM
OS + system | 150-200 MB (Ubuntu) / 30-50 MB (DietPi)
Agent framework | 5-200 MB (depends on choice)
Local LLM (3B Q4) | ~3.2 GB
KV cache (1024 ctx) | ~200-400 MB
Headroom | ~4 GB free
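The headroom figure can be sanity-checked with back-of-envelope arithmetic, taking the Ubuntu worst case from the budget above (all values in MB, the document's own estimates):

```shell
# Worst-case RAM budget: 8GB board, Ubuntu, heaviest agent, 3B Q4 model.
awk 'BEGIN {
  total = 8192                  # Pi 5 8GB
  os = 200; agent = 200         # Ubuntu idle + heaviest agent framework
  model = 3276; kv = 400        # ~3.2 GB model + 1024-ctx KV cache
  free = total - os - agent - model - kv
  printf "headroom: %.1f GB\n", free / 1024
}'
```

This lands right at the ~4 GB the table claims, which is why the 3B Q4 tier is the practical ceiling on an 8GB board.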

Agent Frameworks

12 alternatives evaluated — from 5MB Rust binaries to full platforms

Framework | Language | RAM | Telegram | MCP | Sub-agents | Tier
ZeroClaw | Rust | <5 MB | Yes | Yes | Yes | Pi-First
Moltis | Rust | ~60 MB | Yes | Yes | Yes | Pi-First
Neko | Rust/Go | <100 MB | Yes | Yes | No | Pi-First
NanoBot | Python | 191 MB | Yes | Yes | Yes | Pi-First
Picobot | Go | <256 MB | Yes | No | No | Pi-First
NanoClaw | TypeScript | ~200 MB | WhatsApp | No | Yes | Lightweight
n8n | TypeScript | 300-500 MB | Yes | Community | Yes | Heavy
Open WebUI | Python/Svelte | 500 MB-1 GB | No | No | No | Heavy
Agno | Python | Light | DIY | No | Yes | Framework
LocalAI | Go | Light | No | No | No | Inference Only

Top Pick: ZeroClaw

The direct OpenClaw replacement — 99% less RAM

What It Is

Rust-based OpenClaw runtime replacement. Single static binary, no Node.js, no npm, no runtime deps.

Key Stats

  • Binary size: 3.4 MB
  • RAM idle: <5 MB
  • Startup: <10 ms
  • ARM64: native cross-compiled binary
  • LLM providers: 22+

Killer Feature

Reads OpenClaw config files directly — migration path exists from your current setup.

Feature Parity with OpenClaw

LLM gateway routing | 22+ providers
Tool/skill system | Rust trait plugins
Telegram | Yes
Discord/Slack | Yes
MCP support | Yes
Scheduled jobs | Yes
Sub-agents | Yes
Memory | Local vector + keyword

Trade-off

Newer project with a smaller community. And because the core is a compiled Rust binary, it can't be hot-modified at runtime the way a Node.js codebase can.

Runner-Up: Moltis

Most feature-complete Rust option — OpenClaw parity in one binary

What It Is

"A local-first AI gateway — a single Rust binary that sits between you and multiple LLM providers." Hit HN front page Feb 2026.

Key Stats

  • Binary: 60 MB (150k lines, includes web UI)
  • Docker: ghcr.io/moltis-org/moltis:latest (multi-arch)
  • 2,300+ tests, zero unsafe Rust

Everything In One Binary

  • Multi-provider LLM gateway
  • MCP support (stdio + HTTP/SSE, health polling, auto-restart)
  • Telegram + web UI + API
  • Embeddings-powered long-term memory with auto-compaction
  • Sub-agents
  • Built-in TTS
  • Browser automation
  • Scheduled jobs
Also Worth Watching

Neko — Targets Pi Zero 2W (512MB). Markdown file memory (like PAI). Telegram + MCP. Minimal but elegant.

NanoBot (HKUDS) — Python, 191MB on Pi, broadest messaging support (Telegram, Discord, WhatsApp, Feishu, QQ). Fully auditable.

Skip These for Pi

Flowise — 4GB+ recommended. ARM64 stability issues.

Open WebUI — No Telegram/bot interface. Chat UI only.

Fedora IoT — Wrong tool entirely.

Local LLMs for Pi 5

The sweet spot: 1-3B parameter models in Q4_K_M quantization

Tier 1: Recommended

Model | Params | GGUF Size | RAM | Speed | Best For
Qwen2.5 1.5B Instruct | 1.5B | ~1.0 GB | ~1.8 GB | 15-20 t/s | Best speed-to-quality ratio
Gemma3 1B | 1B | ~700 MB | ~1.2 GB | 10-15 t/s | Fast conversational agent
Llama 3.2 1B Instruct | 1B | ~700 MB | ~1.2 GB | 10-14 t/s | General assistant
Qwen2.5 3B Instruct | 3B | ~2.0 GB | ~3.2 GB | 5-8 t/s | Better reasoning
Llama 3.2 3B Instruct | 3B | ~2.0 GB | ~3.2 GB | 4-6 t/s | Solid all-around

Tier 2: Situational

Model | Params | RAM | Speed | Notes
Gemma2 2B | 2B | ~2.5 GB | 6-9 t/s | Strong benchmarks for size
BitNet b1.58 2B | 2B | ~800 MB | ~8 t/s | 1-bit quant, tiny footprint
Phi-3.5 Mini | 3.8B | ~3.8 GB | 3-4 t/s | Good at coding, hallucination-prone
DeepSeek-R1 (distilled) | 1.5B | ~1.0 GB | 8-12 t/s | Strong reasoning
Qwen2.5 0.5B | 0.5B | ~700 MB | 25-35 t/s | Ultra-fast, some quality loss

Do NOT Run on Pi 5

7B models: 1-3 tok/sec. Mistral 7B, Llama 2 7B, Phi-4 — all too slow for conversational use.

What Speed Feels Like

Token/sec thresholds for real-world usability

15+ tok/sec — Near-instant

Feels like typing. Qwen2.5 0.5B, Qwen2.5 1.5B

8-15 tok/sec — Very usable

Responses feel live, slight pause. Gemma3 1B, Llama 3.2 1B, TinyLlama

4-8 tok/sec — Acceptable

Noticeable wait, OK for short exchanges. Qwen2.5 3B, Llama 3.2 3B

1-4 tok/sec — Painful

Only for batch/async tasks. All 7B models on Pi 5

Below 1 tok/sec — Unusable

Batch processing only. 13B+ models

Critical caveat: With 2048-token context, expect 6-50 seconds total response time. Keep context at 512-1024 tokens for interactive feel.
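The wait behind each tier is just generated tokens divided by decode speed (prompt prefill adds more on top, per the caveat above). A quick illustration with a typical 128-token agent reply at three of the speeds listed:

```shell
# Decode-only latency estimate: tokens / tok-per-sec, ignoring prefill.
# Speeds (15, 6, 2 tok/s) correspond to the tiers described above.
for tps in 15 6 2; do
  awk -v toks=128 -v tps="$tps" \
    'BEGIN { printf "%3d tok/s -> %5.1f s for 128 tokens\n", tps, toks/tps }'
done
```

At 2 tok/s even a short reply takes over a minute, which is why the 7B class is batch-only on this hardware.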

Inference Engines

How you run the model matters as much as which model you pick

llama.cpp Fastest

Gold standard for CPU inference. ARM64/NEON native. 10-20% faster than Ollama.

# Build with OpenBLAS and Cortex-A76 CPU features (dotprod + fp16)
cmake -B build \
  -DGGML_BLAS=ON \
  -DGGML_BLAS_VENDOR=OpenBLAS \
  -DCMAKE_C_FLAGS="-march=armv8.2-a+dotprod+fp16"
cmake --build build --config Release -j4

Best for: Maximum performance, full control
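A sketch of running the resulting binary (binary name per current llama.cpp releases; the model path and prompt are illustrative, not from this research):

```shell
# -t 4: one thread per Cortex-A76 core
# -c 1024: small context window for interactive latency
# --mlock: pin model weights in RAM so they never hit swap
./build/bin/llama-cli -m models/qwen2.5-1.5b-instruct-q4_k_m.gguf \
  -t 4 -c 1024 --mlock -p "Summarize the Pi 5 thermal limits."
```
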

Ollama Easiest

One-command install. OpenAI-compatible API. Agent frameworks love it.

OLLAMA_CONTEXT_LENGTH=512
OLLAMA_NUM_PARALLEL=1
OLLAMA_KEEP_ALIVE=24h

Best for: Agent framework integration
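On a standard Linux install, Ollama runs as a systemd service, so the environment variables above go in a service drop-in rather than your shell profile. A sketch, assuming the default `ollama.service` unit name:

```shell
# Apply the Ollama env vars to the systemd-managed service.
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment="OLLAMA_CONTEXT_LENGTH=512"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_KEEP_ALIVE=24h"
EOF
sudo systemctl daemon-reload && sudo systemctl restart ollama
```

`OLLAMA_KEEP_ALIVE=24h` matters most on the Pi: it keeps the model resident so you pay the load cost once, not per request.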

llamafile Highest Throughput

Single-file executable. Up to 4x faster than Ollama with 30-40% lower power draw, as measured in an academic study (arXiv:2511.07425).

Best for: Max speed without API server

BitNet.cpp Specialized

Microsoft's 1-bit model runtime. 1.37x-5.07x speedup over llama.cpp. Only works with BitNet b1.58 models.

Best for: Absolute minimum RAM footprint

Benchmark Data

Real Pi 5 8GB numbers from multiple independent sources

Model | Quant | RAM | Ollama t/s | llama.cpp t/s | llamafile t/s
Qwen2.5 0.5B | Q4_K_M | ~700 MB | 25-35 | 30-40 | -
Qwen2.5 1.5B | Q4_K_M | ~1.8 GB | 15-20 | 18-22 | -
TinyLlama 1.1B | Q4_K_M | ~1.1 GB | 12-15 | 15-18 | ~23
Gemma3 1B | Q4_K_M | ~1.2 GB | ~10 | ~15 | -
Llama 3.2 1B | Q4_K_M | ~1.2 GB | 10-12 | 12-15 | -
BitNet b1.58 2B | I2_S | ~800 MB | - | ~8 (bitnet.cpp) | -
Gemma2 2B | Q4_K_M | ~2.5 GB | 6-9 | 8-11 | -
Qwen2.5 3B | Q4_K_M | ~3.2 GB | 5-8 | 7-10 | -
Llama 3.2 3B | Q4_K_M | ~3.2 GB | 3.3-5.8 | 5-7 | -
Phi-3.5 Mini | Q4_K_M | ~3.8 GB | ~3.2 | ~4-5 | -
Llama 2 7B | Q4_K_M | ~5.5 GB | 1.5-2.5 | 2-3.5 | -
DeepSeek-R1 7B | Q4_K_M | ~7 GB | ~1.4 | ~2 | -

Sources: Stratosphere Labs, arxiv:2511.07425, itsfoss.com, DFRobot, aidatatools, Raspberry Pi Forums

Operating System

7 OS options evaluated for headless Pi 5 AI workloads

OS | Idle RAM | Docker | ARM64 Opt | AI Ecosystem | Verdict
Ubuntu 24.04 LTS | 150-200 MB | Best | Very Good | Best | Recommended
DietPi | 30-50 MB | Good | Excellent | Moderate | RAM-Critical
Pi OS Lite 64-bit | 60-80 MB | Good | Best | Solid | Safe Default
Armbian | 80-120 MB | Good | Good | Moderate | No advantage
NixOS ARM64 | 150-200 MB | Good | Good | Growing | Expert only
Alpine Linux | 15-30 MB | musl issues | Good | Poor | Avoid
Fedora IoT | 200-250 MB | Podman | Good | Small | Avoid

Ubuntu vs DietPi

The two real contenders

Ubuntu Server 24.04 LTS
  • Best Docker support — Docker's primary platform
  • Widest AI/ML ecosystem — every tool documents Ubuntu
  • 5-year LTS — support until 2029
  • Kernel 6.8+ with Cortex-A76 optimizations
  • Python 3.12, Node 20/22
  • 150-200 MB idle RAM

Choose when: Docker is central to your stack, or you want maximum ecosystem compatibility.

DietPi
  • 30-50 MB idle RAM — 4x lighter than Ubuntu
  • Auto-optimizes — CPU governor, tmpfs, disabled services
  • Pi OS kernel — inherits all Pi-specific patches
  • Good Docker via dietpi-software
  • Same apt repos as Pi OS
  • Fewer AI tutorials target it

Choose when: every megabyte matters (e.g. you're pushing a 7B model), or you're running the LLM natively rather than in Docker.

System Tuning

Kernel parameters, swap strategy, and filesystem for LLM inference

Kernel Parameters /etc/sysctl.conf

vm.swappiness=10
vm.overcommit_memory=1
vm.dirty_ratio=5
vm.dirty_background_ratio=2
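Rather than editing /etc/sysctl.conf directly, the same settings can live in a drop-in file and be applied without a reboot (filename is an arbitrary choice):

```shell
# Persist the LLM-friendly kernel parameters and apply them immediately.
sudo tee /etc/sysctl.d/99-llm-tuning.conf <<'EOF'
vm.swappiness=10
vm.overcommit_memory=1
vm.dirty_ratio=5
vm.dirty_background_ratio=2
EOF
sudo sysctl --system
```
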

CPU Governor

echo performance | sudo tee \
  /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

Transparent Huge Pages

echo madvise | sudo tee \
  /sys/kernel/mm/transparent_hugepage/enabled

Memory Locking

# /etc/security/limits.conf
* hard memlock unlimited
* soft memlock unlimited
# llama.cpp --mlock flag

Swap Strategy: zram

Use zram, not NVMe swap

Compressed swap in RAM is orders of magnitude faster than swapping to SD or NVMe. 2GB of zram behaves like ~4GB at a typical 2:1 lz4 compression ratio.

sudo apt install zram-tools
echo "ALGO=lz4" | sudo tee -a /etc/default/zramswap
echo "PERCENT=25" | sudo tee -a /etc/default/zramswap
sudo systemctl enable zramswap --now
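The sizing works out as follows: PERCENT=25 of an 8GB board gives 2GB of zram, which at the ~2:1 lz4 ratio behaves like roughly 4GB of swap:

```shell
# zram sizing arithmetic for the config above (values in MB).
awk 'BEGIN {
  ram = 8192                 # Pi 5 8GB
  zram = ram * 0.25          # PERCENT=25
  effective = zram * 2       # ~2:1 lz4 compression
  printf "zram: %d MB, effective: ~%d MB\n", zram, effective
}'
```
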

Filesystem

  • NVMe: ext4 (battle-tested, no benefit to f2fs)
  • SD card: ext4 + noatime + tmpfs for logs
  • Docker-heavy: consider btrfs for CoW snapshots

Quantization Sweet Spot

Level | RAM Savings | Quality
Q4_K_M | -50% | Small loss
Q5_K_M | -38% | Minimal loss
Q3_K_M | -63% | Moderate loss

Hardware Essentials

NVMe, active cooling, and power — the non-negotiables

NVMe Storage

Model load: 45s (SD) vs 2-3s (NVMe)

  • Waveshare PCIe M.2 HAT+ — <$10
  • Pimoroni NVMe Base — $14
  • 52Pi M.2 HAT — budget

Drive: Samsung 980/990 EVO 2230

Active Cooling

Mandatory. Without it: 20-30% throughput loss from throttling.

  • Official Active Cooler — $5-8, keeps 72-74°C
  • Throttle temp: 80°C
  • Hard limit: 85°C
  • Reaches throttle temp in <90s at full load without cooling
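A quick way to keep an eye on throttling territory, using the standard Linux sysfs thermal zone (on Pi OS, `vcgencmd measure_temp` gives the same reading; the 75°C warning threshold is an arbitrary margin below the 80°C throttle point):

```shell
# Print CPU temperature; flag readings approaching the 80°C throttle point.
zone=/sys/class/thermal/thermal_zone0/temp
if [ -r "$zone" ]; then
  awk '{ t = $1 / 1000
         printf "CPU: %.1f C%s\n", t, (t >= 75 ? "  (nearing throttle)" : "") }' "$zone"
else
  echo "no thermal zone exposed"
fi
```

Wrap it in `watch -n 2` during a sustained inference run to see whether your cooling actually holds.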
UPS Power

Pi 5 needs 5V/5A (25W).

  • Geekworm X1200 — 2-cell 18650
  • Geekworm X1202 — 4-cell, 24/7
  • SunFounder PiPower 5 — smart dashboard

Hailo AI Kit Verdict

Hailo-8L (Original, 13 TOPS)

Does NOT help with LLMs. Vision tasks only. Skip it.

Hailo-10H AI HAT+ 2 (40 TOPS)

Supports LLMs 1-7B. The NPU that actually matters. Jan 2026 release.

Alternative SBCs

If Pi 5 8GB isn't enough

Board | RAM | AI Accel | LLM Speed | Price | Best For
Pi 5 8GB | 8 GB | CPU only* | 5-20 t/s (1-3B) | ~$80 | Budget + ecosystem
Pi 5 16GB | 16 GB | CPU only* | Same speed | ~$120 | Larger models, same speed
Orange Pi 5 Max | 16 GB | 6 TOPS NPU | TinyLlama 17.7 t/s | ~$150 | Best value upgrade
Radxa Rock 5B+ | 32 GB | 6 TOPS NPU | Similar to OPi5 | ~$180 | Maximum RAM on ARM SBC
NVIDIA Jetson Orin Nano | 8 GB | 67 TOPS (CUDA) | Best in class | ~$250 | Serious AI, don't care about cost

* Hailo-10H AI HAT+ 2 adds 40 TOPS NPU to Pi 5

Decision Framework

  • Cheapest viable setup: Pi 5 8GB + active cooler + NVMe HAT (~$110 total)
  • Better LLM speed + more RAM: Orange Pi 5 Max 16GB (RK3588 NPU accelerates inference)
  • Maximum ARM RAM: Radxa Rock 5B+ 32GB
  • Best LLM performance period: NVIDIA Jetson Orin Nano Super (67 TOPS CUDA)

Community Projects

Real-world Pi 5 AI agent builds

Max Headbox

github.com/syxanash/maxheadbox

Fully local AI agent desk toy. Pi 5, USB mic, screen. Qwen3 1.7B (tool-calling) + Gemma3 1B (conversational). Wake-word triggered.

Covered by XDA Developers, Hackaday

be-more-agent

github.com/brenpoly/be-more-agent

100% local conversational AI agent. Ollama + Whisper.cpp (STT). Full Pi local stack, no cloud.

TrooperAI

github.com/m15-ai/TrooperAI

Low-latency local voice assistant for Pi 5 with LED and gesture control. Real-time STT + streaming LLM + TTS.

PicoLM

github.com/RightNow-AI/picolm

Runs 1B LLM on boards with 256MB RAM. JSON grammar mode for structured tool calling on tiny hardware.

OpenClaw on Pi

raspberrypi.com/news + openclawpi.com

Official Raspberry Pi blog featured OpenClaw. Adafruit has full install guide. Dedicated community site.

Home Assistant + LLM

home-assistant.io/integrations/ollama

Official Ollama integration. home-llm project with Pi-optimized 3B model for home control. Wyoming voice pipeline.

Recommended Stack

The final answer for Pi 5 8GB headless AI agent

The Stack

Layer | Choice | Why
OS | Ubuntu Server 24.04 LTS | Best Docker + AI ecosystem. DietPi if pushing 7B.
Storage | NVMe SSD via PCIe HAT | 2-3s model load vs 45s. Waveshare HAT+ ($10)
Agent | ZeroClaw or Moltis | ZeroClaw for min RAM (<5MB). Moltis for max features.
Inference | Ollama | OpenAI-compatible API. Easiest agent integration.
Model | Qwen2.5 1.5B Instruct Q4_K_M | Best speed-to-quality. 15-20 t/s, 1.8GB RAM.
Swap | 2GB zram (lz4) | Far faster than disk swap
Filesystem | ext4 + noatime | Battle-tested, no drama
Cooling | Official Active Cooler | Non-negotiable for sustained inference
Ollama Config
OLLAMA_CONTEXT_LENGTH=1024
OLLAMA_NUM_PARALLEL=1
OLLAMA_KEEP_ALIVE=24h
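Because Ollama exposes an OpenAI-compatible endpoint on its default port, any agent framework, or plain curl, can drive the stack. A hypothetical smoke test (the model tag is illustrative):

```shell
# Smoke-test the local Ollama API (default port 11434).
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5:1.5b-instruct",
       "messages": [{"role": "user", "content": "Reply with one word: pong"}]}'
```
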
Kernel Tuning
vm.swappiness=10
vm.overcommit_memory=1
CPU governor: performance
THP: madvise

Total hardware cost: ~$110 (Pi 5 8GB + Active Cooler + NVMe HAT + SSD)

Sources

Key references from 4 parallel research agents

Benchmarks & Papers

• Stratosphere Labs — LLM Performance on Pi 5

• arxiv:2511.07425 — LLM Inference on SBCs

• itsfoss.com — 9 LLMs on Pi 5

• DFRobot Lab — SLM Performance Analysis

• aidatatools — llamafile vs llama.cpp comparison

Agent Frameworks

• ZeroClaw — github.com/zeroclaw-labs/zeroclaw

• Moltis — github.com/moltis-org/moltis

• Neko — github.com/superhq-ai/neko

• NanoBot — github.com/HKUDS/nanobot

• Picobot — github.com/louisho5/picobot

• NanoClaw — github.com/qwibitai/nanoclaw

Community Projects

• Max Headbox — blog.simone.computer

• OpenClaw on Pi — raspberrypi.com/news

• be-more-agent, TrooperAI, PicoLM (GitHub)

• Arm Learning Paths — Smart Home LLM guide

OS & System

• DietPi — dietpi.com/stats

• Jeff Geerling — NVMe boot, overclocking

• NixOS Wiki — Pi 5 ARM support

• eunomia.dev — OS-Level LLM Optimizations

Hardware

• Pi AI HAT+ 2 — raspberrypi.com announcement

• ezrknn-llm — RK3588 NPU toolkit

• ThinkRobotics — Jetson vs Pi 5 comparison

Research date: February 26, 2026 • PAI Research System • 4 parallel agents