AI Agent on Raspberry Pi 5

Lightweight OpenClaw alternatives, local LLMs, and the optimal headless OS

Research compiled February 26, 2026

4 parallel researchers | Claude, Perplexity, Gemini, General

Target: RPi 5 • 8GB RAM • ARM64 • Headless

Agenda

What we researched

01   The Challenge — What we need from a Pi 5
02   Agent Frameworks — Lightweight OpenClaw alternatives
03   Top Agent Picks — ZeroClaw, Moltis, Neko, NanoBot
04   Local LLMs — Models that fit in 4-5GB
05   Inference Engines — Ollama vs llama.cpp vs llamafile
06   Benchmark Data — Real tok/sec numbers
07   Operating System — Ubuntu vs DietPi vs Pi OS
08   System Tuning — Kernel, swap, filesystem
09   Hardware — NVMe, cooling, UPS, alternative SBCs
10   Community Projects — Real-world Pi AI builds
11   Recommended Stack — The final answer
12   Sources — All references

The Challenge

Running a full AI agent + local LLM on an 8GB ARM board

What OpenClaw Provides
  • LLM gateway routing (multi-provider)
  • Tool/skill system with MCP support
  • Multi-channel messaging (Telegram, Discord, etc.)
  • Scheduled jobs & cron
  • Sub-agent spawning
  • Persistent memory
Pi 5 Constraints
  • 8GB LPDDR4X — shared between OS, agent, and model
  • CPU-only inference — no GPU, Cortex-A76 quad-core
  • NVMe via PCIe — fast model loading
  • Thermal limits — throttles at 80°C
  • ARM64 native — good ecosystem support

RAM Budget

Component | RAM
OS + system | 150-200 MB (Ubuntu) / 30-50 MB (DietPi)
Agent framework | 5-200 MB (depends on choice)
Local LLM (3B Q4) | ~3.2 GB
KV cache (1024 ctx) | ~200-400 MB
Headroom | ~4 GB free
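The headroom figure can be sanity-checked with back-of-envelope arithmetic, taking the Ubuntu worst case from the budget above (all values in MB, the document's own estimates):

```shell
# Worst-case RAM budget: 8GB board, Ubuntu, heaviest agent, 3B Q4 model.
awk 'BEGIN {
  total = 8192                  # Pi 5 8GB
  os = 200; agent = 200         # Ubuntu idle + heaviest agent framework
  model = 3276; kv = 400        # ~3.2 GB model + 1024-ctx KV cache
  free = total - os - agent - model - kv
  printf "headroom: %.1f GB\n", free / 1024
}'
```

This lands right at the ~4 GB the table claims, which is why the 3B Q4 tier is the practical ceiling on an 8GB board.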

Agent Frameworks

12 alternatives evaluated — from 5MB Rust binaries to full platforms

Framework | Language | RAM | Telegram | MCP | Sub-agents | Tier
ZeroClaw | Rust | <5 MB | Yes | Yes | Yes | Pi-First
Moltis | Rust | ~60 MB | Yes | Yes | Yes | Pi-First
Neko | Rust/Go | <100 MB | Yes | Yes | No | Pi-First
NanoBot | Python | 191 MB | Yes | Yes | Yes | Pi-First
Picobot | Go | <256 MB | Yes | No | No | Pi-First
NanoClaw | TypeScript | ~200 MB | WhatsApp | No | Yes | Lightweight
n8n | TypeScript | 300-500 MB | Yes | Community | Yes | Heavy
Open WebUI | Python/Svelte | 500 MB-1 GB | No | No | No | Heavy
Agno | Python | Light | DIY | No | Yes | Framework
LocalAI | Go | Light | No | No | No | Inference Only

Top Pick: ZeroClaw

The direct OpenClaw replacement — 99% less RAM

What It Is

Rust-based OpenClaw runtime replacement. Single static binary, no Node.js, no npm, no runtime deps.

Key Stats

  • Binary size: 3.4 MB
  • RAM idle: <5 MB
  • Startup: <10 ms
  • ARM64: native cross-compiled binary
  • LLM providers: 22+

Killer Feature

Reads OpenClaw config files directly — migration path exists from your current setup.

Feature Parity with OpenClaw

LLM gateway routing | 22+ providers
Tool/skill system | Rust trait plugins
Telegram | Yes
Discord/Slack | Yes
MCP support | Yes
Scheduled jobs | Yes
Sub-agents | Yes
Memory | Local vector + keyword

Trade-off

Newer project with a smaller community. And because the core is a compiled Rust binary, it can't be hot-modified at runtime the way a Node.js codebase can.

Runner-Up: Moltis

Most feature-complete Rust option — OpenClaw parity in one binary

What It Is

"A local-first AI gateway — a single Rust binary that sits between you and multiple LLM providers." Hit HN front page Feb 2026.

Key Stats

  • Binary: 60 MB (150k lines, includes web UI)
  • Docker: ghcr.io/moltis-org/moltis:latest (multi-arch)
  • 2,300+ tests, zero unsafe Rust

Everything In One Binary

  • Multi-provider LLM gateway
  • MCP support (stdio + HTTP/SSE, health polling, auto-restart)
  • Telegram + web UI + API
  • Embeddings-powered long-term memory with auto-compaction
  • Sub-agents
  • Built-in TTS
  • Browser automation
  • Scheduled jobs
Also Worth Watching

Neko — Targets Pi Zero 2W (512MB). Markdown file memory (like PAI). Telegram + MCP. Minimal but elegant.

NanoBot (HKUDS) — Python, 191MB on Pi, broadest messaging support (Telegram, Discord, WhatsApp, Feishu, QQ). Fully auditable.

Skip These for Pi

Flowise — 4GB+ recommended. ARM64 stability issues.

Open WebUI — No Telegram/bot interface. Chat UI only.

Fedora IoT — Wrong tool entirely.

Local LLMs for Pi 5

The sweet spot: 1-3B parameter models in Q4_K_M quantization

Tier 1: Recommended

Model | Params | GGUF Size | RAM | Speed | Best For
Qwen2.5 1.5B Instruct | 1.5B | ~1.0 GB | ~1.8 GB | 15-20 t/s | Best speed-to-quality ratio
Gemma3 1B | 1B | ~700 MB | ~1.2 GB | 10-15 t/s | Fast conversational agent
Llama 3.2 1B Instruct | 1B | ~700 MB | ~1.2 GB | 10-14 t/s | General assistant
Qwen2.5 3B Instruct | 3B | ~2.0 GB | ~3.2 GB | 5-8 t/s | Better reasoning
Llama 3.2 3B Instruct | 3B | ~2.0 GB | ~3.2 GB | 4-6 t/s | Solid all-around

Tier 2: Situational

Model | Params | RAM | Speed | Notes
Gemma2 2B | 2B | ~2.5 GB | 6-9 t/s | Strong benchmarks for size
BitNet b1.58 2B | 2B | ~800 MB | ~8 t/s | 1-bit quant, tiny footprint
Phi-3.5 Mini | 3.8B | ~3.8 GB | 3-4 t/s | Good at coding, hallucination-prone
DeepSeek-R1 (distilled) | 1.5B | ~1.0 GB | 8-12 t/s | Strong reasoning
Qwen2.5 0.5B | 0.5B | ~700 MB | 25-35 t/s | Ultra-fast, some quality loss

Do NOT Run on Pi 5

7B models: 1-3 tok/sec. Mistral 7B, Llama 2 7B, Phi-4 — all too slow for conversational use.

What Speed Feels Like

Token/sec thresholds for real-world usability

15+ tok/sec — Near-instant

Feels like typing. Qwen2.5 0.5B, Qwen2.5 1.5B

8-15 tok/sec — Very usable

Responses feel live, slight pause. Gemma3 1B, Llama 3.2 1B, TinyLlama

4-8 tok/sec — Acceptable

Noticeable wait, OK for short exchanges. Qwen2.5 3B, Llama 3.2 3B

1-4 tok/sec — Painful

Only for batch/async tasks. All 7B models on Pi 5

Below 1 tok/sec — Unusable

Batch processing only. 13B+ models

Critical caveat: With 2048-token context, expect 6-50 seconds total response time. Keep context at 512-1024 tokens for interactive feel.
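The wait behind each tier is just generated tokens divided by decode speed (prompt prefill adds more on top, per the caveat above). A quick illustration with a typical 128-token agent reply at three of the speeds listed:

```shell
# Decode-only latency estimate: tokens / tok-per-sec, ignoring prefill.
# Speeds (15, 6, 2 tok/s) correspond to the tiers described above.
for tps in 15 6 2; do
  awk -v toks=128 -v tps="$tps" \
    'BEGIN { printf "%3d tok/s -> %5.1f s for 128 tokens\n", tps, toks/tps }'
done
```

At 2 tok/s even a short reply takes over a minute, which is why the 7B class is batch-only on this hardware.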

Inference Engines

How you run the model matters as much as which model you pick

llama.cpp Fastest

Gold standard for CPU inference. ARM64/NEON native. 10-20% faster than Ollama.

# Build with OpenBLAS and Cortex-A76 CPU features (dotprod + fp16)
cmake -B build \
  -DGGML_BLAS=ON \
  -DGGML_BLAS_VENDOR=OpenBLAS \
  -DCMAKE_C_FLAGS="-march=armv8.2-a+dotprod+fp16"
cmake --build build --config Release -j4

Best for: Maximum performance, full control
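A sketch of running the resulting binary (binary name per current llama.cpp releases; the model path and prompt are illustrative, not from this research):

```shell
# -t 4: one thread per Cortex-A76 core
# -c 1024: small context window for interactive latency
# --mlock: pin model weights in RAM so they never hit swap
./build/bin/llama-cli -m models/qwen2.5-1.5b-instruct-q4_k_m.gguf \
  -t 4 -c 1024 --mlock -p "Summarize the Pi 5 thermal limits."
```
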

Ollama Easiest

One-command install. OpenAI-compatible API. Agent frameworks love it.

OLLAMA_CONTEXT_LENGTH=512
OLLAMA_NUM_PARALLEL=1
OLLAMA_KEEP_ALIVE=24h

Best for: Agent framework integration
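On a standard Linux install, Ollama runs as a systemd service, so the environment variables above go in a service drop-in rather than your shell profile. A sketch, assuming the default `ollama.service` unit name:

```shell
# Apply the Ollama env vars to the systemd-managed service.
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment="OLLAMA_CONTEXT_LENGTH=512"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_KEEP_ALIVE=24h"
EOF
sudo systemctl daemon-reload && sudo systemctl restart ollama
```

`OLLAMA_KEEP_ALIVE=24h` matters most on the Pi: it keeps the model resident so you pay the load cost once, not per request.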

llamafile Highest Throughput

Single-file executable. Up to 4x faster than Ollama with 30-40% lower power draw, as measured in an academic study (arXiv:2511.07425).

Best for: Max speed without API server

BitNet.cpp Specialized

Microsoft's 1-bit model runtime. 1.37x-5.07x speedup over llama.cpp. Only works with BitNet b1.58 models.

Best for: Absolute minimum RAM footprint

Benchmark Data

Real Pi 5 8GB numbers from multiple independent sources

Model | Quant | RAM | Ollama t/s | llama.cpp t/s | llamafile t/s
Qwen2.5 0.5B | Q4_K_M | ~700 MB | 25-35 | 30-40 | -
Qwen2.5 1.5B | Q4_K_M | ~1.8 GB | 15-20 | 18-22 | -
TinyLlama 1.1B | Q4_K_M | ~1.1 GB | 12-15 | 15-18 | ~23
Gemma3 1B | Q4_K_M | ~1.2 GB | ~10 | ~15 | -
Llama 3.2 1B | Q4_K_M | ~1.2 GB | 10-12 | 12-15 | -
BitNet b1.58 2B | I2_S | ~800 MB | - | ~8 (bitnet.cpp) | -
Gemma2 2B | Q4_K_M | ~2.5 GB | 6-9 | 8-11 | -
Qwen2.5 3B | Q4_K_M | ~3.2 GB | 5-8 | 7-10 | -
Llama 3.2 3B | Q4_K_M | ~3.2 GB | 3.3-5.8 | 5-7 | -
Phi-3.5 Mini | Q4_K_M | ~3.8 GB | ~3.2 | ~4-5 | -
Llama 2 7B | Q4_K_M | ~5.5 GB | 1.5-2.5 | 2-3.5 | -
DeepSeek-R1 7B | Q4_K_M | ~7 GB | ~1.4 | ~2 | -

Sources: Stratosphere Labs, arxiv:2511.07425, itsfoss.com, DFRobot, aidatatools, Raspberry Pi Forums

Operating System

7 OS options evaluated for headless Pi 5 AI workloads

OS | Idle RAM | Docker | ARM64 Opt | AI Ecosystem | Verdict
Ubuntu 24.04 LTS | 150-200 MB | Best | Very Good | Best | Recommended
DietPi | 30-50 MB | Good | Excellent | Moderate | RAM-Critical
Pi OS Lite 64-bit | 60-80 MB | Good | Best | Solid | Safe Default
Armbian | 80-120 MB | Good | Good | Moderate | No advantage
NixOS ARM64 | 150-200 MB | Good | Good | Growing | Expert only
Alpine Linux | 15-30 MB | musl issues | Good | Poor | Avoid
Fedora IoT | 200-250 MB | Podman | Good | Small | Avoid

Ubuntu vs DietPi

The two real contenders

Ubuntu Server 24.04 LTS
  • Best Docker support — Docker's primary platform
  • Widest AI/ML ecosystem — every tool documents Ubuntu
  • 5-year LTS — support until 2029
  • Kernel 6.8+ with Cortex-A76 optimizations
  • Python 3.12, Node 20/22
  • 150-200 MB idle RAM

Choose when: Docker is central to your stack, or you want maximum ecosystem compatibility.

DietPi
  • 30-50 MB idle RAM — 4x lighter than Ubuntu
  • Auto-optimizes — CPU governor, tmpfs, disabled services
  • Pi OS kernel — inherits all Pi-specific patches
  • Good Docker via dietpi-software
  • Same apt repos as Pi OS
  • Fewer AI tutorials target it

Choose when: every megabyte matters (e.g. you're pushing a 7B model), or you're running the LLM natively rather than in Docker.

System Tuning

Kernel parameters, swap strategy, and filesystem for LLM inference

Kernel Parameters /etc/sysctl.conf

vm.swappiness=10
vm.overcommit_memory=1
vm.dirty_ratio=5
vm.dirty_background_ratio=2
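Rather than editing /etc/sysctl.conf directly, the same settings can live in a drop-in file and be applied without a reboot (filename is an arbitrary choice):

```shell
# Persist the LLM-friendly kernel parameters and apply them immediately.
sudo tee /etc/sysctl.d/99-llm-tuning.conf <<'EOF'
vm.swappiness=10
vm.overcommit_memory=1
vm.dirty_ratio=5
vm.dirty_background_ratio=2
EOF
sudo sysctl --system
```
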

CPU Governor

echo performance | sudo tee \
  /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

Transparent Huge Pages

echo madvise | sudo tee \
  /sys/kernel/mm/transparent_hugepage/enabled

Memory Locking

# /etc/security/limits.conf
* hard memlock unlimited
* soft memlock unlimited
# llama.cpp --mlock flag

Swap Strategy: zram

Use zram, not NVMe swap

Compressed swap in RAM is orders of magnitude faster than swapping to SD or NVMe. 2GB of zram behaves like ~4GB at a typical 2:1 lz4 compression ratio.

sudo apt install zram-tools
echo "ALGO=lz4" | sudo tee -a /etc/default/zramswap
echo "PERCENT=25" | sudo tee -a /etc/default/zramswap
sudo systemctl enable zramswap --now
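The sizing works out as follows: PERCENT=25 of an 8GB board gives 2GB of zram, which at the ~2:1 lz4 ratio behaves like roughly 4GB of swap:

```shell
# zram sizing arithmetic for the config above (values in MB).
awk 'BEGIN {
  ram = 8192                 # Pi 5 8GB
  zram = ram * 0.25          # PERCENT=25
  effective = zram * 2       # ~2:1 lz4 compression
  printf "zram: %d MB, effective: ~%d MB\n", zram, effective
}'
```
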

Filesystem

  • NVMe: ext4 (battle-tested, no benefit to f2fs)
  • SD card: ext4 + noatime + tmpfs for logs
  • Docker-heavy: consider btrfs for CoW snapshots

Quantization Sweet Spot

Level | RAM Savings | Quality
Q4_K_M | -50% | Small loss
Q5_K_M | -38% | Minimal loss
Q3_K_M | -63% | Moderate loss

Hardware Essentials

NVMe, active cooling, and power — the non-negotiables

NVMe Storage

Model load: 45s (SD) vs 2-3s (NVMe)

  • Waveshare PCIe M.2 HAT+ — <$10
  • Pimoroni NVMe Base — $14
  • 52Pi M.2 HAT — budget

Drive: Samsung 980/990 EVO 2230

Active Cooling

Mandatory. Without it: 20-30% throughput loss from throttling.

  • Official Active Cooler — $5-8, keeps 72-74°C
  • Throttle temp: 80°C
  • Hard limit: 85°C
  • Reaches throttle temp in <90s at full load without cooling
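A quick way to keep an eye on throttling territory, using the standard Linux sysfs thermal zone (on Pi OS, `vcgencmd measure_temp` gives the same reading; the 75°C warning threshold is an arbitrary margin below the 80°C throttle point):

```shell
# Print CPU temperature; flag readings approaching the 80°C throttle point.
zone=/sys/class/thermal/thermal_zone0/temp
if [ -r "$zone" ]; then
  awk '{ t = $1 / 1000
         printf "CPU: %.1f C%s\n", t, (t >= 75 ? "  (nearing throttle)" : "") }' "$zone"
else
  echo "no thermal zone exposed"
fi
```

Wrap it in `watch -n 2` during a sustained inference run to see whether your cooling actually holds.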
UPS Power

Pi 5 needs 5V/5A (25W).

  • Geekworm X1200 — 2-cell 18650
  • Geekworm X1202 — 4-cell, 24/7
  • SunFounder PiPower 5 — smart dashboard

Hailo AI Kit Verdict

Hailo-8L (Original, 13 TOPS)

Does NOT help with LLMs. Vision tasks only. Skip it.

Hailo-10H AI HAT+ 2 (40 TOPS)

Supports LLMs 1-7B. The NPU that actually matters. Jan 2026 release.

Alternative SBCs

If Pi 5 8GB isn't enough

Board | RAM | AI Accel | LLM Speed | Price | Best For
Pi 5 8GB | 8 GB | CPU only* | 5-20 t/s (1-3B) | ~$80 | Budget + ecosystem
Pi 5 16GB | 16 GB | CPU only* | Same speed | ~$120 | Larger models, same speed
Orange Pi 5 Max | 16 GB | 6 TOPS NPU | TinyLlama 17.7 t/s | ~$150 | Best value upgrade
Radxa Rock 5B+ | 32 GB | 6 TOPS NPU | Similar to OPi5 | ~$180 | Maximum RAM on ARM SBC
NVIDIA Jetson Orin Nano | 8 GB | 67 TOPS (CUDA) | Best in class | ~$250 | Serious AI, don't care about cost

* Hailo-10H AI HAT+ 2 adds 40 TOPS NPU to Pi 5

Decision Framework

  • Cheapest viable setup: Pi 5 8GB + active cooler + NVMe HAT (~$110 total)
  • Better LLM speed + more RAM: Orange Pi 5 Max 16GB (RK3588 NPU accelerates inference)
  • Maximum ARM RAM: Radxa Rock 5B+ 32GB
  • Best LLM performance period: NVIDIA Jetson Orin Nano Super (67 TOPS CUDA)

Community Projects

Real-world Pi 5 AI agent builds

Max Headbox

github.com/syxanash/maxheadbox

Fully local AI agent desk toy. Pi 5, USB mic, screen. Qwen3 1.7B (tool-calling) + Gemma3 1B (conversational). Wake-word triggered.

Covered by XDA Developers, Hackaday

be-more-agent

github.com/brenpoly/be-more-agent

100% local conversational AI agent. Ollama + Whisper.cpp (STT). Full Pi local stack, no cloud.

TrooperAI

github.com/m15-ai/TrooperAI

Low-latency local voice assistant for Pi 5 with LED and gesture control. Real-time STT + streaming LLM + TTS.

PicoLM

github.com/RightNow-AI/picolm

Runs 1B LLM on boards with 256MB RAM. JSON grammar mode for structured tool calling on tiny hardware.

OpenClaw on Pi

raspberrypi.com/news + openclawpi.com

Official Raspberry Pi blog featured OpenClaw. Adafruit has full install guide. Dedicated community site.

Home Assistant + LLM

home-assistant.io/integrations/ollama

Official Ollama integration. home-llm project with Pi-optimized 3B model for home control. Wyoming voice pipeline.

Recommended Stack

The final answer for Pi 5 8GB headless AI agent

The Stack

Layer | Choice | Why
OS | Ubuntu Server 24.04 LTS | Best Docker + AI ecosystem. DietPi if pushing 7B.
Storage | NVMe SSD via PCIe HAT | 2-3s model load vs 45s. Waveshare HAT+ ($10)
Agent | ZeroClaw or Moltis | ZeroClaw for min RAM (<5MB). Moltis for max features.
Inference | Ollama | OpenAI-compatible API. Easiest agent integration.
Model | Qwen2.5 1.5B Instruct Q4_K_M | Best speed-to-quality. 15-20 t/s, 1.8GB RAM.
Swap | 2GB zram (lz4) | Far faster than disk swap
Filesystem | ext4 + noatime | Battle-tested, no drama
Cooling | Official Active Cooler | Non-negotiable for sustained inference
Ollama Config
OLLAMA_CONTEXT_LENGTH=1024
OLLAMA_NUM_PARALLEL=1
OLLAMA_KEEP_ALIVE=24h
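Because Ollama exposes an OpenAI-compatible endpoint on its default port, any agent framework, or plain curl, can drive the stack. A hypothetical smoke test (the model tag is illustrative):

```shell
# Smoke-test the local Ollama API (default port 11434).
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5:1.5b-instruct",
       "messages": [{"role": "user", "content": "Reply with one word: pong"}]}'
```
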
Kernel Tuning
vm.swappiness=10
vm.overcommit_memory=1
CPU governor: performance
THP: madvise

Total hardware cost: ~$110 (Pi 5 8GB + Active Cooler + NVMe HAT + SSD)

Sources

Key references from 4 parallel research agents

Benchmarks & Papers

• Stratosphere Labs — LLM Performance on Pi 5

• arxiv:2511.07425 — LLM Inference on SBCs

• itsfoss.com — 9 LLMs on Pi 5

• DFRobot Lab — SLM Performance Analysis

• aidatatools — llamafile vs llama.cpp comparison

Agent Frameworks

• ZeroClaw — github.com/zeroclaw-labs/zeroclaw

• Moltis — github.com/moltis-org/moltis

• Neko — github.com/superhq-ai/neko

• NanoBot — github.com/HKUDS/nanobot

• Picobot — github.com/louisho5/picobot

• NanoClaw — github.com/qwibitai/nanoclaw

Community Projects

• Max Headbox — blog.simone.computer

• OpenClaw on Pi — raspberrypi.com/news

• be-more-agent, TrooperAI, PicoLM (GitHub)

• Arm Learning Paths — Smart Home LLM guide

OS & System

• DietPi — dietpi.com/stats

• Jeff Geerling — NVMe boot, overclocking

• NixOS Wiki — Pi 5 ARM support

• eunomia.dev — OS-Level LLM Optimizations

Hardware

• Pi AI HAT+ 2 — raspberrypi.com announcement

• ezrknn-llm — RK3588 NPU toolkit

• ThinkRobotics — Jetson vs Pi 5 comparison

Research date: February 26, 2026 • PAI Research System • 4 parallel agents