
Be More Agent: Build an Offline AI Companion on Raspberry Pi

Author: Bright Coding

Stop sending your voice data to Big Tech. Every time you ask Alexa about the weather or tell Siri to set a timer, that audio clip travels through corporate servers, gets logged, analyzed, and potentially leaked. What if you could have a conversational AI that never leaves your living room—one that thinks, speaks, and sees entirely on a $75 computer?

Enter Be More Agent, the open-source project that's making developers abandon cloud assistants faster than you can say "privacy breach." This isn't another ChatGPT wrapper or API-dependent gadget. It's a fully autonomous, offline-first AI agent that transforms your Raspberry Pi into a sentient little companion with animated expressions, custom voices, and vision capabilities. No subscription fees. No internet dependency. No data harvesting.

The secret sauce? It runs 100% locally using Ollama for language processing, Whisper.cpp for speech recognition, and Piper TTS for neural voice synthesis. Even the wake word detection happens on-device using OpenWakeWord. In this complete guide, I'll walk you through why this project is exploding in popularity, how to build your own, and why it might be the most important Raspberry Pi project you'll tackle this year.


What is Be More Agent?

Be More Agent is an open-source, customizable AI agent framework created by developer brenpoly and hosted at github.com/brenpoly/be-more-agent. Designed specifically for Raspberry Pi hardware, it represents a radical departure from the cloud-dependent smart speakers that have dominated the consumer market for the past decade.

The project's philosophy is elegantly simple: your AI should belong to you. Every component—from speech recognition to language understanding to voice synthesis—executes locally on the Pi's ARM processor. The only time it reaches for the internet is when you explicitly enable the optional DuckDuckGo web search feature, and even then, it's fetching public information rather than transmitting your personal data.

What's making this project trend now? Three converging forces: the explosion of efficient open-source LLMs (Gemma 2B, Moondream), mature on-device speech tools (Whisper.cpp, Piper), and growing developer fatigue with API costs and rate limits. When GPT-4's API can cost $0.03 per thousand tokens and cloud speech services charge by the minute, a one-time $75 hardware investment with zero ongoing fees becomes incredibly attractive.

The project is also deliberately designed as a "blank canvas." The default configuration evokes BMO from Adventure Time (with appropriate fan-project disclaimers), but every visual and auditory element is swappable. Want a stern British butler? A chirpy anime sidekick? A minimalist geometric face? Just replace the PNG sequences in the faces/ folders and drop in new .wav sound effects. The architecture doesn't care about your aesthetic choices—it just provides the neural infrastructure for your creativity.


Key Features: The Technical Breakdown

Let's dissect what makes Be More Agent technically remarkable, feature by feature:

100% Local Intelligence with Ollama and Whisper.cpp: The agent's "brain" runs on Ollama, a popular tool for running open-source LLMs locally. By default, it uses Google's Gemma 2B, a capable small model that fits in a Pi's limited RAM while delivering coherent, context-aware responses. For vision tasks, Moondream (a small vision-language model with under 2B parameters) enables the agent to describe what it sees through a connected camera. Speech-to-text uses Whisper.cpp, Georgi Gerganov's optimized C/C++ port of OpenAI's Whisper, which runs efficiently on ARM processors without GPU acceleration.

Open Source Wake Word with OpenWakeWord: Unlike proprietary wake word systems that require developer accounts, API keys, and cloud verification, Be More Agent uses OpenWakeWord, a fully offline, trainable wake word detection system. The default model responds to "Hey Jarvis," but you can train custom wake words using your own voice samples. The .onnx model file sits in the project root, completely self-contained.

Hardware-Aware Audio Processing: The script auto-detects your microphone's sample rate and resamples audio on the fly. This avoids the ALSA errors that plague many Raspberry Pi audio projects. If you've ever fought with arecord throwing cryptic "sample rate not supported" messages, this feature alone saves hours of frustration.
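
To make the resampling idea concrete, here is a minimal sketch using linear interpolation. It is an illustration, not the project's actual code: the function name resample_linear is hypothetical, and a production resampler would also low-pass filter before downsampling to limit aliasing. The 16 kHz target reflects the input rate that Whisper models expect.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono sample sequence by linear interpolation (illustrative only)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)

    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # distance between output samples, in input samples
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Blend the two neighbouring input samples.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# Example: one millisecond of 48 kHz audio becomes 16 samples at 16 kHz.
mic_chunk = list(range(48))
whisper_chunk = resample_linear(mic_chunk, 48000, 16000)
```

Pure Python keeps the sketch dependency-free; for real-time audio on a Pi, a vectorized or native resampler would be the more realistic choice.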

Smart Web Search Fallback: When the local LLM encounters knowledge gaps, the agent can query DuckDuckGo for real-time information. This hybrid approach (local reasoning with optional web augmentation) delivers the best of both worlds: privacy for personal queries, accuracy for current events.

Reactive Face Animation System: The GUI isn't static. It displays PNG sequences that change based on the agent's internal state: gentle breathing when idle, an alert expression when listening, animated "thinking" dots when processing, and synchronized mouth movements during speech. This state machine creates the illusion of personality without complex 3D rendering.

Piper TTS for Neural Voice Synthesis: Piper (from the Rhasspy project) generates natural-sounding speech at low latency. The project even includes a custom fine-tuned BMO voice model, trained locally from Piper's base "Amy" model, available through the releases page.


Use Cases: Where Be More Agent Shines

1. The Privacy-First Smart Home Hub

Deploy Be More Agent as a voice-controlled home automation interface that never transmits your commands to external servers. Integrate it with Home Assistant via local MQTT, and you've got a Jarvis-like interface that knows when you turned off the lights—but no corporation does.

2. Offline Educational Companion

In classrooms with restricted internet or developing regions with unreliable connectivity, this becomes a tutoring device that answers questions, explains concepts, and engages students through conversation. The vision capability lets it comment on physical objects, making it ideal for interactive learning.

3. Accessible Technology for Sensitive Environments

Hospitals, legal offices, and government facilities often prohibit cloud-connected devices. Be More Agent provides AI assistance in air-gapped environments—taking notes, answering procedure questions, or reading documents aloud without compliance risks.

4. Custom Character Development & Prototyping

Game developers and interactive artists can rapidly prototype NPC personalities by swapping face animations and voice models. The modular architecture lets you test dialogue trees and emotional responses before investing in custom game engine integration.


Step-by-Step Installation & Setup Guide

Ready to build your own? Here's the complete installation process, extracted and verified from the project's documentation.

Prerequisites

First, ensure your Raspberry Pi OS is fully updated:

sudo apt update && sudo apt upgrade -y
sudo apt install git -y

Hardware requirements: Raspberry Pi 5 (recommended) or Pi 4 with 4GB+ RAM, USB microphone and speaker, LCD display (DSI or HDMI), and optionally the Raspberry Pi Camera Module.

Install Ollama

The agent's intelligence depends on Ollama. Install it with:

curl -fsSL https://ollama.com/install.sh | sh

Then pull the required models:

ollama pull gemma:2b      # The main conversational brain
ollama pull moondream     # Vision capability for camera input

Clone and Run Setup

git clone https://github.com/brenpoly/be-more-agent.git
cd be-more-agent
chmod +x setup.sh
./setup.sh

The setup.sh script is doing heavy lifting here: installing system libraries, creating folder structures, downloading Piper TTS and voice models, establishing a Python virtual environment, and fetching the default "Hey Jarvis" wake word model.

Configure Your Wake Word

The default wake word works out of the box, but customization is straightforward:

  1. Train your own model at OpenWakeWord
  2. Place the resulting .onnx file in the project root
  3. Rename it to wakeword.onnx

Launch the Agent

source venv/bin/activate
python agent.py

On first run, agent.py auto-generates config.json with sensible defaults. The GUI window should appear, showing the warmup face animation sequence.


REAL Code Examples from the Repository

Let's examine actual code patterns from the Be More Agent project, with detailed explanations of how each component functions.

Configuration Architecture (config.json)

The project uses a clean JSON configuration system. Standard JSON does not allow comments, so the file itself holds only keys and values; each field is explained after the listing:

{
    "text_model": "gemma3:1b",
    "vision_model": "moondream",
    "voice_model": "piper/en_GB-semaine-medium.onnx",
    "chat_memory": true,
    "camera_rotation": 0,
    "system_prompt_extras": "You are a helpful robot assistant. Keep responses short and cute."
}

  - text_model: which Ollama model handles conversation. "gemma:2b" is the default; "gemma3:1b" is a newer, smaller model.
  - vision_model: enables camera-based visual understanding. Moondream runs efficiently on the Pi's limited compute.
  - voice_model: path to the Piper TTS voice model. "medium" quality implies a 22050 Hz sample rate.
  - chat_memory: persists conversation history to chat_memory.json, enabling context-aware multi-turn dialogue.
  - camera_rotation: adjust if the camera is mounted upside-down or sideways. Values: 0, 90, 180, 270 degrees.
  - system_prompt_extras: appended to the base system prompt. Controls personality, response length, and tone.

This configuration demonstrates thoughtful design: every tunable parameter is exposed without overwhelming newcomers. The system_prompt_extras field is particularly powerful—it's how you inject personality without modifying source code.
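
The "auto-generate on first run" behavior can be sketched roughly as follows. This is an assumption about how such a loader might look, not code from agent.py: the DEFAULT_CONFIG values and the load_config name are hypothetical, chosen to mirror the fields shown above.

```python
import json
from pathlib import Path

# Hypothetical defaults mirroring the fields described in this section.
DEFAULT_CONFIG = {
    "text_model": "gemma:2b",
    "vision_model": "moondream",
    "voice_model": "piper/en_GB-semaine-medium.onnx",
    "chat_memory": True,
    "camera_rotation": 0,
    "system_prompt_extras": "",
}


def load_config(path="config.json"):
    """Return settings from path, writing a default file first if none exists."""
    config_path = Path(path)
    if not config_path.exists():
        config_path.write_text(json.dumps(DEFAULT_CONFIG, indent=4), encoding="utf-8")
        return dict(DEFAULT_CONFIG)

    user_settings = json.loads(config_path.read_text(encoding="utf-8"))
    # Keys missing from the user's file fall back to the defaults.
    return {**DEFAULT_CONFIG, **user_settings}
```

Merging the user's file over the defaults means an older config.json keeps working when new keys are introduced, which is one reason this pattern is common in small tools.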

Project Structure: The State Machine Pattern

The directory layout reveals the agent's operational architecture:

be-more-agent/
├── agent.py                   # Main event loop: wake word → STT → LLM → TTS → face update
├── setup.sh                   # Idempotent environment bootstrap
├── wakeword.onnx              # Neural network for wake word detection (ONNX Runtime)
├── config.json                # Runtime configuration (re-read on each start)
├── chat_memory.json           # Conversation history persisted as plain JSON
├── requirements.txt           # Python dependencies (PyAudio, ONNX, OpenCV, etc.)
├── whisper.cpp/               # Git submodule or compiled binary for STT
├── piper/                     # TTS engine + voice model storage
├── sounds/                    # Audio feedback organized by agent state
│   ├── greeting_sounds/       # Played once on startup
│   ├── thinking_sounds/       # Looped during LLM inference
│   ├── ack_sounds/            # Short confirmation of wake word detection
│   └── error_sounds/          # Played on exceptions or unrecognized input
└── faces/                     # Visual state machine output
    ├── idle/                  # Subtle breathing animation (low CPU)
    ├── listening/             # Alert, attentive expression
    ├── thinking/              # Processing indicator (often animated dots)
    ├── speaking/              # Mouth movement synchronized to TTS output
    ├── error/                 # Confused or apologetic expression
    └── warmup/                # Boot sequence before full initialization

The faces/ and sounds/ directories implement a declarative state machine: agent.py simply checks its current state and loads resources from the corresponding folder. Adding new expressions requires zero code changes—just drop appropriately named PNGs or WAVs.
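
A folder-per-state loader of this kind can be sketched in a few lines. The function names below (load_frames, frame_cycle) are hypothetical and the real agent.py may organize this differently; the sketch only assumes the faces/<state>/ layout shown in the tree above.

```python
from itertools import cycle
from pathlib import Path


def load_frames(faces_dir, state):
    """Return the sorted PNG paths for one state folder, e.g. faces/idle/."""
    frames = sorted(Path(faces_dir, state).glob("*.png"))
    if not frames:
        raise FileNotFoundError(f"no PNG frames found for state '{state}'")
    return frames


def frame_cycle(faces_dir, state):
    """Return an endless iterator over a state's frames so the GUI can loop them."""
    return cycle(load_frames(faces_dir, state))
```

Because frames are discovered by sorted filename, zero-padded names such as frame_01.png keep the playback order predictable, and adding a frame requires no code change.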

Custom Voice Model Integration

The BMO voice setup shows advanced Piper TTS integration:

"voice_model": "voices/bmo.onnx"

With manual installation steps documented as:

# Create dedicated voices directory
mkdir -p voices

# Download fine-tuned model artifacts from GitHub Releases
# bmo.onnx        - The neural network weights
# bmo.onnx.json   - Model configuration (sample rate, phoneme mappings, inference params)

# Place both files in voices/
# Update config.json path accordingly

The fine-tuning pipeline is noteworthy: starting from Piper's en_US-amy-medium base, the creator trained a custom voice locally. This proves the project isn't just consuming open-source tools—it's demonstrating how to extend and customize them.

Critical Audio Debugging: Sample Rate Fix

The troubleshooting section contains this essential fix for voice pitch distortion:

# In agent.py, locate this constant:
PIPER_RATE = 22050  # Default for "medium" quality models

# If your custom model uses different training parameters,
# match this value to your .onnx.json "sample_rate" field.
# Piper's low-quality voices use 16000; its medium and high voices use 22050.

And for pacing issues, adjust length_scale in the voice model's .onnx.json file:

{
  "inference": {
    "length_scale": 1.0
  }
}

Values below 1.0 produce faster speech; values above 1.0 produce slower, "zombie-like" speech. Adjust this if the voice sounds compressed or stretched despite a correct sample rate.

These snippets reveal deep audio engineering knowledge. The sample rate mismatch problem—where 48kHz models played at 22.05kHz produce "demonic" pitch shifting—is a classic digital signal processing pitfall that the documentation addresses with precision.
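
Rather than editing the constant by hand, the rate can be read from the voice's own config file. The sketch below assumes the usual Piper voice config layout, with the sample rate under an "audio" object and length_scale under "inference"; the function names are hypothetical and this is not code from the repository.

```python
import json
from pathlib import Path


def read_voice_settings(voice_config_path):
    """Return (sample_rate, length_scale) from a Piper voice .onnx.json file."""
    config = json.loads(Path(voice_config_path).read_text(encoding="utf-8"))
    sample_rate = config["audio"]["sample_rate"]
    # length_scale is optional here; 1.0 means unmodified pacing.
    length_scale = config.get("inference", {}).get("length_scale", 1.0)
    return sample_rate, length_scale


def playback_speed_factor(model_rate, playback_rate):
    """Speed and pitch multiplier heard when audio is played at the wrong rate."""
    return playback_rate / model_rate


# Audio synthesized at 48 kHz but played at 22.05 kHz runs at under half speed,
# which is why the voice sounds deep and slow.
factor = playback_speed_factor(48000, 22050)
```

A factor below 1.0 means slower, lower-pitched playback; above 1.0 means faster and higher-pitched. A factor of exactly 1.0 confirms the two rates match.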


Advanced Usage & Best Practices

Optimize Model Selection for Your Pi Variant: On a Raspberry Pi 4 with 4GB RAM, stick to gemma:2b and accept 3-5 second inference delays. On a Pi 5 with 8GB, experiment with gemma3:1b for faster responses or phi3:mini (a larger, 3.8B-parameter model) for richer ones. Monitor htop during conversations; if swap usage spikes, switch to a smaller model.
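
If you would rather log swap usage than watch htop, a small parser over /proc/meminfo is enough on Raspberry Pi OS. This is a sketch under that assumption (the swap_used_mib name is hypothetical, and the function takes the file's text as an argument so it can be exercised without a Pi):

```python
def swap_used_mib(meminfo_text):
    """Return swap currently in use, in MiB, from /proc/meminfo-formatted text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])  # values are reported in kB
    return (fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)) / 1024


# On the Pi itself:
#     with open("/proc/meminfo") as f:
#         print(swap_used_mib(f.read()))
```

A value that keeps climbing while the agent is answering suggests the chosen model does not fit in RAM.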

Pre-warm Models for Instant Response: Ollama loads a model into memory lazily, on the first request. On the Pi this is ordinary system RAM, since there is no dedicated VRAM. Send a throwaway "hello" prompt on boot to preload the LLM, eliminating the first-query delay that makes agents feel sluggish.
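
One way to pre-warm is through Ollama's local HTTP API: a request to /api/generate that names a model but carries no prompt asks Ollama to load that model, and keep_alive controls how long it stays resident. The helper names below are hypothetical, and the 30-minute keep_alive is an arbitrary choice for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_preload_request(model, keep_alive="30m"):
    """Build a generate request with no prompt, which asks Ollama to load the model."""
    payload = json.dumps({"model": model, "keep_alive": keep_alive}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload_model(model, timeout=120):
    """Send the preload request; True means Ollama answered with HTTP 200."""
    with urllib.request.urlopen(build_preload_request(model), timeout=timeout) as resp:
        return resp.status == 200
```

Calling preload_model("gemma:2b") from a boot script, once the Ollama service is up, should move the model-loading delay out of the first conversation.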

Customize Without Coding: The most powerful "pro tip" is leveraging the asset system. Create 10-frame PNG sequences for each state using tools like Aseprite or Photoshop. Name files sequentially (frame_01.png through frame_10.png). The loader automatically detects and loops them, so no animation code is required.

Network Isolation for Maximum Privacy: For a truly air-gapped deployment, disable the DuckDuckGo search in config.json (or block outbound DNS at your router). The agent functions fully without internet, just with a knowledge cutoff at the LLM's training date.


Comparison with Alternatives

| Feature | Be More Agent | Rhasspy | Mycroft (Deprecated) | Cloud Assistants (Alexa/Google) |
| --- | --- | --- | --- | --- |
| Offline Operation | ✅ 100% local | ✅ Offline | ⚠️ Partial cloud | ❌ Cloud-dependent |
| LLM Integration | ✅ Native Ollama | ❌ Intent-only | ❌ Limited | ✅ Proprietary |
| Custom Wake Words | ✅ Free, trainable | ✅ Free | ✅ Paid tier | ❌ Limited options |
| Vision Capability | ✅ Moondream | ❌ No | ❌ No | ✅ Yes |
| API Costs | $0 forever | $0 | $0 | Ongoing fees |
| Hardware Cost | ~$75 (Pi + accessories) | ~$75 | ~$100+ | $30-100 device |
| Privacy | Complete local control | Complete local control | Mixed | Corporate data collection |
| Active Development | ✅ 2024-2025 | ✅ Maintained | ❌ Project ended | ✅ Corporate backed |
| Customization Depth | Deep (faces, sounds, models) | Moderate | Moderate | Surface-level only |

Be More Agent uniquely combines modern LLM capabilities with genuine offline operation and deep aesthetic customization. Rhasspy remains excellent for traditional intent-based voice control but lacks conversational AI. Mycroft's demise leaves a gap that this project fills admirably.


FAQ

Does Be More Agent work on Raspberry Pi Zero 2 W? No. The 512MB RAM is insufficient for running Ollama, Whisper.cpp, and the GUI simultaneously. Minimum 4GB RAM (Pi 4) is required; Pi 5 is strongly recommended.

Can I use a different LLM than Gemma? Absolutely. Any Ollama-compatible model that fits in memory works. Try phi3:mini for stronger answers at the cost of speed, llama3.2:1b for a different reasoning style, or qwen2.5:0.5b for minimal resource usage. Set text_model in config.json accordingly.

How accurate is the wake word detection? OpenWakeWord performs well in quiet environments with clear pronunciation. For noisy settings, train a custom model with your specific voice and background audio. The .onnx format runs efficiently on Pi's CPU.

Is the BMO voice legal to use? The project includes appropriate disclaimers: it's a non-commercial fan project with no affiliation to Cartoon Network. The voice model itself is derived from Piper's MIT-licensed "Amy" base. For commercial deployments, use default Piper voices or train your own.

Can I run this without the GUI/screen? The agent.py script likely requires modification for headless operation, as face animations are integral to the state feedback loop. However, the core STT→LLM→TTS pipeline could be extracted for audio-only implementations.

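What's the power consumption? Expect 7-15W depending on Pi model and active inference load. A standard 5V 3A USB-C supply suffices for a Pi 4; the Pi 5 is designed around the official 5V 5A (27W) supply, which matters most when USB peripherals draw power. Consider active cooling for sustained operation, since LLM inference thermally throttles without it.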

How do I update the agent? Run git pull in the repository directory, then re-run ./setup.sh to catch new dependencies. Back up your config.json, chat_memory.json, and custom faces/, sounds/, and voices/ folders first.


Conclusion

Be More Agent represents something rare in today's AI landscape: a project that gives you superpowers without selling your soul. In an era where every "smart" device demands cloud connectivity, API subscriptions, and opaque data policies, this open-source framework proves that capable conversational AI can live entirely on your desk, under your control, with your custom personality layered on top.

The technical execution is genuinely impressive. Auto-resampling audio to prevent ALSA errors. ONNX-based wake word detection requiring no API keys. A state-driven animation system that needs zero code to customize. These aren't amateur conveniences—they're thoughtful solutions to real embedded-systems challenges.

Is it perfect? No. You'll wait seconds for LLM responses on Pi 4. You'll troubleshoot sample rate mismatches when experimenting with custom voices. You'll wish for faster vision inference. But these limitations are yours to optimize, not vendor-imposed restrictions designed to upsell you to a "Pro" tier.

If you've ever wanted a Jarvis-like assistant that actually respects your privacy, or if you're simply exhausted by API rate limits and monthly bills, clone Be More Agent today. Start with the default BMO personality. Then make it yours. Replace the face with your original character. Train a voice that sounds like your childhood imaginary friend. Build something that exists nowhere else in the world—because it only exists in your room, on your Pi, running your code.

The future of personal AI isn't in the cloud. It's sitting on your workbench, waiting for you to power it on.
