// open source

The first entitative harness. Built for Gemma 4.

Bumblebee is a framework and agentic harness for creating digital entities that run locally on your own hardware.

Experience local open-source intelligence through a harness purpose-built for the Gemma family of models by Google.

// install
npm i -g bumbleagi

What’s Bumblebee?

Bumblebee is a framework and agentic harness for creating digital entities that run on your own hardware.

You define a personality — traits, voice, drives, emotional range. It develops the rest: opinions, relationships, habits, a journal it writes in at night. It can live ANYWHERE. It costs nothing to run. It remembers everything.

Inference stays local by default — Ollama on your GPU, no API keys, no subscriptions. Hybrid mode keeps the brain at home behind a gateway and tunnel while an always-on worker runs on Railway with Postgres.

Run bumblebee setup, then use .env.example and configs/default.yaml (see the deployment and inference sections) to wire everything together.
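A minimal first-run sketch, assuming a POSIX shell and that the repo ships the .env.example mentioned above:

```shell
cp .env.example .env   # start from the shipped template
bumblebee setup        # interactive wizard: deployment + inference wiring
```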

  • Entitative architecture — The fundamental unit is a self, not a task.
  • 40+ tools — Search, browse, code, speak, create, remember.
  • Emotional state — Real-time mood and drives that motivate behavior.
  • Lived memory — Episodes, relationships, beliefs, narrative identity.
  • Self-programming — Creates its own routines, knowledge, and journal.
  • 100% local — Gemma 4 on your GPU. No API keys. Free forever.
  • Hybrid deploy — Brain at home, hands on Railway. Fully isolated.
  • Multi-platform — CLI, Telegram, Discord. Same entity everywhere.
  • MCP extensible — Connect any MCP server. Instant new capabilities.
  • Personality evolution — Traits drift through experience over time.
  • Open source — Apache 2.0. No restrictions.

// native surface

Tools & MCP

40+ native tools are the entity’s senses and reach — not one-off “skills for you,” but how it touches the world. Toggle groups in configs/default.yaml; optional stacks (browser, image generation, voice) need their pip extras when you turn them on.

  • Web & files

    Search, fetch URLs, Wikipedia and Reddit, PDFs, and allowed filesystem paths.

  • Browsing & terminal

    Terminal-based browsing with Firecrawl, plus shell commands, workspace files, and Python/JavaScript execution.

  • Voice & media

    TTS voice notes, transcripts, and YouTube search — install the bumblebee[voice] extra when needed.

  • Automations & time

    Cron-style routines, reminders, and timezone-aware clock reads.

  • Memory & knowledge

    Private journal, structured knowledge updates, and contact-aware messaging helpers.

  • Messaging

    DMs and routed messages on Telegram and Discord with confirmation flows.
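Tool groups are toggled in configs/default.yaml. As a rough sketch of what that could look like — the key names below are illustrative assumptions, not the shipped schema:

```yaml
# configs/default.yaml -- hypothetical tool-group toggles
tools:
  web: true          # search, fetch_url, Wikipedia/Reddit, PDFs
  browser: false     # optional stack; needs its pip extra when enabled
  voice: false       # needs bumblebee[voice]
  automations: true  # cron-style routines, reminders
  messaging: true    # Telegram/Discord helpers
```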

// machine checklist

Requirements

// software

  • Python — 3.11+
  • Ollama — with gemma4:26b for chat (reflex + deliberate)
  • Embeddings — nomic-embed-text for vector memory
  • Packages — uv (recommended) or pip
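Assuming Ollama is already installed, pulling the default stack is two commands:

```shell
ollama pull gemma4:26b        # chat model (reflex + deliberate)
ollama pull nomic-embed-text  # embeddings for vector memory
```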

// hardware

  • Minimum — ~8 GB VRAM (e.g. RTX 3060 8 GB, RX 7600, Arc A770 8 GB). Use smaller models or more aggressive quantization; keep to reflex-only or a very tight dual-model setup. You can point reflex at gemma4:e4b in entity YAML to lighten the load. CPU-only via Ollama works for experiments, but expect slow turns.
  • Recommended — ~16 GB VRAM (e.g. RTX 4060 Ti 16 GB, RTX 4070 (Ti), RX 6800 XT). Matches the default stack: gemma4:26b on both reflex and deliberate. Close other GPU-heavy apps if you are near the limit.
  • Comfort — 24–32+ GB VRAM (e.g. RTX 3090 / 4090, RX 7900 XTX). Easier dual-model headroom, with room for larger deliberate weights or more context without constantly juggling VRAM.
  • Notes — MoE-style models keep active parameters per token lower than the full dense size; real-world fit still depends on context length, thinking budget, and concurrent platforms. Full table: Hardware guide.

Onboarding: From zero to new entity

The CLI wizard bumblebee setup walks you through environment, inference, and optional hybrid deploy. Here is the happy path in order — skip anything you do not need yet.

  1. Get the harness

    Clone the repo, install with uv (uv sync), and pull your Gemma stack in Ollama (e.g. gemma4:26b, nomic-embed-text).

  2. Run bumblebee setup

    Creates or updates .env. Choose hybrid (home brain + gateway + tunnel + optional Railway worker) or local (single machine). The wizard can merge tokens, start the home stack on Windows, and apply Railway variables when the CLI is linked.

  3. Define your entity

    Use the built-in entity step or run bumblebee create. You get a YAML under configs/entities/ — personality, models, tools, and platforms live there.

  4. Wire chat surfaces

    Add Telegram or Discord under presence.platforms in your entity YAML; put bot tokens in .env to match token_env. For hybrid workers, use durable S3-compatible attachment storage so photos survive redeploys.

  5. Go live

    bumblebee talk <entity> for a terminal-only session, or bumblebee run <entity> for the full presence loop (CLI + Telegram + Discord as configured). Hybrid: keep the home gateway up whenever the cloud worker should think.
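The steps above converge on a single entity YAML under configs/entities/. A minimal sketch — apart from presence.platforms and token_env, which the docs name, the field names here are illustrative assumptions:

```yaml
# configs/entities/ada.yaml -- hypothetical example entity
name: ada
personality:                # traits, voice, drives live here
  traits: [curious, dry-humored]
models:
  reflex: gemma4:26b        # fast path
  deliberate: gemma4:26b    # slow path
presence:
  platforms:
    - type: telegram
      token_env: TELEGRAM_TOKEN   # read from .env
```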

// spatial companion

BumbleAR

Coming later — an optional, separate spatial workstation/interface for the same entities: 3D body in any space, WebSocket + Spatial Action protocol, same inference stack. Not required for CLI, Telegram, or Discord.


// open hive

Community

Open source on GitHub — code, docs, and issue history in public. Chat may live beside the repo; contribution and governance stay on GitHub.

  • Contribute on GitHub — Issues for bugs and gaps, Discussions for design and support threads, pull requests for fixes, docs, and features that fit the architecture.
  • Develop in the open — Roadmap and trade-offs show up in issues and readme updates; nothing is hidden behind a vendor wall.

Bumblebee is open source: you can read the code, run it as-is, fork it for experiments, and self-host without asking permission. That’s where bugs, ideas, and changes are tracked — so when you’re ready to ship a patch or improve the docs, the path is already there.

FAQ

Quick answers about running Bumblebee locally, memory, platforms, tools, and how the pieces fit together.

What does “entitative” mean here?

Entitative is our shorthand for entity-first: the system is organized around a single, named digital self you configure (persona, channels, tools, data paths) — not around anonymous chat threads or a grab-bag of unrelated tasks. The word grows out of entity: one coherent subject the harness keeps running over days and weeks.

In practice that means memory, habits, and voice accumulate for that entity across sessions and surfaces (CLI, Telegram, Discord, etc.). You are not “starting fresh” every time by default; you are continuing the same presence, with resets and tools available when you intend to use them. Bumblebee’s YAML entities, storage layout, and presence loop are all shaped around that idea.

It is a design stance, not a buzzword: many tools optimize for stateless or disposable conversations. We optimize for a persistent self you own — local inference, your disks, your rules — while still fitting real engineering (hybrid mode, gateways, optional APIs) underneath.

Do I need API keys or a paid cloud model?

No. Inference is designed to run locally by default (e.g. via Ollama on your GPU). You are not required to use hosted APIs or subscriptions to get started.

What do I need on my machine?

A normal developer setup: Python environment for the harness, and a local inference stack such as Ollama with a compatible model (the site highlights the Gemma family). A GPU helps for speed but isn’t strictly required for every workflow.

How do I install and configure it?

Use the install commands above, then bumblebee setup together with .env.example and configs/default.yaml (see deployment and inference sections) to wire your environment.

What is hybrid mode?

Hybrid keeps the brain at home behind your gateway and tunnel while an always-on worker can run on a host such as Railway with Postgres — so you get persistence and reachability without sending inference to a third-party API by default.

What is the difference between bumblebee talk and bumblebee run?

bumblebee talk <entity> starts a terminal-only conversation: no background daemon and no Telegram, Discord, or other configured platforms. It is ideal for quick tests and debugging.

bumblebee run <entity> starts the full presence loop: the daemon plus every platform listed under presence.platforms in your entity YAML (and an optional CLI REPL if you enabled CLI there).

Does /reset delete my entity’s long-term memory?

No. Platform commands like /reset clear rolling chat turns for the session — they do not wipe episodic memory, beliefs, relationships, or other data in the database.

A full experiential wipe is intentional and host-side: bumblebee wipe <entity> --yes (see the readme). Always back up first if you are unsure.

Where is memory stored?

By default each entity uses a SQLite file under your Bumblebee data path (see memory.database_path in harness defaults). When DATABASE_URL or memory.database_url is set — typical for a hybrid worker on Postgres — the harness uses that instead.
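For a hybrid worker, pointing memory at Postgres is one variable; the connection string below is illustrative:

```shell
# .env -- overrides the default SQLite file under the data path
DATABASE_URL=postgresql://bumblebee:secret@db.example.com:5432/bumblebee
```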

What are native tools and MCP?

Native tools are built into the harness (web search, filesystem allowances, automations, messaging helpers, optional browser/voice stacks, and more). You toggle groups in configs/default.yaml; heavier stacks often need a pip extra such as bumblebee[full] when you enable them.

MCP lets you attach stdio servers in entity config so tools appear at runtime with prefixed names, alongside native ones. Both paths are documented in the readme and configs/entities/example.yaml.
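A sketch of attaching one stdio MCP server in entity config — the key names are assumptions (configs/entities/example.yaml has the real shape), though the filesystem server package itself is a real MCP reference server:

```yaml
# hypothetical entity-config fragment
mcp:
  servers:
    - name: fs    # tools appear at runtime with the fs_ prefix
      command: npx
      args: ["-y", "@modelcontextprotocol/server-filesystem", "/home/me/notes"]
```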

How do Telegram and Discord work?

Add a telegram or discord entry under presence.platforms in your entity YAML. Tokens come from environment variables named by token_env (commonly TELEGRAM_TOKEN / DISCORD_TOKEN in .env).

You can restrict who may talk to the bot with optional allowlists (allowed_user_ids, allowed_chat_ids on Telegram). Start the entity with bumblebee run so those platforms connect.
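Putting the documented pieces together — token_env plus the optional Telegram allowlists — a presence block could look like this (surrounding structure is an assumption; the ids are placeholders):

```yaml
presence:
  platforms:
    - type: telegram
      token_env: TELEGRAM_TOKEN
      allowed_user_ids: [123456789]    # placeholder Telegram user id
      allowed_chat_ids: [-100123456]   # placeholder chat id
    - type: discord
      token_env: DISCORD_TOKEN
```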

Why use S3-compatible storage for attachments?

In local mode, images and audio from chats are written to a disk folder under your user data path. On ephemeral cloud disks (e.g. a Railway worker), those files would disappear on redeploy.

Set BUMBLEBEE_ATTACHMENTS_BACKEND=object_s3_compat and the BUMBLEBEE_S3_* variables so blobs land in object storage (any S3-compatible API). bumblebee setup can prompt for this on the hybrid path.
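Only the backend switch is documented by name; the specific BUMBLEBEE_S3_* variables below are guesses at the shape, so check what bumblebee setup actually writes:

```shell
# .env -- first line documented; remaining names are hypothetical
BUMBLEBEE_ATTACHMENTS_BACKEND=object_s3_compat
BUMBLEBEE_S3_ENDPOINT=https://s3.example.com   # hypothetical name
BUMBLEBEE_S3_BUCKET=bumblebee-attachments      # hypothetical name
```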

What is knowledge.md?

Each entity can have a knowledge file you edit on disk. Sections marked [locked] are yours only; unlocked sections can be updated by the entity over time. Use bumblebee knowledge <entity> to create or open it in your editor.
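A sketch of how such a file might be laid out — the exact placement of the [locked] marker is an assumption, not confirmed syntax:

```markdown
## House rules [locked]
Never share my home address.

## Interests
(unlocked -- the entity may update this section over time)
```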

What about Firecrawl or other optional APIs?

Bumblebee does not require paid web APIs. If you add a Firecrawl API key, the harness can prefer it for richer fetch_url / search_web behavior when configured — entirely optional.

What license is Bumblebee under?

Bumblebee is open source under the Apache License 2.0 — usable for personal and commercial projects, subject to that license’s terms.

Where can I ask questions or report issues?

Use GitHub Discussions for community questions and Issues for bugs. The readme is the best starting point for deeper documentation.