Cloud Data Contract and Privacy Boundary.

Budi's cloud sync is optional. When you turn it on, only pre-aggregated numbers leave your machine — never prompts, code, or AI responses. This page is the full technical contract. For the short version, see the landing page summary.

Last updated: 2026-05-15

The short version

Your code is yours

Prompts, AI responses, source code, and file paths never leave your machine. There is no "full upload" mode — this is enforced structurally, not by a setting.

Only numbers travel

If you opt in to cloud sync, only fields derived from pre-aggregated daily rollups travel: token counts, cost, message and session counts, model and provider names, a hashed repo ID, branch name, ticket ID, and session title. No prompts, responses, or code.

Off by default

Cloud sync is disabled until you explicitly run budi cloud join. No telemetry, no phone-home, no automatic opt-in.

Section 1

What stays, what goes.

The sync worker can only read from rollup tables — pre-aggregated counts and costs. It has no access to message content, file paths, or raw payloads. This is a structural guarantee, not a config toggle.

Never leaves your machine

Your prompts

Everything you type to an AI agent stays on your machine.

AI responses

Generated code, reasoning, and tool call output stay local.

Source code & file paths

Your project structure and file system layout are never exposed.

Email addresses

Identity is handled by API key, not personal information.

Raw request/response payloads

Unstructured payloads could contain anything sensitive, so they are never synced.

MCP tool details

Server names, tool arguments, and execution results stay private.

Custom tag values

Tags are user-defined and could contain anything, so they stay local-only.

Can optionally leave (when cloud sync is on)

These are the only fields that cross the wire — all derived from pre-aggregated daily rollups. No content, no paths, no PII.

Token counts (input, output, cache) — numbers, no content
Cost in cents — derived from token counts and public pricing
Model and provider names — public identifiers like claude-sonnet-4-20250514
Hashed repo identifier — a SHA-256 hash, not the actual path
Git branch name — useful for ticket attribution, contains no code
Session title — workspace or project-level label (e.g. Verkada-Web)
Message and session counts — numeric totals per time bucket
Ticket ID — extracted from branch name (e.g. PROJ-1234)

Repo identifier note: The repo_id is a SHA-256 hash of the repo root path, computed locally. The actual file system path never appears in the sync payload.
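As a rough illustration of the note above, the sketch below derives an identifier of that shape with Python's hashlib. The function name repo_id_for, the UTF-8 encoding, the absence of a salt, and the untruncated digest are all assumptions; the contract only states that the value is a SHA-256 hash of the repo root path, computed locally.

```python
import hashlib


def repo_id_for(repo_root: str) -> str:
    """Return a 'sha256:<hex>' identifier for a repo root path.

    Assumptions (not confirmed by the contract): the path is hashed as
    UTF-8 text, no salt is mixed in, and the full 64-character digest is kept.
    """
    digest = hashlib.sha256(repo_root.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"


# Hypothetical path, used only for illustration.
example_id = repo_id_for("/home/dev/projects/example-repo")
```

The same path always maps to the same identifier, so rollups from one repo group together, while the path itself does not appear in the value.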

Section 2

What the sync looks like.

The daemon pushes daily rollup records to the cloud — once per sync tick (every 5 minutes by default). Daily granularity gives managers the views they need without revealing per-hour work patterns.

Sync envelope

Every push is wrapped in an envelope that identifies the device and workspace:

sync envelope

{
  "schema_version": 2,
  "device_id": "dev_abc123def456",
  "workspace_id": "ws_xyz789",
  "label": "ivan-mbp",
  "synced_at": "2026-04-10T18:30:00Z",
  "payload": {
    "daily_rollups": [ ... ],
    "session_summaries": [ ... ]
  }
}

Daily rollup record

One record per unique combination of day, role, provider, model, repo, and branch. This is what a single row looks like:

daily rollup

{
  "bucket_day": "2026-04-10",
  "role": "assistant",
  "provider": "anthropic",
  "model": "claude-sonnet-4-20250514",
  "repo_id": "sha256:a1b2c3d4e5f6",
  "git_branch": "feature/PROJ-1234-add-auth",
  "ticket": "PROJ-1234",
  "ticket_source": "branch",
  "message_count": 47,
  "input_tokens": 125000,
  "output_tokens": 89000,
  "cache_creation_tokens": 15000,
  "cache_read_tokens": 42000,
  "cost_cents": 342
}

Session summary record

A scrubbed per-session summary with computed totals only (no per-message detail, no paths, no content):

session summary

{
  "session_id": "d99dfe22-d05c-4c78-8698-015d06e5dabb",
  "provider": "claude_code",
  "title": "Verkada-Web",
  "started_at": "2026-04-10T09:15:00Z",
  "ended_at": "2026-04-10T10:45:00Z",
  "duration_ms": 5400000,
  "repo_id": "sha256:a1b2c3d4e5f6",
  "git_branch": "feature/PROJ-1234-add-auth",
  "ticket": "PROJ-1234",
  "message_count": 47,
  "total_input_tokens": 125000,
  "total_output_tokens": 89000,
  "total_cost_cents": 342,
  "primary_model": "claude-sonnet-4-20250514"
}
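Both record types carry a ticket field that, per the field list, is extracted from the branch name. The sketch below shows one way that extraction could work. The regular expression (an uppercase project key, a hyphen, then digits) and the function name ticket_from_branch are assumptions; the contract does not specify the matching rule.

```python
import re
from typing import Optional

# Hypothetical pattern: uppercase project key, hyphen, digits (e.g. PROJ-1234).
TICKET_RE = re.compile(r"[A-Z][A-Z0-9]+-\d+")


def ticket_from_branch(branch: str) -> Optional[str]:
    """Return the first ticket-like token in a branch name, or None."""
    match = TICKET_RE.search(branch)
    return match.group(0) if match else None


ticket = ticket_from_branch("feature/PROJ-1234-add-auth")  # "PROJ-1234"
```

A branch with no ticket-like token, such as main, yields None in this sketch.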

Section 3

Identity model.

Three entities form a simple hierarchy. A workspace is your team's billing and visibility boundary. Users belong to one workspace. Each user can have multiple devices (laptop, desktop, CI runner).

hierarchy

Workspace (1) ──< User (many) ──< Device (many)

Workspace

ws_<alphanumeric>

Your team boundary. A manager creates it in the cloud dashboard. All devices in a workspace contribute to the same dashboard.

User

usr_<alphanumeric>

A cloud account created via self-registration or manager invite. Can be a member (sync data, see own stats) or manager (see everyone's stats).

Device

dev_<alphanumeric>

One budi installation on one machine. Generated locally on first login. Persists across daemon restarts — regenerated only if the config file is deleted.

Section 4

Authentication & transport.

All communication is encrypted, one-directional, and initiated by your machine. The cloud never calls back to the daemon — there is no webhook, no pull endpoint, no remote command channel.

How the daemon talks to the cloud

Transport: HTTPS only. The daemon refuses to sync over plain HTTP (hard-coded).
Auth: API key in Authorization: Bearer budi_<key> header.
Key storage: ~/.config/budi/cloud.toml, file permissions 0600.
Rate limiting: Server-side. On 429s the daemon backs off exponentially (1s → 2s → 4s → ... → 5 min cap).
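The backoff schedule in the last item can be sketched as a small generator. This is an illustration only: the function name backoff_delays is hypothetical, and the sketch assumes plain doubling with no jitter, which the contract does not confirm.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(base_seconds: float = 1.0, cap_seconds: float = 300.0) -> Iterator[float]:
    """Yield retry delays that double each time and never exceed the cap."""
    delay = base_seconds
    while True:
        yield min(delay, cap_seconds)
        delay = min(delay * 2, cap_seconds)


first_delays = list(islice(backoff_delays(), 11))
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 300.0, 300.0]
```

With a 1 second base and a 300 second cap, the delay reaches the cap on the tenth retry and stays there.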

What the cloud can and cannot see

Sees: Aggregated cost/usage metrics, session summaries, model names, provider names, hashed repo IDs, branch names, ticket IDs.
Never sees: Prompts, responses, code, file paths, email addresses, raw payloads, tag values, tool arguments, MCP traffic.
Direction: Push-only from daemon. The cloud never initiates a connection to your machine.

Other outbound requests

Besides cloud sync, the daemon can make two additional HTTPS requests for pricing data. Both can be turned off by setting BUDI_PRICING_REFRESH=0:

LiteLLM pricing manifest

Anonymous GET to fetch the latest model prices. No user content, no identifiers.

Team pricing pull

Authenticated GET to fetch your workspace's negotiated price list. No user content in the request.

Section 5

Retries & deduplication.

Network failures, retries, and overlapping sync windows are handled gracefully. The server uses UPSERT semantics — re-sending the same data never creates duplicates.

How duplicates are prevented

Each record type has a unique key. If the same key arrives twice, the server overwrites the old row — so the daemon can safely retry on any failure.

Daily rollup key: (device_id, bucket_day, role, provider, model, repo_id, git_branch)
Session summary key: (device_id, session_id)
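To make the overwrite behavior concrete, here is a minimal in-memory sketch of UPSERT semantics for the daily rollup key. The RollupStore class is hypothetical and stands in for whatever storage the server actually uses; only the key fields come from the contract.

```python
from typing import Any, Dict, Tuple

# Key fields as listed in the contract for daily rollups.
ROLLUP_KEY_FIELDS = (
    "device_id", "bucket_day", "role", "provider", "model", "repo_id", "git_branch",
)


class RollupStore:
    """Hypothetical in-memory store that overwrites rows sharing a key."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[Any, ...], Dict[str, Any]] = {}

    def upsert(self, device_id: str, record: Dict[str, Any]) -> None:
        # device_id comes from the envelope, so merge it into the row first.
        row = {"device_id": device_id, **record}
        key = tuple(row[field] for field in ROLLUP_KEY_FIELDS)
        self.rows[key] = row  # same key: the old row is replaced, not duplicated
```

Sending the same record twice leaves one row, and sending an updated record for the same key replaces the earlier totals, which is why a retry is safe.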

Sync watermark

The daemon remembers the last day that was fully synced. On each tick it sends:

New days since the watermark
Today's rollups again (they grow throughout the day)
Session summaries for sessions that changed since the last sync

The server confirms the watermark in its response. If the push fails, the watermark doesn't advance — and the next tick retries.
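The watermark logic above can be sketched as two small functions. The names days_to_send, next_watermark, and first_local_day are hypothetical, and session summaries are left out for brevity; this is a sketch of the described behavior, not the daemon's code.

```python
from datetime import date, timedelta
from typing import List, Optional


def days_to_send(watermark: Optional[date], today: date, first_local_day: date) -> List[date]:
    """Days whose rollups go into the next push, oldest first."""
    # Start the day after the watermark, or at the first local day if nothing has synced.
    start = first_local_day if watermark is None else watermark + timedelta(days=1)
    # Today's rollups are always re-sent, so never start later than today.
    start = min(start, today)
    return [start + timedelta(days=i) for i in range((today - start).days + 1)]


def next_watermark(current: Optional[date], confirmed: Optional[date], push_ok: bool) -> Optional[date]:
    """Advance only when the push succeeded and the server confirmed a watermark."""
    if push_ok and confirmed is not None:
        return confirmed
    return current
```

For example, with a watermark of 2026-04-08 and today being 2026-04-10, the push covers 2026-04-09 and 2026-04-10. A failed push leaves the watermark where it was, so the same days are sent again on the next tick.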

Section 6

Cloud alpha specs.

The current cloud alpha is designed for small teams of 1 to 20 developers.

Aspect	How it works
Getting started	Manager signs up, creates a workspace, and shares an invite link.
Joining a team	Developer runs budi cloud join <invite-token> — one command.
Roles	Manager sees the whole team's dashboard. Member sees only their own data.
Data granularity	Daily cost breakdowns. No per-hour or real-time views yet.
Retention	90 days of synced data. Configurable in future versions.
Multi-workspace	Not yet — a user belongs to exactly one workspace in v1.
SSO / SAML	Not yet — API key auth only. Enterprise auth is planned for post-8.0.

Section 7

API surface (v1).

The daemon talks to exactly four cloud endpoints. The dashboard and user management APIs are separate and not covered here.

Endpoint	What it does
`POST /v1/ingest`	Receives the sync payload from the daemon
`GET /v1/ingest/status`	Returns sync health and current watermark for the device
`GET /v1/whoami`	Returns the identity of the authenticated device
`GET /v1/pricing/active`	Returns the workspace's negotiated price list

POST /v1/ingest response codes

200 OK

Payload accepted. Watermark updated.

401

Auth failed. Daemon stops syncing until re-authenticated.

422

Schema mismatch. Daemon logs a warning and waits for an update.

429 / 5xx

Rate limit or server error. Daemon retries with exponential backoff.
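The response codes above map naturally onto a small dispatch function. The SyncAction enum and action_for_status are hypothetical names used for illustration; statuses the contract does not mention raise an error here rather than guessing at a behavior.

```python
from enum import Enum


class SyncAction(Enum):
    ADVANCE_WATERMARK = "advance_watermark"
    STOP_UNTIL_REAUTH = "stop_until_reauth"
    WAIT_FOR_UPDATE = "wait_for_update"
    RETRY_WITH_BACKOFF = "retry_with_backoff"


def action_for_status(status: int) -> SyncAction:
    """Map a POST /v1/ingest status code to the documented daemon reaction."""
    if status == 200:
        return SyncAction.ADVANCE_WATERMARK
    if status == 401:
        return SyncAction.STOP_UNTIL_REAUTH
    if status == 422:
        return SyncAction.WAIT_FOR_UPDATE
    if status == 429 or 500 <= status <= 599:
        return SyncAction.RETRY_WITH_BACKOFF
    # Not covered by the contract, so this sketch refuses to guess.
    raise ValueError(f"status {status} is not covered by the documented contract")
```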

Section 8

Configuration reference.

Everything lives in one file: ~/.config/budi/cloud.toml. It's created by budi cloud join and disabled by default — no automatic opt-in, no telemetry, no phone-home.

Default config file

~/.config/budi/cloud.toml

[cloud]
enabled = false                       # nothing syncs until you flip this
api_key = "budi_..."                  # your bearer token
device_id = "dev_..."                 # stable machine identifier
workspace_id = "ws_..."               # your team's workspace
endpoint = "https://app.getbudi.dev"  # cloud ingest URL

[cloud.sync]
interval_seconds = 300                # push every 5 minutes
retry_max_seconds = 300               # max backoff cap

Environment variable overrides

These override the config file — useful for CI or self-hosted deployments:

BUDI_CLOUD_ENABLED

Override the enabled flag (true / false)

BUDI_CLOUD_API_KEY

Override API key (useful for CI)

BUDI_CLOUD_ENDPOINT

Override cloud URL (useful for self-hosted)

BUDI_PRICING_REFRESH

Set to 0 to disable pricing refresh requests
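The override order can be sketched as follows: start from the values in the [cloud] table, then let any environment variable that is set win. The function name resolve_cloud_settings is hypothetical, and the case-insensitive parsing of true and false is an assumption. The HTTPS check mirrors the transport rule from Section 4.

```python
import os
from typing import Any, Dict, Mapping, Optional
from urllib.parse import urlsplit


def resolve_cloud_settings(
    file_cfg: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Merge [cloud] file values with environment overrides (env wins)."""
    env = os.environ if env is None else env
    settings = dict(file_cfg)  # copy, so the caller's mapping is not mutated

    if "BUDI_CLOUD_ENABLED" in env:
        # Assumption: only the literal "true" (any case) enables sync.
        settings["enabled"] = env["BUDI_CLOUD_ENABLED"].strip().lower() == "true"
    if "BUDI_CLOUD_API_KEY" in env:
        settings["api_key"] = env["BUDI_CLOUD_API_KEY"]
    if "BUDI_CLOUD_ENDPOINT" in env:
        settings["endpoint"] = env["BUDI_CLOUD_ENDPOINT"]

    endpoint = settings.get("endpoint")
    if endpoint is not None and urlsplit(endpoint).scheme != "https":
        raise ValueError("cloud endpoint must use HTTPS")
    return settings
```

In CI, for instance, setting BUDI_CLOUD_ENABLED=true and BUDI_CLOUD_API_KEY would enable sync without editing cloud.toml, while an http:// endpoint is rejected in this sketch.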

Section 9

Trade-offs.

Privacy-first comes with real costs. Here's what the design intentionally gives up.

Daily granularity only

The cloud dashboard shows daily cost breakdowns. You can't drill into hourly or per-request data in the cloud — but you can locally via the CLI.

No per-message visibility

The cloud sees "47 messages cost $3.42 on this branch today" — never the content of those messages. Less debuggable, more private.

No tag sync

Custom tags stay local-only because their values are user-defined and could contain anything sensitive. Managers filter by model/repo/branch/ticket instead.

One workspace per user

If you consult for multiple clients, you can't aggregate across workspaces yet. Multi-workspace is planned for post-8.0.

API key auth only

No SSO, SAML, or OAuth yet. Fine for small-team alpha; enterprise auth is on the roadmap.

No central configuration

Managers can't push budget limits or settings to developer machines. All config is local. Budget enforcement happens on your machine, not in the cloud.

Section 10

Amendments.

Changes to the original data contract since it was published. In every case, the core privacy guarantee — no content, no code, no prompts leave your machine — is unchanged.

2026-04-21

Pricing refresh

The daemon now fetches an open-source pricing manifest (LiteLLM) to stay current with model costs. This is an anonymous request with no user data attached. Disable with BUDI_PRICING_REFRESH=0.

2026-05-11

Team pricing

Workspaces can now set custom model prices. The daemon fetches these via an authenticated request that carries no user content — just a small JSON of rates. Same opt-out switch.

2026-05-15 (#836)

Workspace rename

"Organization" was renamed to "Workspace" across the codebase. The daemon accepts both the old org_id and new workspace_id keys during a transition period so existing configs keep working without edits.
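A minimal sketch of how a config reader could honor both keys during the transition is shown below. The function name workspace_id_from is hypothetical, and preferring workspace_id when both keys are present is an assumption; the amendment only says that both are accepted.

```python
from typing import Any, Mapping, Optional


def workspace_id_from(cloud_cfg: Mapping[str, Any]) -> Optional[str]:
    """Read the workspace ID, falling back to the legacy org_id key."""
    if "workspace_id" in cloud_cfg:
        return cloud_cfg["workspace_id"]
    # Legacy configs written before the rename only carry org_id.
    return cloud_cfg.get("org_id")
```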

Questions about the data contract? Open an issue or check the wiki source.
