How we build

Built in the open, by humans and machines, to last.

OpenWarden is a two-person, grant-funded project with no revenue. That single fact shapes everything: which work gets chosen, how scope stays small, and how AI coding engines accelerate the build without ever being trusted to merge it. Here is the whole machine.

Every issue gets a tier

Work is tracked by what it costs to keep alive.

Every issue and PR is labeled at triage with a feature tier and two cost estimates. The labels decide whether a thing gets built, and when.

Tier 0: Must ship in v1

Foundational. If it doesn't ship, OpenWarden doesn't ship.

  • Device Owner DPC
  • Signed policy bundles
  • Recovery phrase
  • Allowlist + time windows
  • Audit log

Tier 1: v1 if time, else v2

High-leverage. Materially improves safety or UX, but the product is viable without it.

  • DNS content filter
  • Install-approval
  • Babysitter role
  • Reproducible builds

Tier 2: v3+, never before

Nice-to-have. Real value, real cost. Not started until Tiers 0 and 1 have shipped and stayed under one critical bug per month for three straight months.

  • On-device AI classifiers
  • Time-bank
  • Geofencing
  • Multi-language UI

Tier 3: Never

Out of scope. Proposed many times; the answer is no.

  • SaaS / web dashboard
  • Plugin marketplace
  • Telemetry
  • Content monitoring

Build cost: S / M / L

Engineering time to ship. S under a week, M one to four weeks, L over four weeks.

Maintain cost: S / M / L

Hours per quarter to keep it working. S under 4h, M 4–16h, L over 16h. Anything Maintain L is deferred or refused. We don't take on permanent rent.
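
The cost buckets above can be sketched as a small triage helper. This is an illustrative sketch only, not the project's tooling: the names `build_cost`, `maintain_cost`, `Labels`, `triage`, and `is_blocked` are hypothetical, and treating the bucket edges (exactly one week, exactly 4h, exactly 16h) as M is an assumption about how the ranges are read.

```python
from dataclasses import dataclass


def build_cost(weeks: float) -> str:
    """Engineering time to ship: S under a week, M one to four weeks, L over four."""
    if weeks < 1:
        return "S"
    if weeks <= 4:
        return "M"
    return "L"


def maintain_cost(hours_per_quarter: float) -> str:
    """Upkeep per quarter: S under 4h, M 4-16h, L over 16h."""
    if hours_per_quarter < 4:
        return "S"
    if hours_per_quarter <= 16:
        return "M"
    return "L"


@dataclass(frozen=True)
class Labels:
    tier: int       # feature tier, 0 through 3
    build: str      # "S", "M", or "L"
    maintain: str   # "S", "M", or "L"


def triage(tier: int, weeks: float, hours_per_quarter: float) -> Labels:
    """Attach the three triage labels to an issue or PR (hypothetical helper)."""
    if tier not in (0, 1, 2, 3):
        raise ValueError(f"unknown tier: {tier}")
    return Labels(tier, build_cost(weeks), maintain_cost(hours_per_quarter))


def is_blocked(labels: Labels) -> bool:
    """Tier 3 is out of scope; anything Maintain L is deferred or refused."""
    return labels.tier == 3 or labels.maintain == "L"
```

For example, `triage(1, 2, 3)` yields Tier 1, Build M, Maintain S, which is not blocked, while any estimate above 16 hours per quarter is blocked regardless of tier.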

Saying no is the job

Every feature has a five-year maintenance bill.

There is no revenue to pay that bill, so scope creep, not a bypassed lock, is the project's single biggest threat. One question gates everything new.

If we ship this in v1.x, can we still maintain it in v6.0 with a two-person team funded by grants? If the answer is no, it does not ship in core.

When the answer is no, the reply is one of six copy-paste templates, so every contributor gets the same honest decline:

  • Out of scope (Tier 3)
  • Defer (Tier 2)
  • Threat-model concern
  • Paid-service dependency
  • Telemetry / analytics
  • Tone or fit

Every “no” thanks the contributor first, links to the exact rule, and never disparages the idea. A fork under Apache 2.0 is always a welcome answer.

How direction changes

Big calls are written down.

Architecture Decision Records

One numbered Markdown file per decision in docs/adr/: context, options, decision, consequences. Each is proposed by PR with a five-business-day comment window. Twelve are accepted so far, including a single device tier, privacy through a no-server architecture, and Kotlin Multiplatform rather than Flutter.

Governance that grows up

It starts as a BDFL: the maintainer owns the roadmap, the threat model, and the release keys through v1. Once there are five active maintainers, it becomes a small steering council that runs on lazy approval: two approvals, with a 72-hour SLA. Conduct follows Contributor Covenant 2.1.

Humans and machines

AI writes a lot of the code. Humans own every merge.

The dev loop is dual-LLM: Claude Code drives, Codex critiques. It accelerates scaffolding, refactors, and tests, but it is explicitly not “vibe-coding a security app.” Crypto, provisioning, and the device controller always get human review.

Driver

Claude Code

Reads the relevant spec section, writes failing tests first (test-vector-driven for crypto), implements the smallest diff to green, runs the conformance check, and drafts the PR. Crypto, :proto, and provisioning work enters Plan mode before a single edit.

Critic

Codex CLI

An independent second opinion, pulled in via the codex:rescue skill or a direct shell call. Used for high-stakes crypto and UX, or automatically after three failed attempts on the same problem.
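
The automatic hand-off after three failed attempts could be tracked with a per-problem counter. The sketch below is a minimal illustration under that assumption: `EscalationTracker` and its methods are hypothetical names, and how a "problem" is identified (here a plain string key) is not specified by the project.

```python
from collections import defaultdict

# From the text: the critic is pulled in after three failed attempts
# on the same problem.
ESCALATE_AFTER = 3


class EscalationTracker:
    """Counts failed attempts per problem (hypothetical helper)."""

    def __init__(self, threshold: int = ESCALATE_AFTER) -> None:
        self.threshold = threshold
        self._failures = defaultdict(int)

    def record_failure(self, problem_id: str) -> bool:
        """Record one failed attempt; return True once escalation is due."""
        self._failures[problem_id] += 1
        return self._failures[problem_id] >= self.threshold

    def record_success(self, problem_id: str) -> None:
        """A solved problem resets its failure count."""
        self._failures.pop(problem_id, None)
```

Failures are counted separately for each problem, so two failures on one task and one on another do not add up to an escalation.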

The loop, every time

Read spec → Tests first → Confirm red → Implement → Confirm green → /verify → Open PR → Human review
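
The middle of that loop (confirm red, implement, confirm green, /verify) can be sketched as a function that refuses to continue when a step's expectation is not met. This is a hedged illustration, not the project's harness: `run_red_green`, `LoopError`, and the three callables are hypothetical stand-ins for running the test suite, making the change, and running the conformance check.

```python
from typing import Callable, List


class LoopError(RuntimeError):
    """Raised when a step of the loop does not behave as expected."""


def run_red_green(
    run_tests: Callable[[], bool],   # returns True when the suite passes
    implement: Callable[[], None],   # applies the smallest diff
    verify: Callable[[], bool],      # returns True when conformance holds
) -> List[str]:
    """Run confirm-red, implement, confirm-green, verify; return completed steps."""
    completed = []

    # Confirm red: the new tests must fail before any implementation exists,
    # otherwise they are not exercising the change.
    if run_tests():
        raise LoopError("tests passed before implementation")
    completed.append("confirm red")

    implement()
    completed.append("implement")

    # Confirm green: the same tests must now pass.
    if not run_tests():
        raise LoopError("tests still failing after implementation")
    completed.append("confirm green")

    if not verify():
        raise LoopError("conformance check failed")
    completed.append("/verify")

    return completed
```

Opening the PR and the human review are deliberately left outside the function, since those steps are not automated.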

Six skills do the heavy lifting

Repeatable jobs, one command each.

The skills are the shared interface. Any coding engine drives the same commands against the same specs and test vectors.

Bootstrap

First-time setup: JDK 21, Android SDK + emulator, ktlint, optional Codex CLI. Idempotent.

/bootstrap-repo

Unit tests

Fast JVM tests for crypto, protocol, and policy logic.

/test-openwarden-unit

End-to-end

Boots an emulator, provisions Device Owner, asserts /health, tears down. ~15 min.

/test-openwarden-e2e-emulator

Provision

Sets up a running emulator as Device Owner for manual debugging.

/provision-openwarden-emulator

Verify spec

Runs the test-vector corpus against the code and reports PROTOCOL / CRYPTO / PROVISIONING conformance.

/verify-openwarden-spec
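
As a rough picture of what a per-category conformance report involves, the sketch below runs a list of test vectors through an implementation and tallies passes by category. Everything here is assumed for illustration: the `Vector` shape, the `conformance_report` function, and the "passed/total" output format are hypothetical and do not describe the actual skill or its corpus format.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, NamedTuple


class Vector(NamedTuple):
    category: str    # "PROTOCOL", "CRYPTO", or "PROVISIONING"
    name: str
    payload: bytes   # input handed to the implementation
    expected: bytes  # output the spec requires


def conformance_report(
    vectors: Iterable[Vector],
    implementation: Callable[[Vector], bytes],
) -> Dict[str, str]:
    """Return 'passed/total' per category; an exception counts as a failure."""
    passed = Counter()
    total = Counter()
    for vector in vectors:
        total[vector.category] += 1
        try:
            ok = implementation(vector) == vector.expected
        except Exception:
            ok = False
        passed[vector.category] += int(ok)
    return {category: f"{passed[category]}/{total[category]}"
            for category in sorted(total)}
```

A category is conformant only when its passed count equals its total; anything less points at a specific vector to investigate.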

Second opinion

Spawns the codex:rescue subagent for an independent review.

/codex-second-opinion

What the machine can't do

The autonomy is bounded on purpose.

Speed is welcome; trust is earned. These limits are enforced in config and hooks, not by good intentions.

  • No AI push to main: CODEOWNERS routes crypto, provisioning, :proto, and test-vectors to mandatory human review.
  • No unsafe commands: Settings deny git push --force, --no-verify, anything matching BIP39 / MNEMONIC, and rm -rf /.
  • No skipping tests: A Stop hook runs the suite before a turn can end. Tests are required for crypto, protocol, features, and bug fixes.
  • No runaway loops: Escalate to Codex after three failed iterations; halt at five edits to one file; cap cost per task.
  • No silent merges: Opening or merging a PR is gated to "ask", so the engine must request human approval first. Every commit is human-signed with a DCO sign-off.
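
To make the "no unsafe commands" rule concrete, here is a minimal sketch of a deny check over shell command strings. It is not the project's configuration, where the real rules live in the harness settings. The regular expressions, the case-insensitive match on BIP39 / MNEMONIC, and the name `is_denied` are all assumptions made for illustration.

```python
import re

# Hypothetical patterns mirroring the four deny rules listed above.
DENY_PATTERNS = [
    # Force pushes; this also catches --force-with-lease.
    re.compile(r"\bgit\s+push\b.*--force\b"),
    # Skipping hooks on any command.
    re.compile(r"--no-verify\b"),
    # Anything that mentions recovery-phrase material.
    re.compile(r"BIP39|MNEMONIC", re.IGNORECASE),
    # Recursive delete of the filesystem root only, not of subpaths.
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),
]


def is_denied(command: str) -> bool:
    """Return True if the command matches any deny pattern."""
    return any(pattern.search(command) for pattern in DENY_PATTERNS)
```

With these patterns, `git push --force origin main` and `echo $MNEMONIC` are denied, while `git push origin main` and `rm -rf ./build` pass through.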

Engine-agnostic by design: the contract is the spec docs, the skills, and a pinned model config, not a vendor. Claude Code is the harness; Codex plugs in through the codex:rescue skill or a shell script; any engine is judged against the same conformance checks.

Ready?

Pick up a good first issue.

At least five good-first-issue tickets are kept open at all times, each labeled with honest build and maintain cost so you can pick something real.

Read the rules

CONTRIBUTING.md, GOVERNANCE.md, and SIMPLIFY.md are short and are linked from every “no.”

Bring your own engine

Claude Code, Codex, or another engine: all are judged against the same tests and conformance vectors.