
SOVEREIGN BRIEFING — THE JANITOR ENGINE

Canonical state document. Generated by direct source audit. All values derive from code, not prior documentation. Supersedes all predecessor files.


I. CORE PRIMITIVES

Runtime Language Infrastructure

Grammar | polyglot Static | Extension(s)
Rust | RUST | rs
Python | PYTHON | py
JavaScript | JAVASCRIPT | js, jsx
TypeScript | TYPESCRIPT | ts, tsx
C++ | CPP | cpp, cxx, cc, hpp
C | C | c, h
Java | JAVA | java
C# | CSHARP | cs
Go | GO | go
GLSL | GLSL | glsl, vert, frag
Objective-C | OBJC | m, mm
YAML | YAML | yaml, yml
Bash | BASH | sh, bash
Scala | SCALA | scala
Ruby | RUBY | rb
PHP | PHP | php
Swift | SWIFT | swift
Lua | LUA | lua
HCL/Terraform | HCL | tf, hcl
Nix | NIX | nix
GDScript | GDSCRIPT | gd
Kotlin | KOTLIN | kt, kts

23 grammars total. Each grammar is a OnceLock<Language> static — loaded once on first use, zero per-call allocation thereafter.

Grammar library: tree-sitter 0.26 (workspace pinned).
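Stage 3 resolves each file's grammar with lang_for_ext(), and every grammar sits behind a OnceLock. A minimal std-only sketch of that lookup pattern follows. It is hypothetical: the Lang enum stands in for tree_sitter::Language, only part of the extension table is included, and ext_from_header is an invented helper for the "+++ b/<path>" diff header.

```rust
use std::collections::HashMap;
use std::path::Path;
use std::sync::OnceLock;

/// Hypothetical stand-in for a tree-sitter grammar handle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lang {
    Rust,
    Python,
    JavaScript,
    TypeScript,
    Cpp,
    C,
    Go,
    Yaml,
    Hcl,
    Nix,
}

/// Subset of the extension table above.
const EXTS: &[(&str, Lang)] = &[
    ("rs", Lang::Rust),
    ("py", Lang::Python),
    ("js", Lang::JavaScript),
    ("jsx", Lang::JavaScript),
    ("ts", Lang::TypeScript),
    ("tsx", Lang::TypeScript),
    ("cpp", Lang::Cpp),
    ("cxx", Lang::Cpp),
    ("cc", Lang::Cpp),
    ("hpp", Lang::Cpp),
    ("c", Lang::C),
    ("h", Lang::C),
    ("go", Lang::Go),
    ("yaml", Lang::Yaml),
    ("yml", Lang::Yaml),
    ("tf", Lang::Hcl),
    ("hcl", Lang::Hcl),
    ("nix", Lang::Nix),
];

/// Built on the first call, then shared read-only by every later call.
fn ext_table() -> &'static HashMap<&'static str, Lang> {
    static TABLE: OnceLock<HashMap<&'static str, Lang>> = OnceLock::new();
    TABLE.get_or_init(|| EXTS.iter().copied().collect())
}

/// Map a bare extension ("rs", "tsx", ...) to its grammar, if supported.
pub fn lang_for_ext(ext: &str) -> Option<Lang> {
    ext_table().get(ext).copied()
}

/// Hypothetical helper: extract the extension from a "+++ b/<path>" header.
pub fn ext_from_header(line: &str) -> Option<&str> {
    let path = line.strip_prefix("+++ b/")?;
    Path::new(path).extension()?.to_str()
}
```

Because the map is built inside get_or_init, the first caller pays the construction cost and every later call is a plain hash lookup.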

Foundational Crates & Mathematical Models

Primitive | Crate / Constant | Purpose
Fuzzy clone detection | AstSimHasher — SimHash over (kind_id u16, depth u32) feature pairs | Detects structural refactors without textual identity
Swarm clustering | LshIndex — MinHash, 8 bands × 8 rows, 64-hash sketch | O(1) amortised PR → clone-cluster lookup
Patch entropy gate | check_entropy (zstd::encode_all level 3, ratio = compressed/raw) | NCD verbosity detection
Byte lattice shield | ByteLatticeAnalyzer — windowed Shannon entropy, 512-byte window, 256-byte stride | Binary/generated blob detection
Compiled payload scanner | binary_hunter — AhoCorasick over 7 static byte patterns | ELF/PE/WASM/miner injection detection
Merge simulation | simulate_merge via libgit2 (diff.foreach + RefCell patch accumulation) | Zero disk checkout; O(PR-diff) memory
Symbol persistence | rkyv 0.8 — zero-copy archive format | IPC registry at .janitor/symbols.rkyv
Hot-swap registry | arc_swap::ArcSwap<SymbolRegistry> | Lock-free atomic reload without daemon downtime
Memory backpressure | Physarum (sysinfo 0.30, threshold model) | System-aware concurrency gate; prevents OOM
Dependency graph | petgraph 0.7 — transitive compile-time reach analysis | C++ silo detection in Structural Topology tab
Pattern matching | aho_corasick 1.1 — single automaton per pattern group (OnceLock) | Hot-path multi-pattern search without recompilation
Hashing | blake3 1.5 — exact structural hash per function body | Zombie/clone identity
Security — PQC | ML-DSA-65 (FIPS 204), PRODUCTION. the-governor/src/compliance/pqc.rs implements keygen/sign/verify via the fips204 crate. The Governor generates a persistent governor.key on first run and signs every CycloneDX CBOM bond with it. Setting pqc_enforced = true in janitor.toml blocks the PR merge if bond issuance fails. | Post-quantum attestation for CycloneDX v1.5 CBOM bonds
Security — Ed25519 | vault::SigningOracle::verify_token — public-key-only; no private key in binary | API token gate for janitor_clean (MCP) and /v1/attest
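The byte lattice shield's windowed Shannon entropy can be sketched in a few lines. The 512-byte window and 256-byte stride come from the table above; the rule that a blob is anomalous when any window exceeds a caller-supplied threshold is an assumption, so is_anomalous_blob and threshold_bits are hypothetical names rather than the ByteLatticeAnalyzer API.

```rust
const WINDOW: usize = 512;
const STRIDE: usize = 256;

/// Shannon entropy of a byte slice, in bits per byte (0.0 to 8.0).
fn shannon_entropy(bytes: &[u8]) -> f64 {
    if bytes.is_empty() {
        return 0.0;
    }
    let mut counts = [0usize; 256];
    for &b in bytes {
        counts[b as usize] += 1;
    }
    let n = bytes.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Entropy of each 512-byte window, advancing 256 bytes at a time.
/// Inputs no longer than one window are scored as a single window;
/// a trailing partial window is not scored in this sketch.
fn window_entropies(data: &[u8]) -> Vec<f64> {
    if data.len() <= WINDOW {
        return vec![shannon_entropy(data)];
    }
    let mut out = Vec::new();
    let mut start = 0;
    while start + WINDOW <= data.len() {
        out.push(shannon_entropy(&data[start..start + WINDOW]));
        start += STRIDE;
    }
    out
}

/// Hypothetical decision rule: anomalous if any window exceeds threshold_bits.
fn is_anomalous_blob(data: &[u8], threshold_bits: f64) -> bool {
    window_entropies(data).iter().any(|&h| h > threshold_bits)
}
```

Uniformly random or compressed data sits near 8 bits per byte, while source text sits well below that, which is why a per-window maximum is a useful blob signal.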

II. POLICY LAYER

janitor.toml at the repository root encodes maintainer-controlled slop tolerance as version-controlled, reviewable configuration. Loaded by JanitorPolicy::load() in crates/common/src/policy.rs. Full field reference: docs/governance.md / https://thejanitor.app/governance.

Field | Default | Purpose
min_slop_score | 100 | Gate threshold — raw score ceiling
require_issue_link | false | Block PRs with no #N reference
allowed_zombies | false | Downgrade zombie veto to warning
pqc_enforced | false | Block PR when ML-DSA-65 bond fails (needs Sentinel)
refactor_bonus | 0 | Raise gate for [REFACTOR]/[FIXES-DEBT] PRs
custom_antipatterns | [] | Project-specific .scm query files
trusted_bot_authors | [] | Handles exempt from unlinked-PR penalty
[forge].automation_accounts | [] | Ecosystem accounts without [bot] suffix
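A hypothetical Rust mirror of the policy defaults, with one possible reading of how min_slop_score and refactor_bonus combine. The struct name, field types, the strict score > threshold comparison, and checking the PR title for the [REFACTOR]/[FIXES-DEBT] tag are all assumptions; JanitorPolicy::load() in crates/common/src/policy.rs is the authoritative source.

```rust
/// Hypothetical mirror of the janitor.toml fields and defaults listed above.
#[derive(Debug, Clone, PartialEq)]
pub struct PolicySketch {
    pub min_slop_score: u32,
    pub require_issue_link: bool,
    pub allowed_zombies: bool,
    pub pqc_enforced: bool,
    pub refactor_bonus: u32,
    pub custom_antipatterns: Vec<String>,
    pub trusted_bot_authors: Vec<String>,
    pub automation_accounts: Vec<String>, // [forge].automation_accounts
}

impl Default for PolicySketch {
    fn default() -> Self {
        Self {
            min_slop_score: 100,
            require_issue_link: false,
            allowed_zombies: false,
            pqc_enforced: false,
            refactor_bonus: 0,
            custom_antipatterns: Vec::new(),
            trusted_bot_authors: Vec::new(),
            automation_accounts: Vec::new(),
        }
    }
}

impl PolicySketch {
    /// Assumption: the bonus applies when the PR title carries a refactor tag.
    pub fn effective_threshold(&self, pr_title: &str) -> u32 {
        let tagged = pr_title.contains("[REFACTOR]") || pr_title.contains("[FIXES-DEBT]");
        if tagged {
            self.min_slop_score.saturating_add(self.refactor_bonus)
        } else {
            self.min_slop_score
        }
    }

    /// Assumption: the gate fails only when the score is strictly above the ceiling.
    pub fn blocks(&self, score: u32, pr_title: &str) -> bool {
        score > self.effective_threshold(pr_title)
    }
}
```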

III. EXECUTION PIPELINE — HYPER-GAUNTLET

Entry point: tools/gauntlet-runner binary, invoked via just hyper-gauntlet or just run-gauntlet.

Stage 0 — Pre-emptive Workspace Purge

For each target repo (non-resume mode):
  rm -rf <gauntlet_dir>/<owner>/<repo>
  # Clean slate: no stale bounce_log.ndjson bleeds into aggregate

Resume mode (--resume):
  Save <repo>/.janitor/bounce_log.ndjson → memory buffer
  Perform clone/fetch below
  Restore buffer → <repo>/.janitor/bounce_log.ndjson

Stage 1 — Repository Hydration

Hyper-drive mode (--hyper, default for just hyper-gauntlet):

git clone --no-checkout https://github.com/<owner>/<repo>.git <repo_dir>
git -C <repo_dir> fetch origin 'refs/pull/*/head:refs/remotes/origin/pr/*'

Zero gh pr diff subshells. All PR refs land in the packfile.

Parallel-bounce mode (default for just run-gauntlet):

gh pr list --json number,author,mergeable,state --limit <PR_LIMIT>
Filter: exclude CONFLICTING, exclude Bot authors
For each open PR (rayon pool sized by `detect_optimal_concurrency()`, GIT_LOCK mutex):
  gh pr diff <N> --repo <slug>   → diff text
  janitor bounce <repo_dir> --patch <diff> --pr-number <N> --author <A> --format json

Stage 2 — In-Memory Merge Simulation (Hyper-drive)

janitor hyper-drive <repo_dir> --pr-limit <N> --timeout <S>s
  → build_symbols_rkyv(repo_path, base_sha)    [git_drive.rs — Necrotic Hydration]
  → For each PR ref:
      simulate_merge(repo, base_oid, pr_head_oid) → MergeSnapshot {
          blobs:   HashMap<PathBuf, Vec<u8>>    // Full HEAD blob per file
          patches: HashMap<PathBuf, String>     // Actual unified diff per file
          deleted: Vec<PathBuf>
          total_bytes: usize
      }
      Routing:
        snapshot.blobs   → IncludeGraphBuilder + SemanticNull
        snapshot.patches → PatchBouncer (SlopHunter, AstSimHasher, NCD)
      iter_by_priority() feeds high-SLOP-vector extensions first (Chemotaxis):
        "rs","py","go","js","ts","tsx","jsx","cs","java","cpp","cc","cxx","c"

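The Chemotaxis ordering amounts to sorting snapshot paths by the position of their extension in the priority list. This sketch is a hypothetical stand-in for iter_by_priority(): it assumes files with unlisted extensions go last and breaks ties by path so the order is deterministic despite HashMap iteration.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Chemotaxis order from Stage 2: earlier extensions are scanned first.
const PRIORITY: [&str; 13] = [
    "rs", "py", "go", "js", "ts", "tsx", "jsx", "cs", "java", "cpp", "cc", "cxx", "c",
];

/// Position in PRIORITY, or PRIORITY.len() for unlisted extensions.
fn rank(path: &Path) -> usize {
    path.extension()
        .and_then(|e| e.to_str())
        .and_then(|e| PRIORITY.iter().position(|p| *p == e))
        .unwrap_or(PRIORITY.len())
}

/// Hypothetical re-implementation over MergeSnapshot.patches.
fn iter_by_priority(patches: &HashMap<PathBuf, String>) -> Vec<(&PathBuf, &String)> {
    let mut items: Vec<_> = patches.iter().collect();
    // Priority rank first, then path, so ties never depend on HashMap order.
    items.sort_by(|a, b| rank(a.0).cmp(&rank(b.0)).then_with(|| a.0.cmp(b.0)));
    items
}
```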
Stage 3 — PatchBouncer Per-File (sequence within bounce())

1. Language detection          lang_for_ext() from "+++ b/<path>" header
2. Circuit breaker             patch > 1 MiB → skip
3. IAC bypass                  ext in IAC_TEXT_EXTS → skip ByteLatticeAnalyzer
4. Binary asset bypass         ext in BINARY_ASSET_EXTS → skip ByteLatticeAnalyzer
5. ByteLatticeAnalyzer         AnomalousBlob? → antipattern:agnostic_shield_anomaly
6. Extract added lines         "+"-prefixed diff lines only
7. Tree-sitter parse           grammar for lang; ERROR/MISSING nodes → neutral score
8. Structural hashing          BLAKE3 exact + SimHash fuzzy per function/method
9. Logic clone detection       Hamming(a, b) ≤ 3 → Refactor; 4–9 → Zombie
10. find_slop(lang, source)    Language-specific AST antipatterns (see Threat Matrix)
11. check_entropy(patch_bytes) NCD verbosity gate (zstd ratio < 0.05)
12. binary_hunter::scan()      AhoCorasick 7-pattern compiled payload scan
13. CommentScanner             Banned-phrase detection in added comment nodes
14. is_pr_unlinked(pr_body)    No linked issue → +20 pts
15. Collider lookup            LshIndex.query(PrDeltaSignature) → collided_pr_numbers
16. Necrotic Hydration check   backlog_pruner verdict → necrotic_flag
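Steps 8 and 9 can be illustrated with a textbook 64-bit SimHash and the Hamming thresholds listed above. The per-feature hash (std's DefaultHasher) and the CloneClass enum are assumptions made for this sketch; AstSimHasher's actual hashing scheme is not described here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// 64-bit SimHash over (kind_id, depth) feature pairs: each feature's hash
/// votes +1/-1 per bit, and the sign of each tally becomes the output bit.
fn simhash(features: &[(u16, u32)]) -> u64 {
    let mut votes = [0i64; 64];
    for f in features {
        let mut h = DefaultHasher::new();
        f.hash(&mut h);
        let bits = h.finish();
        for (i, v) in votes.iter_mut().enumerate() {
            if (bits >> i) & 1 == 1 {
                *v += 1;
            } else {
                *v -= 1;
            }
        }
    }
    votes
        .iter()
        .enumerate()
        .fold(0u64, |acc, (i, &v)| if v > 0 { acc | (1 << i) } else { acc })
}

#[derive(Debug, PartialEq, Eq)]
enum CloneClass {
    Refactor,
    Zombie,
}

/// Thresholds from step 9: distance <= 3 is Refactor, 4..=9 is Zombie.
fn classify(a: u64, b: u64) -> Option<CloneClass> {
    match (a ^ b).count_ones() {
        0..=3 => Some(CloneClass::Refactor),
        4..=9 => Some(CloneClass::Zombie),
        _ => None,
    }
}
```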

Stage 4 — Bounce Log Persistence

Each PR result is appended to <repo_dir>/.janitor/bounce_log.ndjson via append_bounce_log(). f.sync_all() is called after every write, so each completed record is flushed to disk and survives a SIGKILL or host crash mid-run.
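A minimal sketch of the append-then-fsync pattern, assuming the entry has already been serialised to one line of compact JSON. The function name and signature are hypothetical; the real append_bounce_log() takes a BounceLogEntry.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Append one NDJSON record and fsync before returning.
/// Assumes json_line is compact JSON with no embedded newlines.
fn append_ndjson_line(log_path: &Path, json_line: &str) -> io::Result<()> {
    if let Some(dir) = log_path.parent() {
        std::fs::create_dir_all(dir)?;
    }
    let mut f = OpenOptions::new().create(true).append(true).open(log_path)?;
    // Exactly one record per line.
    writeln!(f, "{}", json_line.trim_end_matches('\n'))?;
    // Flush data and metadata to stable storage before reporting success.
    f.sync_all()
}
```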

Stage 5 — Global Aggregation (post all repos)

After all repos have been processed sequentially, two threads are spawned in parallel:

Thread A: janitor report --global --format pdf  → <gauntlet_dir>/global_report.pdf
Thread B: janitor export --global               → <gauntlet_dir>/export.csv

IV. THE THREAT MATRIX

All threats detected by PatchBouncer::bounce() in crates/forge/src/slop_filter.rs.

Tier 1 — Critical Threats (security: prefix → $150/intercept)

ID | Detector | Condition | Points
security:compiled_payload_anomaly | binary_hunter | ELF magic \x7fELF, WASM \x00asm\x01\x00\x00\x00, PE MZ\x90\x00\x03, /bin/sh\x00, cmd.exe\x00, stratum+tcp://, stratum2+tcp:// | +50 per match
Swarm Collision | LshIndex | collided_pr_numbers non-empty | Categorical → $150 billing
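The Tier 1 payload check can be sketched with the seven signatures above. To stay dependency-free this version does a naive O(n × k) window scan rather than building an Aho-Corasick automaton as binary_hunter does, and it assumes every occurrence of a signature scores +50.

```rust
const THREAT_LABEL: &str = "security:compiled_payload_anomaly";
const POINTS_PER_MATCH: u32 = 50;

/// The seven byte signatures from the Tier 1 table.
const PATTERNS: [&[u8]; 7] = [
    b"\x7fELF",                 // ELF magic
    b"\x00asm\x01\x00\x00\x00", // WASM module header
    b"MZ\x90\x00\x03",          // PE/DOS stub
    b"/bin/sh\x00",
    b"cmd.exe\x00",
    b"stratum+tcp://", // mining-pool protocol
    b"stratum2+tcp://",
];

/// Naive scan: compares every window against every signature.
/// The production scanner uses a single aho_corasick automaton instead.
fn count_payload_matches(data: &[u8]) -> usize {
    PATTERNS
        .iter()
        .map(|p| data.windows(p.len()).filter(|w| w == p).count())
        .sum()
}

/// Returns the threat label and points, or None when nothing matched.
fn payload_finding(data: &[u8]) -> Option<(&'static str, u32)> {
    let hits = count_payload_matches(data) as u32;
    (hits > 0).then(|| (THREAT_LABEL, hits * POINTS_PER_MATCH))
}
```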

Tier 2 — Architectural Antipatterns (AST-derived)

Detected by find_slop(lang, source) in crates/forge/src/slop_hunter.rs:

Language | Pattern | Severity | Points
YAML | VirtualService/Ingress/HTTPRoute/Gateway with hosts: ["*"] | Critical | 50
C | gets() call (removed in C11; unbounded buffer overflow) | Critical | 50
HCL/Terraform | Open CIDR 0.0.0.0/0 in ingress rule | Critical | 50

Tier 3 — NCD Verbosity (antipattern: prefix → Warning tier)

ID | Detector | Condition | Points
antipattern:ncd_anomaly | check_entropy | compressed/raw < 0.05 AND patch ≥ 256 bytes | 10

Critical distinction: NCD findings carry the antipattern: prefix, NOT security:. Because is_critical_threat() gates on "security:", NCD intentionally stays out of the $150 categorical billing tier.
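The NCD gate itself reduces to a ratio test. In this sketch the compressed length is passed in (the real check_entropy obtains it from zstd::encode_all at level 3), and is_critical_label is a hypothetical helper showing why the antipattern: prefix keeps the finding out of the critical tier.

```rust
/// Tier 3 gate: flag patches of at least 256 bytes whose
/// compressed/raw ratio falls below 0.05.
fn ncd_anomaly(raw_len: usize, compressed_len: usize) -> Option<&'static str> {
    const MIN_BYTES: usize = 256;
    const RATIO_FLOOR: f64 = 0.05;
    if raw_len < MIN_BYTES {
        return None;
    }
    let ratio = compressed_len as f64 / raw_len as f64;
    (ratio < RATIO_FLOOR).then_some("antipattern:ncd_anomaly")
}

/// Same substring test that is_critical_threat() applies to each label.
fn is_critical_label(label: &str) -> bool {
    label.contains("security:")
}
```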

Tier 4 — Structural Quality

Detector | Condition | Score Effect
Logic clone detection | Hamming ≤ 3 (Refactor-class similarity) | +5 per clone pair (capped at 50 pairs)
Zombie symbols | Dead body hash matches symbol in registry | +10 per zombie
Comment violations | Banned phrase in added comment | +5 per violation
Unlinked PR | No issue reference in PR body | +20
Hallucinated security fix | Security keywords present, but the diff touches only non-code files | +100

Bypass Rules

Condition | Effect
ext in IAC_TEXT_EXTS (nix lock json toml yaml yml csv) | Skip ByteLatticeAnalyzer
ext in BINARY_ASSET_EXTS (wasm woff woff2 eot ttf png jpg jpeg gif ico zip gz tar pdf) | Skip ByteLatticeAnalyzer
patch > 1 MiB | Skip entire file
Nix entities | Protection::WisdomRule — all symbols shielded from dead-code classification
Bot author (app/ prefix, [bot] suffix, trusted list, forge.automation_accounts) | Score still computed; billing classification unchanged
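The first three bypass rows form a small routing decision, sketched here with the extension lists copied from the table. The Route enum and route function are hypothetical; the Nix and bot-author rules are not modelled.

```rust
const IAC_TEXT_EXTS: &[&str] = &["nix", "lock", "json", "toml", "yaml", "yml", "csv"];
const BINARY_ASSET_EXTS: &[&str] = &[
    "wasm", "woff", "woff2", "eot", "ttf", "png", "jpg", "jpeg", "gif", "ico", "zip", "gz",
    "tar", "pdf",
];
const MAX_PATCH_BYTES: usize = 1024 * 1024; // 1 MiB circuit breaker

#[derive(Debug, PartialEq, Eq)]
enum Route {
    SkipFile,    // circuit breaker tripped
    SkipLattice, // parse and score, but no ByteLatticeAnalyzer
    FullScan,
}

fn route(ext: &str, patch_len: usize) -> Route {
    if patch_len > MAX_PATCH_BYTES {
        Route::SkipFile
    } else if IAC_TEXT_EXTS.contains(&ext) || BINARY_ASSET_EXTS.contains(&ext) {
        Route::SkipLattice
    } else {
        Route::FullScan
    }
}
```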

V. THE ACTUARIAL LEDGER

Classification Function (crates/cli/src/report.rs)

pub fn is_critical_threat(e: &BounceLogEntry) -> bool {
    e.antipatterns.iter().any(|a| a.contains("security:"))
        || !e.collided_pr_numbers.is_empty()
}

Billing Tiers

Classification | Condition | Rate
Critical Threat | is_critical_threat(e) == true | $150 per intercept
GC-only Necrotic | (necrotic_flag.is_some() OR !zombie_deps.is_empty()) AND NOT critical | $20 per intercept
StructuralSlop | slop_score > 0 AND NOT critical AND NOT necrotic | $20 per intercept
Boilerplate | slop_score == 0, no threat signal | $0
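The four tiers can be read as a precedence chain: an entry is Critical if is_critical_threat() matches, otherwise Necrotic, otherwise StructuralSlop, otherwise Boilerplate. A minimal sketch under that reading, using a trimmed hypothetical Entry struct in place of BounceLogEntry and the default rates:

```rust
/// Trimmed, hypothetical subset of BounceLogEntry fields used for billing.
struct Entry {
    slop_score: u32,
    antipatterns: Vec<String>,
    collided_pr_numbers: Vec<u64>,
    necrotic_flag: Option<String>,
    zombie_deps: Vec<String>,
}

#[derive(Debug, PartialEq, Eq)]
enum ThreatClass {
    Critical,
    Necrotic,
    StructuralSlop,
    Boilerplate,
}

fn is_critical_threat(e: &Entry) -> bool {
    e.antipatterns.iter().any(|a| a.contains("security:")) || !e.collided_pr_numbers.is_empty()
}

/// Tiers are checked in precedence order, so each entry lands in exactly one.
/// The second value is the default rate in USD.
fn classify(e: &Entry) -> (ThreatClass, u32) {
    if is_critical_threat(e) {
        (ThreatClass::Critical, 150)
    } else if e.necrotic_flag.is_some() || !e.zombie_deps.is_empty() {
        (ThreatClass::Necrotic, 20)
    } else if e.slop_score > 0 {
        (ThreatClass::StructuralSlop, 20)
    } else {
        (ThreatClass::Boilerplate, 0)
    }
}
```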

Score Formula (SlopScore::score())

score = (logic_clones_found.min(50) × 5)
      + (zombie_symbols_added × 10)
      + (antipattern_score.min(500))      ← sum of Severity::points() per finding
      + (comment_violations × 5)
      + (unlinked_pr × 20)
      + (hallucinated_security_fix × 100)
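The formula translates directly into Rust. The SlopInputs struct is a hypothetical stand-in for SlopScore, and modelling unlinked_pr and hallucinated_security_fix as booleans is an assumption. The test walks one worked example: 60 clones (capped at 50, so 250), one zombie (10), an antipattern sum of 650 (capped at 500), two comment violations (10) and an unlinked PR (20), for a total of 790.

```rust
/// Hypothetical mirror of the SlopScore inputs named in the formula above.
#[derive(Default)]
struct SlopInputs {
    logic_clones_found: u32,
    zombie_symbols_added: u32,
    antipattern_score: u32, // sum of Severity::points() per finding
    comment_violations: u32,
    unlinked_pr: bool,
    hallucinated_security_fix: bool,
}

impl SlopInputs {
    /// dead_symbols_added is deliberately absent, matching score().
    fn score(&self) -> u32 {
        self.logic_clones_found.min(50) * 5
            + self.zombie_symbols_added * 10
            + self.antipattern_score.min(500)
            + self.comment_violations * 5
            + u32::from(self.unlinked_pr) * 20
            + u32::from(self.hallucinated_security_fix) * 100
    }
}
```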

dead_symbols_added is tracked but excluded from score() (v7.6.2 — FFI false-positive elimination).

Total Economic Impact (TEI)

critical_threats_count    = entries where is_critical_threat()
gc_only_count             = entries where necrotic_flag.is_some() AND NOT is_critical_threat()
structural_slop_count     = entries where slop_score > 0 AND NOT critical AND NOT necrotic
total_actionable_intercepts = critical_threats_count + gc_only_count + structural_slop_count

ci_compute_saved_usd    = critical_threats_count × $150
gc_value_usd            = gc_only_count × $20
structural_slop_usd     = structural_slop_count × $20
total_economic_impact   = ci_compute_saved_usd + gc_value_usd + structural_slop_usd
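The TEI arithmetic, sketched with the default rates. Giving structural slop its own rate field is an assumption (the [billing] table documents only critical_threat_usd and necrotic_usd); with the default of $20 it reproduces the formulas above. For example, 3 critical, 5 GC-only and 10 structural-slop intercepts give 18 actionable intercepts and 3 × $150 + 5 × $20 + 10 × $20 = $750.

```rust
/// Intercept counts per billing tier.
struct TierCounts {
    critical: u64,
    gc_only: u64,
    structural_slop: u64,
}

/// Per-tier rates in USD; structural_slop_usd is a hypothetical field.
struct Rates {
    critical_threat_usd: f64,
    necrotic_usd: f64,
    structural_slop_usd: f64,
}

impl Default for Rates {
    fn default() -> Self {
        Rates { critical_threat_usd: 150.0, necrotic_usd: 20.0, structural_slop_usd: 20.0 }
    }
}

/// Returns (total_actionable_intercepts, total_economic_impact).
fn tei(c: &TierCounts, r: &Rates) -> (u64, f64) {
    let intercepts = c.critical + c.gc_only + c.structural_slop;
    let usd = c.critical as f64 * r.critical_threat_usd
        + c.gc_only as f64 * r.necrotic_usd
        + c.structural_slop as f64 * r.structural_slop_usd;
    (intercepts, usd)
}
```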

CSV Column Schema (16 columns, exact order)

 1. PR_Number
 2. Author
 3. Score
 4. Threat_Class          "Critical" | "Necrotic" | "StructuralSlop" | "Boilerplate"
 5. Unlinked_PR
 6. Logic_Clones
 7. Antipattern_IDs       pipe-delimited rule labels (e.g. security:compiled_payload_anomaly|antipattern:ncd_anomaly)
 8. Collided_PRs          pipe-delimited collided PR numbers; empty if none
 9. Time_Saved_Hours      necrotic_count × triage_minutes_per_finding ÷ 60 (default 12 min; configurable via [billing] in janitor.toml)
10. Operational_Savings_USD ($150 critical / $20 GC-only / $0 otherwise; rates configurable via [billing] in janitor.toml)
11. Timestamp
12. PR_State
13. Is_Bot
14. Repo_Slug
15. Commit_SHA            Git SHA of the PR head at bounce time; from --head or GITHUB_SHA; empty when unavailable
16. Policy_Hash           BLAKE3 hex digest of janitor.toml at bounce time; empty when no manifest present (SOC 2 audit trail)

[billing] TOML table (override actuarial defaults):

[billing]
triage_minutes_per_finding = 12.0   # senior-engineer minutes per finding (Workslop 2026 default)
critical_threat_usd        = 150.0  # billing rate for Critical Threats
necrotic_usd               = 20.0   # billing rate for Necrotic GC flags

[webhook] TOML table (SIEM / Slack / Teams integration):

[webhook]
url    = "https://hooks.slack.com/services/..."
secret = "env:JANITOR_WEBHOOK_SECRET"   # or literal string for dev
events = ["critical_threat"]            # "critical_threat" | "necrotic_flag" | "all"
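How the secret and events fields might be interpreted, as a hypothetical sketch: "env:NAME" is read from the environment, any other value is used literally, and "all" is assumed to match every event. The function names are invented, and the environment lookup is injected so the logic can be tested without touching process state.

```rust
/// Resolve [webhook].secret: "env:NAME" reads NAME via `lookup`,
/// anything else is returned as a literal (the dev fallback).
fn resolve_secret<F>(raw: &str, lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    match raw.strip_prefix("env:") {
        Some(var) => lookup(var),
        None => Some(raw.to_string()),
    }
}

/// Production lookup backed by the process environment.
fn env_lookup(name: &str) -> Option<String> {
    std::env::var(name).ok()
}

/// Assumption: "all" subscribes to every event type.
fn should_fire(events: &[&str], event: &str) -> bool {
    events.iter().any(|e| *e == "all" || *e == event)
}
```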

PDF Report Structure

Global Report (render_global_markdown)

Page | Content
1 — Executive Summary | Timestamp, repo count, PR count; Critical Threats / Necrotic GC / TEI table; methodology footnote
2 — Threat Distribution | ASCII bar chart: one line per repo, showing Critical and Necrotic counts
3 — Repository Breakdown | Table: Repository / PRs / Total Slop / Intercepts / Economic Impact / Worst PR
4 — Top 10 Riskiest PRs | Cross-repo PRs with score > 50, ranked descending
— Scoring Methodology | Billing-tier table + score formula
— Appendix: Full Audit Log | Per-repo \newpage sections: metric table, Top 10 Sloppiest PRs, Top 10 Cleanest Contributors, C/C++ compile-time silos

Single-Repo Report (render_markdown)

Section | Content
Executive Summary | Workslop metric table (actionable intercepts, critical, GC, hours, TEI)
(page break) |
Scoring Methodology | Billing-tier table + score formula
Top 10 High-Risk PRs | Table + antipattern detail expansion
Necrotic PRs | Backlog Pruner GC flags
Structural Clones | MinHash clone pairs
Zombie Dependencies | Manifest scan results
Full PR Log | Every PR scored > 0

VI. THE COMMAND & CONTROL INTERFACE

Invoked via: janitor dashboard. Crate: crates/dashboard.

Mode 1 — TargetSelection

Scans <gauntlet_dir> for cloned repositories. Displays them as a navigable list. Repositories with bounce_log.ndjson modified within the last 10 seconds are tagged [ AUDIT ACTIVE ] (blinking). List rescans automatically every 2 seconds.
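The [ AUDIT ACTIVE ] tag reduces to a modification-time check. This sketch assumes the log lives at <repo>/.janitor/bounce_log.ndjson (as in Stage 4) and treats a missing log as inactive; the function name is hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, SystemTime};

const ACTIVE_WINDOW: Duration = Duration::from_secs(10);

/// True when the repo's bounce log was modified within the last 10 seconds.
fn audit_active(repo_dir: &Path, now: SystemTime) -> io::Result<bool> {
    let log = repo_dir.join(".janitor").join("bounce_log.ndjson");
    let modified = match fs::metadata(&log) {
        Ok(m) => m.modified()?,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(false),
        Err(e) => return Err(e),
    };
    // An mtime slightly in the future (clock skew) counts as active.
    Ok(now.duration_since(modified).map_or(true, |age| age <= ACTIVE_WINDOW))
}
```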

Key Bindings:

Key | Action
↑ / ↓ | Navigate repository list
Enter | Open repository in ActiveSurveillance mode
q | Quit

Mode 2 — ActiveSurveillance (per-repo)

Full-screen view. Layout: title bar (3 rows) → tab selector (3 rows) → content (fill) → footer (1 row). Log file polled for changes every 2 seconds.

Three tabs:

Tab | Index | Content
Live Telemetry | 0 | PR delta feed: score, necrotic flag, clone count, antipattern detail strings, collided PR numbers
Structural Topology | 1 | Top-10 C++ compile-time silos ranked by transitive reach (petgraph). C++ graph rebuilt every 5 seconds when empty.
Swarm Intelligence | 2 | Structural clone cluster detection table: PR pairs, Jaccard similarity, band collisions
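The banding behind LshIndex (64 MinHash values split into 8 bands of 8 rows) can be sketched as follows. The hash family (std's DefaultHasher seeded with the slot index), the token-set input, and the LshSketch type are assumptions; the real PrDeltaSignature features are not described here. Two PRs collide when any band matches exactly, and the fraction of equal slots estimates their Jaccard similarity.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::{Hash, Hasher};

const NUM_HASHES: usize = 64;
const BANDS: usize = 8;
const ROWS: usize = 8; // BANDS * ROWS == NUM_HASHES

/// MinHash: for each of 64 seeded hash functions keep the minimum over the set.
fn minhash<T: Hash>(items: &[T]) -> [u64; NUM_HASHES] {
    let mut sig = [u64::MAX; NUM_HASHES];
    for item in items {
        for (seed, slot) in sig.iter_mut().enumerate() {
            let mut h = DefaultHasher::new();
            seed.hash(&mut h);
            item.hash(&mut h);
            *slot = (*slot).min(h.finish());
        }
    }
    sig
}

/// Fraction of equal signature slots, an estimate of Jaccard similarity.
fn estimated_jaccard(a: &[u64; NUM_HASHES], b: &[u64; NUM_HASHES]) -> f64 {
    a.iter().zip(b).filter(|(x, y)| x == y).count() as f64 / NUM_HASHES as f64
}

/// Hypothetical index: one bucket map keyed by (band, hash of that band's rows).
#[derive(Default)]
struct LshSketch {
    buckets: HashMap<(usize, u64), Vec<u64>>,
}

impl LshSketch {
    fn band_key(sig: &[u64; NUM_HASHES], band: usize) -> u64 {
        let mut h = DefaultHasher::new();
        sig[band * ROWS..(band + 1) * ROWS].hash(&mut h);
        h.finish()
    }

    /// Insert a PR and return the previously seen PRs it collides with.
    fn insert(&mut self, pr: u64, sig: &[u64; NUM_HASHES]) -> Vec<u64> {
        let mut hits = HashSet::new();
        for band in 0..BANDS {
            let bucket = self.buckets.entry((band, Self::band_key(sig, band))).or_default();
            hits.extend(bucket.iter().copied());
            bucket.push(pr);
        }
        let mut out: Vec<u64> = hits.into_iter().collect();
        out.sort_unstable();
        out
    }
}
```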

Key Bindings:

Key | Action
← / → | Change tab
Esc / Backspace | Return to TargetSelection
q | Quit

(Mode 3, the static draw_dashboard view, was removed in v7.9.1. The WOPR TUI is the sole production view.)


VII. OPERATIONAL COMMANDS — COMPLETE JUSTFILE MANIFEST

just shell
Drop into the Nix development environment (nix develop). All tools pinned via flake.nix.

just init
Scaffold workspace from scratch: write Cargo.toml, mkdir crates/, cargo new each crate. Destructive — resets existing workspace.

just audit
Definition of Done. Runs inside the Nix shell if available: cargo fmt --all -- --check → cargo clippy --workspace -- -D warnings → cargo check --workspace → cargo test --workspace.

just build
cargo build --release --workspace (inside Nix shell if available).

just clean
cargo clean + find . -name "*.rkyv" -delete — vaporises target artefacts and all rkyv registry files.

just auth-refresh
No-op. Token is injected at runtime via --token flag. Stateless auth model.

just bump-version <version>
Updates version strings in Cargo.toml (root + all crates/ + tools/), README.md, docs/index.md, ARCHITECTURE.md, and CLAUDE.md. Runs cargo check as sanity pass.

just release <version>
Full release pipeline: audit → bump-version → cargo build --release → strip target/release/janitor → git commit → git tag v<version> → floating major tag (v<MAJOR>, force-pushed) → git push → gh release create → mkdocs gh-deploy.

just run-gauntlet [*ARGS]
Build gauntlet-runner (cargo build --release -p gauntlet-runner) then execute. Reads gauntlet_targets.txt. Uses gh pr diff subshells per PR (parallel-bounce mode). Accepts: --pr-limit, --timeout, --targets, --gauntlet-dir, --out-dir, --resume, --concurrency (0 = auto from RAM).

just hyper-gauntlet [*ARGS]
Build gauntlet-runner + cli (cargo build --release -p gauntlet-runner -p cli) then execute with --hyper --pr-limit 5000. Clones repos once via libgit2, fetches all PR refs — zero gh pr diff subshells. Accepts same flags as run-gauntlet.

just deploy-docs
uv run --with "mkdocs-material<9.6" --with "mkdocs<2" mkdocs gh-deploy --force — builds and pushes MkDocs site to GitHub Pages.

just sync
rsync -av --delete to /mnt/c/Projects/the-janitor/ — excludes target/, .git/, .janitor/shadow_src/.


VIII. R&D VAULT — EXPERIMENTAL CRATES

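Located at crates/experimental/. Three crates remain workspace members (phantom_ffi_gate was deleted in v7.9.1). advanced_threats and backlog_pruner are wired into the production forge pipeline, and include_deflator graduated to production in v7.9.2.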

Crate | File | Status | Function
advanced_threats | binary_hunter.rs | PRODUCTION (wired into slop_filter.rs + cli) | Zero-allocation AhoCorasick scanner for ELF/PE/WASM/miner byte patterns. 7 patterns. THREAT_LABEL = "security:compiled_payload_anomaly". +50 pts per match.
backlog_pruner | - | PRODUCTION (wired into forge) | Necrotic GC flag assignment: classifies PRs as SEMANTIC_NULL, GHOST_COLLISION, or UNWIRED_ISLAND. Populates necrotic_flag on SlopScore.
include_deflator | - | PRODUCTION, graduated v7.9.2 | C/C++ compile-time silo analysis. Transitive header dependency analyser; IncludeGraphBuilder is used in git_drive.rs and powers the architecture:compile_time_bloat and architecture:graph_entanglement antipatterns and the WOPR Structural Topology tab.
phantom_ffi_gate | - | DELETED (v7.9.1) | The architecture requires a full-repo C++ registry (not patch-scope), so it cannot be wired into PatchBouncer::bounce(); on single-file diffs it only produces false negatives.

IX. FINAL VERSION

7.9.4

Extracted from [workspace.package].version in root Cargo.toml.

Release profile: opt-level = "z", lto = true, codegen-units = 1, strip = true, panic = "abort". MSRV: rust-version = "1.88" (enforced by CI MSRV workflow). Edition: 2021. License: BUSL-1.1 (all workspace crates via license.workspace = true).

v7.9.4 — Architecture Inversion Implementation

Architecture Inversion (Steps 1–4 complete):

  1. Governor: POST /v1/analysis-token — issues a short-lived (5-min TTL) Ed25519-signed JWT scoped to {repo}:{pr}:{head_sha}. Rate-limited: same (repo, PR) pair cannot get a new token within 60 s. Controlled by GOVERNOR_INVERT_MODE=1.

  2. CLI: --report-url + --analysis-token — after append_bounce_log, if both flags are set, POSTs the BounceLogEntry JSON to the Governor's /v1/report with Authorization: Bearer <token>. The POST is non-fatal, and source code stays on the runner.

  3. Governor: POST /v1/report — verifies the JWT, checks commit_sha == claims.head_sha, retrieves the pending check from pending_checks DashMap, updates the GitHub Check Run, removes the entry. Only active in invert mode.

  4. GitHub Action: invert_mode + governor_url inputs — pre-bounce step fetches the analysis token from /v1/analysis-token; token is passed to janitor bounce --report-url --analysis-token.

New env var: GOVERNOR_INVERT_MODE=1 — gates all inversion behaviour in the Governor. Default: 0 (legacy clone path).
New CLI flags: janitor bounce --report-url <url> --analysis-token <jwt>
New AppState fields: invert_mode: bool, token_rate_limit: DashMap, pending_checks: DashMap
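The token rules from steps 1 and 3 (5-minute TTL, 60-second reissue cooldown per (repo, PR), and the commit_sha == claims.head_sha check) can be sketched without the JWT and Ed25519 layers. The Issuer and Claims types are hypothetical, and a plain HashMap stands in for the token_rate_limit DashMap.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

const TOKEN_TTL: Duration = Duration::from_secs(5 * 60);
const REISSUE_COOLDOWN: Duration = Duration::from_secs(60);

/// Claims the analysis token is scoped to ({repo}:{pr}:{head_sha}).
#[derive(Debug, Clone, PartialEq)]
struct Claims {
    repo: String,
    pr: u64,
    head_sha: String,
    expires_at: Instant,
}

/// Single-threaded stand-in for the token_rate_limit map.
#[derive(Default)]
struct Issuer {
    last_issued: HashMap<(String, u64), Instant>,
}

impl Issuer {
    /// None if this (repo, PR) pair already received a token in the last 60 s.
    /// Signing the claims is out of scope for this sketch.
    fn issue(&mut self, repo: &str, pr: u64, head_sha: &str, now: Instant) -> Option<Claims> {
        let key = (repo.to_string(), pr);
        if let Some(prev) = self.last_issued.get(&key) {
            if now.duration_since(*prev) < REISSUE_COOLDOWN {
                return None;
            }
        }
        self.last_issued.insert(key, now);
        Some(Claims {
            repo: repo.to_string(),
            pr,
            head_sha: head_sha.to_string(),
            expires_at: now + TOKEN_TTL,
        })
    }
}

/// /v1/report-side check: unexpired token, and the commit matches the scoped SHA.
fn report_allowed(claims: &Claims, commit_sha: &str, now: Instant) -> bool {
    now < claims.expires_at && claims.head_sha == commit_sha
}
```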


End of SOVEREIGN BRIEFING.