SOVEREIGN BRIEFING — THE JANITOR ENGINE
Canonical state document. Generated by direct source audit. All values derive from code, not prior documentation. Supersedes all predecessor files.
I. CORE PRIMITIVES
Runtime Language Infrastructure
| Grammar | polyglot Static | Extension(s) |
|---|---|---|
| Rust | RUST | rs |
| Python | PYTHON | py |
| JavaScript | JAVASCRIPT | js, jsx |
| TypeScript | TYPESCRIPT | ts, tsx |
| C++ | CPP | cpp, cxx, cc, hpp |
| C | C | c, h |
| Java | JAVA | java |
| C# | CSHARP | cs |
| Go | GO | go |
| GLSL | GLSL | glsl, vert, frag |
| Objective-C | OBJC | m, mm |
| YAML | YAML | yaml, yml |
| Bash | BASH | sh, bash |
| Scala | SCALA | scala |
| Ruby | RUBY | rb |
| PHP | PHP | php |
| Swift | SWIFT | swift |
| Lua | LUA | lua |
| HCL/Terraform | HCL | tf, hcl |
| Nix | NIX | nix |
| GDScript | GDSCRIPT | gd |
| Kotlin | KOTLIN | kt, kts |
23 grammars total. Each grammar is a OnceLock<Language> static — loaded once on first use, zero per-call allocation thereafter.
Grammar library: tree-sitter 0.26 (workspace pinned).
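The load-once behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the engine's code: the `Language` struct is a hypothetical stand-in for the tree-sitter language handle, and `RUST_LOADS` and `rust_language` are names invented here to make the single initialisation observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a tree-sitter language handle, so the sketch
/// compiles without external crates.
#[derive(Debug)]
pub struct Language {
    pub name: &'static str,
}

/// Counts how many times the loader closure actually runs (illustration only).
pub static RUST_LOADS: AtomicUsize = AtomicUsize::new(0);

/// One static per grammar; the cell starts empty and is filled on first use.
static RUST: OnceLock<Language> = OnceLock::new();

/// Returns the shared handle, running the loader on the first call only.
/// Later calls return the same reference without re-running the closure.
pub fn rust_language() -> &'static Language {
    RUST.get_or_init(|| {
        RUST_LOADS.fetch_add(1, Ordering::SeqCst);
        Language { name: "rust" }
    })
}
```

Calling `rust_language()` repeatedly yields the same `&'static Language`, which is what makes per-call cost independent of grammar construction.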
Foundational Crates & Mathematical Models
| Primitive | Crate / Constant | Purpose |
|---|---|---|
| Fuzzy clone detection | AstSimHasher: SimHash over (kind_id u16, depth u32) feature pairs | Detects structural refactors without textual identity |
| Swarm clustering | LshIndex: MinHash, 8 bands × 8 rows, 64-hash sketch | O(1) amortised PR → clone-cluster lookup |
| Patch entropy gate | check_entropy: zstd::encode_all level 3, ratio = compressed/raw | NCD verbosity detection |
| Byte lattice shield | ByteLatticeAnalyzer: windowed Shannon entropy, 512-byte window, 256-byte stride | Binary/generated blob detection |
| Compiled payload scanner | binary_hunter: AhoCorasick over 7 static byte patterns | ELF/PE/WASM/miner injection detection |
| Merge simulation | simulate_merge via libgit2: diff.foreach + RefCell patch accumulation | Zero disk checkout; O(PR-diff) memory |
| Symbol persistence | rkyv 0.8: zero-copy archive format | IPC registry at .janitor/symbols.rkyv |
| Hot-swap registry | arc_swap::ArcSwap<SymbolRegistry> | Lock-free atomic reload without daemon downtime |
| Memory backpressure | Physarum: sysinfo 0.30, threshold model | System-aware concurrency gate; prevents OOM |
| Dependency graph | petgraph 0.7: transitive compile-time reach analysis | C++ silo detection in Structural Topology tab |
| Pattern matching | aho_corasick 1.1: single automaton per pattern group (OnceLock) | Hot-path multi-pattern search without recompilation |
| Hashing | blake3 1.5: exact structural hash per function body | Zombie/clone identity |
| Security: PQC | ML-DSA-65 (FIPS 204), PRODUCTION. the-governor/src/compliance/pqc.rs implements keygen/sign/verify via the fips204 crate. The Governor generates a persistent governor.key on first run and signs every CycloneDX CBOM bond with it. pqc_enforced: true in janitor.toml blocks the PR merge if bond issuance fails. | Post-quantum attestation for CycloneDX v1.5 CBOM bonds |
| Security: Ed25519 | vault::SigningOracle::verify_token: public-key-only; no private key in binary | API token gate for janitor_clean (MCP) and /v1/attest |
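To make the byte lattice row concrete, here is a minimal sketch of windowed Shannon entropy using the 512-byte window and 256-byte stride from the table. The function names are hypothetical. The score at which ByteLatticeAnalyzer reports an AnomalousBlob is not stated in this document, so no threshold is shown.

```rust
pub const WINDOW: usize = 512;
pub const STRIDE: usize = 256;

/// Shannon entropy of a byte slice in bits per byte (0.0 for empty input).
pub fn shannon_entropy(window: &[u8]) -> f64 {
    if window.is_empty() {
        return 0.0;
    }
    // Histogram of byte values.
    let mut counts = [0usize; 256];
    for &b in window {
        counts[b as usize] += 1;
    }
    let n = window.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Entropy of each full 512-byte window, advancing 256 bytes at a time.
/// Inputs no longer than one window yield a single measurement over the
/// whole input; a trailing partial window is not measured (an assumption
/// of this sketch).
pub fn windowed_entropy(data: &[u8]) -> Vec<f64> {
    if data.len() <= WINDOW {
        return vec![shannon_entropy(data)];
    }
    (0..=data.len() - WINDOW)
        .step_by(STRIDE)
        .map(|start| shannon_entropy(&data[start..start + WINDOW]))
        .collect()
}
```

Uniformly distributed bytes approach 8 bits per byte, while repetitive text stays far lower, which is the contrast a blob detector of this kind relies on.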
II. POLICY LAYER
janitor.toml at the repository root encodes maintainer-controlled slop tolerance as
version-controlled, reviewable configuration. Loaded by JanitorPolicy::load() in
crates/common/src/policy.rs. Full field reference: docs/governance.md /
https://thejanitor.app/governance.
| Field | Default | Purpose |
|---|---|---|
| min_slop_score | 100 | Gate threshold: raw score ceiling |
| require_issue_link | false | Block PRs with no #N reference |
| allowed_zombies | false | Downgrade zombie veto to warning |
| pqc_enforced | false | Block PR when ML-DSA-65 bond fails (needs Sentinel) |
| refactor_bonus | 0 | Raise gate for [REFACTOR]/[FIXES-DEBT] PRs |
| custom_antipatterns | [] | Project-specific .scm query files |
| trusted_bot_authors | [] | Handles exempt from unlinked-PR penalty |
| [forge].automation_accounts | [] | Ecosystem accounts without [bot] suffix |
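The defaults in the table can be mirrored in a plain struct. This is a sketch only: `PolicySketch` is a hypothetical name (the real type is JanitorPolicy), the field types are assumptions, and `effective_gate` assumes that refactor_bonus is added to min_slop_score, which the table implies but does not spell out.

```rust
/// Hypothetical mirror of the janitor.toml fields listed in the table.
/// Field types are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct PolicySketch {
    pub min_slop_score: u32,
    pub require_issue_link: bool,
    pub allowed_zombies: bool,
    pub pqc_enforced: bool,
    pub refactor_bonus: u32,
    pub custom_antipatterns: Vec<String>,
    pub trusted_bot_authors: Vec<String>,
    /// Corresponds to [forge].automation_accounts.
    pub automation_accounts: Vec<String>,
}

impl Default for PolicySketch {
    /// Defaults copied from the table.
    fn default() -> Self {
        Self {
            min_slop_score: 100,
            require_issue_link: false,
            allowed_zombies: false,
            pqc_enforced: false,
            refactor_bonus: 0,
            custom_antipatterns: Vec::new(),
            trusted_bot_authors: Vec::new(),
            automation_accounts: Vec::new(),
        }
    }
}

impl PolicySketch {
    /// Gate applied to a PR. Assumption: the bonus is additive and applies
    /// only to PRs tagged [REFACTOR] or [FIXES-DEBT].
    pub fn effective_gate(&self, is_refactor_pr: bool) -> u32 {
        if is_refactor_pr {
            self.min_slop_score.saturating_add(self.refactor_bonus)
        } else {
            self.min_slop_score
        }
    }
}
```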
III. EXECUTION PIPELINE — HYPER-GAUNTLET
Entry point: tools/gauntlet-runner binary, invoked via just hyper-gauntlet or just run-gauntlet.
Stage 0 — Pre-emptive Workspace Purge
For each target repo (non-resume mode):
rm -rf <gauntlet_dir>/<owner>/<repo>
# Clean slate: no stale bounce_log.ndjson bleeds into aggregate
Resume mode (--resume):
Save <repo>/.janitor/bounce_log.ndjson → memory buffer
Perform clone/fetch below
Restore buffer → <repo>/.janitor/bounce_log.ndjson
Stage 1 — Repository Hydration
Hyper-drive mode (--hyper, default for just hyper-gauntlet):
git clone --no-checkout https://github.com/<owner>/<repo>.git <repo_dir>
git -C <repo_dir> fetch origin 'refs/pull/*/head:refs/remotes/origin/pr/*'
Zero gh pr diff subshells. All PR refs land in the packfile.
Parallel-bounce mode (default for just run-gauntlet):
gh pr list --json number,author,mergeable,state --limit <PR_LIMIT>
Filter: exclude CONFLICTING, exclude Bot authors
For each open PR (rayon pool sized by `detect_optimal_concurrency()`, GIT_LOCK mutex):
gh pr diff <N> --repo <slug> → diff text
janitor bounce <repo_dir> --patch <diff> --pr-number <N> --author <A> --format json
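The filter step in parallel-bounce mode can be sketched as follows. The bot rules come from the Bypass Rules table (app/ prefix, [bot] suffix, trusted list, forge.automation_accounts). `PrSummary`, `is_bot_author`, and `eligible_prs` are hypothetical names, and exact string matching on the author handle is an assumption.

```rust
/// Hypothetical subset of one `gh pr list --json number,author,mergeable` record.
#[derive(Debug, Clone)]
pub struct PrSummary {
    pub number: u64,
    pub author: String,
    pub mergeable: String,
}

/// True when the handle matches any of the documented bot rules.
pub fn is_bot_author(login: &str, trusted: &[&str], automation_accounts: &[&str]) -> bool {
    login.starts_with("app/")
        || login.ends_with("[bot]")
        || trusted.contains(&login)
        || automation_accounts.contains(&login)
}

/// PR numbers that survive the filter: not CONFLICTING and not authored by a bot.
pub fn eligible_prs(
    prs: &[PrSummary],
    trusted: &[&str],
    automation_accounts: &[&str],
) -> Vec<u64> {
    prs.iter()
        .filter(|pr| pr.mergeable != "CONFLICTING")
        .filter(|pr| !is_bot_author(&pr.author, trusted, automation_accounts))
        .map(|pr| pr.number)
        .collect()
}
```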
Stage 2 — In-Memory Merge Simulation (Hyper-drive)
janitor hyper-drive <repo_dir> --pr-limit <N> --timeout <S>s
→ build_symbols_rkyv(repo_path, base_sha) [git_drive.rs — Necrotic Hydration]
→ For each PR ref:
simulate_merge(repo, base_oid, pr_head_oid) → MergeSnapshot {
blobs: HashMap<PathBuf, Vec<u8>> // Full HEAD blob per file
patches: HashMap<PathBuf, String> // Actual unified diff per file
deleted: Vec<PathBuf>
total_bytes: usize
}
Routing:
snapshot.blobs → IncludeGraphBuilder + SemanticNull
snapshot.patches → PatchBouncer (SlopHunter, AstSimHasher, NCD)
iter_by_priority() feeds high-SLOP-vector extensions first (Chemotaxis):
"rs","py","go","js","ts","tsx","jsx","cs","java","cpp","cc","cxx","c"
Stage 3 — PatchBouncer Per-File (sequence within bounce())
1. Language detection: lang_for_ext() from the "+++ b/<path>" header
2. Circuit breaker: patch > 1 MiB → skip
3. IAC bypass: ext in IAC_TEXT_EXTS → skip ByteLatticeAnalyzer
4. Binary asset bypass: ext in BINARY_ASSET_EXTS → skip ByteLatticeAnalyzer
5. ByteLatticeAnalyzer: AnomalousBlob? → antipattern:agnostic_shield_anomaly
6. Extract added lines: "+"-prefixed diff lines only
7. Tree-sitter parse: grammar for lang; ERROR/MISSING nodes → neutral score
8. Structural hashing: BLAKE3 exact + SimHash fuzzy per function/method
9. Logic clone detection: Hamming(a, b) ≤ 3 → Refactor; 4–9 → Zombie
10. find_slop(lang, source): language-specific AST antipatterns (see Threat Matrix)
11. check_entropy(patch_bytes): NCD verbosity gate (zstd ratio < 0.05)
12. binary_hunter::scan(): AhoCorasick 7-pattern compiled payload scan
13. CommentScanner: banned-phrase detection in added comment nodes
14. is_pr_unlinked(pr_body): no linked issue → +20 pts
15. Collider lookup: LshIndex.query(PrDeltaSignature) → collided_pr_numbers
16. Necrotic Hydration check: backlog_pruner verdict → necrotic_flag
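Step 9 reduces to a bit-count over two fingerprints. The sketch below assumes 64-bit SimHash values, which this document does not confirm. The `Distinct` variant and the function names are hypothetical.

```rust
/// Outcome of comparing two SimHash fingerprints.
/// `Distinct` is a hypothetical name for "no clone relationship".
#[derive(Debug, PartialEq, Eq)]
pub enum CloneClass {
    Refactor,
    Zombie,
    Distinct,
}

/// Number of differing bits between two fingerprints.
pub fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Applies the documented thresholds: distance <= 3 is Refactor, 4 to 9 is Zombie.
pub fn classify_clone(a: u64, b: u64) -> CloneClass {
    match hamming(a, b) {
        0..=3 => CloneClass::Refactor,
        4..=9 => CloneClass::Zombie,
        _ => CloneClass::Distinct,
    }
}
```

XOR followed by a population count keeps the comparison to a couple of machine instructions per function pair.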
Stage 4 — Bounce Log Persistence
Each PR result appended to <repo_dir>/.janitor/bounce_log.ndjson via append_bounce_log().
f.sync_all() called after every write — survives SIGKILL.
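A minimal sketch of the append-then-sync pattern described above, using only the standard library. `append_ndjson_line` is a hypothetical name and is not the signature of append_bounce_log; the caller is assumed to pass an already-serialised JSON object on one line.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Appends one NDJSON record and forces it to stable storage before returning.
pub fn append_ndjson_line(log_path: &Path, json_line: &str) -> io::Result<()> {
    // Create the parent directory (e.g. .janitor/) on first use.
    if let Some(parent) = log_path.parent() {
        std::fs::create_dir_all(parent)?;
    }
    let mut f = OpenOptions::new()
        .create(true)
        .append(true)
        .open(log_path)?;
    f.write_all(json_line.as_bytes())?;
    f.write_all(b"\n")?;
    // Flush file data and metadata to disk after every record.
    f.sync_all()
}
```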
Stage 5 — Global Aggregation (post all repos)
Two threads spawned in parallel after sequential repo processing:
Thread A: janitor report --global --format pdf → <gauntlet_dir>/global_report.pdf
Thread B: janitor export --global → <gauntlet_dir>/export.csv
IV. THE THREAT MATRIX
All threats detected by PatchBouncer::bounce() in crates/forge/src/slop_filter.rs.
Tier 1 — Critical Threats (security: prefix → $150/intercept)
| ID | Detector | Condition | Points |
|---|---|---|---|
| security:compiled_payload_anomaly | binary_hunter | ELF magic \x7fELF, WASM \x00asm\x01\x00\x00\x00, PE MZ\x90\x00\x03, /bin/sh\x00, cmd.exe\x00, stratum+tcp://, stratum2+tcp:// | +50 per match |
| Swarm Collision | LshIndex | collided_pr_numbers non-empty | Categorical → $150 billing |
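The Swarm Collision lookup rests on LSH banding: a 64-hash MinHash sketch is cut into 8 bands of 8 rows, and two PRs collide if any band matches exactly. The sketch below shows only the banding and bucket lookup. It does not compute MinHash values, `BandIndex` is a hypothetical stand-in for LshIndex, and std's `DefaultHasher` is a swapped-in bucket hash chosen to avoid external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeSet, HashMap};
use std::hash::{Hash, Hasher};

pub const BANDS: usize = 8;
pub const ROWS: usize = 8;

/// Hypothetical stand-in for LshIndex: buckets keyed by (band index, band hash).
#[derive(Default)]
pub struct BandIndex {
    buckets: HashMap<(usize, u64), Vec<u32>>,
}

/// Hashes one band (8 MinHash values) into a bucket key.
fn band_key(band: &[u64]) -> u64 {
    let mut h = DefaultHasher::new();
    band.hash(&mut h);
    h.finish()
}

impl BandIndex {
    /// Registers a PR under each of its 8 band buckets.
    pub fn insert(&mut self, pr: u32, sketch: &[u64; BANDS * ROWS]) {
        for (i, band) in sketch.chunks(ROWS).enumerate() {
            self.buckets.entry((i, band_key(band))).or_default().push(pr);
        }
    }

    /// Returns every indexed PR that shares at least one band with `sketch`.
    pub fn query(&self, sketch: &[u64; BANDS * ROWS]) -> BTreeSet<u32> {
        let mut hits = BTreeSet::new();
        for (i, band) in sketch.chunks(ROWS).enumerate() {
            if let Some(prs) = self.buckets.get(&(i, band_key(band))) {
                hits.extend(prs.iter().copied());
            }
        }
        hits
    }
}
```

Each query touches a fixed 8 buckets regardless of how many PRs are indexed, which is where the amortised constant-time lookup comes from.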
Tier 2 — Architectural Antipatterns (AST-derived)
Detected by find_slop(lang, source) in crates/forge/src/slop_hunter.rs:
| Language | Pattern | Severity | Points |
|---|---|---|---|
| YAML | VirtualService/Ingress/HTTPRoute/Gateway with hosts: ["*"] | Critical | 50 |
| C | gets() call (removed in C11; unbounded buffer overflow) | Critical | 50 |
| HCL/Terraform | Open CIDR 0.0.0.0/0 in ingress rule | Critical | 50 |
Tier 3 — NCD Verbosity (antipattern: prefix → Warning tier)
| ID | Detector | Condition | Points |
|---|---|---|---|
| antipattern:ncd_anomaly | check_entropy | compressed/raw < 0.05 AND patch ≥ 256 bytes | 10 |
Critical distinction: this rule carries the antipattern: prefix, not security:. is_critical_threat() matches on the "security:" substring, so NCD findings deliberately stay out of the $150 categorical billing tier.
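The Tier 3 condition is a two-part predicate over the raw and compressed patch sizes. The sketch below takes both sizes as inputs rather than calling zstd, so it needs no external crate; the constant and function names are hypothetical.

```rust
/// Ratio below which a patch is considered anomalously compressible.
pub const NCD_RATIO_THRESHOLD: f64 = 0.05;
/// Patches smaller than this are never flagged.
pub const NCD_MIN_PATCH_BYTES: usize = 256;

/// True when compressed/raw < 0.05 and the patch is at least 256 bytes.
/// `compressed_len` is assumed to be the zstd level-3 output size.
pub fn is_ncd_anomaly(raw_len: usize, compressed_len: usize) -> bool {
    if raw_len < NCD_MIN_PATCH_BYTES {
        return false;
    }
    (compressed_len as f64) / (raw_len as f64) < NCD_RATIO_THRESHOLD
}
```

The size floor matters because very short inputs compress unpredictably, so a ratio test on them would be noise.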
Tier 4 — Structural Quality
| Detector | Condition | Score Effect |
|---|---|---|
| Logic clone detection | Hamming ≤ 3 (Refactor-class similarity) | +5 per clone pair (capped at 50 pairs) |
| Zombie symbols | Dead body hash matches symbol in registry | +10 per zombie |
| Comment violations | Banned phrase in added comment | +5 per violation |
| Unlinked PR | No issue reference in PR body | +20 |
| Hallucinated security fix | Security keywords, non-code-only diff | +100 |
Bypass Rules
| Condition | Effect |
|---|---|
| ext in IAC_TEXT_EXTS (nix lock json toml yaml yml csv) | Skip ByteLatticeAnalyzer |
| ext in BINARY_ASSET_EXTS (wasm woff woff2 eot ttf png jpg jpeg gif ico zip gz tar pdf) | Skip ByteLatticeAnalyzer |
| patch > 1 MiB | Skip entire file |
| Nix entities | Protection::WisdomRule: all symbols shielded from dead-code classification |
| Bot author (app/ prefix, [bot] suffix, trusted list, forge.automation_accounts) | Score still computed; billing classification unchanged |
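The first three bypass rows amount to a small routing decision per file. In this sketch the extension lists are copied from the table, while the `Route` enum, the `route` function, and case-sensitive matching on a lowercase extension are assumptions.

```rust
pub const IAC_TEXT_EXTS: &[&str] = &["nix", "lock", "json", "toml", "yaml", "yml", "csv"];
pub const BINARY_ASSET_EXTS: &[&str] = &[
    "wasm", "woff", "woff2", "eot", "ttf", "png", "jpg", "jpeg", "gif", "ico", "zip", "gz",
    "tar", "pdf",
];
/// Circuit-breaker limit: 1 MiB.
pub const MAX_PATCH_BYTES: usize = 1024 * 1024;

/// Hypothetical routing outcome for one file in a patch.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    SkipFile,
    SkipByteLattice,
    FullScan,
}

/// Checks the size circuit breaker first, then the two extension lists,
/// matching the order of the per-file sequence in Stage 3.
pub fn route(ext: &str, patch_len: usize) -> Route {
    if patch_len > MAX_PATCH_BYTES {
        Route::SkipFile
    } else if IAC_TEXT_EXTS.contains(&ext) || BINARY_ASSET_EXTS.contains(&ext) {
        Route::SkipByteLattice
    } else {
        Route::FullScan
    }
}
```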
V. THE ACTUARIAL LEDGER
Classification Function (crates/cli/src/report.rs)
pub fn is_critical_threat(e: &BounceLogEntry) -> bool {
e.antipatterns.iter().any(|a| a.contains("security:"))
|| !e.collided_pr_numbers.is_empty()
}
Billing Tiers
| Classification | Condition | Rate |
|---|---|---|
| Critical Threat | is_critical_threat(e) == true | $150 per intercept |
| GC-only Necrotic | necrotic_flag.is_some() OR !zombie_deps.is_empty() AND NOT critical | $20 per intercept |
| StructuralSlop | slop_score > 0 AND NOT critical AND NOT necrotic | $20 per intercept |
| Boilerplate | slop_score == 0, no threat signal | $0 |
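The tiers can be read as a first-match cascade. The sketch below is one interpretation: `LogEntry` is a hypothetical subset of BounceLogEntry, field types are guesses, and the Necrotic condition is read as (necrotic_flag set OR zombie_deps non-empty) AND NOT critical, which is an assumption about operator grouping.

```rust
/// Hypothetical subset of BounceLogEntry; field types are assumptions.
#[derive(Debug, Default, Clone)]
pub struct LogEntry {
    pub antipatterns: Vec<String>,
    pub collided_pr_numbers: Vec<u32>,
    pub necrotic_flag: Option<String>,
    pub zombie_deps: Vec<String>,
    pub slop_score: u32,
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ThreatClass {
    Critical,
    Necrotic,
    StructuralSlop,
    Boilerplate,
}

/// Same predicate as the classification function shown earlier in this section.
pub fn is_critical_threat(e: &LogEntry) -> bool {
    e.antipatterns.iter().any(|a| a.contains("security:"))
        || !e.collided_pr_numbers.is_empty()
}

/// First matching tier wins, in the order of the billing table.
pub fn classify_entry(e: &LogEntry) -> ThreatClass {
    if is_critical_threat(e) {
        ThreatClass::Critical
    } else if e.necrotic_flag.is_some() || !e.zombie_deps.is_empty() {
        ThreatClass::Necrotic
    } else if e.slop_score > 0 {
        ThreatClass::StructuralSlop
    } else {
        ThreatClass::Boilerplate
    }
}

/// Default per-intercept rate in USD for each tier.
pub fn rate_usd(class: ThreatClass) -> u32 {
    match class {
        ThreatClass::Critical => 150,
        ThreatClass::Necrotic | ThreatClass::StructuralSlop => 20,
        ThreatClass::Boilerplate => 0,
    }
}
```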
Score Formula (SlopScore::score())
score = (logic_clones_found.min(50) × 5)
+ (zombie_symbols_added × 10)
+ (antipattern_score.min(500)) ← sum of Severity::points() per finding
+ (comment_violations × 5)
+ (unlinked_pr × 20)
+ (hallucinated_security_fix × 100)
dead_symbols_added is tracked but excluded from score() (v7.6.2 — FFI false-positive elimination).
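A worked instance of the formula helps check the two caps. The struct below is a hypothetical mirror of the fields named in the formula, with unlinked_pr and hallucinated_security_fix modelled as 0-or-1 counts (an assumption).

```rust
/// Hypothetical mirror of the inputs to SlopScore::score().
#[derive(Debug, Default, Clone)]
pub struct SlopCounts {
    pub logic_clones_found: u32,
    pub zombie_symbols_added: u32,
    pub antipattern_score: u32,
    pub comment_violations: u32,
    /// 0 or 1 in this sketch.
    pub unlinked_pr: u32,
    /// 0 or 1 in this sketch.
    pub hallucinated_security_fix: u32,
    /// Tracked but deliberately not part of the score.
    pub dead_symbols_added: u32,
}

impl SlopCounts {
    /// Applies the documented weights, with clone pairs capped at 50
    /// and the antipattern sum capped at 500.
    pub fn score(&self) -> u32 {
        self.logic_clones_found.min(50) * 5
            + self.zombie_symbols_added * 10
            + self.antipattern_score.min(500)
            + self.comment_violations * 5
            + self.unlinked_pr * 20
            + self.hallucinated_security_fix * 100
    }
}
```

For example, 60 clone pairs, 2 zombies, an antipattern sum of 600, 1 comment violation, and an unlinked PR give 250 + 20 + 500 + 5 + 20 = 795.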
Total Economic Impact (TEI)
critical_threats_count = entries where is_critical_threat()
gc_only_count = entries where necrotic_flag.is_some() AND NOT is_critical_threat()
structural_slop_count = entries where slop_score > 0 AND NOT critical AND NOT necrotic
total_actionable_intercepts = critical_threats_count + gc_only_count + structural_slop_count
ci_compute_saved_usd = critical_threats_count × $150
gc_value_usd = gc_only_count × $20
structural_slop_usd = structural_slop_count × $20
total_economic_impact = ci_compute_saved_usd + gc_value_usd + structural_slop_usd
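The aggregation above is plain arithmetic over three counts. This sketch uses the default rates ($150 and $20) and hypothetical names for the result struct and function; the [billing] overrides are not modelled.

```rust
/// Aggregated figures, named after the variables in the TEI definition.
#[derive(Debug, PartialEq, Eq)]
pub struct Tei {
    pub total_actionable_intercepts: u64,
    pub ci_compute_saved_usd: u64,
    pub gc_value_usd: u64,
    pub structural_slop_usd: u64,
    pub total_economic_impact: u64,
}

/// Computes TEI from the three per-tier counts using the default rates.
pub fn total_economic_impact(critical: u64, gc_only: u64, structural: u64) -> Tei {
    let ci_compute_saved_usd = critical * 150;
    let gc_value_usd = gc_only * 20;
    let structural_slop_usd = structural * 20;
    Tei {
        total_actionable_intercepts: critical + gc_only + structural,
        ci_compute_saved_usd,
        gc_value_usd,
        structural_slop_usd,
        total_economic_impact: ci_compute_saved_usd + gc_value_usd + structural_slop_usd,
    }
}
```

With 2 critical, 3 GC-only, and 5 structural-slop entries, the result is 10 actionable intercepts and a TEI of 300 + 60 + 100 = $460.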
CSV Column Schema (16 columns, exact order)
1. PR_Number
2. Author
3. Score
4. Threat_Class "Critical" | "Necrotic" | "StructuralSlop" | "Boilerplate"
5. Unlinked_PR
6. Logic_Clones
7. Antipattern_IDs pipe-delimited rule labels (e.g. security:compiled_payload_anomaly|antipattern:ncd_anomaly)
8. Collided_PRs pipe-delimited collided PR numbers; empty if none
9. Time_Saved_Hours necrotic_count × triage_minutes_per_finding ÷ 60 (default 12 min; configurable via [billing] in janitor.toml)
10. Operational_Savings_USD ($150 critical / $20 GC-only / $0 otherwise; rates configurable via [billing] in janitor.toml)
11. Timestamp
12. PR_State
13. Is_Bot
14. Repo_Slug
15. Commit_SHA Git SHA of the PR head at bounce time; from --head or GITHUB_SHA; empty when unavailable
16. Policy_Hash BLAKE3 hex digest of janitor.toml at bounce time; empty when no manifest present (SOC 2 audit trail)
[billing] TOML table (override actuarial defaults):
[billing]
triage_minutes_per_finding = 12.0 # senior-engineer minutes per finding (Workslop 2026 default)
critical_threat_usd = 150.0 # billing rate for Critical Threats
necrotic_usd = 20.0 # billing rate for Necrotic GC flags
[webhook] TOML table (SIEM / Slack / Teams integration):
[webhook]
url = "https://hooks.slack.com/services/..."
secret = "env:JANITOR_WEBHOOK_SECRET" # or literal string for dev
events = ["critical_threat"] # "critical_threat" | "necrotic_flag" | "all"
PDF Report Structure
Global Report (render_global_markdown)
| Page | Content |
|---|---|
| 1 — Executive Summary | Timestamp, repo count, PR count; Critical Threats / Necrotic GC / TEI table; methodology footnote |
| 2 — Threat Distribution | ASCII bar chart — one line per repo, █ Critical, ░ Necrotic |
| 3 — Repository Breakdown | Table: Repository / PRs / Total Slop / Intercepts / Economic Impact / Worst PR |
| 4 — Top 10 Riskiest PRs | Cross-repo PRs with score > 50, ranked descending |
| — Scoring Methodology | Billing-tier table + score formula |
| — Appendix: Full Audit Log | Per-repo \newpage sections: metric table, Top 10 Sloppiest PRs, Top 10 Cleanest Contributors, C/C++ compile-time silos |
Single-Repo Report (render_markdown)
| Section | Content |
|---|---|
| Executive Summary | Workslop metric table (actionable intercepts, critical, GC, hours, TEI) |
| (page break) | |
| Scoring Methodology | Billing-tier table + score formula |
| Top 10 High-Risk PRs | Table + antipattern detail expansion |
| Necrotic PRs | Backlog Pruner GC flags |
| Structural Clones | MinHash clone pairs |
| Zombie Dependencies | Manifest scan results |
| Full PR Log | Every PR scored > 0 |
VI. THE COMMAND & CONTROL INTERFACE
Invoked via: janitor dashboard
Crate: crates/dashboard
Mode 1 — TargetSelection
Scans <gauntlet_dir> for cloned repositories. Displays them as a navigable list.
Repositories with bounce_log.ndjson modified within the last 10 seconds are tagged [ AUDIT ACTIVE ] (blinking).
List rescans automatically every 2 seconds.
Key Bindings:
| Key | Action |
|---|---|
| ↑ / ↓ | Navigate repository list |
| Enter | Open repository in ActiveSurveillance mode |
| q | Quit |
Mode 2 — ActiveSurveillance (per-repo)
Full-screen view. Layout: title bar (3 rows) → tab selector (3 rows) → content (fill) → footer (1 row). Log file polled for changes every 2 seconds.
Three tabs:
| Tab | Index | Content |
|---|---|---|
| Live Telemetry | 0 | PR delta feed: score, necrotic flag, clone count, antipattern detail strings, collided PR numbers |
| Structural Topology | 1 | Top-10 C++ compile-time silos ranked by transitive reach (petgraph). C++ graph rebuilt every 5 seconds when empty. |
| Swarm Intelligence | 2 | Structural clone cluster detection table: PR pairs, Jaccard similarity, band collisions |
Key Bindings:
| Key | Action |
|---|---|
| ← / → | Change tab |
| Esc / Backspace | Return to TargetSelection |
| q | Quit |
(Mode 3 — Static Dashboard draw_dashboard removed in v7.9.1. The WOPR TUI is the sole production view.)
VII. OPERATIONAL COMMANDS — COMPLETE JUSTFILE MANIFEST
Drop into the Nix development environment (nix develop). All tools pinned via flake.nix.
Scaffold workspace from scratch: write Cargo.toml, mkdir crates/, cargo new each crate. Destructive — resets existing workspace.
Definition of Done. Runs inside Nix shell if available:
cargo fmt --all -- --check → cargo clippy --workspace -- -D warnings → cargo check --workspace → cargo test --workspace
cargo build --release --workspace (inside Nix shell if available).
cargo clean + find . -name "*.rkyv" -delete — vaporises target artefacts and all rkyv registry files.
No-op. Token is injected at runtime via --token flag. Stateless auth model.
Updates version strings in Cargo.toml (root + all crates/ + tools/), README.md, docs/index.md, ARCHITECTURE.md, and CLAUDE.md. Runs cargo check as sanity pass.
Full release pipeline: audit → bump-version → cargo build --release → strip target/release/janitor → git commit → git tag v<version> → floating major tag (v<MAJOR>, force-pushed) → git push → gh release create → mkdocs gh-deploy.
Build gauntlet-runner (cargo build --release -p gauntlet-runner) then execute. Reads gauntlet_targets.txt. Uses gh pr diff subshells per PR (parallel-bounce mode). Accepts: --pr-limit, --timeout, --targets, --gauntlet-dir, --out-dir, --resume, --concurrency (0 = auto from RAM).
Build gauntlet-runner + cli (cargo build --release -p gauntlet-runner -p cli) then execute with --hyper --pr-limit 5000. Clones repos once via libgit2, fetches all PR refs — zero gh pr diff subshells. Accepts same flags as run-gauntlet.
uv run --with "mkdocs-material<9.6" --with "mkdocs<2" mkdocs gh-deploy --force — builds and pushes MkDocs site to GitHub Pages.
rsync -av --delete to /mnt/c/Projects/the-janitor/ — excludes target/, .git/, .janitor/shadow_src/.
VIII. R&D VAULT — EXPERIMENTAL CRATES
Located at crates/experimental/. Of the four crates listed, advanced_threats, backlog_pruner, and include_deflator are wired into production; phantom_ffi_gate was deleted in v7.9.1.
| Crate | File | Status | Function |
|---|---|---|---|
| advanced_threats | binary_hunter.rs | PRODUCTION (wired into slop_filter.rs + cli) | Zero-allocation AhoCorasick scanner for ELF/PE/WASM/miner byte patterns. 7 patterns. THREAT_LABEL = "security:compiled_payload_anomaly". +50 pts per match. |
| backlog_pruner | - | PRODUCTION (wired into forge) | Necrotic GC flag assignment: classifies PRs as SEMANTIC_NULL, GHOST_COLLISION, or UNWIRED_ISLAND. Populates necrotic_flag on SlopScore. |
| include_deflator | - | PRODUCTION, graduated v7.9.2. C/C++ transitive header dependency analyser. IncludeGraphBuilder used in git_drive.rs; powers architecture:compile_time_bloat and architecture:graph_entanglement antipatterns and WOPR Structural Topology tab. | C/C++ compile-time silo analysis |
| phantom_ffi_gate | - | DELETED (v7.9.1) | Architecture requires full-repo C++ registry (not patch-scope). Cannot be wired into PatchBouncer::bounce(); only produces false negatives on single-file diffs. |
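The seven byte patterns scanned by binary_hunter can be exercised with a naive search. This sketch swaps in a sliding-window comparison from the standard library instead of an Aho-Corasick automaton, so it illustrates the patterns and scoring rather than the production scanner's performance. Counting every occurrence as one match is an assumption, and the function names are hypothetical.

```rust
/// The seven patterns listed in the Tier 1 table.
pub const PATTERNS: [&[u8]; 7] = [
    b"\x7fELF",                 // ELF magic
    b"\x00asm\x01\x00\x00\x00", // WASM magic + version
    b"MZ\x90\x00\x03",          // PE header prefix
    b"/bin/sh\x00",
    b"cmd.exe\x00",
    b"stratum+tcp://",
    b"stratum2+tcp://",
];

/// Counts occurrences of every pattern with a naive sliding window.
/// A real multi-pattern automaton would do this in one pass.
pub fn count_matches(data: &[u8]) -> usize {
    PATTERNS
        .iter()
        .map(|pat| data.windows(pat.len()).filter(|w| w == pat).count())
        .sum()
}

/// +50 points per match, as stated in the table.
pub fn payload_points(data: &[u8]) -> usize {
    count_matches(data) * 50
}
```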
IX. FINAL VERSION
Extracted from [workspace.package].version in root Cargo.toml.
Release profile: opt-level = "z", lto = true, codegen-units = 1, strip = true, panic = "abort".
MSRV: rust-version = "1.88" (enforced by CI MSRV workflow).
Edition: 2021.
License: BUSL-1.1 (all workspace crates via license.workspace = true).
v7.9.4 — Architecture Inversion Implementation
Architecture Inversion (Steps 1–4 complete):
- Governor: POST /v1/analysis-token issues a short-lived (5-min TTL) Ed25519-signed JWT scoped to {repo}:{pr}:{head_sha}. Rate-limited: the same (repo, PR) pair cannot get a new token within 60 s. Controlled by GOVERNOR_INVERT_MODE=1.
- CLI: --report-url + --analysis-token. After append_bounce_log, if both flags are set, the CLI POSTs the BounceLogEntry JSON to the Governor's /v1/report with Authorization: Bearer <token>. Non-fatal: source code stays on the runner.
- Governor: POST /v1/report verifies the JWT, checks commit_sha == claims.head_sha, retrieves the pending check from the pending_checks DashMap, updates the GitHub Check Run, and removes the entry. Only active in invert mode.
- GitHub Action: invert_mode + governor_url inputs. A pre-bounce step fetches the analysis token from /v1/analysis-token; the token is passed to janitor bounce --report-url --analysis-token.
New env var: GOVERNOR_INVERT_MODE=1 — gates all inversion behaviour in the Governor. Default: 0 (legacy clone path).
New CLI flags: janitor bounce --report-url <url> --analysis-token <jwt>
New AppState fields: invert_mode: bool, token_rate_limit: DashMap, pending_checks: DashMap
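The 60-second rule on /v1/analysis-token can be sketched as a per-key cooldown. This illustration uses a plain `HashMap` behind `&mut self` rather than the DashMap named above, takes the current time as a parameter so it can be tested deterministically, and uses hypothetical names throughout.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Minimum spacing between tokens for the same (repo, PR) pair.
pub const TOKEN_COOLDOWN: Duration = Duration::from_secs(60);

/// Hypothetical stand-in for the token_rate_limit map in AppState.
#[derive(Default)]
pub struct TokenRateLimit {
    last_issued: HashMap<(String, u64), Instant>,
}

impl TokenRateLimit {
    /// Returns true and records `now` if a token may be issued for (repo, pr).
    /// A rejected request does not reset the cooldown.
    pub fn try_issue(&mut self, repo: &str, pr: u64, now: Instant) -> bool {
        let key = (repo.to_string(), pr);
        let allowed = self
            .last_issued
            .get(&key)
            .map_or(true, |&prev| now.duration_since(prev) >= TOKEN_COOLDOWN);
        if allowed {
            self.last_issued.insert(key, now);
        }
        allowed
    }
}
```

Keying on the (repo, PR) pair means one noisy PR cannot exhaust the limit for other PRs in the same repository.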
End of SOVEREIGN BRIEFING.