
SOVEREIGN BRIEFING — THE JANITOR ENGINE

Canonical state document. Generated by direct source audit. All values derive from code, not prior documentation. Supersedes all predecessor files.


I. CORE PRIMITIVES

Runtime Language Infrastructure

Grammar polyglot Static Extension(s)
Rust RUST rs
Python PYTHON py
JavaScript JAVASCRIPT js, jsx
TypeScript TYPESCRIPT ts, tsx
C++ CPP cpp, cxx, cc, hpp
C C c, h
Java JAVA java
C# CSHARP cs
Go GO go
GLSL GLSL glsl, vert, frag
Objective-C OBJC m, mm
YAML YAML yaml, yml
Bash BASH sh, bash
Scala SCALA scala
Ruby RUBY rb
PHP PHP php
Swift SWIFT swift
Lua LUA lua
HCL/Terraform HCL tf, hcl
Nix NIX nix
GDScript GDSCRIPT gd
Kotlin KOTLIN kt, kts

23 grammars total. Each grammar is a OnceLock<Language> static — loaded once on first use, zero per-call allocation thereafter.
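A minimal sketch of the load-once pattern described above, using only the standard library. The Language struct here is a hypothetical stand-in for the tree-sitter handle, and rust_language() is an illustrative accessor name, not a function confirmed by the source.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for a tree-sitter `Language` handle.
#[derive(Debug, PartialEq)]
pub struct Language {
    pub name: &'static str,
}

/// One static per grammar, mirroring the RUST row of the table above.
static RUST: OnceLock<Language> = OnceLock::new();

/// Returns the shared handle. The closure runs on the first call only;
/// every later call returns the same reference without allocating.
pub fn rust_language() -> &'static Language {
    RUST.get_or_init(|| Language { name: "rust" })
}
```

Two calls to rust_language() return the same address, which is what makes the static safe to share across worker threads.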

Active AST security rules: 23/23 grammars (100%) have dedicated find_<lang>_slop or find_<lang>_danger_nodes AST walks in crates/forge/src/slop_hunter.rs (v8.8.0). Phase 7 completed full grammar coverage: Rust (unsafe transmute + raw ptr deref), GLSL (dangerous extension byte scan), HCL/Terraform (data external + local-exec provisioner AST walk), and TSX/JSX (dangerouslySetInnerHTML attribute walk).

Active offensive expansion lanes: Live-Tenant AEG HTML Harness Generation, GraphQL/AsyncAPI Trust Boundary Extraction, and Web3 EVM Invariant Checking are first-class enforcement surfaces, not roadmap placeholders.

Grammar library: tree-sitter 0.26 (workspace pinned).

Foundational Crates & Mathematical Models

Primitive Crate / Constant Purpose
Fuzzy clone detection AstSimHasher — SimHash over (kind_id u16, depth u32) feature pairs Detects structural refactors without textual identity
Swarm clustering LshIndex — MinHash, 8 bands × 8 rows, 64-hash sketch O(1) amortised PR → clone-cluster lookup
Patch entropy gate check_entropy: zstd::encode_all level 3, ratio = compressed/raw NCD verbosity detection
Byte lattice shield ByteLatticeAnalyzer — windowed Shannon entropy, 512-byte window, 256-byte stride Binary/generated blob detection
Compiled payload scanner binary_hunter — AhoCorasick over 7 static byte patterns ELF/PE/WASM/miner injection detection
Merge simulation simulate_merge via libgit2 diff.foreach + RefCell patch accumulation Zero disk checkout; O(PR-diff) memory
Symbol persistence rkyv 0.8 — zero-copy archive format IPC registry at .janitor/symbols.rkyv
Hot-swap registry arc_swap::ArcSwap<SymbolRegistry> Lock-free atomic reload without daemon downtime
Memory backpressure Physarum: sysinfo 0.30, threshold model System-aware concurrency gate; prevents OOM
Dependency graph petgraph 0.7 — transitive compile-time reach analysis C++ silo detection in Structural Topology tab
Pattern matching aho_corasick 1.1 — single automaton per pattern group (OnceLock) Hot-path multi-pattern search without recompilation
Hashing blake3 1.5 — exact structural hash per function body Zombie/clone identity
Security — Dual-PQC ML-DSA-65 (FIPS 204) + SLH-DSA-SHAKE-192s (FIPS 205) — PRODUCTION. Dual-PQC key bundles (4128 bytes) generated via janitor generate-keys. The Governor signs every CycloneDX v1.6 CBOM bond with both algorithms. pqc_enforced: true in janitor.toml blocks PR merge if bond issuance fails. Post-quantum attestation for CycloneDX v1.6 CBOM bonds
Security — Ed25519 vault::SigningOracle::verify_token — public-key-only; no private key in binary API token gate for janitor_clean (MCP) and /v1/attest
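The Byte lattice shield row can be illustrated with a windowed Shannon entropy pass. This is a sketch under stated assumptions: the 512-byte window and 256-byte stride come from the table, while the function names, the handling of short inputs, and the choice to ignore a trailing partial window are hypothetical. The threshold that marks a blob as anomalous is not given in the source and is not modelled.

```rust
/// Window and stride taken from the table row; everything else is assumed.
pub const WINDOW: usize = 512;
pub const STRIDE: usize = 256;

/// Shannon entropy of a byte slice in bits per byte (range 0.0 to 8.0).
pub fn shannon_entropy(window: &[u8]) -> f64 {
    if window.is_empty() {
        return 0.0;
    }
    let mut counts = [0usize; 256];
    for &b in window {
        counts[b as usize] += 1;
    }
    let len = window.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}

/// Highest entropy over all full windows. Inputs no longer than one window
/// are scored as a single window; a trailing partial window is ignored
/// (an assumption of this sketch).
pub fn max_window_entropy(data: &[u8]) -> f64 {
    if data.len() <= WINDOW {
        return shannon_entropy(data);
    }
    let mut max = 0.0f64;
    let mut start = 0;
    while start + WINDOW <= data.len() {
        max = max.max(shannon_entropy(&data[start..start + WINDOW]));
        start += STRIDE;
    }
    max
}
```

Taking the maximum over windows means a small high-entropy region embedded in otherwise plain text still stands out, which is the point of windowing rather than scoring the whole file at once.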

II. POLICY LAYER

janitor.toml at the repository root encodes maintainer-controlled slop tolerance as version-controlled, reviewable configuration. Loaded by JanitorPolicy::load() in crates/common/src/policy.rs. Full field reference: Setup → janitor.toml Reference.

Field Default Purpose
min_slop_score 100 Gate threshold — raw score ceiling
require_issue_link false Block PRs with no #N reference
allowed_zombies false Downgrade zombie veto to warning
pqc_enforced false Block PR when ML-DSA-65 bond fails (needs Sentinel)
refactor_bonus 0 Raise gate for [REFACTOR]/[FIXES-DEBT] PRs
custom_antipatterns [] Project-specific .scm query files
trusted_bot_authors [] Handles exempt from unlinked-PR penalty
[forge].automation_accounts [] Ecosystem accounts without [bot] suffix
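The defaults in the table can be expressed as a Default implementation. This is a sketch, not the struct in crates/common/src/policy.rs: the field types, the omission of the [forge] sub-table, and the effective_gate helper (which assumes the bonus is additive and that the tags appear in the PR title) are all assumptions.

```rust
/// Illustrative mirror of the field table; types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct JanitorPolicy {
    pub min_slop_score: u32,
    pub require_issue_link: bool,
    pub allowed_zombies: bool,
    pub pqc_enforced: bool,
    pub refactor_bonus: u32,
    pub custom_antipatterns: Vec<String>,
    pub trusted_bot_authors: Vec<String>,
}

impl Default for JanitorPolicy {
    fn default() -> Self {
        Self {
            min_slop_score: 100,
            require_issue_link: false,
            allowed_zombies: false,
            pqc_enforced: false,
            refactor_bonus: 0,
            custom_antipatterns: Vec::new(),
            trusted_bot_authors: Vec::new(),
        }
    }
}

impl JanitorPolicy {
    /// Hypothetical helper: raises the gate by `refactor_bonus` when the
    /// title carries a [REFACTOR] or [FIXES-DEBT] tag.
    pub fn effective_gate(&self, pr_title: &str) -> u32 {
        let tagged = pr_title.contains("[REFACTOR]") || pr_title.contains("[FIXES-DEBT]");
        if tagged {
            self.min_slop_score.saturating_add(self.refactor_bonus)
        } else {
            self.min_slop_score
        }
    }
}
```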

III. EXECUTION PIPELINE — HYPER-GAUNTLET

Entry point: tools/gauntlet-runner binary, invoked via just hyper-gauntlet or just run-gauntlet.

Stage 0 — Pre-emptive Workspace Purge

For each target repo (non-resume mode):
  rm -rf <gauntlet_dir>/<owner>/<repo>
  # Clean slate: no stale bounce_log.ndjson bleeds into aggregate

Resume mode (--resume):
  Save <repo>/.janitor/bounce_log.ndjson → memory buffer
  Perform clone/fetch below
  Restore buffer → <repo>/.janitor/bounce_log.ndjson

Stage 1 — Repository Hydration

Hyper-drive mode (--hyper, default for just hyper-gauntlet):

git clone --no-checkout https://github.com/<owner>/<repo>.git <repo_dir>
git -C <repo_dir> fetch origin 'refs/pull/*/head:refs/remotes/origin/pr/*'

Zero gh pr diff subshells. All PR refs land in the packfile.

Parallel-bounce mode (default for just run-gauntlet):

gh pr list --json number,author,mergeable,state --limit <PR_LIMIT>
Filter: exclude CONFLICTING, exclude Bot authors
For each open PR (rayon pool sized by `detect_optimal_concurrency()`, GIT_LOCK mutex):
  gh pr diff <N> --repo <slug>   → diff text
  janitor bounce <repo_dir> --patch <diff> --pr-number <N> --author <A> --format json

Stage 2 — In-Memory Merge Simulation (Hyper-drive)

janitor hyper-drive <repo_dir> --pr-limit <N> --timeout <S>s
  → build_symbols_rkyv(repo_path, base_sha)    [git_drive.rs — Necrotic Hydration]
  → For each PR ref:
      simulate_merge(repo, base_oid, pr_head_oid) → MergeSnapshot {
          blobs:   HashMap<PathBuf, Vec<u8>>    // Full HEAD blob per file
          patches: HashMap<PathBuf, String>     // Actual unified diff per file
          deleted: Vec<PathBuf>
          total_bytes: usize
      }
      Routing:
        snapshot.blobs   → IncludeGraphBuilder + SemanticNull
        snapshot.patches → PatchBouncer (SlopHunter, AstSimHasher, NCD)
      iter_by_priority() feeds high-SLOP-vector extensions first (Chemotaxis):
        "rs","py","go","js","ts","tsx","jsx","cs","java","cpp","cc","cxx","c"

Stage 3 — PatchBouncer Per-File (sequence within bounce())

1. Language detection          lang_for_ext() from "+++ b/<path>" header
2. Circuit breaker             patch > 1 MiB → skip
3. IAC bypass                  ext in IAC_TEXT_EXTS → skip ByteLatticeAnalyzer
4. Binary asset bypass         ext in BINARY_ASSET_EXTS → skip ByteLatticeAnalyzer
5. ByteLatticeAnalyzer         AnomalousBlob? → antipattern:agnostic_shield_anomaly
6. Extract added lines         "+"-prefixed diff lines only
7. Tree-sitter parse           grammar for lang; ERROR/MISSING nodes → neutral score
8. Structural hashing          BLAKE3 exact + SimHash fuzzy per function/method
9. Logic clone detection       Hamming(a, b) ≤ 3 → Refactor; 4–9 → Zombie
10. find_slop(lang, source)    Language-specific AST antipatterns (see Threat Matrix)
11. check_entropy(patch_bytes) NCD verbosity gate (zstd ratio < 0.05)
12. binary_hunter::scan()      AhoCorasick 7-pattern compiled payload scan
13. CommentScanner             Banned-phrase detection in added comment nodes
14. is_pr_unlinked(pr_body)    No linked issue → +20 pts
15. Collider lookup            LshIndex.query(PrDeltaSignature) → collided_pr_numbers
16. Necrotic Hydration check   backlog_pruner verdict → necrotic_flag
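Step 9 can be sketched as a Hamming-distance classifier over SimHash fingerprints. The 64-bit fingerprint width, the CloneClass enum, and the Distinct variant for distances above 9 are assumptions; only the thresholds come from the list above.

```rust
/// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum CloneClass {
    Refactor,
    Zombie,
    Distinct,
}

/// Number of differing bits between two fingerprints.
pub fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Thresholds from step 9: distance <= 3 is Refactor, 4 to 9 is Zombie.
pub fn classify_clone(a: u64, b: u64) -> CloneClass {
    match hamming(a, b) {
        0..=3 => CloneClass::Refactor,
        4..=9 => CloneClass::Zombie,
        _ => CloneClass::Distinct,
    }
}
```

XOR followed by a population count keeps the comparison to a couple of machine instructions per pair, which matters when every function body in a patch is compared against the registry.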

Stage 4 — Bounce Log Persistence

Each PR result appended to <repo_dir>/.janitor/bounce_log.ndjson via append_bounce_log(). f.sync_all() called after every write — survives SIGKILL.
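The append-then-sync pattern looks roughly like this with the standard library. The function name append_ndjson_line is hypothetical and chosen to avoid implying this is the body of append_bounce_log(); the caller is assumed to pass one already-serialised JSON object per call.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Appends one NDJSON record and forces it to stable storage before returning.
pub fn append_ndjson_line(path: &Path, json_line: &str) -> io::Result<()> {
    let mut f = OpenOptions::new().create(true).append(true).open(path)?;
    f.write_all(json_line.as_bytes())?;
    f.write_all(b"\n")?;
    // sync_all flushes file data and metadata to disk, so a record that
    // returned Ok is not lost if the process is killed afterwards.
    f.sync_all()
}
```

Syncing after every record trades throughput for durability: each PR result costs one fsync, but a killed run leaves a log that ends on a complete line.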

Stage 5 — Global Aggregation (post all repos)

Two threads spawned in parallel after sequential repo processing:

Thread A: janitor report --global --format pdf  → <gauntlet_dir>/global_report.pdf
Thread B: janitor export --global               → <gauntlet_dir>/export.csv

IV. THE THREAT MATRIX

All threats detected by PatchBouncer::bounce() in crates/forge/src/slop_filter.rs.

Tier 1 — Critical Threats (security: prefix → $150/intercept)

ID Detector Condition Points
security:compiled_payload_anomaly binary_hunter ELF magic \x7fELF, WASM \x00asm\x01\x00\x00\x00, PE MZ\x90\x00\x03, /bin/sh\x00, cmd.exe\x00, stratum+tcp://, stratum2+tcp:// +50 per match
security:signature_replay solidity_taint Solidity ecrecover flow lacks nonce consumption or block.chainid domain separation +50 per match
security:unsafe_delegatecall solidity_taint Solidity delegatecall target is derived from caller-controlled input without an owner/role guard +50 per match
Swarm Collision LshIndex collided_pr_numbers non-empty Categorical → $150 billing

Tier 2 — Architectural Antipatterns (AST-derived)

Detected by find_slop(lang, source) in crates/forge/src/slop_hunter.rs:

Language Pattern Severity Points
YAML VirtualService/Ingress/HTTPRoute/Gateway with hosts: ["*"] Critical 50
C gets() call (removed in C11; unbounded buffer overflow) Critical 50
HCL/Terraform Open CIDR 0.0.0.0/0 in ingress rule Critical 50

Tier 3 — NCD Verbosity (antipattern: prefix → Warning tier)

ID Detector Condition Points
antipattern:ncd_anomaly check_entropy compressed/raw < 0.05 AND patch ≥ 256 bytes 10

Critical distinction: antipattern: prefix — NOT security:. is_critical_threat() gates on "security:". NCD intentionally uses antipattern: to avoid $150 categorical billing.

Tier 4 — Structural Quality

Detector Condition Score Effect
Logic clone detection Hamming ≤ 3 (Refactor-class similarity) +5 per clone pair (capped at 50 pairs)
Zombie symbols Dead body hash matches symbol in registry +10 per zombie
Comment violations Banned phrase in added comment +5 per violation
Unlinked PR No issue reference in PR body +20
Hallucinated security fix Security keywords, non-code-only diff +100

Bypass Rules

Condition Effect
ext in IAC_TEXT_EXTS (nix lock json toml yaml yml csv) Skip ByteLatticeAnalyzer
ext in BINARY_ASSET_EXTS (wasm woff woff2 eot ttf png jpg jpeg gif ico zip gz tar pdf) Skip ByteLatticeAnalyzer
patch > 1 MiB Skip entire file
Nix entities Protection::WisdomRule — all symbols shielded from dead-code classification
Bot author (app/ prefix, [bot] suffix, trusted list, forge.automation_accounts) Score still computed; billing classification unchanged

V. THE ACTUARIAL LEDGER

Classification Function (crates/cli/src/report.rs)

pub fn is_critical_threat(e: &BounceLogEntry) -> bool {
    e.antipatterns.iter().any(|a| a.contains("security:"))
        || !e.collided_pr_numbers.is_empty()
}

Billing Tiers

Classification Condition Rate
Critical Threat is_critical_threat(e) == true $150 per intercept
GC-only Necrotic (necrotic_flag.is_some() OR !zombie_deps.is_empty()) AND NOT critical $20 per intercept
StructuralSlop slop_score > 0 AND NOT critical AND NOT necrotic $20 per intercept
Boilerplate slop_score == 0, no threat signal $0
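The tier table can be read as a priority-ordered classifier. In this sketch the Entry struct is a reduced, hypothetical stand-in for BounceLogEntry, the ThreatClass enum reuses the labels from the CSV schema, and the GC-only condition is read as (necrotic flag OR zombie deps) AND NOT critical, which is an assumption about operator precedence.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ThreatClass {
    Critical,
    Necrotic,
    StructuralSlop,
    Boilerplate,
}

/// Reduced stand-in for BounceLogEntry; field types are assumed.
#[derive(Debug, Default)]
pub struct Entry {
    pub antipatterns: Vec<String>,
    pub collided_pr_numbers: Vec<u32>,
    pub necrotic_flag: Option<String>,
    pub zombie_deps: Vec<String>,
    pub slop_score: u32,
}

pub fn is_critical_threat(e: &Entry) -> bool {
    e.antipatterns.iter().any(|a| a.contains("security:")) || !e.collided_pr_numbers.is_empty()
}

/// Tiers are checked top-down, so each entry lands in exactly one class.
pub fn classify(e: &Entry) -> ThreatClass {
    if is_critical_threat(e) {
        ThreatClass::Critical
    } else if e.necrotic_flag.is_some() || !e.zombie_deps.is_empty() {
        ThreatClass::Necrotic
    } else if e.slop_score > 0 {
        ThreatClass::StructuralSlop
    } else {
        ThreatClass::Boilerplate
    }
}

/// Default rates from the table, in whole US dollars.
pub fn rate_usd(class: ThreatClass) -> u32 {
    match class {
        ThreatClass::Critical => 150,
        ThreatClass::Necrotic | ThreatClass::StructuralSlop => 20,
        ThreatClass::Boilerplate => 0,
    }
}
```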

Score Formula (SlopScore::score())

score = (logic_clones_found.min(50) × 5)
      + (zombie_symbols_added × 10)
      + (antipattern_score.min(500))      ← sum of Severity::points() per finding
      + (comment_violations × 5)
      + (unlinked_pr × 20)
      + (hallucinated_security_fix × 100)

dead_symbols_added is tracked but excluded from score() (v7.6.2 — FFI false-positive elimination).
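The formula transcribes directly into Rust. The struct below is a sketch: field names follow the formula, but the integer types and the encoding of unlinked_pr and hallucinated_security_fix as 0 or 1 are assumptions.

```rust
/// Illustrative mirror of the score inputs; types are assumed.
#[derive(Debug, Default, Clone)]
pub struct SlopScore {
    pub logic_clones_found: u32,
    pub zombie_symbols_added: u32,
    pub antipattern_score: u32,
    pub comment_violations: u32,
    pub unlinked_pr: u32,               // assumed 0 or 1
    pub hallucinated_security_fix: u32, // assumed 0 or 1
    pub dead_symbols_added: u32,        // tracked, excluded from score()
}

impl SlopScore {
    pub fn score(&self) -> u32 {
        self.logic_clones_found.min(50) * 5
            + self.zombie_symbols_added * 10
            + self.antipattern_score.min(500)
            + self.comment_violations * 5
            + self.unlinked_pr * 20
            + self.hallucinated_security_fix * 100
    }
}
```

With 60 clone pairs, 2 zombies, 600 antipattern points, 1 comment violation, and an unlinked PR, the caps apply and the total is 250 + 20 + 500 + 5 + 20 = 795.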

Total Economic Impact (TEI)

critical_threats_count    = entries where is_critical_threat()
gc_only_count             = entries where necrotic_flag.is_some() AND NOT is_critical_threat()
structural_slop_count     = entries where slop_score > 0 AND NOT critical AND NOT necrotic
total_actionable_intercepts = critical_threats_count + gc_only_count + structural_slop_count

critical_threat_bounty_usd = critical_threats_count × $150
gc_value_usd               = gc_only_count × $20
structural_slop_usd        = structural_slop_count × $20
total_economic_impact      = critical_threat_bounty_usd + gc_value_usd + structural_slop_usd
total_ci_energy_saved_kwh  = sum(ci_energy_saved_kwh per entry)   # configurable via [billing] ci_kwh_per_run

CSV Column Schema (16 columns, exact order)

 1. PR_Number
 2. Author
 3. Score
 4. Threat_Class          "Critical" | "Necrotic" | "StructuralSlop" | "Boilerplate"
 5. Unlinked_PR
 6. Logic_Clones
 7. Antipattern_IDs       pipe-delimited rule labels (e.g. security:compiled_payload_anomaly|antipattern:ncd_anomaly)
 8. Collided_PRs          pipe-delimited collided PR numbers; empty if none
 9. Time_Saved_Hours      necrotic_count × triage_minutes_per_finding ÷ 60 (default 12 min; configurable via [billing] in janitor.toml)
10. Operational_Savings_USD ($150 critical / $20 GC-only / $0 otherwise; rates configurable via [billing] in janitor.toml)
11. Timestamp
12. PR_State
13. Is_Bot
14. Repo_Slug
15. Commit_SHA            Git SHA of the PR head at bounce time; from --head or GITHUB_SHA; empty when unavailable
16. Policy_Hash           BLAKE3 hex digest of janitor.toml at bounce time; empty when no manifest present (SOC 2 audit trail)

[billing] TOML table (override actuarial defaults):

[billing]
triage_minutes_per_finding = 12.0   # senior-engineer minutes per finding (Workslop 2026 default)
critical_threat_usd        = 150.0  # billing rate for Critical Threats
necrotic_usd               = 20.0   # billing rate for Necrotic GC flags

[webhook] TOML table (SIEM / Slack / Teams integration):

[webhook]
url    = "https://hooks.slack.com/services/..."
secret = "env:JANITOR_WEBHOOK_SECRET"   # or literal string for dev
events = ["critical_threat"]            # "critical_threat" | "necrotic_flag" | "all"

PDF Report Structure

Global Report (render_global_markdown)

Page Content
1 — Executive Summary Timestamp, repo count, PR count; Critical Threats / Necrotic GC / TEI table; methodology footnote
2 — Threat Distribution ASCII bar chart — one line per repo, Critical, Necrotic
3 — Repository Breakdown Table: Repository / PRs / Total Slop / Intercepts / Economic Impact / Worst PR
4 — Top 10 Riskiest PRs Cross-repo PRs with score > 50, ranked descending
— Scoring Methodology Billing-tier table + score formula
— Appendix: Full Audit Log Per-repo \newpage sections: metric table, Top 10 Sloppiest PRs, Top 10 Cleanest Contributors, C/C++ compile-time silos

Single-Repo Report (render_markdown)

Section Content
Executive Summary Workslop metric table (actionable intercepts, critical, GC, hours, TEI)
(page break)
Scoring Methodology Billing-tier table + score formula
Top 10 High-Risk PRs Table + antipattern detail expansion
Necrotic PRs Backlog Pruner GC flags
Structural Clones MinHash clone pairs
Zombie Dependencies Manifest scan results
Full PR Log Every PR scored > 0

VI. THE COMMAND & CONTROL INTERFACE

Invoked via: janitor dashboard
Crate: crates/dashboard

Mode 1 — TargetSelection

Scans <gauntlet_dir> for cloned repositories. Displays them as a navigable list. Repositories with bounce_log.ndjson modified within the last 10 seconds are tagged [ AUDIT ACTIVE ] (blinking). List rescans automatically every 2 seconds.

Key Bindings:

Key Action
/ Navigate repository list
Enter Open repository in ActiveSurveillance mode
q Quit

Mode 2 — ActiveSurveillance (per-repo)

Full-screen view. Layout: title bar (3 rows) → tab selector (3 rows) → content (fill) → footer (1 row). Log file polled for changes every 2 seconds.

Three tabs:

Tab Index Content
Live Telemetry 0 PR delta feed: score, necrotic flag, clone count, antipattern detail strings, collided PR numbers
Structural Topology 1 Top-10 C++ compile-time silos ranked by transitive reach (petgraph). C++ graph rebuilt every 5 seconds when empty.
Swarm Intelligence 2 Structural clone cluster detection table: PR pairs, Jaccard similarity, band collisions

Key Bindings:

Key Action
/ Change tab
Esc / Backspace Return to TargetSelection
q Quit

(Mode 3 — Static Dashboard draw_dashboard removed. The WOPR TUI is the sole production view.)


VII. OPERATIONAL COMMANDS — COMPLETE JUSTFILE MANIFEST

just shell
Drop into the Nix development environment (nix develop). All tools pinned via flake.nix.

just init
Scaffold workspace from scratch: write Cargo.toml, mkdir crates/, cargo new each crate. Destructive — resets existing workspace.

just audit
Definition of Done. Runs inside Nix shell if available: cargo fmt --all -- --check → cargo clippy --workspace -- -D warnings → cargo check --workspace → cargo test --workspace

just build
cargo build --release --workspace (inside Nix shell if available).

just clean
cargo clean + find . -name "*.rkyv" -delete — vaporises target artefacts and all rkyv registry files.

just auth-refresh
No-op. Token is injected at runtime via --token flag. Stateless auth model.

just bump-version <version>
Updates version strings in Cargo.toml (root + all crates/ + tools/), README.md, docs/index.md, ARCHITECTURE.md, and CLAUDE.md. Runs cargo check as sanity pass.

just release <version>
Linear release entrypoint: runs just audit once, then delegates to just fast-release <version> for cargo build --release → strip target/release/janitor → git commit → git tag v<version> → floating major tag (v<MAJOR>, force-pushed) → git push → gh release create → mkdocs gh-deploy.

just run-gauntlet [*ARGS]
Build gauntlet-runner (cargo build --release -p gauntlet-runner) then execute. Reads gauntlet_targets.txt. Uses gh pr diff subshells per PR (parallel-bounce mode). Accepts: --pr-limit, --timeout, --targets, --gauntlet-dir, --out-dir, --resume, --concurrency (0 = auto from RAM).

just hyper-gauntlet [*ARGS]
Build gauntlet-runner + cli (cargo build --release -p gauntlet-runner -p cli) then execute with --hyper --pr-limit 5000. Clones repos once via libgit2, fetches all PR refs — zero gh pr diff subshells. Accepts same flags as run-gauntlet.

just deploy-docs
uv run --with "mkdocs-material<9.6" --with "mkdocs<2" mkdocs gh-deploy --force — builds and pushes MkDocs site to GitHub Pages.

just sync
rsync -av --delete to /mnt/c/Projects/the-janitor/ — excludes target/, .git/, .janitor/shadow_src/.


VIII. R&D VAULT — EXPERIMENTAL CRATES

Located at crates/experimental/. Per the Status column below, advanced_threats, backlog_pruner, and include_deflator are wired into the production forge pipeline; phantom_ffi_gate is deleted.

Crate File Status Function
advanced_threats binary_hunter.rs PRODUCTION (wired into slop_filter.rs + cli) Zero-allocation AhoCorasick scanner for ELF/PE/WASM/miner byte patterns. 7 patterns. THREAT_LABEL = "security:compiled_payload_anomaly". +50 pts per match.
backlog_pruner PRODUCTION (wired into forge) Necrotic GC flag assignment: classifies PRs as SEMANTIC_NULL, GHOST_COLLISION, or UNWIRED_ISLAND. Populates necrotic_flag on SlopScore.
include_deflator PRODUCTION. C/C++ transitive header dependency analyser. IncludeGraphBuilder used in git_drive.rs; powers architecture:compile_time_bloat and architecture:graph_entanglement antipatterns and WOPR Structural Topology tab. C/C++ compile-time silo analysis
phantom_ffi_gate DELETED Architecture requires full-repo C++ registry (not patch-scope). Cannot be wired into PatchBouncer::bounce() — only produces false negatives on single-file diffs.

IX. FINAL VERSION

v10.2.0-beta.2

Extracted from [workspace.package].version in root Cargo.toml.

Release profile: opt-level = "z", lto = true, codegen-units = 1, strip = true, panic = "abort". MSRV: rust-version = "1.88" (enforced by CI MSRV workflow). Edition: 2021. License: BUSL-1.1 (all workspace crates via license.workspace = true).

Architecture Inversion Implementation

Architecture Inversion (Steps 1–4 complete):

  1. Governor: POST /v1/analysis-token — issues a short-lived (5-min TTL) Ed25519-signed JWT scoped to {repo}:{pr}:{head_sha}. Rate-limited: same (repo, PR) pair cannot get a new token within 60 s. Controlled by GOVERNOR_INVERT_MODE=1.

  2. CLI: --report-url + --analysis-token — after append_bounce_log, if both flags are set, POSTs the BounceLogEntry JSON to the Governor's /v1/report with Authorization: Bearer <token>. Non-fatal: source code stays on the runner.

  3. Governor: POST /v1/report — verifies the JWT, checks commit_sha == claims.head_sha, retrieves the pending check from pending_checks DashMap, updates the GitHub Check Run, removes the entry. Only active in invert mode.

  4. GitHub Action: invert_mode + governor_url inputs — pre-bounce step fetches the analysis token from /v1/analysis-token; token is passed to janitor bounce --report-url --analysis-token.

New env var: GOVERNOR_INVERT_MODE=1, which gates all inversion behaviour in the Governor. Default: 0 (legacy clone path).
New CLI flags: janitor bounce --report-url <url> --analysis-token <jwt>
New AppState fields: invert_mode: bool, token_rate_limit: DashMap, pending_checks: DashMap
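The 60-second rule from step 1 can be sketched as a keyed timestamp map. This version uses a plain HashMap behind &mut self rather than the DashMap named above, and TokenRateLimiter and try_issue are hypothetical names; the caller passes the current Instant so the logic stays deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical limiter: at most one token per (repo, PR) per window.
pub struct TokenRateLimiter {
    window: Duration,
    last_issued: HashMap<(String, u64), Instant>,
}

impl TokenRateLimiter {
    pub fn new(window: Duration) -> Self {
        Self { window, last_issued: HashMap::new() }
    }

    /// Returns true and records the issuance when the pair is outside the
    /// window; returns false without touching state when it is inside.
    pub fn try_issue(&mut self, repo: &str, pr: u64, now: Instant) -> bool {
        let key = (repo.to_string(), pr);
        let allowed = match self.last_issued.get(&key) {
            Some(prev) => now.duration_since(*prev) >= self.window,
            None => true,
        };
        if allowed {
            self.last_issued.insert(key, now);
        }
        allowed
    }
}
```

A rejected request does not refresh the timestamp, so a client that retries every second is still admitted 60 seconds after its last successful issuance.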


X. SOVEREIGN CONTROL PLANE (AIR-GAP READY)

The Janitor Sovereign Governor (janitor-gov) is a self-contained binary that runs your Check Run enforcement service entirely on-premise — no cloud dependency, no inbound internet, no data egress.

Architecture

Component Technology Role
Governance API Axum (Rust) — POST /v1/analysis-token, POST /v1/report, POST /v1/attest Receives signed score reports from the CLI runner; updates GitHub Check Runs
Persistent Storage SQLite (via sqlx) — single file governor.db Stores pending checks, marketplace subscriptions, API key state, and audit trail
Attestation Key ML-DSA-65 (FIPS 204) — governor.key generated on first run Signs every CycloneDX CBOM bond; key never leaves the host
Local Key Intake --pqc-key file:<path> Loads ML-DSA-65 signing key from an on-premise file
KMS Integration --pqc-key awskms:<key-id>, --pqc-key azkv:<vault>/<key> Enterprise KMS delegation without key material in CLI memory
PKCS#11 HSM --pqc-key pkcs11:<slot> Hardware Security Module integration for air-gap labs

Local-First Deployment

# janitor.toml — air-gap configuration
pqc_enforced = true          # block merge if CBOM bond fails
governor_url = "https://janitor.internal"

[billing]
ci_kwh_per_run = 0.08        # kWh per CI run; override with site-measured data (for reference, PUE 1.4 × 400 W × 15 min = 0.14 kWh)

The janitor-gov binary starts with:

GOVERNOR_DB_PATH=/opt/janitor/governor.db \
GOVERNOR_INVERT_MODE=1 \
./janitor-gov --pqc-key file:/opt/janitor/keys/governor.key

No network call is required to a remote attestation service. The full Check Run lifecycle — token issuance, score ingestion, status update — runs on-device.

Compliance Posture

Framework Satisfied By
FedRAMP High — AU-2 Immutable bounce log (bounce_log.ndjson, f.sync_all() after every entry)
FedRAMP High — SC-28 SQLite storage under operator-controlled encryption; no cloud egress
DISA STIG — V-222608 Zero outbound data from CI runner; Governor receives score only
DISA STIG — V-222449 ML-DSA-65 (FIPS 204) attestation on every CBOM bond
IL5 / Air-Gap Networks Full functionality with no inbound or outbound internet

X-B. UNIVERSAL SCM SUPPORT

The ScmContext abstraction decouples the Janitor engine from any single source control platform. The analysis engine (PatchBouncer, ForgeConfig, bounce log) is platform-agnostic. ScmContext provides the thin adapter layer.

Supported Platforms

Platform CI Runtime Check Run Delivery
GitHub Actions action.yml step in .github/workflows/ GitHub Checks API via Governor
GitLab CI .gitlab-ci.yml script block; $CI_MERGE_REQUEST_DIFF_BASE_SHA GitLab MR status API via Governor
Bitbucket Pipelines bitbucket-pipelines.yml script step; $BITBUCKET_PR_DESTINATION_BRANCH Bitbucket Build Status API
Azure DevOps Azure Pipelines YAML task; $(System.PullRequest.TargetBranchName) Azure DevOps Checks API

Environment Variable Contract

Every SCM adapter populates the same environment contract consumed by janitor bounce:

JANITOR_PR_NUMBER   # PR / MR number (integer)
JANITOR_HEAD_SHA    # head commit SHA
JANITOR_BASE_SHA    # merge-base SHA
JANITOR_AUTHOR      # PR author handle
JANITOR_PR_BODY     # PR description text (for unlinked-PR detection)
JANITOR_REPO_SLUG   # owner/repo format

The engine reads these from the environment when explicit --flags are absent — zero platform-specific conditional logic inside the Rust binary.
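The flag-then-environment precedence can be sketched as a small resolver. The function names are hypothetical, and the environment lookup is injected as a closure so the logic can be exercised without mutating process state; only the variable name JANITOR_PR_NUMBER comes from the contract above.

```rust
/// Returns the explicit flag when present, otherwise the environment value.
pub fn resolve<F>(flag: Option<&str>, env_key: &str, lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    match flag {
        Some(v) => Some(v.to_string()),
        None => lookup(env_key),
    }
}

/// Resolves the PR number; a value that is not an integer yields None.
pub fn resolve_pr_number<F>(flag: Option<&str>, lookup: F) -> Option<u64>
where
    F: Fn(&str) -> Option<String>,
{
    resolve(flag, "JANITOR_PR_NUMBER", lookup)?.trim().parse().ok()
}
```

In a real binary the lookup would be |k| std::env::var(k).ok(), which keeps every platform difference in the CI adapter that sets the variables.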


X-C. VERSION SILO DETECTION — DEPENDENCY GRAPH HARDENING

architecture:version_silo is emitted when the engine detects a crate or package resolved at multiple distinct versions within the PR's manifest files (Cargo.toml, package.json).

Scoring: +20 points per duplicate crate version. Each duplicate emits a distinct antipattern entry: architecture:version_silo — <crate_name> (<v1> vs <v2>).

Mechanism: The engine parses the in-memory Cargo.lock blob (no disk reads) via find_version_silos_in_blobs in crates/anatomist/src/manifest.rs. Detection is O(PR-diff): only files present in the PR patch are scanned, not the full repository lockfile.

Impact: A version silo forces the Cargo resolver to maintain two parallel compilation artifacts for the same crate, is a common source of diamond dependency conflicts in rapidly evolving monorepos, and is a footprint vector for supply chain drift.
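The core of the detection is grouping resolved packages by name and keeping those with more than one distinct version. The sketch below assumes the lockfile has already been parsed into (name, version) pairs; find_version_silos is a hypothetical name, not the signature of find_version_silos_in_blobs, and scoring is not modelled.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns each package that resolves to more than one distinct version,
/// with its versions in sorted (lexicographic) order.
pub fn find_version_silos(packages: &[(&str, &str)]) -> Vec<(String, Vec<String>)> {
    let mut versions: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for &(name, version) in packages {
        versions.entry(name).or_default().insert(version);
    }
    versions
        .into_iter()
        .filter(|(_, vs)| vs.len() > 1)
        .map(|(name, vs)| {
            (name.to_string(), vs.into_iter().map(str::to_string).collect())
        })
        .collect()
}
```

Using ordered collections makes the output deterministic, so the same lockfile always yields the same antipattern entries in the same order.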


XI-A. GATEKEEPER PROVENANCE

The Provenance struct (attached to every BounceLogEntry in crates/cli/src/report.rs) records three fields at bounce time:

Field Description
analysis_duration_ms Wall-clock duration of the full bounce analysis in milliseconds
source_bytes_processed Total bytes of added source content fed into the analysis engine
egress_bytes_sent Exact byte-length of the JSON score report POSTed to the Governor

Exfiltration Ratio = egress_bytes_sent / source_bytes_processed.

Since the engine transmits only a structured JSON score report (PR metadata, score, antipattern IDs) and never source lines, this ratio is mathematically bounded near zero. The report renders the Exfiltration Ratio as a percentage — a machine-verifiable proof that no source code crossed the network boundary. zero_upload_verified is set true when egress_bytes_sent == 0 or the ratio is < 1.0%.
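The ratio and the verification flag can be computed as follows. The struct is a sketch with assumed integer types, and treating a zero source_bytes_processed with non-zero egress as unverified is an assumption this sketch makes to avoid dividing by zero.

```rust
/// Illustrative mirror of the three Provenance fields; types are assumed.
pub struct Provenance {
    pub analysis_duration_ms: u64,
    pub source_bytes_processed: u64,
    pub egress_bytes_sent: u64,
}

impl Provenance {
    /// egress_bytes_sent / source_bytes_processed, or None when no source
    /// bytes were processed.
    pub fn exfiltration_ratio(&self) -> Option<f64> {
        if self.source_bytes_processed == 0 {
            None
        } else {
            Some(self.egress_bytes_sent as f64 / self.source_bytes_processed as f64)
        }
    }

    /// True when nothing was sent or the ratio is below 1.0%.
    pub fn zero_upload_verified(&self) -> bool {
        self.egress_bytes_sent == 0
            || matches!(self.exfiltration_ratio(), Some(r) if r < 0.01)
    }
}
```

For example, a 500-byte report over 100,000 bytes of analysed source gives a ratio of 0.5%, which passes; a 2,000-byte report over the same input gives 2% and does not.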


XII. MEMORY BACKPRESSURE — PHYSARUM 2.0

Hardware-Aware Concurrency

detect_optimal_concurrency() in crates/common/src/physarum.rs queries sysinfo for total system RAM and returns a worker-count recommendation used by both janitor and gauntlet-runner:

Total RAM Workers Mode
< 8 GB 2 Safety
8–16 GB 4 Standard
16–32 GB 8 High-Velocity
> 32 GB logical CPU count Aggressive

The --concurrency 0 flag (default) selects auto-detection. Manual override is available via --concurrency <N>.
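The tier table reduces to a small pure function once total RAM and the logical CPU count are known. This sketch takes both as parameters instead of querying sysinfo; the function names are hypothetical, GB is read as GiB, and the boundary handling (exactly 16 GiB and exactly 32 GiB both map to 8 workers) is an assumption, since the table leaves the edges open.

```rust
const GIB: u64 = 1024 * 1024 * 1024;

/// Worker count per the RAM tier table.
pub fn workers_for_ram(total_ram_bytes: u64, logical_cpus: usize) -> usize {
    if total_ram_bytes < 8 * GIB {
        2 // Safety
    } else if total_ram_bytes < 16 * GIB {
        4 // Standard
    } else if total_ram_bytes <= 32 * GIB {
        8 // High-Velocity
    } else {
        logical_cpus.max(1) // Aggressive
    }
}

/// `--concurrency 0` selects auto-detection; any other value is used as-is.
pub fn resolve_concurrency(flag: usize, total_ram_bytes: u64, logical_cpus: usize) -> usize {
    if flag == 0 {
        workers_for_ram(total_ram_bytes, logical_cpus)
    } else {
        flag
    }
}
```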

Request concurrency is further governed by the SMA-gated semaphore model:

RAM Utilisation Semaphore Max Concurrent Requests
≤ 75% flow_semaphore 4
75–90% constrict_semaphore 2
> 90% Busy-wait 0 (backpressure)

Melanin Layer

The Melanin Layer is a 500 ms background thread (start_background_heart()) that refreshes OS memory statistics and publishes the result to a static AtomicU8 (GLOBAL_PULSE). Analysis threads read global_pulse() via a single AtomicU8::load(Relaxed) — zero mutex acquisition, zero OS syscall per check.

This decouples the memory observer from the scanning hot-path. A rayon pool processing 100 PRs/sec issues at most 2 sysinfo::refresh_memory() syscalls per second instead of 100. The background thread is idempotent — spawned at most once regardless of how many callers invoke start_background_heart().
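A standard-library sketch of the observer pattern described above. The sampler is a hypothetical constant stand-in for the sysinfo refresh, the pulse is assumed to carry RAM utilisation as a percentage, spawn_count() exists only to make the spawn-once property observable, and max_concurrent_requests() maps the pulse onto the semaphore table without modelling the SMA smoothing.

```rust
use std::sync::atomic::{AtomicU8, AtomicUsize, Ordering};
use std::sync::Once;
use std::thread;
use std::time::Duration;

static GLOBAL_PULSE: AtomicU8 = AtomicU8::new(0);
static START: Once = Once::new();
static SPAWN_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical sampler; a real one would refresh OS memory statistics.
fn sample_ram_percent() -> u8 {
    42
}

/// Spawns the 500 ms refresher at most once, however often it is called.
pub fn start_background_heart() {
    START.call_once(|| {
        SPAWN_COUNT.fetch_add(1, Ordering::Relaxed);
        // Publish one sample immediately so readers never see a stale zero.
        GLOBAL_PULSE.store(sample_ram_percent(), Ordering::Relaxed);
        thread::spawn(|| loop {
            thread::sleep(Duration::from_millis(500));
            GLOBAL_PULSE.store(sample_ram_percent(), Ordering::Relaxed);
        });
    });
}

/// Hot-path read: a single relaxed atomic load, no lock, no syscall.
pub fn global_pulse() -> u8 {
    GLOBAL_PULSE.load(Ordering::Relaxed)
}

pub fn spawn_count() -> usize {
    SPAWN_COUNT.load(Ordering::Relaxed)
}

/// Semaphore tiers: <= 75% allows 4, up to 90% allows 2, above 90% allows 0.
pub fn max_concurrent_requests(ram_percent: u8) -> usize {
    match ram_percent {
        0..=75 => 4,
        76..=90 => 2,
        _ => 0,
    }
}
```

Relaxed ordering is sufficient here because the pulse is a standalone hint: no other memory needs to be synchronised with it, and a reading that is up to 500 ms stale is acceptable by design.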


Security Model

Zero-Copy Architecture: RAM-Only AST Pipeline

All file reads in the hot path use memmap2::Mmap — a read-only memory-mapped view (PROT_READ only). The file content is never copied into a heap allocation. Tree-sitter receives a &[u8] slice of the mmap'd region and constructs the AST entirely in RAM. No AST is written to disk. No temporary file is created.

Circuit breakers prevent resource exhaustion before parsing begins:

Limit Value Location
Max file size for parsing 1 MiB slop_filter.rs
Parse timeout 100 ms parser.rs::PARSE_TIMEOUT_MICROS
Panic containment catch_unwind(AssertUnwindSafe) parser.rs::timed_parse()

Shadow Merger: Air-Gapped PR Simulation

simulate_merge(repo, base_oid, head_oid) in crates/forge/src/shadow_git.rs uses libgit2's tree-diff API to compute changed blobs within the git object store — read-only. The result is a MergeSnapshot { blobs: HashMap<PathBuf, Vec<u8>> } — a pure heap allocation. No file is checked out. No working directory is written. No build tool is invoked. A malicious CMakeLists.txt or Makefile exists only as an inert byte array.

Cryptographic Provenance: ML-DSA-65 (FIPS 204)

Attestation is signed with ML-DSA-65, which targets NIST security category 3 and was standardised by NIST as FIPS 204 in August 2024. The binary embeds only the 32-byte verifying key (VERIFYING_KEY_BYTES). The signing key is held exclusively by thejanitor.app and never appears in the binary, repository, or process memory.

Token revocation is achieved by keypair rotation: all tokens signed against the old private key become cryptographically invalid against the new verifying key — no revocation server, no database lookup.

Shadow Tree Isolation and Atomic Rollback

Before touching any source file, the engine creates a Shadow Tree — a mirror of the project directory using zero additional disk space (symlinks on Linux/macOS, hard links on Windows). Physical excision proceeds bottom-to-top (descending byte order) to preserve upstream offsets. Backup copies are written to .janitor/ghost/ before any write. restore_all() is called on any write failure.

All destructive commands default to dry-run mode. Nothing is modified without --force-purge --token <TOKEN>.

Supply Chain Integrity

All GitHub Actions steps are SHA-pinned to 40-character commit SHAs. step-security/harden-runner is the first step of every job. cargo audit and cargo deny are required gates in just audit. Docker base images are pinned to @sha256:<digest>. The engine's CI runs janitor scan against its own source tree on every PR.

Compliance Mapping

Framework Control Implementation
SOC 2 Type II — CC6 Logical access controls ML-DSA-65 token gate on all destructive commands
SOC 2 Type II — CC7 System monitoring Remote attestation POST to /v1/attest on every excision
NIST FIPS 204 Post-quantum signature ML-DSA-65
SLSA Level 2 Build provenance GitHub Actions release workflow with SHA-pinned steps
CIS Benchmark — 14.2 Encrypt data in transit All API calls use HTTPS; ureq enforces TLS
OWASP — A08:2021 Software and data integrity cargo audit + cargo deny in CI; SHA-pinned Docker images

Responsible Disclosure

Security issues: [email protected]

Acknowledgement within 24 hours. Initial assessment within 72 hours. Critical vulnerabilities (RCE, token forgery, audit log tampering) are treated as P0 with a 48-hour patch target.


Benchmarks

Results from v6.12.7 gauntlet run. 22 Tier-1 repositories. Live PRs via just run-gauntlet. Hardware: AMD64 / WSL2, Linux 6.6.87, 8 GB RAM.

Global Audit 2026 — Summary

Metric Value
Repositories audited 22
Pull requests analyzed 2,090
Total Slop Score 38,685
Antipatterns Blocked 124
Engine panics 0
OOM events 0

22-Repo Tier-1 Matrix

Repo Lang Peak RSS Dead Symbols Clone Groups PRs Bounced Antipatterns
godotengine/godot C++ 58 MB 717 2 98 8
NixOS/nixpkgs Nix 29 MB 205 2 100 0
microsoft/vscode TS 107 MB 2,827 0 95 10
kubernetes/kubernetes Go 166 MB 73 2 98 4
pytorch/pytorch C++/Py 164 MB 8,247 24 95 2
apache/kafka Java 72 MB 1 3 100 16
rust-lang/rust Rust 235 MB 30 2 100 24
tauri-apps/tauri Rust/JS 29 MB 1 0 95 12
redis/redis C 23 MB 87 2 98 3
vercel/next.js JS/TS 51 MB 0 0 93 8
home-assistant/core Python 101 MB 8,311 9 97 4
ansible/ansible Python 25 MB 895 2 95 6
cloudflare/workers-sdk TS 38 MB 14 1 90 3
langchain-ai/langchain Python 20 MB 1,483 2 95 4
denoland/deno Rust/TS 44 MB 22 1 100 2
rails/rails Ruby 46 MB 120 2 95 3
laravel/framework PHP 34 MB 85 1 95 3
apple/swift Swift/C++ 182 MB 450 3 88 2
dotnet/aspnetcore C# 142 MB 4 0 95 2
square/okhttp Kotlin/Java 48 MB 22 0 88 0
hashicorp/terraform Go/HCL 52 MB 38 1 93 0
neovim/neovim C/Lua 28 MB 145 3 90 8

Language Support Matrix

Language Grammar Gauntlet Repos
Python tree-sitter-python ansible, home-assistant, pytorch, langchain
Rust tree-sitter-rust rust-lang/rust, tauri, deno
JavaScript tree-sitter-javascript next.js, workers-sdk
TypeScript tree-sitter-typescript vscode, next.js, workers-sdk
C++ tree-sitter-cpp godot, pytorch, apple/swift
C tree-sitter-c redis, neovim
Java tree-sitter-java kafka, okhttp
C# tree-sitter-c-sharp dotnet/aspnetcore
Go tree-sitter-go kubernetes, terraform
Ruby tree-sitter-ruby rails/rails
PHP tree-sitter-php laravel/framework
Swift tree-sitter-swift apple/swift
Lua tree-sitter-lua neovim/neovim
Nix tree-sitter-nix NixOS/nixpkgs
Kotlin tree-sitter-kotlin square/okhttp
GLSL tree-sitter-glsl godot shaders
Objective-C tree-sitter-objc godot, apple/swift
Bash tree-sitter-bash ansible
Scala tree-sitter-scala kafka

23 grammars total. OnceLock<Language> statics: 184 bytes total static overhead for all 23 grammars. Zero per-call allocation.


XI. STRUCTURAL PIPELINE

flowchart TD
    A([PR Opened]) --> B[Vibe-Check Gate\nzstd NCD ratio < 0.15?]
    B -->|NCD anomaly| C[antipattern:ncd_anomaly\n+10 pts — AI boilerplate signal]
    B --> D[AST Autopsy\ntree-sitter 23 grammars\ngets · strcpy · innerHTML · shell=True]
    C --> D
    D -->|findings| E[Silo Check\njanitor_dep_check\nversion-split pairs]
    D --> E
    E -->|silos ≥ 5| F[Hard block\nmerge prevented]
    E --> G{Score ≤ gate?}
    G -->|Yes — SANCTUARY INTACT| H[PQC Bond Issuance\nML-DSA-65 · CycloneDX v1.6\nCBOM written to .janitor/]
    G -->|No — BREACH DETECTED| I[Check Run FAIL\nSARIF annotations uploaded\nwebhook fired]

The diagram above is the canonical execution sequence. Every PR traverses the stages top to bottom. The Vibe-Check Gate fires before AST parsing: it is an O(N) compression pass that short-circuits tree-sitter for statistically self-similar AI blobs. The PQC Bond is issued only when the score falls below the min_slop_score ceiling defined in janitor.toml.


XII. BLAST RADIUS GATE (v8.0.11)

Problem

Autonomous coding agents (Copilot, Cursor, Devin) frequently produce "hallucinated refactors" — PRs that modify files across 6, 8, or even 12 unrelated subsystems in a single change. These shotgun diffs are indistinguishable from legitimate cross-cutting refactors by score alone; they often have zero antipattern findings and low clone counts.

Gate

PatchBouncer counts the number of distinct top-level directories touched by a multi-file PR. Canonical lockfile updates (Cargo.lock, package-lock.json, yarn.lock, pnpm-lock.yaml, Gemfile.lock, poetry.lock, go.sum, flake.lock) are excluded from the count — dependency bumps legitimately touch lockfiles across the entire repo.

Distinct top-level dirs (excl. lockfiles) Verdict Score delta
≤ 5 PASS +0
> 5 architecture:blast_radius_violation +50 pts

The +50 Critical penalty is additive to all other findings. An agentic PR that also carries a single other Critical finding scores 100 (50 + 50) and fails the default 100-point gate.
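The gate can be sketched as a set count over path prefixes. The lockfile list and the threshold come from the section above; the function names are hypothetical, paths are assumed to be repo-relative with forward slashes, and root-level files are assumed not to count as a directory.

```rust
use std::collections::BTreeSet;

const LOCKFILES: [&str; 8] = [
    "Cargo.lock", "package-lock.json", "yarn.lock", "pnpm-lock.yaml",
    "Gemfile.lock", "poetry.lock", "go.sum", "flake.lock",
];

/// Distinct top-level directories touched, ignoring canonical lockfiles
/// wherever they sit in the tree.
pub fn distinct_top_level_dirs<'a>(paths: &[&'a str]) -> BTreeSet<&'a str> {
    paths
        .iter()
        .copied()
        .filter(|p| {
            let file_name = p.rsplit('/').next().unwrap_or("");
            !LOCKFILES.contains(&file_name)
        })
        .filter_map(|p| {
            // Root-level files have no '/' and contribute no directory.
            let (first, _rest) = p.split_once('/')?;
            Some(first)
        })
        .collect()
}

/// +50 when more than five distinct top-level directories are touched.
pub fn blast_radius_penalty(paths: &[&str]) -> u32 {
    if distinct_top_level_dirs(paths).len() > 5 {
        50
    } else {
        0
    }
}
```

A PR touching a/, b/, c/, d/, and e/ plus f/Cargo.lock stays at five counted directories and passes; adding f/main.rs brings the count to six and triggers the penalty.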

Supply Chain Integrity Guard (v8.0.11)

Two new supply-chain patterns are active at Critical severity (+50 pts each):

Pattern Label Rationale
<script src="http security:unpinned_asset External script without SRI — CDN hijack vector
.github.io/ security:unpinned_asset GitHub Pages URL in production — no integrity SLA

Both patterns run via find_supply_chain_slop() (language-agnostic, called from find_slop()) and via binary_hunter::scan() for diff-level coverage on non-source file types. The Crucible Threat Gallery carries true-positive and true-negative fixtures for both patterns.


End of SOVEREIGN BRIEFING.