Security Posture

Audience: Security architects, procurement teams, and enterprise buyers conducting vendor due-diligence. This document maps each security control to the source-code construct that implements it.

Executive Summary

The Janitor is a static-analysis engine that processes untrusted code at high volume. Every architectural decision has been made under the assumption that the input is adversarial. The result is a system with no shell execution surface, no mutable disk state during analysis, post-quantum signed audit trails, and a CI/CD pipeline that continuously audits itself.

Security Highlights

Control	Implementation
ML-DSA-65 Token Gate	Post-quantum (NIST FIPS 204) signed bearer token. Binary embeds only the 32-byte public verifying key — private key never on end-user machine. All destructive operations require a valid token.
Zero-Emission Local Scans	`janitor bounce` with `--patch` or `--base`/`--head` flags sends zero outbound traffic to the target repository. No webhooks. No telemetry to the scanned repo. Source code is memory-mapped locally and never transmitted.
Agnostic IaC Shield	`ByteLatticeAnalyzer` detects binary blobs and encrypted payloads in patches without requiring a grammar. IaC files (`.nix`, `.lock`, `.json`, `.toml`) bypass entropy analysis — eliminating lockfile false positives while maintaining full coverage of injected binary content.
Universal Bot Shield	`is_automation_account()` 4-layer bot classification — bot PRs receive full structural analysis, correctly attributed for reporting. No code is exempt from the engine.
Physarum Backpressure	`SystemHeart::beat()` enforces an 8 GB RAM ceiling across all analysis operations. Requests held (not dropped) when RAM > 90% — prevents OOM termination without manual capacity planning.

1. Zero-Copy Architecture: RAM-Only AST Pipeline

Threat Modelled

A developer submits a PR containing a weaponised source file: a polyglot document designed to exploit a parser vulnerability, an oversized file intended to exhaust memory, or a binary blob disguised as source code. A naive analysis engine materialises the file to disk and invokes a subprocess — creating a reachable, executable attack surface.

The Control

All file reads in The Janitor's hot path use memmap2::Mmap — a read-only memory-mapped view of the file. The operating system maps the file's pages into the process's virtual address space with the PROT_READ flag only. The file content is never copied into a heap allocation. Tree-sitter receives a &[u8] slice of the mmap'd region and constructs the AST entirely in RAM.

  Disk                   Process Virtual Memory
  ─────                  ──────────────────────
  file.rs  ──PROT_READ──► &[u8] (mmap region)
                               │
                          tree-sitter parse()
                               │
                          AST nodes (heap, owned by tree-sitter)
                               │
                          Entity extraction ──► Vec<Entity> (heap)
                               │
                          mmap drop ──► OS reclaims pages

No AST is written to disk. No temporary file is created. The parsed representation exists only for the duration of the analysis call. When the Mmap handle is dropped, the OS reclaims the mapped pages.

Circuit Breakers

Two hard limits prevent resource exhaustion before any parsing begins:

Limit	Value	Location
Max file size for parsing	1 MiB	`slop_filter.rs` circuit breaker
Parse timeout	100 ms	`parser.rs::PARSE_TIMEOUT_MICROS`
Panic containment	`catch_unwind(AssertUnwindSafe)`	`parser.rs::timed_parse()`

A file that exceeds 1 MiB is skipped without ever reaching tree-sitter. A parse that exceeds 100 ms is cancelled. A grammar-level panic is caught and converted to a ParseFailure error — it never unwinds past the crate boundary.

Attack Surface Eliminated

Traditional Approach	The Janitor
Write file to `/tmp/`, invoke linter subprocess	No subprocess. No `/tmp/` write.
Parse with full-heap copy of file bytes	PROT_READ mmap: no copy, no write access
Persist intermediate AST to disk	AST lives in RAM for ≤ 100 ms, then reclaimed
No timeout on parser	100 ms hard ceiling via progress callback

2. Shadow Merger: Air-Gapped PR Simulation

Threat Modelled

A hostile PR modifies Makefile, CMakeLists.txt, setup.py, or a GitHub Actions workflow. An analysis pipeline that checks out the branch, builds the project, or runs any tooling against the working tree executes the attacker's code.

The Control

crates/forge/src/shadow_git.rs exposes simulate_merge(repo, base_oid, head_oid). This function uses libgit2's tree-diff API to compute the set of blobs that differ between the base commit and the PR head — entirely within the git object store, which is already on disk and read-only from The Janitor's perspective.

┌──────────────────────────────────────────────────────┐
│                   janitor bounce                      │
│                                                       │
│  simulate_merge(repo, base_oid, head_oid)             │
│       │                                               │
│  libgit2 tree-diff ── reads .git/objects/ (RO)        │
│       │                                               │
│  MergeSnapshot { blobs: HashMap<PathBuf, Vec<u8>> }   │
│       │           (pure heap allocation)              │
│  find_slop() ◄── tree-sitter parses &[u8]             │
│                  (never touches filesystem)           │
│                                                       │
│  ══ ISOLATION BOUNDARY: zero shell execution below ══ │
└──────────────────────────────────────────────────────┘

No file is checked out. No working directory is written. No build tool is invoked. The MergeSnapshot is a HashMap<PathBuf, Vec<u8>> — a pure heap allocation. A malicious CMakeLists.txt exists only as an inert byte array.

A compromised PR could include:

CMakeLists.txt that runs a add_custom_command(POST_BUILD ...) shell payload
Makefile targets executed by make during a build-triggered scan
setup.py / pyproject.toml with setup_requires that pip-installs malware
.github/actions/ that a naive tool might evaluate locally

The Shadow Merger never materialises any of these to disk. The malicious content exists only as a byte array in heap memory — unexecutable, unreachable by the OS process loader.

3. Cryptographic Provenance: ML-DSA-65 (FIPS 204)

Threat Modelled

A future adversary with access to a cryptographically relevant quantum computer retroactively breaks ECDSA or RSA signatures on archived attestation logs, enabling silent forgery of historical audit records.

The Control

The Janitor's attestation pipeline is signed with ML-DSA-65, the Module Lattice Digital Signature Algorithm standardised by NIST in August 2024 as FIPS 204. ML-DSA is lattice-based; no known quantum algorithm provides a sub-exponential speedup for signature forgery against it. The scheme provides 128-bit post-quantum security.

Key Architecture

Component	Location	Role
Verifying key (public)	Embedded in binary at compile time	Token verification (offline)
Signing key (private)	Held exclusively by thejanitor.app	Never in binary, never in repo
Token format	`base64(ml_dsa_65_sign("JANITOR_PURGE_AUTHORIZED", sk))`	Bearer authorisation
Verification	`vault::SigningOracle::verify_token()` — pure offline	No network call required

The binary embeds only VERIFYING_KEY_BYTES. The corresponding private key does not appear in the repository, binary, build artefact, or process memory at runtime. Running strings or objdump against the binary will produce the public key and nothing else.

Token Revocation

Revocation is achieved by keypair rotation, not by a revocation list:

cargo run -p mint-token -- generate produces a new ML-DSA-65 keypair.
The new verifying key is embedded in a patch release binary.
All tokens signed against the old private key are cryptographically invalid against the new verifying key — no database lookup, no network check, no revocation server.

Trigger	Response
Scheduled annual rotation	New binary released at license renewal
Suspected token compromise	Emergency binary release; all licensees notified via [email protected]
Binary integrity failure	Binary replaced; SHA-256 hash published on GitHub Release

Industrial Core licensees receive a contractual rotation SLA: an emergency keypair rotation and new binary delivery within 4 hours of a confirmed compromise report.

"Harvest Now, Decrypt Later" Resistance

Attestation logs signed today with ML-DSA-65 remain unforgeable under a future quantum adversary. Classical ECDSA-signed audit trails collected over the next decade will be retroactively forgeable once sufficiently powerful quantum computers exist. We made the migration in 2024, before the threat materialised.

Every physical excision event signed with a valid token includes a per-event ML-DSA-65 signature in the audit log:

{
  "timestamp": "2026-02-19T10:00:00Z",
  "file_path": "/abs/path/src/module.py",
  "sha256_pre_cleanup": "a3b4c5d6...",
  "attestation_signature": "<base64-mldsa65-sig>"
}

The attestation_signature field covers {timestamp}{file_path}{sha256_pre_cleanup}. Auditors can verify this signature independently using only the public verifying key embedded in the binary at the time of excision — no server access required.

4. Shadow Tree Isolation & Atomic Rollback

Before touching any source file, The Janitor creates a Shadow Tree — a mirror of your project directory that uses zero additional disk space.

Platform	Technique	Privilege Required
Linux / macOS	Symbolic links per file	None
Windows	Hard links per file	None (no Admin, no Developer Mode)

When The Janitor identifies a dead symbol, it removes the link from the Shadow Tree — the original file remains intact. Your test suite runs against the shadow view. If tests pass, the symbol was genuinely unused. If they fail, nothing has been permanently modified.

Atomic Rollback Layers

Layer 1 — Shadow Rollback (always active): If the test suite fails against the Shadow Tree, all links are immediately restored. The source tree is in its original, unmodified state.

Layer 2 — Backup Rollback (active during physical excision): SafeDeleter copies each file to .janitor/ghost/<timestamp>_<filename>.bak before the first write. Symbol byte ranges are removed bottom-to-top (descending byte order) to preserve upstream offsets. UTF-8 character boundaries are verified before every splice.

If any write operation fails partway through, restore_all() copies every .bak file back to its original path.

Dry-Run Default

All destructive commands default to dry-run mode. Nothing is modified unless you explicitly request it:

# Safe: reports what would be deleted
janitor clean ./src

# Requires explicit intent + a valid token
janitor clean ./src --force-purge --token <TOKEN>

Rollback Command

# In a git repository: stashes all uncommitted changes
# Without git: restores files from .janitor/ghost/
janitor undo ./src

Failure Mode	What Happens
Test suite fails in Shadow Tree	Links restored, source unchanged, exit 1
File write fails during excision	Backup restored, source in original state
Process killed mid-excision	Run `janitor undo` to restore from `.janitor/ghost/`
Accidental run without intent	Default dry-run prints report, modifies nothing

5. Hermetic Builds: Nix Flakes

The Janitor audits other projects for zombie dependencies and supply-chain drift. A hermetic build guarantees that every developer, CI runner, and release pipeline produces bit-identical artefacts from the same source revision — regardless of OS version, globally installed packages, or ambient PATH contents.

Risk	Mitigated By
"Works on my machine"	Nix Flake pins exact package revisions
Rust toolchain drift	`rust-toolchain.toml` pins Rust 1.85.0
Pandoc / TeX version skew	Nix devShell provides pinned pandoc + texlive
libgit2 / OpenSSL ABI mismatch	Nix provides C library headers via `pkg-config`
CI/CD supply chain	GitHub Actions steps are SHA-pinned (see Section 6 below)

Entering the Dev Shell

just shell
# — or equivalently —
nix develop

just audit and just build detect whether they are running inside the Nix devShell via the IN_NIX_SHELL environment variable. If Nix is installed but the shell is not active, the recipe transparently re-execs itself under nix develop --command just <recipe>.

Pinning Strategy

rust-toolchain.toml declares the exact channel:

[toolchain]
channel = "1.85.0"
components = ["rustfmt", "clippy", "rust-src"]

flake.lock pins every Nix input — including nixpkgs and rust-overlay — to an exact git commit SHA. Commit flake.lock alongside flake.nix so that CI and all contributors use identical package revisions.

The production Dockerfile pins its base images to @sha256:<digest>:

FROM rust:1.85-slim@sha256:3490aa77... AS builder
FROM debian:bookworm-slim@sha256:6458e6ce... AS runtime

6. Supply Chain Integrity: Pinned Dependencies, Self-Audited CI

GitHub Actions: SHA-Pinned, Harden-Runner Gated

Every action in .github/workflows/ is pinned to a 40-character commit SHA — never a mutable version tag. step-security/harden-runner is the first step of every job, restricting the egress network policy to only the endpoints required by that workflow.

- uses: step-security/harden-runner@5ef0c079ce82195b2a36a210272d6b661572d83e # v2.14.2
  with:
    egress-policy: audit

A tag-pinned action (@v4) is a mutable pointer — the action owner can silently replace the tag with malicious code. A SHA-pinned action is immutable.

Cargo Audit

cargo audit is a required gate in the just audit recipe. Any crate with a known advisory in the RustSec database causes the build to fail. The workspace deny.toml policy additionally enforces licence compatibility and bans crates with duplicate transitive dependencies.

The Janitor Scans The Janitor

The engine's own CI/CD pipeline runs janitor scan against the engine's own source tree on every pull request. Any PR that introduces dead symbols, zombie dependencies, hallucinated security claims, or structural clones into the engine is blocked by the engine — before a human reviewer sees it.

PR opened ──► janitor-pr-gate.yml
                    │
             janitor bounce <diff>
                    │
             slop_score ≥ 100 ? ──► CI FAIL (PR blocked)
                    │
             slop_score < 100 ? ──► CI PASS (review proceeds)

7. RAM Pressure Management: Physarum Backpressure

crates/common/src/physarum.rs implements SystemHeart::beat(), which samples total RAM utilisation on every request. The daemon acquires a concurrency semaphore before processing:

RAM Utilisation	Semaphore	Max Concurrent Requests
≤ 75%	`flow_semaphore`	4
75 – 90%	`constrict_semaphore`	2
> 90%	Busy-wait until < 90%	0 (backpressure)

Requests that arrive when RAM utilisation exceeds 90% are held — not dropped, not errored — until the system returns to the Constrict or Flow band. This prevents OOM termination without requiring manual capacity planning.

8. Responsible Disclosure

Security issues in The Janitor should be reported to [email protected]. Include:

A description of the vulnerability and its potential impact.
Steps to reproduce (proof-of-concept code or a minimal diff is helpful).
Your preferred contact method for follow-up.

We commit to acknowledging receipt within 24 hours and providing an initial assessment within 72 hours. Critical vulnerabilities (RCE, token forgery, audit log tampering) are treated as P0 with a target patch cadence of 48 hours from confirmation.

Compliance Mapping

Framework	Relevant Control	The Janitor Implementation
SOC 2 Type II — CC6	Logical access controls	ML-DSA-65 token gate on all destructive commands
SOC 2 Type II — CC7	System monitoring	Remote attestation POST to `/v1/attest` on every excision
NIST FIPS 204	Post-quantum signature	ML-DSA-65 (`pqcrypto-mldsa` crate, verified against NIST KATs)
SLSA Level 2	Build provenance	GitHub Actions release workflow with SHA-pinned steps
CIS Benchmark — 14.2	Encrypt data in transit	All API calls use HTTPS; `ureq` enforces TLS
OWASP — A08:2021	Software and data integrity	`cargo audit` + `cargo deny` in CI; SHA-pinned Docker images