Security Posture
Audience: Security architects, procurement teams, and enterprise buyers conducting vendor due-diligence. This document maps each security control to the source-code construct that implements it.
Executive Summary
The Janitor is a static-analysis engine that processes untrusted code at high volume. Every architectural decision has been made under the assumption that the input is adversarial. The result is a system with no shell execution surface, no mutable disk state during analysis, post-quantum signed audit trails, and a CI/CD pipeline that continuously audits itself.
Security Highlights
| Control | Implementation |
|---|---|
| ML-DSA-65 Token Gate | Post-quantum (NIST FIPS 204) signed bearer token. The binary embeds only the 1,952-byte public verifying key; the private key is never present on an end-user machine. All destructive operations require a valid token. |
| Zero-Emission Local Scans | janitor bounce with --patch or --base/--head flags sends zero outbound traffic to the target repository. No webhooks. No telemetry to the scanned repo. Source code is memory-mapped locally and never transmitted. |
| Agnostic IaC Shield | ByteLatticeAnalyzer detects binary blobs and encrypted payloads in patches without requiring a grammar. IaC files (.nix, .lock, .json, .toml) bypass entropy analysis — eliminating lockfile false positives while maintaining full coverage of injected binary content. |
| Universal Bot Shield | is_automation_account() 4-layer bot classification — bot PRs receive full structural analysis, correctly attributed for reporting. No code is exempt from the engine. |
| Physarum Backpressure | SystemHeart::beat() enforces an 8 GB RAM ceiling across all analysis operations. Requests held (not dropped) when RAM > 90% — prevents OOM termination without manual capacity planning. |
1. Zero-Copy Architecture: RAM-Only AST Pipeline
Threat Modelled
A developer submits a PR containing a weaponised source file: a polyglot document designed to exploit a parser vulnerability, an oversized file intended to exhaust memory, or a binary blob disguised as source code. A naive analysis engine materialises the file to disk and invokes a subprocess — creating a reachable, executable attack surface.
The Control
All file reads in The Janitor's hot path use memmap2::Mmap — a read-only memory-mapped
view of the file. The operating system maps the file's pages into the process's virtual
address space with the PROT_READ flag only. The file content is never copied into a
heap allocation. Tree-sitter receives a &[u8] slice of the mmap'd region and
constructs the AST entirely in RAM.
    Disk                     Process Virtual Memory
    ─────                    ──────────────────────
    file.rs ──PROT_READ──►   &[u8] (mmap region)
                               │
                             tree-sitter parse()
                               │
                             AST nodes (heap, owned by tree-sitter)
                               │
                             Entity extraction ──► Vec<Entity> (heap)
                               │
                             mmap drop ──► OS reclaims pages
No AST is written to disk. No temporary file is created. The parsed representation
exists only for the duration of the analysis call. When the Mmap handle is dropped,
the OS reclaims the mapped pages.
Circuit Breakers
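Three safeguards bound resource use: a size check that runs before any parsing begins, a timeout that applies while parsing, and panic containment around the parser call: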
| Limit | Value | Location |
|---|---|---|
| Max file size for parsing | 1 MiB | slop_filter.rs circuit breaker |
| Parse timeout | 100 ms | parser.rs::PARSE_TIMEOUT_MICROS |
| Panic containment | catch_unwind(AssertUnwindSafe) | parser.rs::timed_parse() |
A file that exceeds 1 MiB is skipped without ever reaching tree-sitter. A parse that
exceeds 100 ms is cancelled. A grammar-level panic is caught and converted to a
ParseFailure error — it never unwinds past the crate boundary.
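The size check and panic containment described above can be sketched with the standard library alone. This is a minimal illustration, not the code in slop_filter.rs or parser.rs: `guarded_parse`, `ParseOutcome`, and `MAX_PARSE_BYTES` are hypothetical names, the parser is passed in as a closure, and the 100 ms timeout is omitted because it depends on the parser's own cancellation mechanism.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical constant mirroring the 1 MiB limit in the table above.
const MAX_PARSE_BYTES: usize = 1024 * 1024;

#[derive(Debug, PartialEq)]
enum ParseOutcome {
    /// The parser ran to completion; carries a stand-in result (e.g. a node count).
    Parsed(usize),
    /// The input exceeded the size limit and never reached the parser.
    SkippedTooLarge,
    /// The parser panicked; the panic was contained and turned into a value.
    ParseFailure,
}

/// Runs `parse` over `source` behind a size check and a panic guard.
fn guarded_parse<F>(source: &[u8], parse: F) -> ParseOutcome
where
    F: FnOnce(&[u8]) -> usize,
{
    // Circuit breaker: oversized input is rejected before any parsing work.
    if source.len() > MAX_PARSE_BYTES {
        return ParseOutcome::SkippedTooLarge;
    }
    // Panic containment: a panic inside `parse` stops here instead of unwinding further.
    match catch_unwind(AssertUnwindSafe(|| parse(source))) {
        Ok(nodes) => ParseOutcome::Parsed(nodes),
        Err(_) => ParseOutcome::ParseFailure,
    }
}

fn main() {
    // A well-behaved stand-in parser that just counts bytes.
    println!("{:?}", guarded_parse(b"fn main() {}", |src| src.len()));
    // A stand-in parser that panics: the caller still receives an ordinary value.
    println!("{:?}", guarded_parse(b"x", |_| panic!("grammar bug")));
}
```

The default panic hook still prints the panic message to stderr; the guard only stops the unwind from crossing the call boundary.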
Attack Surface Eliminated
| Traditional Approach | The Janitor |
|---|---|
| Write file to /tmp/, invoke linter subprocess | No subprocess. No /tmp/ write. |
| Parse with full-heap copy of file bytes | PROT_READ mmap: no copy, no write access |
| Persist intermediate AST to disk | AST lives in RAM only for the duration of the analysis call, then is reclaimed |
| No timeout on parser | 100 ms hard ceiling via progress callback |
2. Shadow Merger: Air-Gapped PR Simulation
Threat Modelled
A hostile PR modifies Makefile, CMakeLists.txt, setup.py, or a GitHub Actions
workflow. An analysis pipeline that checks out the branch, builds the project, or runs
any tooling against the working tree executes the attacker's code.
The Control
crates/forge/src/shadow_git.rs exposes simulate_merge(repo, base_oid, head_oid).
This function uses libgit2's tree-diff API to compute the set of blobs that differ
between the base commit and the PR head — entirely within the git object store, which is
already on disk and read-only from The Janitor's perspective.
┌──────────────────────────────────────────────────────┐
│ janitor bounce │
│ │
│ simulate_merge(repo, base_oid, head_oid) │
│ │ │
│ libgit2 tree-diff ── reads .git/objects/ (RO) │
│ │ │
│ MergeSnapshot { blobs: HashMap<PathBuf, Vec<u8>> } │
│ │ (pure heap allocation) │
│ find_slop() ◄── tree-sitter parses &[u8] │
│ (never touches filesystem) │
│ │
│ ══ ISOLATION BOUNDARY: zero shell execution below ══ │
└──────────────────────────────────────────────────────┘
No file is checked out. No working directory is written. No build tool is invoked.
The MergeSnapshot is a HashMap<PathBuf, Vec<u8>> — a pure heap allocation. A
malicious CMakeLists.txt exists only as an inert byte array.
A compromised PR could include:
- CMakeLists.txt that runs an add_custom_command(POST_BUILD ...) shell payload
- Makefile targets executed by make during a build-triggered scan
- setup.py / pyproject.toml with setup_requires that pip-installs malware
- .github/actions/ that a naive tool might evaluate locally
The Shadow Merger never materialises any of these to disk. The malicious content exists only as a byte array in heap memory — unexecutable, unreachable by the OS process loader.
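To make the "inert byte array" point concrete, here is a minimal sketch of analysis over an in-memory snapshot. Only the field type `HashMap<PathBuf, Vec<u8>>` comes from the text; the `paths_containing` helper and its substring search are hypothetical stand-ins for the real analysis, and the snapshot is built by hand rather than by `simulate_merge`.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Field layout taken from the description above; everything else is illustrative.
struct MergeSnapshot {
    blobs: HashMap<PathBuf, Vec<u8>>,
}

/// Hypothetical stand-in for the analysis step. It only reads bytes:
/// nothing is written to disk and nothing is executed.
fn paths_containing(snapshot: &MergeSnapshot, needle: &[u8]) -> Vec<PathBuf> {
    if needle.is_empty() {
        return Vec::new();
    }
    let mut hits: Vec<PathBuf> = snapshot
        .blobs
        .iter()
        .filter(|(_, bytes)| bytes.windows(needle.len()).any(|w| w == needle))
        .map(|(path, _)| path.clone())
        .collect();
    hits.sort(); // HashMap iteration order is unspecified, so sort for stable output
    hits
}

fn main() {
    let mut blobs = HashMap::new();
    blobs.insert(
        PathBuf::from("CMakeLists.txt"),
        b"add_custom_command(POST_BUILD ...)".to_vec(),
    );
    blobs.insert(PathBuf::from("src/lib.rs"), b"pub fn f() {}".to_vec());
    let snapshot = MergeSnapshot { blobs };

    // The hostile build script is inspected as data and never handed to a build tool.
    for path in paths_containing(&snapshot, b"add_custom_command") {
        println!("flagged: {}", path.display());
    }
}
```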
3. Cryptographic Provenance: ML-DSA-65 (FIPS 204)
Threat Modelled
A future adversary with access to a cryptographically relevant quantum computer retroactively breaks ECDSA or RSA signatures on archived attestation logs, enabling silent forgery of historical audit records.
The Control
The Janitor's attestation pipeline signs with ML-DSA-65, the Module-Lattice-Based Digital Signature Algorithm standardised by NIST in August 2024 as FIPS 204. ML-DSA is lattice-based; no known quantum algorithm provides a sub-exponential speedup for signature forgery against it. The ML-DSA-65 parameter set targets NIST security category 3, roughly the strength of AES-192.
Key Architecture
| Component | Location | Role |
|---|---|---|
| Verifying key (public) | Embedded in binary at compile time | Token verification (offline) |
| Signing key (private) | Held exclusively by thejanitor.app | Never in binary, never in repo |
| Token format | base64(ml_dsa_65_sign("JANITOR_PURGE_AUTHORIZED", sk)) | Bearer authorisation |
| Verification | vault::SigningOracle::verify_token() (pure offline) | No network call required |
The binary embeds only VERIFYING_KEY_BYTES. The corresponding private key does not
appear in the repository, binary, build artefact, or process memory at runtime. Running
strings or objdump against the binary reveals the public key but no signing-key material.
Token Revocation
Revocation is achieved by keypair rotation, not by a revocation list:
- cargo run -p mint-token -- generate produces a new ML-DSA-65 keypair.
- The new verifying key is embedded in a patch release binary.
- All tokens signed against the old private key are cryptographically invalid against the new verifying key — no database lookup, no network check, no revocation server.
| Trigger | Response |
|---|---|
| Scheduled annual rotation | New binary released at license renewal |
| Suspected token compromise | Emergency binary release; all licensees notified via [email protected] |
| Binary integrity failure | Binary replaced; SHA-256 hash published on GitHub Release |
Industrial Core licensees receive a contractual rotation SLA: an emergency keypair rotation and new binary delivery within 4 hours of a confirmed compromise report.
"Harvest Now, Decrypt Later" Resistance
Attestation logs signed today with ML-DSA-65 remain unforgeable under a future quantum adversary. Classical ECDSA-signed audit trails collected over the next decade will be retroactively forgeable once sufficiently powerful quantum computers exist. We made the migration in 2024, before the threat materialised.
Every physical excision event authorised by a valid token carries a per-event ML-DSA-65 signature in the audit log:
{
  "timestamp": "2026-02-19T10:00:00Z",
  "file_path": "/abs/path/src/module.py",
  "sha256_pre_cleanup": "a3b4c5d6...",
  "attestation_signature": "<base64-mldsa65-sig>"
}
The attestation_signature field covers {timestamp}{file_path}{sha256_pre_cleanup}. Auditors can verify this signature independently using only the public verifying key embedded in the binary at the time of excision — no server access required.
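An auditor first has to rebuild the exact bytes the signature covers. The sketch below assumes the three fields are concatenated as UTF-8 in the order shown, with no separators or length prefixes; the document does not specify the encoding, so treat that as an assumption. `ExcisionRecord` and `signed_message` are hypothetical names.

```rust
/// Field names follow the audit-log record above.
struct ExcisionRecord<'a> {
    timestamp: &'a str,
    file_path: &'a str,
    sha256_pre_cleanup: &'a str,
}

/// Rebuilds the message covered by `attestation_signature`.
/// Assumption: plain UTF-8 concatenation in the documented field order.
fn signed_message(rec: &ExcisionRecord) -> Vec<u8> {
    let total = rec.timestamp.len() + rec.file_path.len() + rec.sha256_pre_cleanup.len();
    let mut msg = Vec::with_capacity(total);
    msg.extend_from_slice(rec.timestamp.as_bytes());
    msg.extend_from_slice(rec.file_path.as_bytes());
    msg.extend_from_slice(rec.sha256_pre_cleanup.as_bytes());
    msg
}

fn main() {
    let rec = ExcisionRecord {
        timestamp: "2026-02-19T10:00:00Z",
        file_path: "/abs/path/src/module.py",
        sha256_pre_cleanup: "a3b4c5d6",
    };
    let msg = signed_message(&rec);
    println!("{} bytes to verify", msg.len());
}
```

Verification would then pass these bytes, the base64-decoded signature, and the embedded verifying key to an ML-DSA-65 verify routine. That call is left out here to avoid guessing at a specific crate's API.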
4. Shadow Tree Isolation & Atomic Rollback
Before touching any source file, The Janitor creates a Shadow Tree: a mirror of your project directory built from links, so no file contents are duplicated and the additional disk usage is negligible.
| Platform | Technique | Privilege Required |
|---|---|---|
| Linux / macOS | Symbolic links per file | None |
| Windows | Hard links per file | None (no Admin, no Developer Mode) |
When The Janitor identifies a dead symbol, it removes the link from the Shadow Tree while the original file remains intact. Your test suite runs against the shadow view. If the tests pass, nothing the suite exercises depends on the symbol. If they fail, nothing has been permanently modified.
Atomic Rollback Layers
Layer 1: Shadow Rollback (always active). If the test suite fails against the Shadow Tree, all links are immediately restored. The source tree is in its original, unmodified state.
Layer 2: Backup Rollback (active during physical excision). SafeDeleter copies each file to .janitor/ghost/<timestamp>_<filename>.bak before the first write. Symbol byte ranges are removed bottom-to-top (descending byte order), so each removal leaves the offsets of ranges earlier in the file valid. UTF-8 character boundaries are verified before every splice.
If any write operation fails partway through, restore_all() copies every .bak file back to its original path.
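The bottom-to-top splice with boundary checks can be sketched as follows. This is an illustration of the technique rather than SafeDeleter itself: `excise` and `ExciseError` are hypothetical names, the function works on an in-memory string instead of a file, and it additionally rejects overlapping ranges.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq)]
enum ExciseError {
    /// Range is inverted, out of bounds, or overlaps a range already removed.
    InvalidRange(Range<usize>),
    /// Range start or end falls inside a multi-byte UTF-8 character.
    NotCharBoundary(Range<usize>),
}

/// Removes the given byte ranges from `source`, highest offset first.
fn excise(source: &str, mut ranges: Vec<Range<usize>>) -> Result<String, ExciseError> {
    // Descending start order: each splice only shifts bytes that come after it,
    // so the offsets of the ranges still to be processed stay valid.
    ranges.sort_by(|a, b| b.start.cmp(&a.start));

    let mut out = source.to_owned();
    // Lowest offset touched so far; the next range must end at or before it.
    let mut floor = out.len();

    for r in ranges {
        if r.start > r.end || r.end > floor {
            return Err(ExciseError::InvalidRange(r));
        }
        // Verify UTF-8 boundaries before every splice.
        if !out.is_char_boundary(r.start) || !out.is_char_boundary(r.end) {
            return Err(ExciseError::NotCharBoundary(r));
        }
        floor = r.start;
        out.replace_range(r, "");
    }
    Ok(out)
}

fn main() {
    // Three 10-byte lines; remove the first and the last.
    let src = "fn a() {}\nfn b() {}\nfn c() {}\n";
    let cleaned = excise(src, vec![0..10, 20..30]).expect("ranges are valid");
    print!("{cleaned}");
}
```

Processing in ascending order instead would require adjusting every later offset by the number of bytes already removed; descending order avoids that bookkeeping.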
Dry-Run Default
All destructive commands default to dry-run mode. Nothing is modified unless you explicitly request it:
# Safe: reports what would be deleted
janitor clean ./src
# Requires explicit intent + a valid token
janitor clean ./src --force-purge --token <TOKEN>
Rollback Command
# In a git repository: stashes all uncommitted changes
# Without git: restores files from .janitor/ghost/
janitor undo ./src
| Failure Mode | What Happens |
|---|---|
| Test suite fails in Shadow Tree | Links restored, source unchanged, exit 1 |
| File write fails during excision | Backup restored, source in original state |
| Process killed mid-excision | Run janitor undo to restore from .janitor/ghost/ |
| Accidental run without intent | Default dry-run prints report, modifies nothing |
5. Hermetic Builds: Nix Flakes
The Janitor audits other projects for zombie dependencies and supply-chain drift. A hermetic build guarantees that every developer, CI runner, and release pipeline produces bit-identical artefacts from the same source revision — regardless of OS version, globally installed packages, or ambient PATH contents.
| Risk | Mitigated By |
|---|---|
| "Works on my machine" | Nix Flake pins exact package revisions |
| Rust toolchain drift | rust-toolchain.toml pins Rust 1.85.0 |
| Pandoc / TeX version skew | Nix devShell provides pinned pandoc + texlive |
| libgit2 / OpenSSL ABI mismatch | Nix provides C library headers via pkg-config |
| CI/CD supply chain | GitHub Actions steps are SHA-pinned (see Section 6 below) |
Entering the Dev Shell
just audit and just build detect whether they are running inside the Nix devShell via the IN_NIX_SHELL environment variable. If Nix is installed but the shell is not active, the recipe transparently re-execs itself under nix develop --command just <recipe>.
Pinning Strategy
rust-toolchain.toml declares the exact channel, 1.85.0, so every contributor and CI runner builds with the same compiler.
flake.lock pins every Nix input — including nixpkgs and rust-overlay — to an exact git commit SHA. Commit flake.lock alongside flake.nix so that CI and all contributors use identical package revisions.
The production Dockerfile pins its base images to @sha256:<digest>:
FROM rust:1.85-slim@sha256:3490aa77... AS builder
FROM debian:bookworm-slim@sha256:6458e6ce... AS runtime
6. Supply Chain Integrity: Pinned Dependencies, Self-Audited CI
GitHub Actions: SHA-Pinned, Harden-Runner Gated
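Every action in .github/workflows/ is pinned to a 40-character commit SHA rather than a
mutable version tag. step-security/harden-runner is the first step of every job. In the
audit mode shown here it logs outbound network traffic without blocking it; restricting
egress to only the endpoints a workflow requires means setting egress-policy: block with
an allowed-endpoints list.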
- uses: step-security/harden-runner@5ef0c079ce82195b2a36a210272d6b661572d83e # v2.14.2
  with:
    egress-policy: audit
A tag-pinned action (@v4) is a mutable pointer: the action owner can silently repoint
the tag at a commit containing malicious code. A full commit SHA identifies one exact
commit and cannot be repointed.
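A pinning policy like this is easy to check mechanically. The helper below is a hypothetical sketch, not part of The Janitor: it treats a `uses:` reference as pinned only when the part after the last `@` is exactly 40 hexadecimal characters.

```rust
/// Returns true when `uses` ends in `@` followed by a full 40-character hex commit SHA.
/// Hypothetical helper for illustration; pass the reference without any trailing comment.
fn is_sha_pinned(uses: &str) -> bool {
    match uses.rsplit_once('@') {
        Some((_, rev)) => rev.len() == 40 && rev.bytes().all(|b| b.is_ascii_hexdigit()),
        None => false,
    }
}

fn main() {
    let refs = [
        "step-security/harden-runner@5ef0c079ce82195b2a36a210272d6b661572d83e",
        "actions/checkout@v4",
    ];
    for r in refs {
        println!("{r}: pinned = {}", is_sha_pinned(r));
    }
}
```

A check of this kind only validates the shape of the reference. It does not confirm that the SHA belongs to the named repository or corresponds to the version in the trailing comment.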
Cargo Audit
cargo audit is a required gate in the just audit recipe. Any crate with a known advisory in the RustSec database causes the build to fail. The workspace deny.toml policy (cargo deny) additionally enforces licence compatibility and flags crates that appear in the dependency graph at more than one version.
The Janitor Scans The Janitor
The engine's own CI/CD pipeline runs janitor scan against the engine's own source tree on every pull request. Any PR that introduces dead symbols, zombie dependencies, hallucinated security claims, or structural clones into the engine is blocked by the engine — before a human reviewer sees it.
PR opened ──► janitor-pr-gate.yml
│
janitor bounce <diff>
│
slop_score ≥ 100 ? ──► CI FAIL (PR blocked)
│
slop_score < 100 ? ──► CI PASS (review proceeds)
7. RAM Pressure Management: Physarum Backpressure
crates/common/src/physarum.rs implements SystemHeart::beat(), which samples total
RAM utilisation on every request. The daemon acquires a concurrency semaphore before
processing:
| RAM Utilisation | Semaphore | Max Concurrent Requests |
|---|---|---|
| ≤ 75% | flow_semaphore | 4 |
| 75 – 90% | constrict_semaphore | 2 |
| > 90% | Busy-wait until < 90% | 0 (backpressure) |
Requests that arrive when RAM utilisation exceeds 90% are held — not dropped, not
errored — until the system returns to the Constrict or Flow band. This prevents OOM
termination without requiring manual capacity planning.
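The band selection in the table reduces to a small pure function. The sketch below is illustrative only: `PressureBand` and its methods are hypothetical names, utilisation is passed in as a percentage rather than sampled from the OS, and the semaphores themselves are not modelled.

```rust
/// The three utilisation bands from the table above (type name is hypothetical).
#[derive(Debug, PartialEq, Clone, Copy)]
enum PressureBand {
    Flow,
    Constrict,
    Hold,
}

impl PressureBand {
    /// Maps RAM utilisation (0.0 to 100.0 percent) to a band using the documented thresholds.
    fn from_utilisation(percent: f64) -> Self {
        if percent <= 75.0 {
            PressureBand::Flow
        } else if percent <= 90.0 {
            PressureBand::Constrict
        } else {
            PressureBand::Hold
        }
    }

    /// Maximum concurrent requests admitted in this band.
    fn max_concurrent(self) -> usize {
        match self {
            PressureBand::Flow => 4,
            PressureBand::Constrict => 2,
            PressureBand::Hold => 0, // requests wait rather than being dropped
        }
    }
}

fn main() {
    for pct in [50.0, 80.0, 95.0] {
        let band = PressureBand::from_utilisation(pct);
        println!("{pct}% -> {band:?} (max {})", band.max_concurrent());
    }
}
```

In the Hold band a caller would re-sample and wait until utilisation falls back to 90% or below before acquiring a permit, which is how requests end up held instead of rejected.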
8. Responsible Disclosure
Security issues in The Janitor should be reported to [email protected]. Include:
- A description of the vulnerability and its potential impact.
- Steps to reproduce (proof-of-concept code or a minimal diff is helpful).
- Your preferred contact method for follow-up.
We commit to acknowledging receipt within 24 hours and providing an initial assessment within 72 hours. Critical vulnerabilities (RCE, token forgery, audit log tampering) are treated as P0, with a target of shipping a patch within 48 hours of confirmation.
Compliance Mapping
| Framework | Relevant Control | The Janitor Implementation |
|---|---|---|
| SOC 2 Type II — CC6 | Logical access controls | ML-DSA-65 token gate on all destructive commands |
| SOC 2 Type II — CC7 | System monitoring | Remote attestation POST to /v1/attest on every excision |
| NIST FIPS 204 | Post-quantum signature | ML-DSA-65 (pqcrypto-mldsa crate, verified against NIST KATs) |
| SLSA Level 2 | Build provenance | GitHub Actions release workflow with SHA-pinned steps |
| CIS Benchmark — 14.2 | Encrypt data in transit | All API calls use HTTPS; ureq enforces TLS |
| OWASP — A08:2021 | Software and data integrity | cargo audit + cargo deny in CI; SHA-pinned Docker images |