Innovation Log

Autonomous architectural insights, structural gap observations, intelligence ledgers, and forward-looking feature proposals. Maintained by the Evolution Tracker skill. Entries are organized by priority tier and dated at creation.

CISO Pulse Audit completed 2026-04-03 (v9.1.1). All entries re-tiered into P0/P1/P2. Twelve new grammar depth rules added. Redundant ideas merged. Low-value noise dropped.

P0 — Enterprise Security Depth

Grammar-first. Every language construct without an AST gate is a liability a CISO cannot sign off on. P0 entries are the reason enterprises select the Janitor over byte-pattern scanners.

Grammar Depth: Go — 3 New Detection Rules

Go current AST coverage: 2 rules — exec.Command shell interpreter, InsecureSkipVerify.

Go-3 — security:sql_injection_concatenation (KevCritical, 150 pts)
- Trigger: call_expression where the selector matches db.Query|db.Exec|db.QueryRow|db.QueryContext|db.ExecContext and the query argument (the first argument, or the second for the *Context variants, which take a context.Context first) is a binary_expression with the + operator.
- Suppress if: both operands of + are interpreted_string_literal nodes (constant concatenation).
- AST node: call_expression → selector_expression{db.*} → binary_expression{+}
- File: crates/forge/src/slop_hunter.rs::find_go_slop()
- CVE class: CWE-89, countless DB-driver CVEs
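The Go-3 trigger and suppression can be sketched without tree-sitter. The block below is a minimal illustration over a toy expression type, not the find_go_slop() implementation: Expr, is_constant_concat, and is_sql_concat_finding are hypothetical names. It assumes the query is the second argument for the *Context variants, because Go's database/sql passes a context.Context first.

```rust
/// Toy stand-in for the tree-sitter-go nodes the rule inspects (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// Models interpreted_string_literal.
    StringLit(String),
    /// Models an identifier or any other non-literal expression.
    Ident(String),
    /// Models binary_expression.
    Binary { op: char, lhs: Box<Expr>, rhs: Box<Expr> },
}

/// Selectors named in the Go-3 trigger.
const SQL_SINKS: [&str; 5] = [
    "db.Query",
    "db.Exec",
    "db.QueryRow",
    "db.QueryContext",
    "db.ExecContext",
];

/// True when the expression consists only of string literals joined by `+`.
fn is_constant_concat(expr: &Expr) -> bool {
    match expr {
        Expr::StringLit(_) => true,
        Expr::Ident(_) => false,
        Expr::Binary { op: '+', lhs, rhs } => is_constant_concat(lhs) && is_constant_concat(rhs),
        Expr::Binary { .. } => false,
    }
}

/// Go-3 decision: a SQL sink whose query argument is a non-constant `+` concatenation.
pub fn is_sql_concat_finding(selector: &str, args: &[Expr]) -> bool {
    if !SQL_SINKS.contains(&selector) {
        return false;
    }
    // Assumption: *Context variants carry the query in position 1, after ctx.
    let query_index = if selector.ends_with("Context") { 1 } else { 0 };
    match args.get(query_index) {
        Some(query) => {
            matches!(query, Expr::Binary { op: '+', .. }) && !is_constant_concat(query)
        }
        None => false,
    }
}
```

A real gate would walk tree-sitter nodes instead of this enum; the sketch only pins down the decision logic.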

Go-4 — security:unsafe_pointer_cast (Critical, 50 pts)
- Trigger: call_expression matching unsafe.Pointer(expr) inside a type conversion (*T)(unsafe.Pointer(...)) where the inner expression is not an address-of literal.
- Suppress if: inside a function named with ffi, cgo, or bridge.
- AST node: type_conversion_expression → call_expression{unsafe.Pointer}
- File: crates/forge/src/slop_hunter.rs::find_go_slop()
- CVE class: CWE-843 (type confusion), memory safety

Go-5 — security:path_traversal_http_serve (KevCritical, 150 pts)
- Trigger: call_expression matching http.ServeFile|os.Open|os.ReadFile where the path argument is a binary_expression{+} containing a variable.
- Suppress if: path is an interpreted_string_literal (constant).
- AST node: call_expression{http.ServeFile|os.Open} → argument → binary_expression{+}
- File: crates/forge/src/slop_hunter.rs::find_go_slop()
- CVE class: CWE-22 (path traversal), OWASP A01

Grammar Depth: Rust — 3 New Detection Rules

Rust current AST coverage: 2 rules — mem::transmute, raw pointer deref.

Rust-3 — security:unsafe_slice_from_raw_parts (Critical, 50 pts)
- Trigger: call_expression matching from_raw_parts|from_raw_parts_mut inside an unsafe block where the pointer argument is not an address-of expression (&arr[0] is acceptable; a variable is not).
- Suppress if: function name contains ffi, raw, sys, extern.
- AST node: unsafe_block → call_expression{from_raw_parts}
- File: crates/forge/src/slop_hunter.rs::find_rust_slop()
- CVE class: CWE-119 (buffer overflow / out-of-bounds read)

Rust-4 — security:smart_ptr_from_raw (Critical, 50 pts)
- Trigger: call_expression matching Box::from_raw|Arc::from_raw|Rc::from_raw inside an unsafe block; any argument.
- Suppress if: function name contains ffi, raw, extern.
- Rationale: from_raw reconstructs ownership from a raw pointer; misuse causes use-after-free or double-free.
- AST node: unsafe_block → call_expression{Box::from_raw|Arc::from_raw|Rc::from_raw}
- File: crates/forge/src/slop_hunter.rs::find_rust_slop()
- CVE class: CWE-416 (use-after-free), CWE-415 (double-free)

Rust-5 — security:process_command_injection (KevCritical, 150 pts)
- Trigger: call_expression matching Command::new(expr) where expr is not a string_literal, i.e., the executable name is user-influenced.
- Suppress if: function is #[test] or name contains test.
- Rationale: Command::new(user_input) is direct OS command injection with no shell interpolation needed; more dangerous than shell=True Python equivalents.
- AST node: call_expression{Command::new} → argument (non-literal)
- File: crates/forge/src/slop_hunter.rs::find_rust_slop()
- CVE class: CWE-78 (OS command injection)

Grammar Depth: Java — 3 New AST-Level Rules (Promoting Byte-Level)

Java current AST coverage: 0 rules. All detection is byte-level in slop_hunter.rs string patterns. Tree-sitter-java grammar is fully loaded. This is the highest-priority grammar gap in the workspace.

Java-1 — security:java_deserialization_gadget (KevCritical, 150 pts)
- Trigger: method_invocation matching readObject() where the receiver is an ObjectInputStream or a subclass.
- Suppress if: inside a function named test* or *Test.
- AST node: method_invocation{readObject} → primary{ObjectInputStream.*}
- File: crates/forge/src/slop_hunter.rs::find_java_slop() (new function)
- CVE class: CVE-2015-4852, CVE-2016-4463, hundreds of Java deser RCEs. This is the most exploited class in the Java ecosystem.

Java-2 — security:runtime_exec_injection (KevCritical, 150 pts)
- Trigger: method_invocation matching .exec(expr) on a receiver that is Runtime.getRuntime(), OR object_creation_expression{ProcessBuilder} where the constructor argument is not a string_literal.
- Suppress if: inside a function named test* or *Test.
- AST node: method_invocation{exec} → primary{Runtime.getRuntime()} or object_creation_expression{ProcessBuilder}
- File: crates/forge/src/slop_hunter.rs::find_java_slop()
- CVE class: CWE-78; Oracle JDK, Spring, Struts exploit chains

Java-3 — security:xxe_documentbuilder (Critical, 50 pts)
- Trigger: method_invocation{newInstance} on DocumentBuilderFactory where no setFeature("…disallow-doctype-decl…", true) call follows in the same method body.
- Suppress if: setFeature with FEATURE_SECURE_PROCESSING is present.
- AST node: method_invocation{newInstance} on DocumentBuilderFactory receiver without subsequent setFeature hardening call.
- File: crates/forge/src/slop_hunter.rs::find_java_slop()
- CVE class: CWE-611 (XXE); Spring, Android, Apache Commons
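Java-3 differs from the other rules because it is an absence check: the gate fires only when no hardening call follows newInstance in the same method body. A minimal sketch of that scan over a flattened call list is shown below. Call, is_hardening, and unhardened_factories are hypothetical names, and the real rule would operate on tree-sitter-java nodes rather than this toy enum.

```rust
/// Hypothetical flattened view of the calls in one Java method body, in source order.
#[derive(Debug, Clone, PartialEq)]
pub enum Call {
    /// Models DocumentBuilderFactory.newInstance().
    NewDocumentBuilderFactory,
    /// Models factory.setFeature(name, value); `name` is the argument's source text.
    SetFeature { name: String, value: bool },
    /// Any other call.
    Other,
}

/// True for a setFeature call that enables one of the two hardening features.
fn is_hardening(call: &Call) -> bool {
    match call {
        Call::SetFeature { name, value: true } => {
            name.contains("disallow-doctype-decl") || name.contains("FEATURE_SECURE_PROCESSING")
        }
        _ => false,
    }
}

/// Returns the indices of newInstance calls that have no hardening
/// setFeature call later in the same method body.
pub fn unhardened_factories(body: &[Call]) -> Vec<usize> {
    body.iter()
        .enumerate()
        .filter(|(i, call)| {
            **call == Call::NewDocumentBuilderFactory
                && !body[i + 1..].iter().any(is_hardening)
        })
        .map(|(i, _)| i)
        .collect()
}
```

Only calls after the factory creation count as hardening in this sketch, which mirrors the "follows in the same method body" wording of the trigger.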

Grammar Depth: Python — 3 New Detection Rules

Python current AST coverage: SQLi concatenation, SSRF dynamic URL, path traversal concatenation. Missing deserialization and shell-injection vectors.

Python-1 — security:pickle_deserialization (KevCritical, 150 pts)
- Trigger: call{pickle.loads|pickle.load} where the argument is not a bytes literal.
- Suppress if: inside a function named test* or *_test.
- AST node: call → attribute{pickle.loads|pickle.load}
- File: crates/forge/src/slop_hunter.rs::find_python_slop()
- CVE class: CWE-502; countless ML-pipeline supply chain attacks (pickle is the serialization format for PyTorch, scikit-learn model files)

Python-2 — security:yaml_unsafe_load (Critical, 50 pts)
- Trigger: call{yaml.load} where the keywords list does NOT contain a keyword_argument with key Loader.
- Suppress if: inside a function named test*.
- Rationale: yaml.load(data) without Loader=yaml.SafeLoader can construct arbitrary Python objects embedded in YAML, which enables code execution.
- AST node: call → attribute{yaml.load} → assert no Loader= keyword
- File: crates/forge/src/slop_hunter.rs::find_python_slop()
- CVE class: CVE-2017-18342; OWASP A08 (software/data integrity)

Python-3 — security:subprocess_shell_injection (KevCritical, 150 pts)
- Trigger: call{subprocess.Popen|subprocess.call|subprocess.run|subprocess.check_call} where a keyword_argument{shell=True} is present AND the first positional argument is not a string literal.
- Suppress if: inside a function named test*.
- Rationale: shell=True combined with a non-literal first arg passes user input to /bin/sh -c, enabling full OS command injection.
- AST node: call{subprocess.*} with keyword_argument{shell: True} and non-literal first arg
- File: crates/forge/src/slop_hunter.rs::find_python_slop()
- CVE class: CWE-78; the most common Python pentest finding
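The Python-3 rule is a conjunction of three conditions plus one suppression, which is easy to get subtly wrong. The following sketch encodes it over a simplified call record. PyCall, Arg, and is_shell_injection are hypothetical names for illustration and do not reflect the actual find_python_slop() internals.

```rust
/// Simplified model of a positional argument (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub enum Arg {
    StringLiteral(String),
    NonLiteral(String),
}

/// Simplified model of a Python call node (hypothetical type).
pub struct PyCall<'a> {
    pub callee: &'a str,
    pub positional: Vec<Arg>,
    /// (keyword, value source text) pairs, e.g. ("shell", "True").
    pub keywords: Vec<(&'a str, &'a str)>,
}

/// Callees named in the Python-3 trigger.
const SUBPROCESS_SINKS: [&str; 4] = [
    "subprocess.Popen",
    "subprocess.call",
    "subprocess.run",
    "subprocess.check_call",
];

/// Fires when a subprocess sink has shell=True and a non-literal first argument,
/// unless the enclosing function is named test*.
pub fn is_shell_injection(call: &PyCall<'_>, enclosing_fn: &str) -> bool {
    if enclosing_fn.starts_with("test") {
        return false; // suppression from the rule
    }
    let is_sink = SUBPROCESS_SINKS.contains(&call.callee);
    let shell_true = call
        .keywords
        .iter()
        .any(|(key, value)| *key == "shell" && *value == "True");
    let non_literal_first = matches!(call.positional.first(), Some(Arg::NonLiteral(_)));
    is_sink && shell_true && non_literal_first
}
```

A literal command with shell=True is deliberately not flagged here, matching the trigger's "not a string literal" condition.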

IDEA-002: Provenance-Aware KEV Escalation — Dependency × CVE Correlation

Class: Threat Intelligence Integration
Priority: P0
Inspired by: crates/anatomist/src/manifest.rs::find_version_silos_from_lockfile

Observation: The KEV gate (Severity::KevCritical, 150 pts) fires only when a patch contains a syntactic pattern matching a known exploit class. The most common real-world scenario is different: a dependency upgrade silently introduces a version that contains a CVE-listed vulnerability, with no change to the calling code.

Proposal: Extend janitor_dep_check to correlate the resolved dependency tree against the CISA KEV catalog (fetched via update-wisdom):

1. For each direct + transitive dep in Cargo.lock, query the local wisdom.db for KEV entries matching the crate + version range.
2. If a match is found, synthesize a SlopFinding with severity: KevCritical, category: "supply_chain:kev_dependency", and the CVE ID in description.
3. The finding is emitted into the bounce result even if the patch itself contains no dangerous code.

Security impact: Closes the gap between "dep is vulnerable" and "patch uses the vulnerable codepath." A cargo add <crate>@<version> that pulls in a KEV-listed transitive dep becomes a hard block at slop_score >= 150 before the PR is merged.

Implementation path: crates/anatomist/src/manifest.rs::check_kev_deps(lockfile, wisdom_db) → returns Vec<SlopFinding> with KevCritical severity → merged into PatchBouncer::bounce() result alongside structural findings.
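The correlation step can be illustrated with a small, dependency-free sketch. It assumes the lockfile has already been parsed into name/version pairs and that each KEV row carries a half-open affected range [introduced, fixed). LockedDep, KevEntry, the simplified SlopFinding, and this check_kev_deps signature are illustrative assumptions and do not reflect the real wisdom.db schema or the signature named in the implementation path.

```rust
/// One resolved package from Cargo.lock (hypothetical, already parsed).
#[derive(Debug, Clone, PartialEq)]
pub struct LockedDep {
    pub name: String,
    pub version: (u64, u64, u64),
}

/// Hypothetical KEV row: versions in [introduced, fixed) are affected.
pub struct KevEntry {
    pub cve_id: String,
    pub crate_name: String,
    pub introduced: (u64, u64, u64),
    pub fixed: (u64, u64, u64),
}

/// Simplified stand-in for the real SlopFinding type.
#[derive(Debug, Clone, PartialEq)]
pub struct SlopFinding {
    pub severity: &'static str,
    pub category: &'static str,
    pub description: String,
}

/// Parses "MAJOR.MINOR.PATCH"; pre-release and build metadata are not handled.
pub fn parse_version(s: &str) -> Option<(u64, u64, u64)> {
    let mut parts = s.split('.').map(|p| p.parse::<u64>().ok());
    let version = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() {
        None
    } else {
        Some(version)
    }
}

/// Emits one KevCritical finding per (dependency, KEV entry) match.
pub fn check_kev_deps(deps: &[LockedDep], kev: &[KevEntry]) -> Vec<SlopFinding> {
    let mut findings = Vec::new();
    for dep in deps {
        for entry in kev {
            // Tuple comparison is lexicographic, which matches version ordering here.
            let affected = entry.crate_name == dep.name
                && dep.version >= entry.introduced
                && dep.version < entry.fixed;
            if affected {
                findings.push(SlopFinding {
                    severity: "KevCritical",
                    category: "supply_chain:kev_dependency",
                    description: format!(
                        "{} {}.{}.{} matches {}",
                        dep.name, dep.version.0, dep.version.1, dep.version.2, entry.cve_id
                    ),
                });
            }
        }
    }
    findings
}
```

The nested loop is quadratic; a production version would more likely index KEV rows by crate name before scanning the dependency tree.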

VULN-01: Sovereign Governor Binary (Long-term)

Severity: Critical
Class: Infrastructure / Reliability

The short-term --soft-fail mode is [COMPLETED — v9.0.0]. The long-term sovereign deployment path is still open.

Long-term (v9.1.x) — Sovereign Governor binary: Package the Governor as a self-contained binary (janitor-gov) that the customer deploys inside their own VPC (EKS, GKE, or bare-metal). The SaaS Fly.io Governor becomes optional; janitor bounce --governor-url https://janitor-gov.internal routes to the on-prem instance. Stateless-first — PostgreSQL is optional; SQLite (janitor-gov --storage sqlite:///.janitor/gov.db) is the default for air-gapped deployments.

Definition of Done:
- just audit passes with Sovereign Governor binary crate skeleton
- cmd_bounce routes to --governor-url override when set
- Integration test: custom governor URL path validates end-to-end

VULN-04: `--deep-scan` Flag — Extended Parse Budget

Severity: High
Class: Detection Coverage / Evasion

Finding: Two circuit breakers create exploitable blind spots:
1. 1 MiB patch skip: files exceeding 1 MiB are skipped before tree-sitter parsing. A malicious actor can pad a payload past this threshold to guarantee bypass.
2. 500 ms parse timeout: adversarially crafted source can force a timeout, causing the file to be skipped with Severity::Exhaustion.

Solution — --deep-scan mode in janitor bounce:
- File size limit raised from 1 MiB to 32 MiB (configurable via [forge] deep_scan_max_bytes in janitor.toml)
- Parse timeout raised from 500 ms to 30 s per file
- Parallelism capped at Pulse::Constrict level (2 workers) to prevent OOM

Definition of Done:
- --deep-scan flag parsed in cmd_bounce; ForgeConfig gains deep_scan_max_bytes: Option<u64> and deep_scan_timeout_us: Option<u64>
- Unit test: 2 MiB synthetic file skipped on fast path, processed on deep-scan path
- cargo test covers Exhaustion retry logic under deep-scan
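A minimal sketch of how the two budgets might be resolved is shown below. The two ForgeConfig fields come from the Definition of Done; ParseBudget, parse_budget, should_skip, and the constant names are assumptions introduced for illustration, and the real ForgeConfig presumably has other fields.

```rust
use std::time::Duration;

// Fast-path and deep-scan limits quoted in the finding and solution above.
const FAST_MAX_BYTES: u64 = 1024 * 1024; // 1 MiB
const FAST_TIMEOUT: Duration = Duration::from_millis(500);
const DEEP_DEFAULT_MAX_BYTES: u64 = 32 * 1024 * 1024; // 32 MiB
const DEEP_DEFAULT_TIMEOUT: Duration = Duration::from_secs(30);

/// Only the two fields this proposal adds are modeled here.
#[derive(Debug, Default, Clone)]
pub struct ForgeConfig {
    pub deep_scan_max_bytes: Option<u64>,
    pub deep_scan_timeout_us: Option<u64>,
}

/// Effective per-file limits (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ParseBudget {
    pub max_bytes: u64,
    pub timeout: Duration,
}

/// Picks the fast-path limits unless --deep-scan is set, in which case
/// config overrides apply on top of the deep-scan defaults.
pub fn parse_budget(cfg: &ForgeConfig, deep_scan: bool) -> ParseBudget {
    if !deep_scan {
        return ParseBudget { max_bytes: FAST_MAX_BYTES, timeout: FAST_TIMEOUT };
    }
    ParseBudget {
        max_bytes: cfg.deep_scan_max_bytes.unwrap_or(DEEP_DEFAULT_MAX_BYTES),
        timeout: cfg
            .deep_scan_timeout_us
            .map(Duration::from_micros)
            .unwrap_or(DEEP_DEFAULT_TIMEOUT),
    }
}

/// Files strictly larger than the limit are skipped before parsing.
pub fn should_skip(file_len: u64, budget: &ParseBudget) -> bool {
    file_len > budget.max_bytes
}
```

With these helpers, the 2 MiB unit test from the Definition of Done reduces to checking should_skip against both budgets.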

P1 — Compliance / Zero-Upload

These entries unlock regulated-market deals (FedRAMP, DISA STIG, ISO 27001) and multi-SCM enterprises. Not the primary reason a CISO buys the product, but hard blocks on procurement if absent.

Executable Surface Gaps — 5 New Grammar Extensions

Current grammar coverage: 23 languages, all AST depth. Unmapped critical extensions in the enterprise corpus:

Rank	Extension	Count	Class	Risk
1	Dockerfile	∞	container	Critical — supply chain
2	xml	1 439	infra / config	Critical — XXE
3	proto	481	RPC contract	High — deser gadget
4	bzl, bazel	473	build system	High — unverified fetch
5	cmake	48	build system	High — build injection

Proposed AST gates:

Gate 1 — security:dockerfile_pipe_execution (Critical, 50 pts)
- Grammar: tree-sitter-dockerfile
- Trigger: RUN … | bash/sh
- Rationale: supply-chain execution; XZ Utils backdoor class.

Gate 2 — security:xxe_external_entity (Critical, 50 pts)
- Grammar: tree-sitter-xml
- Trigger: DOCTYPE … SYSTEM/PUBLIC
- Rationale: OWASP A05, CWE-611; Spring/Java/Android attack surface.

Gate 3 — security:protobuf_any_type_field (High, 50 pts)
- Grammar: tree-sitter-proto
- Trigger: google.protobuf.Any field in RPC message
- Rationale: arbitrary-message gadget chain via attacker-controlled type_url.

Gate 4 — security:bazel_unverified_http_archive (Critical, 50 pts)
- Grammar: tree-sitter-starlark
- Trigger: http_archive() without sha256
- Rationale: mirrors Nix-1 gate; supply-chain tarball substitution.

Gate 5 — security:cmake_execute_process_injection (High, 50 pts)
- Grammar: tree-sitter-cmake
- Trigger: execute_process(COMMAND ${VAR})
- Rationale: build-time RCE via user-controlled toolchain variable.

IDEA-003: Adversarial Grammar Stress Harness

Class: Defensive Hardening / Fuzzing
Priority: P1
Inspired by: PARSER_TIMEOUT_MICROS, Severity::Exhaustion, VULN-04

Observation: The 500 ms parse timeout exists because adversarially crafted source can drive tree-sitter into O(n²) or O(n³) parse time on certain grammar ambiguities. We have no systematic way to discover which inputs trigger worst-case parse behaviour across all 23 grammars before a real attacker does.

Proposal: Build a crates/fuzz target (cargo-fuzz / libFuzzer) for each grammar:
1. Takes arbitrary bytes as input.
2. Attempts to parse with the grammar under a 100 ms budget.
3. If the parse exhausts the budget, records the input as a new Severity::Exhaustion Crucible fixture.
4. Runs as a scheduled CI job (nightly, 30 min budget per grammar).

Implementation path: crates/fuzz/fuzz_targets/fuzz_grammar_<lang>.rs × 23 → crash corpus committed to crates/crucible/fixtures/exhaustion/
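The budget check at the heart of each fuzz target can be sketched with the standard library alone. In the sketch below the parser is passed in as a closure so that the timing logic is testable without tree-sitter or libFuzzer. ExhaustionFixture and probe_parse_budget are hypothetical names. Unlike a real parser timeout, this helper measures after the fact and does not interrupt a runaway parse.

```rust
use std::time::{Duration, Instant};

/// Hypothetical record of an input that blew the parse budget.
#[derive(Debug, Clone, PartialEq)]
pub struct ExhaustionFixture {
    pub lang: String,
    pub input: Vec<u8>,
    pub elapsed: Duration,
}

/// Runs `parse` on `input` and returns a fixture when it exceeds `budget`.
/// The parse is not cancelled; elapsed time is only measured after it returns.
pub fn probe_parse_budget<F>(
    lang: &str,
    input: &[u8],
    budget: Duration,
    parse: F,
) -> Option<ExhaustionFixture>
where
    F: FnOnce(&[u8]),
{
    let start = Instant::now();
    parse(input);
    let elapsed = start.elapsed();
    if elapsed > budget {
        Some(ExhaustionFixture {
            lang: lang.to_string(),
            input: input.to_vec(),
            elapsed,
        })
    } else {
        None
    }
}
```

Inside an actual cargo-fuzz target, the closure would wrap the grammar's parse call and any returned fixture would be written under crates/crucible/fixtures/exhaustion/.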

IDEA-004: HSM / KMS Integration for `--pqc-key`

Class: Compliance / Key Custody
Priority: P1
Inspired by: CT-006 (v9.1.0), FedRAMP/DISA STIG requirements

Observation: --pqc-key accepts only a path to raw private key bytes on disk. Enterprise FedRAMP and DISA STIG deployments require that private key material NEVER touch the runner filesystem — signing operations must be delegated to an HSM (PKCS#11) or cloud KMS.

Proposal (v9.2.x): Extend --pqc-key to accept:
- PKCS#11 URI: pkcs11:token=janitor;object=mlksa-key
- AWS KMS ARN: arn:aws:kms:us-east-1:123456789012:key/abc-...
- Azure Key Vault URI: https://vault.azure.net/keys/janitor-pqc/...

The file-path mode remains the default for air-gapped deployments. The KMS/HSM mode requires a thin shim crate (crates/pqc-kms).

Implementation path: crates/cli/src/main.rs — extend --pqc-key arg type; add PqcKeySource enum in crates/pqc-kms/src/lib.rs.
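One possible shape for the PqcKeySource enum is sketched below. The enum name comes from the implementation path; the variant names, the prefix-based classification, and the touches_filesystem helper are assumptions for illustration only. No signing or KMS calls are modeled.

```rust
use std::path::PathBuf;

/// Where the private key lives (variant names are hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum PqcKeySource {
    /// Raw key bytes on disk: the current default behaviour.
    File(PathBuf),
    /// PKCS#11 URI, e.g. "pkcs11:token=...;object=...".
    Pkcs11(String),
    /// AWS KMS key ARN.
    AwsKms(String),
    /// Azure Key Vault key URI.
    AzureKeyVault(String),
}

impl PqcKeySource {
    /// Classifies the --pqc-key argument by prefix; anything unrecognised
    /// falls back to the file-path mode.
    pub fn parse(arg: &str) -> Self {
        if arg.starts_with("pkcs11:") {
            PqcKeySource::Pkcs11(arg.to_string())
        } else if arg.starts_with("arn:aws:kms:") {
            PqcKeySource::AwsKms(arg.to_string())
        } else if arg.starts_with("https://") && arg.contains("vault.azure.net/keys/") {
            PqcKeySource::AzureKeyVault(arg.to_string())
        } else {
            PqcKeySource::File(PathBuf::from(arg))
        }
    }

    /// True when key material would be read from the runner filesystem.
    pub fn touches_filesystem(&self) -> bool {
        matches!(self, PqcKeySource::File(_))
    }
}
```

Falling back to File keeps existing invocations working unchanged, which fits the statement that file-path mode remains the default.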

VULN-03: `ScmContext` Abstraction

Severity: High
Class: Portability / Ecosystem

Finding: All env var resolution and webhook handling are coupled to GitHub's specific contract (GITHUB_SHA, GITHUB_REF, GitHub App installation IDs). This coupling locks the Janitor out of GitLab CI, Bitbucket Pipelines, and Azure DevOps, which together represent ~45% of the enterprise SCM market.

Solution — ScmContext struct in crates/common/src/scm.rs:

pub enum ScmProvider { GitHub, GitLab, Bitbucket, AzureDevOps, Generic }
pub struct ScmContext {
    pub provider: ScmProvider,
    pub commit_sha: String,
    pub repo_slug: String,
    pub pr_number: Option<u64>,
    pub base_ref: String,
    pub head_ref: String,
    pub token: Option<String>,
}
impl ScmContext { pub fn from_env() -> Self { … } }

Detection priority: GITLAB_CI → GitLab; BITBUCKET_BUILD_NUMBER → Bitbucket; TF_BUILD → Azure DevOps; GITHUB_ACTIONS → GitHub; else Generic.
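The detection order can be made testable by injecting the environment lookup instead of reading std::env directly. The sketch below covers only provider detection, not the full ScmContext. detect_provider and its closure parameter are hypothetical, and ScmProvider is repeated (with extra derives) so the block compiles on its own.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ScmProvider {
    GitHub,
    GitLab,
    Bitbucket,
    AzureDevOps,
    Generic,
}

/// Probes marker variables in the documented priority order and returns
/// the first provider whose variable is set; falls back to Generic.
pub fn detect_provider<F>(get_env: F) -> ScmProvider
where
    F: Fn(&str) -> Option<String>,
{
    // Order matters: earlier entries win when several markers are present.
    const PROBES: [(&str, ScmProvider); 4] = [
        ("GITLAB_CI", ScmProvider::GitLab),
        ("BITBUCKET_BUILD_NUMBER", ScmProvider::Bitbucket),
        ("TF_BUILD", ScmProvider::AzureDevOps),
        ("GITHUB_ACTIONS", ScmProvider::GitHub),
    ];
    PROBES
        .iter()
        .find(|(var, _)| get_env(*var).is_some())
        .map(|(_, provider)| *provider)
        .unwrap_or(ScmProvider::Generic)
}
```

In production code the closure would typically be |key| std::env::var(key).ok(), while unit tests can pass a fixture closure per provider.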

Definition of Done:
- ScmContext::from_env() detects all 4 providers in unit tests
- cmd_bounce uses ScmContext for all env var reads
- CI matrix tests GitHub + GitLab env fixture sets

P2 — Operational / CLI Ergonomics

DX improvements and maintenance items. Important for retention but not the primary purchasing decision driver. Implement after P0 and P1 queues drain.

IDEA-001: Semantic CST Diff Engine — Structural Patch Analysis

Class: Core Engine Enhancement
Priority: P2
Inspired by: crates/forge/src/hashing.rs::AstSimHasher, VULN-04

Observation: The current bounce engine operates on line-level unified diffs. Two diffs that are semantically identical can produce different line hashes, inflating the clone-detection false-negative rate. A malicious payload inserted via a whitespace-only formatting change can evade the diff pre-filter.

Proposal: Replace the line-diff input to find_slop() with a CST diff computed via tree-sitter's incremental parsing. Feed only the changed subtrees into the slop detectors — not the whole file diff.

Security impact: Eliminates whitespace-padding evasion. Enables sub-file granularity for the 1 MiB circuit breaker.

Implementation path: crates/forge/src/cst_diff.rs (new) → CstDelta { added: Vec<Node>, removed: Vec<Node> } → wire into PatchBouncer::bounce() as optional fast path via --cst-diff.

CT-001: `BounceLogEntry` Default Derive

Found during: VULN-01 Remediation
Location: crates/cli/src/report.rs
Issue: BounceLogEntry does not derive Default, forcing every callsite to enumerate all fields. Adding a new field requires updating every struct literal across main.rs, daemon.rs, git_drive.rs, cbom.rs, and test helpers.
Suggested fix: Derive Default on BounceLogEntry; switch existing struct literals to BounceLogEntry { field: value, ..Default::default() }.
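The suggested fix relies on Rust's struct update syntax. A minimal sketch follows; the three fields are placeholders chosen for illustration, not the real BounceLogEntry layout, and degraded_entry is a hypothetical helper.

```rust
/// Placeholder version of BounceLogEntry; the real struct has more fields.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct BounceLogEntry {
    pub commit_sha: String,
    pub slop_score: u32,
    pub governor_status: Option<String>,
}

/// Callsites name only the fields they care about; everything else
/// comes from Default, so adding a field no longer touches this code.
pub fn degraded_entry(commit_sha: &str) -> BounceLogEntry {
    BounceLogEntry {
        commit_sha: commit_sha.to_string(),
        governor_status: Some("degraded".to_string()),
        ..Default::default()
    }
}
```

One trade-off worth noting: with ..Default::default(), the compiler no longer forces callsites to consider a newly added field, so fields without a sensible default may deserve a constructor instead.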

CT-002: Degraded Attestation Has No SIEM Visibility Path

Found during: VULN-01 Remediation
Location: crates/cli/src/main.rs (soft_fail match arm)
Issue: When governor_status: "degraded" is written to the local NDJSON log, there is no mechanism to forward the degraded event to the configured webhook endpoint. A governance auditor reviewing the webhook feed would have no signal that some CI runs proceeded without attestation.
Suggested fix: Add a "degraded_attestation" event class to the WebhookConfig::events filter and call fire_webhook_if_configured in the soft-fail match arm.

CT-004: `just fast-release` Has No Audit Stamp Guard

Found during: Forward-Looking Telemetry
Location: justfile, fast-release recipe
Issue: fast-release skips the audit prerequisite on the honour-system assumption that the caller ran just audit first. An operator who invokes just fast-release directly will ship a binary that has never been audited.
Suggested fix: In just audit, write .janitor/audit_stamp containing the output of git rev-parse HEAD. In just fast-release, verify that .janitor/audit_stamp matches HEAD before proceeding; abort with an actionable error if not.
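The guard itself would most likely live in the justfile as shell, but the comparison it performs can be sketched in Rust for clarity. verify_audit_stamp is a hypothetical helper. The caller is assumed to supply the stamp file contents (None if the file is missing) and the current HEAD hash.

```rust
/// Compares the recorded audit stamp against the current HEAD commit.
/// Returns an actionable error message when the release must be blocked.
pub fn verify_audit_stamp(stamp: Option<&str>, head: &str) -> Result<(), String> {
    let head = head.trim();
    // Trailing newlines from the stamp file or git output are ignored.
    match stamp.map(str::trim) {
        None => Err("no .janitor/audit_stamp found; run `just audit` first".to_string()),
        Some(recorded) if recorded == head => Ok(()),
        Some(recorded) => Err(format!(
            "audit stamp {} does not match HEAD {}; re-run `just audit`",
            recorded, head
        )),
    }
}
```

Treating a missing stamp as a distinct error keeps the message specific for operators who have never run the audit at all.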

Continuous Telemetry — 2026-04-03 (CISO Pulse Audit, v9.1.1)

CT-007: `update-wisdom` has no CISA KEV diff / checklist export path

Found during: CISO Pulse & Autonomous Clock (v9.1.1)
Location: crates/cli/src/main.rs::cmd_update_wisdom
Issue: update-wisdom downloads a binary wisdom.rkyv file, a format that cannot be diffed in CI, queried with jq, or used to generate human-readable checklists. The CISA KEV sync workflow (cisa-kev-sync.yml) therefore fetches the CISA JSON directly from www.cisa.gov rather than using the wisdom registry. This bypasses the on-device sovereignty model and creates a split-path architecture: the wisdom registry and the KEV catalog are not unified.
Suggested fix: Add a --ci-mode flag to update-wisdom that, in addition to writing wisdom.rkyv, emits a JSON summary (.janitor/wisdom_manifest.json) listing all KEV entries in a diffable, human-readable format. The CISA sync workflow can then use this JSON file instead of fetching the CISA feed independently.

CT-008: C/C++ grammars have zero AST-level detectors

Found during: CISO Pulse grammar depth audit
Location: crates/forge/src/slop_hunter.rs
Issue: tree-sitter-c and tree-sitter-cpp are loaded into the polyglot registry but have no corresponding find_c_slop() or find_cpp_slop() functions in slop_hunter.rs. All C/C++ detection is byte-level (binary_hunter.rs). The following high-priority patterns have no AST gate: gets() (CWE-119), strcpy() / strcat() with non-literal dest (CWE-121), sprintf() / vsprintf() with non-literal format string (CWE-134), system() with non-literal arg (CWE-78).
Suggested fix (P0): Implement find_c_slop() and find_cpp_slop() for the four patterns above. C and C++ are the languages most represented in CVE exploits; having only byte-level detection is a credibility gap in enterprise security conversations.
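The four patterns reduce to a small lookup once the callee name and the literal-ness of the inspected argument are known. The sketch below shows that table as a pure function. c_call_cwe is a hypothetical name, and extracting the callee and argument kind from tree-sitter-c nodes is left out.

```rust
/// Maps a C call to the CWE named in CT-008, or None when no gate applies.
/// `inspected_arg_is_literal` refers to whichever argument the pattern
/// examines (format string for sprintf, command for system, and so on).
pub fn c_call_cwe(callee: &str, inspected_arg_is_literal: bool) -> Option<&'static str> {
    match (callee, inspected_arg_is_literal) {
        // gets() is flagged unconditionally.
        ("gets", _) => Some("CWE-119"),
        ("strcpy" | "strcat", false) => Some("CWE-121"),
        ("sprintf" | "vsprintf", false) => Some("CWE-134"),
        ("system", false) => Some("CWE-78"),
        _ => None,
    }
}
```

Keeping the decision table separate from the tree walk would let find_c_slop() and find_cpp_slop() share it.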