Srenix

Features · Action policy

The agent reasons. The policy decides what it can touch.

Every layer of Srenix's safety envelope is operator-defined and in code you can read before you install. Eight layers; here are all of them.

Komodor, Robusta, Causely, and NeuBird observe and suggest. Srenix mutates, and that changes the safety story: an agent that can actually change cluster state needs a policy that says which changes are allowed, where, and under which approval.

1. Action-kind allowlist

pkg/ai/types.go — ActionKind closed enum

The agent can only propose action_kinds from a closed enum compiled into pkg/ai — anything outside it is rejected by the validator before the proposal even reaches an operator. The full list today:

  • DeletePod Delete a single Pod so its controller reschedules it — the restart-class remediation.
  • DeleteJob Delete a stuck or failed Job so its owner (CronJob / operator) can recreate it.
  • PatchDeployment Patch a Deployment — every JSONPath in the patch must pass the patch validator allowlist (no image, no replica changes).
  • DeleteCertRequest Delete a stuck cert-manager CertificateRequest so issuance retries.
  • DeleteACMEOrder Delete a stuck ACME Order so the challenge re-runs.
  • ApplyManifest Apply a full manifest — constrained by ValidateManifest before it can ever reach an approval link.
  • ProposePullRequest No direct cluster mutation — the GitOps path: Srenix opens a pull request against your repo and the merge is the approval.
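As a rough illustration of the closed-enum idea, the sketch below rejects any action_kind outside a fixed set. The kind names come from the list above; the string representation, the `allowedKinds` map, and the `ValidateKind` helper are assumptions made for the example, not the contents of pkg/ai/types.go.

```go
package main

import "fmt"

// ActionKind is a closed set of remediation kinds. The names mirror the
// list above; the string values are an assumption of this sketch.
type ActionKind string

const (
	DeletePod          ActionKind = "DeletePod"
	DeleteJob          ActionKind = "DeleteJob"
	PatchDeployment    ActionKind = "PatchDeployment"
	DeleteCertRequest  ActionKind = "DeleteCertRequest"
	DeleteACMEOrder    ActionKind = "DeleteACMEOrder"
	ApplyManifest      ActionKind = "ApplyManifest"
	ProposePullRequest ActionKind = "ProposePullRequest"
)

// allowedKinds is fixed at compile time; nothing appends to it at runtime.
var allowedKinds = map[ActionKind]struct{}{
	DeletePod: {}, DeleteJob: {}, PatchDeployment: {}, DeleteCertRequest: {},
	DeleteACMEOrder: {}, ApplyManifest: {}, ProposePullRequest: {},
}

// ValidateKind (hypothetical helper) rejects anything outside the closed set,
// including near-misses such as different casing.
func ValidateKind(raw string) (ActionKind, error) {
	k := ActionKind(raw)
	if _, ok := allowedKinds[k]; !ok {
		return "", fmt.Errorf("action_kind %q is not in the allowlist", raw)
	}
	return k, nil
}

func main() {
	for _, raw := range []string{"DeletePod", "ScaleDeployment"} {
		if k, err := ValidateKind(raw); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Println("accepted:", k)
		}
	}
}
```

The point of the shape is that a proposal naming an unknown kind fails before any approval link could be generated for it.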

2. Target-resource scope

ai/approval/patch_validator.go — allow-list of JSONPaths

Each action_kind ships with a hard-coded target-resource filter. DeletePod only matches Pod resources. PatchDeployment payloads are restricted to a single allowed path: spec.template.metadata.annotations.kubectl.kubernetes.io/restartedAt (triggers a rollout restart) — not container images, not replicas. ApplyManifest payloads must pass ValidateManifest.
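One way to picture the single-path restriction is a validator that walks a decoded JSON merge patch and rejects it if any leaf falls outside the allowed path. This is a minimal sketch, assuming the patch arrives as a JSON merge patch; `ValidatePatch`, `leafPaths`, and the segment-based path representation are hypothetical and not taken from ai/approval/patch_validator.go.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// allowedPath is the one permitted leaf, held as segments because the
// annotation key itself contains dots.
var allowedPath = []string{
	"spec", "template", "metadata", "annotations",
	"kubectl.kubernetes.io/restartedAt",
}

// leafPaths collects the path to every leaf value in a decoded JSON object.
// An empty object counts as a leaf so that "{}" cannot slip through.
func leafPaths(v any, prefix []string, out *[][]string) {
	m, ok := v.(map[string]any)
	if !ok || len(m) == 0 {
		*out = append(*out, append([]string(nil), prefix...))
		return
	}
	for k, child := range m {
		leafPaths(child, append(prefix, k), out)
	}
}

func samePath(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// ValidatePatch (hypothetical) accepts a patch only if every leaf it sets
// is exactly the allowed path.
func ValidatePatch(raw []byte) error {
	var doc map[string]any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return fmt.Errorf("patch must be a JSON object: %w", err)
	}
	var leaves [][]string
	leafPaths(doc, nil, &leaves)
	for _, p := range leaves {
		if !samePath(p, allowedPath) {
			return fmt.Errorf("path %q is not on the allowlist", strings.Join(p, " > "))
		}
	}
	return nil
}

func main() {
	restart := `{"spec":{"template":{"metadata":{"annotations":{"kubectl.kubernetes.io/restartedAt":"2024-01-01T00:00:00Z"}}}}}`
	scale := `{"spec":{"replicas":0}}`
	fmt.Println("restart patch:", ValidatePatch([]byte(restart)))
	fmt.Println("scale patch:", ValidatePatch([]byte(scale)))
}
```

Checking every leaf, rather than only looking for the allowed one, is what stops a patch from smuggling an image or replica change alongside a legitimate restart annotation.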

3. Namespace protection

pkg/ai/protected.go — append-only extras over the validate.go floor

Protected namespaces are a compiled-in floor (kube-system, kube-public, kube-node-lease, rook-ceph, vault, external-secrets, cnpg-system, calico-system, tigera-operator) that the agent refuses to touch. Operators can extend the list, but only by appending: SRENIX_PROTECTED_NAMESPACES_EXTRA, rendered from Helm protectedNamespaces.extra or the operator CR, adds entries, and no setting can remove a compiled-in namespace. The fixer guard and the AI validator consume the same set. Detection is not suppressed in protected namespaces: findings there are escalated, not shielded.
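The append-only property can be sketched as a set built from a fixed floor plus operator extras, with no code path that deletes an entry. The namespace floor and the environment variable name come from the text above; the comma-separated format of the variable and the `ProtectedSet` and `IsProtected` helpers are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// floor is the compiled-in list quoted above.
var floor = []string{
	"kube-system", "kube-public", "kube-node-lease", "rook-ceph", "vault",
	"external-secrets", "cnpg-system", "calico-system", "tigera-operator",
}

// ProtectedSet (hypothetical) merges operator extras into the floor.
// Extras can only add entries: there is no syntax that removes one.
// A comma-separated value is an assumption of this sketch.
func ProtectedSet(extra string) map[string]struct{} {
	set := make(map[string]struct{}, len(floor))
	for _, ns := range floor {
		set[ns] = struct{}{}
	}
	for _, ns := range strings.Split(extra, ",") {
		if ns = strings.TrimSpace(ns); ns != "" {
			set[ns] = struct{}{}
		}
	}
	return set
}

// IsProtected reports whether the agent must refuse to act in ns.
func IsProtected(set map[string]struct{}, ns string) bool {
	_, ok := set[ns]
	return ok
}

func main() {
	set := ProtectedSet(os.Getenv("SRENIX_PROTECTED_NAMESPACES_EXTRA"))
	for _, ns := range []string{"kube-system", "default"} {
		fmt.Printf("%s protected: %v\n", ns, IsProtected(set, ns))
	}
}
```

Building one set and handing it to every consumer is also the simplest way to get the "same set" guarantee the text describes for the fixer guard and the AI validator.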

4. GitOps-managed skip

internal/fix/gitops.go — GitOpsReason()

Resources labelled by Argo CD, Flux, or Helm (app.kubernetes.io/managed-by, argocd.argoproj.io/instance, helm.sh/release) are skipped — the agent doesn't fight a reconciliation loop.
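A label check of this kind could look like the sketch below, which returns a human-readable reason when a resource should be skipped. The three label keys are the ones named above; the function name `gitOpsReason`, its signature, and the rule that mere presence of a key is enough are assumptions, not the actual GitOpsReason() in internal/fix/gitops.go.

```go
package main

import "fmt"

// managedLabelKeys are the label keys named in the text. Treating the mere
// presence of a key as "managed" is an assumption of this sketch.
var managedLabelKeys = []string{
	"app.kubernetes.io/managed-by",
	"argocd.argoproj.io/instance",
	"helm.sh/release",
}

// gitOpsReason (hypothetical signature) returns a non-empty reason when the
// resource should be skipped, and "" when the agent may consider it.
func gitOpsReason(labels map[string]string) string {
	for _, key := range managedLabelKeys {
		if v, ok := labels[key]; ok {
			return fmt.Sprintf("label %s=%q marks the resource as GitOps-managed", key, v)
		}
	}
	return ""
}

func main() {
	managed := map[string]string{"argocd.argoproj.io/instance": "checkout"}
	plain := map[string]string{"app": "checkout"}
	fmt.Printf("managed: %q\n", gitOpsReason(managed))
	fmt.Printf("plain:   %q\n", gitOpsReason(plain))
}
```

Returning a reason string rather than a bare boolean lets the skip be explained in whatever finding or audit record accompanies it.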

5. Signed JWT click-to-fix

ai/approval/{signer,verifier,replay}.go

Every T1+ mutation is wrapped in an Ed25519-signed JWT URL delivered to the operator via Slack or ticket. The approval-server verifies signature + expiry + JTI (one-time use). T2 plan steps additionally enforce prerequisite ordering at execution time: a step cannot run before its predecessor has executed.

Without the click, nothing mutates, unless you explicitly enable an autonomy tier:

  • PR auto-merge Off by default; Wilson lower bound ≥0.95; the PR body is re-fetched from the forge and its Ed25519 attestation re-verified and field-bound before merge.
  • In-cluster confidence-gated auto-apply Off by default; reversible-action allowlist; confidence ≥0.95.

Both require every safety gate to pass.

6. Dual-approval for T3

ai/approval/runbook_store.go — RecordApproval

Vault break-glass runbooks (T3) require two distinct approvers separated by at least 30 minutes. The runbook itself is never executed by Srenix; the operator runs it manually after dual approval. Runbooks contain key names only, never values: command templates use ${VALUE_*} placeholders where a value would go.

7. Hash-chained audit

pkg/audit/hash_chain.go (OSS)

Every AI action (LLM call, proposal created, validator decision, approval granted, action applied, post-apply verify) emits an AuditEvent with prev_hash chained against the prior event. Tamper-evident even if a downstream sink is compromised. The hash chain itself is Apache-2.0 OSS; the streaming sinks (JSONL, Loki, OTLP) ship in the paid binary.
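To make the chaining concrete, the sketch below hashes each event together with the previous event's hash, so editing any earlier record invalidates every link from that record onward. The `prev_hash` field name comes from the text; the other fields, the use of SHA-256 over a JSON encoding, and the `Append` and `Verify` helpers are assumptions rather than the layout in pkg/audit/hash_chain.go.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// AuditEvent is a simplified event; only prev_hash is named in the text.
type AuditEvent struct {
	Kind     string `json:"kind"`
	Detail   string `json:"detail"`
	PrevHash string `json:"prev_hash"`
	Hash     string `json:"hash"`
}

// digest hashes the event with its own Hash field blanked, so the hash
// covers the content and the link to the previous event.
func digest(e AuditEvent) string {
	e.Hash = ""             // e is a copy; the caller's event is untouched
	b, _ := json.Marshal(e) // cannot fail for a struct of strings
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// Append links a new event to the tail of the chain.
func Append(chain []AuditEvent, kind, detail string) []AuditEvent {
	prev := ""
	if n := len(chain); n > 0 {
		prev = chain[n-1].Hash
	}
	e := AuditEvent{Kind: kind, Detail: detail, PrevHash: prev}
	e.Hash = digest(e)
	return append(chain, e)
}

// Verify returns the index of the first event whose link or hash does not
// check out, or -1 if the chain is intact.
func Verify(chain []AuditEvent) int {
	prev := ""
	for i, e := range chain {
		if e.PrevHash != prev || digest(e) != e.Hash {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var chain []AuditEvent
	chain = Append(chain, "proposal_created", "DeletePod checkout/web-0")
	chain = Append(chain, "approval_granted", "approver=alice")
	chain = Append(chain, "action_applied", "DeletePod checkout/web-0")
	fmt.Println("intact chain, first bad index:", Verify(chain))

	chain[1].Detail = "approver=mallory" // simulate tampering in a sink
	fmt.Println("tampered chain, first bad index:", Verify(chain))
}
```

A chain like this detects edits and reordering; detecting truncation of the tail additionally requires anchoring the latest hash somewhere outside the sink.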

8. Per-(approver, class) rate budget

ai/rate_limit.go — TakeApproval (enforced in the approval-server)

Approval executions are token-bucket budgeted per (approver, action class) pair, with a default of 10 executions per hour per pair. A per-class override exists as a library-level construct in RateLimitConfig; no CLI/Helm knob exists yet. The bucket key is collision-safe, so a crafted approver/class pair cannot alias another pair's budget. Layer-2 LLM investigations draw on their own budget. Together these prevent both flapping-workload cost blowup and a single approver bulk-approving a class of actions.

Why this matters for an AI SRE.

An LLM-powered agent without policy is a liability: reasoning capability without a brake. Srenix's design lets the LLM reason freely, but only action_kinds from the closed allowlist, aimed at namespaces outside the protected set, can ever be executed, and by default only after a signed-JWT operator click (dual approval for break-glass).

That makes Srenix suitable for the kinds of deployments where pure-LLM autonomy is a non-starter: regulated industries, sovereign clouds, air-gapped clusters, FedRAMP track. The agent gets to be smart; the policy decides what smart looks like in your environment.
