Features · Action policy
The agent reasons. The policy decides what it can touch.
Every layer of Srenix's safety envelope is operator-defined and in code you can read before you install. Eight layers; here's all of them.
Komodor, Robusta, Causely, and NeuBird observe and suggest. Srenix mutates, and that changes the safety story. An agent that can actually change cluster state needs a policy that says which changes are allowed, where, and under which approval.
1. Action-kind allowlist
pkg/ai/types.go: ActionKind closed enum
The agent can only propose action_kinds from a closed enum compiled into pkg/ai. Anything outside it is rejected by the validator before the proposal even reaches an operator. The full list today:
- DeletePod: Delete a single Pod so its controller reschedules it (the restart-class remediation).
- DeleteJob: Delete a stuck or failed Job so its owner (CronJob / operator) can recreate it.
- PatchDeployment: Patch a Deployment. Every JSONPath in the patch must pass the patch validator allowlist (no image, no replica changes).
- DeleteCertRequest: Delete a stuck cert-manager CertificateRequest so issuance retries.
- DeleteACMEOrder: Delete a stuck ACME Order so the challenge re-runs.
- ApplyManifest: Apply a full manifest, constrained by ValidateManifest before it can ever reach an approval link.
- ProposePullRequest: No direct cluster mutation. This is the GitOps path: Srenix opens a pull request against your repo and the merge is the approval.
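A closed enum plus a validator is easy to picture in Go. The sketch below is illustrative only: the seven kind names come from the list above, but the type layout and the ValidateKind function are assumptions, not the contents of pkg/ai/types.go.

```go
package main

import "fmt"

// ActionKind mirrors the closed enum described above. The declarations are
// illustrative and not copied from pkg/ai/types.go.
type ActionKind string

const (
	DeletePod          ActionKind = "DeletePod"
	DeleteJob          ActionKind = "DeleteJob"
	PatchDeployment    ActionKind = "PatchDeployment"
	DeleteCertRequest  ActionKind = "DeleteCertRequest"
	DeleteACMEOrder    ActionKind = "DeleteACMEOrder"
	ApplyManifest      ActionKind = "ApplyManifest"
	ProposePullRequest ActionKind = "ProposePullRequest"
)

// allowedKinds is fixed at compile time; nothing at runtime can extend it.
var allowedKinds = map[ActionKind]struct{}{
	DeletePod: {}, DeleteJob: {}, PatchDeployment: {}, DeleteCertRequest: {},
	DeleteACMEOrder: {}, ApplyManifest: {}, ProposePullRequest: {},
}

// ValidateKind (hypothetical name) rejects any proposed kind outside the enum.
func ValidateKind(raw string) (ActionKind, error) {
	kind := ActionKind(raw)
	if _, ok := allowedKinds[kind]; !ok {
		return "", fmt.Errorf("action kind %q is not in the allowlist", raw)
	}
	return kind, nil
}

func main() {
	for _, raw := range []string{"DeletePod", "ScaleDeployment"} {
		if _, err := ValidateKind(raw); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", raw)
	}
}
```

Because the set is a compiled-in map rather than configuration, widening it means shipping a new binary, which is the property the closed enum is there to provide.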
2. Target-resource scope
ai/approval/patch_validator.go: allowlist of JSONPaths
Each action_kind ships with a hard-coded target-resource filter. DeletePod only matches Pod resources. PatchDeployment payloads are restricted to a single allowed path, spec.template.metadata.annotations.kubectl.kubernetes.io/restartedAt, which triggers a rollout restart; container images and replica counts cannot be patched. ApplyManifest payloads must pass ValidateManifest.
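To make the path allowlist concrete, here is a minimal sketch of how a patch could be checked against the single restartedAt path. It assumes the patch arrives as a JSON merge-style document; ValidatePatch and its helpers are hypothetical names, and the real patch_validator.go may work differently.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// allowedPath is the one path the text names, held as segments because the
// annotation key itself contains dots.
var allowedPath = []string{"spec", "template", "metadata", "annotations", "kubectl.kubernetes.io/restartedAt"}

// leafPaths returns the segment path of every leaf value in the patch.
// Empty objects count as leaves so they cannot slip through unchecked.
func leafPaths(node map[string]any, prefix []string) [][]string {
	var out [][]string
	for key, val := range node {
		path := append(append([]string{}, prefix...), key)
		if child, ok := val.(map[string]any); ok && len(child) > 0 {
			out = append(out, leafPaths(child, path)...)
			continue
		}
		out = append(out, path)
	}
	return out
}

func samePath(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// ValidatePatch (hypothetical name) accepts a patch only if every leaf it
// sets is the allowlisted restartedAt annotation.
func ValidatePatch(raw []byte) error {
	var patch map[string]any
	if err := json.Unmarshal(raw, &patch); err != nil {
		return fmt.Errorf("decode patch: %w", err)
	}
	paths := leafPaths(patch, nil)
	if len(paths) == 0 {
		return errors.New("patch is empty")
	}
	for _, p := range paths {
		if !samePath(p, allowedPath) {
			return fmt.Errorf("path %q is not allowlisted", strings.Join(p, "/"))
		}
	}
	return nil
}

func main() {
	restart := `{"spec":{"template":{"metadata":{"annotations":{"kubectl.kubernetes.io/restartedAt":"2024-01-01T00:00:00Z"}}}}}`
	scale := `{"spec":{"replicas":5}}`
	fmt.Println("restart patch:", ValidatePatch([]byte(restart)))
	fmt.Println("scale patch:", ValidatePatch([]byte(scale)))
}
```

The check fails closed: a patch that mixes the allowed annotation with any other field is rejected as a whole rather than trimmed.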
3. Namespace protection
pkg/ai/protected.go: append-only extras over the validate.go floor
Protected namespaces are a compiled-in floor (kube-system, kube-public, kube-node-lease, rook-ceph, vault, external-secrets, cnpg-system, calico-system, tigera-operator) that the agent refuses to touch. Operators can extend the list, but only by appending: extras arrive through SRENIX_PROTECTED_NAMESPACES_EXTRA, rendered from Helm protectedNamespaces.extra or the operator CR, so no configuration can ever remove a compiled-in namespace. The fixer guard and the AI validator consume the same set, and the detect side escalates (rather than shields) findings in protected namespaces.
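The append-only property can be sketched in a few lines. The floor list and the environment variable name are taken from the text; the comma-separated format of the variable and the ProtectedSet function are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// floor is the compiled-in list quoted in the text.
var floor = []string{
	"kube-system", "kube-public", "kube-node-lease", "rook-ceph", "vault",
	"external-secrets", "cnpg-system", "calico-system", "tigera-operator",
}

// ProtectedSet (hypothetical name) returns the floor plus any extras.
// Assumption: extras are a comma-separated list. There is deliberately no
// syntax for removal, so input can only grow the set.
func ProtectedSet(extra string) map[string]struct{} {
	set := make(map[string]struct{}, len(floor))
	for _, ns := range floor {
		set[ns] = struct{}{}
	}
	for _, ns := range strings.Split(extra, ",") {
		if ns = strings.TrimSpace(ns); ns != "" {
			set[ns] = struct{}{}
		}
	}
	return set
}

func main() {
	set := ProtectedSet(os.Getenv("SRENIX_PROTECTED_NAMESPACES_EXTRA"))
	for _, ns := range []string{"kube-system", "default"} {
		_, protected := set[ns]
		fmt.Printf("%s protected: %v\n", ns, protected)
	}
}
```

Building one set and handing it to both the fixer guard and the validator is what keeps the two from drifting apart.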
4. GitOps-managed skip
internal/fix/gitops.go: GitOpsReason()
Resources labelled by Argo CD, Flux, or Helm (app.kubernetes.io/managed-by, argocd.argoproj.io/instance, helm.sh/release) are skipped, so the agent doesn't fight a reconciliation loop.
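A label-based skip check might look like the sketch below. The function name GitOpsReason and the three label keys come from the text; the signature, the returned message, and the choice to treat mere presence of a key as "managed" are assumptions.

```go
package main

import "fmt"

// gitOpsLabelKeys lists the label keys named in the text, checked in order.
var gitOpsLabelKeys = []string{
	"app.kubernetes.io/managed-by",
	"argocd.argoproj.io/instance",
	"helm.sh/release",
}

// GitOpsReason returns a non-empty reason when the resource carries one of
// the marker labels, and "" when the agent may act on it. Assumption: any
// value for a marker key counts as managed.
func GitOpsReason(labels map[string]string) string {
	for _, key := range gitOpsLabelKeys {
		if val, ok := labels[key]; ok {
			return fmt.Sprintf("managed by GitOps tooling (%s=%s)", key, val)
		}
	}
	return ""
}

func main() {
	managed := map[string]string{"app": "checkout", "argocd.argoproj.io/instance": "shop"}
	unmanaged := map[string]string{"app": "scratch"}
	for _, labels := range []map[string]string{managed, unmanaged} {
		if reason := GitOpsReason(labels); reason != "" {
			fmt.Println("skip:", reason)
			continue
		}
		fmt.Println("eligible for remediation")
	}
}
```

Returning a reason string instead of a bare boolean lets the skip be reported to the operator alongside the finding.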
5. Signed JWT click-to-fix
ai/approval/{signer,verifier,replay}.go
Every T1+ mutation is wrapped in an Ed25519-signed JWT URL delivered to the operator via Slack or ticket. The approval-server verifies signature, expiry, and JTI (one-time use). T2 plan steps additionally enforce prerequisite ordering at execution time: a step cannot run before its predecessor has executed.
Without the click, nothing mutates, unless you explicitly enable an autonomy tier:
- PR auto-merge (off by default; Wilson lower bound ≥0.95; the PR body is re-fetched from the forge and its Ed25519 attestation re-verified and field-bound before merge)
- In-cluster confidence-gated auto-apply (off by default; reversible-action allowlist; confidence ≥0.95)
Both require every safety gate to pass.
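The three checks (signature, expiry, one-time JTI) can be sketched with the Go standard library alone. This is not Srenix's signer or verifier: the Claims fields, the in-memory replay set, and every function name here are assumptions, and a production verifier would persist used JTIs and validate more claims.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"sync"
	"time"
)

// Claims holds the fields this sketch checks; "action" is a hypothetical claim.
type Claims struct {
	JTI    string `json:"jti"`
	Exp    int64  `json:"exp"` // Unix seconds
	Action string `json:"action"`
}

var b64 = base64.RawURLEncoding

// NewKeyPair generates a throwaway Ed25519 key pair for the demo.
func NewKeyPair() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

// Sign builds a compact header.payload.signature token with alg EdDSA.
func Sign(priv ed25519.PrivateKey, c Claims) string {
	payload, err := json.Marshal(c)
	if err != nil {
		panic(err) // Claims holds only strings and an integer
	}
	input := b64.EncodeToString([]byte(`{"alg":"EdDSA","typ":"JWT"}`)) + "." + b64.EncodeToString(payload)
	return input + "." + b64.EncodeToString(ed25519.Sign(priv, []byte(input)))
}

// Verifier only ever verifies with Ed25519, so the header's alg field is
// never trusted to pick an algorithm.
type Verifier struct {
	pub  ed25519.PublicKey
	now  func() time.Time
	mu   sync.Mutex
	used map[string]struct{} // JTIs already redeemed
}

func NewVerifier(pub ed25519.PublicKey) *Verifier {
	return &Verifier{pub: pub, now: time.Now, used: map[string]struct{}{}}
}

// Verify checks signature, then expiry, then one-time use, in that order.
func (v *Verifier) Verify(token string) (Claims, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return Claims{}, errors.New("malformed token")
	}
	sig, err := b64.DecodeString(parts[2])
	if err != nil || !ed25519.Verify(v.pub, []byte(parts[0]+"."+parts[1]), sig) {
		return Claims{}, errors.New("bad signature")
	}
	payload, err := b64.DecodeString(parts[1])
	if err != nil {
		return Claims{}, errors.New("bad payload encoding")
	}
	var c Claims
	if err := json.Unmarshal(payload, &c); err != nil {
		return Claims{}, fmt.Errorf("bad claims: %w", err)
	}
	if c.JTI == "" || v.now().Unix() >= c.Exp {
		return Claims{}, errors.New("token expired or missing jti")
	}
	v.mu.Lock()
	defer v.mu.Unlock()
	if _, seen := v.used[c.JTI]; seen {
		return Claims{}, errors.New("token already used")
	}
	v.used[c.JTI] = struct{}{}
	return c, nil
}

func main() {
	pub, priv := NewKeyPair()
	v := NewVerifier(pub)
	token := Sign(priv, Claims{JTI: "demo-1", Exp: time.Now().Add(5 * time.Minute).Unix(), Action: "DeletePod"})
	_, first := v.Verify(token)
	_, second := v.Verify(token)
	fmt.Println("first click error:", first)
	fmt.Println("second click error:", second)
}
```

The JTI is recorded only after every other check passes, so a forged or expired link cannot burn a legitimate token's single use.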
6. Dual-approval for T3
ai/approval/runbook_store.go: RecordApproval
Vault break-glass runbooks (T3) require approvals from two distinct approvers, at least 30 minutes apart. The runbook itself is never executed by Srenix; the operator runs it manually after dual approval. Runbooks reference key names only, never values, and command templates use ${VALUE_*} placeholders.
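The dual-approval rule reduces to two checks on the second approval: a different approver, and at least 30 minutes elapsed. The sketch below shows one way to express that. RecordApproval is the name the text gives; its signature and the surrounding types are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// minSeparation is the 30-minute gap quoted in the text.
const minSeparation = 30 * time.Minute

type Approval struct {
	Approver string
	At       time.Time
}

// RunbookApprovals (hypothetical type) tracks approvals for one runbook.
type RunbookApprovals struct {
	first    *Approval
	complete bool
}

// RecordApproval reports true once two distinct approvers have approved at
// least minSeparation apart. A rejected second approval leaves the first
// one in place.
func (r *RunbookApprovals) RecordApproval(approver string, at time.Time) (bool, error) {
	switch {
	case approver == "":
		return false, errors.New("approver must be named")
	case r.complete:
		return true, nil
	case r.first == nil:
		r.first = &Approval{Approver: approver, At: at}
		return false, nil
	case approver == r.first.Approver:
		return false, errors.New("second approval must come from a different approver")
	case at.Sub(r.first.At) < minSeparation:
		return false, fmt.Errorf("second approval must come at least %s after the first", minSeparation)
	}
	r.complete = true
	return true, nil
}

func main() {
	var rb RunbookApprovals
	start := time.Date(2024, 1, 1, 9, 0, 0, 0, time.UTC)
	fmt.Println(rb.RecordApproval("alice", start))
	fmt.Println(rb.RecordApproval("alice", start.Add(time.Hour)))
	fmt.Println(rb.RecordApproval("bob", start.Add(10*time.Minute)))
	fmt.Println(rb.RecordApproval("bob", start.Add(45*time.Minute)))
}
```

In this sketch, approval only unlocks the runbook for a human to run; nothing here executes it, matching the rule that Srenix never runs T3 runbooks itself.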
7. Hash-chained audit
pkg/audit/hash_chain.go (OSS)
Every AI action (LLM call, proposal created, validator decision, approval granted, action applied, post-apply verify) emits an AuditEvent with prev_hash chained against the prior event. The chain stays tamper-evident even if a downstream sink is compromised. The hash chain itself is Apache-2.0 OSS; the streaming sinks (JSONL, Loki, OTLP) ship in the paid binary.
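A hash chain is simple enough to show end to end. The sketch below assumes SHA-256 over a JSON encoding of each event; the AuditEvent and prev_hash names come from the text, while the other fields, the hash function, and the encoding are assumptions rather than what pkg/audit actually does.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// AuditEvent is an illustrative shape; only prev_hash is named in the text.
type AuditEvent struct {
	Seq      int    `json:"seq"`
	Kind     string `json:"kind"`
	Detail   string `json:"detail"`
	PrevHash string `json:"prev_hash"`
	Hash     string `json:"hash,omitempty"`
}

// digest hashes every field except Hash itself. JSON encoding keeps field
// boundaries unambiguous, unlike plain string concatenation.
func digest(e AuditEvent) string {
	e.Hash = "" // e is a copy, so the caller's event is untouched
	raw, err := json.Marshal(e)
	if err != nil {
		panic(err) // the struct holds only an int and strings
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:])
}

type Chain struct{ Events []AuditEvent }

// Append links the new event to the hash of the previous one.
func (c *Chain) Append(kind, detail string) {
	e := AuditEvent{Seq: len(c.Events), Kind: kind, Detail: detail}
	if n := len(c.Events); n > 0 {
		e.PrevHash = c.Events[n-1].Hash
	}
	e.Hash = digest(e)
	c.Events = append(c.Events, e)
}

// FirstBroken returns the index of the first event that fails verification,
// or -1 when the whole chain is intact.
func FirstBroken(events []AuditEvent) int {
	prev := ""
	for i, e := range events {
		if e.Seq != i || e.PrevHash != prev || e.Hash != digest(e) {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var c Chain
	c.Append("proposal_created", "DeletePod payments/api-7f9c")
	c.Append("approval_granted", "approver=alice")
	c.Append("action_applied", "DeletePod payments/api-7f9c")
	fmt.Println("intact chain:", FirstBroken(c.Events))

	c.Events[1].Detail = "approver=mallory" // simulate tampering in a sink
	fmt.Println("after tampering:", FirstBroken(c.Events))
}
```

Editing or dropping an event in the middle breaks every later link. A chain on its own does not reveal truncation of the newest events, so verifiers typically also compare against a separately stored head hash.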
8. Per-(approver, class) rate budget
ai/rate_limit.go: TakeApproval (enforced in the approval-server)
Approval executions are token-bucket budgeted per (approver, action class) pair. The default is 10 executions per hour per pair; a per-class override exists only as a library-level construct in RateLimitConfig, with no CLI/Helm knob yet. The bucket key is collision-safe, so a crafted approver/class pair cannot alias another pair's budget. Layer-2 LLM investigations draw on their own budget. This prevents both flapping-workload cost blowup and a single approver bulk-approving a class of actions.
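The sketch below shows a per-pair token bucket with the default of 10 per hour. TakeApproval is the name the text uses; the signature, the Limiter type, and the use of a struct map key for collision safety are assumptions about one reasonable way to build it.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

// window is the refill period for a full bucket (one hour, per the text).
const window = time.Hour

// pairKey keeps approver and class as separate fields. Unlike a joined
// string such as approver+"|"+class, two different pairs can never produce
// the same key.
type pairKey struct{ approver, class string }

type bucket struct {
	tokens float64
	last   time.Time
}

type Limiter struct {
	mu       sync.Mutex
	capacity float64
	buckets  map[pairKey]*bucket
}

func NewLimiter(perWindow int) *Limiter {
	return &Limiter{capacity: float64(perWindow), buckets: map[pairKey]*bucket{}}
}

// TakeApproval spends one token from the (approver, class) bucket and
// reports whether the approval may execute.
func (l *Limiter) TakeApproval(approver, class string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	key := pairKey{approver, class}
	b, ok := l.buckets[key]
	if !ok {
		b = &bucket{tokens: l.capacity, last: now}
		l.buckets[key] = b
	}
	// Refill in proportion to elapsed time, capped at the bucket capacity.
	if elapsed := now.Sub(b.last); elapsed > 0 {
		refill := l.capacity * float64(elapsed) / float64(window)
		b.tokens = math.Min(l.capacity, b.tokens+refill)
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	l := NewLimiter(10)
	now := time.Date(2024, 1, 1, 9, 0, 0, 0, time.UTC)
	granted := 0
	for i := 0; i < 12; i++ {
		if l.TakeApproval("alice", "restart", now) {
			granted++
		}
	}
	fmt.Println("granted in one burst:", granted)
	fmt.Println("other class still allowed:", l.TakeApproval("alice", "patch", now))
	fmt.Println("allowed an hour later:", l.TakeApproval("alice", "restart", now.Add(window)))
}
```

Keying on a struct rather than a concatenated string is what makes the aliasing attack described above impossible by construction: ("a|b", "c") and ("a", "b|c") are distinct keys.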
Why this matters for an AI SRE.
An LLM-powered agent without policy is a liability: reasoning capability without a brake. Srenix's design lets the LLM reason about anything, but only allowlisted action_kinds against namespaces outside the protected set can ever be executed, and by default only after a signed-JWT operator click (dual approval for break-glass).
That makes Srenix suitable for the kinds of deployments where pure-LLM autonomy is a non-starter: regulated industries, sovereign clouds, air-gapped clusters, FedRAMP track. The agent gets to be smart; the policy decides what smart looks like in your environment.