AI Tiers
Four LLM-augmented capabilities; every capability that proposes an action is approval-gated. Available in the paid srenix-enterprise binary, with the same policy bounds and RBAC ceiling as the OSS engine.
The paid srenix-enterprise binary runs alongside the OSS watcher — it never replaces it. AI tiers add an LLM layer on top of OSS findings. The architectural invariant: the paid binary cannot exceed the OSS RBAC ceiling. Every mutation flows through the same snapshot.Mutator interface and the operator-defined action policy.
Nothing mutates without a human click — unless you explicitly enable an autonomy tier: either PR auto-merge (off by default; Wilson lower bound ≥0.95; every safety gate must pass) or in-cluster confidence-gated auto-apply (off by default; reversible-action allowlist; confidence ≥0.95 — see below). T1 and above require an approval action — a signed URL delivered to Slack or your ticket. Without the click (or that explicit opt-in), no cluster state changes.
Tier reference
| Capability | What it does | Approval required |
| --- | --- | --- |
| Diagnostic narrative | Enriches every DriftReport finding with an LLM-generated narrative summary: what happened, why it matters, and the most likely root cause. Read-only; no action proposed. | None |
| Fix proposals | Proposes a specific action (bounded to the operator-defined action_kind policy) and delivers a signed click-to-fix URL to Slack or the ticket. Nothing mutates until the URL is clicked, unless you explicitly enable an autonomy tier (PR auto-merge or in-cluster confidence-gated auto-apply; both off by default). | One-click signed URL |
| Multi-step planner | For complex findings, proposes a plan of up to 5 prerequisite-linked steps. Each step requires its own approval click. Steps are linked: later steps can reference earlier results. | Per-step signed URL |
| Vault runbook proposer | Proposes break-glass Vault runbooks for Vault-related outages. Delivered as a structured document, never auto-run. Requires dual approval before any Vault path is touched. | Dual approval |
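The planner's constraints can be expressed as a small validator: at most 5 steps, prerequisites that point at earlier steps, and one approval per step. This is a sketch under assumptions. `Step`, `ValidatePlan` and `Runnable` are hypothetical, and prerequisites are modeled as indexes of earlier steps.

```go
package main

import (
	"errors"
	"fmt"
)

const maxPlanSteps = 5 // limit stated in the tier reference

// Step is a hypothetical plan step; Prereqs holds indexes of earlier steps.
type Step struct {
	Action   string
	Prereqs  []int
	Approved bool // set only by this step's own approval click
	Done     bool
}

// ValidatePlan enforces the size limit and that prerequisites only point
// backwards, which also rules out cycles.
func ValidatePlan(steps []Step) error {
	if len(steps) == 0 || len(steps) > maxPlanSteps {
		return fmt.Errorf("plan must have 1..%d steps, got %d", maxPlanSteps, len(steps))
	}
	for i, s := range steps {
		for _, p := range s.Prereqs {
			if p < 0 || p >= i {
				return fmt.Errorf("step %d: prerequisite %d is not an earlier step", i, p)
			}
		}
	}
	return nil
}

// Runnable reports whether step i may execute. It assumes the plan already
// passed ValidatePlan: the step needs its own approval, and every
// prerequisite must be done.
func Runnable(steps []Step, i int) (bool, error) {
	if i < 0 || i >= len(steps) {
		return false, errors.New("step index out of range")
	}
	if !steps[i].Approved {
		return false, nil
	}
	for _, p := range steps[i].Prereqs {
		if !steps[p].Done {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	plan := []Step{
		{Action: "DeleteCertRequest"},
		{Action: "DeletePod", Prereqs: []int{0}},
	}
	fmt.Println(ValidatePlan(plan))
	plan[1].Approved = true
	ok, _ := Runnable(plan, 1)
	fmt.Println(ok) // false: step 0 has not run yet
}
```

Requiring every prerequisite to point strictly backwards keeps the plan acyclic without a separate cycle check.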
Enabling AI tiers
AI tiers are additive — the same Helm chart, plus ai.enabled=true. The OSS watcher workloads are untouched.
```shell
helm upgrade srenix srenix/agentic-sre --reuse-values \
  --set ai.enabled=true \
  --set ai.tier=t1 \
  --set ai.endpoint=https://your-llm-endpoint/v1 \
  --set ai.model=your-model-name \
  --set ai.apiKey.secretName=srenix-ai-llm-key
```
Confidence-gated auto-apply (--autonomy)
Separate from PR auto-merge, the srenix-enterprise watch loop has an opt-in in-cluster auto-apply tier. It is OFF by default. You enable it with the --autonomy flag, which also requires --memory-store-url for the confidence signal. When a proposal qualifies, it is applied without a human click — but only when every gate below passes (defense-in-depth; the first failing gate is reported and audited):
- Confidence ≥ threshold. Default 0.95 (configurable via --autonomy-min-confidence). Confidence is max(cosine similarity to a verifiably-cleared prior fix, Wilson lower bound of the action class's success rate).
- Reversible low-risk action. The action kind must be in the --autonomy-allow allowlist. Default: DeletePod, DeleteCertRequest, DeleteACMEOrder (controller-recreated / reversible). It explicitly excludes PatchDeployment and DeleteJob.
- Unprotected namespace. The target namespace must not be a protected namespace.
- Circuit breaker closed. If the breaker has tripped, autonomy is suspended until it resets.
- A near-identical prior fix verifiably cleared. The post-apply verifier must have confirmed a matching prior fix actually resolved the finding.
The zero value is safe: with --autonomy unset, nothing ever auto-applies. The allowlist is intentionally limited to reversible actions, and the live-state precondition is re-validated at apply time, so a stale prior fix can never push through a change that is no longer valid.
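The gate chain and its safe zero value can be modeled as an ordered list of checks that stops at the first failure. This is a sketch under assumptions:

- `AutonomyConfig`, `Proposal`, `Decide` and their field names are hypothetical.
- The protected-namespace set shown is illustrative only.
- The apply-time live-state re-validation is reduced to a boolean the caller computes.

```go
package main

import "fmt"

// AutonomyConfig is a hypothetical mirror of the --autonomy flags. Its zero
// value has Enabled=false, so nothing ever auto-applies.
type AutonomyConfig struct {
	Enabled             bool            // --autonomy
	MinConfidence       float64         // --autonomy-min-confidence
	Allow               map[string]bool // --autonomy-allow
	ProtectedNamespaces map[string]bool // illustrative; not documented here
}

// DefaultAutonomyConfig fills in the documented defaults but leaves
// autonomy disabled.
func DefaultAutonomyConfig() AutonomyConfig {
	return AutonomyConfig{
		MinConfidence: 0.95,
		Allow: map[string]bool{
			"DeletePod": true, "DeleteCertRequest": true, "DeleteACMEOrder": true,
		},
		ProtectedNamespaces: map[string]bool{"kube-system": true}, // hypothetical
	}
}

// Proposal carries the facts each gate inspects (hypothetical fields).
type Proposal struct {
	Kind              string
	Namespace         string
	Confidence        float64
	BreakerTripped    bool
	PriorFixVerified  bool
	PreconditionHolds bool // live state re-checked at apply time
}

type gate struct {
	name string
	pass func(AutonomyConfig, Proposal) bool
}

// gates run in order; Decide stops at the first failure.
var gates = []gate{
	{"autonomy enabled", func(c AutonomyConfig, _ Proposal) bool { return c.Enabled }},
	{"confidence", func(c AutonomyConfig, p Proposal) bool { return p.Confidence >= c.MinConfidence }},
	{"reversible action", func(c AutonomyConfig, p Proposal) bool { return c.Allow[p.Kind] }},
	{"unprotected namespace", func(c AutonomyConfig, p Proposal) bool { return !c.ProtectedNamespaces[p.Namespace] }},
	{"circuit breaker closed", func(_ AutonomyConfig, p Proposal) bool { return !p.BreakerTripped }},
	{"prior fix verified", func(_ AutonomyConfig, p Proposal) bool { return p.PriorFixVerified }},
	{"live-state precondition", func(_ AutonomyConfig, p Proposal) bool { return p.PreconditionHolds }},
}

// Decide returns ("", true) when every gate passes; otherwise it returns the
// name of the first failing gate, for reporting and audit.
func Decide(c AutonomyConfig, p Proposal) (string, bool) {
	for _, g := range gates {
		if !g.pass(c, p) {
			return g.name, false
		}
	}
	return "", true
}

func main() {
	p := Proposal{Kind: "DeletePod", Namespace: "web", Confidence: 0.97,
		PriorFixVerified: true, PreconditionHolds: true}
	fmt.Println(Decide(AutonomyConfig{}, p)) // zero value: blocked at the first gate
	cfg := DefaultAutonomyConfig()
	cfg.Enabled = true
	fmt.Println(Decide(cfg, p))
}
```

Putting the enablement check first means an unset configuration fails before any other gate is consulted. This mirrors the "zero value is safe" property.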
Bring your own LLM
Srenix Enterprise works with any OpenAI-compatible endpoint: your in-cluster vLLM instance, Azure OpenAI, or any other gateway. Anthropic is also supported through a native Anthropic Messages API client (shipped in the v0.1.0-alpha.1 line). The provider is selected automatically from the endpoint host, so an api.anthropic.com endpoint gets the native client; like any SaaS endpoint, it also requires --ai-allow-saas. With an in-cluster LLM, cluster diagnostics and prompts never leave your perimeter. Set ai.endpoint to the /v1 base URL of your endpoint.
Recommended: run an in-cluster vLLM instance behind a ClusterIP service. Set ai.allowSaas=false to block any external LLM calls. Cluster data never crosses the cluster boundary.