AI Tiers

Four LLM-augmented capabilities: T0 is read-only, and T1 through T3 are approval-gated. They ship in the paid srenix-enterprise binary, which runs under the same policy bounds and RBAC ceiling as the OSS engine.

The paid srenix-enterprise binary runs alongside the OSS watcher — it never replaces it. AI tiers add an LLM layer on top of OSS findings. The architectural invariant: the paid binary cannot exceed the OSS RBAC ceiling. Every mutation flows through the same snapshot.Mutator interface and the operator-defined action policy.

Nothing mutates without a human click unless you explicitly enable one of two autonomy tiers, both off by default: PR auto-merge (requires a Wilson lower bound ≥ 0.95 and every safety gate to pass) or in-cluster confidence-gated auto-apply (reversible-action allowlist and confidence ≥ 0.95; see below). Otherwise, every action proposed at T1 and above requires an approval action: a signed URL delivered to Slack or your ticket. Without the click or that explicit opt-in, no cluster state changes.

Tier reference

T0: Diagnostic narrative
Enriches every DriftReport finding with an LLM-generated narrative summary: what happened, why it matters, and the most likely root cause. Read-only; no action proposed.
Approval: None

T1: Fix proposals
Proposes a specific action (bounded to the operator-defined action_kind policy) and delivers a signed click-to-fix URL to Slack or the ticket. Nothing mutates until the URL is clicked, unless you explicitly enable an autonomy tier (PR auto-merge or in-cluster confidence-gated auto-apply; both off by default).
Approval: One-click signed URL

T2: Multi-step planner
For complex findings, proposes a plan of up to 5 prerequisite-linked steps. Each step requires its own approval click, and later steps can reference the results of earlier ones.
Approval: Per-step signed URL

T3: Vault runbook proposer
Proposes break-glass Vault runbooks for Vault-related outages. Delivered as a structured document, never auto-run. Requires dual approval before any Vault path is touched.
Approval: Dual approval

Enabling AI tiers

AI tiers are additive — the same Helm chart, plus ai.enabled=true. The OSS watcher workloads are untouched.

helm upgrade srenix srenix/agentic-sre --reuse-values \
  --set ai.enabled=true \
  --set ai.tier=t1 \
  --set ai.endpoint=https://your-llm-endpoint/v1 \
  --set ai.model=your-model-name \
  --set ai.apiKey.secretName=srenix-ai-llm-key

Confidence-gated auto-apply (--autonomy)

Separate from PR auto-merge, the srenix-enterprise watch loop has an opt-in in-cluster auto-apply tier. It is OFF by default. Enable it with the --autonomy flag; --memory-store-url must also be set, because the memory store supplies the confidence signal. A qualifying proposal is applied without a human click, but only when every gate below passes. The gates are defense-in-depth, and the first failing gate is reported and audited:

  1. Confidence ≥ threshold. Default 0.95 (configurable via --autonomy-min-confidence). Confidence is max(cosine similarity to a verifiably-cleared prior fix, Wilson lower bound of the action class's success rate).
  2. Reversible low-risk action. The action kind must be in the --autonomy-allow allowlist. Default: DeletePod, DeleteCertRequest, DeleteACMEOrder (controller-recreated / reversible). The default allowlist explicitly excludes PatchDeployment and DeleteJob.
  3. Unprotected namespace. The target namespace must not be a protected namespace.
  4. Circuit breaker closed. If the breaker has tripped, autonomy is suspended until it resets.
  5. A near-identical prior fix verifiably cleared. The post-apply verifier must have confirmed a matching prior fix actually resolved the finding.
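
The confidence formula in gate 1 can be sketched as follows. This is a minimal illustration, not the srenix-enterprise implementation: the function names, the embedding-vector inputs, and the z value of 1.96 (a 95% Wilson interval) are assumptions, since the page does not state them.

```python
import math


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a success rate.

    z = 1.96 is an assumed default; the docs do not say which z is used.
    """
    if trials == 0:
        return 0.0  # no history means no evidence
    p = successes / trials
    z2 = z * z
    centre = p + z2 / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials))
    return (centre - margin) / (1 + z2 / trials)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def confidence(finding_vec: list[float], prior_fix_vec: list[float],
               successes: int, trials: int) -> float:
    """max(similarity to a cleared prior fix, Wilson bound of the action class)."""
    return max(cosine_similarity(finding_vec, prior_fix_vec),
               wilson_lower_bound(successes, trials))
```

Under these assumptions the Wilson term is conservative on short histories: a perfect 10-for-10 record yields a lower bound of about 0.72, and a spotless record needs roughly 73 trials before it alone reaches 0.95.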

The zero value is safe: with --autonomy unset, nothing ever auto-applies. The allowlist is intentionally limited to reversible actions, and the live-state precondition is re-validated at apply time, so a stale prior can never push through a now-invalid change.
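
The gate sequence and the safe zero value can be illustrated with a short sketch. The type names, field names, reason strings, and the example protected namespace are hypothetical; only the gate order, the 0.95 default, and the default allowlist come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Default allowlist as documented for --autonomy-allow.
DEFAULT_ALLOW = frozenset({"DeletePod", "DeleteCertRequest", "DeleteACMEOrder"})


@dataclass(frozen=True)
class AutonomyConfig:
    enabled: bool = False                 # zero value: nothing auto-applies
    min_confidence: float = 0.95          # --autonomy-min-confidence default
    allow: frozenset = DEFAULT_ALLOW
    # Hypothetical example; the real protected-namespace set is not documented here.
    protected_namespaces: frozenset = frozenset({"kube-system"})


@dataclass(frozen=True)
class Proposal:
    action_kind: str
    namespace: str
    confidence: float
    prior_fix_verified: bool


def first_failing_gate(cfg: AutonomyConfig, p: Proposal,
                       breaker_open: bool) -> Optional[str]:
    """Return the first gate that blocks auto-apply, or None if all pass."""
    if not cfg.enabled:
        return "autonomy disabled"
    if p.confidence < cfg.min_confidence:
        return "confidence below threshold"
    if p.action_kind not in cfg.allow:
        return "action kind not in allowlist"
    if p.namespace in cfg.protected_namespaces:
        return "protected namespace"
    if breaker_open:
        return "circuit breaker open"
    if not p.prior_fix_verified:
        return "no verified prior fix"
    return None
```

Returning the first failing gate as a string, rather than a bare boolean, mirrors the statement that the first failing gate is reported and audited. A default-constructed AutonomyConfig always yields "autonomy disabled".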

Bring your own LLM

Srenix Enterprise works with any OpenAI-compatible endpoint: your in-cluster vLLM instance, Azure OpenAI, or any other gateway. Anthropic is also supported through a native Anthropic Messages API client (shipped in the v0.1.0-alpha.1 line). Provider selection is automatic by endpoint host, so an api.anthropic.com endpoint gets the native client. Like any SaaS endpoint, it requires --ai-allow-saas. When you use an in-cluster LLM, cluster diagnostics and prompts never leave your perimeter. Set ai.endpoint to the /v1 base URL of your endpoint.

Recommended: run an in-cluster vLLM instance behind a ClusterIP service and set ai.allowSaas=false to block any external LLM calls. With that setup, cluster data never crosses the cluster boundary.
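
Host-based provider selection combined with a SaaS guard might look like the sketch below. Only the api.anthropic.com rule comes from this page. The function names are hypothetical, and the in-cluster test (a host ending in .svc or .svc.cluster.local) is an assumption made for illustration; how Srenix actually classifies an endpoint as SaaS is not documented here.

```python
from urllib.parse import urlparse


def select_provider(endpoint: str) -> str:
    """Pick a client by endpoint host, as the docs describe for Anthropic."""
    host = urlparse(endpoint).hostname or ""
    return "anthropic" if host == "api.anthropic.com" else "openai-compatible"


def is_in_cluster(endpoint: str) -> bool:
    """ASSUMPTION: treat Kubernetes service DNS names as in-cluster."""
    host = urlparse(endpoint).hostname or ""
    return host.endswith(".svc") or host.endswith(".svc.cluster.local")


def resolve_client(endpoint: str, allow_saas: bool = False) -> str:
    """Return the provider name, refusing external hosts unless opted in."""
    if not is_in_cluster(endpoint) and not allow_saas:
        raise ValueError(f"external LLM endpoint blocked: {endpoint}")
    return select_provider(endpoint)
```

With allow_saas left at False, an in-cluster vLLM URL resolves to the OpenAI-compatible client, while an api.anthropic.com URL is rejected until the opt-in is set.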
