Detect. Remediate. Verify.

Autonomous SRE for Kubernetes. Policy-bounded. Audit-anchored. Bring-your-own-LLM. An in-cluster agent across Kubernetes, AWS, GCP, Azure, and the edge. Everyone else observes; Srenix actually mutates.

Try it now View pricing

$ helm install srenix srenix/agentic-sre

One operator. Every cloud. Every stack you already run.

Cloud AWS GCP Azure

K8s distros Kubernetes EKS GKE AKS k3s OpenShift RKE2

Observability & Ticketing Prometheus Alertmanager Grafana OpenProject Jira ServiceNow Slack

K8s-native infra Vault cert-manager CNPG Rook/Ceph External Secrets Kong ArgoCD

Trigger sources K8s informers Alertmanager polling Webhook (HMAC) CronJob resync

AI providers OpenAI Anthropic In-cluster vLLM

Research (opt-in, paid) Firecrawl (deep-RCA, redacted query, external egress)

The autopilot loop, by default.

Five steps. Re-run on every cycle. Closed-loop is the default mode, not a roadmap milestone.

Install

helm install srenix srenix/agentic-sre — works on any conformant K8s 1.27+ (EKS, GKE, AKS, k3s, RKE2, OpenShift).

$ helm repo add srenix https://srenix-ai.github.io/agentic-sre
$ helm install srenix srenix/agentic-sre
$ kubectl get driftreports        # cluster-scoped CRD
NAME                       SEVERITY  SOURCE                    SUBJECT
drift-9c1e04f2a77b3d18     warning   StuckCertificateRequests  CertificateRequest/kube-system/api-tls

Detect

21 K8s probes (incl. KongRoutes + GPUNodes) + 10 AWS + 10 GCP + 10 Azure probe families. 20 OSS analyzers (drift, log, workload, diagnostic — LogPatternMatcher, OOMKillRecurrence, PVOrphan, CronJobStuck, DisruptionDrift on top of GitOps/Workload/RBAC/Capacity/Security/Config). Three trigger classes (resource events, Prometheus alerts via Alertmanager, external HMAC-authed webhooks).

Probes 21 K8s     10 AWS   10 GCP   10 Azure
       Ceph        RDS      CloudSQL SQL DB
       Nodes       EBS      GKE      AKS
       Postgres    EKS      PD       Disk
       PVCs        IAM      IAM SA   Identity
       Endpoints   ALB      LB       AppGW
       NodePr.     ACM      Cert     Backend
       DaemonSet   KMS      GCS/KMS  Subnet
       Pending     S3       Backend  ...
       CrashLoop   VPC
       ETCD
       FailedMnt
       KongRoutes ←  M2
       GPUNodes   ←  M3

Remediate

5 policy-bounded fixers run by default. AI-tier fix proposals require human approval via signed click-to-fix URLs — OR auto-merge silently at very-high confidence (Phase 3.B): matching approve-class policy + verified Ed25519 attestation + Wilson-bound class success-rate ≥ threshold (default 0.95) + closed circuit breaker. RAG memory is live: Srenix reads prior resolutions before proposing (short-circuit default ON at similarity ≥ 0.92). Paid tier: opt-in deep-RCA grounded in live web research via Firecrawl — LLM synthesizes a generic technical query (no namespace, hostname, or secret leaves the cluster); the RCA is forwarded into every AI tier (T0 → T3).

DriftReport  StaleErrorPod    — fixer ran        OK
DriftReport  StuckJob         — fixer ran        OK
DriftReport  StuckRS          — fixer ran        OK
DriftReport  StuckCertReq     — fixer ran        OK
DriftReport  TLSSecretMismatch — fixer ran       OK
DriftReport  SecurityDrift    — DigestPin PR
                                 attestation: Ed25519 ✓
                                 auto-merge gate: 5/5 ✓
                                 squash-merged via API ✓
Re-verify in 60s ...

Report

Findings flow to Slack, Alertmanager, OpenProject (OSS), and Jira / ServiceNow (paid). DriftReport CRDs let you kubectl get your cluster’s drift state.

kubectl get driftreports          # columns: SEVERITY SOURCE SUBJECT LAST SEEN COUNT TICKET
NAME                       SEVERITY  SOURCE             SUBJECT                       COUNT  TICKET
drift-9c1e04f2a77b3d18     critical  SecretKeyMissing   Secret/mcp/openproject-url    4      WP-1287
drift-4f2a77b3d189c1e0f    warning   CronJobStuck       CronJob/ai/nightly-index      2
# active findings have a CR; cleared findings are deleted on the next cycle

Verify

Re-diagnose after every fix. No "the fix maybe worked" — Srenix actively re-checks and closes the loop.

diagnose → fix → re-diagnose → resolve
                            ↑        |
                            +--------+
   Closed-loop is the DEFAULT, not a roadmap milestone.

Why Srenix, structurally.

Three architectural commitments that competitors cannot copy without rewriting their product.

An agent that actually mutates

Komodor, Robusta, Causely, Resolve — they observe and summarise. Srenix Enterprise proposes an action, signs it as a JWT, and (with one operator click) executes it. Every action lands inside the operator-defined policy: which action_kinds, which namespaces, which resources. The agent has reasoning power; the policy has the leash.

In-cluster + bring-your-own-LLM

No SaaS. No vendor LLM lock-in. Point Srenix Enterprise at any OpenAI-compatible endpoint — your in-cluster vLLM, an Azure OpenAI deployment, your own gateway. Cluster data and prompts never leave your perimeter.

Open core, audit-anchored

OSS engine is Apache-2.0 — 21 K8s probes, 10 AWS + 10 GCP + 10 Azure cloud probe families, 20 OSS analyzers (drift, log, workload, diagnostic), 5 policy-bounded fixers. Srenix Enterprise paid tier adds the LLM Investigator agent + the T0–T3 AI SRE flow + Phase 2 closure (HA aiwatch via leader-election, Prometheus instrumentation, cosign-style PR attestation) + Phase 3 (auto-merge gate, target-history RAG grounding, SOC2 audit-bundle exporter). Every AI action is JWT-signed, hash-chained, replayable.

Compliance-ready audit bundle

srenix-enterprise audit-bundle --since 30d --output bundle.tar.gz produces a SOC2-friendly evidence pack with manifest.json (versions + SHA-256 of each file), audit.jsonl (verbatim copy of the JSONL audit log: every approval click + auto-apply + LLM call + verifier result), and outcomes.jsonl (every RAG outcome within --since). Local-only — no network egress. Shipped (v0.1.0-alpha.1).

Auto-merge at very-high confidence

When the Phase 2.B "approve+remember class" policy matches AND the Phase 2.H Ed25519 attestation verifies AND the Phase 2.C Wilson-bound class success-rate clears the operator-set threshold (default 0.95) AND the circuit breaker is closed, freshly-opened DigestPin PRs auto-merge via the Forge API without a human click. Closes the "incidents resolved without paging a human" promise. Shipped (v0.1.0-alpha.1).

vs. the competition.

Detect-fix-verify is the default loop. Every other player is a copilot for the on-call rotation you already have.

Product	Where it runs	Closed-loop?	Pricing
Srenix (us)	In-cluster operator	Yes, by default	Flat per-cluster (OSS / Team / Enterprise)
NeuBird	SaaS, pulls telemetry	No — "architecturally enforced read-only"	$15–25 per investigation
Resolve AI	SaaS + thin Satellite	Roadmap (their words: "next milestone")	Contact sales
Ciroos	SaaS, zero-copy queries	Opaque "autonomy slider"	Contact sales
OpenSRE (Tracer)	Customer-hosted (docker-compose)	No — code-blocks mutations	OSS only

See detailed comparisons →

On-call should be quieter every week.

Srenix is how you get there. Helm install in 5 minutes. No telemetry exfiltration. No per-investigation surprises.

Install in 5 minutes Read the code