Fixers

Five policy-bounded auto-fixers. Remediation is off by default. Once enabled, each fixer runs when its condition fires, and every run is safety-gated, idempotent, and re-verified.

Fixers are the remediation layer. Each fixer targets exactly one named failure class, runs only when its corresponding analyzer fires, and re-runs the analyzer afterward to confirm resolution. Fixers never edit Secrets, ConfigMaps, or generic CRDs — those changes require a human and a git commit.

Remediation is off by default. Enable it with remediation.enabled: true (remediate CronJob) or watcher.remedy.enabled: true (event-driven watcher). The four built-in fixers are always registered in the binary and run as soon as remediation is on — there are no per-fixer Helm toggles for them. TLSSecretMismatch is the one opt-in fixer: setting fixers.tlsSecretMismatch.enabled=true makes the chart set the SRENIX_FIXER_TLS_SECRET_MISMATCH env var (which gates its registration) and add the extra RBAC verbs it needs.
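The registration rule described above can be sketched in Python. This is a minimal illustration, not Srenix's actual code: the helper name `active_fixers` is hypothetical, and the accepted value of the env var ("true") is an assumption, since the docs name the variable but not its values.

```python
import os

# The four fixers the docs describe as always registered in the binary.
BUILT_IN_FIXERS = (
    "StaleErrorPods",
    "StuckJobsWithBadSecretRef",
    "StuckRSPods",
    "StuckCertificateRequests",
)

# Env var named in the docs; treating "true" as the enabling value is an assumption.
TLS_FIXER_ENV = "SRENIX_FIXER_TLS_SECRET_MISMATCH"


def active_fixers(remediation_enabled, env=None):
    """Return the fixers that would run under the given settings (hypothetical helper)."""
    env = os.environ if env is None else env
    if not remediation_enabled:
        return []  # remediation is off by default, so nothing runs
    fixers = list(BUILT_IN_FIXERS)
    if env.get(TLS_FIXER_ENV, "").strip().lower() == "true":
        fixers.append("TLSSecretMismatch")  # the one opt-in fixer
    return fixers
```

With remediation on and the env var unset, the sketch returns only the four built-in fixers. The opt-in fixer appears only when both conditions hold.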

Fixer reference

| Fixer | What it fixes | Helm flag | Opt-in |
|---|---|---|---|
| StaleErrorPods | Error/Failed pods owned by a Job or unowned (debug leftovers) | None (built in; runs whenever remediation is enabled) | No (on when remediation.enabled) |
| StuckJobsWithBadSecretRef | Frozen Jobs whose pod template references a renamed Secret key; deletes the Job so the CronJob respawns clean | None (built in; runs whenever remediation is enabled) | No (on when remediation.enabled) |
| StuckRSPods | ReplicaSet pods stuck on a stale revision when the Deployment has rolled forward (rollout restart) | None (built in; runs whenever remediation is enabled) | No (on when remediation.enabled) |
| StuckCertificateRequests | cert-manager CertificateRequests in terminal Ready=False/Failed; deletion lets cert-manager re-issue | None (built in; runs whenever remediation is enabled) | No (on when remediation.enabled) |
| TLSSecretMismatch | Repoints Ingress.spec.tls[].secretName to the cert-manager-managed Secret. Skips GitOps-managed Ingresses. | fixers.tlsSecretMismatch.enabled=true | Yes (off by default) |

Safety gates

Every fixer checks these gates before running. If any gate blocks, the finding is reported but no mutation occurs.

  • GitOps guard — skips resources managed by ArgoCD, Flux, or Helm (detected via standard labels). A fixer won't fight a reconciler.
  • Paused/suspended guard — skips Deployments with spec.paused: true and CronJobs with spec.suspend: true.
  • cert-manager controller health guard — StuckCertificateRequests fixer checks that the cert-manager Deployment is healthy (readyReplicas > 0) before deleting failed CertificateRequests.
  • Protected namespace list — fixers never mutate resources in kube-system, kube-public, kube-node-lease, rook-ceph, vault, external-secrets, or cnpg-system. This floor is compiled into the binary and can never shrink; operators can append additional namespaces via Helm protectedNamespaces.extra (rendered as SRENIX_PROTECTED_NAMESPACES_EXTRA) or the operator CR field spec.protectedNamespacesExtra. (Diagnose still reports findings in those namespaces — only the act-side is gated.)
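The gates listed above can be sketched as a single check that returns the first blocking gate. This is an illustration under stated assumptions, not Srenix's implementation: the function names are hypothetical, resources are modelled as plain dicts, and the specific label keys are a guess at what "standard labels" means (they are the labels commonly set by Argo CD, Flux, and Helm).

```python
# Namespaces the docs list as the compiled-in floor.
PROTECTED_FLOOR = frozenset({
    "kube-system", "kube-public", "kube-node-lease",
    "rook-ceph", "vault", "external-secrets", "cnpg-system",
})

# Assumed markers; the docs only say "standard labels".
GITOPS_LABEL_KEYS = (
    "argocd.argoproj.io/instance",       # Argo CD
    "kustomize.toolkit.fluxcd.io/name",  # Flux Kustomization
    "helm.toolkit.fluxcd.io/name",       # Flux HelmRelease
)


def protected_namespaces(extra=()):
    """Union of the fixed floor and operator extras; extras can only add entries."""
    return PROTECTED_FLOOR | {ns.strip() for ns in extra if ns.strip()}


def cert_manager_healthy(deployment):
    """True when the cert-manager Deployment reports at least one ready replica."""
    return ((deployment.get("status") or {}).get("readyReplicas") or 0) > 0


def blocking_gate(resource, extra_protected=()):
    """Return the name of the first gate that blocks a fix, or None if all pass."""
    meta = resource.get("metadata") or {}
    labels = meta.get("labels") or {}
    spec = resource.get("spec") or {}

    if meta.get("namespace") in protected_namespaces(extra_protected):
        return "protected-namespace"
    if any(key in labels for key in GITOPS_LABEL_KEYS):
        return "gitops"
    if labels.get("app.kubernetes.io/managed-by") == "Helm":
        return "gitops"
    if resource.get("kind") == "Deployment" and spec.get("paused") is True:
        return "paused"
    if resource.get("kind") == "CronJob" and spec.get("suspend") is True:
        return "suspended"
    return None
```

Because the protected set is built as a union with the floor, an operator-supplied list can extend it but has no way to remove a built-in namespace, which mirrors the "can never shrink" property.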

Dry-run mode

Pass --dry-run (in Helm values: remediation.dryRun: true for the remediate CronJob, watcher.remedy.dryRun: true for the watcher) to log every fix Srenix would apply without applying it. The fix log is identical to production mode minus the actual mutation. Use this in your eval cycle before enabling fixers.
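The dry-run behaviour amounts to logging first and skipping only the mutation. A minimal Python sketch, assuming a hypothetical `apply_fix` helper and a finding represented as a dict with `fixer` and `subject` keys (neither is taken from Srenix itself):

```python
def apply_fix(finding, mutate, dry_run=False, log=print):
    """Log the intended fix, then call mutate(finding) unless dry_run is set."""
    # The log line is emitted in both modes, so dry-run output matches production.
    log(f"fix {finding['fixer']} on {finding['subject']}")
    if dry_run:
        return False  # logged only; the cluster is not touched
    mutate(finding)
    return True
```

Emitting the log line before the dry-run check is what keeps the two modes' logs identical.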

Re-verify loop

After every fix, Srenix re-runs the analyzer for the fixed subject. If the finding persists, the action is recorded as unresolved in the DriftReport — not silently closed. The loop is: diagnose → fix → re-diagnose → resolve.

DriftReport  StaleErrorPod    — fixer ran        OK
DriftReport  StuckJob         — fixer ran        OK
DriftReport  TLSSecretMismatch — fixer ran       OK
Re-verify in 60s ...
→ All cleared
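
The diagnose → fix → re-diagnose → resolve loop can be sketched as follows. This is a simplified illustration, not Srenix's code: `remediate`, `analyze`, and `fix` are hypothetical names, the analyzer is assumed to return True while the finding is present, and the delay before re-verification is omitted.

```python
def remediate(subject, analyze, fix):
    """Run diagnose -> fix -> re-diagnose and report the outcome."""
    if not analyze(subject):
        return "clean"       # analyzer did not fire, so the fixer never runs
    fix(subject)
    if analyze(subject):
        return "unresolved"  # finding persists; recorded, not silently closed
    return "resolved"
```

The second `analyze` call is the important part: a fix counts as resolved only when the analyzer that triggered it no longer fires.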