Fixers
5 policy-bounded auto-fixers. Remediation is off by default. Once enabled, each fixer runs when its condition fires — safety-gated, idempotent, re-verified after every run.
Fixers are the remediation layer. Each fixer targets exactly one named failure class, runs only when its corresponding analyzer fires, and re-runs the analyzer afterward to confirm resolution. Fixers never edit Secrets, ConfigMaps, or generic CRDs — those changes require a human and a git commit.
Remediation is off by default. Enable it with remediation.enabled: true (remediate CronJob) or watcher.remedy.enabled: true (event-driven watcher). The four built-in fixers are always registered in the binary and run as soon as remediation is on — there are no per-fixer Helm toggles for them. TLSSecretMismatch is the one opt-in fixer: setting fixers.tlsSecretMismatch.enabled=true makes the chart set the SRENIX_FIXER_TLS_SECRET_MISMATCH env var (which gates its registration) and add the extra RBAC verbs it needs.
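The registration rule above can be sketched in Go. This is a minimal illustration, not Srenix's code: the `Fixer` interface, `namedFixer`, and `registerFixers` are hypothetical names. It also assumes the chart sets `SRENIX_FIXER_TLS_SECRET_MISMATCH` to the string `true`, since the accepted values are not documented here.

```go
package main

import "os"

// Fixer is a hypothetical interface; Srenix's real fixer type is not shown in these docs.
type Fixer interface {
	Name() string
}

// namedFixer is a placeholder implementation that only carries a name.
type namedFixer string

func (f namedFixer) Name() string { return string(f) }

// registerFixers returns the names of the fixers that would run.
// remediationEnabled stands in for remediation.enabled / watcher.remedy.enabled.
// lookupEnv is injected so the env-var gate can be exercised without touching the real environment.
func registerFixers(remediationEnabled bool, lookupEnv func(string) (string, bool)) []string {
	if !remediationEnabled {
		return nil // remediation is off by default: nothing runs
	}
	// The four built-in fixers are always registered once remediation is on.
	fixers := []Fixer{
		namedFixer("StaleErrorPods"),
		namedFixer("StuckJobsWithBadSecretRef"),
		namedFixer("StuckRSPods"),
		namedFixer("StuckCertificateRequests"),
	}
	// Opt-in fixer, gated on the env var the chart renders.
	// Assumption: the chart sets the value to the string "true".
	if v, ok := lookupEnv("SRENIX_FIXER_TLS_SECRET_MISMATCH"); ok && v == "true" {
		fixers = append(fixers, namedFixer("TLSSecretMismatch"))
	}
	names := make([]string, 0, len(fixers))
	for _, f := range fixers {
		names = append(names, f.Name())
	}
	return names
}

// registeredFromProcessEnv reads the real process environment.
func registeredFromProcessEnv(remediationEnabled bool) []string {
	return registerFixers(remediationEnabled, os.LookupEnv)
}
```

Injecting the lookup function keeps the gate testable without mutating the process environment; `registeredFromProcessEnv` shows the production wiring through `os.LookupEnv`.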
Fixer reference
| Fixer | What it fixes | Helm flag | Opt-in |
|---|---|---|---|
| StaleErrorPods | Error/Failed pods owned by a Job or unowned (debug leftovers) | — (built in; runs whenever remediation is enabled) | No (on when remediation.enabled) |
| StuckJobsWithBadSecretRef | Frozen Jobs whose pod template references a renamed Secret key — deletes the Job so the CronJob respawns clean | — (built in; runs whenever remediation is enabled) | No (on when remediation.enabled) |
| StuckRSPods | ReplicaSet pods stuck on a stale revision when the Deployment has rolled forward (rollout restart) | — (built in; runs whenever remediation is enabled) | No (on when remediation.enabled) |
| StuckCertificateRequests | cert-manager CRs in terminal Ready=False/Failed — deletion lets cert-manager re-issue | — (built in; runs whenever remediation is enabled) | No (on when remediation.enabled) |
| TLSSecretMismatch | Repoints Ingress.spec.tls[].secretName to the cert-manager-managed Secret. Skips GitOps-managed Ingresses. | fixers.tlsSecretMismatch.enabled=true | Yes (off by default) |
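Two of the selection rules in the table are precise enough to sketch. The Go below is a hypothetical rendering of the StaleErrorPods and StuckRSPods predicates over minimal stand-in structs, not the `k8s.io/api` types a real controller would use. It treats kubectl's `Error` status as pod phase `Failed` (true for pods with `restartPolicy: Never`) and relies on the `pod-template-hash` label that the Deployment controller puts on ReplicaSet pods. Any age or grace threshold a real fixer applies is omitted.

```go
package main

// OwnerRef and Pod are minimal stand-ins for the Kubernetes fields these checks read.
type OwnerRef struct {
	Kind string
	Name string
}

type Pod struct {
	Name   string
	Phase  string // "Pending", "Running", "Succeeded", "Failed", ...
	Owners []OwnerRef
	Labels map[string]string
}

// isStaleErrorPod matches Failed pods that are owned by a Job or have no owner at all.
// Pods owned by anything else (ReplicaSet, StatefulSet, ...) are left to their controller.
func isStaleErrorPod(p Pod) bool {
	if p.Phase != "Failed" {
		return false
	}
	if len(p.Owners) == 0 {
		return true // unowned: a debug leftover
	}
	for _, o := range p.Owners {
		if o.Kind == "Job" {
			return true
		}
	}
	return false
}

// isStuckRSPod matches pods still on an older revision after the Deployment has
// finished rolling forward. currentHash is the pod-template-hash of the newest ReplicaSet.
func isStuckRSPod(p Pod, currentHash string, rolloutComplete bool) bool {
	if !rolloutComplete {
		return false // mid-rollout, old pods are expected
	}
	h, ok := p.Labels["pod-template-hash"]
	return ok && h != currentHash
}
```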
Safety gates
Fixers check the gates below before running (the cert-manager health guard applies only to StuckCertificateRequests). If any gate blocks, the finding is reported but no mutation occurs.
- GitOps guard: skips resources managed by ArgoCD, Flux, or Helm (detected via standard labels). A fixer won't fight a reconciler.
- Paused/suspended guard: skips Deployments with `spec.paused: true` and CronJobs with `spec.suspend: true`.
- cert-manager controller health guard: the StuckCertificateRequests fixer checks that the cert-manager Deployment is healthy (`readyReplicas > 0`) before deleting failed CertificateRequests.
- Protected namespace list: fixers never mutate resources in kube-system, kube-public, kube-node-lease, rook-ceph, vault, external-secrets, or cnpg-system. This floor is compiled into the binary and can never shrink; operators can append additional namespaces via the Helm value `protectedNamespaces.extra` (rendered as `SRENIX_PROTECTED_NAMESPACES_EXTRA`) or the operator CR field `spec.protectedNamespacesExtra`. (Diagnose still reports findings in those namespaces; only the act side is gated.)
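The safety gates above can be sketched as plain predicates. This is a hypothetical illustration: `protectedSet`, `blockedBy`, `Target`, and `certManagerHealthy` are invented names, the gate order is an assumption, and so are the label keys (the docs say only "standard labels"). It also assumes `SRENIX_PROTECTED_NAMESPACES_EXTRA` holds a comma-separated list. The point it shows is that extras are merged into the compiled-in floor, so they can add namespaces but never remove one.

```go
package main

import "strings"

// baseProtected is the compiled-in floor listed in the docs.
var baseProtected = []string{
	"kube-system", "kube-public", "kube-node-lease",
	"rook-ceph", "vault", "external-secrets", "cnpg-system",
}

// protectedSet merges the floor with operator-supplied extras.
// Assumption: the extra value is comma-separated. Extras can only add entries.
func protectedSet(extra string) map[string]bool {
	set := make(map[string]bool)
	for _, ns := range baseProtected {
		set[ns] = true
	}
	for _, ns := range strings.Split(extra, ",") {
		if ns = strings.TrimSpace(ns); ns != "" {
			set[ns] = true
		}
	}
	return set
}

// gitopsLabels holds assumed label keys; "" means any value marks the resource as managed.
var gitopsLabels = map[string]string{
	"app.kubernetes.io/managed-by":     "Helm", // set by convention in Helm charts
	"kustomize.toolkit.fluxcd.io/name": "",     // Flux Kustomization
	"helm.toolkit.fluxcd.io/name":      "",     // Flux HelmRelease
	"argocd.argoproj.io/instance":      "",     // Argo CD with this instance label key configured
}

// Target is a stand-in for the resource a fixer wants to mutate.
type Target struct {
	Namespace string
	Labels    map[string]string
	Paused    bool // Deployment spec.paused or CronJob spec.suspend
}

// blockedBy returns the name of the first gate that blocks mutation, or "" if none does.
func blockedBy(t Target, protected map[string]bool) string {
	if protected[t.Namespace] {
		return "protected-namespace"
	}
	for key, want := range gitopsLabels {
		if v, ok := t.Labels[key]; ok && (want == "" || v == want) {
			return "gitops"
		}
	}
	if t.Paused {
		return "paused"
	}
	return ""
}

// certManagerHealthy mirrors the StuckCertificateRequests precondition.
func certManagerHealthy(readyReplicas int32) bool {
	return readyReplicas > 0
}
```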
Dry-run mode
Pass --dry-run (in Helm values, set remediation.dryRun: true for the remediate CronJob or watcher.remedy.dryRun: true for the watcher) to log every fix Srenix would apply without applying it. The fix log is identical to production mode minus the actual mutation, so use dry-run during evaluation to review what each fixer would change before letting it mutate the cluster.
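One way to keep the dry-run log identical to production is to emit the log line before branching on the mode. A hedged sketch: `Action` and `apply` are hypothetical, and the `fixer=... verb target` log layout is invented for illustration because the real format is not documented here.

```go
package main

import "fmt"

// Action is a hypothetical description of one planned mutation.
type Action struct {
	Fixer  string
	Verb   string // e.g. "delete", "patch"
	Target string // namespace/name
}

// apply logs the action the same way in both modes and calls mutate only when dryRun is false.
func apply(a Action, dryRun bool, logf func(string), mutate func(Action) error) error {
	logf(fmt.Sprintf("fixer=%s %s %s", a.Fixer, a.Verb, a.Target))
	if dryRun {
		return nil // identical log, no mutation
	}
	return mutate(a)
}
```

Because the log call sits above the branch, the two modes cannot drift apart in what they report.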
Re-verify loop
After every fix, Srenix re-runs the analyzer for the fixed subject. If the finding persists, the action is recorded as unresolved in the DriftReport — not silently closed. The loop is: diagnose → fix → re-diagnose → resolve.
Illustrative output from one remediation pass:

    DriftReport StaleErrorPod: fixer ran OK
    DriftReport StuckJob: fixer ran OK
    DriftReport TLSSecretMismatch: fixer ran OK
    Re-verify in 60s ...
    → All cleared
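The diagnose → fix → re-diagnose → resolve loop can be sketched for a single subject as follows. The names (`remediate`, `Status` and its values) are hypothetical, the callbacks stand in for the analyzer and the fixer, and any settle delay before the second check is left out.

```go
package main

// Status is a hypothetical outcome for one subject after a remediation pass.
type Status string

const (
	Clean      Status = "clean"      // analyzer found nothing to fix
	Resolved   Status = "resolved"   // fix applied and re-check passed
	Unresolved Status = "unresolved" // fix applied but the finding persists
	FixFailed  Status = "fix-failed" // the mutation itself returned an error
)

// remediate runs diagnose -> fix -> re-diagnose for one subject.
// diagnose reports whether the finding is present; fix applies the mutation.
func remediate(subject string, diagnose func(string) bool, fix func(string) error) Status {
	if !diagnose(subject) {
		return Clean
	}
	if err := fix(subject); err != nil {
		return FixFailed
	}
	// Re-run the same analyzer; a persisting finding is recorded, never silently closed.
	if diagnose(subject) {
		return Unresolved
	}
	return Resolved
}
```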