Localising the Debugging of OPA Gatekeeper Rego Policy

Introduction

Background

In November 2022, in an attempt to better enforce policy in our Kubernetes clusters, we at Reecetech adopted OPA Gatekeeper.

Built on the Open Policy Agent (OPA) project, Gatekeeper is a Kubernetes-centric offering that serves as an admission controller utilising OPA. This admission controller provides a way to manage and enforce policies using OPA’s expressive Rego policy language.

Like any new adopter of a piece of tech, off we headed to the documentation.

After reading the documentation and fully embracing Gatekeeper, you may have landed on the same stack as us. TL;DR:

Awesome! However we decide to curate our policies, we can test them locally and through continuous integration.

Disallowing Anonymous Access

Our effort to localise the debugging of Gatekeeper Rego policy stemmed from implementing one particular policy: disallowing roles that grant rights to anonymous users within our Kubernetes clusters. This can be enforced by refusing admission of role bindings tied to the system:unauthenticated and system:anonymous Kubernetes groups.

We copied the implementation from the OPA Gatekeeper project, which provides a Disallow Anonymous Access policy through its community-driven library.

Some time after implementing this policy, we came across the following ClusterRoleBinding by pure chance:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: oidc-reviewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:service-account-issuer-discovery
subjects:
  - kind: Group
    name: system:unauthenticated
    apiGroup: rbac.authorization.k8s.io

This ClusterRoleBinding lets unauthenticated users read the service account issuer discovery endpoints: the OpenID Connect configuration describing how to authenticate, and the public keys used to verify identity tokens.

Yet it was not showing up as a violation in any cluster. What’s going on? It clearly binds to a disallowed group. Time to poke around the test framework.


Setting Up

Structure

Let’s start with the directory structure.

.
└── policy-library
    └── k8sdisallowanonymous
        ├── template.yaml
        ├── constraint.yaml
        ├── suite.yaml
        └── tests
            └── oidc-reviewer.yaml

Testing

With this file and directory structure in place, we can run the test suite through the Gator CLI. There’s no need to install anything; we can just use the Gator CLI Docker image.

docker run --rm \
    -v ${PWD}/policy-library:/home/nonroot/ \
        openpolicyagent/gator:v3.14.0 \
            verify -v k8sdisallowanonymous/...

This produced the following output:

=== RUN   no-anonymous
    === RUN   test-oidc-reviewer
    --- FAIL: test-oidc-reviewer	(0.002s)
        unexpected number of violations: got 0 violations but want at least 1: got messages []
--- FAIL: no-anonymous	(0.007s)
FAIL	k8sdisallowanonymous/suite.yaml	0.007s
FAIL

Once again, we expected a violation with a message flagging the issue, but the policy raised no complaint. Gator is telling us the test assertion failed: it wanted at least one violation and got none. Sounds like we need to debug the policy.


Debugging

Referring back to the documentation, Gatekeeper provides some easy-to-follow debugging steps under Viewing the Request Object.

Fetching the Admission Review Object

The approach is to force a violation whose message contains the admission review input object.

diff --git a/policy-library/k8sdisallowanonymous/template.yaml b/policy-library/k8sdisallowanonymous/template.yaml
index 56a3d84..b508e5d 100644
--- a/policy-library/k8sdisallowanonymous/template.yaml
+++ b/policy-library/k8sdisallowanonymous/template.yaml
@@ -30,6 +30,10 @@ spec:
       rego: |
         package k8sdisallowanonymous

+        violation[{"msg": msg}] {
+          msg := sprintf("REVIEW OBJECT: %v", [input])
+        }
+
         violation[{"msg": msg}] {
           not is_allowed(input.review.object.roleRef, input.parameters.allowedRoles)
           review(input.review.object.subjects[_])

We chose to output the entire admission review input object by adding the preceding rule to the beginning of the disallow anonymous access template’s Rego. Having the whole object makes it easy to copy, paste and test against the policy.

After re-running the Gator CLI test container, we got the following request object on standard output:

{"parameters": {}, "review": {"kind": {"group": "rbac.authorization.k8s.io", "kind": "ClusterRoleBinding", "version": "v1"}, "name": "oidc-reviewer", "object": {"apiVersion": "rbac.authorization.k8s.io/v1", "kind": "ClusterRoleBinding", "metadata": {"name": "oidc-reviewer"}, "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "ClusterRole", "name": "system:service-account-issuer-discovery"}, "subjects": [{"apiGroup": "rbac.authorization.k8s.io", "kind": "Group", "name": "system:unauthenticated"}]}}}

Cool, we followed the debugging guide. But what do we do with this, and how can we make use of this admission review object with our policies?

One unfortunate thing about Gatekeeper’s debugging guide is that it shows you how to fetch the admission review object, but not how to use it, which is not obvious to newcomers to the Gatekeeper tool set. For granular debugging, repeatedly re-running the Gator CLI with minor amendments is a slow way to zero in on low-level behaviour. It would be better to play in a Rego environment directly.

Locally debugging policy

To debug locally, we can run an OPA container, which provides the Rego language in a Read-Eval-Print Loop (REPL).

docker run -it --rm openpolicyagent/opa

Since we are working in a REPL, let’s predefine the underlying helper rules and constants before defining the violation set rule.
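As a minimal sketch, assuming helper rules shaped like those in the community library’s k8sdisallowanonymous template (compare them with your own template.yaml), the predefinition could look like this. It uses the same pre-1.0 Rego syntax as the template, without the `if`/`contains` keywords that OPA 1.0 expects by default.

```rego
package k8sdisallowanonymous

# ASSUMPTION: helper rules modelled on the community library's
# k8sdisallowanonymous template; compare with your own template.yaml.

# True when the bound role's name appears in the allowedRoles list.
is_allowed(role, allowedRoles) {
  role.name == allowedRoles[_]
}

# True when a subject names one of the disallowed anonymous groups.
review(subject) = true {
  subject.name == "system:unauthenticated"
}

review(subject) = true {
  subject.name == "system:anonymous"
}
```

In the REPL, `input` can then be supplied per query with a `with input as <object>` modifier, pasting the admission review object above in place of `<object>`.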

Now we are in a position to test the Rego policy, so let’s take a look at the violation rule:

violation[{"msg": msg}] {
  not is_allowed(input.review.object.roleRef, input.parameters.allowedRoles)
  review(input.review.object.subjects[_])
  msg := sprintf("Unauthenticated user reference is not allowed in %v %v ", [input.review.object.kind, input.review.object.metadata.name])
}

Querying the violation set gives the following:

violation[{"msg": msg}]

Response: undefined

In Rego, this happens when the body of the violation rule never evaluates to true. Semantically this makes sense: a violation is only a violation if its conditions hold, and if any of them is false or undefined, there is no violation. Rego treats every rule body this way, so no element is ever added to the violation set and the query comes back undefined.
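To see this behaviour in isolation, here is a small self-contained sketch; the package and rule names are hypothetical. A body that touches a missing key never completes, so a complete rule is undefined rather than false, and a partial set rule simply gains no members.

```rego
package undefined_demo

# Hypothetical stand-in for input.parameters when the constraint
# defines no parameters (as in the admission review object above).
parameters := {}

# The body references a key that does not exist, so it never evaluates
# to true and the rule is undefined rather than false.
has_allowed_roles {
  parameters.allowedRoles
}

# Same shape as the violation set: the missing key stops the body,
# so no message is ever added to the set.
violation[{"msg": msg}] {
  roles := parameters.allowedRoles
  msg := sprintf("allowed roles: %v", [roles])
}
```

Querying `violation` on its own returns an empty set, while iterating it with `violation[{"msg": msg}]`, as above, is undefined.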

So which condition within the violation rule is not resolving?

Let’s go more granular

We need to evaluate the conditions within the violation rule one at a time to home in on the underlying issue.

As a sanity check, let’s see whether the msg assignment resolves:
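A minimal sketch of that check, wrapped in a hypothetical `msg_only` rule so it can be queried on its own (for example `msg_only with input as <object>`). It keeps only the message assignment from the violation body, so if it is undefined, the kind or name must be missing from the input.

```rego
package k8sdisallowanonymous

# Hypothetical rule holding only the message assignment from the
# violation body. If this is undefined, the kind or metadata.name
# is missing from the input.
msg_only = msg {
  msg := sprintf("Unauthenticated user reference is not allowed in %v %v ", [input.review.object.kind, input.review.object.metadata.name])
}
```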

Now let’s check the review() rule, which tests whether a subject is one of the disallowed groups:
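A sketch of that check, again assuming review() clauses shaped like the community template’s so the snippet stands alone. `flagged_subjects` is a hypothetical helper that collects every subject review() accepts; a non-empty set means this condition of the violation body holds.

```rego
package k8sdisallowanonymous

# ASSUMPTION: review() clauses modelled on the community template.
review(subject) = true {
  subject.name == "system:unauthenticated"
}

review(subject) = true {
  subject.name == "system:anonymous"
}

# Hypothetical helper: every subject in the binding that review() accepts.
# A non-empty set means this condition of the violation body holds.
flagged_subjects[subject] {
  subject := input.review.object.subjects[_]
  review(subject)
}
```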

Well, it must be the is_allowed() rule, then. Let’s assess its arguments.
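Rather than calling is_allowed() directly, it helps to look at what is passed into it. A sketch using two hypothetical rules, `role_name` and `allowed_roles`: with the admission review object above, the first resolves to the role name, while the second is undefined because `parameters` is an empty object.

```rego
package k8sdisallowanonymous

# Hypothetical rules exposing the two arguments passed to is_allowed().

# First argument: the name of the bound role.
role_name := input.review.object.roleRef.name

# Second argument: undefined when the constraint sets no parameters,
# because input.parameters is then an empty object.
allowed_roles := input.parameters.allowedRoles
```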

It looks like the allowedRoles input parameter is the culprit: the Rego policy cannot cope with this parameter being absent. As written, the policy only works if the constraint supplies the parameter, for example:

 apiVersion: constraints.gatekeeper.sh/v1beta1
 kind: K8sDisallowAnonymous
 metadata:
   name: no-anonymous
 spec:
   enforcementAction: deny
   match:
     kinds:
       - apiGroups: ["rbac.authorization.k8s.io"]
         kinds: ["ClusterRoleBinding"]
+  parameters:
+    allowedRoles:
+      - some-allowed-anonymous-role

This is also apparent from the admission review input object, whose parameters object is empty:

{"parameters": {}, "review": {"kind": ...

In short, because a violation can be defined strictly as true or false, there is no need to lean on undefined behaviour. An undefined expression in Rego quietly fails the whole rule body, which makes it a pitfall to be mindful of.
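One defensive pattern, sketched here rather than taken from the template we used, is to give the parameter a default with the `object.get` built-in, so a missing allowedRoles becomes an empty list instead of undefined. The helper rules are repeated (as assumptions modelled on the community template) so the module stands alone.

```rego
package k8sdisallowanonymous

# Defensive sketch: default allowedRoles to [] so a constraint without
# parameters still evaluates instead of leaving the body undefined.
violation[{"msg": msg}] {
  allowed_roles := object.get(input, ["parameters", "allowedRoles"], [])
  not is_allowed(input.review.object.roleRef, allowed_roles)
  review(input.review.object.subjects[_])
  msg := sprintf("Unauthenticated user reference is not allowed in %v %v ", [input.review.object.kind, input.review.object.metadata.name])
}

# ASSUMPTION: helpers modelled on the community template.
is_allowed(role, allowedRoles) {
  role.name == allowedRoles[_]
}

review(subject) = true {
  subject.name == "system:unauthenticated"
}

review(subject) = true {
  subject.name == "system:anonymous"
}
```

With an empty list, is_allowed() finds no match, `not` succeeds, and the oidc-reviewer binding raises the violation the test suite expected.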

OPA Query-explanation/Tracing

Those adept with the OPA Gatekeeper tool set, or anyone with a keen eye and a future in Where’s Waldo, would likely have caught the issue earlier.

When debugging within the OPA container, there is also the option to enable tracing (query explanations).
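Alongside full tracing, Rego’s built-in `trace` function records a note in the query explanation and always evaluates to true, so it can print intermediate values from inside a rule. A sketch using a hypothetical `debug_inputs` rule; with `opa eval`, such notes are shown when the query is run with `--explain=notes`.

```rego
package k8sdisallowanonymous

# Hypothetical debugging rule. trace() always evaluates to true and
# records its argument as a note in the query explanation.
debug_inputs {
  trace(sprintf("roleRef: %v", [input.review.object.roleRef]))
  # Trace the enclosing object rather than input.parameters.allowedRoles:
  # an undefined argument would make this expression undefined and stop
  # the body before the note is recorded.
  trace(sprintf("parameters: %v", [input.parameters]))
}
```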