Written by George Kolecsanyi
on February 12, 2024

Localise debugging of OPA Gatekeeper Rego Policy

Introduction

Background

In November 2022, in an attempt to better enforce policy in our Kubernetes clusters, we at Reecetech adopted OPA Gatekeeper.

Built on the Open Policy Agent (OPA) project, Gatekeeper is a Kubernetes-centric offering that serves as an admission controller utilising OPA. This admission controller provides a way to manage and enforce policies using OPA’s expressive Rego policy language.

Like any new adopter of a piece of tech, off we headed to the documentation.

After reading the documentation, and assuming you have fully embraced Gatekeeper, you may have landed on the same stack as us. TL;DR:

- A library of policies containing your templates and constraints to enforce
- Kubernetes objects to test violation or conformance against
- The Gator CLI to assert test cases against policies

Awesome! So however we decide to curate our policies, we can test them locally and through continuous integration.

Disallowing Anonymous Access

Our efforts to localise the debugging of Gatekeeper Rego policy stemmed from the implementation of one particular policy. Its purpose was to disallow any role binding that grants rights to anonymous users within our Kubernetes clusters. This is done by denying admission of role bindings tied to the system:anonymous user or the system:unauthenticated group.

This implementation was copied from the OPA Gatekeeper project, which provides a Disallow Anonymous Access policy through its community-driven library.

Some time after implementing this policy, by pure chance, we came across the following ClusterRoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: oidc-reviewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:service-account-issuer-discovery
subjects:
  - kind: Group
    name: system:unauthenticated
    apiGroup: rbac.authorization.k8s.io

This cluster role binding essentially allows an unauthenticated user to access URL paths for support on how to authenticate, and access keys for identity verification.

This was not showing up as a violation in any cluster. What’s going on? The binding clearly references the disallowed group. We needed to poke around the test framework.

Setting Up

Structure

Let’s delve into the setup of the directory structure.

.
└── policy-library
    └── k8sdisallowanonymous
        ├── template.yaml
        ├── constraint.yaml
        ├── suite.yaml
        └── tests
            └── oidc-reviewer.yaml

template.yaml: disallow anonymous access policy copied from Disallow Anonymous Access

File Contents 📄

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sdisallowanonymous
  annotations:
    metadata.gatekeeper.sh/title: "Disallow Anonymous Access"
    metadata.gatekeeper.sh/version: 1.0.0
    description: Disallows associating ClusterRole and Role resources to the system:anonymous user and system:unauthenticated group.
spec:
  crd:
    spec:
      names:
        kind: K8sDisallowAnonymous
      validation:
        # Schema for the `parameters` field
        openAPIV3Schema:
          type: object
          properties:
            allowedRoles:
              description: >-
                The list of ClusterRoles and Roles that may be associated
                with the `system:unauthenticated` group and `system:anonymous`
                user.                
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sdisallowanonymous

        violation[{"msg": msg}] {
          not is_allowed(input.review.object.roleRef, input.parameters.allowedRoles)
          review(input.review.object.subjects[_])
          msg := sprintf("Unauthenticated user reference is not allowed in %v %v ", [input.review.object.kind, input.review.object.metadata.name])
        }

        is_allowed(role, allowedRoles) {
          role.name == allowedRoles[_]
        }

        review(subject) = true {
          subject.name == "system:unauthenticated"
        }

        review(subject) = true {
          subject.name == "system:anonymous"
        }

constraint.yaml: configuration to enforce the disallow anonymous access policy template

File Contents 📄

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDisallowAnonymous
metadata:
  name: no-anonymous
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: ["rbac.authorization.k8s.io"]
        kinds: ["ClusterRoleBinding"]

suite.yaml: Gator CLI test cases for the constraint

File Contents 📄

kind: Suite
apiVersion: test.gatekeeper.sh/v1alpha1
metadata:
  name: noanonymous
tests:
  - name: no-anonymous
    template: template.yaml
    constraint: constraint.yaml
    cases:
      - name: test-oidc-reviewer
        object: tests/oidc-reviewer.yaml
        assertions:
          - violations: 'yes'
          - message: Unauthenticated user reference is not allowed
            violations: 1

oidc-reviewer.yaml: the introduced oidc-reviewer CRB object in question

File Contents 📄

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: oidc-reviewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:service-account-issuer-discovery
subjects:
  - kind: Group
    name: system:unauthenticated
    apiGroup: rbac.authorization.k8s.io

Testing

Using the preceding file and directory structure, we can run the test suite through the Gator CLI. There is no need to install anything; we can just use the Gator CLI Docker container.

docker run --rm \
    -v ${PWD}/policy-library:/home/nonroot/ \
        openpolicyagent/gator:v3.14.0 \
            verify -v k8sdisallowanonymous/...

This produced the following output:

=== RUN   no-anonymous
    === RUN   test-oidc-reviewer
    --- FAIL: test-oidc-reviewer	(0.002s)
        unexpected number of violations: got 0 violations but want at least 1: got messages []
--- FAIL: no-anonymous	(0.007s)
FAIL	k8sdisallowanonymous/suite.yaml	0.007s
FAIL

Once again, we expect a violation with a message describing the issue, but the policy raises no complaint, so the Gator CLI reports that the assertions failed. Sounds like we need to debug the policy.

Debugging

Referring back to the documentation, Gatekeeper provides some easy-to-follow steps for debugging under Viewing the Request Object.

Fetching the Admission Review Object

This is done by forcing a violation whose message contains the admission review input object.

diff --git a/policy-library/k8sdisallowanonymous/template.yaml b/policy-library/k8sdisallowanonymous/template.yaml
index 56a3d84..b508e5d 100644
--- a/policy-library/k8sdisallowanonymous/template.yaml
+++ b/policy-library/k8sdisallowanonymous/template.yaml
@@ -30,6 +30,10 @@ spec:
       rego: |
         package k8sdisallowanonymous

+        violation[{"msg": msg}] {
+          msg := sprintf("REVIEW OBJECT: %v", [input])
+        }
+
         violation[{"msg": msg}] {
           not is_allowed(input.review.object.roleRef, input.parameters.allowedRoles)
           review(input.review.object.subjects[_])

We chose to output the entire admission review input object by adding the preceding code to the beginning of the disallow anonymous access template. This would allow for easy copy-paste and testing against policy.
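If the dumped object is going to be pasted into another tool, one possible variation (not taken from the Gatekeeper guide) is to serialise it with the json.marshal built-in, so that the message is guaranteed to be valid JSON. A minimal sketch, assuming the same package name as the template:

```rego
package k8sdisallowanonymous

# Debug-only rule: it always fires and reports the whole admission review input.
# json.marshal is an OPA built-in that returns its argument as a JSON string.
# Remove this rule again once debugging is finished.
violation[{"msg": msg}] {
  msg := sprintf("REVIEW OBJECT: %v", [json.marshal(input)])
}
```

As with the debugging rule above, every tested object reports a violation while this rule is present.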

After re-running the Gator CLI test container, we received the following request object on standard output.

{"parameters": {}, "review": {"kind": {"group": "rbac.authorization.k8s.io", "kind": "ClusterRoleBinding", "version": "v1"}, "name": "oidc-reviewer", "object": {"apiVersion": "rbac.authorization.k8s.io/v1", "kind": "ClusterRoleBinding", "metadata": {"name": "oidc-reviewer"}, "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "ClusterRole", "name": "system:service-account-issuer-discovery"}, "subjects": [{"apiGroup": "rbac.authorization.k8s.io", "kind": "Group", "name": "system:unauthenticated"}]}}}

Cool. I followed the debugging guide. What do I do with this? And how can I make use of this admission review object with our policies?

One unfortunate thing about the Gatekeeper debugging guide is that it teaches you how to fetch the admission review object, but not how to use it, which is not obvious to newcomers to the Gatekeeper tool set. For granular debugging, re-running the Gator CLI after each minor amendment becomes less effective the closer you zoom in on the low-level details. It would be better to experiment in a Rego environment directly.

Locally debugging policy

To debug locally, we can run an OPA container, which provides the Rego language in a Read-Evaluate-Print Loop (REPL).

docker run -it --rm openpolicyagent/opa

Since we are working with a Rego REPL, let’s predefine the input and the helper rules before defining the violation set rule.

input := {"parameters": {}, "review": {"kind": {"group": "rbac.authorization.k8s.io", "kind": "ClusterRoleBinding", "version": "v1"}, "name": "oidc-reviewer", "object": {"apiVersion": "rbac.authorization.k8s.io/v1", "kind": "ClusterRoleBinding", "metadata": {"name": "oidc-reviewer"}, "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "ClusterRole", "name": "system:service-account-issuer-discovery"}, "subjects": [{"apiGroup": "rbac.authorization.k8s.io", "kind": "Group", "name": "system:unauthenticated"}]}}}

Response: Rule 'input' defined in package repl. Type 'show' to see rules.

```
is_allowed(role, allowedRoles) {
  role.name == allowedRoles[_]
}
```
Response: Rule 'is_allowed' defined in package repl. Type 'show' to see rules.
```
review(subject) = true {
  subject.name == "system:unauthenticated"
}
```
Response: Rule 'review' defined in package repl. Type 'show' to see rules.
```
review(subject) = true {
  subject.name == "system:anonymous"
}
```
Response: Rule 'review' defined in package repl. Type 'show' to see rules.

We are now in a position to test the Rego policy, so let’s define the violation rule:

violation[{"msg": msg}] {
  not is_allowed(input.review.object.roleRef, input.parameters.allowedRoles)
  review(input.review.object.subjects[_])
  msg := sprintf("Unauthenticated user reference is not allowed in %v %v ", [input.review.object.kind, input.review.object.metadata.name])
}

When we attempt to query the violation, the following occurs:

violation[{"msg": msg}]

Response: undefined

In Rego, this happens when the body of the violation rule never evaluates to true. Semantically this makes sense: a “violation” is only a “violation” if its conditions hold. If any expression in the body is false or undefined, the rule produces no value, so nothing is added to the violation set.

So which condition within the violation rule is not resolving?

Let’s go more granular

We need to assess each condition within the violation rule to home in on the underlying issue.

As a sanity check, let’s see if the msg assignment resolves:

```
sprintf("Unauthenticated user reference is not allowed in %v %v ", [input.review.object.kind, input.review.object.metadata.name])
```
Response: "Unauthenticated user reference is not allowed in ClusterRoleBinding oidc-reviewer "

No undefined result here; it resolved correctly.

Now let’s check the review() rule, which assesses whether the subject is one of the violating users or groups:

review(input.review.object.subjects[_])

Response: true

No problems here.

Well, it must be the is_allowed() rule. Let’s assess its value.

```
not is_allowed(input.review.object.roleRef, input.parameters.allowedRoles)
```
Response: undefined

Found you! 👈 Since the rule just compares the values of the is_allowed parameters, we can display each parameter value.

```
input.review.object.roleRef
```
Response: {"apiGroup":"rbac.authorization.k8s.io","kind":"ClusterRole","name":"system:service-account-issuer-discovery"}
```
input.parameters.allowedRoles
```
Response: undefined
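
It may seem surprising that the whole expression is undefined rather than true, given that not normally turns an undefined call into true. The reason is that the undefined argument is resolved before not is applied, so the expression fails before is_allowed is ever called. The sketch below illustrates this with hypothetical names (the undefined_demo package and the is_member, broken and works rules are not part of the policy), written in the same pre-1.0 Rego syntax as the template:

```rego
package undefined_demo

# Same shape as is_allowed: true when x equals any element of xs.
is_member(x, xs) {
  x == xs[_]
}

# input.parameters.allowed does not exist, so resolving the argument fails
# and the whole body is undefined; `not` never gets a chance to apply.
broken {
  not is_member("a", input.parameters.allowed)
}

# The list exists but is empty, so is_member is undefined and `not`
# turns that into true.
works {
  not is_member("a", [])
}

# Tests for `opa test`; both supply an empty parameters object.
test_broken_is_undefined {
  not broken with input as {"parameters": {}}
}

test_works_is_true {
  works with input as {"parameters": {}}
}
```

With OPA 1.0 or later, this older syntax needs the --v0-compatible flag when running opa test.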

It looks like the missing allowedRoles parameter is the culprit: the Rego policy does not cope with its absence. Ideally, we would have set the parameter in the constraint like so:

 apiVersion: constraints.gatekeeper.sh/v1beta1
 kind: K8sDisallowAnonymous
 metadata:
   name: no-anonymous
 spec:
   enforcementAction: deny
   match:
     kinds:
       - apiGroups: ["rbac.authorization.k8s.io"]
         kinds: ["ClusterRoleBinding"]
+  parameters:
+    allowedRoles:
+      - some-allowed-anonymous-role

This is also apparent from the admission review input object, which contains an empty parameters object:

{"parameters": {}, "review": {"kind": ...
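
Setting allowedRoles in every constraint is one remedy. Another possibility is to make the template itself tolerate a missing parameter. The following is a minimal sketch of that idea rather than the upstream fix: it assumes input.parameters is always present (as in the captured review object) and uses the object.get built-in to fall back to an empty list. The oidc_review fixture and the two test rules are hypothetical additions that would normally live in a separate _test.rego file and run with opa test; they are shown together here for brevity.

```rego
package k8sdisallowanonymous

violation[{"msg": msg}] {
  # Assumption: treat a missing allowedRoles parameter as "no roles allowed".
  allowed := object.get(input.parameters, "allowedRoles", [])
  not is_allowed(input.review.object.roleRef, allowed)
  review(input.review.object.subjects[_])
  msg := sprintf("Unauthenticated user reference is not allowed in %v %v ", [input.review.object.kind, input.review.object.metadata.name])
}

is_allowed(role, allowedRoles) {
  role.name == allowedRoles[_]
}

review(subject) = true {
  subject.name == "system:unauthenticated"
}

review(subject) = true {
  subject.name == "system:anonymous"
}

# Hypothetical fixture: the captured review object, trimmed to the fields
# the policy actually reads.
oidc_review := {
  "parameters": {},
  "review": {"object": {
    "kind": "ClusterRoleBinding",
    "metadata": {"name": "oidc-reviewer"},
    "roleRef": {"name": "system:service-account-issuer-discovery"},
    "subjects": [{"kind": "Group", "name": "system:unauthenticated"}]
  }}
}

# With no allowedRoles set, the binding should now be reported.
test_violation_without_allowed_roles {
  count(violation) == 1 with input as oidc_review
}

# Once the role is explicitly allowed, the violation should disappear.
test_no_violation_when_role_is_allowed {
  params := {"parameters": {"allowedRoles": ["system:service-account-issuer-discovery"]}}
  allowed_input := object.union(oidc_review, params)
  count(violation) == 0 with input as allowed_input
}
```

Only the rules above the fixture would go into the rego field of template.yaml.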

In closing, since a policy can be written so that a violation strictly resolves to true or false, there is no need to leave room for undefined behaviour. An undefined value in a Rego expression is a form of mishandling to be mindful of.

OPA Query-explanation/Tracing

Those adept with the OPA Gatekeeper tool set, or with a keen eye and a promising future in Where’s Waldo, would likely have caught the issue earlier.

When debugging within the OPA container, there is the option to enable tracing (query explanations).

Submitting the command trace full enables full tracing/query explanation.


This can be confirmed by running show debug:

 {
-  "explain": "off",
+  "explain": "full",
   "metrics": false,
   "instrument": false,
   "profile": false,
   "strict-builtin-errors": false
 }

Querying the violation rule in the REPL should then return the following response:

violation[{"msg": msg}]

Enter data.repl.violation[{"msg": msg}] = _
| Eval data.repl.violation[{"msg": msg}] = _
| Index data.repl.violation (matched 1 rule)
| Enter data.repl.violation
| | Eval __local6__ = data.repl.input.review.object.roleRef
| | Index data.repl.input (matched 1 rule, early exit)
| | Enter data.repl.input
| | | Eval true
| | | Exit data.repl.input early
| | Eval __local7__ = data.repl.input.parameters.allowedRoles
| | Index data.repl.input (matched 1 rule, early exit)
| | Fail __local7__ = data.repl.input.parameters.allowedRoles
| | Redo __local6__ = data.repl.input.review.object.roleRef
| | Redo data.repl.input
| | | Redo true
| Fail data.repl.violation[{"msg": msg}] = _
undefined

🧐 Fail __local7__ = data.repl.input.parameters.allowedRoles
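
Tracing is not the only way to surface an undefined value. As a lighter-weight alternative, OPA’s print built-in can be dropped into a rule; an operand that is undefined is rendered as <undefined> instead of failing the rule. The sketch below uses a hypothetical print_demo package and probe rule, and assumes it is evaluated with plain OPA (for example through opa eval) against the captured review object. Whether Gator or Gatekeeper surface print output is not something this sketch relies on.

```rego
package print_demo

# Evaluating data.print_demo.probe prints both values and yields true.
# print() does not fail on undefined operands; it prints <undefined> for them,
# which makes a missing parameter visible without reading a full trace.
probe {
  print("roleRef:", input.review.object.roleRef)
  print("allowedRoles:", input.parameters.allowedRoles)
}
```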
