K8s on Oracle Cloud [Part 7]: Setting up cert-manager

cert-manager should be installed as one of the first services, as many other services depend on it and may fail to deploy correctly if cert-manager is missing.

To speed the process up, add automation, and make sure the entire installation can be easily replayed, we use a set of scripts available in the GitHub repository: k8s-scripts. There is some documentation for the scripts, and you can look at their source code for more details, but this guide expands on them, explaining the various options and suggesting optimal settings.

Personal notes: these are my personal notes on how to set everything up, kept so it is easier to repeat next time.

Step 1: Prerequisites

  • K8s on Oracle Cloud “Part 4” is completed.
  • K8s on Oracle Cloud “Part 6” and all parts before it are completed. Not all of those services are used by cert-manager, but they will all be needed later on.
  • cmctl is installed. This is optional, but the command line tool is very useful for checking and diagnosing the cert-manager deployment. You can skip it, but I highly recommend getting it and using it; one way to install it is shown below.
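
If you do not have cmctl yet, one way to install it on a Linux amd64 machine is to download the binary from the cert-manager releases page. This is only a sketch; double check the asset name for your platform and the cert-manager version you use:

curl -fsSL -o cmctl.tar.gz https://github.com/cert-manager/cert-manager/releases/download/v1.8.0/cmctl-linux-amd64.tar.gz
tar xzf cmctl.tar.gz
sudo mv cmctl /usr/local/bin/
# verify the client works
cmctl version --client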

Step 2: Configuration

Please read the Let's Encrypt settings section carefully, as it is critical for correctly setting up cert-manager.

Version adjustment

k8s-scripts defines versions for services which were up to date and tested at the time the project was last updated by its developers. These versions may become outdated over time, or perhaps you need or want to use a very specific version of the package.

To adjust the cert-manager package version, look at the ~/.tigase-flux/envs/versions.env file and change the value of the CM_VER property:

# Cert-Manager
CM_VER="1.8.0"

To check the latest available version of the package, run:

helm search hub --max-col-width 65 cert-manager | grep "URL\|cert-manager/cert-manager "

Example:

~$ helm search hub --max-col-width 65 cert-manager | grep "URL\|cert-manager/cert-manager "
URL                                                              	CHART VERSION	APP VERSION	DESCRIPTION
https://artifacthub.io/packages/helm/cert-manager/cert-manager   	1.8.0        	v1.8.0     	A Helm chart for cert-manager

Adjust the versions.env file with the latest CHART VERSION.
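
If you prefer to script the adjustment, a simple in-place edit also works. This is a sketch that assumes GNU sed and the file location shown above:

# update the pinned chart version and confirm the change
sed -i 's/^CM_VER=.*/CM_VER="1.8.0"/' ~/.tigase-flux/envs/versions.env
grep CM_VER ~/.tigase-flux/envs/versions.env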

Let's Encrypt settings

There are a few settings affecting the cert-manager deployment. One is critical to set correctly, two others are important to leave as they are, and the last two are only needed for the DNS01 challenge:

SSL_EMAIL="EMAIL_FOR_LETSENCRYPT"
SSL_STAG_ISSUER="letsencrypt-staging"
SSL_PROD_ISSUER="letsencrypt"
ROUTE53_ACCESS_KEY=""
ROUTE53_SECRET_KEY=""
  • SSL_EMAIL - must be set to a correct and working email address. Technically, the email is not verified, so anything email-like would work, but Let's Encrypt sends notifications about certificate expiration, renewal, and possibly other events. I do not know how Let's Encrypt would handle emails bouncing back with an unknown recipient; maybe they would stop issuing certificates. It is better to set it to a correct, working email address.
  • SSL_STAG_ISSUER and SSL_PROD_ISSUER - these are just certificate issuer identifiers for your k8s cluster. In theory they can be set to anything, but the k8s-scripts for other services may have the certificate issuer hardcoded to these default values, so it is better to leave them as they are unless you really need to change them. If you do change them, make sure other services use the correct issuer as well.
  • ROUTE53_ACCESS_KEY and ROUTE53_SECRET_KEY are used to configure a DNS01 challenge issuer. This is used when the DNS domain is hosted on AWS Route53. A DNS01 challenge issuer is more flexible than HTTP01, as domains and hostnames do not have to point to the cluster’s IP address to obtain SSL certificates (see the sketch after this list). To create an AWS user with the correct access credentials:
    • Go to IAM
    • Add users
    • Set a User name - route53-man
    • Select AWS credential type - Programmatic access
    • Set permissions - Attach existing policies directly
    • Create policy - route53-man
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": "route53:GetChange",
            "Resource": "arn:aws:route53:::change/*"
          },
          {
            "Effect": "Allow",
            "Action": "route53:ChangeResourceRecordSets",
            "Resource": "arn:aws:route53:::hostedzone/*"
          },
          {
            "Effect": "Allow",
            "Action": "route53:ListHostedZonesByName",
            "Resource": "*"
          }
        ]
      }
      
    • Attach the new policy to the user
    • Copy the user’s Access key ID and Secret access key to envs/cluster.env
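
For orientation, a ClusterIssuer using the Route53 DNS01 solver looks roughly like the sketch below. The k8s-scripts installation generates a manifest of this kind for you (the uninstallation output later in this guide shows files named issuer-production-dns.yaml and route53-secret.yaml), so this is only to illustrate where the two keys end up. The names letsencrypt-dns and route53-credentials are illustrative, not necessarily what the scripts use:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns            # illustrative name
spec:
  acme:
    email: EMAIL_FOR_LETSENCRYPT
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-dns
    solvers:
      - dns01:
          route53:
            region: us-east-1      # Route53 is global; any valid region works
            accessKeyID: ROUTE53_ACCESS_KEY
            secretAccessKeySecretRef:
              name: route53-credentials   # Secret holding ROUTE53_SECRET_KEY
              key: secret-access-key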

Custom values

There is not much custom configuration added by default.

    installCRDs: true
    prometheus:
      # set to true if you want to enable prometheus monitoring for cert-manager
      enabled: false
  • installCRDs must be set to true or cert-manager will not work correctly, as it needs several custom resources to be installed. If this property is set to true, they are all installed automatically; otherwise you would need to install them manually (you can verify them after installation, see below).
  • prometheus is set to false by default because cert-manager is installed as one of the first services, before monitoring and Prometheus are installed, so Prometheus is not yet available. If Prometheus is already installed, or once it is installed later, this setting can be changed to true.
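
Once cert-manager is installed (see the Installation step), you can verify the custom resources are in place; the list should include entries such as certificates.cert-manager.io and clusterissuers.cert-manager.io:

kubectl get crds | grep cert-manager.io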

On top of these values, the cert-manager installation script adds additional custom files with certificate issuer configuration, which is set to obtain certificates from the Let's Encrypt service.
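
Based on the ClusterIssuer we inspect with kubectl describe later in this guide, the generated production issuer is roughly equivalent to the following manifest (a reconstruction for illustration, not the script's literal file). The letsencrypt-staging issuer is the same except it points at the staging endpoint, https://acme-staging-v02.api.letsencrypt.org/directory, and uses its own private key secret:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    email: EMAIL_FOR_LETSENCRYPT
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt
    solvers:
      - http01:
          ingress:
            class: nginx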

Step 3: Installation

Once all the settings are adjusted and ready, the installation step is pretty simple. We just have to run the correct script from the k8s-scripts package: scripts/cluster-cert-manager.sh. I suggest executing flux get all -A and/or flux get hr -A before and after running the script to see the difference.

The installation script makes the following changes:

  1. Installs the chart source repository
  2. Creates Helm release manifest for cert-manager in the FluxCD’s git repository
  3. Installs 2 certificate issuers:
     3.1. letsencrypt-staging - for test certificates
     3.2. letsencrypt - for production certificates

Optionally, check the cert-manager installation using the cmctl tool:

~/temp/k8s-scripts$ cmctl check api
Not ready: the cert-manager CRDs are not yet installed on the Kubernetes API server

And this is correct, because we have not installed cert-manager yet.

Now is the time to run the installation script:

~/temp/k8s-scripts$ ./scripts/cluster-cert-manager.sh 
      Adding cert-manager source at https://charts.jetstack.io
/home/t/.tigase-flux/projects/cluster-name
[master 8fbc46c] cert-manager deployment
 2 files changed, 11 insertions(+)
 create mode 100644 infra/common/sources/cert-manager.yaml
Enumerating objects: 12, done.
Counting objects: 100% (12/12), done.
Delta compression using up to 16 threads
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 891 bytes | 891.00 KiB/s, done.
Total 7 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/a/cluster-name
   491aa23..8fbc46c  master -> master
► annotating GitRepository flux-system in flux-system namespace
✔ GitRepository annotated
◎ waiting for GitRepository reconciliation
✔ fetched revision master/8fbc46cd7da63164b98d9fc3fbb743df914569ee
Waiting for the system to be ready
   Deploying cert-manager
Creating folder for cert-manager namespace...
Update service kustomization
/home/t/.tigase-flux/projects/cluster-name
Update namespace kustomization
/home/t/.tigase-flux/projects/cluster-name
Update common kustomization
/home/t/.tigase-flux/projects/cluster-name
[master 0781bce] cert-manager deployment
 5 files changed, 42 insertions(+)
 create mode 100644 infra/common/cert-manager/cert-manager/cert-manager.yaml
 create mode 100644 infra/common/cert-manager/cert-manager/kustomization.yaml
 create mode 100644 infra/common/cert-manager/kustomization.yaml
 create mode 100644 infra/common/cert-manager/namespace.yaml
Enumerating objects: 14, done.
Counting objects: 100% (14/14), done.
Delta compression using up to 16 threads
Compressing objects: 100% (10/10), done.
Writing objects: 100% (11/11), 1.39 KiB | 1.39 MiB/s, done.
Total 11 (delta 1), reused 2 (delta 1), pack-reused 0
remote: Resolving deltas: 100% (1/1), done.
To https://github.com/a/cluster-name
   8fbc46c..0781bce  master -> master
► annotating GitRepository flux-system in flux-system namespace
✔ GitRepository annotated
◎ waiting for GitRepository reconciliation
✔ fetched revision master/0781bce54c1913f4348157a58ff5c7066f5cb42e
Waiting for the system to be ready
Update service kustomization for infra/common/cert-manager/cert-manager 
    in /home/t/.tigase-flux/projects/cluster-name
/home/t/.tigase-flux/projects/cluster-name
[master 3e96610] cert-manager deployment
 3 files changed, 30 insertions(+)
 create mode 100644 infra/common/cert-manager/cert-manager/issuer-production.yaml
 create mode 100644 infra/common/cert-manager/cert-manager/issuer-staging.yaml
Enumerating objects: 14, done.
Counting objects: 100% (14/14), done.
Delta compression using up to 16 threads
Compressing objects: 100% (9/9), done.
Writing objects: 100% (9/9), 1.12 KiB | 1.12 MiB/s, done.
Total 9 (delta 2), reused 1 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 1 local object.
To https://github.com/a/cluster-name
   0781bce..3e96610  master -> master
► annotating GitRepository flux-system in flux-system namespace
✔ GitRepository annotated
◎ waiting for GitRepository reconciliation
✔ fetched revision master/3e96610f70cae2163eb4c410043a7a246473b0c1

It seems to be successful; let's check it out:

/.tigase-flux$ flux get hr -A
NAMESPACE    	NAME          	READY	MESSAGE                         	REVISION	SUSPENDED 
cert-manager 	cert-manager  	True 	Release reconciliation succeeded	v1.8.0  	False    	
flux-system  	sealed-secrets	True 	Release reconciliation succeeded	2.1.8   	False    	
ingress-nginx	ingress-nginx 	True 	Release reconciliation succeeded	4.1.1   	False    	

and kubectl:

~/temp/k8s-scripts$ kubectl get pods -n cert-manager
NAME                                      READY   STATUS    RESTARTS   AGE
cert-manager-789bb474bd-n2fsm             1/1     Running   0          4m5s
cert-manager-cainjector-6bc9d758b-gb48s   1/1     Running   0          4m5s
cert-manager-webhook-586d45d5ff-z7t8d     1/1     Running   0          4m5s

and cert-manager API:

~/temp/k8s-scripts$ cmctl check api
The cert-manager API is ready

Let’s get more details:

~/temp/k8s-scripts$ kubectl describe deployment cert-manager -n cert-manager
Name:                   cert-manager
Namespace:              cert-manager
CreationTimestamp:      Wed, 11 May 2022 16:53:30 -0700
Labels:                 app=cert-manager
                        app.kubernetes.io/component=controller
                        app.kubernetes.io/instance=cert-manager
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=cert-manager
                        app.kubernetes.io/version=v1.8.0
                        helm.sh/chart=cert-manager-v1.8.0
                        helm.toolkit.fluxcd.io/name=cert-manager
                        helm.toolkit.fluxcd.io/namespace=cert-manager
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: cert-manager
                        meta.helm.sh/release-namespace: cert-manager
Selector:               app.kubernetes.io/component=controller,
                          app.kubernetes.io/instance=cert-manager,
                          app.kubernetes.io/name=cert-manager
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=cert-manager
                    app.kubernetes.io/component=controller
                    app.kubernetes.io/instance=cert-manager
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=cert-manager
                    app.kubernetes.io/version=v1.8.0
                    helm.sh/chart=cert-manager-v1.8.0
  Service Account:  cert-manager
  Containers:
   cert-manager:
    Image:      quay.io/jetstack/cert-manager-controller:v1.8.0
    Port:       9402/TCP
    Host Port:  0/TCP
    Args:
      --v=2
      --cluster-resource-namespace=$(POD_NAMESPACE)
      --leader-election-namespace=kube-system
    Environment:
      POD_NAMESPACE:   (v1:metadata.namespace)
    Mounts:           <none>
  Volumes:            <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   cert-manager-789bb474bd (1/1 replicas created)
Events:          <none>

Perhaps more important and interesting is the certificate issuer information:

~/temp/k8s-scripts$ kubectl get clusterissuer -A
NAME                  READY   AGE
letsencrypt           True    20h
letsencrypt-staging   True    20h

This gives us a list of all certificate issuers available on the cluster.

Now let’s see detailed information about a specific issuer:

~/temp/k8s-scripts$ kubectl describe clusterissuer letsencrypt
Name:         letsencrypt
Namespace:    
Labels:       kustomize.toolkit.fluxcd.io/name=common
              kustomize.toolkit.fluxcd.io/namespace=flux-system
Annotations:  <none>
API Version:  cert-manager.io/v1
Kind:         ClusterIssuer
Metadata:
  Creation Timestamp:  2022-05-11T23:53:57Z
  Generation:          1
  Managed Fields:
    API Version:  cert-manager.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          f:kustomize.toolkit.fluxcd.io/name:
          f:kustomize.toolkit.fluxcd.io/namespace:
      f:spec:
        f:acme:
          f:email:
          f:privateKeySecretRef:
            f:name:
          f:server:
          f:solvers:
    Manager:      kustomize-controller
    Operation:    Apply
    Time:         2022-05-11T23:53:57Z
    API Version:  cert-manager.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:acme:
          .:
          f:lastRegisteredEmail:
          f:uri:
        f:conditions:
          .:
          k:{"type":"Ready"}:
            .:
            f:lastTransitionTime:
            f:message:
            f:observedGeneration:
            f:reason:
            f:status:
            f:type:
    Manager:         cert-manager-clusterissuers
    Operation:       Update
    Subresource:     status
    Time:            2022-05-11T23:53:57Z
  Resource Version:  4451893
  UID:               f9f88019-fadb-48b3-8368-46b9cc4b09a8
Spec:
  Acme:
    Email:            cluster-name@domain.com
    Preferred Chain:  
    Private Key Secret Ref:
      Name:  letsencrypt
    Server:  https://acme-v02.api.letsencrypt.org/directory
    Solvers:
      http01:
        Ingress:
          Class:  nginx
Status:
  Acme:
    Last Registered Email:  cluster-name@domain.com
    Uri:                    https://acme-v02.api.letsencrypt.org/acme/acct/539454786
  Conditions:
    Last Transition Time:  2022-05-11T23:53:57Z
    Message:               The ACME account was registered with the ACME server
    Observed Generation:   1
    Reason:                ACMEAccountRegistered
    Status:                True
    Type:                  Ready
Events:                    <none>

It seems everything is ready and waiting.

Testing and verification that it really works

Ok, now everything seems to be working. Let’s check it out.

First, we need a domain with its DNS configured to point to our cluster load balancer, the one used by ingress. Yes, this is why we need the ingress service installed.

Let’s say we want to use the very rare and unique domain called example.com. I know, very unusual. Anyway, to make things more interesting, we want to obtain the certificate for a few subdomains as well: www.example.com and mail.example.com. Make sure all the domains and hostnames are configured and resolve correctly to the ingress LB IP address:

~$ host example.com
example.com has address 203.0.113.123
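
Since the certificate will cover all three names, it is worth checking each of them. A quick shell loop, assuming the host tool is available:

for h in example.com www.example.com mail.example.com; do host "$h"; done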

Now we have to create a manifest file, ~/cert-man-test.yaml, with the certificate request for cert-manager on our cluster:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com
  namespace: default
spec:
  secretName: example-com-tls
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer
  commonName: example.com
  dnsNames:
    - example.com
    - www.example.com
    - mail.example.com

Let’s load the file to the cluster and see what happens:

~$ kubectl apply -f ~/cert-man-test.yaml
certificate.cert-manager.io/example-com created

Now we can check the status of our request:

~$ kubectl get certificaterequest -A
NAMESPACE   NAME                    APPROVED   DENIED   READY   ISSUER                REQUESTOR                                         AGE
default     example-com-2lspx      True                True    letsencrypt-staging   system:serviceaccount:cert-manager:cert-manager   21m
default     example-com-kjcrg      True                False   letsencrypt-staging   system:serviceaccount:cert-manager:cert-manager   12m

It usually takes a while, from a few seconds up to a few minutes, to receive a certificate. As we can see above, there are 2 requests: one is successful, the second is still not ready. To get more details on what is going on with the one showing READY=False, we can check our orders:

~$ kubectl get order -A
NAMESPACE   NAME                               STATE     AGE
default     example-com-2lspx-473502887       valid     22m
default     example-com-kjcrg-1627368428      pending   13m

We see one order is still pending. This most likely happens when one of the listed domains or hostnames does not have a proper DNS configuration, or the DNS change we made has not yet propagated on the Internet.

You can check the order details for more information:

~$ kubectl describe order example-com-kjcrg-1627368428
...
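
If the order output is not conclusive, you can dig one level deeper into the ACME challenges, or ask cmctl for a combined status report (the resource name comes from the example above):

kubectl get challenges -n default
cmctl status certificate example-com -n default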

Once the certificate request is successful, you can edit the ~/cert-man-test.yaml file, replace ‘letsencrypt-staging’ with ‘letsencrypt’, and apply it again to check that production certificates are generated correctly.
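
Once the certificate is issued, you can confirm what actually landed in the secret. A sketch using openssl; the secret name comes from the manifest above:

kubectl get secret example-com-tls -n default -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -subject -issuer -dates

A certificate from letsencrypt-staging will show a staging issuer and is not trusted by browsers; the production one should show a regular Let's Encrypt issuer.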

Uninstallation

To uninstall cert-manager, you can run the installation script with the --remove parameter:

~/temp/k8s-scripts$ ./scripts/cluster-cert-manager.sh --remove
Switched to context "cluster-name".
    Preparing to remove: cert-manager
    Removing: infra/common/cert-manager
Update service kustomization for infra/common/ in /home/k/.tigase-flux/projects/cluster-name
/home/k/.tigase-flux/projects/cluster-name
    Removing: infra/common/sources/cert-manager.yaml
Update service kustomization for infra/common/sources in /home/k/.tigase-flux/projects/cluster-name
/home/k/.tigase-flux/projects/cluster-name
[master c61d6e8] Removing cert-manager deployment
 11 files changed, 121 deletions(-)
 delete mode 100644 infra/common/cert-manager/cert-manager/cert-manager.yaml
 delete mode 100644 infra/common/cert-manager/cert-manager/issuer-production-dns.yaml
 delete mode 100644 infra/common/cert-manager/cert-manager/issuer-production.yaml
 delete mode 100644 infra/common/cert-manager/cert-manager/issuer-staging.yaml
 delete mode 100644 infra/common/cert-manager/cert-manager/kustomization.yaml
 delete mode 100644 infra/common/cert-manager/cert-manager/route53-secret.yaml
 delete mode 100644 infra/common/cert-manager/kustomization.yaml
 delete mode 100644 infra/common/cert-manager/namespace.yaml
 delete mode 100644 infra/common/sources/cert-manager.yaml
Enumerating objects: 12, done.
Counting objects: 100% (12/12), done.
Delta compression using up to 16 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (7/7), 726 bytes | 726.00 KiB/s, done.
Total 7 (delta 3), reused 1 (delta 1), pack-reused 0
remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
To https://github.com/a/cluster-name
   ce8f2e5..c61d6e8  master -> master
► annotating GitRepository flux-system in flux-system namespace
✔ GitRepository annotated
◎ waiting for GitRepository reconciliation
✔ fetched revision master/c61d6e8f44b8bc3965929ed014b2d78077512d0e

Note: if you have services which depend on cert-manager, the uninstallation may fail, or it may succeed but leave the cluster unstable.
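
Before removing cert-manager, it may be worth checking what in the cluster still references it:

kubectl get certificates,certificaterequests --all-namespaces
kubectl get clusterissuers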