SSL and the OpenShift API

Adventures trying to use the OpenShift and Kubernetes APIs on RHEL, Windows, and from within a pod.

One of the great features of Kubernetes and OpenShift is the API and the ability to perform any action by making the correct REST call, but as a wise man once said, the three great virtues of a programmer are laziness, impatience, and hubris. Making REST calls explicitly and dealing with the results, even in a language like Python, is long-winded and hard work, so naturally libraries have been written that wrap up the OpenShift API to make life easy.

Sadly, writing great documentation is not a virtue that comes naturally to programmers, so the OpenShift Python libraries take a little figuring out, especially when you start hitting weird errors or working on platforms that are not well tested.

Installation is pretty straightforward, a simple

pip install openshift

will install both the openshift Python library and the Kubernetes Python library that it builds upon. Alternatively, on RHEL7 a simple

yum install python2-openshift.noarch

should work. Failing that, the code for the openshift library is part of the official OpenShift release and can be found at https://github.com/openshift/openshift-restclient-python
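A quick way to confirm the install worked is simply to import both packages; this is just a sanity check, nothing OpenShift-specific:

# Both imports should succeed if the install worked
import kubernetes
import openshift

print("kubernetes client version: %s" % kubernetes.__version__)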

Once you have it installed, using it is simple:

from kubernetes import client, config
from openshift.dynamic import DynamicClient

k8s_client = config.new_client_from_config()
dyn_client = DynamicClient(k8s_client)

v1_pod = dyn_client.resources.get(api_version='v1', kind='Pod')
podList = v1_pod.get(namespace="default")
for pod in podList['items']:
    print("%s %s" % (pod.metadata.name, pod.status.podIP))

which should pick up the credentials stored in .kube/config (generated by the oc login command), use them to get all the pods in the cluster's default namespace, and list their names and IPs.

Sadly, the difference between theory and practice is that in theory they are the same and in practice they're not. When I ran that on my development platform, a slightly left-field mix of Windows 10, Python, and MinGW (don't ask, the customer makes the rules), I was greeted with an ugly SSL error message clearly suggesting a certificate problem:

2019-03-19 17:21:30,191 WARNING Retrying 
(Retry(total=2, connect=None, read=None, redirect=None, status=None))
 after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED]
certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)'))': /version

There is an issue on GitHub that seems to fit, https://github.com/openshift/openshift-restclient-python/issues/198, but no solution.

An experiment with the pure Kubernetes library had the same result.

Several frustrating hours later, it appears that there is something strange in how Python on Windows deals with certificates. Quite what would take a lot longer to figure out than I had; certainly longer than it took to find the workaround!

The workaround is pretty simple and obvious: you just turn off SSL verification. It's not exactly great practice and should never be done in production, but the technique opens the way to a more flexible approach to authorisation. This leaves our little demo program looking like:

import os
from kubernetes import client
from openshift.dynamic import DynamicClient

token = os.getenv("OCP_TOKEN")

k8sconfig = client.Configuration()
k8sconfig.host = "https://example.com:8443/"
k8sconfig.verify_ssl = False
k8sconfig.api_key = {"authorization": "Bearer " + token}

k8s_client = client.ApiClient(k8sconfig)
dyn_client = DynamicClient(k8s_client)

v1_pod = dyn_client.resources.get(api_version='v1', kind='Pod')
podList = v1_pod.get(namespace="default")
for pod in podList['items']:
    print("%s %s" % (pod.metadata.name, pod.status.podIP))

Complete with reading the token from an environment variable, which you can set with

export OCP_TOKEN=`oc whoami -t`

And there you are, a simple workaround to an annoying problem.
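One side note on the workaround: with verify_ssl turned off, urllib3 will usually print an InsecureRequestWarning for every request. If that noise gets in the way while testing, it can be silenced (again, only something to do in test or demo code):

import urllib3

# Silence the InsecureRequestWarning that urllib3 emits when certificate
# verification is disabled (test/demo use only)
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)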

RHEL

Interestingly, while testing out that bit of code I discovered another annoying SSL issue, this time running on RHEL7, where turning off SSL verification leads to the following error:

urllib3.exceptions.SSLError: 
[Errno 2] No such file or directory

This one comes down to the python2-certifi package, which points to a certificate file that it should provide itself but doesn't, and which is needed even with verification turned off. It looks to be fixed in the Python 3 version, but Python 2 is still the default on RHEL7, so that's a little annoying. Luckily, this gives us the chance to make our little test program a lot better by turning verification back on and pointing it at the system certs provided by the ca-certificates package:

import os
from kubernetes import client
from openshift.dynamic import DynamicClient

token = os.getenv("OCP_TOKEN")

k8sconfig = client.Configuration()
k8sconfig.host = "https://example.com:8443/"
k8sconfig.verify_ssl = True
k8sconfig.ssl_ca_cert = "/etc/pki/tls/certs/ca-bundle.crt"
k8sconfig.api_key = {"authorization": "Bearer " + token}

k8s_client = client.ApiClient(k8sconfig)
dyn_client = DynamicClient(k8s_client)

v1_pod = dyn_client.resources.get(api_version='v1', kind='Pod')
podList = v1_pod.get(namespace="default")
for pod in podList['items']:
    print("%s %s" % (pod.metadata.name, pod.status.podIP))

And there we have it, the different variations of the program we need to get it working on different client platforms.
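If maintaining per-platform variants gets tiresome, one way to fold them together is a small helper that reads the host, token, and an optional CA bundle from the environment. This is only a rough sketch; OCP_HOST and OCP_CA_BUNDLE are hypothetical variable names of my own, not anything the libraries look for themselves:

import os
from kubernetes import client
from openshift.dynamic import DynamicClient

def make_client():
    # OCP_HOST, OCP_TOKEN and OCP_CA_BUNDLE are our own (hypothetical) variables,
    # not anything the kubernetes or openshift libraries read by themselves.
    cfg = client.Configuration()
    cfg.host = os.environ["OCP_HOST"]
    cfg.api_key = {"authorization": "Bearer " + os.environ["OCP_TOKEN"]}

    ca_bundle = os.getenv("OCP_CA_BUNDLE")
    if ca_bundle:
        # e.g. /etc/pki/tls/certs/ca-bundle.crt on RHEL7
        cfg.verify_ssl = True
        cfg.ssl_ca_cert = ca_bundle
    else:
        # fall back to the Windows workaround: no verification (test use only)
        cfg.verify_ssl = False
    return DynamicClient(client.ApiClient(cfg))

dyn_client = make_client()
v1_pod = dyn_client.resources.get(api_version='v1', kind='Pod')
for pod in v1_pod.get(namespace="default")['items']:
    print("%s %s" % (pod.metadata.name, pod.status.podIP))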

As a final word, the Kubernetes Python library on its own is similar but a little different; the same program would look more like:

import os
from kubernetes import client

token = os.getenv("OCP_TOKEN")

k8sconfig = client.Configuration()
k8sconfig.host = "https://example.com:8443"
k8sconfig.verify_ssl = True
k8sconfig.ssl_ca_cert = "/etc/pki/tls/certs/ca-bundle.crt"
k8sconfig.api_key = {"authorization": "Bearer " + token}

k8sclient = client.ApiClient(k8sconfig)
v1 = client.CoreV1Api(k8sclient)
podList = v1.list_namespaced_pod(namespace="default")
for pod in podList.items:
    print("%s" % (pod.metadata.name))

Postscript

Just for fun, if you wanted to run this from within an OpenShift container there are a couple of changes worth making. There is a service set up automatically that allows pods to contact the API server at https://kubernetes.default.svc, and the CA certificate required to talk to it is injected into every pod at run time. So our little test program becomes:

import os
from kubernetes import client
from openshift.dynamic import DynamicClient

token = os.getenv("OCP_TOKEN")

k8sconfig = client.Configuration()
k8sconfig.host = "https://kubernetes.default.svc"
k8sconfig.verify_ssl = True
k8sconfig.ssl_ca_cert = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
k8sconfig.api_key = {"authorization": "Bearer " + token}

k8s_client = client.ApiClient(k8sconfig)
dyn_client = DynamicClient(k8s_client)

v1_pod = dyn_client.resources.get(api_version='v1', kind='Pod')
podList = v1_pod.get(namespace="default")
for pod in podList['items']:
    print("%s %s" % (pod.metadata.name, pod.status.podIP))
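Since the token itself is also mounted into every pod alongside the CA certificate, you don't strictly need to pass it in via an environment variable. A small variation on the same program, reading the mounted service account files directly, might look like this (a sketch, assuming the pod's service account has permission to list pods in the namespace):

from kubernetes import client
from openshift.dynamic import DynamicClient

# The service account token and CA certificate are mounted into every pod by default
TOKEN_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/token"
CA_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"

with open(TOKEN_FILE) as f:
    token = f.read().strip()

k8sconfig = client.Configuration()
k8sconfig.host = "https://kubernetes.default.svc"
k8sconfig.verify_ssl = True
k8sconfig.ssl_ca_cert = CA_FILE
k8sconfig.api_key = {"authorization": "Bearer " + token}

dyn_client = DynamicClient(client.ApiClient(k8sconfig))
v1_pod = dyn_client.resources.get(api_version='v1', kind='Pod')
for pod in v1_pod.get(namespace="default")['items']:
    print("%s %s" % (pod.metadata.name, pod.status.podIP))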

OpenShift Container Platform 4 on Azure using Installer Provisioned Infrastructure

Overview

This post is entirely for fun. I am trying a developer preview product – the OpenShift Container Platform 4 (OCP 4) Installer Provisioned Infrastructure (IPI) on Microsoft Azure.

I really didn’t want the day to end in blood, sweat, and tears, so I went through as much OCP 4.1 documentation as I could find about AWS and Azure, and read through some of the code. I created a pay-as-you-go account and purchased a domain name; for now let’s call it example.com.

Blood, Sweat, and Tears (or not)

The first thing I created in Azure was a resource group called openshift4-azure. In that resource group I created a public DNS zone for my domain, delegated for management to the Azure DNS servers. This is to manage the entries that the OCP 4 installer will need to create in order to direct traffic into the cluster.

I then set up my local Go environment and path (https://golang.org/doc/install) in order to compile the installer for Azure, since the binaries are not readily available yet. To test that this was working I ran env | grep GOPATH; my path is $HOME/go.

I then forked the openshift/installer repository and cloned it into the Go path at $HOME/go/src/github.com/openshift/ . I added the correct upstream to my fork with git remote add upstream https://github.com/openshift/installer.git in case I made code or documentation changes for PRs. To build the binary I ran ./hack/build.sh from the installer directory, which created the installer in the bin folder.

I followed the instructions at https://github.com/openshift/installer/tree/master/docs/user/azure/install.md to copy the CoreOS image into my region; the same image needs to be copied into every region where I want to create a cluster. I wanted to run these steps repeatedly, so I downloaded the Azure CLI from https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-yum?view=azure-cli-latest. I’m running this in uksouth, so these are the commands I needed to run:

export VHD_NAME=rhcos-410.8.20190504.0-azure.vhd
az storage account create --location uksouth --name ckocp4storage --kind StorageV2 --resource-group openshift-azure
az storage container create --name vhd --account-name ckocp4storage
az group create --location uksouth --name rhcos_images
ACCOUNT_KEY=$(az storage account keys list --account-name ckocp4storage --resource-group openshift-azure --query "[0].value" -o tsv)
az storage blob copy start --account-name "ckocp4storage" --account-key "$ACCOUNT_KEY" --destination-blob "$VHD_NAME" --destination-container vhd --source-uri "https://openshifttechpreview.blob.core.windows.net/rhcos/$VHD_NAME"

Creating the storage account took me a few tries, since the name has to be unique: storage account names are globally unique across Azure because they form part of the blob endpoint hostname.

It is recommended to use the Premium_LRS SKU. To get premium storage in a PAYG account, I needed to register the storage resource provider under PayAsYouGo subscription -> Resource Providers. Before creating the image, the storage blob needs to finish copying, otherwise you get the following error:

Cannot import source blob https://ckocp4storage.blob.core.windows.net/vhd/rhcos-410.8.20190504.0-azure.vhd since it has not been completely copied yet. Copy status of the blob is CopyPending.

Once the copy had completed, I created the image:

export RHCOS_VHD=$(az storage blob url --account-name ckocp4storage -c vhd --name "$VHD_NAME" -o tsv)
az image create --resource-group rhcos_images --name rhcostestimage --os-type Linux --storage-sku Premium_LRS --source "$RHCOS_VHD" --location uksouth

I created a service principal for my installation and copied the following somewhere safe:

az ad sp create-for-rbac --name openshift4azure
{
  "appId": "serviceprincipal",
  "displayName": "openshift4azure",
  "name": "http://openshift4azure",
  "password": "serviceprincipalpassword",
  "tenant": "tenant id"
}

And gave it the following access:

az role assignment create --assignee serviceprincipal --role "User Access Administrator"
az role assignment create --assignee serviceprincipal --role "Contributor"

I then got my oc client and pull secret as described at https://cloud.redhat.com/openshift/install/azure/user-provisioned.

I then tried my first Azure IPI OCP 4 install, and the first thing I got was the following:

openshift-install create cluster
? SSH Public Key $HOME/.ssh/id_rsa.pub
? azure subscription id yyyy-xxxx-nnnn-bbbb-fffffff
? azure tenant id yyy-xxxx-nnnn-bbbb-nnnnnn
? azure service principal client id yyy-xxxx-nnnn-bbbb-ccccccccc
? azure service principal client secret [? for help] ************************************
INFO Saving user credentials to "$HOME/.azure/osServicePrincipal.json" 
? Region uksouth
? Base Domain example.com
? Cluster Name attempt1
? Pull Secret [? for help] ****************************************************************
INFO Creating infrastructure resources... 
^CERROR 
ERROR Error: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="Operation results in exceeding quota limits of Core. Maximum allowed: 10, Current in use: 8, Additional requested: 8. Please read more about quota increase at https://aka.ms/ProdportalCRP/?#create/Microsoft.Support/Parameters/{\"subId\":\"ae90eef6-f8ea-479c-8c6a-9dd4bf9e47d0\",\"pesId\":\"15621\",\"supportTopicId\":\"32447243\"}." 
ERROR 
ERROR on ../../../../../../../../tmp/openshift-install-216822811/bootstrap/main.tf line 117, in resource "azurerm_virtual_machine" "bootstrap": 
ERROR 117: resource "azurerm_virtual_machine" "bootstrap" { 
ERROR 
ERROR 
ERROR 
ERROR Error: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="Operation results in exceeding quota limits of Core. Maximum allowed: 10, Current in use: 8, Additional requested: 8. Please read more about quota increase at https://aka.ms/ProdportalCRP/?#create/Microsoft.Support/Parameters/{\"subId\":\"ae90eef6-f8ea-479c-8c6a-9dd4bf9e47d0\",\"pesId\":\"15621\",\"supportTopicId\":\"32447243\"}." 
ERROR 
ERROR on ../../../../../../../../tmp/openshift-install-216822811/master/master.tf line 44, in resource "azurerm_virtual_machine" "master": 
ERROR 44: resource "azurerm_virtual_machine" "master" { 
ERROR 
ERROR 
ERROR 
ERROR Error: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="Operation results in exceeding quota limits of Core. Maximum allowed: 10, Current in use: 8, Additional requested: 8. Please read more about quota increase at https://aka.ms/ProdportalCRP/?#create/Microsoft.Support/Parameters/{\"subId\":\"ae90eef6-f8ea-479c-8c6a-9dd4bf9e47d0\",\"pesId\":\"15621\",\"supportTopicId\":\"32447243\"}." 
ERROR 
ERROR on ../../../../../../../../tmp/openshift-install-216822811/master/master.tf line 44, in resource "azurerm_virtual_machine" "master": 
ERROR 44: resource "azurerm_virtual_machine" "master" { 
ERROR 
ERROR 
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply using Terraform

A standard PAYG account does not allow for the number of resources that IPI will create; it needs more than the default quota of 10 cores, so I had to request a compute quota increase:

Resource Manager, UKSOUTH, DSv2 Series from 10 to 100
Resource Manager, UKSOUTH, DSv3 Series from 10 to 100

I needed to export the environment variable for the RHCOS install image, pointing at the image created earlier from the storage account blob:

export OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE="/resourceGroups/rhcos_images/providers/Microsoft.Compute/images/rhcostestimage"

I destroyed the stack with openshift-install destroy cluster --dir=cluster-dir to try again and watched with glee as my Azure attempt-1 resource group diminished. It was then time for attempt 2, for which I also passed the location of the Azure authentication credentials in a JSON file by exporting AZURE_AUTH_LOCATION=creds.json. Baaaad idea. The installer overwrote the file at that location. It’s a good thing I had a copy and didn’t particularly care.

Attempt 2 seems to have worked. I have an operational cluster. All my operators are running in a good state (not degraded and not progressing):

$ ~/bin/oc get co
NAME                                       VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.okd-2019-06-25-110619   True        False         False      46m
cloud-credential                           4.2.0-0.okd-2019-06-25-110619   True        False         False      66m
cluster-autoscaler                         4.2.0-0.okd-2019-06-25-110619   True        False         False      65m
console                                    4.2.0-0.okd-2019-06-25-110619   True        False         False      50m
dns                                        4.2.0-0.okd-2019-06-25-110619   True        False         False      65m
image-registry                             4.2.0-0.okd-2019-06-25-110619   True        False         False      59m
ingress                                    4.2.0-0.okd-2019-06-25-110619   True        False         False      53m
kube-apiserver                             4.2.0-0.okd-2019-06-25-110619   True        False         False      61m
kube-controller-manager                    4.2.0-0.okd-2019-06-25-110619   True        False         False      62m
kube-scheduler                             4.2.0-0.okd-2019-06-25-110619   True        False         False      61m
machine-api                                4.2.0-0.okd-2019-06-25-110619   True        False         False      66m
machine-config                             4.2.0-0.okd-2019-06-25-110619   True        False         False      62m
marketplace                                4.2.0-0.okd-2019-06-25-110619   True        False         False      59m
monitoring                                 4.2.0-0.okd-2019-06-25-110619   True        False         False      52m
network                                    4.2.0-0.okd-2019-06-25-110619   True        False         False      66m
node-tuning                                4.2.0-0.okd-2019-06-25-110619   True        False         False      60m
openshift-apiserver                        4.2.0-0.okd-2019-06-25-110619   True        False         False      60m
openshift-controller-manager               4.2.0-0.okd-2019-06-25-110619   True        False         False      62m
openshift-samples                          4.2.0-0.okd-2019-06-25-110619   True        False         False      53m
operator-lifecycle-manager                 4.2.0-0.okd-2019-06-25-110619   True        False         False      63m
operator-lifecycle-manager-catalog         4.2.0-0.okd-2019-06-25-110619   True        False         False      63m
operator-lifecycle-manager-packageserver   4.2.0-0.okd-2019-06-25-110619   True        False         False      59m
service-ca                                 4.2.0-0.okd-2019-06-25-110619   True        False         False      66m
service-catalog-apiserver                  4.2.0-0.okd-2019-06-25-110619   True        False         False      60m
service-catalog-controller-manager         4.2.0-0.okd-2019-06-25-110619   True        False         False      60m
storage                                    4.2.0-0.okd-2019-06-25-110619   True        False         False      59m
support                                    4.2.0-0.okd-2019-06-25-110619   True        False         False      66m


Conclusion

For a first attempt on a developer preview, things went very well. I’ve trawled through the Azure logs and found things like access role issues, so I still don’t know if I’ve made a mistake in my service principal role assignments. I think some better error handling and messages would help the installer; I’d hate to see things like MachineSets failing to scale because my IAM is wrong and I didn’t know about it. Of course, general things like installation behind a proxy, bring-your-own DNS or security groups/networking, and better publicising of the CoreOS images would also help.

I’m hoping to find out more as I use the cluster over the next few days. If you haven’t yet, try the installer on Azure and let me know what you think:

  1. To get started, visit try.openshift.com and click on “Get Started”.
  2. Log in or create a Red Hat account and follow the instructions for setting up your first cluster on Azure.
