OpenShift Container Platform 4 on Azure using Installer Provisioned Infrastructure

Overview

This post is entirely for fun. I am trying a developer preview product – the OpenShift Container Platform 4 (OCP 4) Installer Provisioned Infrastructure (IPI) on Microsoft Azure.

I really didn’t want the day to end in blood, sweat, and tears, so I went through as much documentation as I could find on OCP 4.1, AWS, and Azure, along with some of the code. I created a pay-as-you-go account and purchased a domain name. For now let’s call it example.com.

Blood, Sweat, and Tears (or not)

The first thing I created in Azure was a resource group called openshift4-azure. In that resource group I created a public DNS zone for the domain and delegated the domain to the Azure DNS servers. This is to manage the entries that the OCP 4 installer will need to create in order to route traffic into the cluster.
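For anyone who prefers the CLI to the portal, the same setup can be sketched roughly like this. The function name is mine, and the resource group, region and example.com are placeholders from this post; the name servers it prints are the ones to delegate to at the registrar.

```shell
# Sketch: create the resource group and public DNS zone, then print the
# Azure name servers that the domain has to be delegated to.
# Assumption: create_dns_zone is a hypothetical helper, not part of any tool.
create_dns_zone() {
    rg="$1"; region="$2"; domain="$3"
    az group create --name "$rg" --location "$region" >/dev/null &&
    az network dns zone create --resource-group "$rg" --name "$domain" >/dev/null &&
    az network dns zone show --resource-group "$rg" --name "$domain" \
        --query nameServers -o tsv
}

# Usage: create_dns_zone openshift4-azure uksouth example.com
```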

I then set up my local Go environment and GOPATH by following https://golang.org/doc/install. This was needed to compile the installer for Azure, as the binaries are not readily available yet. To test that this was working I ran env | grep GOPATH. My path is $HOME/go.

I then forked the openshift/installer repository and cloned it into the Go path: $HOME/go/src/github.com/openshift/. I added the upstream remote to my fork with git remote add upstream https://github.com/openshift/installer.git in case I made code or documentation changes for PRs. To build the binary I ran ./hack/build.sh from the installer directory. This created the installer in the bin folder.
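Put together, the clone and build steps look something like the sketch below. The helper names are mine and myuser is a hypothetical GitHub username for the fork; the paths and the build script are the ones described above.

```shell
# Sketch: where the fork should live under GOPATH, plus the clone/build steps.
# Assumption: "myuser" in the usage line is a hypothetical GitHub username.

# Print the directory the installer should be cloned into.
installer_src_dir() {
    printf '%s\n' "${GOPATH:-$HOME/go}/src/github.com/openshift/installer"
}

# Clone the fork, add the upstream remote and build the binary.
# The body runs in a subshell so the caller's working directory is untouched.
build_installer() (
    fork_user="$1"
    dir=$(installer_src_dir)
    mkdir -p "$(dirname "$dir")" &&
    git clone "https://github.com/$fork_user/installer.git" "$dir" &&
    cd "$dir" &&
    git remote add upstream https://github.com/openshift/installer.git &&
    ./hack/build.sh
)

# Usage: build_installer myuser && ls "$(installer_src_dir)/bin"
```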

I followed the instructions at https://github.com/openshift/installer/tree/master/docs/user/azure/install.md to clone an image for CoreOS in my region. In every region where I want to create a cluster I need to copy the same image. I wanted to run these repeatedly so I downloaded the Azure CLI from https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-yum?view=azure-cli-latest. As I’m running this in uksouth, these are the commands I needed to run:

export VHD_NAME=rhcos-410.8.20190504.0-azure.vhd
az storage account create --location uksouth --name ckocp4storage --kind StorageV2 --resource-group openshift-azure
az storage container create --name vhd --account-name ckocp4storage
az group create --location uksouth --name rhcos_images
ACCOUNT_KEY=$(az storage account keys list --account-name ckocp4storage --resource-group openshift-azure --query "[0].value" -o tsv)
az storage blob copy start --account-name "ckocp4storage" --account-key "$ACCOUNT_KEY" --destination-blob "$VHD_NAME" --destination-container vhd --source-uri "https://openshifttechpreview.blob.core.windows.net/rhcos/$VHD_NAME"

It took me a few tries to find a unique storage account name. Storage account names have to be unique across all of Azure, not just within a region.
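A quick local check saves a round trip or two. Storage account names must be 3 to 24 characters long and contain only lowercase letters and digits; the sketch below checks just that format (the function name is mine), and the availability of a name can then be queried with az storage account check-name.

```shell
# Sketch: check a candidate storage account name against the format rules
# (3-24 characters, lowercase letters and digits only).
# Assumption: valid_storage_name is a hypothetical helper name.
valid_storage_name() {
    name="$1"
    len=${#name}
    [ "$len" -ge 3 ] && [ "$len" -le 24 ] || return 1
    case "$name" in
        # Characters are listed explicitly so the check does not depend on
        # how the current locale treats ranges such as a-z.
        *[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
    esac
    return 0
}

# Usage:
#   valid_storage_name ckocp4storage && az storage account check-name --name ckocp4storage
```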

It is recommended to use the Premium_LRS SKU. To get premium storage in Azure on a PAYG account, I needed to register the storage resource provider under PayAsYouGo subscription -> Resource Providers. The blob copy also needs to finish before the image is created, otherwise you get the following error:

Cannot import source blob https://ckocp4storage.blob.core.windows.net/vhd/rhcos-410.8.20190504.0-azure.vhd since it has not been completely copied yet. Copy status of the blob is CopyPending.

Once the copy had completed I could create the image:

export RHCOS_VHD=$(az storage blob url --account-name ckocp4storage -c vhd --name "$VHD_NAME" -o tsv)
az image create --resource-group rhcos_images --name rhcostestimage --os-type Linux --storage-sku Premium_LRS --source "$RHCOS_VHD" --location uksouth
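To avoid hitting the CopyPending error, the copy status can be polled before running az image create. This is a minimal sketch, assuming the storage credentials are already available to the CLI (for example through AZURE_STORAGE_KEY); the function name and the 30 second default interval are my own choices.

```shell
# Sketch: block until the blob copy has finished.
# Assumptions: wait_for_blob_copy is a hypothetical helper; storage
# credentials are picked up from the environment or the az login.
wait_for_blob_copy() {
    account="$1"; container="$2"; blob="$3"
    interval="${4:-30}"
    while :; do
        status=$(az storage blob show --account-name "$account" \
            --container-name "$container" --name "$blob" \
            --query properties.copy.status -o tsv) || return 1
        # Normalise the case so "Success" and "success" are treated alike.
        status=$(printf '%s' "$status" | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz')
        case "$status" in
            success) return 0 ;;
            pending) sleep "$interval" ;;
            *) echo "copy ended with status: $status" >&2; return 1 ;;
        esac
    done
}

# Usage: wait_for_blob_copy ckocp4storage vhd "$VHD_NAME" && az image create ...
```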

I created a service principal for my installation and copied the following somewhere safe:

az ad sp create-for-rbac --name openshift4azure
{
"appId": "serviceprincipal",
"displayName": "openshift4azure",
"name": "http://openshift4azure",
"password": "serviceprincipalpassword",
"tenant": "tenant id"
}

And gave it the following access:

az role assignment create --assignee serviceprincipal --role "User Access Administrator"
az role assignment create --assignee serviceprincipal --role "Contributor"
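Since access role problems only showed up later in the Azure logs, it may be worth confirming that both roles actually landed on the service principal. A rough sketch, with a helper name of my own and the placeholder serviceprincipal standing in for the real appId:

```shell
# Sketch: verify that the service principal holds both roles granted above.
# Assumption: has_required_roles is a hypothetical helper name.
has_required_roles() {
    assignee="$1"
    roles=$(az role assignment list --assignee "$assignee" \
        --query "[].roleDefinitionName" -o tsv) || return 1
    for want in "Contributor" "User Access Administrator"; do
        # -x matches the whole line, so partial role names do not count.
        printf '%s\n' "$roles" | grep -qx "$want" || {
            echo "missing role: $want" >&2
            return 1
        }
    done
}

# Usage: has_required_roles serviceprincipal && echo "roles look fine"
```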

I then got my oc client and pull secret as described at https://cloud.redhat.com/openshift/install/azure/user-provisioned.

I tried my first Azure IPI OCP4 install and the first thing that I got was the following.

openshift-install create cluster
? SSH Public Key $HOME/.ssh/id_rsa.pub
? azure subscription id yyyy-xxxx-nnnn-bbbb-fffffff
? azure tenant id yyy-xxxx-nnnn-bbbb-nnnnnn
? azure service principal client id yyy-xxxx-nnnn-bbbb-ccccccccc
? azure service principal client secret [? for help] ************************************
INFO Saving user credentials to "$HOME/.azure/osServicePrincipal.json" 
? Region uksouth
? Base Domain example.com
? Cluster Name attempt1
? Pull Secret [? for help] ****************
INFO Creating infrastructure resources... 
^CERROR 
ERROR Error: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="Operation results in exceeding quota limits of Core. Maximum allowed: 10, Current in use: 8, Additional requested: 8. Please read more about quota increase at https://aka.ms/ProdportalCRP/?#create/Microsoft.Support/Parameters/{\"subId\":\"ae90eef6-f8ea-479c-8c6a-9dd4bf9e47d0\",\"pesId\":\"15621\",\"supportTopicId\":\"32447243\"}." 
ERROR 
ERROR on ../../../../../../../../tmp/openshift-install-216822811/bootstrap/main.tf line 117, in resource "azurerm_virtual_machine" "bootstrap": 
ERROR 117: resource "azurerm_virtual_machine" "bootstrap" { 
ERROR 
ERROR 
ERROR 
ERROR Error: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="Operation results in exceeding quota limits of Core. Maximum allowed: 10, Current in use: 8, Additional requested: 8. Please read more about quota increase at https://aka.ms/ProdportalCRP/?#create/Microsoft.Support/Parameters/{\"subId\":\"ae90eef6-f8ea-479c-8c6a-9dd4bf9e47d0\",\"pesId\":\"15621\",\"supportTopicId\":\"32447243\"}." 
ERROR 
ERROR on ../../../../../../../../tmp/openshift-install-216822811/master/master.tf line 44, in resource "azurerm_virtual_machine" "master": 
ERROR 44: resource "azurerm_virtual_machine" "master" { 
ERROR 
ERROR 
ERROR 
ERROR Error: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="Operation results in exceeding quota limits of Core. Maximum allowed: 10, Current in use: 8, Additional requested: 8. Please read more about quota increase at https://aka.ms/ProdportalCRP/?#create/Microsoft.Support/Parameters/{\"subId\":\"ae90eef6-f8ea-479c-8c6a-9dd4bf9e47d0\",\"pesId\":\"15621\",\"supportTopicId\":\"32447243\"}." 
ERROR 
ERROR on ../../../../../../../../tmp/openshift-install-216822811/master/master.tf line 44, in resource "azurerm_virtual_machine" "master": 
ERROR 44: resource "azurerm_virtual_machine" "master" { 
ERROR 
ERROR 
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply using Terraform

A standard PAYG account does not allow for the amount of resources that IPI will create. The install needs more than the 10 cores available by default (8 were already in use and another 8 were requested), so I needed to increase the compute quota to allow for creation:

Resource Manager, UKSOUTH, DSv2 Series from 10 to 100
Resource Manager, UKSOUTH, DSv3 Series from 10 to 100
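The arithmetic behind the failure is simple enough to check up front. The sketch below uses the numbers from the error message; the function name is mine, and the current usage and limits for a region can be listed with az vm list-usage.

```shell
# Sketch: will a request for more cores fit inside the quota?
# Assumption: cores_fit is a hypothetical helper name.
cores_fit() {
    limit="$1"; in_use="$2"; requested="$3"
    [ $((in_use + requested)) -le "$limit" ]
}

# With the values from the error above: 8 in use + 8 requested > 10 allowed.
cores_fit 10 8 8 || echo "quota too low, request an increase first"

# Current usage and limits per VM family:
#   az vm list-usage --location uksouth -o table
```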

I needed to export the environment variable that points the installer at the RHCOS image, which I had created from the blob in my storage account:

export OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE="/resourceGroups/rhcos_images/providers/Microsoft.Compute/images/rhcostestimage"
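Rather than typing the path by hand, it can be derived from the image itself. az image show returns the full resource ID including the /subscriptions/<id> prefix, while the value I exported starts at /resourceGroups, so this sketch strips the prefix. The helper name is mine, and I have not checked whether the installer would also accept the full ID.

```shell
# Sketch: turn a full Azure resource ID into the shorter form used above.
# Assumption: strip_subscription is a hypothetical helper name.
strip_subscription() {
    printf '%s\n' "$1" | sed 's|^/subscriptions/[^/]*||'
}

# Usage:
#   id=$(az image show --resource-group rhcos_images --name rhcostestimage --query id -o tsv)
#   export OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE="$(strip_subscription "$id")"
```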

I destroyed the stack with openshift-install destroy cluster --dir=cluster-dir to try again and watched with glee as my Azure attempt-1 resource group diminished. It was then time for attempt 2, for which I also passed the location of my Azure credentials JSON file by exporting AZURE_AUTH_LOCATION=creds.json. Baaaad idea. The installer overwrote the file at that location. It’s a good thing I had a copy and didn’t particularly care.
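Lesson learned: take a copy of any credentials file before pointing the installer at it. A small sketch of what I mean, with a helper name of my own:

```shell
# Sketch: make a timestamped copy of a file and print the copy's path.
# Assumption: backup_file is a hypothetical helper name.
backup_file() {
    src="$1"
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    dest="$src.bak.$(date +%Y%m%d%H%M%S)"
    # -p keeps the original permissions and timestamps on the copy.
    cp -p "$src" "$dest" && printf '%s\n' "$dest"
}

# Usage: backup_file creds.json && export AZURE_AUTH_LOCATION=creds.json
```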

Attempt 2 seems to have worked. I have an operational cluster. All my operators are running in a good state (not degraded and not progressing):

$ ~/bin/oc get co
NAME                                       VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.okd-2019-06-25-110619   True        False         False      46m
cloud-credential                           4.2.0-0.okd-2019-06-25-110619   True        False         False      66m
cluster-autoscaler                         4.2.0-0.okd-2019-06-25-110619   True        False         False      65m
console                                    4.2.0-0.okd-2019-06-25-110619   True        False         False      50m
dns                                        4.2.0-0.okd-2019-06-25-110619   True        False         False      65m
image-registry                             4.2.0-0.okd-2019-06-25-110619   True        False         False      59m
ingress                                    4.2.0-0.okd-2019-06-25-110619   True        False         False      53m
kube-apiserver                             4.2.0-0.okd-2019-06-25-110619   True        False         False      61m
kube-controller-manager                    4.2.0-0.okd-2019-06-25-110619   True        False         False      62m
kube-scheduler                             4.2.0-0.okd-2019-06-25-110619   True        False         False      61m
machine-api                                4.2.0-0.okd-2019-06-25-110619   True        False         False      66m
machine-config                             4.2.0-0.okd-2019-06-25-110619   True        False         False      62m
marketplace                                4.2.0-0.okd-2019-06-25-110619   True        False         False      59m
monitoring                                 4.2.0-0.okd-2019-06-25-110619   True        False         False      52m
network                                    4.2.0-0.okd-2019-06-25-110619   True        False         False      66m
node-tuning                                4.2.0-0.okd-2019-06-25-110619   True        False         False      60m
openshift-apiserver                        4.2.0-0.okd-2019-06-25-110619   True        False         False      60m
openshift-controller-manager               4.2.0-0.okd-2019-06-25-110619   True        False         False      62m
openshift-samples                          4.2.0-0.okd-2019-06-25-110619   True        False         False      53m
operator-lifecycle-manager                 4.2.0-0.okd-2019-06-25-110619   True        False         False      63m
operator-lifecycle-manager-catalog         4.2.0-0.okd-2019-06-25-110619   True        False         False      63m
operator-lifecycle-manager-packageserver   4.2.0-0.okd-2019-06-25-110619   True        False         False      59m
service-ca                                 4.2.0-0.okd-2019-06-25-110619   True        False         False      66m
service-catalog-apiserver                  4.2.0-0.okd-2019-06-25-110619   True        False         False      60m
service-catalog-controller-manager         4.2.0-0.okd-2019-06-25-110619   True        False         False      60m
storage                                    4.2.0-0.okd-2019-06-25-110619   True        False         False      59m
support                                    4.2.0-0.okd-2019-06-25-110619   True        False         False      66m
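Eyeballing that many rows gets old quickly, so here is a rough filter that prints only the operators that are not available, still progressing, or degraded. The function name is mine, and it assumes the column order shown above with a version present on every row.

```shell
# Sketch: read "oc get co --no-headers" output on stdin and print the names
# of operators that are not Available=True, Progressing=False, Degraded=False.
# Assumption: unhealthy_operators is a hypothetical helper name.
unhealthy_operators() {
    awk '$3 != "True" || $4 != "False" || $5 != "False" { print $1 }'
}

# Usage: oc get co --no-headers | unhealthy_operators
```

No output means every operator is in a good state.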

 

Conclusion

For a first attempt on a developer preview things went very well. I’ve trawled through the Azure logs and found things like access role issues, so I still don’t know if I’ve made a mistake in my Service Principal role assignments. I think some better error handling and messages would help with the installer. I’d hate to see things like Machine Sets not being able to be expanded because my IAM is wrong and I didn’t know about it. Of course general things like installation behind a proxy, bring your own DNS or SecurityGroups/Networking, and better publicising of the CoreOS images would also help.

I’m hoping to find out more as I use the cluster over the next few days. If you haven’t yet, try the installer on Azure and let me know what you think:

  1. To get started, visit try.openshift.com and click on “Get Started”.
  2. Log in or create a Red Hat account and follow the instructions for setting up your first cluster on Azure.

One thought on “OpenShift Container Platform 4 on Azure using Installer Provisioned Infrastructure”

  1. +1 for effort. I wouldn’t run this in production, as the registry uses ephemeral storage – your images will disappear if the registry pod is restarted!

    OpenShift 4.2 is scheduled to GA Azure support. We’ll properly use Azure blob storage for the registry (amongst many, many other updates). Stay tuned!
