Cloud Platform Engineering, DevOps, SRE, Kubernetes, KOPS

Sharing Kubernetes clusters. Different approaches

Here’s a thing: in some sense Kubernetes abstracts away the physical infrastructure on which services run. Just as with physical infrastructure, there may be clashes between different teams or organisational units competing for resources or stepping on each other’s feet whilst “doing stuff”. Without getting into the details of how such issues were solved in the past, let’s discuss how this could be solved in a modern infrastructure setup. As stated in pretty much every one of my posts: no one solution fits all, so you need to adapt it to your needs.

ACME: our organisation

The purpose of the IT infrastructure in this example is to address the IT needs of a company – ACME. ACME employs anywhere between 25 and 200 IT engineers. It plans to expand its operations and market share, so its infrastructure needs to scale both vertically and horizontally. When the company was founded a few years ago, it had a small team of 6 engineers working on an online service called “Coffee Maker”. Back in the day it was a monolithic application deployed with a small set of scripts to one of many AWS VPCs.

Things have moved on and “Coffee Maker” now has a microservices-based architecture. It is composed of 11 different microservices developed by 4 different teams: C, O, F and E. Each team takes care of a defined set of those services. Recently ACME started development of a new online service called “Grinder”. It started with 3 teams (G, R and I) but plans a future expansion. ACME also has a small number of experienced engineers who work across all projects and provide a sort of “central” or “platform” service to the whole organisation.

Strategies for sharing Kubernetes clusters

O.K., so we have two separate products, a few teams and multiple microservices. How do we split Kubernetes between them? When approaching a problem like this I like to start with the two extreme approaches and then consider a few options in between. So, the two extremes are:

Single Kubernetes cluster for all teams (one for non-prod and one for prod)

In this scenario we have one non-production cluster and one production cluster shared among all teams. There is an authentication and authorisation mechanism in place for accessing the cluster (e.g. backed by LDAP with groups and roles). The cluster is heavily namespaced and quite efficiently utilised. Let’s go over a few pros and cons:

Pros:

– Lower resource overhead – especially noticeable with small clusters (i.e. a high ratio of masters to nodes)
– Easier to achieve setup consistency for all teams and projects
– More efficient “bean packing” of Kubernetes nodes

Cons:

– Potentially higher blast radius – when something goes totally wrong, everyone is impacted
– Harder to roll out system-impacting changes to the cluster
– Greater need for a careful RBAC setup for the security and stability of the system (see the sketch after this list)
– Greater need for cooperation between teams across products when dealing with shared resources
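
As a rough illustration of what a “heavily namespaced” shared cluster with per-team RBAC could look like, here is a minimal sketch: a namespace per team, an LDAP-backed group bound to it, and a quota so one team cannot starve the others. The names and limits (team-c, coffee-team-c, the CPU/memory figures) are invented for ACME, not anything Kubernetes prescribes.

  # Namespace per team, created by the platform engineers (names are illustrative)
  kubectl create namespace team-c

  # Bind the LDAP-backed group coffee-team-c to the built-in "edit" ClusterRole,
  # scoped to this namespace only
  kubectl create rolebinding team-c-edit \
    --clusterrole=edit \
    --group=coffee-team-c \
    --namespace=team-c

  # Cap what the team can request, so one team cannot hog the shared cluster
  kubectl create quota team-c-quota \
    --hard=requests.cpu=20,requests.memory=40Gi,pods=100 \
    --namespace=team-c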

Kubernetes cluster per team (one for non-prod and one for prod)

In this scenario each team has its own Kubernetes clusters – one for non-production deployments (i.e. dev, qa, staging, etc.) and one for production deployments. Here we can potentially give each team greater autonomy in managing its clusters while at the same time isolating services from each other. Let’s go over a few pros and cons:

Pros:

– The team “owns” the Kubernetes cluster – this gives it more responsibility in managing it and leads to developer upskilling
– Lower “blast radius” – when something goes totally wrong with one cluster, other services stay up (unless they have a direct dependency on it)
– Easier to attribute cost and spending
– Teams working at “higher speeds” have greater flexibility in customising certain aspects of the solution
– Easier to achieve separation of access from a security point of view

Cons:

– For small clusters, potentially higher infrastructure spend (i.e. overhead of masters or some monitoring resources)
– Potentially higher management overhead, although teams can upskill and become more self-sufficient
– Teams given too much flexibility, and not experienced enough with Kubernetes, could morph a cluster into something impractical – an “oddball solution”
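
On the management overhead point: with a cluster per team, engineers (especially the “platform” people working across all projects) end up juggling a lot of kubeconfig contexts. A trivial sketch, with invented context names:

  # List every cluster/context known to the local kubeconfig
  kubectl config get-contexts

  # Switch between per-team clusters (context names are illustrative)
  kubectl config use-context team-g-nonprod
  kubectl config use-context team-g-prod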

As you can figure out, different approaches will be optimal for different organisations. The choice will depend on your ability to easily create clusters at scale (fully automated “cookie-cutting” of clusters, etc.), and potentially on the size of the required clusters and the criticality or confidentiality of the resources they hold. Of course, you can come up with some middle-ground approach, like having a Kubernetes cluster per product or a Kubernetes cluster per set of teams. There must, however, be some logical reasoning behind setting it up this way.

Common requirements for Kubernetes setup

Whichever way you decide to set up your clusters, there are a few things you should always take into consideration:

  • Make sure you can create and destroy each cluster in a fully automated manner – no snowflakes or “pet resources” (see the kops sketch after this list).
  • Make sure you have sound authentication and authorisation mechanisms in place – no sharing of the “master key” with people.
  • Make sure your clusters are well monitored and technical people know how to operate them.
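
Since kops is one of the tags on this post, here is a minimal, hedged sketch of what “fully automated” creation and destruction of a per-team cluster might look like. The cluster name, S3 state store, zones and instance sizes are all invented for ACME; in practice they would come from your automation (a CI pipeline, a wrapper script, etc.) rather than being typed by hand.

  # Create a per-team cluster with a single, repeatable command
  kops create cluster \
    --name=team-g.k8s.acme.example.com \
    --state=s3://acme-kops-state-store \
    --zones=eu-west-1a,eu-west-1b,eu-west-1c \
    --node-count=3 \
    --node-size=t3.medium \
    --master-size=t3.medium \
    --yes

  # Check that the cluster came up healthy
  kops validate cluster \
    --name=team-g.k8s.acme.example.com \
    --state=s3://acme-kops-state-store

  # Tear it down just as easily – no snowflakes
  kops delete cluster \
    --name=team-g.k8s.acme.example.com \
    --state=s3://acme-kops-state-store \
    --yes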