To recap briefly: all external (non-Kubernetes) traffic comes in through the Ingress; the Ingress routes it to a Service, which then directs it to one of the Pods with matching labels (e.g. app=X). Those Pods, in turn, are created by a Deployment.
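The chain above can be sketched as three manifests wired together by names and labels. All names (x.example.com, x-service, the image) are illustrative placeholders, not anything from a real cluster:

```yaml
# Hypothetical app "X": Ingress -> Service -> Pods managed by a Deployment.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: x-ingress
spec:
  rules:
    - host: x.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: x-service   # Ingress points at the Service by name
                port:
                  number: 80
---
apiVersion: v1
kind: Service
metadata:
  name: x-service
spec:
  selector:
    app: X            # the Service routes to Pods carrying this label
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: x-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: X
  template:
    metadata:
      labels:
        app: X        # the label the Service matches on
    spec:
      containers:
        - name: x
          image: example/x:latest   # placeholder image
          ports:
            - containerPort: 8080
```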
The real power of a good PaaS comes from supporting tools, technologies, and built-in capabilities: whether there is a low or high entry barrier to using them, and whether you can automate their configuration without hacking or re-engineering the whole solution. Below is a list of what I consider must-have essentials. Setting these up on Kubernetes is pretty straightforward and well documented. However, there are many approaches to achieving the same or similar results, so you need to decide what suits you best.
It is crucial that monitoring and telemetry of infrastructure and services are done right from the start and that they scale with the platform. There is a multitude of tools available to collect, aggregate and visualise various metrics of cluster health and capacity, as well as those of running services. I am a big fan of the Prometheus Operator by CoreOS. It creates a Prometheus instance and configures it dynamically, and it defines custom Kubernetes objects such as ServiceMonitor, which make it easier to discover what to scrape within the Kubernetes cluster. If you need Prometheus to scrape new metrics from a specific application running on Kubernetes, just define a new ServiceMonitor and apply it to the cluster. I like to use Grafana for metrics visualisation. Again, as part of the cluster deployment you can pre-define Grafana dashboards and data sources, so that when a brand-new cluster comes up, Grafana is already configured and loaded with your favourite metrics. When it comes to collecting metrics of the cluster itself, there are plenty of options as well; two examples are Node Exporter and kube-state-metrics.
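As a rough sketch of the workflow described above, a ServiceMonitor telling the operator-managed Prometheus to scrape a hypothetical application might look like this (the app name, port name and selector labels are all assumptions for illustration):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app                 # hypothetical application
  labels:
    team: platform             # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-app              # select the Service(s) exposing the metrics
  endpoints:
    - port: metrics            # a *named* port on the Service
      path: /metrics
      interval: 30s
```

Once applied with `kubectl apply -f`, the operator rewrites the Prometheus scrape configuration for you; no manual edits to prometheus.yml are needed.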
Alerting is a sibling of monitoring. You want to know when something has gone wrong with your app or PaaS before people start shouting on Slack or Twitter! Again, you can programmatically define Prometheus alerts: define them, store them in SCM, and apply changes as they happen.
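With recent versions of the Prometheus Operator, such alerts can be declared as a PrometheusRule object and kept in version control like any other manifest. The alert name, expression and thresholds below are made-up examples, not rules from the article:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  labels:
    role: alert-rules          # must match the Prometheus ruleSelector
spec:
  groups:
    - name: my-app.rules
      rules:
        - alert: HighErrorRate             # hypothetical alert
          expr: rate(http_requests_total{job="my-app",status=~"5.."}[5m]) > 0.1
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "my-app is returning 5xx errors"
```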
Everything generates logs: applications, instances and Kubernetes internal services. It's relatively easy to collect all the logs; the challenge is to label them at collection time so that they are discoverable without much effort. For Kubernetes my favourite stack is EFK – Elasticsearch, Fluentd and Kibana – just like ELK, but with Fluentd. Fluentd is quite amazing at picking up labels from Docker containers and decorating logs accordingly. Logs generated by service ABC, running in a specific namespace with certain labels, will be decorated and indexed accordingly, so you will be able to search for and discover the relevant logs in Kibana accurately and fast.
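A common way to run Fluentd in this stack is as a DaemonSet, so one collector runs on every node and tails the container logs from the host. A minimal sketch, assuming the community fluentd-kubernetes-daemonset image and an in-cluster Elasticsearch service at elasticsearch.logging.svc (both assumptions, adjust to your setup):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch.logging.svc"   # assumed ES endpoint
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
          volumeMounts:
            - name: varlog
              mountPath: /var/log        # container logs live under the host's /var/log
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
```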
This one needs to be done right: when a node forming the cluster (be it one of the masters or a worker node) dies for whatever reason, the cluster should cope with that. When setting up Kubernetes clusters on AWS, the use of Auto Scaling Groups (ASGs) is the obvious choice, e.g. three ASGs for master nodes (in different AZs) and one or more ASGs for the nodes where containers are scheduled. Re-launching instances in this setup is painless, and new nodes can join the cluster pretty seamlessly. At the same time, Kubernetes can migrate Pods off unhealthy nodes and schedule them onto healthy ones.
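One detail worth adding on top of this: while Pods are being moved off a dying or drained node, a PodDisruptionBudget keeps a minimum number of replicas of a service running. A minimal sketch, with a hypothetical app label and threshold:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2          # never evict below 2 running replicas
  selector:
    matchLabels:
      app: my-app          # hypothetical label of the protected Pods
```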
With cost savings in mind, and given that flexible platform autoscaling is a must for any modern setup, we don't want to pay for capacity we don't need until we actually need it. Kubernetes offers three types of autoscaling:
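One of these mechanisms, the Horizontal Pod Autoscaler, can be sketched as follows; the target Deployment name, replica bounds and CPU threshold are illustrative assumptions:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:          # which workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: my-app           # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```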
To be continued …