Data Solutions and Kubernetes. A Love Story.

Our Data Solutions team was the first of our engineering teams to deploy a Dockerized production-like service into the cloud, back in spring 2020. Back then it was done with Terraform and Azure App Service. This marked the starting point of the Docker & cloud first strategy we follow for component development.

The most recent component is Data Weir, a data forwarding service that shares pseudonymized data with our new data platform (Cappa) on Azure. The data platform is used to sell anonymized WEBFLEET data to third-party data customers, e.g. extended streams of trace data or datasets of driving-event data for specific cities. It also enables us to do big data analysis and machine learning fast.

Data Weir is a crucial component since it's the primary way to ingest data into the data platform. For every minute of downtime, we immediately lose revenue. That's why it's important that the component is resilient, redundant & scalable.

We knew that we would need massive parallelism to process and shovel most of the data of, e.g., the Kafka reporting topic to Azure Event Hubs. Over 90 days that adds up to roughly 11.4 billion messages and 16 TB of data, forwarded with a median latency of 200 ms. That works out to roughly 1,500 messages per second on average, at about 1.4 kB each. So we made Data Weir a first-class citizen of Kubernetes. We also knew (or at least inferred) that the long-term company strategy is to migrate all services to Kubernetes. Another reason why we opted for Kubernetes instead of the usual on-prem Puppet deployment was our use of Python as the programming language (there were many reasons for that choice, but they are out of scope for this article). With Kubernetes we could use Docker to package our application, which simplifies deployment in many ways.
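To give a feel for what such a forwarding feed looks like, here is a minimal sketch in Python using confluent-kafka and azure-eventhub. The broker address, topic names, connection string, and batching strategy are illustrative assumptions, not the actual Data Weir code.

```python
# Minimal sketch of a Kafka -> Event Hubs forwarding loop (illustrative,
# not the actual Data Weir code; names and batching are assumptions).
from confluent_kafka import Consumer
from azure.eventhub import EventHubProducerClient, EventData

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",    # hypothetical broker address
    "group.id": "data-weir-reporting",    # shared group: Kafka splits the
                                          # topic partitions across all pods
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["reporting"])         # hypothetical topic name

producer = EventHubProducerClient.from_connection_string(
    "<EVENT_HUBS_CONNECTION_STRING>", eventhub_name="reporting")

try:
    while True:
        msgs = consumer.consume(num_messages=500, timeout=1.0)
        if not msgs:
            continue
        batch = producer.create_batch()
        added = 0
        for msg in msgs:
            if msg.error():
                continue  # a real feed would log and handle the error
            # a real feed would also start a new batch once this one is full
            batch.add(EventData(msg.value()))
            added += 1
        if added:
            producer.send_batch(batch)
            # commit offsets only after a successful send, so messages are
            # not lost if a pod dies mid-batch
            consumer.commit(asynchronous=False)
finally:
    consumer.close()
    producer.close()
```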

One fixed requirement was easy scalability and control of the Data Weir deployment. Data Weir has multiple data forwarding feeds that are distributed across all pods, plus a control channel, implemented as an internal Kafka topic, to control all pods simultaneously.
Commands are sent via an API to one random pod (selected by the Kubernetes load balancer), fed into the control topic, and consumed by all pods immediately. This way we can scale
up & down as we like while keeping full control.
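Here is a sketch of the broadcast trick, again assuming confluent-kafka: while the data feeds share one consumer group (so partitions are split across pods), each pod subscribes to the control topic with a unique, per-pod group ID, so every pod receives every command. The topic name, group-id scheme, and command format are hypothetical.

```python
# Sketch of the control-channel consumer (illustrative; names and command
# format are assumptions). Key idea: a unique consumer group per pod turns
# a regular Kafka topic into a broadcast channel.
import json
import uuid

from confluent_kafka import Consumer

control_consumer = Consumer({
    "bootstrap.servers": "kafka:9092",                # hypothetical
    "group.id": f"data-weir-control-{uuid.uuid4()}",  # unique per pod
    "auto.offset.reset": "latest",                    # only new commands
})
control_consumer.subscribe(["data-weir-control"])     # hypothetical topic

def handle_command(cmd: dict) -> None:
    """Apply a control command, e.g. pausing or resuming a feed."""
    if cmd.get("action") == "pause":
        print(f"pausing feed {cmd['feed']}")          # placeholder logic
    elif cmd.get("action") == "resume":
        print(f"resuming feed {cmd['feed']}")

while True:
    msg = control_consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    handle_command(json.loads(msg.value()))
```

The API handler on whichever pod receives the request only has to produce the same JSON command to the control topic; Kafka then delivers it to every pod, regardless of how many replicas are currently running.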

The development started in April 2021; the final production rollout was in September 2021.
We started with the AKS deployment on our data platform since Kubernetes on the Noris cloud was not fully ready at that time. Later we switched to Noris in three steps. First, we deployed a second instance of the application to Noris so that both deployments shared the load. Then we monitored the new home, and finally we scaled the AKS instance down to zero, ready to be spun up again in case of problems.
Unfortunately, there were multiple incidents on Noris, so we decided to keep both instances up and running, each able to handle the full load if the other goes down. As you might have noticed, we do full DevOps; only infrastructure tasks are not done by us.
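Scaling an instance down to zero while keeping it ready to spin up again is a one-line patch against the deployment. A hedged sketch using the official kubernetes Python client (the context, deployment, and namespace names are made up; plain kubectl scale achieves the same):

```python
# Sketch: scale the standby AKS instance of Data Weir down to zero replicas
# (names are hypothetical; `kubectl scale` does the same thing).
from kubernetes import client, config

config.load_kube_config(context="aks-data-platform")  # assumed kubeconfig context
apps = client.AppsV1Api()
apps.patch_namespaced_deployment_scale(
    name="data-weir",                # hypothetical deployment name
    namespace="data-solutions",      # hypothetical namespace
    body={"spec": {"replicas": 0}},  # set back to N to spin it up again
)
```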

In the meantime, we managed to deploy Data Weir to the dev & stage Kubernetes clusters using ArgoCD. Now we are waiting for the WEBFLEET AKS to be production-ready so that we can move our production setup to this shiny new place.
 
And finally, some eye candy: a heatmap of one week of trace data in Amsterdam, created on our data platform in less than 10 minutes. On-prem, this would take several hours. This works for all places, small and big, where you can find our fleet.