
The goal of modernization is clear: faster development, more flexible infrastructure, better scalability. Monolithic applications are replaced by microservices, virtual machines by containers, single data centers by hybrid or multi-cloud environments.
In practice, however, many DevOps teams encounter the same phenomenon: after containerization, the number of incidents rises, failures become harder to reproduce, and root cause analysis takes hours. In these moments, one sentence is often heard: “The system has become less stable.” In reality, this is generally not the case – and the difference matters.
Kubernetes does not increase the number of failures. It makes them harder to see.
In this article, we explain why traditional monitoring is no longer sufficient in a Kubernetes-based environment, and how observability helps make sense of modern system behavior.
What Really Changes with Containerization?
In a monolithic environment, systems are relatively predictable. Failures can often be localized. The number of components is limited, connections are stable, and the runtime environment stays unchanged for longer periods. When a failure occurs, it can usually be traced back to a specific module or server.
Moreover, in a monolithic environment, much of the knowledge about the system lives implicitly in people. The architect keeps the dependencies in mind. The senior engineer knows where the weak points are. This knowledge is never recorded in the system itself – it lives in meetings, wikis, and memory.
In Kubernetes-based microservice architectures, this changes fundamentally. A single user request can travel through an API gateway, 5–10 services, caches, databases, and message queues – dozens of components – before receiving a response. Pods are created and destroyed dynamically, services communicate with one another, resources scale automatically, and routing paths change continuously.
This dynamism is precisely what gives modern architecture its value. But this same dynamism makes failures invisible.
A slowing database call can trigger a cascade effect across the service chain. The node is healthy. The pod is running. CPU values look fine. Yet the user experience degrades – and nobody can pinpoint where the problem started.
The problem rarely appears in a single component. What matters is the behavior of the entire service chain.
Why Does Traditional Monitoring Fail Here?
Traditional monitoring thinks in components.
It watches CPU load, memory usage, and node health, and alerts when a predefined threshold is crossed. This approach worked well when system structure was stable, connections were simple, and a failure could be clearly linked to a single resource.
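To make the contrast concrete, component-centric monitoring can be reduced to a static rule per metric. The sketch below is a minimal, hypothetical illustration – the metric names, thresholds, and service names are invented for this example, not any vendor's API:

```python
# Minimal sketch of component-centric, threshold-based alerting.
# Each component is judged in isolation; there is no notion of the request path.

THRESHOLDS = {"cpu_percent": 90.0, "memory_percent": 85.0}

def check_component(name: str, metrics: dict) -> list[str]:
    """Return an alert for each metric that crosses its static threshold."""
    alerts = []
    for metric, limit in THRESHOLDS.items():
        value = metrics.get(metric, 0.0)
        if value > limit:
            alerts.append(f"ALERT {name}: {metric}={value} > {limit}")
    return alerts

# Every component looks healthy in isolation...
fleet = {
    "api-gateway": {"cpu_percent": 40.0, "memory_percent": 55.0},
    "orders-svc":  {"cpu_percent": 62.0, "memory_percent": 70.0},
    "payments-db": {"cpu_percent": 35.0, "memory_percent": 60.0},
}

for name, metrics in fleet.items():
    print(check_component(name, metrics) or f"{name}: OK")
```

Every check here returns OK, even though the end-to-end transaction may already be degraded. That gap between "all components green" and "the user experience is failing" is exactly the blind spot the following sections describe.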
In a containerized environment, problems rarely appear in such a clean form.
A user request passes through multiple services, databases, and network components. Performance degradation does not necessarily stem from a single overloaded resource, but from a subtle deviation somewhere in the service chain that slowly propagates through the entire system.
And often everything is green. Yet the user experience degrades.
Why? Because the problem is not in any single server, but in the interactions between services.
What is missing is context: how components are connected, which call passed through which service, where response time began to degrade – and why.
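What that context looks like when it is actually recorded can be shown with a deliberately simplified, library-free sketch. Real systems use standards such as W3C Trace Context and tools such as OpenTelemetry; here the service names and latencies are invented. The idea is only this: a trace ID created at the edge is carried through every hop, and each hop records a timed span, so afterwards you can see which call went where and where the time was spent:

```python
import time
import uuid

spans = []  # in a real system these spans are exported to a tracing backend

def traced(service, trace_id, work):
    """Run one hop of the request, recording a timed span tied to the trace ID."""
    start = time.perf_counter()
    work()
    spans.append({
        "trace_id": trace_id,
        "service": service,
        "duration_ms": (time.perf_counter() - start) * 1000,
    })

def handle_request():
    trace_id = uuid.uuid4().hex                # context is created at the edge
    traced("api-gateway", trace_id, lambda: time.sleep(0.005))
    traced("orders-svc",  trace_id, lambda: time.sleep(0.010))
    traced("payments-db", trace_id, lambda: time.sleep(0.050))  # the slow hop
    return trace_id

handle_request()
slowest = max(spans, key=lambda s: s["duration_ms"])
print(f"slowest hop in trace: {slowest['service']}")
```

Because every span carries the same trace ID, the slow hop can be pinpointed per transaction rather than guessed at from per-server averages – which is precisely the context the previous paragraph says is missing.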
In modern architecture, the question is no longer whether a component is alive. It is how the entire system behaves during a given transaction – from the customer’s perspective, right now, in real time.
Traditional monitoring simply cannot answer that question.
How Does Observability Provide an Answer?
Observability is not just more data. It is a different question.
Monitoring asks: is the component alive? Observability asks:
“What happened to this specific user transaction across the entire system?”
In practice, this means it does not examine a single metric in isolation, but the relationships between metrics, logs, and traces. It interprets them – and reveals the root cause of problems. Not to generate more dashboards, but to understand system behavior.
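In practice, "relationships" often comes down to a shared correlation key. A minimal sketch, assuming trace and log records that both carry a trace ID (the field names, services, and thresholds are invented for illustration): join log lines to the transactions they belong to, then keep only the slow ones:

```python
# Hypothetical trace summaries and log records sharing a trace_id field.
traces = [
    {"trace_id": "t1", "service": "orders-svc", "duration_ms": 1800},
    {"trace_id": "t2", "service": "orders-svc", "duration_ms": 40},
]
logs = [
    {"trace_id": "t1", "level": "WARN", "msg": "db connection pool exhausted"},
    {"trace_id": "t2", "level": "INFO", "msg": "order created"},
]

def logs_for_slow_traces(traces, logs, threshold_ms=1000):
    """Correlate logs with traces, keeping only logs from slow transactions."""
    slow = {t["trace_id"] for t in traces if t["duration_ms"] > threshold_ms}
    return [entry for entry in logs if entry["trace_id"] in slow]

print(logs_for_slow_traces(traces, logs))
# prints: [{'trace_id': 't1', 'level': 'WARN', 'msg': 'db connection pool exhausted'}]
```

The warning that explains the slow transaction surfaces immediately, instead of being one line among millions in a log store with no link back to the affected user request.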
In a Kubernetes-based environment, this manifests across three levels.
At the topology level, it automatically maps which services are connected and how – even when the topology changes from minute to minute.
At the transaction level, it traces the full path of a user request and pinpoints exactly where latency or failure occurred.
At the root cause level, it does not merely signal an anomaly but identifies its true source – in a dynamically changing environment, without manual intervention. This is not statistical correlation: it reveals causal relationships and shows what triggered the symptoms you are seeing.
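The topology level, in particular, can be understood as a dependency graph derived from observed calls rather than from documentation. A minimal sketch, assuming (caller, callee) pairs have already been extracted from trace spans; the service names are invented:

```python
from collections import defaultdict

# Observed calls, e.g. extracted from trace spans: (caller, callee) pairs.
calls = [
    ("api-gateway", "orders-svc"),
    ("orders-svc", "payments-db"),
    ("orders-svc", "inventory-svc"),
    ("api-gateway", "orders-svc"),      # repeated calls are deduplicated
]

def build_topology(calls):
    """Derive the service dependency graph from observed calls."""
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
    return graph

def downstream(graph, service, seen=None):
    """Everything a service depends on, transitively: the set of components
    to inspect when that service's transactions degrade."""
    if seen is None:
        seen = set()
    for dep in graph.get(service, ()):
        if dep not in seen:
            seen.add(dep)
            downstream(graph, dep, seen)
    return seen

topology = build_topology(calls)
print(sorted(downstream(topology, "api-gateway")))
# prints: ['inventory-svc', 'orders-svc', 'payments-db']
```

Because the graph is rebuilt from live traffic, it stays accurate even as pods and routes change from minute to minute – no one has to keep the diagram up to date by hand.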
Control, then, does not come from collecting more data – but from making system behavior interpretable.
What Does This Mean for DevOps Teams in Practice?
Containerization changes not only the architecture, but the logic of incident management.
In a traditional environment, troubleshooting was linear. An alert fired, the team looked at the affected server, and localization began. Ownership was usually clear.
In a distributed, microservice-based system, a single incident can simultaneously affect multiple teams’ domains, multiple services, and multiple technology layers. Without full context, incident investigation becomes a coordination problem. Everyone looks at their own area. Every component appears to be functioning. The customer still experiences an error – and no one has a view of the whole picture.
Observability changes this situation. It enables incidents to be understood at the service chain level – not team by team, but uniformly, within a shared context.
The result is not just faster troubleshooting. The engineering capacity currently consumed by war rooms and root cause analysis is freed up – and redirected toward development and innovation. This is observability’s real business case: not fewer outages, but more productive engineering hours.
The Lesson
Containerization is not slowing down. Architectures are becoming more complex, release cycles shorter, and the number of dependencies continues to grow. Regulatory requirements – DORA, NIS2 – are increasingly demanding concrete accountability for system resilience and incident response capability.
In this environment, observability is not a monitoring upgrade. It is not a tool swap. It is a shift in mindset – one that determines whether the next steps (AI adoption, further cloud migration, new service launches) happen under control or in blind flight.
Kubernetes does not take away control. But it shows that modern systems can no longer be understood through infrastructure monitoring alone.
As architecture evolves, the way we observe it must evolve too. Not more data. But better system understanding.
Organizations that build their observability foundation now gain the advantage. Not only because they handle incidents faster. But because they understand what they see – and that understanding is what real governance is built on.
To see this in practice, you can read about how Air France–KLM executed its cloud migration with observability.
Telvice Ltd. implements and customizes observability solutions for enterprise clients – on Dynatrace and Datadog platforms. If you would like to assess where your systems stand today, contact us and request a free consultation.