
Illustration by evv from Shutterstock
Istio Ambient Mesh was introduced in September 2022, but I didn’t give it sufficient attention at the time. For users who already run Istio as a service mesh, or plan to, it aims to address some of the shortcomings of the traditional sidecar model, which we will discuss in subsequent sections of this blog.
Note: Istio Ambient Mesh is still in alpha and should not be used in production environments until it is upgraded to General Availability (GA).
Before we delve further, we must cover some foundational concepts. Without them, the rest of this article may as well be written in hieroglyphs.
What is a Service Mesh?
Many contemporary applications are built on a framework of distributed microservices, where individual microservices perform a dedicated function and often communicate with each other. For lack of a better analogy, you can think of it as modular Lego bricks that are put together to build a statue, as opposed to a monolithic statue carved from stone.
A service mesh is a layer added on top of these applications (or microservices) that enables features like traffic management, observability, and security without you having to make any modifications to the applications themselves.
Istio Service Mesh Features
Istio is an open-source service mesh that layers transparently onto existing distributed applications. The default communication mode for services within a cluster is clear text, which isn’t ideal for security. Istio service mesh secures this traffic by encrypting these communications with mTLS (mutual TLS). It also offers additional features including but not limited to:
- HTTP, gRPC, WebSocket, and TCP Load Balancing
- Granular level of traffic control
- Access controls, rate limits and quotas
- Service discovery
- Eagle eye observability (metrics & telemetry data, logs, and traces for all traffic within a cluster)
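To make the "mutual" in mTLS concrete, here is a minimal sketch using Python's standard `ssl` module. It is not how Istio implements mTLS (the mesh handles this in its proxies, so your application code never sees it); it only shows the one setting that separates ordinary TLS from mutual TLS on the server side. The helper name `server_context` is my own, and the certificate paths in the comments are placeholders.

```python
import ssl


def server_context(require_client_cert: bool) -> ssl.SSLContext:
    """Build a server-side TLS context (hypothetical helper for illustration)."""
    # Purpose.CLIENT_AUTH yields a context meant for server-side sockets.
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    if require_client_cert:
        # The "mutual" part: the server also demands and verifies a
        # certificate from the client during the handshake.
        ctx.verify_mode = ssl.CERT_REQUIRED
    # A real server would additionally load its own identity and the CA
    # it trusts, for example (placeholder paths):
    #   ctx.load_cert_chain("server.crt", "server.key")
    #   ctx.load_verify_locations("ca.crt")
    return ctx


plain_tls = server_context(require_client_cert=False)
mutual_tls = server_context(require_client_cert=True)
```

With plain TLS only the server proves its identity; with the mutual variant both ends do, which is what lets a mesh reason about which workload is talking to which.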
Now that we have an understanding of what a service mesh is and the benefits that it offers, let’s compare the traditional Istio sidecar model with the new Ambient Mesh model.
Istio Architecture without Ambient Mesh
Istio has two fundamental components: a Control Plane and a Data Plane.
The Data Plane represents the communications between services in the mesh. The service mesh uses an Envoy proxy deployed alongside each service in the mesh (as a sidecar), and all the inbound and outbound traffic within the mesh goes through these Envoy proxies.
The Control Plane gathers data from these proxies and manages their configuration by reconciling the current state of the environment with the desired state.
src: https://istio.io/latest/docs/ops/deployment/architecture/
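The "reconcile current state with desired state" idea described above can be sketched in a few lines of Python. This is a toy model and not Istio's actual data model or API: the proxy names, the config version strings, and the `reconcile` and `apply_updates` helpers are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical model: proxy name -> config version it should run / is running.
ProxyConfig = Dict[str, str]


def reconcile(desired: ProxyConfig, current: ProxyConfig) -> List[Tuple[str, str]]:
    """Return the (proxy, config) updates needed to reach the desired state."""
    updates = []
    for proxy, config in sorted(desired.items()):
        # Push config only to proxies that are missing it or are out of date.
        if current.get(proxy) != config:
            updates.append((proxy, config))
    return updates


def apply_updates(current: ProxyConfig, updates: List[Tuple[str, str]]) -> None:
    """Simulate the control plane pushing config to the proxies."""
    for proxy, config in updates:
        current[proxy] = config


desired_state = {"cart-sidecar": "v2", "pay-sidecar": "v1"}
current_state = {"cart-sidecar": "v1", "pay-sidecar": "v1"}

pending = reconcile(desired_state, current_state)
apply_updates(current_state, pending)
remaining = reconcile(desired_state, current_state)
```

After one pass only the out-of-date proxy is touched, and a second pass finds nothing left to do, which is the property a reconciliation loop is built around.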
Drawbacks of this model:
- Resilience: Making changes such as upgrading the proxies via the control plane requires restarting each sidecar container which can be disruptive.
- Resource Hungry: Resources for the sidecars need to be reserved for worst-case usage which is inefficient and tends to make the billing admin squirm.
- Traffic breaking: The traffic capture and HTTP processing done by sidecars can break some applications, particularly those with non-conformant HTTP implementations.
The sidecar performs both Layer 4 (L4) and Layer 7 (L7) traffic processing. One prominent issue here is that L7 processing is very compute-intensive, and it is essentially forced upon services even when they only require simple transport security. Additionally, most of the critical Common Vulnerabilities and Exposures (CVEs) in the Envoy proxy occur at L7, so services that don’t need L7 filtering still carry a larger attack surface.
Istio Architecture with Ambient Mesh
Istio Ambient Mesh introduces some radical changes to the Data Plane architecture. This model splits the L4 and L7 functionalities that used to be an all-or-nothing proposition with sidecars.
src: https://istio.io/v1.15/blog/2022/introducing-ambient-mesh/#slicing-the-layers
Instead of sidecars, we now have a secure overlay created by ztunnels (zero-trust tunnels). These ztunnels act as shared agents and are deployed as a DaemonSet, meaning that there is one agent per node in the Kubernetes cluster.
The ztunnel uses an eBPF program compiled into the istio-cni component to route traffic. This has several performance and flexibility benefits compared to using iptables for routing.
Each ztunnel is responsible for securing L4 traffic for the workloads within its respective node.
For L7 features, ambient mesh allows you to deploy Envoy-based waypoint proxies that are applied at a namespace level, allowing the workloads within that namespace to utilize the full breadth of Istio features.
src: https://istio.io/v1.15/blog/2022/introducing-ambient-mesh/#slicing-the-layers
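The split between the L4 overlay and the optional L7 layer can be sketched as a small routing function. This is a simplified model of my own, not Istio code: it assumes the source and destination pods sit on different nodes, it only models a waypoint on the destination side, and the `traffic_path` function and all pod, node, and namespace names are hypothetical.

```python
from typing import List, Set


def traffic_path(
    src_pod: str,
    src_node: str,
    dst_pod: str,
    dst_node: str,
    dst_namespace: str,
    waypoint_namespaces: Set[str],
) -> List[str]:
    """Return the hops a request takes in this simplified ambient model."""
    # Every request first enters the ztunnel on the source pod's node (L4).
    hops = [src_pod, f"ztunnel@{src_node}"]
    # Only namespaces that opted into L7 features get a waypoint hop.
    if dst_namespace in waypoint_namespaces:
        hops.append(f"waypoint@{dst_namespace}")
    # The destination node's ztunnel delivers the traffic to the pod.
    hops += [f"ztunnel@{dst_node}", dst_pod]
    return hops


l4_only = traffic_path("web", "node-a", "cache", "node-b", "infra", {"shop"})
with_l7 = traffic_path("web", "node-a", "cart", "node-b", "shop", {"shop"})
```

The L4-only path never touches an Envoy proxy, while the L7 path picks up exactly one extra hop through the waypoint for the namespace that asked for it.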
These waypoint proxies scale based on the actual demand of the workloads within the namespace they serve. This is substantially more efficient and economical than reserving resources for sidecars based on the worst-case usage of each workload.
This architecture inherently allows for a more ergonomic and cost-efficient usage of Istio service mesh: the L7 proxies are only applied where needed, they utilize resources more efficiently, and they can scale more dynamically and independently.
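A rough way to see where the savings come from is to count proxies in each model. The sketch below uses made-up cluster numbers purely for illustration, and the function names are my own. It counts proxies only, not CPU or memory, and it assumes one waypoint per L7-enabled namespace, whereas real waypoints scale with demand.

```python
from typing import Dict, Set


def sidecar_proxy_count(pods_per_namespace: Dict[str, int]) -> int:
    """Sidecar model: one Envoy proxy next to every workload pod."""
    return sum(pods_per_namespace.values())


def ambient_proxy_count(
    nodes: int,
    l7_namespaces: Set[str],
    waypoints_per_namespace: int = 1,  # assumption: no autoscaling modelled
) -> int:
    """Ambient model: one ztunnel per node, plus waypoints only where L7 is used."""
    return nodes + len(l7_namespaces) * waypoints_per_namespace


# Hypothetical cluster: 100 pods spread over 5 nodes, L7 needed in one namespace.
pods = {"shop": 40, "billing": 25, "batch": 35}
sidecars = sidecar_proxy_count(pods)
ambient = ambient_proxy_count(nodes=5, l7_namespaces={"shop"})
print(sidecars, ambient)  # 100 6
```

The exact numbers mean nothing outside this toy cluster; the point is that the sidecar count grows with the number of pods, while the ambient count grows with nodes and with the namespaces that actually need L7.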
It also allows for interoperability with the traditional sidecar model granting you the flexibility of choice.
There are two concerns with this new data plane model which I won’t be directly addressing here:
- Performance (due to the extra hops involved): Istio claims that without the redundant two-way L7 filtering of the sidecar model, the expected performance degradation of the ambient mesh due to the extra hop will be more than compensated for. They will also publish a dedicated performance blog post on this, presumably in collaboration with Solo.io & Google, with whom they have worked on this advancement.
- Security (due to the shared agent model): For those concerned about the security implications of a sidecar-less Data Plane, I recommend perusing the Ambient Mesh Security Deep Dive blog.
I’m very excited to see the Ambient Mesh in a production environment, as it promises to introduce substantial cost savings and efficiency for those that leverage it.
However, at the end of the day, it feels like the elephant in the room is still very much present and ignored — the Envoy proxy.
I don’t think sidecars are going away anytime soon, but maybe one day we’ll have a lightweight, secure, yet powerful L7 processing software that will be a panacea for the ills of every service mesh user. Until then, I’m grateful for every step in that direction and you should be too.