Interact: ServiceMesh and MultiCloud Demystified – Part 2

The previous blog post on ServiceMesh and ServiceDiscovery covered the basics of this topic.
The essential technical challenges are listed again here:
What is a ServiceMesh?
This section explains the fundamental components of a Service Mesh and how they work.
Control Plane
The central Control Plane of a Service Mesh has the following tasks:
Data Plane
The distributed data plane is responsible for the connection and routing between the services.

Functionality
Note:
There are also ServiceMesh implementations that do not require a sidecar proxy. In some environments, this can be a valid and sensible approach. In our projects, however, we usually find the sidecar approach the most suitable given the requirements for multi-platform support, security, governance and compliance (e.g. a certificate from our own CA for each service).
Service Discovery
“Service Discovery” describes how services find each other. A simple example with two services, “web” and “db”:
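To make the idea concrete, here is a minimal sketch of a service catalog in Python. It is a toy in-memory model, not how any particular mesh implements discovery: the `Registry` and `Instance` classes, the addresses and the health flag are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Instance:
    """One running instance of a service (hypothetical model)."""
    address: str
    port: int
    healthy: bool = True


@dataclass
class Registry:
    """Toy in-memory catalog: service name -> registered instances."""
    _catalog: dict[str, list[Instance]] = field(default_factory=dict)

    def register(self, name: str, instance: Instance) -> None:
        self._catalog.setdefault(name, []).append(instance)

    def resolve(self, name: str) -> list[Instance]:
        """Return only the healthy instances of the named service."""
        return [i for i in self._catalog.get(name, []) if i.healthy]


registry = Registry()
registry.register("db", Instance("10.0.0.5", 5432))
registry.register("db", Instance("10.0.0.6", 5432, healthy=False))

# "web" asks for "db" by name instead of hard-coding an IP address.
targets = registry.resolve("db")
print(targets)
```

The point of the sketch is that “web” only knows the name “db”. Where the instances actually run, and which of them are currently healthy, is the catalog's concern.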

Security
How Was It Before (and Often Still Is Today)?
Firewalls regulate which communication (traffic pattern) is permitted based on IP and ports:
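As a rough sketch of what such a rule check boils down to, the following Python snippet matches a connection against a list of IP and port rules. The `Rule` class, the `is_allowed` helper and all addresses are hypothetical and only serve the illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source: str       # CIDR block of the allowed clients
    destination: str  # CIDR block of the target
    port: int


def is_allowed(rules, src_ip, dst_ip, port):
    """Default deny: permit only if some rule matches source, destination and port."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    return any(
        src in ipaddress.ip_network(r.source)
        and dst in ipaddress.ip_network(r.destination)
        and port == r.port
        for r in rules
    )


rules = [Rule("10.0.1.0/24", "10.0.2.10/32", 5432)]
print(is_allowed(rules, "10.0.1.17", "10.0.2.10", 5432))  # True
print(is_allowed(rules, "10.0.3.17", "10.0.2.10", 5432))  # False
```

The second call shows the weak spot: as soon as the client is rescheduled into another subnet, the rule no longer matches, although it is still the same service.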

What Could It Look Like for a Service Mesh?
In dynamic multi-platform or multi-cloud environments, maintaining firewall rules (ACLs) becomes an almost impossible task, quite apart from the challenge of “East-West” vs. “North-South” traffic:

Better: Service Based Security – Intentions
A service graph regulates which service is allowed to communicate with which other service, regardless of where these services are operated. The identity of each service is verified (for this purpose, a highly automated TLS CA, based on SPIFFE, is integrated into the mesh).

Best-Practice Tip: HashiCorp Vault can serve as your own, highly automated CA for the ServiceMesh.
These permissions are referred to as “intentions”. They can be parameterized down to the semantics of OSI Layer 7 (e.g. an individual API call).
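A simplified model of such name-based permissions could look like the following Python sketch. It is not Consul's actual data model or precedence logic: the `Intention` class, the "first match wins" rule, the `*` wildcard handling and the `path_prefix` field are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Intention:
    source: str                        # service name, "*" matches any service
    destination: str
    allow: bool
    path_prefix: Optional[str] = None  # optional Layer 7 condition


def authorize(intentions, source, destination, path="/"):
    """First matching intention wins; default deny if nothing matches."""
    for i in intentions:
        if i.source not in (source, "*") or i.destination != destination:
            continue
        if i.path_prefix is not None and not path.startswith(i.path_prefix):
            continue
        return i.allow
    return False


intentions = [
    Intention("web", "db", allow=True),
    Intention("web", "billing", allow=True, path_prefix="/invoices"),
    Intention("*", "billing", allow=False),
]

print(authorize(intentions, "web", "billing", "/invoices/42"))  # True
print(authorize(intentions, "web", "billing", "/admin"))        # False
```

Note that no IP address appears anywhere in the rules. The decision depends only on the verified service names and, optionally, on the request itself.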
Service Based Security
1. The local agent also returns the URI for the expected identity of the service to which it is connected.
2. The proxies between web and database start the TLS handshake to authenticate the identity.

1. The DB proxy sends the authorization request to its local agent.
2. The local agent authorizes the connection based on the locally cached intention.
3. mTLS is established.
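The steps above can be sketched in Python as follows. This is only a model of the decision logic, without any real TLS: the helper names, the cache layout and the URI layout used in the example (`spiffe://<trust-domain>/ns/.../dc/.../svc/<service>`) are assumptions for illustration.

```python
from urllib.parse import urlparse


def parse_spiffe_id(uri):
    """Split a SPIFFE ID into (trust_domain, path); raise ValueError if malformed."""
    parsed = urlparse(uri)
    if parsed.scheme != "spiffe" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not a valid SPIFFE ID: {uri!r}")
    return parsed.netloc, parsed.path


def service_name(uri):
    """Assumption: the last path segment of the ID is the service name."""
    _, path = parse_spiffe_id(uri)
    return path.rstrip("/").rsplit("/", 1)[-1]


def verify_upstream(presented_uri, expected_uri):
    """Client side: the server must present exactly the identity the agent announced."""
    return parse_spiffe_id(presented_uri) == parse_spiffe_id(expected_uri)


def authorize_inbound(client_uri, destination, cached_intentions):
    """Server side: look up the locally cached intention; default deny."""
    return cached_intentions.get((service_name(client_uri), destination), False)


def connect(client_uri, server_uri, expected_server_uri, destination, cached_intentions):
    if not verify_upstream(server_uri, expected_server_uri):
        return "rejected: unexpected server identity"
    if not authorize_inbound(client_uri, destination, cached_intentions):
        return "rejected: no matching intention"
    return "mTLS established"


WEB = "spiffe://example.consul/ns/default/dc/dc1/svc/web"
DB = "spiffe://example.consul/ns/default/dc/dc1/svc/db"
cache = {("web", "db"): True}

print(connect(WEB, DB, DB, "db", cache))   # mTLS established
print(connect(WEB, WEB, DB, "db", cache))  # rejected: unexpected server identity
```

Because the intention is cached locally, the authorization step does not need a round trip to the control plane for every new connection.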

Routing
Multi Platform
The central challenges are

Multi Cloud
As mentioned at the beginning, multi-cloud networking is even harder…
Direct Connect, Express Route, BGP and then a VPN on top of that?
Really? Rather not!
We will cover this topic in one of the following blog posts, as its complexity would go beyond the scope of this article.
Resilience
A chain is only as strong as its weakest link
This well-known quote also applies to software systems, of course. In the worst case, a single subsystem can bring down the entire system if appropriate measures have not been taken in advance.
Before service meshes became popular, developers could take such measures with various frameworks. One of the best-known examples was Hystrix, which is no longer actively developed. Depending on the framework, different resilience patterns such as retries, fallbacks or circuit breakers can be applied in the subsystem (through configuration or programmatically).
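To illustrate one of these patterns, here is a minimal circuit breaker sketch in Python. It is not taken from Hystrix or any other framework. The class name, the thresholds and the injectable `clock` parameter are assumptions made for the example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to wait before a trial call
        self._clock = clock
        self._failures = 0
        self._opened_at = None          # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit is open; call skipped")
            # Half-open: the wait time has passed, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures in a row, opens the circuit.
            if self._opened_at is not None or self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure counter.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller wraps each remote call, for example `breaker.call(fetch_orders, customer_id)` with a hypothetical `fetch_orders` function. While the circuit is open, the failing subsystem is not contacted at all, which gives it time to recover.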
However, relying exclusively on a framework to increase the resilience of the overall system can be deceptive. If even one subsystem is not configured accordingly, this can render the measures taken in the other subsystems ineffective.
This is where a service mesh comes into play. For this reason, among others, many resilience patterns are better placed in the infrastructure. Measures configured there are transparent to the subsystems and can be parameterized independently of them. A further advantage is that the resilience patterns can be adjusted more flexibly and independently of application releases.
Have the frameworks become obsolete through the use of a service mesh? The answer is clearly no. Some patterns cannot be provided in a meaningful way at the infrastructure level. The classic example is the fallback: which fallback is appropriate is usually a business decision, and it therefore belongs in the application code.
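A small Python sketch shows why a fallback is hard to move into the infrastructure. The function name, its parameters and the bestseller example are hypothetical. What matters is that the choice of fallback value depends on what the application is for.

```python
def recommendations_for(customer_id, fetch_personalized, bestsellers):
    """Return personalized recommendations, or fall back to a static bestseller list."""
    try:
        return fetch_personalized(customer_id)
    except (ConnectionError, TimeoutError):
        # Decision made by the application: a generic list is better than an empty page.
        return list(bestsellers)


def unavailable(_customer_id):
    raise ConnectionError("recommendation service unreachable")


print(recommendations_for(42, unavailable, ["book-a", "book-b"]))
```

A proxy in the mesh can retry or cut off the failing call, but it cannot know that a bestseller list is an acceptable substitute here.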
The recommendation is therefore to provide all resilience patterns that are better placed at the infrastructure level there. For patterns that are better implemented directly in the subsystem, care should be taken to use standardized frameworks, such as MicroProfile Fault Tolerance in the Java world.
Finally, we want to give a summary and draw a conclusion:
Why Service Mesh – and why Consul?
Why are we moving towards cloud-native applications?
We want to be able to develop and deploy functions faster in order to react more quickly to the needs of our customers.
We want to automate everything to reduce risks and increase resilience.
And finally, we want to be able to run the application anywhere without having to change it – no lock-in and a real “RUN ANYWHERE”.
HashiCorp Consul covers exactly these points (and many others) very well, with a high level of security, governance and compliance, and of course with full observability and telemetry.
And this is exactly where a ServiceMesh contributes to the company goals.




