September 4 | Uncategorized

Interact: ServiceMesh and MultiCloud Demystified – Part 2


The previous blog post on service mesh and service discovery covered the basics of the topic.

The main technical challenges are summarized again here:

  • Service Discovery – How do the services find each other?

  • Security – Which service is allowed to talk to which (via secure mTLS)?

  • Routing – How is routing done to the individual services?

  • Resilience – How can the resilience of a software system be improved at the infrastructure level (e.g. with circuit breakers)?

What is a ServiceMesh?

This section explains the fundamental components of a service mesh and how they work.

Control Plane

The central Control Plane of a Service Mesh has the following tasks:

  • Service Discovery and Service Catalog

  • Authorization and configuration of the proxies of the Data Plane.

 

Data Plane

The distributed data plane is responsible for the connection and routing between the services.

[Figure: Data Plane]

Functionality

  • A proxy is deployed next to each service instance as a sidecar and controls its incoming traffic

  • The client agent instantiates the proxy and registers it as a service

  • The proxy is configured with a port used for the service, as well as ports for all upstream destinations to which the service wants to connect.
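
The port layout described in the last point can be pictured with a small model. This is only a sketch: the class `SidecarProxy`, its fields and the port numbers are hypothetical and do not correspond to any real Consul or Envoy API. It assumes each upstream is exposed to the application on a local port of its own.

```python
from dataclasses import dataclass, field


@dataclass
class SidecarProxy:
    """Hypothetical model of one sidecar proxy, not a real mesh API."""
    service_name: str
    service_port: int  # port the local service instance listens on
    upstreams: dict[str, int] = field(default_factory=dict)  # upstream name -> local port

    def local_address_for(self, upstream: str) -> str:
        """Address the application dials to reach an upstream through its proxy."""
        if upstream not in self.upstreams:
            raise KeyError(f"{self.service_name} has no upstream named {upstream!r}")
        return f"127.0.0.1:{self.upstreams[upstream]}"


# Assumed example values: "web" listens on 8080 and reaches "db" via local port 5432.
web_proxy = SidecarProxy("web", service_port=8080, upstreams={"db": 5432})
print(web_proxy.local_address_for("db"))  # 127.0.0.1:5432
```

In this model the application only ever talks to localhost, and the proxy takes care of reaching the actual upstream.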

 

Note:
There are also service mesh implementations that do not require a sidecar proxy. In some environments, this can be a valid and sensible approach. In our projects, however, we usually find the sidecar approach the most suitable given the requirements for multi-platform operation, security & governance and compliance (e.g. a certificate from our own CA for each service).

Service Discovery

Service discovery, i.e. how services find each other, is explained here with a simple example of two services, “web” and “db”:

  • The proxy of the “web” service asks its local agent for the location of the “db” service by name

  • The local agent returns the IP address/port of a healthy DB instance to the proxy

[Figure: Service Discovery]
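
The two steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the catalog contents, the `Instance` type and the function `resolve` are hypothetical, and a real agent keeps its view current through health checks rather than a static dictionary.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    address: str
    port: int
    healthy: bool


# Hypothetical catalog as a local agent might cache it: service name -> instances.
CATALOG = {
    "db": [
        Instance("10.0.1.11", 5432, healthy=False),
        Instance("10.0.1.12", 5432, healthy=True),
    ],
}


def resolve(service: str, catalog=CATALOG) -> tuple[str, int]:
    """Return address and port of one healthy instance of the named service."""
    healthy = [i for i in catalog.get(service, []) if i.healthy]
    if not healthy:
        raise LookupError(f"no healthy instance of {service!r}")
    chosen = random.choice(healthy)  # pick any healthy instance
    return chosen.address, chosen.port


print(resolve("db"))  # ('10.0.1.12', 5432), the only healthy instance
```

The caller only knows the name “db”. Which instance answers, and where it runs, stays a concern of the mesh.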

Security

How Was It Before (and Often Still Is Today)?

Firewalls regulate which communication (traffic pattern) is permitted based on IP and ports:

[Figure: Security, the traditional approach]

 

What Could It Look Like for a Service Mesh?

In dynamic multi-platform or multi-cloud environments, maintaining firewall rules (ACLs) becomes an almost impossible task, quite apart from the challenge of “east-west” vs. “north-south” traffic:

[Figure: Security, how it could look]

 

Better: Service Based Security – Intentions

A service graph is used to regulate which service is allowed to communicate with which other service, regardless of where these services are operated. The identity of each service is verified (for this purpose, a highly automated TLS CA is integrated into the mesh, based on SPIFFE).

[Figure: Service Based Security, Intentions]

Best-practice tip: HashiCorp Vault can serve as your own, highly automated CA for the service mesh.

These permissions are referred to as “intentions”. They can be parameterized down to the semantics of OSI Layer 7 (e.g. an individual API call).
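
To make the idea concrete, here is a minimal sketch of how such permissions could be evaluated. The `Intention` type, the rule list and the first-match-wins order are assumptions made for illustration. Consul applies its own precedence rules, which this sketch does not reproduce. The optional `path_prefix` stands in for a Layer 7 condition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Intention:
    source: str                        # calling service, "*" matches any
    destination: str                   # called service
    allow: bool
    path_prefix: Optional[str] = None  # optional Layer 7 condition


def is_allowed(intentions, source, destination, path="/"):
    """First matching intention wins; without a match the call is denied."""
    for rule in intentions:
        if rule.destination != destination:
            continue
        if rule.source not in (source, "*"):
            continue
        if rule.path_prefix is not None and not path.startswith(rule.path_prefix):
            continue
        return rule.allow
    return False  # default deny is an assumption of this sketch


# Hypothetical rules for illustration only.
INTENTIONS = [
    Intention("web", "api", allow=True, path_prefix="/orders"),
    Intention("web", "db", allow=True),
    Intention("*", "db", allow=False),
]

print(is_allowed(INTENTIONS, "web", "db"))             # True
print(is_allowed(INTENTIONS, "batch", "db"))           # False
print(is_allowed(INTENTIONS, "web", "api", "/admin"))  # False
```

Note that the rules refer to service names only. No IP address or port appears anywhere, which is the key difference from firewall ACLs.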

Service Based Security

  1. The local agent also returns the URI of the expected identity of the service being connected to.

  2. The proxies of “web” and “db” start a TLS handshake to authenticate the identities.

[Figure: Service Based Security, part 1 of 2]

  1. The DB proxy sends the authorization request to its local agent.

  2. The local agent authorizes the connection based on the locally cached intention.

  3. mTLS is established.

[Figure: Service Based Security, part 2 of 2]
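
The authorization step on the DB side can be sketched as follows. This is an illustration under assumptions: the URI layout `spiffe://<trust-domain>/ns/<namespace>/dc/<datacenter>/svc/<service>`, the trust domain `example.consul` and the cached intention table are chosen for the example, and the TLS handshake itself is not shown. The sketch only covers what happens once the peer's identity URI has been read from its certificate.

```python
from urllib.parse import urlsplit


def service_from_spiffe_id(uri: str) -> str:
    """Extract the service name from an ID shaped like spiffe://<domain>/.../svc/<name>."""
    parts = urlsplit(uri)
    if parts.scheme != "spiffe" or not parts.netloc:
        raise ValueError(f"not a SPIFFE ID: {uri!r}")
    segments = parts.path.strip("/").split("/")
    if len(segments) < 2 or segments[-2] != "svc":
        raise ValueError(f"no service segment in {uri!r}")
    return segments[-1]


# Hypothetical intentions cached by the local agent: (source, destination) -> allowed.
CACHED_INTENTIONS = {("web", "db"): True}


def authorize(peer_uri: str, destination: str) -> bool:
    """Decide whether the presenting peer may connect; unknown pairs are denied."""
    source = service_from_spiffe_id(peer_uri)
    return CACHED_INTENTIONS.get((source, destination), False)


web_id = "spiffe://example.consul/ns/default/dc/dc1/svc/web"
batch_id = "spiffe://example.consul/ns/default/dc/dc1/svc/batch"
print(authorize(web_id, "db"))    # True
print(authorize(batch_id, "db"))  # False
```

Because the decision uses a locally cached table, it does not require a round trip to the control plane for every connection.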

Routing

Multi Platform

The central challenges are:

  • Overlapping IP Ranges

  • Mix / Integration of Kubernetes with VM or Bare-Metal Workloads

  • Security (not all ports have to be opened, etc.)

[Figure: Routing]
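
The first challenge, overlapping IP ranges, is easy to demonstrate with Python's standard `ipaddress` module. The environment names and CIDR ranges below are made up for the example.

```python
import ipaddress
from itertools import combinations

# Hypothetical address plans of three environments.
NETWORKS = {
    "k8s-cluster-a": "10.0.0.0/16",
    "k8s-cluster-b": "10.0.128.0/17",
    "vm-datacenter": "192.168.10.0/24",
}


def overlapping_pairs(networks):
    """Return the sorted name pairs whose CIDR ranges overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    return sorted(
        (a, b)
        for a, b in combinations(sorted(parsed), 2)
        if parsed[a].overlaps(parsed[b])
    )


print(overlapping_pairs(NETWORKS))  # [('k8s-cluster-a', 'k8s-cluster-b')]
```

Two environments with overlapping ranges cannot simply be routed to each other at the IP level, which is one reason why routing by service identity is attractive here.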

Multi Cloud

As mentioned at the beginning, multi cloud networking is even harder…

Direct Connect, Express Route, BGP and then a VPN on top of that?

Really? Rather not!

We will cover this topic in one of the following blog posts, as its complexity would go beyond the scope of this article.

Resilience

A chain is only as strong as its weakest link

This well-known saying also applies to software systems, of course. In the worst case, a single failing subsystem can bring down the entire system if appropriate measures have not been taken in advance.

Before service meshes became popular, developers could take such measures with various frameworks. One of the best-known examples was Hystrix, which is no longer actively developed. Depending on the framework, different resilience patterns such as retries, fallbacks or circuit breakers can be applied in the subsystem (via configuration or programmatically).
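
As an illustration of one of these patterns, here is a minimal circuit breaker sketch. It is not the Hystrix implementation or any mesh's implementation. The class name, the thresholds and the simplified half-open handling are assumptions made for this example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds before one trial call is let through
        self.clock = clock
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, call rejected")
            # Half-open: the wait is over, so one trial call is allowed below.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None
        return result


def flaky():
    # Hypothetical downstream call that always fails.
    raise ConnectionError("db unreachable")


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
for _ in range(3):
    try:
        breaker.call(flaky)
    except CircuitOpenError:
        print("rejected without calling the service")
    except ConnectionError:
        print("call failed")
```

After two failures the third call is rejected immediately, which protects both the caller and the struggling downstream service.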

However, relying exclusively on a framework to increase the resilience of the overall system can be deceptive. If even one subsystem is not configured accordingly, this can render the measures taken in the other subsystems ineffective.

This is where a service mesh comes into play. For the reason mentioned above, among others, many resilience patterns are better placed in the infrastructure. Measures configured there are transparent to the subsystems and can be parameterized independently of them. A further advantage is that the resilience patterns can be adjusted more flexibly and independently of application releases.

Have the frameworks become obsolete through the use of a service mesh? The answer is clearly no. Some patterns cannot be provided in a meaningful way at the infrastructure level. The classic example is the fallback, since the choice of a specific fallback is usually a domain-specific (business) decision that belongs in the application code.
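
A small sketch shows why a fallback belongs in the application. All names here (`with_fallback`, `fetch_recommendations`, `bestsellers`) are hypothetical. The point is that only the application knows what a sensible substitute result looks like.

```python
def with_fallback(primary, fallback):
    """Wrap primary so that fallback supplies the result when primary raises."""
    def wrapper(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except Exception:
            return fallback(*args, **kwargs)
    return wrapper


def fetch_recommendations(user_id):
    # Hypothetical remote call; here it always fails to show the fallback path.
    raise TimeoutError("recommendation service timed out")


def bestsellers(user_id):
    # Domain decision: show generic bestsellers instead of personal picks.
    return ["bestseller-1", "bestseller-2"]


recommend = with_fallback(fetch_recommendations, bestsellers)
print(recommend(42))  # ['bestseller-1', 'bestseller-2']
```

A proxy could retry or reject the failing call, but it could not know that a bestseller list is an acceptable replacement for personal recommendations.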

The recommendation is therefore to provide at the infrastructure level all resilience patterns that are better placed there. For patterns that are better implemented directly in the subsystem, care should be taken to use standardized frameworks, such as MicroProfile Fault Tolerance in the Java world.

Finally, we want to give a summary and draw a conclusion:

Why Service Mesh – and why Consul?

Why are we moving towards cloud-native applications?

We want to be able to develop and deploy functions faster in order to react more quickly to the needs of our customers.

We want to automate everything to reduce risks and increase resilience.

And finally, we want to be able to run the application anywhere without having to change it – no lock-in and a real “RUN ANYWHERE”.

  • Quick creation and deployment of applications

  • Automating processes to reduce the risk of errors

  • Architecture designed for fault tolerance

  • Avoiding lock-in to a platform operator

  • Selection of the best and most economical services

  • Failover between platforms or cloud providers

HashiCorp Consul covers exactly these points (and many others) very well, with a high level of security, governance and compliance and, of course, with full observability and telemetry.

And this is exactly where a ServiceMesh contributes to the company goals.

Why HashiCorp Consul?

  • The most mature service mesh (Connect) from our point of view

  • Single Go binary, runs on bare metal, in VMs, in containers and in the cloud (e.g. AWS Lambda)

  • SPIFFE/SPIRE

  • Intent-based networking & security

  • Optimal Tracing

  • Features such as:

    • Canary Deployments
    • Splitter / Circuit Breaker
    • API Gateway
    • Rate Limiting
    • Blue / Green Deployments
  • Ideally integrated into the HashiCorp ecosystem (e.g. Vault and Terraform)

  • Ideally suited for a modern, proven “Platform Ops” approach

  • Enterprise support from the vendor, HashiCorp

  • Design & Implementation at the highest level by FullStackS
