Unlike other open-source service mesh projects, which define many CRD concepts, the Cilium Service Mesh shown in the official Cilium overview diagram focuses on implementing the mesh data plane. Its open, inclusive design lets it plug into other control planes; the current release already supports Envoy CRDs, Kubernetes Ingress, Istio, SPIFFE, and the Gateway API.
Introduction
Key differences between Cilium Service Mesh and other service meshes:
- In today's service mesh landscape the per-pod proxy (i.e. sidecar) model dominates. Cilium instead takes the per-node proxy route, sidestepping the sidecar drawbacks of high resource overhead and added network latency.
- Almost every other service mesh project hijacks traffic through the Linux kernel network stack, whereas Cilium Service Mesh is built on an eBPF datapath and is accelerated by design.
- Cilium Service Mesh builds on the Cilium CNI foundation and inherits its rich feature set: network policy, multi-cluster Cluster Mesh, eBPF acceleration, and Hubble observability.
Cilium implements two new CRDs, CiliumEnvoyConfig and CiliumClusterwideEnvoyConfig. They are written and behave almost identically; the only difference is that CiliumEnvoyConfig is namespace-scoped while CiliumClusterwideEnvoyConfig is cluster-scoped, as the skeleton below illustrates.
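A minimal skeleton (a sketch only; the field values are placeholders, not taken from the full examples later in this article):

apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig            # namespace-scoped; use kind: CiliumClusterwideEnvoyConfig for cluster scope
metadata:
  name: example
  namespace: default               # only the namespaced variant carries a namespace
spec:
  services: []                     # Kubernetes services whose traffic is redirected to the Envoy listener
  resources: []                    # raw Envoy v3 resources: listeners, route configurations, clusters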
This article uses Cilium Mesh examples to show how to configure CiliumEnvoyConfig and CiliumClusterwideEnvoyConfig.
Setting up a Kubernetes cluster
Create a Kubernetes cluster with kind:
kind create cluster --image=kindest/node:v1.22.17 --name tanjunchen
Creating cluster "tanjunchen" ...
✓ Ensuring node image (kindest/node:v1.22.17) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-tanjunchen"
You can now use your cluster with:
kubectl cluster-info --context kind-tanjunchen
Thanks for using kind! 😊
root@instance-00qqerhq:~/cilium-mesh# kubectl get nodes
NAME STATUS ROLES AGE VERSION
tanjunchen-control-plane Ready control-plane,master 61s v1.22.17
root@instance-00qqerhq:~/cilium-mesh# kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.27.2
Kustomize Version: v5.0.1
Server Version: v1.22.17
WARNING: version difference between client (1.27) and server (1.22) exceeds the supported minor version skew of +/-1
Deploying the test applications
Deploy the test applications wrk and nginx:
root@instance-00qqerhq:~/cilium-mesh# kubectl apply -f wrk-nginx.yaml
deployment.apps/wrk unchanged
service/nginx unchanged
deployment.apps/nginx configured
root@instance-00qqerhq:~/cilium-mesh# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx-699bb76bb4-p26v7 1/1 Running 0 2m16s
wrk-64884c57d7-vpml8 1/1 Running 0 2m16s
Cilium Mesh
Prerequisites
- Cilium must run with kubeProxyReplacement set to partial or strict (see the check after this list).
- The minimum supported Kubernetes version is 1.19.
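You can verify the mode with the following check (a sketch, assuming the agent runs as the usual kube-system/cilium DaemonSet):
kubectl -n kube-system exec ds/cilium -- cilium status | grep KubeProxyReplacement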
Installation
Install Cilium Mesh:
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --version 1.14.0 \
  --namespace kube-system \
  --set kubeProxyReplacement=strict \
  --set-string extraConfig.enable-envoy-config=true \
  --set envoy.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true
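After installation, confirm the agent rollout succeeded (a quick sanity check):
kubectl -n kube-system rollout status ds/cilium
kubectl -n kube-system get pods -l k8s-app=cilium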
Envoy configuration
Only the Envoy v3 API is currently supported. Cilium deploys Envoy on each node to support Cilium HTTP network policy and observability. The Envoy build used by Cilium Mesh is tailored to the Cilium Agent's needs and omits many of the extensions available in the upstream Envoy codebase. The standard types referenced in the Envoy documentation (type.googleapis.com/envoy.config.listener.v3.Listener and type.googleapis.com/envoy.config.route.v3.RouteConfiguration) are always available. The supported extensions are listed below:
envoy.clusters.dynamic_forward_proxy
envoy.filters.http.dynamic_forward_proxy
envoy.filters.http.ext_authz
envoy.filters.http.jwt_authn
envoy.filters.http.local_ratelimit
envoy.filters.http.oauth2
envoy.filters.http.ratelimit
envoy.filters.http.router
envoy.filters.http.set_metadata
envoy.filters.listener.tls_inspector
envoy.filters.network.connection_limit
envoy.filters.network.ext_authz
envoy.filters.network.http_connection_manager
envoy.filters.network.local_ratelimit
envoy.filters.network.mongo_proxy
envoy.filters.network.mysql_proxy
envoy.filters.network.ratelimit
envoy.filters.network.tcp_proxy
envoy.filters.network.sni_cluster
envoy.filters.network.sni_dynamic_forward_proxy
envoy.stat_sinks.metrics_service
envoy.transport_sockets.raw_buffer
envoy.upstreams.http.http
envoy.upstreams.http.tcp
Scenarios
Cilium lets you control L7 traffic through the CiliumEnvoyConfig and CiliumClusterwideEnvoyConfig CRDs.
These Envoy CRD configurations are not validated by Kubernetes at all, so any error in the Envoy resources is only seen by the Cilium Agent: kubectl apply reports success even when parsing and/or installing the resources into the node-local Envoy instance fails. Currently the only way to catch this is to watch the Cilium Agent logs for errors and warnings.
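For example, you can watch the agent logs right after applying a config (a sketch; the grep patterns are illustrative):
kubectl -n kube-system logs ds/cilium --since=5m | grep -iE "error|warn" | grep -i envoy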
Versions
- Cilium: v1.14.0
- Kubernetes: v1.22.17
Admin configuration
Push an admin configuration to Envoy so that it exposes the admin management interface:
apiVersion: cilium.io/v2
kind: CiliumClusterwideEnvoyConfig
metadata:
name: envoy-admin-listener
spec:
resources:
- "@type": type.googleapis.com/envoy.config.listener.v3.Listener
name: envoy-admin-listener
address:
socket_address:
address: "::"
ipv4_compat: true
port_value: 9901
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: envoy-admin-listener
route_config:
name: admin_route
virtual_hosts:
- name: "admin_route"
domains: ["*"]
routes:
- match:
prefix: "/"
route:
cluster: "envoy-admin"
use_remote_address: true
skip_xff_append: true
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
Query Envoy's config_dump endpoint to confirm the admin listener is installed:
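A minimal check, assuming the admin listener above has been applied (it binds port 9901 on every node; the node-IP lookup is illustrative and must run from a host that can reach the node):
NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
curl -s http://$NODE_IP:9901/config_dump | head -n 40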
URL rewriting
Deploy the application:
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.14.0/examples/kubernetes/servicemesh/envoy/test-application.yaml
The test workloads comprise two clients, client and client2, and two services, echo-service-1 and echo-service-2.
Set up the environment variables:
export CLIENT2=$(kubectl get pods -l name=client2 -o jsonpath='{.items[0].metadata.name}')
export CLIENT=$(kubectl get pods -l name=client -o jsonpath='{.items[0].metadata.name}')
We will use an Envoy configuration to shape requests to echo-service-1 and echo-service-2. At this point requests to / succeed, but requests to the /foo path return a 404:
kubectl exec -it $CLIENT2 -- curl -I echo-service-1:8080/foo
kubectl exec -it $CLIENT2 -- curl -I echo-service-1:8080
kubectl exec -it $CLIENT2 -- curl -I echo-service-2:8080
Deploy envoy-lb-listener.yaml, which defines the following CiliumClusterwideEnvoyConfig:
apiVersion: cilium.io/v2
kind: CiliumClusterwideEnvoyConfig
metadata:
name: envoy-lb-listener
spec:
services:
- name: echo-service-1
namespace: default
- name: echo-service-2
namespace: default
resources:
- "@type": type.googleapis.com/envoy.config.listener.v3.Listener
name: envoy-lb-listener
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: envoy-lb-listener
rds:
route_config_name: lb_route
use_remote_address: true
skip_xff_append: true
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
- "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
name: lb_route
virtual_hosts:
- name: "lb_route"
domains: [ "*" ]
routes:
- match:
prefix: "/"
route:
weighted_clusters:
clusters:
- name: "default/echo-service-1"
weight: 50
- name: "default/echo-service-2"
weight: 50
retry_policy:
retry_on: 5xx
num_retries: 3
per_try_timeout: 1s
regex_rewrite:
pattern:
google_re2: { }
regex: "^/foo.*$"
substitution: "/"
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
name: "default/echo-service-1"
connect_timeout: 5s
lb_policy: ROUND_ROBIN
type: EDS
outlier_detection:
split_external_local_origin_errors: true
consecutive_local_origin_failure: 2
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
name: "default/echo-service-2"
connect_timeout: 3s
lb_policy: ROUND_ROBIN
type: EDS
outlier_detection:
split_external_local_origin_errors: true
consecutive_local_origin_failure: 2
The configuration above splits requests 50/50 between the two echo backends and rewrites the path /foo to /. Because of the rewrite, requests to /foo, which previously returned a 404, now succeed, as shown below:
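A quick verification, assuming the CiliumClusterwideEnvoyConfig above has been applied (the same request previously returned HTTP/1.1 404):
kubectl exec -it $CLIENT2 -- curl -I echo-service-1:8080/foo   # now expected to return HTTP/1.1 200 OK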
Rate limiting
Configure a rate-limiting policy for echo-service-1 with a CiliumClusterwideEnvoyConfig:
apiVersion: cilium.io/v2
kind: CiliumClusterwideEnvoyConfig
metadata:
name: envoy-lb-listener
spec:
services:
- name: echo-service-1
namespace: default
resources:
- "@type": type.googleapis.com/envoy.config.listener.v3.Listener
name: envoy-lb-listener
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: envoy-ratelimit
rds:
route_config_name: lb_route
use_remote_address: true
skip_xff_append: true
http_filters:
- name: envoy.filters.http.local_ratelimit
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
stat_prefix: http_local_rate_limiter
token_bucket:
max_tokens: 2
tokens_per_fill: 2
fill_interval: 5s
filter_enabled:
runtime_key: local_rate_limit_enabled
default_value:
numerator: 100
denominator: HUNDRED
filter_enforced:
runtime_key: local_rate_limit_enforced
default_value:
numerator: 100
denominator: HUNDRED
response_headers_to_add:
- append_action: OVERWRITE_IF_EXISTS_OR_ADD
header:
key: x-local-rate-limit
value: 'true'
local_rate_limit_per_downstream_connection: false
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
- "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
name: lb_route
virtual_hosts:
- name: "lb_route"
domains: [ "*" ]
routes:
- match:
prefix: "/"
route:
weighted_clusters:
clusters:
- name: "default/echo-service-1"
weight: 100
retry_policy:
retry_on: 5xx
num_retries: 3
per_try_timeout: 1s
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
name: "default/echo-service-1"
connect_timeout: 5s
lb_policy: ROUND_ROBIN
type: EDS
outlier_detection:
split_external_local_origin_errors: true
consecutive_local_origin_failure: 2
Run the following to trigger the rate limit:
export CLIENT2=$(kubectl get pods -l name=client2 -o jsonpath='{.items[0].metadata.name}')
for i in {1..5}; do kubectl exec -it $CLIENT2 -- curl -I echo-service-1:8080 | grep -E "x-local-rate-limit|429|local_rate_limited"; done
Because echo-service-2 has no rate-limiting policy, requests to it do not trigger the rate limit; see the contrast check below.
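As a contrast check (a sketch), the same loop against echo-service-2 should show no 429 responses:
for i in {1..5}; do kubectl exec -it $CLIENT2 -- curl -sI echo-service-2:8080 | head -n 1; done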
Circuit breaking
Deploy the fortio load-testing tool:
apiVersion: v1
kind: Service
metadata:
name: fortio
labels:
app: fortio
service: fortio
spec:
ports:
- port: 8080
name: http
selector:
app: fortio
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: fortio-deploy
spec:
replicas: 1
selector:
matchLabels:
app: fortio
template:
metadata:
labels:
app: fortio
spec:
containers:
- name: fortio
image: fortio/fortio:latest_release
imagePullPolicy: Always
ports:
- containerPort: 8080
name: http-fortio
- containerPort: 8079
name: grpc-ping
Apply a circuit-breaking policy to echo-service-1 using CiliumClusterwideEnvoyConfig or CiliumEnvoyConfig:
apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
name: envoy-circuit-breaker
spec:
services:
- name: echo-service-1
namespace: default
resources:
- "@type": type.googleapis.com/envoy.config.listener.v3.Listener
name: envoy-lb-listener
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: envoy-lb-listener
rds:
route_config_name: lb_route
use_remote_address: true
skip_xff_append: true
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
- "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
name: lb_route
virtual_hosts:
- name: "lb_route"
domains: [ "*" ]
routes:
- match:
prefix: "/"
route:
weighted_clusters:
clusters:
- name: "default/echo-service-1"
weight: 100
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
name: "default/echo-service-1"
connect_timeout: 5s
lb_policy: ROUND_ROBIN
type: EDS
circuit_breakers:
thresholds:
- priority: "DEFAULT"
max_requests: 2
max_pending_requests: 1
outlier_detection:
split_external_local_origin_errors: true
consecutive_local_origin_failure: 2
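Set the $FORTIO_POD variable first; this lookup assumes the app=fortio label from the deployment above:
export FORTIO_POD=$(kubectl get pods -l app=fortio -o jsonpath='{.items[0].metadata.name}')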
Call the service with two concurrent connections (-c 2) and send 20 requests (-n 20):
kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 2 -qps 0 -n 20 http://echo-service-1:8080
As shown below, when concurrent requests put enough pressure on echo-service-1, the circuit-breaking policy is triggered.
root@instance-00qqerhq:~/cilium-mesh/strateges# kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 2 -qps 0 -n 20 http://echo-service-1:8080
{"ts":1691997950.946896,"level":"info","file":"scli.go","line":107,"msg":"Starting Φορτίο 1.57.3 h1:kdPlBiws3cFsLcssZxCt2opFmHj14C3yPBokFhMWzmg= go1.20.6 amd64 linux"}
Fortio 1.57.3 running at 0 queries per second, 4->4 procs, for 20 calls: http://echo-service-1:8080
{"ts":1691997950.947488,"level":"info","file":"httprunner.go","line":100,"msg":"Starting http test","run":"0","url":"http://echo-service-1:8080","threads":"2","qps":"-1.0","warmup":"parallel","conn-reuse":""}
Starting at max qps with 2 thread(s) [gomax 4] for exactly 20 calls (10 per thread + 0)
{"ts":1691997950.950205,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"0","run":"0"}
{"ts":1691997950.976027,"level":"info","file":"periodic.go","line":832,"msg":"T001 ended after 27.10868ms : 10 calls. qps=368.8855377687147"}
{"ts":1691997950.976465,"level":"info","file":"periodic.go","line":832,"msg":"T000 ended after 27.547964ms : 10 calls. qps=363.0032331971974"}
Ended after 27.591205ms : 20 calls. qps=724.87
{"ts":1691997950.976519,"level":"info","file":"periodic.go","line":564,"msg":"Run ended","run":"0","elapsed":"27.591205ms","calls":"20","qps":"724.8686673887566"}
Aggregated Function Time : count 20 avg 0.0027248375 +/- 0.002902 min 0.001064346 max 0.010990329 sum 0.05449675
# range, mid point, percentile, count
>= 0.00106435 <= 0.002 , 0.00153217 , 70.00, 14
> 0.002 <= 0.003 , 0.0025 , 80.00, 2
> 0.003 <= 0.004 , 0.0035 , 85.00, 1
> 0.006 <= 0.007 , 0.0065 , 90.00, 1
> 0.01 <= 0.0109903 , 0.0104952 , 100.00, 2
# target 50% 0.00171211
# target 75% 0.0025
# target 90% 0.007
# target 99% 0.0108913
# target 99.9% 0.0109804
Error cases : count 1 avg 0.001306888 +/- 0 min 0.001306888 max 0.001306888 sum 0.001306888
# range, mid point, percentile, count
>= 0.00130689 <= 0.00130689 , 0.00130689 , 100.00, 1
# target 50% 0.00130689
# target 75% 0.00130689
# target 90% 0.00130689
# target 99% 0.00130689
# target 99.9% 0.00130689
# Socket and IP used for each connection:
[0] 2 socket used, resolved to 10.96.39.34:8080, connection timing : count 2 avg 0.0002502065 +/- 6.296e-05 min 0.000187245 max 0.000313168 sum 0.000500413
[1] 1 socket used, resolved to 10.96.39.34:8080, connection timing : count 1 avg 0.00018792 +/- 0 min 0.00018792 max 0.00018792 sum 0.00018792
Connection time histogram (s) : count 3 avg 0.00022944433 +/- 5.92e-05 min 0.000187245 max 0.000313168 sum 0.000688333
# range, mid point, percentile, count
>= 0.000187245 <= 0.000313168 , 0.000250207 , 100.00, 3
# target 50% 0.000218726
# target 75% 0.000265947
# target 90% 0.00029428
# target 99% 0.000311279
# target 99.9% 0.000312979
Sockets used: 3 (for perfect keepalive, would be 2)
Uniform: false, Jitter: false, Catchup allowed: true
IP addresses distribution:
10.96.39.34:8080: 3
Code 200 : 19 (95.0 %)
Code 503 : 1 (5.0 %)
Response Header Sizes : count 20 avg 370.55 +/- 85.01 min 0 max 391 sum 7411
Response Body/Total Sizes : count 20 avg 2336.75 +/- 480.8 min 241 max 2448 sum 46735
All done 20 calls (plus 0 warmup) 2.725 ms avg, 724.9 qps
root@instance-00qqerhq:~/cilium-mesh/strateges#
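The breaker counters can be inspected through the admin listener configured earlier (a sketch; upstream_rq_pending_overflow and upstream_cx_overflow are standard Envoy cluster circuit-breaker stats, and the node-IP lookup is illustrative):
NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
curl -s http://$NODE_IP:9901/stats | grep -E 'echo-service-1.*(upstream_rq_pending_overflow|upstream_cx_overflow)'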
Load balancing
Set the CLIENT variable for the requesting client:
export CLIENT=$(kubectl get pods -l name=client -o jsonpath='{.items[0].metadata.name}')
Test with the following commands (run them after deploying the application below):
for i in {1..10}; do kubectl exec -it $CLIENT -- curl helloworld:5000/hello; done
kubectl exec -it $CLIENT -- curl helloworld:5000/hello
Deploy the helloworld test application:
kubectl apply -f https://raw.githubusercontent.com/istio/istio/master/samples/helloworld/helloworld.yaml
Configure a CiliumEnvoyConfig routing policy that load-balances requests across helloworld v1 and v2:
apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
name: helloworld-lb
spec:
services:
- name: helloworld
namespace: default
resources:
- "@type": type.googleapis.com/envoy.config.listener.v3.Listener
name: helloworld-lb-listener
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: helloworld-lb-listener
rds:
route_config_name: helloworld_lb_route
use_remote_address: true
skip_xff_append: true
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
- "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
name: helloworld_lb_route
virtual_hosts:
- name: "helloworld_lb_route"
domains: [ "*" ]
routes:
- match:
prefix: "/"
route:
cluster: default/helloworld
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
name: "default/helloworld"
connect_timeout: 5s
lb_policy: ROUND_ROBIN
type: EDS
Now configure a policy so that 80% of requests to helloworld go to v1 and 20% to v2:
apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
name: helloworld-lb
spec:
services:
- name: helloworld
namespace: default
resources:
- "@type": type.googleapis.com/envoy.config.listener.v3.Listener
name: helloworld-lb-listener
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: helloworld-lb-listener
rds:
route_config_name: helloworld_lb_route
use_remote_address: true
skip_xff_append: true
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
- "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
name: helloworld_lb_route
virtual_hosts:
- name: "helloworld_lb_route"
domains: [ "*" ]
routes:
- match:
prefix: "/"
route:
cluster: default/helloworld
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
name: "default/helloworld"
connect_timeout: 5s
lb_policy: ROUND_ROBIN
type: STATIC
load_assignment:
cluster_name: "default/helloworld"
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: 10.0.0.55 # helloworld v2 pod IP
port_value: 5000
load_balancing_weight: 20
- endpoint:
address:
socket_address:
address: 10.0.0.238
port_value: 5000
load_balancing_weight: 80 # helloworld v1 pod IP
The CiliumEnvoyConfig above uses static pod IPs; this works for a quick test but is not recommended.
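For reference, the pod IPs used above can be looked up like this (the helloworld sample pods carry app and version labels):
kubectl get pods -l app=helloworld -o wide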
root@instance-00qqerhq:~/cilium-mesh/strateges# kubectl apply -f cec-envoy-lb-weight-load_assignment.yaml
ciliumenvoyconfig.cilium.io/helloworld-lb configured
The test output shows that the configured CiliumEnvoyConfig policy took effect: roughly 80% of requests went to v1 and 20% to v2.
Since hard-coding static pod IPs is not recommended, before switching to a 90/10 v1/v2 split let's review two points:
- Envoy discovery service (EDS) names must follow the namespace/service-name:port convention.
- The backendServices field of a CiliumClusterwideEnvoyConfig or CiliumEnvoyConfig specifies Kubernetes services whose backends are automatically synced to Envoy via EDS. Traffic to these services is not forwarded to the Envoy listener; this lets the Envoy listener load-balance across those backends while normal Cilium service load balancing continues to handle traffic addressed to the services themselves.
We keep the parent helloworld service and define a dedicated service for each version, helloworld-v1 and helloworld-v2 (the v2 manifest is sketched after this block):
apiVersion: v1
kind: Service
metadata:
name: helloworld
labels:
app: helloworld
service: helloworld
version: v1
spec:
ports:
- port: 5000
name: http
selector:
app: helloworld
---
apiVersion: v1
kind: Service
metadata:
name: helloworld-v1
labels:
app: helloworld
service: helloworld
version: v1
spec:
ports:
- port: 5000
name: http
selector:
app: helloworld
version: v1
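The helloworld-v2 service is analogous; a sketch, assuming the sample's version: v2 pod label:

apiVersion: v1
kind: Service
metadata:
  name: helloworld-v2
  labels:
    app: helloworld
    service: helloworld
    version: v2
spec:
  ports:
  - port: 5000
    name: http
  selector:
    app: helloworld
    version: v2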
Configure the routing policy so that 90% of helloworld traffic goes to v1 and 10% to v2:
apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
name: envoy-lb-listener
spec:
services:
- name: helloworld
namespace: default
backendServices:
- name: helloworld-v1
namespace: default
- name: helloworld-v2
namespace: default
resources:
- "@type": type.googleapis.com/envoy.config.listener.v3.Listener
name: envoy-lb-listener
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: envoy-lb-listener
rds:
route_config_name: lb_route
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
- "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
name: lb_route
virtual_hosts:
- name: "lb_route"
domains: [ "*" ]
routes:
- match:
prefix: "/"
route:
weighted_clusters:
clusters:
- name: "default/helloworld-v1"
weight: 90
- name: "default/helloworld-v2"
weight: 10
retry_policy:
retry_on: 5xx
num_retries: 3
per_try_timeout: 1s
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
name: "default/helloworld-v1"
connect_timeout: 5s
lb_policy: ROUND_ROBIN
type: EDS
outlier_detection:
split_external_local_origin_errors: true
consecutive_local_origin_failure: 2
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
name: "default/helloworld-v2"
connect_timeout: 3s
lb_policy: ROUND_ROBIN
type: EDS
outlier_detection:
split_external_local_origin_errors: true
consecutive_local_origin_failure: 2
Run the test below; about 90% of the responses come from v1 and 10% from v2, as expected:
for i in {1..10}; do kubectl exec -it $CLIENT -- curl helloworld:5000/hello; done
Metrics
Push a Prometheus metrics listener configuration to Envoy so that it exposes its metrics:
apiVersion: cilium.io/v2
kind: CiliumClusterwideEnvoyConfig
metadata:
name: envoy-prometheus-metrics-listener
spec:
resources:
- "@type": type.googleapis.com/envoy.config.listener.v3.Listener
name: envoy-prometheus-metrics-listener
address:
socket_address:
address: "::"
ipv4_compat: true
port_value: 9090
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: envoy-prometheus-metrics-listener
rds:
route_config_name: prometheus_route
use_remote_address: true
skip_xff_append: true
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
- "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
name: prometheus_route
virtual_hosts:
- name: "prometheus_metrics_route"
domains: ["*"]
routes:
- match:
path: "/metrics"
route:
cluster: "envoy-admin"
prefix_rewrite: "/stats/prometheus"
View the metrics:
http://xxx/stats/prometheus
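A minimal check, assuming the listener above has been applied (it binds port 9090 on every node and rewrites /metrics to /stats/prometheus on the admin endpoint; the node-IP lookup is illustrative):
NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
curl -s http://$NODE_IP:9090/metrics | head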
AccessLog
The Envoy shipped with Cilium Mesh does not support access logs, because the corresponding Envoy extensions are not enabled in Cilium's Envoy build; see the extensions_build_config source for details. A PR has since been opened to enable this feature; see the "Enable access logger resource" PR.
Canary routing
Configure a CiliumEnvoyConfig routing policy for echo-service-1 and echo-service-2 to achieve header-based canary routing:
apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
name: envoy-lb-listener
spec:
services:
- name: echo-service-1
namespace: default
- name: echo-service-2
namespace: default
resources:
- "@type": type.googleapis.com/envoy.config.listener.v3.Listener
name: envoy-lb-listener
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: envoy-lb-listener
rds:
route_config_name: lb_route
use_remote_address: true
skip_xff_append: true
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
- "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
name: lb_route
virtual_hosts:
- name: "lb_route"
domains: [ "*" ]
routes:
- match:
prefix: "/"
headers:
- name: "version"
exact_match: "foo"
route:
cluster: default/echo-service-1
regex_rewrite:
pattern:
google_re2: { }
regex: "^/foo.*$"
substitution: "/"
- match:
prefix: "/"
route:
weighted_clusters:
clusters:
- name: "default/echo-service-1"
weight: 50
- name: "default/echo-service-2"
weight: 50
retry_policy:
retry_on: 5xx
num_retries: 3
per_try_timeout: 1s
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
name: "default/echo-service-1"
connect_timeout: 5s
lb_policy: ROUND_ROBIN
type: EDS
outlier_detection:
split_external_local_origin_errors: true
consecutive_local_origin_failure: 2
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
name: "default/echo-service-2"
connect_timeout: 3s
lb_policy: ROUND_ROBIN
type: EDS
outlier_detection:
split_external_local_origin_errors: true
consecutive_local_origin_failure: 2
As shown below, a request to echo-service-1's /foo endpoint returns a 404 by default, but the same request succeeds when it carries the header version: foo.
root@instance-00qqerhq:~/cilium-mesh/strateges# kubectl exec -it $CLIENT2 -- curl -H"version:foo" -I echo-service-1:8080/foo
HTTP/1.1 200 OK
x-powered-by: Express
vary: Origin, Accept-Encoding
access-control-allow-credentials: true
accept-ranges: bytes
cache-control: public, max-age=0
last-modified: Wed, 21 Sep 2022 10:25:56 GMT
etag: W/"809-1835f952f20"
content-type: text/html; charset=UTF-8
content-length: 2057
date: Mon, 21 Aug 2023 06:26:17 GMT
x-envoy-upstream-service-time: 0
server: envoy
This confirms that the header-based canary routing is in effect.
Summary
In this article we deployed Cilium Mesh and walked through its features with hands-on examples. Overall, the approach brings real convenience, but traffic management still relies on hand-written CiliumEnvoyConfig or CiliumClusterwideEnvoyConfig resources, which is not very friendly to users.