Cilium Mesh: Common Scenarios and Examples

Cilium Mesh traffic-management features such as rate limiting, circuit breaking, load balancing, canary releases, and the Envoy admin interface

Posted by 陈谭军 on Saturday, August 12, 2023 | about 9 minutes to read

The Service Mesh landscape published by the Cilium project shows its positioning: unlike other open-source service-mesh projects that define a large number of CRD concepts, Cilium Service Mesh currently focuses on implementing the mesh data plane. Through an open, inclusive design it can plug into other control planes, and the current version already supports Envoy CRDs, Kubernetes Ingress, Istio, SPIFFE, and the Gateway API.

Introduction

How Cilium Service Mesh differs notably from other service meshes:

  • In today's service-mesh landscape, the per-pod proxy (sidecar) model dominates. Cilium takes the per-node proxy route instead, avoiding the sidecar drawbacks of high resource overhead and added network latency.
  • Almost every other service-mesh project hijacks traffic through the Linux kernel network stack, whereas Cilium Service Mesh is built on an eBPF datapath, which accelerates traffic natively.
  • Cilium Service Mesh rides on the Cilium CNI foundation, so it inherits Cilium's rich capabilities: network policy, multi-cluster Cluster Mesh, eBPF acceleration, Hubble observability, and more.

Cilium implements two new CRDs, CiliumEnvoyConfig and CiliumClusterwideEnvoyConfig. They are written and behave almost identically; the only difference is that CiliumEnvoyConfig is namespace-scoped while CiliumClusterwideEnvoyConfig is cluster-scoped.
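
A minimal sketch of the two resource shapes (the spec bodies are identical; only the scope differs):

apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig              # namespace-scoped
metadata:
  name: example
  namespace: default
spec:
  resources: []                      # Envoy Listener/RouteConfiguration/Cluster resources go here
---
apiVersion: cilium.io/v2
kind: CiliumClusterwideEnvoyConfig   # cluster-scoped; no metadata.namespace
metadata:
  name: example
spec:
  resources: []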

This article uses Cilium Mesh examples to show how to configure CiliumEnvoyConfig and CiliumClusterwideEnvoyConfig.

Setting Up a Kubernetes Cluster

Build a Kubernetes cluster with Kind:

kind create cluster --image=kindest/node:v1.22.17 --name tanjunchen
Creating cluster "tanjunchen" ...
 ✓ Ensuring node image (kindest/node:v1.22.17) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-tanjunchen"
You can now use your cluster with:
kubectl cluster-info --context kind-tanjunchen
Thanks for using kind! 😊

root@instance-00qqerhq:~/cilium-mesh# kubectl get nodes
NAME                       STATUS   ROLES                  AGE   VERSION
tanjunchen-control-plane   Ready    control-plane,master   61s   v1.22.17
root@instance-00qqerhq:~/cilium-mesh# kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.27.2
Kustomize Version: v5.0.1
Server Version: v1.22.17
WARNING: version difference between client (1.27) and server (1.22) exceeds the supported minor version skew of +/-1

Deploying the Test Applications

Deploy the test applications wrk and nginx:

root@instance-00qqerhq:~/cilium-mesh# kubectl apply -f wrk-nginx.yaml
deployment.apps/wrk unchanged
service/nginx unchanged
deployment.apps/nginx configured
root@instance-00qqerhq:~/cilium-mesh# kubectl get pod
NAME                     READY   STATUS    RESTARTS   AGE
nginx-699bb76bb4-p26v7   1/1     Running   0          2m16s
wrk-64884c57d7-vpml8     1/1     Running   0          2m16s
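
The contents of wrk-nginx.yaml are not shown here; a minimal sketch that would produce the resources reported above (the images and the wrk command are assumptions):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25              # assumed image
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
  - port: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wrk
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wrk
  template:
    metadata:
      labels:
        app: wrk
    spec:
      containers:
      - name: wrk
        image: williamyeh/wrk          # assumed client image
        command: ["sleep", "infinity"] # keep the pod alive for exec-based testing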

Cilium Mesh

Prerequisites

  • Cilium must run with kubeProxyReplacement set to partial or strict (see the check after this list).
  • The minimum supported Kubernetes version is 1.19.
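
A quick way to confirm the mode on a running agent (a hedged check using the cilium CLI bundled in the agent pod):

kubectl -n kube-system exec ds/cilium -- cilium status | grep KubeProxyReplacement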

Installation

Install Cilium Mesh:

helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --version 1.14.0  --namespace kube-system --set kubeProxyReplacement=strict --set-string extraConfig.enable-envoy-config=true  --set envoy.enabled=true --set hubble.relay.enabled=true --set hubble.ui.enabled=true
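
After installation you can verify that the agent rolled out and that the extra flag took effect (a hedged check; extraConfig entries are rendered into the cilium-config ConfigMap):

kubectl -n kube-system rollout status ds/cilium
kubectl -n kube-system get configmap cilium-config -o yaml | grep enable-envoy-config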

Envoy Configuration

Only the Envoy v3 API is supported. Cilium deploys Envoy on each node to support Cilium HTTP network policy and observability. The Envoy used in Cilium Mesh is tailored to the needs of the Cilium Agent and does not include many of the extensions available in the upstream Envoy codebase. The standard types referenced in the Envoy documentation (type.googleapis.com/envoy.config.listener.v3.Listener and type.googleapis.com/envoy.config.route.v3.RouteConfiguration) are always available. The included extensions are:

envoy.clusters.dynamic_forward_proxy
envoy.filters.http.dynamic_forward_proxy
envoy.filters.http.ext_authz
envoy.filters.http.jwt_authn
envoy.filters.http.local_ratelimit
envoy.filters.http.oauth2
envoy.filters.http.ratelimit
envoy.filters.http.router
envoy.filters.http.set_metadata
envoy.filters.listener.tls_inspector
envoy.filters.network.connection_limit
envoy.filters.network.ext_authz
envoy.filters.network.http_connection_manager
envoy.filters.network.local_ratelimit
envoy.filters.network.mongo_proxy
envoy.filters.network.mysql_proxy
envoy.filters.network.ratelimit
envoy.filters.network.tcp_proxy
envoy.filters.network.sni_cluster
envoy.filters.network.sni_dynamic_forward_proxy
envoy.stat_sinks.metrics_service
envoy.transport_sockets.raw_buffer
envoy.upstreams.http.http
envoy.upstreams.http.tcp

Scenarios

Cilium lets you control L7 traffic through the CiliumEnvoyConfig and CiliumClusterwideEnvoyConfig CRDs.

These Envoy CRD configurations are not validated by Kubernetes at all, so any error in the Envoy resources is only seen by the Cilium Agent: kubectl apply will report success, while parsing and/or installing the resources into the node-local Envoy instance may still fail. Currently the only way to verify this is to watch the Cilium Agent logs for errors and warnings.
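
For example, after applying a CEC/CCEC you can tail the agent logs for parse or install errors (a hedged sketch):

kubectl -n kube-system logs -f ds/cilium | grep -iE "envoy|ciliumenvoyconfig"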

Versions

  • Cilium:v1.14.0
  • Kubernetes:v1.22.17

Configuring the Admin Interface

Push an admin configuration to Envoy so that it exposes the admin management interface.

apiVersion: cilium.io/v2
kind: CiliumClusterwideEnvoyConfig
metadata:
  name: envoy-admin-listener
spec:
  resources:
  - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
    name: envoy-admin-listener
    address:
      socket_address:
        address: "::"
        ipv4_compat: true
        port_value: 9901
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: envoy-admin-listener
          route_config:
            name: admin_route
            virtual_hosts:
            - name: "admin_route"
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: "envoy-admin"
          use_remote_address: true
          skip_xff_append: true
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

Check Envoy's config_dump endpoint, as shown below:
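
The listener above binds "::" on port 9901 with ipv4_compat, so the admin interface should be reachable on each node (a hedged check; substitute a real node IP):

curl -s http://<node-ip>:9901/config_dump | head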

URL Rewriting

Deploy the application

kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.14.0/examples/kubernetes/servicemesh/envoy/test-application.yaml

The test workload consists of two clients, client and client2, and two services, echo-service-1 and echo-service-2.

Set environment variables

export CLIENT2=$(kubectl get pods -l name=client2 -o jsonpath='{.items[0].metadata.name}')
export CLIENT=$(kubectl get pods -l name=client -o jsonpath='{.items[0].metadata.name}')

We will use an Envoy configuration to route requests to echo-service-1 and echo-service-2. Requests to / succeed, but requests to the /foo path return 404:

kubectl exec -it $CLIENT2 -- curl -I echo-service-1:8080/foo
kubectl exec -it $CLIENT2 -- curl -I echo-service-1:8080
kubectl exec -it $CLIENT2 -- curl -I echo-service-2:8080

Deploy envoy-lb-listener.yaml, which defines the following CiliumClusterwideEnvoyConfig:

apiVersion: cilium.io/v2
kind: CiliumClusterwideEnvoyConfig
metadata:
  name: envoy-lb-listener
spec:
  services:
    - name: echo-service-1
      namespace: default
    - name: echo-service-2
      namespace: default
  resources:
    - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
      name: envoy-lb-listener
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: envoy-lb-listener
                rds:
                  route_config_name: lb_route
                use_remote_address: true
                skip_xff_append: true
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
    - "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
      name: lb_route
      virtual_hosts:
        - name: "lb_route"
          domains: [ "*" ]
          routes:
            - match:
                prefix: "/"
              route:
                weighted_clusters:
                  clusters:
                    - name: "default/echo-service-1"
                      weight: 50
                    - name: "default/echo-service-2"
                      weight: 50
                retry_policy:
                  retry_on: 5xx
                  num_retries: 3
                  per_try_timeout: 1s
                regex_rewrite:
                  pattern:
                    google_re2: { }
                    regex: "^/foo.*$"
                  substitution: "/"
    - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
      name: "default/echo-service-1"
      connect_timeout: 5s
      lb_policy: ROUND_ROBIN
      type: EDS
      outlier_detection:
        split_external_local_origin_errors: true
        consecutive_local_origin_failure: 2
    - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
      name: "default/echo-service-2"
      connect_timeout: 3s
      lb_policy: ROUND_ROBIN
      type: EDS
      outlier_detection:
        split_external_local_origin_errors: true
        consecutive_local_origin_failure: 2

The configuration above splits requests 50/50 between the two echo backends and rewrites the path /foo to /. Thanks to the rewrite, requests to /foo, which previously returned 404, should now succeed, as shown below:
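
A hedged re-check of the previously failing request (expected to return 200 now):

kubectl exec -it $CLIENT2 -- curl -I echo-service-1:8080/foo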

Rate Limiting

Apply a CiliumClusterwideEnvoyConfig rate-limiting policy to echo-service-1:

apiVersion: cilium.io/v2
kind: CiliumClusterwideEnvoyConfig
metadata:
  name: envoy-lb-listener
spec:
  services:
    - name: echo-service-1
      namespace: default
  resources:
    - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
      name: envoy-lb-listener
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: envoy-ratelimit
                rds:
                  route_config_name: lb_route
                use_remote_address: true
                skip_xff_append: true
                http_filters:
                - name: envoy.filters.http.local_ratelimit
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
                    stat_prefix: http_local_rate_limiter
                    token_bucket:
                      max_tokens: 2
                      tokens_per_fill: 2
                      fill_interval: 5s
                    filter_enabled:
                      runtime_key: local_rate_limit_enabled
                      default_value:
                        numerator: 100
                        denominator: HUNDRED
                    filter_enforced:
                      runtime_key: local_rate_limit_enforced
                      default_value:
                        numerator: 100
                        denominator: HUNDRED
                    response_headers_to_add:
                    - append_action: OVERWRITE_IF_EXISTS_OR_ADD
                      header:
                        key: x-local-rate-limit
                        value: 'true'
                    local_rate_limit_per_downstream_connection: false
                - name: envoy.filters.http.router
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
    - "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
      name: lb_route
      virtual_hosts:
        - name: "lb_route"
          domains: [ "*" ]
          routes:
            - match:
                prefix: "/"
              route:
                weighted_clusters:
                  clusters:
                    - name: "default/echo-service-1"
                      weight: 100
                retry_policy:
                  retry_on: 5xx
                  num_retries: 3
                  per_try_timeout: 1s
    - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
      name: "default/echo-service-1"
      connect_timeout: 5s
      lb_policy: ROUND_ROBIN
      type: EDS
      outlier_detection:
        split_external_local_origin_errors: true
        consecutive_local_origin_failure: 2

Run the following to trigger the rate limit:

export CLIENT2=$(kubectl get pods -l name=client2 -o jsonpath='{.items[0].metadata.name}')
for i in {1..5}; do  kubectl exec -it $CLIENT2 -- curl -I echo-service-1:8080 | grep -E "x-local-rate-limit|429|local_rate_limited"; done

Because echo-service-2 has no rate-limiting policy, requests to it are not rate limited.
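
The token bucket above holds at most 2 tokens and refills 2 every 5 seconds, so only about two of the five requests in the loop should succeed while the rest return 429 with the x-local-rate-limit header. For contrast, the unrestricted service can be probed the same way (a hedged check; -s keeps curl quiet in the pipe):

for i in {1..5}; do kubectl exec $CLIENT2 -- curl -sI echo-service-2:8080 | head -1; done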

Circuit Breaking

Deploy the fortio load-testing tool:

apiVersion: v1
kind: Service
metadata:
  name: fortio
  labels:
    app: fortio
    service: fortio
spec:
  ports:
  - port: 8080
    name: http
  selector:
    app: fortio
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fortio-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fortio
  template:
    metadata:
      labels:
        app: fortio
    spec:
      containers:
      - name: fortio
        image: fortio/fortio:latest_release
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          name: http-fortio
        - containerPort: 8079
          name: grpc-ping

Apply a circuit-breaking policy to echo-service-1 via CiliumClusterwideEnvoyConfig or CiliumEnvoyConfig:

apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
  name: envoy-circuit-breaker
spec:
  services:
    - name: echo-service-1
      namespace: default
  resources:
    - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
      name: envoy-lb-listener
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: envoy-lb-listener
                rds:
                  route_config_name: lb_route
                use_remote_address: true
                skip_xff_append: true
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
    - "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
      name: lb_route
      virtual_hosts:
        - name: "lb_route"
          domains: [ "*" ]
          routes:
            - match:
                prefix: "/"
              route:
                weighted_clusters:
                  clusters:
                    - name: "default/echo-service-1"
                      weight: 100
    - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
      name: "default/echo-service-1"
      connect_timeout: 5s
      lb_policy: ROUND_ROBIN
      type: EDS
      circuit_breakers:
        thresholds:
        - priority: "DEFAULT"
          max_requests: 2
          max_pending_requests: 1
      outlier_detection:
        split_external_local_origin_errors: true
        consecutive_local_origin_failure: 2

Call the service with two concurrent connections (-c 2) and send 20 requests in total (-n 20); set FORTIO_POD to the fortio pod first:

kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 2 -qps 0 -n 20 http://echo-service-1:8080 

As shown below, when concurrent requests put enough pressure on echo-service-1, the circuit breaker trips and some requests are rejected.

root@instance-00qqerhq:~/cilium-mesh/strateges# kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 2 -qps 0 -n 20 http://echo-service-1:8080
{"ts":1691997950.946896,"level":"info","file":"scli.go","line":107,"msg":"Starting Φορτίο 1.57.3 h1:kdPlBiws3cFsLcssZxCt2opFmHj14C3yPBokFhMWzmg= go1.20.6 amd64 linux"}
Fortio 1.57.3 running at 0 queries per second, 4->4 procs, for 20 calls: http://echo-service-1:8080
{"ts":1691997950.947488,"level":"info","file":"httprunner.go","line":100,"msg":"Starting http test","run":"0","url":"http://echo-service-1:8080","threads":"2","qps":"-1.0","warmup":"parallel","conn-reuse":""}
Starting at max qps with 2 thread(s) [gomax 4] for exactly 20 calls (10 per thread + 0)
{"ts":1691997950.950205,"level":"warn","file":"http_client.go","line":1104,"msg":"Non ok http code","code":"503","status":"HTTP/1.1 503","thread":"0","run":"0"}
{"ts":1691997950.976027,"level":"info","file":"periodic.go","line":832,"msg":"T001 ended after 27.10868ms : 10 calls. qps=368.8855377687147"}
{"ts":1691997950.976465,"level":"info","file":"periodic.go","line":832,"msg":"T000 ended after 27.547964ms : 10 calls. qps=363.0032331971974"}
Ended after 27.591205ms : 20 calls. qps=724.87
{"ts":1691997950.976519,"level":"info","file":"periodic.go","line":564,"msg":"Run ended","run":"0","elapsed":"27.591205ms","calls":"20","qps":"724.8686673887566"}
Aggregated Function Time : count 20 avg 0.0027248375 +/- 0.002902 min 0.001064346 max 0.010990329 sum 0.05449675
# range, mid point, percentile, count
>= 0.00106435 <= 0.002 , 0.00153217 , 70.00, 14
> 0.002 <= 0.003 , 0.0025 , 80.00, 2
> 0.003 <= 0.004 , 0.0035 , 85.00, 1
> 0.006 <= 0.007 , 0.0065 , 90.00, 1
> 0.01 <= 0.0109903 , 0.0104952 , 100.00, 2
# target 50% 0.00171211
# target 75% 0.0025
# target 90% 0.007
# target 99% 0.0108913
# target 99.9% 0.0109804
Error cases : count 1 avg 0.001306888 +/- 0 min 0.001306888 max 0.001306888 sum 0.001306888
# range, mid point, percentile, count
>= 0.00130689 <= 0.00130689 , 0.00130689 , 100.00, 1
# target 50% 0.00130689
# target 75% 0.00130689
# target 90% 0.00130689
# target 99% 0.00130689
# target 99.9% 0.00130689
# Socket and IP used for each connection:
[0]   2 socket used, resolved to 10.96.39.34:8080, connection timing : count 2 avg 0.0002502065 +/- 6.296e-05 min 0.000187245 max 0.000313168 sum 0.000500413
[1]   1 socket used, resolved to 10.96.39.34:8080, connection timing : count 1 avg 0.00018792 +/- 0 min 0.00018792 max 0.00018792 sum 0.00018792
Connection time histogram (s) : count 3 avg 0.00022944433 +/- 5.92e-05 min 0.000187245 max 0.000313168 sum 0.000688333
# range, mid point, percentile, count
>= 0.000187245 <= 0.000313168 , 0.000250207 , 100.00, 3
# target 50% 0.000218726
# target 75% 0.000265947
# target 90% 0.00029428
# target 99% 0.000311279
# target 99.9% 0.000312979
Sockets used: 3 (for perfect keepalive, would be 2)
Uniform: false, Jitter: false, Catchup allowed: true
IP addresses distribution:
10.96.39.34:8080: 3
Code 200 : 19 (95.0 %)
Code 503 : 1 (5.0 %)
Response Header Sizes : count 20 avg 370.55 +/- 85.01 min 0 max 391 sum 7411
Response Body/Total Sizes : count 20 avg 2336.75 +/- 480.8 min 241 max 2448 sum 46735
All done 20 calls (plus 0 warmup) 2.725 ms avg, 724.9 qps
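
The circuit breaker allows at most 2 concurrent requests (max_requests: 2) and 1 pending request (max_pending_requests: 1), so raising fortio's concurrency beyond that should increase the share of 503s; a hedged follow-up run:

kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 4 -qps 0 -n 40 http://echo-service-1:8080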

Load Balancing

Set the CLIENT environment variable for the request client:

export CLIENT=$(kubectl get pods -l name=client -o jsonpath='{.items[0].metadata.name}')

Test commands:

for i in {1..10}; do  kubectl exec -it $CLIENT -- curl  helloworld:5000/hello; done
kubectl exec -it $CLIENT -- curl  helloworld:5000/hello

Deploy the helloworld test application:

kubectl apply -f https://raw.githubusercontent.com/istio/istio/master/samples/helloworld/helloworld.yaml

Configure a CiliumEnvoyConfig routing policy that balances requests across helloworld v1 and v2:

apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
  name: helloworld-lb
spec:
  services:
    - name: helloworld
      namespace: default
  resources:
    - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
      name: helloworld-lb-listener
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: helloworld-lb-listener
                rds:
                  route_config_name: helloworld_lb_route
                use_remote_address: true
                skip_xff_append: true
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
    - "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
      name: helloworld_lb_route
      virtual_hosts:
        - name: "helloworld_lb_route"
          domains: [ "*" ]
          routes:
            - match:
                prefix: "/"
              route:
                cluster: default/helloworld
    - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
      name: "default/helloworld"
      connect_timeout: 5s
      lb_policy: ROUND_ROBIN
      type: EDS

Next, configure a policy so that 80% of requests to helloworld go to v1 and 20% go to v2:

apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
  name: helloworld-lb
spec:
  services:
    - name: helloworld
      namespace: default
  resources:
    - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
      name: helloworld-lb-listener
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: helloworld-lb-listener
                rds:
                  route_config_name: helloworld_lb_route
                use_remote_address: true
                skip_xff_append: true
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
    - "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
      name: helloworld_lb_route
      virtual_hosts:
        - name: "helloworld_lb_route"
          domains: [ "*" ]
          routes:
            - match:
                prefix: "/"
              route:
                cluster: default/helloworld
    - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
      name: "default/helloworld"
      connect_timeout: 5s
      lb_policy: ROUND_ROBIN
      type: STATIC
      load_assignment:
        cluster_name: "default/helloworld"
        endpoints:
        - lb_endpoints:
          - endpoint:
              address:
                socket_address:
                  address: 10.0.0.55  # helloworld v2 pod IP
                  port_value: 5000
            load_balancing_weight: 20
          - endpoint:
              address:
                socket_address:
                  address: 10.0.0.238  # helloworld v1 pod IP
                  port_value: 5000
            load_balancing_weight: 80

The CiliumEnvoyConfig above uses static Pod IPs. This is for testing only and is not recommended.

root@instance-00qqerhq:~/cilium-mesh/strateges# kubectl apply -f cec-envoy-lb-weight-load_assignment.yaml
ciliumenvoyconfig.cilium.io/helloworld-lb configured

As the results show, the CiliumEnvoyConfig policy took effect: roughly 80% of requests went to v1 and 20% to v2.

Since static Pod IPs are not recommended, before implementing the 90%/10% split between v1 and v2 it helps to understand the following: cluster names resolved through the Envoy endpoint discovery service (EDS) must follow the namespace/service-name:port convention.

BackendServices in a CiliumClusterwideEnvoyConfig or CiliumEnvoyConfig names Kubernetes services whose backends are automatically synced to Envoy via EDS. Traffic for these services is not forwarded to the Envoy listener; this allows the Envoy listener to load-balance traffic across these backends while normal Cilium service load balancing continues to balance traffic for the services themselves.

We need to define separate Services, helloworld-v1 and helloworld-v2, for the two versions (helloworld-v2 mirrors helloworld-v1; see the sketch after the manifest):

apiVersion: v1
kind: Service
metadata:
  name: helloworld
  labels:
    app: helloworld
    service: helloworld
    version: v1
spec:
  ports:
  - port: 5000
    name: http
  selector:
    app: helloworld
---
apiVersion: v1
kind: Service
metadata:
  name: helloworld-v1
  labels:
    app: helloworld
    service: helloworld
    version: v1
spec:
  ports:
  - port: 5000
    name: http
  selector:
    app: helloworld
    version: v1
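
The helloworld-v2 Service is not shown above; a sketch assuming the version: v2 label used by the Istio helloworld sample:

apiVersion: v1
kind: Service
metadata:
  name: helloworld-v2
  labels:
    app: helloworld
    service: helloworld
    version: v2
spec:
  ports:
  - port: 5000
    name: http
  selector:
    app: helloworld
    version: v2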

Configure the routing policy: 90% of helloworld traffic goes to v1 and 10% to v2.

apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
  name: envoy-lb-listener
spec:
  services:
    - name: helloworld
      namespace: default
  backendServices:
    - name: helloworld-v1
      namespace: default
    - name: helloworld-v2
      namespace: default
  resources:
    - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
      name: envoy-lb-listener
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: envoy-lb-listener
                rds:
                  route_config_name: lb_route
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
    - "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
      name: lb_route
      virtual_hosts:
        - name: "lb_route"
          domains: [ "*" ]
          routes:
            - match:
                prefix: "/"
              route:
                weighted_clusters:
                  clusters:
                    - name: "default/helloworld-v1"
                      weight: 90
                    - name: "default/helloworld-v2"
                      weight: 10
                retry_policy:
                  retry_on: 5xx
                  num_retries: 3
                  per_try_timeout: 1s
    - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
      name: "default/helloworld-v1"
      connect_timeout: 5s
      lb_policy: ROUND_ROBIN
      type: EDS
      outlier_detection:
        split_external_local_origin_errors: true
        consecutive_local_origin_failure: 2
    - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
      name: "default/helloworld-v2"
      connect_timeout: 3s
      lb_policy: ROUND_ROBIN
      type: EDS
      outlier_detection:
        split_external_local_origin_errors: true
        consecutive_local_origin_failure: 2

Run the test command below; about 90% of the traffic goes to v1 and 10% to v2, as expected.

for i in {1..10}; do  kubectl exec -it $CLIENT -- curl  helloworld:5000/hello; done
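
To eyeball the split over a larger sample, the responses can be tallied (the helloworld sample echoes its version in each reply; a hedged one-liner):

for i in {1..20}; do kubectl exec $CLIENT -- curl -s helloworld:5000/hello; done | sort | uniq -c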

Metrics

Push a Prometheus configuration to Envoy so that it exposes metrics:

apiVersion: cilium.io/v2
kind: CiliumClusterwideEnvoyConfig
metadata:
  name: envoy-prometheus-metrics-listener
spec:
  resources:
  - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
    name: envoy-prometheus-metrics-listener
    address:
      socket_address:
        address: "::"
        ipv4_compat: true
        port_value: 9090
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: envoy-prometheus-metrics-listener
          rds:
            route_config_name: prometheus_route
          use_remote_address: true
          skip_xff_append: true
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  - "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
    name: prometheus_route
    virtual_hosts:
    - name: "prometheus_metrics_route"
      domains: ["*"]
      routes:
      - match:
          path: "/metrics"
        route:
          cluster: "envoy-admin"
          prefix_rewrite: "/stats/prometheus"

View the metrics:

http://xxx/stats/prometheus
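
The listener binds port 9090 and rewrites /metrics to Envoy's /stats/prometheus before handing the request to the envoy-admin cluster, so the metrics should be scrapeable on each node (a hedged check; substitute a real node IP):

curl -s http://<node-ip>:9090/metrics | head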

AccessLog

The Envoy build used by Cilium Mesh does not support access logging, because the relevant Envoy extensions are not enabled in Cilium's Envoy build; see the extensions_build_config source. A follow-up PR enables this feature; see the "Enable access logger resource" PR.

Canary Release

Configure a CiliumEnvoyConfig routing policy for echo-service-1 and echo-service-2 to implement header-based canary routing:

apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
  name: envoy-lb-listener
spec:
  services:
    - name: echo-service-1
      namespace: default
    - name: echo-service-2
      namespace: default
  resources:
    - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
      name: envoy-lb-listener
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: envoy-lb-listener
                rds:
                  route_config_name: lb_route
                use_remote_address: true
                skip_xff_append: true
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
    - "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
      name: lb_route
      virtual_hosts:
        - name: "lb_route"
          domains: [ "*" ]
          routes:
            - match:
                prefix: "/"
                headers:
                  - name: "version"
                    exact_match: "foo"
              route:
                cluster: default/echo-service-1
                regex_rewrite:
                  pattern:
                    google_re2: { }
                    regex: "^/foo.*$"
                  substitution: "/"
            - match:
                prefix: "/"
              route:
                weighted_clusters:
                  clusters:
                    - name: "default/echo-service-1"
                      weight: 50
                    - name: "default/echo-service-2"
                      weight: 50
                retry_policy:
                  retry_on: 5xx
                  num_retries: 3
                  per_try_timeout: 1s
    - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
      name: "default/echo-service-1"
      connect_timeout: 5s
      lb_policy: ROUND_ROBIN
      type: EDS
      outlier_detection:
        split_external_local_origin_errors: true
        consecutive_local_origin_failure: 2
    - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
      name: "default/echo-service-2"
      connect_timeout: 3s
      lb_policy: ROUND_ROBIN
      type: EDS
      outlier_detection:
        split_external_local_origin_errors: true
        consecutive_local_origin_failure: 2

As shown below: by default, a request to echo-service-1's /foo path returns 404, but a request carrying the header version: foo succeeds.

root@instance-00qqerhq:~/cilium-mesh/strateges# kubectl exec -it $CLIENT2 -- curl -H"version:foo" -I echo-service-1:8080/foo
HTTP/1.1 200 OK
x-powered-by: Express
vary: Origin, Accept-Encoding
access-control-allow-credentials: true
accept-ranges: bytes
cache-control: public, max-age=0
last-modified: Wed, 21 Sep 2022 10:25:56 GMT
etag: W/"809-1835f952f20"
content-type: text/html; charset=UTF-8
content-length: 2057
date: Mon, 21 Aug 2023 06:26:17 GMT
x-envoy-upstream-service-time: 0
server: envoy
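
For contrast, the same request without the version header should still fall through to the weighted route and return 404 for /foo (a hedged check):

kubectl exec -it $CLIENT2 -- curl -I echo-service-1:8080/foo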

This confirms that header-based canary routing is in effect.

Summary

In this article we deployed Cilium Mesh and explored its features through hands-on examples. Overall, this approach brings real convenience, but traffic management relies entirely on hand-written CiliumEnvoyConfig or CiliumClusterwideEnvoyConfig resources, which is not very friendly to users.