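Kubernetes Application Quality Management

# Quality of Service Management

In Kubernetes, the Pod is the smallest scheduling unit, so every property related to resources and scheduling is a field of the Pod object. The most important of these are CPU and memory. For example: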

---
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
spec:
  containers:
    - name: myweb
      image: wordpress
      imagePullPolicy: IfNotPresent
      resources:
        requests:
          memory: "128Mi"
          cpu: "250m"
        limits:
          memory: "256Mi"
          cpu: "500m"

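The resources field is where resource requests and limits are configured.

Note: a Pod can define multiple containers, and resource settings are configured on each container, so the Pod's overall resource configuration is the sum over all of its containers.

In Kubernetes, CPU is treated as a "compressible" resource: when there is not enough of it, a Pod is throttled ("starved") but does not exit. Memory is an "incompressible" resource: when it runs out, the Pod's container is killed with an OOM (out of memory) error.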

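CPU is measured in numbers of CPUs. For example, cpu=1 means the Pod's CPU quota is 1 CPU. Whether that is 1 physical core, 1 vCPU, or 1 hyperthread depends on how the host exposes its CPUs; Kubernetes only guarantees that the Pod can use the computing capacity of 1 CPU.

Kubernetes allows fractional CPU values. The limits.cpu value of 500m set above means 500 millicpu, that is, 0.5 CPU, so the Pod gets half of one CPU's computing capacity. The same setting can be written as cpu=0.5, but the 500m form is recommended because it is how Kubernetes represents CPU internally.

Memory is measured in bytes. Values can use the suffixes Ei, Pi, Ti, Gi, Mi, and Ki. Note the difference between Mi and M: 1Mi = 1024 × 1024 bytes, while 1M = 1000 × 1000 bytes.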
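
The unit rules above can be checked with a small converter. This is a minimal sketch, not the parser Kubernetes itself uses; the function names `parse_cpu_millicores` and `parse_memory_bytes` are made up for illustration, and only the suffixes mentioned in this article are handled.

```python
from decimal import Decimal

# Binary (power-of-two) and decimal (power-of-ten) memory suffixes.
_BINARY = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
           "Ti": 1024**4, "Pi": 1024**5, "Ei": 1024**6}
_DECIMAL = {"k": 1000, "M": 1000**2, "G": 1000**3,
            "T": 1000**4, "P": 1000**5, "E": 1000**6}


def parse_cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "500m", "0.5" or "2" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def parse_memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "128Mi" or "128M" to bytes."""
    # Check the two-letter binary suffixes first so "Mi" is not read as "M".
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    for suffix, factor in _DECIMAL.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))  # no suffix: plain bytes


print(parse_cpu_millicores("500m"), parse_cpu_millicores("0.5"))  # 500 500
print(parse_memory_bytes("1Mi"), parse_memory_bytes("1M"))        # 1048576 1000000
```

The last line makes the Mi versus M gap concrete: the two values differ by 48576 bytes per unit, which adds up quickly for large limits.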

CPU and memory settings for a Pod come in two kinds, requests and limits:

spec.containers[].resources.limits.cpu
spec.containers[].resources.limits.memory
spec.containers[].resources.requests.cpu
spec.containers[].resources.requests.memory

The two differ as follows:

During scheduling, kube-scheduler bases its calculations on the requests values;
When configuring cgroups, the kubelet applies the limits values as the enforced caps.
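
To make the split concrete, the sketch below shows how the CPU values from the first example could translate into numbers. It assumes cgroup v1 naming (cpu.shares, cpu.cfs_quota_us) and the default 100 ms CFS period; on cgroup v2 the files are named differently. It also shows that CPU requests end up as a relative cgroup weight in addition to driving scheduling. The function names are illustrative only.

```python
CFS_PERIOD_US = 100_000  # assumed default CFS period of 100 ms


def cpu_shares(request_millicores: int) -> int:
    """requests.cpu -> relative CPU weight (cgroup v1 cpu.shares)."""
    return max(2, request_millicores * 1024 // 1000)


def cfs_quota_us(limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """limits.cpu -> hard cap of CPU time per period (cgroup v1 cpu.cfs_quota_us)."""
    return max(1000, limit_millicores * period_us // 1000)


def fits_on_node(request_millicores: int,
                 allocatable_millicores: int,
                 already_requested_millicores: int) -> bool:
    """Scheduler-style check: only requests count, not limits or live usage."""
    return already_requested_millicores + request_millicores <= allocatable_millicores


# The pod-demo container: requests.cpu=250m, limits.cpu=500m.
print(cpu_shares(250), cfs_quota_us(500))  # 256 50000
```

A quota of 50000 µs per 100000 µs period is exactly the "half a CPU" described for 500m.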

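# QoS Model

Kubernetes supports three QoS classes, assigned according to how requests and limits are configured.

# Guaranteed

When every container in a Pod sets both requests and limits, and the values are equal, the Pod belongs to the Guaranteed class. For example: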

apiVersion: v1
kind: Pod
metadata:
  name: qos-demo
  namespace: qos-example
spec:
  containers:
    - name: qos-demo-ctr
      image: nginx
      resources:
        limits:
          memory: "200Mi"
          cpu: "700m"
        requests:
          memory: "200Mi"
          cpu: "700m"

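Note that if a Pod sets only limits and no requests, the system assigns requests equal to the limits by default, so the Pod is also classified as Guaranteed.

# Burstable

When a Pod does not meet the Guaranteed criteria but at least one of its containers sets a request or a limit, the Pod is classified as Burstable. For example: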

apiVersion: v1
kind: Pod
metadata:
  name: qos-demo-2
  namespace: qos-example
spec:
  containers:
  - name: qos-demo-2-ctr
    image: nginx
    resources:
      limits:
        memory: "200Mi"
      requests:
        memory: "100Mi"

# BestEffort

If a Pod sets neither requests nor limits, its QoS class is BestEffort.

apiVersion: v1
kind: Pod
metadata:
  name: qos-demo-3
  namespace: qos-example
spec:
  containers:
    - name: qos-demo-3-ctr
      image: nginx
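
The three rules can be summarized in one function. This is a simplified sketch: `qos_class` is a hypothetical helper, it looks only at cpu and memory, and it compares quantities as plain strings, so "0.5" and "500m" would wrongly count as different values.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers: list[dict]) -> str:
    """Classify a Pod from the `resources` dictionaries of its containers."""
    any_set = False
    guaranteed = True
    for res in containers:
        requests = res.get("requests", {})
        limits = res.get("limits", {})
        if requests or limits:
            any_set = True
        for name in RESOURCES:
            limit = limits.get(name)
            # A missing request defaults to the limit.
            request = requests.get(name, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# The three example Pods from this section.
print(qos_class([{"limits": {"memory": "200Mi", "cpu": "700m"},
                  "requests": {"memory": "200Mi", "cpu": "700m"}}]))  # Guaranteed
print(qos_class([{"limits": {"memory": "200Mi"},
                  "requests": {"memory": "100Mi"}}]))                 # Burstable
print(qos_class([{}]))                                                # BestEffort
```

On a live cluster the assigned class appears in the Pod's status.qosClass field, which is the authoritative answer.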

QoS classes matter mainly when a host is short on resources and the kubelet has to evict Pods. The default eviction thresholds are:

memory.available<100Mi
nodefs.available<10%
nodefs.inodesFree<5%
imagefs.available<15%

These thresholds can be configured on the kubelet:

kubelet --eviction-hard='imagefs.available<10%,memory.available<500Mi,nodefs.available<5%,nodefs.inodesFree<5%' --eviction-soft='imagefs.available<30%,nodefs.available<10%' --eviction-soft-grace-period='imagefs.available=2m,nodefs.available=2m' --eviction-max-pod-grace-period=600

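Eviction in Kubernetes comes in two modes, soft eviction and hard eviction.

Soft eviction allows a grace period. With imagefs.available=2m as set above, eviction starts only after imagefs has stayed below its threshold for 2 minutes;
Hard eviction starts as soon as the threshold is reached.

Once an eviction threshold is reached, the host enters the MemoryPressure or DiskPressure condition, which keeps new Pods from being scheduled onto it. When eviction happens, the kubelet removes Pods in the following order: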

BestEffort Pods;
Burstable Pods whose usage of the starved resource exceeds their requests;
Guaranteed Pods, which are selected for eviction only when their usage exceeds their limits or when the host itself is under memory pressure.
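
The ordering above can be sketched as a sort. This only mirrors the three-step order described in this article; the real kubelet ranking also considers other factors such as Pod priority. `PodUsage` and `eviction_order` are hypothetical names, and memory is the only resource modeled.

```python
from dataclasses import dataclass

_QOS_RANK = {"BestEffort": 0, "Burstable": 1, "Guaranteed": 2}


@dataclass
class PodUsage:
    name: str
    qos: str
    memory_request_mib: int
    memory_usage_mib: int


def eviction_order(pods: list[PodUsage]) -> list[str]:
    """Order Pods from first evicted to last evicted under memory pressure."""
    def key(pod: PodUsage):
        overuse = pod.memory_usage_mib - pod.memory_request_mib
        # Lower QoS rank goes first; within a class, the largest overuse goes first.
        return (_QOS_RANK[pod.qos], -overuse)
    return [pod.name for pod in sorted(pods, key=key)]


pods = [
    PodUsage("guaranteed-db", "Guaranteed", 200, 150),
    PodUsage("burstable-api", "Burstable", 100, 120),
    PodUsage("besteffort-job", "BestEffort", 0, 50),
    PodUsage("burstable-cache", "Burstable", 100, 300),
]
print(eviction_order(pods))
# ['besteffort-job', 'burstable-cache', 'burstable-api', 'guaranteed-db']
```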

# cpuset

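cpuset pins a container to specific CPU cores, which reduces CPU context switching. It relies on the kubelet's static CPU manager policy and has two requirements:

The Pod must be in the Guaranteed class;
The Pod's CPU requests and limits must be set to the same value, and that value must be a whole number of CPUs.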

spec:
  containers:
    - name: nginx
      image: nginx
      resources:
        limits:
          memory: "200Mi"
          cpu: "2"
        requests:
          memory: "200Mi"
          cpu: "2"

# LimitRange

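When configuring application Pods you normally set the quality-of-service parameters, that is, requests and limits. But when there are many Pods and most of them need the same settings, adding them one by one as above becomes tedious. In that case a LimitRange can apply namespace-wide constraints and defaults. If a Pod specifies requests and limits at deployment time, the specified values take effect; otherwise the LimitRange defaults are added to the Pod.

In summary, a LimitRange can:

Constrain the minimum and maximum resource usage of each Pod or container in a namespace.
Constrain the range of storage requests for each PVC in a namespace.
Constrain the ratio between limits and requests for a resource in a namespace.
Set default resource requests and limits.

Common scenarios (from 《Kubernetes 权威指南》):

Every node in the cluster has 2GB of memory, and the cluster administrator does not want any Pod to request more than 2GB, because no node in the cluster could satisfy such a request. A Pod configured with more than 2GB of memory would never be scheduled onto any node. To prevent this, the administrator wants to forbid Pods from requesting more than 2GB of memory.
The cluster is shared by two teams in the same organization, one running production and one running development. Production may use up to 8GB of memory, while development may use up to 512MB. The administrator wants to meet this need by creating a separate namespace for each environment and setting different limits on each namespace.
A user-created Pod may use slightly less than a whole machine's resources, leaving an awkward remainder: too small to run other workloads, yet wasteful when added up across the cluster. The administrator therefore wants every Pod to use at least 20% of the cluster's average resources (CPU and memory), which gives more consistent scheduling and reduces waste.

(1) First, create a namespace: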

apiVersion: v1
kind: Namespace
metadata:
  name: coolops

(2) Configure a LimitRange for the namespace:

apiVersion: v1
kind: LimitRange
metadata:
  name: mylimit
  namespace: coolops
spec:
  limits:
    - max:
        cpu: "1"
        memory: 1Gi
      min:
        cpu: 100m
        memory: 10Mi
      maxLimitRequestRatio:
        cpu: 3
        memory: 4
      type: Pod
    - default:
        cpu: 300m
        memory: 200Mi
      defaultRequest:
        cpu: 200m
        memory: 100Mi
      max:
        cpu: "2"
        memory: 1Gi
      min:
        cpu: 100m
        memory: 10Mi
      maxLimitRequestRatio:
        cpu: 5
        memory: 4
      type: Container

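Parameter descriptions:

max: for type Pod, the upper bound on the sum of the limits of all containers in the Pod, that is, the maximum limit for the whole Pod. If the limits defined in the Pod exceed the value in the LimitRange, the Pod cannot be created. For type Container, the meaning is analogous.
min: for type Pod, the lower bound on the sum of the requests of all containers in the Pod. If the total requests are smaller than min, the Pod cannot be created. For type Container, the meaning is analogous.
maxLimitRequestRatio: for type Pod, the upper bound on the ratio between the limits and the requests of all containers in the Pod. For example, if the Pod's CPU limit is 3 and its request is 0.5, the ratio is 6 and creating the Pod fails.
defaultRequest and default (the default limit) provide default values. These two settings exist only for type Container.

Notes:

(1) If a Container max is set, every container in the Pod needs a limit. If a container does not set one, the default limit is used; if no default limit is available either, the Pod cannot be created.

(2) If a Container min is set, every container needs a request. If a container does not set one, the request falls back to defaultRequest or to the container's own limit; if neither is available, creation fails with an error.

Create the LimitRange defined above: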
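
The Container rules can be expressed as a small admission check. This is a hedged sketch, not the actual LimitRange admission plugin: `admit_container` and `CONTAINER_RULE` are hypothetical names, all quantities are plain integers (millicores for cpu, MiB for memory), and the defaulting order is an assumption in which an explicit limit without a request makes the request equal to that limit.

```python
def admit_container(resources: dict, rule: dict) -> tuple[dict, list[str]]:
    """Apply defaults from a Container-type rule, then list any violations."""
    requests = dict(resources.get("requests", {}))
    limits = dict(resources.get("limits", {}))
    errors: list[str] = []
    for name in sorted(set(rule.get("max", {})) | set(rule.get("min", {}))):
        # Assumed defaulting order: explicit limit -> request, then rule defaults.
        if name in limits and name not in requests:
            requests[name] = limits[name]
        if name not in limits and name in rule.get("default", {}):
            limits[name] = rule["default"][name]
        if name not in requests and name in rule.get("defaultRequest", {}):
            requests[name] = rule["defaultRequest"][name]

        request, limit = requests.get(name), limits.get(name)
        if name in rule.get("min", {}) and (request is None or request < rule["min"][name]):
            errors.append(f"{name}: request below min")
        if name in rule.get("max", {}) and (limit is None or limit > rule["max"][name]):
            errors.append(f"{name}: limit above max")
        ratio = rule.get("maxLimitRequestRatio", {}).get(name)
        if ratio and request and limit and limit / request > ratio:
            errors.append(f"{name}: limit/request ratio above {ratio}")
    return {"requests": requests, "limits": limits}, errors


# The Container entry of the mylimit LimitRange, in millicores and MiB.
CONTAINER_RULE = {
    "default": {"cpu": 300, "memory": 200},
    "defaultRequest": {"cpu": 200, "memory": 100},
    "max": {"cpu": 2000, "memory": 1024},
    "min": {"cpu": 100, "memory": 10},
    "maxLimitRequestRatio": {"cpu": 5, "memory": 4},
}

# Requests only: the default limits are filled in.
print(admit_container({"requests": {"cpu": 200, "memory": 200}}, CONTAINER_RULE))
# ({'requests': {'cpu': 200, 'memory': 200}, 'limits': {'cpu': 300, 'memory': 200}}, [])
```

The Pod-type entry would be checked the same way against the sums over all containers.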

$ kubectl apply -f limitrange.yaml
limitrange/mylimit created

$ kubectl get limitrange -n coolops
NAME      CREATED AT
mylimit   2022-08-02T06:08:43Z

$ kubectl describe limitranges -n coolops mylimit
Name:       mylimit
Namespace:  coolops
Type        Resource  Min   Max  Default Request  Default Limit  Max Limit/Request Ratio
----        --------  ---   ---  ---------------  -------------  -----------------------
Pod         cpu       100m  1    -                -              3
Pod         memory    10Mi  1Gi  -                -              4
Container   cpu       100m  2    200m             300m           5
Container   memory    10Mi  1Gi  100Mi            200Mi          4

(3) Create a Pod whose requests and limits are within the allowed range:

apiVersion: v1
kind: Pod
metadata:
  name: pod01
  namespace: coolops
spec:
  containers:
    - name: pod-01
      image: nginx
      imagePullPolicy: IfNotPresent
      resources:
        requests:
          cpu: 200m
          memory: 30Mi
        limits:
          cpu: 300m
          memory: 50Mi

Running kubectl apply -f pod-01.yaml creates the Pod successfully.

(4) Create a Pod whose CPU exceeds the allowed range:

apiVersion: v1
kind: Pod
metadata:
  name: pod02
  namespace: coolops
spec:
  containers:
    - name: pod-02
      image: nginx
      imagePullPolicy: IfNotPresent
      resources:
        requests:
          cpu: 200m
          memory: 30Mi
        limits:
          cpu: 2
          memory: 50Mi

Creating it fails with the following error:

$ kubectl apply -f pod-02.yaml
Error from server (Forbidden): error when creating "pod-02.yaml": pods "pod02" is forbidden: [maximum cpu usage per Pod is 1, but limit is 2, cpu max limit to request ratio per Pod is 3, but provided ratio is 10.000000, cpu max limit to request ratio per Container is 5, but provided ratio is 10.000000]

(5) Create a Pod below the allowed range:

apiVersion: v1
kind: Pod
metadata:
  name: pod03
  namespace: coolops
spec:
  containers:
    - name: pod-03
      image: nginx
      imagePullPolicy: IfNotPresent
      resources:
        requests:
          cpu: 200m
          memory: 30Mi
        limits:
          cpu: 100m
          memory: 10Mi

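This Pod is rejected as well. Note that the error shown comes from basic Pod validation, because the requests are larger than the limits, and not from the LimitRange itself:

$ kubectl apply -f pod-03.yaml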
The Pod "pod03" is invalid:
* spec.containers[0].resources.requests: Invalid value: "200m": must be less than or equal to cpu limit
* spec.containers[0].resources.requests: Invalid value: "30Mi": must be less than or equal to memory limit

(6) Create a Pod that leaves requests or limits undefined:

apiVersion: v1
kind: Pod
metadata:
  name: pod04
  namespace: coolops
spec:
  containers:
    - name: pod-04
      image: nginx
      imagePullPolicy: IfNotPresent
      resources:
        requests:
          cpu: 200m
          memory: 200Mi

After the Pod is created, limits have been added automatically:

$ kubectl describe pod -n coolops pod04
---
Limits:
  cpu: 300m
  memory: 200Mi
Requests:
  cpu: 200m
  memory: 200Mi

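Here I specified only requests, and the LimitRange added the default limits automatically. You can also try setting neither, or only one of them; the principle is the same. Keep maxLimitRequestRatio in mind: the resulting ratio of limit to request must be less than or equal to the configured value.

As mentioned above, a LimitRange can also constrain PVCs: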

apiVersion: v1
kind: LimitRange
metadata:
  name: storagelimits
  namespace: coolops
spec:
  limits:
    - type: PersistentVolumeClaim
      max:
        storage: 2Gi
      min:
        storage: 1Gi

Once it is created, you can inspect it:

$ kubectl describe limitranges -n coolops storagelimits
Name:                  storagelimits
Namespace:             coolops
Type                   Resource  Min  Max  Default Request  Default Limit  Max Limit/Request Ratio
----                   --------  ---  ---  ---------------  -------------  -----------------------
PersistentVolumeClaim  storage   1Gi  2Gi  -                -              -

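You can create a PVC to test it; the principle is the same.

# Service Availability Management

# High Availability

Production-grade applications, apart from special cases such as batch jobs, are kept highly available, so high availability has to be considered when designing an application's Pods.

The simplest measure is running multiple replicas, that is, at least 2 replicas when creating the application. Setting replicas to 3, as below, gives the application 3 replicas: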

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx-deployment
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      containers:
        - image: nginx:1.8
          imagePullPolicy: IfNotPresent
          name: nginx
          resources:
            requests:
              cpu: 0.5
              memory: 500M
            limits:
              cpu: 0.5
              memory: 500M
          ports:
            - containerPort: 80
              name: http
              protocol: TCP

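But is configuring multiple replicas enough?

If all three replicas are scheduled onto one server and that server goes down, the application becomes unavailable.

To solve this, configure anti-affinity for the application so that its Pods are not scheduled onto the same host. The YAML above becomes: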

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx-deployment
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - nginx
              topologyKey: kubernetes.io/hostname
      containers:
        - image: nginx:1.8
          imagePullPolicy: IfNotPresent
          name: nginx
          resources:
            requests:
              cpu: 0.5
              memory: 500M
            limits:
              cpu: 0.5
              memory: 500M
          ports:
            - containerPort: 80
              name: http
              protocol: TCP

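This ensures that Pods of the same application are not scheduled onto the same node, which covers basic high availability.

# Availability

High availability alone is not enough: if the application itself is not working, users still see failures.

The default update strategy of a Kubernetes Deployment is a rolling update. To make sure the new version is usable after an update, use a readinessProbe, so that old Pods are stopped only once the new ones are ready. The YAML above becomes: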
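
The effect of the required anti-affinity rule can be sketched as a node filter. This is an illustration only, not scheduler code; `feasible_nodes` is a hypothetical helper, and each node counts as its own topology domain, which is what topologyKey: kubernetes.io/hostname means.

```python
def feasible_nodes(nodes: list[str],
                   placements: dict[str, list[dict]],
                   match_labels: dict) -> list[str]:
    """Return the nodes that host no Pod whose labels match `match_labels`.

    `placements` maps a node name to the label dictionaries of its Pods.
    """
    def matches(labels: dict) -> bool:
        return all(labels.get(k) == v for k, v in match_labels.items())

    return [
        node for node in nodes
        if not any(matches(pod_labels) for pod_labels in placements.get(node, []))
    ]


nodes = ["node-a", "node-b", "node-c"]
placements = {"node-a": [{"app": "nginx"}], "node-b": [{"app": "redis"}]}
print(feasible_nodes(nodes, placements, {"app": "nginx"}))  # ['node-b', 'node-c']
```

Because the rule is required and not preferred, a replica with no feasible node stays Pending. With 3 replicas this setup therefore needs at least 3 schedulable nodes.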

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx-deployment
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - nginx
              topologyKey: kubernetes.io/hostname
      containers:
        - image: nginx:1.8
          imagePullPolicy: IfNotPresent
          name: nginx
          resources:
            requests:
              cpu: 0.5
              memory: 500M
            limits:
              cpu: 0.5
              memory: 500M
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /
              port: http
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 3
          ports:
            - containerPort: 80
              name: http
              protocol: TCP

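This at least ensures that a new version receives external traffic only once it is reachable.

But what if the application fails while it is running? That is what livenessProbe is for: it keeps the application available over time. The YAML above becomes: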
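
How failureThreshold and successThreshold interact can be modeled with a small state tracker. This is a simplified sketch of the consecutive-result counting, under the assumption that a container starts out not ready; `ProbeState` is a made-up class, not part of any Kubernetes client library.

```python
class ProbeState:
    """Track readiness from a stream of probe results."""

    def __init__(self, failure_threshold: int = 3, success_threshold: int = 1):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.ready = False  # assumed initial state: not ready
        self._streak = 0    # consecutive results that contradict `ready`

    def observe(self, success: bool) -> bool:
        """Record one probe result and return the resulting readiness."""
        if success == self.ready:
            self._streak = 0  # the result agrees with the current state
            return self.ready
        self._streak += 1
        needed = self.success_threshold if success else self.failure_threshold
        if self._streak >= needed:
            self.ready = success
            self._streak = 0
        return self.ready


state = ProbeState(failure_threshold=3, success_threshold=1)
print([state.observe(r) for r in (True, False, False, False)])
# [True, True, True, False]
```

With periodSeconds: 10 and failureThreshold: 3 as in the YAML above, a Pod that stops responding keeps receiving traffic for roughly 30 seconds before it is marked not ready.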

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx-deployment
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: kubernetes.io/hostname
      containers:
      - image: nginx:1.8
        imagePullPolicy: IfNotPresent
        name: nginx
        resources:
          requests:
            cpu: 0.5
            memory: 500M
          limits:
            cpu: 0.5
            memory: 500M
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: http
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: http
            scheme: HTTP
          initialDelaySeconds: 20
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        ports:
        - containerPort: 80
          name: http
          protocol: TCP

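The readinessProbe and livenessProbe above keep the application available while it is running. But how do we make sure it shuts down safely when it exits?

A safe exit means the application runs its shutdown logic and handles the termination signal properly, which is what is usually called a graceful shutdown.

There are two common ways to achieve a graceful shutdown:

The application itself handles the SIGTERM signal.
A preStop hook is configured, and the hook specifies how to stop the container gracefully.

Setting aside SIGTERM handling inside the application, which we assume is in place, our job is to help it exit gracefully. In Kubernetes this is done with a preStop hook. The YAML above becomes: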
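
The first approach could look like the sketch below. It assumes a simple request loop; `serve` and `handle_one_request` are placeholders for the application's own main loop, not part of any framework.

```python
import signal
import threading

shutdown_requested = threading.Event()


def on_sigterm(signum, frame):
    """Record the request only; the main loop performs the actual cleanup."""
    shutdown_requested.set()


def serve(handle_one_request, poll_seconds: float = 0.1) -> None:
    """Run until SIGTERM arrives, then return so the process can exit cleanly."""
    # Must be called from the main thread, where signal handlers are installed.
    signal.signal(signal.SIGTERM, on_sigterm)
    while not shutdown_requested.is_set():
        handle_one_request()
        shutdown_requested.wait(poll_seconds)
    # In-flight work is finished here: close connections, flush logs, and so on.
```

The handler does nothing but set a flag, which keeps it safe to run at any point; the loop finishes the current request before it checks the flag and returns.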

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx-deployment
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: kubernetes.io/hostname
      containers:
      - image: nginx:1.8
        imagePullPolicy: IfNotPresent
        name: nginx
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - sleep 15
        resources:
          requests:
            cpu: 0.5
            memory: 500M
          limits:
            cpu: 0.5
            memory: 500M
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: http
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: http
            scheme: HTTP
          initialDelaySeconds: 20
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        ports:
        - containerPort: 80
          name: http
          protocol: TCP

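Of course, this is only an example; the actual configuration needs to be adjusted to your environment. For instance, if you use a service registry such as zk or nacos, the hook should also deregister the service from the registry.

# PDB

The configuration above is basically enough to run an application smoothly on Kubernetes, but node maintenance is unavoidable, for example upgrading the kernel or rebooting servers.

Not every application can run multiple replicas either. To keep one or more applications from being torn down outright and becoming unavailable during kubectl drain, Kubernetes introduced the PodDisruptionBudget (PDB), which controls how many Pods of an application stay running.

A PDB controls the number of Pods through two parameters:

minAvailable: the minimum number of available Pods, given either as the number of the application's Pods that must be running or as a percentage of running Pods relative to the total;
maxUnavailable: the maximum number of unavailable Pods, given either as a number of the application's Pods or as a percentage of unavailable Pods relative to the total.

Note: minAvailable and maxUnavailable are mutually exclusive; only one of them can be set at a time.

kubectl drain honors PodDisruptionBudget. During a drain it checks the PDB against the application's Pod count, so that Pods are removed without interrupting the service or degrading its SLA. During kubectl drain or any other voluntary eviction, Kubernetes decides as follows:

minAvailable set to the number 5: at least 5 healthy Pods of the application must remain available for the operation to proceed.
minAvailable set to the percentage 30%: at least 30% of the application's Pods must remain healthy and available for the operation to proceed.
maxUnavailable set to the number 5: at most 5 Pods of the application may be unavailable for the operation to proceed.
maxUnavailable set to the percentage 30%: at most 30% of the application's Pods may be unavailable for the operation to proceed.

In the extreme case, setting maxUnavailable to 0 (or 0%) means kubectl drain cannot evict any of these Pods. Likewise, setting minAvailable to 100%, or to the application's full replica count, also blocks kubectl drain.

Note: a PodDisruptionBudget does not constrain the application's Pods in every situation. It only protects the service from interruption or SLA degradation during voluntary evictions, for example when running kubectl drain.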

(1) Define minAvailable:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-demo
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: nginx

(2) Define maxUnavailable:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-demo
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: nginx

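As shown, a PDB is associated with the application's Pods through label selectors. During voluntary evictions it ensures that at most 1 Pod labeled app: nginx is unavailable; with 3 replicas, at least 2 stay running.

# Summary

The above covers only basic availability safeguards for an application on Kubernetes. In production an application does not stand alone; it depends on upstream and downstream services, so availability has to be safeguarded across the whole call chain for the application to be truly stable.

Last updated: 2025/07/19, 21:23:02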
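
The arithmetic behind both settings can be sketched as follows. This is a simplified model and not the disruption controller itself: `disruptions_allowed` is a hypothetical helper, and percentages are assumed to round up to whole Pods.

```python
def _scaled(value, total: int) -> int:
    """Resolve an int, or a percentage string such as "30%", against `total`."""
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        return (total * percent + 99) // 100  # integer ceiling: round up
    return int(value)


def disruptions_allowed(expected: int, healthy: int, *,
                        min_available=None, max_unavailable=None) -> int:
    """How many Pods may be evicted voluntarily right now."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available / max_unavailable")
    if min_available is not None:
        desired_healthy = _scaled(min_available, expected)
    else:
        desired_healthy = expected - _scaled(max_unavailable, expected)
    return max(0, healthy - desired_healthy)


# Three nginx replicas, all healthy.
print(disruptions_allowed(3, 3, max_unavailable=1))    # 1
print(disruptions_allowed(3, 3, min_available=2))      # 1
print(disruptions_allowed(3, 3, min_available="100%")) # 0
```

When the result is 0, an eviction request is refused and kubectl drain keeps waiting, which is why a budget that can never be satisfied blocks node maintenance.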
