Baidu AI Cloud
Cloud Container Engine

Description of CCE Dynamic Scheduling Plug-in

CCE Dynamic Scheduling Plugin

cce-dysched-extender is a plugin for the default Kubernetes scheduler. It uses the scheduler extender mechanism to register Filter and Prioritize hooks with kube-scheduler, which lets it influence the scheduling decisions of the default scheduler.

Node metrics come from the metrics-server component. Before deploying cce-dysched-extender, make sure that metrics-server is working normally in the cluster.
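One way to confirm this is to inspect the output of kubectl top nodes, which is served by metrics-server. The helper below is a minimal sketch and not part of the plugin: nodes_missing_metrics is a hypothetical name, and it assumes the usual five-column layout (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%) in which nodes without metrics show <unknown>.

```shell
# Hypothetical helper: list nodes for which no usable metrics are reported.
# stdin: output of `kubectl top nodes` (assumed five-column layout).
# Prints the affected node names; returns non-zero if any node lacks
# metrics or if no node rows were received at all.
nodes_missing_metrics() {
  awk '
    NR == 1 { next }                            # skip the header row
    { rows++ }
    $3 !~ /^[0-9]+%$/ || $5 !~ /^[0-9]+%$/ {    # CPU% or MEMORY% is not a percentage
      print $1
      missing = 1
    }
    END { exit (missing || rows == 0) ? 1 : 0 }
  '
}

# Usage on a live cluster:
#   kubectl top nodes | nodes_missing_metrics || echo "node metrics are incomplete"
```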

The main features of the plugin are as follows:

  • The Filter feature removes nodes whose resource utilization is above the configured threshold from the list of candidate nodes;
  • The Prioritize feature ranks nodes so that nodes with lower resource utilization get higher priority. CPU utilization is compared first, then memory utilization.

You can adjust the resource tolerance thresholds with the --tolerance-cpu-rate and --tolerance-memory-rate parameters to suit your workload. Both parameters default to 80%.
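The effect of the two hooks and their thresholds can be approximated offline. The following is a minimal sketch of the behavior described above, not the plugin's implementation: dysched_filter and dysched_prioritize are hypothetical names, the input is assumed to be kubectl top nodes output, and it is an assumption that a node is dropped when either its CPU or its memory utilization is above the threshold.

```shell
# Hypothetical approximation of the Filter hook.
# stdin: `kubectl top nodes` rows (NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%).
# $1, $2: CPU and memory thresholds in percent (default 80, as in the plugin).
# Prints "name cpu% mem%" for every node at or below both thresholds.
dysched_filter() {
  cpu_max="${1:-80}"
  mem_max="${2:-80}"
  awk -v cpu_max="$cpu_max" -v mem_max="$mem_max" '
    NR == 1 { next }                              # skip the header row
    {
      cpu = $3; mem = $5
      sub(/%$/, "", cpu); sub(/%$/, "", mem)
      if (cpu !~ /^[0-9]+$/ || mem !~ /^[0-9]+$/) next   # no metrics: skip the node
      if (cpu + 0 <= cpu_max + 0 && mem + 0 <= mem_max + 0) print $1, cpu, mem
    }'
}

# Hypothetical approximation of the Prioritize hook.
# stdin: output of dysched_filter. Prints node names, least utilized first,
# comparing CPU before memory.
dysched_prioritize() {
  sort -k2,2n -k3,3n | awk '{ print $1 }'
}

# Usage on a live cluster:
#   kubectl top nodes | dysched_filter 80 80 | dysched_prioritize
```

With both thresholds at 80, a node at 85% CPU is excluded even if its memory is nearly idle, and two nodes with equal CPU utilization are ordered by memory utilization.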

Installation and Deployment

  1. Deploy the cce-dysched-extender plugin by running kubectl apply -f all-in-one.yaml. The all-in-one.yaml file contains the following:
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: cce-dysched-extender
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: cce-dysched-extender
  name: cce-dysched-extender
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app: cce-dysched-extender
    app.kubernetes.io/name: cce-dysched-extender
  name: cce-dysched-extender
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - metrics.k8s.io
  resources:
  - nodes
  - pods
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app: cce-dysched-extender
    app.kubernetes.io/name: cce-dysched-extender
  name: cce-dysched-extender
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cce-dysched-extender
subjects:
- kind: ServiceAccount
  name: cce-dysched-extender
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/cce-load-balancer-internal-vpc: "true"
  labels:
    app: cce-dysched-extender
  name: cce-dysched-extender
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 8080
    targetPort: 8080
  selector:
    app: cce-dysched-extender
  sessionAffinity: None
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: cce-dysched-extender
  name: cce-dysched-extender
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cce-dysched-extender
  template:
    metadata:
      labels:
        app: cce-dysched-extender
    spec:
      containers:
      - args:
        - -v=3
        - --tolerance-cpu-rate=80
        - --tolerance-memory-rate=80
        image: registry.baidubce.com/cce-plugin-pro/cce-dysched-extender:v0.3.0
        imagePullPolicy: Always
        livenessProbe:
          httpGet:
            path: /healthz
            port: http
            scheme: HTTP
          periodSeconds: 60
        name: dysched-dysched
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /healthz
            port: http
            scheme: HTTP
          periodSeconds: 60
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
          requests:
            cpu: 100m
            memory: 100Mi
      serviceAccountName: cce-dysched-extender
  2. Get the external IP of the cce-dysched-extender service:
$ export DYSCHED_EXTENDER_SVC_IP=$(kubectl get service cce-dysched-extender -n kube-system | awk '{print $4}' | grep -v EXTER)
$ echo $DYSCHED_EXTENDER_SVC_IP
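
While the load balancer is still being provisioned, the EXTERNAL-IP column typically shows <pending>, and the pipeline above would store that string in the variable. A stricter parser could look like the following minimal sketch; extract_external_ip is a hypothetical helper, and the input is assumed to be the default kubectl get service table for a single service.

```shell
# Hypothetical helper: print the EXTERNAL-IP of the first listed service.
# stdin: output of `kubectl get service <name> -n <namespace>`.
# Returns non-zero when the column is empty or a placeholder such as
# <pending> or <none>.
extract_external_ip() {
  awk '
    NR == 2 { ip = $4 }                   # first data row, EXTERNAL-IP column
    END {
      if (ip == "" || ip ~ /^</) exit 1   # nothing usable yet
      print ip
    }'
}

# Usage on a live cluster:
#   DYSCHED_EXTENDER_SVC_IP=$(kubectl get service cce-dysched-extender -n kube-system \
#     | extract_external_ip) || echo "external IP not assigned yet" >&2
```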

Perform the remaining steps on every master node.

  3. Edit the /etc/kubernetes/scheduler-policy.json file (create it if it does not exist) so that it contains the following, replacing <DYSCHED_EXTENDER_SVC_IP> with the value obtained in the previous step.
{
  "kind": "Policy",
  "apiVersion": "v1",
  "extenders": [
    {
      "urlPrefix": "http://<DYSCHED_EXTENDER_SVC_IP>:8080/dysched/extender",
      "prioritizeVerb": "prioritize",
      "filterVerb": "filter",
      "enableHttps": false,
      "nodeCacheCapable": true,
      "ignorable": true,
      "weight": 10
    }
  ]
}
  4. Add --policy-config-file=/etc/kubernetes/scheduler-policy.json to the startup parameters of kube-scheduler by editing its service file:
$ vim /etc/systemd/system/kube-scheduler.service
  5. Restart kube-scheduler:
$ systemctl daemon-reload && systemctl restart kube-scheduler
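
Because these steps are repeated on every master node, the scheduler-policy.json file can be generated instead of edited by hand. A minimal sketch follows; render_scheduler_policy is a hypothetical helper that prints the same policy shown above with the service address substituted.

```shell
# Hypothetical helper: print the scheduler policy for a given extender address.
# $1: external IP of the cce-dysched-extender service.
render_scheduler_policy() {
  ip="$1"
  if [ -z "$ip" ]; then
    echo "usage: render_scheduler_policy <extender-ip>" >&2
    return 1
  fi
  cat <<EOF
{
  "kind": "Policy",
  "apiVersion": "v1",
  "extenders": [
    {
      "urlPrefix": "http://${ip}:8080/dysched/extender",
      "prioritizeVerb": "prioritize",
      "filterVerb": "filter",
      "enableHttps": false,
      "nodeCacheCapable": true,
      "ignorable": true,
      "weight": 10
    }
  ]
}
EOF
}

# Usage on a master node:
#   render_scheduler_policy "$DYSCHED_EXTENDER_SVC_IP" > /etc/kubernetes/scheduler-policy.json
```

Refusing an empty address avoids writing a policy whose urlPrefix points at no host, which kube-scheduler would otherwise load and silently ignore because the extender is marked ignorable.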