
Using lxcfs for Resource Isolation in Containers

Background

Containers use cgroups to limit resource consumption, including CPU and memory, but processes inside a container that read system parameters still see the host's values. For example, querying the CPU count from Node.js via the os module returns the host's CPU count, not the container's. This is because containers do not isolate the resource view exposed through filesystems such as /proc and /sys.

For example, deploy a Node.js container with CPU and memory limits:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-test
spec:
  selector:
    matchLabels:
      app: node
  replicas: 1
  template:
    metadata:
      labels:
        app: node
    spec:
      containers:
      - name: node
        image: node:12.18.4
        command:
        - sleep
        args:
        - infinity
        resources:
          limits:
            cpu: 2000m
            memory: "256Mi"
          requests:
            cpu: 2000m
            memory: "256Mi"

Entering the container, the CPU data under /proc/cpuinfo and the memory reported by free are both the host's values; likewise, starting node and reading CPU and memory through the os module also returns the host's data.
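
For example, inside the pod (each of these reports host values, not the 2-CPU / 256Mi limits):

# CPU count as seen through /proc
grep -c ^processor /proc/cpuinfo

# memory as reported by free
free -m

# the same data as Node.js sees it through the os module
node -e 'const os = require("os"); console.log(os.cpus().length, Math.round(os.totalmem()/1048576) + "Mi")'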

Resource View Isolation with lxcfs

Project repository: https://github.com/lxc/lxcfs
lxcfs is a small FUSE filesystem aimed at making containers look and feel more like virtual machines. It provides containers with the following files:

/proc/cpuinfo
/proc/diskstats
/proc/meminfo
/proc/stat
/proc/swaps
/proc/uptime
/proc/slabinfo
/sys/devices/system/cpu/online

When these files are mounted over their counterparts in a container, reads return the container's actual data instead of the host's: lxcfs answers each read according to the cgroup limits of the process reading the file.

Using lxcfs with Docker

Installing lxcfs

git clone https://github.com/lxc/lxcfs
cd lxcfs
meson setup -Dinit-script=systemd --prefix=/usr build/
meson compile -C build/
sudo meson install -C build/

# create the mount point and start lxcfs on it
sudo mkdir -p /var/lib/lxcfs
sudo lxcfs /var/lib/lxcfs

To run lxcfs as a system service instead, create a unit file (if you started lxcfs manually above, stop it first with fusermount -u /var/lib/lxcfs):

cat > /usr/lib/systemd/system/lxcfs.service <<EOF
[Unit]
Description=lxcfs

[Service]
ExecStart=/usr/bin/lxcfs -f /var/lib/lxcfs
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

Enable the service and start it immediately:

systemctl enable lxcfs --now
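
To verify that lxcfs is up and serving files, you can, for example, check the mount and read one of the virtualized files:

mount | grep lxcfs
cat /var/lib/lxcfs/proc/uptime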

Example: running a container with lxcfs

docker run -it -m 256m --memory-swap 256m \
      -v /var/lib/lxcfs/proc/cpuinfo:/proc/cpuinfo:rw \
      -v /var/lib/lxcfs/proc/diskstats:/proc/diskstats:rw \
      -v /var/lib/lxcfs/proc/meminfo:/proc/meminfo:rw \
      -v /var/lib/lxcfs/proc/stat:/proc/stat:rw \
      -v /var/lib/lxcfs/proc/swaps:/proc/swaps:rw \
      -v /var/lib/lxcfs/proc/uptime:/proc/uptime:rw \
      -v /var/lib/lxcfs/proc/slabinfo:/proc/slabinfo:rw \
      -v /var/lib/lxcfs/sys/devices/system/cpu:/sys/devices/system/cpu:rw \
      ubuntu:18.04 /bin/bash
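
Inside this container, free should now report a total of about 256 MB (the -m 256m limit) rather than the host's memory:

free -m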

Using lxcfs in Kubernetes

Using lxcfs in Kubernetes requires solving two problems:

  1. lxcfs must be installed on every node of the cluster
  2. every pod must mount the /proc files maintained by lxcfs under /var/lib/lxcfs

For the first problem, a DaemonSet controller guarantees that lxcfs is installed on every node. The following manifest, lxcfs.yaml, can be used directly:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: lxcfs
  labels:
    app: lxcfs
spec:
  selector:
    matchLabels:
      app: lxcfs
  template:
    metadata:
      labels:
        app: lxcfs
    spec:
      hostPID: true
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: lxcfs
        image: registry.cn-hangzhou.aliyuncs.com/toodo/lxcfs:3.1.2-build.0
        imagePullPolicy: Always
        securityContext:
          privileged: true
        volumeMounts:
        - name: cgroup
          mountPath: /sys/fs/cgroup
        - name: lxcfs
          mountPath: /var/lib/lxcfs
          mountPropagation: Bidirectional
        - name: usr-local
          mountPath: /usr/local
      volumes:
      - name: cgroup
        hostPath:
          path: /sys/fs/cgroup
      - name: usr-local
        hostPath:
          path: /usr/local
      - name: lxcfs
        hostPath:
          path: /var/lib/lxcfs
          type: DirectoryOrCreate

Deploy lxcfs to every node with:

kubectl apply -f lxcfs.yaml
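
One lxcfs pod should now be running on each node, which you can confirm with, for example:

kubectl get pods -l app=lxcfs -o wide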

For the second problem, you can either declare hostPath mounts for the /var/lib/lxcfs files in each pod spec, or inject them automatically with an admission webhook (see https://github.com/denverdino/lxcfs-admission-webhook). We take the first approach here (as in the earlier test Deployment, sleep infinity keeps the container running):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-test-lxcfs
spec:
  selector:
    matchLabels:
      app: node-test-lxcfs
  replicas: 1
  template:
    metadata:
      labels:
        app: node-test-lxcfs
    spec:
      containers:
      - name: node-test-lxcfs
        image: node:12.18.4
        command:
        - sleep
        args:
        - infinity
        resources:
          limits:
            cpu: 2000m
            memory: "256Mi"
          requests:
            cpu: 2000m
            memory: "256Mi"
        ports:
        - containerPort: 80
        volumeMounts:
        - name: cpuinfo
          mountPath: /proc/cpuinfo
        - name: diskstats
          mountPath: /proc/diskstats
        - name: meminfo
          mountPath: /proc/meminfo
        - name: stat
          mountPath: /proc/stat
        - name: swaps
          mountPath: /proc/swaps
        - name: uptime
          mountPath: /proc/uptime
      volumes:
      - name: cpuinfo
        hostPath:
          path: /var/lib/lxcfs/proc/cpuinfo
      - name: diskstats
        hostPath:
          path: /var/lib/lxcfs/proc/diskstats
      - name: meminfo
        hostPath:
          path: /var/lib/lxcfs/proc/meminfo
      - name: stat
        hostPath:
          path: /var/lib/lxcfs/proc/stat
      - name: swaps
        hostPath:
          path: /var/lib/lxcfs/proc/swaps
      - name: uptime
        hostPath:
          path: /var/lib/lxcfs/proc/uptime

Entering this container, the same commands now report the resources actually allocated to the container.
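
For example, repeating the earlier checks inside the pod should now show the container's limits:

grep -c ^processor /proc/cpuinfo   # 2, matching the CPU limit
free -m                            # total is now ~256, matching the memory limit
node -e 'const os = require("os"); console.log(os.cpus().length)'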

Appendix

Build files for the lxcfs image

Dockerfile

FROM centos:7 as build
RUN sed -e 's|^mirrorlist=|#mirrorlist=|g' -e 's|^#baseurl=http://mirror.centos.org/altarch/|baseurl=https://mirrors.ustc.edu.cn/centos-altarch/|g' -e 's|^#baseurl=http://mirror.centos.org/centos|baseurl=https://mirrors.ustc.edu.cn/centos|g' -i.bak /etc/yum.repos.d/CentOS-Base.repo && \
    yum -y update
RUN yum -y install fuse-devel pam-devel wget gcc automake autoconf libtool make
ENV LXCFS_VERSION 3.1.2
RUN wget https://linuxcontainers.org/downloads/lxcfs/lxcfs-$LXCFS_VERSION.tar.gz && \
    mkdir /lxcfs && tar xzvf lxcfs-$LXCFS_VERSION.tar.gz -C /lxcfs  --strip-components=1 && \
    cd /lxcfs && ./configure && make

FROM centos:7
STOPSIGNAL SIGINT
COPY --from=build /lxcfs/lxcfs /usr/local/bin/lxcfs
COPY --from=build /lxcfs/.libs/liblxcfs.so /usr/local/lib/lxcfs/liblxcfs.so
COPY --from=build /lxcfs/lxcfs /lxcfs/lxcfs
COPY --from=build /lxcfs/.libs/liblxcfs.so /lxcfs/liblxcfs.so
COPY --from=build /usr/lib64/libfuse.so.2.9.2 /lxcfs/libfuse.so.2.9.2
COPY --from=build /usr/lib64/libulockmgr.so.1.0.1 /lxcfs/libulockmgr.so.1.0.1

COPY start.sh /
CMD ["/start.sh"]
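
To build and push the image (this tag matches the one referenced by the DaemonSet above; substitute your own registry if needed):

docker build -t registry.cn-hangzhou.aliyuncs.com/toodo/lxcfs:3.1.2-build.0 .
docker push registry.cn-hangzhou.aliyuncs.com/toodo/lxcfs:3.1.2-build.0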

start.sh

#!/bin/bash

# Cleanup: unmount any stale lxcfs mount left on the host by a previous run
nsenter -m/proc/1/ns/mnt fusermount -u /var/lib/lxcfs 2> /dev/null || true
nsenter -m/proc/1/ns/mnt [ -L /etc/mtab ] || \
        sed -i "/^lxcfs \/var\/lib\/lxcfs fuse.lxcfs/d" /etc/mtab

# Remove leftover contents of /var/lib/lxcfs
rm -rf /var/lib/lxcfs/*

# Prepare target directories (both are host paths via the DaemonSet volume mounts)
mkdir -p /usr/local/lib/lxcfs /var/lib/lxcfs

# Update the lxcfs binary and library on the host through the /usr/local mount
cp -f /lxcfs/lxcfs /usr/local/bin/lxcfs
cp -f /lxcfs/liblxcfs.so /usr/local/lib/lxcfs/liblxcfs.so

cp -f /lxcfs/libfuse.so.2.9.2 /usr/lib64/libfuse.so.2.9.2
cp -f /lxcfs/libulockmgr.so.1.0.1 /usr/lib64/libulockmgr.so.1.0.1

# -f so the links are recreated idempotently when the pod restarts
ln -sf /usr/lib64/libfuse.so.2.9.2 /usr/lib64/libfuse.so.2
ln -sf /usr/lib64/libulockmgr.so.1.0.1 /usr/lib64/libulockmgr.so.1

# Mount: run lxcfs in the host's mount namespace so the mount lives on the node
exec nsenter -m/proc/1/ns/mnt /usr/local/bin/lxcfs /var/lib/lxcfs/
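
Note that start.sh does not mount lxcfs inside the container's own mount namespace: the final exec enters the host's mount namespace (/proc/1/ns/mnt), which is why the DaemonSet sets hostPID: true and mounts /var/lib/lxcfs with mountPropagation: Bidirectional. The FUSE mount is thus created on the node itself, where other pods can bind-mount the files it serves.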