xxxx Programmer's Notes

With thanks and respect to those who came before: notes on problems I ran into and things I looked up

Resource Monitoring #GPU #CPU

ayatk.hatenablog.com

# extensions/v1beta1 no longer serves DaemonSet as of Kubernetes 1.16; use apps/v1.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: prometheus
    chart: prometheus-try
    component: node-exporter
    heritage: Tiller
    release: prometheus
  name: prometheus-dcgm-exporter
  namespace: kube-mon
spec:
  selector:
    matchLabels:
      app: prometheus
      component: node-exporter
      release: prometheus
  template:
    metadata:
      labels:
        app: prometheus
        component: node-exporter
        release: prometheus
    spec:
      nodeSelector:
        nvidia.com/gpu: "true"
      containers:
        - image: nvidia/dcgm-exporter
          name: nvidia-dcgm-exporter
          securityContext:
            runAsNonRoot: false
            runAsUser: 0
          volumeMounts:
            - mountPath: /run/prometheus
              name: collector-textfiles
      hostNetwork: true
      hostPID: true
      # serviceAccount is a deprecated alias of serviceAccountName, so only the latter is kept.
      serviceAccountName: prometheus-node-exporter
      volumes:
        - name: collector-textfiles
          hostPath:
            path: /host/path/to/dcgm-exporter
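
The DaemonSet above only writes GPU metrics into the collector-textfiles volume; something still has to read them. As a rough sketch, assuming an older dcgm-exporter image that drops a .prom file under /run/prometheus (newer releases serve metrics over HTTP instead), a node-exporter pod could mount the same host path and point its textfile collector at it. The container name and image below are my assumptions and are not from the original manifest; the host path is the same placeholder as above.

```yaml
# Hypothetical fragment of a separate node-exporter pod spec (not part of the original manifest).
containers:
  - name: node-exporter                       # assumed container name
    image: quay.io/prometheus/node-exporter   # assumed image; pin a tag in practice
    args:
      # node_exporter flag: expose metrics found in *.prom files in this directory.
      - --collector.textfile.directory=/run/prometheus
    volumeMounts:
      - name: collector-textfiles
        mountPath: /run/prometheus
        readOnly: true                        # node-exporter only reads the files
volumes:
  - name: collector-textfiles
    hostPath:
      # Same placeholder host path that the dcgm-exporter DaemonSet writes to.
      path: /host/path/to/dcgm-exporter
```

With this in place, Prometheus scrapes node-exporter as usual and the GPU metrics show up alongside the CPU ones.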

Thanks!