Azure Kubernetes Service: pulling from an image registry without SSL


A note on a problem I ran into at work: our Gitlab domain had no signed certificate, but an external service (Azure Kubernetes Service) still needed to access the Gitlab Container Registry.

What happened?

  1. The Gitlab domain has no purchased SSL certificate (company policy also rules out Let's Encrypt)

  2. Kubernetes needs a username and password to pull from the Gitlab Container Registry

  3. Kubernetes does not trust the Gitlab Container Registry's certificate when pulling

Looking at each problem

Gitlab domain without SSL

First, let's see what can happen when we pull an image from this Container Registry with docker on a host.

Here is the docker test environment I used:

$docker version
Client:
 Version:           19.03.6
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        369ce74a3c
 Built:             Fri Feb 28 23:45:43 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.6
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       369ce74a3c
  Built:            Wed Feb 19 01:06:16 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.3-0ubuntu1~18.04.2
  GitCommit:
 runc:
  Version:          spec: 1.0.1-dev
  GitCommit:
 docker-init:
  Version:          0.18.0
  GitCommit:

Next, I used the Docker CLI to pull a private image from the Gitlab Container Registry. Because of company policy I have replaced the company's real address; sorry for the inconvenience.

$docker pull registry-gitlab.com.tw/repo/image-cli:v4.4.0
Error response from daemon: Get registry-gitlab.com.tw/v2/: x509: certificate is not authorized to sign other certificates

Pulling directly fails with x509: certificate is not authorized to sign other certificates. At first glance this looks easy to fix: a quick search suggests that a docker login is all it takes.
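To reproduce the verification failure without Docker, here is a minimal Python sketch that attempts a certificate-verified TLS handshake against the registry using the system's trusted CAs. The helper name probe_tls and the default port 443 are illustrative assumptions, and Python reports OpenSSL's wording rather than Go's "x509:" message, so the text will differ from what Docker prints.

```python
import socket
import ssl


def probe_tls(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Try a certificate-verified TLS handshake (hypothetical helper).

    Returns "ok" if the handshake succeeds, otherwise a short description.
    """
    # Default context: system CA bundle, certificate and hostname checks enabled.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
    except ssl.SSLCertVerificationError as exc:
        # The server's certificate chain could not be verified against trusted CAs.
        return f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # DNS failures, refused connections, timeouts and other TLS errors.
        return f"connection failed: {exc}"
```

Calling probe_tls("registry-gitlab.com.tw") from a machine that can reach the registry should report a certificate verification failure for as long as the certificate is not trusted; an "ok" result would mean TLS is not the problem.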

Before that, let's look at the relevant Docker configuration files.

$cat /lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity

# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

# kill only the docker process, not all processes in the cgroup
KillMode=process

[Install]
WantedBy=multi-user.target

Above are Docker's systemd settings. They are quite simple, and nothing in them bypasses certificate checks for an insecure registry.
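dockerd can be told about insecure registries either with the --insecure-registry command-line flag or with the insecure-registries key in daemon.json. As a sketch of the check done by eye above, the hypothetical helper below scans a systemd unit's ExecStart= lines for that flag. It is a simplification: it ignores drop-in override files and ExecStart prefixes.

```python
from __future__ import annotations

import shlex


def insecure_registry_flags(unit_text: str) -> list[str]:
    """Registries passed to dockerd via --insecure-registry in ExecStart= lines (hypothetical helper)."""
    registries: list[str] = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line.startswith("ExecStart="):
            continue
        args = shlex.split(line[len("ExecStart="):])
        for i, arg in enumerate(args):
            if arg == "--insecure-registry" and i + 1 < len(args):
                # Form: --insecure-registry host:port
                registries.append(args[i + 1])
            elif arg.startswith("--insecure-registry="):
                # Form: --insecure-registry=host:port
                registries.append(arg.split("=", 1)[1])
    return registries
```

Fed the text of /lib/systemd/system/docker.service shown above, it returns an empty list, matching the observation that nothing is configured.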

Let's also check Docker's configuration under /etc:

$ls /etc/docker/
key.json

So Docker has no option at all for bypassing the check on an insecure registry. Now let's run docker login and see whether anything changes.

docker login registry-gitlab.com.tw
Username (test):        
Password: 
Error response from daemon: Get https://registry-gitlab.com.tw/v2/: x509: certificate is not authorized to sign other certificates

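Now a new problem shows up: the login itself fails because of the Gitlab Registry's certificate. Searching further, the answer is to list the registry whose certificate is the problem under insecure-registries in /etc/docker/daemon.json, so I quickly set that up and restarted dockerd.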

# /etc/docker has no daemon.json yet, so this creates it
cat <<EOF > /etc/docker/daemon.json
{
    "insecure-registries": ["registry-gitlab.com.tw"]
}
EOF
systemctl restart docker
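Writing the file with a heredoc only works cleanly when daemon.json does not exist yet; appending with >> to a host that already has one would leave two JSON objects in the file, which is no longer valid JSON. A safer approach, sketched here in Python under the assumption that the file is plain JSON, is to merge the registry into the existing settings. The function names are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def add_insecure_registry(config_text: str, registry: str) -> str:
    """Return daemon.json text with `registry` in "insecure-registries", keeping every other key."""
    config = json.loads(config_text) if config_text.strip() else {}
    registries = config.setdefault("insecure-registries", [])
    if registry not in registries:  # idempotent: running twice adds nothing
        registries.append(registry)
    return json.dumps(config, indent=4) + "\n"


def update_daemon_json(registry: str, path: str = "/etc/docker/daemon.json") -> None:
    """Rewrite daemon.json in place; dockerd still needs a restart or SIGHUP afterwards."""
    file = Path(path)
    current = file.read_text() if file.exists() else ""
    file.write_text(add_insecure_registry(current, registry))
```

Existing keys such as live-restore or log-opts survive the merge, and running it twice leaves a single entry.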

Then try docker login again and pull the image from the Gitlab container registry.

$docker login registry-gitlab.com.tw
Username (test):        
Password: 
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded

$docker pull registry-gitlab.com.tw/repo/image-cli:v4.4.0

v4.4.0: Pulling from registry-gitlab.com.tw/repo/image-cli:v4.4.0
23302e52b49d: Pulling fs layer
cf5693de4d3c: Download complete
0bdf97977791: Downloading [=>                                                 ]  110.1kB/3.493MB
8b8c7ad8f3fb: Waiting
40eb930bd6b2: Waiting

At this point we have finally managed to pull a private image from a Container Registry without SSL using docker. Next, let's look at what goes wrong on the public cloud (Azure Kubernetes Service).

Kubernetes pulling images from the Gitlab Container Registry

Now for the problems that show up in a public cloud environment (Azure Kubernetes Service, AKS). I deployed a three-node cluster on AKS running Kubernetes 1.15.11.

bash-5.0# kubectl get node
NAME                                STATUS   ROLES   AGE   VERSION
aks-agentpool-20139558-vmss000000   Ready    agent   11m   v1.15.11
aks-agentpool-20139558-vmss000001   Ready    agent   11m   v1.15.11
aks-agentpool-20139558-vmss000002   Ready    agent   11m   v1.15.11

bash-5.0# kubectl get pod --all-namespaces
NAMESPACE     NAME                                    READY   STATUS    RESTARTS   AGE
kube-system   coredns-698c77c5d7-69rch                1/1     Running   0          11m
kube-system   coredns-698c77c5d7-vsrzb                1/1     Running   0          14m
kube-system   coredns-autoscaler-5bd7c6759b-jx9dt     1/1     Running   0          14m
kube-system   kube-proxy-46phd                        1/1     Running   0          11m
kube-system   kube-proxy-fgb6f                        1/1     Running   0          11m
kube-system   kube-proxy-gxck8                        1/1     Running   0          11m
kube-system   kubernetes-dashboard-74d8c675bc-j25fz   1/1     Running   0          14m
kube-system   metrics-server-7d654ddc8b-bljdd         1/1     Running   0          14m
kube-system   omsagent-rs-c45c944df-5crtb             1/1     Running   0          14m
kube-system   omsagent-w5b8l                          1/1     Running   1          11m
kube-system   omsagent-xfndn                          1/1     Running   1          11m
kube-system   omsagent-xhnbv                          1/1     Running   0          11m
kube-system   tunnelfront-98c8b5dc6-b56d5             1/1     Running   0          14m

On AKS I used kubectl run to create a minimal test workload and see what happens when it pulls the private image from the Container registry.

bash-5.0# kubectl run -ti --rm  test --image registry-gitlab.com.tw/repo/image-cli:v4.4.0 bash

Then check whether the pod was created successfully.

bash-5.0# kubectl get pod
NAME   READY   STATUS         RESTARTS   AGE
test   0/1     ErrImagePull   0          12s

The pod is stuck in ErrImagePull, so we need to dig further into why the pull failed.

bash-5.0# kubectl describe pod test

...
  Normal   BackOff    16s (x3 over 45s)  kubelet, aks-agentpool-20139558-vmss000000  Back-off pulling image "registry-gitlab.com.tw/repo/image-cli:v4.4.0"
  Warning  Failed     16s (x3 over 45s)  kubelet, aks-agentpool-20139558-vmss000000  Error: ImagePullBackOff
  Normal   Pulling    4s (x3 over 46s)   kubelet, aks-agentpool-20139558-vmss000000  Pulling image "registry-gitlab.com.tw/repo/image-cli:v4.4.0"
  Warning  Failed     4s (x3 over 45s)   kubelet, aks-agentpool-20139558-vmss000000  Failed to pull image "registry-gitlab.com.tw/repo/image-cli:v4.4.0": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-gitlab.com.tw/v2/: x509: certificate is not authorized to sign other certificates
  Warning  Failed     4s (x3 over 45s)   kubelet, aks-agentpool-20139558-vmss000000  Error: ErrImagePull

Here we can see the problem:

Pulling image "registry-gitlab.com.tw/repo/image-cli:v4.4.0" fails with
x509: certificate is not authorized to sign other certificates
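What matters here is the kind of failure: an x509 error means the node's container runtime does not trust the registry's TLS certificate, which no registry credentials can fix, whereas "unauthorized" or "denied" responses point to credentials. A rough, hypothetical Python classifier for kubelet's "Failed to pull image" messages might look like this; the keyword lists are heuristics, not an exhaustive catalogue of runtime error strings.

```python
def classify_pull_error(message: str) -> str:
    """Heuristically bucket an image-pull failure message (hypothetical helper)."""
    text = message.lower()
    if "x509:" in text or "certificate" in text:
        # TLS trust problem: the runtime does not accept the registry certificate.
        return "tls"
    if "unauthorized" in text or "authentication required" in text or "denied" in text:
        # Credentials are missing or were rejected by the registry.
        return "auth"
    if "not found" in text or "manifest unknown" in text:
        # The repository or tag does not exist (or is not visible to this user).
        return "not-found"
    return "other"
```

Applied to the event above, it returns "tls", which is a hint that an image pull secret alone will not be enough.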


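Searching for how Kubernetes pulls private images turns up many solutions, so I simply tried the common one to see what else would go wrong. In short, you create a docker-registry Secret in a namespace holding a username and password for a given registry, so that pulls of private images from that registry in that namespace can use those credentials.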

kubectl create secret docker-registry gitlab-registry \
    --docker-username=jason \
    --docker-password='<your-password-or-token>' \
    --docker-email=jason@hello.com.tw \
    --docker-server=registry-gitlab.com.tw

secret/gitlab-registry created
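For reference, kubectl create secret docker-registry stores a .dockerconfigjson document in a Secret of type kubernetes.io/dockerconfigjson. The Python sketch below builds an equivalent manifest; the function names are hypothetical. Note that such a Secret only carries credentials: it does not change which certificates the node trusts.

```python
from __future__ import annotations

import base64
import json


def docker_config_json(server: str, username: str, password: str, email: str | None = None) -> dict:
    """Build the .dockerconfigjson document that holds registry credentials."""
    # "auth" is base64("username:password"), the form Docker clients send as Basic auth.
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    entry = {"username": username, "password": password, "auth": auth}
    if email:
        entry["email"] = email
    return {"auths": {server: entry}}


def registry_secret(name: str, namespace: str, config: dict) -> dict:
    """Wrap the config in a Secret of type kubernetes.io/dockerconfigjson."""
    encoded = base64.b64encode(json.dumps(config).encode()).decode()
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": encoded},
    }
```

The resulting dict can be serialised with json.dumps and applied with kubectl apply -f -.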

With that done, let's see whether the image can now be pulled. First delete the old pod, then run a new one.

bash-5.0# kubectl delete pod test
pod "test" deleted

kubectl run -ti --rm  test --image registry-gitlab.com.tw/repo/image-cli:v4.4.0 bash
...

Keep watching whether the pod reaches Running:

bash-5.0# kubectl get pod
NAME   READY   STATUS         RESTARTS   AGE
test   0/1     ErrImagePull   0          12s

What, still not Running? Let's follow the trail and see what is wrong with the pod.

bash-5.0# kubectl describe pod test

...
  Normal   BackOff    16s (x3 over 45s)  kubelet, aks-agentpool-20139558-vmss000000  Back-off pulling image "registry-gitlab.com.tw/repo/image-cli:v4.4.0"
  Warning  Failed     16s (x3 over 45s)  kubelet, aks-agentpool-20139558-vmss000000  Error: ImagePullBackOff
  Normal   Pulling    4s (x3 over 46s)   kubelet, aks-agentpool-20139558-vmss000000  Pulling image "registry-gitlab.com.tw/repo/image-cli:v4.4.0"
  Warning  Failed     4s (x3 over 45s)   kubelet, aks-agentpool-20139558-vmss000000  Failed to pull image "registry-gitlab.com.tw/repo/image-cli:v4.4.0": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-gitlab.com.tw/v2/: x509: certificate is not authorized to sign other certificates
  Warning  Failed     4s (x3 over 45s)   kubelet, aks-agentpool-20139558-vmss000000  Error: ErrImagePull

Still the same error. Time to look at what the pod spec actually defines, and whether it uses the image pull secret we specified.

bash-5.0# kubectl get pod test -o yaml
...
      name: default-token-wvgsq
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: aks-agentpool-20139558-vmss000000
  nodeSelector:
    node-role.kubernetes.io/agent: ""
  priority: 0
...
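Rather than scanning the YAML by eye, the check can be scripted against the JSON form of the same object (kubectl get pod test -o json). This small Python helper, a hypothetical name, lists the imagePullSecrets referenced by the Pod; an empty list confirms that the Secret created earlier is not wired into the Pod.

```python
from __future__ import annotations

import json


def pull_secret_names(pod_json: str) -> list[str]:
    """Names of the imagePullSecrets in `kubectl get pod <name> -o json` output."""
    pod = json.loads(pod_json)
    refs = pod.get("spec", {}).get("imagePullSecrets", [])
    return [ref["name"] for ref in refs]
```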

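No image pull secret at all... After going through the docs: a Pod that does not name a service account runs as the default service account of its namespace, and the imagePullSecrets listed on that service account are added to new Pods automatically. Our Secret was referenced by neither the Pod nor the service account, so it was never used.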

ref:https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#add-imagepullsecrets-to-a-service-account

Here I take the lazy route and have everything in the default namespace use the same image pull secret xDD

kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "gitlab-registry"}]}'
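The effect of this patch can be modelled roughly as follows. This is a simplified Python sketch of what the ServiceAccount admission step does with imagePullSecrets, not the actual admission plugin code (which also handles token mounts and more): if a new Pod lists no imagePullSecrets, those of its service account are copied in. Because this happens when the Pod is admitted, Pods that already exist are not updated, which is why the test Pod has to be deleted and recreated. The function name is hypothetical.

```python
from __future__ import annotations

import copy


def admit_pod(pod: dict, service_account: dict) -> dict:
    """Simplified model of ServiceAccount admission for imagePullSecrets (hypothetical)."""
    admitted = copy.deepcopy(pod)  # never mutate the caller's object
    spec = admitted.setdefault("spec", {})
    # A Pod without an explicit service account runs as "default".
    spec.setdefault("serviceAccountName", "default")
    if not spec.get("imagePullSecrets"):
        inherited = service_account.get("imagePullSecrets", [])
        if inherited:
            spec["imagePullSecrets"] = copy.deepcopy(inherited)
    return admitted
```

The alternative to patching the service account is listing imagePullSecrets directly in the Pod spec, in which case the service account's list is not copied in.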

Delete the Pod again and rerun the test to see whether the image can be pulled now.

bash-5.0# kubectl delete pod test
pod "test" deleted

kubectl run -ti --rm  test --image registry-gitlab.com.tw/repo/image-cli:v4.4.0 bash
...

Keep watching whether the pod reaches Running:

bash-5.0# kubectl get pod
NAME   READY   STATUS         RESTARTS   AGE
test   0/1     ErrImagePull   0          12s

Still failing. Back to kubectl to find the cause.

bash-5.0# kubectl describe pod test

...
  Normal   BackOff    21s                kubelet, aks-agentpool-20139558-vmss000002  Back-off pulling image "registry-gitlab.com.tw/repo/image-cli:v4.4.0"
  Warning  Failed     21s                kubelet, aks-agentpool-20139558-vmss000002  Error: ImagePullBackOff
  Normal   Pulling    10s (x2 over 23s)  kubelet, aks-agentpool-20139558-vmss000002  Pulling image "registry-gitlab.com.tw/repo/image-cli:v4.4.0"
  Warning  Failed     10s (x2 over 22s)  kubelet, aks-agentpool-20139558-vmss000002  Failed to pull image "registry-gitlab.com.tw/repo/image-cli:v4.4.0": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-gitlab.com.tw/v2/: x509: certificate is not authorized to sign other certificates
  Warning  Failed     10s (x2 over 22s)  kubelet, aks-agentpool-20139558-vmss000002  Error: ErrImagePull

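Same error again. Then it hit me: for an image registry without valid SSL, Docker has to be configured through /etc/docker/daemon.json (insecure-registries) so that it does not verify the registry's certificate when pulling images. Credentials alone cannot fix that.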

Since on the public cloud (Azure Kubernetes Service, AKS) I could not get onto these worker hosts directly, I had to resort to an unconventional trick: mounting their filesystem.

My workaround was a DaemonSet, which places a pod on every node; the pod mounts the host's /etc/docker so I can inspect the Docker config on the AKS hosts.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: registry-ca
  namespace: kube-system
  labels:
    k8s-app: config-docker
spec:
  selector:
    matchLabels:
      name: config-docker
  template:
    metadata:
      labels:
        name: config-docker
    spec:
      containers:
      - name: config-docker
        image: nginx:1.18
        command: [ 'sh' ]
        args: [ '-c' , 'tail -f /dev/null']
        volumeMounts:
        - name: etc-docker
          mountPath: /etc/docker
        securityContext:
          privileged: true
      terminationGracePeriodSeconds: 30
      hostPID: true
      volumes:
      - name: etc-docker
        hostPath:
          path: /etc/docker

Sharp-eyed readers may wonder why I set securityContext (privileged: true) and hostPID; I will explain why shortly.

Without further ado: after deploying this DaemonSet, I go straight into one of its pods and look at the config.

root@registry-ca-rdzbv:/# cat /etc/docker/daemon.json
{
   "live-restore": true,
   "log-driver": "json-file",
   "log-opts":  {
      "max-size": "50m",
      "max-file": "5"
   }
}

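Found it: insecure-registries is not set at all. So I added it by hand in the mounted /etc/docker/daemon.json and then sent a SIGHUP to the host's dockerd PID so that dockerd reloads its configuration. You may wonder how a container can touch a process on the host: the YAML above sets privileged: true in securityContext together with hostPID: true, so the container sees the host's processes and has the privileges needed to send them signals.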

1
2
3
root@registry-ca-rdzbv:/#pidof dockerd
3322
root@registry-ca-rdzbv:/#kill -1 3322
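With hostPID: true, /proc inside the container shows the host's processes, which is why pidof can find dockerd. The same step can be sketched in Python by matching /proc/<pid>/comm (roughly what pidof does) and sending SIGHUP; dockerd treats SIGHUP as a request to reload daemon.json, and insecure-registries is one of the settings Docker documents as reloadable. The helper names are hypothetical.

```python
from __future__ import annotations

import os
import signal
from pathlib import Path


def find_pids(name: str, proc_root: str = "/proc") -> list[int]:
    """PIDs whose /proc/<pid>/comm equals `name`, roughly like `pidof name`."""
    pids: list[int] = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited or is not readable
        if comm == name:
            pids.append(int(entry.name))
    return sorted(pids)


def reload_dockerd() -> list[int]:
    """Send SIGHUP (kill -1) to every dockerd visible in this PID namespace."""
    pids = find_pids("dockerd")
    for pid in pids:
        os.kill(pid, signal.SIGHUP)
    return pids
```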

With that done, delete the test pod and run a new one to see whether the private image can now be pulled.

bash-5.0# kubectl delete pod test
pod "test" deleted

kubectl run -ti --rm  test --image registry-gitlab.com.tw/repo/image-cli:v4.4.0 bash
...

Keep watching whether the pod reaches Running:

bash-5.0# kubectl get pod
NAME   READY   STATUS         RESTARTS   AGE
test   1/1     Running        0          30s

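One problem remains: I only changed a single node, so if the Pod is scheduled onto another node it will still hit x509: certificate is not authorized to sign other certificates. Is there a way to fix this once and for all? I reused the DaemonSet from before and modified its YAML so that the pod takes care of it on every node.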

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: registry-ca
  namespace: kube-system
  labels:
    k8s-app: config-docker
spec:
  selector:
    matchLabels:
      name: config-docker
  template:
    metadata:
      labels:
        name: config-docker
    spec:
      containers:
      - name: config-docker
        image: nginx:1.18
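        command: [ 'sh' ]
        # Copy the ConfigMap's daemon.json over the host's, send SIGHUP (kill -1) so dockerd reloads it, then idle.
        args: [ '-c', 'cp --remove-destination /home/core/daemon.json /etc/docker/daemon.json && kill -1  $(pidof dockerd) && tail -f /dev/null']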
        volumeMounts:
        - name: etc-docker
          mountPath: /etc/docker
        - name: docker-config
          mountPath: /home/core
        securityContext:
          privileged: true
      terminationGracePeriodSeconds: 30
      hostPID: true
      volumes:
      - name: etc-docker
        hostPath:
          path: /etc/docker
      - name: docker-config
        configMap:
          name: docker-insecure

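Here the DaemonSet mounts a ConfigMap, and the container's startup args copy the ConfigMap's daemon.json over /etc/docker/daemon.json and then send SIGHUP to the dockerd process so it reloads the configuration.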
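The docker-insecure ConfigMap is referenced but its contents are not listed. Because the container copies /home/core/daemon.json, the ConfigMap needs a daemon.json key, and it must live in kube-system alongside the DaemonSet that mounts it. Since cp replaces the whole file, it should presumably also carry the settings AKS had already put there (live-restore and the json-file log options seen earlier). A Python sketch that generates such a ConfigMap under those assumptions:

```python
from __future__ import annotations

import json

# Assumption: mirrors the daemon.json observed on the AKS node earlier in this post.
AKS_DEFAULTS = {
    "live-restore": True,
    "log-driver": "json-file",
    "log-opts": {"max-size": "50m", "max-file": "5"},
}


def docker_insecure_configmap(registries: list[str], namespace: str = "kube-system") -> dict:
    """ConfigMap whose daemon.json key is copied over /etc/docker/daemon.json by the DaemonSet."""
    daemon = dict(AKS_DEFAULTS)  # keep the node's existing settings
    daemon["insecure-registries"] = list(registries)
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "docker-insecure", "namespace": namespace},
        "data": {"daemon.json": json.dumps(daemon, indent=4)},
    }
```

Printing json.dumps(docker_insecure_configmap(["registry-gitlab.com.tw"])) and piping it into kubectl apply -f - would create it before the DaemonSet is deployed.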

That is my reasoning and fix for the unssl image registry problem I ran into on Azure Kubernetes Service.


Meng Ze Li
Kubernetes / DevOps / Backend