K8S network problem prevents services from reaching each other across namespaces


A心有千千结

1184 views · 2022-07-01 14:51:05


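Background

I recently redeployed a K8S environment on virtual machines running on my own computer, using the Kuboard-Spray method.

Installing kubernetes_v1.23.1 with KuboardSpray | Kuboard

After the installation nothing seemed off. All pods showed a status of Running, so I assumed the job was done and started deploying applications.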

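Discovering the problem

I first noticed the problem while deploying a RuoYi system: the backend services were all Running, but the frontend service went from Running to Error after about 20 seconds.

The logs gave the first clue: nginx could not find its upstream. The name in question is really just a hostname, but nginx treats the proxy_pass target as an implicit upstream, so a hostname that fails to resolve is reported as an upstream error.

The relevant part of the nginx configuration is: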

		
    location ^~ /prod-api/ {
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header REMOTE-HOST $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://ruoyi-gateway.ruoyi-k8s:8080/;
    }

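Pinging ruoyi-gateway.ruoyi-k8s failed as well.

At that point I suspected the cluster installation. The same set of services had run fine on k8s on Huawei Cloud servers, yet the local deployment showed this problem. I adjusted the configuration for several rounds with no result.

Later I tried reaching the gateway service from the system service; ping still failed.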
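One caveat about the ping tests above: a Service name resolves to a virtual ClusterIP, and depending on the kube-proxy mode that address may not answer ICMP even on a healthy cluster. A DNS lookup or an HTTP request from inside a pod tends to be a more telling check. The sketch below is only an illustration: the helper `svc_fqdn` and the `POD` variable are hypothetical names, and `cluster.local` is assumed as the cluster domain (the usual default).

```shell
# Build the fully qualified DNS name of a Service.
# Assumption: the cluster domain is "cluster.local" unless a third argument is given.
svc_fqdn() {
  service="$1"
  namespace="$2"
  domain="${3:-cluster.local}"
  printf '%s.%s.svc.%s\n' "$service" "$namespace" "$domain"
}

# Usage (needs a running pod whose image ships nslookup and wget; POD is a placeholder):
#   kubectl exec -n ruoyi-k8s "$POD" -- nslookup "$(svc_fqdn ruoyi-gateway ruoyi-k8s)"
#   kubectl exec -n ruoyi-k8s "$POD" -- wget -qO- -T 5 "http://$(svc_fqdn ruoyi-gateway ruoyi-k8s):8080/"
```

If the lookup itself times out, the problem is likely below the application layer (DNS or the pod network), which matches what turned out to be the case here.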

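So I concluded that it was a cluster network problem, but exactly what was wrong remained a mystery.

Just as I was about to reinstall the cluster yet again, I asked a colleague, Jie Ge.

Following Jie Ge's line of thought, I checked the network plugin. It was calico, and its pods were all Running, so I assumed it was fine. Jie Ge circled the relevant column for me and it clicked at once: Running does not mean the service is healthy, because the Ready count was 0.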
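The "Running but not Ready" condition is easy to overlook by eye, so it may help to filter for it. A minimal sketch, assuming the default column layout of `kubectl get pods -n <namespace>` (NAME READY STATUS RESTARTS AGE); the function name `running_but_not_ready` is made up for this example.

```shell
# Print pods whose STATUS is Running but whose READY count is incomplete.
# Reads `kubectl get pods -n <namespace>` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE).
running_but_not_ready() {
  awk 'NR > 1 && $3 == "Running" {
         split($2, ready, "/")          # "0/1" -> ready[1]=0, ready[2]=1
         if (ready[1] != ready[2]) print $1, $2
       }'
}

# Usage against a live cluster:
#   kubectl get pods -n kube-system | running_but_not_ready
```

Any pod this prints is one whose containers started but whose readiness probe is failing, which is the next thing to inspect with `kubectl describe`.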

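After that conversation everything fell into place. Following the trail, I finally found that the network plugin itself had no connectivity.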

kubectl describe pods calico-node-f5qzf -n kube-system

...
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  88s                default-scheduler  Successfully assigned kube-system/calico-node-f5qzf to node1
  Normal   Pulled     89s                kubelet            Container image "quay.io/calico/cni:v3.21.5" already present on machine
  Normal   Created    89s                kubelet            Created container upgrade-ipam
  Normal   Started    88s                kubelet            Started container upgrade-ipam
  Normal   Pulled     88s                kubelet            Container image "quay.io/calico/cni:v3.21.5" already present on machine
  Normal   Created    88s                kubelet            Created container install-cni
  Normal   Started    87s                kubelet            Started container install-cni
  Normal   Pulled     86s                kubelet            Container image "quay.io/calico/pod2daemon-flexvol:v3.21.5" already present on machine
  Normal   Created    86s                kubelet            Created container flexvol-driver
  Normal   Started    85s                kubelet            Started container flexvol-driver
  Normal   Pulled     85s                kubelet            Container image "quay.io/calico/node:v3.21.5" already present on machine
  Normal   Created    85s                kubelet            Created container calico-node
  Normal   Started    84s                kubelet            Started container calico-node
  Warning  Unhealthy  78s (x4 over 83s)  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
  Warning  Unhealthy  69s                kubelet            Readiness probe failed: 2022-07-01 03:42:37.964 [INFO][220] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.211
  Warning  Unhealthy  59s                kubelet            Readiness probe failed: 2022-07-01 03:42:47.960 [INFO][255] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.211
  Warning  Unhealthy  49s                kubelet            Readiness probe failed: 2022-07-01 03:42:57.893 [INFO][282] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.211
  Warning  Unhealthy  39s                kubelet            Readiness probe failed: 2022-07-01 03:43:07.909 [INFO][311] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.211

Resolving the network error

calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.211

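With a concrete error in hand, things got easier. After some searching on Baidu, I finally found the cause.

When the K8s cluster is installed with Kuboard-Spray, the Calico network configuration defaults to the eth0 interface. On a system installed in a VM, however, the NIC is usually named ens33. In other words, the interface setting does not match the machine.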
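Before changing anything, it is worth confirming what the NIC is actually called on each node, since ens33 is only the typical name. A small sketch, assuming the iproute2 `ip` tool is available on the node; `list_ipv4_ifaces` is a hypothetical helper name.

```shell
# List interface names with their IPv4 addresses.
# Reads `ip -o -4 addr show` output on stdin, where field 2 is the
# interface name and field 4 is the address in CIDR form.
list_ipv4_ifaces() {
  awk '$3 == "inet" { print $2, $4 }'
}

# Usage on a node:
#   ip -o -4 addr show | list_ipv4_ifaces
```

The interface carrying the node's own address is the one Calico should be pointed at.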

Fix

Open the YAML file that configures calico.

The relevant part originally looks like this:

            - name: CALICO_NETWORKING_BACKEND
              valueFrom:
                configMapKeyRef:
                  key: calico_backend
                  name: calico-config
            - name: IP_AUTODETECTION_METHOD
              value: skip-interface=eth0

Change the value of IP_AUTODETECTION_METHOD to interface=ens33. Note that the YAML contains several places that need this change (roughly 3).
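Because the same setting appears in several places, editing by hand is error-prone. The sketch below rewrites every occurrence in one pass. It assumes that each `name: IP_AUTODETECTION_METHOD` line is followed directly by its `value:` line, as in the excerpt above; the function name `set_autodetect_iface` and the file name `calico.yaml` are placeholders, not names from the Kuboard-Spray installation.

```shell
# Rewrite the value that follows each IP_AUTODETECTION_METHOD entry.
# $1 = interface name; the manifest is read on stdin and written to stdout.
set_autodetect_iface() {
  awk -v iface="$1" '
    after_key && /value:/ { sub(/value:.*/, "value: interface=" iface) }
    { after_key = ($0 ~ /name: IP_AUTODETECTION_METHOD[ \t]*$/); print }
  '
}

# Usage (calico.yaml is a placeholder file name):
#   set_autodetect_iface ens33 < calico.yaml > calico-fixed.yaml
#   kubectl apply -f calico-fixed.yaml
#   kubectl get pods -n kube-system    # calico-node should eventually show READY 1/1
```

Writing to a new file rather than editing in place keeps the original manifest around for comparison with `diff`.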