Background

I recently redeployed a K8s environment on virtual machines running on my local machine, installed with Kuboard-Spray:

Installing kubernetes_v1.23.1 with KuboardSpray | Kuboard

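After the installation succeeded nothing looked wrong: every pod showed a status of Running, so I assumed the job was done and started deploying applications.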

Discovering the problem

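I first noticed the problem after deploying a RuoYi system: the backend services were all Running, but the frontend service ran for about 20s and then its status changed to Error.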

The logs gave the first clue: nginx could not find the upstream. The name in question is really just a host, yet nginx treated it as an upstream.

The relevant part of the nginx configuration:

		
    location ^~ /prod-api/ {
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header REMOTE-HOST $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://ruoyi-gateway.ruoyi-k8s:8080/;
    }

Pinging ruoyi-gateway.ruoyi-k8s did not work either.

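At that point I suspected the cluster installation. The same set of services had run fine on k8s on Huawei Cloud servers, yet failed on the local deployment. I adjusted the configuration for several rounds with no result.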

Later I tried reaching the gateway service from the system service, and ping still failed.
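A failed ping against a Service name may not be conclusive on its own, so resolving the name from inside a pod is a more direct check. The sketch below only builds the fully qualified Service name. The function name `svc_fqdn` is made up for illustration, and the `cluster.local` default is an assumption about the cluster domain.

```shell
# Build the fully qualified DNS name of a Service.
# $1 = service name, $2 = namespace, $3 = cluster domain (optional).
# The default "cluster.local" is an assumption; a cluster can override it.
svc_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "$2" "${3:-cluster.local}"
}

# Usage (assumes the pod image ships nslookup; <pod> is a placeholder):
#   kubectl exec -n ruoyi-k8s <pod> -- nslookup "$(svc_fqdn ruoyi-gateway ruoyi-k8s)"
```

If the lookup fails from several pods, the problem is more likely in cluster networking or DNS than in the application configuration.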

So I concluded it was a cluster network problem, but I had no idea what exactly was wrong.

Just as I was about to reinstall the cluster again, I asked a colleague, Jie Ge (杰哥).

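Following Jie Ge's suggestion I checked the network plugin. It was calico, and all of its pods were Running, so I assumed they were fine. Then Jie Ge circled something in the output and it clicked: Running does not mean the service is healthy, because the READY count was 0.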
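One way to catch this kind of pod is to compare the two numbers in the READY column instead of trusting STATUS. The sketch below assumes the default column layout of `kubectl get pods` for a single namespace (NAME first, READY second). The function name `not_ready_pods` is hypothetical.

```shell
# Print the names of pods whose READY column (for example 0/1) shows
# fewer ready containers than total containers.
# Reads `kubectl get pods` output on stdin; assumes NAME is column 1
# and READY is column 2, and skips the header line.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, r, "/")
    if (r[1] != r[2]) print $1
  }'
}

# Usage (hypothetical invocation):
#   kubectl get pods -n kube-system | not_ready_pods
```

Any pod name this prints is worth a `kubectl describe`, even if its STATUS says Running.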

That conversation was an eye-opener. Following the trail, I finally found that the network plugin itself had no connectivity.

kubectl describe pods calico-node-f5qzf -n kube-system

...
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  88s                default-scheduler  Successfully assigned kube-system/calico-node-f5qzf to node1
  Normal   Pulled     89s                kubelet            Container image "quay.io/calico/cni:v3.21.5" already present on machine
  Normal   Created    89s                kubelet            Created container upgrade-ipam
  Normal   Started    88s                kubelet            Started container upgrade-ipam
  Normal   Pulled     88s                kubelet            Container image "quay.io/calico/cni:v3.21.5" already present on machine
  Normal   Created    88s                kubelet            Created container install-cni
  Normal   Started    87s                kubelet            Started container install-cni
  Normal   Pulled     86s                kubelet            Container image "quay.io/calico/pod2daemon-flexvol:v3.21.5" already present on machine
  Normal   Created    86s                kubelet            Created container flexvol-driver
  Normal   Started    85s                kubelet            Started container flexvol-driver
  Normal   Pulled     85s                kubelet            Container image "quay.io/calico/node:v3.21.5" already present on machine
  Normal   Created    85s                kubelet            Created container calico-node
  Normal   Started    84s                kubelet            Started container calico-node
  Warning  Unhealthy  78s (x4 over 83s)  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
  Warning  Unhealthy  69s                kubelet            Readiness probe failed: 2022-07-01 03:42:37.964 [INFO][220] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.211
  Warning  Unhealthy  59s  kubelet  Readiness probe failed: 2022-07-01 03:42:47.960 [INFO][255] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.211
  Warning  Unhealthy  49s  kubelet  Readiness probe failed: 2022-07-01 03:42:57.893 [INFO][282] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.211
  Warning  Unhealthy  39s  kubelet  Readiness probe failed: 2022-07-01 03:43:07.909 [INFO][311] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.211
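The event list is noisy, so it can help to pull out just the peer addresses that BGP failed to reach. The sketch below filters `kubectl describe` output for the "BGP not established with" message seen above. The function name `bgp_unestablished_peers` is made up, and it only captures the first address in each message.

```shell
# Read `kubectl describe pod` output on stdin and print each distinct
# address that appears in a "BGP not established with <ip>" message.
# Only the first address after "with" is captured per message.
bgp_unestablished_peers() {
  grep -o 'BGP not established with [0-9.]*' | awk '{ print $NF }' | sort -u
}

# Usage (hypothetical invocation):
#   kubectl describe pods calico-node-f5qzf -n kube-system | bgp_unestablished_peers
```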

Fixing the network problem

calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.211

With an actual error message in hand, things got easier. After some searching on Baidu I finally found the cause.

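The cause: when the K8s cluster is installed with Kuboard-Spray, calico's network interface detection is configured around the eth0 interface by default. On a system installed in a VM, however, the interface is usually named ens33, so the interface setting does not match the actual NIC.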
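Before changing anything, it is worth confirming what the NIC is actually called on each node. The sketch below extracts interface names from `ip -o link show` output. The function name `iface_names` is hypothetical, and the parsing assumes the usual one-line-per-interface format.

```shell
# Print interface names from `ip -o link show` output read on stdin.
# Each line looks like "2: ens33: <BROADCAST,...> mtu 1500 ...";
# any "@parent" suffix (seen on veth pairs) is stripped.
iface_names() {
  awk -F': ' '{ sub(/@.*/, "", $2); print $2 }'
}

# Usage on a node:
#   ip -o link show | iface_names
```

If ens33 shows up here and eth0 does not, the mismatch described above applies to that node.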

Solution

Open the YAML file that configures calico.

The relevant part of the original content:

            - name: CALICO_NETWORKING_BACKEND
              valueFrom:
                configMapKeyRef:
                  key: calico_backend
                  name: calico-config
            - name: IP_AUTODETECTION_METHOD
              value: skip-interface=eth0

Change the value of IP_AUTODETECTION_METHOD to interface=ens33. Note that the YAML contains several places that need this change (about 3).
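Because the setting appears in several places, a small filter can apply the change consistently instead of editing each occurrence by hand. This is only a sketch: the function name `set_autodetect_iface` and the file names in the usage comment are placeholders, and it assumes the `value:` line directly follows the `name: IP_AUTODETECTION_METHOD` line, as in the excerpt above.

```shell
# Rewrite the value that follows every IP_AUTODETECTION_METHOD entry in
# a manifest read on stdin. $1 is the NIC name (ens33 in this post).
# Assumes the "value:" line directly follows the "name:" line.
set_autodetect_iface() {
  awk -v v="interface=$1" '
    /name:[ \t]*IP_AUTODETECTION_METHOD[ \t]*$/ { print; pending = 1; next }
    pending { if ($0 ~ /value:/) sub(/value:.*/, "value: " v); pending = 0 }
    { print }
  '
}

# Usage (calico.yaml and calico-ens33.yaml are placeholder file names):
#   set_autodetect_iface ens33 < calico.yaml > calico-ens33.yaml
```

Writing to a new file keeps the original manifest around for comparison before anything is applied to the cluster.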

After the change, the service restarts automatically.

Now look at calico again:

READY is now 1/1 and the status is Running.

After restarting the RuoYi web service once more, it also went to Running, and the logs showed no errors.

Next, check the network between the system service and the gateway service:

kubectl exec -it ruoyi-system-7b6488bdd4-4kz5m -n ruoyi-k8s -- /bin/bash

The road ahead is clear (* ̄︶ ̄)

Reference

https://www.codenong.com/cs109711759/
