Issue
I've been working on installing a three-node Kubernetes cluster with Flannel on CentOS 7 for some time, but the CoreDNS pods cannot connect to the API server and are constantly restarting.
The reference HowTo document I followed is here.
What Have I Done so Far?
- Disabled SELinux,
- Disabled firewalld,
- Enabled br_netfilter and bridge-nf-call-iptables (see the sketch after this list),
- Installed Kubernetes on three nodes and set up the master's pod network with Flannel's default network (10.244.0.0/16),
- Installed the other two nodes and joined them to the master,
- Deployed Flannel,
- Configured Docker's BIP to use Flannel's default per-node subnet and network.
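For reference, a minimal sketch of the br_netfilter / bridge-nf-call-iptables step; the drop-in file name is my own choice, not a value from the original setup:

# load the bridge netfilter module and make bridged traffic visible to iptables
modprobe br_netfilter
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sysctl --system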
Current State
- The kubelet works and the cluster reports nodes as ready.
- The cluster can schedule and migrate pods, so CoreDNS pods are spawned on the nodes.
- The Flannel network is connected: there are no errors in the container logs, and I can ping the 10.244.0.0/24 networks from node to node.
- Kubernetes can deploy and run arbitrary pods (I tried the shell demo and can access its shell via kubectl even if the container is on a different node).
- However, since DNS is not working, the pods cannot resolve any IP addresses (see the commands after this list).
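A rough sketch of the kind of checks behind the statements above; the pod name and target IP are placeholders, not values from the original cluster:

# cross-node pod networking works
kubectl exec -it shell-demo -- ping -c 3 10.244.2.2
# but name resolution fails because CoreDNS cannot reach the API server
kubectl exec -it shell-demo -- nslookup kubernetes.default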
What is the Problem?
CoreDNS pods report that they cannot connect to the API server with the following error:
Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
I cannot see any 10.96.0.0 routes in the routing tables:

default via 172.16.0.1 dev eth0 proto static metric 100
10.1.0.0/24 dev eth1 proto kernel scope link src 10.1.0.202 metric 101
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 dev docker0 proto kernel scope link src 10.244.1.1
10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
172.16.0.0/16 dev eth0 proto kernel scope link src 172.16.0.202 metric 100
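For what it's worth, the service CIDR is normally reached through kube-proxy's NAT rules rather than a kernel route, so checks along these lines (my own diagnostic suggestion, not from the original post) can show where the traffic dies:

# how the kernel would route traffic to the Service VIP
ip route get 10.96.0.1
# the kube-proxy NAT rules that should translate the VIP to the API server endpoint
iptables -t nat -L KUBE-SERVICES -n | grep 10.96.0.1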
Additional Info
- Cluster init is done with the command kubeadm init --apiserver-advertise-address=172.16.0.201 --pod-network-cidr=10.244.0.0/16.
- I have torn down the cluster and rebuilt it with 1.12.0. The problem still persists.
- The workaround in the Kubernetes documentation doesn't work.
- The problem is present and identical with both the 1.11-3 and 1.12-0 CentOS 7 packages.
Progress so Far
- Downgraded Kubernetes to 1.11.3-0.
- Re-initialized Kubernetes with kubeadm init --apiserver-advertise-address=172.16.0.201 --pod-network-cidr=10.244.0.0/16, since the server has another external IP which cannot be accessed from the other hosts, and Kubernetes tends to select that IP as the API server IP. --pod-network-cidr is mandated by Flannel.

Resulting iptables -L output after initialization, with no joined nodes:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
KUBE-FORWARD  all  --  anywhere             anywhere             /* kubernetes forwarding rules */
DOCKER-USER  all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere

Chain DOCKER-USER (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere

Chain KUBE-EXTERNAL-SERVICES (1 references)
target     prot opt source               destination

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-FORWARD (1 references)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding rules */ mark match 0x4000/0x4000

Chain KUBE-SERVICES (1 references)
target     prot opt source               destination
REJECT     udp  --  anywhere             10.96.0.10           /* kube-system/kube-dns:dns has no endpoints */ udp dpt:domain reject-with icmp-port-unreachable
REJECT     tcp  --  anywhere             10.96.0.10           /* kube-system/kube-dns:dns-tcp has no endpoints */ tcp dpt:domain reject-with icmp-port-unreachable
It looks like the API server is deployed as it should be:
$ kubectl get svc kubernetes -o=yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2018-10-25T06:58:46Z
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
  resourceVersion: "6"
  selfLink: /api/v1/namespaces/default/services/kubernetes
  uid: 6b3e4099-d823-11e8-8264-a6f3f1f622f3
spec:
  clusterIP: 10.96.0.1
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 6443
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
Then I've applied the Flannel network pod with:

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
As soon as I apply the Flannel network, the CoreDNS pods start and begin to give the same error:
Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
I've found out that flanneld was using the wrong network interface, and changed it in the kube-flannel.yml file before deployment (see the excerpt below). However, the outcome is still the same.
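For illustration, this is the kind of change that was made; the excerpt is a sketch of the relevant part of kube-flannel.yml, and eth1 as well as the image tag are assumptions rather than values from the original file:

# excerpt from the kube-flannel DaemonSet in kube-flannel.yml
containers:
- name: kube-flannel
  image: quay.io/coreos/flannel:v0.10.0-amd64   # version tag is an assumption
  command:
  - /opt/bin/flanneld
  args:
  - --ip-masq
  - --kube-subnet-mgr
  - --iface=eth1   # pin flanneld to the host interface carrying the node-to-node traffic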
Any help is greatly appreciated.
Solution
I've solved the problem. The cause is a mixture of inexperience, lack of documentation and some old, no-longer-correct information.
The guy who will be using the installation told me that Docker's bridge needs to be in the same subnet as the Flannel network, hence I edited Docker's bridge network.
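A minimal sketch of what that edit amounted to; the file path is the usual default and the bip value mirrors this node's Flannel subnet, both assumptions on my part:

# /etc/docker/daemon.json (assumed path and value)
{
  "bip": "10.244.1.1/24"
}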
However, when Kubernetes started to use CNI, this requirement not only became unnecessary, but plain wrong. Having both cni0 and docker0 on the same network with the same IP address always felt wrong, but since I'm a complete beginner in Kubernetes, I ignored my hunch.
As a result, I reset Docker's network to its default, tore down the cluster and rebuilt it. Now everything is working as it should.
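Roughly, the recovery steps were along these lines; a sketch of the commands, assuming the BIP override lived in /etc/docker/daemon.json:

# drop the custom "bip" entry from /etc/docker/daemon.json, then:
systemctl restart docker
kubeadm reset                      # on every node, to tear down the cluster
kubeadm init --apiserver-advertise-address=172.16.0.201 --pod-network-cidr=10.244.0.0/16
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml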
TL;DR: Never, ever touch Docker's network parameters if you are setting up a recent Kubernetes release. Just install Docker, initialize Kubernetes and deploy Flannel. Kubernetes and CNI will take care of the container-to-Flannel transport.
Answered By - bayindirh