Is the service affected when all masters are stopped?
OpenShift 4
- Infra Node 3
- Master Node 3
- Worker Node 3
Note: the Router pods run on the Infra Nodes.
The request flow is as follows.
frontend(DC) -> api(DC)
- Internet -> Infra Node(Router) -> SDN -> Worker Node
www.frontend.test.com(443) - Route
- api REST Call (HttpUrlConnection or HttpClient 4.x, api:8080 or api.test1.svc.cluster.local:8080)
When all master nodes are stopped, requests to the front-end always succeed. However, the API call made by the front-end fails intermittently:
it either hangs for a long time or throws an UnknownHostException.
Everything works as long as at least one master node is running.
GET http://api.test1.svc.cluster.local:8080/ : java.net.UnknownHostException: api.test1.svc.cluster.local
or
GET http://api:8080/ : java.net.UnknownHostException: api
Thread dump taken while the call is hanging:
"http-nio-8080-exec-2" #33 daemon prio=5 os_prio=0 cpu=6.89ms elapsed=452.97s tid=0x00007fdb88f15800 nid=0xb3 runnable [0x00007fdb34fc2000]
java.lang.Thread.State: RUNNABLE
at java.net.Inet6AddressImpl.lookupAllHostAddr([email protected]/Native Method)
at java.net.InetAddress$PlatformNameService.lookupAllHostAddr([email protected]/InetAddress.java:929)
at java.net.InetAddress.getAddressesFromNameService([email protected]/InetAddress.java:1519)
at java.net.InetAddress$NameServiceAddresses.get([email protected]/InetAddress.java:848)
- locked <0x0000000782ff5be0> (a java.net.InetAddress$NameServiceAddresses)
at java.net.InetAddress.getAllByName0([email protected]/InetAddress.java:1509)
at java.net.InetAddress.getAllByName([email protected]/InetAddress.java:1368)
at java.net.InetAddress.getAllByName([email protected]/InetAddress.java:1302)
at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45)
at
Thank you.
CodePudding user response:
Yes, it is affected. Here's what's happening when you stop all the master nodes.
- Incoming traffic arriving at the DC egress is forwarded to the Ingress component of your Kubernetes/OpenShift cluster (your front-end).
- This succeeds because name resolution for the Ingress is the responsibility of your infrastructure, since the Ingress interface is external to OpenShift.
- Once the traffic reaches the Ingress (front-end successfully reached), it has to be forwarded to the backend service, depending on the path in the request.
- This cannot be done, because Ingress objects, by design, dynamically resolve Service DNS names into IP addresses in order to reach them inside the cluster. This is done so that when services go down and come back up with a different IP address, the Ingress doesn't need to be reconfigured, since the DNS name stays consistent.
- Here the resolution fails because your DNS system (probably core-dns) is supposed to be running on the master nodes, which it isn't, and this leads to the unresolved-name behavior.
- Sometimes the Ingress has a local resolver cache entry that is still valid, so the request makes it to the Service and gets a response. But this is highly unstable, since the cache may have been configured with an auto-clean timeout, in which case entries are purged automatically after a while.
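
The cache behavior described in the last point can be sketched in Java, the language of the stack trace in the question. This is only an illustration under stated assumptions, not how OpenShift, the router, or HttpClient actually implement caching: the class `TtlDnsCache`, its `Lookup` interface, the 30-second TTL, and the address 10.0.0.1 are all hypothetical, and a fake lookup stands in for the cluster DNS so the example runs without a network. It shows why calls can succeed for a while after DNS becomes unreachable and then start failing with UnknownHostException once the cached entry expires.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical illustration class; not part of the JDK, OpenShift, or HttpClient.
class TtlDnsCache {

    /** Pluggable lookup so a fake resolver can stand in for real DNS. */
    @FunctionalInterface
    interface Lookup {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    /** A cached answer together with the time it was stored. */
    private static final class Entry {
        final InetAddress[] addresses;
        final long storedAtMillis;

        Entry(InetAddress[] addresses, long storedAtMillis) {
            this.addresses = addresses;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final Lookup lookup;
    private final long ttlMillis;
    private final LongSupplier clock;
    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    TtlDnsCache(Lookup lookup, long ttlMillis, LongSupplier clock) {
        this.lookup = lookup;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns a cached answer while it is fresh; otherwise asks the lookup again. */
    InetAddress[] resolve(String host) throws UnknownHostException {
        long now = clock.getAsLong();
        Entry cached = cache.get(host);
        if (cached != null && now - cached.storedAtMillis < ttlMillis) {
            return cached.addresses; // still valid: no DNS query needed
        }
        cache.remove(host); // expired entries are purged
        InetAddress[] fresh = lookup.resolve(host); // throws if DNS is unreachable
        cache.put(host, new Entry(fresh, now));
        return fresh;
    }

    public static void main(String[] args) throws Exception {
        long[] now = {0L};        // simulated clock, in milliseconds
        boolean[] dnsUp = {true}; // simulated DNS availability

        TtlDnsCache cache = new TtlDnsCache(host -> {
            if (!dnsUp[0]) {
                throw new UnknownHostException(host);
            }
            // Assumed service IP, for illustration only.
            return new InetAddress[] {InetAddress.getByAddress(host, new byte[] {10, 0, 0, 1})};
        }, 30_000L, () -> now[0]);

        System.out.println("DNS up: " + cache.resolve("api")[0]);

        dnsUp[0] = false;
        now[0] = 10_000L;
        System.out.println("DNS down, entry still cached: " + cache.resolve("api")[0]);

        now[0] = 31_000L;
        try {
            cache.resolve("api");
        } catch (UnknownHostException e) {
            System.out.println("DNS down, entry expired: " + e);
        }
    }
}
```

Against a real resolver, the same class could be constructed as `new TtlDnsCache(InetAddress::getAllByName, 30_000L, System::currentTimeMillis)`. Note that the JVM also keeps its own `InetAddress` cache, controlled by the `networkaddress.cache.ttl` security property, so a longer TTL only postpones the failure; it does not remove the dependency on a reachable DNS server.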
