微信黑魔法-专注解决微信开发周边,小程序,公众号,微信机器人等技术分享

浅谈Kubernetes的探针工作方式

浅谈Kubernetes的探针工作方式

背景原因

今日在生产环境中, 遇到一个奇怪的问题, 后端的Go服务偶尔能看到类似下面的错误

write: broken pipe

但是这个问题比较难以复现, 直到最近迁移了服务到k8s集群, 发现能够稳定复现. 于是小研究了一下原因.

具体问题

抓包分析, 可以观察到, 客户端10.244.2.1.52180 和 服务端wxjiguang-web-7876d5858-lkqqz.5008 的交互数据流如下.

09:25:02.764630 IP 10.244.2.1.52180 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [S], seq 2657245979, win 28200, options [mss 1410,sackOK,TS val 1805607032 ecr 0,nop,wscale 7], length 0
09:25:02.764658 IP wxjiguang-web-7876d5858-lkqqz.5008 > 10.244.2.1.52180: Flags [S.], seq 3805641569, ack 2657245980, win 27960, options [mss 1410,sackOK,TS val 3463082419 ecr 1805607032,nop,wscale 7], length 0
09:25:02.764685 IP 10.244.2.1.52180 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [.], ack 1, win 221, options [nop,nop,TS val 1805607032 ecr 3463082419], length 0
09:25:02.764837 IP 10.244.2.1.52180 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [P.], seq 1:122, ack 1, win 221, options [nop,nop,TS val 1805607032 ecr 3463082419], length 121
09:25:02.764846 IP wxjiguang-web-7876d5858-lkqqz.5008 > 10.244.2.1.52180: Flags [.], ack 122, win 219, options [nop,nop,TS val 3463082419 ecr 1805607032], length 0
09:25:08.377747 IP 10.244.2.1.52252 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [S], seq 1890553115, win 28200, options [mss 1410,sackOK,TS val 1805612645 ecr 0,nop,wscale 7], length 0
09:25:08.377777 IP wxjiguang-web-7876d5858-lkqqz.5008 > 10.244.2.1.52252: Flags [S.], seq 3331486571, ack 1890553116, win 27960, options [mss 1410,sackOK,TS val 3463088032 ecr 1805612645,nop,wscale 7], length 0
09:25:08.377806 IP 10.244.2.1.52252 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [.], ack 1, win 221, options [nop,nop,TS val 1805612645 ecr 3463088032], length 0
09:25:08.377964 IP 10.244.2.1.52252 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [P.], seq 1:122, ack 1, win 221, options [nop,nop,TS val 1805612645 ecr 3463088032], length 121
09:25:08.377973 IP wxjiguang-web-7876d5858-lkqqz.5008 > 10.244.2.1.52252: Flags [.], ack 122, win 219, options [nop,nop,TS val 3463088032 ecr 1805612645], length 0
09:25:08.765167 IP wxjiguang-web-7876d5858-lkqqz.5008 > 10.244.2.1.52180: Flags [P.], seq 1:146, ack 122, win 219, options [nop,nop,TS val 3463088419 ecr 1805607032], length 145
09:25:08.765210 IP 10.244.2.1.52180 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [.], ack 146, win 229, options [nop,nop,TS val 1805613033 ecr 3463088419], length 0
09:25:08.765262 IP wxjiguang-web-7876d5858-lkqqz.5008 > 10.244.2.1.52180: Flags [F.], seq 146, ack 122, win 219, options [nop,nop,TS val 3463088420 ecr 1805613033], length 0
09:25:08.765493 IP 10.244.2.1.52180 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [F.], seq 122, ack 147, win 229, options [nop,nop,TS val 1805613033 ecr 3463088420], length 0
09:25:08.765506 IP wxjiguang-web-7876d5858-lkqqz.5008 > 10.244.2.1.52180: Flags [.], ack 123, win 219, options [nop,nop,TS val 3463088420 ecr 1805613033], length 0

重点观察链接是如果关闭的, 可以发现,这里是服务端wxjiguang-web-7876d5858-lkqqz.5008 主动关闭的. 而且这里的四次挥手优化成了3次挥手.

如果手动用curl来访问, 客户端default.svc.cluster.local.49536 服务端wxjiguang-web-7876d5858-lkqqz.5008 抓包数据如下

09:54:05.848582 IP default.svc.cluster.local.49536 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [S], seq 3061581492, win 28200, options [mss 1410,sackOK,TS val 1486184019 ecr 0,nop,wscale 7], length 0
09:54:05.848605 IP wxjiguang-web-7876d5858-lkqqz.5008 > default.svc.cluster.local.49536: Flags [S.], seq 1624739138, ack 3061581493, win 27960, options [mss 1410,sackOK,TS val 2084452184 ecr 1486184019,nop,wscale 7], length 0
09:54:05.848632 IP default.svc.cluster.local.49536 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [.], ack 1, win 221, options [nop,nop,TS val 1486184019 ecr 2084452184], length 0
09:54:05.848686 IP default.svc.cluster.local.49536 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [P.], seq 1:86, ack 1, win 221, options [nop,nop,TS val 1486184019 ecr 2084452184], length 85
09:54:05.848690 IP wxjiguang-web-7876d5858-lkqqz.5008 > default.svc.cluster.local.49536: Flags [.], ack 86, win 219, options [nop,nop,TS val 2084452184 ecr 1486184019], length 0
09:54:08.377719 IP 10.244.2.1.41166 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [S], seq 2524592419, win 28200, options [mss 1410,sackOK,TS val 1807352610 ecr 0,nop,wscale 7], length 0
09:54:08.377750 IP wxjiguang-web-7876d5858-lkqqz.5008 > 10.244.2.1.41166: Flags [S.], seq 867577693, ack 2524592420, win 27960, options [mss 1410,sackOK,TS val 3464827997 ecr 1807352610,nop,wscale 7], length 0
09:54:08.377776 IP 10.244.2.1.41166 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [.], ack 1, win 221, options [nop,nop,TS val 1807352610 ecr 3464827997], length 0
09:54:08.377929 IP 10.244.2.1.41166 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [P.], seq 1:122, ack 1, win 221, options [nop,nop,TS val 1807352610 ecr 3464827997], length 121
09:54:08.377935 IP wxjiguang-web-7876d5858-lkqqz.5008 > 10.244.2.1.41166: Flags [.], ack 122, win 219, options [nop,nop,TS val 3464827997 ecr 1807352610], length 0
09:54:08.765302 IP wxjiguang-web-7876d5858-lkqqz.5008 > 10.244.2.1.41080: Flags [P.], seq 1:146, ack 122, win 219, options [nop,nop,TS val 3464828384 ecr 1807346997], length 145
09:54:08.765363 IP 10.244.2.1.41080 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [.], ack 146, win 229, options [nop,nop,TS val 1807352997 ecr 3464828384], length 0
09:54:08.765417 IP wxjiguang-web-7876d5858-lkqqz.5008 > 10.244.2.1.41080: Flags [F.], seq 146, ack 122, win 219, options [nop,nop,TS val 3464828384 ecr 1807352997], length 0
09:54:08.765572 IP 10.244.2.1.41080 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [F.], seq 122, ack 147, win 229, options [nop,nop,TS val 1807352998 ecr 3464828384], length 0
09:54:08.765583 IP wxjiguang-web-7876d5858-lkqqz.5008 > 10.244.2.1.41080: Flags [.], ack 123, win 219, options [nop,nop,TS val 3464828385 ecr 1807352998], length 0
09:54:11.254358 ARP, Request who-has default.svc.cluster.local tell wxjiguang-web-7876d5858-lkqqz, length 28
09:54:11.254571 ARP, Reply default.svc.cluster.local is-at 66:c6:48:d5:ca:4f (oui Unknown), length 28
09:54:11.849135 IP wxjiguang-web-7876d5858-lkqqz.5008 > default.svc.cluster.local.49536: Flags [P.], seq 1:127, ack 86, win 219, options [nop,nop,TS val 2084458184 ecr 1486184019], length 126
09:54:11.849211 IP default.svc.cluster.local.49536 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [.], ack 127, win 221, options [nop,nop,TS val 1486190019 ecr 2084458184], length 0
09:54:11.849404 IP default.svc.cluster.local.49536 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [F.], seq 86, ack 127, win 221, options [nop,nop,TS val 1486190019 ecr 2084458184], length 0
09:54:11.849470 IP wxjiguang-web-7876d5858-lkqqz.5008 > default.svc.cluster.local.49536: Flags [F.], seq 127, ack 87, win 219, options [nop,nop,TS val 2084458184 ecr 1486190019], length 0
09:54:11.849504 IP default.svc.cluster.local.49536 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [.], ack 128, win 221, options [nop,nop,TS val 1486190019 ecr 2084458184], length 0

可以观察到这里是客户端发起关闭的, 从这个方向可以观察到是服务端先发起FIN的

09:54:11.849404 IP default.svc.cluster.local.49536 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [F.], seq 86, ack 127, win 221, options [nop,nop,TS val 1486190019 ecr 2084458184], length 0

问题分析

那么问题来了, 同样的服务端endpoint, 为什么curl发来的请求,TCP链接关闭在客户端, 而kubenetes自带的Liveness和Readiness的Probe探针, 却是在服务端发起关闭链接操作呢?

进一步观察发现,

kubenetes自带的探针请求包头都带了Connection: close的操作,

而使用curl的请求头, 不带Connection: close的请求头.

# curl wxjiguang-web/healthz2 -i
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Date: Sun, 22 Nov 2020 09:54:11 GMT
Content-Length: 4

验证问题

至此, 原因应该比较清晰, 在curl加上close验证是否属实.

# curl wxjiguang-web/healthz2 -i -H "Connection: close"
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Date: Sun, 22 Nov 2020 09:38:16 GMT
Content-Length: 4
Connection: close

抓包得到, 可以观察到现在是服务端wxjiguang-web-7876d5858-lkqqz.5008先发起FIN包, 开始关闭连接操作的.

10:05:10.102735 IP wxjiguang-web-7876d5858-lkqqz.5008 > default.svc.cluster.local.55680: Flags [P.], seq 1:146, ack 105, win 219, options [nop,nop,TS val 2085116424 ecr 1486842259], length 145
10:05:10.102830 IP default.svc.cluster.local.55680 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [.], ack 146, win 229, options [nop,nop,TS val 1486848259 ecr 2085116424], length 0
10:05:10.102894 IP wxjiguang-web-7876d5858-lkqqz.5008 > default.svc.cluster.local.55680: Flags [F.], seq 146, ack 105, win 219, options [nop,nop,TS val 2085116425 ecr 1486848259], length 0
10:05:10.102991 IP default.svc.cluster.local.55680 > wxjiguang-web-7876d5858-lkqqz.5008: Flags [F.], seq 105, ack 147, win 229, options [nop,nop,TS val 1486848260 ecr 2085116425], length 0
10:05:10.103001 IP wxjiguang-web-7876d5858-lkqqz.5008 > default.svc.cluster.local.55680: Flags [.], ack 106, win 219, options [nop,nop,TS val 2085116425 ecr 1486848260], length 0

猜想

kubernetes 这么做的原因, 估计是不愿意在探针发起链接关闭操作, 从而导致大量TIME_WAIT 堆积在集群控制层面,造成资源不足, 影响巨大, 所以把锅甩给了服务Pod自己处理, 因为定时发起的,对服务端来说可以忽略不计.