问题描述
错误信息如下:
➜ packetbeat git:(master) ✗ ./packetbeat -c ./packetbeat.yml -e -I redis_xg-bjdev-rediscluster-1_prot-7101_20161222110711_20161222110721.pcap -E packetbeat.protocols.redis.ports=7101 -t 2017/01/11 09:42:37.662148 logp.go:219: INFO Metrics logging every 30s 2017/01/11 09:42:37.661865 beat.go:267: INFO Home path: [/Users/sunfei/workspace/GIT/IdeaProjects/src/github.com/moooofly/beats/packetbeat] Config path: [/Users/sunfei/workspace/GIT/IdeaProjects/src/github.com/moooofly/beats/packetbeat] Data path: [/Users/sunfei/workspace/GIT/IdeaProjects/src/github.com/moooofly/beats/packetbeat/data] Logs path: [/Users/sunfei/workspace/GIT/IdeaProjects/src/github.com/moooofly/beats/packetbeat/logs] 2017/01/11 09:42:37.662756 beat.go:177: INFO Setup Beat: packetbeat; Version: 6.0.0-alpha1 2017/01/11 09:42:37.663073 file.go:45: INFO File output path set to: ./logs 2017/01/11 09:42:37.663096 file.go:46: INFO File output base filename set to: packetbeat 2017/01/11 09:42:37.663105 file.go:49: INFO Rotate every bytes set to: 10240000 2017/01/11 09:42:37.663110 file.go:53: INFO Number of files set to: 7 2017/01/11 09:42:37.663140 outputs.go:106: INFO Activated file as output plugin. 2017/01/11 09:42:37.663409 publish.go:291: INFO Publisher name: sunfeideMacBook-Pro.local 2017/01/11 09:42:37.663797 async.go:63: INFO Flush Interval set to: -1s 2017/01/11 09:42:37.663817 async.go:64: INFO Max Bulk Size set to: -1 2017/01/11 09:42:37.664065 procs.go:79: INFO Process matching disabled 2017/01/11 09:42:37.664290 protos.go:89: INFO registered protocol plugin: amqp 2017/01/11 09:42:37.664309 protos.go:89: INFO registered protocol plugin: http 2017/01/11 09:42:37.664315 protos.go:89: INFO registered protocol plugin: MysqL 2017/01/11 09:42:37.664320 protos.go:89: INFO registered protocol plugin: redis 2017/01/11 09:42:37.665784 beat.go:207: INFO packetbeat start running. 2017/01/11 09:47:45.218365 redis_parse.go:306: ERR Failed to read integer reply: Expected digit value should be Int,but we get -> // 后面的内容输出为自己增加的调试信息 "{\"id\":153,\"email\":\"xiaoxxx@xxxxxxx\",\"work_code\":\"Exxxxxxx\",\"mobile\":186xxxx0925,\"name\":\"谢xxx\",\"walle_id\":77287,\"status\":6,\"pinyin_name\":\"xxj\",\"sex\":1,\"security_level\":60,\"certificate_type\":0,\"certificate_number\":\"420683198908113733\",\"created_at\":1431550029000,\"updated_at\":1449228237000,\"nchr_id\":\"0001A910000000002EQP\"}}" 2017/01/11 09:42:38.430211 sniffer.go:384: INFO Input finish. Processed 40644 packets. Have a nice day! 2017/01/11 09:42:38.430657 util.go:48: INFO flows worker loop stopped 2017/01/11 09:42:38.430709 logp.go:245: INFO Total non-zero values: libbeat.publisher.published_events=8080 tcp.dropped_because_of_gaps=15 redis.unmatched_responses=15 2017/01/11 09:42:38.430722 logp.go:246: INFO Uptime: 909.957024ms 2017/01/11 09:42:38.430728 beat.go:211: INFO packetbeat stopped. ➜ packetbeat git:(master) ✗
能够看到错误信息为
2017/01/11 09:47:45.218365 redis_parse.go:306: ERR Failed to read integer reply: Expected digit value should be Int,but we get -> "{\"id\":153,\"nchr_id\":\"0001A910000000002EQP\"}}"
问题原因
当调用 HMGET
进行多个数据查询时,且应答包含的数据量比较大时,则会遇到 TCP 协议层面的分包回复;若在应答回复的过程中,出现丢包,则会导致数据解析出错;
结论增强:经过深入研究发现,应答内容数据量大并不是充分条件,而是必要条件;导致数据解析出错的根本原因和 TCP 分包时的位置有关,因为发现有些情况下的分包数据并不会导致解析错误的发生);
具体分析如下:
37642 号包:发起 HMGET
请求,查询 key 为 hr-e0acd6e0-4c21-4917-a676-c4fd8094f2aa:user 下的 8 个 filed 的值;
*10 $5 HMGET $44 hr-e0acd6e0-4c21-4917-a676-c4fd8094f2aa:user $5 18370 $5 52708 $1 0 $5 18370 $4 1117 $2 13 $3 153 $3 147
37642 号包:提示 [TCP PrevIoUs segment not captured] 信息,参考 HMGET
请求的内容,结合当前包的数据内容可以看出,该包为 HMGET
应答的最后两个 field 的部分内容(并且截断位置也应该是无规律的);
:{"id":153,"email":"xiaoxxxxx@xxxx","work_code":"Exxxxx","mobile":186xxxx0925,"name":".........","walle_id":77287,"status":6,"pinyin_name":"xxj","sex":1,"security_level":60,"certificate_type":0,"certificate_number":"420683198908113733","created_at":1431550029000,"updated_at":1449228237000,"nchr_id":"0001A910000000002EQP"}} $519 {"userId":147,"userBuList":[3175],"tagsList":[],"userBuRoleDto":[{"id":81524,"bu_id":3175,"bu_name":"............BU","role_id":859,"role_name":".........","user_id":147,"user_name":"......"}],"user":{"id":147,"email":"xinxxxxx@xxxx","work_code":"Exxxx","mobile":186xxxx626,"name":"......","walle_id":56063,"pinyin_name":"jx","security_level":70,"certificate_number":"420106198511242510","created_at":1431550030000,"updated_at":1449228115000,"nchr_id":"0001A910000000002ERE"}}
37648 号包:提示 [TCP Fast Retransmission] 信息,结合数据包内容可以看出,重传数据即为之前判定丢失的数据;
*8 $532 {"userId":18370,"userBuList":[4594],"userBuRoleDto":[{"id":120993,"bu_id":4594,"bu_name":"..................","role_id":924,"role_name":"......","user_id":18370,"user":{"id":18370,"email":"huixxxxx@xxxx","mobile":137xxxxx8281,"walle_id":23156752,"pinyin_name":"yh","security_level":20,"certificate_number":"130984199001023033","created_at":1438657893000,"updated_at":1449228019000,"nchr_id":"0001A910000000013PM5"}} $536 {"userId":52708,"userBuList":[4704],"userBuRoleDto":[{"id":109320,"bu_id":4704,"bu_name":".....................","role_id":867,"user_id":52708,"user_name":"Neil"}],"user":{"id":52708,"email":"lianxxxxx@xxxxx","mobile":186xxxx6367,"name":"Neil","walle_id":123225472,"pinyin_name":"Neil","certificate_number":"370181198109040350","created_at":1476093008000,"updated_at":1477916894000,"nchr_id":"0001B61000000010WI0Q"}} $-1 $532 {"userId":18370,"email":"hui.xxxx@xxxx","mobile":137xxxx8281,"nchr_id":"0001A910000000013PM5"}} $533 {"userId":1117,"userBuList":[4596],"userBuRoleDto":[{"id":89811,"bu_id":4596,"role_id":923,"user_id":1117,"user_name":"........."}],"user":{"id":1117,"email":"xuxxxxxx@xxxxx","mobile":185xxxxx2164,"walle_id":1682514,"pinyin_name":"hxh","certificate_number":"640102199404171817","created_at":1431550294000,"updated_at":1463403636000,"nchr_id":"0001A910000000002GCC"}} $512 {"userId":13,"userBuList":[104],"userBuRoleDto":[{"id":769,"bu_id":104,"bu_name":".........","role_id":865,"role_name":"CEO","user_id":13,"user":{"id":13,"email":"markxxxx@xxx","mobile":134xxxx0180,"walle_id":869105,"pinyin_name":"zxh","security_level":90,"certificate_number":"310103198504094033","created_at":1431550018000,"nchr_id":"0001A910000000002EGU"}} $526 {"userId":153,"userBuList":[3174],"userBuRoleDto":[{"id":53306,"bu_id":3174,"bu_name":"............","role_id":922,"user_id":153,"user"
解决办法
这个是 packetbeat 在 redis 协议解析时的 bug ,目前尚未解决;
其他
在官网论坛上的讨论;