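I am using an EC2 instance running Amazon Linux (with the Amazon DNS server settings obtained via DHCP) together with an RDS database. The EC2 instance sits behind an ELB and receives high traffic. The application is written in PHP.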
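The problem is that when PHP tries to connect to the RDS database, it sometimes returns the following error:
It does not happen often, but occasionally it gets much worse, and I receive thousands of error events with this message.
Any suggestions for diagnosing the problem? I am considering dumping all DNS traffic to a file and inspecting it, but the server handles so much traffic that it would be hard to trace anything in that file.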
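A lighter alternative to capturing all DNS traffic is to resolve the RDS endpoint on a fixed schedule and log only the failures and slow lookups, with timestamps that can be lined up against the PHP error log. This is a minimal sketch, assuming Python 3 is available on the instance; `probe` and `watch` are hypothetical helper names, and the 1-second "slow" threshold and port 3306 (the MySQL default) are illustrative choices, not values from the question.

```python
from __future__ import annotations

import socket
import sys
import time
from datetime import datetime, timezone


def probe(hostname: str, port: int = 3306) -> tuple[bool, float, str]:
    """Resolve hostname once via getaddrinfo; return (ok, elapsed_seconds, detail)."""
    start = time.monotonic()  # monotonic clock, so wall-clock jumps don't distort timings
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
        addresses = sorted({info[4][0] for info in infos})
        return True, time.monotonic() - start, ",".join(addresses)
    except socket.gaierror as exc:
        return False, time.monotonic() - start, f"{exc.errno}: {exc.strerror}"


def watch(hostname: str, interval: float = 1.0, count: int | None = None,
          slow_after: float = 1.0) -> int:
    """Probe repeatedly; print only failures and slow lookups; return the failure count."""
    failures = 0
    done = 0
    while count is None or done < count:
        ok, elapsed, detail = probe(hostname)
        if not ok or elapsed > slow_after:
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            status = "FAIL" if not ok else "SLOW"
            print(f"{stamp} {status} {elapsed:.3f}s {detail}", flush=True)
        if not ok:
            failures += 1
        done += 1
        if count is None or done < count:
            time.sleep(interval)
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    watch(sys.argv[1])  # runs until interrupted
```

Run it as `python3 dnsprobe.py <rds-endpoint>` (the file name is arbitrary). Because it goes through the system resolver (getaddrinfo), which is typically what PHP uses for hostname lookups too, FAIL lines clustered at the same times as the PHP errors would point at name resolution rather than at the database itself.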
Output of netstat -s:
- Ip:
- 197171459 total packets received
- 1 with invalid addresses
- 0 forwarded
- 0 incoming packets discarded
- 197171458 incoming packets delivered
- 175015443 requests sent out
- Icmp:
- 12528 ICMP messages received
- 0 input ICMP message Failed.
- ICMP input histogram:
- destination unreachable: 188
- echo requests: 12340
- 12559 ICMP messages sent
- 0 ICMP messages Failed
- ICMP output histogram:
- destination unreachable: 219
- echo replies: 12340
- IcmpMsg:
- InType3: 188
- InType8: 12340
- OutType0: 12340
- OutType3: 219
- Tcp:
- 5231380 active connections openings
- 3978862 passive connection openings
- 881 Failed connection attempts
- 6420 connection resets received
- 17 connections established
- 191630575 segments received
- 200105352 segments send out
- 2797151 segments retransmited
- 0 bad segments received.
- 6910 resets sent
- Udp:
- 5577451 packets received
- 219 packets to unknown port received.
- 0 packet receive errors
- 5577700 packets sent
- UdpLite:
- TcpExt:
- 172 invalid SYN cookies received
- 808 resets received for embryonic SYN_RECV sockets
- 7176788 TCP sockets finished time wait in fast timer
- 507 packets rejects in established connections because of timestamp
- 448055 delayed acks sent
- 2927 delayed acks further delayed because of locked socket
- Quick ack mode was activated 2433 times
- 94865861 packets directly queued to recvmsg prequeue.
- 16611185 packets directly received from backlog
- 54150864749 packets directly received from prequeue
- 2158966 packets header predicted
- 79141174 packets header predicted and directly queued to user
- 40780030 acknowledgments not containing data received
- 56946553 predicted acknowledgments
- 84 times recovered from packet loss due to SACK data
- Detected reordering 4 times using FACK
- Detected reordering 11 times using SACK
- Detected reordering 69 times using time stamp
- 70 congestion windows fully recovered
- 1241 congestion windows partially recovered using Hoe heuristic
- TCPDSACKUndo: 13
- 2491 congestion windows recovered after partial ack
- 0 TCP data loss events
- 220 timeouts after SACK recovery
- 104 fast retransmits
- 99 forward retransmits
- 7 retransmits in slow start
- 2792531 other TCP timeouts
- 22 times receiver scheduled too late for direct processing
- 2423 DSACKs sent for old packets
- 2785871 DSACKs received
- 5162 connections reset due to unexpected data
- 921 connections reset due to early user close
- 135 connections aborted due to timeout
- TCPDSACKIgnoredOld: 533
- TCPDSACKIgnoredNoUndo: 393
- TCPSackShifted: 477
- TCPSackMerged: 536
- TCPSackShiftFallback: 2709
- TCPBacklogDrop: 46
- TCPDeferAcceptDrop: 3906058
- IpExt:
- InOctets: 69400712361
- OutOctets: 94841399143
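A single `netstat -s` dump is cumulative since boot, so it says little about what happened during an error burst. One option is to take two snapshots (for example one before and one during a burst) and compare them. This is a minimal sketch, assuming raw `netstat -s` text with its usual indentation (top-level headers such as `Udp:` unindented, counters indented); `parse_netstat_s` and `diff` are hypothetical names, and each counter is keyed by its section plus its label with the number replaced by `#`.

```python
from __future__ import annotations

import re

# First standalone integer on a line; skips digits embedded in names such as "InType3".
_VALUE = re.compile(r"\b\d+\b")


def parse_netstat_s(text: str) -> dict[str, int]:
    """Turn `netstat -s` text into {"Section/label with # for the number": value}."""
    counters: dict[str, int] = {}
    section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not raw[0].isspace() and line.endswith(":"):
            section = line[:-1]  # top-level header such as "Udp:"
            continue
        match = _VALUE.search(line)
        if match is None:
            continue  # indented sub-header, e.g. "ICMP input histogram:"
        label = line[: match.start()] + "#" + line[match.end():]
        counters[f"{section}/{label}"] = int(match.group())
    return counters


def diff(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Counters present in both snapshots that changed, as after - before."""
    return {k: after[k] - before[k] for k in after if k in before and after[k] != before[k]}
```

A `Udp/# packet receive errors` value that grows between snapshots would suggest UDP packets (DNS replies included) are being dropped on the host, while a counter that stays flat makes local drops less likely.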
Solution
There is a known AWS bug that causes DNS resolution to fail occasionally:
https://forums.aws.amazon.com/thread.jspa?messageID=330465#330465
You may want to test with persistent connections, since they reduce how often DNS resolution has to be performed.
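A local DNS cache (for example pdns-recursor) will reduce the frequency further, but RDS hostname records have a very short TTL (60 seconds), so the problem will happen much less often but will still occur a few times a day.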
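The effect of the 60-second TTL can be worked out directly: a cache issues at most one upstream query per hostname per TTL window, so a busy server goes from one lookup per new connection to at most 86400 / 60 = 1440 lookups per day per hostname. An occasional resolver failure can still hit one of those queries, which fits the "still a few times a day" observation. The sketch below models only that arithmetic with a hypothetical in-process cache (`TTLResolverCache`, with an injectable resolver and clock so it can run without a network); it is not how pdns-recursor works internally, and it uses a fixed TTL instead of the one in the DNS answer. The endpoint name is a made-up placeholder.

```python
from __future__ import annotations

import socket
import time
from typing import Callable


class TTLResolverCache:
    """Hypothetical cache: at most one real lookup per hostname per `ttl` seconds."""

    def __init__(self, ttl: float = 60.0,
                 resolve: Callable[[str], str] = socket.gethostbyname,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._resolve = resolve
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}  # hostname -> (address, expires_at)
        self.lookups = 0  # number of real lookups attempted

    def get(self, hostname: str) -> str:
        now = self._clock()
        cached = self._entries.get(hostname)
        if cached is not None and cached[1] > now:
            return cached[0]  # still fresh: no DNS query
        self.lookups += 1
        address = self._resolve(hostname)  # may raise socket.gaierror, like the real failures
        self._entries[hostname] = (address, now + self.ttl)
        return address


if __name__ == "__main__":
    now = [0.0]
    cache = TTLResolverCache(ttl=60, resolve=lambda h: "10.0.0.1", clock=lambda: now[0])
    for second in range(86_400):  # one request per second for a day
        now[0] = float(second)
        cache.get("mydb.example.rds.amazonaws.com")  # hypothetical endpoint
    print(cache.lookups)  # 1440
```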