Nginx reverse proxy greatly increases worst-case latency

(Edit: partially understood and resolved, see comments)

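I have a setup where Nginx acts as a reverse proxy in front of a CherryPy app server. I'm using ab to compare performance going through Nginx versus hitting CherryPy directly, and noticed that worst-case latency is much worse when going through Nginx: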

$ ab -n 200 -c 10 'http://localhost/noop'
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Finished 200 requests


Server Software:        Nginx
Server Hostname:        localhost
Server Port:            80

Document Path:          /noop
Document Length:        0 bytes

Concurrency Level:      10
Time taken for tests:   3.145 seconds
Complete requests:      200
Failed requests:        0
Write errors:           0
Total transferred:      29600 bytes
HTML transferred:       0 bytes
Requests per second:    63.60 [#/sec] (mean)
Time per request:       157.243 [ms] (mean)
Time per request:       15.724 [ms] (mean, across all concurrent requests)
Transfer rate:          9.19 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:     5   48 211.7     31    3007
Waiting:        5   48 211.7     31    3007
Total:          5   48 211.7     31    3007

Percentage of the requests served within a certain time (ms)
  50%     31
  66%     36
  75%     39
  80%     41
  90%     46
  95%     51
  98%     77
  99%    252
 100%   3007 (longest request)
$ ab -n 200 -c 10 'http://localhost:8080/noop'
This is ApacheBench,http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Finished 200 requests


Server Software:        CherryPy/3.2.0
Server Hostname:        localhost
Server Port:            8080

Document Path:          /noop
Document Length:        0 bytes

Concurrency Level:      10
Time taken for tests:   0.564 seconds
Complete requests:      200
Failed requests:        0
Write errors:           0
Total transferred:      27600 bytes
HTML transferred:       0 bytes
Requests per second:    354.58 [#/sec] (mean)
Time per request:       28.202 [ms] (mean)
Time per request:       2.820 [ms] (mean, across all concurrent requests)
Transfer rate:          47.79 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.7      0      11
Processing:     6   26  23.5     24     248
Waiting:        3   25  23.6     23     248
Total:          6   26  23.4     24     248

Percentage of the requests served within a certain time (ms)
  50%     24
  66%     27
  75%     29
  80%     31
  90%     34
  95%     40
  98%     51
  99%    234
 100%    248 (longest request)

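What could be causing this? The only thing I can think of is that Nginx sends requests to the backend in a different order than they arrived, but that seems implausible.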

The machine is an EC2 c1.medium instance with 2 cores, CherryPy uses a thread pool with 10 threads, and Nginx has worker_connections = 1024.

Update: two more confusing findings:

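> At a given concurrency, sending more requests improves performance. With a concurrency of 40 and 40 requests, I get a median time of 3s and a max of 10.5s; with a concurrency of 40 and 200 requests, I get a median of 38ms (!) and a max of 7.5s. In fact, the total time is less for 200 requests! (6.5s vs. 4.5s for 40). This is all repeatable.
> Monitoring both Nginx worker processes with strace greatly improves their performance, e.g. taking the median time from 3s to 77ms, without noticeably changing their behavior. (I tested with a nontrivial API call and confirmed that strace doesn't change the response, and that all of these performance observations still hold.) This is also repeatable.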

Best answer

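The 3-second worst cases in the first ab run look like packet loss. This is probably the result of some insufficiently sized buffers or resources. Some possible causes, in no particular order: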

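> Too small a listen queue on the backend, resulting in occasional listen queue overflows (Linux is usually configured to simply drop the SYN packet in this case, which makes it indistinguishable from packet loss; see netstat -s | grep listen to find out whether this is the problem).
> A stateful firewall on localhost approaching its limit on the number of states and dropping some random SYN packets because of it.
> The system running out of sockets/local ports because of sockets in the TIME_WAIT state; see this question if you are on Linux.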
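
As a rough illustration of the first check, the kernel counters that netstat -s summarizes can also be read from /proc/net/netstat on Linux. The sketch below parses text in that file's format and picks out the ListenOverflows and ListenDrops counters from the TcpExt group. The function names are made up for this example, and it assumes the usual layout of alternating name and value lines.

```python
def parse_proc_netstat(text):
    """Parse /proc/net/netstat-style text into {"Prefix.Name": int}."""
    counters = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # The file alternates a header line of names with a line of values,
    # both starting with the same "Prefix:" (e.g. "TcpExt:").
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if prefix != value_prefix:
            raise ValueError(f"mismatched lines: {prefix!r} vs {value_prefix!r}")
        for name, number in zip(names.split(), numbers.split()):
            counters[f"{prefix}.{name}"] = int(number)
    return counters


def listen_queue_counters(text):
    """Return the listen-queue counters; 0 if the kernel does not report one."""
    counters = parse_proc_netstat(text)
    return {
        name: counters.get(f"TcpExt.{name}", 0)
        for name in ("ListenOverflows", "ListenDrops")
    }
```

To use it, read /proc/net/netstat once before and once after an ab run and pass each snapshot to listen_queue_counters. If either counter grows during the run, listen queue overflow on some listening socket is a likely suspect; the counters are system-wide, so they do not say which socket overflowed.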

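You will have to examine your OS carefully to find the cause and configure it accordingly. You may also want to follow a network subsystem tuning guide for your OS. Note that EC2 may be a special case here, as there are reports of very limited network performance on EC2 instances.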

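From Nginx's point of view, any solution would be more or less wrong (the problem isn't in Nginx, but in the OS, which can't cope with the load and drops packets). Nevertheless, you can try some tricks to reduce the load on the OS network subsystem: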

> Configure keepalive connections to the backend.
> Configure the backend to listen on a unix domain socket (if your backend supports it), and configure Nginx to proxy requests to it.
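
A minimal sketch of what those two suggestions might look like in the Nginx configuration. The upstream name, the socket path, and the pool size of 16 are assumptions chosen for illustration, not values from the question. The keepalive directive for upstreams requires Nginx 1.1.4 or later, and the backend has to support HTTP/1.1 persistent connections.

```nginx
# Sketch only: upstream name, socket path and pool size are assumptions.
upstream cherrypy_backend {
    # Option A: TCP backend as in the question (CherryPy on port 8080).
    server 127.0.0.1:8080;

    # Option B: unix domain socket instead (hypothetical path).
    # Use one option or the other, not both.
    # server unix:/tmp/cherrypy.sock;

    # Keep up to 16 idle connections to the backend open per worker process.
    keepalive 16;
}

server {
    listen 80;

    location / {
        proxy_pass http://cherrypy_backend;

        # Upstream keepalive needs HTTP/1.1 and a cleared Connection header.
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
```

With keepalive, Nginx reuses established connections instead of opening a new one per request, so far fewer SYN packets reach the backend's listen queue and fewer sockets end up in TIME_WAIT. For option B the backend must be listening on the same socket path; CherryPy has a server.socket_file setting for that purpose.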
