linux – Elastic Beanstalk日志轮换导致Apache重启

我已经完成了 AWS Elastic Beanstalk – Apache is restarting constantly

我们的弹性beanstalk实例在error_log中报告以下消息

[Mon Jun 26 22:01:01.878892 2017] [mpm_prefork:notice] [pid 8595] AH00173: SIGHUP received.  Attempting to restart
*** Error in (wsgi:wsgi)    ': double free or corruption (out): 0x00007f564cced560 ***

有时错误序列看起来更像是这样：

[Tue Jun 27 00:01:01.215260 2017] [:error] [pid 6429] [remote XX.XXX.XX.195:29773] mod_wsgi (pid=6429): Exception occurred processing WSGI script '/opt/python/current/app/site/settings/wsgi/__init__.py'.
[Tue Jun 27 00:01:01.215320 2017] [:error] [pid 6429] [remote XX.XXX.XX.195:29773] OSError: Failed to write data
[Tue Jun 27 00:01:01.222407 2017] [:error] [pid 6430] [remote XX.XXX.XX.60:53313] mod_wsgi (pid=6430): Exception occurred processing WSGI script '/opt/python/current/app/site/settings/wsgi/__init__.py'.
[Tue Jun 27 00:01:01.222460 2017] [:error] [pid 6430] [remote XX.XXX.XX.60:53313] OSError: Failed to write data
[Tue Jun 27 00:01:04.554810 2017] [core:warn] [pid 8595] AH00045: child process 7614 still did not exit,sending a SIGTERM
[Tue Jun 27 00:01:04.554850 2017] [core:warn] [pid 8595] AH00045: child process 7615 still did not exit,sending a SIGTERM
[Tue Jun 27 00:01:05.555958 2017] [mpm_prefork:notice] [pid 8595] AH00173: SIGHUP received.  Attempting to restart
*** Error in (wsgi:wsgi)    ': double free or corruption (out): 0x00007f5640cae900 ***
*** Error in (wsgi:wsgi)    ': double free or corruption (out): 0x00007f78649b7970 ***

几乎每个小时都会这样.常见的信息是：

[Mon Jun 26 22:01:01.878892 2017] [mpm_prefork:notice] [pid 8595] AH00173: SIGHUP received.  Attempting to restart

我查找了mpm_prefork模块conf块…并且没有一个,所以所有默认值都在使用.

我查找了由弹性beanstalk执行的logrotation命令

/var/log/httpd/* {
size 10M
missingok
notifempty
rotate 5
sharedscripts
compress
dateext
dateformat -%s
create
postrotate
    /sbin/service httpd reload > /dev/null 2>/dev/null || true
endscript
olddir /var/log/httpd/rotated
}

很标准的东西.我对重装的理解是尝试优雅重启……

我可以通过执行sudo apachectl -k restart手动触发错误消息,虽然我无法找到在日志轮换期间运行的位置.

我们有下游服务似乎在此服务器挂起它的所有连接时抛出异常.

所以我的问题是,在logrotate期间还有什么可能导致mpm_prefork中的SIGHUP？据我所知,这不应该在错误条件之外发生.

Apache / 2.4.18(亚马逊)mod_wsgi / 3.5 Python / 3.4.3

解决方法

简而言之,看起来当前的Elastic Beanstalk日志转换配置被破坏,导致服务停机,504网关超时.让我们来看看.

再生产

我们创建最简单的Python WSGI应用程序.

application.py

import time


def application(environ,start_response):
    # somewhat realistic response duration
    time.sleep(0.5)

    status = '200 OK'
    response_headers = [('Content-type','text/plain')]
    start_response(status,response_headers)
    return [b'Hello world!\n']

将其压缩到application.zip.然后创建Elastic Beanstalk Python应用程序和环境,上传档案.确保使用您拥有的密钥对.保留其他设置默认值.等到它完成(几分钟).

ssh到底层的EC2实例中(参见EB日志中的实例标识符).输入(httpd的logrotate post-action,见下文)：

sudo /sbin/service httpd reload

然后在你的机器上运行：

siege -v -b -c 10 -t 10S http://your-test-eb.you-aws-region.elasticbeanstalk.com/

当它运行时,重复重载命令几次.

然后你会看到如下内容：

** SIEGE 3.0.8
** Preparing 10 concurrent users for battle.
The server is now under siege...
HTTP/1.1 200   0.63 secs:      13 bytes ==> GET  /
HTTP/1.1 200   0.65 secs:      13 bytes ==> GET  /
HTTP/1.1 200   0.64 secs:      13 bytes ==> GET  /
HTTP/1.1 200   0.60 secs:      13 bytes ==> GET  /
...

这是重装时会发生什么.

HTTP/1.1 504   0.06 secs:       0 bytes ==> GET  /
HTTP/1.1 504   0.07 secs:       0 bytes ==> GET  /
HTTP/1.1 504   0.08 secs:       0 bytes ==> GET  /
HTTP/1.1 504   0.10 secs:       0 bytes ==> GET  /
HTTP/1.1 504   0.11 secs:       0 bytes ==> GET  /
HTTP/1.1 504   0.66 secs:       0 bytes ==> GET  /
HTTP/1.1 504   0.19 secs:       0 bytes ==> GET  /
HTTP/1.1 504   0.20 secs:       0 bytes ==> GET  /
HTTP/1.1 504   0.09 secs:       0 bytes ==> GET  /

然后它恢复了.

HTTP/1.1 200   1.25 secs:      13 bytes ==> GET  /
HTTP/1.1 200   1.24 secs:      13 bytes ==> GET  /
HTTP/1.1 200   1.26 secs:      13 bytes ==> GET  /
...

Lifting the server siege..      done.

Transactions:                 75 hits
Availability:              81.52 %
Elapsed time:               9.40 secs
Data transferred:           0.00 MB
Response time:              1.21 secs
Transaction rate:           7.98 trans/sec
Throughput:             0.00 MB/sec
Concurrency:                9.68
Successful transactions:      75
Failed transactions:          17
Longest transaction:        4.27
Shortest transaction:       0.06

请注意,ELB似乎对问题没有任何影响,并且可以通过两个SSH会话复制到底层EC2并且(Amazon AMI没有围攻)：

ab -v 4 -c 10 -t 10 http://your-test-eb.you-aws-region.elasticbeanstalk.com/

原因

/etc/cron.hourly/cron.logrotate.elasticbeanstalk.httpd.conf

#!/bin/sh
test -x /usr/sbin/logrotate || exit 0
/usr/sbin/logrotate /etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.httpd.conf

/etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.httpd.conf

/var/log/httpd/* {
size 10M
missingok
notifempty
rotate 5
sharedscripts
compress
dateext
dateformat -%s
create
postrotate
    /sbin/service httpd reload > /dev/null 2>/dev/null || true
endscript
olddir /var/log/httpd/rotated
}

请注意postrotate. / sbin / service只是/etc/init.d/中脚本的System V包装器.它的手册页说：

service runs a System V init script in as predictable environment as possible,removing most environment variables and with
current working directory set to /.

请注意,重新加载不是标准的Apache maintenance command.这是发行版的下游添加.让我们看一下init脚本/etc/init.d/httpd.相关部分如下：

reload() {
        echo -n $"Reloading $prog: "
        check13 || exit 1
        killproc -p ${pidfile} $httpd -HUP
        RETVAL=$?
        echo
}

如您所见,它向Apache发送HUP信号,解释为Restart Now：

Sending the HUP or restart signal to the parent causes it to kill off its children like in TERM,but the parent doesn’t exit. It re-reads its configuration files,and re-opens any log files. Then it spawns a new set of children and continues serving hits.

TERM很好地解释了504s.但它可能应该如何完成是Graceful Restart,因为它也重新打开日志但不终止正在提供的请求：

The USR1 or graceful signal causes the parent process to advise the children to exit after their current request (or to exit immediately if they’re not serving anything). The parent re-reads its configuration files and re-opens its log files. As each child dies off the parent replaces it with a child from the new generation of the configuration,which begins serving new requests immediately.

…

The code was written to both minimize the time in which the server is unable to serve new requests (they will be queued up by the operating system,so they’re not lost in any event) and to respect your tuning parameters.

解决方法

可以使用.ebextensions替换/etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.httpd.conf.在根目录中创建
.ebextensions / 10_logs.config包含以下内容(基本上将“reload”替换为“graceful”)：

files:
    "/etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.httpd.conf":
        mode: "000644"
        owner: root
        group: root
        content: |
            /var/log/httpd/* {
                size 10M
                missingok
                notifempty
                rotate 5
                sharedscripts
                compress
                dateext
                dateformat -%s
                create
                postrotate
                    /sbin/service httpd graceful > /dev/null 2>/dev/null || true
                endscript
                olddir /var/log/httpd/rotated
            }

并重新部署您的Elastic Beanstalk环境.请注意,然而随后的亚秒优雅重启,我能够(零星地)产生503服务不可用,但是,对于日志轮换不是这样,因为均匀间隔的优雅重启没有错误.

linux – Elastic Beanstalk日志轮换导致Apache重启

解决方法

猜你在找的Linux相关文章