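We are running out of CPU under high load. It only happens when roughly 1300-1400 users are online (according to Google Analytics). When it does, browsers receive empty responses.
What I find strange is how the shortage happens. There is more than 50% idle CPU, then it suddenly drops to 0% within a few seconds and jumps back.
Here is the "iostat 1" output from that moment. Look at the last column. 70% is the last normal reading.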
http://pastebin.com/sMjQ0AaJ
All PHP-FPM processes in the queue die, leaving these entries in /var/log/messages:
Feb 25 20:20:01 <kern.info> dasaran kernel: pid 36267 (php-fpm),uid 80: exited on signal 11
Feb 25 20:20:01 <kern.info> dasaran kernel: pid 36265 (php-fpm),uid 80: exited on signal 11
Feb 25 20:20:01 <kern.info> dasaran kernel: pid 36263 (php-fpm),uid 80: exited on signal 11
Feb 25 20:20:01 <kern.info> dasaran kernel: pid 36262 (php-fpm),uid 80: exited on signal 11

2013/02/25 20:22:14 [error] 34877#0: accept() failed (53: Software caused connection abort)
2013/02/25 20:22:14 [error] 34877#0: accept() failed (53: Software caused connection abort)
2013/02/25 20:22:14 [error] 34877#0: accept() failed (53: Software caused connection abort)
2013/02/25 20:22:14 [error] 34877#0: accept() failed (53: Software caused connection abort)
2013/02/25 20:22:14 [error] 34874#0: accept() failed (53: Software caused connection abort)
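To line up the crashes with the idle-CPU drops, the kernel entries can be tallied per second. This is only a minimal sketch, assuming the log format quoted above; count_crashes and the regular expression are illustrative names of my own, not part of any existing tool.

```python
import re
from collections import Counter

# Assumed entry format (taken from the kernel lines quoted in this post):
#   Feb 25 20:20:01 <kern.info> dasaran kernel: pid 36267 (php-fpm),uid 80: exited on signal 11
CRASH_RE = re.compile(
    r"(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*?"
    r"pid (?P<pid>\d+) \((?P<proc>[^)]+)\).*?exited on signal (?P<sig>\d+)"
)


def count_crashes(lines, proc="php-fpm", signal=11):
    """Return a Counter mapping each timestamp to the number of matching crashes.

    finditer is used so that several entries pasted onto one line are all counted.
    """
    counts = Counter()
    for line in lines:
        for m in CRASH_RE.finditer(line):
            same_proc = m.group("proc").lower() == proc.lower()
            if same_proc and int(m.group("sig")) == signal:
                counts[m.group("ts")] += 1
    return counts


# Small demonstration on two of the entries quoted above.
sample = [
    "Feb 25 20:20:01 <kern.info> dasaran kernel: pid 36267 (php-fpm),uid 80: exited on signal 11",
    "Feb 25 20:20:01 <kern.info> dasaran kernel: pid 36265 (php-fpm),uid 80: exited on signal 11",
]
for ts, n in count_crashes(sample).items():
    print(ts, n)
```

Against the real file, the same function could be fed an open file handle for /var/log/messages, and the per-second counts compared with the timestamps at which idle CPU hits 0%.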
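There are two things I don't understand about this behavior.
1. If load is causing the CPU shortage, shouldn't idle CPU drop gradually rather than all at once? On the other hand, this does only happen under load.
2. Why does the 0% idle last only a few seconds?
We have tried optimizing the scripts, the server, and the database (which runs on a separate server). That only raised the peak number of users slightly.
Server configuration:
FreeBSD 8.3
Intel® Xeon® E3-1245 quad-core
32 GB ECC RAM
What could cause a problem like this?
What strategy should I use to find the bottleneck?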
UPDATE
Here is the gdb output of bt and dump_bt.
(gdb) dump_bt executor_globals.current_execute_data
[0x801827a58] getSaveHandler() /www/svn/zend-libs/Toktik/Session/Set.php:42
[0x8018278d0] Toktik_Session_Set::getSaveHandler() /www/svn/zend-libs/Toktik/Session/Set.php:59
[0x801827630] Toktik_Session_Set->add("6j6omknh8tbr28358gadtp40s7") /www/svn/zend-libs/Toktik/Session/SaveHandler/PHPredis.php:146
[0x7fffffffc350] Toktik_Session_SaveHandler_PHPredis->write("6j6omknh8tbr28358gadtp40s7","Zend_Auth|a:1:{s:7:"storage";s:7:"3963623";}")
(gdb) bt
#0 0x0000000000695cfe in zend_fetch_var_address_helper_SPEC_CONST_VAR (type=0,execute_data=0x801827a58) at zend_vm_execute.h:4836
#1 0x00000000006961da in ZEND_FETCH_R_SPEC_CONST_VAR_HANDLER (execute_data=0x801827a58) at zend_vm_execute.h:4863
#2 0x0000000000680a01 in execute (op_array=0x80dc8e2c8) at zend_vm_execute.h:410
#3 0x000000000063101d in zend_call_function (fci=0x7fffffffc640,fci_cache=0x7fffffffc320) at /usr/ports/lang/php5/work/php-5.4.10/Zend/zend_execute_API.c:958
#4 0x000000000062fe8a in call_user_function_ex (function_table=0x80185e060,object_pp=0x0,function_name=0x80dc99b78,retval_ptr_ptr=0x7fffffffc6e8,param_count=2,params=0x80dadcab0,no_separation=1,symbol_table=0x0) at /usr/ports/lang/php5/work/php-5.4.10/Zend/zend_execute_API.c:750
#5 0x000000000062fcbd in call_user_function (function_table=0x80185e060,retval_ptr=0x80dae2670,params=0x7fffffffc7a0) at /usr/ports/lang/php5/work/php-5.4.10/Zend/zend_execute_API.c:723
#6 0x0000000803cc924f in ps_call_handler () from /usr/local/lib/php/20100525-debug/session.so
#7 0x0000000803cc9924 in ps_write_user () from /usr/local/lib/php/20100525-debug/session.so
#8 0x0000000803cbf4a8 in php_session_save_current_state () from /usr/local/lib/php/20100525-debug/session.so
#9 0x0000000803cc3d06 in php_session_flush () from /usr/local/lib/php/20100525-debug/session.so
#10 0x0000000803cc5cd9 in zm_deactivate_session () from /usr/local/lib/php/20100525-debug/session.so
#11 0x000000000064f121 in zend_deactivate_modules () at /usr/ports/lang/php5/work/php-5.4.10/Zend/zend_API.c:2335
#12 0x00000000005b8aea in php_request_shutdown (dummy=0x0) at /usr/ports/lang/php5/work/php-5.4.10/main/main.c:1759
#13 0x000000000079ec06 in main (argc=1,argv=0x7fffffffed58) at /usr/ports/lang/php5/work/php-5.4.10/sapi/fpm/fpm/fpm_main.c:1948
Here are the classes that handle session management using phpredis (the ones dump_bt points to):
http://pastebin.com/kaRNXGCa