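I have a Hibernate, Spring, Debian, Tomcat, MySQL stack on a Linode server that serves several clients. It is a Spring multitenant application that hosts web pages for about 30 clients.
The application starts fine, but after a while I get this error: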
java.net.SocketException: Too many open files
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
at java.net.ServerSocket.implAccept(ServerSocket.java:453)
at java.net.ServerSocket.accept(ServerSocket.java:421)
at org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
at java.lang.Thread.run(Thread.java:662)
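However, before this error is thrown, Nagios alerts me that the server has stopped responding to pings.
Previously I had Nginx in front as a proxy. I got the following Nginx errors on every request and had to restart Tomcat: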
2014/04/21 12:31:28 [error] 2259#0: *2441630 no live upstreams while connecting to upstream,client: 66.249.64.115,server: abril,request: "GET /catalog.do?op=requestPage&selectedPage=-195&category=2&offSet=-197&page=-193&searchBox= HTTP/1.1",upstream: "http://appcluster/catalog.do?op=requestPage&selectedPage=-195&category=2&offSet=-197&page=-193&searchBox=",host: "www.anabocafe.com"
2014/04/21 12:31:40 [error] 2259#0: *2441641 upstream timed out (110: Connection timed out) while reading response header from upstream,client: 200.74.195.61,request: "GET / HTTP/1.1",upstream: "http://127.0.0.1:8080/",host: "www.oli-med.com"
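This is my server.xml Connector configuration:
I tried changing the ulimit by following this tutorial. I was able to change the hard limit on open file descriptors for the user running Tomcat, but that did not fix the problem; the application still hangs.
The last time I restarted the server it ran for about 3 hours, and I got this value for the open socket connections: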
lsof -p TOMCAT_PID | wc -l
632 (more or less; I did not write down the exact number)
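Then this number suddenly starts to grow.
I have some applications very similar to this one on other servers. The difference is that those are standalone versions, while this one uses a multitenant architecture. I noticed that this application gets the following kind of socket connections, which do not show up on any of the standalone installations: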
java 11506 root 646u IPv6 136862 0t0 TCP lixxx-xxx.members.linode.com:www->180.76.6.16:49545 (ESTABLISHED)
java 11506 root 647u IPv6 136873 0t0 TCP lixxx-xxx.members.linode.com:www->50.31.164.139:37734 (CLOSE_WAIT)
java 11506 root 648u IPv6 135889 0t0 TCP lixxx-xxx.members.linode.com:www->ec2-54-247-188-179.eu-west-1.compute.amazonaws.com:28335 (CLOSE_WAIT)
java 11506 root 649u IPv6 136882 0t0 TCP lixxx-xxx.members.linode.com:www->ec2-54-251-34-67.ap-southeast-1.compute.amazonaws.com:19023 (CLOSE_WAIT)
java 11506 root 650u IPv6 136884 0t0 TCP lixxx-xxx.members.linode.com:www->crawl-66-249-75-113.googlebot.com:39665 (ESTABLISHED)
java 11506 root 651u IPv6 136886 0t0 TCP lixxx-xxx.members.linode.com:www->190.97.240.116.viginet.com.ve:1391 (ESTABLISHED)
java 11506 root 652u IPv6 136887 0t0 TCP lixxx-xxx.members.linode.com:www->ec2-50-112-95-211.us-west-2.compute.amazonaws.com:19345 (ESTABLISHED)
java 11506 root 653u IPv6 136889 0t0 TCP lixxx-xxx.members.linode.com:www->ec2-54-248-250-232.ap-northeast-1.compute.amazonaws.com:51153 (ESTABLISHED)
java 11506 root 654u IPv6 136897 0t0 TCP lixxx-xxx.members.linode.com:www->baiduspider-180-76-5-149.crawl.baidu.com:31768 (ESTABLISHED)
java 11506 root 655u IPv6 136898 0t0 TCP lixxx-xxx.members.linode.com:www->msnbot-157-55-32-60.search.msn.com:35100 (ESTABLISHED)
java 11506 root 656u IPv6 136900 0t0 TCP lixxx-xxx.members.linode.com:www->50.31.164.139:47511 (ESTABLISHED)
java 11506 root 657u IPv6 135924 0t0 TCP lixxx-xxx.members.linode.com:www->ec2-184-73-237-85.compute-1.amazonaws.com:28206 (ESTABLISHED)
I guess they are some kind of automated connections.
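To see whether sockets are piling up in one state (for example CLOSE_WAIT, which would suggest the application side is not closing them), the lsof lines can be tallied by their trailing state field. This is only a rough sketch, assuming Java 8 or later and lsof lines shaped like the ones above; the class name `SocketStateCounter` and its method are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts TCP socket states in captured `lsof -p PID` output.
class SocketStateCounter {
    // Matches the trailing "(STATE)" field that lsof prints for TCP sockets.
    private static final Pattern STATE = Pattern.compile("\\(([A-Z_0-9]+)\\)\\s*$");

    static Map<String, Integer> countStates(List<String> lsofLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lsofLines) {
            // Skip regular files, pipes, and anything else that is not a TCP socket.
            if (!line.contains(" TCP ")) {
                continue;
            }
            Matcher m = STATE.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "java 11506 root 646u IPv6 136862 0t0 TCP lixxx-xxx.members.linode.com:www->180.76.6.16:49545 (ESTABLISHED)",
            "java 11506 root 647u IPv6 136873 0t0 TCP lixxx-xxx.members.linode.com:www->50.31.164.139:37734 (CLOSE_WAIT)");
        // Prints one count per state found in the sample lines.
        System.out.println(countStates(sample));
    }
}
```

Running it against snapshots taken a few minutes apart should show whether the growth comes from ESTABLISHED connections (real traffic or crawlers) or from CLOSE_WAIT ones.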
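So my question is:
How can I determine whether the problem is caused by my code, the server, or some kind of attack, and which approach would you recommend to figure it out?
Thank you in advance :)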
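I found the way into this problem thanks to the great tool from appdynamics.com, which lets you inspect a large number of metrics under Application Infrastructure Performance.
This post about configuring the JDBC pool for high concurrency helped me:
http://www.tomcatexpert.com/blog/2010/04/01/configuring-jdbc-pool-high-concurrency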
The official documentation also helped:
https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html
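My guess is that each incoming connection started a query, which first killed the server's responsiveness and then used up the operating system's socket limit; on Linux, an open socket is an open file. I hope this helps someone!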
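Because every open socket uses up a file descriptor, the descriptor count can also be watched from inside the JVM rather than with lsof. The following is a minimal sketch, assuming a HotSpot/OpenJDK JVM on a Unix-like system, where the platform MXBean implements `com.sun.management.UnixOperatingSystemMXBean`; the class name `FdUsage` is invented for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: reports how many file descriptors this JVM has open.
class FdUsage {
    /** Returns {open, max} descriptor counts, or null if the JVM does not expose them. */
    static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null; // Not a Unix-like JVM, or a JVM without this extension.
    }

    public static void main(String[] args) {
        long[] usage = openAndMax();
        if (usage == null) {
            System.out.println("File descriptor counts are not available on this JVM");
        } else {
            System.out.println("open=" + usage[0] + " max=" + usage[1]);
        }
    }
}
```

Logging these two numbers periodically from the web application would show how close Tomcat gets to the limit before "Too many open files" appears.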
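EDIT
Hello! This solution fixed the issue in the short term, but another error related to the JDBC connections appeared: the application was not closing its connections. I opened and resolved a ticket about that issue here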
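For anyone hitting the same kind of leak, here is a small sketch of the usual pattern, not the actual fix from that ticket. It assumes Java 7 or later (on Java 6 a `finally` block does the same job) and uses a made-up `FakeConnection` class in place of a real pooled `java.sql.Connection`, so it runs without a database.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: shows how a skipped close() leaks a connection on failure.
class ConnectionLeakDemo {
    // Number of stand-in connections currently open.
    static final AtomicInteger OPEN = new AtomicInteger();

    /** Made-up stand-in for a pooled JDBC connection. */
    static class FakeConnection implements AutoCloseable {
        FakeConnection() {
            OPEN.incrementAndGet();
        }

        void query(boolean fail) {
            if (fail) {
                throw new IllegalStateException("simulated query failure");
            }
        }

        @Override
        public void close() {
            OPEN.decrementAndGet();
        }
    }

    /** Leaky: close() is never reached when query() throws. */
    static void leaky(boolean fail) {
        FakeConnection c = new FakeConnection();
        c.query(fail);
        c.close();
    }

    /** Safe: try-with-resources closes the connection on every path. */
    static void safe(boolean fail) {
        try (FakeConnection c = new FakeConnection()) {
            c.query(fail);
        }
    }
}
```

With real JDBC the same `try (...)` form applies to `Connection`, `Statement` and `ResultSet`, since all three implement `AutoCloseable`; closing a pooled connection returns it to the pool instead of leaving it checked out.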