Old host: carp
Sun Fire X4150 with 8 cores, 32 GB RAM
SLES 9 SP4
Network driver: e1000
me@carp:~> uname -a
Linux carp 2.6.5-7.308-smp #1 SMP Mon Dec 10 11:36:40 UTC 2007 x86_64 x86_64 x86_64 GNU/Linux
New host: pepper
HP ProLiant DL360p Gen8 with 8 cores, 64 GB RAM
CentOS 6.3
Network driver: tg3
me@pepper:~> uname -a
Linux pepper 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
I'll jump right to some graphs illustrating the read/write tests. Here's pepper and its unbalanced read/write:
and here's carp, looking just fine:
Tests
Here are the read/write tests I'm running. I run them separately and they look great on pepper, but when run together (using &), write performance holds steady while read performance suffers badly. The test file is twice the size of RAM (128 GB for pepper, 64 GB for carp).
# write
time dd if=/dev/zero of=/mnt/peppershare/testfile bs=65536 count=2100000 &
# read
time dd if=/mnt/peppershare/testfile2 of=/dev/null bs=65536 &
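As a sanity check on the sizing, here's a minimal wrapper sketch for kicking both off together and waiting for them; the paths and block counts mirror the commands above and are illustrative, not a separate benchmark:

#!/bin/bash
# Hypothetical wrapper around the tests above.
# 2,100,000 blocks x 64 KiB/block ~= 137 GB, i.e. roughly twice pepper's 64 GB of RAM.
time dd if=/dev/zero of=/mnt/peppershare/testfile bs=65536 count=2100000 &   # write
time dd if=/mnt/peppershare/testfile2 of=/dev/null bs=65536 &                # read
wait   # block until both background dd processes finish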
The NFS server's hostname is nfsc. The Linux clients have a dedicated NIC on a subnet separate from everything else (i.e. a different subnet from the primary IP). Each Linux client mounts an NFS share from server nfsc at /mnt/hostnameshare.
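For context, a rough sketch of what that mount looks like on pepper; the export path and options are taken from the /proc/mounts output further down, so treat this as illustrative rather than the exact command we ran:

# Illustrative mount of the pepper share from nfsc (options as seen in /proc/mounts below).
sudo mkdir -p /mnt/peppershare
sudo mount -t nfs -o rw,noatime,nodiratime,vers=3,rsize=65536,wsize=65536,hard,proto=tcp,timeo=600 \
    nfsc:/vol/pg003 /mnt/peppershare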
nfsiostat
A 1-minute sample taken during the simultaneous test on pepper:
me@pepper:~> nfsiostat 60

nfsc:/vol/pg003 mounted on /mnt/peppershare:

   op/s         rpc bklog
1742.37            0.00

read:    ops/s      kB/s        kB/op    retrans    avg RTT (ms)   avg exe (ms)
        49.750    3196.632     64.254   0 (0.0%)       9.304         26.406
write:   ops/s      kB/s        kB/op    retrans    avg RTT (ms)   avg exe (ms)
      1642.933  105628.395     64.293   0 (0.0%)       3.189      86559.380
I don't have nfsiostat for the old host carp yet, but I'm working on it.
/proc/mounts
me@pepper:~> cat /proc/mounts | grep peppershare
nfsc:/vol/pg003 /mnt/peppershare nfs rw,noatime,nodiratime,vers=3,rsize=65536,wsize=65536,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.x.x.x,mountvers=3,mountport=4046,mountproto=tcp,local_lock=none,addr=172.x.x.x 0 0

me@carp:~> cat /proc/mounts | grep carpshare
nfsc:/vol/pg008 /mnt/carpshare nfs rw,v3,rsize=32768,wsize=32768,timeo=60000,retrans=3,tcp,lock,addr=nfsc 0 0
NIC settings
me@pepper:~> sudo ethtool eth3
Settings for eth3:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 4
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: off
        Supports Wake-on: g
        Wake-on: g
        Current message level: 0x000000ff (255)
        Link detected: yes

me@carp:~> sudo ethtool eth1
Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: g
        Current message level: 0x00000007 (7)
        Link detected: yes
Offload settings:
me@pepper:~> sudo ethtool -k eth3
Offload parameters for eth3:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off

me@carp:~> # sudo ethtool -k eth1
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
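If toggling offloads for a test run is worth trying, this is roughly how I'd do it on pepper's NFS-facing NIC (illustrative only; eth3 and the specific offloads to disable are assumptions, and the change doesn't persist across reboots):

# Turn off segmentation/receive offloads for one test pass, then re-check the settings.
sudo ethtool -K eth3 tso off gso off gro off
sudo ethtool -k eth3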
It's all on a LAN, with a full-duplex gigabit switch between the NFS clients and the NFS server. On a side note, I see a lot more CPU iowait on pepper than on carp, which is expected, since I suspect it's waiting on NFS operations.
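For what it's worth, this is how I've been watching the iowait difference while the dd tests run; nothing exotic, just the standard tools:

# 'wa' column in vmstat and '%iowait' in iostat show CPU time spent waiting on I/O.
vmstat 5
iostat -c 5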
I took packet captures with Wireshark/Ethereal, but I'm not strong in that area, so I'm not sure what to look for. I don't see a bunch of packets highlighted in red/black in Wireshark, so that's about all I looked for :). This poor NFS performance is showing up in our Postgres environment.
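For anyone who wants to reproduce the capture, it was roughly along these lines; the interface name and the Wireshark filters are just what I'd reach for first, not a definitive analysis recipe:

# Capture NFS traffic (port 2049) on the dedicated NFS NIC; -s 0 grabs full packets.
sudo tcpdump -i eth3 -s 0 -w /tmp/nfs-pepper.pcap port 2049
# In Wireshark, display filters like "nfs" or "tcp.analysis.retransmission"
# are a reasonable starting point for spotting retransmits and slow RPCs.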
Any other ideas or troubleshooting tips? Let me know if I can provide any further information.
UPDATE
Per @ewwhite's comment, I tried two different tuned-adm profiles, but there was no change.
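For completeness, switching profiles between runs looks like this (tuned-adm ships with the tuned package on CentOS 6):

# Apply each profile in turn and confirm which one is active before re-running the tests.
sudo tuned-adm profile throughput-performance
sudo tuned-adm active
sudo tuned-adm profile enterprise-storage
sudo tuned-adm active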
To the right of my red mark are two more tests. The first is with the throughput-performance profile and the second with enterprise-storage.
nfsiostat 60 with the enterprise-storage profile
nfsc:/vol/pg003 mounted on /mnt/peppershare:

   op/s         rpc bklog
1758.65            0.00

read:    ops/s      kB/s        kB/op    retrans    avg RTT (ms)   avg exe (ms)
        51.750    3325.140     64.254   0 (0.0%)       8.645         24.816
write:   ops/s      kB/s        kB/op    retrans    avg RTT (ms)   avg exe (ms)
      1655.183  106416.517     64.293   0 (0.0%)       3.141     159500.441
UPDATE 2
Solution
You can see that I marked the various "block" sizes I tested, i.e. the rsize/wsize buffer-size mount options. Surprisingly, the 8k size gave the best throughput for the dd tests (a sketch of that sweep follows the mount options below).
These are the NFS mount options I'm now using, per /proc/mounts:
nfsc:/vol/pg003 /mnt/peppershare nfs rw,sync,rsize=8192,wsize=8192,noac,addr=172.x.x.x 0 0
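The rsize/wsize sweep was essentially remount-and-retest; here's a hypothetical sketch of that loop. The mount options mirror the final mount line above and the sizes are illustrative, so treat it as an outline rather than the exact procedure:

# Remount the share with each buffer size, then rerun the dd tests.
# NFS generally needs a full umount/mount for rsize/wsize changes to take effect.
for size in 8192 16384 32768 65536; do
    sudo umount /mnt/peppershare
    sudo mount -t nfs -o rw,sync,noac,rsize=$size,wsize=$size nfsc:/vol/pg003 /mnt/peppershare
    echo "=== rsize/wsize=$size ==="
    time dd if=/dev/zero of=/mnt/peppershare/testfile bs=65536 count=2100000 &
    time dd if=/mnt/peppershare/testfile2 of=/dev/null bs=65536 &
    wait
done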
FYI, the man page entry for the noac option:
ac / noac

Selects whether the client may cache file attributes. If neither
option is specified (or if ac is specified), the client caches file
attributes.

To improve performance, NFS clients cache file attributes. Every few
seconds, an NFS client checks the server's version of each file's
attributes for updates. Changes that occur on the server in those
small intervals remain undetected until the client checks the server
again. The noac option prevents clients from caching file attributes
so that applications can more quickly detect file changes on the
server.

In addition to preventing the client from caching file attributes, the
noac option forces application writes to become synchronous so that
local changes to a file become visible on the server immediately. That
way, other clients can quickly detect recent writes when they check
the file's attributes.

Using the noac option provides greater cache coherence among NFS
clients accessing the same files, but it extracts a significant
performance penalty. As such, judicious use of file locking is
encouraged instead. The DATA AND METADATA COHERENCE section contains a
detailed discussion of these trade-offs.
I've read mixed opinions on attribute caching around the web, so my only thought is that it's an option that is necessary, or at least plays well, with a NetApp NFS server and/or Linux clients with newer kernels (> 2.6.5). We didn't see this issue on SLES 9 with its 2.6.5 kernel.
I also read mixed opinions on rsize/wsize, and usually you take the default, which is currently 65536 on my systems, but 8192 gave me the best test results. We'll be doing some benchmarks with postgres too, so we'll see how these various buffer sizes perform.
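When we get to the postgres benchmarking, it will probably be something pgbench-shaped; purely illustrative, since the database name, scale, and client counts here are assumptions and not our actual settings:

# Hypothetical pgbench run against a database whose data directory lives on the NFS mount.
pgbench -i -s 100 bench      # initialize a scale-100 test database named "bench"
pgbench -c 8 -T 300 bench    # 8 clients, 5-minute run, reports transactions per second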