solaris – NFS v3 vs. v4

I was wondering why NFS v4 would be so much faster than NFS v3, and whether there are any parameters on v3 that could be tweaked.

I mount a file system with:

  sudo mount -o 'rw,bg,hard,nointr,rsize=1048576,wsize=1048576,vers=4' toto:/test /test

and then run:

  dd if=/test/file of=/dev/null bs=1024k

I can read 200-400 MB/s, but when I change the version to vers=3, remount, and rerun the dd, I only get 90 MB/s. The file I'm reading is a file cached in memory on the NFS server. Both sides of the connection are Solaris and have 10GbE NICs. I avoid any client-side caching by remounting between all tests. I used dtrace on the server to measure how fast data was being served over NFS. For both v3 and v4 I changed:
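For reference, the whole comparison boils down to a loop like the following sketch (same host toto and mount options as above; the umount between runs is what defeats client-side caching):

  # v4 run
  sudo mount -o 'rw,bg,hard,nointr,rsize=1048576,wsize=1048576,vers=4' toto:/test /test
  dd if=/test/file of=/dev/null bs=1024k
  sudo umount /test

  # v3 run: identical apart from the vers option
  sudo mount -o 'rw,bg,hard,nointr,rsize=1048576,wsize=1048576,vers=3' toto:/test /test
  dd if=/test/file of=/dev/null bs=1024k
  sudo umount /test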

  nfs4_bsize
  nfs3_bsize

from the default 32K up to 1M (on v4 I maxed out at 150 MB/s with 32K).
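For anyone reproducing this: on Solaris these are kernel variables, so changing them means either a live patch with mdb (lost at reboot) or an /etc/system entry. A sketch, assuming the nfs module is loaded (0t prefix = decimal, 0x100000 = 1M):

  # live change via the kernel debugger
  echo 'nfs3_bsize/W 0t1048576' | sudo mdb -kw
  echo 'nfs4_bsize/W 0t1048576' | sudo mdb -kw

  # or persistently, then reboot
  echo 'set nfs:nfs3_bsize=0x100000' | sudo tee -a /etc/system
  echo 'set nfs:nfs4_bsize=0x100000' | sudo tee -a /etc/system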
I have tried tweaking

> nfs3_max_threads
> clnt_max_conns
> nfs3_async_clusters

to improve v3 performance, but no go.
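Those three live in the nfs and rpcmod kernel modules, so the tweaks would look something like the sketch below (the values are illustrative, not recommendations):

  echo 'set nfs:nfs3_max_threads=32' | sudo tee -a /etc/system
  echo 'set rpcmod:clnt_max_conns=8' | sudo tee -a /etc/system
  echo 'set nfs:nfs3_async_clusters=8' | sudo tee -a /etc/system
  # reboot for /etc/system changes to take effect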

On v3, if I run four parallel dd's, the throughput goes down from 90 MB/s to 70-80 MB/s, which leads me to believe the problem is some shared resource; if so, then I'm wondering what it is and whether that resource can be increased.
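The parallel run was essentially the following (four readers against the same remounted file system):

  for i in 1 2 3 4; do
      dd if=/test/file of=/dev/null bs=1024k &
  done
  wait    # aggregate throughput drops to 70-80 MB/s on v3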

The dtrace code used to look at the window sizes:

  #!/usr/sbin/dtrace -s
  #pragma D option quiet
  #pragma D option defaultargs

  inline string ADDR=$$1;

  dtrace:::BEGIN
  {
      TITLE = 10;
      title = 0;
      printf("starting up ...\n");
      self->start = 0;
  }

  tcp:::send, tcp:::receive
  / self->start == 0 /
  {
      walltime[args[1]->cs_cid] = timestamp;
      self->start = 1;
  }

  /* reprint the column header every TITLE rows */
  tcp:::send, tcp:::receive
  / title == 0 &&
    ( ADDR == NULL || args[3]->tcps_raddr == ADDR ) /
  {
      printf("%4s %15s %6s %6s %6s %8s %8s %8s %8s %8s %8s %8s %8s %8s %8s\n",
          "cid","ip","usend","urecd","delta","send","recd","ssz","sscal","rsz","rscal","congw","conthr","flags","retran");
      title = TITLE;
  }

  tcp:::send
  / ( ADDR == NULL || args[3]->tcps_raddr == ADDR ) /
  {
      nfs[args[1]->cs_cid] = 1;  /* this is an NFS thread */
      this->delta = timestamp - walltime[args[1]->cs_cid];
      walltime[args[1]->cs_cid] = timestamp;
      this->flags = "";
      this->flags = strjoin((( args[4]->tcp_flags & TH_FIN ) ? "FIN|" : ""), this->flags);
      this->flags = strjoin((( args[4]->tcp_flags & TH_SYN ) ? "SYN|" : ""), this->flags);
      this->flags = strjoin((( args[4]->tcp_flags & TH_RST ) ? "RST|" : ""), this->flags);
      this->flags = strjoin((( args[4]->tcp_flags & TH_PUSH ) ? "PUSH|" : ""), this->flags);
      this->flags = strjoin((( args[4]->tcp_flags & TH_ACK ) ? "ACK|" : ""), this->flags);
      this->flags = strjoin((( args[4]->tcp_flags & TH_URG ) ? "URG|" : ""), this->flags);
      this->flags = strjoin((( args[4]->tcp_flags & TH_ECE ) ? "ECE|" : ""), this->flags);
      this->flags = strjoin((( args[4]->tcp_flags & TH_CWR ) ? "CWR|" : ""), this->flags);
      this->flags = strjoin((( args[4]->tcp_flags == 0 ) ? "null " : ""), this->flags);
      printf("%5d %14s %6d %6d %6d %8d \ %-8s %8d %6d %8d %8d %8d %12d %s %d \n",
          args[1]->cs_cid%1000, args[3]->tcps_raddr,
          args[3]->tcps_snxt - args[3]->tcps_suna,
          args[3]->tcps_rnxt - args[3]->tcps_rack,
          this->delta/1000,
          args[2]->ip_plength - args[4]->tcp_offset, "",
          args[3]->tcps_swnd, args[3]->tcps_snd_ws,
          args[3]->tcps_rwnd, args[3]->tcps_rcv_ws,
          args[3]->tcps_cwnd, args[3]->tcps_cwnd_ssthresh,
          this->flags, args[3]->tcps_retransmit);
      this->flags = 0;
      title--;
      this->delta = 0;
  }

  /* mirror of the send probe; the received byte count is printed after the "/" */
  tcp:::receive
  / nfs[args[1]->cs_cid] && ( ADDR == NULL || args[3]->tcps_raddr == ADDR ) /
  {
      this->delta = timestamp - walltime[args[1]->cs_cid];
      walltime[args[1]->cs_cid] = timestamp;
      this->flags = "";
      this->flags = strjoin((( args[4]->tcp_flags & TH_FIN ) ? "FIN|" : ""), this->flags);
      this->flags = strjoin((( args[4]->tcp_flags & TH_SYN ) ? "SYN|" : ""), this->flags);
      this->flags = strjoin((( args[4]->tcp_flags & TH_RST ) ? "RST|" : ""), this->flags);
      this->flags = strjoin((( args[4]->tcp_flags & TH_PUSH ) ? "PUSH|" : ""), this->flags);
      this->flags = strjoin((( args[4]->tcp_flags & TH_ACK ) ? "ACK|" : ""), this->flags);
      this->flags = strjoin((( args[4]->tcp_flags & TH_URG ) ? "URG|" : ""), this->flags);
      this->flags = strjoin((( args[4]->tcp_flags & TH_ECE ) ? "ECE|" : ""), this->flags);
      this->flags = strjoin((( args[4]->tcp_flags & TH_CWR ) ? "CWR|" : ""), this->flags);
      this->flags = strjoin((( args[4]->tcp_flags == 0 ) ? "null " : ""), this->flags);
      printf("%5d %14s %6d %6d %6d %8s / %-8d %8d %6d %8d %8d %8d %12d %s %d \n",
          args[1]->cs_cid%1000, args[3]->tcps_raddr,
          args[3]->tcps_snxt - args[3]->tcps_suna,
          args[3]->tcps_rnxt - args[3]->tcps_rack,
          this->delta/1000,
          "", args[2]->ip_plength - args[4]->tcp_offset,
          args[3]->tcps_swnd, args[3]->tcps_snd_ws,
          args[3]->tcps_rwnd, args[3]->tcps_rcv_ws,
          args[3]->tcps_cwnd, args[3]->tcps_cwnd_ssthresh,
          this->flags, args[3]->tcps_retransmit);
      this->flags = 0;
      title--;
      this->delta = 0;
  }
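Assuming the script is saved under a name such as tcpwin.d (a hypothetical name), it takes an optional peer address as its only argument:

  chmod +x tcpwin.d
  sudo ./tcpwin.d                   # all TCP peers
  sudo ./tcpwin.d 192.168.100.186   # only traffic to/from the NFS server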

The output looks like this (not from this particular case):

  cid ip usend urecd delta send recd ssz sscal rsz rscal congw conthr flags retran
  320 192.168.100.186 240 0 272 240 \ 49232 0 1049800 5 1049800 2896 ACK|PUSH| 0
  320 192.168.100.186 240 0 196 / 68 49232 0 1049800 5 1049800 2896 ACK|PUSH| 0
  320 192.168.100.186 0 0 27445 0 \ 49232 0 1049800 5 1049800 2896 ACK| 0
  24 192.168.100.177 0 0 255562 / 52 64060 0 64240 0 91980 2920 ACK|PUSH| 0
  24 192.168.100.177 52 0 301 52 \ 64060 0 64240 0 91980 2920 ACK|PUSH| 0

Some of the headers:

  usend - unacknowledged send bytes
  urecd - unacknowledged received bytes
  ssz - send window
  rsz - receive window
  congw - congestion window

The plan is to take snoop captures of the dd over v3 and v4 and compare them. I have already done that, but there was too much traffic and I used a disk file rather than a cached file, which made comparing the timings meaningless. Other snoop runs with cached data and no other traffic between the boxes are still to be done. TBD
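The planned capture would be something along these lines (the interface name ixgbe0 is an assumption for a 10GbE NIC; snoop's -o writes a capture file for offline comparison):

  # on the client, capture only NFS-server traffic while the dd runs
  sudo snoop -o /tmp/nfs_v4.cap -d ixgbe0 host toto &
  SNOOP_PID=$!
  dd if=/test/file of=/dev/null bs=1024k
  sudo kill $SNOOP_PID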

Also, the network folks say there is no traffic shaping or bandwidth limiters on the connections.

Solution

NFS 4.1 (minor version 1) is designed to be a faster and more efficient protocol, and is recommended over previous versions, especially 4.0.

This includes client-side caching and, although not relevant in this scenario, parallel NFS (pNFS). The major change is that the protocol is now stateful.

http://www.netapp.com/us/communities/tech-ontap/nfsv4-0408.html

I think it is the recommended protocol when using NetApps, judging by their performance documentation. The technology is similar to Windows Vista-style opportunistic locking.

NFSv4 differs from previous versions of NFS by allowing a server to
delegate specific actions on a file to a client to enable more
aggressive client caching of data and to allow caching of the locking
state. A server cedes control of file updates and the locking state to
a client via a delegation. This reduces latency by allowing the client
to perform various operations and cache data locally. Two types of
delegations currently exist: read and write. The server has the
ability to call back a delegation from a client should there be
contention for a file. Once a client holds a delegation, it can
perform operations on files whose data has been cached locally to
avoid network latency and optimize I/O. The more aggressive caching
that results from delegations can be a big help in environments with
the following characteristics:

  • Frequent opens and closes
  • Frequent GETATTRs
  • File locking
  • Read-only sharing
  • High latency
  • Fast clients
  • Heavily loaded server with many clients
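On the Solaris side, whether the server actually hands out delegations is an nfs service property; a sketch of checking and enabling it (sharectl on Solaris 11; older releases used NFS_SERVER_DELEGATION in /etc/default/nfs):

  # show the current NFS properties, including server_delegation
  sharectl get nfs

  # make sure the server grants delegations
  sudo sharectl set -p server_delegation=on nfs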
