centos – Server computation slows down when RAM is used extensively


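1) I run the computational application WRF (Weather Research and Forecasting).
2) I use dual Xeon E5-2620 v3 CPUs with 128 GB of RAM (NUMA architecture, possibly related to the problem!).
3) I run WRF with mpirun -n 22 wrf.exe (I have 24 logical cores available).
4) I use CentOS 7 with kernel 3.10.0-514.26.2.el7.x86_64.
5) Everything works fine in terms of computing performance until one of these things happens:
5a) the Linux file cache picks up some data, or
5b) I use tmpfs and fill it with some data.
6) RAM is not being swapped, and is not even close to it; in the worst case I have about 80% of RAM free!
7) Setting vm.zone_reclaim_mode = 1 in /etc/sysctl.conf seems to help delay the problem in scenario 5a.
8) echo 1 > /proc/sys/vm/drop_caches completely resolves the problem in scenario 5a and restores WRF to full speed, but only temporarily, until the file cache picks up data again, so I run this command from cron (don't worry, it's fine; I use the machine only for WRF, and it does not need the file cache to work at full performance).
9) However, the command above does nothing in scenario 5b (when I use tmpfs for temporary files).
10) In scenario 5b, performance is restored only when I manually empty the tmpfs.
11) This is not a WRF or MPI problem.
12) It happens only on this one computer type, and I administer many machines for the same or a similar purpose (WRF). Only this one has a full NUMA architecture, so I suspect that has something to do with it.
13) I also suspect the RHEL kernel might have something to do with it, but I am not sure; I have not tried reinstalling with a different distribution.
14) numad, and numactl options for invoking mpirun such as "numactl -l", made no difference.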
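Since the suspicion above is that the NUMA layout matters, one way to test it is to look at memory per NUMA node rather than system-wide. The sketch below is not from the original post. It assumes the kernel exposes per-node meminfo files under /sys/devices/system/node/, and the function name numa_cache_report is made up for illustration. It prints free memory, page cache (FilePages) and shared memory (Shmem, where tmpfs data is accounted) for each node.

```shell
#!/bin/sh
# Hypothetical helper: summarise free memory, page cache and shmem (tmpfs)
# per NUMA node, in MiB. With no arguments it reads the per-node meminfo
# files from sysfs; file paths can be passed instead (e.g. saved copies).
numa_cache_report() {
    if [ "$#" -eq 0 ]; then
        set -- /sys/devices/system/node/node*/meminfo
    fi
    awk '
        # Input lines look like: "Node 0 MemFree:   1234567 kB"
        $3 == "MemFree:"   { free[$2]  = $4 }
        $3 == "FilePages:" { cache[$2] = $4 }
        $3 == "Shmem:"     { shmem[$2] = $4 }
        END {
            for (n in free)
                printf "node%s free=%dMiB cache=%dMiB shmem=%dMiB\n",
                       n, free[n] / 1024, cache[n] / 1024, shmem[n] / 1024
        }
    ' "$@" | sort
}
```

Running numa_cache_report before and after the slowdown appears (for example, before and after filling the tmpfs) would show whether one node runs short of free memory while the other still has plenty. That would be consistent with the NUMA suspicion, though it would not prove it.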



yum install numad
systemctl enable numad
systemctl start numad






Quick Summary

If you are simply looking for how to run an MPI application, you probably want to use a command line of the following form:

 % mpirun [ -np X ] [ --hostfile <filename> ] <program>

This will run X copies of <program> in your current run-time environment (if running under a supported resource manager, Open MPI's mpirun will usually automatically use the corresponding resource manager process starter, as opposed to, for example, rsh or ssh, which require the use of a hostfile, or will default to running all X copies on the localhost), scheduling (by default) in a round-robin fashion by CPU slot. See the rest of this page for more details.

Please note that mpirun automatically binds processes as of the start of the v1.8 series. Three binding patterns are used in the absence of any further directives:

Bind to core: when the number of processes is <= 2

Bind to socket: when the number of processes is > 2

Bind to none: when oversubscribed

If your application uses threads, then you probably want to ensure that you are either not bound at all (by specifying --bind-to none), or bound to multiple cores using an appropriate binding level or specific number of processing elements per application process.

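In your case, with n = 22, no binding is applied and the threads can be relocated. You can try external CPU binding (for example, with taskset). You will have to experiment.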
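When experimenting with binding, it helps to confirm what affinity each rank actually ended up with. The sketch below is not part of the original answer. It reads the Cpus_allowed_list field that Linux reports in /proc/<pid>/status, and the helper names allowed_cpus and report_bindings are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the CPUs a process is allowed to run on,
# taken from a /proc/<pid>/status file.
allowed_cpus() {
    # $1: path to a status file, e.g. /proc/12345/status
    awk '$1 == "Cpus_allowed_list:" { print $2 }' "$1"
}

# Hypothetical helper: print "<pid> <allowed CPU list>" for every running
# process whose name matches exactly, e.g. report_bindings wrf.exe
report_bindings() {
    for pid in $(pgrep -x "$1"); do
        printf '%s %s\n' "$pid" "$(allowed_cpus "/proc/$pid/status")"
    done
}
```

If every wrf.exe rank reports the full range of CPUs, the ranks are unbound and the scheduler is free to move them between sockets. An individual rank can then be pinned with taskset -cp <cpu-list> <pid>, where the CPU list depends on how the cores map to the two sockets on this machine. Alternatively, passing --report-bindings to mpirun makes Open MPI print the bindings it applied itself.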
