C++11 tuple performance

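In many cases, including single-element ones, I want to make my code more generic by using std::tuple, for example by writing tuple<double> instead of double. But first I decided to check the performance of this particular case.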

Here is a simple performance benchmark:

#include <tuple>
#include <iostream>

using std::cout;
using std::endl;
using std::get;
using std::tuple;

int main(void)
{

#ifdef TUPLE
    using double_t = std::tuple<double>;
#else
    using double_t = double;
#endif

    constexpr int count = 1e9;
    auto array = new double_t[count]; // NB: with plain double the elements are left uninitialized

    long long sum = 0;
    for (int idx = 0; idx < count; ++idx) {
#ifdef TUPLE
        sum += get<0>(array[idx]);
#else
        sum += array[idx];
#endif
    }
    delete[] array;
    cout << sum << endl; // just "external" side effect for variable sum.
}

And the results:

$g++ -DTUPLE -O2 -std=c++11 test.cpp && time ./a.out
0  

real    0m3.347s
user    0m2.839s
sys     0m0.485s

$g++  -O2 -std=c++11 test.cpp && time ./a.out
0  

real    0m2.963s
user    0m2.424s
sys     0m0.519s

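I thought tuple was a purely static, compile-time template and that every get<> call in this case is just an ordinary variable access. BTW, the allocated memory size in this test is the same.
Why is there a difference in execution time?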

EDIT: The problem is the initialization of the tuple<> objects. To make the test more accurate, one line has to be changed:

constexpr int count = 1e9;
-    auto array = new double_t[count];
+    auto array = new double_t[count]();

     long long sum = 0;

After that, the two versions give similar results:

$g++ -DTUPLE -g -O2 -std=c++11 test.cpp && (for i in $(seq 3); do time ./a.out; done) 2>&1 | grep real
real    0m3.342s
real    0m3.339s
real    0m3.343s

$g++ -g -O2 -std=c++11 test.cpp && (for i in $(seq 3); do time ./a.out; done) 2>&1 | grep real
real    0m3.349s
real    0m3.339s
real    0m3.334s

Solution

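The tuple's default constructor value-initializes all of its elements (so everything is 0), whereas a plain double is not initialized at all by default-initialization.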

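In the generated assembly, the following initialization loop appears only when tuple is used; otherwise the two versions are equivalent.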

.L2:
    movq    $0,(%rdx)
    addq    $8,%rdx
    cmpq    %rcx,%rdx
    jne .L2
