所以我决定使用自定义简约实现,我可以访问数据指针.
#ifndef LIB_BITS_T_H #define LIB_BITS_T_H #include <algorithm> template <typename B> class bits_t{ public: typedef B block_t; static const size_t block_size = sizeof(block_t) * 8; block_t* data; size_t size; size_t blocks; class bit_ref{ public: block_t* const block; const block_t mask; bit_ref(block_t& block,const block_t mask) noexcept : block(&block),mask(mask){} inline void operator=(bool v) const noexcept{ if(v) *block |= mask; else *block &= ~mask; } inline operator bool() const noexcept{ return (bool)(*block & mask); } }; bits_t() noexcept : data(nullptr){} void resize(const size_t n,const bool v) noexcept{ block_t fill = v ? ~block_t(0) : block_t(0); size = n; blocks = (n + block_size - 1) / block_size; data = new block_t[blocks]; std::fill(data,data + blocks,fill); } inline block_t& block_at_index(const size_t i) const noexcept{ return data[i / block_size]; } inline size_t index_in_block(const size_t i) const noexcept{ return i % block_size; } inline bit_ref operator[](const size_t i) noexcept{ return bit_ref(block_at_index(i),block_t(1) << index_in_block(i)); } ~bits_t(){ delete[] data; } }; #endif // LIB_BITS_T_H
代码与/usr/include / c /4.7/bits/stl_bvector.h中的代码几乎相同,但速度较慢.
我尝试过优化,
#ifndef LIB_BITS_T_H #define LIB_BITS_T_H #include <algorithm> template <typename B> class bits_t{ const B mask[64] = { 0b0000000000000000000000000000000000000000000000000000000000000001,0b0000000000000000000000000000000000000000000000000000000000000010,0b0000000000000000000000000000000000000000000000000000000000000100,0b0000000000000000000000000000000000000000000000000000000000001000,0b0000000000000000000000000000000000000000000000000000000000010000,0b0000000000000000000000000000000000000000000000000000000000100000,0b0000000000000000000000000000000000000000000000000000000001000000,0b0000000000000000000000000000000000000000000000000000000010000000,0b0000000000000000000000000000000000000000000000000000000100000000,0b0000000000000000000000000000000000000000000000000000001000000000,0b0000000000000000000000000000000000000000000000000000010000000000,0b0000000000000000000000000000000000000000000000000000100000000000,0b0000000000000000000000000000000000000000000000000001000000000000,0b0000000000000000000000000000000000000000000000000010000000000000,0b0000000000000000000000000000000000000000000000000100000000000000,0b0000000000000000000000000000000000000000000000001000000000000000,0b0000000000000000000000000000000000000000000000010000000000000000,0b0000000000000000000000000000000000000000000000100000000000000000,0b0000000000000000000000000000000000000000000001000000000000000000,0b0000000000000000000000000000000000000000000010000000000000000000,0b0000000000000000000000000000000000000000000100000000000000000000,0b0000000000000000000000000000000000000000001000000000000000000000,0b0000000000000000000000000000000000000000010000000000000000000000,0b0000000000000000000000000000000000000000100000000000000000000000,0b0000000000000000000000000000000000000001000000000000000000000000,0b0000000000000000000000000000000000000010000000000000000000000000,0b0000000000000000000000000000000000000100000000000000000000000000,0b0000000000000000000000000000000000001000000000000000000000000000,0b0000000000000000000000000000000000010000000000000000000000000000,0b0000000000000000000000000000000000100000000000000000000000000000,0b0000000000000000000000000000000001000000000000000000000000000000,0b0000000000000000000000000000000010000000000000000000000000000000,0b0000000000000000000000000000000100000000000000000000000000000000,0b0000000000000000000000000000001000000000000000000000000000000000,0b0000000000000000000000000000010000000000000000000000000000000000,0b0000000000000000000000000000100000000000000000000000000000000000,0b0000000000000000000000000001000000000000000000000000000000000000,0b0000000000000000000000000010000000000000000000000000000000000000,0b0000000000000000000000000100000000000000000000000000000000000000,0b0000000000000000000000001000000000000000000000000000000000000000,0b0000000000000000000000010000000000000000000000000000000000000000,0b0000000000000000000000100000000000000000000000000000000000000000,0b0000000000000000000001000000000000000000000000000000000000000000,0b0000000000000000000010000000000000000000000000000000000000000000,0b0000000000000000000100000000000000000000000000000000000000000000,0b0000000000000000001000000000000000000000000000000000000000000000,0b0000000000000000010000000000000000000000000000000000000000000000,0b0000000000000000100000000000000000000000000000000000000000000000,0b0000000000000001000000000000000000000000000000000000000000000000,0b0000000000000010000000000000000000000000000000000000000000000000,0b0000000000000100000000000000000000000000000000000000000000000000,0b0000000000001000000000000000000000000000000000000000000000000000,0b0000000000010000000000000000000000000000000000000000000000000000,0b0000000000100000000000000000000000000000000000000000000000000000,0b0000000001000000000000000000000000000000000000000000000000000000,0b0000000010000000000000000000000000000000000000000000000000000000,0b0000000100000000000000000000000000000000000000000000000000000000,0b0000001000000000000000000000000000000000000000000000000000000000,0b0000010000000000000000000000000000000000000000000000000000000000,0b0000100000000000000000000000000000000000000000000000000000000000,0b0001000000000000000000000000000000000000000000000000000000000000,0b0010000000000000000000000000000000000000000000000000000000000000,0b0100000000000000000000000000000000000000000000000000000000000000,0b1000000000000000000000000000000000000000000000000000000000000000 }; public: typedef B block_t; static const size_t block_size = sizeof(block_t) * 8; block_t* data; size_t size; size_t blocks; class bit_ref{ public: block_t* const block; const block_t mask; bit_ref(block_t& block,mask[index_in_block(i)]); } ~bits_t(){ delete[] data; } }; #endif // LIB_BITS_T_H
(用g 4.7 -O3编译)
Eratosthenes筛分算法(33.333.333位)
的std ::矢量< BOOL> 19.1s bits_t的<为size_t> 19.9s bits_t的<为size_t> (使用查找表)19.7s ctor调整大小(33.333.333位)dtor 的std ::矢量< BOOL> 120毫秒 bits_t的<为size_t> 150毫秒 问题:减速从何而来?
解决方法
话虽这么说,你没有做必要的删除&复制,这可能会对您的版本产生“积极的”影响……(“正面”影响速度明智,您不会删除旧数据,也不会将其复制到新缓冲区中.)
此外,std :: vector将正确放大缓冲区,从而复制可能已存在于cpu缓存中的数据.对于您的版本,该缓存将丢失,因为您只是忽略每个resize()上的旧缓冲区.
此外,当一个类处理内存缓冲区时,由于某些原因,通常会实现复制和赋值运算符……您也可以考虑使用shared_ptr<>().然后隐藏删除,并且类是一个模板,因此它非常快(它不会添加您自己的版本中没有的任何代码.)
===更新
还有一件事.你是operator []的实现:
inline bit_ref operator[](const size_t i) noexcept{ return bit_ref(block_at_index(i),mask[index_in_block(i)]); }
(旁注:内联不是必需的,因为你在类中编写代码已经意味着你已经确定了内联功能.)
您只提供一个“非常慢”的非const版本,因为它创建了一个子类.您应该尝试实现一个返回bool的const版本,看看它是否会导致您看到的差异大约为3%.
bool operator[](const size_t i) const noexcept { return (block_at_index(i) & mask[index_in_block(i)]) != 0; }
此外,使用mask []数组也可以减慢速度. (1LL<<(index& 0x3F))应该更快(具有0存储器访问的2个cpu指令).