我现在一直在编写一个图像处理算法,在某些时候我需要收集一些关于转换像素的统计信息,以便更深入地了解我应该遵循的进一步开发方向.我需要收集的信息格式如下:
key: RGB value value: int
我做了什么,是我打开转换后的图像并通过它迭代,将我需要的值保存到具有以下签名的std :: unordered_map:
typedef std::unordered_map<boost::gil::rgb8_pixel_t,unsigned int> pixel_map_t;
在循环中:
for(int y = 0; y < vi.height(); y++) { SrcView::x_iterator dst_it = src.row_begin(y); for(int x = 0; x < vi.width(); x++,hits++) { diff_map.insert(std::make_pair(dst_it[x],/* some uint32 */)); }
我还写了一个自定义哈希函数(它是一个完美的哈希函数:256 ^ 2 x R 256 x G.
B – 所以无论桶和哈希表的布局如何(合理的延伸),冲突都应该是最小的.
我注意到的是,插入速度非常慢! – 在达到第11次迭代后,插入速度降低约100倍.我发生了大量的碰撞!
尽管图像中的重复颜色数量非常少.
之后,我想消除代码中的任何可能的错误,并开始使用带有原始键类型(如int)的STL哈希函数对unordered_map进行基准测试.
该基准的代码是:
std::size_t hits = 0,colls = 0; for(int y = 0; y < vi.height(); y++) { SrcView::x_iterator dst_it = src.row_begin(y); for(int x = 0; x < vi.width(); x++,hits++) { if(diff_map.find(x*y) != diff_map.cend()) colls++; diff_map.insert(std::make_pair(x*y,10)); } std::cout << y << "/" << vi.height() << " -> buckets: " << diff_map.bucket_count() << "(" << std::floor(diff_map.load_factor() * 100) << "% load factor) [ " << colls << " collisions / " << hits << " hits ]" << std::endl; }
…以下是外部循环的前20次迭代的结果(仅对ST型键使用STL的散列函数):
0/480 -> buckets: 8(12% load factor) [ 639 collisions / 640 hits ] 1/480 -> buckets: 4096(15% load factor) [ 640 collisions / 1280 hits ] 2/480 -> buckets: 4096(23% load factor) [ 960 collisions / 1920 hits ] 3/480 -> buckets: 4096(31% load factor) [ 1281 collisions / 2560 hits ] 4/480 -> buckets: 4096(37% load factor) [ 1654 collisions / 3200 hits ] 5/480 -> buckets: 4096(45% load factor) [ 1964 collisions / 3840 hits ] 6/480 -> buckets: 4096(51% load factor) [ 2370 collisions / 4480 hits ] 7/480 -> buckets: 4096(59% load factor) [ 2674 collisions / 5120 hits ] 8/480 -> buckets: 4096(65% load factor) [ 3083 collisions / 5760 hits ] 9/480 -> buckets: 4096(71% load factor) [ 3460 collisions / 6400 hits ] 10/480 -> buckets: 4096(77% load factor) [ 3872 collisions / 7040 hits ] 11/480 -> buckets: 4096(85% load factor) [ 4161 collisions / 7680 hits ] 12/480 -> buckets: 4096(90% load factor) [ 4612 collisions / 8320 hits ] 13/480 -> buckets: 4096(99% load factor) [ 4901 collisions / 8960 hits ] 14/480 -> buckets: 32768(13% load factor) [ 5315 collisions / 9600 hits ] 15/480 -> buckets: 32768(13% load factor) [ 5719 collisions / 10240 hits ] 16/480 -> buckets: 32768(14% load factor) [ 6148 collisions / 10880 hits ] 17/480 -> buckets: 32768(15% load factor) [ 6420 collisions / 11520 hits ] 18/480 -> buckets: 32768(16% load factor) [ 6870 collisions / 12160 hits ] 19/480 -> buckets: 32768(17% load factor) [ 7135 collisions / 12800 hits ] 20/480 -> buckets: 32768(17% load factor) [ 7584 collisions / 13440 hits ] 21/480 -> buckets: 32768(18% load factor) [ 7993 collisions / 14080 hits ]
在这种情况下,碰撞次数是否过高? STL库通常具有高质量,但对于简单的基于int的键音具有639/640和640/1280至少是奇怪的.
或者也许我做错了什么?
这是我的哈希函数(理论上,应该没有碰撞 – 但数字非常接近):
template<> struct std::hash<boost::gil::rgb8_pixel_t> : public std::unary_function<const boost::gil::rgb8_pixel_t&,size_t> { size_t operator()(const boost::gil::rgb8_pixel_t& key) const { size_t ret = (static_cast<size_t>(key[0]) << 16) | (static_cast<size_t>(key[1]) << 8) | (static_cast<size_t>(key[2])); //return 256 * 256 * key[0] + 256 * key[1] + key[2]; return ret; } };
现在,这不再有趣了……
我写了这个哈希函数:
template<> struct std::hash<int> : public std::unary_function<const int&,size_t> { size_t operator()(const int& key) const { return 5; } };
从理论上讲,我应该有100%的碰撞率,对吗?但结果是:
0/480 -> buckets: 8(12% load factor) [ 639 collisions / 640 hits ] 1/480 -> buckets: 4096(15% load factor) [ 640 collisions / 1280 hits ] 2/480 -> buckets: 4096(23% load factor) [ 960 collisions / 1920 hits ] 3/480 -> buckets: 4096(31% load factor) [ 1281 collisions / 2560 hits ] 4/480 -> buckets: 4096(37% load factor) [ 1654 collisions / 3200 hits ] 5/480 -> buckets: 4096(45% load factor) [ 1964 collisions / 3840 hits ] 6/480 -> buckets: 4096(51% load factor) [ 2370 collisions / 4480 hits ] 7/480 -> buckets: 4096(59% load factor) [ 2674 collisions / 5120 hits ] 8/480 -> buckets: 4096(65% load factor) [ 3083 collisions / 5760 hits ] 9/480 -> buckets: 4096(71% load factor) [ 3460 collisions / 6400 hits ]
为什么?
环境:MSVS2010
解决方法
colls不测量碰撞.如果要测量冲突,则对于[0,bucket_count())范围内的每个桶b,获取bucket_size(b).这将告诉你每个桶中有多少项.如果存储桶中有2个或更多项目,则表示存在bucket_size(b) – 存储桶b的1个冲突.