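I recently read on a forum (I believe they were discussing Java) that if you need to parse a large amount of string data, it is better to use byte arrays than Strings with split(). The exact post was: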
One performance trick to working with any language, C++, Java, C# is
to avoid object creation. It’s not the cost of allocation or GC, its
the cost to access large memory arrays that dont fit in the cpu cache.

Modern cpu’s are much faster than their memory. They stall for many,
many cycles for each cache miss. Most of the cpu transister budget is
allocated to reduce this with large caches and lots of ticks.

GPU’s solve the problem differently by having lots of threads ready to
execute to hide memory access latency and have little or no cache and
spend the transistors on more cores.

So, for example, rather than using String’s and split to parse a
message, use byte arrays that can be updated in place. You really want
to avoid random memory access over large data structures, at least in
the inner loops.
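Is he simply saying "don't use Strings, because they are objects and creating objects is expensive"? Or is he saying something else?
Does using a byte array ensure that the data stays in the cache for as long as possible?
When you use a String, is it too large to be held in the CPU cache?
In general, is using primitive data types the best way to write faster code?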
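Solution
Of course, there are limits to this: if the text is very, very large and you only need to parse out a part of it, then those few small Strings may fit in the cache better than the large block of text in a buffer.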
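
To make the quoted advice concrete, here is a minimal sketch in Java of the two approaches it contrasts. The class name `InPlaceParse`, both method names, and the message format (comma-separated non-negative decimal integers in ASCII, no validation) are assumptions chosen for illustration, not anything from the original post. The byte-array version walks one contiguous buffer sequentially and creates no objects; the `split` version allocates an array plus one String per field, each of which may live elsewhere on the heap.

```java
import java.nio.charset.StandardCharsets;

public class InPlaceParse {
    // Sums the fields by scanning the bytes once, front to back.
    // Assumes well-formed input: only ASCII digits and ',' separators.
    static long sumFields(byte[] buf, int len) {
        long sum = 0;
        long current = 0;
        for (int i = 0; i < len; i++) {
            byte b = buf[i];
            if (b == ',') {
                // End of a field: fold it into the total and start the next one.
                sum += current;
                current = 0;
            } else {
                // Accumulate the decimal value digit by digit, no String needed.
                current = current * 10 + (b - '0');
            }
        }
        // The last field has no trailing comma.
        return sum + current;
    }

    // String-based equivalent: split() allocates a String[] and one String per field.
    static long sumFieldsWithSplit(String message) {
        long sum = 0;
        for (String field : message.split(",")) {
            sum += Long.parseLong(field);
        }
        return sum;
    }

    public static void main(String[] args) {
        String message = "12,345,6,7890";
        byte[] buf = message.getBytes(StandardCharsets.US_ASCII);
        System.out.println(sumFields(buf, buf.length));   // 8253
        System.out.println(sumFieldsWithSplit(message));  // 8253
    }
}
```

Whether the first version is actually faster depends on the message sizes, the JVM, and the hardware, so it is worth measuring on real data before rewriting working code this way.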