I'm writing some code that parses log files, with the catch that the files are compressed and have to be decompressed at runtime. This is a performance-sensitive piece of code, so I'm trying out different approaches to find the right one. Regardless of how many threads I use, I have essentially as much memory as the program needs.
I found one approach that seems to perform considerably better, and I'm trying to understand why it gives better performance.
Both approaches have a reader thread that reads from a piped gzip process and writes into a large buffer. That buffer is then lazily parsed when the next log line is requested, returning what is essentially a struct of pointers to where the different fields sit in the buffer.
The code is in D, but it is very similar to C or C++.
Shared variables:
shared(bool) _stream_empty = false;
shared(ulong) upper_bound = 0;
shared(ulong) curr_index = 0;
Parsing code:
// Lazily parse the buffer
void construct_next_elem() {
    while(1) {
        // Spin to stop us from getting ahead of the reader thread
        buffer_empty = curr_index >= upper_bound - 1 &&
                       _stream_empty;
        if(curr_index >= upper_bound && !_stream_empty) {
            continue;
        }
        // Parsing logic ..... (the parsed element is sketched below)
    }
}
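The parsing logic itself is elided above. As described, the element it produces is essentially a struct of slices pointing into the buffer; a minimal sketch of what that might look like (the struct and its field names are my own illustration, not the original code):

// Hypothetical shape of a lazily-parsed log line: each field is a D slice
// (pointer + length) into the shared buffer, so nothing gets copied.
struct LogLine {
    char[] timestamp;
    char[] level;
    char[] message;
}

// construct_next_elem() would find the next newline-terminated line starting
// at curr_index, split it on its field delimiters, and fill a LogLine with
// slices of buffer rather than copies.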
Method 1:
Malloc a buffer up front that is large enough to hold the decompressed file.
char[] buffer; // Same as vector
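As a concrete sketch of method 1 (assuming buffer_length comes from get_gzip_length, as in the reader thread below), the up-front allocation might look like:

import core.stdc.stdlib : malloc;

// One contiguous allocation big enough for the whole decompressed file,
// exposed as a char[] slice so the reader and the parser can index into it.
char[] buffer = (cast(char*) malloc(buffer_length))[0 .. buffer_length];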
Method 2:
Use an anonymous memory mapping as the buffer:
MmFile buffer;
buffer = new MmFile(null, MmFile.Mode.readWrite,  // PROT_READ | PROT_WRITE
                    buffer_length, null);         // MAP_ANON | MAP_PRIVATE
Reader thread:
ulong buffer_length = get_gzip_length(file_path);  // (a possible implementation is sketched below)
pipe = pipeProcess(["gunzip", "-c", file_path], Redirect.stdout);
stream = pipe.stdout();

static void stream_data() {
    while(!l.stream.eof()) {
        // Splice is a reference inside the buffer
        char[] splice = buffer[upper_bound .. upper_bound + READ_SIZE];
        ulong read = stream.rawRead(splice).length;
        upper_bound += read;
    }
    // Clean up
}

void start_stream() {
    auto t = task!stream_data();
    t.executeInNewThread();
    construct_next_elem();
}
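get_gzip_length is used above but not shown. One plausible implementation (my assumption, not the original helper) reads the ISIZE field from the gzip trailer, which stores the uncompressed size, little-endian, in the last four bytes of the file:

import std.stdio : File;
import std.bitmanip : littleEndianToNative;
import core.stdc.stdio : SEEK_END;

// Read the uncompressed size from the gzip trailer (ISIZE field).
// Note: ISIZE is stored modulo 2^32, so this only works for files < 4 GiB.
ulong get_gzip_length(string file_path) {
    auto f = File(file_path, "rb");
    f.seek(-4, SEEK_END);
    ubyte[4] tail;
    f.rawRead(tail[]);
    return littleEndianToNative!uint(tail);
}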
User time (seconds): 112.22
System time (seconds): 38.56
Percent of CPU this job got: 151%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:39.40
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3784992
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 5463
Voluntary context switches: 90707
Involuntary context switches: 2838
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
vs.
User time (seconds): 275.92
System time (seconds): 73.92
Percent of CPU this job got: 117%
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:58.73
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3777336
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 944779
Voluntary context switches: 89305
Involuntary context switches: 9836
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Can someone help shed some light on why there is such a marked performance penalty when using mmap?
Edit ----
Changed method 2 to do:
char* buffer = cast(char*) mmap(cast(void*) null, buffer_length, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
This now gives a 3x performance improvement over simply using MmFile. I'm trying to figure out what could cause such a marked difference in performance, given that MmFile is essentially just a wrapper around mmap.
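For completeness, the raw-mmap version with the import it needs and a matching unmap at teardown might look like this (a sketch; exactly where the cleanup happens is my assumption):

import core.sys.posix.sys.mman;  // mmap, munmap, PROT_*, MAP_* on POSIX systems

// Anonymous, private mapping sized for the whole decompressed file.
char* buffer = cast(char*) mmap(null, buffer_length,
                                PROT_READ | PROT_WRITE,
                                MAP_ANON | MAP_PRIVATE, -1, 0);
// ... stream into and parse out of buffer[0 .. buffer_length] ...
munmap(cast(void*) buffer, buffer_length);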
Perf numbers for the direct char* mmap vs. MmFile; note the drop in page faults:
User time (seconds): 109.99
System time (seconds): 36.11
Percent of CPU this job got: 151%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:36.20
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3777896
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 2771
Voluntary context switches: 90827
Involuntary context switches: 2999
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Best answer