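I'm facing a puzzling problem that only started recently.
I have a program that writes to a file from one thread while another thread reads from that same file.
The two threads use different file descriptors. The writer thread opens the file with the O_WRONLY flag; the reader thread opens it in O_RDONLY mode.
Logic-wise, the reader thread is unaware of what the writer thread is doing, and the two could just as well be working on different files.
The writer thread writes to the file continuously at regular intervals (the data comes from a device stream at up to 20Mbit/s).
The reader thread also reads from the file at regular intervals.
Here is the reader loop: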
    while (tot < sz)
    {
        LOG(VB_FILE, LOG_DEBUG, LOC + QString("read(%1) -- begin").arg(sz-tot));
        ret = read(fd2, (char *)data + tot, sz - tot);
        LOG(VB_FILE, LOG_DEBUG, LOC + QString("read(%1) -> %2 end").arg(sz).arg(ret));
        if ((sz - tot) != ret)
        {
            LOG(VB_FILE, LOG_DEBUG, LOC + QString("errno = %1").arg(errno));
        }

        if (ret < 0)
        {
            if (errno == EAGAIN)
            {
                LOG(VB_FILE, LOG_DEBUG, LOC + QString("read(%1) -> %2 EAGAIN").arg(sz).arg(ret));
                usleep(1000);
                continue;
            }

            LOG(VB_GENERAL, LOG_ERR, LOC + "File I/O problem in 'safe_read()'" + ENO);

            errcnt++;
            numfailures++;
            if (errcnt == 3)
                break;
        }
        else if (ret > 0)
        {
            tot += ret;
        }
        [...snipped...]
    }
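As you can see, I log a message right before calling read() and again immediately after it returns.
Every so often, read() is called and never returns...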
    2014-02-19 11:24:10.156417 D TFW(/external/recordings/1001_20140219002351.mpg:64): write(65424) cnt 1 total 5076
    2014-02-19 11:24:10.156466 D TFW(/external/recordings/1001_20140219002351.mpg:64): total written so far: 26934760 bytes
    2014-02-19 11:24:10.156514 D FileRingBuf(/external/recordings/1001_20140219002351.mpg): read(65536) -- begin
    2014-02-19 11:24:10.190769 D FileRingBuf(/external/recordings/1001_20140219002351.mpg): read(65536) -> 60968 end
    2014-02-19 11:24:10.190781 I RingBuf(/external/recordings/1001_20140219002351.mpg): safe_read(...@1698944,65536) -> 65536,took 60 ms (8.73813Mbps)
    2014-02-19 11:24:10.190786 D RingBuf(/external/recordings/1001_20140219002351.mpg): total read so far: 26930304 bytes
    2014-02-19 11:24:10.190795 I FileRingBuf(/external/recordings/1001_20140219002351.mpg): read(65536) -- begin
    2014-02-19 11:24:10.195917 D FileRingBuf(/external/recordings/1001_20140219002351.mpg): read(65536) -> 4456 end
    2014-02-19 11:24:10.195927 D FileRingBuf(/external/recordings/1001_20140219002351.mpg): errno = 0
    2014-02-19 11:24:10.206445 D TFW(/external/recordings/1001_20140219002351.mpg:64): write(65424) cnt 1 total 1692
    2014-02-19 11:24:10.206489 D TFW(/external/recordings/1001_20140219002351.mpg:64): total written so far: 27000184 bytes
    2014-02-19 11:24:10.256103 D FileRingBuf(/external/recordings/1001_20140219002351.mpg): read(61080) -- begin
    2014-02-19 11:24:10.256499 D TFW(/external/recordings/1001_20140219002351.mpg:64): write(47376) cnt 1 total 40984
    2014-02-19 11:24:10.262073 D TFW(/external/recordings/1001_20140219002351.mpg:64): total written so far: 27047560 bytes
    2014-02-19 11:24:10.273385 D TFW(/external/recordings/1001_20140219002351.mpg:64): write(65424) cnt 1 total 940
    2014-02-19 11:24:10.385495 D TFW(/external/recordings/1001_20140219002351.mpg:64): total written so far: 27112984 bytes
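You can see here that the writer has written 26934760 bytes to disk. The reader has read 26930304 bytes so far, so it is 4456 bytes short of EOF. A 64kB read is then attempted, and it returns almost immediately with 4456 bytes. So far so good.
Another read is attempted right away, this time for 61080 bytes (65536-4456).
Shortly afterwards, the writer thread writes to the file again.
The read is now blocked and does not return for another 30 seconds.
So, any ideas on why the read would suddenly block?
Edit: Judging from the behaviour, the block always seems to occur once a read has reached EOF and returned early, if the read is then retried immediately, before a new write has taken place. In that case, the read does not return for several seconds (typically 20 seconds).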
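For comparison, here is a minimal sketch of the sequence described above: a short read that hits EOF, followed immediately by a second read before any new write. It assumes a regular file on a local filesystem, where the second read() would normally return 0 right away instead of blocking. The helper `read_past_eof`, the `ReadPair` struct and the /tmp path are illustrative assumptions and are not part of the original program.

```cpp
#include <cassert>
#include <cstdlib>
#include <fcntl.h>
#include <unistd.h>
#include <vector>

// Results of two back-to-back read() calls (hypothetical type for this sketch).
struct ReadPair
{
    ssize_t first;
    ssize_t second;
};

// Writes `payload` bytes through an O_WRONLY descriptor, then issues two
// consecutive reads of `request` bytes through a separate O_RDONLY descriptor
// on the same file. Both fields are -1 if the setup fails.
ReadPair read_past_eof(size_t payload, size_t request)
{
    assert(request > 0);
    ReadPair result = { -1, -1 };

    char path[] = "/tmp/eof_demo_XXXXXX";
    int tmp = mkstemp(path);          // create the scratch file
    if (tmp < 0)
        return result;
    close(tmp);

    int wfd = open(path, O_WRONLY);   // writer descriptor
    int rfd = open(path, O_RDONLY);   // independent reader descriptor
    unlink(path);                     // file lives on until both fds are closed

    if (wfd >= 0 && rfd >= 0)
    {
        std::vector<char> out(payload, 'x');
        std::vector<char> in(request);

        size_t done = 0;
        while (done < payload)
        {
            ssize_t n = write(wfd, out.data() + done, payload - done);
            if (n <= 0)
                break;
            done += static_cast<size_t>(n);
        }

        if (done == payload)
        {
            result.first  = read(rfd, in.data(), request);  // short read up to EOF
            result.second = read(rfd, in.data(), request);  // retried at EOF
        }
    }

    if (wfd >= 0)
        close(wfd);
    if (rfd >= 0)
        close(rfd);
    return result;
}
```

Calling `read_past_eof(4456, 65536)` mirrors the numbers in the log: the first read is expected to return the 4456 available bytes and the second to report EOF.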
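Solution
Well...
I found the problem and a way to work around it.
As mentioned in the original question, the block occurs once a read has reached EOF and returned early, and the read is then retried immediately (before any new write to the file has taken place).
In that case, read() does not return for several seconds (typically more than 20 seconds).
So the workaround is to keep track of how many bytes we have read so far, so that we know our position in the file, and to call fstat to check the file's size. From there, we make sure we never call read() when we are already at the end of the file, and never ask read() for more bytes than the file currently holds.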
    struct stat sb;
    off_t current_pos = internalreadpos;

    while (tot < sz)
    {
        off_t toread = sz - tot;
        bool read_ok = true;

        // check that we have some data to read,
        // so we never attempt to read past the end of file
        // if fstat errored or isn't a regular file, default to previous behavior
        ret = fstat(fd2, &sb);
        if (ret == 0 && S_ISREG(sb.st_mode))
        {
            if (current_pos >= sb.st_size)
            {
                // We're at the end, don't attempt to read
                // (ret is still 0 here, so the EOF handling below applies)
                read_ok = false;
                LOG(VB_FILE, LOG_DEBUG, LOC + "not reading, reached EOF");
            }
            else
            {
                toread = min(sb.st_size - current_pos, toread);
                if (toread < (sz-tot))
                {
                    LOG(VB_FILE, LOG_DEBUG, LOC + QString("About to reach EOF, reading %1 wanted %2")
                        .arg(toread).arg(sz-tot));
                }
            }
        }

        if (read_ok)
        {
            LOG(VB_FILE, LOG_DEBUG, LOC + QString("read(%1) -- begin").arg(toread));
            ret = read(fd2, (char *)data + tot, toread);
            LOG(VB_FILE, LOG_DEBUG, LOC + QString("read(%1) -> %2 end").arg(toread).arg(ret));
        }

        if (ret < 0)
        {
            if (errno == EAGAIN)
                continue;

            LOG(VB_GENERAL, LOG_ERR, LOC + "File I/O problem in 'safe_read()'" + ENO);

            errcnt++;
            numfailures++;
            if (errcnt == 3)
                break;
        }
        else if (ret > 0)
        {
            tot += ret;
            current_pos += ret;
        }

        if (oldfile)
            break;

        if (ret == 0) // EOF returns 0
        {
            if (tot > 0)
                break;

            zerocnt++;

            // 0.36 second timeout for livetvchain with usleep(60000),
            // or 2.4 seconds if it's a new file less than 30 minutes old.
            if (zerocnt >= (livetvchain ? 6 : 40))
            {
                break;
            }
        }
        [...snipped...]
    }
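The same idea can be isolated into a small standalone helper. This is only a sketch under stated assumptions: `bounded_read` and `open_scratch_file` are hypothetical names that do not appear in the original code, and the caller is assumed to keep `pos` in sync with the descriptor's file offset, as `current_pos` does above.

```cpp
#include <cassert>
#include <cstdlib>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: reads at most `want` bytes from `fd`, but never asks
// read() for more than fstat() reports as available beyond `pos`.
// Returns 0 without calling read() when `pos` is at or past the current
// end of file. If fstat() fails or `fd` is not a regular file, it falls
// back to a plain read() of `want` bytes.
ssize_t bounded_read(int fd, void *buf, size_t want, off_t &pos)
{
    assert(buf != nullptr);

    size_t toread = want;
    struct stat sb;
    if (fstat(fd, &sb) == 0 && S_ISREG(sb.st_mode))
    {
        if (pos >= sb.st_size)
            return 0;                          // at EOF: skip read() entirely

        off_t remaining = sb.st_size - pos;
        if (static_cast<unsigned long long>(remaining) < toread)
            toread = static_cast<size_t>(remaining);
    }

    ssize_t ret = read(fd, buf, toread);
    if (ret > 0)
        pos += ret;                            // track our position in the file
    return ret;
}

// Hypothetical test helper: creates a scratch file holding `len` bytes of
// `contents` and returns a read-only descriptor positioned at offset 0,
// or -1 on failure.
int open_scratch_file(const char *contents, size_t len)
{
    char path[] = "/tmp/bounded_read_XXXXXX";
    int wfd = mkstemp(path);
    if (wfd < 0)
        return -1;

    ssize_t written = write(wfd, contents, len);
    close(wfd);

    int rfd = (written == static_cast<ssize_t>(len)) ? open(path, O_RDONLY) : -1;
    unlink(path);                              // file persists until rfd is closed
    return rfd;
}
```

A caller would loop on `bounded_read` and treat a return value of 0 as "nothing new yet", sleeping briefly before retrying, much like the `zerocnt` handling in the loop above. There is still a window between fstat() and read() in which the file can grow, but growth only means more data is available than was requested, so the read stays within the file.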