I have a text file, infile.txt:
abc what's the foo bar. foobar hello world,hhaha cluster spatio something something. xyz trying to do this in parallel kmeans you're mean,who's mean?
Each line in the file is processed by this perl command and written to out.txt:
`cat infile.txt | perl dosomething > out.txt`
Now imagine the text file has 100,000,000 lines. I want to parallelize the bash command, so I tried something like this:
    $ mkdir splitfiles
    $ mkdir splitfiles_processed
    $ cd splitfiles
    $ split -n l/3 ../infile.txt
    $ for i in *; do cat "$i" | perl dosomething > "../splitfiles_processed/$i" & done
    $ wait
    $ cd ../splitfiles_processed
    $ cat * > ../infile_processed.txt
But is there a less verbose way to do the same thing?
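The answer from @Ulfalizer gives you a good hint toward a solution, but it lacks detail.
You can use GNU parallel (`apt-get install parallel` on Debian).
Your problem can then be solved with the following command: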
parallel -a infile.txt -l 1000 -j10 -k --spreadstdin perl dosomething > result.txt
Here is what the arguments mean:
    -a: read input from a file instead of stdin
    -l 1000: send blocks of 1000 lines to the command
    -j 10: launch 10 jobs in parallel
    -k: keep the output in the same order as the input
    --spreadstdin (also spelled --pipe): send each 1000-line block to the stdin of the command
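To make those options a bit more concrete, here is a rough sketch that emulates the same idea (blocks of N lines, J jobs at a time, output in input order) using only `split` and `xargs -P`. It is not how GNU parallel is implemented. The function name `process_in_blocks` is made up for illustration, and `tr a-z A-Z` is only a hypothetical stand-in for `perl dosomething`.

```bash
#!/usr/bin/env bash
# Sketch only: emulates "-l N -j J -k --spreadstdin" with split and xargs.
# process_in_blocks is a hypothetical helper, not part of GNU parallel.

# Usage: process_in_blocks INFILE OUTFILE BLOCK_LINES JOBS FILTER [ARGS...]
process_in_blocks() {
    local infile=$1 outfile=$2 block_lines=$3 jobs=$4
    shift 4                      # the remaining arguments are the filter command
    local tmpdir
    tmpdir=$(mktemp -d) || return 1
    mkdir "$tmpdir/in" "$tmpdir/out"

    if [ -s "$infile" ]; then
        # Like -l N: cut the input into blocks of N whole lines
        # (block.aaaa, block.aaab, ...), so no line is split in half.
        split -l "$block_lines" -a 4 "$infile" "$tmpdir/in/block."

        # Like -j J: run up to J filter processes at once, one per block.
        # Like --spreadstdin: each filter reads its block on stdin.
        find "$tmpdir/in" -type f -print0 |
            xargs -0 -P "$jobs" -I{} sh -c \
                'src=$1; out_dir=$2; shift 2; "$@" < "$src" > "$out_dir/${src##*/}"' \
                _ {} "$tmpdir/out" "$@"

        # Like -k: the glob expands in sorted order, which is the block order,
        # so the merged output follows the order of the input.
        cat "$tmpdir"/out/block.* > "$outfile"
    else
        : > "$outfile"           # empty input gives empty output
    fi
    rm -rf "$tmpdir"
}

# Small demo: 2500 lines, blocks of 1000 lines, 4 jobs at a time.
demo_in=$(mktemp)
demo_out=$(mktemp)
seq 1 2500 | sed 's/^/line /' > "$demo_in"
process_in_blocks "$demo_in" "$demo_out" 1000 4 tr a-z A-Z

# Compare against a plain serial run of the same stand-in filter.
tr a-z A-Z < "$demo_in" | cmp -s - "$demo_out" && echo "parallel output matches serial output"
```

This only works when the filter handles each line independently, which is the situation described in the question. The `parallel` one-liner does the same job without the temporary directories.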