I'm having some trouble splitting a large text file into multiple smaller text files. The syntax of my text file is as follows:
dasdas #42319 blaablaa 50 50
content content
more content
content conclusion

asdasd #92012 blaablaa 30 70
content again
more of it
content conclusion

asdasd #299 yadayada 60 40
content
content
contend done
...and so on
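(dasdas #42319 blaablaa 50 50, content content, more content and content conclusion are each on their own separate line, followed by a blank line that marks the end of that info table. A typical info table in my file is anywhere between 10 and 40 lines long.)
I would like to split this file into n smaller files, where n is the number of info tables.
That is,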
dasdas #42319 blaablaa 50 50
content content
more content
content conclusion
would be its own separate file (whateverN.txt),
and
asdasd #92012 blaablaa 30 70
content again
more of it
content conclusion
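would again be a separate file, whateverN+1.txt, and so on.
It seems that awk or Perl would be nifty tools for this, but since I have never used them before, their syntax is a bit baffling.
I found these two questions that almost match my problem, but I failed to adapt the syntax to my needs: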
Split text file into multiple files
https://unix.stackexchange.com/questions/46325/how-can-i-split-a-text-file-into-multiple-text-files
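Setting RS to the null (empty) string tells awk to use one or more blank lines as the record separator. Then you can simply use NR to build the name of the file that corresponds to each new record: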
awk -v RS= '{print > ("whatever-" NR ".txt")}' file.txt
RS:
This is awk’s input record separator. Its default value is a string containing a single newline character, which means that an input record consists of a single line of text. It can also be the null string, in which case records are separated by runs of blank lines, or a regexp, in which case records are separated by matches of the regexp in the input text.
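To see what the null RS does before writing any files, a quick check along these lines may help. This is only a sketch: the sample data and the names tmpdir, sample.txt, count and headers are made up for illustration. It counts the blank-line-separated records and prints the first line of each one, using a newline as the field separator so that $1 is the first line of a record.

```shell
# Made-up sample: three records; the second gap has two blank lines
tmpdir=$(mktemp -d)
printf 'a 1\nb\n\nc 2\nd\n\n\ne 3\nf\n' > "$tmpdir/sample.txt"

# With RS empty, each run of blank lines ends a record, so NR in END
# is the number of blank-line-separated blocks
count=$(awk -v RS= 'END { print NR }' "$tmpdir/sample.txt")
echo "$count"

# With FS set to a newline, each line of a record is one field,
# so $1 is the record's first line
headers=$(awk -v RS= -F '\n' '{ print NR ": " $1 }' "$tmpdir/sample.txt")
echo "$headers"
```

The record count should match the number of files the split will produce, which makes this a cheap sanity check on a large input.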
$ cat file.txt
dasdas #42319 blaablaa 50 50
content content
more content
content conclusion

asdasd #92012 blaablaa 30 70
content again
more of it
content conclusion

asdasd #299 yadayada 60 40
content
content
contend done
$ awk -v RS= '{print > ("whatever-" NR ".txt")}' file.txt
$ ls whatever-*.txt
whatever-1.txt  whatever-2.txt  whatever-3.txt
$ cat whatever-1.txt
dasdas #42319 blaablaa 50 50
content content
more content
content conclusion
$ cat whatever-2.txt
asdasd #92012 blaablaa 30 70
content again
more of it
content conclusion
$ cat whatever-3.txt
asdasd #299 yadayada 60 40
content
content
contend done
$
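If the input contains a very large number of info tables, some awk implementations may run out of file handles, because every output file stays open until awk exits. A possible variant, sketched here with made-up sample data and the hypothetical names workdir, dir and out, closes each file right after writing it:

```shell
# Made-up sample input in a scratch directory
workdir=$(mktemp -d)
printf 'sheet one\nline\n\nsheet two\nline\n\nsheet three\nline\n' > "$workdir/file.txt"

awk -v RS= -v dir="$workdir" '{
    out = dir "/whatever-" NR ".txt"   # one output file per record
    print > out                        # write the whole record
    close(out)                         # release the handle before the next record
}' "$workdir/file.txt"

ls "$workdir"
```

Because each record is written exactly once, closing the file immediately does not change the result; it only keeps the number of simultaneously open files at one.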