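I have a list of references in a CSV file, and I would like to use them to fill in the XML-based query form at CrossRef.
CrossRef provides an XML template (below, with unused fields removed), and I want to parse the columns of the CSV file to populate the repeating fields inside the query tag: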
<?xml version = "1.0" encoding="UTF-8"?>
<query_batch xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.0" xmlns="http://www.crossref.org/qschema/2.0"
  xsi:schemaLocation="http://www.crossref.org/qschema/2.0 http://www.crossref.org/qschema/crossref_query_input2.0.xsd">
<head>
  <email_address>test@crossref.org</email_address>
  <doi_batch_id>test</doi_batch_id>
</head>
<body>
  <query enable-multiple-hits="true" list-components="false" expanded-results="false" key="key">
    <article_title match="fuzzy"></article_title>
    <author search-all-authors="false"></author>
    <volume></volume>
    <year></year>
    <first_page></first_page>
    <journal_title></journal_title>
  </query>
</body>
</query_batch>
How can this be done in a shell script?
Sample input:
author,year,article_title,journal_title,volume,first_page
Adler,2006,"Biomass yield and biofuel quality of switchgrass harvested in fall or spring","Agronomy Journal",98,1518
Alexopolou,2008,"Biomass yields for upland and lowland switchgrass varieties grown in the Mediterranean region","Biomass and Bioenergy",32,926
Balasko,1984,"Yield and Quality of Switchgrass Grown without Soil Amendments.","Agronomy Journal",76,204
Desired output:
<?xml version = "1.0" encoding="UTF-8"?>
<query_batch xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.0" xmlns="http://www.crossref.org/qschema/2.0"
  xsi:schemaLocation="http://www.crossref.org/qschema/2.0 http://www.crossref.org/qschema/crossref_query_input2.0.xsd">
<head>
  <email_address>test@crossref.org</email_address>
  <doi_batch_id>test</doi_batch_id>
</head>
<body>
  <query>
    <author>Adler</author>
    <year>2006</year>
    <article_title>Biomass yield and biofuel quality of switchgrass harvested in fall or spring</article_title>
    <journal_title>Agronomy Journal</journal_title>
    <volume>98</volume>
    <first_page>1518</first_page>
  </query>
  <query>
    <author>Alexopolou</author>
    <year>2008</year>
    <article_title>Biomass yields for upland and lowland switchgrass varieties grown in the Mediterranean region</article_title>
    <journal_title>Biomass and Bioenergy</journal_title>
    <volume>32</volume>
    <first_page>926</first_page>
  </query>
  <query>
    <author>Balasko</author>
    <year>1984</year>
    <article_title>Yield and Quality of Switchgrass Grown without Soil Amendments.</article_title>
    <journal_title>Agronomy Journal</journal_title>
    <volume>76</volume>
    <first_page>204</first_page>
  </query>
</body>
</query_batch>
#!/usr/bin/awk -f
# XML attribute values must always be quoted; either single or double quotes can be used.
BEGIN {
    FS = ","
    print "<?xml version = '1.0' encoding='UTF-8'?>"
    print "<query_batch xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' version='2.0' xmlns='http://www.crossref.org/qschema/2.0'"
    print "  xsi:schemaLocation='http://www.crossref.org/qschema/2.0 http://www.crossref.org/qschema/crossref_query_input2.0.xsd'>"
    print "<head>"
    print "  <email_address>test@crossref.org</email_address>"
    print "  <doi_batch_id>test</doi_batch_id>"
    print "</head>"
    print "<body>"
}
# Skip the header row and emit one <query> element per CSV record.
NR > 1 {
    # Drop the double quotes around quoted fields so they do not end up in the XML.
    # Changing $0 makes awk re-split the fields. This assumes no field contains a comma.
    gsub(/"/, "")
    print "  <query enable-multiple-hits='true'"
    print "         list-components='false'"
    print "         expanded-results='false' key='key'>"
    print "    <article_title match='fuzzy'>" $3 "</article_title>"
    print "    <author search-all-authors='false'>" $1 "</author>"
    print "    <volume>" $5 "</volume>"
    print "    <year>" $2 "</year>"
    print "    <first_page>" $6 "</first_page>"
    print "    <journal_title>" $4 "</journal_title>"
    print "  </query>"
}
END {
    print "</body>"
    print "</query_batch>"
}
$ awk -f script.awk input.csv
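The script above splits on every comma, so a title that itself contains a comma would shift the later columns, and characters such as & or < in a field would produce invalid XML. Below is a minimal sketch of one way around both, assuming POSIX awk and one record per line (no line breaks inside quoted fields). The wrapper csv_to_crossref_xml and the helpers xml_escape, split_csv and element are hypothetical names introduced for this example; they are not part of the original answer or of any CrossRef tooling.

```shell
#!/bin/sh
# Sketch only: csv_to_crossref_xml, xml_escape, split_csv and element are
# hypothetical names. Reads CSV on stdin and writes the XML batch to stdout.
# Usage: csv_to_crossref_xml < input.csv > query.xml
csv_to_crossref_xml() {
  awk '
  # Escape the characters that are not allowed in XML text content.
  function xml_escape(s) {
    gsub(/&/, "\\&amp;", s)   # must run first so later entities stay intact
    gsub(/</, "\\&lt;", s)
    gsub(/>/, "\\&gt;", s)
    return s
  }
  # Split one CSV line into f[1..n]; commas inside double quotes are kept,
  # and a doubled quote inside a quoted field becomes a single quote.
  function split_csv(line, f,    n, i, c, inq, cur) {
    split("", f)              # clear values left over from the previous row
    n = 0; cur = ""; inq = 0
    for (i = 1; i <= length(line); i++) {
      c = substr(line, i, 1)
      if (inq) {
        if (c != "\"") { cur = cur c }
        else if (substr(line, i + 1, 1) == "\"") { cur = cur "\""; i++ }
        else { inq = 0 }
      } else if (c == "\"") { inq = 1 }
      else if (c == ",") { f[++n] = cur; cur = "" }
      else { cur = cur c }
    }
    f[++n] = cur
    return n
  }
  # Print one child element of <query>: tag name, attributes, escaped text.
  function element(name, attrs, text) {
    print "    <" name attrs ">" xml_escape(text) "</" name ">"
  }
  BEGIN {
    print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
    print "<query_batch xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" version=\"2.0\" xmlns=\"http://www.crossref.org/qschema/2.0\""
    print "  xsi:schemaLocation=\"http://www.crossref.org/qschema/2.0 http://www.crossref.org/qschema/crossref_query_input2.0.xsd\">"
    print "<head>"
    print "  <email_address>test@crossref.org</email_address>"
    print "  <doi_batch_id>test</doi_batch_id>"
    print "</head>"
    print "<body>"
  }
  # Skip the header row and blank lines; emit one <query> per record.
  NR > 1 && $0 != "" {
    split_csv($0, f)
    print "  <query enable-multiple-hits=\"true\" list-components=\"false\" expanded-results=\"false\" key=\"key\">"
    element("article_title", " match=\"fuzzy\"", f[3])
    element("author", " search-all-authors=\"false\"", f[1])
    element("volume", "", f[5])
    element("year", "", f[2])
    element("first_page", "", f[6])
    element("journal_title", "", f[4])
    print "  </query>"
  }
  END {
    print "</body>"
    print "</query_batch>"
  }'
}
```

The column order (author, year, article_title, journal_title, volume, first_page) is taken from the sample input; a file with a different header order would need the f[...] indices adjusted. With gawk, the FPAT variable is another way to split quoted fields, at the cost of portability.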