I'm parsing a modified version of the XML from http://hackage.haskell.org/package/xml-conduit-1.1.0.9/docs/Text-XML-Stream-Parse.html. Here's what it looks like:
```xml
<?xml version="1.0" encoding="utf-8"?>
<population xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://example.com">
    <success>true</success>
    <row_count>2</row_count>
    <summary>
        <bananas>0</bananas>
    </summary>
    <people>
        <person>
            <firstname>Michael</firstname>
            <age>25</age>
        </person>
        <person>
            <firstname>Eliezer</firstname>
            <age>2</age>
        </person>
    </people>
</population>
```
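How can I get a list of each person's name and age?
My goal is to download this XML with http-conduit and then parse it, but for now I'm looking for a way to parse elements that have no attributes (with `tagNoAttr`?).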
```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Trans.Resource
import Data.Conduit (($$))
import Data.Text (Text, unpack)
import Text.XML.Stream.Parse
import Control.Applicative ((<*))

data Person = Person Int Text
    deriving Show

-- Do I need to change the lambda function \age to something else to get both name and age?
parsePerson = tagNoAttr "person" $ \age -> do
    name <- content
    -- How do I get age from the content? "unpack" is for attributes
    return $ Person age name

parsePeople = tagNoAttr "people" $ many parsePerson

-- This doesn't ignore the xmlns attributes
parsePopulation = tagName "population" (optionalAttr "xmlns" <* ignoreAttrs) $ parsePeople

main = do
    people <- runResourceT $
        parseFile def "people2.xml" $$ parsePopulation
    print people
```
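On the inline question about getting the age out of `content`: `content` only hands back the element's text as a `Text`, so turning it into an `Int` is a separate step. A minimal sketch using `Data.Text.Read.decimal` (the same function the answer relies on); the helper name `readAge` is purely illustrative:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import Data.Text.Read (decimal)

-- Hypothetical helper: convert the text inside an <age> element into an Int.
-- decimal returns the parsed number plus any leftover text; requiring the
-- leftover to be empty rejects inputs like "25 years".
readAge :: Text -> Maybe Int
readAge t = case decimal t of
  Right (n, rest) | rest == "" -> Just n
  _ -> Nothing

main :: IO ()
main = do
  print (readAge "25")
  print (readAge "twenty")
  print (readAge "25 years")
```

Checking the leftover text matters because `decimal` on its own happily parses the leading digits of `"25 years"` and reports the rest as unconsumed.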
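First off: the parsing combinators in xml-conduit haven't been updated in a long time and are showing their age. I'd recommend most people use the DOM or cursor interface instead. That said, let's look at your example. There are two problems with your code: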
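1. It doesn't handle XML namespaces correctly. The default `xmlns="http://example.com"` declaration on `<population>` puts every element name in the `http://example.com` namespace, and your code needs to reflect that.
2. The parsing combinators require you to account for every element. They won't automatically skip elements you don't care about (here `success`, `row_count` and `summary`).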
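To see why the unqualified names fail to match: with `OverloadedStrings`, a string literal used as a `Name` from `Data.XML.Types` (the xml-types package) is read in Clark notation, so `"{http://example.com}person"` carries a namespace while `"person"` does not, and `Eq` treats them as different names. A small sketch; the bindings `qualified` and `unqualified` are illustrative:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.XML.Types (Name (..))

-- Clark notation "{namespace}local" fills in nameNamespace.
qualified :: Name
qualified = "{http://example.com}person"

-- No braces, so nameNamespace is Nothing.
unqualified :: Name
unqualified = "person"

main :: IO ()
main = do
  print (nameLocalName qualified, nameNamespace qualified)
  print (nameLocalName unqualified, nameNamespace unqualified)
  -- Eq on Name compares namespace and local name (not the prefix),
  -- so these two are not equal even though the local names match.
  print (qualified == unqualified)
```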
So here's an implementation using the streaming API that produces the desired result:
```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Trans.Resource (MonadThrow, runResourceT)
import Data.Conduit (Consumer, ($$))
import Data.Text (Text)
import Data.Text.Read (decimal)
import Data.XML.Types (Event)
import Text.XML.Stream.Parse

data Person = Person Int Text
    deriving Show

-- A <person> must contain <firstname> followed by <age>; all names are namespace-qualified.
parsePerson :: MonadThrow m => Consumer Event m (Maybe Person)
parsePerson = tagNoAttr "{http://example.com}person" $ do
    name <- force "firstname tag missing" $ tagNoAttr "{http://example.com}firstname" content
    ageText <- force "age tag missing" $ tagNoAttr "{http://example.com}age" content
    -- content yields Text, so convert it to an Int and reject trailing garbage.
    case decimal ageText of
        Right (age, "") -> return $ Person age name
        _ -> force "invalid age value" $ return Nothing

-- The combinators don't skip elements, so consume the siblings before <people> explicitly.
parsePeople :: MonadThrow m => Consumer Event m [Person]
parsePeople = force "no people tag" $ do
    _ <- tagNoAttr "{http://example.com}success" content
    _ <- tagNoAttr "{http://example.com}row_count" content
    _ <- tagNoAttr "{http://example.com}summary" $
        tagNoAttr "{http://example.com}bananas" content
    tagNoAttr "{http://example.com}people" $ many parsePerson

-- ignoreAttrs accepts <population> regardless of the attributes it carries.
parsePopulation :: MonadThrow m => Consumer Event m [Person]
parsePopulation = force "population tag missing" $
    tagName "{http://example.com}population" ignoreAttrs $ \() -> parsePeople

main :: IO ()
main = do
    people <- runResourceT $
        parseFile def "people2.xml" $$ parsePopulation
    print people
```
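The code above targets the xml-conduit 1.1 era API (`$$`, `Consumer`, `tagName`). Below is a sketch of the same parser for newer releases, assuming conduit 1.3 (`ConduitT`, `runConduit`, `.|`) and an xml-conduit version that provides `tag'`. It also reads from an in-memory lazy `ByteString` via `parseLBS` instead of a file, which is the shape of data you'd get from http-conduit. `parsePeopleLBS` and `sampleXml` are hypothetical names; running in `Either SomeException` (which has a `MonadThrow` instance from the exceptions package) keeps the parse pure.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming conduit >= 1.3 and a recent xml-conduit that provides tag'.
import Control.Exception (SomeException)
import Control.Monad.Catch (MonadThrow)
import qualified Data.ByteString.Lazy as L
import Data.Conduit (ConduitT, runConduit, (.|))
import Data.Text (Text)
import Data.Text.Read (decimal)
import Data.XML.Types (Event)
import Text.XML.Stream.Parse

data Person = Person Int Text
  deriving (Show, Eq)

-- A <person> must contain <firstname> followed by <age>.
parsePerson :: MonadThrow m => ConduitT Event o m (Maybe Person)
parsePerson = tagNoAttr "{http://example.com}person" $ do
  name <- force "firstname tag missing" $ tagNoAttr "{http://example.com}firstname" content
  ageText <- force "age tag missing" $ tagNoAttr "{http://example.com}age" content
  case decimal ageText of
    Right (age, "") -> return (Person age name)
    _ -> force "invalid age value" (return Nothing)

-- Attributes on <population> are ignored; the elements before <people>
-- are consumed explicitly because the combinators never skip anything.
parsePopulation :: MonadThrow m => ConduitT Event o m [Person]
parsePopulation =
  force "population tag missing" $
    tag' "{http://example.com}population" ignoreAttrs $ \() -> do
      _ <- tagNoAttr "{http://example.com}success" content
      _ <- tagNoAttr "{http://example.com}row_count" content
      _ <- tagNoAttr "{http://example.com}summary" $
        tagNoAttr "{http://example.com}bananas" content
      force "no people tag" $
        tagNoAttr "{http://example.com}people" (many parsePerson)

-- Hypothetical helper: parse an in-memory document such as an HTTP response body.
parsePeopleLBS :: L.ByteString -> Either SomeException [Person]
parsePeopleLBS xml = runConduit (parseLBS def xml .| parsePopulation)

-- Hypothetical sample input: the document from the question, without indentation.
sampleXml :: L.ByteString
sampleXml = mconcat
  [ "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
  , "<population xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\""
  , " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
  , " xmlns=\"http://example.com\">"
  , "<success>true</success><row_count>2</row_count>"
  , "<summary><bananas>0</bananas></summary>"
  , "<people>"
  , "<person><firstname>Michael</firstname><age>25</age></person>"
  , "<person><firstname>Eliezer</firstname><age>2</age></person>"
  , "</people></population>"
  ]

main :: IO ()
main = print (parsePeopleLBS sampleXml)
```

For the download step, `simpleHttp` from `Network.HTTP.Conduit` returns a lazy `ByteString`, so its result could be passed straight to `parsePeopleLBS`.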
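Here's an example using the cursor API. Note that it has different error-handling characteristics (for example, a `<person>` with a missing or non-numeric age is silently skipped instead of raising an exception), but it should produce the same result for well-formed input.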
```haskell
{-# LANGUAGE OverloadedStrings #-}
import Text.XML
import Text.XML.Cursor
import Data.Text (Text)
import Data.Text.Read (decimal)
import Data.Monoid (mconcat)

main :: IO ()
main = do
    doc <- Text.XML.readFile def "people2.xml"
    let cursor = fromDocument doc
    -- Find every <person> descendant and convert each one with parsePerson.
    print $ cursor $// element "{http://example.com}person" >=> parsePerson

data Person = Person Int Text
    deriving Show

-- Returns a one-element list on success, or [] if the age is missing or not a number.
parsePerson :: Cursor -> [Person]
parsePerson c = do
    let name = c $/ element "{http://example.com}firstname" &/ content
        ageText = c $/ element "{http://example.com}age" &/ content
    case decimal $ mconcat ageText of
        Right (age, "") -> [Person age $ mconcat name]
        _ -> []
```
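If the document arrives as bytes rather than a file (for example, an http-conduit response body), the cursor version can start from `Text.XML.parseLBS`, which returns `Either SomeException Document` instead of throwing. A sketch under that assumption; `peopleFromLBS` and `sampleXml` are hypothetical names, and the cursor operators are parenthesized explicitly rather than relying on their fixities:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad ((>=>))
import qualified Data.ByteString.Lazy as L
import Data.Text (Text)
import Data.Text.Read (decimal)
import Text.XML (def, parseLBS)
import Text.XML.Cursor (Cursor, content, element, fromDocument, ($/), ($//), (&/))

data Person = Person Int Text
  deriving (Show, Eq)

-- Same logic as the cursor answer: [] when the age is missing or not a number.
parsePerson :: Cursor -> [Person]
parsePerson c =
  case decimal (mconcat ageText) of
    Right (age, "") -> [Person age (mconcat name)]
    _ -> []
  where
    name = c $/ (element "{http://example.com}firstname" &/ content)
    ageText = c $/ (element "{http://example.com}age" &/ content)

-- Hypothetical helper: parse an in-memory document and collect every <person>.
peopleFromLBS :: L.ByteString -> Either String [Person]
peopleFromLBS bytes =
  case parseLBS def bytes of
    Left err -> Left (show err)
    Right doc ->
      Right (fromDocument doc $// (element "{http://example.com}person" >=> parsePerson))

-- Hypothetical sample input: the document from the question, without indentation.
sampleXml :: L.ByteString
sampleXml = mconcat
  [ "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
  , "<population xmlns=\"http://example.com\">"
  , "<success>true</success><row_count>2</row_count>"
  , "<summary><bananas>0</bananas></summary>"
  , "<people>"
  , "<person><firstname>Michael</firstname><age>25</age></person>"
  , "<person><firstname>Eliezer</firstname><age>2</age></person>"
  , "</people></population>"
  ]

main :: IO ()
main = print (peopleFromLBS sampleXml)
```

Because the cursor search uses `$//` (all descendants), it doesn't care about `success`, `row_count` or `summary`, which is one reason the DOM/cursor route is simpler for this document.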