html5 – Querying metadata from an HTML page with SPARQL returns nothing

I seem to have a problem either with how I use HTML::HTML5::Microdata::Parser or RDF::Query, or with SPARQL syntax and semantics. I am interested in this part of a news site page:
<div class="authors">
Autoři: <span itemprop="author" itemscope itemtype="http://schema.org/Person"><a rel="author" itemprop="url" class="name" href="http://vice.idnes.cz/novinari.aspx?idnov=2504" ><span itemprop="name">Zdeňka Trachtová</span></a></span>,<span itemprop="author" itemscope itemtype="http://schema.org/Person"><a rel="author" itemprop="url"  href="http://vice.idnes.cz/novinari.aspx?idnov=3495" ><span itemprop="additionalName">san</span></a><span class="h" itemprop="name">Sabina Netrvalová</span></span>
</div>

Here is my test code:

#!/usr/bin/env perl

use strict;
use Data::Dumper;
use HTML::HTML5::Microdata::Parser;
use RDF::Query;
use IO::Handle;
use LWP::Simple;


STDOUT->binmode(":utf8");
STDERR->binmode(":utf8");

my $htmldoc = LWP::Simple::get(
    "http://zpravy.idnes.cz/zacinaji-zapisy-do-prvnich-trid-dn3-/domaci.aspx?c=A160114_171615_domaci_zt");
die "Could not fetch URL. $@" unless defined $htmldoc;

my $microdata = HTML::HTML5::Microdata::Parser->new (
    $htmldoc,$ARGV[0],{auto_config => 1,tdb_service => 1,xhtml_meta => 1,xhtml_rel => 1});
print STDERR "microdata->graph:\n",Dumper($microdata->graph),"\n";

my $query = RDF::Query->new(<<'SPARQL');
PREFIX schema: <http://schema.org/>
SELECT *
WHERE {
   ?author a schema:Person .
}
SPARQL

my $people = $query->execute($microdata->graph);
print STDERR "authors from RDF:\n",Dumper($people),"\n";
while (my $person = $people->next) {
    print STDERR "people: ",$person,"\n";
}

The options passed to HTML::HTML5::Microdata::Parser are just my last-ditch attempt to make this work. (I basically have no idea what I am doing.)

Any ideas on how to make this work and get the authors' names?

Solution

Just use Mojo::UserAgent and Mojo::DOM:
use strict;
use warnings;
use utf8;
use v5.10;

BEGIN {
    binmode *STDOUT,':utf8';
    binmode *STDERR,':utf8';
}

use Mojo::UserAgent;

my $url = "http://zpravy.idnes.cz/zacinaji-zapisy-do-prvnich-trid-dn3-/domaci.aspx?c=A160114_171615_domaci_zt";

my $dom = Mojo::UserAgent->new->get($url)->res->dom;

# Process all authors
for my $span ($dom->find('span[itemprop=author]')->each) {
    say $span->all_text;
}

Output:

Zdeňka Trachtová
san Sabina Netrvalová
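
Note that the second line also contains the additionalName value ("san"), because all_text collects every text node inside the author span. If only the name property is wanted, a minimal sketch along the following lines might help. It parses a trimmed copy of the markup from the question offline instead of fetching the page, and author_names is a hypothetical helper introduced here for illustration, not part of Mojolicious.

```perl
use strict;
use warnings;
use utf8;
use v5.10;

use Mojo::DOM;

binmode STDOUT, ':utf8';

# Trimmed copy of the markup from the question, inlined so the
# sketch runs without network access.
my $html = <<'HTML';
<div class="authors">
Autoři: <span itemprop="author" itemscope itemtype="http://schema.org/Person"><a rel="author" itemprop="url" class="name" href="http://vice.idnes.cz/novinari.aspx?idnov=2504"><span itemprop="name">Zdeňka Trachtová</span></a></span>,<span itemprop="author" itemscope itemtype="http://schema.org/Person"><a rel="author" itemprop="url" href="http://vice.idnes.cz/novinari.aspx?idnov=3495"><span itemprop="additionalName">san</span></a><span class="h" itemprop="name">Sabina Netrvalová</span></span>
</div>
HTML

# Hypothetical helper: returns the itemprop="name" text of every author span.
sub author_names {
    my ($markup) = @_;
    my $dom = Mojo::DOM->new($markup);
    my @names;
    for my $author ($dom->find('span[itemprop=author]')->each) {
        # [itemprop=name] is an exact match, so additionalName is skipped.
        my $name = $author->at('[itemprop=name]') or next;
        push @names, $name->all_text;
    }
    return @names;
}

say for author_names($html);
```

With the snippet above this should print only the two names, without the "san" prefix. To run it against the live page, the $dom from the answer's Mojo::UserAgent call could be used in place of Mojo::DOM->new($markup).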

For a short 8-minute tutorial on these modules, check out Mojocast episode 5.
