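I have seen this question, but it does not meet my needs. Its answers either lift the excerpt from the meta description tag or generate an excerpt for an article whose body you already have.
What I actually want is to get the first few sentences of an article, the way Readability does. What is the best way to do this? HTML parsing? Below is what I currently use, but it is not very reliable.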
    function guessExcerpt($url)
    {
        $html = file_get_contents_curl($url);
        if ($html === false || $html === '') {
            return null; // request failed or returned an empty body
        }

        $doc = new DOMDocument();
        @$doc->loadHTML($html); // suppress warnings from malformed markup

        $description = null; // stays null when the page has no description meta tag
        // Tag names parsed by loadHTML() are lowercase, so match 'meta', not 'Meta'.
        foreach ($doc->getElementsByTagName('meta') as $meta) {
            if (strtolower($meta->getAttribute('name')) === 'description') {
                $description = $meta->getAttribute('content');
            }
        }

        return $description;
    }

    function file_get_contents_curl($url)
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        $data = curl_exec($ch); // false on failure
        curl_close($ch);

        return $data;
    }
Here is a port of Readability to PHP:
https://github.com/feelinglucky/php-readability. Give it a try. Its extraction results are similar to Readability's, since it implements Readability's algorithm.
    require 'lib/Readability.inc.php';

    $html = file_get_contents_curl($url);
    $Readability = new Readability($html, $html_input_charset); // default charset is utf-8
    $ReadabilityData = $Readability->getContent();

    $title   = $ReadabilityData['title'];
    $content = $ReadabilityData['content'];
You can then use a few sentences from $content as the excerpt.
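As a rough illustration of that last step, here is a minimal sketch that pulls the first few sentences out of an HTML fragment such as $content. The helper name `firstSentences` is hypothetical and not part of php-readability, and the sentence splitting is deliberately naive: it breaks after `.`, `!` or `?` followed by whitespace, so abbreviations like "e.g." will be split incorrectly.

```php
<?php
// Hypothetical helper (an assumption, not part of php-readability):
// returns the first $count sentences of an HTML fragment as plain text.
function firstSentences(string $html, int $count = 2): string
{
    // Add a space after common block-level closing tags and <br>, so that
    // text from adjacent paragraphs does not run together once tags are stripped.
    $html = preg_replace('#</(p|div|li|h[1-6])>|<br\s*/?>#i', '$0 ', $html) ?? $html;

    // Strip tags, decode entities, and collapse runs of whitespace.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = trim(preg_replace('/\s+/u', ' ', $text) ?? $text);

    // Naive sentence split: break on whitespace that follows ., ! or ?
    $sentences = preg_split('/(?<=[.!?])\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    return implode(' ', array_slice($sentences, 0, $count));
}
```

With the Readability snippet above, this would be called as `$excerpt = firstSentences($content, 2);`. For anything beyond a quick excerpt, a proper sentence tokenizer would likely be more reliable than a regular expression.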