Parse an HTML table into a PHP array using file_get_contents

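I'm trying to parse the table shown into a multidimensional PHP array. I'm using the following code, but for some reason it returns an empty array. After searching around the web, I found this site, which is where I got the parseTable() function. From reading the comments on that site, I see that the function works perfectly, so I assume there is something wrong with the way I'm getting the HTML code from file_get_contents(). Any thoughts on what I'm doing wrong?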

<?php

$data = file_get_contents('http://flow935.com/playlist/flowhis.HTM');

function parseTable($html)
{
  // Find the table
  preg_match("/<table.*?>.*?<\/[\s]*table>/s",$html,$table_html);

  // Get title for each row
  preg_match_all("/<th.*?>(.*?)<\/[\s]*th>/",$table_html[0],$matches);
  $row_headers = $matches[1];

  // Iterate each row
  preg_match_all("/<tr.*?>(.*?)<\/[\s]*tr>/s",$table_html[0],$matches);

  $table = array();

  foreach($matches[1] as $row_html)
  {
    preg_match_all("/<td.*?>(.*?)<\/[\s]*td>/",$row_html,$td_matches);
    $row = array();
    for($i=0; $i<count($td_matches[1]); $i++)
    {
      $td = strip_tags(html_entity_decode($td_matches[1][$i]));
      $row[$row_headers[$i]] = $td;
    }

    if(count($row) > 0)
      $table[] = $row;
  }
  return $table;
}

$output = parseTable($data);

print_r($output);

?>

I want my output array to look like this:

1
--> 11:33AM
--> DEV
--> IN THE DARK

2
--> 11:29AM
--> LIL' WAYNE
--> SHE WILL

3
--> 11:26AM
--> KARDINAL OFFISHALL
--> NUMBA 1 (TIDE IS HIGH)

Don't use regexps to parse HTML! Instead, let an HTML parser library worry about the structure of the markup for you.

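I suggest you check out Simple HTML DOM (http://simplehtmldom.sourceforge.net/). It is a library written specifically to help with this kind of web scraping problem in PHP. By using such a library you can write your scraper in far less code, without having to worry about crafting regular expressions that actually work.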
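To illustrate why a parser tends to be more robust, here is a minimal sketch that runs the cell pattern from the question against a small, made-up snippet. The sample markup (upper-case tags, a line break inside a cell) is an assumption for illustration only and has not been checked against the actual page. The sketch uses PHP's built-in DOMDocument (the dom extension) instead of Simple HTML DOM so that it runs without any extra files.

```php
<?php
// Hypothetical sample markup: upper-case tags and a line break inside a cell.
$html = "<TABLE><TR><TD>11:33AM</TD><TD>DEV</TD><TD>IN THE\nDARK</TD></TR></TABLE>";

// The cell pattern from the question: it is case-sensitive, and without the
// "s" modifier "." does not match newlines.
$regexCount = preg_match_all("/<td.*?>(.*?)<\/[\s]*td>/", $html, $m);

// An HTML parser normalises tag names, so the cells are found regardless of case.
$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();
$domCount = $doc->getElementsByTagName('td')->length;

printf("regex: %d cells, DOM: %d cells\n", $regexCount, $domCount);
```

On this sample the regex finds no cells while the parser finds all three. Whether this is what happens with the real page depends on its markup.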

In principle, with Simple HTML DOM you only need to write something like this:

$html = file_get_html('http://flow935.com/playlist/flowhis.HTM');
foreach($html->find('tr') as $row) {
   // Parse table row here
}

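This can then be extended to capture the data in some format, for example to build an array of artists and their corresponding titles: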

<?php
require('simple_html_dom.php');

$table = array();

$html = file_get_html('http://flow935.com/playlist/flowhis.HTM');
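foreach($html->find('tr') as $row) {
    // Skip rows without three data cells (e.g. header rows);
    // find() with an index returns null when nothing matches
    if ($row->find('td',2) === null) {
        continue;
    }

    $time = $row->find('td',0)->plaintext;
    $artist = $row->find('td',1)->plaintext;
    $title = $row->find('td',2)->plaintext;

    // Group the titles under their artist
    $table[$artist][$title] = true;
}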

echo '<pre>';
print_r($table);
echo '</pre>';

?>

As we can see, this code can (trivially) be changed to reformat the data in any other way.
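For example, the numbered time / artist / title layout asked for in the question could be built roughly as follows. This is only a sketch under a few assumptions: it uses PHP's built-in DOMDocument rather than Simple HTML DOM so that it runs without external files, parsePlaylist() is a hypothetical helper name, and the sample markup (including its header row) is invented around the values shown in the question, not taken from the real page.

```php
<?php
// Sketch: turn table rows into a 1-based list of [time, artist, title].
// parsePlaylist() is a hypothetical helper, not part of any library.
function parsePlaylist(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate imperfect real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $table = array();
    $n = 1;
    foreach ($doc->getElementsByTagName('tr') as $row) {
        $cells = $row->getElementsByTagName('td');
        if ($cells->length < 3) {
            continue;                   // skip header or incomplete rows
        }
        $table[$n++] = array(
            trim($cells->item(0)->textContent),  // time
            trim($cells->item(1)->textContent),  // artist
            trim($cells->item(2)->textContent),  // title
        );
    }
    return $table;
}

// Assumed sample markup; the data values come from the question's desired output.
$sample = '<table>
  <tr><th>Time</th><th>Artist</th><th>Title</th></tr>
  <tr><td>11:33AM</td><td>DEV</td><td>IN THE DARK</td></tr>
  <tr><td>11:29AM</td><td>LIL\' WAYNE</td><td>SHE WILL</td></tr>
</table>';

$playlist = parsePlaylist($sample);
print_r($playlist);
```

To run it against the live page, the string returned by file_get_contents() could be passed to parsePlaylist() instead of $sample, provided the page really uses three td cells per row.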
