I need to find and replace all matches of some text, case-insensitively, unless the text is inside an anchor tag. For example:
<p>Match this text and replace it</p> <p>Don't <a href="/">match this text</a></p> <p>We still need to match this text and replace it</p>
Searching for "match this text" should replace only the first and the last instance.
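As per Gordon's comment, DOMDocument may be the way to go here. I'm completely unfamiliar with the DOMDocument extension and would really appreciate some basic examples of how to do this.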
Here is a UTF-8-safe solution that works not only with properly formatted documents but also with document fragments.
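To illustrate why fragments work at all, here is a minimal sketch of what loadHTML() does with one: it wraps the fragment in implied html and body elements. The sample fragment is made up for illustration and is not part of the original answer.

```php
<?php
// Assumption: this fragment is invented for the demonstration.
$dom = new DOMDocument();
$dom->loadHTML('<p>Just a fragment</p>');

$root = $dom->documentElement;                    // the implied root element
$p    = $dom->getElementsByTagName('p')->item(0); // the paragraph we passed in

echo $root->nodeName, "\n";          // html
echo $p->parentNode->nodeName, "\n"; // body
```

This wrapping is why the output has to be cut back down to the contents of the body element at the end.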
mb_convert_encoding is needed because loadHTML() seems to have a bug with UTF-8 encoding (see here and here).
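A rough sketch of the encoding workaround in isolation: non-ASCII characters are turned into numeric entities before parsing, so the parser never has to guess the input encoding. Note the swap: this sketch uses mb_encode_numericentity() instead of mb_convert_encoding() with 'HTML-ENTITIES', because the 'HTML-ENTITIES' pseudo-encoding is deprecated as of PHP 8.2. The sample string is an assumption for illustration, and the file itself must be saved as UTF-8.

```php
<?php
// Assumption: a tiny UTF-8 sample; the original answer uses a longer fragment.
$html = '<p>ŐŰ</p>';

// Encode every code point from U+0080 upwards as a decimal numeric entity.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$encoded = mb_encode_numericentity($html, $convmap, 'UTF-8');

$dom = new DOMDocument();
$dom->loadHTML($encoded);

// The DOM stores text as UTF-8, so the original characters come back intact.
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
var_dump($text === 'ŐŰ');
```

On PHP versions older than 8.2 the mb_convert_encoding() call from the answer does the same job without a deprecation notice.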
mb_substr trims the body tag from the output, so you get back the original content without any extra markup.
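The offsets 6 and -7 passed to mb_substr are simply the lengths of the opening and closing body tags. A small sketch of that arithmetic, using a hand-written string in place of real saveXML() output (an assumption for illustration):

```php
<?php
// Assumption: a hand-written stand-in for the serialized <body> element.
$serialized = '<body><p>Match ŐŰ</p></body>';

$open  = '<body>';  // 6 characters
$close = '</body>'; // 7 characters

// Skip the opening tag and stop before the closing tag.
$inner = mb_substr(
    $serialized,
    mb_strlen($open, 'UTF-8'),
    -mb_strlen($close, 'UTF-8'),
    'UTF-8'
);

echo $inner, "\n"; // <p>Match ŐŰ</p>
```

This only works while the body tag carries no attributes; if it did, the opening tag would be longer than six characters.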

    <?php
    $html = '<p>Match this text and replace it</p> '
          . '<p>Don\'t <a href="/">match this text</a></p> '
          . '<p>We still need to match this text and replace itŐŰ</p> '
          . '<p>This is <a href="#">a link <span>with <strong>don\'t match this text</strong> content</span></a></p>';

    $dom = new DOMDocument();
    // loadXML needs properly formatted documents, so it's better to use loadHTML,
    // but it needs a hack to properly handle UTF-8 encoding
    $dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));

    $xpath = new DOMXPath($dom);
    // select every text node that does not have an <a> element among its ancestors
    foreach ($xpath->query('//text()[not(ancestor::a)]') as $node) {
        $replaced = str_ireplace('match this text', 'MATCH', $node->wholeText);
        $newNode  = $dom->createDocumentFragment();
        $newNode->appendXML($replaced);
        $node->parentNode->replaceChild($newNode, $node);
    }

    // get only the body tag with its contents, then trim the body tag itself
    // to get only the original content
    echo mb_substr($dom->saveXML($xpath->query('//body')->item(0)), 6, -7, 'UTF-8');

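For reuse, the same idea can be wrapped in a function. This is only a sketch under a few assumptions, not the code from the answer: the name replaceOutsideLinks is hypothetical, it assigns to each text node's nodeValue instead of building a fragment with appendXML() (so it suits plain-text replacements only, but does not choke on text containing & or <), it serializes the children of body with saveHTML() instead of trimming with mb_substr, and it uses mb_encode_numericentity() for the encoding step.

```php
<?php
// Hypothetical helper: replaces $search with $replacement (case-insensitively)
// in every text node that is not inside an <a> element.
function replaceOutsideLinks(string $html, string $search, string $replacement): string
{
    $dom = new DOMDocument();

    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML(mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8'));
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//text()[not(ancestor::a)]') as $node) {
        // Plain-text replacement; special characters are escaped on output.
        $node->nodeValue = str_ireplace($search, $replacement, $node->nodeValue);
    }

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize only the children of <body>, so no wrapper markup is returned.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo replaceOutsideLinks(
    '<p>Match this text, but not <a href="/">match this text</a>.</p>',
    'match this text',
    'MATCH'
), "\n";
```

If the replacement itself has to contain markup, such as a hyperlink, the fragment approach with appendXML() from the answer is the one to use.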
References:
1. find and replace keywords by hyperlinks in an html fragment,via php dom
2. Regex / DOMDocument – match and replace text not in a link
3. php problem with russian language
4. Why Does DOM Change Encoding?
I read dozens of answers, so I'm sorry if I forgot someone (please comment and I'll add yours as well in that case).
Thanks to Gordon, and for the comments on my other answer.