Solution
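Edit, December 2013: Google has deprecated the old Xml service and replaced it with XmlService. The scripts in this answer have been updated to use the new service. The new service requires standards-compliant XML and HTML, whereas the old one was forgiving of problems such as missing close tags.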
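If you aren't sure whether your markup will survive the stricter parser, a rough pre-check for unbalanced tags can save some head-scratching. What follows is only a sketch in plain JavaScript; it is not part of XmlService. The helper name `firstUnbalancedTag` is my own invention, and its regex deliberately ignores comments, CDATA sections, and attribute values containing `>`, so treat it as a hint rather than a validator.

```javascript
/**
 * Rough well-formedness hint (hypothetical helper, not an XmlService call).
 * Returns the name of the first tag that is left open or closed out of
 * order, or null if every tag seen is balanced.
 */
function firstUnbalancedTag(markup) {
  // Captures: 1 = "/" for a closing tag, 2 = tag name, 3 = "/" if self-closing.
  var tagPattern = /<\s*(\/?)\s*([A-Za-z][\w:.-]*)[^>]*?(\/?)\s*>/g;
  var open = [];   // stack of currently open tag names
  var m;
  while ((m = tagPattern.exec(markup)) !== null) {
    var isClose = m[1] === '/';
    var name = m[2];
    var isSelfClosing = m[3] === '/';
    if (isSelfClosing) {
      continue;                 // <br/> opens and closes itself
    }
    if (!isClose) {
      open.push(name);          // opening tag: remember it
      continue;
    }
    var top = open.pop();       // closing tag: must match the latest open tag
    if (top !== name) {
      return top === undefined ? name : top;
    }
  }
  // Anything still on the stack was never closed.
  return open.length > 0 ? open[open.length - 1] : null;
}

// A string with a missing </body>, the kind of thing the old service forgave.
var sloppy = '<html> <body> <div id="here">hello world!!</div> </html>';
var strict = '<html> <body> <div id="here">hello world!!</div> </body> </html>';
console.log(firstUnbalancedTag(sloppy));  // body
console.log(firstUnbalancedTag(strict));  // null
```

A non-null result means the strict parser is likely to reject the string. A null result does not guarantee it will be accepted.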
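Have a look at Tutorial: Parsing an XML Document. (As of December 2013 this tutorial is still online, even though the Xml service is deprecated.) Starting from that foundation, you can use the XML parsing in Script Services to navigate the page. Here's a small script that operates on your example: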
    function getProgrammeList() {
      var txt = '<html> <body> <div> <div> <div id="here">hello world!!</div> </div> </div> </html>';

      // Put the received xml response into XMLdocument format
      // (deprecated Xml service, lenient mode)
      var doc = Xml.parse(txt, true);

      Logger.log(doc.html.body.div.div.div.id + " = "
                 + doc.html.body.div.div.div.Text);    /// here = hello world!!

      debugger;  // Pause in debugger - examine content of doc
    }
To get the real page, start with this:

    var url = 'http://blah.blah/whatever?querystring=foobar';
    var txt = UrlFetchApp.fetch(url).getContentText();
    ....
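If you look at the documentation for getElements, you'll see that it supports retrieving specific tags, such as "div". It only finds direct children of a given element, though; it does not explore the whole XML document. You should be able to write a function that traverses the document, checking the id of each div element until it finds your programme list.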
    var programmeList = getDivById(doc, "here");
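To make the "direct children only" point concrete, here is a minimal sketch of such a traversal: check a node, then recurse into its direct children. It runs on plain JavaScript objects of the form `{ name, attrs, children }`, which is a stand-in shape I made up for illustration and not what either Apps Script service returns. `findByAttr` is likewise a hypothetical name.

```javascript
/**
 * Depth-first search over a hypothetical { name, attrs, children } tree.
 * Returns the first node (in document order) whose tag name and
 * attribute value both match, or null if there is none.
 */
function findByAttr(node, tagName, attr, val) {
  // Check the current node first, so matches come back in document order.
  if (node.name === tagName && node.attrs && node.attrs[attr] === val) {
    return node;
  }
  // Only direct children are visible at each step; recursion reaches the rest.
  var children = node.children || [];
  for (var i = 0; i < children.length; i++) {
    var hit = findByAttr(children[i], tagName, attr, val);
    if (hit !== null) {
      return hit;
    }
  }
  return null;
}

// The example page, written out by hand as plain objects (assumed shape).
var page = {
  name: 'html', attrs: {}, children: [
    { name: 'body', attrs: {}, children: [
      { name: 'div', attrs: {}, children: [
        { name: 'div', attrs: {}, children: [
          { name: 'div', attrs: { id: 'here' }, text: 'hello world!!', children: [] }
        ] }
      ] }
    ] }
  ]
};

var match = findByAttr(page, 'div', 'id', 'here');
console.log(match ? match.text : 'not found');  // hello world!!
```

Against a real parsed document you would swap the property lookups for the service's own accessors. The shape of the recursion stays the same.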
Edit: I couldn't help myself...
Here's a utility function that will do just that.
    /**
     * Find a <div> tag with the given id.
     * <pre>
     * Example: getDivById( html, 'tagVal' ) will find
     *
     *          <div id="tagVal">
     * </pre>
     *
     * @param {Element|Document}
     *                element     XML document or element to start search at.
     * @param {String} id         HTML <div> id to find.
     *
     * @return {Element}          First matching element (in doc order) or null.
     */
    function getDivById( element, id ) {
      // Call utility function to do the work.
      return getElementByVal( element, 'div', 'id', id );
    }

    /**
     * !Now updated for XmlService!
     *
     * Traverse the given Xml Document or Element looking for a match.
     * Note: 'class' is stripped during parsing and cannot be used for
     * searching, I don't know why.
     * <pre>
     * Example: getElementByVal( body, 'input', 'value', 'Go' ); will find
     *
     *          <input type="submit" name="btn" value="Go" id="btn" class="submit buttonGradient" />
     * </pre>
     *
     * @param {Element|Document}
     *                element     XML document or element to start search at.
     * @param {String} elementType XML element type, e.g. 'div' for <div>
     * @param {String} attr       Attribute or Property to compare.
     * @param {String} val        Search value to locate
     *
     * @return {Element}          First matching element (in doc order) or null.
     */
    function getElementByVal( element, elementType, attr, val ) {
      // Get all descendants, in document order
      var descendants = element.getDescendants();
      for (var i = 0; i < descendants.length; i++) {
        var content = descendants[i];
        // We'll only examine ELEMENTs
        if (content.getType() == XmlService.ContentTypes.ELEMENT) {
          var candidate = content.asElement();
          if (candidate.getName() === elementType) {
            // getAttribute() returns null when the attribute is absent
            var attribute = candidate.getAttribute(attr);
            if (attribute !== null && val === attribute.getValue()) {
              return candidate;
            }
          }
        }
      }
      // No matches in document
      return null;
    }
Applying this to your example, we get:
    function getProgrammeList() {
      // XmlService needs well-formed input, so every tag is closed here.
      var txt = '<html> <body> <div> <div> <div id="here">hello world!!</div> </div> </div> </body> </html>';

      // Parse the received xml response into an XML document
      var doc = XmlService.parse(txt);

      var found = getDivById(doc.getRootElement(), 'here');
      Logger.log(found.getAttribute('id').getValue()
                 + " = "
                 + found.getValue());    /// here = hello world!!
    }
Note: for a real-world example that uses these utilities, see this answer.