URL: webpage_link
The script I have tried:
import requests_html

with requests_html.HTMLSession() as session:
    r = session.get('https://www.gdax.com/trade/LTC-EUR')
    js = r.html.render()
    item = js.find('.MarketInfo_market-num_1lAXs', first=True).text
    print(item)
Here is the full traceback:
Exception in callback NavigatorWatcher.waitForNavigation.<locals>.watchdog_cb(<Task finishe...> result=None>) at C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py:49
handle: <Handle NavigatorWatcher.waitForNavigation.<locals>.watchdog_cb(<Task finishe...> result=None>) at C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py:49>
Traceback (most recent call last):
  File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\asyncio\events.py", line 145, in _run
    self._callback(*self._args)
  File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py", line 52, in watchdog_cb
    self._timeout)
  File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py", line 40, in _raise_error
    raise error
concurrent.futures._base.TimeoutError: Navigation Timeout Exceeded: 3000 ms exceeded

Traceback (most recent call last):
  File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\experiment.py", line 6, in <module>
    item = js.find('.MarketInfo_market-num_1lAXs', first=True).text
AttributeError: 'NoneType' object has no attribute 'find'

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\shutil.py", line 387, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 5] Access is denied: 'C:\\Users\\ar\\.pyppeteer\\.dev_profile\\tmp1gng46sw\\CrashpadMetrics-active.pma'
The price I want is shown at the top of the page, for example "177.59 EUR Last trade price". I want to get 177.59, or whatever the current price is.
Solution
Exception in callback NavigatorWatcher.waitForNavigation.<locals>.watchdog_cb(<Task finishe...> result=None>) at C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py:49
handle: <Handle NavigatorWatcher.waitForNavigation.<locals>.watchdog_cb(<Task finishe...> result=None>) at C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py:49>
Traceback (most recent call last):
  File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\asyncio\events.py", line 145, in _run
    self._callback(*self._args)
  File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py", line 40, in _raise_error
    raise error
concurrent.futures._base.TimeoutError: Navigation Timeout Exceeded: 3000 ms exceeded
This traceback is not raised in the main thread, so your code is not aborted by it. Your page may or may not have finished loading; you may want to set a longer timeout for the browser, or introduce a sleep cycle so it has time to process the AJAX responses.
Next, response.html.render() returns None. It loads the HTML into a headless Chromium browser, leaves the JavaScript rendering to that browser, then copies the rendered page HTML back into the response.html data structure in place, so there is nothing to return. As a result js is set to None rather than a new HTML instance, which causes your next traceback.
After rendering, search with the existing response.html object:
r.html.render()
item = r.html.find('.MarketInfo_market-num_1lAXs', first=True)
Most likely there is no such CSS class, because the last 5 characters are generated anew on each page render, after the JSON data has been loaded over AJAX. That makes it hard to use CSS to locate the element in question.
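As a side note not drawn from the answer itself: a CSS attribute-substring selector can match on the stable part of the class name, so the generated suffix does not need to be known in advance. A minimal sketch, assuming r.html has already been rendered and the MarketInfo_market-num_ prefix stays stable:

# Hedged alternative: select elements whose class attribute contains the
# stable prefix, instead of spelling out the generated suffix.
items = r.html.find('[class*="MarketInfo_market-num_"]')
for item in items:
    print(item.text)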
In addition, I found that without a sleep cycle the browser has no time to fetch the AJAX resources and render the information you want to load. Give it, say, 10 seconds of sleep to do that work before the HTML is copied back, and set a longer timeout (the default is 8 seconds) if you see network timeouts:
r.html.render(timeout=10, sleep=10)
Hopefully a future API update will also provide a way to wait for network activity to cease.
You can use the included parse library to find the matching CSS classes:
# search for CSS suffixes
suffixes = [r[0] for r in r.html.search_all('MarketInfo_market-num_{:w}')]
for suffix in suffixes:
    # for each suffix, find all matching elements with that class
    items = r.html.find('.MarketInfo_market-num_{}'.format(suffix))
    for item in items:
        print(item.text)

Now we get this output:
169.81 EUR
+ 1.01 %
18,420 LTC
169.81 EUR
+ 1.01 %
18,420 LTC
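For reference, here is a minimal end-to-end sketch that combines the fixes above (rendering with a longer timeout and a sleep, then searching by the class-name prefix). The URL and the MarketInfo_market-num_ prefix are taken from the question and may have changed since:

import requests_html

with requests_html.HTMLSession() as session:
    r = session.get('https://www.gdax.com/trade/LTC-EUR')
    # render() mutates r.html in place and returns None
    r.html.render(timeout=10, sleep=10)
    # the class-name suffix is regenerated per build, so search by prefix
    suffixes = [match[0] for match in r.html.search_all('MarketInfo_market-num_{:w}')]
    for suffix in suffixes:
        for item in r.html.find('.MarketInfo_market-num_{}'.format(suffix)):
            print(item.text)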
Your last traceback shows that the Chromium user data path could not be cleaned up. The underlying Pyppeteer library configures the headless Chromium browser with a temporary user data path, and in your case that directory still contains some locked resources. You can ignore the error, but you may want to delete any remaining files in the .pyppeteer folder at a later time.
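If you want to clear those leftovers out from a script, a minimal cleanup sketch (assuming the default .pyppeteer location under your home directory, as seen in the traceback above) could look like this:

import shutil
from pathlib import Path

# Hedged cleanup sketch: remove leftover temporary Chromium profiles left
# behind by pyppeteer. ignore_errors=True skips files that are still locked.
profile_dir = Path.home() / '.pyppeteer' / '.dev_profile'
if profile_dir.exists():
    for tmp in profile_dir.glob('tmp*'):
        shutil.rmtree(tmp, ignore_errors=True)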