I am trying to set up a proxy for web scraping with Selenium and PhantomJS. I am using Python.
I have read in many places that PhantomJS has a bug that prevents --proxy-auth from working.
    from selenium.webdriver.common.proxy import *
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    service_args = [
        '--proxy=http://fr.proxymesh.com:31280',
        '--proxy-auth=USER:PWD',
        '--proxy-type=http',
    ]

    driver = webdriver.PhantomJS(service_args=service_args)
    driver.get("https://www.google.com")
    print driver.page_source
ProxyMesh suggests using the following code:

    page.customHeaders = {'Proxy-Authorization': 'Basic ' + btoa('USERNAME:PASSWORD')};

But I don't know how to translate that into Python.
This is what I currently have:
    from selenium import webdriver
    import base64
    from selenium.webdriver.common.proxy import *
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    service_args = [
        '--proxy=http://fr.proxymesh.com:31280',
    ]

    headers = {'Proxy-Authorization': 'Basic ' + base64.b64encode('USERNAME:PASSWORD')}

    for key, value in enumerate(headers):
        webdriver.DesiredCapabilities.PHANTOMJS['phantomjs.page.customHeaders.{}'.format(key)] = value

    driver = webdriver.PhantomJS(service_args=service_args)
    driver.get("https://www.google.com")
    print driver.page_source
But it doesn't work.
Any suggestions on how to get this working?
Solution
I am combining the answers from:
How to correctly pass basic auth (every click) using Selenium and phantomjs webdriver
and:
base64.b64encode error
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
    import base64

    service_args = [
        '--proxy=http://fr.proxymesh.com:31280',
    ]

    # b64encode returns bytes on Python 3, so decode it before joining it to a str.
    authentication_token = "Basic " + base64.b64encode(b'username:password').decode('ascii')

    # Copy the defaults so the shared class-level dict is not modified.
    capa = DesiredCapabilities.PHANTOMJS.copy()
    capa['phantomjs.page.customHeaders.Proxy-Authorization'] = authentication_token

    driver = webdriver.PhantomJS(desired_capabilities=capa, service_args=service_args)
    driver.get("http://...")
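
One likely reason the loop in the question fails is that enumerate(headers) yields (index, header name) pairs, so the capability key ends up as phantomjs.page.customHeaders.0 and its value is the header name rather than the token. Below is a minimal sketch of building the capability entries by iterating over .items() instead. The helper names basic_auth_value and build_header_capabilities are hypothetical, introduced only for illustration, and the credentials are placeholders.

```python
import base64

# Capability prefix used in the answer above for PhantomJS custom headers.
PREFIX = 'phantomjs.page.customHeaders.'


def basic_auth_value(username, password):
    """Python equivalent of 'Basic ' + btoa('USERNAME:PASSWORD') (hypothetical helper)."""
    raw = '{}:{}'.format(username, password).encode('utf-8')
    # b64encode returns bytes, so decode to get a str that can be concatenated.
    return 'Basic ' + base64.b64encode(raw).decode('ascii')


def build_header_capabilities(headers):
    """Map each header name to its PhantomJS capability key (hypothetical helper)."""
    return {PREFIX + name: value for name, value in headers.items()}


headers = {'Proxy-Authorization': basic_auth_value('USERNAME', 'PASSWORD')}
caps = build_header_capabilities(headers)

# What the question's loop actually iterates over: index and header name.
broken_pairs = list(enumerate(headers))
```

The resulting dict could then be merged into the capabilities with capa.update(caps) before the driver is created.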