Web scraping – Is web scraping allowed?

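I'm working on a project that needs certain statistics from another website, and I've built an HTML scraper that fetches this data automatically every 15 minutes. However, I have stopped the bot for now, because their terms of use state that they do not allow it.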

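I really want to respect this, especially if there is a law that prohibits me from taking this data, but I have emailed them several times without getting a single answer, so I have now concluded that I will simply grab the data, provided it is legal.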

On some forums I have read that it is legal, but I would much rather get a more “precise” answer here on StackOverflow.

And supposing it is in fact not illegal, would they have any software that could detect my bot making several connections every 15 minutes?
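On the detection question, a fixed 15-minute schedule is easy to spot in principle, because the gaps between requests are almost identical. The following is only an illustrative sketch of that idea, not a description of what any particular site runs: the function name `looks_periodic`, its thresholds, and the sample timestamps are all assumptions made up for the example.

```python
from statistics import mean, pstdev


def looks_periodic(timestamps, min_requests=5, max_jitter_ratio=0.05):
    """Return True if request times (in seconds) arrive at a near-constant interval.

    Hypothetical heuristic: compare the spread of the gaps between
    consecutive requests with their average size.
    """
    if len(timestamps) < min_requests:
        return False  # too few requests to judge
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap == 0:
        return False
    # Coefficient of variation: standard deviation of the gaps relative to their mean.
    return pstdev(gaps) / average_gap <= max_jitter_ratio


# Made-up data: a client arriving every 900 s (15 minutes), give or take a second.
bot_hits = [0, 900, 1801, 2700, 3599, 4500]
# Made-up data: a visitor with irregular gaps between page views.
human_hits = [0, 40, 700, 760, 5000, 5100]

print(looks_periodic(bot_hits))
print(looks_periodic(human_hits))
```

With these sample values the first call reports a periodic client and the second does not. A real log analysis would also group requests by IP address or user agent, which this sketch leaves out.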

Also, as for the data being taken: we are talking about a single number for each “team”, and I will convert that number into our own number.

Solution

I'll quote the answer by Pablo Hoffman (co-founder of Scrapinghub) to “What is the legality of web scraping?”, which I found on another site:

First things first: I am not a lawyer and these comments are solely
based on my experience working at […], please seek legal
assistance accordingly.

Here are a few things to consider when scraping public data from websites (note that the following addresses only US law):

  • As long as they don’t crawl at a disruptive rate, scrapers do not breach any contract (in the form of terms of use) or commit a crime
    (as defined in the Computer Fraud and Abuse Act).
  • A website’s user agreement […] as a browsewrap agreement because companies do not provide sufficient notice of the
    terms to site visitors.
  • Scrapers access website data as a visitor,
    and by following paths similar to a search engine. This can be done
    without registering as a user (and explicitly accepting any terms).
  • In Nguyen v. Barnes & Noble, Inc. the courts […] that simply placing a
    link to a terms of use at the bottom of a webpage is not sufficient to
    “give rise to constructive notice.” In other words, there is nothing
    on a public page that would imply that merely accessing the
    information is subject to any contractual terms. Scrapers give
    neither explicit nor implicit assent to any agreement, and therefore
    breach no contract.
  • Social networks, for example, assign the value of becoming a user (based on the call-to-action on the public page) as the ability to: i) gain access to full profiles, ii) identify common friends/connections, iii) get introduced to others, and iv) contact members directly. As long as scrapers make no attempt to perform any of these actions, they do not gain “unauthorized access” to their services and thus do not violate […]
  • A thorough evaluation of the legal issues involved can be seen here: […]
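The quoted answer leaves “a disruptive rate” undefined. On the purely technical side, a scraper can at least check the site's robots.txt and enforce a minimum gap between its own requests. The sketch below uses Python's standard `urllib.robotparser`. The names `allowed_by_robots`, `Throttle`, and `stats-bot`, the example.com URLs, and the sample robots.txt are hypothetical, and the 900-second interval is simply the 15 minutes mentioned in the question. Nothing here settles whether scraping a given site is legal.

```python
import time
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against the text of a robots.txt file (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class Throttle:
    """Track how long to wait so requests are at least min_interval_seconds apart."""

    def __init__(self, min_interval_seconds, clock=time.monotonic):
        self.min_interval = min_interval_seconds
        self.clock = clock  # injectable so the logic can be exercised without sleeping
        self.last_request = None

    def seconds_to_wait(self):
        if self.last_request is None:
            return 0.0  # nothing sent yet, no need to wait
        elapsed = self.clock() - self.last_request
        return max(0.0, self.min_interval - elapsed)

    def mark_request(self):
        self.last_request = self.clock()


# Made-up robots.txt content for illustration.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(allowed_by_robots(EXAMPLE_ROBOTS, "stats-bot", "https://example.com/teams"))
print(allowed_by_robots(EXAMPLE_ROBOTS, "stats-bot", "https://example.com/private/x"))

throttle = Throttle(min_interval_seconds=900)  # 15 minutes, as in the question
print(throttle.seconds_to_wait())
```

In a real scraper, the loop would call `time.sleep(throttle.seconds_to_wait())` before each request and `throttle.mark_request()` right after sending it.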
