Bitcoin Forum

Other => Meta => Topic started by: oikoskev on April 16, 2020, 03:12:42 AM



Title: Scraping policy?
Post by: oikoskev on April 16, 2020, 03:12:42 AM
Hi all, I made a small script that polls bitcointalk for new posts (every 10 seconds) and sends me an alert on Discord when a new post matches a keyword. I was wondering if this was an acceptable use of the site? I couldn't find any rules or guidelines related to automation/bots/scraping/etc.

Thx! Kev
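For illustration, here is a minimal Python sketch of the keyword-match and Discord-alert steps Kev describes. It leaves out fetching and parsing the forum pages. The keyword list, the message format and the webhook URL you pass in are hypothetical. It assumes a standard Discord webhook, which accepts a JSON body with a `content` field of up to 2000 characters. Keywords are assumed to be plain words, because the whole-word regex boundaries behave differently around symbols such as `+`.

```python
import json
import re
import urllib.request
from typing import List

# Hypothetical keyword list; replace with your own terms.
KEYWORDS = ["scraping", "api", "bot"]


def matching_keywords(text: str, keywords: List[str]) -> List[str]:
    """Return the keywords that appear in text as whole words, ignoring case."""
    return [kw for kw in keywords
            if re.search(r"\b" + re.escape(kw) + r"\b", text, re.IGNORECASE)]


def build_discord_payload(title: str, url: str, hits: List[str]) -> dict:
    """Build a Discord webhook JSON body; 'content' is capped at 2000 characters."""
    content = f"Keyword match ({', '.join(hits)}): {title}\n{url}"
    return {"content": content[:2000]}


def send_discord_alert(webhook_url: str, payload: dict) -> int:
    """POST the payload to a Discord webhook URL and return the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Discord may reject urllib's default User-Agent, so set one explicitly.
            "User-Agent": "keyword-alert-sketch/0.1",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Whole-word matching keeps a keyword like "bot" from firing on every post that mentions "robotics".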


Title: Re: Scraping policy?
Post by: Solosanz on April 16, 2020, 03:21:43 AM
I think it's allowed as long as it doesn't exploit a bug or threaten this forum in any way:

Quote
28. Exploiting bugs or flaws (even if the result is harmless) in the forum's software is not allowed


Title: Re: Scraping policy?
Post by: TryNinja on April 16, 2020, 03:21:52 AM
There is nothing wrong with doing this. There are at least 2 big projects that scrape pretty much every aspect of the forum and make the data public (trust, posts, merit, users, etc...). And they are both maintained by trusted members.

http://loyce.club/
http://bpip.org/

Just respect the one-request-per-second limit and the forum rules, and you will be fine.

Quote
The rules are the same as for humans. But keep in mind:
- No one is allowed to access the site more often than once per second on average. (Somewhat higher burst accesses are OK.)
- Every post must be on-topic. Any bot response to a topic is almost certainly off-topic. Changetip's behavior of responding to user commands publicly would not be allowed, for example.
- If someone complains about an unsolicited PM you send them, then you're probably going to be banned.

Quote
Those IPs are not blocked currently. But your other abusive IPs were blocked. Just your quotefast requests (which are only part of what your crawler does) were occurring at an average frequency of 7.6 requests per second in the most recent access logging period. Your requests constituted 3.4% of all forum requests in this period. This is entirely unacceptable and of course resulted in those IPs being banned.

The maximum allowed bot request frequency is 1 request per second. Those IPs are now accessing pages at an average of 2.5 requests per second combined. If you continue exceeding the allowed request limit, we will continue banning your IPs.
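The "once per second on average, somewhat higher bursts OK" rule above maps naturally onto a token bucket. Here is a minimal Python sketch; the forum does not publish this code. `rate=1.0` matches the stated one-request-per-second average. The burst size `capacity=3.0` is an assumption, since the rule only says "somewhat higher" bursts are OK.

```python
import time


class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second up to `capacity`.

    With rate=1.0 the long-run average stays at or below 1 request per second,
    while `capacity` (an assumed burst size) allows a short burst after idling.
    """

    def __init__(self, rate=1.0, capacity=3.0, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self):
        """Take one token if available; return True if a request may be sent now."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def wait_time(self):
        """Seconds until the next token is available (0.0 if one is ready)."""
        self._refill()
        return max(0.0, (1.0 - self.tokens) / self.rate)

    def acquire(self, sleep=time.sleep):
        """Block until a token is available, then consume it."""
        while not self.try_acquire():
            sleep(self.wait_time())
```

The admin counted requests "combined" across IPs. So every fetch the crawler makes, on any IP and of any kind (quotefast included), should go through one shared bucket rather than one bucket per process. At Kev's 10-second poll interval the script averages 0.1 requests per second. It only needs a limiter like this if it starts fetching extra pages.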


Title: Re: Scraping policy?
Post by: Vod on April 16, 2020, 10:14:05 AM
Quote
Hi all, I made a small script that polls bitcointalk for new posts (every 10 seconds) and sends me an alert on Discord when a new post matches a keyword. I was wondering if this was an acceptable use of the site? I couldn't find any rules or guidelines related to automation/bots/scraping/etc.

Thx! Kev

You'll find ten seconds is too fast.  Most of the time there will be no new posts, except for a couple of hours a day when there might be 11 posts in 10 seconds.  Instead, set it to a minute or so.  If all ten entries on the first page are new, parse the second page, and so on, up to page 10.  The forum is unlikely to get more than a hundred posts in a minute.

Your ISP will thank you.  :)
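Vod's suggestion can be sketched in Python as follows. `fetch_page` is a hypothetical function you would write yourself. It downloads page *n* of the forum's recent-posts listing and returns that page's post IDs, newest first. The sketch assumes post IDs only increase over time and that a page holds ten posts, as Vod describes. It keeps paging only while a whole page is new. It also pauses one second between extra pages, to stay within the one-request-per-second average.

```python
import time
from typing import Callable, List, Optional

MAX_PAGES = 10  # Vod's cap: about 100 posts per minute at 10 posts per page


def collect_new_posts(
    fetch_page: Callable[[int], List[int]],
    last_seen_id: Optional[int],
    max_pages: int = MAX_PAGES,
    pause: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Return IDs of posts newer than last_seen_id, newest first.

    fetch_page(n) is hypothetical: it returns the post IDs on page n (1-based)
    of a newest-first recent-posts listing. Assumes post IDs only increase.
    Stops at the first already-seen post, an empty page, or max_pages.
    """
    if last_seen_id is None:
        max_pages = 1  # first run: read only page 1 instead of walking back 10 pages
    new_ids: List[int] = []
    for page in range(1, max_pages + 1):
        if page > 1:
            sleep(pause)  # space out extra page fetches (1 request/second)
        ids = fetch_page(page)
        if not ids:
            break
        for post_id in ids:
            if last_seen_id is not None and post_id <= last_seen_id:
                return new_ids  # reached posts we already handled
            new_ids.append(post_id)
    return new_ids


def poll(fetch_page, handle, interval: float = 60.0, rounds: Optional[int] = None) -> None:
    """Check for new posts every `interval` seconds and pass each ID to handle(),
    oldest first. Runs forever unless `rounds` is given."""
    last_seen: Optional[int] = None
    done = 0
    while rounds is None or done < rounds:
        ids = collect_new_posts(fetch_page, last_seen)
        if ids:
            last_seen = ids[0]  # newest ID seen so far
            for post_id in reversed(ids):
                handle(post_id)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

Polling every 60 seconds with at most 10 pages per check caps the script at ten requests a minute, even in the busiest case.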


Title: Re: Scraping policy?
Post by: UserU on April 16, 2020, 11:51:19 AM
Quote
Your ISP will thank you.  :)

It should be the other way round; he paid for the service, man :D