Topic: Bitcointalk Search Project
Vod
July 28, 2024, 06:28:13 AM
 #41

Quote
I haven't really done such a thing before. But like I said, I have a few IP addresses, so I guess I'll see how that goes.

You are still thinking of ONE parser going out pretending to be another parser.  You are fighting against every fraud detection tool out there.  

Create a schedule table in your database, with columns jobid, lockid, lastjob and parsedelay. When your parser grabs a job, it locks it in the table so the next parser will grab a different one, and it releases the lock when it finishes. Each parser takes the first record in the schedule, ordered by (lastjob + parsedelay), where the lock is free.
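
[Editor's note: a minimal sketch of that schedule table in Python with sqlite3. Only the four column names (jobid, lockid, lastjob, parsedelay) come from the post above; the table name and the claim/release queries are illustrative assumptions, not Vod's actual code.]

Code:
import sqlite3
import time

conn = sqlite3.connect("scheduler.db", isolation_level=None)  # autocommit

# Assumed layout; only the four column names come from the post above.
conn.execute("""
    CREATE TABLE IF NOT EXISTS schedule (
        jobid      INTEGER PRIMARY KEY,
        lockid     TEXT,               -- NULL while the job is free
        lastjob    INTEGER DEFAULT 0,  -- unix time the job last ran
        parsedelay INTEGER DEFAULT 60  -- seconds to wait between runs
    )""")

def claim_job(parser_id):
    """Grab the most overdue unlocked job; return its jobid or None."""
    now = int(time.time())
    row = conn.execute(
        """SELECT jobid FROM schedule
            WHERE lockid IS NULL AND lastjob + parsedelay <= ?
            ORDER BY lastjob + parsedelay LIMIT 1""", (now,)).fetchone()
    if row is None:
        return None
    # Claim only if still unlocked; rowcount tells us whether we won the race.
    cur = conn.execute(
        "UPDATE schedule SET lockid = ? WHERE jobid = ? AND lockid IS NULL",
        (parser_id, row[0]))
    return row[0] if cur.rowcount == 1 else None

def release_job(jobid):
    """Stamp the run time and free the lock for the next parser."""
    conn.execute(
        "UPDATE schedule SET lockid = NULL, lastjob = ? WHERE jobid = ?",
        (int(time.time()), jobid))

[With parsers on separate machines you would point this at a shared database server rather than a local SQLite file; the locking pattern stays the same.]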

Edit:  Then go to one of the cloud providers and use a free service to create a second parser.

LoyceV
August 06, 2024, 03:14:09 PM
 #42

Quote
My scraper was broken by Cloudflare after about 58K posts or so.
If you ask nicely, maybe theymos can whitelist your server IP in Cloudflare. That solved my download problems when Cloudflare went into full DDoS protection mode.

Quote
I do however have LoyceV's archive (thanks Loyce), but I am not sure whether it covers posts before 2018.
It's in the "oldposts" directory :)

Quote
Why don't you implement a record locking system into your parser, so you can have multiple parsers running at once from various IPs?
The rate limit is supposed to be per person, not per server. You shouldn't use multiple scrapers to get around the limit (1 connection per second).
The rules are the same as for humans. But keep in mind:
- No one is allowed to access the site more often than once per second on average. (Somewhat higher burst accesses are OK.)
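
[Editor's note: a sketch of a limiter that honours that rule in Python, keeping the long-run average at one request per second while allowing short bursts (a token bucket). The rate and the idea of burst tolerance follow the quoted rule; the class itself and the burst size of 5 are illustrative assumptions.]

Code:
import time

class TokenBucket:
    """Allow short bursts while keeping the long-run average at `rate`."""

    def __init__(self, rate=1.0, burst=5):
        self.rate = rate            # tokens refilled per second (avg req/s)
        self.capacity = burst       # largest burst allowed (assumed value)
        self.tokens = float(burst)
        self.last = time.monotonic()

    def acquire(self):
        """Block until one token is available, then spend it."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)

limiter = TokenBucket()
# limiter.acquire()  # call before every page fetch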

Vod
August 06, 2024, 10:52:46 PM
 #43

Quote
Why don't you implement a record locking system into your parser, so you can have multiple parsers running at once from various IPs?
The rate limit is supposed to be per person, not per server. You shouldn't use multiple scrapers to get around the limit (1 connection per second).

I use multiple parsers for backup - if one goes down for whatever reason, a second one can take over. 90% of the time my parsers have nothing to do, since I'm not parsing every profile like I did with BPIP. I poll once every ten seconds to check for new posts, and if there are any, I parse them. My record locking system has a parse delay on many things to prevent it from hitting bct too often. I don't even parse as a logged-in user.
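
[Editor's note: a sketch of that ten-second polling loop in Python. check_for_new_posts and parse_post are hypothetical stubs standing in for the actual scraping code, which the post does not show.]

Code:
import time

POLL_INTERVAL = 10  # seconds between checks, as described above

def check_for_new_posts():
    """Hypothetical stub: return IDs of posts newer than the last one seen."""
    return []

def parse_post(post_id):
    """Hypothetical stub: fetch and store a single post."""
    pass

while True:
    started = time.monotonic()
    for post_id in check_for_new_posts():
        parse_post(post_id)
    # Sleep out the rest of the interval so checks stay ~10 seconds apart.
    time.sleep(max(0.0, POLL_INTERVAL - (time.monotonic() - started)))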
