Bitcoin Forum

Bitcoin => Project Development => Topic started by: ~DefaultTrust on January 16, 2020, 11:19:24 AM



Title: [ANN] All bitcointalk users table project
Post by: ~DefaultTrust on January 16, 2020, 11:19:24 AM
I am working on a script that will collect all bitcointalk user profiles. This is a simple parser with a simple frontend. The project is temporarily hosted on this domain: http://forbtt.tk

For now, the data parsed so far is ready to use. You can test it by sorting users by id, name, posts, etc. In addition, you can use some filters like Minimum posts and Minimum merits. For now a request is limited to 3000 users per page.

I would be grateful for the feedback and suggestions.

https://i.imgur.com/9Y484Zc.jpg




Title: Re: [ANN] All bitcointalk users table project
Post by: AB de Royse777 on January 16, 2020, 11:23:01 AM
Whatever the link was in your post, it has been removed. As for the project, we have LoyceV, DdmrDdmr and some other users who scrape forum data, and they also have the data you have scraped so far. I hope you are aware of the http://loyce.club site.

Anyway, let's see what you bring up with the data you are collecting. Good luck.

Edit: Just found the IP. First impression is good. I like that I can filter by number of posts and merits, and I like the sorting option for the table data.

I would be grateful for the feedback and suggestions.
Give users a way to choose how many rows to see per page. Right now everything is on one page, and once you have a huge amount of data I am sure the page will take ages to load.
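
The filtering, sorting and per-page limit discussed here could be sketched roughly as below. This is only an illustration: the `User` fields, the `query_users` function and its parameters are hypothetical names, and nothing in the thread says how the site actually implements this.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical subset of the profile columns mentioned in the thread.
    id: int
    name: str
    posts: int
    merits: int


def query_users(users, min_posts=0, min_merits=0, sort_key="id", page=1, page_size=100):
    """Filter, sort, then return one page of users plus the total match count."""
    matched = [u for u in users if u.posts >= min_posts and u.merits >= min_merits]
    matched.sort(key=lambda u: getattr(u, sort_key))
    start = (page - 1) * page_size  # pages are 1-indexed
    return matched[start:start + page_size], len(matched)
```

Returning the total match count alongside the page lets a frontend show how many pages exist without loading every row at once.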


Title: Re: [ANN] All bitcointalk users table project
Post by: ~DefaultTrust on January 16, 2020, 11:26:55 AM
Whatever the link was in your post, it has been removed. As for the project, we have LoyceV, DdmrDdmr and some other users who scrape forum data, and they also have the data you have scraped so far. I hope you are aware of the http://loyce.club site.

Anyway, let's see what you bring up with the data you are collecting. Good luck.

Thank you. It seems that the forum does not like links to free .tk domains. forbtt[dot]tk


Title: Re: [ANN] All bitcointalk users table project
Post by: AB de Royse777 on January 16, 2020, 11:36:28 AM
Thank you. It seems that the forum does not like links to free .tk domains. forbtt[dot]tk
Yeah, got that. Great job on the site, and I hope you will start adding features too, with more stuff like trust, flags, etc. Post, Activity, Last Active, Merit, Local time, Website and BTC address are dynamic columns. How are you going to keep them updated? How frequently are you checking each user?

By the way, nice username you have there :-P

Edit:
You really need to PM theymos and request to change the username to something else. From your trust page I see it has already got attention, and obviously I think all those users are correct in their inputs. Your creation seems exciting (on the site) and I really hope you will respect the forum's values too.
PS: I left you 5 merits to show some encouragement for your work.


Title: Re: [ANN] All bitcointalk users table project
Post by: LoyceV on January 16, 2020, 11:42:54 AM
It seems that the forum does not like links to free .tk domains. forbtt[dot]tk
That's spam protection for Newbies. Dot.tk links are fine:
All 8 domains and 3 updates per hour are working again:
IsLaudaStillOnDT.tk (http://islaudastillondt.tk/)
IsTimelord2067onDTyet.tk (http://istimelord2067ondtyet.tk)
IsOgNastyStillOnDT.tk (http://isognastystillondt.tk)
IsUnibitcoinistOnDTyet.tk (http://isunibitcoinistondtyet.tk/)
IsTECSHAREstillOnDT.tk (http://istecsharestillondt.tk/)
IsFoxpupStillAVixen.tk (http://isfoxpupstillavixen.tk/)
IsHhampuzStillOnDT.tk (http://ishhampuzstillondt.tk/)
DidTMANsayAbadWord.tk (http://didtmansayabadword.tk/)

You really need to PM theymos and request to change the username to something else. From your trust page I see it has already got attention, and obviously I think all those users are correct in their inputs.
This is a troll account (some say it's owned by banned user korner (https://bitcointalk.org/index.php?topic=5105163.msg53579061#msg53579061)), created right after theymos DefaultTrust (https://bitcointalk.org/index.php?topic=5095156.0).


Title: Re: [ANN] All bitcointalk users table project
Post by: ~DefaultTrust on January 16, 2020, 11:43:15 AM
Thank you. It seems that the forum does not like links to free .tk domains. forbtt[dot]tk
Yeah, got that. Great job on the site, and I hope you will start adding features too, with more stuff like trust, flags, etc. Post, Activity, Last Active, Merit, Local time, Website and BTC address are dynamic columns. How are you going to keep them updated? How frequently are you checking each user?


I started it only 16 hours ago and it has parsed 55000 users so far. So I think that parsing all 3000000 users will take about 870 hours (roughly one month).
After that I will parse again from the beginning, skipping deleted users. That should be faster.
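
The estimate above is a straight-line extrapolation from the progress so far. A minimal sketch of that arithmetic, assuming the parse rate stays constant (the function name is made up for illustration):

```python
def estimate_total_hours(parsed_users: int, elapsed_hours: float, total_users: int) -> float:
    """Extrapolate the total run time from progress so far, assuming a constant rate."""
    rate_per_hour = parsed_users / elapsed_hours  # 55000 / 16 = 3437.5 users per hour
    return total_users / rate_per_hour


hours = estimate_total_hours(55_000, 16, 3_000_000)
days = hours / 24
print(f"{hours:.0f} hours, about {days:.0f} days")  # prints "873 hours, about 36 days"
```

That is consistent with the figure of about 870 hours quoted in the post.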


Title: Re: [ANN] All bitcointalk users table project
Post by: DdmrDdmr on January 17, 2020, 11:05:33 AM
If I recall correctly, the last full profile DB published by a forum member was @piggy’s Open scraped data of all the users - SQL Lite DB - 2.481.270 users (https://bitcointalk.org/index.php?topic=5066192.msg47728970#msg47728970).

That was published over a year ago now, and at the time it took him just over 5 days to obtain a full DB dump (different IPs running parallel processes). It seemed to take up quite some personal time, and was thus done only a couple of times. The good thing about getting the data in those five days is that the dataset will have more inner time/value related consistency than performing it over a longer period of time (such as a month - which is what it would probably take me too).


Title: Re: [ANN] All bitcointalk users table project
Post by: ~DefaultTrust on January 17, 2020, 11:22:40 AM
If I recall correctly, the last full profile DB published by a forum member was @piggy’s Open scraped data of all the users - SQL Lite DB - 2.481.270 users (https://bitcointalk.org/index.php?topic=5066192.msg47728970#msg47728970).

That was published over a year ago now, and at the time it took him just over 5 days to obtain a full DB dump (different IPs running parallel processes). It seemed to take up quite some personal time, and was thus done only a couple of times. The good thing about getting the data in those five days is that the dataset will have more inner time/value related consistency than performing it over a longer period of time (such as a month - which is what it would probably take me too).

I don’t understand how he managed to circumvent the CloudFlare defense.

I am parsing with one process and one IP address, with no parallel requests. But even at such a low speed, my bot was banned twice yesterday by CloudFlare. It took a long time to solve the problem. I am not sure I solved it completely.


Title: Re: [ANN] All bitcointalk users table project
Post by: TryNinja on January 17, 2020, 11:28:44 AM
When I make a search with the reg. date set to the range 1 January 2014 - 17 January 2020, I get just an “error”. I was trying to find myself. Is this because you haven’t parsed users from that date and forward?


Title: Re: [ANN] All bitcointalk users table project
Post by: ~DefaultTrust on January 17, 2020, 11:32:36 AM
When I make a search with the reg. date set to the range 1 January 2014 - 17 January 2020, I get just an “error”. I was trying to find myself. Is this because you haven’t parsed users from that date and forward?

Yes. The parser is still running right now and has only parsed users registered up to April 2013.


Title: Re: [ANN] All bitcointalk users table project
Post by: DdmrDdmr on January 17, 2020, 11:59:44 AM
<…>
Not sure how he did it either (I think you can achieve it with multiple VMs, but I have not done it myself).
I assume you’ve set your scraper script to intervals no shorter than 1 second between queries.
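
One way to enforce a minimum interval between queries is a small throttle object that sleeps only as long as needed. The sketch below is an assumption about how such a limit could be coded, not the script used in this thread; the `Throttle` class is a hypothetical name, and the clock and sleep functions are injectable only so the logic can be tested without real delays.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        """Sleep just long enough that calls are at least min_interval apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A scraper would create `Throttle(min_interval=1.0)` once and call `wait()` right before each profile request. Using `time.monotonic` avoids surprises if the system clock is adjusted during a long run.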


Title: Re: [ANN] All bitcointalk users table project
Post by: ~DefaultTrust on January 17, 2020, 12:07:40 PM
I will probably try parsing from multiple IP addresses.


Title: Re: [ANN] All bitcointalk users table project
Post by: ~DefaultTrust on February 01, 2020, 03:47:31 PM
Ready to use! All profiles are parsed. Added a filter for banned users and some stats.