Bitcoin Forum
Topic: Raw data of bitcointalk forum - how to avoid or correctly scrape information
JeremyB (OP)
March 18, 2018, 08:33:25 AM
Last edit: March 18, 2018, 10:17:19 AM by JeremyB
 #1

I recently discovered some posts about the Merit system that are based on a raw data file provided by theymos:

Here you go: https://bitcointalk.org/merit.txt.xz

Similar to trust.txt.xz, it'll be updated weekly. It will show only the last 120 days of data; someone else should archive the old ones if you want them.
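
For anyone who wants to look inside such a dump after downloading it, here is a minimal sketch in Python. The quote above does not describe the column layout, so this assumes the file is plain UTF-8 text with tab-separated fields; `read_xz_rows` is a name made up for this example and not something the forum provides.

```python
import csv
import lzma


def read_xz_rows(source, delimiter="\t"):
    """Yield each line of an xz-compressed text file as a list of fields.

    `source` is a path or a binary file object. The tab delimiter is an
    assumption about the dump format, not something stated in the thread.
    """
    with lzma.open(source, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary data, split on the delimiter only.
        yield from csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
```

Calling `read_xz_rows("merit.txt.xz")` on a locally saved copy and printing the first few rows should be enough to check whether the tab-separated assumption holds before building any stats on top of it.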

Then I found another post, from LoyceV, that points to a username/ID mapping (on pastebin), but I don't know who provided it, how it was built, or when it was last updated.

For some time I have been thinking about retrieving information from the forum to build some stats, but that would involve scraping a lot of pages.

So my first question is: is there a list of the raw data that is available to use? I have heard about the trust data (https://bitcointalk.org/trust.txt.xz), but what I am really looking for relates to the forum structure: thread parent/children, the parent thread of each message, message author, user ID/name mapping, etc.

Second question: if this kind of data is not available, what is the policy on scraping the forum?

EDIT: I just found theymos's thread about the new data dumps here: https://bitcointalk.org/index.php?topic=3151741.0
So I have locked this thread.