Bitcoin Forum
Topic: Bitcoin Forum data extraction (for research purposes): information request
NiDe (OP)
Newbie
November 15, 2015, 02:36:32 AM
 #1

Hopefully the subject of this thread is not too cryptic :). Anyway, in what follows I will try to explain why I would like to obtain a copy of the Bitcoin Forum and how (in case you are interested) you could help me (by letting me pick your brains).

Here is the story: (you can skip this and the next paragraph in case you do not care about the story)
I am a university student working on a big (well, big for a student :D) project: it involves replicating existing research and seeing whether we reach similar conclusions (or maybe not), making improvements wherever possible. We are free to do whatever we want, as long as it is related to our field of study. Obviously I wanted to do something with Bitcoin (or cryptocurrencies in general). Given that my degree sits at the intersection of finance and computer science, it had to be finance related as well.

You might be aware that quite a lot of research has tried to predict the Bitcoin price from all kinds of data sources, often relating it to the efficient-market hypothesis (EMH). I am trying to do something similar. More specifically, I am trying to replicate some research involving sentiment analysis and apply it to Bitcoin. I also want to improve on the data sources that have been used (so far not many, in my humble opinion), e.g. by including the sentiment of Chinese news sources. Another thing I have noticed is that I have never come across such a paper that included Bitcoin Forum. I might be mistaken, but it seems to me that if someone knows what will happen with Bitcoin (and its price), (s)he is probably active here.

In conclusion: I would like to perform sentiment analysis on (among others) this forum's threads. The idea of this post is to get some input (from admins, moderators or whoever has some affinity with the topic) on how to do so. Namely, I would need a copy of all publicly available posts (or maybe of one specific board) so that I can process them easily. I know there are different ways to try to achieve this:
  • wget the entire website
  • use existing tools for scraping phpbb forums
  • code a scraper myself
  • ask very nicely to get a database backup

While I could probably try all of them and figure out what works best (or what does not work), I would first like to ask for your input before I waste a lot of time, bandwidth and the admins' patience :).

So input on how to do this technically, legally (e.g. what I can and cannot do), socially (e.g. which of the admins I should ask)... is very welcome. Also, if you want to discuss the project itself, please feel free (not sure if this violates the 'no off-topic replies' rule).

Thanks!
achow101
Staff
Legendary
November 15, 2015, 03:21:38 AM
 #2

If you want to scrape the forum, it will take an extremely long time. The forum has a limit of 1 request per second per IP as DDoS protection. If you exceed that limit, your IP will be banned for a few minutes. So if you intend to scrape the forum, I advise that you stay within that limit. You can semi-easily get all of the pages of a topic by appending
Code:
;all
to the URL of the topic, but for some topics the resulting download will be massive.
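
A minimal Python sketch of the two points above: staying at or below one request per second, and appending ;all to a topic URL. The names `Throttle`, `all_pages_url` and `fetch` are made up for this illustration, and the topic URL in the usage note is a placeholder rather than a real thread. How the forum actually signals a ban is not covered here.

```python
import time
import urllib.request


class Throttle:
    """Block so that successive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request; None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def all_pages_url(topic_url):
    """Append ';all' to a topic URL to ask for every page of the topic at once."""
    return topic_url + ";all"


def fetch(url, throttle):
    """Wait for the throttle, then download the page and return its raw bytes."""
    throttle.wait()
    with urllib.request.urlopen(url) as response:
        return response.read()
```

Usage would look roughly like `throttle = Throttle(1.0)` followed by `fetch(all_pages_url("https://bitcointalk.org/index.php?topic=12345.0"), throttle)` for each topic, reusing the same `throttle` object so the limit applies across all requests. Picking an interval slightly above one second leaves some margin.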

dogie
Legendary
November 15, 2015, 03:27:50 AM
 #3

;all is only available for threads of up to around 20 pages, but it is still very useful. Maybe you could find the data source that the mimic sites are using?

botany
Legendary
November 15, 2015, 04:23:20 AM
 #4

Quote from: dogie on November 15, 2015, 03:27:50 AM
;all is only available for threads of up to around 20 pages, but it is still very useful. Maybe you could find the data source that the mimic sites are using?

You can append
Code:
;action=printpage
to the URL of the topic. This works for longer threads and might be helpful if ;all doesn't work.
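
As a rough Python sketch, a scraper could pick between the two suffixes based on thread length. The function name `whole_topic_url` is hypothetical, and the 20-page cut-off is only an assumption taken from the "around 20 pages" remark earlier in the thread, not a documented limit.

```python
# Assumed cut-off, based on the "around 20 pages" remark in this thread.
ALL_PAGES_LIMIT = 20


def whole_topic_url(topic_url, page_count, limit=ALL_PAGES_LIMIT):
    """Return a URL that should show the whole topic on a single page.

    Short topics use ';all'; longer ones fall back to the print view.
    """
    suffix = ";all" if page_count <= limit else ";action=printpage"
    return topic_url + suffix


# Placeholder topic URL, for illustration only.
example = "https://bitcointalk.org/index.php?topic=12345.0"
short_url = whole_topic_url(example, page_count=3)
long_url = whole_topic_url(example, page_count=150)
```

Since the exact cut-off is uncertain, it may be safer to lower `limit`, or to use the print view everywhere if its output contains everything the analysis needs.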
dogie
Legendary
November 15, 2015, 04:56:34 AM
 #5

Quote from: botany on November 15, 2015, 04:23:20 AM
You can append
Code:
;action=printpage
to the URL of the topic. This can be used for longer threads and might be helpful if ;all doesn't work.

Oh wow, that's nice. Good for archiving threads, although it does snip the post numbers off.

NiDe (OP)
Newbie
November 15, 2015, 05:53:37 AM
 #6

Many thanks for your replies! I think you are all suggesting that I code something myself (I am thinking of some quick and dirty Python).

So let's assume I do so and want to limit myself to one or maybe a couple of boards: which would you suggest?

Personally I think the Speculation board (under Economics) could potentially be very useful for sentiment analysis.
shorena
Copper Member
Legendary
November 15, 2015, 12:10:45 PM
 #7

Quote from: NiDe on November 15, 2015, 05:53:37 AM
Many thanks for your replies! I think you are all suggesting that I code something myself (I am thinking of some quick and dirty Python).

So let's assume I do so and want to limit myself to one or maybe a couple of boards: which would you suggest?

Personally I think the Speculation board (under Economics) could potentially be very useful for sentiment analysis.

Speculation is where the price-related topics should be, yes. Keep in mind that you should only query the board once per second or slower, otherwise you might get your account and/or IP banned.

Cyrus
Ninja
Administrator
Legendary
November 15, 2015, 01:51:01 PM
 #8

Quote from: NiDe on November 15, 2015, 02:36:32 AM
  • use existing tools for scraping [s]phpbb[/s] SMF forums

Hope you haven't started looking for phpbb tools, as this is an SMF forum (for now anyways).
