Bitcoin Forum
Author Topic: Thinking of doing my master's thesis (in statistics) on Bitcoin.  (Read 1355 times)
hello_good_sir (OP)
Hero Member
Activity: 1008
Merit: 531
September 05, 2012, 03:58:37 AM
 #1

I need an interesting problem, but more importantly I need data.  Relevant citable sources are somewhat important.

Last year I did a couple of semester projects on various topics and I kept hitting the same problem.  I would pick something really interesting and then be unable to find data (or people who said that they would give it to me didn't come through), or I would pick something original and nothing had been written about it by researchers (and thus I wasn't able to cite anything).  So then I would switch topics and I was already behind the rest of the class.  I don't want to be in this situation so I want to make sure that I have the data before I commit to a topic.

First of all, about me: I am a grad student in statistics. I am not actually that good at statistics but I am good enough. I know how to program but I am more familiar with old-timey stuff (C++, x86) and the theory (Turing machines) than web programming. So writing a web crawler is probably out of the question, but once I get the data onto my hard drive I will have no trouble processing it.

So what kind of data can I get? I know that the blockchain is public information but I am guessing that it isn't in a user-friendly format. I know that exchanges have historical information, but I know that bitcoin prices fluctuate and are often influenced by major events (Mt. Gox hacked, pirate40, news coverage). Apparently Silk Road does 2 million in business per month? Do they release this information or are people checking out the block chain? What about miners? What kind of information is available on them?

As for my topic, that would mostly depend on what kind of data I can get my hands on. I am thinking that it might be interesting to try to categorize addresses by their transaction behavior. Remember that this has to be on statistics, so I have to study this from a data-centric perspective. Talking about the protocol or the economic model using logic isn't an option; I have to focus on data.


If you can help me I would appreciate it, and maybe interest a few people in bitcoin.  Thanks.

FreeMoney
Legendary
Activity: 1246
Merit: 1014
Strength in numbers
September 05, 2012, 04:20:04 AM
 #2

There is a buttload of data concerning satoshi dice.

Play Bitcoin Poker at sealswithclubs.eu. We're active and open to everyone.
kiba
Legendary
Activity: 980
Merit: 1014
September 05, 2012, 04:31:36 AM
 #3

How about you collaborate with http://bitmit.net and other platform services and quantify the economy of bitcoin?

cbeast
Donator
Legendary
Activity: 1736
Merit: 1006
Let's talk governance, lipstick, and pigs.
September 05, 2012, 05:05:47 AM
 #4

How about trying to figure out a statistical algorithm that could analyze user input of Bitcoin price (in various currencies) at the point of sale or trade? Let's just say for argument's sake that the major exchanges no longer have bank access or get hacked or shut down and individuals had to form their own exchanges. How would they price Bitcoin? Users would need to report their Bitcoin price to a hypothetical decentralized network or the Bitcoin Network itself that is (hypothetically) modified for this function. You would need to eliminate HFT and obvious price manipulation. Since there isn't such data available, you could use data from different exchanges and compare the ones with large orders and manipulation vs. smaller exchanges.
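One very rough way to picture the "eliminate obvious price manipulation" step is a median-based filter over user-reported prices. This is only a sketch of one possible approach, not a proposal from the post: the function name `robust_price`, the `max_dev` cutoff, and the sample numbers are all made up for illustration.

```python
from statistics import median

def robust_price(reports, max_dev=3.0):
    """Median of user-reported prices after dropping far-out reports.

    A report is dropped when it lies more than max_dev median absolute
    deviations (MAD) away from the median of all reports.
    """
    mid = median(reports)
    mad = median(abs(p - mid) for p in reports)
    if mad == 0:
        # All reports (or most of them) agree exactly; nothing to filter.
        return mid
    kept = [p for p in reports if abs(p - mid) / mad <= max_dev]
    return median(kept)

# Made-up reports: four plausible prices and one obvious outlier.
reports = [10.0, 10.2, 9.9, 10.1, 50.0]
price = robust_price(reports)
```

The median is used instead of the mean because a single extreme report can drag a mean arbitrarily far, while it barely moves a median.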

This would not be an easy test, but it may prove useful to Bitcoin.

Any significantly advanced cryptocurrency is indistinguishable from Ponzi Tulips.
hello_good_sir (OP)
Hero Member
Activity: 1008
Merit: 531
September 05, 2012, 06:04:37 AM
 #5

FreeMoney, thanks for the tip. Satoshi's Dice seems to post every bet and the results, and there are a lot of bets. If I were to do something on this I would be doing a project on betting. I would need to try to find patterns in who is betting, how much, when, etc., and whether wins/losses affect future bets. I am not really sure that I want to do something on betting, but this is a contender.
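As a rough sketch of the "do wins/losses affect future bets" question: assuming the bets have already been saved as (bettor, amount, won) tuples in time order, one could compare how much a bettor changes the stake after a win versus after a loss. The tuple layout, the function name, and the sample data are assumptions for illustration, not the site's actual format.

```python
from collections import defaultdict
from statistics import mean

def bet_change_after_outcome(bets):
    """bets: iterable of (bettor, amount, won) tuples in chronological order.

    Returns a dict mapping the outcome of the previous bet (True = win,
    False = loss) to the mean ratio next_amount / previous_amount,
    computed per bettor over consecutive bets.
    """
    last = {}                   # bettor -> (amount, won) of the previous bet
    ratios = defaultdict(list)  # previous outcome -> list of stake ratios
    for bettor, amount, won in bets:
        if bettor in last:
            prev_amount, prev_won = last[bettor]
            ratios[prev_won].append(amount / prev_amount)
        last[bettor] = (amount, won)
    return {outcome: mean(r) for outcome, r in ratios.items()}

# Made-up example data, not real bets.
sample = [
    ("addrA", 1.0, False),
    ("addrB", 0.5, True),
    ("addrA", 2.0, False),  # doubled after a loss
    ("addrB", 0.5, False),  # unchanged after a win
    ("addrA", 4.0, True),   # doubled after a loss
    ("addrA", 1.0, False),  # quartered after a win
]
result = bet_change_after_outcome(sample)
```

A real analysis would need a proper test (and some care about bettors who use many addresses), but a table like this is a cheap first look at whether the effect exists at all.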

Kiba, I have had bad luck with trying to get data (or as they like to call it, trade secrets) from companies before.  My new motto is: if it isn't publicly available, it isn't available.  I suppose I could try to offer some sort of data mining service to maximize their sales through market segmentation.  So maybe we would be conducting experiments to see how to squeeze more money out of people.  My local grocery store just started doing this and now each customer gets unique prices.

Cbeast, I think that what you are describing is... trying to determine prices using non-market methods...but I am not sure.  Statistics is about looking at a lump of data and making mathematically-sound conclusions.  It is more like science than engineering.  So creating a new system to do something wouldn't be acceptable as a project.  Evaluating systems given their historical performance data would though.

Thank you everyone for the ideas so far!

organofcorti
Donator
Legendary
Activity: 2058
Merit: 1007
Poor impulse control.
September 05, 2012, 06:42:06 AM
 #6

There's lots of data available from bitcoin mining pools. I use some of it here: http://organofcorti.blogspot.com  - some of the posts there might give you some ideas.

Other ideas:
1. Analyse the network hashrate for cyclical trends (does a large increase in hashrate return at the same time every day or week?)
2. Find correlations between network hashrate and known external phenomena (heatwaves might lead many to turn GPUs off; countries with high electricity costs probably have fewer miners)
3. Determine if there is a relationship between percentage of blocks orphaned and block size.
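For idea 1, a simple first check is the sample autocorrelation of the hashrate series at lags of 24 hours and 168 hours. The sketch below assumes the hashrate has already been reduced to one estimate per hour; the series here is synthetic, with a daily cycle built in on purpose.

```python
from statistics import mean

def autocorrelation(series, lag):
    """Sample autocorrelation of a list of numbers at the given lag."""
    n = len(series)
    mu = mean(series)
    var = sum((x - mu) ** 2 for x in series)
    cov = sum((series[i] - mu) * (series[i + lag] - mu) for i in range(n - lag))
    return cov / var

# Synthetic two-week hourly series: high for 12 hours, low for 12 hours.
hourly = [10.0 + (1.0 if h % 24 < 12 else -1.0) for h in range(24 * 14)]

r24 = autocorrelation(hourly, 24)  # one full cycle apart: strongly positive
r12 = autocorrelation(hourly, 12)  # half a cycle apart: strongly negative
```

On real data the peaks would be far weaker, since block discovery is random and adds a lot of noise to any hashrate estimate.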

HTH

Bitcoin network and pool analysis 12QxPHEuxDrs7mCyGSx1iVSozTwtquDB3r
follow @oocBlog for new post notifications
szuetam
Sr. Member
Activity: 377
Merit: 253
September 05, 2012, 02:54:16 PM
 #7

I can imagine lots of interesting stats that you could get out of the block chain itself, for example:
1. How long, on average, bitcoins stay at one address before moving to another.
2. How this time correlates with the amount transferred or the amount held at the address.
3. A diagram showing how long already-mined coins have gone without being transferred, and how many such coins there are.
(I'm curious, for example, how many BTC have not moved at all in the last two years.)
4. How average transaction volume depends on the hour of the day.
5. How the average number of transactions depends on it too.
6. Some way to determine or estimate the number of unique users, etc.
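Items 1, 2 and 4 can be sketched in a few lines once the block chain has been boiled down to one record per spent output. The record layout (amount, received timestamp, spent timestamp), the function names, and the sample values below are assumptions for illustration; getting the data into this shape is the hard part and is not shown.

```python
from collections import defaultdict
from datetime import datetime, timezone

def mean_holding_days(outputs):
    """outputs: list of (amount_btc, received_ts, spent_ts), Unix timestamps.

    Returns (plain mean, amount-weighted mean) of the holding time in days.
    """
    days = [(spent - received) / 86400 for _, received, spent in outputs]
    total = sum(amount for amount, _, _ in outputs)
    plain = sum(days) / len(days)
    weighted = sum(a * d for (a, _, _), d in zip(outputs, days)) / total
    return plain, weighted

def volume_by_hour(outputs):
    """Total BTC spent in each UTC hour of the day (0-23)."""
    volume = defaultdict(float)
    for amount, _, spent in outputs:
        hour = datetime.fromtimestamp(spent, tz=timezone.utc).hour
        volume[hour] += amount
    return dict(volume)

# Made-up records, not real block chain data.
sample = [
    (1.0, 0, 86400),                # held 1 day, spent at 00:00 UTC
    (3.0, 0, 3 * 86400),            # held 3 days, spent at 00:00 UTC
    (2.0, 7200, 2 * 86400 + 7200),  # held 2 days, spent at 02:00 UTC
]
plain, weighted = mean_holding_days(sample)
by_hour = volume_by_hour(sample)
```

Comparing the plain and the amount-weighted mean is a quick way to see whether large amounts sit longer than small ones. Coins that have never moved (item 3) have no spent timestamp, so they would need separate handling.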


I think that today statistics need to be visualized properly to be usable, like here for example:
http://www.ted.com/talks/lang/pl/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html
The average user won't spend time trying to understand them without visualization.
hello_good_sir (OP)
Hero Member
Activity: 1008
Merit: 531
September 05, 2012, 10:30:07 PM
 #8

Organofcorti,  that's a cool and well-done blog.  Where do you get the information?

Szuetam, those are really good questions.  I'll look into the blockchain a bit this weekend and see how feasible they are.

organofcorti
Donator
Legendary
Activity: 2058
Merit: 1007
Poor impulse control.
September 05, 2012, 10:51:28 PM
 #9

Quote from: hello_good_sir
Organofcorti, that's a cool and well-done blog. Where do you get the information?

Well, hello_good_sir and thank you.

All data is self published by pools, either as json, csv or html tables. I use R's webscraping tools, JSON converters and curl to get the data.
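For anyone who would rather not use R, the same idea (turn each pool's JSON or CSV into one common list of records) can be sketched with the Python standard library. This is not the script used for the blog; the field names `pool` and `blocks` and the sample text are made up, and the download step is left out.

```python
import csv
import io
import json

def rows_from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)

def rows_from_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def total_blocks(rows):
    """Sum the 'blocks' field; int() handles CSV strings and JSON numbers."""
    return sum(int(row["blocks"]) for row in rows)

# Made-up sample payloads in the two formats.
json_text = '[{"pool": "poolA", "blocks": 7}, {"pool": "poolB", "blocks": 5}]'
csv_text = "pool,blocks\npoolA,7\npoolB,5\n"

json_total = total_blocks(rows_from_json(json_text))
csv_total = total_blocks(rows_from_csv(csv_text))
```

Once every source ends up as the same list-of-dicts shape, the analysis code does not need to care which pool published in which format.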

Pools that don't publish data as csv or json are a bit of a pain - every weekend when I come to post the weekly average data, one of them changes something on their data page and I spend too much time rewriting and testing the script.

If you're concerned that self-published data from pools could be faked, you might find posts 10, 11, and 12 at howtohop.blogspot.com and posts 1 and 1.5 at organofcorti.blogspot.com interesting.

https://bitcointalk.org/index.php?topic=66026.msg769530#msg769530


Bitcoin network and pool analysis 12QxPHEuxDrs7mCyGSx1iVSozTwtquDB3r
follow @oocBlog for new post notifications
teknohog
Sr. Member
Activity: 519
Merit: 252
555
September 06, 2012, 06:11:40 PM
 #10

Quote from: organofcorti
1. Analyse the network hashrate for cyclical trends (is a large increase in hashrate returning the same time every day / week ?)
2. Find correlations between network hashrate and known external phenomena (heatwaves might lead many to turn GPUs off, countries with high electricity costs probably have fewer miners)

Seconded. Whenever I look at the graphs by Sipa, I cannot help imagining there must be some daily/weekly trends. For example, workplace computers being turned on and off daily. However, that would only be noticeable if the computers are unevenly distributed by timezone, although weekends would show a clear difference. Another weekly factor is that exchanges are quieter on weekends, when there are no bank transfers.

I wonder what it would look like if the hashrate curves for each week were normalized, so as to compare them to each other to spot any trends. This would also filter out most of the inherent variance in hashing results.
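A minimal sketch of that normalization, assuming the hashrate is available as one estimate per hour: divide each week by its own mean so that long-term growth drops out, then average the weeks hour by hour. The function names are hypothetical, and the tiny example uses a 4-sample "week" just to keep the numbers readable.

```python
def normalize_weeks(hourly, hours_per_week=168):
    """Split a series into whole weeks and divide each week by its own mean."""
    weeks = []
    for start in range(0, len(hourly) - hours_per_week + 1, hours_per_week):
        week = hourly[start:start + hours_per_week]
        avg = sum(week) / len(week)
        weeks.append([x / avg for x in week])
    return weeks

def average_profile(weeks):
    """Average the normalized weeks position by position."""
    return [sum(vals) / len(vals) for vals in zip(*weeks)]

# Toy data: the second "week" has double the hashrate but the same shape.
toy = [1.0, 3.0, 1.0, 3.0, 2.0, 6.0, 2.0, 6.0]
weeks = normalize_weeks(toy, hours_per_week=4)
profile = average_profile(weeks)
```

After normalization both toy weeks are identical, which is the point: any shape that survives the averaging across many real weeks is a candidate weekly pattern, while the random variance in block finding tends to average out.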

world famous math art | masternodes are bad, mmmkay?
Every sha(sha(sha(sha()))), every ho-o-o-old, still shines