Bitcoin Forum
Author Topic: .  (Read 1800 times)
chodpaba (OP) | Jr. Member | Activity: 57 | Merit: 10
October 25, 2013, 07:43:12 PM (last edit: January 17, 2014, 04:10:32 PM by chodpaba)  #1

.
notme | Legendary | Activity: 1904 | Merit: 1002
October 25, 2013, 07:45:30 PM  #2

Match them up by timestamp, and then do your volume binning.

https://www.bitcoin.org/bitcoin.pdf
While no idea is perfect, some ideas are useful.
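notme's one-liner ("match them up by timestamp, and then do your volume binning") could be sketched roughly as below. This is a hypothetical illustration, not anyone's actual pipeline: the trade tuples are assumed to be (timestamp, price, amount), and the function names and the nearest-timestamp matching policy are my own choices.

```python
# Pair each trade from exchange A with the nearest-in-time trade from
# exchange B, then group the pairs into fixed-volume bins.
from bisect import bisect_left

def nearest_trade(trades, ts):
    """Return the trade in `trades` (sorted by timestamp) closest to ts."""
    times = [t[0] for t in trades]
    i = bisect_left(times, ts)
    candidates = trades[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda t: abs(t[0] - ts))

def volume_bins(trades_a, trades_b, bin_size):
    """Match by timestamp, then emit average prices per `bin_size` of volume."""
    bins, cum, acc = [], 0.0, []
    for ts, price, amount in trades_a:
        _, price_b, _ = nearest_trade(trades_b, ts)
        acc.append((price, price_b, amount))
        cum += amount
        if cum >= bin_size:                      # bin is full: flush it
            vol = sum(a for _, _, a in acc)
            bins.append((sum(p * a for p, _, a in acc) / vol,   # vol-weighted A price
                         sum(p * a for _, p, a in acc) / vol))  # vol-weighted B price
            cum, acc = 0.0, []
    return bins
```

A real implementation would also bound the allowed timestamp gap so a thin market does not get matched against stale trades.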
Chainsaw | Hero Member | Activity: 625 | Merit: 501
October 25, 2013, 09:29:11 PM  #3

You just made me google 'imputation'  Cheesy

I think the right approach depends on what you're trying to do with the data.

Separate overlaid graphs serve some purposes (such as showing historical arbitrage trends); combined results serve others (such as gross volume trends). Hell, there'd be value in overlaying individual market graphs with the combined values too.

Point being: as long as your raw data from the various markets are stored discretely, you can build whatever logical combinations you wish as a layer stacked on top, with views (mostly graphs here) representing whatever concepts you care to.

It's tough to be more specific without knowing how you gather and interpret your data, or what kinds of outputs you're trying to create. You draw some pretty strong, solid (read: valuable) conclusions from the data you analyze, so if you think I could be of more help with some more data, feel free to elaborate here or shoot me a PM. I don't think I'm as math-heavy as you, and I typically do most number-crunching via C# or Excel.

snackman | Sr. Member | Activity: 260 | Merit: 250
October 26, 2013, 12:23:55 AM  #4

I know R pretty well, if you need any assistance.

notme | Legendary | Activity: 1904 | Merit: 1002
October 26, 2013, 01:42:43 AM  #5

http://bitcoincharts.com/about/markets-api/

balanghai | Sr. Member | Activity: 364 | Merit: 253
October 26, 2013, 01:45:33 AM  #6

If the data are truncated, won't it be hard to include realtime data streams?
notme | Legendary | Activity: 1904 | Merit: 1002
October 26, 2013, 02:27:04 AM  #7


> Yes, thx. I have been using Bitcoin Charts data; it is very convenient. The problem I have run into is getting Forex data. But these folks may have solved that problem for me: http://www.quandl.com/help/api

Oh, duh.  Interesting site.

snackman | Sr. Member | Activity: 260 | Merit: 250
October 28, 2013, 03:18:03 PM  #8

A weighted (by volume) average of the prices from the major exchanges - Gox, Bitstamp, BTC China, btc-e, even EUR/GBP exchanges and localbitcoins - would be nice.

thezerg | Legendary | Activity: 1246 | Merit: 1010
October 28, 2013, 03:30:22 PM  #9

> I have long resisted the inclusion of data from exchanges other than Gox because I never really understood how to include samples of different lengths in a model. But now that the volumes of Gox, Bitstamp, and BTC China have been comparable for so long, I am forced to include their trade data in a model.
>
> It may be sufficient to simply truncate the trade data to the shortest sample, but I really hate to throw away data. I also expect there will occasionally be cases of ongoing missing data.
>
> I wonder how some of you have dealt with multiple data streams, and how you match them up: through truncation, imputation, or some other means?

You'll want to use the exciting science of reverse imputation. This complex mathematical technique uses the desired solution to inform the chosen imputation algorithm and the data-source weighting coefficients. Grin Come on, get with it: it's GUARANTEED to make Bitcoin look awesome! We know this from seeing the CPI numbers.

bucktotal | Full Member | Activity: 232 | Merit: 100
October 28, 2013, 05:08:21 PM  #10


> I wonder how some of you have dealt with multiple data streams, and how you match them up: through truncation, imputation, or some other means?

Interpolate?

notme | Legendary | Activity: 1904 | Merit: 1002
October 28, 2013, 06:07:41 PM  #11


> Interpolate?

Interpolation can help with missing values, but I'm not sure what it has to do with combining multiple input streams.

kjj | Legendary | Activity: 1302 | Merit: 1025
October 28, 2013, 06:51:59 PM  #12

Imputed data isn't.  Ditto for interpolated.

If your model involves anything resembling regression, cleaning the data in any way will cause your model to vastly overestimate the certainty and accuracy of its output.

This kind of thing is a pain to model. The spreads between the exchanges distort the price signal you are looking for, but not totally.

You could use (sign, magnitude) of changes instead of absolute values, which removes the pure-arbitrage component, but that distorts the price signal too. (sign, log(magnitude)) might help a bit, but that's hard to say as well.

Or you can ignore the arbitrage signal and just mash the prices together as they really are. But that will yield a price that is constantly too high by a factor related to the difficulty of moving funds around.

You are going to hate this, but the most valid way to go is to model each exchange and the relationships between them. You'll have to give up on the notion of "the bitcoin price" and instead work with "the bitcoin price at locations X, Y, and Z".

Oh, and I forgot: the arbitrage effects are not linear. Your model is going to get screwed every time the real-world factors that cause the spreads change.

17Np17BSrpnHCZ2pgtiMNnhjnsWJ2TMqq8
I routinely ignore posters with paid advertising in their sigs.  You should too.
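The (sign, log(magnitude)) transform kjj mentions might look like the following sketch. The function name and the use of log1p (so that zero changes map cleanly to 0) are my own assumptions, not kjj's specification.

```python
# Turn a price series into (sign, log-magnitude) pairs of its changes.
# Compressing magnitudes logarithmically damps large arbitrage-driven
# jumps relative to ordinary small moves.
import math

def sign_log_changes(prices):
    """Map consecutive price changes to (sign, log1p(|change|)) pairs."""
    out = []
    for prev, cur in zip(prices, prices[1:]):
        delta = cur - prev
        sign = (delta > 0) - (delta < 0)            # -1, 0, or +1
        out.append((sign, math.log1p(abs(delta))))  # log1p(0) == 0 for flat ticks
    return out
```

As kjj notes, whether this actually helps depends on the model; it removes the pure-arbitrage component only at the cost of distorting the price signal itself.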
bucktotal | Full Member | Activity: 232 | Merit: 100
October 28, 2013, 10:38:30 PM  #13


> Interpolation can help with missing values, but I'm not sure what it has to do with combining multiple input streams.

Sorry if I'm probably missing something obvious, but if I had multiple data streams (say, three exchanges) with different sample rates or missing values or whatever, I would interpolate each stream onto a common time series and then combine (average) the data, probably weighting the inputs by volume.
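bucktotal's recipe, interpolate each stream onto one common time grid and then take a volume-weighted average, can be sketched as follows. The scalar per-stream volume weights are a simplifying assumption of mine; a real implementation would likely weight per time window.

```python
# Linearly interpolate each exchange's (time, price) series onto a
# shared grid, then combine with volume weights at each grid point.
def interp(series, t):
    """Linear interpolation of sorted (time, price) pairs at time t."""
    for (t0, p0), (t1, p1) in zip(series, series[1:]):
        if t0 <= t <= t1:
            return p0 + (p1 - p0) * (t - t0) / (t1 - t0)
    raise ValueError("t outside series range")

def combined_price(streams, volumes, grid):
    """streams: list of (time, price) series; volumes: matching weights."""
    total = sum(volumes)
    return [sum(v * interp(s, t) for s, v in zip(streams, volumes)) / total
            for t in grid]
```

Note kjj's caveat above still applies: the interpolated points are synthesized, so any downstream regression will overstate its own certainty.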

notme | Legendary | Activity: 1904 | Merit: 1002
October 28, 2013, 10:42:26 PM  #14


> Sorry if I'm probably missing something obvious, but if I had multiple data streams (say, three exchanges) with different sample rates or missing values or whatever, I would interpolate each stream onto a common time series and then combine (average) the data, probably weighting the inputs by volume.

Okay, now I understand what you mean; that was the first suggestion in this thread. However, I don't think you are using the term "interpolate" correctly.

bucktotal | Full Member | Activity: 232 | Merit: 100
October 28, 2013, 10:47:23 PM  #15


> Okay, now I understand what you mean; that was the first suggestion in this thread. However, I don't think you are using the term "interpolate" correctly.

I'm listening...


notme | Legendary | Activity: 1904 | Merit: 1002
October 29, 2013, 01:22:27 AM  #16


> I'm listening...

Interpolation is synthesizing new points between existing data. That is not the same thing as interweaving two data series on a common time axis. I'm not sure what the best term is for that, but it isn't interpolation.

http://en.wikipedia.org/wiki/Interpolation

prophetx | Legendary | Activity: 1666 | Merit: 1010
October 29, 2013, 01:28:05 AM  #17

> A weighted (by volume) average of the prices from the major exchanges - Gox, Bitstamp, BTC China, btc-e, even EUR/GBP exchanges and localbitcoins - would be nice.

If someone puts together this dataset, it is what I actually need (for my thesis):

- total daily volume on the exchanges
- average weighted daily price

OR

- total daily trade volume in $
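The daily aggregates prophetx asks for can be sketched from raw trades. The (date, price, amount) tuple layout and the function name are assumptions for illustration, not a real exchange feed format.

```python
# From raw trades, produce per-day total volume and the
# volume-weighted average daily price (VWAP).
from collections import defaultdict

def daily_vwap(trades):
    """Return {date: (total_volume, volume_weighted_price)}."""
    vol = defaultdict(float)
    notional = defaultdict(float)   # price * amount, i.e. trade volume in $
    for date, price, amount in trades:
        vol[date] += amount
        notional[date] += price * amount
    return {d: (vol[d], notional[d] / vol[d]) for d in vol}
```

The per-day `notional` totals are also exactly the "total daily trade volume in $" alternative prophetx mentions.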