chodpaba (OP)
Jr. Member
Offline
Activity: 57
Merit: 10
|
|
October 25, 2013, 07:43:12 PM Last edit: January 17, 2014, 04:10:32 PM by chodpaba |
|
.
|
|
|
|
notme
Legendary
Offline
Activity: 1904
Merit: 1002
|
|
October 25, 2013, 07:45:30 PM |
|
Match them up by timestamp, and then do your volume binning.
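A minimal sketch of that suggestion, assuming each exchange's trades are already available as time-sorted (unix_timestamp, price, volume) tuples. The function names `merge_trades` and `volume_bins` and the bin size are made up for illustration. It merges the streams into one timestamp-ordered tape, then closes a bin every time a fixed amount of volume has traded.

```python
import heapq

def merge_trades(*streams):
    """Merge time-sorted trade lists into one list ordered by timestamp."""
    return list(heapq.merge(*streams, key=lambda trade: trade[0]))

def volume_bins(trades, bin_size):
    """Group trades into bins of equal traded volume.

    Returns one (last_timestamp, vwap, volume) tuple per completed bin.
    A trade that straddles a bin boundary is split across the two bins.
    Leftover volume that does not fill a bin is discarded.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    bins = []
    filled = 0.0     # volume accumulated in the current bin
    notional = 0.0   # sum of price * volume in the current bin
    for ts, price, vol in trades:
        remaining = vol
        while remaining > 0:
            room = bin_size - filled
            take = min(remaining, room)
            notional += price * take
            remaining -= take
            if take == room:
                # The bin is full: record it and start a new one.
                bins.append((ts, notional / bin_size, bin_size))
                filled = 0.0
                notional = 0.0
            else:
                filled += take
    return bins
```

Because the raw per-exchange lists are kept separate and only merged at this step, the same inputs can still be binned per exchange for comparison.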
|
|
|
|
Chainsaw
|
|
October 25, 2013, 09:29:11 PM |
|
You just made me google 'imputation'.
I think the approach taken would have a lot to do with what you're trying to do with the data. Separate overlaid graphs serve some purposes (such as showing historical arbitrage trends); combined results serve others (such as gross volume trends). Hell, there'd be value in overlaying individual market graphs with the combined values too.
Point being: I would think that as long as your raw data from the various markets is stored discretely, you can build up whatever logical data combinations you wish as a layer stacked on top, with views (mostly graphs here) representing whatever concepts you care to.
It's tough to be more specific without knowing the approach you're using to gather and interpret your data, or what kinds of outputs you're trying to create. You draw some pretty strong, solid (read: valuable) conclusions from the data you analyze. If you think I could be of more help with some more data, feel free to elaborate here or shoot me a PM. I don't think I'm as math-heavy as you, and I typically do most number-crunching via C# or Excel.
|
|
|
|
snackman
|
|
October 26, 2013, 12:23:55 AM |
|
I know R pretty well, if you need any assistance.
|
|
|
|
notme
|
|
October 26, 2013, 01:42:43 AM |
|
|
|
|
|
balanghai
|
|
October 26, 2013, 01:45:33 AM |
|
If truncated, will it not be hard to include realtime data streams?
|
|
|
|
notme
|
|
October 26, 2013, 02:27:04 AM |
|
Yes, thx. I have been using Bitcoin Charts data, it is very convenient. The problem I have run into is getting Forex data. But these folks may have solved that problem for me: http://www.quandl.com/help/api
Oh, duh. Interesting site.
|
|
|
|
snackman
|
|
October 28, 2013, 03:18:03 PM |
|
A weighted (by volume) average of the prices from the major exchanges - Gox, Bitstamp, BTC China, btc-e, even EUR/GBP exchanges and localbitcoins - would be nice.
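A rough sketch of such a composite, assuming every exchange's price has already been converted to one currency (USD here) at some Forex rate, and that the volumes cover the same time window. The function name and the numbers in the usage note are invented for illustration, not real market data.

```python
def volume_weighted_price(quotes):
    """Volume-weighted average price across exchanges.

    quotes maps an exchange name to (price_usd, volume_btc) for one
    shared time window. Prices are assumed to be in a common currency.
    """
    total_volume = sum(volume for _, volume in quotes.values())
    if total_volume == 0:
        raise ValueError("no volume in this window")
    weighted_sum = sum(price * volume for price, volume in quotes.values())
    return weighted_sum / total_volume
```

For example, `volume_weighted_price({"mtgox": (200.0, 3000.0), "bitstamp": (190.0, 2000.0), "btcchina": (195.0, 5000.0)})` gives 195.5, since (600000 + 380000 + 975000) / 10000 = 195.5.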
|
|
|
|
thezerg
Legendary
Offline
Activity: 1246
Merit: 1010
|
|
October 28, 2013, 03:30:22 PM |
|
I have long resisted the inclusion of data from exchanges other than Gox because I never really understood how to include samples of different lengths in a model. But now that the volumes of Gox, Bitstamp, and BTC China have been comparable for so long, I am forced to include their trade data in a model.
It may be sufficient to simply truncate the trade data to the shortest sample, but I really hate to throw away data. As well, I expect there will occasionally be cases where there will be missing data ongoing.
I wonder, how have some of you dealt with multiple data streams, and how to match them up, either through truncation, imputation, or some other means.
You'll want to use the exciting science of reverse imputation. This complex mathematical technique uses the desired solution to inform the chosen imputation algorithm and data-source weighting coefficients. Come on, get with it, it's GUARANTEED to make Bitcoin look awesome! We know this from seeing the CPI numbers.
|
|
|
|
bucktotal
|
|
October 28, 2013, 05:08:21 PM |
|
I wonder, how have some of you dealt with multiple data streams, and how to match them up, either through truncation, imputation, or some other means.
interpolate ?
|
|
|
|
notme
|
|
October 28, 2013, 06:07:41 PM |
|
Interpolation can help with missing values, but I'm not sure what it has to do with combining multiple input streams.
|
|
|
|
kjj
Legendary
Offline
Activity: 1302
Merit: 1026
|
|
October 28, 2013, 06:51:59 PM |
|
Imputed data isn't. Ditto for interpolated.
If your model involves anything resembling regression, cleaning the data in any way will cause your model to vastly overestimate the certainty and accuracy of the output.
This kind of thing is a pain to model. The spreads between the exchanges distort the price signal that you are looking for, but not totally.
You could use (sign,magnitude) of changes instead of absolute values, which will remove the pure-arbitrage signal from the price signal, but that distorts the price signal. (sign,log(magnitude)) might help a bit, but that's hard to say too.
Or, you can ignore the arbitrage signal, and just mash the prices together as they really are. But that will result in a price that is constantly too high by a factor related to the difficulty of moving stuff around.
You are going to hate this, but the most valid way to go is to model each exchange and the relationships between them. You'll have to give up on the notion of "the bitcoin price" and instead work with "the bitcoin price at locations X,Y and Z".
Oh, and I forgot that the arbitrage issues are not linear. Your model is going to get screwed every time the real world factors that cause the spreads change.
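To make the (sign, log(magnitude)) idea concrete, here is a small sketch. One substitution to flag: it uses log1p(|change|) rather than a plain log, purely so that a zero change maps to 0.0 instead of minus infinity. That choice and the function name are assumptions for illustration.

```python
import math

def sign_log_changes(prices):
    """Turn a price series into (sign, log-magnitude) pairs of successive changes.

    sign is -1, 0 or +1. The magnitude is log1p(|change|), a stand-in
    for log(|change|) that stays finite when the price does not move.
    """
    out = []
    for prev, cur in zip(prices, prices[1:]):
        delta = cur - prev
        sign = (delta > 0) - (delta < 0)
        out.append((sign, math.log1p(abs(delta))))
    return out
```

Two exchanges that differ only by a constant spread, say [200.0, 203.0, 201.0] and [190.0, 193.0, 191.0], produce identical output, which is the sense in which working with changes strips a fixed arbitrage offset. A spread that moves over time will not cancel this way.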
|
|
|
|
bucktotal
|
|
October 28, 2013, 10:38:30 PM |
|
Sorry if I'm missing something obvious, but if I had multiple data streams (say 3 exchanges) with different sample rates or missing values or whatever, I would interpolate each data stream to a common time series and then combine (average) the data, and would probably weight the inputs by volume.
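A sketch of that procedure, under some assumptions: each stream is a time-sorted list of (timestamp, price) pairs, linear interpolation is used, and one fixed weight per exchange stands in for its volume. All names here are hypothetical. Note that interpolating between two trades uses the later trade, so the resampled series looks ahead in time, and the filled-in points are not real observations.

```python
from bisect import bisect_right

def interpolate_at(series, t):
    """Linearly interpolate a time-sorted [(timestamp, price), ...] series at t.

    Returns None outside the observed range instead of extrapolating.
    """
    if not series:
        return None
    times = [ts for ts, _ in series]
    if t < times[0] or t > times[-1]:
        return None
    i = bisect_right(times, t)
    if i == len(times):
        # t coincides with the last observation.
        return series[-1][1]
    (t0, p0), (t1, p1) = series[i - 1], series[i]
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)

def combine(streams, weights, grid):
    """Weighted average of the interpolated streams at each grid time.

    streams: exchange -> [(timestamp, price), ...]
    weights: exchange -> weight (for example, recent volume)
    Exchanges with no data at a grid time are left out of that average.
    """
    combined = []
    for t in grid:
        num = 0.0
        den = 0.0
        for name, series in streams.items():
            price = interpolate_at(series, t)
            if price is not None:
                num += weights[name] * price
                den += weights[name]
        combined.append((t, num / den if den else None))
    return combined
```

With streams a = [(0, 100.0), (10, 110.0)] and b = [(5, 104.0), (15, 108.0)], weights 3.0 and 1.0, and grid [0, 5, 10, 15], the combined prices are 100.0, 104.75, 109.0 and 108.0.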
|
|
|
|
notme
|
|
October 28, 2013, 10:42:26 PM |
|
Okay, now I understand what you mean, which was the first suggestion in this thread. However, I don't think you are using the term "interpolate" correctly.
|
|
|
|
bucktotal
|
|
October 28, 2013, 10:47:23 PM |
|
I'm listening...
|
|
|
|
notme
|
|
October 29, 2013, 01:22:27 AM |
|
Interpolation is synthesizing new points between existing data. This is not the same thing as interweaving two data series based on a common time series. I'm not sure what the best term is for that, but it isn't interpolation.
http://en.wikipedia.org/wiki/Interpolation
|
|
|
|
prophetx
Legendary
Offline
Activity: 1666
Merit: 1010
he who has the gold makes the rules
|
|
October 29, 2013, 01:28:05 AM |
|
A weighted (by volume) average of the prices from the major exchanges - Gox, Bitstamp, BTC China, btc-e, even EUR/GBP exchanges and localbitcoins - would be nice.
If someone puts together this dataset, what I actually need (for my thesis) is:
total daily volume on the exchanges
average weighted daily price
OR
total daily trade volume in $
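A sketch of how those daily figures could be computed from raw trades, assuming a combined list of (unix_timestamp, price_usd, volume_btc) tuples with prices already in USD, and days cut at UTC midnight. The function name and tuple layout are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

def daily_summary(trades):
    """Aggregate trades into per-day totals.

    trades: iterable of (unix_timestamp, price_usd, volume_btc).
    Returns {date: (volume_btc, vwap_usd, volume_usd)} keyed by UTC day.
    """
    btc = defaultdict(float)   # total BTC traded per day
    usd = defaultdict(float)   # total USD value traded per day
    for ts, price, volume in trades:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        btc[day] += volume
        usd[day] += price * volume
    return {
        day: (btc[day], usd[day] / btc[day], usd[day])
        for day in sorted(btc)
        if btc[day] > 0
    }
```

The three values per day are linked: the dollar volume equals the BTC volume times the volume-weighted price, so either of the two requested forms can be derived from this one table.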
|
|
|
|
|