Bitcoin Forum

Economy => Speculation => Topic started by: chodpaba on October 25, 2013, 07:43:12 PM



Title: .
Post by: chodpaba on October 25, 2013, 07:43:12 PM
.


Title: Re: The Imputation Problem
Post by: notme on October 25, 2013, 07:45:30 PM
Match them up by timestamp, and then do your volume binning.
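Here is one possible reading of this advice as a Python sketch. First, interleave each exchange's trades into a single time-ordered list. Then cut that list into volume bars, meaning bins that each hold roughly a fixed amount of traded BTC. Several things are assumptions for illustration: the (timestamp, price, amount) trade shape, the function names, and the bin size. Time-based bins (say, hourly) would be an equally valid reading of "volume binning".

```python
import heapq
from typing import Iterable, List, Tuple

# Assumed record shape: (unix_timestamp, price, amount_btc).
Trade = Tuple[float, float, float]

def merge_by_timestamp(*streams: Iterable[Trade]) -> List[Trade]:
    """Interleave per-exchange trade lists (each already sorted by time) into one sorted list."""
    return list(heapq.merge(*streams, key=lambda t: t[0]))

def volume_bins(trades: List[Trade], bin_volume: float) -> List[Tuple[float, float, float]]:
    """Group consecutive trades into bins of at least `bin_volume` BTC.

    Returns (end_timestamp, vwap, volume) for each bin. A trade that pushes a bin
    past the threshold closes it whole (trades are not split), so a bin can exceed
    bin_volume. A trailing partial bin is dropped.
    """
    bins = []
    notional = volume = 0.0
    for ts, price, amount in trades:
        notional += price * amount
        volume += amount
        if volume >= bin_volume:
            bins.append((ts, notional / volume, volume))
            notional = volume = 0.0
    return bins
```

The example uses two toy exchanges with trades at t = 1, 4 and t = 2, 3, each trade being 1 BTC. With 2 BTC bins, the merged list splits into two bins with VWAPs of 199.0 and 200.5.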


Title: Re: The Imputation Problem
Post by: Chainsaw on October 25, 2013, 09:29:11 PM
You just made me google 'imputation'  :D

I think the approach taken would have a lot to do with what you're trying to do with the data.

Separate overlaid graphs serve some purposes (such as showing historical arbitrage trends) - combined results serve others (such as gross volume trends).  Hell, there'd be value in overlaying individual market graphs with the combined values too.  

Point being - I would think as long as your raw data from various markets are stored discretely, you can build up whatever logical data combinations you wish as a layer stacked on top, with views (mostly graphs here) representing whatever concepts you care to.

It's tough to be more specific without knowing the specific approach you're using to gather and interpret your data, or what kinds of outputs you're trying to create.  You draw some pretty strong, solid (read: valuable) conclusions from the data you analyze. If you think I could be of more help with some more data, feel free to elaborate here or shoot me a PM.  I don't think I'm as math-heavy as you, and typically do most number-crunching via C# or Excel.


Title: Re: The Imputation Problem
Post by: snackman on October 26, 2013, 12:23:55 AM
I know R pretty well, if you need any assistance.


Title: Re: The Imputation Problem
Post by: notme on October 26, 2013, 01:42:43 AM
http://bitcoincharts.com/about/markets-api/


Title: Re: The Imputation Problem
Post by: balanghai on October 26, 2013, 01:45:33 AM
If truncated, will it not be hard to include realtime data streams?


Title: Re: The Imputation Problem
Post by: notme on October 26, 2013, 02:27:04 AM
Yes, thx. I have been using Bitcoin Charts data, it is very convenient. The problem I have run into is getting Forex data. But these folks may have solved that problem for me: http://www.quandl.com/help/api

Oh, duh.  Interesting site.


Title: Re: The Imputation Problem
Post by: snackman on October 28, 2013, 03:18:03 PM
A weighted (by volume) average of the prices from the major exchanges - Gox, Bitstamp, BTC China, btc-e, even EUR/GBP exchanges and localbitcoins - would be nice.
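The quantity being asked for is a volume-weighted average price: sum(price × volume) / sum(volume), taken over the exchanges. This minimal sketch makes two assumptions. First, every price has already been converted to one currency. BTC China, for example, trades in CNY and would need an FX rate first, which is the Forex data problem raised above. Second, all quotes cover the same interval. The exchange names in the usage are only labels.

```python
from typing import Dict, Tuple

def volume_weighted_price(quotes: Dict[str, Tuple[float, float]]) -> float:
    """Volume-weighted average price across exchanges: sum(p * v) / sum(v).

    `quotes` maps an exchange label to (price_usd, volume_btc) for one interval.
    """
    total_volume = sum(v for _, v in quotes.values())
    if total_volume == 0:
        raise ValueError("no volume in this interval")
    return sum(p * v for p, v in quotes.values()) / total_volume
```

Consider one exchange at 210 with 100 BTC of volume and another at 200 with 300 BTC. The weighted price is 202.5, closer to the busier exchange than the plain mean of 205.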


Title: Re: The Imputation Problem
Post by: thezerg on October 28, 2013, 03:30:22 PM
I have long resisted including data from exchanges other than Gox because I never really understood how to combine samples of different lengths in a model. But now that the volumes of Gox, Bitstamp, and BTC China have been comparable for so long, I am forced to include their trade data in a model.

It may be sufficient to simply truncate the trade data to the shortest sample, but I really hate to throw away data. I also expect there will occasionally be gaps in the data going forward.
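When the streams start and end at different times, truncating to the shortest sample amounts to keeping only the window that every stream covers. This minimal sketch assumes each exchange's trades are a non-empty, time-sorted list of (timestamp, price, amount) tuples. Running it makes visible how much data the approach discards.

```python
from typing import List, Tuple

# Assumed record shape: (unix_timestamp, price, amount_btc).
Trade = Tuple[float, float, float]

def truncate_to_overlap(streams: List[List[Trade]]) -> List[List[Trade]]:
    """Keep only trades inside the window covered by every stream.

    The window runs from the latest first timestamp to the earliest last timestamp.
    Each stream must be non-empty and sorted by time.
    """
    start = max(s[0][0] for s in streams)
    end = min(s[-1][0] for s in streams)
    return [[t for t in s if start <= t[0] <= end] for s in streams]
```

Take one exchange with trades at t = 0, 5, 10 and another at t = 3, 8. The common window is [3, 8], so the first exchange keeps only its trade at t = 5.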

I wonder how some of you have dealt with multiple data streams and how you matched them up, whether through truncation, imputation, or some other means.

You'll want to use the exciting science of reverse imputation. This complex mathematical technique uses the desired solution to inform the chosen imputation algorithm and data-source weighting coefficients. ;D Come on, get with it: it's GUARANTEED to make Bitcoin look awesome! We know this from seeing the CPI numbers.



Title: Re: The Imputation Problem
Post by: bucktotal on October 28, 2013, 05:08:21 PM

I wonder how some of you have dealt with multiple data streams and how you matched them up, whether through truncation, imputation, or some other means.


interpolate ?



Title: Re: The Imputation Problem
Post by: notme on October 28, 2013, 06:07:41 PM

interpolate ?



Interpolation can help with missing values, but I'm not sure what it has to do with combining multiple input streams.


Title: Re: The Imputation Problem
Post by: kjj on October 28, 2013, 06:51:59 PM
Imputed data isn't data. Ditto for interpolated data.

If your model involves anything resembling regression, cleaning the data in any way will cause your model to vastly overestimate the certainty and accuracy of the output.
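The simplest case of this effect is the standard error of a mean. In the sketch below, mean imputation is an illustrative stand-in for "cleaning", and the toy data is made up for the demonstration. Gaps are filled with the observed mean. The filled values add to n but contribute no deviation, so the standard error shrinks even though no information was added.

```python
import math
import statistics
from typing import List, Optional

def standard_error(xs: List[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(xs) / math.sqrt(len(xs))

def mean_impute(xs: List[Optional[float]]) -> List[float]:
    """Replace missing values (None) with the mean of the observed values."""
    observed = [x for x in xs if x is not None]
    m = statistics.mean(observed)
    return [m if x is None else x for x in xs]
```

Take the toy series [1, None, -1, None, 1, None, -1]. The standard error falls from about 0.58 with the four observed values to about 0.31 with the seven values after imputation. The apparent certainty roughly doubles without any new observations.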

This kind of thing is a pain to model.  The spreads between the exchanges distort the price signal that you are looking for, but not totally.

You could use (sign,magnitude) of changes instead of absolute values, which will remove the pure-arbitrage signal from the price signal, but that distorts the price signal.  (sign,log(magnitude)) might help a bit, but that's hard to say too.
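This minimal sketch implements the two transforms and assumes a plain list of successive prices from one exchange. The post doesn't say how a zero change should be handled under the log. Mapping it to (0, None) here is an assumption.

```python
import math
from typing import List, Optional, Tuple

def sign_magnitude(prices: List[float]) -> List[Tuple[int, float]]:
    """(sign, magnitude) of each successive price change."""
    out = []
    for prev, cur in zip(prices, prices[1:]):
        d = cur - prev
        out.append(((d > 0) - (d < 0), abs(d)))  # sign is -1, 0, or 1
    return out

def sign_log_magnitude(prices: List[float]) -> List[Tuple[int, Optional[float]]]:
    """(sign, log(magnitude)); a zero change has no log, so it maps to (0, None)."""
    return [(s, math.log(m) if m > 0 else None) for s, m in sign_magnitude(prices)]
```

For prices 100, 102, 101, 101 the first transform gives [(1, 2.0), (-1, 1.0), (0, 0.0)]. The log version turns the magnitudes into log(2), 0.0, and None.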

Or, you can ignore the arbitrage signal, and just mash the prices together as they really are.  But that will result in a price that is constantly too high by a factor related to the difficulty of moving stuff around.

You are going to hate this, but the most valid way to go is to model each exchange and the relationships between them.  You'll have to give up on the notion of "the bitcoin price" and instead work with "the bitcoin price at locations X,Y and Z".
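A first step toward working with "the bitcoin price at locations X, Y and Z" is to track each pair's spread as its own series rather than averaging it away. This minimal sketch assumes each exchange's prices are already bucketed by time (for example, one price per unix minute) in a dict. A log ratio is used so the spread does not depend on the price level. The sketch only measures the spread. It does not model the non-linear arbitrage dynamics mentioned here.

```python
import math
from typing import Dict, List, Tuple

def log_spread(a: Dict[int, float], b: Dict[int, float]) -> List[Tuple[int, float]]:
    """log(price_a / price_b) at each time bucket where both exchanges have a price."""
    shared = sorted(a.keys() & b.keys())
    return [(t, math.log(a[t] / b[t])) for t in shared]
```

Buckets present on only one exchange are skipped rather than filled, so no synthetic prices enter the spread series.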

Oh, and I forgot that the arbitrage issues are not linear.  Your model is going to get screwed every time the real world factors that cause the spreads change.


Title: Re: The Imputation Problem
Post by: bucktotal on October 28, 2013, 10:38:30 PM

Interpolation can help with missing values, but I'm not sure what it has to do with combining multiple input streams.

Sorry if I'm missing something obvious, but if I had multiple data streams (say, three exchanges) with different sample rates, missing values, or whatever, I would interpolate each data stream onto a common time series and then combine (average) the data, probably weighting the inputs by volume.
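This approach can be sketched as follows. It assumes each exchange is given as sorted, distinct timestamps with matching prices, plus a single weight such as that exchange's volume over the period. Linear interpolation and the single per-exchange weight are illustrative choices. An exchange is skipped at any grid point outside its observed range instead of being extrapolated.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

def interp_at(ts: List[float], ps: List[float], t: float) -> Optional[float]:
    """Linearly interpolated price at time t; None outside the observed range."""
    if not ts or t < ts[0] or t > ts[-1]:
        return None
    i = bisect_left(ts, t)
    if ts[i] == t:
        return ps[i]
    t0, t1, p0, p1 = ts[i - 1], ts[i], ps[i - 1], ps[i]
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)

def combine(streams: List[Tuple[List[float], List[float], float]],
            grid: List[float]) -> List[Optional[float]]:
    """Weighted average of each stream's interpolated price at every grid time.

    Each stream is (timestamps, prices, weight). Streams with no price at a grid
    time are left out of that average; None if no stream covers the time.
    """
    out = []
    for t in grid:
        num = den = 0.0
        for ts, ps, w in streams:
            p = interp_at(ts, ps, t)
            if p is not None:
                num += w * p
                den += w
        out.append(num / den if den > 0 else None)
    return out
```

Suppose one exchange has prices at t = 0 and 10 with weight 1, and another has prices at t = 0, 5, and 10 with weight 3. At t = 5 the first exchange's price is interpolated (205) and averaged with the second's observed 195, giving 197.5.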



Title: Re: The Imputation Problem
Post by: notme on October 28, 2013, 10:42:26 PM

Sorry if I'm missing something obvious, but if I had multiple data streams (say, three exchanges) with different sample rates, missing values, or whatever, I would interpolate each data stream onto a common time series and then combine (average) the data, probably weighting the inputs by volume.




Okay, now I understand what you mean, which was the first suggestion in this thread.  However, I don't think you are using the term "interpolate" correctly.


Title: Re: The Imputation Problem
Post by: bucktotal on October 28, 2013, 10:47:23 PM

Okay, now I understand what you mean, which was the first suggestion in this thread.  However, I don't think you are using the term "interpolate" correctly.

I'm listening...




Title: Re: The Imputation Problem
Post by: notme on October 29, 2013, 01:22:27 AM

I'm listening...




Interpolation is synthesizing new points between existing data points. This is not the same thing as interweaving two data series based on a common time series. I'm not sure what the best term is for that, but it isn't interpolation.

http://en.wikipedia.org/wiki/Interpolation
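For reference, the most common form is linear interpolation. Between two observed trades at $(t_i, p_i)$ and $(t_{i+1}, p_{i+1})$, it synthesizes a price at any time $t$ in that interval as:

```latex
p(t) = p_i + \left(p_{i+1} - p_i\right)\frac{t - t_i}{t_{i+1} - t_i},
\qquad t_i \le t \le t_{i+1}
```

The $p(t)$ values are new points that were never observed. Merging two exchanges' trades into one time-ordered list creates no such points, which is the distinction being drawn here.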


Title: Re: The Imputation Problem
Post by: prophetx on October 29, 2013, 01:28:05 AM
A weighted (by volume) average of the prices from the major exchanges - Gox, Bitstamp, BTC China, btc-e, even EUR/GBP exchanges and localbitcoins - would be nice.

If someone puts together this dataset, I actually need it (for my thesis):

total daily volume on the exchanges

volume-weighted average daily price

OR

total daily trade volume in $
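All three requested series can come from one pass over raw trades. This minimal sketch makes two assumptions. First, trades from all exchanges have already been pooled into (unix_timestamp, price_usd, amount_btc) tuples, with non-USD prices converted beforehand. Second, day boundaries are UTC midnight, which is an arbitrary choice. Each day is assumed to have positive total volume.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple

# Assumed record shape: (unix_timestamp, price_usd, amount_btc).
Trade = Tuple[float, float, float]

def daily_summary(trades: Iterable[Trade]) -> Dict[str, Tuple[float, float, float]]:
    """Per UTC day: (total BTC volume, volume-weighted average price, total USD volume)."""
    btc = defaultdict(float)
    usd = defaultdict(float)
    for ts, price, amount in trades:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        btc[day] += amount
        usd[day] += price * amount
    # The VWAP is USD volume divided by BTC volume, so it falls out of the two sums.
    return {d: (btc[d], usd[d] / btc[d], usd[d]) for d in sorted(btc)}
```

As a check, trades at 180 and 190 (2 BTC each) on 2013-10-25 and one at 200 (1 BTC) on 2013-10-26 produce two rows. The first day has 4 BTC, a VWAP of 185, and $740. The second has 1 BTC, a VWAP of 200, and $200.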