c0inbuster (OP)
May 12, 2013, 08:29:44 AM Last edit: May 12, 2013, 09:42:30 AM by c0inbuster |
|
Hello,

I'm starting a new thread about zipline / Quantopian. It's an event-driven Python trading framework that can be used for backtesting strategies.
https://bitcointalk.org/index.php?topic=148462.msg2105722#msg2105722
http://vimeo.com/53064082

If you want to try it, you should run IPython with pylab inline:

    ipython notebook --pylab inline

MtQuid posted a Python notebook here:
https://bitcointalk.org/index.php?topic=148462.msg2116508#msg2116508
http://nbviewer.ipython.org/5561936

I'm posting here to avoid overloading the goxtool thread (goxtool is ncurses Python software for trading BTC on MtGox).

I have some questions about zipline.

First, I noticed that the data (daily MtGox BTC/USD) come from
http://www.quandl.com/api/v1/datasets/BITCOIN/MTGOXUSD.csv?trim_start=2012-01-01&sort_order=asc
( http://www.quandl.com/BITCOIN-Bitcoin-Charts/MTGOXUSD-Bitcoin-Markets-mtgoxUSD )
with raw data from http://bitcoincharts.com/charts/chart.json?m=mtgoxUSD

    Date                            open       high     low      close         volume       volume_usd       price
    2013-05-11 00:00:00+00:00  117.70000  118.74000  113.00  113.47000   25532.277740   2952016.798507  115.619015
    2013-05-10 00:00:00+00:00  112.79900  122.50000  111.54  117.70000   77443.672681   9140709.083964  118.030418
    2013-05-09 00:00:00+00:00  113.20000  113.71852  108.80  112.79900   26894.458204   3003068.410660  111.661235
    2013-05-08 00:00:00+00:00  109.60013  116.77700  109.50  113.20000   61680.324704   6990518.957611  113.334665
    2013-05-07 00:00:00+00:00  112.25000  114.00000   97.52  109.60013  139626.724860  14898971.673747  106.705731

    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 497 entries, 2013-05-11 00:00:00+00:00 to 2012-01-01 00:00:00+00:00
    Data columns:
    open          497 non-null values
    high          497 non-null values
    low           497 non-null values
    close         497 non-null values
    volume        497 non-null values
    volume_usd    497 non-null values
    price         497 non-null values
    dtypes: float64(7)

    Date                          open    high      low    close         volume      volume_usd     price
    2012-01-05 00:00:00+00:00  5.57383  7.2200  5.57401  6.94760  182328.193876  1130623.294233  6.201034
    2012-01-04 00:00:00+00:00  4.88080  5.7000  4.75100  5.57383  131170.856663   688717.856619  5.250540
    2012-01-03 00:00:00+00:00  5.21678  5.2900  4.65000  4.88080  125170.253872   619170.541604  4.946627
    2012-01-02 00:00:00+00:00  5.26766  5.4700  4.80000  5.21678   69150.931963   360357.284302  5.211170
    2012-01-01 00:00:00+00:00  4.72202  5.4999  4.61500  5.26766  108509.229901   553045.139811  5.096757

Note: the data need to be sorted by ascending index; otherwise you get this error message:

    AssertionError: Period start falls after period end.

I wonder what "weighted price" is (it is renamed "price" here). This notebook seems to use the weighted price to simulate a kind of tick data. In my opinion it would be much better to simulate each price that has been seen on the market (open, high, low, close), because if you go long and place a stop loss, it will probably be hit by the low price (and if you go short, it will probably be hit by the high price).

Second, I have some problems running the notebook (I always get a (*) ), but I can run the code without the notebook:
http://pastebin.com/jmfuNTKs

Third, I wonder why I don't see the buy/sell markers (^ and v).

Fourth, what about day trading (with an M15 timeframe)? Some data are here:
https://bitcointalk.org/index.php?topic=199979.0
or
https://bitcointalk.org/index.php?topic=196834.0

Unfortunately I'm quite busy today ;-(

Kind regards
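Both the ascending-sort requirement and the "weighted price" question can be checked on the rows quoted in this post. This is only a minimal sketch, assuming a recent pandas; the two rows are copied from the table, and the idea that price = volume_usd / volume (a volume-weighted average price) is an assumption that the numbers happen to support, not something Quandl's documentation was consulted for.

```python
import pandas as pd

# Two of the daily rows quoted in the post, newest first as in the table.
rows = [
    ("2013-05-11", 117.70000, 118.74000, 113.00, 113.47000,
     25532.277740, 2952016.798507, 115.619015),
    ("2013-05-10", 112.79900, 122.50000, 111.54, 117.70000,
     77443.672681, 9140709.083964, 118.030418),
]
cols = ["Date", "open", "high", "low", "close", "volume", "volume_usd", "price"]
df = pd.DataFrame(rows, columns=cols)
df["Date"] = pd.to_datetime(df["Date"], utc=True)
df = df.set_index("Date")

# Sort oldest-first before handing the frame to a backtester.
df = df.sort_index()

# Assumption: "weighted price" is USD volume divided by BTC volume.
vwap = df["volume_usd"] / df["volume"]
max_error = (vwap - df["price"]).abs().max()
print(df.index[0], max_error)
```

If the assumption holds, max_error stays tiny for every row, which would mean the "price" column is an average over the day and not a price that a stop loss could actually be tested against.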
|
|
|
|
MtQuid
May 12, 2013, 12:47:45 PM |
|
Yeah, I'm also wondering about not seeing the buy/sell markers (^ and v). I think this ties in with me having to add extra values to those two series. I finished that stuff off drunk last night... but the charts show it would have made a profit.

I've updated the notebook to use OHLC and it works. I took the code from load_bars_from_yahoo() so we can use things like data['BTC']['open'] within the handler. Works well. Almost there...

I also agree that we need to use a better data source, and that should probably be bitcoincharts.

Bots just puke up machine language. Time to talk to some humans down the pub. Sunday Roast!!!
|
|
|
|
hugolp
May 12, 2013, 02:36:06 PM |
|
Quote from MtQuid: "Almost there.... I also agree that we need to use a better data source and that should probably be bitcoincharts."

Why not MtGox itself?
|
|
|
|
c0inbuster (OP)
May 12, 2013, 07:30:08 PM |
|
Because MtGox provides (to my knowledge) only an API to download each trade (and it's a very big file!).

About the latest version of MtQuid's notebook:
http://nbviewer.ipython.org/5561936

I don't understand why it uses a pandas Panel. I also don't understand the purpose of "adjusted". I think we just need to resample the data.

A very basic idea (to test a long strategy) would be to send prices in this order:

    OPEN_dt0 LOW_dt0 HIGH_dt0 CLOSE_dt0 OPEN_dt1 LOW_dt1 HIGH_dt1 CLOSE_dt1 ...

This considers the worst case: if we set a stop loss and a take profit in the simulator, the price will first move toward the stop loss and only afterwards toward the take profit.
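The open, low, high, close ordering for a long strategy can be sketched in a few lines. This is an illustration only: worst_case_ticks_long is a made-up helper name, and the two bars are taken from the daily data quoted at the start of the thread.

```python
def worst_case_ticks_long(bars):
    """Yield (label, price) pseudo ticks in the order open, low, high, close.

    Visiting the low before the high is the pessimistic ordering for a
    long position: a stop loss gets its chance before a take profit does.
    """
    for i, bar in enumerate(bars):
        for field in ("open", "low", "high", "close"):
            yield ("%s_dt%d" % (field.upper(), i), bar[field])


bars = [
    {"open": 112.25, "high": 114.00, "low": 97.52, "close": 109.60013},    # 2013-05-07
    {"open": 109.60013, "high": 116.777, "low": 109.50, "close": 113.20},  # 2013-05-08
]

ticks = list(worst_case_ticks_long(bars))
for label, price in ticks:
    print(label, price)
```

For a short strategy the pessimistic ordering would swap low and high.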
|
|
|
|
MtQuid
May 12, 2013, 08:12:44 PM Last edit: May 12, 2013, 08:45:48 PM by MtQuid |
|
You have to use a Panel because 'data' is a DataFrame, that is, a dict of TimeSeries. As far as I know a TimeSeries can only have one value for each time-stamped row, and that is why the previous notebook only passed the single ['price'] TimeSeries and not the rest. We need multiple values/observations (open, high, low, close, volume...) per row, so we use the Panel method. Reading the Quantopian forum and the zipline commit logs, you can see that this is the chosen and agreed-upon method for passing around OHLC sets.

I just took "adjusted" from the load_bars_from_yahoo() source and used the default values, but I'll delete that code on Monday, as Bitcoin has no splits or dividends. I was in a rush to post before the roast.

You can use whatever data you want with the simulator, but you will need to turn it into a Panel if you want to pass around multiple observations per tick, and also if you want the TradingAlgorithm to be able to issue orders in handle_data(), unless you build your own datasource tick generator, which might not be a bad thing.

Anyway, it is very easy now. handle_data() is your brains: it can view all the OHLC values per tick and issue orders (buy/sell), with the results tracked for easy analysis of performance. Everything is now possible. Add bitcoincharts JSON files with a selectable time collapse and then we are really cooking... but that is work for Monday.

Edit: the notebook has been updated from the latest DMA example shipped with the zipline source. The bugs with the extra values added and the graph arrows not showing have been resolved.

Edit: we still need to work MtGox fees into the analysis, but I'm doing that on a goxtool bot so I have the accurate code.
|
|
|
|
c0inbuster (OP)
May 12, 2013, 09:19:26 PM Last edit: May 13, 2013, 05:07:32 AM by c0inbuster |
|
Thanks for your code. We should output portfolio analysis (alpha, beta, Sharpe ratio, Sortino ratio, drawdown ...).
I didn't find a good tutorial about zipline; maybe you have some pointers? We should also add entry / exit point efficiency.
You were talking about trading fees, but there is another kind of cost that is not modeled here: the spread. Bid and ask prices for a given BTC volume are different, and the difference is called the spread (spread = ask - bid). Even if trading fees were 0%, you would lose money by buying and selling BTC simultaneously.
For now, we only have price; we don't know what the spread was at a given datetime!
Moreover, unlike the Forex market, where the spread is either fixed or time dependent, in the BTC market the spread is volume dependent: the higher the BTC volume, the higher the spread!
But that's probably only noticeable for very large BTC volumes.
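To make the volume dependence concrete, here is a small sketch that walks a made-up order book. The book levels and the helper name average_fill_price are hypothetical, not MtGox data; the point is only that a market order eating deeper into the book pays a wider effective spread.

```python
def average_fill_price(levels, volume):
    """Average price for a market order of `volume` BTC.

    levels: list of (price, size) tuples, best price first.
    """
    remaining = volume
    cost = 0.0
    for price, size in levels:
        take = min(remaining, size)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 1e-12:
        raise ValueError("not enough depth in the book")
    return cost / volume


# Hypothetical order book snapshot (price in USD, size in BTC).
asks = [(100.10, 1.0), (100.50, 5.0), (101.00, 20.0)]
bids = [(99.90, 1.0), (99.50, 5.0), (99.00, 20.0)]

effective_spreads = []
for volume in (1.0, 6.0, 26.0):
    spread = average_fill_price(asks, volume) - average_fill_price(bids, volume)
    effective_spreads.append(spread)
    print(volume, round(spread, 4))
```

Buying and immediately selling the same volume loses roughly spread * volume even with 0% fees, which is the cost described in this post.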
|
|
|
|
jbsnyder
May 13, 2013, 01:58:16 AM |
|
Quote from c0inbuster: "Thanks for your code. We should output portfolio analysis (alpha, beta, Sharpe ratio, Sortino ratio ...). I didn't find a good tutorial about zipline; maybe you have some pointers? We should also add entry / exit point efficiency."

There are some examples:
https://github.com/quantopian/zipline/tree/master/zipline/examples
But I haven't found a ton of documentation. At the moment it feels like one has to do some code skimming and use pydoc to look at the API. I haven't tried compiling the documentation in the repo, but the few files I looked at didn't seem to expand far beyond that.

Here's an example that extends MtQuid's notebook to try a method from the zipline mailing list:
http://nbviewer.ipython.org/ec53445ececcd94980b8
I'm not sure if those results are correct or not; I didn't check that the dates are actually UTC.

Quote from c0inbuster: "You were talking about trading fees, but there is another kind of cost that is not modeled here: the spread. Bid and ask prices for a given BTC volume are different, and the difference is called the spread (spread = ask - bid). Even if trading fees were 0%, you would lose money by buying and selling BTC simultaneously. For now, we only have price; we don't know what the spread was at a given datetime! Moreover, unlike the Forex market, where the spread is either fixed or time dependent, in the BTC market the spread is volume dependent: the higher the BTC volume, the higher the spread! But that's probably only noticeable for very large BTC volumes."

Yeah, this is the thing that makes me most concerned about backtesting. I don't know if there are any data sources that keep book history that could be used for this purpose either. I can think of ways to maybe get a sense for it from the data by looking for alternating jumps, but that'd be an approximation at best. The only thing that came up in a quick search regarding order book history was this other thread, which links to data from 2012:
https://bitcointalk.org/index.php?topic=88054.0

I'd be interested in other theories on how to deal with this. I'm thinking maybe either some estimated factor, or binning the data and going by the low, or using priced asks/bids rather than market orders? Or, as you suggest, perhaps we could collect some data and build a model based on volume? One could also look at trades that alternate up/down to get an idea of the spread? Any modeling would need some test data though.
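The "alternating jumps" idea could look something like the sketch below. It is a crude heuristic under a strong assumption: that a price reversal between consecutive trades is mostly a bounce between bid and ask and not a real price move. The trade prices are invented and estimate_spread_from_trades is a made-up name.

```python
from statistics import median


def estimate_spread_from_trades(prices):
    """Rough spread estimate from a list of consecutive trade prices.

    Collects the absolute price change wherever the direction flips
    (up then down, or down then up) and returns the median of those.
    Repeated identical prices are skipped before looking for flips.
    """
    changes = [b - a for a, b in zip(prices, prices[1:]) if b != a]
    bounces = [abs(cur) for prev, cur in zip(changes, changes[1:]) if prev * cur < 0]
    return median(bounces) if bounces else None


# Hypothetical trades bouncing between a bid near 100.0 and an ask near 100.2.
trades = [100.0, 100.2, 100.0, 100.2, 100.2, 100.0, 100.25, 100.05]
estimate = estimate_spread_from_trades(trades)
print(estimate)
```

As said above, this is an approximation at best: in a trending market the reversals are rarer and the estimate gets noisier, so it would still need real order book data to validate against.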
|
|
|
|
hugolp
May 13, 2013, 04:44:57 AM |
|
Quote from c0inbuster: "Because MtGox provides (to my knowledge) only an API to download each trade (and it's a very big file!)"

I understand it's a very big file and that using 15-minute candles reduces the size of the database considerably, but having each trade can help simulate the spread much better (obviously having the order book would be ideal, but that's even more data). And with today's hard drives being so cheap, is it really a problem?
|
|
|
|
c0inbuster (OP)
May 13, 2013, 07:42:13 AM Last edit: May 13, 2013, 09:34:17 AM by c0inbuster |
|
My idea of building pseudo tick prices from candlesticks can be expressed like this:

    import numpy as np

    # Upsample each daily bar into three 8-hour slots
    # (newer pandas spells this .resample('8H').mean()).
    data['BTC'] = data['BTC'].resample('8H', how='mean')
    # Slot number within the day: 0, 1, 2, 0, 1, 2, ...
    data['BTC']['ModCol'] = np.mod(np.arange(0, len(data['BTC']), 1), 3)
    # Slot 0 gets the open, slot 1 the low, slot 2 the high of the same day.
    # np.where returns a plain array, so 0.0 is passed as the fill value
    # directly instead of calling fillna on the result.
    data['BTC']['price'] = (
        np.where(data['BTC']['ModCol'] == 0, data['BTC']['open'], 0.0)
        + np.where(data['BTC']['ModCol'] == 1, data['BTC']['low'].shift(1), 0.0)
        + np.where(data['BTC']['ModCol'] == 2, data['BTC']['high'].shift(2), 0.0)
    )

In fact I build a price TimeSeries like Open Low High Open Low High ... (assuming that there is no gap between the close of the previous candle and the open of the current candle). There is probably a better way to do this! (But I'm not very good at code vectorization.)

Edit: in fact we should make a DataFrame which simulates how a candlestick is built over time:

    Time                   Open   High   Low    Close
    =================================================
    t0                     Open   Open   Open   Open
    t0 + timeframe/3       Open   Open   Low    Low
    t0 + 2*timeframe/3     Open   High   Low    High
    t1 = t0 + timeframe    Open   High   Low    Close

I'm sorry, but I don't understand what you (MtQuid) are saying here:

Quote from MtQuid: "handle_data() is your brains and it can view all the OHCL per tick and issue order(buy/sell) with the results tracked for easy analysis of performance."

I don't think handle_data manages how candlesticks are built over time...

@hugolp the problem with big data is not storing them, it's processing them.
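The candle-building table from the edit above can be written out directly, without any vectorization. This is just a sketch in plain Python; partial_candles is a made-up helper, and the bar is the 2013-05-07 row from the first post.

```python
from datetime import datetime, timedelta


def partial_candles(t0, timeframe, bar):
    """Return the four (timestamp, candle) states from the table:
    open only, then the low is reached, then the high, then the close."""
    o, h, l, c = bar["open"], bar["high"], bar["low"], bar["close"]
    return [
        (t0,                     {"open": o, "high": o, "low": o, "close": o}),
        (t0 + timeframe / 3,     {"open": o, "high": o, "low": l, "close": l}),
        (t0 + 2 * timeframe / 3, {"open": o, "high": h, "low": l, "close": h}),
        (t0 + timeframe,         {"open": o, "high": h, "low": l, "close": c}),
    ]


bar = {"open": 112.25, "high": 114.00, "low": 97.52, "close": 109.60013}
steps = partial_candles(datetime(2013, 5, 7), timedelta(days=1), bar)
for ts, candle in steps:
    print(ts, candle)
```

Each state's "close" is the pseudo tick price at that moment, so the sequence of closes is again Open, Low, High, Close.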
|
|
|
|
c0inbuster (OP)
May 13, 2013, 07:55:20 AM |
|
If you want to draw a candlestick plot in your IPython notebook, you can use this:

    import matplotlib.pyplot as plt
    from matplotlib.finance import candlestick

    fig = plt.figure()
    ax = fig.add_subplot(111, ylabel='price')
    # Use the bar number as the x value
    Date = range(1, len(data['BTC']) + 1)
    Open = data['BTC']['open'].values
    High = data['BTC']['high'].values
    Low = data['BTC']['low'].values
    Close = data['BTC']['close'].values
    Volume = data['BTC']['volume'].values
    # candlestick() expects (date, open, close, high, low, ...) tuples
    DOCHLV = zip(Date, Open, Close, High, Low, Volume)
    candlestick(ax, DOCHLV, width=0.6, colorup='g', colordown='r', alpha=1.0)

It needs to be improved, and there is also an issue with pandas:
https://github.com/pydata/pandas/issues/783
See also this notebook:
http://nbviewer.ipython.org/4982660/
It seems to be a clean way to draw candlesticks.
|
|
|
|
jbsnyder
May 13, 2013, 03:50:18 PM |
|
Stumbled on that issue before. It looks like the user who suggested a solution, even though he didn't submit a patch, has put up some of his personal charting tools:
https://github.com/dalejung/trtools
Haven't tried them yet; it seems to need a few dependencies like tables (and consequently hdf5).
|
|
|
|
jbsnyder
May 13, 2013, 05:54:11 PM |
|
Quote from c0inbuster: "Because MtGox provides (to my knowledge) only an API to download each trade (and it's a very big file!)"

Quote from hugolp: "I understand it's a very big file and that using 15-minute candles reduces the size of the database considerably, but having each trade can help simulate the spread much better (obviously having the order book would be ideal, but that's even more data). And with today's hard drives being so cheap, is it really a problem?"

Certainly. As it stands, I think the main problem with getting detailed trading data is that if you want to start from scratch it will take some time to pull it down from MtGox. There is a sqlite database with trades up to fairly recently here (and a Python script that will attempt to pull in more recent trades):
http://cahier2.ww7.be/bitcoinmirror/phantomcircuit/

Edit: the script connects to mtgox.com rather than data.mtgox.com and should be updated in order to continue getting transactions.
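A per-trade sqlite database can be collapsed into 15-minute candles in one streaming pass, which keeps memory use small even for a big file. The sketch below uses an in-memory database with a hypothetical schema (table trades with columns ts, price, amount); the real dump may well use different table and column names, and the five trades are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: unix timestamp, price in USD, amount in BTC.
conn.execute("CREATE TABLE trades (ts INTEGER, price REAL, amount REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [
        (1368400000, 117.0, 0.5),
        (1368400300, 118.5, 1.0),
        (1368400600, 116.0, 2.0),
        (1368401000, 117.5, 0.25),
        (1368401500, 119.0, 1.5),
    ],
)

BUCKET = 15 * 60  # candle length in seconds


def candles(conn, bucket=BUCKET):
    """Aggregate trades into OHLCV candles keyed by bucket start time."""
    out = {}
    cursor = conn.execute("SELECT ts, price, amount FROM trades ORDER BY ts")
    for ts, price, amount in cursor:
        start = ts - ts % bucket
        candle = out.get(start)
        if candle is None:
            out[start] = {"open": price, "high": price, "low": price,
                          "close": price, "volume": amount}
        else:
            candle["high"] = max(candle["high"], price)
            candle["low"] = min(candle["low"], price)
            candle["close"] = price
            candle["volume"] += amount
    return out


result = candles(conn)
for start in sorted(result):
    print(start, result[start])
```

Changing BUCKET gives any other timeframe from the same trade table.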
|
|
|
|
c0inbuster (OP)
May 13, 2013, 07:38:35 PM |
|
Thanks a lot for your link...
So in your opinion we could feed zipline with tick data...
But I wonder if we could have different indicators on different timeframes (M30 and H1, for example),
for example a moving average based on an M30 candlestick chart and another indicator (RSI, for example) based on an H1 candlestick chart.
|
|
|
|
btc_lurker
May 13, 2013, 08:08:34 PM |
|
Quote from c0inbuster: "Thanks a lot for your link... So in your opinion we could feed zipline with tick data... But I wonder if we could have different indicators on different timeframes (M30 and H1, for example), for example a moving average based on an M30 candlestick chart and another indicator (RSI, for example) based on an H1 candlestick chart."

I didn't really read the thread, but it seems people are using pandas here. Are you aware of the resample method? If you have real-time (ticker) data, you can resample it to any granularity very easily using pandas.
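Here is roughly what that looks like for the M30 / H1 case. A minimal sketch, assuming a recent pandas (in the pandas of 2013 the call was spelled resample('30min', how='ohlc')); the tick prices are invented.

```python
import pandas as pd

# Hypothetical tick prices, one every 10 minutes.
index = pd.date_range("2013-05-13 00:00", periods=12, freq="10min", tz="UTC")
ticks = pd.Series(
    [110.0, 111.0, 109.5, 112.0, 113.0, 112.5,
     111.0, 110.5, 114.0, 113.5, 115.0, 114.5],
    index=index,
)

# The same tick series collapsed into two different timeframes.
m30 = ticks.resample("30min").ohlc()
h1 = ticks.resample("60min").ohlc()

# One indicator per timeframe, e.g. a 2-bar moving average of M30 closes.
m30_ma = m30["close"].rolling(2).mean()

print(m30)
print(h1)
print(m30_ma)
```

An H1-based indicator (RSI or anything else) would be computed from h1 in the same way, so both timeframes can come from one tick feed.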
|
|
|
|
MtQuid
May 13, 2013, 11:16:44 PM |
|
I've not found a good tutorial on zipline, so I have just been reading the source code.

This new notebook pulls data from bitcoincharts:
http://nbviewer.ipython.org/5572250

You can use non-daily data, but the results from TradingAlgorithm.run() are daily, so you have to play around a bit at the end. The simulation will run correctly though.

It cannot place fractional orders. To fix that we will have to use MtGox order volumes, which are in satoshi.

Also, there are no buys or sells in the results when using M15, H1, etc., even though the buy or sell takes place during the simulation.
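One way to picture the satoshi workaround: express order sizes as whole numbers of satoshi (1 BTC = 100,000,000 satoshi) so that a fraction of a bitcoin becomes an integer. A minimal sketch; the helper names are made up, and how the integer is then passed to the simulator is left open.

```python
from decimal import Decimal, ROUND_DOWN

SATOSHI_PER_BTC = 10 ** 8


def btc_to_satoshi(amount_btc):
    """Convert a BTC amount (str or Decimal) to whole satoshi, rounding down."""
    units = Decimal(str(amount_btc)) * SATOSHI_PER_BTC
    return int(units.to_integral_value(rounding=ROUND_DOWN))


def satoshi_to_btc(amount_satoshi):
    """Convert whole satoshi back to a Decimal BTC amount."""
    return Decimal(amount_satoshi) / SATOSHI_PER_BTC


order_size = btc_to_satoshi("0.25")
print(order_size, satoshi_to_btc(order_size))
```

If sizes are counted in satoshi, prices have to be scaled to USD per satoshi as well, otherwise the portfolio value comes out wrong by a factor of 10^8.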
|
|
|
|
c0inbuster (OP)
May 14, 2013, 05:23:48 AM |
|
Thanks MtQuid
@btc_lurker if we feed zipline with tick data, handle_data (which is called several times) will have to resample the data several times... that's why I don't know if it's a good idea.
|
|
|
|
btc_lurker
May 14, 2013, 02:30:42 PM |
|
Quote from c0inbuster: "if we feed zipline with tick data, handle_data (which is called several times) will have to resample the data several times... that's why I don't know if it's a good idea."

Is that a poem?

You are supposed to collect data in real time, which is very different from doing analysis in real time. There is actually very little value in doing the analysis in real time. So suppose right now you have all the data ever produced by some exchange, and some new data come in. You append them to your existing data and, for example, every 5 minutes you update your analysis of this data. Note that there is also little value in using the entire history for something like an EMA, by the definition of an EMA. You still need to resample the data, but only every 5 minutes. And resampling is done very efficiently by pandas.
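The EMA remark can be shown in a few lines: the new value depends only on the previous EMA and the latest price, so there is no need to reprocess the whole history on every update. A sketch using the common smoothing factor alpha = 2 / (span + 1) and seeding the EMA with the first price; both choices are assumptions, and other conventions exist.

```python
def ema_update(previous_ema, price, span):
    """One incremental EMA step: only the last EMA value is needed."""
    alpha = 2.0 / (span + 1)
    return alpha * price + (1 - alpha) * previous_ema


def ema_full(prices, span):
    """EMA recomputed from the whole history, seeded with the first price."""
    ema = prices[0]
    for price in prices[1:]:
        ema = ema_update(ema, price, span)
    return ema


history = [109.6, 113.2, 112.8, 117.7, 113.47]
ema_so_far = ema_full(history, span=3)

# A new bar arrives: update incrementally instead of recomputing everything.
new_price = 115.0
incremental = ema_update(ema_so_far, new_price, span=3)
recomputed = ema_full(history + [new_price], span=3)
print(incremental, recomputed)
```

Both values agree, so storing the last EMA per timeframe is enough between updates.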
|
|
|
|
c0inbuster (OP)
May 14, 2013, 04:45:43 PM |
|
Quote from btc_lurker: "Is that a poem ?"

English is not my mother tongue; that's why I probably don't write as well as you might expect!

I have several problems:
- I don't see anyone here feeding zipline with tick data and computing an EMA on candlestick data inside handle_data (so that we could calculate EMAs on several timeframes).
- I also don't know whether zipline will support non-daily timeframes, or at least whether they plan to support such a feature.
- I don't know how we could feed zipline with real-time data.

Why not run the analysis each time a tick is received? This is exactly what MetaTrader does in the start() function:
http://book.mql4.com/programm/special

@knowitnothing I will have a look at your code btcx
|
|
|
|
c0inbuster (OP)
May 14, 2013, 07:07:10 PM |
|
A new poem for you guys...
http://nbviewer.ipython.org/587a80f5e2eb9cf41d6d

Some features:
- Stop Loss
- Take Profit
- Trailing Stop
- BreakEven (set as percentage of price)

Max drawdown is now 1.9 %, alpha: 23.52%
Previous values were: alpha: 16.34%, max_drawdown: 12.55%

But I'm sure that our backtesting is wrong, because we are only using price (and never low_price).

ToDo:
- variable lot size (depending on portfolio value) to keep the same risk for each trade
- output data such as trade entry efficiency / exit efficiency
- sub-daily timeframe trading (M15, M30, H1...) (see zipline team?)
- add a real-time feed (goxtool or btcx code could help)
- use a file to store Stop Loss/Take Profit values (we should also consider that several positions could be open; that's not the case in this strategy, but it could be in a more sophisticated one), so we need to store a trade identifier (trade number)
- we should also add a MagicNumber to identify that a given trade has been opened by a given strategy: http://www.onestepremoved.com/magic-number/
- use scipy optimize to optimize parameters
- divide data into 2 parts (data for optimizing parameters, data for testing parameters) in order to ensure that settings are robust
- walk forward analysis...
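Why testing stops against a single price is too optimistic can be seen on one bar from the data in the first post. This is a sketch and not the notebook's code: check_exit_long and trail_stop are made-up helpers, the stop and target levels are arbitrary, and the stop loss is checked before the take profit as a deliberate worst-case assumption.

```python
def check_exit_long(bar, stop_loss, take_profit):
    """Return (reason, price) if a long position exits during this bar, else None.

    Worst case for a long: the stop loss is tested before the take profit.
    """
    if bar["low"] <= stop_loss:
        return ("stop_loss", stop_loss)
    if bar["high"] >= take_profit:
        return ("take_profit", take_profit)
    return None


def trail_stop(stop_loss, bar, distance):
    """Raise the stop to trail the bar's high by `distance`; never lower it."""
    return max(stop_loss, bar["high"] - distance)


# 2013-05-07 daily bar from the Quandl data quoted in the first post.
bar = {"open": 112.25, "high": 114.00, "low": 97.52, "close": 109.60013}
# The same day as a price-only backtest sees it: one value, no high or low.
price_only = {"open": 109.60013, "high": 109.60013, "low": 109.60013, "close": 109.60013}

with_ohlc = check_exit_long(bar, stop_loss=105.0, take_profit=120.0)
with_price = check_exit_long(price_only, stop_loss=105.0, take_profit=120.0)
print(with_ohlc, with_price)
print(trail_stop(105.0, bar, distance=5.0))
```

With the full bar the stop at 105 is hit; with the single price it never triggers. Even the OHLC version is kind to the strategy, since it fills exactly at the stop level and ignores gaps and slippage.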
|
|
|
|
mebi
May 15, 2013, 12:25:40 AM |
|
Really fascinating stuff; I'm playing with these examples to try and get up to speed. c0inbuster, I get an error:

    print "{dt}: hit BreakEvent - moving Stop Loss from {SL1} to {SL2}".format(size=size, dt=data['BTC'].datetime, SL1=self.price_SL, SL2=self.price_BE_offset)
    NameError: global name 'size' is not defined
Am I correct to assume it should be ?