Title: zipline / Quantopian - backtesting / trading framework
Post by: c0inbuster on May 12, 2013, 08:29:44 AM

Hello,

I'm starting a new thread here about zipline / Quantopian. It's an event-driven Python trading framework that can be used for backtesting strategies.
https://bitcointalk.org/index.php?topic=148462.msg2105722#msg2105722
http://vimeo.com/53064082

If you want to try it, you should run IPython with pylab inline:
Code:
ipython notebook --pylab inline

MtQuid posted a Python notebook here:
https://bitcointalk.org/index.php?topic=148462.msg2116508#msg2116508
http://nbviewer.ipython.org/5561936

I'm posting here to avoid overloading the goxtool thread (goxtool is ncurses Python software for trading BTC on MtGox).

I have some questions about zipline...

First, I noticed that the data (daily MtGox BTC/USD data) comes from
http://www.quandl.com/api/v1/datasets/BITCOIN/MTGOXUSD.csv?trim_start=2012-01-01&sort_order=asc
( http://www.quandl.com/BITCOIN-Bitcoin-Charts/MTGOXUSD-Bitcoin-Markets-mtgoxUSD )
with raw data from http://bitcoincharts.com/charts/chart.json?m=mtgoxUSD
Code:
open high low close volume volume_usd price

Note: the data needs to be sorted by ascending index; otherwise you will get this error message:
Code:
AssertionError: Period start falls after period end.

I wonder what "weighted price" is (it is renamed to price). This notebook seems to use the "weighted price" to simulate a kind of tick data. In my opinion it would be much better to simulate each price that has been seen on the market (open, high, low, close), because if you go long and set a Stop Loss, it will probably be hit by the low price (and if you go short, it will probably be hit by the high price).

Second, I have a problem running the notebook (I always get a (*)), so I'm running the code without the notebook: http://pastebin.com/jmfuNTKs

Third, I wonder why I don't see the buy/sell markers (^ and v).

Fourth, what about day trading (with an M15 timeframe)?
Some data is here: https://bitcointalk.org/index.php?topic=199979.0 or https://bitcointalk.org/index.php?topic=196834.0

Unfortunately I'm quite busy today ;-(

Kind regards

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: MtQuid on May 12, 2013, 12:47:45 PM

Yeah, I'm also wondering about not seeing the buy/sell markers (^ and v).
I think this ties in with me having to add extra values to those two series. I finished that stuff off drunk last night... but the charts show it would have made a profit :P

I've now updated the notebook to use OHLC and it works. I took the code from load_bars_from_yahoo() so we can use stuff like data['BTC']['open'] within the handler. Works well.

Almost there.... I also agree that we need to use a better data source and that should probably be bitcoincharts.

Bots just puke up machine language. Time to talk to some humans down the pub. Sunday Roast!!! :)

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: hugolp on May 12, 2013, 02:36:06 PM

Quote
Almost there.... I also agree that we need to use a better data source and that should probably be bitcoincharts.

Why not mtgox itself?

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: c0inbuster on May 12, 2013, 07:30:08 PM

Because MtGox only provides (to my knowledge) an API to download each trade
(and it's a very big file!)

About the latest version of MtQuid's notebook... http://nbviewer.ipython.org/5561936
I don't understand why it uses a pandas Panel.
I also don't understand the purpose of "adjusted".
I think we just need to resample the data.

A very basic idea (to test a long strategy) could be to send prices as follows:
OPEN_dt0 LOW_dt0 HIGH_dt0 CLOSE_dt0 OPEN_dt1 LOW_dt1 HIGH_dt1 CLOSE_dt1 ...
This lets us consider the worst case: if we set a stop loss and a take profit in the simulator, the price will first move in the direction of the stop loss and only afterwards in the direction of the take profit.

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: MtQuid on May 12, 2013, 08:12:44 PM

You have to use a Panel because 'data' is a DataFrame, a dict of TimeSeries. As far as I know a TimeSeries can only have one value for each time-stamped row, and that is why the previous notebook only passed the single ['price'] TimeSeries and not the rest. We need multiple values/observations (open, high, low, close, volume...) per row in the TimeSeries, so we use the Panel method. Reading the Quantopian forum and zipline commit logs you can see that this is the chosen and agreed-upon method for passing around OHLC sets.
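A minimal sketch of the two ideas above, not taken from either notebook: the bars are kept as a plain DataFrame with one open/high/low/close row per timestamp (a stand-in for the Panel, which later pandas releases dropped), and ohlc_to_pseudo_ticks (a hypothetical helper name) flattens them into the worst-case open, low, high, close order for a long strategy. Spacing the four prices evenly inside each bar is an assumption made only for illustration, and the prices are made up.

```python
import pandas as pd


def ohlc_to_pseudo_ticks(bars: pd.DataFrame) -> pd.Series:
    """Flatten OHLC bars into one price series ordered open, low, high, close.

    Hypothetical helper: assumes at least two equally spaced bars.
    """
    order = ["open", "low", "high", "close"]  # worst case for a long position
    # Assumption: the four prices are spread evenly inside each bar's period.
    step = (bars.index[1] - bars.index[0]) / len(order)
    pieces = []
    for i, column in enumerate(order):
        prices = bars[column].copy()
        prices.index = bars.index + i * step  # shift each price inside the bar
        pieces.append(prices)
    return pd.concat(pieces).sort_index()


# Made-up daily bars, one row of observations per timestamp.
bars = pd.DataFrame(
    {
        "open": [100.0, 104.0],
        "high": [110.0, 108.0],
        "low": [95.0, 101.0],
        "close": [104.0, 107.0],
    },
    index=pd.to_datetime(["2013-05-01", "2013-05-02"]),
)

ticks = ohlc_to_pseudo_ticks(bars)
print(ticks)
```

With this ordering a simulated stop loss below the open is always tested against the low before the high of the same bar is seen, which is the pessimistic case described above for long trades. A short strategy would need the high before the low.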
You can use whatever data you want with the simulator, but you will need to turn it into a Panel if you want to pass around multiple observations per tick, and also if you want the TradingAlgorithm to be able to issue orders in handle_data(), unless you build your own datasource tick generator, which might not be a bad thing.

Anyway, it is very easy now. handle_data() is your brains: it can view all the OHLC values per tick and issue orders (buy/sell), with the results tracked for easy analysis of performance. Everything is now possible.

Add bitcoincharts JSON files with selectable time collapse and then we are really cooking... but that is work for Monday.

Edit: the notebook (http://nbviewer.ipython.org/5561936) has been updated from the latest DMA example shipped with the zipline source. The bugs with the extra values added and the graph arrows not showing have been resolved.

Edit: we still need to work MtGox fees into the analysis, but I'm doing that on a goxtool bot so I have the accurate code.

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: c0inbuster on May 12, 2013, 09:19:26 PM

Thanks for your code.
We should output portfolio analysis (alpha, beta, Sharpe ratio, Sortino ratio, drawdown...). I didn't find a good tutorial about zipline; maybe you have a pointer for me? We should also add entry / exit point efficiency.

You were talking about trading fees... but there is another kind of cost that is not modeled here: the spread. The bid and ask prices for a given BTC volume are different! The difference is called the spread: spread = ask - bid. Even if trading fees were 0%, you would lose money buying and selling BTC simultaneously. For now, we only have the price... we don't know what the spread was at a given datetime! Moreover, unlike the Forex market, where the spread is either fixed or time dependent, in the BTC market the spread is volume dependent: the higher the BTC volume, the higher the spread! But that's probably only noticeable for very big BTC volumes.

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: jbsnyder on May 13, 2013, 01:58:16 AM

Quote
Thanks for your code. We should output portfolio analysis (alpha, beta, Sharpe ratio, Sortino ratio...). I didn't find a good tutorial about zipline; maybe you have a pointer for me? We should also add entry / exit point efficiency.

There are some examples: https://github.com/quantopian/zipline/tree/master/zipline/examples
But I haven't found a ton of documentation. At the moment it feels like one might be doing some code skimming and pydoc usage to look at the API. I haven't tried compiling the documentation in the repo, but the few files I looked at didn't seem to expand far beyond that.

Here's an example that extends MtQuid's notebook to try a method from the zipline mailing list: http://nbviewer.ipython.org/ec53445ececcd94980b8
I'm not sure if those are correct or not; I didn't check that the dates are actually UTC.

Quote
You were talking about trading fees... but there is another kind of cost that is not modeled here: the spread. The bid and ask prices for a given BTC volume are different! The difference is called the spread: spread = ask - bid. Even if trading fees were 0%, you would lose money buying and selling BTC simultaneously. For now, we only have the price... we don't know what the spread was at a given datetime! Moreover, unlike the Forex market, where the spread is either fixed or time dependent, in the BTC market the spread is volume dependent: the higher the BTC volume, the higher the spread! But that's probably only noticeable for very big BTC volumes.

Yeah, this is the thing that makes me the most concerned about backtesting. I don't know if there are any data sources that keep book history that could be used for this purpose either. I can think of ways to maybe get a sense for it from the data by looking for alternating jumps in the data, but that'd be an approximation at best. The only thing that came up in a quick search regarding order book history was this other thread, which links to data from 2012: https://bitcointalk.org/index.php?topic=88054.0

I'd be interested in other theories on how to deal with this. I'd be thinking maybe either some estimated factor, or binning the data and going by the low, or doing priced asks/bids rather than market orders? Or, as you suggest, perhaps we could collect some data and build a model based on volume? One could also look at trades that alternate up/down to get an idea of the spread? Any modeling would need some test data though.

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: hugolp on May 13, 2013, 04:44:57 AM

Quote
Because MtGox only provides (to my knowledge) an API to download each trade (and it's a very big file!)

I understand it's a very big file and that using 15-minute candles reduces the size of the database considerably, but having each trade can help simulate the spread much better (obviously having the order book would be ideal, but that's even more data). And with today's hard drives being so cheap, is it really a problem?
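A rough sketch of the "alternating jumps" idea mentioned above, under heavy assumptions: if consecutive trades bounce between the bid and the ask, a price move that the very next trade reverses is roughly one spread wide. bounce_spread is a hypothetical helper, the prices are made up, and this is only a crude proxy, not a substitute for real order book data.

```python
import numpy as np


def bounce_spread(prices):
    """Median size of price moves that the next trade immediately reverses.

    Hypothetical heuristic: assumes such reversals are bid/ask bounces.
    Returns NaN when no reversal is found.
    """
    diffs = np.diff(np.asarray(prices, dtype=float))
    # A move counts as a bounce when the following move has the opposite sign.
    reversed_next = diffs[:-1] * diffs[1:] < 0
    samples = np.abs(diffs[:-1][reversed_next])
    return float(np.median(samples)) if samples.size else float("nan")


# Made-up trade prices: bouncing around 100, a jump, then bouncing around 102.
trade_prices = [100.0, 100.5, 100.0, 100.5, 100.1, 100.6, 102.0, 102.5, 102.0]
estimate = bounce_spread(trade_prices)
print(estimate)
```

Using the median keeps the genuine jump from 100.6 to 102.0 from inflating the estimate. The volume dependence of the spread discussed above is not captured at all; that would need the trade sizes or the order book.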
Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: c0inbuster on May 13, 2013, 07:42:13 AM

My idea of building pseudo tick prices from candlesticks can be expressed like this:
Code:
data['BTC'] = data['BTC'].resample('8H', how='mean')
Code:
Open

There is probably a better way to do this! (But I'm not very good at code vectorization.)

Edit: in fact we should make a DataFrame that simulates candlesticks, trying to build
Code:
Time Open High Low Close

I'm sorry, but I don't understand what you (MtQuid) are saying:
Quote
handle_data() is your brains and it can view all the OHCL per tick and issue order(buy/sell) with the results tracked for easy analysis of performance.

I don't think handle_data manages how candlesticks are built over time...

@hugolp the problem with big data is not storing it... it's processing it...

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: c0inbuster on May 13, 2013, 07:55:20 AM

If you want to draw a candlestick plot in your IPython notebook you can use this
Code:
from matplotlib.finance import *

but it needs to be improved. There is also a related pandas issue: https://github.com/pydata/pandas/issues/783
See also this notebook, http://nbviewer.ipython.org/4982660/ which seems to be a clean way to draw candlesticks.

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: jbsnyder on May 13, 2013, 03:50:18 PM

Quote
There is also a related pandas issue: https://github.com/pydata/pandas/issues/783 See also this notebook, http://nbviewer.ipython.org/4982660/ which seems to be a clean way to draw candlesticks.

Stumbled on that issue before. It looks like the user who suggested a solution, even though he didn't submit a patch, has put up some of his personal charting tools: https://github.com/dalejung/trtools
Haven't tried them yet; it seems to need a few dependencies like tables (and consequently hdf5).

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: jbsnyder on May 13, 2013, 05:54:11 PM

Quote
Because MtGox only provides (to my knowledge) an API to download each trade (and it's a very big file!)

Quote
I understand it's a very big file and that using 15-minute candles reduces the size of the database considerably, but having each trade can help simulate the spread much better (obviously having the order book would be ideal, but that's even more data). And with today's hard drives being so cheap, is it really a problem?

Certainly. As it stands, I think the main problem with getting detailed trading data is that if you want to start from scratch it will take some time to pull it down from mtgox. There is a sqlite database with fairly recent trades here (and a Python script that will attempt to pull in more recent trades): http://cahier2.ww7.be/bitcoinmirror/phantomcircuit/

Edit: the script connects to mtgox.com rather than data.mtgox.com and should be updated in order to continue getting transactions.
Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: c0inbuster on May 13, 2013, 07:38:35 PM

Thanks a lot for your link...

So in your mind we could feed zipline with tick data... But I wonder if we could have different indicators on different timeframes (M30 and H1 for example): for example a moving average based on an M30 candlestick chart and another indicator (RSI for example) based on an H1 candlestick chart.

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: btc_lurker on May 13, 2013, 08:08:34 PM

Quote
Thanks a lot for your link... So in your mind we could feed zipline with tick data... But I wonder if we could have different indicators on different timeframes (M30 and H1 for example): for example a moving average based on an M30 candlestick chart and another indicator (RSI for example) based on an H1 candlestick chart.

I didn't really read the thread, but it seems people are using pandas here. Are you aware of the resample method that is available? If you have real-time (ticker) data, you can resample it to any granularity very easily using pandas.

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: MtQuid on May 13, 2013, 11:16:44 PM

I've not found a good tutorial on zipline so have just been reading the source code.
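For the resample suggestion made earlier in the thread, a minimal sketch with current pandas, independent of zipline: the same trade series is resampled into M30 and H1 bars so that different indicators can run on different timeframes. The trades are synthetic, to_bars is a hypothetical helper, and only a moving average on M30 is shown (an RSI on H1 would be computed from h1["close"] in the same way).

```python
import numpy as np
import pandas as pd

# Synthetic trades: one every 5 minutes for 6 hours (assumption: real data
# would come from an exchange feed or a trade database).
index = pd.date_range("2013-05-13 00:00", periods=72, freq="5min")
trades = pd.DataFrame({"price": 100.0 + np.arange(72), "volume": 1.0}, index=index)


def to_bars(trades: pd.DataFrame, rule: str) -> pd.DataFrame:
    """Resample trades into OHLC bars plus summed volume (hypothetical helper)."""
    bars = trades["price"].resample(rule).ohlc()
    bars["volume"] = trades["volume"].resample(rule).sum()
    return bars


m30 = to_bars(trades, "30min")  # 30-minute candlesticks
h1 = to_bars(trades, "60min")   # 1-hour candlesticks

# Indicator on the M30 timeframe: 3-bar simple moving average of the close.
m30_ma = m30["close"].rolling(3).mean()

print(m30.head())
print(h1.head())
```

Both bar sets come from the same underlying trades, so they can be rebuilt at any granularity without fetching the data again.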
This new notebook pulls data from bitcoincharts: http://nbviewer.ipython.org/5572250

You can use non-daily data, but the results from TradingAlgorithm.run() are daily, so you have to play around a bit at the end. The simulation will run correctly though.

You cannot place fractional orders. To fix this we will have to use MtGox order volumes, which are in satoshi.

Also, there are no buys or sells in the results when using M15, H1, etc., even though the buy or sell takes place during the simulation.

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: c0inbuster on May 14, 2013, 05:23:48 AM

Thanks MtQuid
@btc_lurker if we feed zipline with tick data, handle_data (which is called several times) will have to resample the data several times... that's why I don't know if it's a good idea....

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: btc_lurker on May 14, 2013, 02:30:42 PM

Quote
if we feed zipline with tick data, handle_data (which is called several times) will have to resample the data several times... that's why I don't know if it's a good idea....

Is that a poem?

You are supposed to collect data in real time, which is very different from doing analysis in real time. There is actually very little value in doing the analysis in real time. So suppose right now you have all the data ever produced by some exchange, and some new data comes in. You append it to your existing data and, for example, every 5 minutes you update your analysis of this data. Note that there is also little value in using the entire history for something like EMAs, by the definition of an EMA. You still need to resample the data, but only every 5 minutes. And resampling is done very efficiently by pandas.

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: c0inbuster on May 14, 2013, 04:45:43 PM

Quote
Is that a poem?

English is not my mother tongue, which is probably why I can't write as well as you might expect!

I have several problems:
- I don't see anyone here feeding zipline with tick data and computing an EMA on candlestick data inside handle_data (so that we could calculate EMAs on several timeframes).
- I also don't know whether zipline will support non-daily timeframes, or at least whether they plan to support such a feature.
- I don't know how we could feed zipline with real-time data.

Why not run the analysis each time a tick is received... This is exactly what MetaTrader does in its start() function: http://book.mql4.com/programm/special

@knowitnothing I will have a look at your btcx code.

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: c0inbuster on May 14, 2013, 07:07:10 PM

A new poem for you guys...
http://nbviewer.ipython.org/587a80f5e2eb9cf41d6d

Some features:
- Stop Loss
- Take Profit
- Trailing Stop
- BreakEven (set as a percentage of price)

Max drawdown is now 1.9% and alpha is 23.52%.
Previous values were: alpha: 16.34%, max_drawdown: 12.55%.

But I'm sure that our backtesting is wrong... because we are only using price (and never low_price).

ToDo:
- variable lot size (depending on portfolio value) to keep the same risk for each trade
- output data such as trade entry efficiency / exit efficiency
- sub-daily timeframe trading (M15, M30, H1...) (see zipline team?)
- add a real-time feed (goxtool or btcx code could help)
- use a file to store Stop Loss / Take Profit values (we should also consider that we could have several positions open; that's not the case in this strategy but it could be in a more sophisticated one), so we need to store a trade identifier (trade number)
- we should also add a MagicNumber to identify which strategy opened a given trade: http://www.onestepremoved.com/magic-number/
- use scipy optimize to optimize parameters
- divide the data into 2 parts (data for optimizing parameters, data for testing parameters) in order to ensure that the settings are robust
- walk forward analysis...

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: mebi on May 15, 2013, 12:25:40 AM

Really fascinating stuff, playing with these examples to try and get up to speed.
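On the concern that the backtest only looks at price and never at the low: a small sketch, not the code from the gist, of how a long position's Stop Loss / Take Profit could be checked against a whole bar. long_exit_price is a hypothetical helper, the bar is a plain dict, and the pessimistic rule (the stop wins when both levels fall inside the same bar) is an assumption.

```python
def long_exit_price(bar, stop_loss, take_profit):
    """Return the exit price for an open long position on this bar, or None.

    Hypothetical helper. Pessimistic assumption: if both levels lie inside
    the bar's range, the stop loss is treated as hit first.
    """
    if bar["low"] <= stop_loss:
        # If the bar opened below the stop (a gap), fill at the open instead.
        return min(bar["open"], stop_loss)
    if bar["high"] >= take_profit:
        # If the bar opened above the target, fill at the open instead.
        return max(bar["open"], take_profit)
    return None  # neither level was touched; the position stays open


bar = {"open": 100.0, "high": 110.0, "low": 95.0, "close": 104.0}

stopped = long_exit_price(bar, stop_loss=96.0, take_profit=109.0)
target = long_exit_price(bar, stop_loss=90.0, take_profit=109.0)
still_open = long_exit_price(bar, stop_loss=90.0, take_profit=120.0)
print(stopped, target, still_open)
```

In the first call the close (104) is above the stop, so a check based on a single price per bar would miss the exit that the low (95) triggers.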
Quote
A new poem for you guys... http://nbviewer.ipython.org/587a80f5e2eb9cf41d6d

c0inbuster, I get an error:
Code:
print "{dt}: hit BreakEvent - moving Stop Loss from {SL1} to {SL2}".format(size=size, dt=data['BTC'].datetime, SL1=self.price_SL, SL2=self.price_BE_offset)

Am I correct to assume it should be
Code:
size=self.trade_volume

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: c0inbuster on May 15, 2013, 04:45:07 AM

Quote
Am I correct to assume it should be
Code:
size=self.trade_volume

No, it's
Code:
size=self.invested
(because of commission fees). That's fixed in my gist now...

If you have some coding ability, could you try to implement this: http://www.onestepremoved.com/backtesting-efficiency/

Quote
Really fascinating stuff, playing with these examples to try and get up to speed.

Overfitting a backtest is really easy... but it does not reflect future results: https://www.google.fr/search?q=backtest+overfitting

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: c0inbuster on May 15, 2013, 11:24:12 AM

New feature:
variable lot size (instead of fixed lot size)
self.trade_volume is now a function which returns the volume to trade according to portfolio cash and the BTCUSD price.
http://nbviewer.ipython.org/587a80f5e2eb9cf41d6d
alpha = 138.7%
max DD = 7.5%
Should I order the Rolls-Royce? ;D

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: MtQuid on May 15, 2013, 12:18:15 PM

Quote
New feature: variable lot size (instead of fixed lot size). self.trade_volume is now a function which returns the volume to trade according to portfolio cash and the BTCUSD price. http://nbviewer.ipython.org/587a80f5e2eb9cf41d6d alpha = 138.7%, max DD = 7.5%. Should I order the Rolls-Royce? ;D

Nice work.... if only we could go back in time :)

I think there were some missing rows in the method I used to pull from bitcoincharts, so I've made some modifications and added a cache that stays up to date. I've also included frequency information, but 'minute' runs are still not working correctly. We might have to tamper with zipline internals to get them going. You should update your work to at least use the better bitcoincharts methods. In my testing I now keep an up-to-date cache of minute data from bitcoincharts and resample that to how I want it.

Fractional trades are still an issue. Possible fixes:
- Use satoshi, but then the prices will all have to be suitably scaled and the results will just look a mess.
- Bump the initial portfolio value up to a few billion and increase the trade volume so that there will be no fractions, but then slippage is very unrealistic.
- Modify zipline internals or use an inherited object.

About passing live trade data to an algorithm: I'm thinking of creating a new TradingAlgorithm object that is passed a DatetimeIndex, takes trade data from the phantom sqlite3 db, creates OHLC and passes both simulated live data and OHLC to an algorithm under test. So the wrapper gets a datetime and frequency from zipline, extracts the data from the phantom db, passes this to the algo under test and then passes the results back to zipline. Just an idea, but this might also be a way to fix the fractional trades issue.

Updated - http://nbviewer.ipython.org/5572250

Edit: In fact even if the zipline results records are always daily it does not matter, because we can put minute results into something else, or accumulate them for a day. It depends on what you want to log in the results. Currently, I guess, the same record gets overwritten for each minute in a day, so only the last record of the day is saved in the results.

Edit: All working now. I was not using ohlc resampling.

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: c0inbuster on May 15, 2013, 03:33:33 PM

Quote
Nice work.... if only we could go back in time

zipline uses Delorean... http://delorean.readthedocs.org/en/latest/quickstart.html maybe it could help ;)

+1 for splitting the work into 2 parts:
- a data producer (which stores OHLC values into a database)
- a data consumer, which reads the data, shows prices and applies the strategy

But in that case I wonder how you inform the data consumer that new data has just come in... Maybe in your idea the data consumer only reacts every 5 minutes... That's quite different from the MetaTrader start function, which is launched on every tick...

But when I say "the MetaTrader start function is launched on every tick", I really mean it is launched on every tick **if that's possible** (if the previous start call has finished). If the previous start call has not finished, start will not be executed again even if a new tick comes in. (MetaTrader shows the new price from the last tick in the GUI, but the expert advisor's start function is not executed again.)

So I think a kind of signal/slot mechanism is needed, but we also need a kind of "lock" mechanism.

I hope you understand what I mean...
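The behaviour described above (run the strategy on a tick only if the previous run has finished, but always keep the latest price) can be sketched with a non-blocking lock. TickDispatcher and strategy are hypothetical names; this is neither MetaTrader's nor zipline's implementation, just one way to get that "skip while busy" effect in Python.

```python
import threading


class TickDispatcher:
    """Call a handler on each tick unless the previous call is still running."""

    def __init__(self, handler):
        self._handler = handler
        self._busy = threading.Lock()
        self.last_price = None  # always updated, like the price shown in a GUI
        self.skipped = 0        # ticks for which the handler was not run

    def on_tick(self, price):
        self.last_price = price
        # Non-blocking acquire: fails at once if the handler is still running.
        if not self._busy.acquire(blocking=False):
            self.skipped += 1
            return False
        try:
            self._handler(price)
            return True
        finally:
            self._busy.release()


seen = []


def strategy(price):
    seen.append(price)
    if price == 100.0:
        # Simulate a new tick arriving while this call is still in progress.
        dispatcher.on_tick(101.0)


dispatcher = TickDispatcher(strategy)
dispatcher.on_tick(100.0)
dispatcher.on_tick(102.0)
print(seen, dispatcher.skipped, dispatcher.last_price)
```

Here the tick at 101.0 arrives while the handler is busy, so it updates last_price but does not trigger the strategy. With a real feed, on_tick would be called from the thread or callback that receives the data (the "signal"), and the lock plays the role of the "lock" mechanism.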
Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: monsterer on May 15, 2013, 04:40:52 PM

I don't want to be disparaging, but I found that just using OHLCV was giving very misleading results for bitcoin when testing algorithms.

The reason is liquidity. The top-of-the-book Bid/Ask values often don't have enough volume associated with them to be tradable: in real life you'd get a partial fill on your orders, or no fill at all.

You really need the entire order book, which is what I ended up capturing. This perfectly captures the liquidity of the market for the period tested. It requires a lot of data, but storage is cheap. :)

Cheers, Paul.

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: c0inbuster on May 15, 2013, 04:50:02 PM

Don't worry... I don't consider you as disparaging us...

A volume-dependent model for the spread could help... Maybe you can help us get one (using historical order book depth).

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: monsterer on May 15, 2013, 05:01:56 PM

Quote
Don't worry... I don't consider you as disparaging us... A volume-dependent model for the spread could help... Maybe you can help us get one (using historical order book depth).

I'm not sure how you could approximate the order book well enough with just two values. It works fine for forex symbols because of the huge liquidity, but bitcoin is a different kettle of fish. Like I say, I just store the whole order book every 10 seconds. For total accuracy it really wants to be tick by tick, though :)

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: c0inbuster on May 15, 2013, 05:44:53 PM

Could you provide such data?

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: monsterer on May 15, 2013, 06:11:24 PM

Quote
Could you provide such data?

Unfortunately, I've only been collecting it for the last 6 days. I had been collecting for 1 month last year, but there was a big gap between that and now, so I don't have anything worthwhile to give you :|

The best thing to do is to start collecting it now; then if you decide you need it, you've got it :)

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: c0inbuster on May 16, 2013, 07:06:24 AM

Live trading / paper trading with Quantopian
http://blog.quantopian.com/paper-trading-and-live-trading/

Title: Re: zipline / Quantopian - backtesting / trading framework
Post by: c0inbuster on May 16, 2013, 02:56:31 PM

The zipline / Quantopian developers seem to be interested in this thread.
You can also share your experiences at https://groups.google.com/forum/#!topic/zipline/M39VhqDRORM