Evaluating parameters for EMA crossover trading: A rigorous approach (kind of)

Preliminary note #1 I originally posted this in Goomboo's thread, which started out a long time ago as an introduction to the EMA20+10 crossover method, and has since then turned to the more general topic of backtesting (average based) trading strategies. If you're unsure about the basic idea behind EMA based trading strategies, read his thread. Goomboo wrote an excellent index for it a while ago, making it easy to navigate.

Preliminary note #2 This post should be self-contained, but I should make special mention of the software I used for backtesting: Gekko, available on GitHub. It is easy to install and use, and highly recommended if you plan to use an EMA crossover strategy yourself. Among other things, it calculates the averages (and potential crossovers) in real-time, so you will receive trading signals without delay, and it allows you to set thresholds for the crossovers, which is extremely important for the 1 hour variant of the EMA method. Many thanks to Mike van Rossum.

Preliminary note #3 This is going to be an extremely long post. One day I will be able to write and explain my ideas in a more concise manner. Today is not that day. Consider yourself warned. There's a summary at the end that contains all the important bits, feel free to skip the rest.

Introduction

The following test of different EMA strategies was based on a few simple assumptions that can be summarized as follows:

(1) Recent performance (i.e. the result of back-testing on recent historical data) is a much greater factor in determining future profits than performance on older historical data.

(2) The simple parameter search we employ seems to favor curve fitting, which makes future performance of the parameters doubtful. Therefore, we need to back-test across different time segments to confirm results found in recent history.

(3) Trading is more expensive than previously assumed. It is necessary to take into account not only the rather high trading fees themselves, but also slippage while executing a trade, and possibly even the delay between the EMA method signaling a trade and the trader executing it, which tends to cut into profits.

Methodology

The obvious parts: I'm using Gekko for back-testing, performing a search for optimal parameters for EMA crossover methods, yadda yadda yadda. All of the following is based on mtgox data.

I didn't perform an exhaustive search across those main EMA parameters, but only plugged in those values that had been previously established as profitable in this thread. In particular, I need to thank ErebusBat for his August 10 post, highlighting the profitable parameter ranges.

I only tested *hourly* and *daily* variants of the EMA method. A very sloppy test on a 12h interval gave pretty bad results. I suspect that a 2h version of EMA crossovers (with different parameters of course) might perform quite well. I chose '1 hour' and '1 day' because I wanted to see results from both ends of the usable range of time periods: faster than 1h is almost certainly too sensitive, slower than 1d is probably too slow for the pretty volatile Bitcoin market.

Time period(s)

Like I said above, I believe it makes sense to put more weight on recent performance when trying to find optimal parameters for an EMA crossover trading strategy. The reason is simple: say a method performed spectacularly well during 2011 and 2012, but fails to generate any profit in the current year. It would be rather naive to simply look at the aggregate profit and conclude that the method will perform with that (aggregate) profit on average in the *future*. Basically, it would be nice if our method had generated money not only in the distant past but also in the last few months, right?

I decided to start the primary backtest after the April 10 peak, for the reason that the (double?) exponential growth during the weeks before April was unusual, even for the fast bitcoin price movement. So post-peak it is. I based the decision where to start exactly after the peak on Bollinger Bandwidth (2h interval). Volatility around and immediately after the April 10 peak (initial correction) was extremely high, but around May 4th it went back to normal. Well, bitcoin normal. So the "recent history" is 2013-05-04 to 2013-08-22 (110 days). Almost 4 months, enough data to chew on in my opinion.
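The Bandwidth check can be sketched roughly like this. I'm assuming the conventional definition (upper band minus lower band, divided by the middle band) with a 20-candle window and 2 standard deviations. The post doesn't say which settings were used, so treat `window`, `num_std` and the use of the population standard deviation as placeholders.

```python
from statistics import mean, pstdev


def bollinger_bandwidth(closes, window=20, num_std=2.0):
    """Return (upper - lower) / middle for every full window of closing prices.

    window and num_std are conventional defaults, not values from the post.
    """
    widths = []
    for end in range(window, len(closes) + 1):
        chunk = closes[end - window:end]
        middle = mean(chunk)               # simple moving average = middle band
        spread = num_std * pstdev(chunk)   # distance from middle to each band
        widths.append((2 * spread) / middle)
    return widths
```

Fed with 2h closing prices, a series like this should show the spike around the April peak and the return to "bitcoin normal" afterwards. Where exactly to draw the line is still a judgment call.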

In addition, I needed another period to backtest the backtesting results on. For that I chose a time period going back approximately 1 year from the end of the recent history, so the "entire history" is: 2012-10-22 to 2013-08-22 (304 days). Yes, they overlap. Deal with it.

Benchmark

We need something to compare our results against, right? "Buy & hold" (from now on: B&H) is the obvious candidate. However, B&H is problematic. Why? You'll notice that when proponents of B&H on this forum talk about its profits, they assume you bought in at a really low price point, like in early January, or at the lowest point after the bubble burst in April. Which is bullshit of course. That's not "B&H", that's "magically knowing the local minimum", and if you can do that, you don't need trading advice anyway, you're already the richest guy on Earth.

So let's define B&H for our purposes: B&H profit is calculated from the volume weighted 12h price in the middle of the last day of our testing period, divided by the same kind of price halfway through the first day of the testing period (minus 1, to turn the ratio into a profit).
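Here's a minimal sketch of that benchmark calculation. It assumes trades come as `(timestamp, price, volume)` tuples and reads "12h price in the middle of the day" as a 12-hour window centered on noon. Both the data layout and that reading are my assumptions, and the function names are made up for illustration.

```python
from datetime import datetime, timedelta


def vwap(trades, center, hours=12):
    """Volume-weighted average price of trades in a window centered on `center`."""
    half = timedelta(hours=hours / 2)
    window = [(p, v) for t, p, v in trades if center - half <= t < center + half]
    volume = sum(v for _, v in window)
    if volume == 0:
        raise ValueError("no trades in window")
    return sum(p * v for p, v in window) / volume


def buy_and_hold_profit(trades, first_day, last_day):
    """Relative B&H profit between midday of first_day and midday of last_day.

    first_day and last_day are datetimes at midnight; 0.10 means +10%.
    """
    start = vwap(trades, first_day + timedelta(hours=12))
    end = vwap(trades, last_day + timedelta(hours=12))
    return end / start - 1.0
```

Averaging over half a day on both ends keeps a single lucky (or unlucky) tick from deciding the benchmark.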

Gekko settings

As I said, I think most parameter searches in here assumed slightly too low trading costs. An actual trading fee of 0.5% is quite normal on mtgox as far as I know. Also, some slippage usually occurs (unless you buy or sell only minimal amounts). Finally, unless you have a trading bot, there will be a delay between the EMA method signaling a trade and the trade being executed, which often reduces profits. So I decided to set the "fee" value to 1%, which includes all profit reducing factors mentioned above. Note that setting this value higher favors B&H on the one hand, and "slower" EMA methods (i.e. methods that react slower to trends, and yield fewer trade signals) on the other.
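To see why a higher cost per trade favors slow methods, here's a back-of-the-envelope sketch. It assumes the cost is a fixed fraction of the whole position, charged on every single trade. That's a simplification for illustration, not a description of how Gekko simulates fees, and the function names are hypothetical.

```python
def net_multiplier(gross_multiplier, cost_per_trade, num_trades):
    """Final/initial value after paying a proportional cost on every trade.

    gross_multiplier: final/initial value before costs (1.5 means +50%).
    cost_per_trade:   fraction lost per trade (0.01 means 1%).
    """
    return gross_multiplier * (1.0 - cost_per_trade) ** num_trades


def breakeven_gross(cost_per_trade, num_trades):
    """Gross multiplier a strategy needs just to end up where it started."""
    return 1.0 / (1.0 - cost_per_trade) ** num_trades
```

Under that assumption, 3 trades at 1% cost you about 3% of your stake, while 41 trades at 1% eat roughly a third of it before the strategy has earned anything.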

The initial history setting defines the number of candles at the beginning of the data that are reserved to calculate the EMA method's initial average values. Default is 100, but this is a bad value for our purposes: the EMA methods would include all the volatile data for the initial history that I carefully excluded. So candles is set to 1, which essentially means the EMA method buys immediately. Which is good, because that's what B&H, our benchmark, does as well, so they start on equal footing.
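For readers who want to see the moving parts, here's a minimal sketch of an EMA crossover with thresholds. It assumes the usual smoothing factor 2/(period+1), seeds both averages with the first candle, and measures the gap between the two averages in percent of their mean. This illustrates the general idea only; Gekko's own handling of the initial candles and the first trade may differ, and all names here are made up.

```python
def ema_series(prices, period):
    """Exponential moving average, seeded with the first price."""
    alpha = 2.0 / (period + 1)
    ema = prices[0]
    out = [ema]
    for price in prices[1:]:
        ema = alpha * price + (1 - alpha) * ema
        out.append(ema)
    return out


def crossover_signals(prices, short, long, sell_threshold, buy_threshold):
    """Return [(candle_index, 'buy' | 'sell'), ...].

    A signal fires when the percent gap between short and long EMA moves
    beyond a threshold and we are not already positioned that way.
    """
    short_ema = ema_series(prices, short)
    long_ema = ema_series(prices, long)
    position = None
    signals = []
    for i, (s, l) in enumerate(zip(short_ema, long_ema)):
        diff = 100.0 * (s - l) / ((s + l) / 2.0)
        if diff > buy_threshold and position != "long":
            position = "long"
            signals.append((i, "buy"))
        elif diff < sell_threshold and position != "short":
            position = "short"
            signals.append((i, "sell"))
    return signals
```

In this notation, "EMA20+10, -0.3/0.3" would be `crossover_signals(prices, short=10, long=20, sell_threshold=-0.3, buy_threshold=0.3)`. The band between the two thresholds is what keeps the method from flipping back and forth while the averages hug each other.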

Results

Finally.

Anyone still reading this?

Congratulations. You must be really bored.

Hourly intervals

The list starts with the classical EMA20/10, since that's how Goomboo's thread started. You will notice it performs pretty badly.

The only other parameters I list below are the ones that performed best during backtesting. In particular, if a parameter combination didn't manage to outperform the B&H benchmark on the recent data (May to August), I immediately dismissed it (except for EMA20/10), based on my initial assumption that performance on recent data matters most.

Results are ranked by profit on recent history (h1), followed by profit on the entire history (h2), the 'long' and 'short' average parameter, the optimal 'threshold' values and finally, number of trades executed.

Code:

time periods
------------

h1: 2013-05-04 to 2013-08-22 (110 days)

h2: 2012-10-22 to 2013-08-22 (304 days)


EMA results
-----------

* 1h EMA20+10, -0.3/0.3
  h1: +2.5%, 41 trades
  
* 1h EMA24+15, -0.4/0.4 
  h1: +32%, 19 trades
  h2: +518%
  
* 1h EMA29+18, -0.25/0.25
  h1: +38%, 23 trades
  h2: +850%

* 1h EMA28+18, -0.4/0.4
  h1: +42%, 15 trades
  h2: +823%

Code:

benchmark
---------

recent history (h1): 2013-05-04 to 2013-08-22 (110 days), B&H profit: +9.9%

entire history (h2): 2012-10-22 to 2013-08-22 (304 days), B&H profit: +945%

Pretty good recent profits for the hourly methods, huh? But did you notice that I didn't find a single hourly EMA method that outperforms B&H on recent data AND on the entire history? Some come close though, EMA29+18 gives +850% on h2 vs. B&H 945% on h2.

But as I said several times, I don't think performance on the older data is equally important. It mainly serves as a sanity check to protect us from too much curve fitting. Here's an example of such curve fitting: I found some parameter combinations that performed even better than the ones I listed when set to some very high threshold value, like -1.8/1.8, or some odd combination of thresholds, like -0.2/1.8. If you test those parameters on the older history (h2) however, you'll see that they fall apart. That's a pretty good indicator that those parameters were the result of curve fitting.

In conclusion, several hourly parameter combinations drastically outperform B&H during h1, even with the rather high trading fee I chose. Those methods perform well enough on the entire history as well, so I expect them to be reasonably generalized and perform well enough in the (near) future.
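The selection logic can be written down as a small filter. The profit numbers are the hourly results and benchmarks listed above. The `min_h2_share` cut-off is something I'm making up here to put a number on "holds up well enough on h2"; the original test applied that check by eye.

```python
BH_H1, BH_H2 = 9.9, 945.0  # B&H benchmark profits in %, from the tables above

# (name, h1 profit %, h2 profit % or None if not listed, trades on h1)
HOURLY = [
    ("EMA20+10 -0.3/0.3", 2.5, None, 41),
    ("EMA24+15 -0.4/0.4", 32.0, 518.0, 19),
    ("EMA29+18 -0.25/0.25", 38.0, 850.0, 23),
    ("EMA28+18 -0.4/0.4", 42.0, 823.0, 15),
]


def shortlist(results, bh_h1, bh_h2, min_h2_share=0.5):
    """Keep parameter sets that beat B&H on h1 and don't collapse on h2.

    min_h2_share is a hypothetical cut-off: the h2 profit must reach at
    least this fraction of the B&H profit on h2. Sorted by h1 profit.
    """
    keep = [
        r for r in results
        if r[1] > bh_h1 and r[2] is not None and r[2] >= min_h2_share * bh_h2
    ]
    return sorted(keep, key=lambda r: r[1], reverse=True)


for name, h1, h2, trades in shortlist(HOURLY, BH_H1, BH_H2):
    print(f"{name}: h1 +{h1}%, h2 +{h2}%, {trades} trades")
```

A parameter set that only shines on h1 with odd thresholds would drop out at the h2 step, which is the whole point of the sanity check.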

Daily intervals

Thresholds were 0 for all of the daily results. The daily EMA crossover method is "cautious" enough already and doesn't benefit from thresholds, it seems.

Code:

EMA results
-----------

* 1d EMA21+20
  h1: +4%, 3 trades 
  h2: +826%

* 1d EMA37+5 
  h1: +6%, 5 trades
  h2: +288%

* 1d EMA24+15
  h1: +7%, 3 trades
  h2: +826%

* 1d EMA21+18 
  h1: +9%, 3 trades
  h2: +847%

* 1d EMA29+12
  h1: +12%, 3 trades
  h2: +827%

* 1d EMA20+10
  h1: +17%, 3 trades 
  h2: +366%
  
* 1d EMA5+1
  h1: +20%, 19 trades
  h2: +1339%
  
* 1d EMA20+1
  h1: +23%, 9 trades
  h2: +878%

* 1d EMA23+3 
  h1: +24%, 5 trades 
  h2: +564%

* 1d EMA24+2 
  h1: +24%, 5 trades
  h2: +727% 

* 1d EMA20+6 
  h1: +32%, 3 trades 
  h2: +676%

* 1d EMA16+4
  h1: +32%, 3 trades
  h2: +639%

Code:

benchmark
---------

recent history (h1): 2013-05-04 to 2013-08-22 (110 days), B&H profit: +9.9%

entire history (h2): 2012-10-22 to 2013-08-22 (304 days), B&H profit: +945%

As you can see, daily method profits on recent history are somewhat lower than those of the hourly variant, but the number of trades is much lower as well.

Also, I think I found the answer to a question that came up earlier in this thread. Marcus Antonius reported that daily EMA20+21 generated a spectacular 3533% profit over the entire history of trading. The h2 profit of that parameter in my own test confirms this: it is rather high at +826%. Applying this parameter to the recent data h1, however, is much less spectacular: only +4% profit. It's up to you to decide of course, but I wouldn't trust a method that once upon a time performed extremely well but in the past months failed to generate any serious profits.

On to the better parameters: 20+6 for example performs very well on recent data (+32%) and reasonably well on the entire history (+676%, vs. B&H 945%).

One strange beast showed up in my search: EMA5+1 is a rather fast version of the daily method, compared to the other parameter combinations (see the high number of trades). It performs relatively well on recent data, and *extremely* well on the old data. I'm not sure what to make of it, but I suggest watching how this combination performs in the future.

Summary / Conclusions

There seems to be a trade-off between "historical" performance and "recent" performance. My assumption was that recent performance is more important for future performance, but we also need to look at performance in the more distant past to check how consistently profitable the found parameters really are.

EMA20+10 is dead. At least with hourly interval size.

Hourly EMA crossovers absolutely need threshold values to be profitable, especially if we assume relatively high cost of trading (fees+slippage).

One of the best *hourly* parameters I found during my search: EMA28+18, threshold -0.4/0.4. Profit May to August: +42% (vs. B&H profit +9.9%). Parameters hold up well in the "historical" back-test as well.

Good *daily* parameters: EMA20+6, EMA16+4. Both generate +32% profit on recent data. Note that those profits were generated with very few trades (3), which could be important if you want to trade as seldom as possible, and if your trading volume is large enough that your profits are reduced by slippage.

Another profitable *daily* combination: EMA20+1. Profit May to August is +23% (less than the ones above), but still outperforms B&H during recent history. Note that this combination performed better on historical data, and it generated the recent history profit with a total of 9 trades, more than the parameters above, which increases the chance that future results are somewhat in line with historic results.[1]

Which interval is better, hourly or daily? Hourly EMA methods can generate higher total profits in principle, but there's a trade-off: hourly interval requires a much higher number of trades to reach this profit, which means more work for the trader, more fees, more chance for slippage and more room to make mistakes. Personally, I would recommend using methods with daily interval size.

Here's my attempt to answer the recurring question "Can a simple strategy like EMA crossover actually beat B&H?". Answer: it depends. If you (a) have enough time to trade often, and more importantly, trade as soon as you receive a crossover signal, and (b) trade with a relatively small volume so that slippage stays manageable, then the crossover method beats B&H by a significant margin. On the other hand, if your goal is to trade as little as possible, or your trading volume is large enough that it causes significant slippage on your exchange, then B&H might be the better choice. But even if you use B&H, I would still suggest using one of the daily EMA crossovers to determine at which point to *buy in*, e.g. to avoid buying in the middle of a big correction/downtrend.

Another caveat: there's always the risk that past performance and future performance diverge. As we've seen, testing different EMA parameters on different partitions of the historical data yields very different results. There is always the possibility that the EMA parameters you choose now, based on back-testing, will actually generate a loss in the future. In practice this means you should probably define a limit up to which you trust your method: if the parameters you chose are unprofitable for, say, 2 or 3 months in a row, it might be time to look for new parameters.
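That "limit of trust" is easy to make mechanical. A minimal sketch, assuming you keep a list of your strategy's monthly returns (oldest first); the function name and the default of 3 months are placeholders taken from the "2 or 3 months" rule of thumb.

```python
def should_review_parameters(monthly_returns, max_losing_months=3):
    """True if the most recent `max_losing_months` months were all unprofitable.

    monthly_returns: relative returns per month, oldest first (0.05 means +5%).
    """
    if len(monthly_returns) < max_losing_months:
        return False
    recent = monthly_returns[-max_losing_months:]
    return all(r <= 0 for r in recent)
```

Deciding on the number before you start trading makes it harder to talk yourself into "just one more month".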

Addendum: Higher Volume, Higher Slippage

As was pointed out in a discussion by your friendly neighborhood whale, Rampion, my initial assumptions about cost per trade don't hold for trading with relatively high volume. While the exchange fee will go down, slippage will drastically increase. So I decided to repeat parts of my test assuming even higher costs per trade. In order to estimate the expected slippage percentage, I knew I had to write a program that reads in historic trading data and estimates slippage as a function of order volume and market depth. Then I thought, fuck it, that's way too much work, and decided to guess the percentage instead. Half-assedly, even.

First, let's establish which kind of volume we're talking about. The results I posted above should hold for traders with an order volume of up to 100 btc. Looking at the recent market depth seems to confirm that orders of up to 100 btc should be filled not further than 1%, maybe 2% away from market price. (Note that slippage is defined by the cost of the entire order, so even if part of the order is filled at a much higher/lower price than market price, some parts probably were filled closer to market price.) Let's say then that 'higher volume' means order volume well above 100 btc, and up to around 1000 btc. Anything bigger than that takes us into proper whale territory, and I'm going to guess that EMA crossover strategies are not their method of choice.
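The point in the parenthesis (slippage is about the cost of the whole order, not its worst fill) can be shown with a small sketch that walks an order book. It assumes the ask side is given as `(price, size)` pairs sorted by price and measures slippage against the best ask. This is not the slippage estimator I was too lazy to write, just an illustration with made-up names.

```python
def market_buy_slippage(asks, volume):
    """Slippage of a market buy as a fraction of the best ask.

    asks:   list of (price, size) sorted by ascending price.
    volume: amount of btc to buy.
    """
    best = asks[0][0]
    remaining = volume
    cost = 0.0
    for price, size in asks:
        fill = min(size, remaining)   # take what this price level offers
        cost += fill * price
        remaining -= fill
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("order book too shallow for this volume")
    average_price = cost / volume
    return average_price / best - 1.0
```

With an invented book of 50 btc at 100 and 50 btc at 102, buying 100 btc fills half the order 2% above market, but the slippage of the order as a whole is 1%.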

Looking at the candles of some recent high volume orders (half-assed, remember?) I estimate the slippage range for orders between 100 btc and 1000 btc to be between 2% and 4%. Obviously, slippage varies wildly from day to day, but usually it should be possible to stay in that range, maybe breaking up an order in half if necessary. Given those values, I repeated the backtest on the 'recent history' (h1) period for the most profitable daily parameters. Note that I didn't bother repeating the backtest of hourly EMA parameters: a quick look at their number of trades will show you that they become completely pointless with that kind of slippage, and will incur heavy losses. The results:

Code:


* 1d EMA5+1
  h1, 2%: -0.2%
  
* 1d EMA20+1
  h1, 2%: +13%
  h1, 3%: +3%
  h1, 4%: -5%

* 1d EMA23+3 
  h1, 2%: +11%
  h1, 3%: +6%
  h1, 4%: +1.2%

* 1d EMA24+2 
  h1, 2%: +19%
  h1, 3%: +14%
  h1, 4%: +8%

* 1d EMA20+6 
  h1, 2%: +29%
  h1, 3%: +26%
  h1, 4%: +23%

* 1d EMA16+4
  h1, 2%: +29%
  h1, 3%: +25%
  h1, 4%: +22%

Not surprisingly, the "fast" daily parameter 5+1 (with a high number of trades) is almost as bad as the hourly parameters under such high trading costs. Several other parameters (e.g. 20+10, not listed here) only barely outperformed B&H in the original test, so after applying higher trading costs, they now fall below B&H profit. But three of the parameter combinations that combine high profits with a relatively low number of trades stay profitable even when applying higher trading costs (2%, 3%), and the two most profitable combinations of the previous test (20+6, 16+4) substantially outperform B&H's +10% even with a very high cost per trade of 4%.

Note also: One advantage of using B&H as our benchmark is that we can think of B&H as a special case of an EMA crossover strategy in which all crossover signals are ignored by the trader. Practically this means that a trader with high volume could in principle resolve to follow a profitable (daily) EMA crossover strategy, but only execute those orders that will be filled reasonably close to market price. In the worst case, he won't execute any trades, in which case his strategy will effectively be B&H, but he will never perform *worse* than B&H if he sets limit orders within the profitable range (around 4% max).[2] You could think of the profit numbers above as "maximum profit" then.

All of the above summed up, in bright red letters:

If your bitcoin investment is still relatively small, and you want to trade aggressively to increase your coin stash (and you don't mind spending a lot of time on it), consider one of the high profit hourly EMA parameter combinations, like EMA29+18 (threshold: -0.25/0.25) or EMA28+18 (threshold: -0.4/0.4). Watch your costs per trade though!

If you already own a larger amount of coins (or want to invest a large amount of fiat) and you know you have to be careful about slippage, or if you simply don't want to waste all your time on trading, consider one of the daily parameter combinations, like EMA20+6 and EMA16+4. Trade by setting limit orders within the profitable range.

Important for everyone who uses EMA crossovers: profitability of parameters changes over time. Don't expect the parameters that perform well now to still be the best ones a year from now.

Footnotes:

[1] That assumption is based on intuition, but if necessary, I can offer a (sloppy) proof sketch.

[2] Not entirely true. In the worst case, a high-volume trader would be able to *sell* btc near market price, but since the emphasis of B&H is on holding btc, we have to buy back at some point, and it is conceivable that all opportunities to buy back btc would incur high slippage.