By Jake Bernstein • Sep 12th, 2008 • Category: Systems Trading
Systems Trading (part
By: Jake Bernstein www.trade-futures.com
System Testing
The days of untested systems are gone forever. In fact, the pendulum is now swinging in the other direction. While unscrupulous operators ‘in the days of old’ sold systems and methods for which they claimed fantastic results, today’s unethical operators use statistics as a tool of deception. The public will always be easily duped by these individuals who, paradoxically, will benefit from the trend toward the statistical validation of systems. Manipulating statistics is not difficult. Just as Archimedes once said, ‘Give me a place to stand on and I can move the earth,’ the modern systems promoter would likely say, ‘Give me enough statistics and I can prove anything.’
This sermonette on system validation makes the point that merely testing a system and generating highly favorable hypothetical results does not guarantee success with that system. Nor should such statistics be used as a security blanket or crutch by traders. Statistics can easily be manipulated: systems can be (and are) curve-fitted, and results, unless realistic, will not reflect actual performance when the system is implemented.
Why test Trading Systems?
Traders test systems for various reasons. Some test a system merely to say they’ve done so, only to disregard the outcome or to accept mediocre results, rationalizing the negative aspects of their ‘system.’ Other traders test systems in order to sell them to the public; their goal is to optimize systems in order to show maximum performance. Then there’s the serious futures trader who tests systems to achieve several goals, including but not limited to the following:
- To determine whether a theory or hypothetical construct is valid in historical testing,
- To summarize the overall hypothetical performance of a system and to analyze its various aspects in order to isolate its strong and weak points,
- To determine how different timing indicators interact with one another to produce an effective trading system, and
- To explore the interaction of risk and reward variables (i.e., stop loss, trailing stop loss, position size, etc.) that would have returned the best overall performance with the smallest drawdown.
Test Your Trading System
While it may seem that the last item listed above refers to optimization, you will see from the discussion of optimization later in this chapter that it is not optimization according to my definition of the term. The purpose of testing systems is simply to find what will work best for you based on what appears to have worked best in the past. In so doing, we must remember that what worked in the past in hypothetical testing may not necessarily work in the future. In testing WT I have followed very strict procedures.
A thorough test of your trading system should include at least the following information:
Number of Years Analyzed
Although it is desirable to test as much data as possible, many trading systems and indicators do not withstand the test of time. The farther back you test, the less effective most systems will be. Many system developers test only five years of historical data. You must make your own decision regarding the length of your test. I have tested WT signals back to the 1970s in many cases…
Number of Trades Analyzed
More important than the number of years analyzed is the number of trades. You need not analyze many years of data if you have a large sample of trades. I recommend at least 100 trades, provided your system will generate this number of trades in back-testing. If you are truly interested in determining the effectiveness of your system, the more trades you test, the better. Remember that there will always be a tendency to test fewer trades when you realize that the system is not holding up under back-testing.
Some traders argue that the factors underlying futures market trends 25 years ago were distinctly different from those during the past ten years. They feel that testing 25 years of data distorts the picture. If they are correct, how would we know when the current market forces change and that we must therefore change our trading systems? We are much better off finding systems that work in all types of markets. I have back-tested WT over bull, bear and sideways markets to validate its consistency.
Maximum Drawdown
This is one of the most important aspects of a trading system. A very large drawdown is a negative factor since it eliminates most traders from the game well before the system would have turned in its positive performance. Since most traders are not well capitalized, they cannot withstand a large drawdown. However, drawdown is a function of account size. Obviously, a $15,000 drawdown in a $100,000 account is not unusual; however, the same drawdown in a $25,000 account is serious. You may decide to risk a large drawdown in order to achieve outstanding performance, but this is your decision.
Consider also the source of the drawdown by examining the largest losing trade. If the majority of the drawdown occurred on only one trade, you will be better off than if the drawdown was spread out over numerous successive losses.
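As a rough illustration of the drawdown figures discussed above, here is a minimal Python sketch. It assumes drawdown is measured on closed-trade equity, peak to trough; the function names and the sample trade list are hypothetical and are not taken from any particular testing package.

```python
def max_drawdown(trade_pnls):
    """Largest peak-to-trough decline in cumulative closed-trade profit."""
    equity = 0.0   # running total of closed-trade profit and loss
    peak = 0.0     # highest equity seen so far
    worst = 0.0    # largest decline from a peak seen so far
    for pnl in trade_pnls:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def drawdown_pct_of_account(drawdown, account_size):
    """Express a dollar drawdown as a percentage of starting account size."""
    return 100.0 * drawdown / account_size


# Hypothetical trade results in dollars, for illustration only.
trades = [500, -1200, 800, -3000, -2500, 4000, -700]
print(max_drawdown(trades))                      # 5900.0
print(drawdown_pct_of_account(15000, 100000))    # 15.0
print(drawdown_pct_of_account(15000, 25000))     # 60.0
```

The last two lines restate the account-size point: the same $15,000 drawdown is 15 percent of a $100,000 account but 60 percent of a $25,000 account.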
Maximum Consecutive Losses
This performance variable is more psychological than anything else. An otherwise excellent trading system may have lost money on many trades in succession. Few traders can maintain their discipline through four or more successive losing trades. Even after the third loss, many traders are ready to either abandon their system or to find ways of changing it. However, at times it is necessary to weather the storm of ten or more successive losses. If you know ahead of time what the worst case scenario has been, you will be prepared. That’s why it’s important for your system test to give you this information.
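Counting the longest losing streak is straightforward. The sketch below assumes a trade with a negative result is a loss and that a breakeven trade ends the streak; both choices, like the function name and sample data, are assumptions made for illustration.

```python
def max_consecutive_losses(trade_pnls):
    """Length of the longest run of losing trades, in trade order."""
    longest = 0
    current = 0
    for pnl in trade_pnls:
        if pnl < 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a winner (or breakeven trade) ends the streak
    return longest


# Hypothetical trade results in dollars.
trades = [100, -50, -20, -30, 200, -10, -10]
print(max_consecutive_losses(trades))  # 3
```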
Largest Single Losing Trade
This important piece of information indicates how much of the maximum drawdown is the result of a single losing trade, and it allows you to adjust the initial stop loss in retesting the system so as to see how large the average losing trade has been.
I strongly recommend close examination of the actual trade that resulted in the single largest loss if this loss is clearly much higher than the average losing trade. Another question to ask is: Why was the single losing trade so much larger than the stop loss selected? A single largest losing trade that is several times larger than your selected stop loss points to a potential problem, perhaps with the system test. You must investigate further in such cases.
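One way to find the trades that deserve this closer look is to flag any loss that exceeds some multiple of the stop loss. The sketch below is only an illustration: the text says "several times larger" without giving a number, so the default multiple of 2.0 is an arbitrary assumption, as are the function name and the sample data.

```python
def oversized_losses(trade_pnls, stop_loss, multiple=2.0):
    """Return (trade index, loss) pairs where the loss exceeds multiple x stop_loss.

    `multiple` is a hypothetical threshold chosen for illustration.
    """
    limit = stop_loss * multiple
    return [(i, -pnl) for i, pnl in enumerate(trade_pnls) if -pnl > limit]


# Hypothetical results for a system tested with a $500 stop loss.
trades = [300, -500, -480, 900, -2100, -510]
print(oversized_losses(trades, stop_loss=500))  # [(4, 2100)]
```

A flagged trade is a prompt to check the test itself: for example, whether the loss came from a gap through the stop or from an error in the data.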
Largest Single Winning Trade
Perhaps more important than the largest single losing trade is the largest single winning trade. If, for example, your hypothetical profits total $96,780, and $67,540 of this is attributed to only one trade, you have a distorted average trade figure. It’s often a good idea to remove this one trade from the overall results and recompute them in order to show the performance without this extraordinary winner. You may find that the system you have tested is mediocre, perhaps even a loser, when the single largest trade has been eliminated from the performance summary. If you can wait ten years for the one big trade, then use the system, but do so against my advice. What you’re looking for in any system with regard to average winning and losing trades is consistency, which is far more important than one or two extremely large winning trades that give a distorted performance picture.
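The recomputation described above can be sketched in a few lines of Python. The totals match the example in the text ($96,780 in profits, $67,540 of it from one trade), but the individual trade results are invented for illustration, and the function name is hypothetical.

```python
def summary_without_largest_winner(trade_pnls):
    """Compare totals and average trade with and without the largest winner."""
    largest = max(trade_pnls)
    remaining = list(trade_pnls)
    remaining.remove(largest)  # drop one occurrence of the largest trade
    total = sum(trade_pnls)
    return {
        "total": total,
        "largest_winner": largest,
        "total_without": sum(remaining),
        "avg_with": total / len(trade_pnls),
        "avg_without": sum(remaining) / len(remaining),
    }


# Ten invented trades: one $67,540 winner plus nine others summing to $29,240.
trades = [67540, 4000, -1500, 6200, -2300, 5340, -900, 7400, 3000, 8000]
result = summary_without_largest_winner(trades)
print(result["total"], result["total_without"])  # 96780 29240
print(result["avg_with"], round(result["avg_without"], 2))
```

With these invented numbers the average trade drops from $9,678 to roughly a third of that once the single large winner is removed.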
The performance summaries shown later list the largest single winning long and short trades. Frequently these two trades alone account for a considerable portion of the net system profits. While some traders feel that this somehow diminishes the value of the system, I disagree. As long as at least two-thirds of the overall system performance is due to trades other than the largest single winning long and short trade combined, the system is valid. As far as numbers are concerned, I would not use any system that, after deducting reasonable slippage and commission as well as the largest single long and short winners, does not show at least $100 average profit per trade.
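The two screening rules in the preceding paragraph can be expressed as a short check. This is a sketch under stated assumptions, not a definitive reading of the rules: it assumes costs are deducted from every trade before either rule is applied, that each side has at least one winning trade, and that the $100 average is taken over the trades that remain after the two largest winners are removed. The function name, the $100 cost figure, and the sample trades are all hypothetical.

```python
def outlier_rules(long_pnls, short_pnls, cost_per_trade=100.0, min_avg_trade=100.0):
    """Apply the two-thirds rule and the minimum-average-trade rule.

    Assumes costs come off every trade first and that the average is
    computed over the trades left after removing the two largest winners.
    """
    net_longs = [p - cost_per_trade for p in long_pnls]
    net_shorts = [p - cost_per_trade for p in short_pnls]
    total = sum(net_longs) + sum(net_shorts)
    # Profit left after removing the largest long and largest short winner.
    rest = total - max(net_longs) - max(net_shorts)
    n_rest = len(net_longs) + len(net_shorts) - 2
    share_rest = rest / total if total > 0 else 0.0
    return {
        "share_from_other_trades": share_rest,
        "two_thirds_ok": share_rest >= 2 / 3,
        "avg_trade_without_top_two": rest / n_rest,
        "avg_trade_ok": rest / n_rest >= min_avg_trade,
    }


# Hypothetical gross trade results in dollars.
longs = [3000, -500, 1200, -400, 900]
shorts = [2500, -600, 800, -300, 700]
print(outlier_rules(longs, shorts))
```

In this invented sample the remaining trades average $125 after costs, which clears the $100 threshold, but they contribute well under two-thirds of net profit, so the sample would fail the first rule.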
More importantly, because a large portion of profits in many systems derive from a very small number of trades, it is imperative that you follow each and every trade as closely to the rules as possible. Trading systems are not money machines; they don’t grind out one profit after another. Trading systems make their money on the bottom line. There are many losers and few winners. The losers are kept in check by using money management stop losses that must, in most cases, be reasonably large. And the winners, only a few of which are very large, make the game worth the candle. The trader who can’t stick with a position, or let it ride, is the trader who will be sorely disappointed with the results because the big winners will be cut short. The essence of WT is that it is willing to sacrifice small winners in order to “grab” big winners.
Percentage Winning Trades
This statistic is not nearly as important as one might think. In actuality few systems have more than 50 percent winning trades; and the more trades in your sample, the smaller this figure will be. Systems that are correct as little as 30 percent of the time can still be good systems; and systems that are accurate as much as 80 percent of the time can be bad systems. It’s easy to see that even a high degree of accuracy with a large average losing trade and small average winning trade does not make a good system.
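A quick calculation shows why accuracy alone says little. The sketch below computes the expected profit per trade from the win rate and the average winning and losing trades; the dollar figures are invented to illustrate the 30 percent and 80 percent cases, and the function name is hypothetical.

```python
def expectancy(win_rate, avg_win, avg_loss):
    """Expected profit per trade; avg_loss is given as a positive dollar amount."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


# 30 percent winners, but winners are much larger than losers.
low_accuracy = expectancy(0.30, avg_win=1500, avg_loss=400)
# 80 percent winners, but losers are much larger than winners.
high_accuracy = expectancy(0.80, avg_win=200, avg_loss=1000)

print(round(low_accuracy, 2))   # 170.0
print(round(high_accuracy, 2))  # -40.0
```

Under these assumed figures, the system that is right only 30 percent of the time makes about $170 per trade, while the one that is right 80 percent of the time loses about $40 per trade.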
Average Trade
This statistic will tell you what the average hypothetical trade has been. You must make certain that when you test your system you deduct slippage and commission from your average trade. Commissions add up, even discount commissions. And slippage is an important factor when determining system performance. As a rule of thumb I recommend deducting between $75 and $100 per trade for slippage and commission. Once this has been done, you will often significantly reduce the average trade figure. As I pointed out earlier, you must also pay close attention to the largest winning trade and the largest losing trade when evaluating the average trade. The average trade figure is important since it considers all profits, all losses, slippage, and commission.
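The effect of the $75 to $100 deduction is easy to see in a small example. The sketch below assumes a flat cost per trade; the function name and the gross trade results are hypothetical.

```python
def average_trade(gross_pnls, cost_per_trade):
    """Average trade after deducting a flat slippage-and-commission cost per trade."""
    return sum(gross_pnls) / len(gross_pnls) - cost_per_trade


# Hypothetical gross results in dollars; they average $120 before costs.
trades = [400, -250, 600, -300, 150]
print(average_trade(trades, cost_per_trade=0))    # 120.0
print(average_trade(trades, cost_per_trade=75))   # 45.0
print(average_trade(trades, cost_per_trade=100))  # 20.0
```

A system that looks comfortably profitable at $120 per trade before costs keeps only $20 to $45 per trade once the deduction is applied.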
Optimization and Retesting
There has been considerable controversy about trading system optimization. What exactly is wrong with optimizing systems? Can you go too far? Is there a happy medium?
The real issues in system optimization are complex, and they’ve been exacerbated by the tendency of systems developers to optimize their programs above and beyond any reasonable degree. To optimize a system is to discover the parameters that provide the best results in hypothetical back-testing. In other words, optimization is a form of discovering what would have produced the best results using numerous if-then scenarios and numerous variables.
Before low-cost computer hardware and software, optimization was a long and laborious procedure. To discover the best fit, the systems developer would need to repeatedly backtrack and test several variables. If the system parameters were numerous, the process was virtually impossible. Obviously, computers have made this a quick and efficient task.
Such ease of testing and optimizing is both good and bad. On the one hand it allows traders to develop, test and refine (i.e., optimize) systems much more rapidly. On the other hand it has opened the door to what is called curve-fitting. The simple fact is that the powerful system-testing programs now available allow traders as well as systems vendors to repeatedly test a host of timing variables, stop losses and other risk management schemes in order to determine which combinations would have produced the best results. In effect this procedure fits the best parameters on past history to produce the best hypothetical results. However, the conclusions reached by such methods are often deceptive and specious. WT is not curve-fitted. Stop loss and market variables were determined on the basis of market volatility and personality (to be discussed).
The trader who tests and retests to find the best fit will eventually reach his or her goal, but the goal itself may be nothing more than a reflection of the curve-fitted results. Tests tell us what has worked in the past, but may not reveal anything worthwhile about the future. Since the past is not a carbon copy of the future, it is doubtful that the optimized parameters will work in the future. The more parameters in the decision-making model, the less likely they are to work in the future.
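The danger of testing and retesting until something fits can be demonstrated with a toy experiment. The sketch below is not WT or any real system: it searches for the best lookback of a simple, hypothetical momentum rule on the first half of a simulated random walk, then applies that lookback to the second half. Because a random walk has no exploitable pattern by construction, any profit the search finds in the first half comes from fitting the past, and there is no reason to expect it to carry over. The rule, the parameter range, and the seed are arbitrary assumptions.

```python
import random


def random_walk(n, seed):
    """Simulated price series with no built-in trend or pattern."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n - 1):
        prices.append(prices[-1] + rng.gauss(0, 1))
    return prices


def momentum_pnl(prices, lookback):
    """Toy rule: long one unit if price is above its level `lookback` bars ago, else short."""
    pnl = 0.0
    for t in range(lookback, len(prices) - 1):
        position = 1 if prices[t] > prices[t - lookback] else -1
        pnl += position * (prices[t + 1] - prices[t])
    return pnl


prices = random_walk(2000, seed=1)
in_sample, out_of_sample = prices[:1000], prices[1000:]

# "Optimize": pick the lookback with the best hypothetical result on past data.
lookbacks = range(2, 61)
best = max(lookbacks, key=lambda lb: momentum_pnl(in_sample, lb))

print("best lookback:", best)
print("in-sample P&L:", round(momentum_pnl(in_sample, best), 2))
print("out-of-sample P&L:", round(momentum_pnl(out_of_sample, best), 2))
```

Rerunning the experiment with different seeds, or with more parameters to search over, gives a feel for how easily an impressive in-sample figure can be produced from data that contains nothing to find.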
Overly optimized results lead to false conclusions. The result will likely mean losses. For those who develop and sell futures trading systems as a business, optimization is an amazing tool that allows the creation of outstanding hypothetical performance results that in turn allow systems developers to make incredible claims. And claims sell systems.
Time will tell if I am wrong about overly optimized systems. Vast personal experience, however, strongly validates my conclusions. I recall recent developments regarding several popular trading systems sold by a software developer. The advertised claims were fantastic; systems were sold for T-Bond futures, S&P and currency futures. The outstanding performance claims provided a strong media campaign.
Naturally, all of the proper disclaimers were made to comply with the then-current regulatory requirements. There were no disclaimers regarding optimized results, however, nor was it disclosed that not all buyers of the systems would be using the same system parameters. Because the systems were continually optimized for best results, the hypothetical track records were truly impressive. However, the results did not necessarily jibe with results experienced by those who had old versions of the software, versions that did not reflect the new optimized parameters. Recognizing that there might be legal liability, the systems developers eventually disclosed this fact in small print. Few buyers understood the meaning of the disclosure and even fewer cared, given the impressive hypothetical performance record. Naturally, buyers of the software felt that they could match the hypothetical performance.