Strategy Evaluation

Introduction

In the age of computerized trading, anyone can download historical price data, run an optimization script, and generate a strategy that appears to make millions of dollars. However, when these strategies are traded with live capital, they almost always fail. The usual cause is Curve Fitting: over-optimizing rules until they fit historical noise rather than a repeatable pattern.

Professional Strategy Evaluation is the discipline of testing whether a strategy has a genuine, robust statistical edge that is likely to survive in live markets.

Why It Matters

Prevents Capital Loss: Filters out fragile, over-optimized systems before you risk real money.
Builds Psychological Conviction: Knowing your strategy has passed rigorous statistical tests helps you maintain discipline during drawdowns.
Measures Real-World Friction: Factors in the spreads, commissions, and slippage that simple backtests ignore.

Core Concepts

1. In-Sample vs. Out-of-Sample Data

When evaluating a strategy, you must divide your historical data into two separate pools:

In-Sample (IS) Data (e.g., 70% of data): The data used to design and optimize the strategy parameters.
Out-of-Sample (OOS) Data (e.g., 30% of data): The 'unseen' data used to test the strategy. If a strategy performs well on IS data but fails on OOS data, it is curve-fitted and must be discarded.
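The split described above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function name `split_in_out_of_sample`, the sample prices, and the choice to cut chronologically (oldest data in-sample, newest data out-of-sample, no shuffling) are assumptions made for the example.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_in_out_of_sample(
    bars: Sequence[T], in_sample_fraction: float = 0.7
) -> Tuple[Sequence[T], Sequence[T]]:
    """Split time-ordered data into an in-sample and an out-of-sample pool.

    Assumption: `bars` is sorted oldest to newest. The cut is chronological,
    so the out-of-sample pool contains only data that comes after
    everything used for optimization.
    """
    if not 0.0 < in_sample_fraction < 1.0:
        raise ValueError("in_sample_fraction must be strictly between 0 and 1")
    cut = round(len(bars) * in_sample_fraction)
    return bars[:cut], bars[cut:]


# Hypothetical closing prices, oldest first.
closes = [100.0, 101.5, 99.8, 102.2, 103.0, 101.1, 104.6, 105.2, 103.9, 106.4]
in_sample, out_of_sample = split_in_out_of_sample(closes)

print(len(in_sample), len(out_of_sample))
```

Optimize parameters on `in_sample` only, then run the finished rules once on `out_of_sample`. If the out-of-sample result is used to adjust the rules again, that data is no longer unseen.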

2. Curve Fitting (Over-Optimization)

If you optimize a strategy too much, you are not finding a repeatable market edge; you are simply creating a historical database query. For example, a rule such as "Buy when RSI is exactly 29.4 and SMA cross is 12.8 on Tuesday morning" is curve-fitted. It may have worked perfectly in the past, but it has no predictive value for the future.

3. Statistical Significance

To show that your strategy is profitable because of an edge rather than luck, you must analyze a sufficiently large sample of trades:

Small Sample (10-30 trades): High probability that results are driven by random streaks (variance).
Large Sample (100-200+ trades): High probability that results reflect the true expectancy of the edge.
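One rough way to see why sample size matters is to compute a t-statistic for the average trade result. The sketch below is illustrative only: the function name `expectancy_t_stat` and the trade results are made up, and the calculation assumes trades are independent and drawn from a similar distribution, which real trade sequences may not satisfy.

```python
import math
import statistics
from typing import Sequence


def expectancy_t_stat(trade_results: Sequence[float]) -> float:
    """Return mean trade result divided by its standard error.

    A larger absolute value means the average is less likely to be
    explained by random streaks alone (under the assumptions above).
    """
    n = len(trade_results)
    if n < 2:
        raise ValueError("need at least two trades")
    stdev = statistics.stdev(trade_results)
    if stdev == 0:
        raise ValueError("trade results have zero variance")
    standard_error = stdev / math.sqrt(n)
    return statistics.fmean(trade_results) / standard_error


# Hypothetical per-trade results (in R multiples) with the same win/loss mix.
pattern = [2.0, -1.0, -1.0, 1.5, -1.0]
small_sample = pattern * 4    # 20 trades
large_sample = pattern * 40   # 200 trades

print(round(expectancy_t_stat(small_sample), 2))
print(round(expectancy_t_stat(large_sample), 2))
```

Both samples have the same average result per trade, but the t-statistic of the 200-trade sample is roughly three times that of the 20-trade sample, because the standard error shrinks with the square root of the number of trades. A common rule of thumb treats values above about 2 as meaningful; that threshold is a convention, not something stated in this article.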

Professional Applications

Professional quant traders evaluate strategies using a Parameter Sensitivity Grid. They test a range of parameter values (e.g., moving average lengths from 10 to 30) and plot the results.

If the strategy is robust, the surrounding parameters will all show similar profitability. If the strategy is fragile, only one specific value (e.g., 21 SMA) will be highly profitable, while surrounding values (20 or 22 SMA) lose money. Professionals discard parameter-sensitive strategies immediately.
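The neighbor check described above can be sketched as follows. The profit figures are invented for illustration, and `profitable_neighbor_fraction` is a hypothetical helper, not a standard metric; in practice each value in the grid would come from your own backtest.

```python
from typing import Dict


def profitable_neighbor_fraction(
    results: Dict[int, float], best: int, radius: int = 2
) -> float:
    """Fraction of parameter values within `radius` of `best` that made money.

    `results` maps a parameter value (e.g. SMA length) to backtest net profit.
    """
    neighbors = [p for p in results if p != best and abs(p - best) <= radius]
    if not neighbors:
        raise ValueError("no neighboring parameter values were tested")
    profitable = sum(1 for p in neighbors if results[p] > 0)
    return profitable / len(neighbors)


# Hypothetical net profit by SMA length (made-up numbers).
fragile = {19: -120.0, 20: -80.0, 21: 950.0, 22: -60.0, 23: -150.0}
robust = {19: 410.0, 20: 455.0, 21: 480.0, 22: 460.0, 23: 430.0}

best_fragile = max(fragile, key=fragile.get)
best_robust = max(robust, key=robust.get)

print(profitable_neighbor_fraction(fragile, best_fragile))  # 0.0
print(profitable_neighbor_fraction(robust, best_robust))    # 1.0
```

In the fragile grid the best value (21) is an isolated spike surrounded by losses, while in the robust grid every neighbor is profitable at a similar level. A real grid usually covers more than one parameter, in which case the same idea applies to the neighborhood in each dimension.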

Common Mistakes

[!WARNING]

Optimizing Until the Curve is Perfect: Tweaking rules until the historical backtest has no losing trades. This all but guarantees failure in live markets, because the rules now describe past noise rather than a repeatable edge.

Ignoring Execution Friction: Assuming zero slippage and zero commission in backtests. In live trading, these costs eat up marginal edges.

Backtesting Too Short a History: Testing a strategy only during a massive bull market (e.g., 2021) and assuming it will perform during a bear market or consolidation phase.
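To make the execution-friction mistake concrete, the sketch below subtracts a fixed round-trip cost from each trade and compares average results before and after. All numbers are hypothetical, and treating spread, commission, and slippage as a constant per-trade amount is a simplifying assumption; real costs vary with instrument, order size, and market conditions.

```python
import statistics
from typing import List, Sequence


def apply_friction(
    gross_results: Sequence[float],
    spread: float,
    commission: float,
    slippage: float,
) -> List[float]:
    """Subtract a fixed round-trip cost from every trade result."""
    cost_per_trade = spread + commission + slippage
    return [result - cost_per_trade for result in gross_results]


# Hypothetical gross profit/loss per trade, in account currency.
gross = [12.0, -8.0, 15.0, -9.0, 10.0, -11.0]
net = apply_friction(gross, spread=1.0, commission=0.5, slippage=0.5)

gross_expectancy = statistics.fmean(gross)
net_expectancy = statistics.fmean(net)

print(gross_expectancy)  # 1.5
print(net_expectancy)    # -0.5
```

Here a strategy that averages +1.5 per trade before costs averages -0.5 per trade after a total cost of 2.0 per round trip, which is how a marginal edge disappears in live trading.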
