Data Sets

robsmith · Fri Aug 29, 2014 4:18 pm

Guys,

After reading some posts on developing strategies, I have been running an automated strategy on the in-play racing markets. Once I have a reasonable set of data I will look to profile races and remove those with a negative expectation. My question is: how big a data set do I need before I can draw conclusions? 500 races? 1,000 races?

Thanks,

Rob

xitian · Fri Aug 29, 2014 6:08 pm

I don't think it's an exact science, as it depends on what sort of strategy you're running. If it's primarily a betting strategy (i.e. just taking open positions) rather than a trading strategy (i.e. opening and closing positions quite frequently), then you'll need a lot more historical data.

To generalise the above, you can look at the volatility of your returns by analysing them in Excel or something similar. Betting strategies have bigger ups and downs, so you need more data to verify that they're working.
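As a rough sketch of the volatility point above: if you treat each race's return as an independent draw, the standard error of the mean shrinks with the square root of the number of races, so the sample you need grows with the square of (volatility / average return). The function name `races_needed` and both return series are made up for illustration, and the independence assumption is a simplification.

```python
import math
import statistics


def races_needed(returns, z=1.96):
    """Estimate how many races are needed before the mean return per race
    sits about z standard errors away from zero (z=1.96 is roughly 95%)."""
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)  # sample standard deviation
    if mean == 0:
        return math.inf  # no edge in the sample, so no sample size suffices
    return math.ceil((z * stdev / mean) ** 2)


# Made-up per-race returns in stake units, for illustration only.
betting = [4.0, -1.0, -1.0, -1.0, -1.0, 4.0, -1.0, -1.0, -1.0, -1.0, -1.0, 4.0]
trading = [0.3, -0.2, 0.1, -0.25, 0.35, -0.15, 0.2, -0.1, 0.25, -0.3, 0.3, 0.1]

print("betting-style:", races_needed(betting))
print("trading-style:", races_needed(trading))
```

With these invented numbers the betting-style series needs several times as many races as the trading-style one, even though its average return per race is higher, because its swings are so much larger.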

To give you an idea, I'm also developing an in-play strategy whose daily PnL is positive on 78% of days, and I'm using roughly three months of history. Of course this is only from backtesting.

The trickiest part of looking at historical data is that you have to be very careful not to overfit your strategy. So when you come to "profile and remove races with negative expectations" you need to be clear about your reasoning. Removing all horses with names that start with a "J" may improve your historical results, but that's clearly just coincidence and nonsensical. Really you need to keep a set of data aside to do out-of-sample testing once you've done any tweaking.
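One simple way to keep data aside for out-of-sample testing is to split the history by time: tweak filters on the earlier races only, then check the untouched later races once. This is a minimal sketch; `chronological_split` and the 30% hold-out are assumptions, not anything prescribed in this thread, and the records are assumed to be sorted oldest first.

```python
def chronological_split(races, holdout_fraction=0.3):
    """Split time-ordered race records into (in_sample, hold_out).

    The most recent `holdout_fraction` of races is reserved and should not
    be looked at while filters are being tuned.
    """
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    n_holdout = round(len(races) * holdout_fraction)
    cut = len(races) - n_holdout
    return races[:cut], races[cut:]


# Hypothetical race IDs, oldest first.
race_ids = list(range(1, 11))
in_sample, hold_out = chronological_split(race_ids)
print(in_sample)  # races available for profiling and tweaking
print(hold_out)   # races reserved for a single final check
```

Splitting by date rather than at random avoids testing on races that happened before the ones the filters were tuned on.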

Perhaps someone else with more formal statistical training can suggest how to calculate some tests for significance. Personally, I just plot a chart and see if it looks like it's going up or not! At the end of the day you need to be comfortable yourself with how volatile the returns are. If my chart doesn't look consistent enough to me, then I don't put money on it.
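For anyone who wants a number alongside the chart, one basic option is a one-sample t-statistic on daily PnL: the average day divided by its standard error. The sketch below also builds the cumulative PnL series you would plot. The function names and the sample figures are hypothetical, and the test assumes days are roughly independent, which real results may not be.

```python
import math
import statistics
from itertools import accumulate


def t_statistic(daily_pnl):
    """Mean daily PnL divided by its standard error.

    Needs at least two days of data with some variation between them.
    """
    n = len(daily_pnl)
    mean = statistics.fmean(daily_pnl)
    std_err = statistics.stdev(daily_pnl) / math.sqrt(n)
    return mean / std_err


def equity_curve(daily_pnl):
    """Running total of PnL, i.e. the line you would plot on a chart."""
    return list(accumulate(daily_pnl))


# Made-up daily results, for illustration only.
pnl = [12.0, -5.0, 8.0, 3.0, -2.0, 10.0, -7.0, 6.0]
print(equity_curve(pnl))
print(round(t_statistic(pnl), 2))
```

As a rule of thumb, a t-statistic above about 2 suggests the average is unlikely to be zero by chance alone, but only if the filters were not tuned on the same data.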

Dublin_Flyer · Fri Aug 29, 2014 8:05 pm

I'll second what Xitian says about overfitting. I've had, and am still having, long debates on another forum about the systems people use on Horseracebase.
Where do you draw the line between using a continuous history that has been profitable and just eliminating features like class, distance or track because they don't make pretty figures, so that you're backfitting yourself into a corner?
Personally I prefer to leave my systems as wide open as possible, so I'm not backfitting stats in the hope that future ones follow suit, unless there's something that really stands out, e.g. a 3/45 win rate in Beginners Chases and a 7/45 place rate in Beginners Chases, both for massive losses. I'm a gambler so I wouldn't be touching them, but if I was a trader they wouldn't be anywhere near in contention for winning, so I'd steer clear IR too.
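To get a feel for how much a figure like 3/45 can tell you on its own, a Wilson score interval gives a plausible range for the underlying rate. This is a sketch of the standard formula using 95% confidence (z = 1.96); the function name `wilson_interval` is made up here, and it only covers strike rates, not the size of the losses.

```python
import math


def wilson_interval(successes, runs, z=1.96):
    """Wilson score interval for a strike rate (z=1.96 is roughly 95%)."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = successes / runs
    denom = 1 + z ** 2 / runs
    centre = (p + z ** 2 / (2 * runs)) / denom
    half_width = z * math.sqrt(p * (1 - p) / runs + z ** 2 / (4 * runs ** 2)) / denom
    return centre - half_width, centre + half_width


win_low, win_high = wilson_interval(3, 45)      # win rate from the example
place_low, place_high = wilson_interval(7, 45)  # place rate from the example
print(f"win rate:   {win_low:.1%} to {win_high:.1%}")
print(f"place rate: {place_low:.1%} to {place_high:.1%}")
```

On 45 runs the win-rate interval comes out at roughly 2% to 18%, which is wide, so a stand-out figure on a sample this small is a prompt to look for a reason rather than proof on its own.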

In a nutshell, if you have valid, thoroughly thought-through reasons for including or excluding things in your trading, then you should be OK. If you're filtering willy-nilly for the big + or big -, then shit gets ugly.

Wyndon · Sat Aug 30, 2014 11:27 am

Agree with Xitian and Dublin Flyer. I'm developing an in-play automated strategy which seems to be working for All-Weather and the Jumps - but not for ordinary flat racing. What's really bugging me is that, before I exclude flat races altogether, I would like to have some intuitive reasoning for doing so.
