Application of data

mordenboy · Wed Jun 07, 2023 1:01 pm

Hello

I have a general question regarding applying historical data to future strategies.

Given long enough, do all odds-based trends regress to zero, or can some trends be assumed to continue?

For example (and these are just imagined, not genuine)
Backing every favourite in hunter chases over 2m5 for the previous 10 years yields a profit
Or
Backing every 3rd fav in 1m maidens for the previous 10 years yields a profit.

Would you assume that over time these will regress to zero, or that, for whatever reason, the implied chance underestimates the actual chance under these conditions and the trend will continue over the forthcoming years?

Trader724 · Wed Jun 07, 2023 4:46 pm

Your question is an interesting one and it touches on a fundamental issue in statistical analysis and prediction. In general the answer to your question is that it depends on the specific trend you are looking at and the context in which it occurs.

In some cases, trends can persist over long periods of time while in others they may eventually regress to the mean. Past performance is not always a reliable indicator of future results and there are many factors that can influence the outcome of any given event.

That being said, some strategies have been shown to be successful over long periods of time, such as value betting or following certain trainers or jockeys. However, it's important to keep evaluating and adjusting these strategies as new data arrives and circumstances change, to approach any strategy with a critical and analytical mindset, and to be willing to adapt.

ShaunWhite · Wed Jun 07, 2023 6:23 pm

Yep I agree with that, I'll just mention that whatever data you've got you'll find an edge in it unless you apply some way to eliminate back fitting. The usual way is to randomise your data and use half for searching and half for the proof. You need an "unseen" sample to test your idea against.

But generally simple things like traps, distances, trainers etc won't tend to persist and will revert back to the mean. There's a hell of a lot of people looking at that sort of thing and any market inefficiency is soon ironed out. What does persist though are the fundamentals that underpin any type of market, such as basic supply and demand.

But your own effect on the market shouldn't be underestimated. Even if you find a small something that others haven't, you're taking money from people (could be one or many). That might make them change their behaviour or refine what they're doing.... and you end up "shooting all the fish".

mordenboy · Wed Jun 07, 2023 9:24 pm

Thank you for the interesting replies. Great to get some insights.
The extent of my understanding begins to fail at the point of randomising data.

Below is a genuine data set, but for simplicity of example, let's say it's for winning favourites in every listed NH race in GB. No track, trainer, jockey etc, just betting position of fav.

Year     Number   P&L to £1
2014       24        5.61
2015       17       -1.87
2016       22       10.38
2017       24        8.74
2018       18        0.28
2019       21        7.78
2020       25       12.44
2021       16       -2.43
2022       22        2.26
2023       26        9.60
Total     215       52.79

So you can see, it looks like the odds available for favs in listed races are greater than would accurately represent their chance.
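As a rough first check on the figures above, the sketch below totals the table and works out the yield per bet, the number of profitable years and a crude t statistic on the ten yearly P&L figures. The t statistic is only one possible yardstick and is an assumption here: it treats the years as independent and comparable, and it cannot account for back fitting, so if this angle was chosen because it already looked good, the result will flatter it. Commission is not included.

```python
from math import sqrt
from statistics import mean, stdev

# Figures copied from the table above (favourites in listed NH races).
years = [2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023]
bets = [24, 17, 22, 24, 18, 21, 25, 16, 22, 26]
pnl = [5.61, -1.87, 10.38, 8.74, 0.28, 7.78, 12.44, -2.43, 2.26, 9.60]

total_bets = sum(bets)
total_pnl = sum(pnl)
yield_per_bet = total_pnl / total_bets  # profit per £1 staked, before commission
profitable_years = sum(1 for p in pnl if p > 0)
losing_years = [y for y, p in zip(years, pnl) if p <= 0]

# Crude one-sample t statistic on the yearly totals.
# ASSUMPTION: years are independent and comparable, which may not hold.
t_stat = mean(pnl) / (stdev(pnl) / sqrt(len(pnl)))

print(f"bets={total_bets} pnl={total_pnl:.2f} yield={yield_per_bet:.1%}")
print(f"profitable years={profitable_years}/{len(pnl)} losing={losing_years}")
print(f"t statistic on yearly P&L = {t_stat:.2f}")
```

With only ten yearly figures and around 20 bets a year, a handful of winners either way moves these numbers a lot, so treat the output as a prompt for further testing rather than proof.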

What would be the steps from here to decide if this was a trend likely to continue, or to expect some losing years eventually regressing to zero? Both could be advantageous to know, of course.

Many thanks

mordenboy · Wed Jun 07, 2023 9:36 pm

No matter how I lay the numbers on the screen, they appear like that when posted unfortunately. I'm sure you'll get the idea.

Trader724 · Wed Jun 07, 2023 10:50 pm

Shuffle your data randomly before splitting it into training and test sets, and try several different random splits. This helps to ensure that each set is representative and not biased towards any particular order or sequence.
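For anyone unsure what randomising looks like in practice, here is a minimal sketch. The bets are made up (each runner wins with exactly its implied chance, so there is no real edge in them), and the function names `make_fake_bets`, `profit` and `random_halves` are hypothetical, chosen only for this illustration.

```python
import random

def make_fake_bets(n, seed=0):
    """Made-up bets for illustration only: (decimal odds, won) pairs where
    each runner wins with exactly its implied chance, i.e. no real edge."""
    rng = random.Random(seed)
    bets = []
    for _ in range(n):
        odds = round(rng.uniform(1.5, 6.0), 2)
        won = rng.random() < 1.0 / odds
        bets.append((odds, won))
    return bets

def profit(bets):
    """P&L to £1 level stakes, backing every bet, no commission."""
    return sum((odds - 1.0) if won else -1.0 for odds, won in bets)

def random_halves(bets, seed):
    """Shuffle a copy and cut it in two: one half to search, one to prove."""
    shuffled = bets[:]
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

bets = make_fake_bets(2000)
for seed in range(5):
    search, proof = random_halves(bets, seed)
    print(f"split {seed}: search {profit(search):+.2f}  proof {profit(proof):+.2f}")
```

Because the fake bets carry no edge, any profit in either half is noise, which gives a feel for how far two halves of the same data can drift apart by luck alone. With real data the idea is to do all the searching on one half and look at the other half only once, when the rule is fixed.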

mordenboy wrote: ↑
Wed Jun 07, 2023 9:24 pm

What would be the steps from here to decide if this was a trend likely to continue, or, to expect some losing years eventually regressing to zero?

To figure out what's really going on you'll need to dig a little deeper. Take a closer look at the data and see if there are certain track conditions or distances that seem to give favorites an edge. You should also think about other things that might be affecting the trend like changes in the competition, rules, or market conditions.

ShaunWhite · Thu Jun 08, 2023 1:16 am

Another point to consider is something called statistical significance. If you were to toss a fair coin 10 times, you'll only get exactly 5 heads about 25% of the time. The point being that outcomes will rarely match the probability that they'll happen, but it doesn't necessarily mean the difference is meaningful. 4 heads and 6 heads are each about a 21% chance, so although 6/10 even money winners looks interesting, it's within the range of outcomes that aren't particularly unlikely, and therefore not significant. But 3 heads or 7 heads, now each of those is about a 12% chance, roughly 8/1 in old money, and starting to feel like something less likely to be occurring through chance.
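The coin figures can be checked with the binomial formula. The sketch below is a minimal illustration; `prob_heads` is a hypothetical helper name, not something from a library.

```python
from math import comb

def prob_heads(k, n=10, p=0.5):
    """Probability of exactly k heads in n tosses of a coin with P(head) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

for k in range(11):
    print(f"{k:2d} heads: {prob_heads(k):.1%}")

# Chance of landing at least 2 heads away from 5 in either direction
# (3 or fewer, or 7 or more) on a fair coin.
at_least_as_extreme = sum(prob_heads(k) for k in range(11) if abs(k - 5) >= 2)
print(f"3 or fewer, or 7 or more: {at_least_as_extreme:.1%}")
```

The last figure is included because significance tests usually ask how likely a result "this far out or further" is, rather than the chance of one exact count, so it comes out larger than the single-count percentages.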

On a practical note though, Excel makes FFWD betting easy, but handing over piles of money week after week in bullet time hoping to win a couple of grand by Christmas would be hard, and insane.

...Have you included commission?

ShaunWhite · Thu Jun 08, 2023 1:51 am

Trader724 wrote: ↑
Wed Jun 07, 2023 10:50 pm
Shuffle your data randomly before splitting it into training and test and use different random splits.

And so starts the emotional rollercoaster, training data is fantastic and full of huge golden nuggets, test data is dog dirt where nothing works and dreams die.

Atho55 · Thu Jun 08, 2023 8:29 am

Here is something to ponder over. Taking Listed races from 2010 to 2022 and the Sum and Count of Win/Lose, you can compare the expected number of wins v the actual number of wins at each BSP. The BSPs are passed through =FLOOR(), which groups the values into small increments. This is a portion of the data table. So at odds of 2 (highlighted yellow), 78 started, the expectation is 50% to win and actual wins = 33, and so on down the list, with the difference shown in the Diff column. Doing a cumulative on the Diff values shows that it bottoms out at odds 3.2, so possibly nothing to be gained by betting below those odds. The small Pivot table in the centre picks out the diff values >=2. Plotting out expected v actual looks like this below.

Listed 1.png

Listed 2.png
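For anyone without the spreadsheet, the expected v actual comparison can be sketched in a few lines. This is an illustration under stated assumptions, not the workbook behind the screenshots: the 0.2 bucket step is a guess at the FLOOR increment, the sample runners are made up, the function names are hypothetical, and expected wins are taken as the sum of 1/BSP with no allowance for commission.

```python
from collections import defaultdict
from math import floor

def bucket(bsp, step=0.2):
    """Round BSP down to the nearest step, like Excel's =FLOOR(bsp, step)."""
    return round(floor(bsp / step + 1e-9) * step, 2)

def expected_vs_actual(runners, step=0.2):
    """runners: iterable of (bsp, won). Returns rows of
    (bucket, count, expected wins, actual wins, diff, cumulative diff)."""
    stats = defaultdict(lambda: [0, 0.0, 0])
    for bsp, won in runners:
        row = stats[bucket(bsp, step)]
        row[0] += 1          # runners in this bucket
        row[1] += 1.0 / bsp  # expected wins = sum of implied chances
        row[2] += int(won)   # actual wins
    rows, running = [], 0.0
    for b in sorted(stats):
        count, expected, actual = stats[b]
        diff = actual - expected
        running += diff
        rows.append((b, count, round(expected, 2), actual,
                     round(diff, 2), round(running, 2)))
    return rows

# Made-up runners purely to show the shape of the output.
sample = [(2.0, True), (2.0, False), (2.1, False),
          (3.2, True), (3.3, False), (4.0, False)]
for row in expected_vs_actual(sample):
    print(row)
```

As a sanity check against the table described above, 78 runners at an expectation of about 50% implies roughly 39 expected wins, so 33 actual wins would be a Diff of about -6 for that row.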

mordenboy · Thu Jun 08, 2023 2:48 pm

That's really helpful

I've tested at different SPs from 1.01 to 6 at this point, and at every increment the actual wins exceed the expected wins.
I then tested only using every other day, across the years, then a random sample of months through the years and the actual always exceeds the expected.

ShaunWhite · Thu Jun 08, 2023 5:23 pm

mordenboy wrote: ↑
Thu Jun 08, 2023 2:48 pm
That's really helpful

I've tested at different SPs from 1.01 to 6 at this point, and at every increment the actual wins exceed the expected wins.
I then tested only using every other day, across the years, then a random sample of months through the years and the actual always exceeds the expected.

If it always exceeds it then that's illogical and there has to be an error in your workings out. Remember the #1 rule in any analysis is to prove yourself to be wrong, not right. If not you get all sorts of confirmation bias issues.

Trader724 · Thu Jun 08, 2023 6:03 pm

The value of a golden nugget is tied to its scarcity. Obtaining one will require prospecting, digging, and sifting through large amounts of dirt. Maybe even dog dirt. You never know.

mordenboy · Thu Jun 08, 2023 7:05 pm

ShaunWhite wrote: ↑
Thu Jun 08, 2023 5:23 pm

mordenboy wrote: ↑
Thu Jun 08, 2023 2:48 pm
That's really helpful

I've tested at different SPs from 1.01 to 6 at this point, and at every increment the actual wins exceed the expected wins.
I then tested only using every other day, across the years, then a random sample of months through the years and the actual always exceeds the expected.

If it always exceeds it then that's illogical and there has to be an error in your workings out. Remember the #1 rule in any analysis is to prove yourself to be wrong, not right. If not you get all sorts of confirmation bias issues.

I should have said, this was the SPs of the fav, not all runners. For the 2nd, 3rd and 5th favs, the actual was under the expected. The 4th fav also exceeded the expected.

ajanthony · Mon Jan 01, 2024 6:25 am

Hello,
My question is not the same as the previous but it is to do with statistics.
How many races is enough to believe the edge you have is valid? I am currently collecting data which shows that after 2500 races I have an edge of just over 1%. I intend to sample 10,000 races. Is this enough?
My strike rate, or should I say, my positive return rate (as I lay, not back) is 67% and this has remained constant right from the start.
Any insight appreciated.
Regards,
Anthony

firlandsfarm · Mon Jan 01, 2024 11:07 pm

ajanthony wrote: ↑
Mon Jan 01, 2024 6:25 am
Hello,
My question is not the same as the previous but it is to do with statistics.
How many races is enough to believe the edge you have is valid? I am currently collecting data which shows that after 2500 races I have an edge of just over 1%. I intend to sample 10,000 races. Is this enough?
My strike rate, or should I say, my positive return rate (as I lay, not back) is 67% and this has remained constant right from the start.
Any insight appreciated.
Regards,
Anthony

OK, this is the answer you probably don't want to hear! Enough is never enough. As the other contributors have said, the problem is you could have data on an infinite number of races and the next race is when the market swallows your edge. Also, the more data you have, the older the earliest of it is, so it may be of diminished value.
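Leaving the market-moving caveat aside, the pure sampling arithmetic can be sketched as below. It rests on assumptions that are not in the question: the 1% edge is read as average return per unit staked, bets are treated as independent, and the per-bet standard deviation `sd` of 1.0 is a placeholder to be replaced with the figure from your own records. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_score(edge, sd, n_bets):
    """How many standard errors the observed edge sits above zero."""
    return edge * sqrt(n_bets) / sd

def bets_needed(edge, sd, confidence=0.95):
    """Bets needed for a two-sided interval at `confidence` to exclude zero,
    treating bets as independent with a stable per-bet standard deviation."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z * sd / edge) ** 2)

# ASSUMPTION: sd = 1.0 units per unit staked is a placeholder only; use the
# standard deviation of your own per-bet returns.
edge, sd = 0.01, 1.0
for n in (2500, 10000):
    print(f"{n} bets: z = {z_score(edge, sd, n):.2f}")
print("bets needed:", bets_needed(edge, sd))
```

The required sample grows with the square of sd/edge, so halving the per-bet variability cuts the number of bets needed to a quarter, while a small edge on volatile returns can need far more than 10,000 races.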
