Hi all,
Anyone know what an adequate sample size of race finish times is in order to calculate a good avg time?
Per course, distance, going, etc., separated by Flat & AW. Might do NH as well just for the sake of it, but mainly interested in trips up to 1m.
Going to scrape it, but when talking about 5f, 6f, 7f, 8f, at every course, at all goings... gonna need years' worth of data, methinks.
I was guesstimating 100 samples each? Not enough or overkill?
Sample size for calculating avg Fin.Times
- ShaunWhite
- Posts: 9731
- Joined: Sat Sep 03, 2016 3:42 am
Could you do it live from TPD Pars?
Or there's some info here <https://medium.com/@paulingliam/using-m ... 40776536e4> from someone who's fairly smart
But if neither works for you then basically just plot what you can, draw trendlines through it and turn those into equations to approximate the missing data.
On the distances you should be able to draw a curve through your sample times (a scatter chart with time on the Y axis and distance on the X) from maybe as few as a couple of hundred per course if they span a decent range of distances. Turning that chart of times into something useful for automation means describing it with a formula...
Add an Excel 2nd-order polynomial trendline through your 'normal' going distances. It's a curve because long races are slower per furlong, so you can't just use a linear trend or straight s/f. Display the trendline equation and note the factors.
Once you have those 3 factors, you can insert X as the race distance in furlongs and it will give you the estimated race duration.
E.g. for Chester I ended up with 0.19, 10.2 and -2 as my 3 factors.
Insert that into a polynomial equation for a race distance of 16f and you get...
=(0.19 * 16^2 ) + (10.2 * 16) + (- 2) = 210s
Substituting with the factors for Epsom (0.28, 10 & -12)
=(0.28 * 16^2 ) + (10 * 16) + (- 12) = 220s
and a 3½ miler would be
=(0.28 * 28^2 ) + (10 * 28) + (- 12) = 488s (far more than twice as long but not even twice as far)
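For anyone who'd rather do that outside Excel, here is a minimal Python sketch of the same arithmetic. The three factors per course are the ones quoted above; the dictionary and function names are just illustrative, not anything from a real library.

```python
# Quadratic par-time curve: time = a*d^2 + b*d + c, with d in furlongs.
# Coefficients are the ones quoted in the post; names are hypothetical.
FACTORS = {
    "Chester": (0.19, 10.2, -2.0),
    "Epsom": (0.28, 10.0, -12.0),
}


def estimated_duration(course: str, furlongs: float) -> float:
    """Return the estimated race duration in seconds for a course and trip."""
    a, b, c = FACTORS[course]
    return a * furlongs ** 2 + b * furlongs + c


for course, furlongs in [("Chester", 16), ("Epsom", 16), ("Epsom", 28)]:
    print(f"{course} {furlongs}f: {estimated_duration(course, furlongs):.0f}s")
```

This reproduces the 210s, 220s and 488s figures from the worked examples.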
Make sure you err on the fast side; it's better to think a race is done when it's not than to try to be betting when they've finished. You've got to be careful with trendlines though; there's a set of datasets called Anscombe's Quartet that illustrates the problem. Sample size will be whatever it takes until a reliable-looking trend is forming, vague but true.
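If you want the trendline factors without Excel, a rough sketch of the fit is below. It is an ordinary least-squares quadratic fit written in plain Python, which may not match Excel's numbers digit for digit. The sample times are made up (generated from the Chester factors above so the result is easy to check), and `fit_quadratic` is a hypothetical helper name. Looking at the residuals is one cheap guard against the Anscombe's Quartet problem, though plotting the data is still the better check.

```python
def fit_quadratic(distances, times):
    """Least-squares fit of time = a*d^2 + b*d + c; returns (a, b, c)."""
    # Normal equations (X^T X) beta = X^T y for the columns d^2, d, 1.
    rows = [[d * d, d, 1.0] for d in distances]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(3)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    m = [ata[i] + [aty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (m[i][3] - tail) / m[i][i]
    return tuple(beta)


# Made-up sample: times generated exactly from 0.19*d^2 + 10.2*d - 2.
distances = [5, 6, 7, 8, 10, 12, 14, 16]
times = [0.19 * d * d + 10.2 * d - 2 for d in distances]

a, b, c = fit_quadratic(distances, times)
residuals = [t - (a * d * d + b * d + c) for d, t in zip(distances, times)]
print(f"a={a:.3f} b={b:.3f} c={c:.3f}")
print(f"largest residual: {max(abs(r) for r in residuals):.4f}s")
```

With real scraped times the residuals won't be near zero; a few large ones, or a pattern in them, would be the hint that the curve isn't describing the course well.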
I'm not able to share any more than that; what I have is very much based on a huge amount of work done by someone else and it wouldn't be right to just disseminate it.
At least a thousand (law of large numbers, to avoid the effects of the "law of small numbers"), and use the median rather than the mean (for robustness).
https://effectiviology.com/law-of-small-numbers/
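A quick illustration of why the median is the more robust choice, using Python's standard `statistics` module. The finish times are made up; the last one stands in for a falsely run or badly recorded race.

```python
from statistics import mean, median

# Made-up 5f finish times in seconds; the last value is an outlier.
times = [59.8, 60.1, 60.4, 60.6, 61.0, 61.3, 74.9]

print(f"mean:   {mean(times):.2f}s")    # pulled upwards by the outlier
print(f"median: {median(times):.2f}s")  # stays with the bulk of the sample
```

One odd race out of seven drags the mean up by about two seconds, while the median stays at 60.6s.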
- ShaunWhite
- Posts: 9731
- Joined: Sat Sep 03, 2016 3:42 am
-
- Posts: 1067
- Joined: Fri Nov 20, 2015 9:38 am
Really need to look at the variance associated with the sample in order to understand your mean.
And probably want to split samples by race grade & conditions. And take out outliers...
==> a random sample of c5% say of all races in past year, split by above criteria should give you a pretty good estimate (taking account of variance)
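To put a rough number on "taking account of variance", here is a sketch of the usual standard-error arithmetic. It assumes the times in one bucket (same course, trip, going, class) are roughly independent and normally distributed, which real race times may well not be, and the 1.5s spread is a made-up figure, so treat the output as a ballpark only.

```python
from math import ceil, sqrt
from statistics import stdev


def standard_error(times):
    """Standard error of the mean for a sample of finish times."""
    return stdev(times) / sqrt(len(times))


def samples_needed(sd, margin, z=1.96):
    """Smallest n with z*sd/sqrt(n) <= margin (about 95% confidence for z=1.96)."""
    return ceil((z * sd / margin) ** 2)


# Assumed spread of 1.5s within one bucket (hypothetical figure).
print(samples_needed(sd=1.5, margin=0.3))  # mean pinned down to about +/-0.3s
print(samples_needed(sd=1.5, margin=0.1))  # mean pinned down to about +/-0.1s
```

Under those assumptions roughly 100 samples gets the average to within about 0.3s, and getting to within 0.1s takes closer to 900, so the answer depends on how tight you need it and how noisy each bucket really is.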
Median solves this problem.
- Brovashift
- Posts: 475
- Joined: Tue May 18, 2021 12:35 am
Wow.... here's me thinking I'm being really clever. Apparently not
I was just thinking of collecting an adequate number of samples per 'criteria' and getting the avg time simply by avgTime = totalTime / numberOfSamples, so when reviewing a selection's finish times I can see if they are above or below the course avg finish times.
I'm not automating anything, I'm laying in play, so just looking for weak / slow runners.
I'm using a Python web scraper to scrape today's racecard and historical data, which is saved as csv. Then I've created a userform that pulls in the data, and just makes my life easier and allows me to analyse the racecards faster. I just want to display course avg time (by trip, going, class, type) in a label on the form when racecard selected.
I currently use the old pen and paper method to identify my selections, but in all honesty it hurts my brain after a day's trading, so I am writing some logic code that will do it at the click of my mouse. That way I should only have to dive deep into my psyche once so I can focus on watching my selections and reading the race. That's the plan anyway
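A minimal sketch of the per-criteria average described above, using only the standard library. The CSV column names (`course`, `distance_f`, `going`, `race_class`, `time_s`) and the sample rows are assumptions for illustration, not the layout of any real scrape; the sample count is kept alongside the mean and median so thin buckets are easy to spot.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, median

# Made-up rows standing in for the scraped historical CSV.
SAMPLE_CSV = """course,distance_f,going,race_class,time_s
Chester,5,Good,4,60.2
Chester,5,Good,4,61.0
Chester,5,Good,4,60.6
Chester,5,Soft,4,63.1
Epsom,5,Good,2,55.4
"""


def par_times(csv_file):
    """Group finish times by (course, trip, going, class) and summarise each."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        key = (row["course"], float(row["distance_f"]), row["going"], row["race_class"])
        groups[key].append(float(row["time_s"]))
    return {
        key: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
        for key, vals in groups.items()
    }


pars = par_times(io.StringIO(SAMPLE_CSV))
for key, stats in pars.items():
    print(key, stats["n"], round(stats["mean"], 2), stats["median"])
```

For a real file you would pass `open("history.csv", newline="")` (a hypothetical filename) instead of the `io.StringIO` wrapper, and look up the key for the selected racecard to fill the label.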
- Brovashift
- Posts: 475
- Joined: Tue May 18, 2021 12:35 am
Mode, median, and mean??? I was just reading about this in Statistics Without Tears
Wouldn't that method be thwarted by hills? At Epsom, the first 4 furlongs from the 5-furlong start are downhill, resulting in about the fastest 5 furlongs in the world, whereas the first 4 furlongs of the Derby course rise over 100 feet.
ShaunWhite wrote: ↑Fri May 19, 2023 2:07 am
Could you do it live from TPD Pars?
Screenshot_3.jpg
Forget about mode. I suggested median cause it helps ignore outliers that can influence mean dramatically.
Brovashift wrote: ↑Fri May 19, 2023 11:52 am
Mode, median, and mean??? I was just reading about this in Statistics Without Tears
- Brovashift
- Posts: 475
- Joined: Tue May 18, 2021 12:35 am
Thanks for this Athos, appreciated... If only it had 'going' and 'class' as well. I think without those times are going to be way off if GF vs Yielding, or Class 5 vs Class 1 runners.
Can I ask where you found this... are there any others??
- ShaunWhite
- Posts: 9731
- Joined: Sat Sep 03, 2016 3:42 am
If the 5f races all start at the same place, it's in the numbers. In fact Epsom is the one with the greatest differences per furlong for that very reason. But you've highlighted why an across-the-board time per furlong is no use.
Tbh TPD renders most of this unnecessary these days.
- ShaunWhite
- Posts: 9731
- Joined: Sat Sep 03, 2016 3:42 am
What sort of accuracy are you expecting/need? The variation is huge, even with going and class, horses don't like running into driving rain so how about weather and wind. And what about if all the horses like or dislike the going? 8 horses who hate it will plod while 8 mudlarks will romp home. And what about 2 runners vs 20?
Brovashift wrote: ↑Fri May 19, 2023 1:00 pm
.. If only it had 'going' and 'class' as well. I think without those times are going to be way off if GF vs Yielding, or Class 5 vs Class 1 runners.
Unless you've got a killer strategy you're looking to improve, it's a hell of a lot of work for speculative benefit.
If you are looking for a point to pitch in your bets during the race, then this article intimates that a horse can go flat out for about 20s. Taking the times from the table then subtracting 20s might be a reasonable starting point.
https://www.ukgamblingsites.com/sports- ... orses-run/
If you want to go into the detail of race class, going etc look out for a copy of Mordin on Time which has tables of allowances for most things