Sample size for calculating avg Fin.Times

Brovashift · Thu May 18, 2023 11:31 pm

Hi all,

Does anyone know what an adequate sample size of race finish times is for calculating a good avg time,
per course, distance, going, etc.?

Separated by Flat & AW. Might do NH as well just for the sake of it, but mainly interested in trips up to 1m.

Going to scrape it, but when talking about 5f, 6f, 7f, 8f, at every course, at all goings... gonna need years' worth of data, methinks.
I was guesstimating 100 samples each? Not enough or overkill?

ShaunWhite · Fri May 19, 2023 2:07 am

Could you do it live from TPD Pars?

Or there's some info here <https://medium.com/@paulingliam/using-m ... 40776536e4> from someone who's fairly smart

But if neither works for you then basically just plot what you can, draw trendlines through it and turn those into equations to approximate the missing data.

On the distances, you should be able to draw a curve through your sample times (a scatter chart with time on the Y axis and distance on the X axis) from maybe as few as a couple of hundred per course, if they span a decent range of distances. Turning that chart of times into something useful for automation means describing it with a formula.

Add an Excel 2nd order polynomial trendline through your 'normal' going distances. It's a curve because long races are slower per furlong, so you can't just use a linear trend or a straight seconds-per-furlong figure. Display the trendline equation and note the factors.
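For anyone who would rather do the trendline step in Python than in Excel, here is a minimal sketch of a least-squares fit for a three-factor curve of the form time = a*d^2 + b*d + c, using only the standard library. The function name `fit_quadratic` and the sample points are made up for illustration (the times are generated from a known curve, not real race data), and it needs at least three different distances to work.

```python
def fit_quadratic(distances, times):
    """Least-squares fit of time = a*d**2 + b*d + c. Returns (a, b, c)."""
    # Power sums needed for the 3x3 normal equations.
    s = [sum(d ** k for d in distances) for k in range(5)]
    t = [sum(y * d ** k for d, y in zip(distances, times)) for k in range(3)]
    # Augmented matrix with the unknowns ordered (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Hypothetical sample: distances in furlongs, times in seconds generated
# from a known curve so the fit can be checked against it.
sample_distances = [5, 6, 7, 8, 10, 12, 16]
sample_times = [0.2 * d ** 2 + 10.0 * d - 3.0 for d in sample_distances]

a, b, c = fit_quadratic(sample_distances, sample_times)
print(f"time = {a:.3f}*d^2 + {b:.3f}*d + {c:.3f}")
```

With real, noisy times the factors will not come back this cleanly, so it is worth plotting the fitted curve over the scatter before trusting it.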

Once you have those 3 factors, you can insert X as the race distance in furlongs and it will give you the estimated race duration.

E.g. for Chester I ended up with 0.19, 10.2 and -2 as my 3 factors.
Insert those into a polynomial equation for a race distance of 16f and you get...
=(0.19 * 16^2) + (10.2 * 16) + (-2) = 210s

Substituting the factors for Epsom (0.28, 10 & -12)
=(0.28 * 16^2) + (10 * 16) + (-12) = 220s

and a 3½ miler would be
=(0.28 * 28^2) + (10 * 28) + (-12) = 488s (far more than twice as long but not even twice as far)
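
The same arithmetic as a small Python sketch, in case it helps to have it outside a spreadsheet. The function name `estimate_duration` is just an illustrative choice; the factors are the Chester and Epsom ones quoted above, and the printed values round to the same 210s, 220s and 488s.

```python
def estimate_duration(distance_f, factors):
    """Evaluate a*d**2 + b*d + c for a distance in furlongs; returns seconds."""
    a, b, c = factors
    return a * distance_f ** 2 + b * distance_f + c


# Factors quoted above for each course, ordered (a, b, c).
CHESTER = (0.19, 10.2, -2)
EPSOM = (0.28, 10, -12)

for name, factors, distance_f in [
    ("Chester", CHESTER, 16),
    ("Epsom", EPSOM, 16),
    ("Epsom", EPSOM, 28),  # 3.5 miles = 28 furlongs
]:
    print(f"{name} {distance_f}f: {estimate_duration(distance_f, factors):.0f}s")
```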

Make sure you err on the fast side: it's better to think a race is done when it's not than to be trying to bet when they've finished. You've got to be careful with trendlines though; there's a set of four datasets called Anscombe's Quartet that illustrates the problem. Sample size will be whatever looks like a reliable trend is forming. Vague but true.
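
To put numbers on the Anscombe's Quartet warning, here is a short sketch that fits a straight line to each of the four sets. The helper `linear_fit` is an illustrative name, and the values are the published quartet (Anscombe, 1973). All four come out at roughly y = 3 + 0.5x even though the scatter plots look nothing alike, which is why the trendline equation alone can't tell you whether the fit is sensible.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Anscombe's quartet: sets I to III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (
        [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
    ),
}

fits = {name: linear_fit(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, (slope, intercept) in fits.items():
    print(f"set {name}: y = {intercept:.2f} + {slope:.2f}x")
```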

Screenshot_3.jpg

.
.
I'm not able to share any more than that; what I have is very much based on a huge amount of work done by someone else and it wouldn't be right to just disseminate it.

napshnap · Fri May 19, 2023 6:55 am

At least a thousand (law of large numbers, to avoid the effects of the "law of small numbers"), and use the median rather than the mean (for robustness).

https://effectiviology.com/law-of-small-numbers/

Atho55 · Fri May 19, 2023 8:29 am

This might save you a bit of time but no idea who created it.

AverageRaceTimes.zip

ShaunWhite · Fri May 19, 2023 10:39 am

napshnap wrote: ↑
Fri May 19, 2023 6:55 am
At least a thousand

There's only about 10,000 races a year. With the number of courses, goings, distances and codes, 1000 of each would be an eternity. Normal sample sizing goes out of the window.

sionascaig · Fri May 19, 2023 11:11 am

Really need to look at the variance associated with the sample in order to understand your mean.

And probably want to split samples by race grade & conditions. And take out outliers...

==> a random sample of, say, c. 5% of all races in the past year, split by the above criteria, should give you a pretty good estimate (taking account of variance)
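
One way to tie variance to sample size is the standard error of the mean. Below is a rough sketch, assuming finish times within a group are roughly normal and independent, which is a simplification. The function names, the example times and the 2s spread / 0.5s margin are all made up for illustration, and z = 1.96 corresponds to about 95% confidence under that assumption.

```python
import math
from statistics import mean, stdev


def standard_error(times):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(times) / math.sqrt(len(times))


def samples_needed(spread_s, margin_s, z=1.96):
    """Approximate n so the mean is within +/- margin_s (normal approximation)."""
    return math.ceil((z * spread_s / margin_s) ** 2)


# Hypothetical finish times (seconds) for one course/distance/going group.
times = [60.2, 61.0, 59.6, 60.8, 62.1, 59.9, 60.5, 61.4]
print(f"mean = {mean(times):.2f}s, standard error = {standard_error(times):.2f}s")

# If times in a group typically vary by about 2s and you want the average
# pinned down to within half a second:
print(samples_needed(2.0, 0.5))  # -> 62
```

The point is that the number of samples you need depends on how spread out the times are and how tight you need the average to be, not on a fixed figure like 100 or 1000.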

napshnap · Fri May 19, 2023 11:28 am

sionascaig wrote: ↑
Fri May 19, 2023 11:11 am
... And take out outliers...
...

Median solves this problem.

Brovashift · Fri May 19, 2023 11:49 am

Wow.... here's me thinking I'm being really clever. Apparently not

I was just thinking of collecting an adequate number of samples per 'criteria' and getting the avg time simply by avgTime = totalTime / numberOfSamples, so when reviewing a selection's finish times I can see if they are above or below the course avg finish times.

I'm not automating anything, I'm laying in play, so just looking for weak / slow runners.
I'm using a Python web scraper to scrape today's racecard and historical data, which is saved as csv. Then I've created a userform that pulls in the data, and just makes my life easier and allows me to analyse the racecards faster. I just want to display course avg time (by trip, going, class, type) in a label on the form when racecard selected.
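
A minimal sketch of that per-criteria average in Python, assuming the scraped CSV has one row per race. The column names (`course`, `distance_f`, `going`, `race_class`, `time_s`) and the sample rows are hypothetical, so rename them to match whatever the scraper actually writes. It also returns the count and median per group, so a thin group is easy to spot.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, median

# Hypothetical column names; change these to match the real CSV header.
KEY_COLUMNS = ("course", "distance_f", "going", "race_class")


def summarise_times(rows, key_columns=KEY_COLUMNS, time_column="time_s"):
    """Group finish times by the key columns; return n, mean and median per group."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        groups[key].append(float(row[time_column]))
    return {
        key: {"n": len(times), "mean": mean(times), "median": median(times)}
        for key, times in groups.items()
    }


# Made-up rows standing in for the scraped CSV file.
SAMPLE_CSV = """course,distance_f,going,race_class,time_s
Chester,5,Good,4,61.0
Chester,5,Good,4,62.0
Chester,5,Good,4,63.0
Chester,7,Soft,5,90.5
"""

summary = summarise_times(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for key, stats in summary.items():
    print(key, stats)
```

To read a real file, swap the `io.StringIO(...)` part for an open file handle, e.g. `open(path, newline="")`.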
I currently use the old pen and paper method to identify my selections, but in all honesty it hurts my brain after a day's trading, so I'm writing some logic code that will do it at the click of my mouse. That way I should only have to dive deep into my psyche once, so I can focus on watching my selections and reading the race. That's the plan anyway.

Brovashift · Fri May 19, 2023 11:52 am

Mode, median, and mean??? I was just reading about this in Statistics Without Tears

Derek27 · Fri May 19, 2023 11:57 am

ShaunWhite wrote: ↑
Fri May 19, 2023 2:07 am
...
But if neither works for you then basically just plot what you can, draw trendlines through it and turn those into equations to approximate the missing data.
...

Wouldn't that method be thwarted by hills? At Epsom, the first 4 furlongs from the 5-furlong start are downhill, resulting in about the fastest 5 furlongs in the world, whereas the first 4 furlongs of the Derby course rise over 100 feet.

napshnap · Fri May 19, 2023 12:34 pm

Brovashift wrote: ↑
Fri May 19, 2023 11:52 am
Mode, median, and mean??? I was just reading about this in Statistics Without Tears

Forget about mode. I suggested median cause it helps ignore outliers that can influence mean dramatically.
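
A quick illustration with made-up times (seconds), where the last value stands in for a freak result such as a race run at a crawl:

```python
from statistics import mean, median

# Hypothetical finish times; 95.0 is the outlier.
times = [59.8, 60.1, 60.4, 60.9, 95.0]

print(f"mean   = {mean(times):.2f}s")    # mean   = 67.24s
print(f"median = {median(times):.2f}s")  # median = 60.40s
```

One odd race drags the mean nearly 7 seconds away from the typical time, while the median barely notices it.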

Brovashift · Fri May 19, 2023 1:00 pm

Atho55 wrote: ↑
Fri May 19, 2023 8:29 am
This might save you a bit of time but no idea who created it.

AverageRaceTimes.zip

Thanks for this Athos, appreciated... If only it had 'going' and 'class' as well. I think without those, the times are going to be way off for GF vs Yielding, or Class 5 vs Class 1 runners.

Can I ask where you found this... are there any others?

ShaunWhite · Fri May 19, 2023 2:32 pm

Derek27 wrote: ↑
Fri May 19, 2023 11:57 am
wouldn't that method be thwarted by hills? At Epsom, the first 4 furlongs of the 5-furlong start is downhill resulting in about the fastest 5 furlongs in the world, whereas the first 4 furlongs of the Derby course rises over 100 feet.

If the 5f races all start at the same place, it's in the numbers. In fact Epsom is the one with the greatest differences per furlong for that very reason. But you've highlighted why an across-the-board time per furlong is no use.

Tbh TPD renders most of this unnecessary these days.

ShaunWhite · Fri May 19, 2023 2:40 pm

Brovashift wrote: ↑
Fri May 19, 2023 1:00 pm
.. If only it had 'going' and 'class' as well. I think without those times are going to be way off if GF vs Yielding, or Class 5 vs Class 1 runners.

What sort of accuracy are you expecting/need? The variation is huge, even with going and class, horses don't like running into driving rain so how about weather and wind. And what about if all the horses like or dislike the going? 8 horses who hate it will plod while 8 mudlarks will romp home. And what about 2 runners vs 20?

Unless you've got a killer strategy you're looking to improve, it's a hell of a lot of work for speculative benefit.

Atho55 · Fri May 19, 2023 3:08 pm

If you are looking for a point to pitch in your bets during the race, then this article intimates that a horse can go flat out for about 20s. Taking the times from the table and then subtracting 20s might be a reasonable starting point.

https://www.ukgamblingsites.com/sports- ... orses-run/

If you want to go into the detail of race class, going etc look out for a copy of Mordin on Time which has tables of allowances for most things
