Building a model

gazuty · Mon Oct 18, 2021 11:01 pm

I came across this through a recommendation from a person who runs a 24/7 horse and greys model.

https://www.analyticsvidhya.com/blog/20 ... l-project/

Great step-by-step guidance for those who are inclined. While some maths and a bit of programming will help you, this stuff can all easily be learned. You can literally follow YouTube videos and other free courses on Python and clip and chop and borrow and take from all sorts of resources to build up your own work.

For those who want to learn Python (and yes there are many sites and there are many recommendations) - see https://www.kaggle.com/

Obviously building a model is a different branch and stream of thought from trading but nevertheless it's all interesting and worth exploring (in my humble opinion).

foxwood · Tue Oct 19, 2021 9:15 am

gazuty wrote: ↑
Mon Oct 18, 2021 11:01 pm
I came across this through a recommendation from a person who runs a 24/7 horse and greys model.

https://www.analyticsvidhya.com/blog/20 ... l-project/

Great step-by-step guidance for those who are inclined. While some maths and a bit of programming will help you, this stuff can all easily be learned. You can literally follow YouTube videos and other free courses on Python and clip and chop and borrow and take from all sorts of resources to build up your own work.

For those who want to learn Python (and yes there are many sites and there are many recommendations) - see https://www.kaggle.com/

Obviously building a model is a different branch and stream of thought from trading but nevertheless it's all interesting and worth exploring (in my humble opinion).

Good article with loads of useful learning stuff covered in there - some of the ideas might transfer to other sports like soccer leagues which, like IPL, have a limited number of teams competing together. The gotcha with those however is the promotions / demotions between divisions where the stats of the incomers are not like-for-like with the core teams.

More interesting is the comment about running models on horses and greys - any more info available on that?

I spent a lot of time several years ago working on ML models for UK horse racing. Sadly I never found the magic silver bullet that would stand up to random verification tests.

jimibt · Tue Oct 19, 2021 10:18 am

foxwood wrote: ↑
Tue Oct 19, 2021 9:15 am

Good article with loads of useful learning stuff covered in there - some of the ideas might transfer to other sports like soccer leagues which, like IPL, have a limited number of teams competing together. The gotcha with those however is the promotions / demotions between divisions where the stats of the incomers are not like-for-like with the core teams.

More interesting is the comment about running models on horses and greys - any more info available on that?

I spent a lot of time several years ago working on ML models for UK horse racing. Sadly I never found the magic silver bullet that would stand up to random verification tests.

likewise spent a lot of time pulling together ML models for horse racing (pre-race mostly). I never found the golden ticket, in terms of classification, features or labels and I must have tried most of the OBVIOUS ones (to name a few - venue, runner gender, book% at 1 second, 10 seconds, 30, 60, jockey win%, runner win%, %age money on runner, logical order of favouritism etc, etc [further extended list of features tried on various alternative models!!]). training the model would without exception produce an amazing outcome for the 80% sample. then, the test against the 20% non-sampled data would falter around the mid line.
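for anyone who hasn't hit this yet, here's a minimal sketch of the "amazing on the 80%, mid line on the 20%" pattern. it is NOT my racing model - the features and labels are pure random noise (an assumption made purely for illustration) and the "model" is a plain 1-nearest-neighbour written with the standard library, so it can memorise the training rows but has nothing real to learn.

```python
import random

def make_rows(n, n_features, rng):
    # Pure-noise features with a coin-flip label: there is no real signal here.
    return [([rng.random() for _ in range(n_features)], rng.randint(0, 1))
            for _ in range(n)]

def predict_1nn(train, x):
    # 1-nearest-neighbour: copy the label of the closest training row.
    nearest = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return nearest[1]

def accuracy(train, rows):
    # Fraction of rows whose predicted label matches the actual label.
    hits = sum(predict_1nn(train, x) == y for x, y in rows)
    return hits / len(rows)

rng = random.Random(42)
rows = make_rows(500, 10, rng)

# 80/20 split, as described in the post above.
split = int(len(rows) * 0.8)
train, test = rows[:split], rows[split:]

train_acc = accuracy(train, train)  # each row is its own nearest neighbour
test_acc = accuracy(train, test)    # unseen rows: nothing to generalise from

print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

the training score is perfect by construction, while the held-out score should sit somewhere near a coin flip. a big gap like that is a sign the model is memorising rather than learning, whatever the features are.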

I know in my case, it probably was a case of inexperience but am also aware that the choice of features is crucial to obtaining a decent outcome. you only have to look at some of the examples on the web of species identification to see that a set of simple features produces a reasonable outcome. the problem arises when you have very dynamic parameters that can vary race to race and even the influence of a single runner within a mix of competitors.

all good stuff tho - and something i will gravitate towards again at some point.

sionascaig · Tue Oct 19, 2021 11:45 am

I had something on the go & was expecting a margin of c. 9%. After 1000 races the margin was 1% (net).

Passed the actual data to a stats expert and he came back with:

- on 1000 races you can be 60% confident on getting a 1% margin
- would need 100,000 races covered to get to a 95% confidence level on the 1% margin

==> as the criteria only identifies 1000 races pa, it could take a while!
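For a rough feel of why the numbers blow up like that, here is a back-of-envelope sketch using a normal approximation. It is not the expert's calculation: the per-bet standard deviation (`STDEV`) is a made-up value and the function names are hypothetical, so the outputs will not necessarily match the figures above.

```python
from statistics import NormalDist

def races_needed(margin, stdev_per_bet, confidence):
    # One-sided z-score for the chosen confidence level.
    z = NormalDist().inv_cdf(confidence)
    # The standard error shrinks with sqrt(n), so n grows with the square.
    return (z * stdev_per_bet / margin) ** 2

def confidence_after(n_races, margin, stdev_per_bet):
    # One-sided confidence that the observed margin is not just noise around zero.
    z = margin * (n_races ** 0.5) / stdev_per_bet
    return NormalDist().cdf(z)

STDEV = 1.0  # hypothetical: return per unit staked varies by about 1 unit per bet

conf_1000 = confidence_after(1000, 0.01, STDEV)
n_for_95 = races_needed(0.01, STDEV, 0.95)

print(f"confidence after 1000 races: {conf_1000:.1%}")
print(f"races needed for 95%:        {n_for_95:,.0f}")
```

The key point is the square: halving the margin you want to confirm needs four times the races, and a more volatile staking pattern (bigger `STDEV`) pushes the requirement up by the same squared factor.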

foxwood · Tue Oct 19, 2021 12:45 pm

jimibt wrote: ↑
Tue Oct 19, 2021 10:18 am
I know in my case, it probably was a case of inexperience but am also aware that the choice of features is crucial to obtaining a decent outcome. you only have to look at some of the examples on the web of species identification to see that a set of simple features produces a reasonable outcome. the problem arises when you have very dynamic parameters that can vary race to race and even the influence of a single runner within a mix of competitors.

Too many features definitely seem to get you nowhere - KISS is better

Species and plants etc work because once a dandelion is always a dandelion.

Trouble with horses and dogs etc is that once a winner probably means it's never going to win again - and yes I looked at laying instead and no that didn't work either

I was trying to pick winners/losers ie straight betting - might be more mileage in ML trading where pricing and volumes might signal movement better than looking at the screen. May go back and play with that especially since I am re-reading Superforecasting which is getting the juices going again lol

jimibt · Tue Oct 19, 2021 1:15 pm

foxwood wrote: ↑
Tue Oct 19, 2021 12:45 pm

Too many features definitely seem to get you nowhere - KISS is better

Species and plants etc work because once a dandelion is always a dandelion.

Trouble with horses and dogs etc is that once a winner probably means it's never going to win again - and yes I looked at laying instead and no that didn't work either

I was trying to pick winners/losers ie straight betting - might be more mileage in ML trading where pricing and volumes might signal movement better than looking at the screen. May go back and play with that especially since I am re-reading Superforecasting which is getting the juices going again lol

how i approached it was pretty straightforward. i recorded every UK horse race via the Betfair API, caching every parameter that the API presented to me and saved it into a SQL Server db. I saved almost 6 months' worth of data before i even started modelling anything and took many many runs at it. the most successful features of all the ones I tried are basically the little subset i mentioned above - i.e.

runner gender, book% at 1 second, 10 seconds, 30, 60, jockey win%, runner win%, %age money on runner, logical order of favouritism

this was second by second recording of data (from 6 minutes out) and in truth may have been overload, however, it did mean i could partition it however i wanted. as i said, there are some spectacular near misses in the above set of features and I'm sure as you say a KISS approach may just unearth a better representation of what's being sought.
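to give a flavour of the "record everything, partition later" idea, here's a cut-down sketch. the assumptions are all mine for illustration: SQLite in memory stands in for the SQL Server db, the table and column names are made up, the rows are hand-typed rather than pulled from the API, and i'm treating "book% at N seconds" as the sum of implied probabilities (100 / decimal price) N seconds before the off.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute(
    """CREATE TABLE snapshot (
           market_id   TEXT,
           secs_to_off INTEGER,
           runner      TEXT,
           back_price  REAL
       )"""
)

# Hypothetical hand-made rows: (market, seconds before the off, runner, decimal price).
# A real recorder would append one batch like this per second from the API.
rows = [
    ("m1", 60, "A", 2.0), ("m1", 60, "B", 4.0), ("m1", 60, "C", 5.0),
    ("m1", 10, "A", 1.9), ("m1", 10, "B", 4.2), ("m1", 10, "C", 5.5),
]
conn.executemany("INSERT INTO snapshot VALUES (?, ?, ?, ?)", rows)

def book_pct(conn, market_id, secs_to_off):
    # Book% = sum of implied probabilities (100 / decimal price) over all runners.
    cur = conn.execute(
        "SELECT SUM(100.0 / back_price) FROM snapshot "
        "WHERE market_id = ? AND secs_to_off = ?",
        (market_id, secs_to_off),
    )
    return cur.fetchone()[0]

# Partition the recording into one feature per time slice.
features = {f"book_pct_{s}s": book_pct(conn, "m1", s) for s in (60, 10)}
print(features)
```

because every second is stored as its own set of rows, adding a new slice (say 30 seconds) is just another entry in the tuple rather than a change to the recorder.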

Keep me/us posted on your next steps

karenlarson · Tue Oct 19, 2021 11:31 pm

I spent a lot of time several years ago working on ML models for UK horse racing. Sadly I never found the magic silver bullet that would stand up to random verification tests.
Trouble with horses and dogs etc is that once a winner probably means it's never going to win again - and yes I looked at laying instead and no that didn't work either
Keep me updated on your next steps
