Licensing a betting model?

The sport of kings.
Gcampb
Posts: 9
Joined: Mon Mar 13, 2023 8:42 pm

Hi

I posted here a few weeks ago about a machine learning project I was working on. It appeared to have found an edge in the UK & IRE racing markets: it was backtested on 9 years of racing, it has a strong, consistent signal, and I'm currently testing it on live races. My initial tests showed an ROI of circa 13% per annum. However, as I've been trading live races, I decided on some specific tweaks to how selections are made, i.e. excluding selections priced below 2.0, and the ROI shoots up to 30% per annum, though the number of selections drops from circa 1,400 to 400.
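To make the ROI comparison concrete, here is a minimal sketch of computing level-stakes ROI with and without an "exclude anything priced under 2.0" rule. The `Bet` record and helper names are hypothetical, the toy bets are made up, and commission is ignored; this is an illustration, not the OP's code.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    # Hypothetical record of one settled selection.
    odds: float         # decimal odds the selection was backed at
    won: bool           # True if the selection won
    stake: float = 1.0  # level stakes by default


def roi(bets):
    """Profit divided by total staked (commission ignored for simplicity)."""
    staked = sum(b.stake for b in bets)
    if staked == 0:
        return 0.0
    profit = sum(b.stake * (b.odds - 1) if b.won else -b.stake for b in bets)
    return profit / staked


def exclude_short_prices(bets, min_odds=2.0):
    """Keep only selections priced at min_odds or bigger."""
    return [b for b in bets if b.odds >= min_odds]


# Made-up example data, purely to exercise the functions.
bets = [Bet(1.5, False), Bet(1.8, True), Bet(3.0, True), Bet(4.0, False)]
print(round(roi(bets), 3))                        # all selections
print(round(roi(exclude_short_prices(bets)), 3))  # odds >= 2.0 only
```

A filter like this changes both the ROI and the number of bets, so it is worth tracking total profit and sample size alongside the ROI figure.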

Further to this, I've also got a model that I initially couldn't find value in. Without any sight of odds data, it could accurately predict the structure of a market before the off and how that market was going to react pre-off, but I couldn't find a trading strategy for it. After returning to it, it actually works very well for lay betting. The model takes a race card and scores the runners, and just as it's accurate with the top horses, it's accurate with the horses likely to underperform. The edge lies in the fact that the market doesn't price these very well, and again, following a basic strategy, it's been profitable year on year. It gives me the maximum odds at which to lay a selection; execution-wise it's a bit trickier than my other model, but not impossible.
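The post doesn't say how the "maximum odds to lay" figure is derived. One common way to get such a ceiling, assuming the model outputs a win probability for each runner, is the break-even lay price: above it, the expected loss when the horse wins outweighs the expected gain when it loses. A minimal sketch with hypothetical function names; commission is modelled as a simple per-bet deduction from winnings, which is a simplification of how exchange commission is actually charged.

```python
def lay_ev(p_win, lay_odds, commission=0.0):
    """Expected profit per 1 unit of backer's stake matched on a lay bet.

    If the horse loses (prob 1 - p_win) the layer keeps the backer's stake,
    less commission; if it wins (prob p_win) the layer pays stake * (odds - 1).
    """
    return (1 - p_win) * (1 - commission) - p_win * (lay_odds - 1)


def max_lay_odds(p_win, commission=0.0):
    """Break-even lay price for a given model win probability.

    Laying at a higher price than this is negative EV if p_win is correct.
    """
    return 1 + (1 - p_win) * (1 - commission) / p_win


def liability(backer_stake, lay_odds):
    """What the layer stands to lose if the selection wins."""
    return backer_stake * (lay_odds - 1)


# Hypothetical runner the model gives a 10% chance of winning.
print(max_lay_odds(0.10))                   # break-even with no commission
print(max_lay_odds(0.10, commission=0.05))  # lower ceiling once commission is paid
print(liability(10, 8.0))                   # risk on a 10-unit lay at 8.0
```

With no commission the break-even price is simply 1 / p_win; commission pulls the ceiling down, which is one reason lay execution is fiddlier than backing.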

Both models have minimal drawdown periods so psychologically they are not too bad to work with.

My question: I had been sharing picks on the UK horse racing subreddit as a test, and so that anyone interested could follow along. I received lots of messages etc., and a bit of stick too :) because the picks weren't obviously flashy 10/1 picks. However, I've been contacted by a 'broker' and, to be honest, feel slightly out of my depth with what's being proposed.

I've been asked if I could be introduced to his clients, syndicates and 'infrastructure providers' (not quite sure what that means, to be honest, but they receive rebates from the bookmakers, so I assume they are market makers), with the aim of licensing the models out for their use.

When I started creating these models, I'd intended to use them myself and possibly start a tipping/subscription service down the line if the signals held up. I'm aware syndicates exist, but I really have no idea where they are or how they operate.

Does anyone here have any advice? Has anyone been involved in syndicates, or does anyone have info on 'infrastructure providers'?
Euler
Posts: 26788
Joined: Wed Nov 10, 2010 1:39 pm

When I first started doing this, I was offered the chance to be part of a syndicate, and also funding in return for a cut of the profits.

In the end, while very tempting, I rejected both offers and it turned out to be the best thing I ever did. It would have cost me millions.

Scale is an issue with betting, so holding onto it yourself will provide you with the best return, but also no risk of people unpicking what you are doing, which is something syndicates seem to be quite focused on when they onboard somebody.
wearthefoxhat
Posts: 3635
Joined: Sun Feb 18, 2018 9:55 am

Euler wrote:
Mon Dec 29, 2025 2:46 pm
but also no risk of people unpicking what you are doing. Which is also something syndicates seem to be quite focused on when they onboard somebody.

That would be the main point for sure.

Many corporate entities do this under the guise of an interview (or a set of interviews) and ask to see samples of your work. It's a shady practice: they know they could likely reverse-engineer your setup, and good luck taking them to court.

In the OP's shoes, I'd slowly build to scale and compound the profits. Keep reviewing the results, set a timescale, e.g. 9 months (like giving birth), and keep your Reddit comms to a minimum, as they could serve as a distraction during a losing run.

If the OP goes ahead with the meeting, go with eyes wide open. Don't sign anything until you get a legal perspective on things, and certainly don't give away any code or research.
ShaunWhite
Posts: 10663
Joined: Sat Sep 03, 2016 3:42 am

Euler wrote:
Mon Dec 29, 2025 2:46 pm
Scale is an issue with betting.

Scale can be an issue with ops too: data collection and analysis, idea creation (multi-sport), test/sim development, model optimisation, implementation, monitoring results, etc. It's a lot for one person. Sometimes 1+1 can = 3 or 4, but then there's the problem of remuneration. It's rarely the case that everyone deserves an equal slice, so do you draw a wage? That's an unwelcome overhead, and then what happens if you grow further? It's a nightmare.
Goobs
Posts: 143
Joined: Thu Jul 10, 2025 4:01 pm

I can't say I really see the point of all this algo data mining stuff on these exchanges, when the net result is equity-like returns that can be had from just buying an ETF and having a much easier life....

Especially when I see in-play traders making £40-50k from £200 stakes, their "ROI" must be in the 1000s of percent!
ShaunWhite
Posts: 10663
Joined: Sat Sep 03, 2016 3:42 am

Goobs wrote:
Wed Dec 31, 2025 10:53 am
I can't say I really see the point of all this algo data mining stuff on these exchanges, when the net result is equity-like returns that can be had from just buying an ETF and having a much easier life....

Especially when I see in-play traders making £40-50k from £200 stakes, their "ROI" must be in the 1000s of percent!

Returns are small, but you're only holding it for seconds rather than months.
A £200 stake done 20 times a day is a turnover of about £1.5m a year, so £50k would be an RoI of about 3.5%. But you only need a few grand instead of £1.5m. So yes, it's £50k for a grand tied up, thousands of %, but we generally calculate RoI on turnover rather than on the amount tied up.
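Spelling out that arithmetic, assuming trading every day of the year (365 days is my assumption; it's what gets the turnover to roughly £1.5m). The function names are just for illustration.

```python
def turnover_roi(stake, bets_per_day, days, profit):
    """RoI measured on turnover (total amount matched), as traders usually quote it."""
    turnover = stake * bets_per_day * days
    return turnover, profit / turnover


def return_on_capital(profit, capital):
    """Return measured on the bankroll actually tied up."""
    return profit / capital


turnover, roi = turnover_roi(stake=200, bets_per_day=20, days=365, profit=50_000)
print(f"turnover £{turnover:,}, RoI on turnover {roi:.1%}")
print(f"return on a £1,000 bank: {return_on_capital(50_000, 1_000):.0%}")
```

This lands at roughly 3.4% on turnover versus 5,000% on a £1,000 bank: the same £50k, just divided by a different denominator.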
LinusP
Posts: 1934
Joined: Mon Jul 02, 2012 10:45 pm

I think you might just be getting ahead of yourself here. I am assuming you are backtesting your model against SP?

- 13% ROI is very high, best-of-the-best high (@SP), in a market which is very efficient
- Are you getting +EV when betting in the morning / 5 mins before the start?
- I am assuming you are using Timeform or similar data? This is all priced in; unless you are doing something particularly novel, I have doubts over your 13%
- The live 'tweak' concerns me
- 1400 selections (per year?) is small; 400 is into dangerous overfitting territory (1 bet a day?)
- No genuine syndicate would be interested in what you are currently describing so I would stay well away from anyone that has approached you
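The sample-size concern can be made concrete with a rough standard-error calculation. A minimal sketch, assuming independent level-stakes bets all at the same decimal odds; 4.0 is an arbitrary illustrative price, not the OP's average, and the function name is hypothetical.

```python
import math


def roi_standard_error(p_win, odds, n_bets):
    """Standard error of level-stakes ROI over n independent bets at fixed decimal odds.

    Each 1-unit bet returns odds - 1 with probability p_win, otherwise -1.
    """
    mean = p_win * odds - 1
    variance = p_win * (odds - 1) ** 2 + (1 - p_win) - mean ** 2
    return math.sqrt(variance / n_bets)


# No-edge baseline: the true win probability equals 1 / odds.
for n in (1400, 400):
    se = roi_standard_error(p_win=1 / 4.0, odds=4.0, n_bets=n)
    print(f"{n} bets at 4.0: ROI standard error ~ {se:.1%}")
```

With zero edge the per-bet variance works out to odds - 1, so the noise in ROI only shrinks with the square root of the number of bets. This also ignores the extra optimism that comes from choosing a filter after looking at the results, which is one reading of the concern about the live 'tweak'.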
LinusP
Posts: 1934
Joined: Mon Jul 02, 2012 10:45 pm

Goobs wrote:
Wed Dec 31, 2025 10:53 am
I can't say I really see the point of all this algo data mining stuff on these exchanges, when the net result is equity-like returns that can be had from just buying an ETF and having a much easier life....

Especially when I see in-play traders making £40-50k from £200 stakes, their "ROI" must be in the 1000s of percent!

Yeah, complete waste of time.
Gcampb
Posts: 9
Joined: Mon Mar 13, 2023 8:42 pm

LinusP wrote:
Thu Jan 01, 2026 8:59 am
I think you might just be getting ahead of yourself here. I am assuming you are backtesting your model against SP?

- 13% ROI is very high, best-of-the-best high (@SP), in a market which is very efficient
- Are you getting +EV when betting in the morning / 5 mins before the start?
- I am assuming you are using Timeform or similar data? This is all priced in; unless you are doing something particularly novel, I have doubts over your 13%
- The live 'tweak' concerns me
- 1400 selections (per year?) is small; 400 is into dangerous overfitting territory (1 bet a day?)
- No genuine syndicate would be interested in what you are currently describing so I would stay well away from anyone that has approached you

So to clarify, that 13% was for one exploratory strategy I've been testing, but the underlying model is a predictive model that's completely market-agnostic, which is where the inherent value is. It doesn't use any third-party ratings system or market information; it's all derived from raw pre-race data: 1.2m results over the last 10 years.

Leakage was a huge concern, and I've iterated through many different walk-forward setups and tests to rule it out. What remains is a model that identifies its top 2–3 runners in a race; it captures the majority of actual placers at a rate far above chance, 81–83% with little year-to-year degradation. On the flip side, it also identifies the weakest runners with as much accuracy, which is where a lot of the market inefficiencies seem to lie.
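"A rate far above chance" needs a chance baseline to compare against. A minimal sketch, assuming each race is stored as model scores per runner plus the set of runners that placed (the `Race` structure and function name are hypothetical): it counts how many actual placers fall in the model's top k, and how many you would expect if k runners were picked at random from each field.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Race:
    # Hypothetical per-race record.
    model_scores: Dict[str, float]  # runner -> model score (higher = stronger)
    placed: Set[str]                # runners that actually placed


def placer_capture(races, k=3):
    """Return (model capture rate, random-pick baseline) for actual placers."""
    hits = 0
    total = 0
    expected = 0.0
    for race in races:
        field = len(race.model_scores)
        if field == 0:
            continue
        top_k = sorted(race.model_scores, key=race.model_scores.get, reverse=True)[:k]
        hits += len(race.placed.intersection(top_k))
        total += len(race.placed)
        # Picking min(k, field) runners at random catches each placer with prob min(k, field) / field.
        expected += len(race.placed) * min(k, field) / field
    if total == 0:
        return 0.0, 0.0
    return hits / total, expected / total


races = [
    Race({"A": 3.0, "B": 2.0, "C": 1.0, "D": 0.0}, {"A", "C"}),
    Race({"A": 1.0, "B": 5.0, "C": 4.0}, {"B"}),
]
print(placer_capture(races, k=2))
```

The baseline depends heavily on field size and k (picking 3 from a 5-runner field catches 60% of placers at random), so it is worth reporting alongside any capture rate.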

I'm now separately testing execution and trading strategies using Betfair historical data; that's where the 13% figure came from. I'm being cautious here, because disagreement between probability and price does not automatically translate into a finished betting edge. What the work does show, consistently, is that the model identifies persistent misalignment between estimated probabilities and market prices; the open question is how best to exploit that under realistic conditions.
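One way to express "probability to price disagreement" is as expected value per unit staked. A minimal sketch, assuming the model's probabilities are calibrated and that commission is deducted from net winnings on each bet (a simplification of exchange commission); the function names and example runners are hypothetical.

```python
def back_ev(p_model, odds, commission=0.0):
    """Expected profit per 1 unit staked when backing at decimal odds.

    Commission is applied to net winnings only (a per-bet simplification).
    """
    return p_model * (odds - 1) * (1 - commission) - (1 - p_model)


def value_selections(runners, min_ev=0.0, commission=0.0):
    """runners: iterable of (name, model_probability, decimal_odds).

    Returns names where the model's probability beats the price by more than min_ev.
    """
    return [name for name, p, odds in runners if back_ev(p, odds, commission) > min_ev]


# Made-up runners: (name, model probability, available decimal odds).
runners = [("A", 0.25, 5.0), ("B", 0.10, 6.0), ("C", 0.40, 2.2)]
print(value_selections(runners, min_ev=0.05, commission=0.05))
```

A positive number here is still only the model's opinion; whether it survives price movement, liquidity and commission is the execution question described above.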
Goobs
Posts: 143
Joined: Thu Jul 10, 2025 4:01 pm

Like others have said before, what's the relevance of 10-year-old data to today's races?

Pricing, handicapping, the horses themselves have all changed so much over that period... :? I fail to see how prices that were made 10 years ago can have any predictive value today...

All I see, day in and day out, is horses starting at prices > 20 still romping home... how can they get the starting price so wrong with all of today's data? The horse is priced as not even being in contention, then wins by 3 lengths... what it really tells me is no one knows what's going on!
Gcampb
Posts: 9
Joined: Mon Mar 13, 2023 8:42 pm

Goobs wrote:
Fri Jan 02, 2026 2:40 pm
Like others have said before, what's the relevance of 10-year-old data to today's races?

Pricing, handicapping, the horses themselves have all changed so much over that period... :? I fail to see how prices that were made 10 years ago can have any predictive value today...

All I see, day in and day out, is horses starting at prices > 20 still romping home... how can they get the starting price so wrong with all of today's data? The horse is priced as not even being in contention, then wins by 3 lengths... what it really tells me is no one knows what's going on!

That's fair. I guess I need to frame it better; it's a really common misunderstanding about how models like this work.

I'm not using 10-year-old prices to predict today's prices. Old odds have no predictive power in themselves, and in fact my model doesn't feature odds at all; it doesn't take any market data into account for its predictions.

What machine learning does instead is look for stable relationships between measurable race factors and outcomes - things like field structure, relative ability, context, and behaviour under race conditions. Those relationships exist within races, not across time.
The horses change. The trainers change. The markets evolve.
But the problem doesn’t change:

Which horses are more likely to be competitive in this race, given what we know before the off?

The model is trained on past races to learn those relationships, then it’s walk-forward tested on completely unseen seasons.

If the edge only existed in the past, performance would collapse the moment you step forward, and that's exactly what proper testing is designed to catch. I trained on 2015–17 and tested on 2018, then retrained on 2015–18 and tested on 2019, etc., so it's always testing the stability of the signal on unseen data.
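The expanding-window scheme described here (train 2015–17, test 2018; retrain 2015–18, test 2019; and so on) can be sketched as follows. The season-keyed dictionary and the `fit` / `evaluate` callables are hypothetical placeholders; this shows only the split logic, not the OP's pipeline.

```python
def walk_forward_splits(seasons, initial_train=3):
    """Yield (train_seasons, test_season) pairs with an expanding training window."""
    seasons = sorted(seasons)
    for i in range(initial_train, len(seasons)):
        yield seasons[:i], seasons[i]


def run_walk_forward(races_by_season, fit, evaluate, initial_train=3):
    """Fit on all earlier seasons, score on the next unseen one.

    races_by_season: hypothetical {season: [race rows]} mapping.
    fit(rows) -> model, evaluate(model, rows) -> score; both supplied by the caller.
    """
    results = {}
    for train, test in walk_forward_splits(races_by_season, initial_train):
        train_rows = [row for season in train for row in races_by_season[season]]
        model = fit(train_rows)
        results[test] = evaluate(model, races_by_season[test])
    return results


for train, test in walk_forward_splits(range(2015, 2020)):
    print(train, "->", test)
```

Because every test season is strictly later than its training seasons, a signal that only existed in older data shows up as a drop in the later test scores.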

The model makes its predictions not from historic data as such, but from how certain features of a race weighed on previous outcomes. What differentiates models is which features are used. I've tested models with 100+ features against ones with very few, and there are features I thought were really important when I learned handicapping that, when actually tested, don't mean nearly as much as I'd have expected.

With regard to outsiders running home, that's completely to be expected. The market prices probability: it expects the favourite to win, that's why it's a short price, and it doesn't expect the 20/1 horse to win, hence the price, but 20/1 shots will still win some of the time.
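The point about outsiders comes down to implied probability. A price of 20/1 (decimal 21.0) implies roughly a 1-in-21 chance, about 4.8%, before the bookmaker's margin, so across many such runners some are expected to win. A minimal sketch of converting prices to implied probabilities and removing the overround by simple proportional normalisation (one of several methods); the function names are hypothetical.

```python
def fractional_to_decimal(price):
    """Convert a fractional price such as '20/1' to decimal odds (stake included)."""
    num, den = price.split("/")
    return int(num) / int(den) + 1


def implied_probabilities(decimal_odds):
    """Implied win probabilities, normalised to remove the overround.

    Returns (probabilities, book) where book > 1 means the prices are overround.
    """
    raw = [1 / o for o in decimal_odds]
    book = sum(raw)
    return [r / book for r in raw], book


print(1 / fractional_to_decimal("20/1"))  # roughly a 1-in-21 chance
probs, book = implied_probabilities([1.8, 3.5, 5.0])
print(book, probs)
```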
Goobs
Posts: 143
Joined: Thu Jul 10, 2025 4:01 pm

I hate to say it, but people have been doing this for a very long time before you, like since the 1960s... like this story on Benter...

https://www.bloomberg.com/news/features ... acing-code

Surely the current market price includes what has already been modelled by everyone else, some with vastly more resources than yourself...

From my reading of Benter, he had many variables in his models, and only a few had a very high weighting. He said in one paper I read that one of the most heavily weighted was "days since last run", and that around 15–21 days was the sweet spot.

From your research, what is your current highest factored variable?
Gcampb
Posts: 9
Joined: Mon Mar 13, 2023 8:42 pm

Goobs wrote:
Fri Jan 02, 2026 5:15 pm
I hate to say it, but people have been doing this for a very long time before you, like since the 1960s... like this story on Benter...

https://www.bloomberg.com/news/features ... acing-code

Surely the current market price includes what has already been modelled by everyone else, some with vastly more resources than yourself...

From my reading of Benter, he had many variables in his models, and only a few had a very high weighting. He said in one paper I read that one of the most heavily weighted was "days since last run", and that around 15–21 days was the sweet spot.

From your research, what is your current highest factored variable?

Bill Benter's story is where I first learned about ML, from a popular science book about maths whose title I can't for the life of me remember. I'd heard he was even using humidity data in his model, which influenced pace. Who knows what his model actually used, but it's the perfect example of why ML is so valuable, especially considering his edge was seemingly under 3%, which is staggering considering how much he made out of it. This was of course compounded by rebates etc., but it shows you don't need much of an edge to profit, especially in highly liquid markets.

I've tested lots and lots of features derived from the raw data, but the features and weightings are the secret sauce, unfortunately. With the model I'm working on, there isn't a particular feature that is overweighted; I've tested this by removing features and comparing walk-forward tests, so I've got a very clear idea of how it's calibrated.
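Removing features one at a time and comparing walk-forward results is a drop-one (leave-one-feature-out) ablation. A minimal sketch, assuming a hypothetical `score_fn` that runs the walk-forward test for a given feature list and returns a score where higher is better; the feature names and scores in the example are made up.

```python
def drop_one_ablation(features, score_fn):
    """Importance of each feature as the score lost when it alone is removed.

    score_fn(feature_list) -> walk-forward score (higher is better); hypothetical.
    """
    features = list(features)
    baseline = score_fn(features)
    return {f: baseline - score_fn([g for g in features if g != f]) for f in features}


# Toy additive scorer standing in for a real walk-forward run.
weights = {"recent_form": 5, "combo_strike_rate": 0, "draw": 2}
print(drop_one_ablation(weights, lambda fs: sum(weights[f] for f in fs)))
```

One caveat: correlated features can cover for each other, so a near-zero drop when removing a single feature does not prove it carries no information on its own.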

What I can say is that some of the dead-obvious handicapping features matter so little it's shocking. One example might be jockey/trainer combo strike rate: the tests show it's just noise.
wearthefoxhat
Posts: 3635
Joined: Sun Feb 18, 2018 9:55 am

One important thing with regard to the Bill Benter/Alan Woods models is that they focused on Hong Kong racing, which has only 2 tracks (Sha Tin & Happy Valley). Their first attempt made a loss, but it was quickly realised that the live odds needed to be factored in to align with the 1000s of attribute subsets.
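As I understand Benter's published description, factoring in the odds meant a second-stage model: the fundamental model's probabilities and the public's implied probabilities are combined in a multinomial logit, with weights fitted on past races. A minimal sketch of that blend; the `alpha` and `beta` values used here are placeholders, not anyone's fitted coefficients, and the function name is hypothetical.

```python
import math


def combine_with_market(model_probs, market_probs, alpha, beta):
    """Blend model and market probabilities: softmax of alpha*log(model) + beta*log(market).

    alpha and beta would normally be fitted by maximum likelihood on past races;
    all probabilities must be strictly positive.
    """
    scores = [alpha * math.log(m) + beta * math.log(q) for m, q in zip(model_probs, market_probs)]
    top = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


model = [0.5, 0.3, 0.2]   # hypothetical fundamental-model probabilities
market = [0.4, 0.4, 0.2]  # hypothetical implied probabilities, overround removed
print(combine_with_market(model, market, alpha=0.8, beta=0.6))
```

With `alpha=1, beta=0` the blend returns the model's own probabilities, and with `alpha=0, beta=1` it returns the market's, so the fitted weights show how much the model adds beyond the price.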

Their falling-out was probably the best thing for BB, as he was free from the shackles of a demanding betting partner, who then took BB's database and used it for his own syndicates. As we know, BB went on to make billions after reprogramming his own model and not sharing it with anyone. The rebate system worked well for him too, along with the extra tech the HK Jockey Club gave him in the early days.

The UK/IRE tracks are so quirky with their different undulations, going stick inconsistencies, rail movements, etc. Maybe the best approach is to focus on the All-Weather tracks. Still a minefield, but at least there are consistent stalls and circumference measurements for speed analysis and all the other attributes that need to be considered. Reckon it's still worth doing even in today's trading conditions.