Data for modelling / automation / getting started for beginners.

Advanced automation available in Guardian - Chat with others and share files here.
rkuk
Posts: 20
Joined: Sat Jun 28, 2025 8:29 pm

Hello everyone,

Sorry if this has been asked loads of times. I have tried searching the forum and also the BA manual, but I might not be using the correct search terminology. I have read a few posts with people talking about data. But where does the data come from / how do we obtain it? Do I need to search online to find my own data and put it in certain Excel formats? I have seen there is some historical data on Betfair, but there was a bit of jargon regarding using coding and APIs. Is this what we should be using? Or are there datasets in Bet Angel that we can access?

I thought if I could start to learn some of the basics, perhaps using some of the files Dallas has kindly created and shared, then the data might help me look for opportunities to deploy them. E.g. goals scored in games: if I start learning now then I might be ready for the new football season. Or perhaps some findings in horse racing.

I've watched a video by Peter explaining that back testing with data could be backfitting, which I interpret as "well I'm looking for that and now I've found it, but it doesn't mean the future will play out the same"...I think. A bit like past performance does not mean future results in investing. That said, being able to look at an automation strategy and then find some markets to see evidence of what it would have returned is a starting point (or am I just returning to backfitting again here?). Is there a beginner's guide in layman's terms for this sort of stuff for the average Joe without a background in software and programming etc (as I am sure you can tell! :oops: )? Or am I all wrong and I don't need the data to consider the automation tools (but surely I need something to inform my decisions)?

Thank you :)
sionascaig
Posts: 1639
Joined: Fri Nov 20, 2015 9:38 am

That is a lot of questions...

At a high level:

You can use "data" to build "models".

You use models to get a better understanding of the thing you are trying to price.

Taking account of your costs & the variance within your model you can then determine where value lies in laying or backing a specific outcome.
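As a rough illustration of the point about value and costs, here is a minimal Python sketch. The function names, the 55% model probability, the 2.0 price and the 2% commission rate are made-up assumptions for the example, not figures from Bet Angel or Betfair, and commission is simplified to a cut of the winnings on each bet.

```python
def back_ev(model_prob, odds, commission=0.02):
    """Expected profit per 1 unit staked on a back bet at decimal odds."""
    win = (odds - 1.0) * (1.0 - commission)  # net winnings if the selection wins
    lose = -1.0                              # stake lost otherwise
    return model_prob * win + (1.0 - model_prob) * lose


def lay_ev(model_prob, odds, commission=0.02):
    """Expected profit per 1 unit of backer's stake accepted on a lay bet."""
    win = 1.0 * (1.0 - commission)           # keep the backer's stake if the selection loses
    lose = -(odds - 1.0)                     # pay out the liability if it wins
    return (1.0 - model_prob) * win + model_prob * lose


if __name__ == "__main__":
    # Hypothetical case: the model says 55%, the market offers 2.0.
    print("back EV:", round(back_ev(0.55, 2.0), 4))
    print("lay EV: ", round(lay_ev(0.55, 2.0), 4))
```

A positive number suggests value on that side only if the model probability is right, so the less certain the model, the bigger the margin you would probably want before acting.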

From memory Euler has some good vids on modelling some football markets where the data & model are relatively straightforward.

BA also has models already built into it, e.g. Soccer Mystic & Tennis Trader. You will find vids / blogs on these too. So maybe what you need is already there.

The following is a good introduction to data / modelling:

https://www.youtube.com/watch?v=dRjrMbk ... ackAndrews

As the guy says, it is not for everyone but you won't know unless you try it...

The mistake that most people make with data is simply searching it for a pattern that produces, say, a monthly profit, then assuming that if they implement that strategy going forward it will generate a profit.

You do not need models to be a good, profitable trader; they are just tools that "may" help you...

You should however collect data all the time so you can review / improve your trading approach.
ShaunWhite
Posts: 10530
Joined: Sat Sep 03, 2016 3:42 am

Backfitting is when you find a pattern in data, but then fail to check the same phenomenon exists in data the strategy hasn't seen before. So you'd normally randomise your data into two sets, training data and validation data. Look for things in your training data.... then check it's still there in the validation data.

People often think that a large dataset means you don't need this rigour, but any set of data will exhibit some bias, and unless the same pattern exists in unseen data too, it's just backfitted.
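The training / validation idea can be sketched in a few lines of Python. Everything here is invented for illustration: the bets are pure coin flips at 2.0, the "hour" tag is an arbitrary feature, and the helper names are hypothetical, so there is no real edge to find.

```python
import random


def split_train_validation(records, train_fraction=0.5, seed=42):
    """Shuffle a copy of the records and split them into two sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def average_profit(records, rule):
    """Mean profit per bet over the records that the rule selects."""
    picked = [r["profit"] for r in records if rule(r)]
    return sum(picked) / len(picked) if picked else 0.0


# Made-up data with no edge: 2000 one-unit bets at 2.0 on a 50/50 outcome,
# each tagged with an arbitrary "hour of day" feature.
rng = random.Random(1)
bets = [{"hour": rng.randrange(24), "profit": rng.choice([1.0, -1.0])}
        for _ in range(2000)]

train, validation = split_train_validation(bets)

# "Data mine" the training set: pick the hour with the best average profit.
best_hour = max(range(24),
                key=lambda h: average_profit(train, lambda r: r["hour"] == h))


def rule(record):
    """The pattern found in training: only bet at the best-looking hour."""
    return record["hour"] == best_hour


print("best hour in training:", best_hour)
print("training profit per bet:  ", round(average_profit(train, rule), 3))
print("validation profit per bet:", round(average_profit(validation, rule), 3))
```

Because the data is noise, the best hour will usually look profitable in training and will typically drift back towards zero in validation, which is the backfitting trap in miniature.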

This might give you some idea of the levels of data you could access.
https://historicdata.betfair.com/#/home

'Data' can be anything from a few basic stats, to recording the entire output from Betfair in real time, several gigabytes a day. What you need depends on what you want to do with it.
rkuk
Posts: 20
Joined: Sat Jun 28, 2025 8:29 pm

sionascaig wrote:
Thu Jul 24, 2025 7:12 pm
That is a lot of questions...

Sorry for all the q's :oops: and thanks for the reply! :)
I will try to find the football videos. As an example, I know Peter had a video on predicting goals, which compared the home team and away team. From memory I think that example might have been to work out the odds of a given team (but I could be wrong). So with Soccer Mystic, am I able to find data from lots of football matches, for example, and plug that into Soccer Mystic to give some clues on perhaps number of goals and so forth? I have downloaded the predicted goals setup from the automation pages and managed to get that working, so perhaps it does all this for me as you say. But if I wanted to look at more things such as typical home / away / draw results in specific leagues (just making up some examples here) I wonder how to do that.

That example coincides nicely with the video you shared actually - I enjoyed watching that. I didn't understand the odds / bet of being -100 or whatever in basketball though - unless this means under 100 points for the team perhaps? (the example given was for 3 pointers scored), which wouldn't include the 2 pointers scored as well. But either way I liked it and the fact it showed the formula.

Get what you say about the models being tools to help, I think a potential upside is letting it do its job without any emotions and seeing what the results are on something with low stakes to help me learn. The collecting data bit, that was part of my first question 'where to get it', which might be answered in Shaun's reply.

Thanks for helping :)
rkuk
Posts: 20
Joined: Sat Jun 28, 2025 8:29 pm

ShaunWhite wrote:
Thu Jul 24, 2025 8:06 pm
Backfitting is when you find a pattern in data, but then fail to check the same phenomenon exists in data the strategy hasn't seen before. So you'd normally randomise your data into two sets, training data and validation data. Look for things in your training data.... then check it's still there in the validation data.

Thanks for explaining that Shaun, that makes sense to me now, and it helps with the two data samples idea, as I had read about that in another post but had no idea what it meant. So sticking with football, if I was testing something in a specific league because, say, it usually has more goals than other leagues, would I still test that in other leagues? Or maybe I'm not seeing the bigger picture here, as it might be that a league has more goals but that correlates to the starting odds, which could be the same in other leagues - I don't know, I'm confusing myself as I think about it.

That was the link I found from searching previously (thank you). I've started downloading it. Next questions will be 'how do I use this in BA?' I imagine!
ShaunWhite
Posts: 10530
Joined: Sat Sep 03, 2016 3:42 am

The number of goals in a league vs any other league just is what it is. The backtest stuff (aka in and out of sample testing) is more to do with testing if a strategy would make money. Ie to stop you looking at your data and picking some combination of circumstances that appears to make a profit. What some call data mining or p-hunting.

But I think you're seeing there's two main types of data. Fundamentals, ie what league/team/horse does what, and Price data, what odds you could have got if you'd acted on your hunch.
sionascaig
Posts: 1639
Joined: Fri Nov 20, 2015 9:38 am

rkuk wrote:
Thu Jul 24, 2025 8:59 pm

.... But if I wanted to look at more things such as typical home / away / draw results in specific leagues (just making up some examples here) I wonder how to do that....

Thanks for helping :)
I use https://fbref.com/en/comps/ for football stats.

All the major leagues are covered & it gives stats on leagues & players going back years.

It is relatively straightforward to copy / paste the info you want into Excel.
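If a spreadsheet gets unwieldy, the home / draw / away question can also be answered with a short script. This is only a sketch: the CSV layout, column names and scores below are invented stand-ins for whatever you paste out of a stats site, not the actual fbref format.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical rows standing in for pasted results; names and scores are invented.
SAMPLE_CSV = """league,home_goals,away_goals
League A,2,1
League A,0,0
League A,1,3
League A,3,0
League B,1,1
League B,0,2
"""


def result_shares(csv_text):
    """Return {league: {"H": share, "D": share, "A": share}} from result rows."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        home, away = int(row["home_goals"]), int(row["away_goals"])
        outcome = "H" if home > away else "A" if away > home else "D"
        counts[row["league"]][outcome] += 1
    shares = {}
    for league, tally in counts.items():
        total = sum(tally.values())
        shares[league] = {key: tally[key] / total for key in ("H", "D", "A")}
    return shares


for league, shares in result_shares(SAMPLE_CSV).items():
    print(league, {key: round(value, 2) for key, value in shares.items()})
```

With a full season of real results in place of the sample rows, the same function would give the typical home / draw / away split per league.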

There are some interesting discussions in the football thread & great examples of good use of data to spot market opportunities.

For example, Euler highlighted a change to how injury time was allowed for in the Premier League - it would create more playing time - and anticipated more goals as a result. It took the markets until Jan / Feb before the prices caught up with the new normal.

Similarly, the number of cards awarded was up about 50% in the early part of last season and again it took until Jan / Feb for the new normal to be reflected in prices.

==> Happy days if you were on the right side of the overs market or backing players getting carded.

Note: these features were specific to the Premier League.
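To get a feel for why extra playing time matters to the overs market, here is a rough sketch. It assumes goals follow a Poisson distribution and that the scoring rate per minute stays the same, which is a common simplification and my assumption rather than the method Euler used. The 2.7 goals and the 98 / 102 minutes are invented numbers.

```python
import math


def prob_over(goal_line, expected_goals):
    """P(total goals > goal_line) if goals follow a Poisson distribution."""
    max_under = math.floor(goal_line)  # e.g. 2 for the 2.5 line
    p_under = sum(
        math.exp(-expected_goals) * expected_goals ** k / math.factorial(k)
        for k in range(max_under + 1)
    )
    return 1.0 - p_under


# Invented numbers: 2.7 goals per game over 98 minutes of play,
# then the same scoring rate stretched over 102 minutes.
old_minutes, new_minutes = 98.0, 102.0
old_goals = 2.7
new_goals = old_goals * new_minutes / old_minutes

p_old = prob_over(2.5, old_goals)
p_new = prob_over(2.5, new_goals)
print("over 2.5 before:", round(p_old, 3), "fair odds", round(1 / p_old, 2))
print("over 2.5 after: ", round(p_new, 3), "fair odds", round(1 / p_new, 2))
```

If the market is still pricing the old goal expectation, the gap between the two fair prices is the sort of opportunity described above.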

The above link will allow you to capture this sort of data for current & past seasons, but the key point is thinking ahead and looking for reasons why that data may no longer be a good model for what is happening now. Therein lies your edge.
rkuk
Posts: 20
Joined: Sat Jun 28, 2025 8:29 pm

ShaunWhite wrote:
Fri Jul 25, 2025 12:36 am
Ie to stop you looking at your data and picking some combination of circumstances that appears to make a profit.
Thank you Shaun, I've read this a few times this morning and sorry, I'm not quite understanding this sentence. To stop me looking at the data and picking combinations? Do you mean 'I've looked at my data, potentially found something that could work, now time to see if it plays out'?
ShaunWhite wrote:
Fri Jul 25, 2025 12:36 am
But I think you're seeing there's two main types of data. Fundamentals, ie what league/team/horse does what, and Price data, what odds you could have got if you'd acted on your hunch.
This is good, right? I.e. a basic step in the right direction. Even though I don't quite have a hunch yet :lol: thinking of scenarios and then questioning them through data might help me find one. If I find a correlation between, say, a hunch on goals and pricing that did result in a profit, is this finding value?

Popular matched betting sites are promoting value betting and they must have a secret sauce that creates their version of value. So maybe this is a similar thing?

Thanks again for taking the time to reply - appreciated! :)
rkuk
Posts: 20
Joined: Sat Jun 28, 2025 8:29 pm

sionascaig wrote:
Fri Jul 25, 2025 8:31 am
I use : https://fbref.com/en/comps/ for football stats..

Thank you sionascaig, very kind of you to share. Perhaps I need to find the football thread action for more reading. Interesting points regarding the rules changing and the betting conditions not changing at the same pace. Makes sense that over time it creates more opportunities as there is a % increase to the available time. Last minute goals are very much a thing.
ShaunWhite
Posts: 10530
Joined: Sat Sep 03, 2016 3:42 am

rkuk wrote:
Fri Jul 25, 2025 10:26 am
ShaunWhite wrote:
Fri Jul 25, 2025 12:36 am
Ie to stop you looking at your data and picking some combination of circumstances that appears to make a profit.
Thank you Shaun, I've read this a few times this morning and sorry, I'm not quite understanding this sentence. To stop me looking at the data and picking combinations? Do you mean 'I've looked at my data, potentially found something that could work, now time to see if it plays out'?
I mean that if you roll a dice 1000 times, one number will almost always come up more often than the others. But if you divide that data in half then it's unlikely to be the most frequent number in both sets.
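The dice example is easy to try for yourself. The sketch below simulates it with Python's standard random module; the number of trials and the seed are arbitrary choices and the function names are made up for the example.

```python
import random
from collections import Counter

FACES = [1, 2, 3, 4, 5, 6]


def most_frequent(rolls):
    """Return a face that appears most often in the rolls."""
    return Counter(rolls).most_common(1)[0][0]


def same_winner_rate(trials=500, rolls_per_trial=1000, seed=7):
    """Share of trials where the overall top face is also top in both halves."""
    rng = random.Random(seed)
    half = rolls_per_trial // 2
    hits = 0
    for _ in range(trials):
        rolls = rng.choices(FACES, k=rolls_per_trial)  # fair die
        top = most_frequent(rolls)
        if most_frequent(rolls[:half]) == top and most_frequent(rolls[half:]) == top:
            hits += 1
    return hits / trials


rate = same_winner_rate()
print("top face repeats in both halves in", round(rate * 100, 1), "% of trials")
```

For a fair die you would expect this to happen in roughly one trial in six, since the two halves are independent, so the "winning" number in the full data set usually does not hold up when the data is split.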

Value betting? It's all value betting. If you find a situation that happens half the time, whether that's a win or someone scoring first, then if you don't get better than 2.0 on it you'll lose long term. Value betting is often seen as just the odds on the final result, but it's also the odds on your 'trade' being successful.
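The 2.0 figure comes straight from the probability, and the same arithmetic works for any strike rate. A minimal sketch follows; the function name and the 2% commission rate are assumptions for illustration, with commission treated as a simple cut of winnings.

```python
def break_even_odds(win_prob, commission=0.0):
    """Decimal odds at which a back bet has zero expected profit."""
    if not 0.0 < win_prob < 1.0:
        raise ValueError("win_prob must be strictly between 0 and 1")
    # Solve p * (odds - 1) * (1 - commission) - (1 - p) = 0 for odds.
    return 1.0 + (1.0 - win_prob) / (win_prob * (1.0 - commission))


print(break_even_odds(0.5))                            # 2.0
print(round(break_even_odds(0.5, commission=0.02), 3)) # 2.02
```

Anything you can get above the break-even price is value, and anything below it loses in the long run, whether the "event" is a match result or your trade working out.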