Making sense of Basic betfair historical data

Tiesto13
Posts: 29
Joined: Tue Aug 08, 2023 2:39 pm

So I've been trying to make sense of the Betfair historical data for cricket (Basic only for now). It's a bit harder to work with than any dataset I've used before, however. The resources I've found to help with this are either incredibly techy and assume a high level of background knowledge, or don't seem to work for converting this data into a CSV, so I'm trying to do it myself. (I've also found so far that working this out for myself from first principles is far better: it helps me understand both the data I'm using and how to stay flexible when collating and processing new data types.)

I therefore wanted to sense check my proposed method before I go down the rabbit hole of trying to code it.

I'm not sure what some of the other datasets look like, but for cricket, each JSON file typically contains a large number of JSON objects. Some of these carry the full market definition data, and others carry just a price change, e.g.

JSON object with full market definition data (useful metadata that applies to every data point following it up until the next market definition change)
JSON object with LTP for two selections (data point that I actually want to capture but missing most context beyond a timestamp and selection ID)
JSON object with LTP for one selection
....
JSON object with LTP for two selections
JSON object with full market definition data indicating market now in play
JSON object with LTP...
...
JSON object with LTP...
JSON object with full market definition data indicating market now suspended
JSON object with full market definition data indicating market now closed
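A minimal sketch of telling the two kinds of object apart, assuming the files follow the Betfair Exchange Stream API layout: each line is one JSON message, its `mc` list holds per-market changes keyed by `id`, definitions sit under `marketDefinition`, and Basic price updates arrive as `rc` entries carrying `ltp` and a selection `id`. Those field names are an assumption here, so check them against a few raw lines of your own files; the function names are hypothetical.

```python
import bz2
import json


def iter_messages(lines):
    """Decode one JSON message per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def read_file(path):
    """Decompress one historical .bz2 file and return its decoded messages."""
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        return list(iter_messages(fh))


def classify(message):
    """Label a message 'definition', 'price' or 'other'.

    Assumed layout: message["mc"] is a list of market changes. A change holding
    "marketDefinition" is metadata; one holding only "rc" is a price update.
    A single change can hold both, so 'definition' here does not mean the
    message contains no prices.
    """
    changes = message.get("mc", [])
    if any("marketDefinition" in mc for mc in changes):
        return "definition"
    if any("rc" in mc for mc in changes):
        return "price"
    return "other"
```

Usage would be looping over `read_file(path)` for one of the extracted .bz2 files (not the TAR) and printing `msg.get("pt")` alongside `classify(msg)` to see the pattern in the list above.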

When capturing all this data into a dataframe, am I best off treating the JSON objects that carry the full market definition as 'metadata' objects, and using them to apply the required metadata to the 'data' objects that follow, rather than trying to capture the 'metadata' objects themselves in my dataframe?
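That approach can be sketched as a single pass that keeps the latest market definition per market in a dictionary and stamps it onto every LTP row that follows. It uses the same assumed stream field names (`mc`, `id`, `marketDefinition`, `rc`, `ltp`, `pt`, plus `status`, `inPlay` and `runners[].name` inside the definition); `flatten` and the output column names are hypothetical.

```python
import json


def flatten(lines):
    """Return one dict per LTP update, each stamped with the most recent
    market definition seen for its market (field names are assumptions)."""
    latest = {}  # market id -> latest marketDefinition (assumed sent in full)
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        pt = msg.get("pt")  # publish time, epoch milliseconds
        for mc in msg.get("mc", []):
            market_id = mc["id"]
            if "marketDefinition" in mc:
                # New metadata: applies to every price point until the next one
                latest[market_id] = mc["marketDefinition"]
            definition = latest.get(market_id, {})
            names = {r["id"]: r.get("name") for r in definition.get("runners", [])}
            # Price changes become rows; the definition itself never does
            for rc in mc.get("rc", []):
                rows.append({
                    "market_id": market_id,
                    "pt": pt,
                    "selection_id": rc["id"],
                    "runner_name": names.get(rc["id"]),
                    "ltp": rc.get("ltp"),
                    "status": definition.get("status"),
                    "in_play": definition.get("inPlay"),
                })
    return rows
```

Feed it the lines of a file opened with `bz2.open(path, "rt")`; the resulting list of dicts can be passed straight to `pandas.DataFrame(rows)`.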

Hope that all makes sense. Actually, formulating my query and typing it out has convinced me this is the best approach to capturing this data, but I'd appreciate any thoughts.
LinusP
Posts: 1873
Joined: Mon Jul 02, 2012 10:45 pm

Do you have a language preference? I'm not sure whether you mean R or Python when you refer to dataframes.

Regarding parsing the data, others have done the hard work for you (Python):

https://github.com/betcode-org

And if you are feeling really lazy, you can use the processor I developed:

https://apps.betfair.com/data/betfair-h ... processor/
Tiesto13

Cheers for sharing these. As for language, I'm using Python. I'm entirely self-taught (and very much at the bottom of the learning curve), hence my issues here!

I'd actually come across both of these resources but struggled to use them. Does betfairlightweight not require an API key? Or can I use its historical data features without one?

I've now spent a bit more time with the Betfair historical data processor, though, and got it to work (I was trying to load the initial TAR file rather than the bz2 files inside it, so a fairly rookie error :D). That has been useful for seeing how it outputs the data. However, I'm looking to add hundreds if not thousands of separate markets to the same dataframe, so I think I still need to develop my own code here (or get betfairlightweight to work).
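For hundreds or thousands of markets, one option (an alternative to holding one huge dataframe in memory) is to parse the files one at a time and stream the rows into a single CSV. In this sketch `write_combined_csv` is hypothetical, and `parse` stands for whatever function you already have that turns one file's lines into row dicts.

```python
import bz2
import csv


def write_combined_csv(paths, parse, out_path, fieldnames):
    """Parse many historical .bz2 files one at a time and append their rows to
    a single CSV, so memory use is bounded by one file rather than all of them.
    Returns the number of rows written."""
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        # extrasaction="ignore" drops any row keys not listed in fieldnames
        writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for path in sorted(paths):  # stable, repeatable file order
            with bz2.open(path, "rt", encoding="utf-8") as fh:
                for row in parse(fh):
                    writer.writerow(row)
                    count += 1
    return count
```

Something like `write_combined_csv(Path("data").rglob("*.bz2"), my_parser, "all_markets.csv", ["market_id", "pt", "selection_id", "ltp"])` (with `my_parser` being your own, hypothetical here) would give one file that `pandas.read_csv` can load later.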

I'm nearly able to parse everything I want in Python, and from what I can see, the approach I suggested (using the market definition data as metadata and applying it to all the individual data points) is very similar to how the historical data processor works, which makes me think I'm on the right track.
Tiesto13

Also, one issue throwing my data collection off is that for each event ID I seem to have two types of JSON file: one that seems to contain every market created for that event, and a separate file for each individual market. Is the historical data structured the same way for other sports, and if so, can I simply filter my code so it ignores the files containing all the markets (if I'm only interested in match odds, for example)?
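Rather than relying on file names, one way to filter is by the `marketType` field inside each market definition (assumed here to be `"MATCH_ODDS"` for match odds), remembering which market ids have already been handled so a market that appears in both an event-level file and its own market file is only processed once. Both function names are hypothetical.

```python
import json


def market_ids_of_type(lines, wanted="MATCH_ODDS"):
    """Return the ids of markets whose marketDefinition has marketType == wanted."""
    ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        for mc in json.loads(line).get("mc", []):
            definition = mc.get("marketDefinition") or {}
            if definition.get("marketType") == wanted:
                ids.add(mc["id"])
    return ids


def select_new_markets(files, wanted="MATCH_ODDS"):
    """Given (name, lines) pairs, yield (name, market_ids) for markets of the
    wanted type that were not already seen in an earlier file."""
    seen = set()
    for name, lines in files:
        new_ids = market_ids_of_type(lines, wanted) - seen
        seen |= new_ids
        if new_ids:
            yield name, new_ids
```

Note that `lines` is consumed by the scan, so if it is a file handle you would reopen the file before parsing the rows for the selected market ids.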
Tiesto13

OK, I think I've ironed out any issues with the one month of sample data I've been using. Now to try running it across a bigger dataset!
LinusP

If you are just processing historical data, you don't need an API key; however, join the Slack and get one authorised for free anyway when you are ready.

You have both market and event files because you asked for both when downloading.

If you are new to Python, just use bflw (betfairlightweight): parsing the stream yourself is not easy, whereas with bflw it's a few lines and you're done.

https://github.com/betcode-org/betfair/ ... torical.py
Tiesto13

I seem to be getting errors when I try to run that betfairlightweight code. I've managed to write my own code that parses all the files I have the way I want, but currently only for the match odds market. (I could do it for other markets, but anything with more than three runners would need a rethink, as the code is quite specific at the moment.) So when I want to do any further analysis on these files, I'll probably need to spend some time getting to grips with bflw. For now, though, I'm pretty sure the match odds database I've just created will keep me busy for a while :)
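To avoid hard-coding three runners, one option is to pivot long rows (one per runner update) into one row per timestamp, building a column per selection id from whichever runners actually appear, and carrying each runner's last LTP forward, since an update object may list only the runners whose price changed. This is a sketch for a single market's rows; `to_wide` and the `ltp_<selection id>` column names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def to_wide(rows):
    """Pivot one market's long LTP rows ({'pt', 'selection_id', 'ltp'}) into
    one row per timestamp with a column per runner, forward-filling prices."""
    ordered = sorted(rows, key=itemgetter("pt"))  # stable sort keeps update order
    # Column order: runners in order of first appearance, however many there are
    runners = list(dict.fromkeys(r["selection_id"] for r in ordered))
    last = dict.fromkeys(runners)  # None until a runner first trades
    wide = []
    for pt, group in groupby(ordered, key=itemgetter("pt")):
        for r in group:
            last[r["selection_id"]] = r["ltp"]
        wide.append({"pt": pt, **{f"ltp_{sel}": last[sel] for sel in runners}})
    return wide
```

A market with more runners simply produces more columns. In pandas, `pivot_table(..., aggfunc="last")` followed by `ffill()` would give much the same result.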

And yep, I see now that there was an option to only request the market data when downloading :oops: