Database Structure for Horse Racing

Post Reply
TLindeth
Posts: 16
Joined: Thu Mar 04, 2021 4:03 pm

Hi all,

I started recording horse racing data a few weeks ago using Switesh's brilliant spreadsheet. I'm now looking to go a step further and move out of Excel and into a database. I'm going to start with MS Access to help improve my knowledge of database design before moving on to something more sophisticated when the data volume ramps up.

The issue I have is that I have no real experience with MS Access or databases in general and having to learn as I go (which is the fun part anyway). But I'm getting a little bit confused in terms of the best way to structure the database to cover all of the data that can be captured, especially the pre-off data at each 1 second interval from 5:30 out to post time (i.e. back prices, lay prices, LTP, volume, WOM Etc... as design by Switesh himself). How do people generally structure the tables to capture this level of data? As someone that is experienced in Excel but not Access, I'm struggling with the adaptation!

I've split some of the data into separate tables, i.e. courses, race types, distances etc., but it's the detail that's throwing me... I'd really appreciate some advice and guidance from those who are far more experienced at this stuff than I am! :)

Thanks in advance

Tom
User avatar
ODPaul82
Posts: 683
Joined: Sun May 08, 2011 6:32 am
Location: Digswell Herts

This is how I structure mine based on the free betfair data
ER Diagram.png
It's to a certain level of normalisation which is where you dont want to have masses of duplicate information
Basically if you can achieve 3rd normal form (google it) then you're pretty much there in structure.
You do not have the required permissions to view the files attached to this post.
TLindeth
Posts: 16
Joined: Thu Mar 04, 2021 4:03 pm

Thanks Paul - that is really useful. I'll have a look into 3rd normal form!

Nice to see a fellow member from Sutton! I've lived here all my life - assuming you're in Sutton (Surrey)... :)
User avatar
ODPaul82
Posts: 683
Joined: Sun May 08, 2011 6:32 am
Location: Digswell Herts

Was born in St Helier's, grew up around here and I have moved away here & there am currently residing in Rosehill (for my sins, somebody has to)
TLindeth
Posts: 16
Joined: Thu Mar 04, 2021 4:03 pm

Ah I know the area well mate - I grew up in Belmont/Carshalton Beeches and currently live just near Sutton station. Small world!
TLindeth
Posts: 16
Joined: Thu Mar 04, 2021 4:03 pm

So looking into normalisation has certainly helped my understanding in terms of how to create separate tables. I'd appreciate a bit of help in understanding how that relates to the more detailed information that is captured on a second by second basis, i.e. I'm currently capturing the following information at one second intervals for 5 mins pre off. Is the most efficient way of capturing this is to have separate tables for each item below , and then separate fields in each table for each second? Or have I overcomplicated it?

Lay Price 3
"Lay Price
3"
"Lay Price
2"
"Lay Price
1"
LTP
LTP IP
"Back Price
1"
"Back Price
2"
"Back Price
3"
"Lay Money
3"
"Lay Money
2"
"Lay Money
1"
Rnr Volume
Rnr Volume %
"Back Money
1"
"Back Money
2"
"Back Money
3"
"WOM
Lay"
"WOM
Back"
WOM: Lay%
WOM: Back%
harry
Posts: 14
Joined: Tue Sep 08, 2020 9:04 am

TLindeth wrote:
Sun Mar 14, 2021 2:19 pm
So looking into normalisation has certainly helped my understanding in terms of how to create separate tables. I'd appreciate a bit of help in understanding how that relates to the more detailed information that is captured on a second by second basis, i.e. I'm currently capturing the following information at one second intervals for 5 mins pre off. Is the most efficient way of capturing this is to have separate tables for each item below , and then separate fields in each table for each second? Or have I overcomplicated it?

Lay Price 3
"Lay Price
3"
"Lay Price
2"
"Lay Price
1"
LTP
LTP IP
"Back Price
1"
"Back Price
2"
"Back Price
3"
"Lay Money
3"
"Lay Money
2"
"Lay Money
1"
Rnr Volume
Rnr Volume %
"Back Money
1"
"Back Money
2"
"Back Money
3"
"WOM
Lay"
"WOM
Back"
WOM: Lay%
WOM: Back%
Great! Yes I think you're overcomplicating things slightly. In this case the best thing to do is probably to create separate fields (in the same table) for each of the items you mentioned, and then insert a separate row for each second.

Also note re normalisation: it can be helpful if you're trying to minimise duplication, but has the downside that you'll have to do a lot more joins when you query. You might be better off using a flatter data structure, maybe something like a star schema.
User avatar
ShaunWhite
Posts: 9731
Joined: Sat Sep 03, 2016 3:42 am

TLindeth wrote:
Sun Mar 14, 2021 1:56 pm
Ah I know the area well mate - I grew up in Belmont/Carshalton Beeches and currently live just near Sutton station. Small world!
Small world indeed, I worked in station approach in Cheam for ages in 80s when Clarke & Tilley were there (who eventually became part of DST). I lived in South Croydon for 20yrs (22-42) so know your manor pretty well.
TLindeth
Posts: 16
Joined: Thu Mar 04, 2021 4:03 pm

harry wrote:
Sun Mar 14, 2021 3:53 pm
TLindeth wrote:
Sun Mar 14, 2021 2:19 pm
So looking into normalisation has certainly helped my understanding in terms of how to create separate tables. I'd appreciate a bit of help in understanding how that relates to the more detailed information that is captured on a second by second basis, i.e. I'm currently capturing the following information at one second intervals for 5 mins pre off. Is the most efficient way of capturing this is to have separate tables for each item below , and then separate fields in each table for each second? Or have I overcomplicated it?

Lay Price 3
"Lay Price
3"
"Lay Price
2"
"Lay Price
1"
LTP
LTP IP
"Back Price
1"
"Back Price
2"
"Back Price
3"
"Lay Money
3"
"Lay Money
2"
"Lay Money
1"
Rnr Volume
Rnr Volume %
"Back Money
1"
"Back Money
2"
"Back Money
3"
"WOM
Lay"
"WOM
Back"
WOM: Lay%
WOM: Back%
Great! Yes I think you're overcomplicating things slightly. In this case the best thing to do is probably to create separate fields (in the same table) for each of the items you mentioned, and then insert a separate row for each second.

Also note re normalisation: it can be helpful if you're trying to minimise duplication, but has the downside that you'll have to do a lot more joins when you query. You might be better off using a flatter data structure, maybe something like a star schema.
Thanks Harry - this is an interesting concept (I've certainly underestimated this database design lark... :D ). So in essence I could create a fact table which has something along the lines of:

Date
Time
RaceID
RunnerID
CourseID
RaceTypeID

etc.. etc..

With this central fact table then providing a link to other tables of greater detail.. i.e. a table for the pre-off data (structured as you mentioned), names of runners, distances, courses etc..

Have I got that right? I'm sure I'll eventually get my head around this - you'll have to excuse my noddy questions until then :D but thanks for your help!
Last edited by TLindeth on Sun Mar 14, 2021 4:25 pm, edited 1 time in total.
TLindeth
Posts: 16
Joined: Thu Mar 04, 2021 4:03 pm

ShaunWhite wrote:
Sun Mar 14, 2021 4:14 pm
TLindeth wrote:
Sun Mar 14, 2021 1:56 pm
Ah I know the area well mate - I grew up in Belmont/Carshalton Beeches and currently live just near Sutton station. Small world!
Small world indeed, I worked in station approach in Cheam for ages in 80s when Clarke & Tilley were there (who eventually became part of DST). I lived in South Croydon for 20yrs (22-42) so know your manor pretty well.
Christ it certainly is Shaun! I'm just off Mulgrave Road which is basically adjacent to that. I know South Croydon well, grew up playing football regularly in that area, still a regular in The Jolly Farmer just down the road in Purley too!
harry
Posts: 14
Joined: Tue Sep 08, 2020 9:04 am

TLindeth wrote:
Sun Mar 14, 2021 4:20 pm

Thanks Harry - this is an interesting concept (I've certainly underestimated this database design lark... :D ). So in essence I could create a fact table which has something along the lines of:

Date
Time
RaceID
RunnerID
CourseID
RaceTypeID

etc.. etc..

With this central fact table then providing a link to other tables of greater detail.. i.e. a table for the pre-off data (structured as you mentioned), names of runners, distances, courses etc..

Have I got that right? I'm sure I'll eventually get my head around this - you'll have to excuse my noddy questions until then :D but thanks for your help!
Haha yeah there's so much to it, I always find I end up spending a lot more time than I plan on the database side of things and it takes me a few iterations to get it the way I want. The main thing is to get something reasonable that will work, start capturing the data, and then when you think of a better way to do it later you can migrate your old data into that. I think your plan sounds pretty good!
User avatar
ShaunWhite
Posts: 9731
Joined: Sat Sep 03, 2016 3:42 am

TLindeth wrote:
Sun Mar 14, 2021 4:24 pm

Christ it certainly is Shaun! I'm just off Mulgrave Road which is basically adjacent to that. I know South Croydon well, grew up playing football regularly in that area, still a regular in The Jolly Farmer just down the road in Purley too!
:) Bloody google. I've just spend the last 5mins 'walking' down Mulgrave Rd re-living my old commute. I wasn't in the posh end down near Purley, I was up near the flyover (in The Waldrons), it was only estate agents that called it South Croydon really.
TLindeth
Posts: 16
Joined: Thu Mar 04, 2021 4:03 pm

harry wrote:
Sun Mar 14, 2021 5:09 pm
TLindeth wrote:
Sun Mar 14, 2021 4:20 pm

Thanks Harry - this is an interesting concept (I've certainly underestimated this database design lark... :D ). So in essence I could create a fact table which has something along the lines of:

Date
Time
RaceID
RunnerID
CourseID
RaceTypeID

etc.. etc..

With this central fact table then providing a link to other tables of greater detail.. i.e. a table for the pre-off data (structured as you mentioned), names of runners, distances, courses etc..

Have I got that right? I'm sure I'll eventually get my head around this - you'll have to excuse my noddy questions until then :D but thanks for your help!
Haha yeah there's so much to it, I always find I end up spending a lot more time than I plan on the database side of things and it takes me a few iterations to get it the way I want. The main thing is to get something reasonable that will work, start capturing the data, and then when you think of a better way to do it later you can migrate your old data into that. I think your plan sounds pretty good!

Thanks Harry - now I just need to put the plan into action, maybe that will my help my understanding of how this truly works...
TLindeth
Posts: 16
Joined: Thu Mar 04, 2021 4:03 pm

ShaunWhite wrote:
Sun Mar 14, 2021 5:39 pm
TLindeth wrote:
Sun Mar 14, 2021 4:24 pm

Christ it certainly is Shaun! I'm just off Mulgrave Road which is basically adjacent to that. I know South Croydon well, grew up playing football regularly in that area, still a regular in The Jolly Farmer just down the road in Purley too!
:) Bloody google. I've just spend the last 5mins 'walking' down Mulgrave Rd re-living my old commute. I wasn't in the posh end down near Purley, I was up near the flyover (in The Waldrons), it was only estate agents that called it South Croydon really.
Haha - Sutton has changed a fair bit since your commuting days. There are significant developments around the station now, it actually looks pretty impressive. Shame the high street hasn't really received the same attention. The same could be said for Croydon too!
Post Reply

Return to “Betfair Data”