Dogs Pre Off Data Database Design

Post Reply
User avatar
FrankColumbo
Posts: 5
Joined: Fri Jan 15, 2021 4:39 am

Hello all

Attached is a picture of my scheme design for holding pre off data for dogs.

Pre off data is collected every half second (ish) and then saved to a .csv file, and this file is uploaded to a DB.

Please feel free to use it, or give suggestions on how it can be improved.

Columbo
You do not have the required permissions to view the files attached to this post.
User avatar
johnsheppard
Posts: 235
Joined: Mon Feb 04, 2019 6:00 am
Location: Cairns Australia

Cant see what else is in RaceData, but you'd want that in 3rd normal form wouldnt ya? i.e. A market table, A Runner table (specific to a market), and a Runner Description table (independent of the market), and a price table...why not just mirror the api?
TimeToStart, why not a time stamp? You can't store that on the fly because you don't know exactly when it's going to start? right? (I havent got up to looking at dogs yet)
Why split of the Distance into a different table?
Your race table would probably just be your market table as per the api....
I'm a bit rusty though :)
User avatar
MemphisFlash
Posts: 1569
Joined: Fri May 16, 2014 10:12 pm
Location: Leicester

You say feel free to share but you haven't posted
anything we can have a look at?
ilovepizza82
Posts: 227
Joined: Thu Nov 02, 2017 3:41 pm

The diagram...what is it made in ?
And 2...what does all this science even mean ? : )
You do not have the required permissions to view the files attached to this post.
User avatar
FrankColumbo
Posts: 5
Joined: Fri Jan 15, 2021 4:39 am

Good morning / G' Day

thanks for you replies.

I didnt explain this very well. I use BA excel functions to get this data (I currently have zero skills for obtaining this direct from the API) live. Excel records this data and creates an .csv file of all the races (UK only) at the end of the day. Another program then grabs the .csv and puts it into the database on a daily basis.

johnsheppard:
The RaceData table just repeats the columns until the 6th runner.
Runner data - the DB was created to enable back testing of bots; its not a prefect test, but gives you a pretty good idea of what doesnt work and what could work (the next stage is practice mode). That is why, for me, the runner names are not relevant.
TimeToStart. This was my biggest headache. Here are my thoughts.
If I am using a bot I have to green up at some point before the race starts.
The only thing I am certain of is that the race wont start before the start time.
Therefore, everything after the start time is not required - the bot must green up before start
Different ways to store time in a DB: UTC, epoch, HH:MM:SS.mmm, YYYYMMDD HH:MM:SS.mmm, number of milisecond etc etc
This column is there to ensure that race data is ordered by time when you request it from the DB
I've spilt the distance into another table because of normalisation

MemphisFlash:
During this development the hardest thing (for me) was designing the database. I thought it might be useful to others

ilovepizza82
This diagram was made in mySql workbench. This program is free and really good for design.
lol

everyone:
I would class myself as mediocre in database design. If anyone thinks my reasoning is wrong or there are other / better ways, please let me know.

EDIT: with time to start you can ask for tthings like: all races with volumne greater than x at 4 minutes to start
User avatar
johnsheppard
Posts: 235
Joined: Mon Feb 04, 2019 6:00 am
Location: Cairns Australia

FrankColumbo wrote:
Fri May 28, 2021 10:22 am
johnsheppard:
The RaceData table just repeats the columns until the 6th runner.
Runner data - the DB was created to enable back testing of bots; its not a prefect test, but gives you a pretty good idea of what doesnt work and what could work (the next stage is practice mode). That is why, for me, the runner names are not relevant.
TimeToStart. This was my biggest headache. Here are my thoughts.
If I am using a bot I have to green up at some point before the race starts.
The only thing I am certain of is that the race wont start before the start time.
Therefore, everything after the start time is not required - the bot must green up before start
Different ways to store time in a DB: UTC, epoch, HH:MM:SS.mmm, YYYYMMDD HH:MM:SS.mmm, number of milisecond etc etc
This column is there to ensure that race data is ordered by time when you request it from the DB
I've spilt the distance into another table because of normalisation

everyone:
I would class myself as mediocre in database design. If anyone thinks my reasoning is wrong or there are other / better ways, please let me know.
I am no 30 year hardened professional, but in my experience with database design (or any kind of software engineering) there's never really a 100% always correct answer... everything always has a trade off and it's ALWAYS about picking the lesser of several evils for YOUR situation.. I would say just do what works for you and keep it simple as you can. It's not like it's going to be a corporate database with millions of users.

For mine the distance thing isn't necessary in another table. It's a property of the race entity, and isn't going to change independent of it.

Regarding TimeToStart, I have always been of the thought that you should record what you see, then calculate whatever you have to after the fact... (there are exceptions, but I don't think this is one of them). It doesn't really matter what format you store it as. I use UTC because that's what the API provides. If you just time stamp it, you can calc time frames based on exactly when the race did actually jump (else wise you are doing some messy calculations)... I would presume there are nuances to market behaviour when races are delayed, etc... those would be things you could exploit.... (I don't do dogs though so I don't really know)
Post Reply

Return to “Betfair Data”