I really want to get to grips with some of the Betfair data sheets ( using the Betfair data downloader from paspuggie IIRC). That's working fine, and I'm a fairly advanced Excel user- but Excel isn't really built for storing large quantities of data, nor is it much good at queries and chopping data around.
Thing is- I don't know where to go from here! I have very very basic knowledge of using SQL ( 10 yrs ago plus when I was working for a big multinational that basically threw all manner of systems at us)- but now it's me and my laptop, with no IT department or SQL gurus to call upon, I feel a bit lost!
How did you guys start re collecting and formatting data?
Using Excel or SQL for data crunching
- ShaunWhite
- Posts: 9731
- Joined: Sat Sep 03, 2016 3:42 am
Started with Excel and soon realised it wasn't up to the job. So I now store all my raw api data in SQL.
Like you it had been a decade plus since I used it but you don't need anything very sophisticated so relearning isn't a huge job, just a week or 3. It's certainly easier than getting used to the lingo of the newer dB stuff.
I use the MS SQL Server Management Studio front end (with a SQL Server 2008R2 database) but also link it to Excel for graphical output, or extract csv files using code (c#) to analyse in Excel.
It's harder to start with but worth it in the long run.
Like you it had been a decade plus since I used it but you don't need anything very sophisticated so relearning isn't a huge job, just a week or 3. It's certainly easier than getting used to the lingo of the newer dB stuff.
I use the MS SQL Server Management Studio front end (with a SQL Server 2008R2 database) but also link it to Excel for graphical output, or extract csv files using code (c#) to analyse in Excel.
It's harder to start with but worth it in the long run.
Am pretty much the same as Shaun with everything going into either SQL Server or into MS Access DBs that I'm yet to port over (mostly due to laziness).
I tend to recommend here for people to learn SQL
https://www.w3schools.com/sql/
For getting the data into a database from Excel I use ADO connections in my VBA as when I do shift stuff from one backend DB to another, its easy enough to just update the connection string (the SQL itself doesn't need major amounts of modification)
Main two bits you need are
ADODB.Connection
ADODB.Recordset
Adding or updating data is then easy enough using ADO
As a sample (and no this wouldn't compile)
Fetching
Where mstrGetStandardTimesSQL is a function that returns a SQL statement
And if I was storing data it'd be very similar
I tend to recommend here for people to learn SQL
https://www.w3schools.com/sql/
For getting the data into a database from Excel I use ADO connections in my VBA as when I do shift stuff from one backend DB to another, its easy enough to just update the connection string (the SQL itself doesn't need major amounts of modification)
Main two bits you need are
ADODB.Connection
ADODB.Recordset
Adding or updating data is then easy enough using ADO
As a sample (and no this wouldn't compile)
Fetching
Code: Select all
Public Function gstrConnString() As String
gstrConnString = "Provider=Microsoft.ACE.OLEDB.12.0;" & _
"Data Source=D:\Trading\Models\Horse Racing\Ratings Backing DB.accdb;Persist Security Info=False;"
End Function
...
...
Private Sub Blah()
.....
Set objConn = New ADODB.Connection
objConn.Open gstrConnString()
Set objRST = New ADODB.Recordset
objRST.Open mstrGetStandardTimesSQL(), objConn, adOpenStatic, adLockReadOnly
And if I was storing data it'd be very similar
Code: Select all
objConn.Open gstrConnString()
objConn.Execute "INSERT INTO tablle (col1, col2) VALUES (val1, val2)"
Almost same as Shaun - SQL 2008R2 / 2012 + ssms + excel.
Data in sql - do broad summary queries in ssms (simple SUM() queries etc) and if something looks interesting export the underlying rows to excel.
I have a standard skeleton spreadsheet with predefined graphs, summary analysis tables etc so just a simple copy/paste gives me loads of info. Also, pivot tables are really useful for poking around. Rinse and repeat.
If you want sql features/capacity beyond the free versions there are lots of sellers of licence keys online that work. You just need to find installation media.
I've not got anything useful out of PowerBI (powerquery replacement) - it's a bit like pivot graph connected to the db but it can't handle lots of features of normal queries - means you need predefined views for anything a little bit complex which brings its own problems of view features and maintenance.
Data in sql - do broad summary queries in ssms (simple SUM() queries etc) and if something looks interesting export the underlying rows to excel.
I have a standard skeleton spreadsheet with predefined graphs, summary analysis tables etc so just a simple copy/paste gives me loads of info. Also, pivot tables are really useful for poking around. Rinse and repeat.
If you want sql features/capacity beyond the free versions there are lots of sellers of licence keys online that work. You just need to find installation media.
I've not got anything useful out of PowerBI (powerquery replacement) - it's a bit like pivot graph connected to the db but it can't handle lots of features of normal queries - means you need predefined views for anything a little bit complex which brings its own problems of view features and maintenance.
- MemphisFlash
- Posts: 2159
- Joined: Fri May 16, 2014 10:12 pm
- Location: Leicester
Your wrong about excel not being up to the task for data.
Power Query can handle millions of rows of data, then slice and dice it as you wish.
Power Query can handle millions of rows of data, then slice and dice it as you wish.
Ditto SQL Server but with SQL Server report builder (all free versions). My charts are simple so can't weigh Report builder vs Excel. I imagine the latest paid versio of Excel could well be more powerful.
Only thing I would say is installing these things and getting to grips with the functionaity can present hair-tearing hurdles.
When you overcome one of those hurdles make sure you document it - beleive me it's worth the effort to do this before you excitedly move on to the next stage. Patience and diligence at those moments will repay you if lose your config for any reason (it happens, wrecked a laptop but saved the hard drive which was very lucky as I'd been lax on my backup strategy).
Only thing I would say is installing these things and getting to grips with the functionaity can present hair-tearing hurdles.
When you overcome one of those hurdles make sure you document it - beleive me it's worth the effort to do this before you excitedly move on to the next stage. Patience and diligence at those moments will repay you if lose your config for any reason (it happens, wrecked a laptop but saved the hard drive which was very lucky as I'd been lax on my backup strategy).
- ShaunWhite
- Posts: 9731
- Joined: Sat Sep 03, 2016 3:42 am
It depends what you want to do with it, and we haven't suggested the more complex solution because we didn't know Excel. Databases were career bread and butter for some of the above people.MemphisFlash wrote: ↑Sat Sep 10, 2022 10:41 amYour wrong about excel not being up to the task for data.
Power Query can handle millions of rows of data, then slice and dice it as you wish.
Eg I'm collection iro 2gb of data a day in total, just the UK dogs is about 2.5m "rows" of data although it's not arranged in rows in a relational dababase. Running a 6 month backtest means reading almost half a billion market change messages. That's aside from analysing the hundreds of thousands of bets that generates.
Others will have their own use case scenario such as linking to the latest ML or AI software.
Excel will get you so far but if you want to be limited by your imagination rather than your tools then the upfront effort is worth it.
- ShaunWhite
- Posts: 9731
- Joined: Sat Sep 03, 2016 3:42 am
Just to clarify I'm not knocking Excel for looking at your data, it's a really brilliant tool for that, it's just not a great place to keep the data warehouse itself.MemphisFlash wrote: ↑Sat Sep 10, 2022 10:41 amYour wrong about excel not being up to the task for data.
Interesting reading, thanks- shows here's more than one way to skin a cat. I know Excel and I'm comfortable with it, as well as some Power Query; but keeping data in Excel is a worry of mine as if it corrupts I've lost the lot. Plus seeing that you can save 2GB of data day just on UK dogs shows Excel won't cope long term for storage.
Will definitely look into some SQL for server functionality- I am prepared to be tearing my hair out while I get to grips with it though!
Will definitely look into some SQL for server functionality- I am prepared to be tearing my hair out while I get to grips with it though!
Google usually throws up results from stackoverflow and specialist excel sites. And this forum is a rich resource. Personally I've found answers without asking a direct question on here (the Q had already been answered, just needed to alight on it). But for sure I would ask any question on here that I can't find a solution for and I'm certain I'd get it.BetWin wrote: ↑Sat Sep 10, 2022 3:09 pmInteresting reading, thanks- shows here's more than one way to skin a cat. I know Excel and I'm comfortable with it, as well as some Power Query; but keeping data in Excel is a worry of mine as if it corrupts I've lost the lot. Plus seeing that you can save 2GB of data day just on UK dogs shows Excel won't cope long term for storage.
Will definitely look into some SQL for server functionality- I am prepared to be tearing my hair out while I get to grips with it though!
- ShaunWhite
- Posts: 9731
- Joined: Sat Sep 03, 2016 3:42 am
(That was the api stream for UK and Aus, dogs and horses. The UK dogs are only about 1/4 of it.)
... But how are you planning to harvest the data? Will it be just api price data or supplemented/synced with anything else. And what do you plan to do with it? Just rhetorical questions (I don't care ) but all part of the decision I guess.
The one thing I'm sure we all agree on is to collect anything and everything. A year down the line the one thing you thought you wouldn't need is the only thing you want.
- ShaunWhite
- Posts: 9731
- Joined: Sat Sep 03, 2016 3:42 am
This is a really useful video from our Antipodean friends about data with lots of interesting stuff for people at all levels.
It's rare to see content that's detailed and current, so worth watching even if you don't use data much or don't think it would help. https://youtu.be/2gC_ES-VKwo
It's rare to see content that's detailed and current, so worth watching even if you don't use data much or don't think it would help. https://youtu.be/2gC_ES-VKwo
You do not have the required permissions to view the files attached to this post.