Betfair SP file scraper using Python

PeterLe
Posts: 3715
Joined: Wed Apr 15, 2009 3:19 pm

Morning,
I thought this may be helpful for those new to Python who may be interested in downloading Betfair SP files for analysis. It's really targeted at people new to Python/programming.
I've had some great help from others over the years and I wanted to help those who find this whole subject a bit daunting.

Overview:
This file scraper navigates to the Betfair SP history URL and, based on your date input, downloads the SP files to a local directory and then concatenates them. You can then use the data in SQL for analysis.
You will need to set up your directories as per the code (see SP_DATA_PATH), or the program will fail.
NOTE: Don't try downloading too many files in one go! Just do a few months at a time.
This will save you loads of time.
(Although not detailed in the program, I then copy these to my NAS drives and I'm starting to build up my own data set.)
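
For anyone curious, the daily files follow a simple naming pattern (a market prefix plus the date as ddmmyyyy), so the script only has to build each filename and fetch it. A quick illustration (the date here is just an example):

Code: Select all

# Example of the filename/URL pattern the script downloads (mirrors the code below)
from datetime import datetime

URL = 'https://promo.betfair.com/betfairsp/prices/'
date_str = datetime(2021, 1, 1).strftime('%d%m%Y')    # -> '01012021'
filename = 'dwbfpricesukwin{}.csv'.format(date_str)   # UK horse racing win market
print(URL + filename)
# https://promo.betfair.com/betfairsp/prices/dwbfpricesukwin01012021.csv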

How to do it:
If you're new to Python and would like to have a go, I'd recommend The Modern Python 3 Bootcamp by Colt Steele on Udemy (currently £16.99, 30 hours). I didn't complete this 100%; I tend to refer back to certain sections as I need them.

You will need an editor (IDE). I use PyCharm; the basic version is free and will do you for now (I can't see me wanting to upgrade any time soon, although some of the better coders may suggest otherwise). Once you have the basics of Python under your belt, copy and paste the code below.

Once you have the data, you will need to view/analyse it. SQL is a good tool for this. New to SQL? So was I. This is an excellent course and very hands-on; SQL is fairly simple once you get the grasp of it:

MySQL for Data Analysis by John Pauler (£12.99, 6 hours of training)
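
Just to give a flavour of the analysis side, here's a minimal sketch of pulling one of the concatenated files into SQLite from Python (it assumes you have pandas installed; the file name and the BSP/WIN_LOSE column names are only examples, so adjust them to whatever your download actually produces):

Code: Select all

# Rough sketch only: load one concatenated SP file into SQLite and run a query.
# File name and column names below are examples - change them to match your output.
import sqlite3
import pandas as pd

db = sqlite3.connect(r'A:\Users\User-1\Data\BetfairSPData\bsp.db')
csv_path = r'A:\Users\User-1\Data\BetfairSPData\Horseracing\UKHorseRacing_Win\dwbfpricesukwin01012021-31012021.csv'

pd.read_csv(csv_path).to_sql('uk_win', db, if_exists='replace', index=False)

# e.g. average BSP of winners (assumes the usual BSP and WIN_LOSE columns)
print(pd.read_sql('SELECT AVG(BSP) AS avg_winner_bsp FROM uk_win WHERE WIN_LOSE = 1', db))
db.close()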

Caveats: I'm only just learning this myself, so I may not be able to answer every query. (Note: if you encounter an error, Python will give you a message. There's a very good chance that someone else will have had the same issue before, and a quick search will give you pointers. In fact, I've found this to be part and parcel of learning, and you will need to put the 'spade work' in.)

If you find it useful and want to progress to the next stage (collecting API data/prices etc.), then search for Liam's betfairlightweight and Flumine on GitHub. (I'm still working on that! It's a stretch goal for me!)
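Just to give a flavour of where that leads, below is a very rough sketch of what a betfairlightweight login looks like. The username, password, app key and certs path are placeholders, and the exact calls may differ, so check Liam's README on GitHub for the real details:

Code: Select all

# Rough sketch of a betfairlightweight login - placeholders only, see the README for details
import betfairlightweight
from betfairlightweight import filters

trading = betfairlightweight.APIClient('username', 'password', app_key='app_key', certs='/certs')
trading.login()

# list the available event types (horse racing, greyhounds, etc.)
event_types = trading.betting.list_event_types(filter=filters.market_filter())
for event_type in event_types:
    print(event_type.event_type.name, event_type.market_count)

trading.logout()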
Anyway, this is a nice simple project to get your feet wet.
regards
Peter

# My Betfair File Scraper

import wget
import traceback
from urllib.error import HTTPError
from os import path, mkdir, remove
from datetime import datetime, timedelta
import shutil
import os

# If set to True, it will re-download and overwrite existing files
OVERWRITE = False

DATE_INPUT_FORMAT = '%d/%m/%Y'

URL = 'https://promo.betfair.com/betfairsp/prices/'

SP_DATA_PATH = r'A:\Users\User-1\Data\BetfairSPData'

GREYHOUND_FOLDER = 'Greyhound'
HORSERACING_FOLDER = 'Horseracing'

PREFIX_LOCATION_MAP = {'dwbfgreyhoundplace': path.join(GREYHOUND_FOLDER, 'Greyhound_Place'),
                       'dwbfgreyhoundwin': path.join(GREYHOUND_FOLDER, 'Greyhound_Win'),
                       'dwbfpricesireplace': path.join(HORSERACING_FOLDER, 'IREHorseRacing_Place'),
                       'dwbfpricesirewin': path.join(HORSERACING_FOLDER, 'IREHorseRacing_Win'),
                       'dwbfpricesukwin': path.join(HORSERACING_FOLDER, 'UKHorseRacing_Win'),
                       'dwbfpricesukplace': path.join(HORSERACING_FOLDER, 'UKHorseRacing_Place')}

TEMP_FOLDER = 'Temp'

if __name__ == '__main__':

    # Check if target folders exist and make sure temp folders are emptied

    folders = list(PREFIX_LOCATION_MAP.values())
    temp_folders = [path.join(x, TEMP_FOLDER) for x in folders]

    for location in [GREYHOUND_FOLDER, HORSERACING_FOLDER] + folders + temp_folders:
        folder_path = path.join(SP_DATA_PATH, location)
        if location in temp_folders:
            try:
                shutil.rmtree(folder_path)
            except FileNotFoundError:
                pass
        if not path.exists(folder_path):
            mkdir(folder_path)

    # Get date input

    dt_start = dt_end = None

    while dt_start is None:
        try:
            date_input = input('Please enter the start date of the Files you want to download: ').strip()
            dt_start = datetime.strptime(date_input, DATE_INPUT_FORMAT)
        except ValueError:
            print('Could not parse the date.')

    while dt_end is None:
        try:
            date_input = input('Please enter the end date of the Files you want to download: ').strip()
            dt_end = datetime.strptime(date_input, DATE_INPUT_FORMAT)
        except ValueError:
            print('Could not parse the date.')

    # Generate a range of dates

    dates = [dt_start + timedelta(days=x) for x in range((dt_end - dt_start).days + 1)]

    # Download files to temp folders

    for date in dates:
        date_str = datetime.strftime(date, '%d%m%Y')

        for prefix in PREFIX_LOCATION_MAP:
            try:
                filename = '{}{}.csv'.format(prefix, date_str)
                print(filename)
                destination = path.join(SP_DATA_PATH, PREFIX_LOCATION_MAP[prefix], TEMP_FOLDER, filename)

                if path.exists(destination):
                    if OVERWRITE:
                        remove(destination)
                    else:
                        continue

                wget.download(URL + filename, destination)
            except HTTPError as http_error:
                print('HTTP Error:', http_error.code)
            except:
                print(traceback.format_exc())

    # Concatenate files

    for prefix in PREFIX_LOCATION_MAP:
        filename = '{}{}-{}.csv'.format(prefix,
                                        datetime.strftime(dt_start, '%d%m%Y'),
                                        datetime.strftime(dt_end, '%d%m%Y'))
        file_path = path.join(SP_DATA_PATH, PREFIX_LOCATION_MAP[prefix], filename)

        with open(file_path, 'w') as destination:
            folder_path = os.path.join(SP_DATA_PATH, PREFIX_LOCATION_MAP[prefix], TEMP_FOLDER)
            start = 0
            for name in os.listdir(folder_path):
                with open(os.path.join(folder_path, name)) as source:
                    lines = source.readlines()
                    # Column names should be added just once
                    destination.writelines(lines[start:])
                    start = 1
            shutil.rmtree(folder_path)
spreadbetting
Posts: 3140
Joined: Sun Jan 31, 2010 8:06 pm

Thanks Peter, I currently use PHP but will definitely have a look to switch when I get a chance.
LinusP
Posts: 1871
Joined: Mon Jul 02, 2012 10:45 pm

Looking good, you will be contributing to open source soon 8-)

Btw, if you wrap it in code tags it's easier to read:

Code: Select all

import wget
import traceback
from urllib.error import HTTPError
from os import path, mkdir, remove
from datetime import datetime, timedelta
import shutil
import os

# If set to True, it will re-download and overwrite existing files
OVERWRITE = False

DATE_INPUT_FORMAT = '%d/%m/%Y'

URL = 'https://promo.betfair.com/betfairsp/prices/'

SP_DATA_PATH = r'A:\Users\User-1\Data\BetfairSPData'

GREYHOUND_FOLDER = 'Greyhound'
HORSERACING_FOLDER = 'Horseracing'

PREFIX_LOCATION_MAP = {'dwbfgreyhoundplace': path.join(GREYHOUND_FOLDER, 'Greyhound_Place'),
                       'dwbfgreyhoundwin': path.join(GREYHOUND_FOLDER, 'Greyhound_Win'),
                       'dwbfpricesireplace': path.join(HORSERACING_FOLDER, 'IREHorseRacing_Place'),
                       'dwbfpricesirewin': path.join(HORSERACING_FOLDER, 'IREHorseRacing_Win'),
                       'dwbfpricesukwin': path.join(HORSERACING_FOLDER, 'UKHorseRacing_Win'),
                       'dwbfpricesukplace': path.join(HORSERACING_FOLDER, 'UKHorseRacing_Place')}

TEMP_FOLDER = 'Temp'

if __name__ == '__main__':

    # Check if target folders exist and make sure temp folders are emptied

    folders = list(PREFIX_LOCATION_MAP.values())
    temp_folders = [path.join(x, TEMP_FOLDER) for x in folders]

    for location in [GREYHOUND_FOLDER, HORSERACING_FOLDER] + folders + temp_folders:
        folder_path = path.join(SP_DATA_PATH, location)
        if location in temp_folders:
            try:
                shutil.rmtree(folder_path)
            except FileNotFoundError:
                pass
        if not path.exists(folder_path):
            mkdir(folder_path)

    # Get date input

    dt_start = dt_end = None

    while dt_start is None:
        try:
            date_input = input('Please enter the start date of the Files you want to download: ').strip()
            dt_start = datetime.strptime(date_input, DATE_INPUT_FORMAT)
        except ValueError:
            print('Could not parse the date.')

    while dt_end is None:
        try:
            date_input = input('Please enter the end date of the Files you want to download: ').strip()
            dt_end = datetime.strptime(date_input, DATE_INPUT_FORMAT)
        except ValueError:
            print('Could not parse the date.')

    # Generate a range of dates

    dates = [dt_start + timedelta(days=x) for x in range((dt_end - dt_start).days + 1)]

    # Download files to temp folders

    for date in dates:
        date_str = datetime.strftime(date, '%d%m%Y')

        for prefix in PREFIX_LOCATION_MAP:
            try:
                filename = '{}{}.csv'.format(prefix, date_str)
                print(filename)
                destination = path.join(SP_DATA_PATH, PREFIX_LOCATION_MAP[prefix], TEMP_FOLDER, filename)

                if path.exists(destination):
                    if OVERWRITE:
                        remove(destination)
                    else:
                        continue

                wget.download(URL + filename, destination, )
            except HTTPError as http_error:
                print('HTTP Error:', http_error.code)
            except:
                print(traceback.format_exc())

    # Concatenate files

    for prefix in PREFIX_LOCATION_MAP:
        filename = '{}{}-{}.csv'.format(prefix,
                                        datetime.strftime(dt_start, '%d%m%Y'),
                                        datetime.strftime(dt_end, '%d%m%Y'))
        file_path = path.join(SP_DATA_PATH, PREFIX_LOCATION_MAP[prefix], filename)

        with open(file_path, 'w') as destination:
            folder_path = os.path.join(SP_DATA_PATH, PREFIX_LOCATION_MAP[prefix], TEMP_FOLDER)
            start = 0
            for name in os.listdir(folder_path):
                with open(os.path.join(folder_path, name)) as source:
                    lines = source.readlines()
                    # Column names should be added just once
                    destination.writelines(lines[start:])
                    start = 1
            shutil.rmtree(folder_path)
James
Posts: 38
Joined: Fri Sep 21, 2012 12:57 pm

This is great Peter, thanks.
MemphisFlash
Posts: 2126
Joined: Fri May 16, 2014 10:12 pm
Location: Leicester

or use this, much better and easier
Betfair File Downloader.xls