Python CSV Clean Up for Greyhounds

Post Reply
User avatar
gazuty
Posts: 2553
Joined: Sun Jun 26, 2011 11:03 am

Over my summer break I am taking baby steps in Python - using PyCharm Community 2020.3

I thought I would start out looking at my last three months of Greyhound Data, so I downloaded that in csv format from my Betfair P&L page.

Those familiar will know that the first column of the csv data is very messy and so I have created a short program to clean it up and prepare a revised excel spreadsheet.

In order to run the program you will need PyCharm Community 2020.3, you will need to have pandas and openpyxl installed and have excel.

After downloading my betfair data I named it "hounds.csv" and put it in the same folder as my PyCharm project. You will see in the red text below you will need to insert your own username into the pathway.

I plan to create a program to analyse this data.

Hope this is helfpul.

/////

import pandas as pd

df = pd.read_csv('hounds.csv')
df.dropna(inplace = True)
df.drop(columns=['Start time', 'Settled date'])
header_list = ['Market', 'Track', 'Date', 'Race', 'Length', 'Profit/Loss (AUD)']
df = df.reindex(columns=header_list)

new = df["Market"].str.split("/", n = 1, expand = True)
df['Market']=new[0]
newc=new[1].str.split(" ", n=2, expand=True)
df['Track']=newc[1]
newc=newc[2].str.split(":", expand=True)
newc['Date']=newc[0].str[-9:]
df['Date']=newc['Date']
newc['Race']=newc[1].str[:3]
df['Race']=newc['Race']
newc['Length']=newc[1].str[4:9]
df['Length']=newc['Length']

df.to_excel (r'C:\Users\XXXX\Desktop\export_dataframe.xlsx', index = False, header=True)
User avatar
Dallas
Posts: 23408
Joined: Sun Aug 09, 2015 10:57 pm

Good work Gazuty

Always good to see new stuff shared to the forum, and Python is something i'd like to get stuck into and begin to learn one day
spreadbetting
Posts: 3140
Joined: Sun Jan 31, 2010 8:06 pm

Same here, Dallas. I use VBA to parse the data files but always good to learn something new. I started learning python about a year ago but got knocked off track when covid shut everything down. Hopefully get back into coding in the new yer.
User avatar
mcgoo
Posts: 898
Joined: Thu Jul 18, 2013 12:30 pm

Had to really fiddle with my pycharm interpreter :shock: and working directory (which helped a lot with understanding) to get this to work. Been whiling away my hours with some python learning but chemo and old brain makes for hard yakka :D
Thanks for this gazuty. Look forward to using pandas+python more so I can at least follow what you did :)
User avatar
gazuty
Posts: 2553
Joined: Sun Jun 26, 2011 11:03 am

mcgoo wrote:
Sat Jan 02, 2021 1:05 am
Had to really fiddle with my pycharm interpreter :shock: and working directory (which helped a lot with understanding) to get this to work. Been whiling away my hours with some python learning but chemo and old brain makes for hard yakka :D
Thanks for this gazuty. Look forward to using pandas+python more so I can at least follow what you did :)
Well done McGoo. Hope you beat that bastard cancer. One day we will die, but every day before then we will live. Carpe diem.
PeterLe
Posts: 3722
Joined: Wed Apr 15, 2009 3:19 pm

Gazuty
This is a great little course (dont pay more than 12 GBP for it though
Regards
Peter

https://www.udemy.com/course/python-for-excel-users/
Cash Collector
Posts: 12
Joined: Sat Sep 21, 2024 1:06 pm

Hey did you have any luck setting up a greyhounds CSV? I'm currently trying to create one and would love to ask a couple of questions if so!
User avatar
MemphisFlash
Posts: 2329
Joined: Fri May 16, 2014 10:12 pm

how would you add code so that Start Time is included in the clean up?
Revenant
Posts: 25
Joined: Fri Dec 20, 2024 2:09 pm

MemphisFlash wrote:
Fri Mar 21, 2025 1:52 pm
how would you add code so that Start Time is included in the clean up?
The line
df.drop(columns=['Start time', 'Settled date'])
Is dropping those columns so basically you need to leave them in and decide what format you want it in or if it's split into date and time.

To be honest I'd have thought you'd be better using VBA to process the CSV files as most of your stuff is done in excel so setting up a function would be more useful to convert and stick on your sheet for analysis.

Some VBA, probably needs a little tweaking

Code: Select all

Sub ProcessBettingPandL()
    Dim ws As Worksheet
    Dim filePath As String
    Dim fileNum As Integer
    Dim lineData As String
    Dim dataArray As Variant
    Dim outputRow As Integer
    Dim splitMarket As Variant
    Dim splitDetails As Variant
    Dim eventDetails As String
    Dim splitEvent As Variant
    Dim i As Integer

    ' Use the active sheet and clear previous data
    Set ws = ActiveSheet
    ws.Cells.Clear

    ' Prompt user to select the file
    With Application.FileDialog(msoFileDialogFilePicker)
        .Title = "Select BettingPandL.csv"
        .Filters.Clear
        .Filters.Add "CSV Files", "*.csv", 1
        .AllowMultiSelect = False
        If .Show <> -1 Then Exit Sub
        filePath = .SelectedItems(1)
    End With

    ' Open the file
    fileNum = FreeFile
    Open filePath For Input As #fileNum

    ' Write headers
    ws.Cells(1, 1).Value = "Sport"
    ws.Cells(1, 2).Value = "Track"
    ws.Cells(1, 3).Value = "Grade"
    ws.Cells(1, 4).Value = "Distance"
    ws.Cells(1, 5).Value = "Date"
    ws.Cells(1, 6).Value = "Time"
    ws.Cells(1, 7).Value = "Profit/Loss (£)"
    
    outputRow = 2

    ' Read and process file line by line
    Line Input #fileNum, lineData ' Skip the header row
    Do While Not EOF(fileNum)
        Line Input #fileNum, lineData
        dataArray = Split(lineData, ",") ' Split CSV row into columns

        ' Process "Market" column
        splitMarket = Split(dataArray(0), " / ") ' Split by " / "
        
        ' Extract event details (everything after the slash)
        eventDetails = Trim(splitMarket(1))

        ' Split the event details by " : " to separate the date + race info
        splitDetails = Split(eventDetails, " : ")

        ' Ensure there is data after " : "
        If UBound(splitDetails) < 1 Then GoTo NextRow

        ' Extract course (track) name from the portion **before the date**
        splitEvent = Split(splitDetails(0), " ") ' Split by space

        ' Track name is everything **before the last two words (date)**
        Dim trackName As String
        Dim numParts As Integer
        numParts = UBound(splitEvent)

        ' Reconstruct track name (everything before the last two elements)
        trackName = ""
        If numParts > 1 Then
            Dim j As Integer
            For j = 0 To numParts - 2 ' Exclude last two (date)
                trackName = trackName & splitEvent(j) & " "
            Next j
            trackName = Trim(trackName) ' Remove trailing space
        Else
            trackName = splitEvent(0) ' If only one word, assume it's the track
        End If

        ' Extract the race details (Grade + Distance)
        Dim raceDetails As String
        raceDetails = Trim(splitDetails(1)) ' FIX: Use it as is

        ' Handle "To Be Placed"
        If raceDetails = "To Be Placed" Then
            ws.Cells(outputRow, 1).Value = splitMarket(0) ' Sport
            ws.Cells(outputRow, 2).Value = trackName ' Track
            ws.Cells(outputRow, 3).Value = "" ' No grade
            ws.Cells(outputRow, 4).Value = "To Be Placed" ' Assign "To Be Placed" to Distance
        Else
            ' Split into Grade and Distance
            splitEvent = Split(raceDetails, " ")

            Dim grade As String, distance As String
            grade = ""
            distance = ""

            ' Ensure we have enough elements before assigning values
            If UBound(splitEvent) >= 1 Then
                grade = splitEvent(0) ' First part is always Grade
                distance = splitEvent(1) ' Second part is always Distance
            ElseIf UBound(splitEvent) = 0 Then
                ' If there's only one part, it's assumed to be the Grade
                grade = splitEvent(0)
                distance = ""
            End If

            ' Assign to correct columns
            ws.Cells(outputRow, 1).Value = splitMarket(0) ' Sport
            ws.Cells(outputRow, 2).Value = trackName ' Track
            ws.Cells(outputRow, 3).Value = grade ' Grade
            ws.Cells(outputRow, 4).Value = distance ' Distance
        End If

        ' Split "Start Time" into Date and Time
        Dim startDateTime As String
        startDateTime = Trim(dataArray(1)) ' Get the start time column
        Dim splitDateTime As Variant
        splitDateTime = Split(startDateTime, " ") ' Split by space

        If UBound(splitDateTime) = 1 Then
            ws.Cells(outputRow, 5).Value = splitDateTime(0) ' Date
            ws.Cells(outputRow, 6).Value = splitDateTime(1) ' Time
        Else
            ws.Cells(outputRow, 5).Value = startDateTime ' If no space, keep as Date
            ws.Cells(outputRow, 6).Value = "" ' Leave Time blank
        End If

        ' Copy "Profit/Loss (£)" column (Skip Settled Date)
        ws.Cells(outputRow, 7).Value = dataArray(3) ' Profit/Loss

NextRow:
        outputRow = outputRow + 1
    Loop
    
    ' Close the file
    Close #fileNum
    


End Sub
User avatar
MemphisFlash
Posts: 2329
Joined: Fri May 16, 2014 10:12 pm

thank you for your reply, very informative.
Post Reply

Return to “Trade analysis & record keeping”