Just use this file, maybe you messed up the indentation copying it over.
BlackHat Betting wrote: ↑Sun Mar 08, 2020 9:55 am
Yeah I copied the code and dumped it in...
sa7med wrote: ↑Sun Mar 08, 2020 5:15 am
BlackHat Betting wrote: ↑Sun Mar 08, 2020 1:47 am
Hiya
I saw this and thought I would play with it and see if I can get it to scrape.... Do I just dump that code into python and hit return?
I get an error
IndentationError: unexpected indent
>>>
>>> if (x[1][0]-x[0][0]) >=0.1:
File "<stdin>", line 1
if (x[1][0]-x[0][0]) >=0.1:
^
IndentationError: unexpected indent
>>>
>>> timeDiff =round((x[1][0]-x[0][0]),2)
File "<stdin>", line 1
timeDiff =round((x[1][0]-x[0][0]),2)
^
IndentationError: unexpected indent
>>> print(f"{race},{x[0][1]}, class {distance}, time difference {timeDiff}")
Python is indentation-based rather than using brackets or end statements, so you have to make sure all the tabs/indents are in the right place for your nested ifs/loops etc. Try to make your code resemble what SB posted in terms of indentation.
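To illustrate the point, here is a minimal, self-contained sketch of how the indentation in the snippet above should line up (the times are made-up values, not real race data):

```python
# Indentation, not brackets, marks where each block starts and ends.
# x mimics the sorted (time, runner) pairs from the scraper; values are made up.
x = [(28.50, "1. Dog A"), (28.65, "2. Dog B")]

if (x[1][0] - x[0][0]) >= 0.1:              # 'if' sits at the outer level
    timeDiff = round(x[1][0] - x[0][0], 2)  # its body is indented one step
    print(f"time difference {timeDiff}")    # same indent = same block
```

Pasting code line by line into the `>>>` prompt often trips over blank lines and indentation; saving it as a `.py` file and running `python file.py` avoids the IndentationError entirely.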
Greyhound Mystique
-
- Posts: 3140
- Joined: Sun Jan 31, 2010 8:06 pm
Last edited by spreadbetting on Sun Mar 08, 2020 6:00 pm, edited 1 time in total.
- wearthefoxhat
- Posts: 3552
- Joined: Sun Feb 18, 2018 9:55 am
Eyesnack wrote: ↑Sun Mar 08, 2020 9:29 am
Sundays Greyhound Cards here http://gofile.me/4txpF/GFuCbrkKN
BlackHat Betting wrote: ↑Sun Mar 08, 2020 9:55 am
What's the password? Thanks
wearthefoxhat wrote: ↑Sun Mar 08, 2020 11:52 am
Betangel
On these ones, the better-class graded races (i.e. A5, A4, A3, A2 & A1) are more consistent overall, i.e. rated 75+.
Other factors include a good run after a recent drop in grade; the drop in grade may have been up to three races ago. A good run could be an improved calculated time compared to its last run.
Personal preference is for the Early Pace (EP) and Quick Away (QAW) types over the Slowly Away (SAW), Ran On types. This gives an increased probability of a clear run compared to one that could get bumped/checked/crowded.
Unless it's Scurlogue Champ of course... running out of Trap 6...

https://www.youtube.com/watch?v=XAkJ9h-MPTo
SB, thanks for posting this code. Just ran it and it looks good!
spreadbetting wrote: ↑Wed Jan 22, 2020 3:42 pm
Probably not as long as you'd think. I bought the £9 Udemy Python course that was recommended on this thread: viewtopic.php?f=55&t=19959
It's about 30 hours, but most of that is exercises or tests, which I skipped as they were a bit boring; some of the SQL stuff isn't needed to start, but it's not hard either. I finished watching it around Christmas, so I needed to test my new-found skills on something, and scraping with Python isn't too hard. I had coded previously with PHP, but I still mainly just look on Google when I need to do something. Doing a course does give you that structured learning, though, and even though it gets boring in places, this one is quite good and well done. I probably managed around an hour most days; can't imagine I'd be able to write anything without Google though. But for me coding is a means to an end, so once I have something working I never bother coding further.
Only had dealings so far with VBA for Excel, PHP for old web stuff and Python, but I've got to say Python is definitely the easiest and, being a newer language, seems to have learnt a lot from the failings of other coding languages.
Here's the code I wrote. I imagine most pro coders would spot plenty of areas where it could be made more efficient, but as a first attempt at a scraper I was happy it actually kicked out what I needed.
Code: Select all
import re
import requests
from bs4 import BeautifulSoup
from requests_html import HTMLSession

def extract_times(input):
    # Average the 'Best' and 'Last' times when both are present (using just
    # 'Best' when the last run was quicker), fall back to 'Best' alone,
    # and return 100 as a sentinel when neither is found.
    times_regex = re.compile(r'Best: (.....)sLast: (.....)s')
    best_times_regex = re.compile(r'Best: (.....)s')
    match = times_regex.search(input)
    best_match = best_times_regex.search(input)
    if match:
        if float(match.group(2)) < float(match.group(1)):
            return float(match.group(1))
        else:
            return round((float(match.group(1)) + float(match.group(2))) / 2, 2)
    if best_match:
        return float(best_match.group(1))
    return 100

session = HTMLSession()
baseUrl = "https://www.sportinglife.com"
str = "/greyhounds/racecards/20"  # note: shadows the built-in str; works, but not ideal
res = requests.get("https://www.sportinglife.com/greyhounds/racecards")
soup = BeautifulSoup(res.text, "html.parser")
summary = soup.find_all("a", class_="")
x = 0
for link in soup.find_all('a'):
    link = link.get('href')
    if link and str in link:  # guard against anchors without an href
        res = session.get(baseUrl + link)
        soup = BeautifulSoup(res.text, "html.parser")
        race = soup.find_all('h1')[1].get_text()
        distance = soup.find(class_='gh-racecard-summary-race-class gh-racecard-summary-always-open').get_text()
        summary = soup.find_all(class_="gh-racing-runner-key-info-container")
        Runners = dict()
        for link in summary:
            Trap = link.find(class_="gh-racing-runner-cloth").get_text()
            Name = re.sub(r'\(.*\)', '', link.find(class_="gh-racing-runner-greyhound-name").get_text())
            Average_time = extract_times(link.find(class_="gh-racing-runner-greyhound-sub-info").get_text())
            Runners[Average_time] = Trap + '. ' + Name
        # Only graded (A) or open (OR) races; print the fastest runner when it's
        # at least 0.1s clear of the second fastest.
        if Runners and ('OR' in distance or 'A' in distance):
            x = sorted((k, v) for k, v in Runners.items())
            if (x[1][0] - x[0][0]) >= 0.1:
                timeDiff = round(x[1][0] - x[0][0], 2)
                print(f"{race},{x[0][1]}, class {distance}, time difference {timeDiff}")
(For anyone new to Python, I had to run:
pip install requests
pip install requests_html
from a terminal to get it running.)
-
- Posts: 4478
- Joined: Thu Oct 24, 2019 8:25 am
These are my BTL Trades for Mon Mar 09:
Perry Barr
11:03 A9 3 Commanche Rising
11:34 A8 6 Bellside Tic
Swindon
11:11 A5 3 Goeasyonme
11:56 A3 5 Westwell Delilah
12:26 A8 6 Skyflash Ralph
12:42 A6 6 Renegade Reason
12:57 A8 4 Magical Lesley
Henlow
13:39 A9 4 Ebony Velvet
Monmore
14:06 A2 1 Roxhill Mystique
16:58 A4 3 Manks Vanfrater
17:37 A5 6 Bow Lightening
Romford
15:49 A4 5 Cairns Hubble
17:08 A2 1 Moorstown Victor
17:44 A1 1 Secondtimearound
Sheffield
16:27 A6 2 Corgrigg Saoirse
16:46 A5 4 Swift Benedict
17:06 D2 2 Exiles Rocket
17:56 D3 2 Funk Leader
18:11 A8 4 Harper Of Hearts
Yarmouth
19:42 A4 2 Memories Star
20:13 A6 4 Suirview Mia
20:44 A8 6 Trapper Nellie
20:59 A5 5 Snowflake Girl
21:14 A5 4 Black Pepper
Nottingham
18:47 A3 1 Ashgrove Raven
Doncaster
19:03 B5 6 Ted Holdem
19:17 B3 1 Terrific Tina
20:06 B4 6 Tullyotter Queen
20:21 D3 3 Nans Primrose
21:08 B7 4 Hilltop Boozer
Harlow
20:11 A7 2 Lurriga Silver
These are my LTB Trades for Mon Mar 09:
Perry Barr
11:03 A9 6 Petite Honey
11:34 A8 5 Johns Vintage
12:48 A3 5 Catunda Robert
13:04 A7 5 Running Scholar
Swindon
11:11 A5 5 Wychwood Harvey
11:56 A3 4 Jet Stream News
13:12 A5 1 Brickfield Grace
13:27 A3 6 Sunfield Ruby
Central Park
12:01 A4 1 Hazelwoodjayfkay
Henlow
13:08 A4 5 Savana Alvez
13:39 A9 2 Domino Ninja
Monmore
15:02 A4 2 Strategic Stevie
16:58 A4 2 Southlodge Kane
17:54 A6 3 Kaybee Sapphire
Romford
14:52 A5 1 Jaxx Mermaid
15:11 A3 6 Borwick Bob
15:29 A2 1 Rough Hammer
17:08 A2 5 Slaneyside Otis
Sheffield
16:46 A5 1 Young Jayfkay
17:06 D2 5 Coney Ciroc
18:11 A8 1 Autumn Reaper
Yarmouth
18:22 A3 2 Milky Bar
20:44 A8 3 Clonard Pearl
Doncaster
20:06 B4 4 Barnside Lottie
21:08 B7 1 Luttons Meghan
Glad you found it useful, Peter. It was basically cobbled together from that course and checking stackoverflow.com whenever I was stuck or an error cropped up. I actually changed the script you quoted to include some error trapping and to dump the data to a text file so it can be imported elsewhere. Screen scraping is always a nightmare, as websites change things on their site on a whim and then errors you haven't accounted for start to crop up. It's so much easier to deal with APIs, where the format is stable.
Here's the current version with some try/except trapping; it just needs the file address amending to whatever directory people want. Hopefully it'll be of use to someone, as I remember that when you start coding you need something that works so you can deconstruct it, rather than just printing "Hello" to the screen.
Code: Select all
myfile = open('C:\\Users\\?????????\\Desktop\\dogs.txt', 'w')
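The full updated script isn't quoted above, so here is only a sketch of the pattern described (a try/except around each page's parsing, with the results dumped to a text file). `parse_racecard` and the sample pages are hypothetical stand-ins, not the poster's actual code:

```python
# Sketch of the error-trapping/file-dump pattern described above, not the
# poster's actual script. parse_racecard and the sample pages are made up.
def parse_racecard(html):
    # Stand-in for the real parsing logic; raises if the page layout changed.
    if "gh-racing-runner" not in html:
        raise ValueError("unexpected page layout")
    return "race summary line"

pages = {"/race/1": "...gh-racing-runner...", "/race/2": "login wall"}

lines = []
for url, html in pages.items():
    try:
        lines.append(parse_racecard(html))
    except ValueError as e:
        # Skip pages the scraper can't parse instead of crashing the whole run.
        print(f"skipped {url}: {e}")

with open("dogs.txt", "w") as myfile:  # same idea as the open() call above
    myfile.write("\n".join(lines))
```

The `with` form closes the file automatically, which avoids losing buffered output if a later exception kills the script.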
I suppose you are scraping Sporting Life? They do have an API; it's not public, but it's quite easy to spot with some network monitoring.
spreadbetting wrote: ↑Mon Mar 09, 2020 12:36 pm
So much easier to deal with APIs where the format is stable.
sportinglife.com/api/greyhound-racing/race/193152 <- the last digits are the race ID from their front page. Scrape them from the front page, or just go from ID 1 for some 2016 data, lol.
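As a sketch of that idea, here is one way to turn front-page racecard links into API URLs. The href format and the `/racecard/<id>/` pattern are assumptions for illustration, so check real hrefs in the browser first:

```python
import re

# Sketch: extract race IDs from racecard hrefs and build API URLs, per the
# tip above. The hrefs below are made-up examples, not real site data.
hrefs = [
    "/greyhounds/racecards/2020-03-09/monmore/racecard/193152/race-name",
    "/greyhounds/results/something-else",
]

api_urls = []
for href in hrefs:
    m = re.search(r"/racecard/(\d+)/", href)  # assumed link shape
    if m:
        api_urls.append(f"https://www.sportinglife.com/api/greyhound-racing/race/{m.group(1)}")

print(api_urls)
```

From there a plain `requests.get(...).json()` call would replace all the BeautifulSoup class-name lookups, which is exactly why the API route is less fragile than scraping HTML.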
Thanks, poklius, I'll check it out tomorrow and see what data it returns.
Just had a quick look, and it looks like it's got everything you need and more.
poklius wrote: ↑Mon Mar 09, 2020 8:03 pm
I suppose you are scraping Sporting Life? They do have an API; it's not public, but quite easy to spot with some network monitoring.
sportinglife.com/api/greyhound-racing/race/193152 <- the last digits are the race ID from their front page.

Big thanks for that. I'll have to do a complete recode now to use all the available data, and I'll see if I can find the football and racing APIs too, as they'll be very useful to scrape.

Any advice on a decent, and free, network monitor to use?
spreadbetting wrote: ↑Tue Mar 10, 2020 12:08 pm
Any advice on a decent, and free, network monitor to use?
I use the "Inspect Element" tool in the Firefox browser; on the Network tab you can see everything that is being loaded. Look for JSON and XML data types. Most of the time, if a page is not pure JavaScript, you can find links to some kind of text-format data.
These are my BTL Trades for Wed Mar 11:
Swindon
11:03 A9 2 Knockmehill Jim
11:34 A3 2 Geelo Dee Dee
11:48 A6 2 Saleen Mollie
12:04 A3 4 Crypto Blues
12:48 A9 2 Coolykereen Day
13:04 A9 1 Sirius Lady
Central Park
11:14 D3 6 Miss Van Dijk
11:28 A5 1 Heather Time
11:59 A2 1 Maireads Brave
13:28 A2 5 Holborn Runtowin
Romford
13:22 A4 4 Shake It Up
13:37 A9 3 Marbella Katie
Newcastle
13:59 A8 3 Sals Magic
Monmore
14:18 A6 4 Crackerjackie
15:18 A3 2 Rip Rock Paddy
17:18 A7 5 Up To You
Belle Vue
16:27 A6 4 Michaels Advice
17:56 A2 2 Shaneboy Berg
Hove
14:48 A7 5 Annies King
15:27 A9 2 Insane Rocky
16:07 A8 2 Conor Pass Flyer
Sunderland
18:39 A3 5 Zari Bally
18:55 A1 1 Abbys Attitude
20:13 HP 6 Fairholme Sky
20:44 A5 4 Mayhem Mollie
Doncaster
18:27 A2 5 Santro Mac
18:44 A2 1 Haven Dreamer
19:17 A4 5 A Definate Berry
19:33 A5 6 Sive The Lady
20:38 D3 6 Charlie Be Slick
Peterborough
20:37 A4 2 Breeze
These are my LTB Trades for Wed Mar 11:
Swindon
11:03 A9 1 Marley Boy
12:04 A3 1 Ballymac Wow
12:48 A9 4 Treanaree Lilly
Central Park
11:59 A2 4 Holborn Tilly
12:13 D2 5 Castlerock Jack
13:47 HP 1 Tiger Duke
Romford
12:36 A8 4 Beautiful South
13:22 A4 1 Bonville Bruno
13:37 A9 5 Harley Queen
Newcastle
13:44 HP 1 Watermill Elsa
Monmore
14:18 A6 6 Smokestack Loco
15:18 A3 4 Headleys Breda
17:18 A7 2 Moorstown Paddy
Belle Vue
14:27 A7 3 Croaghill Nancy
15:47 A6 5 Mrs Mac
16:27 A6 2 Belle Vue Ruth
16:46 A5 6 Burning Rubber
18:11 A4 5 Blistering Flash
Hove
14:48 A7 1 Snooze Ya Lose
17:28 A5 3 Rockys Ace
Sunderland
18:22 A4 3 Los Pepes
18:39 A3 4 Greenhall Stella
20:13 HP 2 Cappagh Tess
20:29 HP 2 Nuthill Jackie
20:44 A5 5 Mouna Gneiss
21:14 A2 4 Vincys Jaxxon
Doncaster
18:27 A2 6 Let It Ride
19:17 A4 2 Nolas Bloom
20:53 B4 5 Moss Keeto
Peterborough
19:05 A5 3 Maybeanothertime
20:37 A4 5 Beech Hill Jet
20:51 A7 4 Making Moments