spreadbetting wrote: ↑Wed Jan 22, 2020 3:42 pm
Bog wrote: ↑Wed Jan 22, 2020 3:13 pm
How long did it take to learn that? I've started watching Python tutorials on YT. Total newbie, never coded, but it looks interesting. So much info. Any advice?
Probably not as long as you'd think. I bought the £9 Udemy Python course that was recommended on this thread:
viewtopic.php?f=55&t=19959
It's about 30 hours, but most of that is exercises or tests, which I skipped as they were a bit boring. Some of the SQL stuff isn't needed to start with either, though it isn't hard. I finished watching it around Christmas, so I needed to test my newfound skills on something, and scraping with Python isn't too hard. I had coded previously in PHP, but I still mainly look things up on Google when I need to do something. Doing a course does give you that structured learning though, and the course, even though it gets boring, is quite good and well done. I probably managed around an hour most days. I can't imagine I'd be able to write anything without Google though.

But for me coding is a means to an end, so once I have something working I never bother coding or trying to code.
I've only had dealings so far with VBA for Excel, PHP for old web stuff, and Python, but I've got to say Python is definitely the easiest and seems to have learnt a lot from the failings of other coding languages.
Here's the code I wrote. I imagine most pro coders would spot plenty of areas where it could be made more efficient, but as a first attempt at a scraper I was happy it actually kicked out what I needed.
Code:
import re
import requests
from bs4 import BeautifulSoup
from requests_html import HTMLSession

def extract_times(text):
    # Pull the "Best" and "Last" times (five characters each) out of a runner's sub-info text
    times_regex = re.compile(r'Best: (.....)sLast: (.....)s')
    best_times_regex = re.compile(r'Best: (.....)s')
    match = times_regex.search(text)
    best_match = best_times_regex.search(text)
    if match:
        # If the last run was quicker than the listed best, use the best; otherwise average the two
        if float(match.group(2)) < float(match.group(1)):
            return float(match.group(1))
        else:
            return round((float(match.group(1))+float(match.group(2)))/2,2)
    if best_match:
        return float(best_match.group(1))
    return 100  # no time found: slow placeholder so this runner sorts last

session = HTMLSession()
baseUrl = "https://www.sportinglife.com"
racecard_path = "/greyhounds/racecards/20"  # was called "str", which hid the built-in str
res = requests.get("https://www.sportinglife.com/greyhounds/racecards")
soup = BeautifulSoup(res.text, "html.parser")
# href=True skips <a> tags with no href, which would otherwise give None and break the "in" test
for link in soup.find_all('a', href=True):
    link = link.get('href')
    if racecard_path in link:
        res = session.get(baseUrl+link)
        race_soup = BeautifulSoup(res.text, "html.parser")
        race = race_soup.find_all('h1')[1].get_text()
        distance = race_soup.find(class_='gh-racecard-summary-race-class gh-racecard-summary-always-open').get_text()
        summary = race_soup.find_all(class_="gh-racing-runner-key-info-container")
        Runners = dict()  # keyed on time, so two runners with the same time overwrite each other
        for runner in summary:
            Trap = runner.find(class_="gh-racing-runner-cloth").get_text()
            Name = re.sub(r'\(.*\)', '', runner.find(class_="gh-racing-runner-greyhound-name").get_text())
            Average_time = extract_times(runner.find(class_="gh-racing-runner-greyhound-sub-info").get_text())
            Runners[Average_time] = Trap+'. '+Name
        # Need at least two runners before the fastest pair can be compared
        if len(Runners) > 1 and ('OR' in distance or 'A' in distance):
            x = sorted(((k,v) for k,v in Runners.items()))
            if (x[1][0]-x[0][0]) >=0.1:
                timeDiff =round((x[1][0]-x[0][0]),2)
                print(f"{race},{x[0][1]}, class {distance}, time difference {timeDiff}")
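The last step of the script (sort the runners by time, then check the gap between the fastest two) can be tried offline without touching the site. This is only a rough sketch: the runner names and times are made up, and `pick_clear_favourite` and `min_gap` are hypothetical names, not part of the script above. It uses a list of tuples rather than a dict keyed on time, so two runners with the same time don't overwrite each other.

```python
def pick_clear_favourite(runners, min_gap=0.1):
    """Return (label, gap) for the fastest runner if it beats the
    second fastest by at least min_gap seconds, otherwise None."""
    if len(runners) < 2:
        return None  # nothing to compare against
    ranked = sorted(runners)  # tuples sort on the first item, the time
    # Round before comparing so a 0.10 gap isn't lost to float error
    gap = round(ranked[1][0] - ranked[0][0], 2)
    if gap >= min_gap:
        return ranked[0][1], gap
    return None


# Made-up sample data: (average time in seconds, "trap. name")
sample = [
    (29.85, "1. Made Up Dog"),
    (29.60, "2. Another Dog"),
    (29.85, "3. Third Dog"),
]
result = pick_clear_favourite(sample)
print(result)
```

Traps 1 and 3 share the same time here, which the dict version would have collapsed into a single entry.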
Hiya
I saw this and thought I would play with it and see if I can get it to scrape. Do I just dump that code into Python and hit return?
I get an error:
IndentationError: unexpected indent
>>>
>>> if (x[1][0]-x[0][0]) >=0.1:
File "<stdin>", line 1
if (x[1][0]-x[0][0]) >=0.1:
^
IndentationError: unexpected indent
>>>
>>> timeDiff =round((x[1][0]-x[0][0]),2)
File "<stdin>", line 1
timeDiff =round((x[1][0]-x[0][0]),2)
^
IndentationError: unexpected indent
>>> print(f"{race},{x[0][1]}, class {distance}, time difference {timeDiff}")
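The "unexpected indent" errors above are what the `>>>` prompt reports when an indented line arrives on its own, without the `for` or `if` line it belongs under. A minimal sketch of that behaviour using only the standard library (the one-line `snippet` is just an illustration modelled on the script, not the actual pasted input):

```python
import textwrap

# One line still carrying the indentation it had inside the loop
snippet = "        timeDiff = round(0.123, 2)\n"

try:
    compile(snippet, "<stdin>", "exec")
    message = "no error"
except IndentationError as err:
    message = err.msg  # the text Python shows after "IndentationError:"

print(message)

# Stripping the leading whitespace lets the same line run on its own
namespace = {}
exec(textwrap.dedent(snippet), namespace)
print(namespace["timeDiff"])
```

In practice it is usually easier to save the whole script to a file with its indentation intact (say `scraper.py`, a hypothetical name) and run it with `python scraper.py` rather than pasting it into the prompt line by line.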