Power Query - Example Spreadsheets (Indexed by Sport)

Archery1969
Posts: 3192
Joined: Thu Oct 24, 2019 8:25 am
Location: Newport

It may not be clear-cut...

The legal position in the UK:

While web scraping is not itself illegal, there are certain instances where it may become illegal, depending on how the scraper gathers the data and what it does with it.

The two most common claims that can be brought against data scrapers are breach of contract and IP infringement (specifically, database right infringement). Depending on the precise circumstances, it is possible that a data scraper could also infringe copyright or trade mark rights, data protection legislation and/or contravene the Computer Misuse Act 1990.
sionascaig
Posts: 1053
Joined: Fri Nov 20, 2015 9:38 am

paspuggie48 wrote:
Thu Dec 03, 2020 9:42 am
Vladimir CC wrote:
Wed Dec 02, 2020 5:51 pm
Archery1969 wrote:
Wed Dec 02, 2020 2:41 pm
Hi,

While I think all this stuff is excellent, people need to be careful as isn't scraping of data illegal ? :roll:

Cheers,
web scraping is like going manually by yourself on a website and noting down the info, except you have a bot that's doing it. nothing illegal IMO
Agree :) ;)
I think it depends on the website's Terms of Service (which you agree to by using their site) & specifically whether you publish it. Plenty of lawyers out there who will happily send you a cease & desist letter and damages request.

It's unlikely to be straightforward, e.g. the FBRef site pays a 3rd party for a lot of its data for use on its website, so if you publish that data you could potentially have two entities after you - and does anyone really have the willpower to go through all the T&Cs to really figure it out?

Suspect you are fine for personal use but not sure you are safe by publishing it on the betangel forum (why risk it?).
Archery1969
Posts: 3192
Joined: Thu Oct 24, 2019 8:25 am
Location: Newport

sionascaig wrote:
Thu Dec 03, 2020 9:56 am
paspuggie48 wrote:
Thu Dec 03, 2020 9:42 am
Vladimir CC wrote:
Wed Dec 02, 2020 5:51 pm


web scraping is like going manually by yourself on a website and noting down the info, except you have a bot that's doing it. nothing illegal IMO
Agree :) ;)
I think it depends on the website's Terms of Service (which you agree to by using their site) & specifically whether you publish it. Plenty of lawyers out there who will happily send you a cease & desist letter and damages request.

It's unlikely to be straightforward, e.g. the FBRef site pays a 3rd party for a lot of its data for use on its website, so if you publish that data you could potentially have two entities after you - and does anyone really have the willpower to go through all the T&Cs to really figure it out?

Suspect you are fine for personal use but not sure you are safe by publishing it on the betangel forum (why risk it?).
Agreed.

But some of them are obviously a dead giveaway as to where the data has been scraped from. To me that's just asking for trouble!
sionascaig
Posts: 1053
Joined: Fri Nov 20, 2015 9:38 am

Yes, and that example also has trademark / copyright issues re logo.
paspuggie48
Posts: 611
Joined: Thu Jun 20, 2013 9:22 am
Location: South-West

Ruddy hell...we are only trying to help.
jamesg46
Posts: 3769
Joined: Sat Jul 30, 2016 1:05 pm

paspuggie48 wrote:
Thu Dec 03, 2020 5:36 pm
Ruddy hell...we are only trying to help.
I think these guys may be trying to help you too. What you & Memphis have done is a great thing, but they do have a point, & what they are saying "could be a life saver".
jamesg46
Posts: 3769
Joined: Sat Jul 30, 2016 1:05 pm

Some sites don't mind you scraping and sharing... I can't remember the site Memphis has made reference to, but I used it a while back for NBA data & they openly say that they don't mind and also ask you to share where you got the data. Others aren't so scraper-friendly though, from what I've read.
Archery1969
Posts: 3192
Joined: Thu Oct 24, 2019 8:25 am
Location: Newport

Hi,

I wasn't trying to be negative and I said it was excellent stuff.

But broadcasting where the data is coming from is giving them ammunition in any legal issues. Timeform having an expensive API springs to mind. I doubt they will look kindly on anyone scraping their data and sharing it for free.

Anyway, it's all good stuff.

Cheers,
paspuggie48
Posts: 611
Joined: Thu Jun 20, 2013 9:22 am
Location: South-West

It's not a problem gents, it's not like Memphis or I are selling it or profiting from it and I know about Timeform and their £500/mth API charge. Pretty hefty and unjustified cost that is LOL.

Personally, you will have seen many of my solutions have no logo...easiest solution is not to have any ;)

...watch this space !
jamesg46
Posts: 3769
Joined: Sat Jul 30, 2016 1:05 pm

Can't imagine you would have much of an issue, if any, anyway. Pinging their website a few times a day (assuming none of the files need constant refreshes) from all the different IP addresses that have downloaded the file isn't going to raise many eyebrows. Defo a good idea to scrap the logo though.
paspuggie48
Posts: 611
Joined: Thu Jun 20, 2013 9:22 am
Location: South-West

jamesg46 wrote:
Thu Dec 03, 2020 6:48 pm
Can't imagine you would have much of an issue, if any, anyway. Pinging their website a few times a day (assuming none of the files need constant refreshes) from all the different IP addresses that have downloaded the file isn't going to raise many eyebrows. Defo a good idea to scrap the logo though.
I don't even use it, it was just an example of what can be done. I could have quite easily (and have done) copy 'n pasted the data into Excel by hand and as one knows it only takes seconds. The beauty of what Memphis and I are doing is automating it and yes I agree it's not pinging their website more than once a day, if that :)

The way to save face, not just for me but for BA, is to remove the logos and delete the original post...and post a revised edition.
jamesg46
Posts: 3769
Joined: Sat Jul 30, 2016 1:05 pm

paspuggie48 wrote:
Thu Dec 03, 2020 6:53 pm
jamesg46 wrote:
Thu Dec 03, 2020 6:48 pm
Can't imagine you would have much of an issue, if any, anyway. Pinging their website a few times a day (assuming none of the files need constant refreshes) from all the different IP addresses that have downloaded the file isn't going to raise many eyebrows. Defo a good idea to scrap the logo though.
I don't even use it, it was just an example of what can be done. I could have quite easily (and have done) copy 'n pasted the data into Excel by hand and as one knows it only takes seconds. The beauty of what Memphis and I are doing is automating it and yes I agree it's not pinging their website more than once a day, if that :)

The way to save face, not just for me but for BA, is to remove the logos and delete the original post...and post a revised edition.
I agree, I'm a scraping fan myself. I've not downloaded any, but from what I've seen of the screen grabs (especially a Greyhound sheet Memphis has done) it all looks like nice work, way better than mine.
paspuggie48
Posts: 611
Joined: Thu Jun 20, 2013 9:22 am
Location: South-West

I think we are all scraping at some point, and it raises the wider question (apart from advertising the logo in plain sight) of whether any solution on this Forum, the plethora of other Forums around, the tonne of macros people have developed and will continue to develop, and the Parsehub software packages of the world are all susceptible to, or already subject to, website police monitoring.

Personally, with the sheets I use, I get & transform the data once (because I only need to scrape once). I have to assume that doesn't compare to some Python-Machine Learning-AI-Geeky programmes out there that are pinging every nanosecond ;)
paspuggie48
Posts: 611
Joined: Thu Jun 20, 2013 9:22 am
Location: South-West

P.S. I'm not really a scraper...I just wanted to see if I could do it LOL.

My bag is lots of data (e.g. BF Historic files) and connecting and transforming hundreds if not thousands of data files.

My record thus far was where I converted 10,300 A4-sized PDF documents into text files, each containing thousands of pages. This totalled over 60 million lines of sentences/words. With PQ I was able to connect to all 10,300 txt files and "find" information that has never been available or known of before in the history of our organisation. Of course it hasn't, I mean which human being is going to go through millions of pages to find something? It took me minutes to search for stuff!
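
For anyone who wants to try the same idea outside Power Query, here is a minimal Python sketch of searching every text file in a folder for a term. It is not the method used above (that was done in PQ); the function name `search_text_files`, the `*.txt` pattern and the case-insensitive match are all assumptions made for illustration.

```python
from pathlib import Path


def search_text_files(folder, term):
    """Yield (file name, line number, line) for every line containing term.

    Hypothetical helper: matching is case-insensitive and only *.txt files
    directly inside `folder` are read.
    """
    needle = term.lower()
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps the scan going if a converted PDF has odd bytes.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    yield path.name, line_number, line.rstrip("\n")
```

Because it reads one line at a time, memory use stays small however many files are in the folder; `list(search_text_files("some_folder", "some term"))` collects the hits.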

Or where I was able to find data 'within' 20,000 rows of data in a column where a number plate was hidden in a sentence of words and then compare it to another column in a different workbook which had 30,000 rows of data and pull out the adjacent column of information...and repeat that same process for another column. That's 20,000*30,000*2 = 1.2 Billion calcs! It would have literally taken a person a year to do that task. My laptop struggled but did it in 4 hours LOL.
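
The number plate lookup above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the PQ solution described: the plate format (current UK style, e.g. "AB12 CDE"), the names `extract_plate` and `lookup_plates`, and the sample layout of the reference rows are all hypothetical.

```python
import re

# Assumption: plates follow the current UK format, e.g. "AB12 CDE".
PLATE_PATTERN = re.compile(r"\b([A-Z]{2}[0-9]{2})\s?([A-Z]{3})\b")


def extract_plate(sentence):
    """Return the first plate in the sentence without spaces, or None."""
    match = PLATE_PATTERN.search(sentence.upper())
    return match.group(1) + match.group(2) if match else None


def lookup_plates(sentences, reference_rows):
    """Pair each sentence's plate with the adjacent value from the reference rows.

    reference_rows is an iterable of (plate, value) pairs, standing in for the
    two columns of the second workbook.
    """
    # Index the reference rows once so each lookup is a single dictionary hit.
    index = {plate.replace(" ", "").upper(): value for plate, value in reference_rows}
    results = []
    for sentence in sentences:
        plate = extract_plate(sentence)
        results.append((plate, index.get(plate)))
    return results


# The pairwise count quoted in the post: 20,000 x 30,000 comparisons, done twice.
pairwise_comparisons = 20_000 * 30_000 * 2
```

The design choice here is the dictionary: comparing every row against every other row gives the 1.2 billion figure, whereas indexing the 30,000 reference rows first means each of the 20,000 sentences needs only one lookup per column.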

That's the Power of Power Query ;)
jamesg46
Posts: 3769
Joined: Sat Jul 30, 2016 1:05 pm

paspuggie48 wrote:
Thu Dec 03, 2020 7:44 pm
P.S. I'm not really a scraper...I just wanted to see if I could do it LOL.

My bag is lots of data (e.g. BF Historic files) and connecting and transforming hundreds if not thousands of data files.

My record thus far was where I converted 10,300 A4-sized PDF documents into text files, each containing thousands of pages. This totalled over 60 million lines of sentences/words. With PQ I was able to connect to all 10,300 txt files and "find" information that has never been available or known of before in the history of our organisation. Of course it hasn't, I mean which human being is going to go through millions of pages to find something? It took me minutes to search for stuff!

Or where I was able to find data 'within' 20,000 rows of data in a column where a number plate was hidden in a sentence of words and then compare it to another column in a different workbook which had 30,000 rows of data and pull out the adjacent column of information...and repeat that same process for another column. That's 20,000*30,000*2 = 1.2 Billion calcs! It would have literally taken a person a year to do that task. My laptop struggled but did it in 4 hours LOL.

That's the Power of Power Query ;)
Seems Memphis did a good job of teaching you ;)