Using IMPORTXML in Google Sheets to scrape data from Transfermarkt - google-sheets

I'm trying to see if it's possible to scrape data from Transfermarkt.com to import into a Google Sheets doc.
Currently, I'm trying to import the team data from a player's profile page (see example URL here: https://www.transfermarkt.co.uk/joao-palhinha/profil/spieler/257455), but in the future I may want to import other data as well.
I'm not sure whether Transfermarkt can be scraped this way at all, but if it can, any advice on what I'm doing wrong would be very much appreciated!
Right now I'm using =IMPORTXML(B1,"/html/body/div[3]/main/header/div[4]/div/span[1]/a") where the URL is in B1. I copied the full XPath from the page's HTML, but have also tried copying just the XPath.
The cell says Loading for a few seconds before returning #N/A with the error message 'Resource at URL not found'.
I'm expecting the result in this instance to be Fulham.
Thanks
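For reference, here is a short offline sketch of why a "Copy full XPath" from DevTools is fragile compared to a relative query anchored on a stable attribute. The markup below is invented for illustration, not Transfermarkt's actual HTML, and the class name is an assumption:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a profile header; NOT the real Transfermarkt markup.
html = """
<body>
  <main>
    <header>
      <div class="data-header"><span><a>Fulham</a></span></div>
    </header>
  </main>
</body>
"""

root = ET.fromstring(html)

# An absolute path copied from DevTools breaks as soon as the site adds or
# removes a wrapper div; here it matches nothing at all:
absolute = root.find("./div[3]/main/header/div[4]/div/span[1]/a")

# A relative query anchored on a stable attribute keeps working:
relative = root.find(".//div[@class='data-header']/span/a")

print(absolute)        # no match
print(relative.text)   # the team name from the snippet
```

Even when a site does not block Google's importers outright, preferring a relative XPath like `//div[@class='data-header']//a` over a copied absolute path makes the formula far less likely to break on a markup change.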

Related

IMPORTHTML-function in Google Sheets pulls wrong data

I am trying to scrape the current fuel prices in a German city using the IMPORTHTML function in Google Sheets. The function seems to work, at least in that data is imported into my sheet. On closer inspection, though, the data inserted into the sheet differs from the current data displayed on the webpage I am scraping.
This is the function I inserted into my Google Sheet: =IMPORTHTML("https://www.benzinpreis.de/aktuell/super_e5/deutschland/nordrhein-westfalen/koeln/koeln/koeln"; "table"; 4)
I took a screenshot of the differing values:
Does anyone have an idea where I made a mistake?
Consider using an external tool that can render JavaScript websites, or check whether the site makes an AJAX call you can use directly and get the raw JSON instead of fighting with the HTML.
It looks like the website uses this XHR request to fetch the actual data as JSON:
https://www.benzinpreis.de/bpmap-callback.php?lat_min=50.86707808969461&lat_max=51.01850632466553&lng_min=6.700286865234375&lng_max=7.215270996093751&action=getTankstellen
(see the Chrome DevTools Network tab for details)
You could then use the ImportJSON add-on to import the data into your Google Sheet:
https://workspace.google.com/marketplace/app/importjson_import_json_data_into_google/782573720506
Discovering hidden APIs using Chrome Dev Tools:
https://www.youtube.com/watch?v=kPe3wtA9aPM
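If you'd rather process the JSON outside of Sheets first, a payload like that can be flattened into rows for a CSV that Sheets imports cleanly. A minimal sketch in Python; the sample payload and its field names are made up for illustration, not the real benzinpreis.de schema:

```python
import json

# Hypothetical sample of what a station-list XHR response might look like;
# the real field names returned by benzinpreis.de may differ.
sample = json.loads("""
{
  "tankstellen": [
    {"name": "Station A", "strasse": "Hauptstr. 1", "preis": 1.799},
    {"name": "Station B", "strasse": "Ringweg 12",  "preis": 1.789}
  ]
}
""")

# Flatten the nested objects into flat rows, one per station, that could be
# written to a CSV and imported into Sheets.
header = ["name", "strasse", "preis"]
rows = [[station[key] for key in header] for station in sample["tankstellen"]]

for row in rows:
    print(row)
```

The same flattening is essentially what ImportJSON does for you inside the sheet; doing it yourself is mainly useful if you want to filter or reshape the data before import.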

Error in getting data in Google Sheets from URL

I want to get the following data (marked in red) from the website below.
[Image of the data I want to get into the Google Sheet]
WEBSITE: https://trendlyne.com/equity/PE/NIFTY50/1887/nifty-50/
I tried using the IMPORTHTML and IMPORTXML functions, but I get errors like "Could not fetch URL".
For example I used:
=importxml("https://trendlyne.com/equity/PE/NIFTY50/1887/nifty-50/","/html/body/div[3]/div[1]/div[2]/div[2]/div[1]/div")
I am using the XPath Finder Google Chrome extension to get the XPath query.
This method works for other websites, but not for this one.
Please help. How can I get the data into my Google Sheet?
I tested the same method with other websites and it works, but I get the same error with the website you're trying to get data from.
The error "Could not fetch URL" basically means that Google is not able to import the content from this specific website. The site is rejecting Google's requests to prevent web scraping, which means the IMPORT functions will not work in this scenario.
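One quick, offline-checkable signal that a site does not want automated fetchers is its robots.txt. The sketch below parses a made-up robots.txt with Python's standard-library robotparser; the rules and the "GoogleDocs" user-agent string here are illustrative assumptions, not trendlyne.com's actual policy (server-side blocking can happen regardless of robots.txt):

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt illustrating a site that disallows crawling of a
# section; this is NOT trendlyne.com's real robots.txt.
robots_txt = """
User-agent: *
Disallow: /equity/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Would a fetcher identifying as "GoogleDocs" be allowed on these paths?
blocked = parser.can_fetch("GoogleDocs", "/equity/PE/NIFTY50/1887/nifty-50/")
allowed = parser.can_fetch("GoogleDocs", "/about")

print(blocked)  # disallowed path
print(allowed)  # path not covered by any Disallow rule
```

If the path you need is disallowed for all user agents, that is a strong hint the operator is deliberately rejecting scrapers, and no IMPORT formula will get around it.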

Google sheets - is import DPD parcel status possible?

I am trying to figure out whether it is possible to import the latest parcel status from DPD.
For example, I would like to get the 'Delivered' status from the link below and import it into Google Sheets:
https://tracking.dpd.de/status/en_DE/parcel/05252044194808
Any attempts with IMPORTXML return empty results, however. Is there a way to download the latest status into Google Sheets, or is the site protected against scraping?
IMPORTXML and IMPORTHTML only return data when the page content is not generated by JavaScript. Disabling JavaScript on the page you are trying to scrape shows no data at all, so most likely the whole content is generated dynamically by JavaScript, and that is why those functions return nothing.
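To illustrate the point above: the raw HTML that IMPORTXML actually receives for a JavaScript-rendered page often contains only an empty mount point, with the visible data filled in later by a script in the browser. A sketch using an invented snippet (not DPD's real markup; the class name queried is an assumption):

```python
import xml.etree.ElementTree as ET

# Hypothetical raw HTML served for a JS-rendered tracking page: the server
# sends only an empty mount point plus a script bundle; the parcel status
# text is inserted client-side and never appears in this markup.
raw_html = """
<html>
  <body>
    <div id="app"></div>
    <script src="/bundle.js"></script>
  </body>
</html>
"""

root = ET.fromstring(raw_html)

# The element an IMPORTXML XPath would target simply is not there:
status = root.find(".//div[@class='parcel-status']")
print(status)  # no such element in the served HTML
```

Viewing the page source (Ctrl+U) rather than the rendered DOM in DevTools shows you the same thing: if the value you want is absent from the source, IMPORTXML cannot retrieve it.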

Can text be scraped from Grammarly to google spreadsheet using IMPORTXML function?

I am trying to get text from the Grammarly application imported into a Google spreadsheet using the IMPORTXML function. I follow the required syntax, IMPORTXML(url, xpath_query), but it keeps showing the error "Imported content is empty".
However, the same steps work fine for other websites, and I am confused about what the problem with Grammarly might be. Is it perhaps that it doesn't allow data scraping at all?
Thanks for your help.
This is not possible because the content is behind a login gate; Google Sheets can't read such data.

IMPORTXML giving 'Resource at URL contents exceeded maximum size'

I'm trying to scrape some baseball data from a site. I need the projected/confirmed lineups to import into a Google Sheet. This was working great until this morning; now I'm getting the error 'Resource at URL contents exceeded maximum size'.
Any assistance as to what this means, or a workaround, would be a great help. Below is my formula.
=IMPORTXML("https://rotogrinders.com/lineups/MLB?site=fanduel","//span[@class='pname']")
Unfortunately, this won't be possible anymore, because the site you're trying to scrape has implemented anti-scraping measures; none of the IMPORT formulas work on it now.
