Looking to import HTML information into a Google sheet - google-sheets

After multiple tests and a lot of research, I have not succeeded in importing the data of this table (div) into a Google Sheet.
None of the formulas I tested work, including this simple test to extract the first column, "Name":
=importxml("https://ecosystem.lafrenchtech.com/lists/18872/list?showGrid=false", "//span[@class='table-column-text']")
:(
Could anyone help me?
Thanks in advance.

Answer:
I've tested your function on a test sheet and it returns empty content.
According to an answer to "Google Sheets importXML Returns Empty Value", IMPORTXML cannot retrieve data that is populated by a script; this is a known limitation. I checked by disabling JavaScript for the ecosystem.lafrenchtech.com site in Chrome, and the table never loads. This confirms that the table is populated by a script, which is why the formula returns empty content.
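The check above can also be done outside the browser. The following is a minimal Python sketch, assuming you have already saved the page's raw HTML (the markup IMPORTXML receives, before any script runs) as a string. The two sample strings are made up for illustration and are not the real page; the function name is hypothetical.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags of a given name that carry a given CSS class."""

    def __init__(self, tag, class_name):
        super().__init__()
        self.tag = tag
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.class_name in classes:
            self.count += 1


def count_in_static_html(html, tag, class_name):
    """Return how many matching elements exist in the unrendered HTML."""
    parser = ClassCounter(tag, class_name)
    parser.feed(html)
    return parser.count


# Illustrative samples (assumptions, not the real site's markup).
static_html = '<html><body><div id="app"></div><script src="bundle.js"></script></body></html>'
rendered_html = '<div id="app"><span class="table-column-text">Name</span></div>'

print(count_in_static_html(static_html, "span", "table-column-text"))
print(count_in_static_html(rendered_html, "span", "table-column-text"))
```

If the count for the raw source is zero while the element is visible in the browser, the element is most likely inserted by a script, and IMPORTXML will not see it.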
A possible alternative is to check whether ecosystem.lafrenchtech.com offers an API from which you can get the table data directly, possibly using an API key. However, you would then need Apps Script to parse the API response and write it to your spreadsheet, which is quite tedious for such a simple task.
Note:
Your post was tagged google-slides.

Related

IMPORTHTML-function in Google Sheets pulls wrong data

I am trying to scrape the current gas prices in a German city using the IMPORTHTML function in Google Sheets. The function seems to work; at least, data is imported into my sheet. On closer inspection, however, the data inserted into the sheet differs from the current data displayed on the webpage I am scraping.
This is the function I inserted into my Google Sheet: =IMPORTHTML("https://www.benzinpreis.de/aktuell/super_e5/deutschland/nordrhein-westfalen/koeln/koeln/koeln"; "table";4)
I took a screenshot of the differing values:
Does anyone have an idea where I made a mistake?
You may consider using an external tool that can render JavaScript-driven websites, or check whether the website makes an AJAX call and fetch the raw JSON instead of fighting with the HTML.
It looks like the website uses this XHR request to get the actual data as JSON:
https://www.benzinpreis.de/bpmap-callback.php?lat_min=50.86707808969461&lat_max=51.01850632466553&lng_min=6.700286865234375&lng_max=7.215270996093751&action=getTankstellen
(See the Network tab in Chrome Dev Tools for detailed information.)
You could then use ImportJSON to import the data into your Google Sheet.
https://workspace.google.com/marketplace/app/importjson_import_json_data_into_google/782573720506
Discovering hidden APIs using Chrome Dev Tools:
https://www.youtube.com/watch?v=kPe3wtA9aPM
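Once you have the raw JSON, turning it into sheet-like rows is straightforward. This is a minimal Python sketch, assuming the response is a JSON array of objects. The field names `name` and `e5` and the sample payload are hypothetical; the actual structure returned by the site may differ and should be inspected in the Network tab first.

```python
import json


def json_to_rows(payload, columns):
    """Convert a JSON array of objects into a header row plus data rows."""
    records = json.loads(payload)
    rows = [list(columns)]
    for record in records:
        # Missing fields become empty cells instead of raising KeyError.
        rows.append([record.get(column, "") for column in columns])
    return rows


# Hypothetical sample payload, not the site's real response.
sample = '[{"name": "Station A", "e5": 1.789}, {"name": "Station B", "e5": 1.812}]'

for row in json_to_rows(sample, ["name", "e5"]):
    print(row)
```

The resulting list of lists has the same shape as a range of cells, so it can be pasted or written into a sheet row by row.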

Importing a website table into Google Spreadsheet

I am trying to import this table into a Google Spreadsheet:
The table is available here:
https://competitions.lta.org.uk/sport/drawsheet.aspx?id=8D598CDE-8579-4541-B7AD-48558BF6FEA3&draw=4
Before Google changed their Spreadsheet addresses, I had the import working with ImportHTML(URL, "table", 2), but this no longer works, even though there appear to be only two 'table' elements in the page HTML.
Looking for a way to extract the table, I turned to IMPORTXML and tried several versions, such as 'importxml("https://competitions.lta.org.uk/sport/drawsheet.aspx?id=8D598CDE-8579-4541-B7AD-48558BF6FEA3&draw=4", "//div[contains(@id,'poule')]")'
and the same first part of the statement with "//table[contains(@class,'ruler')]")
but the formula fails with 'no content'.
Would really appreciate some help to find a way to import this table!
Thanks in anticipation,
You can't get the table data because of the cookie-consent page.
Every time Google Sheets tries to access that link, the site asks it to accept cookies, and Google Sheets won't do that by default.
To access the data, you need to bypass or accept the cookies, which requires something more advanced in Python or Google Apps Script.
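As a rough illustration of the Python route, the sketch below builds a request that sends a consent cookie along with it. It only constructs the request and does not contact the site. The cookie name `cookie_consent` and its value are hypothetical; the real name has to be read from the browser's Dev Tools after accepting the banner, and whether the site honours a hand-set cookie is an assumption.

```python
import urllib.request


def build_request_with_cookies(url, cookies):
    """Return a urllib Request that carries the given cookies in its header."""
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    request = urllib.request.Request(url)
    request.add_header("Cookie", cookie_header)
    return request


request = build_request_with_cookies(
    "https://competitions.lta.org.uk/sport/drawsheet.aspx?id=8D598CDE-8579-4541-B7AD-48558BF6FEA3&draw=4",
    {"cookie_consent": "accepted"},  # hypothetical cookie name and value
)
print(request.get_header("Cookie"))

# To actually fetch the page, pass the request to urllib.request.urlopen(request).
```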

ImportXML / ImportHTML workaround with URL Tabs on Google Sheets [duplicate]

This question already has answers here:
Scraping data to Google Sheets from a website that uses JavaScript
I am trying to build a spreadsheet in Google Sheets using the IMPORTXML/IMPORTHTML functions. However, I cannot find a solution for this URL, since the page uses tabs on a persistent URL: https://www.morningstar.co.uk/uk/etf/snapshot/snapshot.aspx?id=0P0001CY2T&tab=3&InvestmentType=FE
My goal is to extract the value and growth tables, but I don't see a way to do that. I can only make it work on the main page of the URL, https://www.morningstar.co.uk/uk/etf/snapshot/snapshot.aspx?id=0P0001CY2T, which contains data I don't intend to use.
I tried IMPORTHTML with table selection, but it displays no data when the first URL is used. I also tried IMPORTXML with both the full XPath and an XPath for the items I'm interested in, and neither works.
Options used:
=importhtml("https://www.morningstar.co.uk/uk/etf/snapshot/snapshot.aspx?id=0P0001CY2T&tab=3&InvestmentType=FE";"table";"2")
=importxml("https://www.morningstar.co.uk/uk/etf/snapshot/snapshot.aspx?id=0P0001CY2T&tab=3&InvestmentType=FE";"//#html/body/div/sal-components-pillar-cards-process/div/div[2]/div/div[3]/div[2]/div/sal-components-mip-style-measures/div/div[3]/div/div[1]/sal-components-mip-measures/div/div[2]/div/div[2]/div/div/div/table/tbody/tr[1]/td[2]")
Any ideas?
It seems that the table you are trying to fetch is rendered by JavaScript, which the IMPORT functions in Google Sheets cannot execute. Thus, the table can't be scraped this way.
You can check whether a website, or a table on a website, is rendered by JavaScript as follows: click the lock icon on the left side of the address bar, click Site settings, find JavaScript, and block it. When you reload the website, you should notice a difference compared with before blocking JavaScript.
In this case, if you try it on your end, you will notice that after blocking JavaScript on the website, the tables are no longer visible.
The IMPORT functions of Google Sheets cannot handle elements generated by JavaScript. Whatever remains on the page after you disable JS is all that can be imported.

Not able to fetch website data using Importxml in Google Sheets

I am trying to fetch this website (https://www.covidhotspots.in/?city=Mumbai&source=share) data using Importxml in Google Sheets but it gives me no data.
I am applying the formula below, but it gives me #N/A:
=IMPORTXML("https://www.covidhotspots.in/?city=Mumbai&source=share","//li/text()")
I want to fetch geocodes as mentioned in the below images
Issue
IMPORTXML can only read the HTML source of a website. Therefore, elements and components that are added to the page dynamically cannot be retrieved by IMPORTXML.
If you view the source of the website in your browser and look at the parent ul element that should contain the li elements you want to retrieve, you will find that the ul element is empty. This is because its children are inserted dynamically by a JavaScript script. That is why an IMPORTXML call targeting that ul returns nothing: it is empty in the source HTML.
Possible workaround
Sometimes you can find the URL of the dynamically inserted data in the website's JavaScript files, but that is a tedious task.
I hope this has helped you. Let me know if you need anything else or if you did not understand something. :)
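The search through the JavaScript files can be partly automated. Here is a minimal Python sketch, assuming you have downloaded a script file as text. The regular expression is a crude heuristic of my own (quoted absolute URLs, plus quoted site-relative paths ending in .json or .php), and the sample script is invented for illustration; real sites may build their URLs in ways this will miss.

```python
import re

# Quoted absolute URLs, or quoted site-relative paths that look like data endpoints.
URL_PATTERN = re.compile(
    r"""["'](https?://[^"']+|/[^"'\s]+\.(?:json|php)[^"']*)["']"""
)


def candidate_data_urls(script_text):
    """Return a sorted list of unique URL-like strings found in JavaScript text."""
    return sorted(set(URL_PATTERN.findall(script_text)))


# Invented sample, not taken from any real site.
sample_script = (
    'fetch("/api/hotspots.json?city=Mumbai"); '
    'var img = "logo.png"; '
    "$.get('https://example.com/data');"
)

for url in candidate_data_urls(sample_script):
    print(url)
```

Each candidate still has to be opened by hand to see whether it returns the data you are after.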

Google spreadsheet importHTML Could not fetch URL

Can someone confirm it for me?
I'm helping someone with the importHTML problem on Google spreadsheet. I'm not familiar with importHTML but I thought it should work.
=importhtml("http://www.stockq.org/","table",1)
I don't care which table I'm importing so long as it imports something. It gives the error message "Error: Could not fetch url: http://www.stockq.org/", but the website is accessible in my browser. That's really bizarre.
My Google Spreadsheet can't cope with the Chinese characters, but the numbers I can recognise on the web page are happily imported, at least for the middle table of the three, with:
=importhtml("http://www.stockq.org/","table",A12)
This is much like what I think @DigitalSeraphim mentioned back in September. To quote from an answer that was deleted (as not an answer?):
So, I have been building a page to help me keep up with mod updates for my minecraft server, using importxml heavily. I have found that I get the same error for some sites that load absolutely fine in the browser. Looking into it further, I found that the sites are reporting a 404 error, but actually returning the data requested. According to https://drupal.stackexchange.com/questions/110651/how-to-show-a-node-but-return-http-404-response, this is used to remove pages from search engines, as I had assumed. I don't think there is any way around this without some hackery... namely, setting up a "proxy" server that would "fix" the status.
However, it appears that the example you gave is now working, so maybe give it another try.
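The behaviour described in the quote (a 404 status sent together with real content) can be handled in a script, because an HTTP error response still carries a body. Below is a minimal Python sketch using only the standard library. The `fake_404_opener` function is a hypothetical offline stand-in for such a server, so the example runs without network access; it is not how any particular site behaves.

```python
import io
import urllib.error
import urllib.request
from email.message import Message


def fetch_ignoring_status(url, opener=urllib.request.urlopen):
    """Return (status, body) even when the server answers with an error status."""
    try:
        with opener(url) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object, so the body is still readable.
        return err.code, err.read()


def fake_404_opener(url):
    """Hypothetical stand-in for a server that sends 404 plus real content."""
    body = io.BytesIO(b"<table><tr><td>42</td></tr></table>")
    raise urllib.error.HTTPError(url, 404, "Not Found", Message(), body)


status, body = fetch_ignoring_status("http://example.invalid/", opener=fake_404_opener)
print(status, body)
```

With the default `opener`, the same function performs a real request; a script like this plays the role of the "proxy" mentioned in the quote, since the sheet functions themselves give up on the error status.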
TL;DR
Use IMPORTXML with XPaths.
I encountered a similar problem and tried switching between http and https. The workaround worked occasionally, but the result was not consistent (both variants failed a lot).
Later I noticed there is another function named IMPORTXML (XML, not HTML). With this one you can query the content of the same URL and apply an XPath expression instead.
I would therefore suggest switching to IMPORTXML. For example, the following formula
=IMPORTXML("http://www.stockq.org/index/IBOV.php", "//table[@class='indexpagetable']")
will give you all the tables that have the class indexpagetable on the page at the given URL.
Note that XPath syntax is slightly different in the spreadsheet; refer to the documentation for specifics.
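To see what an attribute predicate such as `[@class='indexpagetable']` selects, here is a small Python sketch using the limited XPath subset in the standard library's ElementTree. The markup is a made-up, well-formed sample, not the real stockq.org page, and ElementTree's XPath support is narrower than what IMPORTXML accepts.

```python
import xml.etree.ElementTree as ET

# Made-up sample document for illustration only.
sample = (
    "<html><body>"
    "<table class='indexpagetable'><tr><td>IBOV</td></tr></table>"
    "<table class='other'><tr><td>x</td></tr></table>"
    "</body></html>"
)

doc = ET.fromstring(sample)

# ".//table[@class='indexpagetable']" keeps only tables whose class attribute
# is exactly "indexpagetable", at any depth below the root.
matches = doc.findall(".//table[@class='indexpagetable']")
cells = [td.text for table in matches for td in table.iter("td")]

print(len(matches), cells)
```

Only the first table is selected; the second one has a different class value and is skipped.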

Resources