This question already has answers here:
Scraping data to Google Sheets from a website that uses JavaScript
(2 answers)
Closed last month.
I'm attempting to parse the 'PEG Ratio' value of a stock from Yahoo Finance into a Google Sheet, but seeing an error.
URL used: https://finance.yahoo.com/quote/ABBV/key-statistics?p=ABBV
Cell Expression used: =IMPORTXML("http://finance.yahoo.com/quote/ABBV/key-statistics?p=ABBV", "//td[@data-reactid='132']")
Error: '#N/A' value (Error: Imported Content is empty)
Value expected is 1.28 (at the time of posting this query), from Yahoo Finance > Statistics tab > PEG Ratio table (the td has an attribute data-reactid='132', which I attempted to filter on in the query)
Can anyone help please? Here is a link to the sheet: Google Sheet
Issue
IMPORTXML can only read the HTML source of a website. Elements and components that are added to the page dynamically therefore cannot be retrieved by IMPORTXML, which will interpret the tag as having empty content.
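The behaviour described above can be sketched in Python. This is only an illustration, not how IMPORTXML is implemented: the markup below is a hypothetical stand-in for a page source in which the cell exists but is filled in later by a script, and query_static_source is a name made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the HTML a server sends before any JavaScript
# runs: the cell is present, but its text is inserted later by a script.
STATIC_SOURCE = """
<html>
  <body>
    <table>
      <tr>
        <td>PEG Ratio</td>
        <td data-reactid="132"></td>
      </tr>
    </table>
    <script>/* a script would insert the value here at run time */</script>
  </body>
</html>
""".strip()

def query_static_source(source, xpath):
    """Return the stripped text of every element matching xpath in the raw source."""
    root = ET.fromstring(source)
    return [(node.text or "").strip() for node in root.findall(xpath)]

# The element is found, but it carries no text in the static source.
values = query_static_source(STATIC_SOURCE, ".//td[@data-reactid='132']")
print(values)
```

The query matches the cell, yet the only text it can return is empty, which is in line with the "Imported content is empty" error.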
Possible workaround
Sometimes you can find, in the JavaScript files of the website, the URL of the data source that is being inserted dynamically, but that is a tedious task.
Another option to get the desired value is to use other web scraping techniques.
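If such a data URL can be found, it often returns JSON rather than HTML. A minimal sketch of reading one value out of such a response, assuming a payload shape that is invented purely for illustration (the keys "quote", "stats" and "pegRatio" do not describe any real site's format, and extract_field is a hypothetical helper):

```python
import json

# Hypothetical payload: the field names are invented for illustration and
# do not describe any real site's response format.
SAMPLE_PAYLOAD = '{"quote": {"symbol": "ABBV", "stats": {"pegRatio": 1.28}}}'

def extract_field(payload, path):
    """Decode a JSON document and walk a list of keys into it."""
    node = json.loads(payload)
    for key in path:
        node = node[key]
    return node

peg = extract_field(SAMPLE_PAYLOAD, ["quote", "stats", "pegRatio"])
print(peg)
```

In a real run the payload string would come from the data URL found in the site's scripts (for example via urllib.request.urlopen), and the key path would have to match whatever that endpoint actually returns.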
I hope this has helped you. Let me know if you need anything else or if you did not understand something. :)
This is probably not what you want, but I was searching around, and found a Google Sheets Add-On that does manage to pull the "1.28" value from that page. It is free for doing a very limited number of queries per month. If interested, search for IMPORTFROMWEB in the GSuite Marketplace.
I only plugged in your URL and the same XPath that you used, so I was very surprised when the data showed up. No idea how it works.
I apologise if mentioning an Add-On is not appropriate on SO. But knowing that an add-on can get that data off the web page may encourage some other ideas on how to do it natively with Sheets.
Related
I want to get the rate of the euro from this website: bonbast.com. I tried this formula:
IMPORTXML("https://www.bonbast.com/","//tr[@id='eur1']")
but nothing imported. What is wrong with it?
As mentioned in JaSON's comment, the IMPORTXML function cannot read dynamic values generated after the page loads. It's just meant to read static pages.
The website bonbast.com seems to have an API so you can use that to retrieve the data, though it's a paid service.
By default, Sheets is not really equipped to scrape dynamic websites. You're better off looking for another site that has static data, looking for an extension or add-on that does the work for you, or learning more advanced scraping approaches.
I tried for the past 2 hours using an xPath scraper, inspecting, googling and still can't figure this out for the life of me.
I'm trying to scrape the interest rates on this table but it's not pulling through ->
Website
https://www.fhlbboston.com/fhlbank-boston/rates#/long-term
Formula (incorrect)
importxml("https://www.fhlbboston.com/fhlbank-boston/rates#/long-term","//table",1)
The import formulae of Google Sheets do not support scraping of JavaScript elements. You can always check this by disabling JS for a given site; usually only what is left can be imported. In your case:
The workaround would be to find an alternative URL that hosts your desired dataset.
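The "disable JS and see what is left" check can also be approximated in code by looking at the raw HTML for the element you want. The sketch below counts table tags in two hypothetical documents; APP_SHELL and STATIC_PAGE are made-up markup, not the real source of any site mentioned here, and the cell value is a placeholder.

```python
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Count how often a given tag opens in raw, unrendered HTML."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.count += 1

def count_tags(raw_html, tag):
    """Parse raw_html without running any scripts and count opening tags."""
    counter = TagCounter(tag)
    counter.feed(raw_html)
    counter.close()
    return counter.count

# Hypothetical "app shell": roughly what a JS-driven page looks like
# before its scripts have built the content.
APP_SHELL = '<html><body><div id="app"></div><script src="bundle.js"></script></body></html>'
# Hypothetical static page where the table is already in the source.
STATIC_PAGE = '<html><body><table><tr><td>2.5</td></tr></table></body></html>'

shell_tables = count_tags(APP_SHELL, "table")
static_tables = count_tags(STATIC_PAGE, "table")
print(shell_tables, static_tables)
```

If the count is zero for the raw source of the real page, a query like "//table" has nothing to match, and an alternative URL with static data is needed.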
I'm trying to do a personal search and would like to get some data (Number of players, Number of reviews, Category etc), from the single-game page of BoardGameGeek Website (https://boardgamegeek.com/boardgame/174430/gloomhaven).
Unfortunately, the IMPORTXML Google Spreadsheet function doesn't work and I don't understand why. Maybe the page is JS generated? I'm not an expert; does anyone have a solution? I have looked in other threads, but it seems to me a rather specific case.
The IMPORTXML formula (or any other IMPORT formula) does not support scraping of JavaScript elements. You can always test this by disabling JS for a given site; usually only what is left can be imported. In your case it's pure JS:
Creating a ticker scanner tool in Google Sheets, mainly with Google Finance and Yahoo Finance. No difficulties with index(importhtml()) and other functions; however, I can't manage to find the correct XPath when using importxml. I have no background in HTML or XPath, so I am a novice, but I understand the basics behind it from recent troubleshooting.
URL: https://au.finance.yahoo.com/quote/FMG.AX?p=FMG.AX
I am trying to pull in text information about SECTOR, INDUSTRY and the DESCRIPTION which is on the right hand side (about half way down the page). It seems to be within a column which may be causing me trouble. Using Chrome inspect to retrieve XML but also tried several chrome extensions which didn't work either.
This is what I got when copying Xpath (short and long versions)
Sectors:
//*[@id="Col2-11-QuoteModule-Proxy"]/div/div/div/div/p[2]
Business Summary:
/html/body/div[1]/div/div/div[1]/div/div[3]/div[2]/div/div/div/div/div/div[12]/div/div/div/div/div/p
Also tried shortening the /div with //p but doesn't work anyway.
I played around and used //body/div//div/p which retrieved news data from the middle of the page.
Wondering if someone could help me adjust this or explain what I am doing wrong, and point me in the right direction.
This will never work with IMPORTXML / IMPORTHTML formulae because the elements you are trying to import are controlled by JavaScript, which Google Sheets can't process.
Apparently, the data you are trying to pull is controlled by JavaScript which means you won't be able to fetch it using IMPORTXML.
Looking for a way to get the stock price for a specific date (eg. 31.1.2020).
I know I can use IMPORTHTML or IMPORTXML together with INDEX to get the table. However, when I use the browser to search for a specific date on investing.com, there's no direct URL for the date, and it rather presents me with the latest stock prices instead. This is the stock I'm looking for
I'm afraid that investing.com does not provide an API:
https://www.investing-support.com/hc/en-us/articles/115005473825-Do-you-provide-an-API-
So you won't be able to do this very easily (if at all) with Google Sheets or Apps Script. The reason is that it looks like most of the content on the site is generated with JavaScript, and so it is not part of the original HTML that is shown when you first enter the site. The HTML is what IMPORTHTML gets.
Getting the information you are looking for without using an API would involve browser automation. That is, simulating the clicks that a user might make and then getting the data. This can be very finicky and is prone to break whenever the website changes its layout or HTML for whatever reason (something that tends to happen quite often for busy websites).
I would recommend using a different service that has a Sheets-friendly HTML format. Better than that, I would look into a service that has an API and interact with it with Apps Script. Finally, if you need it to be investing.com, you could look into something like Puppeteer, which can automate a browser (though it's a fair bit more complex than a formula or an API).
You can import using importhtml the historical data for the last 30 days, and then use a lookup for that data.
To get historical data I use:
query(IMPORTHTML("https://investing.com/equities/STOCK-historical-data"; "table"; 2);"SELECT Col1, Col2")
I don't know if you can import more than 30, I'm searching for that answer myself.
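The "import the recent rows, then look up the date" idea can be sketched outside Sheets as well. The Python below is only an analogue of a lookup over the two imported columns; the rows are placeholders with made-up prices, and price_on is a hypothetical helper name.

```python
from datetime import date

# Placeholder rows standing in for the two imported columns (date, price).
# The prices are made up for illustration only.
rows = [
    (date(2020, 2, 3), 101.0),
    (date(2020, 1, 31), 100.0),
    (date(2020, 1, 30), 99.5),
]

def price_on(rows, wanted):
    """Return the price for an exact date, or None if it is not in the rows."""
    lookup = dict(rows)
    return lookup.get(wanted)

print(price_on(rows, date(2020, 1, 31)))
```

A date that falls outside the imported window simply returns None here, which mirrors the limitation that only the rows actually present in the imported table can be looked up.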