Googlesheets function - IMPORTXML Xpath difficulties for column text within Yahoo Finance [duplicate] - google-sheets

This question already has answers here:
Scraping data to Google Sheets from a website that uses JavaScript
(2 answers)
Closed last month.
I am creating a ticker scanner tool in Google Sheets, mainly with Google Finance and Yahoo Finance. I have had no difficulties with index(importhtml()) and other functions, but I can't find the correct XPath when using importxml. I have no background in HTML or XPath, so I am a novice, but I understand the basics from recent troubleshooting.
URL: https://au.finance.yahoo.com/quote/FMG.AX?p=FMG.AX
I am trying to pull in text information about SECTOR, INDUSTRY and the DESCRIPTION, which is on the right-hand side (about halfway down the page). It seems to be within a column, which may be causing me trouble. I used Chrome's Inspect to retrieve the XPath and also tried several Chrome extensions, which didn't work either.
This is what I got when copying Xpath (short and long versions)
Sectors:
//*[@id="Col2-11-QuoteModule-Proxy"]/div/div/div/div/p[2]
Business Summary:
/html/body/div[1]/div/div/div[1]/div/div[3]/div[2]/div/div/div/div/div/div[12]/div/div/div/div/div/p
I also tried shortening the /div chain with //p, but that doesn't work either.
I played around and used //body/div//div/p, which retrieved news data from the middle of the page.
I am wondering if someone could help me adjust this, or explain what I am doing wrong and point me in the right direction.

This will never work with IMPORTXML / IMPORTHTML formulae, because the elements you are trying to import are controlled by JavaScript, which Google Sheets can't process.

Apparently, the data you are trying to pull is controlled by JavaScript which means you won't be able to fetch it using IMPORTXML.
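One way to see this for yourself is to look only at the raw HTML the server returns, since that is all IMPORTXML gets. The Python sketch below is an illustration and not part of the original answers: `text_in_static_html` is a hypothetical helper, and the two HTML snippets are made-up stand-ins for a server-rendered page and a JavaScript-rendered one.

```python
from html.parser import HTMLParser


class VisibleTextCollector(HTMLParser):
    """Collects text that sits outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a script or style block.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def text_in_static_html(html, needle):
    """Return True if `needle` appears in the text present without running JS."""
    collector = VisibleTextCollector()
    collector.feed(html)
    collector.close()
    return needle.lower() in " ".join(collector.chunks).lower()


# Made-up examples: the first page ships its text in the HTML itself,
# the second only ships a script that would build the text in a browser.
STATIC_PAGE = "<html><body><p>Sector: Example Sector</p></body></html>"
JS_PAGE = (
    "<html><body><div id='root'></div>"
    "<script>render({sector: 'Example Sector'})</script></body></html>"
)

print(text_in_static_html(STATIC_PAGE, "Example Sector"))
print(text_in_static_html(JS_PAGE, "Example Sector"))
```

If the text you want only shows up in the second kind of page, no XPath will reach it from IMPORTXML, however carefully it is written.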

Related

How to get the correct XPath for ImportXML [duplicate]

I tried for the past 2 hours using an xPath scraper, inspecting, googling and still can't figure this out for the life of me.
I'm trying to scrape the interest rates on this table but it's not pulling through ->
Website
https://www.fhlbboston.com/fhlbank-boston/rates#/long-term
Formula (incorrect)
importxml("https://www.fhlbboston.com/fhlbank-boston/rates#/long-term","//table",1)
The import formulae of Google Sheets do not support scraping JavaScript-generated elements. You can always check this by disabling JS for a given site; usually only what is left can be imported.
The workaround would be to find an alternative URL that hosts your desired dataset.

IMPORTXML Function doesn't work for this page [duplicate]

I'm trying to do a personal search and would like to get some data (Number of players, Number of reviews, Category etc), from the single-game page of BoardGameGeek Website (https://boardgamegeek.com/boardgame/174430/gloomhaven).
Unfortunately, the IMPORTXML Google Sheets function doesn't work and I don't understand why. Maybe the page is JS-generated? I'm not an expert; does anyone have a solution? I have looked in other threads, but this seems to me a rather specific case.
The IMPORTXML formula (or any other IMPORT formula) does not support scraping JavaScript-generated elements. You can always test this by disabling JS for a given site; usually only what is left can be imported. In your case, it's pure JS.

ImportXML / ImportHTML workaround with URL Tabs on Google Sheets [duplicate]

I am trying to build a spreadsheet using Google Sheets and the importxml/importhtml functions. However, I can't find a solution for this URL, since it uses tabs on a persistent URL: https://www.morningstar.co.uk/uk/etf/snapshot/snapshot.aspx?id=0P0001CY2T&tab=3&InvestmentType=FE
My goal is to extract the value & growth tables, but I don't see a way to do that. I can only make it work on the main page, https://www.morningstar.co.uk/uk/etf/snapshot/snapshot.aspx?id=0P0001CY2T, which has data I don't intend to use.
I tried importhtml with table selection, but no data is displayed when the first URL is used. I also tried importxml with both the full XPath and the XPath for the items I'm interested in, and neither works.
Options used:
=importhtml("https://www.morningstar.co.uk/uk/etf/snapshot/snapshot.aspx?id=0P0001CY2T&tab=3&InvestmentType=FE";"table";"2")
=importxml("https://www.morningstar.co.uk/uk/etf/snapshot/snapshot.aspx?id=0P0001CY2T&tab=3&InvestmentType=FE";"//#html/body/div/sal-components-pillar-cards-process/div/div[2]/div/div[3]/div[2]/div/sal-components-mip-style-measures/div/div[3]/div/div[1]/sal-components-mip-measures/div/div[2]/div/div[2]/div/div/div/table/tbody/tr[1]/td[2]")
Any ideas?
It seems that the table you are trying to fetch is controlled by JavaScript, which the IMPORT functions in Google Sheets cannot handle. Thus, the table can't be scraped.
You can check whether a website, or a table on it, is JavaScript-controlled as follows: click the lock button on the left side of the address bar, click Site settings, look for JavaScript, then block it. When you reload the website, you should notice a difference compared with before blocking JavaScript.
In this case, if you try it on your end, you will notice that after blocking JavaScript on the website, you won't be able to see the tables anymore.
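The same check can be sketched in code by counting the `<table>` elements that exist in the raw HTML, which is roughly the pool that IMPORTHTML's table index picks from. This is an illustrative Python sketch under assumptions: `count_static_tables` is a hypothetical helper, and both HTML snippets are invented rather than taken from the Morningstar page.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags found in the markup it is fed."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.count += 1


def count_static_tables(html):
    """Return how many <table> elements exist before any JavaScript runs."""
    counter = TableCounter()
    counter.feed(html)
    counter.close()
    return counter.count


# Invented examples: one page with a table in its source, and one page
# that only contains a custom element plus a script that fills it in later.
WITH_TABLE = "<html><body><table><tr><td>1</td></tr></table></body></html>"
JS_ONLY = (
    "<html><body><custom-widget></custom-widget>"
    "<script>loadTables()</script></body></html>"
)

print(count_static_tables(WITH_TABLE))
print(count_static_tables(JS_ONLY))
```

A count of zero for the page source means there is no table index that IMPORTHTML could return, which matches what you see in the browser with JavaScript blocked.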
The IMPORT functions of Google Sheets are not able to handle JavaScript elements. If you disable JS, only what is left on the page can be imported.

Get historical stock price from investing.com to google sheets [duplicate]

Looking for a way to get the stock price for a specific date (eg. 31.1.2020).
I know I can use IMPORTHTML or IMPORTXML together with INDEX to get the table. However, when I use the browser to search for a specific date on investing.com, there's no direct URL for the date, and it rather presents me with the latest stock prices instead. This is the stock I'm looking for
I'm afraid that investing.com does not provide an API:
https://www.investing-support.com/hc/en-us/articles/115005473825-Do-you-provide-an-API-
So you won't be able to do this very easily (if at all) with Google Sheets or Apps Script. The reason is that it looks like most of the content on the site is generated with JavaScript, and so it is not part of the original HTML that is shown when you first enter the site. The HTML is what IMPORTHTML gets.
Getting the information you are looking for without using an API would involve browser automation, that is, simulating the clicks that a user might make and then getting the data. This can be very finicky and is prone to break whenever the website changes its layout or HTML for whatever reason (something that tends to happen quite often for busy websites).
I would recommend using a different service that has a Sheets-friendly HTML format. Better than that, I would look into a service that has an API and interact with it using Apps Script. Finally, if it needs to be investing.com, you could look into something like Puppeteer, which can automate a browser (though it's a fair bit more complex than a formula or an API).
You can import using importhtml the historical data for the last 30 days, and then use a lookup for that data.
To get historical data I use:
=query(IMPORTHTML("https://investing.com/equities/STOCK-historical-data"; "table"; 2);"SELECT Col1, Col2")
I don't know if you can import more than 30 days; I'm searching for that answer myself.
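The "import the table, then look up the date" idea can be sketched outside Sheets as well. The Python below is only an illustration: `price_on` is a hypothetical helper, the rows are made-up numbers, and the `"Jan 31, 2020"` date format is an assumption about how the imported column looks (month abbreviations are parsed with an English locale in mind).

```python
from datetime import date, datetime


def price_on(rows, wanted, date_format="%b %d, %Y"):
    """Return the price for `wanted` from (date_text, price_text) rows, or None."""
    for raw_date, raw_price in rows:
        if datetime.strptime(raw_date, date_format).date() == wanted:
            # Imported cells are text, so drop thousands separators before converting.
            return float(raw_price.replace(",", ""))
    return None


# Made-up rows standing in for the two imported columns (date, price).
SAMPLE_ROWS = [
    ("Feb 03, 2020", "101.50"),
    ("Jan 31, 2020", "100.25"),
    ("Jan 30, 2020", "1,099.75"),
]

print(price_on(SAMPLE_ROWS, date(2020, 1, 31)))
```

A date that falls outside the imported window simply returns `None`, which is the same limitation the 30-day table has in the sheet.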

Google Sheets IMPORTXML XPath - Imported Content is empty [duplicate]

I'm attempting to parse the 'PEG Ratio' value of a stock from Yahoo Finance into a Google Sheet, but seeing an error.
URL used: https://finance.yahoo.com/quote/ABBV/key-statistics?p=ABBV
Cell Expression used: =IMPORTXML("http://finance.yahoo.com/quote/ABBV/key-statistics?p=ABBV", "//td[@data-reactid='132']")
Error: '#N/A' value (Error: Imported Content is empty)
Value expected is 1.28 (at the time of posting this query), from Yahoo Finance > Statistics tab > PEG Ratio table (the td has an attribute data-reactid='132' that I have attempted to filter on in the query)
Can anyone help please? Here is a link to the sheet: Google Sheet
Issue
IMPORTXML can only read the HTML source of a website. Elements and components that are added to the page dynamically therefore cannot be retrieved, and IMPORTXML will treat the tag as having empty content.
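To make the difference concrete, here is a small Python sketch that runs the same attribute-based XPath against two simplified snippets. Both snippets are assumptions made for illustration (the real Yahoo markup is far larger, and the cell may be missing from the source altogether rather than empty), and `cell_text` is a hypothetical helper. It uses the limited XPath support in the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Simplified stand-ins: what the server might send versus what the
# browser shows after JavaScript has filled in the value.
STATIC_SOURCE = "<table><tr><td data-reactid='132'></td></tr></table>"
RENDERED_DOM = "<table><tr><td data-reactid='132'>1.28</td></tr></table>"


def cell_text(markup, reactid):
    """Return the text of the td with the given data-reactid, or None if absent."""
    root = ET.fromstring(markup)
    # Note the '@' prefix: XPath attribute tests are written [@name='value'].
    cell = root.find(f".//td[@data-reactid='{reactid}']")
    if cell is None:
        return None
    return (cell.text or "").strip()


print(repr(cell_text(STATIC_SOURCE, "132")))
print(repr(cell_text(RENDERED_DOM, "132")))
```

The XPath is identical in both calls; only the markup it runs against differs, which is why fixing the expression alone cannot help here.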
Possible workaround
Sometimes you can find, in the website's JavaScript files, the URL of the source of the data being inserted dynamically, but that is a tedious task.
Another option for getting the desired value is to use other web scraping techniques.
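For the first workaround, once such a data URL has been found, the response is often JSON and the value can be picked out by walking the structure. The Python sketch below assumes exactly that: the payload shape, its key names, and the `dig` helper are all hypothetical and do not describe any real Yahoo endpoint.

```python
import json


def dig(payload, path):
    """Follow a list of dict keys / list indexes into nested JSON data.

    Returns None as soon as a step in the path does not exist.
    """
    current = payload
    for key in path:
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif (
            isinstance(current, list)
            and isinstance(key, int)
            and -len(current) <= key < len(current)
        ):
            current = current[key]
        else:
            return None
    return current


# Hypothetical response body; a real endpoint will have its own layout.
SAMPLE_RESPONSE = '{"result": [{"symbol": "ABBV", "stats": {"pegRatio": 1.28}}]}'

data = json.loads(SAMPLE_RESPONSE)
print(dig(data, ["result", 0, "stats", "pegRatio"]))
```

Returning `None` for a missing step keeps the lookup from raising when the site changes its payload, which tends to happen without notice.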
I hope this has helped you. Let me know if you need anything else or if you did not understand something. :)
This is probably not what you want, but I was searching around, and found a Google Sheets Add-On that does manage to pull the "1.28" value from that page. It is free for doing a very limited number of queries per month. If interested, search for IMPORTFROMWEB in the GSuite Marketplace.
I only plugged in your URL and the same XPath that you used, so I was very surprised when the data showed up. No idea how it works.
I apologise if mentioning an Add-On is not appropriate on SO. But knowing that an add-on can get that data off the web page may encourage some other ideas on how to do it natively with Sheets.
