Google Sheets yahoo finance importXML text not td data [duplicate] - google-sheets

This question already has answers here:
Scraping data to Google Sheets from a website that uses JavaScript
(2 answers)
Closed last month.
On SO I often see people asking how to import data from the tables on Yahoo Finance. I'm trying to import the business description under the profile section from Yahoo Finance. It seems this would require the IMPORTXML function, but I'm struggling. This is my function:
=IMPORTXML("http://finance.yahoo.com/quote/AAPL/profile", "//div[@data-reactid='139']")
I think my issue is related to "div" but not sure. Might anyone be able to provide guidance? Thanks!

Sample formula:
=IMPORTXML(A1,"//h2[@data-reactid='139']/../p")
In this case, the URL http://finance.yahoo.com/quote/AAPL/profile is placed in cell A1.
I used //h2[@data-reactid='139']/../p as the XPath.
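To illustrate how this XPath navigates from the heading up to its parent and then down to the sibling paragraph, here is a minimal Python sketch. The markup in SAMPLE is a hypothetical, simplified stand-in for the profile page, not the real Yahoo Finance HTML, and ElementTree supports only a subset of XPath, so the path is written relative to the root with a leading ".".

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page structure is an assumption.
SAMPLE = """
<html>
  <body>
    <section>
      <h2 data-reactid="139">Description</h2>
      <p>Example business description.</p>
    </section>
    <section>
      <h2 data-reactid="200">Other</h2>
      <p>Unrelated text.</p>
    </section>
  </body>
</html>
""".strip()

root = ET.fromstring(SAMPLE)

# Find the h2 with the matching attribute, step up to its parent (".."),
# then select the <p> children of that parent.
matches = root.findall(".//h2[@data-reactid='139']/../p")
descriptions = [p.text for p in matches]
print(descriptions)
```

Only the paragraph that shares a parent with the matching heading is selected, which is why the second section's text is left out.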

Related

ImportXML extract paginated table into Google Sheets [duplicate]

I would like to scrape this table. The URL is:
https://www.londonstockexchange.com/indices/ftse-aim-all-share/constituents/table?page=1
As you can see, it's currently 39 pages, but this can change, so it's dynamic. Can someone please provide guidance on how to import it into Google Sheets? I have come up with the following so far:
=IMPORTXML("https://www.londonstockexchange.com/indices/ftse-aim-all-share/constituents/table?page=1", "table",1)
But it doesn't seem to work.
The website you are trying to scrape is loading the table dynamically. IMPORTXML is used only for static content.
Your best bet would be to write your own script to parse it, or to find a paid service.
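If you do write your own script, the pagination part could be sketched roughly as follows in Python. This is only an outline under stated assumptions: fetch_rows is a hypothetical callable you would supply (for example, one that drives a headless browser or calls whatever data endpoint the site uses), and it is assumed that a page past the last one yields no rows. Neither assumption has been checked against the actual site.

```python
from typing import Callable, Iterator, List

BASE_URL = (
    "https://www.londonstockexchange.com/indices/"
    "ftse-aim-all-share/constituents/table?page={page}"
)


def page_urls(start: int = 1) -> Iterator[str]:
    """Yield page URLs indefinitely, starting at page `start`."""
    page = start
    while True:
        yield BASE_URL.format(page=page)
        page += 1


def collect_rows(
    fetch_rows: Callable[[str], List[list]], max_pages: int = 200
) -> List[list]:
    """Walk the pages until one comes back empty (assumed end of data)."""
    rows: List[list] = []
    for count, url in enumerate(page_urls(), start=1):
        if count > max_pages:  # safety cap so the loop always terminates
            break
        page_rows = fetch_rows(url)
        if not page_rows:
            break
        rows.extend(page_rows)
    return rows


# Stand-in fetcher for demonstration only: pretends there are three pages.
def fake_fetch(url: str) -> List[list]:
    page = int(url.rsplit("=", 1)[1])
    return [[f"row-{page}-a"], [f"row-{page}-b"]] if page <= 3 else []


all_rows = collect_rows(fake_fetch)
print(len(all_rows))
```

Stopping on the first empty page avoids hard-coding the current count of 39 pages, so the loop keeps working if the number of pages changes.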

How to scrape one of the rates on this website using Google Sheets' IMPORTXML formula? [duplicate]

I want to get the rate of the euro from this website: bonbast.com. I tried this formula:
IMPORTXML("https://www.bonbast.com/","//tr[@id='eur1']")
but nothing imported. What is wrong with it?
As mentioned in JaSON's comment, the IMPORTXML function cannot read dynamic values generated after the page loads. It's just meant to read static pages.
The website bonbast.com seems to have an API so you can use that to retrieve the data, though it's a paid service.
By default, Sheets is not really equipped to scrape dynamic websites. You're better off looking for another site that has static data, looking for an extension or add-on that does the work for you, or learning more advanced scraping approaches.

How to get the correct XPath for ImportXML [duplicate]

I have spent the past 2 hours using an XPath scraper, inspecting, and googling, and still can't figure this out for the life of me.
I'm trying to scrape the interest rates in this table, but they're not pulling through:
Website
https://www.fhlbboston.com/fhlbank-boston/rates#/long-term
Formula (incorrect)
importxml("https://www.fhlbboston.com/fhlbank-boston/rates#/long-term","//table",1)
The IMPORT formulas in Google Sheets do not support scraping JavaScript-rendered elements. You can always check this by disabling JS for a given site; usually only what is left can be imported.
In your case, the workaround would be to find an alternative URL that hosts your desired dataset.
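The "disable JS and see what is left" check can also be approximated in code by looking at the raw HTML before any script runs. The Python sketch below counts table tags in an HTML string using only the standard library. Both sample documents are hypothetical illustrations, not copies of any real site.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times a given start tag appears in the markup."""

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag = tag
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.count += 1


def count_tag(html: str, tag: str) -> int:
    parser = TagCounter(tag)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical static page: the table is present in the served HTML.
STATIC_PAGE = "<html><body><table><tr><td>1.25</td></tr></table></body></html>"

# Hypothetical JS-driven page: only an empty container and a script are served.
JS_PAGE = "<html><body><div id='app'></div><script>render()</script></body></html>"

static_tables = count_tag(STATIC_PAGE, "table")
js_tables = count_tag(JS_PAGE, "table")
print(static_tables, js_tables)
```

If the count is zero for the HTML a plain HTTP request returns, an IMPORT formula asking for a table on that URL has nothing to find.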

IMPORTXML Function doesn't work for this page [duplicate]

I'm trying to do a personal search and would like to get some data (Number of players, Number of reviews, Category etc), from the single-game page of BoardGameGeek Website (https://boardgamegeek.com/boardgame/174430/gloomhaven).
Unfortunately, the IMPORTXML Google Sheets function doesn't work and I don't understand why. Maybe the page is JS-generated? I'm not an expert; does anyone have a solution? I have looked in other threads, but this seems to me a rather specific case.
The IMPORTXML formula (or any other IMPORT formula) does not support scraping JavaScript-rendered elements. You can always test this by disabling JS for a given site; usually only what is left can be imported. In your case, the page is pure JS.

Returned values from ImportXML in Google Sheet is different than the actual values on Yahoo Finance [duplicate]

This question already has answers here:
Scraping data to Google Sheets from a website that uses JavaScript
(2 answers)
ImportXML not producing correct values [duplicate]
(1 answer)
Closed last month.
I am trying to import option prices from Yahoo Finance into my Google Sheet using IMPORTXML, but some of the values received in Google Sheets are very different from what I can see on the Yahoo Finance website. I even tried IMPORTHTML and the result is the same.
Formula used in Google Sheet:
=TRANSPOSE(IMPORTXML( "https://finance.yahoo.com/quote/KL220121C00045000?p=KL220121C00045000" ,"//tr"))
Here's the result in Google Sheet (all red cells are the values that are different):
Actual values on Yahoo Finance page:
I am totally clueless why this is happening and how to solve it.
@Tanaike's link to "ImportXML not producing correct values" explains how to work around the issue using Apps Script.
To answer the question of "why", I believe Yahoo Finance has implemented some sort of user agent detection, such that requests from Google Spreadsheets, or more specifically requests with the user agent Mozilla/5.0 (compatible; GoogleDocs; apps-spreadsheets; +http://docs.google.com) will be served a different (I believe older) version of the data.
When I visit the link https://finance.yahoo.com/quote/KL220121C00045000?p=KL220121C00045000 in the browser, it currently shows
As of 3:17PM EST. Market open
But when I change my user agent to mimic Google sheets, I get
As of 10:43AM EST. Market open.
Which is the same result as IMPORTXML.
I am guessing they implemented this either to reduce fetching from automated spreadsheets, or to discourage people from scraping their sites using Google Sheets.
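To reproduce this comparison outside the browser, one could send the same request with the user agent string quoted above. The Python sketch below only builds the request and defines a fetch helper; it does not call the network when run. Whether Yahoo Finance still responds differently to this user agent is an assumption based on the observation above, and the function names are illustrative.

```python
import urllib.request

# User agent string quoted in the answer above.
SHEETS_UA = (
    "Mozilla/5.0 (compatible; GoogleDocs; apps-spreadsheets; "
    "+http://docs.google.com)"
)
URL = "https://finance.yahoo.com/quote/KL220121C00045000?p=KL220121C00045000"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a GET request that identifies itself with `user_agent`."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, user_agent: str, timeout: float = 10.0) -> str:
    """Download the page body as text (performs a real network call)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Build the request only; call fetch(URL, SHEETS_UA) yourself to compare
# the returned timestamp with what a regular browser shows.
req = build_request(URL, SHEETS_UA)
print(req.get_header("User-agent"))
```

Fetching once with this user agent and once with a regular browser user agent, then comparing the "As of ..." timestamps, would show whether the served data differs.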
