This question already has answers here:
Scraping data to Google Sheets from a website that uses JavaScript
(2 answers)
Closed last month.
I need to parse balance sheet data for a given set of stocks on otcmarkets.com. I'm trying to use the importXML function in Google Spreadsheet, but it is not returning any data for me. The xPath query did not return any data. Here is the function I'm using:
=importxml("http://www.otcmarkets.com/stock/AAEH/financials","//*[@id='totalCurrentLiabilities']")
Let me know what I'm doing wrong and if there is a better way to parse specific balance sheet data.
The page contents are loaded using JavaScript, which is not executed in Google Spreadsheets. You cannot parse this page using =importxml(...).
What to do now?
Ask the providers if they offer an API. Most probably they don't want to be scraped anyway.
Analyze the page logic and find the JavaScript call which loads the data, and fetch it yourself. Most probably it is in JSON format, which is not easy to parse in Google Spreadsheets without external libraries.
Use an environment that executes the JavaScript calls to query the data, for example Selenium. This requires much more programming than using Google Spreadsheets.
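As a rough illustration of the second option (fetching the JSON yourself), here is a minimal Python sketch. The layout of the real response is not known from the page, so `SAMPLE_RESPONSE` and its field names other than `totalCurrentLiabilities` are hypothetical stand-ins, and `find_values` is a helper introduced only for this example. It walks a decoded JSON document and collects every value stored under a given key, which avoids having to know the nesting in advance.

```python
import json
from typing import Any, Iterator


def find_values(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in a decoded JSON tree."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            # Keep descending: the same key may appear at several depths.
            yield from find_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# Hypothetical payload for illustration only; in practice this string would be
# the body returned by the JSON endpoint that the page's JavaScript calls.
SAMPLE_RESPONSE = """
{"ticker": "AAEH",
 "balanceSheets": [
   {"periodEndDate": "2013-12-31", "totalCurrentLiabilities": 1250000},
   {"periodEndDate": "2012-12-31", "totalCurrentLiabilities": 980000}
 ]}
"""

payload = json.loads(SAMPLE_RESPONSE)
liabilities = list(find_values(payload, "totalCurrentLiabilities"))
print(liabilities)
```

Once the values are extracted, they can be written to a CSV file or pushed to a spreadsheet by whatever means is convenient.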
Try using the importdata function:
=IMPORTDATA("http://www.otcmarkets.com/otciq/ajax/EdgarFinancialsController.json?ticker=AAEH&mode=annual")
I would like to scrape this table. The URL is:
https://www.londonstockexchange.com/indices/ftse-aim-all-share/constituents/table?page=1
As you can see, it currently has 39 pages, but this can change, so it's dynamic. Can someone please provide guidance on how to import it into Google Sheets? I have come up with the following so far:
=IMPORTXML("https://www.londonstockexchange.com/indices/ftse-aim-all-share/constituents/table?page=1", "table",1)
But it doesn't seem to work
The website you are trying to scrape is loading the table dynamically. IMPORTXML is used only for static content.
Your best bet would be to write your own script to parse it, or to find a paid service.
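If you do write your own script, the changing page count can be handled by requesting pages until one comes back empty. The Python sketch below shows only that loop. `iter_all_rows` and `fake_fetch_page` are hypothetical names, and the fake fetcher returns made-up rows; a real fetcher would have to obtain each page's rows from wherever the site actually loads them, which is not shown here.

```python
from typing import Callable, Iterator, List

Row = List[str]


def iter_all_rows(
    fetch_page: Callable[[int], List[Row]], max_pages: int = 1000
) -> Iterator[Row]:
    """Yield rows page by page, stopping at the first empty page.

    `max_pages` is a safety cap so a misbehaving fetcher cannot loop forever.
    """
    for page in range(1, max_pages + 1):
        rows = fetch_page(page)
        if not rows:
            break
        yield from rows


def fake_fetch_page(page: int) -> List[Row]:
    """Stand-in for a real fetcher: two pages of made-up rows, then nothing."""
    data = {
        1: [["AAA", "Alpha plc"], ["BBB", "Beta plc"]],
        2: [["CCC", "Gamma plc"]],
    }
    return data.get(page, [])


all_rows = list(iter_all_rows(fake_fetch_page))
print(len(all_rows), "rows collected")
```

Stopping on the first empty page means the script does not need to know in advance whether there are 39 pages or some other number.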
I want to get the rate of the euro from this website: bonbast.com. I tried this formula:
IMPORTXML("https://www.bonbast.com/","//tr[@id='eur1']")
but nothing imported. What is wrong with it?
As mentioned in JaSON's comment, the IMPORTXML function cannot read dynamic values generated after the page loads. It's just meant to read static pages.
The website bonbast.com seems to have an API so you can use that to retrieve the data, though it's a paid service.
By default, Sheets is not really equipped to scrape dynamic websites. You're better off looking for another site that has static data, looking for some kind of extension or add-on that does the work for you, or learning more advanced scraping approaches.
I'm looking for a way to get the stock price for a specific date (e.g. 31.1.2020).
I know I can use IMPORTHTML or IMPORTXML together with INDEX to get the table. However, when I use the browser to search for a specific date on investing.com, there's no direct URL for the date, and it rather presents me with the latest stock prices instead. This is the stock I'm looking for
I'm afraid that investing.com does not provide an API:
https://www.investing-support.com/hc/en-us/articles/115005473825-Do-you-provide-an-API-
So you won't be able to do this very easily (if at all) with Google Sheets or Apps Script. The reason is that it looks like most of the content on the site is generated with JavaScript, and so it is not part of the original HTML that is shown when you first enter the site. The HTML is what IMPORTHTML gets.
Getting the information you are looking for without using an API would involve browser automation. That is, simulating the clicks that a user might make and then reading the data. This can be very finicky and is prone to break whenever the website changes its layout or HTML for whatever reason (something that tends to happen quite often for busy websites).
I would recommend using a different service that has a Sheets-friendly HTML format. Better than that, I would look into a service that has an API and interact with it from Apps Script. Finally, if it has to be investing.com, you could look into something like Puppeteer, which can automate a browser (though it's a fair bit more complex than a formula or an API).
You can import using importhtml the historical data for the last 30 days, and then use a lookup for that data.
To get historical data I use:
query(IMPORTHTML("https://investing.com/equities/STOCK-historical-data"; "table"; 2);"SELECT Col1, Col2")
I don't know if you can import more than 30, I'm searching for that answer myself.
I'm attempting to parse the 'PEG Ratio' value of a stock from Yahoo Finance into a Google Sheet, but seeing an error.
URL used: https://finance.yahoo.com/quote/ABBV/key-statistics?p=ABBV
Cell Expression used: =IMPORTXML("http://finance.yahoo.com/quote/ABBV/key-statistics?p=ABBV", "//td[@data-reactid='132']")
Error: '#N/A' value (Error: Imported Content is empty)
Value expected is 1.28 (at the time of posting this query), from Yahoo Finance > Statistics tab > PEG Ratio table (the td has an attribute data-reactid='132' that I have attempted to filter on in the query).
Can anyone help please? Here is a link to the sheet: Google Sheet
Issue
IMPORTXML can only read the HTML source of a website. Elements and components that are added dynamically cannot be retrieved by IMPORTXML, so it sees the tag as having empty content.
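To make the difference concrete, the following Python sketch parses two HTML strings with the standard library and looks up the same attribute in each. Both strings are made-up stand-ins, not real Yahoo Finance markup: `STATIC_SOURCE` imitates what a plain HTTP fetch might return, and `RENDERED_DOM` imitates what the browser shows after JavaScript has filled in the cell. `TextByAttribute` and `text_for` are hypothetical helpers written for this example.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TextByAttribute(HTMLParser):
    """Collect the text inside elements carrying a given attribute value.

    Simplification: assumes nested tags are properly closed (no bare <br>).
    """

    def __init__(self, attr: str, value: str) -> None:
        super().__init__()
        self.attr = attr
        self.value = value
        self.depth = 0  # greater than 0 while inside a matching element
        self.found = False
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1  # nested tag inside the match
        elif (self.attr, self.value) in attrs:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_for(html: str, attr: str, value: str) -> Optional[str]:
    """Return the matched element's text, or None if no element matches."""
    parser = TextByAttribute(attr, value)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip() if parser.found else None


# Made-up markup for illustration only.
STATIC_SOURCE = '<table><tr><td data-reactid="132"></td></tr></table>'
RENDERED_DOM = '<table><tr><td data-reactid="132">1.28</td></tr></table>'

static_text = text_for(STATIC_SOURCE, "data-reactid", "132")
rendered_text = text_for(RENDERED_DOM, "data-reactid", "132")
print(repr(static_text), repr(rendered_text))
```

A parser that never runs JavaScript only ever sees the first kind of document, which is why the imported content comes back empty.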
Possible workaround
Sometimes you can find the URL of the dynamically inserted data in the website's JavaScript files, but that is a tedious task.
Another option for getting the desired value is to use other web scraping techniques.
I hope this has helped you. Let me know if you need anything else or if you did not understand something. :)
This is probably not what you want, but I was searching around, and found a Google Sheets Add-On that does manage to pull the "1.28" value from that page. It is free for doing a very limited number of queries per month. If interested, search for IMPORTFROMWEB in the GSuite Marketplace.
I only plugged in your URL and the same XPath that you used, so I was very surprised when the data showed up. No idea how it works.
I apologise if mentioning an Add-On is not appropriate on SO. But knowing that an add-on can get that data off the web page may encourage some other ideas on how to do it natively with Sheets.
I'm trying to pull the view count from a video on Instagram. This is the video: https://www.instagram.com/p/BxEApSqgNJn/
I have been able to get Youtube and Facebook views, but struggling with Instagram.
I have used the following Formula to pull data from Facebook Video:
=IFERROR(LEFT(IMPORTXML(H28,"//*[@data-tooltip-position='below']"),LEN(IMPORTXML(H28,"//*[@data-tooltip-position='below']"))-5),"0")
H28 is the Link
It should show the views the video has achieved, in this case... 351,271 views as of May 14, 2019.
How about this workaround for retrieving the value? The value is taken from data that the page prepares for JavaScript. That data is updated when the page is loaded, so it can be retrieved with an XPath, and the value is then extracted with a regular expression. The modified formula is as follows. Please think of this as just one of several answers.
Sample formula:
In this sample formula, https://www.instagram.com/p/BxEApSqgNJn/ is put in the cell "A1".
=REGEXEXTRACT(IMPORTXML(A1,"//script[@type='application/ld+json']"),"userInteractionCount"":""(\d+)")
Retrieve data using the xpath of //script[@type='application/ld+json'] with IMPORTXML().
Retrieve the value using the regular expression of userInteractionCount"":""(\d+) with REGEXEXTRACT().
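For readers who want to try the same two steps outside Sheets, here is a rough Python equivalent. It first isolates the body of the `application/ld+json` script tag and then applies the same regular expression as the formula. `SAMPLE_HTML` is a made-up fragment used only for illustration (the real page markup may differ or change), and `extract_interaction_count` is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches the body of a <script type="application/ld+json"> element.
LD_JSON_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)

# Same pattern as the REGEXEXTRACT call in the formula.
COUNT_RE = re.compile(r'userInteractionCount":"(\d+)')


def extract_interaction_count(html: str) -> Optional[int]:
    """Return the first userInteractionCount found in the ld+json block."""
    script = LD_JSON_RE.search(html)
    if script is None:
        return None
    match = COUNT_RE.search(script.group(1))
    return int(match.group(1)) if match else None


# Made-up fragment for illustration only.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type":"VideoObject","interactionStatistic":{"userInteractionCount":"351271"}}
</script>
</head><body></body></html>
"""

views = extract_interaction_count(SAMPLE_HTML)
print(views)
```

Like the formula, this only works as long as the site embeds the count in its static HTML in this exact form.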
References:
IMPORTXML
REGEXEXTRACT
If I misunderstood your question and this was not the result you want, I apologize.
Unfortunately, that won't be possible because Instagram's content is rendered by JavaScript, and Google Sheets can't execute or import JavaScript. You can test this by disabling JavaScript for a given link; you will see a blank page.