How to scrape an income statement from Yahoo Finance into Google Sheets

I'm trying to scrape an income statement for Apple (AAPL) into Google Sheets:
https://finance.yahoo.com/quote/AAPL/financials?p=AAPL
First off, I'm new to using anything technical or function-related on computers, so sorry if this is a dumb question. I'm aware that Sheets has built-in import functions for web data. I tried the IMPORTXML function, but I couldn't find the right XPath for the whole income statement.
So my questions are:
Which import function would be best for scraping the income statement into Sheets?
Whichever function is best, how do I use it?
Would I repeat the steps you show me if I wanted to scrape the balance sheet and cash flow statement as well?
Thank you for your time.

It seems that the page you linked generates its data dynamically. The IMPORT functions (IMPORTXML, IMPORTHTML and so on) only read the HTML that the server returns; they don't run JavaScript, so they can't fetch data that is generated dynamically or controlled by JavaScript.
I suggest finding another link or website that provides the same data in a form the IMPORT functions can fetch, keeping the limitations above in mind.
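To illustrate the point about JavaScript, here is a minimal Python sketch (an analogy, not how Google Sheets is actually implemented) that, like an IMPORT function, reads only the raw HTML it is given and never executes scripts. The two sample pages and the `cells_in_raw_html` helper are hypothetical: one page ships its table cells in the HTML, the other ships an empty table that a script would fill in only when run in a browser.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell found in raw HTML.

    Script bodies are passed through as plain text and never executed,
    so cells that JavaScript would create in a browser are never seen.
    """

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data.strip()


def cells_in_raw_html(html):
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.cells


# Hypothetical page whose table rows are already in the server's HTML.
STATIC_PAGE = "<table><tr><td>Total Revenue</td><td>123,456</td></tr></table>"

# Hypothetical page that ships an empty table and fills it with JavaScript.
DYNAMIC_PAGE = """
<table id="income"></table>
<script>
  document.getElementById("income").innerHTML =
    "<tr><td>Total Revenue</td><td>123,456</td></tr>";
</script>
"""

print(cells_in_raw_html(STATIC_PAGE))   # ['Total Revenue', '123,456']
print(cells_in_raw_html(DYNAMIC_PAGE))  # []
```

Because the second page's rows only exist after its script runs, a parser that never executes scripts finds nothing, which matches the empty results described above.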

Related

Importing a website table into Google Spreadsheet

I am trying to import this table into a Google Spreadsheet. The table is available here:
https://competitions.lta.org.uk/sport/drawsheet.aspx?id=8D598CDE-8579-4541-B7AD-48558BF6FEA3&draw=4
Before Google changed their Spreadsheet addresses, I had the import working with ImportHTML(URL, "table", 2), but this no longer works, even though there appear to be only two <table> tags in the page HTML.
Looking for a way to extract the table, I turned to IMPORTXML and tried several versions, such as
=IMPORTXML("https://competitions.lta.org.uk/sport/drawsheet.aspx?id=8D598CDE-8579-4541-B7AD-48558BF6FEA3&draw=4", "//div[contains(@id,'poule')]")
and the same formula with "//table[contains(@class,'ruler')]",
but the formula fails with 'no content'.
Would really appreciate some help to find a way to import this table!
Thanks in anticipation,
The reason you can't get the table data is the cookie consent page.
Every time Google Sheets tries to access that link, the site asks for cookies to be accepted, and by default Google Sheets won't do that.
To get to the data you need to bypass or accept the cookies, which means implementing something more advanced in Python or Google Apps Script.
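A minimal sketch of what "accepting the cookies" can mean outside Sheets, assuming the site records consent in a cookie (common, but not confirmed for this site): the script sends that cookie with its request so the server can return the real page instead of the consent page. The cookie name and value below are hypothetical placeholders; the real ones would have to be copied from your browser's developer tools after clicking Accept. The request is only built here, not sent.

```python
import urllib.request

# Page from the question above.
DRAW_URL = (
    "https://competitions.lta.org.uk/sport/drawsheet.aspx"
    "?id=8D598CDE-8579-4541-B7AD-48558BF6FEA3&draw=4"
)


def build_request(url, consent_cookie):
    """Build (but do not send) a GET request carrying a consent cookie.

    `consent_cookie` is a "name=value" string; the real name and value are
    site-specific and are NOT known here.
    """
    request = urllib.request.Request(url)
    request.add_header("Cookie", consent_cookie)
    return request


# Hypothetical cookie name and value, for illustration only.
req = build_request(DRAW_URL, "example_consent=accepted")
# Sending it would be: urllib.request.urlopen(req).read()
```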

Google Sheets function: IMPORTXML XPath difficulties for column text within Yahoo Finance [duplicate]

This question already has answers here:
Scraping data to Google Sheets from a website that uses JavaScript
I'm creating a ticker scanner tool in Google Sheets, mainly with Google Finance and Yahoo Finance. I have no difficulties with INDEX(IMPORTHTML()) and other functions; however, I can't manage to find the correct XPath when using IMPORTXML. I have no background in HTML or XPath, so I'm a novice, but I understand the basics from some recent troubleshooting.
URL: https://au.finance.yahoo.com/quote/FMG.AX?p=FMG.AX
I'm trying to pull in the SECTOR, INDUSTRY and DESCRIPTION text, which is on the right-hand side about halfway down the page. It seems to be inside a column, which may be what's causing me trouble. I used Chrome's Inspect tool to get the XPath and also tried several Chrome extensions, but those didn't work either.
This is what I got when copying the XPath (short and long versions):
Sectors:
//*[@id="Col2-11-QuoteModule-Proxy"]/div/div/div/div/p[2]
Business Summary:
/html/body/div[1]/div/div/div[1]/div/div[3]/div[2]/div/div/div/div/div/div[12]/div/div/div/div/div/p
I also tried shortening the /div steps with //p, but that doesn't work either.
I played around and used //body/div//div/p, which retrieved news data from the middle of the page.
Could someone help me adjust this or explain what I'm doing wrong, and point me in the right direction?
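Even with a correct XPath, the answers below explain that this data is loaded by JavaScript, so IMPORTXML still won't see it. For the XPath part of the question, though, here is a small Python sketch using the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset. The markup is a simplified, hypothetical stand-in for the page; real HTML usually isn't well-formed XML, so this only shows how the paths behave.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical markup; the real page is far larger.
PAGE = """
<body>
  <div>
    <div><p>News headline</p></div>
    <div id="profile">
      <p>Sector: Example Sector</p>
      <p>Industry: Example Industry</p>
    </div>
  </div>
</body>
"""

root = ET.fromstring(PAGE.strip())  # root is the <body> element

# Attribute tests use "@": [@id='profile'] selects the div with that id,
# and p[2] is its second <p> child (XPath positions start at 1).
industry = root.find(".//div[@id='profile']/p[2]")
print(industry.text)  # Industry: Example Industry

# "//" matches at any depth, so the equivalent of //body/div//div/p
# returns every <p> in every nested div, including the unrelated headline.
loose = [p.text for p in root.findall("./div//div/p")]
print(loose)  # ['News headline', 'Sector: Example Sector', 'Industry: Example Industry']
```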
This will never work with IMPORTXML / IMPORTHTML formulae, because the elements you are trying to import are controlled by JavaScript, which Google Sheets can't process.
Apparently, the data you are trying to pull is controlled by JavaScript which means you won't be able to fetch it using IMPORTXML.

Get historical stock price from investing.com to google sheets [duplicate]

This question already has answers here:
Scraping data to Google Sheets from a website that uses JavaScript
I'm looking for a way to get the stock price for a specific date (e.g. 31.1.2020).
I know I can use IMPORTHTML or IMPORTXML together with INDEX to get the table. However, when I search for a specific date on investing.com in the browser, there's no direct URL for that date; the site presents me with the latest stock prices instead. This is the stock I'm looking for.
I'm afraid investing.com does not provide an API:
https://www.investing-support.com/hc/en-us/articles/115005473825-Do-you-provide-an-API-
So you won't be able to do this very easily (if at all) with Google Sheets or Apps Script. The reason is that most of the content on the site appears to be generated with JavaScript, so it is not part of the original HTML that is served when you first open the site. That original HTML is all IMPORTHTML gets.
Getting the information you're looking for without an API would involve browser automation, that is, simulating the clicks a user would make and then reading the data. This can be very finicky and is prone to break whenever the website changes its layout or HTML for whatever reason (something that tends to happen quite often on busy websites).
I would recommend using a different service that has a Sheets-friendly HTML format. Better still, look for a service that has an API and interact with it from Apps Script. Finally, if it has to be investing.com, you could look into something like Puppeteer, which can automate a browser (though it's a fair bit more complex than a formula or an API).
You can use IMPORTHTML to import the historical data for the last 30 days and then look up the date you need in that data.
To get historical data I use:
=QUERY(IMPORTHTML("https://investing.com/equities/STOCK-historical-data"; "table"; 2); "SELECT Col1, Col2")
I don't know whether you can import more than 30 days; I'm looking for that answer myself.
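The lookup step in the answer above can be sketched in Python as follows. It assumes the imported table has already been reduced to (date, price) rows, newest first, with dates formatted like "Jan 31, 2020"; that format and the sample prices are hypothetical. If the requested date was not a trading day, the sketch falls back to the closest earlier date, and it raises an error when the date is older than the roughly 30 days the import covers.

```python
from datetime import date, datetime


def price_on_or_before(rows, target):
    """Return (date, price) for `target`, falling back to the closest
    earlier trading day when `target` is missing (weekend, holiday).

    `rows` stands in for the imported table: (date text, price text) pairs.
    The "%b %d, %Y" format (e.g. "Jan 31, 2020") is an assumption.
    """
    parsed = [
        # Strip thousands separators before converting the price.
        (datetime.strptime(day_text, "%b %d, %Y").date(),
         float(price_text.replace(",", "")))
        for day_text, price_text in rows
    ]
    earlier = [row for row in parsed if row[0] <= target]
    if not earlier:
        raise LookupError(f"{target} is older than the imported rows")
    return max(earlier, key=lambda row: row[0])


# Hypothetical imported rows, newest first.
ROWS = [
    ("Feb 03, 2020", "101.50"),
    ("Jan 31, 2020", "99.75"),
    ("Jan 30, 2020", "100.20"),
]

print(price_on_or_before(ROWS, date(2020, 1, 31)))  # (datetime.date(2020, 1, 31), 99.75)
print(price_on_or_before(ROWS, date(2020, 2, 1)))   # Saturday: falls back to Jan 31
```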

Performing Google Search In Spreadsheet And Scraping The SERP Data

I hope you're all doing well. I want to build a simple spreadsheet. I thought I'd be able to make one myself, but a blank sheet looks horrible to me. I'm sure you're kind enough to help me out.
I want to run multiple Google search queries from a Google spreadsheet and parse the results of each search (the top 10 results of each).
Something like this: https://www.youtube.com/watch?v=tBwEbuMRFlI
But when I tested the formula he gives in the video description, Google returned #Error and I don't know why.
Can you please help me make a simple spreadsheet that handles multiple queries at once? For example, one column for keywords (where I could paste my list of keywords) and then 10 columns of search results, with all results for one keyword in one row.
Something like this:
My 1st Example Query = 1st search result, 2nd search result, 3rd result and so on.
My 2nd Example Query = 1st search result, 2nd search result, 3rd result and so on.
It's probably easy to code, but it might be time-consuming, and I would be very grateful if any of you could help me with it.
Looking forward to your help guys.
The problem is that you want to scrape from within Spreadsheets. That's a bad approach and is almost certainly not going to work: even if you manage to write a scraper inside that limited environment, Google will easily spot it.
Since you said time is not a problem, I would suggest another route:
Use a backend tool/script that scrapes the data
Use a backend tool/script that creates/modifies the Google spreadsheet
You can run such scripts manually on your PC, or fully automated on a server using a scheduler/cron job.
To create/modify spreadsheets look here: How do I access the Google Spreadsheets API in PHP?
To scrape Google look here: Is it ok to scrape data from Google results?
These links use PHP as the language of choice, but you can do exactly the same in Java, Python or C#.
There are third-party solutions such as SerpApi that you could use for this. It's a paid API with a free trial.
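For the backend-script route, the scraping itself depends on the tool you choose, but the layout the question asks for (one row per keyword, then up to 10 result columns) is easy to sketch. The function below is a hypothetical Python helper that takes already-scraped results and pads each row to the same width; a list of rows like this is the shape the Google Sheets API's `spreadsheets.values.update` method accepts in its `values` field.

```python
def to_sheet_rows(results_by_keyword, columns=10):
    """Return one row per keyword: the keyword, then up to `columns`
    results, padded with empty strings so all rows have equal width."""
    rows = []
    for keyword, results in results_by_keyword.items():
        top = list(results)[:columns]
        rows.append([keyword] + top + [""] * (columns - len(top)))
    return rows


# Hypothetical output of whatever scraper or search API you use.
scraped = {
    "My 1st Example Query": ["1st result", "2nd result", "3rd result"],
    "My 2nd Example Query": ["1st result", "2nd result"],
}

for row in to_sheet_rows(scraped):
    print(row)
```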
Google Sheets Add-on: SerpApi - Search Engine Results and Ranks
Example formula to extract the title of the first result:
=SERPAPI_RESULT("engine=google&q=coffee&location=Austin, Texas, United States&google_domain=google.com&gl=us&hl=en", "organic_results.0.title")
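The second argument of `SERPAPI_RESULT` looks like a dotted path into SerpApi's JSON response: `organic_results` is the list of organic results, `0` selects the first one, and `title` its title. Assuming that reading (the add-on's actual implementation may differ), a minimal Python sketch of resolving such a path against a hypothetical response looks like this:

```python
def resolve_path(data, path):
    """Follow a dotted path through parsed JSON: numeric segments index
    into lists, all other segments are object keys."""
    current = data
    for segment in path.split("."):
        if isinstance(current, list):
            current = current[int(segment)]
        else:
            current = current[segment]
    return current


# Hypothetical, heavily trimmed response in the shape described above.
response = {
    "organic_results": [
        {"position": 1, "title": "First result title"},
        {"position": 2, "title": "Second result title"},
    ]
}

print(resolve_path(response, "organic_results.0.title"))  # First result title
```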

importXML Parse Error [duplicate]

This question already has answers here:
Scraping data to Google Sheets from a website that uses JavaScript
I need to parse balance sheet data for a given set of stocks on otcmarkets.com. I'm trying to use the IMPORTXML function in Google Spreadsheets, but it isn't returning any data; instead I get "The xPath query did not return any data." Here is the function I'm using:
=importxml("http://www.otcmarkets.com/stock/AAEH/financials","//*[@id='totalCurrentLiabilities']")
Let me know what I'm doing wrong and if there is a better way to parse specific balance sheet data.
The page contents are loaded using JavaScript, which is not executed in Google Spreadsheets. You cannot parse this page using =importxml(...).
What to do now?
Ask the providers if they offer an API. Most probably they don't want to be scraped anyway.
Analyze the page logic and find the JavaScript call which loads the data, and fetch it yourself. Most probably it is in JSON format, which is not easy to parse in Google Spreadsheets without external libraries.
Use an environment that executes the JavaScript calls when querying the data, for example Selenium. This will take much more programming than using Google Spreadsheets.
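The second option ends with JSON, which the answer notes is hard to parse in Google Spreadsheets. As a hedged Python sketch of that parsing step, the function below flattens nested JSON into (path, value) pairs that fit into two spreadsheet columns. The sample JSON is invented for illustration; the real response's field names are unknown here and would have to be checked.

```python
import json


def flatten(value, path=()):
    """Turn nested JSON into (path, value) pairs: one spreadsheet row per
    leaf value, with object keys and list indexes joined by dots."""
    if isinstance(value, dict):
        children = value.items()
    elif isinstance(value, list):
        children = enumerate(value)
    else:
        return [(".".join(str(part) for part in path), value)]
    pairs = []
    for key, child in children:
        pairs.extend(flatten(child, path + (key,)))
    return pairs


# Invented sample; the real endpoint's field names are unknown here.
SAMPLE = '{"ticker": "AAEH", "periods": [{"label": "FY1", "totalCurrentLiabilities": 1000}]}'

for path, value in flatten(json.loads(SAMPLE)):
    print(path, value)
```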
Try using the importdata function:
=IMPORTDATA("http://www.otcmarkets.com/otciq/ajax/EdgarFinancialsController.json?ticker=AAEH&mode=annual")
