How to webscrape from Marketwatch Financials for Google Sheets - google-sheets

I want to scrape data from MarketWatch. I have a formula to pull from Finviz:
=value(regexextract(query(importhtml("http://finviz.com/quote.ashx?t="&$C7,"table",9),"select Col2 where Col1 = 'Income' ",0),"[-\d.]+"))
Note: Cell C7 contains SBSW.
How do I scrape the 2021 Sales/Revenue for the ticker SBSW? Here's the link:
https://www.marketwatch.com/investing/stock/SBSW/financials
The result should show 172.19

I tested using this formula, and it works for me:
=IMPORTXML("https://www.marketwatch.com/investing/stock/SBSW/financials", "//*[@id='maincontent']/div[6]/div/div[2]/div/div/table/tbody/tr[1]/td[6]/div/span")
You can get the xpath_query with the browser's developer tools.
Edit: removing the B at the end of the result.
First option
If the letter is always "B."
=SUBSTITUTE(IMPORTXML("https://www.marketwatch.com/investing/stock/SBSW/financials", "//*[@id='maincontent']/div[6]/div/div[2]/div/div/table/tbody/tr[1]/td[6]/div/span"),"B","")
Second option
If the letter at the end can vary.
=REGEXEXTRACT(IMPORTXML("https://www.marketwatch.com/investing/stock/SBSW/financials", "//*[@id='maincontent']/div[6]/div/div[2]/div/div/table/tbody/tr[1]/td[6]/div/span"),"[0-9]+.+[0-9]")
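To see how the two regular expressions on this page behave on values such as 172.19B, here is a minimal Python sketch. It only approximates REGEXEXTRACT (Sheets uses the RE2 engine, Python uses its own re module, though these simple patterns should behave the same way). The helper name extract_number and the sample input 5B are made up for illustration.

```python
import re

# Pattern from the Finviz formula in the question: a run of digits, dots, and minus signs.
FINVIZ_PATTERN = r"[-\d.]+"
# Pattern from the second option. The middle "." is unescaped, so it matches any character.
OPTION_TWO_PATTERN = r"[0-9]+.+[0-9]"


def extract_number(pattern, text):
    """Return the first match of pattern in text, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


print(extract_number(OPTION_TWO_PATTERN, "172.19B"))  # 172.19
print(extract_number(FINVIZ_PATTERN, "172.19B"))      # 172.19
print(extract_number(OPTION_TWO_PATTERN, "5B"))       # None
print(extract_number(FINVIZ_PATTERN, "5B"))           # 5
```

The second option's pattern needs at least three characters (digit, any character, digit), so a single-digit value such as 5B would not match. If such values can occur, the character-class pattern is the safer choice in this sketch.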
Reference:
IMPORTXML
SUBSTITUTE
REGEXEXTRACT

Related

How to use IMPORTXML and SEQUENCE together in Google Sheet

=ARRAYFORMULA("https://www.amazon.com/product-reviews/B08C1W5N87/ref=cm_cr_arp_d_viewopt_rvwer?ie=UTF8&reviewerType=avp_only_reviews&sortBy=recent&pageNumber="&SEQUENCE(5,1,1,1))
I use the formula above to generate the links I want to scrape data from. There are 5 links.
=IMPORTXML(A6,"/html/body/div[1]/div[3]/div/div[1]/div/div[1]/div[5]/div[3]/div/div[*]/div/div/div[2]/a[1]/i")
I also use the formula above to scrape the data I want from a link. A6 refers to the first link that the first formula creates.
If possible, I would like to scrape the data from all 5 links and list the results in one column.
=IMPORTXML(ARRAYFORMULA("https://www.amazon.com/product-reviews/B08C1W5N87/ref=cm_cr_arp_d_viewopt_rvwer?ie=UTF8&reviewerType=avp_only_reviews&sortBy=recent&pageNumber="&SEQUENCE(5,1,1,1)),"/html/body/div[1]/div[3]/div/div[1]/div/div[1]/div[5]/div[3]/div/div[*]/div/div/div[2]/a[1]/i")
The formula above did not work.
=ARRAYFORMULA(IMPORTXML("https://www.amazon.com/product-reviews/B08C1W5N87/ref=cm_cr_arp_d_viewopt_rvwer?ie=UTF8&reviewerType=avp_only_reviews&sortBy=recent&pageNumber="&SEQUENCE(5,1,1,1),"/html/body/div[1]/div[3]/div/div[1]/div/div[1]/div[5]/div[3]/div/div[*]/div/div/div[2]/a[1]/i"))
The formula above did not work either. It always scrapes the first link's data only.
Thank you for your help in advance.
Keep in mind that IMPORTXML is itself a "type of arrayformula", so it is not supported under ARRAYFORMULA.
In your case, try hardcoding 5 IMPORTXML formulas into an array {} like:
={IMPORTXML();
IMPORTXML();
IMPORTXML();
etc}
Update:
With the new LAMBDA function it's possible to do it in one go:
=INDEX(TRIM(FLATTEN(SPLIT(FLATTEN(BYCOL(
"https://www.amazon.com/product-reviews/B08C1W5N87/ref=cm_cr_arp_d_viewopt_rvwer?ie=UTF8&reviewerType=avp_only_reviews&sortBy=recent&pageNumber="&
SEQUENCE(1,5,1,1), LAMBDA(x, QUERY(IMPORTXML(x,
"/html/body/div[1]/div[3]/div/div[1]/div/div[1]/div[5]/div[3]/div/div[*]/div/div/div[2]/a[1]/i")&"×",,9^9)))), "×"))))
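The LAMBDA formula amounts to three steps: build one URL per page, run the import on each URL separately, then flatten all results into a single column. A rough Python sketch of that flow is shown here. The names page_urls, scrape_all, and fake_fetch are hypothetical, and fake_fetch is a stand-in that returns made-up rows instead of contacting Amazon.

```python
from itertools import chain

BASE_URL = (
    "https://www.amazon.com/product-reviews/B08C1W5N87/"
    "ref=cm_cr_arp_d_viewopt_rvwer?ie=UTF8&reviewerType=avp_only_reviews"
    "&sortBy=recent&pageNumber="
)


def page_urls(count):
    """Build one URL per page number, like the base URL joined with SEQUENCE(1, count, 1, 1)."""
    return [f"{BASE_URL}{n}" for n in range(1, count + 1)]


def scrape_all(urls, fetch):
    """Call fetch once per URL (the BYCOL/LAMBDA step) and flatten into one list."""
    return list(chain.from_iterable(fetch(url) for url in urls))


def fake_fetch(url):
    """Hypothetical stand-in for IMPORTXML: returns two invented rows per page."""
    page = url.rsplit("=", 1)[1]
    return [f"page {page} item 1", f"page {page} item 2"]


results = scrape_all(page_urls(5), fake_fetch)
print(len(results))  # 10
```

The key point is that the fetch runs once per URL rather than once on an array of URLs, which is what the ARRAYFORMULA attempts in the question could not achieve.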

Can you copy down an IMPORTXML function using an ArrayFormula?

I'm trying to get the below formula to copy down column D (in red) and have the same results as column E (in green)
={"Query in H1";ARRAYFORMULA(IF(B2:B<>"",(IF(ISNUMBER(SEARCH(B3:B,IMPORTXML(A2:A,"//h1"))),"Yes","No"))))}
This formula collects the H1 (via XPath) from the URL in column A using the IMPORTXML function and checks whether the keyword in column B is included. If it is, the result is "Yes"; if not, "No".
See Google Sheet for reference:
https://docs.google.com/spreadsheets/d/1iHkU-rNtNhoOKvW_CWY7WU5OLsMFVqEFNRZlx_R-7RY/edit#gid=1497887942
Your formula just needs a few modifications:
Remove the header text and modify the SEARCH parameters to B2:B such that the formula looks like this:
=ARRAYFORMULA(IF(B2:B<>"",(IF(ISNUMBER(SEARCH(B2:B,IMPORTXML(A2:A,"//h1"))),"Yes","No"))))
Place the formula in the D2 cell.
After all the changes, this is what your sheet will look like:
I have also taken the opportunity to create a copy of the sheet named Answer with all the modifications.
As far as I know, we cannot use IMPORTXML with ARRAYFORMULA; the results of the formula will not be correct. Even if the results appear to fill down the column from top to bottom, they will be wrong.
You can see an example in the screenshot below: with the same URL, the results in columns F and I differ from each other.
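For reference, the check the question describes is a per-row test: each row's H1 text is searched for that row's keyword, ignoring case as SEARCH does. A minimal Python sketch of that logic follows. The function name keyword_in_h1 and the sample rows are invented, the H1 text is assumed to have been fetched separately for each URL, and returning an empty string for a blank keyword is an assumption about the desired output.

```python
def keyword_in_h1(h1_text, keyword):
    """Mimic IF(ISNUMBER(SEARCH(keyword, h1)), "Yes", "No"); SEARCH ignores case."""
    if keyword == "":
        return ""  # assumption: leave the result blank when there is no keyword
    return "Yes" if keyword.lower() in h1_text.lower() else "No"


# Invented sample data: (H1 text of the page, keyword to look for)
rows = [
    ("Best Running Shoes for 2022", "running shoes"),
    ("Contact Us", "pricing"),
    ("About", ""),
]
results = [keyword_in_h1(h1, keyword) for h1, keyword in rows]
print(results)  # ['Yes', 'No', '']
```

Each result depends only on its own row's H1, which is why a result column can look filled in and still be wrong if every row was compared against the same imported H1.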

How do I get Row Totals using QUERY() in Google Sheets?

I'm working with the following Google Sheet.
Sheet2 uses the following QUERY() function to retrieve data from Sheet1
=QUERY(IMPORTRANGE("1s8krJ7rbZ1DMblZ3vdLcG5pySVM3ESCBy1o7R5Zv4LM", "Sheet1!B3:D"))
Is it possible to return the Row Totals (For Example: B4+C4+D4 for Row 4) using the above QUERY() function?
Please Advise.
My query and expected output are outlined on the Google Sheet.
You should be able to do something like this:
=QUERY(IMPORTRANGE("1s8krJ7rbZ1DMblZ3vdLcG5pySVM3ESCBy1o7R5Zv4LM", "Sheet1!B3:D"), "Select Col1+Col2+Col3 label Col1+Col2+Col3 ''")
Note that importing from another tab in the same spreadsheet doesn't require importrange. In that case, this should also work:
=QUERY(Sheet1!B3:D, "Select B+C+D label B+C+D ''")
Another way to achieve the same result would be:
=ArrayFormula(if(len(Sheet1!B3:B), Mmult(--Sheet1!B3:D, transpose(column(Sheet1!B2:D2)^0)),))
Be aware that while a simple QUERY can return the results, it will take up the entire column if you don't limit it. In other words, all the null rows from Sheet1!B:D will also come over with the QUERY. If you want only the rows that have numbers, try something like this:
=QUERY(Sheet1!B3:D, "Select B+C+D WHERE B+C+D is not null label B+C+D ''")
Or you could use MMULT like this:
=MMULT(FILTER(Sheet1!B3:D,Sheet1!B3:B<>""),SEQUENCE(3,1,1,0))
The results may look the same whether limited or not. But with an unlimited QUERY or MMULT, you won't be able to use the space below the visible results for anything. I only mention this because your sheet currently has data (a TRANSPOSE formula) below the main results. If your real sheet won't, and you don't mind the area below the visible results being unavailable for other data or formulas, then you don't need to limit.
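The MMULT approach works because multiplying a row by a column of ones adds up that row, and the FILTER step drops blank rows first so the output stops where the data stops. A small Python sketch of the same idea follows. The function names and the sample data are made up for illustration.

```python
def mmult_ones(rows):
    """Multiply each row by a column of ones, which is the same as summing the row."""
    return [sum(value * 1 for value in row) for row in rows]


def row_totals(rows):
    """Keep only rows whose first cell is filled (the FILTER step), then total them."""
    filled = [row for row in rows if row[0] not in (None, "")]
    return mmult_ones(filled)


# Invented sample data standing in for Sheet1!B3:D, including one blank row.
data = [[1, 2, 3], [4, 5, 6], [None, None, None], [7, 8, 9]]
print(row_totals(data))  # [6, 15, 24]
```

Without the filter, the blank row would also produce an output entry, which mirrors how an unlimited formula occupies the cells below the visible results.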

Vlookup in same range within different sheets on a different Google Spreadsheet | Google Sheets

I'm hoping to do a VLOOKUP in a different Google Sheet based on 2 criteria: sheet name and then the lookup value. My data looks something like this:
A1          B1            C1
Sheet_Name  Lookup_Value  Lookup_Value
Sheet_1     123456        =vlookup(B3,"Sheet_1!$A$1:$C$1000",2,false)
Sheet_1     987456        =vlookup(B4,"Sheet_1!$A$1:$C$1000",2,false)
Sheet_2     654123        =vlookup(B5,"Sheet_2!$A$1:$C$1000",2,false)
Sheet_3     959595        =vlookup(B6,"Sheet_3!$A$1:$C$1000",2,false)
Sheet_3     621346        =vlookup(B7,"Sheet_3!$A$1:$C$1000",2,false)
Is there a way I can choose the sheet in my vlookup equation based on the value in column A rather than going in manually and updating this?
Currently, I'm trying this, but it's not working:
=vlookup(B3,importrange("key_here",indirect(A3)&"!A1:C1000"),2,false)
Use INDIRECT:
=vlookup(B3,INDIRECT("'"&A3&"'!$A$1:$C$1000"),2,false)
Figured it out: Google Sheets doesn't require the INDIRECT function here, because IMPORTRANGE takes its range as a text string. So what works is:
=vlookup(B3,importrange("key_here",A3&"!A1:C1000"),2,false)

IMPORTRANGE: value missing?

I have been working on this issue for a few weeks now and can't seem to find a solution. I used this answer (IMPORTRANGE with CONDITIONS) to get as far as I could, but I keep getting a value error.
This is the sheet I'm working with.
My goal is to use the first tab in the sheet (All Games) to enter all the games that I come across to create a compendium. But, then I want it to automatically populate the other tabs based on certain criteria (what type of game, skills learned, etc.)
On the Warm-Ups tab you'll see the formulas I have tried. A1 is the most recent.
Here is the formula I tried:
=QUERY(IMPORTRANGE("https://docs.google.com/spreadsheets/d/1F64PMg_iFu-DaJAUaE4BkpqF4zoteknp56VfwAUe8ag/edit#gid=1359689553", "All Games!A1:A1300"),"SELECT Col1 WHERE (Col2 = 'w') ")
I am getting a value error:
Unable to parse query string for Function QUERY parameter 2: NO_COLUMN: Col2
You are trying to reference Col2, but you selected the range A1:A1300, which is just 1 column. Therefore try:
=QUERY(IMPORTRANGE("1F64PMg_iFu-DaJAUaE4BkpqF4zoteknp56VfwAUe8ag", "All Games!A1:B1300"),
"SELECT Col1 WHERE (Col2 = 'w') ")
From a brief look at your sheet, you may want to use matches or contains instead of = in your QUERY.
You are only importing (part of) ColumnA - so no wonder Google can't find a second column :)
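The NO_COLUMN error comes down to a mismatch between the width of the imported range and the highest ColN used in the query. The Python sketch below checks for that mismatch before you paste a formula. All function names are hypothetical, and the parsing is deliberately simple: it assumes a range written as Sheet!A1:B2 with both ends present.

```python
import re


def column_index(letters):
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def range_width(a1_range):
    """Number of columns spanned by a range such as 'All Games!A1:B1300'."""
    cells = a1_range.rsplit("!", 1)[-1]
    start, end = cells.split(":")
    first = re.match(r"\$?([A-Za-z]+)", start).group(1)
    last = re.match(r"\$?([A-Za-z]+)", end).group(1)
    return column_index(last) - column_index(first) + 1


def missing_columns(a1_range, query):
    """Return the ColN numbers used in the query that fall outside the range."""
    width = range_width(a1_range)
    used = {int(n) for n in re.findall(r"\bCol(\d+)\b", query)}
    return sorted(n for n in used if n > width)


query = "SELECT Col1 WHERE (Col2 = 'w') "
print(missing_columns("All Games!A1:A1300", query))  # [2]
print(missing_columns("All Games!A1:B1300", query))  # []
```

With the original range A1:A1300 the check reports Col2 as missing, and widening the range to A1:B1300 clears it, matching the fix given in the answer.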
