How to make IMPORTXML return only certain data - google-sheets

I am trying to get only the number of likes from a website. Currently, I am using
=IMPORTXML("https://www.abillionveg.com/articles/vegan-diet-nutrition-guide","//button")
However, it gives me data from all of the buttons. Can someone help me modify the formula to show only the likes?
Sorry if this is a basic question; I am just learning.

You want to retrieve the number of likes using IMPORTXML.
If my understanding is correct, how about this answer?
Modified formula 1:
=INDEX(SPLIT(IMPORTXML(A1,"//div[@class='ArticleActions__Container-sc-15ye7g8-0 huWdyg'][1]//span[contains(text(),'likes')]")," "),1)
The URL https://www.abillionveg.com/articles/vegan-diet-nutrition-guide is put in cell "A1".
The XPath is //div[@class='ArticleActions__Container-sc-15ye7g8-0 huWdyg'][1]//span[contains(text(),'likes')].
Retrieve the value using IMPORTXML.
Retrieve the number ### from a value like "### likes" using SPLIT and INDEX.
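The steps above can be sketched outside Sheets. Below is a minimal Python sketch, not a Sheets formula, that mirrors each step on a hypothetical, simplified copy of the page markup. The real page's structure and generated class names are assumptions here and may differ. Python's ElementTree supports only a subset of XPath, so the `contains(text(),'likes')` test is done in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the real article page;
# the live site's structure and generated class names may differ.
SAMPLE_HTML = """
<html><body>
  <div class="ArticleActions__Container-sc-15ye7g8-0 huWdyg">
    <button><span>Save</span></button>
    <button><span>100 likes</span></button>
  </div>
  <div class="ArticleActions__Container-sc-15ye7g8-0 huWdyg">
    <button><span>100 likes</span></button>
  </div>
</body></html>
""".strip()

CONTAINER_CLASS = "ArticleActions__Container-sc-15ye7g8-0 huWdyg"


def extract_likes(html: str) -> int:
    root = ET.fromstring(html)
    # Roughly //div[@class='...'][1]: take the first matching container only.
    containers = root.findall(f".//div[@class='{CONTAINER_CLASS}']")
    if not containers:
        raise ValueError("container not found")
    # Roughly //span[contains(text(),'likes')] inside that container.
    for span in containers[0].iter("span"):
        if span.text and "likes" in span.text:
            # INDEX(SPLIT(value, " "), 1): keep the first space-separated token.
            return int(span.text.split(" ")[0])
    raise ValueError("no span containing 'likes' found")


print(extract_likes(SAMPLE_HTML))  # 100
```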
Modified formula 2:
=REGEXEXTRACT(IMPORTXML(A1,"//script[@id='__NEXT_DATA__']"),"likesCount""\:(\d+)") - 1
This returns the same result as Modified formula 1.
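Below is a rough Python sketch of the regular-expression step in Modified formula 2. The JSON fragment is hypothetical, because the real `__NEXT_DATA__` payload is much larger and its keys may change. The sketch only shows how the pattern picks the digits after `likesCount`. It does not reproduce the formula's trailing arithmetic.

```python
import re

# Hypothetical fragment of the JSON embedded in <script id="__NEXT_DATA__">;
# the real payload is much larger and its keys may change at any time.
NEXT_DATA = '{"props":{"pageProps":{"article":{"title":"Guide","likesCount":100}}}}'


def likes_from_next_data(script_text: str) -> int:
    # The Sheets string "likesCount""\:(\d+)" unescapes to the regex
    # likesCount"\:(\d+), i.e. the digits right after the likesCount key.
    match = re.search(r'likesCount"\:(\d+)', script_text)
    if match is None:
        raise ValueError("likesCount not found")
    # REGEXEXTRACT returns text; convert the captured digits to a number.
    return int(match.group(1))


print(likes_from_next_data(NEXT_DATA))  # 100
```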
Note:
For example, if =IMPORTXML(A1,"//div[@class='ArticleActions__Container-sc-15ye7g8-0 huWdyg'][1]//span[contains(text(),'likes')]") is used, "100 likes" is returned.
References
IMPORTXML
SPLIT
INDEX

Related

Unpivoting using QUERY function is not fetching the expected result

I am currently working on a dataset that includes several columns, mostly dates. What I am trying to achieve is to unpivot all the date columns for use in my subsequent calculations. I use the following formula to unpivot: =ARRAYFORMULA(SPLIT(FLATTEN(Data!A2:A&"|"&Data!D1:AG1&"|"&Data!D2:AG),"|"))
Even though this returns the expected result, when I try to nest it within a QUERY function, it does not work correctly. This is how I applied the formula: =QUERY(ARRAYFORMULA(SPLIT(FLATTEN(Data!A2:A&"|"&Data!D1:AG1&"|"&Data!D2:AG),"|")),"Select * WHERE Col3 IS NOT NULL")
PS: When I change the data range to, say, A2:A100, it gives me the correct result. However, that does not help, since a lot of new data will be added and I want the formula to be dynamic.
Here's the link to the sample sheet - https://docs.google.com/spreadsheets/d/1dgFY5mT9nUJtFefjAros-XpWXRMFtxEf8Fqrv82N5Ys/edit#gid=1813140523
Any help/suggestions would be highly appreciated
I'm not sure where you got your SPLIT(FLATTEN()) technique, but you have to set both the 3rd and 4th parameters of the SPLIT function (split_by_each and remove_empty_text) to FALSE or 0. So in your case it would be:
=ARRAYFORMULA(SPLIT(FLATTEN(Data!A2:A&"|"&Data!D1:AG1&"|"&Data!D2:AG),"|",0,0))
If you do that you'll find your query works.
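Also note that, the way you have it, it's not really working. If you look all the way down column 1, you'll find a bunch of dates formatted to look like integers. With remove_empty_text left at its default (TRUE), SPLIT drops the empty pieces, so for rows with a blank ID and a blank value the date lands in column 1.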
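The effect of SPLIT's 4th parameter can be illustrated with a small Python model. This is a sketch on a hypothetical miniature of the Data sheet, not Sheets itself. It splits each joined "ID|date|value" string with and without dropping empty pieces, and shows how rows with blanks get shorter and the date can shift into column 1 when empties are dropped.

```python
# Hypothetical miniature of the Data sheet: column A holds IDs, D1:AG1 the
# date headers and D2:AG the values; the last, empty row mimics the unused
# rows that an open-ended range such as A2:A picks up.
ids = ["a", "b", ""]
dates = ["2021-01-01", "2021-01-02"]
values = [["1", "2"], ["3", ""], ["", ""]]


def split_like_sheets(text, delimiter, remove_empty_text=True):
    # Rough model of SPLIT's 4th argument: by default, empty pieces are dropped.
    # (The 3rd argument, split_by_each, makes no difference for a one-character delimiter.)
    parts = text.split(delimiter)
    if remove_empty_text:
        return [p for p in parts if p != ""]
    return parts


def unpivot(remove_empty_text):
    # FLATTEN(A2:A&"|"&D1:AG1&"|"&D2:AG) visits the grid row by row.
    joined = [f"{ids[r]}|{dates[c]}|{values[r][c]}"
              for r in range(len(ids)) for c in range(len(dates))]
    return [split_like_sheets(s, "|", remove_empty_text) for s in joined]


for row in unpivot(remove_empty_text=True):
    print(row)  # rows with blanks come out shorter; a date can land in column 1
for row in unpivot(remove_empty_text=False):
    print(row)  # every row keeps exactly three columns
```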

Extract html table row to google sheet

I’m trying to extract a single row from a table.
When using the Google Sheets IMPORTHTML function, I get the whole table.
=IMPORTHTML("https://www.marketwatch.com/investing/stock/jwn/options?mod=mw_quote_tab", "table",1)
How can I extract just the row right above the words “Current price as of”?
For example, in this case the row has the data below (this data changes as the date changes):
quote 1.5 0.53 76 1.36 1.47 142 39 quote 0.88 -1.73 23
I have several URLs to go through.
For example, if I use the following URL, the row position changes:
https://www.marketwatch.com/investing/stock/ge/options
Any idea how to extract just the last row right above the words “Current price as of”?
Looking at the HTML from https://www.marketwatch.com/investing/stock/ge/options, I think the value you expect can be retrieved using IMPORTXML and an XPath. So in this answer, I would like to propose using IMPORTXML.
Sample formula:
=IMPORTXML(A1,"//tr[td[1]/@class='acenter inthemoney'][last()]")
In this case, the URL https://www.marketwatch.com/investing/stock/ge/options is put in cell "A1".
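The XPath selects the last table row whose first cell has the class `acenter inthemoney`. Below is a Python sketch of that selection on a hypothetical, heavily simplified table. The real MarketWatch markup is larger and may differ. ElementTree cannot evaluate `td[1]/@class` inside a predicate, so the check is written as a plain loop.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified options table; the real MarketWatch markup
# is larger and may use different classes.
SAMPLE_TABLE = """<table>
  <tr><td class="acenter inthemoney">quote</td><td>1.1</td></tr>
  <tr><td class="acenter inthemoney">quote</td><td>1.5</td></tr>
  <tr><td class="acenter">quote</td><td>2.0</td></tr>
  <tr><td colspan="2">Current price as of</td></tr>
</table>"""


def last_in_the_money_row(xml_text):
    root = ET.fromstring(xml_text)
    last_match = None
    for tr in root.iter("tr"):
        cells = tr.findall("td")
        # td[1]/@class='acenter inthemoney': the first cell's class must equal
        # the whole string exactly, just as in the XPath comparison.
        if cells and cells[0].get("class") == "acenter inthemoney":
            last_match = [cell.text for cell in cells]
    # [last()]: keep only the final matching row.
    return last_match


print(last_in_the_money_row(SAMPLE_TABLE))  # ['quote', '1.5']
```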
Note:
This sample formula works for the current URL of https://www.marketwatch.com/investing/stock/ge/options. If the URL changes, or the HTML structure changes because the site is updated, the formula might no longer work, so please be careful about this.
Reference:
IMPORTXML
IMPORTHTML() simply reads an (entire!) HTML table or list into your Google Sheet.
If you want to filter or manipulate the imported data, then you'll need to use other Google Sheets functions. These are documented here:
Google Sheets function list
Alternatively, you might want to import into one sheet, then select certain data into another, separate sheet:
Get data from other sheets in your spreadsheet
Here are some examples for "filtering" your data:
FILTER function
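The post-import filtering this answer suggests can be sketched in Python rather than Sheets functions. It assumes the imported table arrives as a list of rows. The sketch finds the first row that mentions "Current price as of" and returns the row just above it, which is the row the question asks for. The row values are hypothetical.

```python
# Hypothetical rows, as an IMPORTHTML table might arrive in the sheet
# (one list per table row); values are illustrative only.
ROWS = [
    ["quote", "1.2", "0.40"],
    ["quote", "1.5", "0.53"],
    ["Current price as of", "", ""],
    ["quote", "0.88", "-1.73"],
]


def row_above_marker(rows, marker="Current price as of"):
    # Find the first row containing the marker text and return the row before it.
    for i, row in enumerate(rows):
        if any(marker in str(cell) for cell in row):
            return rows[i - 1] if i > 0 else None
    return None


print(row_above_marker(ROWS))  # ['quote', '1.5', '0.53']
```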

Using =GETPIVOTDATA in Google Sheets to get a specific Grand Total

I'm trying to use the =GETPIVOTDATA formula to dynamically pull specific sum totals (40% & 80% Sell-Out Probability in my example) into a table. I may not fully understand how to use the formula, but everything I've tried has returned an error.
Sample Sheet
This formula seems to pull the answer you want, if I understand your question.
=GETPIVOTDATA("SUM of Sales",H2,"Sell-Out Probability","40%")
This would go in N8 of your sample sheet.
N9 would have the same formula, but with "80%" as the last value.
Let me know if this helps.
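As a rough mental model of what GETPIVOTDATA does, here is a Python sketch. It assumes a pivot with rows grouped by "Sell-Out Probability" and values "SUM of Sales", built from made-up records. The lookup simply reads the precomputed sum for one row item, the way the formula reads one cell of the pivot table.

```python
from collections import defaultdict

# Hypothetical source data: (Sell-Out Probability, Sales) pairs.
RECORDS = [("40%", 100.0), ("80%", 250.0), ("40%", 50.0), ("80%", 25.0)]


def build_pivot(records):
    # Rows grouped by "Sell-Out Probability", values = SUM of Sales.
    totals = defaultdict(float)
    for probability, sales in records:
        totals[probability] += sales
    return dict(totals)


def get_pivot_data(pivot, item):
    # Analogue of GETPIVOTDATA("SUM of Sales", H2, "Sell-Out Probability", item).
    return pivot[item]


pivot = build_pivot(RECORDS)
print(get_pivot_data(pivot, "40%"))  # 150.0
print(get_pivot_data(pivot, "80%"))  # 275.0
```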

Need help getting data from website using importxml and xpath

I would like some help getting the value next to ROE from http://fundamentus.com.br/detalhes.php?papel=TAEE11 using IMPORTXML / XPath. In this case the ROE value is 20,8%, and I would like to get this value using IMPORTXML / XPath.
How can I do that? I've tried some formulas but haven't been able to get the details from the website.
You can do this with a combination of IMPORTXML, MATCH and INDEX:
=index(IMPORTXML("http://fundamentus.com.br/detalhes.php?papel=TAEE11","//*[@class='txt']"),match("roe",IMPORTXML("http://fundamentus.com.br/detalhes.php?papel=TAEE11","//*[@class='txt']"),0)+1,0)
Basically, pointing the XPath at the class txt stacks all the labels and data fields in a single column, with each label directly above its value, so you can consistently search for a label such as ROE and just increase the index by 1 to retrieve the corresponding value.
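The MATCH-then-offset-by-one idea can be sketched in Python on a hypothetical stacked list. Only the ROE pair comes from the question, and the other entries are made up. The case-insensitive comparison mirrors how MATCH with match type 0 treats text in Google Sheets.

```python
# Hypothetical stacked output of IMPORTXML(url, "//*[@class='txt']"): each
# label is immediately followed by its value. Only the ROE pair comes from the
# question; the other entries are made up for illustration.
STACKED = ["P/L", "10,5", "ROE", "20,8%", "ROIC", "15,1%"]


def value_after_label(cells, label):
    # MATCH(label, range, 0): exact match, compared case-insensitively.
    wanted = label.casefold()
    for i, cell in enumerate(cells):
        if cell.casefold() == wanted:
            # INDEX(range, position + 1): the cell right after the label.
            return cells[i + 1] if i + 1 < len(cells) else None
    return None


print(value_after_label(STACKED, "roe"))  # 20,8%
```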

Google Spreadsheet & ArrayFormula - auto-adding formulas

I'm having problems with something that is likely very simple to correct. I have a form that submits data to a Google Spreadsheet: simply a date, name and score. On a separate sheet I am going to have a leaderboard which shows all submissions ranked by highest score (for simplicity, in the example in the link below I just have the leaderboard showing up on the right of the same sheet). I have it sorting the data fine, but I'm struggling with getting the 'rank' value to display. As shown for the first 3 rows (G2, G3, G4), I know what formula displays the 'rank' value, but what I'm struggling with is how to get that value to show without having to put that formula in each cell. Since the data will be coming from a form, new rows will be added regularly, which means the leaderboard will automatically get adjusted, and I want all of the rows to display the rank #. From what I have read, ArrayFormula should allow this to work, but even after looking at examples I can't figure out how to get it to work with my formula.
I know I could just highlight the entire 'G' column and paste in the formula, and hope it adds it to enough rows, but then it displays '#N/A' for all of the rows which don't currently have any data.
Hoping it's just a simple solution that I'm being dumb and missing; any help would be greatly appreciated. The link to an example is below. To summarize, for all rows that have content in columns H and I, the G cell for that row should show the rank value automatically.
https://docs.google.com/spreadsheets/d/1pCIJQi5g2scOtB6o2PgVVb-0azzhupEOPjiL0RMM57A/edit?usp=sharing
Thank you!
=ARRAYFORMULA(RANK(INDIRECT("I2:I"&COUNTA(H:H)),$I$2:I,0))
This will automatically rank all values, including any that are added later. You only need to enter it in G2, and it will fill in the rest dynamically.
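Below is a Python sketch of what this formula computes, under stated assumptions. H1 is assumed to be a header with no gaps in column H, and the scores are hypothetical. INDIRECT with COUNTA limits the ranked cells to the filled rows. RANK with order 0 counts strictly larger values, so ties share a rank.

```python
# Hypothetical scores in column I (rows 2 onward); None marks empty cells below the data.
COLUMN_I = [90, 75, 90, 60, None, None]
COUNTA_H = 5  # assumes H1 is a header and H2:H5 hold names, with no gaps


def rank_desc(value, data):
    # RANK(value, data, 0): 1 + count of strictly larger values, so ties share a rank.
    return 1 + sum(1 for x in data if x is not None and x > value)


def rank_filled_rows(column_i, counta_h):
    # INDIRECT("I2:I"&COUNTA(H:H)) covers rows 2..COUNTA(H:H), i.e. counta_h - 1 cells.
    filled = column_i[:counta_h - 1]
    return [rank_desc(v, column_i) for v in filled]


print(rank_filled_rows(COLUMN_I, COUNTA_H))  # [1, 3, 1, 4]
```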
You can use
=IFERROR(RANK(...),"")
and drag it down to all rows; this will leave blank cells instead of #N/As. I'm sure there are other ways, but that seems like the easiest one to me.
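Here is a small Python sketch of the IFERROR fallback, using a hypothetical score column. Wherever RANK would fail on an empty row, the cell becomes an empty string instead of an error.

```python
# Hypothetical column I with two empty rows at the bottom.
SCORES = [90, 75, 60, None, None]


def rank_or_blank(value, data):
    # IFERROR(RANK(...), ""): where RANK would give #N/A for an empty cell,
    # return an empty string instead.
    if value is None:
        return ""
    return 1 + sum(1 for x in data if x is not None and x > value)


print([rank_or_blank(v, SCORES) for v in SCORES])  # [1, 2, 3, '', '']
```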
